I. Introduction
Image processing and artificial intelligence techniques are used to enhance images and extract features from them, enabling the training of machine learning models for skin cancer detection and classification. Currently, classical machine learning and deep learning models, along with variations and integrations of these learning models, are employed. These models achieve high efficiency in benign/malignant diagnostic classification, although in some cases specialized hardware is required to run the computational models.
In skin cancer detection, the classification between benign and malignant lesions is prioritized [1], with benign classification success rates of 82.77% and 88.61% using the ResNet101 and InceptionV3 models, and malignant classification success rates of 85.66% and 86%, respectively, for these models [2]. Similarly, support vector machines have been used for classification, where sensitivity reaches 90% for benign lesions and classification efficiency exceeds 93% [3]. Other works have emphasized the identification of skin cancer types, where melanoma classification reached an efficiency of 91% with 111 dermatoscopic images, and carcinoma classification achieved a success rate above 90% with 135 test images. With validation data, however, the accuracy of the employed framework drops to 72.1% when tested with two classes, and when more classes are added, its overall accuracy drops to 55.4% [4]. Additionally, combined neural networks based on InceptionV3 and ResNet50 have been employed for the diagnosis of non-pigmented skin cancer, where cancer identification presented an accuracy of 72.5% at a 95% confidence level, and specific diagnosis among malignant skin cancer types increased to 74.2% at the same confidence level [5].
VGG is a neural network architecture used in skin lesion detection and identification. In its VGGNet extension, it has been fused with and compared against other CNN architectures such as GoogleNet, AlexNet, and ResNet; with VGGNet, the average hit rate was 81.3%, although high-cost, high-performance hardware was required [6]. Furthermore, a VGG architecture with 13 convolutional layers, 5 pooling layers, and 3 fully connected layers has been applied with input sizes of 227x227x3 and 200 iterations, where the hit rate in lesion classification ranged from 60% to 90% [7]. Likewise, XGBoost has been implemented as a classification model: with default hyperparameters and 64x64 test images, it presented a recall of 44% in melanoma identification and a specificity of 69% in the identification of non-melanoma lesions [8]. The performance of XGBoost has also been compared with RNNs, where XGBoost is less efficient than RNNs; however, XGBoost does not suffer from the information loss caused by the data transformations required to fit RNN models [9].
This paper presents HASCC, a hybrid algorithm developed for skin cancer classification among carcinoma (C1), benign keratosis (C2), and melanoma (C3) using images from the HAM10000 dataset. HASCC comprises image processing, feature extraction, and reduction stages based on VGG16 and PCA, and an XGBoost-based classifier. The algorithm is integrated into a CAD-like graphical interface, runs on a Raspberry Pi 4 board, and its performance is measured in both hardware and software.
II. Methodology
A methodology based on three stages is proposed. The first stage details the hybrid algorithm developed for detecting and classifying skin cancer types, namely carcinoma, benign keratosis, and melanoma. The second stage presents the graphical user interface designed for the CAD tool. The third stage validates the tool on the Raspberry Pi 4 through a hardware and software comparison against other classification algorithms.
A. Hybrid Algorithm for Computer-Aided Diagnosis
The hybrid algorithm developed for skin cancer classification on dermoscopic images integrates image processing techniques with feature extraction and reduction, using VGG16 for extraction and PCA for dimensionality reduction of the image components, and XGBoost for classification.
1) Image Processing Techniques for Body Hair Suppression. The input image first undergoes preprocessing: it is converted to grayscale and smoothed with a median filter of kernel size k=3. The Sobel operator is then applied to the grayscale image to identify body hair in the lesion, taking the first derivative along the horizontal and vertical components. Morphological erosion with a cross-shaped structuring element of kernel size 5x5 is applied to attenuate the noise produced by body hair over the lesion. Thresholding is then used to separate the lesion region from the image background: the binary thresholding method (with a threshold adjustable between 0 and 255 from the graphical interface) is complemented with contour-search stages to detect the region of interest in the image. In addition, Otsu's adaptive thresholding method is employed for a first approximation of the lesion/background segmentation, with an initial threshold value of 127.
Image processing culminates with a morphological opening that smooths the contours of the thresholded image, using an elliptical structuring element with k=5. Finally, the median-smoothed image is superimposed with the thresholded region of interest after the opening, so that the image without body hair is represented in the foreground.
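A minimal sketch of this pipeline, assuming OpenCV; the function name and the exact chaining of the stages are illustrative rather than a verbatim reproduction of the implementation:

```python
import cv2

def process_lesion(bgr, thresh=127, use_otsu=False):
    """Illustrative composition of the hair-suppression stages described above."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    smooth = cv2.medianBlur(gray, 3)                        # median smoothing, k=3
    gx = cv2.Sobel(smooth, cv2.CV_64F, 1, 0)                # horizontal first derivative
    gy = cv2.Sobel(smooth, cv2.CV_64F, 0, 1)                # vertical first derivative
    hair = cv2.convertScaleAbs(cv2.magnitude(gx, gy))       # Sobel resultant: body hair
    cross = cv2.getStructuringElement(cv2.MORPH_CROSS, (5, 5))
    hair = cv2.erode(hair, cross)                           # attenuate hair noise
    flags = cv2.THRESH_BINARY | (cv2.THRESH_OTSU if use_otsu else 0)
    _, mask = cv2.threshold(smooth, thresh, 255, flags)     # lesion vs. background
    ellipse = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, ellipse)  # smooth thresholded contours
    fg = cv2.bitwise_and(smooth, smooth, mask=mask)         # hair-free foreground
    return {"gray": gray, "smooth": smooth, "hair": hair, "mask": mask, "foreground": fg}
```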
2) VGG16 and PCA Algorithms. The images used belong to the ISIC Challenge skin cancer dataset, consisting of 10,015 pigmented dermoscopic image samples validated with different modalities, each with a dimension of 600x450 pixels. For feature extraction, a VGG16 convolutional neural network is used without its fully connected layers, since only its extraction stages are required. The network comprises five convolutional blocks with 64, 128, 256, 512, and 512 channels, using 3x3 kernels for the convolutional layers, and five pooling layers with 2x2 kernels and 2x2 stride, a configuration that favors the performance of the network for extraction. An input image size of 128x128 pixels in RGB scale is proposed for feature extraction. The data distribution is 60% for training and 40% for validation and testing.
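As an illustration, the truncated extractor can be instantiated as follows; this sketch assumes a Keras/TensorFlow implementation with pretrained ImageNet weights, neither of which is prescribed by the text:

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input

# VGG16 without its fully connected layers, used purely as a feature extractor
extractor = VGG16(weights="imagenet", include_top=False, input_shape=(128, 128, 3))

def extract_features(images: np.ndarray) -> np.ndarray:
    """images: (n, 128, 128, 3) RGB array; returns flattened feature vectors."""
    maps = extractor.predict(preprocess_input(images.astype("float32")))
    return maps.reshape(len(maps), -1)   # flatten the 4x4x512 feature maps for PCA
```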
An image component reduction method, Principal Component Analysis (PCA) [10], is applied to reduce the computational requirements. With the image in RGB color space, the channels are separated, the data are flattened and normalized, and the desired number of output components is supplied as a parameter. For the ISIC Challenge dataset images, the original 1,890 image components extracted with the VGG16 network were reduced to 375 components for the hybrid algorithm, with a total explained variance ratio of 96.6%, so that most of the variance of the information is retained.
After the per-channel reduction, the components are recombined into a multi-channel representation for classification.
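A minimal sketch of this reduction stage, assuming scikit-learn; `X_train` and `X_test` are placeholders for the flattened feature matrices produced by the extraction stage:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()            # normalize the flattened features
pca = PCA(n_components=375)          # 1,890 -> 375 components

X_train_red = pca.fit_transform(scaler.fit_transform(X_train))
X_test_red = pca.transform(scaler.transform(X_test))

# ~0.966 for the reduction reported above
print("explained variance ratio:", pca.explained_variance_ratio_.sum())
```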
3) XGBoost Classifier. The XGBoost hyperparameters were set as follows: learning rate 0.1; number of gradient-boosted trees (equivalent to boosting iterations) 400; maximum tree depth 5, which makes the classifier less susceptible to overfitting; Lagrangian operator (gamma) 0.04; ratio of random training subsamples used to train each tree 0.8; column subsampling ratio when constructing each tree 0.85; and random seed 27.
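In the scikit-learn wrapper of the xgboost package, this configuration corresponds to the following sketch; `X_train_red` and `y_train` are placeholders for the PCA-reduced features and class labels:

```python
from xgboost import XGBClassifier

clf = XGBClassifier(
    learning_rate=0.1,       # learning rate
    n_estimators=400,        # gradient-boosted trees (boosting iterations)
    max_depth=5,             # maximum tree depth, limits overfitting
    gamma=0.04,              # minimum loss reduction (Lagrangian operator)
    subsample=0.8,           # random training subsample ratio per tree
    colsample_bytree=0.85,   # column subsampling ratio per tree
    random_state=27,         # random seed
)
clf.fit(X_train_red, y_train)
```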
B. Graphical User Interface
For the design of the graphical user interface, two factors are considered: the identification of user requirements and the identification of the task [11]. Regarding user requirements, the personal information provided to the system must be protected, so a credential-validation stage is required to access the system. A characterization stage is also required so that potential patients can be individualized, and this information must be sent to a spreadsheet file in real time. Moreover, the computer-aided diagnosis tool for identifying and classifying skin cancer from images requires the specialist to adapt and segment the image using sliders, and to set the kernel size of the blurring filter to be applied to the image.
Regarding task identification, the specialist must log in to the system by validating credentials to access the characterization stage, where they enter and save the information necessary to individualize the patient. They then access HASCC and load the input image into the system. The specialist has the autonomy to segment the image as far as they wish using horizontal sliders, and to apply smoothing filtering, also via horizontal sliders, if the image has body hair. Finally, the method used for diagnosis must be selected and the result visualized.
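As an illustration only (the actual interface is shown in Figure 4; the widget names and file path below are hypothetical), slider controls of this kind can be prototyped with OpenCV trackbars:

```python
import cv2

def refresh(_=0):
    t = cv2.getTrackbarPos("threshold", "HASCC")
    k = max(3, 2 * cv2.getTrackbarPos("kernel", "HASCC") + 1)  # odd kernel size >= 3
    view = cv2.medianBlur(gray, k)                             # user-chosen smoothing
    _, view = cv2.threshold(view, t, 255, cv2.THRESH_BINARY)   # user-chosen threshold
    cv2.imshow("HASCC", view)

gray = cv2.cvtColor(cv2.imread("lesion.jpg"), cv2.COLOR_BGR2GRAY)  # hypothetical path
cv2.namedWindow("HASCC")
cv2.createTrackbar("threshold", "HASCC", 127, 255, refresh)
cv2.createTrackbar("kernel", "HASCC", 1, 10, refresh)
refresh()
cv2.waitKey(0)
```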
C. Validation of HASCC on the Embedded System
The HASCC was validated on a Raspberry Pi 4 embedded system. Its performance was weighed against three methods based on computational-intelligence algorithms: AutoKeras (a machine learning distribution of Keras), LightGBM+GLCM (Light Gradient Boosting Machine + Grey-Level Co-occurrence Matrix), and PCA+VGG16 (used as a classifier). The comparison covers both hardware and software [12]. Regarding hardware, the training time, the response time upon execution, the RAM and CPU used, and the power consumed by the Raspberry Pi 4B while executing HASCC were analyzed. The software comparison considered parameters such as accuracy, precision, recall, and F1-Score [13]. Additionally, HASCC was compared against previous work on skin cancer identification and classification.
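A hedged sketch of how such measurements can be collected in software, assuming psutil and scikit-learn (`clf`, `X_test_red`, and `y_test` are the placeholders from the previous section; power was measured on the board externally and is not reproduced here):

```python
import time
import psutil
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

start = time.perf_counter()
y_pred = clf.predict(X_test_red)                   # classifier from Section II.A
response_time = time.perf_counter() - start        # response time (s)

ram_pct = psutil.virtual_memory().percent          # RAM in use (%)
cpu_pct = psutil.cpu_percent(interval=1.0)         # CPU load over a 1 s window (%)

acc = accuracy_score(y_test, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_test, y_pred, labels=[0, 1, 2])
```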
III. Results and Discussion
Figure 1 shows the image processing applied as the initial stage of the proposed hybrid algorithm. Section A shows three original images corresponding to the classification classes: benign keratosis, carcinoma, and melanoma. Similarly, B shows the grayscale images, while C and D present the results of applying the Sobel first derivative for edge detection along the horizontal and vertical components, respectively, together with the Sobel resultant. E and F show the images thresholded with the binary method and with Otsu's method, respectively. The graphical interface allows the best segmentation to be determined with either thresholding technique, since the efficiency of this process depends on factors such as the level of incident light at the time of image capture.
In Figure 2, the distribution of PCA components, the confusion matrix, and the ROC curve obtained for the hybrid algorithm are depicted.
Figure 2a shows the scatter of the first and last pairs of components after applying PCA, where the overlapping effect between classes is mitigated. The confusion matrix is shown in Figure 2b, where the true positives take values of 382, 343, and 344 for C1, C2, and C3, respectively. The HASCC ROC curves are presented in Figure 2c, from which sensitivity rates of 88.2% for C1, 80.9% for C2, and 85.4% for C3 are inferred, indicating the stability of the model in determining the positive proportions of each class.
Figure 3 shows the system architecture for HASCC. The system is first accessed through the integrated RPi 4 board, which validates the login credentials. The diagnostic GUI is then accessed, where the image is loaded into the system and the skin cancer type is classified with HASCC.
Table 1 shows the comparison between the proposed hybrid skin cancer classification algorithm (HASCC) and the other classification algorithms in hardware and software, where the values in bold represent the best results. HASCC takes approximately 40 times less time to train than AutoKeras, 1.39 times less than LightGBM+GLCM, and 1.76 times less than PCA+VGG16. Likewise, HASCC had the shortest response time upon execution, being 7.6 times faster than AutoKeras, 6 times faster than LightGBM+GLCM, and 1.6 times faster than PCA+VGG16. In terms of RAM, HASCC was the least demanding, with reductions of 12.6%, 10.2%, and 2% compared to the other algorithms. It also required 3.1 times less CPU during processing than AutoKeras, 1.4 times less than LightGBM+GLCM, and approximately the same percentage as PCA+VGG16. In terms of power, it was outperformed only by the LightGBM+GLCM algorithm, which consumed 2.95 W against 3.05 W for HASCC.
Table 1. Performance of HASCC considering hardware and software efficiency (best results in bold).

| Method | Training time (s) | Response time (s) | RAM (%) | CPU (%) | Power (W) | Accuracy C1 (%) | Accuracy C2 (%) | Accuracy C3 (%) |
|---|---|---|---|---|---|---|---|---|
| HASCC | **83.4** | **0.25** | **42.7** | **21.4** | 3.05 | **93.2** | **88.5** | **88.2** |
| AutoKeras | 3325.1 | 1.9 | 55.3 | 65.8 | 3.2 | 87.9 | 80.3 | 84.8 |
| LightGBM+GLCM | 116.2 | 1.5 | 52.9 | 30 | **2.95** | 91.8 | 82.1 | 84.4 |
| PCA+VGG16 | 146.55 | 0.4 | 44.7 | 21.6 | 4.35 | 92 | 85.8 | 86 |

| Method | Precision C1 (%) | Precision C2 (%) | Precision C3 (%) | Recall C1 (%) | Recall C2 (%) | Recall C3 (%) | F1-Score C1 (%) | F1-Score C2 (%) | F1-Score C3 (%) |
|---|---|---|---|---|---|---|---|---|---|
| HASCC | 88.2 | **80.9** | **85.6** | **91.6** | **84.3** | **79.1** | **89.9** | **82.6** | **82.2** |
| AutoKeras | 89.6 | 65.3 | 74.2 | 78.2 | 73.3 | 77.5 | 83.5 | 69.1 | 75.8 |
| LightGBM+GLCM | 90.1 | 71.2 | 75.7 | 86.7 | 74.4 | 75.5 | 88.3 | 72.8 | 75.6 |
| PCA+VGG16 | **90.8** | 76.9 | 77.7 | 86.6 | 80.1 | 78.4 | 88.6 | 78.5 | 78.1 |
At the software level, HASCC outperformed the other algorithms in accuracy, reaching 93.2% for C1, 88.5% for C2, and 88.2% for C3. In terms of precision, it was surpassed only in the C1 class, by up to 2.6% (PCA+VGG16), while HASCC improved precision by up to 15.6% for the C2 class and by up to 11.4% for the C3 class. Likewise, HASCC performed best for each of the three skin cancer classes at the recall level, improving by up to 13.4%, 11%, and 3.7%, which indicates the improved ability of the proposed hybrid algorithm to identify the cancer type.
These values were confirmed with the F1-Score metric, which combines precision and recall, and for which HASCC presented the highest values: 89.9%, 82.6%, and 82.2% for the three classes. The improvement of HASCC over the other algorithms and methods at the software level can be explained by the reduction of components prior to the classification stage, owing to their least-squares representation, decreasing from 1,890 to 375 image components. This also improves the execution time and the percentages of RAM and CPU used.
In terms of power, the method based on LightGBM+GLCM, although it consumed 1.03 times less power, gives up performance at the software level because it considers the image purely in grayscale. HASCC also improved software-level performance in terms of accuracy, decreasing the gap between the numbers of false and true positives and of false and true negatives in the classification of each of the three classes. At the precision level, HASCC presented its lowest relative performance, 88.2% in the carcinoma class, an expected result since only true and false positives are taken into consideration, excluding the negatives represented in the classifier confusion. In contrast, it showed better precision for classes C2 and C3, which are the most complex to classify due to the similarity in appearance of the lesions. In terms of recall, HASCC was the best performer for each of the three classes, indicating a remarkable improvement in the correct classification of each class (true positives). This justifies the improvement in the F1-Score, indicating that HASCC achieves a better balance between precision and recall for each class through their harmonic mean.
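As a consistency check, each F1-Score in Table 1 is the harmonic mean of the corresponding precision and recall. For C1, for example:

F1 = 2PR / (P + R) = (2 x 88.2 x 91.6) / (88.2 + 91.6) ≈ 89.9%.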
Table 2 presents the comparison between HASCC and related work. The study in [14] did not differentiate between specific types of skin lesions but between benign and malignant ones; still, HASCC improved the average classification performance on the benign/malignant lesions considered in that study by 5.75% compared to ResNet101 and by 2.66% compared to InceptionV3. In [15], all classes available in the ISIC-Archive dataset were considered using the ResNet50 architecture, whose average accuracy outperformed HASCC by 2.13%. However, that method applied data augmentation to the HAM10000 classes containing few images, which, by increasing them proportionally, raises the chances of overfitting and overlearning during training. Similarly, in [16], the same classes tested by HASCC were considered using the MobileNet and modified MobileNet architectures, where HASCC improves performance by 9.83% over MobileNet and by 6.04% over the modified MobileNet.
Table 2. HASCC vs. related work.

| Work | Dataset | Classes | Architecture | Result |
|---|---|---|---|---|
| HASCC | ISIC-Archive (HAM10000) | akiec, bkl, mel | VGG16 - PCA - XGBoost | ACC: 93.2%, 88.5%, 88.2% |
| [14] | ISIC-Archive | Malignant and benign | ResNet101 - InceptionV3 | ACC: 85.66%; 82.77% - 86%; 88.61% |
| [15] | ISIC-Archive (HAM10000) | All HAM10000 classes | ResNet50 | ACC: 92.1% |
| [16] | ISIC-Archive (HAM10000) | mel, akiec (malignant), bkl (benign) | MobileNet, modified MobileNet | ACC: 80.14%, 83.93%; SPE: 80%, 84% |
| [17] | ISIC-Archive | mel, benign | U-MobileNetV1, U-DenseNet121 | ACC: 87.6% |
HASCC improves specificity by 5% over MobileNet and by 1% over the modified MobileNet. Furthermore, in [17], only the melanoma class and benign lesions were considered for classification, using a fusion of the U-MobileNetV1 and U-DenseNet121 architectures, whose classification hits were on average 2.37% lower than those achieved by HASCC. Although HASCC improves skin lesion classification, the melanoma class samples still need to be improved to sharpen differentiation, given their similarity to other lesion types.
Figure 4 shows the integrated graphical user interface for HASCC, with its stages of credential validation, patient characterization, visualization, and lesion diagnosis.
IV. Conclusion
In this work, a hybrid algorithm for skin cancer classification is proposed. The use of image processing techniques enabled body hair removal, feature extraction with a robust method such as VGG16, component reduction with PCA, and selection of the information of interest delivered to the XGBoost classifier, which directly influenced the software performance and the hardware requirements of the model. The results showed an improvement of HASCC over conventional and merged architectures, as well as over related work in skin cancer classification. Running HASCC on a Raspberry Pi 4B embedded system suggests that the proposed architecture can be replicated and improved on both high- and low-cost hardware. There is also room for model optimization in the feature extraction and classification parameters, such as improving the data provided by the datasets to avoid overfitting and overlearning and to increase the classification hit rate.