Optica Publishing Group

Band selection in spectral imaging for non-invasive melanoma diagnosis

Open Access

Abstract

A method combining the Synthetic Minority Over-Sampling TEchnique (SMOTE) with the Sequential Forward Floating Selection (SFFS) technique is used to perform band selection in a highly imbalanced, small, two-class multispectral dataset of melanoma and non-melanoma lesions. The aim is to improve the classification rate and to help identify the spectral bands that play a more important role in melanoma detection. All the processing steps were designed taking into account the low number of samples in the dataset, a situation that is quite common in medical cases. The training/test sets are built using a Leave-One-Out strategy. SMOTE is applied to deal with the imbalance problem, together with the Qualified Majority Voting (QMV) scheme. Support Vector Machines (SVM) is the classification method applied to each balanced set. Results indicate that all melanoma lesions are correctly classified using a low number of bands, reaching 100% sensitivity and 72% specificity when considering nine (out of a total of 55) spectral bands.

© 2013 Optical Society of America

1. Introduction

Cutaneous melanoma is one of the most common malignant skin cancers. According to the European Cancer Observatory, 32107 cases in men and 35324 cases in women were detected in 2008, i.e. rates of 11.5 and 11.4 per 100000 people, respectively [1]. Early signs of melanoma include changes in the shape or color of existing moles, or the formation of a new lump. The ABCD rule (A, Asymmetry; B, irregular Border; C, variety of Colors; D, Diameter) helps in lesion diagnosis. Currently, the diagnosis process includes tissue biopsy and histopathology, which implies surgical intervention. A non-invasive diagnosis method would therefore be beneficial to avoid unnecessary surgery. The different skin components (such as melanin, hemoglobin, oxy-hemoglobin and water) behave differently in the infrared part of the spectrum [2]. However, the human eye is not sensitive to this spectral range. Multispectral images acquired in the visible to near-infrared spectrum may help experts to differentiate benign from malignant lesions. On the other hand, as the number of bands increases, noise and redundancy increase as well [3]. Some authors have used a pre-defined set of bands; however, these sets differ among publications. Diebele et al. [4] use only the 540, 650 and 950nm bands. Patwardhan, Dhawan et al. [5] selected bands in the range from 350 to 700nm by simulating light propagation with a Monte Carlo modeling approach. Dhawan et al. [6] present a dataset of real lesions acquired with a nevoscope system at 510, 560 and 610nm. D'Alessandro and Dhawan have also analyzed melanoma lesions using a nevoscope, but with different sets of bands in the infrared spectral range [2, 7]. Other authors apply feature reduction methods to a group of features generated from the images, and not to the set of bands [8, 9].

The aim of this work is to analyze the effect of band selection on the quality of melanoma classification over the visible and near infrared wavelength range, taking into account that the dataset is small and imbalanced. This work is substantially different from [10], where preliminary classification results were presented and no band selection strategy was applied. The paper is structured as follows. Section 2 describes the multispectral image acquisition system and the acquisition process. Section 3 explains the band selection method and the classification strategy. Results and discussion are presented in Section 4, and conclusions are given in Section 5.

2. Image acquisition

The acquisition system consists of two cameras, each one coupled to a Liquid Crystal Tunable Filter (LCTF). One of the LCTFs works in the [400, 720]nm visible spectral range, and the other in the [650, 1100]nm near infrared spectral range. Their combination forms a 71-dimensional vector in the range from 400 to 1100nm. However, due to acquisition problems at the extrema of the filter range, the bands in the [400, 440]nm and [1000, 1100]nm intervals were discarded, leaving the [450, 990]nm working range. The final dimensionality of the vectors is 55. The camera attached to the LCTF in the visible range was a Marlin F080B (Allied Vision Technologies), whose CCD resolution is 1024 × 768 pixels. A QImaging Retiga EX camera, with a resolution of 1036 × 1360 pixels, was coupled to the near infrared LCTF. In both cases (Fig. 1), the optical system in front of the rear part of the filter was a Canon TV zoom lens, while a Macro Schneider system [11] connected the filter to the camera. The skin lesions were illuminated with a fiber optic ring light guide coupled to the Canon TV zoom lens. The light guide transmits the light from a Fiber-Lite DC950 (150W quartz halogen lamp) light source. A white cylinder was attached to the optical ring to homogenize the illumination and to allow working at a fixed, predefined distance. In this set-up, cross-polarizers (which would remove the influence of the specular reflection component of the skin) were not used for two reasons: (a) they would further increase the integration time, which is already long at some wavelengths because of the transmission performance of the LCTF; (b) the optical set-up produces a diffuse illumination on the skin lesion, which minimizes specular effects. The amount of light needed for image acquisition depends on the camera sensor sensitivity, the power of the light source and the LCTF transmittance factor.
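The band count above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the 10 nm sampling step is an assumption inferred from the figure of 71 bands between 400 and 1100nm, and the names `STEP_NM`, `all_bands` and `kept_bands` are hypothetical.

```python
# Assumption: one band every 10 nm, which is what 71 bands between
# 400 and 1100 nm imply; the step is not stated explicitly in the text.
STEP_NM = 10

all_bands = list(range(400, 1100 + STEP_NM, STEP_NM))


def discarded(wavelength_nm):
    """True for bands in the two unreliable intervals at the filter extrema."""
    return 400 <= wavelength_nm <= 440 or 1000 <= wavelength_nm <= 1100


kept_bands = [w for w in all_bands if not discarded(w)]

print(len(all_bands))                 # 71
print(len(kept_bands))                # 55
print(kept_bands[0], kept_bands[-1])  # 450 990
```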
Band acquisition is sequential and the acquisition time differs for each band, because the filter transmittance factor depends on wavelength. In order to infer the acquisition time per band, an ideal diffuse reflectance target (Spectralon) was used. We found an approximately linear dependence between acquisition time and grey level value. The acquisition time for all the bands was approximately 83 seconds. During this time, involuntary movements of the patient may occur, so an image registration method is needed to correct them. The image corresponding to the band with the smallest acquisition time was selected as the reference image. The remaining bands were registered against it using a registration method based on the maximization of Mutual Information [12, 13]. This is a reference method for multimodal registration in medical imaging, and it provides very accurate results when the images come from different modalities, as is the case for the spectral bands of an image. A 2D affine motion model is used to parameterize the relative image movement between two bands. The motion parameters are estimated by maximizing the mutual information between the pixel values of the bands using a simplex-based optimization algorithm. A vector with the spectral information in the whole [450, 990]nm interval was created from the mean grey level value of a Region Of Interest (ROI), normalized between 0 and 1. The ROI was manually defined in each case by an expert physician. Since the images in the VIS and NIR ranges are acquired with different filters, the overlapping spectral bands of both filters are used to normalize the values of both vectors into a single vector representation.
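To make the registration criterion concrete, the following is a minimal sketch of how mutual information between two bands can be estimated from a joint grey-level histogram. It covers only the similarity measure, not the affine model or the simplex optimizer, and it is not the implementation used in [12, 13]. The function name `mutual_information`, the choice of 8 histogram bins and the toy pixel lists are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(band_a, band_b, bins=8):
    """Mutual information (in bits) between two equally sized lists of
    grey levels in [0, 1], estimated from a joint histogram."""
    if len(band_a) != len(band_b):
        raise ValueError("bands must have the same number of pixels")

    def to_bin(value):
        # Map a grey level in [0, 1] to a histogram bin index.
        return min(int(value * bins), bins - 1)

    n = len(band_a)
    joint = Counter((to_bin(a), to_bin(b)) for a, b in zip(band_a, band_b))
    hist_a = Counter(to_bin(a) for a in band_a)
    hist_b = Counter(to_bin(b) for b in band_b)

    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        mi += p_ij * math.log2(p_ij / ((hist_a[i] / n) * (hist_b[j] / n)))
    return mi


# Toy pixel rows (hypothetical values, not real data).
reference = [0.1, 0.1, 0.9, 0.9, 0.1, 0.9, 0.5, 0.5]
inverted = [1.0 - v for v in reference]    # same structure, other contrast
shifted = reference[1:] + reference[:1]    # same band, displaced by a pixel

print(mutual_information(reference, inverted) > mutual_information(reference, shifted))
```

In this toy case the inverted band scores as high as the reference against itself, while the displaced band scores lower. That insensitivity to the grey-level mapping is what makes the measure suitable for comparing bands whose intensities differ, and a registration routine would search for the affine parameters that maximize it.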

Fig. 1. Multispectral image acquisition system.

3. Selection of spectral bands

The number of samples in medical diagnosis is usually small and, fortunately, the number of melanoma lesions is usually much lower than the number of benign lesions. However, the input dimensionality may be high. It therefore becomes necessary to apply a band selection method able to deal with these characteristics: a high-dimensional, two-class problem with imbalance and few samples. Feature selection methods can be divided into three main groups: wrapper, filter, and embedded methods [14]. Filter methods use a selection criterion that is independent of the classification strategy. Wrapper methods select the features taking into account the result given by a classification strategy. Embedded techniques are defined for a specific classification method. Filter methods are usually based on statistical measures such as correlation and mutual information [15], and they usually work poorly on small datasets. Wrapper methods, on the other hand, are computationally expensive because they validate each feature subset with the selected classification method. In this problem, with few samples, the classification execution time is low, so a wrapper method is a good option. In wrapper methods, the generation of the feature subsets can be considered as a combinatorial search problem. The most popular solutions are forward, backward and floating sequential schemes, which provide a sub-optimal solution.

The Sequential Forward Floating Selection (SFFS) [16] procedure (selected in this work) starts with an empty set S. For a given number of selected bands, and once a band has been incorporated into S, it makes backward steps as long as the resulting subset of bands is better than those previously evaluated at that level. Therefore, there are no backward steps if the performance cannot be improved. The SFFS band selection approach is a data-driven method, i.e., it is not a physics-based method that models light propagation in the skin. It treats band selection as a feature selection problem in pattern recognition, using the response of the chosen classifier combined with the SFFS strategy to select the subset of bands that is most relevant from the point of view of classification performance. In our problem, the main aim is the correct detection of all the melanoma lesions. Our second aim is to obtain the highest possible number of correctly classified benign lesions. Therefore, when two sets of bands give the same number of correctly classified melanoma lesions, the number of correctly classified benign lesions is the next criterion to consider. When adding or deleting one band, several bands can produce the same best result. To choose among them, the band that is farthest away from the bands in S is added in the addition step, and the band that is closest to one that will remain in S is removed in the deletion step.
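The search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sffs`, the `score` callback returning a `(TP, TN)` tuple, and the toy scorer are hypothetical. Python compares tuples lexicographically, which matches the stated priority (correctly classified melanomas first, then benign lesions), and the wavelength gap is used as the tie-break.

```python
def sffs(bands, score, max_size):
    """Sequential Forward Floating Selection over a list of wavelengths.

    score(subset) must return a (TP, TN) tuple; larger is better.
    Returns {subset_size: (best_subset, best_score)}.
    """
    max_size = min(max_size, len(bands))

    def evaluate(subset):
        return score(tuple(sorted(subset)))

    def gap(band, others):
        # Distance (nm) from a band to the nearest band in `others`.
        return min((abs(band - o) for o in others), default=0)

    selected, best = [], {}
    while len(selected) < max_size:
        # Forward step: add the best band; ties go to the band that is
        # farthest from the bands already in the set.
        candidates = [b for b in bands if b not in selected]
        added = max(candidates,
                    key=lambda b: (evaluate(selected + [b]), gap(b, selected)))
        selected = selected + [added]
        size, current = len(selected), evaluate(selected)
        if size not in best or current > best[size][1]:
            best[size] = (tuple(sorted(selected)), current)

        # Floating steps: remove bands while the reduced subset beats the
        # best subset recorded so far for that smaller size.
        while len(selected) > 2:
            def without(band):
                return [s for s in selected if s != band]

            # Ties go to the band closest to one that would remain.
            dropped = max(selected,
                          key=lambda b: (evaluate(without(b)),
                                         -gap(b, without(b))))
            reduced = without(dropped)
            reduced_score = evaluate(reduced)
            if reduced_score <= best[len(reduced)][1]:
                break
            selected = reduced
            best[len(reduced)] = (tuple(sorted(reduced)), reduced_score)
    return best


# Toy scorer (hypothetical): rewards three "informative" bands and
# penalizes every other band in the subset.
informative = {450, 700, 830}


def toy_score(subset):
    hits = len(informative & set(subset))
    return (hits, -(len(subset) - hits))


result = sffs([450, 500, 550, 700, 830], toy_score, max_size=3)
print(result[3])  # ((450, 700, 830), (3, 0))
```

In the actual method the `score` callback would run the whole classification model for the candidate subset, which is why the wrapper approach is only practical when classification is fast.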

3.1. Classification model

Figure 2 shows a block model of the classification strategy. A Leave-One-Out strategy was used to create the training and test datasets, in order to deal with the low number of samples. Each sample x is used once as test data, with all other observations as training data. For each sample x, a classification result is obtained using only the bands in S (Fig. 2). The number of correctly classified samples of each class is used to assess the quality of S, as previously explained. In order to obtain competitive classification results, the imbalance problem must be solved. The melanoma class has fewer samples: it is the minority class and it tends to have poor classification results. There are two main ways to overcome an imbalance problem: over-sampling the minority class and under-sampling the majority class. Over-sampling generates new samples for the minority class until both classes are equally represented. This is the approach used here, in particular the Synthetic Minority Over-Sampling TEchnique (SMOTE) [17]. Given the imbalanced dataset, for each sample x1 of the minority class, one of its 3 nearest neighbours of the same class, xnear1, is selected; a random number between 0 and 1 is then generated to place a new sample xnew on the line segment between x1 and xnear1. Because SMOTE is random, different runs may generate different sets. We therefore applied it five times, obtaining five different balanced datasets (Dsmo1 to Dsmo5). Each balanced set was restricted to the bands in S (denoted Dsmoi[S], i = 1,...,5). In this paper, a Support Vector Machine (SVM) [18] classifier with a Radial Basis Function (RBF) kernel was used. The best combination of parameters (C, γ) is obtained by a grid search in the log-scale space of the parameters, using a k-fold cross-validation strategy with k = 3. The grid search was made over the entire balanced data of all samples and bands.
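The SMOTE interpolation step can be illustrated with a short sketch. It follows the description above (3 nearest neighbours, one random position on the segment between a sample and a neighbour) but it is a simplified stand-in, not the reference implementation of [17]. The function name `smote`, the way samples are cycled, the seeds and the toy two-band vectors are assumptions.

```python
import random


def smote(minority, n_new, k=3, rng=None):
    """Generate n_new synthetic samples from a list of minority-class vectors."""
    rng = rng or random.Random()
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")

    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    synthetic = []
    for n in range(n_new):
        x1 = minority[n % len(minority)]  # cycle through the minority samples
        others = [m for m in minority if m is not x1]
        neighbours = sorted(others, key=lambda m: sq_dist(x1, m))[:k]
        x_near = rng.choice(neighbours)
        lam = rng.random()  # position on the segment from x1 to x_near
        synthetic.append([a + lam * (b - a) for a, b in zip(x1, x_near)])
    return synthetic


# Toy minority class: 7 hypothetical two-band vectors (not real data).
melanoma = [[0.10, 0.80], [0.15, 0.75], [0.20, 0.70], [0.12, 0.82],
            [0.18, 0.78], [0.25, 0.65], [0.22, 0.72]]

# With 25 majority samples, 18 synthetic samples balance the classes.
# Five seeds give five balanced sets, in the spirit of Dsmo1 to Dsmo5.
balanced_sets = [smote(melanoma, 18, rng=random.Random(seed)) for seed in range(5)]
print(len(balanced_sets), len(balanced_sets[0]))  # 5 18
```

Every synthetic sample is a convex combination of two real minority samples, so it never leaves the region already covered by the minority class. Inside the Leave-One-Out loop the number of samples to generate depends on which class the held-out sample belongs to.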
Five balanced datasets are generated and the classification model for each one produces a prediction. We applied SVM to each of the balanced datasets Dsmoi[S], so five results were obtained for each lesion. In this case a simple majority voting scheme is not the best option. For example, if a lesion is classified as melanoma twice and as benign three times, the risk of considering it benign is too high, and the doctors would recommend a biopsy to confirm or discard the diagnosis. A minimum percentage of agreement is required to take a decision and rule out the need for a biopsy. The Qualified Majority Voting (QMV) scheme adapts well to this idea. It classifies a lesion as benign only if the proportion of benign votes is above a threshold, in this case 70%. Otherwise, the lesion is classified as malignant.
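As a small illustration of the voting rule, the sketch below applies the 70% threshold to the five per-lesion predictions. The function name `qualified_majority_vote` and the string labels are assumptions chosen for readability.

```python
def qualified_majority_vote(votes, threshold=0.70):
    """Return "benign" only if the fraction of benign votes exceeds the
    threshold; otherwise return "melanoma"."""
    if not votes:
        raise ValueError("at least one vote is required")
    benign_fraction = votes.count("benign") / len(votes)
    return "benign" if benign_fraction > threshold else "melanoma"


# Three benign votes out of five (60%) are not enough to rule out melanoma.
print(qualified_majority_vote(["benign"] * 3 + ["melanoma"] * 2))  # melanoma
# Four out of five (80%) exceed the 70% threshold.
print(qualified_majority_vote(["benign"] * 4 + ["melanoma"] * 1))  # benign
```

With five classifiers, the 70% threshold therefore means that at least four of them must agree before a lesion is labeled benign.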

Fig. 2. Classification Model. Result for a sample x using bands in S.

4. Results and discussions

Multispectral images of 32 lesions were acquired over approximately one year at the Hospital Provincial de Castellón, Spain. The expert pathologists classified the benign lesions as compound melanocytic nevus, intradermal melanocytic nevus, intradermal compound papillomatous nevus, compound papillomatous nevus, dysplastic nevus, congenital nevus and blue nevus, while most of the melanomas were of the nodular type with Clark index IV; the Breslow and Ki67 indices were also provided. As already mentioned, the problem was addressed as a classification task to differentiate between melanoma and non-melanoma (benign) lesions. Of the 32 lesions, 7 were finally diagnosed as melanomas and the remaining 25 were labeled as benign. We consider the melanoma class as the positive class. In order to assess the classification quality, we use the sensitivity (SE) and specificity (SP) measures. The SE is the proportion of malignant cases correctly classified, and can be expressed as SE = TP/(TP + FN), where TP is the number of True Positives and FN is the number of False Negatives. Since detecting every melanoma is the main aim, SE should reach 100%. For the negative class, this proportion is called SP, which can be formulated as SP = TN/(FP + TN), where TN is the number of True Negatives and FP is the number of False Positives. The TN number should also be as high as possible, but the TP number is the main aim. Table 1 shows the classification results for different groups of bands as selected by SFFS. The best results, 100% SE and 72% SP, are obtained with the subsets of 9 to 19 bands selected by SFFS. For the case of 9 bands, the selected bands are {440, 460, 490, 510, 530, 550, 710, 780, 790}. They therefore span the visible and the first part of the near infrared wavelength range. Note also that SFFS achieves 100% sensitivity with subsets of between 2 and 19 bands.
When the 55 available bands are used, the SE is 71% and the SP is 76%. Since detecting all melanoma cases is the most important aim, using all the bands does not reach the main objective. For the case of nine bands, only two bands are in the near infrared part of the spectrum. It is worth mentioning that the infrared region is in fact necessary, and that the proposed method does not select individual bands but combinations of bands. Notice also that, when selecting just one band, the best result was obtained with 830nm (TP = 6, TN = 14, sensitivity = 86% and specificity = 56%). Moreover, single-band results are best in the [800, 830]nm range, with TP = 6 and TN = 13 for the bands in the [800, 820]nm interval.
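The reported percentages can be reproduced from the class sizes (7 melanomas, 25 benign lesions) with the SE and SP definitions given above. In this sketch the counts for the 830nm band are those stated in the text; the counts for the nine-band and 55-band cases are inferred from the reported percentages and should be read as assumptions. The helper names are hypothetical.

```python
N_MELANOMA, N_BENIGN = 7, 25


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


# (TP, TN) per configuration. Only the 830 nm counts are stated in the text;
# the other two are inferred from the reported SE and SP values.
cases = {
    "9 bands (SFFS)": (7, 18),
    "all 55 bands": (5, 19),
    "830 nm only": (6, 14),
}

for name, (tp, tn) in cases.items():
    se = sensitivity(tp, N_MELANOMA - tp)
    sp = specificity(tn, N_BENIGN - tn)
    print(f"{name}: SE = {round(100 * se)}%, SP = {round(100 * sp)}%")
# 9 bands (SFFS): SE = 100%, SP = 72%
# all 55 bands: SE = 71%, SP = 76%
# 830 nm only: SE = 86%, SP = 56%
```

With only 7 positive samples, each missed melanoma changes SE by about 14 percentage points, which is why the drop from 100% to 71% corresponds to just two lesions.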


Table 1. Results Obtained with Each Subset of Bands

5. Conclusion

This paper shows the viability of band selection on multispectral images of pigmented skin lesions in order to classify them as melanoma or benign skin lesions. Sequential Forward Floating Selection (SFFS) is used taking into account the characteristics of the problem: a two-class imbalanced dataset with few samples, where the class with the highest priority is the minority class. It is applied giving more importance to the positive class. SVM is used as the classification method. The dataset is balanced using the SMOTE technique before classifying the samples. The randomness of the SMOTE technique is dealt with by running it 5 times and using the QMV voting scheme. The results with the sets of fewer than 20 bands are competitive, i.e. all melanoma lesions are correctly identified. The best result is 100% SE and 72% SP, obtained when considering between 9 and 19 bands. The results show that band selection improves sensitivity when compared with the use of all the bands. It can therefore assist doctors in the diagnosis of melanoma lesions, reducing the amount of unnecessary surgery for biopsy tests and allowing patient screenings to be performed extensively, quickly and cheaply. We would like to stress that the main aim of this work was to analyse how discriminative the spectral information can be to differentiate between melanoma lesions and other types of lesions. Though the results are promising, in order to increase specificity, future research must include other features to characterize the ROIs, such as shape or texture features.

Acknowledgments

This work was supported by the Spanish Ministry of Science and Innovation under the projects Consolider Ingenio 2010 CSD2007-00018 and EODIX AYA2008-05965-C04-04/ESP, by the Project 09I079.01/1 from Fundación C.V. Hospital Provincial de Castellón, and by the Generalitat Valenciana through the project PROMETEO/2010/028.

References and links

1. “Cancer: Melanoma of skin,” Eur. Cancer Obs. (2011) http://eu-cancer.iarc.fr/cancer-11-melanoma-of-skin.html,en.

2. B. D'Alessandro and A. P. Dhawan, “Multispectral Transillumination Imaging of Skin Lesions for Oxygenated and Deoxygenated Hemoglobin Measurement,” in Proceedings of IEEE EMBS (2010), pp. 6637–6640.

3. S. Kumar, J. Ghosh, and M. Crawford, “Best-bases feature extraction algorithms for classification of hyperspectral data,” IEEE Trans. Geosci. Remote Sens. 39, 1368–1379 (2001).

4. I. Diebele, I. Kuzmina, A. Lihachev, J. Kapostinsh, A. Derjabo, L. Valeine, and J. Spigulis, “Clinical evaluation of melanomas and common nevi by spectral imaging,” Biomed. Opt. Express 3, 467–472 (2012).

5. S. V. Patwardhan, A. P. Dhawan, and P. A. Relue, “Wavelength Selection for Multi-Spectral Imaging of Skin Lesions Using Nevoscope,” in Proceedings of the IEEE 29th Annual Northeast Bioengineering Conference (2003), pp. 327–328.

6. A. P. Dhawan, B. D'Alessandro, S. Patwardhan, and N. Mullani, “Multispectral Optical Imaging of Skin-Lesions for Detection of Malignant Melanomas,” in Proceedings of IEEE EMBS (2009), pp. 5352–5255.

7. B. D'Alessandro and A. P. Dhawan, “Blood Oxygen Saturation Estimation in Transilluminated Images of Skin Lesions,” in Proceedings of the IEEE-EMBS on BHI (2012), pp. 729–732.

8. M. Elbaum, A. W. Kopf, H. S. Rabinovitz, R. G. B. Langley, H. Kamino, M. C. Mihm Jr., A. J. Sober, G. L. Peck, A. Bogdan, D. Gutkowicz-Krusin, M. Greenebaum, S. Keem, M. Oliviero, and S. Wang, “Automatic differentiation of melanoma from melanocytic nevi with multispectral digital dermoscopy: A feasibility study,” J. Am. Acad. Dermatol. 44, 207–218 (2001).

9. R. Marchesini, A. Bono, S. Tomatis, C. Bartoli, A. Colombo, M. Lualdi, and M. Carrara, “In vivo evaluation of melanoma thickness by multispectral imaging and an artificial neural network: A retrospective study on 250 cases of cutaneous melanoma,” Tumori 93, 170–177 (2007).

10. I. Quinzán, P. Latorre Carmona, P. García, E. Boldó, F. Pla, V. García, R. Lozoya, and G. Pérez de Lucía, “Non-Invasive Melanoma Diagnosis Using Multispectral Imaging,” in Proceedings of ICPRAM (2012), pp. 386–393.

11. SCHNEIDER Industrial optics: OEM. In http://www.schneiderkreuznach.com.

12. F. Maes, A. Collignon, D. Vandermeulen, G. Marchal, and P. Suetens, “Multimodality image registration by maximization of mutual information,” IEEE Trans. Med. Imaging 16(2), 187–198 (1997).

13. J. P. W. Pluim, J. B. A. Maintz, and M. A. Viergever, “Mutual-information-based registration of medical images: A survey,” IEEE Trans. Med. Imaging 22(8), 986–1004 (2003).

14. I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” J. Mach. Learn. Res. 3, 1157–1182 (2003).

15. T. M. Cover and J. A. Thomas, Elements of Information Theory (John Wiley and Sons, 1991).

16. P. Pudil, F. J. Ferri, J. Novovicova, and J. Kittler, “Floating search methods for feature selection with nonmonotonic criterion functions,” Proc. of the 12th Int. Conf. on Pat. Rec. (1994), Vol. 2, pp. 279–283.

17. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res. 16, 321–357 (2002).

18. C. Cortes and V. Vapnik, “Support-vector network,” Mach. Learn. 20, 273–297 (1995).
