Optica Publishing Group

Stokes shift spectroscopy and machine learning for label-free human prostate cancer detection

Open Access

Abstract

The Stokes shift spectra (S3) of human cancerous and normal prostate tissues were collected label free at a selected wavelength interval of 40 nm to investigate the efficacy of an approach based on three key molecules, namely tryptophan, collagen, and reduced nicotinamide adenine dinucleotide (NADH), as cancer biomarkers. S3 combines both fluorescence and absorption spectra in one scan. The S3 spectra were analyzed using machine learning (ML) algorithms, including principal component analysis (PCA), nonnegative matrix factorization (NMF), and support vector machines (SVMs). The components retrieved from the S3 spectra were considered principal biomarkers. The differences in the weights of the components between the two types of tissues were found to be significant. Sensitivity, specificity, and accuracy were calculated to evaluate the performance of SVM classification. This research demonstrates that S3 spectroscopy is effective for the label-free detection of changes in the relative concentrations of the endogenous fluorophores in tissues due to the development of cancer.

© 2023 Optica Publishing Group

“Optical biopsy” (OB) was first used as a new tool for cancer detection when Alfano’s group measured the difference in the emission spectra between malignant and nonmalignant human lung, liver, and prostate tissues in the early 1980s [1]. Salient spectral properties of tissues interacting with light offer new approaches to solve medical problems at the molecular level. It is widely acknowledged that OB is more sensitive than conventional diagnostic methods since the biochemical changes reflected in the spectra of the endogenous building block fluorophores appear earlier than the morphologic variation disclosed by the histological aberration. The key endogenous fluorophores within tissue that have been widely studied and reported as intrinsic cancer biomarkers include tryptophan, collagen, reduced nicotinamide adenine dinucleotide (NADH), flavins, porphyrins, and elastin [2]. Conventional spectroscopic methods, including absorption, emission, excitation, and elastic scattering spectroscopy [2,3], cannot adequately resolve the spectra of multiple biomolecules in tissues to obtain the spectral information on each component in a single scan. An efficient way to rapidly obtain the spectral information on complex mixtures of multiple fluorophores related to the malignancy within tissue is desired.

A method called Stokes shift spectroscopy (S3), which combines fluorescence and absorption spectra, was introduced by Alfano to address this issue [4]. S3 was developed to analyze multiple endogenous fluorophores (e.g., tryptophan, collagen, NADH, and flavins) label free in biological samples in a single scan. In this Letter, we develop the S3 method into a spectral tool to distinguish malignant from normal specimens of prostate tissues, aided by machine learning (ML) algorithms. This method utilizes principal component analysis (PCA) and nonnegative matrix factorization (NMF) for dimension reduction and feature extraction. Support vector machines (SVMs) are employed to distinguish the spectra of different types of tissues [5].

For fluorescent molecules, the peaks of the absorption and emission occur at different wavelengths. The difference between the emission and absorption peaks is known as the Stokes shift, which depends on the polarity of the host environment surrounding the emitting organic molecules. S3 spectra are acquired by recording the fluorescence signal label free while scanning the excitation and emission wavelengths synchronously with a selected constant wavelength shift (Δλc) between the two wavelengths. In our study, a fluorescence spectrometer (Perkin-Elmer LS-50) was used in the synchronous scan mode. The excitation light with a 5 nm spectral width was focused on samples with a spot size of ∼3 mm × 1 mm. The power of the incident light was ∼0.5 µW. The scan speed was 250 nm per minute. The fluorescence was collected with a spectral resolution of 0.5 nm. Fresh normal (n = 15) and cancerous (n = 15) human tissue samples were obtained from the Cooperative Human Tissue Network (CHTN) and the National Disease Research Interchange (NDRI). A pathology report stating the malignancy of each sample was provided with the sample. Spectral measurements were performed between 24 and 36 hours after surgery. The size of the samples ranged from ∼1 cm to multiple centimeters.
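The synchronous-scan principle described above can be illustrated numerically. The following is a minimal sketch, not the instrument's acquisition procedure: it assumes a hypothetical excitation-emission matrix for a single synthetic fluorophore (Gaussian absorption and emission bands with made-up centers and widths) and extracts the intensity along the line where the emission wavelength equals the excitation wavelength plus Δλc. The function name `synchronous_spectrum` and the choice to index the result by excitation wavelength are assumptions made for illustration.

```python
import numpy as np

def synchronous_spectrum(eem, ex_wl, em_wl, delta=40.0):
    """Slice an excitation-emission matrix along em = ex + delta.

    eem[i, j] is the intensity at excitation ex_wl[i] and emission em_wl[j].
    Returns the excitation wavelengths used and the intensity at each one.
    """
    ex_used, s3 = [], []
    for i, ex in enumerate(ex_wl):
        target = ex + delta
        if em_wl[0] <= target <= em_wl[-1]:
            # Interpolate along the emission axis to reach em = ex + delta.
            s3.append(np.interp(target, em_wl, eem[i]))
            ex_used.append(ex)
    return np.array(ex_used), np.array(s3)

# Synthetic fluorophore (assumed values): absorption near 280 nm, emission near 340 nm.
ex_wl = np.arange(250.0, 401.0, 1.0)
em_wl = np.arange(270.0, 501.0, 1.0)
absorb = np.exp(-0.5 * ((ex_wl - 280.0) / 15.0) ** 2)
emit = np.exp(-0.5 * ((em_wl - 340.0) / 25.0) ** 2)
eem = np.outer(absorb, emit)  # separable model: absorption x emission

ex_used, s3 = synchronous_spectrum(eem, ex_wl, em_wl, delta=40.0)
peak_ex = ex_used[np.argmax(s3)]
print(f"Synchronous peak at excitation {peak_ex:.0f} nm")
```

In this toy model the synchronous signal is the product of the absorption band and the shifted emission band, which is why a single scan carries information from both spectra.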

The S3 spectra of the prostate tissues were recorded with Δλc = 40 nm, based on our previous knowledge of the Stokes shift interval wavelengths of tryptophan and collagen [6], as shown in Fig. 1. Each spectrum was normalized to the sum of the squares of the intensity values of that spectrum. The main peaks of both cancerous and normal tissues were found at ∼295 nm and ∼340 nm. The main difference between the profiles of the two types of tissues is that the peak intensity I295 (corresponding to tryptophan) in cancerous tissue is higher than that in normal tissue, while the intensity I340 (corresponding to collagen) in cancerous tissue is lower than that in normal tissue. This observation is the same as for the emission spectra collected with selective excitation wavelengths of 300 nm and 340 nm [6,7], which indicate increased tryptophan and decreased collagen in cancerous tissues in comparison with normal tissues.

Fig. 1. Average S3 spectra of cancerous (dash-dot) and normal (solid) prostate tissues acquired with the selective wavelength shift constant Δλc = 40 nm with standard deviation error boundaries.

To evaluate the discrimination efficacy of these key endogenous fluorophores as biomarkers for cancer detection, machine learning algorithms including PCA and NMF were applied to the S3 spectra of the cancerous and normal prostate tissues to reduce dimensionality and extract features from the spectra of the samples. PCA is a matrix decomposition technique that finds the uncorrelated orthogonal components, the principal components (PCs), which account for the largest variances in the data. NMF is another method for signal decomposition [8]. Unlike PCA, NMF imposes only nonnegativity constraints and is thus particularly well suited to retrieving the individual basis spectra for the key components [8]. Using this nonnegative technique on an inherently positive spectral dataset has been shown to allow the extraction of intrinsic fluorescence spectra since the S3 spectra and concentrations of constituents are positive values. The principles of PCA and NMF are described in detail elsewhere in the literature [5,8]. Once the S3 spectral data were unmixed by PCA or NMF, the weights (scores) for the components were considered as features of the samples. The key components with the most discriminative information were identified and used for classification. SVM classifiers were trained on these features, which indicate the biomolecular alterations reflected in the S3 spectra of cancerous and normal prostate tissues. The goal of the SVM is to find, for p-dimensional vectors, a (p − 1)-dimensional hyperplane such that the distance between the support vectors and the hyperplane is maximized [5,8]. To evaluate SVM models, leave-one-out cross validation (LOOCV) [5] was used to avoid bias in the classification and provide a more robust evaluation of the models.
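The decomposition, classification, and LOOCV steps described above can be sketched with scikit-learn. This is an illustrative sketch on synthetic spectra, not the analysis code used in the study: the two Gaussian bands, the group weights, the noise level, and the linear kernel are all assumptions. PCA is refitted inside each fold here so that the held-out spectrum does not influence the decomposition; whether the study performed the decomposition inside or outside the cross-validation loop is not stated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
wl = np.arange(270.0, 421.0, 1.0)
# Two synthetic bands standing in for the ~295 nm and ~340 nm features.
band_295 = np.exp(-0.5 * ((wl - 295.0) / 10.0) ** 2)
band_340 = np.exp(-0.5 * ((wl - 340.0) / 15.0) ** 2)

def make_group(n, w295, w340):
    """Generate n noisy spectra with mean band weights w295 and w340."""
    weights = np.column_stack([rng.normal(w295, 0.1, n), rng.normal(w340, 0.1, n)])
    spectra = weights @ np.vstack([band_295, band_340])
    return spectra + rng.normal(0.0, 0.01, spectra.shape)

# 15 "cancerous" (label 1) and 15 "normal" (label 0) synthetic spectra.
X = np.vstack([make_group(15, 1.0, 0.5), make_group(15, 0.6, 0.9)])
y = np.array([1] * 15 + [0] * 15)

predictions = np.empty_like(y)
for train_idx, test_idx in LeaveOneOut().split(X):
    # Refit PCA and the SVM on the 29 training spectra of each fold.
    model = make_pipeline(PCA(n_components=2), SVC(kernel="linear"))
    model.fit(X[train_idx], y[train_idx])
    predictions[test_idx] = model.predict(X[test_idx])

loocv_accuracy = np.mean(predictions == y)
print(f"LOOCV accuracy on synthetic data: {loocv_accuracy:.2f}")
```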

In a diagnostic test, the outcome of a data point may be positive (disease) or negative (healthy), which can be either true or false. Statistical measures including sensitivity, specificity, and accuracy were used to evaluate the performance of classification and were calculated as follows:

$$\begin{array}{l} \textrm{Sensitivity} = \textrm{TP}/(\textrm{TP} + \textrm{FN})\\ \textrm{Specificity} = \textrm{TN}/(\textrm{FP} + \textrm{TN})\\ \textrm{Accuracy} = (\textrm{TP} + \textrm{TN})/(\textrm{TP} + \textrm{FP} + \textrm{TN} + \textrm{FN}), \end{array}$$
where TP, FP, FN, and TN are the numbers of true positive, false positive, false negative, and true negative samples, respectively. The performance of a two-group (binary) classification is also commonly evaluated using a receiver operating characteristic (ROC) curve. A ROC curve is a graphical plot of sensitivity versus (1 − specificity). The classification performance is measured by the area under the ROC curve (AUROC).
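As a worked illustration of these definitions, the sketch below computes the three measures from predicted labels and evaluates the AUROC as the fraction of positive-negative pairs in which the positive sample receives the higher score (ties counted as one half), which is equivalent to the area under the ROC curve. The labels and scores are made-up toy values, and the function names are hypothetical.

```python
import numpy as np

def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels (1 = disease)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

def auroc(y_true, scores):
    """AUROC as the probability that a positive outscores a negative."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]  # all positive-negative pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

# Toy example: 4 diseased and 4 healthy samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6]

m = diagnostic_metrics(y_true, y_pred)
area = auroc(y_true, scores)
print(m, area)
```

For these toy values TP = TN = 3 and FP = FN = 1, so all three measures equal 0.75, and 15 of the 16 positive-negative pairs are ranked correctly.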

It is widely recognized that a tissue contains a number of fluorophores, including the biomarkers of interest. An S3 spectrum of a tissue sample is a linear combination of the spectra of all components. The S3 spectrum of each of the K fluorophores can be expressed as an N-dimensional vector ck, (k = 1, …, K), where N is the number of wavelengths in the spectrum and the ck’s are assumed to be linearly independent. The S3 spectrum of a tissue can be written as

$$\boldsymbol{s} = \sum_{k = 1}^{K} a_k \boldsymbol{c}_k + \boldsymbol{n} = \boldsymbol{C}\boldsymbol{a} + \boldsymbol{n},$$
where C = (c1, c2, …, cK) is the matrix whose columns are the spectra of the fluorophores, a = (a1, a2, …, aK)ᵀ, ak is a constant proportional to the concentration of the kth fluorophore in the tissue, and n is the noise. The analysis of the S3 spectra of cancerous and normal prostate tissues using ML algorithms in our study has two purposes: (1) to understand and extract the changes in relative concentrations of principal biochemical components in the prostate tissues and (2) to create a criterion that can be used to distinguish cancerous from normal prostate tissues.
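The linear mixing model can be made concrete with a small numerical sketch. Here the component spectra are assumed to be known synthetic Gaussians (centers, widths, weights, and noise level are all made-up values), and the weights are recovered by ordinary least squares. In the study itself the component spectra are not known in advance, which is why PCA and NMF are used to estimate the components and weights jointly; this sketch only illustrates the forward model and its inversion when C is given.

```python
import numpy as np

rng = np.random.default_rng(1)
wl = np.arange(270.0, 421.0, 1.0)  # N wavelengths
centers = [295.0, 340.0, 385.0]    # assumed band centers (nm)
widths = [10.0, 15.0, 20.0]        # assumed band widths (nm)

# Columns of C are the K = 3 component spectra c_k.
C = np.column_stack(
    [np.exp(-0.5 * ((wl - c) / w) ** 2) for c, w in zip(centers, widths)]
)

a_true = np.array([1.0, 0.6, 0.2])       # weights proportional to concentration
noise = rng.normal(0.0, 0.005, wl.size)  # additive noise n
s = C @ a_true + noise                   # s = C a + n

# Least-squares estimate of the weights given the component spectra.
a_hat, *_ = np.linalg.lstsq(C, s, rcond=None)
print(np.round(a_hat, 3))
```

Because the columns of C are linearly independent, the least-squares solution is unique, which mirrors the independence assumption made for the fluorophore spectra.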

To achieve our first objective, the three leading PCs obtained by PCA and the first three nonnegative components (NCs) extracted by NMF are shown in Figs. 2(a) and 2(b), respectively. To better understand the spectral changes caused by biochemical changes due to tumorigenesis, the S3 spectra of the key biomarkers tryptophan, collagen, and NADH were also measured individually with Δλc = 40 nm. The results are shown in Fig. 2(b), where the extracted NCs for comparison are overlaid as solid, dashed, and dash-dotted black lines, respectively. The three chemicals are measured in aqueous solutions at a concentration of ∼0.4 mg/mL.

Fig. 2. Three leading (a) PCs from PCA and (b) NCs from NMF overlaid on the spectra of the corresponding fluorophores.

Since the PC spectra are linear combinations of the S3 spectra of the underlying key biomolecules, the three leading PCs carry negative values, as shown in Fig. 2(a), and cannot be interpreted as any individual molecular spectrum. Similarly, the PC scores are linear combinations of the relative concentrations of the individual endogenous fluorophores [ak in Eq. (2)]. Therefore, it is challenging to use PCs to provide a biochemical interpretation of the spectral measurements based on the biomarker changes due to carcinogenesis. The salient feature of Fig. 2(b) is that the NCs reveal the biomarker spectra and directly disclose the biomarker changes between cancerous and normal tissues. The comparison between the extracted leading NCs for the tissue samples and the S3 spectra of endogenous fluorophores shows good agreement between the first two NCs and tryptophan and collagen, respectively. The third NC has mixed peaks, which may be partly due to its low signal level. Besides the double peaks around the tryptophan peak, it shows a peak at about 385 nm corresponding to NADH [2].

To quantify the relationship between the biomarkers and the corresponding NCs, the correlation coefficients between the S3 spectra of the biomarkers and the corresponding NCs were calculated and are given in Table 1.

Table 1. Correlation Coefficients between the S3 Spectra of Biomarkers and the Corresponding NCs

It can be seen from Table 1 that the correlation coefficients between the tryptophan spectrum and the first NC and between the collagen spectrum and the second NC are 0.998 and 0.981, respectively. This high correlation indicates that the two NCs can be predominantly attributed to these molecules. The correlation coefficient between NADH and the third NC is only 0.082 if the whole profiles are considered. This is expected since NC3 includes multiple peaks, including those that may be attributed to tryptophan. If only the signal after 350 nm is considered, the correlation coefficient between NC3 and NADH is 0.679, indicating a moderate linear relationship between the spectral features of NADH and the third NC. In summary, Fig. 2(b) and Table 1 demonstrate that the three extracted NCs are similar to the S3 spectra of the key endogenous fluorophores tryptophan, collagen, and NADH, and are considered to be attributable to these molecules, which account for the major spectroscopic feature changes in the S3 spectra. Therefore, the NCs can be used to directly provide the biochemical basis for the diagnosis of prostate cancer using S3 spectra.
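The effect of restricting the correlation to a spectral window can be reproduced on synthetic curves. In the sketch below, a hypothetical reference band near 385 nm is compared with a component that contains the same band plus a stronger, unrelated peak near 295 nm; all shapes and amplitudes are assumptions and the resulting values are not the ones in Table 1. The Pearson coefficient is computed over the full range and over wavelengths above 350 nm only.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally sampled spectra."""
    return float(np.corrcoef(x, y)[0, 1])

wl = np.arange(270.0, 421.0, 1.0)
# Stand-in for a pure-fluorophore spectrum with a single band.
reference = np.exp(-0.5 * ((wl - 385.0) / 15.0) ** 2)
# Stand-in for a mixed component: the same band plus a stronger second peak.
component = reference + 2.0 * np.exp(-0.5 * ((wl - 295.0) / 10.0) ** 2)

r_full = pearson_r(reference, component)

window = wl > 350.0  # keep only the region beyond the second peak
r_window = pearson_r(reference[window], component[window])

print(f"full range: {r_full:.3f}, above 350 nm: {r_window:.3f}")
```

The extra peak dominates the variance of the mixed component over the full range and lowers the coefficient, while inside the window the two curves are nearly identical.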

Figure 3(a) shows a scatterplot of the weights of the two leading PCs retrieved by PCA from S3 spectra of cancerous (triangles) and normal (circles) tissues, along with the trained SVM classifier. It clearly shows that these two leading PCs separate the healthy and diseased tissues well. The corresponding ROC curve with LOOCV is shown in Fig. 3(b). The LOOCV sensitivity, specificity, and accuracy and the AUROC were calculated and are summarized in Table 2. The AUROC value calculated from the ROC curve shown in Fig. 3(b) is 1, indicating that the PCA-SVM analysis of S3 spectra is highly effective in distinguishing cancerous prostate tissues from their normal counterparts.

Fig. 3. (a) Scatterplot of the scores for the two leading PCs along with the SVM classifier (solid line) and (b) the corresponding ROC curve for the classification with LOOCV.

Table 2. Performance of Diagnosis Using S3 Spectroscopy and PCA-SVM for Cancer Detection

To investigate the changes of the weights (relative concentrations) of the NCs (biomarkers) in tissue from the S3 spectra, scatterplots for the weights of NC1 (tryptophan) versus NC2 (collagen), NC1 (tryptophan) versus NC3 (NADH), and NC2 (collagen) versus NC3 (NADH) for cancerous (triangles) and normal (circles) prostate tissues are shown in Figs. 4(a)–4(c), respectively. The most salient feature of Fig. 4(a) is that most of the data points for the normal tissues are located in the upper-left of the data region, in contrast to the cancerous tissues, indicating that the relative concentrations of collagen in normal tissues are higher in comparison with the cancerous tissues, while the opposite holds for tryptophan. Figure 4(b) provides evidence of increases in the relative concentrations of tryptophan and NADH in cancerous prostate tissues. Figure 4(c) shows again that collagen is lower while NADH is higher in cancerous tissues compared with normal tissues. In summary, Figs. 4(a)–4(c) show that the relative concentration of collagen in cancerous prostate tissue is lower than that in normal prostate tissue, but the relative concentrations of tryptophan and NADH are higher in cancerous prostate tissues compared with those in normal prostate tissues. This observation is in good agreement with other studies in the literature [8,9]. Using Student’s t-test, it was shown that the NC scores are statistically significantly different between normal and cancerous prostate tissues, with p values of 3.669 × 10⁻⁴, 1.552 × 10⁻⁷, and 2.687 × 10⁻⁴ for NC1, NC2, and NC3, respectively.
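A two-sample comparison of component scores of this kind can be sketched with SciPy. The scores below are randomly generated stand-ins (means, spread, and seed are assumptions), so the resulting p value is unrelated to the values reported above. The sketch uses the independent two-sample Student's t-test with pooled variance, which is the default of `scipy.stats.ttest_ind`; the exact variant used in the study is not specified.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Synthetic component scores for 15 cancerous and 15 normal samples.
scores_cancer = rng.normal(1.0, 0.15, 15)
scores_normal = rng.normal(0.6, 0.15, 15)

# Independent two-sample Student's t-test (equal variances assumed by default).
t_stat, p_value = stats.ttest_ind(scores_cancer, scores_normal)
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```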

Fig. 4. Scatterplots of the three leading NC scores for the cases of (a) NC1 versus NC2; (b) NC1 versus NC3; and (c) NC2 versus NC3, along with SVM classifiers (solid lines), and the corresponding LOOCV ROC curves for (d) NC1 versus NC2; (e) NC1 versus NC3; and (f) NC2 versus NC3.

The classification of positive and negative groups in the study was determined using SVMs. To evaluate the NMF-SVM models for the classification of S3 spectra of prostate tissue, the sensitivity, specificity, and accuracy with LOOCV were calculated and are summarized in Table 3. The performance of the two-group classification was also evaluated using ROC curves. The ROC curves shown in Figs. 4(d)–4(f) were used to further evaluate the performance of the classification of cancerous and normal prostate tissues with LOOCV.

Table 3. Performance of Diagnosis Using S3 Spectroscopy and NMF-SVM for Cancer Detection

Among the key fluorophores, tryptophan is an essential amino acid that is transported into cancer cells via the large amino acid transporter system (LAT1/CD98) and is degraded to kynurenine in the cells by the enzyme indoleamine-2,3-dioxygenase (IDO) [10]. Numerous studies have pointed out that the tryptophan consumption by cancer is involved in suppressing the immune response to cancer cells [10]. It is known that aggressive cancer cells have a large amount of amino acid transporters on the cell membrane, which can more efficiently take up tryptophan from the surrounding environment [10,11]. Collagen is the main component in the extracellular matrix (ECM). It has been observed that the concentration of collagen is decreased in cancerous compared to normal prostate tissues [12,13]. In normal prostate tissue, the collagen network is dense, with a larger number of fibers, while the collagen fiber in prostate cancer is nonuniform, with loss and disintegration. NADH is a coenzyme involved in the oxidation of fuel molecules and can be used to probe changes in cellular metabolism. Chance et al. exploited this phenomenon and showed that direct monitoring of NADH fluorescence dynamically interprets the metabolic activity within cells [11]. An intuitive application of fluorescence spectroscopy techniques is to study carcinogenesis at a variety of organ sites (such as the prostate) that are known to have increased metabolic rates. Therefore, directly monitoring the key endogenous fluorophores tryptophan, collagen, and NADH label free in tissues using S3 spectroscopy aided by machine learning methods could be a novel spectroscopic tool for prostate cancer detection.

In summary, machine learning analysis of the S3 spectra of human prostate tissues can reveal the differences in spectral features between cancerous and normal samples. The spectral components extracted using PCA and NMF with dimension reduction were used to create criteria for prostate cancer detection. SVM classifiers were trained to separate the two types of tissues. Our study indicates that S3 spectroscopy with decomposition methods is a promising technique to provide characteristic information for the classification of diseased and healthy specimens while significantly reducing sample dimensionality and keeping the spectral strength. Furthermore, NMF provides a more insightful understanding of the changes in relative concentrations of principal biomarkers due to tumorigenesis, while PCA results are more difficult to interpret. This research shows that S3 spectra may be used as “fingerprints” for label-free detection of cancers in different organs such as the prostate, breast, and brain. In this study, the signal from NADH is relatively low since Δλc = 40 nm is not its optimal wavelength shift, which leads to cross talk between NADH and tryptophan in NC3. Multi-channel detection could help optimize the signal from multiple fluorophores simultaneously and further improve the efficacy of S3. S3 spectroscopy may be combined with other rapid imaging and analysis techniques such as spatial frequency spectral analysis (SFSA) [14] and polarization analysis using a Mueller matrix to analyze and retrieve the morphological information and further improve the diagnostic performance [15,16].

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

REFERENCES

1. R. R. Alfano, D. Tata, J. Cordero, P. Tomashefsky, F. Longo, and M. Alfano, IEEE J. Quantum Electron. 20, 1507 (1984).

2. R. Richards-Kortum and E. Sevick-Muraca, Annu. Rev. Phys. Chem. 47, 555 (1996).

3. N. Ramanujam, in Encyclopedia of Analytical Chemistry, R. A. Meyers, ed. (John Wiley & Sons Ltd., Chichester, 2000), pp. 20–56.

4. R. R. Alfano and Y. Yang, IEEE J. Quantum Electron. 9, 148 (2003).

5. Y. Zhou, C.-H. Liu, B. Wu, X. Yu, G. Cheng, K. Zhu, K. Wang, C. Zhang, M. Zhao, R. Zong, L. Zhang, L. Shi, and R. R. Alfano, J. Biomed. Opt. 24, 1 (2019).

6. Y. Pu, W. B. Wang, G. C. Tang, and R. R. Alfano, J. Biomed. Opt. 15, 047008 (2010).

7. Y. Pu, L. A. Sordillo, Y. Yang, and R. R. Alfano, Opt. Lett. 39, 6787 (2014).

8. J. Xue, Y. Pu, J. Smith, X. Gao, C. Wang, and B. Wu, Sci. Rep. 11, 2282 (2021).

9. D. B. Shennan, J. Thomson, M. C. Barber, and M. T. Travers, Biochim. Biophys. Acta, Biomembr. 1611, 81 (2003).

10. H. Betsunoh, T. Fukuda, N. Anzai, D. Nishihara, T. Mizuno, H. Yuki, A. Masuda, Y. Yamaguchi, H. Abe, M. Yashi, Y. Fukabori, K. I. Yoshida, and T. Kamai, BMC Cancer 13, 509 (2013).

11. B. Chance, J. Williamson, D. Jamieson, and B. Schoener, Biochem. Z. 341, 357 (1965).

12. C. Morrison, J. Thornhill, and E. Gaffney, Urol. Res. 28, 304 (2000).

13. D. F. Gleason and G. T. Mellinger, J. Urol. (N. Y., NY, U. S.) 111, 58 (1974).

14. Y. Pu, J. Jagtap, A. Pradhan, and R. R. Alfano, J. Biophotonics 8, 233 (2015).

15. D. Ivanov, V. Dremin, E. Borisova, A. Bykov, T. Novikova, I. Meglinski, and R. Ossikovski, Biomed. Opt. Express 12, 4560 (2021).

16. V. A. Ushenko, B. T. Hogan, A. Dubolazov, G. Piavchenko, S. L. Kuznetsov, A. G. Ushenko, Y. O. Ushenko, M. Gorsky, A. Bykov, and I. Meglinski, Sci. Rep. 11, 5162 (2021).
