Optica Publishing Group

Brain tumor grading diagnosis using transfer learning based on optical coherence tomography

Open Access

Abstract

In neurosurgery, accurately identifying brain tumor tissue is vital for reducing recurrence. Current imaging techniques have limitations, prompting the exploration of alternative methods. This study validated a binary hierarchical classification of brain tissues: normal tissue, primary central nervous system lymphoma (PCNSL), high-grade glioma (HGG), and low-grade glioma (LGG) using transfer learning. Tumor specimens were measured with optical coherence tomography (OCT), and a MobileNetV2 pre-trained model was employed for classification. Surgeons could optimize predictions based on experience. The model showed robust classification and promising clinical value. A dynamic t-SNE visualized its performance, offering a new approach to neurosurgical decision-making regarding brain tumors.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

According to the Central Brain Tumor Registry of the United States (CBTRUS) statistics [1], gliomas, which originate from glial cells, are the most prevalent malignant brain tumors, with astrocytoma standing out among them. Their defining characteristic is invasive growth into the surrounding white matter of the brain, which makes the tumor border difficult to distinguish from normal brain tissue. Based on growth rate and invasiveness, gliomas are categorized into low-grade glioma (LGG) and high-grade glioma (HGG). LGG encompasses pilocytic astrocytoma (grade 1) and astrocytoma with IDH mutation (grade 2), while HGG includes astrocytoma with IDH mutation (grades 3, 4) and glioblastoma with IDH-wildtype (GBM, grade 4). Patients with HGG have an average life expectancy of approximately ten months [2]. For LGG, the reported lifespan varies across studies, spanning from 61.1 to 90 months [3]. About 45% of LGG evolve into the malignant variant (HGG) within five years [4,5]. Complete tumor resection represents a pivotal phase in the therapeutic process, facilitating the safe removal of a substantial tumor entity, mitigating neurological deficits, and establishing a precise tumor phenotype for subsequent treatment strategies. Statistics indicate a direct correlation between the extent of tumor resection and patient life expectancy [6–9], particularly for less malignant astrocytomas [10]. Retrospective studies have emphasized potential issues: the confusion of primary central nervous system lymphoma (PCNSL) with glioma and inaccurate glioma grade classification, resulting in incorrect treatment decisions [11–15]. Treatment strategies for PCNSL and glioma are different, and within glioma, HGG and LGG follow distinct plans. Therefore, accurate clinical classification is crucial to minimize the risk of recurrence [16–22].

The general diagnosis of brain tumors relies on surgery. A stereotactic neuronavigation system guides the surgeon intraoperatively to remove the entire tumor without harming other brain tissue. After resection, pathologists prepare paraffin sections (PS) to make the definitive tissue diagnosis. While this method preserves tissue type and cellular characteristics, its significant drawback is that the process takes several days. Since PS cannot provide tissue information intraoperatively, the frozen section (FS) was developed, providing surgical strategy insights in approximately 30 minutes. However, the fast freezing can distort cellular structures, creating artifacts on the hematoxylin/eosin (H&E) staining image and making its accuracy inferior to PS. Hence, on-site determination of brain tumors is essential during surgery. Optical coherence tomography (OCT) is a real-time, non-invasive imaging technology widely applied in medicine. It provides micrometer-level cross-sectional image resolution and an appropriate millimeter-scale penetration depth, filling a vacancy between magnetic resonance imaging (MRI) and fluorescence microscopy. Because it offers image resolution comparable to pathological results, this approach is sometimes called “optical biopsy.” OCT eliminates the need for additional contrast agents, mitigating potential side effects and streamlining image acquisition. On-site OCT scanning of fresh ex-vivo brain tissue using a movable cart can potentially provide alternative histological information for clinicians. Therefore, OCT emerges as a secure option for facilitating intraoperative diagnoses in neurosurgery [23].

In recent years, there has been a significant surge of interest in deep learning. However, achieving rapid and effective convergence requires considerable computational power due to the immense computational complexity involved. Given these challenges, transfer learning is an approach for addressing problems across disparate but interconnected tasks by leveraging existing knowledge. Much research has been committed to applying transfer learning to brain MRI [24–32], and transfer learning for MRI brain image classification has thrived. However, there is a lack of relevant research on OCT imaging.

Real-time qualitative and quantitative cues offer clinicians valuable intraoperative information for brain tumor differentiation. Currently, the time-consuming nature of FS leaves room for improvement. Thus, rapid on-site diagnosis via OCT technology provides an alternative tool to overcome the limitations of FS-based diagnosis. This study represents a pioneering effort in combining OCT technology with transfer learning to classify high- and low-grade gliomas. Furthermore, the hierarchical binary classification aligns with clinicians' immediate needs. We aim to cultivate a robust OCT system tailored for intraoperative brain tumor assessment.

2. Materials and methods

In this study, we conducted our experiment on ex vivo specimens. Both glioma and PCNSL samples were excised during routine surgical operations. Unfortunately, there are no published OCT datasets of normal brain tissue. Given the unavailability of normal brain tissue from surgical routines, we selected the porcine brain as a surrogate for normal human brain tissue, as the porcine brain resembles the human brain in histological characteristics and is suitable for preliminary clinical trials [33,34]. Following OCT measurements of the specimens, we employed the pre-trained MobileNetV2 model for subsequent deep learning. This design ultimately yielded predictions for three probabilities. Model performance was evaluated using a confusion matrix and dynamic t-distributed stochastic neighbor embedding (t-SNE) scatter plots [35]. Experimental details are described in the following sections.

The utilized OCT system remained consistent with the one previously published [36]. It was designed using a single-mode fiber-based balanced Mach-Zehnder interferometer configuration. The high-speed swept-source laser (HSL-20-50, Santec Corp.) had a center wavelength of 1.31 µm with a full width at half maximum (FWHM) of 100 nm, reaching a theoretical axial resolution of 8 µm in air, and provided an A-line scanning rate of 50 kHz. The A-trigger was supplied by an amplified photodetector (APD) (PDA05CF2, Thorlabs) detecting the reflection from the fiber Bragg grating (FBG) (FBGSMF-1266-80-0.2-A-(2)60F/E, L = 1 M, Tatsuta Electric Wire & Cable Co., Ltd.) at the wavelength of 1266.0 nm. The k-linearity calibration was performed using the built-in k-trigger of the laser.

Figure 1 shows the schematic diagram of the current swept-source OCT (SS-OCT) system. The input light was split by the C1 coupler into a minor portion and a major portion with a ratio of 1:99. The minor part was directed towards the FBG, where 80% of the incoming beam was reflected at a wavelength of 1266.0 nm; the APD captured the reflected beam as the A-trigger signal. The major part passed the C2 coupler and entered the interferometer's reference and sample arms through two polarization-insensitive optical circulators (PICIR-1214-12-L-05-NE, OF-Link Communications Co., Ltd.), Cir1 and Cir2, with a ratio of 20:80. In the reference arm, the light was collimated by a fiber collimator (FC1) (F260APC-C, Thorlabs) and an achromatic lens (L1) (AC254-030-C-ML, Thorlabs) and reflected by a gold-coated mirror. In the sample arm, galvanometers (G1, G2) (GVS012, Thorlabs) and a fiber collimator and lens (FC2, L2) were added to the optical path before the samples. The system had an approximate lateral resolution of 18 µm in air. The interference signal was formed in the C3 coupler by transmitting the beams from the two arms via the circulators and was detected by a balanced photodetector (BPD) (PDB480-AC, Thorlabs) to acquire less contaminated signals. The system sensitivity achieved was 91.58 dB. Before entering the waveform digitizer (ATS9350, Alazar Technologies), the electric signals were filtered by a high-pass filter (HPF) (ZFHP-0R23-S+, Mini-Circuits International) and a low-pass filter (LPF) (BLP-90+, Mini-Circuits International), yielding a designated frequency band of 0.23 to 81 MHz. Finally, the interference signals were sampled linearly in k-space.


Fig. 1. System diagram of the swept-source OCT (SS-OCT) system. Black curves, dotted curves, and spaced yellow regions represent the optical fiber path, electrical wires, and air-space beam transmission, respectively. HSL, high-speed laser; C1, 1:99 coupler; C2, 20:80 coupler; C3, 50:50 coupler; Cir1, Cir2, circulator; L1, L2, lens; APD, amplified photodetector; BPD, balanced photodetector; FBG, fiber Bragg grating; FC, fiber collimator; FG, function generator; G1, G2, Galvano scanner; GC, Galvano controller; HPF, high-pass filter; LPF, low-pass filter; M, mirror; S, sample; PC, personal computer.


Self-developed LabVIEW programs (LabVIEW 2017, National Instruments) controlled all system functions. The function generator (FG) governed the two galvanometers, ensuring synchronization with the waveform digitizer for two-dimensional scanning. The scanning area covered 5 mm (width x) × 5 mm (width y), encompassing a C-scan comprising 1000 × 1000 A-scans. Each sample was scanned volumetrically in multiple directions to increase the amount of data, and the scanning area was adjusted for each OCT volume measurement to minimize the structural similarity among OCT volumes from the same specimen. In addition, frames exhibiting strong reflections were manually excluded from model training, as they degraded image quality. This study aimed to confirm the feasibility of the proposed algorithm.

The Department of Neurosurgery at Taipei Veterans General Hospital in Taiwan recruited participants for this study, and all subjects provided written informed consent. Patients between the ages of 20 and 65 who required resection surgery were eligible for inclusion, while those with metastatic brain tumors or who had undergone chemotherapy or radiotherapy were excluded. Tumor specimens were obtained during routine surgical procedures and, along with the porcine samples, preserved in formalin solution. We conducted OCT scanning ex vivo after one day in formalin. The size of each specimen was at least 5 mm × 5 mm × 5 mm (at most 10 mm × 10 mm × 10 mm). Each scan covered a physical size of 5 mm (B-scan, width x) × 5 mm (C-scan, width y) × 5 mm (depth, z). The study received ethical approval from the Institutional Review Boards (IRB) of both Taipei Veterans General Hospital (2019-07-022CC) and National Chiao Tung University (NCTU-REC-108-066E).

Figure 2 depicts the image preprocessing before deep learning. All images were captured from different angles, even for the same specimen. All data processing was done in Python v3.6 with CUDA GPU acceleration on a personal computer equipped with 16.0 GB RAM, an Intel Core i5-7500 CPU operating at 3.40 GHz, and an NVIDIA GeForce GTX1660 GPU. Processing started from the captured interference signal with background subtraction and k-linearity calibration in the frequency domain, followed by a fast Fourier transform into the spatial domain. After that, speckle reduction and artifact removal generated the OCT images. Despeckling involved averaging seven adjacent B-scans after translational registration. We resized and normalized the despeckled images to 128 pixels (depth) × 256 pixels (width) (a physical range of 2.5 mm × 5.0 mm) before training to achieve efficient training of the neural network. Data augmentation was implemented through random combinations of rotation, translation, horizontal flip, and zooming. To preserve the morphological features in the OCT images, the translation and zooming effects were limited to a ratio of 0.1. As such, strict control was exercised over image quality to eliminate potential variables that could influence the results.
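The reconstruction and despeckling steps above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the fringes are already sampled linearly in k, the translational registration step is omitted, and the array sizes and function names are illustrative rather than taken from our released code.

```python
import numpy as np

def reconstruct_bscan(fringes, n_depth=128):
    """Reconstruct a B-scan from raw spectral fringes.

    fringes: (n_alines, n_samples) k-linear interference signal.
    Returns a log-scaled intensity image of shape (n_depth, n_alines).
    """
    # Background subtraction: remove the mean spectrum (DC / fixed-pattern noise).
    fringes = fringes - fringes.mean(axis=0, keepdims=True)
    # FFT along k yields the depth profile of each A-line; keep n_depth pixels.
    depth = np.abs(np.fft.rfft(fringes, axis=1))[:, :n_depth]
    # Log compression for display, transposed to (depth, width).
    return 20.0 * np.log10(depth + 1e-12).T

def despeckle(bscans, window=7):
    """Average `window` adjacent B-scans (assumed registered) to reduce speckle."""
    return np.stack([bscans[i:i + window].mean(axis=0)
                     for i in range(len(bscans) - window + 1)])
```

In practice the registered frames would come from the dense C-scan, so averaging seven neighbors trades a small loss of y-resolution for a substantial speckle reduction.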


Fig. 2. Image processing flowchart. Multiple scans of the same specimens were acquired at different sections. The obtained images were preprocessed by speckle reduction, invalid image removal, resizing, and normalization. Finally, data augmentation was employed during the model training process.


The model was grafted from MobileNetV2 pre-trained on the ImageNet dataset. We removed the last layer of the MobileNetV2 model and then added three new layers, as shown in Fig. 3(a), to learn the relation between the features extracted by MobileNetV2 and the desired output labels. During training, a batch size of 32 images was employed, and the Adam optimizer was selected with a learning rate of 0.00001. Validation accuracy served as the performance criterion, and training stopped when the validation accuracy ceased to increase for 20 consecutive epochs. It is worth noting that the output layer used the sigmoid activation function to enable adjustable thresholds for individual classes. Figure 3(b) depicts our self-built binary hierarchical classification flowchart. Predictions were completed by sequentially deciding whether the tissue was normal, PCNSL, and HGG, with the decision order arranged by the pathological closeness of the labels. This technique is appropriate when a patient's prior probability is available from the surgeon's experience, as the user can adjust the thresholds accordingly to optimize the confidence of the prediction outcomes.
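The stepwise flow of Fig. 3(b) reduces to three sequential binary tests. A minimal sketch follows; the function and variable names are ours, not from any released code:

```python
def hierarchical_predict(p_nor, p_lym, p_hgg,
                         t1=0.5, t2=0.5, t3=0.5):
    """Binary hierarchical classification of one OCT image.

    p_nor, p_lym, p_hgg: the three sigmoid outputs of the model.
    t1, t2, t3: adjustable thresholds for the three decision layers.
    """
    if p_nor >= t1:      # layer 1: normal tissue?
        return "NOR"
    if p_lym >= t2:      # layer 2: PCNSL?
        return "LYM"
    # layer 3: high- vs. low-grade glioma
    return "HGG" if p_hgg >= t3 else "LGG"
```

Because each layer has its own threshold, a surgeon with a strong prior can, for example, raise t3 to make the HGG call stricter at the cost of HGG sensitivity.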


Fig. 3. Model design using the transfer learning technique. (a) The model was constructed by adding layers after the pretrained MobileNetV2 model. (b) The final prediction output of the model is adjustable using the stepwise thresholding technique proposed in this study. NOR, normal tissue; LYM, PCNSL; HGG, high-grade glioma; LGG, low-grade glioma. PNOR, PLYM, and PHGG represent the probabilities given by the model output.


Two visualization techniques, the confusion matrix and t-SNE, were used to assess model performance. t-SNE stands out as a powerful dimensionality reduction method that preserves the local structure of the transformed data, making it particularly applicable for visualizing high-dimensional datasets. This research used t-SNE to examine data distributions. Moreover, Bokeh, a Python module, renders the graphs interactively rather than as static presentations. By dragging or zooming with the mouse, each dot on the dynamic t-SNE plot shows the sample's true label, predicted label, and corresponding OCT image. We designed a user-friendly interface in which each generated plot is saved as an HTML file; after closing the program, users can still open the HTML files to review the model's performance. The results of this segment are presented in subsequent sections.
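A minimal Bokeh sketch of such an interactive plot is shown below. It assumes a 2-D t-SNE embedding has already been computed; the per-point OCT image tooltip of our interface is omitted, and the function name is ours.

```python
import numpy as np
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, output_file, save

def save_dynamic_tsne(xy, true_labels, pred_labels, path="tsne.html"):
    """Write an interactive scatter plot; hovering over a dot shows its labels."""
    source = ColumnDataSource(data=dict(
        x=xy[:, 0], y=xy[:, 1], true=true_labels, pred=pred_labels))
    p = figure(tools="pan,wheel_zoom,reset", title="Dynamic t-SNE")
    p.scatter("x", "y", source=source, size=6)
    p.add_tools(HoverTool(tooltips=[("true", "@true"), ("pred", "@pred")]))
    output_file(path)   # the HTML file stays reviewable after the program exits
    save(p)
```

Saving to standalone HTML is what makes the plot reviewable offline, without keeping a Python process running.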

3. Results

Twelve patients diagnosed with glioma (nine HGG, three LGG) and one patient with PCNSL were recruited from routine operations. Each specimen underwent multiple scans to augment the dataset. The recruitment details are listed in Table 1, where NOR represents the normal brain tissue harvested from a porcine brain. We divided the data into training and testing datasets to facilitate model training and testing. Within the training dataset, we employed fivefold cross-validation to assess the model's consistency and reliability, ensuring robust performance.


Table 1. Recruitment information

Data splitting is shown in Table 2. For glioma, the data of one patient were never allocated to the training set and the testing set simultaneously, which prevents data leakage during model training. Unfortunately, PCNSL has only one patient, and the source of NOR is the porcine brain. Thankfully, the PCNSL patient provided seven separate specimens; we believe that in this case the data can be divided into training and testing sets on the basis of volume, even within the same patient. The number of volumes in the training set was kept a multiple of five at the data-split level to implement five-fold cross-validation.

The accuracies of the proposed method on training, validation, and testing data under the default probability threshold (0.5) were 93.09%, 89.56%, and 84.59%, with standard deviations of 4.1%, 3.5%, and 3.6%, respectively, demonstrating acceptable differentiation power. Figure 4 shows the confusion matrix of the testing data from one of the models. A small fraction of normal tissue and LGG images were misclassified as each other, which may be caused by OCT scanning of inconspicuous lesions. Also, in the third-layer classification under the default probability (0.5), the accuracy of distinguishing between HGG and LGG reached 80%. An important factor contributing to this challenge is data imbalance: the primary image features are similar since both belong to the glioma category, so more training data would be required to enhance model performance. Thankfully, our proposed binary hierarchical classification method enables custom threshold adjustments to align with clinical requirements.


Fig. 4. The confusion matrix of the testing data from one model. The sensitivity and specificity of PCNSL (labeled as LYM) were 99.0% and 100%, those of HGG were 80.0% and 80.0%, and those of LGG were 80.3% and 88.5%, leading to an overall accuracy of 86.4%.


The evaluation metrics for PCNSL (labeled as LYM) exhibited sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of 99.0%, 100.0%, 100.0%, and 99.6%, respectively. Corresponding metrics for HGG were 80.0%, 80.0%, 87.0%, and 90.4%, and LGG were 80.3%, 88.5%, 70.0%, and 93.1%, culminating in an overall accuracy of 86.4%. According to the receiver operating characteristic (ROC) curves of PNOR, PLYM, and PHGG from the testing data, as depicted in Fig. 5, the mean areas under the curves (AUC) were 0.996, 1.000, and 0.898, showing good differentiation powers of the targeted tumors.
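The per-class metrics above follow mechanically from the confusion matrix in a one-vs-rest fashion. A small helper illustrates the computation (the function name is ours):

```python
import numpy as np

def one_vs_rest_metrics(cm, idx):
    """Sensitivity, specificity, PPV, and NPV for class `idx`.

    cm: square confusion matrix, rows = true labels, columns = predictions.
    """
    cm = np.asarray(cm, dtype=float)
    tp = cm[idx, idx]
    fn = cm[idx].sum() - tp        # class idx predicted as something else
    fp = cm[:, idx].sum() - tp     # other classes predicted as idx
    tn = cm.sum() - tp - fn - fp
    return (tp / (tp + fn),        # sensitivity (recall)
            tn / (tn + fp),        # specificity
            tp / (tp + fp),        # positive predictive value
            tn / (tn + fn))        # negative predictive value
```

Applying this to each of the four classes in turn reproduces the kind of per-class table reported here.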


Fig. 5. ROC curves of Normal, PCNSL, and Glioma from the MobileNetV2 transfer learning model on the testing data. The AUCs are 0.996, 1.000, and 0.898, respectively.


We plotted the 2D distribution of the data at the last average pooling layer using dynamic t-SNE, as depicted in Fig. 6. The plot features true boundaries that divide the entire representation into four ground-truth categories. The two axes correspond to the meta-features of the data, with each data point representing an OCT image color-coded according to its predicted label. According to the results in Fig. 6, normal tissue exhibited homogeneous appearances, whereas glioma, in general, displayed irregular holes and abnormal attenuation, as reported before [36]. In contrast, instead of showing microstructural features, PCNSL surprisingly tended to be homogeneous with a few attenuation abnormalities. Comparing the histological staining images of HGG and LGG [Fig. 6(e) and (h)] in conjunction with the expert physician's commentary, pleomorphic tumor cells (blue arrow) and hypercellularity (yellow star) are typical microstructures of glioma, and the glioma entity generally exhibits a formless area without nuclei (green star). Although the structures are similar, we observe that the morphologies of LGG and HGG differ. As shown in Fig. 6(c) and (d), tissue structures in LGG are relatively denser, while Fig. 6(f) and (g) illustrate that those in HGG are sparser, with numerous vesicles (red arrow). The results are consistent with known histological findings and comparable with prior research [37]. Through the dynamic t-SNE graph, we can quickly retrieve the information of the misclassified dots (Fig. 7). These misclassified points sit near the true boundary, showing that their features are too similar to another category. In the clinical setting, the model is stored on the PC of the movable OCT cart. After OCT measurement, data recording takes about five minutes; the raw data are then loaded for image pre-processing (another five minutes), and the model is imported and run with tensorflow.keras (model.predict) to obtain the results (one to five minutes, depending on the number of prediction images). With this visualization, the reliability and credibility of our model were affirmed.


Fig. 6. Dynamic t-SNE scatter plot of the data points at the last average pooling layer. The colors denote the predicted label by the model, and the pink lines show the true boundaries between the four categories. Examples of OCT intensity images of (a) Normal, (b) Lymphoma, (c), (d) LGG and the corresponding (e) H&E stain image, and (f), (g) HGG and the corresponding (h) H&E stain image. Yellow star: hypercellularity; green star: glioma entity; blue arrow: pleomorphic tumor cells; red arrow: vesicles. Scale bar: 1 mm.



Fig. 7. Misclassification information on the dynamic t-SNE scatter plot. The colors denote the predicted label by the model, and the pink lines show the true boundaries between the four categories. Examples of OCT intensity images misclassified between LGG and Normal (a), (b), and between HGG and LGG (c), (d). One LGG was misclassified as PCNSL (e). Scale bar: 1 mm.


4. Discussion

The standard diagnosis of brain tumors is surgery. Before the operation, the surgeon examines the patient's clinical symptoms and medical images, determining the type of brain tumor and pinpointing its location for preoperative surgical planning. Currently, the most common medical imaging examinations in clinics are computed tomography (CT) and MRI. Retrospective studies have pointed out that FS might confuse PCNSL with glioma and may inaccurately classify glioma grades, leading to incorrect treatment or over-treatment [11–15]. In a previous statistical comparison of HGG and LGG, the FS accuracy for LGG was only 78.4%, whereas that for HGG was 91.6% [15]. Out of 578 brain tumor cases, 13 were diagnosed as PCNSL by PS, but only four were correctly identified by FS [15]. This indicates that despite PCNSL's low prevalence among brain tumors, FS's sensitivity to PCNSL is notably poor, and PCNSL is often mistaken for glioma. In addition, the treatment strategies for PCNSL and glioma differ according to clinical recommendations: while gliomas are typically addressed with surgical resection, PCNSL is treated exclusively with chemotherapy. Within glioma, HGG and LGG adopt different grade-specific plans. Usually, HGG patients need appropriate adjuvant radiation therapy plus chemotherapy to avoid relapse after total tumor resection, whereas LGG patients require complete surgical resection alone. Hence, on-site determination of PCNSL versus glioma, as well as HGG versus LGG, is essential during surgery, and new technology is anticipated to address this challenge in the clinic.

Traditional learning is disengaged, taking place on individual tasks or datasets and training separate models. In contrast, transfer learning utilizes previously learned model data (weights and features), which can be transferred and applied to new tasks. Some research has been committed to applying transfer learning to brain MRI. For instance, Chelghoum et al. [24] employed the brain contrast-enhanced magnetic resonance images (CE-MRI) dataset from Figshare [25], which consists of three distinct brain tumor types (glioma, meningioma, and pituitary tumor), to classify only abnormal brain MR images. Owing to the small training dataset, their classification systems evaluate deep transfer learning for feature extraction by investigating nine deep pre-trained model architectures: AlexNet [26], GoogleNet [27], VGG16 [28], VGG19 [28], Residual Networks (ResNet18, ResNet50, ResNet101) [29], ResNet-Inception-v2 [30], and Squeeze and Excitation Network (SENet) [31]. This process contributes to enhancing radiologist interventions, aiding in resolving brain tumor classification challenges, and facilitating the development of more effective treatments. Moreover, Kulkarni et al. [32] used four transfer learning pre-trained models to perform grading diagnoses on glioma MRI images: AlexNet, GoogLeNet, ResNet18, and ResNet50. The researchers maintained consistent control over various parameters, encompassing the learning rate, gradient decay factor, maximum epochs, mini-batch size, validation frequency, and optimizer, and compared the accuracy and time required for the different models. Although transfer learning has achieved good classification results on MRI brain images, there is a lack of research on OCT brain images. In addition, the training time and the prediction time for new data are also crucial. So far, no study has examined whether OCT images are suitable for lightweight models.

MobileNet is a lightweight neural network designed for mobile and embedded devices. Its main feature is the incorporation of depthwise separable and pointwise convolutions to reduce computational complexity and model size. Throughout its development, the primary focus has been enhancing efficiency on mobile devices. Owing to the sensitivity of MobileNetV1 to parameter selection, MobileNetV2 was introduced; compared with MobileNetV1, it incorporates inverted residual blocks, addresses linear bottlenecks between layers, and reduces parameters by 30% while maintaining high performance. In the literature, transfer learning for OCT mainly uses the VGG series, the ResNet series, AlexNet, GoogleNet, et cetera. These models are large, and their training times are long (especially for the VGG series). A previous study confirmed that a self-built attention ResNet model can achieve three-class classification of brain tumor OCT images [36]; however, the architecture of that model is very complex and time-consuming. Therefore, we tried MobileNetV2, which aims to improve efficiency and performance and is suitable for image classification on resource-constrained devices. Few studies have applied MobileNetV2 to OCT images, highlighting the novelty of our research. Our test results demonstrate that training with the pre-trained MobileNetV2 takes approximately one hour, whereas the previous study, which involved manual construction of the ResNet model, required more than six hours, a sixfold reduction in training time. Additionally, we implemented hierarchical binary classification and thoroughly tested the model, achieving a commendable accuracy rate.
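The parameter savings behind MobileNet's design can be checked with back-of-the-envelope arithmetic (a sketch for a single layer; bias terms and MobileNetV2's expansion/projection details are ignored for simplicity):

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (biases omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k=3):
    """Depthwise k x k filter per input channel plus a 1 x 1 pointwise projection."""
    return k * k * c_in + c_in * c_out

# For a 3 x 3 layer mapping 128 -> 128 channels:
# standard: 147,456 weights; depthwise separable: 17,536 (~8.4x fewer).
```

This per-layer reduction, repeated across the network, is what makes MobileNet-family models attractive for resource-constrained deployment such as an OCT cart PC.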

Unbalanced data refers to a situation where the number of samples in specific categories is either excessively large or too small, potentially leading to an over-reliance on dominant categories during model training or prediction. We employed data augmentation techniques to mitigate this issue, including rotation, zooming, width and height shifting, and horizontal flipping, so varied images are fed into every model. Data augmentation becomes even more crucial in cases like PCNSL, where we have only one patient's data; the underlying remedy for these challenges is to expand the dataset size. Also, the different types were split into training, validation, and testing sets in the same proportion. Keeping the number of OCT volumes from each patient equal helps prevent the model from relying excessively on specific patients' data, thus enhancing its overall robustness and performance.

Figure 6 shows the correctly classified results on the dynamic t-SNE plot, whereas Fig. 7 demonstrates the misclassifications. According to our previous findings, NOR OCT images display a homogeneous appearance without microstructure, whereas glioma exhibits abnormal microstructures, such as microcysts, calcification, and hemorrhaging within tumoral regions [36]. On the other hand, these are relatively uncommon in PCNSL OCT images [38]. The results in Fig. 6 verify the reproducibility of that research. Furthermore, we can observe similar overall morphologies when comparing LGG [Fig. 6(c) and (d)] with HGG [Fig. 6(f) and (g)]. The tissue structures in LGG appear relatively denser, while in HGG they seem sparser. Numerous vesicles (red arrow) and pleomorphic tumor cells (blue arrow) can be observed in HGG images. Furthermore, comparing these two types reveals that the dense hypercellularity structures (yellow star) with high reflectivity in HGG samples are absent in the LGG samples. The remarkable correspondence between OCT images and histology [Fig. 6(e) and (h)] further substantiates the potential of OCT as a diagnostic imaging tool for clinical glioma diagnosis.

From Fig. 7, LGG and Normal are occasionally misclassified as each other [Fig. 7(a) and (b)], as are HGG and LGG [Fig. 7(c) and (d)]. Straight artifacts might contribute to these misclassifications; hence, further speckle reduction and registration are essential to address these issues. Also, the misclassification between HGG and LGG likely resulted from their similarities, which aligns with our findings from the visual inspection of OCT images. Moreover, the LGG misclassified as PCNSL may result from scanning out of range; in this condition, specific image pre-processing is required to crop the extraneous regions. In our research, we used simple averaging and translational registration to suppress speckle, but the processed images still leave room for improvement. Generally speaking, the most fundamental approach is to filter the signal in hardware, while in software, the non-local means filter [39] or a conditional generative adversarial network (cGAN) [40] is a novel method that can eliminate speckle more comprehensively. Ultimately, classification could be further improved by slightly adjusting the decision boundary without obvious overfitting.

Figure 8(a)-(d) presents the confusion matrix performance for four custom threshold combinations. Detailed values for the sensitivity, specificity, PPV, and NPV of each category are outlined in Table 3. The default probability yields an overall accuracy of 86.4%. We focused on adjusting Threshold_1 and Threshold_3 while keeping Threshold_2 constant. Lowering Threshold_3 to 0.3 enhances HGG sensitivity to 95.8%, while LGG sensitivity decreases to 63.51%. Lowering Threshold_1 to 0.3 increases Normal sensitivity to 96.9%. Remarkably, the combination (Threshold_1, Threshold_2, Threshold_3) = (0.5, 0.5, 0.4) yields the highest overall accuracy at 88.0%. Adjustments within the range of 0.4 to 0.7 (a variation of ±0.2) have minimal impact on the overall accuracy, which remains above 80%. It is important to note that the four categories exhibit distinct variations in sensitivity, specificity, PPV, and NPV. In one example, we adjusted the thresholds to (Threshold_1, Threshold_2, Threshold_3) = (0.5, 0.5, 0.7), making the diagnosis of HGG stricter. Although the overall accuracy decreased from 86.4% to 80.7%, the specificity of HGG increased from 80.0% to 97.1%, and the sensitivity of LGG improved from 80.3% to 88.7%, as expected. This pilot study provides a novel approach to using the model. To reduce the impact of subjectivity and variability, it is necessary to give surgeons a user-friendly operation interface, guide them on how to use the model and adjust the thresholds, and monitor and evaluate the model's use. In this way, cross-disciplinary cooperation between engineering and medicine can be achieved and translational medicine practiced. This design would be valuable when diagnostic evidence is at hand or individual requirements (age, medical history, chief complaint, physical examination, symptoms, et cetera) must be considered.
To sum up, the current implementation of transfer learning provides a novel and fast way to classify and differentiate various types and grades of brain tumors.

 figure: Fig. 8.

Fig. 8. Confusion matrix of the testing data from one model using transfer learning with the customized thresholds. The thresholds were set to be (a) (Threshold_1, Threshold_2, Threshold_3) = (0.5, 0.5, 0.3). (b) (Threshold_1, Threshold_2, Threshold_3) = (0.5, 0.5, 0.7). (c) (Threshold_1, Threshold_2, Threshold_3) = (0.3, 0.5, 0.5). The best overall accuracy under (d) (Threshold_1, Threshold_2, Threshold_3) = (0.5, 0.5, 0.4). The sensitivities, specificities, PPV, and NPV of Normal, PCNSL, HGG, and LGG are shown in Table 3.



Table 3. Performance of different threshold combinations

Compared with other quantitative indices such as the attenuation coefficient [41], co-channel attenuation, and forward cross-scattering [42,43], our method demonstrates an acceptable ability to distinguish normal from tumoral tissue, achieving a commendable accuracy rate. Given this ability, the approach is also expected to help surgeons evaluate residual tumor tissue at the end of surgical excision in the future. Although differentiation among PCNSL, HGG, and LGG still has room for improvement owing to the limited samples, we anticipate achieving high differentiation power based on the current insights into OCT image characteristics. This distinction is critical for clinically differentiating PCNSL, HGG, and LGG.
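For context, the attenuation coefficient referenced above is commonly estimated from a single A-scan under a single-scattering model, I(z) = I0·exp(-2µz), by a log-linear least-squares fit whose slope equals -2µ. The snippet below is a textbook sketch on synthetic data, not the cited authors' exact pipeline.

```python
import numpy as np

def attenuation_coefficient(a_scan, dz):
    """Estimate mu (per unit depth) from one A-scan, assuming
    single-scattering decay I(z) = I0 * exp(-2*mu*z):
    fitting ln I(z) against z gives slope = -2*mu."""
    z = np.arange(len(a_scan)) * dz
    slope, _intercept = np.polyfit(z, np.log(a_scan), 1)
    return -slope / 2.0

# Synthetic A-scan obeying the model exactly: mu = 5 mm^-1, dz = 0.01 mm.
z = np.arange(200) * 0.01
a_scan = np.exp(-2 * 5.0 * z)
mu_est = attenuation_coefficient(a_scan, 0.01)
print(mu_est)  # ~5.0 mm^-1
```

In real tissue the fit is usually restricted to a depth window below the surface, and noise floors and focus effects must be corrected before the slope is meaningful.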

In our classification of brain tumors, we showed that tumoral and normal tissues can be distinguished almost perfectly. According to previous reports [36], GBM displays necrotic areas that alter attenuation uniformity. Nonetheless, radiotherapy and surgical coagulation can also cause tissue necrosis [44]; patients treated with radiotherapy were excluded from this study, and the porcine brain specimens exhibited no coagulation necrosis. Further studies are warranted to investigate tissue features under more general conditions, and patients treated with radiotherapy should be included in future recruitment criteria. Measuring normal human brain tissue in vivo is also planned, to support a more direct conclusion. Another concern is that, although under the surgical guideline of maximal safe resection the cancerous specimens should contain at most infiltrative tissue and no normal tissue, a small amount of normal tissue may nevertheless be included and appear in single frames within an OCT volume; the predictive model should identify such frames rather than assign them the overall label of the volume. As a suggestion for future research, this problem can be addressed by introducing multiple instance learning into the data processing design, in which negative frames are allowed within a positive OCT volume, thereby further improving prediction accuracy.
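A minimal sketch of the multiple-instance idea suggested above: aggregate frame-level tumor probabilities into one volume-level score from only the top-k most suspicious frames, so that normal-looking frames inside a positive volume do not force the volume label onto every frame. The `top_k` value is an arbitrary illustrative choice.

```python
import numpy as np

def volume_score(frame_probs, top_k=3):
    """Multiple-instance-style aggregation: score a volume by the mean
    of its k most suspicious frames, tolerating negative (normal)
    frames inside a positive volume (illustrative sketch)."""
    probs = np.sort(np.asarray(frame_probs, dtype=float))[::-1]
    return float(probs[:top_k].mean())

# A mostly-normal volume with a few clearly tumoral frames still
# receives a high tumor score; an all-normal volume does not.
mixed  = [0.05] * 20 + [0.95, 0.90, 0.92]
normal = [0.05] * 23
print(volume_score(mixed), volume_score(normal))
```

More elaborate MIL schemes learn the aggregation itself (e.g., attention pooling), but even this top-k rule removes the pressure to label every frame with the volume's overall class.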

5. Conclusions

In this work, we recruited a limited number of cases to preliminarily validate the feasibility of using OCT to differentiate PCNSL and gliomas from normal tissue. We then distinguished PCNSL from gliomas and further separated HGG from LGG among the gliomas using the MobileNetV2 pre-trained model in an ex vivo experimental setup. Consistent with previous research, normal tissue displayed a homogeneous texture, while PCNSL showed no attenuation abnormalities along the transversal direction. In addition, HGG demonstrated clear vesicles, albeit with a sparser structural density. In contrast, LGG exhibited a distribution of floccules and dark strips, potentially due to hypercellularity and fibrosis.

Leveraging transfer learning, we achieved an overall testing accuracy of 86.4% across the four categories using the default thresholds. Through the customized binary hierarchical classification, users can modify any threshold in real time as diagnostic evidence comes to hand or individual requirements must be considered. Furthermore, dynamic t-SNE provides on-site information about targeted data points: through our custom interface, users can rapidly ascertain the model's performance by viewing each image's sorted number, true label, and predicted label. This user-friendly graphical user interface (GUI) offers tangible support in surgeons' clinical settings. Notably, we observed that the misclassified images clustered at the boundaries between the four categories, suggesting that with sufficient data for model generalization, the classification accuracy could improve further.
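To illustrate how such an embedding plot is produced (a generic sketch, not our dynamic t-SNE implementation), penultimate-layer features can be projected to two dimensions with scikit-learn's TSNE; the synthetic Gaussian features below stand in for real model activations.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for penultimate-layer features of the model:
# 60 samples x 32 dimensions, drawn around two cluster centers.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.5, (30, 32)),
    rng.normal(3.0, 0.5, (30, 32)),
])

# Project to 2-D for a scatter plot; perplexity must stay below n_samples.
coords = TSNE(n_components=2, perplexity=15, init="pca",
              learning_rate=200.0, random_state=0).fit_transform(features)
print(coords.shape)  # (60, 2)
```

Coloring `coords` by predicted label and overlaying the true class boundary then yields a plot in the spirit of Figs. 6 and 7, where boundary-adjacent points flag likely misclassifications.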

To our knowledge, this is the first study in which normal tissue, PCNSL, HGG, and LGG were all observed with OCT imaging at the same time, and these distinctive OCT imaging features might be an essential key during surgery. So far, our results show a promising outlook for combining OCT and transfer learning in PCNSL and glioma identification. Ultimately, the proposed methodology promises to assist surgeons during operations and improve patient outcomes.

Funding

Veterans General Hospitals (V113C-057); Veterans General Hospitals University System of Taiwan Joint Research Program (VGHUST112-G1-6-1); Yen Tjing Ling Medical Foundation (CI-112-4); National Science and Technology Council (NSTC) (Grant Number: 112-2221-E-075-002, 111-2221-E-075-002, 111-2221-E-A49-047-MY3).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available according to the protection of human research participants.

References

1. Q. T. Ostrom, G. Cioffi, H. Gittleman, et al., “CBTRUS statistical report: primary brain and other central nervous system tumors diagnosed in the United States in 2012–2016,” Neuro-oncology 21(Supplement_5), v1–v100 (2019). [CrossRef]  

2. D. N. Louis, A. Perry, G. Reifenberger, et al., “The 2016 World Health Organization Classification of tumors of the central nervous system: a summary,” Acta Neuropathol. 131(6), 803–820 (2016). [CrossRef]  

3. E. Crocetti, A. Trama, C. Stiller, et al., “Epidemiology of glial and non-glial brain tumours in Europe,” Eur. J. Cancer 48(10), 1532–1542 (2012). [CrossRef]  

4. S. L. Hervey-Jumper and M. S. Berger, “Maximizing safe resection of low-and high-grade glioma,” J. Neuro-Oncol. 130(2), 269–282 (2016). [CrossRef]  

5. J. S. Smith, E. F. Chang, K. R. Lamborn, et al., “Role of extent of resection in the long-term outcome of low-grade hemispheric gliomas,” J. Clin. Oncol. 26(8), 1338–1345 (2008). [CrossRef]  

6. L. A. Snyder, A. B. Wolf, M. E. Oppenlander, et al., “The impact of extent of resection on malignant transformation of pure oligodendrogliomas,” J. Neurosurg. 120(2), 309–314 (2014). [CrossRef]  

7. N. Sanai and M. S. Berger, “Glioma extent of resection and its impact on patient outcome,” Neurosurgery 62(4), 753–766 (2008). [CrossRef]  

8. N. Sanai, M.-Y. Polley, M. W. McDermott, et al., “An extent of resection threshold for newly diagnosed glioblastomas,” J. Neurosurg. 115(1), 3–8 (2011). [CrossRef]  

9. W. Stummer, H.-J. Reulen, T. Meinel, et al., “Extent of resection and survival in glioblastoma multiforme: identification of and adjustment for bias,” Neurosurgery 62(3), 564–576 (2008). [CrossRef]  

10. D. Kuhnt, Andreas Becker, Oliver Ganslandt, et al., “Correlation of the extent of tumor volume resection and patient survival in surgery of glioblastoma multiforme with high-field intraoperative MRI guidance,” Neuro-oncology 13(12), 1339–1348 (2011). [CrossRef]  

11. R. Amraei, A. Moradi, H. Zham, et al., “A comparison between the diagnostic accuracy of frozen section and permanent section analyses in central nervous system,” Asian Pac. J. Cancer Prev. 18, 659 (2017). [CrossRef]

12. F. N. Obeidat, H. A. Awad, A. T. Mansour, et al., “Accuracy of frozen-section diagnosis of brain tumors: An 11-year experience from a tertiary care center,” Turkish Neurosurgery 29, 242–246 (2018). [CrossRef]  

13. T. P. Plesec and R. A. Prayson, “Frozen section discrepancy in the evaluation of central nervous system tumors,” Arch. Pathology & Laboratory Medicine 131(10), 1532–1540 (2007). [CrossRef]  

14. M. Reni and A. J. Ferreri, “Therapeutic management of primary CNS lymphoma in immunocompetent patients,” Expert Rev. Anticancer Ther. 1(3), 382–394 (2001). [CrossRef]  

15. K. Tofte, C. Berger, S. H. Torp, et al., “The diagnostic properties of frozen sections in suspected intracranial tumors: A study of 578 consecutive cases,” Surg. Neurol. Int. 5(1), 8 (2014). [CrossRef]  

16. N. F. Marko, R. J. Weil, J. L. Schroeder, et al., “Extent of resection of glioblastoma revisited: personalized survival modeling facilitates more accurate survival prediction and supports a maximum-safe-resection approach to surgery,” J. Clin. Oncol. 32(8), 774–782 (2014). [CrossRef]

17. R. Batash, N. Asna, P. Schaffer, et al., “Glioblastoma multiforme, diagnosis and treatment; recent literature review,” Curr. Medicinal Chemistry 24(27), 3002–3009 (2017). [CrossRef]  

18. F. Hanif, K. Muzaffar, K. Perveen, et al., “Glioblastoma multiforme: a review of its epidemiology and pathogenesis through clinical presentation and treatment,” Asian Pac. J. Cancer Prev. 18, 3 (2017). [CrossRef]

19. K. Hoang-Xuan, E. Bessell, J. Bromberg, et al., “Diagnosis and treatment of primary CNS lymphoma in immunocompetent patients: guidelines from the European Association for Neuro-Oncology,” Lancet Oncol. 16(7), e322–e332 (2015). [CrossRef]

20. P. Niparuck, P. Boonsakan, T. Sutthippingkiat, et al., “Treatment outcome and prognostic factors in PCNSL,” Diagn. Pathol. 14(1), 56 (2019). [CrossRef]

21. L. Qian, C. Tomuleasa, I.-A. Florian, et al., “Advances in the treatment of newly diagnosed primary central nervous system lymphomas,” Blood research 52(3), 159–166 (2017). [CrossRef]  

22. C. Grommes, J. L. Rubenstein, L. M. DeAngelis, et al., “Comprehensive approach to diagnosis and treatment of newly diagnosed primary CNS lymphoma,” Neuro-oncology 21(3), 296–305 (2019). [CrossRef]  

23. M. L. Gabriele, G. Wollstein, H. Ishikawa, et al., “Three dimensional optical coherence tomography imaging: advantages and advances,” Prog. Retinal Eye Res. 29(6), 556–579 (2010). [CrossRef]  

24. R. Chelghoum, A. Ikhlef, A. Hameurlaine, et al., “Transfer learning using convolutional neural network architectures for brain tumor classification from MRI images,” in IFIP international conference on artificial intelligence applications and innovations (Springer, 2020), pp. 189–200.

25. J. Cheng, “Brain tumor dataset,” figshare, 2017, https://figshare.com/articles/brain_tumor_dataset/1512427.

26. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Adv. Neural Inf. Process. Syst. 25 (2012).

27. C. Szegedy, W. Liu, Y. Jia, et al., “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition (2015), pp. 1–9.

28. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556 (2014). [CrossRef]

29. K. He, X. Zhang, S. Ren, et al., “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition (2016), pp. 770–778.

30. C. Szegedy, S. Ioffe, V. Vanhoucke, et al., “Inception-v4, Inception-ResNet and the impact of residual connections on learning,” in Proceedings of the AAAI Conference on Artificial Intelligence 31 (2017).

31. J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition (2018), pp. 7132–7141.

32. S. M. Kulkarni and G. Sundari, “Transfer learning using convolutional neural network architectures for glioma classification from MRI images,” Int. J. Comput. Sci. & Netw. Secur. 21, 198–204 (2021).

33. N. M. Lind, A. Moustgaard, J. Jelsing, et al., “The use of pigs in neuroscience: modeling brain disorders,” Neurosci. Biobehav. Rev. 31(5), 728–751 (2007). [CrossRef]  

34. Y.-Q. Li, K.-S. Chiu, X.-R. Liu, et al., “Polarization-sensitive optical coherence tomography for brain tumor characterization,” IEEE J. Select. Topics Quantum Electron. 25(6), 1–9 (2019). [CrossRef]  

35. L. Van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res. 9, 2579–2605 (2008).

36. S. P. Hsu, T.-Y. Hsiao, L.-C. Pai, et al., “Differentiation of primary central nervous system lymphoma from glioblastoma using optical coherence tomography based on attention ResNet,” Neurophotonics 9(1), 015005 (2022). [CrossRef]

37. X. Yu, C. Hu, W. Zhang, et al., “Feasibility evaluation of micro-optical coherence tomography (µOCT) for rapid brain tumor type and grade discrimination: µOCT images versus pathology,” BMC Med. Imaging 19(1), 102 (2019). [CrossRef]

38. C. Jenkins and I. Colquhoun, “Characterization of primary intracranial lymphoma by computed tomography: an analysis of 36 cases and a review of the literature with particular reference to calcification haemorrhage and cyst formation,” Clin. Radiol. 53(6), 428–434 (1998). [CrossRef]  

39. A. Uzan, Y. Rivenson, and A. Stern, “Speckle denoising in digital holography by nonlocal means filtering,” Appl. Opt. 52(1), A195–A200 (2013). [CrossRef]  

40. Y. Ma, X. Chen, W. Zhu, et al., “Speckle noise reduction in optical coherence tomography images based on edge-sensitive cGAN,” Biomed. Opt. Express 9(11), 5129–5146 (2018). [CrossRef]

41. C. Kut, K. L. Chaichana, J. Xi, et al., “Detection of human brain cancer infiltration ex vivo and in vivo using quantitative optical coherence tomography,” Sci. Transl. Med. 7(292), 292ra100 (2015). [CrossRef]  

42. E. B. Kiseleva, K. S. Yashin, A. A. Moiseev, et al., “Optical coefficients as tools for increasing the optical coherence tomography contrast for normal brain visualization and glioblastoma detection,” Neurophotonics 6(3), 035003 (2019). [CrossRef]

43. K. S. Yashin, E. B. Kiseleva, A. A. Moiseev, et al., “Quantitative nontumorous and tumorous human brain tissue assessment using microstructural co- and cross-polarized optical coherence tomography,” Sci. Rep. 9, 2024 (2019). [CrossRef]

44. K. S. Yashin, E. B. Kiseleva, E. V. Gubarkova, et al., “Cross-polarization optical coherence tomography for brain tumor imaging,” Front. Oncol. 9, 201 (2019). [CrossRef]  




