Optica Publishing Group

Differentiation of breast tissue types for surgical margin assessment using machine learning and polarization-sensitive optical coherence tomography

Open Access

Abstract

We report an automated differentiation model for classifying malignant tumor, fibro-adipose, and stroma in human breast tissues based on polarization-sensitive optical coherence tomography (PS-OCT). A total of 720 PS-OCT images from 72 sites of 41 patients with H&E histology-confirmed diagnoses as the gold standard were employed in this study. The differentiation model is trained on features extracted from both one standard OCT-based metric (i.e., intensity) and four PS-OCT-based metrics (i.e., phase difference between two channels (PD), phase retardation (PR), local phase retardation (LPR), and degree of polarization uniformity (DOPU)). Further optimized by forward searching and validated by the leave-one-site-out cross-validation (LOSOCV) method, the best feature subset was acquired, with a highest overall accuracy of 93.5% for the model. Furthermore, to show the superiority of our differentiation model based on PS-OCT images over standard OCT images, the best model trained by intensity-only features (as obtained by standard OCT systems) was also obtained, with an overall accuracy of 82.9%, demonstrating the significance of the polarization information in breast tissue differentiation. The high performance of our differentiation model suggests the potential of using PS-OCT for intraoperative human breast tissue differentiation during the surgical resection of breast cancer.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Cancer is the second leading cause of death globally, with an estimated 1.9 million new cancer cases expected to be diagnosed in the United States in 2021 [1]. Breast, colorectal, lung, cervical, and thyroid cancer are the most common among women. Of these, breast cancer is the second leading cause of cancer death in women after lung cancer, accounting for an estimated 284,200 new cases and 44,130 deaths in the United States in 2021 [1]. The current standard of care for early-stage breast cancer is breast conserving surgery (BCS), which is equivalent to mastectomy in long-term survival rate, preserves more of the breast's appearance and sensation, and offers a shorter recovery time [2,3]. Tumor margin evaluation is a crucial issue during BCS in order to avoid margin re-excision and to reduce the risk of local recurrence of the disease. Several evaluative methods have been adopted for intraoperative tumor margin assessment to strive for complete resection of the tumor. Frozen-section analysis is time-consuming and provides evaluations of limited accuracy due to difficulties in sectioning adipose tissue [4]. Touch preparation cytology provides faster assessment than frozen section but is associated with sample preparation artifacts [5]. Therefore, there is a significant need for real-time (<1 s) evaluation of the tumor margin during BCS.

Various intraoperative methods based on optical techniques have been reported for breast tumor margin evaluation, including handheld probe-based radiofrequency spectral analysis [6,7], quantitative diffuse reflectance imaging [8,9], confocal mosaicking microscopy [10], point spectroscopy [11,12], and optical coherence tomography (OCT) [13–17]. OCT is a promising method for intraoperative breast tumor margin evaluation because of its high resolution, high acquisition speed for large-area imaging, and millimeter-scale imaging depth [18,19]. Recently, with the development of handheld [20] and needle probes [21,22], or utilizing a single fiber [23], portable OCT systems have shown great potential for breast tumor margin evaluation during BCS, especially with the help of advanced artificial intelligence algorithms [24,25]. The criteria for margin evaluation using standard OCT are image contrast and texture, which depend on the variations of refractive index and the optical scattering properties of tissues. For breast tissues, it is possible to differentiate between malignant tumor and fibro-adipose tissue types because malignant tumor tissue appears denser and more highly scattering than healthy tissue and disrupts the normal low-scattering honeycomb-like architecture of fibro-adipose tissue found in the normal breast [26]. However, normal stroma, which is comprised largely of collagen and connective tissue in the breast, is also dense and highly scattering under standard OCT, making it challenging to differentiate normal stroma from malignant tumor tissue. Recently, OCT-based elastography (OCE) was also applied in breast cancer imaging and showed good performance at distinguishing normal stroma from malignant tumor tissue by measuring stiffness quantitatively [27].
The promising potential of OCE-based breast tumor margin detection [28–31] was demonstrated, and several automated morphological segmentation methods were proposed for breast tissue subtype differentiation [32–34]. In addition, some works focused on the comparison and combination of OCE and polarization-sensitive OCT [35,36].

In our previous work [37], a bench-top polarization-sensitive OCT (PS-OCT) system was demonstrated for detecting and quantifying birefringence contrast between healthy and cancerous tissues based on collagen content, and a portable intraoperative PS-OCT system was used for enhancing the detection and differentiation of invasive ductal carcinoma (IDC) [38]. By using the complementary information provided by standard OCT-based metrics and PS-OCT-based metrics, an overall accuracy of 89.4% was achieved in differentiating fibro-adipose tissue, stroma, and IDC, demonstrating the potential of PS-OCT as an adjunct modality for enhanced intraoperative differentiation of breast cancer. In this paper, we report a new automated approach for breast tissue differentiation from intraoperative PS-OCT images. Both standard intensity and polarization images were post-processed for further feature extraction. Based on the extracted features, a robust diagnostic model validated by leave-one-site-out cross-validation (LOSOCV) was built with a support vector machine (SVM) to differentiate malignant tumor, fibro-adipose, and stroma tissues using histopathology as the gold standard. The diagnostic model based on features extracted from the combination of one standard OCT-based metric (i.e., intensity) and four PS-OCT-based metrics (i.e., phase difference between two channels (PD), phase retardation (PR), local phase retardation (LPR), and degree of polarization uniformity (DOPU)) led to a high overall accuracy of 93.5%. The high performance of the classifier suggests that PS-OCT is a promising modality for automated and robust diagnosis of breast cancer, particularly in the intraoperative setting.

2. Materials and methods

2.1 Tissue sample acquisition and preparation

A total of 41 human subjects undergoing surgery for either breast cancer (lumpectomy or mastectomy, all invasive ductal carcinoma (IDC)) or breast reduction surgery (healthy controls, no history of cancer) were recruited for this study. The study protocol was approved by the Institutional Review Boards (IRB) of both Carle Foundation Hospital in Urbana, IL and the University of Illinois at Urbana-Champaign. Tissue specimens from 17 subjects were imaged immediately after surgical resection by the PS-OCT system in the surgical pathology room adjacent to the operating room. The other 24 specimens, provided by the Cooperative Human Tissue Network (CHTN), were transported in saline, kept on ice, and imaged within 24 hours of initial surgical resection. After PS-OCT imaging, the imaged sites were marked with surgical ink and the tissue specimens were placed in formalin per standard pathologic protocol. After standard tissue processing and staining with hematoxylin and eosin (H&E) for histological evaluation, diagnoses were made by a board-certified pathologist, and the diagnostic results served as the gold-standard ground truth for breast tissue classification.

2.2 PS-OCT imaging

A portable custom-designed PS-OCT system, described in detail in our previous work [38], was employed for tissue imaging. Two coupled superluminescent diodes (SLDs) were used in this intraoperative PS-OCT system to cover a spectral range of 1200–1400 nm. The axial and transverse resolutions were 5 µm and 8 µm, respectively. Two spectrometers, each using a 2048-pixel line-scan camera, were used for interference signal detection, resulting in a 76 kHz A-scan rate, and the sensitivity of the system was 89 dB. To assess each specimen, one or multiple sites were imaged according to the size of the specimen. Each imaged site comprised 512 $\times$ 512 $\times$ 2048 pixels in the X, Y, and Z directions, respectively, corresponding to 2.8 $\times$ 2.8 $\times$ 4 mm or 4.2 $\times$ 4.2 $\times$ 4 mm. The scan length in the X and Y directions was determined by the galvo control voltage, which was set to either 4 V or 6 V for different sites, depending on the specimen size.

For each B-scan image, one standard OCT-based metric (i.e., intensity) and four polarization-based metrics (i.e., phase difference between two channels (PD), phase retardation (PR), local phase retardation (LPR), and degree of polarization uniformity (DOPU)) were calculated for the breast tissue classification. For our intraoperative PS-OCT system, the standard OCT intensity I is expressed as the total intensity of the two output channels [39]:

$$I={\left |E_{out,1} \right |}^{2}+{\left |E_{out,2} \right |}^{2},$$
where $\left |E_{out,1} \right |$ and $\left |E_{out,2} \right |$ are the signal amplitudes measured by the two spectrometers. The phase difference between the two output channels, PD, was calculated as
$$PD=\textrm{arg}(E_{out,2}\times E_{out,1}^{*}),$$
where “*" donates the complex conjugate and “arg" represents the argument.

The PS-OCT-enabled PR is represented by

$$PR=\textrm{arctan}(\left |E_{out,1} \right |/\left |E_{out,2} \right |),$$
and the local phase retardation LPR is represented by [17]
$$LPR=\left \| \frac{1}{\omega _{z}} \int_{-\omega _{z}/2}^{\omega_{z} /2}[m_{32}^{'}(z+\nu )\;\; m_{13}^{'}(z+\nu )\;\; m_{21}^{'}(z+\nu )]^{T}\,d\nu \right \|,$$
where $\mathbf {m}^{'}=\mathbf {m}-\mathbf {G}\cdot \mathbf {m}^{T}\cdot \mathbf {G}$, $\mathbf {m}$ is the matrix logarithm of retardation over a differential depth, $\omega _{z}$ is an axial distance, $\mathbf {G}=\textrm{diag}(1,-1,-1,-1)$ is the Minkowski matrix, and z is a given depth.

To obtain DOPU, the Stokes vectors are first calculated by [40]

$$S=\begin{bmatrix} I\\ Q\\ U\\ V \end{bmatrix}=\begin{bmatrix} E_{out,1}E_{{out,1}}^{*}+E_{out,2}E_{{out,2}}^{*}\\ E_{out,1}E_{{out,1}}^{*}-E_{out,2}E_{{out,2}}^{*}\\ 2\left |E_{out,1} \right |\left |E_{out,2} \right |\textrm{cos}(\textrm{arg}(E_{out,1}E_{out,2}^{*}))\\ 2\left |E_{out,1} \right |\left |E_{out,2} \right |\textrm{sin}(\textrm{arg}(E_{out,1}E_{out,2}^{*})) \end{bmatrix},$$
where the DOPU is then defined as
$$DOPU=\sqrt{\bar{Q}^{2}+\bar{U}^{2}+\bar{V}^{2}},$$
with
$$(\bar{Q},\bar{U},\bar{V})=\frac{1}{N}\left (\sum_{i}\frac{Q_{i}}{I_{i}},\sum_{i}\frac{U_{i}}{I_{i}},\sum_{i}\frac{V_{i}}{I_{i}}\right ),$$
where i indexes the N pixels within the spatial kernel over which DOPU is computed.
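The Stokes-vector and DOPU computations of Eqs. (5)-(7) can be sketched as follows; this is a minimal illustration in which the 5 × 5 averaging kernel and the use of `scipy.ndimage.uniform_filter` for the kernel mean are assumptions, since the paper does not state the kernel size:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def dopu(e_out1, e_out2, kernel=(5, 5)):
    """Degree of polarization uniformity from the two complex channels.
    Stokes components follow Eq. (5); DOPU follows Eqs. (6)-(7)."""
    i = np.abs(e_out1) ** 2 + np.abs(e_out2) ** 2
    q = np.abs(e_out1) ** 2 - np.abs(e_out2) ** 2
    cross = e_out1 * np.conj(e_out2)
    u = 2 * np.real(cross)   # = 2|E1||E2| cos(arg(E1 E2*))
    v = 2 * np.imag(cross)   # = 2|E1||E2| sin(arg(E1 E2*))
    # average the intensity-normalized components over the spatial kernel
    q_bar = uniform_filter(q / i, size=kernel)
    u_bar = uniform_filter(u / i, size=kernel)
    v_bar = uniform_filter(v / i, size=kernel)
    return np.sqrt(q_bar**2 + u_bar**2 + v_bar**2)
```

For a uniformly polarized field the kernel averages do not decorrelate, so DOPU approaches 1; depolarizing tissue drives it toward 0.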

2.3 Dataset construction

For multiple sites from the same sample, correlation analyses were conducted in order to ensure data independence and eliminate classification artifacts arising from images with high spatial proximity [38]. The final dataset is summarized in Table 1, noting that 2 subjects contributed to both the fibro-adipose and stroma datasets. Therefore, a total of 72 independent sites obtained from 41 patients were finally included in this study. Additionally, we imaged at evenly spaced intervals at each tissue site to obtain labeled images, but the number of valid labeled images varied significantly among sites depending on the size and shape of the specimen. To avoid biasing our dataset toward subjects/sites with more labeled B-scan images, only 10 images were selected from each site, 10 being the lowest number of labeled frames among all sites. For sites with more than 10 labeled images, we randomly selected 10 images for our dataset.


Table 1. Number of patients, sites, and PS-OCT images included in this study.

After obtaining raw data from the two PS-OCT detection channels, the 5 numerical metrics (I, PD, PR, LPR, and DOPU) described in Section 2.2 were calculated in MATLAB for every selected B-scan image. Therefore, 5 metric matrices of each B-scan image, rather than only the intensity matrix, were used for further feature extraction to take full advantage of the polarization information provided by the PS-OCT system. Figure 1 shows these five metric images along with representative H&E-stained histology results for the three tissue types. Thus, the total dataset consisted of 3,600 metric images (720 B-scan images $\times$ 5 metrics).


Fig. 1. Representative structural OCT images (a-1, b-1, c-1), local phase retardation (LPR) images (a-2, b-2, c-2), phase difference (PD) images (a-3, b-3, c-3), phase retardation (PR) images (a-4, b-4, c-4), degree of polarization uniformity (DOPU) images (a-5, b-5, c-5), and H&E-stained histology images (a-6, b-6, c-6) of malignant tumor (a-1 to a-6), fibro-adipose (b-1 to b-6), and stroma (c-1 to c-6) tissue, respectively. Scale bars represent 500 µm. The color scale unit for PD, PR, and LPR is radians; the color scale unit for DOPU is a.u.


2.4 Region of interest (ROI) selection

In OCT systems, the signal-to-noise ratio (SNR) decreases for deeper layers of a sample due to multiple scattering and overall attenuation in biological tissues. In order to build a model with high reliability, a region-of-interest (ROI) was selected for our dataset to exclude data with low SNR. The upper boundary of the ROI was automatically determined by a greedy algorithm for surface extraction [41]. The lower boundary was set a fixed number of pixels below the upper boundary; in this paper, 256 pixels was chosen empirically to exclude pixels with low SNR. The number of pixels is constant for every A-scan in this fixed ROI, which is beneficial for further feature extraction.
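The fixed-depth ROI step can be sketched as below. A simple per-column threshold stands in for the greedy surface-extraction algorithm of Ref. [41] (that algorithm is more robust to speckle); the function name and the threshold heuristic are assumptions:

```python
import numpy as np

def select_roi(bscan, depth=256, thresh=None):
    """Extract a fixed-depth ROI below the tissue surface of a B-scan.
    bscan: 2D array, depth (rows) x lateral position (columns).
    The first pixel brighter than `thresh` in each A-scan column is taken
    as the surface; the ROI is the next `depth` pixels below it."""
    if thresh is None:
        thresh = bscan.mean() + bscan.std()  # crude heuristic, an assumption
    roi = np.zeros((depth, bscan.shape[1]), dtype=bscan.dtype)
    for x in range(bscan.shape[1]):
        above = np.flatnonzero(bscan[:, x] > thresh)
        top = above[0] if above.size else 0   # first bright pixel = surface
        col = bscan[top:top + depth, x]
        roi[:col.size, x] = col               # zero-pad near the bottom edge
    return roi
```

Because every A-scan in the ROI has the same number of pixels, downstream features (polynomial fits, 8 × 32-pixel windows) can be computed with fixed array shapes.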

2.5 Feature extraction

For each selected B-scan image, feature extraction was performed using all 5 metrics: I, PD, PR, LPR, and DOPU. Features were extracted from both A-scan line data and B-scan image data to provide local and global information at the same time, where the A-scan line data were taken from the B-scan image data. A feature matrix was first built with all the extracted features to cover more image properties, and then a feature subset was selected by forward searching to achieve the highest accuracy and limit overfitting.

2.5.1 A-scan features

Two-dimensional median filters with 8-pixel and 20-pixel kernels in the lateral and axial directions, respectively, were applied to all the B-scan images after ROI selection. The filter size was chosen heuristically to suppress noise while preserving the structure of the breast tissue. In this study, the total image dataset contained 3,600 metric images, and each image comprised 512 A-scan lines, so extracting features from all A-scan lines would be time-consuming. While preserving the characteristics of the tissue structure, all the image data were down-sampled (retaining every eighth A-scan line) to accelerate A-scan feature extraction. Therefore, there were 64 A-scan lines in each down-sampled B-scan image.

Two groups of features, global and local, were calculated for each A-scan line. For global features, polynomial fitting was applied to all data in the ROI to capture the changing trend of the structures. The appropriate polynomial order is primarily determined by the tissue type and the selected metric. The honeycomb structure of fibro-adipose tissue usually introduced multiple peaks in the depth direction, which required a higher-order polynomial for better fitting. In addition, in contrast to the intensity metric, some metrics (e.g., PR) became larger for some tissue types at greater depths, which meant that linear regression was not suitable for fitting. In this study, the polynomial order was heuristically set to 5 for all five metrics. In total, 7 features (6 fitting coefficients and 1 fitting error) were extracted from each metric as A-scan global features, as shown in Fig. 2(b).
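The 7 global features per A-scan can be sketched as follows; the RMS definition of the fitting error is an assumption, since the paper does not specify the error norm:

```python
import numpy as np

def ascan_global_features(ascan, order=5):
    """7 global features per A-scan: the 6 coefficients of a 5th-order
    polynomial fit plus the residual fitting error (RMS, an assumption)."""
    z = np.arange(ascan.size, dtype=float)
    coeffs = np.polyfit(z, ascan, order)          # 6 coefficients for order 5
    fit = np.polyval(coeffs, z)
    err = np.sqrt(np.mean((ascan - fit) ** 2))    # RMS fitting error
    return np.concatenate([coeffs, [err]])        # length 7
```

The same call is applied to each metric's A-scan data, so one A-scan contributes 7 global features per metric.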


Fig. 2. A-scan line feature extraction. An original A-scan line data from one image metric is shown in (a). The red dashed line in (b) is the global fitting result after 5th-order polynomial fitting. (c) Short-range linear fitting results in 8 adjacent windows. Peaks and valleys are shown by the red line trace in (d).


Local features consisted of two types: short-range features and peak-valley features. For short-range features, as illustrated in Fig. 2(c), the A-scan line data in the ROI were split into 8 consecutive short-range subsets of 32 pixels each. Linear fitting was performed in each subset, providing 3 features (slope, intercept, and fitting error) capturing local tissue structure. Five statistical operations (mean, standard deviation (STD), mode, maximum, and minimum) were then performed across all 8 short-range subsets. Peak-valley features were previously proposed for identifying oral malignancy in the hamster cheek pouch [42]. This type of feature also had the potential to differentiate fibro-adipose from other tissue types due to the regular adipocyte boundaries. By performing morphological closing and opening with flat structuring elements of size 5 and 2 pixels, respectively, A-scan lines were filtered and then normalized. An H-maxima transformation was performed to suppress spurious peaks with normalized values less than 0.1. Local maxima and minima were defined as “peaks" and “valleys", as shown in Fig. 2(d). Seven features were calculated from all the detected peaks and valleys: (1) $\sum p_{i}$, (2) $\sum p_{i}-\sum v_{i}$, (3) $\sum p_{i}+\sum v_{i}$, (4) $A_{p_{1}}$: A-scan data value at $p_{1}$, (5) $A_{p_{2}}$: A-scan data value at $p_{2}$, (6) $A_{v_{1}}$: A-scan data value at $v_{1}$, (7) $A_{v_{2}}$: A-scan data value at $v_{2}$, where $p_{i}$ and $v_{i}$ represent the i-th peak and valley, respectively.
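The short-range portion of the local features (8 windows × 3 fit parameters → 5 statistics) can be sketched as below. A 256-pixel ROI is assumed, and the mode of the continuous fit values is approximated here by the median, an assumption on our part:

```python
import numpy as np

def short_range_features(ascan, n_windows=8):
    """Slope, intercept, and RMS fitting error of a linear fit in each of
    8 consecutive 32-pixel windows (Fig. 2(c)), followed by 5 statistics
    (mean, STD, mode ~ median here, max, min) across windows -> 15 features."""
    per_window = []
    for seg in np.split(np.asarray(ascan, dtype=float), n_windows):
        x = np.arange(seg.size)
        slope, intercept = np.polyfit(x, seg, 1)
        err = np.sqrt(np.mean((seg - (slope * x + intercept)) ** 2))
        per_window.append([slope, intercept, err])
    w = np.array(per_window)                       # shape (8, 3)
    stats = [w.mean(0), w.std(0), np.median(w, 0), w.max(0), w.min(0)]
    return np.concatenate(stats)                   # length 15 = 3 x 5
```

The peak-valley features would be computed analogously after morphological closing/opening and an H-maxima transform on each filtered A-scan.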

For all the extracted A-scan features, an intermediate feature matrix was built [43], and four statistics (mean, STD, minimum, and maximum) were calculated across all A-scan lines in the down-sampled B-scan images. In total, the proposed model extracted 116 A-scan features ((7 global features + 15 short-range features + 7 peak-valley features) $\times$ 4 statistics).

2.5.2 Texture features

Representing the local variations in one small region of an image, texture is a powerful quantitative metric for image analysis. Here, a statistical texture analysis technique based on the gray-level co-occurrence matrix (GLCM) method was implemented for texture feature extraction [44]. For L uniform gray levels, the GLCM is based on the estimation of the second-order joint conditional probability density function, $f(i,j|d,\theta )$, i.e., the probability that a pixel with gray-level value i is d pixels away from a pixel with gray-level value j in the $\theta$ direction. In our study, we first scaled the image data to eight uniform levels, then let d = 1, 2, 4, and 6, and $\theta$ = 0, 45, and 90 deg. Therefore, for a given distance d, an 8 $\times$ 8 matrix $s_{\theta }(i,j|d)$ was calculated from $f(i,j|d,\theta )$ for each $\theta$. For each combination of d and $\theta$, two features, energy and entropy, were calculated as per Eq. (8) and Eq. (9) [44],

$$energy=\sum_{i=0}^{L-1}\sum_{j=0}^{L-1}s_{\theta }(i,j|d)^{2},$$
$$entropy={-}\sum_{i=0}^{L-1}\sum_{j=0}^{L-1}s_{\theta }(i,j|d)\,\textrm{log}[s_{\theta }(i,j|d)],$$
where $s_{\theta }(i,j|d)$ is the $(i,j)$th element of the GLCM for distance d, contributing a total of 24 features (2 features $\times$ 3 directions $\times$ 4 distances).
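A minimal NumPy sketch of the GLCM energy and entropy for one (d, θ) combination follows; the row/column offset encoding of the direction is our convention, and the small epsilon guarding the gray-level scaling is an implementation detail not in the paper:

```python
import numpy as np

def glcm_energy_entropy(img, d=1, direction=(0, 1), levels=8):
    """Energy and entropy (Eqs. 8-9) from a gray-level co-occurrence matrix.
    direction=(0,1) is the 0-degree (horizontal) offset; (1,1) and (1,0)
    give 45 and 90 degrees for the same distance d."""
    # scale the image to `levels` uniform gray levels
    g = np.floor((img - img.min()) / (np.ptp(img) + 1e-12) * levels).astype(int)
    g = np.clip(g, 0, levels - 1)
    dy, dx = d * direction[0], d * direction[1]
    # pair each pixel with its neighbor at offset (dy, dx)
    a = g[max(0, -dy):g.shape[0] - max(0, dy), max(0, -dx):g.shape[1] - max(0, dx)]
    b = g[max(0, dy):, max(0, dx):][:a.shape[0], :a.shape[1]]
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a.ravel(), b.ravel()), 1)
    glcm /= glcm.sum()                              # joint probability s(i,j|d)
    energy = np.sum(glcm ** 2)
    nz = glcm[glcm > 0]                             # skip log(0) terms
    entropy = -np.sum(nz * np.log(nz))
    return energy, entropy
```

Looping this over the 4 distances and 3 directions yields the 24 texture features per metric image.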

2.5.3 Morphological features

Morphological analysis of OCT images has been reported for the discrimination of freshly-excised specimens of gastrointestinal tissue [45]. Region segmentation was first achieved by a k-means method [46], which partitions the observations of the image data into k clusters. Three steps are typically involved in the k-means method: initial k cluster centroids are chosen; point-to-cluster distances are computed for all observations and each observation is assigned to the closest cluster; and the average of the observations in each cluster is recalculated to obtain k new centroid locations. This process is repeated until the cluster assignments no longer change. In this study, each of the 5 metric images for one B-scan was segmented into two, three, and four clusters (i.e., the k-means method was performed three times), yielding 9 regions in total. Six features were then extracted per region, as follows:

$$mean:\overline{M}=\frac{1}{N}\sum_{i=1}^{N}M_{i},$$
$$normalized\: mean: \overline{M_{N}}=\frac{\overline{M}-\textrm{min}(M)}{\textrm{max}(M)-\textrm{min}(M)},$$
$$absolute\: deviation:\Delta M=\sum_{i=1}^{N}\left | M_{i} -\overline{M}\right |,$$
$$STD:\sigma _{M}=(\frac{1}{N}\sum_{i=1}^{N}(M_{i}-\overline{M})^{2})^{\frac{1}{2}},$$
$$skewness:S_{M}=\frac{\frac{1}{N}\sum_{i=1}^{N}(M_{i}-\overline{M})^{3}}{\sigma _{M}^{3}},$$
$$kurtosis:K_{M}=\frac{\frac{1}{N}\sum_{i=1}^{N}(M_{i}-\overline{M})^{4}}{\sigma _{M}^{4}}.$$

The term $M_i$ is the specific metric value of a pixel within the region, and N is the number of pixels in the region. Accordingly, the total number of morphological features is 54 (9 regions $\times$ 6 features) for each metric image.
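The morphological feature extraction (k-means segmentation into 2, 3, and 4 clusters, then Eqs. (10)-(15) per region) can be sketched with scikit-learn; the epsilon terms guarding zero-variance regions and the fixed `random_state` are our assumptions, not part of the original pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans

def morphological_features(metric_img, ks=(2, 3, 4)):
    """Per-region statistics (Eqs. 10-15) after k-means segmentation of one
    metric image into k = 2, 3, and 4 clusters -> 9 regions, 54 features."""
    pixels = metric_img.reshape(-1, 1)
    feats = []
    for k in ks:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
        for region in range(k):
            m = pixels[labels == region, 0]
            mean, std = m.mean(), m.std()
            feats += [
                mean,
                (mean - m.min()) / (m.max() - m.min() + 1e-12),  # normalized mean
                np.sum(np.abs(m - mean)),                        # absolute deviation
                std,
                np.mean((m - mean) ** 3) / (std ** 3 + 1e-12),   # skewness
                np.mean((m - mean) ** 4) / (std ** 4 + 1e-12),   # kurtosis
            ]
    return np.array(feats)                                       # length 54
```

Running this on all 5 metric images of one B-scan yields 270 of the 970 initial features (54 per metric).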

2.5.4 Feature set combination

After the feature extraction mentioned above, a total of 970 features were extracted for further classifier training and cross-validating, summarized in Table 2.


Table 2. Features list for each image metric data in the classifier.

2.5.5 Feature set fine selection

In order to improve the performance of the classifier and the computing efficiency, the best subset of the features was required for further classification and validation. First, features with the same values for B-scan data from all tissue sites were excluded, since they provided no useful information for classification; 754 features remained after this step. As the initial features were extracted to capture as many tissue properties as possible, some of the remaining 754 features may not be independent or may carry little information about the image labels. Features with large mutual dependency may each ensure a high accuracy when taken separately; however, their combination does not necessarily lead to better performance, because they do not provide independent information. Therefore, the minimal-redundancy-maximal-relevance (mRMR) criterion [47] was applied, giving a candidate list of features that contribute more relevant information about the labels and less correlation with other features. During mRMR-based feature selection, features were first selected based on the maximal relevance (Max-Relevance) criterion, in which the selected features were required, individually, to have the largest mutual information with the labels. Features selected by Max-Relevance alone are likely to have rich redundancy because of the large dependency among them. Therefore, the minimal redundancy (Min-Redundancy) condition was added to select mutually exclusive features. The resulting classifier built on this finely selected feature subset can achieve both high accuracy and faster speed.
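A greedy mRMR-style ranking can be sketched as below. Note one simplification flagged here as an assumption: the redundancy term is approximated by absolute Pearson correlation with the already-selected features, whereas the original criterion [47] uses mutual information for redundancy as well:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mrmr_rank(X, y, n_select):
    """Greedy mRMR ranking: at each step pick the feature maximizing
    relevance (mutual information with the labels) minus mean redundancy
    (here |Pearson correlation| with selected features, an assumption).
    Assumes constant-value features were already removed (as in the text),
    so correlations are well-defined."""
    relevance = mutual_info_classif(X, y, random_state=0)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]        # Max-Relevance start
    while len(selected) < n_select:
        remaining = [f for f in range(X.shape[1]) if f not in selected]
        scores = [relevance[f] - corr[f, selected].mean() for f in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected
```

The returned rank order is what the forward searching step of Section 2.6 would then traverse.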

2.6 Classification and validation

Supervised learning was employed here for classification, which uses a training set with known labels to predict labels for the testing set. A support vector machine (SVM) was used to develop the diagnostic model. As introduced in Section 2.5.5, high-rank features selected by mRMR were utilized for classifier training. The extracted features varied greatly in scale due to the different statistical operations involved, so all features were normalized to zero mean and unit variance to prevent some features from carrying disproportionate weight. By forward searching [48], the classification accuracy with different numbers of features can be obtained for the final decision on the feature subset. During the forward searching, the evaluation order was decided by the feature rank after mRMR feature selection. If several features gave the same overall accuracy in one evaluation step, the feature with the highest mRMR rank among them was selected as the best individual feature for that step, because it carried the highest relevance to the labels and the minimal redundancy with other features.
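The forward searching loop, including the rank-based tie-breaking described above, can be sketched generically; `evaluate` is a hypothetical callable returning the cross-validated overall accuracy for a feature subset (e.g., the LOSOCV loop below in Section 2.6):

```python
import numpy as np

def forward_search(ranked, evaluate):
    """Sequential forward selection over an mRMR-ranked candidate list.
    At each step, try adding each remaining candidate and keep the one
    maximizing accuracy; candidates are tried in rank order with a strict
    '>' comparison, so ties go to the higher mRMR rank."""
    selected, history = [], []
    remaining = list(ranked)              # already sorted by mRMR rank
    while remaining:
        best_feat, best_acc = None, -1.0
        for feat in remaining:            # rank order -> tie keeps higher rank
            acc = evaluate(selected + [feat])
            if acc > best_acc:
                best_feat, best_acc = feat, acc
        selected.append(best_feat)
        remaining.remove(best_feat)
        history.append(best_acc)
    k = int(np.argmax(history)) + 1       # prefix with the maximum accuracy
    return selected[:k], history[k - 1]
```

In the paper, this procedure returned the 17-feature subset with 93.5% overall accuracy.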

In order to accurately estimate the performance of the classifier, leave-one-site-out cross-validation (LOSOCV) was used to avoid potential correlations within a single tissue site. This approach eliminated the dependency of the classification result on the training/testing set split and reduced the risk of over-fitting. Based on the number of sites, 72 validation iterations were needed in this study. In each iteration, features from one tissue site were selected as the validation set, while features from the remaining 71 sites were employed as the training set. After all the cross-validation iterations, the accuracy, sensitivity, specificity, negative predictive value (NPV), and positive predictive value (PPV) were calculated for differentiating each diagnostic category from all others to evaluate the performance of our classifier.
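The LOSOCV scheme maps naturally onto grouped cross-validation in scikit-learn; in the sketch below the SVM kernel and hyperparameters are assumptions (the paper does not state them), while the per-fold z-scoring inside the pipeline mirrors the normalization described above:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def losocv_accuracy(X, y, sites):
    """Leave-one-site-out CV: each fold holds out every image from one
    tissue site (72 folds for 72 sites in the paper). Scaling is fit on
    the training fold only, avoiding leakage into the held-out site."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    pred = cross_val_predict(model, X, y, groups=sites, cv=LeaveOneGroupOut())
    return np.mean(pred == y)
```

The per-class sensitivity, specificity, NPV, and PPV reported in Table 4 follow from the pooled out-of-fold predictions in the same way.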

3. Results and discussion

The accuracy of the classifier for differentiating breast tissue mainly depends on the number of features in the subset selected by the mRMR criterion. The top 100 high-rank features (47 from intensity, 11 from PD, 12 from PR, 14 from LPR, and 16 from DOPU), as illustrated in Fig. 3(a), were first selected based on consideration of the required computing time. Validated by LOSOCV, the overall accuracy for this 100-feature classifier is 87.2%.


Fig. 3. Results for best feature set selection. For the PS-OCT-based classifier, (a) is the top 100 high-rank feature distribution sorted by mRMR rank, and (b) is the feature distribution after forward searching, where the best 17 features are marked in red. For the intensity-only classifier, (c) is the top 100 high-rank feature distribution sorted by mRMR rank, and (d) is the feature distribution after forward searching, where the best 10 features are marked in red. Colored regions represent features from degree of polarization uniformity (DOPU, purple), local phase retardation (LPR, blue), intensity (I, turquoise), phase difference (PD, dark yellow) and phase retardation (PR, yellow), respectively.


In order to select the feature subset with the best classification performance, the forward searching algorithm was implemented, with the overall accuracy calculated as the evaluation metric of classification performance. The feature selected in each forward searching step is shown in Fig. 3(b). The overall accuracy during the forward searching is illustrated as the blue line in Fig. 4(a), where the accuracy reaches a maximum of 93.5% when 17 features (marked in red in Fig. 3(b)) are selected for classifier training. There is no evidence that these 17 features are fully independent, but according to the mRMR procedure, they are mutually less redundant features that carry more relevant information about the labels.


Fig. 4. Prediction performance based on the classifier. (a) Overall accuracy during forward searching, using a model trained by 5-metrics-integrated PS-OCT features and intensity-only features, illustrated by the blue and orange lines, respectively. (b)-(d) Receiver operating characteristic (ROC) curves of the 5-metrics-integrated classifier with (b) malignant tumor, (c) fibro-adipose, and (d) stroma as the positive class, respectively.


The 17 features in the final classifier are summarized in Table 3. All 17 features were derived from the PS-OCT image metrics (PD: n=2, PR: n=6, LPR: n=3, DOPU: n=6), which further demonstrates that the polarization information provided by PS-OCT carried more information about the tissue type. Among the 4 PS-OCT metrics, PR and DOPU contributed most to increasing the overall accuracy of the classifier compared to the OCT intensity-only classifier. With 13 features, the morphological group made up the largest proportion of the final feature set, mainly because these features are more relevant to breast tissue classification and because they occupied a large proportion of the initial feature set (54/194).


Table 3. Number of final selected features in each category.

The prediction results for this 17-feature classifier are given in Table 4. Among the 240 malignant tumor tissue images, 222 were correctly classified, while 11 were misclassified as adipose and 7 as stroma. For tissue diagnosed as adipose, 221 of 240 were correctly classified, while 17 were misclassified as malignant tumor and 2 as stroma. Finally, for the stroma group, 230 of 240 were correctly classified as stroma, 9 were misclassified as malignant tumor, and 1 was misclassified as adipose. Based on these results, the sensitivity, specificity, negative predictive value (NPV), positive predictive value (PPV), and accuracy obtained for differentiating each breast tissue type from all others are listed in Table 4.


Table 4. Confusion matrix of model trained by integrated features from PS-OCT images with highest overall accuracy by forward searching.

To investigate the improvement in classification performance brought by the polarization-based information obtained with the PS-OCT system, only intensity data were used to build a simplified classifier for comparison. Feature selection by mRMR, as illustrated in Fig. 3(c), and forward searching were also performed for this intensity-only classifier, whose performance is illustrated by the orange line in Fig. 4(a). Trained on 10 features, as indicated by the red points in Fig. 3(d), the intensity-only classifier reached its best result with an overall accuracy of 82.9%. The receiver operating characteristic (ROC) curves achieved by the 17-feature integrated classifier are illustrated by the blue lines in Fig. 4(b)-(d), with malignant tumor (Fig. 4(b)), fibro-adipose (Fig. 4(c)), and stroma (Fig. 4(d)) as the positive classes, and area-under-curve (AUC) values of 0.98, 0.98, and 0.99, respectively. The ROC curves for the intensity-only classifier are illustrated by the orange lines in Fig. 4(b)-(d), with AUC values of 0.90, 0.92, and 0.97 for malignant tumor, fibro-adipose, and stroma, respectively. The differentiation model based on the integrated features thus has higher credibility than the intensity-only model. This demonstrates that more information about the tissue type can be obtained with the polarization data provided by a PS-OCT system, and with this extra information, the differentiation model will give a more accurate prediction for an unknown breast tissue type.

Some limitations of the current differentiation model, and further work for optimization, are to be noted. Although only 17 features were selected for the final model training, the initial feature extraction (20 s / PS-OCT image $\times$ 720 PS-OCT images = 4 hours) and the forward searching for the final subset (5 hours) were time-consuming. (Note: the feature extraction and forward searching were performed in MATLAB on a Windows desktop with an Intel Core i7 2.9 GHz CPU and 16 GB RAM.) Additionally, the rank of features determined by mRMR varied considerably for different training datasets, which introduces additional computational time and effort. From another perspective, however, this large initial feature set gives the classifier more degrees of freedom during training than a specific fixed-number feature extraction would, which also improves the prediction accuracy. The time-consuming procedure of extracting 970 features and forward searching was executed only during the training of the model. Once the final model is built, the tissue type classification for a new, unknown breast tissue specimen and image will be very quick (<1 s) because only 17 features need to be calculated, highlighting the potential for real-time clinical application in the future. Furthermore, if there is a need to accelerate the testing procedure, we can select only the first several features based on the forward searching result, after trading off testing time against overall accuracy.

In addition, because the current model is trained by a supervised learning method, the three tissue types included in the training set were quite homogeneous, with little or no area within each image belonging to other tissue types. These images were intentionally selected to ensure that the five numerical metrics computed for feature extraction corresponded to the labels used in classifier training. In clinical settings, the encountered tissues are more likely to be heterogeneous, with a mixture of malignant tumor and normal regions, as shown in Fig. 5, where only a very small region was malignant tumor while most of the image was stroma. From a clinical perspective, the histology result would identify this image as containing malignant tumor, as indicated by the red arrow in Fig. 5(f). However, it was misclassified as stroma because the malignant tumor region was very small relative to the dominant tissue type, and the current model performs only classification; image segmentation was not included in our algorithm at the present stage. In the near future, we will add a segmentation algorithm to our model to find important sub-ROIs. Sub-ROI selection can then be employed for heterogeneous tissue images, followed by calculating the numerical metrics for each sub-ROI and obtaining a prediction for each selected sub-ROI from the resulting features. An alternative approach would be to train the differentiation model with weakly-supervised or unsupervised methods, which is an ongoing project in our group. With such methods, not all training data needs to be labeled, and margin assessment in clinical applications could potentially be achieved with reliable accuracy.
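The sub-ROI idea above, in which a heterogeneous image is divided into windows that are classified individually, can be sketched as follows. Everything here is a placeholder: `classify_window` stands in for the trained 17-feature model (a simple intensity threshold is used instead), and the B-scan is a synthetic array with a small bright patch playing the role of a small tumor region.

```python
# Sketch of sub-ROI classification for a heterogeneous image
# (placeholder classifier and synthetic image; not the authors' model).
import numpy as np

def classify_window(window):
    # Hypothetical stand-in for the trained 17-feature model:
    # a simple mean-intensity threshold decides the window label.
    return "tumor" if window.mean() > 0.7 else "stroma"

def classify_image(bscan, win=64, step=32):
    labels = set()
    rows, cols = bscan.shape
    for r in range(0, rows - win + 1, step):
        for c in range(0, cols - win + 1, step):
            labels.add(classify_window(bscan[r:r + win, c:c + win]))
    # An image containing any tumor window is reported as tumor,
    # mirroring how histology would read a mixed specimen.
    return "tumor" if "tumor" in labels else "stroma"

img = np.full((256, 256), 0.2)
img[190:250, 190:250] = 0.9   # small bright patch standing in for tumor
print(classify_image(img))    # the small region is no longer outvoted
```

Because each window is labeled independently, a small malignant region can no longer be masked by the dominant tissue type, addressing the failure mode seen in Fig. 5.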

 figure: Fig. 5.

Fig. 5. Structural OCT (a), LPR (b), PD (c), PR (d), DOPU (e), and H&E-stained histology (f) images for a heterogeneous tissue region. Region indicated by the red ovals and arrows was identified histologically as malignant tumor.


Finally, all the images in this study were acquired ex-vivo after the tissues were surgically resected from the recruited subjects. Handheld probes for intraoperative OCT imaging of the in-vivo surgical resection bed and loco-regional lymph nodes have been demonstrated [20]. Efforts are currently underway to construct a handheld PS-OCT surgical probe that will be integrated with the intraoperative PS-OCT system for in-vivo imaging during malignant tumor resection and for real-time margin assessment using our tissue differentiation model.

4. Conclusions

In this work, a differentiation model for breast tissues (malignant tumor, fibro-adipose, stroma) was obtained based on information provided by a lab-built intraoperative PS-OCT system. A total of 72 sites from 41 human subjects were included in the training set, with 10 B-scan frames extracted from each site to compute five numerical metrics (I, PD, PR, LPR, DOPU) per frame. After feature subset selection by mRMR and forward searching, the SVM-trained classifier achieved an overall accuracy of 93.5$\%$ for differentiating malignant tumor, fibro-adipose, and stroma breast tissues. Compared with a classifier based only on intensity (as typically obtained from standard OCT systems), our multiple-metric integrated classifier showed better differentiation performance. This demonstrates that with the additional polarization information provided by a PS-OCT system, more information about the tissue can be acquired and utilized through multiple numerical metrics, illustrating the potential of PS-OCT for enhanced breast cancer detection. With the future integration of a handheld surgical imaging PS-OCT probe, real-time in-vivo margin assessment with machine-learning-based tissue differentiation is anticipated for surgical breast tumor procedures.
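The training scheme summarized above, an SVM validated by leave-one-site-out cross-validation (LOSOCV), can be sketched with scikit-learn's `LeaveOneGroupOut` splitter. This is a minimal illustration with synthetic stand-in features, not the authors' MATLAB code; the site counts (72 sites, 10 frames each) follow the paper, while the feature values are simulated.

```python
# Minimal LOSOCV sketch (synthetic features; not the authors' code):
# all 10 frames from a site stay together in train or test, preventing
# leakage between frames of the same specimen.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(2)
n_sites, frames_per_site = 72, 10
X = rng.normal(size=(n_sites * frames_per_site, 17))
y = np.repeat(rng.integers(0, 3, size=n_sites), frames_per_site)
groups = np.repeat(np.arange(n_sites), frames_per_site)   # site IDs
X += y[:, None] * 1.0            # make the synthetic classes separable

# One fold per site: train on 71 sites, test on the held-out site.
scores = cross_val_score(SVC(), X, y, groups=groups, cv=LeaveOneGroupOut())
print(f"LOSOCV accuracy over {len(scores)} folds: {scores.mean():.3f}")
```

Grouping by site is the key design choice: a plain random split would place frames of the same site in both train and test sets and inflate the reported accuracy.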

Funding

National Institutes of Health (1R01EB023232, 1R01CA213149).

Acknowledgments

The authors wish to thank the research and surgical nursing staff at Carle Foundation Hospital, Urbana, IL, for assisting with tissue resection, tissue handling, and histopathological evaluation, as well as the human subjects who participated in this research. We also thank the Cooperative Human Tissue Network (CHTN) for providing additional tissue specimens for this study. Additional information can be found at: https://biophotonics.illinois.edu.

Disclosures

S.A.B. is co-founder of Diagnostic Photonics, which is licensing intellectual property from the University of Illinois at Urbana-Champaign related to Interferometric Synthetic Aperture Microscopy for intraoperative imaging of cancer. All other authors declare that there are no conflicts of interest related to this article.

References

1. R. L. Siegel, K. D. Miller, H. E. Fuchs, and A. Jemal, “Cancer statistics, 2021,” CA: A Cancer J. for Clin. 71(1), 7–33 (2021). [CrossRef]  

2. U. Veronesi, N. Cascinelli, L. Mariani, M. Greco, R. Saccozzi, A. Luini, M. Aguilar, and E. Marubini, “Twenty-year follow-up of a randomized study comparing breast-conserving surgery with radical mastectomy for early breast cancer,” N. Engl. J. Med. 347(16), 1227–1232 (2002). [CrossRef]  

3. E. S. Hwang, D. Y. Lichtensztajn, S. L. Gomez, B. Fowble, and C. A. Clarke, “Survival after lumpectomy and mastectomy for early stage invasive breast cancer: the effect of age and hormone receptor status,” Cancer 119(7), 1402–1411 (2013). [CrossRef]  

4. J. C. Cendan, D. Coco, and E. M. Copeland, “Accuracy of intraoperative frozen-section analysis of breast cancer lumpectomy-bed margins,” J. Am. Coll. Surg. 201(2), 194–198 (2005). [CrossRef]  

5. E. K. Valdes, S. K. Boolbol, I. Ali, S. M. Feldman, and J. M. Cohen, “Intraoperative touch preparation cytology for margin assessment in breast-conservation surgery: does it work for lobular carcinoma?” Ann. Surg. Oncol. 14(10), 2940–2945 (2007). [CrossRef]  

6. F. Schnabel, S. Boolbol, M. Gittleman, T. Karni, L. Tafra, S. Feldman, A. Police, N. Friedman, S. Karlan, D. Holmes, S. Willey, M. Carmon, K. Fernandez, S. Akbari, J. Harness, M. D. L. Guerra, T. Frazier, K. Lane, R. Simmons, and T. Allweis, “A randomized prospective study of lumpectomy margin assessment with use of marginprobe in patients with nonpalpable breast malignancies,” Ann. Surg. Oncol. 21(5), 1589–1595 (2014). [CrossRef]  

7. M. Thill, “Marginprobe: intraoperative margin assessment during breast conserving surgery by using radiofrequency spectroscopy,” Expert Rev. Med. Devices 10(3), 301–315 (2013). [CrossRef]  

8. J. Q. Brown, T. M. Bydlon, S. A. Kennedy, M. L. Caldwell, J. E. Gallagher, M. Junker, L. G. Wilke, W. T. Barry, J. Geradts, and N. Ramanujam, “Optical spectral surveillance of breast tissue landscapes for detection of residual disease in breast tumor margins,” PLoS One 8(7), e69906 (2013). [CrossRef]  

9. B. J. Tromberg, A. Cerussi, N. Shah, and M. Compton, “Imaging in breast cancer: diffuse optics in breast cancer: detecting tumors in pre-menopausal women and monitoring neoadjuvant chemotherapy,” Breast Cancer Res. 7(6), 279–285 (2005). [CrossRef]  

10. S. Abeytunge, Y. Li, B. Larson, G. Peterson, and E. Seltzer, “Confocal microscopy with strip mosaicing for rapid imaging over large areas of excised tissue,” J. Biomed. Opt. 18(6), 061227 (2013). [CrossRef]  

11. M. D. Keller, E. Vargis, A. Mahadevan-Jansen, N. de Matos Granja, R. H. Wilson, M.-A. Mycek, and M. C. Kelley, “Development of a spatially offset Raman spectroscopy probe for breast tumor surgical margin evaluation,” J. Biomed. Opt. 16(7), 077006 (2011). [CrossRef]  

12. G. M. Palmer, C. Zhu, T. M. Breslin, F. Xu, K. W. Gilchrist, and N. Ramanujam, “Comparison of multiexcitation fluorescence and diffuse reflectance spectroscopy for the diagnosis of breast cancer,” IEEE Trans. Biomed. Eng. 50(11), 1233–1242 (2003). [CrossRef]  

13. S. A. Boppart, W. Luo, D. L. Marks, and K. W. Singletary, “Optical coherence tomography: feasibility for basic research and image-guided surgery of breast cancer,” Breast Cancer Res. Treat. 84(2), 85–97 (2004). [CrossRef]  

14. F. T. Nguyen, A. M. Zysk, E. J. Chaney, J. G. Kotynek, and S. A. Boppart, “Intraoperative evaluation of breast tumor margins with optical coherence tomography,” Cancer Res. 69(22), 8790–8796 (2009). [CrossRef]  

15. A. M. Zysk, K. Chen, E. Gabrielson, L. Tafra, E. A. May Gonzalez, J. K. Canner, E. B. Schneider, A. J. Cittadine, P. S. Carney, S. A. Boppart, K. Tshuchiya, K. Sawyer, and L. K. Jacobs, “Intraoperative assessment of final margins with a handheld optical imaging probe during breast-conserving surgery may reduce the reoperation rate: results of a multicenter study,” Ann. Surg. Oncol. 22(10), 3356–3362 (2015). [CrossRef]  

16. A. M. Zysk and S. A. Boppart, “Computational methods for analysis of human breast tumor tissue in optical coherence tomography images,” J. Biomed. Opt. 11(5), 054015 (2006). [CrossRef]  

17. M. Villiger, D. Lorenser, R. A. Mclaughlin, B. C. Quirk, R. W. Kirk, B. E. Bouma, and D. D. Sampson, “Deep tissue volume imaging of birefringence through fibre-optic needle probes for the delineation of breast tumour,” Sci. Rep. 6(1), 28771 (2016). [CrossRef]  

18. R. Ha, L. C. Friedlander, H. Hibshoosh, C. Hendon, S. Feldman, S. Ahn, H. Schmidt, M. K. Akens, M. Fitzmaurice, B. C. Wilson, and V. L. Mango, “Optical coherence tomography: a novel imaging method for post-lumpectomy breast margin assessment-a multi-reader study,” Academic Radiol. 25(3), 279–287 (2018). [CrossRef]  

19. H. Schmidt, C. Connolly, S. Jaffer, T. Oza, C. R. Weltz, E. R. Port, and A. Corben, “Evaluation of surgically excised breast tissue microstructure using wide-field optical coherence tomography,” Breast J. 26(5), 917–923 (2020). [CrossRef]  

20. S. J. Erickson-Bhatt, R. M. Nolan, N. D. Shemonski, S. G. Adie, J. Putney, D. Darga, D. T. McCormick, A. J. Cittadine, A. M. Zysk, M. Marjanovic, E. J. Chaney, G. L. Monroy, F. A. South, K. A. Cradock, Z. G. Liu, M. Sundaram, P. S. Ray, and S. A. Boppart, “Real-time imaging of the resection bed using a handheld probe to reduce incidence of microscopic positive margins in cancer surgery,” Cancer Res. 75(18), 3706–3712 (2015). [CrossRef]  

21. R. McLaughlin, B. Quirk, A. Curatolo, R. Kirk, L. Scolaro, D. Lorenser, P. Robbins, B. Wood, C. Saunders, and D. Sampson, “Imaging of breast cancer with optical coherence tomography needle probes: feasibility and initial results,” IEEE J. Sel. Top. Quantum Electron. 18(3), 1184–1191 (2012). [CrossRef]  

22. A. M. Zysk, F. T. Nguyen, E. J. Chaney, J. G. Kotynek, and S. A. Boppart, “Clinical feasibility of microscopically-guided breast needle biopsy using a fiber-optic probe with computer-aided detection,” Technol. Cancer Res. Treat. 8(5), 315–321 (2009). [CrossRef]  

23. Y. Liu, B. Hubbi, and X. Liu, “Single fiber OCT imager for breast tissue classification based on deep learning,” in Optical Fibers and Sensors for Medical Diagnostics and Treatment Applications XX, vol. 11233 (Proc. SPIE, 2020), pp. 114–119.

24. A. Butola, D. K. Prasad, A. Ahmad, V. Dubey, D. Qaiser, A. Srivastava, P. Senthilkumaran, B. S. Ahluwalia, and D. S. Mehta, “Deep learning architecture LightOCT for diagnostic decision support using optical coherence tomography images of biological samples,” Biomed. Opt. Express 11(9), 5017–5031 (2020). [CrossRef]  

25. X. Yao, Y. Gan, E. Chang, H. Hibshoosh, S. Feldman, and C. Hendon, “Visualization and tissue classification of human breast cancer images using ultrahigh-resolution OCT,” Lasers Surg. Med. 49(3), 258–269 (2017). [CrossRef]  

26. J. Wang, Y. Xu, and S. A. Boppart, “Review of optical coherence tomography in oncology,” J. Biomed. Opt. 22(12), 1–23 (2017). [CrossRef]  

27. V. Zaitsev, A. Matveev, L. Matveev, A. Sovetsky, M. Hepburn, A. Mowla, and B. Kennedy, “Strain and elasticity imaging in compression optical coherence elastography: The two-decade perspective and recent advances,” J. Biophotonics 14(2), e202000257 (2021). [CrossRef]  

28. B. F. Kennedy, R. A. McLaughlin, K. M. Kennedy, L. Chin, P. Wijesinghe, A. Curatolo, A. Tien, M. Ronald, B. Latham, C. M. Saunders, and D. D. Sampson, “Investigation of optical coherence microelastography as a method to visualize cancers in human breast tissue,” Cancer Res. 75(16), 3236–3245 (2015). [CrossRef]  

29. W. M. Allen, L. Chin, P. Wijesinghe, R. W. Kirk, B. Latham, D. D. Sampson, C. M. Saunders, and B. F. Kennedy, “Wide-field optical coherence micro-elastography for intraoperative assessment of human breast cancer margins,” Biomed. Opt. Express 7(10), 4139–4153 (2016). [CrossRef]  

30. K. M. Kennedy, R. Zilkens, W. M. Allen, K. Y. Foo, Q. Fang, L. Chin, R. W. Sanderson, J. Anstie, P. Wijesinghe, A. Curatolo, H. E. I. Tan, N. Morin, B. Kunjuraman, C. Yeomans, S. L. Chin, H. DeJong, K. Giles, B. F. Dessauvagie, B. Latham, C. M. Saunders, and B. F. Kennedy, “Diagnostic accuracy of quantitative micro-elastography for margin assessment in breast-conserving surgery,” Cancer Res. 80(8), 1773–1783 (2020). [CrossRef]  

31. Q. Fang, L. Frewer, R. Zilkens, B. Krajancich, A. Curatolo, L. Chin, K. Y. Foo, D. D. Lakhiani, R. W. Sanderson, P. Wijesinghe, J. D. Anstie, B. F. Dessauvagie, B. Latham, C. M. Saunders, and B. F. Kennedy, “Handheld volumetric manual compression-based quantitative microelastography,” J. Biophotonics 13(6), e201960196 (2020). [CrossRef]  

32. E. V. Gubarkova, A. A. Sovetsky, V. Y. Zaitsev, A. L. Matveyev, D. A. Vorontsov, M. A. Sirotkina, L. A. Matveev, A. A. Plekhanov, N. P. Pavlova, S. S. Kuznetsov, A. Y. Vorontsov, E. V. Zagaynova, and N. D. Gladkova, “OCT-elastography-based optical biopsy for breast cancer delineation and express assessment of morphological/molecular subtypes,” Biomed. Opt. Express 10(5), 2244–2263 (2019). [CrossRef]  

33. A. A. Plekhanov, M. A. Sirotkina, A. A. Sovetsky, E. V. Gubarkova, and V. Y. Zaitsev, “Histological validation of in vivo assessment of cancer tissue inhomogeneity and automated morphological segmentation enabled by optical coherence elastography,” Sci. Rep. 10(1), 11781 (2020). [CrossRef]  

34. M. A. Sirotkina, E. V. Gubarkova, A. A. Plekhanov, A. A. Sovetsky, V. V. Elagin, A. L. Matveyev, L. A. Matveev, S. S. Kuznetsov, E. V. Zagaynova, N. D. Gladkova, and V. Y. Zaitsev, “In vivo assessment of functional and morphological alterations in tumors under treatment using OCT-angiography combined with OCT-elastography,” Biomed. Opt. Express 11(3), 1365–1382 (2020). [CrossRef]  

35. E. V. Gubarkova, E. B. Kiseleva, M. A. Sirotkina, D. A. Vorontsov, and N. D. Gladkova, “Diagnostic accuracy of cross-polarization OCT and OCT-elastography for differentiation of breast cancer subtypes: Comparative study,” Diagnostics 10(12), 994 (2020). [CrossRef]  

36. A. Miyazawa, S. Makita, E. Li, K. Yamazaki, M. Kobayashi, S. Sakai, and Y. Yasuno, “Polarization-sensitive optical coherence elastography,” Biomed. Opt. Express 10(10), 5162–5181 (2019). [CrossRef]  

37. F. A. South, E. J. Chaney, M. Marjanovic, S. G. Adie, and S. A. Boppart, “Differentiation of ex vivo human breast tissue using polarization-sensitive optical coherence tomography,” Biomed. Opt. Express 5(10), 3417–3426 (2014). [CrossRef]  

38. J. Wang, Y. Xu, K. J. Mesa, F. A. South, E. J. Chaney, D. R. Spillman, R. Barkalifa, M. Marjanovic, P. S. Carney, A. M. Higham, Z. G. Liu, and S. A. Boppart, “Complementary use of polarization-sensitive and standard OCT metrics for enhanced intraoperative differentiation of breast cancer,” Biomed. Opt. Express 9(12), 6519–6528 (2018). [CrossRef]  

39. S. G. Adie, T. R. Hillman, and D. D. Sampson, “Detection of multiple scattering in optical coherence tomography using the spatial distribution of stokes vectors,” Opt. Express 15(26), 18033–18049 (2007). [CrossRef]  

40. M. J. Ju, Y. J. Hong, S. Makita, Y. Lim, K. Kurokawa, L. Duan, M. Miura, S. Tang, and Y. Yasuno, “Advanced multi-contrast jones matrix optical coherence tomography for doppler and polarization sensitive imaging,” Opt. Express 21(16), 19412–19436 (2013). [CrossRef]  

41. K. L. Lurie, R. Angst, and A. K. Ellerbee, “Automated mosaicing of feature-poor optical coherence tomography volumes with an integrated white light imaging system,” IEEE Trans. Biomed. Eng. 61(7), 2141–2153 (2014). [CrossRef]  

42. P. Pande, S. Shrestha, J. Park, M. J. Serafino, I. Gimenez-Conti, J. L. Brandon, Y. Cheng, B. E. Applegate, and J. A. Jo, “Automated classification of optical coherence tomography images for the diagnosis of oral malignancy in the hamster cheek pouch,” J. Biomed. Opt. 19(8), 086022 (2014). [CrossRef]  

43. T. Marvdashti, D. Lian, S. Z. Aasi, J. Y. Tang, and A. K. E. Bowden, “Classification of basal cell carcinoma in human skin using machine learning and quantitative features captured by polarization sensitive optical coherence tomography,” Biomed. Opt. Express 7(9), 3721–3735 (2016). [CrossRef]  

44. R. W. Conners and C. A. Harlow, “A theoretical comparison of texture algorithms,” IEEE Trans. Pattern Anal. Machine Intell. PAMI-2(3), 204–222 (1980). [CrossRef]  

45. P. B. Garcia-Allende, I. Amygdalos, H. Dhanapala, R. D. Goldin, and D. S. Elson, “Morphological analysis of optical coherence tomography images for automated classification of gastrointestinal tissues,” Biomed. Opt. Express 2(10), 2821–2836 (2011). [CrossRef]  

46. R. Gnanadesikan, Methods for Statistical Data Analysis of Multivariate Observations (Wiley, 1977).

47. H. Peng, F. Long, and C. Ding, “Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy,” IEEE Trans. Pattern Anal. Machine Intell. 27(8), 1226–1238 (2005). [CrossRef]  

48. J. Kittler, “Feature selection and extraction,” in Handbook of Pattern Recognition and Image Processing,(Elsevier, 1986), pp.59–83.



Figures (5)

Fig. 1.
Fig. 1. Representative structural OCT images (a-1, b-1, c-1), local phase retardation (LPR) images (a-2, b-2, c-2), phase difference (PD) images (a-3, b-3, c-3), phase retardation (PR) images (a-4, b-4, c-4), degree of polarization uniformity (DOPU) images (a-5, b-5, c-5) and H&E-stained histology images (a-6, b-6, c-6) of malignant tumor (a-1 to a-6), fibro-adipose (b-1 to b-6) and stroma (c-1 to c-6) tissue, respectively. Scale bars represent 500 µm. The color scale unit for PD, PR, and LPR is radian, the color scale unit for DOPU is a.u.
Fig. 2.
Fig. 2. A-scan line feature extraction. An original A-scan line data from one image metric is shown in (a). The red dashed line in (b) is the global fitting result after 5th-order polynomial fitting. (c) Short-range linear fitting results in 8 adjacent windows. Peaks and valleys are shown by the red line trace in (d).
Fig. 3.
Fig. 3. Results for best feature set selection. For the PS-OCT-based classifier, (a) is the top 100 high-rank feature distribution sorted by mRMR rank, and (b) is the feature distribution after forward searching, where the best 17 features are marked in red. For the intensity-only classifier, (c) is the top 100 high-rank feature distribution sorted by mRMR rank, and (d) is the feature distribution after forward searching, where the best 10 features are marked in red. Colored regions represent features from degree of polarization uniformity (DOPU, purple), local phase retardation (LPR, blue), intensity (I, turquoise), phase difference (PD, dark yellow) and phase retardation (PR, yellow), respectively.
Fig. 4.
Fig. 4. Prediction performance based on classifier. (a) Overall accuracy during forward searching, using a model trained by 5-metrics-integrated PS-OCT features and intensity-only features, illustrated by the blue and orange lines, respectively. (b)-(d) Receiver operating characteristic (ROC) curves of the 5-metrics-integrated classifier with (b) malignant tumor, (c) fibro-adipose, and (d) stroma as the positive class, respectively.
Fig. 5.
Fig. 5. Structural OCT (a), LPR (b), PD (c), PR (d), DOPU (e), and H&E-stained histology (f) images for a heterogeneous tissue region. Region indicated by the red ovals and arrows was identified histologically as malignant tumor.

Tables (4)

Table 1. Number of patients, sites, and PS-OCT images included in this study.

Table 2. List of features for each image metric in the classifier.

Table 3. Number of final selected features in each category.

Table 4. Confusion matrix of the model trained with integrated features from PS-OCT images, with the highest overall accuracy from forward searching.

Equations (15)

$$I = |E_{out,1}|^2 + |E_{out,2}|^2,$$

$$PD = \arg(E_{out,2} E_{out,1}^{*}),$$

$$PR = \arctan(|E_{out,1}| / |E_{out,2}|),$$

$$LPR = \frac{1}{\omega_z}\int_{-\omega_z/2}^{\omega_z/2} \left[ m_{32}(z+\nu) \quad m_{13}(z+\nu) \quad m_{21}(z+\nu) \right]^{T} d\nu,$$

$$S = \begin{bmatrix} I \\ Q \\ U \\ V \end{bmatrix} = \begin{bmatrix} E_{out,1}E_{out,1}^{*} + E_{out,2}E_{out,2}^{*} \\ E_{out,1}E_{out,1}^{*} - E_{out,2}E_{out,2}^{*} \\ 2|E_{out,1}||E_{out,2}|\cos\left(\arg(E_{out,1}E_{out,2}^{*})\right) \\ 2|E_{out,1}||E_{out,2}|\sin\left(\arg(E_{out,1}E_{out,2}^{*})\right) \end{bmatrix},$$

$$DOPU = \sqrt{\bar{Q}^2 + \bar{U}^2 + \bar{V}^2},$$

$$(\bar{Q}, \bar{U}, \bar{V}) = \left( \frac{1}{N}\sum_i \frac{Q_i}{I_i}, \; \frac{1}{N}\sum_i \frac{U_i}{I_i}, \; \frac{1}{N}\sum_i \frac{V_i}{I_i} \right),$$

$$\mathrm{energy} = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} s_{\theta}(i,j\,|\,d)^2,$$

$$\mathrm{entropy} = -\sum_{i=0}^{L-1} \sum_{j=0}^{L-1} s_{\theta}(i,j\,|\,d) \log\left[ s_{\theta}(i,j\,|\,d) \right],$$

$$\mathrm{mean:}\ \bar{M} = \frac{1}{N} \sum_{i=1}^{N} M_i,$$

$$\mathrm{normalized\ mean:}\ \bar{M}_N = \frac{\bar{M} - \min(M)}{\max(M) - \min(M)},$$

$$\mathrm{absolute\ deviation:}\ \Delta M = \sum_{i=1}^{N} |M_i - \bar{M}|,$$

$$\mathrm{STD:}\ \sigma_M = \left( \frac{1}{N} \sum_{i=1}^{N} (M_i - \bar{M})^2 \right)^{1/2},$$

$$\mathrm{skewness:}\ S_M = \frac{1}{N} \sum_{i=1}^{N} \frac{(M_i - \bar{M})^3}{\sigma_M^3},$$

$$\mathrm{kurtosis:}\ K_M = \frac{1}{N} \sum_{i=1}^{N} \frac{(M_i - \bar{M})^4}{\sigma_M^4}.$$
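The six A-scan statistical features defined above (mean, normalized mean, absolute deviation, STD, skewness, kurtosis) can be computed directly with NumPy. This is an illustrative sketch for a single metric vector M, not the authors' MATLAB feature-extraction code; the population (1/N) STD is used, matching the equations above.

```python
# Sketch: the six statistical features from the equations above,
# computed for one metric vector M (illustrative, not the authors' code).
import numpy as np

def stat_features(M):
    M = np.asarray(M, dtype=float)
    mean = M.mean()
    norm_mean = (mean - M.min()) / (M.max() - M.min())
    abs_dev = np.sum(np.abs(M - mean))        # sum of absolute deviations
    std = M.std()                             # population (1/N) STD
    skew = np.mean((M - mean) ** 3) / std ** 3
    kurt = np.mean((M - mean) ** 4) / std ** 4
    return mean, norm_mean, abs_dev, std, skew, kurt

M = np.array([0.1, 0.4, 0.35, 0.9, 0.5])      # toy metric values
print([round(v, 4) for v in stat_features(M)])
```

In the paper these statistics are computed per metric image (I, PD, PR, LPR, DOPU) as part of the 970-feature pool that mRMR and forward searching then prune to 17.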