
Multi-class classification of breast tissue using optical coherence tomography and attenuation imaging combined via deep learning


Abstract

We demonstrate a convolutional neural network (CNN) for multi-class breast tissue classification as adipose tissue, benign dense tissue, or malignant tissue, using multi-channel optical coherence tomography (OCT) and attenuation images, and a novel Matthews correlation coefficient (MCC)-based loss function that correlates more strongly with performance metrics than the commonly used cross-entropy loss. We hypothesized that using multi-channel images would increase tumor detection performance compared to using OCT alone. 5,804 images from 29 patients were used to fine-tune a pre-trained ResNet-18 network. Adding attenuation images to OCT images yields statistically significant improvements in several performance metrics, including benign dense tissue sensitivity (68.0% versus 59.6%), malignant tissue positive predictive value (PPV) (79.4% versus 75.5%), and total accuracy (85.4% versus 83.3%), indicating that the additional contrast from attenuation imaging is most beneficial for distinguishing between benign dense tissue and malignant tissue.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Breast cancer is the most commonly diagnosed cancer in the world, accounting for 11.7% of all new cancer cases in 2020, corresponding to over two million patients [1]. Breast-conserving surgery (BCS) is one of the main treatments for early-stage (I or II) breast cancer, with 61% of US patients receiving this treatment in 2016 [2]. However, the extent of a breast cancer can be difficult to identify intraoperatively, which contributes to high re-excision rates (15–25%) due to the detection of tumor within the surgical margin after histopathological analysis [3–7]. As this negatively impacts the physical and mental health of patients and increases healthcare costs [8–12], better tools for intraoperative margin assessment are needed.

Current intraoperative margin assessment techniques include gross examination, frozen section analysis, specimen radiography, and ultrasonography [13–15]. However, frozen section analysis requires a pathologist, significantly extends operation times, and produces lower-quality sections than paraffin-embedded sections [16–18], while gross examination, specimen radiography, and ultrasonography are associated with relatively low sensitivities (∼50–60%) [15,19]. Optical techniques such as fluorescence imaging, Raman spectroscopy, and optical coherence tomography (OCT) are emerging as promising alternatives [20–22]. In particular, OCT generates images from variations in backscattered light from different tissue types and benefits from rapid, high-resolution, three-dimensional (3-D) imaging to 1–2 mm depth in tissue [23–25]. While adipose tissue (fat) has a distinctive appearance due to low backscattering within adipose cells, highly scattering tissues, including benign dense tissue (which includes stroma, ducts, and lobules) and malignant tissue, often appear similar [26–28]. An important performance metric for OCT-based breast tissue classifiers is therefore their ability to differentiate highly scattering tissues.

While breast tissue classification in OCT images has been demonstrated by training human readers to identify texture features [23–26], human interpretation introduces inter-reader subjectivity and is time-consuming, which may present a considerable barrier to clinical use. Therefore, machine learning, and convolutional neural networks (CNNs) in particular, which excel at identifying image textures, have been investigated. However, the vast majority of these studies have focused on binary classification into malignant tissue and non-malignant tissue, without evaluating performance in benign dense tissue specifically [29–32]. Furthermore, the reported performances likely depend on the prevalence of adipose tissue (since a high prevalence will likely result in high apparent performance); however, this has not been investigated in detail. While Mojahed et al. [33] investigated classification into adipose tissue, stroma, invasive ductal carcinoma (IDC), and ductal carcinoma in situ (DCIS), they did not report the class sizes, and only reported the F1 score [the harmonic mean of the positive predictive value (PPV) and sensitivity] for each class, making comparison with other studies that report sensitivity and specificity challenging. Therefore, further investigation into the performance of CNNs for identifying adipose tissue, benign dense tissue, and malignant tissue in OCT images is required.

Several functional adaptations of OCT have been developed to provide additional contrast between benign dense tissue and malignant tissue [34,35], including polarization-sensitive OCT, which detects variations in tissue birefringence [28,36–39], optical coherence elastography (OCE), which detects variations in tissue mechanical properties [26,40–43], and attenuation imaging, which detects variations in the rate of decay of light as it travels through tissue, i.e., the attenuation coefficient [44–49]. A disadvantage of both polarization-sensitive OCT and OCE compared to attenuation imaging is that both require additional hardware, such as additional detectors for polarization-sensitive OCT [36] and a loading mechanism for OCE [41], and the acquisition of additional data, which increases cost, complexity, and acquisition time [36,50,51]. While CNNs have been applied to polarization-sensitive OCT images [28], attenuation imaging has not yet been investigated, even though it is particularly suitable due to the distinctive textures that it generates in tumor [44–48].

Another way to improve performance may be to use a loss function (also known as a cost, error, or objective function) that correlates strongly with performance metrics, such as accuracy (correct classifications divided by dataset size), sensitivity (true positives divided by condition positives), and specificity (true negatives divided by condition negatives) [52], since CNN training relies on the loss function to guide the adjustment of weights that improves prediction performance [53]. While the cross-entropy loss function is usually used for CNNs [53], it does not correlate strongly with these performance metrics (see Section 2.3). However, the metrics themselves are not suitable for constructing a loss function: accuracy can be misleading for imbalanced datasets, which are common in medical imaging, where one class (e.g., non-malignant tissue) heavily outweighs another (e.g., malignant tissue) [52], and sensitivity and specificity require two numbers per class to describe performance, which makes ranking classifiers difficult [54]. One alternative is the Matthews correlation coefficient (MCC), the correlation coefficient between the true and predicted classes, which has been proposed as a metric that is robust to class imbalance, yields a single number, and generalizes to multi-class classification [52,54]. Therefore, using an MCC-based loss function may improve performance; while this has been investigated for binary segmentation, it has not been generalized to multi-class classification [55].

In this paper, we develop a CNN for multi-class breast tissue classification as adipose tissue, benign dense tissue, or malignant tissue, using both the OCT intensity and the attenuation coefficient. 5,804 images from fresh mastectomy specimens from 29 patients were used for fine-tuning and evaluating a pre-trained ResNet-18 network [56]. Additionally, we demonstrate that cross-entropy loss does not correlate strongly with accuracy and the MCC, and introduce an MCC-based loss function for multi-class classification that exhibits much stronger correlation. Two networks, one trained with OCT images, and one trained with OCT and attenuation images (henceforth referred to as the OCT network, and the combined network, respectively), are compared, and a statistically significant (${p < 0.004}$) improvement in total accuracy is obtained for the combined network (85.4%) compared to the OCT network (83.3%). While both networks perform well in adipose tissue (accuracy >96%), the combined network is more accurate than the OCT network in benign dense tissue (87.0% versus 85.0%, ${p < 0.004}$) and malignant tissue (87.6% versus 85.5%, ${p < 0.002}$), indicating that while benign dense tissue and malignant tissue can be challenging to differentiate in OCT images, the additional contrast from attenuation imaging improves performance.

2. Materials and methods

2.1 Patient recruitment and imaging protocol

Twenty-nine female patients undergoing mastectomy at Fiona Stanley Hospital (Projects: RGS0000001530 and RGS000003726) in Western Australia were included in this study. Ethics approval was granted by the Sir Charles Gairdner and Osborne Park Health Care Group Human Research Ethics Committee (EC00271) for Project RGS0000001530 and the South Metropolitan Health Service Human Research Ethics Committee (EC00265) for Project RGS000003726. All participants provided informed consent. One fresh tissue specimen (∼20×30×5 mm³) was excised by a pathologist from each mastectomy specimen and scanned using OCT. Where possible, the specimen was selected to contain both benign and malignant tissue; however, in four cases where the tumor was particularly small, all malignant tissue was required for clinical purposes, and none could be spared for scanning. In these cases, a specimen containing only adipose tissue and benign dense tissue was scanned. After scanning, the specimen underwent standard histopathological processing, including inking to preserve orientation, placement in one or more cassettes, fixation in 10% neutral-buffered formalin for at least 24 hours, paraffin embedding, sectioning, and staining with hematoxylin and eosin. This resulted in histology images corresponding to the OCT en face plane, which were annotated by a pathologist who identified malignant tissue regions based on cellular morphology. Here, malignant tissue is defined as any tissue containing cancer cells, and includes IDC (20 specimens), DCIS (13 specimens), invasive lobular carcinoma (ILC; four specimens), lobular carcinoma in situ (LCIS; two specimens), and invasive mucinous carcinoma (one specimen). Specimens with malignant tissue typically contained more than one type; in particular, DCIS was always accompanied by either IDC or invasive mucinous carcinoma, and LCIS was always accompanied by ILC.

Specimens were scanned with a spectral-domain OCT system (TEL320C1, Thorlabs Inc., Newton, NJ, USA) using a superluminescent diode source with a central wavelength of 1,300 nm and a bandwidth of 200 nm, and an objective lens (LSM04; Thorlabs Inc.). The measured axial and lateral resolutions of this system are 5.5 µm (in air) and 13 µm, respectively. During scanning, the tissue specimen was placed on two orthogonal translation stages and a laboratory jack. The orthogonal translation stages enable a grid of up to 3×3 subvolumes to be acquired, which are stitched in post-processing to obtain a wide-field volume with a field-of-view of ∼45×45×3.5 mm³ in air [57]. The voxel size of the processed scans is 20×20×3.5 µm³ in air. While this undersamples the lateral resolution, it enables wide-field volumes to be acquired in 10 minutes, which allows the entire scanning process, including specimen transportation from surgery, sample selection with the pathologist, scanning, and placing the specimen in formalin, to be comfortably completed in one hour, thus minimizing the effect of tissue degradation on histological processing. During scanning, a clear, compliant silicone layer was placed on the specimen, and the specimen was raised using the laboratory jack until it was in contact with an imaging window fixed to an annular piezoelectric actuator. This was done to enable quantitative micro-elastography images to be generated [57]; however, these images were not used in this study.

One wide-field OCT volume was used per specimen, from which attenuation images were also generated. As detailed in Foo et al. [48], to enable attenuation imaging, the objective lens confocal function was characterized by imaging a suspension of 0.5 µm diameter polystyrene microspheres (Polybead 07307-15, Polysciences, Inc., Warrington, PA, USA) diluted in distilled water (3% v/v) prior to tissue imaging, but after the focus was set. Additionally, the sensitivity roll-off of the spectrometer was characterized by imaging a mirror at several depths and modeling the decrease in intensity with depth. After acquisition, the OCT spectral data were processed using a custom post-processing chain to generate 3-D volumes of OCT intensity and attenuation coefficient. The attenuation coefficient was calculated by first averaging the linear OCT signal-to-noise ratio (SNR) in the $xy$ (lateral) plane using a 100×100 µm² symmetric Gaussian kernel (${\sigma ={20}\,{\mathrm{\mu} \mathrm{m}}}$), then dividing the result by the confocal and sensitivity roll-off functions to obtain the corrected OCT signal. The slope with respect to depth, $z$, of the logarithm of the corrected OCT signal was then used to compute the attenuation coefficient at each lateral location. This slope was calculated using linear least-squares over an adaptive fitting range, which was determined by an algorithm that aimed to maximize the fitting range in homogeneous regions, subject to a minimum fitting range of 100 µm. Post-processing was performed on a desktop computer running MATLAB R2016a on Windows Server 2012 with two Intel Xeon E5-2690 octa-core central processing units (CPUs) and 192 GiB of random-access memory (RAM). An entire wide-field scan typically required at least ten hours of post-processing; however, this may be significantly reduced by optimizing the adaptive fitting range algorithm, or by using alternative methods to compute the attenuation coefficient, such as the depth-resolved method [58].
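To make the attenuation-coefficient calculation concrete, the following sketch (in Python/NumPy, the language used later for network training) outlines the main steps: lateral Gaussian averaging of the linear SNR, correction by the confocal and roll-off functions, and a linear least-squares fit of the log signal versus depth. The array layout, variable names, the fixed 100 µm fitting range (in place of the adaptive range described above), and the factor-of-two convention relating the fitted slope to the attenuation coefficient are assumptions for illustration; the published processing follows Foo et al. [48].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def attenuation_sketch(oct_snr_linear, confocal, rolloff,
                       dz_um=3.5, pixel_um=20.0, sigma_um=20.0, fit_range_um=100.0):
    """Illustrative attenuation-coefficient calculation (fixed fitting range).

    oct_snr_linear : (nz, nx, ny) linear OCT SNR volume (assumed layout)
    confocal, rolloff : (nz,) confocal and sensitivity roll-off functions
    Returns an (nx, ny) attenuation-coefficient image in 1/mm.
    """
    # Lateral averaging with a symmetric Gaussian kernel (sigma = 20 um).
    smoothed = gaussian_filter(oct_snr_linear,
                               sigma=(0, sigma_um / pixel_um, sigma_um / pixel_um))
    # Correct for the confocal function and spectrometer sensitivity roll-off.
    corrected = smoothed / (confocal * rolloff)[:, None, None]
    log_signal = np.log(corrected)
    # Linear least-squares slope of the log signal over a fixed 100 um range
    # (the published method adapts this range to tissue homogeneity).
    n_fit = int(round(fit_range_um / dz_um))
    z_mm = np.arange(n_fit) * dz_um * 1e-3
    slope = np.polyfit(z_mm, log_signal[:n_fit].reshape(n_fit, -1), 1)[0]
    # Assuming a round-trip single-scattering model, I(z) ~ exp(-2*mu*z).
    return (-slope / 2.0).reshape(oct_snr_linear.shape[1:])
```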

2.2 Data preparation and network training

To assemble the dataset, three en face depths at 180 µm, 200 µm and 220 µm below the tissue surface were extracted from each wide-field volume and divided into 2×2×0.04 mm3 sub-images (225–529 sub-images per wide-field scan, 12,239 sub-images in total). The OCT and attenuation images were represented as separate channels within each en face image. 191 sub-images were excluded due to a lack of available coregistered histology (${n = 7}$) and technical issues during image acquisition that resulted in one-third of one wide-field image being lost (${n = 184}$). The remaining sub-images were assigned to one class, either adipose tissue (${n = 3,048}$), benign dense tissue (${n = 1,144}$), malignant tissue (${n = 1,612}$), or background (i.e., no tissue; ${n = 6,244}$). As it would not have been feasible for a pathologist to label each of the 5,804 tissue sub-images individually, labeling was done by coregistering the en face wide-field OCT images at 200 µm depth with histology in the same plane that was annotated by a pathologist. Sub-images were labeled according to the class that constituted the largest fraction of the sub-images; therefore, all malignant tissue sub-images contained cancer cells. Finally, background sub-images were removed. The dataset construction methodology is summarized in Fig. 1(a).
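As an illustration of this sub-image extraction and labeling step, the sketch below tiles a coregistered wide-field en face image and its histology-derived class map into 100×100 px (2×2 mm at 20 µm pixel size) sub-images and assigns each the class occupying the largest fraction of the tile, discarding background tiles. The array layout and the integer class encoding (0 = background) are assumptions for illustration.

```python
import numpy as np

def extract_subimages(en_face, label_mask, tile_px=100):
    """Tile a wide-field en face image into sub-images labeled by majority class.

    en_face    : (C, H, W) image with OCT and attenuation channels
    label_mask : (H, W) integer class map from coregistered histology (0 = background)
    """
    subimages, labels = [], []
    n_rows, n_cols = label_mask.shape[0] // tile_px, label_mask.shape[1] // tile_px
    for r in range(n_rows):
        for c in range(n_cols):
            rows = slice(r * tile_px, (r + 1) * tile_px)
            cols = slice(c * tile_px, (c + 1) * tile_px)
            # Class occupying the largest fraction of this tile.
            majority = np.bincount(label_mask[rows, cols].ravel()).argmax()
            if majority == 0:
                continue                      # discard background sub-images
            subimages.append(en_face[:, rows, cols])
            labels.append(majority)
    return np.stack(subimages), np.array(labels)
```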


Fig. 1. (a) Overview of the methodology to construct and label the dataset. Labeled histology, OCT images, and attenuation images are obtained from each tissue specimen. The OCT and attenuation images are subdivided and labeled by coregistration with the labeled histology to construct the training, validation, and test sets. (b) Overview of the network training and evaluation methodology. The training and validation sets are used to train the network, and the test set is hidden from the network until evaluation to measure the network performance. Scale bars represent 10 mm in the wide-field images (no borders), and 1 mm in the magnified images (dashed borders).


To train and evaluate the CNN, the included sub-images were split into three independent sets for training, validation, and testing. The training set was used to update the network weights after each iteration using the backpropagation algorithm, the validation set was used to identify the best training hyperparameters (such as the initial learning rate, the learning rate schedule, and the number of epochs), and the test set was used to evaluate the final performance of the network, as illustrated in Fig. 1(b). To reduce selection bias resulting from the allocation of sub-images to the training, validation, and test sets, 10-fold cross-validation was used, in which the CNN was trained and evaluated for ten different splits of the dataset. Specifically, the 29 wide-field scans were divided into ten groups (nine groups of three and one group of two), which formed the test set in each round of cross-validation. For each test set, the remaining sub-images were then randomly allocated in an ${{80}{\%}/{20}{\%}}$ ratio to the training and validation sets, respectively, such that the proportion of adipose tissue, benign dense tissue, and malignant tissue sub-images in the training sets was equal to that in the validation sets. This method of splitting the test sets ensured all sub-images from the same tissue specimen (and the same patient) were in the same test set. Consequently, the CNN was never both trained and tested on the same tissue specimen or patient in the same round of cross-validation, thus removing the potential of overestimating the CNN performance due to intra-specimen or intra-patient correlations. The class sizes for all cross-validation groups are summarized in Table 1 and provided in full in Table S1 in Supplement 1.
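A minimal sketch of this specimen-level splitting procedure is shown below, using scikit-learn for the stratified 80%/20% train/validation split; the random grouping of scans and the variable names are assumptions for illustration (the actual group assignments are listed in Table S1 in Supplement 1).

```python
import numpy as np
from sklearn.model_selection import train_test_split

def cross_validation_splits(scan_ids, labels, n_groups=10, seed=0):
    """Yield (train, validation, test) sub-image indices for each CV group.

    scan_ids : (n_subimages,) wide-field scan index (0-28) of each sub-image
    labels   : (n_subimages,) tissue class of each sub-image
    """
    rng = np.random.default_rng(seed)
    scans = rng.permutation(np.unique(scan_ids))
    # 29 scans split into nine groups of three and one group of two.
    for test_scans in np.array_split(scans, n_groups):
        test_idx = np.flatnonzero(np.isin(scan_ids, test_scans))
        remaining = np.flatnonzero(~np.isin(scan_ids, test_scans))
        # 80%/20% train/validation split, stratified by tissue class.
        train_idx, val_idx = train_test_split(
            remaining, test_size=0.2, stratify=labels[remaining], random_state=seed)
        yield train_idx, val_idx, test_idx
```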


Table 1. Summary of the dataset. The training, validation, and test set sizes for each class are given as mean $\pm$ standard deviation across all ten cross-validation groups. The total class sizes are the same for all cross-validation groups.

To maximize visual contrast in the attenuation images, the square root of the attenuation channel was taken for all images (i.e., the attenuation channel was transformed to units of $\text {length}^{-1/2}$). The OCT channel was provided in units of decibels (dB). Both channels were then normalized by clipping the values at the 1st and 99th percentiles (−3.8 dB and 33.0 dB, respectively, for OCT, and 0.06 mm$^{-1/2}$ and 4.68 mm$^{-1/2}$, respectively, for attenuation imaging), then scaling linearly to the range ${[0,1]}$. The 1st and 99th percentiles were calculated using the full dataset (i.e., all 29 wide-field scans), such that the same normalization was used for all cross-validation groups. Dataset augmentation was applied during training to reduce overfitting to the training set and consisted of random rotations by 90-degree increments, random horizontal flips, and random depth selection (from the 180 µm, 200 µm, and 220 µm depths). While random rotations and flips are commonly used data augmentation techniques, to the authors’ knowledge, random depth selection has not previously been documented, possibly because it requires 3-D data. We used depths at 180 µm, 200 µm, and 220 µm because the tissue varied little over this depth range, such that corresponding sub-images at each depth contained the same tissue type, and therefore the same class labels could be used for all depths. However, small variations were present at different depths due to, for example, different noise realizations, different speckle realizations, or different arrangements of microarchitecture such as adipose cells. Additionally, as the range of selected depths (40 µm) is less than the attenuation fitting range (100 µm), attenuation images at these depths are likely more correlated than the corresponding OCT images. While this may reduce the efficacy of this dataset augmentation technique for attenuation imaging, there is likely still some benefit, as small variations between images are still present.
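The normalization and augmentation described above can be summarized by the following sketch; the stacking of the three depths into one array and the variable names are assumptions for illustration.

```python
import numpy as np

# Normalization limits: 1st and 99th percentiles over the full dataset.
OCT_LIMITS_DB = (-3.8, 33.0)        # OCT channel, dB
ATT_LIMITS = (0.06, 4.68)           # sqrt(attenuation) channel, mm^(-1/2)

def normalize(oct_db, attenuation):
    """Clip each channel at its percentile limits and scale linearly to [0, 1]."""
    att_sqrt = np.sqrt(attenuation)
    oct_n = (np.clip(oct_db, *OCT_LIMITS_DB) - OCT_LIMITS_DB[0]) / (OCT_LIMITS_DB[1] - OCT_LIMITS_DB[0])
    att_n = (np.clip(att_sqrt, *ATT_LIMITS) - ATT_LIMITS[0]) / (ATT_LIMITS[1] - ATT_LIMITS[0])
    return np.stack([oct_n, att_n])              # (2, H, W) two-channel sub-image

def augment(subimage_depths, rng):
    """Random depth selection, 90-degree rotation, and horizontal flip.

    subimage_depths : (3, 2, H, W) array of one sub-image at 180, 200, and 220 um
    rng             : numpy.random.Generator
    """
    img = subimage_depths[rng.integers(3)]       # random depth selection
    img = np.rot90(img, k=rng.integers(4), axes=(1, 2))
    if rng.integers(2):
        img = img[:, :, ::-1]                    # random horizontal flip
    return np.ascontiguousarray(img)
```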

Two ResNet-18 networks, the OCT network and the combined network, were trained to classify sub-images as adipose tissue, benign dense tissue, or malignant tissue. ResNet-18 is a CNN that is characterized by the inclusion of shortcut connections that enable layers to be bypassed, which facilitates the training of deep networks that contain many layers (e.g., 152 layers in the original implementation) [56]. Deep networks have been shown to enable high performance in complex image recognition tasks [59], and ResNets in particular have been successfully applied to histopathological image classification in a range of areas, such as for identifying cells infected by malaria [60], blood cell classification [61], and cancer detection in lymph node tissue [62], breast tissue [63], and colon tissue [64]. ResNet-18 is an 18-layer network that contains, in addition to an initial 7×7 convolutional layer and a fully-connected output layer, eight “building blocks”, which each contain two 3×3 convolutional layers and a shortcut connection. The full architecture is described by He et al. [56], and is illustrated in Fig. S1 in Supplement 1.

Transfer learning was applied by initializing the network weights using a pre-trained network from PyTorch 1.10.0 [65] trained on a subset of the ImageNet database that contains 1.4 million images of everyday objects in 1,000 categories, such as porcupine, Blenheim spaniel, and letter opener [66]. Despite the visual differences between the types of images in ImageNet and OCT images, transfer learning using networks that have been pre-trained using ImageNet is widely used in medical imaging, and has been shown to improve performance, especially for tasks using deep networks with limited datasets [67]. After initialization, the first convolutional layer was replaced to accept the required number of input channels (one or two), and the last fully connected layer was replaced to produce three outputs, representing the classes of adipose tissue, benign dense tissue, and malignant tissue. All network weights were allowed to update during training. Sub-images were upsampled using bilinear interpolation from 100×100 px² to 224×224 px² prior to input to ResNet-18 to match the image dimensions of the pre-trained network.
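A minimal sketch of these modifications, using the torchvision implementation of ResNet-18 (the helper function name is an assumption for illustration), is shown below.

```python
import torch
import torch.nn as nn
from torchvision import models

def build_network(in_channels=2, n_classes=3):
    """ResNet-18 pre-trained on ImageNet, adapted for 1- or 2-channel input
    and three output classes (adipose, benign dense, malignant tissue)."""
    net = models.resnet18(pretrained=True)     # torchvision API for PyTorch 1.10
    # Replace the first convolution to accept the required number of channels.
    net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3, bias=False)
    # Replace the fully connected output layer to produce three class scores.
    net.fc = nn.Linear(net.fc.in_features, n_classes)
    return net

# Sub-images are upsampled from 100x100 to 224x224 px before input, e.g.
# torch.nn.functional.interpolate(x, size=(224, 224), mode='bilinear').
```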

Training was done using the AdamW optimizer implemented in PyTorch with $(\beta _1, \beta _2) = (0.9, 0.999)$, a stepped learning schedule that decayed the learning rate starting from 3×10⁻⁴ by a factor of 0.316 after 50 and 100 epochs, and a minibatch size of 64. The number of training epochs was determined using an early stopping algorithm with a patience of 100 epochs, in which training continues until no improvement in the validation loss is seen for 100 epochs. This allows the network to achieve the minimum validation loss while also avoiding overfitting to the training set, thus acting as an efficient regularization method that improves the generalizability of the network [68, p. 240]. A novel MCC-based loss function for multi-class classification, described in Section 2.3, was used to generate the results in Section 3. Training and evaluation of all networks were performed on a cloud-based Linux system running Python 3.9.7 with seven Intel Xeon Cascade Lake 2.6 GHz virtual central processing units (vCPUs), 45 GB RAM, and an Nvidia V100 PCIe virtual graphics processing unit (vGPU) with 4 GiB of available memory, different from the system used for OCT and attenuation data processing. PyTorch 1.10.0 [65] and NumPy 1.21.2 [69] were used for array computations, SciPy 1.7.1 and scikit-learn 1.0.1 [70] were used for statistical analysis, and Matplotlib 3.4.3 [71] was used for generating plots. Training required ∼10 s per epoch, and inference (predicting the class) required ∼0.5 ms for one sub-image. As such, a typical wide-field image containing ∼500 sub-images could be classified in <300 ms, which is sufficiently fast for intraoperative application.
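The training procedure may be sketched as follows; the loop structure, function names, and maximum epoch count are illustrative assumptions, and the target encoding expected by loss_fn depends on the loss function used (one-hot targets for the MCC loss of Section 2.3).

```python
import copy
import torch

def train(net, train_loader, val_loader, loss_fn, max_epochs=1000, patience=100):
    """Sketch of the training loop: AdamW, stepped learning-rate schedule, and
    early stopping on the validation loss (patience of 100 epochs)."""
    optimizer = torch.optim.AdamW(net.parameters(), lr=3e-4, betas=(0.9, 0.999))
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[50, 100], gamma=0.316)
    best_loss, best_state, epochs_since_best = float('inf'), None, 0
    for epoch in range(max_epochs):
        net.train()
        for x, y in train_loader:                  # minibatches of 64 sub-images
            optimizer.zero_grad()
            loss_fn(net(x), y).backward()
            optimizer.step()
        scheduler.step()
        net.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(net(x), y).item() for x, y in val_loader)
        if val_loss < best_loss:                   # keep the best validation-loss weights
            best_loss, best_state, epochs_since_best = val_loss, copy.deepcopy(net.state_dict()), 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:      # early stopping
                break
    net.load_state_dict(best_state)
    return net
```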

2.3 MCC-based loss function

To train a CNN, the backpropagation algorithm is used to compute the gradient of neural network weights with respect to the loss function, which indicates how the network weights should be updated after each training iteration to reduce the loss [68, p. 197]. The loss function takes the network output and usually produces a value that should decrease as network performance increases, with a minimum of zero corresponding to perfect performance. In addition, the loss function must be differentiable with respect to the network weights to enable backpropagation. Appropriate choice of the loss function is crucial to the training of a CNN, as the training will attempt to minimize the loss function, and, hence, a loss function that poorly correlates to the “true” performance metrics of interest will produce inferior results to one with better correlation. In this section, we provide a brief review of the cross-entropy loss function, introduce the MCC loss function, and demonstrate that the MCC loss function exhibits stronger correlation with performance metrics than the cross-entropy loss function.

Cross-entropy loss is the most commonly used loss function for multi-class classification problems [53]. It can be calculated for a batch of $N$ samples labeled by one of $n$ classes by

$$\mathcal{L}_{{\mathit{CE}}} ={-}\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{n} w_j Y_{ij} \log \left(\tilde{Z}_{ij}\right),$$
where $\tilde{Z}_{ij}$ is an ${N \times n}$ matrix whose element in the $i$th row and $j$th column is the probability that the $i$th sample belongs to the $j$th class. Given the raw outputs of the network, $Z_{ij}$, an ${N \times n}$ matrix where each row contains the $n$-dimensional vector output from the network for a particular sample, $\tilde{Z}_{ij}$ is obtained by applying the softmax function, $S(\cdot )$, to normalize the network outputs to the range ${[0,1]}$ and ensure ${\sum _{j=1}^{n} \tilde {Z}_{ij} = 1\,\forall \, i}$ [53],
$$\tilde{Z}_{ij} = S\left(Z_{ij}\right) = \frac{\exp\left(Z_{ij}\right)}{\sum_{k=1}^{n}\exp\left(Z_{ik}\right)}.$$
$Y_{ij}$ is an ${N \times n}$ matrix in which each row is zero except in the column corresponding to the true class of the $i$th sample, where ${Y_{ij}=1}$. $w_j$ is the weight for the $j$th class, which may be used to improve model performance on classes with a small number of samples; one common strategy is to make the class weights inversely proportional to the class sizes [72]. Intuitively, cross-entropy loss enables machine learning models to train because the higher the probability the model assigns to the correct class for a particular sample, the lower the loss. However, it does not correlate strongly with accuracy and other performance metrics commonly used in evaluating diagnostic tests, as shown in Fig. 2(c) and Table 2.
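For reference, Eq. (1) with the softmax of Eq. (2) can be written directly in PyTorch as the following sketch (in practice a numerically stable log-softmax, as used internally by torch.nn.CrossEntropyLoss, would be preferred):

```python
import torch
import torch.nn.functional as F

def cross_entropy_loss(Z, Y, w):
    """Weighted cross-entropy loss of Eq. (1).

    Z : (N, n) raw network outputs; Y : (N, n) one-hot true classes; w : (n,) class weights.
    """
    Z_tilde = F.softmax(Z, dim=1)                   # Eq. (2): class probabilities
    return -(w * Y * torch.log(Z_tilde)).sum() / Z.shape[0]
```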


Table 2. Cross-validated comparison between the cross-entropy loss and the MCC loss in terms of the total accuracy and the MCC, evaluated on the validation set. All entries are given as mean $\pm$ standard deviation across all ten cross-validation groups.

While accuracy is a poor performance metric for imbalanced datasets, the MCC has been proposed as a more robust metric that is less sensitive to differences in class size [52,54]. We therefore construct a loss function based on the MCC in order to obtain a loss function that correlates strongly with classifier performance. The multi-class MCC is given by [73,74],

$$MCC = \frac{N C - \sum_{j=1}^{n}T_{j} P_{j}}{\sqrt{\left(N^{2} - \sum_{j=1}^{n} T_{j}^{2}\right)\left(N^{2} - \sum_{j=1}^{n} P_{j}^{2}\right)}},$$
where $C$ is the total number of samples correctly predicted, $T_j$ is the number of samples in class $j$, and $P_j$ is the number of times class $j$ is predicted. The MCC can be computed from the matrices $Y_{ij}$ and $\hat {Z}_{ij}$, where, as before, $Y_{ij}$ contains the true classes, and $\hat {Z}_{ij}$ is a similar matrix containing the predicted classes, via
$$C = \sum_{i=1}^{N} \sum_{j=1}^{n} Y_{ij} \hat{Z}_{ij}, $$
$$T_{j} = \sum_{i=1}^{N} Y_{ij}, $$
$$P_{j} = \sum_{i=1}^{N} \hat{Z}_{ij}. $$

The predictions $\hat {Z}_{ij}$ can be obtained from the probabilities $\tilde {Z}_{ij}$ by replacing the maximum value in each row of $\tilde {Z}_{ij}$ with a 1 and all other entries with a 0. However, as this operation is not differentiable as required for backpropagation, the MCC cannot be used directly to construct a loss function. Following the approach described in [55], we therefore propose a differentiable approximation of the multi-class MCC, which is computed by replacing the predictions $\hat {Z}_{ij}$ with the probabilities $\tilde {Z}_{ij}$ in the expressions for $C$ and $P_j$,

$$C = \sum_{i=1}^{N} \sum_{j=1}^{n} Y_{ij} \tilde{Z}_{ij}, $$
$$P_{j} = \sum_{i=1}^{N} \tilde{Z}_{ij}. $$

In Eq. (3), the denominator serves to scale the MCC to the range $[-1, 1]$. In the case where the denominator is zero, the numerator will also be zero, and the denominator can be arbitrarily set to one to produce a finite result and maintain differentiability [74]. Perfect classification corresponds to an MCC of 1. As training algorithms usually aim to minimize the loss, we define the MCC loss function as

$$\mathcal{L}_{MCC} = 1 - MCC.$$
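A sketch of this loss in PyTorch, implementing the differentiable approximation of the multi-class MCC described above, is given below; the tensor shapes and the handling of a zero denominator follow the description in this section. For integer class labels y, the one-hot matrix can be obtained with torch.nn.functional.one_hot(y, num_classes=3).float().

```python
import torch
import torch.nn.functional as F

def mcc_loss(Z, Y):
    """Differentiable multi-class MCC loss.

    Z : (N, n) raw network outputs; Y : (N, n) one-hot true classes.
    """
    Z_tilde = F.softmax(Z, dim=1)                 # predicted class probabilities
    N = Y.shape[0]
    C = (Y * Z_tilde).sum()                       # "soft" count of correct predictions
    T = Y.sum(dim=0)                              # number of samples in each class
    P = Z_tilde.sum(dim=0)                        # "soft" number of predictions per class
    numerator = N * C - (T * P).sum()
    denominator = torch.sqrt((N**2 - (T**2).sum()) * (N**2 - (P**2).sum()))
    # A zero denominator implies a zero numerator; set it to one to keep the
    # result finite and differentiable, as described above.
    denominator = torch.where(denominator == 0, torch.ones_like(denominator), denominator)
    return 1.0 - numerator / denominator
```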

Figure 2 shows a comparison between the cross-entropy and MCC loss functions, using the same training and validation sets as in Group 5 of Table S1. A summary of the results across all the groups in Table S1 is provided in Table 2. The networks using the cross-entropy loss function and the MCC loss function were trained using OCT and attenuation images and the same hyperparameters stated in Section 2.2, except for the initial learning rate, which was set to 0.03 for the cross-entropy loss function, as this resulted in better performance for this loss function. Additionally, for the cross-entropy loss function, classes were weighted in inverse proportion to the combined training and validation set class sizes for the adipose tissue, benign dense tissue, and malignant tissue classes, as is commonly done for imbalanced classes [72], resulting in weights of 0.54, 1.44, and 1.02, respectively.


Fig. 2. The validation set results for cross-validation Group 5. The results at each epoch are shown for a network trained using (a) the cross-entropy loss function, and (b) the MCC loss function, including the total accuracy, MCC, and corresponding validation and training loss functions. Purple, green, and orange triangles indicate the maximum total accuracy, maximum MCC, and minimum validation loss, respectively, across all epochs. The total accuracy and MCC are plotted against the (c) cross-entropy loss function, and (d) MCC loss function, with the Spearman’s rank correlation coefficient calculated using all epochs (large dots), and calculated using only early epochs before overfitting occurs (small dots), shown in the legends.


Figures 2(a) and 2(b) show the validation set results for a network trained using the cross-entropy loss function and the MCC loss function, respectively, including the total accuracy (that is, the total correct predictions across all classes divided by the dataset size), MCC, and corresponding validation and training loss functions. Figure 2(a) shows that when the cross-entropy loss function is used, the validation loss starts increasing rapidly after Epoch 101 due to overfitting to the training set; however, this does not correspond to a substantial decrease in performance, as indicated by the relatively stable validation accuracy and MCC. The poor correlation between the cross-entropy loss and model performance is further illustrated in Fig. 2(c), and can be quantified using Spearman’s rank correlation coefficient, $\rho$, where ${\rho = 1}$ indicates identical ranking, ${\rho = -1}$ indicates opposite ranking, and ${\rho \approx 0}$ indicates uncorrelated rankings [75]. Using this metric, the mean $\pm$ standard deviation correlation across all cross-validation groups was ${\rho = -0.596 }\pm 0.097$ between cross-entropy loss and accuracy, and ${\rho = -0.623 }\pm 0.099$ between cross-entropy loss and the MCC, as shown in Table 2, indicating poor correlation. Furthermore, for Group 5, the network weights that produce the minimum validation loss (${\text {loss} = 0.352}$ at Epoch 101) do not coincide with the network weights that produce the highest performance (${\text {accuracy} = {87.7}{\% }}$ at Epoch 108 and $MCC = 0.799$ at Epoch 122), illustrating a disconnect between the cross-entropy loss function and the evaluation metric. This is particularly problematic since a common practice in machine learning is to retain the model with the lowest validation loss, under the assumption that the lowest loss corresponds to the best performance [68, p. 239]. Across all cross-validation groups, the total accuracy at the epoch with the minimum loss was 0.6±0.3% lower than the maximum total accuracy across all epochs, and the MCC at the epoch with the minimum loss was 0.009±0.005 lower than the maximum MCC across all epochs.
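The correlation values quoted here can be computed from the per-epoch validation curves with, for example, scipy.stats.spearmanr; a minimal sketch (with hypothetical argument names) follows.

```python
import numpy as np
from scipy.stats import spearmanr

def loss_metric_correlation(loss_per_epoch, metric_per_epoch):
    """Spearman's rank correlation between the per-epoch validation loss and a
    performance metric (accuracy or MCC); values near -1 indicate that ranking
    epochs by loss closely (inversely) matches ranking them by the metric."""
    rho, _ = spearmanr(np.asarray(loss_per_epoch), np.asarray(metric_per_epoch))
    return rho
```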

Figure 2(b) shows that the proposed MCC loss function provides a better indication of the network accuracy and MCC, and that the minimum MCC loss corresponds to the maximum accuracy and MCC. The improved correlation between the MCC loss function and the network performance is illustrated in Fig. 2(d) and Table 2, which shows ${\rho \approx -0.97 }\pm 0.01$ between the MCC loss and both accuracy and the MCC across all cross-validation groups. This improved correlation is observed whether we include all epochs, or restrict the analysis to only early epochs before overfitting occurs, as shown for Group 5 in Fig. 2(c) and 2(d). For Group 5, the best accuracy (89.8%) and MCC (0.833) also coincide with the minimum loss (0.172) at Epoch 59, and are higher than the accuracy and MCC of the network trained using the cross-entropy loss function. Across all cross-validation groups, the total accuracy at the epoch with the minimum loss was 0.2±0.1% lower than the maximum total accuracy across all epochs, and the MCC at the epoch with the minimum loss was 0.003±0.003 lower than the maximum MCC across all epochs, which is a smaller decrease than when using the cross-entropy loss. The improved correlation between the loss function and the performance metrics may have contributed to the improved total accuracy (90.1±0.8%) and MCC (0.837±0.014) at the minimum loss obtained using the MCC loss function, compared to a total accuracy of 88.2±1.0% and MCC of 0.809±0.017 obtained using the cross-entropy loss function.

3. Results

Table 3 summarizes the classification performance for both the OCT network and the combined network on the test sets of each cross-validation group, and presents the full confusion matrix as well as several performance metrics evaluated per class [sensitivity, specificity, PPV, negative predictive value (NPV), and accuracy] and across all classes (total accuracy and MCC). Both networks were trained using the MCC loss function and the hyperparameters detailed in Section 2.2. To calculate the performance metrics for a particular class, a one-versus-rest approach is used, where a binary confusion matrix is constructed for each class by merging the other classes, and the performance metrics are then computed using the standard formulae, e.g., for malignant tissue, the performance metrics are calculated for detecting malignant tissue versus non-malignant tissue (i.e., adipose tissue and benign dense tissue).
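A sketch of this one-versus-rest calculation from a multi-class confusion matrix is shown below (rows taken as true classes and columns as predicted classes, an assumed convention).

```python
import numpy as np

def one_vs_rest_metrics(confusion, j):
    """Per-class metrics from an (n, n) confusion matrix for class index j."""
    confusion = np.asarray(confusion, dtype=float)
    tp = confusion[j, j]
    fn = confusion[j, :].sum() - tp               # class j predicted as another class
    fp = confusion[:, j].sum() - tp               # other classes predicted as class j
    tn = confusion.sum() - tp - fn - fp
    return {
        'sensitivity': tp / (tp + fn),
        'specificity': tn / (tn + fp),
        'ppv':         tp / (tp + fp),
        'npv':         tn / (tn + fn),
        'accuracy':    (tp + tn) / confusion.sum(),
    }
```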


Table 3. Confusion matrix and classification performance of the OCT network and the combined network on the test sets. The confusion matrix entries are the sum of the confusion matrices from each cross-validation group and the percentage of the true class. The sensitivity, specificity, PPV, NPV, and accuracy for each class, and the total accuracy and MCC across all classes, are the mean $\pm$ standard deviation across all cross-validation groups. Bold indicates the best network, and asterisks indicate a statistically significant difference between the networks, for corresponding metrics.

The total accuracy and MCC were 83.3±6.6% and 0.734±0.072, respectively, for the OCT network, and 85.4±7.4% and 0.766±0.088, respectively, for the combined network. While it may initially appear that there is no significant difference between the two networks based on the standard deviations, a closer look at the performance in each cross-validation group (Fig. 3) shows that while the variation between groups is large (resulting in the large standard deviations), the combined network performs better than the OCT network in nine out of ten groups. The effect of adding attenuation images to OCT images on CNN performance can be statistically tested using the Wilcoxon signed-rank test, which tests if changing a parameter (in our case, adding attenuation images) corresponds to a shift in the location (i.e., median) of a population (e.g., accuracy measurements), given measurements from both before the change (i.e., from the OCT network) and after the change (i.e., from the combined network) [76, pp. 39–55]. Applying this test indicates that the difference between the OCT network and the combined network is statistically significant (${\alpha = 0.05}$) for both total accuracy (${p < 0.004}$) and the MCC (${p < 0.028}$). Similarly, the two-sided Wilcoxon signed-rank test can be applied to the performance metrics for each class, with statistically significant differences indicated by asterisks in Table 3. The $p$-values for each of these tests are provided in Table S5 in Supplement 1. While both networks achieve similarly high performance in adipose tissue (>95% across all metrics), the combined network achieves significantly better sensitivity, NPV, and accuracy in benign dense tissue (${p < 0.011}$, ${p < 0.006}$, and ${p < 0.004}$, respectively), and specificity, PPV, and accuracy in malignant tissue (${p < 0.028}$, ${p < 0.014}$, and ${p < 0.002}$, respectively). This suggests that while OCT can accurately differentiate between adipose tissue and dense tissue, adding attenuation imaging increases the contrast between benign dense tissue and malignant tissue, resulting in the improved performance in these classes.
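The paired test can be performed with scipy.stats.wilcoxon on the ten per-group metric values of each network, for example as in the sketch below (function and argument names are assumptions).

```python
from scipy.stats import wilcoxon

def compare_networks(metric_oct, metric_combined):
    """Two-sided Wilcoxon signed-rank test on paired per-group metrics
    (e.g., the ten cross-validation accuracies of the two networks)."""
    _, p_value = wilcoxon(metric_combined, metric_oct, alternative='two-sided')
    return p_value
```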


Fig. 3. (a) Total accuracy and (b) MCC across all classes for each cross-validation group for both the OCT network and the combined network.


In addition to the OCT network and combined network shown here, a third network that was trained only with attenuation images, referred to as the attenuation network, was also investigated to determine if the improved performance of the combined network over the OCT network may be achieved using attenuation images alone. The results, discussed in Section 3 of Supplement 1, indicate that the attenuation network performs worse than the combined network. This may be because attenuation imaging provides better contrast than OCT in malignant tissue but poorer contrast in adipose tissue, possibly due to the low signal from the interiors of adipose cells, which results in higher noise. Therefore, the following results will focus on the performance of the combined network relative to that of the OCT network.

Figure 3 shows that while similar performance is achieved across most cross-validation groups, Group 4 and Group 10 appear to perform below average. This is because the test sets for these groups contain a lower proportion of easy-to-classify adipose tissue sub-images (39.2% for Group 4 and 34.4% for Group 10, compared to an average of 52.7% across all groups; see Table S1 in Supplement 1), and, consequently, a higher proportion of hard-to-classify benign dense tissue and malignant tissue sub-images. While ideally all test sets should contain exactly the same proportion of each tissue class, some variation is necessary because all sub-images from the same specimen were required to be in the same test set (as discussed in Section 2.2). Another factor that potentially contributed to the lower performance of Group 10 may have been the above-average number of benign dense tissue sub-images containing lobules in this test set, which, compared to sub-images containing only stroma, might have been more difficult to discern from malignant tissue. It is also worth noting that the groups containing the relatively rare malignant subtypes, ILC, LCIS, and invasive mucinous carcinoma (Groups 3, 5, 6, and 7), had similar performance to groups containing only IDC and/or DCIS (other than Groups 4 and 10), suggesting that all malignant subtypes can be differentiated from adipose tissue and benign dense tissue in OCT and attenuation images. However, a thorough analysis of the ability of deep learning to identify specific malignant subtypes is outside the scope of this study, and more specimens containing rare malignant subtypes will need to be imaged to address this.

Figure 4 shows the receiver operating characteristic (ROC) curves for both the OCT network and the combined network for classifying adipose tissue, benign dense tissue, and malignant tissue. ROC curves show the trade-off between sensitivity and specificity, and are often summarized by computing the area under the curve (AUC). A perfect classifier would extend to the top-left corner and have an ${AUC = 1}$. Here, the mean ROC curves, averaged across all cross-validation groups, are indicated by solid lines, and the standard deviation around the mean at each point on each curve is indicated by the shaded region. The mean and standard deviations are computed by vertical averaging, that is, the ROC curves from each cross-validation group are first resampled by linear interpolation to 1,001 linearly spaced points in specificity between 0 and 1, and the mean and standard deviation are computed over all cross-validation groups at each specificity [77]. The AUC mean and standard deviation are computed from the ROC curves from each cross-validation group prior to interpolation.
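Vertical averaging can be sketched as follows; the representation of each group's ROC curve as paired specificity/sensitivity arrays is an assumption for illustration.

```python
import numpy as np

def vertical_average_roc(roc_curves, n_points=1001):
    """Vertical averaging of ROC curves from each cross-validation group.

    roc_curves : list of (specificity, sensitivity) array pairs, one per group.
    Returns the specificity grid and the mean and standard deviation of the
    sensitivity at each grid point.
    """
    grid = np.linspace(0.0, 1.0, n_points)
    resampled = []
    for specificity, sensitivity in roc_curves:
        order = np.argsort(specificity)           # np.interp requires increasing x
        resampled.append(np.interp(grid, specificity[order], sensitivity[order]))
    resampled = np.stack(resampled)
    return grid, resampled.mean(axis=0), resampled.std(axis=0)
```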


Fig. 4. ROC curves for the OCT network and the combined network for classifying (a) adipose tissue, (b) benign dense tissue, and (c) malignant tissue. The solid lines indicate the mean curve over all ten cross-validation groups, and the shaded regions indicate $\pm$ one standard deviation. The areas under the curves are given as mean $\pm$ standard deviation in the legends.


Figure 4(a) shows that the ROC curves for both networks closely approach the top-left corner and have an ${AUC > 0.99}$, reaffirming that both networks can identify adipose tissue with high accuracy. A larger difference is seen between the ROC curves for benign dense tissue [Fig. 4(b)] and malignant tissue [Fig. 4(c)]. In both cases, the combined network achieves better performance, as indicated by the larger AUCs, which supports the hypothesis that attenuation imaging is particularly useful for differentiating between benign dense tissue and malignant tissue. As before, while there is a large standard deviation in the AUCs due to differences between the datasets in each cross-validation group, using the two-sided Wilcoxon signed-rank test, a significant difference can be found between the AUCs for the two networks in benign dense tissue (${p < 0.049}$) and malignant tissue (${p < 0.028}$). No significant difference was found at the ${\alpha = 0.05}$ level for adipose tissue (${p < 0.131}$).

Figure 5 shows an example of a wide-field image containing adipose tissue on the right, benign dense tissue in the bottom-left, and malignant tissue (IDC) in the top-left. From the OCT image in Fig. 5(a), the adipose tissue is readily identifiable since the combination of low scattering from the lipid-filled adipose cell interiors and high scattering from the cell membranes generates a distinct honeycomb-like texture. While benign dense tissue and malignant tissue initially appear similar in the OCT image, closer inspection reveals that the texture of the malignant tissue appears more heterogeneous than that of the benign dense tissue. This difference in texture may occur because while benign dense tissue tends to be laid in organized strands, cancer disrupts these regular growth patterns, as has been noted in a polarization-sensitive OCT study of breast tissue [38]. The difference in texture between benign dense tissue and malignant tissue is enhanced in the attenuation image shown in Fig. 5(b).


Fig. 5. An example of a wide-field (a) OCT image and (b) attenuation image. The (c) OCT network performance and (d) combined network performance, for the cross-validation group where this wide-field sample was in the test set, are also shown overlaid on the OCT and attenuation images, respectively. For each sub-image, semi-transparent colored overlays indicate the predicted class, and solid colored borders indicate the true class where the prediction is incorrect. The true class was determined by co-registration with histopathology that was annotated by a pathologist. Magnified images show a malignant tissue sub-image (dashed cyan square) in (e) OCT and (f) attenuation imaging, and a benign dense tissue sub-image (dashed orange square) in (g) OCT and (h) attenuation imaging. Dashed white lines in (f) and (h) highlight the round features in the attenuation image of malignant tissue, and the straight features in the attenuation image of benign dense tissue, respectively. Wide-field scale bars in (a–d) represent 5 mm, scale bars in magnified images (e–h) represent 0.5 mm.


Figure 5(c) and 5(d) show the predicted classifications by the OCT network and the combined network, respectively, for the cross-validation group where this sample was in the test set. The combined network outperforms the OCT network across all classes, with the number of misclassified adipose tissue, benign dense tissue, and malignant tissue sub-images being lower for the combined network (1, 17, and 5, respectively) compared to the OCT network (2, 32, and 8, respectively). When considering the magnified attenuation image of malignant tissue [Fig. 5(f)], round features, highlighted by the curved dashed white lines, can be observed. While these are faintly visible in the magnified OCT image [Fig. 5(e)], they are much more difficult to discern. By comparison, the magnified attenuation image of benign dense tissue [Fig. 5(h)] exhibits a more linear texture, highlighted by the straight dashed white lines, which again appears more distinct than in the magnified OCT image [Fig. 5(g)]. While the OCT network incorrectly classified both sub-images, the more distinct textures in the attenuation images may have contributed to the correct classification of both sub-images by the combined network.

4. Discussion

In this paper, we have developed a CNN based on ResNet-18 for multi-class breast tissue classification as adipose tissue, benign dense tissue, or malignant tissue, using both OCT and attenuation imaging. We have also demonstrated that there can be poor correlation between the commonly used cross-entropy loss function and common performance metrics such as accuracy and the MCC, which can reduce performance. To address this issue, we introduced a novel loss function based on the MCC that is applicable to multi-class classification problems, and showed that this function achieves much stronger correlation with accuracy and the MCC. Our results show that attenuation imaging can improve performance in classifying benign dense tissue and malignant tissue compared to OCT alone [26,28].

Our results highlight the importance of training classifiers to identify separate adipose tissue and benign dense tissue classes, as opposed to a single non-malignant tissue class, when evaluating the classification performance of OCT, since the distinctive texture of adipose tissue, and the relatively similar texture of benign dense tissue and malignant tissue, likely results in much higher performance for adipose tissue than for benign dense tissue. In the extreme case, if the non-malignant tissue class contained only adipose tissue, the classification performance would likely approach that of the adipose tissue class, since identification of adipose tissue would be sufficient to differentiate adipose tissue and malignant tissue. While high performance may be achieved, this likely would not translate to more general datasets containing benign dense tissue, which is often seen both in women with mammographically dense breasts and in women who have received neoadjuvant chemotherapy (an increasingly common modality of treatment [2]). Previous studies have typically achieved sensitivities and specificities in the range of 90–96% [30–33], which is comparable to the performance we achieved in adipose tissue. While a sensitivity and specificity as high as 98.6% and 99.1%, respectively, have been reported in one study, the authors noted that increasing the number of classes above two may decrease the performance of their network [29]. Importantly, as none of the previous studies have reported the number of adipose tissue samples and benign dense tissue samples present within the non-malignant tissue class, it is difficult to determine whether the high sensitivities and specificities previously achieved will translate to datasets containing a large number of benign dense tissue samples, which are likely to be encountered in a clinical setting.

While our results show that adding attenuation images to OCT images corresponds to a statistically significant increase in performance, the amount of improvement varies between performance metrics, with the largest improvements observed for benign dense tissue sensitivity (8.4%), malignant tissue PPV (3.9%), and the MCC (0.032). Smaller improvements were observed for benign dense tissue NPV and accuracy; malignant tissue specificity and accuracy; and total accuracy (∼2%). All adipose tissue metrics were >95% for both networks. This indicates that while adipose tissue can be accurately identified in OCT images without additional contrast, benign dense tissue and malignant tissue are more difficult to differentiate, and adding attenuation images improves performance in these tissues. The high performance of the OCT network in adipose tissue, combined with the high prevalence of adipose tissue in breast tissue, also explains the relatively small improvements in some metrics, since these are dominated by the number of adipose tissue true positives. Additionally, the mean values for all metrics except adipose tissue sensitivity and NPV were higher for the combined network than the OCT network, indicating that the improvements in some metrics do not require compromising performance in other metrics.

While previous studies have suggested that attenuation imaging substantially increases contrast in dense breast tissue [44–49], the corresponding classification performance improvement found in this study is relatively modest. One possible explanation is that attenuation images inherently have lower resolution than OCT images, both laterally (due to A-scan averaging) and axially (due to the fitting range used to calculate the slope of the intensity with depth) [48]. Therefore, OCT images may contain more information at small scales than attenuation images that, while imperceptible to humans, may be utilized by CNNs. This loss of information at small scales may counteract the increased contrast in attenuation images, thus limiting the classification performance gains. Alternatively, it is possible that increasing the lateral resolution may increase the amount of information in OCT and attenuation images, and thus improve the performance of both classifiers. As mentioned in Section 2.1, in this study, the lateral resolution was undersampled; therefore, in this case, denser lateral sampling may improve performance.

Since the attenuation images are calculated from the OCT scans, one may question whether the network could deduce the information in the attenuation images without being shown them explicitly. However, our results show a statistically significant improvement in performance when attenuation images are input explicitly. This may be because while the OCT network is trained on two-dimensional (2-D) images at a single depth of either 180 µm, 200 µm, or 220 µm, attenuation images contain information from a range of depths (at least 100 µm for our implementation) [48]. Using 3-D OCT volumes instead of 2-D images to compensate for this would also greatly increase the number of input parameters, thus increasing training time, evaluation time, disk space requirements, and the risk of overfitting. Furthermore, while the attenuation images have been corrected for the OCT system sensitivity roll-off and confocal function, these effects remain in the OCT images and may cause variations that are independent of the sample that the CNN cannot account for. Explicit computation of the attenuation images may also be considered a form of incorporating domain knowledge into the network, which is particularly beneficial for small datasets [78,79].

While the multi-class MCC loss function introduced in this study correlates better with conventional diagnostic test performance metrics (such as accuracy) than the cross-entropy loss function, an important consideration is that the minibatch size must be large enough to contain several samples from each class in order to yield an accurate estimate of the MCC. In this study, the minibatch size was constrained to 64 sub-images by the available vGPU memory, which corresponded to 33.6 adipose tissue, 12.6 benign dense tissue, and 17.8 malignant tissue sub-images in each minibatch on average. Additionally, while the improved correlation between the performance metrics and the MCC loss function increases the likelihood that the epoch with the minimum loss will correspond to the maximum performance, it is not guaranteed. This is because even though the correlation has been substantially improved, it is not perfect, due to the approximation of replacing the network predictions with the probabilities to calculate the number of correct predictions and predictions per class, as described in Section 2.3. However, like the cross-entropy loss function, the MCC loss function can be used to train neural networks for any application. While we have demonstrated improved correlation between performance metrics and the MCC loss function (corresponding to improved performance) for our particular application, similar trends should be observed if the MCC loss function is applied to other applications, since there is nothing unique about our application that makes it particularly suited to the MCC loss function.

While this study demonstrated that attenuation imaging can enable improved classification performance compared to OCT alone, more work is required to increase the sensitivity and specificity in malignant tissue before OCT and attenuation imaging can be implemented for routine intraoperative margin assessment. This may be achieved by incorporating additional OCT-based contrast mechanisms, such as polarization-sensitive OCT [28,36–39] or OCE [26,40–43], to potentially further improve contrast between benign dense tissue and malignant tissue, although these techniques would require additional hardware. To progress this work towards intraoperative application, the time required for post-processing must be reduced, either by optimizing the current method of computing the attenuation coefficient, or by investigating alternatives such as the depth-resolved method [58]. While mastectomy specimens were used in this study to facilitate histology coregistration, classification performance should also be assessed on excised tissue from BCS to resemble the clinical case more accurately. Additionally, it would be useful to distinguish between different cancer subtypes, such as invasive ductal, invasive lobular, and in situ cancers.

5. Conclusion

We have developed a CNN for multi-class breast tissue classification as adipose tissue, benign dense tissue, or malignant tissue, using both OCT and attenuation imaging, and shown that adding attenuation images to OCT images improves performance. We also introduced a novel MCC-based loss function for multi-class classification and showed that it exhibits improved correlation with performance metrics, such as accuracy and the MCC, compared to the commonly used cross-entropy loss function. Our results indicate that while adipose tissue is easily identifiable in OCT images, benign dense tissue and malignant tissue are more difficult to differentiate, highlighting the importance of reporting the adipose tissue prevalence alongside classifier performance. While additional work is needed to investigate whether additional contrast from polarization-sensitive OCT or OCE can improve performance, as well as assess performance on BCS specimens, this study presents an important step towards the use of OCT, combined with attenuation imaging using CNNs, for intraoperative margin assessment.

Funding

Australian Research Council; Department of Health, Government of Western Australia; Cancer Council Western Australia; Herta Massarik PhD Scholarship for Breast Cancer Research from the University of Western Australia; Australian Government Research Training Program (RTP) Scholarship.

Acknowledgments

The authors acknowledge the facilities and scientific and technical assistance of the Australian Microscopy & Microanalysis Research Facility at the Centre for Microscopy, Characterisation & Analysis, The University of Western Australia, a facility funded by the University, State and Commonwealth Governments. The authors also acknowledge the use of PathWest, Fiona Stanley Hospital, and the scientific and technical assistance of PathWest staff, specifically, Sally Cousans for performing histology slide scanning, and Christopher Yeomans for specimen handling. This work was supported by resources provided by the Pawsey Supercomputing Centre with funding from the Australian Government and the Government of Western Australia.

Disclosures

BL: OncoRes Medical (I), CMS: OncoRes Medical (I,S), LC: OncoRes Medical (I,E), BFK: OncoRes Medical (F,I). The other authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. H. Sung, J. Ferlay, R. L. Siegel, M. Laversanne, I. Soerjomataram, A. Jemal, and F. Bray, “Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,” Ca-Cancer J. Clin. 71(3), 209–249 (2021). [CrossRef]  

2. C. E. DeSantis, J. Ma, M. M. Gaudet, L. A. Newman, K. D. Miller, A. G. Sauer, A. Jemal, and R. L. Siegel, “Breast cancer statistics, 2019,” Ca-Cancer J. Clin. 69(6), 438–451 (2019). [CrossRef]  

3. E. Heeg, M. B. Jensen, L. R. Hölmich, A. Bodilsen, R. A. E. M. Tollenaar, A. V. Laenkholm, B. V. Offersen, B. Ejlertsen, M. A. M. Mureau, and P. M. Christiansen, “Rates of re-excision and conversion to mastectomy after breast-conserving surgery with or without oncoplastic surgery: A nationwide population-based study,” Br. J. Surg. 107(13), 1762–1772 (2020). [CrossRef]  

4. K. Kaczmarski, P. Wang, R. Gilmore, H. N. Overton, D. M. Euhus, L. K. Jacobs, M. Habibi, M. Camp, M. J. Weiss, and M. A. Makary, “Surgeon re-excision rates after breast-conserving surgery: A measure of low-value care,” J. Am. Coll. Surg. 228(4), 504–512e2 (2019). [CrossRef]  

5. A. Bodilsen, K. Bjerre, B. V. Offersen, P. Vahl, B. Ejlertsen, J. Overgaard, and P. Christiansen, “The influence of repeat surgery and residual disease on recurrence after breast-conserving surgery: A Danish breast cancer cooperative group study,” Ann. Surg. Oncol. 22(S3), 476–485 (2015). [CrossRef]  

6. L. G. Wilke, T. Czechura, C. Wang, B. Lapin, E. Liederbach, D. P. Winchester, and K. Yao, “Repeat surgery after breast conservation for the treatment of stage 0 to II breast carcinoma: A report from the national cancer data base, 2004–2010,” JAMA Surg. 149(12), 1296–1305 (2014). [CrossRef]  

7. R. Jeevan, D. A. Cromwell, M. Trivella, G. Lawrence, O. Kearins, J. Pereira, C. Sheppard, C. M. Caddy, and J. H. P. van der Meulen, “Reoperation rates after breast conserving surgery for breast cancer among women in England: Retrospective study of hospital episode statistics,” BMJ 345, e4505 (2012). [CrossRef]  

8. Y. Grant, R. Al-Khudairi, E. St John, M. Barschkett, D. Cunningham, R. Al-Mufti, K. Hogben, P. Thiruchelvam, D. J. Hadjiminas, A. Darzi, A. W. Carter, and D. R. Leff, “Patient-level costs in margin re-excision for breast-conserving surgery,” Br. J. Surg. 106(4), 384–394 (2019). [CrossRef]  

9. C. Dahlbäck, J. Manjer, M. Rehn, and A. Ringberg, “Determinants for patient satisfaction regarding aesthetic outcome and skin sensitivity after breast-conserving surgery,” World J. Surg. Oncol. 14(1), 303 (2016). [CrossRef]  

10. S. E. Abe, J. S. Hill, Y. Han, K. Walsh, J. T. Symanowski, L. Hadzikadic-Gusic, T. Flippo-Morton, T. Sarantou, M. Forster, and R. L. White, “Margin re-excision and local recurrence in invasive breast cancer: A cost analysis using a decision tree model,” J. Surg. Oncol. 112(4), 443–448 (2015). [CrossRef]  

11. J. Heil, K. Breitkreuz, M. Golatta, E. Czink, J. Dahlkamp, J. Rom, F. Schuetz, M. Blumenstein, G. Rauch, and C. Sohn, “Do reexcisions impair aesthetic outcome in breast conservation surgery? Exploratory analysis of a prospective cohort study,” Ann. Surg. Oncol. 19(2), 541–547 (2012). [CrossRef]  

12. N. B. Kouzminova, S. Aggarwal, A. Aggarwal, M. D. Allo, and A. Y. Lin, “Impact of initial surgical margins and residual cancer upon re-excision on outcome of patients with localized breast cancer,” Am. J. Surg. 198(6), 771–780 (2009). [CrossRef]  

13. C. Koopmansch, J.-C. Noël, C. Maris, P. Simon, M. Sy, and X. Catteau, “Intraoperative evaluation of resection margins in breast-conserving surgery for in situ and invasive breast carcinoma,” Breast Cancer: Basic Clin. Res. 15, 117822342199345 (2021). [CrossRef]  

14. R. J. Gray, B. A. Pockaj, E. Garvey, and S. Blair, “Intraoperative margin management in breast-conserving surgery: A systematic review of the literature,” Ann. Surg. Oncol. 25(1), 18–27 (2018). [CrossRef]  

15. E. R. St John, R. Al-Khudairi, H. Ashrafian, T. Athanasiou, Z. Takats, D. J. Hadjiminas, A. Darzi, and D. R. Leff, “Diagnostic accuracy of intraoperative techniques for margin assessment in breast cancer surgery: A meta-analysis,” Ann. Surg. 265(2), 300–310 (2017). [CrossRef]  

16. M. T. Garcia, B. S. Mota, N. Cardoso, A. L. C. Martimbianco, M. D. Ricci, F. M. Carvalho, R. Gonçalves, J. M. Soares Jr., and J. R. Filassi, “Accuracy of frozen section in intraoperative margin assessment for breast-conserving surgery: A systematic review and meta-analysis,” PLoS One 16(3), e0248768 (2021). [CrossRef]  

17. T. Nowikiewicz, E. Śrutek, I. Głowacka-Mrotek, M. Tarkowska, A. Żyromska, and W. Zegarski, “Clinical outcomes of an intraoperative surgical margin assessment using the fresh frozen section method in patients with invasive breast cancer undergoing breast-conserving surgery – a single center analysis,” Sci. Rep. 9(1), 13441–134418 (2019). [CrossRef]  

18. M. S. Sabel, J. M. Jorns, A. Wu, J. Myers, L. A. Newman, and T. M. Breslin, “Development of an intraoperative pathology consultation service at a free-standing ambulatory surgical center: Clinical and economic impact for patients undergoing breast cancer surgery,” Am. J. Surg. 204(1), 66–77 (2012). [CrossRef]  

19. A. Nunez, V. Jones, K. Schulz-Costello, and D. Schmolze, “Accuracy of gross intraoperative margin assessment for breast cancer: Experience since the SSO-ASTRO margin consensus guidelines,” Sci. Rep. 10(1), 17344–173449 (2020). [CrossRef]  

20. J. Heidkamp, M. Scholte, C. Rosman, S. Manohar, J. J. Fütterer, and M. M. Rovers, “Novel imaging techniques for intraoperative margin assessment in surgical oncology: A systematic review,” Int. J. Cancer 149(3), 635–645 (2021). [CrossRef]  

21. J. Schwarz and H. Schmidt, “Technology for intraoperative margin assessment in breast cancer,” Ann. Surg. Oncol. 27(7), 2278–2287 (2020). [CrossRef]  

22. A. R. Pradipta, T. Tanei, K. Morimoto, K. Shimazu, S. Noguchi, and K. Tanaka, “Emerging technologies for real-time intraoperative margin assessment in future breast-conserving surgery,” Adv. Sci. 7(9), 1901519 (2020). [CrossRef]  

23. S. J. Erickson-Bhatt, R. M. Nolan, N. D. Shemonski, S. G. Adie, J. Putney, D. Darga, D. T. McCormick, A. J. Cittadine, A. M. Zysk, M. Marjanovic, E. J. Chaney, G. L. Monroy, F. A. South, K. A. Cradock, Z. G. Liu, M. Sundaram, P. S. Ray, and S. A. Boppart, “Real-time imaging of the resection bed using a handheld probe to reduce incidence of microscopic positive margins in cancer surgery,” Cancer Res. 75(18), 3706–3712 (2015). [CrossRef]  

24. A. M. Zysk, K. Chen, E. Gabrielson, L. Tafra, E. A. May Gonzalez, J. K. Canner, E. B. Schneider, A. J. Cittadine, P. S. Carney, S. A. Boppart, K. Tsuchiya, K. Sawyer, and L. K. Jacobs, “Intraoperative assessment of final margins with a handheld optical imaging probe during breast-conserving surgery may reduce the reoperation rate: Results of a multicenter study,” Ann. Surg. Oncol. 22(10), 3356–3362 (2015). [CrossRef]  

25. F. T. Nguyen, A. M. Zysk, E. J. Chaney, J. G. Kotynek, U. J. Oliphant, F. J. Bellafiore, K. M. Rowland, P. A. Johnson, and S. A. Boppart, “Intraoperative evaluation of breast tumor margins with optical coherence tomography,” Cancer Res. 69(22), 8790–8796 (2009). [CrossRef]  

26. K. M. Kennedy, R. Zilkens, W. M. Allen, K. Y. Foo, Q. Fang, L. Chin, R. W. Sanderson, J. Anstie, P. Wijesinghe, A. Curatolo, H. E. I. Tan, N. Morin, B. Kunjuraman, C. Yeomans, S. L. Chin, H. DeJong, K. Giles, B. F. Dessauvagie, B. Latham, C. M. Saunders, and B. F. Kennedy, “Diagnostic accuracy of quantitative micro-elastography for margin assessment in breast-conserving surgery,” Cancer Res. 80(8), 1773–1783 (2020). [CrossRef]  

27. X. Yao, Y. Gan, E. Chang, H. Hibshoosh, S. Feldman, and C. Hendon, “Visualization and tissue classification of human breast cancer images using ultrahigh-resolution OCT,” Laser Surg. Med. 49(3), 258–269 (2017). [CrossRef]  

28. D. Zhu, J. Wang, M. Marjanovic, E. J. Chaney, K. A. Cradock, A. M. Higham, Z. G. Liu, Z. Gao, and S. A. Boppart, “Differentiation of breast tissue types for surgical margin assessment using machine learning and polarization-sensitive optical coherence tomography,” Biomed. Opt. Express 12(5), 3021–3036 (2021). [CrossRef]  

29. A. Butola, D. K. Prasad, A. Ahmad, V. Dubey, D. Qaiser, A. Srivastava, P. Senthilkumaran, B. S. Ahluwalia, and D. S. Mehta, “Deep learning architecture LightOCT for diagnostic decision support using optical coherence tomography images of biological samples,” Biomed. Opt. Express 11(9), 5017–5031 (2020). [CrossRef]  

30. S. Kansal, S. Goel, J. Bhattacharya, and V. Srivastava, “Generative adversarial network–convolution neural network based breast cancer classification using optical coherence tomographic images,” Laser Phys. 30(11), 115601 (2020). [CrossRef]  

31. A. Rannen Triki, M. B. Blaschko, Y. M. Jung, S. Song, H. J. Han, S. I. Kim, and C. Joo, “Intraoperative margin assessment of human breast tissue in optical coherence tomography images using deep neural networks,” Comput. Med. Imaging Graph. 69, 21–32 (2018). [CrossRef]  

32. N. Singla, K. Dubey, and V. Srivastava, “Automated assessment of breast cancer margin in optical coherence tomography images via pretrained convolutional neural network,” J. Biophotonics 12(3), e2018002551 (2019). [CrossRef]  

33. D. Mojahed, R. S. Ha, P. Chang, Y. Gan, X. Yao, B. Angelini, H. Hibshoosh, B. Taback, and C. P. Hendon, “Fully automated postlumpectomy breast margin assessment utilizing convolutional neural network based optical coherence tomography image classification method,” Acad. Radiol. 27(5), e81–e86 (2020). [CrossRef]  

34. J. Kim, W. Brown, J. R. Maher, H. Levinson, and A. Wax, “Functional optical coherence tomography: Principles and progress,” Phys. Med. Biol. 60(10), R211–R237 (2015). [CrossRef]  

35. R. A. Leitgeb and B. Baumann, “Multimodal optical medical imaging concepts based on optical coherence tomography,” Front. Phys. 6, 1141–11417 (2018). [CrossRef]  

36. J. F. de Boer, C. K. Hitzenberger, and Y. Yasuno, “Polarization sensitive optical coherence tomography – a review,” Biomed. Opt. Express 8(3), 1838–1873 (2017). [CrossRef]  

37. R. Patel, A. Khan, R. Quinlan, and A. N. Yaroslavsky, “Polarization-sensitive multimodal imaging for detecting breast cancer,” Cancer Res. 74(17), 4685–4693 (2014). [CrossRef]  

38. F. A. South, E. J. Chaney, M. Marjanovic, S. G. Adie, and S. A. Boppart, “Differentiation of ex vivo human breast tissue using polarization-sensitive optical coherence tomography,” Biomed. Opt. Express 5(10), 3417–3426 (2014). [CrossRef]  

39. M. Villiger, D. Lorenser, R. A. McLaughlin, B. C. Quirk, R. W. Kirk, B. E. Bouma, and D. D. Sampson, “Deep tissue volume imaging of birefringence through fibre-optic needle probes for the delineation of breast tumour,” Sci. Rep. 6(1), 28771–2877111 (2016). [CrossRef]  

40. W. M. Allen, L. Chin, P. Wijesinghe, R. W. Kirk, B. Latham, D. D. Sampson, C. M. Saunders, and B. F. Kennedy, “Wide-field optical coherence micro-elastography for intraoperative assessment of human breast cancer margins,” Biomed. Opt. Express 7(10), 4139–4153 (2016). [CrossRef]  

41. B. F. Kennedy, K. M. Kennedy, and D. D. Sampson, “A review of optical coherence elastography: Fundamentals, techniques and prospects,” IEEE J. Sel. Top. Quantum Electron. 20(2), 272–288 (2014). [CrossRef]  

42. S. Wang and K. V. Larin, “Optical coherence elastography for tissue characterization: A review,” J. Biophotonics 8(4), 279–302 (2014). [CrossRef]  

43. V. Y. Zaitsev, A. L. Matveyev, L. A. Matveev, A. A. Sovetsky, M. S. Hepburn, A. Mowla, and B. F. Kennedy, “Strain and elasticity imaging in compression optical coherence elastography: The two-decade perspective and recent advances,” J. Biophotonics 14(2), e2020002571 (2021). [CrossRef]  

44. N. V. Iftimia, B. E. Bouma, M. B. Pitman, B. Goldberg, J. Bressner, and G. J. Tearney, “A portable, low coherence interferometry based instrument for fine needle aspiration biopsy guidance,” Rev. Sci. Instrum. 76(6), 064301 (2005). [CrossRef]  

45. B. D. Goldberg, N. V. Iftimia, J. E. Bressner, M. B. Pitman, E. F. Halpern, B. E. Bouma, and G. J. Tearney, “Automated algorithm for differentiation of human breast tissue using low coherence interferometry for fine needle aspiration biopsy guidance,” J. Biomed. Opt. 13(1), 014014 (2008). [CrossRef]  

46. M. Mujat, R. D. Ferguson, D. X. Hammer, C. M. Gittins, and N. V. Iftimia, “Automated algorithm for breast tissue differentiation in optical coherence tomography,” J. Biomed. Opt. 14(3), 034040 (2009). [CrossRef]  

47. A. Butola, A. Ahmad, V. Dubey, V. Srivastava, D. Qaiser, A. Srivastava, P. Senthilkumaran, and D. S. Mehta, “Volumetric analysis of breast cancer tissues using machine learning and swept-source optical coherence tomography,” Appl. Opt. 58(5), A135–A141 (2019). [CrossRef]  

48. K. Y. Foo, L. Chin, R. Zilkens, D. D. Lakhiani, Q. Fang, R. Sanderson, B. F. Dessauvagie, B. Latham, S. McLaren, C. M. Saunders, and B. F. Kennedy, “Three-dimensional mapping of the attenuation coefficient in optical coherence tomography to enhance breast tissue microarchitecture contrast,” J. Biophotonics 13(6), e2019602011 (2020). [CrossRef]  

49. P. Gong, M. Almasian, G. van Soest, D. M. de Bruin, T. G. van Leeuwen, D. D. Sampson, and D. J. Faber, “Parametric imaging of attenuation by optical coherence tomography: Review of models, methods, and clinical translation,” J. Biomed. Opt. 25(04), 1 (2020). [CrossRef]  

50. Q. Fang, L. Frewer, R. Zilkens, B. Krajancich, A. Curatolo, L. Chin, K. Y. Foo, D. D. Lakhiani, R. W. Sanderson, P. Wijesinghe, J. D. Anstie, B. F. Dessauvagie, B. Latham, C. M. Saunders, and B. F. Kennedy, “Handheld volumetric manual compression-based quantitative microelastography,” J. Biophotonics 13(6), e2019601961 (2020). [CrossRef]  

51. B. Krajancich, A. Curatolo, Q. Fang, R. Zilkens, B. F. Dessauvagie, C. M. Saunders, and B. F. Kennedy, “Handheld optical palpation of turbid tissue with motion-artifact correction,” Biomed. Opt. Express 10(1), 226–241 (2019). [CrossRef]  

52. D. Chicco and G. Jurman, “The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation,” BMC Genomics 21(1), 6 (2020). [CrossRef]  

53. W. Rawat and Z. Wang, “Deep convolutional neural networks for image classification: A comprehensive review,” Neural Comput. 29(9), 2352–2449 (2017). [CrossRef]  

54. D. Chicco, N. Tötsch, and G. Jurman, “The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation,” BioData Min. 14(1), 13 (2021). [CrossRef]  

55. K. Abhishek and G. Hamarneh, “Matthews correlation coefficient loss for deep convolutional networks: Application to skin lesion segmentation,” in 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI) (IEEE, 2021), pp. 225–229.

56. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2016), pp. 770–778.

57. W. M. Allen, K. M. Kennedy, Q. Fang, L. Chin, A. Curatolo, L. Watts, R. Zilkens, S. L. Chin, B. F. Dessauvagie, B. Latham, C. M. Saunders, and B. F. Kennedy, “Wide-field quantitative micro-elastography of human breast tissue,” Biomed. Opt. Express 9(3), 1082–1096 (2018). [CrossRef]  

58. K. A. Vermeer, J. Mo, J. J. A. Weda, H. G. Lemij, and J. F. de Boer, “Depth-resolved model-based reconstruction of attenuation coefficients in optical coherence tomography,” Biomed. Opt. Express 5(1), 322–337 (2014). [CrossRef]  

59. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in 3rd International Conference on Learning Representations (ICLR 2015) (Computational and Biological Learning Society, 2015), pp. 1–14.

60. A. S. B. Reddy and D. S. Juliet, “Transfer learning with ResNet-50 for malaria cell-image classification,” in 2019 International Conference on Communication and Signal Processing (ICCSP) (IEEE, 2019), pp. 0945–0949.

61. L. Ma, R. Shuai, X. Ran, W. Liu, and C. Ye, “Combining DC-GAN with ResNet for blood cell image classification,” Med. Biol. Eng. Comput. 58(6), 1251–1264 (2020). [CrossRef]  

62. M. Wang and X. Gong, “Metastatic cancer image binary classification based on ResNet model,” in 2020 IEEE 20th International Conference on Communication Technology (ICCT) (IEEE, 2020), pp. 1356–1359.

63. Q. A. Al-Haija and A. Adebanjo, “Breast cancer diagnosis in histopathological images using ResNet-50 convolutional neural network,” in 2020 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS) (IEEE, 2020), pp. 1–7.

64. D. Sarwinda, R. H. Paradisa, A. Bustamam, and P. Anggia, “Deep learning in image classification using residual network (ResNet) variants for detection of colorectal cancer,” Procedia Computer Science 179, 423–431 (2021). [CrossRef]  

65. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems, vol. 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett, eds. (Curran Associates, Inc., 2019), pp. 8024–8035.

66. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis. 115(3), 211–252 (2015). [CrossRef]  

67. H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers, “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning,” IEEE Trans. Med. Imaging 35(5), 1285–1298 (2016). [CrossRef]  

68. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, Adaptive Computation and Machine Learning (The MIT Press, 2016).

69. C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. F. del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, and T. E. Oliphant, “Array programming with NumPy,” Nature 585(7825), 357–362 (2020). [CrossRef]  

70. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay, “Scikit-learn: Machine learning in python,” J. Mach. Learn. Res. 12, 2825–2830 (2011).

71. J. D. Hunter, “Matplotlib: A 2D graphics environment,” Comput. Sci. Eng. 9(3), 90–95 (2007). [CrossRef]  

72. Y. Cui, M. Jia, T. Lin, Y. Song, and S. Belongie, “Class-balanced loss based on effective number of samples,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (IEEE, 2019), pp. 9260–9269.

73. J. Gorodkin, “Comparing two K-category assignments by a K-category correlation coefficient,” Comput. Biol. Chem. 28(5-6), 367–374 (2004). [CrossRef]  

74. M. Grandini, E. Bagli, and G. Visani, “Metrics for multi-class classification: An overview,” arXiv:2008.05756 [cs, stat] (2020).

75. C. Spearman, “The proof and measurement of association between two things,” Am. J. Psychol. 15(1), 72–101 (1904). [CrossRef]  

76. M. Hollander, D. A. Wolfe, and E. Chicken, Nonparametric Statistical Methods, Wiley Series in Probability and Statistics (Wiley, 2015), 3rd ed.

77. T. Fawcett, “An introduction to ROC analysis,” Pattern Recognit. Lett. 27(8), 861–874 (2006). [CrossRef]  

78. N. Muralidhar, M. R. Islam, M. Marwah, A. Karpatne, and N. Ramakrishnan, “Incorporating prior domain knowledge into deep neural networks,” in 2018 IEEE International Conference on Big Data (Big Data) (IEEE, 2018), pp. 36–45.

79. X. Xie, J. Niu, X. Liu, Z. Chen, S. Tang, and S. Yu, “A survey on incorporating domain knowledge into deep learning for medical image analysis,” Med. Image Anal. 69, 101985 (2021). [CrossRef]  

Figures (5)

Fig. 1. (a) Overview of the methodology to construct and label the dataset. Labeled histology, OCT images, and attenuation images are obtained from each tissue specimen. The OCT and attenuation images are subdivided and labeled by coregistration with the labeled histology to construct the training, validation, and test sets. (b) Overview of the network training and evaluation methodology. The training and validation sets are used to train the network, and the test set is hidden from the network until evaluation to measure the network performance. Scale bars represent 10 mm in the wide-field images (no borders) and 1 mm in the magnified images (dashed borders).

Fig. 2. The validation set results for cross-validation Group 5. The results at each epoch are shown for a network trained using (a) the cross-entropy loss function, and (b) the MCC loss function, including the total accuracy, MCC, and corresponding validation and training loss functions. Purple, green, and orange triangles indicate the maximum total accuracy, maximum MCC, and minimum validation loss, respectively, across all epochs. The total accuracy and MCC are plotted against the (c) cross-entropy loss function and (d) MCC loss function, with the Spearman’s rank correlation coefficient calculated using all epochs (large dots), and calculated using only early epochs before overfitting occurs (small dots), shown in the legends.

Fig. 3. (a) Total accuracy and (b) MCC across all classes for each cross-validation group for both the OCT network and the combined network.

Fig. 4. ROC curves for the OCT network and the combined network for classifying (a) adipose tissue, (b) benign dense tissue, and (c) malignant tissue. The solid lines indicate the mean curve over all ten cross-validation groups, and the shaded regions indicate ± one standard deviation. The areas under the curves are given as mean ± standard deviation in the legends.

Fig. 5. An example of a wide-field (a) OCT image and (b) attenuation image. The (c) OCT network performance and (d) combined network performance, for the cross-validation group where this wide-field sample was in the test set, are also shown overlaid on the OCT and attenuation images, respectively. For each sub-image, semi-transparent colored overlays indicate the predicted class, and solid colored borders indicate the true class where the prediction is incorrect. The true class was determined by coregistration with histopathology that was annotated by a pathologist. Magnified images show a malignant tissue sub-image (dashed cyan square) in (e) OCT and (f) attenuation imaging, and a benign dense tissue sub-image (dashed orange square) in (g) OCT and (h) attenuation imaging. Dashed white lines in (f) and (h) highlight the round features in the attenuation image of malignant tissue and the straight features in the attenuation image of benign dense tissue, respectively. Wide-field scale bars in (a–d) represent 5 mm; scale bars in magnified images (e–h) represent 0.5 mm.

Tables (3)

Table 1. Summary of the dataset. The training, validation, and test set sizes for each class are given as mean ± standard deviation across all ten cross-validation groups. The total class sizes are the same for all cross-validation groups.

Table 2. Cross-validated comparison between the cross-entropy loss and the MCC loss in terms of the total accuracy and the MCC, evaluated on the validation set. All entries are given as mean ± standard deviation across all ten cross-validation groups.

Table 3. Confusion matrix and classification performance of the OCT network and the combined network on the test sets. The confusion matrix entries are the sum of the confusion matrices from each cross-validation group and the percentage of the true class. The sensitivity, specificity, PPV, NPV, and accuracy for each class, and the total accuracy and MCC across all classes, are the mean ± standard deviation across all cross-validation groups. Bold indicates the best network, and asterisks indicate a statistically significant difference between the networks, for corresponding metrics.

Equations (9)

$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{n} w_j Y_{ij} \log(\tilde{Z}_{ij}),$

$\tilde{Z}_{ij} = S(Z_{ij}) = \frac{\exp(Z_{ij})}{\sum_{j=1}^{n}\exp(Z_{ij})}.$

$\mathrm{MCC} = \frac{NC - \sum_{j=1}^{n} T_j P_j}{\sqrt{\left(N^2 - \sum_{j=1}^{n} T_j^2\right)\left(N^2 - \sum_{j=1}^{n} P_j^2\right)}},$

$C = \sum_{i=1}^{N}\sum_{j=1}^{n} Y_{ij}\hat{Z}_{ij},$

$T_j = \sum_{i=1}^{N} Y_{ij},$

$P_j = \sum_{i=1}^{N} \hat{Z}_{ij}.$

$C = \sum_{i=1}^{N}\sum_{j=1}^{n} Y_{ij}\tilde{Z}_{ij},$

$P_j = \sum_{i=1}^{N} \tilde{Z}_{ij}.$

$L_{MCC} = 1 - \mathrm{MCC}$
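As a sanity check on the quantities defined above, the short script below (an illustration only, using a toy set of labels we have made up) verifies that when hard one-hot predictions $\hat{Z}_{ij}$ are used, the MCC computed from $C$, $T_j$, and $P_j$ agrees with the multi-class MCC returned by scikit-learn's matthews_corrcoef [70,73], and then prints the corresponding value of $L_{MCC}$.

import numpy as np
from sklearn.metrics import matthews_corrcoef

# Toy labels (made up for illustration): 0 = adipose, 1 = benign dense, 2 = malignant.
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
y_pred = np.array([0, 0, 1, 2, 2, 2, 1, 1])

n = 3                      # number of classes
N = len(y_true)            # number of samples
Y = np.eye(n)[y_true]      # Y_ij, one-hot true labels
Z = np.eye(n)[y_pred]      # Z^_ij, one-hot (hard) predictions

C = (Y * Z).sum()          # number of correct predictions
T = Y.sum(axis=0)          # true samples per class, T_j
P = Z.sum(axis=0)          # predicted samples per class, P_j

mcc = (N * C - (T * P).sum()) / np.sqrt((N**2 - (T**2).sum()) * (N**2 - (P**2).sum()))
assert np.isclose(mcc, matthews_corrcoef(y_true, y_pred))
print(1.0 - mcc)           # the corresponding value of L_MCC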