
Automated classification of otitis media with OCT: augmenting pediatric image datasets with gold-standard animal model data

Open Access

Abstract

Otitis media (OM) is an extremely common disease that affects children worldwide. Optical coherence tomography (OCT) has emerged as a noninvasive diagnostic tool for OM, which can detect the presence and quantify the properties of middle ear fluid and biofilms. Here, OCT data from the chinchilla, the gold-standard animal model of the human disease, are used to supplement a human image database and to produce diagnostically relevant conclusions in a machine learning model. Statistical analysis shows the datatypes are compatible, with a blended-species model reaching ∼95% accuracy and F1 score, a level of performance that can be maintained while additional human data are collected.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Otitis media (OM), the general term for a middle ear infection, is one of the most common diseases that affect children [1]. Acute OM (AOM) is the initial response to otopathogens, causing an inflammatory response that ultimately leads to pain and fluid build-up in the middle ear. An accurate diagnosis requires expertise in ear examination by health care professionals, and is treated with analgesia and antibiotic therapy when indicated [2]. Recurrent acute otitis media (RAOM) occurs when there are multiple AOM infections in a short timeframe (which individually resolve in between episodes). When fluid from a single AOM infection persists for >3 months, chronic otitis media with effusion (COME) can result. If a patient has substantive difficulties with RAOM and/or COME, the surgical placement of tympanostomy (drainage) tubes under general anesthesia may be necessary. Middle ear biofilms (MEBs) are thought to be the major cause of these resistant and recurring infections [3,4]. MEBs have been found to be present during OM infection and may explain why chronic infections are difficult to treat without surgery, as the local environment created by the biofilm must be effectively removed and reset to normal conditions [5].

To diagnose OM, clinicians typically use an otoscope to visually assess the condition of the ear and compare findings with any associated physical symptoms from a patient physical exam. Unfortunately, in a primary care setting, the difficulty of accurately visualizing a child’s ear limits the accuracy of OM diagnosis (∼50-70% [6]). Generally, it is difficult to perceive subtle signs of infection in tight ear canals of fussy children, not to mention the minimal visual differences in OM disease states from viewing only the surface of the tympanic membrane (TM). While otoscopes have reasonable contrast to detect middle ear effusions (MEE) through a translucent tympanic membrane, otoscopes alone are not able to identify biofilms and other relevant diagnostic factors. Other tools, such as tympanometry and pneumatic otoscopy, are recommended to better noninvasively diagnose OM and detect fluid, though they are often unused or used improperly [7].

Recently, a range of advanced optical techniques have been developed to noninvasively diagnose OM. Standard otoscopes, or otoscopes fitted with specialized fluorescent or hyperspectral illumination hardware, augmented with machine learning [8–10], have shown promise to improve OM diagnosis. Other modalities employing shortwave [11] and terahertz [12] light similarly improve the identification of fluid in the middle ear. Raman spectroscopy and other fluorescence-based techniques [13] have also shown promise in identifying the biochemical signatures of OM infection [14], and are currently in development.

As a solution to this diagnostic challenge, our team has developed handheld optical coherence tomography (OCT) probes in an otoscope-like form factor, tethered to both cart-based [15] and low-cost, briefcase-based [16] portable systems. OCT, a noninvasive imaging technology, functions as the optical analogue to ultrasound imaging, using broadband near-infrared light to produce tomographic, cross-sectional, depth-resolved images and volumetric datasets of tissue. OCT can enhance a clinician’s ability to assess and diagnose OM by providing clear information about the state of infection, including the TM and any contents within the middle ear, such as MEE or biofilm, if present.

The utility of OCT in otology has been broadly demonstrated across several studies [4,17–20], though data interpretation was typically done by OCT researchers or experts intimately familiar with these systems [21]. One group investigating OM with OCT has performed clinical studies with a similar device, though it focused primarily on subjects with RAOM in one study [22], and on TM retraction and post-operative recovery in another [20]. Other groups are exploring the functional and vibrational characteristics of the middle [23] and inner ear [24,25], applying OCT to better observe tympanoplasty or other reconstructive surgical procedures [26], or to better visualize the ossicles [27]. Other otologic diseases have also been explored using OCT, including cholesteatoma [28].

In a recent study, we developed a machine learning platform to better disseminate this technique and improve the diagnostic capability of users without requiring expert knowledge of OCT or OM [21]. The system can automatically identify the presence of any MEB and/or purulent MEE in OCT images without user intervention. The first iteration of this platform focused on a 58-subject pilot study and had an average performance of 91.1% using OCT data alone, and 99.3% when integrating clinically available information (OM-grade otoscopy ranking, quantitative otoscopy analysis, patient exam, patient history). Supervised learning was chosen to provide interpretability of the classifier results and to later relate them to relevant clinical diagnostic metrics, such as the presence and quality of fluid, inflammation, or the presence of a biofilm, among others. The platform performed well overall, although the model had limited OCT data available from specific sub-types of the various disease states, such as acute infections with serous MEEs. Recent years have also seen a substantial reduction in pediatric visits to ambulatory care clinics [29,30], limiting our ability to expand the OCT dataset. Furthermore, no public OCT image databases of OM are available, as only a few other teams are pursuing similar research [20,26]. An alternative approach must be considered to collect additional OCT data and develop a more complete model capable of detecting and classifying the entire spectrum of OM clinical presentations.

The chinchilla (Chinchilla lanigera) has historically been the gold-standard pre-clinical animal model used to study human OM [31,32], given its structural and physiological similarity to humans. This also extends to similarities in immune system behavior, which is important for understanding the effect of pneumococcal vaccines on reducing reinfection rates [33], mucin production and its role in RAOM [34], the specific molecular pathways leading to chronic OM with effusion [35], and for optimizing treatment strategies based on results in the chinchilla [36]. This model allows for reliable and consistent capture of images and data at specific and controlled infection time points and treatment regimens. In contrast, human studies have significantly more variability, including time to appointment following acute infection onset, variability in treatment protocols, compliance with established guidelines, and treatment follow-up [37,38]. Our team recently imaged and followed chinchillas longitudinally with OCT, under OM infection and with antibiotic therapy [39]. While the chinchilla middle ear has recently been observed with OCT to better understand ossicular motion during hearing [40,41], few other studies have observed OM infection with OCT in this animal model.

In this paper, the utility of OCT ear imaging data from chinchillas was explored to augment human data and to classify various infection groups using a machine learning-based analysis platform. The segmentation and noise-floor handling of the platform, as well as new metrics related to capturing fluid characteristics, were integrated to enhance the analysis and classification. Tests and validation show compatibility between human and chinchilla data, and good performance for a classifier trained with a blended dataset.

2. Methodology

2.1 Machine learning platform overview

The machine learning platform has distinct phases to collect, manage, and intake data, extract useful features, and then train and test classifiers for different classification tasks. A high-level visualization of the platform is shown in Fig. 1. Data used in this study were collected from two portable OCT cart-based systems with handheld probes of similar specifications: a broad-bandwidth source centered at ∼830 nm ± 80 nm, 2.5 mW of power incident on the tissue, ∼3 µm axial and ∼20 µm lateral resolution in air, and a 30 frames-per-second imaging rate with a 3 mm depth (axial) × 5 mm width (lateral) field of view. Complete specifications for these systems are given in prior publications [4,39]. The data library comprised OCT images used in the original platform [21] and newly added but previously collected data from both humans [4,18,42] and chinchillas [39]. The data were collected across multiple sites, including a (human) pediatric clinic, surgical ward, ENT specialist’s office, and a chinchilla OM research laboratory. Previous studies had appropriate oversight from either an IRB (human subjects) or IACUC (animal subjects). For this study, any personally identifying metadata was stripped to remove personal information [43]. In total, 87 human subjects (90 ears) and 47 chinchillas (85 ears) were observed in this study. Both human (H) and chinchilla (C) cross-sectional OCT images of the TM were sorted into classification Groups 1-4 (number of images per species and group): 1 - Normal (H1: 68, C1: 2,154); 2 - Serous Fluid (H2: 10, C2: 2,045); 3 - Biofilm (H3: 21, C3: 38); 4 - Biofilm and Purulent Fluid (H4: 156, C4: 2,077).


Fig. 1. Processing flow and automated classifier pipeline. Images were previously collected using two comparably equipped state-of-the-art portable OCT cart-based systems and tethered handheld probes. Chinchilla and human data were manually collected and organized based on ground truth labeling. Images were fed into the automated feature extraction program, which extracted depth profiles and feature vectors representative of 28 distinct qualities related to tissue structure, image texture, and optical attenuation. In total, this process generated an image library with approximately 1.95 million depth-profile entries. The database was used to perform several tests and comparisons, with test data labeled with a confidence interval and color coded to represent infection group, as in the final panel.


Special attention was given to unifying OCT data and metadata, as they originated from multiple studies with different aims, conducted at different imaging sites and clinical settings. Ground truth labels were assigned for each 2-D cross-sectional image based on the diagnoses from otoscopy, made primarily by the physician treating the human subject or by the researcher attending to the chinchilla at the time of imaging. Because some OCT datasets were acquired immediately prior to surgery, a positive identification of fluid could be obtained and directly sorted into specific groups. Histopathology of biofilms was likewise confirmed in some datasets, as the originating studies aimed to link the presence of biofilms with specific and conclusive OCT image features (in both human [4] and chinchilla [39] subjects, by histology). Using this prior knowledge, biofilms can also be reliably identified from OCT images where surgical or histological analysis was not possible. All data were reviewed in a three-reader study to avoid bias in assigning labels and to sort the remaining unlabeled biofilm and/or fluid data into Groups 2-4 by majority vote. Each reader had at least 3 years of OCT imaging and OCT otology experience. If no agreement was reached among the readers, the dataset was excluded from consideration, although this happened in only a few cases. A brief additional discussion of the standard clinical practices and ground truth labeling is available in Supplement 1.
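As a minimal illustration of the majority-vote step described above (a sketch only; variable names such as readerLabels are hypothetical and not taken from the study code):

```matlab
% Sketch of a three-reader majority vote for assigning Group 2-4 labels.
% readerLabels: N x 3 matrix of candidate group labels (2, 3, or 4), one column per reader.
[consensus, votes] = mode(readerLabels, 2);   % most frequent label and its count, per dataset
isLabeled   = votes >= 2;                     % at least two readers agree
finalLabels = consensus(isLabeled);           % accepted ground-truth labels
excludedIdx = find(~isLabeled);               % datasets excluded from consideration
```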

For the purposes of this study, classification was achieved on 1-D depth profiles and 1-D classification accuracy is reported to explore the utility of chinchilla data in this platform. A future use case reconstructs all 1-D OCT profiles from an image along with predicted infection group labels, color coded over the image as demonstrated in the rightmost panel in Fig. 1. Additional discussion in the Supplement 1 further describes the future use case of this system, and tabulates statistics of the 2-D image-level classification.
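The color-coded reconstruction described above could be realized by mapping each depth profile's predicted label back onto its image column. The following sketch, which assumes a per-column prediction vector and hypothetical variable names (octImg, colLabels), illustrates one possible approach using the Image Processing Toolbox; it is not the platform's actual display code.

```matlab
% Sketch: color-code a cross-sectional OCT image by per-profile predicted infection group.
% octImg:    rows x cols grayscale OCT B-scan (double, arbitrary scale)
% colLabels: 1 x cols vector of predicted groups (1-4), one entry per radial profile
labelMap = repmat(colLabels(:)', size(octImg, 1), 1);     % expand column labels to full image
grayImg  = im2uint8(mat2gray(octImg));                    % normalize for display
overlay  = labeloverlay(grayImg, labelMap, ...
    'Colormap', [1 0 0; 0 1 0; 0 0 1; 0.5 0 0.5], ...     % red/green/blue/purple for Groups 1-4
    'Transparency', 0.7);
imshow(overlay);
```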

2.2 Automated data processing and feature extraction

A visual processing pipeline of this automated process is shown in Fig. 2. The data were first normalized to have comparable image properties (signal-to-noise ratio, dynamic range, etc.), then segmented and analyzed. In the past, data were manually preprocessed to ensure uniformity across datasets taken at different times, at different locations, and with different OCT systems. Segmentation techniques often rely heavily on this normalization step to equalize the SNR and background levels. With a larger library and data from multiple imaging systems, this became impractical to tune manually. In this updated platform, a dynamic range enhancement filter was applied to every image. Segmentation primarily relied on the background noise level in the top ∼25 pixels of each OCT image to detect a noise floor. With subject imaging taking place over many days and at different clinical sites, this was a crucial enhancement, as system performance was not always consistent. Segmentation took advantage of different masking strategies, including edge detection and thresholding methods. Depth-resolved attenuation maps were also utilized to distinguish areas of “real” structure from high-value noise/speckle. Without this added capability, the feature extractor described in the next section was inefficient: noise and low-scattering particulates in fluid can appear very similar, which causes improper detection of the deepest extent of the image. An example of this filtering is shown in Supplement 1, along with a complete list of the extracted features.
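A simplified sketch of the noise-floor estimation and masking step described here is shown below. The specific threshold rule (mean plus three standard deviations of the top rows), the smoothing, and the variable names are illustrative assumptions, not the platform's exact implementation.

```matlab
% Sketch: estimate a noise floor from the top ~25 rows and build a coarse tissue mask.
% octImg: linear-intensity OCT B-scan (rows = depth, cols = lateral position)
noiseRegion = octImg(1:25, :);                               % region above the TM: background only
noiseFloor  = mean(noiseRegion(:)) + 3*std(noiseRegion(:));  % assumed simple threshold rule
tissueMask  = imgaussfilt(octImg, 2) > noiseFloor;           % suppress speckle, then threshold
tissueMask  = bwareaopen(tissueMask, 200);                   % drop small speckle/noise islands
tmEdges     = edge(mat2gray(octImg), 'Canny');               % candidate TM boundaries for later fitting
```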


Fig. 2. Visual schematic of the pre-processing and feature extraction steps used in this study. Data are first preprocessed to equalize image quality and unify metadata. Then, the data are segmented using edge-finding techniques and the attenuation map to identify the top edge of the TM and the bottom extent of any fluid against potentially high-value noise. Once this valid ROI is identified, texture-, attenuation-, and structural-based features can be extracted for each image and radial profile in the database. Chinchilla OCT data from Group 4 are used to depict this process.


Once the feature extractor segmented the tissue from the background, surface and depth profiles were fit to the top surface of the TM and to the deepest point in the image that could be recognized within the middle ear cavity (in healthy subjects, often the bottom surface of the TM; otherwise, the deepest extent of any fluid, biofilm, etc.). Then, a radial profile was taken tangent to the surface of the eardrum. A table was generated with features and metrics from each radial profile, including 2-D values that captured image-level features. This is graphically displayed in the center panel of Fig. 1, and in Fig. 2. These additional features were implemented to better consider 2-D information such as texture and attenuation for more precisely differentiating images with MEEs and MEBs [44]. To extract texture from fitted regions, the histogram of the entire region of interest (ROI) was taken, and the gamma parameter was fit and extracted [45]. A fixed threshold removed the TM from consideration in the ROI. A maximum likelihood estimator for alpha-stable distributions [46] was also used on the same ROI as a comparison to the standard gamma function. Next, a custom-designed “attenuation pocket” calculation helped to differentiate the amount and density of particulates of any MEE within the middle ear. In total, 15 new features were added to the platform, for a total of 28 features, generally organized into structural, texture, and attenuation map-based features. Each image ranged from 300-800 pixels wide (approximately 0.5–3.0 mm across the TM). For this dataset, this translates to 1.945 million total radial profiles distributed across 4 infection groups and 2 species (Human (5.54%): H1 21.19%, H2 2.10%, H3 7.87%, H4 68.83%; Chinchilla (94.46%): C1 37.92%, C2 31.06%, C3 0.47%, C4 30.55%).
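For instance, the gamma-distribution texture fit on a sub-TM ROI could be carried out as in the sketch below. This is a minimal example assuming a logical ROI mask; the alpha-stable fit (performed in the study with a separate File Exchange routine [46]) and the attenuation-pocket calculation are omitted, and the variable names are illustrative.

```matlab
% Sketch: fit a gamma distribution to the intensity values of a middle ear ROI
% (TM already excluded from the mask by a fixed threshold, as described above).
% octImg: OCT B-scan; roiMask: logical mask of the middle ear ROI
roiPixels = double(octImg(roiMask));           % column vector of ROI intensities
roiPixels = roiPixels(roiPixels > 0);          % gamma fit requires positive support
pd        = fitdist(roiPixels, 'Gamma');       % maximum-likelihood gamma fit
shapeFeat = pd.a;                              % shape parameter used as a texture feature
scaleFeat = pd.b;                              % scale parameter
```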

2.3 Classifier setup: training, tuning, and validation testing

Radial profile data were used in the MATLAB Classification Learner app (R2021a), where the desired features and validation strategies were assigned to test classification performance. Optimizable random forest (RF), k-nearest neighbor (k-NN), and support vector machine (SVM) classifiers were used to test the applicability of this platform on OCT data. Training and hyperparameter tuning (80% of the data) for each classifier were handled through the app with 5-fold cross-validation, where a range of parameters was searched for the lowest misclassification error based on a set of customized misclassification weights. For this platform, weights were set to penalize errors based on the imbalanced dataset sizes in certain groups, commonly known as ‘balanced’ settings in other software. A weight for each group was calculated as:

$$w_j = n_{\textrm{total radial profiles}} / ({n_{\textrm{classification groups}} \cdot n_{\textrm{radial profiles in group } j}}).$$
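The sketch below illustrates how such balanced, per-observation weights could be computed from Eq. (1) and passed to an optimizable ensemble classifier. The function calls are standard Statistics and Machine Learning Toolbox calls, but the variable names and exact option set are illustrative assumptions rather than the platform's actual configuration inside the Classification Learner app.

```matlab
% Sketch: 'balanced' per-observation weights (Eq. 1) and an optimizable ensemble classifier.
% X: nProfiles x 28 feature matrix; y: categorical infection-group labels (Groups 1-4)
groups  = categories(y);
nTotal  = numel(y);
nGroups = numel(groups);
w       = zeros(nTotal, 1);
for j = 1:nGroups
    inGroup    = (y == groups{j});
    w(inGroup) = nTotal / (nGroups * sum(inGroup));   % w_j from Eq. (1)
end
mdl = fitcensemble(X, y, 'Weights', w, ...
    'OptimizeHyperparameters', 'auto', ...            % searched with 5-fold CV, as in the text
    'HyperparameterOptimizationOptions', struct('KFold', 5, 'ShowPlots', false));
```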

The remaining data (20% holdout) were tested using the optimized classifier to generate statistics and a confusion matrix. Care was taken when creating the test sets to ensure minimal or no overlap between subjects, so that test cases were unique. In the human database, there was no overlap of subjects between the training and test sets. With the chinchilla data, the same animal never appeared in both the training and test sets, although an animal may appear across different infection groups, as animals from one study were inoculated with OM and imaged longitudinally as the infection progressed. We do not expect this effect to be substantial, though it is a limitation of the available data.
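A subject-level holdout of this kind could be formed as in the following sketch, which assumes the radial-profile data are stored in a table with a per-profile subject identifier; the table and column names are hypothetical.

```matlab
% Sketch: hold out ~20% of subjects (not profiles) so no subject spans train and test.
% dataTbl: table of radial profiles with a SubjectID column plus feature and label columns
subjects = unique(dataTbl.SubjectID);
nHoldout = round(0.2 * numel(subjects));
rng(1);                                               % reproducible split
testSubj = subjects(randperm(numel(subjects), nHoldout));
isTest   = ismember(dataTbl.SubjectID, testSubj);
trainTbl = dataTbl(~isTest, :);                       % 80% of subjects for training/CV
testTbl  = dataTbl(isTest, :);                        % 20% of subjects for holdout testing
```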

Three main tests were performed to validate the dataset. First, Test 1 attempted to discern between human and chinchilla datasets. Test 2 attempted to use a model trained on chinchilla data to classify infection groups in human data. Lastly, Test 3 tested the performance of the intended use case of a chinchilla-supplemented human database to detect infection group in both human and chinchilla data. Confusion matrix accuracy statistics on radial profiles, as well as image-reconstructed statistics, were both evaluated to ensure minimal misclassification errors across the groups.

2.4 Computational hardware

A custom MATLAB (R2021a) script was used to extract features, and the Classification Learner app was used for classifier training. Scripts were run on consumer-level PCs (Intel Core i9-9980XE, NVIDIA RTX 2080, 64 GB RAM), and no special considerations were needed to run this platform, which was designed to run on any system with MATLAB and a modern GPU. Feature extraction required approximately 60 seconds per image, as the number of radial profiles varied in each image. Classifier training runtime varied widely depending on the type of classifier being trained and the hyperparameter tuning process. Once trained, the model could be exported to classify a single image in approximately 45 seconds when using a CPU-based implementation.

3. Results

3.1 Statistical exploration of the dataset with t-SNE analysis

To ensure the data between species were comparable, and to explore the tradeoffs in ground truth labeling, the feature vectors were examined to explore their distribution with respect to both species and infection group. If the data are to be compared equitably, they should not be easily distinguishable by species. For infection profiles, clinical presentations appear on more of a spectrum, such that infection groups may be individually clustered but with boundaries that are not clearly defined. Representative data from each group and species are shown in Fig. 3(A) and 3(B). Visually, the OCT image data between species appear comparable, as further described in the image caption. To examine the data, the 28-feature space was reduced to a 2-feature space using t-distributed stochastic neighbor embedding (t-SNE) to explore these distributions [47–49]. t-SNE plots, shown in the bottom of Fig. 3, were generated using raw feature vectors without ground truth labeling (species or infection group). Coloration with labels was added after running the function to better visualize and interpret the dataset.
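A sketch of this dimensionality reduction step is given below. The perplexity and learning-rate scaling rules shown are assumed heuristics for large datasets in the spirit of Ref. [49]; the exact hyperparameter values used in the study are those reported in Fig. 4, and the variable names are illustrative.

```matlab
% Sketch: reduce the 28-feature space to 2-D with t-SNE, scaling hyperparameters to the
% dataset size for very large datasets [49]; labels are only applied afterwards for plotting.
% X: nProfiles x 28 feature matrix (no species or group labels supplied to tsne)
n  = size(X, 1);
Y2 = tsne(X, 'NumDimensions', 2, ...
          'Perplexity', max(30,  round(n/100)), ...   % assumption: perplexity ~ n/100 heuristic
          'LearnRate',  max(200, round(n/12)),  ...   % assumption: learning rate ~ n/12 heuristic
          'Verbose', 1);
gscatter(Y2(:,1), Y2(:,2), groupLabels);              % color by infection group after embedding
```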


Fig. 3. Visual comparison of (A) chinchilla and (B) human OCT data. The t-SNE plots help visualize the overall distribution of data in this study. Bottom Left: When labeled by species, the distributions of data overlap, lending support that human and chinchilla data can be directly compared. Bottom Right: When labeled by infection group, the distribution appears more as a spectrum, ranging from healthy (Group 1 – Normal in Red) to severely infected cases (Group 4 – Biofilm + Purulent Fluid in Purple). Color coding is applicable to both the species (bottom highlight bar) and infection group (image boundaries) and can be directly compared in the t-SNE plots. OCT scale bars represent approximately 100 µm in depth.


The data were well distributed, with no identifiable grouping with respect to species. When labeled by infection group, distinct clusters emerged but followed the expected spectrum of clinical presentations. The normal group (Group 1, red) appears to separate into two clusters. This is likely because no scaling of images, which would match the chinchilla TM thickness to that of humans, was performed for any test, including t-SNE or the later classification tests. However, when comparing the normal group to the species plot in the same region, there is also a fair amount of overlap in these clusters. Likewise, the infection groups (Groups 2-4) are distributed but slightly overlapped and share common elements. For example, there is a small cluster of green points mixed within the red normal cluster, indicating the tradeoffs when the granularity of the ground truth label is not as fine as the data itself. Specifically, serous OM cases may have some depth profiles that resemble normal depth profiles due to a limited volume of fluid in the middle ear cavity, even though they are taken from a subject with fluid and the majority of profiles in that dataset demonstrate fluid.

Additional tests shown in Fig. 4 demonstrate the hyperparameter tuning of the t-SNE plots to ensure they have converged, with a comparison to principal component analysis (PCA) and uniform manifold approximation and projection (UMAP) [50]. The stability of the parameters was tested by varying the parameter space by ±20% and by reducing the number of datapoints by 50% and 25%. For t-SNE, the learning rate and perplexity were scaled to the dataset size to more accurately visualize global dataset qualities when using very large datasets [49]. Default settings for both PCA and UMAP produced satisfactory results, since parameter variation did not seem to significantly alter the groupings. UMAP results converged well using default settings, and variation of min_dist and n_neighbors produced no meaningful differences and thus is not shown.
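The stability check described above could be scripted as in the following sketch, which varies the t-SNE perplexity by ±20% and subsamples the feature matrix; the base heuristic and variable names are the same assumptions as in the earlier t-SNE sketch, not the study's exact settings.

```matlab
% Sketch: stability check by varying perplexity +/- 20% and subsampling the dataset.
% X: nProfiles x 28 feature matrix
basePerplexity = max(30, round(size(X,1)/100));       % assumption: same heuristic as above
for scale = [0.8 1.0 1.2]                             % +/- 20% variation of the parameter
    for frac = [0.25 0.5 1.0]                         % 25%, 50%, and full dataset
        idx = randperm(size(X,1), round(frac*size(X,1)));
        Ys  = tsne(X(idx,:), 'Perplexity', round(scale*basePerplexity));
        % inspect cluster structure of Ys for each setting (e.g., with gscatter)
    end
end
```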


Fig. 4. Analysis and visualization of dataset characteristics. Top: t-SNE plots from Fig. 3 show finalized t-SNE results and hyperparameters. PCA and UMAP results demonstrate similar findings as t-SNE tests. Hyperparameters used to generate plots are shown below each technique. Bottom: t-SNE hyperparameter variation reveals a stable or converged association between groups, though the exact organization and position of clusters changes slightly in each iteration.


When specific features were temporarily removed from the t-SNE comparison, the clusters converged more readily than in the presented plots. 2-D image features such as texture and attenuation are influenced by minute differences in OCT system hardware or by inherent species-level differences, and as a result, data taken from a given system likely contain a unique fingerprint that is detectable when images are directly compared in this manner. This is not believed to be detrimental to classification, as the tests focus on identifying infection group, not species. Furthermore, each system collected data across all infection groups. In other words, one system collected data for most human studies, and the other collected data for most chinchilla studies, with some overlap depending on scheduling at the time. Additional exploration is shown in Supplement 1.

3.2 Classification results

Results of the classification tests are shown in Table 1. Each classification task is itemized with its respective training/validation and test datasets, with results from the major supervised-learning classifier types. Test 1: To verify that chinchilla and human data are indistinguishable, classifiers were trained with a blended dataset with the intention of identifying species (chinchilla or human). The classifiers performed poorly overall, supporting the conclusion that the blended dataset is comparable and not separable by species. In conjunction with the t-SNE results above, this allowed further testing to proceed with more confidence. Test 2: To explore whether chinchilla data alone could be used to identify the state of infection in human data, the classifiers were retrained with infection group labels. The results are overall insufficient to perform this task reliably, and thus a combined model is likely preferred. Test 3: Using the full, blended dataset to identify infection group, results are promising. Still, Group 3 classification performance was unsatisfactory, likely due to its limited dataset size (G3: 17k entries vs. 1.95 million total entries). Test 4 was added to ensure the classifier was not overly prioritizing Group 3 (Biofilm) data with the weighting parameters used, so Group 3 was temporarily removed during analysis. Overall, the classifier performs well for this task over both chinchilla and human data, with a 95% mean classification accuracy and F1 score. With additional Group 3 data in the future, it remains to be evaluated whether the results would approach those seen in Tests 3 and 4.
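For reference, the accuracy and F1 statistics reported in Table 1 follow the standard confusion-matrix definitions, which could be computed from the holdout predictions as in this sketch (variable names are illustrative):

```matlab
% Sketch: accuracy and macro-averaged F1 score from holdout predictions.
% yTrue, yPred: categorical vectors of true and predicted infection groups
C         = confusionmat(yTrue, yPred);        % rows: true class, cols: predicted class
accuracy  = sum(diag(C)) / sum(C(:));
precision = diag(C) ./ sum(C, 1)';             % per-class precision
recall    = diag(C) ./ sum(C, 2);              % per-class recall
f1        = 2 * (precision .* recall) ./ (precision + recall);
macroF1   = mean(f1, 'omitnan');               % macro-average over infection groups
```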


Table 1. Classification results. Training data, test data, and classification task are specified for each experiment. The test dataset comprises both chinchilla and human data to test species-specific performance. Results for each species are shown for each test, along with a combined overall F1 score. H: human OCT data. C: chinchilla OCT data. Hn/Cn: species infection group(s) 1/2/3/4. CV: cross-validation. RF: random forest. SVM: support vector machine. k-NN: k-nearest neighbor. RPC: radial profile accuracy for chinchilla data; RPH: radial profile accuracy for human data. F1: F1 score.

4. Discussion

The data augmentation strategy implemented here was initially explored to supplement cases of human acute OM with serous fluid (Group 2) using chinchilla data. Fortunately, the results demonstrate that all data groupings are viable. Generally, this is possible because of the physiological and anatomical similarities of chinchillas to humans under infectious OM conditions. t-SNE plots and preliminary classification tests provide strong evidence of these similarities in this dataset. Without image scaling, TM thicknesses will differ between species, though they should still be identified as normal, which likely explains the two distributions seen in the normal group. The classification results are otherwise straightforward to interpret. Tests 1 and 2 achieve an F1 score near the 70% mark: when given a test dataset, it is not possible to reliably discern the original species, nor is it reliable to use a chinchilla-only database to identify infection states in human data. The use of a blended dataset provides better results in Test 3, but due to the limited Group 3 data (biofilm, 0.88%), the overall performance was still limited. Excluding Group 3 temporarily, as in Test 4, the performance of this platform became much more consistent. Presumably, with additional data, all data groups can be used. Currently, the database of this platform is somewhat limited in terms of datapoints and source diversity. Additional data from human subjects are needed to match what was collected for chinchilla data, and specifically to increase the amount of data for Group 3 (biofilm). While biofilms are readily identified in humans with COME, the chinchilla model studied here is designed for AOM [34,51], which limits correlation with Group 3 human data.

While most 1-D profiles in a 2-D image may match the predicted group, some profiles may more closely represent a different infection group. Middle ear contents are fairly heterogeneous, and as such their 1-D appearance generally depends on the exact position and location of the scan and on the relative amount of fluid volume in the middle ear cavity. For example, as shown in Fig. 3, a Serous Fluid image (Group 2) can have some regions that appear normal (Group 1) if the scattering from the fluid is very low and/or if the particulates are sufficiently dispersed; the radial profile extracted would then contain only the signal from the TM and no other scatterers. Similarly, in a Biofilm and Purulent Fluid dataset (Group 4), the fluid and biofilm distribution may be irregular and may have elements of pure fluid or pure biofilm (Group 2 or 3). Overall, this crosstalk effect seems minimal, but it is worth noting as a tradeoff of assigning ground truth labels at the 2-D image level instead of the 1-D depth-profile level.

For the segmentation platform, the detection of the interface between the inner surface of the TM and any biofilm or fluid would be an ideal metric to capture. While not always possible to visualize due to index matching between the biofilm layers and TM, it may allow for more distinction and quantification within datasets. Distinguishing between biofilms and dense fluid is also difficult and may need to rely on multiple/adjacently collected frames. Alternatively, pneumatic OCT [7] may be able to provide some mechanical perturbation to give some distinction of these features. In the future, using unsupervised or weakly-supervised deep learning methods for both segmentation and classification may be beneficial as more granular infection group labels are desired or other tests are explored. As experienced in other imaging modalities, labeling or relabeling very large datasets is a substantial and time-consuming task [43]. Similarly, lessons learned in deep learning on otoscopy images can be integrated as well [5254]. Our group is working on capturing and integrating other forms of optical contrast, such as from Raman spectroscopy, into this imaging platform, and these additional features could be integrated to further improve the automated classification [55]. Finally, the results in this paper constitute OCT-data only classification. Additional human datasets collected in the future can be added to improve the accuracy of the platform, along with clinical reports to take into account infection history, as was done previously [21].

Currently, this machine learning platform and imaging system are used as an observational tool during clinical visits. Correlating the image features gathered with OCT to clinical diagnostic and treatment protocols helps to validate the diagnostic utility of the system and demonstrate parity with or advantages over standard techniques. Recently, a low-cost and portable briefcase OCT system from our team demonstrated user-invariant and accurate classification results between novice and expert users [17], albeit paired with the older version of this classification platform. Longer term, we hope the expanded information offered by this platform, when paired with an OCT imaging system, can be studied as part of the physician's decision-making process. In a future study, we plan to compare this existing platform against transfer learning or other deep learning techniques in terms of performance, interpretability, and clinical utility. Our team is conducting several ongoing clinical studies, which will grow the data library in diversity and help to prevent overfitting when exploring new model types.

Finally, we believe the results demonstrated here are informative for teams performing machine learning based studies on human diseases that may have a related animal model or for related projects in otoscopy. Many of the challenges that were managed in this study are broadly applicable to other OCT-based machine learning projects, especially if integrating data from multiple systems and clinical sites, or similarly if no public databases exist or are widely available. Database organization, management, and potential future dissemination can be modeled after public retinal OCT databases (OPENICPSR, Biobank [56], AROI), or datasets available from specific research teams [57]. As demonstrated here, it is possible for preclinical animal models to augment data in machine learning platforms and databases. The successful integration of chinchilla data into this human OM / OCT model provides strong evidence that the chinchilla is indeed a comparable infection model for human OM.

5. Conclusion

These preliminary results demonstrate the effectiveness of supplementing a limited human OCT image dataset of OM with OCT image data from a relevant, physiologically and anatomically similar pre-clinical animal model until additional human data can be acquired over time. While this data augmentation strategy may not be possible in all applications, as the similarities between humans and pre-clinical animal disease models may not always be as strong, this study demonstrates and validates the long-standing use of the chinchilla as a surrogate for studying human OM. With additional study and refinement, the diagnostic capabilities of this combined OCT and ML platform may positively impact clinical decision making and patient outcomes in the future.

Funding

National Institutes of Health (R01DC019412, R01EB028615).

Acknowledgments

The authors thank the research coordinators and research staff at Carle Foundation Hospital, and Eric J. Chaney, Edita Aksamitiene, PhD and Marina Marjanovic, PhD from the University of Illinois Urbana-Champaign for their operational assistance with our IRB protocols and clinical studies. All data used in this study was collected under an approved IRB (University of Illinois, Carle Foundation Hospital, Medical College of Wisconsin) or IACUC protocol (Medical College of Wisconsin).

Disclosures

GLM, JW, DRS, SAB; The Board of Trustees of the University of Illinois (P)

MAN, SAB; PhotoniCare, Inc (IE)

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request and through a mutual agreement.

Supplemental document

See Supplement 1 for supporting content.

References

1. K. M. Harmes, R. A. Blackwood, H. L. Burrows, J. M. Cooke, R. V. Harrison, and P. P. Passamani, “Otitis media: diagnosis and treatment,” Am Fam Physician 88(7), 435–440 (2013).

2. A. S. Lieberthal, A. E. Carroll, T. Chonmaitree, T. G. Ganiats, A. Hoberman, M. A. Jackson, M. D. Joffe, D. T. Miller, R. M. Rosenfeld, X. D. Sevilla, R. H. Schwartz, P. A. Thomas, and D. E. Tunkel, “The diagnosis and management of acute otitis media,” Pediatrics 131(3), e964–e999 (2013). [CrossRef]  

3. L. Hall-Stoodley, F. Z. Hu, A. Gieseke, L. Nistico, D. Nguyen, J. Hayes, M. Forbes, D. P. Greenberg, B. Dice, A. Burrows, P. A. Wackym, P. Stoodley, J. C. Post, G. D. Ehrlich, and J. E. Kerschner, “Direct detection of bacterial biofilms on the middle-ear mucosa of children with chronic otitis media,” JAMA 296(2), 202–211 (2006). [CrossRef]  

4. G. L. Monroy, W. Hong, P. Khampang, R. G. Porter, M. A. Novak, D. R. Spillman, R. Barkalifa, E. J. Chaney, J. E. Kerschner, and S. A. Boppart, “Direct analysis of pathogenic structures affixed to the tympanic membrane during chronic otitis media,” Otolaryngol.--Head Neck Surg. 159(1), 117–126 (2018). [CrossRef]  

5. M. F. L. van den Broek, I. De Boeck, F. Kiekens, A. Boudewyns, O. M. Vanderveken, and S. Lebeer, “Translating recent microbiome insights in otitis media into probiotic strategies,” Clin. Microbiol. Rev. 32(4), e00010 (2019). [CrossRef]  

6. M. E. Pichichero, “Diagnostic accuracy, tympanocentesis training performance, and antibiotic selection by pediatric residents in management of otitis media,” Pediatrics 110(6), 1064–1070 (2002). [CrossRef]  

7. J. Won, G. L. Monroy, P.-C. Huang, R. Dsouza, M. C. Hill, M. A. Novak, R. G. Porter, E. Chaney, R. Barkalifa, and S. A. Boppart, “Pneumatic low-coherence interferometry otoscope to quantify tympanic membrane mobility and middle ear pressure,” Biomed. Opt. Express 9(2), 397–409 (2018). [CrossRef]  

8. T. C. Cavalcanti, S. Kim, K. Lee, S. Y. Lee, M. K. Park, and J. Y. Hwang, “Smartphone-based spectral imaging otoscope: System development and preliminary study for evaluation of its potential as a mobile diagnostic tool,” J. Biophotonics 13(6), e2452 (2020). [CrossRef]  

9. T. C. Cavalcanti, H. M. Lew, K. Lee, S. Y. Lee, M. K. Park, and J. Y. Hwang, “Intelligent smartphone-based multimode imaging otoscope for the mobile diagnosis of otitis media,” Biomed. Opt. Express 12(12), 7765–7779 (2021). [CrossRef]  

10. J. V. Sundgaard, J. Harte, P. Bray, S. Laugesen, Y. Kamide, C. Tanaka, R. R. Paulsen, and A. N. Christensen, “Deep metric learning for otitis media classification,” Med. Image Anal. 71, 102034 (2021). [CrossRef]  

11. J. A. Carr, T. A. Valdez, O. T. Bruns, and M. G. Bawendi, “Using the shortwave infrared to image middle ear pathologies,” Proc. Natl. Acad. Sci. U. S. A. 113(36), 9989–9994 (2016). [CrossRef]  

12. Y. B. Ji, I. S. Moon, H. S. Bark, S. H. Kim, D. W. Park, S. K. Noh, Y. M. Huh, J. S. Suh, S. J. Oh, and T. I. Jeon, “Terahertz otoscope and potential for diagnosing otitis media,” Biomed. Opt. Express 7(4), 1201–1209 (2016). [CrossRef]  

13. J. J. Yim, S. P. Singh, A. Xia, R. Kashfi-Sadabad, M. Tholen, D. M. Huland, D. Zarabanda, Z. Cao, P. Solis-Pazmino, M. Bogyo, and T. A. Valdez, “Short-wave infrared fluorescence chemical sensor for detection of otitis media,” ACS Sens. 5(11), 3411–3419 (2020). [CrossRef]  

14. A. Locke, S. Fitzgerald, and A. Mahadevan-Jansen, “Advances in optical detection of human-associated pathogenic bacteria,” Molecules 25(22), 5256 (2020). [CrossRef]  

15. G. L. Monroy, R. L. Shelton, R. M. Nolan, C. T. Nguyen, M. A. Novak, M. C. Hill, D. T. McCormick, and S. A. Boppart, “Noninvasive depth-resolved optical measurements of the tympanic membrane and middle ear for differentiating otitis media,” Laryngoscope 125(8), E276–E282 (2015). [CrossRef]  

16. G. L. Monroy, J. Won, D. R. Spillman, R. Dsouza, and S. A. Boppart, “Clinical translation of handheld optical coherence tomography: Practical considerations and recent advancements,” J. Biomed. Opt. 22(12), 1–30 (2017). [CrossRef]  

17. J. Won, G. L. Monroy, R. I. Dsouza, D. R. Spillman Jr., J. McJunkin, R. G. Porter, J. Shi, E. Aksamitiene, M. Sherwood, L. Stiger, and S. A. Boppart, “Handheld briefcase optical coherence tomography with real-time machine learning classifier for middle ear infections,” Biosensors (Basel) 11(5), 143 (2021). [CrossRef]  

18. J. Won, G. L. Monroy, P. C. Huang, M. C. Hill, M. A. Novak, R. G. Porter, D. R. Spillman, E. J. Chaney, R. Barkalifa, and S. A. Boppart, “Assessing the effect of middle ear effusions on wideband acoustic immittance using optical coherence tomography,” Ear Hear 41(4), 811–824 (2020). [CrossRef]  

19. W. Kim, S. Kim, S. Huang, J. S. Oghalai, and B. E. Applegate, “Picometer scale vibrometry in the human middle ear using a surgical microscope based optical coherence tomography and vibrometry system,” Biomed. Opt. Express 10(9), 4395–4410 (2019). [CrossRef]  

20. K. Park, N. H. Cho, M. Jeon, S. H. Lee, J. H. Jang, S. A. Boppart, W. Jung, and J. Kim, “Optical assessment of the in vivo tympanic membrane status using a handheld optical coherence tomography-based otoscope,” Acta Otolaryngol 138(4), 367–374 (2018). [CrossRef]  

21. G. L. Monroy, J. Won, R. Dsouza, P. Pande, M. C. Hill, R. G. Porter, M. A. Novak, D. R. Spillman, and S. A. Boppart, “Automated classification platform for the identification of otitis media using optical coherence tomography,” npj Digit. Med. 2(1), 22 (2019). [CrossRef]  

22. N. H. Cho, S. H. Lee, W. Jung, J. H. Jang, and J. Kim, “Optical coherence tomography for the diagnosis and evaluation of human otitis media,” J. Korean Med. Sci. 30(3), 328–335 (2015). [CrossRef]  

23. C. G. Lui, W. Kim, J. B. Dewey, F. D. Macias-Escriva, K. Ratnayake, J. S. Oghalai, and B. E. Applegate, “In vivo functional imaging of the human middle ear with a hand-held optical coherence tomography device,” Biomed. Opt. Express 12(8), 5196–5213 (2021). [CrossRef]  

24. H. Y. Lee, P. D. Raphael, A. Xia, J. Kim, N. Grillet, B. E. Applegate, A. K. Ellerbee Bowden, and J. S. Oghalai, “Two-dimensional cochlear micromechanics measured in vivo demonstrate radial tuning within the mouse organ of corti,” J. Neurosci. 36(31), 8160–8173 (2016). [CrossRef]  

25. E. S. Olson and C. E. Strimbu, “Cochlear mechanics: New insights from vibrometry and optical coherence tomography,” Curr. Opin Physiol. 18, 56–62 (2020). [CrossRef]  

26. H. E. I. Tan, P. L. Santa Maria, P. Wijesinghe, B. Francis Kennedy, B. J. Allardyce, R. H. Eikelboom, M. D. Atlas, and R. J. Dilley, “Optical coherence tomography of the tympanic membrane and middle ear: a review,” Otolaryngol Head Neck Surg. 159(3), 424–438 (2018). [CrossRef]  

27. D. MacDougall, J. Rainsbury, J. Brown, M. Bance, and R. Adamson, “Optical coherence tomography system requirements for clinical diagnostic middle ear imaging,” J. Biomed. Opt. 20(5), 056008 (2015). [CrossRef]  

28. H. R. Djalilian, M. Rubinstein, E. C. Wu, K. Naemi, S. Zardouz, K. Karimi, and B. J. Wong, “Optical coherence tomography of cholesteatoma,” Otol Neurotol. 31(6), 932–935 (2010). [CrossRef]  

29. A. Mehrotra, M. Chernew, D. Linetsky, H. Hatch, D. Cutler, and E. C. Schneider, “The impact of the COVID-19 pandemic on outpatient care: Visits return to prepandemic levels, but not for all providers and patients,” https://www.commonwealthfund.org/publications/2020/oct/impact-covid-19-pandemic-outpatient-care-visits-return-prepandemic-levels (10/15/20). Accessed 1/31/21. (10.26099/41xy-9m57).

30. X. Xu, “Children wearing facemasks during the COVID-19 pandemic has reduced pressure on paediatric respiratory departments,” Acta Paediatr 110(3), 750 (2021). [CrossRef]  

31. L. O. Bakaletz, “Chinchilla as a robust, reproducible and polymicrobial model of otitis media and its prevention,” Expert Rev Vaccines 8(8), 1063–1082 (2009). [CrossRef]  

32. M. Shimoyama, J. R. Smith, J. De Pons, M. Tutaj, P. Khampang, W. Z. Hong, C. B. Erbe, G. D. Ehrlich, L. O. Bakaletz, and J. E. Kerschner, “The chinchilla research resource database: Resource for an otolaryngology disease model,” Database-Oxford, (2016).

33. G. S. Giebink, “The pathogenesis of pneumococcal otitis media in chinchillas and the efficacy of vaccination in prophylaxis,” Rev Infect Dis 3(2), 342–352 (1981). [CrossRef]  

34. J. E. Kerschner, P. Khampang, and T. Samuels, “Extending the chinchilla middle ear epithelial model for mucin gene investigation,” Int. J. Pediatr Otorhi 74(9), 980–985 (2010). [CrossRef]  

35. M. F. Bhutta, R. B. Thornton, L. S. Kirkham, J. E. Kerschner, and M. T. Cheeseman, “Understanding the aetiology and resolution of chronic otitis media from animal and human studies,” Dis. Model Mech. 10(11), 1289–1300 (2017). [CrossRef]  

36. W. Hong, P. Khampang, A. R. Kerschner, A. C. Mackinnon, K. Yan, P. M. Simpson, and J. E. Kerschner, “Antibiotic modulation of mucins in otitis media; should this change our approach to watchful waiting?” Int J Pediatr Otorhinolaryngol 125, 134–140 (2019). [CrossRef]  

37. S. Meherali, A. Campbell, L. Hartling, and S. Scott, “Understanding parents’ experiences and information needs on pediatric acute otitis media: A qualitative Study,” J Patient Exp 6(1), 53–61 (2019). [CrossRef]  

38. K. Kubicek, D. Liu, C. Beaudin, J. Supan, G. Weiss, Y. Lu, and M. D. Kipke, “A profile of nonurgent emergency department use in an urban pediatric hospital,” Pediatr Emerg Care 28(10), 977–984 (2012). [CrossRef]  

39. J. Won, W. Hong, P. Khampang, D. R. Spillman Jr., S. Marshall, K. Yan, R. G. Porter, M. A. Novak, J. E. Kerschner, and S. A. Boppart, “Longitudinal optical coherence tomography to visualize the in vivo response of middle ear biofilms to antibiotic therapy,” Sci. Rep. 11(1), 5176 (2021). [CrossRef]  

40. J. J. Rosowski, A. Ramier, J. T. Cheng, and S. H. Yun, “Optical coherence tomographic measurements of the sound-induced motion of the ossicular chain in chinchillas: Additional modes of ossicular motion enhance the mechanical response of the chinchilla middle ear at higher frequencies,” Hear Res 396, 108056 (2020). [CrossRef]  

41. A. Ramier, J. T. Cheng, M. E. Ravicz, J. J. Rosowski, and S. H. Yun, “Mapping the phase and amplitude of ossicular chain motion using sound-synchronous optical coherence vibrography,” Biomed Opt Express 9(11), 5489–5502 (2018). [CrossRef]  

42. G. L. Monroy, P. Pande, R. M. Nolan, R. L. Shelton, R. G. Porter, M. A. Novak, D. R. Spillman, E. J. Chaney, D. T. McCormick, and S. A. Boppart, “Noninvasive in vivo optical coherence tomography tracking of chronic otitis media in pediatric subjects after surgical intervention,” J. Biomed. Opt. 22(12), 1–11 (2017). [CrossRef]  

43. M. J. Willemink, W. A. Koszek, C. Hardell, J. Wu, D. Fleischmann, H. Harvey, L. R. Folio, R. M. Summers, D. L. Rubin, and M. P. Lungren, “Preparing medical imaging data for machine learning,” Radiology 295(1), 4–15 (2020). [CrossRef]  

44. R. Dsouza, D. R. Spillman Jr., R. Barkalifa, G. L. Monroy, E. J. Chaney, K. C. White, and S. A. Boppart, “In vivo detection of endotracheal tube biofilms in intubated critical care patients using catheter-based optical coherence tomography,” J. Biophotonics 12(5), e201800307 (2019). [CrossRef]  

45. A. A. Lindenmaier, L. Conroy, G. Farhat, R. S. DaCosta, C. Flueraru, and I. A. Vitkin, “Texture analysis of optical coherence tomography speckle for characterizing biological tissues in vivo,” Opt. Lett. 38(8), 1280–1282 (2013). [CrossRef]  

46. P. Zagaglia, Estimation of alpha-stable distribution parameters using a quantile method (https://www.mathworks.com/matlabcentral/fileexchange/34783-estimation-of-alpha-stable-distribution-parameters-using-a-quantile-method), MATLAB Central File Exchange. Retrieved Oct 2020.

47. S. You, Y. Sun, L. Yang, J. Park, H. Tu, M. Marjanovic, S. Sinha, and S. A. Boppart, “Real-time intraoperative diagnosis by deep neural network driven multiphoton virtual histology,” NPJ Precis Oncol 3(1), 33 (2019). [CrossRef]  

48. Y. Li, C. M. Nowak, U. Pham, K. Nguyen, and L. Bleris, “Cell morphology-based machine learning models for human cell state classification,” NPJ Syst Biol Appl 7(1), 23 (2021). [CrossRef]  

49. A. C. Belkina, C. O. Ciccolella, R. Anno, R. Halpert, J. Spidlen, and J. E. Snyder-Cappione, “Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets,” Nat. Commun. 10(1), 5415 (2019). [CrossRef]  

50. C. Meehan, J. Ebrahimian, W. Moore, and S. Meehan, Uniform manifold approximation and projection (UMAP) (https://www.mathworks.com/matlabcentral/fileexchange/71902), MATLAB Central File Exchange (2021).

51. S. D. Reid, W. Hong, K. E. Dew, D. R. Winn, B. Pang, J. Watt, D. T. Glover, S. K. Hollingshead, and W. E. Swords, “Streptococcus pneumoniae forms surface-attached communities in the middle ear of experimentally infected chinchillas,” J. Infect Dis. 199(6), 786–794 (2009). [CrossRef]  

52. X. Zeng, Z. Jiang, W. Luo, H. Li, H. Li, G. Li, J. Shi, K. Wu, T. Liu, X. Lin, F. Wang, and Z. Li, “Efficient and accurate identification of ear diseases using an ensemble deep learning model,” Sci. Rep. 11(1), 10839 (2021). [CrossRef]  

53. K. Tsutsumi, K. Goshtasbi, A. Risbud, P. Khosravi, J. C. Pang, H. W. Lin, H. R. Djalilian, and M. Abouzari, “A web-based deep learning model for automated diagnosis of otoscopic images,” Otol Neurotol 42(9), e1382–e1388 (2021). [CrossRef]  

54. D. Livingstone, A. S. Talai, J. Chau, and N. D. Forkert, “Building an otoscopic screening prototype tool using deep learning,” J Otolaryngol Head Neck Surg 48(1), 66 (2019). [CrossRef]  

55. A. Locke, F. R. Zaki, S. Fitzgerald, K. Sudhir, G. L. Monroy, H. Choi, J. Won, A. Mahadevan-Jansen, and S. A. Boppart, “Differentiation of otitis media-causing planktonic and bacterial biofilms via Raman spectroscopy and optical coherence tomography,” Frontiers, in press (2022).

56. P. A. Keane, C. M. Grossi, P. J. Foster, Q. Yang, C. A. Reisman, K. Chan, T. Peto, D. Thomas, P. J. Patel, and U. K. B. E. V. Consortium, “Optical coherence tomography in the UK Biobank study - rapid automated analysis of retinal thickness for large population-based studies,” PLoS One 11(10), e0164095 (2016). [CrossRef]  

57. S. M. Khan, X. Liu, S. Nath, E. Korot, L. Faes, S. K. Wagner, P. A. Keane, N. J. Sebire, M. J. Burton, and A. K. Denniston, “A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability,” Lancet Digit Health 3(1), e51–e66 (2021). [CrossRef]  

Supplementary Material (1)

Supplement 1: Updated full supplemental document.
