COVID-19 screening with digital holographic microscopy using intra-patient probability functions of spatio-temporal bio-optical attributes

Timothy O’Connor; Bahram Javidi

doi:10.1364/BOE.466005

1. Introduction

Coronavirus disease 2019 (COVID-19), caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), can cause shortness of breath, fatigue, and fevers or chills, with severe cases resulting in hospitalization or death. In addition to the outwardly expressed symptoms of the disease, COVID-19 has also been reported to induce several changes to red blood cells (RBCs) [1–8]. Reported differences in RBCs associated with COVID-19 include: decreased hemoglobin and hematocrit levels [1], increased RBC distribution width [1], increased frequency of stomatocytes and knizocytes [3], increased intracellular nitric oxide levels [4], decreased RBC deformability [2][5], and increased RBC aggregation [2][5]. Additional conflicting reports suggest changes in the mean cell volume as well, with [2] and [6] reporting increased volume, [7] reporting decreased cell volume, and [8] reporting no statistically significant difference associated with infection. Notably, these studies [1–8] show meaningful differences both at the cellular level and among the distributions of hematological parameters between healthy and infected individuals.

Digital holographic microscopy (DHM) is a proven modality for biological imaging that has gained popularity because of its label-free, quantitative phase imaging capabilities, which enable detailed cell analysis and disease identification [9–24]. DHM is an interferometric imaging system that provides nanometer scale phase sensitivity and single-shot operation. Beyond simple cell inspection, these properties make DHM a valuable tool in studying time-varying properties of live biological samples [9,10,17–22]. Early studies using DHM to extract spatio-temporal information for disease identification showed success in the classification of individuals for sickle cell disease [17,18] and more recently for COVID-19 [9,10]. Initial works in this area developed hand-crafted features to capture the spatio-temporal information [17]. Following the promising results of hand-crafted spatio-temporal features, performance was further improved by employing deep learning in the form of a bi-directional long short-term memory (bi-LSTM) network to learn to classify time-varying signals [9,18]. Most recently, cell identification from spatio-temporal information was further advanced using highly comparative time-series analysis (HCTSA) [25,26] for massive temporal feature extraction [10]. This last approach found success in extracting features from the time-varying optical volume signal computed from the reconstructed red blood cells and outperformed both the manually defined hand-crafted features and the bi-LSTM approach for spatio-temporal classification using deep learning. Though successful in their task of disease screening, each of these previous approaches focused on classifying individual RBCs and then used a consensus policy to determine an individual’s classification with respect to the disease state.
However, information on the distribution of a given patient’s cells such as RBC distribution width can also serve as important biomarkers for disease [27] and therefore may be beneficial to improving diagnostic performance. To this end, we introduce an approach to directly classify individuals based on the probability distribution functions of cellular bio-optical attributes from digital holographic reconstructed RBCs.

More specifically, we propose a classification scheme based on the Bag-of-Features (BoF) [28] methodology to create feature vectors from the histograms of cellular bio-optical attributes. For each patient, the histograms are computed for attributes of interest and normalized as probability density functions (PDFs) to account for differences in the number of cells between patients. The histogram bin counts are concatenated into a feature vector and then classified using a linear support vector machine. We additionally compare this proposed approach to classifiers directly using statistical measures such as mean, standard deviation, skewness, and kurtosis and to a k-nearest neighbor classifier using the Kolmogorov-Smirnov distance between probability distributions as the distance metric [29]. Our results show improved performance through the inclusion of this distribution information in our classification scheme.

The remainder of this paper is organized as follows. We begin with an overview of the microscopy system for digital holographic recording and the corresponding numerical reconstruction of RBCs. Next, we detail our BoF classification approach. Then we present the results of a statistical analysis of the studied bio-optical attributes of the RBCs, followed by the classification results. Finally, the results are followed by a discussion and the conclusions of our work.

2. Methodology

2.1 Digital holographic microscopy

A low-cost, 3D-printed shearing digital holographic microscope was used for RBC data collection as previously reported [9,10]. Shearing DHM relies on a glass plate to laterally shear the incoming object beam into two duplicated copies with an off-axis angle between them and provides a simple implementation of common-path, off-axis DHM [12,13,23]. The system used for data collection consists of a laser diode (1.2 mW, Thorlabs CPS635R), a microscope objective lens (40X, 0.65 NA), a shear plate, a CMOS image sensor (Basler acA3800-14um), and a 1D translational stage for axial positioning of the sample. The system is shown along with an optical diagram in Fig. 1.

Fig. 1. (a) Optical configuration and (b) 3D-printed experimental system with dimensions 94 mm x 107 mm x 190.5 mm [9] used in RBC data collection.

As both laterally sheared copies of the beam contain the object information, shearing systems require sparse sample preparation to avoid overlapping information between the object and reference beams. This is easily accomplished with RBC samples by making thin blood smears for imaging. The redundant capture of a given cell, which arises due to a shearing configuration, can be avoided by setting the lateral shear greater than the sensor dimensions. The amount of lateral shear is a function of the glass plate thickness, refractive index, and the angle of incidence [30]. By the Rayleigh criterion, our best achievable lateral resolution is 0.594 µm. Temporal stability of the system, determined from the pixel-wise standard deviations in optical path length (OPL) for a video of a blank glass slide, was measured to be 2.5 nm.
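
The temporal stability measurement described above can be sketched as follows. This is a minimal illustration rather than the procedure used in this work: the function name, the use of the population standard deviation, and the choice to summarize the per-pixel values by their mean are all assumptions.

```python
from statistics import mean, pstdev


def temporal_stability(opl_video):
    """Summarize temporal noise for a video of a blank slide.

    opl_video: list of frames, each frame a 2D list of OPL values
    (all frames must share the same shape).
    Returns the mean, over pixels, of each pixel's standard deviation
    through time (assumed aggregation, not stated in the text).
    """
    rows, cols = len(opl_video[0]), len(opl_video[0][0])
    per_pixel_std = [
        pstdev([frame[r][c] for frame in opl_video])
        for r in range(rows)
        for c in range(cols)
    ]
    return mean(per_pixel_std)
```

With OPL frames expressed in nanometers, the returned value would be directly comparable to the 2.5 nm figure quoted above, subject to the stated assumptions.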

Numerical reconstruction follows the Fourier spectrum analysis method customary to off-axis digital holographic reconstruction [11–13]. In Fourier space, the object spectrum is isolated from the DC and conjugate terms, centered, and then inverse-transformed. From the recovered complex amplitude, Ũ(ξ, η), the object phase is extracted as Φ = tan⁻¹[Im{Ũ}/Re{Ũ}], where Im{·} and Re{·} represent the imaginary and real parts, respectively. We use Goldstein’s branch-cut method [31] for phase unwrapping and subtract a reference phase from a cell-free region of a glass slide to correct for system aberrations [12,13,32]. From the recovered phase, the optical path length (OPL) is computed as OPL = Φ_un[λ/(2π)], where Φ_un is the unwrapped phase and λ is the illumination wavelength. The OPL is used for all further processing, including calculation of the bio-optical attributes. The OPL can be directly related to the thickness of a sample when all refractive indices are known; however, the refractive indices cannot be assumed to be known for various disease states, especially those which can alter the hemoglobin levels, as hemoglobin concentration is known to have a direct impact on the refractive index [33].
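
The two closing steps of the reconstruction, extracting the wrapped phase from the complex amplitude and converting an unwrapped phase to OPL, can be illustrated with a short Python sketch. The function names are hypothetical, the 635 nm wavelength is taken as the nominal value for the laser diode and is an assumption here, and the Fourier filtering and Goldstein unwrapping steps are not shown.

```python
import math

# Nominal laser wavelength in meters (assumed value for this sketch).
WAVELENGTH_M = 635e-9


def wrapped_phase(u):
    """Wrapped phase of one complex amplitude sample, in (-pi, pi].

    atan2 is used instead of atan(Im/Re) so that the quadrant is
    resolved and a zero real part does not cause a division error.
    """
    return math.atan2(u.imag, u.real)


def opl_from_phase(phi_unwrapped, wavelength=WAVELENGTH_M):
    """Convert an unwrapped phase (radians) to optical path length."""
    return phi_unwrapped * wavelength / (2 * math.pi)
```

For example, an unwrapped phase of 2π corresponds to an OPL of one wavelength.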

Following numerical reconstruction, individual RBCs are segmented from the field-of-view in a partially automated way with user supervision. Potential cells are automatically identified based on a calibrated expected cell size from a binarized version of the phase map. Each potential cell is then presented to the user to accept or reject. Candidate cells are rejected if they are incorrect detections from the phase map (e.g., noise from various sources or cell fragments), if they are in contact with neighboring cells, or if they show substantial lateral translation over the course of the video. The result is a dataset consisting of segmented free-floating cells that were laterally stationary.

2.2 Sample preparation and data collection

All data was collected on-site at the UConn Health Center using the field-portable, 3D-printed system shown in Fig. 1. Study data was obtained from 10 consenting COVID-19 positive patients and 14 volunteer healthy healthcare workers in accordance with UConn Health and UConn Storrs Institutional Review Board policies as previously described [9,10]. Whole blood samples were collected in K2EDTA spray-coated tubes and processed within 4 hours of collection. Video holograms were recorded for 10 seconds at 20 frames per second (fps) followed by numerical reconstruction. The resulting dataset consisted of 1472 total red blood cells (838 COVID RBCs and 634 healthy RBCs). COVID-19 positive individuals were confirmed by PCR testing. Healthy volunteers were confirmed healthy by a recent negative PCR test along with negative serology test results. Alternatively, healthy volunteers were allowed to participate in the study if they had recovered from COVID-19 with the positive PCR result coming at least 90 days prior to inclusion.

2.3 Bag-of-features for distribution-based classification

Our goal is to provide a classification framework that incorporates the intra-patient distribution information. For example, RBC distribution width, which measures the spread of RBC size within a given patient, is of known benefit as a general biomarker for disease [27] and moreover has been identified as a potential biomarker specifically for COVID-19 [1]. We aim not only to incorporate the distribution spreads, taken as the standard deviation of attribute values for a given patient, but also to capture higher-order information pertaining to the distributions. Additionally, we do not limit ourselves to only the RBC size but explore a variety of potential bio-optical attributes including those related to the spatio-temporal behavior of the cells. Our approach borrows ideas from the Bag-of-Features (BoF), also known as Bag-of-Visual-Words, methodology that is common to natural language processing tasks and has similarly shown success in image classification tasks [28].

In general terms, the BoF approach takes the frequency of various features plotted onto a frequency histogram then quantizes the bin counts into a feature vector for classification. For a typical image classification task, this procedure would consist of: (1) computing local features such as STIP, FAST, or BRISK features; (2) clustering feature descriptors by k-means segmentation into k mutually exclusive groups each representing a “word” or “feature” in the classification dictionary; (3) computing the counts of each “word” within a given image to form the final feature vectors; and lastly (4) classifying the images based on the feature vectors.

Since our goal is to make use of the distribution information, which we believe to be important for characterizing the disease state of the samples, we modify this BoF approach to instead convert PDFs of bio-optical attributes computed from a given human subject’s cells into feature vectors and then directly classify individuals from those feature vectors.

For each human subject in our dataset, we first reconstruct individual segmented RBCs as detailed in Section 2.1. From the segmented RBCs, bio-optical attributes are extracted then the histograms of attribute values are plotted, and the bin counts are used to create an encoded feature vector. For this work, bio-optical attributes extracted include handcrafted morphological measures [17,34] characterizing the cell shape and the catch22 [35] subset of highly comparative time-series analysis [25,26] measures which are aimed at computing values for the spatio-temporal behavior of the RBCs. Here, we are concerned with the time-varying behavior relating to the cellular optical volume as was previously shown to provide high classification capabilities for COVID-19 [10]. The full list of handcrafted bio-optical attributes considered in this work is provided in the Supplemental Material Table S1, and the description of each catch22 feature is given in [35]. The output bin counts are concatenated into an encoded feature vector which is fed to a linear support vector machine for classification following a patient-wise cross validation procedure. We use a nested cross-validation procedure [36] to evaluate our model as well as tune the model for feature selection and the optimal number of bins, N.

Since the number of cells per patient varies across our dataset (ranging from 21 RBCs to 161 RBCs with an average of 61 ± 37 RBCs), all histograms are plotted as PDFs and sum to 1. Additionally, each bio-optical attribute is rescaled by the training data to be in the range [0,1] such that all histograms can be plotted on the same range to ensure consistency across the dataset. The number of bins, N, determines how many partitions to divide the histogram into and thus defines the coarseness of information encoded into the resulting feature vectors. In the case of N = 100, each bin would cover one hundredth of the rescaled attribute range, and in the case of N = 4, each bin would cover one quarter of the range. We view the value of N as a hyperparameter and tune it during model training from the set of potential values {3,5,10,15,25,50}. Besides tuning the number of bins, we also perform feature selection to reduce model complexity and retain only the most useful bio-optical attributes. This process of model selection and hyperparameter tuning is performed during the inner loop of the nested cross-validation procedure. An overview diagram of our classification procedure is provided in Fig. 2.
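
As a minimal sketch of the encoding described above, the following Python snippet bins one attribute into N equal-width bins on the rescaled range [0,1], normalizes the counts to sum to 1, and concatenates the per-attribute histograms into one feature vector per subject. The function names, the min-max form of the rescaling, and the clipping of values that fall outside the training range are assumptions made for illustration and are not taken from the original implementation.

```python
def pdf_histogram(values, lo, hi, n_bins):
    """Normalized histogram of one attribute for one subject.

    lo, hi: attribute minimum and maximum from the TRAINING subjects
    (assumed min-max rescaling to [0, 1]).
    Returns n_bins values that sum to 1.
    """
    counts = [0] * n_bins
    span = hi - lo
    for v in values:
        scaled = (v - lo) / span if span > 0 else 0.0
        # Assumption: clip test values outside the training range.
        scaled = min(max(scaled, 0.0), 1.0)
        # The top edge (scaled == 1.0) is assigned to the last bin.
        idx = min(int(scaled * n_bins), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def encode_subject(cells_by_attribute, ranges, n_bins):
    """Concatenate per-attribute histograms into one feature vector.

    cells_by_attribute: {attribute name: list of per-cell values}
    ranges: {attribute name: (lo, hi)} computed from training data
    """
    vector = []
    for name in sorted(cells_by_attribute):  # fixed attribute order
        lo, hi = ranges[name]
        vector.extend(pdf_histogram(cells_by_attribute[name], lo, hi, n_bins))
    return vector
```

With A selected attributes and N bins, each subject is represented by a vector of length A × N, regardless of how many cells were imaged for that subject.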

Fig. 2. Overview diagram for Bag-of-Features (BoF) distribution-based classification model. Probability distribution functions are computed for all bio-optical attributes of interest for the cell population sampled for each subject. The PDFs are quantized into N number of bins then concatenated to form a feature vector. The corresponding feature vectors are then classified by a linear support vector machine.

For the nested cross-validation [36], an outer cross-validation loop determines the current test subject with the remaining subjects being used for training. Among the training subjects, an inner cross-validation is used to determine the optimal model to be applied to the test subject. Here, our outer loop uses a leave-one-out cross-validation (LOOCV), and for our inner loop we run repeated stratified cross-validations with 100 repetitions to reduce the chance of overfitting the hyperparameters to the inner-loop data. Repeated inner-loop cross-validations were essential during feature selection to avoid features being selected by chance. During the inner loop, we perform feature selection and tune the number of bins, N. The feature selection is conducted using a ranking-based feature selection. First, the features are ranked in order of individual classification performance, then the top X features are evaluated, where X is continually increased until performance is no longer improved. Feature selection is performed for each possible value of N, then the best-performing model in terms of the inner-loop performance is applied to the current test subject on the outer loop. Nested cross-validation offers a solution to model selection and/or hyperparameter tuning when the dataset is small, but the drawbacks include increased computation for training and that the optimal model for each test subject may be different. To better allow for comparison between models and reduce the odds of the best model being chosen by chance, each outer-loop classification is assessed using a bagged ensemble [37] with 100 bootstrap samples, each having the size of the original training dataset.
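
The ranking-based feature selection can be sketched as below. This is an illustrative outline only: `score_fn` is a hypothetical callback standing in for the inner-loop repeated stratified cross-validation accuracy of a linear SVM, and stopping at the first step that fails to improve the score is one assumed reading of "until performance is no longer improved".

```python
def ranked_forward_selection(feature_names, score_fn):
    """Rank features individually, then grow the top-X subset.

    score_fn(subset) -> float is a hypothetical callback returning the
    inner-loop cross-validation performance for a list of features.
    Returns (selected features, their score).
    """
    # Step 1: rank features by their individual performance.
    ranked = sorted(feature_names, key=lambda f: score_fn([f]), reverse=True)

    # Step 2: evaluate the top-X features for increasing X.
    best_subset, best_score = [], float("-inf")
    for x in range(1, len(ranked) + 1):
        candidate = ranked[:x]
        score = score_fn(candidate)
        if score <= best_score:
            break  # assumed stopping rule: first non-improving step
        best_subset, best_score = candidate, score
    return best_subset, best_score
```

In the nested scheme, this routine would be run once per candidate value of N inside each outer fold, and the combination with the best inner-loop score would be applied to the held-out subject.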

All classifications, both inner and outer loop, use a linear support vector machine (SVM) to classify the output feature vectors. The linear SVM is implemented in MATLAB and uses a box constraint of 1, where the box constraint is the parameter for regularization controlling the penalty for margin violations. Each feature vector is standardized prior to training by subtracting the mean and dividing by the standard deviation of the feature values. We expect the proposed model will capture the distribution of various attributes and provide additional information for classification.

3. Results

3.1 Statistical comparison of bio-optical attributes

First, we take a statistical look at the distributions of bio-optical attributes. To do so, we perform t-tests on the means, standard deviations, skewness, and kurtosis measures of the bio-optical attributes between the COVID-positive and healthy cohorts. In addition to performing t-testing on each of the bio-optical attributes, we further performed a Kolmogorov-Smirnov test (KS-test) to directly assess whether or not the bio-optical attribute values between healthy and COVID-19 samples come from the same continuous distribution [29]. The main results are summarized in the following and the full results of these statistical tests are provided in the supplemental material (Table S2).
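
For reference, the two-sample KS statistic is the largest absolute difference between the empirical cumulative distribution functions of the two samples. The sketch below computes only this statistic using the Python standard library; the function name is illustrative, and the p-value computation used for the hypothesis test is not reproduced.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical
    CDFs of the two samples, evaluated at every observed value.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

The statistic ranges from 0 for identical samples to 1 for samples with no overlap, which also makes it usable as a distance between two subjects' attribute distributions.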

From the t-testing results, we observe statistically significant differences (COVID vs Healthy, p-value) between population means with increases in the mean values of optical volume (3.47 × 10⁻¹⁸ vs 3.20 × 10⁻¹⁸, p = 0.0008), projected area (5.49 × 10⁻¹¹ vs 5.06 × 10⁻¹¹, p = 0.00007), perimeter (2.88 × 10⁻⁵ vs 2.07 × 10⁻⁵, p = 0.0065), maximum (3.47 × 10⁻¹⁸ vs 3.20 × 10⁻¹⁸, p = 0.0065) and minimum (8.77 × 10⁻⁶ vs 8.38 × 10⁻⁶, p = 0.00004) cell widths, and decreased circularity (0.8805 vs 0.9237, p = 0.0368) and sphericity (0.3503 vs 0.3563, p = 0.0125) among the COVID-19 cohort. Additionally, several catch22 bio-optical attributes show statistically significant differences between cohorts, with the CO_Embed2_Dist_tau_d_expfit_meandiff attribute being the most significant (1.2565 vs 0.8773, p = 0.0092). We further observe similar differences when looking at the standard deviations of bio-optical attributes between the two populations, whereby the elongation (0.0761 vs 0.0641, p = 0.0091), projected cell area (6.83 × 10⁻¹² vs 5.97 × 10⁻¹², p = 0.0173), and maximum cell width (6.17 × 10⁻⁷ vs 5.36 × 10⁻⁷, p = 0.0187) showed the most significant differences among the handcrafted bio-optical attributes, and the CO_Embed2_Dist_tau_d_expfit_meandiff attribute (0.9734 vs 0.5552, p = 0.0012) was again most significant among the catch22 attributes. This feature is explained in more detail in section 3.3 on feature importance. As seen in earlier reports, the COVID-19 cohort shows higher standard deviation values, suggesting an increased spread in bio-optical attributes in the disease state. This analysis corroborates previously reported information that distribution-based measures such as the RBC distribution width [1] do provide discriminatory information between healthy and COVID-19 infected populations and can be useful biomarkers.

However, unlike previous studies which focused solely on the means and standard deviations of various hematological parameters, here we have extended this analysis beyond first and second-order statistics to include skewness and kurtosis, which are the third and fourth standardized moments, respectively, capturing the lack of symmetry and the heaviness of tails of the distributions. This inclusion reveals that several bio-optical attributes have significant differences, especially among the spatio-temporal measures from the catch22 set of measures. The most significant differences for skewness (2.1270 vs 0.6945, p = 0.0001) and kurtosis values (7.2100 vs 2.5902, p = 0.0002) were seen in the DN_OutlierInclude_n_001_mdrmd attribute. Furthermore, when directly comparing the distributions of the bio-optical attributes rather than summary statistics by way of KS-testing, the null hypothesis that the data came from the same distribution was rejected at 5% significance for 39 of the 51 bio-optical attributes, with 30 of those rejecting the null hypothesis at 0.5% significance. Notably, more bio-optical attributes showed significant differences under the KS-test than by simple t-testing on lower-order statistics. This suggests individual low-order summary statistics may omit distinguishing information between populations of COVID positive and healthy cells.
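
The four per-subject summary statistics compared above can be computed as in the following sketch. The moment conventions here (population moments, non-excess kurtosis so that a normal distribution gives 3) are assumptions, since the exact estimators used in the analysis are not stated.

```python
def summary_moments(values):
    """Mean, standard deviation, skewness and kurtosis of one sample.

    Uses population (biased) central moments; kurtosis is NOT
    excess kurtosis. Both conventions are assumed for this sketch.
    """
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    std = m2 ** 0.5
    skewness = m3 / m2 ** 1.5  # third standardized moment
    kurtosis = m4 / m2 ** 2    # fourth standardized moment
    return mu, std, skewness, kurtosis
```

Applied to the cells of a single subject for one attribute, the four outputs correspond to the quantities that are then compared between cohorts by t-testing.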

3.2 Classification results

As discussed in section 2.3, we compare the results of our proposed approach to several other classifiers including linear SVMs using various population statistics and to a k-nearest neighbor (KNN) classifier using the KS test statistic as the distance metric. For all models feature selection is performed within the nested cross-validation as described above. Additionally, for the KNN classifier, the number of neighbors is considered a hyperparameter and tuned within the nested cross validation by considering values of {1, 2, 3, 4, 5}. The results are shown without feature selection in Table 1, and with improved performance after feature selection in Table 2.

Table 1. Classification results without feature selection^a

Table 2. Classification results after feature selection^a

From Table 2, we see the proposed BoF-SVM approach, the SVM using mean, standard deviation, and skewness values, and the SVM using mean, standard deviation, skewness, and kurtosis values each correctly classified 22/24 subjects for 91.67% classification accuracy. This supports our overall hypothesis that inclusion of distribution information beyond solely the means and standard deviations is beneficial for classification purposes. Given similar performance between these top three models and a limited dataset of 24 individuals, it is difficult to conclusively state which model is best for all applications. However, among the top classifiers with equal diagnostic performance, we believe the BoF-SVM is most generalizable as it makes no assumptions as to which summary statistics are most informative and, moreover, can retain higher-order statistical information. As such, the remaining analysis focuses on this proposed approach. After feature selection, the catch22 and combined models for our proposed BoF-SVM classifier both output the same model, retaining none of the handcrafted morphological bio-optical attributes. Hence, the catch22 and combined feature-set results reported in Table 2 for the BoF-SVM are two instances of the same model. This optimal model for the BoF-SVM approach used only 1 bio-optical attribute in all 24 folds of the nested cross-validation, which was the CO_Embed2_Dist_tau_d_expfit_meandiff measure. For 23 of the 24 subjects, the inner loop determined 25 bins to be the optimal quantization of the distribution, while for the remaining subject, 5 bins were found to be optimal.

The confusion matrix for the best performing model is shown in Table 3. The proposed approach achieves 91.67% accuracy with 90.00% sensitivity, and 92.86% specificity. Classification scores of each patient are further provided by Supplemental Material Table S3.
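
The reported rates follow directly from the confusion-matrix counts, as the short sketch below shows. The counts used here are inferred from the cohort sizes (10 COVID-19 positive, 14 healthy) and from the one misclassified subject in each cohort, and the function name is illustrative. The Matthews correlation coefficient is included for completeness and has not been checked against the tabulated value.

```python
import math


def screening_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and MCC from raw counts."""
    total = tp + fn + tn + fp
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


# Inferred counts: 9 of 10 COVID-19 subjects and 13 of 14 healthy
# subjects classified correctly (22/24 overall).
metrics = screening_metrics(tp=9, fn=1, tn=13, fp=1)
```

Rounded to two decimal places as percentages, these counts give 91.67% accuracy, 90.00% sensitivity, and 92.86% specificity.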

Table 3. Confusion matrix for best model

Lastly, we compare the performance of our proposed approach to previously published methods for classification on this same dataset [9,10,17,18]. As shown in Table 4, the proposed approach achieves the highest accuracy and Matthews correlation coefficient in terms of patient classification, with slightly lower AUC in comparison to the LSTM-based classifier [18]. We further analyze this comparison via a one-way ANOVA on ranks using the Kruskal-Wallis test in Fig. 3.

Table 4. Summary of classification results for Healthy and COVID-19 RBCs^a

Fig. 3. One-way ANOVA on ranks for comparing patient level classification between methods using the Kruskal-Wallis test. Comparison of methods shows results for handcrafted features in a support vector machine (HC, red line), a bi-directional long short-term memory deep learning approach (LSTM, blue), a highly comparative time-series analysis approach for massive, automated time-series feature extraction (HCTSA, green) and the proposed intra-patient cell attribute distribution-based bag-of-features approach (BoF, magenta).

The ANOVA results show a significant difference between the methodologies (p = 2.3275 × 10⁻⁶). A Tukey-Kramer test for post-hoc analysis, otherwise known as a test for multiple comparisons, reveals that the bag-of-features and handcrafted feature approaches are significantly different, with p = 7.18 × 10⁻⁶. The HCTSA approach is likewise significantly different from the handcrafted feature approach (p = 1.28 × 10⁻⁴), and lastly, the BoF approach differed significantly in performance from the LSTM approach (p = 0.0115). Other differences between models were found to be non-significant.

3.3 Feature importance

Following feature selection, we find our best model (across all folds of the nested cross-validation) comprises only a single bio-optical attribute taken from the catch22 subset of HCTSA features. Namely, CO_Embed2_Dist_tau_d_expfit_meandiff was singularly capable of providing 91.67% patient classification accuracy using the BoF-SVM methodology. More specifically, the standard deviation of CO_Embed2_Dist_tau_d_expfit_meandiff shows a separation that, with an optimal threshold, could correctly separate all but two individuals in the dataset. From Table S2 of the Supplemental Material, t-testing shows statistically significant increases in the average mean, standard deviation, skewness, and kurtosis values of this bio-optical attribute in the COVID positive cohort in comparison to the healthy counterparts. Likewise, the KS-testing returns a p-value of 5.24 × 10⁻⁸, rejecting the null hypothesis that the values from COVID-19 positive subjects and the healthy subjects come from the same distribution. Among lower-order summary statistics, the standard deviation has the smallest p-value (0.0012), indicating significant differences between the standard deviations of CO_Embed2_Dist_tau_d_expfit_meandiff between COVID positive and healthy subjects. The box plot overlaid with a scatter plot for the standard deviations of this bio-optical attribute is provided in Fig. 4.

Fig. 4. Box plots overlaid with scatter plot for the standard deviations of CO_Embed2_Dist_tau_d_expfit_meandiff from the catch22 subset of HCTSA features. Each point represents a single human subject plotted by the standard deviation of CO_Embed2_Dist_tau_d_expfit_meandiff among their RBCs. Horizontal jitter is added to ease visualization of individual data points. The COVID positive cohort shows increased standard deviation, indicating increased spread of distributions for the feature values.

The exact biological relation of this extracted measure requires further investigation. According to [25], this specific attribute is a correlation-based metric that analyzes distances in a 2D embedding space. Concisely, this measure returns the mean difference of the exponential fit to successive distances in a 2D embedding space [35]. This is accomplished by first taking the z-scored time-series data, in this case the time-series of a cell’s optical volume, and embedding it into a 2D space using a time-delay embedding with a time delay of tau, where tau is the embedding distance and is set as the first zero-crossing of the autocorrelation function of the input time-series data. Time-delay embedding transforms the time-series into a matrix of data such that more focus is given to the time-dependency of the data [38]. Then, from the 2D embedded matrix, the Euclidean distances between successive points are computed. Finally, given the Euclidean distances between points in the space, the difference between the distribution of Euclidean distances and its exponential fit is returned as the output measure. From our dataset, the COVID positive cohort exhibited higher mean CO_Embed2_Dist_tau_d_expfit_meandiff values as well as increased intra-patient spread of values. Future work should examine whether this bio-optical attribute is informative for characterizing the spatio-temporal behavior of biological cells in other tasks and whether a connection between this measure and an underlying biological property can be unearthed.
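
The first stages of this measure can be sketched in Python as follows. The snippet covers z-scoring, choosing tau from the first zero-crossing of the autocorrelation function, building the 2D time-delay embedding, and computing distances between successive points. It is an illustrative outline with hypothetical function names and does not reproduce the catch22 reference implementation, whose details may differ; the final step (histogramming the distances and comparing against an exponential fit) is deliberately omitted.

```python
import math


def zscore(series):
    """Standardize a time series (sample standard deviation)."""
    n = len(series)
    mu = sum(series) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in series) / (n - 1))
    return [(v - mu) / sd for v in series]


def first_zero_crossing_lag(y):
    """Smallest lag at which the autocorrelation is <= 0."""
    n = len(y)
    mu = sum(y) / n
    denom = sum((v - mu) ** 2 for v in y)
    for lag in range(1, n):
        acf = sum(
            (y[i] - mu) * (y[i + lag] - mu) for i in range(n - lag)
        ) / denom
        if acf <= 0:
            return lag
    return n - 1  # fallback when no crossing is found


def successive_embedding_distances(series):
    """Return (tau, distances between successive embedded points)."""
    y = zscore(series)
    tau = first_zero_crossing_lag(y)
    # 2D time-delay embedding: each point is (y[t], y[t + tau]).
    points = [(y[i], y[i + tau]) for i in range(len(y) - tau)]
    distances = [
        math.hypot(q[0] - p[0], q[1] - p[1])
        for p, q in zip(points, points[1:])
    ]
    return tau, distances
```

Here the input would be the optical-volume time series of a single cell; a constant series has zero standard deviation and would need to be handled separately.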

4. Discussion

The results show an improvement in screening for COVID-19 by incorporating the attribute distribution information into the classification scheme. As seen from Tables 1 and 2, the proposed approach provides only moderate performance if considering solely handcrafted morphological features. For the models using only handcrafted features, the most important bio-optical attributes were those relating to the size and shape of the RBCs. This shows a general applicability of this approach to a wide range of cell imaging systems. However, the results clearly show that the most informative features for classification come from the spatio-temporal dynamic behavior of the healthy vs diseased cells as captured by the catch22 derived bio-optical attributes. This highlights the benefits of digital holographic microscopy in optical diagnostic sensing, and the advantage of measuring spatio-temporal information to aid in disease screening applications.

An additional benefit of the proposed approach is its generalizability to differences in the attribute distributions. By converting the distribution into a feature vector, the model gains flexibility in maintaining the higher-order statistical information of the underlying distribution rather than being limited to lower-order summary statistics. In any case, these results clearly show a significant benefit of incorporating attribute distribution information into our classification model, as performance of the proposed model exceeded that of previous models using a simple majority of cell classifications to determine a given patient’s classification.

The misclassified subjects were one healthy subject misclassified as COVID-19 positive and one COVID-19 positive patient misclassified as healthy. The misclassified COVID-19 patient was a 67-year-old male with a moderate case of COVID-19 and a hospital stay of 7 days, which is tied for the shortest stay among all subjects in our dataset. The misclassified healthy subject was a 49-year-old female who tested negative for antibodies, indicating no prior infection. The misclassified healthy subject was also the lone subject for whom the optimal number of bins was determined to be 5. However, even when manually setting the number of bins to 25, this subject was still misclassified, with a 45% classification accuracy among the 100 bootstraps. While the reason for misclassifying a healthy subject is less clear, the misclassification of one of the least severe COVID cases in our dataset may be a sign of a dependency on disease severity. To examine this more closely, we looked at the correlations between the output confidence scores from our classifier and the length of the hospitalization. Excluding all healthy subjects, the confidence scores are weakly correlated with the length of hospitalization (R = 0.2863). With the healthy individuals included, and hospitalization stay set to 0 days, the confidence scores show a moderate correlation with the hospitalization stay (R = 0.6993). With the two misclassified subjects removed, the correlation between confidence scores of COVID positive subjects and length of hospital stay is R = 0.1328, and the correlation between all subjects and length of hospital stay is R = 0.7931. Future work may explore the study of RBCs for disease progression and look for prognostic indicators for early identification of severe illnesses.
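
A correlation of this kind can be computed as in the sketch below, which assumes that R denotes the Pearson correlation coefficient (the type of correlation is not specified above). The function name is illustrative, and no study data are reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)
```

In this setting, `x` would hold the per-subject confidence scores and `y` the corresponding lengths of hospital stay in days, with healthy subjects assigned 0 days when they are included.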

5. Conclusion

In conclusion, we have presented a classification approach that incorporates intra-patient bio-optical attribute probability distribution information into patient disease screening. The inclusion of probability distribution-based information improves our classification performance, resulting in correct classification of 22 of the 24 subjects in our COVID-19 dataset (91.67% accuracy, 90.00% sensitivity, 92.86% specificity). The proposed approach could be applied to various microscopy techniques; however, we find the dynamic spatio-temporal bio-optical attributes obtained using our digital holographic microscope to be the most informative for disease screening. In particular, we identified a single decisive attribute, “CO_Embed2_Dist_tau_d_expfit_meandiff”, from the catch22 subset of HCTSA features, for which the standard deviation of the attribute values alone is sufficient to separate all but two individuals in this dataset by disease state. We further show that the confidence scores output by our model are weakly to moderately correlated with the length of hospital stay, which suggests this approach may be useful in the search for prognostic indicators. Future work entails continued development of systems for cell identification in digital holographic microscopy based on spatio-temporal dynamics, work with larger, clinically relevant datasets, and statistical analysis of the performance of various classifiers and feature sets.

Funding

Office of the Vice President of Research, University of Connecticut (COVID-RSF); U.S. Department of Education (GAANN Fellowship).

Acknowledgments

We thank Dr. Liang and Dr. Shen of the Pat and Jim Calhoun Cardiology Center at UConn Health as well as their staff for clinical research support and discussions during data collection. T. O’Connor acknowledges support through the GAANN fellowship.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. B. M. Henry, J. L. Benoit, S. Benoit, C. Pulvino, B. A. Berger, M. H. S. de Olivera, C. A. Crutchfield, and G. Lippi, “Red Blood Cell Distribution Width (RDW) Predicts COVID-19 Severity: A Prospective, Observational Study from the Cincinnati SARS-CoV-2 Emergency Department Cohort,” Diagnostics 10(9), 618 (2020).

2. T. Thomas, D. Stefanoni, M. Dzieciatkowska, A. Issaian, T. Nemkov, R. C. Hill, R. O. Francis, K. E. Hudson, P. W. Buehler, J. C. Zimring, E. A. Hod, K. C. Hansen, S. L. Spitalnik, and A. D’Alessandro, “Evidence for structural protein damage and membrane lipid remodeling in red blood cells from COVID-19 patients,” J. Proteome Res. 19(11), 4455–4469 (2020).

3. A. Berzuini, C. Bianco, A. C. Migliorini, M. Maggioni, L. Valenti, and D. Prati, “Red blood cell morphology in patients with COVID-19-related anaemia,” Blood Transfus. 19(1), 34–36 (2021).

4. M. E. Mortaz, M. Malkmohammad, H. Jamaati, P. A. Naghan, S. M. Hashemian, P. Tabarisi, M. Varnham, H. Zaheri, E. G. U. Chousein, G. Folkerts, and I. M. Adcock, “Silent hypoxia: higher NO in red blood cells of COVID-19 patients,” BMC Pulm. Med. 20(1), 269 (2020).

5. C. Renoux, R. Fort, E. Nader, C. Boisson, P. Joly, E. Stauffer, M. Robert, S. Girard, A. Cibiel, A. Gauthier, and P. Connes, “Impact of COVID-19 on red blood cell rheology,” Br. J. Haematol. 192(4), e108 (2021).

6. C. Wang, R. Deng, L. Gou, Z. Fu, X. Zhang, F. Shao, G. Wang, W. Fu, J. Xiao, X. Ding, L. Tao, X. Xiulin, and C. Li, “Preliminary study to identify severe from moderate cases of COVID-19 using combined hematology parameters,” Ann. Transl. Med. 8(9), 593 (2020).

7. M. Grau, L. Ibershoff, J. Zacher, J. Bros, F. Tomschi, K. Felicitas Diebold, H. G. Predel, and W. Bloch, “Even patients with mild COVID-19 symptoms after SARS-CoV-2 infection show prolonged altered red blood cell morphology and rheological parameter,” J. Cell. Mol. Med. 26(10), 3022–3030 (2022).

8. M. Kubánková, B. Hohberger, J. Hoffmanns, J. Fürst, M. Herrmann, J. Guck, and M. Kräter, “Physical phenotype of blood cells is altered in COVID-19,” Biophys. J. 120(14), 2838–2847 (2021).

9. T. O’Connor, J. B. Shen, B. T. Liang, and B. Javidi, “Digital holographic deep learning of red blood cells for field-portable, rapid COVID-19 screening,” Opt. Lett. 46(10), 2344–2347 (2021).

10. T. O’Connor, S. Santaniello, and B. Javidi, “COVID-19 detection from red blood cells using highly comparative time-series analysis (HCTSA) in digital holographic microscopy,” Opt. Express 30(2), 1723–1736 (2022).

11. U. Schnars and W. Jueptner, Digital Holography: Digital Hologram Recording, Numerical Reconstruction, and Related Techniques (Springer, 2005).

12. A. Anand, I. Moon, and B. Javidi, “Automated Disease Identification With 3-D Optical Imaging: A Medical Diagnostic Tool,” Proc. IEEE 105(5), 924–946 (2017).

13. A. Anand, V. Chhaniwal, and B. Javidi, “Tutorial: Common path self-referencing digital holographic microscopy,” APL Photonics 3(7), 071101 (2018).

14. Y. Jo, S. Park, J. Jung, J. Yoon, H. Joo, M. Kim, S. Kang, M. C. Choi, S. Y. Lee, and Y. Park, “Holographic deep learning for rapid optical screening of anthrax spores,” Sci. Adv. 3(8), e1700606 (2017).

15. A. Anand, V. Chhaniwal, N. Patel, and B. Javidi, “Automatic Identification of Malaria-Infected RBC With Digital Holographic Microscopy Using Correlation Algorithms,” IEEE Photonics J. 4(5), 1456–1464 (2012).

16. A. Doblas, E. Roche, F. Ampudia-Blasco, M. Martinez-Corral, G. Saavedra, and J. Garcia-Sucerquia, “Diabetes screening by telecentric digital holographic microscopy,” J. Microsc. 261(3), 285–290 (2016).

17. B. Javidi, A. Markman, S. Rawat, T. O’Connor, A. Anand, and B. Andemariam, “Sickle cell disease diagnosis based on spatio-temporal cell dynamics analysis using 3D printed shearing digital holographic microscopy,” Opt. Express 26(10), 13614–13627 (2018).

18. T. O’Connor, A. Anand, B. Andemariam, and B. Javidi, “Deep learning-based cell identification and disease diagnosis using spatio-temporal cellular dynamics in compact digital holographic microscopy,” Biomed. Opt. Express 11(8), 4491–4508 (2020).

19. K. Jaferzadeh, I. Moon, M. Bardyn, M. Prudent, J. Tissot, B. Rappaz, B. Javidi, G. Turcatti, and P. Marquet, “Quantification of stored red blood cell fluctuations by time-lapse holographic cell imaging,” Biomed. Opt. Express 9(10), 4714–4729 (2018).

20. D. Midtvedt, E. Olsen, and F. Hook, “Label-free spatio-temporal monitoring of cytosolic mass, osmolarity, and volume in living cells,” Nat. Commun. 10(1), 340 (2019).

21. F. Dubois, C. Yourassowsky, O. Monnom, J. Legros, O. Debeir IV, P. Van Ham, R. Kiss, and C. Decaestecker, “Digital holographic microscopy for the three-dimensional dynamic analysis of in vitro cancer cell migration,” J. Biomed. Opt. 11(5), 054032 (2006).

22. M. Hejna, A. Jorapur, J. S. Song, and R. L. Judson, “High accuracy label-free classification of single-cell kinetic states from holographic cytometry of human melanoma cells,” Sci. Rep. 7(1), 11943 (2017).

23. A. S. Singh, A. Anand, R. A. Leitgeb, and B. Javidi, “Lateral shearing digital holographic imaging of small biological specimens,” Opt. Express 20(21), 23617–23622 (2012).

24. M. Mugnano, P. Memmolo, L. Miccio, F. Merola, V. Bianco, A. Bramanti, A. Gambale, R. Russo, I. Andolfo, A. Iolascon, and P. Ferraro, “Label-free optical marker for red-blood-cell phenotyping of inherited anemias,” Anal. Chem. 90(12), 7495–7501 (2018).

25. B. D. Fulcher and N. S. Jones, “hctsa: A computational framework for automated time-series phenotyping using massive feature extraction,” Cell Syst. 5(5), 527–531.e3 (2017).

26. B. D. Fulcher, M. A. Little, and N. S. Jones, “Highly comparative time-series analysis: the empirical structure of time-series and their methods,” J. Roy. Soc. Interface 10(83), 20130048 (2013).

27. O. Oloche, K. Tapela, C. O. Olwal, A. L. Djomkam Zune, N. N. Nganyewo, and O. Quaye, “Red blood cell distribution width as a prognostic biomarker for viral infections: prospects and challenges,” Biomark Med. 16(1), 41–50 (2022).

28. G. Csurka, C. R. Dance, L. Fan, J. Willamowski, and C. Bray, “Visual Categorization with Bags of Keypoints,” in Workshop on Statistical Learning in Computer Vision, ECCV (2004), Vol. 1, No. 1-22, pp. 1–2.

29. F. J. Massey, “The Kolmogorov-Smirnov test for goodness of fit,” J. Am. Stat. Assoc. 46(253), 68–78 (1951).

30. R. Shukla and D. Malacara, “Some applications of the Murty interferometer: a review,” Opt. Lasers Eng. 26(1), 1–42 (1997).

31. R. Goldstein, H. Zebker, and C. Werner, “Satellite radar interferometry: two-dimensional phase unwrapping,” Radio Sci. 23(4), 713–720 (1988).

32. P. Ferraro, S. De Nicola, A. Finizio, G. Coppola, S. Grilli, C. Magro, and G. Pierattini, “Compensation of the inherent wave front curvature in digital holographic coherent microscopy for quantitative phase-contrast imaging,” Appl. Opt. 42(11), 1938 (2003).

33. E. N. Lazareva and V. V. Tuchin, “Measurement of refractive index of hemoglobin in the visible/NIR spectral range,” J. Biomed. Opt. 23(03), 1 (2018).

34. P. Girshovitz and N. T. Shaked, “Generalized cell morphological parameters based on interferometric phase microscopy and their application to cell life cycle characterization,” Biomed. Opt. Express 3(8), 1757 (2012).

35. C. H. Lubba, S. S. Sethi, P. Knaute, S. R. Schultz, B. D. Fulcher, and N. S. Jones, “catch22: CAnonical Time-series Characteristics,” Data Min. Knowl. Disc. 33(6), 1821–1852 (2019).

36. M. Stone, “Cross-validatory choice and assessment of statistical predictions,” J. Royal. Stats. Soc. B 36, 111–133 (1974).

37. L. Breiman, “Bagging predictors,” Mach. Learn. 24(2), 123–140 (1996).

38. T. Von Oertzen and S. M. Boker, “Time delay embedding increases estimation precision of models of intraindividual variability,” Psychometrika 75(1), 158–175 (2010).

Method	Patient level classification
	Feature Set
	Handcrafted features			catch22 feature set			Combined feature set
	ACC	MCC	AUC	ACC	MCC	AUC	ACC	MCC	AUC
SVM-Means	75.00%	0.4781	0.8000	62.50%	0.2182	0.6464	79.17%	0.5674	0.8107
SVM-Stds	62.50%	0.2182	0.6179	87.50%	0.7419	0.8714	83.33%	0.5674	0.7357
SVM-Means + Stds	75.00%	0.4857	0.7929	83.33%	0.6574	0.8143	79.17%	0.5674	0.8071
SVM-Means + Stds + Skewness	70.83%	0.3928	0.7500	83.33%	0.6574	0.8571	83.33%	0.5795	0.8500
SVM-Means + Stds + Skewness + Kurtosis	62.50%	0.2182	0.7250	75.00%	0.4857	0.8679	70.83%	0.4382	0.8536
KNN-KS	75.00%	0.5976	0.7857	58.33%	0.1690	0.5857	75.00%	0.5071	0.7571
BoF-SVM	62.50%	0.2182	0.7500	75.00%	0.4857	0.6964	79.17%	0.5674	0.8536

Method	Patient level classification
	Feature Set
	Handcrafted features			catch22 feature set			Combined feature set
	ACC	MCC	AUC	ACC	MCC	AUC	ACC	MCC	AUC
SVM-Means	79.17%	0.5674	0.8250	58.33%	0.1690	0.5643	70.83%	0.3928	0.7393
SVM-Stds	33.33%	-0.338	0.2143	83.33%	0.6571	0.8464	83.33%	0.6571	0.8464
SVM-Means + Stds	70.83%	0.3928	0.7893	83.33%	0.6574	0.8357	66.67%	0.2988	0.7714
SVM-Means + Stds + Skewness	79.17%	0.5795	0.9071	75.00%	0.4857	0.8714	91.67%	0.8286	0.8643
SVM-Means + Stds + Skewness + Kurtosis	75.00%	0.4857	0.8214	75.00%	0.4857	0.8214	91.67%	0.8286	0.8679
KNN-KS	79.17%	0.6078	0.8071	87.50%	0.7419	0.8643	87.50%	0.7419	0.8643
BoF-SVM	66.67%	0.2988	0.7036	91.67%	0.8286	0.8643	91.67%	0.8286	0.8714

Method	Accuracy	AUC	MCC
Handcrafted features (HC) [17]	70.83%	0.8571	0.4800
LSTM [18]	87.50%	0.9393	0.7419
HCTSA [9,10]	87.50%	0.8571	0.7593
BoF-SVM (Proposed Method)	91.67%	0.8714	0.8286

Biomedical Optics Express

Actual \ Predicted	COVID	Healthy
COVID	9	1
Healthy	1	13
Classification Accuracy	91.67%
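
As a check on the reported figures, the sketch below recomputes accuracy, sensitivity, specificity, and the Matthews correlation coefficient (MCC) from the four counts in the confusion matrix above, treating COVID-19 as the positive class. The function name is an assumption introduced for illustration.

```python
import math

def screening_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc

# Counts read from the confusion matrix (rows: actual, columns: predicted).
acc, sens, spec, mcc = screening_metrics(tp=9, fn=1, fp=1, tn=13)
print(f"{acc:.2%} {sens:.2%} {spec:.2%} {mcc:.4f}")
# prints "91.67% 90.00% 92.86% 0.8286"
```

These values agree with the accuracy, sensitivity, and specificity stated in the Conclusion and with the MCC listed for the proposed BoF-SVM method.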