
Deep learning-enabled volumetric cone photoreceptor segmentation in adaptive optics optical coherence tomography images of normal and diseased eyes


Abstract

Objective quantification of photoreceptor cell morphology, such as cell diameter and outer segment length, is crucial for early, accurate, and sensitive diagnosis and prognosis of retinal neurodegenerative diseases. Adaptive optics optical coherence tomography (AO-OCT) provides three-dimensional (3-D) visualization of photoreceptor cells in the living human eye. The current gold standard for extracting cell morphology from AO-OCT images involves the tedious process of 2-D manual marking. To automate this process and extend to 3-D analysis of the volumetric data, we propose a comprehensive deep learning framework to segment individual cone cells in AO-OCT scans. Our automated method achieved human-level performance in assessing cone photoreceptors of healthy and diseased participants captured with three different AO-OCT systems representing two different types of point scanning OCT: spectral domain and swept source.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Early diagnosis, prognosis, and treatment of retinal neurodegenerative diseases are enhanced with the visualization of retinal cell populations in patients. Adaptive optics (AO)-enabled imaging, such as AO scanning laser ophthalmoscopy (AO-SLO) [1–10] and AO optical coherence tomography (AO-OCT) [11–21], allows enhanced in vivo visualization of human retinal cells, including photoreceptors [22,23]. Photoreceptors degenerate in many of the most common retinal diseases, such as age-related macular degeneration (AMD) [24], and inherited retinal diseases, such as retinitis pigmentosa (RP) [25,26], resulting in reduced vision and, ultimately, blindness. In RP, for example, shortening of the cone outer segment and eventual loss of the entire cell reflect the progression of the disease, and therefore cellular-level biomarkers (e.g., measurements of cone density, outer segment length, and diameter) carry much clinical interest. In the realm of ophthalmic therapeutic approaches (stem cell, gene therapy, and pharmaceutical), the development of new therapies often takes years to realize treatment efficacy owing in part to the redundancy in the visual system [27,28] and the limitations of subjective, coarse clinical endpoints such as visual acuity. Sensitive AO-based cellular-level biomarkers have the potential to significantly reduce clinical trial duration and the time to determine efficacy for a vast array of new therapies, but only if they are readily available to the clinician or therapy developer. Thus, fast, automated, and objective quantification of cellular-scale AO-enabled images is a prerequisite to the acceptance and use of AO-based metrics as validated clinical endpoints.

AO-SLO has been the dominant and most widely used modality for retinal cell imaging in the living human eye [29,30], with numerous methods developed for automated analysis of the en face cone mosaic structure in confocal and split-detector AO-SLO images. These methods range from traditional image processing approaches [31–39] to more recent deep learning-based methods [40–50]. Visualization and quantification of the cone mosaic properties have led to increased knowledge of disease pathophysiology [29,30,51]. Beyond two-dimensional (2-D) characteristics of the cone mosaic, as measured from AO-SLO images, other structural properties of individual cells, such as the cone’s outer segment length and inner segment diameter, may be additional important biomarkers of disease [52]. Thus, in recent years there has been increased interest in imaging and quantifying the cone population as captured by the volumetric AO-OCT modality. The current state-of-the-art AO-OCT systems provide 3-D visualization of retinal cells and can measure cell processes manifested as optical changes at the micrometer and nanometer scales [52].

The increased use of AO-OCT imaging for clinical and vision research has led to the development of automated methods to analyze and quantify different retinal cells, such as cones [53,54] and ganglion cells [55]. The automated cone photoreceptor analysis methods in previous studies were based on processing 2-D en face projection images [53] or single B-scan images [54]. As with AO-SLO, 2-D analysis of cones in AO-OCT images does not make use of all of the cone information captured in the volumetric image [56]. In this paper, we developed a deep learning-based method to automatically quantify individual cone photoreceptors in 3-D AO-OCT scans of healthy and diseased retinas. Our method consists of several modules for segmenting the photoreceptor layer, vasculature, and cones in AO-OCT images. We took advantage of previously acquired and labeled confocal AO-SLO cone images to minimize human effort in creating ground truth labels for the training of the vessel and cone segmentation modules for the AO-OCT data analysis. The results showed that our method achieved human-level performance in identifying and segmenting individual cones in the AO-OCT scans across different retinal locations, imaging devices, and patient populations. Using the automated 3-D segmentations of individual cones, we also provided quantitative analysis of cone density and outer segment lengths in different participants and the alterations to these cellular-level properties in diseased states. To promote future studies, we have made the imaging datasets and the manual expert labels available online.

2. Materials and methods

2.1 AO-OCT dataset and annotation

The AO-OCT dataset was acquired using the following three imaging systems: multimodal AO based on spectral domain OCT (mAO-SDOCT) [17], mAO based on swept source using Fourier domain mode-locked laser technology (mAO-FDML) [18], and AO-SDOCT [57]. The first two imaging systems were developed at the U.S. Food and Drug Administration (FDA). To image the cone mosaic with both systems, the AO focus (depth of focus of ∼35 µm) was placed at the cone level and optimized based on visual assessment of the real-time display of the imaging data. The AO-OCT volumes covered fields-of-view (FOV) of 1°×1° to 3°×3° to spatially sample different retinal eccentricities. The en face images were 300 × 300 and 500 × 500 pixels for the mAO-SDOCT and mAO-FDML systems, respectively. The dataset from these systems (denoted as the ‘FDA’ dataset) consisted of AO-OCT volumes from five participants affected by retinal diseases (referred to as ‘diseased’, ten volumes in total) and four unaffected participants (referred to as ‘healthy’, fifteen volumes in total) across different retinal locations (Table 1). The affected participants had a range of pathologies, including three participants with early-stage, sub-clinical drusen deposits, one participant with mutations in the retinitis pigmentosa GTPase regulator (RPGR) gene, and one participant with mutations in the CTRP5 gene. Each volume was the average of 30 to 300 registered AO-OCT volumes of the same retinal patch.


Table 1. Summary of AO-OCT dataset

The AO-SDOCT [57] system was developed at Indiana University (IU), and was used to acquire AO-SDOCT scans focused at the cone photoreceptor level (AO depth-of-focus of ∼31 µm) from three RP participants (eight volumes in total) and three age-matched healthy participants (eight volumes in total) at up to four retinal eccentricities (Table 1). The three RP participants were ranked by disease severity and labeled as RP1 (early-stage), RP2 (mid-stage), and RP3 (advanced). Each volume was the average of 36 registered AO-OCT volume videos (300 × 300 pixels en face images covering FOV of 0.8°×1°) acquired of the same retinal patch. Details of imaging, data preparation, participant ranking, and related clinical data are reported in [57]. All protocols for the collection of the FDA and IU datasets adhered to the tenets of the Helsinki declaration and were approved by the respective Institutional Review Boards.

The ground truth cone labels for evaluation purposes were created by expert graders at each institution. The graders manually marked the center of each cone photoreceptor in the 2-D projection images of the cone photoreceptor outer segment layers, starting from the inner-segment outer-segment (IS/OS) junction and ending at the end of the cone outer segment tip (COST) layer. These graders carefully checked that the 2-D projection images were created from the targeted regions of the 3-D volumes to ensure accurate identification of cones. Another grader, independent of the graders at FDA and IU, marked the cones in the 2-D images to create the “2nd grader” set, serving as the expert-level reference for evaluating model performance. A subset of ∼100 randomly selected cones per AO-OCT volume of the FDA’s healthy group was manually segmented on the 2-D projection images to evaluate our method's segmentation performance. To create the ground truth labels for layer segmentation, an experienced grader marked 150–200 B-scans (in consecutive blocks of 50 B-scans) in each AO-OCT volume.

2.2 Confocal AO-SLO dataset and annotation

The AO-SLO images used in this paper are from the online dataset in [31]. The dataset includes 840 images (150 × 150 pixels) from the right eye of 21 participants. Four locations at 0.65° from the center of fixation were imaged, with 150 images captured at each location. The process was repeated 10 times for each individual. We used the segmentation masks of [31] obtained with the graph theory and dynamic programming method as the ground truth cone segmentation masks for model training.

2.3 Overall framework

To automatically segment individual cone photoreceptors in AO-OCT volumes, we developed a framework that included special handling for issues such as vessel hypo-reflectivity, where structural information is lost due to the shadowing of the vessels. The overall framework consists of three convolutional neural network (CNN) modules (Fig. 1): (1) layer segmentation (L-CNN), (2) vessel segmentation (V-CNN), and (3) cone detection (C-CNN). Given an AO-OCT volume, L-CNN automatically segmented the photoreceptor outer segment layer (from IS/OS to COST). Using the segmentation of L-CNN, we next created a 2-D projection image of the cone mosaic by taking the average of voxel intensities in the segmented layer across the axial dimension. We also cropped the original AO-OCT volume to 20 voxels above and below the upper and lower boundaries of the layer segmentation mask for further processing. V-CNN then processed this 2-D projection image to segment prominent vessel regions in the field-of-view. The 2-D projection image and the extracted sub-volume were processed by C-CNN to detect cones. Finally, after post-processing the output predictions of C-CNN, the predictions of L-CNN and V-CNN were used to refine the cone predictions, yielding the final output 3-D segmentation of individual cone photoreceptor outer segments. All processing steps were the same across the imaging devices. Each step is detailed in the following sections.
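For illustration, the following Python sketch outlines how the three modules could be chained. The callables l_cnn, v_cnn, and c_cnn are hypothetical stand-ins for the trained networks described in the following sections, and the cropping, projection, and cone-exclusion steps are greatly simplified versions of the processing detailed below.

```python
import numpy as np

def segment_cones(volume, l_cnn, v_cnn, c_cnn):
    """volume: 3-D AO-OCT scan, axes = (axial, fast-scan, slow-scan)."""
    # 1) Segment the photoreceptor outer segment layer (IS/OS to COST).
    layer_mask = l_cnn(volume)                      # binary mask, same shape as volume

    # 2) Build the 2-D en face projection by averaging axially inside the layer.
    masked = np.where(layer_mask, volume.astype(float), np.nan)
    projection = np.nanmean(masked, axis=0)

    # 3) Crop the volume to 20 voxels above/below the segmented layer.
    z = np.where(layer_mask.any(axis=(1, 2)))[0]
    top, bottom = max(z.min() - 20, 0), min(z.max() + 21, volume.shape[0])
    sub_volume = volume[top:bottom]

    # 4) Segment vessel shadows in the projection, then detect/segment cones.
    vessel_mask = v_cnn(projection)                 # 2-D binary mask
    cone_labels = c_cnn(projection, sub_volume)     # 3-D label image, one label per cone

    # 5) Discard cones whose en face footprint overlaps the vessel shadows.
    footprint = cone_labels.max(axis=0)             # 2-D label image
    shadowed = set(np.unique(footprint[vessel_mask > 0]))
    keep = [lbl for lbl in np.unique(cone_labels) if lbl > 0 and lbl not in shadowed]
    return cone_labels, keep
```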


Fig. 1. Overall framework for automatic segmentation of individual cone photoreceptor outer segments from AO-OCT scans.


2.4 Cone outer segment layer segmentation

Retinal layer segmentation in OCT images has received much attention in past years [58–62] but has not been comprehensively addressed for AO-OCT data. Here, we proposed a single-layer segmentation method for AO-OCT images based on previous deep learning models for OCT data. Instead of using complex network architectures or loss functions to obtain topologically correct segmentations, we used the signed distance map (SDM) for spatial regularization and to implicitly represent shape information [61,63]. Extending the 2-D definition in [61], we defined the SDM for volumetric data as:

$$SDM(x) = \begin{cases} -\min_{y \in \Omega_F \cup \Omega_B,\, y \in \Omega_B} d(x,y), & \text{if } x \in \Omega_F \\ \;\;\;\min_{y \in \Omega_F} d(x,y), & \text{if } x \in \Omega_B. \end{cases}$$

In the above equation, ΩF and ΩB denote the set of foreground (layer) and background voxels, respectively, and d(x, y) is the Euclidean distance between voxels x and y. We then normalized the SDM to [-1, 1] by dividing the values of each region (inside and outside layer regions) by their corresponding maximum absolute value.
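As an illustration of Eq. (1), a minimal NumPy/SciPy implementation of the volumetric SDM and its per-region normalization might look as follows (function and variable names are ours):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance_map(mask):
    """mask: 3-D boolean array of the layer (foreground = True).

    Returns the SDM normalized to [-1, 1]: negative inside the layer,
    positive outside, following Eq. (1)."""
    mask = mask.astype(bool)
    # Distance from each foreground voxel to the nearest background voxel.
    dist_inside = distance_transform_edt(mask)
    # Distance from each background voxel to the nearest foreground voxel.
    dist_outside = distance_transform_edt(~mask)

    sdm = np.zeros(mask.shape, dtype=np.float32)
    sdm[mask] = -dist_inside[mask]
    sdm[~mask] = dist_outside[~mask]

    # Normalize each side by its own maximum absolute value.
    if dist_inside.max() > 0:
        sdm[mask] /= dist_inside[mask].max()
    if dist_outside.max() > 0:
        sdm[~mask] /= dist_outside[~mask].max()
    return sdm
```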

Instead of treating each B-scan independently using a 2-D network architecture, we designed the layer segmentation network, named L-CNN, with 3-D processing units to exploit the inherent 3-D spatial context within AO-OCT scans. L-CNN has an encoder-decoder architecture with residual connections in each encoder and decoder level, and skip connections from the encoder to the decoder (Fig. 2). We used filters of size 7 × 3 × 3 in the convolutional layers with a stride of 1, 3 × 3 × 3 max-pooling layers with a stride of 2 in the encoder path, and bilinear up-sampling (by a factor of 2) in the decoder path. The output of the decoder is a 16-channel feature map, which is then shared by two output branches. The first branch, the binary segmentation branch (conv-b), consists of a convolutional block with 16 channels followed by a final 2-channel 1 × 1 × 1 convolution layer with softmax activation to classify each voxel as background or cone layer. The second branch, the distance regression branch (conv-r), has the same structure as conv-b except that the final convolution layer is 1-channel with kernel size of 3 × 3 × 3 and hyperbolic tangent activation function for regression of SDM values. We optimized L-CNN by minimizing the weighted sum of the dice loss Ldice (applied to the conv-b branch) and the mean squared error regression loss Lreg (applied to the conv-r branch).
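A minimal PyTorch sketch of this combined objective, assuming the two branch outputs and their targets are available as tensors; the names are illustrative and the snippet is a simplified restatement rather than an exact implementation:

```python
import torch
import torch.nn.functional as F

def lcnn_loss(seg_logits, sdm_pred, target_mask, target_sdm, w_reg=3.0, eps=1e-6):
    """Weighted sum of the Dice loss (conv-b branch) and MSE regression loss (conv-r branch).

    seg_logits : (B, 2, D, H, W) raw outputs of the binary segmentation branch
    sdm_pred   : (B, 1, D, H, W) tanh outputs of the distance regression branch
    target_mask: (B, D, H, W) binary layer labels
    target_sdm : (B, D, H, W) normalized signed distance maps in [-1, 1]
    """
    probs = torch.softmax(seg_logits, dim=1)[:, 1]          # foreground probability
    target = target_mask.float()

    intersection = (probs * target).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice_loss = 1.0 - (2.0 * intersection + eps) / (union + eps)

    reg_loss = F.mse_loss(sdm_pred.squeeze(1), target_sdm)

    # The regression term is weighted 3x the Dice term during training (Section 2.4).
    return dice_loss.mean() + w_reg * reg_loss
```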


Fig. 2. Architecture of L-CNN for segmentation of the cone outer segment layer. The numbers beneath each block denote the number of filters for that layer. The stride in all convolutional and max-pooling layers was set to 1 and 2, respectively.


We used the manually labeled AO-OCT volumes of the FDA’s healthy set for the training of L-CNN under two experiments: (1) leave-one-subject-out cross-validation with training and testing on FDA healthy participants, and (2) training on all FDA healthy images and testing on all other AO-OCT data. During training and inference, all images (and labels) were resized to have a pixel size of 1.5 µm in the axial direction. We trained L-CNN with randomly cropped sub-volumes of size 256 × 64 × 16 voxels (axial × fast-scan × slow-scan directions) for a maximum of 180 epochs and used the Adam optimizer with an initial learning rate of 10−4. We set the weight for Lreg to 3 times that of Ldice during training and used data augmentation in the form of random horizontal flipping. In each training experiment, images of one participant in the training set were separated and used as the validation data. We used the network weights that resulted in the highest Dice score on the validation data for segmenting the test images.

During inference, we used sliding windows of size 256 × 128 × 128 voxels (axial × fast-scan × slow-scan directions) with 10 voxels overlap in all directions to generate the output maps. We averaged the predictions in the overlapping regions to yield the final outputs. We post-processed the output of conv-b to get the final layer segmentations for the test images. Our post-processing included steps to deal with mislabeled regions. We first smoothed the output probability map using a mean filter of size 3 × 5 × 3 voxels (axial × fast-scan × slow-scan directions), which was then binarized using a threshold of 0.5. Next, we eliminated isolated regions from the binarized predictions in each B-scan that had areas less than 0.1 of the largest connected component in the B-scan. To circumvent the effect of vessel shadows in the layer predictions, which manifested as discontinuities in the layer segmentations, we applied hole filling to the prediction masks.
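The post-processing described above can be summarized with the following SciPy/scikit-image sketch (a simplified restatement; the array axes are assumed to be axial × fast-scan × slow-scan, and the function name is ours):

```python
import numpy as np
from scipy.ndimage import uniform_filter, binary_fill_holes
from skimage.measure import label

def postprocess_layer(prob, threshold=0.5, min_area_ratio=0.1):
    """prob: 3-D probability map from conv-b, axes = (axial, fast-scan, slow-scan)."""
    # Mean filter of size 3 x 5 x 3 voxels, then binarize at 0.5.
    smoothed = uniform_filter(prob, size=(3, 5, 3))
    mask = smoothed > threshold

    # Per B-scan (axial x fast-scan slice): drop small isolated components
    # and fill holes caused by vessel shadows.
    for k in range(mask.shape[2]):
        bscan = mask[:, :, k]
        labels = label(bscan, connectivity=2)
        if labels.max() == 0:
            continue
        areas = np.bincount(labels.ravel())[1:]            # component sizes
        keep = np.flatnonzero(areas >= min_area_ratio * areas.max()) + 1
        mask[:, :, k] = binary_fill_holes(np.isin(labels, keep))
    return mask
```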

2.5 Vessel detection and segmentation

Our framework includes a vessel segmentation module to handle structural information loss due to the shadowing of large vessels. This module, named V-CNN, has two sub-modules. The first is a classification CNN to determine whether dark vessel regions are present in the input image. If this sub-module, named ClassNet, indicates the presence of vessels, the image is then passed to the second sub-module, which is a segmentation CNN.

We constructed ClassNet using standard CNN design principles. The details of the network architecture are shown in Fig. 3(A). The input image of size 128 × 128 pixels is processed by consecutive blocks of convolution layers with stride 1, batch normalization, ReLU activation, and pooling layers. The pooling layers consist of a 3 × 3 max-pooling layer with a stride of 2, a 3 × 3 average pooling layer with a stride of 2, and a 2-D global average pooling layer. The last layer is a fully connected layer with softmax activation to output a probability value for whether the input image contains vessel shadows.


Fig. 3. Components of V-CNN. A) ClassNet for determining whether vessels are present in an image. B) The vessel segmentation network. C) Example AO-SLO images with simulated vessels added to them. The numbers beneath each block in A and B denote the number of filters for that layer. The stride in all convolutional layers is 1, unless noted otherwise. DAC: dense atrous convolution, RMP: residual multi-kernel pooling block.


The segmentation sub-module was inspired by the popular CE-Net [64] architecture. Briefly, CE-Net is a U-shaped 2-D semantic segmentation neural network consisting of a dense atrous convolution (DAC) block and a residual multi-kernel pooling block (RMP) at its last encoder block. The DAC and RMP blocks were shown to improve performance when added to a backbone encoder-decoder network for the segmentation of medical images, including vessel segmentation in retinal images [64]. Our vessel segmentation neural network is summarized in Fig. 3(B). It consists of an encoder part, which is followed by modifications of DAC and RMP, and a decoder part to restore the high-level semantic features to the original size. We also incorporated skip connections from the encoder to the decoder to recover spatial information lost during consecutive strided convolutions. The encoder part consists of four levels of residual convolutional blocks, similar to the CNN in our previous work [55]. To adapt the DAC block to our encoder, we set the number of channels in its convolutional layers to 256 (Fig. S1). We also modified the RMP in [64] to have receptive fields of size 3 × 3, 5 × 5, and 9 × 9 (Fig. S1). The decoder consists of three upscaling blocks followed by a convolutional layer with batch normalization and ReLU activation. The final layer is a 1 × 1 convolutional layer with softmax activation.

We used the confocal AO-SLO cone images to create the training data for V-CNN. We simulated vessel shadows by adding dark tubular structures to the images (Fig. 3(C)). Using the cosine function to approximate a local portion of a large vessel, we simulated the image of a vessel shadow on the 2-D image grids X and Y of size M × N pixels as:

$$V = |Y - \cos(X) + y_0| \le th,$$
where the x- and y-axes are defined in the [0, π] and [-1, 1] intervals, respectively, y0 is an offset value in the y-direction, and th is a threshold value to binarize the function. To smooth the vessel edges in the binary image V, we applied Gaussian smoothing with a standard deviation of 2 pixels to it. Next, to diversify the appearance of the generated vessel image, we applied elastic deformation to V. The deformed vessel image was then scaled by a factor of α and applied to an AO-SLO cone image I by:
$$I = I \cdot (1 - \alpha V).$$

During training, the parameters y0, th, and α were randomly drawn from uniform distributions in the intervals of [-1, 1], [0.15, 0.3], and [0.5, 0.9], respectively. Demonstrative examples of simulated images are shown in Fig. 3(C).
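For illustration, the vessel simulation of Eqs. (2)–(3) can be sketched as follows. The elastic deformation step is omitted for brevity, and the function name is ours:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_vessel_shadow(image, rng=None):
    """Add a simulated vessel shadow to a 2-D AO-SLO cone image (Eqs. 2-3)."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = image.shape
    x = np.linspace(0.0, np.pi, n)
    y = np.linspace(-1.0, 1.0, m)
    X, Y = np.meshgrid(x, y)

    # Random parameters as described above: offset, threshold, darkening factor.
    y0 = rng.uniform(-1.0, 1.0)
    th = rng.uniform(0.15, 0.3)
    alpha = rng.uniform(0.5, 0.9)

    # Binary tubular region approximating a local portion of a large vessel.
    V = (np.abs(Y - np.cos(X) + y0) <= th).astype(float)

    # Smooth the vessel edges (Gaussian, sigma = 2 pixels).
    V = gaussian_filter(V, sigma=2)

    # Darken the image under the simulated vessel.
    return image * (1.0 - alpha * V), V
```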

We used AO-SLO images from 16 participants (640 images) as training images and the remaining 5 participants (200 images) for validation and monitoring of the training process. For the training of ClassNet, we added vessels to half of the training and validation images in each sampled batch. We trained ClassNet with 128 × 128 pixel images and a batch size of 20 for a maximum of 50 epochs, using the Adam optimizer with an initial learning rate of 0.001. We used random flipping, rotation (0°, 90°, 180°, or 270°), and image scaling (scale factor between 0.5 and 0.8) for data augmentation. We used the network weights that resulted in the highest accuracy score on the validation images to apply to AO-OCT images. During inference, 2-D projection AO-OCT images were resized to 256 × 256 pixels, and the presence of dark vessel shadows was predicted using a threshold of 0.4.

During the training of the vessel segmentation network, vessels were added to all images. The network was trained with images symmetrically padded to 256 × 256 pixels and a batch size of 20 for a maximum of 70 epochs, using the Adam optimizer with an initial learning rate of 10−4. Data augmentation in the form of random flipping, rotation (0°, 90°, 180°, or 270°), and scaling (scale factor between 0.5 and 0.8) was used for training. The network weights that resulted in the highest Dice score on the validation images were kept for application to AO-OCT data. During inference, if a 2-D projection AO-OCT image was determined to contain vessel shadows, it was passed to the segmentation network. We used test-time augmentation to improve the final vessel segmentation result, in which the prediction on the original image was averaged with the predictions on the images flipped around the horizontal axis, the vertical axis, and both axes. The averaged prediction map was then smoothed with a 3 × 3 mean filter and binarized with a threshold of 0.5. Finally, hole filling was applied to the binarized vessel mask, and any region with an area smaller than 0.1 of the largest connected component was removed.
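The test-time augmentation and mask clean-up described above can be sketched as follows; predict_fn is a hypothetical wrapper around the trained segmentation sub-module of V-CNN:

```python
import numpy as np
from scipy.ndimage import uniform_filter, binary_fill_holes
from skimage.measure import label

def predict_vessels_tta(projection, predict_fn, threshold=0.5):
    """Test-time augmentation for vessel segmentation on a 2-D projection image."""
    flips = [
        (lambda im: im,             lambda p: p),                # identity
        (np.flipud,                 np.flipud),                  # flip around horizontal axis
        (np.fliplr,                 np.fliplr),                  # flip around vertical axis
        (lambda im: im[::-1, ::-1], lambda p: p[::-1, ::-1]),    # flip around both axes
    ]
    preds = [undo(predict_fn(apply(projection))) for apply, undo in flips]
    prob = np.mean(preds, axis=0)

    # Smooth with a 3 x 3 mean filter, binarize, fill holes, drop small regions.
    mask = uniform_filter(prob, size=3) > threshold
    mask = binary_fill_holes(mask)
    labels = label(mask)
    if labels.max() > 0:
        areas = np.bincount(labels.ravel())[1:]
        keep = np.flatnonzero(areas >= 0.1 * areas.max()) + 1
        mask = np.isin(labels, keep)
    return mask
```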

2.6 Cone segmentation

Our idea of exploiting labeled confocal AO-SLO images to train a deep learning model for the detection of cones in AO-OCT images is similar to that of [53]. However, our approach differs significantly from [53]. Beyond the difference in the cone detection network architecture of our method and [53] (fully convolutional versus sliding window-based classification network), we went beyond cone localization: we additionally segmented the cone boundaries by exploiting cone segmentation masks from Chiu et al. [31]. Moreover, our cone detection module operated in 3-D, whereas the method in [53] was designed to only process 2-D projection images.

Our cone segmentation network, named C-CNN, is based on the network we previously implemented for retinal ganglion cell segmentation [55] (Fig. 4). C-CNN is a 2-D fully convolutional neural network composed of an encoder, followed by bilinear up-sampling and convolutional layers applied to features at each level of the encoder. These features are concatenated and passed to three branches to predict the binary cone centroid, binary segmentation, and distance maps of the input image. The centroid map is a binary mask where each cone center is represented by a 2 × 2 pixels square. The segmentation mask is the binary mask of the cone cell bodies, and the distance map represents the normalized distance of each cone pixel to its corresponding boundary. The goal of introducing the distance map was to recover and separate the segmentations of individual cones.


Fig. 4. C-CNN for automatic cone segmentation. A) Network architecture. The network outputs three predictions: The binary center mask denoting the center of cells, the binary segmentation mask for cell soma segmentation, and the distance map representing the normalized distance of each cone pixel to its corresponding cell boundary. The stride in all convolutional layers is 1, unless noted otherwise. B) Application of C-CNN to cropped AO-OCT volumes during inference. Each segmented cone is represented by a randomly assigned color in the en face (xy) and cross-sectional (xz and yz) slices.


We used AO-SLO images from 16 participants (640 images) as training data and the remaining five participants (200 images) to validate and monitor the training process. We trained C-CNN with batch size 20 for a maximum of 20 epochs and used the Adam optimizer (initial learning rate of 10−4). We applied flipping, rotation (0°, 90°, 180°, or 270°), image scaling (scale factor between 0.5 and 1.2), and gamma correction (gamma value between 0.7 and 0.9) for data augmentation at random in each epoch. We used the network weights that resulted in the highest F1 score (see Section 2.7) before plateauing on the validation images for application to AO-OCT images.

After training, we applied C-CNN to AO-OCT data. All AO-OCT data were resized to have a pixel size of 1 µm in the lateral direction. To better adapt to the densely packed, smaller cones near the fovea, images taken within 3° of the fovea were enlarged by a factor of 1.5 after the initial resizing. To detect cones, we conducted two experiments: (1) processing of the 2-D mean projection image and (2) 3-D processing of the volume. In the first experiment, we used the layer segmentation results of L-CNN to generate the mean projection image. We applied gamma correction (with parameter 0.85) to any AO-OCT image with average intensity less than the average of the AO-SLO training set. In the second experiment, we complemented the findings from the first experiment with cones found by processing individual planes of the AO-OCT volume. As shown in Fig. 4(B), after cropping AO-OCT volumes to 20 voxels above and below the upper and lower boundaries of L-CNN’s segmentation masks, we sliced them into xy, xz, and yz sections, which C-CNN independently processed to generate three sets of prediction maps. Because cones appear as bright circular regions in the xz and yz planes as well as in the xy plane, this multi-view prediction scheme can aid in cone localization. We averaged the output predictions to yield the final maps for further post-processing. Gamma correction was applied to any volume that needed intensity correction in the first experiment before it was run through C-CNN.
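A simplified sketch of the multi-view prediction scheme, assuming a hypothetical predict_fn wrapper that maps a 2-D slice to one of C-CNN’s prediction maps:

```python
import numpy as np

def multiview_predict(sub_volume, predict_fn):
    """Apply the 2-D C-CNN to all three orthogonal views of a cropped AO-OCT
    sub-volume and average the resulting probability volumes.

    sub_volume : 3-D array, axes = (z, y, x)
    predict_fn : callable mapping a 2-D slice to a 2-D probability map."""
    z, y, x = sub_volume.shape
    pred = np.zeros((3, z, y, x), dtype=np.float32)

    for k in range(z):                                   # en face (xy) slices
        pred[0, k] = predict_fn(sub_volume[k])
    for j in range(y):                                   # cross-sectional (xz) slices
        pred[1, :, j, :] = predict_fn(sub_volume[:, j, :])
    for i in range(x):                                   # cross-sectional (yz) slices
        pred[2, :, :, i] = predict_fn(sub_volume[:, :, i])

    # Cones appear as bright circular regions in all three views; averaging
    # the three prediction volumes reinforces consistent detections.
    return pred.mean(axis=0)
```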

In the post-processing stage, to circumvent the increasing presence of rod photoreceptors as noisy background in between cones at higher eccentricities, center mask predictions for images from 5° and beyond were first smoothed by mean filtering with kernel size of 3. We then detected cones as local maxima points separated by at least dmin in the smoothed center mask that had probability values larger than 0.5. dmin was set to 3 µm for locations below 3° and was scaled up for higher eccentricities to consider the increase in cone spacing. The scaling was based on the ratio between cone spacing at 3° and other locations from previous literature [65], yielding dmin values of 3.3–4.1 µm for 4°-9°. These steps were carried out for the first experiment of processing 2-D mean projection images. We used test-time augmentation to improve the final prediction maps. We averaged the predictions on the original images with the predictions of the flipped images around the horizontal axis, vertical axis, and both axes.
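The local-maxima detection with eccentricity-dependent minimum spacing can be sketched as follows. The intermediate dmin values between 4° and 9° are linearly interpolated in this sketch, whereas the values above were derived from the cone-spacing ratios in [65]:

```python
import numpy as np
from scipy.ndimage import uniform_filter
from skimage.feature import peak_local_max

def dmin_um(eccentricity_deg):
    """Minimum cone-center separation: 3 um below 3 deg, scaled to 3.3-4.1 um
    for 4-9 deg (intermediate values linearly interpolated in this sketch)."""
    if eccentricity_deg < 4:
        return 3.0
    return float(np.interp(eccentricity_deg, [4.0, 9.0], [3.3, 4.1]))

def detect_cone_centers(center_map, eccentricity_deg, pixel_size_um=1.0):
    """Detect cone centers as local maxima of the C-CNN center-mask prediction."""
    # Smooth the prediction at >= 5 deg to suppress the rod-related background.
    if eccentricity_deg >= 5:
        center_map = uniform_filter(center_map, size=3)

    min_distance = max(int(round(dmin_um(eccentricity_deg) / pixel_size_um)), 1)

    # Local maxima separated by at least d_min with probability > 0.5.
    return peak_local_max(center_map, min_distance=min_distance, threshold_abs=0.5)
```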

For the second experiment, where we processed the entire volume (Fig. 4(B)), the same steps were carried out with the addition of masking out points beyond the segmented layer boundaries. We also included an additional cone aggregation step to account for the fact that each cone can exhibit two hyper-reflective spots in the IS/OS and COST layers. In this step, we used the xy coordinates of the detected 3-D centroids and matched them to the 2-D centroids from the projection image (the first experiment). A 2-D centroid and its matched 3-D points represented one detected cone at the xy coordinates of the 2-D centroid. Any unmatched 3-D centroids were grouped together based on their xy proximity to represent one cone located at their mean xy coordinates.
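The aggregation of 3-D and 2-D centroids can be sketched as follows; the matching radius is an illustrative parameter not specified above, and the greedy grouping of unmatched points is a simplification:

```python
import numpy as np
from scipy.spatial import cKDTree

def aggregate_cone_centroids(centroids_2d, centroids_3d, match_radius_px=3.0):
    """Merge volumetric detections (IS/OS and COST reflections of the same cone)
    with the 2-D projection-image detections.

    centroids_2d : (M, 2) array of (y, x) positions from the projection image
    centroids_3d : (N, 3) array of (z, y, x) positions from 3-D processing"""
    cones = [{"xy": tuple(c), "z": []} for c in np.asarray(centroids_2d, float)]

    # Match each 3-D centroid to the nearest 2-D centroid within the radius.
    tree = cKDTree(centroids_2d)
    dist, idx = tree.query(np.asarray(centroids_3d)[:, 1:3],
                           distance_upper_bound=match_radius_px)
    unmatched = []
    for (z, y, x), d, i in zip(np.asarray(centroids_3d, float), dist, idx):
        if np.isfinite(d):
            cones[i]["z"].append(z)
        else:
            unmatched.append((z, y, x))

    # Group leftover 3-D centroids by xy proximity (greedy grouping; one new
    # cone per group, located at the group's mean xy position).
    unmatched = np.array(unmatched).reshape(-1, 3)
    used = np.zeros(len(unmatched), dtype=bool)
    for k in range(len(unmatched)):
        if used[k]:
            continue
        dxy = np.hypot(unmatched[:, 1] - unmatched[k, 1],
                       unmatched[:, 2] - unmatched[k, 2])
        group = ~used & (dxy <= match_radius_px)
        used |= group
        cones.append({"xy": tuple(unmatched[group, 1:3].mean(axis=0)),
                      "z": list(unmatched[group, 0])})
    return cones
```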

To obtain the segmentation masks for individual cones (both 2-D and 3-D), we applied the seeded watershed algorithm to the distance map together with the binary segmentation mask, using the detected cone centers as seeds. For 3-D segmentation, we used the watershed segmentation masks at the top and bottom axial coordinates of each cone to estimate its diameter at the corresponding en face planes. We used the intensity profile of each detected cone, obtained from the original AO-OCT scan, to estimate its top and bottom limits. We calculated the intensity profile as the average intensity in the 3 × 3 pixel neighborhood of a cone center at each axial plane. From the intensity profile, we detected the three most prominent peaks within L-CNN’s segmentation bounds. Using the intensity value of the axially top-most peak as reference, we filtered out the other two peaks if their values were below 0.25 of the reference value. The top and bottom limits of each cone were then set as the minimum and maximum axial coordinates of the remaining peaks. We represented each cone as a conical frustum with the estimated diameter values at its top and bottom planes; cones with only one estimated axial location were represented as spheres. We approximated the cone diameter as that of a circle with the same area as the segmentation mask. In practice, we set minimum and maximum limits on the estimated diameters for the 3-D segmentations: estimated diameters smaller than Dmin or larger than Dmax were clipped to these values. We used Dmin = 2 µm across all eccentricities, and Dmax of 8 µm and 12 µm for locations below 3° and at/above 3°, respectively.
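A minimal scikit-image sketch of the seeded watershed and equivalent-diameter estimation (clipping to Dmin/Dmax is omitted, and the function names are ours):

```python
import numpy as np
from skimage.segmentation import watershed

def segment_individual_cones(distance_map, seg_mask, cone_centers):
    """Seeded watershed on the C-CNN outputs to separate touching cones.

    distance_map : 2-D map of normalized distance to the cone boundary
    seg_mask     : 2-D binary cone segmentation mask
    cone_centers : (N, 2) integer (row, col) coordinates of detected cones."""
    markers = np.zeros(seg_mask.shape, dtype=np.int32)
    for lbl, (r, c) in enumerate(cone_centers, start=1):
        markers[r, c] = lbl

    # Flood from the cone centers over the inverted distance map, restricted
    # to the binary segmentation mask; each cone receives a unique label.
    return watershed(-distance_map, markers=markers, mask=seg_mask.astype(bool))

def equivalent_diameter_um(labels, lbl, pixel_size_um=1.0):
    """Diameter of a circle with the same area as the cone's segmentation mask."""
    area = np.count_nonzero(labels == lbl) * pixel_size_um ** 2
    return 2.0 * np.sqrt(area / np.pi)
```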

We compared the cone detection performance of our method with that of [53]. As the code for [53] was not publicly available, we implemented their method by carefully following the descriptions in the paper. We call our implementation of [53] CifarNet, as the network used in the mentioned study was a modified Cifar neural network. We used the same set of training and validation images as was used for C-CNN. Following the descriptions in [53], AO-SLO images were resized by factors of 1.5, 2, 2.5, and 3 to match the range of cone sizes in the AO-OCT images. CifarNet was trained with the binary cross-entropy loss, batch size of 100, and the number of maximum epochs was set to 50 with early stopping if the validation loss did not decrease in 4 epochs. The learning rate was set to 10−3 with a weight decay of 10−4. At test time, AO-OCT images were resized to pixel size of 1 µm and the generated probability maps were binarized using Otsu’s method. Cone centers were determined as the centroid of any 4-connected components in the binarized map.

2.7 Performance evaluation

We compared the cone detection performance of the automated methods with the gold-standard markings, using recall, precision, and F1 scores defined as

$$Recall = \frac{N_{TP}}{N_{GT}},$$
$$Precision = \frac{N_{TP}}{N_{detected}},$$
$$F_1 = \frac{2 \times Recall \times Precision}{Recall + Precision}.$$

In the above equations, NGT is the number of manually marked cones, Ndetected denotes the number of automatically detected cones, and NTP is the number of true positive cones. True positive cones were determined using the Euclidean distance between the manually marked and the automatically found cones. Each manually marked cone was matched to its nearest automatic cone if the distance between them was smaller than half the cone spacing in normal eyes at the corresponding imaged eccentricity. Based on previous literature [65] for cone spacings in healthy eyes, we used values of 6.1, 7, 7.7, 8.2, and 8.6 µm as the cone spacing for eccentricities 2°, 3°, 4°, 5°, and 6°, respectively. For other eccentricities, we extrapolated the cone spacings by fitting a logarithmic function to the values at 2° to 6°, yielding spacing values of 4.5, 8.9, 9.3, and 9.5 µm for eccentricities ≤1°, 7°, 8°, and 9°, respectively. For the diseased cases, we used 0.75 times the cone spacing as the distance threshold for matching manual cones to the automatically found cones. Any cones overlapping with segmented vessel regions, along with their matched cones from the other set, were removed from the calculation of the performance scores. We also disregarded cones within 10 pixels of the volume edges to remove border artifacts. To assess inter-observer variability, we compared the markings of the 2nd grader to the gold-standard markings in the same way. We also compared the automatically estimated cone densities to the gold-standard values using Bland-Altman analysis. We measured cone density by dividing the cone count by the image area after accounting for vessels and image edges. We used the Wilcoxon signed-rank test to determine the statistical significance of the observed differences.
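For illustration, the matching and scoring procedure can be sketched as follows; the greedy one-to-one matching is a simplification of the nearest-neighbor rule described above:

```python
import numpy as np
from scipy.spatial import cKDTree

def cone_detection_scores(gt_points, detected_points, spacing_um,
                          pixel_size_um=1.0, diseased=False):
    """Recall, precision, and F1 (Eqs. 4-6) by matching each manually marked cone
    to its nearest automatic detection within a spacing-based distance threshold."""
    factor = 0.75 if diseased else 0.5
    threshold_px = factor * spacing_um / pixel_size_um

    tree = cKDTree(detected_points)
    dist, idx = tree.query(gt_points)                     # nearest detection per GT cone

    # Greedy one-to-one matching: each detection can satisfy only one ground-truth cone.
    matched = set()
    n_tp = 0
    for d, i in sorted(zip(dist, idx)):
        if d <= threshold_px and i not in matched:
            matched.add(i)
            n_tp += 1

    recall = n_tp / len(gt_points)
    precision = n_tp / len(detected_points)
    f1 = 2 * recall * precision / (recall + precision) if (recall + precision) else 0.0
    return recall, precision, f1
```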

To evaluate the segmentation performance of L-CNN and C-CNN, we used the dice similarity coefficient (DSC), defined as

$$DSC = \frac{2 \times TP}{2 \times TP + FP + FN},$$
where TP, FP, and FN are the number of true positive, false positive, and false negative voxels in the predicted binary segmentation maps, respectively.

3. Results

3.1 Performance of layer segmentation

We quantified the performance of layer segmentation on test images for the subset of B-scans that were manually labeled. For the FDA set, the Dice score was 0.872 ± 0.061 and 0.869 ± 0.084 for the healthy and diseased participants, respectively, and for the IU set, the scores were 0.950 ± 0.016 and 0.916 ± 0.074 for the healthy and RP participants, respectively. Figure 5 shows examples comparing the automated layer segmentations with the manual labels. The results reflect high performance across images acquired with different AO-OCT setups from diseased and healthy individuals.


Fig. 5. Example layer segmentation results compared to manual grading for test images on A) FDA and B) IU datasets. Each image corresponds to a different participant. Images were cropped to the area around the cone outer segment layer for illustration purposes. Scale bars: 100 µm.


3.2 Performance of cone detection

The cone detection performance of our method compared to CifarNet and the 2nd human grader is summarized in Table 2. These results correspond to the processing of 2-D mean projection images created by manually setting the upper and lower layer boundaries. For all reported results, cones in the segmented vessel region were removed. As the results show, C-CNN consistently outperformed CifarNet and was on par with the 2nd grader based on the F1 score across all sets of images (p-values > 0.5) except that of FDA’s diseased participants (p-value = 0.049). Even for the 2nd grader, the diseased cases were more challenging, as reflected by the lower F1 scores for the diseased groups compared with the healthy groups (especially for the IU dataset with moderate to severe RP cases). Example results are shown in Fig. 6 and Fig. S2.


Fig. 6. Example cone detection results on A) healthy and B) diseased images. Smaller images are the zoomed-in illustrations of the white box area (50 × 50 µm2, 0.15°×0.15°) shown in the original images. The top and bottom examples in A and B are from the FDA (500 × 500 and 300 × 240 pixels, respectively) and IU datasets (300 × 300 pixels for both), respectively. Green points denote true positives, yellow denotes false negatives, and red denotes false positives. Segmented vessel regions are overlaid as light blue masks. Scale bars: 100 µm (∼0.3°).



Table 2. Cone detection performance scores (mean ± standard deviation) on AO-OCT projection images with networks trained on labeled AO-SLO images. Cells falling onto vessel regions, as segmented with V-CNN, were discarded prior to calculating the scores for all three methods. Cases with significantly (p-value < 0.05) smaller scores compared to the 2nd grader are marked with †.

Next, we compared the cone detection performances of using the 2-D projection images obtained from L-CNN’s segmentation mask to that of creating the images through manual layer inspection. Using the automatically generated 2-D projection images yielded the same level of cone detection performance as using the images created through manual determination of the cone outer segment layer bounds (all F1 p-values > 0.1; Table S1).

The results presented so far were for detecting cones from 2-D mean projection images. Compared to processing only the 2-D projection image created from the automatic layer segmentation, 3-D processing of AO-OCT volumes resulted in higher recall and lower precision scores (Table S2). The final performance, as reflected by the F1 score, was similar between the two experiments for all sets (p-values ≥ 0.25) except for the FDA diseased set (p-value = 0.037), where the 3-D processing achieved a lower score (0.834 ± 0.078 versus 0.855 ± 0.063). Overall, pooling the performance scores at the participant level across all the images (15 participants in total), 3-D processing yielded higher recall (0.921 ± 0.088 versus 0.895 ± 0.099, p-value = 2 × 10−4), lower precision (0.863 ± 0.110 versus 0.899 ± 0.091, p-value = 6 × 10−5), and the same F1 score (0.884 ± 0.074 versus 0.891 ± 0.068, p-value = 0.252) as 2-D processing. The processing time, from layer segmentation to 3-D cone localization, was 270.30 ± 21.35 seconds/volume for the IU dataset with 300 B-scans of size 300 × 300 pixels across all participants. Details on hardware and other processing times can be found in the Supplementary Text.

3.3 Cone density analysis

The mean error of the automatically measured cone densities was small (-3.96% and 3.47% difference for FDA and IU, respectively; Fig. S3 and Table S3) compared to that of manual counting for the healthy participants but more varied for the diseased individuals (-13.86% and 19.26% difference for FDA and IU, respectively; Fig. S3 and Table S3). The automatically measured cone densities decreased from 31.47 × 103 cones/mm2 in foveal regions to 11.89 × 103 cones/mm2 in perifoveal areas of IU’s healthy group, and from 25.22 × 103 cones/mm2 to 8.43 × 103 cones/mm2 in IU’s RP group, reflecting the decrease in cone density with increasing distance from the fovea (Table S3). The results also show that the automated measurements – albeit more varied – still captured the overall trend of cone loss in the RP participants compared to the healthy age-matched individuals (Table S3).

3.4 Cone segmentation

Here we provide a proof of concept for segmenting individual cones from the AO-OCT images. Figure 7 shows the DSC scores for segmenting cones on 2-D projection images of the FDA’s healthy group. On each box in Fig. 7, the central mark indicates the median and the bottom and top edges indicate the 25th and 75th percentiles, respectively. The overall performance across the data was DSC = 0.787 ± 0.039. As the results show, the DSC scores for images acquired with higher sampling density (pixel size ≤ 1 µm) were generally higher compared to lower sampling density images of the same individual.


Fig. 7. Boxplots for the dice scores of cone segmentation on 2-D mean projection images for FDA’s healthy group (total of 15 images, ∼100 cones per image). The data have been color-coded based on their actual lateral pixel sizes. The black dashed line indicates the mean score across images. On each box, the central mark indicates the median, and the bottom and top edges indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points not considered outliers (delineated using the circle markers).


We demonstrate qualitative results for the 3-D segmentation of cones in Fig. 8 and Fig. S4 for the FDA and IU datasets, respectively. Figure 9(A) compares the OS lengths for RP participants to healthy age-matched controls across retinal eccentricities. The automatically measured OS lengths reflect the generally shorter and more varied OS lengths for the RP participants than for the age-matched controls, trends consistent with previous manual measurements [57]. In Fig. 9(B), we visualize the OS lengths as a 2-D map for an individual with drusen deposits, which highlights the affected regions as areas with much shorter OS lengths than adjacent regions.


Fig. 8. Volumetric cone segmentations illustrated with randomly assigned colors on en face (xy) and cross-sectional (xz and yz) slices for participants from the FDA dataset: A) healthy individual at 7° temporal to fovea, and B) participant with drusen at 1° temporal to fovea. Scale bars: 50 µm.



Fig. 9. Automatic cone outer segment (OS) length measurements. A) Distribution of OS lengths for true positive cones on RP and age-matched controls across different retinal eccentricities. B) Visualization of OS length as a 2-D heatmap for a diseased participant with drusen deposits at 1° temporal to fovea. Scale bar: 50 µm.


4. Discussion

In this paper, we developed a deep learning method to automatically process AO-OCT scans of the human retina and segment individual cone photoreceptors. Our framework consisted of three main modules for addressing the different tasks of retinal layer segmentation, retinal vessel exclusion, and cone segmentation. We used confocal AO-SLO images to train the vessel and cone segmentation modules. For vessel segmentation, we introduced an image simulation framework to add vessels as dark regions to cone images for the training of the neural network. In addition to using the ground truth cone center markings of [1] and the segmentation masks of [31] to train C-CNN, we added distance maps to better handle the separation of individual cones. For 3-D segmentation of cones, we applied the network trained on 2-D AO-SLO images to orthogonal views of AO-OCT images to obtain accurate 3-D predictions of cones.

Our method achieved high performance scores across different retinal locations, imaging devices, and patient populations. The ability of our technique to detect cones was on par with that of experts and surpassed that of an earlier reported neural network architecture for 2-D AO-OCT cone detection [53]. Furthermore, our technique extends 2-D cone detection to 3-D predictions of cone segmentations and includes additional modules for automatic layer segmentation and for handling the shadowing effect of vessels, which degrades cone visibility. Our current algorithm does not predict the exact reason for the low visibility of photoreceptors (e.g., shadowing effect, defocus, or cell atrophy). As larger annotated datasets of AO-OCT become available, deep networks can be trained to predict why photoreceptors are not discernible in an image.

We demonstrated the use of our framework for segmenting individual cones and measuring cone density and outer segment length from volumetric AO-OCT scans. Our cone segmentations on 2-D projection images were more accurate for images acquired at a higher sampling density. A larger dataset is needed to further investigate the segmentation performance across different imaging parameters and retinal locations. The cone density results showed a decreasing trend in cone density with retinal eccentricity, which is consistent with previous reports in the literature [2,56,65]. These results also reflected cone loss in the diseased participants, as expected. We also reported automatic measurements of OS length across RP and age-matched healthy controls. The measured lengths reflected the generally shorter and more varied OS lengths for the RP participants than for the age-matched controls, trends that are consistent with previous reports [57]. We envision that our automated method will be beneficial for future clinical studies on cellular-level alterations to photoreceptors (e.g., cone OS length and inner segment diameter) in retinal diseases or structural-functional investigations of fundamental cell processes in healthy eyes.

Our work is the first step towards a comprehensive automated cone analysis pipeline for AO-OCT images. One future avenue would be to extend the current framework to the segmentation of rods and RPE cells. To achieve this, further improve the accuracy of layer segmentation, and remove photoreceptor segmentation artifacts in noisier cases, we will, as part of our ongoing work, extend the capabilities of the proposed L-CNN by integrating the BiconNet connectivity principles [66], which have proven impactful for other image analysis applications [67]. Extension of L-CNN through the connectivity principles to segment multiple layers (e.g., photoreceptor inner segment, photoreceptor outer segment, and RPE) would also allow measurements of inner segment length. Another future direction would be to develop a multimodal deep learning model to use simultaneously acquired AO-SLO and AO-OCT images from multimodal imagers. A multimodal model that uses information from two sources can potentially further improve the accuracy of cone segmentation.

Funding

Foundation Fighting Blindness (BR-CL-0621-0812-DUKE); National Institutes of Health (P30EY005722, R01-EY018339); Research to Prevent Blindness (Unrestricted Grant to Duke University); Hartwell Foundation (Postdoctoral Fellowship); U.S. Food and Drug Administration (FDA Critical Path Initiative).

Acknowledgments

We thank Oliver Wolcott from the FDA for technical assistance.

Disclosures

The mention of commercial products, their sources, or their use in connection with material reported herein is not to be construed as either an actual or implied endorsement of such products by the US Department of Health and Human Services.

Data availability

Imaging data and the manual expert labels underlying the results presented in this paper are available at [68].

Supplemental document

See Supplement 1 for supporting content.

References

1. R. Garrioch, C. Langlo, A. M. Dubis, R. F. Cooper, A. Dubra, and J. Carroll, “The repeatability of in vivo parafoveal cone density and spacing measurements,” Optometry and Vision Science 89(5), 632–643 (2012). [CrossRef]  

2. D. Scoles, Y. N. Sulai, C. S. Langlo, G. A. Fishman, C. A. Curcio, J. Carroll, and A. Dubra, “In vivo imaging of human cone photoreceptor inner segments,” Invest. Ophthalmol. Visual Sci. 55(7), 4244–4251 (2014). [CrossRef]  

3. R. Sabesan, H. Hofer, and A. Roorda, “Characterizing the human cone photoreceptor mosaic via dynamic photopigment densitometry,” PLoS One 10(12), e0144891 (2015). [CrossRef]  

4. R. F. Cooper, W. S. Tuten, A. Dubra, D. H. Brainard, and J. I. W. Morgan, “Non-invasive assessment of human cone photoreceptor function,” Biomed. Opt. Express 8(11), 5098–5112 (2017). [CrossRef]  

5. A. Dubra and Y. Sulai, “Reflective afocal broadband adaptive optics scanning ophthalmoscope,” Biomed. Opt. Express 2(6), 1757–1768 (2011). [CrossRef]  

6. A. Dubra, Y. Sulai, J. L. Norris, R. F. Cooper, A. M. Dubis, D. R. Williams, and J. Carroll, “Noninvasive imaging of the human rod photoreceptor mosaic using a confocal adaptive optics scanning ophthalmoscope,” Biomed. Opt. Express 2(7), 1864–1876 (2011). [CrossRef]  

7. R. F. Cooper, A. M. Dubis, A. Pavaskar, J. Rha, A. Dubra, and J. Carroll, “Spatial and temporal variation of rod photoreceptor reflectance in the human retina,” Biomed. Opt. Express 2(9), 2577–2589 (2011). [CrossRef]  

8. T. DuBose, D. Nankivil, F. LaRocca, G. Waterman, K. Hagan, J. Polans, B. Keller, D. Tran-Viet, L. Vajzovic, A. N. Kuo, C. A. Toth, J. A. Izatt, and S. Farsiu, “Handheld adaptive optics scanning laser ophthalmoscope,” Optica 5(9), 1027–1036 (2018). [CrossRef]  

9. R. F. Cooper, D. H. Brainard, and J. I. W. Morgan, “Optoretinography of individual human cone photoreceptors,” Opt. Express 28(26), 39326–39339 (2020). [CrossRef]  

10. E. A. Rossi, C. E. Granger, R. Sharma, Q. Yang, K. Saito, C. Schwarz, S. Walters, K. Nozato, J. Zhang, and T. Kawakami, “Imaging individual neurons in the retinal ganglion cell layer of the living eye,” Proc. Natl. Acad. Sci. U.S.A. 114(3), 586–591 (2017). [CrossRef]  

11. R. J. Zawadzki, S. M. Jones, S. S. Olivier, M. Zhao, B. A. Bower, J. A. Izatt, S. Choi, S. Laut, and J. S. Werner, “Adaptive-optics optical coherence tomography for high-resolution and high-speed 3D retinal in vivo imaging,” Opt. Express 13(21), 8532–8546 (2005). [CrossRef]  

12. O. P. Kocaoglu, S. Lee, R. S. Jonnal, Q. Wang, A. E. Herde, J. C. Derby, W. Gao, and D. T. Miller, “Imaging cone photoreceptors in three dimensions and in time using ultrahigh resolution optical coherence tomography with adaptive optics,” Biomed. Opt. Express 2(4), 748–763 (2011). [CrossRef]  

13. K. S. K. Wong, Y. Jian, M. Cua, S. Bonora, R. J. Zawadzki, and M. V. Sarunic, “In vivo imaging of human photoreceptor mosaic with wavefront sensorless adaptive optics optical coherence tomography,” Biomed. Opt. Express 6(2), 580–590 (2015). [CrossRef]  

14. Z. Liu, O. P. Kocaoglu, and D. T. Miller, “3D imaging of retinal pigment epithelial cells in the living human retina,” Invest. Ophthalmol. Vis. Sci. 57(9), OCT533 (2016). [CrossRef]  

15. Z. Liu, K. Kurokawa, F. Zhang, J. J. Lee, and D. T. Miller, “Imaging and quantifying ganglion cells and other transparent neurons in the living human retina,” Proc. Natl. Acad. Sci. U.S.A. 114(48), 12803–12808 (2017). [CrossRef]  

16. M. J. Ju, M. Heisler, D. Wahl, Y. Jian, and M. V. Sarunic, “Multiscale sensorless adaptive optics OCT angiography system for in vivo human retinal imaging,” J. Biomed. Opt 22(12), 1 (2017). [CrossRef]  

17. Z. Liu, J. Tam, O. Saeedi, and D. X. Hammer, “Trans-retinal cellular imaging with multimodal adaptive optics,” Biomed. Opt. Express 9(9), 4246–4262 (2018). [CrossRef]  

18. Z. Liu, F. Zhang, K. Zucca, A. Agarwal, and D. X. Hammer, “Ultrahigh speed multimodal adaptive optics system for microscopic structural and functional imaging of the human retina,” Biomed. Opt. Express 13(11), 5860–5878 (2022). [CrossRef]  

19. V. P. Pandiyan, X. Jiang, A. Maloney-Bertelli, J. A. Kuchenbecker, U. Sharma, and R. Sabesan, “High-speed adaptive optics line-scan OCT for cellular-resolution optoretinography,” Biomed. Opt. Express 11(9), 5274–5296 (2020). [CrossRef]  

20. A. J. Bower, T. Liu, N. Aguilera, J. Li, J. Liu, R. Lu, J. P. Giannini, L. A. Huryn, A. Dubra, Z. Liu, D. X. Hammer, and J. Tam, “Integrating adaptive optics-SLO and OCT for multimodal visualization of the human retinal pigment epithelial mosaic,” Biomed. Opt. Express 12(3), 1449–1466 (2021). [CrossRef]  

21. J. Polans, D. Cunefare, E. Cole, B. Keller, P. S. Mettu, S. W. Cousins, M. J. Allingham, J. A. Izatt, and S. Farsiu, “Enhanced visualization of peripheral retinal vasculature with wavefront sensorless adaptive optics optical coherence tomography angiography in diabetic patients,” Opt. Lett. 42(1), 17–20 (2017). [CrossRef]  

22. P. Godara, A. M. Dubis, A. Roorda, J. L. Duncan, and J. Carroll, “Adaptive optics retinal imaging: emerging clinical applications,” Optometry and Vision Science 87(12), 930–941 (2010). [CrossRef]  

23. J. S. Gill, M. Moosajee, and A. M. Dubis, “Cellular imaging of inherited retinal diseases using adaptive optics,” Eye 33(11), 1683–1698 (2019). [CrossRef]  

24. C. Bowes Rickman, S. Farsiu, C. A. Toth, and M. Klingeborn, “Dry age-related macular degeneration: mechanisms, therapeutic targets, and imaging,” Invest. Ophthalmol. Vis. Sci. 54(14), ORSF68 (2013). [CrossRef]  

25. E. M. Lad, J. L. Duncan, W. Liang, M. G. Maguire, A. R. Ayala, I. Audo, D. G. Birch, J. Carroll, J. K. Cheetham, T. A. Durham, A. T. Fahim, J. Loo, Z. Deng, D. Mukherjee, E. Heon, R. B. Hufnagel, B. Guan, A. Iannaccone, G. J. Jaffe, C. N. Kay, M. Michaelides, M. E. Pennesi, A. Vincent, C. Y. Weng, and S. Farsiu, “Baseline microperimetry and OCT in the RUSH2A study: structure−function association and correlation with disease severity,” Am. J. Ophthalmol. 244, 98–116 (2022). [CrossRef]  

26. J. A. Boughman, P. M. Conneally, and W. E. Nance, “Population genetic studies of retinitis pigmentosa,” Am. J. Hum. Genet. 32, 223 (1980).

27. E. Bensinger, N. Rinella, A. Saud, P. Loumou, K. Ratnam, S. Griffin, J. Qin, T. C. Porco, A. Roorda, and J. L. Duncan, “Loss of foveal cone structure precedes loss of visual acuity in patients with rod-cone degeneration,” Invest. Ophthalmol. Vis. Sci. 60(8), 3187–3196 (2019). [CrossRef]  

28. K. G. Foote, P. Loumou, S. Griffin, J. Qin, K. Ratnam, T. C. Porco, A. Roorda, and J. L. Duncan, “Relationship between foveal cone structure and visual acuity measured with adaptive optics scanning laser ophthalmoscopy in retinal degeneration,” Invest. Ophthalmol. Vis. Sci. 59(8), 3385–3393 (2018). [CrossRef]  

29. N. Wynne, J. Carroll, and J. L. Duncan, “Promises and pitfalls of evaluating photoreceptor-based retinal disease with adaptive optics scanning light ophthalmoscopy (AOSLO),” Prog. Retinal Eye Res. 83, 100920 (2021). [CrossRef]  

30. S. A. Burns, A. E. Elsner, K. A. Sapoznik, R. L. Warner, and T. J. Gast, “Adaptive optics imaging of the human retina,” Prog. Retinal Eye Res. 68, 1–30 (2019). [CrossRef]  

31. S. J. Chiu, Y. Lokhnygina, A. M. Dubis, A. Dubra, J. Carroll, J. A. Izatt, and S. Farsiu, “Automatic cone photoreceptor segmentation using graph theory and dynamic programming,” Biomed. Opt. Express 4(6), 924–937 (2013). [CrossRef]  

32. D. Cunefare, R. F. Cooper, B. Higgins, D. F. Katz, A. Dubra, J. Carroll, and S. Farsiu, “Automatic detection of cone photoreceptors in split detector adaptive optics scanning light ophthalmoscope images,” Biomed. Opt. Express 7(5), 2036–2050 (2016). [CrossRef]  

33. J. Liu, H. Jung, A. Dubra, and J. Tam, “Automated photoreceptor cell identification on nonconfocal adaptive optics images using multiscale circular voting,” Invest. Ophthalmol. Vis. Sci. 58(11), 4477–4489 (2017). [CrossRef]  

34. C. Bergeles, A. M. Dubis, B. Davidson, M. Kasilian, A. Kalitzeos, J. Carroll, A. Dubra, M. Michaelides, and S. Ourselin, “Unsupervised identification of cone photoreceptors in non-confocal adaptive optics scanning light ophthalmoscope images,” Biomed. Opt. Express 8(6), 3081–3094 (2017). [CrossRef]  

35. J. Liu, H. Jung, A. Dubra, and J. Tam, “Cone photoreceptor cell segmentation and diameter measurement on adaptive optics images using circularly constrained active contour model,” Invest. Ophthalmol. Vis. Sci. 59(11), 4639–4652 (2018). [CrossRef]  

36. Y. Chen, Y. He, J. Wang, W. Li, L. Xing, F. Gao, and G. Shi, “Automated cone photoreceptor cell segmentation and identification in adaptive optics scanning laser ophthalmoscope images using morphological processing and watershed algorithm,” IEEE Access 8, 105786–105792 (2020). [CrossRef]  

37. K. Y. Li and A. Roorda, “Automated identification of cone photoreceptors in adaptive optics retinal images,” J. Opt. Soc. Am. A 24(5), 1358–1363 (2007). [CrossRef]  

38. A. E. Salmon, R. F. Cooper, M. Chen, B. Higgins, J. A. Cava, N. Chen, H. M. Follett, M. Gaffney, H. Heitkotter, E. Heffernan, T. G. Schmidt, and J. Carroll, “Automated image processing pipeline for adaptive optics scanning light ophthalmoscopy,” Biomed. Opt. Express 12(6), 3142–3168 (2021). [CrossRef]  

39. R. F. Cooper, G. K. Aguirre, and J. I. W. Morgan, “Fully automated estimation of spacing and density for retinal mosaics,” Trans. Vis. Sci. Tech. 8(5), 26 (2019). [CrossRef]  

40. Y. Chen, Y. He, J. Wang, W. Li, L. Xing, X. Zhang, and G. Shi, “Automated cone photoreceptor cell identification in confocal adaptive optics scanning laser ophthalmoscope images based on object detection,” J. Innov. Opt. Health Sci. 15(01), 2250001 (2022). [CrossRef]  

41. J. Hamwood, D. Alonso-Caneiro, D. M. Sampson, M. J. Collins, and F. K. Chen, “Automatic detection of cone photoreceptors with fully convolutional networks,” Trans. Vis. Sci. Tech. 8(6), 10 (2019). [CrossRef]  

42. Y. Chen, Y. He, J. Wang, W. Li, L. Xing, X. Zhang, and G. Shi, “DeepLab and bias field correction based automatic cone photoreceptor cell identification with adaptive optics scanning laser ophthalmoscope images,” Wireless Communications and Mobile Computing 2021, 1 (2021). [CrossRef]  

43. D. Cunefare, L. Fang, R. F. Cooper, A. Dubra, J. Carroll, and S. Farsiu, “Open source software for automatic detection of cone photoreceptors in adaptive optics ophthalmoscopy using convolutional neural networks,” Sci. Rep. 7(1), 6620 (2017). [CrossRef]  

44. D. Cunefare, A. L. Huckenpahler, E. J. Patterson, A. Dubra, J. Carroll, and S. Farsiu, “RAC-CNN: multimodal deep learning based automatic detection and classification of rod and cone photoreceptors in adaptive optics scanning light ophthalmoscope images,” Biomed. Opt. Express 10(8), 3815–3832 (2019). [CrossRef]  

45. D. Cunefare, C. S. Langlo, E. J. Patterson, S. Blau, A. Dubra, J. Carroll, and S. Farsiu, “Deep learning based detection of cone photoreceptors with multimodal adaptive optics scanning light ophthalmoscope images of achromatopsia,” Biomed. Opt. Express 9(8), 3740–3756 (2018). [CrossRef]  

46. M. Zhou, N. Doble, S. S. Choi, T. Jin, C. Xu, S. Parthasarathy, and R. Ramnath, “Using deep learning for the automated identification of cone and rod photoreceptors from adaptive optics imaging of the human retina,” Biomed. Opt. Express 13(10), 5082–5097 (2022). [CrossRef]  

47. K. Li, Q. Yin, J. Ren, H. Song, and J. Zhang, “Automatic quantification of cone photoreceptors in adaptive optics scanning light ophthalmoscope images using multi-task learning,” Biomed. Opt. Express 13(10), 5187–5201 (2022). [CrossRef]  

48. J. Liu, C. Shen, T. Liu, N. Aguilera, and J. Tam, “Deriving visual cues from deep learning to achieve subpixel cell segmentation in adaptive optics retinal images,” in International Workshop on Ophthalmic Medical Image Analysis, (Springer, 2019), 86–94.

49. J. Liu, C. Shen, N. Aguilera, C. Cukras, R. B. Hufnagel, W. M. Zein, T. Liu, and J. Tam, “Active cell appearance model induced generative adversarial networks for annotation-efficient cell segmentation and identification on adaptive optics retinal images,” IEEE Trans. Med. Imaging 40(10), 2820–2831 (2021). [CrossRef]  

50. B. Davidson, A. Kalitzeos, J. Carroll, A. Dubra, S. Ourselin, M. Michaelides, and C. Bergeles, “Automatic cone photoreceptor localisation in healthy and Stargardt afflicted retinas using deep learning,” Sci. Rep. 8(1), 7911 (2018). [CrossRef]  

51. K. M. Litts, R. F. Cooper, J. L. Duncan, and J. Carroll, “Photoreceptor-based biomarkers in AOSLO retinal imaging,” Invest. Ophthalmol. Vis. Sci. 58(6), BIO255 (2017). [CrossRef]

52. D. T. Miller and K. Kurokawa, “Cellular-scale imaging of transparent retinal structures and processes using adaptive optics optical coherence tomography,” Annu. Rev. Vis. Sci. 6(1), 115–148 (2020). [CrossRef]  

53. M. Heisler, M. J. Ju, M. Bhalla, N. Schuck, A. Athwal, E. V. Navajas, M. F. Beg, and M. V. Sarunic, “Automated identification of cone photoreceptors in adaptive optics optical coherence tomography images using transfer learning,” Biomed. Opt. Express 9(11), 5353–5367 (2018). [CrossRef]  

54. E. M. Wells-Gray, S. S. Choi, M. Ohr, C. M. Cebulla, and N. Doble, “Photoreceptor identification and quantitative analysis for the detection of retinal disease in AO-OCT imaging,” in Ophthalmic Technologies XXIX, (SPIE, 2019), 43–51.

55. S. Soltanian-Zadeh, K. Kurokawa, Z. Liu, F. Zhang, O. Saeedi, D. X. Hammer, D. T. Miller, and S. Farsiu, “Weakly supervised individual ganglion cell segmentation from adaptive optics OCT images for glaucomatous damage assessment,” Optica 8(5), 642–651 (2021). [CrossRef]  

56. A. Reumueller, L. Wassermann, M. Salas, M. Schranz, V. Hacker, G. Mylonas, S. Sacu, W. Drexler, M. Pircher, and U. Schmidt-Erfurth, “Three-dimensional composition of the photoreceptor cone layers in healthy eyes using adaptive-optics optical coherence tomography (AO-OCT),” PLoS One 16(1), e0245293 (2021). [CrossRef]  

57. A. Lassoued, F. Zhang, K. Kurokawa, Y. Liu, M. T. Bernucci, J. A. Crowell, and D. T. Miller, “Cone photoreceptor dysfunction in retinitis pigmentosa revealed by optoretinography,” Proc. Natl. Acad. Sci. U.S.A. 118(47), e2107444118 (2021). [CrossRef]  

58. S. J. Chiu, X. T. Li, P. Nicholas, C. A. Toth, J. A. Izatt, and S. Farsiu, “Automatic segmentation of seven retinal layers in SDOCT images congruent with expert manual segmentation,” Opt. Express 18(18), 19413–19428 (2010). [CrossRef]  

59. L. Fang, D. Cunefare, C. Wang, R. H. Guymer, S. Li, and S. Farsiu, “Automatic segmentation of nine retinal layer boundaries in OCT images of non-exudative AMD patients using deep learning and graph search,” Biomed. Opt. Express 8(5), 2732–2744 (2017). [CrossRef]  

60. A. G. Roy, S. Conjeti, S. P. K. Karri, D. Sheet, A. Katouzian, C. Wachinger, and N. Navab, “ReLayNet: retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional networks,” Biomed. Opt. Express 8(8), 3627–3642 (2017). [CrossRef]  

61. T. Kepp, J. Ehrhardt, M. P. Heinrich, G. Hüttmann, and H. Handels, “Topology-preserving shape-based regression of retinal layers in OCT image data using convolutional neural networks,” in 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), (IEEE, 2019), 1437–1440.

62. J. Kugelman, D. Alonso-Caneiro, S. A. Read, S. J. Vincent, and M. J. Collins, “Automatic segmentation of OCT retinal boundaries using recurrent neural networks and graph search,” Biomed. Opt. Express 9(11), 5759–5777 (2018). [CrossRef]  

63. L. Heinrich, J. Funke, C. Pape, J. Nunez-Iglesias, and S. Saalfeld, “Synaptic cleft segmentation in non-isotropic volume electron microscopy of the complete Drosophila brain,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, (Springer, 2018), 317–325.

64. Z. Gu, J. Cheng, H. Fu, K. Zhou, H. Hao, Y. Zhao, T. Zhang, S. Gao, and J. Liu, “CE-Net: context encoder network for 2D medical image segmentation,” IEEE Trans. Med. Imaging 38(10), 2281–2292 (2019). [CrossRef]

65. R. Legras, A. Gaudric, and K. Woog, “Distribution of cone density, spacing and arrangement in adult healthy retinas with adaptive optics flood illumination,” PLoS One 13(1), e0191141 (2018). [CrossRef]  

66. Z. Yang, S. Soltanian-Zadeh, K. K. Chu, H. Zhang, L. Moussa, A. E. Watts, N. J. Shaheen, A. Wax, and S. Farsiu, “Connectivity-based deep learning approach for segmentation of the epithelium in in vivo human esophageal OCT images,” Biomed. Opt. Express 12(10), 6326–6340 (2021). [CrossRef]  

67. Z. Yang, S. Soltanian-Zadeh, and S. Farsiu, “BiconNet: An edge-preserved connectivity-based approach for salient object detection,” Pattern Recognit. 121, 108231 (2022). [CrossRef]

68. S. Soltanian-Zadeh, Z. Liu, Y. Liu, A. Lassoued, C. Cukras, D. T. Miller, D. X. Hammer, and S. Farsiu, “Deep learning-enabled volumetric cone photoreceptor segmentation in adaptive optics optical coherence tomography images of normal and diseased eyes,” Duke University Repository, 2023, https://people.duke.edu/~sf59/Soltanian_BOE_2023.htm.

Supplementary Material (1)

Supplement 1: Supplementary Material

Data availability

Imaging data and the manual expert labels underlying the results presented in this paper are available at [68].




Figures (9)

Fig. 1. Overall framework for automatic segmentation of individual cone photoreceptor outer segments from AO-OCT scans.
Fig. 2. Architecture of L-CNN for segmentation of the cone outer segment layer. The numbers beneath each block denote the number of filters in that layer. The strides in all convolutional and max-pooling layers were set to 1 and 2, respectively.
Fig. 3. Components of V-CNN. A) ClassNet for determining whether vessels are present in an image. B) The vessel segmentation network. C) Example AO-SLO images with simulated vessels added to them. The numbers beneath each block in A and B denote the number of filters in that layer. The stride in all convolutional layers is 1, unless noted otherwise. DAC: dense atrous convolution; RMP: residual multi-kernel pooling block.
Fig. 4. C-CNN for automatic cone segmentation. A) Network architecture. The network outputs three predictions: the binary center mask denoting the centers of cells, the binary segmentation mask for cell soma segmentation, and the distance map representing the normalized distance of each cone pixel to its corresponding cell boundary. The stride in all convolutional layers is 1, unless noted otherwise. B) Application of C-CNN to cropped AO-OCT volumes during inference. Each segmented cone is shown in a randomly assigned color in the en face (xy) and cross-sectional (xz and yz) slices.
Fig. 5. Example layer segmentation results compared with manual grading for test images from the A) FDA and B) IU datasets. Each image corresponds to a different participant. Images were cropped to the area around the cone outer segment layer for illustration purposes. Scale bars: 100 µm.
Fig. 6. Example cone detection results on A) healthy and B) diseased images. Smaller images are zoomed-in views of the white box areas (50 × 50 µm², 0.15° × 0.15°) shown in the original images. The top and bottom examples in A and B are from the FDA dataset (500 × 500 and 300 × 240 pixels, respectively) and the IU dataset (300 × 300 pixels for both), respectively. Green points denote true positives, yellow denotes false negatives, and red denotes false positives. Segmented vessel regions are overlaid as light blue masks. Scale bars: 100 µm (∼0.3°).
Fig. 7. Boxplots of the Dice scores for cone segmentation on 2-D mean projection images from the FDA healthy group (15 images in total, ∼100 cones per image). The data are color-coded by actual lateral pixel size. The black dashed line indicates the mean score across images. On each box, the central mark indicates the median, and the bottom and top edges indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points not considered outliers; outliers are shown as circle markers.
Fig. 8. Volumetric cone segmentations illustrated with randomly assigned colors on en face (xy) and cross-sectional (xz and yz) slices for participants from the FDA dataset: A) a healthy individual at 7° temporal to the fovea, and B) a participant with drusen at 1° temporal to the fovea. Scale bars: 50 µm.
Fig. 9. Automatic cone outer segment (OS) length measurements. A) Distribution of OS lengths for true positive cones in RP participants and age-matched controls across different retinal eccentricities. B) Visualization of OS length as a 2-D heatmap for a diseased participant with drusen deposits at 1° temporal to the fovea. Scale bar: 50 µm.

Tables (2)

Table 1. Summary of AO-OCT dataset

Table 2. Cone detection performance scores (mean ± standard deviation) on AO-OCT projection images with networks trained on labeled AO-SLO images. Cells falling within vessel regions, as segmented by V-CNN, were discarded prior to calculating the scores for all three methods. Cases with significantly (p < 0.05) lower scores than the second grader are marked with †.

Equations (7)

$$\mathrm{SDM} = \begin{cases} \min_{y \in \Omega_B} d(x, y), & \text{if } x \in \Omega_F \\ \min_{y \in \Omega_F} d(x, y), & \text{if } x \in \Omega_B. \end{cases}$$
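As a minimal illustration, this map can be computed from a binary mask with a standard Euclidean distance transform. The Python sketch below is ours and is not the authors' implementation; the function name and the use of SciPy are assumptions, and the input is assumed to be a 2-D mask whose nonzero pixels form Ω_F.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_map(fg_mask: np.ndarray) -> np.ndarray:
    """Distance of each pixel to the opposite region, per the equation above.

    Foreground pixels (Omega_F) get the distance to the nearest background
    pixel (Omega_B), and background pixels get the distance to the nearest
    foreground pixel. Illustrative sketch only.
    """
    fg = fg_mask.astype(bool)
    # distance_transform_edt assigns each nonzero pixel its Euclidean
    # distance to the nearest zero pixel of its input.
    dist_to_bg = distance_transform_edt(fg)    # valid for x in Omega_F
    dist_to_fg = distance_transform_edt(~fg)   # valid for x in Omega_B
    return np.where(fg, dist_to_bg, dist_to_fg)
```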
$$V = \left| Y \cos(X) + y_0 \right| \le th,$$
$$I' = I\,(1 - \alpha V).$$
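A minimal sketch of how these two equations could be applied to add a simulated vessel shadow to an en face image is shown below. The coordinate normalization, all parameter values, and the function name are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np

def add_simulated_vessel(image: np.ndarray, y0: float = 0.0,
                         th: float = 0.05, alpha: float = 0.5) -> np.ndarray:
    """Attenuate pixels inside a simulated vessel region V (illustrative only).

    V follows the equation above: pixels where |Y*cos(X) + y0| <= th, with
    X and Y taken here as normalized image coordinates (an assumption).
    The attenuated image is I' = I * (1 - alpha * V).
    """
    h, w = image.shape
    # Coordinate grids: Y in [-1, 1] along rows, X in [0, 2*pi] along columns.
    Y, X = np.meshgrid(np.linspace(-1.0, 1.0, h),
                       np.linspace(0.0, 2.0 * np.pi, w), indexing="ij")
    V = (np.abs(Y * np.cos(X) + y0) <= th).astype(image.dtype)
    return image * (1.0 - alpha * V)
```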
$$\mathrm{Recall} = \frac{N_{TP}}{N_{GT}},$$
$$\mathrm{Precision} = \frac{N_{TP}}{N_{\mathrm{detected}}},$$
$$F_1 = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}},$$
$$\mathrm{DSC} = \frac{2 \times TP}{2 \times TP + FP + FN}.$$
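The detection and segmentation metrics above translate directly into code. The following minimal Python sketch (function names are ours) computes recall, precision, and F1 from detection counts, and the DSC from a pair of binary masks.

```python
import numpy as np

def detection_scores(n_tp: int, n_gt: int, n_detected: int):
    """Recall, precision, and F1 from detection counts, per the equations above."""
    recall = n_tp / n_gt
    precision = n_tp / n_detected
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, f1

def dice_score(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity coefficient (DSC) between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    return 2.0 * tp / (2.0 * tp + fp + fn)
```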