Automatic quantification of cone photoreceptors in adaptive optics scanning light ophthalmoscope images using multi-task learning

Kaiwen Li; Qi Yin; Ji Ren; Hongxin Song; Jie Zhang

doi:10.1364/BOE.471426

1. Introduction

Vision is arguably the most important sense, and a decrease or loss of vision will greatly affect the quality of life. The photoreceptor is a specific type of neuron that captures the visible light photons and initializes the visual phototransduction [1]. The two classic photoreceptor cells are cones and rods in mammalian retinas. Cone mainly contributes to the bright central and color vision which enables a person to see fine details, thus it is more important to daily life. However, many retina diseases can affect cones and cause progressive loss of cones, which affects hundreds of millions of people in the world [2]. Thus, precisely measuring and quantifying the cone metrics, such as density, spacing, size, etc., is critical to characterize the disease. However, regular imaging modalities, such as fundus camera, scanning light ophthalmoscope (SLO), and optical coherence tomography (OCT), cannot visualize the individual photoreceptors.

Adaptive optics (AO) can greatly enhance the resolution of these imaging modalities [3–14]. The state-of-the-art AO-SLO can even visualize the smallest photoreceptors: rods and foveal cones [15] in the living human eyes. Integrating AO-SLO into the ophthalmic imaging system enables observing the microstructure of the retina non-invasively at the cellular level, which potentially improves the capability of diagnosis, prognosis, and treatment of many fundus diseases. This technique is of great scientific and clinical significance [16]. In recent years, the relationship between density, spacing, and other morphological quantitative indicators of retinal photoreceptors with a variety of fundus diseases has also been studied and confirmed [3,15,17–23].

In order to obtain these important quantitative indicators, it is necessary to accurately detect and recognize each cone photoreceptor in the acquired AO-SLO image. Considering the high density of human photoreceptors, manual identification is time-consuming and error-prone. To confront these problems, some automatic detection methods have been proposed [24–34]. These methods employ techniques like HoughCircles, local intensity maxima detection, graph-theory, dynamic programming, and so on. Garrioch et al. [28] improved the method of Li et al. [25] and verified it on a large data set. The method of Li et al. [25] required manual setting of cutoff frequency of a finite-impulse-response low-pass filter and Garrioch et al. [28] objectively and automatically determined the filter based on the image itself (by first automatically estimating the modal cone frequency in the image being analyzed). However, a manual correction was still needed to supplement the missing cone photoreceptors. Chiu et al. [30] applied graph theory and dynamic programming (GTDP) to detect cone photoreceptors, and segment them later. With the rapid development of the deep learning technique, the convolutional neural network (CNN) has soon been applied to this field. Cunefare et al. [34] used CNN to detect cone photoreceptors, by predicting whether each patch center is a cone photoreceptor center.

According to Ref. [35], multi-task learning can obtain additional useful information by mining the relationship between tasks. By using these additional information, a better and more robust model can be trained. In most cases, the effect of multi-task learning is better than that of single task learning.

In this paper, we propose to develop a new cone quantification method by replacing the traditional two classifications (cone and background) with three classifications (cone, cone edge, and background) to train a deep learning model. The cone edge is derived from the segmentation results published in Chiu et al. [30] The purpose of adding a cone edge is to better handle the adhesion between cone photoreceptors. In addition, the segmentation result is also used to construct another weight image, which results in a regression task. The purpose of adding the regression task is to construct a multi-task learning model so that we can use the advantages of multi-task learning to promote the performance of the classification task. Thus, the multi-task learning technique is employed to output the classification and regression results at the same time and finally, a better result of cone detection is achieved.

2. Methods

In this section, we introduce our method in detail.

2.1 Dataset

Our model is trained and tested on the basis of dataset from the study of Garrioch et al. [28] and Chiu et al. [30]. The dataset contains 840 images from 21 subjects (20 healthy subjects and 1 subject with deuteranopia), with image size of 150 × 150 pixels, taken by AO-SLO system. Each subject was imaged at four locations (bottom left, bottom right, top left and top right) 0.65°from the center of fixation, and an average of 10 images were collected at each location. A region of interest with a size of 150 × 150 pixels was extracted from the center of each image for analysis, and each image was semiautomatically marked by experts to obtain the central position of cone photoreceptors.

2.2 Label

According to the segmentation results provided in Chiu et al. [30] study, the corresponding binary image of each cone photoreceptor was obtained as a two-classification label (cone and the background). To obtain the label for training our model, we then used the contour tracing algorithm in Matlab or OpenCV to extract the contour edge of each cone, so as to obtain a three-classification label image including background, cone edge, and cone. Figure 1 shows the processing method for one normal and two specific cases. The middle row of Fig. 1 shows two complete but relatively close cones in the image. It can be seen that their corresponding two-classification labels are mixed as a whole. In this case, as shown in the second row of Fig. 1, the two cones can be clearly separated by adding edges. As shown in the bottom row in Fig. 1, for these cones that are not in complete shape at the edge of the image, there is no need to add edge labels on the incomplete side of these cones. It should be noted that if one cone is relatively small, i.e. the number of pixels belonging to the cone is less than 4 after the cone edge is added, there is no need to add cone edge. Otherwise, the corresponding cone features could not be extracted for the small cone after one or two down-sampling in CNN.

Fig. 1. Example of three-classification. From left to right: three cone samples of the dataset, two-classification label from GTDP [30], and three-classification label in our method. First col (from top to bottom): one regular cone, two complete but relatively close cones, and one incomplete cone at the edge of the image. Second col (from top to bottom): the two-classification label of the regular cone, the labels of two closely adjacent cones are mixed as a whole, and the label of the incomplete cone. Third col (from top to bottom): the three-classification label of the regular cone, labels of the two closely adjacent cones can be clearly separated by adding cone edge labels, and there is no need to add cone edge labels on the incomplete side of this incomplete cone.

Download Full Size | PDF

The main purpose of adding cone edge is to better handle the adhesion between cones. If two cones are closely adjacent, the two-classification labeling method mixes these two cones together, as shown in the middle row of Fig. 1. Consequently, in the predicted cone probability map, two adjacent cones would be mixed, thus leading to inaccurate cone counting result. Therefore, using cone edge to distinguish these adherent cones from the beginning is more favorable to the training of the network model, and helps to improve the accuracy of relevant indicators (such as: true positive rate, false positive rate, and Dice’s coefficient) in post-processing. As shown in Fig. 2(b), it can be seen that the adhesion between cones in the cone probability map corresponding to two-classification labels is obvious, and many individual cones cannot be clearly distinguished. In contrast, in the cone probability map corresponding to three-classification labels, each individual cone can be clearly identified in Fig. 2(c), which is more conducive to subsequent post-processing. We use one-hot coding to generate training labels for the classification task based on three-classification label images. At the same time, we use the three-classification label image to generate the corresponding weight label. For each three-classification image, we first generate a full-zero three-channel matrix of corresponding size, where each channel corresponds to a classification, and the first channel to the third channel correspond to three classifications: cone, edge and background. The matrix is assigned values according to the three-classification image. For example, if the classification corresponding to a point in the three-classification image is cone, the corresponding position of the first channel of the matrix is set to 1.

Fig. 2. Cone probability map corresponding to two-classification vs Cone probability map corresponding to three-classification. (a) Original image. (b) Cone probability map corresponding to two-classification. Many cones in the probability map are connected to each other, which are not easy to distinguish. (c) Cone probability map corresponding to three-classification. Each cone in the probability map can be easily identified.

Download Full Size | PDF

2.3 Image pre-processing and Patch extraction

For each image, a Gaussian filter with a small-size of 3 × 3 and the sigma of 1.5 is used to to perform a simple denoising. Then the filtered image is normalized with the mean and standard deviation of it. The normalizing process is done according to Eq. (1):

(1)$${I_{Normazlized}} = \frac{{{I_{Gauss}} - {I_{Mean}}}}{{{I_{std}}}}$$

Where I_Gauss is the image after Gaussian filtering, I_mean and I_std are the corresponding mean value and standard deviation.

To train our model, we used 560 images from 14 subjects as the training set, and 80 images from another 2 subjects as the validation set. The remaining 200 images from 5 subjects (1 subject with deuteranopia) were used as the testing set. We sampled the images by partly overlapped sampling. And the sliding window of size 64 × 64 pixels was used to sample with a stride of 16 along both horizontal and vertical directions.

2.4 CNN learning model

The multi-task learning in this paper adopts the hard sharing mechanism, that is, all tasks share the feature extraction part, and each task retains a separate output layer. This approach reduces the risk of over-fitting. By learning multiple tasks at the same time, the overall model learns the common representation of these multiple tasks, which makes the risk of over-fitting of each original task smaller.

The network model adopted in this paper is shown in Fig. 3, which contains a feature extraction module and a multi-task learning module. The main network architecture of this feature extraction module is similar to the architecture in Ref. [36] and modified to better handle the task of dense small target segmentation. The main structure consists of two branches. Both branches are composed of two UNets. In Ref. [36], it is mainly composed of convolution module, max-pooling layer and up-sampling layer. Each convolution module has two convolution layers, and each convolution layer is followed by a ReLU layer. There are two types of UNets, the main structures of them are similar to the normal UNet. However, the first convolution module of the first UNet encoder is not connected to the max-pooling layer, but an up-sampling layer. And the last convolution module of the decoder is the corresponding max-pooling layer. The second convolution module of the second UNet encoder is not connected to the max-pooling layer, but an up-sampling layer. And the penultimate convolution module of the decoder is the corresponding max-pooling layer. This is to better extract the features of different size of images. In addition, there is no feature fusion between the first convolution module and the last convolution module of each UNet. Each of their branches consists of two UNets, but the features between the two UNets are not fused. The first UNet of each branch is used for rough segmentation, and the second UNet is used for precise segmentation. Finally, the output of the two branches is averaged as the final result.

Fig. 3. Network architecture of our method. (a) Feature extraction module. It consists of two branches, and each branch consists of two UNets. (b) Multi-task learning module, which consists of a classification task and a regression task. The number in each box indicates the number of channels.

Download Full Size | PDF

In contrast to Ref. [36], we also fused the features of the first convolution module and the last convolution module of each UNet, and added a dense connection structure between the two UNets of each branch to fully fuse these features. Each branch finally outputs 32 feature maps, and the outputs of the two branches are concatenated and fused as the input of the multi-task learning module.

The multi-task learning module includes the classification task of three classifications and the regression task of weight prediction. Loss functions used in this paper are Categorical Cross-Entropy (CCE) and Mean Square Error (MSE).

The batch size is set to 32, the learning rate is set to 0.001, and Adam [37] is used as the optimizer. We trained the model for 100 epochs and preserved model parameters with the best performance. The final loss is calculated as shown in Eq. (2).

(2)$$Loss = CCE(outpu{t_{classification}},G{T_{classification}}) + MSE(outpu{t_{regression}},G{T_{regression}})$$

2.5 Test

The test samples are also obtained by overlapping sampling. Before sampling, the boundary should be expanded to better process cones at the edge of test images. It should be noted that the use of a large stride would result in an obvious boundary effect. Although reducing the stride alleviates this problem, it would lead to an increase in test samples and processing time. In this paper, a weighted map is proposed to avoid the boundary effect effectively.

A fixed weighted map is constructed according to the sampling size, in which the weight value of one pixel gets larger as it moves closer to the center, and the corresponding weight value of the center point is set to 1. The fixed weighted image is built by generating a two-dimensional Gaussian filter kernel with a size of 64 × 64 pixels and a sigma of 15. As shown in Fig. 4, taking the cone probability map as an example, these prediction results corresponding to each sample are dot multiplied with the fixed weighted map. Then we prepare two full-zero matrices consistent with the size of this test image. According to the position of test samples in the original image, dot multiplication results are added to the corresponding position of one of the matrices. The fixed weighted map is also added to the corresponding position of another matrix. Finally, the former matrix is used to divide the latter matrix point by point to get the final cone probability map.

Fig. 4. Schematic flow of the test and the cone probability map. (a) Schematic flow of the test: The cone probability output corresponding to each sample is dot multiplied with the fixed weight map, and placed in the corresponding position in full-zero matrixs respectively. Finally, the dot division operation is performed between the two matrixs. (b) The cone probability map. Left: The cone probability map without using the fixed weight map; Right: The cone probability map with using the fixed weight map.

Download Full Size | PDF

2.6 Post-processing

The post-processing mainly aims to process the cone probability map and obtain the cone centers. Following the idea of non-maximum suppression in Ref. [38], a 9 × 9 mean filter was first performed on the cone probability map to get a mean filter map, and then a sliding window with a size S of 9 × 9 was used to detect valid cones in this mean filter map. If the center point of a mean filter map block corresponding to the current sliding window has the maximum value of this block, and its value is greater than the chosen threshold 0.15, then this point is considered the center point of a cone. Then size S of the mean filter and the sliding window was reduced by 2, and the above operations were repeated to detect cone center points. It should be noted that if point A is considered as a cone center, but there is already a cone center B nearby with a distance between A and B shorter than S+2, it means that A is actually not a cone center. In fact, there is already a cone center B corresponding to a large cone near A, and A is a part of B.

The above operations were repeated until the final sliding window size S became 3 × 3, because the smallest cone size in these test images has defaulted to 3 × 3. Figure 5 shows the original image, the cone probability map, the cone center points obtained by the above post-processing, and detected cone centers marked in green on the original image respectively.

Fig. 5. Detection of cones in AO-SLO image. (a): Original image. (b): Probability map generated from (a) using the proposed method. (c): Detected cones according to (b). (d): Detected cones marked in green on the image.

Download Full Size | PDF

3. Results

We conducted comparative experiments on multiple dimensions to verify the performance of our method. Taking the semi-automatic label published in [28] and supplemented by experts as the gold standard, we compared our method to the methods in [30], [34] on the test datasets.

3.1 Cone identification performance

If an automatically marked cone was located within a certain Euclidean distance of a marked cone of the gold standard, it was considered to be a true positive, otherwise, it was considered to be a false positive. We set the Euclidean distance value to 1.75 µm according to Ref. [30]. In order to evaluate the performance of cone identification, we calculated the true positive rate, false positive rate, and Dice’s coefficient as:

(3)$$\left\{ {\begin{array}{{c}} {True\textrm{ }positive\textrm{ }rate\textrm{ } = \textrm{ }{N_{TP}}/{N_{Gold}}}\\ {False\textrm{ }positive\textrm{ }rate\textrm{ } = \textrm{ }{N_{FP}}/({N_{TP}} + {N_{FP}})}\\ {Dice^{\prime}s\textrm{ }coefficient\textrm{ } = \textrm{ }2\ast {N_{TP}}/({N_{TP}} + {N_{FP}} + {N_{Gold}})} \end{array}} \right.$$

Where N_TP is the number of true positives, N_FP is the number of false positives, N_Gold is the number of the gold standard. The performances in cone identification for these three methods are shown in Table 1. Standard deviations are also given in parenthesis.

Table 1. Cone identification performance of fully automatic methods in the test set

View Table | View all tables in this article

We could see that our method achieves the best results in all three evaluation indicators, which proved the effectiveness of our method. The standard deviation also showed that our method was more stable than the other two methods.

Figure 6 is an illustrative example of cone identification results, where the first column shows AO-SLO images, and the remaining three columns show results of three automatic methods with true positives in green, false positives in red, and false negatives in blue. Compared with the other two methods, the proposed method can recognize cones more accurately. Especially, in Fig. 6(d), the proposed method achieves a true positive rate of 100% and a false positive rate of 0%.

Fig. 6. Cone identification performance comparison between different methods (green: true positive; red: false positive; blue: false negative): First column: AO-SLO image of the cone mosaic in log scale. Second column: results with GTDP method. method. Third column: results with the C-CNN method. Final column: results with our method.

Download Full Size | PDF

To better understand the performance of our model, we further analyzed 372 cones that were judged to be false negative. As shown in Fig. 7, we found that these cones can be roughly divided into three situations: a) the cone whose distance from the nearest automatically identified cone exceeds 1.75 µm; b) the cones whose mean value under the sliding windows corresponding to sizes 3 is less than 0.15; c) the cone whose mean values under the sliding windows corresponding to sizes 3, 5, 7 and 9 are more than 0.15, but are not the max values under the corresponding sliding windows. The cones in b) and c) were not detected during post-processing. Figure 7(d) shows the number of cones corresponding to these three situations.

Fig. 7. Further analysis of 372 false negatives (marked by blue). (a): The cone whose distance from the nearest identified cone exceeds 1.75 µm. (b): The cone whose mean value under the sliding windows corresponding to sizes 3 is less than 0.15. (c): The cone whose mean values under the sliding windows corresponding to sizes 3,5,7 and 9 are more than 0.15, but are not the max values under the corresponding sliding windows. (d): The number of cones in three situations. The cones in (b) and (c) are not detected during post-processing.

Download Full Size | PDF

In addition, we also further analyzed the 326 cones that were judged to be false positive. As shown in Fig. 8, we found that these cones can also be roughly divided into three situations: a) the cone whose distance from the gold standard exceeds 1.75 µm; b) the cone that is identified by our method while missed in the gold standard; c) the cone that identified by our method mistakenly. Figure 8(d) shows the number of cones in three situations.

Fig. 8. Further analysis of 326 false positives (marked by red). (a): The cone whose distance from the gold standard exceeds 1.75 µm. (b): The cone that identified by our method while missed in the gold standard. (c): The cone that identified by our method mistakenly. (d): The number of cones in three situations.

Download Full Size | PDF

3.2 Reproducibility results

We evaluated the reproducibility of each method by comparing cone density (number of cones per square millimeter) and cone spacing (mean distance from each cone to its nearest neighbor). In addition, the two components (within-subject coefficient of variation (CV), and intra-class (intra-subject) correlation coefficient (ICC)) described in [30] were also used to compare the reproducibility of each method. The lower the CV, the better the performance of the method. The closer the ICC is to 1, the better the performance of the method.

Table 2 shows the reproducibility results of each method. Table 3 provides the average cone density and cone spacing for 5 subjects in the test set. The average cone density ICC of our method was 0.9871, which indicates that 98.71% of the overall variability was related to the variability between subjects, and only 1.29% was related to our method. The average cone density within-subject CV of our method was 0.0089, which indicates that the error in reproducing the same measurement for the same subject was within 0.89% of the mean. In terms of density and spacing, our results were generally closer to the gold standard.

Table 2. Reproducibility Comparison of Cone Density and Spacing Measurements

View Table | View all tables in this article

Table 3. Intersubject variability cone density and spacing measurements

View Table | View all tables in this article

3.3 Bland-Altman plots

Bland-Altman plots were used to compare cone density measurements between the automatic methods and the gold standard. The results are shown in Fig. 9. The middle solid line shows the average difference, and the upper and lower dotted lines show the 95% confidence limits of agreement. Figure 9 shows that the average difference in our method is closer to 0, and the range of the 95% confidence interval in our method is the smallest, which indicate that our method was more stable and accurate than the other two methods.

Fig. 9. Bland-Altman plots comparing cone density for (a): GTDP [30], (b): C-CNN [34], and (c): Our method.

Download Full Size | PDF

3.4 Ablation Experiment

We proved the effectiveness of our method by comparing multi-task learning, single-task learning with three classifications and single-task learning with two classifications. True positive rate, false positive rate, and Dice’ s coefficient were used for comparison. In addition, we directly applied the pre-processing part and post-processing part to the original image to verify whether the deep learning method was indeed effective. Table 4 shows the results of the ablation experiment. Standard deviations are shown in parenthesis. As shown in Table 4, the multi-task learning method achieves the best results.

Table 4. Cone identification performance of the ablation experiment

View Table | View all tables in this article

Figure 10 shows the classification loss curve of multi-task learning and single-task learning with three classifications. According to the declining line of loss, multi-task learning converges faster and better than single-task learning, which indicates that multi-task joint training could promote the training of every single task.

Fig. 10. Classification loss curve of training set and validation set.

Download Full Size | PDF

3.5 Cone segmentation result

We further processed the cone probability map to realize cone segmentation. First, the cone probability map was turned into a binary map, The threshold of binarization was set to 0.15. Then the contour of each cone was detected by the algorithm (findContours) of OpenCV. Then the minimum circumscribed oblique rectangle of each cone contour was obtained by the algorithm (minAreaRect) of OpenCV and the elliptical boundary of each cone was drawn. Three samples of the segmentation results were shown in Fig. 11. It can be seen that the segmentation results well encircle the cones in all three images, which indicates that our method can be applied to segment the cones for future further analysis, such as calculating the size of cones.

Fig. 11. The segmentation results of our method. First row: Three original images. Second row: Segmentation results are marked in green circles on the original images.

Download Full Size | PDF

4. Conclusion

In this paper, we developed a fully automatic CNN-based method for detecting foveal center cone photoreceptors in AO-SLO images. We extended the segmentation results provided by Chiu et al. [30] and added cone edges in classification to better handle the adhesion between cones. As a result, each cone could be clearly distinguished in the cone probability map, which was conducive to the post-processing part, hence improving the accuracy of cone recognition.

We trained and validated our network using a public data set consisting of over 200,000 cones. Compared with the other two fully automatic methods, our method achieved the best performance in multiple aspects. We achieved 99.20% true positive rate, 0.71% false positive rate, and 99.24% Dice’s coefficient on the test set consisting of 44,634 cones. Our method identified 326 cones that were judged to be false positive, some of which appeared to have obvious cone shapes, but were not identified by the gold standard method. 372 cones were judged as false negative, some of which did not appear to exist in the image. Although these gold standards were annotated and supplemented by experts, considering the subjective factors of individuals, these errors could be acceptable. Table 2 also shows that our method generated more reproducible cone density measurements than other automated methods. The ablation experiments in Table 4 proved that the performance of three-classification was better than that of two-classification. We used the weight label map to design a multi-task learning model including a classification task and a regression task. We improved the learning ability of every single task and reduced the risk of over-fitting in every single task through information interaction between multi-tasks. At the same time, we further processed the cone probability map to realize cone segmentation.

It should be noted that there is room for improvement in our proposed method, the proportion of two task loss values in the total loss can be obtained through adaptive technology, which can avoid a lot of manual parameter adjustment. In addition, better pre-processing techniques can be used to enhance the characteristics of cones that were judged to be false negative.

Based on deep learning, our method can be easily extended to other imaging modalities or imaging conditions with different features. Several retinal diseases lead to alterations of the cone mosaic (e.g. age-related macular degeneration, achromatopsia, retinitis pigmentosa, and Usher syndrome), and quantitative analysis of these mosaics captured with AO-SLO is potentially useful for the characterization, early diagnosis, and prognosis of these diseases. In the future, we will get more pathological data and work further to translate the method proposed here to better suit a clinical environment.

Acknowledgments

We would like to thank R. Garrioch, C. Langlo, A. M. Dubis, R. F. Cooper, A. Dubra, and J. Carroll study [28], including image acquisition, the repeatability study design, and providing the gold standard for cone identification. We would also like to thank S. J. Chiu, Y. Lokhnygina, A. M. Dubis, A. Dubra, J. Carroll, J. A. Izatt, and S. Farsiu for their work on GTDP [30]. The authors would like to thank Dr. Pengfei Zhang for his valuable discussion.

Disclosures

Kaiwen Li: Robotrak Technologies Co., Ltd. (P), Qi Yin: Robotrak Technologies Co., Ltd. (P), Ji Ren: Robotrak Technologies Co., Ltd. (P), Jie Zhang: Robotrak Technologies Co., Ltd. (I)(P); University of Rochester (P).

Data availability

Data underlying the results presented in this paper are available in [30, 39].

References

1. E. N. Pugh and W. H. Cobbs, “Visual transduction in vertebrate rods and cones: a tale of two transmitters, calcium and cyclic GMP,” Vision. Res. 26(10), 1613–1643 (1986). [CrossRef]

2. A. F. Wright, C. F. Chakarova, M. M. Abd El-Aziz, and S. S. Bhattacharya, “Photoreceptor degeneration: genetic and mechanistic dissection of a complex trait,” Nat Rev Genet 11(4), 273–284 (2010). [CrossRef]

3. A. Roorda and D. R. Williams, “The arrangement of the three cone classes in the living human eye,” Nature 397(6719), 520–522 (1999). [CrossRef]

4. A. Roorda, F. Romero-Borja, W. Donnelly Iii, H. Queener, T. Hebert, and M. Campbell, “Adaptive optics scanning laser ophthalmoscopy,” Opt. Express 10(9), 405–412 (2002). [CrossRef]

5. R. J. Zawadzki, S. M. Jones, S. S. Olivier, M. Zhao, B. A. Bower, J. A. Izatt, S. Choi, S. Laut, and J. S. Werner, “Adaptive-optics optical coherence tomography for high-resolution and high-speed 3D retinal in vivo imaging,” Opt. Express 13(21), 8532–8546 (2005). [CrossRef]

6. D. Merino, C. Dainty, A. Bradu, and A. G. Podoleanu, “Adaptive optics enhanced simultaneous en-face optical coherence tomography and scanning laser ophthalmoscopy,” Opt. Express 14(8), 3345–3353 (2006). [CrossRef]

7. C. Torti, B. Povazay, B. Hofer, A. Unterhuber, J. Carroll, P. K. Ahnelt, and W. Drexler, “Adaptive optics optical coherence tomography at 120,000 depth scans/s for non-invasive cellular phenotyping of the living human retina,” Opt. Express 17(22), 19382–19400 (2009). [CrossRef]

8. R. D. Ferguson, Z. Zhang, D. X. Hammer, M. Mujat, A. H. Patel, C. Deng, W. Zou, and S. Burns, “Adaptive optics scanning laser ophthalmoscope with integrated wide-feld retinal imaging and tracking,” J. Opt. Soc. Am. 27(11), A265–A277 (2010). [CrossRef]

9. A. Dubra and Y. Sulai, “Refective afocal broadband adaptive optics scanning ophthalmoscope,” Biomed. Opt. Express 2(6), 1757–1768 (2011). [CrossRef]

10. R. S. Jonnal, O. P. Kocaoglu, Q. Wang, S. Lee, and D. T. Miller, “Phase-sensitive imaging of the outer retina using optical coherence tomography and adaptive optics,” Biomed. Opt. Express 3(1), 104–124 (2012). [CrossRef]

11. T. Y. P. Chui, D. A. VanNasdale, and S. A. Burns, “The use of forward scatter to improve retinal vascular imaging with an adaptive optics scanning laser ophthalmoscope,” Biomed. Opt. Express 3(10), 2537–2549 (2012). [CrossRef]

12. D. Scoles, Y. N. Sulai, C. S. Langlo, G. A. Fishman, C. A. Curcio, J. Carroll, and A. Dubra, “In vivo imaging of human cone photoreceptor inner segments,” Invest. Ophthalmol. Vis. Sci. 55(7), 4244–4251 (2014). [CrossRef]

13. J. Zhang, Q. Yang, K. Saito, K. Nozato, D. R. Williams, and E. A. Rossi, “An adaptive optics imaging system designed for clinical use,” Biomed. Opt. Express 6(6), 2120–2137 (2015). [CrossRef]

14. E. A. Rossi, C. E. Granger, R. Sharma, Q. Yang, K. Saito, C. Schwarz, S. Walters, K. Nozato, J. Zhang, T. Kawakami, W. Fischer, L. R. Latchney, J. J. Hunter, M. M. Chung, and D. R. Williams, “Imaging individual neurons in the retinal ganglion cell layer of the living eye,” Proc. Natl. Acad. Sci. U.S.A. 114(3), 586–591 (2017). [CrossRef]

15. A. Dubra, Y. Sulai, J. L. Norris, R. F. Cooper, A. M. Dubis, D. R. Williams, and J. Carrolll, “Noninvasive imaging of the human rod photoreceptor mosaic using a confocal adaptive optics scanning ophthalmoscope,” Biomed. Opt. Express 2(7), 1864–1876 (2011). [CrossRef]

16. L. Sawides, A. de Castro, and S. A. Burns, “The organization of the cone photoreceptor mosaic measured in the living human retina,” Vision Res. 132, 34–44 (2017). [CrossRef]

17. T. Y. Chui, H. Song, and S. A. Burns, “Adaptive-optics imaging of human cone photoreceptor distribution,” J. Opt. Soc. Am. 25(12), 3021–3029 (2008). [CrossRef]

18. T. Y. Chui, H. Song, and S. A. Burns, “Individual variations in human cone photoreceptor packing density: variations with refractive error,” Invest. Ophthalmol. Vis. Sci. 49(10), 4679–4687 (2008). [CrossRef]

19. K. Y. Li, P. Tiruveedhula, and A. Roorda, “Intersubject variability of foveal cone photoreceptor density in relation to eye length,” Invest. Ophthalmol. Vis. Sci. 51(12), 6858–6867 (2010). [CrossRef]

20. Y. Kitaguchi, K. Bessho, T. Yamaguchi, N. Nakazawa, T. Mihashi, and T. Fujikado, “In vivo measurements of cone photoreceptor spacing in myopic eyes from images obtained by an adaptive optics fundus camera,” Jpn. J. Ophthalmol. 51(6), 456–461 (2007). [CrossRef]

21. D. Merino, J. L. Duncan, P. Tiruveedhula, and A. Roorda, “Observation of cone and rod photoreceptors in normal subjects and patients using a new generation adaptive optics scanning laser ophthalmoscope,” Biomed. Opt. Express 2(8), 2189–2201 (2011). [CrossRef]

22. A. Roorda and D. R. Williams, “Optical fiber properties of individual human cones,” J. Vis. 2(5), 4–412 (2002). [CrossRef]

23. M. Pircher, R. J. Zawadzki, J. W. Evans, J. S. Werner, and C. K. Hitzenberger, “Simultaneous imaging of human cone mosaic with adaptive optics enhanced scanning laser ophthalmoscopy and high-speed transversal scanning optical coherence tomography,” Opt. Lett. 33(1), 22–24 (2008). [CrossRef]

24. D. M. Bukowska, A. L. Chew, E. Huynh, I. Kashani, S. Wan, P. Wan, and F. K. Chen, “Semi-automated identifcation of cones in the human retina using circle hough transform,” Biomed. Opt. Express 6(12), 4676–4693 (2015). [CrossRef]

25. K. Y. Li and A. Roorda, “Automated identifcation of cone photoreceptors in adaptive optics retinal images,” J. Opt. Soc. Am. 24(5), 1358–1363 (2007). [CrossRef]

26. B. Xue, S. S. Choi, N. Doble, and J. S. Werner, “Photoreceptor counting and montaging of en-face retinal images from an adaptive optics fundus camera,” J. Opt. Soc. Am. 24(5), 1364–1372 (2007). [CrossRef]

27. D. H. Wojtas, B. Wu, P. K. Ahnelt, P. J. Bones, and R. P. Millane, “Automated analysis of diferential interference contrast microscopy images of the foveal cone mosaic,” J. Opt. Soc. Am. 25(5), 1181–1189 (2008). [CrossRef]

28. R. Garrioch, C. Langlo, A. M. Dubis, R. F. Cooper, A. Dubra, and J. Carroll, “Repeatability of in vivo parafoveal cone density and spacing measurements,” Optom. Vis. Sci. 89(5), 632–643 (2012). [CrossRef]

29. A. Turpin, P. Morrow, B. Scotney, R. Anderson, and C. Wolsley, “Automated identifcation of photoreceptor cones using multi-scale modelling and normalized cross-correlation,” Proc. ICIAP 6978, 494–503 (2011). [CrossRef]

30. S. J. Chiu, Y. Lokhnygina, A. M. Dubis, A. Dubra, J. Carroll, J. A. Izatt, and S. Farsiu, “Automatic cone photoreceptor segmentation using graph theory and dynamic programming,” Biomed. Opt. Express 4(6), 924–937 (2013). [CrossRef]

31. F. Mohammad, R. Ansari, J. Wanek, and M. Shahidi, “Frequency-based local content adaptive fltering algorithm for automated photoreceptor cell density quantifcation,” Proc. ICIP, 2325–2328 (2012).

32. R. F. Cooper, C. S. Langlo, A. Dubra, and J. Carroll, “Automatic detection of modal spacing (Yellott’s ring) in adaptive optics scanning light ophthalmoscope images,” Ophthalmic Physiol. Opt. 33(4), 540–549 (2013). [CrossRef]

33. D. Cunefare, R. F. Cooper, B. Higgins, D. F. Katz, A. Dubra, J. Carroll, and S. Farsiu, “Automatic detection of cone photoreceptors in split detector adaptive optics scanning light ophthalmoscope images,” Biomed. Opt. Express 7(5), 2036–2050 (2016). [CrossRef]

34. D. Cunefare, L. Fang, R. F. Cooper, A. Dubra, J. Carroll, and S. Farsiu, “Open source software for automatic detection of cone photoreceptors in adaptive optics ophthalmoscopy using convolutional neural networks,” Sci Rep. 7(1), 6620 (2017). [CrossRef]

35. S. Ruder, “An overview of multi-task learning in deep neural networks,” arXiv:1706.05098v1 (2017).

36. Y. Wu, Y. Xia, Y. Song, Y. Zhang, and W. Cai, “Multiscale network followed network model for retinal vessel segmentation,” Proc. MICCAI 11071, 119–126 (2018). [CrossRef]

37. D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 (2014).

38. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: unified, real-time object detection,” Proc. CVPR779–788 (2016).

39. S. J. Chiu, Y. Lokhnygina, A. M. Dubis, A. Dubra, J. Carroll, J. A. Izatt, and S. Farsiu, “Automatic cone photoreceptor segmentation using graph theory and dynamic programming,” Duke, 2013, http://www.duke.edu/~sf59/Chiu_BOE_2013_dataset.htm.

Method	True positive rate	False positive rate	Dice’s coefficient
GTDP [30]	0.9894(0.0011)	0.0162(0.0013)	0.9865(0.0011)
C-CNN [34]	0.9846(0.0011)	0.0134(0.0012)	0.9856(0.0010)
Our method	0.9920(0.0008)	0.0071(0.0007)	0.9924(0.0006)

		Cone Density			Cone Spacing
Method	Fixation Location	Mean(cones/mm²)	ICC	Within-subject CV	Mean (µm)	ICC	Within-subject CV
GTDP [30]	Bottom left	71,947	0.9870	0.0114	3.401	0.9752	0.0085
	Bottom right	71,722	0.9870	0.0108	3.377	0.9545	0.0113
	Top left	76,013	0.9553	0.0137	3.202	0.9600	0.0117
	Top right	77,111	0.9765	0.0151	3.196	0.9709	0.0123
	Average	74,198	0.9765	0.0128	3.294	0.9651	0.0110
C-CNN [34]	Bottom left	71,531	0.9961	0.0064	3.495	0.9861	0.0060
	Bottom right	71,167	0.9920	0.0088	3.477	0.9771	0.0066
	Top left	75,544	0.9546	0.0119	3.323	0.9554	0.0094
	Top right	76,284	0.9908	0.0095	3.299	0.9886	0.0065
	Average	73,632	0.9834	0.0092	3.399	0.9768	0.0071
Our method	Bottom left	71,570	0.9946	0.0075	3.459	0.9788	0.0075
	Bottom right	71,326	0.9918	0.0088	3.430	0.9555	0.0104
	Top left	75,431	0.9726	0.0093	3.291	0.9477	0.0125
	Top right	76,469	0.9896	0.0099	3.261	0.9636	0.0144
	Average	73,699	0.9871	0.0089	3.360	0.9614	0.0112
Gold standard	Bottom left	71,736	0.9918	0.0092	3.420	0.9630	0.0088
	Bottom right	71,187	0.9917	0.0087	3.423	0.9698	0.0082
	Top left	75,683	0.9659	0.0124	3.244	0.9612	0.0102
	Top right	76,496	0.9908	0.0094	3.236	0.9821	0.0088
	Average	73,776	0.9851	0.0099	3.331	0.9690	0.0090

	Our method		Gold standard
Subject	Cone Density (cones/mm²)	Cone Spacing (µm)	Cone Density (cones/mm²)	Cone Spacing (µm)
1	66,802	3.54	66,876	3.49
2	73,917	3.42	73,802	3.40
3	74,116	3.46	74,074	3.41
4	76,041	3.16	75,818	3.17
5	77,620	3.22	78,306	3.18
Average	73,699	3.36	73,776	3.33

Method	True positive rate	False positive rate	Dice’s coefficient
Pre-processing and Post-processing	0.8494(0.0073)	0.0004(0.0001)	0.9149(0.0044)
Single-task learning with two classifications	0.9891(0.0011)	0.0093(0.0008)	0.9898(0.0009)
Single-task learning with three classifications	0.9912(0.0009)	0.0074(0.0007)	0.9918(0.0007)
Multi-task learning with three classifications	0.9920(0.0008)	0.0071(0.0007)	0.9924(0.0006)

Method	True positive rate	False positive rate	Dice’s coefficient
GTDP [30]	0.9894(0.0011)	0.0162(0.0013)	0.9865(0.0011)
C-CNN [34]	0.9846(0.0011)	0.0134(0.0012)	0.9856(0.0010)
Our method	0.9920(0.0008)	0.0071(0.0007)	0.9924(0.0006)

Automatic quantification of cone photoreceptors in adaptive optics scanning light ophthalmoscope images using multi-task learning

Abstract

1. Introduction

2. Methods

2.1 Dataset

2.2 Label

2.3 Image pre-processing and Patch extraction

2.4 CNN learning model

2.5 Test

2.6 Post-processing

3. Results

3.1 Cone identification performance

3.2 Reproducibility results

3.3 Bland-Altman plots

3.4 Ablation Experiment

3.5 Cone segmentation result

4. Conclusion

Acknowledgments

Disclosures

Data availability

References

Data availability

Cited By

Figures (11)

Tables (4)

Equations (3)

Biomedical Optics Express