Deep learning-based classification of the anterior chamber angle in glaucoma gonioscopy

Abstract

In the proposed network, features are first extracted from the gonioscopically obtained anterior segment photographs using a densely connected high-resolution network, and the useful information is then strengthened by a hybrid attention module to improve the classification accuracy; we term the resulting model the hybrid attention based high-resolution network (HahrNet). Between October 30, 2020, and January 30, 2021, a total of 146 participants underwent glaucoma screening. One thousand seven hundred eighty original images of the anterior chamber angle (ACA) were obtained with a gonioscope and slit lamp microscope. After data augmentation, 4457 images are used for the training and validation of the HahrNet, and 497 images are used to evaluate our algorithm. Experimental results demonstrate that the proposed HahrNet achieves 96.2% accuracy, 99.0% specificity, 96.4% sensitivity and an area under the curve (AUC) of 0.996 in classifying the ACA test dataset. Compared with several deep learning-based classification methods and nine human readers of different levels, the HahrNet achieves better or competitive performance in terms of accuracy, specificity and sensitivity. The proposed ACA classification method will provide an automatic and accurate technique for the grading of glaucoma.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Glaucoma affects about 79.6 million people worldwide and is the leading cause of irreversible blindness [1]. It most commonly results from a failure to maintain the balance between the amount of aqueous humor produced and the amount drained away, leading to increased intraocular pressure and optic neuropathy [2–4]. Most aqueous humor drains at the anterior chamber angle (ACA) through the trabecular meshwork, Schlemm’s canal, collector channels, and the aqueous and episcleral veins [4]. The main mechanism of increased intraocular pressure in glaucoma is an impaired outflow facility due to an abnormal ACA drainage system that is either open (open angle glaucoma) or closed (angle closure glaucoma) [5]. The examination of the ACA is the basis for dividing glaucoma into open angle and closed angle forms, which have distinct management protocols and outcomes [6]. Incorrect ACA assessment may lead to misdiagnosis, inappropriate treatment and even serious adverse consequences for the patient. Accurate assessment of the ACA is therefore a top priority in the diagnosis and treatment of glaucoma.

Gonioscopy is recognized as the gold standard for assessing the ACA and is mandatory for the diagnosis and management of glaucoma [6–8]. Alexios Trantas was the first to use it to observe the ACA of a living human eye, and he coined the term [9]. The gonioscope, a thick contact lens incorporating an angled mirror, refracts or reflects the rays coming from the ACA at the interface between the lens and the air. Using slit-lamp illumination and magnification, the structures in the ACA, such as the cornea, Schwalbe’s line (SL), trabecular meshwork (TM), scleral spur (SS) and ciliary body band (CBB), are clearly visible (Fig. 1). Because gonioscopy allows a wide panoramic view of the ACA, it remains an irreplaceable diagnostic tool in everyday ophthalmic practice, even with the newly available imaging methods.

Fig. 1. Gonioscopic image of the anterior chamber angle. SL: Schwalbe’s line; TM: Trabecular meshwork; SS: Scleral spur; CBB: Ciliary body band.

Using gonioscopy, the ophthalmologist can determine whether the ACA is open or closed [7]. It should be emphasized that gonioscopy is the standard for assessing narrow angles, and glaucoma experts agree that every suspicious patient should undergo it [6,7]. Patients with narrow angles can have different outcomes, ranging from an angle closure attack to anatomically narrow angles without evidence of glaucoma. Angle closure glaucoma is responsible for half of glaucoma-related blindness worldwide [1]. Detailed grading of the ACA is highly important for predicting the possibility of angle closure in patients with narrow angles. Scheie developed a grading system in which Roman numerals describe the ACA grade [10]; a larger numeral indicates a narrower angle. This system allows gonioscopic findings to be recorded quantitatively for predicting the risk of angle closure and for communication [8], and it is currently the most widely accepted grading method in gonioscopy. The Scheie grading of the ACA is shown in Table 1. However, as gonioscopy is highly subjective and technique-dependent, grading may vary between examiners [11]. It requires the expertise of a highly skilled examiner [12]. Its relatively infrequent use among non-glaucoma specialists adds to the misinterpretation of grading results [13]. It has been confirmed that without an accurate estimate of the ACA grade, 10% of angle closure glaucoma may be mistakenly regarded as open angle glaucoma [13]. Moreover, 1.5% of patients referred for cataract surgery are found to have undetected narrow angles or angle closure and suffer irreparable damage [14]. An objective and accurate evaluation method is needed to provide a reliable basis for the diagnosis and treatment of glaucoma and to reduce the rate of blindness.

Table 1. Scheie chamber angle grading system

Artificial intelligence (AI) has become a hot research field in recent years, mainly owing to the rapid growth of computing power and the explosion of big data. Deep learning (DL), one of the emerging fields of AI, has become very popular in medical diagnosis because it can improve the speed and accuracy of diagnosis and reduce the burden on doctors. Representative DL models include deep Boltzmann machines [14], deep belief networks [15], stacked auto-encoders [16], recurrent neural networks [17,18], generative adversarial networks [19,20] and convolutional neural networks (CNN) [21,22]. Among these models, the CNN has found wide application in the early detection of different eye diseases in intelligent medicine, and it can therefore be adopted as a supplementary tool for ophthalmologists to validate their decisions. For example, Maji et al. [23] have proposed an attention based CNN for the automatic classification of retinal blood vessels in fundus images, which provides better performance than state-of-the-art machine learning methods. Raghavendra et al. [24] have put forward an 18-layer CNN to detect glaucoma in fundus images; it effectively extracts robust features that are conducive to glaucoma detection. Muhammad et al. [25] have developed a hybrid DL method to distinguish healthy suspect eyes from glaucomatous ones based on OCT scans, in which a CNN extracts the features and a random forest provides the final prediction. Son et al. [26] have used different DL algorithms to screen for multiple abnormal findings in retinal fundus images; the results show that learning based algorithms offer reliable performance, opening a possibility for reducing the workload of clinical doctors. Fu et al. [27] have proposed a multi-label deep network to segment the optic cup and optic disc in fundus images; it achieves state-of-the-art segmentation performance on the Online Retinal Fundus Image Database for Glaucoma Analysis (ORIGA) [28] and provides a significant improvement in diagnostic performance over traditional machine learning approaches. DL has also been applied to the diagnosis of glaucoma based on qualitative and quantitative analysis of the ACA in AS-OCT images, reaching a sensitivity of 0.90 ± 0.02, a specificity of 0.92 ± 0.008 and an AUC of 0.96 for the diagnosis of angle closure [29]. Fu et al. [30] have presented a multilevel CNN based method for angle-closure detection that learns a discriminative representation from anterior segment optical coherence tomography (AS-OCT) images. These results are encouraging, as they demonstrate the ability of DL to perform an automatic and objective diagnosis of eye images with accuracy similar to that of an experienced and highly trained ophthalmologist.

However, to the best of our knowledge, there are few reports on the automatic classification of the ACA from anterior segment images obtained by gonioscopy. The only report is from Chiang et al. [31], in which a simple CNN architecture is proposed to classify open and closed angle images. That work addresses only a binary classification problem, with no further work on the detailed classification of the closed angle images. Moreover, the CNN adopted in that work includes down-sampling operations that cause the loss of thin, elongated structures such as the SL, TM, SS and CBB in the gonioscopic images. To address these issues, we have developed an automatic, objective and accurate computer-aided diagnosis (CAD) approach based on a high-resolution network for the classification of the ACA. The proposed method can effectively extract the features of the above fine structures in gonioscope ACA images by combining a distinctive high-resolution network with spatial and channel attention, thereby facilitating accurate multi-class classification of the ACA. The effectiveness and superiority of the proposed method have been demonstrated on clinical gonioscope ACA images.

2. Materials and methods

2.1 Image dataset

We include a broad population to ensure the diversity of the ACA, comprising patients diagnosed with glaucoma, glaucoma suspects and normal volunteers. The clinical datasets have been collected from the outpatient clinics and wards of Tongji Hospital, Tongji Medical College of Huazhong University of Science and Technology between Oct 30, 2020 and Jan 30, 2021. The inclusion criteria are age (18 years or older) and willingness and ability to participate. The exclusion criteria include previous eye surgery; active ocular infection; corneal edema, degeneration or anterior segment opacity precluding a clear view of the ACA; and possible allergy to the topical anesthetic (proparacaine hydrochloride ophthalmic solution 0.5%, Nanjing Ruinian Best Pharmaceutical Co. Ltd., China) or levofloxacin (levofloxacin hydrochloride eye gel, 0.6 mL, Hubei Guangji Pharmaceutical Co. Ltd., China) used during the examination. This study has been conducted in compliance with the Declaration of Helsinki. Ethics approval (code TJ-IRB20101024) was obtained from Tongji Hospital, Tongji Medical College of Huazhong University of Science and Technology.

Indirect gonioscopy has been performed by a single examiner using a four-mirror gonioscopy lens (G-4 Gonio, Volk Optical Inc., Mentor, OH, USA) and a slit lamp microscope (Digital Camera Slit Lamp Microscope LS-6, Chongqing Sunkingdom Medical Equipment Co. Ltd, Chongqing, China) at high magnification (×16) with the eyes in the primary gaze position [32]. A 1-mm light beam is reduced to a very narrow slit and adjusted to the photographed quadrant [32]. The ambient illumination, measured with a digital photometer, is less than 1 lux. Accidental indentation and light falling on the pupil are avoided during the examination. The ACA is photographed separately at the nasal, temporal, superior, inferior, superior temporal, superior nasal, inferior temporal and inferior nasal positions. The ACA grade is classified with the Scheie grading system [10], which has been widely used in the glaucoma clinic of Tongji Hospital.

A total of 146 participants have been included. The mean age is 50.7 ± 15.1 years, and 58.2% of the participants are female. Gonioscopy is performed on both eyes of eligible participants, and multiple gonioscopic images are collected at several angles for each eye. We exclude images that are too blurry for the doctors to judge the ACA grade, as well as images in which the regions of interest are missing due to the patient's physiological motion. For each participant, there are certain differences among the images acquired at different angles, and these images may correspond to different grades. A total of 1780 original gonioscope ACA images are obtained, including 922 images of grade 0, 200 of grade I, 200 of grade II, 200 of grade III and 258 of grade IV. It should be noted that we have not standardized the image contrast, brightness or orientation. The reference labels have been provided by two professors, with 11 and 19 years of clinical practice and 7 and 15 years of experience in glaucoma diagnosis, who were not otherwise involved in this assessment.

2.2 HahrNet

In the classification task, effective image feature extraction plays a highly important role. The structures of the ACA in the gonioscope image are long and thin. Existing DL methods cannot extract the complete features of such structures from the gonioscope image because they generally reconstruct high-resolution features from low-resolution ones using a series of subnets. To address this problem, we propose a hybrid attention based high-resolution convolutional network (HahrNet). The proposed HahrNet is adapted from HrNet [33]: it retains high-resolution feature maps while gradually introducing low-resolution ones, and it combines feature maps of different resolutions at the same depth as well as feature maps of the same resolution at different depths. This improves the representation ability of both the high-resolution and low-resolution feature maps, so that information between different resolutions can be exchanged and aggregated better. Furthermore, the important information is filtered and strengthened by a hybrid attention module.

The proposed HahrNet based classification scheme is shown in Fig. 2. In this scheme, three feature maps with different resolutions are taken as an example. The feature maps marked in blue, orange and green represent the high-resolution, medium-resolution and low-resolution feature maps, respectively. More feature maps of different scales can be introduced according to the needs of the classification task. The detailed structure of the HahrNet is introduced below.

Fig. 2. The HahrNet based classification scheme.

In the first stage of the HahrNet, the original input image is first passed through a 3×3 convolution kernel with a stride of 2 to extract high-resolution feature maps; the number of convolution kernels is 64. The number of feature channels is then increased to 256 through a convolution module composed of four residual units, each of which is a bottleneck with the same structure as in ResNet-50. Following the bottleneck module, high-resolution and medium-resolution feature maps with 32 and 64 channels are obtained through the transition layer, which contains two 3×3 convolutions (Conv). Likewise, after the feature maps are processed by the same convolution module and the corresponding transition layer, more feature maps at different resolution scales can be obtained. Information exchange between feature maps of different scales is performed through the information exchange unit: the feature maps are down-sampled using 3×3 convolution kernels and up-sampled using 3×3 deconvolution (Deconv) kernels with a stride of 2 to realize feature fusion. In addition to this information exchange, the low-level and high-level features of the same resolution are connected through skip connections to form dense connections. Since the number of low-level feature maps produced by the bottleneck module is 256, we align the channel number of the low-level features with that of the high-level ones by convolution and integrate them through the skip connection. For the generated multi-resolution feature maps, the low-resolution feature maps are up-sampled by stepwise deconvolution and summed until they are integrated with the highest-resolution feature maps. The resulting multi-channel feature maps are obtained through three 3×3 Conv layers with 64, 128 and 256 channels.
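To make the first stage concrete, the following is a minimal sketch in Python (tf.keras is used for convenience; the original model was implemented in standalone Keras 2.2.4) of the stem, the four-bottleneck convolution module and the transition layer described above. Details not stated in the text, such as the use of bias terms and the exact placement of batch normalization, are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def conv_bn_relu(x, filters, stride=1):
    # 3x3 convolution followed by batch normalization and ReLU
    x = layers.Conv2D(filters, 3, strides=stride, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def bottleneck(x, out_channels=256):
    # ResNet-50 style bottleneck: 1x1 reduce, 3x3, 1x1 expand, plus a skip path
    shortcut = x
    if x.shape[-1] != out_channels:
        shortcut = layers.Conv2D(out_channels, 1, use_bias=False)(shortcut)
        shortcut = layers.BatchNormalization()(shortcut)
    y = layers.Conv2D(out_channels // 4, 1, use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(out_channels // 4, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(out_channels, 1, use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    return layers.ReLU()(layers.Add()([y, shortcut]))

inputs = tf.keras.Input(shape=(256, 256, 3))
x = conv_bn_relu(inputs, 64, stride=2)      # stem: 3x3 conv, stride 2, 64 kernels
for _ in range(4):                          # four bottleneck residual units -> 256 channels
    x = bottleneck(x, 256)
high = conv_bn_relu(x, 32)                  # transition: high-resolution branch, 32 channels
medium = conv_bn_relu(x, 64, stride=2)      # transition: medium-resolution branch, 64 channels
```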

The features extracted by the HahrNet are processed by several Conv, batch normalization (BN) and rectified linear unit (ReLU) layers. The fusion of feature maps at different scales inevitably produces a certain amount of redundant information. To highlight the key information and filter out the useless information, we use hybrid attention, consisting of channel attention and spatial attention, to improve the feature representation ability. Here, the Efficient Channel Attention (ECA) module [34] is used as the channel attention (CA) because it ensures the validity and efficiency of the model by achieving appropriate cross-channel interaction while avoiding dimensionality reduction. Spatial attention (SA), as a supplement to channel attention, focuses on the spatial relationship between features and is used to filter out useless information. The hybrid attention module works as follows. In the CA module, a global average pooling operation is applied to each channel of the input features to extract global information, and appropriate cross-channel interaction is then achieved through one-dimensional convolution. In the SA module, which follows the CA module, average pooling (AvgPool) and max pooling (MaxPool) operations are first applied along the channel axis to the features processed by ECA to quickly capture the global context, thereby generating two 2D maps. The two maps are then concatenated to form a two-channel feature map. Finally, a convolutional layer is applied to this feature descriptor to generate the spatial attention map. Following the hybrid attention module, the standard classification step is implemented by adding a global average pooling (GAP) layer, a fully connected (FC) layer and a softmax classifier.
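The hybrid attention module can be sketched as follows; the ECA kernel size (fixed at 3 here, adaptively chosen in the original ECA-Net) and the 7×7 spatial convolution are assumptions, since the text does not state them.

```python
import tensorflow as tf
from tensorflow.keras import layers

def eca(x, k=3):
    # Efficient Channel Attention: global average pooling per channel, then a
    # 1D convolution across channels (cross-channel interaction without
    # dimensionality reduction), then sigmoid gating of the input channels.
    c = x.shape[-1]
    w = layers.GlobalAveragePooling2D()(x)          # (B, C)
    w = layers.Reshape((c, 1))(w)                   # channels as a 1D sequence
    w = layers.Conv1D(1, k, padding="same", use_bias=False)(w)
    w = layers.Activation("sigmoid")(w)
    w = layers.Reshape((1, 1, c))(w)
    return layers.Multiply()([x, w])

def spatial_attention(x):
    # Spatial attention: average pooling and max pooling along the channel
    # axis give two 2D maps, which are concatenated and convolved into a
    # single-channel spatial mask applied to the features.
    avg = tf.reduce_mean(x, axis=-1, keepdims=True)
    mx = tf.reduce_max(x, axis=-1, keepdims=True)
    m = layers.Concatenate()([avg, mx])             # two-channel feature map
    m = layers.Conv2D(1, 7, padding="same", activation="sigmoid")(m)
    return layers.Multiply()([x, m])

def hybrid_attention(x):
    # CA module followed by SA module, as described above
    return spatial_attention(eca(x))
```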

2.3 Training of HahrNet

2.3.1 Data augmentation

Since the number of images for grade 0 is much larger than that for the other grades, we employ commonly used data augmentation methods, namely rotations of 90° and 180° and flipping, on the images of the other grades to address the class imbalance in the dataset. The dataset is split at the patient level: 497 images from 43 randomly selected participants form the test dataset, while 86 and 17 participants are assigned to the training and validation sets, respectively, and data augmentation is performed only on the training and validation images. After augmentation, the number of images in each of grades I to III is 1000, and the number in grade IV is 1032. Table 2 shows the number of images in each grade before and after augmentation.
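A minimal sketch of this offline augmentation is given below; the flip direction is an assumption, as the text only states that flipping is used.

```python
import numpy as np

def augment(image):
    # Offline augmentation used to balance grades I-IV against grade 0:
    # 90 and 180 degree rotations plus a flip (horizontal assumed here).
    return [
        np.rot90(image, k=1),   # rotate 90 degrees
        np.rot90(image, k=2),   # rotate 180 degrees
        np.fliplr(image),       # horizontal flip
    ]
```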

Tables Icon

Table 2. Baseline characteristics.

2.3.2 Training and validation

To train the HahrNet effectively, the gonioscope ACA images are resized to 256×256, and 3960 images and 497 images are randomly selected as the training set and validation set, respectively. The HahrNet is implemented in Keras 2.2.4. It is trained with the cross-entropy loss function on an Ubuntu 16.04 computer with a Core i7-6950X CPU and 96 GB RAM, where an NVIDIA RTX 2080 Ti GPU with CUDA 10.1 is used for acceleration. The training batch size is 32. The initial learning rate is set to 0.0001 and is reduced to 0.1 times its current value if the validation loss does not decrease for 10 consecutive epochs. Training is stopped when the validation loss does not decrease for 30 consecutive epochs. The Adam optimizer is adopted to minimize the loss function.
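This training configuration can be reproduced with standard Keras callbacks, as sketched below; `model`, `x_train`, `y_train`, `x_val` and `y_val` are placeholders, and the labels are assumed to be one-hot encoded.

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

# cross-entropy loss, Adam optimizer, initial learning rate 0.0001
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    # multiply the learning rate by 0.1 if the validation loss does not
    # decrease for 10 consecutive epochs
    ReduceLROnPlateau(monitor="val_loss", factor=0.1, patience=10),
    # stop training after 30 consecutive epochs without improvement
    EarlyStopping(monitor="val_loss", patience=30),
]

model.fit(x_train, y_train, batch_size=32,
          validation_data=(x_val, y_val),
          epochs=1000,            # upper bound only; early stopping ends training
          callbacks=callbacks)
```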

To ensure the reliability of the HahrNet, it is trained five times with randomly initialized parameters. The average training process is shown in Fig. 3, where the accuracy is defined in Section 3.1. The training accuracy and loss curves remain smooth during training. The accuracy and loss curves of the validation set fluctuate considerably in the early stage but gradually stabilize later. In general, the HahrNet converges quickly and shows no overfitting during training.

Fig. 3. Loss and accuracy curve of HahrNet on the training and validation datasets.

3. Experimental results

3.1 Evaluation metrics

To evaluate the classification performance of the HahrNet, we use the 497 gonioscope ACA images as the test dataset and the reference labels as the ground truth. It should be emphasized that the test images come from the original, unaugmented dataset. The accuracy, specificity, sensitivity, receiver operating characteristic (ROC) curve and area under the curve (AUC) are used as evaluation metrics. The accuracy (ACC) measures the proportion of correctly classified samples among all test samples:

$$ACC = \frac{TP + TN}{N_{test}} \in [0,1]$$
where $N_{test}$ is the total number of test images, and $TP$ and $TN$ denote the numbers of true positive and true negative cases, respectively.

The specificity (SPE) measures the ability to recognize negative samples and is defined as the proportion of negative samples that are correctly classified. The sensitivity (SEN) measures the ability to recognize positive samples and is defined as the proportion of positive samples that are correctly classified. As this paper addresses a multi-class problem, the averages of these metrics over the five grades are calculated to assess the classification performance:

$$\overline{SPE} = \frac{1}{5}\sum_{i=1}^{5}\frac{TN_i}{TN_i + FP_i} \in [0,1]$$
$$\overline{SEN} = \frac{1}{5}\sum_{i=1}^{5}\frac{TP_i}{TP_i + FN_i} \in [0,1]$$
where $TP_i$, $TN_i$, $FP_i$ and $FN_i$ denote the numbers of true positive, true negative, false positive and false negative cases, respectively, when the $i$-th class is regarded as positive and the remaining classes as negative.

The ROC curve is produced by calculating the false positive rate (1 − specificity) and the true positive rate (sensitivity) at different predicted probability thresholds; it comprehensively reflects the trade-off between sensitivity and specificity. The AUC is the quantitative estimate of the area under the ROC curve and is mainly used to measure the generalization and classification performance of the model.

All statistical analyses based on the above metrics are performed using Python (version 3.5) packages and SPSS (version 18.0).
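As an illustration, the macro-averaged metrics defined above can be computed from integer grade labels as sketched below; `y_true`, `y_pred` and the predicted probabilities `y_prob` are placeholders, and every class is assumed to appear in the test set.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def macro_metrics(y_true, y_pred, n_classes=5):
    # One-vs-rest sensitivity and specificity per class, macro-averaged,
    # following the definitions of ACC, mean SPE and mean SEN above.
    sens, spec = [], []
    for i in range(n_classes):
        tp = np.sum((y_pred == i) & (y_true == i))
        fn = np.sum((y_pred != i) & (y_true == i))
        tn = np.sum((y_pred != i) & (y_true != i))
        fp = np.sum((y_pred == i) & (y_true != i))
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
    acc = np.mean(y_pred == y_true)
    return acc, np.mean(sens), np.mean(spec)

# multi-class AUC from predicted probabilities (one-vs-rest, macro-averaged):
# auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
```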

3.2 Ablation study

3.2.1 Comparison of the number of network stages

The proposed HahrNet involves some key parameters, the most important of which is the number Ns of network stages. To analyze the influence of Ns on the classification results, Ns is set to 2, 3 and 4. Values of Ns > 4 are not considered because too many stages lead to a heavy computational load. The corresponding quantitative results for the various values of Ns are shown in Table 3. Table 3 shows that the classification performance of HahrNet with Ns = 2 is slightly better than that of HrNet, and the performance improves significantly as Ns increases. In this paper, we choose Ns = 4 to achieve a trade-off between classification performance and computational complexity.

Table 3. Metrics values of HahrNet using different network stages.

3.2.2 Comparison of the network structure

To verify the effectiveness of the structure of HahrNet, several variants of the network are evaluated. The compared models are derived from HrNet [33], with four network stages and 32, 64, 128 and 256 channels from the high-resolution to the low-resolution features, respectively. The first compared model is HrNet_Deconv, where transposed-convolution based up-sampling is used instead of the nearest neighbor interpolation in HrNet. In addition, the hybrid attention module and the dense connection are each combined with HrNet_Deconv; the resulting models are called HrNet_HA and HrNet_DC, respectively. HrNet_HA is further divided into HrNet_HA_EX, HrNet_HA_CL and HrNet_HA_All: in HrNet_HA_EX the hybrid attention module is connected after the information exchange unit; in HrNet_HA_CL it is placed after the final integration stage of the different-resolution feature maps; and in HrNet_HA_All it is placed both in the integration stage and after the information exchange unit.

The quantitative results of the above schemes are shown in Table 4. HrNet_Deconv provides a significant improvement in all metrics over HrNet, because its learnable up-sampling offers better adaptability in the selection of up-sampling parameters than the fixed interpolation in HrNet. Compared with HrNet_Deconv, the HrNet_DC, HrNet_HA_EX and HrNet_HA_CL models provide higher metric values, which demonstrates the effectiveness of the dense connection and the hybrid attention. Accordingly, the proposed HahrNet achieves the best classification performance in terms of ACC, $\overline {SEN}$ and $\overline {SPE}$ among all evaluated models because it combines the effective up-sampling method with the dense connection and hybrid attention modules.

Table 4. Metrics values of HahrNet and its variants.

3.2.3 Comparison of the feature fusion methods of HahrNet

The feature fusion method has an important influence on the classification performance of the proposed HahrNet. After the feature maps of different scales are extracted from the original input image, there are two schemes for feature fusion. The first scheme is called HahrNet_Up, where the low-resolution feature map is sequentially deconvolved and up-sampled to the next higher resolution for integration, until the final fusion feature map is produced by a summation operation. The second scheme, called HahrNet_Down, works in the opposite direction.
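A sketch of the bottom-up fusion used by HahrNet_Up is shown below; it assumes that consecutive resolutions differ by a factor of two and that the feature maps are channel-aligned before summation.

```python
from tensorflow.keras import layers

def fuse_up(feature_maps):
    # HahrNet_Up-style fusion: starting from the lowest-resolution map,
    # repeatedly apply a stride-2 transposed convolution (deconvolution)
    # and add the result to the next higher-resolution map.
    maps = sorted(feature_maps, key=lambda t: t.shape[1])  # low -> high resolution
    fused = maps[0]
    for higher in maps[1:]:
        fused = layers.Conv2DTranspose(higher.shape[-1], 3, strides=2,
                                       padding="same")(fused)
        fused = layers.Add()([fused, higher])
    return fused
```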

The quantitative results of the above two models are shown in Table 5. The HahrNet_Down and HahrNet_Up perform better than the HrNet in most metrics. In particular, the HahrNet_Up achieves the best classification performance among the three models. The reason can be explained as follows. The key structures in the gonioscope ACA image are thin and elongated. When the low-resolution feature map is superimposed onto the high-resolution feature map through deconvolution, the HahrNet_Up preserves the integrity of the feature information of these thin, elongated structures by retaining the high-resolution features while incorporating the low-resolution ones.

Table 5. Metrics values of HahrNet using different feature fusion strategies.

3.3 Comparison of classification results

3.3.1 Comparison between the HahrNet and other DL models

To verify the superiority of the HahrNet, DL models including Xception [35], ResNet [36], MobileNet [37], DenseNet [38], NasNet [39] and HrNet are used for comparison. The average training curve of each model on the validation set is shown in Fig. 4. All models fluctuate to some extent in the early training stage; in the later stage, the HahrNet converges faster than the other networks. Overall, the HahrNet provides the highest classification accuracy among all models.

Fig. 4. Accuracy curves for different DL models on the ACA validation dataset.

Table 6 shows the quantitative metrics of the seven evaluated models. The HahrNet outperforms the other six models in most metrics. Although ResNet and DenseNet provide relatively competitive classification results, their accuracies are 1.01% and 1.61% lower than that of the HahrNet, respectively. The superiority of the HahrNet results from two strategies. Firstly, the HahrNet has excellent feature representation ability because the multi-resolution feature maps interact through cross-connections, while feature maps of the same scale at different depths are combined by dense connections. Secondly, the introduction of spatial and channel attention helps to filter out the redundant information in the feature maps and strengthen the useful feature information.

Table 6. Classification performance of different networks on ACA test dataset.

3.3.2 Comparison between the HahrNet and human readers

To compare the classification performance of the HahrNet with that of human readers, we provide the same clinical test dataset to nine human readers from the ophthalmology department of Tongji Hospital. The readers are divided into three groups according to their clinical experience and professional background: three postgraduate students with two years of clinical practice, three attending physicians with more than three years of experience in ophthalmology, and three experts with five years of experience in ophthalmology and at least two years of training in gonioscopy. The readers independently read the gonioscope ACA images, classify each image according to the Scheie grading system and record their answers on a sheet. The performance of the human readers is evaluated by computing the accuracy, specificity and sensitivity of their predictions against the reference labels. The classification results of the HahrNet and the human readers are shown in Table 7, and the results for the individual readers are provided in the supplementary materials. The accuracy, specificity and sensitivity of the HahrNet reach 96.18%, 99.04% and 95.94%, respectively. From postgraduates to experts, the classification performance gradually improves, with expert 1 achieving the best performance among all doctors. It is worth noting that the HahrNet achieves better results than the postgraduates and attending physicians on all metrics. Compared with the experts, the HahrNet also provides quite competitive results, even better than those of expert 1. Figures 5 and 6 compare the ROC of the HahrNet with that of each human reader and each panel of human readers on the test gonioscopic images, respectively. The HahrNet achieves an AUC of 0.9957, which is higher than those of all human readers except expert 2 and expert 3.

Fig. 5. ROC of HahrNet and each human reader on the test gonioscopic images.

Fig. 6. ROC of HahrNet and each panel of human readers on the test gonioscopic images.

Table 7. Comparison of classification performance between the HahrNet and nine human readers.

Table 8 compares the classification performance of the HahrNet with the average performance of the human readers. Table 8 shows that the average sensitivity, specificity and accuracy of the human readers increase with clinical experience, because the long-term, repeated training of highly experienced ophthalmologists helps to improve and maintain classification performance. A paired-samples t-test is performed between pairs of readers with similar experience in glaucoma diagnosis. The AUC values of the postgraduates, attending physicians and experts are 0.7871 (95% confidence interval (CI), 0.751-0.824), 0.9215 (95% CI, 0.894-0.948) and 0.9769 (95% CI, 0.965-0.989), respectively. Interobserver agreement is excellent, with intraclass correlation coefficients (ICC) of 0.831 (95% CI, 0.804-0.855, p < 0.001) for the postgraduates, 0.971 (95% CI, 0.966-0.975, p < 0.001) for the attending physicians, and 0.992 (95% CI, 0.991-0.994, p < 0.001) for the experts.

Table 8. Comparison of classification performance between the HahrNet and the human readers (average).
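For reference, reader-versus-reader comparisons and AUC confidence intervals can be obtained as sketched below. The bootstrap procedure is one common way to estimate such CIs and is an assumption here, as the paper does not specify its method; `scores_reader_a` and `scores_reader_b` are hypothetical per-image grade predictions.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def auc_ci(y_true, y_prob, n_boot=2000, alpha=0.05):
    # Bootstrap confidence interval for a macro one-vs-rest AUC.
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if len(np.unique(y_true[idx])) < len(np.unique(y_true)):
            continue  # a resample must contain every grade
        aucs.append(roc_auc_score(y_true[idx], y_prob[idx],
                                  multi_class="ovr", average="macro"))
    return np.quantile(aucs, [alpha / 2, 1 - alpha / 2])

# paired-samples t-test between two readers with similar experience:
# t, p = ttest_rel(scores_reader_a, scores_reader_b)
```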

The confusion matrix of the HahrNet is shown in Fig. 7, where a deeper color indicates higher accuracy. Figure 7 shows that the HahrNet achieves an accuracy of 100% in classifying grade I, grade II and grade IV. It provides an accuracy of 98% in distinguishing grade III, but a relatively unsatisfactory accuracy of 81.7% in classifying grade 0. The reason is that accurately classifying ACA images of grade 0 is relatively difficult, since the tissue structure visible in the normal ACA is rich and complex. In addition, in the Scheie chamber angle grading system an ACA with a visible CBB is classified as grade 0 while an ACA with a narrow CBB is classified as grade I; this kind of non-quantitative criterion may also cause grading deviations.
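A row-normalized confusion matrix such as Fig. 7 can be produced as sketched below; `y_true` and `y_pred` are placeholder label arrays.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

def plot_confusion(y_true, y_pred):
    # Row-normalized confusion matrix over the five Scheie grades, so each
    # diagonal entry is the per-grade classification accuracy.
    cm = confusion_matrix(y_true, y_pred, normalize="true")
    disp = ConfusionMatrixDisplay(cm, display_labels=["0", "I", "II", "III", "IV"])
    disp.plot(cmap="Blues")   # deeper color = higher accuracy, as in Fig. 7
    plt.show()
```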

Fig. 7. The confusion matrix of HahrNet on the test gonioscopic images.

4. Discussion

4.1 Advantages of the proposed method

Glaucoma is characterized by abnormalities in the ACA that result in impaired aqueous outflow. Despite the development of new ophthalmic equipment, gonioscopy is still considered the gold standard of ACA evaluation. However, because gonioscopy is highly subjective and experience-dependent, its results vary greatly among ophthalmologists. The final diagnosis depends on the subjective judgment of glaucoma professors, and there remains a certain probability of missed diagnosis and misdiagnosis. To address this issue, we have developed a DL algorithm for automatically classifying gonioscopic ACA images.

In this work, 4954 images augmented from 1780 original images of 146 participants are used to evaluate the performance of the HahrNet in the classification of gonioscopic images. Experimental results demonstrate that, across the five ACA grades (grade 0 to grade IV), the HahrNet provides the best classification accuracy for grades I, II and IV, an accuracy of 98% for grade III, and the lowest accuracy for grade 0. Compared with several popular DL models, the HahrNet provides higher sensitivity, accuracy and specificity in identifying all grades of the ACA. Compared with the postgraduates, attending physicians and experts, the HahrNet achieves better or competitive classification performance. Our research shows that the proposed DL method provides a fast, convenient and objective screening scheme for the computer-aided diagnosis of glaucoma without waiting for an assessment by senior specialists. Appropriate intervention with the aid of the fully automatic DL algorithm can greatly reduce the possibility of blindness and may have long-term positive effects.

Our study may help in the following respects: (1) Thanks to its strong ability to classify the ACA, our model demonstrates that the HahrNet is an effective way to overcome the subjectivity of gonioscopy. (2) When an ophthalmologist determines the ACA grade, the evaluation result provided by this model can serve as a reference. To a certain extent, it can greatly reduce the workload of glaucoma specialists and avoid misjudgments due to the inexperience of ophthalmologists. (3) The evaluation of ACA grading with the HahrNet is potentially valuable for screening populations at high risk of angle closure glaucoma and with poor access to eye care. In particular, the use of telemedicine and virtual ophthalmology has increased recently. Our model can help primary hospitals screen out suspicious patients who do not need advice from a glaucoma specialist, and help with the triage and referral of patients.

4.2 Limitations of the proposed method

There are a few limitations to this study. Firstly, as a DL-based image analysis method, our method still relies on the number and quality of gonioscopic images. In our research, the number of gonioscopic images is relatively limited because the number of participants is relatively small. Meanwhile, the physicians need to perform surface anesthesia and collect images at several angles for the eyes of each participant to ensure the reliability and quality of the gonioscopy images. We believe that reliable gonioscopy images will be captured more easily as imaging technology and the physicians' image acquisition methods continue to improve. In addition, recent research on optimizing DL models for low-quality images and small-sample datasets will further improve their robustness to images of different quality and reduce their dependence on image quality. Secondly, our study has a single-center design and all participants are Chinese. Although we have adopted standardized gonioscopy procedures and captured images as clearly as possible, we cannot rule out a negative impact on classification performance when the model is applied to images from other acquisition devices and examiners. Further studies are required to evaluate the usefulness of the HahrNet in different populations and its applicability to different devices. Thirdly, apart from the ACA grading, other image information such as pigmentation, angle neovascularization, and peripheral anterior synechiae is not considered. Additionally taking into account these glaucoma-related factors could further improve the performance of the model.

5. Conclusion

In this work, we have proposed a DL based method for the classification of clinical gonioscopic ACA images. Distinctively, the HahrNet can effectively extract the features of fine structures such as Schwalbe’s line, the trabecular meshwork, the scleral spur and the ciliary body band from the ACA images by combining a high-resolution network with a hybrid attention module, leading to its high classification performance for ACA grading. We have compared the results of the proposed algorithm with those of state-of-the-art DL methods and human readers on the clinical test dataset. Experimental results demonstrate that it provides better or competitive classification results for gonioscopic ACA images compared with the evaluated DL models and human readers. In summary, the proposed DL method can serve as an auxiliary tool for ophthalmologists with little experience to grade the ACA, thereby enabling early and appropriate treatment of glaucoma.

Funding

National Natural Science Foundation of China (81974133, 61871440).

Acknowledgments

We thank the participants and the doctors from Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. H. A. Quigley, “The number of people with glaucoma worldwide in 2010 and 2020,” Br. J. Ophthalmol. 90(3), 262–267 (2006). [CrossRef]  

2. P. J. Foster, “The definition and classification of glaucoma in prevalence surveys,” Br. J. Ophthalmol. 86(2), 238–242 (2002). [CrossRef]  

3. J. Dietze, K. Blair, and S. J. Havens, “Glaucoma,” in StatPearls [Internet] (StatPearls Publishing, 2020).

4. M. Goel, “Aqueous humor dynamics: A Review,” Open Ophthalmol. J. 4(1), 52–59 (2010). [CrossRef]  

5. R. N. Weinreb, T. Aung, and F. A. Medeiros, “The pathophysiology and treatment of glaucoma,” JAMA 311(18), 1901 (2014). [CrossRef]  

6. I. Riva, E. Micheletti, F. Oddone, C. Bruttini, S. Montescani, G. De Angelis, L. Rovati, R. N. Weinreb, and L. Quaranta, “Anterior chamber angle assessment techniques: A review,” J. Clin. Med. 9(12), 3814 (2020). [CrossRef]  

7. N. Porporato, M. Baskaran, R. Husain, and T. Aung, “Recent advances in anterior chamber angle imaging,” Eye 34(1), 51–59 (2020). [CrossRef]  

8. P. Singh, M. Tyagi, Y. Kumar, K. Kuldeep, and P. Das Sharma, “Gonioscopy: A review,” Open J. Ophthalmol. 03(04), 118–121 (2013). [CrossRef]  

9. W. L. M. Alward, “A history of gonioscopy,” Optom. Vis. Sci. 88(1), 29–35 (2011). [CrossRef]  

10. H. G. Scheie, “Width and pigmentation of the angle of the anterior chamber,” AMA Arch. Ophthalmol. 58(4), 510 (1957). [CrossRef]  

11. J. Phu, H. Wang, S. K. Khuu, B. Zangerl, M. P. Hennessy, K. Masselos, and M. Kalloniatis, “Anterior chamber angle evaluation using gonioscopy: Consistency and agreement between optometrists and ophthalmologists,” Optom. Vis. Sci. 96(10), 751–760 (2019). [CrossRef]  

12. R. Feng, S. M. H. Luk, C. H. K. Wu, L. Crawley, and I. Murdoch, “Perceptions of training in gonioscopy,” Eye 33(11), 1798–1802 (2019). [CrossRef]  

13. Y. F. Choong, N. Devarajan, A. Pickering, S. Pickering, and M. W. Austin, “Initial management of ocular hypertension and primary open-angle glaucoma: an evaluation of the royal college of ophthalmologists’ guidelines,” Eye 17(6), 685–690 (2003). [CrossRef]  

14. L. F. Polania and K. E. Barner, “Exploiting restricted Boltzmann machines and deep belief networks in compressed sensing,” IEEE Trans. Signal Process. 65(17), 4538–4550 (2017). [CrossRef]  

15. Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1798–1828 (2013). [CrossRef]  

16. G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science 313(5786), 504–507 (2006). [CrossRef]  

17. H. Zhang, Z. Wang, and D. Liu, “A comprehensive review of stability analysis of continuous-time recurrent neural networks,” IEEE Trans. Neural Networks Learn. Syst. 25(7), 1229–1262 (2014). [CrossRef]  

18. J. Kugelman, D. Alonso-Caneiro, S. A. Read, S. J. Vincent, and M. J. Collins, “Automatic segmentation of OCT retinal boundaries using recurrent neural networks and graph search,” Biomed. Opt. Express 9(11), 5759 (2018). [CrossRef]  

19. A. Refaee, C. J. Kelly, H. Moradi, and S. E. Salcudean, “Denoising of pre-beamformed photoacoustic data using generative adversarial networks,” Biomed. Opt. Express 12(10), 6184 (2021). [CrossRef]  

20. S. Kazeminia, C. Baur, A. Kuijper, B. van Ginneken, N. Navab, S. Albarqouni, and A. Mukhopadhyay, “GANs for medical image analysis,” Artif. Intell. Med. 109, 101938 (2020). [CrossRef]  

21. X. Yuan, L. Zhou, S. Yu, M. Li, X. Wang, and X. Zheng, “A multi-scale convolutional neural network with context for joint segmentation of optic disc and cup,” Artif. Intell. Med. 113, 102035 (2021). [CrossRef]  

22. J. J. Gómez-Valverde, A. Antón, G. Fatti, B. Liefers, A. Herranz, A. Santos, C. I. Sánchez, and M. J. Ledesma-Carbayo, “Automatic glaucoma classification using color fundus images based on convolutional neural networks and transfer learning,” Biomed. Opt. Express 10(2), 892 (2019). [CrossRef]  

23. D. Maji and A. A. Sekh, “Automatic grading of retinal blood vessel in deep retinal image diagnosis,” J. Med. Syst. 44(10), 180 (2020). [CrossRef]  

24. U. Raghavendra, H. Fujita, S. V. Bhandary, A. Gudigar, J. H. Tan, and U. R. Acharya, “Deep convolution neural network for accurate diagnosis of glaucoma using digital fundus images,” Inf. Sci. 441, 41–49 (2018). [CrossRef]  

25. H. Muhammad, T. J. Fuchs, N. De Cuir, C. G. De Moraes, D. M. Blumberg, J. M. Liebmann, R. Ritch, and D. C. Hood, “Hybrid deep learning on single wide-field optical coherence tomography scans accurately classifies glaucoma suspects,” J. Glaucoma 26(12), 1086–1094 (2017). [CrossRef]  

26. J. Son, J. Y. Shin, H. D. Kim, K.-H. Jung, K. H. Park, and S. J. Park, “Development and validation of deep learning models for screening multiple abnormal findings in retinal fundus images,” Ophthalmology 127(1), 85–94 (2020). [CrossRef]  

27. H. Fu, J. Cheng, Y. Xu, D. W. K. Wong, J. Liu, and X. Cao, “Joint optic disc and cup segmentation based on multi-label deep network and polar transformation,” IEEE Trans. Med. Imaging 37(7), 1597–1605 (2018). [CrossRef]  

28. Z. Zhang, F. Yin, J. Liu, W. K. Wong, N. M. Tan, B. H. Lee, J. Cheng, and T. Y. Wong, “ORIGA-light: An online retinal fundus image database for glaucoma analysis and research,” in 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology (IEEE, 2010), pp. 3065–3068.

29. H. Fu, M. Baskaran, Y. Xu, S. Lin, D. W. K. Wong, J. Liu, T. A. Tun, M. Mahesh, S. A. Perera, and T. Aung, “A deep learning system for automated angle-closure detection in anterior segment optical coherence tomography images,” Am. J. Ophthalmol. 203, 37–45 (2019). [CrossRef]  

30. H. Fu, Y. Xu, S. Lin, D. W. K. Wong, M. Baskaran, M. Mahesh, T. Aung, and J. Liu, “Angle-closure detection in anterior segment OCT based on multilevel deep network,” IEEE Trans. Cybern. 50(7), 3358–3366 (2020). [CrossRef]  

31. M. Chiang, D. Guth, A. A. Pardeshi, J. Randhawa, A. Shen, M. Shan, J. Dredge, A. Nguyen, K. Gokoffski, B. J. Wong, B. Song, S. Lin, R. Varma, and B. Y. Xu, “Glaucoma expert-level detection of angle closure in goniophotographs with convolutional neural networks: The Chinese American Eye Study,” Am. J. Ophthalmol. 226, 100–107 (2021). [CrossRef]  

32. Y. Dai, S. Zhang, M. Shen, Y. Zhou, M. Wang, J. Ye, and D. Zhu, “Modeling of gonioscopic anterior chamber angle grades based on anterior segment optical coherence tomography,” Eye Vis. 7(1), 30 (2020). [CrossRef]  

33. K. Sun, B. Xiao, D. Liu, and J. Wang, “Deep high-resolution representation learning for human pose estimation,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2019), pp. 5686–5696.

34. Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, “ECA-Net: Efficient channel attention for deep convolutional neural networks,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2020), pp. 11531–11539.

35. F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2017), pp. 1800–1807.

36. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2016), pp. 770–778.

37. A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.

38. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2017), pp. 2261–2269.

39. B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 8697–8710.
