RMPPNet: residual multiple pyramid pooling network for subretinal fluid segmentation in SD-OCT images

Jian Yang; Zexuan Ji; Sijie Niu; Qiang Chen; Songtao Yuan; Songtao Yuan; Wen Fan

doi:10.1364/OSAC.387102

1. Introduction

Central serous chorioretinopathy (CSC), caused by the impaired pigment epithelial barrier [1–3], has a high palindromic rate and is easy to cause permanent damage to vision [4]. The neurosensory retinal detachment (NRD) is a prominent characteristic of CSC, which results in the accumulations of subretinal fluid at the posterior pole [5]. Conventional medical treatment procedures often require a lot of time and energy from doctors to recognize and diagnose the CSC. Therefore, it is necessary to develop intelligent methods to automatically segment the NRD areas in medical images [6].

Conventionally, color fundus photographs, fundus autofluorescence and optical coherence tomography imaging technologies have been widely used to diagnose the retinal lesions [7–9]. Comparing with color fundus photographs and fundus autofluorescence, the spectral-domain optical coherence tomography (SD-OCT) imaging has become an important imaging modality for the diagnosis and treatment of CSC. SD-OCT can provide three-dimensional images consisting of a series of continuous two-dimensional images [10,11]. Therefore, the SD-OCT images can fully explore the lesion characteristics, which is advantageous for diagnosis and treatment of retinopathy [12].

Factually, the retinal detachment is caused by the neurosensory retina separated from the underlying retinal pigment epithelium (RPE) [5]. Therefore, many unsupervised NRD segmentation approaches utilize the layer segmentation results to locate the NRD region, such as the thresholding-based algorithms [13], the level sets based methods [14,15] and the graph search model [16–19]. Wu et al. [20] used the fuzzy level-set method guided by the Enface fundus image to reduce the searching range and improve the segmentation efficiency. Besides, the supervised machine learning approaches are widely utilized in retinal images, including k-nearest neighbor [21], random forest [22], kernel regression [23], support vector machine [24] and deep learning model [25]. Moreover, semi-supervised or interactive segmentation methods are utilized to overcome the influence of low contrast and speckle noise in the SD-OCT image by incorporating prior and expert information. Zheng et al. [26] used computerized segmentation combining with minimal expert interaction for fast and accurate quantification of subretinal fluid. Fernández [27] applied the deformable model to outline the lesion boundaries based on the manual initialization. Wang et al. [28] introduced the soft higher-order constraint into Markov random fields and proposed a slice-wise label propagation algorithm to capture the temporal coherence among the slices. Based on the 3D characteristic of SD-OCT, Montuoro et al. [29] reported a 3D graph search method to simultaneously segment the retinal layers and subretinal fluid. Wu et al. [22] proposed a three-dimensional segmentation method by utilizing the continuous max flow optimization algorithm. Nevertheless, most methods rely on the performance of layer segmentation results.

In 2014, Long et al. [30] started a pathbreaking job by using the neural network for semantic segmentation. The conventional classification network is modified as a pixel-level classification network by using the transpose convolutions. Recently, semantic segmentation has already become a popular research direction in deep learning. In the semantic segmentation networks, the receptive field is important to explore local context and global information embedded in data. As an intuitive solution, the utilization of large convolution kernels [31] would significantly increase the cost of calculation and memory. Comparatively, the atrous convolution can obtain a larger receptive field with less computational cost [32,33]. In addition, the combination of $1 \times n$ and $n \times 1$ convolutions can also obtain a larger receptive field.

Another important issue is the various sizes of NRD lesions as shown in Fig. 1. Without specific procedures, conventional semantic segmentation networks would hardly obtain satisfactory results for small or large lesions. Image/Feature pyramid strategy are intuitive solutions by feeding different scales of images/feature into the networks [34–40]. Yu et al. [41] developed an extra convolution layer in cascade to gradually capture long-range context. PSPNet [42] performs pooling operations for different scales, which is termed as pyramid polling module (PPM). Analogous with Deeplabv3 [32], it utilizes ASPP to capture multi-scale features. Factually, the encoder-decoder structure [30,43,44,45] can exploit more multi-scale features. Roy et al. [46] built a ReLayNet model to segment the layer and the fluid areas. Gao et al. [47] proposed a double-branched and area-constraint fully convolutional networks (DBFCN) for subretinal fluid segmentation in SD-OCT images. Hu et al. [48] proposed stochastic ASPP for retinal edema segmentation, which could alleviate the overfitting problem and significantly reduce the validation error.

Fig. 1. SD-OCT retinal image with NRD.

Download Full Size | PDF

For the NRD segmentation in SD-OCT images, the receptive field and the multi-scale features are both noteworthy. First, as illustrated in Fig. 1, the characteristic of NRD lesion is similar with that of background areas. Therefore, the network requires a sufficiently wide receptive field to prevent the center of NRD lesion from being recognized as background. Second, due to the three-dimensional characterization of NRD lesions, the size of edema area would have a large difference between the ends of edema and the middle of edema. Hence, the multi-scale strategy is necessary to deal with the various sizes of NRD lesions.

In this paper, to better deal with the receptive field and the multi-scale features, we propose a Residual Multiple Pyramid Pooling Network (RMPPNet) for NRD segmentation in SD-OCT images. RMPPNet uses the encoder-decoder architecture. In the encoder stage, we use six stride convolutions to replace the conventional pooling layers. The residual architecture is used to obtain a wider receptive field and extract more detailed features. Three Pyramid Pooling modules (PPM) are supplemented into the encoder network to further explore the multi-scale features. In the decoder stage, we use multiple transpose convolutions to recover the resolution of feature maps, and concatenate the feature maps from the encoder for each transpose convolution layer. Finally, for better and faster training, we construct a new loss function to constrain the different sets between the true label and the prediction label. The proposed algorithm can product higher segmentation accuracy compared to the state-of-the-art methods.

2. Proposed network

The architecture of the proposed RMPPNet model is shown in Fig. 2, which is an encoder-decoder structure and shares a similar backbone with UNet [44]. First, RMPPNet has strong feature extraction capabilities to ensure more accurate prediction results, which can further reduce the over- and under-segmentations. Second, the utilization of striding convolutions can further enlarge the receptive field to explore both local and global information embedded in data. Third, the pyramid pooling modules can further explore the multi-scale features, which would make the model deal with the various sizes of NRD lesions. In addition, RMPPNet is relatively concise and lightweight.

Fig. 2. Structure of the Residual Multiple Pyramid Pooling Net (RMPPNet).

Download Full Size | PDF

2.1 Backbone net

As a pioneering work for semantic segmentation, FCN utilizes the backbone of VGGNet [49] to construct the encoder network. Based on the similar idea, many researchers use the backbone of classification networks, such as AlexNet [50], GoogLeNet [51], ResNet [52], and DenseNet [53], to build the segmentation or detection network. The networks, including Deeplab, PSPNet, and SwiftNet, prove that the backbone of classification networks can assure accurate performance in segmentation networks. Therefore, in the proposed network, the residual modules are used to build the backbone net due to its lower computational complexity and better generalization performance. As illustrated in Fig. 2, the encoder part contains six residual modules. Each module begins with a striding convolution layer for reducing the resolution of feature maps with less memory. Then, one pyramid pooling module (PPM) is stacked after two Residual modules. In the decoder stage, we use multiple transpose convolutions to recover the resolution of feature maps, and contact the feature maps from the encoder for each transpose convolution layer.

2.2 Multiple pyramid pooling modules

The pyramid pooling module in PSPNet and DeeplabV3 has been proved to be a simple and efficient module for extracting multi-scale features in the spatial field. However, both PSPNet and Deeplab may not produce satisfactory results because the PPM module is only processed one time at the end of encoder stage. Therefore, in the proposed network, the multiple pyramid pooling modules strategy is proposed by extracting multi-scale features for each scale of feature maps in the encoder stage as illustrated in Fig. 2. First, four (two for the last PPM) different scales of average down sampling features are used to get four different sizes of feature maps. Second, the $1 \times 1$ convolution operations are performed to obtain the corresponding feature maps whose number of channels is a quarter of the input feature maps. Third, four groups of feature maps are respectively recovered to the size of input feature maps by up sampling operations. Finally, the four groups of feature maps and the input feature maps are concatenated to get the output feature maps, where the number of channels is twice that of the input feature maps. Comparing with PSPNet and DeeplabV3, the combination of multiple PPMs can further explore the various sizes of NRD lesions.

2.3 Decoder stage

In the decoder stage, six transpose convolutions are used to restore the resolution of feature maps for preserving more contextual information. As shown in Fig. 2, from the encoder stage to the decoder stage, the blue connections to the last five transpose convolutions are supplemented to make sure more lower-level semantic information and multi-scale features be considered during deconvolution procedure. Instead of concatenating the feature maps of the encoder stage and decoder stage, the convolutions of feature maps are directly added up to save storing memory without significantly reducing performance.

2.4 Loss function

To train the neural network, the binary cross-entropy (BCE) function is a widely utilized loss function, which is defined as

(1)$${L_{BCE}} = \frac{1}{m}\mathop \sum \nolimits_1^m \left( { - \frac{1}{n}\mathop \sum \nolimits_1^n ({{y_{truth}}log{y_{pred}} - ({1 - {y_{truth}}} )\textrm{log}({1 - {y_{pred}}} )} )} \right)$$

where m is the number of classes ($m = 2$ in our paper, i.e., NRD and background), n is the number of points in each map, ${y_{truth}}$ and ${y_{pred}}$ are the truth value and the prediction value for each pixel, respectively. However, BCE would pay more attention to the background for SD-OCT images because the background area generally takes a larger part in most B-scans. Therefore, the DICE loss function is used to make sure the training procedures can pay more attention to the segmentation objects, which is defined as

(2)$${L_{DICE}} = \frac{1}{m}\mathop \sum \nolimits_1^m \left( {1 - \frac{{2{\ast }{G_{truth}} \cdot {G_{pred}} + \theta }}{{{G_{truth}} + {G_{pred}} + \theta }}} \right)$$

where ${G_{truth}}$ and ${G_{pred}}$ are the value of label map and prediction map, respectively. $\theta $ is a small positive value to prevent zero denominator. ${L_{DICE}}$ can make the model ignore the background class and focus on the foreground class. However, ${L_{DICE}}$ is generally hard to take the boundary details into consideration.

To solve the above problem, in this paper, we supplement a new loss function as follows:

(3)$${L_{DIF}} = \frac{1}{m}\mathop \sum \nolimits_1^m ({{G_{truth}} + {G_{pred}} - 2{\ast }{G_{truth}} \cdot {G_{pred}}} )$$

${L_{DIF}}$ pays more attention to the difference between label map and prediction map, which could further improve the prediction accuracy by considering more boundary details of the targets.

Consequently, the final loss function utilized in our model can be formulated as:

(4)$$L = {L_{BCE}} + {\lambda _1}{L_{DICE}} + {\lambda _2}{L_{DIF}}$$

where ${\lambda _1}$ and ${\lambda _2}$ are two balance parameters that are set as 1 for all the experiments.

3. Experiments

3.1 Datasets and evaluation criteria

In this paper, three different datasets were utilized to evaluate the proposed model, each of which contained a varying number of longitudinal SD-OCT cube scans acquired with a Cirrus OCT device (Carl Zeiss Meditec, Inc., Dublin, CA). All the scans cover a $6 \times 6 \times 2m{m^3}$ area centered on the fovea with volume dimension $1024 \times 512 \times 128$. Each cube contains 128 B-scans (2D images) with size $1024 \times 512$. This study was approved by the Institutional Review Board (IRB) of the First Affiliated Hospital of Nanjing Medical University with in-formed consent. The segmentation ground truths for each case were obtained based on the outlines of NRD regions that were manually drew by experts. To further reduce the computational burden, all the B-scans were cropped to $640 \times 512$ by removing the redundant background.

The first dataset contains 35 cubes from 23 patients, and all the cubes are diagnosed as CSC with only NRD lesions. Based on the first dataset, the second dataset supplement ten normal cubes without NRD lesions. Therefore, the second dataset contains 45 cubes from 27 patients. The last dataset includes 23 cubes from 12 eyes of 12 patients with NRD lesions, which were described and utilized in previous works [14,20–22,28,47,54,55]. In this dataset, each cube contains two ground truths drew by two independent experts. The five-fold cross-validation strategy was employed to verify the performances for each algorithm, and the patient-independent setting was guaranteed for all the experiments.

We employed three criterion to quantitatively evaluate the performances, including the true positive volume fraction (TPVF), dice similarity coefficient (DSC) and positive predicative value (PPV).

(5)$$TPVE = \frac{{{V_{truth}} \cdot {V_{pred}}}}{{{V_{truth}}}}$$

(6)$$PPV = \frac{{{V_{truth}} \cdot {V_{pred}}}}{{{V_{pred}}}}$$

(7)$$DSC = \frac{{2{\ast }{V_{truth}} \cdot {V_{pred}}}}{{{V_{truth}} + {V_{pred}}}}$$

where ${V_{truth}}$ and ${V_{pred}}$ indicate the volume of ground truth and prediction/segmentation, respectively. Moreover, a linear analysis and Bland-Altman approach were applied for statistical correlation and reproducibility analyses.

3.2 Analysis of the proposed model

The first experiment was carried on the first dataset to verify the effect of multiple PPMs and the new loss function. As illustrated in Fig. 2, the residual modules are used to build the backbone net due to its lower computational complexity and better generalization performance. The encoder part contains six residual modules. In our network, one PPM is stacked after two residual modules. The segmentation accuracies with different settings of PPMs are listed in Table 1. RMPPNet-0 indicates that the multi-scale feature maps are not considered. RMPPNet-1 utilizes a similar strategy in both PSPNet and DeeplabV3 models, i.e., the module for dealing with multi-scale features is only processed one time at the end of the encoder stage. Different combinations of multiple PPMs are represented from RMPPNet-2 to RMPPNet-6, where RMPPNet-3 is our final choice. Comparing RMPPNet-0 with the other settings, we can find that the utilization of multi-scale features is necessary, which can significantly improve the segmentation accuracy. Comparing RMPPNet-1 with the other multiple PPMs strategy, we can observe that the multiple PPMs is an effective strategy, which can further improve the extracting capabilities of multi-scale features to adapt various sizes of NRD. It should be noted that RMPPNet-6 can obtain better results than RMPPNet-3, but has a larger computational burden. However, the corresponding improvement is limit. Consequently, considering both computational complexity and segmentation accuracy, in this paper, we set RMPPNet-3 as our final network. Finally, we can observe that the proposed multiple pyramid pooling strategy can effectively improve the segmentation accuracy by at least 1.3%.

Table 1. The segmentation accuracies with different setting of PPMs.

View Table | View all tables in this article

As described in Sec. 2.4, the BCE loss pays more attention to the background, and the Dice loss may be difficult to further consider the boundary details of the targets. Therefore, a new loss function by supplementing a DIE term is proposed, which pays more attention to the difference between ground truths and predictions. The segmentation accuracies with different combinations of loss functions are listed in Table 2. We can observe that the proposed loss function, i.e., BEC + Dice + DIF, is an effective way to further improve the performances of the network.

Table 2. The segmentation accuracies with different combinations of loss functions.

View Table | View all tables in this article

3.3 Experimental comparison with CNNs

In the first comparison experiment, we compared the proposed network with state-of-the-art CNNs on the first dataset, including FCN [30], SegNet [43], UNet [44], FastFCN [40], PSPNet [42] and DeepLabV3+ [33]. For all the methods, we set the training epochs as 60. The learning rate was initially set as 1e-4, and halved after every 20 epochs. The batch-size was 8 for FCN and DeepLabV3+, and 4 for all the other methods. The BEC loss was utilized to optimize all the comparison CNNs. The five-fold cross-validation strategy was employed for each model. The corresponding statistical results (mean ± standard deviation) with respect to DSC are listed in Table 3.

Table 3. Statistical results (mean ± standard deviation) of DSC index with five-fold cross-validation on the first dataset.

View Table | View all tables in this article

As shown in Table 3, our model can obtain a higher segmentation accuracy than most CNNs by presenting higher mean values of DSC. Moreover, the lower standard deviation of the proposed model indicates that the proposed RMPPNet is more robust to all the cases in the first dataset. As we mentioned before, in both PSPNet and DeeplabV3 models, the module for dealing with multi-scale features is only processed one time at the end of the encoder stage. Comparatively, the multiple pyramid pooling modules strategy is proposed by extracting multi-scale features for different scales in the encoder stage to make sure the proposed network can adapt various sizes of NRD. Comparing with state-of-the-art methods, we can observe that the proposed method can effectively improve the segmentation accuracy.

It should be noted that the accuracies listed in Table 3 were calculated based on 3D segmentation results. To further demonstrate the superior performances of the proposed model on B-scan images, the detection results on four example cubes are shown in Fig. 3. In Fig. 3, each bar indicates the detection results on the 3D cubes with 128 B-scans, in which the green and blue colors respectively indicate the correct detections for the B-scans with/without NRD, and the red color shows the false detections. The red color in each bar indicates the capability of distinguishing the NRD lesions from the normal region. As shown in Fig. 3, the proposed model can obtain more accurate prediction results with less false detections. Furthermore, Fig. 4 shows the segmentation results on the selected B-scans from the cubes in Fig. 3, where the green line is the boundary of ground truth, and the red line is the boundary of the predictions obtained by each method. Comparatively, the proposed the proposed model can produce smooth and accurate segmentation results that are highly consistent with the ground truths. Specifically, the B-scans selected from the third cube do not contain any NRD lesions. All the comparison models would product missegmentations on the regions with low intensity, which indicates that these models may hard to distinguish the NRD lesions from the normal regions with low intensity. The predictions of the proposed model can better deal with this problem. Consequently, the proposed model can obtain more smooth and accurate segmentation results with less over- or under- segmentations.

Fig. 3. Visual comparison of the detection results on four example cubes in the first dataset. Each bar indicates the detection results on the 3D cubes with 128 B-scans, in which the green and blue colors respectively indicate the correct detections for the B-scans with/without NRD, and the red color shows the false detections. For each subfigure, the bars from top to bottom shows the detection results obtained by FCN, SegNet, UNet, FastFCN, PSPNet, DeepLabV3+ and our model, respectively.

Download Full Size | PDF

Fig. 4. The segmentation results on the selected B-scans from the cubes in Fig. 3. The green line is the boundary of the label ground truth, the red line is the boundary of the predictions for each method, and the yellow line is the overlapping part of the red line and the green line. In each row, the images show the segmentation results obtained by FCN, SegNet, UNet, FastFCN, PSPNet, DeepLabV3+ and our model, respectively.

Download Full Size | PDF

To further verify the capability of distinguishing the NRD lesions from the normal region with low intensity, we compared the proposed network with state-of-the-art CNNs on the second dataset, which supplemented ten normal cubes without NRD lesions. The five-fold cross-validation strategy was employed for each model, and the statistical results (mean ± standard deviation) concerning DSC are listed in Table 4. Comparing the results in Table 3 and Table 4, for all the comparison networks, the corresponding performances dramatically declines due to the influence of the supplemented cubes, which means that all the models may suffer from the risk of recognizing the normal regions with low intensity as the NRD lesions. Comparatively, the proposed model can still obtain the best segmentation accuracy by only reducing 2% DSC index, which indicates better generalization of the proposed model.

Table 4. Statistical results (mean ± standard deviation) of DSC index with five-fold cross-validation on the second dataset.

View Table | View all tables in this article

Similar with the first comparison experiment, the detection results on four example cubes and the corresponding segmentation results on the selected B-scans are shown in Fig. 5 and Fig. 6, respectively. It should be noted that the first cube in Fig. 5 does not contain any NRD lesions. All the comparison CNNs would obtain obvious false detection and missegmentaitons by recognizing the normal regions with low intensity as the NRD lesions. Comparatively, the proposed model can explore the corresponding differences without any misclassification and missegmentation for the first cube. In conclusion, the proposed model shows better segmentation performances and a more stable generalization than state-of-the-art CNNs on the second dataset.

Fig. 5. Visual comparison of the detection results on four example cubes in the second dataset. Each bar indicates the detection results on the 3D cubes with 128 B-scans, green color and blue color respectively indicate the correct detections for the B-scans with/without NRD, and the red color shows the false detections. For each subfigure, the bars from top to bottom show the detection results obtained by FCN, SegNet, UNet, FastFCN, PSPNet, DeepLabV3+ and our model, respectively.

Download Full Size | PDF

Fig. 6. The segmentation results on the selected B-scans from the cubes in Fig. 5. The green line is the boundary of the label ground truth, the red line is the boundary of the predictions for each method, and the yellow line is the overlapping part of the red line and the green line. In each row, the images show the segmentation results obtained by FCN, SegNet, UNet, FastFCN, PSPNet, DeepLabV3+ and our model, respectively.

Download Full Size | PDF

3.4 Experimental comparison with NRD segmentation methods

In the last experiment, we employed the third dataset to demonstrate the superiors performances of the proposed model by comparing with state-of-the-art NRD segmentation methods, including a semi-supervised segmentation algorithm using label propagation and higher-order constraint (LPHC) [29], a stratified sampling k-nearest neighbor classifier based algorithm (SS-KNN) [23], a random forest classifier based method (RF) [24], a fuzzy level set with cross-sectional voting (FLSCV) [5], a continuous max-flow optimization-based method (CMF) [9], an Enface fundus-driven method (EFD) [6], a blob detection method (Blob) [55] and a double-branched and area-constraint fully convolutional networks (DBFCN) [46]. Due to the limited training samples, we augmented the training data by mirroring the images. As mentioned in Sec. 3.1, each cube in the third dataset contains two ground truths drew by two independent experts. Notably, we utilized the ground truths drawn by the second expert to train the proposed model.

Table 5 summarizes the quantitative results (mean ± standard deviation) between the segmentations and two independent ground truths. Overall, the proposed RMPPNet model can obtain a higher segmentation accuracy based on both ground truths. The methods such as LPHC, SS-KNN, RF, FLSCV, CMF and EFD rely on the layer segmentation results. They may not produce satisfactory results with poor layer segmentations. Both Blob method and DBFCN model can obtain better segmentations without utilizing any layer segmentation results. As an unsupervised method, the Blob method utilizes complex post-processing to obtain satisfactory results, which limits the generalized ability. Based on the backbone of FCN, DBFCN designs a double-branched structure, which can be treated as a multi-scale processing module. This structure is only employed on the last feature maps of the encoder stage. Therefore, DBFCN may be limited in adapting various sizes of NRD. Consequently, the proposed model can obtain better segmentation results, which are highly consistent with the ground truths.

Table 5. The quantitative results (mean ± standard deviation) between the segmentations and two independent ground truths.

View Table | View all tables in this article

Figure 7 shows the segmentation results obtained by all the comparison methods on four example B-scans selected from four different patients. In Fig. 7, the images from top to bottom respectively show the segmentation results obtained by LPHC, SS-KNN, RF, FLSCV, CMF, EFD, Blob, DBFCN and our method, in which the red line is the boundary segmentation results, the green and blue lines are the boundaries of ground truths drawn by the first and second expert, and the yellow line is the overlapping part between the predictions and two ground truths. The small image beside each B-scan shows the enlarged view of the NRD lesion. The B-scan in the first column contains a small NRD lesion, which has been ignored by RF, FLSCV, CMF and EFD. LPHC fail to distinguish small NRD lesion from the normal regions. Both Blob and DBFCN can only obtain a very small part of the target, which indicates that both methods cannot well handle the small lesions. The second B-scan contains a large NRD lesion. Most methods can obtain a satisfactory result. Comparatively, the proposed model can produce more smooth boundaries without obvious boundary leak on the left of NRD. The third B-scan contains a long sideling NRD. The methods such as SS-KNN, RF, CMF, Blob and DBFCN tend to segment the target into discontinuous regions. Both LPHC and FLSCV fail to recognize the NRD lesion, and EFD can only segment left part of the NRD. The last B-scan contains a large NRD lesion with low-contrast boundaries. Most comparison methods cannot well handle the low contrast problem with an obvious leak or under segmentation around boundaries. Comparatively, the proposed model can obtain better visual segmentation results, which are highly consistent with the ground truth. Therefore, from Fig. 7, we can observe that the proposed model can well handle the low-contrast boundaries and various scales for the NRD lesions.

Fig. 7. The segmentation results obtained by all the comparison methods on four example B-scans selected from four different patients. The red line is the boundary segmentation results, the green and blue lines are the boundaries of ground truths drawn by the first and second expert, and the yellow line is the overlapping part between the predictions and two ground truths. The images beside each B-scans show the enlarged view of the targets.

Download Full Size | PDF

Figure 8 shows a statistical correlation analysis by utilizing a linear regression analysis and Bland-Altman approach based on the segmentation results of the proposed method and two ground truths. From Figs. 8(a) and 8(c), we can observe that the proposed model can obtain a high correlation concerning two experts by getting ${r^2} = 1.00$. The Bland-Altman plots in Figs. 8(b) and 8(d) indicate the stable agreement between the predictions and ground truths. Consequently, the proposed model can produce accurate segmentation results that are highly consistent with both ground truths.

Fig. 8. Statistical correlation analysis. Figures (a) and (c) show the linear regression analysis between the proposed method and Expert 1/2, respectively. Figures (b) and (d) show the Bland-Altman plot for the proposed method and Expert 1/2, respectively.

Download Full Size | PDF

To further highlight the smoothness and consistency of the segmentation obtained by the proposed model, Fig. 9 shows the 3-D segmentation surfaces on eight example cubes selected from eight different patients. The methods such as LPHC, SS-KNN and FLSCV suffer from over- or under-segmentation, which are difficult to produce satisfactory surfaces. The RF method suffers from the insufficient segmentation with smaller volumes. The methods such as CMF, EFD, Blob and DBFCN can obtain better results, which, however, cannot well handle the low contrast issues and the corresponding surfaces contain obvious boundary leaks. By contrast, the proposed model outperforms the other comparison methods and generates the most similar surfaces with the ground truths.

Fig. 9. 3D surfaces of the segmentations obtained by all the comparison methods on eight example cases selected from eight patients. The images from the first row to the ninth row are the segmentation surfaces obtained by LPHC, SS-KNN, RF, FLSCV, CMF, EFD, Blob, DBFCN and the proposed model, respectively. The images in the last two rows show the manual segmentation surfaces of Expert 1 and Expert 2, respectively.

Download Full Size | PDF

4. Discussion

In this paper, we proposed an RMPPNet to automatically segment the NRD lesions in SD-OCT images. The comparison experiments on three different datasets demonstrated the superior performances and generalizations of the proposed method comparing with state-of-the-art CNNs and NRD segmentation methods, which indicated the practical application for the clinical diagnosis of CSC.

As showed in Fig. 4, Fig. 6, and Fig. 7, most state-of-the-art methods suffered from the ignoring small NRD lesion, failing to distinguish small NRD lesion from the normal regions, boundary leaks and over- or under-segmentations. Comparatively, the proposed model can provide a wider receptive field and more abundant multi-scale features to overcome the various size, low contrast and weak boundaries involved in SD-OCT images with NRD lesions. As listed in Tables 3–5, the proposed method obtained the highest DSC 92.6%, 90.2% and 96.6% on the three datasets, respectively.

To further demonstrate the superior performances in dealing with various defects in SD-OCT images with NRD lesions, Fig. 10 shows six example B-scans and the corresponding ground truths and segmentations, including one B-scan with a small NRD lesion, two B-scans with large NRD, one B-scan with low contrast around lesion boundaries, one B-scan with two discontinuous NRD lesions and one B-scan with a long sideling NRD. Comparatively, the proposed model could obtain similar visual segmentation results comparing with the ground truths.

Fig. 10. The comparison results between the ground truths and the segmentation results obtained by the proposed method on six example B-scans. In each subfigure, the image beside the B-scan shows the enlarged view of the rectangle region marked by a green box, in which the yellow area is the true-positive segmentations, the green and red areas are the under and over segmentations obtained by the proposed method, respectively.

Download Full Size | PDF

It should be noted that we utilized the ground truths drawn by the second expert to train the proposed model in Sec. 3.4. To further explore the generalizations for different ground truths, we quantitatively evaluated the proposed method by using different training and testing ground truths marked by different experts. The quantitative results (mean ± standard deviation) are listed in Table 6, where Tr.Em&Te.En indicates training with the ground truth drawn by the m-th expert and testing with the ground truth drawn by the n-th expert. Both m and n take values from 1 and 2 which indicates the first expert and the second expert, respectively. Figure 11 shows the segmentation results on two example B-scans with four different training-testing strategies. Overall, comparing with the results listed in Table 5, the proposed RMPPNet model can obtain better segmentation results based on four different training-testing strategies, which are highly consistent with both ground truths. The criterion (all > 95.5%) indicate very high inter-observer and intra-observer agreement, highlighting that the proposed model has excellent generalization ability for different ground truths.

Fig. 11. The segmentation results on two example B-scans with four different training-testing strategies. In each row, the images from left to right shows the original B-scans, the enlarged segmentation results obtained by Tr.E1&Te.E1 strategy, Tr.E1&Te.E2 strategy, Tr.E2&Te.E1 strategy and Tr.E2&Te.E2 strategy, respectively. All the segmentation maps are the enlarged views of the rectangle region marked by a green box in the B-scan image, in which the yellow area is the true-positive segmentations, the green and red areas are the under and over segmentations obtained by the proposed method, respectively.

Download Full Size | PDF

Table 6. The quantitative results (mean ± standard deviation) of the proposed method by using different training and testing ground truths.

View Table | View all tables in this article

The limitations of the proposed algorithm are summarized as follows:

(1) Except for NRD, intraretinal fluid (IRF) and pigment epithelial detachment (PED) are also widely occurred fluids in retinal diseases. The most challenging to segment IRF is the low contrast between the IRF and background. The model would hardly distinguish the IRF from vessel-like structures. Therefore, the proposed method may fail in segmenting IRF. Based on the prior clinical information, both NRD and PED share similar intensity characteristics. NRD generally locates above the RPE layer, and PED locates below the RPE layer. Generally, the average optical intensity is the highest around the RPE layer over the whole B-scans. Therefore, one of our future work is to distinguish NRD and PED based on the gradient values around the fluid boundaries.
(2) The current version of the proposed model can only deal with 3D subjects slice by slice, which means that the 3D spatial information has been ignored. Therefore, the final 3D segmentations are not smooth enough, as shown in Fig. 9. Extension from 2D to 3D segmentation for the proposed network mainly suffers from the large computational complexity of 3D convolutions, which is out of the scope of this paper and subjects to future research.

5. Conclusion

In this paper, to better deal with the receptive field and the multi-scale features, we propose RMPPNet for NRD segmentation in SD-OCT images by introducing multiple pyramid pooling modules and constructing a new loss function. Therefore, the proposed network can handle the various size, low contrast and weak boundaries involved in SD-OCT images with NRD lesions. Experimental results on three datasets demonstrate the superior performances of the proposed model comparing with the state-of-the-art methods, which indicates the practical application for the clinical diagnosis of CSC. Our future work would mainly focus on the extension of 3D segmentation with less computational complexity.

Funding

Natural Science Foundation of Jiangsu Province (BK20180069); Six Talent Peaks Project in Jiangsu Province (SWYY-056); National Natural Science Foundation of China (61671242, 61701192); Suzhou Industrial Innovation Project (SS201759).

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. R. Hua, L. Liu, C. Li, and L. Chen, “Evaluation of the effects of photodynamic therapy on chronic central serous chorioretinopathy based on the mean choroidal thickness and the lumen area of abnormal choroidal vessels,” Photodiagn. Photodyn. Ther. 11(4), 519–525 (2014). [CrossRef]

2. K. K. Dansingani, C. Balaratnasingam, S. Mrejen, M. Inoue, K. B. Freund, J. M. Klancnik, and L. A. Yannuzzi, “Annular Lesions and Catenary Forms in Chronic Central Serous Chorioretinopathy,” Am. J. Ophthalmol. 166, 60–67 (2016). [CrossRef]

3. P. Agrawal, “Increased Choroidal Vascularity in Central Serous Chorioretinopathy Quantified Using Swept-Source Optical Coherence Tomography,” Am. J. Ophthalmol. 174, 176–177 (2017). [CrossRef]

4. A. Daruich, A. Matet, A. Dirani, E. Bousquet, M. Zhao, N. Farman, F. Jaisser, and F. Behar-Cohen, “Central serous chorioretinopathy: Recent findings and new physiopathology hypothesis,” Prog. Retinal Eye Res. 48, 82–118 (2015). [CrossRef]

5. Jr., MD M. R. Keith Shuler and Prithvi Mruthyunjaya, “Diagnosing and Managing Central Serous Chorioretinopathy - American Academy of Ophthalmology,” AAO Eyenet (2006).

6. A. R. Irvine, “The pathogenesis of aphakic retinal detachment,” Ophthalmic Surg 16(2), 101–107 (1985).

7. S. Vujosevic, M. Casciano, E. Pilotto, B. Boccassini, M. Varano, and E. Midena, “Diabetic macular edema: Fundus autofluorescence and functional correlations,” Invest. Ophthalmol. Visual Sci. 52(1), 442–448 (2011). [CrossRef]

8. V. Chaikitmongkol, P. Khunsongkiet, D. Patikulsila, M. Ratanasukon, N. Watanachai, C. Jumroendararasame, C. B. Mayerle, I. C. Han, C. J. Chen, P. Winaikosol, C. Dejkriengkraikul, J. Choovuthayakorn, P. Kunavisarut, and N. M. Bressler, “Color Fundus Photography, Optical Coherence Tomography, and Fluorescein Angiography in Diagnosing Polypoidal Choroidal Vasculopathy,” Am. J. Ophthalmol. 192, 77–83 (2018). [CrossRef]

9. T. Sekiryu, “Fundus autofluorescence in central serous chorioretinopathy,” Japanese J Clin Ophthalmol 67(2), 150–155 (2013). [CrossRef]

10. C. R. G. Dreher, N. Kulp, C. Mandery, M. Wachter, and T. Asfour, “A framework for evaluating motion segmentation algorithms,” IEEE-RAS Int Conf Humanoid Robot 30(2), 83–90 (2017). [CrossRef]

11. S. J. Ahn, T. W. Kim, J. W. Huh, H. G. Yu, and H. Chung, “Comparison of features on SD-OCT between acute central serous chorioretinopathy and exudative age-related macular degeneration,” Ophthalmic Surg. Lasers Imaging 43(5), 374–382 (2012). [CrossRef]

12. M. Y. Teke, U. Elgin, P. Nalcacioglu-Yuksekkaya, E. Sen, P. Ozdal, and F. Ozturk, “Comparison of autofluorescence and optical coherence tomography findings in acute and chronic central serous chorioretinopathy,” Int. J. Ophthalmol. 7(2), 350–354 (2014). [CrossRef]

13. G. R. Wilkins, O. M. Houghton, and A. L. Oldenburg, “Automated segmentation of intraretinal cystoid fluid in optical coherence tomography,” IEEE Trans. Biomed. Eng. 59(4), 1109–1114 (2012). [CrossRef]

14. J. Wang, M. Zhang, A. D. Pechauer, L. Liu, T. S. Hwang, D. J. Wilson, D. Li, and Y. Jia, “Automated volumetric segmentation of retinal fluid on optical coherence tomography,” Biomed. Opt. Express 7(4), 1577 (2016). [CrossRef]

15. J. Novosel, Z. Wang, H. De Jong, M. Van Velthoven, K. A. Vermeer, and L. J. Van Vliet, “Locally-adaptive loosely-coupled level sets for retinal layer and fluid segmentation in subjects with central serous retinopathy,” in Proceedings - International Symposium on Biomedical Imaging (2016), 2016-June, pp. 702–705.

16. K. Li, X. Wu, D. Z. Chen, and M. Sonka, “Optimal surface segmentation in volumetric images - A graph-theoretic approach,” IEEE Trans. Pattern Anal. Mach. Intell. 28(1), 119–134 (2006). [CrossRef]

17. P. A. Dufour, L. Ceklic, H. Abdillahi, S. Schroder, S. De Dzanet, U. Wolf-Schnurrbusch, and J. Kowal, “Graph-based multi-surface segmentation of OCT data using trained hard and soft constraints,” IEEE Trans. Med. Imaging 32(3), 531–543 (2013). [CrossRef]

18. F. Shi, X. Chen, H. Zhao, W. Zhu, D. Xiang, E. Gao, M. Sonka, and H. Chen, “Automated 3-D retinal layer segmentation of macular optical coherence tomography images with serous pigment epithelial detachments,” IEEE Trans. Med. Imaging 34(2), 441–452 (2015). [CrossRef]

19. B. J. Antony, A. Lang, E. K. Swingle, O. Al-Louzi, A. Carass, S. Solomon, P. A. Calabresi, S. Saidha, and J. L. Prince, “Simultaneous segmentation of retinal surfaces and microcystic macular edema in SDOCT volumes,” SPIE Med. Imaging 9784, 97841C (2016). [CrossRef]

20. M. Wu, Q. Chen, X. J. He, P. Li, W. Fan, S. T. Yuan, and H. Park, “Automatic subretinal fluid segmentation of retinal SD-OCT images with neurosensory retinal detachment guided by Enface fundus imaging,” IEEE Trans. Biomed. Eng. 65(1), 87–95 (2018). [CrossRef]

21. G. Quellec, K. Lee, M. Dolejsi, M. K. Garvin, M. D. Abràmoff, and M. Sonka, “Three-dimensional analysis of retinal layer texture: Identification of fluid-filled regions in SD-OCT of the macula,” IEEE Trans. Med. Imaging 29(6), 1321–1330 (2010). [CrossRef]

22. M. Wu, W. Fan, Q. Chen, Z. Du, X. Li, S. Yuan, and H. Park, “Three-dimensional continuous max flow optimization-based serous retinal detachment segmentation in SD-OCT for central serous chorioretinopathy,” Biomed. Opt. Express 8(9), 4257 (2017). [CrossRef]

23. S. J. Chiu, M. J. Allingham, P. S. Mettu, S. W. Cousins, J. A. Izatt, and S. Farsiu, “Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema,” Biomed. Opt. Express 6(4), 1172 (2015). [CrossRef]

24. T. Hassan, M. Usman Akram, B. Hassan, A. M. Syed, and S. A. Bazaz, “Automated segmentation of subretinal layers for the detection of macular edema,” Appl. Opt. 55(3), 454 (2016). [CrossRef]

25. Y. Xu, K. Yan, J. Kim, X. Wang, C. Li, L. Su, S. Yu, X. Xu, and D. D. Feng, “Dual-stage deep learning framework for pigment epithelium detachment segmentation in polypoidal choroidal vasculopathy,” Biomed. Opt. Express 8(9), 4061 (2017). [CrossRef]

26. Y. Zheng, J. Sahni, C. Campa, A. N. Stangos, A. Raj, and S. P. Harding, “Computerized assessment of intraretinal and subretinal fluid regions in spectral-domain optical coherence tomography images of the retina,” Am. J. Ophthalmol. 155(2), 277–286.e1 (2013). [CrossRef]

27. D. C. Fernández, “Delineating fluid-filled region boundaries in optical coherence tomography images of the retina,” IEEE Trans. Med. Imaging 24(8), 929–945 (2005). [CrossRef]

28. T. Wang, Z. Ji, Q. Sun, Q. Chen, S. Yu, W. Fan, S. Yuan, and Q. Liu, “Label propagation and higher-order constraint-based segmentation of fluid-associated regions in retinal SD-OCT images,” Inf. Sci. 358-359, 92–111 (2016). [CrossRef]

29. A. Montuoro, S. M. Waldstein, B. S. Gerendas, U. Schmidt-Erfurth, and H. Bogunović, “Joint retinal layer and fluid segmentation in OCT scans of eyes with severe macular edema using unsupervised representation and auto-context,” Biomed. Opt. Express 8(3), 1874 (2017). [CrossRef]

30. J. Long, E. Shelhamer, and T. Darrell, “Fully Convolutional Networks for Semantic Segmentation,” IEEE Trans Pattern Anal Mach Intell 39(4), 640–651 (2014). [CrossRef]

31. C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun, “Large kernel matters - Improve semantic segmentation by global convolutional network,” Proc - 30th IEEE Conf Comput Vis Pattern Recognition, CVPR 20172017-Janua, 1743–1751 (2017).

32. L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking Atrous Convolution for Semantic Image Segmentation,” (2017).

33. L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of the European Conference on Computer Vision (ECCV) (2018), pp. 801–818.

34. G. Lin, A. Milan, C. Shen, and I. Reid, “RefineNet: Multi-path refinement networks for high-resolution semantic segmentation,” in Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 (2017), 2017-Janua, pp. 5168–5177.

35. L. C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018). [CrossRef]

36. M. Oršić, I. Krešo, P. Bevandić, and S. Šegvić, “In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019), pp. 12607–12616.

37. L. C. Chen, Y. Yang, J. Wang, W. Xu, and A. L. Yuille, “Attention to Scale: Scale-Aware Semantic Image Segmentation,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2016), 2016-Decem, pp. 3640–3649.

38. D. Eigen and R. Fergus, “Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture,” in Proceedings of the IEEE International Conference on Computer Vision (2015), 2015 Inter, pp. 2650–2658.

39. P. O. Pinheiro and R. Collobert, “Recurrent convolutional neural networks for scene labeling,” in 31st International Conference on Machine Learning, ICML 2014 (2014), 1(CONF), pp. 151–159.

40. H. Wu, J. Zhang, K. Huang, K. Liang, and Y. Yu, “FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation,” (2019).

41. F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” 4th Int Conf Learn Represent ICLR 2016 - Conf Track Proc (2016).

42. H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” Proc - 30th IEEE Conf Comput Vis Pattern Recognition, CVPR 20172017-Janua, 6230–6239 (2017).

43. V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017). [CrossRef]

44. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (2015), 9351, pp. 234–241.

45. L. Fang, D. Cunefare, C. Wang, R. H. Guymer, S. Li, and S. Farsiu, “Automatic segmentation of nine retinal layer boundaries in OCT images of non-exudative AMD patients using deep learning and graph search,” Biomed. Opt. Express 8(5), 2732 (2017). [CrossRef]

46. A. G. Roy, S. Conjeti, S. P. K. Karri, D. Sheet, A. Katouzian, C. Wachinger, and N. Navab, “ReLayNet: retinal layer and fluid segmentation of macular optical coherence tomography using fully convolutional networks,” Biomed. Opt. Express 8(8), 3627 (2017). [CrossRef]

47. K. Gao, S. Niu, Z. Ji, M. Wu, Q. Chen, R. Xu, S. Yuan, W. Fan, Y. Chen, and J. Dong, “Double-branched and area-constraint fully convolutional networks for automated serous retinal detachment segmentation in SD-OCT images,” Comput. Meth. Prog. Bio. 176, 69–80 (2019). [CrossRef]

48. J. Hu, Y. Chen, and Z. Yi, “Automated segmentation of macular edema in OCT using deep neural networks,” Med. Image Anal. 55, 216–227 (2019). [CrossRef]

49. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 3rd Int Conf Learn Represent ICLR 2015 - Conf Track Proc (2015).

50. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (2012), pp. 1097–1105.

51. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2015), 07-12-June, pp. 1–9.

52. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2016), 2016-Decem, pp. 770–778.

53. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017 (2017), 2017-Janua, pp. 2261–2269.

54. A. Lang, A. Carass, E. K. Swingle, O. Al-Louzi, P. Bhargava, S. Saidha, H. S. Ying, P. A. Calabresi, and J. L. Prince, “Automatic segmentation of microcystic macular edema in OCT,” Biomed. Opt. Express 6(1), 155 (2015). [CrossRef]

55. Z. Ji, Q. Chen, M. Wu, S. Niu, W. Fan, S. Yuan, and Q. Sun, “Beyond Retinal Layers: A Large Blob Detection for Subretinal Fluid Segmentation in SD-OCT Images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (2018), pp. 372–380.

Net	PPM position	DSC (%)
RMPPNet-0	No PPM	90.2 ± 9.3
RMPPNet-1	6th	91.3 ± 7.3
RMPPNet-2	4th,6th	91.9 ± 6.2
RMPPNet-3	2nd,4th,6th	92.6 ± 5.6
RMPPNet-4	2nd,4th,5th,6th	92.3 ± 5.8
RMPPNet-5	2nd,3rd,4th,5th,6th	92.5 ± 5.6
RMPPNet-6	1st,2nd,3rd,4th,5th,6th	92.8 ± 5.3

Loss Function	DSC (%)
RMPPNet-3+BEC	92.1 ± 6.1
RMPPNet-3+BEC + Dice	92.3 ± 5.7
RMPPNet-3+BEC + Dice+ DIF	92.6 ± 5.6

	Expert 1			Expert 2
Methods	TPVF	PPV	DSC	TPVF	PPV	DSC
LPHC [28]	81.3 ± 9.4	55.6 ± 13.3	65.3 ± 10.4	81.2 ± 9.3	55.8 ± 12.8	65.7 ± 10.5
SS-KNN [23]	80.9 ± 6.6	91.9 ± 3.8	86.1 ± 4.1	80.3 ± 6.5	91.8 ± 3.8	85.9 ± 4.1
RF [24]	92.6 ± 4.4	92.4 ± 2.0	87.1 ± 4.3	92.5 ± 4.3	91.9 ± 2.2	88.9 ± 4.2
FLSCV [5]	84.4 ± 16.0	86.2 ± 7.3	78.9 ± 21.7	84.4 ± 15.1	86.8 ± 7.3	79.4 ± 2.2
CMF [9]	92.1 ± 4.1	93.0 ± 3.4	93.9 ± 2.5	92.0 ± 3.9	94.0 ± 3.5	94.3 ± 2.6
EFD [6]	94.2 ± 5.2	93.0 ± 4.8	93.7 ± 4.0	94.5 ± 5.1	94.1 ± 5.3	94.6 ± 4.1
Blob [55]	95.1 ± 2.3	93.1 ± 5.0	94.0 ± 3.1	95.2 ± 2.2	94.2 ± 4.9	94.7 ± 3.0
DBFCN [47]	94.3 ± 1.8	95.8 ± 1.4	95.0 ± 1.8	94.4 ± 3.2	97.0 ± 1.1	95.6 ± 1.6
Expert 1	-	-	-	96.3 ± 1.4	95.2 ± 1.4	95.7 ± 1.1
RMPPNet	95.9 ± 1.8	95.8 ± 1.4	95.8 ± 0.9	96.1 ± 1.8	97.1 ± 1.0	96.6 ± 0.9

Criterion	TPVF	PPV	DSC
Tr.E1&Te.E1	96.2 ± 1.5	96.7 ± 1.2	96.4 ± 0.9
Tr.E1&Te.E2	95.6 ± 1.9	95.5 ± 1.4	95.6 ± 1.0
Tr.E2&Te.E1	95.9 ± 1.8	95.8 ± 1.4	95.8 ± 0.9
Tr.E2&Te.E2	96.1 ± 1.8	97.1 ± 1.0	96.6 ± 0.9

Net	PPM position	DSC (%)
RMPPNet-0	No PPM	90.2 ± 9.3
RMPPNet-1	6th	91.3 ± 7.3
RMPPNet-2	4th,6th	91.9 ± 6.2
RMPPNet-3	2nd,4th,6th	92.6 ± 5.6
RMPPNet-4	2nd,4th,5th,6th	92.3 ± 5.8
RMPPNet-5	2nd,3rd,4th,5th,6th	92.5 ± 5.6
RMPPNet-6	1st,2nd,3rd,4th,5th,6th	92.8 ± 5.3

RMPPNet: residual multiple pyramid pooling network for subretinal fluid segmentation in SD-OCT images

Abstract

1. Introduction

2. Proposed network

2.1 Backbone net

2.2 Multiple pyramid pooling modules

2.3 Decoder stage

2.4 Loss function

3. Experiments

3.1 Datasets and evaluation criteria

3.2 Analysis of the proposed model

3.3 Experimental comparison with CNNs

3.4 Experimental comparison with NRD segmentation methods

4. Discussion

5. Conclusion

Funding

Disclosures

References

Cited By

Figures (11)

Tables (6)

Equations (7)

OSA Continuum