Multiple Lesions Insertion: boosting diabetic retinopathy screening through Poisson editing

Zekuan Yu; Zekuan Yu; Zekuan Yu; Zekuan Yu; Zekuan Yu; Rongyao Yan; Yuanyuan Yu; Xiao Ma; Xiao Liu; Jie Liu; Jie Liu; Qiushi Ren; Yanye Lu

doi:10.1364/BOE.420776

1. Introduction

Diabetes is a disease due to an abnormal increase in the glucose level of the blood, resulting in irreversible damage to the blood vessels [1,2]. It is the fifth deadliest disease in the USA, and no effective cure until now. The total economic cost of diabetes reached to the US $\$$132 billion, accounting for one out of every 10 health care dollars spent in the USA [3,4]. Diabetic retinopathy (DR) is a common concurrent disease caused by diabetes, because the tiny blood vessels that nourish the retina are vulnerable to anomalous glucose level. This disease may significantly damage the central vision, regarded as one of the four blind diseases. Early detection coupled with early treatment is a workable and reliable method to prevent permanent vision loss due to DR [5].

As shown in Fig. 1, there are many lesions such as exudates, microaneurysms and hemorrhages appearing in the DR fundus image. Due to the damages of blood vessels of the retina, there will be leaking of lipid out of blood vessels which is a major indicator of DR. These lipid leakages produce yellow structures on the retina called hard exudates, and white structure on the retina called soft exudates [6]. Hard exudates have clear boundaries but soft exudates or cotton wool spots have fuzzy image classification [7], detection [8], segmentation [9], super-resolution [10]. However, due to the lack of sufficient ground truth samples for training, the majority of CNN-based methods targeting medical image CAD applications fail to be adequately trained end to end from scratch. Data augmentation is one of a common-used method to improve model generalization through enlarging the size of datasets. The classical data augmentation methods, including over-sampling, down-sampling [11], rotation [12], color jittering [13], etc., have been widely used in medical image analysis. With the development and application of deep learning in the medical image field, generative adversarial network (GAN) as a new data augmentation method has achieved good performance in many image classification tasks [14–16]. The GAN-based data augmentation method sometimes cannot generate more satisfactory image texture detail and smooth boundary. Poisson editing [17], introduced in 2003, is becoming a technique with major applications in many different domains of image processing and computer graphics. Currently, the Poisson editing method has been applied to many aspects of medical image processing, such as image stitching [18], segmentation [19], image fusion [20] and colorization [21], and achieved good results. Poisson editing can achieve seamless synthesis by solving the Poisson equation defined by the objective function. The method not only smooths out the gradient domain of the blending boundary but also adds consistent texture into the blending region, which can generate lots of meaningful synthetic images by inserting different kinds of original image patches into different domain images [22].

Fig. 1. DR fundus image with different lesions.

Download Full Size | PDF

This paper presents a new data augmentation method toward the task of developing a CAD system to support ophthalmologists in the transferred treatment decision. Our main contributions included: (i) we propose a new data augmentation pipeline named Multiple Lesions Insertion (MLI), this method is used to simulate new realistic diabetic retinopathy (DR) fundus images based on the healthy fundus images by inserting real lesions, exudates, hemorrhages, microaneurysms templates into new healthy fundus images with Poisson editing. (ii) Based on this strategy, our MLI pipeline can generate different DR grading funds images with multiple lesions and the number of different types of lesions can be manually selected, we have proved that the samples generated by the MLI method are realistic with clear texture details. (iii) We demonstrate the improvement of DR funds images transferred treatment task with our MLI data augmentation method. And then we compared different data augmentation methods, we found that with the number of generated images raising, the MLI method as a new data augmentation method has better performance than over-sampling, image rotation, adding new real samples these kinds of methods.

The remainder of this paper is organized as follows. In Section II, we discuss the related work and introduced the dataset. In Section III, we propose the MLI pipeline. In Section IV, we show the experiment results and evaluate the results. In section V and section VI, we make the discussion and conclusion of this work, respectively.

2. Related work and dataset

Nowadays, many studies aimed to solve the challenges of the machine learning task with small-sample data and imbalanced data. The data augmentation method is an effective way, especially in the medical image CAD research field. GAN and its variants have been applied to the medical image synthesis tasks and have achieved good performance.

Bhattacharya et al. [23] proposed to generate NIH chest X-ray image dataset for chest pathology classification by convolutional generative adversarial network (DCGAN). They showed that DCGAN data augmentation strategy can improve the generalization performance of CNN model, and improve the classification accuracy. Hou et al. [24] proposed a GAN-based method for compositing images with different strategies. They found that composited images of cancer cells could improve segmentation results and reduce nuclear segmentation errors by 6% to 9%. Li et al. [25] proposed the pyramid GAN(Py-GAN) to complete blind image and achieve pixel-level realism. Such an approach is valuable in medical image inpainting works. Fridadar et al. [26] proposed a GAN method to generate synthetic medical images for data enhancement and used them to improve the performance of CNN for liver disease classification. By adding the synthetic data augmentation, the results achieved 85.7% sensitivity and 92.4% specificity. The method [27] has the contributions that used generative adversarial deep neural networks to classify multi-modal MRI brain tumors. They first pre-trained the discriminator, and then replaced the fully connected layer as the classifier for tumor classification and achieved brain tumors classification mean accuracy of 95%.

In the task of DR fundus image synthesis, many GAN-based methods have been developed [28–30]. You et al. [31] proposed Cycle-GAN by adopting the Convolutional Block Attention Module (Cycle-CBAM) to achieve a data enhancement. Then they conducted DR fundus images classification experiments to evaluate the effect of data enhancement. Experiments show that this data enhancement method can improve the classification accuracy by 2%. Zhou et al. [32] proposed a diabetic retinopathy generative adversarial network (DR-GAN) to synthesize high-resolution fundus images, which can both synthesize highly realistic controllable fundus images and contribute to the DR grading task. Although the data augmentation methods of fundus images which based on GAN have achieved some success, the image quality of the generated images are not good as the real image, the texture details of the generated images are poor, and the application robustness and generalization are unsatisfactory. These led the data augmentation methods which based on GAN gave limited help to improve the diagnosis accuracy of CAD system.

Yu et al. [31,33] proposed a new preprocessing pipeline named multiple-channels-multiple-landmarks (MCML), which aimed to synthesize color fundus images from a combination of vessel tree, optic disc, and optic cup images. The performance of MCML method is better than many classic networks, such as Pix2pix, Cycle-GAN, it has great potential in the computer-aided diagnosis based on fundus image. Niu et al. [1] proposed P-GAN which based method to synthesize pathological retinal image given the descriptor and a binary vessel segmentation. Tub-GAN [34] was proposed to extend style transfer to the generator and increase the diversity of synthesized samples. However, the quality of the generated image with P-GAN and Tub-GAN is not good, the average scores of Tub-GAN and P-GAN generated fundus images from 5 professional ophthalmologists were 2.66 and 3.55, respectively.

Afterwards, many techniques to create new artificial samples and to alter the distribution of lesion properties in the existing dataset have been developed. Pezeshk et al. [35] developed an image blending method to augment existing dataset by seamlessly inserting a lesion extracted from a source image into a target image, which is based on Poisson image blending and focused on the pulmonary nodules in chest CT exams. This method proves the feasibility of lesion insertion in the field of medical imaging. Then Pezeshk et al. [36] first demonstrated the effectiveness of lesion insertion as a data augmentation method and boosted the performance of CAD system. However, the method cannot automatically insert lesion into the field of view (FOV) and the manual selection of suitable insertion areas is a big problem especially for large medical image dataset. Zahra et al. [37] developed an image blending technique with Poisson editing that allows users to seamlessly insert regions of interest (ROIs) from one mammogram into another. And they assessed the performance of the fusion images, the average by-image sensitivities for native and inserted clusters are 72.3% and 67.6%, respectively. It demonstrated that using the inserted microcalcification clusters for assessing mammographic can improve the accuracy of CADs.

In summary, the GAN method can only generate new simples based on certain specific dataset, cannot change the image distribution, and cannot improve the diagnostic accuracy of the CAD system. On the contrary, the Poisson image editing method is not limited by a specific dataset. It can use multiple datasets for seamlessly intersecting different kinds of lesions to generate more and more lesion images with widely distributed. And these generated images can improve diagnosis accuracy in the CAD system. There is no research on applying Poisson image editing method to generate fundus images with multiple lesions. In this study, we proposed a new multiple lesion insertion method with Poisson editing in fundus image, and demonstrated the MLI method make contributions to the DR screening.

2.1 Dataset

We use five publicly available datasets in our study, including Messidor dataset [38], E-ophtha dataset [39], Kaggle dataset [40], IDiRD dataset [41] and DRIVE dataset [42]. The specific information of the dataset is shown in Table 1.

Table 1. The specific information of the dataset

View Table | View all tables in this article

2.1.1 Messidor dataset

The images were acquired using TopCon TRC NW6 non-mydriatic fundus camera with a 45$^{\circ }$ FOV. In this work, 546 healthy fundus images were used as the inserted images, and all the images were resized to 1440 $\times$ 960.

2.1.2 E-ophtha dataset

E-ophtha database consists of two sub databases, one for microaneurysms (e-ophtha-MA), another for exudates (e-ophtha-EX). And it was acquired at more than 30 screening centers around France with a 45° FOV. The dataset includes ground-truth annotated by ophthalmology experts.

2.1.3 Kaggle dataset

This dataset includes a total of 35621 images of training set images with five DR grading labels. "0, 1, 2, 3, 4" stands for "no DR", "mild", "moderate", "severe", and "proliferative DR". Kaggle dataset is a large set of high-resolution retina images under a variety of imaging conditions and does not have a fixed FOV.

2.1.4 IDiRD dataset

This dataset contains 516 fundus images and it was acquired using a Kowa VX-10 alpha digital fundus camera with a 50° FOV. It is divided into 0 to 4 grades according to DR staging. The number of images in each grade is 168, 25, 168, 93 and 62.

2.1.5 DRIVE dataset

The database includes 40 images, including fundus photo and paired retinal blood vessel segmentation results, 20 of which are used for training and 20 for testing. The data was collected from a Canon CR5 non-mydriatic 3CCD camera with a 45$^{\circ }$ FOV. The image resolution of all the fundus images are 565 $\times$ 584.

2.2 Data cleaning and image pre-preprocessing

According to the Image quality assessment criteria, we convert the five major levels of clinical DR severity to the referral (positive sample) and non-referral (negative sample) as a binary classification problem [43]. We combine grade 0 and grade 1 in the Kaggle database, and define them as the nonreferral category, the total number of samples is 3134 + 628 = 3762. In the same way, we combine grade 2, 3, and 4 as the referral category, and the total number of referrals is 636 + 83+ 20 = 739. For IDiRD dataset, we divide the images of different grades from grade 0 to grade 4 into two types: referral and non-referral. We combine grade 0 and grade 1 into non-referral data, grade 3 and grade 4 into the referral data. In this way, a new IDiRD1 data set is formed.

In order to study the impact of the data augmentation on the deep learning network, 3762 non-referral samples and 739 referral samples in the Kaggle database were randomly divided at 70% and 30% to obtain the Kaggle1 database and the Kaggle2 database. In Kaggle1 database the number of no-referral samples is 2633, the referral samples are 517. In the Kaggle2, non-referral samples are 1129 and referral samples are 222. In our study, the Kaggle1 and the Kaggle2 database are used as the training set, and the IDiRD1 database is the test set. In addition, all the images are resized to 299 $\times$ 299.

The bounding box masks of hemorrhages in our study are manually cropped by the ophthalmologists in the Messidor dataset and microaneurysms and exudates masks are acquired by the E-ophtha dataset. All images including ground-truth annotations are normalized to 1440$\times$960 in the preprocessing process, then, connected components are found in ground-truth annotation, which generates masks for each individual lesion. The template of a single micro aneurysm can then be obtained by an element-wise multiplication of the expanded mask and original image. In addition, we can also obtain the bounding box mask of the lesion from the expanded mask.

3. Methods

We proposed a new fundus synthesis method called Multiple lesions Insertion (MLI) method, which combines exudates, hemorrhages, microaneurysms into a healthy fundus image. Figure 2 shows the working flow of the MLI method.We define the exudate lesions as EX$_{patch}$ =[EX$_1$,EX$_2$, $\cdots$,EX$_N$], the hemorrhages lesions as HEM$_{patch}$ =[HEM$_1$,HEM$_2$, $\cdots$,HEM$_N$], the microaneurysms lesions as MA$_{patch}$ =[MA$_1$,MA$_2$, $\cdots$,MA$_N$]. The original healthy fundus image (I$_H$) can be divided into three parts, vessel tree (I$_V$), optic disc (I$_O$) and the mask of the field of view (I$_F$). Then, the interesting inserting areas defined as I$_c$ = I$_F$ $\cup$ (~I$_v$) $\cup$ (~I$_o$). Our pipeline can automatically define the number of different lesion patches and then seamlessly insert them into a new location on different healthy fundus images to obtain the new DR fundus images.

Fig. 2. DR fundus image with different lesions.

Download Full Size | PDF

3.1 Structure extraction

In our paper, the raw fundus image is denoted as I and its red channel is denoted as I$_r$. To make blended image more authentic, the fact that exudate do not appear inside vessel or optic disc should be taken into consideration, which makes segmentation of vessel and optic disc necessary for the healthy fundus images. So that no lesion would be inserted in these areas. Because FOV not only limit lesion insertion, but also affects the accuracy of vessel segmentation [44,45], we firstly used the iterative method in [46] to extract FOV mask. A disk area is extracted from I$_r$, whose radius r$_0$ should be smaller than that of the FOV. The mean intensity of pixels inside this disk area is denoted as t$_R$, and pixels in I$_r$ with intensity higher than threshold t$_R$ forms the initial FOV region $R$.

Then, for any pixel p$\notin$R, its neighboring pixels locate within r$1$ pixels from $p$ forms the set N$_{r1}$(p). A local threshold t$p$ is computed as t$_0\cdot$mean$_{N_{r1}(p)\cap R}$. If the intensity of $N_{r 1}(p)$ is larger than t$_p$, the pixel is added to set $R$. After that, this step is repeated until $R$ remains unchanged. In our implementation, parameters are set as follows: r$_0$=0.8$\times$$min(h,w)$, r$_1$=30, t$_0$=0.75, where $h$ is the height of I$_r$ and $w$ the width. It should be pointed out that t$_p$ in a pixel-wise fashion is extremely time consuming, the convolution with a disk shape is using in the practical implementation. Finally, we implemented the U-net [47] model which trained on the DRIVE dataset in the Messidor fundus images to extract vessel parts.

Optic disc segmentation breaks down into two steps, localization and segmentation. The former step roughly determines the location of optic disc by locating its centroid or a part inside it, whereas the latter more precisely identify the pixels belonging to optic disc. In the localization step, method in [45] is applied, a simple thresholding followed by area filtering is used for localization purpose, which is performed on luminance channel I$_l$ of the HLS color space. The threshold is set to 0.8$max$I$_l$ and area filtering is configured to filter out only the largest particle b, and centroid c can be calculated by b. The segmentation step is performed on red channel I$_r$ and is basically a marker-controlled watershed transformation [48].

Three consecutive morphological operations, closing p$_1$ = $\phi ^s$($I_r$), opening p$_2$= $\gamma ^s$($p_1$) and opening by construction p$_3$ = $\gamma ^{rec}_{p1}$($p_2$), are applied to eliminate variations in pixel intensity, where s is a square structure element with 16 pixels in width. In the segmentation step, it concludes by performing watershed transformation on the morphological gradient of p$_3$ with particle b as internal marker and a circle around c with radius larger than the diameter of optic disc. Due to the fact that there is no ground truth annotations of vessel and optic disc results in Messidor database, and the insert region is large enough, the segmentation errors do not have large effect on the inserted candidate region I$_c$. Since there is no ground truth of optic disc and vessel segmentation masks in Messidor dataset, we do not pay additional attentions into the precision of the vessel and optic disc segmentation in our MLI pipeline.

3.2 Poisson image editing

Poisson image editing is an effective way for seamless blending, whose mathematical foundation lies on Poisson equation with Dirichlet boundary conditions. In the essence, it asserts the existence and uniqueness for a function across its domain given the boundary values and Laplacian of its interior values. Therefore, under the guidance of a certain gradient field, we can accomplish interpolation inwards of the values along the boundary of an area within the target image to achieve blending. In this paper, we denote target image, source image as $f_t$,$f_s$ respectively. We defined the D$\in$$R^2$ as the domain across and denote a closed subset of D as $\Omega$ with boundary $\partial \Omega$,and denote the expected result within after blending as f. For the purpose of interpolation, Poisson editing is essentially a diffusion process, and is prone to halos and bleeding artifacts when the differences between source and target images along the insertion boundary are not uniform. we can find f by solving the following minimization problem:

(1)$$\underset{f}{min}\iint_\Omega|\triangledown f|^2\cdots with f|_{\partial\Omega}=f_t|_{\partial\Omega}$$

(2)$$\underset{f}{min}\iint_\Omega|\triangledown f-\nu|^2\cdots with f|_{\partial\Omega}=f_t|_{\partial\Omega}$$

where ${\nu }$ is called a guidance field, which, in our case, can be set to ${\triangledown f_s}$ so that the interpolation is guided by the chosen source image. The solution to the above problem is in fact the unique solution to the following Poisson equation with Dirichlet boundary conditions:

(3)$$\triangledown f=div(\nu), over \Omega$$

where ${div =\frac {\partial }{\partial x}+\frac {\partial }{\partial y}}$ is divergence operator.

Since Poisson image editing described above allows guidance filed to be unconservative, then we use the mixing gradient to combine salient features in source and destination image. Specifically, we use the following equation to obtain vector v:

(4)$$for\; all\; x\in \Omega, \nu(x)=max(\triangledown f_t(x),\triangledown f_s(x))$$

We discretize it by using the pixel grid of the digital image using finite difference discretization, and apply iterative method like Jacobi method or Gauss-Seidel iteration with successive overrelaxation to solve the minimization problem.

Since the contrast of the lesion patches after Poisson editing will be changed, to ensure the visibility of lesions after blending step, we only chosen the lesion patches whose contrast against neighboring background is high enough as templates. And we introduced the contrast improvement index (CII) [49] to restrict the contrast of the inserted lesion patches.

(5)$$C=\frac{Y-G}{Y+G}$$

(6)$$CH =C_{en}/C_{org}$$

where, Y is the grayscale value of the ROI, G is the grayscale value of the background, ${C_{en}}$ is the contrast of the enhanced image, and ${C_{org}}$ is the contrast of the original image. As we all know, the DR lesions mainly appear on red and green channels, the green component of the image has the highest contrast for exudates and hemorrhages. Whereupon, we considered the CII_R and CII_G values of the different channels. If CII_R<0.01 or CII_G<0.035, the template will not be used for insertion and the candidate templates will be abandoned. We determine the threshold based on the CII distribution across e-ophtha database. The thresholds are 10% percentile of corresponding distribution.

We use the healthy fundus image as target and extracted lesions as source to achieve the lesion blending through the Poisson image editing which is performed on each channel of RGB image and combined the results of each channel to generate the final blended image. In addition, CII values are examined in the blended image around inserted area to ensure the visibility of blended lesions.

3.3 Inception-V3 model

Inception-V3 [50] as a CNN model has achieved good performance in fundus image analysis. In our study, both end to end training and transfer learning were implemented on Inception-V3 model. Focal loss [51] was implemented during each comparison studies, which can solve sample imbalance and improve classification accuracy. We also evaluate and compare several other common-used CNN architectures. In our experiment, Inception-V3 is trained from scratch, or the weights training for Inception-V3 in supervised manner. In order to solve the problem of imbalance of positive and negative samples in different dataset, we introduce focal loss in training process. Focal loss can reduce the weight of a large number of simple negative samples to keep data relative balance. we use the following equation to obtain focal loss function:

(7)$$CE(p,y)={-}(y\cdot logp+(1-y)log(1-p))$$

where, $y\in \pm 1$ represents the correct classification, $p\in [0,1]$ is the predicted probability of $y$=1, we define $p_t$ as:

(8)$$p_t = \begin{cases} p, & if\, y = 1 \\ 1-p, & otherwise \\ \end{cases}$$

Then $CE(p,y)=CE(p_t)=-log(p_t)$, we add the modulation factor item on the correct classification to get the focal loss function, when $r$=2, the focal loss function performs best:

(9)$$FL(p_t)={-}(1-p_t)^rlog(p_t)$$

To the best of our knowledge, this is the first time to apply Poisson editing method to fundus image analysis task. And our method demonstrate that the generated fundus images make contributions to the DR screening.

4. Experiment and results

4.1 Training protocols and evaluation metrics

We implemented Poisson editing method on MATLAB 2018b. DR screening related experiments were implemented on Linux, PyTorch 0.4.1. Tasks are trained on a single server with an NVIDIA Quadro P4000 graphics card. The batch size was 8, learning rate is 0.001, momentum is 0.9, the weight delay is 1e-4. During the end to end training procedure, the epoch was set between 140 and 150; when applying transfer learning method, the epoch was set between 40 and 50. The convergence time of transfer learning is shorter than the end to end training, since the transfer learning load pretrained parameters and is easier to achieve convergence. Comparing with other data augmentation methods, both the traditional methods, such as under-sampling, oversampling, cropping, rotation and adding real samples, are used to adjust class distribution and compensate for data imbalance.

The sensitivity, specificity and accuracy are used for evaluating the performance of the proposed method as given in Equations (10) to (12). The sensitivity is proportional tcorrectly classifying referral images, the specificity is proportional to correctly classifying no-referral images. The true results are proportional to the accuracy. The accuracy value is used to test the reliability of the diagnostic test. True positive TP is defined as correctly classified data and false negative FN denotes the incorrectly rejected data. The true negative value is represented as TN, which is the correctly rejected data. The incorrectly identified data is the false positive FP.

(10)$$Sensitivity(SEN)=\frac{TP}{TP+FN}$$

(11)$$Specificity(SPE) =\frac{T N}{T N+F P}$$

(12)$$Accuracy(ACC)=\frac{TP+TN}{TP+TN+FP+FN}$$

Area Under the Curve (AUC) is used to evaluate the performance of the CAD systems in our study. In our paper, the referral and non-referral represent positive and negative samples, respectively.

4.2 Subjective visual quality evaluation

Figure 3 shows the generated fundus images by applying MLI method on Messidor dataset. We performed subtraction on the corresponding original image is shown in Fig. 4, the MLI generated fundus images only changed in the lesion area. Figure 5 shows the generated DR fundus images applying of MLI, P-GAN [34] and Tub-GAN [30]. We can observe that the fundus images generated by our MLI method preserve not only the vessel vascular shape, but also the shape of the optic and optic cup. It is worth mentioning that, our method has real textures and rich details without dark spots, artifacts and Since Inception-V3 has achieved good performance in diabetic retinopathy computer aided diagnosis tasks, we used Inception-V3 as the CNN model in our experiments. By analyzing the results of the referral and non-referral classification experiments, we found that MLI dataaugmentation method is significantly better than the results of the traditional data augmentation method, and our method has a great improvement in the evaluation metrics discontinuity.

Fig. 3. Fusion fundus image Results. (a) Original fundus images (b)-(d) Fusion fundus images with MLI methods.

Download Full Size | PDF

Fig. 4. Fusion Results. (a)-(c) Original fundus images; (d)-(f) Fusion fundus images with MLI methods; (g)-(i) Corresponding image subtraction results.

Download Full Size | PDF

Fig. 5. Fusion Results by different methods. (a) Fusion fundus images by MLI methods; (b) Fusion fundus images by P-GAN [29]; (c) Fusion fundus images by Tub-GAN [30].

Download Full Size | PDF

4.3 Quantitative evaluation

In the preprocessing step of our paper, the DRIVE dataset is used for blood vessel segmentation, the E-optha dataset is used for extracting lesions. And the Kaggle, Messidor dataset are used for training the classification of referral and non-referral, and IDiRD1 is used for testing. Figure 6 demonstrates the T-SNE (t-distributed stochastic neighbor embedding) visualization distributions of different data augmentation methods. The blue points represent the real non-referral data and the orange points represent the augmentation data. It can be seen that the data augmentation methods, such as under-sampling, oversampling, cropping and rotation, do not change the distribution of the database. In contrast, the generated image method can change the distribution of the dataset. In Fig. 7, the left picture shows the T-SNE distribution images formed by 500 generated referral fundus images, 500 real referrals fundus images, and 500 real non-referrals fundus images. The picture on the right shows 2633 real non-referral and 3000 generated fundus images T-NSE distribution. We can observe that the distribution of the generated images and the real images are different, the generated images had more wide spread distribution. These also explain why under-sampling, oversampling, cropping and Rotation methods have a poor effect on classification accuracy (Table 5).

Fig. 6. The T-SNE distribution of dataset with different data augmentation methods. (a) the T-SNE distribution of dataset with under-sampling; (b) the T-SNE distribution of dataset with oversampling; (c) the T-SNE distribution of dataset with adding real images; (d) the T-SNE distribution of dataset with cropping and rotation; (e) the T-SNE distribution of dataset with only 3000 generated images; (f) the T-SNE distribution of dataset with real images and 3000 generated images.

Download Full Size | PDF

Fig. 7. The T-SNE distribution images of real samples and generated samples. (a) the T-SNE distribution of 500 generated samples, 500 real referral samples and 500 non-referred samples (b) the distributions of 517+3000 generated samples and 2633 real non-referral samples.

Download Full Size | PDF

In order to better demonstrate the effectiveness of the synthetic images based on the MLI method makes in the diabetic retinopathy screening, we conducted two experiments. One is Inception-V3 with end to end training strategy model classification with samples increased, and another is Inception-V3 with transfer learning strategy model classification with samples increased. The different Inception-V3 model classification results with different samples are shown in Table 2 and Table 3. We can observe that the practical classification error is ignored with $\pm$ standard deviation. And we can observe that the Inception-V3 with transfer learning strategy has better performance than the Inception-V3 with end to end training strategy. Without any data augmentation strategy, the Inception-V3 with transfer learning strategy model classification results achieved that the ACC is 90.8%, AUC is 96.32%, SEN is 88.39% and SPE is 92.75%. All the evaluation metrics are higher than the metrics of the Inception-V3 with end to end training strategy. With the MLI generated samples increasing, the accuracy and AUC values both increased. When adding 3000 samples, the CAD system achieved the best performance, the ACC is 95.4%, with the ACC increasement of 4.6%. The SEN is 91.61%, with the SEN increasement of 3.22%. Inception-V3 with transfer learning strategy is the best-performing network structure, which can be used as the classifier of the referral and non-referral binary classification.

Table 2. Inception-v3 with end to end training strategy model classification results with samples increasing

View Table | View all tables in this article

Table 3. Inception-V3 with transfer learning strategy model classification results with samples increasing

View Table | View all tables in this article

Table 4. The number of the training samples from comparison experiments in this study

View Table | View all tables in this article

Finally, we conduct the comparative experiments of the MLI method with the other traditional data augmentation methods and the Inception-V3 with transfer learning strategy, the epoch is set to the 50. The number of training samples from comparison experiments is shown in Table 4. Table 5 and Fig. 8 show the classification results and Confusion Matrixes of different data augmentation methods. Data augmentation methods such as oversampling, adding other real samples, cropping, and rotation both improve the sensitivity or specificity of the CAD system. Especially adding newly generated samples with real samples, the sensitivity has nearly achieved an increment of 19.35% compared to the under-sampling method. Under-sampling and only adding generated samples do not change the performance of the CAD system. When 3000 generated images are used as training samples, the sensitivity of the CAD system is terrible, which further indicated that the generated samples cannot replace real samples. The possible reason for the poor effect when using the generated image alone as the training set (Table 5 only generated 3000 images) is that the original training images are from the Kaggle database, the test images are from IDiRD database, images these two databases share similar data distribution. The generated images are from the Messidor database, there are great differences between the Messidor database and the IDiRD database, so that only using the generated samples does not change the distribution of the original data. In summary, by using the real referral images (517), the data distribution of fake referral images (3000) can be corrected to give full play to the effect of the generated images (Table 5 Real images with 3000 generated images), the MLI method got the best performance, and it is better than other data augmentation methods.

Fig. 8. Confusion Matrixes of different data augmentation methods. (a) Under-sampling, (b) Oversampling, (c) Cropping and Rotation, (d) Adding real images, (e) Only 3000 generated images, (f) Real images with 3000 generated images, (g) End to end Inception-V3 with 3000 generated fundus images, (h) Transfer learning Inception-V3 with 0 generated fundus images.

Download Full Size | PDF

Table 5. Different data augmentation methods comparison with transfer learning inception-V3

View Table | View all tables in this article

5. Discussion

In this paper, a new data augmentation method with multiple lesions insertion was proposed. As discussed above, MLI method has been demonstrated as a new data augmentation method which can boost the performance of the DR screening. There are also some limitations in this study.

1. The different healthy images and insert location are uncontrollable. All the generated samples are stochastic sampling from these limited healthy fundus images. Same lesions appear at different healthy fundus images and different locations may bring some system errors to the final CAD result. We will consider the different MLI methods combined with GAN method to obtain better synthesize fundus images by adding some position and shape size constraints. In the future work, the number of lesion templates from more databases will be adopted in order to improve the data multiplicity, which can improve the performance of the CAD system.
2. In this paper, we only carried out the binary classification of referral and non-referral for the MLI data augmentation method. In our following work, we will focus on the influence of data augmentation on the staging classification in DR fundus images.
3. As suggested in this paper, transfer learning from the large image dataset to the fundus images classification has benefited from our experiments. This provides some enlightenments on cross-dataset CNN learning in the medical image, we can apply the pre-trained model to the target medical image dataset to obtain a more accurate classification model through fine-tuning and training.
4. Limited dataset may impede the development of the CADs, and data augmentation will definitely become an effective method to solve this problem. But, the number of generated Despites on the limits of calculation, we did not make more than method, in future work, more than 5000 generated images and the multi-category task will be conducted for the further comparative study. Furthermore, how many images generated depends on the performance of the specific classification tasks, how many generated images can be chosen as the most appropriate number of generated images to improve the performance of the CAD system needs to be further discussed. T-SNE analysis is suggested to be applied to observe the distribution of generated samples and original samples. And it is recommended to use different numbers of generated data for augmentation study.

6. Conclusions

In this work, we propose a new fundus image synthesis method called Multiple Lesions Insertion (MLI) which combines real exudates, hemorrhages, microaneurysms patches into a new healthy fundus image with Poisson editing. The generated samples by MLI achieved good results with the realistic texture details. We first demonstrate the effectiveness of adding synthetic samples based on MLI method boosting DR screening. All the experiments show that the diagnosis accuracy can contrast lesion templates which influence the classification accuracy. In our experiment, 3000 generated fundus images with increasing the training samples achieve the best performance. The MLI method has a good effect in generating medical images, it can be used as an auxiliary method for fundus lesions images generation and other medical image generation tasks with a wide range of applications.

Funding

Fudan University (20275); National Key Instrumentation Development Project of China (2013YQ030651); National Natural Science Foundation of China (61875123, 81421004); Shandong Key Laboratory of Intelligent Buildings Technology (SDIBT202006); Independent Research fund of Key Laboratory of Industrial Dust Prevention and Control & Occupational Health and Safety, Ministry of Education (EK20201003).

Acknowledgment

Thanks to Shujin Zhu for their suggestions on article revision. Yanye Lu (yanye.lu@pku.edu.cn) is also the joint corresponding author of the article.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. Y. Niu, L. Gu, F. Lu, F. Lv, Z. Wang, T. Sato, Z. Zhang, Y. Xiao, X. Dai, and T. Cheng, “Pathological evidence exploration in deep retinal image diagnosis,” in Proc. AAAI conf. Tech. Applications., pp. 1093–1101, Jul. 2019.

2. V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner, T. Madams, J. Cuadros, R. Kim, R. Raman, P. C. Nelson, J. L. Mega, and D. R. Webster, “Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs,” JAMA 316(22), 2402–2410 (2016). [CrossRef]

3. Z. Gao, J. Li, J. Guo, Y. Chen, Z. Yi, and J. Zhon, “Diagnosis of diabetic retinopathy using deep neural networks,” IEEE Access 7, 3360–3370 (2019). [CrossRef]

4. S. Resnikoff, W. Felch, T. M. Gauthier, and B. Spivey, “The number of ophthalmologists in practice and training worldwide: a growing gap despite more than 200 000 practitioners,” Br. J. Ophthalmol. 96(6), 783–787 (2012). [CrossRef]

5. D. S. Fong, L. Aiello, T. W. Gardner, G. L. King, G. Blankenship, J. D. Cavallerano, F. L. Ferris III, and R. Klein, “Retinopathy in diabetes,” Diabetes Care 27(Supplement 1), S84–S87 (2004). [CrossRef]

6. J. Li, S. Wang, X. Han, G. Zhang, M. Zhao, and L. Ma, “Spatiotemporal trends and influence factors of global diabetes prevalence in recent years,” Soc. Sci. Med. 256, 113062 (2020). [CrossRef]

7. N. Zhu and Z. Y. U. C. Kou, “A new deep neural architecture search pipeline for face recognition,” IEEE Access 8, 91303–91310 (2020). [CrossRef]

8. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). [CrossRef]

9. C. Kou, W. Li, W. Liang, Z. Yu, and J. Hao, “Microaneurysms segmentation with a U-Net based on recurrent residual convolutional neural network,” J. Med. Imaging 6(02), 1 (2019). [CrossRef]

10. Z. Jiang, Z. Yu, S. Feng, Z. Huang, Y. Peng, J. Guo, Q. Ren, and Y. Lu, “A super-resolution method-based pipeline for fundus fluorescein angiography imaging,” Biomed. Eng. Online 17(1), 125 (2018). [CrossRef]

11. N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: synthetic minority over-sampling technique,” J. Artif. Intell. Res. 16(1), 321–357 (2002). [CrossRef]

12. L. C. Storoni, A. J. McCoy, and R. J. Read, “Likelihood-enhanced fast rotation functions,” Acta Crystallogr. Sect. D-Struct. Biol. 60(3), 432–438 (2004). [CrossRef]

13. J. S. Jie, W. Ping, J. Peiyi, and H. S. Ping, “Research on data augmentation for image classification based on convolution neural networks,” in Proceedings of Chinese Automation Congress (CAC), IEEE, Jinan, ShanDong, China, 2017, pp. 4165–4170.

14. J. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017, pp. 2242–2251.

15. Y. Z Y. Li, Y. Zhu, and X. Hu, “Wasserstein GAN based on Autoencoder with back-translation for cross-lingual embedding mappings,” Pattern Recogn. Lett. 129, 311–316 (2020). [CrossRef]

16. X. Miao, M. Ian, S. Yao, and Y. Yan, “Improved conditional generative adversarial net classification for spoken language recognition,” in Proceedings of IEEE Spoken Language Technology Workshop (SLT). Lang. Technol. Workshop, SLT - Proc., Athens, Greece, 2018, pp. 98–104.

17. P. Pérez, M. Gangnet, and A. Blake, “Poisson Image Editing,” ACM Trans. Graph. 22(3), 313–318 (2003). [CrossRef]

18. M. Li, Y. M. Lu, S. H. Han, Z. B. Wu, and L. Cao, “Multiple transmission electron microscopic image stitching based on sift features,” JSMU 35(9), 1251–1257 (2015).

19. L. Ding and H. Ren, “Segmentation algorithm of medical exercise rehabilitation image based on HFCNN and IoT,” IEEE Access 7(99), 160829–160844 (2019). [CrossRef]

20. M. Kim, J. Yun, and Y. Cho, “Deep learning in medical imaging,” Neurospine 16(4), 657–668 (2019). [CrossRef]

21. A. N. Mathur, A. Khattar, and O. Sharma, “Multimodal medical volume colorization from 2D style,” arXiv:2004.11702, 2020.

22. J. M. Morel, A. B. Petro, and C. Sbert, “Fourier implementation of Poisson image editing,” Pattern Recognit. Lett. 33(3), 342–348 (2012). [CrossRef]

23. O. P. Verma, S Roy, S. C. Pandey, and M Mittal, “Advancement of machine intelligence in interactive medical image analysis,” in Algorithms for Intelligent Systems, Verma O P, Springer, 2020.

24. L. Hou, A. Agarwal, D. Samaras, T. M. Kurc, R. R. Gupta, and J. H. Saltz, “Unsupervised histopathology image synthesis,” arXiv preprint arXiv:1712.05021, 2017.

25. X. Li, H. Shen, M. Yu, X. Wei, J. Han, J. Zhu, J. Gao, Z. Liu, Y. Zhang, and R. Yu, “Blind image inpainting using pyramid GAN on thyroid ultrasound images,” in Proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, CA, USA, 2019, pp. 678–683.

26. M. Fridadar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan, “GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification,” Neurocomputing 321, 321–331 (2018). [CrossRef]

27. N. Ghassemi, A. Shoeibi, and M. Rouhani, “Deep neural network with generative adversarial networks pre-training for brain tumor classification based on MR images,” Biomed. Signal Process. Control 57, 101678 (2020). [CrossRef]

28. P. Costa, A. Galdran, M. I. Meyer, M. Niemeijer, M. Abrámoff, A. M. Mendonça, and A. Campilho, “End-to-end adversarial retinal image synthesis,” IEEE Trans. Med. Imaging 37(3), 781–791 (2018). [CrossRef]

29. H. Zhao, H. Li, and L. Cheng, “Synthesizing filamentary structured images with GANs,” arXiv:1706.02185, 2017.

30. Z. Yu, Q. Xiang, J. Meng, C. Kou, Q. Ren, and Y. Lu, “Retinal image synthesis from multiple-landmarks input with generative adversarial networks,” Biomed. Eng. Online 18(1), 62 (2019). [CrossRef]

31. Q. You, C. Wan, J. Sun, J. Shen, H. Ye, and Q. Yu, “Fundus image enhancement method based on CycleGAN,” in Proceedings of the 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, 2019, pp. 4500–4503.

32. Y. Zhou, B. Wang, X. He, S. Cui, and L. Shao, “DR-GAN: Conditional generative adversarial network for fine-grained lesion synthesis on diabetic retinopathy images,” arXiv:1912.04670, 2019.

33. C. Kou, W. Li, Z. Yu, and L. Yuan, “An enhanced residual U-Net for microaneurysms and exudates segmentation in fundus images,” IEEE Access 8, 185514–185525 (2020). [CrossRef]

34. H. Zhao, H. Li, S. Maurer-Stroh, and C. Li, “Synthesizing retinal and neuronal images with generative adversarial nets,” Med. Image Anal. 49, 14–26 (2018). [CrossRef]

35. A. Pezeshk, B. Sahiner, R. Zeng, A. Wunderlich, W. Chen, and N. Petrick, “Seamless insertion of pulmonary nodules in chest CT images,” IEEE Trans. Biomed. Eng. 62(12), 2812–2827 (2015). [CrossRef]

36. A. Pezeshk, N. Petrick, W. Chen, and B. Sahiner, “Seamless lesion insertion for data augmentation in CAD training,” IEEE Trans. Med. Imaging 36(4), 1005–1015 (2017). [CrossRef]

37. Z. Ghanian, A. Pezeshk, N. Petrick, and B. Sahiner, “Computational insertion of microcalcification clusters on mammograms: Reader differentiation from native clusters and computer-aided detection comparison,” J. Med. Imaging 5(04), 1 (2018). [CrossRef]

38. E. Decenciére, X. Zhang, G. Cazuguel, B. Lay, B. Cochener, C. Trone, G. Philippe, R. Ordonez, P. Massin, A. Erginay, B. Charton, and J. C. Klein, “Feedback on a publicly distributed image database: the messidor database,” Image Anal. Stereol. 33(3), 231–234 (2014). [CrossRef]

39. E. Decenciere, G. Cazuguel, X. Zhang, G. Thibault, J. C. Klein, F. Meyer, B. Marcotegui, G. Quellec, M. Lamard, R. Danno, D. Elie, P. Massin, Z. Viktor, A. Erginay, B. Lay, and A. Chabouis, “TeleOphta: machine learning and image processing methods for teleophthalmology,” IRBM 34(2), 196–203 (2013). [CrossRef]

40. S Dhivya, “Diabetic retinopathy detection from human retinal images using image processing techniques,” Int. J. Sci. Res. & Development 10(2), 615–626 (2017). [CrossRef]

41. P. Porwal, S. Pachade, R. Kamble, M. Kokare, G. Deshmukh, V. Sahasrabuddhe, and F. Meriaudeau, “Indian Diabetic Retinopathy Image Dataset (IDRiD): a database for diabetic retinopathy screening research,” Data 3(3), 25 (2018). [CrossRef]

42. J. Staal, M. D. Abramoff, M. Niemeijer, M. A. Viergever, and B. Van Ginneken, “Ridge-based vessel segmentation in color images of the retina,” IEEE Trans. Med. Imaging 23(4), 501–509 (2004). [CrossRef]

43. R. Sayres, A. Taly, E. Rahimy, K. Blumer, D. Coz, N. Hammel, J. Krause, A. Narayanaswamy, Z. Rastegar, D. Wu, S. Xu, S. Barb, A. Joseph, M. Shumski, J. Smith, A. B. Sood, G. S. Corrado, L. Peng, and D. R. Webster, “Using a deep learning algorithm and integrated gradients explanation to assist grading for diabetic retinopathy,” Ophthalmology 126(4), 552–564 (2019). [CrossRef]

44. S. Ramachandran, N. Strisciuglio, A. Vinekar, R. John, and G. Azzopardi, “U-COSFIRE filters for vessel tortuosity quantification with application to automated diagnosis of retinopathy of prematurity,” Neural Comput. Appl. 32(16), 12453–12468 (2020). [CrossRef]

45. N. Halder, D. Roy, R. Banerjee, P. Roy, P. P. Sarkar, and S. Bandyopadhyay, “Automatic detection and segmentation of optic disc (ADSO) of retinal fundus images based on mathematical morphology,” in Proceedings of National Conference on Emerging Trends on Sustainable Technology and Engineering Applications (NCETSTEA), Durgapur, India, 2020, no. 9119931.

46. E. Kaya, İ. Saritaş, and I. A. Ozkan, “Supervised segmentation of retinal vessel structures using ANN,” arXiv:2001.05549, 2020.

47. R. Olaf, F. Philipp, and B. Thomas, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-assisted Intervention, Munich, Germany, 2015, pp. 234–241.

48. W. Gómez, L. Leija, A. Alvarenga, A. Infantosi, and W. Pereira, “Computerized lesion segmentation of breast ultrasound based on marker-controlled watershed transformation,” Med. Phys. 37(1), 82–95 (2009). [CrossRef]

49. A. R. Alankrita, A. Shrivastava, and V. Bhateja, “Contrast improvement of cerebral mri features using combination of non-linear enhancement operator and morphological filter,” in Proceedings of (IEEE) International Conference on Network and Computational Intelligence (ICNCI 2011), Zhengzhou, China, 2011, pp. 182–187.

50. X. Xia, C. Xu, and B. Nan, “Inception-v3 for flower classification,” in Proceedings of 2nd International Conference on Image, Vision and Computing (ICIVC), Chengdu, China, 2017, pp. 783–787.

51. T. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision. Pattern Anal. Mach. Intell, vol. 42, no. 2, pp. 2980–2988.

Dataset	Resolution	Number of images	Include content
Dataset	Resolution	Number of images	Labels	Masks
Messidor [39]	1440 $\times$ 960	1200	0, 1, 2, 3, 4 stands for no DR, mild, moderate, severe, and proliferative DR risk of macular edema	hemorrhages
	2240 $\times$ 1488
	2304 $\times$ 1536
E-ophtha [40]	*	463	e-ophtha-MA	microaneurysms
E-ophtha [40]	*	463	e-ophtha-EX	exudates
Kaggle [41]	*	35621	0, 1, 2, 3, 4 stands for no DR, mild, moderate, severe, and proliferative DR	–
IDiRD [42]	768 $\times$ 584	516	0, 1, 2, 3, 4 stands for no DR, mild, moderate, severe, and proliferative DR	–
DRIVE [43]	565 $\times$ 584	40	–	blood vessels

Number of generated fundus images	0	1000	2000	3000	4000	4500
ACC	0.6795 $\pm$ 0.026	0.8321 $\pm$ 0.019	0.8412 $\pm$ 0.023	0.8652 $\pm$ 0.015	0.8482 $\pm$ 0.012	0.8148 $\pm$ 0.018
AUC	0.6858 $\pm$ 0.026	0.8724 $\pm$ 0.017	0.8865 $\pm$ 0.012	0.9075 $\pm$ 0.012	0.8812 $\pm$ 0.015	0.8322 $\pm$ 0.022
SEN	0.4452 $\pm$ 0.100	0.7713 $\pm$ 0.042	0.7859 $\pm$ 0.043	0.7824 $\pm$ 0.042	0.7742 $\pm$ 0.057	0.7501 $\pm$ 0.033
SPE	0.8676 $\pm$ 0.036	0.8648 $\pm$ 0.050	0.8855 $\pm$ 0.064	0.9317 $\pm$ 0.050	0.9077 $\pm$ 0.050	0.8667 $\pm$ 0.051

Number of generated fundus images	0	1000	2000	3000	4000	4500
ACC	0.908	0.8908	0.9195	0.954	0.9454	0.9454
AUC	0.9632	0.9843	0.9812	0.9893	0.9897	0.9915
SEN	0.8839	0.9613	0.9742	0.9161	0.9742	0.9613
SPE	0.9275	0.8342	0.8756	0.9848	0.9223	0.9326

Methods	None	Under-sampling	Over-sampling	Cropping and Rotation	Adding real images	Only 3000 generated images	Real images with 3000 generated images
ACC	0.908	0.7672	0.8939	0.8707	0.8879	0.7098	0.954
AUC	0.9632	0.8694	0.9716	0.982	0.9675	0.9502	0.9893
SEN	0.8839	0.7226	0.8387	0.929	0.9161	0.3484	0.9161
SPE	0.9275	0.6425	0.9378	0.8238	0.8653	1	0.9848

Dataset	Resolution	Number of images	Include content
Dataset	Resolution	Number of images	Labels	Masks
Messidor [39]	1440 $\times$ 960	1200	0, 1, 2, 3, 4 stands for no DR, mild, moderate, severe, and proliferative DR risk of macular edema	hemorrhages
	2240 $\times$ 1488
	2304 $\times$ 1536
E-ophtha [40]	*	463	e-ophtha-MA	microaneurysms
E-ophtha [40]	*	463	e-ophtha-EX	exudates
Kaggle [41]	*	35621	0, 1, 2, 3, 4 stands for no DR, mild, moderate, severe, and proliferative DR	–
IDiRD [42]	768 $\times$ 584	516	0, 1, 2, 3, 4 stands for no DR, mild, moderate, severe, and proliferative DR	–
DRIVE [43]	565 $\times$ 584	40	–	blood vessels

Multiple Lesions Insertion: boosting diabetic retinopathy screening through Poisson editing

Abstract

1. Introduction

2. Related work and dataset

2.1 Dataset

2.1.1 Messidor dataset

2.1.2 E-ophtha dataset

2.1.3 Kaggle dataset

2.1.4 IDiRD dataset

2.1.5 DRIVE dataset

2.2 Data cleaning and image pre-preprocessing

3. Methods

3.1 Structure extraction

3.2 Poisson image editing

3.3 Inception-V3 model

4. Experiment and results

4.1 Training protocols and evaluation metrics

4.2 Subjective visual quality evaluation

4.3 Quantitative evaluation

5. Discussion

6. Conclusions

Funding

Acknowledgment

Disclosures

References

Cited By

Figures (8)

Tables (5)

Equations (12)

Biomedical Optics Express

Data augmentation methods	the number of original data		the number of augmented data		the number of data used in our paper
	negative	positive	negative	positive	negative	positive
Oversampling	2633	517	0	2116	2633	2633
Under-sampling	517	517	0	0	517	517
Cropping and rotation	2633	517	0	3000	2633	3517
Adding real images	2633	517	1129	221	3762	738
3000 generated images	2633	517	0	3000	2633	3517