
FCE-Net: a fast image contrast enhancement method based on deep learning for biomedical optical images

Open Access

Abstract

Optical imaging is an important tool for exploring and understanding the structures of biological tissues. However, due to the heterogeneity of biological tissues, the intensity distribution of the signal is not uniform and contrast is normally degraded in the raw images, making them difficult to use directly for subsequent image analysis and information extraction. Here, we propose a fast image contrast enhancement method based on deep learning, called the Fast Contrast Enhancement Network (FCE-Net). We divided the network into a dual-path structure to simultaneously obtain spatial information and a large receptive field, and we introduced a spatial attention mechanism to enhance the inter-spatial relationship of features. We showed that cell counting on mouse brain images processed by FCE-Net achieved an average precision of 97.6% ± 1.6% and an average recall of 98.4% ± 1.4%. After processing with FCE-Net, images from the Digital Retinal Images for Vessel Extraction (DRIVE) dataset could be segmented with the spatial attention U-Net (SA-UNet) to achieve state-of-the-art performance. By comparing FCE-Net with previous methods, we demonstrated that FCE-Net achieves higher accuracy while maintaining a high processing speed. Images of 1024 × 1024 pixels could be processed by FCE-Net at 37 fps on our workstation. Our method has great potential for image analysis and information extraction from large-scale or dynamic biomedical optical images.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Optical imaging can achieve single-cell resolution, helping us build more realistic structural models of biological tissues at the mesoscopic scale [1,2]. We can use it to reveal the mechanisms of life in both physiology and pathology [3-5]. However, the quality of raw images from optical microscopes tends to be poor [6], because the signal intensity of samples is not always uniform and optical images are prone to degradation owing to the heterogeneity of biological tissues. Taking fluorescence imaging of the brain as an example, the signal intensity of fluorescence-labeled somas is tens or even hundreds of times that of axons and dendrites [7,8]. When we acquire images from blocks of tissue, as in whole-brain imaging or in-vivo imaging, somas may be blurred and axons may be suppressed into the background. Therefore, enhancement of the raw images is highly desirable before analysis and information extraction. In addition, the number of images acquired with optical imaging has increased dramatically in recent years, owing to the expanded capability of optical microscopes in speed, spatial resolution, and imaging volume. A fast image enhancement method is not only required for real-time observation but also desirable for processing large volumes of images [9].

Adjusting the grayscale of an image is an effective way to improve image quality, and the enhanced image is better suited for display and further analysis. Many classic image enhancement methods, such as gamma mapping and histogram equalization (HE), have been applied to biomedical images [10,11]. These methods perform global grayscale adjustment of the image. Although they are highly efficient, they cannot achieve adaptive enhancement of image contrast; in particular, they tend to enhance the background signal excessively when the parameters are not well selected. To address this issue, researchers divide the image into several regions and adjust the grayscale locally, as in adaptive histogram equalization (AHE) [12], contrast-limited adaptive histogram equalization (CLAHE) [13], and adaptive contrast enhancement (ACE) [14]. However, AHE often over-amplifies noise and causes blocking artifacts, and the CLAHE algorithm and its variants only alleviate these problems. The ACE algorithm is an adaptive method for enhancing contrast: the target signal is selected by low-pass filtering, and the high-frequency signal is then enhanced by multiplying it by an adaptive enhancement factor related to the local variance of the image. ACE can greatly improve image contrast, but its processing efficiency is extremely low, so it is impractical for heavy-duty image processing tasks. Although many more traditional algorithms were proposed in the following years [15-18], they struggle to guarantee image quality and high efficiency simultaneously.

In recent years, deep learning has played a major role in image processing within computer science. Different kinds of deep neural networks (DNNs) have been proposed, making networks deeper and more complex to solve practical problems, such as the convolutional neural network (CNN) [19-21], the generative adversarial network (GAN) [22,23], and the vision transformer (ViT) [24,25]. These networks have huge numbers of parameters, which limit their speed in both training and inference. However, in-vivo dynamic imaging and large-scale image datasets require fast processing, which implies that the network should have faster inference and fewer parameters. Therefore, many lightweight networks have been designed to trade off accuracy and efficiency in image processing. The strategies for lightweight networks can be broadly divided into two categories: 1) making the input image or network smaller to squeeze the parameter size [26-28], which limits the size of the input image or the network structure; and 2) multiscale spatial resolution fusion [29-31], in which fast up-sampling is carried out after multi-scale fusion, potentially discarding high spatial frequencies during up-sampling and losing details in the predicted results. Therefore, in biomedical image processing, researchers usually use the fully convolutional network (FCN) [32], U-Net [33], or their variants [34-37] to overcome the above drawbacks. Although these networks have the advantages of simple structure and high accuracy, they still have difficulty in fast processing of biomedical images.

Inspired by these networks, we designed a new network called the Fast Contrast Enhancement Network (FCE-Net). We combined the advantages of these typical networks, such as the dual-path mode and the skip connection, and we introduced the spatial attention module (SAM) [38] to obtain the inter-spatial relationship of features. In the first part of the network, we designed a dual-path structure with a separate spatial path and attention path to obtain spatial information and a large receptive field, respectively. In the second part of the network, we merge the two paths and recover the original image size. To restore the details lost in the up-sampling process, we concatenate the output with earlier features from the spatial path. The network adopts ResNet-18 as the backbone of the attention path and uses fewer convolution blocks. We used mouse brain images to validate FCE-Net. We compared FCE-Net with classic image enhancement algorithms, U-Net, and U-Net++ in terms of processing efficiency and image quality. The performance of FCE-Net in a cell counting task was also evaluated to demonstrate its utility in biomedical image analysis. Finally, we chose the public retinal fundus image dataset DRIVE to examine the feasibility of FCE-Net on a different kind of biomedical image.

2. Methods

2.1. Network architecture

In the enhancement of biomedical images, the resolution of the images should not be degraded. At the same time, the global correlation information between different pixels in the image should be preserved. Therefore, the network not only needs enough spatial information but also needs a large receptive field in each convolutional layer. U-Net, for example, commonly includes two convolution blocks in each sampling layer. These blocks contain many kernels with a size of $3 \times 3$ to obtain a large receptive field and preserve enough spatial information, at the cost of an increased number of parameters. FCE-Net differs from conventional networks that use a single-path encoder-decoder structure. As shown in Fig. 1, FCE-Net has upper and lower paths: the attention path has a large receptive field and is used to obtain correlation information between pixels of the image, while the spatial path preserves spatial information across layers. We then combine the two paths and decode the image back to its original size.


Fig. 1. Diagram of FCE-Net. (a) The main framework of the network. The first half of the network is separated into the spatial path and the attention path; the second half concatenates the results of the two paths and outputs the final result. (b) A detailed illustration of the Spatial Attention Module (SAM). The operations are represented by arrows with different colors.


In the attention path, to quickly obtain a large receptive field, we chose a pre-trained ResNet-18 as the backbone. ResNet-18 pre-trained on ImageNet can quickly produce deep feature maps and capture the correlation between adjacent pixels in each image. ResNet-18 uses max-pooling to ensure invariance and to suppress irrelevant information within each single channel. However, it ignores the strong correlation of image features across channels. To exploit the inter-spatial relationship of features, we appended the SAM at the tail of the path. SAM simulates the attention mechanism of organisms. Through max-pooling and average-pooling across channels, the features extracted by ResNet-18 are re-weighted to strengthen the signal regions and improve the prediction results. In SAM, we applied max-pooling and average-pooling to the input feature map along the channel axis, respectively, and concatenated the results along the channel axis. We used a convolutional block with a kernel size of $7 \times 7$ (see the ablation study in Section 3.1) to further enlarge the receptive field and merge the output channels into one. After the convolution layer and a sigmoid activation function, the output was multiplied with the input features to simulate human attention and increase the weight of the objects of interest in the image.
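As a concrete illustration, the SAM described above can be sketched in PyTorch as follows; the class and variable names are ours, and the $7 \times 7$ kernel follows the choice made in the ablation study.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention module (CBAM-style) as described in Section 2.1."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Max- and average-pool along the channel axis, then concatenate.
        max_pool, _ = torch.max(x, dim=1, keepdim=True)
        avg_pool = torch.mean(x, dim=1, keepdim=True)
        attn = torch.cat([max_pool, avg_pool], dim=1)   # (B, 2, H, W)
        attn = self.sigmoid(self.conv(attn))            # (B, 1, H, W) attention map
        # Re-weight the input features with the spatial attention map.
        return x * attn
```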

The other path, the spatial path, includes three convolutional blocks to encode spatial information. Because the first convolution block contains the most abundant high-frequency components of the image, we concatenated the output with the feature maps obtained from the first convolutional block, as in U-Net. This ensures that the recovered image matches the original size while keeping the details. Each convolutional block includes a convolutional layer with a stride of 2, a batch normalization layer, and a ReLU layer. The kernel size of the first convolutional layer is $7 \times 7$ and that of the others is $3 \times 3$. After the spatial path, the feature maps are 1/8 of the original image size and are concatenated with the output of the attention path.
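A minimal PyTorch sketch of the spatial path is given below; the channel widths are illustrative assumptions, since only the kernel sizes and strides are specified in the text.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel_size):
    """Convolution (stride 2) + batch normalization + ReLU, as in the spatial path."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride=2,
                  padding=kernel_size // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class SpatialPath(nn.Module):
    """Three convolution blocks; the output is 1/8 of the input size."""
    def __init__(self, in_ch=1):
        super().__init__()
        self.block1 = conv_block(in_ch, 64, kernel_size=7)  # richest high-frequency content
        self.block2 = conv_block(64, 64, kernel_size=3)
        self.block3 = conv_block(64, 128, kernel_size=3)

    def forward(self, x):
        f1 = self.block1(x)                # 1/2 resolution, reused later as a skip connection
        f8 = self.block3(self.block2(f1))  # 1/8 resolution
        return f1, f8
```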

We then up-sampled the output of the attention path by a factor of 2 to match the spatial path, and concatenated the outputs of the two paths along the channel axis. Next, we carried out two up-sampling steps: a fast 4× up-sampling whose result is concatenated with the features from the first convolutional block of the spatial path, recovering the high-frequency spatial information missing after the first step, and a second up-sampling that restores the output image to its original size. Finally, we used a $1 \times 1$ convolutional block to change the number of output channels back to the original number.
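The decoding stage can be summarized at the shape level as in the sketch below; the channel widths and the 1/16 resolution of the attention-path output are assumptions chosen so that the stated 2×, 4×, and final up-sampling steps restore the original size.

```python
import torch
import torch.nn.functional as F

B, H, W = 1, 512, 512
f1 = torch.randn(B, 64, H // 2, W // 2)       # first spatial-path block (skip connection)
f_sp = torch.randn(B, 128, H // 8, W // 8)    # spatial-path output
f_at = torch.randn(B, 256, H // 16, W // 16)  # attention-path output (ResNet-18 features + SAM)

# Up-sample the attention path by 2x to match the spatial path, then concatenate.
x = F.interpolate(f_at, scale_factor=2, mode="bilinear", align_corners=False)
x = torch.cat([f_sp, x], dim=1)               # (B, 384, H/8, W/8)

# Fast 4x up-sampling, then concatenate with the first-block features.
x = F.interpolate(x, scale_factor=4, mode="bilinear", align_corners=False)
x = torch.cat([x, f1], dim=1)                 # (B, 448, H/2, W/2)

# Final up-sampling restores the original size; a 1x1 conv maps back to one channel.
x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
out = torch.nn.Conv2d(x.shape[1], 1, kernel_size=1)(x)
print(out.shape)                              # torch.Size([1, 1, 512, 512])
```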

2.2. Datasets acquisition

We imaged a Thy1-EGFP mouse brain using the time-delay integration fluorescence micro-optical sectioning tomography (TDI-fMOST) system. A complete whole-brain dataset was acquired for the subsequent experiments. The TDI-fMOST system, reported in our previous article [39], is a three-dimensional imaging system that combines mechanical sectioning with a line-scanning microscope. The sample processing procedure can be outlined as follows [40]: after anesthesia, the mouse was perfused with 0.01 M phosphate buffered saline (PBS, Sigma-Aldrich) and 4% paraformaldehyde (PFA, Sigma-Aldrich). The mouse brain was then removed and fixed again in 4% PFA for 24 h. The sample was washed repeatedly with 0.01 M PBS. After that, the mouse brain was dehydrated with ethanol and soaked in resin. Finally, the mouse brain was placed in a drying oven to polymerize, under vacuum at about 50 °C. All animal experiments followed the procedures approved by the Animal Ethics Committee of Huazhong University of Science and Technology.

As mentioned above, the ACE algorithm can enhance image contrast well, but its processing efficiency is extremely low. To overcome this limitation, we used FCE-Net to learn its contrast-enhancement ability while improving efficiency. We processed the raw mouse brain images with the ACE algorithm [14], and the enhanced images were treated as the ground truth for subsequent training. The principle of the ACE algorithm is as follows.

$${I_{ij}} = {G_{ij}}({O_{ij}} - {M_{ij}}) + {M_{ij}}$$
$${G_{ij}} = \left\{ \begin{array}{c} A\frac{M}{{{\sigma_{ij}}}},{G_{ij}} < Th\\ Th,{G_{ij}} \ge Th \end{array} \right.$$

In Eq. (1), $O_{ij}$ is the gray value of the raw image, $M_{ij}$ is the local mean (the output of the mean filter), and $I_{ij}$ is the enhanced image. $G_{ij}$ is the adaptive coefficient, calculated using Eq. (2). $A$ is the enhancement factor, $M$ is the global mean of the image, and $\sigma_{ij}$ is the local variance centered at $O_{ij}$ with a window size of $2N+1$. $Th$ is the threshold used to avoid excessive enhancement. ACE can adaptively enhance the signal of the image. The parameters that need to be tuned are the window size for mean filtering and local variance calculation, the enhancement factor $A$, and the threshold $Th$.
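For reference, a minimal NumPy/SciPy re-implementation of Eqs. (1)-(2) might look like the sketch below; the function name is ours, and $\sigma_{ij}$ is computed here as the local standard deviation, the usual ACE convention.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ace_enhance(img, N=20, A=10, Th=5):
    """Adaptive contrast enhancement, Eqs. (1)-(2); the window size is 2N+1."""
    img = img.astype(np.float64)
    win = 2 * N + 1
    local_mean = uniform_filter(img, size=win)                    # M_ij
    local_var = uniform_filter(img ** 2, size=win) - local_mean ** 2
    local_std = np.sqrt(np.clip(local_var, 1e-12, None))          # sigma_ij
    G = np.minimum(A * img.mean() / local_std, Th)                # gain, clipped at Th (Eq. 2)
    return G * (img - local_mean) + local_mean                    # I_ij (Eq. 1)
```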

In the training of FCE-Net, we sampled the whole-brain images at an interval of 100 $\mathrm{\mu}$m along the anterior-posterior axis of the brain. The images were cropped to a size of $512 \times 512$ pixels, yielding 4500 images. These images were divided into four categories according to their global mean. In our experiment, images with an average gray value of less than 10 were regarded as level 1; an average gray value between 10 and 20 was considered level 2; level 3 covered average gray values between 20 and 70; and the images with the highest average gray values were regarded as level 4. The parameters were selected according to the improvement of image contrast and visualization in a pre-experiment. We used the parameters N = 20, A = 10, Th = 5; N = 20, A = 7, Th = 5; and N = 20, A = 2, Th = 10 to process the first three categories of images with the ACE algorithm, respectively. The level 4 images were not processed. The original images and the enhanced images were used to train FCE-Net end-to-end. Figure 2 shows three original images with different grayscale levels from the mouse brain dataset and the corresponding images processed by the ACE algorithm. Figure 2(a)-(c) are the raw images, and Fig. 2(d)-(f) are the images processed with ACE (ground truth).
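The level-dependent parameter choice described above can be expressed as a simple lookup; this is an illustrative sketch with a hypothetical function name.

```python
def ace_params_for_level(mean_gray):
    """Return the ACE parameters used for an image, based on its average gray value."""
    if mean_gray < 10:        # level 1
        return dict(N=20, A=10, Th=5)
    if mean_gray < 20:        # level 2
        return dict(N=20, A=7, Th=5)
    if mean_gray < 70:        # level 3
        return dict(N=20, A=2, Th=10)
    return None               # level 4: left unprocessed
```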


Fig. 2. The images were divided into four levels according to their average gray level. Different parameters were applied to the first three levels for ACE processing, while the fourth level was not processed (not shown). (a)-(c) are the raw images with three different brightness levels and (d)-(f) are the results processed by the traditional ACE algorithm. Scale bar, 50 µm.


2.3. Implementation details

The FCE-Net architecture was implemented using PyTorch, an open-source deep learning software package. To iteratively update the weights and biases, we used the adaptive moment estimation (Adam) optimizer [41] with MSE loss as the loss function during the training of FCE-Net. For each image dataset, the ratio of training, validation, and test data was set to 18:1:1. The PC hardware used for network training and blind testing comprises an Intel CPU with 8 cores, 16 threads, and a 3.70 GHz clock speed, 64 GB RAM, and an NVIDIA GeForce RTX 3090 GPU (24 GB RAM). Our dataset consists of about 4500 pairs of mouse brain images of $512 \times 512$ pixels. The training time of each epoch was about 1.98 min, and the total training took ∼3.3 h for ∼100 epochs on average. Previous lightweight networks mainly captured feature correlations within each single channel, whereas FCE-Net can quickly capture global spatial correlations and improve the accuracy of the prediction results with a small number of parameters. It is worth mentioning that we trained the network and obtained its weights on images of the mouse brain, and we used the same weights when processing the retinal images. The source code for training FCE-Net is available in a public repository on GitHub, https://github.com/dooooordie/FCE-Net_code.git.
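A minimal training sketch consistent with these settings is shown below; `FCENet`, `train_loader`, and the learning rate are assumptions, since only the optimizer and the loss function are specified above.

```python
import torch
import torch.nn as nn

# Assumed to be defined elsewhere: FCENet (the model from Section 2.1) and
# train_loader, yielding (raw image, ACE-enhanced ground truth) pairs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = FCENet().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate is an assumption
criterion = nn.MSELoss()

for epoch in range(100):                     # ~100 epochs, as reported above
    model.train()
    for raw, enhanced in train_loader:
        raw, enhanced = raw.to(device), enhanced.to(device)
        optimizer.zero_grad()
        loss = criterion(model(raw), enhanced)
        loss.backward()
        optimizer.step()
```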

3. Results

3.1. Ablation experiments

To validate the network design, we compared the network performance without SAM; with one, two, and three SAMs; and with SAM convolution kernels of size $3 \times 3$, $5 \times 5$, $7 \times 7$, and $9 \times 9$. We randomly cropped 300 images of size $512 \times 512$ as a test set. The peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) were used to evaluate performance. PSNR is a common index for evaluating image quality based on the difference between corresponding pixels, which is calculated from the mean square error (MSE). We can calculate the MSE and PSNR as follows:

$$MSE = \frac{1}{mn}\sum\limits_{i = 0}^{m} \sum\limits_{j = 0}^{n} {\|T(i,j) - I(i,j)\|^2}$$
$$PSNR = 10{\log _{10}}(\frac{{{{({2^b} - 1)}^2}}}{{MSE}})$$
where $I$ denotes the network output, $T$ denotes the corresponding ground truth, $m$ and $n$ are the dimensions of the image, and $b$ is the bit depth of the image.

SSIM refers to the structural similarity, which is used to measure the similarity of two images. The closer the SSIM value is to 1, the less distortion the image has. Equation (5) shows how SSIM is calculated:

$$SSIM = \frac{{(2{\mu _T}{\mu _I} + {C_1})(2{\sigma _{T,I}} + {C_2})}}{{(\mu _T^2 + \mu _I^2 + {C_1})(\sigma _T^2 + \sigma _I^2 + {C_2})}}$$
where $\mu_T$ and $\mu_I$ are the mean values of T and I, respectively; $\sigma_T$ and $\sigma_I$ are the standard deviations of T and I, respectively; and $\sigma_{T,I}$ is the covariance of T and I. $C_1$ and $C_2$ are constants that keep the denominator away from zero. In our manuscript, $C_1$ is 0.01 and $C_2$ is 0.03.

The specific values are shown in Table 1. Increasing the number of SAMs and enlarging the convolution kernel improve the PSNR and SSIM. When the number of SAMs reaches three, the additional improvement in network performance is relatively small; therefore, we chose two SAMs in this article. After determining the number of SAMs, we verified the effect of the convolution kernel size on performance. When the kernel size reaches $7 \times 7$, the SSIM and PSNR are close to their maximum. To balance network performance and inference speed, the parameters we finally used are two SAMs and a $7 \times 7$ convolution kernel in the SAM layers.
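For reference, both metrics can be computed with scikit-image, which implements the same definitions as Eqs. (3)-(5); this is an illustrative sketch assuming 8-bit images.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(output, target):
    """PSNR (Eq. 4) and SSIM (Eq. 5) between a network output and its ground truth.
    Both images are assumed to be 8-bit arrays of the same shape."""
    psnr = peak_signal_noise_ratio(target, output, data_range=255)
    # scikit-image's default constants K1 = 0.01 and K2 = 0.03 correspond to the
    # values quoted in the text.
    ssim = structural_similarity(target, output, data_range=255)
    return psnr, ssim
```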


Table 1. PSNR and SSIM in ablation experiment

3.2. Contrast enhancement

To validate FCE-Net, we selected typical images from the whole-brain dataset. The original image shown in Fig. 3(a) contains the soma and dendrites of a neuron in the mouse brain. The brightness of the soma is high, while the surrounding fibers are faint. A reasonable visualization of the data cannot be achieved with a simple contrast adjustment, and low-grayscale information is difficult to extract and analyze. We then processed the image using the following algorithms. CLAHE was applied with the “adapthisteq” function in MATLAB R2017a (License No. 40588452). We built the U-Net and U-Net++ architectures and trained them on the same hardware and dataset as FCE-Net. The parameters and details of the U-Net we used are consistent with Ref. [33]. The filter sizes of ${X^{[0,0]}}$ to ${X^{[4,0]}}$ in U-Net++ are 64, 128, 256, 512, and 1024, and the other hyper-parameters are consistent with the original work [34]. Figure 3(b)-(f) show the enhancement results of the CLAHE, U-Net, U-Net++, FCE-Net, and ACE algorithms, respectively. Figure 3(g)-(l) are the enlarged images located at the yellow box of Fig. 3(a)-(f), respectively. The profile of the image at the yellow dotted line is shown in Fig. 3(m), and the profile at the purple dotted line is shown in Fig. 3(n). In this experiment, the weak signals in the image become visible after enhancement. The enhancement from CLAHE is not obvious and the background signal is over-amplified. The enhancement results of U-Net, U-Net++, FCE-Net, and ACE are visually similar. These results show that our network can output high-quality enhanced images.


Fig. 3. Results of four image enhancement methods. (g)-(l) are enlarged views of the yellow box in (a)-(f) respectively. (m) is the normalized gray curve at the yellow dotted line of (g)-(l). (n) is the normalized gray curve at the purple dotted line of (g)-(l). Scale bar, 40 µm, 20 µm.


3.3. Efficiency comparison

Here, we used the same workstation to test the processing time, number of parameters, and memory footprint of four methods (U-Net, U-Net++, FCE-Net, and traditional ACE). The experimental images are three groups of mouse brain images with different sizes: $512 \times 512$, $1024 \times 1024$, and $2048 \times 2048$, with 20 images per size. The reported processing time is the average over the 20 images, as shown in Table 2. FCE-Net has the highest processing speed: for $1024 \times 1024$ images it is 3.6 times faster than U-Net, 14.8 times faster than U-Net++, and 1200 times faster than traditional ACE. This is mainly because the total number of parameters and the memory occupied by the whole network of FCE-Net are about 1/3 of those of U-Net and 1/5 of those of U-Net++. For $2048 \times 2048$ images, U-Net++ cannot even be run on the NVIDIA GeForce RTX 3090 GPU (24 GB RAM). By calculation, the image processing frame rates for $512 \times 512$ and $1024 \times 1024$ images reach 67 fps and 37 fps, respectively, using FCE-Net, which is sufficient for video-rate observation.


Table 2. Times and volume of network comparison

3.4. Quality evaluation

To quantitatively evaluate the performance of FCE-Net on the whole-brain dataset, we used the test set described in the ablation experiments. Images processed with the ACE algorithm served as the ground truth. We then compared the results of U-Net, U-Net++, and FCE-Net using PSNR and SSIM.

The statistical results are shown in Fig. 4. The average SSIM and PSNR of the images enhanced by U-Net are 0.931 and 37.369, respectively; those of U-Net++ are 0.956 and 38.279; and those of FCE-Net are 0.970 and 36.510. The images processed by FCE-Net are close to those of U-Net in PSNR and better than both U-Net and U-Net++ in SSIM. These results demonstrate that FCE-Net performs well on images with different features.


Fig. 4. Quantitative performance of FCE-Net and U-Net++ across the dataset. PSNR, peak signal to noise ratio; SSIM, structural similarity index.


3.5. Soma localization and counting

Soma recognition and localization are critical for quantifying the distribution of specific neurons in the whole brain [42]. Since our method can strongly enhance weak signals, it may improve the performance of this task. In the experiment, we randomly selected seven data cubes with a size of $500 \times 500 \times 500$ µm³ from the whole-brain dataset for evaluation and verification. The voxel resolution of each image is $1 \times 1 \times 2$ µm³. The NeuroGPS software [43] was used for automatic cell counting on both the raw images and the images pre-processed with FCE-Net. We then manually counted the raw data as the ground truth. Figures 5(a) and 5(b) show the 3D images reconstructed from the raw data and from the data enhanced by FCE-Net, respectively. The red dots in the figures show the positions of the soma centers. The yellow arrows indicate the somas that were missed by NeuroGPS in both datasets. The red arrows indicate the somas that were missed by the software in the raw data but identified in the enhanced data. Since the signal of the somas at these positions is very low in the raw images, they cannot be easily distinguished from the background, even with cutting-edge cell counting software. After processing with FCE-Net, the brightness of those somas was elevated, and the software could recognize them effectively. According to the statistics over the 7 data cubes, the average precision and recall rate for the raw data were 98.5% ± 1.5% (n = 7) and 92.5% ± 5.7% (n = 7), whereas for the data processed by FCE-Net they were 97.6% ± 1.6% (n = 7) and 98.4% ± 1.4% (n = 7). The precision of cell counting decreased slightly, by about 1%, after enhancement, but FCE-Net markedly improved the recall of cell counting, by about 6%. Processing each data cube with FCE-Net took only about 3 s, which is almost negligible in the three-dimensional cell counting task.


Fig. 5. Comparison of the performance in cell counting with and without FCE-Net. Automatically locating soma centers using NeuroGPS in the (a) Raw data and (b) enhanced data (red dots). Red arrows indicate somas that were not correctly identified in the raw data block, but identified using FCE-Net. All of the missed somas in both data blocks are marked by yellow arrows. (c) Precision and (d) recall rate for both raw and enhanced data.


3.6. Vascular segmentation

To validate the generalization of FCE-Net, we chose the public retinal fundus image dataset DRIVE and the task of retinal vessel segmentation. The DRIVE dataset contains normal and abnormal fundus vascular images. Diabetic retinopathy can be screened by observing the vascular distribution of the fundus, but it is difficult to retrieve effective information directly from the background of the images. We used FCE-Net to enhance the contrast of the DRIVE images, with the network weights still derived from the mouse brain dataset as described in Section 2. We chose images No. 28 and No. 36 from the DRIVE training set as blind test images, as shown in Fig. 6. In Fig. 6(a1)-(a2) and Fig. 6(d1)-(d2), we show the raw images and the images processed with FCE-Net. Some capillaries are hard to recognize in the raw images but can be clearly distinguished after processing with FCE-Net. Next, as shown in Fig. 6(a3)-(a6) and Fig. 6(d3)-(d6), we compared two segmentation methods on the raw and enhanced images: Otsu, a method commonly used for threshold segmentation [44], and a deep learning method with strong performance in vascular segmentation, the spatial attention U-Net (SA-UNet) [37]. It is worth mentioning that, to boost the performance of threshold segmentation, we applied the Subtract Background operation in Fiji (an open-source image processing package) to both the raw and enhanced images; we then determined the optimal threshold with Otsu's method and binarized the images. SA-UNet segmented the raw and enhanced images directly. We took the manual segmentation of the vasculature as the ground truth, as shown in Fig. 6(a7) and (d7). We selected two typical areas in Fig. 6(a) and (d) and marked them with red and yellow dashed boxes, respectively; their enlarged details are shown in Fig. 6(b)-(c) and (e)-(f). From these details, we can clearly see that, regardless of the image pair or method, FCE-Net improves the segmentation. From left to right in the same row of Fig. 6, the number of segmented small blood vessels increases and becomes more consistent with the ground truth. SA-UNet combined with FCE-Net produced the most accurate segmentation results, retaining more retinal blood vessels. These results show that FCE-Net can further improve the performance of a state-of-the-art model in the retinal vascular segmentation task.
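A sketch of this thresholding baseline is shown below; `rolling_ball` from scikit-image is used here as a stand-in for Fiji's Subtract Background (both are rolling-ball background estimators), and the radius value is illustrative.

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.restoration import rolling_ball

def segment_vessels_threshold(enhanced, radius=50):
    """Background subtraction followed by Otsu binarization (the thresholding baseline)."""
    img = enhanced.astype(np.float64)
    background = rolling_ball(img, radius=radius)    # rolling-ball background estimate
    foreground = img - background
    return foreground > threshold_otsu(foreground)   # binary vessel mask
```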


Fig. 6. Results of vascular segmentation. (a1)-(a2) and (d1)-(d2) show the raw images and images processed with FCE-Net. (a3) and (d3) are the segmentation results of the raw images using Otsu. (a4) and (d4) are the segmentation results of the images using Otsu combined with FCE-Net. (a5) and (d5) are the segmentation results of the raw images using SA-UNet. (a6) and (d6) are the segmentation results of images using SA-UNet combined with FCE-Net. (a7) and (d7) are the manual segmentation results used as the ground truth. Two typical areas in (a) and (d) are marked with yellow and red dashed boxes, respectively. Their enlarged details are shown in (b)-(c) and (e)-(f).


To evaluate these methods, we compared the results of each method with the ground truth. We counted the result for each pixel and produced the confusion matrix. According to the confusion matrix, the results were evaluated with five metrics: accuracy (ACC), area under the ROC curve (AUC), F1-score (F1), Matthews correlation coefficient (MCC), and intersection over union (IOU). Since this task can be regarded as binary classification, each pixel is counted as a true positive (TP), false positive (FP), false negative (FN), or true negative (TN). Based on these four quantities, we can calculate the metrics as follows:

$$ACC = \frac{{TP + TN}}{{TP + FP + FN + TN}}$$
$$F1 = \frac{{2 \times precision \times recall}}{{precision + recall}}$$
$$MCC = \frac{{TP \times TN - FP \times FN}}{{\sqrt {(TP + FP) \times (TP + FN) \times (TN + FP) \times (TN + FN)} }}$$
$$IOU = \frac{{TP}}{{TP + FP + FN}}$$

The ROC curve is plotted from $T{P_{rate}}$ and $F{P_{rate}}$, where $precision = \frac{{TP}}{{TP + FP}}$, $recall = \frac{{TP}}{{TP + FN}}$, $T{P_{rate}} = \frac{{TP}}{{TP + FN}}$, and $F{P_{rate}} = \frac{{FP}}{{FP + TN}}$. The five scores of segmentation performance with and without FCE-Net, for both the thresholding method and SA-UNet, are shown in Table 3. Segmentation after enhancement with FCE-Net performs better than segmentation without enhancement on every score. These results further demonstrate the effectiveness of FCE-Net on different types of biological samples.
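As a concrete illustration, the confusion-matrix-based scores of Eqs. (6)-(9) can be computed from binary masks as follows; the AUC is omitted because it requires the soft (probability) prediction rather than a binary mask.

```python
import numpy as np

def segmentation_scores(pred, gt):
    """ACC, F1, MCC, and IOU (Eqs. 6-9) from binary prediction and ground-truth masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = float(np.sum(pred & gt))
    fp = float(np.sum(pred & ~gt))
    fn = float(np.sum(~pred & gt))
    tn = float(np.sum(~pred & ~gt))
    acc = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    iou = tp / (tp + fp + fn)
    return {"ACC": acc, "F1": f1, "MCC": mcc, "IOU": iou}
```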


Table 3. Segmentation evaluation of raw image and enhanced image

4. Discussion and conclusion

In this work, we proposed a fast enhancement method for biomedical optical images based on deep learning, called FCE-Net. Generally, spatial information and receptive field had to be traded off in previous networks. In FCE-Net, we separated the spatial path and the attention path to obtain spatial information and a large receptive field at the same time, and we introduced the SAM into the attention path to enhance the inter-spatial relationship of features. This not only reduces the total number of parameters but also improves the accuracy and generalization of the output. We used a Thy1-EGFP mouse brain as the sample to train and validate FCE-Net. The results indicated that FCE-Net outperforms CLAHE, U-Net, and U-Net++ in speed and is similar to U-Net++ in quality. Taking the image size of $512 \times 512$ as an example, FCE-Net maintains an SSIM of up to 0.965 at 67 fps, as shown in Table 2 and Fig. 4. In the cell counting task, the images processed by FCE-Net achieve a much better recall rate. In the segmentation of retinal blood vessels on the public retinal fundus image dataset DRIVE, FCE-Net further improves the results of the state-of-the-art segmentation method SA-UNet with little extra time cost.

In the FCE-Net training, we used the results of the ACE algorithm as the training dataset, which can be easily acquired. Undoubtedly, better training images lead to better performance in contrast enhancement. In our research, however, training FCE-Net with images processed by ACE promises results similar to those of cutting-edge network-based algorithms, with higher processing speed. Furthermore, our training dataset contains results of the ACE algorithm with several sets of parameters, which mitigates the bias introduced by manually setting a single parameter set. This approach enables the network to learn to adapt to images with different intensity distributions.

In the cell counting task, somas and branches coexist in the images of the mouse brain. These branches were therefore also enhanced by FCE-Net, which leads to a slight decline in precision. Meanwhile, because the contrast of the somas was improved, the recall rate was high and stable in every data cube. For images with fewer interfering signals, FCE-Net may perform even better in this kind of task. In the vascular segmentation task, only 9%-14% of the pixels belong to blood vessels in the DRIVE dataset. Therefore, there is no significant difference between the raw images and the processed images in the segmentation evaluation scores. Nevertheless, we can still see from the images and scores that FCE-Net is effective in improving the segmentation accuracy.

In summary, FCE-Net has the ability to rapidly enhance large numbers of images. The generalization and the image quality after enhancement with FCE-Net indicate its great potential in biomedical image processing. In this manuscript, the images we used generally have a high SNR. However, FCE-Net can also be used for images with low SNR, provided that noisy images are included in the training of the network. FCE-Net learned the excellent ability of the ACE algorithm to extract information and enhance high-frequency signals. We have shown that vessels, whose morphology is quite different from that of neurons, can be processed by FCE-Net without further training. Although various structures may appear in biomedical images, FCE-Net can be adopted directly or with slight optimization. In short, FCE-Net is expected to serve a variety of tasks, such as image segmentation, object tracking, and neuron reconstruction, for biomedical research.

Funding

National Science and Technology Innovation 2030 (2021ZD0200104, 2021ZD0201001); National Natural Science Foundation of China (32192412, 81871082).

Acknowledgments

The authors would like to thank Prof. Xiangning Li for providing the mouse brain samples.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. H. U. Dodt, U. Leischner, A. Schierloh, N. Jahrling, C. P. Mauch, K. Deininger, J. M. Deussing, M. Eder, W. Zieglgansberger, and K. Becker, “Ultramicroscopy: three-dimensional visualization of neuronal networks in the whole mouse brain,” Nat Methods 4(4), 331–336 (2007). [CrossRef]  

2. A. T. Eggebrecht, S. L. Ferradal, A. Robichaux-Viehoever, M. S. Hassanpour, H. Dehghani, A. Z. Snyder, T. Hershey, and J. P. Culver, “Mapping distributed brain function and networks with diffuse optical tomography,” Nat. Photonics 8(6), 448–454 (2014). [CrossRef]  

3. D. Linaro, B. Vermaercke, R. Iwata, A. Ramaswamy, B. L. Philippot, L. Boubakar, B. A. Davis, K. Wierda, K. Davie, S. Poovathingal, P. Penttila, A. Bilheu, L. D. Bruyne, D. Gall, K. K. Conzelmann, V. Bonin, and P. Vanderhaeghen, “Xenotransplanted human cortical neurons reveal species-specific development and functional integration into mouse visual circuits,” Neuron 104(5), 972–986.e6 (2019). [CrossRef]  

4. B. Zingg, H. Hintiryan, L. Gou, M. Y. Song, M. Bay, M. S. Bienkowski, N. N. Foster, S. Yamashita, I. Bowman, A. W. Toga, and H. W. Dong, “Neural networks of the mouse neocortex,” Cell 156(5), 1096–1111 (2014). [CrossRef]  

5. A. Nobili, E. C. Latagliata, M. T. Viscomi, V. Cavallucci, D. Cutuli, G. Giacovazzo, P. Krashia, F. R. Rizzo, R. Marino, M. Federici, P. D. Bartolo, D. Aversa, M. C. Dell’Acqua, A. Cordella, M. Sancandi, F. Keller, L. Petrosini, S. Puglisi-Allegra, N. B. Mercuri, R. Coccurello, N. Berretta, and M. D’Amelio, “Dopamine neuronal loss contributes to memory and reward dysfunction in a model of Alzheimer’s disease,” Nat Commun 8(1), 14727–14 (2017). [CrossRef]  

6. M. Guo, Y. Li, Y. J. Su, T. Lambert, D. D. Nogare, M. W. Moyle, L. H. Duncan, R. Ikegami, A. Santella, I. Rey-Suarez, D. Green, A. Beiriger, J. J. Chen, H. Vishwasrao, S. Ganesan, V. Prince, J. C. Waters, C. M. Annunziata, M. Hafner, W. A. Mohler, A. B. Chitnis, A. Upadhyaya, T. B. Usdin, Z. R. Bao, D. Colón-Ramos, P. L. Riviere, H. F. Liu, Y. C. Wu, and H. Shroff, “Rapid image deconvolution and multiview fusion for optical microscopy,” Nat. Biotechnology 38(11), 1337–1346 (2020). [CrossRef]  

7. R. Y. Cai, C. C. Pan, A. Ghasemigharagoz, M. I. Todorov, B. Förstera, S. Zhao, H. S. Bhatia, A. Parra-Damas, L. Mrowka, D. Theodorou, M. Rempfler, A. L. R. Xavier, B. T. Kress, C. Benakis, H. Steinke, S. Liebscher, I. Bechmann, A. Liesz, B. Menze, M. Kerschensteiner, M. Nedergaard, and A. Ertürk, “Panoptic imaging of transparent mice reveals whole-body neuronal projections and skull-meninges connections,” Nat Neurosci 22(2), 317–327 (2019). [CrossRef]  

8. H. Q. Xiong, Z. Q. Zhou, M. Q. Zhu, X. H. Lv, A. A. Li, S. W. Li, L. H. Li, T. Yang, S. M. Wang, Z. Q. Yang, T. H. Xu, Q. M. Luo, H. Gong, and S. Q. Zeng, “Chemical reactivation of quenched fluorescent protein molecules enables resin-embedded fluorescence microimaging,” Nat Commun 5(1), 3992 (2014). [CrossRef]  

9. M. Kumar and Y. Kozorovitskiy, “Tilt-invariant scanned oblique plane illumination microscopy for large-scale volumetric imaging,” Opt. Lett. 44(7), 1706–1709 (2019). [CrossRef]  

10. F. Kallel and A. B. Hamida, “A new adaptive gamma correction based algorithm using DWT-SVD for non-contrast CT image enhancement,” IEEE Trans.on Nanobioscience 16(8), 666–675 (2017). [CrossRef]  

11. P. Kandhway, A. K. Bhandari, and A. Singh, “A novel reformed histogram equalization based medical image contrast enhancement using krill herd optimization,” Biomedical Signal Processing and Control 56, 101677 (2020). [CrossRef]  

12. M. Rahnemoonfar, A. F. Rahman, R. J. Kline, and A. Greene, “Automatic seagrass disturbance pattern identification on sonar images,” IEEE J. Oceanic Eng. 44(1), 132–141 (2019). [CrossRef]  

13. K. Q. Li, X. Q. Qi, Y. W. Luo, Z. Y. Yao, X. G. Zhou, and M. Y. Sun, “Accurate retinal vessel segmentation in color fundus images via fully attention-based networks,” IEEE J. Biomed. Health Inform. 25(6), 2071–2081 (2021). [CrossRef]  

14. S. Agaian and S. A. McClendon, “Novel medical image enhancement algorithms,” in Image Processing: Algorithms and Systems VIII. International Society for Optics and Photonics (SPIE, 2010), pp. 245–256.

15. S. Wang, J. Zheng, H. M. Hu, and B. Li, “Naturalness preserved enhancement algorithm for non-uniform illumination images,” IEEE Trans. on Image Process. 22(9), 3538–3548 (2013). [CrossRef]  

16. W. C. Wang, X. J. Wu, X. H. Yuan, and Z. R. Gao, “An experiment-based review of low-light image enhancement methods,” IEEE Access 8, 87884–87917 (2020). [CrossRef]  

17. L. T. Yuan, S. K. Swee, and T. C. Ping, “Infrared image enhancement using adaptive trilateral contrast enhancement,” Pattern Recognit. Lett. 54, 103–108 (2015). [CrossRef]  

18. H. Bo, C. H. Rao, Y. D. Zhang, Y. Dai, X. J. Rao, and Y. B. Fan, “Hybrid filtering and enhancement of high-resolution adaptive-optics retinal images,” Opt. Lett. 34(22), 3484–3486 (2009). [CrossRef]  

19. K. M. He, X. Y. Zhang, S. Q. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2016), pp. 770–778.

20. C. Szegedy, W. Liu, Y. Q. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2015), pp. 1–9.

21. K. Sun, Y. Zhao, B. R. Jiang, T. H. Cheng, B. Xiao, D. Liu, Y. D. Mu, X. G. Wang, W. Y. Liu, and J. D. Wang, “High-resolution representations for labeling pixels and regions,” arXiv:1904.04514 (2019).

22. C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. H. Wang, and W. Z. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 4681–4690.

23. L. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Adv. Neural. Infom. Process. Syst. 27, 10 (2014).

24. A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. H. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16 × 16 words: Transformers for image recognition at scale,” arXiv:2010.11929 (2020).

25. N. Carion, F. MassaGabriel, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in European conference on computer vision (Springer, 2020), pp. 213–229.

26. A. Paszke, A. Chaurasia, S. Kim, and E. Culurciello, “Enet: A deep neural network architecture for real-time semantic segmentation,” arXiv:1606.02147 (2016).

27. Y. Wang, Q. Zhou, J. Xiong, X. Wu, and X. Jin, “ESNet: an efficient symmetric network for real-time semantic segmentation,” in Chinese Conference on Pattern Recognition and Computer Vision (Springer, 2019), pp. 41–52.

28. X. X. Li, Z. W. Liu, P. Luo, C. C. Loy, and X. O. Tang, “Not all pixels are equal: difficulty-aware semantic segmentation via deep layer cascade,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2017), pp. 3193–3202.

29. C. Q. Yu, J. B. Wang, C. Peng, C. X. Gao, G. Yu, and N. Sang, “Bisenet: Bilateral segmentation network for real-time semantic segmentation,” in Proceedings of the European Conference on Computer Vision (Springer, 2018), pp. 325–341.

30. H. S. Zhao, X. J. Qi, X. Y. Shen, J. P. Shi, and J. Y. Jia, “Icnet for real-time semantic segmentation on high-resolution images,” in Proceedings of the European conference on computer vision (Springer, 2018), pp. 405–420.

31. Y. Wang, Q. Zhou, J. Liu, J. Xiong, G. W. Gao, X. F. Wu, and L. J. Latecki, “Lednet: A lightweight encoder-decoder network for real-time semantic segmentation,” in 2019 IEEE International Conference on Image Processing (IEEE, 2019), pp. 1860–1864.

32. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2015), pp. 3431–3440.

33. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention (Springer, 2015), pp. 234–241.

34. Z. W. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, and J. M. Liang, “UNet++: redesigning skip connections to exploit multiscale features in image segmentation,” IEEE Trans. Med. Imaging 39(6), 1856–1867 (2020). [CrossRef]  

35. J. N. Chen, Y. Y. Lu, Q. H. Yu, X. D. Luo, E. Adeli, Y. Wang, L. Lu, A. Yuille, and Y. Y. Zhou, “Transunet: Transformers make strong encoders for medical image segmentation,” arXiv:2102.04306, (2021).

36. P. Wu, D. J. Zhang, J. Yuan, S. Q. Zeng, H. Gong, Q. M. Luo, and X. Q. Yang, “Large depth-of-field fluorescence microscopy based on deep learning supported by Fresnel incoherent correlation holography,” Opt. Express 30(4), 5177–5191 (2022). [CrossRef]  

37. C. L. Guo, M. Szemenyei, Y. G. Yi, W. L. Wang, B. Chen, and C. Q. Fan, “SA-UNet: spatial attention u-net for retinal vessel segmentation,” in 2020 25th International Conference on Pattern Recognition (IEEE, 2021), pp. 1236–1242.

38. S. H. Woo, J. Park, J. Y. Lee, and I. S. Kweon, “Cbam: convolutional block attention module,” in Proceedings of the European Conference on Computer Vision (Springer, 2018), pp. 3–19.

39. T. Yang, T. Zheng, Z. H. Shang, X. J. Wang, X. H. Lv, J. Yuan, and S. Q. Zeng, “Rapid imaging of large tissues using high-resolution stage-scanning microscopy,” Biomed. Opt. Express 6(5), 1867–1875 (2015). [CrossRef]  

40. Y. D. Gang, X. L. Liu, X. J. Wang, Q. Zhang, H. F. Zhou, R. X. Chen, L. Liu, Y. Jia, F. F. Yin, R. Gong, J. D. Chen, and S. Q. Zeng, “Plastic embedding immunolabeled large-volume samples for three-dimensional high-resolution imaging,” Biomed. Opt. Express 8(8), 3583–3596 (2017). [CrossRef]  

41. D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv: 1412.6980 (2014).

42. B. Bloem, L. Schoppink, D. C. Rotaru, A. Faiz, P. Hendriks, H. D. Mansvelder, W. D. J. V. D. Berg, and F. G. Wouterlood, “Topographic mapping between basal forebrain cholinergic neurons and the medial prefrontal cortex in mice,” J. Neurosci. 34(49), 16234–16246 (2014). [CrossRef]  

43. T. W. Quan, T. Zheng, Z. Q. Yang, W. X. Ding, S. W. Li, J. Li, H. Zhou, Q. M. Luo, H. Gong, and S. Q. Zeng, “NeuroGPS: automated localization of neurons for brain circuits using L1 minimization model,” Sci. Rep. 3(1), 1414 (2013). [CrossRef]  

44. L. Qu, Y. Y. Li, P. Xie, L. J. Liu, Y. M. Wang, J. Wu, Y. Liu, T. Wang, L. F. Li, K. X. Guo, W. Wan, L. Ouyang, F. Xiong, A. C. Kolstad, Z. H. Wu, F. Xu, Y. F. Zheng, H. Gong, Q. M. Luo, G. Q. Bi, H. W. Dong, M. Hawrylycz, H. K. Zeng, and H. C. Peng, “Cross-modal coherent registration of whole mouse brains,” Nat Methods 19(1), 111–118 (2022). [CrossRef]  
