
Underwater ghost imaging based on generative adversarial networks with high imaging quality

Open Access

Abstract

Ghost imaging is widely used in underwater active optical imaging because of its simple structure, long range, and non-local imaging. However, the complexity of the underwater environment greatly reduces the imaging quality of ghost imaging. To solve this problem, an underwater ghost imaging method based on generative adversarial networks is proposed in this study. The generator of the proposed network adopts a U-Net with double skip connections and an attention module to improve the reconstruction quality. In the network training process, the total loss function is a weighted sum of the adversarial loss, perceptual loss, and pixel loss. The experimental and simulation results show that the proposed method effectively improves the target reconstruction performance of underwater ghost imaging. The proposed method promotes the further development of active optical imaging of underwater targets based on ghost imaging technology.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Due to its high imaging resolution, active optical imaging [1] has become an important technology for underwater target imaging. However, because the attenuation and scattering of light in water cause varying degrees of degradation, images obtained by underwater active optical imaging systems suffer from many problems, such as low contrast, blurred details, information loss, and color distortion. Therefore, methods to improve underwater imaging quality have become a focus of research in underwater vision and underwater image processing. As a popular active optical imaging technique, ghost imaging (GI) [2] can reconstruct a target by calculating the correlation of the intensity fluctuations of the light field. Compared with conventional optical imaging technologies, GI offers strong anti-interference ability, low system complexity, and a long range of action; it therefore plays an important role in remote sensing [3], fluorescence imaging [4], terahertz imaging [5], lidar [6], etc. To improve the imaging quality of GI, more and more improvement schemes have been proposed, for instance iterative denoising GI [7], scalar-matrix-structured GI [8], differential GI [9], and Hadamard GI [10]. These methods effectively improve the imaging quality of GI and advance its practical application. However, high-quality imaging results require a huge number of measurements, which greatly reduces the possibility of real-time GI.

To achieve high-quality reconstruction with few measurements, advanced reconstruction algorithms, including compressive sensing and deep learning, have been applied to GI. Compressive sensing ghost imaging (CSGI) [11,12] takes advantage of sparsity to reconstruct the target from few samples. However, it increases the computational resources and time complexity of the reconstruction algorithm, so it is still difficult to meet real-time processing requirements. In addition, although CSGI can achieve down-sampling to some extent, many measurements are still needed to obtain a high-quality reconstructed image, so the contradiction between the number of measurements and imaging quality remains large. Due to the success of deep learning in machine vision, more and more studies have applied deep learning [13–16] to GI reconstruction under low sampling rate conditions. In the pyramid deep learning ghost imaging (PDLGI) [13] method, the spatial distribution of the speckles is learned autonomously by the deep neural network instead of being pre-designed. The target is illuminated by the trained speckle distribution, and the total reflected light intensities recorded by the bucket detector [17] are used as the input of the neural network model to reconstruct the target image. Compared with other reconstruction methods, PDLGI is more efficient and requires fewer measurements. However, most existing DLGI studies train their reconstruction models with natural image datasets captured by cameras in free space, and studies on DLGI for the underwater environment are relatively few. If a DLGI model trained on natural images is applied directly to underwater detection, the imaging quality of the reconstruction is usually poor due to scattering and absorption in the water.

To enable the GI system to reconstruct underwater targets with high quality under low sampling rate conditions, an underwater ghost imaging method based on generative adversarial networks (UGI-GAN) is proposed in this paper. In the proposed UGI-GAN model, a modified U-Net is designed as the main network architecture of the generator. The modified U-Net contains double skip connections between layers, and an attention module is added to one of the skip connections [18] to improve the reconstruction performance of the network. A convolutional neural network (CNN) is chosen as the discriminator to guide the image generation of the generator. The total reflected light intensities recorded by the bucket detector are the input of the network model, and the image of the underwater target can be reconstructed directly. Moreover, a cycle-consistent generative adversarial network (Cycle-GAN) is implemented to obtain the "fuzzy-clear" paired underwater target dataset for network training. Compared with a natural image dataset, the "fuzzy-clear" paired underwater dataset generated by Cycle-GAN is more conducive to learning the features of underwater images, driving the network model to achieve high-quality reconstruction of underwater targets. Simulation and experimental results demonstrate the effectiveness of the proposed method. The reconstructed results of the method are much clearer than those of other methods, and the image blur caused by the water is effectively reduced. The proposed UGI-GAN is suitable for high-quality reconstruction of underwater images in practical applications.

2. Method

2.1. UGI-GAN imaging scheme

The UGI-GAN system schematic is shown in Fig. 1. The light emitted from the laser source is expanded by the beam expander lens [19] and illuminates the computer-controlled spatial light modulator (SLM) [20]. The spatial distribution of the laser is modulated by the SLM according to the speckle modulation modes controlled by the computer. The modulated laser is emitted on the underwater target by a transmitting antenna. The laser reflected by the underwater target is collected by the receiving antenna and focused on a bucket detector. The bucket detector is connected to the data acquisition system (DAS) which is used to record the total light intensity reflected from the target. The recorded light intensity is then sent to the computer. In the UGI-GAN, the reconstruction network model is implemented in the computer.

Fig. 1. Schematic of the UGI-GAN system.

The reflectivity distribution of the underwater target can be expressed as T(x, y), where (x, y) are the horizontal and vertical coordinates. The mth distribution of speckles illuminated on the target is represented by Im, where m = 1, 2, …, M and M is the total number of measurements. The reflected light is focused on the active area of the bucket detector. Hence, the mth measurement of bucket detector can be written as:

$${y_m} = \sum\limits_{(x,y)} {T({x,y} ){I_m}({x,y} )}$$
According to the principle of GI, the reconstruction process calculates the second-order correlation between the measurements of the bucket detector and the spatial distribution of the light. In order to improve the SNR and contrast ratio of the reconstruction results [21], compressive sensing (CS) and deep learning (DL) algorithms are used to improve the reconstruction efficiency.
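Before introducing the CS and DL formulations, the forward model of Eq. (1) and the conventional second-order correlation reconstruction can be summarized compactly. The following is a minimal NumPy sketch with illustrative values (a random placeholder target and random speckles), not the authors' code:

```python
# A minimal sketch of the GI forward model in Eq. (1) and the conventional
# second-order correlation reconstruction, using illustrative random data.
import numpy as np

rng = np.random.default_rng(0)
p = q = 128                           # image size used later in the paper
M = 1638                              # number of measurements (about a 10% sampling rate)

T = rng.random((p, q))                # placeholder reflectivity distribution T(x, y)
I = rng.random((M, p, q))             # M speckle patterns I_m(x, y)

y = np.einsum("mxy,xy->m", I, T)      # bucket measurements y_m, Eq. (1)

# Second-order correlation reconstruction: <y_m I_m(x,y)> - <y_m><I_m(x,y)>
G2 = np.einsum("m,mxy->xy", y, I) / M - y.mean() * I.mean(axis=0)
```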

In CSGI, the whole M speckle distribution is generally represented in matrix form [22]. The matrix of speckle distribution I can be expressed as following:

$${\mathbf I} = {\left[ {\begin{array}{ccc} {{I_1}({1,1} )}& \cdots &{{I_1}({p,q} )}\\ \vdots & \ddots & \vdots \\ {{I_M}({1,1} )}& \cdots &{{I_M}({p,q} )} \end{array}} \right]_{M \times N}}$$
Here, p and q are the numbers of pixels in the horizontal and vertical directions, respectively. N is the number of elements in each speckle pattern and equals p × q. The reflectivity distribution of the target can also be denoted in matrix form and is expressed as:
$${\mathbf x} = {\left[ {\begin{array}{c} {T({1,1} )}\\ \vdots \\ {T({p,q} )} \end{array}} \right]_{N \times 1}}$$
Therefore, the matrix form of the M measurements of bucket detector y is:
$${\mathbf y} = {\left[ {\begin{array}{c} {{y_1}}\\ \vdots \\ {{y_M}} \end{array}} \right]_{M \times 1}} = \left[ {\begin{array}{c} {\sum\limits_x^p {\sum\limits_y^q {T({x,y} ){I^{(1 )}}({x,y} )} } }\\ \vdots \\ {\sum\limits_x^p {\sum\limits_y^q {T({x,y} ){I^{(M )}}({x,y} )} } } \end{array}} \right] = {\mathbf {Ix}}.$$
The column vector x must be solved, or approximately solved, from the matrices y and I, which is the essence of CSGI. The solution is not unique and is difficult to obtain. According to CS theory, implicit regularization methods [23] can be utilized to solve Eq. (4), such as iterative algebraic recovery methods [24] and unconstrained L2-norm regularization methods [25]. The optimization process can be expressed as:
$${{\mathbf x}^\ast } = \arg \mathop {\min }\limits_{\mathbf x} \Psi ({\mathbf x} )+ \frac{\lambda }{2}||{{\mathbf {Ix}} - {\mathbf y}} ||_2^2,$$
where Ψ is the transform operator to the sparse basis [26], λ is the regularization parameter, and ||.||2 is the L2-norm. Unlike the second-order correlation [27] or the CS algorithm, the method proposed in this paper reconstructs the image using a DL network framework: the M measurements of the bucket detector y are the input, and a high-quality reconstructed image of the underwater target x is the output. The reconstruction process can be expressed as:
$${{\mathbf x}^\ast } = DL\{{\mathbf y} \},$$
where DL{.} represents the trained neural network whose function is to reconstruct the image of the target from the measurements of the bucket detector. In order to obtain a reconstructed result x* that is as similar to x as possible, labeled data pairs are used to train the neural network DL{.}. Each labeled data pair contains a known target image x(j) and the corresponding bucket detector measurements y(j). Note that the target image is a clear underwater image. The training of the network can be written as:
$$D{L_{learn}} = \mathop {\arg \min }\limits_{D{L_\theta },\theta \in \Theta } \sum\limits_{j = 1}^J {L({{{\mathbf x}^{(j )}},D{L_\theta }\{{{{\mathbf y}^{(j )}}} \}} )+ \varphi (\theta )} $$
where J is the total number of data pairs, Θ is the set of all possible parameters of the neural network, and L(.) is the error loss function between the network output DLθ{y(j)} and the known target image x(j). The function φ(θ) is used to regularize the parameters of the neural network to avoid overfitting [28]. Once the training of DLlearn is complete, the reconstructed result of the underwater target can be obtained as follows:
$${\mathbf x}_{DL}^\ast{=} D{L_{learn}}({\mathbf y} )$$
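A minimal sketch of the supervised objective of Eq. (7) is given below; the placeholder network, the data, and the use of the optimizer's weight decay in the role of the regularizer φ(θ) are illustrative assumptions, not the authors' implementation.

```python
# Minimize the sum of per-sample losses L(x^(j), DL_theta{y^(j)}) plus a
# regularizer phi(theta); here weight_decay plays the role of phi(theta).
import torch
import torch.nn as nn

def train_step(net, optimizer, y_batch, x_batch):
    loss_fn = nn.MSELoss()
    optimizer.zero_grad()
    x_hat = net(y_batch)              # DL_theta{y^(j)}
    loss = loss_fn(x_hat, x_batch)    # L(x^(j), DL_theta{y^(j)})
    loss.backward()
    optimizer.step()
    return loss.item()

# Example setup with placeholder dimensions (1638 bucket measurements -> 128x128 image):
# net = nn.Sequential(nn.Linear(1638, 16384), nn.Unflatten(1, (1, 128, 128)))
# optimizer = torch.optim.Adam(net.parameters(), lr=1e-4, weight_decay=1e-5)
```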

2.2. Preparation of paired training dataset

In order to train the proposed UGI-GAN neural network, the dataset needs to contain a huge number of paired underwater images. Although it is relatively easy to obtain fuzzy underwater images, the corresponding clear images are extremely difficult to obtain. To address this problem, the Cycle-GAN neural network [29] is used in this paper to transform fuzzy underwater images into corresponding clear images through a style transfer approach. Cycle-GAN is a common method for transforming images from one style to another in the fields of DL and computer vision. Moreover, it is convenient to train: the datasets of the different domains do not need to be paired with each other, which makes it possible to obtain the paired "clear-fuzzy" underwater dataset. To obtain the "clear" and "fuzzy" domain datasets, 5000 clear natural images in free space and 5000 fuzzy underwater images are crawled from the internet. The collected images are then used to train the network. After training, the trained Cycle-GAN model is used to transform each underwater picture into a clear picture close to free-space quality, producing the paired dataset.

The structure of the Cycle-GAN [29] network model is shown in Fig. 2. The function of Cycle-GAN is to realize the transformation of different image domains. The goal is to use the Cycle-GAN to generate paired underwater datasets.

Fig. 2. Schematic of Cycle-GAN.

As Fig. 2(a) shows, there are two domains: the source domain X of underwater fuzzy images and the target domain Y of clear images captured by a camera in free space. Cycle-GAN is a cyclic network consisting of two generators and two discriminators. Generator Gx2y maps images from domain X to domain Y, and generator Gy2x maps images from domain Y to domain X. Dx and Dy are the discriminators: Dx distinguishes the input images x of domain X from the translated images {Gy2x(y)}, and Dy distinguishes the input images y of domain Y from the translated images {Gx2y(x)}. The 0/1 is the discriminative output of the discriminator. In order to match the data distribution of the generated images to that of the target domain, adversarial losses are applied to both mapping functions. The adversarial loss of Gx2y and Dy can be expressed as:

$${L_{adversarial}}({G_{x2y}},{D_y},X,Y) = {E_{y\sim {P_y}(y)}}\{ \log [{D_y}(y)]\} + {E_{x\sim {P_x}(x)}}\{ \log [1 - {D_y}({G_{x2y}}(x))]\}$$
where the generator Gx2y converts the underwater fuzzy images x into underwater clear images Gx2y(x), and the discriminator Dy is used to judge whether the generated image is real or not. Similarly, the adversarial loss of Gy2x and Dx can be expressed as Ladversarial(Gy2x, Dx, Y, X).

In order to prevent the trained Gx2y and Gy2x from contradicting each other, cycle-consistency is applied to the network. As Fig. 2(b) shows, the image translation cycle should bring x from domain X back to the original image; therefore, the cycle-consistency can be expressed as x→Gx2y(x)→Gy2x(Gx2y(x))≈x. Similarly, the cycle-consistency in Fig. 2(c) can be expressed as y→Gy2x(y)→Gx2y(Gy2x(y))≈y. This behavior is encouraged by a cycle-consistency loss, which can be expressed as:

$${L_{cyc}}({G_{x2y}},{G_{\textrm{y}2x}}) = {E_{x\sim {P_x}(x)}}[||{G_{y2x}}({G_{x2y}}(x)) - x|{|_1}] + {E_{y\sim {P_y}(y)}}[||{G_{x2y}}({G_{y2x}}(y)) - y|{|_1}]$$
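A minimal PyTorch sketch of the two Cycle-GAN losses above is given below; the generator and discriminator modules are placeholders, and the discriminator is assumed to output a probability in (0, 1).

```python
# Adversarial and cycle-consistency losses for the Cycle-GAN described above;
# G_x2y, G_y2x, D_y are placeholder nn.Module objects.
import torch
import torch.nn.functional as F

def adversarial_loss(D_y, G_x2y, x, y, eps=1e-8):
    # E_y{log D_y(y)} + E_x{log[1 - D_y(G_x2y(x))]}
    real_term = torch.log(D_y(y) + eps).mean()
    fake_term = torch.log(1.0 - D_y(G_x2y(x)) + eps).mean()
    return real_term + fake_term

def cycle_consistency_loss(G_x2y, G_y2x, x, y):
    # ||G_y2x(G_x2y(x)) - x||_1 + ||G_x2y(G_y2x(y)) - y||_1
    return F.l1_loss(G_y2x(G_x2y(x)), x) + F.l1_loss(G_x2y(G_y2x(y)), y)
```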
After training, the trained Cycle-GAN model is used to obtain 15,000 pairs of underwater images of various kinds, including animals, plants, shells, and so on.

The dataset used for UGI-GAN training contains 15,000 pairs of images. Among them, 1000 pairs are randomly selected as the test set, and the remaining 14,000 pairs are used as the training set. This amount of data meets the training requirements well. Sample images from the dataset are shown in Fig. 3.

Fig. 3. Sampled paired images in the dataset generated by the Cycle-GAN model.

2.3. Network structure

The proposed UGI-GAN network structure is shown in Fig. 4. As Fig. 4 shows, the UGI-GAN network is based on the structure of generative adversarial networks [30]. The generative adversarial network consists of two parts: a generator G and a discriminator D. The function of the generator is to generate the reconstructed result of the underwater target from the measurements of the bucket detector. The function of the discriminator is to guide the generator to generate reconstruction results with high imaging quality.

Fig. 4. UGI-GAN consists of a generator G and a discriminator D.

Generator: the generator of the proposed UGI-GAN is inspired by U-Net [31,32]. The input of the generator is the M measurements of the bucket detector, and the output is a 128×128 single-channel image. The generator mainly contains three types of structures: a fully connected layer, convolution modules, and attention gates (AG). The first layer of the generator is the input layer, a one-dimensional vector of size M×1. The second layer is the fully connected layer, whose dimension is 16384×1. The third layer reshapes the 16384×1 vector into a 128×128 matrix. This is followed by the U-Net used to reconstruct the underwater images. The U-Net adopts an encoder-decoder structure: the encoder phase is a down-sampling process and the decoder phase is an up-sampling process. These two phases include convolution layers, max-pooling layers, up-sampling layers, and AGs. The convolution layers extract image features. The max-pooling layers reduce the size of the feature maps, which effectively reduces the impact of changes in position and size on the imaging result. The up-sampling layers increase the size of the output feature maps. However, as the max-pooling layers deepen, the details and features of the input signal are gradually lost. To solve this problem, double skip connections are added to the encoder-decoder network to connect the corresponding encoder and decoder layers. Different from the conventional skip connection, an AG is added to one skip-connection path [33], which enhances the salient features passed through the skip connection and suppresses irrelevant content and noise responses. In addition, the other skip-connection path connects the corresponding encoder and decoder layers directly, which helps recover more details. With the help of the proposed double skip connections, the UGI-GAN model can obtain more image features of the underwater target and reduce the adverse effect of the imaging quality degradation caused by water.
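For reference, the following is a minimal sketch of a typical additive attention gate in the style of [33]. It is an illustrative implementation under stated assumptions (the gating and skip features share the same spatial size, and the channel counts are placeholders), not the authors' exact module.

```python
# A generic additive attention gate: the gating signal from the decoder and the
# encoder skip features produce attention coefficients that re-weight the skip features.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, f_g, f_x, f_int):
        super().__init__()
        self.w_g = nn.Conv2d(f_g, f_int, kernel_size=1)   # projects the gating signal (decoder)
        self.w_x = nn.Conv2d(f_x, f_int, kernel_size=1)   # projects the skip features (encoder)
        self.psi = nn.Sequential(nn.Conv2d(f_int, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, g, x):
        # attention coefficients in [0, 1] suppress irrelevant content in the skip path
        a = self.psi(torch.relu(self.w_g(g) + self.w_x(x)))
        return x * a
```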

The double skip connections are used in the network structure of the proposed UGI-GAN. They connect the corresponding convolution and deconvolution layers of the U-Net and are well suited to the underwater image reconstruction problem. The double skip connections consist of two connections: a concatenation connection and an element-wise adding connection. The reasons for using this structure are as follows:

Concatenation connection: First, as the depth of the network increases, the details and important characteristics of the underwater image are lost, and deconvolution can hardly recover them perfectly. The feature maps transmitted by the concatenation connection contain much useful detail and many important characteristics, which helps recover a better underwater image. Second, the concatenation connection makes the network easier to train than the same network without it.

Element-wise adding connection: The element-wise adding connection is useful because there are many similar important characteristics between the input and output layers. It greatly increases the reconstruction performance compared to the same network without the element-wise adding connection, and it also effectively alleviates the vanishing gradient problem during training.
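To make the two connections concrete, one plausible wiring of a decoder stage with both the concatenation connection and the element-wise adding connection is sketched below, reusing the AttentionGate module from the previous sketch. The channel counts, the upsampling mode, and the placement of the AG on the concatenation path are illustrative assumptions rather than the authors' exact design.

```python
# A decoder stage combining an attention-gated concatenation skip connection
# with an element-wise adding skip connection (illustrative wiring only).
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, dec_ch, enc_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.att = AttentionGate(f_g=dec_ch, f_x=enc_ch, f_int=out_ch)
        self.conv = nn.Sequential(
            nn.Conv2d(dec_ch + enc_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))
        self.match = nn.Conv2d(enc_ch, out_ch, kernel_size=1)  # align channels for the add path

    def forward(self, dec_feat, enc_feat):
        d = self.up(dec_feat)                           # upsample decoder features
        gated = self.att(d, enc_feat)                   # attention-gated encoder features
        out = self.conv(torch.cat([d, gated], dim=1))   # concatenation connection
        return out + self.match(enc_feat)               # element-wise adding connection
```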

The structure of the discriminator, shown at the bottom of Fig. 4, is composed of 9 convolution layers and 1 fully connected layer. The CNN-based discriminator is used to improve the reconstruction ability of the generator. The discriminator first uses the 9 convolution layers to extract the features of the input image, and then the fully connected layer converts them into a one-dimensional feature vector to perform the discrimination.
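For illustration, a discriminator with the stated depth (9 convolution layers followed by one fully connected layer) could be assembled as in the following sketch; the strides, channel widths, and the pooling before the fully connected layer are assumptions, not the authors' exact configuration.

```python
# A CNN discriminator with 9 convolution layers and 1 fully connected layer
# (illustrative hyperparameters; downsampling every other layer).
import torch.nn as nn

def make_discriminator(in_ch=1, base=32):
    layers, ch = [], in_ch
    for i in range(9):                       # 9 convolution layers
        out = base * min(2 ** (i // 2), 8)
        stride = 2 if i % 2 == 1 else 1
        layers += [nn.Conv2d(ch, out, 3, stride=stride, padding=1),
                   nn.LeakyReLU(0.2, inplace=True)]
        ch = out
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
               nn.Linear(ch, 1), nn.Sigmoid()]   # 1 fully connected layer
    return nn.Sequential(*layers)
```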

2.4. Network training

To improve the imaging quality of the generator, the weighted sum of the adversarial loss, perceptual loss, and pixel loss is used as the total loss function during training [34]. The perceptual loss is computed with the pretrained VGG19 network [35]. In addition, the Adam optimizer [36] is used to optimize the loss function. The number of training epochs is set to 200 and the batch size to 12. The initial learning rate (LR) is 0.0001; when the training epoch exceeds 100, the LR is set to 0.00005.
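A minimal sketch of this optimizer and learning-rate schedule in PyTorch (with a placeholder module standing in for the UGI-GAN generator) might look as follows:

```python
# Adam optimizer, initial LR 1e-4 reduced to 5e-5 after epoch 100,
# 200 epochs in total; the generator here is a placeholder module.
import torch
import torch.nn as nn

generator = nn.Linear(1638, 16384)   # placeholder standing in for the UGI-GAN generator
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100], gamma=0.5)

for epoch in range(200):
    # ... iterate over mini-batches of size 12, compute the total loss,
    #     call loss.backward() and optimizer.step() ...
    scheduler.step()                  # LR becomes 5e-5 once the epoch index reaches 100
```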

The loss function estimates the difference between the target and the reconstructed result: as its value decreases, the reconstructed image becomes more similar to the target image. The perceptual loss and pixel loss can be described as:

$${L_{\textrm{perceptual }}} = ||{{f_{\textrm{vgg19}}}({{x_g}} )- {f_{\textrm{vgg19 }}}(x)} ||_2^2.$$
$${L_{\textrm{pixel }}} = ||{{x_g} - x} ||_2^2.$$
where xg and x are the network reconstruction and the target image, respectively, and fvgg19 is the pretrained VGG19 network.

Adversarial loss can be described as:

$${L_{adversarial}} = {E_{y\sim {P_y}(y)}}\{ \log [D(G(y))]\} .$$
where E represents the expected value of the distribution function, y is the input of the network model, Py(y) is the data distribution of y, and G(y) is the output of the generator. Therefore, the total loss of the generator can be expressed as:
$${L_{\textrm{total }}} = \alpha {L_{\textrm{perceptual }}} + \beta {L_{\textrm{pixel }}} + \gamma {L_{\textrm{adversarial }}}.$$
The values of α, β, and γ are 0.006, 1, and 0.001, respectively. The proposed network model is implemented in Python 3.8 with PyTorch 1.5.1. A graphics processing unit (GeForce GTX 1650 GPU) is used to accelerate the computation.
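For clarity, the weighted total generator loss described above can be assembled as in the following sketch. The VGG19 feature extractor, the discriminator score, and the use of the standard non-saturating form for the adversarial term are assumptions made for illustration, not the authors' exact code.

```python
# Weighted total generator loss: alpha*perceptual + beta*pixel + gamma*adversarial.
# `vgg_features` is assumed to be a frozen VGG19 feature extractor, `x_g` the
# generator output, `x` the target image, and `d_fake` = D(G(y)) in (0, 1).
import torch
import torch.nn.functional as F

ALPHA, BETA, GAMMA = 0.006, 1.0, 0.001   # weights reported in the paper

def generator_total_loss(x_g, x, d_fake, vgg_features):
    l_perceptual = F.mse_loss(vgg_features(x_g), vgg_features(x))  # perceptual loss
    l_pixel = F.mse_loss(x_g, x)                                   # pixel loss
    l_adversarial = -torch.log(d_fake + 1e-8).mean()               # non-saturating adversarial term
    return ALPHA * l_perceptual + BETA * l_pixel + GAMMA * l_adversarial
```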

3. Numerical simulation and experimental results

3.1. Numerical simulation

In order to verify the effectiveness of the proposed UGI-GAN, numerical simulations are performed. Furthermore, U-Net deep learning ghost imaging (UDLGI) [16] and the PDLGI method are trained with the same dataset and simulated as comparisons. The reconstruction results of the different methods at varying sampling rates are shown in Fig. 5. The first column in Fig. 5 shows the clear and fuzzy underwater images, respectively; the paired images are generated by the Cycle-GAN mentioned in section 2.2. The clear image is treated as the ground-truth image of the underwater target and is used to calculate the loss function, while the fuzzy image is treated as the target image with water degradation and is used to simulate the measurements of the bucket detector. The reconstruction results of the various methods at different sampling rates are shown to the right of the yellow dotted line in Fig. 5. It can be seen from Fig. 5 that the sampling rate has a great influence on the reconstruction quality of every method. When the sampling rate is relatively low, such as 2.5%, the reconstruction results of the three methods are blurry and the edge information of the target is seriously distorted. As the sampling rate increases, the quality of the reconstruction results improves and the target edges become much clearer than under low sampling rate conditions. In particular, when the sampling rate reaches 10%, all three methods have good reconstruction quality and visual effect; above 10%, the quality of the reconstruction results is not much affected even if the sampling rate is further increased. Moreover, at the same sampling rate, such as 20%, the results obtained by the different reconstruction methods show obvious differences. Influenced by the scattering and absorption of water, the results reconstructed by the PDLGI method are the most severely distorted: they show a serious loss of target details and relatively low image contrast, and, visually, the reconstructed result appears to be covered with a layer of thin gauze. The reconstructed results of the UDLGI method are better than those of the PDLGI method; they are clearer and have higher image contrast, although the details of some areas are not well reconstructed and there is a certain degree of image distortion and noise. The proposed UGI-GAN method has the best reconstruction quality. Compared with the other two methods, the edge profile information of its reconstructed results is relatively clear and the visual effect of the target details is good. Meanwhile, the proposed method is robust against underwater blurring and intensity reduction: although the image of the simulation target is seriously blurred by scattering, the reconstructed result has high image contrast and quality. The simulation results show that the UGI-GAN network can achieve high-quality image reconstruction and that its reconstruction ability is better than that of the UDLGI and PDLGI methods at different sampling rates.

Fig. 5. Comparison of simulation results of UGI-GAN, UDLGI, and PDLGI at different sampling rates.

To quantitatively demonstrate the advantages of this approach, the quality of the reconstructed underwater images is evaluated by the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) [37]. The PSNR and SSIM can be calculated as follows:

$$PSNR = 10 \times {\log _{10}}\left( {\frac{{MAX_I^2}}{{MSE}}} \right).$$
$$SSIM(x,y) = \frac{{({2{\mu_x}{\mu_y} + {c_1}} )({2{\sigma_{xy}} + {c_2}} )}}{{({\mu_x^2 + \mu_y^2 + {c_1}} )({\sigma_x^2 + \sigma_y^2 + {c_2}} )}}.$$
Here, MAXI is the maximum possible value of the image pixels. In the simulation, the target is an 8-bit grayscale image, therefore MAXI = 255. MSE is the mean square error of the image, defined as follows:
$$MSE = \frac{1}{{128 \times 128}}\sum\limits_{i = 1}^{128} {\sum\limits_{j = 1}^{128} |} O(i,j) - R(i,j){|^2},$$
where O(i, j) is the target image, R(i, j) is the reconstructed image, and i, j are the horizontal and vertical coordinates of the pixel points, respectively. In Eq. (14), μx and μy represent the mean values of all pixels in the clear underwater target image and the reconstructed image, respectively; σx² and σy² are the variances of the clear underwater target image and the reconstructed image; σxy is the covariance between the target image and the reconstructed image; and c1 and c2 are two small positive constants used to avoid division by zero.
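For reference, the MSE/PSNR definitions above, together with a single-window form of the SSIM formula, can be computed as in the following sketch; the values of c1 and c2 are the commonly used defaults and are an assumption here (in practice, an established SSIM implementation such as skimage.metrics.structural_similarity is typically used).

```python
# Quality metrics for 8-bit grayscale images, following the definitions above.
import numpy as np

def psnr(target, recon, max_i=255.0):
    target, recon = target.astype(np.float64), recon.astype(np.float64)
    mse = np.mean((target - recon) ** 2)          # MSE over the 128x128 image
    return 10.0 * np.log10(max_i ** 2 / mse)

def ssim_global(target, recon, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    # single-window SSIM over the whole image
    x, y = target.astype(np.float64), recon.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```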

The total loss is used to illustrate the optimization process during training. The total-loss curve during GAN training at a sampling rate of 10% is shown in Fig. 6. As Fig. 6 shows, the value of the total loss function decreases as the epoch increases, and when the epoch exceeds 70 the curve becomes stable. The whole training process is relatively stable, which meets the training requirements.

Fig. 6. The total loss curve of UGI-GAN in the training process of GAN at a sampling rate of 10%.

The reconstruction times of the UGI-GAN, UDLGI, and PDLGI methods under four different sampling conditions (3%, 5%, 10%, and 20%) are shown in Table 1. For each sampling rate, 1000 pairs of underwater images are used to measure the reconstruction time of the three methods, and the average reconstruction time per image is obtained. As illustrated in Table 1, regardless of the sampling rate, the average time per image reconstructed by UGI-GAN is less than those of UDLGI and PDLGI. When the sampling rate is 10%, the average reconstruction times per image of UDLGI and PDLGI are 19.17 ms and 23.25 ms, respectively, whereas that of UGI-GAN is only 15.17 ms, which meets real-time imaging requirements.

Table 1. The reconstruction time of UGI-GAN, UDLGI, and PDLGI.
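For completeness, the average per-image reconstruction time reported in Table 1 could be measured as in the following sketch; the network and data-loader objects are placeholders, and GPU synchronization is included so that the timing reflects the actual computation.

```python
# Average per-image reconstruction time over a test loader (placeholder objects,
# CUDA device assumed).
import time
import torch

@torch.no_grad()
def average_reconstruction_time(net, loader, device="cuda"):
    net.eval().to(device)
    total_time, n_images = 0.0, 0
    for y, _ in loader:
        y = y.to(device)
        torch.cuda.synchronize()          # make sure previous GPU work is finished
        start = time.perf_counter()
        net(y)
        torch.cuda.synchronize()          # wait for the reconstruction to finish
        total_time += time.perf_counter() - start
        n_images += y.shape[0]
    return 1000.0 * total_time / n_images # average time per image in milliseconds
```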

Here, the restored image quality is quantitatively evaluated by PSNR and SSIM. Figure 7 shows the PSNR and SSIM curves corresponding to the images in Fig. 5. Note that the clear underwater image is used as the reference image for calculating the PSNR and SSIM values. Different colored lines represent the imaging quality indexes of the different reconstruction methods: the red, blue, and green lines represent UGI-GAN, PDLGI, and UDLGI, respectively. Figures 7(a) and 7(b) show the PSNRs and SSIMs of the different methods as functions of the sampling rate. It can be seen that the sampling rate affects the quality of the reconstruction: when the sampling rate is low, the PSNRs and SSIMs of all methods are relatively low, and they increase with the sampling rate. Moreover, at the same sampling rate, the quantitative evaluation indexes of the three methods differ considerably. The indexes of the PDLGI method are the lowest and those of the UGI-GAN method are the highest, indicating that the image quality of UGI-GAN is better than that of the other two methods at different sampling rates.

Fig. 7. The PSNRs and SSIMs of the reconstructed shark images of UGI-GAN, UDLGI, and PDLGI at different sampling rates.

In addition, an underwater submarine image, which is not included in the training set, is chosen as the target to examine the generalization ability of the trained neural network model. The reconstruction results are shown in Fig. 8, and the PSNRs and SSIMs of every reconstructed underwater submarine result are listed below the corresponding images. As Fig. 8 shows, every reconstruction method has good generalization ability, and the submarine can be successfully reconstructed from the bucket signal at a sampling rate as low as 2.5%, although the images are clearly distorted. As the sampling rate increases, the image is nearly perfectly reconstructed. At the same sampling rate, such as 20%, the reconstructed results of PDLGI are clearly distorted and contain a lot of noise, which indicates that the PDLGI method is sensitive to noise. The situation is much improved when UDLGI is used to reconstruct the image, but the images still contain some noise. Compared with the other two methods, the UGI-GAN method can reconstruct the images nearly perfectly. The generalization test results verify that the proposed UGI-GAN method has the best generalization ability among the three methods at different sampling rates.

Fig. 8. Comparison of simulation results of UGI-GAN, UDLGI, and PDLGI at different sampling rates.

3.2. Experimental results

An experiment is implemented based on the schematic of the UGI-GAN system shown in Fig. 1. In the experiment, the laser source is a pulsed laser with a wavelength of 520 ± 10 nm (NPL52B) and an average pulse power of 12 mW. The laser beam is expanded by a lens and illuminates the computer-controlled SLM. The spatial distribution of the laser is modulated by the SLM according to the speckle modulation mode transmitted by the computer. The dimension of the modulation mode is set to 128×128. The light reflected from the target is collected by the camera lens and recorded by a bucket detector (DET36A2). The detector is connected to a data acquisition card (NI PXI-5154) to measure the total light intensity. The bandwidth of the data acquisition card is 2 GHz and the sampling rate is 5 GS/s. In the experiment, the targets are a toy whale and a model submarine, both made of plastic. The targets are placed in a sink of size 95 cm × 36 cm × 40 cm. The sink is filled with pure water and the target is fully submerged. In order to avoid interference from background light and to simulate the dark environment of deep-water detection, the experiment is performed in the dark.

Then the trained UGI-GAN model is used to recover the image from the measurements of the bucket detector in Fig. 1. In this experiment, a toy whale is selected as the target; the input of the UDLGI, PDLGI, and UGI-GAN methods is the signal obtained by the bucket detector, and the output of each network is the reconstructed image. The reconstruction results are shown in Fig. 9. It can be seen that the experimental results in Fig. 9 are similar to the simulation results in Fig. 5 of section 3.1. The experimental results verify that the proposed UGI-GAN method has the best performance among the three methods at different sampling rates.

Fig. 9. Comparison of reconstructed whale images of UGI-GAN, UDLGI, and PDLGI at different sampling rates.

To quantitatively prove the advantages of the proposed UGI-GAN method, the PSNRs and SSIMs of the images reconstructed by the different methods at varying sampling rates are calculated. The results are shown in Fig. 10, where the red, blue, and green lines represent UGI-GAN, PDLGI, and UDLGI, respectively. It can be seen that the results in Fig. 10 are similar to those in Fig. 7 of section 3.1. The PSNRs and SSIMs verify that the proposed UGI-GAN method outperforms the other methods at different sampling rates.

Fig. 10. The PSNRs and SSIMs of the reconstructed whale images of UGI-GAN, UDLGI, and PDLGI at different sampling rates.

In the second experiment, an underwater whale model is chosen as the target to examine the reconstruction ability of the trained neural network model at a 20% sampling rate in underwater environments with different degrees of ambiguity. To obtain the target under different degrees of underwater ambiguity, 5 ml, 10 ml, and 20 ml of milk are poured into the water, respectively; the blurring of the underwater environment increases with the amount of milk. The reconstruction results are shown in Fig. 11. As Fig. 11 shows, the degree of underwater ambiguity has a great influence on the reconstructed results. When the degree of underwater ambiguity is high, the reconstruction results of the three methods are blurry and the edge information of the target is seriously distorted. As the degree of underwater ambiguity decreases, the quality of the reconstruction results improves and the target edges become much clearer. Moreover, at the same degree of underwater ambiguity, the reconstructed results of the three methods show obvious differences. As expected, the reconstructed results of the PDLGI method have a serious loss of target details and relatively low image contrast, which indicates that the PDLGI method is not good at removing the fog-like blur of the underwater environment. The reconstructed results of the UDLGI method are better than those of the PDLGI method; they are clearer and have higher image contrast, but contain some noise. Compared with the other two methods, the reconstructed results of the proposed UGI-GAN method have the highest image contrast with the least noise. The experimental results verify that the UGI-GAN network can achieve high-quality image reconstruction at varying degrees of underwater ambiguity, and that its reconstruction ability is better than that of the UDLGI and PDLGI methods.

Fig. 11. Comparison of reconstructed whale images of UGI-GAN, UDLGI, and PDLGI at different degrees of underwater ambiguity.

To quantitatively prove the advantages of the proposed UGI-GAN method, the PSNRs and SSIMs of the images reconstructed by the different methods in underwater environments with different degrees of ambiguity are calculated. The results are shown in Fig. 12. The horizontal axis of each figure represents the amount of milk poured into the water: 0 ml, 5 ml, 10 ml, and 20 ml. As Fig. 12 shows, the red, blue, and green lines represent UGI-GAN, PDLGI, and UDLGI, respectively. When the degree of ambiguity of the underwater environment is low, the PSNRs and SSIMs of the three methods are relatively high; with increasing ambiguity, the PSNRs and SSIMs of all methods decrease. Moreover, at the same degree of ambiguity, the quantitative evaluation indexes of the three methods differ considerably. The PSNRs and SSIMs of the PDLGI method are the lowest, while those of the proposed UGI-GAN method are the highest, verifying that the proposed UGI-GAN method performs best among the three methods at varying degrees of underwater ambiguity.

Fig. 12. The PSNRs and SSIMs of the reconstructed whale images of UGI-GAN, UDLGI, and PDLGI at different degrees of underwater ambiguity.

In the third experiment, an underwater detector model, which is not included in the training set, is chosen as the target to examine the generalization ability of the trained neural network model. The experimental results are shown in Fig. 13, and the PSNRs and SSIMs of the images reconstructed by the different methods at different sampling rates are indicated below the images. When the sampling rate is low, such as 2.5%, only UDLGI and UGI-GAN can restore a clear image, and the image quality of UGI-GAN is better. As the sampling rate increases, the image quality of UGI-GAN remains superior to that of the other methods. It can be concluded that the proposed UGI-GAN method has the best generalization ability among the three methods at different sampling rates and that the detector can be successfully reconstructed.

Fig. 13. Comparison of reconstructed detector images of UGI-GAN, UDLGI, and PDLGI at different sampling rates.

In the fourth experiment, an underwater detector model is chosen as the target to examine the reconstruction ability of the trained neural network model at a 20% sampling rate in underwater environments with different degrees of ambiguity. The reconstruction results are shown in Fig. 14, and the PSNRs and SSIMs of every reconstructed underwater detector result are indicated below the images. It can be seen that the experimental results in Fig. 14 are similar to those in Fig. 11 of section 3.2. It can be concluded that the proposed UGI-GAN method has the best generalization ability among the three methods at varying degrees of underwater ambiguity. This again shows that the proposed method produces better results than the other methods.

Fig. 14. Comparison of reconstructed detector images of UGI-GAN, UDLGI, and PDLGI at different degrees of underwater ambiguity.

4. Conclusions

To sum up, in order to improve the reconstruction performance of GI for underwater targets, the UGI-GAN method is proposed. The proposed UGI-GAN model uses the total intensity of the target reflection echo detected by the bucket detector as input to reconstruct the underwater target with high quality. In the proposed model, the generator utilizes U-Net as the main network architecture and adds double skip connections between layers; in the double skip connections, an attention module is added to one of the paths to improve the reconstruction effect of the network. A CNN is used as the discriminator to guide the image generation of the generator. At the same time, to improve the quality of underwater image reconstruction, the adversarial loss, perceptual loss, and pixel loss are combined in a certain proportion as the total loss function of the proposed UGI-GAN model. To obtain paired datasets, the Cycle-GAN network is used for style transfer of the original underwater images.

The performance of the proposed UGI-GAN method at different sampling rates is analyzed and compared with UDLGI and PDLGI by numerical simulations and experiments. The comparison shows that the reconstructed results of the UGI-GAN method have the best visual effects under different sampling rate conditions. To compare the reconstruction results more quantitatively, the PSNRs and SSIMs of the reconstruction results are also calculated; these quality indexes also demonstrate that the UGI-GAN method has the best reconstruction ability. In addition, underwater submarine images are selected as targets to demonstrate the generalization ability of the trained network. The proposed underwater image training method can reconstruct various underwater targets, which proves that UGI-GAN has strong generalization ability. Meanwhile, the performance of the proposed UGI-GAN method in underwater environments with different degrees of ambiguity is also analyzed and compared with UDLGI and PDLGI through experiments. The experimental results show that the reconstructed images of UGI-GAN have the best visual effect and quality indexes in underwater environments with different degrees of ambiguity.

Funding

National Natural Science Foundation of China (61801429); Natural Science Foundation of Zhejiang Province (LQ20F050010, LY20F010001); Fundamental Research Funds of Zhejiang Sci-Tech University (2020Q020); Key Laboratory Foundation (WDZC20205500208).

Acknowledgements

The authors thank Pengfei Jiang for fruitful discussion.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. F. Chen, P. W. Tillberg, and E. S. Boyden, “Expansion microscopy,” Science 347(6221), 543–548 (2015). [CrossRef]  

2. M. Sun, H. Wang, and J. Huang, “Improving the performance of computational ghost imaging by using a quadrant detector and digital micro-scanning,” Sci Rep 9(1), 4105–4110 (2019). [CrossRef]  

3. B. I. Erkmen, “Computational ghost imaging for remote sensing,” J. Opt. Soc. Am. A 29(5), 782–789 (2012). [CrossRef]  

4. N. Tian, Q. Guo, A. Wang, D. Xu, and L. Fu, “Fluorescence ghost imaging with pseudothermal light,” Opt. Lett. 36(16), 3302–3304 (2011). [CrossRef]  

5. J. S. Totero Gongora, L. Olivieri, L. Peters, J. Tunesi, V. Cecconi, A. Cutrona, R. Tucker, V. Kumar, A. Pasquazi, and M. Peccianti, “Route to Intelligent Imaging Reconstruction via Terahertz Nonlinear Ghost Imaging,” Micromachines-Basel 11(5), 521 (2020). [CrossRef]  

6. S. Ma, C. Hu, C. Wang, Z. Liu, and S. Han, “Multi-scale ghost imaging LiDAR via sparsity constraints using push-broom scanning,” Opt. Commun 448, 89–92 (2019). [CrossRef]  

7. G. Li, Z. Yang, Y. Zhao, R. Yan, X. Liu, and B. Liu, “Normalized iterative denoising ghost imaging based on the adaptive threshold,” Laser. Phys. Lett. 14(2), 025207 (2017). [CrossRef]  

8. C. Yang, C. Wang, J. Guan, C. Zhang, S. Guo, W. Gong, and F. Gao, “Scalar-matrix-structured ghost imaging,” Photonics. Res 4(6), 281–285 (2016). [CrossRef]  

9. Y. O-oka and S. Fukatsu, “Differential ghost imaging in time domain,” Appl. Phys. Lett. 111(6), 061106 (2017). [CrossRef]  

10. L. Wang and S. Zhao, “Fast reconstructed and high-quality ghost imaging with fast Walsh-Hadamard transform,” Photonics. Res 4(6), 240–244 (2016). [CrossRef]  

11. S. Yuan, Y. Yang, X. Liu, X. Zhou, and Z. Wei, “Optical image transformation and encryption by phase-retrieval-based double random-phase encoding and compressive ghost imaging,” Opt. Laser. Eng 100, 105–110 (2018). [CrossRef]  

12. R. Zhu, G. Li, and Y. Guo, “Compressed-Sensing-based Gradient Reconstruction for Ghost Imaging,” Int. J. Theor. Phys 58(4), 1215–1226 (2019). [CrossRef]  

13. F. Wang, H. Wang, H. Wang, G. Li, and G. Situ, “Learning from simulation: An end-to-end deep-learning approach for computational ghost imaging,” Opt. Express 27(18), 25560–25572 (2019). [CrossRef]  

14. Y. He, G. Wang, G. Dong, S. Zhu, H. Chen, A. Zhang, and Z. Xu, “Ghost Imaging Based on Deep Learning,” Sci Rep 8(1), 6469 (2018). [CrossRef]  

15. M. Lyu, W. Wang, H. Wang, H. Wang, G. Li, N. Chen, and G. Situ, “Deep-learning-based ghost imaging,” Sci Rep 7(1), 17865 (2017). [CrossRef]  

16. T. Shimobaba, Y. Endo, T. Nishitsuji, T. Takahashi, Y. Nagahama, S. Hasegawa, M. Sano, R. Hirayama, T. Kakue, A. Shiraki, and T. Ito, “Computational ghost imaging using deep learning,” Opt. Commun 413, 147–151 (2018). [CrossRef]  

17. D. Zhang, R. Yin, T. Wang, Q. Liao, H. Li, Q. Liao, and J. Liu, “Ghost imaging with bucket detection and point detection,” Opt. Commun 412, 146–149 (2018). [CrossRef]  

18. X. Zhu, Z. Li, X. Zhang, H. Li, Z. Xue, and L. Wang, “Generative Adversarial Image Super-Resolution Through Deep Dense Skip Connections,” Comput. Graph. Forum 37(7), 289–300 (2018). [CrossRef]  

19. C. Schindlbeck, C. Pape, and E. Reithmeier, “Predictor-corrector framework for the sequential assembly of optical systems based on wavefront sensing,” Opt. Express 26(8), 10669 (2018). [CrossRef]  

20. H. Kim, C. Y. Hwang, K. S. Kim, J. Roh, W. Moon, S. Kim, B. R. Lee, S. Oh, and J. Hahn, “Anamorphic optical transformation of an amplitude spatial light modulator to a complex spatial light modulator with square pixels [invited],” Appl. Opt. 53(27), G139–146 (2014). [CrossRef]  

21. X. Berthelon, G. Chenegros, T. Finateu, S. Ieng, and R. Benosman, “Effects of Cooling on the SNR and Contrast Detection of a Low-Light Event-Based Camera,” IEEE T. Biomed. Circ. S 12(6), 1467–1474 (2018). [CrossRef]  

22. R. Ossikovski, “Differential matrix formalism for depolarizing anisotropic media,” Opt. Lett. 36(12), 2330–2332 (2011). [CrossRef]  

23. C. Ma, K. Wang, Y. Chi, and Y. Chen, “Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion, and Blind Deconvolution,” Found. Comput. Math 20(3), 451–632 (2020). [CrossRef]  

24. S. S. Alaviani and N. Elia, “A Distributed Algorithm for Solving Linear Algebraic Equations Over Random Networks,” IEEE T. Automat. Contr 66(5), 2399–2406 (2021). [CrossRef]  

25. Z. Zhang, F. Li, M. Zhao, L. Zhang, and S. Yan, “Robust Neighborhood Preserving Projection by Nuclear/L2,1-Norm Regularization for Image Feature Extraction,” IEEE Trans. Image Process 26(4), 1607–1622 (2017). [CrossRef]  

26. S. L. Brunton, J. L. Proctor, and J. N. Kutz, “Discovering governing equations from data by sparse identification of nonlinear dynamical systems,” P. Natl. A. Sci 113(15), 3932–3937 (2016). [CrossRef]  

27. B. D. Mangum, Y. Ghosh, J. A. Hollingsworth, and H. Htoon, “Disentangling the effects of clustering and multi-exciton emission in second-order photon correlation experiments,” Opt. Express 21(6), 7419–7426 (2013). [CrossRef]  

28. P. Piotrowski and J. J. Napiorkowski, “A comparison of methods to avoid overfitting in neural networks training in the case of catchment runoff modelling,” J. Hydrol 476(1), 97–111 (2013). [CrossRef]  

29. J. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2243–2251 (2017).

30. X. Yang, P. Jiang, M. Jiang, L. Xu, L. Wu, C. Yang, W. Zhang, J. Zhang, and Y. Zhang, “High imaging quality of Fourier single pixel imaging based on generative adversarial networks at low sampling rate,” Opt. Laser. Eng 140, 106533 (2021). [CrossRef]  

31. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 234–241 (2015).

32. T. Falk, D. Mai, R. Bensch, Ö. Çiçek, A. Abdulkadir, Y. Marrakchi, A. Böhm, J. Deubner, Z. Jäckel, K. Seiwald, A. Dovzhenko, O. Tietz, C. Dal Bosco, S. Walsh, D. Saltukoglu, T. L. Tay, M. Prinz, K. Palme, M. Simons, I. Diester, T. Brox, and O. Ronneberger, “U-Net: deep learning for cell counting, detection, and morphometry,” Nat. Methods 16(1), 67–70 (2019). [CrossRef]  

33. J. Schlemper, O. Oktay, M. Schaap, M. Heinrich, B. Kainz, B. Glocker, and D. Rueckert, “Attention gated networks: Learning to leverage salient regions in medical images,” Med. Image. Anal 53, 197–207 (2019). [CrossRef]  

34. Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, “Low-Dose CT Image Denoising Using a Generative Adversarial Network With Wasserstein Distance and Perceptual Loss,” IEEE T. Med Imaging 37(6), 1348–1357 (2018). [CrossRef]  

35. V. Rajinikanth, A. N. Joseph Raj, K. P. Thanaraj, and G. R. Naik, “A Customized VGG19 Network with Concatenation of Deep and Handcrafted Features for Brain Tumor Detection,” Appl. Sci 10(10), 3429 (2020). [CrossRef]  

36. D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv e-prints 1412.6980 (2014).

37. U. Sara, M. Akter, and M. S. Uddin, “Image Quality Assessment through FSIM, SSIM, MSE and PSNR—A Comparative Study,” J. Comput. Commun 07(03), 8–18 (2019). [CrossRef]  
