Optica Publishing Group

Unsupervised underwater imaging based on polarization and binocular depth estimation

Open Access

Abstract

Scattering caused by suspended particles in the water severely reduces the radiance of the scene. This paper proposes an unsupervised underwater restoration method based on binocular depth estimation and polarization. Exploiting the correlation between the underwater transmission process and scene depth, the method combines depth and polarization information, uses a neural network for global optimization, and recalculates and updates the depth information within the network during optimization. This reduces the error introduced by computing parameters from polarization images alone, so that the detailed parts of the image are restored. Furthermore, compared with previous neural-network approaches to underwater imaging, the method relaxes the requirement for rigorously paired data. Experimental results show that the method effectively reduces the noise in the original image while preserving the detailed information in the scene.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

High-quality imaging of targets in turbid water has broad applications, such as archaeological exploration, species research, and infrastructure inspection [1–4]. However, scattering and absorption of the target light by particles in the water inevitably produce backscattered light, degrading underwater imaging quality; the degradation worsens as the target distance or the particle concentration increases [5–8]. Therefore, it is necessary to eliminate the impact of backscattered light on image acquisition.

At present, many methods have been proposed to improve the quality of underwater imaging; they fall mainly into image enhancement methods and physical-model-based methods [9,10]. Image enhancement methods [11,12] aim to improve contrast but do not analyze the physical causes of image degradation, so their recovery of image detail is limited. Physical-model-based methods instead model the scattering of light during underwater propagation, and among them polarization-based imaging has attracted wide attention as one of the most effective approaches [13–17]. During the development of polarization-based imaging technology, many methods perform underwater reconstruction by recording multiple polarization images with different polarization parameters of the scene [18,19]. Schechner et al. [6,20] proposed an underwater polarization imaging model that extracts the relationship between the degree of polarization (DoP) of backscattered light and the transmittance from two orthogonally polarized images to remove the scattered light. Liang et al. [21] extracted the relationship between target radiance and angle of polarization (AoP) from three or four polarized images to restore target scenes in water. However, the physical model underlying polarization imaging simplifies the underwater propagation process, and its accuracy depends on how precisely images of different polarization states are acquired [18]. The degree of polarization of the backscattered light is assumed to be constant over the scene and is estimated by manually selecting certain regions, which is inconsistent with reality. Moreover, the two orthogonally polarized images at the brightest and darkest positions during rotation of the polarizer are difficult to capture accurately [22,23]. These disadvantages make the recovery results of such methods suboptimal [24,25].

In recent years, deep learning has gradually been applied to underwater imaging, mainly owing to its powerful feature extraction ability [25–28]. Deep learning can learn the inherent laws of a large amount of data to achieve clear underwater imaging. However, underwater reconstruction with deep learning also suffers from poor interpretability and a lack of physical priors. Recently, many researchers have attempted to embed physical information into deep learning models [29–31] instead of directly using end-to-end models for underwater imaging. Along this line, Hu et al. [10] proposed an underwater polarization recovery method based on a dense network, which can effectively remove backscattered light. However, such a supervised network requires capturing a large number of pixel-level paired blurry and clear underwater images [32–34], which makes dataset preparation challenging. To reduce the data requirements of supervised networks, unsupervised training methods have been proposed [35–37]. Lu et al. [27] combined depth information obtained with the dark channel prior [1,38] with traditional generative adversarial networks. However, at higher turbidity the dark channel prior can hardly provide effective depth information, and light intensity alone is an insufficient network input for reconstructing the physical information; the reconstruction quality in such circumstances needs further verification. Subsequent unsupervised underwater imaging methods use polarization images as part of the network input, but need accurately angled linear polarization images to compute priors such as AoP and DoP. Yang et al. [39] used an unsupervised network to inpaint the backscattered (background) light and used AoP to estimate the polarization parameters for underwater image restoration; Peng et al. [26] proposed a $\mathrm {U}^2 \mathrm {R}-\mathrm {pGAN}$ structure that uses AoP and degree of linear polarization (DoLP) as constraints to perform underwater recovery. In these methods, relying solely on polarization characteristics cannot avoid the dependence on the accuracy of the polarization model parameters, so other physical information needs to be added on top of the polarization information. Considering that the interference of backscattered light with detector imaging increases with distance, this paper adds depth information to the polarization dimension to improve the quality of underwater imaging.

In this paper, an unsupervised underwater restoration method based on polarization and binocular depth estimation is proposed. Building on the underwater polarization physical model and the relationship between backscattered light and distance, the method feeds depth information into an unsupervised network as a feature channel, reducing the influence of parameter errors in polarization-image computation on underwater imaging quality. At different stages of model training, the method also uses the progressively clearer generated images to obtain more accurate depth information and feeds it back into the network. This helps the network model the transmission process more accurately, which ultimately improves imaging quality. In addition, unsupervised training on unpaired data removes the strict data-matching requirement and the data-quantity limitation of previous supervised networks. Experimental results show that, for targets at different distances and in water of different turbidity, the PSNR and SSIM of the results recovered by the proposed method are significantly higher than those of traditional methods, as are the results for targets with different depth distributions. A final experiment demonstrates the effectiveness of the method by comparing depth maps from different stages.

2. Methods

When the water contains suspended micro-particles, imaging through it causes absorption and scattering of the scene light, reducing the quality of the target image. The light received by the detector comprises the radiant light of the scene and the backscattered light of the illumination. The backscattered light $B(x, y)$ increases with propagation distance in water, and can be expressed as:

$$B(x,y) = B_\infty[1-t(x,y)].$$

Among them, $B_\infty$ represents the backscattered light at infinity, and $t(x, y)$ represents the value of the transmittance of the underwater medium at $(x, y)$, which describes the absorption and scattering of radiated light in water, $t(x, y)$ is given by:

$$t(x,y) = \exp(-\beta d(x,y)).$$

Here, $\beta$ is the extinction coefficient due to scattering and absorption, generally taken as a constant, and $d(x, y)$ is the distance between the underwater point imaged at $(x, y)$ and the camera. Underwater reconstruction aims to restore the radiance of the original scene; the model is usually [40]:

$$I(x,y) = L(x,y)\cdot t(x,y)+B(x,y).$$

Here, $I(x, y)$ is the optical radiation signal received at pixel $(x, y)$ on the detector. The goal of image recovery is to remove the backscattered light and recover the object radiance $L(x, y)$. Assuming that the degree of polarization of the backscattered light is almost constant throughout the scene, and noting that the target usually depolarizes strongly [15], the degree of polarization of $L(x, y)\cdot t(x, y)$ is almost negligible, and the transmittance in the underwater polarization imaging model can be expressed as:

$$t(x, y)=1-\frac{I^{/{/}}(x, y)-I^{{\perp}}(x, y)}{P_{s c a t} B_{\infty}}.$$

Here, $I^{//}(x, y)$ and $I^{\perp }(x, y)$ are the two orthogonal linear polarization images modulated by the analyzer in front of the detector, and $P_{scat}$ is the degree of polarization of the backscattered light. According to Eq. (4), the transmission map can be calculated once $P_{scat}$ and $B_\infty$ are obtained. In the traditional method, $P_{scat}$ and $B_\infty$ are estimated from the average pixel gray level in a manually selected background region $\Omega$ of $N$ pixels; $B_\infty$ can be expressed as:

$$B_{\infty}=\frac{1}{N}\sum_{\Omega}\left(I^{/{/}}(x, y)+I^{{\perp}}(x, y)\right),$$
and $P_{scat}$ can be expressed as:
$$P_{s c a t}=\frac{\sum_{\Omega}\left(I^{/{/}}(x, y)-I^{{\perp}}(x, y)\right)}{\sum_{\Omega}\left(I^{/{/}}(x, y)+I^{{\perp}}(x, y)\right)}.$$

According to Eqs. (5) and (6), calculating $P_{scat}$ and $B_\infty$ requires selecting the background region $\Omega$ in the image by experience. However, if there is a bright target underwater or the background area is small, the selection of $\Omega$ is biased, and the values obtained from Eqs. (5) and (6) deviate substantially from the truth, degrading underwater imaging quality.
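As a concrete illustration, the traditional estimation pipeline of Eqs. (1)–(6) can be sketched in a few lines of NumPy. This is a minimal sketch under the stated assumptions (unpolarized object radiance, constant $P_{scat}$); the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def descatter(i_par, i_perp, bg_mask, eps=1e-6):
    """Traditional polarization descattering, Eqs. (1)-(6) (sketch).

    i_par, i_perp : orthogonally polarized images I// and I_perp, floats in [0, 1].
    bg_mask       : boolean mask selecting the background region Omega.
    """
    i_total = i_par + i_perp                       # total intensity I(x, y)
    # Eq. (5): backscatter at infinity, averaged over the N pixels of Omega
    b_inf = i_total[bg_mask].mean()
    # Eq. (6): degree of polarization of the backscattered light over Omega
    p_scat = ((i_par[bg_mask] - i_perp[bg_mask]).sum()
              / (i_par[bg_mask] + i_perp[bg_mask]).sum())
    # Eq. (4): transmittance map
    t = 1.0 - (i_par - i_perp) / (p_scat * b_inf + eps)
    t = np.clip(t, eps, 1.0)
    # Invert Eq. (3), with B(x, y) from Eq. (1): L = (I - B) / t
    b = b_inf * (1.0 - t)
    return (i_total - b) / t, t
```

As the text notes, the estimate is only as good as the chosen region $\Omega$: if `bg_mask` includes target pixels, both $B_\infty$ and $P_{scat}$ are biased and the recovered radiance degrades accordingly.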

2.1 Algorithm structure of underwater imaging

To address the shortcomings of current underwater restoration methods, this paper designs an unsupervised underwater reconstruction method based on polarization and binocular depth estimation. The method uses the depth information obtained by the binocular system as part of the network input, recalculates the depth map during network training, and finally obtains a clear underwater image. The specific process is shown in Algorithm 1.


Algorithm 1. Unsupervised underwater reconstruction algorithm

First, the pre-built binocular system is calibrated to obtain its intrinsic and extrinsic parameters, so that the collected underwater binocular images can subsequently be rectified to obtain the initial depth map.
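Once the pair is rectified, the initial depth map follows from the standard triangulation relation depth = focal length × baseline / disparity. Below is a minimal sketch assuming a calibrated pinhole model with the focal length in pixels and the baseline in metres; the disparity map itself would come from a stereo matcher (e.g. OpenCV's `cv2.StereoSGBM_create`), which the paper does not specify.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m, min_disp=1e-3):
    """Convert a rectified stereo disparity map (pixels) to depth (metres).

    Assumes the standard pinhole model for a calibrated, rectified pair:
    depth = focal_length * baseline / disparity. Disparities at or below
    `min_disp` are treated as invalid and mapped to depth 0.
    """
    disp = np.asarray(disparity, dtype=float)
    depth = np.zeros_like(disp)
    valid = disp > min_disp
    depth[valid] = focal_px * baseline_m / disp[valid]
    return depth
```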

Secondly, using linearly polarized illumination, the analyzer in front of the binocular system is adjusted to obtain polarization images parallel and perpendicular to the polarization direction of the illumination ($I^{//}(x, y)$ and $I^{\perp }(x, y)$), and preliminary descattering is performed according to the traditional underwater polarization reconstruction method.

Finally, the polarization information and the depth information acquired by the binocular system are fused as the input of the proposed neural network, and clear underwater images are obtained through unsupervised training. During training, the method uses the progressively clearer images produced by the network to recalculate the depth information and replace the initial depth input, thereby improving the quality of underwater imaging. The network structure is introduced in detail in Section 2.2.

2.2 Model structure

The structure of the unsupervised network used in this article is shown in Fig. 1.

Fig. 1. Schematic diagram of the neural network structure.

The whole model consists of two parts, the generator and the discriminator. Generator A produces a clear image $\mathrm {G}_{\mathrm {A}}\left (x_i,d^{\prime }\right )$ from the input blurred image $x$, from which a reconstructed image $x_R$ is then generated; during training, $x_R$ is driven to be close to $x$. At the same time, generator B converts the clear image $y$ into a blurred image $\mathrm {G}_{\mathrm {B}}\left (y\right )$, from which a reconstructed image $y_R$ is generated, and $y_R$ is driven to be close to the original clear image $y$ during training.

In this process, the discriminator should judge the generator outputs $\mathrm {G}_{\mathrm {A}}\left (x_i,d^{\prime }\right )$ and $\mathrm {G}_{\mathrm {B}}\left (y\right )$ as '1'. The role of the discriminator is to label images produced by the generator as '0' and real images as '1'; since the generator is trained so that its outputs are labeled '1', the images it generates become more realistic and closer to the desired result [35,37]. The structure of generator $\mathrm {G}_{\mathrm {A}}$ is shown in Fig. 2(a); it mainly comprises encoding and decoding modules. Between them, nine residual modules (shown in Fig. 2(c)) are introduced to extract more feature information and perform feature fusion. At the input, the initially descattered left-view and right-view images and the weighted initial depth information $d'$ are channel-fused as the inputs of the two encoding branches. The encoding module and the residual modules then extract and fuse features, and the decoding module generates the image $\mathrm {G}_{\mathrm {A}}\left (x_i,d^{\prime }\right )$. The structure of generator $\mathrm {G}_{\mathrm {B}}$ is shown in Fig. 2(b). It is similar to $\mathrm {G}_{\mathrm {A}}$, but its encoding part has only one input port, and in the decoding part two branches are derived for upsampling to generate the two images $\mathrm {G}_{\mathrm {B}}\left (y\right )$.
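The reconstruction constraints above ($x_R$ close to $x$, $y_R$ close to $y$) are the cycle-consistency terms of CycleGAN-style training. A minimal NumPy sketch of this loss follows; the L1 form and the weight $\lambda$ are common CycleGAN conventions, assumed here rather than taken from the paper.

```python
import numpy as np

def cycle_consistency_loss(x, x_r, y, y_r, lam=10.0):
    """L1 cycle-consistency penalty of CycleGAN-style training:
    x -> G_A(x) -> x_r should reproduce x, and y -> G_B(y) -> y_r
    should reproduce y. `lam` weights the cycle term against the
    adversarial losses."""
    return lam * (np.abs(x - x_r).mean() + np.abs(y - y_r).mean())
```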

Fig. 2. (a) Generator A. (b) Generator B. (c) Residual structure.

To improve its discrimination ability, the model uses a PatchGAN [35,36] discriminator. The discriminator mainly performs downsampling and feature extraction through convolution modules. Instead of evaluating the overall image, it attends to individual patches of the image, so it more easily focuses on high-frequency details.
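To see why a PatchGAN attends to local patches, one can trace the size of its real/fake score grid through the discriminator's convolutions. The sketch below assumes the common 70×70 PatchGAN layer configuration (three stride-2 and two stride-1 convolutions, kernel 4, padding 1), which the paper does not spell out.

```python
def conv_out(size, k=4, s=2, p=1):
    """Spatial size after one convolution layer (floor division)."""
    return (size + 2 * p - k) // s + 1

def patchgan_grid(h, w):
    """Score-grid size of a 70x70 PatchGAN: three stride-2 convolutions
    followed by two stride-1 ones (k=4, p=1 throughout)."""
    for s in (2, 2, 2, 1, 1):
        h, w = conv_out(h, s=s), conv_out(w, s=s)
    return h, w
```

For the 256×512 crops used in this paper, this configuration yields a 30×62 grid of scores, each judging one overlapping patch of the input rather than the image as a whole.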

3. Experiments

3.1 Datasets

In order to verify the effect of the method in this paper, an underwater imaging experiment is carried out in this work to establish an underwater dataset. The experimental setup is shown in Fig. 3.

Fig. 3. Schematic diagram of the data acquisition system. The polarizer in front of the light source is used to generate linearly polarized light to illuminate the target, and the analyzer in front of the camera is adjusted to obtain a scene with two orthogonal polarization states. The parameters of the device are also marked in the figure.

During data collection, active illumination is used: an LED (Thorlabs, SM2F32-A) serves as an incoherent light source. During the experiment, the actual output power of the light source remains constant at about 580 mW. An appropriate incoherent source raises the light intensity in the turbid water, making the imaging conditions easier to satisfy. A polarizer ($\Phi$ = 50.8 mm, extinction ratio 1,000:1) is placed in front of the light source to provide linearly polarized illumination. Two grayscale cameras of the same specification (Basler, acA1920-40gc) and two industrial camera lenses of the same specification ($f$ = 25 mm, Computar) are used to build a binocular system, and two analyzers of the same specification ($\Phi$ = 50.8 mm, extinction ratio 1,000:1) are placed in front of it. By rotating the analyzers in front of the cameras, images parallel and perpendicular to the direction of the polarizer are captured. The underwater targets are placed in a transparent glass tank (15 cm $\times$ 15 cm $\times$ 20 cm) filled with clear water, and the turbidity of the water is adjusted as required by adding milk solution. The original images captured by the cameras are 1920 $\times$ 1200 pixels; after cropping and rectification, images of size 256 $\times$ 512 are obtained. For every underwater target placed, the left and right cameras each collect images in the two polarization directions ($I^{//}(x, y)$ and $I^{\perp }(x, y)$), with the angle between the polarizer and the analyzers set to 0$^{\circ }$ and 90$^{\circ }$.

In the experiment, the distance between the camera and the underwater target is set to about 70–85 cm in steps of 5 cm, to verify the performance of the method at different imaging distances. Different amounts of milk solution are also poured into the clear water to obtain bodies of water with different turbidity, to verify the recovery advantages of the method under different turbidity. The targets are divided into planar targets and targets with varying depth, to verify that the method applies to targets with different depth distributions.

Ultimately, a dataset of 200 underwater scenes is established with the actual system; each data pair consists of orthogonally polarized images from the left and right viewpoints (i.e., images containing the parallel and perpendicular polarization states in a single viewing angle). Of these, 170 pairs are used for unsupervised training and the remaining 30 pairs for validation and testing. In each training step, images taken through clear water and through turbid water are randomly selected as input in an unpaired manner. The entire training run consists of 1200 iterations, with the learning rate decreased by 0.0001 every 100 iterations. Each iteration takes approximately 3 minutes, for a total training time of around 55 hours.
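The stated schedule (a decrease of 0.0001 every 100 iterations) can be written as a simple step decay. The base learning rate is not given in the text, so it appears below as an assumed parameter.

```python
def step_lr(iteration, base_lr, drop=1e-4, every=100):
    """Step-decay schedule: subtract `drop` from the learning rate once
    per `every` iterations, clamped at zero."""
    return max(base_lr - drop * (iteration // every), 0.0)
```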

3.2 Quality measures

Two commonly used criteria, peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM), are used to evaluate the quality of image restoration. The mean squared error (MSE) is the average squared difference between the clear image $I(x, y)$ and the degraded image $I'(x, y)$ of size $m\times n$:

$$M S E=\frac{1}{m n} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1}\left[I(i, j)-I^{\prime}(i, j)\right]^2.$$

PSNR is defined on the basis of the mean square error and expresses the ratio between the maximum possible power of a signal and the power of corrupting noise that affects the fidelity of its representation. PSNR can be denoted as:

$$P S N R=10 \cdot \lg \frac{M A X^2}{M S E},$$
where MAX represents the maximum possible pixel value of the image. The human visual system (HVS) mainly obtains structural information from the visible area. SSIM is therefore introduced to measure the similarity of two image structures:
$$SSIM=\frac{\left(2 \mu_I \mu_{I^{\prime}}+c_1\right)\left(2 \sigma_{I I^{\prime}}+c_2\right)}{\left(\mu_I^2+\mu_{I^{\prime}}^2+c_1\right)\left(\sigma_I^2+\sigma_{I^{\prime}}^2+c_2\right)},$$
where $\mu _I$ and $\mu _{I^{\prime }}$ represent the means of the clean image and the blurred image, respectively, $\sigma _{II^{\prime }}$ represents the covariance between image $I(x, y)$ and image $I'(x, y)$, and $\sigma _{I}^2$ and $\sigma _{I^{\prime }}^2$ represent the variances of the clean image and the blurred image. $c_1=\left (k_1 L\right )^2$ and $c_2=\left (k_2 L\right )^2$ are constants used to maintain stability, where $L$ represents the dynamic range of the pixels and $k_1$ and $k_2$ are also constants.
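The two metrics of Eqs. (7)–(9) can be implemented directly. For brevity the sketch below evaluates SSIM globally over the whole image, whereas the standard SSIM averages over a sliding window; the constants $k_1=0.01$ and $k_2=0.03$ are the usual defaults, assumed here.

```python
import numpy as np

def psnr(img, ref, max_val=255.0):
    """Eqs. (7)-(8): peak signal-to-noise ratio in dB."""
    mse = np.mean((img.astype(float) - ref.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(img, ref, max_val=255.0, k1=0.01, k2=0.03):
    """Eq. (9) evaluated over the whole image as a single window."""
    x, y = img.astype(float), ref.astype(float)
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()   # covariance sigma_{II'}
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)
            / ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)))
```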

3.3 Results and discussion

Three groups of experiments are compared in this section:

  • Comparison of imaging results for targets at different depths. A measured amount of milk solution is added to the tank to reach a turbidity of approximately 90 NTU, and the position of the target is changed so that the transmission distance of the light carrying the target information through the turbid water ranges from 2 to 15 cm.
  • Comparison of imaging results for targets with different depth distributions. Milk solution is added to raise the turbidity of the water in the tank to 160 NTU, and a target with depth variation in the water is imaged.
  • Comparison of imaging results for the same target in water of different turbidity. The position of the target is fixed, and milk solution is gradually added so that the turbidity of the water in the tank changes to 90 NTU, 130 NTU, and 170 NTU, to image the target in water of different turbidity.

In the experiments, the proposed method is compared with existing methods including the orthogonal-polarization reconstruction proposed by Schechner et al. [6,20], BM3D enhancement [41] applied on top of polarization reconstruction, and monocular non-polarization CycleGAN [37].

3.3.1 Experimental results for targets with different propagation distances

In the first group of experiments, the distance between the target and the front surface of the water tank is gradually increased, so that the light carrying target information travels longer in the turbid water body, and the images collected by the camera are gradually affected by the scattered light. In this experiment, four underwater recovery methods are compared, and the results are shown in Fig. 4.

Fig. 4. The reconstruction effect of different methods on the same target at different distances, when the turbidity of the water is 90 NTU. The error map is obtained by subtracting each result from the GT. (a) Original image. (b)-(f) Results of polarization reconstruction, BM3D, CycleGAN, the proposed method, and GT, respectively.

Figures 4(a)-(f) show the original images collected at different distances from the front surface of the water tank and the restorations of the same target by the different methods. The traditional polarization method removes part of the influence of the scattered light on imaging quality, but errors in the angle between the analyzer and the polarizer during acquisition introduce errors into the transmission map estimated from the polarized images, so the target information in its reconstructions remains relatively blurred. Secondary enhancement with BM3D on top of the polarization reconstruction does not yield significant improvement. CycleGAN under the monocular perspective uses only light intensity information and lacks the physical priors of depth and polarization, so as the scattering distance increases it cannot effectively restore the target information. Compared with these three methods, the proposed method feeds the depth information acquired by the binocular system into the network alongside the polarization information, giving the network more dimensions of information combined with the laws of physical propagation, and therefore guiding it to better underwater recovery. Comparing the error maps across states, the error map of the proposed method contains far less residual information than those of the other methods, indicating that the proposed method recovers more target information.

In addition, the results in (c) are compared using two image quality metrics, peak signal-to-noise ratio (PSNR) and structural similarity (SSIM); the results are shown in Table 1.


Table 1. Quantitative comparison of the reconstruction effects of different methods when the target is 12 cm away from the front surface of the water tank

From the data in Table 1, in terms of PSNR the results of the proposed method improve on the traditional polarization reconstruction and on monocular CycleGAN by 5 dB and 6 dB, respectively; in terms of SSIM, they improve by about 0.05 and 0.03. Figure 5 shows how the quality of the images restored by the proposed method changes with propagation distance: both PSNR and SSIM decline steadily as the propagation distance increases.

Fig. 5. Changes in PSNR and SSIM of the results obtained using this method as the propagation distance increases.

By changing the transmission distance of the target and comparing the PSNR and SSIM obtained by the different methods, it can be seen that introducing the depth information of the scene into the neural network guides the network to perform better feature mapping. The results of the proposed method are significantly better than those of the other methods in both PSNR and SSIM.

3.3.2 Experimental results for targets with different depth distributions

In the second group of experiments, two types of targets are placed in the tank: one whose depth varies continuously, and one composed of two depth-stepped planes. Because of the different depths, the light carrying the target structure information travels different scattering distances in the turbid water and is interfered with by scattered light to different degrees, so different parts of the image are affected by scattering differently.

Figure 6 shows the acquired raw images and the restorations of the depth-varying targets by the different methods. From the comparison, the traditional polarization reconstruction has limited effect on the deeper regions of the target, which remain blurred, and secondary BM3D enhancement on top of it brings no obvious improvement. The results of monocular non-polarization CycleGAN show obvious noise and lose some target information. By combining the laws of physical propagation, the proposed method produces results with less noise and almost no loss of target information. The error maps in Fig. 6 show that the other methods leave significantly more residual information than the proposed method, indicating that they lose more information during restoration.

Fig. 6. Comparison of underwater reconstruction methods for targets with depth changes. The error map is obtained by subtracting each result from the GT. (a1) Original image. (b1)-(f1) Results of polarization reconstruction, BM3D, CycleGAN, the proposed method, and GT, respectively. (g1) Depth map of GT.

In addition, the results in Fig. 6 are compared by calculating PSNR and SSIM, and the results are shown in Table 2.


Table 2. Quantitative comparison of reconstruction effects of different methods for targets with depth changes under different turbidity

From the data in Table 2, for the targets in (a) and (b) the PSNR of the results of the proposed method is 4 dB and 6 dB higher than that of the traditional polarization reconstruction, and the SSIM is 0.14 and 0.11 higher; compared with monocular unpolarized CycleGAN, PSNR increases by 3 dB and 5 dB, and SSIM by 0.06 and 0.18. Figure 7 likewise shows that the proposed method significantly improves PSNR and SSIM over the other methods. In addition, to verify the scattered-light removal of the proposed method, the depth information calculated from the depth-varying targets restored by the different methods is compared in Fig. 8.

Fig. 7. Comparison of results of different methods in terms of PSNR and SSIM in Scene I.

Fig. 8. Effect of different methods on depth maps with depth-varying targets in turbid water. (a) Depth map of original image. (b),(c) Depth map calculated by polarization reconstruction and the proposed method respectively. (d) Depth map of GT.

Figure 8 shows that in the original image captured through the turbid water, the structural information of the target is destroyed because imaging is strongly affected by the scattered light, so the resulting depth map is inaccurate. Restoring the target with the traditional polarization method removes part of the backscattered light and recovers part of the structural information, so the profile of the depth map becomes more accurate; but because the scattered light is not completely removed, certain areas of the depth map remain inaccurate. Finally, the proposed method, which fuses the depth map from the binocular system into the network and iteratively recalculates it during training, yields a final depth map close to the GT depth map, demonstrating its stronger removal of scattered light.

By using targets with different depth distributions for underwater image restoration, it can be observed that the proposed method in this paper can recover more structural information of the targets. This is because the proposed method is based on the relationship between the underwater transmission process and depth information. By using depth information as part of the network input, the method can gradually obtain clearer images and thus more accurate depth information, which can help the model better understand the underwater transmission process and ultimately improve the quality of underwater imaging.

3.3.3 Experimental results for targets in different water turbidities

In the third group of experiments, the concentration of milk solution in the water is gradually increased so that the turbidity rises accordingly. Figure 9 shows the original images of the same target collected at increasing turbidity and the restoration results of different methods. From the comparison, it can be seen that when the turbidity is about 90 NTU, the traditional polarization reconstruction method still has a certain effect, but at 130 NTU or higher it can no longer enhance image contrast, and applying BM3D as a secondary enhancement on this basis brings no obvious improvement. Over the range of 90 to 170 NTU, the results obtained by the proposed method are significantly better than those of traditional polarization reconstruction. In addition, error maps are obtained by subtracting each restored image from the GT image; the error map of the proposed method contains less residual information, indicating that more of the scene is recovered.
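The error maps used for comparison are, as described, per-pixel differences against the GT image. A minimal sketch (function name hypothetical):

```python
import numpy as np

def error_map(gt, recon):
    """Per-pixel absolute difference between the GT and a restored image;
    smaller values indicate that more information was recovered."""
    return np.abs(gt.astype(np.float64) - recon.astype(np.float64))
```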


Fig. 9. The reconstruction effect of different methods on the same target as the water turbidity gradually increases. The error map is obtained by subtracting the target image in each state from the GT. (a) Original image. (b)-(e) Results of polarization, BM3D, the proposed method, and GT, respectively.


In addition, the recovery results at different turbidities are compared in terms of PSNR and SSIM, as shown in Table 3. As the turbidity increases to 170 NTU, the gap in PSNR and SSIM between the proposed method and polarization reconstruction gradually narrows. If the turbidity rises further, the information in the collected polarization images may be obliterated by scattered light, in which case the proposed method can no longer produce clear underwater images. Figure 10 likewise shows that the recovery ability of the method for targets in the water decreases as the turbidity increases. By varying the turbidity of the water, this part of the experiment confirms that the proposed method effectively improves underwater imaging quality: it exploits the relationship between the underwater transmission process and depth information, and uses depth information to guide the network's feature mapping toward high-quality underwater restoration.
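The PSNR and SSIM values reported in Table 3 follow the standard definitions. A minimal NumPy sketch is given below; note that `ssim_global` evaluates the closed-form SSIM expression over a single global window, whereas library implementations (e.g., scikit-image) average SSIM over local sliding windows, so values will differ slightly.

```python
import numpy as np

def psnr(gt, img, max_val=255.0):
    """Peak signal-to-noise ratio, 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((gt.astype(np.float64) - img.astype(np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(gt, img, max_val=255.0):
    """SSIM over a single global window (library implementations such as
    scikit-image instead average the index over local sliding windows)."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    x, y = gt.astype(np.float64), img.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```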

Table 3. Quantitative comparison of reconstruction effects of different methods for the same target under different turbidities


Fig. 10. Changes in PSNR and SSIM of the results obtained using this method as the water turbidity increases.


The above three sets of comparative experiments, covering targets at different depths, different water turbidities, and different depth distributions, verify that the proposed unsupervised underwater polarization imaging method based on binocular estimation not only improves imaging quality for targets at different depths and turbidities, but also recovers targets with varying depth distributions significantly better than traditional polarization reconstruction and monocular non-polarization CycleGAN. These experiments demonstrate the effectiveness of the proposed method and offer a different perspective for subsequent underwater imaging research.

4. Conclusion

This paper combines binocular depth estimation with the unsupervised nature of CycleGAN. Based on the correlation between scene depth and the underwater imaging process, an unsupervised underwater imaging method based on binocular depth estimation is proposed, in which depth information guides the network toward better feature mapping. The method estimates scene depth with a binocular system and feeds it to the unsupervised network as part of the input. Introducing depth information helps the network model the physical process of underwater transmission more accurately, reducing the errors incurred when parameters are computed from polarization images and thereby improving the quality of underwater imaging. In addition, at different stages of training, the method uses the progressively clearer generated images to recompute more accurate depth information as network input, ultimately recovering more target information. Experimental results show that, compared with existing similar methods, the reconstructions obtained by this method have clear advantages in restoring image details and retain more target information. This method offers a new direction for combining deep learning with physical models in the field of anti-scattering imaging.

Funding

National Natural Science Foundation of China (62031018, 62101255, 61971227); Jiangsu Provincial Key Research and Development Program (BE2022391).

Acknowledgments

We thank Chenyin Zhou and Lingfeng Liu for technical support and experimental discussions.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2341–2353 (2011). [CrossRef]  

2. G. N. Bailey and N. C. Flemming, “Archaeology of the continental shelf: marine resources, submerged landscapes and underwater archaeology,” Quat. Sci. Rev. 27(23-24), 2153–2165 (2008). [CrossRef]  

3. L. B. Wolff, “Polarization vision: a new sensory approach to image understanding,” Image Vis. computing 15(2), 81–93 (1997). [CrossRef]  

4. D. M. Kocak, F. R. Dalgleish, F. M. Caimi, et al., “A focus on recent developments and trends in underwater imaging,” Mar. Technol. Soc. J. 42(1), 52–67 (2008). [CrossRef]  

5. Y. Shi, E. Guo, L. Bai, et al., “Polarization-based haze removal using self-supervised network,” Front. Phys. 9, 789232 (2022). [CrossRef]  

6. Y. Y. Schechner and N. Karpel, “Recovery of underwater visibility and structure by polarization analysis,” IEEE J. Oceanic Eng. 30(3), 570–587 (2005). [CrossRef]  

7. Y. Shi, E. Guo, S. Zhu, et al., “Research on optimal skip connection scale in learning-based scattering imaging,” in Seventh Symposium on Novel Photoelectronic Detection Technology and Applications, vol. 11763 (SPIE, 2021), pp. 235–241.

8. Q. Cheng, L. Bai, J. Han, et al., “Super-resolution imaging through the diffuser in the near-infrared via physically-based learning,” Opt. Lasers Eng. 159, 107186 (2022). [CrossRef]  

9. Z. Dong, D. Zheng, Y. Huang, et al., “A polarization-based image restoration method for both haze and underwater scattering environment,” Sci. Rep. 12(1), 1836 (2022). [CrossRef]  

10. H. Hu, Y. Han, X. Li, et al., “Physics-informed neural network for polarimetric underwater imaging,” Opt. Express 30(13), 22512–22522 (2022). [CrossRef]  

11. Y. Xu, J. Wen, L. Fei, et al., “Review of video and image defogging algorithms and related studies on image restoration and enhancement,” IEEE Access 4, 165–188 (2016). [CrossRef]

12. S. Emberton, L. Chittka, and A. Cavallaro, “Underwater image and video dehazing with pure haze region segmentation,” Comput. Vis. Image Underst. 168, 145–156 (2018). [CrossRef]  

13. F. Liu, P. Han, Y. Wei, et al., “Deeply seeing through highly turbid water by active polarization imaging,” Opt. Lett. 43(20), 4903–4906 (2018). [CrossRef]  

14. K. O. Amer, M. Elbouz, A. Alfalou, et al., “Enhancing underwater optical imaging by using a low-pass polarization filter,” Opt. Express 27(2), 621–643 (2019). [CrossRef]  

15. B. Huang, T. Liu, H. Hu, et al., “Underwater image recovery considering polarization effects of objects,” Opt. Express 24(9), 9826–9838 (2016). [CrossRef]  

16. Y. Y. Schechner and N. Karpel, “Clear underwater vision,” in Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1 (IEEE, 2004), p. I

17. F. Liu, Y. Wei, P. Han, et al., “Polarization-based exploration for clear underwater vision in natural illumination,” Opt. Express 27(3), 3629–3641 (2019). [CrossRef]  

18. X. Li, J. Xu, L. Zhang, et al., “Underwater image restoration via Stokes decomposition,” Opt. Lett. 47(11), 2854–2857 (2022). [CrossRef]

19. J. Liang, L. Ren, H. Ju, et al., “Polarimetric dehazing method for dense haze removal based on distribution analysis of angle of polarization,” Opt. Express 23(20), 26146–26157 (2015). [CrossRef]  

20. T. Treibitz and Y. Y. Schechner, “Active polarization descattering,” IEEE Trans. Pattern Anal. Mach. Intell. 31(3), 385–399 (2009). [CrossRef]  

21. J. Liang, L. Ren, E. Qu, et al., “Method for enhancing visibility of hazy images based on polarimetric imaging,” Photonics Res. 2(1), 38–44 (2014). [CrossRef]  

22. Y. Wei, P. Han, F. Liu, et al., “Enhancement of underwater vision by fully exploiting the polarization information from the Stokes vector,” Opt. Express 29(14), 22275–22287 (2021). [CrossRef]

23. N. Agarwal, J. Yoon, E. Garcia-Caurel, et al., “Spatial evolution of depolarization in homogeneous turbid media within the differential Mueller matrix formalism,” Opt. Lett. 40(23), 5634–5637 (2015). [CrossRef]

24. P. Han, F. Liu, Y. Wei, et al., “Optical correlation assists to enhance underwater polarization imaging performance,” Opt. Lasers Eng. 134, 106256 (2020). [CrossRef]  

25. X. Ding, Y. Wang, and X. Fu, “Multi-polarization fusion generative adversarial networks for clear underwater imaging,” Opt. Lasers Eng. 152, 106971 (2022). [CrossRef]  

26. P. Qi, X. Li, Y. Han, et al., “U2R-pGAN: unpaired underwater-image recovery with polarimetric generative adversarial network,” Opt. Lasers Eng. 157, 107112 (2022). [CrossRef]

27. J. Lu, N. Li, S. Zhang, et al., “Multi-scale adversarial network for underwater image restoration,” Opt. Laser Technol. 110, 105–113 (2019). [CrossRef]  

28. J. Zhang, J. Shao, H. Luo, et al., “Learning a convolutional demosaicing network for microgrid polarimeter imagery,” Opt. Lett. 43(18), 4534–4537 (2018). [CrossRef]  

29. Y. Shi, E. Guo, M. Sun, et al., “Non-invasive imaging through scattering medium and around corners beyond 3d memory effect,” Opt. Lett. 47(17), 4363–4366 (2022). [CrossRef]  

30. S. Zhu, E. Guo, J. Gu, et al., “Efficient color imaging through unknown opaque scattering layers via physics-aware learning,” Opt. Express 29(24), 40024–40037 (2021). [CrossRef]  

31. E. Guo, Y. Sun, S. Zhu, et al., “Single-shot color object reconstruction through scattering medium based on neural network,” Opt. Lasers Eng. 136, 106310 (2021). [CrossRef]  

32. J. Pan, S. Liu, D. Sun, et al., “Learning dual convolutional neural networks for low-level vision,” in Conference on computer vision and pattern recognition (IEEE, 2018), pp. 3070–3079.

33. B. Li, X. Peng, Z. Wang, et al., “AOD-Net: all-in-one dehazing network,” in International conference on computer vision (IEEE, 2017), pp. 4770–4778.

34. X. Liu, X. Li, and S.-C. Chen, “Enhanced polarization demosaicking network via a precise angle of polarization loss calculation method,” Opt. Lett. 47(5), 1065–1068 (2022). [CrossRef]  

35. I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., “Generative adversarial networks,” Commun. ACM 63(11), 139–144 (2020). [CrossRef]  

36. J.-Y. Zhu, T. Park, P. Isola, et al., “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in International conference on computer vision (IEEE, 2017), pp. 2223–2232.

37. P. Isola, J.-Y. Zhu, T. Zhou, et al., “Image-to-image translation with conditional adversarial networks,” in Conference on computer vision and pattern recognition (IEEE, 2017), pp. 1125–1134.

38. R. Sathya, M. Bharathi, and G. Dhivyasri, “Underwater image enhancement by dark channel prior,” in 2nd International Conference on Electronics and Communication Systems (IEEE, 2015), pp. 1119–1123.

39. S. Yang, B. Qu, G. Liu, et al., “Unsupervised learning polarimetric underwater image recovery under nonuniform optical fields,” Appl. Opt. 60(26), 8198–8205 (2021). [CrossRef]  

40. H. Horvath, “On the applicability of the Koschmieder visibility formula,” Atmos. Environ. (1967) 5(3), 177–184 (1971). [CrossRef]

41. K. Dabov, A. Foi, V. Katkovnik, et al., “Image denoising by sparse 3-d transform-domain collaborative filtering,” IEEE Trans. on Image Process. 16(8), 2080–2095 (2007). [CrossRef]  

Figures (10)

Fig. 1. Schematic diagram of the neural network structure.
Fig. 2. (a) Generator A. (b) Generator B. (c) Residual structure.
Fig. 3. Schematic diagram of the data acquisition system. The polarizer in front of the light source generates linearly polarized light to illuminate the target, and the analyzer in front of the camera is adjusted to obtain the scene in two orthogonal polarization states. The parameters of the device are also marked in the figure.
Fig. 4. The reconstruction effect of different methods on the same target at different distances, when the water turbidity is 90 NTU. The error map is obtained by subtracting the target image in each state from the GT. (a) Original image. (b)-(f) Results of polarization, BM3D, CycleGAN, the proposed method, and GT, respectively.
Fig. 5. Changes in PSNR and SSIM of the results obtained using this method as the propagation distance increases.
Fig. 6. Comparison of underwater reconstruction methods on targets with depth changes. The error map is obtained by subtracting the target image in each state from the GT. (a1) Original image. (b1)-(f1) Results of polarization reconstruction, BM3D, CycleGAN, the proposed method, and GT, respectively. (g1) Depth map of GT.
Fig. 7. Comparison of results of different methods in terms of PSNR and SSIM in Scene I.
Fig. 8. Effect of different methods on depth maps with depth-varying targets in turbid water. (a) Depth map of the original image. (b),(c) Depth maps calculated by polarization reconstruction and the proposed method, respectively. (d) Depth map of GT.
Fig. 9. The reconstruction effect of different methods on the same target as the water turbidity gradually increases. The error map is obtained by subtracting the target image in each state from the GT. (a) Original image. (b)-(e) Results of polarization, BM3D, the proposed method, and GT, respectively.
Fig. 10. Changes in PSNR and SSIM of the results obtained using this method as the water turbidity increases.

Tables (4)

Algorithm 1. Unsupervised underwater reconstruction algorithm
Table 1. Quantitative comparison of the reconstruction effects of different methods when the target is 12 cm from the front surface of the water tank
Table 2. Quantitative comparison of reconstruction effects of different methods for targets with depth changes under different turbidities
Table 3. Quantitative comparison of reconstruction effects of different methods for the same target under different turbidities

Equations (9)

$$B(x,y) = B_{\infty}\left[1 - t(x,y)\right].$$
$$t(x,y) = \exp\left(-\beta d(x,y)\right).$$
$$I(x,y) = L(x,y)\,t(x,y) + B(x,y).$$
$$t(x,y) = 1 - \frac{I_{\parallel}(x,y) - I_{\perp}(x,y)}{P_{scat}\,B_{\infty}}.$$
$$B_{\infty} = \sum_{\Omega}\left(\frac{I_{\parallel}}{N} + \frac{I_{\perp}}{N}\right),$$
$$P_{scat} = \frac{\sum_{\Omega}\left(I_{\parallel}(x,y) - I_{\perp}(x,y)\right)}{\sum_{\Omega}\left(I_{\parallel}(x,y) + I_{\perp}(x,y)\right)}.$$
$$MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - I'(i,j)\right]^{2}.$$
$$PSNR = 10\,\lg\frac{MAX^{2}}{MSE},$$
$$SSIM = \frac{(2\mu_{I}\mu_{I'} + c_{1})(2\sigma_{II'} + c_{2})}{(\mu_{I}^{2} + \mu_{I'}^{2} + c_{1})(\sigma_{I}^{2} + \sigma_{I'}^{2} + c_{2})},$$
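The Schechner-type descattering that these equations describe, estimating the backscatter at infinity and the degree of polarization of the backscatter from a background region Ω of the two orthogonally polarized images, recovering the transmittance, and then inverting the imaging model, can be sketched as follows. The function name and the clipping thresholds are illustrative choices, not part of the original method.

```python
import numpy as np

def descatter(i_par, i_perp, omega):
    """Polarization descattering in the Schechner style (sketch).

    i_par, i_perp: images taken through orthogonal analyzer orientations.
    omega: boolean mask of a background region assumed to contain only
    backscattered light. Clipping thresholds are illustrative choices."""
    i_total = i_par + i_perp
    # Backscatter at infinity: mean total intensity over the background region.
    b_inf = i_total[omega].mean()
    # Degree of polarization of the backscatter, estimated over the same region.
    p_scat = (i_par - i_perp)[omega].sum() / max(i_total[omega].sum(), 1e-9)
    # Transmittance from the polarization-difference image.
    t = 1.0 - (i_par - i_perp) / max(p_scat * b_inf, 1e-9)
    t = np.clip(t, 1e-3, 1.0)
    # Remove the backscatter and compensate for attenuation.
    back = b_inf * (1.0 - t)
    return (i_total - back) / t
```

On synthetic data generated from the same model (unpolarized target radiance, partially polarized backscatter), this inversion recovers the target radiance wherever the transmittance is not negligible.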