Learning mapping by curve iteration estimation For real-time underwater image enhancement

Junting Wang; Xiufen Ye; Yusong Liu; Xinkui Mei; Xing Wei

doi:10.1364/OE.512397

1. Introduction

The development and utilization of marine resources have become increasingly important to enhance exploration and advancement in the oceanic world [1]. However, images captured underwater are often affected by various factors such as light attenuation, scattering, and absorption, which result in poor quality and make it challenging to extract necessary information [2]. Underwater image enhancement serves as a basis for downstream tasks by improving the visual perception quality of underwater images. The significance of this technique is self-evident.

Over the past few years, researchers have developed several underwater image enhancement methods, broadly categorized into three groups: the physical model-based methods, model-free methods and deep learning-based methods [3]. The physical model-based methods aim to simulate the underwater imaging process by regarding the enhanced process as an inverse solution problem of an equation. Representatively, the Akkaynak-Treibitz imaging model [4], the DCP (Dark Channel Prior) model [5], and its series of variants. However, employing underwater image enhancement methods based solely on physical models is confronted with a significant challenge that originates from the complexity of the underwater environment, as the presence of various factors cannot be taken into account. The model-free methods [6,7] involve the direct modification of the pixel values of the image’s RGB, HSV, and other color spaces to improve the contrast of the image and other quality indexes [8]. However, these methods are prone to over-enhance or under-enhance the image.

By comparison, methods based on deep learning [9,10] are more likely to achieve more stable results. Representatively, [11] proposed Ucolor and employed media transmission-guided multicolor space embedding. Its application is limited by the access of the transmission map. Some researchers aimed to design complex networks, incorporate additional loss functions, or integrate traditional prior knowledge into deep learning methods. Such approaches increase the complexity of the model and might overcomplicate the low-level vision tasks, which in turn imposes significant restrictions when applied. Meanwhile, complex models are more demanding on computing resources and hinder the implementation of real-time underwater image enhancement [12].

We present a novel iterative curve estimation framework for underwater image enhancement. We aim to utilize curves to simulate the nonlinear mapping relationship between degraded and non-degraded images. This approach mirrors the fundamental core logic of deep learning networks, which means learning mapping. Furthermore, we offer a comprehensive explanation and substantiation for the mathematical reasoning behind curve. Based on this rationale, we choose the lowest-order polynomial curve to cover a wide range of mapping scenarios.

First, we design an iterative curve with parameters to model the mapping between raw and enhanced images. Compared to the conventional end-to-end models, our algorithm greatly reduces the demand for model learning capabilities. Then, we introduce CieNet, a network for curve parameter estimation, and a set of loss functions used to train this network.

Our algorithm has few parameters and requires low computing resources, yet is capable of achieving real-time underwater image enhancement. The main contributions of this paper are summarized as follows:

• We simplify the task of enhancing underwater images as a curve parameter estimation task, which provides a new way to develop low-level vision tasks in underwater environments.
• We propose an innovative iterative curve estimation framework for enhancing underwater images. We introduce the iterative curves with parameters, then design the parameter estimation network (CieNet) and a set of loss functions.
• Our model is characterized by a low number of parameters and a fast computational speed, making it suitable for deployment on small and resource-constrained devices. Its flexibility allows for the achievement of real-time enhancement, thereby demonstrating strong practical applicability.

2. Related work

Underwater image enhancement has been extensively researched since it can improve the visual perception quality of images. It has undergone significant development in recent years, which can be classified into three distinct categories: physical model-based methods, non-physical model-based methods, and deep learning-based methods.

2.1 Physical model-based methods

The physical model-based methods involve simulating the underwater image imaging process and constructing an appropriate imaging model. The model parameters are then determined, and the natural image without any distortions due to light attenuation or absorption can be restored based on these parameters.

He et al. [5] estimated the ambient light and transmission map by using the dark channel prior. Drews et al. [13] proposed the unreliability of the red channel in underwater images due to the severe absorption and attenuation. Instead, the green and blue channels can provide enough information to restore images. Peng et al. [14] proposed a transmission map estimation method for underwater scenes based on image blurriness and light absorption. Liu et al. [15] proposed a new underwater light attenuation prior (NULAP) and a colorless tilt correction fusion smoothing filtering method to enhance color saturation and edge details.

Another branch of research attempted to simulate more accurate degradation processes using the optical mechanism of underwater imaging. Zhao et al. [16] proposed a method of deriving inherent optical properties of water from the background color based on an underwater image imaging model. Akkaynak et al. [4] pointed out that the general underwater imaging models ignore that the attenuation coefficients of direct and backscattered components are different. It presented an improved underwater imaging model. Xie et al. [17] proposed a variational framework guided by a red channel prior and with additional consideration for the forward scatter component based on the underwater image formation model.

The simplified physical model can be challenging to simulate the complex underwater degradation processes. However, using prior knowledge to develop accurate mathematical models can present significant limitations in their applications due to their complexity in calculating parameters. It is also the most important problem to be solved in the physical model-based methods.

2.2 Non-physical model-based methods

Traditional Non-physical model-based methods aim to produce visually satisfactory enhancement results by directly modifying the RGB pixel values of the image. Dong et al. [18] utilized histogram stretching in the LAB color model to enhance underwater images. Zhuang et al. [19] proposed a Retinex variational model based on transcendental Laplacian reflection. Zhou et al. [20] proposed an underwater image enhancement method that involves removing color projection and using backscatter pixels prior.

Image fusion is a representative approach, where Ancuti et al. [8] proposed fusing the results of contrast enhancement and color correction as the final image enhancement result. Further, Ancuti et al. [21] built the enhancement results by blending color-compensated and white-balanced versions of the original degraded images. Ancuti et al. [22] employed a color-transfer strategy and operated in a color opponent space that helps compensate for chromatic loss automatically. Li et al. [23] proposed a combination of color compensation and color constancy algorithms using specific categories to eliminate color offset.

The underwater image enhancement methods using non-physical models can enhance the contrast of underwater images but tend to cause over-enhancement, producing results that score well in objective evaluation indexes but prove poor in subjective perception.

2.3 Deep learning-based methods

The explosion of deep learning in low-level vision tasks has witnessed massive image enhancement networks being proposed. Since the GT (Ground Truth) for underwater image enhancement is almost impossible to obtain, enhanced models based on supervised methods have used synthetic paired datasets. It is unclear how these algorithms would perform in the wild. To bridge this gap, Li et al. [24] constructed an underwater image enhancement benchmark dataset UIEB and proposed WaterNet. Li et al. [25] proposed UWCNN, which can simulate different underwater types and degradation levels. Chen et al. [26] proposed an underwater image enhancement model with detection sensors to guide the model in producing images that facilitate detection. Zhou et al. proposed a feature alignment module (FAM) [27] and a multi-feature fusion (MFF) module [28] to fuse derived image features, which can improve the final reconstruction effect.

On the other hand, some unsupervised algorithms had been proposed to get rid of the dependence on paired data. Guo et al. [29] proposed zero-reference deep curve estimation for low-light image enhancement. Li et al. [30] proposed WaterGAN, a generative adversarial network for generating realistic underwater images from in-air images. Li et al. [31] proposed a weakly supervised underwater color transfer model called Water CycleGAN to correct color distortion. Guo et al. [32] proposed the residual multi-scale dense block in the generator, which can improve the performance by delivering more details. Islam et al. [33] presented Funie-GAN for real-time underwater image enhancement, combining paired and unpaired training. Jiang et al. [34] transferred in-air images to real-world underwater image enhancement, which eliminates the dependence on underwater paired data. It tries to reduce the gap between the real and synthetic data domains, thus enhancing the generalization performance of the deep learning model.

Pure data-driven methods make the enhanced images lack physical meaning. Limited to the quality of the training data, most deep learning-based image enhancement algorithms could be improved when targeting data in the wild. Meanwhile, the complexity of the model makes it impossible for these algorithms to perform real-time underwater image enhancement. The deep learning-based methods must address the challenge of balancing real-time performance and overall performance.

3. Method

Figure 1 illustrates the framework of our study. Initially, we obtained the RGB scatter map for the raw underwater image and the GT. Then, we feed the scatter map as an input to CieNet, a network for curve parameter estimation, which estimates each of the three channels separately. The curve parameters are updated iteratively, and the enhanced image is obtained through iterative parameter application. In this section, we will provide a detailed explanation of the design and application of the iterative curve, the parameter estimation network CieNet, and a set of loss functions we used.

Fig. 1. The framework of our method. The RGB scatter map for the raw underwater image and the GT (Ground Truth) is provided as input to the curve parameter estimation network, CieNet. The curve parameters of the three channels are obtained separately. By applying the curve mapping function iteratively, the final enhancement result is obtained and we display the result of the RGB scatter mapping relationship we learned on the far right.

Download Full Size | PDF

3.1 Curve estimation for underwater image enhancement

Our statistical experimental analysis revealed that there exists an intuitive pattern for pixel value mapping between the raw and the enhanced underwater image. As shown in Fig. 2(c)-(e), we show a three-channel scatter map depicting the mapping relationship between the raw and the enhanced underwater image. The x-axis depicts the pixel values of the raw underwater image, while the y-axis denotes the pixel values of the enhanced image, with values normalized between 0 and 1. While it is challenging to describe this mapping pattern in mathematical terms, the scatter plot emphasizes a particular area that can be illustrated via a dynamic curve featuring specific parameters. The challenge in this research is the designing of the curve, which we will address in the following discussion combining theory and experiments.

Fig. 2. (a)This dynamic curve with $\alpha =\pm 1,0.5,0.25,0.1$. (b)This dynamic curve with $\alpha =\pm 1$ for 1, 2, 3, and 4 iterations. (c)Scatter map for R channel with $\alpha =\pm 1$ for 1, 2 and 3 iterations. (d)Scatter map for G channel with $\alpha =\pm 1$ for 1, 2 and 3 iterations. (e)Scatter map for B channel with $\alpha =\pm 1$ for 1, 2 and 3 iterations.

Download Full Size | PDF

Our method transforms the task of learning pixel value mapping between input and output images into estimating curve parameters. This approach is comparable to incorporating a powerful mathematical statistical law into the deep learning network, thus significantly reducing the learning requirements of the network for the same amount of data. Our further experiments demonstrate that we can obtain satisfactory performance by using a lightweight network called CieNet.

Theoretically, polynomials of high order can infinitely approximate an arbitrary curve, and more complex curves often necessitate higher-order polynomials for accurate fitting. Nevertheless, high-order polynomials are complex in form, difficult to design, and have numerous parameters. We can achieve the function of high-order curves by iterating through a simple low-order curve.

The low-order curve we use to iterate can be expressed as:

(1)$$E_{\Omega }\left ( x \right ) =\alpha _{\Omega }\left ( x \right ) \cdot R_{\Omega }\left ( x \right )^{2}+ \left ( 1-\alpha _{\Omega }\left ( x \right ) \right ) \cdot R_{\Omega }\left ( x \right )$$

where $\Omega \in \left ( R,G,B \right )$, $R_{\Omega } \left ( x \right )$ is the pixel value of the raw underwater image, $E_{\Omega } \left ( x \right )$ represents the value of enhanced images and $\alpha _{\Omega }\left ( x \right )$ is the parameter we need to solve.

The curve meets our following requirements: 1. It has a simple form with few parameters and is differentiable to enable gradient retrieval. 2. It is monotonous, making the learning of mapping more efficient, ensuring image contrast, and maintaining a constant output of $\left [ 0,1 \right ]$ when normalizing the input.

The changes of the lower-order curve as $\alpha \in \left [ -1,1 \right ]$ are exhibited in Fig. 2(a). It is observable that when $\alpha =\pm 1$, the curve represents the boundary of the mapping it can fit. This dynamic curve with parameters can fit the mapping included in the two curves when $\alpha =\pm 1$.

In order to adapt to more complex situations, we apply this curve iteratively according to Eq. (2).

(2)$$E_{n}(x)=\alpha _{n}(x)\cdot E_{n-1}^{2}(x)+(1-\alpha _{n}(x) ) \cdot E_{n-1}(x)$$

where $n$ is the number of iterations.

Figure 2(b) depicts the curve with $\alpha =\pm 1$ for 1, 2, 3 and 4 iterations, revealing that the curve can fit a more comprehensive dynamic range with more iterations. Figure 2(c)-(e) depicts an RGB scatter map. Three iterations of the curve can cover almost all cases of RGB scatter mapping, barring apparent boundary anomalies. Subsequent experiments have also proven that we can achieve satisfactory results using three iterations in the EUVP dataset [33].

Although the RGB channels undergo a similar mapping process, there are significant differences in the mapping results. As shown in Fig. 3, we compare and display the estimations for $\alpha$ of RGB channels as a normalized heat map, taking the average of the parameters over three iterations. The color bar in Fig. 3(c) shows that red and blue colors represent more drastic changes. The $R$ channel displays a significantly greater degree of change than the $GB$ channels, which aligns with the attenuation law of underwater images. In theory, light undergoes energy decay due to absorption by water during the imaging process of underwater images. Typically, red light decays the fastest, while blue and green light lines decay at slower rates. As such, the red channel exhibits the most severe attenuation in underwater images, making it the most important channel for compensation. The findings in Fig. 3 align with this theory. Furthermore, separating the three channels for estimation can yield superior enhancement outcomes, as the RGB channels focus on different enhancement objects as shown in Fig. 3(c)-(e).

Fig. 3. Visualization of heat map for average curve parameters $\alpha$ of different channels in three iterative estimations. (a)Raw. (b)GT. (c)Heat map of R channel. (c)Heat map of G channel. (c)Heat map of B channel.

Download Full Size | PDF

3.2 CieNet

Specifically, CieNet takes a paired RGB image as input and outputs the corresponding parameters of the curve during the three iterations. Figure 4 displays the detailed network structure of our proposed model, CieNet. The architecture of CieNet is extremely simple, which consists of seven sub-modules (CoECA and TaECA). CoECA (Conv-Efficient Channel Attention) and TaECA (Tanh-Efficient Channel Attention) employ the convolution activation function stack block combined with the attention mechanism. The utilization of 1D convolution enables the achievement of local cross-channel interactions without the need for dimensionality reduction [35]. As a result, it significantly decreases the model’s complexity while preserving its performance. We connect these modules into a symmetrical structure through skip connections. Using a lightweight network greatly decreases the limitations on application platforms and permits real-time underwater image enhancement on embedded devices. Importantly, our subsequent experiments have demonstrated the superior performance of our algorithm in terms of running-time.

Fig. 4. The structure of CieNet.

Download Full Size | PDF

3.3 Loss function

This paper proposes a novel combination of four loss functions: content loss $L_{gt}$, perception loss $L_{perce}$, sharpness loss $L_{sharp}$, and color loss $L_{color}$. By integrating these losses, our training model can effectively enhance the visual perception quality of underwater images.

$L_{gt}$ is constructed based on Eq. (3). We select $L_{1}$ loss due to its low vulnerability to blurring.

(3)$$L_{gt}=\left \| Y-E\right \| _{1}$$

where $Y$ and $E$ represent the pixel value of the GT and our enhanced image, respectively.

The perception loss $L_{perce}$ is designed to enforce similar feature expressions between the enhanced images and the GT. By retaining these features during the enhancement process, the results can exhibit more refined texture. $L_{perce}$ is calculated based on Eq. (4).

(4)$$L_{perce}=\left \| \phi (Y)- \phi (E)\right \| _{2}$$

where $\phi (Y)$ and $\phi (E)$ correspond to the features extracted from layer relu5_4 of a pre-trained VGG-19 network.

In our experiments, we observed that the unprocessed images were often blurry and lacked sufficient clarity and contrast. To address this, we introduced the sharpness loss to measure image blurriness as computed in Eq. (5).

(5)$$L_{sharp}=\left \| \triangledown Y-{\triangledown} E \right \| _{1}$$

where $\triangledown Y$ and $\triangledown E$ correspond to the gradient of the GT and the enhanced image, as computed in Eq. (6).

(6)$$\triangledown p=p_{x}^{2} +p_{y}^{2}$$

where $p_{x}^{2}$ and $p_{y}^{2}$ are gradients of the sobel operator obtained in the $x$ and $y$ directions, respectively. By controlling the gradient changes of the image in the $x$ and $y$ directions, $L_{sharp}$ controls the image’s smoothness, preventing the sharp changes of adjacent pixels from causing image distortion and making the generated image gradient similar to GT.

The construction of color loss $L_{color}$ in this study is based on the theory of color constancy. The Gray World Hypothesis states that natural images with good visual perception should have similar mean and histogram distributions across all color channels. Consequently, we calculate the mean pixel intensity of the three RGB channels and use Eq. (7) to determine $L_{color}$, where $I_{R}$, $I_{G}$, and $I_{B}$ denote the average pixel intensity of the three channels we enhanced.

(7)$$L_{color}=(I_{R}- I_{G})^{2} +(I_{R}- I_{B})^{2} +(I_{G}- I_{B})^{2}$$

We construct the total loss $L_{total}$ according to Eq. (8).

(8)$$L_{total}=\lambda _{1}L_{gt}+\lambda _{2}L_{perce}+\lambda _{3}L_{sharp}+\lambda _{4}L_{color}$$

The values of these four losses $L_{gt}$, $L_{perce}$, $L_{sharp}$ and $L_{color}$ are very close in magnitude. The initial value of all the coefficients was set to 1. Through experiments, it was found that during training, $L_{gt}$ and $L_{perce}$ played a more important role in enhancing the overall image quality. As $\lambda _{1}$ and $\lambda _{2}$ gradually increased, the overall training can converge faster and produce images with higher enhancement quality in the early stages. However, when $\lambda _{3}$ and $\lambda _{4}$ increased, the overall training oscillated and the quality of the model obtained by final contraction was not as good as the former. So, the coefficients of total loss were selected as $\lambda _{1}=\lambda _{2}=1.5$ and $\lambda _{3}=\lambda _{4}=1$.

4. Experiments

In this section, firstly, we conducted qualitative and quantitative experiments to validate the superior performance of our method. We selected excellent underwater image enhancement methods from recent years, including IBLA [14], MLLE [36] and HLPR [19] for traditional methods; WaterNet [24], Funie-Gan [33], URanker [37] and UIEC^ˆ2Net [38] for deep learning methods. Additionally, to ensure the effectiveness of our network and evaluate the impact of our chosen loss functions and parameter selection, we conducted a series of ablation experiments. Furthermore, we applied our enhancement results to the tasks of SIFT keypoint detection, matching, and image segmentation. The significant improvement in performance demonstrated the effectiveness of our algorithm when applied to the downstream tasks.

4.1 Datasets

1) EUVP Dataset [33]: This dataset comprises underwater images captured using a diverse range of seven camera equipment. These images exhibit varying water quality, lighting conditions, and landscapes. The CycleGAN [39] is employed to generate distorted images from the good-quality images captured, thus creating a paired simulated dataset for supervised training. We utilized 2184 image pairs from the scenes category of the EUVP dataset. Out of which, 1964 images constituted the training set, while 220 images were reserved as the test set.

2) UIEB Dataset [24]: This dataset comprises a diverse range of underwater scenes, encompassing images with varying quality, characteristics and content. Various outstanding image enhancement algorithms are applied to improve the original authentic images, and the outcome with the highest visual perceptual quality is chosen as the final GT. We utilized a set of 890 image pairs from the UIEB dataset, with 130 images explicitly reserved for the test set.

3) LUSI Dataset [10]: This dataset comprises authentic underwater images with diverse water scenes, water types, lighting conditions, and target categories. The selection of GT involves two rounds of subjective and objective evaluations to minimize potential deviations. We partitioned the dataset into 647 samples for testing and 3632 for training.

4.2 Implementation details

All images were processed with a resolution of 240 $\times$ 320. The implementation was carried out on Pytorch and tested on $RTX3060$. A batch size of 8 and a learning rate of 0.001 were used in this experiment, and all models were trained for 200 epochs. Based on the mathematical background introduced in the Method, we selected the number of iterations $n=3$ for the EUVP dataset, applying the same principle, $n=4$ for the UIEB and the LUSI datasets.

4.3 Qualitative experiments

The results of our qualitative comparative experiments are presented in Fig. 5. It suggests that traditional algorithms tend to over-enhance images and compromise visual quality despite high contrast. Moreover, traditional algorithms exhibit color tendencies. The MLLE-enhanced images tend to be grayscale, the IBLA-enhanced images maintain a similar blue-green tone to the raw images, while the HLPR-enhanced images are red-tinted. This result is consistent with our previous analysis of the shortcomings of traditional methods. In contrast, deep learning-based methods outperform traditional methods in color tone. Our algorithm produces hues closest to GT while maintaining good processing outcomes at the boundary. The results of Funie-GAN leave traces of deconvolution kernels and are more prone to failure in boundary and extreme environments. In comparison to WaterNet, UIEC^ˆ2Net, and URanker, our algorithm yields significantly clearer results and eliminates blurring more efficiently. We analyzed Fig. 5 in detail and zoomed in on the marked areas in the red box. The turtle can be clearly seen. Our method, WaterNet, and Funie-GAN can produce the closest color tone to GT without overexposure or color imbalance. Meanwhile, the results obtained by our method show that the turtle’s texture is the most realistic and clear.

Fig. 5. The qualitative comparison of different underwater image enhancement algorithms. (a) Raw. (b) GT. (c) IBLA [14]. (d) MLLE [36]. (e) HLPR [19]. (f) WaterNet [24]. (g) Funie-GAN [33]. (h) URanker [37]. (i) UIEC^ˆ2Net [38]. (j) CieNet(ours).

Download Full Size | PDF

4.4 Quantitative experiments

To quantitatively assess the effectiveness of different methods, we conducted comprehensive full-reference and no-reference evaluation. We use three indexes, MSE (Mean Squared Error), PSNR (Peak Signal-to-Noise Ratio), and SSIM (Structural Similarity), to conduct a comprehensive full-reference evaluation.

A higher PSNR score and a lower MSE score signify a stronger resemblance between the result and the GT in terms of image content. The calculation of SSIM incorporates information from three key aspects: brightness, contrast, and structure. A value closer to 1 indicates higher similarity in image structure, texture, and superior quality between the tested image and the GT.

We use UCIQE and UIQM for non-reference evaluation. The evaluation of underwater image quality in UIQM consists of three components: the image color metric (UICM), the sharpness metric (UISM) and the contrast metric (UIConM):

(9)$$UIQM=c_{1}\times UICM + c_{2}\times UISM + c_{3}\times UIConM$$

where $c_{1}$, $c_{2}$ and $c_{3}$ are set to 0.0282, 0.2953 and 3.5753. The UCIQE calculation entails the standard deviation $\sigma _{c}$, the average of brightness contrast $conl$ and the saturation $\mu _{s}$:

(10)$$UCIQE=k_{1}\times \sigma _{c} + k_{2}\times conl + k_{3}\times \mu _{s}$$

where $k_{1}$, $k_{2}$ and $k_{3}$ are set to 0.4680, 0.2745 and 0.2576. In theory, higher scores for UCIQE and UIQM indicate higher image quality.

The results of the full-reference evaluation are presented in Table 1, with the best results highlighted in bold and the second-best results marked with underscores. Our method outperformed all other algorithms on the EUVP and the LUSI dataset in terms of all three indexes. Particularly, our results significantly surpass those of competing algorithms regarding the SSIM index. This outcome aligns with our qualitative experiment analysis, indicating that our improved texture features outperform others. Additionally, our experimental results demonstrate that we achieved the second-best performance on the UIEB dataset, comparable to the performance of the URanker algorithm.

Table 1. The full-reference evaluation of different underwater image enhancement algorithms.

View Table | View all tables in this article

The outcomes of our non-reference evaluation experiments are presented in Table 2. We highlighted the scores of the GT in bold and marked the methods with the highest scores using underscores. All methods have shown improvements in the UCIQE and UIQM indexes compared to the raw image. The results reveal that even the UCIQE and UIQM scores of the GT are not considerably high. As stated in [40], the non-reference indexes are biased.

Table 2. The No-reference evaluation of different underwater image enhancement algorithms.

View Table | View all tables in this article

The UCIQE and UIQM indexes partially reflect the image quality, yet a higher score does not necessarily corresponds to superior image quality. Through Eq. (9), we can see that UIQM can only comprehensively consider the information on image color, sharpness, and contrast. When the sharpness and contrast of the image are very high, it can ensure that the UIQM score is excellent. However, the image is very prone to overexposure, which will greatly damage the perceptual quality of the image. In the same way, UCIQE can only consider the image’s standard deviation, brightness, and saturation comprehensively. Traditional methods can directly modify the pixel value distribution of RGB to improve these three components, but they can easily cause excessive image processing. Combining Fig. 5 and Table 2, it can be seen that HLPR achieves excellent UCIQE and UIQM indexes on all three datasets. However, as depicted in Fig. 5, upon visual inspection, it is evident that the method does not yield the highest image quality. Therefore, it is necessary to develop more appropriate evaluation indexes and mechanisms to assess visual quality when the GT is unavailable.

4.5 Ablation experiments

We have conducted ablation experiments to verify the effectiveness of each loss. Specifically, we tested the enhancement results by removing specific losses during training. Figure 6 shows the qualitative comparison of enhanced results after removing specific loss. It is evident in Fig. 6 that using an overall combination of losses produces superior results. $L_{gt}$ maintains the stability of content for the enhanced image, $L_{color}$ adjusts the image to be closer to the natural image in terms of color, while $L_{perce}$ and $L_{sharp}$ improve the contrast and partially eliminate the blurriness of the image.

Fig. 6. Qualitative results of each loss function ablation experiments. (a) Raw. (b) GT. (c) Ours. (d) w/o $L_{color}$. (e) w/o $L_{gt}$. (f) w/o $L_{perce}$. (g) w/o $L_{sharp}$.

Download Full Size | PDF

Table 3 shows the quantitative evaluation results conducted on the loss function ablation experiments. The absence of the three losses of $L_{gt}$, $L_{perce}$ and $L_{sharp}$ led to a decline in the final three indexes of MSE, PSNR, and SSIM. While the absence of $L_{color}$ resulted in improvement in these three indexes. We analyzed the dataset and enlarged the local area in the last row of Fig. 6 for a detailed comparison. The color tone of GT (b) still leans towards the underwater blue-green tone. Our enhancement results significantly eliminated this underwater color tone in (c), whereas the enhancement result without $L_{color}$ restored the blue-green tone in (d). The addition of color constant loss $L_{color}$, while reducing these three indexes to some extent, can effectively remove underwater tones. The UIQM is calculated to verify the effectiveness of $L_{color}$ in improving image color. According to Eq. (9), in the calculation of UIQM, the image color metric (UICM) is considered to evaluate the image’s color. The experimental results are shown in the Table 3, and we can see that without $L_{color}$, UIQM will decrease. This shows that $L_{color}$ can effectively improve image color compared to other losses.

Table 3. The full-reference evaluation of ablation experiments for each loss.

View Table | View all tables in this article

Our method achieved satisfactory results by three iterations with a network composed of only seven submodules on the EUVP dataset. To test the effective performance of our selected parameters, we increase the number of iterations and sub-modules to observe whether the performance can be improved. The results in Table 4 illustrate that even with the simplest combination, we achieved the best results. The increase in the number of iterations and deepening of the network does not significantly improve the algorithm’s performance, providing further evidence for the lower computing demand of iterative learning on curves.

Table 4. The full-reference evaluation of ablation experiments for different setups.

View Table | View all tables in this article

4.6 Running test

Table 5 displays the time required for different algorithms to perform image enhancement on an image with a resolution of 240 $\times$ 320. The results indicate that our algorithm runs much faster than other algorithms. Meanwhile, we tested our method on the embedded device Jetson AGX Xavier, and it only took 0.018s to process image with a resolution of 240 $\times$ 320. It demonstrates our algorithm can be deployed on small platforms, allowing for efficient real-time underwater image enhancement on limited computing resources. We can achieve satisfactory enhancement results with only 70767 network parameters. When the input size of the image used is 240$\times$320$\times$3, Flops=5.42G. Additionally, the algorithm does not require any supplementary information and is user-friendly, making it easily promotable.

Table 5. Running-time test of different underwater image enhancement algorithms.

View Table | View all tables in this article

4.7 Application test

To verify the positive effect of images enhanced by our algorithm on downstream tasks, we conducted the experiments of image segmentation, detection and matching of sift key points. The results of image segmentation are based on [41].

As shown in Fig. 7, it is evident that for the underwater dense fish image, the raw image cannot segment targets effectively. Our method and IBLA can improve the performance of segmentation and identify more targets than the raw image. We marked three boxes with targets. Only our method successfully segmented the fish marked with a yellow box, only our method, IBLA and HLPR successfully segmented the fish marked with a red box, and only our method, IBLA, MLLE, WaterNet, and URanker successfully identified two fishes marked with a black box. Meanwhile, the segmentation result of our method has the shape closest to GT, which shows the advantages of our method in subsequent segmentation task. Worse still, the segmentation results of some algorithms are even worse than the original image, indicating that many image enhancement algorithms are not practical for downstream tasks, despite being able to improve the contrast of the image. It urges us that, for further investigation, low-level visual tasks should be integrated with high-level visual tasks.

Fig. 7. The results of image segmentation for different underwater image enhancement algorithms. (a) Raw. (b) GT. (c) IBLA [14]. (d) MLLE [36]. (e) HLPR [19]. (f) WaterNet [24]. (g) Funie-GAN [33]. (h) URanker [37]. (i) UIEC^ˆ2Net [38]. (j) CieNet(ours).

Download Full Size | PDF

Simultaneously, we tested the different algorithms’ performance on sift key point detection and matching [42]. As shown in Fig. 8, the images enhanced all generated more key points than the raw image. However, only the results of our algorithm and Funie-GAN [33] produced a similar number of key points as the GT. The other methods generated a large number of incorrect key points due to excessive enhancement that caused damage to the image’s original information. In the key point matching process, our algorithm far outperformed other algorithms. The correct matching should be all straight lines, but many mismatches (slashes) exist in other algorithms. It shows that our algorithm not only enhances the visual perception quality of the image but also preserves its semantic information. In contrast, while most other algorithms enhance the visual perception quality of the image by improving the image contrast but sacrifice its structural information.

Fig. 8. The results of sift key point detection and matching for different underwater image enhancement algorithms. (a) The sift key point detection of GT, Raw image, GT (from the left to right). (b)-(j) The left corresponds to the results of sift key point detection of different methods, the right corresponds to the key point matching between the GT and different enhanced images. (b) Raw. (c) IBLA [14]. (d) MLLE [36]. (e) HLPR [19]. (f) WaterNet [24]. (g) Funie-GAN [33]. (h) URanker [37]. (i) UIEC^ˆ2Net [38]. (j) CieNet(ours).

Download Full Size | PDF

5. Conclusion

Our paper creatively applies curve mapping to underwater image enhancement, using a simple network to achieve satisfactory results. Our method can be easily extended to the small device. According to the experimental results, our method outperforms other methods in subjective and objective evaluation, as well as in terms of running-time. Our method combines mathematical priors with deep learning, enabling a reduced reliance on complex deep models used in previous studies. Our research fully demonstrates the effectiveness of this mapping transformation idea for underwater low-level visual tasks. Our future research aims to optimize this method by integrating downstream tasks for improved adaptability in real-world scenarios.

Funding

National Natural Science Foundation of China (42276187).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in the EUVP Dataset [33], UIEB Dataset [24] and LUSI Dataset [10].

References

1. S. Anwar and C. Li, “Diving deeper into underwater image enhancement: A survey,” Signal Process. Image Commun. 89, 115978 (2020). [CrossRef]

2. M. Jian, X. Liu, H. Luo, et al., “Underwater image processing and analysis: A review,” Signal Process. Image Commun. 91, 116088 (2021). [CrossRef]

3. R. Liu, X. Fan, and M. Zhu, “Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light,” IEEE Trans. Circuits Syst. Video Technol. 30(12), 4861–4875 (2020). [CrossRef]

4. D. Akkaynak and T. Treibitz, “Sea-thru: A method for removing water from underwater images,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (2019), pp. 1682–1691.

5. K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2341–2353 (2011). [CrossRef]

6. J. Zhou, X. Wei, and J. Shi, “Underwater image enhancement via two-level wavelet decomposition maximum brightness color restoration and edge refinement histogram stretching,” Opt. Express 30(10), 17290–17306 (2022). [CrossRef]

7. J. Zhou, L. Pang, D. Zhang, et al., “Underwater image enhancement method via multi-interval subhistogram perspective equalization,” IEEE Journal of Oceanic Engineering (2023).

8. C. Ancuti, C. O. Ancuti, T. Haber, et al., “Enhancing underwater images and videos by fusion,” in 2012 IEEE conference on computer vision and pattern recognition, (IEEE, 2012), pp. 81–88.

9. K. Ji, W. Lei, W. Zhang, et al., “Dual stream fusion network for underwater image enhancement of multi-scale turbidity restoration and multi-path color correction,” Opt. Express 32(4), 6291–6308 (2024). [CrossRef]

10. L. Peng, C. Zhu, and L. Bian, “U-shape transformer for underwater image enhancement,” IEEE Transactions on Image Processing (2023).

11. C. Li, S. Anwar, J. Hou, et al., “Underwater image enhancement via medium transmission-guided multi-color space embedding,” IEEE Trans. on Image Process. 30, 4985–5000 (2021). [CrossRef]

12. M. K. Moghimi and F. Mohanna, “Real-time underwater image enhancement: a systematic review,” Journal of Real-Time Image Processing pp. 1–17 (2021).

13. P. L. Drews, E. R. Nascimento, S. S. Botelho, et al., “Underwater depth estimation and image restoration based on single images,” IEEE Comput. Grap. Appl. 36(2), 24–35 (2016). [CrossRef]

14. Y.-T. Peng and P. C. Cosman, “Underwater image restoration based on image blurriness and light absorption,” IEEE Trans. on Image Process. 26(4), 1579–1594 (2017). [CrossRef]

15. K. Liu and Y. Liang, “Enhancement of underwater optical images based on background light estimation and improved adaptive transmission fusion,” Opt. Express 29(18), 28307–28328 (2021). [CrossRef]

16. X. Zhao, T. Jin, and S. Qu, “Deriving inherent optical properties from background color and underwater image enhancement,” Ocean Eng. 94, 163–172 (2015). [CrossRef]

17. J. Xie, G. Hou, G. Wang, et al., “A variational framework for underwater image dehazing and deblurring,” IEEE Trans. Circuits Syst. Video Technol. 32(6), 3514–3526 (2022). [CrossRef]

18. L. Dong, W. Zhang, and W. Xu, “Underwater image enhancement via integrated rgb and lab color models,” Signal Process. Image Commun. 104, 116684 (2022). [CrossRef]

19. P. Zhuang, J. Wu, F. Porikli, et al., “Underwater image enhancement with hyper-laplacian reflectance priors,” IEEE Trans. on Image Process. 31, 5442–5455 (2022). [CrossRef]

20. J. Zhou, T. Yang, W. Chu, et al., “Underwater image restoration via backscatter pixel prior and color compensation,” Eng. Appl. Artif. Intell. 111, 104785 (2022). [CrossRef]

21. C. O. Ancuti, C. Ancuti, C. De Vleeschouwer, et al., “Color balance and fusion for underwater image enhancement,” IEEE Trans. on Image Process. 27(1), 379–393 (2018). [CrossRef]

22. C. O. Ancuti, C. Ancuti, C. De Vleeschouwer, et al., “Color channel transfer for image dehazing,” IEEE Signal Process. Lett. 26(9), 1413–1417 (2019). [CrossRef]

23. Y. Li, C. Zhu, J. Peng, et al., “Fusion-based underwater image enhancement with category-specific color correction and dehazing,” Opt. Express 30(19), 33826–33841 (2022). [CrossRef]

24. C. Li, C. Guo, W. Ren, et al., “An underwater image enhancement benchmark dataset and beyond,” IEEE Trans. on Image Process. 29, 4376–4389 (2020). [CrossRef]

25. C. Li, S. Anwar, and F. Porikli, “Underwater scene prior inspired deep underwater image and video enhancement,” Pattern Recognit. 98, 107038 (2020). [CrossRef]

26. L. Chen, Z. Jiang, and L. Tong, “Perceptual underwater image enhancement with deep learning and physical priors,” IEEE Trans. Circuits Syst. Video Technol. 31(8), 3078–3092 (2021). [CrossRef]

27. J. Zhou, D. Zhang, and W. Zhang, “Cross-view enhancement network for underwater images,” Eng. Appl. Artif. Intell. 121, 105952 (2023). [CrossRef]

28. J. Zhou, J. Sun, W. Zhang, et al., “Multi-view underwater image enhancement method via embedded fusion mechanism,” Eng. Appl. Artif. Intell. 121, 105946 (2023). [CrossRef]

29. C. Guo, C. Li, J. Guo, et al., “Zero-reference deep curve estimation for low-light image enhancement,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (2020), pp. 1780–1789.

30. J. Li, K. A. Skinner, R. M. Eustice, et al., “Watergan: Unsupervised generative network to enable real-time color correction of monocular underwater images,” IEEE Robot. Autom. Lett. 3, 387–394 (2017). [CrossRef]

31. C. Li, J. Guo, and C. Guo, “Emerging from water: Underwater image color correction based on weakly supervised color transfer,” IEEE Signal Process. Lett. 25(3), 323–327 (2018). [CrossRef]

32. Y. Guo, H. Li, and P. Zhuang, “Underwater image enhancement using a multiscale dense generative adversarial network,” IEEE J. Oceanic Eng. 45(3), 862–870 (2020). [CrossRef]

33. M. J. Islam, Y. Xia, and J. Sattar, “Fast underwater image enhancement for improved visual perception,” IEEE Robot. Autom. Lett. 5(2), 3227–3234 (2020). [CrossRef]

34. Q. Jiang, Y. Zhang, and F. Bao, “Two-step domain adaptation for underwater image enhancement,” Pattern Recognition 122, 108324 (2022). [CrossRef]

35. Q. Wang, B. Wu, P. Zhu, et al., “Eca-net: Efficient channel attention for deep convolutional neural networks,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (2020), pp. 11534–11542.

36. W. Zhang, P. Zhuang, H.-H. Sun, et al., “Underwater image enhancement via minimal color loss and locally adaptive contrast enhancement,” IEEE Trans. on Image Process. 31, 3997–4010 (2022). [CrossRef]

37. C. Guo, R. Wu, X. Jin, et al., “Underwater ranker: Learn which is better and how to be better,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37 (2023), pp. 702–709.

38. Y. Wang, J. Guo, H. Gao, et al., “Uiec^ˆ 2-net: Cnn-based underwater image enhancement using two color space,” Signal Process. Image Commun. 96, 116250 (2021). [CrossRef]

39. J.-Y. Zhu, T. Park, P. Isola, et al., “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE international conference on computer vision, (2017), pp. 2223–2232.

40. J. Wang, X. Ye, and Y. Liu, “Underwater self-supervised monocular depth estimation and its application in image enhancement,” Eng. Appl. Artif. Intell. 120, 105846 (2023). [CrossRef]

41. A. Kirillov, E. Mintun, N. Ravi, et al., “Segment anything,” arXivarXiv:2304.02643 (2023). [CrossRef]

42. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis. 60(2), 91–110 (2004). [CrossRef]

	EUVP Dataset			UIEB Dataset			LUSI Dataset
Method	MSE( $\times 100$ ) $↓$	PSNR $↑$	SSIM $↑$	MSE( $\times 100$ ) $↓$	PSNR $↑$	SSIM $↑$	MSE( $\times 100$ ) $↓$	PSNR $↑$	SSIM $↑$
IBLA [14]	0.908	28.597	0.682	1.007	28.169	0.642	1.047	27.934	0.181
MLLE [36]	1.015	28.073	0.622	0.952	28.363	0.725	1.039	27.971	0.161
HLPR [19]	1.039	27.968	0.164	1.035	27.981	0.232	1.054	27.901	0.118
Funie-GAN [33]	0.747	29.442	0.776	0.91	28.616	0.846	0.852	28.906	0.781
WaterNet [24]	0.919	28.528	0.770	0.971	28.282	0.729	0.880	28.744	0.834
URanker [37]	0.917	28.523	0.765	0.780	29.338	0.909	0.879	28.755	0.812
UIEC^ˆ2Net [38]	0.957	28.337	0.753	0.831	29.026	0.858	0.886	28.764	0.829
CieNet(Ours)	0.742	29.476	0.813	0.818	29.197	0.890	0.817	29.080	0.853

	EUVP Dataset		UIEB Dataset		LUSI Dataset
Method	UCIQE $↑$	UIQM $↑$	UCIQE $↑$	UIQM $↑$	UCIQE $↑$	UIQM $↑$
Raw	0.551	2.234	0.522	1.975	0.515	1.937
GT	0.581	3.786	0.620	3.578	0.588	3.754
IBLA [14]	0.598	2.012	0.603	2.575	0.582	2.235
MLLE [36]	0.609	3.547	0.614	3.688	0.602	3.759
HLPR [19]	0.612	4.038	0.649	4.036	0.642	3.872
Funie-GAN [33]	0.567	3.924	0.589	3.932	0.567	3.859
WaterNet [24]	0.578	3.558	0.557	3.826	0.577	3.759
URanker [37]	0.603	3.616	0.607	3.940	0.595	3.919
UIEC^ˆ2Net [38]	0.615	3.522	0.620	3.891	0.605	3.728
CieNet(Ours)	0.579	3.766	0.616	3.268	0.575	4.514

Method	MSE( $\times$ 100) $↓$	PSNR $↑$	SSIM $↑$	UIQM $↑$
Ours	0.742	29.476	0.813	3.766
w/o $L_{c o l o r}$	0.693	29.796	0.821	3.621
w/o $L_{g t}$	0.881	28.715	0.779	4.357
w/o $L_{p e r}$	0.753	29.417	0.810	4.012
w/o $L_{s h a r p}$	0.753	29.411	0.809	4.084

Method	MSE( $\times$ 100) $↓$	PSNR $↑$	SSIM $↑$
8iter + 7modules	0.778	29.269	0.806
3iter + 11modules	0.757	29.385	0.813
3iter + 7modules	0.742	29.476	0.813

	EUVP Dataset			UIEB Dataset			LUSI Dataset
Method	MSE( $\times 100$ ) $↓$	PSNR $↑$	SSIM $↑$	MSE( $\times 100$ ) $↓$	PSNR $↑$	SSIM $↑$	MSE( $\times 100$ ) $↓$	PSNR $↑$	SSIM $↑$
IBLA [14]	0.908	28.597	0.682	1.007	28.169	0.642	1.047	27.934	0.181
MLLE [36]	1.015	28.073	0.622	0.952	28.363	0.725	1.039	27.971	0.161
HLPR [19]	1.039	27.968	0.164	1.035	27.981	0.232	1.054	27.901	0.118
Funie-GAN [33]	0.747	29.442	0.776	0.91	28.616	0.846	0.852	28.906	0.781
WaterNet [24]	0.919	28.528	0.770	0.971	28.282	0.729	0.880	28.744	0.834
URanker [37]	0.917	28.523	0.765	0.780	29.338	0.909	0.879	28.755	0.812
UIEC^ˆ2Net [38]	0.957	28.337	0.753	0.831	29.026	0.858	0.886	28.764	0.829
CieNet(Ours)	0.742	29.476	0.813	0.818	29.197	0.890	0.817	29.080	0.853

Learning mapping by curve iteration estimation For real-time underwater image enhancement

Abstract

1. Introduction

2. Related work

2.1 Physical model-based methods

2.2 Non-physical model-based methods

2.3 Deep learning-based methods

3. Method

3.1 Curve estimation for underwater image enhancement

3.2 CieNet

3.3 Loss function

4. Experiments

4.1 Datasets

4.2 Implementation details

4.3 Qualitative experiments

4.4 Quantitative experiments

4.5 Ablation experiments

4.6 Running test

4.7 Application test

5. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (8)

Tables (5)

Equations (10)

Optics Express