Efficient automated high dynamic range 3D measurement via deep reinforcement learning

Open Access

Abstract

High dynamic range 3D measurement technology, utilizing multiple exposures, is pivotal in industrial metrology. However, selecting the optimal exposure sequence to balance measurement efficiency and quality remains challenging. This study reinterprets this challenge as a Markov decision problem and presents an innovative exposure selection method rooted in deep reinforcement learning. Our approach’s foundation is the exposure image prediction network (EIPN), designed to predict images under specific exposures, thereby simulating a virtual environment. Concurrently, we establish a reward function that amalgamates considerations of exposure number, exposure time, coverage, and accuracy, providing a comprehensive task definition and precise feedback. Building upon these foundational elements, the exposure selection network (ESN) emerges as the centerpiece of our strategy, acting decisively as an agent to derive the optimal exposure sequence selection. Experiments prove that the proposed method can obtain similar coverage (0.997 vs. 1) and precision (0.0263 mm vs. 0.0230 mm) with fewer exposures (generally 4) compared to the results of 20 exposures.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

3D measurement technology based on structured light can achieve fast and accurate measurement of complex parts owing to its non-contact nature and high precision [1–3]. By capturing precise 3D data, this technology significantly enhances forming quality and streamlines manufacturing processes. It has found applications in industries such as automobile manufacturing and aerospace, with potential for expansive use in many other fields. However, challenges arise when measuring highly reflective materials, such as automotive sheet metal. Because of the high dynamic range (HDR) of these surfaces, cameras often capture images that are either too dark or overexposed, which can compromise reconstruction accuracy or even cause reconstruction to fail [4,5].

In response, various HDR 3D measurement techniques have emerged, including multiple exposure fusion (MEF) [6–8], deep learning [9–11], adaptive projection [12,13], and polarization filtering [14]. Among these techniques, multiple exposure fusion is widely regarded as the most practical and feasible method in industrial metrology due to its high accuracy and simplicity [15]. The measurement efficiency and quality of MEF depend on the exposure times and their number (referred to together as the exposure sequence). Hence, selecting the optimal exposure sequence becomes a critical aspect.

The exposure sequence selection problem can be defined as determining the minimum exposure sequence within the exposure interval that attains the specified coverage and accuracy. It resembles the NP-hard set cover problem, for which no exact polynomial-time solution exists [16]. As a simplified example, if the exposure time has 50 candidate values and the number of exposures is fixed at 6, there are $\operatorname {C} _{50}^6 = 15890700$ possible exposure sequences.
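As a quick check of this count using Python's standard library (an illustration only, not part of the proposed method):

```python
from math import comb

# Number of 6-element exposure sequences drawn from 50 candidate exposure times
print(comb(50, 6))  # 15890700
```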

In tackling this issue, traditional methods primarily use greedy algorithms for approximate solutions, resulting in complex calculations. Therefore, researchers have simplified the problem to generate exposure sequences quickly. Jiang et al. [6] first project a series of uniformly illuminated images according to preset parameters, calculate a mask, and then select the appropriate parameters. Feng et al. [7] generate exposure sequences directly from the image histogram; this can produce exposure sequences for multiple regions with significant differences, but it is prone to block artifacts and is difficult to apply to scenes with gradually varying reflectance. Zhang [8] sets max–min thresholds on the gray values and iteratively computes the exposure sequence while accounting for the camera's random noise. Chen et al. [17] proposed a new image quality metric based on intensity modulation and regional overexposure, substantially enhancing measurement quality by employing Newton’s method for iterative exposure adjustment. However, its goal is to maximize the quality gained from the next exposure rather than to optimize the overall measurement quality and efficiency, which easily leads to too many exposures. Overall, current methods tend to choose exposure times based on immediate coverage improvements, neglecting the entire exposure sequence and factors such as efficiency and accuracy, thereby hindering optimal outcomes.

This paper introduces a novel optimal exposure sequence selection method using deep reinforcement learning. It innovatively transforms the exposure sequence selection process into a Markov decision process, suitable for solving through reinforcement learning techniques, described in Section 2. Unlike traditional methods, this approach holistically evaluates the measurement efficiency and quality across the entire exposure sequence to determine the optimal selection, moving beyond the limitation of focusing on individual exposure times. Consequently, it more effectively realizes optimal outcomes.

Nevertheless, the application of reinforcement learning faces challenges, particularly in state update and action selection, as: (1) the existing methods cannot comprehensively consider the measurement efficiency and quality to fit the optimal exposure sequence selection strategy; (2) the state update in reinforcement learning necessitates interaction with the external environment, leading to laborious operations and slow training speed.

Therefore, this work introduces an exposure selection network (ESN) that leverages semantic and brightness information. The ESN comprehensively considers measurement efficiency and quality by considering the current exposure, current image, and HDR image as inputs. To enhance the training of ESN and achieve optimal policy fitting, we construct a multi-goal reward function integrating exposure number, exposure time, coverage, and accuracy. Furthermore, we construct an exposure image prediction network (EIPN) to construct an accurate virtual environment from a single exposure image. The EIPN begins by removing noise on the low-exposure image and then extracts image feature information through a skip pyramid context aggregation network for specified exposure image prediction. Based on the proposed method, the exposure sequence can be accurately selected using only one image, achieving simultaneous optimization of measurement efficiency and quality. Experiments prove that the proposed method can obtain similar coverage (0.997 vs. 1) and precision (0.0263 mm vs. 0.0230 mm) with fewer exposures (generally 4) compared to the results of 20 exposures.

The rest of this paper is organized as follows: Section 2 defines the goal and task; Section 3 explains the specific method; Section 4 presents various experimental results of the method; Section 5 summarizes this paper.

2. Goal and task

2.1 Goal

As mentioned earlier, the objective of exposure sequence selection is to find the minimum sequence of exposures within the exposure interval to achieve the specified coverage and precision. This task can be simplified to achieve the highest measurement efficiency while maintaining measurement quality. Consequently, accurately defining measurement efficiency and quality becomes a primary requirement.

2.1.1 Measurement efficiency

The efficiency of the Multiple Exposure Fusion (MEF) method is primarily influenced by: (1) Total Exposure Time ($T_s$): This parameter crucially affects the duration of image acquisition. Prolonged $T_s$ reduces measurement efficiency, whereas a shorter $T_s$ enhances it. (2) Number of Exposures ($N_s$): $N_s$ determines the number of image groups required. A higher $N_s$ necessitates processing more data during multiple exposure fusion, adversely impacting efficiency. Moreover, in the multiple exposure fusion algorithm, each exposure group contributes the same fixed processing overhead.

Consequently, the measurement efficiency of MEF is mathematically represented as:

$$M_{e} ={-}w_{t} \cdot T_s - N_s \cdot b,$$
where $w_{t}$ represents the weight of the exposure time, and $b$ is a constant indicating the influence of the exposure count. The negative signs on the $T_s$ and $N_s$ terms reflect that longer exposure times and more exposures both reduce measurement efficiency. Furthermore, the efficiency reduction per exposure is calculated as:
$$m_{e} ={-}w_{t} \cdot t - b,$$
where $t$ represents the current exposure time.
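As a minimal illustration of Eqs. (1) and (2), the sketch below computes the two efficiency terms in Python; the weight $w_t$ and constant $b$ are placeholder values, not those used in the paper.

```python
def per_exposure_efficiency(t, w_t=0.005, b=0.1):
    """Efficiency reduction m_e caused by a single exposure of duration t (Eq. 2)."""
    return -w_t * t - b

def sequence_efficiency(exposure_times, w_t=0.005, b=0.1):
    """Measurement efficiency M_e of an entire exposure sequence (Eq. 1)."""
    T_s = sum(exposure_times)   # total exposure time
    N_s = len(exposure_times)   # number of exposures
    return -w_t * T_s - N_s * b

# Example: a hypothetical 4-exposure sequence (times in ms)
print(sequence_efficiency([10, 25, 60, 150]))
```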

2.1.2 Measurement quality

Measurement quality can be divided into the coverage $C_s$ and the accuracy $A_s$. Coverage refers to the ratio of the actual measurable area to the maximum measurable area. The commonly used measurable-area evaluation is based mainly on pixel intensity and can be expressed as

$$\operatorname{c}(i) = \begin{cases} 1, & \text{if } i \in g_m\\ 0, & \text{otherwise} \end{cases},$$
where $i$ represents the pixel intensity and $g_m$ is the measurable intensity interval $[30,250]$, following the definition in [18].

In multiple exposure fusion technology, measurement accuracy is influenced by various factors, including system calibration, hardware equipment, and image intensity. Image intensity is the primary variable, with its optimal level directly linked to maximum measurement accuracy. Consequently, establishing a definitive functional relationship between image intensity and measurement accuracy is crucial. The main challenges in defining this relationship arise from random noise and the camera’s nonlinear response. Due to the complex nature of these factors and their resistance to precise quantification, an experimental methodology is crucial for developing accurate evaluation functions.

To this end, a comprehensive experimental method has been developed. This method focuses on varying exposure times and analyzing plane fitting errors to establish the necessary functional relationship. The experiment employs a standard flat plate, with a standard deviation (Std) of 0.0067 mm, ascertained using Hexagon Global Classic SR 07.10.07. The experimental procedure includes the following steps:

(1) Image acquisition: Images are captured across a range of exposure times, from 3 ms to 50 ms in 1 ms increments.

(2) Gray value and Std error analysis: A consistent local area is designated for analysis across all exposures to calculate average gray values and conduct 3D reconstruction. Plane fitting is performed in this area to obtain the Std, as shown in Fig. 1(a), (b). The rationale for selecting a consistent area is to counteract poor measurement quality at the edges, while maintaining uniformity in the point cloud utilized for analysis across different exposures.

Fig. 1. Planar point cloud fitting analysis. (a) Schematic representation of partial area selection for plane fitting. (b) The curve depicts the average gray value and Std of the fitted plane. (c) Relationship curve between gray value and weight.

(3) Weight determination: Initial weights are computed by dividing the minimum error by the corresponding error. This approach is based on the understanding that more significant errors indicate lower measurement quality, thus meriting reduced weights. Polynomial fitting and interpolation are then conducted to determine final weights, as depicted in Fig. 1(c). Additionally, weights corresponding to significant errors are reset to zero to preserve accuracy. Therefore, the accuracy function can be expressed as

$$\operatorname{a}(i) = \begin{cases} -1.21 \times 10^{-9} i^4 + 6.60 \times 10^{-7} i^3 - 0.00014 i^2 + 0.016 i - 0.018, & \text{if } i \in g_m\\ 0, & \text{otherwise} \end{cases},$$

This function is quantified using a fourth-order polynomial, representing the changing relationship between image intensity and measurement accuracy.

Notably, Fig. 1(b) shows a clear trend in the Std of plane fitting: an initial marked decrease, followed by a moderate decline, and finally an increase. This trend is principally driven by the varying impacts of the nonlinear response and random noise at different intensity levels. At low intensities, the pronounced combined effects of the nonlinear response and random noise keep the Std high. As intensity increases, the waning effects of these factors lead to a substantial decrease in Std. At moderate intensity levels, the nonlinear response maintains a consistent influence, largely unaffected by intensity fluctuations, while the diminishing impact of random noise leads to a gradual reduction in Std. At high intensity levels, however, the intensifying nonlinear response outweighs the continued decline of random noise, resulting in a rise in Std.

Overall, measurement quality can be defined as

$${M_q} = {C_s} + {A_s}=\operatorname{C}({\mathbf{H}})+\operatorname{A}({\mathbf{H}}),$$
$$C({\mathbf{H}}) = \frac{\sum \left( c({\mathbf{H}}) \odot {\mathbf{Mea}}_{\max} \right)}{N \cdot c_{\max}},$$
$$A({\mathbf{H}}) = \frac{\sum \left( a(\mathbf{H}) \odot {\mathbf{Mea}}_{\max} \right)}{N \cdot c_{\max}},$$
where $C_s$ and $A_s$ represent the measurement coverage and measurement accuracy of the entire exposure sequence, ${\mathbf{H}}$ represents the HDR image, $N$ represents the total number of pixels in the image, $c_{\max }$ represents the maximum measurable coverage of the image, and ${\mathbf{Mea}}_{\max }$ represents the largest measurable matrix corresponding to $c_{\max }$, in which measurable elements are 1 and the others are 0. It should be noted that $A_s$ alone cannot represent quality: some areas might be of high quality while others are unmeasurable, yet the overall score would still appear acceptable. In contrast, using $C_s$ and $A_s$ together not only evaluates quality more faithfully but also allows sub-goals to be constructed to assist subsequent training, as shown in Section 3.4.2.
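For clarity, the quality terms of Eqs. (3)–(7) can be written compactly with NumPy. The sketch below assumes 8-bit intensities, uses the polynomial coefficients of Eq. (4), and takes ${\mathbf{Mea}}_{\max}$ and $c_{\max}$ as given inputs.

```python
import numpy as np

G_MIN, G_MAX = 30, 250  # measurable intensity interval g_m

def c(img):
    """Per-pixel measurability indicator, Eq. (3)."""
    return ((img >= G_MIN) & (img <= G_MAX)).astype(np.float64)

def a(img):
    """Per-pixel accuracy weight from the fitted polynomial, Eq. (4)."""
    i = img.astype(np.float64)
    w = -1.21e-9*i**4 + 6.60e-7*i**3 - 0.00014*i**2 + 0.016*i - 0.018
    return w * c(img)  # zero outside the measurable interval

def quality(H, Mea_max, c_max):
    """Coverage C_s, accuracy A_s, and quality M_q of an HDR image, Eqs. (5)-(7)."""
    N = H.size
    C_s = np.sum(c(H) * Mea_max) / (N * c_max)
    A_s = np.sum(a(H) * Mea_max) / (N * c_max)
    return C_s, A_s, C_s + A_s
```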

2.2 Task

The exposure sequence selection task can be defined as follows: Find the exposure sequence from the exposure interval with the minimum total exposure time and the least number of exposures to achieve the specified coverage and precision. It has the same structure as the set cover problem, which is NP-hard [16]. This type of problem is a classical mathematical problem that cannot be solved exactly in polynomial time. Traditional methods mainly use the greedy algorithm for approximate solutions, and the calculation is complicated. In recent years, reinforcement learning methods have been widely used to solve this problem due to their advantages of fast solution speed, strong generalization ability, and high accuracy [19].

The basis of reinforcement learning is the Markov decision process, which can be expressed as $p(s_{k+1} \mid s_k, act_k)$, where $p$ is the transition probability, $s_k$ represents the $k$-th state and $act_k$ represents the action in the $k$-th state. This means that the next state is determined by the current state and action, which has nothing to do with the previous state and action. The exposure sequence selection can be decomposed into multiple single exposure predictions; that is to say, the next exposure is selected according to the current state, regardless of the previous state. Hence, the exposure sequence selection process can be regarded as a Markov decision process, making it suitable for reinforcement learning techniques.

3. Method

3.1 Basic composition of method

The automatic exposure selection method’s process is delineated in Fig. 2. Initially, a low-exposure image is obtained to generate the initial state and fed to the agent. Based on this state, the agent decides the next action, which the environment executes. This execution results in a new state and a corresponding reward, both returned to the agent. The process repeats iteratively until a specific stopping condition is met, as detailed in Section 3.3.1. This method encompasses five critical components: state, action, agent, reward, and environment, which are explained below.

Fig. 2. The overall workflow of the proposed method.

State: The state is a complete description of the current environment, which the agent analyzes to choose the next action. It forms the basis for action selection and affects training convergence speed and performance. In theory, using all raw information as the state is feasible; however, extracting the information relevant to action selection then becomes difficult, resulting in slow and inefficient training. Therefore, designing an appropriate state that helps the agent relate observations to actions efficiently and accurately is vital. In the exposure sequence decision process, each action aims to balance significant improvements in measurement quality against the associated decrease in measurement efficiency, ultimately maximizing the overall reward. We therefore design a state that explicitly represents measurement quality and efficiency through three items: 1) the current exposure time, which represents the current measurement efficiency; 2) the current HDR image, which represents the measurement quality that can still be improved; 3) the current exposure image, which represents the current measurement quality. The first and third items are directly obtainable and need no further elaboration. The second item, derived by merging the previous HDR image with the current exposure image, is expressed as

$${{\mathbf{H}}_c} = \operatorname{F} ({{\mathbf{H}}_p},{\mathbf{I}}){\text{ }},$$
where ${\mathbf{H}}_{c},{\mathbf{H}}_{p}$ represent the current HDR image and previous HDR image, ${\mathbf{I}}$ represents the current exposure image, $\operatorname {F}$ represents the fusion function, specifically
$$\operatorname{F}\left(x,y\right)=\left\{\begin{array}{l} {x,{\text{if}} \ \operatorname{a}(x)>\operatorname{a}(y)} \\ {y,{\text{otherwise}}} \end{array}\right.,$$
where $x$ and $y$ represent the corresponding pixel intensities of the two images being fused.
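A pixel-wise NumPy sketch of the fusion rule in Eqs. (8) and (9); the accuracy weight follows Eq. (4) and 8-bit intensities are assumed.

```python
import numpy as np

def accuracy_weight(img):
    """Accuracy weight of Eq. (4); zero outside the measurable interval [30, 250]."""
    i = np.asarray(img, dtype=np.float64)
    w = -1.21e-9*i**4 + 6.60e-7*i**3 - 0.00014*i**2 + 0.016*i - 0.018
    return np.where((i >= 30) & (i <= 250), w, 0.0)

def fuse(H_p, I):
    """Eq. (9): keep, at every pixel, the intensity with the higher accuracy weight."""
    return np.where(accuracy_weight(H_p) > accuracy_weight(I), H_p, I)
```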

Action: An action is the behavior the agent selects, according to the current state, to move to the next state. Action sets are divided into discrete and continuous action spaces. The set of exposure time variations naturally forms a continuous action space, but solving over it is complex and computationally intensive, so we use a discrete action space in this paper to simplify computation. Intuitively, an action could be defined as an increase or decrease in exposure time. For example, if the action space is defined as $[-20,20]$ ms and the exposure time precision is set to $0.1$ ms, the action space contains 400 values. This number is too large and makes subsequent training difficult to converge. To address this, we instead define the action as a proportion of the exposure time, which can be expressed as

$$e_{k+1} = e_{k} \cdot act,$$
where $e_{k}$ represents the exposure time of the $k$-th state and $act$ represents the action. In particular, since the input is a low-exposure image, we restrict the exposure time to only increase, i.e., $act>1$ at all times, which further reduces the amount of computation.
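The sketch below illustrates the discrete multiplicative action space and the update of Eq. (10); the specific multiplier values are illustrative assumptions, except that $act=1$ is reserved as the stopping action mentioned in Section 3.3.1.

```python
# Hypothetical discrete action set of exposure-time multipliers;
# act = 1 corresponds to the terminating action of Section 3.3.1.
ACTIONS = [1.0, 1.2, 1.5, 2.0, 3.0, 5.0, 8.0]

def next_exposure(e_k, action_index):
    """Eq. (10): the next exposure time is the current one scaled by the chosen multiplier."""
    return e_k * ACTIONS[action_index]

print(next_exposure(10.0, 3))  # 10 ms * 2.0 -> 20.0 ms
```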

Agent: The agent selects an appropriate action based on the current state and passes it to the environment. In this paper, we build two ESNs based on Dueling DQN (Deep Q Network) [20] to predict the next action: a prediction network and a target network. The prediction network estimates the value of each action in the current state, and the target network does so for the next state. The ultimate goal is to achieve the maximum global reward, specifically expressed as the optimized value function

$${v_\pi }(s) = \mathrm{E}_\pi \left[ \left. \sum_{k = 0}^\infty \gamma^k R_{k + 1} \right| S_0 = s \right],$$
where $\pi$ represents the policy, $s$ represents the state, and ${v_\pi }(s)$ represents the expected return obtained from the current state when following $\pi$, which is approximated by a convolutional neural network; $R_{k + 1}$ represents the reward at step $k+1$, and $\gamma$ represents the discount factor. In addition, the ESN uses an end-to-end convolutional neural network to select the exposure based on HDR image quality and the semantic and brightness information of the current image. It takes the current state as input and outputs the exposure multiplier. The specific structure is described in Section 3.3.

Reward: The reward quantifies how good it is to take a specific action in a certain state. For example, if positive rewards are given only to actions that increase the coverage of the measurable area, the final prediction tends toward the largest possible coverage, but the measurement efficiency is very low. The reward function should therefore comprehensively consider measurement efficiency and quality, as described in Section 3.4.

Environment: In reinforcement learning, the environment is conceptualized as an external system that responds to an agent’s actions, subsequently determining the next state and providing reward feedback. Precisely, in our approach, this environment mirrors the behavior of a camera: it captures images at determined exposure times based on the agent’s actions and then formulates the following state and reward grounded on the acquired image.

The direct acquisition of images from real-world settings for training poses substantial challenges. This method lengthens data collection time and decelerates the training process, leading to concerns regarding its practicality. Zhang et al. [8] implemented the linear multiplication (LM) technique for image prediction at set exposures to improve efficiency. While LM simplifies computation, it often results in notable inaccuracies caused by the nonlinear behavior of cameras and random noise, as illustrated in Fig. 3. Furthermore, Zheng et al. [21] suggested using an irradiance map for environmental simulation. Utilizing the camera response function (CRF) for exposure-specific image predictions, this method demonstrates significant accuracy in environmental emulation. Nevertheless, it necessitates the prior collection of multiple exposure images.

Fig. 3. Predicted results of LM. (a) and (c) are the images captured under uniform light with an exposure time of 40 ms and 280 ms, respectively. (b) is the image obtained by LM of (a). (d),(e) are the close-up views of the results shown in (b) and (c).

In summary, while prevailing methods can craft a meticulous virtual environment, they generally entail the collection of numerous exposure images (often around 6), undermining the efficiency of measurements. We propose an innovative strategy in response to these intricacies: constructing a virtual environment via the EIPN for training, necessitating merely a single image exposure. Further insights into this methodology are elaborated upon in Section 3.2.

3.2 Exposure image prediction network (EIPN)

3.2.1 Network structure

The EIPN predicts exposure images from a single low-exposure image. A low-exposure image is chosen because it retains more information than an overexposed high-exposure image. As shown in Fig. 4, the input low-exposure image is first passed through the Noise Remove module and then multiplied by the input exposure time multiple to obtain the preliminary image. The preliminary image and the output of the Noise Remove module are then fed into the Feature Extraction module, which is treated as a residual branch to improve convergence speed, expressed as

$$Pre = PI + \mathrm{FeaExtract}(Inpu),$$
where $Pre$ represents the predicted result of the EIPN, $PI$ represents the preliminary image, $Inpu$ contains the preliminary image and the result of the Noise Remove module, and $\mathrm{FeaExtract}$ represents the Feature Extraction module. Finally, to keep the output image within the valid range, the Error Remove module removes outliers such as negative values. The two most important modules in EIPN, Noise Remove and Feature Extraction, are explained below.
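A structural PyTorch sketch of the EIPN forward pass in Eq. (11). The Noise Remove and Feature Extraction modules are replaced by small stand-in convolutions (the paper uses a U-Net and a skip pyramid context aggregation network), and intensities are assumed normalized to $[0,1]$.

```python
import torch
import torch.nn as nn

class EIPN(nn.Module):
    """Sketch of Eq. (11): Pre = PI + FeaExtract(Inpu); submodules are stand-ins."""
    def __init__(self):
        super().__init__()
        # Stand-in for the U-Net based Noise Remove module
        self.noise_remove = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))
        # Stand-in for the skip pyramid context aggregation Feature Extraction module
        self.feature_extract = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, low_exposure_img, exposure_ratio):
        denoised = self.noise_remove(low_exposure_img)
        pi = denoised * exposure_ratio              # preliminary image (PI)
        inpu = torch.cat([pi, denoised], dim=1)     # "Inpu" of Eq. (11)
        pre = pi + self.feature_extract(inpu)       # residual prediction
        return pre.clamp(0.0, 1.0)                  # Error Remove: clip invalid values

pred = EIPN()(torch.rand(1, 1, 64, 64), exposure_ratio=9.5)
```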

Fig. 4. The overall workflow of EIPN. The gray area on the left represents the fundamental process of EIPN, and the blue and cyan areas on the right, indicated by dashed boxes, represent the Noise Remove and Feature Extraction modules, respectively. PI stands for the preliminary image.

Noise Remove: The difficulty of predicting a specified exposure image from a single image comes mainly from image noise, especially in low-exposure images. Therefore, we construct the Noise Remove module based on U-Net: it first downsamples the input image to aggregate context and then upsamples it back to the original resolution. This structure removes noise effectively and has been applied in many scenarios [22].

Feature Extraction: Traditional exposure image prediction methods assume that the intensity of a pixel depends only on itself rather than on other pixels, but in practice pixels are not independent. For instance, if the intensity of a pixel is much greater than that of its surroundings, it is likely to carry stronger noise. Likewise, the intensities of a given object tend to be similar even across different regions of the image. Therefore, local and global information should be considered jointly when predicting images. In EIPN, the Feature Extraction module, based on a skip pyramid context aggregation network [10], predicts the result from the preliminary image. It incorporates receptive fields of varying sizes into the connection layers, improving the preservation of both small-scale and large-scale features.

3.2.2 EIPN Loss

In the EIPN, we developed a loss function that integrates considerations of gradient and overexposure to mitigate the impact of overexposed areas on predicted images and enhance prediction quality in regions with substantial gradients. The training process incorporates a fixed low-exposure image and an action variable $act$ as inputs while utilizing a high-exposure image as the ground truth.

The loss function is delineated as follows:

$$L_{\mathrm{EIPN}} = \sqrt{L_{e}^{2} + L_{o}^{2}},$$
$${L_e} = \frac{1}{N}\sum_{x,y} {\sqrt {{{\mathbf{E}}_g}(x,y){{\left( {{{\mathbf{I}}_p}(x,y) - {{\mathbf{I}}_g}(x,y)} \right)}^2} + {\varepsilon ^2}} } ,$$
$${L_o} = \frac{1}{N}\sum_{x,y} {\sqrt {{{\mathbf{O}}_g}(x,y){{\left( {{{\mathbf{I}}_p}(x,y) - {{\mathbf{I}}_g}(x,y)} \right)}^2} + {\varepsilon ^2}} },$$
where ${\mathbf{I}}_{p}$ and ${\mathbf{I}}_{g}$ represent the intensities of the output and ground truth images, respectively. ${\mathbf{E}}_{g}$ denotes the gradient of the ground truth image, while ${\mathbf{O}}_{g}$ indicates its overexposure status, with 0 signifying overexposure and 1 denoting non-overexposure. The variables $x, y$ correspond to the pixel coordinates in the image, and $\varepsilon$ is a small constant, set to $10^{-3}$ in this study.

The component $L_{e}$ ingeniously combines the gradient information and mean absolute error (MAE) to effectively minimize errors, particularly in regions with substantial gradients [23]. The component $L_{o}$, designed explicitly for underexposed areas, counterbalances the disproportionate influence of overexposed regions. This is crucial as predictions in overexposed areas often simplify to binary decisions, which, while straightforward, can skew the model’s performance assessment. Incorporating overexposed areas directly into the loss function may superficially reduce the overall error metric, but it fails to capture errors in normally exposed regions accurately. For instance, a significant part of the ground truth might be overexposed in scenarios involving long exposure time. A model adept at predicting these areas could seem accurate, yet this belies its inability to effectively address errors in non-overexposed areas, thus compromising overall prediction accuracy.
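A PyTorch sketch of Eqs. (13)–(15). The gradient map ${\mathbf{E}}_g$ is approximated here by simple finite differences and overexposure is detected with a fixed threshold on normalized intensities; both are assumptions for illustration rather than the paper's exact choices.

```python
import torch

EPS = 1e-3  # the epsilon of Eqs. (14) and (15)

def gradient_magnitude(img):
    """Finite-difference gradient magnitude, standing in for E_g."""
    dx = torch.zeros_like(img)
    dy = torch.zeros_like(img)
    dx[..., :, 1:] = img[..., :, 1:] - img[..., :, :-1]
    dy[..., 1:, :] = img[..., 1:, :] - img[..., :-1, :]
    return dx.abs() + dy.abs()

def eipn_loss(pred, gt, overexp_thresh=0.98):
    """Combined loss of Eq. (13) built from the weighted terms of Eqs. (14) and (15)."""
    diff2 = (pred - gt) ** 2
    e_g = gradient_magnitude(gt)                     # gradient weight E_g
    o_g = (gt < overexp_thresh).float()              # O_g: 1 for non-overexposed pixels
    l_e = torch.sqrt(e_g * diff2 + EPS ** 2).mean()  # Eq. (14)
    l_o = torch.sqrt(o_g * diff2 + EPS ** 2).mean()  # Eq. (15)
    return torch.sqrt(l_e ** 2 + l_o ** 2)           # Eq. (13)
```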

To assess the loss function’s performance, we deliberately provoked overfitting in the network to scrutinize the prediction result. Figure 5 demonstrates that although the general error distributions of different loss functions are similar, distinct variations are observable in specific regions. The conventional MAE loss function, which computes the MAE over the entire image, minimizes overall error but often neglects local variations, especially in areas with significant gradients. In contrast, the gradient-incorporating $L_e$ loss function effectively reduces errors at the margins but may overlook broader error patterns, resulting in more significant overall errors. The $L_o$ loss function, focused on overexposure, produces an error pattern resembling MAE, marked by significant errors in regions with substantial gradients but a reduced overall error scope. This method effectively minimizes errors in such regions by primarily addressing underexposed areas, although it may not account for errors in areas with notable gradient shifts. The newly developed $L_{EIPN}$ function, which integrates $L_e$ and $L_o$, proficiently reduces errors and achieves a balanced error distribution in regions with substantial gradient variations, thus ensuring a consistent overall error distribution.

Fig. 5. Comparison of loss functions. (a) and (b) represent the input (10ms) and the ground truth (95ms) respectively. (c) shows the error map of the predictions obtained after training with different loss functions against the ground truth. MAE Loss represents the loss function of MAE. $L_{e}$, $L_{o}$, and ${L_{{\rm {EIPN}}}}$ correspond to the loss functions defined in Eq. (13)–(15).

3.3 Exposure selection network (ESN)

3.3.1 Network structure

As mentioned in Section 3.1, the agent consists of two ESNs that are jointly trained to solve the optimal value function. As shown in Fig. 2, the agent selects the action based on the state; therefore, the state serves as the input to the ESN, as shown in Fig. 6. The ESN first uses the Res module to extract semantic information. At the same time, the histogram of the current image is used as a feature representing brightness information. These two kinds of information are then concatenated and passed through fully connected layers to generate a predicted value for each action. Finally, the action with the highest value is selected as the output.

Fig. 6. The overall workflow of ESN. The left gray area illustrates the input composition, the cyan area on the right depicts the basic architecture of the ESN, and the blue area with a dashed box represents the RES module.

In the ESN, the Res module, based on the residual network, is used to extract semantic information because of its simplicity and practicality [24]. Unlike the classic residual network, however, we use fewer layers to extract features because of the particularities of deep reinforcement learning. Lacking the supervisory signals of traditional supervised learning, deep reinforcement learning is prone to unstable and non-monotonic performance growth [25]; if the network is too deep, convergence becomes slow or even diverges. Consequently, we opt for a relatively shallow network in the Res module to ensure quick fitting. Furthermore, we employ the Dueling DQN framework for constructing the ESN to enhance the efficacy of policy evaluation. By decoupling the action-independent state value from the Q-values, the framework facilitates better discrimination among actions with closely aligned values [20]. The workflow of Dueling DQN is shown in Algorithms 1 and 2: Algorithm 1 describes the basic training process of the ESN, and Algorithm 2 describes the function executed at each step during ESN training.

Algorithm 1. Basic training process of the ESN.

Algorithm 2. Execution function for each step during ESN training.
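The following is a minimal sketch of the Dueling DQN components underlying Algorithms 1 and 2: a Q-network with separate value and advantage streams, and an ε-greedy behaviour policy used while filling the replay memory. The feature dimension, layer widths, and number of actions are placeholder assumptions.

```python
import random
import torch
import torch.nn as nn

class DuelingESN(nn.Module):
    """Sketch of a dueling Q-network head: shared features split into V(s) and A(s, act)."""
    def __init__(self, feat_dim=256, n_actions=7):
        super().__init__()
        # Stand-in for the Res-module and histogram features described above
        self.backbone = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
        self.value = nn.Linear(128, 1)              # state value V(s)
        self.advantage = nn.Linear(128, n_actions)  # advantages A(s, act)

    def forward(self, state_feat):
        h = self.backbone(state_feat)
        v, adv = self.value(h), self.advantage(h)
        # Dueling aggregation: Q(s, act) = V(s) + A(s, act) - mean_a A(s, a)
        return v + adv - adv.mean(dim=1, keepdim=True)

def select_action(q_net, state_feat, n_actions, epsilon=0.1):
    """Epsilon-greedy behaviour policy used while collecting replay experience."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(state_feat).argmax(dim=1))
```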

To ensure efficient training progress, termination conditions must be established to prevent perpetual operation. In general, there are four conditions under which the current round of iteration stops and the next round begins: (1) the specified number of exposures or total exposure time is reached; (2) the predicted action is 1; (3) the remaining measurable area is less than a threshold; (4) all goals are completed. Except for condition (4), the other three only end the current round rather than indicating that the task is complete. The second condition means that the cost of the next exposure is so large that it exceeds the reward for completing the tasks. The third condition indicates that little measurable quality remains to be improved, expressed as

$$\frac{\sum \left( {\mathbf{Mea}}_{\max} \odot \left( 1 - \operatorname{o}\left( {\mathbf{H}}_c \right) \right) \odot \left( 1 - \operatorname{a}\left( {\mathbf{H}}_c \right) \right) \right)}{N \cdot c_{\max}} < \mathrm{Thr}_{rm},$$
where $\operatorname{o}$ is the overexposure indicator function (1 for overexposed pixels, 0 otherwise), and $\mathrm{Thr}_{rm}$ is the threshold on the remaining measurable quality. In this paper, $\mathrm{Thr}_{rm}=0.08$.
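A NumPy sketch of the stopping test in Eq. (16); intensities above 250 are treated as overexposed (an assumption consistent with the measurable interval of Section 2.1), and the accuracy weight function of Eq. (4) is passed in as `acc_weight`.

```python
import numpy as np

THR_RM = 0.08  # threshold on the remaining improvable quality

def should_stop(H_c, Mea_max, c_max, acc_weight):
    """Eq. (16): stop the round when little measurable quality can still be improved."""
    N = H_c.size
    not_overexposed = (H_c <= 250).astype(np.float64)   # 1 - o(H_c)
    remaining = np.sum(Mea_max * not_overexposed * (1.0 - acc_weight(H_c))) / (N * c_max)
    return remaining < THR_RM
```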

3.3.2 ESN loss

In traditional deep learning, the primary aim is to minimize the disparity between predicted outputs and their respective ground-truth labels, commonly using metrics such as MAE and mean squared error (MSE). In contrast, reinforcement learning, as detailed in Section 3.1, emphasizes maximizing the expected cumulative rewards. Its loss function predominantly relies on the temporal difference (TD) error, quantifying the difference between an estimated Q-value and the sum of the actual reward plus the discounted estimated future rewards.

Building upon the foundation of reinforcement learning, this study primarily adopts the Dueling DQN model. Although the architectural elements of Dueling DQN present apparent deviations from the conventional DQN, it is paramount to note that both share identical optimization objectives and loss functions. The central aim is to attenuate the TD error, delineated as:

$$\delta = r + \gamma Q_{\mathrm{target}}\left( s_{\mathrm{next}}, \arg\max_{act_{\mathrm{next}}} Q_{\mathrm{pred}}(s_{\mathrm{next}}, act_{\mathrm{next}}) \right) - Q_{\mathrm{pred}}(s, act),$$
where $\delta$ represents the temporal difference (TD) error, $r$ represents the immediate reward after taking action $act$ in state $s$, $Q_{\text{target}}$ represents the target network, $Q_{\text{pred}}$ represents the prediction network, $s$ and $s_{\text{next}}$ represent the current and next states, and $act$ and $act_{\text{next}}$ represent the actions chosen in states $s$ and $s_{\text{next}}$, respectively. Therefore, the loss function can be expressed as
$$L_{ESN} = \delta^2.$$

During training, a fixed-exposure image is randomly selected from the dataset to initialize the state, and the networks are then trained iteratively according to Eqs. (17) and (18).
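The TD error of Eq. (17) and the squared loss of Eq. (18) can be sketched in PyTorch as follows; the discount factor, batch layout, and terminal masking are assumptions, and the loss is averaged over the batch as is standard practice.

```python
import torch
import torch.nn.functional as F

def esn_loss(pred_net, target_net, batch, gamma=0.9):
    """Squared TD error of Eqs. (17)-(18): the next action is chosen by the prediction
    network and evaluated by the target network."""
    s, act, r, s_next, done = batch                                   # replay-memory tensors
    q_pred = pred_net(s).gather(1, act.unsqueeze(1)).squeeze(1)       # Q_pred(s, act)
    with torch.no_grad():
        best_next = pred_net(s_next).argmax(dim=1, keepdim=True)      # argmax_a Q_pred(s', a)
        q_next = target_net(s_next).gather(1, best_next).squeeze(1)   # Q_target(s', a*)
        target = r + gamma * q_next * (1.0 - done)                    # TD target, masked at episode end
    return F.mse_loss(q_pred, target)                                 # mean of delta^2 over the batch
```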

3.4 Reward function

Rewards can be categorized into three types: single-step reward, sub-goal reward, and main line goal reward. Typically, only the single-step reward is used for training, as it succinctly represents the effect of each step. However, it is sparse and does not fully represent the final result, making it difficult to ensure training speed and overall optimization. To address this, we combine the three rewards described below.

3.4.1 Single-step reward

As mentioned earlier, the reward is related to measurement efficiency and quality, expressed as

$$R=w_{q} *R_{q} +w_{e} *R_{e} ,$$
where $w_{q},w_{e}$ represent the weights of $R_{q}$ and $R_{e}$ respectively.

For $R_{e}$, it can be constructed directly according to Eq. (2), expressed as

$$R_e = m_{e} ={-}w_{t} *t-b.$$

For $R_{q}$, the reward should be an improvement in measurement quality, not itself. Therefore, it can be expressed as

$${R_q}{\text{ }} = M_q^{add} - M_q^{loss}{\text{ }},$$
where $M_q^{add}$ represents the increment of measurement quality, expressed as
$$M_q^{add} = \operatorname{C} \left( {{{\mathbf{H}}_c}} \right) - \operatorname{C} \left( {{{\mathbf{H}}_p}} \right) + A\left( {{{\mathbf{H}}_c}} \right) - A\left( {{{\mathbf{H}}_p}} \right),$$

$M_q^{loss}$ represents the total loss of image quality due to improper exposure time. For example, when an action makes the next exposure time very long, the image quality may improve greatly, but some areas may become overexposed without gaining better quality, and subsequent actions cannot recover them. Therefore, $M_q^{loss}$ can be expressed as

$$M_q^{loss} = \operatorname{C} \left( {{\mathbf{H}}_c^o - {\mathbf{H}}_p^o} \right) - \operatorname{A} \left( {\left( {{\mathbf{H}}_c^o - {\mathbf{H}}_p^o} \right) \odot {{\mathbf{H}}_c}} \right),$$
where ${\mathbf{H}}_{c}^{o}$ and ${\mathbf{H}}_{p}^{o}$ represent the overexposure matrices of the current and previous HDR images, in which overexposed elements are 1 and the others are 0.
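A compact sketch of the single-step reward in Eqs. (19)–(22); `C` and `A` are the coverage and accuracy functions of Eqs. (6)–(7), `m_q_loss` is the overexposure penalty of Eq. (23), and all weights are placeholder assumptions.

```python
def step_reward(H_c, H_p, m_q_loss, t, C, A,
                w_q=1.0, w_e=0.01, w_t=0.005, b=0.1):
    """Single-step reward of Eq. (19)."""
    m_q_add = C(H_c) - C(H_p) + A(H_c) - A(H_p)   # Eq. (22): quality gained by the new exposure
    r_q = m_q_add - m_q_loss                      # Eq. (21)
    r_e = -w_t * t - b                            # Eq. (20): efficiency cost of this exposure
    return w_q * r_q + w_e * r_e                  # Eq. (19)
```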

3.4.2 Sub-goal reward

The sub-goal reward involves breaking down the main goal into sub-goals and assigning rewards to them, thereby encouraging the agent to converge faster and enhance performance. In this paper, we define two sub-goals, the requirement for coverage and image quality, expressed as

$$\left\{ \begin{array}{l} \mathrm{G1}: \operatorname{C}({\mathbf{H}}_C) > \mathrm{Thr}_{c} \\ \mathrm{G2}: \operatorname{A}({\mathbf{H}}_C) > \mathrm{Thr}_{a} \end{array} \right.,$$
where $\mathrm {G1},\mathrm {G2}$ represent sub-goals of measurement area coverage and image quality, respectively. $\operatorname {C}$ and $\operatorname {A}$ are shown in Eq. (6), (7). ${\rm {Thr}}_{c} ,{\rm {Thr}}_{a}$ represent the set thresholds of $\mathrm {G1},\mathrm {G2}$, respectively. In this paper, ${\rm {Thr}}_{c}=0.98, {\rm {Thr}}_{a}=0.75$. When any sub-goal is met, the corresponding reward is given.

3.4.3 Main line goal reward

The main line goal reward indicates the measurement efficiency and quality of the final result obtained by a series of actions given by the agent. Therefore, it can be expressed as

$$\left\{ \begin{array}{ll} \mathrm{F1}: w_{Q} \cdot M_q + w_{E} \cdot M_e, & \mathrm{G1}\ \&\ \mathrm{G2} \\ \mathrm{F2}: -(1 - w_{Q} \cdot M_q) - (1 - w_{E} \cdot M_e), & \text{otherwise} \end{array} \right.,$$
where $w_{Q}$ and $w_{E}$ represent the corresponding weights of $M_q$ and $M_e$, respectively. $\mathrm{F1}$ means the designated goal is finally completed, while $\mathrm{F2}$ means the maximum number of exposures or the longest exposure time has been reached. The weights differ from those of the single-step reward because the final reward does not need to consider the loss caused by each individual step.
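The sub-goal test of Eq. (24) and the main line goal reward of Eq. (25) can be sketched as below; the thresholds are those stated in the paper, while the weights $w_Q$ and $w_E$ are placeholders.

```python
THR_C, THR_A = 0.98, 0.75  # sub-goal thresholds used in this paper

def subgoals_met(C_s, A_s):
    """Eq. (24): coverage sub-goal G1 and accuracy sub-goal G2."""
    return C_s > THR_C and A_s > THR_A

def mainline_reward(M_q, M_e, goals_done, w_Q=1.0, w_E=0.01):
    """Eq. (25): terminal reward F1 when both sub-goals are met, penalty F2 otherwise."""
    if goals_done:
        return w_Q * M_q + w_E * M_e          # F1
    return -(1 - w_Q * M_q) - (1 - w_E * M_e) # F2
```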

4. Experiment

4.1 Experiment settings and dataset

To confirm the efficacy of our proposed method, we construct a 3D measuring instrument to create our dataset and conduct experiments. The instrument comprises two CMOS cameras (model: BFLY-U3-23S6M-C) fitted with 16 mm lenses (model: ML-M1618HR) and a DLP (digital-light-processing) projector (model: DLP4500). The resolutions of the projector and cameras are 912 $\mathrm {\times }$ 1140 and 1920 $\mathrm {\times }$ 1200, respectively.

Automotive sheet metal parts are widely used and have HDR surfaces, making them typical application scenarios for HDR measurements. Therefore, this paper selects 15 typical sheet metal parts, such as the rear partition and the rear reinforcement plate of the C-pillar, to construct a dataset. When building the dataset, multiple exposure data from many random angles are obtained by the 3D measuring instrument. The dataset includes 162 sets of data, each containing 20 groups of images with different exposures. The exposure time of each group increases equidistantly, from a minimum of 5 ms to a maximum of 195 ms. Notably, each group includes 16 fringe images and one uniform illumination image, so the dataset totals about 110,000 images. To prevent overfitting during training, the dataset is randomly split into two parts: a training set containing 88,000 images and a validation set containing 22,000 images.

4.2 Validation of EIPN

In the EIPN training, a fixed-exposure image is randomly selected from the dataset, followed by random generation of an exposure time multiple; the corresponding ground truth image is then produced from the irradiance. The input size of EIPN is set to 512 $\mathrm {\times }$ 512, and random cropping, rotation, and flipping are used to expand the dataset. The Adam optimization algorithm is used for training, with a 4:1 ratio of training to validation data. The network is first trained for 30 epochs with a learning rate of 1 $\mathrm {\times }$ 10 ${}^{-4}$ and then for another 30 epochs with a learning rate of 1 $\mathrm {\times }$ 10 ${}^{-5}$. With the final trained model, the validation set error is 0.00457.

To verify the effectiveness of EIPN, we compare the results of our method with the linear multiplication (LM) [8] and camera response function (CRF) [21] methods at different exposures, as shown in Fig. 7 and Fig. 8. The LM method produces the largest error, with an MAE exceeding 0.14. The CRF method yields smaller errors than LM, but its MAE still reaches 0.01, and its performance depends on the number of exposures: the fewer the exposures, the larger the MAE. EIPN gives the best result, with an MAE of 0.006 and a smoother error distribution.

Fig. 7. Results comparison of image predictions at specified exposure time. The first and second columns are the actual images with 10ms and 95ms, respectively. The third to sixth columns are the images predicted by different methods.

Fig. 8. The error map of the prediction results. (a),(b) corresponding to (a),(b) in Fig. 7. The first to fourth column corresponds to the third to sixth column in Fig. 7. MAE stands for the mean absolute error of the predicted image.

4.3 Validation of ESN

During ESN training, the image is downsampled to 960 $\mathrm {\times }$ 600 to improve training efficiency, and a data augmentation scheme similar to that of EIPN is adopted to expand the dataset. The Adam optimizer is used to train the ESN for 60000 episodes; the learning rate is $10^{-4}$ for the first 30000 episodes and $10^{-5}$ for the remaining 30000. The other training parameters are as follows: the number of learning interval steps is 20, the batch size is 16, the memory size is 2000, and the target network replacement interval is 200. A round ends when the predicted exposure sequence meets one of the stopping conditions described in Section 3.3.1, after which the state is reinitialized. In particular, during training the network directly outputs the values of all actions rather than selecting only the action with the greatest value.

To verify the effectiveness of ESN, we first use ESN to predict exposure sequences for images in the dataset. As shown in Fig. 9(a), the measurable area coverage satisfies $C_s>0.98$ in about 99.4% of cases and $C_s>0.99$ in about 96.2%, with an average value of $\mathrm{mean}\left( {{C_s}} \right) = 0.997$. This shows that ESN achieves sub-goal $\mathrm {G1}$ well and obtains a high coverage rate. In addition, the distribution of coverage rises across $[0.98,1]$, mainly because sub-goal $\mathrm {G1}$ is usually completed first, and the additional high exposure taken for quality further improves coverage. As shown in Fig. 9(b), the accuracy satisfies $A_s>0.75$ in about 99.8% of cases and $A_s>0.8$ in about 33.2%, with an average value of $\mathrm{mean}\left( {{A_s}} \right)=0.786$. The accuracy is relatively evenly distributed within $[0.75,0.8]$ and then gradually decreases, mainly because when sub-goal $\mathrm {G2}$ is met, sub-goal $\mathrm {G1}$ is essentially already completed, and exposure stops accordingly. As shown in Fig. 9(c), the total reward satisfies $R>1.4$ in about 99.7% of cases and $R>1.5$ in about 71.7%, with an average value of $\mathrm{mean}\left( R \right) = 1.513$. The total reward has several peaks, mainly because the total exposure time is mostly distributed in these regions.

Fig. 9. Statistics of prediction results in the dataset. (a),(b) and (c) are the results of the coverage, accuracy, and rewards, respectively.

To further demonstrate ESN’s effectiveness, empirical tests were conducted on multiple typical parts, comparing the performance of various methods: the histogram method (His) [7], Song Zhang’s method (ZS) [8], and the fixed-exposure methods Fix6 (6 exposures) and Fix20 (20 exposures). The exposure time range for Fix6 and Fix20 is set between 10 ms and 300 ms with uniform intervals. The results from Fix20, considered ideal with coverage set to 1, serve as a benchmark. In the initial comparison of fusion images, presented in Fig. 10, the His and Fix6 methods fail to measure certain regions, whereas the ZS and ESN methods make almost all measurable areas accessible, matching the comprehensive coverage of Fix20. The subsequent analysis covers coverage, accuracy, exposure time, and exposure number, as shown in Fig. 11 and Table 1. Notably, the exposure time statistics represent the total time for capturing 17 images, including 16 fringe images and 1 uniform illumination image. The analysis shows that ESN’s coverage (0.997) matches Fix20 (1) and ZS (0.995) and consistently exceeds the coverage standard ${\rm {Thr}}_{c} = 0.98$, outperforming the His and Fix6 methods. In terms of accuracy, ESN slightly lags behind Fix20 (0.81 vs. 0.87) but closely matches ZS (0.81 vs. 0.79), meeting the pre-established accuracy standard ${\rm {Thr}}_{a} = 0.75$. Crucially, ESN requires fewer and shorter exposures than the Fix20 and ZS methods, underscoring its practical effectiveness.

Fig. 10. Fusion result. Original images with a single exposure (40 ms) for different parts, and their fused images by different methods. The red area indicates that it is overexposed or too dark.

Fig. 11. Data analysis of results obtained by different methods. (a), (b), (c) and (d) respectively represent the statistical results of coverage, accuracy, number of exposures, and exposure time. Part1 to part4 corresponds to (a), (b), (c), and (d) in Fig. 10.

Table 1. Summary of average metrics for fusion images.

4.4 Validation of 3D reconstruction results

To further validate the actual reconstruction outcomes of the ESN method, we conducted 3D reconstruction experiments using several comparative methods outlined in the previous section. The specific results are depicted in Fig. 12. It is observable that the 3D reconstruction outcomes slightly differ from the image prediction results. This discrepancy primarily arises because the exposure time prediction considers only the left camera image, overlooking potential overexposure in the right camera image, hence leading to varied reconstruction outcomes. Nevertheless, the results achieved using the ESN method are generally akin to those of the Fix20 method, surpassing both the Fix6 and His methods. Concurrently, we utilized the results from the Fix20 method as a reference to quantify the actual point cloud coverage, as indicated in Table 2. Utilizing a single fixed exposure time (40 ms) yields a limited number of point clouds, resulting in less optimal reconstructions. While the Fix20 method produces the largest quantity of point clouds, its practicality is diminished by the extensive number of exposures and the consequent time consumption, rendering it highly inefficient. The His and ZS methods, despite generating more point clouds than a single fixed exposure, fall short in comparison to the ESN method.

Fig. 12. The reconstruction results corresponding to Fig. 10.

Table 2. Summary of average metrics for reconstruction results.

Additionally, to ascertain the actual measurement accuracy of the ESN method, we conducted experiments with a standard step block, comparing the plane fitting results and height difference of the step blocks, as illustrated in Fig. 13. The height differential between planes A and B was measured at 20.1095 mm by the Hexagon Global Classic SR 07.10.07 machine. The findings indicate that the accuracy of the ESN method (0.0263 mm) is comparable to the Fix20 method (0.0231 mm).

Fig. 13. Statistical analysis of step block. (a) The reconstruction point cloud; (b) Error analysis of reconstruction point cloud from ten positions.

5. Conclusion

This paper presents an automatic exposure selection method utilizing deep reinforcement learning. It transforms the traditional exposure sequence selection process into a Markov decision process and comprehensively considers global information to optimize measurement efficiency and quality. Experiments show that it obtains similar coverage (0.997 vs. 1) and precision (0.0263 mm vs. 0.0230 mm) with fewer exposures (generally 4) compared to the results of 20 exposures.

Funding

National Key Research and Development Program of China (2022YFB4600800); Shenzhen Fundamental Research Program (JCYJ20210324142007022); Key Research and Development Program of Hubei Province (2021BAA204, 2021BAA049, 2022BAA065).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Y. Hu, W. Rao, L. Qi, et al., “A refractive stereo structured-light 3d measurement system for immersed object,” IEEE Trans. Instrum. Meas. 72, 1–13 (2023). [CrossRef]  

2. Z. Zhang, J. Yu, N. Gao, et al., “Three-dimensional shape measurement techniques of shiny surfaces,” Infrared and Laser Engineering 49(3), 303006 (2020). [CrossRef]  

3. Y. Zheng, Y. Wang, V. Suresh, et al., “Real-time high-dynamic-range fringe acquisition for 3d shape measurement with a rgb camera,” Meas. Sci. Technol. 30(7), 075202 (2019). [CrossRef]  

4. J. Xu and S. Zhang, “Status, challenges, and future perspectives of fringe projection profilometry,” Opt. Lasers Eng. 135, 106193 (2020). [CrossRef]  

5. Z. Sun, B. Wang, Y. Zheng, et al., “Bras: Bidirectional reflectance adjustment strategy for 3d reconstruction of mirror-like surface,” IEEE Trans. Ind. Inf. 19(11), 10775–10785 (2023). [CrossRef]  

6. H. Jiang, H. Zhao, and X. Li, “High dynamic range fringe acquisition: A novel 3d scanning technique for high-reflective surfaces,” Opt. Lasers Eng. 50(10), 1484–1493 (2012). [CrossRef]  

7. S. Feng, Y. Zhang, Q. Chen, et al., “General solution for high dynamic range three-dimensional shape measurement using the fringe projection technique,” Opt. Lasers Eng. 59, 56–71 (2014). [CrossRef]  

8. S. Zhang, “Rapid and automatic optimal exposure control for digital fringe projection technique,” Opt. Lasers Eng. 128, 106029 (2020). [CrossRef]  

9. H. Yu, D. Zheng, J. Fu, et al., “Deep learning-based fringe modulation-enhancing method for accurate fringe projection profilometry,” Opt. Express 28(15), 21692 (2020). [CrossRef]  

10. X. Liu, W. Chen, H. Madhusudanan, et al., “Optical measurement of highly reflective surfaces from a single exposure,” IEEE Trans. Ind. Inf. 17(3), 1882–1891 (2021). [CrossRef]  

11. J. Zhang, B. Luo, F. Li, et al., “Single-exposure optical measurement of highly reflective surfaces via deep sinusoidal prior for complex equipment production,” IEEE Trans. Ind. Inf. 19(2), 2039–2048 (2023). [CrossRef]  

12. C. Chen, N. Gao, X. Wang, et al., “Adaptive pixel-to-pixel projection intensity adjustment for measuring a shiny surface using orthogonal color fringe pattern projection,” Meas. Sci. Technol. 29(5), 055203 (2018). [CrossRef]  

13. H. Lin, J. Gao, Q. Mei, et al., “Adaptive digital fringe projection technique for high dynamic range three-dimensional shape measurement,” Opt. Express 24(7), 7703 (2016). [CrossRef]  

14. B. Salahieh, Z. Chen, J. J. Rodriguez, et al., “Multi-polarization fringe projection imaging for high dynamic range objects,” Opt. Express 22(8), 10064 (2014). [CrossRef]  

15. J. Zhu, F. Yang, J. Hu, et al., “High dynamic reflection surface 3d reconstruction with sharing phase demodulation mechanism and multi-indicators guided phase domain fusion,” Opt. Express 31(15), 25318–25338 (2023). [CrossRef]  

16. S. He, D.-H. Shin, J. Zhang, et al., “Full-view area coverage in camera sensor networks: Dimension reduction and near-optimal solutions,” IEEE Trans. Veh. Technol. 65(9), 7448–7461 (2016). [CrossRef]  

17. W. Chen, X. Liu, C. Ru, et al., “Automated exposures selection for high dynamic range structured-light 3d scanning,” IEEE Trans. Ind. Electron. 70(7), 7428–7437 (2022). [CrossRef]  

18. K. Zhong, Z. Li, X. Zhou, et al., “Enhanced phase measurement profilometry for industrial 3d inspection automation,” Int. J. Adv. Manuf. Technol. 76(9-12), 1563–1574 (2015). [CrossRef]  

19. Y. Zhang, R. Bai, R. Qu, et al., “A deep reinforcement learning based hyper-heuristic for combinatorial optimisation with uncertainties,” Eur. J. Oper Res. 300(2), 418–427 (2022). [CrossRef]  

20. Z. Wang, T. Schaul, M. Hessel, et al., “Dueling network architectures for deep reinforcement learning,” in Proc. Int. Conf. Mach. Learn., (PMLR, 2016), pp. 1995–2003.

21. C. Zheng, Z. Li, Y. Yang, et al., “Single image brightening via multi-scale exposure fusion with hybrid learning,” IEEE Trans. Circuits Syst. Video Technol. 31(4), 1425–1435 (2021). [CrossRef]  

22. C. Chen, Q. Chen, J. Xu, et al., “Learning to see in the dark,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., (IEEE, Salt Lake City, UT, 2018), pp. 3291–3300.

23. G. Seif and D. Androutsos, “Edge-based loss function for single image super-resolution,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., (2018), pp. 1468–1472.

24. K. He, X. Zhang, S. Ren, et al., “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., (2016), pp. 770–778.

25. E. Cetin, P. J. Ball, S. Roberts, et al., “Stabilizing off-policy deep reinforcement learning from pixels,” arXiv, arXiv:2207.00986 (2022). [CrossRef]  
