
Self super-resolution autostereoscopic 3D measuring system using deep convolutional neural networks

Open Access

Abstract

Autostereoscopy technology can provide a rapid and accurate three-dimensional (3D) measurement solution for micro-structured surfaces. Elemental images (EIs) are recorded within one snapshot and the measurement accuracy can be quantified from the disparities existing in the 3D information. However, a trade-off between the spatial and the angular resolution of the EIs is a major obstacle to improving the measurement results. To address this issue, an angular super-resolution algorithm based on deep neural networks is proposed to construct a self super-resolution autostereoscopic (SSA) 3D measuring system. The proposed super-resolution algorithm can generate novel perspectives between neighboring EIs so that the angular resolution is enhanced, and the proposed SSA 3D measuring system can thereby achieve self super-resolution on its own measurement data. A comprehensive comparison experiment was conducted to verify the feasibility and technical merit of the proposed measuring system. The results show that the proposed SSA system can improve the resolution of the measuring data by approximately 4 times and enhance the measurement accuracy to a sub-micrometer level with lower standard deviations and biases.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The manufacture of micro-structured freeform surfaces has become a key research issue owing to rapidly increasing commercial demand. These complex surfaces are applied in various industries, including biology and bionics, space and astronautics, advanced electronic products, etc. [1,2]. Rapid and accurate measurement of products with micro-structured features has therefore drawn a lot of attention from the research community. Coordinate-measuring machines (CMMs) are widely used for complex structure measurement due to their stability and high accuracy [3]. A probing system is used in CMMs to scan the measured surfaces so that the surface geometry can be reconstructed. Since a contact stylus could damage the measured parts, optical scanners have been developed to perform non-contact measurement [4]. However, the low efficiency of CMMs cannot be neglected. The number of sampling points acquired by the stylus or the scanner directly affects the measurement performance, and a large number of sampling points makes the measurement process time-consuming. In addition, the risk of probe damage increases with higher scanning speed [5]. To overcome these drawbacks, autostereoscopic imaging technology offers an alternative solution for fast and highly accurate 3D surface profile measurement.

Autostereoscopic three-dimensional (3D) imaging technology can obtain 3D information from multiple view-angle elemental images (EIs) that are captured in one snapshot. The measured surfaces can be reconstructed using the disparity information stored in the EIs and digital refocusing techniques. Li et al. [6,7] developed a 3D measuring system, first introducing autostereoscopic technology into measurement systems and realizing on-machine 3D measurement. Compared with other non-contact measuring systems, the autostereoscopic 3D measurement system is relatively easy to implement, imposes fewer restrictions on machining conditions, and is able to record richer 3D information for disparity extraction. However, one inherent conflict of the autostereoscopic technology is the trade-off between the angular resolution and the spatial resolution of the EIs. The spatial resolution refers to the resolution of every single EI within its field of view (FOV), and the angular resolution indicates the number of views from which these EIs are captured. To obtain high-resolution EIs, some conventional methods [8,9] have been developed to super-resolve low-resolution 3D information. However, these works mostly relied on accurate disparity estimation, which is difficult and prone to severe errors. Moreover, how to enhance the angular resolution has still received relatively little attention.

With the development of machine learning techniques, deep convolutional neural networks (CNNs) have been used for super-resolution tasks, achieving superior performance over conventional methods [10,11]. Some deep learning methods [12,13] have also been developed to enhance the angular resolution of stereo images that are generated via simulation or captured by commercial light-field cameras. However, simulated stereo images contain little noise and nearly ideal illumination, and the images recorded by commercial light-field cameras usually have a small baseline. As a result, these models cannot achieve high-quality enhancement of measurement data that contain various noise sources, complex illumination effects, and a large baseline. The development of a super-resolution method for the measurement data can therefore benefit the measurement performance of the autostereoscopic measuring system.

In this paper, a self super-resolution autostereoscopic (SSA) 3D measuring system is presented. The objective of the study is to synthesize novel views between adjacent EIs so that more corresponding points originating from every object point are acquired. With more corresponding points, the matching errors can be reduced during the 3D reconstruction process and the digital refocusing process, and the measuring results are improved owing to the more accurate depth estimation. To this end, a self super-resolution algorithm for the EIs recorded by the optical measuring system has been developed using a deep neural network that can enhance the angular resolution of the EIs by nearly 4 times. This enhancement also increases the spatial resolution of the refocused images by a factor of two, which contributes to a more delicate structure reconstruction in the axial direction and more accurate measurement by the autostereoscopic measuring system. The measurement results were greatly improved in terms of bias, standard deviation, and maximum absolute error compared with the traditional autostereoscopic measuring (TAM) system proposed in [6]. In Section 2, the autostereoscopic measuring principle is briefly described, and the self super-resolution deep CNN is illustrated in Section 3. The setup of the SSA 3D measuring system as well as experimental results are presented in Section 4, and a conclusion is given in Section 5.

2. Autostereoscopic 3D measurement principle

Autostereoscopy technology can obtain raw 3D information within one snapshot by embedding a micro-lens array (MLA) into a traditional imaging system, without any other hardware aids. It is a rapid optical solution to acquire 3D information of the measured parts. Autostereoscopic 3D measurement consists of three steps: the recording process, 3D reconstruction, and disparity information extraction. During the recording process, as shown in Fig. 1, the MLA splits the rays emitted from the main objective lens so that multiple EIs with slightly different angles are recorded. The differences among these EIs indicate disparities in the series of images. The disparities are directly related to the depth of the different target points on the measured surface, i.e., the desired measuring quantity. As the configuration of the measurement system is fixed, the disparities of the EIs are solely correlated to the depth information. The EIs and their disparities are used for the subsequent 3D reconstruction process. The 3D reconstruction process is symmetrical with the recording process, since optical rays are reversible. Based on the disparity information stored in the EIs, the depth can be determined using the distance from the MLA to the imaging sensor and the distance between two focused target points in different EIs. Through the establishment of the relationship between image pixels and depth, the 3D reconstructed surface and refocused images can be obtained using the abundant information existing in the EIs. The digital refocusing process rearranges the pixels layer by layer, and the refocused information is generated from different reconstructed planes. As a result, a series of 2D images with multiple focuses on different depth planes are acquired and the depth of the focused part can be determined.

Fig. 1. Autostereoscopic recording process with the illustration of the spatial resolution and the angular resolution determined by the resolution of the image sensor and the array size of the MLA.

According to the autostereoscopic theory, a key issue that affects the measurement resolution and accuracy is the pitch size and the number of micro-lenses in the MLA. With the dimensions of an MLA unchanged, a larger pitch size results in larger dimensions of every single EI but a smaller number of EIs. It can be seen in Fig. 1 that the resolution of every single EI and the number of EIs are jointly determined by the resolution of the image sensor and the array size of the MLA. As a result, a higher spatial resolution comes at the cost of a lower angular resolution; conversely, as the number of micro-lenses increases, the spatial resolution decreases. To break through this trade-off without changing the optical system, a super-resolution algorithm that can enhance the angular resolution without reducing the size of the EIs is essential.
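
As a numerical illustration of this trade-off, the short Python sketch below divides a hypothetical image sensor among micro-lens arrays of different array sizes; the pixel and lens counts are assumed values for illustration only, not the specifications of the system described in Section 4.

# Illustrative spatial/angular trade-off for a hypothetical sensor behind an MLA.
sensor_px = (4096, 2160)                          # assumed sensor resolution (pixels)

for mla_size in [(16, 9), (32, 18)]:              # number of micro-lenses = angular resolution
    ei_px = (sensor_px[0] // mla_size[0],         # pixels per elemental image = spatial resolution
             sensor_px[1] // mla_size[1])
    print(f"MLA {mla_size[0]}x{mla_size[1]}: {mla_size[0] * mla_size[1]} views, "
          f"{ei_px[0]}x{ei_px[1]} px per EI")

# Doubling the lens count in each direction quadruples the number of views but
# quarters the pixel count of every EI: the two resolutions trade off directly.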

3. Self super-resolution approach based on deep convolutional neural networks

The proposed self super-resolution approach based on deep CNNs consists of a registration network, a residual encoder-decoder network, a refining network, and a discriminator network, as shown in Fig. 2; they are explained in Sections 3.1.1, 3.1.2, 3.1.3, and 3.1.4, respectively. To enhance the angular resolution of the measurement data, novel view EIs are interpolated between neighboring EIs. Taking a $\textrm{2} \times \textrm{2}$ neighboring image grid as an example, novel views are interpolated between the horizontally neighboring images, between the vertically neighboring images, and at the center of the four images. Hence, the input EIs of the proposed self super-resolution approach can be grouped into horizontal pairs, vertical pairs, and central groups. Within this framework, the registration network applies an affine transformation to the input image pairs, the residual encoder-decoder network extracts features from the registered images and reconstructs the features of the desired novel view images, and finally the residual refining network refines the features and recovers the novel view images. The three networks, i.e., the registration network, the encoder-decoder network, and the refining network, compose a generative network. This generative process synthesizes the EIs to be interpolated into the low-angular-resolution measurement data. The interpolation results, which are synthetic images, are further constrained by the discriminator network, which is trained to distinguish them from real measurement data. The discrimination results are fed back to the generative network so that the generative network learns to interpolate high-quality synthetic novel view images that fool the discriminator. Hence, the angular-resolution enhancement is achieved by high-quality interpolation. The details of the network components and the training process are discussed in the following sections, and two neighboring horizontal EIs ${I_l}$ and ${I_r}$ are taken as a horizontal input pair for demonstration. The vertical input pairs and the central input groups are processed under the same rule.
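
For illustration, the grouping of an EI array into horizontal pairs, vertical pairs, and central groups can be sketched in Python as follows; the data layout (a NumPy array of EIs indexed by row and column) is an assumption made for this sketch.

def group_inputs(eis):
    """Group an EI array of shape (rows, cols, H, W) into the three input types:
    horizontal pairs, vertical pairs, and 2 x 2 central groups, as described above."""
    rows, cols = eis.shape[0], eis.shape[1]
    horizontal = [(eis[r, c], eis[r, c + 1])
                  for r in range(rows) for c in range(cols - 1)]
    vertical = [(eis[r, c], eis[r + 1, c])
                for r in range(rows - 1) for c in range(cols)]
    central = [(eis[r, c], eis[r, c + 1], eis[r + 1, c], eis[r + 1, c + 1])
               for r in range(rows - 1) for c in range(cols - 1)]
    return horizontal, vertical, central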

Fig. 2. Framework of the proposed self super-resolution approach. The approach receives low-angular-resolution measurement data as input and enhances the angular resolution of the data by interpolating synthetic views. The final output high-resolution data are composed of the original measurement data and the synthetic data.

3.1 Network details

3.1.1 Registration network

In the autostereoscopic system, the three-dimensional information is reconstructed using the corresponding points in every EI. The axial dimension of the measured parts can be obtained from the difference of the disparities of two object points in different depth planes. $\varDelta d$ is the disparity of a point, which can be determined through $\varDelta d = gu/D$, where g is the distance from the MLA to the image sensor, u is the baseline distance between two adjacent micro-lenses, and D is the shooting distance. The disparity difference between two object points located on the top surface and the bottom surface can thus be expressed as

$$|{\Delta {d_t} - \Delta {d_b}} |= \frac{{gu|{{D_t} - {D_b}} |}}{{{D_t}{D_b}}}.$$
where the disparity and shooting distance of the top surface point are $\Delta {d_t}$ and ${D_t}$, respectively, and $\Delta {d_b}$ and ${D_b}$ are the disparity and shooting distance of the bottom surface point. Since the dimensions of the measured micro-structures are much smaller than the shooting distance (i.e., $|{{D_t} - {D_b}} |\ll \min ({{D_t},{D_b}} )$), $|{\Delta {d_t} - \Delta {d_b}} |$ is much smaller than $\min ({\Delta {d_t},\Delta {d_b}} )$. This reveals that the large baseline contributes little to the desired measuring values, and an affine transformation of the adjacent EIs can remove the redundant disparity information. In addition, direct fusion of two adjacent EIs with the large baseline could result in severe image artefacts. To this end, the proposed registration network is used to align the neighboring EIs so that the effect of the large baseline is eliminated. The registration process is formulated in Eq. (2), where ${\theta _x}$ and ${\theta _y}$ are the affine parameters in the x and y directions. The neighboring images in each input pair have their own affine parameters $\theta _x^\ast $ and $\theta _y^\ast $, and these parameters are predicted by the registration network ${f_R}({\cdot} )$.
$$\begin{array}{*{20}{c}} {\left[ {\begin{array}{*{20}{c}} {\theta_x^l}\\ {\theta_y^l} \end{array}} \right],\left[ {\begin{array}{*{20}{c}} {\theta_x^r}\\ {\theta_y^r} \end{array}} \right] = {f_R}({{I_l},{I_r}} ),}&{\begin{array}{*{20}{c}} {{{I^{\prime}}_l} = \left[ {\begin{array}{ccc} 1&0&{\theta_x^l}\\ 0&1&{\theta_y^l}\\ 0&0&1 \end{array}} \right]{I_l},}&{{{I^{\prime}}_r} = \left[ {\begin{array}{ccc} 1&0&{\theta_x^r}\\ 0&1&{\theta_y^r}\\ 0&0&1 \end{array}} \right]{I_r}} \end{array}} \end{array}.$$

The registration network aims to predict the affine parameters from the input image pairs, with the process illustrated in Fig. 3. The input images to be registered are first processed by 4 convolutional blocks and mapped to a feature space. Each block comprises two convolutional layers for feature extraction, and the mapping from the input I to the output features F occurs in each layer. A max-pooling layer follows each convolutional block to reduce the feature dimension by applying a filter that strides over the input and retains only the maximum value. The registration network ends with three fully connected layers, which flatten the 2D features output by the convolutional blocks into 1D neurons; the affine parameters are predicted by the last fully connected layer. The input image pairs are then registered using the affine parameters so that they are brought into the same coordinate system. It is noted that there are three types of input. To remain consistent with the different registration processes, a total of three registration networks are required for horizontal, vertical, and central registration, respectively. These registration networks share the same architecture but possess their own weights.
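
A minimal PyTorch sketch of such a registration network is given below; the channel widths and fully connected layer sizes are assumptions for illustration (the actual configuration is listed in Table 1), and applying the predicted translation through an affine sampling grid is one possible realization of Eq. (2).

import torch
import torch.nn as nn
import torch.nn.functional as F

class RegistrationNet(nn.Module):
    """Sketch of the registration network: 4 conv blocks with max-pooling, then
    3 fully connected layers predicting (theta_x, theta_y) for each input EI.
    Channel widths are illustrative, not the values of Table 1."""
    def __init__(self, n_in=2, width=32):
        super().__init__()
        layers, c = [], n_in
        for _ in range(4):
            layers += [nn.Conv2d(c, width, 3, padding=1), nn.LeakyReLU(0.2),
                       nn.Conv2d(width, width, 3, padding=1), nn.LeakyReLU(0.2),
                       nn.MaxPool2d(2)]
            c = width
        self.features = nn.Sequential(*layers)
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.LazyLinear(64), nn.LeakyReLU(0.2),
                                nn.Linear(64, 32), nn.LeakyReLU(0.2),
                                nn.Linear(32, 2 * n_in))   # (theta_x, theta_y) per view

    def forward(self, pair):                               # pair: (N, n_in, H, W)
        theta = self.fc(self.features(pair))
        return theta.view(-1, pair.shape[1], 2)            # (N, n_in, 2)

def translate(img, theta):
    """Apply the predicted translation of Eq. (2) to img (N, 1, H, W) via an
    affine sampling grid; this is one possible way to realize the registration."""
    mat = torch.zeros(img.shape[0], 2, 3, device=img.device)
    mat[:, 0, 0] = mat[:, 1, 1] = 1.0                      # identity rotation/scale
    mat[:, :, 2] = theta                                   # normalized x/y offsets
    grid = F.affine_grid(mat, list(img.shape), align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)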

Fig. 3. Registration network (horizontal). Vertical and central registration networks have the same network architecture but different weights from the horizontal network.

3.1.2 Residual encoder-decoder network

The residual encoder-decoder network is the main component of the generative network and is used to realize feature extraction and feature reconstruction. The output features are directly used for the synthesis of the novel view EIs. The residual encoder-decoder network extracts features from the registered images and reconstructs the features of the desired novel view EIs, with its framework shown in Fig. 4. The feature extraction is achieved by an encoder that consists of three convolutional blocks; the first two blocks are each followed by a max-pooling layer so that the feature dimensions are reduced. Direct convolution on the concatenation of the input EIs from different perspectives would propagate image artefacts into the subsequent feature extraction and feature reconstruction. To this end, the registered inputs are processed separately by the encoder, and the fusion happens between the local and global features. Fusion at the feature level reduces the artefacts in the final output. Taking the horizontal image pairs as an example, the two images are separately processed by the encoder, so that two dimension-reduced features corresponding to the two input images are acquired. The two features are concatenated, merged as one feature, and input to the horizontal decoder for feature reconstruction. Since the feature dimensions are shrunk during encoding, the decoder adopts an architecture symmetric to the encoder and replaces the max-pooling layers with up-sampling layers that recover the feature dimensions. It is noted that the two image features output by each convolutional block of the encoder are shared with the decoder and concatenated with the output of each block of the decoder to form a new feature that contains both local and global information. The concatenated feature is then input to the next block of the decoder. This forms a residual architecture and can dramatically accelerate the learning efficiency and reconstruction performance of the encoder-decoder network, since multi-level features are taken into consideration during the feature reconstruction. With regard to the three input types, there are also three types of decoders that process horizontal, vertical, and central inputs separately, following the same consideration as the setup of the registration network.
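
A condensed PyTorch sketch of the per-view encoder and one decoder branch with skip connections is given below; the number of channels and the exact block composition are illustrative assumptions rather than the configuration listed in Table 1.

import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Two 3x3 convolutions with Leaky ReLU activations (a sketch)."""
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.LeakyReLU(0.2),
                         nn.Conv2d(c_out, c_out, 3, padding=1), nn.LeakyReLU(0.2))

class ResidualEncoderDecoder(nn.Module):
    """Each registered view is encoded separately; the decoder fuses the
    concatenated view features and reuses every level of encoder features as
    skip connections (channel widths are illustrative)."""
    def __init__(self, views=2, c=32):
        super().__init__()
        self.enc1, self.enc2, self.enc3 = conv_block(1, c), conv_block(c, 2 * c), conv_block(2 * c, 4 * c)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.dec3 = conv_block(views * 4 * c, 4 * c)          # fuse all view features
        self.dec2 = conv_block(4 * c + views * 2 * c, 2 * c)  # skip: level-2 encoder features
        self.dec1 = conv_block(2 * c + views * c, c)          # skip: level-1 encoder features

    def forward(self, registered):                            # list of registered EIs, (N, 1, H, W)
        f1 = [self.enc1(v) for v in registered]               # full resolution
        f2 = [self.enc2(self.pool(f)) for f in f1]            # 1/2 resolution
        f3 = [self.enc3(self.pool(f)) for f in f2]            # 1/4 resolution
        x = self.dec3(torch.cat(f3, dim=1))
        x = self.dec2(torch.cat([self.up(x)] + f2, dim=1))
        x = self.dec1(torch.cat([self.up(x)] + f1, dim=1))
        return x                                              # features of the desired novel view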

Fig. 4. Framework of the residual encoder-decoder network. Vertical and central input pairs are processed under the same rule. All the input is processed by the encoder network separately.

3.1.3 Refining network

To recover the novel view EIs from the features reconstructed by the different decoders, a refining network is built to refine the features and recover the images, as shown in Fig. 5; it also adopts a residual architecture. It contains three residual convolutional blocks, with their details shown in the middle of Fig. 5. The residual blocks can avoid gradient vanishing [14] and improve learning efficiency, as each block only needs to learn the residual value between its input and output. Regardless of the input types, there is only one refining network that accepts all the outputs from the three different decoders to perform feature refining and novel view generation. This helps to keep the consistency of the novel views corresponding to different input pairs, since all the interpolated novel images are finally generated by this single network.
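
One possible PyTorch form of such a residual block, in which only the residual between input and output is learned, is sketched below; the channel width and the final output convolution are assumptions for illustration.

import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): only the residual F is learned, which eases optimization
    and mitigates gradient vanishing (channel width is illustrative)."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

# The refining network stacks three residual blocks and a final convolution that
# maps the refined features to the single-channel novel-view EI (a sketch).
refining_net = nn.Sequential(ResidualBlock(32), ResidualBlock(32), ResidualBlock(32),
                             nn.Conv2d(32, 1, 3, padding=1))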

Fig. 5. Framework of the residual refining network.

3.1.4 Generative adversarial network structure

The three networks, i.e., the registration network, the encoder-decoder network, and the refining network, form a generative network. The low angular-resolution measurement data are input to the generative network and go through registration, encoding, decoding, and refining, and high angular-resolution measurement data are finally output. To further enhance the quality of the synthetic interpolation data, a discriminator network is built to form an adversarial relationship with the generative network. The generative adversarial network is an unsupervised learning framework that can learn to generate data following a targeted distribution [15]. In this work, the discriminator network is a classifier, whose framework is shown in Fig. 6, able to differentiate the real data obtained by the measuring system from the synthetic data interpolated by the generative network. The differentiation results are fed back to the generative network during training so that the generative network can update its weights to generate data of higher quality. As a result, the generative network is able to produce high-quality synthetic novel views that have a similar distribution to the real measurement data, improving the angular resolution. Eventually, the discriminator cannot distinguish the real data from the synthetic data. This adversarial game between the generative network and the discriminator prevents the synthetic data from deviating far from the real measurement data.
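
A minimal PyTorch sketch of such a discriminator, viewed as a convolutional classifier that assigns a real-vs-synthetic score to an input EI, is given below; the layer widths are assumptions, and the exact configuration is listed in Table 1.

import torch.nn as nn

# Sketch of the discriminator: strided convolutions followed by a scoring head.
discriminator = nn.Sequential(
    nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(128, 1))                           # real-vs-synthetic score D(I)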

Fig. 6. Framework of the discriminator network.

3.2 Network training

According to the working principle of the proposed self super-resolution approach, the data collected from previous experiments are split into different training pairs. Taking a $\textrm{3} \times \textrm{3}$ neighboring grid of the EIs captured in one snapshot in Fig. 7 as an example, the images at the four corners are used as input data and the remaining images are used as ground truth. The four input images can be grouped into horizontal, vertical, and central inputs. Following the previous discussion, the middle image in the first row can be regarded as the ground truth of the novel view generated from the horizontal pair. Similarly, the middle image in the left column and the central image can be regarded as the ground truth for the vertical and central inputs, respectively. Hence, the training process aims to minimize the errors between the synthetic novel view images and the ground truth by iteratively updating the weights of the network. It is noted that the input EIs in the training process are not adjacent, unlike those input in the super-resolution test process. The function of the registration network is to predict the distance between pixels from two EIs caused by the large baseline $gu/{D_m}({{D_t} \le {D_m} \le {D_b}} )$. Although only non-adjacent EIs can be used for the supervised learning of the proposed model, due to the unavailability of the ground truth of the novel views, the baseline between the non-adjacent EIs is still determined by the specifications of the MLA and the shooting distance. The non-adjacent baseline is $g({i \cdot u} )/{D_m}({{D_t} \le {D_m} \le {D_b}} )$, where $i \cdot u$ is the center distance of the two non-adjacent micro-lenses. It is therefore possible for the registration network to predict the baseline for different values of i solely based on pixel information. For generalization, the input of the proposed algorithm comprises not only $\textrm{3} \times \textrm{3}$ neighboring grids, but also $\textrm{3} \times \textrm{3}$ non-adjacent grids down-sampled from a $({2n + 1} )\times ({2n + 1} )$ neighboring grid. After training the whole algorithm in an end-to-end fashion, the registration network is able to predict affine parameters based on the pixel information of the input EIs to eliminate the effect of the large baseline.
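
The construction of supervised training samples from a recorded EI array can be sketched in Python as follows; the indexing convention and the range of grid spacings i are assumptions made for illustration.

def training_samples(eis, max_spacing=3):
    """Extract 3 x 3 grids (adjacent when i = 1, down-sampled when i > 1) from an
    EI array eis[row][col]; the corners are inputs, the rest are ground truth."""
    rows, cols = len(eis), len(eis[0])
    samples = []
    for i in range(1, max_spacing + 1):            # i*u is the baseline of the sampled grid
        for r in range(rows - 2 * i):
            for c in range(cols - 2 * i):
                g = [[eis[r + dr * i][c + dc * i] for dc in range(3)] for dr in range(3)]
                samples.append({
                    "horizontal": ((g[0][0], g[0][2]), g[0][1]),   # (input pair, ground truth)
                    "vertical":   ((g[0][0], g[2][0]), g[1][0]),
                    "central":    ((g[0][0], g[0][2], g[2][0], g[2][2]), g[1][1]),
                })
    return samples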

Fig. 7. The training data and their corresponding ground truth.

To determine the errors, a mean absolute error (MAE) loss, a perceptual loss, and an adversarial loss are used and combined into one composite loss function for the network training. The MAE loss directly compares the pixel error between the interpolated images and the ground truth, and is formulated as:

$${l_{MAE}} = \frac{1}{N}\sum {({|{{{\tilde{I}}_m} - {I_{gt}}} |} )} .$$
where ${\tilde{I}_m}$ is the interpolation result, N is the batch size in one iteration during the network training, and ${I_{gt}}$ is the ground truth corresponding to ${\tilde{I}_m}$. The perceptual loss was proposed in [16]; it compares the style differences of two images by measuring the distance between perceptual features extracted from the images. Since the MAE loss only compares pixel-wise differences, image properties such as perceptual similarity and image style are overlooked, which can lead to distorted, low-quality reconstruction of novel view images. Hence, the perceptual loss is used to monitor the high-level differences between the interpolated images and the ground truth by comparing their features extracted by a fixed pretrained VGGNet. The VGGNet was proposed by the Visual Geometry Group of Oxford University [17] and trained on an enormous image recognition dataset. The perceptual loss is formulated as
$${l_\phi } = \frac{1}{N}\sum {({{{|{\phi ({{{\tilde{I}}_m}} )- \phi ({{I_{gt}}} )} |}^2}} )} .$$
where $\phi ({\cdot} )$ denotes the feature extraction performed by the VGGNet whose parameters are frozen during the training process. In terms of the adversarial loss, the distinguishing results of the discriminator are used to constrain the output of the generative network, with the formulation as follows:
$${l_a} ={-} \frac{1}{N}\sum {{D_\varepsilon }({{{\tilde{I}}_m}} )} .$$
where ${D_\varepsilon }$ is the distinguishing operation performed by the discriminator. The discriminator is trained simultaneously with the generative network through the following training loss
$${l_{dis}} = \frac{1}{N}\sum {({{D_\varepsilon }({{{\tilde{I}}_m}} )- {D_\varepsilon }({{I_{gt}}} )} )} .$$

Hence, the composite loss function of the generative network is

$$l = {l_{MAE}} + {\mu _\phi }{l_\phi } + {\mu _a}{l_a} = \frac{1}{N}\sum {({|{{{\tilde{I}}_m} - {I_{gt}}} |+ {\mu_\phi }{{|{\phi ({{{\tilde{I}}_m}} )- \phi ({{I_{gt}}} )} |}^2} - {\mu_a}{D_\varepsilon }({{{\tilde{I}}_m}} )} )} .$$
where ${\mu _\phi }$ and ${\mu _a}$ are the penalty coefficients of the perceptual loss and the adversarial loss. Since all the networks are embedded into one large network, the training can be performed in an end-to-end fashion without extra processing of the data. Hence, when the dataset is updated or the target task is changed, the network can easily be fine-tuned and transferred to the new task.
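
A sketch of the loss terms in Eqs. (3)–(7) is given below, with mean reduction over the batch; vgg_features stands for the frozen pretrained VGG feature extractor $\phi ({\cdot} )$, and the handling of single-channel EIs (replication to three channels before VGG) is an assumption of the sketch.

import torch

def generator_loss(i_synth, i_gt, vgg_features, discriminator, mu_phi=0.01, mu_a=0.01):
    """Composite generator loss of Eq. (7), mean-reduced over the batch (a sketch).
    Grayscale EIs would be replicated to three channels before VGG in practice."""
    l_mae = torch.mean(torch.abs(i_synth - i_gt))                           # Eq. (3)
    l_phi = torch.mean((vgg_features(i_synth) - vgg_features(i_gt)) ** 2)   # Eq. (4)
    l_adv = -torch.mean(discriminator(i_synth))                             # Eq. (5)
    return l_mae + mu_phi * l_phi + mu_a * l_adv

def discriminator_loss(i_synth, i_gt, discriminator):
    """Eq. (6): the discriminator learns to score synthetic data lower than real
    measurement data (the synthetic branch is detached, as is standard practice)."""
    return torch.mean(discriminator(i_synth.detach()) - discriminator(i_gt))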

The detailed information of the proposed networks, including the registration network, the encoder-decoder network, the refining network, and the discriminator is shown in Table 1, where ${n_{in}}$ is the number of the input EIs, and ${s_k} \times {s_k}$ denotes the kernel size. For the horizontal and vertical input pairs, ${n_{in}}$ is 2, and for the center input groups, ${n_{in}}$ is 4.

Table 1. Details of the proposed networks

3.3 Depth reconstruction approach

To reconstruct the target surface, a depth reconstruction algorithm based on disparity patterns is developed. The method consists of four steps: pixel-point description, matching, depth optimization based on disparity patterns, and coordinate mapping. First, all the pixel points are described using local region information, including grayscale values and gradient values. On the basis of the description, the points in each EI are matched to the central EI of the multi-view EI array separately. One group of matched points corresponds to a determined 3D position in the reconstruction coordinate system. Due to the unavoidable measurement uncertainty and matching errors, the group of matched points cannot focus exactly on one point in the 3D coordinate system, so only the points in the central EI are kept for the reconstruction. Then, an optimization process is used to find the accurate depth of the matched points: for each resolvable depth, a group of disparity patterns can be determined based on the working principle of autostereoscopic technology, and the optimal depth is the one whose patterns are closest to the group of matched points. Finally, the spatial coordinates of the remaining points are mapped to the 3D coordinate system and the point cloud is obtained. In addition, since each EI acquired by the proposed system only contains partial information of the target object, a sliding window technique is exploited in which a small image window slides over the whole EI array with a stride of 1. This ensures that the EIs to be matched in each window contain similar information, so that the rate of correct matching is improved.
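
The pixel-point description and matching steps can be sketched in Python as follows; the descriptor (a local grayscale patch plus its gradients) and the nearest-descriptor matching cost are simplified assumptions rather than the exact implementation of the proposed algorithm.

import numpy as np

def describe(ei, r, c, half=3):
    """Pixel-point descriptor built from local grayscale and gradient values
    (a simplified sketch of the description step)."""
    patch = ei[r - half:r + half + 1, c - half:c + half + 1].astype(np.float32)
    gy, gx = np.gradient(patch)
    return np.concatenate([patch.ravel(), gx.ravel(), gy.ravel()])

def match_to_central(central_desc, candidate_descs):
    """Match a described point of the central EI to its closest candidate in another
    EI of the sliding window (nearest-descriptor matching, as a sketch)."""
    costs = [np.linalg.norm(central_desc - d) for d in candidate_descs]
    return int(np.argmin(costs))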

4. System setup and experiments

4.1 System setup and network learning details

The SSA system was established as shown in Fig. 8, where the schematic diagram of the system is shown in (a), the system implementation in (b), and the measured sample in (c). To demonstrate the improvement over the pioneering research [6], the same sample was used for the evaluation to control the variables. The sample is a surface with pyramid micro-structures; each pyramid has two edges, named Edge A and Edge B, in the lateral direction and a height in the axial direction. The measured sample was mounted on a three-axis positioning stage for lateral and longitudinal motion. An illumination device was used in the dark measurement environment, and multiple illumination conditions were applied to construct the dataset. The acquired measuring data were sent to a computing station for super-resolution, digital refocusing, and surface geometry reconstruction. Table 2 shows the specifications of the system.

Fig. 8. Setup of the SSA system. (a) Schematic diagram of the proposed system. (b) System implementation. (c) Measured sample.

Table 2. Specification of the SSA system

To train the proposed self super-resolution approach, a dataset was first built using the data collected from preliminary experiments. The dataset was composed of 80 scenes covering various samples, including the aforementioned micro-structured surface and complex 3D micro-structures. The data were captured under different conditions, including optical system parameters, illumination conditions, and exposure conditions, to improve the generalization capability of the approach. The angular resolution of each scene was $16 \times 9$, so there are 11,520 EIs in the dataset in total. Among the dataset, 72 scenes were used for the training of the proposed networks and 8 scenes were used for testing. It is noted that the proposed self super-resolution approach can be trained using only the measurement data obtained by the proposed system. This indicates that the trained networks can easily be fine-tuned on new scenes without modification of the networks or massive consumption of time. As pilot research, this paper performs a series of measurements on one target surface to verify the proposed measurement method and system; further research will be conducted to validate the effectiveness on different workpieces. The approach was implemented using PyTorch. The loss penalty coefficients ${\mu _\phi }$ and ${\mu _a}$ were both set to 0.01. The Leaky ReLU (Rectified Linear Unit) and the ReLU were taken as the activation functions of the generative network and the discriminator, respectively. The Adam optimizer [18] was used for training. The computation platform was equipped with an Nvidia GeForce RTX 2080 graphics card and an Intel Core i7-8700 central processing unit.
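
A possible training loop combining these settings is sketched below; it reuses the loss functions sketched in Section 3.2, and the learning rate and epoch count are assumed values that are not stated in the paper.

import torch

def train(generator, discriminator, vgg_features, loader, epochs=100, lr=1e-4):
    """One possible alternating training loop (a sketch); generator_loss and
    discriminator_loss refer to the Eq. (6)/(7) sketches in Section 3.2."""
    mu_phi, mu_a = 0.01, 0.01                      # penalty coefficients from Section 4.1
    g_opt = torch.optim.Adam(generator.parameters(), lr=lr)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=lr)
    for _ in range(epochs):
        for inputs, ground_truth in loader:        # training samples built as in Fig. 7
            synth = generator(inputs)              # register -> encode/decode -> refine
            d_opt.zero_grad()                      # discriminator update, Eq. (6)
            discriminator_loss(synth, ground_truth, discriminator).backward()
            d_opt.step()
            g_opt.zero_grad()                      # generator update, Eq. (7)
            generator_loss(synth, ground_truth, vgg_features, discriminator,
                           mu_phi, mu_a).backward()
            g_opt.step()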

4.2 Experiments and analysis

All the experimental results were acquired in accordance with the procedure shown in Fig. 1. The acquired measuring data, i.e., the EIs, went through a super-resolution process performed by the trained self super-resolution network, followed by digital refocusing and reconstruction. The digital refocusing process was based on the method proposed in [7] and the reconstruction was conducted using the proposed depth reconstruction approach. As a result, the geometry and height information of the measured surfaces were obtained. The experimental results were compared with the measuring results acquired by the TAM system without super-resolution. Three comparisons, regarding the angular resolution, the spatial resolution of the refocused images, and the 3D reconstruction results, are presented in this experimental study to illustrate the novelties of the proposed method.

The first comparison was made between the measuring data of the TAM system and the super-resolved measuring data of the proposed SSA system. The angular resolution of the EIs recorded in one snapshot by the TAM system was $16 \times 9$. After super-resolution, the angular resolution of the images recorded by the SSA system was expanded to $31 \times 17$, an improvement of nearly 4 times. The low angular-resolution images recorded by the TAM system are shown in Fig. 9(a). Figure 9(c) shows the high-resolution EIs produced by the SSA system, and a $\textrm{3} \times \textrm{3}$ local region is enlarged in Fig. 9(b) for a clearer comparison, where the low angular-resolution EIs are bordered in color and those without color borders are the novel views generated by the proposed self super-resolution approach. The comparison indicates that the proposed self super-resolution approach is able to interpolate high-quality novel view EIs and that the angular resolution of the autostereoscopic system is clearly enhanced.
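
The reported enhancement factor follows directly from inserting one novel view between every pair of neighboring EIs in each direction:

$$({2 \times 16 - 1} )\times ({2 \times 9 - 1} )= 31 \times 17 = 527\;\textrm{views},\qquad \frac{{527}}{{16 \times 9}} \approx 3.7.$$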

Fig. 9. Comparison of the EIs between the proposed SSA system and the TAM system. (a) Low-resolution measurement EIs (TAM system). (b) Partial enlargement of the high-resolution EIs. (c) High-resolution EIs generated by the proposed self super-resolution approach (SSA system).

A comparative experiment was conducted at this stage to evaluate the performance of the proposed super-resolution approach. The 4D bilinear method, a standard interpolation approach, was taken as a baseline, and a state-of-the-art deep learning approach proposed in [13], which has achieved high-quality angular super-resolution for synthetic light-field images, was also used for the comparison. In this experiment, the model of [13] was retrained on the measurement data collected by the proposed autostereoscopic system under the same training conditions as the proposed approach. In addition, the model of [13] pre-trained on a public light-field dataset was also fine-tuned on the measurement dataset for comparison. The results generated by the baseline method and the state-of-the-art approach [13] are shown in Fig. 10: the baseline method produced some image artefacts in the novel views, and both the retrained and the pre-trained models of [13] failed to generate high-quality novel views, since the inevitable noise, imperfect illumination conditions, and missing parts in the large-baseline EIs undermined the depth estimation premise of [13].

Fig. 10. Comparison of the angular super-resolution methods. The novel views are reconstructed by (a) the 4D Bilinear method as a baseline, (b) a state-of-the-art deep learning model [13] which was trained on the measurement dataset, (c) the model [13] which was pre-trained on a public light-field dataset and finetuned on the measurement dataset, and (d) the proposed model trained solely on the measurement dataset.

The second comparison was performed between the refocused images of the TAM system and the SSA system. Since the angular resolution was enhanced by approximately 4 times in the proposed system, the spatial resolution of the refocused images was also improved by 2 times. Two refocused images obtained by the TAM system and the SSA system are shown in Fig. 11, with the same local regions magnified to the same scale. Moreover, the smoothness of the refocused images is improved significantly, since the novel view EIs provide extra pixel information to fill the sparsity between corresponding points in the digital refocusing process. According to the autostereoscopic theory and the refocusing principle, the corresponding points from the EIs are focused on different focal planes to form multiple refocused images, and the number of corresponding points determines the number of layers of the focal stack. The increase in the number of layers of the focal stack improves the axial measurement precision. Hence, the novel views produced by the proposed SSA system increase the number of corresponding points and consequently improve both the lateral and the axial measurement resolution.

Fig. 11. Comparison of the refocused images between the proposed SSA system and the TAM system. (a) Low-resolution refocused images with different focal length (TAM system). (b) Partial enlargement for comparison. (c) High-resolution refocused images (SSA system).

The third comparison concerns the 3D reconstruction results based on the measurement data. The reconstruction was performed on the low-resolution EIs recorded by the TAM system and on the high-resolution EIs produced by the SSA system. The reconstruction results of the target surface are compared in Fig. 12, where the point cloud generated from the high-resolution EIs is distinctly denser, with more details preserved.

Fig. 12. Comparison of the surface reconstruction between the proposed SSA system and the TAM system.

The measuring results obtained by the SSA system are shown in Table 3; they were determined using the disparity information extracted from the refocused images. A total of 15 measurements were conducted and 2 targets were measured to verify the feasibility and measurement performance of the SSA system. The measurement result acquired by a commercial instrument, a Zygo Nexview optical profiler, was used as the true value of the dimensions of the measured pyramid structure. Statistical results including the bias, standard deviation (SD), and maximum absolute error (MaxAE) are provided in Table 3, which indicates that the measurement data acquired by the SSA system are valid. A comparison between the SSA system and the TAM system is provided in both Table 3 and Fig. 13. The numerical comparison demonstrates the improvement in measurement performance realized by the proposed system and its superiority.

Fig. 13. Comparison of the measurement accuracy between the proposed SSA system and the TAM system.

Table 3. Statistical comparison between the SSA system and the TAM system

5. Conclusion

Autostereoscopic technology provides a rapid and accurate 3D measuring solution that can acquire the surface profile with only one snapshot. However, the dominant limitation of the autostereoscopic 3D measuring system is the trade-off between the angular resolution and the spatial resolution of the measuring data. In this paper, a self super-resolution approach based on deep convolutional neural networks is embedded into the autostereoscopic measuring system, which helps the system to achieve self super-resolution during the measurement process and significantly enhances the angular resolution of the measuring data. The self super-resolution approach is composed of a registration network, a residual encoder-decoder network, and a refining network. These key components form a generative network that can interpolate novel views between the neighboring EIs acquired by the proposed SSA measurement system. A discriminator network was implemented to distinguish the generated synthetic results from real measuring data, and the discrimination results were fed back to the generative network as an adversarial loss. In addition, the mean absolute error loss and the perceptual loss were integrated with the adversarial loss to form a multi-loss function for training the proposed self super-resolution approach. Experiments have been conducted to demonstrate the superiority of the proposed self super-resolution approach. Comparisons were also conducted between the proposed SSA measurement system and the traditional autostereoscopic measurement system without super-resolution in several aspects, including the resolution of the measuring data, the quality of the refocused images, and the statistics of the measurement results, which demonstrate the better measurement performance of the proposed SSA measurement system. This research reveals the potential and value of the SSA system for rapid and accurate measurement of micro-structured surfaces.

Funding

Research and Innovation Office of Hong Kong Polytechnic University (RK2G); Research Grants Council of the Government of the Hong Kong Special Administrative Region, China (15207521); International Partnership Scheme of Bureau of International Cooperation, Chinese Academy of Sciences (181722KYSB20180015).

Acknowledgments

The work described in this paper was mainly supported by a grant from the Research Grants Council of the Government of the Hong Kong Special Administrative Region, China (Project No. 15207521). In addition, the authors would like to express their sincere thanks to the funding support from International Partnership Scheme of the Bureau of the International Scientific Cooperation of the Chinese Academy of Sciences (Project No.: 181722KYSB20180015). The authors would also like to express their sincere thanks to the Research and Innovation Office of The Hong Kong Polytechnic University for their financial support of the project through a PhD studentship (project account code: RK2G).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. B. C. Hornbuckle, C. L. Williams, S. W. Dean, X. Zhou, C. Kale, S. A. Turnage, J. D. Clayton, G. B. Thompson, A. K. Giri, and K. N. Solanki, “Stable microstructure in a nanocrystalline copper–tantalum alloy during shock loading,” Commun. Mater. 1(1), 1–6 (2020). [CrossRef]  

2. J. Park, J. Kim, J. Hong, H. Lee, Y. Lee, S. Cho, S.-W. Kim, J. J. Kim, S. Y. Kim, and H. Ko, “Tailoring force sensitivity and selectivity by microstructure engineering of multidirectional electronic skins,” NPG Asia Mater. 10(4), 163–176 (2018). [CrossRef]  

3. S. H. Mian and A. Al-Ahmari, “New developments in coordinate measuring machines for manufacturing industries,” Int. J. Metrol. Qual. Eng. 5(1), 101 (2014). [CrossRef]  

4. B. Gapinski, M. Wieczorowski, L. Marciniak-Podsadna, B. Dybala, and G. Ziolkowski, “Comparison of different method of measurement geometry using CMM, optical scanner and computed tomography 3D,” Procedia Eng. 69, 255–262 (2014). [CrossRef]  

5. A. Bastas, “Comparing the probing systems of coordinate measurement machine: Scanning probe versus touch-trigger probe,” Meas. 156, 107604 (2020). [CrossRef]  

6. D. Li, C. F. Cheung, M. Ren, L. Zhou, and X. Zhao, “Autostereoscopy-based three-dimensional on-machine measuring system for micro-structured surfaces,” Opt. Express 22(21), 25635–25650 (2014). [CrossRef]  

7. D. Li, C. F. Cheung, M. Ren, D. Whitehouse, and X. Zhao, “Disparity pattern-based autostereoscopic 3D metrology system for in situ measurement of microstructured surfaces,” Opt. Lett. 40(22), 5271–5274 (2015). [CrossRef]  

8. S. Wanner and B. Goldluecke, “Variational Light Field Analysis for Disparity Estimation and Super-Resolution,” IEEE Trans. Pattern Anal. Mach. Intell. 36(3), 606–619 (2014). [CrossRef]  

9. K. Mitra and A. Veeraraghavan, “Light field denoising, light field superresolution and stereo camera based refocussing using a GMM light field patch prior,” in 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (2012), pp. 22–28.

10. C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016). [CrossRef]  

11. J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition (2016), pp. 1646–1654.

12. Y. Yoon, H.-G. Jeon, D. Yoo, J.-Y. Lee, and I. S. Kweon, “Learning a deep convolutional network for light-field image super-resolution,” in Proceedings of the IEEE international conference on computer vision workshops (2015), pp. 24–32.

13. J. Jin, J. Hou, H. Yuan, and S. Kwong, “Learning Light Field Angular Super-Resolution via a Geometry-Aware Network,” in Proceedings of the AAAI conference on artificial intelligence (2020), pp. 11141–11148.

14. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition (2016), pp. 770–778.

15. I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative Adversarial Networks,” in Advances in Neural Information Processing Systems 27 (NIPS 2014) (2014).

16. J. Johnson, A. Alahi, and F.-F. Li, “Perceptual losses for real-time style transfer and super-resolution,” in European conference on computer vision (2016), pp. 694–711.

17. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv:1409.1556 (2014).

18. D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 (2014).



