
Fully automated structured light scanning for high-fidelity 3D reconstruction via graph optimization


Abstract

Convenient and high-fidelity 3D model reconstruction is crucial for fields such as manufacturing, medicine, and archaeology. Current scanning approaches struggle with high manual costs and the accumulation of errors in large-scale modeling. This paper is dedicated to achieving industrial-grade, seamless, and high-fidelity 3D reconstruction with minimal manual intervention. The proposed method transforms multi-frame registration into a graph optimization problem, addressing the error accumulation encountered in frame-by-frame registration. Initially, a global consistency cost is established based on cross-multipath point cloud registration; the geometric and color differences of corresponding points are then used as dynamic nonlinear weights. Finally, the iteratively reweighted least squares (IRLS) method is adopted to perform bundle adjustment (BA) optimization of all poses. This significantly enhances registration accuracy and robustness while maintaining near real-time efficiency. Additionally, to generate watertight, seamless surface models, a local-to-global transitioning strategy for multi-frame fusion is introduced. This strategy enables efficient correction of normal vector consistency, addressing mesh discontinuities in surface reconstruction caused by normal flips. To validate our algorithm, we designed a 3D reconstruction platform enabling spatial viewpoint transformations and collected extensive real and simulated model data. These datasets were rigorously evaluated against advanced methods, proving the effectiveness of our approach. Our data and implementation are made available on GitHub for community development.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

In recent years, rapid advancements in computer technology have significantly enhanced the processing power and accuracy of 3D data, spurring the demand for high-precision 3D reconstruction in fields such as digital twinning, medicine, and archaeology. High-precision reconstruction plays a pivotal role in accurately replicating physical entities, advancing precise medical diagnostics and treatments, and digitally preserving historical relics, thus fostering deep integration of technology across multiple domains.

Many existing model reconstruction systems rely on cumbersome processes such as manual presets [1] or repeated pre-calibration [2,3]. Others depend on hardware arrays built around high-precision LiDAR scanners [4]. The development of more efficient structured light cameras, specifically suited for object-level dense surface detailing, provides opportunities for more convenient and high-precision reconstruction systems. Against this backdrop, high-fidelity model reconstruction is still considered a challenging issue. We identify the main challenges as two consistency issues: geometric consistency and normal consistency.

One is the geometric consistency issue in complex model registration and stitching [57]. Most recent registration reconstruction methods focus primarily on pairwise registration of two partial point clouds (scans), only ensuring frame-to-frame consistency in the reconstructed model. Due to potentially small co-visible areas between different viewpoints and the possibility of matching errors caused by similar geometric features, the errors in frame-to-frame registration stitching can accumulate significantly with increasing frame numbers [8], leading to layering in the reconstructed model. Apart from geometric consistency, normal consistency is also crucial for the final reconstructed model. A globally consistent direction (all normals pointing to the same side of the surface) is a prerequisite for reconstruction and rendering. The complexity of some models lies in their interlaced topological structures, making surface normal variations complex and often indeterminable locally. Traditional orientation methods, focusing on local consistency, risk propagating local errors to large areas. Inconsistent normals can lead to extensive tearing and gaps in reconstruction and rendering.

Dedicated to addressing these challenges, this study proposes an innovative end-to-end scheme encompassing everything from data acquisition to high-fidelity surface model generation. The main contributions of this paper are as follows:

  • Graph optimization-based global registration framework: Integrating the matching poses of multiple frames into pose graph optimization, this method constructs both global and local constraints. To achieve the optimal balance between the two, an information matrix describing the degree of match is constructed, allowing for iterative optimization without the need for repeated nearest-neighbor searches, thus enhancing global consistency.
  • Dynamic weighting based on geometric and color difference constraints: A weighting function based on the Gaussian distribution is proposed, and the iteratively reweighted least squares method is used to smoothly prune redundant constraints, effectively handling outliers and erroneous constraints and enhancing the robustness of the reconstruction system.
  • Normal consistency correction based on viewpoint propagation: A strategy for normal vector correction is proposed, which, by combining local estimates based on camera viewpoints with global pose transformations, addresses both global consistency and local accuracy. This improves the accuracy and robustness of normal vector estimation, providing a foundation for surface model reconstruction.

2. Related works

To obtain three-dimensional data of an object from different perspectives through scanning, three approaches can be adopted: moving the target, moving the sensor, or keeping both stationary while increasing the number of sensors. Typically, the appropriate approach is selected based on the distinct characteristics of the object. For example, the works of Ye et al. [9] and Khawaldeh et al. [10] demonstrate a method of placing the object on a turntable for rotational scanning. This method excels in capturing fine details of the object, but faces limitations in handling larger or irregular objects due to constraints in degrees of freedom and sensor perception distance. Moreover, certain studies [11,12] have utilized a strategy involving the installation of cameras on robotic arms, which then scan stationary objects following a predefined path. Furthermore, to capture high-dynamic scenes, Yue et al. [13] fixed multiple structured-light cameras onto a mechanical structure, enabling simultaneous shooting to reduce acquisition time and to obtain three-dimensional data of static targets for multi-frame reconstruction. However, the convenience of using these systems is limited, necessitating complex external parameter estimation and manual presetting. A further challenge common to these approaches is the requirement for high-precision point cloud registration. Traditional registration techniques are often limited to processing data from a single viewpoint or simple geometric structures; when dealing with multi-viewpoint, complex-structured large-scale point clouds, they frequently struggle to achieve the desired accuracy and efficiency.

2.1 Object-level point cloud registration

For high-precision reconstruction at the object level, point cloud registration is a key technology. Registration can be categorized into coarse registration, which does not require an initial estimate of relative pose, and fine registration, aimed at computing a more precise pose given an initial pose assumption. Currently, coarse registration is primarily targeted at scenarios characterized by low similarity between point clouds, aiming to robustly ascertain an approximate pose. Representative approaches include methods based on learning or probabilistic foundations. On the other hand, fine registration predominantly employs the iterative closest point (ICP) algorithm and its variants [14], which are the main subject of this paper.

Variants of the ICP registration method mainly aim to improve robustness to noise, outliers, and partial overlap. To reduce the influence of outliers in non-overlapping regions during iteration, a popular approach is to heuristically classify and discard some matching constraints based on distance [15,16]. Other methods apply a nonlinear weight to the cost: R-ICP [17] introduces the Welsch function, while Sparse-ICP [18] utilizes the $l_p$-norm ($p < 1$). AA-ICP [19] applies Anderson acceleration to speed up convergence. Color-ICP [20] utilizes color data to establish a comprehensive optimization goal, integrating geometric and photometric considerations for enhanced registration accuracy. Further enhanced ICP variants take local and global geometric features into account [21–26]. However, these pairwise registration methods typically face a critical issue: the matching between frames often only considers the consistency between the matched frames. For complex model reconstruction, ensuring global consistency in point cloud registration from multiple viewpoints is crucial for seamless and watertight reconstruction. One approach is to adopt concepts from SLAM (simultaneous localization and mapping) [27,28] or SfM (structure from motion) [29–31], where the registered point clouds are incorporated into a temporary model. This model is then treated as a whole for registration with the next viewpoint. However, such incremental reconstruction often faces significantly increased computation as the scale of the scene or target enlarges. The method in [32] proposes the use of local 3D information extraction to achieve incremental refinement and updating of point cloud models, effectively managing and optimizing incremental point cloud data without significantly increasing the computational burden. Another method is the simultaneous registration of multiple point sets, which involves grouping several point clouds and implementing a divide-and-conquer strategy. This approach, known as groupwise point set registration, significantly lightens the matching process [33–36]. However, these methods are essentially a compromise, as they require sparsity-inducing techniques to filter out redundant information in dense point clouds, leading to the exclusion of some high-frequency information from global optimization. Additionally, the stitching within groups in these methods is still done on a frame-by-frame or incremental basis, which not only inherits the drawbacks of pairwise registration but also faces issues related to computational complexity and accuracy loss.

Beyond traditional optimization-based methods, learning-based approaches have also achieved impressive results in multi-frame matching [7,37,38,39]. The PointDSC [7] method proposed by Gojcic et al. enhances the accuracy and robustness of point cloud registration by learning deep features. Similarly, the deep global registration [39] introduced by Choy et al. utilizes deep learning for feature extraction and a global optimization framework to reduce error accumulation. Wang et al. [37], in addition to employing NetVLAD [40] for extracting global features from local descriptors, also utilize neural networks to estimate the overlap rate between scanning frames, facilitating effective initialization of a sparse pose graph. These methods demonstrate increased robustness in scenarios with high noise levels and in large-scale scene reconstruction, providing new perspectives for registration. However, these approaches still face limitations in processing high-precision, object-level scenes, such as high computational complexity and a strong dependence on data features.

Our algorithm differs from existing ones in several aspects. Firstly, we are the first to integrate point cloud color feature information as a dimension into the constraints of global matching, employing iteratively reweighted least squares (IRLS) for iteration and automatic filtering of redundant constraints. Another distinction lies in our approach’s ability to more smoothly reduce the impact of suboptimal and erroneous matches on the global scale, unlike methods that use pruning [41–43]. In cases of matches with similar local features, such as vases, teapots, and other locally symmetric objects, traditional global search methods tend to create incorrect constraints. Geometric methods are ineffective in discriminating and filtering these, as such erroneous matches inherently have a high degree of correspondence. Our method utilizes the fourth dimension for weighting, preventing local optima without explicitly increasing computational costs. To generate a complete model, we also propose a normal vector correction strategy, which merges local estimations based on camera perspectives with global pose transformations, balancing global geometric consistency and normal consistency.

2.2 Globally consistent normal orientation estimation

Establishing an accurate and globally consistent normal orientation is one of the most crucial aspects of surface reconstruction from point clouds. The seminal work by Hoppe et al. [44] introduced a propagation method that begins with the initialization of normal vectors using principal component analysis (PCA), followed by the random selection of a face orientation. To optimize the propagation order for normal vector consistency, a Riemannian graph is constructed upon which a minimum spanning tree algorithm is applied. This method and its variants have demonstrated effective performance in common scenarios and remain widely used. Subsequent strategies for propagation have been proposed, including multi-seed [45], Hermite curves [46], edge collapse [47], and learning-based approaches such as dipole propagation [48]. These methods primarily aim to enhance the robustness towards sharp features during propagation; however, they are prone to falling into local optima due to initial value sensitivity, leading to extensive incorrect normal propagation. An alternative approach for reconstruction tasks integrates normal vector consistency with the surface reconstruction task itself. iPSR [49] refines surfaces iteratively by feeding normals calculated in previous iterations into a Poisson surface reconstruction solver, thereby refining the surface and correcting normals throughout the iterations. However, this method may be constrained by memory and iteration time limitations [50].

The aforementioned mainstream methods for normal vector consistency correction are applicable to unordered point clouds that lack observational viewpoints. The task of stitching multiple depth maps represents a semi-unordered point cloud scenario. Applying techniques suited for unordered point clouds results in the loss of existing viewpoint information and entails high computational complexity. Conversely, estimating each frame as an ordered point cloud fails to integrate information across different observational frames, thus no unified approach exists for normal consistency correction in point clouds reconstructed from depth scanning. Our method amalgamates the observed normal vectors from each frame and directly processes the raw point clouds without requiring any preprocessing or propagation. This approach is characterized by its short computation time and high flexibility, making it well-suited for global consistency estimation of normal vectors in depth scanning data.

3. Method

3.1 System pipeline

The proposed reconstruction system and algorithmic workflow are illustrated in Fig. 1. This section will elaborate on the process in three main parts: data acquisition, initial registration, and optimized reconstruction.

Fig. 1. Omnidirectional Scanning Platform (left) and reconstruction system pipeline (right).

Initially, an automated acquisition of 3D data covering the entire outer surface of the object under test is required. Our custom-designed hardware system employs a structured light camera for 3D data acquisition. This camera, developed in-house, incorporates a projector emitting blue structured light patterns and a monocular camera synchronously capturing grayscale images. The mechanical structure is configured with three degrees of freedom. The object is placed on a centrally located, rotatable platform capable of 360-degree horizontal rotation (indicated by the red arrow). The camera is able to move vertically (as shown by the green arrow) and adjust its pitch angle (as indicated by the blue arrow). This allows the camera to capture the object from various angles, ensuring maximum coverage of every area.

The system is configured to uniformly distribute 10-20 predetermined pose angles (determined based on the size of the object and the camera’s field of view) across a 0 to 360-degree horizontal range around the object under test. The mechanical structure performs movements and rotations to acquire 3D data. The first step in complete model reconstruction is the initial alignment of the collected 3D data, using the transformation poses derived from the motion of the mechanical structure as a prior. However, because the motion poses of the mechanical structure contain errors and the calibration is imperfect, it becomes necessary to further register the collected 3D data, using the extrinsic parameters of the mechanical movement as initial values. This process, referred to as inter-frame pose estimation in the figure, aims to obtain higher-precision inter-frame relative poses, a step known as pairwise registration.
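As an illustration of this pairwise step, the sketch below refines the mechanically derived relative pose of each new frame against its predecessor with point-to-plane ICP. It is a minimal sketch assuming Open3D as the point cloud library; the paper itself uses R-ICP [17] for this stage, and the function and parameter names here are illustrative.

import open3d as o3d

def pairwise_register(source, target, T_init, max_dist=0.01):
    """Refine the mechanically derived relative pose T_init (4x4) of `source`
    with respect to `target` by point-to-plane ICP (stand-in for R-ICP [17])."""
    for pcd in (source, target):
        pcd.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30))
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_dist, T_init,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation  # refined relative pose T_{i+1}^{i}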

The most significant challenge in frame-to-frame point cloud matching and reconstruction is the accumulation of errors, especially in the complete reconstruction of oversized targets. Accumulated errors can result in layering in the final model. To address this issue, we propose a graph-based multi-path global registration optimization method, detailed in Chap.3.2.

To ensure system execution efficiency, the number of initially collected model frames is generally small. For some complex models, it is therefore challenging to ensure complete collection of 3D data in non-convex parts and dead angles. Our system performs hole detection on the model before executing global model generation, supplementing frames in the model’s missing parts so that the model’s 3D data are collected as completely as possible. Due to space constraints, we do not provide a detailed introduction here; for more details, please refer to our publicly available source code.

After the complete stitching of the point cloud, surface reconstruction is performed for data simplification, enhanced visualization and rendering, and improved data processing efficiency. This process transforms the point cloud into a mesh model, creating continuous, smooth surfaces from scattered point sets. We complete the surface reconstruction by employing our proposed method of normal consistency correction combined with Poisson reconstruction [51]. This process is discussed in Chap.3.3.1. Finally, UV-map texturing is applied using the method of [52] for visualization.

3.2 Global consistency registration adjustment

3.2.1 Notation

Let $\mathrm {T}_{m}^{n}$ represent the transformation relationship from the coordinate system of frame $m$ to that of frame $n$, with $\mathrm {T}_i$ denoting the transformation of frame $i$ to frame $0$, omitting the superscript. It is important to clarify that our ultimate goal is to estimate the absolute poses of all frames in the world coordinate system. Considering the coordinate system of the first point cloud frame (frame 0) as the world coordinate system, the relationships between all other collected frames and the reference frame form a set of poses $\mathcal {T} = \{\mathrm {T}_i | i = 1, 2,\ldots, n-1\}$. As the final optimization target, this is defined within the graph optimization framework illustrated in Fig. 2, represented as the blue blocks. Initially, we employ the point-to-plane feature-based ICP matching approach, namely R-ICP [17], for the estimation of pose between temporally adjacent frames. This step is referred to as local pairwise matching: each frame undergoes registration, estimating the transformation pose $\mathrm {T}_{i+1}^{i}$ that aligns frame $i+1$ with the preceding frame $i$. Generally, the point clouds of adjacent poses have a higher overlap rate, resulting in fewer outlier matches and more reliable matching outcomes. Following this method, the pose of frame $i+2$ relative to its preceding frame $i+1$ is calculated as $\mathrm {T}_{i+2}^{i+1}$, and so on up to $\mathrm {T}_{n-1}^{n-2}$, creating $n-1$ matching constraints among $n$ frames, represented as $\mathcal {L} =\{\mathrm {T}_{i}^{i-1}|i=1,\ldots,n-1\}$. This point cloud registration relationship is defined as local pairwise matching edges, represented by blue circular connectors in Fig. 2. Under local pairwise matching relationships, the transformation of each frame’s point cloud relative to the world coordinate system can be derived through chain inference, serving as the initial value for the final optimization target:

$$\tilde{\mathcal{T}}=\left\{ \tilde{\mathrm{T}}_i|\tilde{\mathrm{T}}_i=\prod_{k=0}^{i-1}{\mathrm{T}_{k+1}^{k}},i=1,2,\ldots,n-1 \right\}$$
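The chained initialization of Eq. (1) can be written compactly; the sketch below is a minimal illustration in which `relative_poses` holds the pairwise transforms $\mathrm{T}_{k+1}^{k}$ as 4x4 matrices (the variable names are assumptions of this example).

import numpy as np

def chain_initial_poses(relative_poses):
    """Accumulate the local pairwise transforms T_{k+1}^{k} (list of 4x4
    arrays, mapping frame k+1 into frame k) into initial absolute poses
    relative to frame 0, as in Eq. (1)."""
    poses = [np.eye(4)]                  # frame 0 defines the world frame
    for T_rel in relative_poses:         # T_rel = T_{k+1}^{k}
        poses.append(poses[-1] @ T_rel)  # T~_{k+1} = T~_k * T_{k+1}^{k}
    return poses                         # poses[i] = T~_i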

Fig. 2. Assuming that point clouds are captured from 8 different frame viewpoints, the Multi-path Global Optimization Factor Graph is illustrated as shown in the figure. This graph demonstrates the constraint relationships among different optimized poses.

During the matching process, significant overlap might also occur between non-adjacent frames, such as the loop-closing pose $\mathrm {T}_{n-1}^{0}$ formed by encircling the entire object. Moreover, when the camera poses of adjacent frames are sufficiently close, co-visibility relationships often exist between frames separated by one or more frames, allowing them to be matched. The constraints of this matching relationship are defined as global matching edges, represented by yellow circular connectors in Fig. 2, and are expressed as: $\mathcal {G} =\{\mathrm {T}_{i}^{j}|i,j=1,2,\ldots,n-1,|i-j|\ne 1\}$.

From the analysis above, each frame is matched with at least two other point clouds, encompassing two types of matches: local and global matching constraints.

3.2.2 Global consistency objective function

To optimize the error in global consistency, it is first necessary to construct a mathematical model to quantify the global cumulative error caused by pose deviations and to adjust it using nonlinear optimization techniques. This chapter introduces the construction of the global consistency objective function.

In the factor graph mathematical model constructed in Chap.3.2.1, we understand that global matching relationships provide the poses between non-adjacent frames, and through the chain-like deduction of local matching results, we can also determine their transformations. The key issue is that local and global matching constraints may not always be consistent. Simply put, due to the presence of cumulative errors, there may be a discrepancy between the transformation relationships of the poses obtained through global matching $\mathrm {T}_{i}^{j}$ and those obtained through local matching $\tilde {\mathrm {T}}_i$ and $\tilde {\mathrm {T}}_j$ (i.e. inferred from the inter-frame registration results as per Eq. (1)), which yields:

$$\mathrm{T}_{i}^{j}\ne \left( \tilde{\mathrm{T}}_j \right) ^{{-}1}\cdot \tilde{\mathrm{T}}_i$$

Similarly, this effect also applies to loop-closing poses. As shown in Fig. 2, it is precisely the presence of such cumulative errors that leads to the layering phenomenon observed when matching the last frame with the first frame: $\mathrm {T}_{7}^{0}\ne \tilde {\mathrm {T}}_7$. Fig. 3 represents each frame’s point cloud as line segments, with different colors distinguishing different frames. This illustrates how the cumulative effect of errors under local registration impacts the final alignment results, with the largest error occurring between the last frame and the first frame. It is important to note that overlapping regions also exist beyond adjacent frame pairs. Global bundle optimization can ensure that the results exhibit global consistency. Therefore, the ultimate optimization goal proposed in this paper is to optimize the poses of all odometry nodes, minimizing the discrepancy between local and global matching constraint edges. To address the consistency issue, a consistency objective function over $\mathcal {T}$, constituted by both local and global constraints $\mathcal {L} \cup \mathcal {G}$, needs to be formulated. For any target pose node $\mathrm {T}_i \in \mathcal {T}$, and any edge $\mathrm {T}_{i}^{j}$ connected to it in the aforementioned factor graph, define the corresponding matching pairs between frames $i$ and $j$ as $\mathcal {C}_{ij} = \{(\mathrm {p}, \mathrm {q}) \mid \| \mathrm {T}_{i}^{j}\mathrm {p} - \mathrm {q}\| < \varepsilon, \mathrm {p} \in P_i, \mathrm {q} \in P_j\}$. Here, $P_i$ and $P_j$ respectively represent the point clouds of frames $i$ and $j$. For simplicity, the homogeneous transformation process is omitted here; $\mathrm {T}\mathrm {p}$ represents $R\mathrm {p} + t$, where $\mathrm {T} \in SE(3)$. $\varepsilon$ is typically set to 10 millimeters, representing the maximum matching distance in the effective overlapping area.
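For illustration, a minimal sketch of building $\mathcal {C}_{ij}$ with a KD-tree under the current relative pose is given below; the array names and the SciPy dependency are assumptions of this example, with $\varepsilon = 10$ mm.

import numpy as np
from scipy.spatial import cKDTree

def build_correspondences(P_i, P_j, T_ij, eps=0.010):
    """Build C_ij = {(p, q) : ||T_i^j p - q|| < eps} between point arrays
    P_i, P_j (each N x 3, in meters) under the current relative pose T_ij
    (4x4); eps is the 10 mm maximum matching distance."""
    P_i_in_j = P_i @ T_ij[:3, :3].T + T_ij[:3, 3]  # apply R p + t to frame i
    dist, idx = cKDTree(P_j).query(P_i_in_j)       # nearest neighbour in P_j
    keep = dist < eps
    return np.nonzero(keep)[0], idx[keep]          # indices of p in P_i, q in P_j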

Fig. 3. The line segments represent the point cloud, with different colors distinguishing different frames, showcasing the alignment results of both local and global registration.

Fig. 4. When $\nu$ is 5, smaller intensity differences between point pairs in the point cloud indicate higher similarity, corresponding to higher confidence levels. Constraints exceeding the truncation threshold $3\nu =15$ will be discarded.

From the local constraints between frames $i$ and $j$, transforming their matching points to the base coordinate system results in $\tilde {\mathrm {T}}_i\mathrm {p} \triangleq \tilde {\mathrm {T}}_j\mathrm {q}$, constructing the local matching error with respect to $\mathrm {T}_i$ and $\mathrm {T}_j$ as: $\epsilon _l(\mathrm {T}_i, \mathrm {T}_j) = \sum _{(\mathrm {p}, \mathrm {q}) \in \mathcal {C}_{ij}}{\left \| \tilde {\mathrm {T}}_i\mathrm {p} - \tilde {\mathrm {T}}_j\mathrm {q} \right \|^2}$. Through global matching, it is known that $\tilde {\mathrm {T}}_j\mathrm {T}_{i}^{j} \approx \tilde {\mathrm {T}}_i$, thus similarly $\tilde {\mathrm {T}}_j\mathrm {T}_{i}^{j}\mathrm {p} \triangleq \tilde {\mathrm {T}}_j\mathrm {q}$. Combining this with $\epsilon _l(\mathrm {T}_i, \mathrm {T}_j)$, the global matching cost for node $\mathrm {T}_i$ with respect to $\mathrm {T}_{j}^{i}$ can be constructed as:

$$\begin{aligned} \epsilon _g\left( \mathrm{T}_i|\mathrm{T}_{i}^{j} \right) &=\sum_{\left( \mathrm{p},\mathrm{q} \right) \in \mathcal{C} _{ij}}{\left\| \tilde{\mathrm{T}}_i\mathrm{p}-\tilde{\mathrm{T}}_j\mathrm{q} \right\| ^2}\\ &\approx \sum_{\left( \mathrm{p},\mathrm{q} \right) \in \mathcal{C} _{ij}}{\left\| \tilde{\mathrm{T}}_i\mathrm{p}-\tilde{\mathrm{T}}_j\mathrm{T}_{i}^{j}\mathrm{p} \right\| ^2}\\ &\cong \sum_{\left( \mathrm{p},\mathrm{q} \right) \in \mathcal{C} _{ij}}{\left\| \left( \tilde{\mathrm{T}}_j\mathrm{T}_{i}^{j} \right) ^{{-}1}\tilde{\mathrm{T}}_i\mathrm{p}-\mathrm{p} \right\| ^2}, \end{aligned}$$
where
$$\left( \tilde{\mathrm{T}}_j\mathrm{T}_{i}^{j} \right) ^{{-}1}\tilde{\mathrm{T}}_i\approx \left[ \begin{matrix} 1 & -\gamma & \beta & a\\ \gamma & 1 & -\alpha & b\\ -\beta & \alpha & 1 & c\\ 0 & 0 & 0 & 1\\ \end{matrix} \right] =\left[ \begin{matrix} \mathrm{I}+\zeta ^{\land} & \tau\\ 0 & 1\\ \end{matrix} \right]$$

When $\zeta = [\alpha, \beta, \gamma ]^{\top }, \tau = [a, b, c]^{\top }$, and $\xi = [\zeta, \tau ]^{\top }$ are substituted into Eq. (3), it results in:

$$\begin{aligned} \epsilon \left( \mathrm{T}_i|\mathrm{T}_{i}^{j} \right) &\approx \sum_{\left( \mathrm{p},\mathrm{q} \right) \in \mathcal{C} _{ij}}{\left\| \zeta \times \mathrm{p}+\tau \right\| ^2}\\ &=\sum_{\left( \mathrm{p},\mathrm{q} \right) \in \mathcal{C} _{ij}}{\left\| \left[ -\mathrm{p}^{\land}|\mathbf{I} \right] \xi \right\| ^2}\\ &=\xi ^{\top}\left( \sum_{\left( \mathrm{p},\mathrm{q} \right) \in \mathcal{C} _{ij}}{G_{\mathrm{p}}^{\top}G_\mathrm{p}} \right) \xi =\xi ^{\top}\Lambda \xi \end{aligned}$$

It can be observed that $G_\mathrm {p} = [-\mathrm {p}^{\land } | \mathbf {I}]$ is a constant only dependent on the matched point cloud in frame $i$. Therefore,

$$\Lambda = \sum_{(\mathrm{p},\mathrm{q}) \in \mathcal{C}_{ij}}{G_{\mathrm{p}}^{\top}G_\mathrm{p}}$$
is defined as the information matrix, a constant that can be computed once before the global optimization. However, edges constructed by global matching are not always entirely reliable. We therefore incorporate a verification of the matching points into the information matrix to reduce the impact of incorrect matches on global optimization; this modification of the information matrix is elaborated in Chap.3.2.3 on the robust weight function.
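As a concrete illustration of Eq. (6), the sketch below accumulates the $6\times 6$ information matrix from the matched points of frame $i$; the helper and variable names are assumptions of this example.

import numpy as np

def skew(p):
    """Skew-symmetric matrix p^ such that skew(p) @ v == np.cross(p, v)."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def information_matrix(points_i):
    """Eq. (6): Lambda = sum_p G_p^T G_p with G_p = [-p^ | I] (3x6),
    accumulated over the matched points of frame i (points_i: N x 3)."""
    Lam = np.zeros((6, 6))
    for p in points_i:
        G = np.hstack([-skew(p), np.eye(3)])  # G_p, a 3 x 6 block
        Lam += G.T @ G
    return Lam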

3.2.3 Robust weight function

Based on the above analysis, we can establish all edges related to the node and construct the matching cost $\epsilon _g(\mathrm {T}_i) = \epsilon _g(\mathrm {T}_i|\mathrm {T}_{i}^{j}) + \epsilon _g(\mathrm {T}_i|\mathrm {T}_{i}^{k}) + \cdots$, to optimize node $\mathrm {T}_i$. However, to enhance the robustness of iterations, determining the reliability of constraint edges is crucial. Among all matches, there are inevitably some erroneous or highly erroneous matches that can negatively impact the global optimization, causing the optimization to iterate in the wrong direction. Current robust function schemes based on the ICP method, apart from assessing nearest neighbor points and overlap rates, lack more comprehensive parameters for weighting. We propose introducing intensity similarity as a factor in the global optimization process, assessing the intensity similarity of matching point pairs. The intensity information of the point cloud can be derived from the intensity values of grayscale images obtained through structured light scanning, or from the colors of RGB images through external parameter transformation and mapping.

At the beginning of each iteration, it is imperative to re-construct the corresponding matching pair set $\mathcal {C}_{ij}$ for the edge $\mathrm {T}_{i}^{j}$ given in the current iteration. For the set of point pairs $(\mathrm {p},\mathrm {q}) \in \mathcal {C}_{ij}$ satisfying the geometric constraint $\| \mathrm {T}_{i}^{j}\mathrm {p} - \mathrm {q}\| < \varepsilon$, the optimization weight is measured using a zero-mean Gaussian function. The standard deviation $\nu$ adjusts the sensitivity of the weight to intensity differences $\Delta I_{(\mathrm {p},\mathrm {q})} = I_\mathrm {p} - I_\mathrm {q}$:

$$w_{(\mathrm{p},\mathrm{q})}=\exp \left( -(\Delta I_{(\mathrm{p},\mathrm{q})})^2/2\nu ^2 \right),$$
Figure 4 depicts the curve relating the intensity difference $\Delta I_{(\mathrm {p},\mathrm {q})}$ to the weight $w_{(\mathrm{p},\mathrm{q})}$, where $3\nu$ acts as a truncation threshold. With $\nu = 5$, point pairs whose intensity difference is below the truncation threshold of $3\nu = 15$ are retained; among these, smaller intensity differences indicate higher similarity and, correspondingly, higher confidence.

This confidence level, serving as a weight in the optimization process, can reduce the adverse impact of incorrect matches on the optimization. Therefore, the information matrix (6) described in Chap.3.2.2 can be optimized to:

$$\Lambda _w=\sum_{(\mathrm{p},\mathrm{q})\in \mathcal{C} _{ij}}{w_{(\mathrm{p},\mathrm{q})}^{2}G_{\mathrm{p}}^{\top}G_\mathrm{p}}$$

The information matrix is updated into the objective function in Eq. (5), constructing the residual $\epsilon (\mathrm {T}_i)$ for all edges related to $\mathrm {T}_i$. Ultimately, we obtain a global optimization objective function that incorporates a weight for point cloud color intensity consistency.

$$E(\mathcal{T} )=\sum_{i=1}^{n-1}{\epsilon _g(\mathrm{T}_i)}$$
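Concretely, Eqs. (7) and (8) can be sketched as follows; the intensity arrays and helper names are assumptions of this example, and the resulting $\Lambda_w$ would be attached to its matching edge before each optimization pass.

import numpy as np

def intensity_weights(I_p, I_q, nu=5.0):
    """Eq. (7): w = exp(-(dI)^2 / (2 nu^2)) for arrays of matched-pair
    intensities; pairs exceeding the 3*nu truncation receive weight 0."""
    dI = np.asarray(I_p) - np.asarray(I_q)
    w = np.exp(-dI**2 / (2.0 * nu**2))
    w[np.abs(dI) > 3.0 * nu] = 0.0
    return w

def weighted_information_matrix(points_i, w):
    """Eq. (8): Lambda_w = sum_p w_p^2 G_p^T G_p."""
    def skew(p):  # repeated from the earlier sketch for self-containment
        return np.array([[0.0, -p[2], p[1]],
                         [p[2], 0.0, -p[0]],
                         [-p[1], p[0], 0.0]])
    Lam = np.zeros((6, 6))
    for p, wp in zip(points_i, w):
        G = np.hstack([-skew(p), np.eye(3)])
        Lam += (wp ** 2) * (G.T @ G)
    return Lam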

The process of inter-frame point cloud registration, construction of the graph optimization framework, and the global optimization registration algorithm based on intensity similarity robust functions are illustrated in the pseudocode as shown in Algorithm 1.

Algorithm 1. Global Registration Graph Optimization Algorithm.

In summary, we initially construct inter-frame matches for each point cloud frame with respect to the other frames with which it shares sufficient co-visibility, including local constraints $\mathcal {L}$ (adjacent frames) and global constraints $\mathcal {G}$ (non-adjacent frames). As discussed in Chap.3.2.2, the cost of global consistency is solely related to the information matrix $\Lambda$, which is constructed from the set of matching points by Eq. (6). After each pose update, changes in the poses alter the set of matching points $\mathcal {C}_{ij}$, necessitating an update of the nearest neighbor points in the constraint point cloud matches. This process re-weights each constraint based on the color differences of the matching points to establish $w_{(\mathrm {p},\mathrm {q})}$, thereby entering a new iteration of optimization. Consequently, the final optimized poses are achieved by smoothly re-weighting all constraints, balancing global and local constraints against each other to minimize the consistency cost $E(\mathcal {T})$.
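The overall flow summarized above can be sketched as follows, using Open3D's pose-graph optimizer as a stand-in for the paper's own IRLS-based bundle adjustment; the intensity-based re-weighting of the per-edge information matrices would wrap this call in an outer loop and is omitted here. All function and parameter names are illustrative.

import numpy as np
import open3d as o3d

def global_pose_adjustment(init_poses, local_edges, global_edges, max_dist=0.01):
    """Assemble local and global constraints into a pose graph and optimize.
    init_poses:  chained absolute poses from Eq. (1), one 4x4 per frame.
    local_edges / global_edges: lists of (i, j, T_ij, info) with T_ij mapping
    frame-i coordinates into frame j and info a 6x6 information matrix
    (e.g. Lambda_w from Eq. (8))."""
    reg = o3d.pipelines.registration
    graph = reg.PoseGraph()
    for T in init_poses:                             # node poses: frame -> world
        graph.nodes.append(reg.PoseGraphNode(T))
    for i, j, T_ij, info in local_edges:
        graph.edges.append(reg.PoseGraphEdge(i, j, T_ij, info, uncertain=False))
    for i, j, T_ij, info in global_edges:
        graph.edges.append(reg.PoseGraphEdge(i, j, T_ij, info, uncertain=True))
    option = reg.GlobalOptimizationOption(max_correspondence_distance=max_dist,
                                          edge_prune_threshold=0.25,
                                          reference_node=0)
    reg.global_optimization(graph,
                            reg.GlobalOptimizationLevenbergMarquardt(),
                            reg.GlobalOptimizationConvergenceCriteria(),
                            option)
    return [np.asarray(node.pose) for node in graph.nodes]  # optimized poses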

3.3 Surface reconstruction

After point cloud matching and global optimization, transforming each frame’s point cloud results in the complete point cloud model: $\mathbb {P} = \left \{ \bigcup _{i=0}^{n-1}{\mathrm {T}_i\mathrm {P}_i} \right \}$, composed through the pose transformations and fusion of various frames.

However, a point cloud is an unordered set of points without stored information about the connections between points. This unstructured nature makes performing many geometric operations difficult, such as volume calculation, surface reconstruction, and complex geometric transformations. Furthermore, point clouds can have uneven coverage, leading to inconsistent quality in the reconstructed surfaces. Thus, it becomes necessary to transform the point cloud into a mesh. A mesh provides richer topological information, making surface reconstruction, modeling, and editing operations simpler, more intuitive, and efficiently representable.

In the transformation from point cloud to mesh, it is first essential to construct point cloud normal vectors with surface consistency. During the process of generating a mesh from a point cloud, it is usually necessary to estimate the normal vectors of each point to help determine the shape, orientation, and topological structure of the local surface. If the normal vectors are inconsistent, the reconstructed surface might display incorrect geometric shapes or unnatural sharp edges where it should be smooth. Therefore, consistency in normal vectors is crucial for converting a point cloud into a mesh. This chapter will introduce our process of constructing consistent normal vectors and transforming the point cloud into a mesh.

3.3.1 Normal vector consistency

Normal vector correction involves establishing the direction of normal vectors for each point in a point cloud so that they consistently point outward from the object’s geometric surface throughout the entire cloud. The main challenge is that estimating normal vectors on the fused point cloud as a whole, using a search radius $r$ around each point and obtaining the result $\mathbb {N}^g$, lacks consideration of global orientation. This can lead to sudden changes in normal vector direction in certain areas. Fig. 5 displays these normal vectors in pseudo-color, clearly highlighting this issue.

Fig. 5. Figure a and b respectively depict the pseudocolor results of integrated point cloud normal estimation and global fused frame-by-frame normal estimation. By taking a cross-section at the plane where $z$ equals the $z$-axis of the point cloud’s center point, the resulting visualizations of the point cloud normal vectors are as shown in Figures c and d, respectively.

Fig. 6. The correction before (red vectors) and after (green vectors) for the z=0 cross-section of global fused frame-by-frame normal estimation is shown. From the details in the red and blue boxes, it is evident that our method effectively corrects areas of the object’s surface with high curvature and high frequency.

Fig. 7. The complete mesh models in the above figure are the results of Poisson mesh reconstruction before (left) and after (right) normal vector correction. We present a detailed comparison of both in the solid and dashed boxes, respectively.

We propose a method of correcting normal vectors by recursively inferring frame-by-frame point cloud normal vectors into a global format. First, for each frame of point cloud data captured from different viewing angles, normal vectors are estimated based on the direction of lines from each point in the cloud to the camera’s coordinate origin. As this estimation method ensures that each point is within the visible range of the camera’s origin, the estimated normal vectors will uniformly point outward from the point cloud surface, forming acute angles with the direction of the camera origin. This estimation ensures the consistency of the point cloud normal vectors throughout the entire cloud. Estimating the normal vectors for each frame’s point cloud, and then fusing them after the global pose transformation of the point cloud estimated in Chap.3.2, we can obtain globally consistent normal vectors for the point cloud:

$$\mathbb{N} ^l=\bigcup_{i=0}^{n-1}{R_iN_i},$$
where $R_i$ and $N_i$ respectively denote the rotation matrix representing the transformation from the coordinate system of the i-th frame point cloud to the world coordinate system and the normal vector with the observation view as the positive direction.

As shown in the cross-sections in Fig. 5(c and d), the result of the integrated point cloud normal estimation exhibits several instances of sudden changes in the direction of the point cloud normal vectors. This occurs because it is not feasible to determine the correct direction of normal vectors based on a single origin, leading to uncertainty in the direction of locally estimated normal vectors. In contrast, the cross-sectional point cloud direction obtained by global fused frame-by-frame normal estimation is correct. This is because each point therein is oriented towards the direction of its corresponding observational viewpoint’s origin, ensuring that the normal vectors point outward from the object’s surface.

The aforementioned method does ensure the correct orientation of the point cloud. However, each point cloud’s normal vector is derived from the estimation of its single-frame point cloud, transformed accordingly. Due to the lack of neighboring points in the single-frame point cloud at the edges, the normal vector estimation is often inaccurate. In contrast, the integrated point cloud normal estimation, which considers all frames within the neighborhood, tends to have higher accuracy in normal vector estimation. However, as analyzed above, this method lacks consistency in the direction of the point cloud. Therefore, we combine both estimations: using the direction of $\mathbb {N}^l$ to correct the direction of $\mathbb {N}^g$, while keeping the magnitude unchanged. The calculation is as follows:

$$\bar{n}_{\mathrm{p}}^{l}=\frac{1}{\left\| \mathcal{N} \left( \mathrm{p},r \right) \right\|}\sum_{\mathrm{q}\in \mathcal{N} \left( \mathrm{p},r \right)}{n_{\mathrm{q}}^{l}},$$
$$n_{\mathrm{p}}^{*}=\begin{cases} n_{\mathrm{p}}^{g}, & \cos \left\langle n_{\mathrm{p}}^{g},\bar{n}_{\mathrm{p}}^{l} \right\rangle{>}0,\\ -n_{\mathrm{p}}^{g} & else\\ \end{cases}$$
where $n_{\mathrm {p}}^{g}\in \mathbb {N} ^g$ and $n_{\mathrm {q}}^{l}\in \mathbb {N} ^l$. $n_{\mathrm {p}}^{*}$ represents the final fused normal vector, which maintains global consistency in orientation. $\mathcal {N}(\mathrm {p}, r)$ represents the points within the radius $r$ of point $\mathrm {p}$, and $\bar {n}_{\mathrm {p}}^{l}$ denotes the mean normal vector of the points within this neighborhood. The mean direction is used as a reference to correct the orientation of the globally estimated normal vector $n_{\mathrm {p}}^{g}$. The point clouds before and after correction are shown in Fig. 6.
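A minimal sketch of this local-to-global correction (Eqs. (10)-(12)) is given below, assuming Open3D; the frame clouds are in their own camera coordinate systems and `poses` are the world-from-frame transforms estimated in Chap.3.2. The names and the radius value are illustrative.

import numpy as np
import open3d as o3d

def correct_normals(frames, poses, radius=0.005):
    """Fuse per-frame, view-oriented normals (N^l, Eq. (10)) with normals
    re-estimated on the fused cloud (N^g) and flip inconsistent signs
    (Eqs. (11)-(12))."""
    fused = o3d.geometry.PointCloud()
    for pcd, T in zip(frames, poses):
        p = o3d.geometry.PointCloud(pcd)
        p.estimate_normals(o3d.geometry.KDTreeSearchParamRadius(radius))
        p.orient_normals_towards_camera_location(np.zeros(3))  # view-based N^l
        p.transform(T)                       # rotates the normals by R_i too
        fused += p
    pts = np.asarray(fused.points)
    N_l = np.asarray(fused.normals).copy()   # globally fused view-based normals
    # Re-estimate normals on the fused cloud (N^g): accurate, but unoriented.
    glob = o3d.geometry.PointCloud()
    glob.points = o3d.utility.Vector3dVector(pts)
    glob.estimate_normals(o3d.geometry.KDTreeSearchParamRadius(radius))
    N_g = np.asarray(glob.normals).copy()
    tree = o3d.geometry.KDTreeFlann(glob)
    for k in range(len(pts)):
        _, idx, _ = tree.search_radius_vector_3d(pts[k], radius)
        mean_l = N_l[np.asarray(idx)].mean(axis=0)  # n_bar^l over N(p, r)
        if np.dot(N_g[k], mean_l) < 0:              # Eq. (12): flip the sign
            N_g[k] = -N_g[k]
    fused.normals = o3d.utility.Vector3dVector(N_g)  # corrected normals n*
    return fused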

3.3.2 Mesh generation

From the preceding methodology, the point cloud is endowed with oriented normals that exhibit uniform directionality, either inward or outward relative to the surface’s geometry. Consistent and robust normal orientations are a prerequisite for reconstructing a mesh from a point cloud [53]. These orientations facilitate the construction of a Signed Distance Function (SDF) in the surrounding space, mapping each point in space to its shortest distance to the surface [54]. The sign of the distance distinguishes whether a point is inside or outside the surface, simplifying the construction of surfaces from point clouds. The method employed for reconstructing meshes from point clouds in this context is the Poisson reconstruction method proposed by [51]. Leveraging global information, this implicit surface representation approach adeptly manages complex geometric and topological structures such as holes and bridges, enabling the generation of seamless and closed surfaces.

The primary steps of Poisson surface reconstruction are as follows: first, the oriented point samples that represent the solid boundary are transformed into a continuous vector field in 3D space. Subsequently, a scalar function is sought whose gradient best matches this vector field. Finally, isosurface extraction is performed to extract an appropriate isosurface, resulting in a sealed mesh. The method relies on a key equation that finds the optimal surface by minimizing an energy function, typically expressed as:

$$E(\chi )=\int_V{\left\| \vec{V}(\mathrm{p})-\nabla \chi (\mathrm{p}) \right\| ^2\,d\mathrm{p}}+\alpha \cdot \frac{\mathrm{Area}(P)}{\sum_{\mathrm{p}\in P}{w(\mathrm{p})}}\sum_{\mathrm{p}\in P}{w(\mathrm{p})\,\chi ^2(\mathrm{p})},$$

In this equation, $E(\chi )$ represents the energy function, and $\chi$ is the scalar indicator function being sought. $\vec{V}(\mathrm{p})$ denotes the vector field derived from the oriented point samples, and $\nabla \chi (\mathrm {p})$ is the gradient of $\chi$ at point $\mathrm {p}$. $\mathrm{Area}(P)$ is the area of the reconstructed surface, and $w(\mathrm{p})$ is a weighting factor used to balance fitting the data points against smoothing the surface. The consistency of normal vectors directly affects $\nabla \chi (\mathrm {p})$, i.e., the gradient of the scalar function: when normal vector directions are inconsistent, pointing inward at some points and outward at others, the gradient field becomes disordered, resulting in incorrect surface reconstruction. The consistency of normal vectors also impacts the second term of the energy function, because $\chi (\mathrm {p})$ needs to be close to zero near the sample points, and inconsistent normal vectors can cause significant deviations in the values of $\chi (\mathrm {p})$ at these points, increasing the value of the energy function. Therefore, the consistency of normal vectors is crucial for minimizing the energy function during the Poisson reconstruction process; only when normal vectors are consistent can the reconstructed surface accurately reflect the original data’s shape and structure. The algorithmic process for the correction and unification of normal vectors and the generation of a consistent surface is presented in Algorithm 2.

Algorithm 2. Normal Vector Correction and Surface Generation Algorithm.

Using the surface generation method for point clouds as proposed by [51], we can obtain the results shown in Fig. 7.
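For reference, the snippet below shows how the normal-corrected cloud could be meshed with screened Poisson reconstruction at depth 10 (the depth reported in Chap.4.4), assuming Open3D and the `fused` cloud from the earlier normal-correction sketch; the output filename is illustrative.

import open3d as o3d

# Mesh the normal-corrected cloud with screened Poisson reconstruction;
# `fused` is the cloud returned by the normal-correction sketch above.
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    fused, depth=10)
o3d.io.write_triangle_mesh("reconstructed_model.ply", mesh)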

4. Experiment

To evaluate the effectiveness of our algorithm, we conducted extensive simulation and real-world experiments and compared the results against numerous other methods to demonstrate the advantages of our approach. This chapter is divided into two parts: simulations and real-world experiments. In real-world scenarios, it is challenging to obtain complete error quantification because the true values of the reconstructed pose and model are difficult to acquire. We therefore used Gazebo simulations to mimic the process of a depth camera collecting 3D data of the test model from various angles; simulated errors were added to the data, and the true point clouds and poses were obtained for detailed quantitative error assessment. This is introduced in Chap.4.1. In real-world settings, we used a custom-built three-degree-of-freedom automatic data collection platform for data collection and reconstruction, and to implement the various functions of the system. This is discussed in Chap.4.2.

4.1 Simulation experiment

4.1.1 Gazebo-based data synthesis

To quantitatively assess the effectiveness of the algorithm, we simulated some existing evaluation models in Gazebo to generate reconstruction data with authentic reference values. A portion of these models, sourced from Stanford’s 3D dataset [55], had additional textures applied. Others were based on complete 3D representations of cultural artifacts. The comprehensive scanning process of the models involved simulating the capture of depth and color images using RGB-D cameras from 16 different positions and viewing angles. The viewpoints were evenly distributed to cover the object’s observation space. Some of the viewpoints were set at angles of 15 to 30 degrees from the horizontal plane, while others were positioned at angles between 45 and 60 degrees, with the horizontal angles evenly distributed from 0 to 360 degrees. The shooting distance was established at 0.5 to 1.5 meters from the surface of the models. The placement of the cameras is illustrated in Fig. 8. The figure illustrates the Gazebo scene set up for data acquisition, incorporating factors like lighting, textures, and reflections to emulate the real-world collection process as closely as possible. The right image displays the point cloud merged from real poses and point clouds, along with individual frames.

Fig. 8. The estimated pose and the point cloud generated from multi-frame registration.

4.1.2 Synthetic data evaluation

A total of six models of varying sizes were collected. The larger models, namely lion, dragon, and deer, have a surface area of approximately 1-2 square meters. The medium-sized models are eagle, bunny, and vase, each with a surface area of about 0.5 square meters. Among these, the vase, due to its cylindrical and quasi-symmetrical shape, exhibits numerous similar geometric features. All models possess textures.

The accuracy of our algorithm was verified by conducting a precision comparison with numerous state-of-the-art model registration algorithms. The algorithms used for comparison are R-ICP [17], AA-ICP [19], and Sparse-ICP [18], which are typical variants of the iterative closest point (ICP) method based on point-to-plane and point-to-point matching. Color-ICP [20], a variant of the ICP algorithm, incorporates color information into its optimization process. Note that as our algorithm employs global optimization, it requires pairwise registration based on the results of R-ICP [17] before the final outcome can be obtained.

To accurately quantify the registration outcomes, we employ the same evaluation protocol as in [56,57], where the angular difference between the rotation matrix and the ground truth rotation matrix is measured by the following equation:

$$e_{rot}=\arccos \left( \frac{\mathrm{tr}(\mathrm{R}_{\mathrm{gt}}^{\top}\mathrm{R}_{\mathrm{est}})-1}{2} \right)$$

The error in the translation vector is given by the Euclidean distance:

$$e_{trans}=\left\| \mathrm{t}_{\mathrm{gt}}-\mathrm{t}_{\mathrm{est}} \right\| ,$$
where $\left [ \mathrm {R}_{\mathrm {gt}},\mathrm {t}_{\mathrm {gt}} \right ]$ and $\left [ \mathrm {R}_{\mathrm {est}},\mathrm {t}_{\mathrm {est}} \right ]$ respectively represent the ground truth and estimated pose transformations to the base coordinate system.
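These two metrics are straightforward to compute; a minimal sketch follows (clamping the cosine guards against numerical round-off, an implementation detail not stated in the paper).

import numpy as np

def rotation_error_deg(R_gt, R_est):
    """Angular difference of Eq. (14), in degrees; the cosine is clamped
    to [-1, 1] to guard against numerical round-off."""
    c = (np.trace(R_gt.T @ R_est) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def translation_error(t_gt, t_est):
    """Euclidean translation error of Eq. (15)."""
    return np.linalg.norm(np.asarray(t_gt) - np.asarray(t_est))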

The multi-frame registration errors of all algorithms across six model data sets are compared in Fig. 9. It indicates that the initial frame matching error of our algorithm is comparable to that of other algorithms. However, with an increasing number of frames, the errors in other algorithms accumulate, progressively enlarging frame by frame, exhibiting an upward trend in the curve. In contrast, our algorithm, through multi-frame matching and correction, maintains errors at a relatively lower level. Relative to the native R-ICP pose estimation, our algorithm demonstrates an average reduction of 67.9% in translation error and 77.1% in rotation error across the six datasets tested.

The prior experiments assessed the accuracy of pose estimation. We now proceed to evaluate the quality of the reconstruction by assessing the point cloud registration accuracy for each model. This involves calculating the positional error (Euclidean distance) between each point cloud and the actual point cloud, followed by computing the average Root Mean Square Error (RMSE) and standard deviation for each frame. The results are illustrated in color-coded charts as shown in Fig. 10.

Fig. 9. The comparison of translation and rotation errors in the registration performed by each algorithm.

Fig. 10. A color-coded error mean table, where darker colors indicate larger values.

Figure 10 shows that our algorithm surpasses others in terms of registration accuracy and stability. The errors of the proposed method, R-ICP, and Sparse-ICP are maintained within sub-millimeter levels, while for Color-ICP and AA-ICP, due to the cumulative error from frame to frame, the average registration errors range between 3-5 mm. Particularly for the vase model, characterized by numerous similar geometric features, other algorithms fail to robustly handle such scenarios and cannot effectively differentiate outlier matches, leading to local optima; AA-ICP even yields incorrect results. Our method, like Color-ICP, employs the color information of point clouds as an additional constraint. The difference lies in our use of IRLS strategy, which dynamically adjusts its weights during iterations, offering greater adaptability. This approach allows us to balance accuracy and robustness effectively.

The globally registered complete model undergoes consistent normal vector estimation using the method described in Chap.3.3.1, followed by Poisson mesh generation and texture mapping as outlined in Chap.3.3. The final results produced by our algorithm are depicted in Fig. 11, showcasing the leftmost textured model. The quantitative results from Fig. 10 reveal that the errors associated with the Color-ICP and AA-ICP results significantly exceed those of the other methods compared. Due to space limitations, only a subset of models reconstructed by the methods that achieved sub-millimeter accuracy are displayed. On the right, partial mesh reconstruction results of models stitched together using various algorithms are displayed. To more prominently illustrate the impact of registration errors, which leads to stratification in the mesh, we employ color coding based on the values of the normal vectors. This underscores the significant impact of global registration accuracy on the quality of mesh model generation in the context of model scanning and reconstruction.

Fig. 11. The images in the leftmost column present the complete textured model generated by our algorithm. The other columns illustrate meshes color-encoded by normal vectors, derived from point clouds aligned via diverse algorithms and subsequently meshed through Poisson reconstruction, with GT employed as the benchmark for comparison.

4.2 Real-world experiment

In the real-world experiment, we conducted reconstruction tests on portrait sculptures as targets. The target was placed on a turntable at the center of the platform, with 16 poses preset similar to those in the simulation experiments. Data acquisition was carried out using the automated collection platform described in Fig. 1. The process of data collection took approximately one minute from start to finish. Subsequently, the acquired multi-view target data underwent registration and reconstruction. It is important to note that since the raw data inevitably contains outliers from the scene, it is first necessary to segment and preprocess the target point cloud. This part involves the calibration of the platform and the derivation of the kinematic pose of the platform, which are not elaborated upon here.

4.3 Precision evaluation

Similar to the simulation experiments, the five algorithms compared will be tested on 16 frames of point clouds captured from different perspectives, outputting the pose of each point cloud relative to the initial point cloud. Given the challenges in acquiring accurate registration poses in real-world scenarios, an alternative approach for error measurement is employed. The pose of the point cloud from the final frame is transformed into the coordinate system of the point cloud from the initial frame, which serves as the baseline coordinate system. Subsequently, a nearest neighbor search is conducted between the point clouds of the first and final frames, and the distance between the closest corresponding points is computed as the matching error. This method is utilized to assess the accuracy of loop closure and to evaluate the overall consistency of the point cloud’s global registration.

We compute errors only for matched point pairs with distances less than 1mm, categorizing those with distances greater than 1mm as incorrect matches. A cumulative frequency graph of the errors for all inliers is depicted in Fig. 12, where the horizontal axis represents the magnitude of the error and the vertical axis indicates the proportion of all data points with errors less than or equal to this error value. The graph clearly illustrates the characteristics of the data distribution; a curve closer to the upper-left corner indicates a larger proportion of points with smaller errors, signifying higher algorithmic precision. For instance, in our algorithm, the proportion of points with matching errors less than 0.5mm is 99.05%, while for R-ICP it is 98.5%, for Color-ICP 79.16%, for Sparse-ICP 69.89%, and for AA-ICP 53%.
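A minimal sketch of this loop-closure evaluation is shown below; it assumes the last frame has already been transformed into the first frame's coordinate system, and the array names and the SciPy dependency are assumptions of this example.

import numpy as np
from scipy.spatial import cKDTree

def loop_closure_errors(P_last_in_first, P_first, inlier_thresh=0.001):
    """Nearest-neighbour distances between the last frame (already mapped
    into the first frame's coordinate system) and the first frame.
    Pairs farther than 1 mm count as incorrect matches; the cumulative
    distribution of the remaining inlier errors corresponds to Fig. 12."""
    dist, _ = cKDTree(P_first).query(P_last_in_first)
    inliers = dist[dist < inlier_thresh]
    frac_below_half_mm = float(np.mean(inliers < 0.0005))
    return inliers, frac_below_half_mm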

The pseudo-color visualization of registration errors in the point cloud is shown in Fig. 13. The color-coded point cloud demonstrates that our results exhibit the smallest matching errors in the co-visible areas of the first and last frames, with an overall uniform error distribution. The results from R-ICP and Sparse-ICP also present relatively small errors, but with uneven distribution, displaying smaller errors in some regions and larger in others, a result of error accumulation. Conversely, the results from Color-ICP and AA-ICP show larger errors, with a significant number of points in the co-visible areas having matching errors exceeding 1mm, leading to their exclusion.

Fig. 12. The horizontal axis represents the registration error, while the vertical axis indicates the proportion or frequency of all data points with errors less than or equal to the corresponding value on the horizontal axis.

Fig. 13. The pseudo-color map of the errors in matched corresponding pairs is depicted, with darker colors representing smaller errors. On the far left is the final result of our algorithm after mesh reconstruction and color mapping.

4.4 Time efficiency evaluation

We conducted an offline time-efficiency evaluation on the data from the previous experiment, testing the five methods separately. The experiments were run on a laptop with an Intel i7-13620 CPU and 16 GB of RAM, without GPU processing. To reduce the complexity of the point cloud data and improve processing efficiency, the target point clouds were downsampled with a voxel size of 2.5 mm in the preprocessing step. The average number of points per preprocessed frame is 24,600, and the final generated model has a surface area of approximately 0.43 square meters. The 16 frames of preprocessed target point clouds were registered using each of the evaluated methods. Since the frame-to-frame matching processes are independent of one another, all methods were run with multithreading to maximize computational efficiency. The timing results are listed in Table 1. For our method, we report the time consumed by three modules: local registration, global registration, and consistency iterative optimization (Iter. Opt.), as well as the total time.
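
The timing protocol can be sketched as follows, assuming the Open3D Python API; point-to-plane ICP stands in for whichever pairwise method is being timed, the 2.5 mm voxel size follows the text, and the normal-estimation and correspondence parameters are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import open3d as o3d

def register_pair(source, target, max_corr=0.005):
    """Pairwise registration between two preprocessed frames (placeholder:
    point-to-plane ICP; each compared method would plug in here)."""
    result = o3d.pipelines.registration.registration_icp(
        source, target, max_corr, np.eye(4),
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation

def time_pairwise_registration(frames):
    """Downsample to a 2.5 mm voxel grid, then register consecutive frames in
    parallel, since the pairwise problems are independent of one another."""
    frames = [f.voxel_down_sample(voxel_size=0.0025) for f in frames]
    for f in frames:
        f.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=0.01, max_nn=30))
    t0 = time.time()
    with ThreadPoolExecutor() as pool:
        poses = list(pool.map(lambda i: register_pair(frames[i + 1], frames[i]),
                              range(len(frames) - 1)))
    return poses, time.time() - t0
```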

Table 1. Total time consumed in registration by each method. Local Reg. and Global Reg. respectively represent the time consumption of the Local Registration and Global Registration modules in our algorithm. Iter. Opt. refers to the time for iterative optimization for consistency. (Units: seconds)

The timing results show that our approach, owing to the incorporation of global cross-matching and consistency iterative optimization, consumes more time for reconstruction than the other methods. In practice, however, the most time-consuming part of the system's operation is data collection: structured light cameras, constrained by their underlying principles, require at least 1 second for each frame of data to be collected and received, and the scanning process also requires the mechanical structure to move between poses, so collecting 16 frames takes approximately 1 minute. During the data collection phase, both local and global registration proceed concurrently with acquisition, and the consistency iterative optimization is performed only after the complete collection. From an engineering standpoint, our method therefore approaches real-time performance; compared to other methods, it significantly enhances registration accuracy at an acceptable increase in processing time.
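
A minimal sketch of this pipelining, using Python's concurrent.futures; acquire_frame, local_register, global_register, and consistency_optimize are hypothetical stand-ins for the platform driver and the registration modules, not their actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_and_register(acquire_frame, local_register, global_register,
                      consistency_optimize, n_frames=16):
    """Overlap acquisition with registration: each captured frame is handed to a
    worker thread immediately, so registration time is hidden behind the roughly
    one-minute acquisition; the consistency optimization runs only afterwards."""
    futures = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for i in range(n_frames):
            frame = acquire_frame(i)  # blocks ~1 s per structured-light capture
            futures.append(pool.submit(local_register, frame))
            futures.append(pool.submit(global_register, frame))
        partial_results = [f.result() for f in futures]
    return consistency_optimize(partial_results)  # only after the full collection
```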

Additionally, our method applies consistency correction to the normals of the point cloud, with this module consuming 1.21 seconds. Finally, including the Poisson mesh reconstruction at a depth of 10 and texture mapping by [52], the total time consumed by the system is 43.89 seconds.
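
The surface generation step can be reproduced with an off-the-shelf screened Poisson reconstruction [51]; the sketch below assumes the Open3D Python API, takes the fused, normal-corrected point cloud as input, and uses the depth-10 setting from the text (the texture mapping of [52] is a separate tool and is not shown).

```python
import numpy as np
import open3d as o3d

def reconstruct_surface(fused_pcd, depth=10):
    """Screened Poisson reconstruction on the fused, normal-corrected point
    cloud (depth 10 as in the text). Trimming low-density vertices is an
    optional cleanup step, not part of the reported timing."""
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        fused_pcd, depth=depth)
    densities = np.asarray(densities)
    mesh.remove_vertices_by_mask(densities < np.quantile(densities, 0.01))
    return mesh
```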

5. Conclusion

We propose a novel automated reconstruction system based on structured light scanning. Our method overcomes the error accumulation that afflicts traditional multi-frame registration techniques. In this study, we integrate the color information of the point clouds into the global optimization and process it iteratively with the IRLS method, achieving higher precision in global registration. In particular, for models with many similar features or complex topological structures (such as vases), where other algorithms fail to register robustly, our method provides effective corrections. The experiments show that, on simulated data, our method reduces translation error by 67.9% and rotation error by 77.1% on average compared with frame-by-frame pose estimation. In the real-world experiments, more than 99.05% of matched pairs (the inlier recall) have a matching error below 0.5 mm. We also introduce a normal vector correction strategy that resolves the normal inconsistency problem in surface reconstruction, thereby significantly enhancing the quality of the final reconstructed model. The 3D reconstruction platform we designed is practical and easy to operate, greatly reducing the manual effort required during reconstruction.

In summary, this study is significant both theoretically and in engineering terms, and it provides practical solutions for relevant industrial applications.

6. Discussion

To construct matches under co-visibility constraints, our method performs comprehensive cyclic matching and association of point pairs across frames for global optimization, which hinders real-time application. In future work, we plan to replace the laborious point cloud search with a more sophisticated search algorithm beyond the conventional KD-tree (k-dimensional tree). Additionally, iterating with confidence weights that vary with the frame and overlap ratio, informed by the spatial positioning of viewpoints, promises to improve robustness and efficiency over simple cyclic matching.

Furthermore, our approach to globally consistent normal vector correction is designed specifically for multi-frame fused point clouds acquired from known viewpoints, such as those captured by LiDAR or RGB-D sensors; it is not applicable to the normal orientation of unordered point clouds. In future research, we plan to explore the integration of ray tracing techniques to extend our method to unordered point clouds.

Funding

National Key Research and Development Program of China (2021YFC2202404).

Acknowledgments

The authors appreciate Prof. Shaohui Zhang of Beijing Institute of Technology for his helpful and constructive comments and suggestions.

Disclosures

The authors have no conflicts of interest to declare.

Data availability

Data underlying the results presented in this paper are publicly available at [58].

References

1. H. Du, P. Henry, X. Ren, et al., “Interactive 3d modeling of indoor environments with a consumer depth camera,” in Proceedings of the 13th International Conference on Ubiquitous Computing, (Association for Computing Machinery, New York, NY, USA, 2011), UbiComp ’11, pp. 75–84.

2. Y. M. Kim, J. Dolson, M. Sokolsky, et al., “Interactive acquisition of residential floor plans,” in 2012 IEEE International Conference on Robotics and Automation, (2012), pp. 3055–3062.

3. A. Sankar and S. Seitz, “Capturing indoor scenes with smartphones,” in Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology, (Association for Computing Machinery, New York, NY, USA, 2012), UIST ’12, pp. 403–412.

4. N. Corso and A. Zakhor, “Indoor localization algorithms for an ambulatory human operated 3d mobile mapping system,” Remote Sens. 5(12), 6611–6646 (2013). [CrossRef]  

5. Z. Qin, H. Yu, C. Wang, et al., “Geometric transformer for fast and robust point cloud registration,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2022), pp. 11133–11142.

6. Z. Qin, H. Yu, C. Wang, et al., “Deep graph-based spatial consistency for robust non-rigid point cloud registration,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2023), pp. 5394–5403.

7. X. Bai, Z. Luo, L. Zhou, et al., “Pointdsc: Robust point cloud registration using deep spatial consistency,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2021), pp. 15854–15864.

8. C. Choy, W. Dong, and V. Koltun, “Deep global registration,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), pp. 2511–2520.

9. Y. Ye and Z. Song, “An accurate 3d point cloud registration approach for the turntable-based 3d scanning system,” in 2015 IEEE International Conference on Information and Automation, (2015), pp. 982–986.

10. S. Khawaldeh, T. A. Aleef, U. Pervaiz, et al., “Complete end-to-end low cost solution to a 3d scanning system with integrated turntable,” arXiv, arXiv:1709.02247 (2017). [CrossRef]  

11. Y. Zong, J. Liang, and W. Pai, “A high-efficiency and high-precision automatic 3d scanning system for industrial parts based on a scanning path planning algorithm,” Opt. Lasers Eng. 158, 107176 (2022). [CrossRef]  

12. I. D. Lee, J. H. Seo, and Y. M. Kim, “Automatic pose generation for robotic 3-d scanning of mechanical parts,” IEEE Trans. Robot. 36(4), 1219–1238 (2020). [CrossRef]  

13. H. Yue, Y. Yu, W. Chen, et al., “Accurate three dimensional body scanning system based on structured light,” Opt. Express 26(22), 28544–28559 (2018). [CrossRef]  

14. D. Xie, W. Zhu, F. Rong, et al., “Registration of point clouds: A survey,” in 2021 International Conference on Networking Systems of AI (INSAI), (2021), pp. 136–142.

15. S. Rusinkiewicz and M. Levoy, “Efficient variants of the icp algorithm,” in Proceedings Third International Conference on 3-D Digital Imaging and Modeling, (2001), pp. 145–152.

16. D. Chetverikov, D. Stepanov, and P. Krsek, “Robust euclidean alignment of 3d point sets: the trimmed iterative closest point algorithm,” Image Vis. Comput. 23(3), 299–309 (2005). [CrossRef]  

17. J. Zhang, Y. Yao, and B. Deng, “Fast and robust iterative closest point,” IEEE Trans. Pattern Anal. Mach. Intell. 44, 3450–3466 (2021). [CrossRef]  

18. S. Bouaziz, A. Tagliasacchi, and M. Pauly, “Sparse iterative closest point,” Comput. Graph. Forum 32(5), 113–123 (2013). [CrossRef]  

19. A. L. Pavlov, G. W. Ovchinnikov, D. Y. Derbyshev, et al., “Aa-icp: Iterative closest point with anderson acceleration,” in 2018 IEEE International Conference on Robotics and Automation (ICRA), (2018), pp. 3407–3412.

20. J. Park, Q.-Y. Zhou, and V. Koltun, “Colored point cloud registration revisited,” in 2017 IEEE International Conference on Computer Vision (ICCV), (2017), pp. 143–152.

21. H. Yang, J. Shi, and L. Carlone, “Teaser: Fast and certifiable point cloud registration,” IEEE Trans. Robot. 37(2), 314–333 (2021). [CrossRef]  

22. Z. Yang, T. Dan, and Y. Yang, “Multi-temporal remote sensing image registration using deep convolutional features,” IEEE Access 6, 38544–38555 (2018). [CrossRef]  

23. S. Chen, L. Nan, and R. Xia, “Plade: A plane-based descriptor for point cloud registration with small overlap,” IEEE Trans. Geosci. Remote Sensing 58(4), 2530–2540 (2020). [CrossRef]  

24. L. He, S. Wang, Q. Hu, et al., “Gfoicp: Geometric feature optimized iterative closest point for 3-d point cloud registration,” IEEE Trans. Geosci. Remote Sensing 61, 1–17 (2023). [CrossRef]  

25. Y. He, J. Yang, and X. Hou, “Icp registration with dca descriptor for 3d point clouds,” Opt. Express 29(13), 20423–20439 (2021). [CrossRef]  

26. Z. Du, Y. Zuo, and X. Song, “Efficient and accurate registration with bwph descriptor for low-quality point clouds,” Opt. Express 31(23), 39307–39322 (2023). [CrossRef]  

27. Y. Zhao, H. Yu, and K. Zhang, “Fpp-slam: indoor simultaneous localization and mapping based on fringe projection profilometry,” Opt. Express 31(4), 5853–5871 (2023). [CrossRef]  

28. Y. Zhang, F. Tosi, S. Mattoccia, et al., “Go-slam: Global optimization for consistent 3d instant reconstruction,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), (2023), pp. 3727–3737.

29. B. Wu, X. Ge, L. Xie, et al., “Enhanced 3d mapping with an rgb-d sensor via integration of depth measurements and image sequences,” Photogramm. Eng. & Remote. Sens. 85(9), 633–642 (2019). [CrossRef]  

30. D. Um and S. Lee, “Microscopic structure from motion (sfm) for microscale 3d surface reconstruction,” Sensors 20(19), 5599 (2020). [CrossRef]  

31. K. Koide, J. Miura, and M. Yokozuka, “Interactive 3d graph slam for map correction,” IEEE Robot. Autom. Lett. 6(1), 40–47 (2021). [CrossRef]  

32. E. Vidal, N. Piotto, G. Cordara, et al., “Automatic video to point cloud registration in a structure-from-motion framework,” in 2015 IEEE International Conference on Image Processing (ICIP), (2015), pp. 2646–2650.

33. P. Yin, S. Zhao, and H. Lai, “Automerge: A framework for map assembling and smoothing in city-scale environments,” IEEE Trans. Robot. 39(5), 3686–3704 (2023). [CrossRef]  

34. J. Williams and M. Bennamoun, “Simultaneous registration of multiple corresponding point sets,” Comput. Vis. Image Underst. 81(1), 117–142 (2001). [CrossRef]  

35. X. Mateo, X. Orriols, and X. Binefa, “Bayesian perspective for the registration of multiple 3d views,” Comput. Vis. Image Underst. 118, 84–96 (2014). [CrossRef]  

36. S. Fantoni, U. Castellani, and A. Fusiello, “Accurate and automatic alignment of range surfaces,” in 2012 Second International Conference on 3D Imaging, Modeling, Processing, Visualization & Transmission, (2012), pp. 73–80.

37. H. Wang, Y. Liu, Z. Dong, et al., “Robust multiview point cloud registration with reliable pose graph initialization and history reweighting,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2023), pp. 9506–9515.

38. Z. J. Yew and G. H. Lee, “Learning iterative robust transformation synchronization,” in 2021 International Conference on 3D Vision (3DV), (2021), pp. 1206–1215.

39. R. Lei, B. Yang, D. Quan, et al., “Deep global feature-based template matching for fast multi-modal image registration,” in 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, (2021), pp. 5457–5460.

40. R. Arandjelovic, P. Gronat, and A. Torii, “Netvlad: Cnn architecture for weakly supervised place recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 40(6), 1437–1451 (2018). [CrossRef]  

41. S. Choi, Q.-Y. Zhou, and V. Koltun, “Robust reconstruction of indoor scenes,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), pp. 5556–5565.

42. Z. Gojcic, C. Zhou, J. D. Wegner, et al., “Learning multiview 3d point cloud registration,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), pp. 1756–1766.

43. F. Arrigoni, B. Rossi, and A. Fusiello, “Spectral synchronization of multiple views in se (3),” SIAM J. Imaging Sci. 9(4), 1963–1990 (2016). [CrossRef]  

44. H. Hoppe, T. DeRose, T. Duchamp, et al., “Surface reconstruction from unorganized points,” in Proceedings of the 19th Annual Conference on Computer Graphics and Interactive Techniques, (Association for Computing Machinery, New York, NY, USA, 1992), SIGGRAPH ’92, pp. 71–78.

45. H. Xie, K. McDonnell, and H. Qin, “Surface reconstruction of noisy and defective data sets,” in IEEE Visualization 2004, (2004), pp. 259–266.

46. S. König and S. Gumhold, “Consistent propagation of normal orientations in point clouds,” in International Symposium on Vision, Modeling, and Visualization, (2009).

47. J. Jakob, C. Buchenau, and M. Guthe, “Parallel globally consistent normal orientation of raw unorganized point clouds,” Comput. Graph. Forum 38(5), 163–173 (2019). [CrossRef]  

48. G. Metzer, R. Hanocka, D. Zorin, et al., “Orienting point clouds with dipole propagation,” ACM Trans. Graph. 40(4), 1–14 (2021). [CrossRef]  

49. F. Hou, C. Wang, and W. Wang, “Iterative poisson surface reconstruction (ipsr) for unoriented points,” ACM Trans. Graph. 41(4), 1–13 (2022). [CrossRef]  

50. R. Xu, Z. Dou, and N. Wang, “Globally consistent normal orientation for point clouds by regularizing the winding-number field,” ACM Trans. Graph. 42(6), 1–18 (2023). [CrossRef]  

51. M. M. Kazhdan and H. Hoppe, “Screened poisson surface reconstruction,” ACM Trans. Graph. 32(3), 1–13 (2013). [CrossRef]  

52. M. Waechter, N. Moehrle, and M. Goesele, “Let there be color! — Large-scale texturing of 3D reconstructions,” in Proceedings of the European Conference on Computer Vision, (Springer, 2014).

53. S. P. Lim and H. Haron, “Surface reconstruction techniques: a review,” Artif. Intell. Rev. 42(1), 59–78 (2014). [CrossRef]  

54. Z. Huang, Y. Wen, Z. Wang, et al., “Surface reconstruction from point clouds: A survey and a benchmark,” arXiv, arXiv:2205.02413 (2022). [CrossRef]  

55. M. Levoy, J. Gerth, B. Curless, et al., “The Stanford 3D scanning repository,” https://graphics.stanford.edu/data/3Dscanrep/.

56. A. Chatterjee and V. M. Govindu, “Efficient and robust large-scale rotation averaging,” in 2013 IEEE International Conference on Computer Vision, (2013), pp. 521–528.

57. X. Huang, Z. Liang, X. Zhou, et al., “Learning transformation synchronization,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2019), pp. 8074–8083.

58. L. Zhengchao, Z. Runlin, W. Xuanquan, et al., “Fully Automated Structured Light Scanning for High-Fidelity 3D Reconstruction via Graph Optimization,” https://github.com/zhijianglu/HF-Recon.





