Color-guided optimization model with reliable self-structure priors for depth map restoration

Yumin Chen; Zhihu Hong; Dongjun Sun; Zhaoyu Peng; Nongtao Zhang; Kangshun Luo; Chong Liu; Junming Tian; Yan Qing; Chunlin Li; Mostafa Mahmoud Ibrahim; You Yang

doi:10.1364/OSAC.430664

1. Introduction

Depth maps play a crucial role in many visual computation and communication applications such as 3D surgery operations [1–3], augmented reality [4], UAV navigation [5], human-device interaction [6], 3D modeling [7], and 3DTV/free viewpoint visual applications [8]. These depth maps are obtained from current depth sensors such as Kinect and Time of Flight (ToF) cameras, which are affordable and popular. Unfortunately, these depth maps are far from perfect, where they suffer from different types of degradation such as intrinsic noise and missing pixels. In addition, the depth maps captured with the current devices have low resolution. These various degradation types reduce the range sensors performance, when they are used in different applications. For benefit from the depth information, many researchers devoted their efforts to depth restoration, where they pursued various ways either through using multi-depth maps or exploiting an auxiliary information such as a registered color image. The current research on depth map restoration, including depth map filtering and super-resolution can be divided into three categories according to the baseline [9]:

(1) Filtering methods: The filtering methods depend on local or non-local information, and all of these methods can be divided into two categories: self-guided and color-guided. Self-guided filters depend only on the depth map for filtering either local-based such as bilateral filter (BF) [10], non-local-based such as non-local means (NLM) filter [11] or any traditional filtering method used for color images [12]. These self-guided methods are not applicable for up-sampling and filtering ToF-based depth maps because these under-sampled depth maps have very low quality. On the other hand, the color-guided filters exploit the content of the color image to detect the depth discontinuities in the noisy environment. These color-guided filters rely on the edges correlation assumption between the depth and color images. For example, joint bilateral filter (JBF) [13] is one of the popular color-guided local methods used for enhancing depth maps. Most of the color-guided methods depend explicitly on the co-occurrence property between the depth map and the corresponding color image. Figure 1 shows the edges of both depth and color images. From Fig. 1, it is observed that there is an inconsistency between these edges; however, some of these color edges correspond to the depth edges. When the color image is used as a guide for depth map filtering, these textures that do not correspond to any depth edge are transferred and copied to the homogeneous regions of depth map, and this problem is called texture coping problem. This problem is a challenge for all color-guided filtering methods.

Fig. 1. Edges co-occurrence between depth and color images.

Download Full Size | PDF

(2) Optimization methods: Optimization methods are actually global methods used to recover the depth maps through using some regularization terms or priors that model the depth map characteristics. Most of the previous optimization models used for up-sampling and filtering depth maps generated with ToF and Kinect devices are color-guided. Some of these color-guided models are based on graphical model such as Markov random field (MRF) [14] or a low rank-based method (LRM) [15]. Besides that, two optimization models depending on auto-regressive (AR) predictors have been proposed. The first model is based on AR without any further priors [16] but the second one is with adding further priors such as a low rank (LR) and total variation (TV) regularization terms [17]. All of these AR-based models result in blurry depth edges. Apart from the aforementioned optimization methods, some methods depend on the prior of mutual structures between the color and depth images for depth map enhancement. Mutual structure for joint filtering (MSJF) method [18] is an example of guiding the depth map restoration by using the structures that exist in both depth and color images. In addition, based on weighted least square, some optimization models were proposed for depth map restoration [8,19,20].

(3) Learning methods: Learning-based methods are methods that use the learning methodology for depth map processing. These learning-based methods can be divided into two categories: deep neural networks and sparse coding methods. As deep learning is a new born field, this field drew the attention of many researchers for using the deep learning in depth map restoration. For instance, a convolutional neural network (CNN) with a pre-processing step is used for depth map enhancement in [21]. Zhu et al. [22] combine CNN and a linear regularization into a learning network for filtering and up-sampling depth maps. Recently, He et al. proposed a graphical neural network to handle the compression artifacts of multi-view depth maps [23]. For sparse coding, Wang et al. [24] used the sparse coding constrained with a trilateral prior for filling hole pixels in Kinect depth maps. This constrain is robust to handle the blurred depth discontinuities that appears if the sparse code is used alone. Wang et al. [25] proposed to use deep intensity features for compression artifacts reduction of depth map.

In this paper, we propose a color-guided optimization model based on the reliable self structures of depth and color images for depth map restoration, including two tasks: filtering noise with filling missing pixels task and super-resolution (i.e. up-sampling) task. Our contributions can be outlined as follows:

• Reliable self structures-based depth map filtering: In this task, motivated by mutually guided image filtering (muGIF) [26] method, we construct an optimization model that depends also on the mutual structures or relative structures but our model depends only on the self relative structures of depth map guided by the self relative structures of color image. The other contribution is utilizing of this relative structure based model for filling missing pixels in the depth map, where the original model muGIF [26] is not applicable for filling hole pixels. To this end, we also propose a confidence map for using the color-based prior only in missing regions, and that is for overcoming the texture transfer problem.
• Reliable self structures-based depth map up-sampling: In this task, although the original model muGIF is used for depth map up-sampling, we propose a modified model similar to the model used for depth map filtering with a different confidence map. The confidence map in this task is suitable for dealing with the problems that face the original model. With this modification, the performance of depth map up-sampling is improved.

2. Proposed color-guided optimization model

2.1 Problem statement and degradation model

We should note that the depth map is not only polluted with intrinsic noise (e.g. Gaussian noise with constant or depth value-related variance) but also corrupted with missing of some regions, especially near the depth edges. In addition, these depth maps captured by recent sensors and depth cameras such as ToF are of low resolution compared with the high resolution (HR) RGB color image for the same scene. For summarizing the depth map degradation, the main types of degradation that contaminate the captured depth maps are intrinsic noise, hole pixels (i.e. missing pixels) either random or structural and under-sampling. The observation model could be mathematically formulated as: $\textbf {T}_{0}= \textbf {P}\textbf {T}+\textbf {n}$, where $\textbf {P}$ is the observation matrix, $\textbf {T}$ is the desired depth map, $\textbf {T}_{0}$ is the degraded depth map, and $\textbf {n}$ is the intrinsic noise. For super-resolution in ToF and Kinect version2 (i.e. ToF-based) cameras, we can denote the observation matrix $\textbf {P}$ as $\textbf {P}_{s}$, which represents a sampling matrix. $\textbf {P}_{s}$ is constructed from an identity matrix by removing those rows associated with pixels that are not exist in the low resolution and should be estimated in the high resolution. On the other hand, for hole pixels filling at Kinect version1 (i.e. structure light-based) camera, we denote $\textbf {P}$ as $\textbf {P}_{h}$ to represent an identity matrix whose rows associated with hole pixels are removed. For understanding the shortcomings that face the depth map restoration, we separate the analysis for filtering of the missing pixels and intrinsic noise from super-resolution task.

Given a degraded depth map with spatial noise (e.g. Gaussian noise with constant variance or the variance at each pixel is proportional to the square of noise-free depth value), the degraded depth map can be restored through many methods. As the optimization-based method muGIF [26] is very related to our work is used as an example in the analysis. muGIF is a recent method that defines a new measurement for mutual response to manage structural similarity between two inputs for image smoothing. In the case of color-guided manner, let us denote the depth map as a target image and the color image as a reference or guided image by $\textbf {T}_{0}$ and $\textbf {R}_{0}$ respectively. $\textbf {R}$ and $\textbf {T}$ indicate the filtering outputs of color and depth images respectively through the optimization iterations. The muGIF is formulated as:

(1)$$\arg \min_{T,R} \quad \alpha_{t}\mathcal{R}(\textbf{T},\textbf{R},\epsilon_{t},\epsilon_{r}) +\beta_{t}||\textbf{T}-\textbf{T}_{0}||^{2}_{F} +\alpha_{r}\mathcal{R}(\textbf{R},\textbf{T},\epsilon_{r},\epsilon_{t}) +\beta_{r}||\textbf{R}-\textbf{R}_{0}||^{2}_{F},$$

where $\alpha _{t}$, $\beta _{t}$, $\alpha _{r}$ and $\beta _{r}$ are regularization parameters. $||\textbf {T}-\textbf {T}_{0}||^{2}_{F}$ and $||\textbf {R}-\textbf {R}_{0}||^{2}_{F}$ are data fidelity terms for guided and guiding images respectively. These fidelity terms enforce $\textbf {T}$ and $\textbf {R}$ not to deviate from the original inputs $\textbf {T}_{0}$ and $\textbf {R}_{0}$. $\mathcal {R}(\textbf {T},\textbf {R},\epsilon _{t},\epsilon _{r})$ and $\mathcal {R}(\textbf {R},\textbf {T},\epsilon _{r},\epsilon _{t})$ are mutual terms that depend on the relative structure of one image with respect to a guiding image. The mutual term for depth map guided by color image is formulated as:

(2)$$\mathcal{R}(\textbf{T},\textbf{R},\epsilon_{t},\epsilon_{r}) \doteq \sum_{i}\frac{(\nabla\textbf{T}_{i})^{2}}{\max(|\nabla\textbf{R}_{i}|,\epsilon_{r})\cdot \max(|\nabla\textbf{T}_{i}|,\epsilon_{t})},$$

where $\epsilon _{t}$ and $\epsilon _{r}$ are parameters to control the stability of the denominators of the mutual terms, and they are also used to improve the stability to the small artifacts. This mutual model actually can perform as three types of filtering, including a dynamic only (D), a static/dynamic (S/D), and a dynamic/dynamic (D/D) filters. The first type is the self-guided filter, where $R=T$ and $\epsilon _{r}=\epsilon _{t}$ (see Eq. (3)). This dynamic only filter depends only on the features of depth map, and it is used for removing noise and texture from the image. As the depth map has not texture, where it consists of only edges and homogeneous regions (i.e. smoothed regions), this type is suitable for removing spatial noise from the depth map with edges preservation.

(3)$$\mathcal{R}(\textbf{T},\epsilon_{t}) \doteq \sum_{i}\frac{(\nabla\textbf{T}_{i})^{2}}{ \max(|\nabla\textbf{T}_{i}|,\epsilon_{t})^{2}}$$

The second and third types of muGIF, including S/D and D/D filters benefit from the co-aligned color image, where they mutually guide the filtering operation by the mutual relationship between the depth and color images. The S/D type only dynamically modulates the depth map at every iteration with a weight function depending on the mutual features of the static color image and the dynamic depth image. In addition, the color image is fixed and not dynamically modified at every iteration. For D/D, both the depth and the color images are dynamically modulated at every iteration with a weight function, which depends on the mutual features of the dynamic color and dynamic depth images. In addition to applicability of these both types for smoothing the depth image from spatial Gaussian noise, they are also applicable for using in depth map up-sampling (see Ref. [27] for details).

2.2 Analysis of different mutually filtering types

2.2.1 Filling hole pixels and filtering

For filling hole pixels and filtering task, the self-guided muGIF is robust for dealing with intrinsic noise and removing texture; however, it can not handle the hole pixels. For the remaining types, the hole pixels are not completely filled, especially the large black regions, although the mutuality between the color and depth edges is activated. Figure 2(b-d) show the performance of muGIF in the case of hole pixels existence in the depth map. As the original muGIF is not built for filling hole pixels, we first modify the data fidelity term of depth map in muGIF model by the degradation model, and we denote this new obtained model as Extended muGIF (EmuGIF) in this paper. After the modification, the optimization model becomes as follows:

(4)$$ \arg \min_{T,R} \quad \alpha_{t}\mathcal{R}(\textbf{T},\textbf{R},\epsilon_{t},\epsilon_{r}) +\beta_{t}||\textbf{PT}-\textbf{T}_{0}||^{2}_{F} +\alpha_{r}\mathcal{R}(\textbf{R},\textbf{T},\epsilon_{r},\epsilon_{t}) +\beta_{r}||\textbf{R}-\textbf{R}_{0}||^{2}_{F} $$

Fig. 2. Comparison on filtering depth map from compound noise (i.e. intrinsic noise and hole pixels) on simulated Kinect depth map (Art). (a) Kinect-like depth map, (b) muGIF(D) (MAE:19.114814), (c) muGIF(S/D) (MAE:18.706415), (d) muGIF(D/D) (MAE:18.690930), (e) Groundtruth depth and color images, (f) EmuGIF(D) and difference map (MAE:1.703211), (g) EmuGIF(S/D) and difference map (MAE:1.349451), (h) EmuGIF(D/D) and difference map (MAE:1.311019).

Download Full Size | PDF

Figure 2(f-h) demonstrates the performance of EmuGIF. From Fig. 2(f), it is clear to see that the self-guided EmuGIF can fill the hole pixels with wrong predictions, especially in the large black pixels. On the other hand, S/D and D/D EmuGIF perform well in filtering and filling the hole pixels; however, these color-guided types of EmuGIF are not sufficient for handling the case where color edges are inconsistent with depth discontinuities. The inconsistency between the compound noise free depth map and color image appears in two cases. The first one is the homogeneous depth regions that correspond to highly textured color regions, while the other one is the depth discontinuities that correspond to weak color edges. Therefore, when the color image is used as a guiding image, some textures transfer from the color image to the homogeneous regions of depth map in the first case, while the restored depth edges will be blurred in the second case. The mutuality concept depends on the mutual structures between the two images (i.e. depth and color images). The mutual filter considers that the structure is consistent, if common edges exist in the two images. On the other hand, the structure is considered inconsistent if the edges appears in only one image but not on the other. As the mutual concept is robust for inconsistency, the texture coping problem that results from transferring the texture to the corresponding homogeneous depth regions is highly mitigated especially if the depth map is not contaminated with heavy intrinsic noise. Unfortunately, this mutuality is not powerful if the color edges that correspond to the depth edges are weak because the mutual filter considers this case is also inconsistent, which in turn results in smoothing or blurring these depth edges.

2.2.2 Super-resolution

For super-resolution task, we initialize HR depth map by using the bicubic interpolation, then all types of muGIF are applied on this interpolated version. Figure 3 shows the results of these types on simulated ToF depth map Art for 8$\times$ up-sampling. From Fig. 3, we can evidently observe that muGIF algorithm is not robust for noisy depth map up-sampling, where some noise still remains in the recovered HR depth map. Actually, there is a conflict between removing the intrinsic noise and keeping the sharpness of depth edges. To remove the noise and smooth the homogeneous regions, some depth edges will be blurred in the case of using self guided type. In addition to the blurred discontinuities, some fake edges transfer to the homogeneous depth regions in the case of the other types.

2.3 Proposed method for compound noise filtering

In addition to what we aforementioned, if the depth map has missing pixels, filling these pixels becomes more challenging for the mutual filter even with the degradation model. To handle the missing pixels and overcome the blurry effect at the depth discontinuities corresponding to the weak color edges, we propose a new optimization model, which depends also on the relative structure concept. This new optimization model can be expressed as:

(5)$$\hat{\mathbf{T}}=\arg \min_{T} \quad ||\mathbf{P_{h}T}-\mathbf{T}_{0}||^{2}_{F} +\alpha_{r}\mathcal{R}_{masked}(\mathbf{R},\epsilon_{r}) +\alpha_{t}\mathcal{R}_{masked}(\mathbf{T},\epsilon_{t}),$$

where $\hat {\textbf {T}}$ is the estimated depth map and the observation matrix $\textbf {P}_{h}$ is the mask that identifies the missing pixels, which equals 1 for valid pixels and 0 for missing pixels. $\mathcal {R}_{masked}(\textbf {T},\epsilon _{t})$ is the relative structure of depth map to self-guided depth map mutual response masked by a confidence map $C$. This relative structure related to depth map gradients is expressed as:

(6)$$\mathcal{R}_{masked}(\textbf{T},\epsilon_{t}) \doteq \sum_{i}\frac{\textbf{C}\odot(\nabla\textbf{T}_{i})^{2}}{ \max(|\nabla\textbf{T}_{i}|,\epsilon_{t})^{2}}.$$

Fig. 3. Performance of different muGIF types for 8$\times$ up-sampling the simulated ToF depth map (Art). (a) Groundtruth depth and color images, (b) muGIF(D) and difference map (MAE:3.1798), (c) muGIF(S/D) and difference map (MAE:2.9814), (d) muGIF(D/D) and difference map (MAE:2.8645).

Download Full Size | PDF

On the other hand, $\mathcal {R}_{masked}(\textbf {R},\epsilon _{r})$ is the relative structure of depth map to self-guided color image mutual response masked by the inverted confidence map $\textbf {C}_{inv}$. This relative structure related to color image gradients is expressed as:

(7)$$\mathcal{R}_{masked}(\textbf{R},\epsilon_{r}) \doteq \sum_{i}\frac{\textbf{C}_{inv}\odot(\nabla\textbf{T}_{i})^{2}}{ \max(|\nabla\textbf{R}_{i}|,\epsilon_{r})^{2}}.$$

This proposed model is processed iteratively to converge with the best performance. The confidence map $C$ is the eroded mask of the intermediate output $T$, and it is expressed as:

(8)$$\textbf{C}=f(u(\textbf{T})),$$

where $f(.)$ is the erode function, which is within the morphology operations and $u(.)$ is the unit step function for extracting the mask from the image, and it is described as:

(9)$$u(a)= \begin{cases} 1, \quad a>0 \\ 0, \quad otherwise. \end{cases}$$

In Eq. (7), $\textbf {C}_{inv}$ is the inverted confidence map and the operator $\odot$ stands for element wise matrix multiplication. Therefore, these confidence and inverted confidence maps are updated at every iteration. In the beginning of the algorithm, $T$ is initialized with $T_{0}$ value. To overcome the texture transfer and blurring of the depth discontinuities, this confidence map guides the filtering optimization model by color-based relative structure in only missing depth regions. Similar to muGIF [27] decomposition, the objective function in Eq. (5) is decomposed as follows

(10)$$\arg \min_{t} \quad ||\mathbf{P_{h}t}-\mathbf{t}_{0}||^{2}_{2} +\alpha_{r}\mathbf{t}^{T}(\sum_{d} \mathbf{D}_{d}^{T}\mathbf{C}\mathbf{U}^{2}\mathbf{D}_{d})\mathbf{t} +\alpha_{t}\textbf{t}^{T}(\sum_{d} \mathbf{D}_{d}^{T}\mathbf{C}_{inv}\mathbf{V}^{2}\mathbf{D}_{d})\mathbf{t}$$

We further set $\alpha _{r}=\alpha _{t}=\alpha$, and combine the last two terms in the decomposition equation for simplicity as:

(11)$$\arg \min_{t} ||\mathbf{P_{h}t}-\mathbf{t}_{0}||^{2}_{2} +\alpha\mathbf{t}^{T}(\sum_{d} \mathbf{D}_{d}^{T}[\mathbf{C}\mathbf{U}^{2}+\mathbf{C}_{inv}\mathbf{V}^{2}]\mathbf{D}_{d})\mathbf{t},$$

where $t$ and $\mathbf {t}_{0}$ are the filtered and raw depth maps in vectorized form respectively. $\mathbf {U}^{2}$ and $\mathbf {V}^{2}$ are the diagonal matrices with the i-th diagonal entries being the denominators of Eq. (6) and Eq. (7) respectively. $\textbf {D}_{d}$ is the discrete gradient operator in horizontal, vertical and diagonal directions, which points out the 8 surrounding nearby pixels. As the final decomposition equation has quadratic terms, which are convex, the closed form solution can be described as:

(12)$$(\textbf{P}_{h} +\alpha(\sum_{d} \mathbf{D}_{d}^{T}[\mathbf{C}\mathbf{U}^{2}+\mathbf{C}_{inv}\mathbf{V}^{2}]\mathbf{D}_{d}))\mathbf{t}=\mathbf{t}_{0}.$$

2.4 Proposed method for super-resolution

In this subsection, we describe how the depth map degradation process related to low resolution is modeled. The optimization model used for super-resolution is similar to the filtering model but with some modifications in the initialization of depth map and the confidence map. The proposed model for super-resolution is formulated as follows:

(13)$$\arg \min_{T} ||\mathbf{P_{s}T}-\mathbf{T}_{0}||^{2}_{F} +\alpha_{r}\mathcal{R}_{upsample}(\mathbf{R},\epsilon_{r}) +\alpha_{t}\mathcal{R}_{upsample}(\mathbf{T},\epsilon_{t}),$$

where $P_{s}$ represents the down-sampling operator and $T$ is HR depth map wanted to be restored. Although, the relative structure regularization terms for super resolution are semi-similar to those for compound noise filtering, they differ in how the confidence map for super-resolution is calculated. These regularization terms are expressed as:

(14)$$\mathcal{R}_{upsample}(\mathbf{R},\epsilon_{t}) \doteq \sum_{i}\frac{\mathbf{E}\odot(\nabla\mathbf{T}_{i})^{2}}{ \max(|\nabla\mathbf{T}_{i}|,\epsilon_{t})^{2}}$$

(15)$$\mathcal{R}_{upsample}(\mathbf{R},\epsilon_{r}) \doteq \sum_{i}\frac{\mathbf{E}_{inv}\odot(\nabla\mathbf{T}_{i})^{2}}{ \max(|\nabla\mathbf{R}_{i}|,\epsilon_{r})^{2}}$$

In these regularization equations, $E$ is the edge confidence map. The problem of super-resolution is widely considered as an interpolation problem but unfortunately most of these interpolation methods recover HR depth map with blurred edges and blocking effects. Since the super-resolution task concerns with how to retain edge sharpness during the operation, we calculate the confidence map $E$ depending on the depth edges. Most of depth recovering methods used for up-sampling depth maps, especially the optimization-based ones, initialize HR depth map by an interpolated version. However, it is observed that the depth discontinuities resulted from these interpolation methods (e.g. bicubic interpolation) are quite blurred, especially if the low resolution of depth map is noisy. In addition to that, the interpolated map is also incorrect for large up-sampling factor. For this consideration, the edge confidence map $E$ is designated, where it is smaller near the edges (means unreliable) and larger in the homogeneous regions. In this confidence map, the color-based relative structure $\mathcal {R}_{upsample}(\textbf {R},\epsilon _{r})$ is used only near and at depth discontinuities but not used in the homogeneous regions. This excluding process of using color-based prior in the smooth regions is similar as in filtering process to overcome the texture coping problem.

To calculate $E$, we first distinguish between the smooth depth regions and regions around depth discontinuities. In the beginning, we denoise the interpolated map by $L_{0}$ smoothing filter [28] to facilitate the distinguishing process. Then, similar to [20], we also use the local depth relative smoothness and denote it as $\rho$ to detect the depth edges as shown in the following decomposition model equation:

(16)$$\arg \min_{t} ||\mathbf{P_{s}t}-\mathbf{t}_{0}||^{2}_{2} +\alpha\mathbf{t}^{T}(\sum_{d} \mathbf{D}_{d}^{T}[\mathbf{E}\mathbf{U}^{2}+\mathbf{E}_{inv}\mathbf{V}^{2}]\rho\mathbf{D}_{d})\mathbf{t}.$$

After that, to identify the depth discontinuities, the best measure is the weights of depth map gradients. Let us denote $W$ as the multiplicative inverse matrix of $\max (|\nabla \textbf {T}|,\epsilon _{t})$, where $|\nabla \textbf {T}|$ is the absolute depth gradient matrix and $\epsilon _{t}$ is the threshold value for distinguishing operation. For a certain pixel $i$, if $|\nabla \textbf {T}_{i}|<\epsilon _{t}$, the weight value for this pixel becomes the maximum value, and that means that the pixel $i$ is within the homogeneous regions. Otherwise, all the other pixels that do not satisfy this condition are considered as discontinuities regions. As the depth gradients are calculated in the 8 neighbors of every pixel, these gradients matrices will be in horizontal, vertical and diagonal directions. Let us denote $\textbf {M}_{g}$ as the sum of the gradients in every direction, and it is described as:

(17)$$\textbf{M}_{g}=\sum_{d}\textbf{W}_{d}= \sum_{d}(\max(|\nabla_{d}\textbf{T}|,\epsilon_{t}))^{{-}1}$$

The maximum value of this $l_{1}$ norm depth gradients $\textbf {M}_{g}$ is obtained if and only if the gradient in every direction has a maximum value. Therefore, the edge confidence map $E$ is specified as Eq. (18). This edge confidence map $E$ and local relative smoothness $\rho$ are actually updated in every iteration because the intermediate depth map $T$ is updated.

(18)$$\textbf{E}=(\textbf{M}_{g}<\max(\textbf{M}_{g}))$$

As Eqs. (12) and (16) are symmetric positive definite Laplacian matrix, many techniques [29] such as preconditioned conjugate gradients method can be used for solving the cost functions. The minimization technique used in our proposed model and muGIF is the same, and modifying of the cost functions and adding of confidence maps do not affect the minimization technique.

3. Experiments and discussions

In this section, the performance of our model is verified via various experiments using different types of datasets. These experiments include two degradation types: super-resolution with filtering intrinsic noise (i.e. ToF-like experiments) and filling the missing black pixels with filtering intrinsic noise (i.e. Kinect-like experiments). For datasets, our proposed method is tested on Middlebury datasets [30], where these datasets are modified to simulate ToF and Kinect-like degradation models. Moreover, our method is tested on real datasets. We also qualitatively and quantitatively compared our filtering method with the state-of-the-art methods: a low rank based method (LRM) [15], a mutual structure for joint filtering (MSJF) [18], the color guided AR model [16], RCG [20], muGIF [27], adaptive color guided non local means method (ACGMNLM) [31], and ACGMNLM with shock filter (ACGMNLM+SF) [31]. For super-resolution comparison, fast guided global interpolation method (FGI) [32] and learning dynamic guidance method (DG) [33] are used in addition to the mentioned methods. The parameters of our method are set as follows: $\alpha$ in Eq. (12) is set as 0.0002 for filtering task but 0.0005 divided by up-sampling rate for super-resolution task. $\epsilon _{t}$ and $\epsilon _{r}$ are set as 0.005 for the two tasks.

3.1 Experiments on simulated and real Kinect depth maps

In this part of experiments, our method is tested on both simulated and real Kinect datasets. In regards to the simulated datasets, we also reuse the two simulated Kinect datasets used in [20], where the first dataset (D1) and the second dataset (D2) are prepared by [15] and [20] respectively. After that, we apply our method and the compared filters that are applicable for restoring the Kinect-like depth maps on these corrupted depth maps. Table 1 obtains the comparison between our proposed and other filtering methods performed on the simulated Kinect datasets in terms of MAE. From Table 1, it is clearly observed that our proposed method ranks the first, which has the smallest MAE for most depth maps in the two datasets. In addition, our average score outperforms the average scores of other filtering methods.

Table 1. Quantitative comparison between different algorithms on simulated Kinect datasets in terms of MAE

View Table | View all tables in this article

For visual comparison, Figs. 4 and 5 illustrate the comparison between different algorithms performed on two simulated Kinect depth maps from D1 (Art and Teddy). In Figs. 4 and 5, two specific regions from each depth map are picked and enlarged for further clarification; one region is chosen in the homogeneous regions and the other region is for clarification the problem of blurred and distorted depth discontinuities. From Figs. 4(c) and 5(c), it is clearly observed that LRM blurs the depth discontinuities because it is based on patch-based low rank optimization method. Furthermore, some textures are also transferred in the homogeneous regions. For AR results, some intrinsic noise still occupies the homogeneous regions in depth maps as shown in Figs. 4(d) and 5(d). For RCG results, the depth edges are over-sharpen. In addition, fake edges are transferred from the color images as shown in Figs. 4(e) and 5(e). Although the results of EmuGIF methods have little texture in their smoothed regions, they distort the depth edges as shown in Fig. 4(f-g). ACGMNLM with and without SF are robust against texture coping artifacts; however, the resulted depth edges are not sharp enough compared with our proposed optimization-based filtering method. Among all of these aforementioned filtering methods, our proposed method performs the best results in overcoming the color texture transfer, preserving sharper edges, and handling the intrinsic noise. In regards to the real Kinect dataset, we also evaluate our proposed method on real Kinect dataset. Some of NYU dataset [34] is used in our verification. Figure 6 illustrates our method performance in against of other filtering methods performed on two depth maps from the mentioned dataset. From Fig. 6(c), it is obviously observed that AR method [16] suffers from blurred depth edges as appears in the marked regions. RCG method [20] always over-sharpens the depth discontinuities and transfers more texture to the corresponding homogeneous depth regions as shown in Fig. 6(d). The results of EmuGIF are comparable with our results; however, our results are still the best for preserving the depth discontinuities and overcoming the texture coping problem.

Fig. 4. Experiments on simulated Kinect depth map (Art). (a) Color image, (b) Kinect-like depth map, (c) LRM, (d) AR, (e) RCG, (f) EmuGIF(S/D), (g) EmuGIF(D/D), (h) ACGMNLM, (i) ACGMNLM+SF, (j) Proposed method. (k) The groundtruth depth map.

Download Full Size | PDF

Fig. 5. Experiments on simulated Kinect depth map (Teddy). (a) Color image, (b) Kinect-like depth map, (c) LRM, (d) AR, (e) RCG, (f) EmuGIF(S/D), (g) EmuGIF(D/D), (h) ACGMNLM, (i) ACGMNLM+SF, (j) Proposed method. (k) The groundtruth depth map.

Download Full Size | PDF

Fig. 6. Experiments on two real Kinect depth maps from the NYU Kinect dataset. (a) Color images, (b) Real Kinect depth maps, (c) AR, (d) RCG, (e) EmuGIF(S/D), (f) EmuGIF(D/D), (g) Proposed Method.

Download Full Size | PDF

In addition to the objective and subjective evaluation of simulated and real Kinect depth maps, we also construct 3D point clouds, where the flying pixels and texture coping artifacts appear in the 3D space with better visualization than at the two dimensions. Figure 7 shows the point clouds obtained based on the resulted depth maps of different compound noise filtering methods. From Fig. 7, we can see that the point cloud obtained from the depth map resulted from AR is very distorted either in the homogeneous regions or at the depth boundaries. For RCG method, the obtained point cloud boundaries are also distorted especially at the locations corresponding to color image regions that have rich textures; however, almost depth edges are very sharp. This problem appears because of the over-sharpness of RCG method. From Fig. 7(c), it is observed that the point cloud obtained from EmuGIF(S/D) method has many defects due to texture transferring and flying pixels near the edges. Although the point cloud of EmuGIF(D/D) is better than that of EmuGIF(S/D) for reduction of flying pixels, it still has some distortions in the geometry corresponding to the homogeneous regions due to texture coping. This texture coping problem is tackled in ACGMNLM and ACGMNLM+SF because these methods are robust against texture transferring, where the color image is not used in non-hole regions. However, the flying pixels still appear at the depth discontinuities. On the other hand, our proposed optimization model provides a high improvement on tackling the flying pixels, where the depth discontinuities are sharp as shown in Fig. 7(g). In addition, our method is robust against texture coping problem.

Fig. 7. Perceptual quality evaluation on point cloud of Art, (a) AR, (b) RCG, (c) EmuGIF(S/D), (d) EmuGIF(D/D), (e) ACGMNLM (ours), (f) ACGMNLM+SF (ours), (g) Proposed optimization model (ours), (h) Ground truth.

Download Full Size | PDF

3.2 Experiments on simulated ToF depth maps

In this part of experiments, our up-sampling method is tested on both simulated ToF dataset. The simulated dataset is provided by Yang et al [16], where they took some depth maps from Middlebury datasets (six depth maps) and made these depth maps noisy and under-sampled with the following factor: 2, 4, 8 and 16 to mimic the real ToF depth maps. In this experiment, our method is compared with some of edge aware up-sampling optimization model. Table 2 shows the comparison between our proposed method and the other up-sampling methods performed on the simulated ToF dataset in terms of MAE for all up-sampling rates. From Table 2, we can see that our up-sampling results are much better than the results of all types of muGIF model because of the effect of edge confidence map. In addition, it is clearly seen that our proposed method has lower errors compared the learning-based method DG and the optimization-based method FGI in most of simulated depth maps. Although our objective results are slightly worse than RCG up-sampling results as shown in Table 2, our method still outperform RCG methods over some of the simulated depth maps (e.g. Art and Book). Our proposed method ranks the first or the second among the other up-sampling methods.

Table 2. Quantitative comparison between different algorithms on simulated ToF datasets in terms of MAE.

View Table | View all tables in this article

For subjective evaluation, our proposed method is also compared with the up-sampling methods, where the subjective evaluation is sometimes better than the objective evaluation especially for visualization of the depth discontinuities. Figure 8 presents the subjective evaluation of 8$\times$ up-sampled depth map Art by all methods. In addition, the difference maps for all methods are also shown. The depth map up-sampled by the dynamic/dynamic type of muGIF and DG methods still contain observable intrinsic noise. In addition, their depth discontinuities are also blurred as shown in Fig. 8(a-b). For FGI, the depth map obtained by this method has very little noise; however, the depth edges still are blurred and distorted. For RCG, the transferred textures to the homogeneous regions of depth map are still noticeable especially at the regions corresponding to the high contrasted textures in the registered color image. It is also noticed that our proposed method outperforms RCG method in overcoming the texture coping problem and preserving the depth discontinuities as shown in the corresponding difference maps of Fig. 8(d-e).

Fig. 8. Comparison on 8$\times$ up-sampling depth map on simulated ToF dataset (Art) and the difference map, (a) muGIF(D/D) (MAE:2.86), (b) DG (MAE:2.93), (c) FGI (MAE:2.41), (d) RCG (MAE:1.71), (e) Proposed method (MAE:1.58), (f) Groundtruth depth and color images.

Download Full Size | PDF

3.3 Experiments on real ToF depth maps

In addition to testing of our proposed super-resolution model on simulated ToF dataset, it is further tested on real ToF dataset. The real ToF dataset used for verification is provided by [35], which includes three depth maps namely Shark, Devil and Books. The depth maps included in this dataset are at low spatial resolution (i.e. 120 $\times$ 160), where their values in millimeter (mm), while the spatial resolution of the registered intensity images are 610 $\times$ 810. In regards to the real ToF dataset, Table 3 illustrates the quantitative performance of our method compared with other up-sampling methods on real ToF dataset. From Table 3, it is seen that our method ranks first for most of the depth maps of the real dataset (Books and Shark).

Table 3. MAE results of real ToF datasets measured in mm

View Table | View all tables in this article

For subjective evaluation, our method is validated by the depth map results and the point cloud obtained by our method. The visual comparison of one real ToF depth map Books is presented in Fig. 9. AR and all muGIF types still have some noise and blurry edges. The learning-based method DG has little noise and distortion in the homogeneous regions and at the depth discontinuities respectively; however, the depth edges are also blurred as AR and muGIF methods. The depth edges resulted from RCG method are distorted and quite jaggy as shown in Fig. 9(d), although RCG over-sharpens the depth edges. From Fig. 9, one can undoubtedly realize that our up-sampling method is the best in recovering and up-sampling real ToF depth map compared with other methods especially RCG method.

Fig. 9. Experimental results of real ToF data (Books), (a) AR, (b) muGIF(S/D), (c) muGIF(D/D), (d) DG, (e) RCG, (f) Proposed method, (g) Groundtruth depth map, (h) color image.

Download Full Size | PDF

The other validation is the point cloud, where the benefit of warping the depth maps into point clouds is that the problem of blurred depth edges and flying pixels are clearly appeared in the point cloud. Figure 10 presents the point clouds obtained from various up-sampling methods including ours. This figure confirms on the observations drawn from Fig. 9. From Fig. 10, it is obvious that our method preserves the boundaries of depth map with few flying pixels compared with the other approaches. Although RCG method has robust performance, where it has sharp edges and very few flying pixels compared with other methods, and it is comparable with our method, there is noticeable distortions at the boundaries of point cloud resulted from RCG.

Fig. 10. Experimental results of point cloud obtained from real ToF data (Books), (a) AR, (b) muGIF(S/D), (c) muGIF(D/D), (d) DG, (e) RCG, (f) Proposed method.

Download Full Size | PDF

3.4 Visualization of confidence maps

In this subsection, we discuss the changing of the confidence maps through optimization iterations. Figure 11 shows the changing of confidence maps through the iterations. As shown in Fig. 11, the first column of the first row represents the initialization of confidence map $C$ which is the eroded mask of Kinect-like depth map. After the first iteration, the hole pixels are filled and the confidence map becomes a white matrix, where the proposed method is equivalent to the self-guided muGIF, which removes any texture copied from the color image. For the edge confidence map $E$, the black regions in the map is related to the homogeneous regions of depth map, while the white pixels are related to the expected edges.

4. Conclusion

In this paper, a new optimization model depending on the relative structures of both depth and color images is proposed for depth filtering and up-sampling tasks. In addition, a confidence map suitable for every task is proposed for distinguishing between the depth discontinuities and smooth regions, where the color-based and depth-based priors are used in them respectively. Our proposed model is superior for overcoming texture coping problem in both filling hole pixels and super-resolution tasks. Moreover, the depth discontinuities of our results are sharp and moderate between blur and over-sharpen as shown in experiments on both simulated and real Kinect and ToF data.

Fig. 11. Visualization of confidence maps through 5 iterations. The first row shows $C$, where all the hole pixels are filled after the first iteration, thus the confidence map $C$ becomes a white matrix. The second row shows the edge confidence map $E$ changing.

Download Full Size | PDF

Funding

National Natural Science Foundation of China (61971203); China Southern Power Grid (YNKJXM20180015).

Disclosures

All authors are not employed by government or government related entities, and commercial entities. The research is not conducted under any commercial relationships to any kind of commercial entities.

This manuscript or any part of the manuscript has not been published or under consideration by other journals.

All authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G. Badiali, L. Cercenelli, S. Battaglia, E. Marcelli, C. Marchetti, V. Ferrari, and F. Cutolo, “Review on augmented reality in oral and cranio-maxillofacial surgery: Toward surgery-specific head-up displays,” IEEE Access 8, 59015–59028 (2020). [CrossRef]

2. M. H. Lee, J. Kim, K. Lee, C. Choi, and J. Y. Hwang, “Wide-field 3d ultrasound imaging platform with a semi-automatic 3d segmentation algorithm for quantitative analysis of rotator cuff tears,” IEEE Access 8, 65472–65487 (2020). [CrossRef]

3. Z. Dai, R. Yang, F. Hang, J. Zhuang, Q. Lin, Z. Wang, and Y. Lao, “Neurosurgical craniotomy localization using interactive 3d lesion mapping for image-guided neurosurgery,” IEEE Access 7, 10606–10616 (2019). [CrossRef]

4. B. J. Boom, S. Orts-Escolano, X. X. Ning, S. McDonagh, P. Sandilands, and R. B. Fisher, “Interactive light source position estimation for augmented reality with an rgb-d camera,” Comp. Anim. Virtual Worlds 28(1), e1686 (2017). [CrossRef]

5. Y. Lu, Z. Xue, G.-S. Xia, and L. Zhang, “A survey on vision-based uav navigation,” Geo-spatial information science 21(1), 21–32 (2018). [CrossRef]

6. J. Palacios, C. Sagüés, E. Montijano, and S. Llorente, “Human-computer interaction based on hand gestures using rgb-d sensors,” Sensors 13(9), 11842–11860 (2013). [CrossRef]

7. Y. Wang, Y. Yang, and Q. Liu, “Feature-aware trilateral filter with energy minimization for 3d mesh denoising,” IEEE Access 8, 52232–52244 (2020). [CrossRef]

8. Y. Yang, Q. Liu, X. He, and Z. Liu, “Cross-view multi-lateral filter for compressed multi-view depth video,” IEEE Trans. on Image Process. 28(1), 302–315 (2019). [CrossRef]

9. M. M. Ibrahim, Q. Liu, R. Khan, J. Yang, E. Adeli, and Y. Yang, “Depth map artifacts reduction: A review,” IET Image Processing 14(12), 2630–2644 (2020). [CrossRef]

10. C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in IEEE Int. Conf. Comput. Vis. (ICCV), (IEEE, 1998), pp. 839–846.

11. A. Buades, B. Coll, and J.-M. Morel, “Image denoising methods. a new nonlocal principle,” SIAM Rev. 52(1), 113–147 (2010). [CrossRef]

12. E. S. Gastal and M. M. Oliveira, “Adaptive manifolds for real-time high-dimensional filtering,” ACM Trans. Graph. 31(4), 1–13 (2012). [CrossRef]

13. J. Kopf, M. F. Cohen, D. Lischinski, and M. Uyttendaele, “Joint bilateral upsampling,” ACM Trans. Graph. 26(3), 96–100 (2007). [CrossRef]

14. J. Diebel and S. Thrun, “An application of markov random fields to range sensing,” in Conf. Neural Information Processing Systems (NIPS), (2005), pp. 291–298.

15. S. Lu, X. Ren, and F. Liu, “Depth enhancement via low-rank matrix completion,” in IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), (IEEE, 2014), pp. 3390–3397.

16. J. Yang, X. Ye, K. Li, C. Hou, and Y. Wang, “Color-guided depth recovery from rgb-d data using an adaptive autoregressive model,” IEEE Trans. Image Process. 23(8), 3443–3458 (2014). [CrossRef]

17. W. Dong, G. Shi, X. Li, K. Peng, J. Wu, and Z. Guo, “Color-guided depth recovery via joint local structural and nonlocal low-rank regularization,” IEEE Trans. Multimedia 19(2), 293–301 (2017). [CrossRef]

18. X. Shen, C. Zhou, L. Xu, and J. Jia, “Mutual-structure for joint filtering,” in IEEE Int. Conf. Comput. Vis. (ICCV), (IEEE, 2015), pp. 3406–3414.

19. D. Min, S. Choi, J. Lu, B. Ham, K. Sohn, and M. N. Do, “Fast global image smoothing based on weighted least squares,” IEEE Trans. Image Process. 23(12), 5638–5653 (2014). [CrossRef]

20. W. Liu, X. Chen, J. Yang, and Q. Wu, “Robust color guided depth map restoration,” IEEE Trans. Image Process. 26(1), 315–327 (2017). [CrossRef]

21. X. Zhang and R. Wu, “Fast depth image denoising and enhancement using a deep convolutional network,” in Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), (IEEE, 2016), pp. 2499–2503.

22. J. Zhu, J. Zhang, Y. Cao, and Z. Wang, “Image guided depth enhancement via deep fusion and local linear regularizaron,” in Int. Conf. Image Process. (ICIP), (IEEE, 2017), pp. 4068–4072.

23. X. He, Q. Liu, and Y. Yang, “MV-GNN: Multi-view graph neural network for compression artifacts reduction,” IEEE Transaction on Image Processing 29, 6829–6840 (2020). [CrossRef]

24. Z. Wang, J. Hu, S. Wang, and T. Lu, “Trilateral constrained sparse representation for kinect depth hole filling,” Pattern Recognit. Lett. 65, 95–102 (2015). [CrossRef]

25. X. Wang, P. Zhang, Y. Zhang, L. Ma, S. Kwong, and J. Jiang, “Deep intensity guidance based compression artifacts reduction for depth map,” J. Vis. Commun. Image Represent. 57, 234–242 (2018). [CrossRef]

26. X. Guo, Y. Li, and J. Ma, “Mutually guided image filtering,” in 2017 ACM Multimedia Conf., (ACM, 2017), pp. 1283–1290.

27. X. Guo, Y. Li, J. Ma, and H. Ling, “Mutually guided image filtering,” IEEE Trans. Pattern Anal. Mach. Intell. 42(3), 694–707 (2020). [CrossRef]

28. L. Xu, C. Lu, Y. Xu, and J. Jia, “Image smoothing via l₀ gradient minimization,” ACM Trans. Graph. 30(6), 1–12 (2011). [CrossRef]

29. D. Krishnan and R. Szeliski, “Multigrid and multilevel preconditioners for computational photography,” ACM Trans. Graph. 30(6), 1–10 (2011). [CrossRef]

30. H. Hirschmuller and D. Scharstein, “Evaluation of cost functions for stereo matching,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), (IEEE, 2007), pp. 1–8.

31. M. M. Ibrahim, Q. Liu, and Y. Yang, “An adaptive colour-guided non-local means algorithm for compound noise reduction of depth maps,” IET Image Processing 14(12), 2768–2779 (2020). [CrossRef]

32. Y. Li, D. Min, M. N. Do, and J. Lu, “Fast guided global interpolation for depth and motion,” in European Conference on Computer Vision, (Springer, 2016), pp. 717–733.

33. S. Gu, W. Zuo, S. Guo, Y. Chen, C. Chen, and L. Zhang, “Learning dynamic guidance for depth image enhancement,” in IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), (IEEE, 2017), pp. 712–721.

34. N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, “Indoor segmentation and support inference from rgbd images,” in European Conf. on Comput. Vis. (ECCV), (Springer, 2012), pp. 746–760.

35. D. Ferstl, C. Reinbacher, R. Ranftl, M. Rüther, and H. Bischof, “Image guided depth upsampling using anisotropic total generalized variation,” in IEEE Int. Conf. Comput. Vis. (ICCV), (IEEE, 2013), pp. 993–1000.

Dataset	LRM	AR	MSJF	RCG	EmuGIF (S/D)	EmuGIF (D/D)	ACGMNLM	ACGMNLM +SF	Ours
Art (D1)	1.854	3.066	2.538	2.505	1.309	1.264	1.403	1.492	1.023
Books (D1)	0.953	2.103	1.003	1.571	0.816	0.734	0.862	0.780	0.699
Moebius (D1)	1.038	2.079	1.142	1.381	0.863	0.785	0.869	0.818	0.674
Dolls (D1)	1.062	2.058	1.123	1.821	0.925	0.851	1.061	1.043	0.9
Laundry (D1)	1.151	2.203	1.310	1.404	0.857	0.756	1.183	1.169	0.87
Reindeer (D1)	1.178	2.291	1.32	2.082	0.900	0.824	1.019	1.003	0.788
Teddy (D1)	1.004	2.132	1.126	1.557	0.944	0.863	0.858	0.807	0.758
Cones (D1)	1.134	2.199	1.272	1.534	1.064	0.819	0.941	0.897	0.795
Art (D2)	-	1.354	-	1.674	1.652	1.816	2.095	1.475	1.217
Books (D2)	-	1.081	-	2.014	1.269	1.194	1.842	1.103	1.02
Moebius (D2)	-	1.095	-	1.978	1.291	1.294	1.475	1.088	1.079
Dolls (D2)	-	1.471	-	2.253	1.484	1.361	2.132	1.360	1.402
Laundry (D2)	-	1.148	-	1.535	1.489	1.441	1.895	1.439	1.437
Reindeer (D2)	-	5.554	-	2.093	1.394	1.378	1.796	1.186	1.099
Average	1.172	2.131	1.354	1.814	1.161	1.099	1.388	1.118	0.982

Dataset	Scale	RCG	FGI	DG	muGIF(S/D)	muGIF(D/D)	Ours
Art	2 $\times$	0.69	0.81	0.80	1.10	1.00	0.71
	4 $\times$	1.03	1.29	1.39	1.75	1.66	1.08
	8 $\times$	1.71	2.41	2.93	2.98	2.86	1.58
	16 $\times$	3.37	4.44	5.05	4.66	4.74	3.12
Book	2 $\times$	0.55	0.57	0.52	0.76	0.70	0.50
	4 $\times$	0.75	0.76	0.89	1.26	1.13	0.73
	8 $\times$	1.11	1.14	1.82	2.06	1.88	1.14
	16 $\times$	1.73	1.90	2.95	2.56	3.00	1.84
Moebius	2 $\times$	0.52	0.59	0.53	0.81	0.78	0.51
	4 $\times$	0.74	0.74	0.90	1.30	1.21	0.73
	8 $\times$	1.12	1.12	1.85	2.09	1.95	1.14
	16 $\times$	1.69	1.81	2.95	2.66	3.03	2.02
Dolls	2 $\times$	0.63	0.67	0.66	0.83	0.77	0.63
	4 $\times$	0.87	0.90	0.97	1.30	1.18	0.88
	8 $\times$	1.20	1.36	1.86	2.05	1.95	1.33
	16 $\times$	1.74	2.10	2.90	2.52	2.90	2.24
Laundry	2 $\times$	0.54	0.61	0.81	0.82	0.74	0.72
	4 $\times$	0.77	0.91	1.08	1.37	1.23	0.97
	8 $\times$	1.13	1.50	2.15	2.26	2.11	1.53
	16 $\times$	2.13	2.60	3.51	3.12	3.46	2.42
Reindeer	2 $\times$	0.54	0.64	0.53	0.86	0.78	0.53
	4 $\times$	0.75	0.83	0.99	1.41	1.27	0.81
	8 $\times$	1.10	1.44	2.10	2.33	2.11	1.16
	16 $\times$	2.09	2.64	3.52	3.13	3.47	2.02

Dataset	AR	RCG	DG	muGIF(S/D)	muGIF(D/D)	Ours
Books	14.37	13.37	14.58	14.20	14.27	13.18
Devil	15.41	14.62	16.99	15.38	15.36	16.36
Shark	16.27	15.62	16.23	16.10	16.16	15.53

Dataset	LRM	AR	MSJF	RCG	EmuGIF (S/D)	EmuGIF (D/D)	ACGMNLM	ACGMNLM +SF	Ours
Art (D1)	1.854	3.066	2.538	2.505	1.309	1.264	1.403	1.492	1.023
Books (D1)	0.953	2.103	1.003	1.571	0.816	0.734	0.862	0.780	0.699
Moebius (D1)	1.038	2.079	1.142	1.381	0.863	0.785	0.869	0.818	0.674
Dolls (D1)	1.062	2.058	1.123	1.821	0.925	0.851	1.061	1.043	0.9
Laundry (D1)	1.151	2.203	1.310	1.404	0.857	0.756	1.183	1.169	0.87
Reindeer (D1)	1.178	2.291	1.32	2.082	0.900	0.824	1.019	1.003	0.788
Teddy (D1)	1.004	2.132	1.126	1.557	0.944	0.863	0.858	0.807	0.758
Cones (D1)	1.134	2.199	1.272	1.534	1.064	0.819	0.941	0.897	0.795
Art (D2)	-	1.354	-	1.674	1.652	1.816	2.095	1.475	1.217
Books (D2)	-	1.081	-	2.014	1.269	1.194	1.842	1.103	1.02
Moebius (D2)	-	1.095	-	1.978	1.291	1.294	1.475	1.088	1.079
Dolls (D2)	-	1.471	-	2.253	1.484	1.361	2.132	1.360	1.402
Laundry (D2)	-	1.148	-	1.535	1.489	1.441	1.895	1.439	1.437
Reindeer (D2)	-	5.554	-	2.093	1.394	1.378	1.796	1.186	1.099
Average	1.172	2.131	1.354	1.814	1.161	1.099	1.388	1.118	0.982

Dataset	Scale	RCG	FGI	DG	muGIF(S/D)	muGIF(D/D)	Ours
Art	2 $\times$	0.69	0.81	0.80	1.10	1.00	0.71
	4 $\times$	1.03	1.29	1.39	1.75	1.66	1.08
	8 $\times$	1.71	2.41	2.93	2.98	2.86	1.58
	16 $\times$	3.37	4.44	5.05	4.66	4.74	3.12
Book	2 $\times$	0.55	0.57	0.52	0.76	0.70	0.50
	4 $\times$	0.75	0.76	0.89	1.26	1.13	0.73
	8 $\times$	1.11	1.14	1.82	2.06	1.88	1.14
	16 $\times$	1.73	1.90	2.95	2.56	3.00	1.84
Moebius	2 $\times$	0.52	0.59	0.53	0.81	0.78	0.51
	4 $\times$	0.74	0.74	0.90	1.30	1.21	0.73
	8 $\times$	1.12	1.12	1.85	2.09	1.95	1.14
	16 $\times$	1.69	1.81	2.95	2.66	3.03	2.02
Dolls	2 $\times$	0.63	0.67	0.66	0.83	0.77	0.63
	4 $\times$	0.87	0.90	0.97	1.30	1.18	0.88
	8 $\times$	1.20	1.36	1.86	2.05	1.95	1.33
	16 $\times$	1.74	2.10	2.90	2.52	2.90	2.24
Laundry	2 $\times$	0.54	0.61	0.81	0.82	0.74	0.72
	4 $\times$	0.77	0.91	1.08	1.37	1.23	0.97
	8 $\times$	1.13	1.50	2.15	2.26	2.11	1.53
	16 $\times$	2.13	2.60	3.51	3.12	3.46	2.42
Reindeer	2 $\times$	0.54	0.64	0.53	0.86	0.78	0.53
	4 $\times$	0.75	0.83	0.99	1.41	1.27	0.81
	8 $\times$	1.10	1.44	2.10	2.33	2.11	1.16
	16 $\times$	2.09	2.64	3.52	3.13	3.47	2.02

Color-guided optimization model with reliable self-structure priors for depth map restoration

Abstract

1. Introduction

2. Proposed color-guided optimization model

2.1 Problem statement and degradation model

2.2 Analysis of different mutually filtering types

2.2.1 Filling hole pixels and filtering

2.2.2 Super-resolution

2.3 Proposed method for compound noise filtering

2.4 Proposed method for super-resolution

3. Experiments and discussions

3.1 Experiments on simulated and real Kinect depth maps

3.2 Experiments on simulated ToF depth maps

3.3 Experiments on real ToF depth maps

3.4 Visualization of confidence maps

4. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (11)

Tables (3)

Equations (18)

OSA Continuum