3D shape recovery algorithm from image orientations of textured surfaces

Open Access

Abstract

Previous psychophysical studies have demonstrated that the image orientation of textured surfaces guides human 3D shape perception. However, the accuracy of computational 3D shape reconstruction based solely on image orientation requires further study. This paper proposes a 3D shape recovery algorithm that uses the image orientation of a single textured surface image. The proposed algorithm is evaluated on computer-generated images of complex textured 3D surfaces. The depth correlations between the recovered and true surface shapes reached or exceeded 0.8, which is comparable to the accuracy of human shape perception reported in a previous psychophysical study, indicating that image orientations contain adequate information for 3D shape recovery from textured surface images.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. INTRODUCTION

We intuitively perceive 3D surface shape from a single 2D image, even though recovering 3D shape from a single 2D image is an ill-posed problem. To understand the nature of human 3D shape perception [1,2], or to enable robust 3D shape estimation for practical applications, shape-from-texture models have been studied [3,4]. Shape-from-texture studies use changes in local image statistics [5–7] and/or detailed texture elements [8] to detect the texture distortion caused by the projection of a 3D surface onto an image plane.

This study is motivated by previous psychophysical studies demonstrating the importance of the orientation field among other local image statistics for human 3D shape perception from texture, as well as from specular and diffuse reflections [9–12]. The orientation field is a collection of dominant orientations at every image location (see Fig. 1); this information is represented in the primary visual cortex (V1), which contains cells tuned to specific orientations [13]. From a theoretical point of view, this image orientation corresponds to the orientation of the surface’s first derivative through texture foreshortening or the orientation of the surface’s second derivative through specular reflection [9,14]. Thus, the orientation field is related to 3D surface shapes with respect to multiple types of image cues, although it does not fully retain the texture distortion information [5–7]. Previous studies hypothesized that the human visual system uses the orientation field for 3D shape perception [10]. In support of this hypothesis, they showed that 3D shape perception changes when the orientation field is modulated through image manipulation, and that it can even be altered by psychophysical adaptation to specific orientation fields [10].

Fig. 1. Flowchart of the proposed shape recovery algorithm. The orientation field is extracted from an image. The hue represents the image orientation that maximally stimulates the V1-cell-like oriented filter at each location. A cost function is formulated based on the orientation field. The estimated surface depth is obtained by minimizing the cost function.

Fig. 2. Recovered 3D shapes from textured surfaces. The textured surface images were generated using computer graphics. The recovered surface depths and the ground-truth 3D images are represented by depth maps with superimposed contour lines.

In our previous study, to investigate the computational plausibility of the hypothesis, we developed a 3D shape recovery algorithm from a single 2D image of specular highlights [15]. The algorithm relies on the relationship between the orientation field and the surface’s second derivative, although it also uses another image cue, i.e., vertical polarity, to resolve the concave/convex ambiguity. The algorithm recovered complex 3D shapes, showing the computational sufficiency of the orientation field combined with the vertical polarity for 3D shape recovery in the presence of specular reflections.

The purpose of this study is to investigate how much 3D shape information is contained in the orientation field of textured images to better understand the findings of the previous psychophysical study [10], and to compare the 3D shape estimation performance obtained from the orientation fields of texture and specular reflection. This paper outlines a 3D shape recovery algorithm from a single textured image based on the relationship between the orientation field and the surface’s first derivatives. The algorithm is evaluated on the same set of 3D shapes used in the previous study [15] to facilitate the comparison of the estimation performance. The proposed algorithm using textured images attains higher accuracy than the previous algorithm using specular images, and the result indicates that the orientation field is adequate for 3D shape recovery in the case of texture, although the orientation field alone is inadequate in the case of specular reflection.

2. MATERIALS AND METHODS

Figure 1 shows the flowchart of the proposed algorithm to recover 3D surface depth from a single textured 2D image. The main procedure is as follows. First, the orientation field is extracted from an image; second, the cost function is formulated based on the orientation field; finally, the 3D shape is recovered by minimizing the cost function.

As a precondition to 3D shape recovery, we assume that the image region where the object exists is known. We denote the object region as ${\Omega}$, the number of pixels in ${\Omega}$ as ${N_{{\Omega}}}$, the boundary region, which is the region between the boundary of ${\Omega}$ and one pixel inside it, as $\partial {\Omega}$, and the number of pixels in $\partial {\Omega}$ as ${N_{\partial {\Omega}}}$. The resolution of the 3D shape recovery is $256 \times 256$ pixels. We set a Cartesian coordinate system on the image plane, where the $x$- and $y$-axes represent the horizontal and vertical axes of the image plane, respectively, and the $z$-axis represents the direction toward the viewer. We represent the depth of the 3D object surface as $z({x,y})$. The following notations are used: scalars are represented in normal-type letters ($x$), vectors in lower-case boldface letters (${\boldsymbol x}$), and matrices in upper-case boldface letters (${\boldsymbol X}$).

A. Images and Extraction of Orientation Fields

We used the images of 12 different 3D shapes to evaluate the proposed algorithm. The images had a $1024 \times 1024$ pixel resolution and were achromatic, although they were down-sampled to $256 \times 256$ pixels to reduce the orientation field error before the 3D shape recovery. These images were rendered using LightWave 2020 (NewTek). The procedural texture in LightWave (Turbulence; frequency 5, contrast 100%, small power 2.0) was used to add isotropic surface texture. Note that the surface texture of the 2D image in Fig. 1 is coarser than those in Fig. 2 as a result of setting the frequency to four to clarify the texture and 3D shape for the benefit of the reader. The solid texture was elongated along the $x$-axis or $y$-axis when the anisotropic surface texture was generated. The luminosity of the surface was set to 100%, and no lighting was used because the proposed algorithm uses neither diffuse nor specular reflection information. The 3D shapes of objects #1–6 were randomly generated using spherical harmonics with increasing complexity. The 3D shapes of objects #7–12 were human-made. These 3D shapes were used in our previous studies [15–17].

We extracted the orientation field as follows. The image orientation $\theta(x,y)$ is the angle that maximizes the magnitude of the response $p$ of the oriented filter (a first-derivative operator): $\theta(x,y) = \operatorname{argmax}_{\theta'} p^{2}(\theta'(x,y))$. The steerable pyramid [18,19] (matlabPyrTools [20]) was used to extract the image orientation in accordance with previous studies [9,10,14]. The responses were obtained by steering the filter through 120 equal orientation steps between 0° and 180°. The orientation responses at the finest possible spatial scale ($1024 \times 1024$ pixel resolution) were extracted for all the shapes in accordance with a previous study [9]. Then, the amplitudes, which are the squared responses, were averaged and downsampled to $256 \times 256$ pixels and convolved with a $3 \times 3$ constant filter for noise reduction. Next, the image orientation was obtained using the above equation. The image anisotropy, which is defined by the ratio of the minimum and maximum magnitudes of the oriented filter response with respect to its angle [9], was not used in this study.
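The extraction step can be sketched as follows. This is a minimal, single-scale approximation that replaces the matlabPyrTools steerable pyramid used in the paper with steered Gaussian first-derivative filters, and it omits the 1024-to-256 downsampling; the function name and the filter scale `sigma` are assumptions for illustration only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def orientation_field(image, n_angles=120, sigma=2.0):
    """Sketch: dominant orientation (degrees, in [0, 180)) at each pixel.

    Approximates the steerable-pyramid step with Gaussian first-derivative
    filters steered over n_angles orientations; single scale, no
    downsampling, unlike the full procedure described in the text.
    """
    gx = gaussian_filter(image, sigma, order=(0, 1))  # derivative along x (axis 1)
    gy = gaussian_filter(image, sigma, order=(1, 0))  # derivative along y (axis 0)

    best_amp = np.full(image.shape, -np.inf)
    best_theta = np.zeros(image.shape)
    for k in range(n_angles):
        theta = np.pi * k / n_angles
        # Steer the first-derivative operator to angle theta.
        p = np.cos(theta) * gx + np.sin(theta) * gy
        # Squared response, smoothed with a 3x3 constant filter for noise reduction.
        amp = uniform_filter(p ** 2, size=3)
        mask = amp > best_amp
        best_amp[mask] = amp[mask]
        best_theta[mask] = theta
    return np.rad2deg(best_theta)
```

The argmax over steering angles mirrors the definition of $\theta(x,y)$ above; whether the finest pyramid scale or a fixed Gaussian scale is used changes the numbers but not the structure of the procedure.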

The orientation field error is quantified by the mean absolute error between the image and surface orientations throughout the object region. The surface orientation is obtained from the ground-truth 3D shapes (see Section 2.B).
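Because orientations are axial quantities defined modulo 180°, the absolute difference has to be wrapped before averaging. A minimal sketch, assuming both fields are given in degrees and the helper name is illustrative:

```python
import numpy as np

def orientation_error(theta_img, theta_surf, mask):
    """Sketch: mean absolute angular error (degrees) over the object region.

    Orientations are axial (modulo 180 deg), so differences are wrapped
    into [0, 90] before averaging.
    """
    d = np.abs(theta_img - theta_surf) % 180.0
    d = np.minimum(d, 180.0 - d)   # axial wraparound
    return float(d[mask].mean())
```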

B. Formulation and Minimization of Cost Function

Cost function $E$ consists of three terms: the first derivative constraint $C$ given by the orientation field, the convex prior $P$, and the boundary condition $B$:

$$E = C + P + B. \tag{1}$$

The surface gradient is represented by slant $\sigma$ and tilt $\tau$ [21] as

$$\nabla z = \begin{pmatrix} \dfrac{\partial z}{\partial x} \\[4pt] \dfrac{\partial z}{\partial y} \end{pmatrix} = R(\tau)\begin{pmatrix} -\tan\sigma \\ 0 \end{pmatrix}, \tag{2}$$
where $R$ is the rotation matrix and $\sigma = \tan^{-1}\sqrt{\left(\frac{\partial z}{\partial x}\right)^{2} + \left(\frac{\partial z}{\partial y}\right)^{2}}$. The slant is the angle between the surface normal and the line of sight, and it ranges from 0° to 90°. The tilt is the direction of the surface normal projected onto the image plane, and it ranges from 0° to 360°. Here, we define the surface orientation $\theta_s$ as the orientation perpendicular to the tilt direction. Note that the surface orientation ranges from 0° to 180° as does the image orientation. The surface gradient is also represented by the slant and the surface orientation as
$$\nabla z = \begin{pmatrix} \dfrac{\partial z}{\partial x} \\[4pt] \dfrac{\partial z}{\partial y} \end{pmatrix} = R(\theta_s)\begin{pmatrix} 0 \\ s\tan\sigma \end{pmatrix}, \tag{3}$$
where $s = +1$ if $0^\circ \le \tau < 90^\circ$ or $270^\circ \le \tau < 360^\circ$, and $s = -1$ if $90^\circ \le \tau < 270^\circ$.
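For concreteness, Eq. (2) can be evaluated directly to map slant and tilt to the surface gradient. The following sketch is illustrative (function name, degree units, and array handling are assumptions):

```python
import numpy as np

def gradient_from_slant_tilt(slant_deg, tilt_deg):
    """Sketch: surface gradient (dz/dx, dz/dy) from slant and tilt, per Eq. (2).

    Rotating the vector (-tan(slant), 0) by the tilt angle gives the gradient;
    slant in [0, 90) degrees, tilt in [0, 360) degrees.
    """
    sigma = np.deg2rad(np.asarray(slant_deg, dtype=float))
    tau = np.deg2rad(np.asarray(tilt_deg, dtype=float))
    g = -np.tan(sigma)
    return np.cos(tau) * g, np.sin(tau) * g   # (dz/dx, dz/dy)
```

Rotating instead by the surface orientation $\theta_s$ (the direction perpendicular to the tilt) and using the sign $s$ defined above reproduces Eq. (3).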

If the textured surface is slanted, the texture projected onto the image plane is compressed along the tilt direction, and this compression makes the dominant image orientation align with the surface orientation [9,10]. The first derivative constraint is based on this relationship, such that the image orientation approximates the surface orientation, $\theta \approx \theta_s$. This approximation is valid if the texture is isotropic [5,22]. This relationship is described with an error term as $\theta_s = \theta + \delta\theta$. Here, we introduce the coordinate axes $(u,v)$ by rotating the original axes $(x,y)$ by the image orientation $\theta(x,y)$. Note that the axes $(u,v)$ depend on position, corresponding to the image orientation at that position. Then, Eq. (3) becomes

$$\begin{pmatrix} \dfrac{\partial z}{\partial u} \\[4pt] \dfrac{\partial z}{\partial v} \end{pmatrix} = R(\delta\theta)\begin{pmatrix} 0 \\ s\tan\sigma \end{pmatrix}. \tag{4}$$

The first derivative constraint $C$ is based on Eq. (4), where $\frac{\partial z}{\partial u} = -s\tan\sigma \sin\delta\theta = O(\delta\theta)$ is small. The cost is the sum of the squared $\frac{\partial z}{\partial u}$ throughout the object region:

$$C = \frac{1}{2N_{\Omega}} \sum_{(x,y)\in\Omega} \left(\frac{\partial z}{\partial u}\right)^{2}. \tag{5}$$

The first derivative constraint is summarized as $C = \frac{1}{2}\boldsymbol{z}^{T}\boldsymbol{L}\boldsymbol{z}$, where $\boldsymbol{z}$ is the column vector of size $N_{\Omega} \times 1$ that consists of $z(x,y)$ in the object region $\Omega$, and $\boldsymbol{L}$ is a matrix that represents the summarized differential operator, which is a Laplacian matrix of size $N_{\Omega} \times N_{\Omega}$.
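Since the $u$-axis is the $x$-axis rotated by $\theta$, the directional derivative satisfies $\frac{\partial z}{\partial u} = \cos\theta\,\frac{\partial z}{\partial x} + \sin\theta\,\frac{\partial z}{\partial y}$, which makes the sparse assembly of $\boldsymbol{L}$ straightforward. The sketch below uses forward differences and skips pixels whose forward neighbors leave the object region; the paper does not specify the discretization, so these choices are assumptions.

```python
import numpy as np
import scipy.sparse as sp

def constraint_matrix(theta_deg, mask):
    """Sketch: sparse matrix L such that C = 0.5 * z^T L z, cf. Eq. (5).

    Uses dz/du = cos(theta) dz/dx + sin(theta) dz/dy with forward
    differences restricted to the object region.
    """
    h, w = mask.shape
    n = int(mask.sum())
    idx = -np.ones((h, w), dtype=int)
    idx[mask] = np.arange(n)           # pixel -> unknown index

    theta = np.deg2rad(theta_deg)
    rows, cols, vals = [], [], []
    eq = 0
    for y, x in zip(*np.nonzero(mask)):
        # Skip pixels whose forward neighbors fall outside the object region.
        if x + 1 >= w or y + 1 >= h or not (mask[y, x + 1] and mask[y + 1, x]):
            continue
        c, s = np.cos(theta[y, x]), np.sin(theta[y, x])
        # dz/du at (x, y): c*(z[y, x+1] - z[y, x]) + s*(z[y+1, x] - z[y, x])
        rows += [eq, eq, eq]
        cols += [idx[y, x + 1], idx[y + 1, x], idx[y, x]]
        vals += [c, s, -(c + s)]
        eq += 1

    D_u = sp.csr_matrix((vals, (rows, cols)), shape=(eq, n))
    return (D_u.T @ D_u) / n           # includes the 1/N_Omega factor of Eq. (5)
```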

The convex prior $P$ is based on the prior that humans tend to perceive surfaces as globally convex rather than globally concave [23,24]. Instead of encouraging positive surface curvature, this prior prefers a surface that bulges toward the viewer:

$$P = -\frac{1}{N_{\Omega}} \sum_{(x,y)\in\Omega} z. \tag{6}$$

This prior prevents the trivial constant-depth solution that minimizes only the first derivative constraint, which would be inappropriate according to the generic viewpoint principle [25], and yields a bumpy recovered shape. The convex prior is summarized as $P = -\frac{1}{N_{\Omega}}\boldsymbol{z}^{T}\mathbf{1}$, where $\mathbf{1}$ is the all-ones column vector of size $N_{\Omega} \times 1$.

Table 1. Orientation Field Error for Each Textured Surface

Table 2. Estimation Performance for Each Textured Surface

Boundary condition $B$ is introduced to resolve the translation ambiguity along the $z$-axis by zeroing the mean depth values along the boundary region [15]:

$$B = \frac{1}{2}\left(\frac{1}{N_{\partial\Omega}} \sum_{(x,y)\in\partial\Omega} z\right)^{2}. \tag{7}$$

The boundary condition is summarized as $B = \frac{1}{2}{{\boldsymbol z}^T}{\boldsymbol {Bz}}$, where ${\boldsymbol B}$ is the coefficient matrix of size ${N_{{\Omega}}} \times {N_{{\Omega}}}$.

The cost function is summarized as

$$E(\boldsymbol{z}) = \frac{1}{2}\boldsymbol{z}^{T}(\boldsymbol{L} + \boldsymbol{B})\boldsymbol{z} - \frac{1}{N_{\Omega}}\boldsymbol{z}^{T}\mathbf{1}. \tag{8}$$

The optimal recovered 3D shape minimizes the cost function, $\hat{\boldsymbol{z}} = \operatorname{argmin}_{\boldsymbol{z}} E(\boldsymbol{z})$. Therefore, the derivative of the cost function with respect to $\boldsymbol{z}$ should be zero, $\frac{\partial E}{\partial \boldsymbol{z}} = (\boldsymbol{L} + \boldsymbol{B})\hat{\boldsymbol{z}} - \frac{1}{N_{\Omega}}\mathbf{1} = \mathbf{0}$. The solution is obtained as

$$\hat{\boldsymbol{z}} = \frac{1}{N_{\Omega}}(\boldsymbol{L} + \boldsymbol{B})^{-1}\mathbf{1}. \tag{9}$$
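Putting the terms together, Eq. (9) reduces to a single sparse linear solve: the boundary term of Eq. (7) is the rank-one matrix $\boldsymbol{b}\boldsymbol{b}^{T}/N_{\partial\Omega}^{2}$, where $\boldsymbol{b}$ indicates boundary pixels. The sketch below assumes $\boldsymbol{L}$ comes from a construction like the one in the previous sketch; the function and argument names are illustrative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def recover_depth(L, mask, boundary_mask):
    """Sketch: closed-form minimizer of Eq. (8), i.e., Eq. (9).

    boundary_mask marks the boundary region (inside mask); the boundary
    term of Eq. (7) corresponds to the rank-one matrix b b^T / N_b^2.
    """
    n = int(mask.sum())
    idx = -np.ones(mask.shape, dtype=int)
    idx[mask] = np.arange(n)

    b = np.zeros(n)
    b[idx[boundary_mask & mask]] = 1.0
    n_b = b.sum()
    b_sp = sp.csr_matrix(b.reshape(-1, 1))
    B = (b_sp @ b_sp.T) / (n_b ** 2)          # B matrix of Eq. (7)

    rhs = np.full(n, 1.0 / n)                 # (1/N_Omega) * ones, from the prior
    z = spla.spsolve(sp.csr_matrix(L + B), rhs)

    depth = np.full(mask.shape, np.nan)       # NaN outside the object region
    depth[mask] = z
    return depth
```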

C. Evaluation of Recovered Depth

The shape recovery performance is quantified by measuring the correlation between the recovered and true depths. We used two depth correlations: global and local interior. The global depth correlation is simply the correlation coefficient of the recovered and true depths throughout the object region. However, the global depth correlation tends to be high as long as the depth around the boundary is small, because the true depth is generally very small around the boundary and moderate inside the object region. In other words, the global depth correlation is sensitive to the depth around the boundary and insensitive to the details of the shape within the object region. Therefore, this paper also uses a local interior depth correlation, calculated as follows. First, a grid is set that divides the vertical and horizontal axes of the image into eight regions each (at 32-pixel intervals). Second, a circle with a radius of 32 pixels is centered at each intersection of the grid. Third, a depth correlation is measured in the intersection of the circle and the object area after removing the area near the boundary (within 24 pixels of the boundary). A depth correlation is not measured if the intersection area is smaller than half of the circle’s area. Fourth, the depth correlation values are averaged. As a result, the local interior depth correlation is not affected by the shapes near the boundary and is sensitive to the agreement of the concavities and convexities inside the object region. Note that we did not evaluate the local interior depth correlation for objects #9 and #11. No depth correlation values were obtained with the above procedure because most of the object region is near the boundary, and the global depth correlation appears sufficient as a measure because there is no fine shape structure inside these object regions.
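The local interior measure can be sketched as follows. The interpretation of the half-circle area criterion (valid pixels versus circle-object intersection) and the use of a Euclidean distance transform for the 24-pixel boundary margin are assumptions, as are the function and parameter names.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def local_interior_correlation(z_rec, z_true, mask,
                               grid_step=32, radius=32, margin=24):
    """Sketch: local interior depth correlation as described in the text.

    Circles of radius 32 px centered on a 32-px grid; pixels within 24 px
    of the object boundary are excluded; circles whose valid area is below
    half the circle area are skipped; remaining correlations are averaged.
    """
    interior = distance_transform_edt(mask) > margin   # away from the boundary
    h, w = mask.shape
    yy, xx = np.mgrid[0:h, 0:w]
    half_circle = 0.5 * np.pi * radius ** 2
    corrs = []
    for cy in range(grid_step, h, grid_step):
        for cx in range(grid_step, w, grid_step):
            circle = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
            region = circle & mask & interior
            if region.sum() < half_circle:
                continue                               # circle mostly invalid
            corrs.append(np.corrcoef(z_rec[region], z_true[region])[0, 1])
    return float(np.mean(corrs)) if corrs else np.nan
```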

3. RESULTS

We validated the proposed algorithm on twelve computer-generated 3D objects with textured surfaces (Fig. 2). The details of the images and 3D shapes are provided in Materials and Methods. Because the texture pattern is fine, the reader may not fully see the textures and 3D shapes in the small format of Fig. 2. The original high-resolution textured images are provided as supplementary materials (see Figs. S1–S12 in Supplement 1). Figure 2 depicts the ground-truth 3D objects, the 2D texture images, and the corresponding estimated depth images. The depths are represented in grayscale; nearer surfaces are lighter and distant surfaces are darker. Additionally, 50 contour lines are superimposed to aid in the depiction.

The mean absolute errors of the orientation fields and the depth correlations between the recovered and true shapes are used for evaluation. Table 1 summarizes the orientation field errors of the 12 objects. The average mean absolute error of the image orientation across the 12 objects is 23.6°. Table 2 summarizes the estimation performance of the proposed algorithm on the 12 objects. The average values of the global depth correlation ${r_g}$ and the local interior depth correlation ${r_{\textit{li}}}$ across the 12 objects are 0.88 and 0.84, respectively. In our previous study [15], shape recovery was deemed successful if both the global and local interior depth correlations exceeded 0.7, as determined by examining the appearance of the recovered objects. Indeed, the recovered shape of object #12, for which the local interior depth correlation is below 0.7, does not appear successful. The other recovered shapes, for which both ${r_g}$ and ${r_{\textit{li}}}$ exceed 0.7, resemble the true 3D surfaces, although regions of the 3D shapes with small true slants tend to be estimated as much flatter than the true 3D shapes (see the recovered shapes of #1 and #8 in Fig. 2).

The relationship between the estimation performance and the true surface slant is further investigated. The object regions are divided according to the true surface slant in increments of 15° to evaluate the orientation field error and the estimation performance. Here, the 75°–90° slant range is excluded because the average proportion of the object regions in this range is only 3.2%. Figure 3(a) summarizes the average orientation field errors of the 12 objects for each slant range. The result shows that the orientation field error decreases monotonically as the true slant increases. This is because a more slanted surface yields clearer image orientations resulting from texture foreshortening. Figure 3(b) shows the average global depth correlation of the 12 objects for each slant range. The estimation performance is low where the true slant is low. This is because the orientation field error is high in these regions, which also explains the excessive flatness of the recovered shapes in Fig. 2 where the true slants are low. The estimation performance is also low where the true slant is high. This is likely because the high-slant object regions are far from each other in the image plane, and the shape recovery error accumulates with distance. For example, if the 3D object is a sphere, the low- and high-slant object regions are the central and peripheral areas of the image plane, respectively.
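The per-slant-bin analysis is a simple binning of a per-pixel error map by the true slant; a minimal sketch (names and the error-map input are assumptions):

```python
import numpy as np

def error_by_slant_bin(err_map, slant_deg, mask, bin_width=15, max_slant=75):
    """Sketch: average a per-pixel error map over true-slant bins.

    Bins are [0,15), [15,30), ..., [60,75) degrees; the 75-90 deg range is
    excluded as in the text.
    """
    means = []
    for lo in range(0, max_slant, bin_width):
        sel = mask & (slant_deg >= lo) & (slant_deg < lo + bin_width)
        means.append(err_map[sel].mean() if sel.any() else np.nan)
    return means
```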

Fig. 3. (a) Averaged mean absolute orientation field error for each slant range across 12 objects. (b) Averaged global depth correlation for each slant range across 12 objects.

Table 3. Orientation Field Error and Estimation Performance for Elongated Textured Surfaces

Next, the effect of the orientation field errors on the shape recovery errors is investigated by using the proposed algorithm to recover the shapes from the surface orientations obtained from the true 3D shapes instead of the image orientations. In this case, the average values of the global and local interior depth correlations for the 12 objects were ${r_g} = {0.96}$ and ${r_{\textit{li}}} = {0.97}$, respectively. The estimation performance with this input was very high (${r_g} \ge 0.95$, ${r_{\textit{li}}} \ge 0.98$), except for objects #10 (${r_g} = 0.97$, ${r_{\textit{li}}} = 0.81$) and #12 (${r_g} = 0.85$, ${r_{\textit{li}}} = 0.91$).

Finally, the effect of the deviation from the isotropic assumption of the texture on the recovered shape is investigated using surface images with textures elongated in the horizontal or vertical direction. Table 3 summarizes the average values of the orientation field errors and the estimation performance for the 12 objects for each texture elongation direction and magnification. The result shows that the orientation field error increases and the depth correlations decrease as the texture anisotropy increases. Figure 4 shows examples of the anisotropic textured surface images and the 3D shapes recovered from these images. Figures 4(a) and 4(b) depict textured images of object #5 in which the texture is elongated four times in the horizontal and vertical directions, respectively. Figure 4(c) depicts the shape recovered from the image in Fig. 4(a). The estimation performance was ${r_g} = 0.83$, ${r_{\textit{li}}} = 0.89$. Figure 4(d) depicts the shape recovered from the image in Fig. 4(b). The estimation performance was ${r_g} = 0.87$, ${r_{\textit{li}}} = 0.95$. These recovered shapes are affected by the anisotropic textures. The locations where the two recovered depths differ are marked by red crosses. At the location of the upper cross, the slant is gentle and broad in Fig. 4(c) but steep and narrow in Fig. 4(d). At the location of the lower cross, the surface is flatter and closer to the observer in Fig. 4(c) but more slanted and further away in Fig. 4(d). Across the image, the tilt is biased toward the vertical and horizontal directions in Figs. 4(c) and 4(d), respectively. This is because the image orientations are biased toward the texture elongation direction, and therefore the cost of the depth variation in that direction is high [see Eq. (5)].

Fig. 4. Anisotropic texture images of object #5 and the 3D shapes recovered from them. (a) Textured surface image with the texture elongated in the horizontal direction. (b) Textured surface image with the texture elongated in the vertical direction. (c) Shape recovered from the image in (a). (d) Shape recovered from the image in (b). Red crosses indicate the locations where the recovered shapes differ.

4. DISCUSSION

This paper proposes an algorithm to estimate 3D shape from a single textured image, verifying the computational plausibility of 3D shape reconstruction from image orientations as demonstrated psychophysically [9,10]. A cost function incorporates the knowledge that the image orientation approximates the 3D orientation of the surface, and the 3D shapes are recovered by minimizing the cost function. The proposed algorithm is evaluated on 12 complex shapes with textured surfaces and achieves high depth correlations between the recovered and ground truth shapes (${\sim}{0.85}$). This depth correlation value is similar to that of the human shape perception from the orientation field, as shown in the previous psychophysical study (${r_g} = 0.857673$, see Fig. 5(G) of [10]). This indicates that the orientation field is adequate for 3D shape recovery from textured images, although the orientation field alone is inadequate, and the vertical polarity is additionally required for 3D shape recovery from specular images [15].

The orientation field error is a major limiting factor of the proposed algorithm, according to the following two results. First, the algorithm achieves very high shape recovery performance when using the surface orientations obtained directly from the true 3D shapes as input. Second, the estimation performance is inversely related to the orientation field error, except where the true slant is very high (see Fig. 3). The first result also indicates that the proposed shape recovery algorithm mostly works well under such ideal conditions, although there are still cases of poor shape recovery, such as object #10, which has a cusp [1], and #12, whose shape is complex.

The estimation performance of the proposed algorithm using textured surface images, ${r_g} = 0.88$ and ${r_{\textit{li}}} = 0.84$, is higher than that of our previous algorithm using glossy surface images, ${r_g} = 0.85$ and ${r_{\textit{li}}} = 0.76$, or mirrored surface images, ${r_g} = 0.84$ and ${r_{\textit{li}}} = 0.75$ (see Table 4 of [15]), even though the average orientation field error of the textured surfaces, 23.6°, was more than double that of the glossy surfaces (11.3°) and the mirrored surfaces (10.9°) (see Table 3 of [15]). This is probably because the first derivative constraint obtained from texture foreshortening is easier to use for 3D shape recovery than the second derivative constraint obtained from specular reflection. In fact, the estimation algorithm using textured surfaces proposed in this study is similar to, but much simpler than, that proposed in our previous research using glossy or mirrored surfaces [15]. The difficulty of 3D shape recovery from specular reflection has also been demonstrated in human shape perception [2,26,27].

3D shape estimation from orientation fields has some limitations. First, if the texture is anisotropic, the orientation fields are biased [14], and the resulting recovered 3D shape is distorted (see Table 3 and Fig. 4). Local image statistics other than the orientations, or detailed texture element information, could mitigate this problem and improve the estimation performance, although human-perceived 3D shapes are also affected by anisotropic texture under certain conditions [22], and it is not yet clear whether detailed texture information is used in human 3D shape perception [8]. Second, it is difficult to estimate the absolute depth as well as the depth scale. For example, it is difficult to distinguish large, distant objects from small, nearby objects. Humans also have difficulty estimating the depth scale [10,28] from a single image without prior knowledge of the object’s shape (see Fig. 5(G) of [10]). Therefore, we evaluated the recovered shapes using depth correlations. For 3D shapes that have several landmarks, the landmark-based alignment method [29] provides an effective evaluation technique. Future work could consider adding spatial frequency information to the proposed algorithm, potentially providing depth cues under perspective projection, because the projected texture shrinks as the distance increases [5–7]. This hypothesis is supported by the fact that V1 cells are tuned not only to specific orientations but also to specific spatial frequencies.

Finally, because the developed algorithm is based on psychophysical findings, the recovered shapes may resemble the shapes derived by human 3D shape perception from textured images more closely than the ground truth shapes. The shape recovery results of this paper suggest several estimation biases worth verifying. First, in regions of the textured images where the true slant is low and the recovered shape is flat, the perceived shapes may likewise appear flatter than the ground truth shapes. Note that the opposite phenomenon occurs in the perceived shapes obtained from the corresponding glossy and specular images (compare Figs. S1–S12 in Supplement 1 with Fig. 4 and Fig. 5 of [15]) [30]. Second, the biased shapes obtained from anisotropic textures, which also bias human shape perception [21], are shown in Fig. 4. If the developed algorithm has similarities with human shape perception, the perceived shape obtained from Fig. 4(a) should be closer to the shape in Fig. 4(c) than to the one in Fig. 4(d), and the perceived shape obtained from Fig. 4(b) should be closer to the shape in Fig. 4(d) than to the one in Fig. 4(c). It would be interesting to explore these hypotheses by conducting psychophysical experiments that compare these results with the shapes perceived by humans from textured 2D images.

5. CONCLUSION

The 3D shape recovery algorithm proposed in this paper uses the orientation field of a single textured image and achieves depth correlation results between the recovered and ground truth shapes that are as high as those of human shape perception, as shown in a previous psychophysical study, indicating that from a computational perspective, the orientation field contains adequate information for 3D shape perception from a single textured image. The recovered shapes have some estimation biases when the true slant is low and the texture is anisotropic. Future work could include psychophysical experiments to investigate the similarity of the recovered shape and human shape perception to understand the mechanism of human shape perception from textured images.

Funding

Japan Society for the Promotion of Science (KAKENHI Grant Number JP22K12224).

Acknowledgment

The author would like to thank Akiko Nishio for providing 3D shapes and helpful comments. The author would also like to thank Irina Entin, M. Eng., and Kimberly Moravec, PhD, from Edanz (https://jp.edanz.com/ac) for editing a draft of this paper.

Disclosures

The author declares no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Supplement 1.

Supplemental document

See Supplement 1 for supporting content.

REFERENCES

1. J. T. Todd, “The visual perception of 3D shape,” Trends Cogn. Sci. 8, 115–121 (2004).

2. J. F. Norman, J. T. Todd, and G. A. Orban, “Perception of three-dimensional shape from specular highlights, deformations of shading, and other types of visual information,” Psychol. Sci. 15, 565–570 (2004).

3. J. J. Gibson, The Perception of the Visual World (Houghton Mifflin, 1950).

4. S. E. Palmer, Vision Science: Photons to Phenomenology (MIT, 1999), Ch. 5.

5. J. Gårding, “Shape from texture for smooth curved surfaces in perspective projection,” J. Math. Imaging Vis. 2, 327–350 (1992).

6. J. Malik and R. Rosenholtz, “Computing local surface orientation and shape from texture for curved surfaces,” Int. J. Comput. Vis. 23, 149–168 (1997).

7. M. Clerc and S. Mallat, “The texture gradient equation for recovering shape from texture,” IEEE Trans. Pattern Anal. Mach. Intell. 24, 536–549 (2002).

8. D. Verbin and T. Zickler, “Toward a universal model for shape from texture,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), pp. 422–430.

9. R. W. Fleming, A. Torralba, and E. H. Adelson, “Specular reflections and the perception of shape,” J. Vis. 4(9):10 (2004).

10. R. W. Fleming, D. Holtmann-Rice, and H. H. Bülthoff, “Estimation of 3D shape from image orientations,” Proc. Natl. Acad. Sci. USA 108, 20438–20443 (2011).

11. B. Kunsberg and S. W. Zucker, “Critical contours: an invariant linking image flow with salient surface organization,” SIAM J. Imaging Sci. 11, 1849–1877 (2018).

12. B. Kunsberg and S. W. Zucker, “From boundaries to bumps: when closed (extremal) contours are critical,” J. Vis. 21(13):7 (2021).

13. D. H. Hubel and T. N. Wiesel, “Receptive fields and functional architecture of monkey striate cortex,” J. Physiol. 195, 215–243 (1968).

14. R. W. Fleming, A. Torralba, and E. H. Adelson, “Shape from sheen,” MIT-CSAIL-TR-2009-051 (2009).

15. T. Shimokawa, A. Nishio, M. Sato, M. Kawato, and H. Komatsu, “Computational model for human 3D shape perception from a single specular image,” Front. Comput. Neurosci. 13, 10 (2019).

16. A. Nishio, N. Goda, and H. Komatsu, “Neural selectivity and representation of gloss in the monkey inferior temporal cortex,” J. Neurosci. 32, 10780–10793 (2012).

17. A. Nishio, T. Shimokawa, N. Goda, and H. Komatsu, “Perceptual gloss parameters are encoded by population responses in the monkey inferior temporal cortex,” J. Neurosci. 34, 11143–11151 (2014).

18. E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger, “Shiftable multiscale transforms,” IEEE Trans. Inf. Theory 38, 587–607 (1992).

19. E. P. Simoncelli and W. T. Freeman, “The steerable pyramid: a flexible architecture for multi-scale derivative computation,” in 2nd International Conference on Image Processing (1995), Vol. 3, pp. 444–447.

20. Laboratory for Computational Vision, NYU, “matlabPyrTools,” GitHub (2016), https://github.com/LabForComputationalVision/matlabPyrTools.

21. K. A. Stevens, “Slant-tilt: the visual encoding of surface orientation,” Biol. Cybern. 46, 183–195 (1983).

22. R. Rosenholtz and J. Malik, “Surface orientation from texture: isotropy or homogeneity (or both)?” Vis. Res. 37, 2283–2293 (1997).

23. M. S. Langer and H. H. Bülthoff, “A prior for global convexity in local shape-from-shading,” Perception 30, 403–410 (2001).

24. B. Liu and J. T. Todd, “Perceptual biases in the interpretation of 3D shape from shading,” Vis. Res. 44, 2135–2145 (2004).

25. W. T. Freeman, “The generic viewpoint assumption in a framework for visual perception,” Nature 368, 542–545 (1994).

26. S. Savarese, L. Fei-Fei, and P. Perona, “What do reflections tell us about the shape of a mirror?” in 1st Symposium on Applied Perception in Graphics and Visualization (2004), pp. 115–118.

27. A. Faisman and M. S. Langer, “Qualitative shape from shading, highlights, and mirror reflections,” J. Vis. 13(5):10 (2013).

28. B. G. Khang, J. J. Koenderink, and A. M. Kappers, “Shape from shading from images rendered with various surface types and light fields,” Perception 36, 1191–1213 (2007).

29. J. Zhang, Y. Luximon, P. Shah, and P. Li, “3D statistical head modeling for face/head-related product design: a state-of-the-art review,” Comput. Aided Des. 159, 103483 (2023).

30. S. W. Mooney and B. L. Anderson, “Specular image structure modulates the perception of three-dimensional shape,” Curr. Biol. 24, 2737–2742 (2014).

Supplementary Material

Supplement 1: Supplemental document.
