Optica Publishing Group

Electroholography of real scenes by RGB-D camera and the downsampling method

Open Access

Abstract

We performed electroholography of real scenes using an RGB-D camera. From the image (RGB) and depth (D) data of 1,920 $\times$ 1,080 pixels and 512 $\times$ 424 pixels, respectively, acquired using the RGB-D camera, we reconstructed a three-dimensional image based on a point-cloud model to generate the hologram. For reconstruction of the hologram, we used a liquid crystal display with a resolution of 1,920 $\times$ 1,080 pixels and a pixel pitch of ${8.0}\,{\mu\textrm{m}}$ as a spatial light modulator. The amount of input data acquired by the RGB-D camera is large relative to the capacity of the spatial light modulator, which degrades the quality of the reconstructed image and makes real-time reconstruction difficult. In this study, we reduced the amount of input data by downsampling the three-dimensional image data obtained from the RGB-D camera on a spatial lattice and evaluated the image reconstructed from the hologram.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

In holography, the three-dimensional (3D) information of an object is recorded as interference fringes (hologram) by superimposing reference and object light. The 3D information is reconstructed by irradiating the hologram with reference light. A hologram generated through computer simulation is called a computer-generated hologram (CGH). Electroholography is a technique for performing 3D reconstruction, including that of moving images using CGH [1].

In recent years, various methods for generating CGHs from real scenes have been studied, using 3D-information acquisition techniques such as integral photography (IP) [2,3], 3D reconstruction from multi-view images [4,5], and RGB-D cameras [6]. In this study, we reconstructed 3D images from CGHs of real scenes using an RGB-D camera, which can also easily capture movies.

The RGB-D camera simultaneously acquires depth (D) and color (RGB) information. We can obtain D without a dedicated optical system or camera calibration. However, when we generate a CGH from the 3D information obtained by an RGB-D camera as is, the density of the 3D information becomes too high. A liquid crystal display (LCD) with a resolution of 2K $\times$ 1K (1,920 $\times$ 1,080 pixels), commonly used as a spatial light modulator (SLM), cannot hold all the original 3D data, resulting in a low-quality reconstructed image. Additionally, because of the large amount of information involved, the cost of CGH generation is high and real-time animation becomes difficult.

Therefore, in this study, we downsampled the 3D information acquired by the RGB-D camera and evaluated holographic reconstruction with the reduced input information. As input data for electroholography, point-cloud [7–9] and polygon [10] models, among others, have been examined. In this study, we used a point-cloud model, for which the amount of downsampling is easy to control quantitatively.

The method of this research is shown in Fig. 1. Kinect for Windows v2, developed by Microsoft Corp., was used as the RGB-D camera; its specifications are listed in Table 1. For downsampling, we used the voxel-grid-filter algorithm, which divides the space into a grid and computes the center of gravity of the object points in each cell. This is equivalent to discretizing the space into a lattice and determining the drawing points. The lattice spacing determines the density of the drawing points. We varied the amount of downsampling and evaluated the reconstructed still and moving images.


Fig. 1. Electroholography system using an RGB-D camera and downsampling method.


Table 1. Specifications of Kinect for Windows v2.

2. Proposed method using downsampling

2.1 Photographing system

For the 3D real scene, we photographed a person in a room, arranged as shown in Fig. 2. The distance between the RGB-D camera and the person was 2.5 m. Figure 3 shows the captured color image (RGB: 1,920 $\times$ 1,080 pixels), and Fig. 4 shows the depth image (D: 512 $\times$ 424 pixels) at that time. In D, objects closer to the camera appear darker gray and objects farther away appear lighter gray. Portions with no depth information are filled with black; because no data can be acquired there, they are removed as outliers in the actual calculation. Figure 4 is shown in grayscale (256 tones) for visualization, but the sensor actually has 16-bit resolution in the depth direction. Figure 5 shows a point-cloud image (512 $\times$ 424 pixels) obtained from the RGB and depth images; each pixel has D in addition to RGB.
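As an illustration of how such a colored point cloud can be assembled from an RGB-D frame, the following Python sketch back-projects a 16-bit depth image through a pinhole camera model. This is our own minimal example, not the authors' code: the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical, and we assume the RGB image has already been registered to the 512 $\times$ 424 depth grid (the Kinect SDK provides such a mapping).

```python
import numpy as np

def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy, z_min=500, z_max=8000):
    """Back-project a 16-bit depth image (in mm) to a colored point cloud.

    Pixels outside the sensor's valid range (0.5-8.0 m for Kinect v2)
    are dropped as outliers. `rgb` is assumed to be registered to the
    depth frame, i.e. it shares the depth image's pixel grid.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth.astype(np.float64)
    valid = (z >= z_min) & (z <= z_max)              # remove outlier pixels
    z, u, v = z[valid], u[valid], v[valid]
    x = (u - cx) * z / fx                            # pinhole back-projection
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=1)             # N x 3, millimetres
    colors = rgb[valid]                              # N x 3, matching RGB
    return points, colors
```

With the scene of Sec. 3, this outlier removal is what reduces the 217,088 pixels to the 150,626 usable object points.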


Fig. 2. Shooting scenario by RGB-D camera.


Fig. 3. Captured color (RGB) image.


Fig. 4. Captured depth image.


Fig. 5. Captured RGB-D composite image.


2.2 Downsampling

The 3D space of the photographing system was 512 (x) $\times$ 424 (y) $\times$ $2^{16}$ (z: 16 bit), within which an object of 512 $\times$ 424 = 217,088 points was drawn. The SLM used in this experiment was a 1,920 $\times$ 1,080-pixel (2-million-pixel) LCD, which is widely used as a display. Because the area of the SLM is small, the image of a single object point reconstructed from it has large side-lobes. When multiple points are reconstructed simultaneously, their side-lobes interfere with each other; unnecessary images are therefore generated and the quality of the reconstructed image is degraded [11]. By downsampling, we can reduce the effect of side-lobe interference and improve the image quality. Although holography is a recording technique with high redundancy, it is difficult to correctly record object-point data exceeding 100,000 points on a 2-million-pixel hologram; in particular, the spatial resolution of high-density regions decreases. Therefore, in this study, we performed downsampling to coarsen the 3D space lattice of the real scene and verified its effect. The downsampling also reduced the load of the holographic calculation, and we evaluated the resulting image quality.

The downsampling algorithm is as follows. As shown in Fig. 6, the space is partitioned by a grid. We compute the center of gravity of the object points within each cell and replace those points with a single object point at the centroid. Denoting the length of one side of the unit cell by $\Delta$L, the object points are thus discretized at intervals of approximately $\Delta$L.
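A minimal NumPy sketch of this voxel-grid filter (our illustrative code, not the authors' implementation): each point is binned into a cubic cell of side $\Delta$L, and all points that share a cell are replaced by their centroid.

```python
import numpy as np

def voxel_grid_downsample(points, delta_l):
    """Voxel-grid filter: quantize space into cubic cells of side delta_l
    and replace the object points in each cell by their center of gravity."""
    cells = np.floor(points / delta_l).astype(np.int64)   # cell index per point
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)                         # guard against 2-D inverse
    n_cells = inverse.max() + 1
    sums = np.zeros((n_cells, points.shape[1]))
    counts = np.zeros(n_cells)
    np.add.at(sums, inverse, points)                      # accumulate per cell
    np.add.at(counts, inverse, 1)
    return sums / counts[:, None]                         # one centroid per cell
```

Larger `delta_l` merges more points per cell, which is exactly the knob varied in the experiments (Sec. 3) from 10 mm upward.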


Fig. 6. Downsampling method to quantize to a space lattice.


2.3 CGH calculation

In this study, we generated a phase hologram (kinoform [12]) using the Fresnel approximation formula for the acquired point-cloud model. The calculations are as follows:

$$I\left(x_\alpha,y_\alpha\right)=\sum_{j=1}^{N}{A_j \cos(kr_{\alpha j})}+i\sum_{j=1}^{N}{A_j \sin(kr_{\alpha j})}$$
$$\phi\left(x_\alpha,y_\alpha\right)=\arg{\left[I\left(x_\alpha,y_\alpha\right)\right]}$$
Here, the index $\alpha$ denotes a pixel on the hologram and $j$ an object point; $r_{\alpha j}$ is the distance between object point $j$ and pixel $\alpha$ on the hologram, $A_j$ is the intensity of the object point, $k$ is the wave number of the reference light, and $i$ is the imaginary unit. The function $\phi \left (x_\alpha ,y_\alpha \right )$ is the argument of $I\left (x_\alpha ,y_\alpha \right )$ and is what is recorded on the hologram. In the calculation of Eq. (1), we used a recurrence-formula algorithm [13], a speed-up method that exploits the equal spacing of the hologram pixels. Table 2 describes the computing environment used for the calculation.
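The following sketch evaluates Eqs. (1) and (2) directly on a small pixel grid. It is our illustration only: the paper uses the recurrence-formula speed-up of [13] rather than this naive per-point loop, and we assume the common Fresnel expansion $r_{\alpha j} \approx z_j + \left((x_\alpha - x_j)^2 + (y_\alpha - y_j)^2\right)/(2 z_j)$.

```python
import numpy as np

def kinoform(points, amps, width, height, pitch, wavelength):
    """Phase hologram (kinoform) of a point cloud.

    Eq. (1): complex field I = sum_j A_j * exp(i k r_aj)
    Eq. (2): phi = arg(I), the value written to the SLM
    r_aj is taken under the Fresnel approximation.
    """
    k = 2.0 * np.pi / wavelength
    xa = (np.arange(width) - width / 2) * pitch      # hologram pixel coordinates
    ya = (np.arange(height) - height / 2) * pitch
    X, Y = np.meshgrid(xa, ya)
    field = np.zeros((height, width), dtype=np.complex128)
    for (xj, yj, zj), aj in zip(points, amps):
        r = zj + ((X - xj) ** 2 + (Y - yj) ** 2) / (2.0 * zj)  # Fresnel approx.
        field += aj * np.exp(1j * k * r)             # A_j (cos + i sin), Eq. (1)
    return np.angle(field)                           # phi = arg I, Eq. (2)
```

At the experimental parameters (1,920 $\times$ 1,080 pixels, 8.0 μm pitch, 532 nm, ~150,000 points) this direct evaluation would be far slower than the recurrence method; it is meant only to make the equations concrete.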


Table 2. Development environment of CGH calculation.

2.4 Reconstruction optical system

The outline of the reconstructed electroholography system is shown in Fig. 7. The light emitted from the laser light source is converted into parallel beams through the collimator lens and is made incident on the SLM by the half mirror. The incident light is modulated by the CGH displayed on the SLM, and the object image is reconstructed.


Fig. 7. Electroholography system.


The reconstruction optical setup used in this study is shown in Fig. 8. The laser wavelength is 532 nm (green). The laser light was collimated into parallel beams using an objective lens and a collimator lens. A reflective LCD with a resolution of 1,920 $\times$ 1,080 pixels, a pixel pitch of ${8.0}\,{\mu \textrm {m}}$, and a size of 15.36 mm $\times$ 8.64 mm was used. A pinhole was used to remove the zero-order light. The reconstructed image was observed through the field lens; although reconstructed images could be observed directly without the field lens, we used it to capture the images easily with a digital camera.


Fig. 8. Optical setup.


3. Experiment

We increased the spatial resolution $\Delta$L (the length of one side of the cubic lattice) in 10 mm increments and verified the reconstructed image quality. Because the detection range of Kinect for Windows v2 is limited to between 0.5 and 8.0 m, the image captured by the depth sensor contains pixels whose depth information was not obtained correctly. We defined the values of these pixels as outliers. Because the 3D information of such outliers cannot be used, we eliminated them before the CGH calculation; the number of object points therefore decreased from 217,088 (512 $\times$ 424) to 150,626. We set these 150,626 object points as the original image. For a point-cloud model, the calculation time is proportional to the number of object points, as shown in Table 3 and Fig. 9. The actual RGB-D images up to $\Delta$L = 50 mm are shown in Fig. 10, and the holographic reconstructed images are shown in Fig. 11. In the future, we plan to develop an algorithm for evaluating 3D images quantitatively.


Table 3. Downsampling by spatial resolution $\Delta$L.


Fig. 9. CGH calculation time with respect to the number of object points.


Fig. 10. Downsampled point-cloud images.


Fig. 11. Holographic reconstructed images obtained from downsampled point-cloud images.


Figure 12 shows the reconstructed movies for the original image (Visualization 1), $\Delta$L = 30 mm (Visualization 2), and $\Delta$L = 50 mm (Visualization 3). Although the movie for $\Delta$L = 30 mm differs little from that of the original image, the lattice is conspicuous in the movie for $\Delta$L = 50 mm. In video playback, the afterimage effect occurs not only in human eyes but also in a digital video camera; thus, the degradation of video-image quality is generally not noticeable when the number of object points is decreased. Moreover, recording more than 100,000 object points on a CGH degrades the quality of the reconstructed images when a 2-million-pixel SLM is used. To overcome this problem, an algorithm has been proposed that applies time-division multiplexing to holographic reconstruction by dividing objects with more than 100,000 points into several point clouds of fewer than 100,000 points each [14].
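The splitting step of such a time-division scheme can be sketched as follows (our hypothetical helper; ref. [14] describes the full algorithm): the point cloud is cut into sub-clouds of at most 100,000 points, each rendered to its own CGH and displayed in turn, with persistence of vision fusing the reconstructions.

```python
def split_point_cloud(points, max_points=100_000):
    """Divide a point cloud into sub-clouds of at most max_points each,
    one sub-cloud per time slot of the time-division multiplexed display."""
    return [points[i:i + max_points]
            for i in range(0, len(points), max_points)]
```

For the 150,626-point original image of Sec. 3, this would yield two sub-clouds; downsampling instead keeps the whole scene within a single CGH.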


Fig. 12. Reconstructed movies.


4. Conclusion and discussion

In this paper, electroholographic reconstruction was performed by downsampling point-cloud data acquired using an RGB-D camera. Even when the original image was coarsened to a lattice spacing of $\Delta$L = 40 mm and the number of object points was reduced to 15%, a reconstructed image of acceptable quality was obtained, and the calculation time was shortened to 13 s per CGH. It has been reported that the CGH calculation of a point-cloud model can be accelerated by a factor of approximately 100 by using a graphics processing unit (GPU) [15]; with a GPU, this method should enable real-time reconstruction of moving images. We are continuing research in this regard.

The RGB-D camera used in this study can capture point-cloud data of only approximately 200,000 points. The downsampling method of this study may be even more useful when applied to large, high-density point clouds exceeding one million points. We are continuing research on this aspect as well.

In addition, research on color reconstruction is underway. Because a separate CGH must be generated for each of the R, G, and B channels, color generation takes three times as long. We expect that reducing the number of object points by downsampling will also lead to real-time color reconstruction.

References

1. P. St-Hilaire, S. A. Benton, M. E. Lucente, M. L. Jepsen, J. Kollin, H. Yoshikawa, and J. S. Underkoffler, “Electronic display system for computational holography,” Proc. SPIE 1212, 174–182 (1990). [CrossRef]

2. G. Lippmann, “Epreuves reversibles photographies integrals,” Comptes-Rendus Academie des Sciences 146, 446–451 (1908).

3. J. Arai, M. Kawakita, T. Yamashita, H. Sasaki, M. Miura, H. Hiura, M. Okui, and F. Okano, “Integral three-dimensional television with video system using pixel-offset method,” Opt. Express 21(3), 3474–3485 (2013). [CrossRef]  

4. Y. Ohsawa, K. Yamaguchi, T. Ichikawa, and Y. Sakamoto, “Computer-generated holograms using multiview images captured by a small number of sparsely arranged cameras,” Appl. Opt. 52(1), A167–A176 (2013). [CrossRef]  

5. H. Sato, T. Kakue, Y. Ichihashi, Y. Endo, K. Wakunami, R. Oi, K. Yamamoto, H. Nakayama, T. Shimobaba, and T. Ito, “Real-time colour hologram generation based on ray-sampling plane with multi-gpu acceleration,” Sci. Rep. 8(1), 1500 (2018). [CrossRef]  

6. D. Hiyama, T. Shimobaba, T. Kakue, and T. Ito, “Acceleration of color computer-generated hologram from rgb–d images using color space conversion,” Opt. Commun. 340, 121–125 (2015). [CrossRef]  

7. A.-H. Phan, M.-l. Piao, S.-K. Gil, and N. Kim, “Generation speed and reconstructed image quality enhancement of a long-depth object using double wavefront recording planes and a gpu,” Appl. Opt. 53(22), 4817–4824 (2014). [CrossRef]  

8. Y. Ogihara and Y. Sakamoto, “Fast calculation method of a cgh for a patch model using a point-based method,” Appl. Opt. 54(1), A76–A83 (2015). [CrossRef]  

9. T. Sugie, T. Akamatsu, T. Nishitsuji, R. Hirayama, N. Masuda, H. Nakayama, Y. Ichihashi, A. Shiraki, M. Oikawa, N. Takada, et al., “High-performance parallel computing for next-generation holographic imaging,” Nat. Electron. 1(4), 254–259 (2018). [CrossRef]  

10. K. Matsushima, “Computer-generated holograms for three-dimensional surface objects with shade and texture,” Appl. Opt. 44(22), 4607–4614 (2005). [CrossRef]  

11. M. Makowski, “Minimized speckle noise in lens-less holographic projection by pixel separation,” Opt. Express 21(24), 29205–29216 (2013). [CrossRef]  

12. L. Lesem, P. Hirsch, and J. Jordan, “The kinoform: a new wavefront reconstruction device,” IBM J. Res. Dev. 13(2), 150–155 (1969). [CrossRef]  

13. T. Shimobaba and T. Ito, “An efficient computational method suitable for hardware of computer-generated hologram with phase computation by addition,” Comput. Phys. Commun. 138(1), 44–52 (2001). [CrossRef]  

14. Y. Yamamoto, H. Nakayama, N. Takada, T. Nishitsuji, T. Sugie, T. Kakue, T. Shimobaba, and T. Ito, “Large-scale electroholography by horn-8 from a point-cloud model with 400,000 points,” Opt. Express 26(26), 34259–34265 (2018). [CrossRef]  

15. N. Masuda, T. Ito, T. Tanaka, A. Shiraki, and T. Sugie, “Computer generated holography using a graphics processing unit,” Opt. Express 14(2), 603–608 (2006). [CrossRef]  

Supplementary Material (3)

Visualization 1: The reconstructed movie for the original image.
Visualization 2: The reconstructed movie for $\Delta$L = 30 mm.
Visualization 3: The reconstructed movie for $\Delta$L = 50 mm.
