Abstract
This study develops an interactive directional volumetric display that tracks a particular person and keeps displaying a directional image only to that person in real time. To achieve this interaction, we construct a person-tracking system and combine it with the directional volumetric display. The processing time of the algorithm limits real-time interaction; thus, we accelerate the algorithm using a graphics processing unit (GPU). The GPU implementation processed images comprising 64 × 64 pixels at 30 frames per second, with image quality sufficient for practical applications.
© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
A volumetric display can display volumetric images and three-dimensional (3D) objects in real 3D space [1–3]. The volumetric images and 3D objects can be observed from any viewing position without wearing equipment such as 3D glasses. Various types of volumetric displays have been developed and utilized in digital signage, media art, and medical applications [4–10].
Regarding the 3D objects, some previous studies [11,12] proposed algorithms for creating 3D objects like the one shown on the cover of the book “Gödel, Escher, Bach: An Eternal Golden Braid” [13]. These 3D objects represent different images depending on the viewing position; however, they can only represent binary images. In our previous study [14], we developed an algorithm that creates 3D objects representing gradated images depending on the viewing position. By displaying a 3D object created using this algorithm [14] on a volumetric display, we developed the directional volumetric display depicted in Fig. 1. Figures 1(a) and 1(b) show the directional volumetric display composed of light-emitting diodes (LEDs) and the one composed of threads and a projector, respectively. As shown in Fig. 1, we can observe directional images only from the designated viewing positions using just one volumetric image, because what we observe on a volumetric display is a superposition of each voxel layer of the volumetric image. The directional volumetric display illustrated in Fig. 1 can display both still and moving images. To make practical use of the directional volumetric display, we also developed algorithms to improve the image quality of the displayed images [15] and to remove the limitation on displayable images [16]. In this study, the algorithm for calculating each volume element (voxel) of the 3D object displayed on the directional volumetric display [14–16] is referred to as the voxel calculation algorithm.
We primarily aim to utilize the directional volumetric display as interactive media [17–19], which have recently attracted attention in the fields of digital signage and media art. Several studies on interactive media [20–22] display images and text depending on the detected gestures and movement of observers. Moreover, several studies on interactive volumetric displays [23,24] change the position and shape of a displayed volumetric image depending on the detected gestures of observers. These interactive media and interactive volumetric displays have attracted attention and added value to digital signage and media art. For the directional volumetric display, the direction of the display axis of the displayed image is one of the most significant factors, because only a person on the display axis can observe the displayed image. Therefore, implementing person tracking and real-time processing in the directional volumetric display is an interesting subject for interactive media. To the best of our knowledge, no study has developed interactive media that keep displaying a directional image only to a particular person. Therefore, this study develops an interactive directional volumetric display that tracks a particular person and keeps displaying a directional image only to that person in real time. To develop the interactive directional volumetric display, we construct a person-tracking system and combine it with the directional volumetric display.
Moreover, there are limitations on the image resolution and processing conditions for real-time processing of the voxel calculation algorithm, since the processing time increases because of the following two problems:
- The voxels are arranged in 3D; therefore, the computational cost of the voxel calculation algorithm increases explosively as the resolution of the displayed images becomes higher.
- An iteration process is required to improve the image quality of the displayed images. Thus, the processing time and the image quality are in a trade-off relation.
2. Methods
This section describes the construction of the person-tracking system, the voxel calculation algorithm, and the projection system used to fabricate the interactive directional volumetric display with GPU implementation.
2.1 Person-tracking system
We utilize an Intel RealSense D415 [30] as the sensor of the person-tracking system. The Intel RealSense D415 is a depth camera that can acquire 3D information from its environment. The specifications of the Intel RealSense D415 are presented in Table 1. The RealSense D415 has a depth field of view of $65^\circ \pm 2^\circ$ horizontally, $40^\circ \pm 1^\circ$ vertically, and $72^\circ \pm 2^\circ$ diagonally. Additionally, we use the RealSense software development kit (SDK) 2.0 (2.16.5) [31] and the NuiTrack SDK 1.3.8 [32] with the RealSense. We obtain 3D information from the RealSense via the RealSense SDK, and we use the NuiTrack SDK to detect faces and skeletons and obtain their coordinates.
As illustrated in Fig. 2, we set the coordinate system of the person-tracking system as follows. The coordinate system comprises the horizontal x- and z-axes and the vertical y-axis, with the origin at the RealSense. We define a vector $\bf {H}$ from the origin to the target person’s head $(H_x,H_y,H_z)$ and a reference vector $\bf {O}$ from the origin to $(0,H_y,C)$, where $C$ denotes an arbitrary positive number. We calculate the angle $\theta$ between $\bf {H}$ and $\bf {O}$ by solving Eq. (1), derived from the inner product of $\bf {H}$ and $\bf {O}$:
$$\theta = \cos^{-1}\left(\frac{\mathbf{H} \cdot \mathbf{O}}{|\mathbf{H}||\mathbf{O}|}\right). \tag{1}$$
We obtain the angle of the target person by evaluating Eq. (1) at every frame of the displayed images.
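As a concrete illustration, the angle computation of Eq. (1) can be sketched as follows (a minimal sketch; the function name and the choice $C = 1$ are ours, not part of the tracking system's API):

```python
import math

def tracking_angle(head, c=1.0):
    """Angle between the head vector H = (Hx, Hy, Hz) and the reference
    vector O = (0, Hy, C), computed from their inner product (Eq. (1)).
    Returns the angle in degrees."""
    hx, hy, hz = head
    ox, oy, oz = 0.0, hy, c
    dot = hx * ox + hy * oy + hz * oz
    norm_h = math.sqrt(hx * hx + hy * hy + hz * hz)
    norm_o = math.sqrt(ox * ox + oy * oy + oz * oz)
    return math.degrees(math.acos(dot / (norm_h * norm_o)))
```

For example, a head position of $(1, 0, 1)$ with $C = 1$ yields $\theta = 45^\circ$.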
2.2 Voxel calculation algorithm
Figure 3 depicts a scheme of the voxel calculation algorithm. Figure 3(a) demonstrates the relationship between voxels and input images, and Fig. 3(b) shows the iteration process for improving the image quality of the displayed images. We use three input images as an example in Fig. 3; however, in theory, there is no limitation on the number of images as long as no two display axes ($w_i$ in Fig. 3(a)) are parallel [14]. There are two variants of the voxel calculation algorithm: the multiplication algorithm [14,15] and the addition algorithm [16]. In the multiplication algorithm, the iteration process is calculated by applying Eqs. (2)–(4); in the addition algorithm, by applying Eqs. (5), (3), and (6).
Here, $N$ denotes the number of images, and $k$ denotes the number of iterations. The coordinate systems of the voxel calculation algorithm, $(x,\;y,\;z)$ and $(u_i, v_i, w_i)$, are set as shown in Fig. 3(a). The voxel values $V(x,\;y,\;z)^{(k)}$ are calculated in Eqs. (2) and (5) as the geometric mean or the sum, respectively, of the pixel values of the input images ${I_i(u_i,\;v_i)^{(k)}}$, where the first input images ${I_i(u_i,\;v_i)^{(0)}}$ are the original images $O_i(u_i,\;v_i)$. In Eq. (3), the pixel values of the displayed images $D_i(u_i,\;v_i)^{(k)}$ are calculated by adding the voxel values along each display axis. In Eqs. (4) and (6), the pixel values of the input images are updated by feeding back the ratio or the difference, respectively, between the pixel values of the original images and those of the displayed images. Because the calculations in Eqs. (2), (3), and (5) are executed on every voxel, the computational costs of these equations are high.
When one of the pixel values of the original images is ‘0’ in the multiplication algorithm, the voxel value becomes ‘0’ regardless of the pixel values of the other original images. Hence, if one of the original images contains numerous black pixels, in which the pixel values are ‘0’, we cannot correctly obtain the displayed images. In the addition algorithm, the voxel value does not become ‘0’ unless the pixel values of all the original images are ‘0’. Therefore, the addition algorithm can display any original image, whereas the multiplication algorithm cannot [16]. From the perspective of practical use, we apply the addition algorithm as the voxel calculation algorithm in this study.
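To make the structure of the addition algorithm concrete, the following NumPy sketch implements one possible two-image, orthogonal-axis version: image 1 is viewed along the z axis and image 2 along the x axis, each voxel is the sum of the two pixel values whose rays pass through it (Eq. (5)), the displayed images are the normalized sums of voxels along each display axis (Eq. (3)), and the inputs receive a damped difference feedback standing in for Eq. (6). The normalization by `n` and the damping factor `alpha` are our simplifications, not the exact update rule of [16].

```python
import numpy as np

def addition_iteration(o1, o2, iters=10, alpha=0.5):
    """Sketch of the addition-algorithm iteration for two original images.

    o1 is indexed [x, y] and viewed along z; o2 is indexed [z, y] and
    viewed along x. Returns the two displayed images after `iters`
    feedback iterations."""
    n = o1.shape[0]                      # cube side: the volume is n x n x n
    i1, i2 = o1.astype(float), o2.astype(float)
    for _ in range(iters):
        # Eq. (5): V[x, y, z] = I1[x, y] + I2[z, y]
        v = i1[:, :, None] + i2.T[None, :, :]
        # Eq. (3): sum voxels along each display axis (normalized by n)
        d1 = v.sum(axis=2) / n           # displayed image 1, on the (x, y) plane
        d2 = v.sum(axis=0).T / n         # displayed image 2, on the (z, y) plane
        # Eq. (6), simplified: damped difference feedback
        i1 += alpha * (o1 - d1)
        i2 += alpha * (o2 - d2)
    return d1, d2
```

In this sketch, when the two originals share the same per-row mean, the displayed images converge to the originals; in general a residual offset remains, which is the inherent cost of superposing two images in one volume.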
2.3 Projection system to fabricate an interactive directional volumetric display using GPU implementation
We develop a projection system to fabricate the interactive directional volumetric display using GPU implementation. Figure 4 demonstrates the flow of the projection system. Applying a method proposed in our previous study [5], we fabricate the interactive directional volumetric display composed of threads and a projector. The projection system is constructed using a CPU, a GPU, and OpenGL [33].
The CPU reads the original images and an angle from the person-tracking system. The GPU processes the voxel calculation algorithm and converts the voxel values into a projection image. The height of the projection image is adjusted based on the elevation angle of the light beams from the projector and the distance between each thread and the projector [5]. We use the compute unified device architecture (CUDA) version 9.0 [34] for the GPU implementation. CUDA is a parallel computing architecture provided by NVIDIA, which makes parallel computing on a GPU straightforward to implement [35]. For parallel computing, each GPU thread is assigned to one voxel or pixel. Using a pixel buffer object (PBO), the projection image calculated on the GPU is mapped into an OpenGL frame buffer. The PBO transfers pixel data to the OpenGL frame buffer through direct memory access and therefore achieves high-speed pixel data transfers.
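As an illustration of the one-thread-per-voxel assignment (a sketch of the index arithmetic only; the actual CUDA kernels of the projection system are not shown in this paper, and the dimension names are ours), a flat thread index can be decomposed into voxel coordinates as follows:

```python
def voxel_from_thread_id(tid, nx, ny, nz):
    """Map a flat GPU thread index to voxel coordinates (x, y, z),
    mirroring the index arithmetic a CUDA kernel would use when one
    thread is assigned to one voxel of an nx x ny x nz volume."""
    x = tid % nx
    y = (tid // nx) % ny
    z = tid // (nx * ny)
    return x, y, z
```

Launching `nx * ny * nz` threads with this mapping lets each thread evaluate Eqs. (5) and (3) for its own voxel independently, which is what makes the algorithm well suited to GPU parallelization.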
3. Results
3.1 Processing time of the projection system
In this study, we utilized an NVIDIA GeForce GTX 1050 for the GPU implementation; its specifications are presented in Table 2. For the CPU implementation, we used an Intel Core i5-6500 and Microsoft Visual C++, with OpenMP for multi-threading. The specifications of the Intel Core i5-6500 are presented in Table 3.
First, we describe the convergence of the image quality of the displayed images as a function of the number of iterations of the voxel calculation algorithm. To evaluate the image quality, we measured the structural similarity (SSIM) [36] values averaged over the red, green, and blue channels of the displayed images. Figure 5 depicts the convergence of the average SSIM values of displayed images comprising 64 $\times$ 64 pixels when two, three, and four original images are used with 1, 5, and 10 iteration(s). As shown in Fig. 5, the SSIM values converged at between 5 and 10 iterations. Consequently, we measured the processing time of the projection system with 1, 5, and 10 iteration(s).
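For reference, SSIM [36] combines luminance, contrast, and structure terms; a simplified global variant (statistics over the whole image instead of the usual sliding window, with the standard constants $K_1 = 0.01$ and $K_2 = 0.03$) can be sketched as:

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM with global statistics (Wang et al. [36] use a
    sliding window; this single-window variant keeps the formula but
    computes the statistics over the whole image)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()    # covariance of the two images
    return (((2 * mx * my + c1) * (2 * cxy + c2)) /
            ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

Identical images give an SSIM of 1; the value decreases as the images diverge. The paper averages such values over the R, G, and B channels of each displayed image.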
We present the processing time of the projection system using the CPU implementation and the GPU implementation. The processing times with two and three original images are presented in Tables 4 and 5, respectively. The processing time was obtained as the average over the first 30 frames of the original images. As demonstrated in Tables 4 and 5, the GPU implementation was on average 24 times, and up to 45 times, faster than the CPU implementation.
Finally, Fig. 6 illustrates the processing time of the GPU implementation using original images consisting of 64 $\times$ 64 pixels. As depicted in Fig. 6, when original images consisting of 64 $\times$ 64 pixels are used with five iterations, the GPU implementation can process the algorithm at 30 frames per second with up to six original images.
3.2 Simulation results of the projection system
Here, we present the simulation results of the interactive directional volumetric display using the projection system. Figures 7 and 8 show the simulation results using two original images comprising 20 $\times$ 20 and 64 $\times$ 64 pixels, respectively. Figures 7(a), 7(b), 8(a), and 8(b) show the original images. As shown in Figs. 7(c), 7(d), 8(c), and 8(d), the displayed images can be observed only from the designated viewing positions. Moreover, when we observe the interactive directional volumetric display from an undesignated viewing position, neither original image can be recognized, as shown in Figs. 7(e) and 8(e).
3.3 Interactive directional volumetric display
We present the system overview of the interactive directional volumetric display in Fig. 9 and the relationship between the directional volumetric display, the projector, the RealSense, and the angles of the observer in Fig. 10. The size of the display was 1.87 m in height and 0.95 m in width and depth. We used Vinymo MBT (NAGAI YORIITO Co., Ltd., Japan) threads with a fineness of 280 dtex. We used an MH550 projector (BenQ Japan Co., Ltd., Japan), installed at a distance of 1.05 m from the display and 0.70 m above the floor; its specifications are presented in Table 6. The height of the projected image was adjusted to 0.40 m. The host personal computer and the RealSense were installed beside the display.
Figure 11 shows the observation results of the interactive directional volumetric display; the results are also presented in Visualization 1. A comparison of the observation results of the directional volumetric display with and without the person-tracking system is presented in Visualization 2. The original images, comprising 20 $\times$ 20 pixels, are shown in Figs. 11(a) and 11(b). In this study, we displayed original image 1 at a fixed angle ($0^\circ$) and original image 2 at the angle of a moving target person. Figure 11(c) depicts displayed image 1. In addition, we can simultaneously observe displayed image 2 from a moving viewing position: $45^\circ$, $90^\circ$, and $135^\circ$, as illustrated in Figs. 11(d), 11(e), and 11(f), respectively. We remark that the original images could not be observed from an undesignated viewing position, as shown in Fig. 11(g).
As shown in Fig. 11 and Visualization 1, when the interactive directional volumetric display is observed from angles of $45^\circ$ and $135^\circ$, the visibility of the displayed images deteriorates compared with the image observed from an angle of $90^\circ$ because of the thread occlusion discussed in Section 4. However, the deterioration does not make the displayed images unrecognizable.
4. Discussion
The interactive directional volumetric display can keep presenting different visual information to different people. Therefore, it can be utilized as digital signage, such as multilingual signage, which could display information in each person’s language by identifying the language through image and voice recognition. In addition, by implementing other interactions such as gesture detection, the interactive directional volumetric display could be applied in the field of media art. Even when developing a new system that combines the directional volumetric display with other interactions as described above, an interaction using the person-tracking system remains essential, because only a person on the display axis can observe the displayed image of the directional volumetric display. GPU acceleration is also necessary to achieve these interactions in real time. Hence, this study provides a basis for interactions with the directional volumetric display and has a significant impact on the fields of digital signage and media art.
Our previous study [5] fabricated a directional volumetric display that displays directional images only at angles of $0^\circ$ and $90^\circ$ as depicted in Fig. 10. In contrast, this study confirms that displayed images can be observed over a wider range of angles using the same optical configuration as the previous study, which improves the usefulness of the directional volumetric display. However, the optical configuration causes problems with image quality: as shown in Fig. 11, the quality of the displayed images is low. This is caused by two problems of the directional volumetric display composed of threads and a projector. The first problem is a limitation on resolution. Regarding the vertical resolution, we can display an image with arbitrary vertical resolution within the projector resolution because the threads are continuous in the vertical direction. In contrast, there is a limitation on the horizontal resolution. Based on the theory proposed in the previous study, we can display an image consisting of up to 43 pixels horizontally using a projector with a resolution of 1920 $\times$ 1080. In practice, however, we can only display an image consisting of approximately 20 pixels horizontally owing to the fabrication of the display. The second problem is the occlusion of threads. The method used to determine the positions of the threads presupposes that the display is observed only from angles of $0^\circ$ and $90^\circ$ as shown in Fig. 10. Therefore, when the observer is not at an angle of $0^\circ$ or $90^\circ$, thread occlusion occurs, which makes fewer threads observable and generates noise in the displayed images. On the other hand, the quality of the displayed images shown in Figs. 7 and 8 is much better than that shown in Fig. 11.
This means that the interactive directional volumetric display has the potential to display high-quality images. Therefore, for practical use, a volumetric display capable of displaying high-quality, high-resolution images should be employed as the interactive directional volumetric display.
Although we used a single NVIDIA GeForce GTX 1050, an inexpensive and readily available GPU, the GPU implementation was up to 45 times faster than the CPU implementation. Therefore, by utilizing a higher-performance GPU or multiple GPUs, we infer that higher-resolution images can be calculated in real time. Moreover, the person-tracking system constructed in this study does not cover the whole circumference of the interactive directional volumetric display. To achieve person tracking over the whole circumference, the two options shown in Fig. 12 are feasible. The first option (Fig. 12(a)) applies multiple RealSenses; however, the integration of the data obtained from multiple RealSenses may require complex processing and increase the processing time of the projection system. The second option (Fig. 12(b)) requires only one RealSense, as in this study; therefore, it could be a solution for achieving person tracking over the whole circumference of the interactive directional volumetric display.
5. Conclusion
In this study, we developed an interactive directional volumetric display by combining a person-tracking system with the directional volumetric display. We observed that the interactive directional volumetric display tracked a particular person and kept displaying a directional image only to that person. We also developed a projection system to fabricate the interactive directional volumetric display using GPU implementation. Consequently, the GPU implementation was up to 45 times faster than the CPU implementation and could process up to six original images consisting of 64 $\times$ 64 pixels with five iterations at 30 frames per second. Based on these results, GPU acceleration relaxes the limitations on the resolution of the original images, the number of iterations, and the number of original images for real-time processing.
Funding
Japan Society for the Promotion of Science (18K11599).
Disclosures
The authors declare no conflicts of interest.
References
1. B. G. Blundell, A. J. Schwarz, and D. K. Horrell, “Volumetric three-dimensional display systems: their past, present and future,” Eng. Sci. Educ. J. 2(5), 196–200 (1993). [CrossRef]
2. D. L. MacFarlane, “Volumetric three-dimensional display,” Appl. Opt. 33(31), 7453–7457 (1994). [CrossRef]
3. G. E. Favalora, “Volumetric 3D displays and application infrastructure,” Computer 38(8), 37–44 (2005). [CrossRef]
4. M. Parker, “Lumarca,” in ACM SIGGRAPH ASIA 2009 Art Gallery & Emerging Technologies: Adaptation, (ACM, New York, NY, USA, 2009), SIGGRAPH ASIA ’09, p. 77.
5. A. Shiraki, M. Ikeda, H. Nakayama, R. Hirayama, T. Kakue, T. Shimobaba, and T. Ito, “Efficient method for fabricating a directional volumetric display using strings displaying multiple images,” Appl. Opt. 57(1), A33–A38 (2018). [CrossRef]
6. R. Hirayama, A. Shiraki, H. Nakayama, T. Kakue, T. Shimobaba, and T. Ito, “Operating scheme for the light-emitting diode array of a volumetric display that exhibits multiple full-color dynamic images,” Opt. Eng. 56(7), 073108 (2017). [CrossRef]
7. R. Hirayama, M. Naruse, H. Nakayama, N. Tate, A. Shiraki, T. Kakue, T. Shimobaba, M. Ohtsu, and T. Ito, “Design, implementation and characterization of a quantum-dot-based volumetric display,” Sci. Rep. 5(1), 8472 (2015). [CrossRef]
8. S. K. Nayar and V. N. Anand, “3D Display Using Passive Optical Scatterers,” Computer 40(7), 54–63 (2007). [CrossRef]
9. K. Kumagai, S. Hasegawa, and Y. Hayasaki, “Volumetric bubble display,” Optica 4(3), 298–302 (2017). [CrossRef]
10. M. Gately, Y. Zhai, M. Yeary, E. Petrich, and L. Sawalha, “A Three-Dimensional Swept Volume Display Based on LED Arrays,” J. Display Technol. 7(9), 503–514 (2011). [CrossRef]
11. G. Sela and G. Elber, “Generation of view dependent models using free form deformation,” Visual Comput. 23(3), 219–229 (2007). [CrossRef]
12. N. J. Mitra and M. Pauly, “Shadow Art,” ACM Trans. Graph. 28(5), 1–7 (2009). [CrossRef]
13. D. R. Hofstadter, Gödel, Escher, Bach: An Eternal Golden Braid (Basic Books, Inc., New York, NY, USA, 1979).
14. H. Nakayama, A. Shiraki, R. Hirayama, N. Masuda, T. Shimobaba, and T. Ito, “Three-dimensional volume containing multiple two-dimensional information patterns,” Sci. Rep. 3(1), 1931 (2013). [CrossRef]
15. R. Hirayama, H. Nakayama, A. Shiraki, T. Kakue, T. Shimobaba, and T. Ito, “Image quality improvement for a 3D structure exhibiting multiple 2D patterns and its implementation,” Opt. Express 24(7), 7319–7327 (2016). [CrossRef]
16. A. Shiraki, D. Matsumoto, R. Hirayama, H. Nakayama, T. Kakue, T. Shimobaba, and T. Ito, “Improvement of an algorithm for displaying multiple images in one space,” Appl. Opt. 58(5), A1–A6 (2019). [CrossRef]
17. J. Müller, F. Alt, D. Michelis, and A. Schmidt, “Requirements and Design Space for Interactive Public Displays,” in Proceedings of the 18th ACM International Conference on Multimedia, (ACM, New York, NY, USA, 2010), MM ’10, pp. 1285–1294.
18. T. Ojala, V. Kostakos, H. Kukka, T. Heikkinen, T. Linden, M. Jurmu, S. Hosio, F. Kruger, and D. Zanni, “Multipurpose Interactive Public Displays in the Wild: Three Years Later,” Computer 45(5), 42–49 (2012). [CrossRef]
19. K. Kuikkaniemi, G. Jacucci, M. Turpeinen, E. Hoggan, and J. Müller, “From Space to Stage: How Interactive Screens Will Change Urban Life,” Computer 44(6), 40–47 (2011). [CrossRef]
20. A. D. Wilson, “TouchLight: An Imaging Touch Screen and Display for Gesture-Based Interaction,” in Proceedings of the 6th International Conference on Multimodal Interfaces, (ACM, New York, NY, USA, 2004), ICMI ’04, pp. 69–76.
21. H. Benko, A. D. Wilson, and R. Balakrishnan, “Sphere: Multi-touch Interactions on a Spherical Display,” in Proceedings of the 21st Annual ACM Symposium on User Interface Software and Technology, (ACM, New York, NY, USA, 2008), UIST ’08, pp. 77–86.
22. I. Stavness, B. Lam, and S. Fels, “pCubee: A Perspective-Corrected Handheld Cubic Display,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, (ACM, New York, NY, USA, 2010), CHI ’10, pp. 1381–1390.
23. R. Balakrishnan, G. W. Fitzmaurice, and G. Kurtenbach, “User interfaces for volumetric displays,” Computer 34(3), 37–45 (2001). [CrossRef]
24. T. Grossman, D. Wigdor, and R. Balakrishnan, “Multi-finger Gestural Interaction with 3D Volumetric Displays,” in Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology, (ACM, New York, NY, USA, 2004), UIST ’04, pp. 61–70.
25. J. D. Owens, M. Houston, D. Luebke, S. Green, J. E. Stone, and J. C. Phillips, “GPU Computing,” Proc. IEEE 96(5), 879–899 (2008). [CrossRef]
26. D. Luebke and G. Humphreys, “How GPUs Work,” Computer 40(2), 96–100 (2007). [CrossRef]
27. F. Xu and K. Mueller, “Accelerating Popular Tomographic Reconstruction Algorithms on Commodity PC Graphics Hardware,” IEEE Trans. Nucl. Sci. 52(3), 654–663 (2005). [CrossRef]
28. M. Beister, D. Kolditz, and W. A. Kalender, “Iterative reconstruction methods in X-ray CT,” Phys. Medica 28(2), 94–108 (2012). [CrossRef]
29. A. Eklund, P. Dufort, D. Forsberg, and S. M. LaConte, “Medical image processing on the GPU - Past, present and future,” Med. Image Anal. 17(8), 1073–1094 (2013). [CrossRef]
30. “Depth Camera D415 - Intel® RealSense™ Depth and Tracking Cameras,” https://www.intelrealsense.com/depth-camera-d415/.
31. “Release Intel® RealSense™ SDK 2.0 (build 2.16.5),” https://github.com/IntelRealSense/librealsense/releases/tag/v2.16.5.
32. “Nuitrack Full Body Skeletal Tracking Software - Kinect replacement for Android, Windows, Linux, iOS, Intel RealSense, Orbbec,” https://nuitrack.com/.
33. “OpenGL - The Industry Standard for High Performance Graphics,” https://www.opengl.org/.
34. “CUDA Toolkit 9.0 Downloads | NVIDIA Developer,” https://developer.nvidia.com/cuda-90-download-archive.
35. J. Nickolls, I. Buck, M. Garland, and K. Skadron, “Scalable Parallel Programming with CUDA,” Queue 6(2), 40–53 (2008). [CrossRef]
36. A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]