
Dual focal plane augmented reality interactive display with gaze-tracker

Open Access

Abstract

Stereoscopic augmented reality (AR) displays have a fixed focus plane and suffer from visual discomfort due to the vergence-accommodation conflict (VAC). In this study, we demonstrate a biocular (i.e. common optics for the two eyes, with the same image shown to both) dual focal-plane AR system with a real-time gaze tracker, which provides a novel interactive experience. To mitigate VAC, we propose a see-through near-eye display mechanism that generates two separate virtual image planes at arm's-length depths (i.e. 25 cm and 50 cm). Our optical system generates virtual images by relaying two liquid crystal displays (LCDs) through a beam splitter and a Fresnel lens. While the system is limited to two depths and discontinuity occurs in the virtual scene, it provides correct focus cues and a natural blur effect at the corresponding depths. This allows the user to distinguish virtual information through the accommodative response of the eye, even when the virtual objects overlap and partially occlude each other in the axial direction. The system also provides correct motion parallax cues within the movement range of the user without any need for sophisticated head trackers. A road scene simulation is realized as a convenient use-case of the proposed display: a large monitor is used to create a background scene and the content rendered on the LCDs is augmented onto the background. The field-of-view (FOV) is 60 × 36 degrees and the eye-box is larger than 100 mm, which is comfortable enough for two-eye viewing. The system includes a single-camera pupil and gaze tracker, which selects the correct depth plane based on the shift in the interpupillary distance with the user's convergence angle. The rendered content can be distributed to both depth planes and the background scene simultaneously. Thus, the user can select and interact with the content at the correct depth in a natural and comfortable way. The prototype system can be used in tasks that demand a wide FOV and multiple focal planes, and as an AR and vision research tool.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

There has been rapid growth in the development of three-dimensional (3D) virtual and augmented reality (AR) displays in recent years. Inward rotation of the eyes (convergence) and focus control (accommodation) are neurally coupled in natural vision [1]. In conventional see-through near-eye displays, a pair of parallax images is rendered on flat displays so that a virtual image is created at different depths according to the amount of binocular disparity, using separate optics for the two eyes. In this case, a conflict occurs between the vergence action of the eyes and the accommodation distance, known as the vergence-accommodation conflict (VAC) [2]. Many recent studies show that VAC is one of the most significant causes of visual discomfort [3] and it might even lead to errors in the perception of scene geometry [4]. Displaying AR content without visual discomfort is a challenging task from a technological perspective. Visual discomfort, including headaches, eye strain, and motion sickness associated with available head-mounted displays (HMDs), creates a less than desirable viewing experience. Mimicking natural vision requires rendering correct focus cues together with the natural blurring of images [5]. Several solutions have been proposed in the literature to mitigate the VAC problem in conventional HMDs. Methods that attempt to minimize the visual discomfort associated with VAC can be categorized [6] as Maxwellian view displays [7], vari-focal plane displays [8], multifocal plane (MFP) displays [9], integral imaging-based displays [10], computational multilayer displays [11], and computational holographic displays [12]. While all these methods reduce visual discomfort, they bring their own drawbacks, such as optical system complexity, narrow FOV, and small eye-box.

Computational holographic displays are the only solution that can provide all the natural depth cues and visual comfort, but at the cost of high computational requirements. The recently launched Magic Leap 1 is a wearable binocular headset that uses 6 layers of waveguides as optical relays in order to provide two depth planes [13]. Despite sophisticated depth sensors and pupil trackers, the system cannot render information at different depths simultaneously and cannot render objects at arm's length, since the depths of the focal planes are 1 m and 3 m [14]. Thus, even this sophisticated system cannot provide comfortable interaction with objects at arm's length due to severe VAC and visual discomfort. The entire scene is rendered in one depth plane, which is selected based on the gaze information computed using the pupil trackers.

We designed and implemented a dual-focal plane biocular AR prototype using two separate displays rendered at 25 cm and 50 cm distances to allow for an interactive wide-FOV display with an integrated gaze tracker. Biocular means both eyes share the same optics for viewing, while binocular means separate optics are used for each eye. In our proposed biocular display, the two eyes are presented with an identical image, so there should be no conflict between the cues for accommodation and vergence. It is important to note that biocular displays are still three dimensional, but because all elements of the image are presented with no binocular disparity, the perception is of a flat three-dimensional image located at a specified depth [15]. Our system has an advantage over fixed-focus AR systems since two focal planes are simultaneously present without a conflict between vergence and accommodation. Furthermore, the biocular implementation provides full motion parallax within the movement range of the user (about 5 cm in both the horizontal and vertical directions), which significantly enhances the 3D feeling and removes the complexity and inaccuracies related to head trackers needed in binocular displays. This second point is especially worth emphasizing, because during interaction with arm's-length objects, even minor inaccuracies of head trackers lead to noticeable vibrations and jumps in object positions, creating disturbing effects. Mitigating such effects significantly increases the implementation complexity of head-tracking units. Section 2 describes the system design and optical simulations, Section 3 gives experimental results for the display, and Section 4 is dedicated to the integrated pupil and gaze tracker and the interactive display demonstration.

2. System design and simulations

In this section, we discuss the design process of a spatially-multiplexed dual-focal plane biocular augmented reality prototype. To eliminate errors in the perception of scene geometry, correct focus cues must be provided at different depth levels. The depth resolution of the human eye is estimated as 1/7 D (D: diopters) due to its limited depth of field [16]. Thus, adjacent depth planes must be separated by 1/7 D for a continuous depth of field, and the range of human accommodation is at most 4 diopters for a typical near point. In this case, a display with the full accommodative range requires 28 image planes [17]. However, such an implementation is not feasible due to the lack of suitable transparent display technologies and the high computational power requirement. Furthermore, stacking a large number of displays is not practical. Therefore, we designed a dual-focal-plane see-through display, since it allows users to distinguish virtual objects at different depths through the accommodation action of the eye. While it causes a discontinuity for continuous-depth scenes, it is feasible in terms of form factor and it presents a novel experience for users. In today's consumer HMD applications, interaction at arm's length without VAC is a challenging task. Thus, the distances to the two depth planes are selected as 25 cm (4 D) and 50 cm (2 D).

The system provides correct focus cues in two focal planes with a natural blur effect. Although the proposed display does not require focus-tunable lenses or dynamic components, depth perception across the two focal planes is easily observable, as shown in the experimental results section.

2.1 Design

The dual focal planes are obtained by using two separate planar LCDs placed at different optical distances with respect to a single magnifier Fresnel lens. Both LCDs are viewed in full by both eyes, so the virtual images do not contain any parallax. Therefore, hardware-wise, our proposal is less demanding than conventional parallax-based approaches. Optical distances are optimized in ZEMAX to minimize the spot radius at different field points of the image. Virtual content is separated into two sections based on its depth information. The section closer to the viewer's eye appears at 25 cm (4 D). The rear section is displayed at 50 cm (2 D). Images are superimposed onto the real world through a planar beam splitter. A camera facing the eyes is used to track the user's convergence depth. Figure 1 illustrates the schematic drawing of the system. In Fig. 1, the far display (sketched in solid red) is placed 125 mm from the lens. The image of the far display (shown in transparent red) is formed 50 cm away from the viewer's eye. The physical distance between the near display (solid blue in Fig. 1) and the lens is 86 mm. The distance from the image of the near display to the user's eye is 25 cm. The distance between the eyes and the combiner is 40 mm. The combiner is placed 33 mm below the lens. The camera is used to track the user's gaze depth.

Fig. 1. Schematic illustration of the optical layout

We use two Topfoison TF60010A 1440 × 2560, 6.0” TFT LCD panels with a refresh rate of 60 Hz to display the rendered images. These displays can be driven simultaneously by a single portable computer. As the magnifier, a 6” (152.4 mm) focal length acrylic Fresnel lens with 1.5 mm thickness from Edmund Optics is used. We used a 50/50 flat combiner in front of the user's eyes with a thickness of 2 mm. We intentionally chose a very thin beam splitter to eliminate ghost artifacts while augmenting the content displayed on the LCDs.

We use the Unity 3D game engine and the C# programming language to render the virtual scene. Two virtual cameras are placed at the same virtual 3D position inside Unity 3D with different culling masks. Each virtual camera renders the scene with the associated culling mask and displays the rendered content on the corresponding LCD. Virtual objects with a dioptric depth greater than 3 diopters (i.e. closer than about 33 cm) are rendered on the near display, and virtual objects with a dioptric depth of less than 3 diopters are rendered on the far display, as illustrated in the sketch below. In this way, the rendered objects are distributed over one of the available fixed virtual depth planes.
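As a plain illustration of this partitioning rule (not the authors' Unity C# implementation), the short Python sketch below assigns an object to a plane by its dioptric depth; the 3-diopter boundary is simply the midpoint, in diopters, between the 4 D near plane and the 2 D far plane, and the function name is our own.

# Illustrative sketch (not the authors' Unity C# code): assign a virtual object
# to the near (25 cm, 4 D) or far (50 cm, 2 D) image plane by its dioptric depth.

NEAR_PLANE_DIOPTERS = 4.0   # 25 cm
FAR_PLANE_DIOPTERS = 2.0    # 50 cm
BOUNDARY_DIOPTERS = (NEAR_PLANE_DIOPTERS + FAR_PLANE_DIOPTERS) / 2  # 3 D

def assign_display(object_distance_m: float) -> str:
    """Return which display should render an object at the given distance (meters)."""
    depth_diopters = 1.0 / object_distance_m
    # Objects nearer than the 3 D boundary (closer than ~33 cm) go to the near display.
    return "near" if depth_diopters > BOUNDARY_DIOPTERS else "far"

if __name__ == "__main__":
    for d in (0.25, 0.30, 0.40, 0.50):
        print(f"{d:.2f} m -> {assign_display(d)} display")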

2.2 Optical simulations

In this section, we evaluate several attributes of our display through simulations. All simulations are performed in the ZEMAX optical design software, using a ZEMAX model of the Fresnel lens from the Edmund Optics official website [18]. The 3D optical layout of the setup is shown in Fig. 2, where the blue solid lines represent the rays emanating from the near display and the solid red lines represent the rays emanating from the far display.

Fig. 2. 3-D Optical Layout in ZEMAX

The proposed AR display achieves an overlapping biocular FOV of 60.0 degrees horizontally and 35.5 degrees vertically for both the near and far displays, where the inter-pupillary distance (IPD) is set to 65 mm in ZEMAX.

The eye-box is defined as the space within which the user's head can be decentered while a horizontal and vertical FOV of at least 10 degrees is still available to both eyes [19]. Based on this definition, the designed AR display has an eye-box of 105 mm along the horizontal axis and 66 mm along the vertical axis.

To analyze the optical distortion of the system, we first assume that one of the user's eyes is at the center of the eye-box. The distortion value of a single field point, in percent, is determined as

$$\textrm{Distortion}\ (\%) = 100 \times \frac{y_{chief} - y_{ref}}{y_{ref}},$$
where $y_{chief}$ is the chief ray height and $y_{ref}$ is the reference ray height for the undistorted ray, as provided by ZEMAX. Distortion values are recorded for distinct points across the available instantaneous FOV. We then assume the eyes are in their natural position, which means each eye is shifted by +IPD/2 or -IPD/2 along the horizontal axis. The simulation results of the optical distortion are shown in Fig. 3. Figure 3(a) illustrates the distortion values of different field points while the eye is located at the center of the eye-box. In Fig. 3(b), the eyes are shifted by 32.5 mm and the distortion values of the different field points across the FOV are reported for each eye. It is evident that the image distortion does not exceed 11% over the entire FOV when the eyes are in their natural position. The distortion value increases at the edges of the eye-box due to the limited clear aperture of the Fresnel lens.

Fig. 3. a) The distortion values of different field points, while the eye is in the center of the eye-box. b) The distortion values of different field points, while the eyes are in the natural position (i.e. off-centered by IPD/2 = 32.5 mm)

The spot diagram of the system in Fig. 4 shows the spot size variation for red, green, and blue wavelengths at 9 different field points at the center, edges, and corners, covering the entire FOV. The RMS spot radius varies from 367 µm to 667 µm, while the diffraction-limited Airy spot diameter is 164 µm for the 25 cm virtual image distance. This corresponds to an angular resolution of 1.5-3.0 cycles/degree. Image quality can be improved by using multiple lenses and glass optics.
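The quoted resolution figure can be sanity-checked with the back-of-envelope Python calculation below. It assumes that one resolvable cycle spans roughly two spot diameters (i.e. four RMS spot radii) at the 25 cm virtual image distance; this is our own assumption rather than a statement from the paper, but it reproduces approximately the 1.5-3.0 cycles/degree range.

# Hedged back-of-envelope check of the quoted angular resolution, assuming one
# resolvable cycle spans roughly two spot diameters (four RMS spot radii)
# at the 25 cm virtual image distance.
import math

VIRTUAL_IMAGE_DISTANCE_M = 0.25          # 25 cm near plane

def cycles_per_degree(rms_spot_radius_m: float) -> float:
    cycle_length_m = 4.0 * rms_spot_radius_m                     # one bright + one dark bar
    cycle_angle_deg = math.degrees(cycle_length_m / VIRTUAL_IMAGE_DISTANCE_M)
    return 1.0 / cycle_angle_deg

for r in (367e-6, 667e-6):
    print(f"RMS spot radius {r*1e6:.0f} um -> {cycles_per_degree(r):.1f} cycles/degree")
# Prints roughly 3.0 and 1.6 cycles/degree, consistent with the 1.5-3.0 range above.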

Fig. 4. The spot radius sizes for 9 sample points, which are selected as the center, edge and corner points of the display to cover the entire FOV. In this ZEMAX simulation, the eye is shifted by IPD/2 and the wavelengths are chosen to support the visible light spectrum (red color with 656.3 nm, green color with 587.6 nm and blue color with 486.1 nm). Positions of the field points in the FOV are shown in bottom-right.

As observed in Fig. 3 and Fig. 4, the proposed system suffers from optical distortion and chromatic aberration. Since the full image on each LCD is seen by both eyes, distortion and chromatic aberration compensation is not performed: any correction that improves the right-eye image quality results in an equal degradation of the left-eye image quality along the horizontal axis. Although the angular resolution is about 10× below the retinal resolution limit and the maximum optical distortion is 11%, the experimental setup has good image quality and effectively demonstrates the natural blurring effect, as discussed below.

3. Experimental results

In this section, we present the experimental results of the proposed system. We designed a road scene as a convenient use-case of the proposed AR display. A large LCD monitor (Dell 21.5”, 1920 × 1080 pixels) at a distance of 50 cm (2 D) is used as the background scene display. The virtual scene augmented onto the background is rendered in Unity 3D. We obtain two video outputs simultaneously from Unity 3D and display them on the LCDs synchronously. Furthermore, a chinrest is used to fix the head position for the gaze tracking process. A photo of the implemented bench-top prototype hardware is shown in Fig. 5.

Fig. 5. The prototype hardware including 2 LCDs for two focal planes and camera for simultaneously tracking two eyes and computing the gaze distance

While Fig. 6 illustrates the content shown on both LCDs for two frames, Fig. 7 shows the experimental results captured with a camera. The frames shown in Fig. 6 are displayed by the proposed system and a background scene is displayed on the monitor. Scenes provided by the display are captured with a Nikon D5300 DSLR camera. The camera aperture was set to f/10 with an exposure time of 1/800 seconds.

Fig. 6. a) Content appearing in the near display b) Content appearing in the far display

Fig. 7. Experimental captures of the virtual scene. a) The camera is focused on 4 diopters (25 cm). b) The camera is focused on 2 diopters (50 cm). FOV is 60.0 by 35.5 degrees and the image can be seen with both eyes simultaneously. (see Visualization 1 and Visualization 2)

In Fig. 7(a), when the camera is focused on the near focal plane, the speedometer (bottom-left), warning signs (upper-left), navigation map (upper-right) and an animation showing the car's technical status (bottom-right) are sharp; they are displayed on the near display as shown in Fig. 6(a). In this case, the blue arrow signs, hotel sign, and yellow bounding box are blurred; they are displayed on the far display as illustrated in Fig. 6(b). The experimental results show that the implemented display provides correct depth cues. The user can distinguish the superimposed virtual objects, even when they are in the same line of sight, thanks to the clear depth perception.

4. Gaze tracker for interaction

We also added a gaze tracker to interact with the virtual objects. The system uses a single camera (Goldmaster V-52 webcam) with a resolution of 640 × 480 pixels and does not require any infra-red illumination. The gaze tracker utilizes feature-based estimation [20] to make it less sensitive to variations in illumination and viewpoint. Furthermore, we aimed for an easy calibration process. The calibration process assumes a fixed head position. The output of the algorithm is a single binary value, indicating the display at which the user converges for the central region of the virtual scene.

To clarify the assumptions in the gaze tracking algorithm, two schematic drawings are provided in Fig. 8. A geometrical model is developed to estimate the convergence depth of the user, as shown in Fig. 8(a). Assuming the IPD of the user is 65 mm and using the geometry shown in Fig. 8(a), the difference in the rotation angle of each eye is computed as:

$$\Delta\theta = \theta_N - \theta_F = \tan^{-1}\!\left(\frac{IPD/2}{\textrm{Near depth}}\right) - \tan^{-1}\!\left(\frac{IPD/2}{\textrm{Far depth}}\right) = 3.7^\circ = 0.065\ \textrm{rad},$$
where $\theta_N$ and $\theta_F$ are the rotation angles of the eye when the user gazes at the near and far display, respectively.

Fig. 8. a) A geometrical model is developed to estimate the convergence depth of a user. b) Pupil centers illustrated on the left and right eyes. The blue and red dots respectively indicate the positions of the pupils when the user converges at the near display and the far display

Figure 8(b) shows the IPD shift between the near and far focus planes. Ellipses represent the left and right eyes, and the pupil positions are color coded: the blue dots indicate the positions of the pupils when the user converges at the near display, while the red dots indicate the positions when gazing at the far display. Based on the model in Fig. 8(a), the IPD difference is

$$\Delta IPD = 2\,\Delta x \approx 2 \times r_{\textrm{eye}} \times \Delta\theta = 1.56\ \textrm{mm},$$
where the radius of the eye is assumed to be 12 mm. The 640 horizontal pixels of the camera correspond to 128 mm for the fixed head and camera positions in our setup; therefore, the shift $\Delta IPD$ corresponds to approximately 8 pixels. Since a 2-diopter shift from 50 cm to 25 cm corresponds to 8 pixels, the camera limits the minimum measurable gaze change to about 0.25 diopters. The practical limit comes from noise and the algorithms. The accuracy can be improved by using a camera with a smaller FOV, a larger pixel count, and frame averaging.
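The numbers above can be reproduced with the short Python check below, using only quantities stated in the text (65 mm IPD, 25 cm and 50 cm planes, 12 mm eye radius, 640 pixels spanning 128 mm); the script itself is our own verification sketch, not part of the authors' software.

# Worked check of Eqs. (2)-(3) and the pixel-shift estimate, using the values
# stated in the text (IPD = 65 mm, planes at 25 cm and 50 cm, eye radius 12 mm,
# 640 camera pixels spanning 128 mm at the eye plane).
import math

IPD_MM = 65.0
NEAR_DEPTH_MM = 250.0
FAR_DEPTH_MM = 500.0
EYE_RADIUS_MM = 12.0
PIXELS_PER_MM = 640.0 / 128.0   # camera sampling at the eye plane

theta_near = math.atan((IPD_MM / 2) / NEAR_DEPTH_MM)
theta_far = math.atan((IPD_MM / 2) / FAR_DEPTH_MM)
delta_theta = theta_near - theta_far              # ~0.065 rad (~3.7 deg)

delta_ipd_mm = 2 * EYE_RADIUS_MM * delta_theta    # ~1.5-1.6 mm (1.56 mm if delta_theta is rounded to 0.065 rad)
delta_ipd_px = delta_ipd_mm * PIXELS_PER_MM       # ~8 pixels

print(f"delta_theta = {math.degrees(delta_theta):.1f} deg ({delta_theta:.3f} rad)")
print(f"delta_IPD   = {delta_ipd_mm:.2f} mm  (~{delta_ipd_px:.1f} pixels)")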

The main steps of the calibration, gaze tracking, and interaction process are shown in Fig. 9(a). The user looks at the near display for a while. The tracking algorithm finds the locations of the eye centers, the average of the eye-center locations is recorded, and $IPD_N$ is extracted in pixels. Afterward, the same process is repeated for the far display and $IPD_F$ is recorded for the user. After acquiring the near and far IPDs, a threshold is determined for robust decision making in the gaze tracking system. Determining the convergence depth requires robust pupil-center estimation for each eye and accurate IPD calculation in each frame. The details of the iris tracking and accurate eye-center estimation algorithm are shown in Fig. 9(b), and a sketch of the calibration and decision logic is given after Fig. 9.

Fig. 9. a) Calibration and tracking process. b) Details of the eye center and IPD estimation algorithm
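A minimal Python sketch of the calibration and per-frame decision logic described above follows; the function names, the midpoint threshold rule, and the sample pixel values are our own assumptions, with the per-frame IPDs taken to come from the eye-center detector of Fig. 10.

# Minimal sketch of the calibration and per-frame decision logic (our own
# illustration): average the per-frame IPDs recorded while the user fixates the
# near and then the far display, place the threshold midway, and decide per frame.
from statistics import mean

def calibrate(ipd_near_samples, ipd_far_samples):
    """Return a decision threshold (pixels) midway between the mean near/far IPDs."""
    ipd_near = mean(ipd_near_samples)   # smaller IPD: eyes converged on the 25 cm plane
    ipd_far = mean(ipd_far_samples)     # larger IPD: eyes converged on the 50 cm plane
    return (ipd_near + ipd_far) / 2.0

def gaze_plane(ipd_px, threshold_px):
    """Binary output: which display the user is currently converging on."""
    return "near" if ipd_px < threshold_px else "far"

# Hypothetical pixel values, roughly consistent with 65 mm IPD at ~5 px/mm and
# the ~8-pixel near/far shift derived above.
threshold = calibrate([321.4, 320.9, 321.7], [329.0, 329.5, 328.8])
print(gaze_plane(322.0, threshold))   # -> "near"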

Detailed steps of finding the accurate location of the eye center are shown in Fig. 10. For each frame, we extract a region of interest (ROI) for each eye using cascade classifiers, as illustrated in Fig. 10(a). Full-color and individual color-channel images were tried, and the red channel gave the best results in our experiments. The red-channel image, presented in Fig. 10(b), is segmented into two classes, iris and background, through thresholding with an optimized threshold value. After morphological operations on the segmented image, the rough eye-center position is determined, as presented in Fig. 10(c). The resulting image is transformed into polar coordinates around the initial estimate of the eye center, as shown in Fig. 10(d). In this case, the iris fills the left side of the polar-transformed image. Strong vertical edge points, indicated in the region between the dashed red lines in Fig. 10(d), correspond to the accurate radial edges of the iris. Figure 10(e) shows the image transformed back into Cartesian coordinates. In this figure, white points correspond to the accurate radial edges [21] of the iris found in the previous step. The best-fitting ellipse and its center are acquired through a direct least-squares method [22], as shown in Fig. 10(e). The ellipse's center corresponds to the accurate 2D eye-center position, and these centers are projected onto the original frame as illustrated in Fig. 10(f). A code sketch of this pipeline is given after Fig. 10.

Fig. 10. a) Region of interest for one eye b) Red-channel image c) Thresholded and segmented image d) Polar transformed image e) Radial edge points, the best-fitting ellipse, and its center (illustrated as blue dot) f) Final result on both eyes (computed pupil center illustrated as a green dot)
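The Python/OpenCV sketch below follows the same stages as Fig. 10 (eye ROI via a cascade classifier, red channel, thresholding and morphology, rough center, polar transform, radial edge points, ellipse fit); all parameter values, the stock Haar cascade, and the outermost-edge heuristic are our own assumptions and not the authors' exact implementation.

# Hedged OpenCV sketch of the eye-center pipeline in Fig. 10 (assumed parameters,
# not the authors' exact code).
import cv2
import numpy as np

eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")  # assumed stock OpenCV cascade

def eye_centers(frame_bgr, threshold_value=60, max_radius=80):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    eyes = eye_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    centers = []
    for (x, y, w, h) in eyes[:2]:                      # at most two eye ROIs
        red = frame_bgr[y:y+h, x:x+w, 2]               # red channel of the ROI
        # Segment the dark iris from the brighter background and clean it up.
        _, mask = cv2.threshold(red, threshold_value, 255, cv2.THRESH_BINARY_INV)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        m = cv2.moments(mask)
        if m["m00"] == 0:
            continue
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]   # rough iris center
        # Unwrap around the rough center and look for strong radial edges.
        polar = cv2.warpPolar(red, (max_radius, 360), (cx, cy),
                              max_radius, cv2.WARP_POLAR_LINEAR)
        edges = cv2.Canny(polar, 50, 150)
        pts = []
        for angle_deg in range(360):
            cols = np.flatnonzero(edges[angle_deg])
            if cols.size:                               # outermost strong edge ~ iris rim
                r = float(cols[-1])
                a = np.deg2rad(angle_deg)
                pts.append((cx + r * np.cos(a), cy + r * np.sin(a)))
        if len(pts) >= 5:                               # fitEllipse needs at least 5 points
            (ex, ey), _, _ = cv2.fitEllipse(np.array(pts, dtype=np.float32))
            centers.append((x + ex, y + ey))            # back to full-frame coordinates
    return centers                                      # up to two (x, y) pupil centers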

The gaze position of the user is estimated in each frame. We use the gaze depth-plane information to interact in real time with 3D virtual objects placed at the different convergence planes. To demonstrate the interaction, we created a proof-of-concept game using the Unity3D game engine [23]. The gaze tracker script is developed in the Python programming language using the OpenCV library [24]. To establish communication between the Unity3D game engine and the gaze tracker script, we used the NodeJS Mosca library as the MQTT broker [25]; a minimal sketch of this link is given after Fig. 11. We send the binary gaze depth information to the broker via a specific topic. The broker then sets the gaze state flag in Unity3D, and this flag is used in a C# script to change the virtual content on the proper display. Screenshots of the game together with the acquired IPD values are shown in Fig. 11, where two texts are displayed at different depths (i.e. on different displays). The user changes the color of the near object to red by converging at the near display, as shown in Fig. 11(a). The color of the far object turns red in Fig. 11(b), when the user converges at the far display.

Fig. 11. Interaction with gaze tracking. User's gaze is at the a) near display and b) far display. Based on the user's gaze, text color changes automatically.
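As a sketch of the Python side of this link, the snippet below publishes the binary gaze-depth flag over MQTT. It assumes the widely used paho-mqtt client, and the broker address, port, and topic name are hypothetical placeholders; the Unity side would subscribe to the same topic via the broker.

# Minimal sketch of the gaze-state link, assuming the paho-mqtt client.
# Broker address/port and the topic name "ar/gaze_depth" are hypothetical.
# Note: paho-mqtt 1.x constructor shown; 2.x additionally requires
# mqtt.CallbackAPIVersion.VERSION1 as the first argument.
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("localhost", 1883)        # Mosca broker assumed to run locally
client.loop_start()

def publish_gaze_state(is_near: bool):
    """Send the binary gaze-depth flag ("1" = near display, "0" = far display)."""
    client.publish("ar/gaze_depth", "1" if is_near else "0", qos=0)

# Example: called once per processed camera frame by the gaze tracker.
publish_gaze_state(True)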

The gaze tracking system was tested with 4 different users. The IPD distribution for convergence at the near and far display for these users is presented in Fig. 12(a). The distribution in each interval is represented by a box computed from data extracted from 180 frames. On each box, the central red line indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points and the outliers are plotted individually using the red '+' symbol [26]. Thresholds are determined such that the system can decide which display the user is looking at. The threshold value changes between users automatically, since the users have different nominal IPDs. We then repeated the gaze tracking and decision-making experiment 10 times by asking User-4 to look at the near and the far display alternately. We recorded the IPD value at each frame while the user converged on the near or far display, and each interval consists of 60 frames. The results are shown in Fig. 12(b). The pre-determined IPD threshold value, found in the calibration process, is illustrated by the dashed red line. It can be clearly observed that we obtained a robust gaze tracking system with only a few outliers. Our implemented gaze tracker is optimized for a limited number of people under certain illumination conditions; thus, the results might differ under different illumination conditions and for people with extreme IPDs or different eye shapes. Furthermore, the robustness of the system needs to be studied further with respect to head orientation and other variations.

Fig. 12. a) Measured IPD distribution of 4 users while looking at the near display and far display subsequently during the calibration process. Data extracted from 180 frames in each interval. Red dashed line shows the selected threshold value for each user. b) Repeated gaze tracking results with User-4 when the user repeatedly focuses on the near and then the far display. Each box consists of IPD measurements extracted from 60 frames. The threshold value found in the calibration process is illustrated by the dashed line.

5. Conclusion

Nearly all AR displays available on the market are fixed-focus stereoscopic displays, and they avoid rendering objects closer than 60 cm to limit VAC. In this study, we proposed and demonstrated a simple dual-focal plane AR display prototype that renders objects within arm's length. We implemented a fast and accurate gaze tracker using a single camera to determine whether the user is gazing at the near or the far display. The presented display reduces the VAC by letting the user perceive objects in two different depth planes. We achieved an overlapping biocular FOV of 60.0 degrees horizontally and 35.5 degrees vertically with an eye-box of 105 mm along the horizontal axis and 66 mm along the vertical axis. The optical distortion analysis performed in ZEMAX shows that the maximum distortion does not exceed 11% at the corners of the image for the natural position of the eyes. Depth perception and the natural blur effect can be clearly observed in the experiments for the two focal planes. Furthermore, the gaze tracker is integrated with the rendering system to automatically change the content based on the gaze distance. The experimental results show that the proposed AR display system is a promising platform for studying wide-FOV, gaze-tracked interactive AR displays. The proposed system constitutes a convenient table-top display especially targeted at 3D visualization and interaction with virtual objects within arm's length, and may be useful in medical, gaming, simulation, training, and design applications, and as a vision research tool. Currently, the system provides full motion parallax and a VAC-free visual experience, but the virtual content is restricted to two depth planes. Using time-multiplexing schemes in combination with fast switchable liquid crystal lenses, the number of virtual displays can be increased. Furthermore, the system can be miniaturized in form factor for use as a head-worn display, or modified to provide larger eye relief and larger focal-plane depths for use as a head-up display.

Funding

FP7 Ideas: European Research Council (IDEAS-ERC) (340200, 755154).

References

1. I. P. Howard and B. J. Rogers, “Development and pathology of binocular vision,” in Binocular Vision and Stereopsis (Oxford Scholarship Online, 1996), pp. 603–644. [CrossRef]  

2. T. Shibata, T. Kawai, K. Ohta, M. Otsuki, N. Miyake, Y. Yoshihara, and T. Iwasaki, “Stereoscopic 3-D display with optical correction for the reduction of the discrepancy between accommodation and convergence,” J. Soc. Inf. Disp. 13(8), 665 (2005). [CrossRef]  

3. D. M. Hoffman, A. R. Girshick, K. Akeley, and M. S. Banks, “Vergence-accommodation conflicts hinder visual performance and cause visual fatigue,” J. Vision 8(3), 33 (2008). [CrossRef]  

4. G. Mather and D. R. Smith, “Depth cue integration: Stereopsis and image blur,” Vision Res. 40(25), 3501–3506 (2000). [CrossRef]  

5. M. S. Banks, S. A. Cholewiak, G. D. Love, P. Srinivasan, and R. Ng, “ChromaBlur: Rendering Chromatic Eye Aberration Improves Accommodation and Realism in HMDs,” Imaging and Applied Optics 2017 (3D, AIO, COSI, IS, MATH, PcAOP) (2017).

6. H. Hua, “Enabling Focus Cues in Head-Mounted Displays,” Proc. IEEE 105(5), 805–824 (2017). [CrossRef]  

7. T. Ando and E. Shimizu, “Head-mounted display using holographic optical element,” Three-Dimensional Video and Display: Devices and Systems: A Critical Review (2001).

8. K. Akşit, W. Lopes, J. Kim, P. Shirley, and D. Luebke, “Near-eye varifocal augmented reality display using see-through screens,” ACM Transactions on Graphics 36(6), 1–13 (2017). [CrossRef]  

9. J. P. Rolland, M. W. Krueger, and A. Goon, “Multifocal planes head-mounted displays,” Appl. Opt. 39(19), 3209 (2000). [CrossRef]  

10. K. Akşit, J. Kautz, and D. Luebke, “Slim near-eye display using pinhole aperture arrays,” Appl. Opt. 54(11), 3422 (2015). [CrossRef]  

11. F. Huang, D. Luebke, and G. Wetzstein, “The light field stereoscope,” ACM SIGGRAPH 2015 Emerging Technologies (2015).

12. S. Kazempourradi, M. K. Hedili, B. Soner, A. Cem, E. Ulusoy, and H. Urey, “Full-Color Holographic Near-Eye Display with Natural Depth Cues,” in Proc. of the 18th International Meeting on Information Display (IMID) (2018).

13. J. G. Macnamara, “Three dimensional virtual and augmented reality display system,” patent (26 Jan. 2017).

14. B. T. Schowengerdt, D. Lin, and P. S. Hilaire, U.S. Patent No. US20180052277A1. (U.S. Patent and Trademark Office, Washington, DC, 2017).

15. S. Rushton, M. Mon-Williams, and J. P. Wann, “Binocular vision in a bi-ocular world: New-generation head-mounted displays avoid causing visual deficit,” Displays 15(4), 255–260 (1994). [CrossRef]  

16. J. P. Rolland, M. W. Krueger, and A. A. Goon, “Dynamic focusing in head-mounted displays,” Stereoscopic Displays and Virtual Reality Systems VI (1999).

17. K. Akeley, S. J. Watt, A. R. Girshick, and M. S. Banks, “A stereo display prototype with multiple focal distances,” ACM SIGGRAPH 2004 Papers (2004).

18. Edmund Optics, “Optics - Imaging - Photonics - Optomechanics - Lasers.” Retrieved from http://www.edmundoptics.com/

19. C. R. Spitzer, U. Ferrell, and T. Ferrell, Digital Avionics Handbook, 3rd ed. (CRC Press, 2017).

20. G. Iannizzotto and F. L. Rosa, “Competitive Combination of Multiple Eye Detection and Tracking Techniques,” IEEE Trans. Ind. Electron. 58(8), 3151–3159 (2011). [CrossRef]

21. E. Wood and A. Bulling, “EyeTab,” Proceedings of the Symposium on Eye Tracking Research and Applications (ETRA) (2014).

22. A. Fitzgibbon, M. Pilu, and R. Fisher, “Direct least squares fitting of ellipses,” Proceedings of 13th International Conference on Pattern Recognition (1996).

23. Unity. Retrieved from https://unity3d.com/

24. OpenCV library. Retrieved from http://www.opencv.org/

25. Mosca. Retrieved from http://www.mosca.io/

26. “Box plots,” MATLAB boxplot documentation. Retrieved from http://www.mathworks.com/help/stats/boxplot.html

Supplementary Material (2)

Visualization 1: Experimental video capture of the virtual scene
Visualization 2: Experimental video capture of the virtual scene
