
Pupil replication waveguide system for autostereoscopic imaging with a wide field of view

Open Access

Abstract

Augmented reality head-up displays (HUDs) require virtual-object distance matching to the real scene across an adequate field of view (FoV). At the same time, pupil-replication-based waveguide systems provide a wide FoV while keeping the HUD compact. To enable 3D imaging and virtual-object distance matching in such waveguide systems, we propose a time-sequential autostereoscopic imaging architecture that uses synchronized multi-view picture-generation and eyebox-formation units. Our simulation setup, built to validate the feasibility of the system, yields an FoV of 15° × 7.5° with clear, crosstalk-free images at a resolution of 60 pix/deg for each eye. Our proof-of-concept prototype with reduced specifications yields results that are consistent with the simulation in terms of viewing-zone formation: viewing zones for the left and right eyes can be clearly observed in the plane of the eyebox. Finally, we discuss how the initial distance of the virtual image can be set for quantified fatigue-free 3D imaging and how the FoV can be further extended in this type of waveguide system by varying the parameters of the eyebox formation unit.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

Corrections

20 October 2021: Typographical corrections were made to the body text, Eq. (2), and Figs. 6 and 8.

1. Introduction

Augmented reality (AR) devices have attracted considerable attention for their applicability to many aspects of daily life, ranging from assistive technologies to entertainment. One such AR device is the automotive head-up display (HUD), which uses AR to improve driving safety and comfort. A critical feature of HUDs is matching the depth of virtual objects to the real scene. Several approaches have already been investigated in this regard, such as multi- or variable-depth HUDs [1–3], autostereoscopic or light-field-based systems with eye tracking [4,5], and holographic displays, which range from computer-generated holograms (CGHs) with a single depth and scattering screens [6] to multi-depth CGHs enabling 3D vision cues [7–9].

Each of these technologies offers certain advantages as well as drawbacks. Multi-depth technologies use simple architectures; however, their depth matching is limited to a certain number of planes. The variable-depth approach mitigates the depth-matching issue at the expense of mechanics-related limitations such as vibration and latency. Autostereoscopic or light-field approaches can afford several depths, similar to the holographic approach, which implements all 3D vision cues. However, these approaches still rely on freeform-mirror-based projection systems, so extending the field of view (FoV) drastically increases the size and packaging volume of the HUD system. To achieve a compact setup with a wide FoV, several technologies based on holography [10–12], Pancharatnam–Berry phase lenses [13], and waveguides [14] have been proposed. Lens-based approaches must account for dispersion effects, which imposes stringent requirements on source bandwidth and requires additional chromatic aberration corrections. The waveguide approach, on the other hand, can be implemented with relatively large design freedom.

The waveguide approach is very useful for reducing the HUD system volume. In this approach, the pupil replication technique ensures that the full image can be observed while the picture-generation system is reduced in size [15]. In head-mounted displays (HMDs) using waveguides, stereoscopic 3D images are generally implemented by delivering the image of interest via a waveguide for each eye. Such a device can be implemented with a single display [16] or with separate displays for each eye [17]. With a single waveguide, however, the user can only observe the same image with both eyes. Meanwhile, it is difficult to implement two waveguides in an HUD owing to space constraints, and thus current waveguide-based technologies offer only 2D images to the driver. However, the main thrust of HUD development is to enable 3D-image observation toward increasing driver comfort and safety. Here, we note that 3D images can be provided by means of the stereo-image technique, wherein separate images are viewed by the left and right eyes.

Against this backdrop, we propose an architecture to implement stereoscopic images with a single waveguide. By “single waveguide,” we refer to a waveguide layout with a full eyebox shared by both eyes, although the waveguide unit itself can be single-layer or multi-layer. By modifying the picture-generation unit to generate different images for each eye within the same FoV and adding an eyebox view formation unit, our approach provides the driver with two 2D images whose features are offset from each other according to binocular disparity. The brain merges them into a 3D perspective, thereby improving the depth matching of virtual objects with the real scene. In this study, we explore the main architecture and propose common design principles for such systems. We particularly focus on system implementation with autostereoscopic viewing-zone formation in time-sequential mode as a practical solution. Moreover, we construct a simulation setup and an actual prototype to validate our concept. Finally, we discuss our results and future work in this direction.

2. Concept

We first present and validate the concept of stereo-image formation for HUD application to deliver binocular-disparity-based 3D images to the driver. In this case, the matching of the final virtual-image depth and the real-scene object depth is achieved by the offset between the stereo-pair images, as shown in Fig. 1(a). The offset value corresponds to the view difference between the observation positions of the left and right eyes [18]. By varying this offset, the final 3D image can be generated at various distances. The pupil replication waveguide typically operates with an image at infinity: the display is placed in the focal plane of the projection lens, so the pixel data are converted into an angular distribution and then distributed over the eyebox region, where they are finally perceived by the eyes. We can thus encode the distance to virtual objects using the offset, as shown in Fig. 1(b).


Fig. 1. (a) Concept of 3D virtual-image generation at various distances based on offset-induced binocular disparity and (b) potential implementation with a pupil replication waveguide system.


However, although the waveguide-based approach with pupil replication has proved its adaptability to both HMD and HUD applications, offering system compactness as well as a sufficient eyebox size, a single waveguide delivers only a 2D virtual image to both eyes. To realize 3D images with a single-waveguide-based system, we consider a novel method that establishes at least two viewing zones at the eyebox plane with a single waveguide. Figure 2 shows our proposed architecture. In addition to the waveguide with pupil replication, we introduce two key units: a multi-view picture-generation unit (MV-PGU) to generate images for the left and right eyes and a multi-view eyebox formation unit (MV-EFU) to form viewing zones in accordance with the eye position and rendered content. We use the term “multi-view” here because the number of viewpoints to be formed need not be limited to two.


Fig. 2. Architecture for autostereoscopic imaging using a pupil replication waveguide system with synchronized multi-view picture generation and eyebox formation units.


In contrast to conventional PGUs, the MV-PGU can generate several content images within the same FoV, wherein each image has a distinguishing feature based on, for example, wavelength, polarization state, or time. Although any of these features can be implemented by means of suitable architecture, we describe here a system based on time difference, i.e., time-sequential MV-PGU. The time-sequential mode provides alternate-view formation and content representation for each eye. The image source displays content corresponding to the view angle for the left eye at each “odd” moment of time t1, t3,…, t2n-1, whereas content for the right eye is displayed at each “even” moment t2, t4,…, t2n, as shown in Fig. 3(a).


Fig. 3. (a) Schematic of multi-view picture-generation unit (MV-PGU) affording two views within the same field of view (FoV). (b) Time-sequential-mode operation of the spatial mask (SM).


In our setup, the MV-PGU is a compact optical system that creates two virtual images at infinity with a small exit pupil (∼5 mm diameter). The exit pupil is aligned with the in-coupling element of the waveguide. The waveguide system is implemented with the pupil replication technique. Both the in-coupling and out-coupling elements are surface-relief gratings (SRGs). However, this choice is not compulsory; other types of elements, such as semi-transparent mirrors or micro-prisms, can also be used. In our design, we used a waveguide architecture with 2D pupil replication to achieve an eyebox size of 130 mm × 60 mm.

The MV-EFU is a thin optical module that separates light based on a distinguishing parameter (here, time) to form viewing zones at the eyebox plane. This module consists of a stack of lens arrays (LA1 and LA2, where LA1 is the bottom lens array, closer to the waveguide, and LA2 is the top lens array, closer to the eyebox) with a spatial mask (SM) positioned between them. The lens stack is similar to a Gabor superlens [19–21]; however, it has several specific features. The focal lengths and positions of LA1 and LA2 are chosen to form a telescopic system in each channel. In this case, light out-coupled from the waveguide preserves its direction, and the virtual image is still formed at infinity after passage through both arrays. The SM is positioned at the conjugate plane of the eyebox with respect to LA2, as shown in Fig. 2.

The SM is the key element of the MV-EFU; it is composed of a thin absorbing layer with embedded segments of transmitting filters. Each segment is aligned with the corresponding lens in the lens array, so the number of segments matches the number of lenses in each layer. The number of filters in each segment corresponds to the number of viewing zones at the eyebox. In the time-sequential case, the number of views is two, and therefore only two filters are required per segment. In the time-sequential mode, the SM acts as an active element with time-dependent transmittance; for example, the SM can be a liquid-crystal display (LCD) panel. In the LCD case, the transmitting filters are LC pixels in two operational states, on and off, corresponding to transmission or absorption based on polarization. The SM is synchronized with the MV-PGU to ensure that the views at the eyebox are formed alternately with a frequency of at least 25 fps for each eye. Figure 3(b) depicts this synchronization. Eye-tracking can additionally be used to acquire the eye position and update the pixel allocation at the SM accordingly.
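The required synchronization between the MV-PGU and the SM can be summarized as a simple control loop. The sketch below is our own illustration, not the authors' implementation; the driver interfaces get_eye_position, set_sm_pattern, show_frame, and wait_for_vsync are hypothetical placeholders.

```python
# Minimal sketch of the time-sequential control loop: the SM pattern and the
# displayed content are switched together so that each eye receives its own view.
# All hardware-facing calls below are hypothetical placeholders.
import itertools

VIEWS = ("left", "right")   # two views in the time-sequential case

def run_time_sequential(eye_tracker, display, spatial_mask, left_img, right_img):
    content = {"left": left_img, "right": right_img}
    for view in itertools.cycle(VIEWS):
        eye_pos = eye_tracker.get_eye_position()     # update pixel allocation at the SM
        spatial_mask.set_sm_pattern(view, eye_pos)   # open only this view's filters
        display.show_frame(content[view])            # content rendered for this viewpoint
        display.wait_for_vsync()                     # keep the pair above 2 x 25 fps in total
```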

To allocate pixels to the left and right eyes, we perform the following calculation. As mentioned above, the SM and eyebox planes are conjugate with respect to the second lens array LA2. This means that LA2 creates an image of the SM at the eyebox, and therefore, the mask position can be calculated as per Eq. (1):

$$a_{SM} = 1/\left( 1/f_{LA2} - 1/a^{\prime}_{SM} \right),$$
where a′SM denotes the distance from LA2 to the eyebox plane (the so-called eye relief), aSM the distance from the SM to LA2, and fLA2 the focal length of LA2. In this case, the width of a single SM segment can be calculated via the magnification relation MLA2= a′SM/aSM, as wSM = wEB/MLA2, where wEB denotes the eyebox width.
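As a numerical check of Eq. (1), the following sketch computes the SM position and segment width; the values used here are those adopted later in the simulation (fLA2 = 10 mm, eye relief 750 mm, eyebox width 130 mm) and are assumed only for illustration.

```python
# Sketch: SM position and segment width from Eq. (1); example values only.
f_LA2 = 10.0        # focal length of LA2, mm
a_prime_SM = 750.0  # eye relief: distance from LA2 to the eyebox plane, mm
w_EB = 130.0        # eyebox width, mm

a_SM = 1.0 / (1.0 / f_LA2 - 1.0 / a_prime_SM)  # Eq. (1): distance from SM to LA2
M_LA2 = a_prime_SM / a_SM                      # magnification of LA2
w_SM = w_EB / M_LA2                            # width of a single SM segment

print(f"a_SM = {a_SM:.2f} mm, M_LA2 = {M_LA2:.1f}, w_SM = {w_SM:.2f} mm")
# -> a_SM = 10.14 mm, M_LA2 = 74.0, w_SM = 1.76 mm
```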

The mask segments consist of LC pixels, and each segment therefore has a pixel pitch; this allows us to distinguish between static and dynamic pixel-pitch values. The static pixel pitch of the mask is illustrated in Fig. 4. This pitch determines the positions of the viewing-zone profiles formed by each lens in LA2. To align all the profiles suitably, PS should match the pitch of LA2, PLA2, such that the following condition holds for every k:

$$N \times P_S = k \times P_{LA2} \times \left( 1 + 1/M_{LA2} \right) \quad \forall k \in \{ 1, \ldots, N_{LA2}/2 \},$$
where N denotes an integer, 1/MLA2 = aSM/a′SM the inverse magnification of LA2, and NLA2 the number of lenses in LA2. The single viewing zone corresponding to one eye is the sum of the illuminance profiles of all mask segments imaged by the lenses of LA2. When PS matches PLA2, the points x1 and x2 are imaged onto coinciding points x′1 = x′2, all the profiles are aligned, and a wide low-crosstalk margin results, as shown in Fig. 4(a). Here, we define IV as the illuminance of a single view summed over all Ii, where Ii denotes the illuminance of a single view from each lens in LA2, with i ranging from 1 to NLA2.


Fig. 4. Concept of static pitch. (a) Case of static pitch PS matching with lens pitch PLA2, resulting in a wide low-crosstalk margin. (b) Case of static pitch PS not matching with lens pitch PLA2, resulting in a narrow low-crosstalk margin.


The dynamic pitch of the SM determines a single spatial shift of the SM segment along the horizontal direction. When the data from the eye-tracking camera show that the eye movement has reached a certain limit, we shift the SM content by one pixel, which equals the dynamic pitch. The dynamic pitch therefore determines the minimum possible step of the viewing-zone translation while following the eye position (Fig. 5). It can be calculated as per Eq. (3):

$${P_D} = \varDelta {x_V}/{M_{LA2}}, $$
where ΔxV denotes the desired step of the viewing-zone translation along the eyebox x-direction. PD should be kept small so that the eye position can be followed accurately, as shown in Fig. 5(a). If PD is too large, the translation of the viewing zones during mask re-rendering becomes larger than the actual eye-movement step ΔxE, resulting in a narrow low-crosstalk margin.


Fig. 5. Concept of dynamic pitch. (a) Dynamic pitch PD is sufficiently small to follow eye movement smoothly, and thus, the viewing-zone translation matches the eye translation, thereby leading to a wide low-crosstalk margin. (b) Dynamic pitch PD is too large to follow eye movement smoothly, and therefore, the viewing-zone translation is larger than the eye translation, affording a narrow low-crosstalk margin.


The actual pixel pitch of the SM is selected as the greatest common divisor of the static and dynamic values, PSM = gcd(PS, PD), to satisfy both viewing conditions. For high-latency systems, a low margin can be a serious issue; thus, mismatches in the SM pitch selection should be avoided.
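The pitch selection can be illustrated with a short sketch. The static-pitch value below is a hypothetical example, the dynamic pitch follows Eq. (3) with the simulation values, and pitches are expressed in integer micrometers so that the greatest common divisor is well defined.

```python
# Sketch of SM pixel-pitch selection: P_SM = gcd(P_S, P_D).
# Pitches in integer micrometers; P_S here is a hypothetical example value.
from math import gcd

P_S = 90                 # static pitch satisfying the alignment condition of Eq. (2), um
delta_x_V_um = 3340      # desired viewing-zone translation step at the eyebox, um
M_LA2 = 74.0             # magnification of LA2 (from Eq. (1))

P_D = round(delta_x_V_um / M_LA2)  # Eq. (3): dynamic pitch, ~45 um
P_SM = gcd(P_S, P_D)               # pitch that satisfies both conditions
print(P_D, P_SM)                   # -> 45, 45
```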

3. Simulation

First, we simulated our concept using paraxial elements to validate the initial idea. Next, we attempted to validate the feasibility of the approach; in this section, we consider this feasibility aspect. To simulate the full system, we designed the optics in CODE V and constructed the final model in LightTools. Apart from the receivers, our model includes three main components: the MV-PGU, the waveguide with pupil replication, and the MV-EFU.

In this section, we describe the design procedure to deliver a bright, uniform image. First, we introduce the illumination module of the MV-PGU, which consists of a laser diode (LD) (the reference for size, angular, and spectral parameters is the Osram PL 520 Metal Can TO38 with λpeak = 520 nm) with a collimator, speckle reducer, transform lens, and polarizing beam splitter (PBS). The illumination stage was designed in conjunction with the projection setup. Considering the size of the illuminated display area, the required angle of incidence (AOI), and the divergence, we set the following parameters: collimation lens focal length = 30 mm, diffusion angle of the speckle reducer = 6°, and transform lens focal length = 70 mm. The entire unit was tuned in LightTools to achieve a uniformity of 85% as per a nine-point measurement, as shown in Fig. 6(a).


Fig. 6. Optics design of the multi-view picture-generation unit (MV-PGU). (a) Illumination system to provide a uniform illumination for display. (b) Projection system to translate the spatial display data into the corresponding angular distribution for in-coupling to the waveguide.


The HED-5201 LCOS microdisplay with a native resolution of 1280 × 720 pixels and a pixel size of 9.6 µm was chosen as the display. It was chosen for its monochromatic output, considering that our simulation proof-of-concept (PoC) setup was designed for single-color images. A frame rate of 60 Hz was applied to confirm the system's potential applicability in the time-sequential mode. We used a display area of 900 × 450 pixels to generate a final FoV of 15° × 7.5° with a resolution of 60 pix/deg; this value corresponds to the normal resolution of the human eye (1 arcmin). To translate the display area of 8.64 mm × 4.32 mm (900 × 450 pixels) into our target FoV, we require an optical system with a focal length of 32.81 mm. We also need to consider the normal direction of light from the illumination unit, the beam-splitter position in front of the display, and the distance to the exit pupil to correctly align the MV-PGU with the input grating of the waveguide. Apart from resolution, these factors formed the main considerations for optimization. We optimized the projection module using CODE V and achieved a modulation level of ≥0.52 at a spatial frequency of 30 cyc/deg (evaluated at the diagonal half-field angle sqrt((FoVH/2)² + (FoVV/2)²) = 8.39°), as shown in Fig. 6(b). The maximum distortion was 2.1%.
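The quoted focal length and angular resolution follow from the display geometry and target FoV; a quick consistency check, assuming the simple paraxial relation f = (half display width)/tan(FoVH/2):

```python
# Quick consistency check of the projection-lens focal length and resolution.
import math

px_h, px_v = 900, 450          # used display area, pixels
pixel_size = 9.6e-3            # pixel size, mm
fov_h, fov_v = 15.0, 7.5       # target FoV, degrees

disp_w = px_h * pixel_size                               # 8.64 mm
f = (disp_w / 2) / math.tan(math.radians(fov_h / 2))     # required focal length
res = px_h / fov_h                                       # angular resolution, pix/deg
half_diag_fov = math.hypot(fov_h / 2, fov_v / 2)         # field angle used for the MTF spec

print(f"f = {f:.2f} mm, {res:.0f} pix/deg, half-diagonal FoV = {half_diag_fov:.2f} deg")
# -> f = 32.81 mm, 60 pix/deg, half-diagonal FoV = 8.39 deg
```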

As the next step, we designed the waveguide with pupil replication to transmit light over a 15° range. We used a substrate thickness of 3 mm, and the base material was BASF64 (Schott catalog). The waveguide includes in-coupling, redirecting, and out-coupling gratings, as shown in Fig. 7(a). In this study, SRGs were used as the gratings, as shown in Fig. 7(b). The in-coupling zone contains an SRG with dimensions of 6.65 mm × 6.65 mm and a period of 0.41 µm. The redirecting zone has a width of 368 mm with a gradually increasing taper; it is sub-divided into 20 elementary zones with a smaller period of 0.29 µm [22]. The out-coupling zone is a 368 mm × 140 mm diffraction element divided into 22 elementary diffraction gratings along its height, each with a period of 0.41 µm, conjugated with the in-coupling grating. The in-coupling grating period d is calculated for normal incidence from the MV-PGU onto the in-coupling grating. The main criterion is that the angular distribution of the diffracted light propagates without violating the total internal reflection (TIR) condition (θm > θTIR, where θm = sin−1((sin(−FoVIN/2) − mλ/d)/n) and θTIR = sin−1(1/n), with diffraction order m = −1, wavelength λ = 520 nm, and refractive index n = 1.704), as shown in Fig. 7(b). The same approach can be implemented for more efficient in-coupling gratings, such as metal-coated reflective blazed gratings. In our study, the required grating efficiencies were calculated via the optimization module of LightTools to achieve uniform illumination within the eyebox area; for example, the theoretical normalized diffraction efficiency of the out-coupling grating is 1.0, as shown in Fig. 7(c). With the implemented illumination (5 million rays as per the LD-imported ray data) and projection systems, we achieved an illuminance uniformity of 87% at the 130 mm × 60 mm eyebox and an intensity uniformity of 75% over the 15° × 7.5° FoV. Both uniformities were obtained using nine-point measurements, as shown in Fig. 7(d).
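The TIR criterion for the in-coupling grating can be verified directly from the quoted formulas; the sketch below checks both edges of the in-coupled FoV using only the parameters given above.

```python
# Check of the in-coupling TIR criterion for both edges of the in-coupled FoV.
import math

n = 1.704          # substrate refractive index at 520 nm
lam = 0.520        # wavelength, um
d = 0.41           # in-coupling grating period, um
m = -1             # diffraction order
fov_in = 15.0      # in-coupled horizontal FoV, degrees

theta_tir = math.degrees(math.asin(1 / n))   # ~35.9 deg

def diffraction_angle(field_deg):
    """In-substrate diffraction angle for a given incident field angle."""
    s = (math.sin(math.radians(field_deg)) - m * lam / d) / n
    return math.degrees(math.asin(s))

for field in (-fov_in / 2, +fov_in / 2):
    theta_m = diffraction_angle(field)
    print(f"field {field:+.1f} deg -> theta_m = {theta_m:.1f} deg "
          f"(TIR limit {theta_tir:.1f} deg, OK = {theta_m > theta_tir})")
# -> 41.9 deg and 55.2 deg, both above the 35.9 deg TIR limit
```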


Fig. 7. (a) Waveguide layout with 2D pupil replication. (b) Schematic of surface-relief grating (SRG). (c) Efficiency profiles of redirecting and out-coupling gratings. (d) Simulation results showing integrated eyebox uniformity of 87% and field-of-view uniformity of 75%.


In the next step, we developed the MV-EFU module with the two lens arrays and the SM. Although the magnification between LA1 and LA2 can be varied, we set MLA = 1 to simplify the projection optics design. For module compactness, we chose small focal lengths of fLA1 = fLA2 = 10 mm. The lens arrays were designed to be identical, which means they can be fabricated from a single sheet for mass production. The lens material was assumed to be poly(methyl methacrylate) (PMMA). The lens pitch was set to 5.15 mm, taking the mask pixel pitch into account. The panel selected as the reference for the mask was the monochromatic BOE RV101FBB-N00. The actual size of its active image area (221.4 mm × 129.6 mm) is insufficient to cover our target FoV at an eye relief of 750 mm; however, we used this panel's pixel pitch (45 µm) as a feasible value. Consequently, the viewing-zone-translation step ΔxV corresponds to 3.34 mm at a total eye relief of 750 mm, i.e., 600 mm from the flat windshield. In this case, the distance from the SM to LA2, calculated as per Eq. (1), is 10.14 mm. With the achieved lens f-number of 1.94, the energy transmission can exceed 50% over the horizontal FoV as per the relation PLA2/2 = (fLA1 + fLA2) × tan(FoVH/2). We optimized the lens arrays using CODE V and achieved a modulation level of 0.65 at an angular frequency of 30 cyc/deg (Fig. 8).
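The MV-EFU parameters quoted above can be cross-checked against Eqs. (1) and (3); the short sketch below reproduces the quoted values (small rounding differences aside).

```python
# Cross-check of the simulated MV-EFU parameters (values taken from the text).
import math

f_LA2 = 10.0        # mm, focal length of LA2 (= f_LA1)
eye_relief = 750.0  # mm, distance from LA2 to the eyebox plane
P_LA2 = 5.15        # mm, lens pitch
P_SM = 0.045        # mm, SM pixel pitch (BOE panel)
fov_h = 15.0        # deg, horizontal FoV

a_SM = 1.0 / (1.0 / f_LA2 - 1.0 / eye_relief)   # Eq. (1): ~10.14 mm
M_LA2 = eye_relief / a_SM                        # ~74
dx_V = P_SM * M_LA2                              # viewing-zone step: ~3.3 mm
f_number = f_LA2 / P_LA2                         # ~1.94
# Half-pitch implied by the >50% transmission condition over the horizontal FoV
# (~2.63 mm, close to the actual half-pitch P_LA2/2 = 2.58 mm):
half_pitch = (f_LA2 + f_LA2) * math.tan(math.radians(fov_h / 2))

print(f"a_SM={a_SM:.2f} mm, M_LA2={M_LA2:.1f}, dx_V={dx_V:.2f} mm, "
      f"f/#={f_number:.2f}, half-pitch={half_pitch:.2f} mm")
```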


Fig. 8. Single channel of lens-array stack in the multi-view eyebox formation unit (MV-EFU).


The final step involved integrating all the modules in LightTools. As shown in Fig. 9, we positioned two kinds of receivers in the eyebox plane: eye receivers with eye optics to view the image at the retina, and an illuminance receiver with the size of the complete eyebox. To utilize both types of receivers, we simulated two pairs of images. First, we used the text “Right” as the image for the right eye and “Left” for the left eye to visualize the content difference, along with two white images to construct a viewing-zone illuminance chart. For each image pair, we rendered patterns at the SM in the pattern left-white-right-black (LWRB) for t1, t3,…, t2n−1 and left-black-right-white (LBRW) for t2, t4,…, t2n. Each of these mask patterns transmits the signal image to the corresponding eye at the corresponding moment. In our system, the angular direction is translated into a spatial direction by the human eye; therefore, the image observed at each point of the eyebox matches the FoV in terms of intensity distribution. From Fig. 9(b), we can observe clearly separated “Right”/“Left” images with sharp edges at the eyebox plane. The obtained viewing-zone distribution corresponds to a margin of 28.5 mm measured at a crosstalk level of 3% at the eyes for an inter-pupillary distance (IPD) of 65 mm. Taking the measurement accuracy into account, the viewing-zone translation step matches the theoretical value of 3.34 mm. We additionally modeled stereo-pair images to visualize the applicability of our system for creating 3D scenes, as shown in Fig. 9(e). In the initial modeling, stripes are more noticeable than the left/right content. This phenomenon is associated with a discrepancy between the pitch of the lens arrays and the positions of the discrete light out-coupling for different fields, owing to the narrow LD spectrum. To mitigate this issue, we added a “wide”-spectrum simulation with λpeak = 520 nm and FWHM = 10 nm, in which the stripes are removed for angular measurements based on the full eyebox.
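The crosstalk metric used for Fig. 9(d) can be evaluated directly from the two simulated viewing-zone profiles; the sketch below implements the definition given in the figure caption, with placeholder Gaussian profiles standing in for the illuminance-receiver data.

```python
# Crosstalk across the eyebox as defined for Fig. 9(d):
# min(I_LWRB / I_LBRW, I_LBRW / I_LWRB) * 100% at each eyebox position.
import numpy as np

def crosstalk_percent(i_lwrb: np.ndarray, i_lbrw: np.ndarray) -> np.ndarray:
    return 100.0 * np.minimum(i_lwrb / i_lbrw, i_lbrw / i_lwrb)

# Placeholder viewing-zone profiles (hypothetical, for illustration only).
x = np.linspace(0.0, 200.0, 401)                       # eyebox x-coordinate, mm
i_left = np.exp(-((x - 67.5) / 20.0) ** 2) + 1e-3      # "LWRB" zone around the left eye
i_right = np.exp(-((x - 132.5) / 20.0) ** 2) + 1e-3    # "LBRW" zone around the right eye

ct = crosstalk_percent(i_left, i_right)
low_crosstalk_positions = x[ct <= 3.0]                 # region below the 3% crosstalk level
```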


Fig. 9. Results of full system simulation. (a) Simulation setup in LightTools. (b) Images observed by the left and right eyes in terms of the integrated intensities over eyebox. “Left” is observed with the LWRB mask and “Right” with LBRW mask. Images are simulated with an apodization map applied to compensate for non-uniformity, including that caused by the multi-view eyebox formation unit (MV-EFU). The receiver has resolution of 30 pix/deg. (c) Viewing-zone distribution over the eyebox for the initial position and for mask shifts of 1 and 5 pixels. (d) Crosstalk chart corresponding to the initial viewing-zone position. Crosstalk is calculated as min(ILWRB/ILBRW,ILBRW/ILWRB) × 100%, and values at x ≤ 35 mm and x ≥ 165 mm are invalid because of the presence of unbalanced signal within these bounds. (e) Simulated stereo-pair images with objects located at different distances.


Table 1 summarizes the simulation results.


Table 1. Results obtained with simulated system

4. Experiment

To validate our concept, we assembled an experimental setup using commercially available off-the-shelf (COTS) components. To ensure sufficient power, we used a Coherent laser (Genesis CX532-2000) with an operating wavelength of λ = 532 nm, similar to that of our simulation setup. The illumination system was assembled using Newport kit lenses for beam expansion, a speckle reducer (Optotune LSR-5-17), a linear polarizer to improve the contrast, and a transmissive LCD panel (L3P08X-66G01) to generate virtual images. The projection system was implemented using a standard lens with f = 100 mm. We used a 10 inch backlight unit (BLU) as the waveguide for pupil replication. This backlight module is separate from the redirecting grating; regardless, it was possible to implement replication along both the horizontal and vertical directions. The MV-EFU was implemented using two Edmund Optics lenticular arrays with PLA1/2 = 8.47 mm and fLA1/2 = 10.41 mm. The arrays were combined with a 10.1 inch BOE monochrome display (5K resolution of 4920 × 2880 pixels) used as the SM in the time-sequential mode. To align the SM and the arrays, we first position the SM with LA2 to form an image of the SM in the eyebox plane, as shown in Fig. 10(b). To compensate for errors in the lens-array pitch, the mask content is created for each segment one by one, with the condition that the image of each segment is projected onto the center of the eyebox. This unit is then added to the main setup in front of LA1 to obtain telescopic (infinity-to-infinity) ray paths between the two arrays. Because the modulation transfer function (MTF) of the lens arrays is poor, it is not possible to obtain a virtual image of good quality; however, for validating the PoC of viewing-zone formation, it is sufficient that the SM and arrays are suitably aligned. The content for the PoC setup was similar to that used in the simulation. Two masks, patterned LWRB and LBRW, were rendered to form viewing zones for the left and right eyes, respectively. The display and SM were synchronized to transmit the letters “A” and “B” as images to the corresponding viewing zones. We note that both images are formed within the same FoV of 8° × 4°. Figure 10(c) shows the images captured by the stereo-camera with spatially separated channels for IPD = 65 mm. We did not use any additional combiner to merge the virtual image with the real scene, so as not to sacrifice the visibility of the virtual image. We note that when stereo-pair images are rendered instead of letters, the result is a 3D image with depth perception. Moreover, the image quality obtained with the PoC setup can be improved by using customized elements in all key modules.


Fig. 10. (a) Proof-of-concept setup, (b) single-segment-based SM calibration scheme and (c) virtual images observed from the left- and right-eye viewing positions. The face mask is located in the plane of the eyebox at a distance of 750 mm from the LA2 to display the distribution of the viewing zones formed by all SM segments simultaneously. Then, instead of a face mask, a stereo-camera is placed to capture images that appear in a time-sequential manner.


5. Discussion

In this study, we proposed a method for the implementation of stereo images, i.e., different images for the left and right eyes, in the single-waveguide-based approach. We note here that several assumptions were made for convenience, and consequently, several issues may require resolution prior to practical application.

Here, we set the initial distance of the virtual image at infinity; therefore, our lens arrays are designed in a true afocal mode and stereoscopic images are rendered on an infinity basis. Alternatively, the virtual image can be set at a closer distance by using a negative lens acting over the entire eyebox [15] or by changing the optical power of the lens arrays. In this case, we can render the image within a fatigue-free zone quantified for one “near-field” reference plane in automotive applications. For instance, with a −1/5 D lens, we can position the image at a distance of 5 m and render depths ranging from 2.4 m to 26.2 m [23].

We also considered only a time-sequential system; however, the MV-PGU and MV-EFU can be configured to utilize other characteristics (for example, wavelength or polarization state) to separate light into the viewing zones. Each of these approaches has certain advantages and disadvantages. With wavelength separation, there is no need to compromise on the frame rate of the display and mask; however, multiple light sources would be needed for the display, along with mask coatings whose emission and transmission spectra are narrow but closely spaced. While polarization-based control can be more robust, the windshield reflectance, which differs for s- and p-polarization states, must be taken into account.

Although we were able to “transmit” a relatively narrow FoV through our waveguide, the diffraction efficiency of SRGs may vary with the incident angle. This becomes especially relevant if we intend to cover the entire area of the windshield with a virtual image, which is theoretically possible because we use a waveguide with the pupil replication technique. In this case, we can consider an architecture for foveated imaging [24] to keep the driver's attention on the aspects important for safe driving. To complete this task, we should perform rigorous coupled-wave analysis (RCWA) to calculate the diffraction efficiencies [25] and implement countermeasures to prevent loss of uniformity. Alternatively, we can transmit an FoV narrower than our target and subsequently extend it using a variable-magnification lens stack (Fig. 11). This solution can also be applied to extend the FoV of indirect see-through AR devices [26].


Fig. 11. Field-of-view extension by increasing the magnification of the lens stack.


A full-color display can also be implemented with our architecture. In the case of mirror-based waveguides, all colors can be transmitted within a single waveguide, which means that the layout need not be changed. Meanwhile, multiple waveguides can be used to improve the efficiency of grating-based waveguides. We note that when using the MV-EFU, the distance from the waveguide to the lens stack affects neither the virtual image nor the viewing-zone performance; therefore, a multiple-waveguide architecture is possible.

To proceed further with this concept of stereo-image formation, we also need to consider possible diffraction and vignetting issues at the plane of the SM.

These topics will be included in the scope of our further research.

6. Conclusion

Our study proposed a method for implementing stereo images in the waveguide-based approach, mainly for automotive HUDs. The key components of our setup are the multi-view picture-generation unit and the multi-view eyebox formation unit. Their combined operation affords the formation of selective viewing zones at the eyebox plane, where stereoscopy-based 3D images can be observed. Integrating these modules with the pupil replication waveguide affords a wide viewing area in a compact final form factor. We validated our concept using both simulations and experiments. We believe that our approach is suitable for automotive applications as well as other types of indirect see-through AR devices.

Acknowledgments

The original images for modeling the stereo-pair were downloaded from the Middlebury Stereo website. The authors gratefully acknowledge assistance from Sergey Dubynin, Vasily Grigoriev, Sergey Kopenkin, Yury Borodin, and Andrey Putilin in assembling the experimental setup.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. J.-H. Seo, C.-Y. Yoon, J.-H. Oh, S. B. Kang, C. Yang, M. R. Lee, and Y. H. Han, “A study on multi-depth head-up display,” Dig. Tech. Pap. - Soc. Inf. Disp. Int. Symp. 48(1), 883–885 (2017). [CrossRef]  

2. Z. Qin, S.-M. Lin, K.-T. Luo, C.-H. Chen, and Y.-P. Huang, “Dual-focal-plane augmented reality head-up display using a single picture generation unit and a single freeform mirror,” Appl. Opt. 58(20), 5366–5374 (2019). [CrossRef]  

3. N. Yamada, J. Hashimura, O. Tannai, T. Kojima, and K. Sugawara, “Head-up display apparatus,” U. S. Patent Publication Application 2020/0055399 A1 (2020).

4. T. Matsumoto, K. Kusafuka, G. Hamagishi, and H. Takahashi, “Glassless 3D head up display using parallax barrier with eye tracking image processing,” Dig. Tech. Pap. - Soc. Inf. Disp. Int. Symp. 49(1), 1511–1514 (2018). [CrossRef]  

5. J.-H. Lee, I. Yanusik, Y. Choi, B. Kang, C. Hwang, J. Park, D. Nam, and S. Hong, “Automotive augmented reality 3D head-up display based on light-field rendering with eye-tracking,” Opt. Express 28(20), 29788–29804 (2020). [CrossRef]  

6. J. Christmas and N. Collings, “Realizing automotive holographic head up displays,” Dig. Tech. Pap. - Soc. Inf. Disp. Int. Symp. 47(1), 1017–1020 (2016). [CrossRef]  

7. C. Chen, W. Lin, S. Lee, and K. Luo, “Holographic augmented reality head up display for vehicle application,” Dig. Tech. Pap. - Soc. Inf. Disp. Int. Symp. 50(1), 680–682 (2019). [CrossRef]  

8. R. Häussler, Y. Gritsai, E. Zschau, R. Missbach, H. Sahm, M. Stock, and H. Stolle, “Large real-time holographic 3D displays: enabling components and results,” Appl. Opt. 56(13), F45–F52 (2017). [CrossRef]  

9. J. An, K. Won, Y. Kim, J.-Y. Hong, H. Kim, Y. Kim, H. Song, C. Choi, Y. Kim, J. Seo, A. Morozov, H. Park, S. Hong, S. Hwang, K. Kim, and H.-S. Lee, “Slim-panel holographic video display,” Nat. Commun. 11(1), 5568 (2020). [CrossRef]  

10. I. R. Redmond, “Holographic optical elements for automotive windshield displays,” Dig. Tech. Pap. - Soc. Inf. Disp. Int. Symp. 51(1), 246–249 (2020). [CrossRef]  

11. M. Svarichevsky, “Holographic heads up display,” U. S. Patent Publication Application 2019/0155028 A1 (2019).

12. B. Lee, K. Bang, M. Chae, and C. Yoo, “Holographic optical elements for head-up displays and near-eye displays,” Proc. SPIE 11708, 1170803 (2021). [CrossRef]  

13. T. Zhan, Y.-H. Lee, G. Tan, J. Xiong, K. Yin, F. Gou, J. Zou, N. Zhang, D. Zhao, J. Yang, S. Liu, and S.-T. Wu, “Pancharatnam–Berry optical elements for head-up and near-eye displays,” J. Opt. Soc. Am. B 36(5), D52–D65 (2019). [CrossRef]  

14. P. Richter, W. Von Spiegel, and J. Waldern, “Volume optimized and mirror-less holographic waveguide augmented reality head-up display,” Dig. Tech. Pap. - Soc. Inf. Disp. Int. Symp. 49(1), 725–728 (2018). [CrossRef]  

15. B. C. Kress, “Optical waveguide combiners for AR headsets: features and limitations,” Proc. SPIE 11062, 110620J (2019). [CrossRef]  

16. T. Levola, “Stereoscopic near to eye display using a single microdisplay,” Dig. Tech. Pap. - Soc. Inf. Disp. Int. Symp. 38(1), 1158–1159 (2007). [CrossRef]  

17. Q. Hou, Q. Wang, D. Cheng, and Y. Wang, “Geometrical waveguide in see-through head-mounted display: a review,” Proc. SPIE 10021, Optical Design and Testing VII, 100210C (2016).

18. H. Lee, J. Lee, and S. Hong, “Crosstalk reduction method in a glasses-free AR 3D HUD,” Proc. SPIE 11765, 1176516 (2021). [CrossRef]  

19. C. Hembd-Sölner, R. F. Stevens, and M. C. Hutley, “Imaging properties of the Gabor Superlens,” J. Opt. A: Pure Appl. Opt. 1(1), 94–102 (1999). [CrossRef]  

20. K. Akşit, H. Baghsiahi, P. Surman, S. Ölçer, E. Willman, D. R. Selviah, S. Day, and H. Urey, “Dynamic exit pupil trackers for autostereoscopic displays,” Opt. Express 21(12), 14331–14341 (2013). [CrossRef]  

21. S. Chestak, D.-S. Kim, and S.-W. Cho, “Dual side transparent OLED 3D display using Gabor super-lens,” Proc. SPIE 9391, 93910L (2015). [CrossRef]  

22. B. C. Kress and W. J. Cummings, “Optical architecture of HoloLens mixed reality headset,” Proc. SPIE 10335, 103350K (2017). [CrossRef]  

23. N. Broy, S. Höckh, A. Frederiksen, M. Gilowski, J. Eichhorn, F. Naser, H. Jung, J. Niemann, M. Schell, A. Schmid, and F. Alt, “Exploring design parameters for a 3D head-up display,” Proc. PerDis ‘14, pp. 38–43, ACM (2014).

24. J. Kim, Y. Jeong, M. Stengel, K. Akşit, R. Albert, B. Boudaoud, T. Greer, J. Kim, W. Lopes, Z. Majercik, P. Shirley, J. Spjut, M. McGuire, and D. Luebke, “Foveated AR: dynamically-foveated augmented reality display,” ACM Trans. Graph. 38(4), 1–15 (2019). [CrossRef]  

25. S. Peng and G. M. Morris, “Efficient implementation of rigorous coupled-wave analysis for surface-relief gratings,” J. Opt. Soc. Am. A 12(5), 1087–1096 (1995). [CrossRef]  

26. K. Yin, Z. He, K. Li, and S.-T. Wu, “Doubling the FOV of AR displays with a liquid crystal polarization-dependent combiner,” Opt. Express 29(8), 11512–11519 (2021). [CrossRef]  
