
Unmixing of the background components in an off-axis holographic-mirror-based imaging system using spectral image processing

Open Access

Abstract

An imaging system employing a volume holographic optical element (vHOE) has the unique capability of capturing the image of an object placed in front of a transparent screen. The system obtains an image of the light diffracted by the vHOE; however, unwanted background components, including directly reflected or transmitted components, are present in the captured image. In this study, we propose a method to eliminate such background components from the captured images via multispectral image processing. The image component obtained from the diffraction by the vHOE can be successfully extracted by utilizing the facts that the vHOE diffracts light within a narrow spectral range and that the reflection spectrum of a natural object is mostly smooth along the wavelength axis. Computer simulations and experiments confirm the effectiveness of the proposed method, along with video applications that demonstrate its real-time capability.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Recently, various applications of holographic optical elements (HOEs) in visual media, including head-mounted displays, stereoscopic displays, and transparent camera systems, have been actively investigated [1]. An off-axis holographic-mirror-based imaging system (OHMIS) that incorporates a volume HOE (vHOE) captures images formed by diffracted light [2,3] using the optical arrangement illustrated in Fig. 1. By placing this system in front of a display screen, it can be used in a video communication system that captures the frontal image of a person gazing at the display, which is valuable for face-to-face telepresence that enhances the sense of “being there” [2–7].

Fig. 1. Examples of an off-axis holographic-mirror-based imaging system (OHMIS) (a) with a volume reflection holographic optical element (HOE) [2,3] and (b) with a waveguide vHOE [7].

In a conventional video communication system, the camera is typically placed above the display screen, so the gaze directions of the two parties do not coincide. Systems such as a display with a slanted half-mirror or a projection display with a semi-translucent screen have been proposed to solve this issue by capturing the frontal image of a person’s face [4,5]. Although these systems can acquire frontal images, they can be bulky and have limited applicability. An OHMIS realizes a thin display system with a camera that captures the frontal image of a person, as shown in Fig. 1(a). Furthermore, an even thinner frontal imaging system, suitable for mobile devices, was developed by applying a holographic waveguide, as shown in Fig. 1(b) [7].

Because it enables frontal imaging of a subject, this system is expected to find applications in various other fields. For example, an OHMIS built into eyeglass-type devices has been proposed for gaze detection [8]. Another application is a highly accurate touchless user interface that operates by recognizing gestures in front of the screen. Furthermore, augmented reality applications close to reality, such as a natural virtual cloth-fitting service, are expected by processing the frontal subject image, which is similar to a reflection image.

A major challenge associated with an OHMIS is the presence of background components in the captured images, as shown in Fig. 2. When an object exists around the subject, its direct reflection on the glass-plate surface is mixed into the captured image; the three rounded objects overlaying the image of the subject in Fig. 2(a) demonstrate this. When an object is positioned behind the vHOE, light transmitted through the glass may also mix into the background; the star-shaped object behind the glass in Fig. 2(a) corresponds to this case. In the example shown in Fig. 2(b), a mask is placed behind the vHOE to block the transmitted light. The vHOE used here was fabricated with a green laser, so only the green component of the light from the subject is diffracted by the vHOE and captured by the camera, owing to the wavelength selectivity of Bragg diffraction. In Fig. 2(c), the image of the subject appears green but is difficult to observe because the reflected component is mixed into the captured image.

Fig. 2. Example of image mixing in an OHMIS. (a) Geometrical arrangement of the optical setup in which the direct reflection and transmission components are mixed into the virtual image of the subject generated by the HOE. (b) Photograph of the OHMIS experiment in which the image of the background object is mixed into (c) the captured image. The image of the subject is green in the captured image as the vHOE exhibits wavelength selectivity with a green laser.

Although the unmixing of background components in an OHMIS has not been studied previously, several studies have attempted to eliminate the reflection component caused by a glass surface from images. For example, Li and Brown’s method [9] exploits the property that the reflection component is out of focus. Another technique uses multi-viewpoint images [10]. However, in the application presented in this study, the background components to be eliminated are not always out of focus; moreover, multiple cameras are required to obtain multi-viewpoint images, complicating the entire system. Methods using convolutional neural networks have also been proposed recently [11–13]; however, they require a large number of training images, and their performance depends on the image content.

In an OHMIS, the image of the subject is formed by diffracted light with a narrow spectral bandwidth owing to the wavelength selectivity of the volume reflection hologram. In this study, we exploit this spectral characteristic of the vHOE and propose a method to separate the background components from the captured image. A multispectral camera is used to capture the images, and spectral image processing unmixes the narrowband diffraction component from the broadband background component.

To the best of our knowledge, this is the first report on applying spectral image processing to an imaging system that utilizes a vHOE. The proposed technique is a basic methodology for this purpose, but it is simple and computationally light and can easily be implemented in real time. It therefore has high potential in various vHOE-based image acquisition applications.

2. Method

Figure 3 presents a schematic illustration of the proposed system. The spectral reflectance of most real-world objects is smooth with respect to wavelength and can therefore be decomposed using a small number of basis functions [14,15]; this property has been widely utilized in spectral estimation and spectral image analysis. Accordingly, the spectral reflectance of the background component is considered to be smooth over the visible wavelength range. In contrast, the light diffracted by the vHOE has a narrowband spectral distribution attributable to Bragg diffraction. The mixed spectrum therefore exhibits a partially sharp distribution, as shown in Fig. 3. A multispectral camera allows these spectral properties to be used to separate the mixed components. In addition, if the spectral range of the Bragg diffraction is aligned with one of the multispectral bands, an image with a high-contrast diffraction component can be obtained owing to the narrowband spectral sensitivity of the camera. In the following, we describe the mathematical model used to separate the diffraction and background components and to obtain the diffraction component after eliminating the background.

Fig. 3. Schematic of the proposed system.

First, we formulate how a multispectral camera captures a mixed image. We consider a point on an object and its corresponding pixel; the spatial coordinates are omitted for simplicity. Let N be the number of bands of the multispectral camera and L be the number of samples along the wavelength direction. This study uses a monochromatic vHOE. The radiance ${f_e}$ of the subject to be estimated can then be treated as a scalar, because fluctuations in the spectral reflectance and the illumination spectrum within the narrow wavelength range diffracted by the vHOE are negligible. Denoting the spectral diffraction efficiency of the vHOE by an L-dimensional column vector H, the spectral radiance of the diffracted light is approximately ${f_e}{\textbf H}$, where ${f_e}$ represents the spectral radiance of the subject at the diffraction center wavelength of the vHOE. The spectral radiance of the reflected background, in turn, is $a{{\textbf b}_{\boldsymbol e}}$, where ${{\textbf b}_{\boldsymbol e}}$ is an L-dimensional column vector representing the spectral radiance of the background light, and the glass reflectance is assumed to be a constant $a$. The pixel value of the obtained multispectral image, as shown in Fig. 3, can then be expressed as

$$\begin{aligned}{\textbf g} &= {f_e}{\textbf {SH}} + a{\textbf S}{{\textbf b}_{\boldsymbol e}} + {\textbf n}\\ &= ({{\textbf {SH}}\;\;a{\textbf S}})\left({\begin{array}{c} {f_e}\\ {{\textbf b}_{\boldsymbol e}} \end{array}}\right) + {\textbf n},\end{aligned}\tag{1}$$
where ${\textbf g} = {({{g_1},\ldots,{g_N}})^T}$ is an N-dimensional column vector representing a pixel value obtained using the multispectral camera, S is an N × L matrix representing the spectral sensitivity of the multispectral camera, and n is an N-dimensional column vector representing noise. The objective of this method is to estimate ${({f_e}\;{{\textbf b}_e})^T}$ from g, i.e., to unmix ${f_e}$ and ${{\textbf b}_{\boldsymbol e}}$.
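As a concrete illustration, the following sketch evaluates Eq. (1) numerically for a single pixel. It is a minimal example, not the authors' implementation: the band layout loosely follows the 8-band simulation of Section 3, but the spectra H, S, and ${{\textbf b}_e}$ and the values of $a$ and ${f_e}$ are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the per-pixel image-formation model of Eq. (1).
# All shapes and spectra below are illustrative assumptions, not measured data.
L = 81                                   # samples, 380-780 nm at 5-nm steps
N = 8                                    # number of camera bands
wl = np.linspace(380.0, 780.0, L)

# H: narrowband spectral diffraction efficiency of the vHOE (peak near 532 nm)
H = 0.6 * np.exp(-0.5 * ((wl - 532.0) / 8.0) ** 2)

# S: N x L camera sensitivity (Gaussian bands, peaks 410-690 nm, FWHM 20 nm)
peaks = np.linspace(410.0, 690.0, N)
sigma = 20.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))     # FWHM -> std. dev.
S = np.exp(-0.5 * ((wl[None, :] - peaks[:, None]) / sigma) ** 2)

a = 0.088                                # constant glass reflectance
f_e = 0.7                                # scalar subject radiance at the peak
b_e = 0.3 + 0.2 * np.sin(wl / 120.0)     # smooth background radiance (toy)
n = np.random.default_rng(0).normal(0.0, 1e-3, N)     # sensor noise

g = f_e * (S @ H) + a * (S @ b_e) + n    # Eq. (1): N-band pixel value
```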

Equation (1) comprises N equations and 1 + L unknowns. L is 81 when the visible range from 380 to 780 nm is discretized at 5-nm intervals, whereas the number of bands N is typically 4–20 in multispectral imaging. The problem is therefore ill-posed, and the solution cannot be determined uniquely. In the proposed method, a linear estimate is computed via Wiener estimation, i.e., using a matrix A that minimizes the expected mean square error of the estimate, $\phi$, given as

$$\phi = \left\langle \left\|\left({\begin{array}{c} {f_e}\\ {{\textbf b}_{\boldsymbol e}} \end{array}}\right) - {\textbf {Ag}}\right\|_2^2\right\rangle,\tag{2}$$
where $\left\langle \cdot \right\rangle$ denotes the expectation operator and ${\|\cdot\|_2}$ denotes the $\ell^2$ norm. The estimate ${({{\hat{f}}_e}\;{{\hat{\textbf b}}_e})^T}$ of the unknown vector is obtained as
$$\left({\begin{array}{c} {{\hat{f}}_e}\\ {{\hat{\textbf b}}_e} \end{array}}\right) = {\textbf {Ag}}.$$

Let ${{\textbf R}_{\boldsymbol f}}$ be a covariance matrix for the estimation target ${({{f_e}\,\,{{\textbf b}_e}} )^T}$ and ${{\textbf R}_{\boldsymbol n}}$ be a covariance matrix of the noise; then, the estimation matrix A can be written as

$${\textbf A} = {{\textbf R}_{\boldsymbol f}}{{\textbf C}^T}{({{\textbf C}{{\textbf R}_{\boldsymbol f}}{{\textbf C}^T} + {{\textbf R}_{\boldsymbol n}}})^{-1}},\tag{3}$$
where ${\textbf C} = ({{\textbf {SH}}\;a{\textbf S}})$. We assume that each element of ${{\textbf b}_e}$ follows a first-order Markov chain, so that its covariance matrix ${\textbf R}(\rho)$ takes the form [16]
$${\textbf R}(\rho )= \left( {\begin{array}{ccccc} 1&\rho&{{\rho^2}}& \cdots &{{\rho^{L - 1}}}\\ \rho&1&\rho &{}&{{\rho^{L - 2}}}\\ {{\rho^2}}&\rho &1&{}& \vdots \\ \vdots &{}&{}& \ddots &\rho\\ {{\rho^{L - 1}}}&{{\rho^{L - 2}}}& \cdots &\rho&1 \end{array}} \right),\tag{4}$$
where $\rho$ is the correlation between adjacent wavelength elements. Using Eq. (4) and assuming that ${{\textbf b}_e}$ and ${f_e}$ are independent, ${{\textbf R}_{\boldsymbol f}}$ becomes
$${{\textbf R}_{\boldsymbol f}} = \left( {\begin{array}{cccc} 1&0& \cdots &0\\ 0&{}&{}&{}\\ \vdots &{}&{{\textbf R}(\rho )}&{}\\ 0&{}&{}&{} \end{array}} \right).\tag{5}$$

It is reported that when $\rho $ is between 0.95 and 1, the Markov process is a good approximation of the spectral correlation matrix [16].
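The following sketch assembles the Wiener matrix of Eqs. (3)–(5) under the stated assumptions; the white-noise covariance ${{\textbf R}_n} = \sigma^2{\textbf I}$ and all function and variable names are our own choices, not the authors' code.

```python
import numpy as np

def wiener_unmixing_matrix(S, H, a, rho=0.999, noise_var=1e-4):
    """Build A = R_f C^T (C R_f C^T + R_n)^(-1) of Eq. (3).

    S: (N, L) camera sensitivity; H: (L,) vHOE diffraction efficiency;
    a: scalar glass reflectance. Names and noise model are our assumptions.
    """
    N, L = S.shape
    C = np.hstack([(S @ H)[:, None], a * S])            # C = (SH  aS), (N, 1+L)

    idx = np.arange(L)                                  # Eq. (4): first-order
    R_rho = rho ** np.abs(idx[:, None] - idx[None, :])  # Markov covariance

    R_f = np.zeros((1 + L, 1 + L))                      # Eq. (5): f_e and b_e
    R_f[0, 0] = 1.0                                     # assumed independent
    R_f[1:, 1:] = R_rho

    R_n = noise_var * np.eye(N)                         # white-noise covariance
    A = R_f @ C.T @ np.linalg.inv(C @ R_f @ C.T + R_n)
    return A                                            # shape (1 + L, N)

# The first row of A recovers the scalar subject radiance from a pixel g:
#   f_hat = (A @ g)[0]
```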

3. Simulation

3.1 Setup

Herein, we present several simulated results to numerically verify the effectiveness of the proposed method. Figure 4 shows a monochrome image of a subject along with three spectral reflectance images used as backgrounds, referred to as the artificial color chart, the plush, and the cloth. The spectral image of the artificial color chart was artificially generated to mimic a ColorChecker (X-Rite Inc.) based on measured spectral reflectance. The two hyperspectral images of the plush and the cloth were obtained using an NH series camera (EBA JAPAN Co. Ltd.) [17]. The spectral reflectance images were obtained by subtracting a dark-current image and dividing by the spectral power distribution of the illuminant, which was obtained from an image of a whiteboard captured under identical illumination conditions. The spatial and wavelength resolutions of the hyperspectral images were reduced when generating the spectral reflectance images for the simulation. The resolution of the original data was 1024 × 1280 pixels, and the central area of each image was cropped to match the resolution of the subject image.

Fig. 4. The images used in the simulation include (a) a monochrome image as the subject and spectral images of (b) an artificial color chart, (c) a plush, and (d) a cloth. These images are rendered in the sRGB color space.

The number of spectral samples in the original data was 151, covering 350–1100 nm at 5-nm intervals. Of these, the 81-dimensional data within the visible light region (380–780 nm) were used for the simulation. Because the data contained some noise, smoothing was applied along the wavelength direction.

The spectral power distributions of the illuminants are shown in Fig. 5(a), from which ${f_e}$ and ${{\textbf b}_e}$ were generated. The resolution of all the images was 256 × 256 pixels, and the number of spectral samples of ${{\textbf b}_e}$ was 81, as mentioned above. Figure 5(b) shows the vHOE diffraction efficiency distribution H and the glass reflectance a. For the spectral sensitivity S of the multispectral camera, we used a simulated sensitivity distribution consisting of normal distributions with peaks at 40-nm intervals from 410 to 690 nm, a full width at half maximum of 20 nm, and a maximum of 1, as shown in Fig. 5(c). The vector g was obtained using these data. Further, random Gaussian white noise was added to each band of the multispectral image, where the noise variance $\|{\textbf n}\|^2$ is determined as

$$\|{\textbf n}\|^2 = \frac{\|{g_W}\|^2}{10^{SNR/10}},\tag{6}$$
where ${g_W}$ is the average of the N-band signal values for a white object under each illuminant. We set SNR = 30 in the following simulation; for the reconstruction process, we set $\rho = 0.999$.
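A minimal sketch of this noise model follows; the helper name and the interpretation of ${g_W}$ as a scalar mean signal level are our assumptions.

```python
import numpy as np

# Sketch of the noise model of Eq. (6): the noise variance is set from the
# target SNR (in dB) and g_W, the average N-band signal of a white object.
def add_snr_noise(img, g_W, snr_db=30.0, rng=np.random.default_rng(0)):
    """img: (..., N) multispectral data; g_W: scalar white-object level."""
    noise_var = g_W ** 2 / 10.0 ** (snr_db / 10.0)      # Eq. (6)
    return img + rng.normal(0.0, np.sqrt(noise_var), size=img.shape)
```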

Fig. 5. (a) Spectral power distribution of the D65 (orange line), A (green line), and F7 (purple line) illuminants. (b) Spectral diffraction efficiency distribution of vHOE (blue line) and glass reflectance (orange line). (c) Simulated 8-band spectral sensitivity used during the simulation.

3.2 Evaluation

To evaluate the estimation accuracy, the normalized root mean square error (NRMSE) between the estimated image ${\hat{f}_e}$ and the original image ${f_e}$, denoted as $NRMS{E_{{{\hat{f}}_e}}}$, was used. The error before unmixing was assessed using one band of the captured image (the fourth band), whose spectral sensitivity covers the peak wavelength of diffraction; the subject is visible in the fourth band, but the background scene is mixed in. For this purpose, the NRMSE between the captured image ${g_4}$ and ${f_e}$, denoted as $NRMS{E_{{g_4}}}$, was computed. The definitions of the NRMSE before and after the estimation process are identical and given by

$$NRMS{E_{img}} = \frac{\sqrt{\frac{1}{m}\sum_{i=1}^m {{({\xi[i] - {f_e}[i]})}^2}}}{\sqrt{\frac{1}{m}\sum_{i=1}^m {{({{f_e}[i] - {{\bar{f}}_e}})}^2}}},\tag{7}$$

where

$$\xi[i] = \left\{{\begin{array}{cc} {{{\hat{f}}_e}[i]}&{\text{for }NRMS{E_{{{\hat{f}}_e}}}}\\ {\left({\frac{{{{\bar{f}}_e}}}{{{{\bar{g}}_4}}}}\right){g_4}[i]}&{\text{for }NRMS{E_{{g_4}}}} \end{array}}\right.,\tag{8}$$
where ${\bar{f}_e}$ and ${\bar{g}_4}$ are the means of ${f_e}$ and ${g_4}$, respectively, and m is the total number of pixels. The ratio of ${\bar{f}_e}$ to ${\bar{g}_4}$ compensates for the change in the signal average caused by factors such as the diffraction efficiency of the vHOE and the spectral sensitivity of the camera, so that this change is excluded from the error. The accuracy of background-component elimination can be evaluated by how much smaller $NRMS{E_{{{\hat{f}}_e}}}$ is than $NRMS{E_{{g_4}}}$.
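The metrics of Eqs. (7) and (8) reduce to a few lines of NumPy; the following is a sketch with our own naming, not reference code.

```python
import numpy as np

# Sketch of Eqs. (7)-(8). The mean-ratio compensation is applied only when
# evaluating the captured band g_4 against the reference f_e.
def nrmse(img, ref, compensate=False):
    xi = img * (ref.mean() / img.mean()) if compensate else img   # Eq. (8)
    num = np.sqrt(np.mean((xi - ref) ** 2))
    den = np.sqrt(np.mean((ref - ref.mean()) ** 2))
    return num / den                                              # Eq. (7)

# nrmse_g4 = nrmse(g4, f_e, compensate=True)   # before unmixing
# nrmse_fe = nrmse(f_hat, f_e)                 # after unmixing
```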

However, when Gaussian noise is added, the error measured by the NRMSE defined above includes both the residual background components and the noise. To evaluate the unmixing effect in the noisy case, the noise must be separated from the unmixing error. For this purpose, we resized the images ${\hat{f}_e}$ and ${g_4}$ to 1/4 in both the horizontal and vertical directions by averaging $4 \times 4$ pixel blocks. The original image ${f_e}$ was resized in the same way, and the NRMSEs of the resized images were calculated using Eq. (7). Averaging $4 \times 4$ pixels significantly reduces the deviation due to independent random noise, whereas the error caused by the background mixture remains. These metrics are denoted by $NRMS{E_{1/4}}$ in Table 1; they are given only for the noisy cases because they are almost identical to the original NRMSEs in the noise-free cases.
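A sketch of the $NRMS{E_{1/4}}$ computation, reusing the hypothetical nrmse helper from the previous sketch:

```python
import numpy as np

# Average non-overlapping 4 x 4 blocks before applying Eq. (7); this
# suppresses independent pixel noise while leaving the background-mixture
# error largely intact.
def block_average(img, k=4):
    h, w = img.shape
    h, w = h - h % k, w - w % k                  # trim to a multiple of k
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

# nrmse_quarter = nrmse(block_average(f_hat), block_average(f_e))
```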

Table 1. NRMSE values calculated for each condition.

3.3 Results

Figure 6 presents some simulation results. The background components almost disappear when the proposed method is applied, as shown in Figs. 6(b), (d), (f), (h), and (j). The simulation revealed that unmixing can be achieved regardless of the presence of noise, the choice of illuminant, and the type of background. Table 1 presents the NRMSE values measured under all conditions. In the noise-free cases, the $NRMS{E_{{{\hat{f}}_e}}}$ values were significantly smaller than the $NRMS{E_{{g_4}}}$ values under all conditions, numerically confirming the reduction of the background component. In the noisy scenarios, however, the two values were closer to each other than in the noise-free cases. In particular, when “cloth” was used as the background with illuminant A, $NRMS{E_{{{\hat{f}}_e}}}$ was larger than $NRMS{E_{{g_4}}}$, which can be attributed to the influence of noise. Even though the NRMSE was not reduced in these cases, the background components were still well removed, as observed in Figs. 6(h) and (j). Moreover, $NRMS{E_{1/4}}$ was significantly smaller for ${\hat{f}_e}$ than for ${g_4}$ in all cases, even for “cloth” under illuminant A. This indicates that slightly amplified random noise was the primary source of error in the images estimated for the noisy cases.

Fig. 6. Simulation results with (a) through (f) representing the noise-free cases and (g) through (j) representing the noisy cases. The ${g_4}$ images are (a), (c), (e), (g), and (i), and the estimated images ${\hat{f}_e}$ are (b), (d), (f), (h), and (j).

Next, estimation was performed using 5-, 3-, and 2-band images to investigate the effect of the number of bands on the estimation accuracy. The bands were selected based on the following criteria: (1) band #4, which includes the diffraction center wavelength of the vHOE, was always used, and (2) the remaining bands were selected such that their peaks were as close as possible to that of #4. Specifically, bands #2/#3/#4/#5/#6 were used for five bands, #3/#4/#5 for three bands, and #4/#5 for two bands. For the 2-band estimation, the results obtained using #3 or #5 were nearly identical; therefore, only the latter case is presented.

Figure 7 presents the average $NRMS{E_{{{\hat{f}}_e}}}$ values under each condition. Average values are shown for each illuminant because the $NRMS{E_{{{\hat{f}}_e}}}$ values were almost identical among all the backgrounds. In the noise-free cases, the $NRMS{E_{{{\hat{f}}_e}}}$ values tended to decrease as the number of bands increased. In the noisy cases, by contrast, almost no difference was observed except for the 2-band cases. Therefore, three bands are considered almost sufficient for estimation under general noisy conditions.

Fig. 7. Average $NRMS{E_{{{\hat{f}}_e}}}$ values for each illuminant condition.

4. Experiment

4.1 Setup

An experiment was conducted to investigate whether the proposed method operates in a real environment. As the illuminant, an artificial sunlight lamp (SERIC Ltd., XC 100AF) was used with a color-temperature conversion filter, yielding a color temperature of 6500 K. Figure 8(a) shows the spectral power distribution of this illuminant. The lamps were covered with semi-translucent white paper to diffuse the light.

Fig. 8. (a) Spectral power distribution of the illuminant. (b) Diffraction efficiency distribution of the vHOE and glass reflectance. The orange line is the measured diffraction efficiency, whereas the blue line is the fitted distribution. The green line is the measured glass reflectance, whereas the purple line is the averaged value. Note that the diffraction efficiency curve was measured as “spectral diffraction efficiency by transmittance measurement for volume reflection holograms” defined in [18] for simplicity, and includes the scattered component (observed in 540∼610 nm range). (c) Spectral sensitivity of the multispectral camera used in the experiment. #a, #b, and #c indicate the bands used in the three-band estimation, as shown in Fig. 10.

The vHOE was fabricated using a photopolymer material (Covestro AG, HX200) and attached to an A4-sized low-reflection glass plate. The hologram was exposed using an Nd:YAG laser at a 532-nm wavelength with a diffraction angle of 135°. The details of the fabrication process are given by Nakamura et al. [2]. Figure 8(b) shows the spectral diffraction efficiency of the vHOE and the reflectance of the cover-glass surface. The spectral diffraction efficiency was measured according to the method described in ISO 17901-1 [18]. For the reconstruction process, the distribution obtained after noise reduction was used, where Gaussian-distribution fitting was applied to eliminate noise from the measurement result. The reflectance of the cover glass was almost uniform within the effective range of the camera, and its average value of 8.8% was used in the reconstruction.

A multispectral camera, the IMEC SNm4 × 4 VIS, was used; its spectral sensitivity is shown in Fig. 8(c). Sixteen-band images are acquired in a single shot because 4 × 4 spectral filters are arranged in a mosaic pattern on the sensor. To process the data as spectral images, each band image was extracted from the corresponding mosaic pixels and stacked along the wavelength direction at the same spatial coordinates, with each 4 × 4 filter area treated as one pixel of the spectral image. The mosaic image used was 900 × 600 pixels, cropped from the full 1088 × 2048-pixel frame; the extraction reduced the resolution by 1/4 to 225 × 150 pixels.
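The band-extraction step can be sketched as follows; the function name is ours, and the crop sizes follow the description above.

```python
import numpy as np

# Sketch of unpacking the 4 x 4 mosaic sensor data into a 16-band image:
# each position (p, q) in the repeating 4 x 4 filter tile becomes one band,
# and each tile becomes one spatial pixel (900 x 600 mosaic pixels ->
# 225 x 150 spectral pixels in the experiment).
def demosaic_4x4(raw):
    h, w = raw.shape                         # e.g. (600, 900) after cropping
    h, w = h - h % 4, w - w % 4
    bands = [raw[p:h:4, q:w:4] for p in range(4) for q in range(4)]
    return np.stack(bands, axis=-1)          # shape (h // 4, w // 4, 16)
```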

Figure 9 shows the arrangement of the experimental setup. A mannequin was positioned in front of the vHOE as the subject, and the background objects were placed toward its right-hand side. The illumination lights were arranged at 45° to the left and the right. The illuminance was 455 lx in front of the mannequin and 368 lx on the surface of the objects. During the reconstruction process, we set ρ = 0.99.

Fig. 9. The experimental setup with Do = 70 cm, Dc = 45 cm, Ds = 25 cm, and θH = 45°.

4.2 Results

Figure 10(a) shows the captured image rendered in sRGB from the multispectral image [19]. Although the diffracted components from the subject can be observed, the subject is difficult to recognize because of the background components. The image of band #a [see Fig. 8(c)] is shown in Fig. 10(b); here too, the background components (in this case, the color chart) obscure the subject component. Figure 10(c) shows the result obtained using the proposed estimation method. Compared with Fig. 10(a), the background components are considerably reduced and the subject component is much clearer.

Fig. 10. (a) The captured image displayed in sRGB. (b) Band #a image of the captured 16-band image. (c) Resultant image obtained using 16 bands. (d) Resultant image obtained using three bands.

Figure 10(d) shows the result obtained using three bands, i.e., only the #a, #b, and #c bands specified in Fig. 8(c). The estimate is almost identical in appearance to that obtained using 16 bands. This result suggests that a similar estimation accuracy can be achieved using only three bands: one with a peak near the diffraction center wavelength and two distributed on either side of it.

In the experimental results shown in Fig. 10 and Visualization 1, the noise was amplified in the background-removed image. We estimated the amount of noise as the standard deviation of a small region ($16 \times 26$ pixels) of almost uniform object luminance in the top-left corner of Figs. 10(b)–(d). The standard deviations of (b) the image before background removal, (c) the 16-band result, and (d) the 3-band result were 4.5, 7.1, and 6.9, respectively, whereas the mean values of the same region were almost unchanged (55.6, 59.8, and 59.4, respectively). This tendency of noise amplification is similar to that observed in the simulation results. The main source of the amplification is considered to be the matrix inversion in Eq. (3); the degree of amplification depends on the condition number of the matrix $({{\textbf C}{{\textbf R}_{\boldsymbol f}}{{\textbf C}^T} + {{\textbf R}_{\boldsymbol n}}})$, which increases when the spectral sensitivities overlap each other and the spectral diffraction efficiency is broad. If the spectral sensitivities are narrower, the noise amplification is less significant, but less light contributes to the captured image, resulting in more noise in the captured image. Applying more advanced noise-removal techniques, such as total-variation minimization or block-matching and 3D filtering, would be valuable future work.

We also implemented this method in a real-time video system using an Intel Core i5-7300U CPU (2.60 GHz) and 8 GB RAM. Visualization 1 shows the recorded video. For demonstration, the estimated image, the acquired image of the #a band, and the acquired image in sRGB were reproduced in real time. The obtained image was displayed at 8.3 frames per second (fps), with an exposure time of 120 ms. The calculation time required to convert the acquired image was less than 30 ms; the frame rate was therefore restricted only by the exposure time. The calculation in the method is simple: using the fixed matrix A of Eq. (3), which can be precomputed, the estimated image is obtained as the inner product of the multiband pixel values and the first row of matrix A. The image resolution of the current experimental system is low, but the computational time will remain negligible even for high-resolution images, because such pixel-wise matrix multiplication is routinely performed in standard image and video systems. Therefore, this method can easily be implemented for real-time operation on a typical PC.
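A sketch of this per-pixel reconstruction step, assuming the Wiener matrix A of Eq. (3) has been precomputed (e.g., with the earlier wiener_unmixing_matrix sketch):

```python
import numpy as np

# With A precomputed, the background-removed image reduces to one inner
# product per pixel between the N-band pixel values and the first row of A
# (the row that estimates f_e). Function and variable names are ours.
def estimate_subject_image(ms_img, A):
    """ms_img: (H, W, N) multispectral frame; A: (1 + L, N) Wiener matrix."""
    return ms_img @ A[0]                     # (H, W) estimated f_e per pixel
```

Because this is a single multiply-accumulate pass over the frame, the computation is far cheaper than the 120-ms exposure that limited the frame rate in the experiment.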

5. Discussion

Some edges of the background component, corresponding to the boundaries between the color patches of the background object, remained visible in the estimated image, as observed in Figs. 10(c) and (d). These artifacts can be attributed to the mosaic filter on the multispectral sensor: when a boundary of the background components fell within a 4 × 4 filter area, the bands of a single pixel captured different objects, generating a spectrum different from the original. This problem can be solved by performing an estimation that incorporates a spatial correlation matrix over the mosaic filter, such as the spatio-spectral 3D Wiener estimation technique [20]. Since the solution to this edge-artifact issue differs slightly from the basic unmixing methodology presented in this paper, the application of spatio-spectral 3D Wiener estimation will be discussed elsewhere.

The estimation method presented in this paper is basic and simple: a minimization of the mean square error using prior information on the smoothness of the spectrum. In the future, we expect to apply more advanced techniques, e.g., minimization of the $\ell^1$ norm or total variation, the use of spatial correlation, and the use of temporal correlation such as Kalman filtering. Optimizing the number of bands and the spectral sensitivity of the image-capturing device is another direction for future work.

Based on the theory described in Section 2, the radiance spectrum of the background object is assumed to be smooth, which is reasonable for reflective objects if the spectrum of the illumination source is smooth, because the spectral reflectance of natural surfaces is mostly smooth. However, artificial illuminants may have spectra with sharp peaks. The simulation in Section 3 included the fluorescent lamp F7, which did not significantly affect the estimation accuracy; this can be attributed to the fact that the local peaks of F7 do not overlap with the diffraction peak of the vHOE, whereas the accuracy is expected to degrade when the peaks overlap. A solution to this problem is to measure the illuminant spectrum and impose the smoothness constraint only on the spectral reflectance.

This approach can be applied directly to full-color HOEs [6] because the proposed method can be applied to the red, green, and blue components independently. Based on the three-band experiment, estimation for a full-color HOE is expected to require a total of nine bands, three each for R, G, and B. A further reduction in the number of bands should be studied in the future.

6. Conclusion

In this study, we proposed a method to separate obstructive background components from the images captured via a novel imaging system using vHOE. The proposed method utilized multispectral imaging with spectral image processing that unmixed the subject image and the background components. Further, the effectiveness of the approach was confirmed via simulations and experiments. This method was observed to work appropriately when a band with its peak near the diffraction center wavelength and two adjacent bands were present. In addition, the real-time video capturing operation was demonstrated.

Funding

Japan Society for the Promotion of Science (18H03256).

Acknowledgment

The authors acknowledge NTT DOCOMO Inc. and Covestro AG for their technical support.

Disclosures

The authors declare no conflicts of interest.

References

1. N. Kim, Y. L. Piao, and H. Y. Wu, “Holographic optical elements and application,” in Holographic Materials and Optical Systems (InTech, 2017).

2. T. Nakamura, S. Kimura, K. Takahashi, Y. Aburakawa, S. Takahashi, S. Igarashi, S. Torashima, and M. Yamaguchi, “Off-axis virtual-image display and camera by holographic mirror and blur compensation,” Opt. Express 26(19), 24864–24880 (2018). [CrossRef]  

3. S. Kimura, T. Nakamura, S. Takahashi, S. Igarashi, S. Torashima, M. Yamaguchi, and Y. Aburakawa, “Research of video communication system using holographic optical elements,” Technical Report of IEICE 118, 265 (2018).

4. P. Harman, “Autostereoscopic teleconferencing system,” Proc. SPIE 3957 (Stereoscopic Displays and Virtual Reality Systems VII), 293–302 (2000).

5. K. Otsuka, “MMSpace: Kinetically-augmented telepresence for small group-to-group conversations,” in 2016 IEEE Virtual Reality (VR) (IEEE, 2016), pp. 19–28.

6. F. Watanabe, T. Nakamura, S. Torashima, S. Igarashi, S. Kimura, Y. Aburakawa, and M. Yamaguchi, “Dispersion compensation for full-color virtual-imaging systems with a holographic off-axis mirror,” Proc. SPIE 11306, 3 (2020). [CrossRef]  

7. H. Konno, S. Igarashi, T. Nakamura, and M. Yamaguchi, “Waveguide-HOE-based camera that captures a frontal image for flat-panel display,” in Proceedings of International Display Workshops (IDW18), pp. 1127–1130 (2018).

8. M. Zhou, O. Matoba, Y. Kitagawa, Y. Takizawa, T. Matsumoto, H. Ueda, A. Mizuno, and N. Kosaka, “Fabrication of an integrated holographic imaging element for a three-dimensional eye-gaze detection system,” Appl. Opt. 49(19), 3780–3785 (2010). [CrossRef]  

9. Y. Li and M. S. Brown, “Single image layer separation using relative smoothness,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2014), pp. 2752–2759.

10. Y. Li and M. S. Brown, “Exploiting reflection change for automatic reflection removal,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2013), pp. 2432–2439.

11. X. Zhang, R. Ng, and Q. Chen, “Single image reflection separation with perceptual losses,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE/CVF, 2018), pp. 4786–4794.

12. K. Wei, J. Yang, Y. Fu, D. Wipf, and H. Huang, “Single image reflection removal exploiting misaligned training data and network enhancements,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE/CVF, 2019), pp. 8178–8187.

13. J. Yang, D. Gong, L. Liu, and Q. Shi, “Seeing deeply and bidirectionally: A deep learning approach for single image reflection removal,” in Proceedings of the European Conference on Computer Vision (ECCV, 2018), pp. 654–669.

14. L. T. Maloney, “Evaluation of linear models of surface spectral reflectance with small numbers of parameters,” J. Opt. Soc. Am. A 3(10), 1673–1683 (1986). [CrossRef]  

15. M. J. Vrhel, R. Gershon, and L. S. Iwan, “Measurement and analysis of object reflectance spectra,” Color Res. Appl. 19(1), 4–9 (1994). [CrossRef]  

16. W. K. Pratt and C. E. Mancill, “Spectral estimation techniques for the spectral calibration of a color image scanner,” Appl. Opt. 15(1), 73–75 (1976). [CrossRef]  

17. Y. Takara, N. Manago, H. Saito, Y. Mabuchi, A. Kondoh, T. Fujimori, F. Ando, M. Suzuki, and H. Kuze, “Remote sensing applications with NH hyperspectral portable video camera,” Proc. SPIE 8527, 85271G (2012). [CrossRef]  

18. ISO 17901-1:2015, Optics and photonics — Holography — Part 1: Methods of measuring diffraction efficiency and associated optical characteristics of holograms.

19. M. Yamaguchi, H. Haneishi, and N. Ohyama, “Beyond red-green-blue(RGB): Spectrum-based color imaging technology,” J. Imaging Sci. Technol. 52(1), 010201 (2008). [CrossRef]  

20. Y. Murakami, K. Fukura, M. Yamaguchi, and N. Ohyama, “Color reproduction from low-SNR multispectral images using spatio-spectral Wiener estimation,” Opt. Express 16(6), 4106–4120 (2008). [CrossRef]  

Supplementary Material (1)

Visualization 1: The estimated image, the acquired image of the #a band, and the acquired image in sRGB, reproduced in real time.
