Compressive single-pixel hyperspectral imaging using RGB sensors

Open Access

Abstract

Hyperspectral imaging, which obtains the spatial-spectral information of a scene, has been extensively applied in various fields but usually requires a complex and costly system. A single-pixel-detector-based hyperspectral system mitigates this complexity but introduces new difficulties in spectral dispersion. In this work, we propose a low-cost compressive single-pixel hyperspectral imaging system with RGB sensors. Based on the structured illumination single-pixel imaging configuration, the lens-free system directly captures data with the RGB sensors, without dispersion in the spectral dimension. The reconstruction is performed with a pre-trained spatial-spectral dictionary, and the hyperspectral images are obtained through compressive sensing. In addition, the spatial patterns for the structured illumination and the dictionary for the sparse representation are optimized by coherence minimization, which further improves the reconstruction quality. In both the spatial and spectral dimensions, the intrinsic sparse properties of the hyperspectral images are fully exploited to achieve high sampling efficiency and low reconstruction cost. This work may introduce opportunities for the optimization of computational imaging systems and reconstruction algorithms toward high-speed, high-resolution, and low-cost imaging.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Common imaging systems mainly capture the spatial distribution of a scene; even an RGB image sensor acquires only three channels of spectral information. In spectral imaging, the spectrum of each spatial point is available, which makes it crucial in various fields such as remote sensing [1,2], biomedical diagnosis [3–5], astronomy [6,7], etc. In conventional spectral imaging systems, scanning is required in the spatial or spectral dimension, which is time-consuming and degrades the photon collection efficiency [8]. Compressive spectral imaging systems based on the compressive sensing theory reconstruct spatial-spectral data cubes from two-dimensional projections, which significantly reduces the number of measurements [9]. In a compressive spectral imaging system, the input data cube is successively modulated by a spectral modulator (prism or grating) and a spatial modulator (spatial light modulator or digital micromirror device) before being captured by the focal plane array [10–14]. However, careful design of the system configuration is required, and the alignment-sensitive optical components reduce the reliability of the system [15].

Over the last decade there has been extensive research on employing single-pixel detectors as imaging devices, owing to their lower cost and higher detection efficiency compared with focal plane arrays (CCD or CMOS sensors) [16]. In a single-pixel imaging system, the light field of the object is correlated with a sequence of spatial patterns and then detected by a single-pixel detector without an imaging lens. Taking advantage of the compressive sensing theory, the image can be reconstructed from far fewer measurements than the total number of image pixels. Single-pixel imaging also provides a promising scheme for spectral imaging. The first single-pixel camera architecture for spectral imaging was proposed with a dichroic beam splitter that separates the input light into red, green, and blue outputs, each of which is reconstructed separately to recover the RGB image [17]. Later, an all-optical spectral splitting device was designed to spatially split the input light into eight sub-regions with different spectra on the spatial light modulator, which effectively increases the number of spectral channels [18]. In another type of hyperspectral single-pixel imaging system, a diffraction grating stretches the input light after the spatial modulation, and a sinusoidal spectral modulation is applied by a rotating film [19]. However, the aforementioned systems require bulky and complicated dispersers for full spectral acquisition, which correspondingly increases the complexity of the system, and the sparsity in the spectral dimension is not utilized. To make full use of the sparse properties of the spectra of common scenes, and to reduce the requirements on the optical elements in the imaging system, especially the spectral disperser, spectral super-resolution based on compressive sensing is needed in single-pixel hyperspectral imaging.

In this paper, a lens-free compressive single-pixel hyperspectral imaging system with RGB sensors is proposed. In combination with the structured illumination configuration for spatial modulation, RGB sensors are used as the single-pixel detectors for spectral information sampling. Based on the compressive sensing theory, the sparse properties in both spectral and spatial dimensions are fully utilized by reconstructing the hyperspectral images with the pre-trained spatial-spectral dictionary. Furthermore, the spatial patterns of the structured illumination and the dictionary are simultaneously optimized by minimizing the coherence of the system matrix, which improves the sampling efficiency and the reconstruction fidelity. Both simulations and proof-of-concept experiments validate the feasibility of the RGB-sensor-based single-pixel hyperspectral imaging.

2. System configuration and method

The sensing model of the single-pixel hyperspectral imaging system is illustrated in Fig. 1. The input spatial-spectral data cube is denoted as fmnl, a 3D data cube of resolution N × N × L with spatial coordinates m, n and spectral coordinate l. The data cube is spatially modulated by a series of binary (block-unblock) spatial patterns, and the correlated light is detected by an RGB sensor without spatial resolution (see Appendix A). For clarity and convenience, the system projection is described in matrix form. Let K be the number of spatial patterns; with the vectorized spatial patterns ${\textbf X} \in {{\mathbb R}^{K \times {N^2}}}$, the system matrix H of the single-pixel hyperspectral imaging system can be expressed as

$${\textbf H} = ({{\textbf {BXC}}} )\odot {\textbf R}$$
where ${\textbf B} = {[{{{\textbf I}_K}\; \; {{\textbf I}_K}\; {{\textbf I}_K}} ]^T}$ is the replication matrix for the response of the RGB sensor and IK is the identity matrix of dimension K, ${\textbf C} = {[{{{\textbf I}_{{N^2}}} \ldots \; {{\textbf I}_{{N^2}}}} ]_{{N^2} \times {N^2}L}}$ is the replication matrix in the spectral dimension, ${\textbf R} \in {{\mathbb R}^{3K \times {N^2}L}}$ is the response matrix of the RGB sensor, replicated correspondingly over the pattern and spectral dimensions, and ⊙ is the Hadamard product. The light g detected by the sensor is
$${\textbf g} = {\textbf {Hf}}$$

Fig. 1. The sensing scheme of the single-pixel hyperspectral imaging system. The 3D data cube is spatially modulated and projected to an RGB sensor.
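For concreteness, the following NumPy sketch assembles the system matrix of Eq. (1) and the measurement of Eq. (2) at small illustrative sizes. The response matrix here is a random stand-in rather than the actual sensor response, and the band-major vectorization of the data cube is an assumption of this sketch.

import numpy as np

N, L, K = 8, 31, 16                              # spatial size, spectral bands, pattern count
rng = np.random.default_rng(0)

X = (rng.random((K, N * N)) < 0.5).astype(float) # binary spatial patterns, K x N^2
R_sensor = rng.random((3, L))                    # stand-in RGB spectral response, 3 x L

B = np.vstack([np.eye(K)] * 3)                   # 3K x K replication over RGB channels
C = np.hstack([np.eye(N * N)] * L)               # N^2 x N^2*L replication over bands

# R replicates the sensor response over patterns (rows) and pixels (columns):
# the rows for channel c hold R_sensor[c, l] at every pixel of band l.
R = np.kron(R_sensor, np.ones((K, N * N)))       # 3K x N^2*L

H = (B @ X @ C) * R                              # Eq. (1): H = (BXC) ⊙ R
f = rng.random(N * N * L)                        # vectorized data cube, band-major
g = H @ f                                        # Eq. (2): 3K measurements
print(g.shape)                                   # (48,)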

The reconstruction of the 3D data cube f is an optimization problem in compressive sensing. If f has a sparse representation $\boldsymbol{\mathrm{\theta}} \in \mathbb{R}^{d}$ on the basis ${\textbf D} = [{{{\textbf d}_1}\; {{\textbf d}_2} \ldots {{\textbf d}_d}} ]\in {{\mathbb R}^{{N^2}L \times d}}$ such that ${\textbf f} = {\textbf D}\boldsymbol{\mathrm{\theta}}$, where d is the number of atoms in D, then f can be obtained by solving the following optimization problem

$$ \hat{\mathbf{f}}=\mathbf{D} \underset{\boldsymbol{\mathrm{\theta}}}{\arg \min }\left(\|\mathbf{H} \mathbf{D} \boldsymbol{\mathrm{\theta}}-\mathbf{g}\|_{2}^{2}+\tau\|\boldsymbol{\mathrm{\theta}}\|_{1}\right) $$
where τ is a regularization parameter. Algorithms including the orthogonal matching pursuit (OMP) [20] and the two-step iterative shrinkage/thresholding algorithm (TwIST) [21] are available to solve Problem (3).
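As an illustration of the reconstruction step, the sketch below solves Problem (3) with plain iterative shrinkage-thresholding (ISTA); the paper itself uses OMP or TwIST, so this simple solver is a stand-in under the same ℓ1 formulation, with A = HD and g assumed given.

import numpy as np

def ista(A, g, tau, n_iter=500):
    # Solve min_theta ||A theta - g||_2^2 + tau ||theta||_1 by ISTA.
    step = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    theta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2 * A.T @ (A @ theta - g)          # gradient of the quadratic data term
        z = theta - step * grad
        theta = np.sign(z) * np.maximum(np.abs(z) - step * tau, 0.0)  # soft threshold
    return theta

# Usage: theta_hat = ista(H @ D, g, tau=0.1); f_hat = D @ theta_hat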

In compressive sensing, accurate reconstruction can be accomplished by choosing H and D that are sufficiently incoherent [22]. In other words, the coherence between H and D should be minimized. Details about the definition of coherence and its substitute form are discussed in our previous work [14] (see Appendix B). The optimization problem for coherence minimization can be written in the form of

$$\mathop {\min }\limits_{{\textbf H},{\textbf D}} {\cal J}({{\textbf H},{\textbf D}} )= ||{{{\textbf D}^T}{{\textbf H}^T}{\textbf {HD}} - {{\textbf I}_d}} ||_F^2$$
where Id is the identity matrix with dimension d (the number of atoms in the dictionary), and the subscript F denotes the Frobenius norm. As the matrices B and C are fixed and R is determined by the spectral response of the RGB sensor, ${\cal J}$ (H, D) is influenced only by the spatial patterns X and the basis D. Hadamard and Fourier basis patterns are two representative spatial modulation patterns used in single-pixel imaging; however, they are generated by the Hadamard and Fourier transforms and are therefore predetermined and cannot be optimized. Here, random binary patterns with a transmittance of 0.5 are used as the initial X for the optimization. Gradient descent is then applied to update X by minimizing the objective in Problem (4). With the basis D fixed, the gradient of ${\cal J}$ with respect to X can be computed by (see Appendix C)
$$\frac{{\partial {\cal J}}}{{\partial {\textbf X}}} = 4{{\textbf B}^T}({({{\textbf {HD}}{{\textbf D}^T}{{\textbf H}^T}{\textbf {HD}}{{\textbf D}^T}} )\odot {\textbf R}} ){{\textbf C}^T} - 4{{\textbf B}^T}({({{\textbf {HD}}{{\textbf D}^T}} )\odot {\textbf R}} ){{\textbf C}^T}$$
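The objective of Problem (4) and the gradient of Eq. (5) translate directly into NumPy; the sketch below reuses B, C, R, and X from the forward-model sketch above, with D a random stand-in for the trained dictionary.

import numpy as np

def system_matrix(X, B, C, R):
    return (B @ X @ C) * R                        # Eq. (1)

def coherence_objective(H, D):
    G = D.T @ H.T @ H @ D                         # Gram matrix of HD
    return np.linalg.norm(G - np.eye(D.shape[1]), "fro") ** 2  # Problem (4)

def grad_J_wrt_X(X, D, B, C, R):
    H = system_matrix(X, B, C, R)
    HD = H @ D
    M = HD @ D.T                                  # HD D^T, shape 3K x N^2*L
    return 4 * B.T @ ((HD @ HD.T @ M) * R) @ C.T - 4 * B.T @ (M * R) @ C.T  # Eq. (5)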

To constrain the elements in X to be binary (0 or 1), a penalty term φ is added

$$\varphi = ||{{\textbf X} \odot (1 - {\textbf X})} ||_F^2$$

The gradient of φ with respect to X is (see Appendix D)

$$\frac{{\partial \varphi }}{{\partial {\textbf X}}} = 2{\textbf X} - 6{{\textbf X}^2} + 4{{\textbf X}^3}$$
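The matrix powers in Eq. (7) are elementwise, consistent with the Hadamard product in Eq. (6). A quick finite-difference check confirms the expression:

import numpy as np

rng = np.random.default_rng(1)
X = rng.random((4, 6))

phi = lambda X: np.sum((X * (1 - X)) ** 2)       # Eq. (6): ||X ⊙ (1 - X)||_F^2
grad_phi = 2 * X - 6 * X**2 + 4 * X**3           # Eq. (7), elementwise powers

eps = 1e-6
E = np.zeros_like(X); E[0, 0] = eps
numeric = (phi(X + E) - phi(X - E)) / (2 * eps)  # central difference at one entry
print(np.isclose(numeric, grad_phi[0, 0]))       # True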

Moreover, to constrain the variability of X in terms of columns and rows, a variability constraint term ψ is added

$$\psi = ||{{\textbf X}{{\textbf X}^T} - u{{\textbf I}_K}} ||_F^2 + ||{{{\textbf X}^T}{\textbf X} - v{{\textbf I}_{{N^2}}}} ||_F^2$$
where the constants u and v are set to adapt the sampling rate. The variability constraint encourages the rows and columns of X to be orthogonal and keeps the sampling rate similar for each shot [23]. The gradient of ψ with respect to X is
$$\frac{{\partial \psi }}{{\partial {\textbf X}}} = 4({{\textbf X}{{\textbf X}^T} - u{{\textbf I}_K}} ){\textbf X} + 4{\textbf X}({{{\textbf X}^T}{\textbf X} - v{{\textbf I}_{{N^2}}}} )$$

The update equation of the spatial patterns X is

$${{\textbf X}_{s + 1}} = {{\textbf X}_s} - \alpha \frac{{\partial {\cal J}}}{{\partial {\textbf X}}} - \beta \frac{{\partial \varphi }}{{\partial {\textbf X}}} - \gamma \frac{{\partial \psi }}{{\partial {\textbf X}}}$$
where s is the iteration index, and α, β, and γ are the update stepsizes.

The trainable over-complete dictionary is used as the basis D. Compared with fixed bases, a learned over-complete dictionary can represent the images sparsely and more efficiently [24]. The gradient of ${\cal J}$ with respect to D, with X fixed, is [14]

$$\frac{{\partial {\cal J}}}{{\partial {\textbf D}}} = 4{{\textbf H}^T}{\textbf {HD}}{{\textbf D}^T}{{\textbf H}^T}{\textbf {HD}} - 4{{\textbf H}^T}{\textbf {HD}}$$

The initial dictionary D0 is trained by K-SVD algorithm [25] using a hyperspectral image dataset. The update equation of the dictionary D is

$${{\textbf D}_{s + 1}} = {{\textbf D}_s} - \varepsilon \frac{{\partial {\cal J}}}{{\partial {\textbf D}}}$$
where ɛ is the update stepsize. After each gradient descent update of the spatial patterns X and the basis D, the coherence between H and D is calculated; the stopping criterion is that the change in coherence between iterations falls below a given threshold. Binarization is then applied to the final spatial patterns X. As D is optimized with only coherence minimization considered, another K-SVD dictionary training, initialized with the optimized D, is performed after the gradient descent optimization to restore the sparse representation capability of D on the image set. The pseudocode of the optimization algorithm is shown in Algorithm 1. Note that the spatial pattern update stepsizes α, β, γ and the dictionary update stepsize ɛ are determined by the magnitudes of the gradients.


Algorithm 1. Gradient-based optimization for spatial patterns X and dictionary D
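A condensed sketch of the main loop of Algorithm 1, reusing system_matrix, coherence_objective, and grad_J_wrt_X from the earlier sketch; grad_phi and grad_psi implement Eqs. (7) and (9), while the stepsizes, the constants u and v, and the stopping threshold are illustrative placeholders. The closing K-SVD re-training is an external routine and is omitted here.

import numpy as np

def grad_phi(X):
    return 2 * X - 6 * X**2 + 4 * X**3            # Eq. (7), elementwise powers

def grad_psi(X, u, v):
    K, N2 = X.shape
    return (4 * (X @ X.T - u * np.eye(K)) @ X
            + 4 * X @ (X.T @ X - v * np.eye(N2)))  # Eq. (9)

def optimize(X, D, B, C, R, u, v, alpha, beta, gamma, eps_D, tol=1e-4, max_iter=200):
    J_prev = np.inf
    for _ in range(max_iter):
        # Eq. (10): update the spatial patterns
        X = (X - alpha * grad_J_wrt_X(X, D, B, C, R)
               - beta * grad_phi(X) - gamma * grad_psi(X, u, v))
        # Eqs. (11)-(12): update the dictionary
        H = system_matrix(X, B, C, R)
        HD = H @ D
        D = D - eps_D * (4 * H.T @ HD @ HD.T @ HD - 4 * H.T @ HD)
        J_cur = coherence_objective(H, D)
        if abs(J_prev - J_cur) < tol:             # stop when the change is small
            break
        J_prev = J_cur
    return (X > 0.5).astype(float), D             # binarize X; D is then re-trained by K-SVD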

3. Simulations

In this section, the performance of the proposed single-pixel hyperspectral imaging system and the optimization algorithm is evaluated via simulated acquisition. The proposed lens-free single-pixel hyperspectral imaging system uses two RGB sensors, as parallel collection by RGB sensors with different spectral responses can improve the reconstruction in the spectral dimension without increasing the sampling time (demonstrated in the discussion of Table 1 below). The normalized spectral responses of the two RGB sensors (TCS34725 and TCS3200, AMS) used in the simulation and the experiment are shown in Fig. 2(a); their differing responses yield different detection results in terms of RGB values. Therefore, in Eq. (1), the spectral response matrix ${\textbf R} \in {{\mathbb R}^{6K \times {N^2}L}}$ now represents the responses of the two RGB sensors, and the replication matrix ${\textbf B} = {[{{{\textbf I}_K}\; \; {{\textbf I}_K}\; {{\textbf I}_K}\; {{\textbf I}_K}\; \; {{\textbf I}_K}\; {{\textbf I}_K}} ]^T}$ is doubled, with row dimension 6K. Correspondingly, the system matrix ${\textbf H} \in {{\mathbb R}^{6K \times {N^2}L}}$ gives six measurements from the two RGB sensors for each spatial pattern.

Fig. 2. (a) The normalized spectral responses for the RGB sensor TCS34725 and TCS3200. (b) Gradient descent iterations of the coherence value during the optimization of the spatial patterns X and the basis D.

The Natural Scenes 2015 hyperspectral image dataset [26] is used to train the over-complete dictionary D, which is chosen as ${\textbf D} = {{\textbf D}_{\textrm{spatial}}} \otimes \; {{\textbf D}_{\textrm{spectral}}}$, where Dspatial is the spatial dictionary, Dspectral is the spectral dictionary, and ${\otimes}$ is the Kronecker product. The initial spatial dictionary contains 1500 atoms of size 32×32 (N = 32), and the initial spectral dictionary contains 45 atoms of length 31, representing the spectral range from 400 nm to 700 nm at an interval of 10 nm (L = 31), consistent with the training dataset and the reconstructed hyperspectral images. In the simulation, 128 spatial patterns (K = 128) are used, so the measurement ratio for the spatial dimension is K/N2 = 0.125. The measurement ratio for the spectral dimension is 6/L = 0.1935, and the total compression ratio of the single-pixel hyperspectral imaging system is therefore 0.02419. The initial spatial patterns are random binary patterns with an average transmittance of 0.5. The coherence for the initial spatial patterns and the initial dictionary trained by K-SVD is 0.9033. With the gradient optimization algorithm (Algorithm 1), the coherence is reduced to 0.4653, as shown in Fig. 2(b). After another K-SVD training to restore the sparse representation properties of the dictionary, the optimized spatial patterns and the corresponding optimized dictionary are obtained. Three image quality metrics, the peak signal-to-noise ratio (PSNR), the structural similarity (SSIM), and the spectral angle mapper (SAM) [27], are adopted to quantitatively assess the reconstruction quality. In particular, SAM calculates the angles between the spectral vectors of the ground truth and those of the reconstruction, with each spectrum represented as a vector whose dimension equals the number of spectral bands. Higher PSNR and SSIM values and a lower SAM value indicate better reconstruction quality.
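A small sketch of the SAM metric as described above: the mean angle between the ground-truth and reconstructed spectral vectors over all pixels. The (N, N, L) cube shape is an assumption of this sketch; the spatial-spectral dictionary itself would be formed with np.kron(D_spatial, D_spectral).

import numpy as np

def sam(cube_true, cube_rec, eps=1e-12):
    t = cube_true.reshape(-1, cube_true.shape[-1])     # one spectrum per pixel
    r = cube_rec.reshape(-1, cube_rec.shape[-1])
    cos = np.sum(t * r, axis=1) / (
        np.linalg.norm(t, axis=1) * np.linalg.norm(r, axis=1) + eps)
    return np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))  # radians; lower is better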

The binary images of the first three of the 128 spatial patterns, before and after optimization, are shown in Fig. 3 with the corresponding transmittance of each pattern. As the initial spatial patterns are randomly generated with an average transmittance of 0.5, the transmittances of these three unoptimized patterns fluctuate around 0.5. In the gradient-descent-based optimization, as the gradient values in each iteration are mainly positive, the transmittances of the optimized spatial patterns are generally lower than those of the unoptimized ones; intuitively, the unblocked pixels in the optimized patterns form a “subset” of those in the unoptimized patterns. The decrease of the overall transmittance (from 0.5010 to 0.3912) in the optimized spatial patterns effectively reduces the redundancy between measurements at each pixel and therefore improves the reconstruction performance in the spatial dimension.

Fig. 3. The first three spatial patterns before and after the optimization.

The visual comparison between the reconstruction results with the optimized and unoptimized dictionary/spatial patterns is shown in Fig. 4 for two test images, with the original images shown for reference. The corresponding PSNR, SSIM, and SAM values are given in the last two columns of Table 1. The reconstructed hyperspectral images at four wavelengths (460 nm, 520 nm, 580 nm, and 640 nm) are rendered in the corresponding RGB colors. The regions in the dashed boxes are magnified for a clearer comparison of spatial reconstruction quality.

Fig. 4. Reconstructed hyperspectral images of the proposed single-pixel hyperspectral imaging system with 128 spatial patterns. (a) and (d) are the ground truth, (b) and (e) are the reconstructed spectral images with the unoptimized spatial patterns and dictionary, (c) and (f) show the reconstructed spectral images with the optimized spatial patterns and dictionary.

In the block-wise sampling and reconstruction of single-pixel imaging, spatial pixels with higher intensity dominate, which ensures that they are reconstructed well compared with lower-intensity pixels. As shown in the magnified regions, for all wavelengths, reconstructions of the high-intensity pixels are close to the originals for both the optimized and unoptimized cases. For some spatial pixels with relatively low intensity, the reconstructed results are close to zero due to their lower weights during the spatial patterning and reconstruction. The optimized spatial patterns mitigate this problem for low-intensity pixels, because fewer nonzero elements in the spatial pattern matrix for each shot, compared with the unoptimized (random) patterns, decrease the redundancy in the measurements. At wavelengths with low intensity (e.g., 460 nm and 520 nm for Img 2), blocking artifacts appear due to the block-wise measurement and reconstruction. Compared with the unoptimized condition (Fig. 4(e)), the optimized spatial patterns with a larger range of transmittance (0.2686–0.4814) help distinguish differences in the absolute intensity of the spatial blocks, which effectively reduces the blocking artifacts. Simultaneously, the dictionary is optimized to match the optimized spatial patterns. Therefore, the PSNR and SSIM values for the optimized cases are higher than for the unoptimized cases.

The spectrum of a single spatial pixel (point spectrum) is important in hyperspectral imaging applications, e.g., spectral signature analysis or medical diagnosis. The point spectra of the reconstructed hyperspectral images at randomly chosen spatial pixels in the two test images are shown in Fig. 5, with the true spectra as reference. For the unoptimized spatial patterns and dictionary, the reconstructed spectra follow a similar spectral trend to the truth, fluctuating around the true values. Although the fluctuation is not fatal, since smoothing over wavelengths can mitigate it, it may complicate spectral signature analysis. For the optimized spatial patterns and dictionary, the fluctuation is eliminated, and the reconstructed spectra are highly accurate in both shape and absolute intensity.

Fig. 5. Spectral signature comparison for the spatial points indicated by the red points in the ground truth.

The reconstruction results for the one-sensor structure (TCS34725 or TCS3200) and the dual-sensor structure, with unoptimized and optimized spatial patterns and dictionary, are compared in Table 1. For the one-sensor structure with unoptimized spatial patterns and dictionary, the reconstruction performance in both the spatial and spectral dimensions is insufficient. The optimization of spatial patterns and dictionary significantly improves the reconstruction performance, in particular in the spatial dimension as indicated by higher SSIM values, which again relates to the decreased redundancy in measurements achieved by reducing the coherence of the system. For the dual-sensor structure with unoptimized spatial patterns and dictionary, the spectral fidelity is enhanced compared with the one-sensor structure, even without the optimization of spatial patterns and dictionary (for Img 2). For the dual-sensor structure with optimized spatial patterns and dictionary, the reconstruction performance is further improved compared with either the optimized one-sensor structure or the unoptimized dual-sensor structure.


Table 1. Reconstruction quality comparison with the one-sensor structure and the dual-sensor structure.

The dual-sensor structure is compared with a state-of-the-art multispectral single-pixel sensor (AS7341) using the same spatial patterns, as shown in Table 2. The multispectral sensor has 8 channels in the visible range, with spectral response bandwidths ranging from 29 nm to 51 nm. Owing to these narrow bandwidths, the multispectral sensor is expected to perform well in spectral imaging at coarser spectral resolution (> 30 nm). However, for hyperspectral imaging at high spectral resolution (10 nm), reconstruction with the optimized dual-sensor configuration outperforms the multispectral sensor, especially in the spectral dimension.


Table 2. Reconstruction quality comparison using different sensors.

4. Experimental results

A proof-of-concept experimental system is built to demonstrate the effectiveness of the proposed single-pixel hyperspectral imaging system. Figure 6(a) shows the schematic setup. A DLP-based projector projects the optimized binary spatial patterns onto the target scene (a standard color checker, X-Rite), and the reflected light is captured by the RGB color sensors (TCS34725, TCS3200) without an imaging lens. The target scene is fixed to 12 blocks of 128 × 128 pixels, as shown in Fig. 6(b). The integration time of the TCS34725 RGB sensor is set to 101 ms, and that of the TCS3200 RGB sensor is about 120 ms, determined by the illumination of the projector; with higher luminance, the integration time can be further reduced for higher capturing speed. For background noise elimination, completely dark patterns are projected onto the target, and the reflected light captured by the RGB sensors is averaged and subtracted from the subsequent measurements. To account for the illumination (the irradiance spectrum is shown in Fig. 6(c)), the radiance (reflectance × irradiance) is used as training data instead of the reflectance. As the illumination is limited to the spectral range from 430 nm to 650 nm, the spectra of the hyperspectral images in the experiment also range from 430 nm to 650 nm, giving L = 23 spectral bands. After the reconstruction of the radiance, the result is divided by the irradiance (Fig. 6(c)) to obtain the reflectance spectrum of the target.
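The background subtraction and reflectance recovery described above can be sketched as follows; the function and variable names are illustrative, not taken from the authors' code.

import numpy as np

def subtract_background(g_raw, g_dark_frames):
    # Subtract the averaged dark-pattern reading from each measurement.
    return g_raw - np.mean(g_dark_frames, axis=0)

def radiance_to_reflectance(f_radiance, irradiance, eps=1e-9):
    # Divide the reconstructed radiance cube (N, N, L) by the irradiance (L,).
    return f_radiance / (irradiance[None, None, :] + eps)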

Fig. 6. (a) is the experiment setup. (b) is the target scene. (c) is the irradiance spectrum of the projector used in the experiment.

For the spectral dimension, the reconstructed spectrum of each color block is shown in Fig. 7(b), with the truth for reference. The truth spectra are measured with a spectrometer (Ocean Optics QE65 Pro) and an integrating sphere (Thorlabs IS200-4). Due to the spatial nonuniformity, the spectra are obtained by averaging over the block area at each wavelength. The reconstructed spectra are close to the truth. The discrepancies occur mainly in color blocks whose spectra appear infrequently in the training dataset (Natural Scenes 2015). In this dataset of natural scenes (plants and buildings), colors such as magenta are less common than vegetation colors (e.g., foliage green), which introduces larger errors in spectral fidelity, as shown by the root mean square error (RMSE) values of the spectra (Fig. 7(a)). The reconstructed hyperspectral images for the 23 wavelengths are shown in the corresponding colors at these wavelengths (Fig. 7(c)). For clarity, each reconstructed image is individually normalized to its maximum intensity at each wavelength. Compared with the simulation results, the spatial resolution degrades mainly due to sensor noise, which can be improved at the cost of longer capturing time. Moreover, the dictionary training dataset was acquired in environments different from the experiment, which also contributes to the nonuniformity within each color block.

Fig. 7. (a) presents the reconstruction error for each color block in terms of root mean square error. (b) shows the spectrum comparison of the 12 color blocks between the reconstructed results and the ground truth. (c) shows the reconstructed spectral images from the experiment.

5. Conclusion

In summary, we demonstrate a low-cost compressive single-pixel hyperspectral imaging system using RGB sensors. First, instead of a common monochromatic single-pixel detector, RGB sensors are applied to capture the spectral information without introducing additional acquisition time. Second, with the pre-trained spatial-spectral dictionary, hyperspectral images can be reconstructed from measurements at a low measurement ratio based on compressive sensing. The reconstruction quality is further improved by a gradient-based optimization of the spatial patterns and the dictionary that minimizes the coherence of the system. Third, the intrinsic sparse properties in both the spatial and spectral dimensions are fully utilized to reduce the sampling and reconstruction costs. Last, a proof-of-concept experiment validates the effectiveness of the proposed system. This lens-free, disperser-free, and robust configuration opens opportunities for the miniaturization of hyperspectral cameras and enables their employment in portable devices, mini-satellites, and drones, and it is expected to stimulate new possibilities for modern imaging systems toward a low-cost, high-resolution future.

Appendix A

The model for the RGB sensor that collects the projection can be expressed as

$$\begin{aligned} {g_r} &= \sum_{1 \le l \le L} {R_{l,r}}\sum_{1 \le m,n \le N} {f_{mnl}}{T_{mn}} \\ {g_g} &= \sum_{1 \le l \le L} {R_{l,g}}\sum_{1 \le m,n \le N} {f_{mnl}}{T_{mn}} \\ {g_b} &= \sum_{1 \le l \le L} {R_{l,b}}\sum_{1 \le m,n \le N} {f_{mnl}}{T_{mn}} \end{aligned}$$
where $R_{l,r}$, $R_{l,g}$, and $R_{l,b}$ are the spectral responses of the RGB sensor for the red, green, and blue channels at band l, respectively, and $T_{mn}$ is the binary transmittance of the spatial pattern. The three channels of the RGB sensor collect light simultaneously.
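A direct NumPy rendering of this per-channel model, assuming a cube f of shape (N, N, L), one binary pattern T of shape (N, N), and a 3 × L response matrix:

import numpy as np

def rgb_measurement(f, T, R_sensor):
    spatial_sum = np.tensordot(T, f, axes=([0, 1], [0, 1]))  # sum over m, n -> (L,)
    return R_sensor @ spatial_sum                            # (g_r, g_g, g_b)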

Appendix B

Let the system matrix be ${\textbf H} = {[{{\textbf h}_1^T,\; {\textbf h}_2^T, \ldots ,\; {\textbf h}_{3K}^T} ]^T} \in {{\mathbb R}^{3K \times {N^2}L}}$; the coherence between H and D can be expressed as

$$\mu ({\textbf H},{\textbf D}) = \max_{\substack{1 \le i \le 3K \\ 1 \le j \le d}} |{{{\textbf h}_i}{{\textbf d}_j}} |$$

The minimization of the coherence between H and D is equivalent to a nonconvex problem that minimizes the largest absolute entry of the difference between the Gram matrix ${({{\textbf {HD}}})^T}{\textbf {HD}}$ and the identity matrix Id

$$\mathop {\min }\limits_{{\textbf H},{\textbf D}} f({\textbf H},{\textbf D}) = {||{{{\textbf D}^T}{{\textbf H}^T}{\textbf {HD}} - {{\textbf I}_d}} ||_\infty }$$

In order to solve this problem, the coherence is represented by the Frobenius norm instead of the ℓ∞-norm, and the minimization Problem (15) can be rewritten in the form of

$$\mathop {\min }\limits_{{\textbf H},{\textbf D}} {\cal J}({{\textbf H},{\textbf D}} )= ||{{{\textbf D}^T}{{\textbf H}^T}{\textbf {HD}} - {{\textbf I}_d}} ||_F^2$$
where the subscript F denotes the Frobenius norm, i.e., the square root of the sum of the squares of all matrix elements.

Appendix C

The gradient of ${\cal J}$ with respect to X is calculated, where D is considered fixed:

$$\begin{aligned} \frac{{\partial {\cal J}}}{{\partial {\textbf X}}} &= \frac{\partial }{{\partial {\textbf X}}}||{{{\textbf D}^T}{{\textbf H}^T}{\textbf {HD}} - {\textbf I}} ||_F^2\\ &= \frac{\partial }{{\partial {\textbf X}}}Tr({{{({{{\textbf D}^T}{{\textbf H}^T}{\textbf {HD}} - {\textbf I}} )}^T}({{{\textbf D}^T}{{\textbf H}^T}{\textbf {HD}} - {\textbf I}} )} )\\ &= \frac{\partial }{{\partial {\textbf X}}}Tr({{{\textbf D}^T}{{\textbf H}^T}{\textbf {HD}}{{\textbf D}^T}{{\textbf H}^T}{\textbf {HD}}} )- 2\frac{\partial }{{\partial {\textbf X}}}Tr({{{\textbf D}^T}{{\textbf H}^T}{\textbf {HD}}} )\\ &= \frac{\partial }{{\partial {\textbf X}}}[{Tr({{{\textbf D}^T}{{({({{\textbf {BXC}}} )\odot {\textbf R}} )}^T}({({{\textbf {BXC}}} )\odot {\textbf R}} ){\textbf D}{{\textbf D}^T}{{({({{\textbf {BXC}}} )\odot {\textbf R}} )}^T}({({{\textbf {BXC}}} )\odot {\textbf R}} ){\textbf D}} )} ]\\ &- 2\frac{\partial }{{\partial {\textbf X}}}[{Tr({{{\textbf D}^T}{{({({{\textbf {BXC}}} )\odot {\textbf R}} )}^T}({({{\textbf {BXC}}} )\odot {\textbf R}} ){\textbf D}} )} ]\\ &= 4{{\textbf B}^T}({({{\textbf {HD}}{{\textbf D}^T}{{\textbf H}^T}{\textbf {HD}}{{\textbf D}^T}} )\odot {\textbf R}} ){{\textbf C}^T} - 4{{\textbf B}^T}({({{\textbf {HD}}{{\textbf D}^T}} )\odot {\textbf R}} ){{\textbf C}^T} \end{aligned}$$

Appendix D

The gradient of φ with respect to X is calculated:

$$\begin{aligned} \frac{{\partial \varphi }}{{\partial {\textbf X}}} &= \frac{\partial }{{\partial {\textbf X}}}||{{\textbf X} \odot (1 - {\textbf X})} ||_F^2\\ &= \frac{\partial }{{\partial {\textbf X}}}Tr({{{({{\textbf X} \odot (1 - {\textbf X})} )}^T}({{\textbf X} \odot (1 - {\textbf X})} )} )\\ &= \frac{\partial }{{\partial {\textbf X}}}Tr({{{({{\textbf X} - {{\textbf X}^2}} )}^T}({{\textbf X} - {{\textbf X}^2}} )} )\\ &= \frac{\partial }{{\partial {\textbf X}}}Tr({{{\textbf X}^T}{\textbf X} - {{({{\textbf X}^2})}^T}{\textbf X} - {{\textbf X}^T}{{\textbf X}^2} + {{({{\textbf X}^2})}^T}{{\textbf X}^2}} )\\ &= 2{\textbf X} - 6{{\textbf X}^2} + 4{{\textbf X}^3} \end{aligned}$$

Funding

National Natural Science Foundation of China (61327902).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. M. Borengasser, W. S. Hungate, and R. L. Watkins, Hyperspectral Remote Sensing: Principles and Applications (CRC, 2008), Chaps. 7–10.

2. I. Makki, R. Younes, C. Francis, T. Bianchi, and M. Zucchetti, “A survey of landmine detection using hyperspectral imaging,” ISPRS J. Photogramm. Remote Sens. 124, 40–53 (2017). [CrossRef]  

3. G. Lu and B. Fei, “Medical hyperspectral imaging: a review,” J. Biomed. Opt. 19(1), 010901 (2014). [CrossRef]  

4. A. Bjorgan and L. L. Randeberg, “Towards real-time medical diagnostics using hyperspectral imaging technology,” in Proceedings of the OSA European Conference on Biomedical Optics (Optical Society of America, 2015), paper 953712.

5. C. Tao and Z. Zheng, “Compressive Hyperspectral Imaging Enhanced Biomedical Imaging,” BJSTR 22(4), 16805–16807 (2019). [CrossRef]  

6. A. S. K. Kumar and A. R. Chowdhury, “Hyper-Spectral Imager in visible and near-infrared band for lunar compositional mapping,” J. Earth Syst. Sci. 114(6), 721–724 (2005). [CrossRef]

7. A. S. K. Kumar, A. R. Chowdhury, A. Banerjee, A. B. Dave, B. N. Sharma, K. J. Shah, K. R. Murali, S. Mehta, S. R. Joshi, and S. S. Sarkar, “Hyper Spectral Imager for lunar mineral mapping in visible and near infrared band,” Curr. Sci. 96(4), 496–499 (2009).

8. Y. Garini, I. T. Young, and G. McNamara, “Spectral imaging: Principles and applications,” Cytometry, Part A 69A(8), 735–747 (2006). [CrossRef]  

9. N. Hagen and M. W. Kudenov, “Review of snapshot spectral imaging technologies,” Opt. Eng. 52(9), 090901 (2013). [CrossRef]  

10. A. A. Wagadarikar, N. P. Pitsianis, X. Sun, and D. J. Brady, “Video rate spectral imaging using a coded aperture snapshot spectral imager,” Opt. Express 17(8), 6368–6388 (2009). [CrossRef]  

11. G. Arce, D. Brady, C. Lawrence, H. Arguello, and D. Kittle, “Compressive Coded Aperture Spectral Imaging: An Introduction,” IEEE Signal Process. Mag. 31(1), 105–115 (2014). [CrossRef]  

12. X. Cao, T. Yue, X. Lin, S. Lin, X. Yuan, Q. Dai, L. Carin, and D. J. Brady, “Computational snapshot multispectral cameras: Toward dynamic capture of the spectral world,” IEEE Signal Process. Mag. 33(5), 95–108 (2016). [CrossRef]  

13. C. Tao, H. Zhu, P. Sun, R. Wu, and Z. Zheng, “Hyperspectral image recovery based on fusion of coded aperture snapshot spectral imaging and RGB images by guided filtering,” Opt. Commun. 458, 124804 (2020). [CrossRef]  

14. C. Tao, H. Zhu, P. Sun, R. Wu, and Z. Zheng, “Simultaneous coded aperture and dictionary optimization in compressive spectral imaging via coherence minimization,” Opt. Express 28(18), 26587–26600 (2020). [CrossRef]  

15. M. Ding, P. W. Yuen, and M. A. Richardson, “Design of Single Prism Coded Aperture Snapshot Spectral Imager using Ray Tracing Simulation,” in 2018 IEEE British and Irish Conference on Optics and Photonics (BICOP), (IEEE, 2018), pp. 1–4.

16. M. P. Edgar, G. M. Gibson, and M. J. Padgett, “Principles and prospects for single-pixel imaging,” Nat. Photonics 13(1), 13–20 (2019). [CrossRef]  

17. S. S. Welsh, M. P. Edgar, R. Bowman, P. Jonathan, B. Sun, and M. J. Padgett, “Fast full-color computational imaging with single-pixel detectors,” Opt. Express 21(20), 23068–23074 (2013). [CrossRef]  

18. Z. Li, J. Suo, X. Hu, C. Deng, J. Fan, and Q. Dai, “Efficient single-pixel multispectral imaging via non-mechanical spatio-spectral modulation,” Sci. Rep. 7(1), 41435 (2017). [CrossRef]  

19. L. Bian, J. Suo, G. Situ, Z. Li, J. Fan, F. Chen, and Q. Dai, “Multispectral imaging using a single bucket detector,” Sci. Rep. 6(1), 24752 (2016). [CrossRef]  

20. J. A. Tropp and A. C. Gilbert, “Signal recovery from random measurements via orthogonal matching pursuit,” IEEE Trans. Inf. Theory 53(12), 4655–4666 (2007). [CrossRef]  

21. J. M. Bioucas-Dias and M. A. T. Figueiredo, “A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Trans. Image Process. 16(12), 2992–3004 (2007). [CrossRef]

22. E. J. Candes and M. B. Wakin, “An Introduction to Compressive Sampling,” IEEE Signal Process. Mag. 25(2), 21–30 (2008). [CrossRef]  

23. L. Galvis, E. Mojica, H. Arguello, and G. Arce, “Shifting colored coded aperture design for spectral imaging,” Appl. Opt. 58(7), B28–B38 (2019). [CrossRef]  

24. M. Elad and M. Aharon, “Image denoising via learned dictionaries and sparse representation,” in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1 (IEEE, 2006), pp. 895–900.

25. M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation,” IEEE Trans. Signal Process. 54(11), 4311–4322 (2006). [CrossRef]  

26. S. M. C. Nascimento, K. Amano, and D. H. Foster, “Spatial distributions of local illumination color in natural scenes,” Vision Res. 120, 39–44 (2016). [CrossRef]  

27. F. A. Kruse, A. Lefkoff, J. Boardman, K. Heidebrecht, A. Shapiro, P. Barloon, and A. J. Goetz, “The spectral image processing system (SIPS)—interactive visualization and analysis of imaging spectrometer data,” Remote Sens. Environ. 44(2-3), 145–163 (1993). [CrossRef]  
