On-the-fly compressive single-pixel foveation using the STOne transform

Abstract

Compressive imaging allows one to sample an image below the Nyquist rate yet still accurately recover it from the measurements by solving an L1 optimization problem. The L1 solvers, however, are iterative and can require significant time to reconstruct the original signal. Intuitively, the reconstruction time can be reduced by reconstructing fewer total pixels. The human eye reduces the total amount of data it processes by having a spatially varying resolution, a method called foveation. In this work, we use foveation to achieve a 4x improvement in L1 compressive sensing reconstruction speed for hyperspectral images and video. Unlike previous works, the presented technique allows the high-resolution region to be placed anywhere in the scene after the subsampled measurements have been acquired, has no moving parts, and is entirely non-adaptive.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Foveation is a concept in imaging where certain portions of the scene are resolved with a high pixel density and other parts are resolved with a lower pixel density. In conventional imaging, the primary benefit of foveation is a reduction in the amount of data that must be stored and/or transmitted, since less data is sampled outside a predetermined region of interest. In compressive sensing, however, the data is acquired in compressed form from the beginning, so less data needs to be stored to retain information for the entire image at full resolution in the first place [1–3]. We show, however, that foveation still benefits compressive imaging by reducing reconstruction time, since fewer pixels need to be reconstructed when only a small region of interest (RoI) is desired at maximum resolution. For large datacubes, such as in hyperspectral or video imaging, the $\ell _1$ reconstruction time can be very long, making any improvement in reconstruction speed valuable.

Previous work has been done on developing foveation frameworks for single-pixel cameras. Such approaches fall into two camps: either a fixed region of interest is used and the sensing model is built around its fixed location [4,5], or a low-resolution reconstruction of the scene is first acquired and the sensing matrix is then updated adaptively to take more measurements in the selected regions of interest [6,7]. Both fixed-region and adaptive methods suffer from significant drawbacks that limit their real-world effectiveness. Fixed-region methods must solve the chicken-and-egg problem of determining where the foveated region should be placed before any image has been acquired. In [4], this problem was avoided by performing Nyquist-rate sensing rather than compressive sensing; the foveated region was then moved adaptively based on the previous frame to improve global image quality. Nyquist-rate single-pixel imaging, however, creates a tradeoff between the resolution of the foveated region and the field of view of the camera, since images must be acquired quickly enough to dynamically update the displayed patterns. In [5], compressive sensing is used, but the issue of determining where to place the foveated region is not addressed. Adaptive foveation methods can dynamically determine where to place the foveated region, but when the scene changes quickly or the number of pixels in the scene becomes large, new patterns cannot be generated and uploaded to the spatial light modulator, such as a digital micromirror device (DMD), quickly enough for video-rate imaging. In addition, adaptive patterns are not as universal as uniform-resolution random patterns: if a different region of interest is chosen after acquisition, the new region likely cannot be reconstructed with any significant improvement in resolution.

In this work, we demonstrate a method that allows foveation anywhere in the image using fixed, non-adaptive DMD patterns, enabling easy scaling to higher-dimensional compressive imaging. Moreover, the regions of interest are determined entirely after the measurements have been acquired. By reconstructing fewer pixels than the full-resolution image requires, we reduce the reconstruction time. We are able to reconstruct foveated images from fixed democratic measurements through the use of the Sum-To-One (STOne) transform [8]. The STOne transform has previously proved its utility in compressive static and video imaging because it simultaneously captures high- and low-resolution information. Here we demonstrate that the same mathematical properties that produce multi-scale STOne patterns via the nested-embedding permutation can also be exploited to foveate on the fly in any region of the image, without any modification to the original acquisition patterns. In addition to the $\ell _1$ reconstruction, the same patterns admit a fast $\ell _2$ inverse transform at varying resolutions, which also resolves the question of where to foveate.

2. Foveated image model

In the standard compressive sensing model, measurement acquisition is described by

$$y=\Phi x$$
where $y$ is the vector of $M$ acquired measurements, $\Phi$ is the $M\times N$ fat sensing matrix, in this case the STOne transform under the nested-embedding and structured-random permutations [8], and $x$ is the length-$N$ vector describing the linearized scene that we want to reconstruct. This sensing model is implemented in hardware using the design shown in Fig. 1. To recover the image, one then solves the following $\ell _1$ optimization problem
$$x^*=\arg\min_{\hat{x}}\mu|y-\Phi \hat{x} |+|TV(\hat{x})|$$
where $TV$ is the total-variation sparsifying transform and $\mu$ is the regularization parameter that weights the measurement-fidelity term against the sparsifying transform. Smaller values of $\mu$ yield smoother images, but potentially images that no longer accurately represent the scene. Many sparsifying transforms are available, but $TV$ is well understood and widely used in imaging, so we adopt it here [8–10]. In a departure from previous methods, the proposed approach leaves the acquisition model for foveated imaging exactly the same as for standard single-pixel imaging; instead, the optimization problem is modified slightly. In the proposed model, it is necessary to solve
$$x^*=\arg\min_{\hat{x}}\mu|y-\bar{\Phi} \hat{x} |+|TV(\hat{x})|$$
where $\bar {\Phi }$ is a foveated STOne transform.
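
To make the sensing model of Eq. (1) concrete, here is a minimal numpy sketch. The random ±1 matrix is only a stand-in for the permuted STOne rows used in actual acquisition, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 64 * 64   # pixels in the linearized scene (illustrative)
M = N // 4    # 4:1 subsampling

# Stand-in for M rows of the STOne matrix under the nested-embedding
# and structured-random permutations of [8]: random +/-1 entries.
Phi = rng.choice([-1.0, 1.0], size=(M, N))

x = rng.random(N)   # linearized scene
y = Phi @ x         # compressive measurements, Eq. (1)
```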


Fig. 1. (a) The single pixel camera hardware architecture used for data acquisition. The detector here can be either a photodiode for grayscale imaging or a spectrometer for hyperspectral imaging. (b) The downsampling approximation enabled by the STOne transform used in our foveation method. (c) The foveation pipeline shown from the point-of-view of the sensing matrix to emphasize the importance of the downsampling approximation. Full-resolution STOne patterns are displayed on the DMD. During reconstruction, the STOne matrix is foveated using the upsample/downsample approximation. Undoing the nested-embedding permutation represents the DMD pattern as a vector that is highly redundant outside of the RoI. A variation on run-length encoding is used to compress this vector such that all pixels – high and low resolution – are represented by a single value representing the mirror state (on or off).


For this reconstruction model to give high-quality images, it is necessary that $\bar {\Phi }x$ be approximately equal to $y=\Phi x$. That is, foveated measurements of the image must be approximately the same as full-resolution measurements. This is guaranteed by the downsampling property of the STOne transform shown in Fig. 1(b). A more formal statement of this guarantee is given and proven in the supplemental material. The foveated transform is then obtained by leaving any regions of interest at the original resolution while downsampling everything else, as shown in Fig. 1(c). If the image vector $\hat {x}$ is a column vector, then we can represent the foveated transform as

$$\bar{\Phi}=\Phi D$$
where $D$ is an $N\times N_f$ foveation operator that downsamples regions outside the RoI, replacing blocks of high-resolution pixels with a single low-resolution pixel, much like a run-length encoding [11]. This downsampling and run-length encoding are shown in Fig. 1(c). Since matrix multiplication is associative, we can interpret the sampling operation
$$y=\Phi Dx=(\Phi D)x = \Phi (Dx)$$
as either performing a foveated STOne transform on a full-resolution image or performing a full-resolution STOne transform on a foveated image – the two approaches are mathematically equivalent. Since the full-resolution STOne matrix admits a fast transform, we implement the latter interpretation. The PDHG reconstruction algorithm used in this work requires both the forward transform of the sensing matrix and its adjoint [10], which is computed as
$$\bar{\Phi}^T=D^T\Phi^T$$

Noting that the STOne transform satisfies $\Phi ^T=\Phi$, we only need to determine $D^T$. Since $D$ foveates a linearized image by downsampling outside the RoI, by inspection its adjoint $D^T$ must upsample the foveated image back to full resolution outside the RoI. In practice, $D$ is readily computed by averaging each block of high-resolution pixels into a single low-resolution pixel, while its adjoint is computed as a nearest-neighbor upsampling of the low-resolution region.
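
As a concrete illustration of $D$ and its adjoint, the following numpy sketch foveates an image with a single rectangular RoI aligned to the low-resolution block grid. The function names are ours, as is one simplification: the low-resolution background here also covers the RoI area (rather than excising it), a small redundancy this sketch tolerates for clarity.

```python
import numpy as np

def foveate(img, roi, k):
    """D: keep the RoI at full resolution, block-average the rest.

    img : (H, W) full-resolution image, H and W divisible by k
    roi : (r0, r1, c0, c1) slice bounds of the region of interest
    k   : downsampling factor outside the RoI
    """
    r0, r1, c0, c1 = roi
    H, W = img.shape
    # Block-average: each k x k block becomes one low-resolution pixel.
    low = img.reshape(H // k, k, W // k, k).mean(axis=(1, 3))
    return img[r0:r1, c0:c1].copy(), low

def unfoveate(roi_block, low, roi, k):
    """D^T (as used in the paper): nearest-neighbor upsample the
    low-resolution background, then paste the full-resolution RoI in."""
    r0, r1, c0, c1 = roi
    up = np.repeat(np.repeat(low, k, axis=0), k, axis=1)
    up[r0:r1, c0:c1] = roi_block
    return up

# Usage: full resolution inside the RoI, blocky outside.
img = np.arange(64.0).reshape(8, 8)
roi = (2, 6, 2, 6)
roi_px, low = foveate(img, roi, k=2)
full = unfoveate(roi_px, low, roi, k=2)
```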

3. Methods

3.1 Experimental hardware setup

The hardware data shown in the following sections were acquired using the single-pixel camera shown in Fig. 1(a). A ViALUX DLPC410 DMD chipset was used as the spatial light modulator. To account for the fact that the STOne transform contains the values +1 and −1, we acquire data using a matrix with values +1 and 0 and then subtract the mean measurement intensity times the ratio of the total number of pixels per pattern to twice the number of +1 pixels per pattern. That is,

$$b_{new} = b - \bar{b}\times\frac{n_1 + n_0}{2n_1}$$
where $\bar {b}$ is the mean measurement value, $n_1$ is the total number of "on" pixels per pattern, and $n_0$ is the total number of "off" pixels per pattern. This gives an approximation of the "true" measurement taken with positive and negative values. For grayscale data, a Thorlabs PDA36A photodiode was used at maximum gain. For hyperspectral data, an Ocean Insight Flame-S-XR1-ES spectrometer with a 25 μm slit and 50 ms integration time was used.
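
A one-line helper expressing Eq. (7); the function name is ours. Here `n1` and `n0` may be scalars (if every pattern has the same on-pixel count) or per-pattern arrays; numpy broadcasting handles both.

```python
import numpy as np

def recenter(b, n1, n0):
    """Approximate +/-1 STOne measurements from the +1/0 DMD data
    by subtracting the scaled mean measurement, per Eq. (7)."""
    b = np.asarray(b, dtype=float)
    return b - b.mean() * (n1 + n0) / (2 * n1)
```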

3.2 Computation of derivatives

In the PDHG reconstruction algorithm assuming TV sparsity, it is necessary to compute the derivative operator and its adjoint. For images with mixed pixel sizes, some care is needed wherever pixels of two different sizes meet. For the results in this paper, we use the forward-difference derivative with the following rule wherever a larger pixel meets a smaller pixel. If the larger pixel is on the left of or above the smaller pixels, then we compute the average derivative across all smaller pixels along that side. If the smaller pixel is on the left of or above the larger pixel, then we compute only the difference between the smaller and larger pixel. The motivation is that the derivative at each pixel is then computed at the scale of that pixel.
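
Below is a one-dimensional sketch of this boundary rule, with hypothetical function names; the full 2-D operator applies the same logic row- and column-wise.

```python
import numpy as np

def diff_at_large(large_val, small_vals):
    """Forward difference AT a large pixel whose forward (right or
    lower) neighbors are several small pixels: average the individual
    differences, so the derivative lives at the large pixel's scale."""
    return np.mean(np.asarray(small_vals) - large_val)

def diff_at_small(small_val, large_val):
    """Forward difference AT a small pixel whose forward neighbor is
    a large pixel: just the single difference."""
    return large_val - small_val

# A large pixel of value 1.0 followed by two small pixels 0.6 and 0.8:
d_large = diff_at_large(1.0, [0.6, 0.8])   # -0.3
d_small = diff_at_small(0.6, 1.0)          #  0.4
```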

4. Results

4.1 Grayscale simulation and hardware experiments

In Fig. 2, we show the scalability of this foveation method by reconstructing the monarch test image at $512\times 512$ resolution and 10:1 subsampling using three different approaches: the Nyquist-rate $\ell _2$ preview, our foveation method, and the traditional full-resolution reconstruction. While the $\ell _2$ method is fastest, it can only reconstruct a noisy $128\times 128$ image. This is sufficient to inform the placement of the RoIs in our foveation method, but insufficient as a final reconstruction. The full-resolution reconstruction gives the highest quality across the entire image, but requires over 4 seconds to reconstruct. The proposed foveation approach requires only slightly over 1 second and gives comparable results in the selected foveated regions. While a reconstruction time of 1 second is too slow for real-time applications, this implementation runs on a CPU as a proof of concept. Other researchers have found that similar $\ell _1$ algorithms enjoy a time reduction of up to two orders of magnitude when implemented on a GPU, owing to speedups in the matrix-multiplication and fast-transform steps [12]. If our algorithm sees a similar improvement from a GPU implementation, it would reconstruct $512\times 512$ grayscale images at upwards of 10 fps, shifting the imaging bottleneck from reconstruction to acquisition.


Fig. 2. Foveated grayscale reconstruction from simulated data. (a) Reconstruction using 10:1 subsampling of a $512\times 512$ image. Using the exact same set of STOne measurements, an $\ell _2$ preview at $128\times 128$ resolution, a foveated image with $512\times 512$-equivalent pixel size in the RoI, and a full $512\times 512$ image can all be reconstructed. Note that our foveation technique requires 4x less time to reconstruct than the full-resolution image. In a complete pipeline, the $\ell _2$ preview can be used to inform the placement of the RoIs before foveated reconstruction. (b) A representation of the idea that acting on a full-resolution image with a foveated transform is equivalent to acting on a foveated image with a full-resolution transform.


To demonstrate that this algorithm works in hardware as well as in simulation, we acquired $128\times 128$ spatial-resolution data on a single-pixel camera and reconstructed at 4:1 compression using different foveated regions, as shown in Fig. 3. A full-resolution reconstruction and a high-resolution camera image are given for comparison. Due to the smaller number of pixels, the reconstruction error is more significant here than in the $512\times 512$ reconstruction, as becomes apparent when the resolution outside the RoI decreases. Despite this, we are always able to reconstruct a foveated image in less time than the full-resolution image requires.


Fig. 3. Foveated grayscale reconstructions from hardware data. All images (except for the ground truth) were reconstructed using the exact same data at 4:1 subsampling. Note that the 128/64 foveated reconstruction gives the highest SSIM among the foveated reconstructions. This is because the higher non-RoI resolution is a better approximation of the ground-truth scene, so the algorithm converges to a more correct image. In all cases, SSIM is computed only over the pixels at the maximum available resolution.


4.2 Foveated hyperspectral compressive imaging

We also extend this algorithm to the hyperspectral domain, noting that because we only modify the spatial sensing matrix, our foveation method works regardless of how any of the signal's other dimensions are sampled. The result is shown in Fig. 4. The data were acquired in the manner of [13] using an Ocean Insight Flame-S-XR1-ES spectrometer and reconstructed at 5:1 compression. To compare the spectra across the ground-truth and reconstructed images, the spectra are standardized to have a mean of 0 and a standard deviation of 1. Whereas the reconstruction speed was optimal for a minimal amount of downsampling in the grayscale $128\times 128$ image, the hyperspectral image achieves faster reconstruction for larger amounts of downsampling. The reason is that there are significantly more voxels in the hyperspectral reconstruction than there are pixels in the grayscale image, so the downsampling error becomes less significant, since the sparsity of an image grows roughly as the square root of the number of pixels. Because the error is less significant, the reconstruction algorithm reaches its convergence criterion more quickly than it otherwise would. Note that the spectra within the foveated region are smoother than those outside of it. This is because there are many more high-resolution pixels than low-resolution pixels, even though the low-resolution pixels cover a greater physical area in the field of view. Because there are fewer low-resolution pixels, their contribution to the loss in the reconstruction algorithm is less significant, so more error is tolerated in that area. In addition, the TV regularization cannot smooth the low-resolution region as much, since the scene changes more rapidly at the scale of the low-resolution pixels. Finally, note that we continue to reconstruct 4-5 times faster than the standard reconstruction even as the spectral dimensionality of the problem increases.
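
The standardization used for the spectral comparison is a per-spectrum z-score; a minimal helper (name ours):

```python
import numpy as np

def standardize(spectrum):
    """Scale a spectrum to zero mean and unit standard deviation so
    ground-truth and reconstructed spectra can be overlaid."""
    s = np.asarray(spectrum, dtype=float)
    return (s - s.mean()) / s.std()
```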


Fig. 4. Foveated hyperspectral reconstruction from hardware data. Pseudo-RGB image and extracted spectra from a $128\times 128\times 64$ datacube reconstructed at 5:1 compression using our foveation method and the standard full-resolution approach. Data were acquired using an Ocean Insight Flame spectrometer and reconstructed in the style of [13] from 500 to 700 nm. All spectra were standardized to have mean 0 and standard deviation 1. Note that the spectra within the foveated region are as smooth as in the full-resolution reconstruction, while outside the foveated region, the spectra appear noisier. Also note that the foveated reconstruction requires almost 5x less time than the full-resolution reconstruction.


4.3 Foveated hyperspectral video

We maintain this speedup even as we move to 4D imaging, with compression in the two spatial dimensions as well as the temporal one. To simulate hyperspectral video reconstruction, we used a cropped $128\times 128$ section of the "ball" video from [14]. The hyperspectral video had 16 spectral channels, which were sampled in the manner of [13]. The video reconstruction is given as the solution to the problem

$$x^*=\arg\min_{\hat{x}}\mu|y-\bar{\Phi}\hat{x}|+|TV_{4D}(\hat{x})|$$
where $TV_{4D}$ is a four-dimensional total-variation operator over space, spectrum, and time. Selected false-color frames of the 128/64 foveated reconstruction at 16:1 compression are shown in Fig. 5, along with spectral montages from two of the frames. The foveated reconstruction required only 211 seconds, whereas reconstructing in the usual way required 895 seconds. Note that the green half of the rolling ball is not visible in the red parts of the spectrum, but is clearly visible in the green parts.
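
For reference, an anisotropic 4-D total variation can be written as the sum of absolute forward differences along each axis of a (rows, cols, bands, frames) datacube. This numpy sketch assumes the plain anisotropic form and ignores the mixed-pixel boundary rule of Sec. 3.2, so it is illustrative rather than the solver's exact operator.

```python
import numpy as np

def tv4d(x):
    """Anisotropic 4-D TV of a (rows, cols, bands, frames) array:
    sum over all four axes of |forward differences|."""
    return sum(np.abs(np.diff(x, axis=a)).sum() for a in range(x.ndim))
```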


Fig. 5. Foveated hyperspectral video from simulated data. Selected frames of a reconstructed foveated $128\times 128\times 16\times 40$ hyperspectral video of a rolling two-color ball. The foveated reconstruction required 211 seconds, while the equivalent full-resolution reconstruction required 895 seconds. The montage shows each band of a single frame of the reconstruction, revealing that the green side of the ball is almost invisible in the red bands, while the orange side is bright there. Each band in the montage has been scaled to [0,1] for display purposes.


5. Extensions and conclusions

As an example extension, we note that our foveation method naturally parallelizes full-resolution reconstruction. Because the RoIs can be determined after measurement acquisition, a faster full-resolution reconstruction can be obtained by performing multiple reconstructions in parallel, each with a different foveated region, such that the set of all foveated regions covers the entire image. The RoIs can then be stitched together to form a final, full-resolution image, as shown in Fig. 6 and sketched below.
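
The following sketch illustrates the parallel-stitching idea under simplifying assumptions: `reconstruct_foveated(y, roi)` is a placeholder for the PDHG solver with the foveated transform and is assumed to return a full-frame image; the rectangular RoIs are assumed to tile the frame exactly (in practice they should overlap slightly and be feathered, as noted in Fig. 6); and the solver must be picklable for process-based parallelism.

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def parallel_full_res(y, rois, reconstruct_foveated):
    """Run one foveated reconstruction per RoI from the SAME
    measurements y, then keep only each result's RoI."""
    with ProcessPoolExecutor() as pool:
        imgs = list(pool.map(reconstruct_foveated, [y] * len(rois), rois))
    out = np.zeros_like(imgs[0])
    for (r0, r1, c0, c1), img in zip(rois, imgs):
        out[r0:r1, c0:c1] = img[r0:r1, c0:c1]  # stitch the RoIs
    return out
```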


Fig. 6. Foveation-enabled parallel full-resolution reconstruction schematic. Multiple reconstructions are performed in parallel with each reconstruction having a different RoI such that the RoIs cover the full field-of-view. The foveated regions from each reconstruction are then stitched together to form the complete image. The stitching procedure can produce gridding artifacts that are readily avoided by choosing the foveated regions to overlap slightly and then feathering them together.


Outside the realm of $\ell _1$ optimization, we anticipate that this foveation approach will also apply to neural-network reconstructions, such as those used in [15–17]. In these networks, a fully connected layer at the beginning of the network converts the measurements into an image. As the image becomes large or high-dimensional, however, the number of neurons needed in that initial layer grows correspondingly large. Training a network with so many neurons is computationally expensive, requires a large amount of GPU memory, and risks overfitting the training data [16]. By applying this same foveation concept in a neural network, the initial fully connected layer can be trained to reconstruct a foveated image rather than a full-resolution image, reducing the size of the layer. In addition, a suite of neural networks could be trained, each treating a different part of the image as the RoI; the reconstructed RoIs could then be stitched together, as described above, to recover the full image.
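
To make the size argument concrete, here is a back-of-the-envelope comparison of the initial fully connected layer's parameter count; all numbers are illustrative assumptions, not taken from [15–17].

```python
M   = 4096            # compressive measurements (illustrative)
N   = 512 * 512       # full-resolution pixels
N_f = 2 * 128 * 128   # e.g., a 128x128 full-res RoI plus a 128x128 low-res background

# Weight counts of the initial fully connected layer (biases ignored):
full_weights = M * N    # ~1.07e9 parameters
fov_weights  = M * N_f  # ~1.34e8 parameters
print(full_weights / fov_weights)  # 8.0x fewer weights to train
```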

Funding

Small Business Innovation Research (W909MY-17-C-0006).

Acknowledgments

The authors would like to acknowledge Dr. Sanjeev Agarwal at Army NVESD for useful discussions regarding the parallelization of this approach for faster recovery of the entire image.

Disclosures

The authors declare no conflicts of interest.

Data availability

The full-resolution STOne transform used in this paper was made available at [18] following the release of [8]. The foveated transform used in this paper as well as the code used to generate the reconstructed images in this paper are available at [19].

Supplemental document

See Supplement 1 (computation of the error introduced by foveating) for supporting content.

References

1. M. Duarte, M. Davenport, D. Takhar, J. Laska, T. Sun, K. Kelly, and R. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Process. Mag. 25(2), 83–91 (2008). [CrossRef]  

2. D. Takhar, J. Laska, M. Wakin, M. Duarte, D. Baron, S. Sarvotham, K. Kelly, and R. Baraniuk, “A new compressive imaging camera architecture using optical-domain compression,” in Electronic Imaging 2006, (International Society for Optics and Photonics, 2006).

3. E. Candès, “Compressive sampling,” in Proceedings of the International Congress of Mathematicians, vol. 3 (2006), pp. 1433–1452.

4. D. Phillips, M.-J. Sun, J. Taylor, M. Edgar, S. Barnett, G. Gibson, and M. Padgett, “Adaptive foveated single-pixel imaging with dynamic supersampling,” Sci. Adv. 3(4), e1601782 (2017). [CrossRef]  

5. Z. Shin, H. Lin, T.-Y. Chai, X. Wang, and S. Chua, “Programmable spatially variant single-pixel imaging based on compressive sensing,” J. Electron. Imag. 30(02), 021004 (2021). [CrossRef]  

6. Y. Yu, B. Wang, and L. Zhang, “Saliency-based compressive sampling for image signals,” IEEE Signal Process. Lett. 17(11), 973–976 (2010). [CrossRef]  

7. Z. Zhao, X. Xie, C. Wang, S. Mao, W. Liu, and G. Shi, “ROI-CSNet: compressive sensing network for ROI-aware image recovery,” Signal Process. Image Commun. 78, 113–124 (2019). [CrossRef]  

8. T. Goldstein, L. Xu, K. Kelly, and R. Baraniuk, “The STOne transform: multi-resolution image enhancement and compressive video,” IEEE Trans. on Image Process. 24(12), 5581–5593 (2015). [CrossRef]  

9. C. Li, “An efficient algorithm for total variation regularization with applications to the single pixel camera and compressive sensing,” Ph.D. thesis, Rice University (2010).

10. T. Goldstein, M. Li, and X. Yuan, “Adaptive primal-dual splitting methods for statistical learning and image processing,” in Advances in Neural Information Processing Systems (2015), pp. 2089–2097.

11. S. Wolfram, A New Kind of Science (Wolfram Media, Champaign, IL, 2002), chap. Processes of Perception and Analysis, pp. 560–563.

12. B. Shuang, W. Wang, H. Shen, L. Tauzin, C. Flatebo, J. Chen, N. Moringo, L. Bishop, K. Kelly, and C. Landes, “Generalized recovery algorithm for 3D super-resolution microscopy using rotating point spread functions,” Sci. Rep. 6(1), 30826 (2016). [CrossRef]  

13. T. Sun and K. Kelly, “Compressive sensing hyperspectral imager,” in Computational Optical Sensing and Imaging, (2009), p. CTuA5.

14. F. Xiong, J. Zhou, and Y. Qian, “Material based object tracking in hyperspectral videos,” IEEE Trans. on Image Process. 29, 3719–3733 (2020). [CrossRef]  

15. A. Mousavi, A. Patel, and R. Baraniuk, “A deep learning approach to structured signal recovery,” in 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton) (Monticello, Illinois, 2015), pp. 1336–1343.

16. A. Mousavi and R. Baraniuk, “Learning to invert: signal recovery via deep convolutional networks,” in 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing (IEEE, 2017), pp. 2272–2276.

17. C. Higham, R. Murray-Smith, M. Padgett, and M. Edgar, “Deep learning for real-time single-pixel video,” Sci. Rep. 8(1), 2369 (2018). [CrossRef]  

18. T. Goldstein, “Implementation of the STOne transform for compressed sensing video,” GitHub (2022) [accessed 12 May 2022], https://github.com/tomgoldstein/stone.

19. A. T. Giljum and K. F. Kelly, “STOne-Foveation,” GitHub (2022) [accessed 12 May 2022], https://github.com/atgRICE/STOne-Foveation.
