
High-performance reconstruction method combining total variation with a video denoiser for compressed ultrafast imaging


Abstract

Compressed ultrafast photography (CUP) is a novel two-dimensional (2D) imaging technique for capturing ultrafast dynamic scenes. Effective image reconstruction is essential in CUP systems. However, existing reconstruction algorithms mostly rely on image priors and complex parameter spaces; they are therefore generally time-consuming and yield poor imaging quality, which limits their practical applications. In this paper, we propose a novel reconstruction algorithm, to the best of our knowledge, named plug-and-play total variation-fast deep video denoising network (PnP-TV-FastDVDnet), which exploits an image’s spatial features together with correlation features in the temporal dimension and therefore offers higher-quality images than previously reported methods. First, we built a forward mathematical model of CUP and derived closed-form solutions to the three suboptimization problems within the plug-and-play framework. Second, we used an advanced neural-network-based video denoising algorithm, FastDVDnet, to solve the denoising subproblem. The peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) are improved on actual CUP data compared with traditional algorithms. On benchmark and real CUP datasets, the proposed method shows comparable visual results while reducing the running time by 96% relative to state-of-the-art algorithms.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. INTRODUCTION

Compressed ultrafast photography (CUP) [1] is a computational imaging technique that allows 2D imaging at picosecond or even femtosecond time scales. CUP combines compressed sensing (CS) [2] and streak camera technologies and is capable of capturing non-repetitive transient events with only a single shot. Because of its high temporal resolution, it has significant potential to reveal underlying mechanisms in the physical, chemical, and biological domains [2–4]. CUP is being applied widely in fluorescence lifetime detection [1,5,6], electron imaging [7], and ultrafast laser dynamics [8,9].

CUP can be divided into two steps [1]. The first step is image acquisition by a streak camera; this process can be viewed as multiple video frames being randomly sampled, offset, and superimposed into a single observed image. The second step is data reconstruction: the forward imaging model defines an inverse problem, and an optimization algorithm then converts the observed image back into a video. Theoretical guarantees for stable, accurate reconstruction at high compression ratios have been established [10,11]. As CUP reconstruction is an ill-posed problem, total variation (TV) [12–14], sparsity [15–18], and low rank [19,20] are usually chosen as penalty terms in the optimization problem. The two-step iterative shrinkage/thresholding algorithm (TwIST) [21] is a classical CUP reconstruction algorithm that employs TV constraints in the reconstruction process [1,8]; it is efficient but introduces artifacts. TwIST has further disadvantages: it contains many manually tuned parameters, is sensitive to initial values, and can struggle to converge. C. Yang utilized the augmented Lagrangian (AL) algorithm to reduce the dependence on penalty-term parameters and to obtain higher image reconstruction quality [22].

Recently, plug-and-play (PnP) has been applied as a flexible and efficient algorithmic framework [23,24] for solving inverse problems, with convergence proofs [25,26]. Its most significant advantage is that any state-of-the-art denoising algorithm can be plugged in to improve reconstruction quality. Lai et al. decoupled the optimization variables [27] using the alternating direction method of multipliers (ADMM) in the PnP framework. The block-matching 3D filtering (BM3D) denoiser [28] has also been plugged into the PnP framework to solve the denoising subproblem. Although this improves reconstructed image quality, reconstruction efficiency degrades, since BM3D operates frame-wise and is time-consuming. With the development of deep learning (DL), CNN-based image-denoising algorithms have shown better performance, such as the deep CNN denoiser prior for image restoration (IRCNN) [29], the deep CNN for image denoising (DnCNN) [30], and the fast and flexible solution for CNN-based image denoising (FFDNet) [31]. For scenarios (e.g., CUP) with insufficient data to support end-to-end training, PnP has the further advantage of using a denoiser trained in advance [26]. This property allows CNN-based denoisers to be applied in the reconstruction process. Q. Shen applied FFDNet within the PnP algorithmic framework; its reconstruction quality, speed, and flexibility exceed those of traditional reconstruction algorithms [32].

Fig. 1. Data compression process of CUP.

Image-denoising algorithms are widely used as denoisers in CUP reconstruction within the PnP framework. However, in the forward model, transient scenes that are continuous in time are discretely sampled into multiple frames, and there are correlations between frames in the temporal dimension. In previous works, the image-denoising algorithms used in PnP-based reconstruction could only exploit the spatial correlation within each frame. This paper applies the advanced CNN-based video denoising algorithm FastDVDnet [33] within the PnP framework to exploit the correlation between adjacent frames. The experimental and simulation results show that the proposed method improves reconstruction quality and efficiency.

2. MATHEMATICAL MODEL OF CUP

The CUP system compresses multiple video frames into one frame. As shown in Fig. 1, the data acquisition process contains three steps: encoding, shearing, and overlaying. A video dataset with $N$ frames (each of size ${n_x} \times {n_y}$) represents the ultrafast dynamic scene to be captured and is randomly sampled by a mask of size ${n_x} \times {n_y}$.

In the encoding step, the Hadamard (element-wise) product of the coding matrix $C$ and dynamic scene $X$ indicates that the encoding device, such as the DMD, randomly encodes each frame ${X_i}$ of the video data. In the shearing step, under the action of the streak camera, each frame is sheared by ${s_0}$ pixels along the sweep direction relative to the previous frame; this process can be represented by the shearing operator ${S_i}$. In the overlaying step, all images are accumulated into one observed image during the exposure time of the CCD embedded in the streak camera.

The CUP data compression process can be described as follows:

$${\boldsymbol Y}=\sum_{i=1}^{i=N}{{{\boldsymbol S}_{i}}\cdot \left( {\boldsymbol C}\odot {{\boldsymbol X}_{i}} \right)+\boldsymbol Z},$$
where the symbol $\odot$ denotes the Hadamard product, $\boldsymbol{Y},\boldsymbol{Z} \in {\mathbb{R}^{L \times {n_y}}}$ with $L = {n_x} + ({N - 1}){s_0}$, and ${\boldsymbol Z}$ denotes the noise in the imaging process. To convert the Hadamard product into a standard matrix product, the 2D matrices ${{\boldsymbol X}_i}$, ${\boldsymbol Y}$, and ${\boldsymbol Z}$ in Eq. (1) need to be converted into the one-dimensional vectors ${{\boldsymbol x}_i}$, ${\boldsymbol y}$, and ${\boldsymbol z}$:
$$\begin{split} {\boldsymbol y }&= \sum\limits_{i=1}^{N}{{{\boldsymbol S}_{i}^{\prime}}\cdot \left( {{\boldsymbol C}_{\rm diag}}{{\boldsymbol x}_{i}} \right)+{\boldsymbol z}} \\ & =\sum\limits_{i=1}^{N}{({{\boldsymbol S}_{i}^{\prime}}{{\boldsymbol C}_{\rm diag}}){{\boldsymbol x}_{i}}+{\boldsymbol z}} \\ & =\big[ {{\boldsymbol S}_{1}^{\prime}}{{\boldsymbol C}_{\rm diag}},{{\boldsymbol S}_{2}^{\prime}}{{\boldsymbol C}_{\rm diag}},\ldots,{{\boldsymbol S}_{N}^{\prime}}{{\boldsymbol C}_{\rm diag}} \big]{{\big[ {\boldsymbol x}_{1}^{T},{\boldsymbol x}_{2}^{T},\ldots, {\boldsymbol x}_{N}^{T} \big]}^{T}}+{\boldsymbol z},\end{split}$$
where ${{\boldsymbol x}_i} = {\rm Vec}({{{\boldsymbol X}_i}}) \in {\mathbb{R}^{{n_x}{n_y} \times 1}}$, ${\boldsymbol y} = {\rm Vec}({\boldsymbol Y}) \in {\mathbb{R}^{L{n_y} \times 1}}$, and ${\boldsymbol z} = {\rm Vec}({\boldsymbol Z}) \in {\mathbb{R}^{L{n_y} \times 1}}$. The symbol ${\rm Vec}({\cdot})$ denotes reshaping a 2D matrix into a 1D vector. ${{\boldsymbol C}_{{\rm diag}}}$ denotes the diagonal form of the encoding matrix ${\boldsymbol C}$, ${{\boldsymbol C}_{{\rm diag}}} = {\rm diag}({\rm Vec}({\boldsymbol C})) \in {\mathbb{R}^{{n_x}{n_y} \times {n_x}{n_y}}}$, where ${\rm diag}({\cdot})$ places a 1D vector on the diagonal of a square matrix. Equation (2) can then be written as a classical linear inverse problem:
$${\boldsymbol y} = {\boldsymbol H}{\boldsymbol x} + {\boldsymbol z},$$
where ${\boldsymbol x} = {[{{\boldsymbol x}_1^T,{\boldsymbol x}_2^T,\ldots,{\boldsymbol x}_N^T}]^{{T}}}$ and the sensing matrix ${\boldsymbol H} \in {\mathbb{R}^{L{n_y} \times {n_x}{n_y}N}}$ is the concatenation of $N$ blocks:
$${\boldsymbol H} = \left[{{{\boldsymbol H}_1},\ldots,{{\boldsymbol H}_N}} \right] = \left[{{\boldsymbol S}_1^{\prime} {{\boldsymbol C}_{{\rm diag}}},{\boldsymbol S}_2^{\prime} {{\boldsymbol C}_{{\rm diag}}},\ldots,{\boldsymbol S}_N^{\prime} {{\boldsymbol C}_{{\rm diag}}}} \right],$$
where ${\boldsymbol S}_i^{\prime} = {\rm Circshift}({{{\boldsymbol I}_0},({i - 1}){s_0}{n_y}}) \in {\mathbb{R}^{L{n_y} \times {n_x}{n_y}}}$, $i = 1,2,\ldots,N$, and the symbol ${\rm Circshift}({\boldsymbol A}, l)$ indicates that the matrix ${\boldsymbol A}$ is cyclically shifted by $l$ units along the vertical direction.
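To make the forward model concrete, the following minimal NumPy sketch implements the encode–shear–sum measurement of Eq. (1) and the adjoint operator ${\boldsymbol H}^T$ in matrix-free, image-domain form. The sizes, the shear step ${s_0} = 1$, and the names cup_forward and cup_adjoint are illustrative assumptions, not taken from the authors' implementation.

```python
import numpy as np

nx, ny, N, s0 = 256, 256, 8, 1        # frame size, frame count, shear step
rng = np.random.default_rng(0)
C = rng.integers(0, 2, size=(nx, ny)).astype(float)   # random binary mask
X = rng.random((N, nx, ny))                           # dynamic scene to capture

def cup_forward(X, C, s0):
    """Eq. (1): encode each frame with C, shear frame i by i*s0 rows, and sum."""
    N, nx, ny = X.shape
    Y = np.zeros((nx + (N - 1) * s0, ny))
    for i in range(N):
        Y[i * s0 : i * s0 + nx, :] += C * X[i]
    return Y

def cup_adjoint(Y, C, s0, N):
    """Adjoint H^T: cut out each frame's shifted window and re-apply the mask."""
    nx = Y.shape[0] - (N - 1) * s0
    return np.stack([C * Y[i * s0 : i * s0 + nx, :] for i in range(N)])

Y = cup_forward(X, C, s0)    # observed streak image of shape (L, ny)
```

Because the mask is binary, applying it twice equals applying it once, which is why the same element-wise product appears in both the forward and adjoint operators.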

3. PROPOSED RECONSTRUCTION METHODS

A. PnP-ADMM for CUP

For the linear inverse problem of Eq. (3), the PnP-ADMM algorithm iteratively solves the following three subproblems to find the optimal solution [25]:

$${{\boldsymbol x}^{({k + 1} )}} = \mathop {\rm argmin}\limits_x f\!\left({\boldsymbol x} \right)+ \frac{\rho}{2}\left\| {{\boldsymbol x}- \left({{{\boldsymbol v}^{(k )}}- \frac{1}{\rho}{{\boldsymbol u}^{(k )}}} \right)} \right\|_2^2,$$
$${{\boldsymbol v}^{({k + 1} )}} = {D_\sigma}\left({{{\boldsymbol x}^{(k + 1)}} + \frac{1}{\rho}{{\boldsymbol u}^{(k )}}} \right),$$
$${{\boldsymbol u}^{({k + 1} )}} = {{\boldsymbol u}^{(k )}} + \rho \left({{{\boldsymbol x}^{({k + 1} )}} - {{\boldsymbol v}^{({k + 1} )}}} \right).$$

For Eq. (5), since ${{\boldsymbol H\boldsymbol H}^T} = {\rm diag}({[{{\psi _1},\ldots,{\psi _{L{n_y}}}}]^T})$ is a diagonal matrix, a closed-form solution can be obtained [32]:

$${{\boldsymbol x}^{k + 1}} = \left({{{\boldsymbol \upsilon} ^k} - \frac{1}{\rho}{{\boldsymbol u}^k}} \right) + {{\boldsymbol H}^T}\!\left[{\boldsymbol y -\boldsymbol H\left({{{\boldsymbol \upsilon }^k} - \frac{1}{\rho}{{\boldsymbol u}^k}} \right)} \right]./\left({{\boldsymbol s} + \rho\boldsymbol I} \right)\!,$$
where ${\boldsymbol s} = {[{{\psi _1},{\psi _2},\ldots,{\psi _{L{n_y}}}}]^T}$, and ${{\cal D}_\sigma}{(\cdot)}$ in Eq. (6) denotes the denoising algorithm. Based on the PnP framework, any advanced denoising algorithm can be used to solve Eq. (6).
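As an illustration only, the following sketch wires Eqs. (5)–(7) together with the closed-form x-update of Eq. (8), reusing the hypothetical cup_forward and cup_adjoint helpers from the sketch above. The Gaussian blur is purely a placeholder for whatever denoiser ${\cal D}_\sigma$ is plugged in.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def denoise(v, sigma):
    """Placeholder for D_sigma; in practice TV, FFDNet, or FastDVDnet is used."""
    return gaussian_filter(v, sigma=(0, sigma, sigma))   # smooth in space only

def pnp_admm(Y, C, s0, N, rho=1.0, sigma=0.8, iters=50):
    s = cup_forward(np.ones((N,) + C.shape), C**2, s0)   # diag of H H^T
    x = cup_adjoint(Y, C, s0, N)                         # H^T y as initial guess
    v, u = x.copy(), np.zeros_like(x)
    for _ in range(iters):
        b = v - u / rho
        r = (Y - cup_forward(b, C, s0)) / (s + rho)      # elementwise, Eq. (8)
        x = b + cup_adjoint(r, C, s0, N)                 # closed-form x-update
        v = denoise(x + u / rho, sigma)                  # Eq. (6): denoising step
        u = u + rho * (x - v)                            # Eq. (7): dual update
    return v
```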

B. PnP-GAP for CUP

The solution steps of the PnP-GAP algorithm are as follows [34]:

$${{\boldsymbol x}^{k + 1}} = {{\boldsymbol \theta} ^k} + {{\boldsymbol H}^T}{\left({\boldsymbol H{{\boldsymbol H}^T}} \right)^{- 1}}\left({{\boldsymbol y} - \boldsymbol H{{\boldsymbol \theta} ^k}} \right),$$
$${{\boldsymbol \theta} ^{k + 1}} = {{\cal D}_\sigma}\big({{{\boldsymbol x}^{k + 1}}} \big).$$

Similarly, ${{\cal D}_\sigma}(\cdot)$ represents the denoising algorithm. Since $\boldsymbol H{{\boldsymbol H}^T} = {\rm diag}({{{[{{\psi _1},{\psi _2},\ldots,{\psi _{L{n_y}}}}]}^T}})$ is a diagonal matrix in CUP, Eq. (9) can be simplified as

$${{\boldsymbol x}^{k + 1}} = {{\boldsymbol \theta} ^k} + {{\boldsymbol H}^T}\left({{\boldsymbol y} -{\boldsymbol H}{{\boldsymbol \theta} ^k}} \right)./{\boldsymbol s},$$
where ${\boldsymbol s} = {[{{\psi _1},{\psi _2},\ldots,{\psi _{L{n_y}}}}]^T}$.
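A matrix-free sketch of the PnP-GAP loop follows, again assuming the cup_forward, cup_adjoint, and denoise placeholders defined above. Pixels never covered by the shifted mask (where an entry of ${\boldsymbol s}$ is zero) contribute nothing through the adjoint, so they can be guarded with any nonzero value.

```python
import numpy as np

def pnp_gap(Y, C, s0, N, sigma=0.8, iters=50):
    """Sketch of Eqs. (9)-(11); assumes the helpers from the sketches above."""
    s = cup_forward(np.ones((N,) + C.shape), C**2, s0)   # diag of H H^T
    s = np.where(s > 0, s, 1.0)                          # guard unmeasured pixels
    theta = cup_adjoint(Y, C, s0, N)
    for _ in range(iters):
        r = (Y - cup_forward(theta, C, s0)) / s          # Eq. (11): ./ s
        x = theta + cup_adjoint(r, C, s0, N)             # Euclidean projection
        theta = denoise(x, sigma)                        # Eq. (10)
    return theta
```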

C. Denoise Operators

FastDVDnet has a two-stage structure: five consecutive frames are divided into three overlapping groups that serve as inputs to Block1 [33], and the outputs of the three Block1 instances are the input to Block2. The architecture used in FastDVDnet can be found in Ref. [33]. The three Block1 instances share parameters, and Block1 and Block2 have the same structure, a modified version of the U-Net. It is worth noting that, compared with the original U-Net, the network here has two downsampling layers (whereas the original network has four), with downsampling achieved through strided convolution layers instead of pooling. In addition, upsampling is attained not through bilinear interpolation or deconvolution but through pixel shuffle. FastDVDnet has a simpler structure than DVDnet. In image/video denoising, the ${\rm L}_1$ loss is commonly used, as it preserves the overall information of the denoised image; here, however, we use the ${{\rm L}_2}$ loss in FastDVDnet, and the parameters of this denoising network are similar to those in Ref. [33].
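The schematic PyTorch sketch below shows only the five-frame, two-stage cascade; DenoisingBlock is a deliberately simplified stand-in for the modified U-Net (the real blocks, including the noise-map input, strided-convolution downsampling, and pixel-shuffle upsampling, are specified in Ref. [33]).

```python
import torch
import torch.nn as nn

class DenoisingBlock(nn.Module):
    """Simplified stand-in for the modified U-Net block of Ref. [33]."""
    def __init__(self, in_frames):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_frames, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x):            # x: (B, in_frames, H, W)
        return self.net(x)           # -> (B, 1, H, W)

class TwoStageCascade(nn.Module):
    """Five frames -> three overlapping triplets -> shared Block1 -> Block2."""
    def __init__(self):
        super().__init__()
        self.block1 = DenoisingBlock(3)   # one instance, so weights are shared
        self.block2 = DenoisingBlock(3)

    def forward(self, frames):            # frames: (B, 5, H, W)
        triplets = [frames[:, i:i + 3] for i in range(3)]   # overlapping groups
        mids = [self.block1(t) for t in triplets]           # shared Block1
        return self.block2(torch.cat(mids, dim=1))          # fuse in Block2
```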

4. SIMULATION RESULTS

A. Simulation on Benchmark Data

To verify the effectiveness of the proposed algorithm, we selected eight frames from each of the benchmark datasets runner, traffic, drop, and crash [34] and generated a compressed image according to the CUP data compression model. Then PnP-TV, PnP-BM3D, PnP-FFDNet, PnP-FastDVDnet, and PnP-TV $+$ FastDVDnet were used to recover video data of size ${256} \times {256} \times {8}$ from the generated image of size ${256} \times {256}$. We also used the deep-learning method (termed DL) proposed in Refs. [35,36] and the augmented-Lagrangian and deep-learning hybrid algorithm (termed AL $+$ DL) proposed in Ref. [37] to compare reconstruction performance. In addition, the space- and intensity-constrained (SIC) method (termed TV $+$ FastDVDnet-P) proposed in Ref. [38] was used to provide a more stringent evaluation of the proposed method. In the simulation, the streak camera measurement was generated according to the forward model, and the CCD measurement was generated by integrating the original dynamic scene along the time axis. The computing platform is configured as follows: the CPU is a 12th Gen Intel Core i7-12700H at 2.70 GHz, and the GPU is an NVIDIA GeForce RTX 3060 laptop GPU.

As shown in Fig. 2, one of the eight reconstructed frames from each simulation dataset is selected for comparison. Artifacts exist in the PnP-TV reconstruction, and blurring occurs in detailed regions, so its visual quality is relatively poor. Notably, the reconstruction performance of these algorithms is close when recovering the drop data, which has a simple image structure. In contrast, the reconstructions of PnP-BM3D and PnP-FFDNet show significant distortion when recovering the traffic and crash data, which have complex image structures. By comparison, PnP-FastDVDnet and PnP-TV $+$ FastDVDnet outperform the other algorithms in restoring image details.

Fig. 2. Reconstruction results of algorithms using different denoisers.

Table 1. Average PSNR (dB)

Table 2. Average SSIM

Table 3. Results of Average Execution Time (Second)

Tables 1 and 2 show the PSNR and SSIM of the reconstruction results, respectively. According to the average PSNR and SSIM, PnP-FastDVDnet is the best algorithm using a single denoiser, and the algorithm using the combined TV and FastDVDnet denoiser improves the PSNR and SSIM by a further 1.2 dB and 0.012, respectively. Regarding execution efficiency, the algorithm using the BM3D denoiser takes the longest average time, while the algorithms using the TV and FFDNet denoisers take less time but reconstruct less effectively than the FastDVDnet and TV $+$ FastDVDnet algorithms. The execution time of PnP-TV $+$ FastDVDnet is less than that of PnP-FastDVDnet because part of its iterations use the TV denoiser; at the same time, PnP-TV $+$ FastDVDnet performs better because it exploits both the TV prior and the deep video prior during reconstruction. An efficiency comparison of the different methods is shown in Table 3.
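For reference, the per-frame PSNR and SSIM averaged over a reconstructed sequence can be computed as in the generic sketch below using scikit-image; this is an evaluation convention we assume, not the authors' script.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def avg_psnr_ssim(x_true, x_rec, data_range=1.0):
    """Average per-frame PSNR (dB) and SSIM over sequences of shape (N, H, W)."""
    psnrs = [peak_signal_noise_ratio(t, r, data_range=data_range)
             for t, r in zip(x_true, x_rec)]
    ssims = [structural_similarity(t, r, data_range=data_range)
             for t, r in zip(x_true, x_rec)]
    return sum(psnrs) / len(psnrs), sum(ssims) / len(ssims)
```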

Fig. 3. Reconstruction results for different compression ratios.

Fig. 4. (a) PSNR (dB), (b) SSIM, and (c) running time of the reconstruction results as the compression ratio increases.

By applying the joint TV and FastDVDnet denoiser within the PnP framework, temporal correlations are exploited in the reconstruction process, and the reconstruction quality exceeds that of algorithms using a single denoiser, while the execution time is less than that of PnP-FastDVDnet.

B. Simulation on High Compression Ratio Data

To test the reconstruction performance of the proposed algorithm on data with larger compression ratios (CRs), 13 simulated datasets were generated by selecting 3, 6, 9, $\ldots$, 39 frames from the drop dataset. The CUP data compression ratio is defined as follows:

$$R = \frac{{{\rm size}\,({\boldsymbol x})}}{{{\rm size}\,({\boldsymbol y})}} = \frac{{{n_x}{n_y}N}}{{L{n_y}}} = \frac{{{n_x}N}}{L} = \frac{{{n_x}N}}{{{n_x} + {s_0}(N - 1)}}.$$

The number of compressed frames positively correlates with the data CR.
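A quick check of Eq. (12) under assumed parameters (${n_x} = 256$, ${s_0} = 1$) illustrates this: the ratio grows almost linearly with $N$ for small shear steps.

```python
def compression_ratio(nx, N, s0):
    """Eq. (12): R = nx * N / (nx + s0 * (N - 1))."""
    return nx * N / (nx + s0 * (N - 1))

for N in (3, 12, 21, 30, 39):
    print(N, round(compression_ratio(256, N, s0=1), 1))
# N = 3 gives R ~ 3.0; N = 39 gives R ~ 34.0
```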

Figure 3 shows the reconstruction results when the number of compressed frames is 3, 12, 21, 30, and 39. Compared with PnP-TV $+$ FastDVDnet, the algorithms using a single denoiser show distortion and blurring in some areas of the reconstructed image as the CR increases. The results show that PnP-TV $+$ FastDVDnet maintains high reconstruction performance at large CRs.

As shown in Fig. 4, the PSNR and SSIM of the results reconstructed with a single denoiser decrease rapidly as the CR increases. In contrast, the PnP-TV $+$ FastDVDnet curves decrease more gently, and its reconstruction performance exceeds that of the other algorithms once the number of compressed frames exceeds 24. The simulations on data with high compression ratios show that the proposed method outperforms the single-denoiser algorithms and takes less time to reconstruct than PnP-FastDVDnet.

Fig. 5. (a) PSNR (dB) and (b) SSIM variation with the number of iterations.

C. Simulation on Different Denoise Operators

With PnP, we can apply advanced denoisers to the solution of Eq. (3). Image-denoising algorithms are widely used in PnP-based optimization algorithms. However, in the CUP reconstruction problem, image-denoising algorithms only capture spatial features; they do not exploit the correlation between different video frames. For ultrafast dynamic scenes, considering the temporal correlation between sequential frames can guarantee higher-quality reconstructions.

In contrast, the FastDVDnet denoiser considers both image spatial features and the temporal correlation between video frames. We integrate the TV and FastDVDnet denoisers within PnP, using both the TV prior and the deep video prior to improve image reconstruction quality.

As shown in Fig. 5, we applied different denoisers in the reconstruction process. The optimization algorithm iterates a total of 450 times; for PnP-TV $+$ FastDVDnet, the TV denoiser is used for the first 225 iterations, followed by the FastDVDnet denoiser for the remaining 225, as in the sketch below. Each of the other methods uses a single denoiser for 150 iterations. Compared with the TV-only and FastDVDnet-only denoisers, the integrated TV and FastDVDnet denoiser achieves better PSNR and SSIM, even surpassing the image denoisers BM3D and FFDNet.
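A minimal sketch of this schedule follows: within one PnP run, the first 225 iterations call a TV denoiser and the remaining 225 call the video denoiser. scikit-image's denoise_tv_chambolle stands in for the TV step, and fastdvdnet_denoise is an assumed wrapper around a pretrained FastDVDnet (not provided here).

```python
from skimage.restoration import denoise_tv_chambolle

def fastdvdnet_denoise(x):
    """Assumed wrapper around a pretrained FastDVDnet model."""
    raise NotImplementedError("load a pretrained FastDVDnet and denoise x here")

def scheduled_denoise(x, k, total_iters=450):
    """Denoiser for iteration k: TV for the first half, FastDVDnet afterwards."""
    if k < total_iters // 2:
        return denoise_tv_chambolle(x, weight=0.1)   # TV phase (iterations 0-224)
    return fastdvdnet_denoise(x)                     # video phase (225-449)
```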

D. Balance between the TV and FastDVDnet

To reveal the principle of the combined denoiser, the total denoising expression can be described as ${\lambda _1}{\rm FastDVDnet} + {\lambda _2}{\rm TV}$ with ${\lambda _2} = k{\lambda _1}$. We set ${\lambda _1} = 0.0001$ and calculated the PSNR for $k = 1, 2, 3, \ldots, 10$; the result is shown in Fig. 6.

Fig. 6. PSNR curves as functions of the parameters ${\lambda _1}$ and ${\lambda _2}$.

From Fig. 6, we can see that the performance of the proposed method is insensitive to the variation of ${\lambda _2}$ in the range 0.0001–0.001.

5. REAL DATA

As indicated in Fig. 7, an ultrafast dynamic scene is collected by an objective lens (OL) and imaged onto a DMD (Texas Instruments, DLP LightCrafter 3000), which loads a static random pattern.

Fig. 7. Experimental system setup.

Fig. 8. Compressed CUP real data ${\boldsymbol E}$ and ${\boldsymbol N}$.

Fig. 9. Reconstruction results of real data ${\boldsymbol E}$ and the comparison among (a) TV, (b) BM3D, (c) FFDNet, (d) FastDVDnet, and (e) TV $+$ FastDVDnet.

Fig. 10. Reconstruction results of real data ${\boldsymbol N}$ and the comparison among (a) TV, (b) BM3D, (c) FFDNet, (d) FastDVDnet, and (e) TV $+$ FastDVDnet.

After that, the encoded ultrafast scene of each channel is relayed through a ${4}f$ imaging system composed of lenses ${{\rm L}_1}$ and ${{\rm L}_2}$ and reflected by a reflecting prism. Finally, all of them are captured by a streak camera (Hamamatsu C7700) with the slit fully opened; this scientific instrument captures ultrafast scenes based on the photoelectric effect and the temporal shearing technique [25,33]. In the first scene, a single laser pulse from a mode-locked Ti:sapphire laser amplifier (Spectra-Physics, 50 fs, 0.8 mJ) was stretched to 200 ps by a pulse stretcher. The stretched pulse was spatially expanded to illuminate hollow letters “${E}$” and “${N}$” fabricated in a black nylon plate. Photons inside the letter shapes could pass through the plate, while those outside were blocked. The resulting letter-shaped laser pulses were projected onto a thin white paper sheet for scattering and then observed by the CUP system. Figure 8 shows the compressed data ${\boldsymbol E}$ and ${\boldsymbol N}$ captured by the CUP system; their sequence depths are 21 and 30, respectively.

We applied the proposed and comparison methods to both datasets; Figs. 9 and 10 show the reconstruction results for real data ${\boldsymbol E}$ and ${\boldsymbol N}$, respectively. For PnP-TV, intensity variations are difficult to observe between different reconstructed frames; for PnP-BM3D and PnP-FFDNet, intensity variations can be observed but are not continuous. The PnP-FastDVDnet reconstruction shows a more coherent intensity change between adjacent frames, guaranteeing a better visual result.

Light springs (LSs), a type of novel-shaped ultrafast laser beam carrying orbital angular momentum (OAM), have a helical structure in both phase and intensity profile [13]. To extend the experiments to more complex real-world data, we conducted an experiment to capture LSs; the experimental details can be found in Ref. [39]. The above simulations and experiments have demonstrated that TV $+$ FastDVDnet has an advantage over state-of-the-art algorithms; therefore, we reconstruct the compressed image using only FFDNet, FastDVDnet, and TV $+$ FastDVDnet to demonstrate the efficiency of the proposed method. The reconstructed result is shown in Fig. 11.

Fig. 11. Reconstruction results of LSs and the comparison among (a) FFDNet, (b) FastDVDnet, and (c) TV $+$ FastDVDnet.

As shown in Fig. 11, the TV $+$ FastDVDnet reconstruction exhibits a more coherent intensity change between adjacent frames, guaranteeing a better visual result. To complement the reconstructed data with one-dimensional evolution information, the initial intensity-integral information is also provided. As demonstrated in Fig. 12, the temporal evolution characteristics of the reconstructed data closely resemble those of the original data.

Fig. 12. Temporal evolution of the laser pulse intensity.

The temporal evolution of the laser pulse intensity is also extracted for further comparison and shown in Fig. 12. The black reference line indicates the time evolution of the pulse intensity measured by the one-dimensional (1D) streak camera with its entrance slit narrowed to a few micrometers for spatial sampling. Figure 12 shows that all the data extracted from the reconstructed images agree well with the reference, which demonstrates that our method is capable of image reconstruction in CUP.

6. CONCLUSION

The proposed PnP-TV $+$ FastDVDnet exploits both the TV prior and the deep video prior of the image, offering higher-quality reconstruction. The reconstruction results on simulated and real data show that the integrated TV and FastDVDnet denoiser outperforms single-denoiser algorithms. Since the sequence depth of current CUP systems has exceeded 1000 frames, the proposed method is particularly valuable for reconstruction at high sequence depths.

Funding

National Natural Science Foundation of China (62101302).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

REFERENCES

1. L. Gao, J. Liang, L. Lin, et al., “Single-shot compressed ultrafast photography at one hundred billion frames per second,” Nature 516, 74–77 (2014). [CrossRef]  

2. Y. C. Eldar and G. Kutyniok, Compressed Sensing: Theory and Applications (Cambridge University, 2012).

3. J. Liang and L. V. Wang, “Single-shot ultrafast optical imaging,” Optica 5, 1113–1127 (2018). [CrossRef]  

4. D. Qi, S. Zhang, C. Yang, et al., “Single-shot compressed ultrafast photography: a review,” Adv. Photonics 2, 014003 (2020). [CrossRef]  

5. P. Wang, J. Liang, and L. V. Wang, “Single-shot ultrafast imaging attaining 70 trillion frames per second,” Nat. Commun. 11, 2091 (2020). [CrossRef]  

6. J. V. Thompson, J. D. Mason, H. T. Beier, et al., “High speed fluorescence imaging with compressed ultrafast photography,” Proc. SPIE 10076, 1007613 (2017). [CrossRef]  

7. X. Liu, S. Zhang, A. Yurtsever, et al., “Single-shot real-time sub-nanosecond electron imaging aided by compressed sensing: analytical modeling and simulation,” Micron 117, 47–54 (2019). [CrossRef]  

8. J. Liang, L. Zhu, and L. V. Wang, “Single-shot real-time femtosecond imaging of temporal focusing,” Light Sci. Appl. 7, 42 (2018). [CrossRef]  

9. T. Kim, J. Liang, L. Zhu, et al., “Picosecond-resolution phase-sensitive imaging of transparent objects in a single shot,” Sci. Adv. 6, eaay6200 (2020). [CrossRef]  

10. E. J. Candès, “The restricted isometry property and its implications for compressed sensing,” C. R. Math. 346, 589–592 (2008). [CrossRef]  

11. J. Hunt, T. Driscoll, A. Mrozack, et al., “Metamaterial apertures for computational imaging,” Science 339, 310–313 (2013). [CrossRef]  

12. S. Osher, M. Burger, D. Goldfarb, et al., “An iterative regularization method for total variation-based image restoration,” Multiscale Model. Simul. 4, 460–489 (2005). [CrossRef]  

13. E. Herrholz and G. Teschke, “Compressive sensing principles and iterative sparse recovery for inverse and ill-posed problems,” Inverse Probl. 26, 125012 (2010). [CrossRef]  

14. X. Yuan, “Generalized alternating projection based total variation minimization for compressive sensing,” in IEEE International Conference on Image Processing (ICIP) (2015), pp. 2539–2543.

15. D. Liu, Z. Wang, B. Wen, et al., “Robust single image super-resolution via deep networks with sparse prior,” IEEE Trans. Image Process. 25, 3194–3207 (2016). [CrossRef]  

16. P. Llull, X. Liao, X. Yuan, et al., “Coded aperture compressive temporal imaging,” Opt. Express 21, 10526–10545 (2013). [CrossRef]  

17. M. Iordache, J. E. M. Bioucas-Dias, and A. J. Plaza, “Total variation spatial regularization for sparse hyperspectral unmixing,” IEEE Trans. Geosci. Remote Sens. 50, 4484–4502 (2012). [CrossRef]  

18. D. Reddy, A. Veeraraghavan, and R. Chellappa, “P2C2: programmable pixel compressive camera for high speed imaging,” in Conference on Computer Vision and Pattern Recognition (2011), pp. 329–336.

19. W. Ren, X. Cao, J. Pan, et al., “Image deblurring via enhanced low-rank prior,” IEEE Trans. Image Process. 25, 3426–3437 (2016). [CrossRef]  

20. S. Gu, L. Zhang, W. Zuo, et al., “Weighted nuclear norm minimization with application to image denoising,” in IEEE Conference on Computer Vision and Pattern Recognition (2014), pp. 2862–2869.

21. J. M. Bioucas-Dias and M. Figueiredo, “A new TwIST: two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Trans. Image Process. 16, 2992–3004 (2008). [CrossRef]  

22. C. Yang, D. Qi, F. Cao, et al., “Improving the image reconstruction quality of compressed ultrafast photography via an augmented Lagrangian algorithm,” J. Opt. 21, 035703 (2019). [CrossRef]  

23. S. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in IEEE Global Conference on Signal and Information Processing (2013), pp. 945–948.

24. S. Sreehari, S. V. Venkatakrishnan, B. Wohlberg, et al., “Plug-and-play priors for bright field electron tomography and sparse interpolation,” IEEE Trans. Comput. Imaging 2, 408–423 (2015). [CrossRef]  

25. S. H. Chan, X. Wang, and O. A. Elgendy, “Plug-and-play ADMM for image restoration: fixed-point convergence and applications,” IEEE Trans. Comput. Imaging 3, 84–98 (2016). [CrossRef]  

26. E. K. Ryu, J. Liu, S. Wang, et al., “Plug-and-play methods provably converge with properly trained denoisers,” in Proceedings of the 36th International Conference on Machine Learning (2019).

27. Y. Lai, Y. Xue, C.-Y. Côté, et al., “Single-shot ultraviolet compressed ultrafast photography,” Laser Photonics Rev. 14, 2000122 (2020). [CrossRef]  

28. K. Dabov, A. Foi, V. Katkovnik, et al., “Image denoising by sparse 3D transform-domain collaborative filtering,” IEEE Trans. Image Process. 16, 2080–2095 (2007). [CrossRef]  

29. K. Zhang, W. Zuo, S. Gu, et al., “Learning deep CNN denoiser prior for image restoration,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017).

30. K. Ma, Z. Duanmu, Q. Wu, et al., “Waterloo exploration database: New challenges for image quality assessment models,” IEEE Trans. Image Process. 26, 1004–1016 (2016). [CrossRef]  

31. K. Zhang, W. Zuo, and L. Zhang, “FFDNet: toward a fast and flexible solution for CNN based image denoising,” IEEE Trans. Image Process. 27, 4608–4622 (2017). [CrossRef]  

32. Q. Shen, J. Tian, and C. Pei, “A novel reconstruction algorithm with high performance for compressed ultrafast imaging,” Sensors 22, 7372 (2022). [CrossRef]  

33. M. Tassano, J. Delon, and T. Veit, “FastDVDnet: towards real-time deep video denoising without flow estimation,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), pp. 1351–1360.

34. X. Yuan, Y. Liu, J. Suo, et al., “Plug-and-play algorithms for large-scale snapshot compressive imaging,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020), pp. 1444–1454.

35. M. Parker, “CUP-Net: compressed ultrafast photography using convolutional neural networks,” ENGS 88 Honors Thesis 15 (Dartmouth College, 2020).

36. Y. Ma, X. Feng, and L. Gao, “Deep-learning-based image reconstruction for compressed ultrafast photography,” Opt. Lett. 45, 4400–4403 (2020). [CrossRef]  

37. C. Yang, Y. Yao, C. Jin, et al., “High-fidelity image reconstruction for compressed ultrafast photography via an augmented-Lagrangian and deep-learning hybrid algorithm,” Photonics Res. 9, B30–B37 (2021). [CrossRef]  

38. L. Zhu, Y. Chen, J. Liang, et al., “Space- and intensity-constrained reconstruction for compressed ultrafast photography,” Optica 3, 694–697 (2016). [CrossRef]  

39. C. Jin, D. Qi, Y. Yao, et al., “Single-shot real-time imaging of ultrafast light springs,” Sci. China Phys. Mech. Astron. 64, 124212 (2021). [CrossRef]  


