Abstract
Recently, imaging systems with submicron image sensors have increasingly adopted the Quad Bayer color filter array (CFA) owing to its high sensitivity under low illumination when pixel binning techniques are employed. Under sufficient illumination, however, direct demosaicing of the Quad Bayer CFA is required to obtain a full-resolution color image. The Quad Bayer CFA is more susceptible to aliasing artifacts than the Bayer CFA because of its structural disadvantages. In this paper, we propose a multi-frame demosaicing method to address this problem. The proposed method assesses spatial contributions within local windows at each pixel to reconstruct the spatial resolution lost to aliasing. Additionally, we utilize the color difference domain to exploit inter-channel correlations, alleviating color artifacts and enhancing spatial resolution. Experimental results demonstrate that the proposed method outperforms other methods, both quantitatively and qualitatively, including single-frame demosaicing methods and a multi-frame demosaicing method combined with pixel binning techniques.
© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
Digital imaging systems capture images using various types of image sensors. These sensors measure the brightness of each pixel by quantifying the number of photons absorbed. However, the wavelength of the photons, which determines their color, cannot be discriminated by the image sensor itself. Therefore, a color filter array (CFA), a two-dimensional filter designed to selectively transmit a specific spectrum, is typically employed to capture color images.
The Bayer CFA [1], depicted in Fig. 1(a), is currently the most widely used CFA owing to its simple structure and cost efficiency. Generally, most image signal processors (ISPs) are designed to acquire full-color images using the Bayer CFA; they restore full-color images from the Bayer CFA image using various demosaicing techniques. Recently, image sensors have become smaller than ever before, leading to the increasing adoption of high-megapixel (MP) cameras, such as 48MP or 108MP sensors, in the latest mobile phones. However, these submicron image sensors exhibit relatively low sensitivity under low illumination conditions because of their reduced physical capacity to absorb photons.
To address this problem, some imaging systems that incorporate submicron image sensors have replaced the Bayer CFA with the Quad Bayer CFA [2]. The Quad Bayer CFA, as illustrated in Fig. 1(b), is devised to facilitate pixel binning techniques [3], combining adjacent pixels into a single pixel. Under low illumination conditions, the imaging system converts the full-resolution Quad Bayer CFA image to a quarter-resolution Bayer CFA image. Conversely, when the light source is sufficient for utilizing submicron image sensors, the Quad Bayer CFA image can be directly demosaiced to achieve a full-resolution color image.
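To make the binning step concrete, the following Python/NumPy sketch (the function name and block alignment are our own illustrative assumptions, not code from the cited works) averages each same-color 2×2 block of a Quad Bayer mosaic, yielding a quarter-resolution Bayer mosaic:

```python
import numpy as np

def bin_quad_bayer(raw):
    """Average each 2x2 block of a Quad Bayer mosaic into one pixel.

    In the Quad Bayer CFA, every 2x2 block shares a single color
    filter, so averaging the block raises the SNR without mixing
    colors and produces a quarter-resolution Bayer mosaic.
    Assumes the blocks are aligned with the top-left corner.
    """
    H, W = raw.shape
    assert H % 2 == 0 and W % 2 == 0
    blocks = raw.reshape(H // 2, 2, W // 2, 2)
    return blocks.mean(axis=(1, 3))

quad = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 mosaic
binned = bin_quad_bayer(quad)                     # 2x2 Bayer mosaic
```

Under sufficient illumination the system skips this step and demosaics the full-resolution mosaic directly, which is the regime the rest of this paper targets.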
The Quad Bayer CFA may appear similar in shape and properties to the Bayer CFA. Nevertheless, their structural characteristics are quite different from a demosaicing perspective. In the Bayer CFA, the distance between color pixels is identical in all directions, whereas in the Quad Bayer CFA, as illustrated in Fig. 1, it is irregular. Therefore, interpolating the missing color pixels of the Quad Bayer CFA is more complicated than for the Bayer CFA. Another significant distinction arises from their frequency domain representations, as illustrated in Fig. 2. Because an image acquired with a CFA is a subsampled representation of a full-color image, its spectrum comprises multiple superposed spectral components. In the Bayer CFA, these spectral components are relatively well separated. In the Quad Bayer CFA, however, they are positioned much closer together, making them more susceptible to mutual corruption. Consequently, images reconstructed from the Quad Bayer CFA are more vulnerable to aliasing artifacts. Owing to these structural differences, the numerous color demosaicing methods proposed for the Bayer CFA [4–8] may not be the most suitable solutions for the Quad Bayer CFA.
In this paper, we propose a multi-frame demosaicing method that can be directly applied to the Quad Bayer CFA. The proposed method reconstructs high-frequency details from multiple aliased images. Inspired by the multi-frame demosaicing method for the Bayer CFA [9], we utilize a regression approach with steering kernels [10] for image reconstruction. Furthermore, the color difference domain, which is assumed to be smoother than the spatial domain, is employed to exploit inter-channel correlation and enhance color fidelity.
The remainder of this paper is organized as follows. Section 2 provides a detailed description of the proposed method. Experimental results are given in Section 3, and the paper is concluded in Section 4. Additionally, Appendix A presents a detailed mathematical derivation of the frequency domain representation of the Quad Bayer CFA.
2. Proposed method
As explained, the spectral structure of the Quad Bayer CFA degrades the quality of the reconstructed image when it is demosaiced using only a single frame. However, numerous super-resolution studies have shown that corrupted high-frequency information can be restored when accurately registered multi-frame images are exploited [11–14]. Building on this idea, Wronski et al. [9] proposed a multi-frame demosaicing method for the Bayer CFA. As described above, however, it cannot be directly applied to the Quad Bayer CFA because of the structural distinctions between the two CFAs. In this section, we introduce a multi-frame demosaicing method specifically designed for the Quad Bayer CFA. Building upon Wronski’s work, the proposed method enhances the color fidelity and resolution of the demosaiced image by exploiting the color difference domain.
An overview of the proposed method is depicted in Fig. 3. The method is organized into four steps: luminance estimation, luminance processing, green channel demosaicing, and red/blue channel demosaicing. In the first step, luminance information is estimated from each input Quad Bayer CFA frame using a downsampling strategy; we compared several downsampling methods to determine which is most suitable for estimating luminance. In the second step, optical flow is measured from the estimated luminance channels for sub-pixel registration, and local statistics such as the local mean and local standard deviation are computed to obtain the noise parameter. The steering kernel is then derived from the measured optical flow, the noise parameter, and the kernel covariance yielded by local structure tensor analysis. In the third and fourth steps, multi-frame demosaicing using the steering kernel is performed in the color difference domain. For each input Quad Bayer CFA frame, the green channel is demosaiced first, and the color differences are then computed. These multi-frame color differences are merged into a single, fully demosaiced color difference channel. The fully demosaiced red and blue channels are obtained by adding the color difference channels to the fully demosaiced green channel. Finally, by stacking the fully demosaiced red, green, and blue channels, the demosaiced color image from the multi-frame Quad Bayer CFA input is produced.
2.1 Luminance channel estimation for the Quad Bayer CFA
The initial stage of the multi-frame demosaicing process involves sub-pixel registration. Several sub-pixel registration methods have been introduced, and optical flow estimation methods [15–17] are currently the most commonly used. These methods measure the spatial displacement between a reference frame and a target frame. However, optical flow cannot be estimated directly between two CFA frames since each CFA frame comprises multiple color channels. Therefore, the luminance channel, which contains red, green, and blue channel information, is generally estimated for optical flow measurement.
For the Bayer CFA, the luminance channel can be estimated simply by fusing neighboring color pixels. However, this is not the case for the Quad Bayer CFA, because the spatial resolution of the luminance channel would be reduced to a quarter, as depicted in Fig. 4(a). The estimated luminance channel in Fig. 5(b) shows this significant loss of spatial resolution compared to the ground truth luminance channel in Fig. 5(a). Since the loss of spatial resolution can lead to inaccurate sub-pixel registration, it should be minimized. Therefore, we propose shifted averaging schemes for luminance estimation, as illustrated in Fig. 4(b) and Fig. 4(c). The method in Fig. 4(b) exploits overlapped windows to minimize spatial resolution loss. In the method in Fig. 4(c), neighboring color pixels are combined with a smaller, non-overlapping window, excluding the first row and column. The pattern captured by the averaging windows may vary depending on how the pixels are placed, but this shifting procedure enables the estimation of the luminance channel with minimal spatial resolution loss. The spatial resolutions of the estimated luminance channels in Fig. 5(c) and Fig. 5(d) are considerably enhanced compared with Fig. 5(b). Additionally, Fig. 5(d) exhibits better estimation quality and less spatial resolution loss than Fig. 5(c) since it is computed with a smaller window. Consequently, the luminance estimation method of Fig. 4(c) is adopted for the proposed method.
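A minimal sketch of the adopted shifted averaging scheme is given below (Python/NumPy; this is our own reading of Fig. 4(c), with hypothetical names): the 2×2 averaging grid is offset by one pixel so that each window straddles neighboring color groups instead of covering a single-color block.

```python
import numpy as np

def estimate_luminance_shifted(raw):
    """Shifted non-overlapping 2x2 averaging for luminance estimation.

    Dropping the first row and column offsets the averaging grid by
    one pixel, so each 2x2 window mixes pixels from neighboring color
    groups rather than averaging a single-color block, which preserves
    more spatial detail (a sketch of the scheme in Fig. 4(c)).
    """
    shifted = raw[1:, 1:]                # exclude first row and column
    H, W = shifted.shape
    H, W = H - H % 2, W - W % 2          # crop to an even size
    blocks = shifted[:H, :W].reshape(H // 2, 2, W // 2, 2)
    return blocks.mean(axis=(1, 3))      # roughly half-resolution output
```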
Although the spatial resolution loss during luminance channel estimation is minimized by the shifted averaging scheme, the size of the luminance channel is still half that of the input CFA frames. Thus, the optical flow measured between the reference frame and the target frame must be interpolated to double its size. Let the input Quad Bayer CFA frames of size $(H \times W)$ be denoted as $\textbf{I}_{n\in \{0, 1, \ldots, N-1\}}$ and the corresponding estimated luminance channels as $\ell _{n\in \{0, 1, \ldots, N-1\}}$, where $N$ denotes the number of input Quad Bayer CFA frames, and $H$ and $W$ indicate the height and width of the input frames, respectively. The optical flow between the reference luminance $\ell _0$ and target luminance $\ell _t$, $(\bar {u}_t[n_1, n_2], \bar {v}_t[n_1, n_2])$, is measured if
2.2 Multi-frame demosaicing method using steering kernel estimation
The full-color image is reconstructed by demosaicing the input Quad Bayer CFA frames based on the measured optical flow and the estimated luminance channels. Similar to Wronski’s method [9], we utilized the steering kernel-based interpolation method [10] to combine multi-frame inputs. This method exhibits high performance in restoring high-frequency information from aliased images because the kernel is formed by evaluating local spatial contribution based on sub-pixel registration and edge directions. The restored color channels $\tilde {\textbf{I}}_{\textrm{c}\in \{\textrm{R}, \textrm{G}, \textrm{B}\}}[n_1, n_2]$ are obtained as follows:
Here, $\mathscr{W}$ denotes a local window with a center of $[n_1, n_2]$, and $\mathscr{w}_{[n_1, n_2]}$ represents the steering kernel at the pixel $[n_1, n_2]$.
As described in Eq. (2), the restored color pixel at $[n_1, n_2]$ is derived through a local weighted summation, first intra-frame and then inter-frame. The steering kernel is computed at every pixel position $[n_1, n_2]$ before conducting multi-frame demosaicing. For each input frame, a local weighted summation is performed using this steering kernel at $[n_1, n_2]$. Note that during this intra-frame computation, sub-pixel registration is taken into account, as the steering kernel is computed using the optical flow. The intra-frame weighted sums are then merged across the frames and normalized by the summation of the steering kernel at $[n_1, n_2]$. The multi-frame demosaicing is completed when this procedure is conducted for all pixel positions and all color channels, ${\textrm{c}\in \{\textrm{R}, \textrm{G}, \textrm{B}}\}$. However, to perform demosaicing in the color difference domain, only the green channel is fully demosaiced in advance. The detailed procedure of color difference domain demosaicing is provided in the next subsection. In the following, the method for computing the steering kernel is described, with a focus on the noise parameter and the kernel covariance matrix.
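The two-stage merge described above can be sketched as follows (Python/NumPy; the names and the simplification that each frame carries a precomputed per-pixel kernel map are our own assumptions, and sub-pixel warping is omitted for brevity):

```python
import numpy as np

def merge_pixel(frames, weights, y, x, r=1):
    """Kernel-regression merge of multiple frames at pixel (y, x).

    For each frame, samples in a (2r+1)x(2r+1) window are summed with
    their steering-kernel weights (intra-frame step); the per-frame
    sums are then accumulated over all frames and normalized by the
    total kernel weight (inter-frame step), mirroring the two-stage
    weighted summation of Eq. (2).
    """
    num, den = 0.0, 0.0
    for frame, w in zip(frames, weights):
        win = frame[y - r:y + r + 1, x - r:x + r + 1]
        ker = w[y - r:y + r + 1, x - r:x + r + 1]
        num += (ker * win).sum()         # intra-frame weighted sum
        den += ker.sum()
    return num / den                     # normalize across all frames
```

With uniform weights this degenerates to a plain multi-frame box average; the steering kernel replaces the uniform weights with anisotropic, registration-aware ones.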
The local weight $\mathscr{w}_{[n_1, n_2]}$ is determined from the noise parameter $\alpha$ and spatial structural information such as edge directions. The noise parameter, derived from the luminance channel, is used to determine the amount of denoising required at each pixel. Since the proposed method consistently computes local averages across multiple frames, the signal-to-noise ratio (SNR) increases on its own. However, detailed areas could be oversmoothed during this process. Therefore, in detailed areas, the shape of the steering kernel should be sharpened along the edge directions, whereas a rounder and larger kernel shape is more suitable in plain areas. Consequently, the noise parameter $\alpha$ is determined by comparing the local standard deviation $\sigma _\textrm{lc}$ of the luminance channel to the predicted standard deviation $\sigma _\textrm{pd}$. The determined standard deviation is derived as follows:
If the noise model is an additive white noise model, $\sigma _\textrm{pd}$ would be measured from the largest plain area since it is assumed to be constant. In contrast, for the Poisson-Gaussian noise model [18], the relationship between $\sigma _\textrm{pd}$ and signal intensity should be pre-defined since $\sigma _\textrm{pd}$ is proportional to the signal intensity. The noise parameter $\alpha$ is then defined as,
Another factor that affects the local contribution $\mathscr{w}_{[n_1, n_2]}$ is the spatial structure. It is analyzed through the eigenanalysis of the local structure tensor $\hat {\Omega }$, defined as:
The parameters $k_1(\lambda _1, \lambda _2)$ and $k_2(\lambda _1, \lambda _2)$ are determined based on the values of $\lambda _1$ and $\lambda _2$, following a similar approach as in Wronski’s method [9]. The kernel covariance matrix is then decomposed to determine the shape of the steering kernel, as follows:
The matrix $\textbf{U}_\theta$ represents the orientation of the steering kernel, and $\Lambda$ represents the ratio between the lengths of the major and minor axes of the kernel.
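The eigenanalysis pipeline can be sketched as follows (Python/NumPy; the mapping from the eigenvalues $\lambda_1, \lambda_2$ to the axis lengths is a simple placeholder of our own, not the paper's exact $k_1, k_2$):

```python
import numpy as np

def kernel_covariance(patch, eps=1e-6):
    """Steering-kernel covariance from the local structure tensor.

    The structure tensor is the window average of gradient outer
    products; its eigenvectors give the edge orientation U_theta and
    its eigenvalues the local contrast. The covariance is elongated
    along the edge (small-eigenvalue direction) and shortened across
    it, so the kernel smooths along edges rather than across them.
    """
    gy, gx = np.gradient(patch.astype(float))
    Om = np.array([[np.mean(gx * gx), np.mean(gx * gy)],
                   [np.mean(gx * gy), np.mean(gy * gy)]])
    lam, U = np.linalg.eigh(Om)          # lam[0] <= lam[1]
    # Placeholder axis ratios: large along the edge, small across it.
    k = np.sqrt((lam[::-1] + eps) / (lam + eps))
    return U @ np.diag(k) @ U.T          # C = U_theta * Lambda * U_theta^T
```

For a flat patch the covariance collapses to the identity (an isotropic kernel), while on an edge it stretches along the edge direction.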
From the noise parameter $\alpha$ and the kernel covariance matrix $\textbf{C}$, we defined the local contribution $\mathscr{w}_{[n_1, n_2]}$ as follows:
2.3 Multi-frame demosaicing method in the color difference domain
In this subsection, the image demosaicing method in the color difference domain will be explained. In Eq. (2), the color channels $\tilde {\textbf{I}}_{\textrm{c}\in \{\textrm{R}, \textrm{G}, \textrm{B}\}}[n_1, n_2]$ are restored in the spatial domain. However, it is widely understood that the color difference domain is more effective for image demosaicing tasks since images tend to be smoother in the color difference domain compared to the spatial domain [5–7]. Thus, we utilized the color difference domain to enhance spatial resolution and reduce color artifacts.
The green channel is selected as the reference channel for representing the color difference domain. This is due to the structure of the Quad Bayer CFA, where the number of pixels covered by the green color filter is twice that of red and blue. As shown in Fig. 7, the restored green channel in Fig. 7(b) exhibits superior spatial resolution and fewer artifacts compared to the red and blue channels in Figs. 7(a) and 7(c). The spatial resolution of the red and blue channels will be enhanced if the color difference domain is utilized.
To achieve the transition from the spatial domain to the color difference domain with the input Quad Bayer CFA frames, demosaicing must first be performed on the green channels. The green channels of all input frames, denoted as $\tilde {\textbf{I}}_{\textrm{G},t\in \{0, 1, \ldots, N-1\}}$, are demosaiced in the spatial domain using Eq. (2). Note that the parameters in Eq. (2) must be estimated individually for demosaicing the green channels, considering that the reference frame changes. The Quad Bayer CFA frames in the color difference domain are then defined as follows:
As depicted in Fig. 8, utilizing the color difference domain significantly reduces color artifacts and enhances the spatial resolution of the reconstructed color images.
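The red/blue reconstruction through the color difference domain can be sketched as follows (Python/NumPy; the normalized box average is a crude stand-in for the steering-kernel merge, and all names are our own):

```python
import numpy as np

def box_sum(a, r):
    """Edge-padded (2r+1)x(2r+1) box sum."""
    pad = np.pad(a, r, mode="edge")
    win = np.lib.stride_tricks.sliding_window_view(pad, (2 * r + 1, 2 * r + 1))
    return win.sum(axis=(2, 3))

def demosaic_red_via_color_difference(raw, green_full, red_mask, r=2):
    """Reconstruct the red channel through the color difference domain.

    The sparse difference G - R is formed at the red sample sites,
    interpolated by normalized averaging (the difference is smooth by
    assumption), and subtracted from the fully demosaiced green
    channel: R = G - (G - R).
    """
    diff = np.where(red_mask, green_full - raw, 0.0)
    num = box_sum(diff, r)
    den = box_sum(red_mask.astype(float), r)
    diff_full = num / np.maximum(den, 1e-6)   # interpolated G - R
    return green_full - diff_full
```

Because the color difference varies slowly, even this crude interpolation recovers the red channel well in smooth regions; the paper replaces the box average with the steering-kernel merge across frames.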
3. Experimental results
In this section, we describe our experimental methodology and provide qualitative and quantitative evaluations of the proposed method. We captured several images in a darkroom studio using a color filter wheel to generate a simulation dataset; the color filter wheel ensured that red, green, and blue color pixels were captured at exactly the same positions. The images captured without any missing pixels were then sub-sampled to create the Quad Bayer CFA images. Additionally, the camera was mounted on a rail that allowed both horizontal and vertical movement, and the multi-frame images were acquired by simulating the movement of the imaging system with these devices. We captured two image sets with different signal-to-noise ratios (SNRs), 30dB and 20dB, by acquiring images under varying illumination conditions. To perform sub-pixel registration, we estimated optical flow using Farneback’s method [16]. We used 8 Quad Bayer CFA frames to reconstruct each full-color image.
The proposed method is compared with two single-frame demosaicing methods, the modified residual interpolation method [8] adapted for the Quad Bayer CFA and a deep learning-based joint demosaicing and denoising method for the Quad Bayer CFA [19], as well as two multi-frame demosaicing methods, DBSR [20] and BIPNet [21]. For the single-frame demosaicing methods, we applied the demosaicing techniques to the base frame. Since the two multi-frame demosaicing methods are designed for the Bayer CFA, we applied the pixel binning technique to the Quad Bayer CFA frames beforehand: the Quad Bayer CFAs are converted to Bayer CFAs with half the spatial resolution through pixel binning and then reconstructed using joint demosaicing and super-resolution with a scale factor of 4. We then downscaled the result using pixel reduction with a scale factor of 2 to match the spatial resolution.
In Figs. 9–12, we compare the aliasing artifacts that occur in the reconstructed color images. Images reconstructed using single-frame demosaicing methods tend to exhibit aliasing issues due to the aforementioned structural disadvantage of the Quad Bayer CFA. In addition, images reconstructed using the pixel binning techniques suffer a significant loss in spatial resolution because pixel binning inherently causes degradation. Thanks to the multi-frame approach, the proposed method successfully restores the high-frequency information corrupted by aliasing in the Quad Bayer CFA. Figure 13 evaluates the demosaicing techniques on their ability to reconstruct color images along the edge directions; the proposed method exhibits fewer color artifacts between the horizontal and vertical edges, indicating better performance. The denoising performance is demonstrated in Fig. 14. The proposed method demonstrates superior denoising performance while preserving high-frequency components compared to the other methods, as evidenced by the peak signal-to-noise ratio (PSNR) results.
To evaluate the demosaicing methods quantitatively, we utilized PSNR and SSIM [22]. The PSNR measures the ratio of the signal power to the noise power, and the SSIM assesses image similarity by considering their mean, standard deviation, and covariance. A higher value in both metrics indicates better image quality. In Table 1, our proposed method demonstrates both a high PSNR value and a high SSIM value. Our proposed method exhibits higher PSNR in both 20dB and 30dB datasets. The SSIM metric of the proposed method is measured to be similar to or higher than that of conventional methods.
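For reference, the PSNR metric used here can be computed as in the following sketch (Python/NumPy; an 8-bit peak value is assumed):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

SSIM additionally compares local means, standard deviations, and covariances, so it is sensitive to structural distortions that a pixel-wise MSE misses.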
4. Conclusion
In this paper, a multi-frame demosaicing method for the Quad Bayer CFA was proposed. The proposed method reconstructs the high-frequency components lost to the aliasing artifacts caused by the inherent structural disadvantage of the Quad Bayer CFA using a multi-frame approach. Additionally, the color difference domain is employed to enhance the spatial resolution of the reconstructed color channels. Experimental results demonstrate that the proposed method performs better, both quantitatively and qualitatively, than both single-frame demosaicing methods and a multi-frame demosaicing method combined with pixel binning techniques. Future work could address occlusion and rapid local motion in the multi-frame setting, aspects that were not considered in the proposed method.
Appendix A: Structural analysis of the Quad Bayer CFA on the frequency domain
In this Appendix, we will derive the frequency domain representation of the Quad Bayer CFA. Given that the CFA images are obtained from the full-color image through a subsampling procedure, their relationship can be interpreted as modulation-demodulation relations [23,24]. The two-dimensional CFA image $\textbf{I}$ can be expressed as follows:
Note that $\textrm{c}\in {\{\textrm{R}, \textrm{G}, \textrm{B}}\}$ denotes the observed color channel, red (R), green (G) and blue (B). $\textbf{I}_{\textrm{c}}$ and $\textbf{M}_{\textrm{c}}$ denote the two-dimensional image and modulation function of the corresponding color channel $\textrm{c}$, respectively. The element-wise product operation between two matrices is represented as ${\odot }$. Equation (11) describes that the composition of the CFA is determined by the design of the modulation function $\textbf{M}_{\textrm{c}}$.
For the Quad Bayer CFA, the modulation functions are designed as follows:
The frequency domain representation of the Quad Bayer CFA can be obtained by applying discrete Fourier transform (DFT) to Eq. (11) as follows:
Using Eq. (16), Eq. (15) can be simplified as follows:
By rearranging Eq. (14) using Eq. (17), the frequency domain representation of the Quad Bayer CFA can be interpreted as a combination of a single luminance component and 8 chrominance components. The luminance component $F_L$, represented by the white circle in Fig. 2(b), is located at the $(\omega _1, \omega _2)$ position and can be defined as follows:
The chrominance components, represented by the blue circles in Fig. 2(b), can be defined similarly. As demonstrated in Eq. (17), the spacing between neighboring components is $\pi /2$ in the Quad Bayer CFA. Considering that this spacing is doubled in the Bayer CFA, it can be understood why the Quad Bayer CFA is more susceptible to aliasing issues. This susceptibility is what the multi-frame demosaicing approach proposed in this paper is designed to address.
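The $\pi/2$ spacing can also be verified numerically with a short Python/NumPy experiment (our own illustration): the green modulation function of the Quad Bayer CFA is 4-periodic, so its DFT is supported only on the frequency grid spaced $2\pi/4 = \pi/2$ apart.

```python
import numpy as np

# Green modulation function of the Quad Bayer CFA: a 4x4-periodic
# binary mask built from 2x2 same-color blocks, tiled to 16x16.
tile = np.array([[0, 0, 1, 1],
                 [0, 0, 1, 1],
                 [1, 1, 0, 0],
                 [1, 1, 0, 0]], dtype=float)
N = 16
mask = np.tile(tile, (N // 4, N // 4))

mag = np.abs(np.fft.fft2(mask))
# All spectral energy lies on bins that are multiples of N/4,
# i.e., on frequencies spaced pi/2 apart.
on_grid = np.zeros((N, N), dtype=bool)
on_grid[::N // 4, ::N // 4] = True
```

Beyond the luminance component at DC, several chrominance components appear on this grid; in the Bayer CFA the corresponding grid spacing is $\pi$, twice as wide, which is why its components are less prone to overlap.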
Funding
National Research Foundation of Korea (2022R1A2C2002897); Samsung Mobile eXperience (MX) Division.
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
References
1. B. Bayer, “Color imaging array,” United States Patent, no. 3971065 (1976).
2. I. Kim, S. Song, S. Chang, et al., “Deep image demosaicing for submicron image sensors,” J. Imaging Sci. Technol. 63(6), 060410-1–060410-12 (2019). [CrossRef]
3. G. A. Agranov, C. Molgaard, A. Bahukhandi, et al., “Pixel binning in an image sensor,” (2017). US Patent 9,686,485.
4. R. Kimmel, “Demosaicing: Image reconstruction from color ccd samples,” IEEE Trans. on Image Process. 8(9), 1221–1228 (1999). [CrossRef]
5. S.-C. Pei and I.-K. Tam, “Effective color interpolation in ccd color filter arrays using signal correlation,” IEEE Trans. Circuits Syst. Video Technol. 13(6), 503–513 (2003). [CrossRef]
6. W. Lu and Y.-P. Tan, “Color filter array demosaicking: new method and performance measures,” IEEE Trans. on Image Process. 12(10), 1194–1210 (2003). [CrossRef]
7. I. Pekkucuksen and Y. Altunbasak, “Multiscale gradients-based color filter array interpolation,” IEEE Transactions on Image Processing 22(1), 157–165 (2012). [CrossRef]
8. D. Kiku, Y. Monno, M. Tanaka, et al., “Beyond color difference: Residual interpolation for color image demosaicking,” IEEE Transactions on Image Processing 25(3), 1288–1300 (2016). [CrossRef]
9. B. Wronski, I. Garcia-Dorado, M. Ernst, et al., “Handheld multi-frame super-resolution,” ACM Trans. Graph. 38(4), 1–18 (2019). [CrossRef]
10. H. Takeda, S. Farsiu, and P. Milanfar, “Kernel regression for image processing and reconstruction,” IEEE Trans. on Image Process. 16(2), 349–366 (2007). [CrossRef]
11. S. Baker and T. Kanade, “Limits on super-resolution and how to break them,” IEEE Trans. Pattern Anal. Machine Intell. 24(9), 1167–1183 (2002). [CrossRef]
12. S. C. Park, M. K. Park, and M. G. Kang, “Super-resolution image reconstruction: a technical overview,” IEEE Signal Process. Mag. 20(3), 21–36 (2003). [CrossRef]
13. D. Robinson and P. Milanfar, “Fundamental performance limits in image registration,” IEEE Trans. on Image Process. 13(9), 1185–1199 (2004). [CrossRef]
14. D. Robinson and P. Milanfar, “Statistical performance analysis of super-resolution,” IEEE Trans. on Image Process. 15(6), 1413–1428 (2006). [CrossRef]
15. B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” in IJCAI’81: 7th international joint conference on Artificial intelligence, vol. 2 (1981), pp. 674–679.
16. G. Farnebäck, “Two-frame motion estimation based on polynomial expansion,” in Image Analysis: 13th Scandinavian Conference, SCIA 2003 Halmstad, Sweden, June 29–July 2, 2003 Proceedings 13, (Springer, 2003), pp. 363–370.
17. M. Tao, J. Bai, P. Kohli, et al., “Simpleflow: A non-iterative, sublinear optical flow algorithm,” in Computer graphics forum, vol. 31 (Wiley Online Library, 2012), pp. 345–353.
18. A. Foi, M. Trimeche, V. Katkovnik, et al., “Practical poissonian-gaussian noise modeling and fitting for single-image raw-data,” IEEE Trans. on Image Process. 17(10), 1737–1754 (2008). [CrossRef]
19. S. A. Sharif, R. A. Naqvi, and M. Biswas, “Beyond joint demosaicking and denoising: An image processing pipeline for a pixel-bin image sensor,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, (2021), pp. 233–242.
20. G. Bhat, M. Danelljan, L. Van Gool, et al., “Deep burst super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), pp. 9209–9218.
21. A. Dudhane, S. W. Zamir, S. Khan, et al., “Burst image restoration and enhancement,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2022), pp. 5759–5768.
22. Z. Wang, A. C. Bovik, H. R. Sheikh, et al., “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]
23. D. Alleysson, S. Susstrunk, and J. Hérault, “Linear demosaicing inspired by the human visual system,” IEEE Trans. on Image Process. 14(4), 439–449 (2005). [CrossRef]
24. E. Dubois, “Frequency-domain methods for demosaicking of bayer-sampled color images,” IEEE Signal Process. Lett. 12(12), 847–850 (2005). [CrossRef]