Optica Publishing Group

Prior-free imaging unknown target through unknown scattering medium

Open Access

Abstract

Imaging through scattering media based on deep learning has been extensively studied. However, existing methods mainly rely on priors from paired data and lack fusion with the physical process, making it difficult to reconstruct hidden targets without pre-trained networks. This paper proposes an unsupervised neural network that integrates the universal physical process. The reconstruction process is independent of the system and requires only a single-frame speckle pattern and unpaired targets. The proposed network enables online optimization by exploiting the physical process instead of fitting data, so large-scale paired data no longer need to be collected to train the network in advance, and no prior information is required. Because the optimization of the network is a physics-based process rather than a data-mapping process, the proposed method also alleviates the insufficient generalization of learning-based methods across scattering media and targets. The universal applicability of the proposed method to different optical systems increases the likelihood that it will be used in practice.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Light traveling through inhomogeneous media, such as a hazy atmosphere or biological tissue, suffers serious degradation of target information, which can cause target detection to fail [1,2]. At present, imaging through scattering media is mainly achieved by wavefront shaping [3,4], optical coherence tomography [5,6], and methods based on the transmission matrix [7,8] or the point spread function [9,10]. Besides, learning-based methods have played an important role in computational imaging [11-13]. The strong data-fitting ability of deep learning offers a good way to find the relationship between speckle patterns and the original targets [14,15]. However, this mapping relationship mainly represents the characteristics of the dataset rather than the physical process. Learning-based methods therefore face two challenges: they need a large-scale paired training dataset, and they must generalize across different scattering conditions [16].

Different neural networks have been used to restore targets hidden behind scattering media. The convolutional neural network (CNN), such as U-Net [17], is a very popular network type and demonstrates superior performance in cell segmentation and detection [18], image recovery [19], 3D imaging [20], and other fields. U-shaped networks also perform well in imaging through scattering media. Li et al., for the first time, used the U-shaped structure "IDiffNet" to restore targets behind a scattering medium [14]. Situ et al. constructed a neural network that can restore targets behind a thick scattering medium with an optical depth of 13.4 [15]. "PDSNet", designed by Guo et al., reconstructs complex targets behind scattering media and restores targets beyond 40 times the optical memory effect range [21]. The generative adversarial network (GAN), as another network type, performs well in image super-resolution [22,23], bright-field holography [24], and other problems. Some work also uses the GAN structure to restore targets behind scattering media. Sun et al. used a GAN with a single-photon avalanche diode (SPAD) detector to image through a medium with fixed scattering-particle concentration in a low-photon regime [25]. The above methods perform well on specific problems, but all of them need a large amount of paired data collected in advance to optimize the network weights. They are often ineffective when the properties of the scattering medium change greatly and the network has not been trained with corresponding paired data.

In addition to reconstructing targets hidden behind known scattering media, GANs and CNNs also show strong reconstruction ability for unknown scattering scenes, but a large amount of paired data is still needed. Sun et al. classified the input speckle patterns with a sub-network in front of a GAN and restored targets behind five scattering media with different concentrations [26]. A U-shaped network integrated with the physical process was proposed by Zhu et al. for imaging through two ground glasses with unknown attributes [27]. KL de Jong et al. proposed an unsupervised CNN-based method for change detection, but its input images also have a clear structure [28]. Although these methods solve, to a certain extent, the problem that deep learning does not transfer across scattering media, their implementation still relies on thousands or even tens of thousands of paired samples. This drawback means that many speckle patterns and their corresponding targets must be obtained invasively. Yamazaki et al. used a semi-supervised CycleGAN to restore targets, but only in the weak-scattering case, where the input still retains the structural information of the original target [29]. Unsupervised training approaches perform well in phase imaging [30], lens-free imaging [31], computational ghost imaging [32], medical image analysis [33], and other fields. However, they cannot restore the original targets directly, because speckle lacks features resembling the target structure. Therefore, the actual physical process must be combined into the network so that the original target information can be extracted from the speckle patterns in an unsupervised learning process.

In this paper, a method based on an unsupervised network combined with a physical model is proposed. The method does not need any labeled or paired data to optimize the neural network in advance; only one frame of speckle pattern and unpaired real targets are needed as input for the online learning process. The unsupervised training approach exploits the physical process rather than simply mapping from speckle patterns to targets. Because the method does not need to collect the input and output of the system as paired training data, it is independent of the system, and no prior information needs to be known. The peak signal-to-noise ratio (PSNR) of restored cross-category targets hidden behind an unknown scattering medium can exceed 17 dB. Through the effective combination of the physical model and the neural network, this method gives learning-based approaches more possibilities for practical imaging through scattering media.

2. Method

2.1 Basic theory

Imaging through scattering media based on speckle correlation is impressive, and the optical memory effect (OME) has been shown to exist in different scattering scenes [34-36]. This theory provides a theoretical basis for imaging through unknown scattering media. Speckles collected under a slight displacement of the incident light are highly correlated. Therefore, the optical system can be considered a linear system within the OME range. The collected speckle pattern $I$ can be expressed as:

$$I=O\ast PSF,$$
where $O$ represents the intensity information of the original target, $PSF$ represents the point spread function of the system, and the symbol $\ast$ represents the convolution operation. The autocorrelation of speckle pattern can be calculated as:
$$I\otimes I=(O\ast PSF)\otimes (O\ast PSF)=(O\otimes O)\ast (PSF\otimes PSF),$$
where $\otimes$ represents the correlation operation, and the autocorrelation of the system point spread function $(PSF\otimes PSF)$ is a peak function within the OME range. So the autocorrelation of the speckle is approximately equal to the autocorrelation of the original target. In addition, the speckle autocorrelation usually has a constant background term $C$, so it can be further expressed as:
$$I\otimes I=(O\otimes O)+C.$$

The relationship between autocorrelation of targets and autocorrelation of speckle patterns provides the possibility to construct consistency constraints that can replace data prior provided by paired data and achieve unsupervised reconstruction.
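Under this relation, the autocorrelation can be computed efficiently via the Wiener-Khinchin theorem: it is the inverse Fourier transform of the power spectrum. The following minimal sketch (a toy illustration, not the paper's code; the square target and uniform random PSF are assumptions) checks numerically that a speckle formed by convolving a target with a random PSF has nearly the same autocorrelation as the target itself:

```python
import numpy as np

def autocorrelation(img):
    """Autocorrelation via the Wiener-Khinchin theorem: the inverse FFT
    of the power spectrum, with the DC background removed first."""
    img = img - img.mean()
    ac = np.real(np.fft.ifft2(np.abs(np.fft.fft2(img)) ** 2))
    ac = np.fft.fftshift(ac)          # zero-lag peak moved to the center
    return ac / ac.max()

# Toy target (8x8 square) and a random PSF; the speckle is their
# circular convolution, I = O * PSF.
rng = np.random.default_rng(0)
target = np.zeros((64, 64))
target[28:36, 28:36] = 1.0
psf = rng.random((64, 64))
speckle = np.real(np.fft.ifft2(np.fft.fft2(target) * np.fft.fft2(psf)))

ac_target = autocorrelation(target)
ac_speckle = autocorrelation(speckle)
```

The central peak of `ac_speckle` reproduces the main structure of `ac_target`, while the residual off-peak fluctuations correspond to the background term $C$ plus finite-averaging noise.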

To prove that the autocorrelation consistency applies to different scattering conditions, three diffusers with different scattering characteristics are selected as the scattering media: 220-grit ground glass, 600-grit ground glass, and a zinc oxide (ZnO) flake. These three media differ obviously in scattering characteristics and have been widely used in previous work because they are easy to characterize and provide a standardized scattering environment [27,35-38]. As shown in Fig. 1(a), for the same target, the speckle patterns formed through different scattering media have obviously different distributions, but the main structure of their autocorrelations matches that of the original target. The intensities of the different speckle patterns and autocorrelations at the positions marked by the white dotted lines in Fig. 1(a) are plotted in Fig. 1(b)-(c), respectively. The intensity distributions of the speckle patterns corresponding to different scattering media are almost irregular, whereas the autocorrelations of the different speckle patterns have almost the same structure as the original target autocorrelation, except for the background noise that originates from replacing the ensemble average with an average over a finite number of speckles [36].


Fig. 1. Speckle patterns and autocorrelations corresponding to different scattering media. (a) The first row shows the original target and the corresponding speckle patterns formed by different scattering media; the second row shows the autocorrelations corresponding to the target and speckle patterns in the first row. (b)-(c) Intensity values along the white dotted lines in (a), respectively. Scale bar: 1368 $\mu m$.


Speckles formed through different scattering media have the same autocorrelation structure under the same target and optical setup, and this phenomenon helps the neural network build a closed-loop optimization process and replace the data prior with the physical process. In addition, the consistency between the target autocorrelation structure and the speckle autocorrelation structure provides an important constraint that allows the neural network to realize an unsupervised optimization process.

2.2 Model design

Point-to-point constraints are often required for U-shaped networks optimized in an end-to-end manner. In the GAN strategy, the generator and discriminator optimize each other through an adversarial process. Pixel-level constraints between input and output are replaced by the discriminator, and the high-dimensional features it extracts are more universal than simple pixel-level features; this strategy helps the network find a better solution in some special optimization scenarios. Generally, the input of a GAN is data containing information about the original target. The generator produces the target to be reconstructed, and the generated target (labeled Fake) and the original target (labeled Real) are sent to the discriminator, which judges whether each is real or fake. Finally, the difference between the discriminator output and the original label (Real or Fake) is used to optimize the generator and the discriminator.

GAN applications in computational imaging usually follow two supervision strategies. One is strongly supervised: the input data are paired one-to-one with the original data. Like supervised CNNs, this strategy requires thousands of paired targets and speckle patterns to optimize the network. The other is unsupervised: the input data are unpaired with the real targets, which usually requires strong shared structural information between the input data and the real targets. Hence current GAN-based methods for restoring targets hidden behind scattering media are also supervised. Since speckle patterns contain almost no structure similar to the original target, directly training a GAN with an unsupervised strategy cannot yield a correct reconstruction. Unsupervised strategies were nearly impossible in previous work, but the proposed approach constructs effective unsupervised constraints by incorporating the physical process.

To overcome the shortcoming that existing neural networks can recover the original target structure only from a large amount of high-quality labeled data, an unsupervised neural network with a GAN structure is constructed, as shown in Fig. 2. The network consists of a generator and a discriminator; the generator's input is the collected speckle pattern, and the discriminator's inputs are the target reconstructed by the generator and real targets unpaired with the speckle patterns. The source code of the generator and discriminator is shown in Code 1 [39]. The input speckles are 600*600 pixels, and sub-autocorrelations and sub-speckles of 64*64 pixels are used in the optimization process after the correlation operation. In the supervised approach, one-to-one pairing between targets and speckle patterns is a prerequisite. In contrast, although a dataset of real targets is required in the proposed method, it is not paired with the speckle patterns. Real targets whose intensity distributions differ from the hidden objects can also be used in the reconstruction process. Therefore, prior information about the hidden targets, such as the structure type, need not be obtained in advance, and the target corresponding to the speckle pattern does not appear in the dataset used to optimize the network. The unpaired real targets are used by the discriminator to extract high-dimensional features, and these features together with the autocorrelation consistency constrain the reconstruction. During online learning, the sizes of the input speckle patterns and autocorrelation structures are unified, so the efficiency of the network hardly decreases even when larger speckle patterns are used as input.
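For concreteness, cropping a large speckle pattern into fixed-size sub-speckles can be sketched as below. The 600*600 input and 64*64 patch size follow the text; the non-overlapping stride is an assumption, since the section does not state how the sub-speckles are sampled:

```python
import numpy as np

def extract_sub_speckles(speckle, patch=64, stride=64):
    """Crop a large speckle pattern into patch x patch sub-speckles.
    stride=patch gives non-overlapping tiles (an assumed choice)."""
    h, w = speckle.shape
    tiles = [speckle[y:y + patch, x:x + patch]
             for y in range(0, h - patch + 1, stride)
             for x in range(0, w - patch + 1, stride)]
    return np.stack(tiles)

# A 600*600 speckle yields a stack of 64*64 sub-speckles.
subs = extract_sub_speckles(np.zeros((600, 600)))
```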


Fig. 2. Schematic diagram of the unsupervised neural network based on a physical model for imaging through scattering media. Real unpaired targets need to be prepared in advance, but no prior information about the hidden targets is needed to select the real target types.


As analyzed in Section 2.1, the speckle autocorrelation is approximately equal to the target autocorrelation, and this consistency informs the structural design of the neural network. For the generator, not only is the authenticity of the generated image used as an optimization direction, but the consistency between the speckle autocorrelation and the generated-target autocorrelation is also designed as a constraint to guide the generator's optimization.

The optimization of the neural network is divided into the optimization of the discriminator and that of the generator, and their parameter updates alternate, so the weights of the whole network can be updated online without prior training. All labels of the real targets are considered real; after discrimination, the binary cross-entropy (BCE) function computes the loss against the all-true labels, and the discriminator weights are optimized. All labels of the generator output are considered fake, and the BCE loss after discrimination is used to optimize the discriminator and generator respectively. The autocorrelation consistency between the generated targets and the speckle patterns is also used to optimize the generator: the speckle autocorrelation and the generated-target autocorrelation are compared by the mean square error (MSE) as another part of the generator's loss.

Therefore, the BCE function mainly optimizes the discriminator to distinguish generated targets as well as possible, while it also optimizes the generator so that the features of the generated targets become more similar to those of the unpaired real targets. After the BCE constraint, the generated targets resemble the real targets in features, but they still need the MSE constraint to enforce the physical process. The constrained physical process effectively extracts the original target information from the speckle to reconstruct the target structure.

Finally, the loss function of the neural network can be divided into two parts, the one used to optimize the discriminator is $L_{D}$ and the other one used to optimize the generator is $L_{G}$. $L_{D}$ can be calculated as:

$$L_D=BCE(label_{raw},label_{real} )+BCE(label_{generated},label_{fake} ),$$
where $label_{raw}$ is the output of the discriminator when the original target is input, and $label_{generated}$ is the output of the discriminator when the generated target is input. $label_{real}$ and $label_{fake}$ are all-one and all-zero matrices, respectively. $BCE(.)$ is the Binary Cross-Entropy function, which is defined as:
$$BCE(p,y)={-}[y\ast \log (p)+(1-y)\ast \log (1-p) ].$$
$L_G$ is the loss function of the generator, which can be calculated as:
$$L_G=2\ast MSE(corr_{speckle},corr_{generated} )+BCE(label_{generated},label_{real} ),$$
where $corr_{speckle}$ is the autocorrelation of the original speckle and $corr_{generated}$ is the autocorrelation of the generated target. $MSE(.)$ is the Mean Square Error function, which is defined as:
$$MSE(I',I)=\frac{1}{H \ast W}\sum_{i=1}^H\sum_{j=1}^W{[I'(i,j)-I(i,j)]^2},$$
where $H$ and $W$ are the height and width of the input image respectively.
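The loss terms in Eqs. (4)-(7) can be sketched as plain functions. This is an illustrative NumPy version, not the authors' released code; in practice the discriminator outputs and autocorrelations would be framework tensors:

```python
import numpy as np

def bce(p, y, eps=1e-12):
    """Binary cross-entropy, Eq. (5); p are discriminator outputs in (0, 1)."""
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

def mse(a, b):
    """Mean square error, Eq. (7)."""
    return float(np.mean((a - b) ** 2))

def discriminator_loss(label_raw, label_generated):
    """Eq. (4): push D(real) toward 1 and D(fake) toward 0."""
    return bce(label_raw, np.ones_like(label_raw)) + \
           bce(label_generated, np.zeros_like(label_generated))

def generator_loss(corr_speckle, corr_generated, label_generated):
    """Eq. (6): autocorrelation consistency (weight 2) plus adversarial term."""
    return 2 * mse(corr_speckle, corr_generated) + \
           bce(label_generated, np.ones_like(label_generated))
```

In the alternating scheme described above, `discriminator_loss` would update the discriminator and `generator_loss` the generator at each online step.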

3. Experiment

3.1 Experimental system and data acquisition

The schematic diagram of the optical system used for speckle collection is shown in Fig. 3. A laser (center wavelength: 650 nm) passes through a rotating ground glass, and the beam is collimated by a lens. The diameter of the collimated beam is adjusted by a pupil, and the beam finally illuminates a digital micromirror device (DMD) (pixel count: 1024*768, pixel size: 13.68 $\mu m$) after passing through a total internal reflection (TIR) prism. The target information is loaded onto the beam upon reflection by the DMD, and the reflected beam illuminates the scattering medium. Finally, an industrial camera (pixel count: 1920*1200, pixel size: 5.86 $\mu m$) collects the speckle pattern. The distance between the DMD and the scattering medium is 29 cm, and the distance between the scattering medium and the camera is 6 cm.


Fig. 3. Schematic diagram of experimental system structure.


To verify that the proposed method applies to different scattering situations, three scattering media are used as the diffuser in data acquisition: 220-grit ground glass, 600-grit ground glass, and ZnO flakes. The loaded targets mainly come from MNIST [40], Fashion-MNIST [41], Kuzushiji-Kanji [42], and English characters. Two groups of 5000 targets are randomly selected from MNIST and Fashion-MNIST as original targets. In addition, 5 characters each from Kuzushiji-Kanji and the English characters are selected as further testing targets to verify the reconstruction ability of different methods for targets with obvious structural differences. Targets from different datasets can be considered different categories because of their obvious structural differences. To verify cross-category reconstruction, the data used to optimize the neural network and the data used to test the model come from different datasets. During data collection, all targets loaded on the DMD are scaled to 80*80 pixels, and speckle patterns are collected for each of the three scattering media. The memory-effect (ME) ranges of the 220-grit diffuser, 600-grit diffuser, and ZnO flakes are measured and correspond to 124, 132, and 103 DMD pixels, respectively. Therefore, all targets are within the ME range.

Two groups of 4500 targets each, from MNIST and Fashion-MNIST respectively, are used as the real targets to optimize the proposed method in different experiments, and the speckle patterns used for reconstruction correspond to the remaining 500 targets in each group. For comparison with the supervised method, the targets used to optimize the proposed method and their corresponding speckle patterns are composed into paired training sets. The remaining 500 pairs of speckle patterns and targets in each group serve as testing sets. The supervised method uses the speckle-ground-truth (GT) paired training sets to train the network in advance, and the paired testing sets verify the effectiveness of the trained model. All the neural networks mentioned in this paper run on a platform with a CPU i7-9700 and a GPU RTX3090, under Ubuntu 16.04, accelerated by PyTorch 1.8 and CUDA 11.1.

3.2 Comparison of the proposed method and the speckle correlation technology

Restoring the original target structure from the autocorrelation of speckle patterns has been extensively studied, and the theoretical basis of this approach also comes from the consistency between speckle autocorrelation and target autocorrelation. Speckle correlation technology obtains the Fourier-domain amplitude of the target from the speckle autocorrelation and estimates the corresponding phase through a phase-retrieval algorithm to restore the original target. The hybrid input-output (HIO) method is used as a representative phase-retrieval algorithm to compare the effectiveness of speckle correlation technology and the proposed method.
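As a rough illustration of how HIO alternates between a Fourier-magnitude constraint and an object-domain constraint, a minimal sketch follows. This is a generic textbook-style HIO loop, not the implementation used in the paper; the support mask, feedback parameter `beta`, and iteration count are assumptions:

```python
import numpy as np

def hio(fourier_mag, support, n_iter=50, beta=0.9, seed=0):
    """Minimal hybrid input-output (HIO) phase-retrieval sketch.
    fourier_mag: measured Fourier magnitudes (e.g. derived from the
    speckle autocorrelation); support: boolean mask for the object."""
    rng = np.random.default_rng(seed)
    g = rng.random(fourier_mag.shape) * support   # random start in the support
    for _ in range(n_iter):
        G = np.fft.fft2(g)
        # Fourier constraint: keep the estimated phase, impose the magnitude.
        G = fourier_mag * np.exp(1j * np.angle(G))
        g_prime = np.real(np.fft.ifft2(G))
        # Object constraint: where the support or non-negativity is violated,
        # take a damped feedback step instead of zeroing (the HIO rule).
        violate = (~support) | (g_prime < 0)
        g = np.where(violate, g - beta * g_prime, g_prime)
    return g * support

obj = np.zeros((32, 32)); obj[12:20, 12:20] = 1.0
rec = hio(np.abs(np.fft.fft2(obj)),
          np.zeros((32, 32), bool) | False)  # placeholder; see usage below
```

In use, `support` would be a loose mask around the expected object extent, e.g. `support[8:24, 8:24] = True` for the toy object above; reconstruction quality degrades when the autocorrelation (and hence the magnitude estimate) is noisy, which matches the comparison in Fig. 4.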

Fashion-MNIST and MNIST, used for comparison, differ greatly in the distribution of target structures: MNIST mainly consists of thin linear structures, while Fashion-MNIST mainly consists of continuous block structures. Speckle patterns of the same size are used for both methods. The HIO algorithm is iterated 25 times, and the proposed method is trained online for 20 epochs. The restored targets are shown in Fig. 4. When speckle correlation technology restores a target with a linear structure, the original target can be reconstructed completely, but the details of targets with continuous block structures cannot be restored effectively. The proposed method reconstructs both types of objects effectively, and the details of the restored targets are more accurate, such as the curved structure of the "bent pants" and the notch of the character "0". Speckle correlation technology relies on the autocorrelation structure for reconstruction, so when the structure is complex or the autocorrelation is not clear enough, the quality of the reconstructed target drops significantly. The proposed method includes the discriminator's optimization process, so it can recover the original targets with more accurate details using unpaired targets. The hidden-target reconstruction may still fail when the ratio of background intensity to the average autocorrelation intensity exceeds 0.6; however, both background intensity and noise can be suppressed by selecting a larger speckle and filtering the speckles.


Fig. 4. Comparison of restored targets between the HIO algorithm and the proposed unsupervised method. Scale bar: 547.2 $\mu m$.


In the unsupervised constraint, the autocorrelation of the speckle patterns is approximately equal to the original target autocorrelation. However, the background noise in the autocorrelation is strongly random, cannot provide an effective constraint, and leads to random noise around the reconstructed target. The intensity of this noise gradually decreases as the used area of the speckle pattern increases.

To verify the effect of speckle patterns of different sizes on the reconstruction, autocorrelations of sub-speckles of different sizes are used as input. The reconstruction results of the proposed method are shown in Fig. 5(b), and the intensity values at the positions marked by white dotted lines are shown in Fig. 5(c). The central intensity distributions of the autocorrelations for different speckle sizes are highly consistent; the difference mainly comes from the background noise. As shown in the enlarged area in Fig. 5(c), the intensity curve (purple) corresponding to a 600*600-pixel sub-speckle is the flattest, and the surrounding background noise becomes smoother as the sub-speckle size increases. The background noise in the speckle autocorrelation disturbs feature extraction in the pixel-level constraint, and these disturbances produce random noise in the final reconstruction, shown in Fig. 5(b). As the input sub-speckle size increases, the random noise around the targets is effectively suppressed.
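The averaging argument can be illustrated numerically: the off-peak background of a normalized autocorrelation shrinks as the speckle area grows. A small sketch with a synthetic random pattern (an assumption standing in for measured speckle):

```python
import numpy as np

def background_std(size, seed=0):
    """Std of the off-peak autocorrelation background for a random
    speckle-like pattern of the given size (peak normalized to 1)."""
    rng = np.random.default_rng(seed)
    img = rng.random((size, size))
    img = img - img.mean()
    ac = np.real(np.fft.ifft2(np.abs(np.fft.fft2(img)) ** 2))
    ac = np.fft.fftshift(ac) / ac.max()
    c = size // 2
    mask = np.ones((size, size), bool)
    mask[c - 2:c + 3, c - 2:c + 3] = False   # exclude the central peak
    return float(ac[mask].std())
```

Comparing, e.g., `background_std(100)` with `background_std(400)` shows the background fluctuation dropping as more speckle grains are averaged, consistent with the trend in Fig. 5(c).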


Fig. 5. Reconstruction quality comparison with speckle patterns of different sizes as input. (a) The autocorrelation of the GT and the autocorrelations corresponding to sub-speckles of 300*300, 400*400, 500*500, and 600*600 pixels, respectively. Sub-speckles of different sizes come from the same speckle pattern. (b) GT and reconstruction results with speckle patterns of different sizes as input. (c) Intensity values along the white dotted lines in the autocorrelation structures of (a); the part in the red box is enlarged. Scale bar: 547.2 $\mu m$.


The proposed method is optimized online by incorporating the physical scattering process, and this unsupervised strategy reconstructs the structure and details of the target better than speckle correlation. As the size of the used speckle pattern increases, the background noise of the autocorrelation decreases, and the noise in the target reconstructed by the proposed method is reduced. The generative-discriminative approach compensates for the limited optimization ability of the traditional method and improves the reconstruction of complex targets and their details.

3.3 Quality comparison of restored targets hidden behind the unknown scattering medium

To compare the effectiveness of different supervision strategies in different scattering scenes, two supervised neural networks are selected. The first is ERFNet, which is based on end-to-end learning and has a U-shaped architecture [43]; as a backbone, ERFNet has shown excellent performance in restoring color targets and targets beyond the OME range [38]. In addition, pixel2pixel GAN (P2PGAN) [44] is widely used in image style transfer [45], super-resolution imaging [46], and other fields. Because ERFNet and P2PGAN perform relatively well, these two networks are selected for comparison. It is worth noting that both are strongly supervised networks and require a large amount of paired data for training.

The training and comparison strategy for the above two networks is as follows: speckle-GT pairs obtained through 220-grit ground glass are used for training, and the trained models are used to restore targets hidden behind 220-grit ground glass, 600-grit ground glass, and ZnO flakes, respectively. The quality of the restored targets hidden behind unknown scattering media is used to compare the reconstruction capability of the networks.

The unsupervised neural network proposed in this paper does not need any paired targets as input because it learns online. All the GT and speckle patterns used to test the supervised methods come from the testing set and do not appear in the training set. To keep the amount of data consistent, the real targets used for online learning are taken from the training set and the speckle patterns used to restore targets are taken from the testing set; therefore, the speckle patterns and targets are completely unpaired.

The reconstruction results of the different methods are shown in Fig. 6. For ERFNet and P2PGAN, the two supervised methods that need paired data, the target structure can be restored relatively completely when the input speckle patterns and the training speckle patterns are produced through the same scattering medium. However, when the speckle patterns are produced by the two unknown scattering media, only indistinguishable jumbles of patterns are obtained. This is mainly because the essence of supervised optimization is fitting between datasets: P2PGAN and ERFNet only learn the mapping between speckle patterns generated through the known scattering medium and the original targets, and cannot characterize the universal physical scattering process, so they cannot eliminate the influence of the characteristics of different scattering media. This influence makes it difficult for a directly applied supervised network to reconstruct targets hidden behind unknown scattering media. Although some supervised methods incorporating the physical process can generalize to some media, they still need large-scale data for training in advance. In contrast, the proposed method effectively restores targets hidden behind the unknown scattering media without any prior training.


Fig. 6. Comparison of target quality restored by different networks. Scale bar: 410.4 $\mu m$.


The average mean absolute error (MAE), structural similarity (SSIM), and PSNR of the results restored by the different methods are given in Table 1. By these objective indices, ERFNet reconstructs better than P2PGAN when the restored targets are hidden behind the trained scattering medium. This is because ERFNet performs pixel-level, point-to-point fitting between speckle patterns and hidden targets, which imposes stronger constraints between the generated and original targets than P2PGAN. Therefore, the discriminator constraint of a GAN is not always effective, and the optimization strategy should be chosen according to the actual scenario. The proposed method uses the discriminator to constrain the high-dimensional features of the generated target, ensuring that its structural features are similar to those of the real targets, while the autocorrelation consistency constrains the structural details at the pixel level. Hidden targets are thus correctly reconstructed through the combination of pixel-level constraints and discriminator constraints. Although the objective indices of the proposed method are not perfect, it restores a recognizable structure of the targets hidden behind these unknown media.
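MAE and PSNR in Table 1 follow standard definitions; a minimal sketch of how such indices can be computed (assuming images scaled to [0, 1]; SSIM is omitted because it is usually taken from a library such as scikit-image):

```python
import numpy as np

def mae(ref, img):
    """Mean absolute error between images scaled to [0, 1]."""
    return float(np.mean(np.abs(ref - img)))

def psnr(ref, img, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10*log10(peak^2 / MSE)."""
    err = np.mean((ref - img) ** 2)
    return float(10 * np.log10(data_range ** 2 / err))
```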


Table 1. MAE, SSIM, and PSNR of the results with unknown scattering media under different network structures.
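For reference, the MAE and PSNR indices reported in the tables can be computed as in the minimal numpy sketch below (SSIM is usually taken from `skimage.metrics.structural_similarity` and is omitted here). The function names and the 8-bit data range are assumptions for illustration, not part of the paper's released code:

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two images."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    return np.mean(np.abs(a - b))

def psnr(a, b, data_range=255.0):
    """Peak signal-to-noise ratio in dB; data_range is the assumed
    maximum possible pixel value (255 for 8-bit images)."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return np.inf  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)
```

Identical images give MAE 0 and infinite PSNR; images differing everywhere by the full data range give a PSNR of 0 dB.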

The proposed method combines the scattering process with a GAN, so the hidden targets behind the three unknown scattering media can be effectively reconstructed by combining discriminator constraints with pixel-level constraints based on autocorrelation consistency. This optimization constraint strategy removes the obstacle of insufficient generalization to the scattering medium and increases the usability of the proposed method in different scattering scenes.
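The autocorrelation consistency that underlies the pixel-level constraint rests on the fact that, within the memory effect range, the autocorrelation of a speckle pattern approximates the autocorrelation of the hidden object. As a hedged sketch of how such an autocorrelation can be computed via the Wiener–Khinchin theorem (mean subtraction, standing in for removing the constant background term, is an assumption of this illustration):

```python
import numpy as np

def autocorrelation(img):
    """Autocorrelation of a 2-D pattern via the Wiener-Khinchin theorem:
    the autocorrelation is the inverse FFT of the power spectrum."""
    x = img.astype(np.float64)
    x = x - x.mean()                       # suppress the constant background term
    spectrum = np.fft.fft2(x)
    ac = np.fft.ifft2(np.abs(spectrum) ** 2).real
    return np.fft.fftshift(ac)             # move the zero-shift peak to the center
```

After `fftshift`, the zero-lag peak sits at the array center, matching the centered autocorrelation structures shown in the figures.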

3.4 Reconstruction of cross-category targets hidden behind the unknown scattering medium

Since it is impossible to invasively collect the structure information of hidden targets in real applications to form paired data, there is no guarantee that the targets used to optimize the neural network have a structural distribution similar to that of the hidden targets. Therefore, it is useful to restore hidden targets using networks optimized with a cross-category dataset. A cross-category target in this section refers to a target whose structural features greatly differ from those of the targets used in the training process. At present, learning-based methods can only reconstruct targets with structures similar to those in the training set. When the structural distribution of the target changes, the network can only fit a result resembling the training-set targets, or may even fail to reconstruct the target at all. The method proposed in this paper does not need paired data to optimize the neural network in a point-to-point manner and adds a consistency constraint between the speckle patterns and the original target; therefore, the network can also reconstruct cross-category targets well.

To compare the reconstruction capabilities of different methods for cross-category targets hidden behind unknown scattering media and to verify the effectiveness of the proposed method, the following two groups of experiments are carried out:

Group 1: The two supervised methods are trained with targets selected from the Fashion-MNIST dataset and the corresponding speckles generated through 220-grit ground glass. The trained models are then used to reconstruct targets from the MNIST dataset using the speckles generated by the 220-grit ground glass, the 600-grit ground glass, and the ZnO flakes, respectively. In the online learning process of the proposed method, the discriminator is optimized with targets from the Fashion-MNIST dataset, and the targets in the MNIST dataset are reconstructed from the speckles generated by those three scattering media.

Group 2: The two supervised methods are trained with targets selected from the MNIST dataset and the corresponding speckles generated through 220-grit ground glass. The trained models are then used to reconstruct targets from the Fashion-MNIST dataset, the Kuzushiji-Kanji dataset, and English characters with the speckles generated through the 220-grit ground glass, the 600-grit ground glass, and the ZnO flakes, respectively. The MNIST dataset is used to optimize the discriminator of the proposed method, and the three types of targets hidden behind the three scattering media are restored online.

In the Group 1 experiment, the reconstruction results of cross-category targets by different methods are shown in Fig. 7, and the objective comparison indices of the reconstruction results are shown in Table 2. For ERFNet, with its pixel-level constraint supervision, the large difference in target structure distribution between the training set and the testing set means that the correct target structures cannot be restored even when the trained model is used to restore targets hidden behind the same scattering medium as the training diffuser. P2PGAN can restore the outline of some targets (such as the characters "0", "2", etc.) when the target is hidden behind the 220-grit scattering medium, but most characters are still indiscernible. In these scattering scenes, the pixel-level constraints of the CNN are no longer valid, while the constraints on high-dimensional features imposed by the discriminator of the GAN are more effective.

Fig. 7. Reconstruction results of cross-category targets hidden behind unknown scattering media using different network structures (Group 1). Scale bar: 410.4 µm.

Fig. 8. Reconstruction results of cross-category targets hidden behind unknown scattering media using different network structures (Group 2). Scale bar: 410.4 µm.


Table 2. MAE, SSIM, and PSNR of the reconstruction results (Group1).

Although the structure of the targets to be restored greatly differs from that of the targets used in the online training process, cross-category targets hidden behind the three unknown media can still be successfully reconstructed by the proposed method. The recognizable reconstruction results benefit from the network design, which combines pixel-level constraints based on the physical process with discriminator constraints. The discriminator constrains only the high-dimensional features of the target, while the structure of the hidden target is reconstructed by pixel-level constraints on the autocorrelation. Therefore, the proposed method can still combine the autocorrelation of the speckle patterns with the discrimination results to optimize the network online and reconstruct hidden targets. In contrast, when the speckle patterns are generated through the two unknown media, almost no identifiable targets are restored by the two supervised networks.

In the Group 2 experiment, the reconstruction results of cross-category targets with different methods are shown in Fig. 8, and the objective comparison indices of the reconstruction results are shown in Table 3. The results are similar to those of the Group 1 experiment. However, since the structure of the hidden targets is more complex than that of the training-set targets, P2PGAN, which only uses the discriminator constraint, cannot even reconstruct the hidden targets behind the trained scattering medium. The proposed method can still restore the cross-category targets behind all three unknown media.


Table 3. MAE, SSIM, and PSNR of the reconstruction results (Group2).

These two groups of experiments further verify the rationality of the consistency constraint between speckle patterns and targets constructed by the proposed method. This consistency improves the ability of the neural network to recover cross-category targets and reduces its dependence on datasets.

The proposed method combines pixel-level constraints containing the physical process with discriminator constraints and constructs an appropriate optimization process, so cross-category targets hidden behind the three unknown scattering media can be effectively reconstructed. The target generalization ability of the proposed method can handle imaging problems in which the structure of the hidden target is unknown. This ability makes the unpaired real targets required by the proposed method extremely easy to obtain and greatly reduces its data dependence.

4. Conclusion

This paper proposes a prior-free unsupervised neural network that incorporates the physical process. The proposed method achieves unsupervised online optimization through discriminator constraints and pixel-level constraints combined with the physical process. It does not require any paired data or prior information to train the network in advance, and a hidden target behind an unknown scattering medium can be reconstructed through online optimization using only a single frame of the speckle pattern. The proposed method uses the physical process instead of data fitting, and its low dependence on data and its generalization across scattering scenes increase its practical applicability. The online optimization process improves the ability to solve the physical inverse problem of hidden-target reconstruction, and the broadly applicable framework can be effectively applied to the ill-posed problems of different scattering scenes. When the size of the target is beyond the memory effect (ME) range, the autocorrelation structure gradually degenerates, and the method may not be able to reconstruct the hidden target effectively. This combination of physical models and neural networks may also provoke further thinking in the field of imaging through scattering media.

Funding

National Natural Science Foundation of China (61971227, 62031018, 62101255); China Postdoctoral Science Foundation (2021M701721); Fundamental Research Funds for the Central Universities (30920031101).

Acknowledgments

We thank Shuo Zhu, Jinye Miao, Mengzhang Liu, and Chenyin Zhou for technical supports and experimental discussion. We also thank Shiyuan Jia for the grammar suggestion.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. J. W. Goodman, Speckle phenomena in optics: theory and applications (Roberts and Company Publishers, 2007).

2. M. C. Roggemann, B. M. Welsh, and B. R. Hunt, Imaging through turbulence (CRC press, 1996).

3. A. P. Mosk, A. Lagendijk, G. Lerosey, and M. Fink, “Controlling waves in space and time for imaging and focusing in complex media,” Nat. Photonics 6(5), 283–292 (2012). [CrossRef]  

4. I. M. Vellekoop and A. Mosk, “Focusing coherent light through opaque strongly scattering media,” Opt. Lett. 32(16), 2309–2311 (2007). [CrossRef]  

5. Y. Mao, C. Flueraru, S. Chang, D. P. Popescu, and M. G. Sowa, “High-quality tissue imaging using a catheter-based swept-source optical coherence tomography systems with an integrated semiconductor optical amplifier,” IEEE Trans. Instrum. Meas. 60(10), 3376–3383 (2011). [CrossRef]  

6. D. Huang, E. A. Swanson, C. P. Lin, J. S. Schuman, W. G. Stinson, W. Chang, M. R. Hee, T. Flotte, K. Gregory, C. A. Puliafito, and J. G. Fujimoto, “Optical coherence tomography,” Science 254(5035), 1178–1181 (1991). [CrossRef]  

7. S. M. Popoff, G. Lerosey, R. Carminati, M. Fink, A. C. Boccara, and S. Gigan, “Measuring the transmission matrix in optics: an approach to the study and control of light propagation in disordered media,” Phys. Rev. Lett. 104(10), 100601 (2010). [CrossRef]  

8. A. Drémeau, A. Liutkus, D. Martina, O. Katz, C. Schülke, F. Krzakala, S. Gigan, and L. Daudet, “Reference-less measurement of the transmission matrix of a highly scattering material using a dmd and phase retrieval techniques,” Opt. Express 23(9), 11898–11911 (2015). [CrossRef]  

9. D. Lu, M. Liao, W. He, Z. Cai, and X. Peng, “Imaging dynamic objects hidden behind scattering medium by retrieving the point spread function,” in Speckle 2018: VII International Conference on Speckle Metrology, vol. 10834 (International Society for Optics and Photonics, 2018), p. 1083428.

10. X. Xu, X. Xie, A. Thendiyammal, H. Zhuang, J. Xie, Y. Liu, J. Zhou, and A. P. Mosk, “Imaging of objects through a thin scattering layer using a spectrally and spatially separated reference,” Opt. Express 26(12), 15073–15083 (2018). [CrossRef]  

11. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

12. G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica 6(8), 921–943 (2019). [CrossRef]  

13. E. Moen, D. Bannon, T. Kudo, W. Graf, M. Covert, and D. Van Valen, “Deep learning for cellular image analysis,” Nat. Methods 16(12), 1233–1246 (2019). [CrossRef]  

14. S. Li, M. Deng, J. Lee, A. Sinha, and G. Barbastathis, “Imaging through glass diffusers using densely connected convolutional networks,” Optica 5(7), 803–813 (2018). [CrossRef]  

15. M. Lyu, H. Wang, G. Li, S. Zheng, and G. Situ, “Learning-based lensless imaging through optically thick scattering media,” Adv. Photonics 1(03), 1 (2019). [CrossRef]  

16. S. Gigan, O. Katz, H. B. de Aguiar, E. R. Andresen, A. Aubry, J. Bertolotti, E. Bossy, D. Bouchet, J. Brake, S. Brasselet, Y. Bromberg, H. Cao, T. Chaigne, Z. Cheng, W. Choi, T. Cižmár, M. Cui, V. R. Curtis, H. Defienne, M. Hofer, R. Horisaki, R. Horstmeyer, N. Ji, A. K. LaViolette, J. Mertz, C. Moser, A. P. Mosk, N. C. Pégard, R. Piestun, S. Popoff, D. B. Phillips, D. Psaltis, B. Rahmani, H. Rigneault, S. Rotter, L. Tian, I. M. Vellekoop, L. Waller, L. Wang, T. Weber, S. Xiao, C. Xu, A. Yamilov, C. Yang, and H. Yilmaz, “Roadmap on wavefront shaping and deep imaging in complex media,” (2021).

17. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp. 234–241.

18. T. Falk, D. Mai, R. Bensch, Ö. Çiçek, A. Abdulkadir, Y. Marrakchi, A. Böhm, J. Deubner, Z. Jäckel, K. Seiwald, A. Dovzhenko, O. Tietz, C. Dal Bosco, S. Walsh, D. Saltukoglu, T. L. Tay, M. Prinz, K. Palme, M. Simons, I. Diester, T. Brox, and O. Ronneberger, “U-net: deep learning for cell counting, detection, and morphometry,” Nat. Methods 16(1), 67–70 (2019). [CrossRef]  

19. M. Weigert, U. Schmidt, T. Boothe, A. Müller, A. Dibrov, A. Jain, B. Wilhelm, D. Schmidt, C. Broaddus, S. Culley, M. Rocha-Martins, F. Segovia-Miranda, C. Norden, R. Henriques, M. Zerial, M. Solimena, J. Rink, P. Tomancak, L. Royer, F. Jug, and E. W. Myers, “Content-aware image restoration: pushing the limits of fluorescence microscopy,” Nat. Methods 15(12), 1090–1097 (2018). [CrossRef]  

20. C. Ounkomol, S. Seshamani, M. M. Maleckar, F. Collman, and G. R. Johnson, “Label-free prediction of three-dimensional fluorescence images from transmitted-light microscopy,” Nat. Methods 15(11), 917–920 (2018). [CrossRef]  

21. E. Guo, S. Zhu, Y. Sun, L. Bai, C. Zuo, and J. Han, “Learning-based method to reconstruct complex targets through scattering medium beyond the memory effect,” Opt. Express 28(2), 2433–2446 (2020). [CrossRef]  

22. H. Wang, Y. Rivenson, Y. Jin, Z. Wei, R. Gao, H. Günaydın, L. A. Bentolila, C. Kural, and A. Ozcan, “Deep learning enables cross-modality super-resolution in fluorescence microscopy,” Nat. Methods 16(1), 103–110 (2019). [CrossRef]  

23. W. Ouyang, A. Aristov, M. Lelek, X. Hao, and C. Zimmer, “Deep learning massively accelerates super-resolution localization microscopy,” Nat. Biotechnol. 36(5), 460–468 (2018). [CrossRef]  

24. Y. Wu, Y. Luo, G. Chaudhari, Y. Rivenson, A. Calis, K. De Haan, and A. Ozcan, “Bright-field holography: cross-modality deep learning enables snapshot 3d imaging with bright-field contrast using a single hologram,” Light: Sci. Appl. 8(1), 25 (2019). [CrossRef]  

25. Y. Sun, X. Wu, Y. Zheng, J. Fan, and G. Zeng, “Scalable non-invasive imaging through dynamic scattering media at low photon flux,” Opt. Lasers Eng. 144, 106641 (2021). [CrossRef]  

26. Y. Sun, J. Shi, L. Sun, J. Fan, and G. Zeng, “Image reconstruction through dynamic scattering media based on deep learning,” Opt. Express 27(11), 16032–16046 (2019). [CrossRef]  

27. S. Zhu, E. Guo, J. Gu, L. Bai, and J. Han, “Imaging through unknown scattering media based on physics-informed learning,” Photonics Res. 9(5), B210–B219 (2021). [CrossRef]  

28. K. L. de Jong and A. S. Bosman, “Unsupervised change detection in satellite images using convolutional neural networks,” in 2019 International Joint Conference on Neural Networks (IJCNN), (IEEE, 2019), pp. 1–8.

29. K. Yamazaki, R. Horisaki, and J. Tanida, “Imaging through scattering media based on semi-supervised learning,” Appl. Opt. 59(31), 9850–9854 (2020). [CrossRef]  

30. F. Wang, Y. Bian, H. Wang, M. Lyu, G. Pedrini, W. Osten, G. Barbastathis, and G. Situ, “Phase imaging with an untrained neural network,” Light: Sci. Appl. 9(1), 77 (2020). [CrossRef]  

31. K. Monakhova, V. Tran, G. Kuo, and L. Waller, “Untrained networks for compressive lensless photography,” Opt. Express 29(13), 20913–20929 (2021). [CrossRef]  

32. S. Liu, X. Meng, Y. Yin, H. Wu, and W. Jiang, “Computational ghost imaging based on an untrained neural network,” Opt. Lasers Eng. 147, 106744 (2021). [CrossRef]  

33. E. Kang, H. J. Koo, D. H. Yang, J. B. Seo, and J. C. Ye, “Cycle-consistent adversarial denoising network for multiphase coronary ct angiography,” Med. Phys. 46(2), 550–562 (2019). [CrossRef]  

34. I. Freund, M. Rosenbluh, and S. Feng, “Memory effects in propagation of optical waves through disordered media,” Phys. Rev. Lett. 61(20), 2328–2331 (1988). [CrossRef]  

35. J. Bertolotti, E. G. Van Putten, C. Blum, A. Lagendijk, W. L. Vos, and A. P. Mosk, “Non-invasive imaging through opaque scattering layers,” Nature 491(7423), 232–234 (2012). [CrossRef]  

36. O. Katz, P. Heidmann, M. Fink, and S. Gigan, “Non-invasive single-shot imaging through scattering layers and around corners via speckle correlations,” Nat. Photonics 8(10), 784–790 (2014). [CrossRef]  

37. T. Wu, J. Dong, and S. Gigan, “Non-invasive single-shot recovery of a point-spread function of a memory effect based scattering imaging system,” Opt. Lett. 45(19), 5397–5400 (2020). [CrossRef]  

38. E. Guo, Y. Sun, S. Zhu, D. Zheng, C. Zuo, L. Bai, and J. Han, “Single-shot color object reconstruction through scattering medium based on neural network,” Opt. Lasers Eng. 136, 106310 (2021). [CrossRef]  

39. Y. Shi, E. Guo, L. Bai, and J. Han, “Code of sdsgan,” figshare, (2022) https://opticapublishing.figshare.com/s/a5a272796f709eadc9d3.

40. Y. LeCun, “The mnist database of handwritten digits,” http://yann.lecun.com/exdb/mnist/ (1998).

41. H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms,” arXiv preprint arXiv:1708.07747 (2017).

42. T. Clanuwat, M. Bober-Irizar, A. Kitamoto, A. Lamb, K. Yamamoto, and D. Ha, “Deep learning for classical japanese literature,” arXiv preprint arXiv:1812.01718 (2018).

43. E. Romera, J. M. Alvarez, L. M. Bergasa, and R. Arroyo, “Erfnet: Efficient residual factorized convnet for real-time semantic segmentation,” IEEE Trans. Intell. Transport. Syst. 19(1), 263–272 (2018). [CrossRef]  

44. P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” CVPR (2017).

45. J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE international conference on computer vision, (2017), pp. 2223–2232.

46. Y. Zhang, S. Liu, C. Dong, X. Zhang, and Y. Yuan, “Multiple cycle-in-cycle generative adversarial networks for unsupervised image super-resolution,” IEEE Trans. on Image Process. 29, 1101–1112 (2020). [CrossRef]  

Supplementary Material (1)

Code 1: Code of the paper "Prior-free Imaging Unknown Target Through Unknown Scattering Medium".




Figures (8)

Fig. 1. Speckle patterns and autocorrelation corresponding to different scattering media. (a) The first line shows the original target and the corresponding speckle patterns formed by different scattering media, and the second line shows the autocorrelation corresponding to the target and speckle patterns in the first line. (b)-(c) Intensity values corresponding to the white dotted lines in (a), respectively. Scale bar: 1368 µm.

Fig. 2. Schematic diagram of the self-supervised neural network based on a physical model for imaging through the scattering medium. Real unpaired targets need to be prepared in advance, but no prior information about the hidden targets is needed for the selection of real target types.

Fig. 3. Schematic diagram of the experimental system structure.

Fig. 4. Comparison of restored targets between the HIO algorithm and the proposed unsupervised method. Scale bar: 547.2 µm.

Fig. 5. Reconstruction quality comparison with speckle patterns of different sizes as input. (a) The autocorrelation of the GT and the autocorrelations corresponding to sub-speckles with sizes of 300×300, 400×400, 500×500, and 600×600 pixels, respectively. Sub-speckles of different sizes come from the same speckle pattern. (b) GT and reconstruction results with speckle patterns of different sizes as input. (c) The intensity values corresponding to the positions marked by the white dotted line in the autocorrelation structure of (a), with the part in the red box enlarged. Scale bar: 547.2 µm.

Fig. 6. Comparison of target quality restored by different networks. Scale bar: 410.4 µm.

Fig. 7. Reconstruction results of cross-category targets hidden behind unknown scattering media using different network structures (Group 1). Scale bar: 410.4 µm.

Fig. 8. Reconstruction results of cross-category targets hidden behind unknown scattering media using different network structures (Group 2). Scale bar: 410.4 µm.

Tables (3)

Table 1. MAE, SSIM, and PSNR of the results with unknown scattering media under different network structures.

Table 2. MAE, SSIM, and PSNR of the reconstruction results (Group 1).

Table 3. MAE, SSIM, and PSNR of the reconstruction results (Group 2).

Equations (7)


$$I = O \ast \mathrm{PSF}, \tag{1}$$
$$I \star I = (O \ast \mathrm{PSF}) \star (O \ast \mathrm{PSF}) = (O \star O) \ast (\mathrm{PSF} \star \mathrm{PSF}), \tag{2}$$
$$I \star I = (O \star O) + C. \tag{3}$$
$$L_D = \mathrm{BCE}(label_{raw}, label_{real}) + \mathrm{BCE}(label_{generated}, label_{fake}), \tag{4}$$
$$\mathrm{BCE}(p, y) = -\left[\, y \log(p) + (1 - y)\log(1 - p) \,\right]. \tag{5}$$
$$L_G = 2\,\mathrm{MSE}(corr_{speckle}, corr_{generated}) + \mathrm{BCE}(label_{generated}, label_{real}), \tag{6}$$
$$\mathrm{MSE}(I, I') = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\left[\, I(i,j) - I'(i,j) \,\right]^2, \tag{7}$$
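The discriminator and generator objectives defined above can be sketched in numpy as follows. This is an illustrative re-implementation under stated assumptions, not the paper's released code: the function names are invented, the clipping epsilon in the BCE is added for numerical stability, and the real/fake labels are taken to be all-ones and all-zeros arrays:

```python
import numpy as np

def bce(p, y, eps=1e-12):
    """Binary cross-entropy; eps clipping avoids log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def mse(a, b):
    """Pixel-wise mean squared error between two arrays."""
    return np.mean((a - b) ** 2)

def discriminator_loss(pred_real, pred_generated):
    """Discriminator objective: real samples labeled 1, generated labeled 0."""
    return (bce(pred_real, np.ones_like(pred_real))
            + bce(pred_generated, np.zeros_like(pred_generated)))

def generator_loss(corr_speckle, corr_generated, pred_generated, weight=2.0):
    """Generator objective: weighted autocorrelation-consistency term
    plus the adversarial term that pushes generated samples toward 'real'."""
    return (weight * mse(corr_speckle, corr_generated)
            + bce(pred_generated, np.ones_like(pred_generated)))
```

The `weight` of 2 mirrors the coefficient on the MSE term above; in practice it acts as a tunable balance between the pixel-level autocorrelation constraint and the discriminator constraint.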