
Unsupervised physics-informed deep learning-based reconstruction for time-resolved imaging by multiplexed ptychography


Abstract

We explore numerically an unsupervised, physics-informed, deep learning-based reconstruction technique for time-resolved imaging by multiplexed ptychography. In our method, the untrained deep learning model replaces the iterative algorithm’s update step, yielding superior reconstructions of multiple dynamic object frames compared to conventional methodologies. More precisely, we demonstrate improvements in image quality and resolution, while reducing sensitivity to the number of recorded frames, the mutual orthogonality of different probe modes, overlap between neighboring probe beams and the cutoff frequency of the ptychographic microscope – properties that are generally of paramount importance for ptychographic reconstruction algorithms.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Ptychography is a powerful coherent diffractive imaging technique, yielding label-free, high-contrast quantitative amplitude and phase images, without the need for prior information (e.g., support) on the object and probe beam [1,2]. In a conventional ptychographic microscope, a complex-valued object is scanned in a stepwise fashion through a localized beam. In each step, the intensity diffraction pattern from the illuminated region on the object is measured on a Fraunhofer plane. Critically, the illumination spot in each step overlaps substantially with neighboring spots, resulting in significant redundancy in the measured data. The set of recorded diffraction patterns is used to reconstruct the object’s complex transfer function using an iterative phase retrieval algorithm [2].

Applications of scanning-based ptychography are limited to static samples or repetitive events using pump-probe techniques [3]. In order to apply ptychography to dynamic samples and exploit its attractive properties, several single-shot ptychography (SSP) configurations, in which the entire ptychographic data is recorded in a single camera exposure, were proposed and demonstrated [4–9]. We use an SSP setup called 4fSSP [5], which is illustrated in Fig. 1(a). In 4fSSP, the object is placed in the middle of a 4f system, the camera is placed on the output plane and a pinhole array is placed on the input plane. The object is not placed on the Fourier plane, because in that case all the probe beams would overlap completely on the object. Thus, by slightly shifting the object from the Fourier plane, we obtain an adjustable set of substantially overlapping probe beams, as required for ptychography. The second lens then Fourier transforms the object plane, and the data is recorded on the camera. For a grid of pinholes at the input, we obtain a corresponding grid of diffraction patterns. Therefore, by dividing the image captured by the camera into blocks, we obtain a ptychographic measurement captured in a single shot.


Fig. 1. (a) Conceptual diagram of the 4fSSP microscope with ray tracing. An array of pinholes is located at the input plane of a 4f system. Lens L1 focuses the light beams that diffract from the array onto the object, which is located at a distance $d$ before the back focal plane of lens L1. Lens L2 focuses the light diffracted from the object onto the CCD, which is located on the output plane of the 4f system, resulting in blocks of diffraction patterns, where each block corresponds to a region on the object that is illuminated by a beam originating from one of the pinholes. (b) Schematic of the 4fSSP system we simulate. The pinhole array is replaced by a phase SLM displaying an MLA, producing an array of focal spots. The object is replaced by an amplitude SLM. Lenses L1 and L2 are replaced by lens OL. (c) The phase structure, induced by SLMP, that mimics a micro-lens array (MLA). (d) An example of a simulated diffraction patterns image.


Time-resolved Imaging by Multiplexed Ptychography (TIMP) was proposed and demonstrated as a promising approach to obtain ultrahigh-speed high-resolution imaging of complex-valued objects [6,10,11]. In TIMP, an SSP system is illuminated by a burst of pulses that is much faster than the integration time of the sensor, so the diffraction patterns from all the pulses are summed up and recorded in a single camera snapshot. In order to produce a video of the event from the recorded multiplexed ptychographic data, i) the burst should consist of different (preferably mutually orthogonal) probe pulses and ii) the ordinary ptychographic reconstruction algorithm should be replaced by a multi-state ptychographic algorithm (MsPA) [12,13]. Generation of such a burst of short pulses and its utilization for TIMP was recently reported [11]. TIMP offers exciting possibilities for single-shot ultrahigh-speed imaging [14]. First, due to its relative simplicity, it should be applicable across the electromagnetic spectrum, including extreme UV and x-ray spectral regions. Second, in TIMP, the spatial resolution and frame rate can be largely decoupled from the number of frames (the cost of increasing the number of frames can be allocated to reduce the field of view or to enhance the complexity of the microscope [6]). Yet, TIMP was demonstrated so far for simple objects (e.g., binary images of digits and letters [10]).

In the past decade, neural networks have had a remarkable impact on computational imaging [15]. Specifically, neural-network-based reconstruction algorithms were demonstrated for scanning-based ptychography [16] and single-shot ptychography [17], showing high-quality reconstructions with higher spatial resolution and better resistance to noise. However, those approaches required large training data sets (thousands of samples), making them inapplicable to general unknown samples.

In recent years, physics-informed untrained neural networks were introduced and have had a great impact on computational and phase imaging [18,19]. In approaches of this kind, prior knowledge of the physical model of the system is used, enabling the neural network to converge iteratively to the correct solution without any pre-training.

Here we propose a physics-driven, self-supervised, neural-network-based reconstruction method for TIMP. The method is based on the physical model of TIMP and does not require any training data. We demonstrate the effectiveness of our method on simulated data of natural images. The results show that the method successfully reconstructs high-quality images from TIMP data, exhibiting characteristics superior to the traditional algorithm, such as resolution and robustness.

This work has the potential to make TIMP experiments more flexible and accessible for a variety of new applications in time-resolved imaging, such as studying the dynamics of materials and biological processes.

2. deepTIMP

2.1 Problem formulation

When probing thin, dynamic optical samples, the object can be represented by a complex-valued mask, symbolized as $O(x, y, t)$. In the process of multi-frame imaging, we presume the capture of $K$ frames of the object at time intervals of $\Delta t$. Within the scope of TIMP, we direct a series of $K$ unique pulsed probe beams $\{P_k(x,y)\}_{k=0}^{K-1}$ onto the object at varying temporal instances, paving the way for the subsequent demultiplexing operation intrinsic to the TIMP framework [6,10,11].

Given that the camera acquisition time far exceeds the duration of the probed event, $T_{cam} \gg K \Delta t$, the recorded intensity integrates an incoherent summation over the power spectra of the $K$ sequential pulses. This integration can be mathematically represented as:

$$I = \sum_{k=0}^{K-1} {|\mathcal{F}[P_k(x,y)O(x,y,k \Delta t)]|^2}$$
where $I$, $P$, and $O$ signify the recorded intensity, probe, and object distributions correspondingly, $k$ denotes the frame/pulse index, and $\mathcal {F}$ is the 2D spatial Fourier operator.

Upon assuming the probe beams to be known, we define the transformation (or forward model) in Eq. (1) as $H$, leading to:

$$I = H(O)$$
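For concreteness, the following is a minimal PyTorch sketch of the operator $H$, i.e., Eq. (1); the tensor shapes and function name are illustrative, not the authors' implementation.

```python
import torch

def timp_forward(probes: torch.Tensor, frames: torch.Tensor) -> torch.Tensor:
    """Forward model H of Eqs. (1)-(2): an incoherent sum of the K
    per-pulse far-field intensities. Both inputs are complex tensors of
    shape (K, H, W); the k-th probe illuminates the k-th object frame."""
    exit_waves = probes * frames                      # P_k(x,y) * O(x,y,k*dt)
    far_field = torch.fft.fftshift(torch.fft.fft2(exit_waves), dim=(-2, -1))
    return (far_field.abs() ** 2).sum(dim=0)          # sum of power spectra
```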

Given a recorded intensity $I$, the inverse problem is encapsulated as:

$$\hat{O} = \underset{O}{\arg \min}[D(H(O),I)]$$
where $D$ is the chosen distance (or loss) function. In the context of using neural networks for inverse problems, the goal is to optimize the network to satisfy:
$$f(I)=H^{-1}(I)=O$$
where $f$ is the network mapping function and $H^{-1}$ symbolizes the inverse operation of $H$. In a supervised learning strategy, the optimization of $f$ is performed over a vast dataset of $N$ labeled object-measurement pairs:
$$\hat{f} = \underset{f}{\arg \min}\sum_{n=0}^{N-1} {D(f(I_n),O_n)}$$

This approach, however, faces limitations when a substantial set of training samples is unavailable. To mitigate this limitation, physics-driven unsupervised methodologies were proposed [20,21]. As per these methodologies, the necessity to fit multiple samples concurrently is substituted by imposing the relation in Eq. (2) between the reconstructed object and the provided measurement. Consequently, Eq. (3) takes on the form:

$$\hat{O} = \hat{f}(I) \; s.t. \; \hat{f} = \underset{f}{\arg \min}[D(H(f(I)),I)]$$
where $s.t.$ stands for "subject to".
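In code, the distinction between Eq. (5) and Eq. (6) reduces to what the loss compares; the following schematic sketch of the unsupervised objective uses illustrative names:

```python
def unsupervised_loss(net, forward_model, I, distance):
    """Physics-driven objective of Eq. (6): rather than fitting labeled
    pairs (I_n, O_n) as in Eq. (5), penalize the mismatch between the
    re-simulated measurement H(f(I)) and the single recorded measurement I."""
    O_hat = net(I)                  # current object estimate f(I)
    I_hat = forward_model(O_hat)    # re-simulated measurement H(f(I))
    return distance(I_hat, I)       # D(H(f(I)), I)
```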

2.2 Simulated optical setup

The simulated optical configuration employed in this work is founded on the 4fSSP scheme (see Fig. 1(a)) [5,6,10,17,22,23]. Within this 4fSSP framework, a pinhole array is situated at the entry plane of a 4f system. Lens L1 directs the light beams diffracting from the array towards the object, which resides at a distance $d$ ahead of lens L1’s rear focal plane. This slight displacement, $d$, from the Fourier plane induces a partial overlap between the beams, a critical aspect in ptychography. Lens L2 collects the object-diffracted light and transfers it to a camera at the 4f system’s output plane. Assuming that the object’s spatial power spectrum is primarily confined to a low-frequency region, the camera records an intensity pattern made up of distinctly identifiable blocks. Each block houses a diffraction pattern tied to a beam stemming from a single pinhole, containing spectral data about a particular region on the object plane.

Our simulation (Fig. 1(b)) models an optical setup consisting of a 520 nm plane wave introduced into a modified 4fSSP setup. To maintain the design’s adaptability and versatility, we substitute the static pinhole array with a reflective HOLOEYE PLUTO-2 phase-only spatial light modulator (SLM), denoted by SLMP, which produces a tunable mask-like beam structure on the 4f system’s input plane. Specifically, we adjust the phase so that SLMP functions as a micro-lens array (MLA), generating an effective pinhole array at a focal distance, $f_{MLA}$, downstream from the SLM. Consequently, SLMP is situated $f_{MLA}$ before the 4f system’s input plane. An instance of such a phase mask is displayed in Fig. 1(c). It consists of an $N_X \times N_Y = 6 \times 6$ grid of square phase masks, where the transmission function of each mask is given by $\Phi (r)=\exp (i \pi r^2 / \lambda f_{MLA})$, with $\lambda = 520$ nm representing the illumination wavelength and $r$ the distance from the center of a single micro-lens, defined locally in each square mask. We selected $f_{MLA}$ according to $f_{MLA} = f_{OL} b / W = 100$ mm, where $f_{OL}=50$ mm is the focal length of lens OL, $b=1$ mm is the separation between consecutive lenses/pinholes and $W=500\,\mu$m is the desired single probe spot size on the object plane. Lens OL supplants both lenses L1 and L2 in a double-pass configuration: the beams traverse OL twice – first as they propagate towards the object and then when they are reflected from the object towards the camera. This configuration results in a more compact setup, a crucial aspect in situations where the focal lengths of L1 and L2 are short and present practical mounting challenges.
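The MLA phase mask of Fig. 1(c) can be generated as follows; this is a sketch with the nominal parameters above, where the pixel count per lens follows from the stated 8 µm SLM pitch:

```python
import numpy as np

wavelength = 520e-9   # illumination wavelength [m]
f_mla = 100e-3        # micro-lens focal length f_MLA [m]
b = 1e-3              # separation between consecutive lenses [m]
pix = 8e-6            # SLM_P pixel size [m]
n_lens = 6            # N_X = N_Y = 6

n_pix = round(b / pix)                                # 125 pixels per lens
# Local coordinates inside one square micro-lens, centered on its axis
x = (np.arange(n_pix) - n_pix / 2 + 0.5) * pix
xx, yy = np.meshgrid(x, x)
single_lens = np.exp(1j * np.pi * (xx**2 + yy**2) / (wavelength * f_mla))
# Tile into the N_X x N_Y grid displayed on SLM_P (Fig. 1(c))
mla_mask = np.tile(single_lens, (n_lens, n_lens))
```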

Up to this point, we have discussed a system that is oriented towards single-frame capture. Given our focus on multi-frame measurements which necessitate subsequent demultiplexing, we employ multiple phase masks to render each pulse (corresponding to a separate frame) distinctive. This is achieved by implementing phase gradient encoding [10], which introduces a unique linear phase (k-vector) to each pulse:

$$\Phi_{\mathbf{k}}(\mathbf{r}) = \exp\left(i\pi \frac{\|\mathbf{r}\|^2}{\lambda f_{MLA}}\right) \exp\left(i\mathbf{k \cdot r}\right)$$
where $\mathbf {k}=(k_x,k_y)$ is a k-vector representing a specific mode, and $\mathbf {r}=(x,y)$ represents the vector of spatial coordinates. For a mutually orthogonal set, every pair of different modes must fulfill the following condition [10]:
$$\Delta\mathbf{k} \in \left\{\frac{2\pi}{b}(i, j)\,\middle| \, (i,j) \in \mathbb{Z}^2,\;(i,j)\neq(0,0)\right\}$$
where $\Delta \mathbf {k} = \mathbf {k_1} - \mathbf {k_2}$, and $\mathbf {k_1},\mathbf {k_2}$ are k-vectors representing two different modes. We chose to use mutually orthogonal probes, obeying the necessary condition for solving the general multiplexed ptychography problem [13]. However, as shown in the following sections, orthogonality is not strictly necessary for demultiplexing in TIMP.
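Continuing the sketch above, each pulse's mask can be encoded with a linear phase per Eq. (7), with k-vectors drawn from the $2\pi/b$ grid of Eq. (8); here $\delta = 1$ (mutually orthogonal modes, see Section 3.3), and the particular $(i, j)$ choice is illustrative:

```python
K = 9                          # number of pulses/frames
delta = 1.0                    # orthogonality parameter (Section 3.3)
# Global coordinates spanning the full MLA mask
X = (np.arange(n_lens * n_pix) - n_lens * n_pix / 2 + 0.5) * pix
XX, YY = np.meshgrid(X, X)
# One k-vector per pulse, e.g. a 3x3 grid of integer mode indices (i, j)
ij = [(i, j) for i in range(3) for j in range(3)][:K]
encoded_masks = [
    mla_mask * np.exp(1j * delta * (2 * np.pi / b) * (i * XX + j * YY))
    for (i, j) in ij
]
```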

For the sake of ease and simplicity, most of the analysis conducted in this study presumed the objects to be real and positive, simulated as an amplitude-only SLM (HOLOEYE HED 6001 monochrome LCOS microdisplay). The object SLM, denoted by SLMO, is placed at a distance $d=7$ mm before the focal plane of OL (which is the Fourier plane of the 4f system). This results in a $\sim$85% overlap on the object plane between beams originating from neighboring lenses, and a field of view of FOV $\approx (N_X \times N_Y)b d / f_{OL} = 900\,\mu$m $\times$ $900\,\mu$m [5]. Both SLMs, SLMP and SLMO, feature $1920 \times 1080$ pixels, a pixel size of $8\,\mu$m and an 8-bit dynamic range.

In the second pass, OL transforms the object-plane exit wave into the spatial frequency domain (apart from an additional phase, which the camera cannot detect) at the 4f system’s exit plane, where the camera is placed. The mapping from camera-plane spatial coordinates to spatial frequencies is given by $\pmb {\nu } = \mathbf {r} / \lambda f_{OL}$, where $\mathbf {r}$ is a spatial coordinate vector on the camera plane. Therefore, the maximum measurable frequency in each block is:

$$\nu_{cutoff} = \frac{b}{2 \lambda f_{OL}}$$
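As a quick numerical check with the nominal values used here ($b = 1$ mm, $\lambda = 520$ nm, $f_{OL} = 50$ mm):

$$\nu_{cutoff} = \frac{10^{-3}\,\text{m}}{2 \times 520 \times 10^{-9}\,\text{m} \times 50 \times 10^{-3}\,\text{m}} \approx 19.2\ \text{mm}^{-1},$$

corresponding to a smallest resolvable half-period of $1/(2\nu_{cutoff}) \approx 26\,\mu$m, on the order of the $24\,\mu$m object pixel size used in Section 3.1.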

The resulting images are captured with a simulated Basler acA2440-75um camera that has $2448 \times 2048$ pixels, a pixel size of $3.45 \mu$m and a 12-bit dynamic range.

It is important to note that the parameter values we employed above are considered nominal. However, in the next sections these values are not treated as constants; instead, we systematically vary and adjust them. This approach was undertaken to rigorously evaluate the impact of these parameters on the outcome results, analyze the sensitivity to these parameters, and compare the proposed method to MsPA for reference.

2.3 Network architecture

The available data in the described configuration is a single diffraction patterns image of size $1 \times 2448 \times 2048$. This data is used as the input of any reconstruction algorithm. We demonstrate the proposed method on natural images from CIFAR10 [24]. Since CIFAR10 contains $32 \times 32$ pixel images, the required output size is $N_{frames} \times 32 \times 32$, where $N_{frames}$ is the number of recorded and reconstructed frames. The network we use, denoted by deepTIMP, is a convolutional encoder-decoder comprised of 10 encoding units (encoder) and 5 decoding units (decoder). Each unit in the encoder is comprised of a convolution layer with $4 \times 4$ kernels, stride 2 and 1-pixel padding, followed by a batch normalization layer and a LeakyReLU activation with $\alpha = 0.2$ (except for the first unit, which does not contain batch normalization, and the last one, for which the activation is a sigmoid function). Each decoder unit is comprised of a transposed convolution layer, followed by a batch normalization layer and a ReLU activation. The transposed convolution layers of the first 4 decoder units have $4 \times 4$ kernels, 1-pixel padding and stride 2, and the last one has $3 \times 3$ kernels, stride 1 and 1-pixel padding. The first encoder unit has a single input channel (its input is the input image) and 2 output channels. The number of output channels of every encoder unit is twice the number of its input channels, and the number of output channels of every decoder unit is half the number of its input channels. Therefore, for an input image of size $1 \times 2448 \times 2048$, the feature encoding size (the size of the tensor between the encoder and the decoder) is $1024 \times 2 \times 2$ and the output image size is $32 \times 32 \times 32$. We added to the encoder-decoder architecture an output layer, which is a convolution layer with a $3 \times 3$ kernel, stride 1, 1-pixel padding, 32 input channels and $N_{frames}$ output channels. Thus, the output size of the network is $N_{frames} \times 32 \times 32$, as required.
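A minimal PyTorch sketch of this architecture, following the description above (a schematic reading; the authors' exact implementation may differ):

```python
import torch.nn as nn

def build_deeptimp(n_frames: int) -> nn.Sequential:
    """Sketch of the deepTIMP encoder-decoder described in the text."""
    layers = []
    # --- Encoder: 10 units of 4x4 conv, stride 2, padding 1; channels double ---
    ch = 1
    for i in range(10):
        out_ch = ch * 2
        layers.append(nn.Conv2d(ch, out_ch, kernel_size=4, stride=2, padding=1))
        if i > 0:                            # first unit has no batch norm
            layers.append(nn.BatchNorm2d(out_ch))
        # last encoder unit uses a sigmoid instead of LeakyReLU
        layers.append(nn.Sigmoid() if i == 9 else nn.LeakyReLU(0.2))
        ch = out_ch                          # 1 -> 2 -> 4 -> ... -> 1024
    # --- Decoder: 5 units; channels halve ---
    for i in range(5):
        out_ch = ch // 2
        if i < 4:                            # 4x4 kernel, stride 2, padding 1
            layers.append(nn.ConvTranspose2d(ch, out_ch, 4, stride=2, padding=1))
        else:                                # last unit: 3x3, stride 1, padding 1
            layers.append(nn.ConvTranspose2d(ch, out_ch, 3, stride=1, padding=1))
        layers.append(nn.BatchNorm2d(out_ch))
        layers.append(nn.ReLU())
        ch = out_ch                          # 1024 -> 512 -> ... -> 32
    # --- Output layer: 3x3 conv mapping 32 channels to N_frames ---
    layers.append(nn.Conv2d(ch, n_frames, kernel_size=3, stride=1, padding=1))
    return nn.Sequential(*layers)
```

For a $1 \times 2448 \times 2048$ input, this yields the $1024 \times 2 \times 2$ bottleneck and the $N_{frames} \times 32 \times 32$ output stated above.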

2.4 Training

The training methodology is illustrated in Fig. 2. Unlike the previous deep-learning approach for SSP [17], which was supervised, the proposed method is unsupervised. Here we do not use thousands of samples to train a network until it generalizes. Instead, we use a single diffraction patterns image and an (approximated) optical forward model of the system. The diffraction patterns image is fed into deepTIMP, and the output is then propagated through the forward model. We then compare this output to the input image with a loss function, and update the deepTIMP weights to minimize this loss. After repeating these steps iteratively, deepTIMP is overfitted and outputs a result that, after propagation, is as close as possible to the diffraction patterns image at the input. For an accurate forward model, this output is the original object. In essence, deepTIMP is optimized such that the function it embodies approximates the inverse mapping of the TIMP measurement process. As in [17], we feed the recorded diffraction pattern directly into the algorithm without partitioning it into $N \times N$ square segments of individual diffraction patterns. This omission of division potentially enhances the attainable resolution. Information pertaining to frequencies higher than $\nu _{cutoff}$ (refer to Eq. (9)) surpasses the confines of a diffraction pattern block in algorithms that do partition the data [4–6,10], thus limiting the resolution. Furthermore, in such algorithms, this information is not merely lost, but effectively contributes noise to adjacent blocks. Therefore, deepTIMP may produce reconstructions of higher resolution than those produced by previous TIMP algorithms.


Fig. 2. Overview of the Training Procedure. Step 0: Original laboratory measurement. The actual object frames are not directly available; only the recorded data is accessible. This stage is performed once initially, but the captured data is utilized repeatedly throughout the training process. Step 1: Reconstruction of object frames from the data. This data is fed into a neural network, which provides the current approximation for the object frames. Step 2: Propagation of the reconstructed frames to the camera plane using a model of the physical system. This model should ideally be a precise replica of the actual optical system in Step 0. Higher model fidelity leads to more accurate results. Step 3: Constraint imposition. The efficacy of the reconstruction is assessed by comparing the simulated intensities with the measured ones via a loss function. This loss is utilized to calculate the gradients for the network parameters and subsequently update them. The cycle comprising Steps 1-3 is repeated until convergence.


We trained deepTIMP (using PyTorch) on natural images from CIFAR10 [24]. Before inputting a raw intensity image, we extracted its square root (utilizing the amplitude instead of the intensity of the image) and normalized it such that the brightest pixel had a value of 1. The training was performed using the Adam optimizer [25], a variant of stochastic gradient descent, minimizing the mean absolute error (L1 loss) between the actual and the reconstructed diffraction intensities. Each reconstruction was trained with a learning rate of 0.0005 and $\beta _1=0.9$. For a typical case of nine frames of a positive real-valued object, deepTIMP was trained for approximately 2000 epochs on a single NVIDIA GTX 1080Ti GPU, a process that took roughly 15 minutes, similar to the computation time of MsPA.
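The full fitting procedure then combines the pieces above into the loop of Fig. 2; the following is a schematic sketch (normalizing the re-simulated intensity to its peak, mirroring the input preprocessing, is our assumption):

```python
import torch

def reconstruct(I_raw, net, forward_model, epochs=2000, lr=5e-4):
    """Unsupervised fitting loop of Fig. 2. `forward_model` implements the
    simulated 4fSSP system (Eq. (1) plus the optical propagation)."""
    amp = torch.sqrt(I_raw)                  # use amplitude, not intensity
    amp = amp / amp.max()                    # brightest pixel -> 1
    x = amp[None, None]                      # network input, shape (1, 1, H, W)
    target = amp ** 2                        # normalized measured intensity

    opt = torch.optim.Adam(net.parameters(), lr=lr, betas=(0.9, 0.999))
    loss_fn = torch.nn.L1Loss()              # mean absolute error
    for _ in range(epochs):
        opt.zero_grad()
        frames = net(x)                      # current estimate of the frames
        I_hat = forward_model(frames)        # propagate to the camera plane
        I_hat = I_hat / I_hat.max()          # assumed matching normalization
        loss = loss_fn(I_hat, target)        # L1 between diffraction intensities
        loss.backward()
        opt.step()
    return net(x).detach()                   # reconstructed frames
```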

3. Numerical results

In the subsequent sections, we will draw a comparative analysis between deepTIMP and MsPA. The metric we employ for this comparison is the Structural Similarity Index (SSIM) [26]. The comparative analysis is conducted under several scenarios, which include nominal conditions, variations in the number of frames, variations in the mutual orthogonality of different probe pulses, variations in the overlap of probe beams on the object plane, and a variation of the ptychographic microscope cutoff frequency.
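For reference, the per-frame SSIM statistics reported below can be computed, e.g., with scikit-image (our illustrative choice; the paper does not specify an implementation):

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def mean_ssim(gt_frames: np.ndarray, rec_frames: np.ndarray):
    """Mean and standard deviation of the SSIM index over frames.
    Both arrays have shape (K, H, W) with values in [0, 1]."""
    scores = [ssim(g, r, data_range=1.0) for g, r in zip(gt_frames, rec_frames)]
    return float(np.mean(scores)), float(np.std(scores))
```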

3.1 Image quality

In this section, we investigate the quality of reconstruction and resolution attained by deepTIMP, and compare it to the results from MsPA. The calculated diffraction patterns, the ground-truth and reconstructed frames, and their corresponding power spectra are shown in Fig. 3. Calculating the mean SSIM index and the standard deviation over the nine reconstructed frames in Fig. 3(a), we obtained $0.92 \pm 0.03$ for deepTIMP, indicating a strong resemblance to the original reference and significantly surpassing the $0.36 \pm 0.15$ attained by MsPA. The power spectra in Fig. 3(c) show no sharp cutoff for either method. This is not surprising, since the system parameters were set such that the system cutoff frequency (Eq. (9)) matches the pixel size of the simulated objects ($24\,\mu$m).


Fig. 3. An example of a nine-frame reconstruction. (a) For each frame within the set of nine, three images are presented in sequence from left to right: the ground truth, deepTIMP-reconstructed, and MsPA-reconstructed. The SSIM index of each reconstructed image is listed below it. The mean SSIM index of the nine deepTIMP reconstructed frames is $0.92 \pm 0.03$, indicating a strong resemblance with the original reference, significantly surpassing the $0.36 \pm 0.15$ attained by MsPA. (b) The computed diffraction image used for reconstruction. The sparse image indicates that this configuration can potentially be used with more frames, or for higher-resolution objects. (c) Radial average of the 2D spatial spectra of the ground truth, deepTIMP, and MsPA reconstructions, averaged over the nine frames. No clear frequency cutoff can be observed in these spectra, since the system parameters were set to match the frequency cutoff of the imaged frames.


3.2 Number of frames

Within the framework of TIMP, separate temporal frames are perceived as distinct multiplexed ptychographic modes [6,12]. Hence, ideal demultiplexing is theoretically achievable given that specific criteria are satisfied [13]. Nonetheless, the configuration of SSP and the finite dynamic range might contribute to the degradation of the reconstruction quality as the number of frames grows, hence constraining the system frame capacity. More explicitly, the division of the measured intensity image into blocks imposes a resolution limit on conventional algorithms such as MsPA, which is expressed by Eq. (9).

When employing phase gradient encoding, this limit changes because $\Delta k = 2\pi /b$ shifts the output intensity diffraction patterns on the camera plane by $\Delta r = \lambda f_{MLA}/b$, thus imposing a limit on the attainable spatial spectral content of the original system. In accordance with Eq. (8), for objects having a typical bandwidth of $\Delta \nu _{obj}=n\nu _{max}$, where $0\le n \le 1$, the highest mode encompassed within the available spectral width is given by:

$$\delta \nu_{max} < \nu_{cutoff} - \Delta \nu_{obj}=(1-n)\frac{b}{2\lambda f_{OL}} \Rightarrow i_{max} < (1-n) \frac{b^2}{2 \lambda f_{MLA}}$$
where we used $\delta \nu _{max} = i_{max} \Delta r / \lambda f_{OL}$. Substituting the optical parameters into Eq. (10), it can be inferred that the highest detectable mode in the current setup is $i_{max}= \lfloor 9.6\times (1-n)\rfloor$. Consequently, the system can capture up to 289 frames if the captured object bandwidths are within the limit of $\nu _{max}/6$, up to 169 frames if they are within the limit of $\nu _{max}/3$, and up to 49 frames if they are within the limit of $2\nu _{max}/3$. However, as evidenced in Fig. 4, MsPA exhibits a consistent decrease in the SSIM index as more frames are introduced. Conversely, when utilizing deepTIMP, the SSIM index remains essentially constant as the number of frames increases.
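These mode counts follow directly from Eq. (10); a short numerical check (assuming the k-vectors populate a symmetric $(2i_{max}+1) \times (2i_{max}+1)$ grid, which reproduces the figures quoted above):

```python
import math

wavelength, b, f_mla = 520e-9, 1e-3, 100e-3
i_max_coeff = b**2 / (2 * wavelength * f_mla)      # ~9.6 for the nominal setup

for n, label in [(1/6, "nu_max/6"), (1/3, "nu_max/3"), (2/3, "2*nu_max/3")]:
    i_max = math.floor((1 - n) * i_max_coeff)      # highest usable mode index
    n_frames = (2 * i_max + 1) ** 2                # modes on a (2i+1)^2 k-grid
    print(f"bandwidth {label}: i_max = {i_max}, up to {n_frames} frames")
# -> i_max = 8 / 6 / 3, i.e. 289 / 169 / 49 frames, matching the text
```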


Fig. 4. Performance versus number of frames. (a) deepTIMP (top) and MsPA (bottom) reconstruction examples. K is the number of frames in the reconstructed movie. Only the first frame of the movie is presented. The rightmost image is the ground truth image, which serves as the reference. (b) SSIM index for each algorithm outcome, for each K, averaged over all the reconstructed frames. The error bars represent the range between the highest and lowest values amongst the reconstructed frames.


3.3 Orthogonality

TIMP can be viewed as a distinct subclass of multiplexed ptychography, given that each object mode corresponds to a unique probe mode and reciprocally, rather than exhibiting interplay among all object and probe modes. This shift from the general to the special case is mathematically represented by:

$$I = \sum_{k=1}^{K} {\sum_{l=1}^{L} {|\mathcal{F}[P_kO_l]|^2}} \Longrightarrow I = \sum_{k=1}^{K} {|\mathcal{F}[P_kO_k]|^2}$$
where $I, P, O$ designate intensity, probe, and object distributions respectively, $k$ represents the probe mode index (which also signifies the object mode index on the right-hand side), $l$ is the object mode index (on the left-hand side), and $\mathcal {F}$ symbolizes the 2D spatial Fourier operator. The left side of Eq. (11) explicates the general multiplexed ptychography problem, while the right side delineates the reduced problem encompassing TIMP, aligning with Eq. (1).

It has been mathematically substantiated that the general problem of multiplexed ptychography can be uniquely resolved provided all probe beam modes are mutually orthogonal [12,13]. However, the validity of this condition for TIMP remained uncertain. To investigate this, we introduce a new parameter, $\delta$, to the mode term in Eq. (7):

$$\Phi^{PGE}_{\mathbf{k},\delta}(\mathbf{r}) = \exp\left(i\delta\mathbf{k \cdot r}\right)$$

If the $\mathbf {k}$ modes satisfy the criterion in Eq. (8), the orthogonality of the modes can be modulated by adjusting $\delta$. For instance, when $\delta =0$ all modes are identical, whereas for nonzero $\delta \in \mathbb {Z}$ they are mutually orthogonal.
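A one-dimensional numerical illustration of this point (our sketch): integrating the cross term between two modes whose k-vectors differ by $\Delta k = \delta \cdot 2\pi/b$ over one lens pitch gives a vanishing overlap only at integer $\delta$.

```python
import numpy as np

b, n = 1e-3, 1000
x = np.linspace(0, b, n, endpoint=False)

def overlap(delta):
    """Normalized inner product of two modes whose k-vectors differ by
    delta * 2*pi/b along x, integrated over one lens pitch b."""
    phase = np.exp(1j * delta * (2 * np.pi / b) * x)
    return abs(phase.mean())

for d in [0.25, 0.5, 1.0, 1.5, 2.0]:
    print(f"delta = {d:>4}: |<P1, P2>| = {overlap(d):.3f}")
# Integer delta gives ~0 (orthogonal modes); fractional delta does not.
```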

As demonstrated in Fig. 5, there is no substantial enhancement for integer values of $\delta$. This indicates that orthogonality is not a prerequisite in the TIMP configuration being considered. The findings for MsPA depict a slow yet consistent improvement of the SSIM index with an increase in $\delta$, owing to the greater separation between modes, which makes them less multiplexed. For deepTIMP, orthogonality appears to exert minimal influence on the outcomes. This is potentially beneficial, as the utilization of denser modes augments the system bandwidth, making it possible to surpass the frame-number restriction imposed by Eq. (10).


Fig. 5. Performance versus probe modes orthogonality. We examine how the quality of reconstruction changes with variations in the mutual orthogonality of modes probing different frames. Only the first frame is presented. (a) Reconstruction examples for different $\delta$ values. The rightmost image represents the ground truth, which serves as the reference. Following this, we present 8 reconstructions of the same first frame, incrementing $\delta$ from $0.25$ to $2$, moving from left to right. For each $\delta$, the upper image is the deepTIMP reconstruction, while the lower one is the MsPA reconstruction. (b) A quantitative representation of the outcomes, the SSIM index for each algorithm and each $\delta$, averaged over the 9 reconstructed frames. The error bars represent the range between the highest and lowest values amongst the reconstructed frames.


3.4 Probe beams overlap

Ptychography operates on the principle of scanning an object using probe beams with significant overlap. This overlapping creates correlations among the measurements, thereby giving ptychography its advantageous attributes. Investigations into the scanning pattern and overlap have suggested optimal overlap levels to be between $60{\%}-80{\%}$ [27,28].

In order to isolate the effects of varying overlap, we performed simulations of different setups with different overlaps between the probes. This variation was achieved by adjusting $d$, the distance of the object from the Fourier plane. As depicted in Fig. 6, an optimal overlap of $80-85{\%}$ was observed for both methods, and deepTIMP yields superior results over the entire overlap range.


Fig. 6. Investigation into the influence of probe beams overlap. We examine how the quality of reconstruction changes with variations in the overlap between neighboring probe beams on the object plane. (a) Reconstructions for different overlap values. Only the first of the nine frames is presented. The rightmost image represents the ground truth, which serves as the reference. Following this, we present 6 reconstructions of the same image, incrementing the overlap from $50{\%}$ to $100{\%}$ (calculated assuming Gaussian probe beams), moving from left to right. For each overlap, the upper image is the deepTIMP reconstruction, while the lower one is the MsPA reconstruction. (b) A quantitative representation of the outcomes, the SSIM index for each algorithm and each overlap.


3.5 Resolution

In previous sections we used the same nominal system configuration and varied only a single parameter at a time, which allowed us to explore the performance of the algorithms under each variation. In this section we demonstrate that deepTIMP can yield enhanced resolution in a configuration where the microscope exhibits low resolution when an iterative ptychographic reconstruction algorithm is used. In order to decrease the cutoff resolution of the microscope, we use a smaller separation distance between consecutive lenses in the MLA, $b=0.5$ mm (see Eq. (9)). To keep the same FOV, we change the MLA phase mask to consist of an $N_X \times N_Y = 9 \times 9$ lens grid. The calculated diffraction patterns image is shown in Fig. 7(b). The ground-truth and reconstructed frames are shown in Fig. 7(a), while their corresponding power spectra are shown in Fig. 7(c). Compared with Fig. 3(b), the diffraction patterns are less separated due to the smaller $b$. Moreover, applying MsPA in this case requires division of the diffraction patterns image into $9 \times 9$ blocks of size $b \times b$, resulting in reduced available bandwidth for each block compared to the previous configuration. Therefore, the cutoff frequency and the resolution of the frames reconstructed using MsPA are reduced. Moreover, since each frame is encoded by an illumination pulse with a distinct linear phase, its corresponding diffraction pattern is slightly shifted on the detector plane. Thus, the diffraction patterns of only one frame can be centered in each block, while the others are shifted and experience an even lower cutoff frequency. Indeed, comparing the MsPA reconstructions in this geometry (Fig. 7(a)) with the previous geometry (Fig. 3(a)), the reconstructions in this geometry are significantly blurred. Using Eq. (10), the average cutoff frequency of the current configuration is $0.72\nu _{cutoff}$, in agreement with the cutoff shown in Fig. 7(c). Also, due to the high density of diffraction patterns, interference between neighboring diffraction patterns and modes produces distinct fringes in some of the MsPA-reconstructed frames. Interestingly, deepTIMP experiences only a mild performance reduction, as its averaged spatial spectrum follows the ground truth throughout the entire frequency range, including frequencies beyond the calculated cutoff. The difference between the sensitivities of the two algorithms to the reduced cutoff frequency of the ptychographic microscope originates from the way the data is fed into each algorithm. Since deepTIMP receives the whole image as input, without any division or cropping, it experiences less information loss and interference than MsPA.


Fig. 7. Demonstration of enhanced resolution using deepTIMP. (a) For each frame within the set of nine, three images are presented in sequence from left to right: the ground truth, deepTIMP-reconstructed, and MsPA-reconstructed. The SSIM index of each reconstructed image is listed below it. The mean SSIM index for deepTIMP reconstructed frames is $0.88 \pm 0.05$, indicating a strong resemblance with the original reference, significantly surpassing the $0.38 \pm 0.1$ attained by MsPA. (b) The computed diffraction image used for reconstruction. In this configuration the modes are considerably denser. (c) Radial average of the 2D spatial spectra of the ground truth, deepTIMP, and MsPA reconstructions, averaged over the nine frames. The spectral distribution of the deepTIMP reconstruction parallels that of the original image over the entirety of its range, a characteristic indicative of its superior SSIM index. MsPA undergoes a cutoff at $0.7\nu _{cutoff}$, resulting in diminished resolution, a phenomenon visually substantiated in (a).


4. Conclusion

In summary, we have introduced and numerically investigated a deep-learning-oriented reconstruction method for TIMP, denoted by deepTIMP. Utilizing a physical forward model of the system, deepTIMP is trained in an unsupervised manner, obviating the need for large databases and long supervised training procedures. We compared deepTIMP to MsPA, a conventional algorithm for the reconstruction of multiplexed ptychography data. Initially, our comparison centered on image quality under nominal conditions, after which we manipulated parameters of the optical system to scrutinize the sensitivity to the number of reconstructed frames, the mutual orthogonality of the probe pulses within the burst, the overlap between contiguous probe beams on the object plane, and the cutoff frequency of the ptychographic microscope. Notably, deepTIMP showcased superior reconstruction quality under all examined conditions. Furthermore, deepTIMP exhibited nearly no sensitivity to the parameter variations throughout the majority of the variation ranges, whereas MsPA’s operability is confined to a significantly narrower parameter space. This study numerically examined the application of deepTIMP to TIMP of real-valued dynamic objects. A significant forthcoming step would entail extending and examining the algorithm’s applicability to experimental data and complex-valued imagery. Another promising direction is to apply the unsupervised physics-informed deep learning-based reconstruction approach to other multiplexed ptychographic scenarios.

Funding

H2020 European Research Council (819440).

Disclosures

The authors declare no conflicts of interest.

Data availability

No data were generated or analyzed in the presented research.

References

1. J. Rodenburg, “Ptychography and related diffractive imaging methods,” in Advances in Imaging and Electron Physics, vol. 150, P. W. Hawkes, ed. (Elsevier, 2008), pp. 87–184.

2. A. M. Maiden and J. M. Rodenburg, “An improved ptychographical phase retrieval algorithm for diffractive imaging,” Ultramicroscopy 109(10), 1256–1262 (2009).

3. C. La-O-Vorakiat, E. Turgut, C. A. Teale, et al., “Ultrafast demagnetization measurements using extreme ultraviolet light: Comparison of electronic and magnetic contributions,” Phys. Rev. X 2(1), 011005 (2012).

4. X. Pan, C. Liu, and J. Zhu, “Single shot ptychographical iterative engine based on multi-beam illumination,” Appl. Phys. Lett. 103(17), 171105 (2013).

5. P. Sidorenko and O. Cohen, “Single-shot ptychography,” Optica 3(1), 9–14 (2016).

6. P. Sidorenko, O. Lahav, and O. Cohen, “Ptychographic ultrahigh-speed imaging,” Opt. Express 25(10), 10997–11008 (2017).

7. X. He, C. Liu, and J. Zhu, “Single-shot Fourier ptychography based on diffractive beam splitting,” Opt. Lett. 43(2), 214–217 (2018).

8. X. He, C. Liu, and J. Zhu, “Single-shot aperture-scanning Fourier ptychography,” Opt. Express 26(22), 28187–28196 (2018).

9. G. I. Haham, O. Peleg, P. Sidorenko, et al., “High-resolution (diffraction limited) single-shot multiplexed coded-aperture ptychography,” J. Opt. 22(7), 075608 (2020).

10. O. Wengrowicz, O. Peleg, B. Loevsky, et al., “Experimental time-resolved imaging by multiplexed ptychography,” Opt. Express 27(17), 24568–24577 (2019).

11. A. Veler, M. Birk, C. Dobias, et al., “Single-shot ptychographic imaging of non-repetitive ultrafast events,” Opt. Lett. 49(2), 178–181 (2024).

12. P. Thibault and A. Menzel, “Reconstructing state mixtures from diffraction measurements,” Nature 494(7435), 68–71 (2013).

13. P. Li, T. Edo, D. Batey, et al., “Breaking ambiguities in mixed state ptychography,” Opt. Express 24(8), 9038–9052 (2016).

14. J. Liang and L. V. Wang, “Single-shot ultrafast optical imaging,” Optica 5(9), 1113–1127 (2018).

15. G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica 6(8), 921 (2019).

16. Z. Guan and E. H. Tsai, “Ptychonet: Fast and high quality phase retrieval for ptychography,” Tech. rep., Brookhaven National Lab. (2019).

17. O. Wengrowicz, O. Peleg, T. Zahavy, et al., “Deep neural networks in single-shot ptychography,” Opt. Express 28(12), 17511 (2020).

18. Q. Chen, D. Huang, and R. Chen, “Fourier ptychographic microscopy with untrained deep neural network priors,” Opt. Express 30(22), 39597 (2022).

19. B. Seong, I. Kim, T. Moon, et al., “Untrained deep learning-based differential phase-contrast microscopy,” Opt. Lett. 48(13), 3607 (2023).

20. D. Yang, J. Zhang, Y. Tao, et al., “Dynamic coherent diffractive imaging with a physics-driven untrained learning method,” Opt. Express 29(20), 31426 (2021).

21. F. Wang, Y. Bian, H. Wang, et al., “Phase imaging with an untrained neural network,” Light: Sci. Appl. 9(1), 77 (2020).

22. B. K. Chen, P. Sidorenko, O. Lahav, et al., “Multiplexed single-shot ptychography,” Opt. Lett. 43(21), 5379–5382 (2018).

23. W. Xu, H. Xu, Y. Luo, et al., “Optical watermarking based on single-shot-ptychography encoding,” Opt. Express 24(24), 27922–27936 (2016).

24. A. Krizhevsky, “Learning Multiple Layers of Features from Tiny Images,” Tech. rep. (2009).

25. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980 (2014).

26. A. Hore and D. Ziou, “Image quality metrics: PSNR vs. SSIM,” in 20th International Conference on Pattern Recognition (IEEE, 2010), pp. 2366–2369.

27. O. Bunk, M. Dierolf, S. Kynde, et al., “Influence of the overlap parameter on the convergence of the ptychographical iterative engine,” Ultramicroscopy 108(5), 481–487 (2008).

28. X. Huang, H. Yan, R. Harder, et al., “Optimization of overlap uniformness for ptychography,” Opt. Express 22(10), 12634–12644 (2014).

