Abstract
Understanding process variations and their impact in silicon photonics remains challenging. To achieve high-yield manufacturing, a key step is to extract the magnitude and spatial distribution of process variations in the actual fabrication, which is usually based on measurements of replicated test structures. In this paper, we develop a Bayesian-based method to infer the distribution of systematic geometric variations in silicon photonics, without requiring replication of identical test structures. We apply this method to characterization data from multiple silicon nitride ring resonators with different design parameters. We extract distributions with standard deviations of 28 nm, 0.8 nm, and 3.8 nm for the width, thickness, and partial etch depth, respectively, as well as the spatial maps of these variations. Our results show that this characterization and extraction approach can serve as an efficient method to study process variation in silicon photonics, facilitating the design of high-yield silicon photonic circuits in the future.
© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
Integrated silicon photonics is a rising technology platform that adopts existing CMOS fabrication infrastructure to meet the future demands of high-rate data communication and to enable other novel applications in sensing, LiDAR, quantum computation, and more [1–3]. To increase yield and performance of photonic components and circuits, the effects of manufacturing variations in silicon photonics are starting to be studied and reported [4–7]. One of the key steps in process variation analysis is to find the magnitude and spatial distributions of process variations in the actual fabrication. These variations are usually inferred from measurements of spatially replicated test structures [5–7].
However, the accuracy of such inference is highly dependent on the design of the test structures. For example, [5] uses ring resonators (RRs) as the test structure, and extracts the variations of total thickness of silicon-on-insulator layer $T$ and waveguide width $w$ from the resonance wavelength and group index. While the mapping is unique and sufficient for the extraction, the group index is not very sensitive to the variation of $T$ and $w$, and can be easily affected by measurement noise. Thus the extraction results in [5] show a strong correlation between these two variations; this is not aligned with the physics of the manufacturing process, because the variations come from two different and independent processes, i.e., deposition and etching.
In order to achieve high accuracy in such inference, the test structure needs to be carefully designed so that its performance is sensitive to the targeted variations, and such that one can decouple the effects of different variation components. As an example, [6] uses a combination of two different Mach-Zehnder interferometers (MZIs) to infer the same variations as [5], and is able to decouple the effects, producing variation estimates with a more realistic, weak correlation. However, finding test structures that achieve this separation can take many cycles of design, fabrication, and measurement, with substantial cost in time and resources.
In this work, we develop a Bayesian-based method to extract the systematic geometric process variation information from design characterization measurements for different devices that are not spatially replicated. The method is applied to characterization data from multiple silicon nitride RRs (Fig. 1), and the inference of the width $w$, thickness $T$, and partial etch depth $h$ variations on the silicon nitride ridge waveguide (Fig. 2(a)) gives results with good confidence. Since such characterization experiments are widely used for photonic component design and process design kit (PDK) development, our method provides an alternative, efficient, and accessible approach to study process variations in silicon photonics, and thus facilitates the design of high-yield silicon photonic circuit applications in the future.
The original data and source code for implementation are available in Ref. [8].
2. Data extraction and challenges
The characterization data is from 48 silicon nitride all-pass RRs with varying radius and gap width, as shown in Fig. 1(a). Fabrication was performed in the MIT Microsystems Technology Laboratories (MTL) cleanroom facilities on a 6-inch wafer (WaferPro, LLC) with single-sided 200 nm stoichiometric Si$_{3}$N$_{4}$ on 3 $\mathrm {\mu }$m thermal SiO$_{2}$. A single electron-beam lithography step and a reactive ion etch step were used to pattern the ridge waveguides and grating couplers into the silicon nitride layer, leaving a 35 nm slab. After resist removal and cleaning, a 2 $\mathrm {\mu }$m thick film of SiO$_{2}$ was deposited and subsequently annealed in nitrogen at 950$^{\circ }$C to remove hydrogen and densify the film [9]. The characterization measurements were performed using a home-built wafer-scale automated system, a 250-$\mathrm {\mu }$m-pitch fiber array (PLC Connections), and an optical vector analyzer in the C and L bands (LUNA OVA 5000). A typical measurement frequency response is shown in Fig. 1(b).
In our measurements, all of the RR devices share almost exactly the same baseline insertion loss resulting from the grating couplers, which thus can be evaluated and removed by averaging and smoothing the responses of RRs with no observable resonance. We denote the response of the $i$-th device after this detrending as $I_i(\lambda )$, where $\lambda$ is wavelength.
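This baseline removal can be sketched in a few lines; the following is an illustrative numpy version (not the released implementation [8]), assuming the responses are sampled on a common wavelength grid and expressed in dB:

```python
import numpy as np

def detrend(response, baseline_responses, win=51):
    """Estimate the shared grating-coupler envelope by averaging the
    responses of resonance-free devices, smoothing with a moving mean,
    and subtracting it (responses in dB on a common wavelength grid)."""
    base = np.mean(baseline_responses, axis=0)       # average resonance-free devices
    kernel = np.ones(win) / win
    smooth = np.convolve(base, kernel, mode="same")  # moving-average smoothing
    return response - smooth
```

Near the grid edges the moving average is biased by zero padding, so the first and last `win // 2` samples should be discarded before peak extraction.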
For each device with observable resonances, we use the workflow shown in Fig. 2(b) to extract its variation values. In particular, for each resonant peak of each device, the resonance wavelength, peak depth, and full width at half maximum (FWHM) are extracted by nonlinear curve fitting based on a Lorentzian model, which is an approximate form of the theoretical RR model response [10] when the peak is narrow and sharp enough:
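As a concrete sketch of this peak-extraction step, the snippet below fits one dip to a Lorentzian; the parameterization (resonance wavelength, FWHM, depth) follows the text, while the function names and the initial-guess heuristic are our own illustration, not the paper's code:

```python
import numpy as np
from scipy.optimize import curve_fit

def lorentzian_dip(lam, lam0, fwhm, depth):
    """Lorentzian approximation of an all-pass RR transmission dip."""
    return 1.0 - depth / (1.0 + ((lam - lam0) / (fwhm / 2.0)) ** 2)

def fit_peak(lam, trans):
    """Extract (resonance wavelength, FWHM, peak depth) from one dip
    by nonlinear least squares, seeded from the raw minimum."""
    i0 = np.argmin(trans)
    p0 = [lam[i0], (lam[-1] - lam[0]) / 10.0, 1.0 - trans[i0]]
    popt, pcov = curve_fit(lorentzian_dip, lam, trans, p0=p0)
    return popt, pcov
```

The returned covariance `pcov` is the raw per-peak error estimate reused in Section 3.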
A first-order sensitivity matrix $\mathbf {S}_i$ is built for each device $i$ based on simulation that connects geometric variations $\boldsymbol {p}_i = [\Delta w_i, \Delta T_i, \Delta h_i]^\mathrm {T}$ to the change of the physical properties at all the peaks:
where the nominal values of the physical properties of the peaks $\boldsymbol {\bar {y}}_i$ are also obtained from simulation. It should be noted that we ignore the material refractive index variation in our model, which can be significant and potentially affect the extraction results for silicon nitride devices. However, if one is interested in such variation, one can simply add the index variation $\Delta n_i$ into the model $\boldsymbol {p}_i$, and use the same method presented in this paper to extract this or other additional variation sources.

2.1 Auxiliary gradient parameters
It is worth noting that the physical properties in $\boldsymbol {y}_i$ are not occurring at the same location: $n_\text {eff}$ and $n_\text {g}$ are properties of the ring, and $t$ (or $L$) is a property of the ring-waveguide coupler. Considering the size of the ring, it is typical to assume that these properties are experiencing different geometric variations. Therefore, a more rigorous formulation for the variations is $\boldsymbol {p}_i = [\Delta w_i, \Delta T_i, \Delta h_i, \Delta w'_i, \Delta T'_i, \Delta h'_i]^\mathrm {T}$, where the first three indices correspond to the variations at the ring, and the latter three correspond to the variations at the coupler.
In this paper, we are using a different but equivalent formulation:
In the main sections of this paper, we focus on the variation distribution of $w$, $T$, and $h$, and only use the gradient parameters for auxiliary purposes.
2.2 Nominal fitting design
Of the various sources of potential error in the extraction, one that can be easily fixed is the nominal value errors from simulation. This is not a problem for repetitive test structures, since a deviation in nominal value can only result in a systematic mean shift in the results. But since we have different devices, such deviation is different for each device, and therefore introduces extra variance to the results. This issue can be ameliorated by adjusting the nominal values according to the extracted data, i.e., shift and scale the values in each group so that they fit the physical properties from data.
We denote the radius and gap width of the $i$-th RR device as $R_i$ and $\chi _i$, respectively. Effective index and group index are only dependent on the ring and thus are only a function of the radius; the coupling coefficient is a function of both.
In order to simplify the modeling, we choose the problem formulation (2) so that we only need to model the nominal values of effective index at one certain wavelength $\lambda _c$. We denote the simulated nominal effective index for radius $R$ at wavelength $\lambda _c$ as $\hat {n}_\text {eff}(R)$. We now consider the possible error in simulation. We assume the actual nominal value is a “stretch and shift” version of the simulation:
This gives a linear nominal model
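The "stretch and shift" correction is an ordinary least-squares fit of two scalars across all devices. A minimal numpy sketch (names are illustrative), fitting $\bar {n}_\text {eff} = a\,\hat {n}_\text {eff} + b$:

```python
import numpy as np

def stretch_shift_fit(n_sim, n_meas):
    """Fit the 'stretch and shift' nominal correction n_bar = a * n_hat + b
    by least squares (n_sim: simulated nominal values across devices,
    n_meas: corresponding values extracted from measurement)."""
    X = np.column_stack([n_sim, np.ones_like(n_sim)])
    (a, b), *_ = np.linalg.lstsq(X, n_meas, rcond=None)
    return a, b
```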
We can apply a similar approach to model the nominal group index, but we also need to consider the wavelength dependency. A simple adaptation is to first fit the simulated nominal values to a linear wavelength model:
where $\hat {\eta }_0(R)$ and $\hat {\eta }_1(R)$ are the fitting coefficients. Then we perform the same “stretch and shift” approach, which then gives $\bar {n}_\text {g}(\lambda, R) = \bar {\eta }_0(R) + \lambda \bar {\eta }_1(R)$.

For the nominal coupling coefficient model, we choose a different approach. This is because the simulation for the coupling coefficient is more complicated and less accurate, and not suitable for the “stretch and shift” scheme. On the other hand, by using the nonlinear transformation $L=\ln \arccos {t}$, we are able to use a simple polynomial model for the nominal value of $L$:
Combining (7), (10), and (12), we obtain
2.3 Naive approach
Combining the nominal fitting in Section 2.2, the problem formulation (3) becomes
For a naive approach, we first find the coefficient $\boldsymbol {w}$ by minimizing the difference between the measurement data $\boldsymbol {y}_i$ and $\mathbf {A}_i \boldsymbol {w}$ across all the devices, which is a least-squares regression problem. Then we solve each $\boldsymbol {p}_i$ using least-squares regression:
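The two-stage procedure can be sketched as follows (a simplified numpy illustration of the naive approach, not the released code [8]; `A_i` maps the shared nominal coefficients $\boldsymbol {w}$ and `S_i` maps the per-device variations $\boldsymbol {p}_i$ to the extracted properties):

```python
import numpy as np

def fit_nominal_coeffs(A_list, y_list):
    """Stage 1: least-squares fit of the shared nominal coefficients w
    across all devices, ignoring the per-device variation terms."""
    A = np.vstack(A_list)
    y = np.concatenate(y_list)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def fit_device_variations(S_list, A_list, y_list, w):
    """Stage 2: per-device least-squares solve for p_i from the residual
    y_i - A_i w using the sensitivity matrix S_i."""
    return [np.linalg.lstsq(S, y - A @ w, rcond=None)[0]
            for S, A, y in zip(S_list, A_list, y_list)]
```

Because stage 1 ignores the $\mathbf {S}_i \boldsymbol {p}_i$ terms, its estimate of $\boldsymbol {w}$ is biased whenever the variations do not average out, which is one motivation for the Bayesian treatment in Section 5.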
Moreover, there are multiple solutions for the effective index in the extraction based on different mode number $M$:
For rings with small radius, such solutions are widely separated, and it is easy to select correctly based on their distance to the nominal value. But when the radius of the ring is larger, these solutions become close to each other and are hard to separate and select, which introduces additional error if we pick the wrong mode number.
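Selecting the mode number amounts to enumerating the candidate effective indices $M \lambda _\text {res} / (2\pi R)$ for nearby integers $M$ and keeping the one closest to the simulated nominal value. A small illustrative helper (names and the search window are our own):

```python
import numpy as np

def neff_candidates(lam_res, radius, n_nominal, n_side=2):
    """List effective-index solutions for nearby integer mode numbers M
    and pick the candidate closest to the simulated nominal value.
    Candidate spacing lam_res / (2 pi R) shrinks as the radius grows,
    which is what makes large rings ambiguous."""
    circumference = 2.0 * np.pi * radius
    M0 = round(n_nominal * circumference / lam_res)
    Ms = np.arange(M0 - n_side, M0 + n_side + 1)
    cands = Ms * lam_res / circumference
    best = cands[np.argmin(np.abs(cands - n_nominal))]
    return best, cands
```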
3. Error estimation
As noted in Section 2, one of the challenges of the problem comes from the error in the measurement and extraction process. Therefore, if we can manage to obtain information about the extraction error, we may be able to achieve more accurate inference by including those error estimates.
The most direct approach for error estimation is to perform multiple measurements on the same devices, which will not only provide a good estimation of extraction error, but also reduce the noise in the results. However, in some cases, we may need to perform analysis on historical measurement data where the devices have only been measured once, as is the case in this paper. In this section, we present how we can still perform estimation of extraction error even from single measurements on each device across a set of characterization devices.
A rather simple approach for error estimation is to estimate the covariance of the fitting coefficients in (1) from the fitting residual and the linearized version of the model around the fitting results [11]. Then we can estimate the covariance matrix $\mathbf {\Sigma }_i$ of $\boldsymbol {y}_i$ from the linearized relationship between $\boldsymbol {y}_i$ and the coefficients $\boldsymbol {\theta }$. For simplicity, here we assume that the errors of the extracted properties at different peaks are all uncorrelated, i.e., $\mathbf {\Sigma }_i$ is a diagonal matrix.
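In practice this raw estimate follows the standard nonlinear-regression recipe [11]: at the fitted solution, the coefficient covariance is approximated as $s^2(\mathbf {J}^\mathrm {T}\mathbf {J})^{-1}$, with $\mathbf {J}$ the model Jacobian and $s^2$ the residual variance. A minimal sketch (the function name is our own):

```python
import numpy as np

def fit_covariance(jac, resid):
    """Approximate covariance of fitted coefficients from the residual
    vector and the model Jacobian at the solution: s^2 (J^T J)^{-1}."""
    n, p = jac.shape
    s2 = resid @ resid / (n - p)          # residual variance estimate
    return s2 * np.linalg.inv(jac.T @ jac)
```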
However, this approach is not very accurate, since it assumes that the measurement noise is independent and identically distributed (i.i.d.) at different wavelength points, and the entire estimation process uses linear approximations at multiple steps which accumulate the error. One way to visualize this issue is to look at the wavelength dependency of these extracted properties, which is supposed to behave as some smooth and low-order polynomial response based on simulation. Since the effect of process variations is highly correlated across the wavelength, the uncorrelated “rough” perturbations seen in the extracted response (e.g., at 1582 and 1592 nm) in Fig. 3(a) must come from the extraction error. While the raw estimation $\hat {\mathbf {\Sigma }}_i$ from the nonlinear fitting residual here indeed shows a correct relative trend among different peaks, it is often too small overall to explain all the uncorrelated perturbations.
Fortunately, this visualization also motivates an approach to improve the estimation. We can fit the extracted wavelength responses to a low-order polynomial function (e.g., a linear or quadratic function) using the raw estimated $\hat {\mathbf {\Sigma }}_i$ as weights, then choose a proper scaling factor $\gamma _i$ such that $\mathbf {\Sigma }_i=\gamma _i\hat {\mathbf {\Sigma }}_i$ matches the residual from the weighted fitting.
Letting $z_k$ denote the extracted value at wavelength $\lambda _k$, we can build the probabilistic model as
We note that finding $\boldsymbol {w}_z$ is just a weighted linear regression where the weight is inversely proportional to the variance of the extracted error, and the scaling factor is decided by the residual of the weighted fitting. We perform this scaling adjustment to $n_\text {eff}$, $n_\text {g}$, and $L$ of each device separately. But for the effective index $n_\text {eff}$, since the problem formulation (2) only uses a single wavelength point $\lambda _c$, we just need to find the corresponding $\boldsymbol {w}_z$ and calculate the fitting value at $\lambda _c$:
As seen in Fig. 3(a), this improved approach provides a fairly accurate estimation of the extraction error level.
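A numpy sketch of this rescaling (illustrative names; the polynomial degree and the inverse-variance weights follow the description above):

```python
import numpy as np

def rescale_covariance(lam, z, var_raw, deg=2):
    """Fit the extracted wavelength response z(lam) to a low-order
    polynomial weighted by the raw variance estimates, then choose the
    scaling factor gamma so the scaled variances match the weighted
    fitting residual (weighted chi^2 per degree of freedom = 1)."""
    sigma = np.sqrt(var_raw)
    coef = np.polyfit(lam, z, deg, w=1.0 / sigma)  # polyfit weights multiply residuals
    r = z - np.polyval(coef, lam)
    dof = len(z) - (deg + 1)
    gamma = np.sum((r / sigma) ** 2) / dof
    return gamma * var_raw, gamma
```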
4. Bayesian approach: a toy example
Once we have error information, we can introduce a Bayesian model to incorporate it into our analysis. In order to illustrate this idea, we start from a simple 1D toy example:
where the observed data $\{y_i\}$, $y_i \sim \mathsf {y}_i$, consist of normally distributed clean data $\hat {\mathsf {y}}_i\sim \mathcal {N}(\mu, \sigma ^2)$ and noise with different variances: $\mathsf {e}_i\sim \mathcal {N}(0, \sigma _i^2)$. Figure 3(b) shows one instantiation of this example, with number of data points $N=40$, $\mu =0$, $\sigma =1$, and $\sigma _i$ varying from 0.01 to 10.

Suppose that all the $\sigma _i$ are known, and we want to estimate $\mu$ and $\sigma$ from the observed data $\{{y}_i\}$. The maximum likelihood (ML) estimator of the problem is the $(\mu, \sigma )$ that maximizes the probability of the observed data: $p_\mathsf {y}(\{{y}_i\}) = \prod _{i=1}^N \mathcal {N}({y}_i; \mu, \sigma ^2 + \sigma _i^2)$. One approach to find the ML estimates is the expectation-maximization (EM) algorithm [12], which is an iterative method commonly used to find the ML estimates of parameters in a probabilistic model. For the toy problem, the algorithm starts from a guess of the parameters $(\mu, \sigma )$, and updates the guess in each iteration using the following steps:
We run the algorithm on the instantiation shown in Fig. 3(b), using the statistics of the noisy data $\{\mathsf {y}_i\}$ as the starting point, and compare the result with the ones directly calculated from the clean data. As shown in Fig. 4(a), the algorithm converges to error $<5{\%}$ within 20 iterations. Intuitively, the algorithm starts from the samples with smaller error variance and uses the estimation to adjust the large error samples. The E-step (21) also provides an estimation for the clean data with confidence interval (shown in Fig. 4(b)), which will be used to estimate the process variation for each device in the next section.
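The toy EM loop above fits in a few lines of numpy (an illustrative reimplementation, with the posterior mean and variance of each clean sample computed in closed form):

```python
import numpy as np

def em_toy(y, sigma_i, n_iter=100):
    """EM for y_i = yhat_i + e_i with yhat_i ~ N(mu, var) and
    e_i ~ N(0, sigma_i^2), sigma_i known. Returns (mu, sigma)."""
    mu, var = y.mean(), y.var()                      # start from noisy-data statistics
    for _ in range(n_iter):
        # E-step: Gaussian posterior of each clean sample given y_i
        v_post = var * sigma_i**2 / (var + sigma_i**2)
        m_post = (sigma_i**2 * mu + var * y) / (var + sigma_i**2)
        # M-step: update (mu, var) from the posterior moments
        mu = m_post.mean()
        var = np.mean(v_post + (m_post - mu) ** 2)
    return mu, np.sqrt(var)
```

With more samples than the $N=40$ of Fig. 3(b), the recovered $(\mu, \sigma)$ closes in on the clean-data statistics even when many of the $\sigma_i$ are large.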
5. Implementation
We assume that the process variations $\boldsymbol {p}_i = \boldsymbol {\mathsf {p}}_i$ and the errors from the properties extraction $\boldsymbol {\mathsf {e}}_i$ are independent random variables following normal distributions with zero mean, and the mode numbers $\mathsf {n}_i$ are independent random uniformly distributed integer variables:
Then we can build a probabilistic model for the extracted property values:
Let $\hat {\boldsymbol {\mathsf {y}}}_i=\mathbf {A}_i \boldsymbol {w} + \mathbf {S}_i \boldsymbol {\mathsf {p}}_i$ denote the clean data. Then this is the same type of problem as in the toy example in Section 4, but with multiple dimensions, mapping between $\boldsymbol {\mathsf {p}}$ and $\hat {\boldsymbol {\mathsf {y}}}$, and an extra discrete random variable $\mathsf {n}_i$. By using the EM algorithm, we can specify the iterative steps as:
- • E-step: from current guess $(\boldsymbol {w}, \mathbf {\Lambda })$, find the posterior $p_{\hat {\boldsymbol {\mathsf {y}}}_i|\boldsymbol {\mathsf {y}}_i}$:$$p_{\hat{\boldsymbol{\mathsf{y}}}_i|\boldsymbol{\mathsf{y}}_i}(\hat{\boldsymbol{y}}_i|\boldsymbol{y}_i) = \sum_{k\in\mathbb{Z}}\mathcal{N}_{z}(k; \nu_i, \rho_i^{{-}1})\mathcal{N}(\hat{\boldsymbol{y}}_i; \boldsymbol{\mu}_i - k\boldsymbol{\delta}_i, \mathbf{R}_i^{{-}1}),$$where$$\begin{array}{c} \mathbf{R}_i = \mathbf{S}_i^{+,\mathrm{T}}\mathbf{\Lambda} \mathbf{S}_i^+{+} \mathbf{\Lambda}_i, \quad \boldsymbol{\mu}_i = \mathbf{R}_i^{{-}1}\left(\mathbf{S}_i^{+,\mathrm{T}}\mathbf{\Lambda} \mathbf{S}_i^+ \mathbf{A}_i\boldsymbol{w} + \mathbf{\Lambda}_i \boldsymbol{y}_i\right), \quad \boldsymbol{\delta}_i = \mathbf{R}_i^{{-}1}\mathbf{\Lambda}_i\boldsymbol{d}_i, \\ \rho_i = \boldsymbol{d}_i^\mathrm{T}\mathbf{\Lambda}_i\boldsymbol{d}_i - \boldsymbol{\delta}_i^\mathrm{T}\mathbf{R}_i\boldsymbol{\delta}_i, \quad \nu_i = \rho_i^{{-}1}\left(\boldsymbol{d}_i^\mathrm{T}\mathbf{\Lambda}_i \boldsymbol{y}_i - \boldsymbol{\delta}_i^\mathrm{T}\mathbf{R}_i\boldsymbol{\mu}_i\right). \end{array}$$
- • M-step: update the guess to $(\tilde {\boldsymbol {w}}, \tilde {\mathbf {\Lambda }})$ based on the posterior (25) (here we actually perform the M-step using the expectation conditional maximization (ECM) algorithm [13], which does not require the exact optimal solution in the M-step):$$\tilde{\mathbf{\Lambda}} = \left[\frac1N \sum_{i=1}^N \mathbf{S}_i^+(\mathbf{V}_i + \boldsymbol{r}_i\boldsymbol{r}_i^\mathrm{T})\mathbf{S}_i^{+,\mathrm{T}}\right]^{{-}1}, \quad \tilde{\boldsymbol{w}} = \left(\sum_{i'=1}^N \mathbf{A}_{i'}^\mathrm{T}\mathbf{S}_{i'}^{+,\mathrm{T}}\mathbf{\Lambda} \mathbf{S}_{i'}^+ \mathbf{A}_{i'}\right)^{{-}1}\sum_{i=1}^N \mathbf{A}_i^\mathrm{T}\mathbf{S}_i^{+,\mathrm{T}}\mathbf{\Lambda} \mathbf{S}_i^+ \boldsymbol{m}_i,$$where$$\boldsymbol{m}_i = \boldsymbol{\mu}_i - \mathbb{E}_{z}[\nu_i,\rho_i^{{-}1}]\boldsymbol{\delta}_i, \quad \mathbf{V}_i = \mathbf{R}_i^{{-}1} + \mathrm{var}_{z}[\nu_i, \rho_i^{{-}1}]\boldsymbol{\delta}_i\boldsymbol{\delta}_i^\mathrm{T}, \quad \boldsymbol{r}_i = \boldsymbol{m}_i - \mathbf{A}_i\boldsymbol{w}.$$
5.1 Discrete Gaussian distribution
For the discrete normal distribution as shown in (27), the expectation and variance are
Therefore, we only need to focus on the range of $\nu \in [0, 0.5]$. On the other hand, we have
In this way, we can approximate the values using (30) when $\rho$ is large (i.e., $\rho > 80$) or small (i.e., $\rho < 1$).
For the values in the middle range, i.e., $\rho \in [1, 80],\,\nu \in [0, 0.5]$, we can use the finite series approximation of (28) to pre-calculate the values for some parameters of $\rho, \nu$ in a dense grid, then use the linear interpolation for any arbitrary parameters to speed up the calculation time in the EM algorithm.
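A sketch of this tabulation scheme, using truncated-series reference values, a dense $(\rho, \nu)$ grid, and bilinear interpolation (the grid sizes and truncation span here are illustrative, not the values used in the released code [8]); the variance can be tabulated the same way:

```python
import numpy as np

def dgauss_mean(nu, rho, k_span=30):
    """Mean of the discrete Gaussian p(k) proportional to
    exp(-rho (k - nu)^2 / 2), k integer, by truncated series summation."""
    k = np.arange(round(nu) - k_span, round(nu) + k_span + 1)
    p = np.exp(-0.5 * rho * (k - nu) ** 2)
    p /= p.sum()
    return np.sum(k * p)

# Pre-tabulate on a dense grid over the middle range rho in [1, 80], nu in [0, 0.5]
rhos = np.linspace(1.0, 80.0, 200)
nus = np.linspace(0.0, 0.5, 100)
table = np.array([[dgauss_mean(nu, rho) for nu in nus] for rho in rhos])

def table_mean(rho, nu):
    """Bilinear interpolation in the pre-computed table."""
    i = int(np.clip(np.searchsorted(rhos, rho) - 1, 0, len(rhos) - 2))
    j = int(np.clip(np.searchsorted(nus, nu) - 1, 0, len(nus) - 2))
    tr = (rho - rhos[i]) / (rhos[i + 1] - rhos[i])
    tn = (nu - nus[j]) / (nus[j + 1] - nus[j])
    return ((1 - tr) * (1 - tn) * table[i, j] + tr * (1 - tn) * table[i + 1, j]
            + (1 - tr) * tn * table[i, j + 1] + tr * tn * table[i + 1, j + 1])
```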
5.2 Extraction results
For our RR design measurement data, the algorithm converges to distributions with standard deviations
within 600 iterations. In our case, the solution has relatively large $\rho _i$, hence we can roughly approximate the posterior (25) by a normal distribution: $p_{\hat {\boldsymbol {\mathsf {y}}}_i|\boldsymbol {\mathsf {y}}_i}(\hat {\boldsymbol {y}}_i|\boldsymbol {y}_i) \approx \mathcal {N}(\hat {\boldsymbol {y}}_i; \boldsymbol {m}_i, \mathbf {V}_i)$. Notice that $\boldsymbol {p}_i = \mathbf {S}_i^+(\hat {\boldsymbol {y}}_i - \mathbf {A}_i \boldsymbol {w})$; we thus have a Gaussian posterior for each $\boldsymbol {p}_i$ as well.

We can also generate spatial maps of the variations based on the estimated posterior (32). Figure 5(b) shows an example of the spatial map of the thickness $T$ variation, which is interpolated and smoothed using a Gaussian process model. It should be noted that the values at the edge of the map are results of extrapolation, and can be inaccurate. Since we do not separate different levels of the spatial variations as in Ref. [7], this map also includes random residual errors and could present short spatial correlations that are nonphysical for intra-die systematic variation. Some of the correlation and extreme values in the extraction results in Fig. 5 can also be the result of ignoring the material index variation; these estimates could potentially be improved by including $\Delta n_i$ into the model.
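The spatial smoothing can be sketched with a plain RBF-kernel Gaussian-process (kriging) mean predictor; the kernel choice, length scale, and noise level below are illustrative assumptions, not the hyperparameters behind Fig. 5(b):

```python
import numpy as np

def gp_map(X, y, Xq, length=1.0, noise=1e-2):
    """Posterior mean of a zero-mean GP with squared-exponential kernel:
    smooths scattered per-device variation values X (N x 2 positions),
    y (N values) onto query locations Xq."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length**2)
    K = k(X, X) + noise * np.eye(len(X))   # noise term regularizes the solve
    return k(Xq, X) @ np.linalg.solve(K, y)
```

Consistent with the caveat above, predictions outside the convex hull of the measured devices revert toward the prior mean and should be treated as extrapolation.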
Finally, the standard deviations of the gradient parameters are 0.36 nm/$\mathrm {\mu }$m, 0.066 nm/$\mathrm {\mu }$m, and 0.14 nm/$\mathrm {\mu }$m for $g_w$, $g_T$, and $g_h$, respectively, which are slightly larger than expected; these numbers suggest a bigger difference between the ring and the coupler than the standard deviations shown in (31). This can be due to the relatively large error in extracting the coupling coefficients $t$, on which these gradient parameters solely depend. It might also suggest that the narrow gap area of the coupler is suffering from larger variation in the etching process.
6. Conclusions
We present a Bayesian approach to infer geometric variation distributions from characterization measurements of spatially non-replicated silicon photonic devices, which incorporates error information into the inference model to reduce the effect of measurement noise. We also provide a reliable approach to estimate the variance of the extracted property errors by combining a nonlinear fitting residual approach and a wavelength dependency approach, serving as an important piece that enables the Bayesian inference.
The results from silicon nitride RR characterization measurements are reported with standard deviations of 28 nm, 0.8 nm, and 3.8 nm for $w$, $T$, and $h$, respectively. The method shows the potential of using such design characterization measurements rather than replicated devices as variation test structures, and thus provides an efficient and accessible approach to study process variations in silicon photonics.
Future work could explore the possibility of further including the simulation error of the sensitivity matrix $\mathbf {S}_i$ and prior knowledge into the Bayesian model. As another avenue, both the auxiliary parameters and our generated spatial maps of variations suggest the possibility of an advanced model that incorporates the spatial information as well. Such future directions might help in achieving reliable variation inferences with even less error.
Disclosures
The authors declare no conflicts of interest.
Data availability
Data underlying the results presented in this paper are available in Ref. [8].
References
1. J. E. Bowers, T. Komljenovic, M. Davenport, J. Hulme, A. Y. Liu, C. T. Santis, A. Spott, S. Srinivasan, E. J. Stanton, and C. Zhang, “Recent advances in silicon photonic integrated circuits,” in Next-Generation Optical Communication: Components, Sub-Systems, and Systems V, vol. 9774 (SPIE, 2016), p. 977402.
2. X. Wang, W. Shi, H. Yun, S. Grist, N. A. Jaeger, and L. Chrostowski, “Narrow-band waveguide Bragg gratings on SOI wafers with CMOS-compatible fabrication process,” Opt. Express 20(14), 15547–15558 (2012). [CrossRef]
3. C. Sun, M. T. Wade, Y. Lee, et al., “Single-chip microprocessor that communicates directly using light,” Nature 528(7583), 534–538 (2015). [CrossRef]
4. W. A. Zortman, M. R. Watts, and D. C. Trotter, “Determination of wafer and process induced resonant frequency variation in silicon microdisk-resonators,” in Adv. in Optical Sciences Cong., (OSA, 2009), p. IMC5.
5. Z. Lu, J. Jhoja, J. Klein, X. Wang, A. Liu, J. Flueckiger, J. Pond, and L. Chrostowski, “Performance prediction for silicon photonics integrated circuits with layout-dependent correlated manufacturing variability,” Opt. Express 25(9), 9712–9733 (2017). [CrossRef]
6. Y. Xing, J. Dong, S. Dwivedi, U. Khan, and W. Bogaerts, “Accurate extraction of fabricated geometry using optical measurement,” Photonics Res. 6(11), 1008–1020 (2018). [CrossRef]
7. Y. Xing, J. Dong, U. Khan, and W. Bogaerts, “Capturing the effects of spatial process variations in silicon photonic circuits,” ACS Photonics 10, 928 (2022). https://doi.org/10.1021/acsphotonics.2c01194.
8. Z. Zhang, “zhx-zhang/Characterization_Inference,” Zenodo, 2023, https://doi.org/10.5281/zenodo.7972288.
9. Q. Wilmart, H. El Dirani, N. Tyler, D. Fowler, S. Malhouitre, S. Garcia, M. Casale, S. Kerdiles, K. Hassan, C. Monat, X. Letartre, A. Kamel, M. Pu, K. Yvind, L. K. Oxenløwe, W. Rabaud, C. Sciancalepore, B. Szelag, and S. Olivier, “A versatile silicon-silicon nitride photonics platform for enhanced functionalities and applications,” Appl. Sci. 9(2), 255 (2019). [CrossRef]
10. W. McKinnon, D.-X. Xu, C. Storey, E. Post, A. Densmore, A. Delâge, P. Waldron, J. Schmid, and S. Janz, “Extracting coupling and loss coefficients from a ring resonator,” Opt. Express 17(21), 18971–18982 (2009). [CrossRef]
11. D. M. Bates and D. G. Watts, Nonlinear Regression Analysis and its Applications (Wiley, 1988), chap. 2, pp. 32–66.
12. F. Dellaert, “The expectation maximization algorithm,” Tech. Rep. GIT-GVU-02–20, Georgia Institute of Technology (2002).
13. X.-L. Meng and D. B. Rubin, “Maximum likelihood estimation via the ECM algorithm: A general framework,” Biometrika 80(2), 267–278 (1993). [CrossRef]