Optical spectrum augmentation for machine learning powered spectroscopic ellipsometry

Inho Kim; Seungho Gwak; Yoonsung Bae; Taeyong Jo

doi:10.1364/OE.452502

1. Introduction

As complex nanostructures with high aspect ratios have been introduced in the semiconductor industry, precise process control has become more difficult. Therefore, accurate measurement systems capable of measuring multiple Critical Dimensions (CDs) of a nanostructure have become increasingly important [1,2]. Conventional methods for measuring the CDs of nanostructures fall into two categories: destructive and non-destructive. A representative destructive method applies image processing algorithms to images of the cutting plane captured with high-resolution Scanning Electron Microscopy (SEM). A representative non-destructive method is Rigorous Coupled-Wave Analysis (RCWA) based on Spectroscopic Ellipsometry (SE) [3,4]. The former can obtain relatively precise values because it uses a high-resolution image of the actual structures, but its application in mass production is limited because it destroys the specimen. The latter preserves the specimen, but it must be preceded by the creation of a virtual three-dimensional model that mimics the real structures. With a virtual model such as the one used in RCWA, it is difficult to accurately reproduce the optical coefficients and geometric features of the various materials of the specimens and the physical components of the measuring systems in the real world. Therefore, the non-destructive method with SE is not applicable to a production environment where multiple materials and measuring instruments are used.

In order to mitigate these problems, previous studies have proposed methodologies to measure nanostructures by introducing machine learning approaches [5–9]. Although these approaches have been confirmed to improve the accuracy of the inference model under limited conditions, there are limitations to their application to actual mass production processes. Since multiple instruments are used in mass production, repeatability and reproducibility are as important as the accuracy of the values produced by the predictive model. For example, data obtained from a single specimen using multiple instruments may vary due to physical differences between the instruments, such as the location, size, and angle of components. Although an inferential model trained with a limited number of measuring instruments may have high accuracy, it is not guaranteed to maintain high performance when applied to a production environment with multiple measuring instruments.

Data augmentation approaches are widely used in various fields in order to train a predictive model to recognize variations in data for the same object. For example, in the image classification field, data augmentation is applied to train a classification model to process images captured in various forms for one object. The data augmentation includes various image transformation methods, such as rotation and translation [10]. In the speech recognition field, data augmentation is performed by transforming the log mel spectrogram through various warping algorithms [11]. Compared to these fields, research on data augmentation methodologies in the SE field is relatively scarce. In this study, we propose domain-specific methodologies, from the perspective of data augmentation, to obtain an inferential model that produces consistent values across multiple spectroscopic ellipsometers.

2. Methods

2.1 Overall system configuration to obtain an inferential model

In the experiments, inferential models are trained to measure nanoscale profiles, which include the hole diameter at each height of the channel hole of three-dimensional vertical NAND flash memories (V-NAND).

The data acquisition process for modeling consists of sequential steps, as shown in Fig. 1. First, optical spectra are obtained using a Rotating Polarizer Ellipsometer (RPE) from multiple chips in wafers. Preprocessing methods such as sampling, filtering, and interpolation of wavelengths are performed on the spectra in order to get values at the same wavelengths [9]. In addition, data augmentation techniques are applied to the resulting spectra. Second, CDs are measured by performing destructive methods and SEM imaging at the same positions where the optical spectra were obtained in the first step. Finally, the two types of data are paired, and a predictive model is trained on the pairs through supervised learning.

Fig. 1. Overall process of obtaining optical spectra and channel hole CDs for use in supervised learning


In general, it is difficult to obtain large amounts of data with destructive methods in the semiconductor field because the partial destruction of a wafer results in discarding all the chips in the wafer. Therefore, in order to mitigate the overfitting problem caused by the lack of data, a basic Artificial Neural Network (ANN) model with about three hidden layers is used, and several machine learning techniques such as dropout and L2 regularization are applied. The cost function of the model is designed to minimize the difference between the actual CDs and the predicted values of the model as follows:

(1)$$\mathop {\min }\limits_W \mathop \sum \nolimits_{k = 0}^n \log ({\cosh ({{{\widehat {CD}}_k} - C{D_k}} )} ),$$

where W denotes the trainable variables of the ANN, n is the number of profiles in the data, $\widehat {CD}$ denotes the predicted values of a profile, and $CD$ denotes the actual values of the same profile.
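As a rough illustration of the cost in Eq. (1), the following sketch evaluates the log-cosh term for scalar CD values. The function name `log_cosh_loss` and the scalar treatment of each profile are assumptions made for this example; the paper does not describe its implementation. The log-cosh function behaves like half the squared error for small errors and like the absolute error for large ones, which makes it less sensitive to outliers than a pure squared error.

```python
import math

def log_cosh_loss(predicted, actual):
    """Sum of log(cosh(error)) over all profiles, following Eq. (1)."""
    total = 0.0
    for p, a in zip(predicted, actual):
        e = abs(p - a)
        # Stable identity: log(cosh(e)) = e + log(1 + exp(-2e)) - log(2),
        # which avoids overflow in cosh for large errors.
        total += e + math.log1p(math.exp(-2.0 * e)) - math.log(2.0)
    return total
```

In a training loop this quantity would be minimized with respect to the network variables W; the sketch only covers the evaluation of the cost itself.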

2.2 Rotating polarizer ellipsometer (RPE)

The optical spectra to be used as an input for the supervised learning are obtained from the RPE.

As shown in Fig. 2, the RPE is composed of a collimated light source, a polarizer rotating at tens of hertz, a specimen loading device, an analyzer driven by a stepping motor, a spectrometer, and a detector. The RPE is capable of obtaining Normalized Fourier Coefficients ($\alpha $, $\beta $) with about 0.015 nm intervals in the wavelength range from 900 nm to 2200 nm through the detector. The $\alpha $ and $\beta $ are described as follows:

(2)$$\alpha = \frac{\tan^2\Psi - \tan^2 A}{\tan^2\Psi + \tan^2 A},\; \; \beta = \frac{2\cos\varDelta \tan\Psi \tan A}{\tan^2\Psi + \tan^2 A},$$

where A is the angle of the analyzer, and $\Psi $ and $\varDelta $ are the ellipsometry parameters which reflect structural information of the specimen [12,13].
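The relation between the ellipsometry parameters and the measured coefficients can be sketched numerically. The example below assumes the usual rotating-polarizer form, in which the analyzer angle A enters through $\tan A$ in the numerator of $\beta$ and through $\tan^2 A$ in $\alpha$ and in both denominators; the function name and the use of radians are choices made for this illustration only.

```python
import math

def fourier_coefficients(psi, delta, analyzer):
    """Normalized Fourier coefficients (alpha, beta) for given Psi, Delta
    and analyzer angle A, all in radians."""
    tan_psi = math.tan(psi)
    tan_a = math.tan(analyzer)
    denom = tan_psi ** 2 + tan_a ** 2
    alpha = (tan_psi ** 2 - tan_a ** 2) / denom
    beta = 2.0 * math.cos(delta) * tan_psi * tan_a / denom
    return alpha, beta
```

A quick sanity check under these assumptions: when $\Psi = A$, $\alpha$ vanishes and $\beta$ reduces to $\cos\varDelta$.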

Fig. 2. Conceptual system configuration of RPE.


Considering the repeatability of the instrument, $\alpha $ and $\beta $ signals are obtained at least 10 times for each location on the specimen. Then, the average and standard deviation of the $\alpha $ and $\beta $ signals are calculated for each wavelength. Wavelengths with relatively high standard deviation values are excluded, and the average signals at the remaining wavelengths are used. As shown in Fig. 3, the standard deviations of the signals are observed to be relatively high at wavelengths close to 2200 nm. These wavelengths are filtered out by a specific threshold. In order to use the spectra as inputs of the ANN, it is necessary to extract the signal values from the same wavelengths. The cubic spline interpolation is applied to obtain the $\alpha $ and $\beta $ at the common wavelengths which are equally spaced. The common wavelengths are described as follows:

(3)$$\lambda_n = \lambda_{min} + c({n - 1} ),$$

where $\lambda_{min}$ is the minimum wavelength, c is a constant denoting the interval between wavelengths, and n is the wavelength index, which ranges from 1 to the number of wavelengths. The $\alpha $ and $\beta $ values at the common wavelengths are denoted by $\alpha({\lambda_n} )$ and $\beta({\lambda_n} )$, respectively.
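The averaging, threshold filtering, and common grid of Eq. (3) can be sketched as follows. The function names, the data layout (one list of signal values per repeated scan), and the threshold argument `max_std` are hypothetical; the paper does not state the threshold value. The cubic spline interpolation onto the common grid is left out of this sketch.

```python
import statistics

def filter_by_repeatability(wavelengths, repeats, max_std):
    """Average repeated scans and drop wavelengths whose standard deviation
    across the repeats exceeds max_std.

    repeats: list of scans, each a list of values aligned with wavelengths.
    """
    kept_wavelengths, kept_means = [], []
    for j, wl in enumerate(wavelengths):
        values = [scan[j] for scan in repeats]
        if statistics.stdev(values) <= max_std:
            kept_wavelengths.append(wl)
            kept_means.append(statistics.fmean(values))
    return kept_wavelengths, kept_means

def common_wavelengths(lambda_min, step, count):
    """Equally spaced grid of Eq. (3): lambda_n = lambda_min + c * (n - 1)."""
    return [lambda_min + step * (n - 1) for n in range(1, count + 1)]
```

The filtered averages would then be interpolated onto the grid returned by `common_wavelengths` so that every spectrum presents the same input dimensions to the ANN.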

Fig. 3. Average (μ) and standard deviation (σ) values for each wavelength ($\lambda $) derived from repeated measurements. The black line represents the μ, and the red area represents the range from μ-6σ to μ+6σ for each wavelength.


2.3 Analysis on the spectral difference between multiple ellipsometers

When optical spectra are obtained from multiple instruments for the same specimen, inconsistent values are obtained due to differences in physical components, resulting in negative effects on the performance of the inferential models. In order to alleviate the problem, it is necessary to find the cause of the differences and perform the calibration process through modeling the patterns of the differences. One of the main causes is misalignment of the Charge Coupled Device (CCD) photodetector [14]. It causes the light ray reflected on the grating surface to reach the other pixels different from the central pixel of the CCD array. As a result, the ray is detected at different wavelengths. Therefore, this geometric difference between two different ellipsometers can be described as follows:

(4)$${\varDelta \lambda^{({I,J} )}}{ = \lambda^{(I )}}{ - \lambda^{(J )}},$$

where I and J are indices of the ellipsometers. That is, $\varDelta \lambda$ expresses the difference between the CCD arrays of the two ellipsometers as a wavelength offset.

When obtaining optical spectra with multiple instruments from a single specimen, we confirmed that the peaks of spectra do not exactly match as shown in the upper two graphs in Fig. 4. The wavelength at each peak is denoted by $\lambda_i^{(I )}$ where the subscript i represents the peak index that ranges from 1 to the number of peaks in the spectrum obtained from the $I$-th ellipsometer. In the lower graphs in Fig. 4, the differences of peaks in $\alpha $ and $\beta $ seemed to show almost the same pattern. In some previous studies, polynomials were used to model the difference in peaks for use in the wavelength calibration process [15,16]. The spectral difference between two spectra obtained from different ellipsometers is defined as follows:

(5)$$\varDelta {\lambda}_i^{({I,J} )} = {\lambda}_i^{(I )} - {\lambda}_i^{(J )} = \mathop \sum \nolimits_{k = 0}^K a_k^{({I,J} )}{({{\lambda}_i^{(J )}} )^k},$$

where a is a polynomial coefficient and K is the highest degree of the polynomial.
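The regression in Eq. (5) amounts to an ordinary least-squares polynomial fit of the peak offsets against the peak wavelengths of one instrument. The sketch below uses NumPy's `polyfit` on synthetic peak positions; the function name, the peak values, and the quadratic offset are invented for illustration and are not measurements from the paper.

```python
import numpy as np

def fit_delta_lambda(peaks_i, peaks_j, degree):
    """Least-squares fit of Eq. (5): (peaks_i - peaks_j) as a polynomial
    in peaks_j. Returns (a_0..a_K, lowest degree first) and the fit RMSE."""
    peaks_i = np.asarray(peaks_i, dtype=float)
    peaks_j = np.asarray(peaks_j, dtype=float)
    delta = peaks_i - peaks_j
    # np.polyfit returns the highest degree first; reverse to match a_k.
    coeffs = np.polyfit(peaks_j, delta, degree)[::-1]
    residual = delta - np.polyval(coeffs[::-1], peaks_j)
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    return coeffs, rmse

# Synthetic peaks (nm) with a quadratic offset, purely illustrative.
peaks_j = np.linspace(950.0, 2150.0, 12)
peaks_i = peaks_j + 0.4 - 3.0e-4 * peaks_j + 1.0e-7 * peaks_j ** 2
rmse_by_degree = {K: fit_delta_lambda(peaks_i, peaks_j, K)[1] for K in (1, 2, 3)}
```

With synthetic data of this kind, the RMSE drops sharply once the fitted degree reaches the degree of the underlying offset, which is the type of trend one would inspect when choosing K.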

Fig. 4. The wavelength differences at each peak of spectra from different measuring instruments. In the upper two graphs, e1, e2, e3, and e4 represent four different ellipsometers. The lower three graphs show the $\varDelta \lambda $s calculated at peak positions for each pair of the ellipsometers.


Figure 5 shows polynomial regression results for $\varDelta \lambda$s obtained from 6 pairs of ellipsometers that are actually used in the production process. The RMSE converged to zero when using polynomials of second order or higher, which means that K of about 2 or 3 may represent the spectral difference caused by misalignment of the photodetector of the RPE in industrial environment.

Fig. 5. Polynomial regression performance trend according to the highest degree of the polynomials.


2.4 Data augmentation

In this study, we adopted a data augmentation approach to obtain the robust predictive model that is not adversely affected by the spectral differences mentioned in the previous section. In this section, two data augmentation methods are introduced to be used as comparison groups for the proposed method: Gaussian Noise Augmentation (GNA) and Polynomial Wavelength Calibration (PWC). Then, we propose Stochastic Polynomial Wavelength Calibration (s-PWC) that compensates for the shortcomings of the two methods.

2.4.1 Gaussian noise augmentation (GNA)

GNA is a simple and basic method that can be applied in general regardless of the type of data. It assumes that the data follow a Gaussian distribution, so it probabilistically produces different values within the distribution. According to [17], it is confirmed that the augmentation method using Gaussian noise has a positive effect on the performance of the measuring system based on SE. Therefore, GNA was introduced using the average and standard deviation of $\alpha $ and $\beta $ for each wavelength obtained through repeated measurements as mentioned in 2.2. For each wavelength, $\alpha $ and $\beta $ values are generated as random values within a Gaussian distribution. In fact, GNA expects that the physical differences between multiple RPEs are covered by the random values by chance. However, the concept of this method is actually closer to considering the random noise when performing repeated measurements with a single RPE. Therefore, it is difficult to expect that this method could alleviate the spectral difference between multiple RPEs due to misalignment of the photodetector.
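A minimal sketch of GNA, assuming the per-wavelength mean and standard deviation from the repeated measurements are already available as two lists. The function name, the seed argument, and the data layout are assumptions for this example.

```python
import random

def gaussian_noise_augment(mean, std, n_copies, seed=0):
    """Draw n_copies spectra whose value at each wavelength j is sampled
    from a Gaussian with mean[j] and standard deviation std[j]."""
    rng = random.Random(seed)
    return [
        [rng.gauss(m, s) for m, s in zip(mean, std)]
        for _ in range(n_copies)
    ]
```

Because each wavelength is perturbed independently, the generated spectra imitate measurement noise of a single instrument rather than a systematic wavelength shift between instruments.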

2.4.2 Polynomial wavelength calibration (PWC)

PWC is a domain-specific method of augmenting data using the polynomial $\varDelta \lambda$ information mentioned in 2.3. In terms of the wavelength calibration, the wavelengths of two different RPEs are interchangeable via $\varDelta \lambda$ as follows:

(6)$$\lambda_i^{(I )} = \lambda_i^{(J )} + \varDelta \lambda_i^{({I,J} )} = \lambda_i^{(J )} + \mathop \sum \nolimits_{k = 0}^K a_k^{({I,J} )}{({\lambda_i^{(J )}} )^k},$$

where I and J are indices of two different RPEs. Even though the coefficients of the polynomial result from a polynomial regression that minimizes the differences in peak positions between spectra, Eq. (6) can be extended to an arbitrary wavelength $\lambda $. Therefore, spectral data augmentation can be described as follows:

(7)$${\widetilde {\alpha}^{({I,J} )}}({\lambda_n} )= \alpha({\lambda_n + \varDelta \lambda_n^{({I,J} )}} ),\; \; {\widetilde {\beta}^{({I,J} )}}({\lambda_n} )= \beta({\lambda_n + \varDelta \lambda_n^{({I,J} )}} ),$$

where $\widetilde {\alpha}({\lambda_n} )$ and $\widetilde {\beta}({\lambda_n} )$ are the augmented spectra at the common wavelengths ${\lambda_n}$. The $\varDelta \lambda$s are obtained from all possible pairs of two RPEs. This method mimics the physical differences between the RPEs in the real world. However, the alignment of the CCD arrays may have changed slightly due to changes over time and regular maintenance activities. Therefore, if the training data and $\varDelta \lambda$s are obtained at different times, the physical differences in various aspects cannot be sufficiently covered by this method. In addition, the number of spectra that can be generated in the data augmentation step is limited by the number of RPEs.
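The augmentation step can be read as resampling a spectrum at wavelengths shifted by the polynomial $\varDelta \lambda$. The sketch below is one possible reading, not the authors' implementation: the function name is hypothetical, and linear interpolation via `np.interp` stands in for whatever interpolation scheme is actually used when evaluating the spectrum between grid points.

```python
import numpy as np

def pwc_augment(wavelengths, spectrum, coeffs):
    """Resample a spectrum at wavelengths shifted by a polynomial offset.

    coeffs holds a_0..a_K (lowest degree first), as in Eq. (5).
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    delta = np.polynomial.polynomial.polyval(wavelengths, coeffs)
    # Linear interpolation is an assumption of this sketch; np.interp
    # holds the edge values for shifted points outside the measured range.
    return np.interp(wavelengths + delta, wavelengths, spectrum)
```

The same function would be applied to both $\alpha$ and $\beta$ with the coefficients obtained for each pair of RPEs, giving one augmented spectrum per pair.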

2.4.3 Stochastic polynomial wavelength calibration (s-PWC)

s-PWC is similar to PWC, but it generates the polynomial coefficients randomly. The coefficients should be limited to a certain range so that the result does not deviate significantly from the distribution of the physically possible $\varDelta \lambda $ in the real world. As shown in Fig. 5, $\varDelta \lambda $s have a small range of values close to zero, so simply generating the coefficients of polynomials with respect to $\lambda $s may yield values that are not observed in the real world. In fact, the focus is to make the $\varDelta \lambda$ take the form of a polynomial. Therefore, we first normalized the range of $\lambda $ to the interval from -1 to 1 and then generated polynomials with random coefficients. In order to restrict the range of the coefficients, we defined a hyper parameter k, which denotes the width of the coefficient range: each coefficient ranges from -$k$/2 to $k$/2. The $\varDelta \lambda$s are defined by setting coefficients to random values within the range defined by k as follows:

(8)$$\varDelta \widehat {\lambda} = {a_0}{\widehat {\lambda}^3} + {a_1}{\widehat {\lambda}^2} + {a_2}\widehat {\lambda} + {a_3},\quad \textrm{where } - \frac{k}{2} \le a \le \frac{k}{2},\; - 1 \le \widehat {\lambda} \le 1,$$

(9)$$\varDelta \lambda = {a_0}\lambda^3 + {a_1}\lambda^2 + {a_2}\lambda + {a_3},$$

where $\hat{\lambda }$ is the normalized $\lambda $.
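The random generation in Eq. (8) can be sketched as below. The function names and the use of a uniform distribution for the coefficients are assumptions; the text only states that coefficients are random values between $-k/2$ and $k/2$. How the generated offsets are scaled back to physical wavelength units is not spelled out in the text, so the sketch stops at the normalized offsets.

```python
import random

def normalize(wavelengths):
    """Map wavelengths linearly onto the interval [-1, 1]."""
    lo, hi = min(wavelengths), max(wavelengths)
    return [2.0 * (w - lo) / (hi - lo) - 1.0 for w in wavelengths]

def random_delta_lambda(lam_hat, k, rng):
    """One random cubic of Eq. (8) evaluated on normalized wavelengths.
    Each coefficient is drawn uniformly from [-k/2, k/2] (an assumption)."""
    a0, a1, a2, a3 = (rng.uniform(-k / 2.0, k / 2.0) for _ in range(4))
    return [a0 * x ** 3 + a1 * x ** 2 + a2 * x + a3 for x in lam_hat]
```

Since each of the four terms is bounded by $k/2$ on the normalized interval, every generated offset satisfies $|\varDelta \widehat{\lambda}| \le 2k$, which is why increasing k widens the range of generated offsets.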

As shown on the right of Fig. 6, when k is 3, it appeared that $\varDelta \lambda$s are generated within a range similar to the results of polynomial regression on the actual $\varDelta \lambda$s obtained from the RPE pairs in the PWC method. The range covered by the randomly generated $\varDelta \lambda$s becomes wider as the value of k increases, so that more cases of physical differences between RPEs are taken into account.

Fig. 6. Randomly generated $\varDelta \lambda $s at the normalized $\lambda $ when the value of k is 3, and the results of polynomial regression performed on actual $\varDelta \lambda$s obtained from measuring instruments.


Figure 7 shows the result of applying s-PWC to a set of real $\alpha $ and $\beta $ when the value of k was 30. In the data augmentation step, augmentation is performed in the same way as PWC using the randomly generated $\varDelta \lambda$s. Unlike PWC, this method places no limit on the number of spectra that can be generated, so s-PWC is expected to cover physical differences in various aspects if a sufficient number of spectra are generated.

Fig. 7. Spectra augmented through s-PWC where k was set to 30.


2.5 Evaluation methods and data description

In order to compare the effects of the three methods described in 2.4, supervised learning is performed on inferential models with the same configuration. The data set used in the experiments includes about 4500 pairs of spectra and channel hole profile CDs of three-dimensional V-NAND. The data collection period was approximately 1 year, and the spectra were obtained using 6 RPEs. Due to process variations throughout the period, the CDs range from 95 nm to 140 nm. About 4000 pairs of data were used as the training set to obtain the inferential model, and the remaining 500 pairs were used as the test set. The performance of the inferential model is evaluated through three metrics: the coefficient of determination (R²), RMSE, and the percentage Gage R&R value. First, R² is the square of the Pearson correlation coefficient between the predicted and actual values, and is one of the measures for evaluating the prediction accuracy of the model. Second, RMSE is also used as an index to evaluate the accuracy of the model. R² and RMSE are calculated on the same 500 test samples containing spectra from multiple RPEs. Finally, Gage R&R is an index for evaluating the consistency of the model’s predicted values for spectra obtained with multiple RPEs for the same specimen [18,19]. In the experiments, the Gage R&R values of the inferential model were calculated with test data different from those used for R² and RMSE. We obtained about 1600 spectra from 6 wafers with 6 RPEs for this test set. In other words, 6 spectra were obtained from each of about 266 different samples through the 6 RPEs. Since Gage R&R relates to the difference in values predicted using two different measuring instruments for the same sample, the value is calculated for each pair of instruments. Therefore, the mean (μ) and standard deviation (σ) of the Gage R&R values were obtained for convenience.
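The two accuracy metrics can be written out directly from their definitions. The sketch below computes R² as the squared Pearson correlation, as stated above, and RMSE; the function names and the sample CD values in the usage note are made up for illustration. Gage R&R is not included because its exact formula is not given in the text.

```python
import math

def r_squared(predicted, actual):
    """Square of the Pearson correlation coefficient of two sequences."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov * cov / (var_p * var_a)

def rmse(predicted, actual):
    """Root mean squared error between predictions and actual values."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)
```

One consequence of this definition is worth keeping in mind: predictions that are offset from the actual CDs by a constant (for example 101, 111, 121 nm against 100, 110, 120 nm) still give R² of 1, while RMSE reports the 1 nm offset. The two metrics are therefore complementary.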

3. Results

3.1 Evaluation of inferential models trained on spectra from a single RPE

The initial process development stage is a constrained environment in which only a small number of measuring instruments can be used. Therefore, it is necessary to confirm whether there are performance improvements when the proposed methods are applied to a predictive model trained using only a single instrument. Supervised learning was performed with only 1500 samples obtained from a single RPE out of the total of 4000 training samples to obtain an inferential model, and the model was evaluated with 500 test samples which include spectra obtained from multiple RPEs.

As shown in Table 1, all three augmentation methods had a positive effect on the R² and RMSE of the model. While GNA had little effect on the Gage R&R of the model, the other two methods improved the Gage R&R. Since GNA takes less account of differences between multiple RPEs, it was not effective in improving Gage R&R in evaluation with the test set containing spectra from multiple RPEs. On the other hand, the other two methods seemed to have a positive effect on Gage R&R because the physical differences of various RPEs are reflected. Among the two methods, s-PWC was observed to be the better way to improve Gage R&R because it is capable of covering more cases of physical differences between RPEs as mentioned in 2.4.3.

Table 1. Evaluation of Learning Results from Data Obtained with a Single RPE.


3.2 Evaluation of inferential models trained on spectra from multiple RPEs

From a practical point of view, it is necessary to confirm the effect of the size of the training set because it is difficult to obtain a large amount of data through the destructive method in the process development stage. By evaluating the performance change according to the number of data, it is possible to estimate the amount of data required to achieve the target performance for application to the mass production environment. In addition, it is also necessary to evaluate the performance of inferential models trained using multiple measuring instruments. As the training data reflects differences between multiple RPEs, the performance improvement due to data augmentation is expected to be relatively small compared to the results in 3.1. In order to confirm these, several inferential models were obtained by progressively increasing the number of training data included in the 4000 samples. Evaluations were conducted using the same fixed test set of 500 samples which was used in 3.1. The hyper parameter k of s-PWC was set to 15 in the experiments. All experimental conditions for each trial were kept the same.

As shown in Table 2, all three augmentation methodologies showed consistent trends regardless of the size of the training set. Overall, none of the three methods had a significant effect on R². GNA and PWC had no significant effect on RMSE either, but s-PWC was observed to increase RMSE by about 0.04 nm when N was above 3000. Although this is still a small number, it is noteworthy in that it does not appear with the other two methods.

Table 2. Evaluation Results for R² and RMSE of Predictive Models according to the Size of the Training Set.


In terms of Gage R&R, the three methods, GNA, PWC, and s-PWC, reduced the μ values of Gage R&R by 0.2, 2.6, and 3.9 on average regardless of N, respectively, compared to the baseline as shown in Table 3. GNA increased the σ value by 0.1, and the other two methods decreased the value by 1.45 and 1.85 on average, respectively. In other words, s-PWC seemed to be the best way to improve the Gage R&R of the inferential model among the three methods. Both GNA and s-PWC are methods using randomly generated values, but the former had no effect and the latter had a performance improvement effect. This indicates that it is actually effective to reflect the polynomial representations of the physical differences between the multiple RPEs. Comparing PWC and s-PWC, PWC may be expected to reflect the physical differences better than s-PWC because PWC uses the result of polynomial regression on the $\varDelta \lambda$s of the actual spectra. However, it was observed that s-PWC improves Gage R&R over PWC. Since the collection period of the data set was about 1 year, the system alignment of the RPEs may have changed slightly due to the changes over time and regular maintenance activities. It is clear that PWC was also effective in improving Gage R&R, but since it cannot cover all changes in the real world in many cases, s-PWC appeared to produce a relatively better effect. In addition, when the number of training data samples was 1500, the values of μ and σ were similar to those of the previous experiment in 3.1. This indicates that a model trained with a single RPE can perform similarly to a model trained with multiple RPEs by applying the proposed data augmentation methods when the training sets have the same size.
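The average reductions quoted above follow from the "Avg." row of Table 3. The short calculation below reproduces them; the dictionary layout and function name are conveniences of this example, while the numbers are copied from the table.

```python
# "Avg." row of Table 3: mean and std of Gage R&R, in percent.
avg_mu = {"Baseline": 10.23, "GNA": 10.01, "PWC": 7.62, "s-PWC": 6.3}
avg_sigma = {"Baseline": 4.67, "GNA": 4.74, "PWC": 3.22, "s-PWC": 2.82}

def reduction_vs_baseline(table):
    """Baseline value minus each method's value; positive means improvement."""
    base = table["Baseline"]
    return {m: round(base - v, 2) for m, v in table.items() if m != "Baseline"}

mu_reduction = reduction_vs_baseline(avg_mu)        # GNA, PWC, s-PWC
sigma_reduction = reduction_vs_baseline(avg_sigma)  # negative value = increase
```

The μ reductions come out as 0.22, 2.61, and 3.93 percentage points, and the σ changes as an increase of 0.07 for GNA and reductions of 1.45 and 1.85 for PWC and s-PWC, matching the rounded figures in the text.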

Table 3. Evaluation Results for Gage R&R of Predictive Models according to the Size of the Training Set.


3.3 Parametric evaluation of k of s-PWC

As mentioned in 2.4.3, s-PWC is a method of randomly generating signals, so if it deviates too much from the range of signals that can be generated in the real world, it may adversely affect the performance of the inferential model. Therefore, it is necessary to regulate the distribution from which the signals will be generated by setting an appropriate value to the hyper parameter k. We checked the performance of the models by changing the value of k. All experimental conditions other than the k were kept exactly the same.

As shown in Fig. 8, the Gage R&R improves as the value of k increases. However, as we expected, adverse effects were observed on RMSE. This trade-off suggests that, in actual use, k should be tuned according to the characteristics of the data.

Fig. 8. Performance change trend of a predictive model according to k value change


4. Discussion

In a production environment with multiple measuring instruments, it is important to ensure that the predictive model yields consistent values regardless of the measuring instrument. In this study, we introduced a data augmentation approach based on knowledge of machine learning and SE. In the experiments, we confirmed that the random generation method using simple Gaussian noise did not show a significant effect, unlike in the previous study [17]. On the other hand, we found that PWC and s-PWC, which reflect differences between multiple RPEs in polynomial form, had a positive effect on the Gage R&R of inferential models. In other words, modeling the difference between the photodetectors of multiple RPEs in a polynomial form helps to improve the prediction repeatability and reproducibility of the inferential model in the mass production environment. PWC improves Gage R&R in that it models the actual spectral difference between RPEs based on polynomial regression. However, PWC is not practical in high volume manufacturing because it requires additional data acquisition steps for each sample with multiple RPEs, and it is difficult to reflect changes of RPEs in the real world, such as changes over time and maintenance activities. We found that the proposed s-PWC is the most practical to use and the best way to improve Gage R&R among the three methods.

The methodology proposed in this study is one of the few Artificial Intelligence (AI) solutions that can actually be applied to the semiconductor mass production environment so far. We expect that this study can provide guidelines for improving the performance of inferential models based on machine learning and SE in mass production environments. It is clear that the proposed s-PWC improves the Gage R&R of the predictive model in practical settings, but there is less improvement in R² and RMSE. One of the reasons is that the proposed method focuses only on the misalignment of the photodetector among the various components of the RPE. Future research on data augmentation techniques for the other components, such as angular differences in the analyzers of RPEs, might extend the methodologies to improve predictive models’ R², RMSE, and Gage R&R at the same time.

Acknowledgements

All of the work presented in this paper was done in collaboration with many colleagues in Mechatronics Research at Samsung Electronics. We would like to thank our executives Dr. Janggyoo Yang, Dr. Sukwoon Lee, and Dr. Changhoon Choi for their guidance, support, and encouragement on our research project. This research was supported by Mechatronics Research, Samsung Electronics Co., Ltd.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. N. G. Orji, M. Badaroglu, B. M. Barnes, C. Beitia, B. D. Bunday, U. Celano, R. J. Kline, M. Neisser, Y. Obeng, and A. E. Vladar, “Metrology for the next generation of semiconductor devices,” Nat. Electron. 1(10), 532–547 (2018).

2. D. E. Aspnes, “Spectroscopic ellipsometry — Past, present, and future,” Thin Solid Films 571(3), 334–344 (2014).

3. M. G. Moharam and T. K. Gaylord, “Rigorous coupled-wave analysis of planar-grating diffraction,” J. Opt. Soc. Am. 71(7), 811–818 (1981).

4. J. Toudert, “Spectroscopic ellipsometry for active nano- and meta-materials,” Nanotechnol. Rev. 3(3), 223–245 (2014).

5. M. Shan, Q. Cheng, Z. Zhong, B. Liu, and Y. Zhang, “Deep-learning-enhanced ice thickness measurement using Raman scattering,” Opt. Express 28(1), 48–56 (2020).

6. D. Dixit, A. Green, E. R. Hosler, V. Kamineni, M. E. Preil, N. Keller, J. Race, J. Chun, M. O’Sullivan, P. Khare, W. Montgomery, and A. C. Diebold, “Optical critical dimension metrology for directed self-assembly assisted contact hole shrink,” J. Micro. Nanolithogr. MEMS MOEMS 15(1), 014004 (2016).

7. M. H. Madsen and P.-E. Hansen, “Scatterometry—fast and robust measurements of nano-textured surfaces,” Surf. Topogr.: Metrol. Prop. 4(2), 023003 (2016).

8. J. Liu, D. Zhang, D. Yu, M. Ren, and J. Xu, “Machine learning powered ellipsometry,” Light: Sci. Appl. 10(1), 55 (2021).

9. I. Kim, Y. Bae, S. Gwak, E. Kum, and T. Jo, “Machine learning aided profile measurement in high-aspect-ratio nanostructures,” Proc. SPIE 11783, 19 (2021).

10. C. Shorten and T. M. Khoshgoftaar, “A survey on Image Data Augmentation for Deep Learning,” J Big Data 6(1), 60 (2019).

11. D. S. Park, W. Chan, Y. Zhang, C.-C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition,” Proc. Interspeech, 2613–2617 (2019).

12. L. Y. Chen and D. W. Lynch, “Scanning ellipsometer by rotating polarizer and analyzer,” Appl. Opt. 26(24), 5221–5228 (1987).

13. E. Garcia-Caurel, A. D. Martino, J.-P. Gaston, and L. Yan, “Application of Spectroscopic Ellipsometry and Mueller Ellipsometry to Optical Characterization,” Appl. Spectrosc. 67(1), 1–21 (2013).

14. A. K. Gaigalas, L. Wang, H. J. He, and P. DeRose, “Procedures for Wavelength Calibration and Spectral Response Correction of CCD Array Spectrometers,” J. Res. Natl. Inst. Stand. Technol. 114(4), 215–228 (2009).

15. C.-H. Tseng, J. F. Ford, C. K. Mann, and T. J. Vickers, “Wavelength Calibration of a Multichannel Spectrometer,” Appl. Spectrosc. 47(11), 1808–1813 (1993).

16. S. Krishnan, S. Hampton, J. Rix, B. Taylor, and R. M. A. Azzam, “Spectral polarization measurements by use of the grating division-of-amplitude photopolarimeter,” Appl. Opt. 42(7), 1216–1227 (2003).

17. A. K. Conlin, E. B. Martin, and A. J. Morris, “Data augmentation: an alternative approach to the analysis of spectroscopic data,” Chemom. Intell. Lab. Syst. 44(1-2), 161–173 (1998).

18. L. Shi, W. Chen, and L. F. Lu, “An Approach for Simple Linear Profile Gauge R&R Studies,” Discrete Dyn. Nat. Soc. 2014, 1–7 (2014).

19. L. Cepova, A. Kovacikova, R. Cep, P. Klaput, and O. Mizera, “Measurement System Analyses – Gauge Repeatability and Reproducibility Methods,” Meas. Sci. Rev. 18(1), 20–27 (2018).

	Baseline^a	GNA	PWC	s-PWC
R² (a.u.)	0.75	0.76	0.76	0.77
RMSE (nm)	1.53	1.49	1.47	1.44
μ of Gage R&R (%)	9.15	9.49	8.08	6.81
σ of Gage R&R (%)	4.73	4.37	3.3	2.92

	Baseline		GNA		PWC		s-PWC
N^a (ea)	R² (a.u.)	RMSE (nm)	R² (a.u.)	RMSE (nm)	R² (a.u.)	RMSE (nm)	R² (a.u.)	RMSE (nm)
500	0.74	1.53	0.73	1.54	0.74	1.52	0.74	1.53
1000	0.75	1.47	0.75	1.48	0.75	1.49	0.76	1.46
1500	0.78	1.38	0.79	1.37	0.80	1.36	0.80	1.36
2000	0.81	1.29	0.81	1.28	0.81	1.30	0.80	1.31
2500	0.81	1.28	0.81	1.28	0.83	1.27	0.81	1.30
3000	0.82	1.25	0.82	1.25	0.82	1.25	0.81	1.28
3500	0.83	1.21	0.83	1.22	0.82	1.23	0.82	1.27
4000	0.83	1.22	0.83	1.21	0.83	1.21	0.82	1.26
Avg.	0.8	1.33	0.8	1.3	0.8	1.33	0.8	1.35

	Baseline		GNA		PWC		s-PWC
N^a (ea)	μ^b (%)	σ^c (%)	μ (%)	σ (%)	μ (%)	σ (%)	μ (%)	σ (%)
500	12.02	5.13	11.67	5.50	7.99	3.07	6.72	2.79
1000	11.56	4.45	11.39	4.77	7.90	3.15	6.87	2.87
1500	10.19	4.66	10.08	4.67	7.68	3.04	6.42	2.91
2000	9.98	4.81	9.78	4.43	7.58	3.62	6.49	3.28
2500	9.66	4.73	9.52	4.87	8.09	3.42	6.16	2.57
3000	9.15	4.18	8.41	3.85	7.04	3.10	6.17	2.71
3500	9.76	4.83	10.11	5.11	7.51	3.00	5.66	2.75
4000	9.54	4.57	9.10	4.69	7.17	3.38	5.92	2.70
Avg.	10.23	4.67	10.01	4.74	7.62	3.22	6.3	2.82



Optics Express