
Cluster sampling and scalable Bayesian optimization with constraints for negative tone development resist model calibration


Abstract

As the semiconductor technology node continues to shrink, achieving smaller critical dimensions in lithography becomes increasingly challenging. The negative tone development (NTD) process is widely employed at advanced nodes due to its large process window. However, unique characteristics of NTD, such as the shrinkage effect, make NTD resist model calibration more complex. Gradient descent (GD) and heuristic methods have been applied to NTD resist model calibration. Nevertheless, these methods depend on initial parameter selection and tend to fall into local optima, resulting in poor NTD model accuracy and long computational times. In this paper, we propose a cluster sampling and scalable Bayesian optimization (BO) with constraints method for NTD resist model calibration. This approach uses a cluster sampling strategy to enhance global initial sampling and employs scalable BO with constraints for global optimization of the high-dimensional parameter space. With this approach, the calibration accuracy is significantly enhanced compared with results from GD and heuristic methods, and the computational efficiency is substantially improved compared with GD. By combining the cluster sampling strategy with scalable constrained BO, this method offers a new and efficient route to resist model calibration.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

As the technology node of integrated circuits shrinks to 14 nm and below, there is a pressing demand for higher lithographic resolution. Various techniques have been developed to meet the requirements of advanced nodes. The negative tone development (NTD) process [1,2], as one of these techniques, can achieve higher image contrast [3], a larger process window [4], smaller edge placement error [5], and lower line edge roughness and line width roughness [6] compared to the positive tone development (PTD) process. Therefore, the NTD process has been widely adopted for advanced nodes.

There are several physical effects specific to the NTD process, such as photoresist shrinkage [7], developer depletion and diffusion [8], and critical dimension (CD) hopping induced by sub-resolution assist features [9]. NTD resists, which are developed in organic solvent, exhibit different development characteristics from the aqueous-base development of PTD resists [10–17]. A resist model calibrated for PTD therefore shows a significant mismatch between simulation and NTD experimental data [16], and additional NTD resist terms must be incorporated to account for the aforementioned physical effects of NTD [16,18]. The number of parameters in the NTD resist model is several times that of the PTD model, leading to an exponential expansion of the parameter space. Calibration of the NTD resist model therefore requires far more computational resources and effort, making it increasingly challenging to find the global optimum.

Currently, traditional optimization methods, such as gradient descent (GD) [19,20] and heuristic methods [21–24], are employed in resist model calibration. The former relies on the gradient information of the loss function to update the parameters and is widely used due to its stable convergence and clear underlying principles. However, the gradient calculation is usually computationally expensive, and in most cases the gradient convergence pathway itself is of little interest. The latter treats the optimization problem as a black box and finds a suboptimal solution via certain search mechanisms, such as the genetic algorithm (GA) [21,22] and particle swarm optimization (PSO) [23,24]. This approach has the advantage of requiring neither a priori knowledge nor gradient calculation. However, because the initial parameter sampling in these methods is usually limited, optimization in a high-dimensional parameter space is prone to falling into local optima. If the initial parameter sampling is too concentrated, the optimization is restricted to a small subregion of the entire parameter space, resulting in a limited optimization region and a high probability of converging to a local optimum. An NTD resist model may possess tens or even hundreds of parameters, leading to an extensive search space in which local optima can be plentiful, which makes global optimization more challenging.

In this paper, we propose a cluster sampling and scalable Bayesian optimization (BO) with constraints method for NTD resist model calibration. We employ Latin hypercube sampling (LHS) to uniformly sample the entire parameter space and then use a cluster-center sampling strategy to gather the initial parameter set, thereby ensuring reliable global sampling capability. The scalable BO approach achieves efficient global optimization by scaling the high-dimensional parameter space according to the current optimum, while parameter constraints limit the parameter space to keep the optimization physically reasonable. Calibration results for two common NTD models demonstrate that this method significantly enhances resist model accuracy compared to GD and heuristic methods. Meanwhile, the runtime is considerably reduced compared with the GD provided by the PanGen software from DJEL [25]. By seamlessly combining the cluster sampling strategy and the scalable constrained BO approach, this method provides a novel acceleration strategy for NTD resist model calibration that can readily be bridged to commercial resist model frameworks.

2. Cluster sampling and scalable Bayesian optimization with constraints for NTD resist model calibration

As illustrated in Fig. 1, the calibration process of the proposed method is divided into five steps.

In the first step, the ranges of all calibrated parameters are set, and a large collection of parameter sets is generated via LHS [26], an approximately random stratified sampling method for high-dimensional parameter spaces. The sampled parameter sets are then clustered using the K-means++ method [27], with the samples at the cluster centers selected as the initial parameter sets to represent each cluster, ensuring high coverage of the entire search space. As shown in Fig. 1, the blue, yellow, and green cylinders represent the total sampled parameter sets, the initial parameter sets, and the remaining parameter sets, respectively. After clustering, the initial collection consists of 20 parameter sets, while the remaining pool contains 499,980 parameter sets.

In the second step, the initial parameter sets and the resist model are used to obtain the resist contours corresponding to the aerial images. The simulated CD values at the detection points are extracted from the resist contours and compared to experimental values to calculate the root mean square (RMS) of the resist model error for each parameter set.

In the third step, if the minimum RMS of the resist model error in the initial parameter set decreases at the current iteration, the optimization success count is incremented by one; otherwise, the failure count is incremented by one. As illustrated in step 3 of Fig. 1, when the success count exceeds the success threshold, the range of each parameter is increased; when the failure count reaches the failure threshold, the range of each parameter is reduced. The optimization process then returns to the first step. To keep the parameters reasonable, the updated parameter ranges must not exceed the preset constraint ranges. If neither the success nor the failure count reaches its threshold, the scaling step is skipped.

In the fourth step, the initial parameter sets are adopted as the training dataset, which comprises the parameters and the corresponding error RMS values. As shown in step 4 of Fig. 1, we train a machine learning model using Gaussian process regression (GPR) [28], which establishes a mapping between the parameters and the error RMS. The error RMS can be described by a Gaussian process (GP) of the form GP(µ(x), σ(x)), where the GP is a collection of random variables satisfying a joint Gaussian distribution, x represents a parameter set whose dimension equals the number of calibration parameters, and µ(x) and σ(x) are the mean predicted error RMS and its uncertainty (the height of the blue shaded area in the fifth step of Fig. 1), respectively.

In the fifth step, the GPR model is used to obtain the mean predicted value µ(x) and uncertainty σ(x) for each parameter set in the remaining pool. The acquisition function value of each remaining parameter set is then calculated from its µ(x) and σ(x), and the parameter set maximizing the acquisition function is selected as the next candidate (the yellow star in Fig. 1). The error RMS of this candidate is calculated with the resist model and added to the training dataset, on which a new GPR model is trained.

This loop is repeated until the minimum error RMS in the training dataset or the number of iterations reaches a preset value. Here, the number of iterations refers to the number of times the NTD model is executed. As illustrated in Fig. 2, the minimum RMS generally converges within 12 to 20 iterations, so the preset value is set to 20. It is worth noting that the direction of the calibration is determined by the acquisition function based on the predictions of the GPR model, while the error RMS of each parameter set in the training dataset is obtained from the resist model.
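For concreteness, the sketch below assembles the five steps into a single runnable toy loop using SciPy and scikit-learn. It is a simplified illustration, not the paper's implementation: the PanGen resist model is replaced by a synthetic `resist_rms` stub with a known optimum, the dimensionality, pool size, cluster count, and thresholds are placeholder values, and the return to step 1 after rescaling is simplified to resampling the candidate pool without re-evaluating new cluster centers.

```python
import numpy as np
from scipy.stats import norm, qmc
from sklearn.cluster import KMeans
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def resist_rms(p):
    # Stub standing in for a full resist-model run: a synthetic
    # error-RMS surface with its optimum at 0.3 in every dimension.
    return float(np.linalg.norm(p - 0.3))


def lhs_pool(lo, hi, n, seed):
    # Step 1a: Latin hypercube sample of the current parameter range.
    unit = qmc.LatinHypercube(d=lo.size, seed=seed).random(n)
    return qmc.scale(unit, lo, hi)


def cluster_centers(pool, k, seed):
    # Step 1b: K-means++ clustering; the sample nearest each center
    # becomes an initial parameter set, the rest form the candidate pool.
    km = KMeans(n_clusters=k, n_init=1, random_state=seed).fit(pool)
    idx = np.unique(km.transform(pool).argmin(axis=0))
    return pool[idx], np.delete(pool, idx, axis=0)


dim = 4                                          # toy dimensionality
hard_lo, hard_hi = np.zeros(dim), np.ones(dim)   # preset constraints
lo, hi = hard_lo.copy(), hard_hi.copy()
init, pool = cluster_centers(lhs_pool(lo, hi, 2000, 0), k=10, seed=0)
X, y = list(init), [resist_rms(p) for p in init]  # Step 2: evaluate
best, success, failure = min(y), 0, 0

for it in range(20):                             # preset iteration budget
    # Step 3: scale the range on success/failure, clip to constraints.
    if min(y) < best:
        best, success, failure = min(y), success + 1, 0
    else:
        failure += 1
    width = hi - lo
    if success >= 2:
        width, success = width * 2.0, 0          # explore a larger space
    elif failure >= 2:
        width, failure = width * 0.5, 0          # search more finely
    center = X[int(np.argmin(y))]
    lo = np.maximum(center - width / 2, hard_lo)
    hi = np.minimum(center + width / 2, hard_hi)
    pool = lhs_pool(lo, hi, 2000, seed=it + 1)   # Step 1 revisited

    # Step 4: GPR surrogate mapping parameter sets -> error RMS.
    gpr = GaussianProcessRegressor(kernel=Matern(nu=1.5), normalize_y=True)
    gpr.fit(np.array(X), np.array(y))

    # Step 5: expected improvement over the candidate pool; evaluate the
    # maximizer with the "resist model" and add it to the training set.
    mu, sigma = gpr.predict(pool, return_std=True)
    z = (min(y) - mu) / np.maximum(sigma, 1e-12)
    ei = sigma * (z * norm.cdf(z) + norm.pdf(z))
    p_next = pool[int(np.argmax(ei))]
    X.append(p_next)
    y.append(resist_rms(p_next))

print("best error RMS:", min(y))
```

Each component of this loop is discussed in more detail in Sections 2.1 through 2.4 below.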

Fig. 1. Workflow of the cluster sampling and scalable Bayesian optimization with constraints method for resist model calibration.

Fig. 2. Curves for minimum RMS of model error and number of iterations.

2.1. Selecting parameters with cluster sampling

The selection of initial values is crucial for finding the global optimum. If the sampling of initial values is excessively concentrated, the search process may be confined to a small subregion of the entire search space, increasing the likelihood of converging to a local optimum. If the distribution of initial values lies far from the global optimum, the search process may either fall into a local optimum or require more iterations to find the global optimum. To ensure a broad search view and high computational efficiency, the distribution of initial values should cover the entire parameter space as much as possible with limited sampling data. To avoid falling into a local optimum, an efficient global sampling strategy is required, and the LHS strategy is adopted here. The blue and red boxes in the first step of Fig. 1 illustrate the LHS strategy and the K-means++ clustering method, respectively. In detail, the range of each parameter is divided into equal intervals, and one value is sampled from each interval for each parameter dimension. These sampled values are combined into the total parameter set by random arrangement. The K-means++ method then clusters the sampled parameter sets, and the sample corresponding to each cluster center is selected for the initial parameter set, ensuring comprehensive coverage of the search space with limited sampled data.
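As a minimal illustration of this sampling step, the sketch below combines SciPy's `LatinHypercube` sampler with scikit-learn's K-means++-seeded `KMeans`. The bounds are placeholder assumptions, and the pool size is reduced from the paper's 500,000 sets to keep the demonstration quick.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.cluster import KMeans

d = 39                            # e.g., the parameter count of Model A
lo, hi = np.zeros(d), np.ones(d)  # hypothetical parameter bounds

# LHS: each parameter range is split into equal strata, one value is
# drawn per stratum, and the columns are randomly permuted and combined.
pool = qmc.scale(qmc.LatinHypercube(d=d, seed=0).random(100_000), lo, hi)

# K-means++ seeded clustering; the sampled point nearest each of the
# 20 cluster centers is taken as an initial parameter set, so the
# initial set remains a subset of the LHS pool.
km = KMeans(n_clusters=20, init="k-means++", n_init=1,
            random_state=0).fit(pool)
idx = np.unique(km.transform(pool).argmin(axis=0))
initial_sets = pool[idx]
remaining = np.delete(pool, idx, axis=0)
```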

2.2. Scaling the parameter space with constraints

The optimization of high-dimensional problems faces three challenges: the global optimum in the huge search space is difficult to find, the objective function in high-dimensional space is highly complex and may even be impossible to express analytically, and the search space grows considerably faster than the sampling budget. To address these challenges, we adopt a scaled sampling strategy [29], which adjusts the parameter ranges according to the changes of the current optimum, ensuring that the search process does not easily converge to a local optimum and can effectively search the entire space. The third step of Fig. 1 shows the scaled sampling strategy. If the optimal solution at the current iteration is better than the previous one, the optimization success count is incremented by one; otherwise, the failure count is incremented by one. As shown in the left part of the third step of Fig. 1, when the success count reaches the success threshold, the parameter range is doubled, encouraging the exploration of a larger space so that the search does not easily fall into a local optimum. The success count is reset to zero after changing the range. As shown in the right part of the third step of Fig. 1, when the failure count reaches the failure threshold, the parameter range is halved, encouraging more fine-grained searches. The failure count is then reset to zero. To balance shrinking and expanding the space, the failure and success thresholds are typically set to the same value. If the threshold is too small, premature scaling of the parameter space may occur, resulting in deviation from the optimum; conversely, if it is too large, the search may easily fall into a local optimum. A comparison of the model errors for thresholds of 1, 2, and 3 is displayed in Table 1, which shows that for thresholds of 2 and above the model error RMS and range are identical and better than those for a threshold of 1. The threshold is therefore set to 2. Although some parameter values outside the preset bounds might yield better solutions, they may not be physically reasonable, so the updated parameter bounds must not exceed the preset bounds. When neither the success nor the failure count reaches its threshold, the scaling step is skipped. Sampling in different regions makes the fitted machine learning model more generalizable. This approach allows efficient global search in parameter spaces with limited sampled data; a minimal sketch of the scaling rule is given after Table 1.

Table 1. Comparison of the error root mean square (RMS) and range for NTD resist model A calibrated by the proposed method with different scaling thresholds. Bold font indicates the best evaluation metrics.
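The following sketch illustrates one step of this scaling rule; the function signature and variable names are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def rescale(center, half_width, hard_lo, hard_hi,
            success, failure, threshold=2):
    """One scaling step (step 3 of Fig. 1): double the range after
    `threshold` consecutive successes, halve it after `threshold`
    failures, and clip to the preset constraint bounds. A sketch
    under assumed variable names, not the paper's code."""
    if success >= threshold:
        half_width, success = half_width * 2.0, 0   # explore wider
    elif failure >= threshold:
        half_width, failure = half_width * 0.5, 0   # search finer
    # Clip the updated range to the preset physical constraints.
    lo = np.maximum(center - half_width, hard_lo)
    hi = np.minimum(center + half_width, hard_hi)
    return lo, hi, half_width, success, failure
```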

2.3. Gaussian process regression model

The fourth step of Fig. 1 illustrates the GPR model. GPR is adopted as the surrogate machine learning model in BO [30] and is available in the open-source package GPyTorch [31]. A Gaussian process is a collection of random variables that satisfy a joint Gaussian distribution. The objective function f(x) can be described by a GP of the form:

$$f(x) \sim GP\left( m(x),\, k(x, x') \right)$$

where x and x' represent the input features or parameters of different data points, and m(x) and k(x, x') are the mean and kernel functions, respectively. GPR is a non-parametric regression technique that establishes a distribution over functions consistent with the training dataset. A Matérn kernel, a generalization of the radial basis function kernel, is adopted as the kernel function:

$$k(x, x') = \frac{1}{\Gamma(\nu)\, 2^{\nu - 1}} \left( \frac{\sqrt{2\nu}}{l}\, d(x, x') \right)^{\nu} K_{\nu}\!\left( \frac{\sqrt{2\nu}}{l}\, d(x, x') \right)$$

where ν is a smoothness parameter set to 1.5, d(·, ·) is the Euclidean distance, and Γ(·) and K_ν(·) are the gamma function and a modified Bessel function, respectively. The length scale l is a hyperparameter obtained via maximum likelihood estimation.
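A minimal GPyTorch sketch of such a surrogate is shown below, using a scaled Matérn-1.5 kernel and maximum-likelihood hyperparameter fitting. The training tensors are random placeholders standing in for (parameter set, error RMS) pairs; the class name and sizes are illustrative assumptions.

```python
import torch
import gpytorch

class RmsGP(gpytorch.models.ExactGP):
    """Minimal GPR surrogate: constant mean, Matern-1.5 kernel."""
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.MaternKernel(nu=1.5))

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x))

# Placeholder data: 20 parameter sets of dimension 39 and their RMS.
train_x, train_y = torch.rand(20, 39), torch.rand(20)

likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = RmsGP(train_x, train_y, likelihood)

# Fit the kernel hyperparameters (length scale l, etc.) by
# maximizing the marginal log likelihood, as described in the text.
model.train(); likelihood.train()
opt = torch.optim.Adam(model.parameters(), lr=0.1)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
for _ in range(100):
    opt.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    opt.step()

# Predict mean mu(x) and uncertainty sigma(x) on unexplored sets.
model.eval(); likelihood.eval()
with torch.no_grad():
    pred = likelihood(model(torch.rand(5, 39)))
    mu, sigma = pred.mean, pred.stddev
```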

The simulated resist contour is obtained with a term-based NTD resist model, which consists of several Gaussian convolution kernels. The calibrated parameters control the shapes of the Gaussian kernels, thus establishing a mapping between the RMS and these parameters via the Gaussian kernels. As shown in Fig. 3, the RMS predicted by Gaussian process regression (GPR) is very close to the RMS of the resist model error.

Fig. 3. The predicted RMS via Gaussian process regression (GPR) versus RMS of resist model error.

2.4. Selection of the next parameter

As shown in the lower part of the fifth step in Fig. 1, BO selects the next unexplored parameter set by maximizing an acquisition function based on the predictions of the surrogate machine learning model. The selected parameters are added to the training dataset for retraining a new surrogate model, until the optimum in the training dataset or the number of iterations reaches the preset value. BO is frequently used to optimize high-dimensional black-box functions with many variables using minimal function evaluations [30]. The key to this method is the acquisition function based on the predictions of a surrogate model. Several acquisition functions exist, such as the upper confidence bound (UCB), the probability of improvement (PI), and the expected improvement (EI). Since UCB has an adjustable hyperparameter and PI focuses more on local exploitation, EI is widely adopted: it has no adjustable hyperparameter and tends to optimize more globally than the other two [30]. Its form is as follows:

$$EI(x) = \sigma(x)\left[ z\,\Phi(z) + \varphi(z) \right]$$
$$z = \begin{cases} \dfrac{\mu(x) - f^{\max}}{\sigma(x)}, & \text{when maximizing} \\[2ex] \dfrac{f^{\min} - \mu(x)}{\sigma(x)}, & \text{when minimizing} \end{cases}$$
where f^max and f^min are the maximum and minimum actual values observed so far in the training dataset, respectively, and µ(x) and σ(x) are the mean predicted value and standard deviation (uncertainty) of the input parameters in the unexplored dataset, all obtained from the GPR model trained in the fourth step. The top part of the fifth step in Fig. 1 shows µ(x) and σ(x). Φ(z) and φ(z) are the cumulative distribution function and probability density function of the standard normal distribution, respectively. The acquisition function comprises two parts. The first part, σ(x)zΦ(z), seeks a better solution than the current best estimate (the maximum or minimum value in the training dataset), encouraging exploitation. The second part, σ(x)φ(z), explores the regions of the search space where the uncertainties are largest, encouraging exploration. Although exploitation and exploration cannot be maximized simultaneously, maximizing the EI provides a reasonable trade-off between them. The next parameter set with the largest EI is therefore selected from the unexplored dataset and added to the training dataset for retraining the GPR model. This adaptive process is executed iteratively until the current optimum satisfies the preset value. In principle, the method relies only on an acquisition function that is independent of the specific optimization task, implying that it can be applied not only to NTD resist model calibration but also to optimization problems in electronic design automation and other fields.
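The sketch below evaluates the EI formula above with SciPy for both the maximization and minimization branches; the numeric values are placeholders standing in for GPR predictions on the unexplored pool.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, maximize=False, eps=1e-12):
    """EI from the GPR mean mu(x) and uncertainty sigma(x).
    f_best is f_max (when maximizing) or f_min (when minimizing)
    observed so far in the training dataset. A sketch, not the
    paper's code."""
    s = np.maximum(sigma, eps)                    # guard sigma -> 0
    z = (mu - f_best) / s if maximize else (f_best - mu) / s
    # sigma*z*Phi(z): exploitation term; sigma*phi(z): exploration term.
    return s * (z * norm.cdf(z) + norm.pdf(z))

# Example: pick the next parameter set from the unexplored pool.
mu = np.array([4.1, 3.9, 5.0])      # predicted error RMS (nm)
sigma = np.array([0.2, 0.8, 0.1])   # predicted uncertainty (nm)
next_idx = int(np.argmax(expected_improvement(mu, sigma, f_best=3.8)))
```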

3. Results and discussion

The proposed method is developed using the PyTorch and scikit-learn packages [32] and tested on a Linux machine equipped with a 2.7 GHz Intel Xeon CPU and an Nvidia Tesla V100 GPU. The mask, resist, and lithography tools are modeled in the PanGen software. The wavelength of the light source is 193 nm, and the numerical aperture (NA) of the imaging system is 1.35. As shown in Fig. 4, the light source is a quasar illumination with XY polarization, where the inner and outer partial coherence factors are 0.57 and 0.68, respectively, and the opening angle is 60 degrees. The resist thickness is 120 nm. The CDs of the test patterns range from 46 nm to 300 nm. As illustrated in Fig. 4, these patterns include through-pitch line patterns, isolated patterns, two-bar patterns, three-bar patterns, and line-end-to-end patterns. Based on these patterns and the corresponding measured data, a gauge file is created, containing the name, coordinates, drawn CD, measured CD, and weight of each gauge. To validate the effectiveness of the proposed method, we divide the dataset into a training dataset and a test (verification) dataset in a 2:1 ratio. With a fixed, fine-tuned optical model, we calibrate the parameters of the NTD resist model on the training dataset using different methods with a fixed number of iterations, obtaining the optimal resist model for each method. Finally, the optimal resist models are validated on the test dataset to obtain the CD values and resist model error RMS. To verify the universality of the proposed method, we employ two common NTD resist models provided by PanGen, referred to as NTD Model A and Model B, with 39 and 19 calibration parameters, respectively.

Fig. 4. Simulation conditions and test patterns in this paper.

Figure 5 illustrates the error distributions of NTD resist Model A calibrated by GD, GA, PSO, and the proposed method on the training and test datasets. The model error is the difference between the measured CD value and the CD value calculated with the calibrated NTD resist Model A. As shown in Fig. 5(a) and (b), compared with the GD, GA, and PSO methods, our method reduces the errors of almost all the CDs. The GA and PSO methods yield larger errors for some of the patterns, so the generalizability of our approach is better than that of these two methods. The GD-calibrated resist model has a significant error on one pattern in the training set, although its errors on the test dataset are generally reasonable. This case does not occur with our method, indicating that its generalizability is also better than that of GD. Detailed comparisons are displayed in Table 2: the model error RMS of the proposed method on the training dataset is 3.74 nm, which is 21.6%, 27.0%, and 42.7% lower than that of GD, GA, and PSO, respectively. On the test dataset, the error RMS of the proposed method is 3.66 nm, which is 1.9%, 16.4%, and 48.2% lower than that of the three methods above, respectively, indicating that the method significantly reduces the model error. In addition, we use the difference between the maximum and minimum model errors (the error range) to evaluate the robustness of our method: the error range of our method on the training dataset is 34.11 nm, which is 60.5%, 46.4%, and 48.0% smaller than that of the other three methods, respectively. On the test dataset, the error range of our method is 26.47 nm, very close to the GD result of 25.21 nm, and 21.0% and 62.1% smaller than that of GA and PSO, respectively, demonstrating the stronger robustness of our method. The calibration results on NTD resist Model B show that the proposed method exhibits the same advantages over GD, GA, and PSO. The computation time of the method is crucial for its application. As shown in Fig. 6, the computation time of the proposed method is almost the same as that of the GA and PSO methods, and is 89.0% and 80% lower than that of GD for Models A and B, respectively. Therefore, our method can obtain an NTD resist model with lower error at the same runtime level, with stronger robustness.

Fig. 5. Comparison of error distribution of NTD resist model A calibrated by the gradient descent method (GD), the genetic algorithm (GA), the particle swarm optimization method (PSO), and the proposed method, respectively. (a) Training dataset and (b) test or verification dataset.

Fig. 6. Comparison of computational time for NTD resist model calibration performed by the gradient descent method (GD), the genetic algorithm (GA), the particle swarm optimization method (PSO), and the proposed method, respectively. NTD resist Model A (a) and B (b).

Table 2. Comparison of the error root mean square (RMS) and range for NTD resist model A calibrated by the gradient descent method (GD), the genetic algorithm (GA), the particle swarm optimization method (PSO), and the proposed method. Bold font indicates the best evaluation metrics.

Different random seeds in the initialization sampling could influence the search process. To exclude the effect of random seeds, we initialize GA, PSO, and the proposed method with different random seeds. The GD method is not compared because PanGen provides no interface for adjusting its random seed. Figure 7 compares the model error RMS and range of GA, PSO, and the proposed method. As illustrated in Fig. 7(a) and (b), the error RMS of our method on both the training and test datasets is lower than that of GA and PSO with the same random seed. As shown in Fig. 7(c), the error range of the proposed method on the training dataset is lower than that of GA and PSO with the same random seed. As shown in Fig. 7(d), the error range of the proposed method is larger than that of PSO for a random seed of 100, but remains lower than that of the other two methods for the other random seeds. These results demonstrate that the advantage of the proposed method over the other methods is not affected by the random seed. The results on NTD resist Model B are consistent with those of Model A. Optimization with GD and heuristic methods easily falls into local optima because the initial value selection is highly random. In contrast, we employ the LHS and cluster sampling methods to ensure higher space coverage in the initial sampling, and use the scaled BO approach to escape local optima in a timely manner and search globally and efficiently.

Fig. 7. Comparison of the error root mean square (RMS) (a)(b) and range (c)(d) for NTD resist model A of the genetic algorithm (GA), the particle swarm optimization method (PSO), and the proposed method on the training (a)(c) and test or verification (b)(d) dataset, with different random seeds for initialization.

4. Conclusion

In this paper, we propose a cluster sampling and scalable Bayesian optimization with constraints method for NTD resist model calibration. The initial parameter set is first collected using Latin hypercube sampling and a cluster sampling strategy to ensure high coverage of the entire parameter space. The scaled BO approach is then employed to scale the high-dimensional parameter space according to the current optimal solution, enabling timely escape from local optima and efficient global optimization. By applying this method to the calibration of two common NTD resist models, we show that the calibration error is reduced substantially compared to GD, GA, and PSO. Furthermore, the runtime of our method is at least 80% shorter than that of GD. Our method improves both the efficiency and accuracy of NTD resist model calibration.

Funding

Fundamental Research Funds for the Central Universities (E2ET3801); University of Chinese Academy of Sciences (118900M032); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021115); Guangdong Province Research and Development Program in Key Fields (2021B0101280002); Ministry of Science and Technology of the People's Republic of China (2019YFB2205005); National Natural Science Foundation of China (62204257, 62274181); Chinese Academy of Sciences (XDA0330401).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. G. Landie, Y. Xu, S. Burns, et al., “Fundamental investigation of negative tone development (NTD) for the 22nm node (and beyond),” Proc. SPIE 7972, 797206 (2011). [CrossRef]  

2. S. A. Robertson, M. Reilly, J. J. Biafore, et al., “Negative tone development: gaining insight through physical simulation,” Proc. SPIE 7972, 79720Y (2011). [CrossRef]  

3. M. Kim, J. Moon, B. S. Nam, et al., “Consideration for application of NTD from OPC and simulation perspective,” Proc. SPIE 8326, 83262C (2012). [CrossRef]  

4. L. V. Look, J. Bekaert, V. Truffert, et al., “Printing the metal and contact layers for the 32 and 22 nm node: comparing positive and negative tone development process,” Proc. SPIE 7640, 764011 (2010). [CrossRef]  

5. D. Kim, H. Y. Yune, D. Park, et al., “Process window variation comparison between NTD and PTD for various contact type,” Proc. SPIE 9780, 97800D (2016). [CrossRef]  

6. S. Tarutani, T. Hideaki, and S. Kamimura, “Development of materials and processes for negative tone development toward 32-nm node 193-nm immersion double-patterning process,” Proc. SPIE 7273, 72730C (2009). [CrossRef]  

7. P. Liu, L. Zheng, M. Ma, et al., “A physical resist shrinkage model for full-chip lithography simulations,” Proc. SPIE 9779, 97790Y (2016). [CrossRef]  

8. A. Chen, K. K. Koh, Y. M. Foong, et al., “Compact modeling of negative tone development resist with photo decomposable quencher,” Proc. SPIE 10961, 13 (2019). [CrossRef]  

9. C. M. Hu, F. Lo, E. Yang, et al., “Optimizing the lithography model calibration algorithms for NTD process,” Proc. SPIE 9780, 978018 (2016). [CrossRef]  

10. L. Zhao, L. Dong, L. Zhang, et al., “Resist model setup for negative tone development at 14NM node,” in 2018 China Semiconductor Technology International Conference (2018), pp. 1–4.

11. J. Bao, C. J. Spanos, and A. R. Romano, “Physically based model for predicting volume shrinkage in chemically amplified resists,” Proc. SPIE 3743, 16–24 (1999). [CrossRef]  

12. C. Fang, M. D. Smith, S. Robertson, et al., “A Physics-based Model for Negative Tone Development Materials,” J. Photopol. Sci. Technol. 27(1), 53–59 (2014). [CrossRef]  

13. W. Gao, U. Klostermann, T. Mülders, et al., “Application of an inverse Mack model for negative tone development simulation,” Proc. SPIE 7973, 79732W (2011). [CrossRef]  

14. N. Jakatdar, J. W. Bao, and C. J. Spanos, “Physical modeling of deprotection-induced thickness loss,” Proc. SPIE 3678, 275–282 (1999). [CrossRef]  

15. B. Küchler, T. Mülders, H. Taoka, et al., “Experimental characterization of NTD resist shrinkage,” Proc. SPIE 10147, 101470F (2017). [CrossRef]  

16. A. Chen, Y. M. Foong, D. Q. Zhang, et al., “Evaluation of compact models for negative-tone development layers at 20/14nm nodes,” Proc. SPIE 9426, 94261P (2015). [CrossRef]  

17. P. I. Hagouel, A. R. Neureuther, and A. M. Zenk, “Negative resist corner rounding. Envelope volume modeling,” J. Vac. Sci. Technol., B: Microelectron. Nanometer Struct.--Process., Meas., Phenom. 14(6), 4257–4261 (1996). [CrossRef]  

18. T. Mülders, H. J. Stock, B. Küchler, et al., “Modeling of NTD resist shrinkage,” Proc. SPIE 10146, 101460M (2017). [CrossRef]  

19. J. T. Chen, Y. Y. Zhao, Y. Zhang, et al., “Label-free neural networks-based inverse lithography technology,” Opt. Express 30(25), 45312–45326 (2022). [CrossRef]  

20. X. Ma and G. R. Arce, “Pixel-based OPC optimization based on conjugate gradients,” Opt. Express 19(3), 2165–2180 (2011). [CrossRef]  

21. W. C. Huang, C. M. Lai, B. Luo, et al., “OPC modeling by genetic algorithm,” Proc. SPIE 5754, 133–1240 (2005). [CrossRef]  

22. R. Wu, L. Dong, X. Ma, et al., “Compensation of EUV lithography mask blank defect based on an advanced genetic algorithm,” Opt. Express 29(18), 28872–28885 (2021). [CrossRef]  

23. L. Wang, S. Li, X. Wang, et al., “Source optimization using particle swarm optimization algorithm in photolithography,” Proc. SPIE 9426, 94261L (2015). [CrossRef]  

24. E. Briones, R. R. Cruz, J. Briones, et al., “Particle swarm optimization of nanoantenna-based infrared detectors,” Opt. Express 26(22), 28484–28496 (2018). [CrossRef]  

25. PanGen User Manual, Software Version 2022.12.

26. M. Stein, “Large sample properties of simulations using Latin hypercube sampling,” Technometrics 29(2), 143–151 (1987). [CrossRef]  

27. D. Arthur and S. Vassilvitskii, “K-means++: the advantages of careful seeding,” in Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2007), pp. 1027–1035.

28. C. K. I. Williams and C. E. Rasmussen, “Gaussian processes for regression,” Proc. Adv. Neural Inf. Process. Syst. 8 (1995).

29. D. Eriksson, M. Pearce, J. Gardner, et al., “Scalable global optimization via local Bayesian optimization,” Proc. Adv. Neural Inf. Process. Syst. 32 (2019).

30. D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimization of expensive black-box functions,” J. Global Optim. 13(4), 455–492 (1998). [CrossRef]  

31. J. Gardner, G. Pleiss, K. Q. Weinberger, et al., “Gpytorch: Blackbox matrix-matrix gaussian process inference with gpu acceleration,” Proc. Adv. Neural Inf. Process. Syst. 31 (2018).

32. F. Pedregosa, G. Varoquaux, A. Gramfort, et al., “Scikit-learn: Machine learning in Python,” J. Mach. Learn. Res. 12, 2825–2830 (2011).

