Design of an optical linear-discriminant filter: optimization for enhancement of filter transmittance and discrimination accuracy

Jun-Ichiro Sugisaka; Shingo Shimada; Koichi Hirayama; Takashi Yasui

doi:10.1364/JOSAA.506713

1. INTRODUCTION

In the manufacturing process of semiconductor devices, a process rule of several nanometers has been introduced to produce highly integrated circuits. According to this process rule, nanometer-order defects in the semiconductor wafer significantly affect the yield. By determining the defect type, which includes dust particles, digs, and scratches, it is possible to appropriately handle the wafer defect and improve the fabrication process.

The defect type can be determined using an important structural feature—the concavity or convexity of the defect surface. Some optical measurement techniques, such as interferometry and holography, enable the measurement of height distribution on the sample surface by acquiring the phase of the reflected wave from the sample. For example, diffraction phase microscopy [1,2] and digital holography [3,4] reconstruct defect shapes from fringe patterns. However, for the subwavelength-sized defects, the optical response of a sample is not a simple reflection; it includes the scattering effect in the defect as well as multiscattering between the defect and the structure around the defect. The relationship between the phase and the sample height becomes complicated, which makes it challenging to determine the convexity or concavity of the defect.

Scatterometry determines the sample profile by comparing the measured scattering pattern with a library of patterns obtained from numerical scattering simulations [5,6]. Because no imaging system is necessary, the resolution is not restricted by the diffraction limit. Instead of searching a scattering-pattern library, a neural network was employed in [7]. Neural networks have also been applied to defect detection on various samples, such as gratings [7], light-emitting diodes [8], solar cells [9], and glass substrates [10].

The problem of classifying the defect types was further investigated by Chien et al., who employed digital holography and logistic regression [11]. Li et al. [12] and Jiang et al. [13] applied a support vector machine. Wu et al. quantified the defect features using polarization characteristics and classified digs and dust according to the Mahalanobis distance [14]. The inverse recognition calibration method [15] classifies dig, dust, and scratch defects in microscopic images by referring to a database of numerical scattering simulations.

In the methods described above, the optical systems only acquire the scattered light from the defect, while the machine learning processes are performed by electronic computers. Optical systems that can detect and classify patterns have also been proposed. Significant approaches include matched filters [16–21], optical cross-correlation systems, and synthetic discriminant filters (SDFs) [22–26]. Matched filters and cross-correlators are used to detect specified patterns buried in noisy images (patterns other than the target). SDFs are used to classify input images, including distorted patterns. Recently, a diffractive deep neural network [27] was proposed. It comprises a set of multilayer diffractive optical elements that can discriminate handwritten digit images.

We designed an optical linear discriminant filter (OLDF) based on Fisher’s linear-discriminant analysis (LDA) to classify fine defects smaller than the illumination wavelength. When the scattered light is input into the filter, it presents a dark spot for a concave defect and a bright spot for a convex defect. The classification result is obtained by comparing the spot irradiance with the threshold value. The contrast of the output between the concave and convex defects is maximized relative to the variance of the output within each class.

However, the OLDF has a problem: its output irradiance is considerably low. When the defect is smaller than the illumination wavelength, the scattered waves from the sample become significantly weak. Moreover, the scattered light incident on the OLDF interferes destructively at the observation point, further reducing the output-spot irradiance. When measuring such weak spots, the signal-to-noise ratio may be small. This problem arises because Fisher’s LDA, which was originally developed to operate with electronic computers, does not consider signal transmittance.

In this study, we propose a design algorithm that yields an OLDF with high transmittance while maintaining a high discrimination accuracy. This algorithm employs two objective functions to evaluate discrimination accuracy and filter transmittance. The filter transmittance is iteratively updated by referring to these functions. It converges such that the OLDF outputs a high irradiance with high discrimination accuracy. Section 2 describes the configuration of the optical system including the OLDF. Section 3 presents the design algorithm for the OLDF using objective functions. Section 4 presents the results of the discrimination simulations. Section 5 compares the designed OLDF with conventional filters. In addition, the optimization process is visualized using a simple model to discuss how its characteristics differ from those of the conventional filter. Finally, Section 6 concludes the study.

2. OPTICAL SYSTEM WITH OLDF

An optical system with an OLDF filter is shown in Fig. 1. The sample structure, optical elements, and lightwave distributions are constant in the $\zeta$ direction. We considered light propagation only in the $\xi - \eta$-plane. The sample, filter, and observation planes are parallel to the $\xi$ and $\zeta$ axes. The incident light is a two-dimensional Gaussian beam that illuminates the surface of the sample perpendicularly. The beam center is on the $\eta$-axis, and the beam waist is at $\eta = 0$. The sample is a semi-infinite dielectric substrate with its surface placed on $\eta = 0$, and the defect is located at the origin. The sample surface, OLDF, and output plane are located on the focal planes of the objective and imaging lenses. This alignment is typical of a $4F$ system. The OLDF is a Lohmann-type computer-generated hologram that realizes an arbitrary complex modulation by distributing small apertures on the filter surface.

Fig. 1. Optical system of the defect discrimination system. ${\rm S}$, sample with defect; ${{\rm L}_1}$, objective lens (focal length of ${{\rm F}_1}$); ${{\rm L}_2}$, imaging lens (focal length of ${{\rm F}_2}$); P, output plane on which the observation point exists.

Here, we describe the output irradiance at an observation point on the output plane. Assuming that the aberration of the imaging lens is removed, the point source at $\xi$ in the OLDF becomes a plane wave on the output plane (Fraunhofer diffraction). The amplitude at ${\xi ^\prime}$ is expressed as

(1)$$C\exp \left({jk\frac{\xi}{{{F_2}}}{\xi ^\prime}} \right),$$

where $j$ is the imaginary unit, $k$ is the wavenumber of the illumination, ${F_2}$ ($\gg \xi$) is the focal length of the imaging lens, and $C$ is a constant. The time ($t$) harmonic factor is $\exp (j\omega t)$, where $\omega$ is the angular frequency of the illumination; this factor is omitted throughout this study. The amplitude of the diffracted wave was normalized to unity. When the light scattered from the sample uniformly illuminates a small aperture with an amplitude of ${x_n}$, the diffracted wave on the output plane ${f_n}({\xi ^\prime})$ can be written as

(2)$$\begin{split}{f_n}({\xi ^\prime}) &\simeq {x_n}\int_{{c_n} - {h_n}/2}^{{c_n} + {h_n}/2} C\exp \left({jk\frac{\xi}{{{F_2}}}{\xi ^\prime}} \right){\rm d}\xi \\&= C{x_n}\frac{{\sin \left({\frac{{k{h_n}}}{{2{F_2}}}{\xi ^\prime}} \right)}}{{\frac{{k{h_n}}}{{2{F_2}}}{\xi ^\prime}}}{h_n}\exp \left({jk\frac{{{c_n}}}{{{F_2}}}{\xi ^\prime}} \right),\end{split}$$

where ${c_n}$ and ${h_n}$ are the center and size of the aperture, respectively. The filter plane was discretized with an equal interval of $\Delta \xi$, and $N$ apertures were distributed such that

(3)$${h_n} = \frac{{\Delta \xi}}{\pi}\arcsin ({|{w_n}|} )\quad (n = 1,2, \cdots ,N),$$

(4)$${c_n} = \Delta \xi \left[{\left({n - \frac{N}{2}} \right) - \frac{{\arg ({w_n})}}{{2\pi}}} \right]\quad (n = 1,2, \cdots ,N),$$

where ${w_n}$ denotes a complex quantity. The amplitude of the diffracted wave was normalized by setting $C = \pi /\Delta \xi$. This normalization yields ${h_n} = \Delta \xi /2$ when $|{w_n}| = 1$. The diffracted wave distribution on the output plane is then given by

(5)$$y({\xi ^\prime}) = \sum\limits_{n = 1}^N {f_n}({{\xi ^\prime}} ).$$

We set the observation point as $\xi _{\rm o}^\prime = 2\pi {F_2}/(k\Delta \xi)$. The amplitude at this point ${y_{\rm o}}$ is given by

(6)$${y_{\rm o}} = \sum\limits_{n = 1}^N {f_n}\left({\frac{{2\pi {F_2}}}{{k\Delta \xi}}} \right) = \sum\limits_{n = 1}^N w_n^*{x_n}.$$

We define the input vector and weight vector (transmittance of the OLDF), respectively, as

(7)$${\boldsymbol x} = {[{{x_1},{x_2}, \cdots ,{x_N}} ]^{\rm T}},$$

(8)$${\boldsymbol w} = {[{{w_1},{w_2}, \cdots ,{w_N}} ]^{\rm T}},$$

and the output amplitude becomes a complex inner product ${y_{\rm o}} = {{\boldsymbol w}^{\rm H}}{\boldsymbol x}$. Because the optical system is a $4F$ system, this operation is shift invariant: when the defect is shifted, the observation point shifts in the opposite direction, and the amplitude at the shifted point is still given by Eq. (6). When discriminating defects, the output irradiance $|{y_{\rm o}}{|^2}$ is compared with the threshold value ${w_0}$. If $|{y_{\rm o}}{|^2} \lt {w_0}$, then the sample is classified as a concave defect (class A); otherwise, it is classified as a convex defect (class B).
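The chain from the aperture layout of Eqs. (3) and (4) to the inner product of Eq. (6) can be checked numerically. The following is a minimal sketch, not the authors' code: the function names, the random test vectors, and the values chosen for $\Delta \xi$, $k$, and ${F_2}$ are illustrative assumptions, and $N$ is assumed to be even so that $\exp [j2\pi (n - N/2)] = 1$.

```python
import numpy as np

def aperture_output(w, x, delta_xi=1.0, k=2 * np.pi, F2=1.0e3):
    """Sum the aperture contributions of Eq. (2) at the observation point."""
    N = len(w)  # assumed even, so that exp(j*2*pi*(n - N/2)) = 1
    n = np.arange(1, N + 1)
    h = delta_xi / np.pi * np.arcsin(np.abs(w))               # Eq. (3): aperture size
    c = delta_xi * ((n - N / 2) - np.angle(w) / (2 * np.pi))  # Eq. (4): aperture center
    C = np.pi / delta_xi                                      # normalization constant
    xi_obs = 2 * np.pi * F2 / (k * delta_xi)                  # observation point
    u = k * h * xi_obs / (2 * F2)                             # equals arcsin|w_n|
    # np.sinc(t) is sin(pi*t)/(pi*t), hence the division by pi.
    f = C * x * np.sinc(u / np.pi) * h * np.exp(1j * k * c * xi_obs / F2)
    return f.sum()

def classify(y_o, w0):
    """Class A (concave) if the output irradiance is below the threshold."""
    return "A" if abs(y_o) ** 2 < w0 else "B"

# Hypothetical filter and input vectors, used only for the check.
rng = np.random.default_rng(0)
N = 8
w = rng.uniform(0.1, 1.0, N) * np.exp(1j * rng.uniform(-np.pi, np.pi, N))
x = rng.normal(size=N) + 1j * rng.normal(size=N)

y_apertures = aperture_output(w, x)
y_inner = np.vdot(w, x)  # np.vdot conjugates its first argument: w^H x
```

Under these assumptions, `y_apertures` and `y_inner` agree to floating-point precision, which is the content of Eq. (6).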

3. DESIGN ALGORITHM OF OLDF

The target parameters for optimization are the filter transmittance ${\boldsymbol w}$ and threshold irradiance ${w_0}$. These parameters have redundancy; for example, the pairs $({\boldsymbol w},{w_0})$ and $(c{\boldsymbol w},|c{|^2}{w_0})$ (here, $c$ is an arbitrary constant) provide the same discrimination accuracy. Moreover, the output irradiance could be increased without limit by choosing a larger $|c|$. However, because $|{w_n}|$ has an upper limit [according to Eq. (3), $|{w_n}| \le 1$], this redundancy must be eliminated. Here, we introduce the filter vector ${\boldsymbol v}(\in {\mathbb{C}^N})$. The filter transmittance ${\boldsymbol w}$ and threshold irradiance ${w_0}$ are represented by

(9)$${\boldsymbol w} = \frac{{\boldsymbol v}}{{|{\boldsymbol v}|}},$$

(10)$${w_0} = |{\boldsymbol v}{|^2}.$$

This definition imposes the constraint that the norm of ${\boldsymbol w}$ should be unity. After the design process is complete, ${\boldsymbol w}$ and ${w_0}$ must be rescaled to ${\boldsymbol w}/\max_n |{w_n}|$ and ${w_0}/\max_n |{w_n}{|^2}$, such that $|{w_n}| \le 1$. The dimensions of ${\boldsymbol v}$ are equivalent to the amplitude of the scattered waves. Thus, we can plot ${\boldsymbol v}$ and ${\boldsymbol x}$ by using the same coordinate system. Figure 2 presents simple examples of ${\boldsymbol x} \in {\mathbb{R}^2}$ and ${\boldsymbol w} \in {\mathbb{R}^2}$. Samples ${\boldsymbol x}$ belonging to classes A and B are plotted as filled and open circles, respectively. The blue arrow indicates ${\boldsymbol v}$. The decision boundary ${\rm D}$ is geometrically represented by the dashed line in Fig. 2; it is a line perpendicular to the arrow passing through the arrow tip. Note that another decision boundary ${{\rm D}^\prime}$ exists in the position of inverted symmetry with respect to the origin. These boundaries divide the ${x_1} - {x_2}$ plane into three regions. The samples contained in the region between ${\rm D}$ and ${{\rm D}^\prime}$ ($|{{\boldsymbol w}^{\rm H}}{\boldsymbol x}{|^2} \lt {w_0}$) are classified as class A.
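The mapping from the filter vector to the filter parameters, Eqs. (9) and (10), and the final rescaling to $|{w_n}| \le 1$ can be sketched as follows. The function names and the example vectors are hypothetical; the point of the sketch is that the rescaling leaves the ratio of output irradiance to threshold, and hence the classification, unchanged.

```python
import numpy as np

def filter_from_vector(v):
    """Split a filter vector v into transmittance w and threshold w0."""
    v = np.asarray(v, dtype=complex)
    norm = np.linalg.norm(v)
    return v / norm, norm ** 2          # Eq. (9), Eq. (10)

def rescale_for_fabrication(w, w0):
    """Rescale so that the largest |w_n| equals 1, as Eq. (3) requires."""
    peak = np.abs(w).max()
    return w / peak, w0 / peak ** 2

# Hypothetical two-element filter vector and input sample.
v = np.array([3.0, 4.0j])
x = np.array([1.0 + 2.0j, -0.5j])

w, w0 = filter_from_vector(v)
w_fab, w0_fab = rescale_for_fabrication(w, w0)

# The decision depends only on |w^H x|^2 / w0, which the rescaling preserves.
ratio_before = abs(np.vdot(w, x)) ** 2 / w0
ratio_after = abs(np.vdot(w_fab, x)) ** 2 / w0_fab
```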

Fig. 2. Geometric interpretation of the filter vector ${\boldsymbol v} \in {\mathbb{R}^2}$. The direction of ${\boldsymbol v}$ represents the weight vector (transmittance distribution of the OLDF) and the norm represents the threshold value. The dashed lines ${\rm D}$ and ${{\rm D}^\prime}$ are decision boundaries.

A. Objective Functions

Next, to increase the discrimination accuracy, we define an objective function as

(11)$${f_1}({\boldsymbol v}) = \sum\limits_{n = 1}^{{N_A}} {g_{{\rm A}n}}({\boldsymbol v}) + \sum\limits_{n = 1}^{{N_B}} {g_{{\rm B}n}}({\boldsymbol v}),$$

(12)$${g_{{\rm A}n}}({\boldsymbol v}) = \log \left\{{1 + \exp \left[{a\left({\frac{{|{{\boldsymbol v}^T}{{\boldsymbol x}_{{\rm A}n}}{|^2}}}{{|{\boldsymbol v}{|^2}}} - |{\boldsymbol v}{|^2}} \right)} \right]} \right\},$$

(13)$${g_{{\rm B}n}}({\boldsymbol v}) = \log \left\{{1 + \exp \left[{- a\left({\frac{{|{{\boldsymbol v}^T}{{\boldsymbol x}_{{\rm B}n}}{|^2}}}{{|{\boldsymbol v}{|^2}}} - |{\boldsymbol v}{|^2}} \right)} \right]} \right\},$$

where ${{\boldsymbol x}_{{\rm A}n}}$ is the $n$-th training datum of class A (concave defect) and ${{\boldsymbol x}_{{\rm B}n}}$ is the $n$-th training datum of class B (convex defect). ${N_A}$ and ${N_B}$ are the numbers of training data points for the concave and convex defects, respectively. When the output irradiance for ${{\boldsymbol x}_{{\rm A}n}}$, $|{{\boldsymbol w}^{\rm H}}{{\boldsymbol x}_{{\rm A}n}}{|^2}$, is less than ${w_0}$ (the threshold irradiance), ${g_{{\rm A}n}}$ decreases, which decreases ${f_1}$. Similarly, when the output irradiance for ${{\boldsymbol x}_{{\rm B}n}}$, $|{{\boldsymbol w}^{\rm H}}{{\boldsymbol x}_{{\rm B}n}}{|^2}$, is larger than ${w_0}$, ${g_{{\rm B}n}}$ decreases, which also decreases ${f_1}$. The constant $a$ is a hyper-parameter, which is determined manually such that the design process can reach the minimum discrimination error for the training data.
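As an illustration of how ${f_1}$ rewards a correct threshold, the following sketch evaluates Eqs. (11)–(13) for real-valued vectors (as in the two-dimensional example of Fig. 2). The training samples and the two candidate filter vectors are hypothetical values chosen only for this example.

```python
import numpy as np

def f1(v, X_A, X_B, a=1.0):
    """Objective of Eqs. (11)-(13), restricted to real-valued vectors."""
    v = np.asarray(v, dtype=float)
    norm2 = v @ v                                # |v|^2, i.e., the threshold w0
    u_A = (X_A @ v) ** 2 / norm2 - norm2         # output irradiance minus threshold
    u_B = (X_B @ v) ** 2 / norm2 - norm2
    # log(1 + exp(t)) is computed stably as logaddexp(0, t).
    return np.logaddexp(0.0, a * u_A).sum() + np.logaddexp(0.0, -a * u_B).sum()

# Hypothetical samples: class A has small amplitudes, class B large ones.
X_A = np.array([[0.5, 0.0], [-0.5, 0.1]])
X_B = np.array([[3.0, 0.2], [-3.0, 0.0]])

v_good = np.array([1.5, 0.0])   # threshold 2.25 lies between the two classes
v_bad = np.array([4.0, 0.0])    # threshold 16 exceeds every class B output

f1_good = f1(v_good, X_A, X_B)
f1_bad = f1(v_bad, X_A, X_B)
```

In this toy setting, `f1_good` is much smaller than `f1_bad`, because the second filter vector places the threshold above all class B outputs.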

We also define a second objective function to evaluate the output irradiance:

(14)$${f_2}({\boldsymbol v}) = 1 - \frac{{{{\left| {{{\boldsymbol v}^{\rm H}}{\bar{\boldsymbol x}}_{\rm B}^{({\rm s})}} \right|}^2}}}{{|{\bar{\boldsymbol x}}_{\rm B}^{({\rm s})}{|^2}|{\boldsymbol v}{|^2}}}.$$

Vector ${\bar{\boldsymbol x}}_{\rm B}^{{\rm (s)}}$ represents the mean vector of the training data belonging to class B. The notation $({\rm s})$ indicates that only the scattered field component is used; the reflected wave on the substrate surface is not included. The objective function is minimized when ${\boldsymbol v}$ is parallel to ${\bar{\boldsymbol x}}_{\rm B}^{{\rm (s)}}$. In this case, the scattered fields are almost in phase at the observation point. The reflected-field component is not included because it does not contain information for defect discrimination, and enhancing the irradiance of this component at the observation point would serve no purpose. The training data belonging to class A were not considered because the filter was trained such that the output irradiances for class A defects were small (less than the threshold irradiance).
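A short numerical sketch of Eq. (14) shows its two limiting cases. The vectors below are hypothetical; `xs_B_mean` stands in for ${\bar{\boldsymbol x}}_{\rm B}^{({\rm s})}$.

```python
import numpy as np

def f2(v, xs_B_mean):
    """Eq. (14): one minus the squared cosine between v and the class B mean."""
    overlap = abs(np.vdot(v, xs_B_mean)) ** 2            # |v^H x|^2
    return 1.0 - overlap / (np.vdot(xs_B_mean, xs_B_mean).real * np.vdot(v, v).real)

xs_B_mean = np.array([1.0 + 1.0j, 2.0, -1.0j])           # hypothetical mean scattered field

v_parallel = 3.0j * xs_B_mean                            # any complex multiple of the mean
v_orthogonal = np.array([2.0, -(1.0 - 1.0j), 0.0])       # chosen so that v^H x = 0

f2_parallel = f2(v_parallel, xs_B_mean)
f2_orthogonal = f2(v_orthogonal, xs_B_mean)
```

Because ${f_2}$ depends only on the direction of ${\boldsymbol v}$, it is unaffected by the threshold $|{\boldsymbol v}{|^2}$: it vanishes for a filter vector parallel to the mean scattered field and equals one for an orthogonal vector.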

B. Optimization Process

The filter vector is optimized by an adaptive gradient algorithm [28]. In the $m$-th step, the current filter vector ${{\boldsymbol v}_m}$ is updated to ${{\boldsymbol v}_{m + 1}}$ by

(15)$${v_{m + 1,i}} = {v_{m,i}} - \frac{\alpha}{{\sqrt {{h_{m + 1,i}}}}}{g_i},$$

(16)$${h_{m + 1,i}} = {h_{m,i}} + {\left| {{g_i}} \right|^2},$$

(17)$${g_i} = (1 - \beta)\frac{e}{{e + {f_2}}}\frac{{\partial {f_1}}}{{\partial {v_i}}} + \beta \frac{{{f_2}}}{{e + {f_2}}}\frac{{\partial {f_2}}}{{\partial {v_i}}}.$$

The coefficients $\alpha$ and $\beta (0 \le \beta \le 1)$ are the learning rate and weight parameters, respectively. When $\beta$ is close to zero, the effect of increasing discrimination accuracy is predominant. In contrast, when $\beta = 1$, the optimization process no longer increases the discrimination accuracy, but simply increases the output irradiance. The parameter $e$ $(0 \le e \le 1)$ is the discrimination error defined by

(18)$$e = \frac{{\rm N}\{{\boldsymbol x}_{{\rm A}n} \mid |{\boldsymbol w}^{\rm H}{\boldsymbol x}_{{\rm A}n}|^2 \ge w_0\} + {\rm N}\{{\boldsymbol x}_{{\rm B}n} \mid |{\boldsymbol w}^{\rm H}{\boldsymbol x}_{{\rm B}n}|^2 \lt w_0\}}{{\rm N}\{{\boldsymbol x}_{{\rm A}n}\} + {\rm N}\{{\boldsymbol x}_{{\rm B}n}\}},$$

where ${\rm N}\{S\}$ represents the number of elements in set $S$. Factors $e/(e + {f_2})$ and ${f_2}/(e + {f_2})$ balance the contributions of the two objective functions. For example, when $e$ is greater than ${f_2}$, the optimization process progresses to increase the discrimination accuracy. The derivative $\partial {f_1}/\partial {\boldsymbol v}$ is given by

(19)$$\frac{{\partial {f_1}({\boldsymbol v})}}{{\partial {\boldsymbol v}}} = \sum\limits_{n = 1}^{{N_A}} \frac{{\partial {g_{{\rm A}n}}({\boldsymbol v})}}{{\partial {\boldsymbol v}}} + \sum\limits_{n = 1}^{{N_B}} \frac{{\partial {g_{{\rm B}n}}({\boldsymbol v})}}{{\partial {\boldsymbol v}}},$$

(20)$$\begin{split}\frac{{\partial {g_{{\rm A}n}}({\boldsymbol v})}}{{\partial {\boldsymbol v}}}&= - 2a\frac{1}{{1 + \exp \left[{- a\left({\frac{{|{{\boldsymbol v}^T}{{\boldsymbol x}_{{\rm A}n}}{|^2}}}{{|{\boldsymbol v}{|^2}}} - |{\boldsymbol v}{|^2}} \right)} \right]}}\\&\quad\times\left({{\boldsymbol v} + \frac{{|{{\boldsymbol v}^T}{{\boldsymbol x}_{{\rm A}n}}{|^2}}}{{|{\boldsymbol v}{|^4}}}{\boldsymbol v} - \frac{{{{({{\boldsymbol v}^{\rm T}}{{\boldsymbol x}_{{\rm A}n}})}^*}}}{{|{\boldsymbol v}{|^2}}}{{\boldsymbol x}_{{\rm A}n}}} \right),\end{split}$$

(21)$$\begin{split}\frac{{\partial {g_{{\rm B}n}}({\boldsymbol v})}}{{\partial {\boldsymbol v}}} &= 2a\frac{1}{{1 + \exp \left[{a\left({\frac{{|{{\boldsymbol v}^T}{{\boldsymbol x}_{{\rm B}n}}{|^2}}}{{|{\boldsymbol v}{|^2}}} - |{\boldsymbol v}{|^2}} \right)} \right]}}\\&\quad\times\left({{\boldsymbol v} + \frac{{|{{\boldsymbol v}^T}{{\boldsymbol x}_{{\rm B}n}}{|^2}}}{{|{\boldsymbol v}{|^4}}}{\boldsymbol v} - \frac{{{{({{\boldsymbol v}^{\rm T}}{{\boldsymbol x}_{{\rm B}n}})}^*}}}{{|{\boldsymbol v}{|^2}}}{{\boldsymbol x}_{{\rm B}n}}} \right),\end{split}$$

and $\partial {f_2}/\partial {\boldsymbol v}$ is given by

(22)$$\frac{\partial f_2({\boldsymbol v})}{\partial {\boldsymbol v}} = - 2\frac{({\boldsymbol v}^{\rm T}{\bar{\boldsymbol x}}_{\rm B}^{({\rm s})})^*}{|{\bar{\boldsymbol x}}_{\rm B}^{({\rm s})}|^2|{\boldsymbol v}|^2}{\bar{\boldsymbol x}}_{\rm B}^{({\rm s})} + 2\frac{|{\boldsymbol v}^{\rm T}{\bar{\boldsymbol x}}_{\rm B}^{({\rm s})}|^2}{|{\bar{\boldsymbol x}}_{\rm B}^{({\rm s})}|^2|{\boldsymbol v}|^4}{\boldsymbol v}.$$
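One update of Eqs. (15)–(17) can be sketched as below for real-valued vectors. This is an illustrative sketch rather than the authors' implementation: the toy samples are hypothetical, the class B mean is used in place of ${\bar{\boldsymbol x}}_{\rm B}^{({\rm s})}$ (i.e., the reflected component is assumed to be absent), the discrimination error counts misclassified samples, and the small constant `tiny` is an added guard against division by zero that does not appear in the equations.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def f2(v, xs_B):
    return 1.0 - (v @ xs_B) ** 2 / ((xs_B @ xs_B) * (v @ v))

def grad_f1(v, X_A, X_B, a=1.0):
    """Eqs. (19)-(21) for real-valued vectors."""
    n2 = v @ v
    g = np.zeros_like(v)
    for X, s in ((X_A, 1.0), (X_B, -1.0)):    # s = +1 for class A, -1 for class B
        p = X @ v                              # v^T x_n for each sample
        u = p ** 2 / n2 - n2                   # output irradiance minus threshold
        bracket = (1.0 + p[:, None] ** 2 / n2 ** 2) * v - p[:, None] * X / n2
        g += (-s * 2.0 * a * sigmoid(s * a * u)[:, None] * bracket).sum(axis=0)
    return g

def grad_f2(v, xs_B):
    """Eq. (22) for real-valued vectors."""
    p, nx, n2 = v @ xs_B, xs_B @ xs_B, v @ v
    return -2.0 * p * xs_B / (nx * n2) + 2.0 * p ** 2 * v / (nx * n2 ** 2)

def error_rate(v, X_A, X_B):
    """Fraction of misclassified training samples for the filter encoded by v."""
    n2 = v @ v
    wrong_A = np.sum((X_A @ v) ** 2 / n2 >= n2)   # class A should stay below w0
    wrong_B = np.sum((X_B @ v) ** 2 / n2 < n2)    # class B should reach w0
    return (wrong_A + wrong_B) / (len(X_A) + len(X_B))

def adagrad_step(v, h, X_A, X_B, xs_B, alpha=0.1, beta=0.1, a=1.0, tiny=1e-12):
    e, f = error_rate(v, X_A, X_B), f2(v, xs_B)
    denom = e + f + tiny                                   # guard (assumption)
    g = ((1.0 - beta) * e / denom * grad_f1(v, X_A, X_B, a)
         + beta * f / denom * grad_f2(v, xs_B))            # Eq. (17)
    h = h + g ** 2                                         # Eq. (16)
    return v - alpha * g / (np.sqrt(h) + tiny), h          # Eq. (15)

# Hypothetical toy data; the second class A sample is misclassified by v0.
X_A = np.array([[0.5, 0.0], [2.5, 0.1]])
X_B = np.array([[3.0, 0.2], [3.5, -0.1]])
xs_B = X_B.mean(axis=0)
v0, h0 = np.array([1.5, 0.0]), np.zeros(2)

v1, h1 = adagrad_step(v0, h0, X_A, X_B, xs_B)
```

Calling `adagrad_step` repeatedly with the returned `v` and `h` gives the iteration. With ${h_0} = 0$, the first step moves every component of ${\boldsymbol v}$ by $\alpha$ in magnitude, which is a known property of the adaptive gradient update.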

The initial value ${{\boldsymbol v}_0}$ is set such that the output irradiances for convex defects can be maximized. This is achieved when ${\boldsymbol w}$ is parallel to ${\bar{\boldsymbol x}}_{\rm B}^{{\rm (s)}}$. The threshold value is set such that the decision line is between ${{\bar{\boldsymbol x}}_{\rm A}}$ and ${{\bar{\boldsymbol x}}_{\rm B}}$. Consequently, we set ${{\boldsymbol v}_0}$ as

(23)$${{\boldsymbol v}_0} = \sqrt {\frac{{{\bar{\boldsymbol x}}_{\rm B}^{{\rm (s)H}}}}{{|{\bar{\boldsymbol x}}_{\rm B}^{{\rm (s)}}|}}\frac{{{{{\bar{\boldsymbol x}}}_{\rm A}} + {{{\bar{\boldsymbol x}}}_{\rm B}}}}{2}} \frac{{{\bar{\boldsymbol x}}_{\rm B}^{{\rm (s)}}}}{{|{\bar{\boldsymbol x}}_{\rm B}^{{\rm (s)}}|}}.$$
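Eq. (23) can be transcribed literally as follows. The mean vectors below are hypothetical placeholders, and the principal complex square root is an assumption made so that the expression is defined for complex data.

```python
import numpy as np

def initial_filter_vector(xs_B_mean, x_A_mean, x_B_mean):
    """Literal transcription of Eq. (23)."""
    direction = xs_B_mean / np.linalg.norm(xs_B_mean)        # unit vector along the class B scattered mean
    midpoint = (x_A_mean + x_B_mean) / 2                     # point between the two class means
    projection = np.vdot(direction, midpoint)                # direction^H midpoint
    return np.sqrt(complex(projection)) * direction          # principal square root (assumption)

# Hypothetical mean vectors for a two-element filter.
xs_B_mean = np.array([3.0, 0.4])
x_A_mean = np.array([1.0, 0.1])
x_B_mean = np.array([4.0, 0.5])

v0 = initial_filter_vector(xs_B_mean, x_A_mean, x_B_mean)
```

By construction, `v0` is parallel to the class B scattered mean, so ${f_2}({{\boldsymbol v}_0}) = 0$, and its squared norm equals the magnitude of the projected midpoint.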

4. DISCRIMINATION SIMULATION

In practical use, the OLDF is designed using a training dataset prepared through a numerical scattering simulation, and the discrimination process is performed using actual samples and the fabricated OLDF. In this study, however, we numerically performed the discrimination process using a validation dataset that comprised samples different from those in the training dataset.

The training and validation datasets were prepared as follows. The defect cross-section is shown in Fig. 3. The defect width $d$ is defined by the size in $\xi$ direction at the sample surface. The defect height $h$ is the length in $\eta$ direction from the substrate surface to the top (bottom) of the defect. The negative and positive heights represent concave and convex defects, respectively. Both the training and validation datasets contained defects with sizes in the ranges of $0.1\lambda \le d \le 2\lambda$ and ${-}\lambda \le h \le \lambda$. The training and validation data are uniformly distributed within these ranges. We equally divided these ranges into $M$ sections for both width and height and selected one defect size from each section using uniform random numbers. $M$ was set to 20 for the training dataset and 30 for the validation dataset; thus, the training and validation datasets included 400 and 900 data points, respectively. After determining $d$ and $h$, a parabola was drawn on the defect surface and divided into small boundary elements with lengths less than $\lambda /140$. The element node was then randomly shifted in a direction perpendicular to the parabola. The shift amount of each node was determined by a normal random number ${\cal N}(0,\sigma)$. The parameter $\sigma$ was determined such that the entire length of the defect perimeter was 10% longer than that of the original parabola.
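The size sampling described above can be sketched as stratified sampling on an $M \times M$ grid, which is one reading consistent with the stated counts of 400 and 900 samples. The function name and the random seed are assumptions; sizes are expressed in units of $\lambda$.

```python
import numpy as np

def sample_defect_sizes(M, rng, d_range=(0.1, 2.0), h_range=(-1.0, 1.0)):
    """Draw one (d, h) pair uniformly from each cell of an M x M grid."""
    d_edges = np.linspace(d_range[0], d_range[1], M + 1)
    h_edges = np.linspace(h_range[0], h_range[1], M + 1)
    i, j = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
    d = rng.uniform(d_edges[i], d_edges[i + 1])   # width within the i-th section
    h = rng.uniform(h_edges[j], h_edges[j + 1])   # height within the j-th section
    return np.column_stack([d.ravel(), h.ravel()])

rng = np.random.default_rng(0)                    # arbitrary seed
training_sizes = sample_defect_sizes(20, rng)     # 20 x 20 = 400 samples
validation_sizes = sample_defect_sizes(30, rng)   # 30 x 30 = 900 samples
```

Rows with negative height correspond to concave defects and rows with positive height to convex defects.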

Fig. 3. Cross section of (a) concave and (b) convex defects.

The scattered wave radiated from the defect was computed by the difference-field boundary element method (DFBEM) [29], which is a modified version of the boundary element method (BEM) [30]. The BEM calculates the scattered wave using the path integral of the fields on the dielectric interfaces and Green’s function. The DFBEM provides a rigorous scattered field from semi-infinite samples without infinite path integrals on the sample surface. Assuming that the aberration of the objective lens is removed, the scattered field on the OLDF corresponds to the far field of the scattered field because the sample surface and OLDF are located on the focal planes of the objective lens. The far field was calculated using an asymptotic expression of the Green’s function at a far distance.

The reflected wave component on the OLDF was computed analytically as a Gaussian beam. The beam waist size was set to $40\lambda$. The training and validation datasets ${\boldsymbol x}$ were obtained from the sum of the scattered and reflected fields on the OLDF.

According to the procedure described in the previous section, the optimization process was performed in 10,000 steps. The constant $a$ in ${f_1}$ [Eqs. (12) and (13)] was set to 1.0, and the learning rate was set to $\alpha = 0.1$. The weight coefficient $\beta$ was varied as 0.00, 0.01, 0.05, 0.10, 0.20, and 0.50. The values ${f_1}$, ${f_2}$, and $e$ for each $\beta$ are shown in Fig. 4. As the objective function ${f_1}$ decreases, the discrimination error decreases. At the 10,000-th step, the discrimination error and ${f_2}$ were $e = 0.0475$ and ${f_2} = 0.934$, respectively, for $\beta = 0.00$. As $\beta$ increased, the discrimination error increased and ${f_2}$ decreased; when $\beta = 0.50$, $e = 0.212$ and ${f_2} = 0.0975$.

Fig. 4. (a) ${f_1}$, (b) ${f_2}$, and (c) discrimination error for each iteration step for $\beta = 0$, 0.05, and 0.5.

Using the validation dataset, we computed the discrimination accuracy and irradiance distribution on the output plane. Figure 5 presents the discrimination error for the validation dataset after the 10,000-th iteration step. The error increases with an increase in $\beta$, as observed with the training dataset. We selected two samples with widths of approximately $0.5\lambda$: $(d,h) = (0.5350\lambda , - 0.5656\lambda)$ (concave defect) and $(d,h) = (0.5110\lambda ,0.4674\lambda)$ (convex defect), and plotted the irradiance distribution on the output plane in Fig. 6. The irradiance was normalized such that the reflected wave component after passing through a unit filter (${w_i} = 1$ for all $i$-values) was unity. The observation point $\xi _{\rm o}^\prime $ was ${10^4}\lambda$. Compared to the results for $\beta = 0.01$, the irradiance for $\beta = 0.50$ was enhanced over the entire output plane.

Fig. 5. Discrimination error of the validation dataset.

Fig. 6. Irradiance distribution of the concave ($(d,h) = (0.5350\lambda , - 0.5656\lambda)$) and convex defects ($(d,h) = (0.5110\lambda ,0.4674\lambda)$) for the OLDF with (a) $\beta = 0$, (b) 0.05, and (c) 0.5.

5. DISCUSSION

Based on Fisher’s LDA [31], we designed and evaluated a conventional OLDF with the same training dataset used for our proposed method. The discrimination error of the validation data was 3.556%. Figure 7 presents the irradiance distributions on the output plane for $(d,h) = (0.5350\lambda , - 0.5656\lambda)$ and $(d,h) = (0.5110\lambda ,0.4674\lambda)$ (the same samples as plotted in Fig. 6). The irradiance at $\xi _{\rm o}^\prime $ was less than 0.1 times that achieved by the OLDF designed using the proposed algorithm. Moreover, the irradiance at the observation point was less than 0.1% of the peak irradiance at $\xi _{\rm o}^\prime \pm 20\lambda$. The relationship between the defect height and irradiance at $\xi _{\rm o}^\prime $ is plotted in Fig. 8(a); the irradiances are biased toward the lower side for the concave defects ($h \lt 0$) and the higher side for the convex defects ($h \gt 0$). By contrast, the peak irradiance at $\xi _{\rm o}^\prime + 20\lambda$ is no longer biased, as shown in Fig. 8(b). This is because the amplitude at this point is not the inner product of ${\boldsymbol x}$ and the optimized ${\boldsymbol w}$ described in Eq. (6). In practice, when we measure the irradiance at $\xi _{\rm o}^\prime $ using a finite-sized photodetector, the discrimination accuracy decreases considerably if even 0.1% of the irradiance at $\xi _{\rm o}^\prime + 20\lambda$ is included.

Fig. 7. Irradiance distribution of the concave ($(d,h) = (0.5350\lambda , - 0.5656\lambda)$) and convex defects ($(d,h) = (0.5110\lambda ,0.4674\lambda)$) for the OLDF based on Fisher’s LDA.

Fig. 8. Relationship between the defect height and irradiance at (a) $\xi _{\rm o}^\prime $ and (b) $\xi _{\rm o}^\prime + 20\lambda$.

Next, we evaluated the discrimination accuracy using the SDF introduced in Section 1. The training and test datasets were the same as those used to evaluate the proposed algorithm. The overall discrimination error was 2.000%. The irradiance distributions on the output plane for $(d,h) = (0.5350\lambda , - 0.5656\lambda)$ and $(d,h) = (0.5110\lambda ,0.4674\lambda)$ are shown in Fig. 9. The irradiance distribution expanded over the entire output plane, and the irradiance at $\xi _{\rm o}^\prime $ was only ${10^{- 11}}$ times that around the observation point. Consequently, in practical use, the discrimination accuracy decreases considerably because of even a small amount of crosstalk with adjacent irradiances.

Fig. 9. Irradiance distribution of the concave ($(d,h) = (0.5350\lambda , - 0.5656\lambda)$) and convex defects ($(d,h) = (0.5110\lambda ,0.4674\lambda)$) for the SDF.

Fig. 10. Simple dataset of ${\boldsymbol x} \in {\mathbb{R}^2}$ and ideal decision boundary (dashed line) determined by Fisher’s LDA. The vector ${\boldsymbol v}$ is the filter vector corresponding to the ideal decision boundary.

To visualize the optimization process and compare the results with those of the conventional method, we prepared a simple dataset with $N = 2$ as shown in Fig. 10; the samples that belong to classes A (${{\boldsymbol x}_{{\rm A}n}}$, $n = 1,2,3,4$) and B (${{\boldsymbol x}_{{\rm B}n}}$, $n = 1,2,3,4$) are plotted with filled and open circles, respectively. One feature of this dataset is that the norm $|{{\boldsymbol x}_n}|$ varies but its direction ${{\boldsymbol x}_n}/|{{\boldsymbol x}_n}|$ is almost constant. The difference between ${{\boldsymbol x}_{{\rm A}n}}/|{{\boldsymbol x}_{{\rm A}n}}|$ and ${{\boldsymbol x}_{B{n^\prime}}}/|{{\boldsymbol x}_{B{n^\prime}}}|$ is small. These characteristics correspond to samples whose scattered irradiances vary, but whose directional patterns are almost constant. The decision boundary determined using Fisher’s LDA is shown in Fig. 10 with a dashed line. The within-class variance ($|{{\boldsymbol v}^{\rm H}}{\boldsymbol x}|/|{\boldsymbol v}|$) was almost zero for both classes A and B. The vector ${\boldsymbol v}$ can perfectly discriminate between the classes of samples. However, because the vector ${\boldsymbol v}$ is almost orthogonal to ${{\bar{\boldsymbol x}}_{\rm B}}$, $|{{\boldsymbol w}^{\rm H}}{{\bar{\boldsymbol x}}_{\rm B}}|$ for class B is much shorter than the norm of ${{\bar{\boldsymbol x}}_{\rm B}}$. This implies that the energy of the scattered wave barely reaches the observation point.
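The geometric argument above can be reproduced with a small numerical sketch. The dataset below is hypothetical (it is not the dataset of Fig. 10): both classes have norms from 1 to 4, and their directions differ by only 4 degrees. The Fisher direction is computed with the textbook formula ${\boldsymbol S}_{\rm w}^{- 1}({{\bar{\boldsymbol x}}_{\rm B}} - {{\bar{\boldsymbol x}}_{\rm A}})$, where ${\boldsymbol S}_{\rm w}$ is the within-class scatter matrix.

```python
import numpy as np

def fisher_direction(X_A, X_B):
    """Textbook Fisher LDA direction: S_w^{-1} (m_B - m_A)."""
    m_A, m_B = X_A.mean(axis=0), X_B.mean(axis=0)
    S_w = (X_A - m_A).T @ (X_A - m_A) + (X_B - m_B).T @ (X_B - m_B)
    return np.linalg.solve(S_w, m_B - m_A)

# Hypothetical samples: varying norm, nearly constant direction.
radii = np.array([1.0, 2.0, 3.0, 4.0])
ang_A, ang_B = np.deg2rad(43.0), np.deg2rad(47.0)
X_A = radii[:, None] * np.array([np.cos(ang_A), np.sin(ang_A)])
X_B = radii[:, None] * np.array([np.cos(ang_B), np.sin(ang_B)])

w = fisher_direction(X_A, X_B)
w /= np.linalg.norm(w)                 # unit-norm weight vector

m_B = X_B.mean(axis=0)
# Fraction of the class B mean amplitude that reaches the observation point.
coupling = abs(w @ m_B) / np.linalg.norm(m_B)
```

For this toy dataset, the signed projections of the two classes are perfectly separated, yet `coupling` is only a few percent, which mirrors the low output irradiance discussed above.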

Fig. 11. Filter vectors (blue arrows) and decision boundaries (dashed line) for (a) $\beta = 0$ and (b) $\beta = 0.5$. A white circle with a number $m$ represents ${{\boldsymbol v}_m}$.

The color map in Fig. 11 presents the distribution of ${f_1}({\boldsymbol v})$. The blue and white circles in the figure represent ${{\boldsymbol v}_m}(m = 0,20,40, \cdots ,5000)$, which were obtained by the proposed algorithm. When $\beta = 0$ [Fig. 11(a)], ${{\boldsymbol v}_m}$ converges such that ${f_1}({{\boldsymbol v}_m})$ is minimized. The decision boundary [the dashed line in Fig. 11(a)] is aligned to separate ${{\boldsymbol x}^{(A)}}$ from ${{\boldsymbol x}^{(B)}}$. When $\beta = 0.5$ [Fig. 11(b)], ${{\boldsymbol v}_m}$ converges to the minimum value of ${f_1}$; however, the angle between ${\boldsymbol x}$ and ${\boldsymbol v}$ is smaller and the output irradiance is larger than that observed for $\beta = 0$. The weight parameter is effective for adjusting the balance between discrimination accuracy and output irradiance. However, the margins between the decision boundary and some values of ${{\boldsymbol x}^{(A)}}$ and ${{\boldsymbol x}^{(B)}}$ are significantly small for $\beta = 0$. The discrimination accuracy may be reduced for deformed defects or noisy observational data. The objective function ${f_1}$ remains to be improved, such that ${\boldsymbol v}$ for the minimum ${f_1}$ is closer to that provided by Fisher’s LDA.

6. CONCLUSION

We proposed a design algorithm for the optical processing of LDA with enhanced output irradiance and high discrimination accuracy. The filter transmittance and threshold irradiance were optimized using two objective functions, one for increasing the discrimination accuracy and the other for enhancing the output irradiance. We prepared defect datasets with rough parabolic shapes and evaluated the proposed algorithm using numerical simulations. The discrimination error was 4.750%, which is comparable to that of the conventional OLDF designed based on Fisher’s LDA. However, the output irradiance was enhanced by more than 10 times. In addition, the high-irradiance peaks near the observation point were removed, which also reduces the loss of discrimination accuracy caused by crosstalk with such peaks.

The OLDF designed by the proposed algorithm is suitable for discriminating samples in which the structural difference between the two classes is small and their scattered irradiances vary but have almost constant scattered direction patterns. In addition to fine defects on the dielectric substrate, the proposed algorithm is effective in discriminating slight differences in the image patterns.

Funding

Japan Society for the Promotion of Science (23K11258).

Disclosures

The authors have no conflicts of interest to declare.

Data availability

The data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

REFERENCES

1. B. Bhaduri, C. Edwards, H. Pham, et al., “Diffraction phase microscopy: principles and applications in materials and life sciences,” Adv. Opt. Photon. 6, 57–119 (2014).

2. S. Ajithaprasad, R. Velpula, and R. Gannavarpu, “Defect detection using windowed Fourier spectrum analysis in diffraction phase microscopy,” J. Phys. Commun. 3, 25006 (2019).

3. S. Verma, S. S. Sarma, R. Dhar, et al., “Scratch enhancement and measurement in periodic and non-periodic optical elements using digital holography,” Optik 126, 3283–3287 (2015).

4. M. A. Schulze, M. A. Hunt, E. Voelkl, et al., “Semiconductor wafer defect detection using digital holography,” Proc. SPIE 5041, 183–193 (2003).

5. M. H. Madsen and P.-E. Hansen, “Scatterometry-fast and robust measurements of nano-textured surfaces,” Surf. Topogr. Metrol. Prop. 4, 023003 (2016).

6. H. Sekiguchi and H. Shirai, “Electromagnetic scattering analysis for crack depth estimation,” IEICE Trans. Electron. 86, 2224–2229 (2003).

7. M. Liu, C. F. Cheung, N. Senin, et al., “On-machine surface defect detection using light scattering and deep learning,” J. Opt. Soc. Am. A 37, B53–B59 (2020).

8. H.-D. Lin, “Automated defect inspection of light-emitting diode chips using neural network and statistical approaches,” Expert Syst. Appl. 36, 219–226 (2009).

9. H. Chen, Y. Pang, Q. Hu, et al., “Solar cell surface defect inspection based on multispectral convolutional neural network,” J. Intell. Manuf. 31, 453–468 (2020).

10. J. Jiang, P. Cao, Z. Lu, et al., “Surface defect detection for mobile phone back glass based on symmetric convolutional neural network deep learning,” Appl. Sci. 10, 3621 (2020).

11. K.-C. C. Chien and H.-Y. Tu, “Complex defect inspection for transparent substrate by combining digital holography with machine learning,” J. Opt. 21, 85701 (2019).

12. L. Li, D. Liu, P. Cao, et al., “Automated discrimination between digs and dust particles on optical surfaces with dark-field scattering microscopy,” Appl. Opt. 53, 5131–5140 (2014).

13. J. Jiang, X. Xiao, G. Feng, et al., “Detection and classification of glass defects based on machine vision,” Proc. SPIE 11102, 1110210 (2019).

14. F. Wu, Y. Yang, J. Jiang, et al., “Classification between digs and dust particles on optical surfaces with acquisition and analysis of polarization characteristics,” Appl. Opt. 58, 1073–1083 (2019).

15. Y. Yang, H. Chai, C. Li, et al., “Surface defects evaluation system based on electromagnetic model simulation and inverse-recognition calibration method,” Opt. Commun. 390, 88–98 (2017).

16. A. Vander Lugt, F. Rotz, and A. Klooster Jr., “Character reading by optical spatial filtering,” in Optical and Electro-Optical Information Processing (1965), pp. 125–141.

17. R. Fusek, L. Lin, K. Harding, et al., “Holographic optical processing for submicron defect detection,” Proc. SPIE 523, 54–59 (1985).

18. C. Uhrich and L. Hesselink, “Submicrometer defect detection in periodic structures by photorefractive holography: system design and performance,” Appl. Opt. 33, 744–757 (1994).

19. A. Kozma and D. L. Kelly, “Spatial filtering for detection of signals submerged in noise,” Appl. Opt. 4, 387–392 (1965).

20. D. Casasent, “Coherent optical pattern recognition: a review,” Opt. Eng. 24, 240126 (1985).

21. Z.-H. Gu and S. H. Lee, “Recognition of images of Markov-1 model by least-squares linear mapping technique,” Appl. Opt. 23, 822–827 (1984).

22. Z. Bahri and B. V. Kumar, “Generalized synthetic discriminant functions,” J. Opt. Soc. Am. A 5, 562–571 (1988).

23. B. V. Kumar, “Minimum-variance synthetic discriminant functions,” J. Opt. Soc. Am. A 3, 1579–1584 (1986).

24. D. Casasent, “Unified synthetic discriminant function computational formulation,” Appl. Opt. 23, 1620–1627 (1984).

25. D. Casasent and W.-T. Chang, “Correlation synthetic discriminant functions,” Appl. Opt. 25, 2343–2350 (1986).

26. D. Wu, X. Sun, J. Wang, et al., “Distortion-invariant recognition of volume holographic correlator based on morphological algorithm and synthetic discriminant function,” Optik 124, 508–511 (2013).

27. X. Lin, Y. Rivenson, N. T. Yardimci, et al., “All-optical machine learning using diffractive deep neural networks,” Science 361, 1004–1008 (2018).

28. J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” J. Mach. Learn. Res. 12, 2121–2159 (2011).

29. J. Sugisaka, T. Yasui, and K. Hirayama, “Expansion of the difference-field boundary element method for numerical analyses of various local defects in periodic surface-relief structures,” J. Opt. Soc. Am. A 32, 751–763 (2015).

30. N. Kumagai, N. Morita, and J. R. Mautz, Integral Equation Methods for Electromagnetics (Artech House Antenna Library, 1990).

31. J. Sugisaka, T. Yasui, and K. Hirayama, “Design of an optical linear discriminant filter for classification of subwavelength concave and convex defects on dielectric substrates,” J. Opt. Soc. Am. A 39, 342–351 (2022).

Journal of the Optical Society of America A