
Adaptive penalty method with an Adam optimizer for enhanced convergence in optical waveguide mode solvers

Open Access

Abstract

We propose a cutting-edge penalty method for optical waveguide mode solvers, integrating the Adam optimizer into pseudospectral frequency-domain (PSFD) frameworks. This strategy enables adaptable boundary fluctuations at material interfaces, significantly enhancing numerical convergence and stability. The Adam optimizer, an adaptive algorithm, is deployed to determine the penalty coefficient, greatly improving convergence rates and robustness while effectively incorporating boundary conditions into the interfaces of subdomains. Our solver evaluates the numerical performance of optical waveguides by calculating effective indices of standard benchmark waveguides with high accuracy. This method diminishes numerical boundary errors and provides a marked increase in convergence speed and superior accuracy when compared to conventional methods and even metaheuristic optimization methods, all while maintaining the inherent global spectral accuracy of the PSFD.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Significant strides have been made in the development of eigenmode solvers for optical waveguide modes [1-4] and photonic crystal band diagrams [5,6] using numerical methods. Among these, the pseudospectral frequency-domain (PSFD) method [2-4,6] has become increasingly popular due to its high precision and rapid convergence. In contrast to the other techniques, the PSFD method uses non-uniform Chebyshev-Gauss-Lobatto grid points and corresponding boundary conditions, such as Dirichlet- and Neumann-type boundary conditions (DBCs and NBCs), at curved dielectric interfaces. This method offers benefits such as a natural concentration of points near material interfaces, which is particularly advantageous when electromagnetic fields vary strongly near interfaces [7]. Hence, the PSFD method represents a compelling numerical approach to optical problems, supplementing other techniques in the field.

Boundary conditions (DBCs and NBCs) at interfaces are crucial for the performance of both the PSFD and other numerical methods. In Refs. [2-4,6], DBCs and NBCs are enforced between adjacent subdomains when the subdomains are assembled into the final matrix eigenvalue equation, an approach known as the strong boundary method (SBM). Conversely, the finite element method (FEM) solver employs a penalty method (PM) [8] with a penalty coefficient $\alpha$ to enforce the divergence-free condition and supply the necessary boundary conditions, effectively eliminating spurious solutions. However, the selection of $\alpha$ has not been extensively addressed.

Funaro and Gottlieb [9] established and validated the value of $\alpha$ for pseudospectral methods based on Chebyshev approximation to ensure the stability and accuracy of the numerical scheme. They also suggested an intuitive trial-and-error rule relating $\alpha$ to the square of the grid number of the subdomain. However, the accuracy and convergence of this approach may deteriorate in more complex structures. By employing artificial intelligence methods, the optimal penalty coefficient can be determined efficiently, leading to enhanced accuracy and convergence for optical waveguide mode solvers in intricate structures.

Artificial intelligence (AI) and deep neural networks have exhibited extraordinary performance in a variety of fields, including computer vision, natural language processing, and speech recognition [10-15]. These techniques have also been applied to optics and photonics, leading to significant advancements in the design, simulation, and optimization of optical components. Deep learning algorithms have further been utilized to solve inverse problems in optics, such as image deblurring, denoising, and super-resolution [16]. Yet, these techniques have their limitations, including the need for large datasets and considerable time investment in preprocessing to select useful data.

In this work, a novel method is presented that eliminates the need for comprehensive and complex data collection. A penalty method for the pseudospectral frequency domain (PM-PSFD) is introduced. This approach effectively imposes boundary conditions at material interfaces, aiding the modeling of practical optical components. By converting the two-dimensional Helmholtz eigenvalue problem into an unconstrained optimization form [17-20], an explicit gradient relationship for the penalty factor $\alpha$ can be derived. The Adam optimizer [18-21], an adaptive optimization technique inspired by artificial intelligence, is then leveraged to identify the optimal $\alpha$ value for the pseudospectral solution of the two-dimensional Helmholtz equation. This strategy determines the optimal penalty coefficient in just a few iterations, leading to the successful implementation of the PM-PSFD method.

The efficiency of the PM-PSFD method is assessed by analyzing a simple circular waveguide using different penalty factor values, compared with the optimal penalty factor obtained iteratively through the Adam optimizer. The iterative method is then used to model a symmetrical $2\times 2$ strongly fused fiber-optic coupler with a single sharp corner, a rib waveguide with multiple sharp corners, and a photonic crystal fiber with numerous subdomains. To verify the suitability of the Adam optimizer over other advanced algorithms, such as the Artificial Gorilla Troops Optimizer (GTO) (a metaheuristic optimization method) [22], a comparison is conducted using a surface plasmon structure [23] with extreme boundary mode variations.

The remainder of this paper is organized as follows: Section 2 reviews relevant optimization methods. Section 3 elaborates on the PM-PSFD method and the Adam optimizer implementation. Section 4 tests the approach on a circular waveguide, a $2\times 2$ strongly fused fiber-optical coupler, a rib waveguide, and a photonic crystal fiber, and then compares the Adam optimizer with the GTO on a surface plasmon structure, confirming Adam's suitability for the PM-PSFD. Lastly, Section 5 summarizes our findings and suggests future research directions.

2. Related work

AI advancement is heavily tied to the integration of various optimization techniques, primarily adaptive and metaheuristic methods. Adaptive methods, encompassing techniques like gradient descent [24], Adam [25,26], and AdaGrad [27], necessitate first-order gradient differentiation. Conversely, metaheuristic methods, such as the Farmland Fertility algorithm [28], African Vultures Optimization Algorithm [29,30], and GTO, are free from this gradient differentiation requirement.

This freedom allows metaheuristic optimization methods to serve in a wide range of domains, such as COVID-19 detection, energy management, and more [31-37]. In contrast, adaptive methods excel in machine learning and deep learning contexts, streamlining convergence in complex neural network architectures [24-27,38].

Initial adaptive optimization methods like SGD [39] depended heavily on an appropriate learning rate. Adaptive learning-rate methods such as AdaGrad [27], RMSProp [40], and Adam [25,26] were developed to address this, adjusting the learning rate for each network weight. However, these methods are not without challenges, particularly in managing rapid convergence and overfitting in deep learning.

Despite the challenges, the evolution of these techniques persists, as shown by the recent AdaBelief optimizer [41], a variation of Adam, designed to curtail premature convergence and enhance model generalization. It highlights the potential for innovation and performance boosting in adaptive optimization techniques.

3. Formulation

3.1 PM-PSFD with Adam

Consider the standard two-dimensional Helmholtz equation in a uniform material region:

$$\left\{\begin{array}{cc}{\left(\frac{\partial^2}{\partial x^2}+\frac{\partial^2}{\partial y^2}+{k_0}^2{n^2}\right) \vec{H} \left(x,y\right)=\beta^2 \vec{H} \left(x,y\right)} & \\\vec{H}\left(x,y\right)={\textbf{BC}} & \texttt{on}\,\,\,\, \Omega \end{array} \right.$$
where $n$ is the refractive index, $k_0$ is the free-space wave number, $\vec {H}(x,y)$ is the magnetic field vector, $\beta$ is the modal propagation constant, and BC denotes boundary conditions on the boundary $\Omega$. As in [3], (1) can be formulated into a matrix eigenvalue problem which can be solved by available numerical techniques. In this SBM procedure, (1) is first transformed into the matrix problem:
$$\left\{\begin{array}{cc}{\bar{\bar{A}}\bar{H}=\beta^2\bar{H}} & \\ {\bar{\bar{B}}\bar{H}=0} & \texttt{on}\,\,\,\, \Omega \end{array} \right.$$
where $\bar {\bar {A}}$ is the operator matrix given in Eq. (26) of [3], and $\bar {\bar {B}}$ consists of DBCs or NBCs on $\Omega$. Then, $\bar {\bar {A}}$ is directly replaced by $\bar {\bar {B}}$ on $\Omega$ and $\beta ^2\bar {H}$ is forced to zero there, as detailed in the Appendix of [4]. In the PM-PSFD, $\bar {\bar {B}}$ is instead coupled into $\bar {\bar {A}}$ through the penalty number $\alpha$, i.e.,
$$\begin{aligned} &(\bar{\bar{A}}+\delta\alpha\bar{\bar{B}})\bar{H}=\beta^2\bar{H}\\ &\delta=\left\{\begin{array}{cc}1 & \texttt{on}\,\,\,\, \Omega \\0 & \texttt{otherwise}. \end{array} \right. \end{aligned}$$

When $\alpha \rightarrow \infty$, from (3) we have

$$\bar{\bar{B}}\bar{H}=\lim_{\alpha\rightarrow \infty}\frac{1}{\alpha}\left(-\bar{\bar{A}}+\beta^2\right)\bar{H}=0.$$

Therefore, according to (3) and (4), we note that the PM approaches the SBM when letting $\alpha \rightarrow \infty$.
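To make the coupling in Eq. (3) concrete, the following minimal NumPy sketch assembles the penalized operator from an interior operator matrix A and a boundary-condition matrix B; the matrix shapes, the boundary_mask selector (playing the role of $\delta$), and the dense eigensolver are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def penalized_operator(A, B, boundary_mask, alpha):
    """Assemble A + delta*alpha*B of Eq. (3); boundary_mask is 1 on boundary rows, 0 elsewhere."""
    delta = np.diag(boundary_mask.astype(float))
    return A + alpha * (delta @ B)

def solve_modes(A, B, boundary_mask, alpha):
    """Eigenpairs of the penalized operator; the eigenvalues play the role of beta^2 in Eq. (3)."""
    eigvals, eigvecs = np.linalg.eig(penalized_operator(A, B, boundary_mask, alpha))
    return eigvals, eigvecs
```

In this sketch, letting alpha grow large drives the boundary rows toward the SBM behavior described by Eq. (4).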

To iteratively obtain the optimal penalty number $\alpha$ for the two-dimensional Helmholtz eigenvalue problem, we formulate it as an unconstrained optimization problem and employ the Adam optimizer. We first rewrite the problem as minimizing a cost function $F(\bar {H}, \alpha )$, which accounts for both the residual of the Helmholtz equation and the boundary conditions (BC) enforced through the penalty term. The cost function is defined as:

$$F(\bar{H}, \alpha) = \lVert (\bar{\bar{A}}+\delta\alpha\bar{\bar{B}})\bar{H} - \beta^2\bar{H} \rVert^2 + \lambda \lVert \bar{\bar{B}}\bar{H} \rVert^2,$$
where $\lambda > 0$ is a regularization parameter balancing the contributions of the residual and the imposition of the boundary condition. From (5), the explicit gradient of $F(\bar {H}, \alpha )$ with respect to $\alpha$ can be deduced as follows:
$$\nabla_\alpha F(\bar{H}, \alpha) = 2\left((\bar{\bar{A}}+\delta\alpha\bar{\bar{B}})\bar{H} - \beta^2\bar{H}\right)\delta\bar{\bar{B}}\bar{H}.$$
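A direct transcription of Eqs. (5) and (6) is sketched below; it assumes real-valued arrays and a diagonal selector matrix for $\delta$ (both simplifying assumptions) and is not the authors' code.

```python
import numpy as np

def cost_F(A, B, boundary_mask, H, alpha, beta_sq, lam):
    """Eq. (5): squared residual of the penalized eigenproblem plus the weighted BC term."""
    delta = np.diag(boundary_mask.astype(float))
    residual = (A + alpha * (delta @ B)) @ H - beta_sq * H
    return np.linalg.norm(residual) ** 2 + lam * np.linalg.norm(B @ H) ** 2

def grad_alpha(A, B, boundary_mask, H, alpha, beta_sq):
    """Eq. (6): explicit (scalar) gradient of F with respect to the penalty number alpha."""
    delta = np.diag(boundary_mask.astype(float))
    residual = (A + alpha * (delta @ B)) @ H - beta_sq * H
    return 2.0 * residual @ (delta @ B @ H)
```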

To incorporate the Adam optimizer into (5) for iteratively obtaining the optimal penalty number $\alpha$ for the two-dimensional Helmholtz eigenvalue problem, we need to modify the algorithm accordingly. First, we initialize additional variables for the first and second moment estimates of the gradient. Then, in each iteration, we update these variables and use them to adjust the gradient. The modified algorithm using Adam optimizer can be described as follows:


Algorithm 1. Adam Optimizer for Penalty Method in Optical Waveguide Mode Solver.

In this modified Algorithm 1, we use the Adam optimizer to update $\alpha$. Note that we need to set some additional hyperparameters, such as the learning rate $\eta$, the momentum decay factor $\hat {\beta }_1$, the velocity decay factor $\hat {\beta }_2$, and a small constant $\epsilon$ for numerical stability. Commonly recommended values for these Adam hyperparameters are:

  • Learning rate $\eta$: a value in the range of 0.001 to 0.01
  • Momentum decay factor $\hat {\beta }_1$: a value close to 0.9 (e.g., 0.9)
  • Velocity decay factor $\hat {\beta }_2$: a value close to 0.999 (e.g., 0.999)
  • Small constant $\epsilon$: a small value, such as $10^{-10}$
By following this modified algorithm with the Adam optimizer, we can iteratively obtain the optimal $\alpha$, $\bar {H}$, and $\beta$ that satisfy the two-dimensional Helmholtz equation and the given boundary conditions. The use of the Adam optimizer helps to improve convergence speed and stability compared to the basic gradient descent method.

The algorithm we present introduces two moment estimates, whose advantages are twofold. First, the moving average of the gradient (denoted $m_{\alpha }$ in our algorithm) captures the direction of steepest descent in the parameter space; it aggregates information from past and present gradients, which helps guide the optimization process. Second, the moving average of the squared gradient (denoted $v_{\alpha }$) encapsulates the 'roughness' or 'variability' of the optimization landscape; it provides a self-adjusting learning rate, allowing the algorithm to take larger steps in flatter regions and smaller steps in steeper regions, which speeds convergence and avoids overshooting. The updated value of $\alpha$ is computed using bias-corrected estimates of the first and second moments. Bias correction compensates for the initial bias toward zero in the early steps of the optimization, which would otherwise slow the start of the process. The term $\epsilon$ is added for numerical stability, preventing division by zero.

The primary distinction in this adapted algorithm is that we update $\bar {H}$ and $\beta$ at each step by solving the modified Helmholtz Eq. (3) using the present penalty number $\alpha _i$. The Adam optimizer employs adaptive learning rates for each parameter. This approach can help circumvent issues associated with local minima or saddle points within the optimization landscape, potentially promoting quicker convergence and a more resilient optimization process.
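A minimal sketch of this alternating procedure is given below: at each step the penalized eigenproblem of Eq. (3) is re-solved with the current $\alpha_i$, and $\alpha$ is then updated with a bias-corrected Adam step using the gradient of Eq. (6) (the grad_alpha function from the sketch above). The helper solve_tracked_mode, the mode-tracking rule, and the default hyperparameters are illustrative assumptions, not the authors' implementation; real-valued quantities are assumed for brevity.

```python
import numpy as np

def solve_tracked_mode(A, B, boundary_mask, alpha):
    """Assumed helper: solve Eq. (3) and return the eigenpair of the tracked mode.
    Here the eigenvalue with the largest real part is taken purely for illustration."""
    delta = np.diag(boundary_mask.astype(float))
    w, V = np.linalg.eig(A + alpha * (delta @ B))
    k = int(np.argmax(w.real))
    return w[k], V[:, k]

def adam_step(alpha, grad, m, v, t, eta=0.01, b1=0.9, b2=0.999, eps=1e-10):
    """One bias-corrected Adam update of the scalar penalty number alpha."""
    m = b1 * m + (1.0 - b1) * grad          # first moment: moving average of the gradient
    v = b2 * v + (1.0 - b2) * grad ** 2     # second moment: moving average of the squared gradient
    m_hat = m / (1.0 - b1 ** t)             # bias corrections for the zero initialization
    v_hat = v / (1.0 - b2 ** t)
    return alpha - eta * m_hat / (np.sqrt(v_hat) + eps), m, v

def optimize_penalty(A, B, boundary_mask, alpha0, n_iters=100):
    """Alternate between solving the penalized eigenproblem (Eq. (3)) and Adam-updating alpha."""
    alpha, m, v = alpha0, 0.0, 0.0
    for t in range(1, n_iters + 1):
        beta_sq, H = solve_tracked_mode(A, B, boundary_mask, alpha)
        g = grad_alpha(A, B, boundary_mask, H, alpha, beta_sq)   # Eq. (6), from the sketch above
        g = float(np.real(g))                                    # keep the update real-valued
        alpha, m, v = adam_step(alpha, g, m, v, t)
    return alpha
```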

3.2 Computational complexity

The computational complexity of the Penalty Method-PSFD (PM-PSFD) with the Adam optimizer can be divided into two primary processes: the matrix operations inherent in PM-PSFD, and the iterative update procedure provided by Adam optimization.

In the initialization process, initial values for the field vector $\bar {H}$, the penalty number $\alpha$, and the moment estimates $m$ and $v$ are set. These vectors are initialized based on the size of the problem, which is correlated with the number of nodes $N^2$ and boundary nodes $4N-4$ in a discretized spatial subdomain.

The matrix operations within the PM-PSFD predominantly involve matrix-vector multiplications, which exhibit a computational complexity of $O(N^2)$.

The Adam optimization update process includes iterative refinement of the first and second moment estimates of the gradient. These estimates are utilized to adjust the penalty number $\alpha$, corresponding to the boundary nodes $4N-4$. Since these calculations are conducted for each boundary component of the field vector $\bar {H}$, the complexity of this operation is linearly proportional to the boundary size, i.e., $O(4N-4)$.

The overall complexity also considers the maximum number of iterations $T$ the algorithm can perform. Consequently, the total computational complexity of the PM-PSFD algorithm operating on one subdomain, supplemented with Adam optimization, can be succinctly expressed as $O\left ( N^2T + (4N-4)T\right )$.
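For a rough sense of scale, taking illustrative values $N=22$ and $T=50$ (hypothetical numbers, not values reported here), the per-subdomain estimate becomes

$$N^2T+(4N-4)T = 484\times 50 + 84\times 50 = 24200 + 4200 = 28400,$$

so the matrix-operation term dominates the Adam-update term by more than a factor of five.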

4. Waveguide structures

4.1 Circular optical waveguide (COW)

The performance of the proposed PM-PSFD is analyzed using a step-index circular optical waveguide (COW) with a core radius of $a=0.6\,\mu$m and a high refractive-index contrast, where the core index is $n_1=\sqrt {8}$ and the cladding index is $n_2=1.0$ at a wavelength of $\lambda =1.5\,\mu$m. The exact effective index, computed to 14 decimal places, is $n_{\rm eff}=2.68401932160108$. Here, the effective index is defined as $n_{\rm eff}=\beta /k_0$.
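As a small illustration, the error measure used in the convergence comparisons below can be computed as follows; n_calc stands for a hypothetical solver output, and the exact reference value is the one quoted above.

```python
N_EFF_EXACT_COW = 2.68401932160108   # exact effective index quoted for the COW benchmark

def relative_error(n_calc, n_exact=N_EFF_EXACT_COW):
    """Relative error |(n_calc - n_exact)/n_exact| used in the convergence plots."""
    return abs((n_calc - n_exact) / n_exact)
```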

Figure 1(a) illustrates the typical domain and mesh division profile for the Chebyshev pseudospectral method. In this case, five subdomains are considered, and the mesh pattern for each subdomain corresponds to $N=8$, where $N$ signifies the degree of the Chebyshev polynomial. The radius of the computational-domain boundary, where the total field is required to be zero (zero boundary condition), is set to $R_{bc}=2.5\,\mu$m.


Fig. 1. (a) Mesh and domain division profile for the circular fiber. (b) Relative errors in the effective index for the fundamental mode in a circular waveguide using constant $\alpha$’s in the PM, compared with the SBM [2]. (c) Same as (b) but using $N$-dependent $\alpha$’s and Adam.


We first examine the sensitivity to Adam’s learning rate $\eta$ over the range 0.001 to 0.01, as depicted in Fig. 1(b). The effective-index convergence results are essentially the same across this range, indicating that the method is not sensitive to the learning rate within it. Since a learning rate of 0.01 gave faster iterations, we primarily use 0.01 in the following studies.

Figure 1(c) and (d) present the errors in the calculated effective indices for the fundamental mode relative to the exact value, defined as $|(n_{\rm eff,calculated}-n_{\rm eff,exact})/n_{\rm eff,exact}|$, versus the number of unknowns (transverse magnetic field components at grid points), which depends on $N$, for both the SBM calculation [3] and the PM-PSFD using different $\alpha$’s. The results of the SBM calculation [3] up to $N=22$, with the number of unknowns being $5\times 2\times (22+1)^2$, are displayed with dotted lines.

The PM-PSFD results using fixed $\alpha$ values of 5, 500, and 5000 are exhibited in Fig. 1(c). It is clear that results with larger $\alpha$ converge to those of the SBM calculation, in accordance with Eq. (4). However, the errors do not strictly decrease monotonically with respect to the number of unknowns (or $N$), and different numbers of unknowns minimize the errors for different $\alpha$’s. Thus, $N$-dependent $\alpha$’s might yield superior performance.

The Adam optimizer is compared with $\alpha (N)=N^2$ and $2N^2$ as recommended in [9], and the results are depicted in Fig. 1(d). The errors using both the Adam optimizer and $\alpha (N)=N^2$ decrease to below $10^{-12}$ when $N=22$, surpassing the SBM calculation. To further validate the performance of the PM-PSFD method using the Adam optimizer, as well as $\alpha (N)=N^2$ for other optical waveguides, we examine the more complex fused coupler structure in the following section.

4.2 Symmetrical $2\times 2$ strongly fused fiber-optical coupler ($2\times 2$ SFOC)

The mesh and division profile, consisting of five subdomains, for a symmetrical $2\times 2$ strongly fused fiber coupler ($2\times 2$ SFOC) are depicted in Fig. 2(a). The core index is $n_{co}=1.45$, the cladding index is $n_{clad}=1$, the operating wavelength is $\lambda =1.523\,\mu$m, the aspect ratio is $2d/2r=1.8$, where $r$ is the radius of the individual core and $2d$ is the width of the coupler, and the normalized frequency is $V=50$. As illustrated in Fig. 2(b), the relative errors in the effective index for the lowest even $H_y^{11}$ mode obtained with the Adam optimizer in the PM-PSFD method are smaller and more stable than those obtained with the SBM and $\alpha (N)=N^2$. This differs from the circular-waveguide results, and the difference is likely attributable to the presence of sharp corners in the structure. In the following rib waveguide, which has even sharper corners, this phenomenon is expected to be more pronounced.


Fig. 2. (a) Mesh and domain division profile for a fused fiber coupler. (b) Relative errors in the effective index for the $H_y^{11}$ mode using the PM with Adam, $\alpha (N)=N^2$ and the SBM [3].


4.3 Rib waveguide (Rib)

We then utilize the Chebyshev PM-PSFD to analyze the rib waveguide (Rib) depicted in Fig. 3(a). The domain and mesh division profile consists of 12 subdomains. Owing to the symmetry of the mode field, we only need to calculate half of the entire region. We adopt the operating wavelength $\lambda =1.15\,\mu$m, rib width $W=3.0\,\mu$m, and $H+D=1.0\,\mu$m, with $D$ varying from 0.1 to 0.5$\,\mu$m. The refractive indices of the cover $n_c$, the guiding layer $n_g$, and the substrate $n_s$ are 1.0, 3.44, and 3.4, respectively. The parameters of the computational window are $R=3.0\,\mu$m, $C=1.0\,\mu$m, and $S=5.0\,\mu$m. Taking $D=0.5\,\mu$m, we calculate the relative errors in the effective index of the ${\rm H}_{11}^y$ mode versus $\log {\left (\text {Number of unknowns}\right )}$ and compare the results obtained using the Adam optimizer, the SBM, and $\alpha (N)=N^2$ with those from [42], as shown in Fig. 3(b). As the number of sharp corners increases, the convergence rate of the SBM deteriorates significantly, while the Adam optimizer exhibits the best convergence rate among the tested methods.


Fig. 3. (a) Mesh and domain division profile for a rib waveguide. (b) Relative errors in the effective index for the $H_y^{11}$ mode using the Adam optimizer, $\alpha (N)=N^2$, and the SBM [3], compared with [42].


We also compute the effective index ($n_{\rm eff}$) of the ${\rm H}_{11}^y$ mode for different degrees ($M=N$) and list the results in Table 1, along with the corresponding numbers of unknowns, using the Adam optimizer. Table 2 presents the effective indices of the mode calculated by Hadley [42] and by the present Adam optimizer with 7680 total unknowns for different values of $D$. In this multi-corner case, the accuracy of the PM-PSFD, using the much simpler boundary setting provided by the Adam optimizer, is comparable to that of [42], which utilized a more complicated method.


Table 1. Convergence Characteristics of the Effective Index of the ${\rm H}_{11}^y$ Mode of the Rib Waveguide of Fig. 3 with $D=0.5\,\mu {\rm m}$ Calculated by the PM-PSFD with Adam.


Table 2. Effective indices of the ${\rm H}_{11}^y$ mode for the rib waveguide of Fig. 3 with different values of $D$, from [42] and the present PM-PSFD with Adam.

4.4 Photonic crystal fiber (PCF)

Finally, we consider a case with as many as 70 subdomains: a photonic crystal fiber (PCF) with two rings of 18 air holes, a hole pitch of $\Lambda =2.3\,\mu$m, and a hole diameter of $d=5\,\mu$m, as illustrated in Fig. 4(a). The silica background index is $n=1.45$ at a wavelength of $\lambda =1.45\,\mu$m. Unlike the cases discussed earlier, the effective index of the PCF for the fundamental mode is a complex number, $n_{\rm eff}=1.4384449486+1.0129554\times 10^{-6}i$.


Fig. 4. (a) Mesh and domain division profile for a photonic crystal fiber. (b) Relative errors in the effective index for the fundamental mode using the Adam optimizer, $\alpha (N)=N^2$ and the SBM [3].


The relative errors in the effective index of the fundamental mode versus the number of unknowns, obtained using the Adam optimizer, the SBM, and $\alpha (N)=N^2$, are plotted in Fig. 4(b). Since this structure has no sharp corners, all three methods exhibit good convergence as the number of unknowns increases. However, with the larger number of subdomains the boundary effects become more significant, and the Adam optimizer delivers both higher precision and a faster convergence rate.

Table 3 summarizes the convergence rate of the different structures at various relative-error levels for three methods: SBM, $\alpha =N^2$, and Adam. The total subdomain count (TSD) is also provided for each structure. The symbols "$\surd$", "$\bigcirc$", and "$\times$" respectively denote the best, intermediate, and worst performance among the three methods for each structure and relative-error level, taking Adam's iteration time into account.


Table 3. Convergence Rate of Different Structures for Various Relative-Error Levels with Total Subdomain (TSD) Count, Considering Three Methods: SBM, $\alpha =N^2$, and Adam; the "$\surd$", "$\bigcirc$", and "$\times$" symbols, respectively, represent the best, intermediate, and worst performance.

For the COW (TSD = 5), the SBM performs best at relative errors above $10^{-3}$, while the $\alpha =N^2$ method is optimal at the remaining levels. For the $2\times 2$ SFOC (TSD = 5), the SBM again leads above $10^{-3}$, $\alpha =N^2$ is best between $10^{-3}$ and $10^{-6}$, and Adam is best below $10^{-6}$ once its iteration time is taken into account. The Rib structure (TSD = 12) follows the same pattern: SBM above $10^{-3}$, $\alpha =N^2$ between $10^{-3}$ and $10^{-6}$, and Adam below $10^{-6}$, considering its iteration time. Finally, for the PCF (TSD = 70), the SBM is best above $10^{-3}$, $\alpha =N^2$ between $10^{-3}$ and $10^{-6}$, and Adam below $10^{-6}$.

4.5 Performance evaluation of optimizers

In this section, we examine whether the Adam optimizer is more efficient for optical waveguide mode solvers than other advanced algorithms, such as the GTO. For this comparison, we utilize the complex surface plasmon structure depicted in Fig. 5(a) [23]. The structure consists of a Si waveguide (WG) located on a SiO$_2$ buried oxide (BOX). The WG is separated from adjacent Si components by two air trenches, each with a width $W_{\mathrm {air}}$ sufficient to hinder power leakage from the sidewalls. The waveguide has a width $W_{\mathrm {WG}}=1.2\,\mu$m and a height $t_{\mathrm {Si}}=1\,\mu$m, and the BOX has a thickness $t_{\mathrm {BOX}}$. For $t_{\mathrm {BOX}}=20$ nm and $t_{\mathrm {c2}}=60$ nm, we observe an SPP-like mode with extreme electromagnetic field variations at the boundaries, as displayed in Fig. 5(b).


Fig. 5. (a) The cross section of a hybrid semiconductor laser (HSL). (b) The major field component of the SPP-like mode with $t_{\mathrm {BOX}}=20$ nm and $t_{\mathrm {c2}}=60$ nm [23]. (c) Relative errors in the effective index for the fundamental mode using the Adam optimizer, the GTO optimizer, and the SBM [3].


Figure 5(c) plots the relative errors in the effective index of the SPP-like mode against the number of unknowns and compares results derived from the Adam optimizer, the SBM, and the GTO optimizer. The data clearly show that, despite the strong boundary-condition effects, the traditional SBM trails the two optimizer-based approaches in convergence speed, although the inherent convergence characteristics of the PSFD are preserved.

From a quantitative perspective, we performed t-tests, obtaining a T-statistic of 1.2795 with a P-value of 0.2480 for Adam versus SBM, and a T-statistic of $-0.2471$ with a P-value of 0.8131 for Adam versus GTO. These measures point to a difference between the Adam and SBM error curves, whereas Adam and GTO behave almost identically (T-statistic $\approx$ 0, P-value $\approx$ 1).
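The exact test variant is not specified; a plausible reproduction with SciPy's independent two-sample t-test is sketched below, with placeholder error sequences (illustrative numbers only, not the paper's data) standing in for the measured relative-error curves.

```python
from scipy import stats

# Placeholder relative-error sequences at matched numbers of unknowns (illustrative only).
err_adam = [1e-4, 1e-6, 1e-8, 1e-10, 1e-11]
err_sbm = [1e-3, 1e-4, 1e-5, 1e-6, 1e-7]
err_gto = [1e-4, 1e-6, 1e-8, 1e-10, 1e-11]

t_adam_sbm, p_adam_sbm = stats.ttest_ind(err_adam, err_sbm)   # Adam vs. SBM
t_adam_gto, p_adam_gto = stats.ttest_ind(err_adam, err_gto)   # Adam vs. GTO
print(t_adam_sbm, p_adam_sbm, t_adam_gto, p_adam_gto)
```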

Although Adam and GTO exhibit similar convergence performance, the computation time for GTO is considerably higher: in terms of the PSFD computation time, its complexity is $O(N^2T+2P\times (1+T\times (4N-3)))$, where $P=30\sim 50$ is the population size. This contrasts starkly with Adam’s complexity of $O( N^2T + (4N-4)T)$. Crucially, the factor that gives Adam its efficiency edge is its use of the explicit gradient of the penalty coefficient $\alpha$ (Eq. (6)) in the optical mode solver formulation. As a result, the iteration count $T$ required by Adam is generally smaller than that needed by GTO, leading to a substantially shorter computation time. For example, to achieve an accuracy of $10^{-10}$, GTO requires 7495 seconds, whereas Adam reaches the same precision in only 2397 seconds. In addition, the procedure for determining parameters in GTO is more intricate than in Adam, necessitating numerous tests to pinpoint the optimal settings. Therefore, Adam emerges as the more favorable choice under these conditions.
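Plugging illustrative numbers into the two complexity expressions makes the gap explicit; the values below are assumptions for illustration only, and using the same $T$ for both methods is conservative, since the text notes GTO generally needs more iterations than Adam.

```python
N, T, P = 22, 50, 40   # polynomial degree, iterations, GTO population size (illustrative values)

ops_adam = N**2 * T + (4 * N - 4) * T               # O(N^2 T + (4N-4) T)
ops_gto = N**2 * T + 2 * P * (1 + T * (4 * N - 3))  # O(N^2 T + 2P(1 + T(4N-3)))
print(ops_adam, ops_gto)                            # roughly 2.8e4 vs 3.6e5 operation counts
```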

5. Conclusions and future works

We have developed an advanced method for the analysis of optical waveguide structures that employs the Chebyshev pseudospectral method in combination with the Adam optimization algorithm. This approach effectively and accurately addresses eigenvalue problems associated with optical waveguides. With the exception of the circular optical waveguide, our numerical examples demonstrate the superiority of the proposed method in terms of accuracy (relative errors below $10^{-6}$), convergence, and computational efficiency compared with traditional methods. Adam’s advantage originates from its inherent characteristics: while it can be disturbed by numerical noise in cases with a smaller number of unknowns, it is more adept at avoiding local minima in cases with a larger number of unknowns, where noise is less prominent. To further assess its performance, we applied Adam to a complex surface plasmon structure and compared it with the sophisticated metaheuristic GTO optimizer. Despite GTO’s sophistication, we found that Adam excelled in computational efficiency, primarily owing to its use of the explicit gradient of the penalty coefficient $\alpha$ in the optical mode solver formulation. However, it is important to emphasize that if an optical problem lacks an explicit gradient, or has no well-defined gradient at all, as in some nonlinear optical problems, the effectiveness of Adam would be significantly hampered, possibly rendering it unable to find the optimal solution.

Given our successful application of the Adam optimizer to this optical mode solver problem, we foresee the potential to further employ both Adam and GTO optimizers to address other challenges in the realm of optics. This could not only promote advances in the field but also significantly influence the design and analysis of optical waveguide devices in future research and practical applications. Examples of such applications could include the design of high-precision optical components and the creation of comprehensive optical databases.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. W. Guo, Y. Wu, Z. Xiong, Y. Jing, and Y. Chen, “Simple yet effective analysis of waveguide mode symmetry: generalized eigenvalue approach based on maxwell’s equations,” Opt. Express 30(21), 37910–37924 (2022). [CrossRef]  

2. P. J. Chiang and H. C. Chang, “A high-accuracy pseudospectral full-vectorial leaky optical waveguide mode solver with carefully implemented upml absorbing boundary conditions,” Opt. Express 19(2), 1594–1608 (2011). [CrossRef]  

3. P. J. Chiang, C. L. Wu, C. H. Teng, C. S. Yang, and H. C. Chang, “Full-vectorial optical waveguide mode solvers using multidomain pseudospectral frequency-domain (psfd) formulations,” IEEE J. Quantum Electron. 44(1), 56–66 (2008). [CrossRef]  

4. P. J. Chiang, C. P. Yu, and H. C. Chang, “Analysis of two-dimensional photonic crystals using a multidomain pseudospectral method,” Phys. Rev. E 75(2), 026703 (2007). [CrossRef]  

5. F. L. Hsiao, H. F. Lee, S. C. Wang, Y. M. Weng, and Y. P. Tsai, “Artificial neural network for photonic crystal band structure prediction in different geometric parameters and refractive indexes,” Electronics 12(8), 1777 (2023). [CrossRef]  

6. P. J. Chiang and Y. C. Chiang, “Pseudospectral frequency-domain formulae based on modified perfectly matched layers for calculating both guided and leaky modes,” IEEE Photon. Technol. Lett. 22(12), 908–910 (2010). [CrossRef]  

7. P. J. Chiang, Y. C. Chiang, N. H. Sun, and S. X. Hong, “Analysis of optical waveguides with ultra-thin metal film based on the multidomain pseudospectral frequency-domain method,” Opt. Express 19(5), 4324–4336 (2011). [CrossRef]  

8. B. M. A. Rahman and J. B. Davies, “Penalty function improvement of waveguide solution by finite elements,” IEEE Trans. Microwave Theory Techn. 32(8), 922–928 (1984). [CrossRef]  

9. D. Funaro and D. Gottlieb, “A new method of imposing boundary conditions in pseudospectral approximations of hyperbolic equations,” Math. Comp. 51(184), 599–613 (1988). [CrossRef]  

10. A. Radhakrishnan, M. Belkin, and C. Uhler, “Wide and deep neural networks achieve consistency for classification,” Proc. Natl. Acad. Sci. 120(14), e2208779120 (2023). [CrossRef]  

11. S. V. Mahadevkar, B. Khemani, S. Patil, K. Kotecha, D. R. Vora, A. Abraham, and L. A. Gabralla, “A review on machine learning styles in computer vision—techniques and future directions,” IEEE Access 10, 107293–107329 (2022). [CrossRef]  

12. R. Saleem, B. Yuan, F. Kurugollu, A. Anjum, and L. Liu, “Explaining deep neural networks: A survey on the global interpretation methods,” Neurocomputing 513, 165–180 (2022). [CrossRef]  

13. Z. Liu, D. Zhu, S. P. Rodrigues, K.-T. Lee, and W. Cai, “Generative model for the inverse design of metasurfaces,” Nano Lett. 18(10), 6570–6576 (2018). [CrossRef]  

14. W. Ma, Z. Liu, and Z. Kudyshev, “Deep learning for the design of photonic crystals,” Nat. Photonics 15, 77–90 (2021). [CrossRef]  

15. P. Kumar and H. Reddy, “Machine learning-based design and optimization of plasmonic structures,” Plasmonics 15(5), 1471–1479 (2020). [CrossRef]  

16. Q. Sun, X. Fu, Y. Guo, and W. An, “Deep learning techniques for inverse problems in optics: A comprehensive review,” Opt. Laser Technol. 126, 106118 (2020). [CrossRef]  

17. G. Hadley, Nonlinear and dynamic optimization: From theory to practice (Springer Science & Business Media, 2002).

18. S. Bock, J. K. Goppold, and M. G. Wei, “An improvement of the convergence proof of the Adam-optimizer,” arXiv:1804.10587 (2018).

19. M. Reyad, A. Sarhan, and M. Arafa, “A modified Adam algorithm for deep neural network optimization,” Neural Computing and Applications (2023).

20. B. Becker and J. Goll, “A deep learning tutorial for state-of-the-art fault detection performance in photovoltaic systems,” IEEE J. Photovoltaics 8(2), 575–586 (2018). [CrossRef]  

21. Z. Zhang, “Improved adam optimizer for deep neural networks,” in IEEE/ACM 26th International Symposium on Quality of Service (IWQoS), (2018), pp. 1–2.

22. B. Abdollahzadeh, F. Soleimanian Gharehchopogh, and S. Mirjalili, “Artificial gorilla troops optimizer: A new nature-inspired metaheuristic algorithm for global optimization problems,” Int. J. Intell. Syst. 36(10), 5887–5958 (2021). [CrossRef]  

23. C. L. Tseng, C. K. Wang, C. H. Lai, C. H. Tsai, and P. J. Chiang, “Excitation of surface plasmon mode in bulk semiconductor lasers,” Appl. Opt. 62(14), 3690–3695 (2023). [CrossRef]  

24. S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv:1609.04747 (2016). [CrossRef]  

25. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations, (2014).

26. F. Mehmood, S. Ahmad, and T. K. Whangbo, “An efficient optimization technique for training deep neural networks,” Mathematics 11(6), 1360 (2023). [CrossRef]  

27. J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” J. Mach. Learn. Res. 12(61), 2121–2159 (2011). [CrossRef]  

28. H. Shayanfar and F. S. Gharehchopogh, “Farmland fertility: A new metaheuristic algorithm for solving continuous optimization problems,” Appl. Soft Comput. 71, 728–746 (2018). [CrossRef]  

29. B. Abdollahzadeh, F. S. Gharehchopogh, and S. Mirjalili, “African vultures optimization algorithm: A new nature-inspired metaheuristic algorithm for global optimization problems,” Comput. & Ind. Eng. 158, 107408 (2021). [CrossRef]  

30. A. Shaddeli, F. Soleimanian Gharehchopogh, M. Masdari, and V. Solouk, “An improved african vulture optimization algorithm for feature selection problems and its application of sentiment analysis on movie reviews,” Big Data Cogn. Comput. 6(4), 104 (2022). [CrossRef]  

31. F. S. Gharehchopogh and A. A. Khargoush, “A chaotic-based interactive autodidactic school algorithm for data clustering problems and its application on covid-19 disease detection,” Symmetry 15(4), 894 (2023). [CrossRef]  

32. F. S. Gharehchopogh, A. Ucan, T. Ibrikci, et al., “Slime mould algorithm: A comprehensive survey of its variants and applications,” Arch. Computat. Methods Eng. 30(4), 2683–2723 (2023). [CrossRef]  

33. F. Gharehchopogh, “Quantum-inspired metaheuristic algorithms: comprehensive survey and classification,” Artif. Intell. Rev. 56(6), 5479–5543 (2023). [CrossRef]  

34. S. Shishavan and F. Soleimanian Gharehchopogh, “An improved cuckoo search optimization algorithm with genetic algorithm for community detection in complex networks,” Multimed. Tools Appl. 81(18), 25205–25231 (2022). [CrossRef]  

35. J. Piri, P. Mohapatra, B. Acharya, F. S. Gharehchopogh, V. C. Gerogiannis, A. Kanavos, and S. Manika, “Feature selection using artificial gorilla troop optimization for biomedical data: A case analysis with covid-19 data,” Mathematics 10(15), 2742 (2022). [CrossRef]  

36. H. Mohmmadzadeh and F. Soleimanian Gharehchopogh, “A multi-agent system based for solving high-dimensional optimization problems: A case study on email spam detection,” Int. J. Commun Syst. 34(3), e4670 (2021). [CrossRef]  

37. F. S. Gharehchopogh, M. Namazi, L. Ebrahimi, and B. Abdollahzadeh, “Advances in sparrow search algorithm: A comprehensive survey,” Arch. Computat. Methods Eng. 30(1), 427–455 (2023). [CrossRef]  

38. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” (2017).

39. L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of COMPSTAT’2010 (2010), pp. 177–186.

40. T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning (2012).

41. J. Zhuang and T. Tang, “Adabelief optimizer: Adapting stepsizes by the belief in observed gradients,” arXiv:2010.07468 (2020). [CrossRef]  

42. G. R. Hadley, “High-accuracy finite-difference equations for dielectric waveguide analysis ii: Dielectric corners,” J. Lightwave Technol. 20(7), 1219–1231 (2002). [CrossRef]  
