Design and optimization of optical passive elements using artificial
neural networks

Ahmed M. Gabr; Chris Featherston; Chao Zhang; Cem Bonfil; Qi-Jun Zhang; Tom J. Smy

doi:10.1364/JOSAB.36.000999

1. INTRODUCTION

Silicon photonics has experienced a phenomenal transformation over the last few decades. The development of silicon-on-insulator (SOI) platforms for optical devices has sparked renewed interest in silicon photonics technology, and a number of research groups have actively been involved worldwide [1]. The large refractive index contrast between silicon and oxide, and their optical transparency in the 1550 nm wavelength window, enables ultra-compact device dimensions suitable for large-scale, high density integration on a chip. Additionally, the increasing availability of complementary metal-oxide-semiconductor (CMOS) photonic foundries, development of component libraries, and integration capability with microelectronics promises low-cost development of devices compared to other material technologies [2].

One of the challenges in designing photonic devices is the computational time required for simulation with commercial tools such as OptiFDTD by Optiwave [3], Lumerical Solutions [4], or COMSOL [5]. Optimizing an optical structure requires nested sweeps along a number of geometrical dimensions (possibly at a number of wavelengths), which consumes a considerable amount of time because of the repetitive simulations required. This issue becomes even more challenging when optimizing a circuit consisting of a number of optical elements or when analyzing statistical variation, where process variations and manufacturing tolerances of components must be taken into account. In this paper we use a general nonlinear mapping tool, the artificial neural network (ANN), to demonstrate how some of these challenges can be overcome.

Many papers in the literature have targeted the use of ANNs in microwave circuit element design and optimization [6–8]. In contrast, ANNs are rarely used in photonics to analyze and optimize optical components, and only a few papers can be found in that field [9–11]. Accurate and fast neural models can be developed from measured or simulated data over a range of geometrical parameter values. These features make neural networks a useful alternative for device modeling where repetitive simulation is required, such as the use of a compact model in a device or system level simulator. Once a model is developed, it can be reused indefinitely, which avoids the complete re-simulation of the optical structure that a simple change in a physical dimension would otherwise require.

Typically, a neural network is trained to model the calculation of “outputs” from “inputs” and is referred to as a forward model—where the model inputs are physical or geometrical parameters and outputs are optical parameters. On the other hand, an inverse model finds the geometrical or physical parameters for given values of optical parameters [12]. For design and optimization problems, both types of models can be used [13]. While an inverse model can provide solutions immediately and is therefore faster than optimization using a forward model, it is more difficult to train because the same input values to the inverse model may have different values at the output (multi-valued solutions). This paper will deal with the creation of forward models for use in design and optimization of passive devices and as compact models for high levels of analysis.

In this work, we present the design and analysis of four fundamental passive elements common in optical circuits, namely: ridge waveguides, radial waveguide elements (bends), directional couplers, and multi-mode interference (MMI) couplers. All devices are simulated using commercial tools, such as OptiMODE and Lumerical Solutions, to train and validate the ANN while sweeping a number of design parameters. Once the ANN model is validated, the passive elements are then simulated using the commercial simulator and results are compared with the ANN model.

The paper is structured as follows. Section 2 provides an overview of the ANN approach used in this work. In Section 3, the simulation of four passive elements—ridge waveguide, radial waveguide, directional coupler, and MMI coupler—using OptiMODE and Lumerical tools is discussed. Comparisons with developed ANN models are presented. Finally, conclusions are provided in Section 4.

2. ARTIFICIAL NEURAL NETWORK MODEL DEVELOPMENT

Artificial neural networks are information processing systems with their design inspired by the studies of the ability of the human brain to learn from observations and to generalize by abstraction. They can be trained to learn any arbitrary nonlinear input/output relationships from corresponding data. A typical ANN structure is composed of processing elements, which are called neurons, and the interconnections between them, which are called links. Every link has a corresponding weight parameter associated with it. Each neuron receives stimulus from other neurons connected to it, processes the information, and produces an output.

There are a variety of structural forms for ANNs, but the most widely used neural network structure is the multi-layer perceptron (MLP). In the MLP neural network, the neurons are grouped into layers. MLPs have a simple layer structure in which successive layers of neurons are fully interconnected, with connection weights controlling the strength of the connections. The MLP comprises an input layer, an output layer, and a number of hidden layers [7]. A typical MLP neural network is shown in Fig. 1.

Fig. 1. Typical multi-layer perceptron neural network structure. A multi-layer perceptron network consists of an input layer, one or more hidden layers, and an output layer.
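The layer structure described above can be illustrated with a minimal forward-pass sketch. It assumes a single hidden layer of sigmoid neurons, a linear output layer, and per-neuron bias terms; the biases, the linear output, the function names, and all numerical weights are illustrative assumptions and are not taken from the trained models in this paper.

```python
import math


def sigmoid(z):
    """Logistic activation assumed for the hidden neurons."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_biases):
    """Evaluate a three-layer MLP: input layer, one hidden layer, linear output layer."""
    # Each hidden neuron forms a weighted sum of all inputs, then applies the sigmoid.
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    # Each output neuron forms a weighted sum of all hidden-neuron responses.
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(output_weights, output_biases)
    ]


# Hypothetical weights: 2 inputs, 3 hidden neurons, 1 output.
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5]]
b2 = [0.2]

y = mlp_forward([0.45, 1.55], W1, b1, W2, b2)
print(y)
```

Because the result is a closed-form expression in the weights, evaluating it costs only a handful of multiplications and exponentials per input point.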


For the purposes of this paper the ANN can be seen as a nonlinear fitting method. A traditional method of finding a representation of a set of nonlinear simulated data can be accomplished by fitting a multi-dimensional polynomial. For situations where the nonlinearity of the data is moderate and the order of the polynomial is high enough, the data can be represented with a moderate error. Fitting data with multi-dimensional polynomials is algorithmically more transparent than training a neural network; however, complex or highly nonlinear data will be subject to over-fitting and large errors. In contrast, a neural network takes more effort to train and validate but the interior representation is more flexible. Many of the successes of neural networks are due to the rich hierarchical neuron representations and the use of nonlinear neurons (often using a sigmoid input/output relationship). We will show a comparison between a neural network model and a polynomial fit in Section 3.B.

The first step in developing an ANN is to define the input ( $x$ ) and the output ( $y$ ) parameters. Once the range of input parameters is finalized, a sampling distribution needs to be chosen. Here, we use a uniform grid distribution where each input parameter is sampled at equal intervals. Next, input–output sample pairs are generated using either commercial software, as in this work, or measured data, where the output data is denoted as $d$ . The generated data is composed of training data, $T$ , which is used to train the neural network, and validation data, $V$ , which is used for testing the resulting neural network model. Once the data is generated, the ANN is now ready to be trained. The training data consists of sample pairs ( $x_{k}$ , $d_{k}$ ), where $x_{k}$ and $d_{k}$ are vectors representing the inputs and the targeted outputs of the neural network, respectively.
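The sampling and partitioning steps above can be sketched as follows. The width and radius ranges are those of the bend example in Section 3.B, but the number of grid points per axis, the parameter names, and the rule that sends every third sample to the validation set are assumptions made purely for illustration; the partitioning rule actually used is not stated in the text.

```python
import itertools


def uniform_grid(ranges):
    """Build a uniform grid; `ranges` maps a name to (start, stop, number of points)."""
    axes = []
    for name, (start, stop, count) in ranges.items():
        step = (stop - start) / (count - 1) if count > 1 else 0.0
        axes.append([(name, start + i * step) for i in range(count)])
    # The Cartesian product of the axes gives every combination of input values.
    return [dict(point) for point in itertools.product(*axes)]


def split_alternating(samples, every=3):
    """Send every `every`-th sample to validation (V) and the rest to training (T)."""
    training = [s for i, s in enumerate(samples) if i % every != every - 1]
    validation = [s for i, s in enumerate(samples) if i % every == every - 1]
    return training, validation


# Hypothetical grid density: 7 widths and 6 radii.
samples = uniform_grid({"w_g_nm": (300.0, 600.0, 7), "R_b_um": (0.0, 5.0, 6)})
training, validation = split_alternating(samples)
print(len(samples), len(training), len(validation))
```

Each sample would then be passed to the simulator to obtain the target output $d_{k}$ for the corresponding input $x_{k}$.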

ANNs can be trained using several different learning algorithms, such as conjugate gradient, backward propagation, quasi-Newton, etc. In this work the ANN is trained based on the quasi-Newton algorithm. The training process is terminated once a minimum training error, $E_{T}$ , is reached, where the training error is defined as

$$E_{T}(w) = \frac{1}{2} \sum_{k \in T} \sum_{j=1}^{m} \left| y_{j}(x_{k}, w) - d_{jk} \right|^{2}, \tag{1}$$

where the vector $w$ contains all the weight parameters representing the various links in the ANN, $d_{jk}$ is the $j$th element of $d_{k}$ , and $y_{j}(x_{k}, w)$ is the $j$th neural network output for input $x_{k}$ . To obtain the optimal neural network, $w$ is adjusted such that the error function is minimized. Hidden neurons can be successively added to the network until the best error estimate is obtained. Once the ANN is trained, the validation dataset is used to confirm the model based on the validation error, which is defined similarly to the training error in Eq. (1). Finally, a closed-form expression representing the ANN model is extracted.
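As a concrete reading of the training error, the following sketch evaluates the double sum over samples and outputs for an arbitrary model. The function name, the toy linear model, and the two sample pairs are hypothetical; `model` stands for any callable that maps an input vector to a list of outputs.

```python
def training_error(model, samples):
    """Return 1/2 * sum over samples k and outputs j of |y_j(x_k) - d_jk|^2."""
    total = 0.0
    for x_k, d_k in samples:
        y_k = model(x_k)
        # Inner sum over the outputs of the network for this sample.
        total += sum((y - d) ** 2 for y, d in zip(y_k, d_k))
    return 0.5 * total


def toy_model(x):
    """Hypothetical one-input, one-output model used only for this illustration."""
    return [2.0 * x[0]]


toy_samples = [([1.0], [2.0]), ([2.0], [3.0])]
error = training_error(toy_model, toy_samples)
print(error)
```

The same function applied to the validation set gives the validation error, since the two are defined in the same way.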

In this work we use the NeuroModelerPlus program module [14]. It is intended for the automatic training and validating of MLP neural networks. NeuroModelerPlus can output the final ANN in a number of forms, including C and MATLAB code. In this work the ANN was formulated as a MATLAB function, which was used to generate results and compare to simulation data.

3. APPLICATION OF ANN FOR PASSIVE OPTICAL ELEMENTS

In this section we explore the use of trained ANNs with four fundamental photonics elements based on silicon photonics. A commonly used technology for silicon photonics is a variant of the SOI technology developed for CMOS circuits. A typical 200 mm wafer consists of a 725 μm silicon substrate, 2 μm of buried oxide (BOX), and 220 nm of crystalline silicon [15]. In this paper, constant refractive indices of 3.475 and 1.444 are used for silicon and silicon oxide, respectively, and the wavelength is 1.55 μm unless otherwise mentioned. In the following examples we include an oxide cladding, which is used to protect the devices.

A. Optical Waveguides

There are several types of waveguides used in SOI-based photonics. We will explore two types, namely, strip and rib waveguides. Strip waveguides are used for routing as they offer low-loss tight bend radii and are formed by etching the thin top silicon layer down to the BOX layer, forming completely isolated rectangular waveguides. Rib waveguides, on the other hand, are formed by only a partial etch of the top silicon layer so that the waveguide sits on a very thin silicon base. Rib waveguides are generally used for electro-optic devices such as modulators, as they allow electrical connections to be made to the waveguide. The basic structure of each waveguide is shown in Fig. 2. A strip waveguide is a simple rectangular dielectric waveguide with a width of $w_{g}$ and a height of $t_{g}$ placed directly on top of the BOX layer of thickness $t_{b}$ . The rib (also known as ridge) structure has a width of $w_{g}$ , similar to the strip waveguide; however, it is placed on a thin slab layer of thickness $t_{s}$ and width $w_{s}$ . The slab layer is located on top of the BOX layer of thickness $t_{b}$ , which is fixed at 2 μm. The whole structure is supported by a thick silicon substrate and covered by a 2 μm layer of cladding oxide.

Fig. 2. Two typical SOI-based silicon photonics waveguides. (a) Strip and (b) rib.


A single ANN is trained using commercial mode solvers for both waveguides using a set of selected parameters and ranges, with $t_{s} = 0$ for the strip waveguide. Four parameters are explored in this study, which are the inputs of the ANN: the silicon waveguide width ( $w_{g}$ ), silicon thickness ( $t_{g}$ ), slab thickness ( $t_{s}$ ), and wavelength ( $λ$ ). The output of the ANN model is the effective refractive index ( $n_{eff}$ ) of the fundamental mode. The range of input parameters is selected to explore valid solutions, and is shown in Table 1. The total number of simulations was 8510, which was divided into training data (5390) and validation data (3120). This number of samples can be reduced by means of design of experiments; however, we did not do so since the simulation time was moderate and the intent was to demonstrate the ANN.

Table 1. Parameter Space for the Waveguide Simulation


After training and validating the ANN model, the average training error is found to be 0.67% while the average validation error is 0.6% using one hidden layer with six neurons. Figure 3(a) presents a portion of the training data and Fig. 3(b) shows the effective index obtained from the ANN model (solid lines) and the training data from both Lumerical MODE (triangle markers) and OptiMODE (square markers) as a function of silicon waveguide width for wavelengths of 1310 nm and 1560 nm. The silicon thickness is fixed at 220 nm while the slab thickness is zero to consider a strip waveguide. As can be seen, there is good agreement between the simulated results and the ANN model. Note that the ANN model can generate data for a wavelength of 1310 nm with good accuracy although it is outside the input range for the wavelength. The ANN results took only 18.7 ms to execute while the Lumerical MODE simulations (14 points) took 3 min and 48 s on a PC with an Intel i7 3 GHz processor and 64 GB of RAM.

Fig. 3. Effective refractive index as a function of waveguide width, ( $w_{g}$ ), for strip waveguide ( $t_{s} = 0$ ) where silicon thickness, ( $t_{g}$ ), is 220 nm. (a) Training data from Lumerical MODE (triangles) and OptiMODE (squares) mode solvers for different wavelengths. (b) Comparison between the trained ANN model (solid lines) and simulation data from Lumerical MODE (triangles) and OptiMODE (squares) for wavelengths of 1310 and 1550 nm.


The ability of the ANN to capture the wavelength dependence of the effective index is presented in Fig. 4. Training data is shown in Fig. 4(a), while Fig. 4(b) presents a comparison of the ANN model (solid lines) and the simulation data as a function of wavelength for different silicon waveguide widths. The widths are selected so that they are not part of the training or testing datasets, but rather selected randomly. The silicon thickness is 220 nm while the slab thickness is 90 nm to consider a rib waveguide. Generally, there is very good agreement between the ANN model and simulations.

Fig. 4. Effective index as a function of wavelength for a ridge waveguide ( $t_{s} = 90$ nm), where silicon thickness is 220 nm. (a) Training data from both Lumerical MODE (triangles) and OptiMODE (squares) mode solvers. (b) ANN model (solid lines) in comparison with Lumerical MODE (triangles) and OptiMODE (squares) for different silicon waveguide widths.


Figure 5 presents the relative error of the modal indices as a function of wavelength for a number of waveguide widths. Within the region of training the error is less than 1% and is comparable to the variation in index between the two mode solvers. As can be seen, the error is still moderate beyond the region of training, which indicates that the model is robust with respect to a degree of extrapolation. We have explored the ANN model outside the trained region for different parameters. The model is sensitive for small widths below 200 nm, where the error can increase to 10%, while extrapolating the width up to 2500 nm results in an error of less than 4%. The model is less sensitive to the other three parameters. Therefore, although the accuracy of the ANN model is established only within the training range, it can still be used to estimate outputs outside the training region, but with caution.

Fig. 5. Relative error comparing ANN output with Lumerical MODE and OptiMODE simulation results. Error is less than 0.65% for output data within the training range and reaches a maximum of 1.4% for outputs outside the specified ANN input range.


B. Radial Waveguide (90° Bend)

A requirement for silicon photonics is to provide radially curving waveguides, primarily for signal routing but also to create optical devices, such as ring resonators. Curved waveguide elements introduce losses due to a number of phenomena, and the resulting optical loss must be quantified. The complete loss associated with a curved optical element should include the pure bending loss due to propagation through the curvilinear geometry, additional scattering, and the transition losses at each end [16].

For a curved waveguide, the peak of the modal field shifts toward the outside radius, which introduces a mismatch in the modal profile at the junction between a straight waveguide and a bend waveguide and consequently results in the transition loss [16,17]. Furthermore, when the modal field shifts to the outside, it interacts more strongly with the outside sidewall roughness, which introduces some additional scattering loss (compared with a straight waveguide). Therefore, only a small shift in the field peak can be tolerated if the transition loss and the additional scattering loss are to remain small. In this section we simulate the total bend loss, which includes both the pure bending loss and the transition loss. Additional scattering is not simulated due to the idealized nature of the geometry. The bending loss is calculated using the following equation:

$$BL = -10 \log_{10} \left( \frac{P_{out}}{P_{in}} \right), \tag{2}$$

where $P_{in}$ and $P_{out}$ are the input and output power, respectively.
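The bending loss expression translates directly into a short helper. The function name and the sample power values are hypothetical; the base-10 logarithm follows from the loss being quoted in decibels.

```python
import math


def bending_loss_db(p_in, p_out):
    """Bending loss in dB from input and output power (any consistent unit)."""
    if p_in <= 0 or p_out <= 0:
        raise ValueError("powers must be positive")
    return -10.0 * math.log10(p_out / p_in)


# Hypothetical example: half of the input power reaches the output of the bend.
loss_half = bending_loss_db(1.0, 0.5)
print(loss_half)
```

A lossless bend gives 0 dB, and the loss grows as the transmitted fraction falls.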

The basic structure of a bend waveguide is shown in Fig. 6. The structure is based on a strip waveguide with a silicon layer of thickness 220 nm and width ( $w_{g}$ ) located on top of a buried oxide layer with thickness of 2 μm. The whole structure is supported by a silicon substrate of thickness 2 μm and covered by a 2 μm layer of cladding oxide. The ANN is trained from Lumerical FDTD simulations using a set of selected parameters and ranges. Two parameters are explored in this study: waveguide width ( $w_{g}$ ) and bending radius ( $R_{b}$ ), which are the inputs to the ANN model. The outputs of the ANN model are the bending losses in decibels for the fundamental TE and TM modes. The range of input parameters is selected to explore valid solutions. The silicon waveguide width is explored from 300 to 600 nm and bending radii from 0 to 5 μm. The total number of simulations was 732, which was divided into training data (372) and validation data (360).

Fig. 6. 90° bend waveguide. The core silicon layer has a thickness of 220 nm with a buried oxide layer of 2 μm.


After training and validating the ANN model, the average training error is found to be 0.32% while the validation error is 0.76% using one hidden layer with seven neurons. Figures 7(a) and 7(b) show the bending loss obtained from the ANN model as a function of bending radius for fundamental TE and TM modes, respectively. To compare the ANN model with independent Lumerical FDTD simulation, we compare the output results for waveguide widths of 385, 477, and 565 nm. Figure 8 shows the bending loss as a function of bending radius for the TE fundamental mode obtained from the ANN model (dashed lines) and Lumerical FDTD simulations (circles). As can be seen, there is good agreement between the simulated results and the ANN model. For a waveguide width of 477 nm, the ANN results took only 25.7 ms to execute while the FDTD simulations (12 points) took 155 s on a PC with an Intel i7 3 GHz processor and 64 GB of RAM. In addition to the ANN model, a number of polynomial fits were generated. The best fit was obtained with a fifth-order polynomial, which is also presented in Fig. 8 (dotted lines). Over the entire surface shown in Fig. 7, the mean square errors of the ANN and the polynomial fit were determined to be 0.022 and 0.112, respectively, which clearly shows the advantage of the ANN methodology.

Fig. 7. Bending loss in dB of fundamental (a) TE and (b) TM modes as a function of bending radius as obtained using the trained ANN.


Fig. 8. Bending loss in dB of fundamental TE mode as a function of bending radius as obtained using the trained ANN for waveguide widths of 385, 477, and 565 nm. The Lumerical FDTD simulation results (circle markers) are compared with the ANN model (dashed line) and a fifth-order polynomial (dotted line).


C. Directional Coupler

Optical directional couplers are one of the most fundamental components in integrated optics and are widely applied in power combiner/dividers, optical switches, wavelength filters, polarization selectors, and add/drop multiplexers. The directional coupler consists of two parallel waveguides, where the coupling coefficient is controlled by both the coupler length and the spacing between the two waveguides. Directional couplers can be implemented using any type of waveguide, but in this section we use strip waveguides.

Lumerical 3D FDTD is used to obtain the training and validation data for the ANN model. The structure of a directional coupler is shown in Fig. 9, where the structure is based on an SOI wafer with a fully etched 220 nm thick silicon layer over a 2 μm thick buried oxide layer. The waveguide width is constrained to 500 nm, where only the fundamental mode is supported. In this study, we sweep two parameters: coupler length ( $L_{c}$ ) and gap ( $g_{c}$ ). The objective is to determine the coupling coefficient as a function of coupler length and gap.

Fig. 9. Directional coupler.


Based on the coupled mode theory [18,19], the fraction of the power coupled from one waveguide to the other is expressed as

$$κ^{2} = \frac{P_{cross}}{P_{in}} = \sin^{2}(c L_{c}), \tag{3}$$

where $P_{in}$ and $P_{cross}$ are the input power and the power coupled across the directional coupler, respectively, $c$ is the coupling coefficient, and $L_{c}$ is the coupler length. The coupling coefficient, $c$ , is given by

$$c = \frac{π Δn}{λ} = \frac{π}{2 L_{x}}, \tag{4}$$

where $Δn$ is the difference between the effective indices of the first two eigenmodes of the coupled waveguides, and $L_{x}$ is the cross-over length, the distance over which the power transfers completely from one waveguide to the other. Using the half-angle identity, the fraction of the power coupled across the directional coupler is given by

$$κ^{2} = \sin^{2}\left(\frac{π L_{c}}{2 L_{x}}\right) = \frac{1}{2} - \frac{1}{2}\cos\left(\frac{π L_{c}}{L_{x}}\right). \tag{5}$$
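The coupled-mode relations above can be checked numerically with a short sketch. The function names and the index difference of 0.031 are hypothetical values chosen only to give a round cross-over length; they are not simulation results from this work.

```python
import math


def cross_over_length(wavelength, delta_n):
    """L_x = wavelength / (2 * delta_n), from c = pi*delta_n/wavelength = pi/(2*L_x)."""
    return wavelength / (2.0 * delta_n)


def cross_power_fraction(coupler_length, l_x):
    """Coupled power fraction in the sin^2 form."""
    return math.sin(math.pi * coupler_length / (2.0 * l_x)) ** 2


def cross_power_fraction_cos(coupler_length, l_x):
    """The same quantity in the equivalent half-angle (cosine) form."""
    return 0.5 - 0.5 * math.cos(math.pi * coupler_length / l_x)


# Hypothetical index difference at a wavelength of 1.55 um (lengths in um).
l_x = cross_over_length(1.55, 0.031)
for length in (0.0, 0.5 * l_x, l_x):
    print(length, cross_power_fraction(length, l_x), cross_power_fraction_cos(length, l_x))
```

Under these ideal relations, all of the power has crossed at $L_{c} = L_{x}$ and half of it at $L_{c} = L_{x}/2$ .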

Instead of using the length-dependent coupling coefficient as the output of the ANN, we fit the coupling coefficient in Eq. (5) to a sine wave as

$$κ^{2} = y + A \sin\left(\frac{2π}{T} L_{c} + ϕ\right). \tag{6}$$

Therefore, the input of the ANN is the coupler gap ( $g_{c}$ ) and the four outputs are the amplitude ( $A$ ), period ( $T$ ), phase ( $ϕ$ ), and y-offset ( $y$ ). The coupler length is explored from 0 to 400 μm, and the coupler gap from 50 nm to 300 nm. The total number of simulations was 816, which was divided into training data (459) and validation data (357).
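To connect the four fitted outputs to the ideal coupled-mode expression, the sketch below evaluates the sine model and shows the parameter values that would reproduce the ideal $\sin^{2}$ response exactly: an offset and amplitude of 1/2, a period of $2 L_{x}$ , and a phase of $-π/2$ . The function names and the 25 μm cross-over length are hypothetical, and parameters fitted to simulated data would be expected to deviate from these ideal values.

```python
import math


def fitted_cross_power(coupler_length, offset, amplitude, period, phase):
    """Sine model of the cross-port power: offset + amplitude*sin(2*pi*L_c/period + phase)."""
    return offset + amplitude * math.sin(2.0 * math.pi * coupler_length / period + phase)


def ideal_fit_parameters(l_x):
    """Parameter values for which the sine model equals sin^2(pi*L_c/(2*L_x))."""
    return {
        "offset": 0.5,
        "amplitude": 0.5,
        "period": 2.0 * l_x,
        "phase": -math.pi / 2.0,
    }


params = ideal_fit_parameters(25.0)  # hypothetical cross-over length in um
for length in (0.0, 12.5, 25.0):
    print(length, fitted_cross_power(length, **params))
```

Modeling four smooth functions of the gap, rather than the oscillating response over both gap and length, gives the ANN a much simpler mapping to learn.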

A neural network model based on the MLP network is used to model the directional coupler. After training and validating the ANN model, the average training error is found to be 0.37% while the average validation error is 0.66% using one hidden layer with eight neurons. Figure 10 shows the cross-over length, $L_{x}$ , as a function of coupler gap obtained from the ANN model. The cross-over length is found to depend exponentially on the coupler gap. Using this figure, a designer can determine the required length for a 3 dB coupler with a given coupler gap. For example, for a coupler gap of 150 nm, the 3 dB coupler must be 12.3 μm long.

Fig. 10. Cross-over length, $L_{x}$ , versus directional coupler gap calculated using the ANN model.


In Fig. 11 the ANN model of the cross-port transmission is presented as a function of both the coupler gap and length. The complexity of the input/output relationship is clear. To illustrate the accuracy of the ANN model, Fig. 12 shows the coupling coefficient as a function of directional coupler length obtained from the ANN model (solid lines) and Lumerical FDTD simulations (square markers) for coupler gaps of 50 nm, 150 nm, and 280 nm. As can be seen, there is good agreement between the simulated results and the ANN model. Not only can the cross-over length be extracted, but the directional coupler length can also be chosen such that a specific power ratio is obtained. For example, a coupler length of 144.3 μm is required to achieve 90% power splitting for a coupler with a gap of 280 nm. To simulate a directional coupler with a specific gap, the ANN results took only 1.5 ms to execute while the FDTD simulations (21 points) took 7 h and 44 min on a PC with an Intel i7 3 GHz processor and 64 GB of RAM.
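Choosing a coupler length for a target power ratio amounts to inverting the ideal $\sin^{2}$ relation of Eq. (5). The following is a minimal sketch of that inversion only: it neglects the fitted offset, amplitude, and phase, so it will not in general reproduce lengths read from the trained model. The function name is hypothetical, and the 24.6 μm cross-over length is an assumed input chosen so that the ideal formula returns the 12.3 μm length quoted for a 3 dB coupler; it is not a value read from Fig. 10.

```python
import math


def coupler_length_for_ratio(target_fraction, l_x):
    """Shortest L_c with sin^2(pi*L_c/(2*L_x)) equal to the target cross-port fraction."""
    if not 0.0 <= target_fraction <= 1.0:
        raise ValueError("target_fraction must lie between 0 and 1")
    return (2.0 * l_x / math.pi) * math.asin(math.sqrt(target_fraction))


# Assumed cross-over length in um (see the note above).
length_3db = coupler_length_for_ratio(0.5, 24.6)
print(length_3db)
```

In this ideal picture a 3 dB coupler is always half a cross-over length long, and full transfer requires exactly one cross-over length.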

Fig. 11. Transmission in cross-over port as a function of coupler length and gap.


Fig. 12. Cross-over transmission as a function of coupler length as obtained using Lumerical FDTD and trained ANN. Note: these plots are vertical slices through the surface presented in Fig. 11.


D. Design of a $1 \times 2$ MMI

Power splitters and couplers are basic and important building blocks for photonic integrated circuits. Among all of the integrated optical implementations reported, MMI-based power splitters have been shown to have a smaller footprint, better tolerance to fabrication errors, broader bandwidth, and lower polarization dependency for weakly restrictive waveguides.

Figure 13 shows the schematic configuration of an MMI coupler, defining the length ( $L_{c}$ ) and width ( $w_{c}$ ). The operation of an MMI-based device is based on the self-imaging principle [20]. The incident fundamental mode entering via the access waveguide will excite many higher-order modes in the MMI region to satisfy the continuity of the input field.

Fig. 13. Schematic of an MMI coupler.


A straightforward 2D analysis provides the propagation constants of these excited modes approximately, and they follow a quadratic relationship given by [20]

$$β_{0} - β_{v} = \frac{v (v + 2) π}{3 L_{π}}, \tag{7}$$

where $β_{0}$ is the propagation constant of the fundamental mode in the MMI waveguide, $v$ is the mode number, and $L_{π}$ is the beat length of the two lowest-order modes,

$$L_{π} = \frac{π}{β_{0} - β_{1}}. \tag{8}$$

For a symmetric interference mechanism relating to a $1 \times N$ MMI power splitter whose input waveguide is placed at the center of the MMI section, an $N$-fold image may be obtained at a distance

$$L_{c} = \frac{3 L_{π}}{4 N} \tag{9}$$

by exciting only the even symmetric modes. However, this theory does not include the effects of a real three-dimensional structure and does not predict the losses due to reflection at the interfaces present at the inputs and outputs. Due to these limitations, MMI design is often undertaken using time-consuming 3D FDTD simulations.
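The approximate self-imaging theory can be evaluated with a few lines of code. The sketch below assumes the standard relation $β = 2π n_{eff}/λ$ between propagation constant and effective index; the function names and the two effective indices are hypothetical and are not values from the simulations in this work.

```python
import math


def beat_length(wavelength, n_eff_0, n_eff_1):
    """Beat length of the two lowest-order modes: L_pi = pi / (beta_0 - beta_1)."""
    beta_0 = 2.0 * math.pi * n_eff_0 / wavelength
    beta_1 = 2.0 * math.pi * n_eff_1 / wavelength
    return math.pi / (beta_0 - beta_1)


def mode_spacing(mode_number, l_pi):
    """Quadratic spacing beta_0 - beta_v = v*(v + 2)*pi / (3*L_pi)."""
    return mode_number * (mode_number + 2) * math.pi / (3.0 * l_pi)


def splitter_length(l_pi, n_outputs):
    """Length of the first N-fold image for a centre-fed 1xN MMI: 3*L_pi / (4*N)."""
    return 3.0 * l_pi / (4.0 * n_outputs)


# Hypothetical effective indices of the two lowest-order modes at 1.55 um.
l_pi = beat_length(1.55, 2.85, 2.80)
l_c = splitter_length(l_pi, 2)
print(l_pi, l_c)
```

Setting the mode number to 1 in the quadratic spacing recovers the definition of the beat length, which is a quick consistency check on the two expressions.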

For the MMI coupler used in the following work, the insulator layer is thick enough (e.g., 2 μm) to ensure a low leakage loss to the substrate, and the wavelength considered is 1.55 μm. The cladding layer is assumed to be air. The heights of the core and access waveguides are fixed at 220 nm. The width of the central section ( $w_{c}$ ) and the access waveguide width ( $w_{g}$ ) are swept from 3 μm to 6 μm and from 0.5 μm to 1 μm, respectively. The total number of experiments is 519, which is divided into training data (224) and validation data (295).

Taking advantage of the Lumerical EME simulator, the MMI length is swept from 1 to 36 μm for each $w_{c}$ and $w_{g}$ value. As an example, the transmission curve is shown in Fig. 14 for $w_{c}$ and $w_{g}$ of 4 μm and 885 nm, respectively. To simplify the data for the neural network, instead of importing the transmission curve as a function of MMI length to the ANN, only the maximum transmission and the corresponding MMI length are imported for each $w_{c}$ and $w_{g}$ step. Therefore, the developed neural network model for the transmitted power of the fundamental TE and TM modes of a $1 \times 2$ MMI has two inputs ( $w_{c}$ and $w_{g}$ ) and two outputs: the optimum MMI length, $L_{opt}$ , and the maximum transmission of the fundamental mode, $T_{\max}$ . After training and validating the ANN model, the average training error is found to be 0.53% while the average validation error is 1.62% using one hidden layer with 23 hidden neurons.
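The reduction of each length sweep to a single training sample can be sketched as a simple maximum search. The function name and the sweep values below are invented for illustration and are not EME results.

```python
def optimum_from_sweep(lengths, transmissions):
    """Return (optimum length, maximum transmission) from one MMI length sweep."""
    if len(lengths) != len(transmissions) or not lengths:
        raise ValueError("sweep arrays must be non-empty and of equal length")
    # Index of the largest transmission value in the sweep.
    best = max(range(len(lengths)), key=lambda i: transmissions[i])
    return lengths[best], transmissions[best]


# Hypothetical sweep for one (w_c, w_g) pair: lengths in um, transmission as a fraction.
sweep_lengths = [10.0, 12.0, 14.0, 16.0, 18.0]
sweep_transmissions = [0.41, 0.63, 0.84, 0.79, 0.52]

l_opt, t_max = optimum_from_sweep(sweep_lengths, sweep_transmissions)
print(l_opt, t_max)
```

Each pair returned this way, together with its $w_{c}$ and $w_{g}$ values, forms one input-output sample for the network.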

Fig. 14. MMI transmission as a function of the MMI length for TE modes after an EME sweep. The red circle represents the maximum transmission which occurs at the optimum MMI length.


Figure 15 shows the ANN prediction of the maximum output transmission of the MMI as a function of the MMI width and access waveguide width. It is clear from the heat map that the relationship between the input and the output is very complex. Figure 16 shows vertical slices through this heat map and compares the ANN model to testing data, demonstrating that the ANN captures the detailed functionality of the device. The plot shows that the transmission increases as the access waveguide width increases. In addition, the transmission decreases for larger MMI widths.

Fig. 15. Maximum transmission for a $1 \times 2$ MMI coupler ( $h_{co} = 220$ nm) as a function of $w_{c}$ and $w_{g}$ for fundamental TE mode obtained from the ANN model.


Fig. 16. Maximum transmission as a function of $w_{c}$ for three $w_{g}$ : 0.5 μm, 0.73 μm, and 1 μm. The ANN model (solid lines) is compared with Lumerical simulations (markers).


Figure 17 shows the optimum MMI length as a function of the MMI width obtained from the ANN model and Lumerical EME simulation. As can be seen, the simulation results and the ANN model agree very well. For comparison, the value obtained using the approximate theory [Eq. (9)] is also shown, illustrating the difference between the detailed FDTD modeling and this theory.

Fig. 17. Optimum MMI length for a $1 \times 2$ MMI coupler ( $h_{co} = 220$ nm) as a function of $w_{c}$ for TE modes. Comparison between the ANN model (red line), EME simulation (blue circles), and theoretical results (dashed green line).


Using the ANN model, a user can choose an MMI width and find the corresponding optimum MMI length that maximizes transmission. For example, for a $1 \times 2$ MMI coupler with MMI width and access waveguide width of 4.25 μm and 0.8 μm, respectively, the optimum length is 15.38 μm, which results in a maximum transmission of 87%. When this structure is simulated in Lumerical EME, the optimum length and maximum transmission are 15.4 μm and 86.3%, respectively. While the Lumerical EME simulation took 300 s on a PC with an Intel i7 3 GHz processor and 64 GB of RAM, the ANN results took only 0.8 ms.

E. Applications Summary

In this work, a three-layer MLP neural network structure was used for each neural network model, and a quasi-Newton training algorithm was used to train the neural network models. Testing data are used after training the model to verify the generalization ability of these models. The automatic model generation algorithm of NeuroModelerPlus [14] was used to develop these models; it trains the model automatically until the training and testing accuracy targets are met. The training and test errors are generally similar because sufficient training data were used in the examples.

Neural network models are useful for highly repeated design tasks with different specifications. In such a case, the benefit of using the models far outweighs the cost of training for the following reasons.

1. Generating training data is done only once. The benefit of the model increases when the model is used repeatedly.
2. Neural network training is considered outside the design cycle.
3. Unlike circuit or device design, which requires human interaction, neural network training is a machine-based computational task.
4. Neural network model training can be done by a model developer but used by multiple designers. The neural network approach cuts expensive design time by shifting much of the burden to offline computer-based neural network training.
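The trade-off behind these points can be made concrete with a short break-even estimate. In the sketch below, the simulation time (300 s) and model evaluation time (0.8 ms) are the values quoted for the MMI example, while the number of training simulations (500) and the training time (600 s) are hypothetical values chosen only for illustration.

```python
import math

def break_even_uses(n_train, t_sim_s, t_train_s, t_model_s):
    """Number of design evaluations at which the one-time cost of building
    the model is recovered by the time saved on each evaluation."""
    upfront = n_train * t_sim_s + t_train_s   # data generation + training
    saving_per_use = t_sim_s - t_model_s      # time saved per evaluation
    if saving_per_use <= 0:
        raise ValueError("the model must be faster than the simulator")
    return math.ceil(upfront / saving_per_use)

# n_train and t_train_s are hypothetical; the other values are from the text.
uses = break_even_uses(n_train=500, t_sim_s=300.0, t_train_s=600.0, t_model_s=0.8e-3)
```

Because the per-evaluation cost of the model is negligible next to a simulation, the break-even count is close to the number of training simulations plus the training time expressed in units of one simulation.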

4. CONCLUSIONS

In this paper, a neural network model based on an MLP neural network structure has been used to model optical passive elements. Four fundamental passive elements were selected to demonstrate the accuracy and speed of ANNs in optical modeling, and we have shown how ANNs can be used for designing and optimizing such elements. The scope of the paper is not to optimize the different passive elements presented here, but rather to demonstrate the accuracy and effectiveness of the ANN models. The average error is less than 1%, while the computational time is in the range of milliseconds, compared to minutes or hours using commercial tools such as OptiMODE and Lumerical Solutions. The models can be made more flexible by adding more geometrical dimensions and wavelength as inputs. This can be very useful not only in designing and optimizing optical passive elements, but also in designing optical circuits where different elements are integrated together. In addition, ANN models can be very useful in statistical validation, which is essential in the design process in order to account for fabrication tolerances that can have a strong effect on the functionality of fabricated photonic circuits and on fabrication yield.
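The statistical validation mentioned above can be sketched as a Monte Carlo yield estimate, which becomes practical once each model evaluation takes milliseconds. The example below is an illustration under stated assumptions: `toy_transmission` is a hypothetical placeholder response rather than a trained ANN, and the Gaussian tolerance model, the standard deviations, and the transmission specification are invented for the example.

```python
import math
import random

def toy_transmission(width_um, length_um):
    """Hypothetical placeholder response (an assumption, not the paper's ANN)."""
    peak_length = 0.85 * width_um ** 2
    return 0.87 * math.exp(-((length_um - peak_length) / 3.0) ** 2)

def estimate_yield(model, nominal, sigma, spec, n_samples=10000, seed=0):
    """Fraction of randomly perturbed devices whose transmission meets `spec`.
    Width and length are drawn from independent Gaussian distributions."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_samples):
        width = rng.gauss(nominal[0], sigma[0])
        length = rng.gauss(nominal[1], sigma[1])
        if model(width, length) >= spec:
            passed += 1
    return passed / n_samples

# Nominal design at the placeholder's peak; tolerances in micrometres (assumed).
nominal = (4.25, 0.85 * 4.25 ** 2)
yield_fraction = estimate_yield(toy_transmission, nominal, sigma=(0.02, 0.05), spec=0.85)
```

Running the same loop with a full electromagnetic simulation per sample would require thousands of simulations, which is the cost a fast model is meant to avoid.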

As the integrated optics industry expands and matures, sophisticated methods of modeling devices need to be developed to handle more complex physics and structural detail. A good example of this is the development of nanophotonics. Nanophotonics studies light and its interactions with matter at the nanoscale [21]. As the demands on performance and integration level increase, the design and optimization of nanophotonic devices become computationally expensive. A recent review illustrates how nanophotonic inverse design can benefit from the use of artificial intelligence techniques [22]. The use of ANNs for modeling a wide variety of optical devices would appear to be of benefit for applications from optimization to system level simulation.

Funding

Natural Sciences and Engineering Research Council of Canada (NSERC) (CRDPJ493622-16); Ontario Centres of Excellence (OCE) (25798).

REFERENCES

1. D. Thomson, A. Zilkie, J. E. Bowers, T. Komljenovic, G. T. Reed, L. Vivien, D. Marris-Morini, E. Cassan, L. Virot, J.-M. Fédéli, J.-M. Hartmann, J. H. Schmid, D.-X. Xu, F. Boeuf, P. O’Brien, G. Z. Mashanovich, and M. Nedeljkovic, “Roadmap on silicon photonics,” J. Opt. 18, 073003 (2016).

2. A. E. J. Lim, J. Song, Q. Fang, C. Li, X. Tu, N. Duan, K. K. Chen, R. P. C. Tern, and T. Y. Liow, “Review of silicon photonics foundry efforts,” IEEE J. Sel. Top. Quantum Electron. 20, 405–416 (2014).

3. OptiFDTD, https://www.optiwave.com.

4. Lumerical FDTD Solutions, https://www.lumerical.com.

5. COMSOL Multiphysics, https://www.comsol.com.

6. F. Wang, V. K. Devabhaktuni, C. Xi, and Q.-J. Zhang, “Neural network structures and training algorithms for RF and microwave applications,” Int. J. RF Microwave Comput. Aid. Eng. 9, 216–240 (1999).

7. Q.-J. Zhang, K. Gupta, and V. Devabhaktuni, “Artificial neural networks for RF and microwave design-from theory to practice,” IEEE Trans. Microwave Theory Tech. 51, 1339–1350 (2003).

8. S. W. S. Wan, L. Z. L. Zhang, and Q. Z. Q. Zhang, “Application of artificial neural networks for electromagnetic modeling and computational electromagnetics,” in 51st Midwest Symposium on Circuits and Systems (2008), pp. 743–746.

9. T. Abreu-Cerqueira, A. Dourado-Sisnando, and V. F. Rodriguez-Esquerre, “Analysis and design of directional couplers based on Al$_x$Ga$_{1-x}$As by using an efficient neural networks: a design tool simulation implemented in C/C++,” in SBMO/IEEE MTT-S International Microwave and Optoelectronics Conference (IMOC) (IEEE, 2011), pp. 881–885.

10. M. F. O. Hameed, S. S. A. Obayya, K. Al-Begain, A. M. Nasr, and M. I. Abo El Maaty, “Accurate radial basis function based neural network approach for analysis of photonic crystal fibers,” Opt. Quantum Electron. 40, 891–905 (2009).

11. R. R. Andrawis, M. A. Swillam, M. A. El-Gamal, and E. A. Soliman, “Artificial neural network modeling of plasmonic transmission lines,” Appl. Opt. 55, 2780–2790 (2016).

12. H. Kabir, Y. Wang, M. Yu, and Q. Zhang, “Neural network inverse modeling and applications to microwave filter design,” IEEE Trans. Microwave Theory Tech. 56, 867–879 (2008).

13. M. M. Vai, S. Wu, B. Li, and S. Prasad, “Reverse modeling of microwave circuits with bidirectional neural network models,” IEEE Trans. Microwave Theory Tech. 46, 1492–1494 (1998).

14. Q. J. Zhang and K. C. Gupta, Neural Networks for RF and Microwave Design (Book + Neuromodeler Disk), 1st ed. (Artech House, 2000).

15. S. K. Selvaraja, P. Jaenen, W. Bogaerts, D. V. Thourhout, P. Dumon, and R. Baets, “Fabrication of photonic wire and crystal circuits in silicon-on-insulator using 193-nm optical lithography,” J. Lightwave Technol. 27, 4076–4083 (2009).

16. D. Dai and S. He, “Analysis of characteristics of bent rib waveguides,” J. Opt. Soc. Am. A 21, 113–121 (2004).

17. M. K. Smit, E. C. Pennings, and H. Blok, “Normalized approach to the design of low-loss optical waveguide bends,” J. Lightwave Technol. 11, 1737–1742 (1993).

18. R. Syms and J. Cozens, Optical Guided Waves and Devices (McGraw-Hill, 1992).

19. L. Chrostowski and M. Hochberg, Silicon Photonics Design (Cambridge University, 2015).

20. L. B. Soldano and E. C. Pennings, “Optical multi-mode interference devices based on self-imaging: principles and applications,” J. Lightwave Technol. 13, 615–627 (1995).

21. S. Molesky, Z. Lin, A. Y. Piggott, W. Jin, J. Vuckovic, and A. W. Rodriguez, “Inverse design in nanophotonics,” Nat. Photonics 12, 659–670 (2018).

22. K. Yao, R. Unni, and Y. Zheng, “Intelligent nanophotonics: merging photonics and artificial intelligence at the nanoscale,” arXiv:1810.11709 (2018).

Parameter	Variable	Range (nm)
Silicon waveguide width	$w_{g}$	200–1500
Silicon thickness	$t_{g}$	200–500
Slab thickness	$t_{s}$	0–100
Wavelength	$λ$	1400–1600
