
Illustrated tutorial on global optimization in nanophotonics

Open Access

Abstract

Numerical optimization for the inverse design of photonic structures is a tool that is providing increasingly convincing results—even though the wave nature of problems in photonics makes them particularly complex. Meanwhile, the field of global optimization is rapidly evolving but is prone to reproducibility problems, making it harder to identify the right algorithms to use. This paper is intended as a tutorial on global optimization for photonics problems. We provide a general background on global optimization algorithms and a rigorous methodology for a physicist interested in using these tools—especially in the context of inverse design. We suggest algorithms and provide explanations for their efficiency. We provide codes and illustrative examples that can be run online, integrating fast simulation code and Nevergrad, a state-of-the-art benchmarking library. Finally, we show how physical intuition can be used to discuss optimization results and to determine whether the solutions are satisfactory or not.

© 2024 Optica Publishing Group

1. INTRODUCTION

While efficient automated design methods for multilayered structures emerged in the 1970s, numerical optimization has typically been used only more recently, thanks to the increase in the available computational power and the progress in simulation techniques. These developments led to methods providing original and efficient designs for three-dimensional structures, for which no design rules exist [14]. In photonics, the most promising approaches so far are inspired by successful methods from mechanics and are based on local optimization algorithms [5]. However, in photonics, the wave nature of the problem typically generates a large number of local minima, making global optimization algorithms valuable tools, while they are in many cases unreasonably expensive in mechanics [6].

Numerical optimization is a domain in which significant progress has been made in the last two decades, with enormous practical implications. Recent results suggest that modern global optimization algorithms are able to provide us with original and efficient photonic designs [7] that are particularly convincing as they can be understood from a physical point of view [8]. However, reproducibility problems have made the important results in the field of optimization harder to identify and to trust [9], especially for researchers from other communities.

The aim of this paper is to serve as a tutorial for photonics specialists who are interested in using modern numerical optimization tools. We provide insights, practical tips, and guidance to help researchers navigate the challenges and pitfalls associated with optimization. We demonstrate how simulation tools and state-of-the-art optimization libraries can be easily integrated to effectively tackle inverse design problems.

Specifically, we provide examples of multi-layer photonics problems, simulated with the PyMoosh toolkit [10] and optimized using the Nevergrad Python library [11]. We present a comprehensive methodology that includes defining relevant observables, choosing optimization strategies, and computing specific criteria to assess the reliability of the obtained solutions. We offer practical examples inspired by real-world problems involving multilayered structures, but we have also included the optimization of a 2D optical grating and a 3D plasmonic structure to show that these techniques can be applied even in the most complex setups. These examples effectively illustrate our methodology and make it easy to transpose to other situations. For easy reproducibility, the codes are provided in an online repository [12].

In the first part, we provide essential background information on automated design of photonic structures and optimization. We also introduce the test cases that we use to demonstrate the concepts discussed in this paper.


Fig. 1. Parametrization. The process of selecting the parameters to describe the structure plays a crucial role in determining the types of solutions that can be obtained. (a) Fewer than 10 parameters are optimized. The geometry of the structure is fully constrained. We refer to problems with such a low number of degrees of freedom as optimization, but not inverse design. Image from [13]. (b) Around a hundred parameters are optimized. While the geometry is constrained since the structure is periodic, the blocks can take any size and position within the period. Optimization could have produced a Bragg mirror as well as a diffraction grating. Here, an intermediate structure is produced, presenting characteristics of both. Given the wide range of possibilities, this can be deemed an inverse problem. Image from [7]. (c) Around a thousand parameters are optimized in a pixel-based optimization. Each pixel is filled with one material or another to create the final design. Reprinted with permission from Gondarenko et al., Phys. Rev. Lett. 96, 143904 (2006) [14]. Copyright 2006 by the American Physical Society. (d) Tens of thousands of parameters are optimized. We call this type of image-like parametrization topology optimization. Reprinted with permission from Piggott et al., “Inverse design and demonstration of a compact and broadband on-chip wavelength demultiplexer,” Nat. Photonics 9, 374–377 (2015) [15].


The second section provides an overview of algorithm categories, explains in detail the inner workings of some algorithms, and outlines key observations to make during algorithm execution, as these observations provide insights into the success or failure of the optimization.

In the third part, we walk through the steps of running an optimization, showcase the benchmarking of algorithms using our examples, and provide criteria for selecting the most effective algorithms. Additionally, we conduct a thorough physical analysis of our solutions to evaluate their quality.

In the final part, we present general guidelines for optimizing photonic structures using global optimization algorithms. These guidelines serve as a methodology to avoid common pitfalls and maximize the potential of the optimization process.

2. GENERAL BACKGROUND AND DESCRIPTION OF THE TEST CASES

In the fields of photonics and optimization, numerous concepts and terminologies have been introduced over the years. As we now require tools and concepts from both domains to optimize complex photonic structures, it is important to present an overview of the vocabulary that has been developed and provide clear definitions for key terms.

A. Fundamentals of Optimization in Photonics

Defining the optimization problem. To apply optimization techniques for improving a photonic structure, the structure must be represented by a finite set of parameters. This process of selecting the parameters to describe the structure is known as parametrization, and it plays a crucial role in determining the types of solutions that can be obtained, potentially introducing biases. Figure 1 presents typical examples of problems with increasing complexity in their parametrization.

In photonics, many problems are continuous, meaning that valid structures can be achieved by making the parameters change continuously. In the present paper we focus on continuous optimization. However, when the problem is not continuous, it is said to be discrete or combinatorial, requiring specialized algorithms [16–18]. As will be explained below, discrete problems in photonics are often made continuous in order to leverage gradient-based approaches. We underline that this comes at the cost of extra complications and that discrete algorithms can also provide interesting solutions [19,20], even if they are less often considered in photonics [2].

It is then necessary to define a cost function that quantifies the discrepancy between the actual optical response of the structure and the desired response. This cost function serves as a measure of how close the performance of the structure is to the target, even if this is not a measure in a strict mathematical sense. Finally, an optimization domain must be defined, typically by setting bounds on the parameter values. Together, the cost function and the optimization domain form the optimization problem. The global optimum is the point of the optimization domain where the value of the cost function is the lowest. While it is simple to determine whether a solution corresponds to a minimum of the cost function, called a local minimum, it is generally impossible to know whether such a minimum is the global optimum, i.e., the solution with the lowest possible value of the cost function, and hence the closest to the desired response. The surface representing the cost function as a function of the parameters is called the cost function landscape.
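As a minimal illustration, an optimization problem is fully specified once a cost function over a parameter vector and the bounds of the optimization domain are written down. In the sketch below, the simulate() function and the numerical values are purely illustrative placeholders for a real solver and real bounds.

import numpy as np

def simulate(x):
    # placeholder for a Maxwell solver returning the optical response
    # of the structure described by the parameter vector x
    return np.sum(np.sin(x) ** 2)

target = 0.0                        # desired response

def cost(x):
    # discrepancy between the actual and the desired response
    return (simulate(x) - target) ** 2

# optimization domain: one (lower, upper) pair per parameter, e.g., thicknesses in nm
lower = np.zeros(10)
upper = 200.0 * np.ones(10)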

We underline that it is often useful to put limits on the values of the cost function. A lot of cost functions in photonics are based on reflection or transmission coefficients, so that the physical limits put on such coefficients translate into physical boundaries for the cost function. Values outside these boundaries may be indicative of a numerical error, something algorithms often tend to find and exploit.

Local optimization. A local optimization algorithm starts with one structure (either generated randomly within the optimization domain or given by the user), and tries to improve it at each step by changing its parameters by a small amount—often guided by the gradient of the cost function landscape. Typically the new structure is kept if the associated cost function is better than for the previous structure. This makes for a simple and fast optimization technique, but has the problem of easily getting stuck in local minima of the cost function. Several strategies are possible to avoid the problem of local minima [2123], but they often make the algorithms more computationally expensive. In the end, performing multiple iterations of the optimization process with different initial structures can provide insight into the presence of local minima, which indicates the level of difficulty of the problem. Put simply, the greater the number of local minima, the less likely it is for a solution found by local optimization to be the global optimum.

An attraction basin is the region of the optimization domain for which a solution will be found whenever a local algorithm starts in this region. An optimization domain can typically be divided into attraction basins.

Global optimization and black-box optimization. The search for a global optimum is called global optimization (GO). Algorithms that optimize without using the gradient or other side information are called black-box optimization (BBO). BBO and GO are not synonymous: the former refers to the absence of additional information (gradient or other), whereas the latter refers to caring about global minima rather than local minima. There is a large overlap between the two categories of algorithms though, even if some BBO algorithms can be deemed local and if some GO algorithms exploit the information provided by the gradient [24,25].

BBO algorithms are generalist in nature and can be applied to a large variety of problems. A wide range of algorithms exist, and new ones are continually proposed [26]. This includes genetic algorithms, mathematical programming, Bayesian optimization, machine learning [27], reinforcement learning, or other BBO methods. Most of these algorithms are heuristic in nature, making them non-deterministic. This means that two different runs, even with the same starting points, can yield different solutions. Moreover, the performance of an algorithm often vastly depends on the specific optimization problem at hand [28], making it challenging to compare different algorithms. Consequently, BBO suffers from poor reproducibility, hindering the identification of the most efficient algorithms [29]. Combining rigorous benchmarking [30–40] and BBO libraries is the only approach able to address the reproducibility crisis in optimization, ensuring transparency and reliability of results. A lot of effort has thus been made to design benchmarks for BBO [9,11,41,42].

From optimization to inverse design in photonics. Optimization techniques have found applications in the improvement of photonic structures. They can also be employed in retrieving experimental parameters, as for instance commonly done in ellipsometry.

Thanks to the increase in the available computational power, numerical optimization has been increasingly used in cases where a wide range of designs can be generated. This wide range is achieved either by a low level of constraints on the geometry of the structures, or by the versatility of their optical behavior. Using numerical optimization to yield novel solutions is usually called inverse design, even though it is difficult to determine precisely when an optimization problem becomes an inverse design problem. Inverse design problems generally require advanced methods and increased computational power to solve them effectively.

For instance, problems characterized by an almost completely unconstrained geometry, often called “free form” [4,43], are considered inverse design problems beyond any doubt. However, optimizing a multilayered structure consisting of 20 layers with alternating refractive indexes can also be considered an inverse design problem. In that case, even though the geometry is somewhat limited, the structure still presents a wide diversity of optical responses, requiring advanced or adapted methods to tackle the challenge [44].

Topology optimization. With the increase in available computational power, it has recently become possible to divide a given region of space into pixels/voxels that can typically be either filled with a material or left empty, in order to look for a structure presenting a desired optical response. The number of parameters is then intentionally very large, to offer the algorithms a large number of degrees of freedom. Such an approach is called topology optimization (TO). TO draws inspiration from its successful application in the field of mechanics, where it has been widely used. Given the large number of parameters used for representing a structure, TO usually employs gradient-based methods. First, the problem is made continuous instead of discrete (continuous properties are allowed for a given pixel instead of a binary choice between filled and void); then, since the gradient can be computed at a low computational cost (through the use of the adjoint method [45]), steepest descent algorithms seem a natural choice. This approach has been extremely successful in mechanics, to the point that global optimization has been considered obsolete [6].

When applied to photonics problems, this approach has at first shown remarkable success, demonstrating that photonic structures can be miniaturized to an unprecedented degree while maintaining high optical performance [15]. However, the physics fundamentally differ in some aspects between mechanics and photonics. Mechanical problems can be regarded as comparatively simpler since a continuous deformation of an object results in a continuous and nearly monotonic deformation of its properties. The optimization process in photonics poses greater challenges compared to mechanical cases, primarily due to the wave nature of the problem. Photonic structures can exhibit resonant behavior, with each resonance corresponding to a local maximum or minimum of the cost function. This characteristic poses challenges for gradient descent methods, which are better suited for problems with smoother landscapes of the cost function. In photonics, two different starting points in the optimization process most often lead to two different outcomes [46]. This is in contrast to mechanics, where different starting points do not typically result in significant variations in the outcomes [6,47]. Moreover, the structures produced by these algorithms often exhibit excessive intricacies, which can pose challenges for fabrication and hinder their commercialization potential [1,48].

Global optimization for photonics. It is now evident that early attempts at using genetic algorithms for optimizing simple photonic structures were unsuccessful [49]. For years, specifically designed heuristics have in general failed to produce structures convincing enough to persuade the community to embrace global numerical optimization as a tool. Moreover, for continuous problems, even modern global optimization algorithms often yield unsatisfactory solutions when the parameter space has a high dimensionality, as in TO. The first successes of gradient-based TO may thus suggest that global optimization, as in mechanics, cannot compete with TO on inverse design problems.

However, in our experience with continuous problems and modern optimization algorithms, we have found that keeping the number of parameters relatively low, typically below 100, usually suffices to give BBO algorithms enough degrees of freedom to yield innovative and convincing results [7]. This approach requires limiting the number of degrees of freedom, a strategy also referred to as parametric optimization (PO). It becomes, however, challenging to make a fair comparison between BBO and TO, as they operate in different regimes.

In addition, global optimization algorithms may be valuable in problems that are typically considered in TO, due to the discretization requirement: whereas GO and BBO refer to (not incompatible) algorithm categories, TO refers to a category of problems; hence we can have simultaneously GO/BBO/TO or none of them. Many topology optimization problems are inherently discrete, and the use of continuous parameters is primarily for leveraging gradient-based methods. The variety of the results obtained using continuous parameters [46] suggests that making the problem continuous is also making it more difficult, for example, because local optimization algorithms get stuck in minima with intermediate values of the refractive index. In such cases, discrete global optimization algorithms may actually offer advantages and recent results seem to indicate that such methods are able to yield efficient solutions in a relatively consistent way, even spontaneously generating welcome features like symmetry [20] and the disappearance of intermediate refractive index values. Once again, the challenge is to identify the most suitable algorithm for a given problem, which can only be done with comprehensive algorithm libraries. In the present paper, we focus on PO, due to its advantages: suggested designs are frequently smooth and possible to manufacture, and the dimensionality typically remains moderate (dozens to hundreds).

In our opinion, global optimization algorithms are too often overlooked due to the challenges posed by high-dimensional parameter spaces. We underline that most often, finding the right algorithm makes a tremendous difference and that, given the inherent complexity of optimization, a rigorous methodology must be applied to achieve satisfactory solutions. While these methods may be computationally demanding, the advent of parallelism has made many global algorithms (which are usually intrinsically parallel) significantly more efficient, making them increasingly relevant.

B. Typical Photonics Test Cases

We have chosen three test cases that are typical of problems that can be encountered in photonics, on which we applied the methods we recommend and benchmarked different algorithms. A Jupyter Notebook [12] is provided in which we show how cost functions can be easily defined in the context of multilayered structures using the PyMoosh module [50] and how all the algorithms implemented in the comprehensive Nevergrad library [11] can be subsequently tested. These three test cases are: a dielectric mirror optimized for high reflectivity, an ellipsometric problem, and a photovoltaic cell optimized for absorption inside the active material. These problems are represented in Fig. 2.


Fig. 2. Graphic presentation of the test cases. (a) Bragg test case. The objective is to maximize the reflectance of a multilayered structure composed of two alternating materials (of refractive index 1.4 and 1.7) at a wavelength of 600 nm. The parameters are the thicknesses of the different layers, but the refractive indexes are set. (b) Photovoltaic test case. The objective is to maximize the short circuit current and thus the absorption in the visible spectrum range (wavelengths from 375 to 750 nm) in the silicon layer, using an antireflective multilayered coating composed of two alternating materials. (c) Ellipsometry test case. The objective is to retrieve the thickness and the refractive index of an unknown material, using the reflectance spectrum of a single layer in both polarizations. (d) 2D grating problem. The objective is to minimize the blue ($\lambda = 450\,{\rm nm}$) specular reflection while maximizing diffraction orders. The parameters subject to optimization are the width, position, and thickness of each block of matter. (e) 3D nanoantenna problem. Right: the optimization goal is to direct the emission from a local dipole lightsource towards a defined solid angle, by maximizing the ratio of the power emitted in the target direction (in red) to the power emitted in the rest of the solid angle. Left: top view of a geometry of a directional plasmonic gold nanoantenna. The parameters subject to the optimization are the positions (in the $x$ and $y$ directions) of each of the nanocubes.


High-reflectivity problem. The cost function is defined as $1 - R$, where $R$ represents the reflectance at the working wavelength of 600 nm of a multilayered structure with alternating refractive indexes (${n_1} = 1.4$ and ${n_2} = 1.8$, starting with ${n_2}$), as shown in Fig. 2(a). Therefore, the optimization algorithms will aim to find highly reflective structures. We considered two sub-cases: (i) a structure with only 10 layers (minibragg), which is representative of low-dimensionality problems in photonics; (ii) a structure with 20 layers (Bragg), a number we feel marks the beginning of the domain of inverse design problems.

The thickness of each layer is taken between zero and the thickness of a half-wave layer for the lower index. When considering only a single working wavelength, adding or removing a half-wave layer to a given layer has no impact on the optical response (this is called an absent layer). Therefore, considering larger thicknesses would only introduce degenerate minima. We underline that letting the refractive index vary would lead to the same solutions, as the optimization would converge to the highest index contrast between two layers [7]. This can be considered as physically expected, as this is known to be the most efficient choice to modulate the optical response of a multilayer [51]. The Bragg mirror, a periodic solution, has been identified as the best solution to this problem so far [7]. It is a local optimum, and it outperforms any disordered solution, suggesting that this regular, principled solution might be the global optimum—even though this is not, as of now, strictly proven. This is why we have selected it as a test case, for which we know the (likely) global optimum, and called it “Bragg.” We underline that structures reminiscent of Bragg mirrors emerge often even in 2D or 3D structures [14,52], for instance, in waveguide problems [20,46,53].
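As an illustration, the corresponding cost function can be written in a few lines with PyMoosh. The sketch below assumes PyMoosh's Structure class and coefficient routine (returning the amplitude and intensity coefficients r, t, R, T); the exact names and signatures should be checked against the version of the library in use, and the surrounding air media are an assumption of this example.

import numpy as np
import PyMoosh as pm

n1, n2 = 1.4, 1.8            # alternating refractive indexes
wavelength = 600.0           # working wavelength (nm)

def bragg_cost(thicknesses):
    """1 - R for a stack of alternating layers, starting with n2."""
    n_layers = len(thicknesses)
    materials = [1.0, n1 ** 2, n2 ** 2]          # permittivities: air, low, high
    # air / (n2, n1, n2, ...) / air; the semi-infinite media get a dummy thickness
    stack = [0] + [2 if i % 2 == 0 else 1 for i in range(n_layers)] + [0]
    layers = [0.0] + list(thicknesses) + [0.0]
    structure = pm.Structure(materials, stack, layers)
    _, _, R, _ = pm.coefficient(structure, wavelength, 0.0, 0)  # normal incidence
    return 1.0 - R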

Ellipsometry problem. The objective here is to find the material and the thickness of a reference layer, knowing its reflectance properties, obtained using spectroscopic ellipsometry. This step is required to extract the desired information from ellipsometric measurements. For a tested layer with a given material and thickness, we compute the ratio $e = \frac{{{r_p}}}{{{r_s}}}$, where ${r_p}$ is the reflection coefficient of the whole structure in TM polarization and ${r_s}$ is the reflection coefficient in TE polarization, for a wavelength range of 400–800 nm and for a given incidence angle of ${40^ \circ}$. The quantity we minimize here is the difference between the $e$ computed from the tested layer and the $e$ computed from the reference layer.

The thickness of the tested layer is taken between 30 and 250 nm, and we assume the refractive index lies between 1.1 and 3. This simple problem is illustrated in Fig. 2(c). Problems in ellipsometry are generally more complicated and highly dependent on the response model assumed for the material. Our simple dispersion-less and absorption-less test case can, however, be adapted easily thanks to the architecture of PyMoosh [10].
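To make the cost function explicit, the following self-contained sketch uses the textbook single-layer (Airy) reflection formula rather than the PyMoosh call used in the repository; the incident medium, the substrate index, and the values of the "unknown" reference layer are assumptions made purely for illustration.

import numpy as np

def single_layer_r(n_layer, d, wl, theta, pol, n_in=1.0, n_sub=1.5):
    """Complex reflection coefficient of a single layer on a substrate (Airy formula)."""
    cos1 = np.cos(theta)
    cos2 = np.sqrt(1 - (n_in * np.sin(theta) / n_layer) ** 2 + 0j)
    cos3 = np.sqrt(1 - (n_in * np.sin(theta) / n_sub) ** 2 + 0j)
    if pol == "s":    # TE
        r12 = (n_in * cos1 - n_layer * cos2) / (n_in * cos1 + n_layer * cos2)
        r23 = (n_layer * cos2 - n_sub * cos3) / (n_layer * cos2 + n_sub * cos3)
    else:             # TM
        r12 = (n_layer * cos1 - n_in * cos2) / (n_layer * cos1 + n_in * cos2)
        r23 = (n_sub * cos2 - n_layer * cos3) / (n_sub * cos2 + n_layer * cos3)
    beta = 2 * np.pi * n_layer * d * cos2 / wl
    return (r12 + r23 * np.exp(2j * beta)) / (1 + r12 * r23 * np.exp(2j * beta))

wavelengths = np.linspace(400, 800, 100)      # nm
theta = np.deg2rad(40.0)
n_ref, d_ref = 2.0, 120.0                     # illustrative "unknown" layer

def ratio(n, d):
    # ellipsometric ratio e = r_p / r_s over the whole spectral range
    return (single_layer_r(n, d, wavelengths, theta, "p")
            / single_layer_r(n, d, wavelengths, theta, "s"))

e_ref = ratio(n_ref, d_ref)

def ellipsometry_cost(x):
    n, d = x                                  # refractive index, thickness (nm)
    return float(np.sum(np.abs(ratio(n, d) - e_ref) ** 2))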

We underline that in many ways, this case mirrors the practical challenges faced by photonics researchers. It illustrates the common situation where researchers design structures characterized by a small number of parameters, and then engage in optimization to determine the upper limits of their performance [5456]. Such a problem is typically not considered as inverse design.

Photovoltaics problem. The objective here is to find the best possible antireflective coating in normal incidence with a multilayer structure made of two alternating materials (permittivities of two and three) on an amorphous hydrogenated silicon substrate (30,000 nm). The quantity to be maximized here is the short circuit current in the 375–750 nm range, assuming a quantum yield equal to one, as described in [7,8].

The cost function $f$ is one minus the efficiency, defined as the ratio between the short circuit current $j_{\rm sc}$ and the maximum achievable current $j_{\max}$ if all photons were converted into electron-hole pairs:

$$f = 1 - \frac{j_{\rm sc}}{j_{\max}},$$
with
$$j_{\rm sc} = \int A(\lambda)\,\frac{dI}{d\lambda}\,\frac{e\lambda}{hc}\,{\rm d}\lambda ,$$
where $A(\lambda)$ is the absorbance of the active layer, $e$, $h$, and $c$ are, respectively, the elementary charge, the Planck constant, and the speed of light in vacuum, and the spectral density of the illumination $\frac{dI}{d\lambda}$ is given by the solar spectrum [57].

The problem is illustrated in Fig. 2(b). As for the high-reflectivity problem we considered three sub-cases: (i) a structure with only 10 layers (Photovoltaics), (ii) a structure with 20 layers (BigPhotovoltaics), and (iii) a structure with 32 layers (HugePhotovoltaics).
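A minimal sketch of this cost function is given below; the absorbance and solar_spectrum helpers are placeholders standing in for the multilayer solver (PyMoosh in the repository) and for the tabulated solar spectral density, respectively.

import numpy as np
from scipy.constants import e, h, c

wavelengths = np.linspace(375e-9, 750e-9, 200)        # m

def absorbance(thicknesses, wl):
    # placeholder for the solver computing A(lambda) in the active layer
    return 0.5 * np.ones_like(wl)

def solar_spectrum(wl):
    # placeholder for the tabulated solar spectral density dI/dlambda
    return np.ones_like(wl)

def photovoltaic_cost(thicknesses):
    A = absorbance(thicknesses, wavelengths)
    dI = solar_spectrum(wavelengths)
    j_sc = np.trapz(A * dI * e * wavelengths / (h * c), wavelengths)
    j_max = np.trapz(dI * e * wavelengths / (h * c), wavelengths)
    return 1.0 - j_sc / j_max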

C. Complex Photonics Cases

In order to show how the techniques we advocate for can be applied to much more complex cases, we have selected 2D and 3D cases that can be studied even though they require much more computational power compared to multilayered structures.

The 2D case corresponds to a grating with a 600 nm period and composed of five layers containing a single block of matter, whose size and position can be adjusted as illustrated in Fig. 2(d). The cost function aims to minimize the specular reflection and maximize the diffraction orders, producing structures resembling the photonic structures that can be found on the wings of Morpho butterflies.

The 3D problem focuses on the optimization of a directional plasmonic nanoantenna that couples with a local quantum emitter, and steers the light towards a defined solid angle, as illustrated in Fig. 2(e).

For more details, the interested reader is invited to consult our previous work [7,58] and corresponding codes [12]. A first notebook shows how to perform an optimization using DE for the grating case, which generally produces a regular structure. Two notebooks are devoted to the plasmonic antenna—one showing a simple optimization and the second one leveraging the parallel library MPI to benchmark algorithms on that particularly costly test case.

3. BASIC TOOLS FOR OPTIMIZATION: ALGORITHMS AND OBSERVABLES

As discussed before, many algorithms and methods are available to perform optimization and eventually inverse design. It is therefore necessary to define some observables and basic tools to compare the performances of algorithms in a reliable way. This section first presents some well known categories of algorithms and shows how they can all be run through the Nevergrad platform [11]. Then, we define relevant observables, i.e., quantities that allow discussing the performance of algorithms and their results. All the discussions in this section are illustrated by results of the codes provided in [12].

A. Algorithms Categories

Many BBO platforms exist, including CMA (covariance matrix adaptation) [59], AX (adaptive experiments) [60], HyperOpt [61], SMAC (sequential model-based algorithm configuration) [62], NLOPT [63], and Nevergrad [11]; Nevergrad is a free Python library that imports all of these and others. We present in this section the algorithms from Nevergrad used in our benchmark and in our Jupyter Notebook experiments [12], and organize them based on their respective categories.
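As a minimal sketch of how such a library is typically called (the cost function below is a placeholder, and optimizer names such as "DE", "QODE", or "NGOpt" are keys of Nevergrad's optimizer registry; the exact API may vary slightly between versions):

import numpy as np
import nevergrad as ng

def cost(x):
    # placeholder for a photonics cost function (e.g., 1 - R from a solver)
    return float(np.sum((x - 42.0) ** 2))

# 20 bounded parameters, e.g., layer thicknesses in nm
param = ng.p.Array(init=100.0 * np.ones(20))
param.set_bounds(0.0, 200.0)

optimizer = ng.optimizers.registry["DE"](parametrization=param, budget=5000)
recommendation = optimizer.minimize(cost)
print(recommendation.value)      # best parameters found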

Mathematical programming. Methods available for gradient-based optimization have been adapted to the black-box setting. For example, the limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) [64] method can be applied, with derivatives computed by finite differences. We underline that other methods for computing the gradient more efficiently are available, especially the adjoint method [65] widely used in TO, or numerical tools [66], but this is beyond the scope of the present paper.
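For instance, a single local descent with finite-difference gradients can be sketched with SciPy as follows (the cost function is again a placeholder); when no analytical gradient is supplied, L-BFGS-B approximates it by finite differences.

import numpy as np
from scipy.optimize import minimize

def cost(x):
    return float(np.sum((x - 1.0) ** 2 + 0.1 * np.sin(5 * x)))   # placeholder

x0 = np.random.uniform(0.0, 2.0, size=10)     # random starting point
bounds = [(0.0, 2.0)] * 10
result = minimize(cost, x0, method="L-BFGS-B", bounds=bounds)
print(result.x, result.fun)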

Genetic and evolutionary algorithms. Some of the most well known families of algorithms are evolutionary and/or genetic algorithms. In an evolutionary algorithm, each step of the process creates and evaluates the cost function of a “population” of structures, preserving the “individuals” (i.e., the structures) that are better than those belonging to the population of the previous step (also called the previous “generation”). The different genetic algorithms have different ways of creating the new generation of structures. Most include steps such as mutations and crossovers between structure information.

We underline that the historical algorithms, in which, for instance, the binary coding of the parameters is treated as the genetic code, are considered obsolete in the optimization community because of their well documented lack of efficiency [67,68], and their use should be avoided.

Many algorithms inspired by these early ideas have emerged over the years, with improved efficiency. One of the most well known and efficient overall seems to be differential evolution (DE [69]), an algorithm that has many variants. One of the most classical variants is presented in Table 1 and the different formulas that can be used for the mutation to define new variants are presented in Table 2. By default, we use the current-to-best version of DE.


Table 1. Pseudo-code of DE


Table 2. Various DE Formulas
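To make the pseudo-code of Table 1 concrete, here is a minimal, unoptimized Python sketch of a DE loop with the current-to-best mutation and binomial crossover; the population size and the F and CR values are common defaults rather than prescriptions, and random indices are not excluded from the current individual for brevity.

import numpy as np

def differential_evolution(cost, lower, upper, budget, pop_size=30, F=0.8, CR=0.7):
    dim = len(lower)
    rng = np.random.default_rng()
    pop = rng.uniform(lower, upper, size=(pop_size, dim))
    costs = np.array([cost(x) for x in pop])
    evals = pop_size
    while evals < budget:
        best = pop[np.argmin(costs)]
        for i in range(pop_size):
            if evals >= budget:
                break
            a, b = pop[rng.choice(pop_size, 2, replace=False)]
            mutant = pop[i] + F * (best - pop[i]) + F * (a - b)   # current-to-best
            cross = rng.random(dim) < CR                          # binomial crossover
            cross[rng.integers(dim)] = True
            candidate = np.clip(np.where(cross, mutant, pop[i]), lower, upper)
            c = cost(candidate)
            evals += 1
            if c <= costs[i]:                                     # greedy selection
                pop[i], costs[i] = candidate, c
    return pop[np.argmin(costs)], costs.min()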

In our experiments, we often use the quasi-oppositional variant, QODE [70], which randomly draws half the initial population, and then, for each point $x$, adds ${-}r \times x$ to the population with $r$ randomly uniformly drawn in [0,1] (assuming that the center is zero: otherwise $c - r \times (x - c)$, with $c$ the center of the optimization domain). We also include QNDE (quasi-Newton DE), which runs QODE for half the budget and then finishes with BFGS with finite differences: this is a memetic algorithm, which runs first a global algorithm and then a local one.
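A sketch of this quasi-oppositional initialization, following the description above (Nevergrad's own implementation may differ in details):

import numpy as np

def quasi_oppositional_init(pop_size, lower, upper, seed=None):
    """Draw half the population uniformly, then add the quasi-opposite points
    c - r * (x - c), with r uniform in [0, 1] and c the center of the domain."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    center = (lower + upper) / 2
    half = rng.uniform(lower, upper, size=(pop_size // 2, len(lower)))
    r = rng.uniform(0.0, 1.0, size=(pop_size // 2, 1))
    return np.vstack([half, center - r * (half - center)])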

All the variants of DE typically perform well for a large parameter space dimension, including quite irregular cost functions such as the ones that appear in photonics. Most winning algorithms for large-scale global optimization (LSGO) competitions use a variant of DE [34].

There are many reasons why DE can be a sensible default choice—among them its simplicity, its low computational overhead, and its good overall performance associated with a general robustness to changes in the optimization conditions. We also point out that the rise of parallel computing resources makes DE and other global methods like PSO (see below) faster: whereas parallelizing BFGS or other gradient-based methods is tricky, DE (as implemented and parametrized in the present paper and in most implementations) is just 30 times faster with 30 cores than on a sequential machine, and can be even more parallel with a simple adaptation of parameters (typically increasing the population size). Besides this natural parallelism advantage, many black-box optimization methods can handle arbitrary parameters (including discrete ones, e.g., choosing a material from a list), non-differentiable stochastic manufacturing errors [71,72], multiple objective functions, and worst-case analysis over the input angle or over a range of wavelengths.

As its name indicates, DE is built on a differential formula to create new structures based on existing ones. This means that if multiple structures in the current population share the same characteristics (e.g., in the present framework, the same block of layers with identical thicknesses and materials), those characteristics will probably be preserved in the creation of the new candidate solution. In photonics, this leads to the emergence of modular structures with distinct blocks of layers, each having well defined optical functions. This property might contribute to its efficiency addressing photonics problems [7,73]. Also, DE can deal with discrete spaces as well, e.g., choosing between many materials.

Other evolutionary methods include particle swarm optimization (PSO [74]): a population of particles with random initial positions and velocities is attracted towards good search points found so far. PSO is known as a robust method, well suited to rugged cost functions.

Typically, evolution strategies (ESs) iteratively update a probability distribution model that is used to maximize the likelihood of generating good candidates. A well known example is covariance matrix adaptation (CMA [75]), which updates the entire covariance matrix of a multivariate normal distribution (MND). The CMA algorithm samples candidates using a MND, evaluates them, and then modifies the MND using the evaluation results.

While CMA typically performs best on the black-box optimization benchmark (BBOB [33]), simpler methods such as the $(1 + 1)$ evolution strategy with one-fifth rule [76] are still good in many cases due to their ability to quickly adapt the search scale.
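A minimal sketch of such a $(1 + 1)$ evolution strategy with a one-fifth success rule is given below; the step-size update factors are one common choice, selected so that the step size is stationary when one mutation out of five is accepted.

import numpy as np

def one_plus_one_es(cost, x0, sigma=0.1, budget=1000, seed=None):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    fx = cost(x)
    for _ in range(budget):
        candidate = x + sigma * rng.standard_normal(len(x))
        fc = cost(candidate)
        if fc < fx:
            x, fx = candidate, fc
            sigma *= np.exp(0.8)       # success: enlarge the search scale
        else:
            sigma *= np.exp(-0.2)      # failure: shrink it
    return x, fx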

Bayesian optimization. In a Bayesian optimization process [77,78], the algorithm uses a probability distribution (a Gaussian process) for modeling the cost function and the uncertainty, and updates it with each new structure tested. The model provides, for each point in the search space, an estimated probability distribution for the cost function value of that search point.

Therefore, it is possible, at each iteration, to define the expected improvement (EI) for a search point $x$: ${\rm EI}(x) = {\mathbb{E}_w}\,\max (0,m -{\rm CostModel}(x,w))$, where $m$ is the best cost so far and ${\rm CostModel}(x,w)$ is the current statistical model of the cost function. ${\rm CostModel}(x,w)$ depends on a random $w$, because this model is statistical. The value of $w \mapsto {\rm CostModel}(x,w)$ is updated each time a new cost value is available: it is frequently a Gaussian process.

This gives an estimation of what structure to try next: we search for $x$ in the search space such that ${\rm EI}(x)$ is approximately maximal—this requires a side optimization for each iteration, hopefully orders of magnitude faster than the original problem.
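The following compact sketch illustrates this loop on a one-dimensional placeholder cost function, using a scikit-learn Gaussian process and a simple grid as the side optimizer for the expected improvement; real Bayesian optimization libraries use more careful models and inner optimizers.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def cost(x):                                   # placeholder 1D cost function
    return float(np.sin(3 * x) + 0.5 * x ** 2)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(5, 1))            # a few initial random evaluations
y = np.array([cost(x[0]) for x in X])

for _ in range(20):
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    grid = np.linspace(-2, 2, 500).reshape(-1, 1)
    mu, sigma = gp.predict(grid, return_std=True)
    m = y.min()
    z = (m - mu) / np.maximum(sigma, 1e-12)
    ei = (m - mu) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    x_next = grid[np.argmax(ei)]                        # side optimization on a grid
    X = np.vstack([X, x_next])
    y = np.append(y, cost(x_next[0]))

print(X[np.argmin(y)], y.min())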

Many variants of this approach have been proposed in the literature, with many parameters for the stochastic model. By design, Bayesian optimization (BO) works best with problems with a small parameter space dimension and for relatively smooth functions or functions adapted to the design of the kernel used in a specific BO implementation. Also, the probabilistic interpolation process, which is necessary to know which structure to try next, can be expensive, possibly becoming computationally more expensive than the cost function itself. For these reasons, BO is best suited for problems with a cost function that is computationally costly [3]. In such cases, BO is then difficult to compare to other approaches, which explains why comparative studies are not abundant. For instance, the authors in [79] tested Bayesian optimization in a limited budget scenario but other, computationally cheaper algorithms were still performing well. As BO is not relevant in the context of our test cases, no BO method has been benchmarked in the present work, but for computationally expensive cost functions, such methods are often preferred.

Other black-box optimization methods. Other methods include pattern search methods, such as, e.g., the Nelder-Mead method [80], which iteratively updates a simplex of search points. Also, optimization wizards (also known as hyper-heuristics) automatically select and combine existing methods [81]: NGOpt (Nevergrad optimizer) and NGOptRW (NGOpt real-world) are examples of wizards included in Nevergrad. These are home-made Nevergrad algorithms, combining many algorithms (including DE and BFGS) with selectors tuned on a wide range of methods. Methods based on reinforcement learning (RL) have also been defined and applied to machine learning problems [82], though simple classical evolutionary methods were actually competitive [83]. Please note that a study [84] mentions difficulties in reproducing results in some RL papers.


Fig. 3. Convergence curves. Convergence curves for different runs for different algorithms showing the value of the cost function as a function of the number of iterations performed so far by the optimization. The lower the algorithm is in the legend, the better its performances are. The information brought by such individual curves is useful but difficult to read when comparing even a small number of algorithms; that is why the average convergence is often plotted, as shown in Fig. 4.


B. Observables

For the vast majority of continuous optimization problems, proving that a solution provided by an algorithm is the best possible solution in the optimization domain is essentially impossible. In addition, many optimization algorithms are non-deterministic or sensitive to the initialization, which means each run of the algorithm will lead to a different outcome.

As a consequence, it is possible to perform many optimization runs and still not be able to firmly determine whether the best of these solutions is good enough to stop looking for other, potentially better, outcomes. Yet, observing how exactly each different run progresses towards a solution, as well as considering the solutions that are produced statistically, yields crucial information that may help the user gain confidence in the best solutions produced or, on the contrary, indicate that these solutions cannot be trusted.

Convergence curves. The first observable, which is widely used, is the convergence curve that can be produced for each run of a given algorithm by plotting the value of the cost function of the solution that the algorithm recommends (typically the best solution found so far) as a function of the number of iterations. When multiple runs are launched, convergence curves can be either drawn for each run or an averaged convergence can be obtained. Both essentially provide the same kind of information: whether most of the runs have settled on a given solution (if they have almost all reached a plateau for many iterations), or if further iterations have a chance to improve the outcome. Figure 3 (resp. Fig. 4) presents such an example of individual curves for each run (resp. aggregated curves with average convergence).


Fig. 4. Averaged convergence curves. Averaged convergence curves for different algorithms (31 runs each), with standard error visible in ellipses, as a function of the number of iterations. The algorithms are ordered by average performance for the maximum budget (the lower in the legend the better). Using average convergence curves allows comparison of more algorithms [85].


Budget. Not all iterations of all algorithms have the same computational cost or evaluate the cost function the same number of times. An iteration for a genetic or evolutionary algorithm typically corresponds to a new generation and thus the evaluation of a few dozen solutions. For some other algorithms, an iteration requires the evaluation of a single new solution. A way to compare two algorithms fairly is to discuss their performances for a given number of evaluations of the cost function. The maximal number of evaluations allowed is called the budget and it has to be chosen before running the algorithms. Each run of an algorithm is then stopped when the budget has been reached. Of course, this does not take into account the computational cost of the optimization algorithm itself, which should be discussed when not negligible compared to the cost function values.

Consistency curve. Convergence curves for different runs allow determination of whether the chosen budget is large enough to reach a satisfactory convergence for each run. However, since the different runs can produce completely different results, the variability of the different results has to be discussed. This can be done by plotting what we call the consistency curve (Fig. 5). This curve is obtained by plotting the different values of the cost function reached at the end of each run, sorted from the lowest to the highest. This is generally done for the highest budget allowed. When such a curve displays a plateau, then the same solution, or at least solutions with the same cost, has been found several times by the algorithm. A large plateau for the lowest value of the cost function thus means the best solution has been found a large number of times. This reinforces the confidence that can be placed in that solution.
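Since the consistency curve is simply the sorted list of the final cost values of all runs, it can be produced with a few lines (the values below are placeholders; in practice, collect the best cost of each run):

import numpy as np
import matplotlib.pyplot as plt

final_costs = np.random.default_rng(1).lognormal(mean=-3, sigma=1, size=31)

plt.plot(np.sort(final_costs), marker="o")
plt.xlabel("Run index (sorted by final cost)")
plt.ylabel("Best cost function value")
plt.title("Consistency curve")
plt.show()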


Fig. 5. Consistency curves for different algorithms. Each algorithm has been run 31 times with the highest budget (from ${10^3}$ to 32,000 iterations depending on the test case, as shown in the convergence curves in Fig. 3). The cost function values for each run’s best solution are sorted from left to right in ascending order. The lower the curve and the flatter, the more efficient and reliable the algorithm can be considered. A plateau means that a certain cost function value, and probably the same solution, has been found multiple times, indicating a good reproducibility of the optimization. The results for BFGS and its variants correspond to different local minima if the algorithm has converged (this is often the case when the budget is large). The lower the algorithm is in the legend, the better its performances are.


Box plots. It is not always relevant to draw the consistency curve, especially when comparing a large number of algorithms. This curve can be summarized by a box plot as, e.g., in Fig. 6. While a box plot will not allow observation of any plateau and thus bears less information than a consistency curve, it gives an immediate and easily readable access to the statistical properties of the values of the cost function.


Fig. 6. Performance box plots for comparing algorithms. Box plots representing the distribution of the minimum values of the cost function reached at the end of each run for a given algorithm for different test cases. The results are shown for the highest budget, ranging from 1000 for Ellipsometry to 32,000 for HugePhotovoltaics, as shown in Fig. 3. Each box presents the first quartile, the third quartile, and a line at the median, and the whiskers (dashed lines) extend to 1.5 times the inter-quartile range. The algorithms are sorted according to their performances; the best performing is placed on the left.


Estimating the density of local minima. In addition, local optimization algorithms can be used to estimate whether there is a limited or a large number of local minima and whether the best solution found by any other algorithm has a large attraction basin or not. A simple way to do so is to launch multiple runs of the algorithm with starting points drawn randomly in the optimization domain, then make sure all the runs have converged (e.g., BFGS has a stop criterion essentially guaranteeing a local minimum has been reached) and finally plot the resulting consistency curve. If we were to run a large number of such optimizations, the consistency curve should present plateaus corresponding to different local minima—several minima may present identical values of the cost function, due to symmetries of the problem, and not be distinguishable. The width of a plateau would allow estimation of the volume of the corresponding attraction basin.

In this context, running BFGS with randomly drawn starting points can be seen as a way of estimating the difficulty of a problem. Finding many different results suggests a large number of local minima, and if the best result is rarely found, this also means it has a relatively small attraction basin compared to the size of the optimization domain. These characteristics are indicative of a difficult problem.
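A minimal sketch of this procedure on a placeholder multimodal landscape is given below; plateaus in the sorted values correspond to local minima found repeatedly, and their width gives a rough estimate of the volume of the associated attraction basins.

import numpy as np
from scipy.optimize import minimize

def cost(x):                                      # placeholder multimodal landscape
    return float(np.sum(x ** 2 + np.sin(8 * x)))

dim, n_runs = 10, 50
lower, upper = np.zeros(dim), np.ones(dim)
rng = np.random.default_rng(0)

minima = []
for _ in range(n_runs):
    x0 = rng.uniform(lower, upper)                # random starting point
    res = minimize(cost, x0, method="L-BFGS-B", bounds=list(zip(lower, upper)))
    minima.append(res.fun)

print(np.sort(minima))                            # consistency curve of the restarts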

4. RUNNING AND DISCUSSING EXPERIMENTS: THE ART OF BENCHMARKING

In this section, after presenting our codes and explaining how to reproduce the results below, we evaluate selected algorithms using our multi-layer test cases. We provide criteria for selecting the most efficient algorithms. In addition, we carry out an in-depth physical analysis of our solutions to assess their quality.

A. Repository Description

With the idea that anyone should be able to build easily on our work, we provide Jupyter Notebooks and Python codes in a repository [12] to demonstrate how to perform global optimizations of multi-layered photonic structures, 2D gratings, and 3D plasmonic nanostructures.

The cost function computation is necessarily based on Maxwell’s equation solvers, which are adapted to the geometry considered. Multi-layer simulations are done with PyMoosh, a Python-based simulation library designed to provide a comprehensive set of numerical tools allowing computation of essentially all optical characteristics of multilayered structures and that we released recently [50,86]. The optical response of the 2D gratings is obtained using a home-made rigorous coupled wave analysis (RCWA) code [87,88] whereas the plasmonic nanostructures’ scattering diagram is computed with pyGDM, a freely available Green Dyadic Method Python implementation [89,90]. As an optimization toolkit, we use either Nevergrad [11], especially for benchmarking, or sometimes PyMoosh’s internal DE optimizer when this is sufficient.

Our repository contains simple notebooks to show, on the Bragg problem, (i) how to use PyMoosh’s internal DE optimizer, (ii) how to use a Nevergrad optimizer, and (iii) how to leverage Nevergrad to benchmark algorithms. More notebooks are provided to showcase an optimization using Nevergrad (iv) on the Ellipsometry test case, (v) on the Photovoltaics test case, (vi) on the 2D grating problem, and (vii) on the 3D nanoantenna problem.

Then, more complete examples are also presented that will require supplementary steps as well as more computational power to run. The first example, in a separate folder, is a Nevergrad-based benchmark of several algorithms on the 3D plasmonic nanoantenna test case, accelerated by relying on MPI for the parallelization. The second folder contains codes allowing reproduction of all the benchmarking results presented in this paper—including, importantly, the convergence and consistency curves on all multi-layer test cases. We even provide a simplified notebook that can be run directly on the Colab platform, in order to maximize the reproducibility of our results and respect the principles of the FAIR approach (making data and methods findable, accessible, interoperable, and reusable).

The first examples, presented in notebooks, are deliberately kept simple so that they can be run on any platform, and we underline that they do not always use the same optimization configuration (in terms of budget, algorithms, or initialization) as the comprehensive code. As a consequence, the notebooks may produce results that differ from the benchmark presented below.

B. Benchmarking with Nevergrad

We present and discuss here benchmark results that can be obtained using the codes described above. We focus here on the methodology and on the information that can be obtained by closely monitoring the observables we have defined earlier. We have thus limited the number of algorithms for which we show optimization results.

Algorithm comparison. The convergence curves (shown in Fig. 3) allow assessment of whether the budget is high enough. The averaged curves shown in Fig. 4 are usually the most readable and informative, especially if the standard error is also given on each averaged curve.

Once convergence has been reached for all algorithms, the consistency curves shown in Fig. 5 allow thorough comparison of the reliability of different algorithms. Showing on the same graph the consistency curves for a large variety of algorithms is not convenient, as figures can be difficult to interpret. Each consistency curve can, however, be summarized using a box plot, allowing for a fair benchmarking between various algorithms, as shown in Fig. 6.

According to these results, with the optimization domain and initialization we have chosen (see below), DE and the variants we considered (QODE and QNDE) perform better than CMA on all our test cases. We underline that we have actually tested more algorithms, without showing the results (they can be obtained using the benchmark codes provided) but that CMA and DE appeared as the best options overall. CMA’s performances are actually relatively close to the performances of DE with a random initialization. However, we observed that CMA often has poorer consistency.

Density of local minima. We have also run BFGS with an almost unlimited budget, using randomly chosen starting points in the optimization domain. All the points of the corresponding consistency curve are thus actual local minima, which allows assessment of their density and the difficulty of the optimization problem. In the ellipsometry case, BFGS almost always finds the minimum, which means the density of local minima is extremely low. This is not the case for DE, for instance. It is not surprising to see that many algorithms are efficient in that case. Retrieving the Bragg mirror as a solution is more complex because of a relatively large density of local minima, which seems to increase with the number of parameters of the problem. Photovoltaic problems seem to be the most difficult for all algorithms, to the point that for BigPhotovoltaics, the performances of CMA are similar to those of BFGS, which means it does not bring an advantage compared to a steepest descent with a random start. For 32 layers, the original DE algorithm finds a minimum close to the best only one time out of three, while CMA and BFGS find a continuum of values for the cost function, reminiscent of what happens for BFGS in complex TO cases [46].

Impact of initialization. The bounds we have defined for the optimization domain on a physical basis have actually two roles. One is ensuring the realism of the produced solution, by forbidding nonsensical values for the parameters, such as a negative thickness for a material layer. The other role is to give the algorithms an indication of the domain in which a solution can be expected.

Often in photonics, once the bounds have been defined, algorithms are initialized by drawing the first structure randomly according to a uniform random distribution inside the optimization domain. This choice is natural for BFGS, for instance, as it allows more accurate estimation of the density of local minima in the optimization domain. When an initial population is required, for DE or CMA, for instance, we have partially used Nevergrad’s way of initializing algorithms, which is consistent with the standards of the optimization community. The population is generated around a center (drawn randomly according to a uniform random distribution in the optimization domain) following a normal distribution with a width equal to a fraction of the optimization domain (typically 1/6, but it varies for certain algorithms). Such an initialization allows one to estimate how good an algorithm is at finding a solution even if it is located outside its initialization region. This kind of initialization could also be used to target a specific region of the optimization domain, a role that is generally devoted to the bounds of the optimization domain.

The importance of initialization is well illustrated by the performances of QODE when compared to DE with the initialization described above. In QODE, for each individual randomly chosen in the optimization domain, another one is added symmetrically with respect to the center of the domain. This simple trick improves the performances of DE in all cases, which is why we recommend it. As shown by the different formulas of DE variants, DE explores the domain more efficiently when the difference between individuals is large. This initialization ensures that there will be a spread in the values chosen for all parameters in the initial population, making DE measurably more efficient. QODE has performed impressively on all our test cases.

Combining algorithms. We underline here that combining algorithms can be a good idea if the algorithms are complementary. The memetic QNDE combines QODE and BFGS. DE provides excellent starting points for BFGS, which is then efficient at finding the closest local minimum, making such a strategy effective in all our test cases. We have also tested NGOpt, the “wizard” of Nevergrad [91], in the Ellipsometry case. It automatically chooses a well-performing algorithm (note that many algorithms perform well here).

C. Physical Analysis

Analyzing physically both the solutions produced and how they have been produced (by which algorithm, in which conditions, and how fast) sheds a new light both on the solution and on the optimization process itself. When a problem is grounded in physics, one can and should take advantage of its distinct characteristics in order to gain deeper insights.

Low dimension. In the Ellipsometry case, local algorithms perform perfectly and find the solution reliably and quickly. This is typical for a simple landscape with few local minima. Physically, this means the geometry is simple enough and the considered layers sufficiently thin to have a low number of resonances. This may not always be the case in ellipsometry problems, for instance, if the material dispersion is better described using a large number of parameters [92].

Multiple resonances and local minima. The Bragg test case is more interesting. Even for 10 layers, local algorithms most often fail to find solutions performing as well as the Bragg mirror. Since the algorithms have converged and since local algorithms converge to local minima, this means local minima are numerous, even for such a relatively simple problem with an intermediate number of parameters. This is expected, since for only a dozen layers the thickness of the whole structure is already more than 2 µm, so that a large number of resonances, leading to as many local minima, can be expected in such a system.

For Photovoltaics cases, the problem is made even more complicated because of the use of a realistic (and thus noisy) solar spectrum in the cost function, which leads to even more local minima. This naturally leads to a decrease in the performance of the algorithms, as shown by the consistency curves that are less flat than for more simple cases.

For the 3D optimization case, we have run the optimization using a selection of several optimizers, each with a budget of 10,000 evaluations, repeating every optimizer run 10 times. The results are depicted in Fig. 9(e), where the best structure found is shown. The corresponding emission pattern is shown in Fig. 9(f). The result is consistent with our former results [58].

Using convergence and consistency plots (not shown but retrievable with the provided 3D codes), we find that on this problem, significant performance differences occur depending on the optimizer. We attribute this to the discrete character of the problem, which is not ideal for many continuous algorithms. We find in particular that gradient-based optimization (BFGS) is totally unsuited to this optimization problem with discrete positional parameters. Also, the consistency curves in this problem do not show a plateau for any of the algorithms, indicating that the budget is not sufficiently high for convergence. The high variance of the individual runs finally indicates that a high number of local minima probably exist.

Fig. 7. Best structures (right) and their associated reflectance spectrum (left) obtained for the Bragg case with 20 layers for a budget of (a) 100, (b) 1000, and (c) 20,000. The light, resp. dark, color represents the high (1.8), resp. low (1.4), refractive index material. The gray color represents substrate and superstrate.

Regularity and modularity. As explained for DE in Section 3.A, if the same values for some parameters can be found in all the individuals, they will be preserved throughout the optimization. This is well illustrated in Table 2: all the variants of the mutation formula are based on the difference between vectors, so that if a subset of parameters is shared by all the individuals, these parameters evolve only slightly. In photonic structures, it is common to find designs in which different parts play very definite roles (typically an antireflective coating, or a photonic crystal core). Such structures are said to be modular. DE is well equipped to find modular structures because it tends to keep a subset of parameters (which, depending on the parametrization, can correspond to a sub-part) if that makes all the solutions more efficient.

This is illustrated for the Bragg case in Fig. 7, where the best results (whatever the algorithm) are shown for different budgets. For the lowest budget, the structure appears disordered (no algorithm has converged at that point) and the spectrum indicates that a relatively narrow anti-resonance producing a large reflectance has been found. This does not even qualify as a local minimum: a minor adjustment of the structure could align the reflectance peak with the 600 nm target, achievable through a straightforward contraction of the entire structure since the materials considered are not dispersive. The absence of such an alignment is a clear indicator that the optimization falls short here.

For a budget of around 1000, the best solution, shown in Fig. 7(b), is likely a local minimum. As can be seen on the convergence curves in Fig. 4 for such a budget, BFGS has converged, and since the value of the cost function is higher than what DE can find with a larger budget, this means a local optimum has been found. The performance of the device is interesting, and some regularity can definitely be seen in the structure. We emphasize that a degree of partial regularity is discernible in the outcomes of numerous optimization studies documented in the literature. The reflection peak is broader than what a random resonance would produce, which we attribute to the relative regularity of the structure.

For a larger budget, however, BFGS as well as CMA seem to be stuck in local minima, while all the variants of DE get close to the Bragg mirror, the likely optimal solution, which presents the largest reflectance peak of all the structures we have tested. We experimented with variants of BFGS with restarts (included in Nevergrad): this improves BFGS, but not enough to make it competitive, in particular on photovoltaics problems.

The Photovoltaics case is a perfect example of a modular structure. Figure 8 shows the best results for all the Photovoltaics cases, all produced by QODE. It is important to notice that, whatever the number of layers and even though the structures were obtained in independent runs, the three upper and up to five lower layers are common to all the structures. For the largest number of layers, periodic patterns (alternating 120 nm, resp. 150 nm, thick layers for permittivity 3.0, resp. 2.0) appear. A previous study has shown that the upper and lower layers allow light to be coupled in and out of the central photonic crystal more easily [8]. The fact that they appear consistently when the number of layers varies indicates their physical importance. They reduce the oscillations in reflectance outside the bandgap that are characteristic of Bragg mirrors and can be seen in Fig. 7(c). Such oscillations are detrimental to the absorption.

Fig. 8. Best structures (right) and their absorptance spectrum (left) obtained for the Photovoltaics case with (a) 10 layers, (b) 20 layers, and (c) 32 layers. The light, resp. dark, color represents the high (3), resp. low (2), permittivity material. The gray color represents substrate and superstrate.

Regularity emerges in a similar way in the case of the 2D grating we have studied, a problem inspired by the architecture present on the wings of Morpho butterflies [7]. As shown in Fig. 9, looking for a way to minimize specular reflection leads to the remarkably regular structure shown in Fig. 9(c), even though the structures explored by the algorithms along the way can be quite different [see Fig. 9(a)].

Fig. 9. Optimization of 2D or 3D structures. (a) Geometry of a 2D grating corresponding to a typical starting point. The dielectric material is represented in black and air in white. (b) Diffraction efficiencies in reflection for the grating represented in (a). (c) High-performance result of an optimization for the 2D grating. (d) Diffraction efficiencies in reflection for the grating represented in (c). The efficiency is maximized in the ${\pm}1$ reflected orders (blue and black solid lines), while it is minimized in the specular reflection (red solid line) at the working wavelength ($\lambda = 450\,{\rm nm}$). (e) Top view of an optimized geometry of a directional plasmonic gold nanoantenna, coupled to a local emitter, after multiple runs using several algorithms. (f) Far-field emission pattern of the optimized antenna showing the directionality of the emission.

Sometimes DE is able to generate structures like chirped dielectric mirrors [7], which locally resemble a Bragg mirror but whose parameters change gradually and smoothly. Such a structure cannot be considered periodic. It is, however, modular, as the different parts of the structure are obviously able to reflect different parts of the spectrum. We call this kind of structure regular, even if it is not periodic. To summarize, a periodic structure is obviously regular, but a regular (e.g., modular) structure is not necessarily periodic.

From the cases presented here and from our experience, it can generally be expected that periodic or regular structures will also be the most effective at influencing the behavior of light: because light is a wave, it is particularly sensitive to periodicity. The Bragg mirror, for instance, has a higher reflectivity than any disordered structure we have generated. The thickness values for the antireflective coating point towards a structure containing a photonic crystal, even if they are not as precise as in the Bragg case, probably because of the irregularities of the solar spectrum. We underline that we have run these optimizations a large number of times on these two cases and never found any better solution than the ones presented here. This strongly suggests that regularity should be expected and even sought after [14], and we underline that in many results from the literature, periodic patterns seem to emerge spontaneously (see Fig. 10).

Fig. 10. Spontaneous emergence of regularity or periodicity. When periodicity or regularity emerges, solutions appear much more satisfactory and are more likely to be analyzed physically. (a) When the positions of nanopillars of silicon are optimized to enhance the magnetic Purcell factor, using the differential evolution algorithm, two ring cavities spontaneously emerge [52]. (b) Two-dimensional dielectric metalens designs obtained by topology optimization [93] with a relatively low number of parameters. (c) Silicon carbide optical cavity obtained using gradient-based optimization [53] showing a regular pattern (holes of increasing dimensions in the waveguide). (d) When gratings of rectangular blocks of chitin are optimized to maximize the scattered reflection of a 450 nm wavelength in the first diffraction orders, while minimizing the specular reflection of a range of visible wavelengths, quasi-perfect checkerboards with interdigitated blocks are produced [7].

Robustness. In photonics, robustness to fabrication defects is always desirable. A robust structure presents an efficiency that does not change much when the parameters are slightly modified. From an optimization point of view, this simply means that the attraction basin of the desired solution should be flat. Evaluating the robustness of a solution is thus computationally costly, because it involves modifying a large number of parameters and computing the resulting change in the cost function. It can be tempting to include the robustness of the structure in the cost function [71,72], so that robust solutions present a lower cost. In our experience, this may improve the quality of the solutions and produce regular structures more often, but at the cost of a much larger computational burden.
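
A brute-force estimate of robustness can be obtained by Monte Carlo sampling around the optimized solution, which makes the computational cost explicit. The helper below and its default perturbation amplitude are our own choices, not part of the libraries used in this paper.

```python
import numpy as np

def estimate_robustness(cost, x_opt, sigma=5.0, n_samples=100, seed=None):
    """Monte Carlo robustness estimate: perturb each parameter with Gaussian noise
    of standard deviation sigma (e.g., in nm) and report the spread of the cost."""
    rng = np.random.default_rng(seed)
    x_opt = np.asarray(x_opt, float)
    costs = [cost(x_opt + rng.normal(0.0, sigma, size=x_opt.shape))
             for _ in range(n_samples)]
    return float(np.mean(costs)), float(np.std(costs))

# Example (assuming a cost function and an optimized structure are available):
# mean_cost, spread = estimate_robustness(cost, best_thicknesses, sigma=5.0)
```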

Robustness can be assessed indirectly by examining the spectral response of a structure. This becomes particularly evident in the context of the Bragg mirror. A contraction or dilation of the entire structure primarily shifts the forbidden band and consequently alters the position of the reflectance maximum. As a consequence, there exists a direct correlation between the spectral size of the photonic bandgap and the resilience of the reflectance to structural contractions at the working wavelength. In this context, since the periodicity of a Bragg mirror is what makes its bandgap larger, the connection between regularity and robustness is explicit. While establishing such a link in a universally general manner is impossible, our current findings consistently support this relationship.

5. GOOD PRACTICES FOR OPTIMIZING PHOTONIC STRUCTURES

The optimization of photonic structures thus presents quite a few specific characteristics that influence the strategy to be followed in this particular context. While the computational cost may put a limit on the problems that can be tackled, we give below a list of strategies that, applied together, constitute a methodology for the optimization of photonic structures.

One of the most important questions is when we should stop the optimization. It is impossible to prove that a solution is optimal. However, there are ways to determine the quality of a solution. We give below criteria that can help to determine whether a solution is satisfactory—meaning the optimization can be stopped—or not.

A. Optimization Methodology

Our methodology consists in maximizing the information extracted from the observables we have defined above, and in making use of specific characteristics of photonics problems to gradually build confidence in the generated solutions. We leave for future work other strategies that could also make interesting solutions emerge (e.g., multi-objective optimization, or the use of manufacturing noise in the cost function).

Convergence curves for the determination of the budget. The presence of plateaus in a convergence curve indicates that the algorithm has converged, suggesting that the budget is adequate. Conversely, the absence of a plateau in the convergence curve suggests that the budget should be increased.
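
With Nevergrad's ask-and-tell interface, a convergence curve can be recorded by tracking the best value seen so far; averaging such curves over several independent runs gives plots similar to Fig. 4. As above, the cost function, domain, and optimizer name are placeholder assumptions on our side.

```python
import numpy as np
import nevergrad as ng

def cost(thicknesses):
    # Placeholder cost; replace with, e.g., 1 - reflectance computed with PyMoosh.
    return float(np.sum((np.asarray(thicknesses) - 100.0) ** 2))

def convergence_curve(optimizer_name, budget, n_layers=10):
    """Best-so-far cost after each evaluation, for one run of a Nevergrad optimizer."""
    param = ng.p.Array(init=np.full(n_layers, 110.0)).set_bounds(20.0, 200.0)
    opt = ng.optimizers.registry[optimizer_name](parametrization=param, budget=budget)
    best, curve = np.inf, []
    for _ in range(budget):
        cand = opt.ask()
        value = cost(*cand.args, **cand.kwargs)
        opt.tell(cand, value)
        best = min(best, value)
        curve.append(best)
    return np.array(curve)

# Averaging 31 such runs reproduces curves similar to Fig. 4:
# mean_curve = np.mean([convergence_curve("QODE", 1000) for _ in range(31)], axis=0)
```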

Systematic use of consistency curves for checking local minima. Even when relying on a single algorithm, since global algorithms are typically nondeterministic, it is necessary to launch multiple runs in order to conduct a statistical analysis with a consistency curve. Ideally, the consistency curve exhibits at least a small plateau at its minimum value [see Fig. 5(a)]. When this is not the case [see Fig. 5(f)], the solution should be considered with a lot of caution.
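
A consistency curve is simply the final best cost of each independent run sorted in ascending order. The sketch below uses synthetic placeholder values (they are not results from this paper) to show how such a plot can be produced.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Replace these placeholder values with the best cost reached by each of your runs
final_costs = {
    "QODE": rng.uniform(0.01, 0.05, 31),
    "CMA": rng.uniform(0.01, 0.20, 31),
    "BFGS": rng.uniform(0.05, 0.40, 31),
}

for name, costs in final_costs.items():
    plt.plot(np.sort(costs), marker="o", label=name)  # ascending order: the consistency curve
plt.xlabel("Run index (sorted by final cost)")
plt.ylabel("Best cost of the run")
plt.legend()
plt.show()
```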

Changing the number of parameters. In photonics problems, it is generally straightforward to gradually increase the number of elements that can be optimized without changing the nature of the problem. In the cases presented above, this can be done by increasing the number of layers of the structure, but it could, for example, also be done by decreasing the discretization stepsize, e.g., in topology optimization. Structures with different numbers of layers can then be compared in terms of performance. It can generally be expected that increasing the number of degrees of freedom leads to improved optimized performance. Plotting the minimum value of the cost function as a function of complexity, represented by the number of layers or elements in the structure, can provide valuable insights: if increasing the number of layers does not improve performance, this indicates that the difficulty of the problem has also increased, and continuing down that path is likely pointless [8].
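
In practice, this amounts to re-running the same optimization for an increasing number of layers and recording the best cost found for each complexity, as sketched here (placeholder cost and arbitrary budget, as before).

```python
import numpy as np
import nevergrad as ng

def make_cost(n_layers):
    # Placeholder; replace with a PyMoosh-based cost for a structure with n_layers layers.
    def cost(thicknesses):
        return float(np.sum((np.asarray(thicknesses) - 100.0) ** 2)) / n_layers
    return cost

best_vs_complexity = {}
for n_layers in (10, 20, 30, 40):
    param = ng.p.Array(init=np.full(n_layers, 110.0)).set_bounds(20.0, 200.0)
    opt = ng.optimizers.registry["QODE"](parametrization=param, budget=2000)
    rec = opt.minimize(make_cost(n_layers))
    best_vs_complexity[n_layers] = make_cost(n_layers)(rec.value)

# Plot best_vs_complexity to see whether adding layers still pays off
```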

Parametrization bias awareness: meaningful representations make the optimization problem easier. In parametric optimization, when a handful of parameters are needed to describe a relatively complex device, the choice of these parameters and of their limits (which defines the optimization domain) is crucial. More precisely, these initial choices may introduce biases, favor certain types of algorithms, or make the convergence more difficult. When, for instance, the parametrization is chosen such that subsets of parameters correspond to components of the structure, algorithms like DE are particularly efficient. In DE, when a component of a structure is widely spread in the whole population, it might be exactly preserved through the iterations, whereas many other algorithms keep perturbing all variables. When the different parameters do not describe a part of the structure but a more global property, other kinds of algorithms might be more relevant, as has been underlined in previous works [85].

Sensitivity to the optimization domain: changing bounds. In many cases, the imposed constraints strongly shape the emerging solutions. For example, using a medium with an extremely high refractive index (typically infinite) is a simple but unrealistic way to reflect light completely. The constraints on the refractive index values are therefore the fundamental reason why Bragg mirrors are produced as a solution. However, there are instances where the constraints become too demanding, making it difficult to find satisfactory solutions. It is important in that case to verify whether some parameters are stuck at the optimization domain boundary (i.e., whether the boundary constraints are active). On the other hand, when a satisfactory solution is produced, it can be informative to add or remove constraints or to expand the optimization domain. Bragg mirrors tend to emerge whether the refractive indexes are allowed to vary within certain limits or are imposed, the latter case being more straightforward. A clear understanding of the conditions under which a solution is generated also contributes to building confidence.
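
Checking for active boundary constraints only requires comparing the optimized parameters with the bounds, for instance with a helper of this kind (the tolerance is an arbitrary choice of ours).

```python
import numpy as np

def active_bounds(x, lower, upper, tol=1e-3):
    """Return a boolean mask of parameters stuck at the optimization domain boundary."""
    x, lower, upper = map(np.asarray, (x, lower, upper))
    return (np.abs(x - lower) < tol) | (np.abs(x - upper) < tol)

# Example: the third layer thickness sits on the lower bound
mask = active_bounds([50.0, 120.0, 20.0], lower=[20.0] * 3, upper=[200.0] * 3)
```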

Leverage your physical intuition. We underline that in optimization, there are no a priori rules. If, for instance, it makes sense physically to modify a solution by hand, this should not be considered forbidden or “cheating,” especially when the algorithm seems stuck in a local optimum that can be criticized based on physical reasoning [94]. The limits of the optimization range can also be set to encourage the algorithm to explore areas where promising solutions are likely based on physics, or to stay within specific functional ranges [95]. This approach usually makes the task easier for the algorithm and provides solutions that are easier to understand and thus more satisfactory. Physical intuition often determines the conditions in which the optimization takes place, and it is what allows one to detect parametrization biases, or even to realize that a problem is not well posed enough for any algorithm to find a satisfactory solution. It should never be overlooked.

B. Assessing the Quality of a Solution

Usually, no solution can be proven optimal, due to the impossibility of exploring the entire space of possible solutions or of locating all the local minima. Therefore, it becomes necessary to establish criteria that enhance the confidence placed in a solution. Besides the criteria above, which are based on optimization observables, a physical analysis is possible, as developed in the present section. When enough confidence in a solution has been built, it can be deemed satisfactory.

Consistency. A solution that has been obtained more than once inspires greater confidence. If it is obtained repeatedly, it might correspond to a plateau of similarly good solutions on the consistency curve of the most efficient algorithm. In that case the solution can be deemed truly consistent and particularly trustworthy. This is most often not the case. When the best solution is obtained in only a single run, this should be considered indicative of a local minimum. We regret that, except in a few cases [46], elements allowing the consistency of a solution to be assessed, even short of a full consistency curve, are generally not provided.

Spontaneous emergence of regularity. In photonics, periodic or regular structures are ubiquitous. This can be directly attributed to the wave nature of the underlying phenomena. As underlined in a pioneering optimization study [14], “the emergence of periodicity suggests that periodicity is a principal condition for strong light manipulation.” Many studies have shown the emergence of partially periodic structures, even when the solution lacks complete periodicity or regularity (as for chirped dielectric mirrors [7]). However, when an algorithm proposes completely periodic structures as a solution, they naturally inspire more satisfaction. Based on our experience, we have yet to encounter a simple problem where fully disordered structures outperform regular ones in terms of efficiency.

A symmetrical structure is typically considered more regular. We believe that the spontaneous emergence of symmetry also reinforces the confidence that can be placed in a solution. We underline that this is rare, as symmetry is often imposed from the start in the parametrization, which tends to simplify problems noticeably.

Overall, we do not necessarily prioritize performance over aesthetics, as both aspects are inherently intertwined in photonics. In the Bragg mirror benchmark, no disordered structure has ever presented a better performance than a Bragg mirror with the same number of layers. In the Photovoltaics case, the irregularities can be linked to the noise in the solar spectrum and, as a consequence, in the cost function. Irregularities in that case improve the performance, but the periodic pattern is still distinguishable and is central to the overall efficiency. In the case of multilayered structures, regularity or periodicity may be more likely to emerge, due to the relative simplicity of the geometry. However, in the literature, relatively regular patterns (in the sense defined above) emerge all the time, as shown in Fig. 10. Sometimes the patterns look unfinished, perhaps indicating that the solution can be further improved, which is likely if a local algorithm has been used. In our experience, regular or periodic patterns, including spontaneously symmetrical ones, can also be generated in more complex setups [20].

Physical interpretability. The most satisfactory solutions are those that can be readily understood from a physical point of view. We underline that, generally, only periodic, regular, or modular structures can be truly understood. This is more difficult for completely disordered structures, except if the disorder itself is tailored, which cannot be ruled out. The absence of physical interpretability is likely what hinders the widespread adoption of optimization as a daily research tool within the community. Sometimes, the solutions can be studied and fully understood a posteriori [8]. Although this does not guarantee optimality, it is at least a good reason to stop looking for alternative solutions: any solution that is comprehensible can serve as a valuable source of inspiration for manually generated structures and can offer valuable design rules. In rare situations, algorithms can produce solutions that resemble patterns found in nature, on insect wings for instance, which have evolved for optical purposes. Although these occurrences are uncommon, they can be highly satisfactory, as they align with the concept of evolution as a form of optimization. However, due to their infrequency, they cannot be included in the above criteria. In the case of the reflection problem that we have considered in this work, this criterion is obviously fulfilled too, as Bragg mirrors are commonly found in nature.

We underline that these criteria to determine whether a solution is satisfactory or not can be applied to any inverse design problem, whether it is solved by topology optimization, shape optimization, parametric optimization, or any other technique. We advise all authors, as far as possible, to publish all the necessary information so that other researchers can reproduce and verify the quality of optimization results. When this is done, particularly interesting and thorough discussions become possible  [46].

6. CONCLUSION

In this paper, we have presented different types of popular optimization algorithms and compared some of them using the Nevergrad platform on three typical photonics problems. We have shown how algorithms can be rigorously and thoroughly compared, relying on specific observables. We proposed a rigorous methodology and offered some advice for conducting high-performance design optimizations and evaluating the quality of a solution. We have illustrated this methodology on a wide range of cases that we think are representative of photonics problems. We have finally provided Jupyter Notebooks that illustrate our workflow and can be used to reproduce the presented benchmarks.

For low-dimensional problems such as ellipsometry or, more generally, those based on the search for a small number of parameters, many algorithms seem well suited, including local optimization algorithms and Bayesian optimization, the latter being particularly useful when evaluating a solution proves costly, a direction we have not studied here.

For problems that can be considered inverse design problems, not all approaches are effective. We have shown that photonics problems are characterized by a large number of local minima that make the optimization intrinsically difficult. We have compared our algorithms to what amounts to a steepest descent with random starting points. Except when the problems become too difficult (because of a large number of parameters or because the cost function is noisy), algorithms like covariance matrix adaptation (CMA) or differential evolution (DE) constitute a more efficient approach. We tend to recommend DE (and particularly its quasi-oppositional version, QODE) because it proves more robust while being simple to implement. DE is never a bad choice, even if it may require a large number of evaluations of the cost function to perform well, and it seems particularly well adapted to photonics.

We meant this work as a tutorial, but also as a warning. Optimization is difficult because of its unique curse: it is never possible to guarantee that a solution is optimal, or even close to it, which makes the science much more difficult. This must lead to extra caution, and no solution should ever be deemed optimal, only optimized [93].

We underline that the field of optimization is awash in questionable claims of novelty and efficiency. In order to avoid a similar reproducibility crisis in photonics, adopting an open science approach is imperative: data regarding the different runs of optimization should be published, codes should be shared, and both should be discussed [46,96]. We are convinced that the optimization of photonic structures is a work-intensive domain, and that neither a single method nor a single team will be enough to uncover what optimization can bring in terms of innovation.

There is also a danger in deeming a solution satisfactory when it should not be: missing innovative and more efficient structures. Given the potential of inverse design and the difficulty of finding structures that would be commercially viable [1], giving up too soon may mean missing particularly efficient designs. Fortunately, the physical analysis of structures appears to be a powerful tool to discuss both the solutions and the optimization process itself.

We have shown in the present work how modern numerical tools have made the use of optimization much simpler and more efficient in photonics. Even for well-defined functioning regimes suggested by physics itself, with a relatively low number of parameters, numerically optimizing a photonic structure often yields unexpectedly high performance. We underline that numerical optimization is now able to produce photonic structures that can be understood. This constitutes a complete change compared to the times when inefficient algorithms (such as the first genetic algorithms) produced disordered and impossible-to-understand results. Open science approaches now allow any researcher in the field to use such tools easily. We hope that our work will encourage fellow researchers within the community to seamlessly integrate optimization tools into their routine practice and to join the effort of discovering novel and more efficient structures to address the challenges of the future.

Funding

Agence Nationale de la Recherche (16-IDEX-0001, ANR-22-CE24-0002); Toulouse High Performance Computing Facility CALcul in MIdi-Pyrénées (p20010).

Acknowledgment

A.M. is an Academy CAP 20-25 chair holder. He acknowledges the support received from the Agence Nationale de la Recherche of the French government through the program Investissements d’Avenir. This work was supported by the International Research Center “Innovation Transportation and Production Systems” of the Clermont-Ferrand I-SITE CAP 20-25. P.R.W. acknowledges the support of the French Agence Nationale de la Recherche (ANR) (project NAINOS), and from the Toulouse High Performance Computing facility CALMIP.

Disclosures

The authors declare no conflicts of interest.

Data availability

The code and parameters (no other data are required) necessary for reproducing all results are available in [12].

REFERENCES

1. S. Molesky, Z. Lin, A. Y. Piggott, et al., “Inverse design in nanophotonics,” Nat. Photonics 12, 659–670 (2018). [CrossRef]  

2. S. D. Campbell, D. Sell, R. P. Jenkins, et al., “Review of numerical optimization techniques for meta-device design,” Opt. Mater. Express 9, 1842–1863 (2019). [CrossRef]  

3. M. M. Elsawy, S. Lanteri, R. Duvigneau, et al., “Numerical optimization methods for metasurfaces,” Laser Photon. Rev. 14, 1900445 (2020). [CrossRef]  

4. M. Chen, J. Jiang, and J. A. Fan, “Algorithm-driven paradigms for freeform optical engineering,” ACS Photon. 9, 2860–2871 (2022). [CrossRef]  

5. M. P. Bendsoe and O. Sigmund, Topology Optimization: Theory, Methods, and Applications (Springer, 2003).

6. O. Sigmund, “On the usefulness of non-gradient approaches in topology optimization,” Struct. Multidiscip. Optim. 43, 589–596 (2011). [CrossRef]  

7. M. A. Barry, V. Berthier, B. D. Wilts, et al., “Evolutionary algorithms converge towards evolved biological photonic structures,” Sci. Rep. 10, 12024 (2020). [CrossRef]  

8. P. Bennet, P. Juillet, S. Ibrahim, et al., “Analysis and fabrication of antireflective coating for photovoltaics based on a photonic-crystal concept and generated by evolutionary optimization,” Phys. Rev. B 103, 125135 (2021). [CrossRef]  

9. K. Sörensen, “Metaheuristics—the metaphor exposed,” Int. Tran. Oper. Res. 22, 3–18 (2015). [CrossRef]  

10. “PyMoosh,” 2023, https://pypi.org/project/PyMoosh/.

11. J. Rapin and O. Teytaud, “Nevergrad–a gradient-free optimization platform,” 2018, https://GitHub.com/FacebookResearch/Nevergrad.

12. P. Bennet, O. Teytaud, P. Wiecha, et al., “Ellawin/tuto_global_optimization_photonics: v2,” Zenodo, 2023, https://doi.org/10.5281/zenodo.10246032.

13. Q. Wang, J. Lu, and S. He, “Optimal design method of a low-loss broadband Y branch with a multimode waveguide section,” Appl. Opt. 41, 7644–7649 (2002). [CrossRef]  

14. A. Gondarenko, S. Preble, J. Robinson, et al., “Spontaneous emergence of periodic patterns in a biologically inspired simulation of photonic structures,” Phys. Rev. Lett. 96, 143904 (2006). [CrossRef]  

15. A. Y. Piggott, J. Lu, K. G. Lagoudakis, et al., “Inverse design and demonstration of a compact and broadband on-chip wavelength demultiplexer,” Nat. Photonics 9, 374–377 (2015). [CrossRef]  

16. H. Einarsson, M. M. Gauy, J. Lengler, et al., “The linear hidden subset problem for the (1+1)-EA with scheduled and adaptive mutation rates,” Theor. Comput. Sci. 785, 150–170 (2019). [CrossRef]  

17. B. Doerr and F. Neumann, “A survey on recent progress in the theory of evolutionary algorithms for discrete optimization,” ACM Trans. Evol. Learn. Optim. 1, 1–43 (2021). [CrossRef]  

18. T. H. Bäck, A. V. Kononova, B. van Stein, et al., “Evolutionary algorithms for parameter optimization—thirty years later,” Evol. Comput. 31, 81–122 (2023). [CrossRef]  

19. D. Gagnon, J. Dumont, and L. J. Dubé, “Multiobjective optimization in integrated photonics design,” Opt. Lett. 38, 2181–2184 (2013). [CrossRef]  

20. O. Teytaud, P. Bennet, and A. Moreau, “Discrete global optimization algorithms for the inverse design of silicon photonics devices,” Photon. Nanostruct. Fundam. Appl. 52, 101072 (2022). [CrossRef]  

21. S. W. Mahfoud, “Niching methods for genetic algorithms,” Ph.D. thesis (University of Illinois at Urbana-Champaign, 1995).

22. O. M. Shir, Niching in Evolutionary Algorithms (Springer, 2012), pp. 1035–1069.

23. A. Petrowski, “A clearing procedure as a niching method for genetic algorithms,” in IEEE International Conference on Evolutionary Computation (1996), pp. 798–803.

24. J. Wu, M. Poloczek, A. G. Wilson, et al., “Bayesian optimization with gradients,” Adv. Neural Inf. Process. Syst. 30, 5267–5278 (2017).

25. X. Garcia-Santiago, S. Burger, C. Rockstuhl, et al., “Bayesian optimization with improved scalability and derivative information for efficient design of nanophotonic structures,” J. Lightwave Technol. 39, 167–177 (2021). [CrossRef]  

26. Z. Jakšić, S. Devi, O. Jakšić, et al., “A comprehensive review of bio-inspired optimization algorithms including applications in microelectronics and nanophotonics,” Biomimetics 8, 278 (2023). [CrossRef]  

27. A. Khaireh-Walieh, D. Langevin, P. Bennet, et al., “A newcomer’s guide to deep learning for inverse design in nano-photonics,” Nanophotonics 12, 4387–4414 (2023). [CrossRef]  

28. D. Wolpert and W. Macready, “No free lunch theorems for optimization,” IEEE Trans. Evol. Comput. 1, 67–82 (1997). [CrossRef]  

29. I. L. Markov, “The false dawn: reevaluating Google’s reinforcement learning for chip macro placement,” arXiv, arXiv:2306.09633 (2023). [CrossRef]  

30. N. I. Gould, D. Orban, and P. L. Toint, “Cuter and sifdec: a constrained and unconstrained testing environment, revisited,” ACM Trans. Math. Softw. 29, 373–394 (2003). [CrossRef]  

31. N. I. Gould, D. Orban, and P. L. Toint, “Cutest: a constrained and unconstrained testing environment with safe threads for mathematical optimization,” Comput. Optim. Appl. 60, 545–557 (2015). [CrossRef]  

32. DIMACS, “Dimacs implementation challenge,” 2021, http://dimacs.rutgers.edu/programs/challenge.

33. N. Hansen, A. Auger, S. Finck, et al., “Real-parameter black-box optimization benchmarking 2009: experimental setup,” Technical Report RR-6828 (INRIA, 2009).

34. X. Li, K. Tang, M. N. Omidvar, et al., “Benchmark functions for the CEC’2013 special session and competition on large-scale global optimization,” in CEC 2013 Proceedings (2013).

35. M. Gallagher and S. Saleem, “Exploratory landscape analysis of the MLDA problem set,” in PPSN’18 Workshop (2018).

36. F. Häse, M. Aldeghi, R. J. Hickman, et al., “Olympus: a benchmarking framework for noisy optimization and experiment planning,” Mach. Learn. Sci. Technol. 2, 035021 (2021). [CrossRef]  

37. J. Lee, M. X. Grey, S. Ha, et al., “Dart: dynamic animation and robotics toolkit,” J. Open Source Softw. 3, 500 (2018). [CrossRef]  

38. E. Coumans and Y. Bai, “Pybullet, a python module for physics simulation in robotics, games and machine learning,” GitHub (2017), https://github.com/bulletphysics/bullet3.

39. E. Todorov, T. Erez, and Y. Tassa, “Mujoco: a physics engine for model-based control,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (2012), pp. 5026–5033.

40. G. Brockman, V. Cheung, L. Pettersson, et al., “Openai gym,” arXiv, arXiv:1606.01540 (2016). [CrossRef]  

41. K. Musgrave, S. J. Belongie, and S. Lim, “A metric learning reality check,” arXiv, arXiv:2003.08505 (2020). [CrossRef]

42. S. Kapoor and A. Narayanan, “Leakage and the reproducibility crisis in ML-based science,” arXiv, arXiv:2207.07048 (2022). [CrossRef]  

43. E. B. Whiting, S. D. Campbell, L. Kang, et al., “Meta-atom library generation via an efficient multi-objective shape optimization method,” Opt. Express 28, 24229–24242 (2020). [CrossRef]  

44. A. V. Tikhonravov, M. K. Trubetskov, and G. W. DeBell, “Application of the needle optimization technique to the design of optical coatings,” Appl. Opt. 35, 5493–5508 (1996). [CrossRef]  

45. J. Cea, “Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût,” ESAIM: Math. Model. Numer. Anal. 20, 371–402 (1986). [CrossRef]  

46. L. Su, D. Vercruysse, J. Skarda, et al., “Nanophotonic inverse design with spins: software architecture and practical considerations,” Appl. Phys. Rev. 7, 011407 (2020). [CrossRef]  

47. M. Y. Wang, X. Wang, and D. Guo, “A level set method for structural topology optimization,” Comput. Methods Appl. Mech. Eng. 192, 227–246 (2003). [CrossRef]  

48. F. Wang, R. E. Christiansen, Y. Yu, et al., “Maximizing the quality factor to volume ratio for ultra-small photonic crystal cavities,” arXiv, arXiv:1810.02417 (2018). [CrossRef]  

49. S. Martin, J. Rivory, and M. Schoenauer, “Synthesis of optical multilayer systems using genetic algorithms,” Appl. Opt. 34, 2247–2254 (1995). [CrossRef]  

50. D. Langevin, P. Bennet, A. Khaireh-Walieh, et al., “PyMoosh: a comprehensive numerical toolkit for computing the optical properties of multilayered structures,” J. Opt. Soc. Am. B 41, A67–A78 (2024). [CrossRef]  

51. A. V. Tikhonravov, “Some theoretical aspects of thin-film optics and their applications,” Appl. Opt. 32, 5417–5426 (1993). [CrossRef]  

52. Y. Brûlé, P. Wiecha, A. Cuche, et al., “Magnetic and electric Purcell factor control through geometry optimization of high index dielectric nanostructures,” Opt. Express 30, 20360–20372 (2022). [CrossRef]  

53. J. Yang, M. A. Guidry, D. M. Lukin, et al., “Inverse-designed silicon carbide quantum and nonlinear photonics,” Light Sci. Appl. 12, 201 (2023). [CrossRef]  

54. R. Smaali, T. Taliercio, A. Moreau, et al., “Reshaping plasmonic resonances using epsilon-near-zero materials for enhanced infrared vibrational spectroscopy,” Appl. Phys. Lett. 119, 183701 (2021). [CrossRef]  

55. E. Centeno, E. Alvear-Cabezón, R. Smaali, et al., “Inverse-designed terahertz modulators based on semiconductor multilayers,” Semicond. Sci. Technol. 36, 085014 (2021). [CrossRef]  

56. A. Moreau, C. Lafarge, N. Laurent, et al., “Enhanced transmission of slit arrays in an extremely thin metallic film,” J. Opt. A 9, 165–169 (2007). [CrossRef]  

57. R. Santbergen, J. Goud, M. Zeman, et al., “The AM1.5 absorption factor of thin-film solar cells,” Sol. Energy Mater. Sol. Cells 94, 715–723 (2010). [CrossRef]

58. P. R. Wiecha, C. Majorel, C. Girard, et al., “Design of plasmonic directional antennas via evolutionary optimization,” Opt. Express 27, 29069–29081 (2019). [CrossRef]  

59. N. Hansen, Y. Akimoto, and P. Baudis, “CMA-ES/pycma on Github,” Zenodo, 2019, https://doi.org/10.5281/zenodo.2559634.

60. FacebookResearch, “Ax-adaptive experimentation,” 2020, https://ax.dev.

61. J. Bergstra, B. Komer, C. Eliasmith, et al., “Hyperopt: a Python library for model selection and hyperparameter optimization,” Comput. Sci. Dis. 8, 014008 (2015). [CrossRef]  

62. F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Sequential model-based optimization for general algorithm configuration,” in LION, C. A. C. Coello, ed., Vol. 6683 of Lecture Notes in Computer Science (Springer, 2011), pp. 507–523.

63. S. G. Johnson, “The NLopt nonlinear-optimization package,” GitHub (2007), https://github.com/stevengj/nlopt.

64. D. C. Liu and J. Nocedal, “On the limited memory BFGS method for large scale optimization,” Math. Program. 45, 503–528 (1989). [CrossRef]  

65. G. Allaire, “A review of adjoint methods for sensitivity analysis, uncertainty quantification and optimization in numerical codes,” Ingénieurs l’Automobile 836, 33–36 (2015).

66. D. Maclaurin, D. Duvenaud, and R. P. Adams, “Autograd: effortless gradients in Numpy,” in ICML 2015 AutoML Workshop (2015), vol. 238.

67. C. Z. Janikow and Z. Michalewicz, “An experimental comparison of binary and floating point representations in genetic algorithms,” in International Conference on Genetic Algorithm (ICGA) (1991), Vol. 1991, pp. 31–36.

68. F. Herrera, M. Lozano, and J. L. Verdegay, “Tackling real-coded genetic algorithms: operators and tools for behavioural analysis,” Artif. Intell. Rev. 12, 265–319 (1998). [CrossRef]  

69. R. Storn and K. Price, “Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces,” J. Global Optim. 11, 341–359 (1997). [CrossRef]  

70. S. Rahnamayan, H. R. Tizhoosh, and M. M. A. Salama, “Quasi-oppositional differential evolution,” in IEEE Congress on Evolutionary Computation (2007), pp. 2229–2236.

71. M. Schevenels, B. Lazarov, and O. Sigmund, “Robust topology optimization accounting for spatially varying manufacturing errors,” Comput. Methods Appl. Mech. Eng. 200, 3613–3627 (2011). [CrossRef]  

72. H.-G. Beyer and B. Sendhoff, “Robust optimization–a comprehensive survey,” Comput. Methods Appl. Mech. Eng. 196, 3190–3218 (2007). [CrossRef]  

73. J. Rapin, P. Bennet, E. Centeno, et al., “Open source evolutionary structured optimization,” in Genetic and Evolutionary Computation Conference Companion (2020), pp. 1599–1607.

74. J. Kennedy and R. C. Eberhart, “Particle swarm optimization,” in IEEE International Conference on Neural Networks (1995), pp. 1942–1948.

75. N. Hansen and A. Ostermeier, “Completely derandomized self-adaptation in evolution strategies,” Evol. Comput. 11, 159–195 (2003). [CrossRef]  

76. I. Rechenberg, Evolutionsstrategie Optimierung technischer Systeme nach Prinzipien der biologischen Evolution (Friedrich Frommann Verlag, 1973).

77. D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimization of expensive black-box functions,” J. Global Optim. 13, 455–492 (1998). [CrossRef]  

78. M. M. Elsawy, A. Gourdin, M. Binois, et al., “Multiobjective statistical learning optimization of RGB metalens,” ACS Photon. 8, 2498–2508 (2021). [CrossRef]  

79. F. Hutter, H. Hoos, and K. Leyton-Brown, “An evaluation of sequential model-based optimization for expensive blackbox functions,” in 15th Annual Conference Companion on Genetic and Evolutionary Computation (GECCO), New York, New York, USA, Association for Computing Machinery, 2013, pp. 1209–1216.

80. J. A. Nelder and R. Mead, “A simplex method for function minimization,” Comput. J. 7, 308–313 (1965). [CrossRef]  

81. J. Liu, A. Moreau, M. Preuss, et al., “Versatile black-box optimization,” in Genetic and Evolutionary Computation Conference (GECCO) (2020), pp. 620–628.

82. B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” in 5th International Conference on Learning Representations (ICLR), Toulon, France, OpenReview.net, April 24–26, 2017.

83. E. Real, S. Moore, A. Selle, et al., “Large-scale evolution of image classifiers,” arXiv, arXiv:1703.01041 (2017). [CrossRef]

84. C.-K. Cheng, A. B. Kahng, S. Kundu, et al., “Assessment of reinforcement learning for macro placement,” in International Symposium on Physical Design (ISPD), New York, New York, USA, Association for Computing Machinery, 2023, pp. 158–166.

85. P.-I. Schneider, X. Garcia Santiago, V. Soltwisch, et al., “Benchmarking five global optimization approaches for nano-optical shape optimization and parameter reconstruction,” ACS Photon. 6, 2726–2733 (2019). [CrossRef]  

86. D. Langevin, P. Bennet, A. Khaireh-Walieh, et al., “PyMoosh: a comprehensive numerical toolkit for computing the optical properties of multilayered structures,” arXiv, arXiv:2309.00654 (2023). [CrossRef]  

87. P. Lalanne and G. M. Morris, “Highly improved convergence of the coupled-wave method for TM polarization,” J. Opt. Soc. Am. A 13, 779–784 (1996). [CrossRef]

88. G. Granet and B. Guizal, “Efficient implementation of the coupled-wave method for metallic lamellar gratings in TM polarization,” J. Opt. Soc. Am. A 13, 1019–1023 (1996). [CrossRef]

89. P. R. Wiecha, “pyGDM–a Python toolkit for full-field electro-dynamical simulations and evolutionary optimization of nanostructures,” Comput. Phys. Commun. 233, 167–192 (2018). [CrossRef]  

90. P. R. Wiecha, C. Majorel, A. Arbouet, et al., “pyGDM–new functionalities and major improvements to the Python toolkit for nano-optics full-field simulations,” Comput. Phys. Commun. 270, 108142 (2022). [CrossRef]  

91. L. Meunier, H. Rakotoarison, P. K. Wong, et al., “Black-box optimization revisited: improving algorithm selection wizards through massive benchmarking,” IEEE Trans. Evol. Comput. 26, 490–500 (2021). [CrossRef]  

92. A. D. Rakić, A. B. Djurišić, J. M. Elazar, et al., “Optical properties of metallic films for vertical-cavity optoelectronic devices,” Appl. Opt. 37, 5271–5283 (1998). [CrossRef]  

93. R. E. Christiansen and O. Sigmund, “Compact 200 line MATLAB code for inverse design in photonics by topology optimization: tutorial,” J. Opt. Soc. Am. B 38, 510–520 (2021). [CrossRef]  

94. L. F. Frellsen, Y. Ding, O. Sigmund, et al., “Topology optimized mode multiplexing in silicon-on-insulator photonic wire waveguides,” Opt. Express 24, 16866–16873 (2016). [CrossRef]  

95. A. Moreau, R. Smaali, E. Centeno, et al., “Optically optimal wavelength-scale patterned ITO/ZnO composite coatings for thin film solar cells,” J. Appl. Phys. 111, 083102 (2012). [CrossRef]  

96. J. Jiang, R. Lupoiu, E. W. Wang, et al., “MetaNet: a new paradigm for data sharing in photonics research,” Opt. Express 28, 13670–13681 (2020). [CrossRef]  
