Optica Publishing Group

Coherent SAT solvers: a tutorial

Open Access

Abstract

The coherent Ising machine (CIM) is designed to solve the NP-hard Ising problem quickly and energy efficiently. Boolean satisfiability (SAT) and maximum satisfiability (Max-SAT) are classes of NP-complete and NP-hard problems that are equally important and more practically relevant combinatorial optimization problems. Many approaches exist for solving Boolean SAT, such as quantum annealing and classical stochastic local search (SLS) solvers; however, they are all expected to require many steps to solve hard SAT problems and, thus, require large amounts of time and energy. In addition, a SAT problem can be converted into an Ising problem and solved by an Ising machine; however, we have found that this approach has drawbacks. As well as reviewing existing approaches to solving the SAT problem, we have extended the CIM algorithm and architecture to solve SAT and Max-SAT problems directly. This new technique is termed a coherent SAT solver (CSS). We have studied three implementations of the CSS, all-optical, hybrid optical–digital, and all-digital (cyber-CSS), and have compared the time-to-solution and energy-to-solution of the three machines. The cyber-CSS, which is already implemented using a graphics processing unit (GPU), demonstrates competitive performance against existing SLS solvers such as probSAT. The CSS is also compared with another continuous-time SAT solver known as the CTDS, and the scaling behavior is evaluated for random 3-SAT problems. The hybrid optical–digital CSS is a more performant and practical machine that can be realized in the short term. Finally, the all-optical CSS promises the lowest energy-to-solution cost; however, various technical challenges in nonlinear optics must be overcome before this machine can be built.

© 2023 Optica Publishing Group

1. Introduction

Combinatorial optimization problems are ubiquitous across diverse fields of science, engineering, medicine, and finance. Classic examples include drug discovery, machine learning, compressed sensing, scheduling, logistics, circuit design, and communication networks. A possible numerical approach for combinatorial optimization problems is to map those problems to the Ising model [1,2] and to solve it with an Ising machine. Formally, the Ising model is defined by the Hamiltonian $\mathcal {H} = -\frac {1}{2}\sum _i \sum _j \sigma _i \sigma _j J_{ij}$, with a set of discrete spins $\sigma _i \in \{-1,+1\}$. Finding spin configurations that minimize the Hamiltonian is an NP-hard problem and intractable on digital computers [3]. As a result, significant efforts have been invested toward leveraging various physical systems as Ising machines. On-going research on these physical platforms ranges from quantum annealers based on superconducting circuits [4,5], to coherent Ising machines (CIMs) based on networks of degenerate optical parametric oscillators (DOPOs) [6–9], among various other approaches [10–15]. Although some combinatorial optimization problems are efficiently mapped to the Ising model, other problems require substantial overhead, i.e., many additional spins for proper mapping. For instance, solving Boolean satisfiability (SAT) problems and maximum satisfiability (Max-SAT) problems by first mapping them to the Ising model and subsequently solving them using an Ising machine is not competitive against various SAT solvers on a digital platform. In addition to CIMs, the continuous-time dynamical solver (CTDS) is a system of ordinary differential equations (ODEs) designed to solve SAT and Max-SAT problems directly [16–21]. The CTDS shares a few common features with the advanced CIM algorithms, in particular, chaotic amplitude control (CAC) [22] and chaotic feedback control (CFC) [23].
This analogy motivates us to explore a new principle and architecture for SAT/Max-SAT solvers using networks of DOPOs.

In this work, we first review existing techniques for solving the SAT problem. We also review existing proposals for special purpose hardware designed for combinatorial optimization such as the CIM. Then, we propose a DOPO network based coherent SAT/Max-SAT solver and evaluate its performance against native CTDS on a digital platform. The coherent SAT/Max-SAT solver hardware consists of hybrid optical–digital units, in which the network of DOPOs evolves dynamically and chaotically under a real-time measurement-feedback protocol. In particular, the measurement-feedback circuit detects an amplitude error and injects a correction signal back into the DOPO network. Although the native CTDS solver needs to compute a feedback signal that is a sixth-order nonlinear term (even for the 3-SAT problem) and requires more advanced numerical integration methods, the coherent SAT/Max-SAT solver needs to compute a feedback signal with only second-order nonlinear terms and can be numerically integrated using a fixed Euler step. We propose an experimental hardware architecture based on state-of-the-art linear/nonlinear optical devices and digital electronic circuits. A thin-film LiNbO$_3$ waveguide DOPO [24] is identified as a key element of the proposed system. The estimated energy-to-solution (ETS) of the optical–digital hybrid architecture is orders of magnitude smaller than those required in all-digital computing methods. We further study all-optical coherent SAT solvers (CSSs) with optical error correction feedback circuits that replace some of the digital electronic circuits. This all-optical architecture can potentially reduce the ETS cost by orders of magnitude.

2. Review of Quantum, Analog, and Digital Approaches to the SAT Problem

The Boolean SAT problem is a classic example of an NP-complete problem. Thus, it is thought that no algorithm can always find a solution with high probability in polynomial time. However, many approaches attempt to solve the problem as efficiently as possible. In this section, we outline approaches from quantum, analog, and digital computing which aim to solve the SAT problem. The pros and cons of these approaches are briefly summarized in Table 1. Some of these algorithms are heuristic, which means there is no guarantee that they will find a solution in a certain amount of time; however, experimental evidence shows that they can solve many classes of SAT problems efficiently.


Table 1. Summary of the Pros and Cons of the Various Approaches to Solving Boolean SAT Discussed in This Section

2.1 Brief Introduction to SAT

Given a Boolean formula over a set of $N$ variables, the Boolean SAT problem simply asks if there is an assignment of the variables for which the formula outputs True. Although a SAT problem can take the form of any Boolean expression, it is standard to consider only Boolean expressions in a certain form known as conjunctive normal form (CNF). This is because an arbitrary Boolean formula can be converted into an equisatisfiable CNF formula with only a linear overhead [25]. A CNF formula consists of the conjunction (AND gate) of a set of clauses, each of which is a disjunction (OR gate) of literals (either variables or negated variables). For example, a CNF formula with four variables and two clauses may look like

$$(x_1 \vee \bar{x_2} \vee x_4 ) \wedge (x_2 \vee \bar{x_3} \vee \bar{x_4}).$$
A CNF SAT formula in which each clause has exactly $K$ variables is known as a K-SAT formula, and the K-SAT problem is NP-complete if and only if $K \geq 3$.

In addition to the SAT problem, the NP-hard Max-SAT problem is another closely related problem which is briefly mentioned in Sections 4 and 6. Given a CNF formula, the goal of Max-SAT is to find a variable assignment that minimizes the number of unsatisfied clauses (or, equivalently, maximizes the number of satisfied clauses). Thus, a Max-SAT problem has a solution regardless of whether or not the formula is satisfiable.

2.1.1 Matrix Representation of CNF Formula

In this work we use the following matrix to denote a SAT problem that is in CNF form.

The coupling matrix $C_{ij}$ is defined as follows:

$$C_{ij} = \begin{cases} 1 & x_i\; {\textrm{is included in the}}\; j{\textrm{th clause}}, \\ -1 & \bar{x_i} \;{\textrm{is included in the}}\; j{\textrm{th clause}}, \\ 0 & {\textrm{the}}\; i{\textrm{th variable is not included in the}}\; j{\textrm{th clause}}.\end{cases}$$
In addition, we use the set of indices $\mathcal {I}_j$ to denote the set of problem variables included in the $j$th clause. Note that in this work the index $i$ is used to denote a problem variable, and the index $j$ will be used to denote a clause (except when referring to the Ising problem). We also use $N$ to denote the number of variables whereas $M$ denotes the number of clauses.
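As an illustrative sketch, the coupling matrix of Eq. (2) and the index sets $\mathcal{I}_j$ can be built from a DIMACS-style clause list in a few lines of Python (the function and variable names here are our own, not from the paper):

```python
import numpy as np

def build_coupling_matrix(clauses, n_vars):
    """Build the N x M coupling matrix C and index sets I_j from a CNF
    formula given as a list of clauses of signed 1-based literals
    (DIMACS convention: [1, -2, 4] means x1 OR NOT x2 OR x4)."""
    C = np.zeros((n_vars, len(clauses)), dtype=int)
    index_sets = []
    for j, clause in enumerate(clauses):
        for lit in clause:
            C[abs(lit) - 1, j] = 1 if lit > 0 else -1
        index_sets.append([abs(lit) - 1 for lit in clause])
    return C, index_sets

# The example formula (x1 v ~x2 v x4) ^ (x2 v ~x3 v ~x4) from above
C, I = build_coupling_matrix([[1, -2, 4], [2, -3, -4]], n_vars=4)
```

Each column of `C` encodes one clause, and `I[j]` records which variables participate in the $j$th clause, matching the conventions above.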

2.2 Quantum Computing

The most straightforward approach to solving the SAT problem with the aid of quantum computing is the use of Grover’s search algorithm [26]. Imagine we are given a function called an “oracle” over a domain of $2^N$ states which outputs False for all $2^N - 1$ states except for one “marked” state, for which it outputs True. Using a classical computer, we cannot find the marked state without $O(2^N)$ calls to the oracle function; however, using Grover’s algorithm we can do this in $O(2^{N/2})$ calls. To solve the SAT problem, we can consider the state space to be the set of possible variable assignments ($2^{N}$ for $N$ variables). The oracle function then simply tells us whether or not a given variable assignment satisfies the problem. One caveat is that, in order to use Grover’s algorithm, we must know exactly how many states output True (how many states are “marked”). A given SAT problem can have many solutions, just one, or none. Fortunately, there is another quantum algorithm, called quantum counting, which fixes this problem [27]. Quantum counting uses quantum phase estimation to count the number of solutions and also has an asymptotic time complexity of $O(2^{N/2})$.

In Fig. 1(a) we show the quantum circuit for one iteration of Grover’s search on a particular 3-SAT problem with $N=4$ variables and $M=10$ clauses. The Grover iteration consists of two operators which can be thought of as reflections. The first reflection, denoted by $U_w$, flips the phase of the marked state. In the case of Fig. 1(a) this is done by evaluating the SAT formula on the superposition of truth assignments and flipping the phase based on this Boolean value. Then, the second operator $U_s$ reflects the resulting superposition about the average amplitude of the $2^N$ states. This can be done using a combination of Hadamard and controlled NOT (CNOT) gates as shown in Fig. 1(a). Together, these two operators perform a rotation that increases the amplitude of the marked state by roughly $\sqrt {\frac {1}{2^{N}}}$ (while de-amplifying the other states) so that, when this operation is repeated $2^{N/2}$ times, the marked state has a large amplitude and can be measured with high probability. For a problem with $r$ solutions instead of just one, this circuit will still work, but it will amplify the marked states by $\sqrt {\frac {1}{2^{N}}}$ and needs to be repeated only $\sqrt {\frac {2^{N}}{r}}$ times. In Fig. 1(b) we show the circuit to implement quantum counting, where $U_{\textrm {PH}}$ is the quantum phase estimation circuit for the Grover operator $G$ (Fig. 1(a)) and $U_{\textrm {IFT}}$ is the inverse quantum Fourier transform operator. By measuring the $t$ qubits in the first register we can estimate $r$ with high probability and then repeat $G$ approximately $\sqrt {\frac {2^N}{r}}$ times to obtain a satisfying solution with high probability.


Figure 1. (a) Quantum circuit to implement one step of Grover’s search algorithm for a 3-SAT problem with $N=4$ variables (represented by the top four qubits) and $M=10$ clauses (represented by the next 10 ancillary qubits). The truth value of the CNF formula is used for a CNOT gate applied to the bottom qubit (which is in the $\lvert - \rangle$ state). This implements the $U_{w}$ operator. Combined with the $U_s$ operator, this circuit performs a rotation of the state vector that amplifies the marked states by roughly $\sqrt {\frac {1}{2^N}}$. (b) Quantum circuit to implement quantum counting by estimating the eigenvalue of the Grover operator shown in Fig. 1(a) (denoted $G$).


Together, we can use these quantum computing techniques to solve an arbitrary SAT problem. The number of quantum gate operations required, as shown in Figs. 1(a) and 1(b), is of the order of $O(2^{N/2})$ (ignoring a polynomial prefactor). Although this is still better than the $O(2^N)$ steps required for a brute-force search, this time complexity is very sub-optimal for many classes of SAT problems. Roughly speaking, this is because we are not taking advantage of the structure that SAT problems often have. For example, it has been postulated [28] that random 3-SAT problems can be solved in average-case polynomial time by many classical heuristics. This hypothesis is further supported by simulation results presented in this work (Section 6).
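As a sanity check on these scalings, Grover's iteration can be simulated classically by brute force on the full $2^N$-dimensional state vector. The sketch below (our own toy example, not the circuit of Fig. 1) uses a CNF with a unique satisfying assignment, applies the phase-flip oracle $U_w$ and the reflection-about-the-mean $U_s$ for roughly $\frac{\pi}{4}\sqrt{2^N/r}$ iterations, and reads off the success probability:

```python
import numpy as np

# Brute-force statevector simulation of Grover amplitude amplification for
# a toy CNF with a unique satisfying assignment (unit clauses x1^x2^x3^x4);
# the instance and helper names are illustrative, not from the paper.
clauses = [[1], [2], [3], [4]]            # DIMACS-style signed literals
N = 4

def satisfies(bits, clauses):
    return all(any((bits >> (abs(l) - 1)) & 1 == (l > 0) for l in c)
               for c in clauses)

marked = np.array([satisfies(b, clauses) for b in range(2 ** N)])
r = int(marked.sum())                     # number of solutions (here r = 1)
psi = np.full(2 ** N, 2.0 ** (-N / 2))    # uniform superposition

steps = int(round(np.pi / 4 * np.sqrt(2 ** N / r)))
for _ in range(steps):
    psi[marked] *= -1.0                   # oracle U_w: flip phase of marked states
    psi = 2 * psi.mean() - psi            # diffusion U_s: reflect about the mean
p_success = float(np.sum(psi[marked] ** 2))
```

For this instance ($N=4$, $r=1$) three Grover iterations suffice, bringing the probability of measuring the satisfying assignment to about 0.96, consistent with the $\sqrt{2^N/r}$ scaling.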

2.3 Quantum Annealing

Another quantum computing approach that can be applied to solving the SAT problem is known as quantum annealing. Traditionally quantum annealing is performed on the Ising problem [29], another type of NP-hard problem in which variables only have pairwise interactions. The Ising problem is specified by a symmetric $N \times N$ coupling matrix $J$ and the goal is to minimize the objective function $E = -\frac {1}{2}\sum _i \sum _j J_{ij}x_i x_j$ over the domain $x \in \{-1,+1\}^{N}$. To solve an Ising problem using quantum annealing we must build a quantum system that evolves according to the following Hamiltonian:

$$\mathcal{H}_{\textrm{ising}} = (1-s)\sum_i \sigma_{x}^{i} - s \frac{1}{2}\sum_i\sum_j J_{ij} \sigma_z^{i}\sigma_z^{j},$$
where $\sigma _{x}^{i}$ and $\sigma _{z}^{i}$ correspond to the $X$ and $Z$ Pauli operators operating on the $i$th spin, and $s$ is a parameter which varies from $s=0$ to $s=1$ as a function of time. Quantum annealing is performed if we prepare the system in a uniform superposition of all $2^N$ states and gradually increase $s$ from 0 to 1 as the system evolves. This superposition is the ground state of the Hamiltonian when $s=0$ (which is $H_{\textrm {init}} = \sum _i \sigma _{x}^{i}$). On the other hand, the ground state of the final Hamiltonian, $H_{\textrm {final}} = -\frac {1}{2}\sum _i\sum _j J_{ij} \sigma _z^{i}\sigma _z^{j}$, will be the solution to the Ising problem. Thus, due to the quantum adiabatic theorem, we have a high probability of finding an optimal solution if $s$ is gradually increased from 0 to 1 and the final state of the system is measured [29].

To solve a SAT problem using this method we can convert our SAT problem into an Ising problem using one of the various conversion methods [1] and then apply quantum annealing to the corresponding Ising problem. This conversion requires extra resources (qubits) to be added, which means this approach will likely be sub-optimal in many cases. In Section 5 we compare the performance of a simulated CIM on a SAT problem to the CSS proposed in this work. We find that the Ising solver has significantly more difficulty solving the problems that the native SAT solver can solve easily. Thus, it is important to note that, in principle, we can apply quantum annealing to the SAT problem directly by using higher-order interactions between spins. For example, given $C_{ij}$ representing a SAT problem in CNF as defined in Section 2.1.1 (2) we can construct the following Hamiltonian:

$$\mathcal{H}_{\textrm{SAT}} = (1-s)\sum_i \sigma_{x}^{i} + s\sum_j \prod_{k } \frac{I - C_{kj}\sigma_z^{k}}{2}.$$
Here, the indices $i$ and $k$ run over the $N$ variables, and the index $j$ runs over the set of clauses. This approach to solving SAT problems using quantum annealing has not been studied much in the literature, mainly because it involves interactions of order three or more (one for each clause containing at least three variables). Current approaches to building quantum annealers implement only two-body interactions and thus can only be used to solve Ising or Max 2-SAT problems directly [30].

Overall, it has been shown both experimentally and theoretically that quantum annealing can often solve Ising problems efficiently [29,31]; however, it is still unclear whether or not it can outperform the best classical heuristic on a given problem. In addition, experimental realizations of quantum annealing typically have limited connectivity among spins [31] which makes them struggle to implement fully connected dense Ising problems. However, other physics-inspired approaches such as the CIM (see Section 2.4) are able to get around this problem and have all-to-all connectivity of spins [32].

2.4 Coherent Ising Machines

A compelling alternative to quantum annealers is the CIM, which natively solves problems of the form given by Eq. (3) [33,34]. In a CIM, the state of the system is stored as a set of coherent pulses in an optical cavity. During each round trip of the cavity, some form of all-to-all optical coupling (matrix–vector multiplication (MVM)) is used. Intracavity parametric gain, provided by a $\chi ^{(2)}$ nonlinear crystal, is used to realize an optical parametric oscillator (OPO). This nonlinearity allows the machine to be a standalone device that does not need a digital or analog electronic computer to store the system state and produce the nonlinear behavior. However, many difficulties arise when trying to introduce the coupling between pulses. Thus, experimental implementations of the CIM have typically used hybrid approaches in which part of the computation is still performed by electronics [7,35].

An important component of the CIM is the use of an optical cavity as a form of “optical memory.” Not only are the $N$ spin pulses stored in an optical cavity, but some architectures store the whole $N \times N$ coupling matrix in a similar way [23,36]. The idea of using an optical cavity as memory is not unique to the CIM [13,37]. Although optical memory has very low information density compared with its electronic counterparts, the potential for high readout speeds makes it desirable for applications such as Ising and SAT machines. In Section 8, we explore optical memory as a component of our proposed SAT solver. Because of the sparse nature of the Boolean SAT problem, this memory must differ from previous approaches. Typical realizations of optical memories based on OPOs are read sequentially: each pulse in the cavity is read once, and in the same order, on every round trip of the cavity. Although this is sufficient for Ising problems, because row–vector sums are calculated by element-wise multiplication and addition, it is not practical for the sparse MVMs that occur in a SAT solver. In Section 8.4.1 we propose a possible solution to this problem that uses gated OPOs to realize an optical random-access memory.

Another key concept that is demonstrated by the CIM is the idea that an Ising solver can be itself a dynamical system. In particular, even though the CIM is a discrete-time system it is often modeled by the following set of deterministic ODEs:

$$\frac{dx_i(t)}{dt} = (p - 1)x_i(t) - x_i(t)^3 - \sum_j J_{ij}x_j(t).$$
In addition to modeling the physical system, these equations can provide an efficient way of solving the Ising problem, even when simulated on a digital computer [23,38,39]. Moreover, modifications of this dynamical system have been proposed in which new “error variables” are added, greatly improving the performance of the dynamical system [22,23,39,40]. These modified CIMs are often referred to as CIMs with “error correction.”
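As a toy illustration of this dynamical-system view, Eq. (5) can be integrated with a fixed Euler step on a small random Ising instance; the instance, parameter values, and step size below are illustrative choices of ours, not from the paper:

```python
import numpy as np

# Fixed-step Euler integration of the CIM mean-field equations (Eq. 5)
# on a small random +/-1 coupling matrix; all parameters are illustrative.
rng = np.random.default_rng(0)
N = 8
J = rng.choice([-1.0, 1.0], size=(N, N))
J = np.triu(J, 1)
J = J + J.T                               # symmetric coupling, zero diagonal

x = 0.01 * rng.standard_normal(N)         # small random initial amplitudes
p, dt = 1.2, 0.02                         # pump parameter and Euler step size
for _ in range(2000):
    x = x + dt * ((p - 1.0) * x - x ** 3 - J @ x)

spins = np.sign(x)                        # round soft spins to Ising spins
energy = -0.5 * spins @ J @ spins         # Ising energy of the rounded state
```

The cubic saturation term keeps the amplitudes bounded, so a fixed step suffices here; the sign of each saturated amplitude is read out as the corresponding Ising spin.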

The concept of using a continuous-time dynamical system to solve combinatorial optimization problems is not new. Earlier work [21] had already explored a continuous-time dynamical system approach to solve the SAT problem which they referred to as the CTDS. In Section 3 we explore the CTDS in more detail and also propose a couple of novel dynamical systems aimed at solving the SAT problem more efficiently. Some of these are based on the CIM but are modified to work on the SAT problem instead of the Ising problem.

2.5 Simulated Annealing and Stochastic Local Search Solvers

In addition to quantum approaches to solving the SAT problem, there are many effective algorithms based on probabilistic approaches, which can be implemented using a conventional digital computer. This class of solvers often uses a method known as stochastic local search (SLS). An SLS solver keeps track of a variable assignment, which is the current guess for the solution to the problem. At each step of the algorithm, the solver uses a heuristic approach to choose one variable to negate. Generally speaking, this choice is made such that the solver progressively gets closer to solving the problem until it reaches a variable assignment that satisfies the formula. The term “stochastic” refers to the fact that the heuristic used to choose the negated variable is often seeded by pseudo-random numbers.

The baseline algorithm for the SLS method is simulated annealing (SA) [41], which is the classical counterpart to quantum annealing. Similar to quantum annealing, SA is traditionally applied to the Ising problem; however, it can be straightforwardly extended to Boolean SAT as follows. The SAT “energy” function $E_{\textrm {SAT}}(x)$ assigns to a given variable assignment $x$ an energy that is simply the number of clauses not satisfied by the assignment. Finding the assignment $x$ that minimizes $E_{\textrm {SAT}}$ is equivalent to solving the SAT problem. At each step of SA, a variable is chosen at random and $\Delta E_{\textrm {SAT}}$, the change in $E_{\textrm {SAT}}$ if that variable is flipped, is calculated. If $\Delta E_{\textrm {SAT}} < 0$, the variable is flipped with probability 1; otherwise it is flipped with probability $e^{-\beta \Delta E_{\textrm {SAT}}}$. The parameter $\beta$ is called the inverse temperature and is typically brought from a small positive value to a large positive value during the computation (in other words, from high temperature to low temperature, creating an “annealing” effect).
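A minimal Python sketch of this SA-for-SAT procedure follows; the function names, linear annealing schedule, and parameter values are illustrative choices of ours, not from the paper:

```python
import math
import random

def sat_energy(assign, clauses):
    """E_SAT: number of clauses not satisfied; assign maps 1-based variable
    indices to booleans, clauses are lists of signed DIMACS literals."""
    return sum(not any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)

def simulated_annealing(clauses, n_vars, steps=20000, beta0=0.1, beta1=3.0, seed=0):
    rng = random.Random(seed)
    assign = {i: rng.random() < 0.5 for i in range(1, n_vars + 1)}
    for t in range(steps):
        if sat_energy(assign, clauses) == 0:
            break                                    # satisfying assignment found
        beta = beta0 + (beta1 - beta0) * t / steps   # low -> high inverse temperature
        i = rng.randint(1, n_vars)                   # candidate variable, uniform
        e0 = sat_energy(assign, clauses)
        assign[i] = not assign[i]                    # tentative flip
        dE = sat_energy(assign, clauses) - e0
        if dE > 0 and rng.random() >= math.exp(-beta * dE):
            assign[i] = not assign[i]                # reject the uphill move
    return assign

result = simulated_annealing([[1, -2], [2, 3], [-1, -3]], n_vars=3)
```

Downhill and energy-neutral flips are always accepted, while uphill flips are accepted with the Metropolis probability $e^{-\beta \Delta E_{\textrm{SAT}}}$, exactly as described above.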

Although SA is already effective at solving SAT problems, several tweaks can reduce the number of steps needed and speed up the algorithm. In particular, SA chooses which variable to update at random, which causes many unnecessary flips and results in slow convergence. GSAT and WalkSAT [42] are SLS solvers in which the variable to flip is chosen more carefully. These solvers randomly choose between two types of moves. The first is a “steepest descent” move, which flips the variable that minimizes the objective function $E_{\textrm {SAT}}$ (out of $N$ possible choices). This move is not always productive because the algorithm can sometimes be at a local minimum where any single flip increases $E_{\textrm {SAT}}$. The second type of move is a random move in which a heuristic is used to negate a variable regardless of whether or not said negation increases the objective function. GSAT and WalkSAT differ in the precise details of how these moves are executed, but overall both show a significant improvement over SA [42]. Over the years, additional improvements have been made to these SLS solvers. For example, in Section 6 we use probSAT [43] as a representative of state-of-the-art SLS solvers to compare with the new approaches proposed here.
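A minimal sketch of this two-move scheme in the WalkSAT style is shown below; the published GSAT/WalkSAT heuristics differ in their exact scoring and noise parameters, so everything here is an illustrative simplification of ours:

```python
import random

def walksat(clauses, n_vars, max_flips=100000, p_random=0.5, seed=0):
    """WalkSAT-style local search: pick an unsatisfied clause, then either
    flip a random variable from it or the variable whose flip leaves the
    fewest unsatisfied clauses. Returns a 1-indexed assignment or None."""
    rng = random.Random(seed)
    assign = [rng.random() < 0.5 for _ in range(n_vars + 1)]  # index 0 unused

    def unsat(c):
        return not any(assign[abs(l)] == (l > 0) for l in c)

    def n_unsat_after_flip(v):
        assign[v] = not assign[v]                 # try the flip...
        u = sum(unsat(c) for c in clauses)
        assign[v] = not assign[v]                 # ...then undo it
        return u

    for _ in range(max_flips):
        bad = [c for c in clauses if unsat(c)]
        if not bad:
            return assign                         # all clauses satisfied
        clause = rng.choice(bad)
        if rng.random() < p_random:
            var = abs(rng.choice(clause))         # random walk move
        else:
            var = min((abs(l) for l in clause), key=n_unsat_after_flip)  # greedy move
        assign[var] = not assign[var]
    return None

solution = walksat([[1, -2], [2, 3], [-1, -3]], n_vars=3)
```

Restricting candidate flips to variables of an unsatisfied clause is what distinguishes this family from plain SA, which picks candidates uniformly over all $N$ variables.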

2.6 Benchmarking SAT Solvers

Even when considering solely classical computing methods, there are many different approaches to solving SAT problems. Depending on the exact type of SAT problem that needs to be solved, some approaches may be better than others. In order to evaluate which solvers have the best overall performance, researchers host a regular SAT competition [44] in which a number of SAT solvers are tested on a benchmark problem set.

The competitions typically include multiple tracks with complete and incomplete solvers. Complete solvers are expected to output a satisfying variable assignment if the problem is solvable but also to be able to tell if a problem is not solvable. On the other hand, incomplete solvers (also called heuristic solvers) are only expected to output a solution with high probability if the problem is solvable, but provide no guarantee that the problem is unsolvable if they fail to output a solution. In this work, we mainly focus on incomplete solvers. The incomplete tracks of the SAT competitions have primarily been occupied by SLS-based solvers; however, many methods use hybrid multi-phase approaches that can include an SLS phase as a subroutine. The problem sets used typically include many application problems from various fields as well as randomly generated problems. Because the application problems vary greatly in size and difficulty, multi-phase solvers are often used to tackle this wide variety of problems.

In Section 6 we briefly discuss how our proposed algorithm compares with other solvers on one of the incomplete tracks of the 2018 competition. We use the 2018 competition problems because they include a random track in which randomly generated problems are used as benchmarks. Random SAT problems tend not to require multiple phases, and most can be solved efficiently using an SLS approach alone. Of the SAT solvers used to solve application problems, our proposed algorithm most closely resembles the SLS phase of a multi-phase solver.

2.7 Hardware Accelerators for Combinatorial Optimization

Because combinatorial optimization problems are known to require very many steps for classical computers to solve, many non-conventional approaches have been proposed in which various hardware devices other than a conventional CPU do the computational work. By harnessing properties of the chosen device, such as parallelism and low power consumption, one can often reduce the time and energy required to solve these hard problems. In some approaches, a physical device is used for just one step of the algorithm, whereas other approaches embed the entire algorithm in a single physical system.

2.7.1 Graphics Processing Unit

Owing to their multiple cores and distributed memory, graphics processing units (GPUs) can greatly speed up computations that allow for a large amount of parallelism. Many combinatorial optimization problems fall into this class, as they often allow for both parallelism between problem variables (different threads are responsible for updating different variables) and parallelism between runs of the algorithm (different threads run the same algorithm in parallel with different random seeds). Although GPUs can offer an improvement in speed, they usually do not offer much performance benefit with respect to energy consumption [16,23,45].

2.7.2 Field Programmable Gate Array

Similar to a GPU, a field programmable gate array (FPGA) is best used to solve problems that allow a large amount of parallelism. Even though an FPGA is built with the same physical hardware as CPUs and GPUs, the greater flexibility of circuit design means an FPGA can be significantly more energy efficient than the former when programmed to do a certain task. Recently, FPGA versions of various Ising solvers have been implemented [39,45–47], which show great improvements in terms of ETS and time-to-solution (TTS) cost over conventional GPU and CPU approaches. There are many examples of SAT solvers being implemented on an FPGA as well [48].

2.7.3 Photonic Accelerator Hardware

Because CPUs, GPUs, and FPGAs still rely on the same fundamental physical architecture (CMOS), it is believed that there is a limit to how fast and energy efficient these devices can become [49]. Thus, for certain types of computation, it is predicted that other computing platforms such as photonics can potentially provide lower power consumption and higher speed. When it comes to Ising solvers, a computational step that often takes the majority of resources is the MVM, in which a vector is multiplied by the (fixed) coupling matrix $J$. Thus, many proposals have been made in which this step of an Ising solver is carried out by special-purpose photonic hardware [7,36,50–53].

Because of the linear nature of optics, many approaches have been proposed in which the MVM is computed at a low energy cost using few photons. The remainder of the computation required for the Ising solver can then be done by a digital circuit, and the state variables for the algorithm are stored in digital memory. Because the remaining computations done by the digital computer have a computational complexity of $O(N)$, whereas a (dense) MVM has a complexity of $O(N^2)$, much time and energy can be saved by relying on the optical components for this step of the algorithm.

There are two main approaches to building this photonic MVM. The first method [7,50–52] involves using a mesh of Mach–Zehnder interferometers, which can implement an arbitrary unitary transformation. By combining two of these units with a diagonal matrix, one can implement an arbitrary symmetric matrix using the spectral decomposition. The second method [36] leverages the nonlinear nature of coherent detection to perform multiply-and-accumulate operations (vector–vector products) at very low power. Both of these methods are useful when solving an Ising problem with a dense coupling matrix; however, because of the sparse nature of the SAT problem, these techniques cannot be straightforwardly applied to SAT solvers without an additional $O(N)$ resource overhead. This problem is discussed in more detail in Section 8, in which we discuss possible photonic accelerators for our proposed SAT solver based on the technique used in Ref. [36].
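The decomposition underlying the first method can be checked numerically: a symmetric coupling matrix factors as $J = Q \Lambda Q^{\mathsf{T}}$, so one unitary mesh ($Q^{\mathsf{T}}$), a diagonal attenuation/gain stage ($\Lambda$), and a second mesh ($Q$) together reproduce the full MVM. A small NumPy sketch (illustrative only, with arbitrary matrix sizes):

```python
import numpy as np

# Spectral decomposition behind mesh-based photonic MVM: applying a unitary
# mesh (Q^T), a diagonal stage (lam), and a second mesh (Q) in sequence
# reproduces the matrix-vector product J @ x for any symmetric J.
rng = np.random.default_rng(1)
N = 6
A = rng.standard_normal((N, N))
J = (A + A.T) / 2                        # arbitrary symmetric coupling matrix
lam, Q = np.linalg.eigh(J)               # eigenvalues and orthogonal Q

x = rng.standard_normal(N)
y_mesh = Q @ (lam * (Q.T @ x))           # mesh -> diagonal -> mesh
y_direct = J @ x                         # reference result
```

Each of the three stages costs only $O(N)$ optical elements per input dimension, which is what makes the scheme attractive for dense $J$.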

2.7.4 Other Hardware Approaches

In addition to the hardware accelerators and application-specific machines discussed in this section, there are many other proposals which use some sort of physical device or hardware to speed up the solution of combinatorial optimization problems [11,18,54–56]. These approaches range from analog electronic devices [11,18,54] to photonic $XY$-Ising machines [55] and other CIM variants [56]. All of these approaches are similar in that they aim to use the inherent physics of the hardware devices to encode the optimization problem. Other than Ref. [18], all of the existing approaches target only the Ising/Max-Cut problem. A key theme of this work is that, although any NP problem can theoretically be mapped to an Ising problem, this mapping process will be sub-optimal in many cases; thus, it is important to have physical hardware that can solve a wider variety of optimization problems. In the next section, we discuss how a Boolean SAT problem can be solved using a dynamical system, which will lead the way for the development of hardware SAT solver devices.

3. A Dynamical Systems Approach: Continuous-Time Dynamical Solvers and Chaotic Feedback Control SAT Solvers

In this section, we introduce the two types of SAT solvers studied in this paper in the form of ODEs. It is assumed that a SAT instance in CNF is given, and the coupling matrix $C_{ij}$ and sets $\mathcal {I}_j$ are defined as outlined in Section 2.1.1 (2).

The continuous-time SAT solvers are all based on the idea (inspired by Ref. [21]) that each clause can be expressed using the formula:

$$K_{j} = \prod_{k \in \mathcal{I}_{j}} \frac{1 - C_{kj}x_k}{2}.$$
If the value $-1$ is associated with the Boolean value “false” and 1 with “true,” and $x_i \in \{-1,+1\}$, then $K_{j} = 0$ if the $j$th clause is satisfied and $K_{j} = 1$ otherwise. However, the cost function can also be extended to the case $x_i \in \mathbb {R}$. This allows us to relax the discrete variables to continuous “soft spin” variables, much like what is done in the CIM [22,34]. Based on this concept, the SAT problem can be written in continuous form as
$$\sum_j K_j = 0 \quad \quad x_i \in [{-}1,1]$$
or
$$\sum_j K_j^2 = 0 \quad \quad x_i \in \mathbb{R}$$
and the Max-SAT problem can be written as
$$\text{minimize} \quad \sum_j K_j \quad \quad \text{subject to} \quad x_i \in [{-}1,1]$$
or
$$\text{minimize} \quad \sum_j K_j^2 \quad \quad x_i \in \mathbb{R}.$$
Naïvely, we could start from a random point in the continuous phase space and apply gradient descent to the continuous cost function; however, the dynamics of such a gradient-descent system quickly become stuck at local minima and saddle points. The systems of ODEs studied in this work can roughly be thought of as extensions of gradient descent in which additional “auxiliary” variables are added. These auxiliary variables provide additional degrees of freedom along which the dynamics can escape from local minima and from trapping sets that are non-solutions.
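For illustration, the clause terms $K_j$ and the SAT cost $\sum_j K_j$ can be evaluated in a few lines of Python. This is a minimal sketch with our own conventions: clauses are lists of DIMACS-style signed literals, and $C_{kj}$ is taken as $+1$ for a positive literal.

```python
import numpy as np

def clause_terms(x, clauses):
    """Evaluate K_j = prod_{k in I_j} (1 - C_kj * x_k) / 2 for each clause.

    `clauses` holds DIMACS-style signed 1-based literals: literal +k means
    C_kj = +1, literal -k means C_kj = -1.  For x in {-1,+1}^N, K_j = 0 when
    clause j is satisfied and K_j = 1 otherwise.
    """
    return np.array([
        np.prod([(1 - np.sign(lit) * x[abs(lit) - 1]) / 2 for lit in clause])
        for clause in clauses
    ])

# (x1 OR NOT x2) AND (NOT x1 OR x2)
clauses = [[1, -2], [-1, 2]]
print(clause_terms(np.array([1.0, 1.0]), clauses).sum())   # 0.0: both clauses satisfied
print(clause_terms(np.array([1.0, -1.0]), clauses).sum())  # 1.0: second clause violated
```

The same function accepts fractional $x_i$, which is exactly the soft-spin relaxation used by the continuous-time solvers below.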

3.1 CTDS

The solvers presented in this work are based on two different continuous-time solvers which have already been studied in Refs. [21,22]. The first solver, named CTDS [16,21], is described by the following deterministic system of ODEs:

$$\frac{dx_i(t)}{dt} ={-}\sum_j a_j(t)K_j(t)K_{ij}(t),$$
$$\frac{da_j(t)}{dt} = a_j(t) K_{j} (t)^2,$$
$$x_i \in [{-}1,1] \quad a_j \in (0, \infty),$$
$$K_{ij} = C_{ij}\prod_{k \in \mathcal{I}_{j}, k \neq i} \frac{1 - C_{kj}x_k}{2},$$
with $K_{j}$ defined in Eq. (6).

A drawback of this solver is that the auxiliary variables $a_j(t)$ can become arbitrarily large. This makes the system difficult to implement physically (e.g., in an energy-efficient optical machine) and difficult to integrate numerically. In this work, we have studied a modified version of the system that removes this problem of unbounded auxiliary variables. This system is equivalent to the one studied in Refs. [57,58] and is described by

$$\frac{dx_i(t)}{dt} ={-}\sum_j a_j(t)K_j(t)K_{ij}(t),$$
$$\frac{da_j(t)}{dt} = \xi(t) a_j(t)\left(K_{j} (t)^2 - \frac{1}{M}\sum_m K_{m} (t)^2 \right),$$
$$x_i \in [{-}1,1] \quad a_j \in (0, \infty).$$
We refer to this system of equations as modified CTDS (M-CTDS). The extra term that is added in the auxiliary equation ($\frac {1}{M}\sum _m K_{m} (t)^2$) ensures that the geometric mean of all $a_j$ stays constant, or in other words:
$$\sum_j \log (a_j(t)) = \sum_j \log (a_j(0)).$$
In addition, we added a system parameter $\xi$ which controls the speed of the auxiliary variable change (relative to the spin variable change). If this parameter is set dynamically according to the equation
$$\xi(t) = \frac{1}{\int_{0}^{t} \frac{1}{M}\sum_m K_{m} (\tau)^2 \, d\tau},$$
then the resulting dynamics are equivalent to those of CTDS, only traversed at a different speed (for more details, see Ref. [58]). We found in our numerical experiments, however, that the system works better as a SAT solver if $\xi$ is held constant rather than modulated according to the above formula. We also found through numerical simulation that integrating this equation with a constant Euler step size is sufficient for good solver performance, unlike the equations studied in Refs. [16,21].
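One constant-step Euler update of M-CTDS can be sketched as follows. This is a minimal illustration, not the benchmarked implementation: the parameter values, the DIMACS-style literal encoding, and the sign convention $C_{ij} = +1$ for a positive literal are our own choices, under which the $x_i$ update moves downhill on $\sum_j a_j K_j^2$.

```python
import numpy as np

def mctds_step(x, a, clauses, xi=1.0, dt=0.1):
    """One constant-step Euler update of M-CTDS (illustrative parameters)."""
    K = np.empty(len(clauses))
    grad = np.zeros(len(x))              # descent direction for sum_j a_j K_j^2
    for j, clause in enumerate(clauses):
        terms = np.array([(1 - np.sign(l) * x[abs(l) - 1]) / 2 for l in clause])
        K[j] = terms.prod()
        for idx, l in enumerate(clause):
            # K_ij: the clause product with the term of variable i left out
            K_ij = np.sign(l) * np.prod(np.delete(terms, idx))
            grad[abs(l) - 1] += a[j] * K[j] * K_ij
    x = np.clip(x + dt * grad, -1.0, 1.0)
    # mean-subtracted growth keeps sum_j log(a_j) approximately constant
    a = a * (1 + dt * xi * (K**2 - (K**2).mean()))
    return x, a, K

rng = np.random.default_rng(0)
clauses = [[1, 2, -3], [-1, 2, 3], [1, -2, 3]]   # tiny toy instance
x, a = rng.uniform(-1, 1, 3), np.ones(len(clauses))
for _ in range(500):
    x, a, K = mctds_step(x, a, clauses)
unsat = sum(all(np.sign(l) * np.sign(x)[abs(l) - 1] < 0 for l in c) for c in clauses)
print("unsatisfied clauses after rounding:", unsat)
```

The multiplicative form of the $a_j$ update also makes the positivity constraint $a_j > 0$ easy to respect with this step size, since the per-step factor never reaches zero.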

Although some properties of this solver are desirable for implementation in an optical machine, several difficulties remain. In particular, this system of ODEs contains high-order nonlinearities. For example, for a 3-SAT instance, the coupling term $a_j(t)K_j(t)K_{ij}(t)$ requires a fifth-order mixing of problem variables, which is unrealistic to implement with any known nonlinear optical effect. For this reason, we focus solely on the optical implementation of SAT-CFC (see the next section). Regardless, we still include benchmark results for this solver as a digital algorithm.

3.2 SAT-CFC

The second type of differential solver discussed in this work can be traced back to the original CIM [32–34]. The CIM aims to solve the Ising problem, not the Boolean SAT problem; however, it is similar in that binary spin variables are relaxed to “soft” analog spins that can take a continuous range of values. The current state-of-the-art CIM solver, called CIM-CAC [22,39], is described by the following deterministic set of ODEs:

$$\frac{dx_i(t)}{dt} = (p - 1)x_i(t) - x_i(t)^3 - e_i(t) \sum_j J_{ij}x_j(t),$$
$$\frac{de_i(t)}{dt} ={-} \beta e_i(t) ( x_i(t)^2 - \alpha).$$
A slight modification of this system, discussed in Ref. [23] and known as CIM-CFC, is described by
$$f_i(t) = e_i(t) \sum_j J_{ij}x_j(t),$$
$$\frac{dx_i(t)}{dt} = (p - 1)x_i(t) - x_i(t)^3 - f_i(t),$$
$$\frac{de_i(t)}{dt} ={-} \beta e_i(t) ( f_i(t)^2 - \alpha).$$
This system is slightly easier to implement with a hybrid optical–electronic system (for reasons that are explained in Section 5).

We have found that both of these solvers can be modified to solve SAT problems as follows. We call these new solvers SAT-CAC,

$$\frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 - e_i(t) \sum_j K_{ij}(t),$$
$$\frac{de_i(t)}{dt} ={-}\beta e_i(t)(x_i(t)^2 - 1),$$
and SAT-CFC,
$$f_i(t) = e_i(t) \sum_j K_{ij}(t),$$
$$\frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 - f_i(t),$$
$$\frac{de_i(t)}{dt} ={-}\beta e_i(t)(f_i(t)^2 - (p-2)^2),$$
where $C_{kj}$ and $\mathcal {I}_j$ represent the SAT problem being solved ($K_{ij}$ is defined in Eq. (9) and $K_{j}$ is defined in Eq. (6)). For a 3-SAT problem, these new ODEs have only second-order nonlinearities in the sum $\sum _j K_{ij}(t)$, which can be implemented more reasonably using nonlinear optical processes. In the case of a hybrid optical–digital implementation, we can use optical homodyne detection for this purpose (as discussed in Section 8). For the rest of this work, we mainly discuss SAT-CFC; however, Appendix D includes a brief comparison of SAT-CFC and SAT-CAC, in which their performance is very comparable. Both SAT-CAC and SAT-CFC use analog variables that remain reasonably bounded and can be numerically integrated using a constant Euler step.
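A minimal numerical sketch of SAT-CFC-style dynamics is given below. The parameter values, literal encoding, and sign conventions are our own illustrative choices: here $g_i$ is the gradient of the SAT cost $\sum_j K_j$ with respect to $x_i$, which up to the sign convention of $C_{ij}$ plays the role of $\sum_j K_{ij}$, and the $e_i$ update is written multiplicatively so that $e_i$ stays positive under a constant step.

```python
import numpy as np

def satcfc_step(x, e, clauses, p=-0.5, beta=0.2, dt=0.05):
    """One Euler step of SAT-CFC-style dynamics (illustrative parameters)."""
    g = np.zeros(len(x))                 # gradient of the SAT cost sum_j K_j
    for clause in clauses:
        terms = np.array([(1 - np.sign(l) * x[abs(l) - 1]) / 2 for l in clause])
        for idx, l in enumerate(clause):
            # dK_j/dx_i = -sign(l)/2 times the product of the other literal terms
            g[abs(l) - 1] += -np.sign(l) / 2 * np.prod(np.delete(terms, idx))
    f = e * g                            # error-weighted feedback signal
    x = x + dt * ((p - 1) * x - x**3 - f)
    # multiplicative update of de/dt = -beta*e*(f^2 - (p-2)^2); keeps e > 0
    e = e * np.exp(-dt * beta * (f**2 - (p - 2)**2))
    return x, e

def n_unsat(sigma, clauses):
    """Clauses with every literal false under the rounded configuration sigma."""
    return sum(all(np.sign(l) * sigma[abs(l) - 1] < 0 for l in clause) for clause in clauses)

rng = np.random.default_rng(1)
clauses = [[1, 2, -3], [-1, 2, 3], [1, -2, 3]]   # tiny toy instance
x, e = rng.uniform(-0.1, 0.1, 3), np.ones(3)
best = n_unsat(np.sign(x), clauses)
for _ in range(2000):
    x, e = satcfc_step(x, e, clauses)
    best = min(best, n_unsat(np.sign(x), clauses))
print("fewest unsatisfied clauses visited:", best)
```

Tracking the best configuration visited, rather than waiting for a stable rest point, reflects how these solvers are read out in practice: the satisfying configuration is an (often unstable) fixed point that the trajectory passes through.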

4. Analysis of Nonlinear Dynamics

To give an intuitive picture of why these ODEs can be used to solve the SAT problem efficiently, we briefly discuss the fixed points of each system, similar to the analysis done in Refs. [21–23,58]. In this section, we also show trajectories of the dynamics and qualitatively describe the similarities and differences between SAT-CFC and CTDS. We also present a theoretical analysis of SAT-CAC in which a condition is derived for the stability of fixed points.

In the case of M-CTDS, fixed points occur when the $x_i$ variables take the form $x_i = \sigma _i$, where $\sigma _i \in \{-1,+1\}$ is a spin configuration corresponding to a solution of the given SAT problem. When a problem is under-constrained, fixed points may also occur when the under-constrained variables take some value in the range $[-1,1]$; in this case, $x_i$ still represents a solution to the SAT problem (multiple solutions, in fact). It is easy to show that these fixed points are stable; thus, the system trajectory stops if and only if it has found a solution. Consequently, for an unsatisfiable problem there are no fixed points, which means that M-CTDS (and CTDS) are not very effective as Max-SAT solvers.

In the case of SAT-CFC, the fixed points take the form:

$$x_i = \sigma_i \in \{{-}1,+1\},$$
$$e_i = \frac{2 - p}{ \left|\sum_j K_{ij}\right|}.$$
Here $\sigma _i$ now corresponds to a local minimum of the SAT cost function (the SAT cost function counts the number of unsatisfied clauses), not necessarily a satisfiable configuration. A local minimum can be defined as a configuration such that
$$\sigma_i ={-}\text{sign} \left( \sum_j K_{ij} \right).$$
To see algebraically why a fixed point must obey this formula, we can look at Eq. (19):
$$0 = \frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 - e_i(t) \sum_j K_{ij}(t),$$
which implies
$$-\text{sign}\left((p-1)x_i(t) - x_i(t)^3\right) ={-}\text{sign}\left( e_i(t) \sum_j K_{ij}(t)\right).$$
Writing $(p-1)x_i(t) - x_i(t)^3 = x_i(t)\left((p-1) - x_i(t)^2\right)$, and noting that $(p-1) - x_i(t)^2$ is always negative and $e_i$ is always positive, this equation can be simplified to
$$\sigma_i = \text{sign}(x_i(t)) ={-}\text{sign}\left( \sum_j K_{ij} \right).$$
This is called a local minimum because negating any variable will increase the number of unsatisfied clauses. If a satisfiable configuration exists, then it is also a local minimum and, thus, a fixed point. However, for a hard problem instance, there may be many other local minima that are not satisfiable configurations. Unlike in the case of M-CTDS, these fixed points may not be stable and, in fact, none of these fixed points will be stable given the right choice of the parameter $p$ (see Section 4.1 for more details). This means that a satisfiable configuration will correspond to an unstable fixed point during the computation. However, numerical evidence shows that the system still has a high chance of finding a satisfiable configuration at some point during the trajectory for solvable problem instances.
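This combinatorial definition of a local minimum can be checked directly by flipping one variable at a time. The sketch below (function names are our own; it uses the weak form of the definition, in which no single flip strictly reduces the number of unsatisfied clauses) also shows a non-solution local minimum, the kind of configuration at which plain descent would stall:

```python
import numpy as np

def n_unsat(sigma, clauses):
    """Count unsatisfied clauses; literal l is true iff sign(l)*sigma[|l|-1] = +1."""
    return sum(all(np.sign(l) * sigma[abs(l) - 1] < 0 for l in clause) for clause in clauses)

def is_local_min(sigma, clauses):
    """Weak local minimum: no single-variable flip strictly decreases the cost."""
    base = n_unsat(sigma, clauses)
    for i in range(len(sigma)):
        flipped = sigma.copy()
        flipped[i] *= -1
        if n_unsat(flipped, clauses) < base:
            return False
    return True

clauses = [[1, 2], [-1, 2], [1, -2]]
print(is_local_min(np.array([1, 1]), clauses))    # True: a satisfying configuration
print(n_unsat(np.array([1, 1]), clauses))         # 0
print(is_local_min(np.array([-1, -1]), clauses))  # True: a non-solution local minimum
print(n_unsat(np.array([-1, -1]), clauses))       # 1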

This property of SAT-CAC and SAT-CFC means that for an unsatisfiable Max-SAT problem, the global minimum of the SAT cost function (which is the solution to the Max-SAT problem) will also be a fixed point. This implies that, unlike CTDS and M-CTDS, SAT-CAC and SAT-CFC can be used as Max-SAT solvers as well. This hypothesis is supported by numerical evidence in Section 6.

The qualitative differences and similarities between the three solvers are summarized in Table 2. Note that when we say the auxiliary variables are bounded in the case of M-CTDS and SAT-CFC (second row), this is not strictly true: there is no guarantee that the auxiliary variables stay within any particular range. However, our numerical simulations show that, in practice, they roughly do; in particular, the auxiliary variables stay small enough that higher-order numerical integration methods are not needed. Another important message of this table is that M-CTDS and SAT-CFC can perform well as SAT solvers even when a first-order Euler integration step is used, unlike the original CTDS [20]. This result comes primarily from numerical simulations but can be roughly linked to the absence of unbounded auxiliary-variable growth in M-CTDS and SAT-CFC. Because higher-order integration methods are not needed, all of the numerical results in this work for M-CTDS and SAT-CFC are generated using a first-order Euler integrator.


Table 2. Qualitative Similarities and Differences between K-SAT Solvers

Another message from Table 2 is that the order of the nonlinear coupling term (third row) in CTDS and M-CTDS could perhaps be reduced. Although it has not been studied in this work, it might be possible for CTDS and M-CTDS to use the same cost function as SAT-CFC, in which case the coupling term would have order $(K - 1) + 1 = K$ instead of $2K$. The additional $+1$ corresponds to the auxiliary variable $a_j$ appearing in the coupling term in Eqs. (7) and (10). Although SAT-CFC also contains a factor of $e_i$ in the coupling term (Eq. (19)), it is applied after the sum and, thus, can be implemented by a digital circuit in our experimental design in Section 5.

Table 3 compares example trajectories ($x_i$, $e_i$, and $a_j$) for the three solvers proposed here. The trajectories can be used to get a better intuitive understanding of the system dynamics even though they only represent one set of parameters and one specific problem (a random 3-SAT problem with 80 variables and 340 clauses).


Table 3. Example Trajectories ($x_i$, $e_i$, and $a_j$) for the Three Solvers Proposed Here

The trajectories in Table 3 for CTDS and M-CTDS quickly reach a stable fixed point. This is because the problem instance is easy and the system finds a satisfying configuration rather quickly. Even once the fixed point is reached, many $x_i$ variables are still between $+1$ and $-1$, indicating that many variables are under-constrained. The differences between CTDS and M-CTDS are rather subtle in these plots. In the case of CTDS, the auxiliary variables ($a_j$) always increase in value and grow exponentially until the solution is found. This does not affect the performance on this instance because a satisfying configuration is found quickly; however, for a harder problem instance, the auxiliary variables would continue to increase and cause numerical stiffness [16,21]. In the case of M-CTDS, we can see (particularly in the log-scaled plot) that some auxiliary variables increase in value whereas others decrease, ensuring no net change in $\sum _j \log (a_j)$.

For SAT-CFC, the system never reaches a (stable) fixed point and continues to evolve chaotically. The trajectory shown has visited a satisfying solution even though it does not stay there. The most important observation about SAT-CFC is that although the $x_i$ variables tend to stay close to the values $\{-1, +1\}$, each variable still oscillates in some small range around these discrete values. This behavior is caused by the error-correction dynamics studied in Refs. [22,23] and is important for the system’s effectiveness as a solver: the small deviations from $\{-1, +1\}$ drive chaotic dynamics, which cause new regions of the phase space to be explored for possible solutions.

In the plot of the auxiliary variables ($e_i$) for SAT-CFC, the $e_i$ tend to cluster around certain discrete values. These correspond to the values $e_i$ takes at fixed points, which are of the form $e_i = \frac {2 - p}{ |\sum _j K_{ij}|}$ as mentioned earlier in this section. The fact that $e_i$ is often close to these values, along with the fact that $x_i$ stays close to $\{-1,+1\}$, implies that the system is often near a fixed point. In addition, some values of $e_i$ tend to be much larger than the others; these correspond to under-constrained variables. When a variable $x_i$ is under-constrained, it can take either the value $+1$ or $-1$ with no effect on the SAT potential, which means $\sum _j K_{ij} \approx 0$ (since $\sum _j K_{ij}$ comes from the gradient of the SAT potential).

4.1 Stability of Fixed Points for SAT-CAC

Until now, the stability of the fixed points of SAT-CFC and SAT-CAC has been inferred from the qualitative dynamics studied above. In this section, we derive the stability conditions of the fixed points found in SAT-CAC. We restrict our focus to SAT-CAC because its Jacobian matrix is simpler, and we assume similar behavior occurs in SAT-CFC.

To reiterate, the fixed points of SAT-CAC are given as follows:

$$x_i = \sigma_i ,$$
$$e_i = \frac{p - 2}{ \sigma_i h_i},$$
with $h_i = \sum _j K_{ij}$ and $K_{ij} = C_{ij}\, 2^{-(K-1)} \prod _{k \in \mathcal {I}_j, k \neq i} (1-C_{kj} \sigma _k)$, where $K$ is the number of literals per clause.

The Jacobian matrix at the fixed points can be described as follows:

$$\frac{\partial \dot{x}_i}{\partial x_j} = \delta_{ij} (p - 4) + e_i \beta \omega_{ij},$$
$$\frac{\partial \dot{x}_i}{\partial e_j} = h_i \delta_{ij},$$
$$\frac{\partial \dot{e}_i}{\partial x_j} ={-}2 \beta \sigma_i e_i \delta_{ij},$$
$$\frac{\partial \dot{e}_i}{\partial e_j} = 0,$$
with the matrix $\omega _{ij} = 2^{-(K-1)} (1-\delta _{ij}) \sum _l C_{il} C_{jl} \prod _{k \in \mathcal {I}_l, k \not \in \{i, j\}} (1 - C_{kl} \sigma _k)$. We also write $\Omega = \{\omega _{ij}\}_{ij}$.

Using the fact that $\frac {\partial \dot {e}_i}{\partial x_j} \frac {\partial \dot {x}_i}{\partial e_j} = b \delta _{ij}$ with $b = -2 \beta (p - 2)$, the eigenvalues $\lambda _j^{\pm }$ of the Jacobian matrix can be explicitly calculated by considering its characteristic polynomial, and are given as follows [22]:

$$\lambda_j^{{\pm}} = \begin{cases} \frac{1}{2}({-}2 + (2 - p) \mu_j \pm \sqrt{\Delta_j}) & \text{ if } \Delta_j > 0,\\ \frac{1}{2}({-}2 + (2 - p) \mu_j \pm i \sqrt{-\Delta_j}) & \text{otherwise}, \end{cases}$$
with $\Delta _j = (-2 + (2 - p) \mu _j)^2 + 4b$, $i^2=-1$, and $\mu _j$ the $j$th eigenvalue of the matrix $\tilde {\Omega } = D[(\boldsymbol {\sigma } \cdot \boldsymbol {h})^{-1}] \Omega - I$, where $D[\cdot]$ denotes a diagonal matrix and $(\boldsymbol {\sigma } \cdot \boldsymbol {h})^{-1} = \{(h_i \sigma _i)^{-1}\}_i$.

The $2N$ eigenvalues of the system form pairs $\lambda _j^{+}$ and $\lambda _j^{-}$ with $j \in \{1,\ldots,N\}$. The two eigenvalues of a pair coincide at a real value when $\Delta _j = 0$, which occurs under the condition:

$$\theta = G_{{\pm}}(\beta,\mu_j),$$
with $\theta =1/(1 - p)$ and $G_{\pm }(\beta,\mu ) = \frac {4 \beta - \mu (\mu -2) \pm 4 \sqrt {\beta (\beta + \mu )}}{(\mu -2)^2 - 8 \beta }$.

The stability of fixed points depends on the sign of the real part of the eigenvalues of the Jacobian matrix. It can be shown that $\text {Re}[\lambda _j^{+}] = 0$ for the parameter $\theta$ given as follows ($\text {Re}[\lambda _j^{+}] \geq \text {Re}[\lambda _j^{-}]$, $\forall j$):

$$\text{Re}[\lambda_j^{+}] = 0 \Leftrightarrow \begin{cases} \theta=0 \text{ or } \theta={-}1, & \text{ if } \Delta_j>0,\\ \theta= F(\mu_j (\boldsymbol{\sigma})), & \text{ if } \Delta_j<0, \end{cases}$$
with $F(\mu ) = -\frac {\mu }{(\mu -2)}$. This means that for any given fixed point $\boldsymbol {\sigma }$, there exists some parameters for which it is stable and some parameters for which it is unstable. This is illustrated in the bifurcation diagram shown in Fig. 2. The numerical simulations for a 3-SAT problem with 20 Boolean variables and 85 clauses are shown in Figs. 3 and 4 when the parameter $\theta$ is set in a stable ($p=-0.6$) and unstable region ($p=-2.6$), respectively. Other parameters for these plots are $\beta =1.0$, $dt=0.02$.
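The eigenvalue pairing used in this derivation can be verified numerically. The sketch below is our own illustration, not the paper's code: it builds a simplified block Jacobian $J$ whose off-diagonal blocks multiply to $b I$, which is the only property the characteristic-polynomial argument uses. Each eigenvalue $\mu$ of the upper-left block then yields the pair $\lambda^{\pm} = \frac{1}{2}(\mu \pm \sqrt{\mu^2 + 4b})$, mirroring the structure of the $\lambda_j^{\pm}$ formula above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, b = 6, -1.5
A = rng.normal(size=(N, N))
A = (A + A.T) / 2                     # symmetric, so its eigenvalues mu are real
# Block Jacobian with off-diagonal blocks I and b*I (product b*I)
J = np.block([[A, np.eye(N)], [b * np.eye(N), np.zeros((N, N))]])

mu = np.linalg.eigvalsh(A)
disc = np.emath.sqrt(mu**2 + 4 * b)   # complex branch when mu^2 + 4b < 0
pred = np.concatenate([(mu + disc) / 2, (mu - disc) / 2])

got = np.linalg.eigvals(J)
assert np.allclose(np.sort_complex(pred), np.sort_complex(got))
print("2N eigenvalues match the predicted lambda pairs")
```

In the derivation above the off-diagonal blocks are diagonal matrices rather than scalar multiples of the identity, but their product is still $b\,\delta_{ij}$, so the same reduction of the $2N \times 2N$ eigenproblem to $N$ quadratics applies.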


Figure 2. Bifurcation diagram. The blue star and orange cross show the points $\{\mu ^*,1/(1-p)\}$ with $p = -0.6$ (stable case) and $p = -2.6$ (unstable case), respectively. Here $\mu ^*$ is the maximal eigenvalue of $\tilde {\Omega }$, i.e., $\mu ^* = \text {max}_j \mu _j$ for a satisfiable solution $\sigma$. The horizontal coordinate of this figure is associated with a specific fixed point/local minima for a given problem, whereas the vertical coordinate corresponds to the parameter $p$ (technically $\theta = 1/(1-p)$). The green curve indicates the value of $p$ for which a given fixed point will transition from being unstable to stable. This is the theoretical reasoning behind why modulating $p$ (or, equivalently, modulating $\theta$) over time can create an “annealing” effect.


Figure 3. Trajectory in the case of a stable satisfiable solution, $p=-0.6$.


Figure 4. Trajectory in the case of an unstable satisfiable solution, $p=-2.6$.


5. Comparison of SAT-CFC with CIM-CAC

As mentioned in Section 3, SAT-CAC and SAT-CFC are inspired by CIM-CAC, an algorithm designed to solve the Ising problem. Because the Ising problem is NP-hard, a given SAT problem can be mapped to an Ising problem instance, and CIM-CAC can then be used to solve the original SAT problem. This lets us compare SAT-CFC against the two-step approach of conversion to Ising followed by an Ising solver such as CIM-CAC. The exact conversion method used in this section is outlined in Appendix C. We have found that the SAT-CFC solver greatly outperforms the Ising solver in all cases tested.

In Fig. 5, we show the TTS (see Appendix A for details on TTS and how it is calculated) as a function of clause-to-variable ratio ($\alpha = M/N$) when the problem size is fixed at $N = 40$, and as a function of problem size when the clause-to-variable ratio is fixed at $\alpha = 3.8$.


Figure 5. (a) Median TTS of CIM-CAC and SAT-CFC on random 3-SAT with respect to clause-to-variable ratio $\alpha = M/N$ with the problem size fixed at $N=40$. (b) Median TTS of CIM-CAC and SAT-CFC on random 3-SAT with respect to problem size $N$ with the clause-to-variable ratio fixed at $\alpha = 3.8$. TTS is expressed in the units of integration time steps for both methods (or, equivalently, round trips on a physical machine).


In the range of problem instances tested, CIM-CAC shows worse performance by a factor of roughly $10^2$ to $10^3$, even after the parameters for CIM-CAC were optimized. It is worth noting that in the case of $N=40$ and $\alpha = 4.2$ the difference in performance is significantly smaller. We have not investigated this further; for the rest of this work, we primarily focus on benchmarking SAT-CFC, because we hypothesize that SAT-CFC outperforms CIM-CAC on almost all SAT instances.

6. Benchmarking of SAT Solvers on Random SAT

In this section, we compare the performance of four solvers: M-CTDS, SAT-CFC, and the two versions of CTDS from Ref. [16]. We use randomly generated 3-SAT instances as performance benchmarks and use the TTS as the performance metric, with the unit being the number of integration time steps. This unit of time is relevant to both digital and experimental implementations, since a single integration time step corresponds to a single round trip in our proposed hybrid optical–digital and all-optical implementations. Note that this metric is slightly different from the metric used to evaluate CTDS in Ref. [16]. Details of the TTS, including exactly how it is calculated and why it is used, can be found in Appendix A. The results for CTDS are obtained using the open-source code of Ref. [16].
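For orientation, a commonly used form of the TTS metric is sketched below: the expected computational effort to find a solution with 99% confidence, given restarts of fixed length that each succeed with some probability. The exact definition used in this paper is the one given in Appendix A; this sketch shows the standard $\text{TTS}_{99}$ formula.

```python
import math

def tts99(steps_per_run: float, p_success: float) -> float:
    """Standard TTS_99: expected number of steps to find a solution with 99%
    confidence, using independent runs of `steps_per_run` steps that each
    succeed with probability `p_success`."""
    if p_success <= 0.0:
        return math.inf            # never succeeds
    if p_success >= 0.99:
        return steps_per_run       # a single run already reaches 99% confidence
    return steps_per_run * math.log(1 - 0.99) / math.log(1 - p_success)

print(tts99(1000, 0.5))   # ~6644 steps: log(0.01)/log(0.5) ~ 6.64 runs of 1000 steps
```

The same formula applies whether "steps" are integration time steps, round trips, or seconds, which is why the unit can be swapped between the step-count and wall-clock benchmarks below.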

In addition to the differential solvers CTDS and SAT-CFC, we have also benchmarked against a state-of-the-art digital heuristic known as ProbSAT [43]. ProbSAT is an SLS-based solver, meaning that a single variable is updated at each step of the algorithm. Because the cost of such a step differs from that of updating all $O(N)$ soft spins simultaneously, it makes more sense to compare real wall-clock times of GPU and CPU implementations of the solvers. Although we have observed good performance for SAT-CFC on random 3-SAT, our benchmarking against ProbSAT shows that the SLS-based solver is still superior in this problem class when implemented on a digital platform. The results for ProbSAT are obtained using the code submitted to the 2018 SAT competition, which is available on GitHub [59].

In the following set of plots, we compare the performance of the differential solvers both by fixing the problem size ($N$) and varying the number of clauses ($M$), and by fixing the clause-to-variable ratio $\alpha = M/N$ and varying $N$. Thus, the TTS is plotted both as a function of the clause-to-variable ratio (Figs. 6(a) and 6(b)) and as a function of problem size (Figs. 6(c) and 6(d)). As long as $\alpha$ is smaller than the threshold value $\alpha _c = 4.267$, the majority of randomly generated 3-SAT problems are expected to be satisfiable, provided $N$ is sufficiently large [60].
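The random 3-SAT ensemble used in these benchmarks can be generated straightforwardly: each clause draws three distinct variables uniformly at random and negates each with probability 1/2. A sketch (the DIMACS-style signed-literal representation and function name are our own):

```python
import numpy as np

def random_3sat(N, M, rng):
    """Random 3-SAT instance: M clauses over N variables, each clause with
    three distinct variables and uniformly random negations."""
    clauses = []
    for _ in range(M):
        variables = rng.choice(N, size=3, replace=False) + 1   # 1-based variables
        signs = rng.choice([-1, 1], size=3)                    # random negations
        clauses.append(list(variables * signs))
    return clauses

rng = np.random.default_rng(0)
clauses = random_3sat(N=100, M=int(4.25 * 100), rng=rng)
print(len(clauses))   # 425
```

Sweeping $M = \lfloor \alpha N \rfloor$ over a grid of $\alpha$ values, as done in Figs. 6(a) and 6(b), reproduces the hardness peak near the satisfiability threshold.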


Figure 6. (a),(b) Median TTS as a function of clause-to-variable ratio $\alpha = M/N$ at problem sizes $N=100$ and $N=1000$, respectively. For each value of $\alpha$, a set of 25 randomly generated instances is created and the median is evaluated over the instances that were solved by any of the four solvers. (c),(d) Median TTS as a function of problem size $N$ with the clause-to-variable ratio fixed at $\alpha = 3.4$ and $\alpha = 3.8$, respectively. The median is evaluated over a set of 25 randomly generated instances per problem size.


6.1 Scaling of SAT Solvers on Random 3-SAT

In Fig. 6 we show the median performance of the four continuous-time SAT solvers on random 3-SAT problems with various parameters. In these figures, the solid green and red curves correspond to SAT-CFC and M-CTDS, respectively. The light blue and purple dotted curves correspond to the two versions of CTDS presented in Ref. [16], which are integrated using an adaptive Runge–Kutta integration method. In Figs. 6(a) and 6(b), the vertical gray line marks the 3-SAT phase transition at $\alpha _c = 4.267$ [60]. Note that in Fig. 6(b) the curves for all three CTDS-type algorithms do not extend to the phase transition point: in the allotted computation time, these solvers solved only a small fraction of the problems that were solved by SAT-CFC and SAT-CAC. In the range $4.2 \leq \alpha \leq 4.27$, we were not able to solve all of the generated problems using SAT-CFC. Thus, we cannot be certain whether SAT-CFC is missing the solutions of some satisfiable problems, in which case we would be estimating the median TTS incorrectly, or whether the unsolved problems are truly unsatisfiable. For this reason, it is technically more accurate to read this plot as a lower bound on the median TTS in this range of $\alpha$. Nevertheless, we believe this estimate is likely close to the true median over satisfiable instances.

In Figs. 6(c) and 6(d) we show the median TTS as a function of problem size with the clause-to-variable ratio fixed at $\alpha = 3.4$ and $\alpha = 3.8$, respectively. As $\alpha$ is below the phase transition point in both plots, the problems are very easy for all of the solvers. In particular, because the curves appear roughly linear on log–log axes, all of the solvers exhibit polynomial scaling of the TTS, similar to what was observed in Ref. [16]. Another interesting behavior observed in both Figs. 6(c) and 6(d) is a change in the scaling of SAT-CFC around $N = 10^3$. This could be partially explained by different parameters being optimal for small and large problem sizes (see Appendix B for details); however, the reason why different parameters are optimal in these different regimes is not yet understood.

6.1.1 α = 4.25

In Fig. 7(a) we show the median TTS as a function of problem size with the clause-to-variable ratio fixed at $\alpha = 4.25$. As expected, with $\alpha$ closer to the phase transition point, the TTS increases more steeply with problem size than in Figs. 6(c) and 6(d). Nevertheless, these numerical results are still compatible with polynomial scaling, at least in the case of SAT-CFC. Even near the phase transition point, where the problems are expected to be hardest, polynomial scaling of the median time complexity is not too surprising, as it has been observed for heuristic SAT solvers on 3-SAT in Refs. [16,28]. Although it is expected that 3-SAT cannot be solved in polynomial time (the P$\neq$NP conjecture), that statement refers to the worst-case time complexity; thus, our results on median time complexity have no implications for this conjecture.

 figure: Figure 7.

Figure 7. (a) Median TTS as a function of problem size $N$ with the clause-to-variable ratio fixed at $\alpha = 4.25$. The median is evaluated over a set of 25 randomly generated instances per problem size. As $\alpha = 4.25$ is very close to the satisfiability threshold, many generated instances were not satisfiable; thus, we took the median over the instances that were solved by any of the four solvers. (b) Fraction of problem instances solved by any of the solvers at $\alpha = 4.25$ (out of the 25 random instances generated). Although this fraction decreases with problem size, this is expected because the location of the phase transition point is slightly larger for smaller values of $N$ [28]. Because the fraction of solved problems stays above 0.5, the solvers solved at least the majority of the satisfiable problems. We believe it is likely that we have solved nearly all of the satisfiable problems, so the measured median TTS is close to the true median.


Although we hypothesize that the median TTS of SAT-CFC may scale polynomially even at the phase transition point, there are several reasons why it is hard to reach this conclusion from the current numerical results. First, the results in Fig. 7(a) assume that we have solved all of the satisfiable instances among the random instances generated. In Fig. 7(b) we show the fraction of instances solved as a function of problem size. As this fraction appears to level out above 50%, we take this as good evidence that the majority of satisfiable instances were solved. However, it is still possible that we are missing a significant fraction of harder instances and that the median TTS reported here is a lower bound for the larger problem sizes.

Second, it is also possible that the scaling behavior we have observed is some sort of finite-size effect, and that the true scaling is some super-polynomial function of $N$. Because of limited computational resources, our data reach only $N= 10^3$, and as observed in Figs. 6(c) and 6(d), there appears to be a qualitative difference in the scaling behavior of SAT-CFC before and after $N=10^3$. This means that we might observe a significant difference in scaling behavior if data were collected beyond this problem size. The scaling results in Refs. [28] and [16] also lack data beyond $N = 10^3$ when $\alpha$ is near the phase transition point; thus, it is possible that we are all observing the same finite-size effect.

6.1.2 Wall-Clock TTS for α = 4.25

In Fig. 8 we show the wall-clock TTS of our GPU implementations of SAT-CFC and M-CTDS, as well as the GPU implementations of CTDS from Ref. [16], against the wall-clock TTS of ProbSAT [43] (plotted in blue). ProbSAT is an SLS-based solver known to be efficient at solving random SAT instances, and it is hypothesized to exhibit polynomial scaling on random 3-SAT at the phase transition point [28]. Based on our numerical simulations, ProbSAT does appear to follow a polynomial scaling law similar to that of SAT-CFC. In addition, for many problem sizes, the performance of the GPU implementation of SAT-CFC is comparable to or faster than that of ProbSAT. Nevertheless, for large problem sizes, ProbSAT appears to be faster and may exhibit a shallower polynomial scaling law. This shallower scaling can be partially attributed to the larger cost of computing a single step of ProbSAT compared with a single time step of the differential solvers.


Figure 8. Median wall-clock TTS (seconds) for a GPU implementation of differential solvers (SAT-CFC, M-CTDS, and CTDS) and for a CPU implementation of ProbSAT. The GPU used for this plot is the Tesla V100, and ProbSAT was run on an AWS c6i instance which uses third-generation Intel Xeon Scalable processors. (Note that the time for ProbSAT is calculated for a single thread even though multiple threads can be run in parallel on the instance.)


It is important to note that this plot compares a GPU implementation with a CPU implementation. This is not a fair comparison, because the GPU has many more components and requires more power to operate. However, we expect that in the future SAT-CFC can be implemented on a hybrid optical–digital solver or with the all-optical architecture proposed in this paper; the low power consumption and high speed of these devices could allow SAT-CFC to be competitive against ProbSAT.

6.1.3 Instance-Wise Comparison Between SAT-CFC and ProbSAT at N = 1000 α = 4.25

In Fig. 9 we show the TTS of both SAT-CFC and ProbSAT for a set of 50 randomly generated SAT problems at problem size $N=1000$ and clause-to-variable ratio $\alpha = 4.25$. Instances on the dotted gray lines were not solved by the given solver (in other words, $10^{11}$ is the “default” TTS). In this figure, the unit of TTS is computational steps, not real time (integration time steps for SAT-CFC and spin updates for ProbSAT). Although these TTS values happen to be comparable at this problem size despite the different units, this is a coincidence and will not hold at other problem sizes. In fact, it indicates that ProbSAT will be faster, because a single ProbSAT step can be computed much more quickly.


Figure 9. Scatter plot showing TTS for both SAT-CFC (vertical axis) and ProbSAT (horizontal axis) for 50 randomly generated 3-SAT instances at problem size $N=1000$ with $\alpha = 4.25$. Unsolved instances are plotted on the dotted gray lines (with a default TTS of $10^{11}$). The units for TTS are integration time steps for SAT-CFC and spin flips for ProbSAT.


Although ProbSAT outperforms SAT-CFC as a digital solver on these problem instances, several features of this figure are worth noting. In particular, the TTS has a similar distribution for both solvers, which means that the hardest instances are similarly hard for the two solvers relative to the median and easiest instances. Although SAT-CFC did not solve six of the instances that ProbSAT solved, we expect this number to drop to zero if SAT-CFC is given more computation time. Beyond the similar distributions, the instance hardness with respect to each solver is also highly correlated. Similar behavior has been observed for the Ising solvers in Ref. [23], which most likely indicates that instance hardness is an inherent property of the problem and remains similar regardless of the heuristic method used to solve it.

6.1.4 Performance on 2018 SAT Competition Problems (Random Track)

In Table 4 we present a comparison of SAT-CFC against solvers submitted to the random track of the 2018 SAT competition. We tested SAT-CFC on some of the benchmark problems used for the competition and compared our results with those reported in the competition. Because we use a GPU implementation of SAT-CFC rather than the CPUs used in the competition, we gave SAT-CFC 500 seconds instead of 5000 to solve each problem. (This accounts for the fact that the GPU used, the Nvidia Tesla V100, has about 10 times the power consumption of a typical CPU: 200 W versus 20 W.)


Table 4. Performance of Solvers on Random 5-SAT Problems from the 2018 SAT Competition

Our results indicate that SAT-CFC implemented on a digital platform (GPU) might be somewhat comparable to existing digital heuristics on hard (random) competition problems. However, these results are fairly limited, since we used only one type of benchmark instance (random 5-SAT). This is partially due to another drawback of SAT-CFC: in its current form, some time must be spent optimizing the parameters ($p$ and $\beta$ in particular) to obtain good performance. Achieving good performance on all instance types would require an automated rule for guessing good parameters based on the input instance; this is beyond the scope of this paper and may be explored in future work.

6.1.5 N = 50 Max-SAT

As mentioned in Section 4, we believe that SAT-CFC can also function as a heuristic solver for Max-SAT. To test this hypothesis, we computed the TTS (the time to find the maximum number of satisfied clauses) of SAT-CFC on a set of random 3-SAT problems with $N=50$ variables and different clause-to-variable ratios. To confirm the maximum number of satisfied clauses, we used a complete Max-SAT solver found in Ref. [62].

In Fig. 10 we show the median, 75th percentile, and 90th percentile TTS of SAT-CFC on randomly generated 3-SAT as a function of the clause-to-variable ratio $\alpha$. For each value of $\alpha$ we used 100 instances. SAT-CFC found the same Max-SAT value as the complete solver on all instances. Figure 10 shows that, at this problem size, SAT-CFC solves most instances in relatively few ($\sim$100) time steps. For the hardest problems, however, especially around the phase transition point, some instances take orders of magnitude longer to solve. Interestingly, this observation contrasts with the results of CIM-CAC on random Ising problems (the Sherrington–Kirkpatrick model) in Refs. [22,23], in which the spread of TTS is much smaller. This difference is most likely attributable to the difference in structure between random 3-SAT instances and random Ising instances (which can be thought of as random 2-SAT), or it could be a finite-size effect visible only at problem size $N=50$.


Figure 10. Median, 75th percentile, and 90th percentile TTS of SAT-CFC on Max-SAT at problem size $N=50$. The TTS is measured in the units of integration time steps.


Another interesting observation is that, beyond a certain point, the TTS neither increases nor decreases as $\alpha$ is increased. For the complete solver, in contrast, we found that the time required to solve these problems grows rapidly with $\alpha$ (data not shown here); in fact, we were unable to solve many problems beyond $\alpha = 8$ with the complete solver. Figure 10 implies, however, that the difficulty of finding a maximally satisfying configuration for a heuristic solver such as SAT-CFC does not increase. The complete solver, on the other hand, must exhaustively check many solutions in order to certify optimality, and the size of this exhaustive search grows with $\alpha$.

Because the complete solver struggles even for moderate problem sizes, we only studied problem size $N=50$ in the case of Max-SAT. However, in the future, we expect to benchmark SAT-CFC against Max-SAT competition problems to gain a better sense of its computational performance on Max-SAT.

7. Tutorial on DOPO Network and CIM

In this section, we give a brief introduction to the DOPO and how networks of DOPOs can be used to perform computations. The concept of a DOPO network is central to the discussion on the physical implementations of CSSs presented in the next section (Section 8).

7.1 Pitchfork Bifurcation of the DOPO

Figure 11 shows a thin-film lithium niobate (LiNbO$_3$) device in which a periodically poled LiNbO$_3$ active waveguide is connected to a racetrack LiNbO$_3$ passive waveguide to form a ring-cavity DOPO [24]. When a pump pulse train is injected into the cavity, a parametric downconversion process occurs from the pump at frequency $2\omega$ to the signal at frequency $\omega$. This process generates a squeezed vacuum state, shown in Fig. 12(a), which is a superposition of canonical coordinate eigenstates $\lvert X\rangle$. As the pump intensity increases from below to above the oscillation threshold, the stretched quantum noise along the $x$ direction splits into either a 0-phase or a $\pi$-phase coherent state, as shown in Fig. 12(b). A CIM is a network of coupled DOPOs that starts below threshold and ends above threshold. Once the machine is above threshold, the nearly orthogonal states (0 and $\pi$ phases) represent the Ising spin-up and spin-down states. Figure 12(c) shows the experimental pitchfork bifurcation of $N=2000$ coupled DOPOs [8].
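The threshold behavior described above can be reproduced from the normalized DOPO amplitude equation $\partial_t x = (p-1)x - x^3$ that appears later in Section 8.4.1. The following Python sketch (illustrative only; the seed amplitude, step size, and step count are arbitrary choices, not values from the experiments) integrates this equation and recovers the pitchfork: the signal decays to zero below threshold and settles onto one branch, $x = \pm\sqrt{p-1}$, above it.

```python
def dopo_steady_state(p, x0=1e-3, dt=0.01, steps=20000):
    """Euler-integrate dx/dt = (p - 1) x - x^3 from a small seed amplitude x0."""
    x = x0
    for _ in range(steps):
        x += dt * ((p - 1.0) * x - x ** 3)
    return x

# Below threshold (p < 1) the signal decays to x = 0; above threshold it
# settles onto one branch of the pitchfork, x = +/- sqrt(p - 1), with the
# sign selected by the sign of the initial fluctuation.
below = dopo_steady_state(0.5)
above = dopo_steady_state(2.0)
```

The sign of the tiny seed `x0` plays the role of the amplified quantum noise: a negative seed selects the $\pi$-phase branch instead.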


Figure 11. DOPO device implemented using a thin-film lithium niobate (LiNbO$_3$) waveguide [24].



Figure 12. (a) Squeezed vacuum state in a DOPO device below threshold. (b) Two nearly orthogonal coherent states with 0 and $\pi$ phase [63]. (c) Pitchfork bifurcation of 2000 DOPOs [8].


7.2 Mutual Coupling of DOPOs

In order to construct a CIM, we must optically couple the DOPOs according to the Ising coupling matrix $J_{ij}$. This means we need to create a coupling field to the $i$th DOPO from all of the other DOPOs according to the formula $f_i(t) = \sum _{j} J_{ij} x_j(t)$. Two experimental schemes are available to achieve this goal: the optical delay line coupling scheme shown in Fig. 13(a) [6,9] and the measurement feedback coupling scheme shown in Fig. 13(b) [7,8]. The optical delay line scheme can realize high-speed and low-power operation of the CIM; however, phase stabilization of the $N-1$ optical delay lines is challenging for both free-space and fiber-optic approaches [6,9]. The measurement feedback scheme realizes all-to-all coupling with a single digital circuit consisting of an analog-to-digital converter (ADC), an FPGA, and a digital-to-analog converter (DAC), which eliminates the notorious phase stabilization problem of the optical delay line scheme. As described in Ref. [34], the externally pumped DOPO network can modulate the oscillator amplitudes of its constituent nodes, and in this way a non-solution state can reach a lower threshold pump rate than that of the true ground state. This particular mode of oscillation turns out to be a principal eigenvector of the Ising coupling matrix $J_{ij}$, which means the CIM can find the ground state only if it is sufficiently close to the principal eigenvector. This fundamental disadvantage of the CIM can be overcome by adding error correction feedback that keeps the amplitudes of all DOPOs close to a universal target amplitude [22].


Figure 13. (a) Optical delay line coupled CIM. (b) Measurement feedback coupled CIM [63].


8. Physical Implementations of Coherent SAT Solvers

In this section, we outline hybrid optical–digital and all-optical approaches to building a fast and energy-efficient integrator for the SAT-CFC and SAT-CAC equations. In all proposed implementations the device iteratively updates the spin variables $x_i$ and error variables $e_i$ using an Euler integration step. These variables are either stored digitally in the memory of an FPGA or optically as the amplitude of coherent pulses in an optical cavity. The key difficulty of the integration step is the calculation of the coupling term $z_i$ which is defined as

$$z_i(t) = \sum_j K_{ij}(t),$$
with
$${} K_{ij} = C_{ij}\prod_{k \in \mathcal{I}_{j}, k \neq i} \frac{1 - C_{kj}x_k}{2}.$$
(Note that $z_i e_i = f_i$ from Eq. (18).) Because the other terms in the SAT-CFC and SAT-CAC equations (Eqs. (19), (20), (16), and (17)) are all componentwise, they can be computed efficiently in parallel. This section focuses mainly on how $z_i$ can be computed efficiently on the various platforms. We mainly consider random 3-SAT near the phase transition point $\alpha = 4.25$, although the analysis presented can likely be generalized. For each implementation we estimate the energy and time required to compute a single Euler integration step, and we compare these estimates with the existing GPU implementation used to generate the results in this paper.
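As a reference point for the implementations discussed below, the coupling term can be evaluated directly from Eqs. (30) and (31). The following Python sketch (a naive dense evaluation for illustration; the matrix-based clause representation is our own convention, not part of the hardware proposal) computes $z_i$ from the sign matrix $C$, where $C_{ij} = \pm 1$ if literal $i$ appears unnegated/negated in clause $j$ and $0$ otherwise.

```python
def feedback_dense(C, x):
    """Naive evaluation of z_i = sum_j K_ij, with
    K_ij = C_ij * prod_{k in clause j, k != i} (1 - C_kj x_k) / 2.
    C[i][j] = +/-1 if literal i appears (un)negated in clause j, else 0."""
    N, M = len(C), len(C[0])
    z = [0.0] * N
    for i in range(N):
        for j in range(M):
            if C[i][j] == 0:
                continue  # literal i does not appear in clause j
            prod = 1.0
            for k in range(N):
                if k != i and C[k][j] != 0:
                    prod *= 0.5 * (1.0 - C[k][j] * x[k])
            z[i] += C[i][j] * prod
    return z
```

The inner scan over all $N$ rows makes the naive per-element cost discussed in Section 8.1 explicit; the sparse reformulation given there removes it.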

8.1 Algorithmic Complexity

A key consideration for implementations of the CSS is that the computation of $z_i$ has very low complexity, essentially $O(N)$, for most $k$-SAT problems of practical interest. At first glance, the summation over $j \in [1,M]$ in Eq. (30) together with the product over $k \in [1,N], k \neq i$ in Eq. (31) appears to require $M(N-1)$ iterations and/or parallel operations. Thus, the naïve cost of evaluating Eq. (30) once (i.e., computing a single element of $z_i$) appears to scale as $\alpha N^2 = O(N^2)$ if the ratio of clauses to literals $\alpha = M/N$ is fixed; because this must be done $N$ times to obtain all the elements of $z_i$, the feedback term appears to bottleneck the algorithm at $O(N^3)$, which is even worse than in the Ising case.

However, for each fixed $j$, there are only $K-1$ other literals in clause $j$ by construction (i.e., for a given $j$, $C_{kj}$ is non-zero for only $K-1$ values of $k \in [1,N], k \neq i$). If we process Eq. (30) by iterating over $j$ rather than $i$ first (i.e., if we have random or out-of-order access to incrementally update $z_i$), then the sum-over-products in fact consists of exactly $KM$ products, each with $K-1$ factors. Thus, in principle, the cost of calculating the entire set of $z_i$ (for all $i$) is $K-2$ pairwise product reductions followed by $KM$ pairwise sum reductions, for a total of $K(K-2)M = \alpha K(K-2) N = O(N)$ operations. However, reaching this optimal cost requires random access to the elements of the feedback vector $z$ in order to perform the sum reductions, which can be difficult in a physical scenario where the elements $z_i$ are calculated serially (as in the schemes we consider in the following).

If we want to stick with the form suggested by Eq. (30) (i.e., retaining index $i$ as the slow index), we can take the following approach instead, which also draws on the idea that not every clause $j$ contains each literal $i$. For fixed $K$ and $\alpha$, the largest number of clauses $M'$ in which any one literal appears is a random variable with a well-defined distribution. For example at $K=3$ and $\alpha = 4.25$, the distribution of $M'$ is sharply peaked around $\langle M' \rangle \approx 12 + 7 \log \log N$, as shown in Fig. 14. Thus, taken together, the complexity of calculating the most expensive element of $z_i$ is, on average, $\langle M' \rangle (K-1) = O(\log \log N)$: essentially constant. Thus, the total cost of computing all elements $z_i$ for the feedback step is upper-bounded by $O(N\log \log N)$, which is also nearly linear. These considerations mean that any practical implementation of CSS should not attempt to implement (30) directly, but rather a more sparse representation that bypasses the vast majority of elements $C_{ij}$ which are zero, i.e.,

$$z_i = \sum_{j \in \mathcal{M}_i} C_{ij} \prod_{k \in \mathcal{I}_j \neq i} {\textstyle\frac12} \big(1 - C_{kj} x_k\big),$$
where $\mathcal {M}_i$ contains the indices of the clauses in which literal $i$ appears, and $\mathcal {I}_j$ contains the indices of the literals appearing in clause $j$. In this formulation, there are on average $|\mathcal {M}_i| = \langle M'\rangle = O(\log \log N)$ terms in the summation and $K-1$ terms in the product, as desired.
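In software, this sparse form is naturally evaluated from a clause-list representation rather than the dense $C$ matrix. The Python sketch below (our own illustrative encoding: each clause is a tuple of signed, 1-based literal indices, e.g., `(1, -2, 5)` for $\ell_1 \vee \overline{\ell_2} \vee \ell_5$) implements Eq. (32), with cost proportional to $K^2 M$ rather than $M N(N-1)$.

```python
def feedback_sparse(clauses, x):
    """Sparse evaluation of Eq. (32): iterate over clauses (the sets M_i and
    I_j are implicit in the clause list), touching only non-zero C entries."""
    z = [0.0] * len(x)
    for clause in clauses:
        for a, lit in enumerate(clause):
            i = abs(lit) - 1                    # 0-based variable index
            c_i = 1.0 if lit > 0 else -1.0      # sign C_ij
            prod = 1.0
            for b, other in enumerate(clause):
                if b == a:
                    continue                    # skip k = i
                k = abs(other) - 1
                c_k = 1.0 if other > 0 else -1.0
                prod *= 0.5 * (1.0 - c_k * x[k])
            z[i] += c_i * prod
    return z
```

Each clause contributes to exactly $K$ components of $z$, so the work per integration step is linear in $M$ (and hence in $N$ at fixed $\alpha$).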


Figure 14. (a) Probability distribution of highest occurrences $M'$ of any variable as a function of $N$ (for $\alpha = M/N = 4.25$ and $k = 3$). Solid line: the scaling of $\langle M'\rangle$. Dashed lines: the scaling of the minimum and maximum $M'$ observed in 1000 instances. Note that the fit indicates that $\langle M'\rangle \approx 12 + 7 \log \log N$. (b) Histogram of probability distribution for $M'$ at $N = 1000$. The distribution of $M'$ is strongly peaked around $\langle M'\rangle$, and even a cutoff of $M'_\textrm {max} = 40$ provides sufficient memory for the optical encoding of any realistic problem instance.

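The distribution of $M'$ is easy to probe numerically. The following Python sketch (a Monte Carlo draw under the standard random $K$-SAT model with distinct variables per clause; the parameter choices are ours) generates an instance and returns the largest number of clauses in which any single variable appears.

```python
import random
from collections import Counter

def max_occurrence(N, M, K=3, rng=random):
    """Draw a random K-SAT instance (M clauses over N variables, distinct
    variables within each clause) and return M', the largest number of
    clauses any one variable appears in."""
    counts = Counter()
    for _ in range(M):
        for v in rng.sample(range(N), K):
            counts[v] += 1
    return max(counts.values())

# By pigeonhole, M' >= ceil(K * M / N); at K = 3 and alpha = 4.25 this floor
# is 13, consistent with the <M'> ~ 12 + 7 log log N fit shown in Fig. 14.
m_prime = max_occurrence(1000, 4250)
```

Repeating the draw many times reproduces the sharply peaked histogram of Fig. 14(b).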

8.2 Physical Encodings

For the remainder of this section, unless otherwise stated, we focus on the $K=3$ case (i.e., 3-SAT) for simplicity. For $K=3$, the feedback function can be reduced to a generic form

$$z_i = \sum_{j'} \frac{\tilde{C}_{ij'}}{4} \Bigl(1 - \tilde{C}^{(1)}_{ij'}\tilde{x}^{(1)}_{ij'}\Bigr) \Bigl(1 - \tilde{C}^{(2)}_{ij'}\tilde{x}^{(2)}_{ij'}\Bigr),$$
which we explain as follows. Following the preceding discussion, we have introduced a single uniform cutoff $M'_\textrm {max}$ for the summation over clauses, where $M'_\textrm {max} \leq M$ is chosen such that the probability of $M' > M'_\textrm {max}$ is acceptably small; the suboptimality of this choice is evidently only a factor of $O(\log \log N)$. The tildes denote new matrices that are distinct from, but constructed from, the underlying $C$ matrix and $x$ vector. First, $\tilde C_{ij'}$ denotes the $j'$th non-zero element in the $i$th row of $C$; if the literal $i$ appears $|\mathcal {M}_i| < M'_\textrm {max}$ times, then $\tilde C_{ij'}$ is zero for $j' > |\mathcal {M}_i|$. Then, because each clause consists of exactly three literals, $\tilde {x}^{(1)}_{ij'}$ and $\tilde {x}^{(2)}_{ij'}$ refer to the other two variables ($i$ excluded) that occur in the clause referred to by $(i,j')$, and $\tilde {C}^{(1)}_{ij'}$ and $\tilde {C}^{(2)}_{ij'}$ are the corresponding $C$-matrix elements for those two variables. This somewhat clunky notation leads to a clean time-multiplexed encoding that still leverages the sparsity of the problem: the calculation consists of looping over three indices, $i$ ($N$ times), $j'$ ($M'_\textrm {max}$ times), and the superscript (2 times). The total physical cost of this encoding is thus $N \times M'_\textrm {max} \times 2 = O(N\log \log N)$, which is optimal aside from the constant-factor overhead of making $M'_\textrm {max}$ uniform for each $i$.

To make this discussion more concrete, consider the following example problem:

$$(\ell_1 \vee \overline{\ell_2} \vee \ell_5)\wedge (\ell_2 \vee \ell_3 \vee \overline{\ell_4})\wedge (\overline{\ell_1} \vee \ell_4 \vee \ell_6)\wedge (\ell_3 \vee \ell_4 \vee \ell_5).$$
The feedback to $x_4$ is given by $z_4(t) = K_{42} + K_{43} + K_{44}$, where $K_{42} = -{\textstyle \frac 14}(1-x_2)(1-x_3)$, $K_{43} = {\textstyle \frac 14}(1+x_1)(1-x_6)$, and $K_{44} = {\textstyle \frac 14}(1-x_3)(1-x_5)$. Even though there are $M = 4$ clauses and $N = 6$ literals, the largest number of times any literal appears is $M' = 3$ (attained by $\ell _4$). Thus, for the sake of illustration, we can fix $M'_\textrm {max} = 3$ and generate a sparse encoding using $6 \times 3$ matrices as follows:
$$\tilde{C} = \begin{pmatrix} +1 & -1 & 0 \\ -1 & + 1 & 0 \\ +1 & +1 & 0 \\ -1 & +1 & +1 \\ +1 & +1 & 0 \\ +1 & 0 & 0 \end{pmatrix}, \quad \tilde{C}^{(1)} = \begin{pmatrix} -1 & +1 & 0 \\ +1 & +1 & 0 \\ +1 & +1 & 0 \\ +1 & -1 & +1 \\ +1 & +1 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad \tilde{C}^{(2)} = \begin{pmatrix} +1 & +1 & 0 \\ +1 & -1 & 0 \\ -1 & +1 & 0 \\ +1 & +1 & +1 \\ -1 & +1 & 0 \\ +1 & 0 & 0 \end{pmatrix},$$
where the remaining zeros are simply end-padding to make the encoding uniform (along the columns). The trick with the sparse physical encoding is to have two physical variables, $\tilde {x}^{(1)}_{ij'}$ and $\tilde {x}^{(2)}_{ij'}$ (e.g., two optical pulses), for each entry of these matrices, which are redundantly mapped to the computational variables $x_i$, as follows:
$$ \tilde{x}^{(1)} = \begin{pmatrix} x_2 & x_4 & 0 \\ x_1 & x_3 & 0 \\ x_2 & x_4 & 0 \\ x_2 & x_1 & x_3 \\ x_1 & x_3 & 0 \\ x_1 & 0 & 0 \end{pmatrix}, \quad \tilde{x}^{(2)} = \begin{pmatrix} x_5 & x_6 & 0 \\ x_5 & x_4 & 0 \\ x_4 & x_5 & 0 \\ x_3 & x_6 & x_5 \\ x_2 & x_4 & 0 \\ x_4 & 0 & 0 \end{pmatrix}. $$
Thus, this example problem uses $N \times M'_\textrm {max} \times 2 = 36$ physical degrees of freedom in this encoding, which is more than $N = 6$ but still scales nearly linearly in $N$ asymptotically. Physically, we can imagine one time-multiplexed train of pulses representing $\tilde x^{(1)}$ and another representing $\tilde x^{(2)}$, in which case the time cost of calculating all the feedback terms becomes just $N \times M'_\textrm {max} = 18$ rather than the naïve $N \times M \times N = 144$. If we additionally parallelize across the rows of these matrices using $N$-fold spatial multiplexing (an easy prospect in this encoding, as we show later), we can even achieve $O(1)$ time cost (in this case, a cost of three to iterate through the columns) at the expense of $O(N)$ spatial channels.
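The construction above can be made mechanical. The Python sketch below (an illustrative software analog of the time-multiplexed encoding; the data layout and function names are our own) builds the $\tilde C$ row structure together with the partner signs and variable indices from a clause list, then evaluates Eq. (33). For the four-clause example, the encoded evaluation reproduces the direct feedback $z_4 = K_{42} + K_{43} + K_{44}$.

```python
def build_encoding(clauses, n_vars, m_max):
    """Construct the tilde matrices of Section 8.2 (3-SAT only).  Assumes no
    variable appears more than m_max times.  Returns C~ as C_t[i][j'] and,
    for the two partner slots, the sign/variable-index pairs."""
    C_t = [[0] * m_max for _ in range(n_vars)]
    partners = [[None] * m_max for _ in range(n_vars)]
    fill = [0] * n_vars                     # next free column j' per variable
    for clause in clauses:                  # signed 1-based literals
        for a in range(3):
            i = abs(clause[a]) - 1
            jp = fill[i]
            fill[i] += 1
            C_t[i][jp] = 1 if clause[a] > 0 else -1
            others = [clause[b] for b in range(3) if b != a]
            partners[i][jp] = tuple(
                (1 if o > 0 else -1, abs(o) - 1) for o in others)
    return C_t, partners

def feedback_encoded(C_t, partners, x):
    """Evaluate z_i = sum_j' (C~_ij'/4)(1 - C~(1) x~(1))(1 - C~(2) x~(2))."""
    z = []
    for row_c, row_p in zip(C_t, partners):
        zi = 0.0
        for c, p in zip(row_c, row_p):
            if c == 0:
                continue                    # zero padding
            (c1, k1), (c2, k2) = p
            zi += 0.25 * c * (1 - c1 * x[k1]) * (1 - c2 * x[k2])
        z.append(zi)
    return z
```

The two entries of each `partners` slot play the role of the two pulse trains $\tilde x^{(1)}$ and $\tilde x^{(2)}$; iterating over `(i, j')` mirrors the time-multiplexed schedule.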

8.3 Analog–Digital Hybrid Approach

The limiting factor in digital implementations of the dynamics is expected to be the time and energy cost of scalar arithmetic. A hybrid analog–digital approach may alleviate such bottlenecks by replacing some of the digital arithmetic with operations on analog components and signals, which can decrease the power or time required relative to the all-digital version. At a high level, the CSS involves repeated iterations of the following:

  • (1) continuous-variable arithmetic in the form of both local operations such as $x_i^3$, as well as non-local operations such as the summation over $j$;
  • (2) memory and communication to store, read, and write (including feedback) the dynamical variables $x_i$ and $e_i$, as well as the problem statement $C_{ij}$.
We consider and evaluate a few design patterns in the following that address these needs using analog approaches, involving both electronic and optical signals. We focus primarily on time-domain-multiplexed schemes in order to explore the potential of high-clock-rate systems with minimal requirements on spatial hardware homogeneity, with spatial or wavelength resources used only for higher-level (channel) parallelization.

8.3.1 Analog Multiply–Accumulate with Digital Feedforward

Our proposed approach for calculating the feedback terms is shown in Fig. 15. We begin by noting that the feedback terms in Eq. (33) can be written, for each $i$, as a sum over triple-wise products of elements from three vectors, $z_i=\sum _{j'} \tilde {C}_{ij'}s^{(1)}_{ij'}s^{(2)}_{ij'}$, where the coupling terms are given by $s^{(k)}_{ij'} = (1-\tilde {C}^{(k)}_{ij'}\tilde {x}^{(k)}_{ij'})/2$. We start with an optical time-multiplexed architecture, with one optical channel processing both $i \in [0, N-1]$ and $j' \in [0, M'_\textrm {max}-1]$ sequentially (but with two spatial rails for $s^{(1)}_{ij'}$ and $s^{(2)}_{ij'}$) with a clock period $T_\textrm {clock}$. (Note that for convenience we have also shifted to zero-indexing.) The simplest way to optically generate the coupling subterms is to directly modulate $s_{ij'}^{(1)}$ and $s_{ij'}^{(2)}$ onto two copies of a laser with a pulse repetition period of $T_{\mathrm{clock}}$ (Fig. 15(a)). Since the coupling terms are real numbers, this approach requires only two amplitude modulators (AMs), one for each copy of the input laser, plus one phase modulator to implement $\tilde C_{ij'}$. At the start of each iteration of the algorithm, the subterms are generated by an FPGA and modulated onto the optical signals by appropriately driving the modulators.


Figure 15. (a) Calculation of the mutual coupling terms $K_{ij'}$ using a coherent receiver. (b) Summation over the index $j'$ is performed using an analog integrator, with the resulting voltage digitized and buffered for the FPGA. (c) High-level circuit layout for the signal processing in the supporting FPGA to update $x_i$ and $e_i$ and feed forward into the optical circuit.


The triple-wise inner product can be evaluated using balanced homodyne detection. In time-multiplexed systems, it is known that the multiply-and-accumulate (MAC) operation can be performed using a balanced homodyne receiver followed by an electronic low-pass filter (Fig. 15) [36]. Here we extend this approach to a triple-wise product in which one of the vectors ($\tilde {C}_{ij'}$ for fixed $i$) is constant between successive iterations. For a balanced homodyne receiver with a response time much faster than $T_{\mathrm{clock}}$, the photocurrent detected by the receiver given the two input streams $s^{(1)}_{ij'}$ and $s^{(2)}_{ij'}$ is

$$I\bigl(t = (i M'_\textrm{max} + j') T_\mathrm{clock}\bigr) = 2 s_{ij'}^{(1)} s_{ij'}^{(2)}\sin(\phi),$$
where $\phi$ is the relative optical phase between the two inputs. A push–pull modulator can be used to control this relative phase and therefore implement $\tilde {C}_{ij'}$, in which case we realize
$$I\bigl(t = (i M'_\textrm{max} + j') T_\mathrm{clock}\bigr) = \tilde{K}_{ij'}.$$
Summation over the index $j'$ is performed by integrating this photocurrent with a transimpedance amplifier over a time $M'_\textrm {max} T_{\mathrm{clock}}$. In practice, this relatively long integration time has the advantage of allowing the use of slow (e.g., 1 GHz) receivers, with similarly low bandwidth requirements on the output electronics; more precisely, the output varies on time scales of $M'_\textrm {max} T_\textrm {clock}$.
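A simple numerical model clarifies how the receiver-plus-integrator pair performs the MAC. In the sketch below (an idealized, noiseless model with our own sign conventions), each clock slot contributes a photocurrent $2 s^{(1)} s^{(2)} \sin\phi$, with $\phi = \pm\pi/2$ encoding $\tilde C_{ij'} = \pm 1$, and the integrator sums the slots; for $s^{(k)}_{ij'} = (1 - \tilde C^{(k)}_{ij'}\tilde x^{(k)}_{ij'})/2$ the integrated current is proportional to $z_i$.

```python
import math

def homodyne_mac(c_row, s1_row, s2_row):
    """Model one integration window of the balanced homodyne MAC: each clock
    slot j' yields photocurrent I = 2 * s1 * s2 * sin(phi), where the
    push-pull phase phi = +/- pi/2 encodes C~_ij' = +/-1; the transimpedance
    integrator sums the slots, giving a signal proportional to z_i."""
    total = 0.0
    for c, s1, s2 in zip(c_row, s1_row, s2_row):
        phi = math.copysign(math.pi / 2, c) if c != 0 else 0.0
        total += 2.0 * s1 * s2 * math.sin(phi)
    return total
```

Because $\sin(\pm\pi/2) = \pm 1$, the phase setting contributes exactly the sign $\tilde C_{ij'}$, so the accumulated photocurrent equals $2\sum_{j'}\tilde C_{ij'}s^{(1)}_{ij'}s^{(2)}_{ij'} = 2 z_i$ in this idealized model.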

After proper sampling and digitization by the ADC in Fig. 15(b), the output provides a time-multiplexed readout of the feedback terms $z_i$, which can then be processed using only local operations in an FPGA to realize Eqs. (19) and (20). Figure 15(c) shows a high-level circuit for how these local operations might be performed on an FPGA with suitable ADC and DAC units. The local nature of the calculations greatly simplifies the digital circuitry needed in this hybrid system, thereby decreasing the energy expenditure of the inefficient electronic circuits in the FPGA compared with an all-digital implementation. In exchange, however, some care must be taken with regard to the latency introduced by the ADC/FPGA/DAC chain. For example, in Fig. 15(c), 14 clock cycles of the FPGA are needed for the information to travel from ADC to DAC, which amounts to 28 ns at a 500 MHz clock. This sets a lower bound on the duration of a single iteration of the CSS algorithm, which can become the bottleneck if all other rates (such as the optical clock or the modulator bandwidth) are sufficiently fast. The total system latency and electronic power consumption are given in Table 5.


Table 5. Electronic Latency and Power Consumption for the Hybrid Optoelectronic Scheme

Finally, we note that one could, in principle, bypass the modulators and balanced homodyne receivers and instead use analog electronic multipliers to generate the mutual coupling terms $K_{ij'}$, which may then be integrated directly to find the feedback terms $z_i$. In practice, however, electro-optic modulators can have greater bandwidth than their electronic counterparts based on mixers and amplifiers, whose broadband performance is limited by impedance mismatch and loss. Efficient electro-optic modulators have been demonstrated with a 1 V $V_\pi$ and 3-dB bandwidths greater than 100 GHz [64]. Commercially available photodiodes can also have bandwidths greater than 100 GHz (II-VI BPDV412), but our scheme does not require such high bandwidth, because a photodiode response slower than the optical clock rate performs the integration necessary for the MAC operation.

8.3.2 Parallelization

Feedforward analog approaches are easily parallelized to reduce both the TTS and the ETS. In this case, we define a channel as a separate pair of optical ports fed into a homodyne receiver (Fig. 16). Each channel comprises (i) a fan-out from an input laser, (ii) two amplitude modulators to generate the coupling terms, (iii) a push–pull phase modulator to implement $\tilde{C}_{ij'}$, and (iv) a homodyne receiver to perform a MAC.


Figure 16. (a) Parallelism may be achieved by fanning out the input laser into multiple channels. In this case, each channel contains two amplitude modulators, a push–pull phase modulator, and a balanced homodyne receiver. For the case of $N_\mathrm {ch}=2$, we may choose either $i\in \left [1,N/2\right ]$ and $h\in \left [N/2 + 1, N\right ]$ or $i \in Z_e$ and $h \in Z_o$, where $Z_e$ and $Z_o$ refer to the even and odd integers, respectively.


For $N_\mathrm {ch}$ channels, the total power consumption of the system is given by

$$P = N_\mathrm{ch} (P_\mathrm{opt} + 3 P_\mathrm{mod}) + P_\mathrm{elec},$$
where $P_\mathrm {opt}$ is the optical power necessary to achieve high-fidelity readout in the homodyne receiver, $P_\mathrm {mod}$ is the power required to operate a modulator, and $P_\mathrm {elec}$ is the power required for the electronics. Here $P_\mathrm {elec}$ is a weak function of $N_\mathrm {ch}$, and is assumed to be constant for simplicity. The time required to perform one iteration of SAT-CFC is given by
$$T_\mathrm{iteration} = T_\mathrm{elec} + T_\mathrm{clock}M'_{\textrm{max}}N/N_\mathrm{ch},$$
where $T_\mathrm {elec}$ is the total latency of the electronics. The second term in Eq. (36) is the total time required to integrate all of the mutual coupling terms.

The energy required to perform one iteration of the SAT-CFC algorithm is given by the product of Eq. (35) and Eq. (36). Grouping terms by powers of $N_\mathrm {ch}$, we have

$$\begin{aligned} E_\mathrm{iteration} &= N_\mathrm{ch}P_\mathrm{opt}T_\mathrm{elec} + P_\mathrm{elec}T_\mathrm{elec}\\ &\qquad+(P_\mathrm{opt} + 3 P_\mathrm{mod}) T_\mathrm{clock}M'_{\textrm{max}}N + P_\mathrm{elec}T_\mathrm{clock}M'_{\textrm{max}}N/N_\mathrm{ch}, \end{aligned}$$
where we have assumed that $P_\mathrm {mod}T_\mathrm {elec}$ may be set to zero because the modulators only consume power when receiving an electronic signal. The four terms in Eq. (37) have a simple physical meaning. The term that grows linearly with $N_\mathrm {ch}$ corresponds to the optical power consumed during the relatively long computation time of the electronics. The constant terms correspond to the energy required to integrate the optical signals and perform the electronic operations. The term that decays as $1/N_\mathrm {ch}$ corresponds to the power consumed by the electronics while the receivers are integrating the coupling terms. Equation (37) exhibits a minimum with respect to $N_\mathrm {ch}$ when
$$N_\mathrm{ch,opt}^2 = \frac{P_\mathrm{elec}T_\mathrm{clock}}{P_\mathrm{opt}T_\mathrm{elec}}M'_{\textrm{max}}N.$$
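The trade-off in Eq. (37) is easy to explore numerically. The sketch below implements Eqs. (37) and (38); the component values are placeholder assumptions for illustration only (the actual numbers are in Table 6 and yield $N_\mathrm{ch,opt}=443$), so the optimum computed here differs from the paper's value.

```python
import math

# Placeholder component values -- assumptions for illustration only; the
# paper's numbers are listed in Table 6.
P_OPT, P_MOD, P_ELEC = 1e-3, 1e-2, 10.0   # watts
T_CLOCK, T_ELEC = 1e-10, 3e-8             # seconds
N, M_MAX = 1000, 40

def iteration_energy(n_ch):
    """Energy per iteration, Eq. (37): optical power during electronic
    latency + electronic idle energy + integration-window terms, with the
    P_mod * T_elec cross term dropped as in the text."""
    return (n_ch * P_OPT * T_ELEC
            + P_ELEC * T_ELEC
            + (P_OPT + 3 * P_MOD) * T_CLOCK * M_MAX * N
            + P_ELEC * T_CLOCK * M_MAX * N / n_ch)

# Eq. (38): the channel count that minimizes Eq. (37).
n_opt = math.sqrt(P_ELEC * T_CLOCK * M_MAX * N / (P_OPT * T_ELEC))
```

Because Eq. (37) is a linear term plus a $1/N_\mathrm{ch}$ term (plus constants), it is convex in $N_\mathrm{ch}$, and the analytic optimum of Eq. (38) coincides with the numerical minimum.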
We close this section by evaluating the TTS and ETS of this hybrid approach as a function of the number of parallel channels. The numbers assumed for these calculations are listed in Table 6. For these parameters, the ETS is minimized at $N_\mathrm {ch}=443$. The ETS and TTS for both the serial and the parallel hybrid analog–digital approaches are given as a function of $N_\textrm {ch}$ in Table 7. Remarkably, the TTS and ETS both drop off quickly with $N_\textrm {ch}$ (for small $N_\textrm {ch}$) and then slowly approach a global minimum as $N_\textrm {ch}\rightarrow N_\textrm {ch,opt}$. Using relatively small channel numbers, $N_\textrm {ch}=16$ and $N_\textrm {ch}=64$, yields ETS and TTS values that are only factors of 3 and 1.5 worse, respectively, than the optimal case of $N_\textrm {ch,opt}=443$. This rapid improvement suggests that relatively simple systems with realistic numbers of channels may operate with far less energy than a pure FPGA approach. We compare these approaches later in Table 11.


Table 6. Power Consumption and Latency of Each Component of the Hybrid Architecture


Table 7. Time-to-Solution and Energy-to-Solution Estimates for Hybrid Feed-Forward Architectures ($N = 1000$, $M = 4.25N$)

8.4 Fully Optical Approach

The hybrid analog–digital approach established here has two main limitations: the modulator bandwidth, which constrains the optical clock speed; and the electronic power required to drive the electro-optic modulators, which dominates the marginal power cost per channel. This motivates the exploration of a fully optical approach, which forgoes electro-optic modulation, coherent detection, and digital electronics entirely.

A block diagram of a fully optical SAT-CAC solver is shown in Fig. 17. This approach comprises three subsystems: (i) an optical dynamic random access memory (optical DRAM, Fig. 17(a)) and an associated output stage, which prepares the outputs of the optical DRAM for computation; (ii) an optical compute core (Fig. 17(b)), which calculates the feedback terms $z_i$; and (iii) an error correction circuit (Fig. 17(c)), which implements the equations of motion for CAC, given by

$$\frac{d e_i(t)}{dt} ={-}\beta e_i(t)\left(x_i^2(t) -1\right).$$
In the following sections, we discuss the dynamics of each subsystem in more detail.
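The role of Eq. (39) can be seen from a single integration step. In the Python sketch below (the step size and gain $\beta$ are arbitrary illustrative choices), the error variable grows whenever $|x_i| < 1$ and decays whenever $|x_i| > 1$, which is what pins every amplitude to the universal target.

```python
def euler_error_step(e, x, beta=0.2, dt=0.05):
    """One Euler step of Eq. (39): de_i/dt = -beta * e_i * (x_i^2 - 1).
    e_i grows while |x_i| < 1 and decays while |x_i| > 1."""
    return [ei - dt * beta * ei * (xi * xi - 1.0) for ei, xi in zip(e, x)]

# One amplitude below the target |x| = 1 and one above:
e_next = euler_error_step([1.0, 1.0], [0.5, 1.5])
```

Amplitudes sitting exactly at the target $|x_i| = 1$ leave the error variable unchanged, so the feedback is inactive once amplitude homogeneity is reached.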


Figure 17. (a) Variables $x_i$ are stored in an optical DRAM and read out into the optical encoding described in Section 8.2. These outputs are prepared for the optical compute core and the error correction circuit using either offset circuits or second-harmonic generation (SHG). (b) The error correction circuit interferes the second harmonic of each $x_i$ with the pump laser to realize Eq. (39). (c) The optical compute core performs a sequence of element-wise products, using sum-frequency generation (SFG) and difference-frequency generation (DFG), and vector–vector inner products (MACs) to calculate the feedback terms given by $e_iz_i$.

8.4.1 Optical Memory

The optical encoding introduced in Section 8.2, which is used to calculate the feedback terms $z_i$, contains many copies of each $x_i$ in a random (but rearrangeable) order. We, therefore, need a random-access optical memory, which can copy pulses of $x_i$ into arbitrary time bins synchronized to an optical clock. Synchronously pumped DOPOs are naturally suited for this task because they may be driven directly by clock pulses and implement the equations of motion for $x_i$ inside the cavity. In the absence of CAC terms, which induce complicated trajectories through state space, the pump acts as an external memory refresh both by compensating for any decay of $x_i$ with saturated gain and by resetting any phase drifts.

Figure 18 shows one such approach to an optical memory based on OPOs. The optical pulse encoding $x_i$ is stored inside a synchronously pumped optical cavity with two nonlinear sections (Fig. 18(a)). The first section performs degenerate optical parametric amplification (DOPA) when pumped by short pulses of 780-nm light, which implements the gain, loss, and saturation terms in Eq. (16) given by $\partial _t x_i(t) = (p - 1)x_i(t) - x_i^3(t)$. Readout and feedback injection are performed using sum-frequency generation (SFG) and difference-frequency generation (DFG), respectively (Fig. 18(b)). In the first case, optical pulses with amplitudes given by $\tilde {C}_{ij'}^{(1)}$ or $\tilde {C}_{ij'}^{(2)}$ are injected into the bus waveguide with a carrier frequency of 1300 nm. The optical pulses generated at 700 nm by SFG have an amplitude proportional to $\tilde {C}_{ij'}^{(1)} x_i$. These short wavelengths are not coupled across any of the directional couplers shown in Fig. 18(a), and are therefore outcoupled in the bus waveguide. Similarly, 700-nm feedback pulses in the bus waveguide with field amplitude $e_i z_i$ may be injected into the OPO cavity by performing DFG against a bright 1300-nm write pulse.
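The intracavity amplitude dynamics that the DOPA section implements can be sketched by integrating Eq. (16) directly. This is a minimal Euler-integration sketch; $p = 2$ is an arbitrary above-threshold choice, not a hardware value.

```python
def dopo_relax(x0, p=2.0, dt=0.05, steps=2000):
    """Integrate dx/dt = (p - 1) * x - x**3, the saturable gain/loss model
    implemented by the DOPA section (illustrative parameters). For p > 1
    the amplitude settles at +/- sqrt(p - 1)."""
    x = x0
    for _ in range(steps):
        x = x + dt * ((p - 1.0) * x - x**3)
    return x

# A small positive seed relaxes to the stable amplitude +1 for p = 2:
# this saturated gain is the "memory refresh" the synchronous pump provides.
```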

Figure 18. (a) Optical memory based on a synchronously pumped OPO. Copies of $x_i$ may be read to a bus waveguide, and feedback pulses may be injected into the OPO using SFG and DFG, respectively. (b) Example pulse sequence for reading copies of $x_i$ into the waveguide and injecting feedback from the waveguide into the OPO. In these diagrams, the rightmost pulse enters the resonator first, and therefore operations are read from right to left.

We now consider the design of an optical DRAM (Fig. 19) that can be used to copy five different $x_i$ into an encoding of the first subterm $s_{ij'}^{(1)}$ for a problem instance given by

$$(\ell_1 \vee \bar{\ell_2} \vee \ell_5)\wedge (\ell_2 \vee \ell_3 \vee \bar{\ell_4})\wedge (\bar{\ell_1} \vee \ell_2 \vee \ell_4)\wedge (\ell_3 \vee \ell_4 \vee \ell_5).$$
In this case, synchronized read and write pulses injected into each SFG stage may be used to generate an output waveform in the bus waveguide that contains any weighted sum of the $x_i$. Sparse feedback pulses (one for every $M'_\textrm {max}$) are input to the bus waveguide and are gated into the appropriate OPO by bright write pulses. We note here that in the absence of a write pulse, the injected feedback pulses may provide parametric gain to the intracavity $x_i$, which acts as a potential source of cross talk. In practice, because this gain is given by $\mathrm {cosh}(e_i z_i)$ whereas the feedback injected by SFG is equal to $e_i z_i$, high-fidelity feedback may be realized by using dim feedback pulses and bright write pulses relative to the field intensity of the intracavity $x_i$. For example, if the intracavity $x_i$ are of the order of 1 fJ, the feedback pulses are on the order of 10 aJ, and the write pulses are on the order of 100 fJ to achieve saturated SFG, then the cross talk introduced by field gain from the feedback pulses is $\mathrm {cosh}(0.01) - 1 < 0.01\%$.
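The cross-talk bound can be checked directly:

```python
import math

# Spurious parametric field gain from an unwanted feedback pulse is
# cosh(e_i * z_i). For the pulse-energy hierarchy in the example above
# (intracavity x_i ~ 1 fJ, feedback ~ 10 aJ, write ~ 100 fJ), the gain
# argument is ~0.01, so the excess gain is
crosstalk = math.cosh(0.01) - 1.0   # well below 0.01%
```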

Figure 19. (a) Optical DRAM based on many synchronously pumped degenerate OPOs coupled to a common bus waveguide. (b) Example pulse sequence for reading copies of each $x_i$ to the waveguide and injecting feedback into each OPO.

8.4.2 Optical Vector Arithmetic

The optical compute core requires both element-wise operations across vectors of pulses and vector–vector inner products, hereafter referred to as MACs. Although element-wise addition and multiplication are easily realized using beam splitters and three-wave mixing (TWM), respectively, MACs require a memory element in order to accumulate the summed signals. Figure 20 illustrates two ways of performing this operation using synchronously pumped nonlinear resonators. In Fig. 20(a), DFG between successive pulses in two pulse trains with field amplitudes given by $a_i$ and $b_i$ injects idler pulses into an optical micro-resonator. When the round trip time of the intracavity idler is matched to the pulse separation of the two input waves, the intracavity idler sums contributions from each pulse pair. At the end of each summation, a read pulse can be used to perform SFG against the intracavity idler and eject the summed fields as a short wavelength pulse in the bus waveguide (Fig. 20(c)). A similar approach is taken in Fig. 20(b) where two long-wavelength pulse trains, $a_i$ and $b_i$, are coupled into a cavity across directional couplers. Each pulse pair generates a short-wavelength pulse via SFG, which circulates in the cavity and accumulates contributions from successive SFG processes. A bright red pulse is used to eject the short-wavelength sum as a long-wavelength pulse once the summation is finished.
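Functionally, each resonant MAC computes an inner product by accumulating one three-wave-mixing product per clock cycle. A minimal discrete-time model, with an assumed conversion coefficient `kappa` and ignoring loss and pump depletion, is:

```python
def optical_mac(a, b, kappa=1.0):
    """Toy model of the resonant MAC: each pulse pair (a_i, b_i) injects a
    contribution proportional to a_i * b_i into the intracavity field, which
    accumulates them over successive round trips (lossless, undepleted
    approximation; kappa is an assumed conversion coefficient)."""
    field = 0.0
    for ai, bi in zip(a, b):
        field += kappa * ai * bi   # one three-wave-mixing event per cycle
    return field

# Feeding a vector of ones for b reduces the MAC to a pure accumulator
# (the ACC operation mentioned in the Fig. 20 caption).
```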

Figure 20. (a),(b) Optical vector–vector inner products with outputs at short and long wavelengths, respectively. (c),(d) Example pulse sequence for each approach. We note here that a vector of ones can be used as an input to perform accumulation (ACC). This can be used to integrate an optical signal over many pulses without incurring waveform distortion.

8.4.3 Full System Analysis

Having introduced each of the subsystems, we are now prepared for a system-level analysis of the all-optical SAT-CAC architecture. We begin by considering the power requirements of the optical DRAM. For 3-SAT, a full layout of the optical DRAM is shown in Fig. 21 for the problem instance introduced in Section 8.4.1. Three separate copies of $x_i$ are stored and addressed in parallel across the optical DRAM. For the architecture shown here, mutual injection between the copies of $x_i$ is used to prevent inconsistencies from forming. The first two copies of $x_i$ are used to calculate each pair of subterms (blue and purple pulses), and one copy is used to calculate the error correction terms $e_i$ using a train of pulse-picked (or accumulated) $x_i$ (red pulses). As shown in Fig. 17, these outputs are then prepared for the compute core by offsetting each subterm and performing SHG on the copied $x_i$.

Figure 21. (a) For 3-SAT, three separate copies of $x_i$ are stored and addressed in parallel across the optical DRAM. (b) Two copies are used to calculate each pair of subterms, and one copy is used to calculate the error correction terms $e_i$. Mutual injection between the copies of $x_i$ helps to prevent inconsistencies from forming.

The optical DRAM has the highest power consumption of any subsystem because $3N$ OPOs are required to store the $x_i$. Furthermore, two additional OPOs per memory cavity are required to store the ternary $\tilde {C}_{ij'}^{(1)}$ and $\tilde {C}_{ij'}^{(2)}$ used to perform readout, totaling $9N$ OPOs. For $N=1000$, a clock speed of 100 GHz, and required pump energy of 1 fJ per OPO, the total average power consumption of the optical DRAM is 0.9 W. The only other input required by the optical DRAM is the offset operation, which only consumes $\approx$ 100 aJ per pulse for the two rails, or $\approx$ 20 $\mu$W for a 100 GHz clock rate. The two remaining subsystems each contribute a fixed number of OPOs. The CAC circuit shown in Fig. 17(b) requires one additional OPO, and the ternary signals used to encode $C_{ij'}$ in Fig. 17(c) require two OPOs. In total, these subsystems contribute 300 $\mu$W of power consumption for the clock rates and energy requirements assumed here. We round the total power requirements of the all-optical architecture up to 1 W to account for the power required to temperature stabilize the chip containing all of the optical cavities. These calculations are briefly summarized in Table 8.
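The DRAM power figure follows directly from the OPO count, the pump energy, and the clock rate:

```python
# Optical-DRAM power budget from the text: N = 1000 variables, 100 GHz
# clock, 1 fJ of pump energy per OPO per clock cycle.
N = 1000
clock_rate = 100e9        # Hz
pump_energy = 1e-15       # J per OPO per pulse
n_opos = 9 * N            # 3N memory OPOs + readout OPOs per memory cavity
dram_power = n_opos * pump_energy * clock_rate   # watts -> 0.9 W
```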

Table 8. Factors Involved in Calculating the Power Consumption of the Fully Optical System

The total system latency is set by the time required to fan copies of each $x_i$ into the optical encodings for each $s_{ij'}$ and inject feedback into each cavity. These outputs can be streamed into later operations, namely element-wise vector operations and MACs, during readout, and therefore these operations contribute no additional latency. In total, $N(M'_\textrm {max} + 1)$ operations are required to perform a state update, and therefore the total system latency is given by $N(M'_\textrm {max} + 1)T_\textrm {clock}$. For $N = 1000$, $M'_{\textrm {max}} = 100$, and $T_\textrm {clock} = 10$ ps, this gives an overall system latency of 1.01 $\mu$s. We note here that this architecture is conceptually trivial to parallelize by introducing $N_\textrm {ch}$ copies of the optical DRAM, each of which computes a subset of the $z_i$. Because parallelization requires a copy of each subsystem, the power requirements scale linearly in $N_\textrm {ch}$. Similarly, the system latency scales inversely with $N_\textrm {ch}$, and therefore the energy required to perform one state update is invariant with respect to $N_\textrm {ch}$. For a fully parallelized system ($N_\textrm {ch}=N$), the system latency is given by $(M'_\textrm {max} + 1)T_\textrm {clock}$, or a constant 1.01 ns for $M'_\textrm {max}=100$ and a 100 GHz clock. These calculations are briefly summarized in Table 9.
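The latency estimates can be reproduced in a few lines:

```python
# System latency of the all-optical architecture: N * (M'_max + 1) clock
# cycles for the serial system, divided by the channel count when
# parallelized (numbers from the text).
N, M_max, t_clock = 1000, 100, 10e-12   # 100 GHz clock

def latency(n_ch=1):
    return N * (M_max + 1) * t_clock / n_ch   # seconds per state update

serial = latency()       # 1.01 microseconds for the serial system
parallel = latency(N)    # 1.01 ns for a fully parallelized system
```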

Table 9. Factors Involved in Calculating the Time for One Round Trip of the Fully Optical System

We close this section by noting that the ETS is determined entirely by the energy required to generate sufficient gain for an OPO to oscillate. Although near-term OPOs in thin-film lithium niobate may operate with 1 fJ, roadmaps exist for approaching attojoule operation [65], in which case the ETS may be reduced by orders of magnitude. Similarly, the TTS for an all-optical system is determined solely by the clock rate, which is limited only by the gain bandwidth available for degenerate OPA. Dispersion-engineered nonlinear devices have demonstrated nearly 100 THz of bandwidth, corresponding to two optical cycles. Although near-term architectures are assumed to operate at 100 GHz, in principle the clock speed may be increased by three orders of magnitude. A summary of the ETS and TTS for both near-term and long-term all-optical architectures is given in Table 10.

Table 10. ETS and TTS Estimates for All-Optical Architecture ($N = 1000$, $M = 4.25N$)a

8.5 Summary of the Various Approaches

Table 11 compares the various approaches. For combinatorial optimization using continuous-time dynamical systems such as the CSS, an important baseline for performance, in terms of both energy and time, is an optimized FPGA circuit for digitally integrating the equations of motion [22,38,45]. A full FPGA implementation of SAT-CFC would be composed of several cores working simultaneously. As an order-of-magnitude estimate, we expect each core to require $\sim 100$ ns (at a 500 MHz clock) to finish its calculation [22]. Note that this estimate is based on known results (i.e., experimentally realized FPGA circuits) for the CIM solver [22]; although experimental FPGA circuits for SAT-CFC have yet to be developed, we expect this number to be a good upper bound on the time cost per core for an FPGA implementation of SAT-CFC. Additional clock cycles may be required to gather all the data and dispatch them to the corresponding cores, depending on the SAT problem. Using a sparse matrix approach, we estimate that 20 W of power consumption is required for the full system.

Table 11. Estimated Time and Energy Per Iteration for Each Architecture as well as Wall-Clock TTS and ETSa

We see that hybrid opto-electronic approaches may outperform an FPGA in both TTS and ETS when computations are parallelized across many optical channels. The ETS of all-optical architectures is invariant with respect to the number of channels, and therefore no near-term architecture offers any substantial performance improvement in ETS relative to the hybrid and digital approaches. In principle, further improvements in the bandwidth and energy requirements of all-optical systems may enable substantial reductions in both ETS and TTS.

9. Conclusion

In conclusion, we have presented a new heuristic for solving the SAT problem which uses nonlinear continuous-time dynamics. We have performed benchmarks of our technique against existing continuous-time solvers [16,23] by comparing their performance on random 3-SAT. Our solvers appear to exhibit polynomial scaling of median TTS at and below the 3-SAT phase transition point $\alpha \approx 4.25$ for the problem sizes studied (up to $N=1000$). Our heuristic is also able to efficiently solve the Max-SAT problem when the problem instance is not satisfiable, at least at problem size $N = 50$.

In addition to the equations presented in Sections 3 and 4, in Section 8 we have proposed a hybrid optical–electronic implementation scheme as well as a fully optical implementation, both designed to take advantage of the low power consumption and high clock speed attainable in optics. Both the hybrid opto-electronic approach and the fully optical approach can be implemented on a thin-film LiNbO$_3$ platform. In Section 8 our estimates for TTS and ETS have shown potential orders of magnitude improvement over the GPU hardware used to generate the simulation results for this paper.

In future work, we would like to expand the benchmarking of these solvers to larger classes of problems including SAT and Max-SAT competition problems. In addition, a result of this work is that even though conversion between classes of NP-hard problems is possible, this process will often increase the difficulty of the problem for heuristic solvers. Thus, it is important to have a dedicated heuristic for each class of problems. We believe that the philosophy used to create the solvers discussed in this work can be extended to other types of NP-hard and NP-complete problems and can possibly provide improved performance for a variety of applications.

Appendix A: Calculation of TTS

In this appendix, we discuss precisely how TTS is calculated for the figures in Sections 3 and 4 and why we use it as the metric for performance. TTS has been used for evaluating the performance of heuristic Ising solvers [22,23,32,39] because it simultaneously takes into account both the time required to obtain a result (which can be measured in wall-clock time, time steps, etc.) and the probability that the result is correct. In other words, there is often a balance between quickly getting a solution that has a small chance of being accurate, and spending a long time getting a solution that has a higher chance of being accurate. Although this concept of TTS has not been applied to the SAT problem yet to the best of the authors’ knowledge, it works equally well in this case.

Given a solver which takes time $T$ (in some unit) and has a probability $P$ of succeeding in solving a given problem, the TTS is defined as

$$\textrm{TTS} = T\frac{\log(1 - 0.99)}{\log(1 - P)}$$
and represents the amount of time required to find the solution with 99% accuracy. The factor $\frac {\log (1 - 0.99)}{\log (1 - P)}$ represents the number of runs of the system needed to have a 99% chance of succeeding, assuming each run is independent.
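A direct implementation of this definition (valid for $0 < P < 1$):

```python
import math

def tts(T, P, target=0.99):
    """Time-to-solution: run time T multiplied by the expected number of
    independent restarts needed to succeed with probability `target`,
    when each run of length T succeeds with probability P (0 < P < 1)."""
    return T * math.log(1.0 - target) / math.log(1.0 - P)

# A fast-but-unreliable solver and a slow-but-reliable one can have the
# same TTS: a 1-step run with P = 0.5 needs ~6.64 restarts.
```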

In Sections 3 and 4, $T$ represents the number of integration time steps, or, more precisely in the case of the original CTDS, the number of right-hand side evaluations [16]. As both CTDS versions use a higher-order numerical integration method, they need to calculate the right-hand side of the ODE multiple times per Runge–Kutta integration step. As one of these calculations is computationally similar to a single Euler step for SAT-CFC or M-CTDS, this is the unit we use for comparison.

In Section 6, on the other hand, we express TTS (and ETS) in the units of wall-clock time and energy. As the time and energy it takes for a single integration time step computed on any of the platforms discussed is expected to only depend on $N$ and $M$, it is rather straightforward to convert TTS in the units of integration time steps into the units of wall-clock time and energy.

Another important subtlety of how TTS is computed in this paper is the selection of the number of time steps $T$. When computing the TTS for a given problem instance and parameters, we take 1000 initial conditions and compute the system trajectory for a certain number of time steps, $T_{\rm max}$. During this time, some trajectories will succeed and some will not, and thus we obtain the success probability of the system as a function of $T$, $P(T)$. That is, $P(T)$ corresponds to the fraction of trajectories that have succeeded after $T$ time steps. To compute $P(T)$ for all $T \leq T_{\rm max}$, we therefore just need to run the simulation 1000 times for $T_{\rm max}$ steps. Thus, the TTS can also be efficiently computed as a function of $T$ as

$$\textrm{TTS}(T) = T\frac{\log(1 - 0.99)}{\log(1 - P(T))};$$
the TTS reported in this work is always
$$\textrm{TTS}_{\rm opt} = \text{min}_{T} \quad \textrm{TTS}(T).$$
In other words, we assume that we have prior knowledge of the optimal number of time steps $T_{\rm opt}$. In practice, what $T_{\rm opt}$ represents is the number of time steps after which it is better to restart the system at a new initial condition if it has not yet found the solution. In general, there is no reason to believe we will know $T_{\rm opt}$ given a problem and a solver, so $\textrm {TTS}_{\rm opt}$ might be inaccurate with regard to what we can obtain in practice. However, for each of the solvers discussed in this work, we give the following reasons why it is reasonable to use $\textrm {TTS}_{\rm opt}$ for these benchmarks.
  • For M-CTDS with fixed parameters, we found that $\textrm {TTS}(T)$ was very dependent on $T$; however, $T_{\rm opt}$ was very predictable based on the value of the parameter $\xi$ and had nearly no dependence on the SAT instance being solved. This is illustrated in Fig. 22, where $\textrm {TTS}(T)$ is shown as a function of $T$ for a specific problem instance (left), and $T_{\rm opt}$ is shown to be independent of the problem instance (right).
  • For SAT-CFC with fixed parameters, we found that $\textrm {TTS}(T)$ was not very dependent on $T$; thus, the use of $\textrm {TTS}_{\rm opt}$ is reasonable for an estimate of the TTS even if $T_{\rm opt}$ is not known exactly in practice.
  • For SAT-CFC with modulated parameters, $T_{\rm opt}$ always occurs at the end of the anneal (otherwise the modulated parameters would not be necessary), so it can be predicted easily. Note that in this case it is still necessary to optimize the anneal time (which is discussed in Appendix B), but this is a different question because it requires multiple batches of trajectories.
Although this method for calculating TTS may not be ideal, we believe it is fair and accurate enough to provide a baseline for the performance of these solvers on random 3-SAT instances. Future works will likely go into more detail on how parameters can be chosen in advance and give a more precise calculation of TTS.
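The procedure described above (empirical $P(T)$ from one batch of trajectories, then minimizing $\textrm{TTS}(T)$ over $T$) can be sketched as:

```python
import numpy as np

def tts_opt(solve_steps, t_max, target=0.99):
    """TTS_opt = min_T TTS(T), computed from one batch of trajectories.

    solve_steps[k] is the time step at which trajectory k first found a
    solution (np.inf if it never succeeded within t_max steps).
    """
    solve_steps = np.asarray(solve_steps, dtype=float)
    n = len(solve_steps)
    best = np.inf
    for T in range(1, t_max + 1):
        P = np.count_nonzero(solve_steps <= T) / n   # empirical P(T)
        if P == 0:
            continue                  # no trajectory has succeeded yet
        if P == 1:
            tts = float(T)            # every restart succeeds within T
        else:
            tts = T * np.log(1 - target) / np.log(1 - P)
        best = min(best, tts)
    return best
```

For example, with solve steps `[10, 20, inf, inf]` and `t_max = 30`, the minimum is attained at `T = 20` where half the trajectories have succeeded.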

Figure 22. (a) TTS of M-CTDS plotted as a function of $T$ for a specific 3-SAT instance and different values of the parameter $\xi$. The TTS has a sharp minimum at a specific value of $T$. (b) Plot of $T_{\rm opt}$ against $\xi$ for a number of instances (of the same problem size), indicating that $T_{\rm opt}$ only depends on $\xi$.

Appendix B: Parameter Selection and Tuning

In this appendix, we specify the system parameters used for the results in Section 4 (Figs. 6(a)–6(d), 7, and 10). For both M-CTDS and SAT-CFC, some experimentation was done to find the optimal parameters for each of these figures. For some figures we needed different parameters to achieve good performance at different values of $\alpha$ or $N$. In these cases, the plots shown in Section 4 show TTS under the assumption of optimal parameters out of the parameters we tested.

For the results of CTDS from Ref. [16] we used the default parameters. Thus, it is possible that performance for these solvers could be improved if parameters are optimized.

Figure 6(a) (Section 6)

For this figure, because the problem size was small making the problems relatively easy, we can achieve good performance for both M-CTDS and SAT-CFC with the same parameters for all $\alpha$. The parameters we chose for SAT-CFC are $p = -2.0$, $\beta = 0.05$, $dt = 0.1$, and $T = 2\times 10^4$. The parameters we chose for M-CTDS are $\xi =1.0$, $dt = 1.0$, and $T = 2\times 10^4$.

Figure 6(b) (Section 6)

For this figure, we chose the parameters to be the best out of a set of parameters for each value of $\alpha$. This is because the optimal parameters are likely different for different clause-to-variable ratios.

In addition, for SAT-CFC we modulated the parameter $p$ during the course of the trajectory. For all values of $\alpha$, $p$ is modulated from $-1.0$ to $1.0$ whereas the other parameters are kept constant at $\beta = 0.05$, $dt = 0.1$. For each $\alpha$ we tried many different values of $T$ (the number of time steps in each trajectory) from the set $\{10^4, 2\times 10^4, 5\times 10^4, 10^5\}$ and chose the best median TTS out of these. Table 12 lists the results for the optimal value of $T$ and the corresponding TTS.

Table 12. TTS and $p_{opt}$ of SAT-CFC for Fig. 6(b)

Our hypothesis is that for harder problems $T$ must be larger to allow for a slower annealing process. This trend can be seen in the optimal parameters we found although there is some inconsistency due to noise in the data.

For M-CTDS we fixed the parameters $dt = 1.0$ and $T = 5\times 10^4$ and chose $\xi$ to be the optimal value from the set $\{1.0, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01\}$. Table 13 lists the results (M-CTDS could not solve problems after $\alpha = 4.1$).

Table 13. TTS and $\xi_{opt}$ of M-CTDS for Fig. 6(b)

Figure 6(c)

For this figure, the parameters are optimized by problem size for SAT-CFC. For each problem size we fixed $\beta = 0.05$, $dt = 0.1$, and $T = 20{,}000$ and chose $p$ to be the optimal value from the set $\{-2.0, -1.0, 0.0, 1.0 \}$. Table 14 shows the optimal value of $p$ and the best TTS for each problem size. For the results in this plot, we did not modulate any parameters during the course of the trajectory.

Table 14. TTS and $p_{opt}$ of SAT-CFC for Fig. 6(c)a

Figure 6(d)

For this figure, the parameters are optimized by problem size for both SAT-CFC and M-CTDS. For SAT-CFC, for each problem size we fixed $\beta = 0.05$, $dt = 0.1$, and $T = 20{,}000$ and chose $p$ to be the optimal value from the set $\{-2.0, -1.0, 0.0, 1.0 \}$. Table 15 shows the optimal value of $p$ and the best TTS for each problem size. For the results in this plot, we did not modulate any parameters during the course of the trajectory.

Table 15. TTS and $p_{opt}$ of SAT-CFC for Fig. 6(d)

For M-CTDS we chose $dt= 1.0$, $T = 1\times 10^4$ for all problem sizes and chose $\xi$ to be the optimal value from $\{1.0, 0.5, 0.2, 0.1\}$. Table 16 shows the optimal value of $\xi$ and TTS for each problem size.

Table 16. TTS and $\xi_{opt}$ of M-CTDS for Fig. 6(d)

Figure 7

For this figure, the parameters are optimized by problem size for both SAT-CFC and M-CTDS. In the case of SAT-CFC, for each problem size we chose the parameters from two different methods for choosing parameters (see Table 17, Table 18, and Table 19). In the first method, each parameter set corresponds to runs in which $p$ is kept fixed during the trajectory, and each parameter set has a different (fixed) value of $p$. For the second method, $p$ is linearly modulated during the trajectory and each set of parameters has a different total number of steps.

Table 17. Parameters Used for Each Parameter Selection Methoda

For each problem size, for fixed parameters $p$ was chosen to be the optimal value from the set $\{-2.0, -1.0, 0.0, 1.0\}$ and for annealed parameters, $T$ was chosen from the set $\{ 2\times 10^4, 5\times 10^4, 10^5, 2\times 10^5, 5\times 10^5\}$. Not all parameters were tested for each problem size. The optimal values we found for each problem size are summarized in Table 18.

Table 18. TTS and $p_{opt}$ of SAT-CFC with Fixed Parameters for Fig. 7

Table 19. TTS and $p_{opt}$ of SAT-CFC with Annealed Parameters for Fig. 7

In Fig. 23 we plot the best TTS of both fixed parameters (blue) and annealed parameters (green) as a function of problem size. As one can see, below a certain problem size, the two parameter sets give nearly identical performance, but around $N=500$–600 it becomes important to use annealed parameters. TTS in Fig. 7 is the lower envelope of these two plots.

Figure 23. Median TTS of best fixed parameters and best annealed parameters for SAT-CFC. Data was not collected for fixed parameters after $N=600$.

For M-CTDS, we chose $dt= 1.0$, $T = 1\times 10^4$ for all problem sizes and chose $\xi$ to be the optimal value from $\{1.0, 0.5, 0.2, 0.1, 0.05, 0.02, 0.01\}$. Table 20 lists the optimal value of $\xi$ and TTS for each problem size.

Table 20. TTS and $\xi_{opt}$ of M-CTDS for Fig. 7

Appendix C: Conversion of SAT to Ising

In this appendix, we outline the method of converting a given 3-SAT problem to an Ising problem that was used in Section 3.

The standard method of converting a 3-SAT problem into an Ising problem starts by identifying each Boolean variable $x_i$ with an analogous Ising spin variable $x_i$ where $x_i = 1$ corresponds to true and $x_i = -1$ corresponds to false. Then, for each clause, we can try to choose couplings between the involved $x_i$ variables so that the Ising energy is minimized when the clause is satisfied. However, this turns out to be impossible unless we introduce an additional “auxiliary” Ising spin $a_j$ for each clause (not to be confused with the auxiliary variable $a_j$ in CTDS). To demonstrate this, without loss of generality consider the 3-SAT clause

$$x_0 \vee x_1 \vee x_2$$
which can be mapped to the quadratic cost function
$$K(x_0, x_1, x_2, a) = 3(x_0 x_1 + x_1 x_2 + x_2 x_0) - 4a (x_0 + x_1 + x_2) + 2(a - x_0 - x_1 - x_2).$$
In each of the four possibilities for $(x_0,x_1,x_2)$ (up to symmetry) there is a value of $a$ that minimizes $K$. If we assume $a$ is always set to this optimal value then the energy will be $-7$ for all values of $(x_0,x_1,x_2)$ except for the case when $x_0 = x_1 = x_2 = -1$ in which the energy will be $+1$. This can be summarized as in Table 21, where the bold numbers are the lowest possible energy for the given configuration of $(x_0,x_1,x_2)$.
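Table 21 can be verified exhaustively in a few lines:

```python
from itertools import product

def K(x0, x1, x2, a):
    """Quadratic clause gadget from the text for (x0 v x1 v x2)."""
    return (3 * (x0*x1 + x1*x2 + x2*x0)
            - 4 * a * (x0 + x1 + x2)
            + 2 * (a - x0 - x1 - x2))

# Minimizing over the auxiliary spin a: energy -7 for every satisfying
# assignment, +1 only for the violating assignment x0 = x1 = x2 = -1.
energies = {xs: min(K(*xs, a) for a in (-1, 1))
            for xs in product((-1, 1), repeat=3)}
```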

Table 21. Ising Energies for Each Configuration of Spins Demonstrating That the Ising Energy Matches the SAT Potential When the Ancillary Spin Is Chosen to Minimize the Energy

Using this method, we can encode an arbitrary 3-SAT clause by replacing $x_i$ with $-x_i$ if $x_i$ is negated in the clause. Now, if we sum up these cost functions for each clause then we get a quadratic cost function over $N$ $x_i$ spin variables and $M$ $a_j$ auxiliary spins, so an Ising/QUBO problem of size $N + M$. If the problem is satisfiable, then the minimum Ising energy will be $-7M$, or for an unsatisfiable problem $-7M_{\rm sat}$ where $M_{\rm sat}$ is the number of satisfiable clauses.

Note that the quadratic cost function constructed here also contains a linear term (that is, $2(a - x_0 - x_1 - x_2)$) which affects both types of spins (also called a “Zeeman term” in the context of the Ising problem). Typically, CIM algorithms such as CIM-CAC are not designed to solve problems with such a linear term (in other words, they assume the cost function is homogeneous). To fix this problem we added another (single) auxiliary spin which is coupled to all $N + M$ other variables with the strength of the linear term, making the total number of Ising spins $N + M + 1$. The resulting Ising problem is equivalent (except that each configuration of the original problem is duplicated due to the additional spin-flip symmetry).

To give an intuitive understanding of why we believe this encoding does not work so well compared with the direct SAT solver, consider again the 3-SAT clause:

$$x_0 \vee x_1 \vee x_2.$$
Now imagine that we are at the configuration $(x_0, x_1, x_2) = (1,1,-1)$ and want to move to $(x_0, x_1, x_2) = (1,-1,-1)$. In the direct SAT solvers, this is not a problem, as both configurations have the same energy so there is no “force” preventing $x_1$ from flipping. This is how it should be because this clause does not care what $x_1$ is as long as we have $x_0 = 1$. However, if we look at the above Ising encoding, to go from $(x_0, x_1, x_2) = (1,1,-1)$ to $(x_0, x_1, x_2) = (1,-1,-1)$ we must also flip the auxiliary spin $a$ if we want the Ising energy to remain constant. This means we can either flip $x_1$ first and then $a$, or $a$ first and then $x_1$. In either case, there will be an energy barrier which means the Ising solver will have trouble making this move.
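This barrier can be made concrete by evaluating the clause gadget along the two-flip path (the gadget $K$ is restated here so the example is self-contained):

```python
def K(x0, x1, x2, a):
    """Quadratic clause gadget for (x0 v x1 v x2)."""
    return (3 * (x0*x1 + x1*x2 + x2*x0)
            - 4 * a * (x0 + x1 + x2)
            + 2 * (a - x0 - x1 - x2))

# Walk from (x0, x1, x2, a) = (1, 1, -1, 1) to (1, -1, -1, -1) by flipping
# x1 first and then a: both endpoints sit at the ground energy -7, but the
# intermediate state is 12 energy units higher. Flipping a first instead
# also passes through a higher-energy state.
path = [(1, 1, -1, 1), (1, -1, -1, 1), (1, -1, -1, -1)]
energies = [K(*s) for s in path]
```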

This is not the only possible way to convert a SAT problem into an Ising problem; however, this is one of the most straightforward approaches. It is possible that there are other approaches that do not cause this degradation of performance. However, to do this conversion it is always necessary to introduce something like an auxiliary spin, so similar problems with the energy landscape are likely to arise.

Appendix D: Comparison of SAT-CAC and SAT-CFC

In this appendix, we briefly compare the performance of SAT-CFC and SAT-CAC. We believe that, in general, there is very little difference between these systems in terms of their performance as solvers; however, it is still important to confirm this numerically. In Fig. 24 we show identical data to Fig. 6(b) except SAT-CAC is included as well. The parameters for SAT-CAC are nearly identical to those of SAT-CFC (outlined in Appendix B) but with $\beta = 0.25$ (instead of $\beta = 0.05$) and $p = 0.5 \rightarrow 1$ (instead of $p = -1 \rightarrow 1$).

Figure 24. Median TTS of SAT-CAC (shown in blue) is nearly identical to that of SAT-CFC.

Appendix E: Generalized CAC and CFC Algorithms

One of the central results of this work is that the equations for CIM-CAC (and CIM-CFC), which are aimed at solving the Ising problem, can naturally be generalized to other types of combinatorial optimization problems (in this case the SAT problem). In this section, we hypothesize a general method for creating a CAC/CFC solver given a combinatorial optimization problem in a certain form.

Given a function $E : \{-1, +1 \}^N \rightarrow \mathbb {R}$ we can consider the optimization problem:

$${} \text{minimize}_{\sigma} \quad E(\sigma),$$
where $\sigma \in \{-1, + 1 \}^N$. Next, we want to construct a new function $V : \mathbb {R}^N \rightarrow \mathbb {R}$ which is differentiable and has the following properties:
  • $V(\sigma ) = E(\sigma )$ for all $\sigma \in \{-1, + 1 \}^N$;
  • $V(x)$ has no local minima on $\mathbb {R}^N$;
  • $\frac {\partial V}{\partial x_i} (\sigma ) = \sigma _i \left ( E(\sigma) - E(\text {flip}_i (\sigma )) \right )$, where $\text {flip}_i (\sigma )$ denotes $\sigma$ with the sign of its $i$th component flipped. (With this sign convention, the gradient term in Eq. (48) pushes $x_i$ toward sign flips that lower $E$.)
Then we can consider the generalized CAC equations,
$${} \frac{dx_i(t)}{dt} = (p - 1)x_i(t) - x_i(t)^3 - e_i(t) \frac{\partial V}{\partial x_i} (x),$$
$$\frac{de_i(t)}{dt} ={-}\beta e_i(t)(x_i(t)^2 - 1),$$
and the generalized CFC equations,
$$f_i(t) = e_i(t) \frac{\partial V}{\partial x_i} (x),$$
$${} \frac{dx_i(t)}{dt} = (p - 1)x_i(t) - x_i(t)^3 - f_i(t),$$
$$\frac{de_i(t)}{dt} ={-} \beta e_i(t) ( f_i(t)^2 - \alpha).$$
We hypothesize that, given the conditions on $V$, these equations will have some success at finding a minimizing configuration of $V$. The performance will likely still depend on which $V$ is chosen, because many functions satisfy these properties, and the parameters $p$ and $\beta$ still need to be chosen appropriately.
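To make the recipe concrete, the sketch below integrates the generalized CAC equations with a simple forward-Euler scheme. It is a minimal illustration, not the authors' implementation: the function names, parameter values, and linear pump ramp are our own choices. As an example $V$, we use the soft-spin Ising energy $V(x) = -\frac{1}{2}x^{\top}Jx$ (so $\partial V/\partial x = -Jx$), for which the generalized equations reduce to CIM-CAC:

```python
import numpy as np

def generalized_cac(grad_V, N, steps=5000, dt=0.02, beta=0.05,
                    p_start=-1.0, p_end=1.0, seed=0):
    """Forward-Euler integration of the generalized CAC equations
        dx_i/dt = (p - 1) x_i - x_i^3 - e_i dV/dx_i,
        de_i/dt = -beta e_i (x_i^2 - 1),
    with the pump p ramped linearly from p_start to p_end
    (an assumed annealing schedule)."""
    rng = np.random.default_rng(seed)
    x = 0.01 * rng.standard_normal(N)  # soft spins, initialized near the origin
    e = np.ones(N)                     # error (feedback-gain) variables
    for n in range(steps):
        p = p_start + (p_end - p_start) * n / steps
        dx = (p - 1.0) * x - x**3 - e * grad_V(x)
        de = -beta * e * (x**2 - 1.0)
        x, e = x + dt * dx, e + dt * de
    return np.sign(x)  # read out the discrete configuration

# Illustrative choice of V: the soft-spin Ising energy of a three-spin
# ferromagnetic chain, V(x) = -x.T @ J @ x / 2, so grad_V(x) = -J @ x.
J = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
sigma = generalized_cac(lambda x: -J @ x, N=3)
energy = -0.5 * sigma @ J @ sigma  # ground states are +/-(1,1,1) with energy -2
```

Passing a different gradient callable, e.g., one built from the SAT coupling terms $K_{ij}$, recovers SAT-CAC from the same routine.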

In addition to the form above (Eq. (47)), many combinatorial optimization problems involve optimizing some cost function over other discrete sets (for example, $\sigma _i$ could belong to some subset of integers). Although these problems can always be converted into the form above (Eq. (47)), we believe it may be desirable to have a new set of ODEs that is specific to the structure of the space being optimized over. These generalizations are beyond the scope of this paper as they likely require a more significant modification of the current ODEs (Eqs. (48), (49), (50), and (51)).

Appendix F: CAC and CFC with Target Amplitude

In previous works, studies of CIM-CAC and CIM-CFC [22,23,39] have included an additional parameter known as the target amplitude. This parameter can be straightforwardly included in the equations for SAT-CAC and SAT-CFC as follows:

$${} \frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 - e_i(t) \sum_j K_{ij}(t),$$
$$\frac{de_i(t)}{dt} ={-}\beta e_i(t)(x_i(t)^2 - a),$$
$$K_{ij} ={-}C_{ij}\prod_{k \in I_{j}, k \neq i} \frac{1 - C_{kj}x_k/\sqrt{a}}{2}.$$
For SAT-CFC the modified equations are
$$f_i(t) = e_i(t) \sum_j K_{ij}(t),$$
$$\frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 - f_i(t),$$
$$\frac{de_i(t)}{dt} ={-}\beta e_i(t)(f_i(t)^2 - a(p - 1 - a)^2),$$
with the new parameter $a$ being the target squared amplitude. The key differences are the factor of $1/\sqrt {a}$ in the calculation of $K_{ij}$ and the modifications to Eqs. (53) and (55). This new parameter $a$ can be useful for understanding how the machine works (because it specifies the amplitude at which the soft spins should be stabilized). However, due to a symmetry of the system, this new parameter turns out to be redundant and, thus, we omit it to make the analysis and parameter selection simpler.
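To see where the target value $a(p-1-a)^2$ comes from, a short fixed-point calculation (our sketch, assuming the soft spins have stabilized at the target amplitude, $x_i = \sqrt{a}\,\sigma_i$ with $\sigma_i = \pm 1$, and $dx_i/dt = 0$) gives

```latex
0 = (p-1)\sqrt{a}\,\sigma_i - a^{3/2}\sigma_i - f_i
\quad\Longrightarrow\quad
f_i = \sqrt{a}\,(p - 1 - a)\,\sigma_i ,
\qquad
f_i^2 = a\,(p-1-a)^2 ,
```

so the modified error-variable equation drives the squared feedback toward its value at such a fixed point.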

Given a positive real number $S$ we can perform the following variable substitution:

$$x_i = Sx_i^{*}, \quad e_i = S^{3}e_i^{*}, \quad t = S^{{-}2}t^{*},$$
$$p = 1 - S^2(1- p^{*}), \quad a = S^2 a^{*}, \quad \beta = \beta^{*}.$$
We can substitute these new variables into Eqs. (52)–(54) and obtain
$$\frac{d(Sx_i^{*}(t^{*}))}{d(S^{{-}2}t^{*})} = (1 - (1 - S^2(p^{*}-1)))S x_i^{*}(t^{*}) - S^3 x_i^{*}(t^{*})^3 - S^3 e_i^{*}(t^{*}) \sum_j K_{ij}(t^{*}),$$
$$\frac{d(S^3 e_i^{*}(t^{*}))}{d(S^{{-}2}t^{*})} ={-}\beta^{*}S^3 e_i^{*}(t^{*})(S^2 x_i^{*}(t^{*})^2 - S^2 a^{*}),$$
$$K_{ij} ={-}C_{ij}\prod_{k \in I_{j}, k \neq i} \frac{1 - C_{kj}S x_k^{*}/\sqrt{S^2 a^{*}}}{2},$$
which gives identical dynamics (all factors of $S$ cancel out). Thus, if we set $S = a^{1/2}$ we have $a^{*} = 1$ and we recover Eqs. (16) and (17). An identical substitution shows Eqs. (54) and (55) are identical to Eqs. (19) and (20).
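This cancellation can also be checked numerically. The sketch below (a toy 3-SAT instance, forward-Euler discretization, and parameter values of our own choosing) integrates the target-amplitude SAT-CAC equations twice, once with parameters $(p, a)$ and once with the starred parameters with $a^{*} = 1$, using $S = \sqrt{a}$, and verifies that the two trajectories coincide after rescaling:

```python
import numpy as np

# Toy 3-SAT instance: clause j is a list of (variable k, sign C_kj) literals.
clauses = [[(0, +1), (1, -1), (2, +1)],
           [(0, -1), (1, +1), (3, -1)],
           [(1, +1), (2, -1), (3, +1)]]
N = 4

def coupling(x, a):
    """K_i = sum_j K_ij with
    K_ij = -C_ij * prod_{k in I_j, k != i} (1 - C_kj x_k / sqrt(a)) / 2."""
    K = np.zeros(N)
    for clause in clauses:
        for (i, ci) in clause:
            prod = 1.0
            for (k, ck) in clause:
                if k != i:
                    prod *= (1.0 - ck * x[k] / np.sqrt(a)) / 2.0
            K[i] -= ci * prod
    return K

def euler_step(x, e, dt, p, a, beta):
    """One forward-Euler step of target-amplitude SAT-CAC."""
    K = coupling(x, a)
    return (x + dt * ((p - 1.0) * x - x**3 - e * K),
            e + dt * (-beta * e * (x**2 - a)))

# Starred (a* = 1) parameters, and the substitution with S = sqrt(a).
beta, p_star, dt_star, a = 0.25, 0.5, 0.01, 4.0
S = np.sqrt(a)
p = 1.0 - S**2 * (1.0 - p_star)  # p = 1 - S^2 (1 - p*)
dt = dt_star / S**2              # t = S^-2 t*, so dt = S^-2 dt*

rng = np.random.default_rng(1)
x_star, e_star = 0.1 * rng.standard_normal(N), np.ones(N)
x, e = S * x_star, S**3 * e_star  # x = S x*, e = S^3 e*

for _ in range(200):
    x, e = euler_step(x, e, dt, p, a, beta)
    x_star, e_star = euler_step(x_star, e_star, dt_star, p_star, 1.0, beta)

match = np.allclose(x, S * x_star) and np.allclose(e, S**3 * e_star)
```

Because the substitution is an exact symmetry of the equations (and survives the Euler discretization step by step), `match` comes out `True` to within floating-point tolerance.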

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data Availability

The data used for the results presented in this work are available upon request from the authors.

References

1. A. Lucas, “Ising formulations of many NP problems,” Front. Phys. 2, 00005 (2014). [CrossRef]  

2. S. G. Brush, “History of the Lenz-Ising model,” Rev. Mod. Phys. 39, 883–893 (1967). [CrossRef]  

3. F. Barahona, “On the computational complexity of Ising spin glass models,” J. Phys. A: Math. Gen. 15, 3241–3253 (1982). [CrossRef]  

4. M. Johnson, M. Amin, S. Gildert, et al., “Quantum annealing with manufactured spins,” Nature 473, 194–198 (2011). [CrossRef]  

5. S. Boixo, T. F. Rønnow, S. Isakov, Z. Wang, D. Wecker, D. Lidar, J. Martinis, and M. Troyer, “Quantum annealing with more than one hundred qubits,” Nat. Phys. 10, 218–224 (2014). [CrossRef]  

6. A. Marandi, Z. Wang, K. Takata, R. Byer, and Y. Yamamoto, “Network of time-multiplexed optical parametric oscillators as a coherent Ising machine,” Nat. Photonics 8, 937–942 (2014). [CrossRef]  

7. P. L. McMahon, A. Marandi, Y. Haribara, R. Hamerly, C. Langrock, S. Tamate, T. Inagaki, H. Takesue, S. Utsunomiya, K. Aihara, R. L. Byer, M. M. Fejer, H. Mabuchi, and Y. Yamamoto, “A fully programmable 100-spin coherent Ising machine with all-to-all connections,” Science 354, 614–617 (2016). [CrossRef]  

8. T. Inagaki, Y. Haribara, K. Igarashi, T. Sonobe, S. Tamate, T. Honjo, A. Marandi, P. L. McMahon, T. Umeki, K. Enbutsu, O. Tadanaga, H. Takenouchi, K.-i. Aihara, K. Kawarabayashi, K. Inoue, S. Utsunomiya, and H. Takesue, “A coherent Ising machine for 2000-node optimization problems,” Science 354, 603–606 (2016). [CrossRef]  

9. T. Inagaki, K. Inaba, R. Hamerly, K. Inoue, Y. Yamamoto, and H. Takesue, “Large-scale Ising spin network based on degenerate optical parametric oscillators,” Nat. Photonics 10, 415–419 (2016). [CrossRef]  

10. I. Mahboob, H. Okamoto, and H. Yamaguchi, “An electromechanical Ising Hamiltonian,” Sci. Adv. 2, e1600236 (2016). [CrossRef]  

11. T. Wang and J. Roychowdhury, “OIM: oscillator-based Ising machines for solving combinatorial optimisation problems,” in Unconventional Computation and Natural Computation, I. McQuillan and S. Seki, eds. (Springer International Publishing, 2019), pp. 232–256.

12. D. Pierangeli, G. Marcucci, and C. Conti, “Large-scale photonic Ising machine by spatial light modulation,” Phys. Rev. Lett. 122, 213902 (2019). [CrossRef]  

13. Y. Okawachi, M. Yu, J. K. Jang, X. Ji, Y. Zhao, B. Kim, M. Lipson, and A. Gaeta, “Demonstration of chip-based coupled degenerate optical parametric oscillators for realizing a nanophotonic spin-glass,” Nat. Commun. 11, 4119 (2020). [CrossRef]  

14. J. Chou, S. Bramhavar, S. Ghosh, and W. Herzog, “Analog coupled oscillator based weighted Ising machine,” Sci. Rep. 9, 14786 (2019). [CrossRef]  

15. F. Cai, S. Kumar, T. Vaerenbergh, X. Sheng, R. Liu, C. Li, Z. Liu, M. Foltin, S. Yu, Q. Xia, J. J. Yang, R. Beausoleil, W. Lu, and J. W. Strachan, “Power-efficient combinatorial optimization using intrinsic noise in memristor Hopfield neural networks,” Nat. Electron. 3, 409–418 (2020). [CrossRef]  

16. F. Molnár, S. R. Kharel, X. S. Hu, and Z. Toroczkai, “Accelerating a continuous-time analog SAT solver using GPUs,” Comput. Phys. Commun. 256, 107469 (2020). [CrossRef]  

17. B. Molnár, F. Molnár, M. Varga, Z. Toroczkai, and M. Ercsey-Ravasz, “A continuous-time MaxSAT solver with high analog performance,” Nat. Commun. 9, 4864 (2018). [CrossRef]  

18. X. Yin, B. Sedighi, M. Varga, M. Ercsey-Ravasz, and Z. Toroczkai, “Efficient analog circuits for Boolean satisfiability,” IEEE Trans. on Very Large Scale Integr. (VLSI) Syst. 26, 155–167 (2018). [CrossRef]  

19. R. Sumi, M. Varga, Z. Toroczkai, and M. Ercsey-Ravasz, “Order-to-chaos transition in the hardness of random Boolean satisfiability problems,” Phys. Rev. E 93, 052211 (2016). [CrossRef]  

20. M. Ercsey-Ravasz and Z. Toroczkai, “The chaos within Sudoku,” Sci. Rep. 2, 725 (2012). [CrossRef]  

21. M. Ercsey-Ravasz and Z. Toroczkai, “Optimization hardness as transient chaos in an analog approach to constraint satisfaction,” Nat. Phys. 7, 966–970 (2011). [CrossRef]  

22. T. Leleu, Y. Yamamoto, P. L. McMahon, and K. Aihara, “Destabilization of local minima in analog spin systems by correction of amplitude heterogeneity,” Phys. Rev. Lett. 122, 040607 (2019). [CrossRef]  

23. S. Reifenstein, S. Kako, F. Khoyratee, T. Leleu, and Y. Yamamoto, “Coherent Ising machines with optical error correction circuits,” Adv. Quantum Technol. 4, 2100077 (2021). [CrossRef]  

24. T. McKenna, H. Stokowski, V. Ansari, J. Mishra, M. Jankowski, C. Sarabalis, J. Herrmann, C. Langrock, M. Fejer, and A. Safavi-Naeini, “Ultra-low-power second-order nonlinear optics on a chip,” Nat. Commun. 13, 4532 (2022). [CrossRef]  

25. G. S. Tseitin, On the Complexity of Derivation in Propositional Calculus (Springer, 1983), pp. 466–483.

26. L. K. Grover, “A fast quantum mechanical algorithm for database search,” in Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, STOC ’96 (Association for Computing Machinery, New York, NY, USA, 1996), p. 212–219.

27. G. Brassard, P. Høyer, and A. Tapp, “Quantum counting,” in Automata, Languages and Programming, K. G. Larsen, S. Skyum, and G. Winskel, eds. (Springer, 1998), pp. 820–831.

28. Z. Mu and H. H. Hoos, “On the empirical time complexity of random 3-SAT at the phase transition,” in IJCAI (2015).

29. E. Farhi, J. Goldstone, S. Gutmann, J. Lapan, A. Lundgren, and D. Preda, “A quantum adiabatic evolution algorithm applied to random instances of an NP-complete problem,” Science 292, 472–475 (2001). [CrossRef]  

30. S. Santra, G. Quiroz, G. V. Steeg, and D. A. Lidar, “Max 2-SAT with up to 108 qubits,” New J. Phys. 16, 045006 (2014). [CrossRef]  

31. P. Hauke, H. G. Katzgraber, W. Lechner, H. Nishimori, and W. D. Oliver, “Perspectives of quantum annealing: methods and implementations,” Rep. Prog. Phys. 83, 054401 (2020). [CrossRef]  

32. R. Hamerly, T. Inagaki, P. L. McMahon, et al., “Experimental investigation of performance differences between coherent Ising machines and a quantum annealer,” Sci. Adv. 5, eaau0823 (2019). [CrossRef]  

33. S. Utsunomiya, K. Takata, and Y. Yamamoto, “Mapping of Ising models onto injection-locked laser systems,” Opt. Express 19, 18091–18108 (2011). [CrossRef]  

34. Z. Wang, A. Marandi, K. Wen, R. L. Byer, and Y. Yamamoto, “Coherent Ising machine based on degenerate optical parametric oscillators,” Phys. Rev. A 88, 063853 (2013). [CrossRef]  

35. T. Honjo, T. Sonobe, K. Inaba, T. Inagaki, T. Ikuta, Y. Yamada, T. Kazama, K. Enbutsu, T. Umeki, R. Kasahara, K.-i Kawarabayashi, and H. Takesue, “100,000-spin coherent Ising machine,” Sci. Adv. 7, eabh0952 (2021). [CrossRef]  

36. R. Hamerly, L. Bernstein, A. Sludds, M. Soljačić, and D. Englund, “Large-scale optical neural networks based on photoelectric multiplication,” Phys. Rev. X 9, 021032 (2019). [CrossRef]  

37. F. Leo, S. Coen, P. Kockaert, S.-P. Gorza, P. Emplit, and M. Haelterman, “Temporal cavity solitons in one-dimensional kerr media as bits in an all-optical buffer,” Nat. Photonics 4, 471–476 (2010). [CrossRef]  

38. E. Tiunov, A. Ulanov, and A. Lvovsky, “Annealing by simulating the coherent Ising machine,” Opt. Express 27, 10288–10295 (2019). [CrossRef]  

39. T. Leleu, F. Khoyratee, T. Levi, R. Hamerly, T. Kohno, and K. Aihara, “Scaling advantage of nonrelaxational dynamics for high-performance combinatorial optimization,” arXiv, arXiv:2009.04084 (2020). [CrossRef]  

40. S. K. Vadlamani, T. P. Xiao, and E. Yablonovitch, “Equivalence of coupled parametric oscillator dynamics to lagrange multiplier primal-dual optimization,” arXiv, arXiv:2204.02472 (2022). [CrossRef]  

41. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science 220, 671–680 (1983). [CrossRef]  

42. B. Selman, H. Kautz, and B. Cohen, “Noise strategies for improving local search,” in Proceedings of the National Conference on Artificial Intelligence (AAAI), Vol. 1 (1994).

43. A. Balint and U. Schöning, “Choosing probability distributions for stochastic local search and the role of make versus break,” in Theory and Applications of Satisfiability Testing – SAT 2012 (Springer, 2012), pp. 16–29.

44. M. Heule, M. Järvisalo, M. Suda, M. Iser, and T. Balyo, “The international SAT competition web page,” http://www.satcompetition.org/.

45. H. Goto, K. Endo, M. Suzuki, Y. Sakai, T. Kanao, Y. Hamakawa, R. Hidaka, M. Yamasaki, and K. Tatsumura, “High-performance combinatorial optimization based on classical mechanics,” Sci. Adv. 7, eabe7953 (2021). [CrossRef]  

46. M. Aramon, G. Rosenberg, E. Valiante, T. Miyazawa, H. Tamura, and H. G. Katzgraber, “Physics-inspired optimization for quadratic unconstrained problems using a digital annealer,” Front. Phys. 7, 48 (2019). [CrossRef]  

47. S. Patel, P. Canoza, and S. Salahuddin, “Logically synthesized and hardware-accelerated restricted Boltzmann machines for combinatorial optimization and integer factorization,” Nat. Electron. 5, 92–101 (2022). [CrossRef]  

48. I. Skliarova and A. de Brito Ferrari, “Reconfigurable hardware SAT solvers: a survey of systems,” IEEE Trans. Comput. 53, 1449–1461 (2004). [CrossRef]  

49. F. Schwierz and J. J. Liou, “Status and future prospects of CMOS scaling and Moore’s law - a personal perspective,” in 2020 IEEE Latin America Electron Devices Conference (LAEDC) (2020), pp. 1–4.

50. R. Hamerly, A. Sludds, L. Bernstein, M. Prabhu, C. Roques-Carmes, J. Carolan, Y. Yamamoto, M. Soljacic, and D. Englund, “Towards large-scale photonic neural-network accelerators,” in 2019 IEEE International Electron Devices Meeting (IEDM) (2019), pp. 22.8.1–22.8.4.

51. C. Roques-Carmes, Y. Shen, C. Zanoci, M. Prabhu, F. Atieh, L. Jing, T. Dubcek, C. Mao, M. Johnson, V. Ceperic, J. Joannopoulos, D. Englund, and M. Soljacic, “Heuristic recurrent algorithms for photonic Ising machines,” Nat. Commun. 11, 249 (2020). [CrossRef]  

52. M. Prabhu, C. Roques-Carmes, Y. Shen, N. Harris, L. Jing, J. Carolan, R. Hamerly, T. Baehr-Jones, M. Hochberg, V. Čeperić, J. D. Joannopoulos, D. R. Englund, and M. Soljačić, “Accelerating recurrent Ising machines in photonic integrated circuits,” Optica 7, 551–558 (2020). [CrossRef]  

53. M. Calvanese Strinati, D. Pierangeli, and C. Conti, “All-optical scalable spatial coherent Ising machine,” Phys. Rev. Appl. 16, 054022 (2021). [CrossRef]  

54. W. Moy, I. Ahmed, P.-w. Chiu, J. Moy, S. Sapatnekar, and C. Kim, “A 1,968-node coupled ring oscillator circuit for combinatorial optimization problem solving,” Nat. Electron. 5, 310–317 (2022). [CrossRef]  

55. K. P. Kalinin, A. Amo, J. Bloch, and N. G. Berloff, “Polaritonic XY-Ising machine,” Nanophotonics 9, 4127–4138 (2020). [CrossRef]  

56. M. Babaeian, D. T. Nguyen, V. Demir, M. Akbulut, P.-A. Blanche, Y. Kaneda, S. Guha, M. A. Neifeld, and N. Peyghambarian, “A single shot coherent Ising machine based on a network of injection-locked multicore fiber lasers,” Nat. Commun. 10, 3516 (2019). [CrossRef]  

57. H. Yamashita, H. Suzuki, Z. Toroczkai, and K. Aihara, “Bounded continuous-time satisfiability solver,” in International Symposium on Nonlinear Theory and Its Applications (NOLTA2019).

58. H. Yamashita, K. Aihara, and H. Suzuki, “Accelerating numerical simulation of continuous-time Boolean satisfiability solver using discrete gradient,” Commun. Nonlinear Sci. Numer. Simul. 102, 105908 (2021). [CrossRef]  

59. S. Reifenstein, T. Leleu, T. McKenna, M. Jankowski, M.-G. Suh, E. Ng, F. Khoyratee, Z. Toroczkai, and Y. Yamamoto, “2018 SAT competition solvers,” GitHub (2022), https://github.com/satcompetition/2018/tree/master/solvers.

60. M. Mézard and R. Zecchina, “Random k-satisfiability problem: from an analytic solution to an efficient algorithm,” Phys. Rev. E 66, 056126 (2002). [CrossRef]  

61. M. J. H. Heule, M. Järvisalo, and M. Suda, “SAT competition 2018,” J. on Satisf. Boolean Model. Comput. 11, 133–154 (2019). [CrossRef]  

62. A. Ignatiev, A. Morgado, and J. Marques-Silva, “RC2: an efficient MaxSAT solver,” J. on Satisf. Boolean Model. Comput. 11, 53–64 (2019). [CrossRef]  

63. Y. Yamamoto, K. Aihara, T. Leleu, K.-i. Kawarabayashi, S. Kako, M. Fejer, K. Inoue, and H. Takesue, “Coherent Ising machines—optical neural networks operating at the quantum limit,” npj Quantum Inf. 3, 49 (2017). [CrossRef]  

64. M. Xu, Y. Zhu, F. Pittalà, J. Tang, M. He, W. C. Ng, J. Wang, Z. Ruan, X. Tang, M. Kuschnerov, L. Liu, S. Yu, B. Zheng, and X. Cai, “Dual-polarization thin-film lithium niobate in-phase quadrature modulators for terabit-per-second transmission,” Optica 9, 61–62 (2022). [CrossRef]  

65. M. Jankowski, J. Mishra, and M. Fejer, “Dispersion-engineered χ(2) nanophotonics: a flexible tool for nonclassical light,” JPhys Photonics 3, 042005 (2021). [CrossRef]  


Sam Reifenstein is an intern researcher at NTT Research, Inc. He is interested in understanding how continuous-time dynamical systems such as the coherent Ising machine can speed up the solution of combinatorial optimization problems and how these new techniques can be applied to real-world problems. He is currently a master’s student at University of California, Santa Cruz, in applied math, where he was awarded his undergraduate degree in 2022.


Timothee Leleu is a project associate professor at the International Research Center for Neurointelligence at The University of Tokyo. His research focuses on the mechanisms of neural computation from the applied physics perspective. He is also interested in developing fast, scalable, and reconfigurable simulation platforms for the coherent Ising machine, a type of optical neural network that can solve combinatorial optimization problems.


Tim McKenna enjoys pushing the limits of on-chip integrated photonic systems to perform tasks in the areas of computing, communications, and sensing. He focuses on combining nonlinear optics with quantum information science and advances in nanofabrication to further the state-of-the-art. He holds a BS in electrical engineering from the University of Pennsylvania and an MS and PhD in electrical engineering from Stanford University.


Marc Jankowski is a senior research scientist at NTT PHI lab who looks at the intersection of nonlinear dynamics with optical systems, particularly nanophotonic devices. He received his PhD in electrical engineering from Stanford in 2020. His work with NTT PHI Lab addresses the following questions: What are the ultimate limits, in terms of size and energy requirements, of nonlinear photonic devices? What dynamical regimes are still undiscovered? How can these nonlinear devices and processes be used to enable new technologies, such as quantum and classical light sources?


Myoung-Gyun Suh is a senior scientist at NTT Research, Inc. His primary research interest is in optical precision measurements and information processing systems. He has focused on developing innovative chip-based optical sources and exploring their applications to precision spectroscopy, astronomy, optical communication, and computation. Recently, he has been studying integrated nonlinear optics, optical frequency combs, and neuromorphic computing. He earned a BS in physics from Korea Advanced Institute of Science and Technology in 2004, an MS in physics from National Taiwan University in 2006, and a PhD in applied physics from the California Institute of Technology in 2017. For his earlier research, he was fascinated by light–matter interaction phenomena in photonic crystal structures and studied two-dimensional photonic crystal lasers for his BS and MS degrees. After his MS, he worked at Samsung Advanced Institute of Technology from 2006 to 2011, where he developed high-efficiency gallium nitride light-emitting diodes and III–V multi-junction solar cells. He has been the recipient of a Taiwan scholarship (2004–2006), Kwanjeong scholarship (2011–2015), and OSA Paul F. Forman team engineering excellence award (2020).


Edwin Ng is a research scientist at NTT PHI lab. He did his PhD in the group of Hideo Mabuchi at Stanford University studying optical parametric oscillator networks coupled in time and frequency domains. At PHI Lab, Edwin focuses on theoretical models and experimental techniques for describing and probing the physics of high-dimensional photonic systems, like coherent Ising machines and ultrafast photonics, operating at the quantum limit.


Farad Khoyratee is a dedicated postdoctoral researcher at the Theoretical Quantum Physics Lab at Riken in Tokyo, Japan. He holds a PhD from the University of Bordeaux, where he spent years studying the interface between artificial biomimetic neural networks embedded in electronics platforms and biological neurons to study neurological diseases. During his research, he successfully employed FPGAs to overcome computational power challenges and demonstrated that such technologies can effectively handle complex algorithms. Currently, he is focused on developing unconventional technologies to accelerate computationally challenging algorithms for combinatorial optimization problems.


Zoltan Toroczkai received the PhD degree in theoretical physics from the Virginia Polytechnic Institute and State University, Blacksburg, Virginia, USA, in 1997. He was a Post-Doctoral Researcher with the Condensed Matter Physics Group, University of Maryland, College Park, Maryland, USA until 2000, after which he joined Los Alamos National Laboratory (LANL), Los Alamos, New Mexico, USA as a Director Funded Fellow. Subsequently he became a regular Research Staff Member at LANL within the Complex Systems Group and from 2004 to 2006 he was the Deputy Director of the Center for Nonlinear Studies, LANL. In 2006, he joined the Department of Physics, University of Notre Dame, Notre Dame, Indiana, USA, where he is currently a Professor and a Concurrent Professor with the Department of Computer Science and Engineering. He has authored and co-authored over 100 papers in peer-reviewed publications on topics in the areas of statistical physics, nonlinear dynamical systems, fluid flows, reaction kinetics, interface growth, population dynamics, epidemics, agent-based systems and game theory, complex networks, discrete mathematics, foundations of computing, and brain neuronal networks. Dr. Toroczkai has been on the editorial board at the European Journal of Physics B, Scientific Reports, Entropy, and Chaos and an Associate Editor of Network Science. He is an APS Fellow (since 2012) upon nomination by GSNP “[f]or his contributions to the understanding of the statistical physics of complex systems, and in particular for his discoveries pertaining to the structure and dynamics of complex networks.”


Yoshihisa Yamamoto is the Director of PHI (Physics and Informatics) Laboratories, NTT Research, Inc. He received his B.S. degree from Tokyo Institute of Technology and his Ph.D. degree from the University of Tokyo in 1973 and 1978, respectively, and joined NTT Basic Research Laboratories in 1978. He became a Professor of Applied Physics and Electrical Engineering at Stanford University in 1992 and also a Professor at National Institute of Informatics (NII) in 2003. He is currently a Professor (Emeritus) at Stanford University and NII. His past research areas include coherent communications, squeezed states, quantum non-demolition measurements, exciton–polariton BEC, single photon and spin-photon entanglement generation, and mesoscopic transport noise. He has received many distinctions for his past work, including a Carl Zeiss Award (1992), Nishina Memorial Prize (1992), IEEE/LEOS Quantum Electronics Award (2000), Medal with Purple Ribbon (2005), Hermann A. Haus Lecturer of MIT (2010), Okawa Prize (2011), and Willis E. Lamb Award (2022). His current research interest focuses on quantum information processing, physics of quantum-to-classical transition, and coherent Ising machines.



Figures (24)

Figure 1. (a) Quantum circuit to implement one step of Grover’s search algorithm for a 3-SAT problem with $N=4$ variables (represented by the top-four qubits) and $M=10$ clauses (represented by the next 10 ancillary qubits). The truth value of the CNF formula is used for a CNOT gate applied to the bottom qubit (which is in the $\lvert - \rangle$ state). This implements the $U_{w}$ operator. Combined with the $U_s$ operator, this circuit will perform a rotation of the state vector that amplifies the marked states by roughly $\sqrt {\frac {1}{2^N}}$. (b) Quantum circuit to implement quantum counting by estimating the eigenvalue of the Grover operator shown in Fig. 1(a) (denoted $G$).
Figure 2. Bifurcation diagram. The blue star and orange cross show the points $\{\mu ^*,1/(1-p)\}$ with $p = -0.6$ (stable case) and $p = -2.6$ (unstable case), respectively. Here $\mu ^*$ is the maximal eigenvalue of $\tilde {\Omega }$, i.e., $\mu ^* = \text {max}_j \mu _j$ for a satisfiable solution $\sigma$. The horizontal coordinate of this figure is associated with a specific fixed point/local minima for a given problem, whereas the vertical coordinate corresponds to the parameter $p$ (technically $\theta = 1/(1-p)$). The green curve indicates the value of $p$ for which a given fixed point will transition from being unstable to stable. This is the theoretical reasoning behind why modulating $p$ (or, equivalently, modulating $\theta$) over time can create an “annealing” effect.
Figure 3. Trajectory in the case of a stable satisfiable solution, $p=-0.6$.
Figure 4. Trajectory in the case of an unstable satisfiable solution, $p=-2.6$.
Figure 5. (a) Median TTS of CIM-CAC and SAT-CFC on random 3-SAT with respect to clause-to-variable ratio $\alpha = M/N$ with the problem size fixed at $N=40$. (b) Median TTS of CIM-CAC and SAT-CFC on random 3-SAT with respect to problem size $N$ with the clause-to-variable ratio fixed at $\alpha = 3.8$. TTS is expressed in the units of integration time steps for both methods (or, equivalently, round trips on a physical machine).
Figure 6. (a),(b) Median TTS as a function of clause-to-variable ratio $\alpha = M/N$ at problem sizes $N=100$ and $N=1000$, respectively. For each value of $\alpha$, a set of 25 randomly generated instances is created and the median is evaluated over the instances that were solved by any of the four solvers. (c),(d) Median TTS as a function of problem size $N$ with the clause-to-variable ratio fixed at $\alpha = 3.4$ and $\alpha = 3.8$, respectively. The median is evaluated over a set of 25 randomly generated instances per problem size.
Figure 7. (a) Median TTS as a function of problem size $N$ with the clause-to-variable ratio fixed at $\alpha = 4.25$. The median is evaluated over a set of 25 randomly generated instances per problem size. As $\alpha = 4.25$ is very close to the satisfiability threshold, many of the generated instances were not satisfiable; thus we took the median over the instances that were solved by any of the four solvers. (b) Fraction of problem instances solved by any of the solvers at $\alpha = 4.25$ (out of the 25 random instances generated). Although the fraction decreases with problem size, this is expected because the location of the phase transition point is slightly larger for smaller values of $N$ [28]. Because the fraction of solved problems stays above 0.5, this tells us at the very least that the solvers solved most of the satisfiable problems. We believe it is likely that we have solved nearly all of the satisfiable problems, so the measured median TTS is close to the true median.
Figure 8. Median wall-clock TTS (seconds) for a GPU implementation of differential solvers (SAT-CFC, M-CTDS, and CTDS) and for a CPU implementation of ProbSAT. The GPU used for this plot is the Tesla V100, and ProbSAT was run on an AWS c6i instance which uses third-generation Intel Xeon Scalable processors. (Note that the time for ProbSAT is calculated for a single thread even though multiple threads can be run in parallel on the instance.)
Figure 9. Scatter plot showing TTS for both SAT-CFC (vertical axis) and ProbSAT (horizontal axis) for 50 randomly generated 3-SAT instances at problem size $N=1000$ with $\alpha = 4.25$. Unsolved instances are plotted on the dotted gray lines (with a default TTS of $10^{11}$). The units for TTS are integration time steps for SAT-CFC and spin flips for ProbSAT.
Figure 10. Median, 75th percentile, and 90th percentile TTS of SAT-CFC on Max-SAT at problem size $N=50$. The TTS is measured in the units of integration time steps.
Figure 11. DOPO device implemented using a thin-film lithium niobate (LiNbO$_3$) waveguide [24].
Figure 12. (a) Squeezed vacuum state in a DOPO device below threshold. (b) Two nearly orthogonal coherent states with 0 and $\pi$ phase [63]. (c) Pitchfork bifurcation of 2000 DOPOs [8].
Figure 13. (a) Optical delay line coupled CIM. (b) Measurement feedback coupled CIM [63].
Figure 14. (a) Probability distribution of highest occurrences $M'$ of any variable as a function of $N$ (for $\alpha = M/N = 4.25$ and $k = 3$). Solid line: the scaling of $\langle M'\rangle$. Dashed lines: the scaling of the minimum and maximum $M'$ observed in 1000 instances. Note that the fit indicates that $\langle M'\rangle \approx 12 + 7 \log \log N$. (b) Histogram of probability distribution for $M'$ at $N = 1000$. The distribution of $M'$ is strongly peaked around $\langle M'\rangle$, and even a cutoff of $M'_\textrm {max} = 40$ provides sufficient memory for the optical encoding of any realistic problem instance.
Figure 15. (a) Calculation of the mutual coupling terms $K_{ij'}$ using a coherent receiver. (b) Summation over the index $j'$ is performed using an analog integrator, with the resulting voltage digitized and buffered for the FPGA. (c) High-level circuit layout for the signal processing in the supporting FPGA to update $x_i$ and $e_i$ and feed forward into the optical circuit.
Figure 16. (a) Parallelism may be achieved by fanning out the input laser into multiple channels. In this case, each channel contains two amplitude modulators, a push–pull phase modulator, and a balanced homodyne receiver. For the case of $N_\mathrm {ch}=2$, we may choose either $i\in \left [1,N/2\right ]$ and $h\in \left [N/2 + 1, N\right ]$ or $i \in Z_e$ and $h \in Z_o$, where $Z_e$ and $Z_o$ refer to the even and odd integers, respectively.
Figure 17. (a) Variables $x_i$ are stored in an optical DRAM and read out into the optical encoding described in Section 8.2. These outputs are prepared using either offset circuits or SHG for the optical compute core and error correction circuit. (b) The error correction circuit interferes with the input second harmonic of each $x_i$ with the pump laser to realize Eq. (39). (c) The optical compute core performs a sequence of element-wise products using sum-frequency generation (SFG) and difference-frequency generation (DFG) and vector–vector inner products (MACs) to calculate the feedback terms given by $e_iz_i$.
Figure 18. (a) Optical memory based on a synchronously pumped OPO. Copies of $x_i$ may be read to a bus waveguide, and feedback pulses may be injected into the OPO using SFG and DFG, respectively. (b) Example pulse sequence for reading copies of $x_i$ into the waveguide and injecting feedback from the waveguide into the OPO. In these diagrams, the rightmost pulse enters the resonator first, and therefore operations are read from right to left.
Figure 19. (a) Optical DRAM based on many synchronously pumped degenerate OPOs coupled to a common bus waveguide. (b) Example pulse sequence for reading copies of each $x_i$ to the waveguide and injecting feedback into each OPO.
Figure 20. (a),(b) Optical vector–vector inner products with outputs at short and long wavelengths, respectively. (c),(d) Example pulse sequence for each approach. We note here that a vector of ones can be used as an input to perform accumulation (ACC). This can be used to integrate an optical signal over many pulses without incurring waveform distortion.
Figure 21. (a) For 3-SAT, three separate copies of $x_i$ are stored and addressed in parallel across the optical DRAM. (b) Two copies are used to calculate each pair of subterms, and one copy is used to calculate the error correction terms $e_i$. Mutual injection between the copies of $x_i$ helps to prevent inconsistencies from forming.
Figure 22. (a) TTS of M-CTDS plotted as a function of $T$ for a specific 3-SAT instance and different values of the parameter $\xi$. The TTS has a sharp minimum at a specific value of $T$. (b) Plot of $T_{\rm opt}$ against $\xi$ for a number of instances (of the same problem size), indicating that $T_{\rm opt}$ only depends on $\xi$.
Figure 23. Median TTS of best fixed parameters and best annealed parameters for SAT-CFC. Data was not collected for fixed parameters after $N=600$.
Figure 24. Median TTS of SAT-CAC (shown in blue) is nearly identical to that of SAT-CFC.

Tables (21)

Table 1. Summary of the Pros and Cons of the Various Approaches to Solving Boolean SAT Discussed in This Section
Table 2. Qualitative Similarities and Differences between K-SAT Solvers
Table 3. Example Trajectories ($x_i$, $e_i$, and $a_j$) for the Three Solvers Proposed Here
Table 4. Performance of Solvers on Random 5-SAT Problems from the 2018 SAT Competition
Table 5. Electronic Latency and Power Consumption for the Hybrid Optoelectronic Scheme
Table 6. Power Consumption and Latency of Each Component of the Hybrid Architecture
Table 7. Time-to-Solution and Energy-to-Solution Estimates for Hybrid Feed-Forward Architectures ($N=1000$, $M=4.25N$)
Table 8. Factors Involved in Calculating the Power Consumption of the Fully Optical System
Table 9. Factors Involved in Calculating the Time for One Round Trip of the Fully Optical System
Table 10. ETS and TTS Estimates for All-Optical Architecture ($N=1000$, $M=4.25N$)
Table 11. Estimated Time and Energy Per Iteration for Each Architecture as well as Wall-Clock TTS and ETS
Table 12. TTS and $p_{\rm opt}$ of SAT-CFC for Fig. 6(b)
Table 13. TTS and $\xi_{\rm opt}$ of M-CTDS for Fig. 6(b)
Table 14. TTS and $p_{\rm opt}$ of SAT-CFC for Fig. 6(c)
Table 15. TTS and $p_{\rm opt}$ of SAT-CFC for Fig. 6(d)
Table 16. TTS and $\xi_{\rm opt}$ of M-CTDS for Fig. 6(d)
Table 17. Parameters Used for Each Parameter Selection Method
Table 18. TTS and $p_{\rm opt}$ of SAT-CFC with Fixed Parameters for Fig. 7
Table 19. TTS and $p_{\rm opt}$ of SAT-CFC with Annealed Parameters for Fig. 7
Table 20. TTS and $\xi_{\rm opt}$ of M-CTDS for Fig. 7
Table 21. Ising Energies for Each Configuration of Spins, Demonstrating That the Ising Energy Matches the SAT Potential When the Ancillary Spin Is Chosen to Minimize the Energy

Equations (82)


$(x_1 \vee \bar{x}_2 \vee x_4) \wedge (x_2 \vee \bar{x}_3 \vee \bar{x}_4).$

$C_{ij} = \begin{cases} +1 & x_i \text{ is included in the } j\text{th clause}, \\ -1 & \bar{x}_i \text{ is included in the } j\text{th clause}, \\ 0 & \text{the } i\text{th variable is not included in the } j\text{th clause}. \end{cases}$

$H_{\rm Ising} = -(1-s)\sum_i \sigma_x^i - s\,\frac{1}{2}\sum_{ij} J_{ij}\,\sigma_z^i \sigma_z^j,$

$H_{\rm SAT} = -(1-s)\sum_i \sigma_x^i + s\sum_j \Big(\sum_{k\in I_j} C_{kj}\,\sigma_z^k\Big)^2.$

$\frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 + \sum_j J_{ij} x_j(t).$

$K_j = \prod_{k\in I_j} \frac{1 - C_{kj}x_k}{2}.$

$\sum_j K_j = 0, \quad x_i \in [-1,1],$

$\sum_j K_j^2 = 0, \quad x_i \in \mathbb{R},$

$\text{minimize } \sum_j K_j \quad \text{subject to } x_i \in [-1,1],$

$\text{minimize } \sum_j K_j^2, \quad x_i \in \mathbb{R}.$
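The clause matrix $C_{ij}$ and the clause terms $K_j$ above are straightforward to evaluate numerically. The following sketch (a minimal illustration, not the paper's implementation; the signed-literal clause encoding is an assumption) builds $C$ for the two-clause example formula and checks that $K_j$ vanishes exactly on satisfied clauses:

```python
import numpy as np

# Clauses of (x1 v ~x2 v x4) ^ (x2 v ~x3 v ~x4); +k means x_k, -k means ~x_k (1-based).
clauses = [[+1, -2, +4], [+2, -3, -4]]
N, M = 4, len(clauses)

# C[i, j] = +1, -1, or 0, exactly as in the definition of C_ij above.
C = np.zeros((N, M))
for j, clause in enumerate(clauses):
    for lit in clause:
        C[abs(lit) - 1, j] = np.sign(lit)

def K(x):
    """K_j = prod_{k in I_j} (1 - C_kj x_k)/2; zero iff clause j is satisfied at x = +/-1."""
    factors = np.where(C.T != 0, (1.0 - C.T * x) / 2.0, 1.0)  # absent variables contribute 1
    return factors.prod(axis=1)

x = np.array([1.0, 1.0, -1.0, -1.0])    # x1 = x2 = True, x3 = x4 = False
print(K(x))                             # -> [0. 0.]  (both clauses satisfied)
x_bad = np.array([-1.0, 1.0, 1.0, -1.0])
print(K(x_bad))                         # -> [1. 0.]  (first clause violated)
```

At $x_i = \pm 1$ each factor is 0 or 1, so $K_j$ acts as a violation indicator; for $x_i$ in the interior of $[-1,1]$ it interpolates smoothly, which is what the continuous-time solvers exploit.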
$\frac{dx_i(t)}{dt} = \sum_j a_j(t)\, K_j(t)\, K_{ij}(t),$

$\frac{da_j(t)}{dt} = a_j(t)\, K_j(t)^2,$

$x_i \in [-1,1], \quad a_j \in (0,\infty),$

$K_{ij} = C_{ij} \prod_{k\in I_j,\, k\neq i} \frac{1 - C_{kj}x_k}{2},$

$\frac{dx_i(t)}{dt} = \sum_j a_j(t)\, K_j(t)\, K_{ij}(t),$

$\frac{da_j(t)}{dt} = \xi(t)\, a_j(t)\left(K_j(t)^2 - \frac{1}{M}\sum_m K_m(t)^2\right),$

$x_i \in [-1,1], \quad a_j \in (0,\infty).$

$\sum_j \log(a_j(t)) = \sum_j \log(a_j(0)).$

$\xi(t) = 1 - \int_0^t \frac{1}{M}\sum_m K_m(\tau)^2 \, d\tau,$
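The basic CTDS flow can be integrated with a simple forward-Euler loop. The sketch below (step size, iteration count, and initial point are illustrative choices, not values from the paper) runs the $\dot{x}_i = \sum_j a_j K_j K_{ij}$, $\dot{a}_j = a_j K_j^2$ dynamics on the two-clause example and checks that the discretized spins satisfy the formula:

```python
import numpy as np

# Instance (x1 v ~x2 v x4) ^ (x2 v ~x3 v ~x4) encoded as C[i, j] in {-1, 0, +1}.
C = np.zeros((4, 2))
for j, clause in enumerate([[+1, -2, +4], [+2, -3, -4]]):
    for lit in clause:
        C[abs(lit) - 1, j] = np.sign(lit)
N, M = C.shape

def K_terms(x):
    """Return K_j and K_ij as defined above (direct product form, no divisions)."""
    Kj = np.ones(M)
    Kij = np.zeros((N, M))
    for j in range(M):
        for i in range(N):
            if C[i, j] != 0:
                Kj[j] *= (1 - C[i, j] * x[i]) / 2
        for i in range(N):
            if C[i, j] != 0:
                prod = 1.0
                for k in range(N):
                    if k != i and C[k, j] != 0:
                        prod *= (1 - C[k, j] * x[k]) / 2
                Kij[i, j] = C[i, j] * prod
    return Kj, Kij

# Forward Euler for dx_i/dt = sum_j a_j K_j K_ij and da_j/dt = a_j K_j^2.
x = np.array([0.1, -0.2, 0.3, -0.4])    # illustrative initial point
a = np.ones(M)
dt = 0.05
for _ in range(2000):
    Kj, Kij = K_terms(x)
    x = np.clip(x + dt * (Kij * (a * Kj)).sum(axis=1), -1, 1)
    a = a * (1 + dt * Kj**2)            # a_j grows while clause j is unsatisfied
sigma = np.sign(x)
print(sigma, K_terms(sigma)[0])         # discretized spins satisfy both clauses (K_j = 0)
```

The clipping to $[-1,1]$ enforces the box constraint of the CTDS formulation; the auxiliary weights $a_j$ grow monotonically on unsatisfied clauses, which is exactly what the M-CTDS normalization above is designed to tame.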
$\frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 + e_i(t)\sum_j J_{ij}x_j(t),$

$\frac{de_i(t)}{dt} = -\beta e_i(t)\left(x_i(t)^2 - \alpha\right).$

$f_i(t) = e_i(t)\sum_j J_{ij}x_j(t),$

$\frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 + f_i(t),$

$\frac{de_i(t)}{dt} = -\beta e_i(t)\left(f_i(t)^2 - \alpha\right).$

$\frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 + e_i(t)\sum_j K_{ij}(t),$

$\frac{de_i(t)}{dt} = -\beta e_i(t)\left(x_i(t)^2 - 1\right),$

$f_i(t) = e_i\sum_j K_{ij}(t),$

$\frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 + f_i(t),$

$\frac{de_i(t)}{dt} = -\beta e_i(t)\left(f_i(t)^2 - (p-2)^2\right),$
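A single discrete-time update of SAT-CFC is compact: compute $z_i = \sum_j K_{ij}$, form the modulated feedback $f_i = e_i z_i$, and Euler-step both $x_i$ and $e_i$. The sketch below is illustrative only (the parameter values $p$, $\beta$, and the step size are placeholders, not tuned values from the paper):

```python
import numpy as np

# Same two-clause toy instance as before, encoded as C[i, j] in {-1, 0, +1}.
C = np.zeros((4, 2))
for j, clause in enumerate([[+1, -2, +4], [+2, -3, -4]]):
    for lit in clause:
        C[abs(lit) - 1, j] = np.sign(lit)

def z_of(x):
    """z_i = sum_j K_ij(x), the SAT analog of the Ising local field."""
    N, M = C.shape
    z = np.zeros(N)
    for j in range(M):
        for i in range(N):
            if C[i, j] != 0:
                prod = C[i, j]
                for k in range(N):
                    if k != i and C[k, j] != 0:
                        prod *= (1 - C[k, j] * x[k]) / 2
                z[i] += prod
    return z

def satcfc_step(x, e, dt=0.05, p=0.5, beta=0.2):
    """One Euler step of SAT-CFC: the error variables e_i modulate the feedback f_i = e_i z_i."""
    f = e * z_of(x)
    x_new = x + dt * ((p - 1) * x - x**3 + f)
    e_new = e + dt * (-beta * e * (f**2 - (p - 2) ** 2))
    return x_new, e_new

x, e = np.zeros(4), np.ones(4)
for _ in range(100):
    x, e = satcfc_step(x, e)
print(x, e)   # trajectory stays finite; the signs of x_i give the candidate assignment
```

Note that, unlike SAT-CAC, the error dynamics here track the feedback magnitude $f_i^2$ against its fixed-point target $(p-2)^2$ rather than the amplitude $x_i^2$, mirroring the CAC/CFC distinction for the Ising case.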
$x_i = \sigma_i \in \{-1,+1\},$

$e_i = \frac{2-p}{\left|\sum_j K_{ij}\right|}.$

$\sigma_i = \operatorname{sign}\Big(\sum_j K_{ij}\Big).$

$0 = \frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 + e_i(t)\sum_j K_{ij}(t),$

$\operatorname{sign}\left((p-1)x_i(t) - x_i(t)^3\right) = -\operatorname{sign}\Big(e_i(t)\sum_j K_{ij}(t)\Big).$

$\sigma_i = \operatorname{sign}(x_i(t)) = \operatorname{sign}\Big(\sum_j K_{ij}\Big).$

$x_i = \sigma_i,$

$e_i = \frac{2-p}{\sigma_i h_i},$

$\frac{\partial \dot{x}_i}{\partial x_j} = \delta_{ij}(p-4) + e_i\,\beta\,\omega_{ij},$

$\frac{\partial \dot{x}_i}{\partial e_j} = h_i\,\delta_{ij},$

$\frac{\partial \dot{e}_i}{\partial x_j} = -2\beta\sigma_i e_i\,\delta_{ij},$

$\frac{\partial \dot{e}_i}{\partial e_j} = 0,$

$\lambda_j^{\pm} = \begin{cases} \frac{1}{2}\left(-2 + (2-p)\mu_j \pm \sqrt{\Delta_j}\right) & \text{if } \Delta_j > 0, \\ \frac{1}{2}\left(-2 + (2-p)\mu_j \pm i\sqrt{-\Delta_j}\right) & \text{otherwise}, \end{cases}$

$\theta = G_{\pm}(\beta, \mu_j),$

$\operatorname{Re}[\lambda_j^{+}] = 0 \;\Leftrightarrow\; \begin{cases} \theta = 0 \text{ or } \theta = 1, & \text{if } \Delta_j > 0, \\ \theta = F(\mu_j(\sigma)), & \text{if } \Delta_j < 0, \end{cases}$
$z_i(t) = \sum_j K_{ij}(t),$

$K_{ij} = C_{ij}\prod_{k\in I_j,\,k\neq i} \frac{1 - C_{kj}x_k}{2}.$

$z_i = \sum_{j \in M_i} C_{ij} \prod_{k \in I_j \setminus i} \frac{1}{2}\left(1 - C_{kj}x_k\right),$

$z_i = \sum_{j'} \frac{\tilde{C}_{ij'}}{4}\Big(1 - \tilde{C}^{(1)}_{ij'}\,\tilde{x}^{(1)}_{ij'}\Big)\Big(1 - \tilde{C}^{(2)}_{ij'}\,\tilde{x}^{(2)}_{ij'}\Big),$

$(x_1 \vee \bar{x}_2 \vee x_5) \wedge (x_2 \vee x_3 \vee \bar{x}_4) \wedge (\bar{x}_1 \vee x_4 \vee x_6) \wedge (x_3 \vee x_4 \vee x_5).$

$\tilde{C} = \begin{pmatrix} +1 & -1 & 0 \\ -1 & +1 & 0 \\ +1 & +1 & 0 \\ -1 & +1 & +1 \\ +1 & +1 & 0 \\ +1 & 0 & 0 \end{pmatrix}, \quad \tilde{C}^{(1)} = \begin{pmatrix} -1 & +1 & 0 \\ +1 & +1 & 0 \\ +1 & +1 & 0 \\ +1 & -1 & +1 \\ +1 & +1 & 0 \\ -1 & 0 & 0 \end{pmatrix}, \quad \tilde{C}^{(2)} = \begin{pmatrix} +1 & +1 & 0 \\ +1 & -1 & 0 \\ -1 & +1 & 0 \\ +1 & +1 & +1 \\ -1 & +1 & 0 \\ +1 & 0 & 0 \end{pmatrix},$

$\tilde{x}^{(1)} = \begin{pmatrix} x_2 & x_4 & 0 \\ x_1 & x_3 & 0 \\ x_2 & x_4 & 0 \\ x_2 & x_1 & x_3 \\ x_1 & x_3 & 0 \\ x_1 & 0 & 0 \end{pmatrix}, \quad \tilde{x}^{(2)} = \begin{pmatrix} x_5 & x_6 & 0 \\ x_5 & x_4 & 0 \\ x_4 & x_5 & 0 \\ x_3 & x_6 & x_5 \\ x_2 & x_4 & 0 \\ x_4 & 0 & 0 \end{pmatrix}.$
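The reduced representation stores, for each variable $i$ and each clause slot $j'$, the sign $\tilde{C}_{ij'}$ of $x_i$ in that clause plus the signs and indices of its two partner literals. A sketch of how these lookup tables could be built from a clause list and used to evaluate $z_i$ (the zero-padding convention for unused slots is an assumption about the storage layout):

```python
import numpy as np

# (x1 v ~x2 v x5)(x2 v x3 v ~x4)(~x1 v x4 v x6)(x3 v x4 v x5)
clauses = [[+1, -2, +5], [+2, +3, -4], [-1, +4, +6], [+3, +4, +5]]
N, Mmax = 6, 3                          # Mmax: max number of clauses any variable appears in

Ct = np.zeros((N, Mmax), dtype=int)     # \tilde{C}
C1 = np.zeros((N, Mmax), dtype=int)     # \tilde{C}^(1): sign of first partner literal
C2 = np.zeros((N, Mmax), dtype=int)     # \tilde{C}^(2): sign of second partner literal
X1 = np.zeros((N, Mmax), dtype=int)     # variable index of first partner (1-based, 0 = unused)
X2 = np.zeros((N, Mmax), dtype=int)
slot = np.zeros(N, dtype=int)
for clause in clauses:
    for lit in clause:
        others = [l for l in clause if l != lit]   # the two partner literals, in clause order
        i, jp = abs(lit) - 1, slot[abs(lit) - 1]
        Ct[i, jp] = np.sign(lit)
        C1[i, jp], X1[i, jp] = np.sign(others[0]), abs(others[0])
        C2[i, jp], X2[i, jp] = np.sign(others[1]), abs(others[1])
        slot[i] += 1

def z(x):
    """z_i via the tabulated form; x is padded so index 0 reads 0 and unused slots (Ct = 0) vanish."""
    xp = np.concatenate(([0.0], x))
    return (Ct / 4.0 * (1 - C1 * xp[X1]) * (1 - C2 * xp[X2])).sum(axis=1)

print(z(-np.ones(N)))   # z at the all-false assignment: [-1, -1, 1, 0, 1, 0]
```

Because the tables have fixed shape $(N, M_{\max})$, the whole computation of $z_i$ becomes a dense element-wise product and row sum — precisely the access pattern that maps well onto the feed-forward optical circuit and onto a GPU.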
$I\big(t = (iM_{\max} + j)\,T_{\rm clock}\big) = 2\, s^{(1)}_{ij} s^{(2)}_{ij} \sin(\phi),$

$I\big(t = (iM_{\max} + j)\,T_{\rm clock}\big) = \tilde{K}_{ij}.$
$P = N_{\rm ch}(P_{\rm opt} + 3P_{\rm mod}) + P_{\rm elec},$

$T_{\rm iteration} = T_{\rm elec} + T_{\rm clock}M_{\max}N/N_{\rm ch},$

$E_{\rm iteration} = N_{\rm ch}P_{\rm opt}T_{\rm elec} + P_{\rm elec}T_{\rm elec} + (P_{\rm opt} + 3P_{\rm mod})T_{\rm clock}M_{\max}N + P_{\rm elec}T_{\rm clock}M_{\max}N/N_{\rm ch},$

$N_{\rm ch,opt}^2 = \frac{P_{\rm elec}T_{\rm clock}}{P_{\rm opt}T_{\rm elec}}M_{\max}N.$
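These budget formulas are easy to explore numerically. The component values below are placeholders, not the paper's measured numbers; the point of the sketch is that $E_{\rm iteration}$ has one term linear and one term inverse in $N_{\rm ch}$, so it is minimized at the $N_{\rm ch,opt}$ given above:

```python
import math

# Placeholder component values (illustrative only, not from the paper's tables).
P_opt, P_mod, P_elec = 10e-3, 50e-3, 1.0    # W
T_elec, T_clock = 100e-9, 1e-9              # s
Mmax, N = 3, 1000

def E_iter(Nch):
    """Energy per iteration as a function of the channel count N_ch."""
    return (Nch * P_opt * T_elec + P_elec * T_elec
            + (P_opt + 3 * P_mod) * T_clock * Mmax * N
            + P_elec * T_clock * Mmax * N / Nch)

Nch_opt = math.sqrt(P_elec * T_clock / (P_opt * T_elec) * Mmax * N)
print(Nch_opt)   # ~54.8 channels for these placeholder numbers
print(E_iter(Nch_opt) <= min(E_iter(Nch_opt / 2), E_iter(2 * Nch_opt)))  # True
```

The optimum balances the static optical power of extra channels against the electronic energy spent waiting out the serialized $T_{\rm clock}M_{\max}N/N_{\rm ch}$ readout.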
$\frac{de_i(t)}{dt} = -\beta e_i(t)\left(x_i^2(t) - 1\right).$

$(x_1 \vee \bar{x}_2 \vee x_5) \wedge (x_2 \vee x_3 \vee \bar{x}_4) \wedge (\bar{x}_1 \vee x_2 \vee x_4) \wedge (x_3 \vee x_4 \vee x_5).$
$\mathrm{TTS} = T\,\frac{\log(1 - 0.99)}{\log(1 - P)},$

$\mathrm{TTS}(T) = T\,\frac{\log(1 - 0.99)}{\log(1 - P(T))};$

$\mathrm{TTS}_{\rm opt} = \min_T \mathrm{TTS}(T).$
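The TTS metric combines run length with restart statistics: if a single run of length $T$ succeeds with probability $P(T)$, TTS is the total time needed to reach 99% cumulative success probability by independent restarts. A direct transcription (the success-probability data here are hypothetical):

```python
import math

def tts(T, P, target=0.99):
    """Time to reach `target` cumulative success probability with restarts of length T."""
    return T * math.log(1 - target) / math.log(1 - P)

# A run of length 10 that succeeds 50% of the time:
print(tts(10, 0.5))          # ~66.4: roughly seven restarts are needed for 99% confidence
# Optimizing over T given an empirical success curve P(T) (hypothetical data):
P_of_T = {5: 0.05, 10: 0.4, 20: 0.8, 40: 0.95}
print(min(tts(T, P) for T, P in P_of_T.items()))
```

The sharp minimum of $\mathrm{TTS}(T)$ seen in Fig. 22(a) is why the optimization over $T$ in the last equation matters: too-short runs rarely succeed, while too-long runs waste time past the point where restarting is cheaper.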
$x_0 \vee x_1 \vee x_2$

$K(x_0, x_1, x_2, a) = \frac{1}{4}\big[2 - (x_0 + x_1 + x_2) + (x_0x_1 + x_1x_2 + x_2x_0) + a(x_0 + x_1 + x_2) - a\big].$

$x_0 \vee x_1 \vee x_2.$
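Any one-ancilla clause gadget of this kind can be validated by brute force, in the spirit of Table 21: enumerate the $2^3$ spin configurations, minimize over the ancillary spin, and compare against the SAT potential of the clause. The coefficients below are one consistent choice for $x_0 \vee x_1 \vee x_2$ with $+1$ taken as true; they are an illustration, not necessarily the exact normalization used in the text:

```python
from itertools import product

def K(x0, x1, x2, a):
    """Quadratic Ising energy whose minimum over the ancilla a equals the clause's SAT potential."""
    s = x0 + x1 + x2
    q = x0 * x1 + x1 * x2 + x2 * x0
    return (2 - s + q + a * s - a) / 4

for spins in product([-1, 1], repeat=3):
    ising = min(K(*spins, a) for a in (-1, 1))
    violated = 1.0 if spins == (-1, -1, -1) else 0.0   # x0 v x1 v x2 fails only if all false
    assert ising == violated
print("min over ancilla reproduces the SAT potential")
```

The check passes because the ancilla absorbs the cubic term of the exact clause penalty $(1-x_0)(1-x_1)(1-x_2)/8$; this is the overhead (one extra spin per clause) that makes the SAT-to-Ising route less attractive than solving SAT directly.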
$\underset{\sigma}{\text{minimize}}\; E(\sigma),$
$\frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 - e_i(t)\frac{\partial V}{\partial x_i}(x),$

$\frac{de_i(t)}{dt} = -\beta e_i(t)\left(x_i(t)^2 - 1\right),$

$f_i(t) = -e_i(t)\frac{\partial V}{\partial x_i}(x),$

$\frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 + f_i(t),$

$\frac{de_i(t)}{dt} = -\beta e_i(t)\left(f_i(t)^2 - \alpha\right).$

$\frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 + e_i(t)\sum_j K_{ij}(t),$

$\frac{de_i(t)}{dt} = -\beta e_i(t)\left(x_i(t)^2 - a\right),$

$K_{ij} = C_{ij}\prod_{k\in I_j,\,k\neq i}\frac{1 - C_{kj}x_k/\sqrt{a}}{2}.$

$f_i(t) = e_i\sum_j K_{ij}(t),$

$\frac{dx_i(t)}{dt} = (p-1)x_i(t) - x_i(t)^3 + f_i(t),$

$\frac{de_i(t)}{dt} = -\beta e_i(t)\left(f_i(t)^2 - a(p-1-a)^2\right),$
$x_i' = S x_i, \quad e_i' = S^3 e_i, \quad t' = S^{-2} t,$

$p' = 1 - S^2(1-p), \quad a' = S^2 a, \quad \beta' = \beta.$

$\frac{d(Sx_i(t))}{d(S^{-2}t)} = \left(1 - \left(1 - S^2(p-1)\right)\right)Sx_i(t) - S^3 x_i(t)^3 + S^3 e_i(t)\sum_j K_{ij}'(t),$

$\frac{d(S^3 e_i(t))}{d(S^{-2}t)} = -\beta S^3 e_i(t)\left(S^2 x_i(t)^2 - S^2 a\right),$

$K_{ij}' = C_{ij}\prod_{k\in I_j,\,k\neq i}\frac{1 - C_{kj}\,S x_k/\sqrt{S^2 a}}{2},$
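The invariance expressed by this rescaling can be checked numerically: evaluating the right-hand sides at $(Sx_i, S^3e_i)$ with parameters $(p', a')$ reproduces the original right-hand sides up to overall factors $S^3$ (for $\dot{x}_i$) and $S^5$ (for $\dot{e}_i$), exactly compensating the $S^{-2}$ rescaling of time. A sketch with a random small instance and state (the instance and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

C = np.array([[1, 0], [-1, 1], [0, -1], [1, 1]], dtype=float)   # C_ij: 4 variables, 2 clauses
p, a, beta, S = 0.8, 1.0, 0.3, 1.5

def Ksum(x, a):
    """z_i = sum_j K_ij with amplitude-normalized variables x_k / sqrt(a)."""
    N, M = C.shape
    z = np.zeros(N)
    for j in range(M):
        for i in range(N):
            if C[i, j] != 0:
                prod = C[i, j]
                for k in range(N):
                    if k != i and C[k, j] != 0:
                        prod *= (1 - C[k, j] * x[k] / np.sqrt(a)) / 2
                z[i] += prod
    return z

def rhs(x, e, p, a):
    dx = (p - 1) * x - x**3 + e * Ksum(x, a)
    de = -beta * e * (x**2 - a)
    return dx, de

x = rng.uniform(-1, 1, 4)
e = rng.uniform(0.5, 1.5, 4)
dx, de = rhs(x, e, p, a)
dxp, dep = rhs(S * x, S**3 * e, 1 - S**2 * (1 - p), S**2 * a)
print(np.allclose(dxp, S**3 * dx), np.allclose(dep, S**5 * de))   # -> True True
```

The key observation is that $K_{ij}$ depends on the variables only through $x_k/\sqrt{a}$, which the transformation leaves unchanged, so all four terms of each equation pick up the same power of $S$.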