
Optical processor for a binarized neural network

Open Access

Abstract

We propose and experimentally demonstrate an optical processor for a binarized neural network (NN). The implementation of a binarized NN involves multiply-accumulate operations, which require both positive and negative weights. In the proposed processor, the positive and negative weights are realized by switching the operation of a dual-drive Mach–Zehnder modulator (DD-MZM) between two quadrature points corresponding to the two binary weights of +1 and −1, and the multiplication is also performed at the DD-MZM. The accumulation operation is realized by dispersion-induced time delays and detection at a photodetector (PD). A proof-of-concept experiment is performed: a binarized convolutional neural network (CNN) accelerated by the optical processor at a speed of 32 giga floating point operations per second (GFLOPS) is tested on two benchmark image classification tasks. The large bandwidth and parallel processing capability of the processor hold high potential for next-generation data computing.

© 2022 Optica Publishing Group

In recent years, deep neural networks (NNs) have achieved great success in a wide range of applications, including computer vision, automatic speech recognition, and natural language processing [1]. However, they also place a tremendous demand on computational resources. To meet this requirement, analog optical computing, which uses the physical characteristics of light, such as intensity and phase, and the interactions between light and optical devices, has been heavily investigated [2,3]. The inherently large bandwidth and the parallelism offered by techniques such as wavelength division multiplexing (WDM) and mode division multiplexing (MDM) can be used to boost computing capability. Specifically, for NN computing, the vector-matrix multiplication (VMM) operation can be accelerated by linear optics, and several configurations have been proposed to speed up the implementation of NN algorithms. In [4], the VMM is implemented by using cascaded acousto-optic modulator arrays. In [5,6], the VMM is implemented by time-wavelength plane manipulation and dispersed time delays. In [7], the VMM is implemented by the optical time-stretch method. In [8,9], the VMM is implemented by Mach–Zehnder interferometers (MZIs). In [10], the VMM is performed by interconnection based on group delay dispersion. Photonic VMM can also be achieved by phase-change materials [11,12] with reduced energy consumption. A key problem in using optics for VMM is the implementation of negative weights, and a few solutions have been proposed. In [4], a negative weight is realized by using an electrical switch through which positive and negative values can be selected after optical-to-electrical conversion. However, an electrical switch has limited speed and bandwidth compared with optics. In [6], negative weights are implemented based on balanced photodetection: an optical spectral shaper directs optical carriers to the two input ports of a balanced photodetector, generating positive and negative weights. However, the response time of an optical spectral shaper is several hundred milliseconds, leading to a low refresh rate. In [7], the NN is separated into two parts so that the positive and negative coefficients are calculated separately. The positive and negative coefficients are encoded in two time-stretched pulses and calculated in series; thus, the calculation speed is halved. In [8,9], coherent optics is used. Since both the phase and the intensity of light can be manipulated in coherent optics, negative and even complex weights can be achieved [9]. In contrast, incoherent optics can manipulate only the light intensity, which yields positive weights, so special designs are needed to implement negative weights. Meanwhile, NNs have achieved their unprecedented success at the cost of complexity, which hinders hardware deployment. NN compression has therefore been proposed and is widely used for memory saving and computing acceleration [13]. Among the NN compression techniques, weight binarization was introduced to simplify NN implementation [14], and it has been shown that a binarized NN can approach the performance of a full-precision NN on small datasets (e.g., MNIST, CIFAR-10) [15]. In the past, NN implementations were mainly based on digital electronics, but NNs can also be implemented with optics, which offers higher speed. In [16–18], binarized NNs were implemented based on free-space optics. Waveguide optics has also been employed to implement NNs thanks to its compact size [19,20], but no binarized NN has been demonstrated with waveguide optics.

In this Letter, we propose and experimentally demonstrate a new optical processor based on incoherent waveguide optics for a general-purpose binarized NN. The binary positive and negative weights are realized by switching the operation of a dual-drive Mach–Zehnder modulator (DD-MZM) between the quadrature points on the complementary slopes of its transfer function, and the multiplication is also performed at the DD-MZM. The accumulation operation is realized by dispersion-induced time delays and detection at a photodetector (PD). Compared with previous schemes that require a separate modulator to continuously tune the weights [4,5,7], only a single modulator is employed, which greatly simplifies the processor. In addition, since the weights are binary and their polarity is controlled via the bias voltage, a 1-bit DAC can be used, and the weight refresh rate is much higher than that of the approach using an optical spectral shaper [6]. A proof-of-concept experiment is performed: an optical processor for a binarized convolutional neural network (CNN) operating at a speed of 32 giga floating point operations per second (GFLOPS) is tested on two benchmark image classification tasks.

An NN consists of basic units called artificial neurons, which are inspired by biological neurons. An artificial neuron comprises two constituents: linear operations and a nonlinear activation function. In the linear part, the input signal is weighted and summed. The weighted sum is then sent to a module with a nonlinear activation function to generate the output of the neuron. To simplify the implementation, binary weights are introduced to replace the full-precision weights. A simple binarization operation is given by the sign function, i.e., wb = +1 if w ≥ 0 and wb = −1 if w < 0 [15]. To achieve binarized positive and negative weights in the optical domain, we propose to use a DD-MZM by switching its operation between two quadrature points. As shown in Fig. 1, a serial input signal x(t) and a serial weight signal w(t) are applied to the two arms of the DD-MZM. To implement a binarized NN, the constant bias voltage is adjusted to bias the DD-MZM at the minimum transmission point, and w(t) is set to ±Vπ/2, as shown in the inset of Fig. 1, so that the DD-MZM operates at the two quadrature points on the positive and negative slopes. Mathematically, the generated electrical signal is I(t) ≈ 1 ± γx(t) when w(t) = ±Vπ/2. As can be seen, the input signal is scaled and multiplied by a binary weight of +1 or −1. When x(t) = 0, the electrical signal is regarded as the reference level, given by I0 as shown in the inset of Fig. 1. In the artificial neuron model, the weighted input signal is summed before the nonlinear activation function. To sum the binary-weighted input signal, we propose to use the dispersion-induced time-delay method [5,6]. The input vector is first serialized and applied to the DD-MZM. To sum up the weighted time series after the DD-MZM, multiple optical carriers with identical wavelength spacing are generated and sent to the DD-MZM, so that the weighted signal is copied onto each optical carrier, as shown in Fig. 2. The modulated multi-wavelength signal is then directed to a dispersive medium where dispersion-induced time delays are introduced. For a wavelength spacing Δλ and chromatic dispersion D, the time delay is Δτ = D × Δλ. If the time delay equals the symbol duration, the successive symbols are aligned. After photodetection, the electrical signal in every symbol slot is the sum of the successive symbols within a sum window determined by the number of wavelengths, i.e., $I_{\mathrm{sum}}[m] = I_{\mathrm{ref}} + g\sum_{n = m - N + 1}^{m} w_b[n]\,x[n]$, where the reference level $I_{\mathrm{ref}}$ and the gain g are intrinsic parameters of the optical processor that can be estimated by a calibration process. Once the intrinsic parameters are obtained, the true weighted sum is given by $(I_{\mathrm{sum}}[m] - I_{\mathrm{ref}})/g$.
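
As an illustration of this signal model, the following Python sketch numerically reproduces the relation between the serialized, binary-weighted input and the summed output. It is not a simulation of the physical link; the values of Iref, g, and the window length N are placeholders chosen for this example.

```python
import numpy as np

# Minimal numerical sketch of the weighted-sum model above (not a physical simulation).
rng = np.random.default_rng(0)

N = 4                  # number of wavelengths = length of the sum window
I_ref, g = 0.5, 0.1    # assumed intrinsic parameters (estimated by calibration)

x = rng.uniform(0.0, 1.0, size=64)        # serialized input signal x[n]
w_b = rng.choice([-1.0, 1.0], size=64)    # binary weights w_b[n] applied at the DD-MZM

# Each wavelength carries a copy of w_b[n]*x[n]; dispersion delays the copies by one
# symbol each, so the detected signal at symbol m sums the last N weighted symbols.
weighted = w_b * x
I_sum = I_ref + g * np.convolve(weighted, np.ones(N), mode="full")[: len(x)]

# With the intrinsic parameters known, the true weighted sum is recovered.
recovered = (I_sum - I_ref) / g
expected = np.array([weighted[max(0, m - N + 1): m + 1].sum() for m in range(len(x))])
assert np.allclose(recovered, expected)
```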


Fig. 1. Experimental setup. The DD-MZM and the PD are the I/O interface of the optical processor. Inset shows the switching between the opposite quadrature points leading to binary weighting.


Fig. 2. Sum of the weighted input signal due to the dispersion-induced time delay.


The proposed optical processor is employed to implement a binarized CNN. The CNN has two convolutional layers and one fully connected (FC) layer, as shown in Fig. 3. The first convolutional layer has eight channels with a kernel size of 2 × 2; the second convolutional layer has 32 channels with a kernel size of 5 × 5. The output of the second convolutional layer is flattened and connected to an FC layer, whose output is sent to a softmax layer to generate the output. The training curves of the model with full-precision and binarized weights on the MNIST dataset are illustrated in Fig. 4. The MNIST dataset is a database of handwritten digits that is commonly used for training various NNs; it is divided into a training set of 60,000 images and a test set of 10,000 images. To train the model with binarized weights, the straight-through estimator is used [15]. The best accuracy of the full-precision model is 98.62%, while the best accuracy of the model with binarized weights is 98.18%, a slight degradation compared with the full-precision model. The trained kernels of the first convolutional layer are [−1, −1; −1,1], [−1,1;1,1], [1,1;1,1], [1, −1; −1, −1], [1, −1;1,1], [−1, −1; −1, −1], [1,1;1,1], and [−1, −1;1,1].
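
For readers who want to reproduce the training step, a minimal PyTorch sketch of a binarized CNN of this shape is given below. The sign binarization and the straight-through estimator follow the standard formulation [14,15]; the activation function, strides, padding, and the resulting feature-map sizes are our assumptions, since they are not specified in the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator in the backward pass."""
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        return torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        # Pass the gradient through, clipped to the [-1, 1] range of the latent weights.
        return grad_output * (w.abs() <= 1).float()

class BinaryConv2d(nn.Conv2d):
    def forward(self, x):
        # Convolve with binarized weights; full-precision latent weights are kept for training.
        return F.conv2d(x, BinarizeSTE.apply(self.weight), self.bias,
                        self.stride, self.padding)

class BinarizedCNN(nn.Module):
    """Two conv layers (8 channels, 2x2; 32 channels, 5x5) and one FC layer, as in Fig. 3.
    Unit stride, no padding, no pooling, and ReLU activations are assumptions."""
    def __init__(self):
        super().__init__()
        self.conv1 = BinaryConv2d(1, 8, kernel_size=2, bias=False)
        self.conv2 = BinaryConv2d(8, 32, kernel_size=5, bias=False)
        self.fc = nn.Linear(32 * 23 * 23, 10)  # 28x28 input -> 27x27 -> 23x23 (assumed)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = torch.flatten(x, 1)
        return F.log_softmax(self.fc(x), dim=1)
```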


Fig. 3. Employed CNN model which has two convolutional layers and one fully connected layer. The optical processor implements the first convolution layer.


Fig. 4. Training curves of the model with (a) full-precision and (b) binarized weights on the MNIST dataset.


A proof-of-concept experiment is performed based on the setup shown in Fig. 1. In the experiment, the optical processor is used to calculate the first convolutional layer, and the other layers of the CNN are carried out on a digital computer. Four wavelengths from four laser diodes (Keysight N7714A) are combined to implement the 2 × 2 convolution window. The combined wavelengths are then directed to a DD-MZM (Fujitsu FTM7921ER). The bandwidth and half-wave voltage of the DD-MZM are 10 GHz and 4 V, respectively. An arbitrary waveform generator (AWG) (Keysight M8195A) with a sampling rate of 64 GSa/s is employed to generate the test image signals and the weight signals. The test image signal is applied to the DD-MZM via one RF port. The weight signal is amplified by an electrical amplifier (EA) (Multilink MTC5515) with a gain of 20 dB, combined with a DC bias by a bias-tee, and applied to the DD-MZM via the second RF port. The signal at the output of the DD-MZM is sent to an optical fiber acting as the dispersive medium. A low-noise erbium-doped fiber amplifier (EDFA) (Nortel FA17URAC) with a gain of 25 dB is placed after the fiber to amplify the optical signal. The signal at the output of the EDFA is sent to a PD (New Focus 1414) with a bandwidth of 25 GHz and a responsivity of 0.7 A/W. The chromatic dispersion of the fiber is 175 ps/nm. The baud rate of the system is set to 4 GBd, leading to a computing speed of 2 × 4 × 4 GBd = 32 GFLOPS. To align the successive symbols of the serial input signal, the wavelengths are set to 1545.00, 1546.43, 1547.86, and 1549.29 nm, i.e., a wavelength spacing of 1.43 nm. After optical-to-electrical conversion at the PD, the generated electrical signal is sampled by an oscilloscope (OSC), from which the feature map is obtained. One image from the MNIST dataset, labeled as the digit 1 and shown in Fig. 5(a), is used for demonstration. Before being sent to the optical processor, the gray-scale 2D MNIST image is standardized and serialized into a 1D signal, as also shown in Fig. 5(a). Points below zero are the black background pixels, while points above the background level are the white pixels. The kernels are also serialized into a time waveform and sent to the other port of the DD-MZM to apply the binary weights to the image input. A zoomed-in view of the waveform of the eight kernels is shown in Fig. 5(b). The image signal and the weight signal are loaded into the AWG and synchronized.
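
As a quick sanity check of the numbers quoted above, the short Python snippet below reproduces the wavelength-spacing and throughput arithmetic (Δτ = D × Δλ matched to the symbol duration, and two operations per wavelength per symbol); the variable names are ours.

```python
# Back-of-the-envelope check of the delay matching and throughput quoted in the text.
D_ps_per_nm = 175.0     # chromatic dispersion of the fiber
baud_rate_GBd = 4.0     # symbol rate
n_wavelengths = 4       # size of the 2x2 convolution window

symbol_duration_ps = 1e3 / baud_rate_GBd             # 250 ps per symbol
delta_lambda_nm = symbol_duration_ps / D_ps_per_nm   # spacing that aligns successive symbols
print(f"required wavelength spacing: {delta_lambda_nm:.2f} nm")  # ~1.43 nm

# Each symbol slot performs one multiply and one add per wavelength.
gflops = 2 * n_wavelengths * baud_rate_GBd
print(f"computing speed: {gflops:.0f} GFLOPS")                   # 32 GFLOPS
```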


Fig. 5. (a) Serialized temporal waveform of the image. Inset shows an image from the MNIST dataset labeled as “1” and the serialization process. (b) Zoom-in view of the waveform of the eight kernels.


To estimate the intrinsic parameter Iref, a rectangular waveform preceding the kernel waveform is used to generate the reference level. For the other intrinsic parameter g, a manual calibration could be performed to obtain the relationship between the amplitude of the electrical signal and the actual value. To simplify the calibration process, layer normalization is incorporated into the model shown in Fig. 3; since layer normalization is invariant to re-scaling, the output is independent of the scaling factor g. Since a complete weighted sum is available every four symbols, the waveform at the output of the PD is down-sampled every four symbols, and the down-sampled pixels are shown as red dots in Fig. 6. The reference level is also recorded, shown by the blue triangles in Fig. 6. By subtracting the reference level from the down-sampled pixels, the feature map of the first convolutional layer is obtained. For comparison, the convolutional layer with the eight kernels is also applied to the input image on a digital computer, yielding eight reference feature maps. Both sets of results are shown in Fig. 7(a). The root mean square errors (RMSEs) between the feature maps are 0.1137, 0.0919, 0.3363, 0.1738, 0.2119, 0.1914, 0.3321, and 0.1226, while the structural similarity index measures (SSIMs) are 0.6632, 0.6847, 0.1890, 0.6432, 0.2468, 0.6133, 0.1907, and 0.4083. The optical processor thus generates results similar to the digital computation. Finally, the eight feature maps are sent to the second convolutional layer followed by the FC layer, and the output of the softmax layer for the classification task is shown in Figs. 7(b) and 7(c). Figure 7(b) shows the classification probabilities calculated by the digital computer; the probability of the correct class is 0.99999. Figure 7(c) shows the result obtained with the optical processor; the probability of the correct class is 0.99988. Both give the correct classification. We further tested 49 images, and the optical processor produced the same classification results as the digital computer. The classification confusion matrix is given in Fig. 7(d).
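
A minimal sketch of this post-processing step is given below; it assumes the recorded PD samples are available as a NumPy array and that the first-layer feature map is 27 × 27 (28 × 28 image, 2 × 2 kernel, unit stride), which is our assumption rather than a value stated in the text. The reported RMSE and SSIM comparisons can be reproduced with, e.g., scikit-image.

```python
import numpy as np

# Every fourth symbol of the PD output completes one 2x2 weighted sum, so the recorded
# waveform is down-sampled by 4 and the reference level is subtracted. The map shape,
# the sampling offset (taken as 0 here), and the variable names are assumptions.
def extract_feature_map(pd_samples, i_ref, map_shape=(27, 27), stride=4):
    pixels = pd_samples[::stride][: map_shape[0] * map_shape[1]]
    # Scaling by g is unnecessary because layer normalization is re-scaling invariant.
    return (pixels - i_ref).reshape(map_shape)

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

# The SSIM values quoted above can be reproduced with scikit-image, e.g.:
#   from skimage.metrics import structural_similarity as ssim
#   score = ssim(optical_map, digital_map, data_range=digital_map.ptp())
```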


Fig. 6. Recorded waveform (black line), the down-sampled pixels (red dots), and the reference level (blue triangle) for the eight feature maps of the first convolutional layer.


Fig. 7. (a) Feature maps calculated by a digital computer (left) and the optical processor (right). Classification probabilities by (b) a computer and (c) the processor. (d) Confusion matrix for the MNIST dataset.


Furthermore, the optical processor is tested on the fashion MNIST dataset, a dataset of Zalando’s article images. First, the CNN given in Fig. 3 is re-trained on the new dataset. The training curves of the full-precision and binarized models are given in Figs. 8(a) and 8(b). The best accuracies with full-precision and binarized weights are 89.70% and 86.44%, respectively. The performance degradation after binarization is larger on the fashion MNIST dataset than on the MNIST dataset, which can be attributed to the lower redundancy of the weights; a model better suited to the fashion MNIST dataset could be designed to reduce the loss caused by binarization. The trained kernels for the fashion MNIST dataset are [1,1; −1,1], [−1,1; −1,1], [1, −1;1,1], [1, −1;1, −1], [1,1;1,1], [−1, −1; −1,1], [1,1;1,1], and [−1,1;1,1]. We calculate the feature maps of a sample “ankle boot” from the dataset with a digital computer and with the optical processor; the results are given in Fig. 8(c). The RMSEs between the eight feature maps are 0.1170, 0.2468, 0.0828, 0.2125, 0.0686, 0.2499, 0.0547, and 0.0687, while the SSIMs are 0.8104, 0.3765, 0.9087, 0.6047, 0.7945, 0.4414, 0.8312, and 0.9489. Similar results are again obtained. The classification probability of the correct class is 0.92 with the digital computer and 0.74 with the optical processor, as shown in Figs. 8(d) and 8(e); both give the correct result. We further tested 49 images, and again the same classification results are obtained. The classification confusion matrix is given in Fig. 8(f).


Fig. 8. Training curves with (a) full precision and (b) binarized weights. (c) Feature maps (left maps, calculated by a digital computer; right maps, calculated by the optical processor). Classification probabilities calculated by (d) a digital computer and (e) the optical processor. (f) Confusion matrix for the fashion MNIST dataset.


In conclusion, we have proposed a novel optical processor based on incoherent waveguide optics for a binarized NN. To achieve ±1 weights, the DD-MZM is switched between two opposite quadrature points. A binarized CNN for image classification was demonstrated with the optical processor at a speed of 32 GFLOPS, and the results agreed well with those obtained on a digital computer. The computing speed can be scaled up by using more wavelengths, a higher baud rate, and more DD-MZMs.

Funding

Natural Sciences and Engineering Research Council of Canada.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

REFERENCES

1. Y. LeCun, Y. Bengio, and G. Hinton, Nature 521, 436 (2015). [CrossRef]  

2. B. J. Shastri, A. N. Tait, T. F. de Lima, W. H. P. Pernice, H. Bhaskaran, C. D. Wright, and P. R. Prucnal, Nat. Photonics 15, 102 (2021). [CrossRef]  

3. H. Zhou, J. Dong, J. Cheng, W. Dong, C. Huang, Y. Shen, Q. Zhang, M. Gu, C. Qian, H. Chen, Z. Ruan, and X. Zhang, Light: Sci. Appl. 11, 1 (2022). [CrossRef]  

4. S. Xu, J. Wang, R. Wang, J. Chen, and W. Zou, Opt. Express 27, 19778 (2019). [CrossRef]  

5. Y. Huang, W. Zhang, F. Yang, J. Du, and Z. He, Opt. Express 27, 20456 (2019). [CrossRef]  

6. X. Xu, M. Tan, B. Corcoran, J. Wu, A. Boes, T. G. Nguyen, D. T. Chu, B. E. Little, D. G. Hicks, R. Morandotti, A. Mitchell, and D. J. Moss, Nature 589, 7840 (2021). [CrossRef]  

7. Y. Zang, M. Chen, S. Yang, and H. Chen, IEEE J. Sel. Top. Quantum Electron. 26, 1 (2020). [CrossRef]

8. Y. Shen, N. C. Harris, S. Sirlo, M. Prabhu, T. B. Jones, M. Hochberg, X. Sun, S. Zhao, H. Larochelle, D. Englund, and M. Soljacic, Nat. Photonics 11, 441 (2017). [CrossRef]  

9. H. Zhang, M. Gu, X. D. Jiang, J. Thompson, H. Cai, S. Paesani, R. Santagati, A. Laing, Y. Zhang, M. H. Yung, Y. Z. Shi, F. K. Muhammad, G. Q. Lo, X. S. Luo, B. Dong, D. L. Kwong, L. C. Kwek, and A. Q. Liu, Nat. Commun. 12, 1 (2021). [CrossRef]  

10. Z. Lin, S. Sun, J. Azana, W. Li, and M. Li, Opt. Express 29, 13 (2021). [CrossRef]  

11. C. Rios, N. Youngblood, Z. Cheng, M. Le Gallo, W. H. P. Pernice, C. D. Wright, A. Sebastian, and H. Bhaskaran, Sci. Adv. 5, 2 (2019). [CrossRef]  

12. M. Miscuglio and V. J. Sorger, Appl. Phys. Rev. 7, 3 (2020). [CrossRef]

13. L. Deng, G. Li, S. Han, L. Shi, and Y. Xie, Proc. IEEE 108, 485 (2020). [CrossRef]  

14. M. Courbariaux, Y. Bengio, and J.-P. David, in Advances in Neural Information Processing Systems 28 (NIPS 2015), p. 3123 (2015).

15. H. Qin, R. Gong, X. Liu, X. Bai, J. Song, and N. Sebe, Pattern Recognit. 105, 107281 (2020). [CrossRef]  

16. M. Oita, M. Takahashi, S. Tai, and K. Kyuma, Opt. Lett. 15, 21 (1990). [CrossRef]  

17. A. Shortt, J. G. Keating, L. Moulinier, and C. N. Pannell, Inf. Sci. 171, 273 (2005). [CrossRef]  

18. J. Bueno, S. Maktoobi, L. Froehly, I. Fischer, M. Jacquot, L. Larger, and D. Brunner, Optica 5, 756 (2018). [CrossRef]  

19. T. Zhang, J. Wang, Y. Dan, Y. Lanqiu, J. Dai, X. Han, X. Sun, and K. Xu, Opt. Express 27, 26 (2019). [CrossRef]  

20. H. H. Zhu, J. Zou, H. Zhang, Y. Z. Shi, S. B. Luo, N. Wang, H. Cai, L. X. Wan, B. Wang, X. D. Jiang, J. Thompson, X. S. Luo, X. H. Zhou, L. M. Xiao, W. Huang, L. Patrick, M. Gu, L. C. Kwek, and A. Q. Liu, Nat. Commun. 13, 1044 (2022). [CrossRef]  
