
  1. (Department of Electronic and Electrical Engineering, Ewha Womans University, Seoul 03760, Korea)
  2. (School of Electronic and Electrical Engineering, Hongik University, Seoul 04066, Korea)
  3. (Department of Electrical and Computer Engineering, Seoul National University, Seoul 08826, Korea)



Hardware artificial intelligence, power efficiency, multiply-and-accumulate (MAC) operation, memory-based artificial intelligence chip

I. INTRODUCTION

Artificial intelligence has mainly been developed by implementing mathematical representations of the functions of neurons and synapses in biological systems through software technology [1-3]. In recent years, artificial intelligence has begun to be realized in hardware for the energy efficiency and volume reduction of the system, and artificial intelligence semiconductor chips based on integrated circuits have become visible [4-6]. C. Mead proposed the ``neuromorphic'' system as a next-generation computing technology that can maximally parallelize the serial operations of conventional digital computers [7]. However, although current neuromorphic chips effectively mimic the functions of neurons and synapses, they are still built on digital integrated-circuit technology. To implement a neuromorphic system in a more active sense, one more faithful to the original vision, renovation must take place at the component-technology level. Component technology here means memory device technology, and various memory devices can serve as the basis of hardware-oriented artificial intelligence computing [8-11]. In this sense, the ultimate form of artificial intelligence computing is memory computing.

The biggest advantage of hardware-oriented artificial intelligence semiconductor chips is power efficiency, and how successful the technology is should be judged by quantitative evaluation of that efficiency. Tera operations per second per watt (TOPS/W) is a widely used metric for evaluating the power efficiency of digital circuit-based multiply-and-accumulate (MAC) operations [12], but it is difficult to apply to neuromorphic systems that perform event-driven operations without relying on a clock [13]. In addition, if operations are performed in a memory-based synaptic array rather than in a circuit-based one, the existing indicator is even harder to use. This paper presents a method of deriving the power efficiency of artificial intelligence chips that operate on synaptic memory cells. Generality is retained by keeping the unit of TOPS/W, while a technique that reflects the characteristics of memory devices is derived through a purely mathematical process.

II. NUMBER OF MAC OPERATIONS

In order to draw a general conclusion, an inductive approach can be taken, working through some representative scenarios. As shown in Fig. 1, consider a fully-connected network (FCN) that takes a Modified National Institute of Standards and Technology (MNIST) image pattern, without any intentional reduction in the number of pixels, as input and classifies the digits 0 through 9. That is, the number of input nodes is 28 ${\times}$ 28 = 784, and the number of output nodes is 10. It is also assumed that the network has a single hidden layer with 200 hidden nodes. The total number of synapses is then given by the simple relation in Eq. (1).

(1)
$ (Total \;number \;of\; synapses\; to\; construct\; the\; FCN) \\ = (Total \;number \;of\; connection \;lines\; between \;nodes) \\ = (784 {\times} 200) + (200 {\times} 10) = 158,800\; synapses $
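For concreteness, Eq. (1) can be reproduced with a few lines of code. The following minimal Python sketch (the helper count_synapses is illustrative, not from the paper) computes the synapse count from the layer sizes of the FCN in Fig. 1.

```python
# Sketch (illustrative helper): synapse count of a fully-connected network
# from its layer sizes, reproducing Eq. (1) for the 784-200-10 network.
def count_synapses(layer_sizes):
    """Total connection lines between consecutive fully-connected layers."""
    return sum(m * n for m, n in zip(layer_sizes, layer_sizes[1:]))

print(count_synapses([784, 200, 10]))  # (784*200) + (200*10) = 158800
```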

The synaptic weight, or equivalently the artificial intelligence parameter, is determined by the connection strength between neurons on two consecutive layers of the FCN. The connectivity between the ith neuron on the pre-layer and the jth neuron on the post-layer can be denoted as w$_{ij}$. Fig. 2 shows a network constructed from two layers, in which the pre-layer has 5 nodes and the post-layer has 4 nodes. By the definition of the weight, the positive integers i and j range over 1 ${\leq}$ i ${\leq}$ 5 and 1 ${\leq}$ j ${\leq}$ 4. Through this representation, all the weights in the network of Fig. 2 can be collected into a single matrix, as shown in Fig. 3.

As previously defined, a weight is identified by a two-digit subscript whose first digit is the index of a node on the pre-layer and whose second digit is the index of a node on the post-layer. This yields a matrix whose numbers of rows and columns are the numbers of pre-layer and post-layer nodes, respectively, as clarified by Fig. 3. The first row (blue shaded) is the set of weights associated with node 1 on the pre-layer; the fourth column (orange shaded) is the set of weights associated with node 4 on the post-layer.

Looking at the matrix in Fig. 3, it can be seen that the number of entries of the matrix is the product of the numbers of nodes in the two consecutive layers. This is the number of weight values that determine the connectivity between the neurons on the pre-synaptic and post-synaptic layers. Further, the FCN in Fig. 1 can also be represented by products of subnetwork matrices together with the input and output vectors, as shown in Fig. 4. The number of weight values corresponds to the number of multiplication operations between the two layers. When summing the terms over which products have been taken, the number of sum operations is always 1 less than the number of terms to be added, i.e., the number of weighted inputs. Returning to the examples in Fig. 2 and Fig. 3, five weighted inputs are fed into node 4 on the post-layer, so four sum operations are made on them. Since these operations are carried out on each of the four post-layer nodes, a total of 16 sum operations are executed. These results can be generalized for a layer pair with m pre-layer nodes and n post-layer nodes as Eqs. (2) and (3) below.

(2)
$ (Number\; of \;multiplications) \\ = (Number \;of\; entries \;of\; the \;conversion \;matrix) \\ = (Dimension \;of\; the\; matrix) = m {\times} n $
(3)
$ (Number\; of\; accumulations\; or\; sums) = (m-1) {\times} n $
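Eqs. (2) and (3) can be checked by brute-force counting. The short Python sketch below (illustrative, not from the paper) walks the 5-to-4 layer pair of Fig. 2 and confirms 20 multiplications and 16 sums.

```python
# Sketch: brute-force operation count for the 5-to-4 layer pair of Fig. 2,
# checked against the closed forms of Eqs. (2) and (3).
m, n = 5, 4                  # pre-layer and post-layer node counts
multiplications = additions = 0
for j in range(n):           # each post-layer node j
    multiplications += m     # one product w_ij * x_i per pre-layer node i
    additions += m - 1       # summing m weighted inputs takes m - 1 additions

assert multiplications == m * n   # Eq. (2): 20
assert additions == (m - 1) * n   # Eq. (3): 16
```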

Based on this mathematical foundation, the total number of MAC operations performed in the FCN in Fig. 1 can be obtained. Fig. 5 shows the redrawn FCN with identifications of subnetworks A and B, which can be represented by two different matrices. From Eq. (2), the total number of product operations in subnetwork A is 784 ${\times}$ 200 = 156,800. From Eq. (3), the total number of sum operations in subnetwork A is (784 - 1) ${\times}$ 200 = 156,600. Thus, the total number of MAC operations in subnetwork A is 313,400. In the same manner, the total number of MAC operations conducted in subnetwork B is 200 ${\times}$ 10 + (200 - 1) ${\times}$ 10 = 3,990. Finally, the total number of MAC operations carried out in the exemplary FCN in Figs. 1 and 5 comes to $313,400 + 3,990 = 317,390$ operations. Here, it is assumed that all the MAC operations for inferencing are performed at the same time and all the synaptic devices are activated; this value therefore provides the worst-case assumption in calculating the power efficiency of MAC operation in the given FCN. This calculation method is also applicable to deep neural networks (DNNs): the total number of MAC operations in a given network is obtained by adding up the numbers of MAC operations between each pair of successive layers.
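The worked totals above can be verified with a short Python sketch (the helper mac_ops is illustrative, not from the paper):

```python
# Sketch: total MAC operations of the example FCN via Eqs. (2) and (3).
def mac_ops(layer_sizes):
    total = 0
    for m, n in zip(layer_sizes, layer_sizes[1:]):
        total += m * n        # multiplications, Eq. (2)
        total += (m - 1) * n  # accumulations, Eq. (3)
    return total

print(mac_ops([784, 200, 10]))  # 313400 + 3990 = 317390
```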

(4)
$ (MAC\; operation\; efficiency) \\ =\frac{(Total\; number\; of\; MAC\; operations)}{(Total\; number\; of\; synapses){\times}(Inference\; current){\times}(Inference\; voltage){\times}(Inference\; time)} $

If the neural network is composed of n subnetworks, or equivalently (n-1) hidden layers, the total number of MAC operations can be calculated in the same manner. Here, it can be assumed that the number of input nodes is m$_{1}$ and that of output nodes is m$_{\mathrm{n+1}}$, so that the total number of conversion matrices is n. The sum of the numbers of multiplications and accumulations (MAC operations) of the kth subnetwork can be calculated as Eq. (5).

(5)
$ m_{k}m_{k+1} + \left(m_{k}-1\right){\times}m_{k+1} = 2m_{k}m_{k+1} - m_{k+1} $
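As a quick check, the closed form of Eq. (5) agrees with the sum of Eqs. (2) and (3) for both subnetworks of the example; the Python sketch below (illustrative) makes the comparison explicit.

```python
# Sketch: checking Eq. (5) against Eqs. (2) and (3) for the two subnetworks
# of the example FCN (A: 784 -> 200, B: 200 -> 10).
for m_k, m_k1 in [(784, 200), (200, 10)]:
    eq2_plus_eq3 = m_k * m_k1 + (m_k - 1) * m_k1  # multiplications + sums
    eq5 = 2 * m_k * m_k1 - m_k1                   # closed form of Eq. (5)
    assert eq2_plus_eq3 == eq5
    print(eq5)                                    # 313400, then 3990
```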

Thus, the total number of MAC operations over the n subnetworks and the total number of synapses in the entire artificial neural network are expressed as Eqs. (6) and (7), respectively. Plugging Eqs. (6) and (7) into Eq. (4) yields the MAC operation efficiency of a deep neural network (DNN) having n subnetworks, or (n-1) hidden layers. Since the other three terms in the denominator are not affected by the array size once the type of synaptic device is given, the MAC operation efficiency is determined by the ratio between the expressions in Eqs. (6) and (7), which enter the numerator and the denominator of Eq. (4), respectively.

(6)
$ \sum _{k=1}^{n}\left(2m_{k}m_{k+1}-m_{k+1}\right) $
(7)
$ \sum _{k=1}^{n}m_{k}m_{k+1} $
(8)
$ \frac{\sum _{k=1}^{n}\left(2m_{k}m_{k+1}-m_{k+1}\right)}{\sum _{k=1}^{n}m_{k}m_{k+1}}=2-\frac{\sum _{k=1}^{n}m_{k+1}}{\sum _{k=1}^{n}m_{k}m_{k+1}}\cong 2-\frac{a(n-1)}{a^{2}n}\cong 2-\frac{1}{a} $

Eq. (8) implies that the MAC operation efficiency becomes less dependent on the number of hidden layers as the artificial neural network is deepened. Here, the numbers of rows and columns of the individual matrices are assumed to be comparable, each approximately equal to a. Furthermore, as the size of the individual conversion matrices increases, the ratio in Eq. (8) asymptotically approaches the factor of 2. With the numbers given in the previous example, 317,390 / 158,800 = 1.999, a value very close to 2. Therefore, the MAC operation efficiency in Eq. (4) remains valid even for a DNN, having little dependence on the depth of the neural network and the number of synaptic devices. This mathematical formulation is valid only when the size of the task is not considered. If the array is too small compared with a given task (the number of operations that need to be performed at one time to achieve a specific goal), the operation efficiency of the small synapse array would be low. If the array size is excessively large compared with the workload, most of the power consumption would be dedicated to sustaining the stand-by (or low-conductivity) mode of the synaptic cells not in use. How efficiently a given task can be completed with a small amount of energy is thus determined by the sizes of the synaptic array and the workload, so in practice this becomes an optimization problem.
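The asymptotic behavior of Eq. (8) can be illustrated numerically. In the Python sketch below (the helper name is illustrative), the ratio equals 2 - 1/a exactly for square a ${\times}$ a conversion matrices at any depth, and gives 1.999 for the example FCN.

```python
# Sketch: the MAC-to-synapse ratio of Eq. (8) for a DNN whose conversion
# matrices are all a x a; the ratio equals 2 - 1/a regardless of depth n.
def mac_to_synapse_ratio(layer_sizes):
    pairs = list(zip(layer_sizes, layer_sizes[1:]))
    macs = sum(2 * m * n - n for m, n in pairs)   # Eq. (6)
    syns = sum(m * n for m, n in pairs)           # Eq. (7)
    return macs / syns

for a in (10, 100, 1000):
    print(a, mac_to_synapse_ratio([a] * 11))      # 1.9, 1.99, 1.999

print(mac_to_synapse_ratio([784, 200, 10]))      # 317390/158800 ~ 1.999
```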

Fig. 1. Fully-connected network (FCN) having one hidden layer with 200 nodes performing MNIST pattern classification.
../../Resources/ieie/JSTS.2024.24.1.47/fig1.png
Fig. 2. Two-layer network in which the pre-layer has 5 nodes and the post-layer has 4 nodes.
../../Resources/ieie/JSTS.2024.24.1.47/fig2.png
Fig. 3. Matrix representation of all the synaptic weights in the two-layer network given in Fig. 2.
../../Resources/ieie/JSTS.2024.24.1.47/fig3.png
Fig. 4. Matrix representation of the FCN in Fig. 1 demonstrating the relation between input and output vectors.
../../Resources/ieie/JSTS.2024.24.1.47/fig4.png
Fig. 5. Redrawn FCN of Fig. 1 with identifications of subnetworks A and B, which can be represented by two different matrices.
../../Resources/ieie/JSTS.2024.24.1.47/fig4-1.png

III. POWER EFFICIENCY OF INFERENCE AND IMPLICATION FOR SYNAPSE CELL DESIGN

The power efficiency of the MAC operation for inference has been defined as Eq. (4) in this study. As previously stated, this definition differs substantially from the one usually adopted for general digital circuit-based MAC operation accelerators. However, for familiarity on the evaluator's side and for metric generality, the unit of TOPS/W is maintained. The most distinctive feature of the MAC operation efficiency in Eq. (4) is that the power efficiency depends primarily on the characteristics of the synaptic device. To understand what level of value Eq. (4) can provide, realistic values should be substituted for the terms constituting its denominator; for this, several assumptions are made as follows.

(i) Binary memory operation (0 and 1 weight)

(ii) The numbers of synapses having state 0 and state 1 at an arbitrary moment are equal.

(iii) Only the inference operation is taken into account for calculating the power efficiency.

(iv) Inference current at state 1 = 1 ${\mu}$A = 10$^{-6}$ A

(v) Inference current at state 0 is negligibly small: less than 1/1,000 of the value in (iv).

(vi) Inference voltage = 1 V

(vii) Inference time = 1 ${\mu}$s = 10$^{-6}$s

The first term in the denominator of Eq. (4), the total number of synapses, has already been obtained in Eq. (1). Plugging Eq. (1) and the values assumed above into Eq. (4) provides the inference power efficiency of the 784 ${\times}$ 200 ${\times}$ 10 FCN based on synaptic memory devices.

(9)
$ \frac{317,390\; operations}{158,800\times \left(10^{-6}\,\mathrm{A}\right)\times \left(1\,\mathrm{V}\right)\times \left(10^{-6}\,\mathrm{s}\right)}\cong 2\; \mathrm{TOPS/W} $
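As a cross-check, Eq. (4) can be evaluated in a few lines of Python. The sketch below assumes the worst-case convention that all synapses conduct the fully-on inference current, the same convention used for the 317,390-operation count; the helper name and the reconstructed value of Eq. (9) of roughly 2 TOPS/W are illustrative assumptions, and the second line reproduces the 400 TOPS/W figure discussed in the next paragraph from the same formula.

```python
# Sketch: inference power efficiency per Eq. (4), under the worst-case
# convention that every synapse conducts the fully-on inference current.
def tops_per_watt(n_macs, n_synapses, i_on, v_inf, t_inf):
    ops_per_joule = n_macs / (n_synapses * i_on * v_inf * t_inf)
    return ops_per_joule / 1e12  # operations per joule -> TOPS/W

print(tops_per_watt(317_390, 158_800, 1e-6, 1.0, 1e-6))      # ~2 TOPS/W, Eq. (9)
print(tops_per_watt(317_390, 158_800, 100e-9, 0.5, 100e-9))  # ~400 TOPS/W
```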

Looking into Eqs. (4) and (9), some important implications for designing memory device-based synaptic cells can be derived. First, the MAC operation efficiency is inversely proportional to the on-state inference current; thus, synaptic cells should be designed to have low maximum conductance. Second, it is necessary to develop synaptic devices that can lower the inference voltage. Third, synaptic devices with high read operation speed (high inference speed) should be designed. If another set of assumptions is made with inference current = 100 nA, inference voltage = 0.5 V, and inference time = 100 ns, a drastically higher efficiency of 400 TOPS/W is obtained. Since three of the four terms making up the denominator of Eq. (4) are determined by the electrical characteristics of a single synaptic memory device, the MAC operation efficiency defined in Eq. (4) can be understood as a highly practical index focused on the performance of the cell itself. The on-state inference current and the inference voltage can decrease as the synaptic device is scaled down. Although the inference time can also be shortened by scaling, the inference operation of a synaptic cell is largely influenced by interconnect technology, since it is performed not on an individual isolated cell but at the level of the whole synapse array. As a result, the ensemble effect on the MAC operation efficiency from the three terms governed by the miniaturization of the synaptic cell is not very significant. According to Eq. (4), the total number of synapses and the MAC operation efficiency are inversely proportional, so the higher the integration density of the synapse array constructing the artificial intelligence semiconductor chip, the lower the efficiency. Fortunately, however, the total number of MAC operations over the distinct layers in the numerator of Eq. (4) largely cancels the total number of synapses in the denominator. It can be explicitly proven through the procedures in Eq. (10) that the MAC operation efficiency for inference presented in this study depends only weakly on the scaling level of a synapse or the integration density of the synaptic array.

In particular, the inference time in the denominator of Eqs. (4) and (10) indicates the time required for the bitline inference operation rather than for a cell-level read operation, which makes the inference time little dependent on cell scaling. Here, N$_{\mathrm{I}}$ and N$_{\mathrm{O}}$ stand for the numbers of inputs and outputs, respectively, and h is the number of nodes in the hidden layer. Eq. (10) demonstrates that the MAC operation efficiency is independent of cell scalability and synapse array density, and depends only on the cell characteristics. The derivation of Eq. (10) assumes that the synapses show binary operation and that all the synapses have the highest conductivity, or the fully-on inference current, for the worst-case scenario. If the synaptic device is capable of multi-level operation, the MAC operation efficiency in Eq. (10) goes higher. If the synaptic device is permitted to have n different inference current levels and all the synaptic devices have equal probabilities of taking the permitted n weights, then 1/n of the synaptic devices carry 0, 1/(n-1), 2/(n-1), ..., (n-2)/(n-1), and (n-1)/(n-1) = 1 times the fully-on inference current. Applying these assumptions to Eq. (10), the head part of Eq. (10) can be simplified as Eq. (11). As a result, the MAC operation efficiency in Eqs. (10) and (11) is independent of the number of synaptic levels, by which the generality of the equations is finally validated. However, this generality is limited to the inside of the synapse array, and one should be on guard against the increased complexity and power consumption of the peripheral circuits inevitably required for multi-level operation when designing the entire system architecture.
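The displayed forms of Eqs. (10) and (11) could not be recovered from the extracted source. The LaTeX sketch below reconstructs the shape that the surrounding text implies for Eq. (10), assuming the single-hidden-layer FCN with N$_{\mathrm{I}}$ inputs, h hidden nodes, and N$_{\mathrm{O}}$ outputs described above; the symbol $\eta_{\mathrm{MAC}}$ is introduced here for illustration and is not from the paper.

```latex
% Sketch of Eq. (10) as implied by the text: the array-size factors in the
% numerator and denominator cancel, leaving only the single-cell terms
% I_inf, V_inf, and t_inf (notation \eta_{MAC} is illustrative).
\begin{equation}
\eta_{\mathrm{MAC}}
  = \frac{\left(2 N_{I} h - h\right) + \left(2 h N_{O} - N_{O}\right)}
         {\left(N_{I} h + h N_{O}\right) I_{\mathrm{inf}} V_{\mathrm{inf}} t_{\mathrm{inf}}}
  \;\cong\; \frac{2}{I_{\mathrm{inf}} V_{\mathrm{inf}} t_{\mathrm{inf}}}
\end{equation}
```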

IV. CONCLUSIONS

In this study, an indicator for calculating the MAC operation efficiency of a full-fledged hardware-oriented artificial intelligence semiconductor chip, in which the operations are performed in an artificial neural network composed of synaptic cells based on memory devices, has been presented. Although different from the existing definitions, the index has been defined so that device-specific parameters are predominant, without losing familiarity and generality, by maintaining the unit of TOPS/W. The value of the indicator obtained by the newly proposed method is likely to improve with device scaling, but the dependence is not strong, and the indicator is hardly affected by the integration size of the synaptic array. The new performance metric will serve as a highly practical guideline for designing the synaptic devices that make up hardware-oriented artificial intelligence chips and for predicting the inference power efficiency of the synapse array separately from the peripheral circuits.

ACKNOWLEDGMENTS

This work was supported by the Ministry of Science and ICT of Korea (MSIT) through the Grants 2020-0-01294 and RS-2023-00258527.

References

[1] F. Rosenblatt, “Perceptron Simulation Experiments,” Proc. IRE, vol. 48, no. 3, pp. 301-309, Mar. 1960.
[2] H. D. Block, “The Perceptron: A Model for Brain Functioning. I,” Rev. Mod. Phys., vol. 34, no. 1, pp. 123-135, Jan. 1962.
[3] S. K. Pal and S. Mitra, “Multilayer Perceptron, Fuzzy Sets, and Classification,” IEEE Trans. Neural Networks, vol. 3, no. 5, pp. 683-697, Sep. 1992.
[4] M. Davies, et al., “Loihi: A Neuromorphic Manycore Processor with On-Chip Learning,” IEEE Micro, vol. 38, no. 1, pp. 82-99, Jan. 2018.
[5] F. Akopyan, et al., “TrueNorth: Design and Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chip,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 34, no. 10, pp. 1537-1557, Oct. 2015.
[6] N. P. Jouppi, et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit,” Proc. Annual International Symposium on Computer Architecture (ISCA), pp. 1-12, Toronto, Canada, Jun. 2017.
[7] C. Mead, “Neuromorphic Electronic Systems,” Proc. IEEE, vol. 78, no. 10, pp. 1629-1636, Oct. 1990.
[8] S. Cho, “Volatile and Nonvolatile Memory Devices for Neuromorphic and Processing-in-memory Applications,” J. Semicond. Technol. Sci., vol. 22, no. 1, pp. 30-46, Feb. 2022.
[9] D. J. Jang, H. Ryu, H. Cha, N.-Y. Lee, Y. Kim, and M.-W. Kwon, “Synaptic Device Based on Resistive Switching Memory using Single-Walled Carbon Nanotubes,” J. Semicond. Technol. Sci., vol. 22, no. 5, pp. 346-352, Apr. 2022.
[10] K. Udaya-Mohanan, S. Cho, and B.-G. Park, “Medium-Temperature-Oxidized GeO Resistive-Switching Random-Access Memory and Its Applicability in Processing-in-Memory,” Nanoscale Res. Lett., vol. 17, pp. 63-1-63-14, Jul. 2022.
[11] B. Jeon, T. Jang, S. Cho, H. Shin, and W. Y. Choi, “Synapse Array with Buried Bottom Gate Structure for Neuromorphic Systems,” Proc. Silicon Nanoelectronics Workshop (SNW), pp. 15-16, Kyoto, Japan, Jun. 2023.
[12] Q. Liu, et al., “A Fully Integrated Analog ReRAM Based 78.4 TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing,” Proc. IEEE International Solid-State Circuits Conference (ISSCC), pp. 500-502, San Francisco, CA, Feb. 2020.
[13] V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer, “How to Evaluate Deep Neural Network Processors: TOPS/W (alone) Considered Harmful,” IEEE Solid-State Circuits Mag., vol. 12, no. 3, pp. 28-41, Aug. 2020.
Seongjae Cho
../../Resources/ieie/JSTS.2024.24.1.47/au1.png

Seongjae Cho received the B.S. and Ph.D. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 2004 and 2010, respectively. He worked as an Exchange Researcher at the National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan, in 2009. He worked as a Postdoctoral Researcher at Seoul National University in 2010 and at Stanford University, Palo Alto, CA, from 2010 to 2013. He also worked as a faculty member at the Department of Electronic Engineering, Gachon University, from 2013 to 2023. Since 2023, he has been an Associate Professor at the Department of Electronic and Electrical Engineering, Ewha Womans University, Seoul, Korea. His current research interests include emerging memory devices, advanced nanoscale CMOS devices, and ultra-small integration technologies.

Sung-Tae Lee
../../Resources/ieie/JSTS.2024.24.1.47/au2.png

Sung-Tae Lee received the B.S. and the Ph.D. degrees in electrical and computer engineering from Seoul National University (SNU), Seoul, Korea, in 2016 and 2021, respectively. He has been an Assistant Professor with the School of Electronic and Electrical Engineering, Hongik University, since 2023. His current research interests include neuromorphic devices and their application in advanced computing.

Soomin Kim
../../Resources/ieie/JSTS.2024.24.1.47/au3.png

Soomin Kim received the B.S. degree in Electronic and Electrical Engineering from Ewha Womans University, Seoul, Korea, in 2023. She is currently pursuing the M.S. degree at Ewha Womans University. Her current research interests include nanoscale CMOS devices, low-power synaptic devices, and scalable neuron circuits for neuromorphic applications.

Hyungcheol Shin
../../Resources/ieie/JSTS.2024.24.1.47/au4.png

Hyungcheol Shin received the B.S. and M.S. degrees in electrical engineering from Seoul National University, Seoul, Korea, in 1985 and 1987, respectively, and the Ph.D. degree in electrical engineering from the University of California, Berkeley, in 1993. From 1994 to 1996, he worked as a Senior Device Engineer at Motorola. From 1996 to 2003, he was with the Department of Electrical Engineering and Computer Science at the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, as an Associate Professor. From 2001 to 2002, he worked as a Staff Scientist at Qualcomm. Since 2003, he has been with Seoul National University (SNU), Seoul, Korea, where he is currently a Professor in the Department of Electrical and Computer Engineering. From 2012 to 2013, he was the Director of the Inter-university Semiconductor Research Center (ISRC) at SNU.