Jongeun Koo
Jinseok Kim
Sungju Ryu
Chulsoo Kim
Jae-Joon Kim
(Pohang University of Science and Technology)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Index Terms
Spiking neural network, RBM, STDP, contrastive divergence, synapse memory
I. INTRODUCTION
Recently, the spiking neural network (SNN), which processes information using spike frequencies and spike-timing intervals, has attracted attention for its biologically plausible characteristics and high energy efficiency (1,2). Spike-timing-dependent plasticity (STDP) is the best-known SNN learning algorithm; it updates a synaptic weight based on the interval between the pre-synaptic spike time ($t_{pre}$) and the post-synaptic spike time ($t_{post}$) (3). In a non-transposable crossbar synapse memory, a column-wise weight vector can be accessed only weight-by-weight, by activating all word lines (WLs) one by one, while a row-wise weight vector can be accessed concurrently by activating a single WL. For the post-then-pre ($t_{post}-t_{pre}<0$) STDP learning step, all synaptic weights connected to one pre-synaptic neuron can be updated at the moment the pre-synaptic spike occurs ($t_{pre}$) (3). Thus, the non-transposable synapse memory provides high throughput for the post-then-pre STDP learning step, in which each pre-synaptic spike triggers the update of a row-wise weight vector. However, its throughput for the pre-then-post ($t_{post}-t_{pre}>0$) STDP learning step, where each post-synaptic spike triggers the update of a column-wise weight vector, is severely degraded.
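To make the asymmetry concrete, consider the following toy cycle model (our sketch, not the simulator of Section V), in which one access to a non-transposable M$\times$N crossbar reads or writes one full row-wise weight vector:

```python
# Toy cycle model (our sketch): on a non-transposable M x N crossbar, a
# post-then-pre update touches one row (1 cycle per pre-synaptic spike),
# while a pre-then-post update touches one column, which must be walked
# one WL at a time (M cycles per post-synaptic spike).
def stdp_update_cycles(M, n_pre_spikes, n_post_spikes):
    row_update_cycles = n_pre_spikes        # 1 cycle per row-wise update
    col_update_cycles = n_post_spikes * M   # M cycles per column-wise update
    return row_update_cycles + col_update_cycles

print(stdp_update_cycles(M=784, n_pre_spikes=100, n_post_spikes=100))
```

Even with equal spike counts, the column-wise updates dominate the cycle budget.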
The restricted Boltzmann machine (RBM) is another exemplary neural network that benefits from a transposable synapse memory. An RBM is a generative stochastic artificial neural network that learns a probability distribution over the input data for applications including dimensionality reduction, classification, and feature learning (4-6). In an RBM, the visible ($\boldsymbol{V}=[v_{0}\ v_{1}\ \cdots\ v_{\mathrm{M}-1}]$) and hidden ($\boldsymbol{H}=[h_{0}\ h_{1}\ \cdots\ h_{\mathrm{N}-1}]$) layers form a bipartite graph that allows inter-layer connections only, and the restricted connections between the two layers ($\boldsymbol{W}\in\boldsymbol{R}^{\mathrm{M}\times\mathrm{N}}$) enable efficient learning with gradient-based contrastive divergence (CD) (7). In the non-transposable synapse memory, the hidden layer state can be computed with scalar-vector multiply-accumulate (MAC) operations because a row-wise weight vector can be accessed concurrently, as shown in Fig. 1(a). On the other hand, the visible layer state must be computed with scalar-scalar MAC operations due to the inefficient column-wise access, as shown in Fig. 1(b). As a result, the backward vector-matrix multiplication for the visible layer computation ($\boldsymbol{H}\times\boldsymbol{W}^{T}$) in CD learning takes $O(n^{2})$ cycles, whereas the forward vector-matrix multiplication for the hidden layer computation ($\boldsymbol{V}\times\boldsymbol{W}$) takes $O(n)$ cycles when transposable read access to the synapse memory is not available.
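The two access patterns can be sketched as follows (our illustration of the cost model, not the testchip RTL); each row read returns a full weight vector, while a column must be fetched weight by weight:

```python
import numpy as np

def forward_vmm(V, W):
    # H = V x W via scalar-vector MACs: one row-wise read per cycle -> O(n)
    M, N = W.shape
    H, cycles = np.zeros(N), 0
    for i in range(M):
        H += V[i] * W[i, :]     # whole row of W in a single access
        cycles += 1
    return H, cycles

def backward_vmm(H, W):
    # V = H x W^T needs columns of W; without transposable access, each of
    # the N columns costs M single-weight reads -> O(n^2) scalar-scalar MACs
    M, N = W.shape
    V, cycles = np.zeros(M), 0
    for j in range(N):
        for i in range(M):
            V[i] += H[j] * W[i, j]
            cycles += 1
    return V, cycles
```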
Fig. 1. (a) Forward, (b) backward vector-matrix multiplication using non-transposable
synapse memory for CD learning in RBM.
To cope with the limited column-wise access speed of the non-transposable synapse memory, a transposable synapse memory using a custom 8T SRAM bit cell was introduced (8). However, the transposable 8T SRAM bit cell incurs significant area overhead compared to the 6T SRAM bit cell due to the routing of an additional WL and bit line (BL) for transposable access. In our previous work on a digital neuromorphic processing core (9), we presented a novel transposable addressing scheme leveraging 6T SRAM macros. However, its area overhead is still substantial due to duplicated periphery circuits and additional routing for clock, address, and control signals.
In this paper, we propose a transposable crossbar synapse memory using the 6T SRAM bit cell that provides the same bandwidth as the previous transposable synapse memories at the cost of a much smaller area overhead.
Section II describes the previous works and Section III presents the proposed memory structure. Section IV shows the testchip implementation and measurement results, and Section V describes the learning performance estimation. Finally, Section VI concludes this work.
Fig. 2. (a) A crossbar synapse memory connecting 4 pre-/post-synaptic neurons and
n-bit synaptic weights using (b) 6T SRAM bit cell and (c) custom 8T SRAM bit cell.
II. PREVIOUS WORKS
Fig. 2(a) shows an exemplary crossbar synapse memory that fully connects 4 pre-synaptic and 4 post-synaptic neurons, where $W_{i,j}$ represents a single- or multi-bit synaptic weight between pre-synaptic neuron $v_{i}$ and post-synaptic neuron $h_{j}$. In the synapse memory using the 6T SRAM bit cell shown in Fig. 2(b), a row-wise weight vector [$w_{i,0}$ $w_{i,1}$ $w_{i,2}$ $w_{i,3}$], where $i$ is the row address, can be accessed concurrently because no weight in the vector shares a BL with the others. On the other hand, a column-wise weight vector [$w_{0,j}$ $w_{1,j}$ $w_{2,j}$ $w_{3,j}$]$^{T}$, where $j$ is the column address, must be accessed weight-by-weight in sequence because all weights in the vector share the same BL. As a result, column-wise access to the synapse memory using the 6T SRAM bit cell becomes very slow.
In (8), a transposable synapse memory using the custom 8T SRAM bit cell shown in Fig. 2(c) was proposed, which allows concurrent access to a column-wise weight vector through an additional WL ($WL^{T}$) and BL ($BL^{T}$). However, the additional routing of $WL^{T}$ and $BL^{T}$ in the custom 8T SRAM bit cell increases the cell area by up to 2.5$\times$ compared to the 6T SRAM bit cell. Moreover, the custom 8T SRAM bit cell cannot be configured as a multi-bit array due to the routing for transposable access, so multiple single-bit arrays must be used to implement multi-bit weights, as shown in Fig. 2(c).
Fig. 3. Synaptic weight relocation scheme (9): (a) before, (b) after weight relocation.
To reduce the area overhead, we presented a transposable synapse memory using 6T SRAM macros (9). It is based on two key schemes: synaptic weight relocation and transposable row addressing. The synaptic weight relocation scheme shifts each row-wise weight vector [$w_{i,0}$ $w_{i,1}$ $w_{i,2}$ $w_{i,3}$] to the right by the row address $i$, as shown in Fig. 3. It relocates each weight $W_{i,j}$ to ($r$, $c$), where $r$ and $c$ represent the row and column of the physical address in the synapse memory array. The physical address is calculated as

$$r=i,\qquad c=(i+j)\bmod n_{col}\tag{1}$$

where $n_{col}$ is the number of columns.
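A minimal sketch of this mapping (function name ours) shows that a column-wise vector lands on a diagonal, so no two of its weights share a BL:

```python
# Sketch of the relocation in Eq. (1): logical weight w_{i,j} is stored at
# physical (r, c) = (i, (i + j) mod n_col). Function name is illustrative.
def physical_address(i, j, n_col=4):
    return (i, (i + j) % n_col)

# Column j = 1 relocates diagonally across BLs:
print([physical_address(i, 1) for i in range(4)])
# -> [(0, 1), (1, 2), (2, 3), (3, 0)]
```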
In the baseline 6T SRAM bit cell array shown in Fig. 3(a), all weights in a column-wise weight vector [$w_{0,j}$ $w_{1,j}$ $w_{2,j}$ $w_{3,j}$]$^{T}$ share the same BL, $BL[j]$. On the other hand, in the bit cell array after the weight relocation shown in Fig. 3(b), no weight in a column-wise weight vector shares a BL with the other weights in the vector, because both the physical row and the physical column increase by 1 as the logical row address $i$ increases. As a result, the column-wise weights are relocated diagonally, as shown in Fig. 3(b). Although each synaptic weight in Fig. 3 is a single-bit 6T SRAM cell, it can easily be replaced with the $n$-bit weight of Fig. 2(b) while maintaining the structure; in this case, each BL in Fig. 3 is replaced with $n$ BLs.
Fig. 4. Transposable row addressing scheme for (a) row-wise, (b) column-wise accesses
(9).
However, even after the relocation, a column-wise weight vector cannot be accessed concurrently because the physical row address, i.e., the WL, differs for each weight in the vector (Fig. 3(b)). The transposable row addressing scheme splits the 2-dimensional synaptic weight array into column chunks and changes the physical row address of each chunk depending on the access direction, as shown in Fig. 4. For the row-wise access shown in Fig. 4(a), the scheme sets the physical row addresses of all chunks to the logical row address $i$. On the other hand, for the column-wise access shown in Fig. 4(b), the scheme calculates the physical row address $r$ for the $c$-th chunk by solving Eq. (1). As a result, the physical row address for the 0-th chunk is the 2's complement of the logical column address $j$, i.e., ($n_{col}-j$), and the row address increases by 1 as the chunk index increases. For example, to access the column-wise weight vector [$w_{0,1}$ $w_{1,1}$ $w_{2,1}$ $w_{3,1}$]$^{T}$ as shown in Fig. 4(b), the physical row addresses for the chunks are set to 3, 0, 1, and 2, respectively.
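For illustration, a small sketch (ours) of the per-chunk row address reproduces this example:

```python
# Sketch (ours): physical row address of the c-th chunk. Row-wise access
# uses the logical row address i directly; column-wise access inverts
# Eq. (1), giving r = (c - j) mod n_col, i.e., (n_col - j) for chunk 0.
def chunk_row_address(c, addr, column_wise, n_col=4):
    return (c - addr) % n_col if column_wise else addr

# Column-wise access to logical column j = 1 (the Fig. 4(b) example):
print([chunk_row_address(c, 1, True) for c in range(4)])   # [3, 0, 1, 2]
```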
In our previous work on a digital neuromorphic processing core (9), we implemented a transposable synapse memory employing these two key schemes to speed up online learning. We used a barrel shifter to implement the synaptic weight relocation scheme, as shown in Fig. 5. The barrel shifter shifts an input row-wise (column-wise) weight vector to the right by the logical row (column) address $i$ ($j$) for a write operation and shifts the read weights back to the left for a read operation. For the transposable row addressing scheme, we used 6T SRAM macros for the memory chunks shown in Fig. 4 and implemented the transposable row addressing unit shown in Fig. 5. However, the area overhead is still 1.83$\times$ compared to the non-transposable synapse memory due to the duplicated periphery circuits in the SRAM macros and the additional routing for clock, address, and control signals. For further area reduction, we propose an integrated transposable synapse memory using the 6T SRAM bit cell.
Fig. 5. Our previous implementation of transposable synapse memory using 6T SRAM macros
(9).
III. PROPOSED MEMORY STRUCTURE
The proposed transposable synapse memory eliminates the need for the periphery circuits
and other control overhead by integrating the transposable row addressing scheme inside
bit cell array. Fig. 6 shows the overall structure of the proposed transposable synapse memory, in which
the transposable row addressing scheme is implemented using the integrated custom
address decoder and row-transition multiplexer (MUX). The address decoder activates
the WL corresponding to the input address for row-wise access ($Rb/C=0$) and the WL
corresponding to 2’s complement of the input address for column-wise access ($Rb/C=1$).
Row-transition MUXs are placed between every two adjacent memory chunks. Each row-transition
MUX connects the upper input WL to the output WL for row-wise access ($Rb/C=0$) and
connects the lower input WL to the output WL for column-wise access ($Rb/C=1$). In
other words, the row-transition MUX connects WLs in the same row of two adjacent memory
chunks for row-wise access and connects the WL in the lower (left) memory chunk to
the WL which is one row above in the upper (right) memory chunk for column-wise access.
For example, the address decoder activates the WL for $w_{1,3}$ to access row (1) weight vector [$w_{1,0}$ $w_{1,1}$ $w_{1,2}$ $w_{1,3}$]. At the same time, all row-transition
MUXs connect the upper input WLs to the output WLs to activate WLs for $w_{1,0}$,
$w_{1,1}$, and $w_{1,2}$ concurrently. To access col (1) weight vector [$w_{0,1}$ $w_{1,1}$ $w_{2,1}$ $w_{3,1}$]$^{T}$, the address decoder
activates the WL for $w_{3,1}$ and all row-transition MUXs connect the lower input
WLs to the output WLs to access $w_{0,1}$, $w_{1,1}$, and $w_{2,1}$ concurrently.
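A behavioral sketch (our model, not the circuit netlist) of the decoder plus the chain of row-transition MUXs reproduces both access examples:

```python
# Behavioral sketch of Fig. 6 (ours): the decoder drives chunk 0, and each
# row-transition MUX either repeats the WL (row-wise access) or steps it
# to the adjacent row, wrapping at the array boundary (column-wise access).
def active_wls(addr, rb_c, n_chunks=4, n_rows=4):
    wl = addr if rb_c == 0 else (n_rows - addr) % n_rows  # 2's complement
    wls = [wl]
    for _ in range(n_chunks - 1):
        wl = wl if rb_c == 0 else (wl + 1) % n_rows
        wls.append(wl)
    return wls

print(active_wls(1, 0))  # row-wise access to row 1:       [1, 1, 1, 1]
print(active_wls(1, 1))  # column-wise access to column 1: [3, 0, 1, 2]
```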
Fig. 6. Overall structure of the proposed synapse memory.
Fig. 7. Proposed row-transition scheme for the memory configuration with column multiplexing.
The barrel shifter is placed between IO buffers and write drivers/sense amplifiers
and is shared for write and read accesses.
Meanwhile, the proposed integrated transposable row addressing using the row-transition MUX needs to be modified for a memory array with column multiplexing. Fig. 7 shows the proposed integrated transposable row addressing for a memory configuration with column multiplexing. In this configuration, the input address is divided into two parts, the MSBs for WLs and the LSBs for column selection lines (CSLs), which activate the WLs and CSLs, respectively. The row-transition MUXs for the CSLs are placed between every two adjacent memory chunks, in the same manner as those for the WLs. The row-transition MUXs for the CSLs connect the upper input CSLs to the output CSLs for row-wise access and the lower input CSLs to the output CSLs for column-wise access. On the other hand, the row-transition MUXs for the WLs connect the lower input WLs to the output WLs only when the MSB CSL of the previous memory chunk is ‘1’ during column-wise access. In this way, the proposed row-transition scheme, which integrates the transposable addressing scheme inside the memory array, works correctly for any configuration of synapse memory using the 6T SRAM bit cell.
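The carry behavior can be sketched as follows (our reading of Fig. 7; function and parameter names are ours):

```python
# Sketch (ours) of the column-multiplexed row transition of Fig. 7: the
# physical row address of a chunk splits into WL (MSBs) and CSL (LSBs).
# For column-wise access the combined address advances by one per chunk;
# when the previous chunk's MSB CSL is active ('1'), the CSL wraps to 0
# and the carry steps the WL, which is what the WL row-transition MUX does.
def wl_csl(addr, k=8):
    return (addr // k, addr % k)              # (WL index, CSL index)

def column_wise_chunk_addrs(addr0, n_chunks, n_addrs, k=8):
    # addr0 is the combined address the decoder drives into chunk 0
    return [wl_csl((addr0 + c) % n_addrs, k) for c in range(n_chunks)]

# 4 chunks, 2 WLs x 8 CSLs: the CSL wrap between chunks 0 and 1 bumps the WL.
print(column_wise_chunk_addrs(7, 4, 16))  # [(0, 7), (1, 0), (1, 1), (1, 2)]
```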
IV. IMPLEMENTATION AND MEASUREMENT
1. Testchip Implementation and Measurement Results
We implemented a testchip of a 64K-weight (256$\times$256) 4-bit transposable synapse memory in a 28 nm CMOS technology. The 256$\times$256 weight array was reconfigured into four 256$\times$64 weight arrays, and the 4 arrays were stacked together in a 1024$\times$64 configuration under the 256-bit I/O data width constraint. Therefore, it takes 4 cycles to read or write 256 weights in both the row- and column-wise directions (64 weights/cycle via the 256-bit I/O data path) in the reconfigured 1024$\times$64 synapse memory.
Fig. 8(a) shows the overall structure of the testchip. We implemented the proposed integrated transposable row addressing scheme for the reconfigured 1024$\times$64 4-bit weight array with column multiplexing, as shown in Fig. 7. A unit array $U[n]$ corresponding to 1024$\times$1 weights was implemented as a 128$\times$8 4-bit weight array with 8-to-1 column multiplexing; in the unit array, one 4-bit synaptic weight among the 1024 weights can be accessed for a read or write operation in a cycle. Each set of 8 unit arrays was placed horizontally to configure a sub-array. A local address decoder was connected to the WLs and CSLs of the leftmost unit array in each sub-array, and 9 groups of 8 row-transition MUXs were placed between every two unit arrays in each sub-array. The 64K-weight synapse memory consists of 8 sub-arrays, as shown in Fig. 8(a). A 64-to-64 4-bit weight barrel shifter was placed between the periphery and the synaptic weight array.
A sub-array in the baseline non-transposable synapse memory is configured with 128$\times$256 bit cells. In the testchip implementation, each sub-array was divided into 8 unit arrays of 128$\times$32 bit cells, and the proposed row-transition MUXs were placed between every 2 adjacent unit arrays. Because the proposed row-transition MUX was designed to have the same height as the 6T SRAM bit cell and about 5$\times$ its width, a column of the MUXs can be placed beside a bit cell array with the same vertical pitch as the bit cell. Fig. 8(b) shows the layout design of 2 adjacent unit arrays in the testchip. 2 dummy cells were added at both the left and right edges of each unit array for layout regularity, a column of 128 row-transition MUXs was placed between unit arrays for WL transition, and a column of 8 row-transition MUXs was placed for CSL transition. In this way, 7 columns of 128 row-transition MUXs and 7 columns of 8 row-transition MUXs were placed for the WL and CSL transitions, respectively, in each sub-array.
Fig. 8. (a) Overall structure, (b) array layout design, (c) die photograph, and specifications
of testchip.
Fig. 8(c) shows the die photograph and specifications of the testchip. The 6T SRAM bit cell was manually designed using logic design rules so that the proposed row-transition scheme could be integrated. The built-in self-test (BIST) unit automatically generates test patterns for row-wise (column-wise) writes and column-wise (row-wise) reads, and then compares the read weights with the expected output weights.
Fig. 9. Shmoo plot of testchip.
Fig. 9 shows the shmoo plot of the testchip measured at different frequencies and supply voltages. The chip runs at a maximum operating frequency of 255 MHz at 1.1 V while consuming 1.023 mW.
2. Analysis of Area and Performance Overheads
In the baseline non-transposable synapse memory with 256K (64K weights $\times$ 4 bits) bit cells and 256 (64 weights $\times$ 4 bits) I/Os, the bit cell array, the sense amplifiers/write drivers, and the periphery circuit occupy 84%, 10%, and 6% of the overall area, respectively. The overall cell array in the baseline is composed of 8 sub-arrays, where each sub-array has 128$\times$256 bit cells; therefore, 260 cells, including 2 dummy cells at both the left and right edges, are placed in a row of a sub-array. In the proposed transposable synapse memory, we divided each sub-array into 8 unit arrays, where each unit array has 128$\times$32 bit cells. Then, we placed a column of row-transition MUXs and 4 columns of dummy cells between every 2 adjacent unit arrays, as shown in Fig. 8(b). Because the width of the proposed row-transition MUX is 5$\times$ the bit cell width while the height is the same, the area increase for the implementation of the row-transition MUXs is 24% of the bit cell area, i.e., (2+5+2 cells) $\times$ (7 columns) / (2+256+2 cells) in a sub-array. Therefore, the overall area increase due to the row-transition MUXs is 20% (24% of the 84% occupied by the cell array). In addition, the barrel shifter occupies almost the same area as the periphery circuit, which adds another 6%. Thus, the overall area increase of the proposed transposable memory is 26% compared to the baseline non-transposable synapse memory.
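As a consistency check, the overhead follows directly from the geometry described above:

$$\frac{(2+5+2)\times 7}{2+256+2}\approx 24\%,\qquad 0.24\times 0.84\approx 20\%,\qquad 20\%+6\%=26\%$$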
Table 1. Relative areas of transposable synapse memories

| Units      | Baseline 6T | Previous 8T | Previous 6T | Proposed |
|------------|-------------|-------------|-------------|----------|
| Cell Array | 0.84        | 2.10        | 0.84        | 0.84     |
| SA & WD    | 0.10        | 0.20        | 0.10        | 0.10     |
| Periphery  | 0.06        | 0.83        | 0.83        | 0.06     |
| MUXs       | 0.00        | 0.00        | 0.00        | 0.20     |
| B. Shifter | 0.00        | 0.06        | 0.06        | 0.06     |
| Total      | 1.00        | 2.36        | 1.83        | 1.26     |
Fig. 10. Post-layout simulation results of the WL signals in a sub-array.
In the 8T synapse memory (8), the bit cell area increases by 150% and the sense amplifiers/write drivers are doubled according to our analysis. Thus, the estimated area increase is 136% compared to the baseline. The relative areas of the transposable synapse memories against the baseline are shown in Table 1. We assumed that all synapse memories in Table 1 have 256$\times$256 synaptic weights, where each weight has 4 bit cells; in the case of the “Previous 8T” synapse memory, we assumed 4 arrays of 256$\times$256 single-bit cells.
Fig. 10 shows the delays of the WL signals of the 8 successive unit arrays in a sub-array, measured from post-layout simulations. In the baseline non-transposable synapse memory, the delay from the WL driver to the WL in the far-end unit array of a sub-array is 206 ps. In the proposed transposable memory, however, the delay to the WL in the far-end unit array increases by 331 ps (= 537 − 206 ps) due to the row-transition MUXs. In addition, the propagation delay in the barrel shifter is measured to be 450 ps. Therefore, the memory cycle time of the proposed transposable synapse memory increases by about 780 ps, which is 26% of the cycle time of the baseline non-transposable synapse memory (3 ns). In spite of this increase in memory cycle time, the overall performance of the proposed memory is much higher than that of the conventional structure due to the reduced number of cycles for column-wise access. A detailed analysis is given in the next section.
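In equation form, using the simulated delays above:

$$\Delta t_{cycle}\approx(537-206)+450=781\ \mathrm{ps}\approx 780\ \mathrm{ps},\qquad \frac{0.78\ \mathrm{ns}}{3\ \mathrm{ns}}\approx 26\%$$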
V. LEARNING PERFORMANCE ESTIMATION
We implemented cycle-based simulators for a SNN and a RBM to estimate STDP and CD
learning performance gains. We used the standard MNIST data. The training and testing
sets consist of 60,000 and 10,000 gray-scale images of 28${\times}$28 pixels. Because
the number of pixels in each image of the MNIST data set is 784, we used 4 synapse
memory macro models with 256${\times}$256 weights both for non-transposable and transposable
synapse memories. The number of post-synaptic neurons was set to 256 considering the
configurations of the synapse memory macros. In both estimations, we considered the
cycle time increase of the proposed transposable memory structure.
1. Performance Gain in SNN STDP Learning
We simulated STDP learning in a 784$\times$256 single-layer SNN to estimate the performance gain of the proposed transposable synapse memory structure against the baseline non-transposable memory. We used the leaky integrate-and-fire (LIF) spiking neuron model (left of Fig. 11(a)) with the STDP learning rule (right of Fig. 11(a)). We also modeled biologically plausible features of the SNN, including the refractory period and lateral inhibition, in the simulation. The MNIST training images were converted into Poisson spike trains with rates proportional to the intensity of each pixel. The maximum input spike rate $r_{\mathrm{max}}$ is the spike rate for the pixel with the highest intensity across the whole training set; thus, the spike rate for each pixel is an integer between 0 and $r_{\mathrm{max}}$ proportional to the intensity of the pixel.
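A minimal sketch of this rate coding (the 1 ms time step and names are our assumptions for illustration):

```python
import numpy as np

# Each pixel fires as a Poisson process whose rate is proportional to its
# intensity, with the brightest pixel in the data set firing at r_max.
def poisson_spike_train(image, r_max, n_steps, dt=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    rates = image.astype(float) / 255.0 * r_max   # spikes/s per pixel
    return rng.random((n_steps, image.size)) < rates * dt

img = (np.arange(784) % 256).astype(np.uint8)     # stand-in for one image
spikes = poisson_spike_train(img, r_max=100, n_steps=350)
print(spikes.shape, spikes.sum(axis=0).max())     # bright pixels fire most
```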
Fig. 11. (a) LIF spiking neuron model, STDP learning curve, (b) performance gain of
the proposed transposable synapse memory for STDP learning using MNIST data set.
Fig. 11(b) shows the performance gain of the proposed transposable synapse memory over the non-transposable memory as a function of the maximum input spike rate. The performance gain was only 2$\times$ for $r_{\mathrm{max}}=50$ because the average number of post-synaptic spikes was only 0.4, so the number of column-wise weight updates for STDP learning was very small. However, the performance gain reached up to 6.6$\times$ for $r_{\mathrm{max}}\geq 100$, where the number of post-synaptic spikes was large enough. The average gain of the proposed transposable synapse memory over the non-transposable memory was 6.3$\times$ for $r_{\mathrm{max}}\geq 100$.
2. Performance Gain in RBM CD Learning
We also simulated the CD learning operation in an RBM comprising 784 visible and 256 hidden neurons. We used a sigmoid activation function and ran 10 epochs of the CD learning shown in Fig. 12(a) on the MNIST training data set. In CD learning, the forward/backward vector-matrix multiplications that compute the hidden layer state $\boldsymbol{H}^k$ and the visible layer state $\boldsymbol{V}^k$ are the most critical steps. We counted the number of clock cycles to compute the two forward vector-matrix multiplications ($\boldsymbol{H}^{0}$ and $\boldsymbol{H}^{1}$) and the one backward vector-matrix multiplication ($\boldsymbol{V}^{1}$) for the 60,000 training images using the non-transposable and the proposed transposable synapse memories for comparison.
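For reference, one CD-1 update can be sketched as follows (the standard algorithm with biases omitted; an illustration, not the simulator's code). The three products below are exactly the vector-matrix multiplications whose cycles we count:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One CD-1 step for an M x N RBM (biases omitted for brevity).
def cd1_step(W, v0, lr=0.01, rng=np.random.default_rng(0)):
    h0 = sigmoid(v0 @ W)                      # forward: H0, row-wise reads
    h0_sample = (rng.random(h0.shape) < h0) * 1.0
    v1 = sigmoid(h0_sample @ W.T)             # backward: V1, column-wise reads
    h1 = sigmoid(v1 @ W)                      # forward: H1
    W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
    return W

W = 0.01 * np.random.default_rng(1).standard_normal((784, 256))
v0 = np.random.default_rng(2).random(784)
W = cd1_step(W, v0)
```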
Fig. 12(b) shows the number of clock cycles to complete each epoch of CD learning and the per-epoch performance gain of the proposed transposable synapse memory compared to the non-transposable memory. The minimum and maximum gains were 18.0$\times$ and 20.2$\times$, respectively, and the average performance gain over the 10 epochs was 19.3$\times$.
Fig. 12. (a) CD learning, (b) performance gain of the proposed transposable synapse
memory for CD learning using MNIST data set.
VI. CONCLUSION
We presented a transposable synapse memory using the 6T SRAM bit cell for fast online learning in neuromorphic processors. Based on synaptic weight relocation using a barrel shifter and transposable row addressing using row-transition multiplexers, the proposed design enables transposable memory access within an integrated SRAM array structure. The proposed memory shows 6.3$\times$ and 19.3$\times$ higher performance for the STDP and CD learning algorithms on the MNIST data set than the conventional non-transposable memory. The area overhead of the proposed design over the non-transposable memory was only 26%, which is much smaller than the area overheads of the previous works.
ACKNOWLEDGMENTS
This research was supported in part by the Technology Innovation Program (10067764)
funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea), the “Nano-Material
Technology Development Program” through the National Research Foundation of Korea
(NRF) funded by the Ministry of Science, ICT (NRF-2016M3A7B4910249), the MSIT (Ministry
of Science and ICT), Korea, under the “ICT Consilience Creative program” (IITP-2018-2011-1-00783)
supervised by the IITP (Institute for Information & communications Technology Promotion)
and Samsung Electronics Co., Ltd.
REFERENCES
(1) Akopyan F., et al., Oct. 2015, TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 34, No. 10, pp. 1537-1557.
(2) Davies M., et al., Jan. 2018, Loihi: A neuromorphic manycore processor with on-chip learning, IEEE Micro, Vol. 38, No. 1, pp. 82-99.
(3) Diehl P., Cook M., 2015, Unsupervised learning of digit recognition using spike-timing-dependent plasticity, Frontiers in Computational Neuroscience, Vol. 9, p. 99.
(4) Hinton G. E., Salakhutdinov R. R., 2006, Reducing the dimensionality of data with neural networks, Science, Vol. 313, No. 5786, pp. 504-507.
(5) Larochelle H., Bengio Y., 2008, Classification using discriminative restricted Boltzmann machines, in Proceedings of the 25th International Conference on Machine Learning, pp. 536-543.
(6) Coates A., Ng A., Lee H., 2011, An analysis of single-layer networks in unsupervised feature learning, in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 215-223.
(7) Hinton G. E., Osindero S., Teh Y.-W., 2006, A fast learning algorithm for deep belief nets, Neural Computation, Vol. 18, No. 7, pp. 1527-1554.
(8) Seo J., et al., Sep. 2011, A 45nm CMOS neuromorphic chip with a scalable architecture for learning in networks of spiking neurons, in 2011 IEEE Custom Integrated Circuits Conference (CICC), pp. 1-4.
(9) Kim J., Koo J., Kim T., Kim J.-J., 2018, Efficient synapse memory structure for reconfigurable digital neuromorphic hardware, Frontiers in Neuroscience, Vol. 12, p. 829.
Author
Jongeun Koo received the B.S. degree in Electrical Engineering from Kyungpook National
University, Daegu, Korea and the M.S. degree in Electrical Engineering from Pohang
University of Science and Technology, Pohang, Korea in 2001 and 2003, respectively.
From 2003 to 2019, he was with Samsung Electronics, Hwaseong, Korea, where he contributed
to the design and verification of DRAM, Flash memory, and SRAM products.
He is currently pursuing the Ph.D. degree. His research interests include near-/in-memory
computing and low-power VLSI design.
Jinseok Kim received the B.S. degree in Creative IT Engineering from Pohang University
of Science and Technology, Pohang, Korea in 2015, where he is currently pursuing the
Ph.D. degree.
His research interests range from algorithm development to chip design, and he has been working on designing efficient digital neuromorphic hardware and deep learning hardware accelerators.
Sungju Ryu received the B.S. degree in Electrical Engineering from Pusan National
University, Busan, Korea in 2015.
He is currently pursuing the Ph.D. degree in the Department of Creative IT Engineering,
Pohang University of Science and Technology, Pohang, Korea.
His current research interests include energy-efficient hardware accelerators for compressed neural networks, adaptive/resilient circuits, and low-power VLSI design.
Chulsoo Kim received the B.S. degree in Electrical Engineering from Kyungpook National
University, Daegu, Korea in 1991.
From 1991 to 2016, he was with Samsung Electronics, Hwaseong, Korea, where he contributed
to design of DRAM products for server, graphics and mobile applications.
Since 2017, he has been a research staff in the Department of Creative IT Engineering,
Pohang University of Science and Technology, Pohang, Korea.
His research interests include high-speed DRAM and emerging memories.
Jae-Joon Kim is currently a professor at Pohang University of Science and Technology,
Pohang, Korea.
He received the B.S. and M.S. degrees in Electronics Engineering from Seoul National
University, Seoul, Korea and Ph.D. degree from the School of Electrical and Computer
Engineering of Purdue University at West Lafayette, IN, USA in 1994, 1998, and 2004,
respectively.
Before joining POSTECH, he was with IBM T. J. Watson Research Center as a Research
Staff Member from May 2004 to Jan. 2013.
His current research interests include the design of deep learning hardware accelerators, neuromorphic processors, hardware security circuits, and circuits for exploratory devices.