Jongeun Koo
Jinseok Kim
Sungju Ryu
Chulsoo Kim
Jae-Joon Kim
(Pohang University of Science and Technology)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Index Terms
Spiking neural network, RBM, STDP, contrastive divergence, synapse memory
I. INTRODUCTION
Recently, the spiking neural network (SNN), which processes information using spike frequencies and spike-timing intervals, has attracted attention for its biologically plausible characteristics and high energy efficiency (1,2). Spike-timing-dependent plasticity (STDP) is the best-known SNN learning algorithm; it updates a synaptic weight based on the interval between the pre-synaptic spike time ($t_{pre}$) and the post-synaptic spike time ($t_{post}$) (3). In a non-transposable crossbar synapse memory, a column-wise weight vector can be accessed only weight-by-weight, by activating all word lines (WLs) one by one, while a row-wise weight vector can be accessed concurrently by activating a single WL. For the post-then-pre ($t_{post}-t_{pre}<0$) STDP learning step, all synaptic weights connected to one pre-synaptic neuron can be updated at the moment the pre-synaptic spike occurs ($t_{pre}$) (3). Thus, the non-transposable synapse memory provides high throughput for the post-then-pre STDP learning step, in which each pre-synaptic spike triggers the update of a row-wise weight vector. However, its throughput for the pre-then-post ($t_{post}-t_{pre}>0$) STDP learning step, where each post-synaptic spike triggers the update of a column-wise weight vector, is severely degraded.
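To make the asymmetry concrete, consider the following toy cycle model (our sketch, not the simulator of Section V), in which one access to a non-transposable M$\times$N crossbar reads or writes one full row-wise weight vector:

```python
# Toy cycle model (our sketch): on a non-transposable M x N crossbar, a
# post-then-pre update touches one row (1 cycle per pre-synaptic spike),
# while a pre-then-post update touches one column, which must be walked
# one WL at a time (M cycles per post-synaptic spike).
def stdp_update_cycles(M, n_pre_spikes, n_post_spikes):
    row_update_cycles = n_pre_spikes        # 1 cycle per row-wise update
    col_update_cycles = n_post_spikes * M   # M cycles per column-wise update
    return row_update_cycles + col_update_cycles

print(stdp_update_cycles(M=784, n_pre_spikes=100, n_post_spikes=100))
```

Even with equal spike counts, the column-wise updates dominate the cycle budget.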
The restricted Boltzmann machine (RBM) is another exemplary neural network that benefits from a transposable synapse memory. An RBM is a generative stochastic artificial neural network that learns a probability distribution over the input data for applications including dimensionality reduction, classification, and feature learning (4-6). In an RBM, the visible ($\boldsymbol{V}=[v_{0}\ v_{1}\ \cdots\ v_{\mathrm{M}-1}]$) and hidden ($\boldsymbol{H}=[h_{0}\ h_{1}\ \cdots\ h_{\mathrm{N}-1}]$) layers form a bipartite graph that allows inter-layer connections only, and the restricted connections between the two layers ($\boldsymbol{W}\in\boldsymbol{R}^{\mathrm{M}\times\mathrm{N}}$) enable efficient learning with gradient-based contrastive divergence (CD) (7). In the non-transposable synapse memory, the hidden layer state can be computed with scalar-vector multiply-accumulate (MAC) operations because a row-wise weight vector can be accessed concurrently, as shown in Fig. 1(a). On the other hand, the visible layer state must be computed with scalar-scalar MAC operations due to the inefficient column-wise access, as shown in Fig. 1(b). As a result, the backward vector-matrix multiplication for the visible layer computation ($\boldsymbol{H}\times\boldsymbol{W}^{T}$) in CD learning takes $O(n^{2})$ cycles, whereas the forward vector-matrix multiplication for the hidden layer computation ($\boldsymbol{V}\times\boldsymbol{W}$) takes $O(n)$ cycles when transposable read access to the synapse memory is not available.
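The two access patterns can be sketched as follows (our illustration of the cost model, not the testchip RTL); each row read returns a full weight vector, while a column must be fetched weight by weight:

```python
import numpy as np

def forward_vmm(V, W):
    # H = V x W via scalar-vector MACs: one row-wise read per cycle -> O(n)
    M, N = W.shape
    H, cycles = np.zeros(N), 0
    for i in range(M):
        H += V[i] * W[i, :]     # whole row of W in a single access
        cycles += 1
    return H, cycles

def backward_vmm(H, W):
    # V = H x W^T needs columns of W; without transposable access, each of
    # the N columns costs M single-weight reads -> O(n^2) scalar-scalar MACs
    M, N = W.shape
    V, cycles = np.zeros(M), 0
    for j in range(N):
        for i in range(M):
            V[i] += H[j] * W[i, j]
            cycles += 1
    return V, cycles
```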
Fig. 1. (a) Forward, (b) backward vector-matrix multiplication using non-transposable
synapse memory for CD learning in RBM.
To cope with the limited column-wise access speed of the non-transposable synapse memory, a transposable synapse memory using a custom 8T SRAM bit cell was introduced (8). However, the transposable 8T SRAM bit cell incurs significant area overhead compared to the 6T SRAM bit cell due to the routing of an additional WL and bit line (BL) for transposable access. In our previous work on a digital neuromorphic processing core (9), we presented a novel transposable addressing scheme leveraging 6T SRAM macros. However, its area overhead is still substantial due to duplicated periphery circuits and additional routing for clock, address, and control signals.
In this paper, we propose a transposable crossbar synapse memory using the 6T SRAM bit cell that provides the same bandwidth as the previous transposable synapse memories at the cost of a much smaller area overhead.
Section II describes the previous works and Section III presents the proposed memory structure. Section IV shows the testchip implementation and measurement results, and Section V describes the learning performance estimation. Finally, Section VI concludes this work.
Fig. 2. (a) A crossbar synapse memory connecting 4 pre-/post-synaptic neurons and
n-bit synaptic weights using (b) 6T SRAM bit cell and (c) custom 8T SRAM bit cell.
II. PREVIOUS WORKS
Fig. 2(a) shows an exemplary crossbar synapse memory that fully connects 4 pre-synaptic and 4 post-synaptic neurons, where $W_{i,j}$ represents a single- or multi-bit synaptic weight between pre-synaptic neuron $v_{i}$ and post-synaptic neuron $h_{j}$. In the synapse memory using the 6T SRAM bit cell shown in Fig. 2(b), a row-wise weight vector [$w_{i,0}$ $w_{i,1}$ $w_{i,2}$ $w_{i,3}$], where $i$ is the row address, can be accessed concurrently because no weight in the vector shares a BL with the others. On the other hand, a column-wise weight vector [$w_{0,j}$ $w_{1,j}$ $w_{2,j}$ $w_{3,j}$]$^{T}$, where $j$ is the column address, must be accessed weight-by-weight in sequence because all weights in the vector share the same BL. As a result, column-wise access to the synapse memory using the 6T SRAM bit cell becomes very slow.
In (8), a transposable synapse memory using the custom 8T SRAM bit cell shown in Fig. 2(c) was proposed, which allows concurrent access to a column-wise weight vector through an additional WL ($WL^{T}$) and BL ($BL^{T}$). However, the additional routing of $WL^{T}$ and $BL^{T}$ in the custom 8T SRAM bit cell increases the cell area by up to 2.5$\times$ compared to the 6T SRAM bit cell. Moreover, the custom 8T SRAM bit cell cannot be configured as a multi-bit array due to the routing for transposable access, so multiple single-bit arrays must be used to implement multi-bit weights, as shown in Fig. 2(c).
Fig. 3. Synaptic weight relocation scheme (9): (a) before, (b) after weight relocation.
To reduce the area overhead, we presented a transposable synapse memory using 6T SRAM macros (9). It is based on two key schemes: synaptic weight relocation and transposable row addressing. The synaptic weight relocation scheme shifts each row-wise weight vector [$w_{i,0}$ $w_{i,1}$ $w_{i,2}$ $w_{i,3}$] to the right by the row address $i$, as shown in Fig. 3. It relocates each weight $W_{i,j}$ to ($r$, $c$), where $r$ and $c$ represent the row and column of the physical address in the synapse memory array. The physical address is calculated as

$$r=i,\qquad c=(i+j)\bmod n_{col}\tag{1}$$

where $n_{col}$ is the number of columns.
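A minimal sketch of this mapping (function name ours) shows that a column-wise vector lands on a diagonal, so no two of its weights share a BL:

```python
# Sketch of the relocation in Eq. (1): logical weight w_{i,j} is stored at
# physical (r, c) = (i, (i + j) mod n_col). Function name is illustrative.
def physical_address(i, j, n_col=4):
    return (i, (i + j) % n_col)

# Column j = 1 relocates diagonally across BLs:
print([physical_address(i, 1) for i in range(4)])
# -> [(0, 1), (1, 2), (2, 3), (3, 0)]
```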
In the baseline 6T SRAM bit cell array shown in Fig. 3(a), all weights in a column-wise weight vector [$w_{0,j}$ $w_{1,j}$ $w_{2,j}$ $w_{3,j}$]$^{T}$ share the same BL, $BL[j]$. On the other hand, in the bit cell array after the weight relocation shown in Fig. 3(b), no weight in a column-wise weight vector shares a BL with the other weights in the vector, because both the physical row and the physical column increase by 1 as the logical row address $i$ increases. As a result, the column-wise weights are relocated diagonally, as shown in Fig. 3(b). Although each synaptic weight in Fig. 3 is a single-bit 6T SRAM cell, it can easily be replaced with the $n$-bit weight of Fig. 2(b) while maintaining the structure; in this case, each BL in Fig. 3 is replaced with $n$ BLs.
Fig. 4. Transposable row addressing scheme for (a) row-wise, (b) column-wise accesses
(9).
However, even after the relocation, a column-wise weight vector cannot be accessed concurrently because the physical row address, i.e., the WL, differs for each weight in the vector (Fig. 3(b)). The transposable row addressing scheme splits the 2-dimensional synaptic weight array into column chunks and changes the physical row address of each chunk depending on the access direction, as shown in Fig. 4. For the row-wise access shown in Fig. 4(a), the scheme sets the physical row addresses of all chunks to the logical row address $i$. On the other hand, for the column-wise access shown in Fig. 4(b), the scheme calculates the physical row address $r$ for the $c$-th chunk by solving Eq. (1). As a result, the physical row address for the 0-th chunk is the 2's complement of the logical column address $j$, i.e., ($n_{col}-j$), and the row address increases by 1 as the chunk index increases. For example, to access the column-wise weight vector [$w_{0,1}$ $w_{1,1}$ $w_{2,1}$ $w_{3,1}$]$^{T}$ as shown in Fig. 4(b), the physical row addresses for the chunks are set to 3, 0, 1, and 2, respectively.
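For illustration, a small sketch (ours) of the per-chunk row address reproduces this example:

```python
# Sketch (ours): physical row address of the c-th chunk. Row-wise access
# uses the logical row address i directly; column-wise access inverts
# Eq. (1), giving r = (c - j) mod n_col, i.e., (n_col - j) for chunk 0.
def chunk_row_address(c, addr, column_wise, n_col=4):
    return (c - addr) % n_col if column_wise else addr

# Column-wise access to logical column j = 1 (the Fig. 4(b) example):
print([chunk_row_address(c, 1, True) for c in range(4)])   # [3, 0, 1, 2]
```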
In our previous work on a digital neuromorphic processing core (9), we implemented a transposable synapse memory employing these two key schemes to speed up online learning. We used a barrel shifter to implement the synaptic weight relocation scheme, as shown in Fig. 5. The barrel shifter shifts an input row-wise (column-wise) weight vector to the right by the logical row (column) address $i$ ($j$) for a write operation and shifts the read weights back to the left for a read operation. For the transposable row addressing scheme, we used 6T SRAM macros for the memory chunks shown in Fig. 4 and implemented the transposable row addressing unit shown in Fig. 5. However, the area overhead is still 1.83$\times$ compared to the non-transposable synapse memory due to the duplicated periphery circuits in the SRAM macros and the additional routing for clock, address, and control signals. For further area reduction, we propose an integrated transposable synapse memory using the 6T SRAM bit cell.
Fig. 5. Our previous implementation of transposable synapse memory using 6T SRAM macros
(9).
III. PROPOSED MEMORY STRUCTURE
The proposed transposable synapse memory eliminates the need for the periphery circuits
and other control overhead by integrating the transposable row addressing scheme inside
bit cell array. Fig. 6 shows the overall structure of the proposed transposable synapse memory, in which
the transposable row addressing scheme is implemented using the integrated custom
address decoder and row-transition multiplexer (MUX). The address decoder activates
the WL corresponding to the input address for row-wise access ($Rb/C=0$) and the WL
corresponding to 2’s complement of the input address for column-wise access ($Rb/C=1$).
Row-transition MUXs are placed between every two adjacent memory chunks. Each row-transition
MUX connects the upper input WL to the output WL for row-wise access ($Rb/C=0$) and
connects the lower input WL to the output WL for column-wise access ($Rb/C=1$). In
other words, the row-transition MUX connects WLs in the same row of two adjacent memory
chunks for row-wise access and connects the WL in the lower (left) memory chunk to
the WL which is one row above in the upper (right) memory chunk for column-wise access.
For example, the address decoder activates the WL for $w_{1,3}$ to access row (1) weight vector [$w_{1,0}$ $w_{1,1}$ $w_{1,2}$ $w_{1,3}$]. At the same time, all row-transition
MUXs connect the upper input WLs to the output WLs to activate WLs for $w_{1,0}$,
$w_{1,1}$, and $w_{1,2}$ concurrently. To access col (1) weight vector [$w_{0,1}$ $w_{1,1}$ $w_{2,1}$ $w_{3,1}$]$^{T}$, the address decoder
activates the WL for $w_{3,1}$ and all row-transition MUXs connect the lower input
WLs to the output WLs to access $w_{0,1}$, $w_{1,1}$, and $w_{2,1}$ concurrently.
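A behavioral sketch (our model, not the circuit netlist) of the decoder plus the chain of row-transition MUXs reproduces both access examples:

```python
# Behavioral sketch of Fig. 6 (ours): the decoder drives chunk 0, and each
# row-transition MUX either repeats the WL (row-wise access) or steps it
# to the adjacent row, wrapping at the array boundary (column-wise access).
def active_wls(addr, rb_c, n_chunks=4, n_rows=4):
    wl = addr if rb_c == 0 else (n_rows - addr) % n_rows  # 2's complement
    wls = [wl]
    for _ in range(n_chunks - 1):
        wl = wl if rb_c == 0 else (wl + 1) % n_rows
        wls.append(wl)
    return wls

print(active_wls(1, 0))  # row-wise access to row 1:       [1, 1, 1, 1]
print(active_wls(1, 1))  # column-wise access to column 1: [3, 0, 1, 2]
```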
Fig. 6. Overall structure of the proposed synapse memory.
Fig. 7. Proposed row-transition scheme for the memory configuration with column multiplexing.
The barrel shifter is placed between IO buffers and write drivers/sense amplifiers
and is shared for write and read accesses.
Meanwhile, the proposed integrated transposable row addressing using the row-transition MUX needs to be modified for a memory array with column multiplexing. Fig. 7 shows the proposed integrated transposable row addressing for a memory configuration with column multiplexing. In this configuration, the input address is divided into two parts, the MSBs for WLs and the LSBs for column selection lines (CSLs), which activate the WLs and CSLs, respectively. The row-transition MUXs for the CSLs are placed between every two adjacent memory chunks, in the same manner as those for the WLs. The row-transition MUXs for the CSLs connect the upper input CSLs to the output CSLs for row-wise access and the lower input CSLs to the output CSLs for column-wise access. On the other hand, the row-transition MUXs for the WLs connect the lower input WLs to the output WLs only when the MSB CSL of the previous memory chunk is ‘1’ during column-wise access. In this way, the proposed row-transition scheme, which integrates the transposable addressing scheme inside the memory array, works correctly for any configuration of synapse memory using the 6T SRAM bit cell.
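The carry behavior can be sketched as follows (our reading of Fig. 7; function and parameter names are ours):

```python
# Sketch (ours) of the column-multiplexed row transition of Fig. 7: the
# physical row address of a chunk splits into WL (MSBs) and CSL (LSBs).
# For column-wise access the combined address advances by one per chunk;
# when the previous chunk's MSB CSL is active ('1'), the CSL wraps to 0
# and the carry steps the WL, which is what the WL row-transition MUX does.
def wl_csl(addr, k=8):
    return (addr // k, addr % k)              # (WL index, CSL index)

def column_wise_chunk_addrs(addr0, n_chunks, n_addrs, k=8):
    # addr0 is the combined address the decoder drives into chunk 0
    return [wl_csl((addr0 + c) % n_addrs, k) for c in range(n_chunks)]

# 4 chunks, 2 WLs x 8 CSLs: the CSL wrap between chunks 0 and 1 bumps the WL.
print(column_wise_chunk_addrs(7, 4, 16))  # [(0, 7), (1, 0), (1, 1), (1, 2)]
```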
IV. IMPLEMENTATION AND MEASUREMENT
1. Testchip Implementation and Measurement Results
We implemented a testchip of a 64K-weight (256$\times$256) 4-bit transposable synapse memory in a 28 nm CMOS technology. The 256$\times$256 weight array was reconfigured into four 256$\times$64 weight arrays, and the 4 arrays were stacked together in a 1024$\times$64 configuration under the 256-bit I/O data width constraint. Therefore, it takes 4 cycles to read or write 256 weights in both the row- and column-wise directions (64 weights/cycle via the 256-bit I/O data path) in the reconfigured 1024$\times$64 synapse memory.
Fig. 8(a) shows the overall structure of the testchip. We implemented the proposed integrated transposable row addressing scheme for the reconfigured 1024$\times$64 4-bit weight array with column multiplexing, as shown in Fig. 7. A unit array $U[n]$ corresponding to 1024$\times$1 weights was implemented as a 128$\times$8 4-bit weight array with 8-to-1 column multiplexing; in the unit array, one 4-bit synaptic weight among the 1024 weights can be accessed for a read or write operation in a cycle. Each set of 8 unit arrays was placed horizontally to configure a sub-array. A local address decoder was connected to the WLs and CSLs of the leftmost unit array in each sub-array, and 9 groups of 8 row-transition MUXs were placed between every two unit arrays in each sub-array. The 64K-weight synapse memory consists of 8 sub-arrays, as shown in Fig. 8(a). A 64-to-64 4-bit weight barrel shifter was placed between the periphery and the synaptic weight array.
A sub-array in the baseline non-transposable synapse memory is configured with 128$\times$256 bit cells. In the testchip implementation, each sub-array was divided into 8 unit arrays of 128$\times$32 bit cells, and the proposed row-transition MUXs were placed between every 2 adjacent unit arrays. Because the proposed row-transition MUX was designed to have the same height as the 6T SRAM bit cell and about 5$\times$ its width, a column of the MUXs can be placed beside a bit cell array with the same vertical pitch as the bit cell. Fig. 8(b) shows the layout design of 2 adjacent unit arrays in the testchip. 2 dummy cells were added at both the left and right edges of each unit array for layout regularity, a column of 128 row-transition MUXs was placed between unit arrays for WL transition, and a column of 8 row-transition MUXs was placed for CSL transition. In this way, 7 columns of 128 row-transition MUXs and 7 columns of 8 row-transition MUXs were placed for the WL and CSL transitions, respectively, in each sub-array.
Fig. 8. (a) Overall structure, (b) array layout design, (c) die photograph, and specifications
of testchip.
Fig. 8(c) shows the die photograph and specifications of the testchip. The 6T SRAM bit cell was manually designed using logic design rules so that the proposed row-transition scheme could be integrated. The built-in self-test (BIST) unit automatically generates test patterns for row-wise (column-wise) writes and column-wise (row-wise) reads, and then compares the read weights with the expected output weights.
Fig. 9. Shmoo plot of testchip.
Fig. 9 shows the shmoo plot of the testchip measured at different frequencies and supply voltages. The chip runs at a maximum operating frequency of 255 MHz at 1.1 V while consuming 1.023 mW.
2. Analysis of Area and Performance Overheads
In the baseline non-transposable synapse memory with 256K (64K weights $\times$ 4 bits) bit cells and 256 (64 weights $\times$ 4 bits) I/Os, the bit cell array, the sense amplifiers/write drivers, and the periphery circuit occupy 84%, 10%, and 6% of the overall area, respectively. The overall cell array in the baseline is composed of 8 sub-arrays, where each sub-array has 128$\times$256 bit cells; therefore, 260 cells, including 2 dummy cells at both the left and right edges, are placed in a row of a sub-array. In the proposed transposable synapse memory, we divided each sub-array into 8 unit arrays, where each unit array has 128$\times$32 bit cells. Then, we placed a column of row-transition MUXs and 4 columns of dummy cells between every 2 adjacent unit arrays, as shown in Fig. 8(b). Because the width of the proposed row-transition MUX is 5$\times$ the bit cell width while the height is the same, the area increase for the implementation of the row-transition MUXs is 24% of the bit cell area, i.e., (2+5+2 cells) $\times$ (7 columns) / (2+256+2 cells) in a sub-array. Therefore, the overall area increase due to the row-transition MUXs is 20% (24% of the 84% occupied by the cell array). In addition, the barrel shifter occupies almost the same area as the periphery circuit, which adds another 6%. Thus, the overall area increase of the proposed transposable memory is 26% compared to the baseline non-transposable synapse memory.
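As a consistency check, the overhead follows directly from the geometry described above:

$$\frac{(2+5+2)\times 7}{2+256+2}\approx 24\%,\qquad 0.24\times 0.84\approx 20\%,\qquad 20\%+6\%=26\%$$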
Table 1. Relative areas of transposable synapse memories

| Units      | Baseline 6T | Previous 8T | Previous 6T | Proposed |
|------------|-------------|-------------|-------------|----------|
| Cell Array | 0.84        | 2.10        | 0.84        | 0.84     |
| SA & WD    | 0.10        | 0.20        | 0.10        | 0.10     |
| Periphery  | 0.06        | 0.83        | 0.83        | 0.06     |
| MUXs       | 0.00        | 0.00        | 0.00        | 0.20     |
| B. Shifter | 0.00        | 0.06        | 0.06        | 0.06     |
| Total      | 1.00        | 2.36        | 1.83        | 1.26     |
Fig. 10. Post-layout simulation results of the WL signals in a sub-array.
In the 8T synapse memory (8), the bit cell area increases by 150% and the sense amplifiers/write drivers are doubled according to our analysis. Thus, the estimated area increase is 136% compared to the baseline. The relative areas of the transposable synapse memories against the baseline are shown in Table 1. We assumed that all synapse memories in Table 1 have 256$\times$256 synaptic weights, where each weight has 4 bit cells; in the case of the “Previous 8T” synapse memory, we assumed 4 arrays of 256$\times$256 single-bit cells.
Fig. 10 shows the delays of the WL signals of the 8 successive unit arrays in a sub-array, measured from post-layout simulations. In the baseline non-transposable synapse memory, the delay from the WL driver to the WL in the far-end unit array of a sub-array is 206 ps. In the proposed transposable memory, however, the delay to the WL in the far-end unit array increases by 331 ps (= 537 − 206 ps) due to the row-transition MUXs. In addition, the propagation delay in the barrel shifter is measured to be 450 ps. Therefore, the memory cycle time of the proposed transposable synapse memory increases by about 780 ps, which is 26% of the cycle time of the baseline non-transposable synapse memory (3 ns). In spite of this increase in memory cycle time, the overall performance of the proposed memory is much higher than that of the conventional structure due to the reduced number of cycles for column-wise access. A detailed analysis is given in the next section.
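In equation form, using the simulated delays above:

$$\Delta t_{cycle}\approx(537-206)+450=781\ \mathrm{ps}\approx 780\ \mathrm{ps},\qquad \frac{0.78\ \mathrm{ns}}{3\ \mathrm{ns}}\approx 26\%$$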
V. LEARNING PERFORMANCE ESTIMATION
We implemented cycle-based simulators for a SNN and a RBM to estimate STDP and CD
learning performance gains. We used the standard MNIST data. The training and testing
sets consist of 60,000 and 10,000 gray-scale images of 28${\times}$28 pixels. Because
the number of pixels in each image of the MNIST data set is 784, we used 4 synapse
memory macro models with 256${\times}$256 weights both for non-transposable and transposable
synapse memories. The number of post-synaptic neurons was set to 256 considering the
configurations of the synapse memory macros. In both estimations, we considered the
cycle time increase of the proposed transposable memory structure.
1. Performance Gain in SNN STDP Learning
We simulated STDP learning in a 784$\times$256 single-layer SNN to estimate the performance gain of the proposed transposable synapse memory structure against the baseline non-transposable memory. We used the leaky integrate-and-fire (LIF) spiking neuron model (left of Fig. 11(a)) with the STDP learning rule (right of Fig. 11(a)). We also modeled biologically plausible features of the SNN, including the refractory period and lateral inhibition, in the simulation. The MNIST training images were converted into Poisson spike trains with rates proportional to the intensity of each pixel. The maximum input spike rate $r_{\mathrm{max}}$ is the spike rate for the pixel with the highest intensity across the whole training set; thus, the spike rate for each pixel is an integer between 0 and $r_{\mathrm{max}}$ proportional to the intensity of the pixel.
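A minimal sketch of this rate coding (the 1 ms time step and names are our assumptions for illustration):

```python
import numpy as np

# Each pixel fires as a Poisson process whose rate is proportional to its
# intensity, with the brightest pixel in the data set firing at r_max.
def poisson_spike_train(image, r_max, n_steps, dt=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    rates = image.astype(float) / 255.0 * r_max   # spikes/s per pixel
    return rng.random((n_steps, image.size)) < rates * dt

img = (np.arange(784) % 256).astype(np.uint8)     # stand-in for one image
spikes = poisson_spike_train(img, r_max=100, n_steps=350)
print(spikes.shape, spikes.sum(axis=0).max())     # bright pixels fire most
```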
Fig. 11. (a) LIF spiking neuron model, STDP learning curve, (b) performance gain of
the proposed transposable synapse memory for STDP learning using MNIST data set.
Fig. 11(b) shows the performance gain of the proposed transposable synapse memory over the non-transposable memory as a function of the maximum input spike rate. The performance gain was only 2$\times$ for $r_{\mathrm{max}}=50$ because the average number of post-synaptic spikes was only 0.4, so the number of column-wise weight updates for STDP learning was very small. However, the performance gain reached up to 6.6$\times$ for $r_{\mathrm{max}}\geq 100$, where the number of post-synaptic spikes was large enough. The average gain of the proposed transposable synapse memory over the non-transposable memory was 6.3$\times$ for $r_{\mathrm{max}}\geq 100$.
2. Performance Gain in RBM CD Learning
We also simulated the CD learning operation in an RBM comprising 784 visible and 256 hidden neurons. We used a sigmoid activation function and ran 10 epochs of the CD learning shown in Fig. 12(a) on the MNIST training data set. In CD learning, the forward/backward vector-matrix multiplications that compute the hidden layer state $\boldsymbol{H}^k$ and the visible layer state $\boldsymbol{V}^k$ are the most critical steps. We counted the number of clock cycles to compute the two forward vector-matrix multiplications ($\boldsymbol{H}^{0}$ and $\boldsymbol{H}^{1}$) and the one backward vector-matrix multiplication ($\boldsymbol{V}^{1}$) for the 60,000 training images using the non-transposable and the proposed transposable synapse memories for comparison.
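For reference, one CD-1 update can be sketched as follows (the standard algorithm with biases omitted; an illustration, not the simulator's code). The three products below are exactly the vector-matrix multiplications whose cycles we count:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One CD-1 step for an M x N RBM (biases omitted for brevity).
def cd1_step(W, v0, lr=0.01, rng=np.random.default_rng(0)):
    h0 = sigmoid(v0 @ W)                      # forward: H0, row-wise reads
    h0_sample = (rng.random(h0.shape) < h0) * 1.0
    v1 = sigmoid(h0_sample @ W.T)             # backward: V1, column-wise reads
    h1 = sigmoid(v1 @ W)                      # forward: H1
    W += lr * (np.outer(v0, h0) - np.outer(v1, h1))
    return W

W = 0.01 * np.random.default_rng(1).standard_normal((784, 256))
v0 = np.random.default_rng(2).random(784)
W = cd1_step(W, v0)
```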
Fig. 12(b) shows the number of clock cycles to complete each epoch of CD learning and the per-epoch performance gain of the proposed transposable synapse memory compared to the non-transposable memory. The minimum and maximum gains were 18.0$\times$ and 20.2$\times$, respectively, and the average performance gain over the 10 epochs was 19.3$\times$.
Fig. 12. (a) CD learning, (b) performance gain of the proposed transposable synapse
memory for CD learning using MNIST data set.
VI. CONCLUSION
We presented a transposable synapse memory using the 6T SRAM bit cell for fast online learning in neuromorphic processors. Based on synaptic weight relocation using a barrel shifter and transposable row addressing using row-transition multiplexers, the proposed design enables transposable memory access within an integrated SRAM array structure. The proposed memory shows 6.3$\times$ and 19.3$\times$ higher performance for the STDP and CD learning algorithms on the MNIST data set than the conventional non-transposable memory. The area overhead of the proposed design over the non-transposable memory was only 26%, which is much smaller than the area overheads of the previous works.
ACKNOWLEDGMENTS
This research was supported in part by the Technology Innovation Program (10067764)
funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea), the “Nano-Material
Technology Development Program” through the National Research Foundation of Korea
(NRF) funded by the Ministry of Science, ICT (NRF-2016M3A7B4910249), the MSIT (Ministry
of Science and ICT), Korea, under the “ICT Consilience Creative program” (IITP-2018-2011-1-00783)
supervised by the IITP (Institute for Information & communications Technology Promotion)
and Samsung Electronics Co., Ltd.
REFERENCES
(1) Akopyan F., et al., Oct. 2015, TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 34, No. 10, pp. 1537-1557.
(2) Davies M., et al., Jan. 2018, Loihi: A neuromorphic manycore processor with on-chip learning, IEEE Micro, Vol. 38, No. 1, pp. 82-99.
(3) Diehl P., Cook M., 2015, Unsupervised learning of digit recognition using spike-timing-dependent plasticity, Frontiers in Computational Neuroscience, Vol. 9, p. 99.
(4) Hinton G. E., Salakhutdinov R. R., 2006, Reducing the dimensionality of data with neural networks, Science, Vol. 313, No. 5786, pp. 504-507.
(5) Larochelle H., Bengio Y., 2008, Classification using discriminative restricted Boltzmann machines, in Proceedings of the 25th International Conference on Machine Learning, pp. 536-543.
(6) Coates A., Ng A., Lee H., 2011, An analysis of single-layer networks in unsupervised feature learning, in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 215-223.
(7) Hinton G. E., Osindero S., Teh Y.-W., 2006, A fast learning algorithm for deep belief nets, Neural Computation, Vol. 18, No. 7, pp. 1527-1554.
(8) Seo J., et al., Sep. 2011, A 45nm CMOS neuromorphic chip with a scalable architecture for learning in networks of spiking neurons, in 2011 IEEE Custom Integrated Circuits Conference (CICC), pp. 1-4.
(9) Kim J., Koo J., Kim T., Kim J.-J., 2018, Efficient synapse memory structure for reconfigurable digital neuromorphic hardware, Frontiers in Neuroscience, Vol. 12, p. 829.
Author
Jongeun Koo received the B.S. degree in Electrical Engineering from Kyungpook National
University, Daegu, Korea and the M.S. degree in Electrical Engineering from Pohang
University of Science and Technology, Pohang, Korea in 2001 and 2003, respectively.
From 2003 to 2019, he was with Samsung Electronics, Hwaseong, Korea, where he contributed
to the design and verification of DRAM, Flash memory, and SRAM products.
He is currently pursuing the Ph.D. degree. His research interests include near-/in-memory
computing and low-power VLSI design.
Jinseok Kim received the B.S. degree in Creative IT Engineering from Pohang University
of Science and Technology, Pohang, Korea in 2015, where he is currently pursuing the
Ph.D. degree.
His research interests range from algorithm development to chip design, and he has been working on designing efficient digital neuromorphic hardware and deep learning hardware accelerators.
Sungju Ryu received the B.S. degree in Electrical Engineering from Pusan National
University, Busan, Korea in 2015.
He is currently pursuing the Ph.D. degree in the Department of Creative IT Engineering,
Pohang University of Science and Technology, Pohang, Korea.
His current research interests include energy-efficient hardware accelerators for compressed neural networks, adaptive/resilient circuits, and low-power VLSI design.
Chulsoo Kim received the B.S. degree in Electrical Engineering from Kyungpook National
University, Daegu, Korea in 1991.
From 1991 to 2016, he was with Samsung Electronics, Hwaseong, Korea, where he contributed
to design of DRAM products for server, graphics and mobile applications.
Since 2017, he has been a research staff in the Department of Creative IT Engineering,
Pohang University of Science and Technology, Pohang, Korea.
His research interests include high-speed DRAM and emerging memories.
Jae-Joon Kim is currently a professor at Pohang University of Science and Technology,
Pohang, Korea.
He received the B.S. and M.S. degrees in Electronics Engineering from Seoul National
University, Seoul, Korea and Ph.D. degree from the School of Electrical and Computer
Engineering of Purdue University at West Lafayette, IN, USA in 1994, 1998, and 2004,
respectively.
Before joining POSTECH, he was with IBM T. J. Watson Research Center as a Research
Staff Member from May 2004 to Jan. 2013.
His current research interests include the design of deep learning hardware accelerators, neuromorphic processors, hardware security circuits, and circuits for exploratory devices.