Li Cheng
An Junseop
Kweon Jun Young
Song Yun-Heub
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Index Terms
Phase-change memory, PCM, neural network, feature extraction, neuron, synapse
I. INTRODUCTION
The emergence of non-volatile memory (NVM) used as synapses has promoted the development of hardware neural networks (HNNs), owing to the excellent scalability and synapse-like characteristics of NVM devices. Several types of NVM synapses have been implemented, including resistive random access memory (ReRAM), spin-transfer torque magnetic random-access memory (STT-MRAM), and phase-change memory (PCM) devices. We prefer to use PCM as the synapse for its excellent properties, such as multi-level resistance values, strong data retention, high endurance, promising reliability, fast programming, CMOS compatibility, and technological maturity (1-14). However, due to the programming mechanism of the PCM, the asymmetry between the partial SET and partial RESET operations is severe (9-14). As a result, the changes in the synaptic weights for the long-term potentiation (LTP) and long-term depression (LTD) operations are mismatched: the weight change for LTD ($\Delta W_{\mathrm{LTD}}$) may be several times greater than that for LTP ($\Delta W_{\mathrm{LTP}}$). In this paper, we compare the learning results of the fully-matched (also called symmetric, $\Delta W_{\mathrm{LTD}} = \Delta W_{\mathrm{LTP}}$) and mismatched (also called asymmetric, $\Delta W_{\mathrm{LTD}} > \Delta W_{\mathrm{LTP}}$) cases to show the influence of the asymmetric synapse property on the training process. We then propose an alternate pulse scheme (APS) to reduce the influence of this asymmetric property.
In Section II, we explain the basic properties of the PCM device acting as a synapse. In Section III, we describe the operating principle of the HNN used for feature extraction and show the influence of the asymmetry between LTP and LTD. In Section IV, we describe the operating principle of the proposed APS and compare the learning results of a mismatched case with those of a compensation case to show the effects of the APS. Finally, we conclude this paper in Section V.
II. PCM DEVICE AS SYNAPSE
Fig. 1. (a) The structure of a PCM device, (b) The experimental relationship between PCM conductance and programming voltage for 50 ns pulses, (c) The pulses for LTP and LTD operations, (d) The trace of synaptic weight evolution in response to the pulses.
The PCM device is a resistive device with a simple structure in which a chalcogenide glass layer (typically Ge$_{2}$Sb$_{2}$Te$_{5}$) is sandwiched between a top electrode and a bottom electrode (as shown in Fig. 1(a)). The conductance, which serves as the synaptic weight, depends on the molecular arrangement of the PCM and changes in response to the pulses applied across the device. The PCM device retains its conductance value when a sufficiently low voltage is applied. The RESET operation is performed by a pulse that heats the device above the melting point, followed by a fast falling edge, while the SET operation is performed by a pulse that heats it above the crystallization temperature but below the melting point. In addition to the fully SET and fully RESET operations, the PCM device can be programmed into intermediate conductance states by partial SET and partial RESET operations under suitable programming conditions (as shown in Fig. 1(b)). The most important properties of a PCM device acting as a synapse are these intermediate conductance states and the gradual conductance change produced by suitable partial SET (seen as LTP) or partial RESET (seen as LTD) operations. To introduce the weight-updating behavior intuitively, we assume the synaptic weight of the PCM device changes in response to pulses (as shown in Fig. 1(c)). If the pulse amplitude is below the crystallization voltage (V$_{\mathrm{C}}$), the PCM synapse retains its weight. If the pulse amplitude is above the melting voltage (V$_{\mathrm{m}}$), a partial RESET (LTD) operation is performed and the synaptic weight decreases by $\Delta W_{\mathrm{LTD}}$. If the pulse amplitude is between V$_{\mathrm{C}}$ and V$_{\mathrm{m}}$, a partial SET (LTP) operation is performed and the synaptic weight increases by $\Delta W_{\mathrm{LTP}}$. However, $\Delta W_{\mathrm{LTD}}$ is always much larger than $\Delta W_{\mathrm{LTP}}$ due to the asymmetric programming characteristic of the PCM device. In this paper, the typical value of the normalized $\Delta W_{\mathrm{LTP}}$ is set to 1/50; $\Delta W_{\mathrm{LTD}}$ equals $\Delta W_{\mathrm{LTP}}$ in the symmetric case and several times $\Delta W_{\mathrm{LTP}}$ in the asymmetric cases. Fig. 1(d) shows the weight trace over fifty suitable identical pulses.
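The weight-update behavior above can be captured in a short simulation sketch. The following Python snippet is a minimal model of the pulse response in Fig. 1(c) and (d); the voltage thresholds and the asymmetric step sizes are illustrative assumptions, not measured device values.

```python
# Illustrative thresholds (arbitrary units), not measured device values.
V_C = 1.0            # crystallization voltage V_C
V_M = 2.0            # melting voltage V_m

DW_LTP = 1 / 50      # normalized LTP step assumed in this paper
DW_LTD = 4 / 50      # asymmetric case: the LTD step is several times larger

def apply_pulse(weight, amplitude):
    """Update a normalized PCM synaptic weight in response to one pulse."""
    if amplitude < V_C:                  # below V_C: the weight is retained
        return weight
    if amplitude > V_M:                  # above V_m: partial RESET (LTD)
        return max(weight - DW_LTD, 0.0)
    return min(weight + DW_LTP, 1.0)     # between V_C and V_m: partial SET (LTP)

# Weight trace over fifty identical LTP pulses, as in Fig. 1(d).
w, trace = 0.0, []
for _ in range(50):
    w = apply_pulse(w, 1.5)              # amplitude between V_C and V_m
    trace.append(w)
```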
Fig. 2. (a) The leaky integrate-and-fire circuit, (b) The LTP and LTD pulse scheme
for PCM synapse, (c) The simulation result of the leaky integrate-and-fire circuit.
III. NEURAL NETWORK
1. Synapse and Neuron
The basic units of a neural network are synapses and neurons. We use one PCM device as one synapse. A leaky integrate-and-fire (LIF) circuit (as shown in Fig. 2(a)) represents one neuron. It collects weighted signals and aggregates them on the membrane voltage V$_{\mathrm{mem}}$. As shown in Fig. 2(c), when V$_{\mathrm{mem}}$ exceeds a threshold (V$_{\mathrm{TH}}$), V$_{\mathrm{mem}}$ is reset to zero and the pulse generator releases a square pulse as a post-synaptic spike. A square pulse with a low amplitude, below the crystallization voltage (V$_{\mathrm{C}}$), acts as a pre-synaptic spike (V$_{\mathrm{PRE}}$). Such pulses can be weighted and transmitted by synapses without causing a change in the synaptic weight. A square pulse with a high amplitude, above the melting voltage (V$_{\mathrm{m}}$), acts as a post-synaptic spike (V$_{\mathrm{POST}}$). The overlap of V$_{\mathrm{PRE}}$ and V$_{\mathrm{POST}}$ sets the voltage across a PCM synapse between V$_{\mathrm{C}}$ and V$_{\mathrm{m}}$, so that an LTP operation is performed (as shown in Fig. 2(b)). An LTD operation is performed when only V$_{\mathrm{POST}}$ is present, in which case the voltage across the PCM synapse exceeds V$_{\mathrm{m}}$. This weight-updating rule can be understood as a simplified STDP (11-17).
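As a rough behavioral illustration of the neuron in Fig. 2, the sketch below models leaky integrate-and-fire dynamics in Python; the leak factor and the threshold value are assumptions for illustration, not values extracted from the circuit.

```python
class LIFNeuron:
    """Minimal behavioral model of the LIF circuit in Fig. 2(a)."""

    def __init__(self, v_th=1.0, leak=0.95):
        self.v_mem = 0.0     # membrane voltage V_mem
        self.v_th = v_th     # firing threshold V_TH (assumed value)
        self.leak = leak     # leak factor per time step (assumed value)

    def step(self, weighted_input):
        """Integrate one time step; return True if a post-synaptic spike fires."""
        self.v_mem = self.leak * self.v_mem + weighted_input
        if self.v_mem > self.v_th:   # V_mem exceeds V_TH: fire and reset
            self.v_mem = 0.0
            return True              # the pulse generator releases V_POST
        return False
```

Here, the weighted input at each step is the sum of the pre-synaptic spikes multiplied by the corresponding synaptic weights (conductances).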
Fig. 3. The input patterns (a) for original pattern rebuilding, (b) for common feature
learning.
Fig. 4. (a) Conceptual design of the neuron circuit for the APS, (b) The pulse scheme for the APS.
2. Neural Network
In this paper, we built a two-layer fully-connected neural network to perform feature extraction. For simplicity, the output layer contains only one LIF neuron. The input layer contains 784 units, each connected to one pixel of the input pattern. The input pattern is a binary MNIST pattern whose white part corresponds to on-pixels and whose black part corresponds to off-pixels. An input neuron releases a pre-synaptic spike if its corresponding pixel is an on-pixel; otherwise, it releases no spike. The learning mechanism is based on Hebbian theory: synapses that exhibit causal relationships are potentiated, while synapses that exhibit anti-causal relationships are depressed, as sketched below.
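The following Python sketch expresses this Hebbian rule: when the output neuron fires, on-pixel synapses (causal) receive an LTP operation and off-pixel synapses (anti-causal) receive an LTD operation. The step sizes are the normalized values assumed in Section II.

```python
import numpy as np

def on_post_spike(weights, pre_spikes, dw_ltp=1/50, dw_ltd=4/50):
    """Simplified-STDP update applied when the output LIF neuron fires.

    weights:    array of 784 normalized synaptic weights in [0, 1]
    pre_spikes: boolean array, True where the input pixel is an on-pixel
    """
    potentiated = np.minimum(weights + dw_ltp, 1.0)  # causal synapses: LTP
    depressed = np.maximum(weights - dw_ltd, 0.0)    # anti-causal synapses: LTD
    return np.where(pre_spikes, potentiated, depressed)
```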
3. Feature Extraction Tasks
Feature extraction is one of the important characteristics of the neural network.
We performed two tasks about feature extraction. One of the tasks is the original
pattern rebuilding from the contaminated pattern. A pattern labeled “1” is contaminated
by different random noises (as shown in Fig. 3(a)). Here the “contaminated” means the pixels with noise are inversed. Our goal is to
extract the original pattern from the contaminated pattern. The other task is common
feature learning. Different patterns with the same label “5” (as shown in Fig. 3(b)) are provided to the neural network. And our goal is to extract their common features
for learning.
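For the first task, a contaminated input can be generated by inverting randomly chosen pixels of the binary pattern, as in the sketch below; the noise ratio used here is only an illustrative assumption, since Fig. 3(a) uses several different noise levels.

```python
import numpy as np

def contaminate(pattern, noise_ratio=0.1, seed=None):
    """Invert a random fraction of pixels in a binary 28x28 pattern."""
    rng = np.random.default_rng(seed)
    noisy = pattern.copy()
    flip = rng.random(noisy.shape) < noise_ratio
    noisy[flip] = 1 - noisy[flip]      # "contaminated" pixels are inverted
    return noisy
```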
4. Influence of Asymmetry
Theoretically, the two feature extraction tasks can be performed perfectly if the weight updates caused by LTP and LTD are exactly matched. However, due to the mismatch between LTP and LTD, the result of feature extraction may deviate from what is expected. For the original pattern rebuilding task, Fig. 6(a) shows the synaptic weights after the 100$^{\mathrm{th}}$ learning operation for the mismatched cases. As the mismatch increases, the image becomes fuzzy once $\Delta W_{\mathrm{LTD}}$ > 5/50. In addition, the average weight difference (AWD), which normalizes the difference between the actual and expected weight values, is shown in Fig. 6(c) and Table 1. The AWD after the 100$^{\mathrm{th}}$ learning operation is less than 1% for the cases with $\Delta W_{\mathrm{LTD}}$ < 5/50. As the mismatch increases, the final learning result moves far from the target.
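One concrete reading of this metric, used in the sketch below, is the mean absolute difference between the actual and expected normalized weights, expressed as a percentage; the exact normalization is an assumption here.

```python
import numpy as np

def awd(actual, expected):
    """Average weight difference (AWD), in percent.

    Interpreting the AWD as the mean absolute difference of the
    normalized weights is an assumption; the exact normalization
    may differ.
    """
    return 100.0 * np.mean(np.abs(actual - expected))
```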
The result of the common feature learning operation is shown in Fig. 7. The evolution of the synaptic weights for the fully-matched ($\Delta W_{\mathrm{LTD}}$ = 1/50) and mismatched ($\Delta W_{\mathrm{LTD}}$ = 4/50) cases is shown in Fig. 7(a) and (b), respectively. Because the weight value is bounded from below, some information from the trained patterns is lost in the mismatched case after a majority of the synaptic weights decrease to the minimum value. Furthermore, the final weight values corresponding to the overlap of on-pixels in the mismatched case (ranging from 0 to 0.2) are much lower than those in the fully-matched case (ranging from 0 to 1). In Fig. 7(d) and (e), we select four representative weights and show their evolution traces for the fully-matched and mismatched cases, respectively. In the fully-matched case, the weights corresponding to pixels (12,12) and (12,13) finally approach the maximum value because they receive more LTP operations, while the weights corresponding to pixels (23,15) and (23,16) remain relatively low because they receive fewer LTP operations. In the mismatched case, however, all of these weights approach the minimum value.
Fig. 5. Flowcharts of the previous scheme and the proposed APS.
Table 1. The AWD after the 100$^{\mathrm{th}}$ learning operation
Fig. 6. Synaptic weights after the 100$^{\mathrm{th}}$ learning operation for (a) mismatched, (b) compensation cases. The average weight difference for (c) mismatched, (d) compensation cases.
IV. PROPOSED ALTERNATE PULSE SCHEME
1. Concept of APS
The asymmetry between the LTP and LTD operations is an intrinsic property of a PCM device. The average mismatch m, defined as the ratio of $\Delta W_{\mathrm{LTD}}$ to $\Delta W_{\mathrm{LTP}}$ (m = $\Delta W_{\mathrm{LTD}}$/$\Delta W_{\mathrm{LTP}}$), can be pre-measured. A counter is added to the neuron circuit (as shown in Fig. 4(a)) to record how many times the neuron fires, and the pulse generator is designed to fire two kinds of pulses: pulse(1) performs only LTP operations, without LTD, while pulse(2) is the normal pulse, performing both LTP and LTD operations. The working principles of the pulses are shown in Fig. 4(b). The overlap of pulse(1) can only drive a PCM synapse into the SET regime (LTP operations); the working principle of pulse(2) is explained in Section III. The flowcharts of the previous scheme and the proposed APS are shown in Fig. 5. The counter is initially set to zero and increments by one after each complete integrate-and-fire operation of a neuron. Pulse(1), which prohibits the LTD operation, is provided until the counter reaches m. Once the counter reaches m, pulse(2) is fired instead of pulse(1). Under the APS, an LTD operation is therefore performed only once for every m LTP operations, which roughly compensates for the mismatch caused by the asymmetric property.
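The counter logic of the APS can be sketched behaviorally as follows; this Python class models the neuron-circuit extension of Fig. 4(a), not the circuit itself.

```python
class APSPulseSelector:
    """Behavioral model of the APS counter added to the neuron circuit.

    m is the pre-measured mismatch ratio dW_LTD / dW_LTP. Pulse(1)
    (LTP only) is fired until the counter reaches m; then pulse(2)
    (LTP + LTD) is fired and the counter restarts, so one LTD operation
    occurs for every m LTP operations.
    """

    def __init__(self, m):
        self.m = m
        self.counter = 0     # set to zero initially

    def select_pulse(self):
        """Called after each complete integrate-and-fire operation."""
        self.counter += 1
        if self.counter >= self.m:   # counter reached m: fire the normal pulse
            self.counter = 0
            return "pulse(2)"        # performs both LTP and LTD
        return "pulse(1)"            # prohibits the LTD operation
```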
2. Effects of APS on Original Pattern Rebuilding
To show the effects of the APS on original pattern rebuilding, we compared the learning results of the mismatched cases with those of the compensation cases (as shown in Fig. 6). The AWD values after the 100$^{\mathrm{th}}$ learning operation are shown in Table 1. For the cases with $\Delta W_{\mathrm{LTD}}$ > 8/50, the learning results with the APS approach the fully-matched case more closely than those of the mismatched cases under the same conditions. The AWDs of the compensation cases are much lower than those of the mismatched cases, and the patterns are much clearer. For the compensation cases with $\Delta W_{\mathrm{LTD}}$ < 15/50, the AWD is reduced to less than 0.6% by the APS. For the compensation case with $\Delta W_{\mathrm{LTD}}$ = 20/50, the AWD is as large as 1.658% due to the influence of background noise, but the pattern is still clear enough.
3. Effects of APS on Common Feature Learning
Fig. 7. Synaptic weights for (a) fully-matched, (b) mismatched, (c) compensation cases. The traces of specific synaptic weights for (d) fully-matched, (e) mismatched, (f) compensation cases.
Fig. 8. (a) The synaptic weights after the 5000$^{\mathrm{th}}$ learning operation for the fully-matched, mismatched, and compensation cases, (b) The difference ratio, (c) The summation of weights.
In Fig. 7, we compare the weight evolutions and the traces of specific synaptic weights for the fully-matched, mismatched, and compensation cases. For the mismatched and compensation cases, the weight changes for the LTP and LTD operations are set to $\Delta W_{\mathrm{LTP}}$ = 1/50 and $\Delta W_{\mathrm{LTD}}$ = 4/50. In contrast with the mismatched case, the evolution process of the compensation case closely approaches that of the fully-matched case. The learning result of the compensation case captures the common properties of the trained patterns, whereas the mismatched case merely reflects the properties of the last training pattern. The synaptic weight evolution traces of the compensation case are similar to those of the fully-matched case, while the traces of the mismatched case stay in the lower range.
The learning results after the 5000$^{\mathrm{th}}$ learning operation are shown in Fig. 8(a). All of the learning results of the compensation cases are much clearer than those of the mismatched cases. We statistically analyzed the difference ratio and the summation of the synaptic weights in Fig. 8(b) and (c), respectively. The difference ratio increases with the mismatch between LTP and LTD until it saturates at about 8%, and the summation of the normalized synaptic weights falls below 5 once the ratio of LTD to LTP exceeds 500%. This means that the learning result is merely the last training pattern and the integrated weighted signal is very weak. Compared with the mismatched cases, the difference ratios of the corresponding compensation cases are reduced, and the summation of the normalized synaptic weights is similar to that of the fully-matched case. This means that the learning results capture the common properties of the trained patterns, just as in the fully-matched case, and the integrated weighted signal is strong enough to drive the integrate-and-fire operation of the LIF neuron.
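For reference, the two statistics of Fig. 8(b) and (c) can be computed as sketched below. Reading the difference ratio as the mean absolute deviation from the fully-matched weights is an assumption, since its exact definition is not spelled out above; the summation is simply the sum of the normalized weights.

```python
import numpy as np

def learning_statistics(weights, fully_matched_weights):
    """Statistics analogous to Fig. 8(b) and (c).

    The difference ratio uses an assumed definition: the mean absolute
    deviation from the fully-matched result, in percent.
    """
    diff_ratio = 100.0 * np.mean(np.abs(weights - fully_matched_weights))
    weight_sum = float(np.sum(weights))
    return diff_ratio, weight_sum
```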
4. Effects of APS on Energy Efficiency
An additional effect of the APS is the reduction of energy consumption in the learning process. The energy consumed by a unit RESET operation on a PCM device is several times larger than that consumed by a unit SET operation, so the total energy consumption is reduced by reducing the number of RESET operations in the APS. The operation counts and energy consumption of the previous scheme and the APS ($\Delta W_{\mathrm{LTD}}$ = 4/50) are compared in Table 2; the energy consumption per unit LTP and LTD operation is based on the data from (13). In the previous scheme, the number of RESET operations (N$_{\mathrm{RESET}}$) equals the number of LTD operations (N$_{\mathrm{LTD}}$). In the APS, N$_{\mathrm{RESET}}$ equals a quarter of N$_{\mathrm{LTD}}$. As a result, the total energy consumption of the APS is only about 26.43% of that of the previous scheme.
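The energy comparison follows directly from the operation counts. The sketch below reproduces the bookkeeping with placeholder per-operation energies and counts; the actual values behind Table 2 are taken from (13), and with them the ratio evaluates to about 26.43%.

```python
# Placeholder per-operation energies (arbitrary units); the actual values
# behind Table 2 are taken from reference (13).
E_LTP = 1.0        # energy of one partial-SET (LTP) operation
E_RESET = 10.0     # a RESET is several times more expensive (assumed 10x)

m = 4              # mismatch ratio for dW_LTD = 4/50, dW_LTP = 1/50
n_ltp = 4000       # assumed LTP operation count, for illustration only
n_ltd = 1000       # assumed LTD operation count, for illustration only

# Previous scheme: every LTD event performs a RESET, so N_RESET = N_LTD.
e_prev = n_ltp * E_LTP + n_ltd * E_RESET

# APS: one RESET per m fires, so N_RESET = N_LTD / m.
e_aps = n_ltp * E_LTP + (n_ltd // m) * E_RESET

print(f"APS consumes {e_aps / e_prev:.1%} of the previous scheme's energy")
```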
V. CONCLUSIONS AND DISCUSSION
Table 2. The number of operations and energy consumption for a specified synapse and the total synapse array
In this paper, we analyzed the influence of the asymmetry between LTP and LTD and demonstrated it with two feature extraction tasks: (1) original pattern rebuilding and (2) common feature learning. We then proposed an APS to compensate for the influence of the asymmetry. As a result, the pattern rebuilt with the APS is more similar to the original pattern. In common feature learning, the APS reduces the difference ratio and increases the summation of the synaptic weights. The APS improves the learning results by reducing the influence of the asymmetry between LTP and LTD. However, we focused only on the asymmetry between the LTP and LTD operations, neglecting other non-ideal characteristics of PCM devices, such as stochasticity and non-linearity. We will further research the effects of the APS under more non-ideal characteristics (18-20) of the synapse.
ACKNOWLEDGMENTS
This research was supported by the Nano Material Technology Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (NRF-2016M3A7B4910398).
REFERENCES
1. Kuzum D., 2011, Nanoelectronic programmable synapses based on phase change materials for brain-inspired computing, Nano Letters, Vol. 12, No. 5, pp. 2179-2186
2. Breitwisch M. J., Cheek R. W., Lam C. H., Modha D. S., Rajendran B., US Patent 20100299297
3. Kim S., 2015, NVM neuromorphic core with 64k-cell (256-by-256) phase change memory synaptic array with on-chip neuron circuits for continuous in-situ learning, 2015 IEEE International Electron Devices Meeting (IEDM), pp. 17.1.1-17.1.4
4. Wang Z., 2015, A 2-transistor/1-resistor artificial synapse capable of communication and stochastic learning in neuromorphic systems, Frontiers in Neuroscience, Vol. 8, pp. 438
5. Pantazi A., 2016, All-memristive neuromorphic computing with level-tuned neurons, Nanotechnology, Vol. 27, No. 35, pp. 355205
6. Ambrogio S., 2016, Unsupervised learning by spike timing dependent plasticity in phase change memory (PCM) synapses, Frontiers in Neuroscience, Vol. 10, No. 56
7. Tuma T., 2016, Detecting correlations using phase-change neurons and synapses, IEEE Electron Device Letters, Vol. 37, No. 9, pp. 1238-1241
8. Nandakumar S. R., 2017, Supervised learning in spiking neural networks with MLC PCM synapses, 2017 75th Annual Device Research Conference (DRC), pp. 1-2
9. La Barbera S., 2018, Narrow heater bottom electrode-based phase change memory as a bidirectional artificial synapse, Advanced Electronic Materials, Vol. 4, No. 9, pp. 1800223
10. Boybat I., 2018, Neuromorphic computing with multi-memristive synapses, Nature Communications, Vol. 9, No. 1, pp. 2514
11. Suri M., 2011, Phase change memory as synapse for ultra-dense neuromorphic systems: Application to complex visual pattern extraction, 2011 International Electron Devices Meeting, pp. 4.4.1-4.4.4
12. Suri M., 2012, Physical aspects of low power synapses based on phase change memory devices, Journal of Applied Physics, Vol. 112, No. 5, pp. 054904
13. Bichler O., 2012, Visual pattern extraction using energy-efficient “2-PCM synapse” neuromorphic architecture, IEEE Transactions on Electron Devices, Vol. 59, No. 8, pp. 2206-2214
14. Suri M., 2011, Phase change memory for synaptic plasticity application in neuromorphic systems, The 2011 International Joint Conference on Neural Networks, IEEE, pp. 619-624
15. Querlioz D., 2011, Simulation of a memristor-based spiking neural network immune to device variations, The 2011 International Joint Conference on Neural Networks, IEEE, pp. 1775-1781
16. Querlioz D., 2012, Bioinspired networks with nanoscale memristive devices that combine the unsupervised and supervised learning approaches, 2012 IEEE/ACM International Symposium on Nanoscale Architectures (NANOARCH), IEEE, pp. 203-210
17. Querlioz D., 2013, Immunity to device variations in a spiking neural network with memristive nanodevices, IEEE Transactions on Nanotechnology, Vol. 12, No. 3, pp. 288-295
18. Boybat I., 2017, Stochastic weight updates in phase-change memory-based synapses and their influence on artificial neural networks, 2017 13th Conference on Ph.D. Research in Microelectronics and Electronics, pp. 13-16
19. Narayanan P., 2017, Neuromorphic technologies for next-generation cognitive computing, 2017 IEEE Electron Devices Technology and Manufacturing Conference (EDTM)
20. Burr G. W., 2015, Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element, IEEE Transactions on Electron Devices, Vol. 62, No. 11, pp. 3498-3507
Author
was born in Linfen City, Shanxi Province, China, in 1989.
He received the B.S. degree in electronics engineering from Hanyang University, Seoul, South Korea, in 2013.
He is currently working toward the unified M.S. and Ph.D. degree in electronic engineering.
His current research interests include neuromorphic systems, phase-change material synapses, and neuron circuits.
received the B.S. degree in electronic engineering from Hanyang University, Seoul, Korea, in 2015.
Since 2015, he has been enrolled in the unified Master's and Doctoral course in electronics engineering.
His current research interests include the development of neuromorphic systems using phase-change memory, simulation studies using TCAD, and the fabrication and characterization of memory devices such as PRAM.
received the B.S. degree in electronics engineering from Hanyang University, Seoul, South Korea, in 2014.
He is currently a unified Ph.D. student in the division of nanoscale semiconductor engineering at Hanyang University.
His research interest is phase-change material memory systems.
received his M.S. degree in electronic engineering from Hanyang University, Seoul, Korea, in 1992, and his Ph.D. degree in intelligent mechanical engineering from Tohoku University, Sendai, Japan, in 1999. He is currently a Professor of Electronic Engineering at Hanyang University, Seoul, Korea.
He has researched semiconductor devices and circuit design for more than 30 years at the Semiconductor R&D Center, Samsung Electronics Co., and Hanyang University, Korea.
While at Samsung, he was responsible, as a vice-president, for the device and product development of Flash memory, and he developed 256 Mb and 512 Mb NOR Flash memory in 2000-2003. After moving to Hanyang University in 2008, he served as vice-dean of the College of Engineering, engaging in extensive international collaborative research and planning of industrial projects from 2011 to 2013.
His research interests include device reliability modeling, device characterization, novel device structures and architectures for memory and logic applications, circuit design and algorithms for low power and high speed, and sensor systems based on semiconductor technology.