I. INTRODUCTION
Deep neural networks have been actively developed as classifiers in various computer
vision applications due to their high classification accuracy [1-5]. However, they involve too large an amount of computation and memory footprint to be
deployed in battery-driven mobile devices yet. Besides deep neural networks, Support
Vector Machines (SVMs) and shallow neural networks have been commonly used for mobile applications,
but the former suffers from massive computation and data storage compared to
the latter. Among the variants of neural networks, the Radial Basis Function Network (RBFN)
and the Multilayer Perceptron (MLP) are most commonly used for classification on account
of their high classification accuracy and fast learning process compared to other
types of classifiers [6,7]. Mobile applications such as object recognition in unmanned aerial vehicles [8] and head-mounted displays [9,10] require low power consumption while preserving high classification accuracy and fast
processing speed. However, previous neural network implementations were not able to
achieve such high accuracy and low power consumption at the same time, owing to the distinct
advantages and disadvantages of analog and digital circuit implementations [11,19].
Digital implementations of neural networks [11-13] can achieve high accuracy and programmability with noise
tolerance, but they consume large power and area. Hundreds of weights must be stored
in SRAM, together with a number of nonlinear activation functions in a Look-Up Table (LUT).
The LUT values are usually approximated because holding every entry requires huge memory:
for example, an 8-b resolution of one activation requires 256 B, and digital
multiplication doubles the precision, so an external memory is needed to store
the parameters. Moreover, parallel processing of the activation function for multiple
neurons was not possible, degrading the processing time [14]. On the other hand, analog VLSI implementation provides low-cost parallelism as
well as low-power computation such as addition, but inaccurate circuit parameters
with noise and low precision degrade classification accuracy [15-17]. Several mixed-mode SoCs took the advantages of both analog and digital implementations,
obtaining low power consumption within a small area; however, they paid no attention
to noise compensation, although mobile devices suffer from supply voltage fluctuation
and temperature variation that directly ruin accuracy [18,19].
In this paper, an analog-digital mixed-mode neural network classifier (NNC) is proposed
for mobile scene classification. Its highly controllable radial basis function (RBF)
circuit generates a sigmoid-shaped function as well as various shapes of RBF [20], supporting both an approximated MLP and an RBFN. By adopting environmental noise compensation
in the analog core, namely supply voltage ($\Delta V_{DD}$) and temperature ($\Delta T$) compensation, the proposed SoC achieves 92% classification
accuracy.
The rest of the paper is organized as follows. Section II introduces the NNC and its
basic operations, and Section III explains the hardware architecture of the proposed
chip. Section IV shows measurement and implementation results, followed
by the conclusion in Section V.
II. NEURAL NETWORK CLASSIFIER
Fig. 1(a) shows the typical neural network architecture. It consists of an input layer ($X$),
hidden layers ($H$), and an output layer ($Z$). Every input is fully connected to
the hidden neurons; the inputs are multiplied by weights ($c$, $w$, $v$) and summed
before the transfer function $f(\cdot)$. The output of each hidden neuron $j$ is given by (1), where the types of activation functions are shown in Fig. 1(b).

$h_{j}=f\big(\sum_{i} w_{ij}\,x_{i}\big)$ (1)
Fig. 1. (a) Neural network architecture, (b) Types of activation functions. Bell-shaped
or Gaussian functions for RBFN; Sigmoid and ReLU for MLP, (c) Basic concepts of linear
and non-linear classification. Left diagram = MLP, right diagram = RBFN.
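The fully connected hidden-neuron computation above can be sketched behaviorally; the function name, toy weights, and inputs below are illustrative, not the chip's fixed-point formats:

```python
import math

def hidden_neuron(x, w, f):
    """Weighted sum of inputs followed by the transfer function f, as in Eq. (1)."""
    return f(sum(wi * xi for wi, xi in zip(w, x)))

sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))

# Toy example: three inputs, one hidden neuron with sigmoid activation.
h = hidden_neuron([0.5, -1.0, 2.0], [0.2, 0.4, 0.1], sigmoid)
```

A full hidden layer simply repeats this per neuron with its own weight row, and the output layer applies the same pattern to the hidden activations.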
Among the different types of activation functions in Fig. 1(b), the sigmoid and ReLU functions are most widely used for the MLP and deep neural networks,
and RBFs are used for the Support Vector Machine and the RBFN. The output of the RBFN is represented
as (2),

$z_{j}=\sum_{i} w_{ij}\,\phi\!\left(\frac{\|x-c_{i}\|}{\sigma_{i}}\right)$ (2)

where $w_{ij}$ is the weight between the $i$th center ($c_{i}$) of the RBF and output $j$, $\sigma_{i}$ represents the width of the
$i$th center, $\|\cdot\|$ is the Euclidean norm on the input space, and $\phi$ is the output of
each hidden RBF node, which corresponds to bell-shaped functions including the Gaussian.
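The RBFN output of Eq. (2) can be sketched as a plain software model with Gaussian hidden nodes; all names and values here are illustrative:

```python
import math

def rbfn_output(x, centers, widths, weights):
    """RBFN output per Eq. (2): weighted sum of Gaussian hidden-node outputs,
    where each node fires on the Euclidean distance from x to its center."""
    phi = [math.exp(-sum((xd - cd) ** 2 for xd, cd in zip(x, c)) / (2.0 * s ** 2))
           for c, s in zip(centers, widths)]
    return sum(w * p for w, p in zip(weights, phi))

# Two hidden RBF nodes; an input at the first center excites that node fully.
out = rbfn_output([0.0, 0.0],
                  centers=[[0.0, 0.0], [1.0, 1.0]],
                  widths=[1.0, 1.0],
                  weights=[1.0, 0.0])
```

Because each $\phi$ peaks at its center and decays with distance, the trained weights only remain meaningful while the bump shapes stay intact, which is the noise sensitivity discussed below.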
The operation of neural network classification differs by the type of function used
for activation, as depicted in Fig. 1(c). In an MLP, the number of hidden layers determines the combinations of linear decision boundaries,
whereas using RBFs yields nonlinear classification boundaries. Therefore, the MLP
and RBFN show different classification accuracy under the same network size or computational
cost. The RBFN shows great performance for simple classification with low complexity
compared with today's heavy deep neural networks, but its performance depends on the shape, center,
and width of the RBFs. If the shape of an RBF is corrupted by noise, resulting in
unwanted overlaps among the RBFs as depicted in Fig. 2, the trained weight values are no longer reliable, which causes severe degradation
of accuracy. Therefore, the NNC should provide versatile shapes of RBF with stability.
To solve this noise-induced problem, the proposed NNC processor
contains a highly controllable RBF circuit that can generate various RBFs as well as
a sigmoid-shaped function for reconfigurable processing.
Fig. 2. Changes of RBF shape due to environmental noise $\Delta \mathrm{V}_{\mathrm{DD}}$.
III. MIXED-MODE HARDWARE ARCHITECTURE
Fig. 3 shows the overall hardware architecture of the proposed mixed-mode NNC processor.
The analog datapath performs feed-forward neural network classification. For low power
consumption, the analog core is designed with current-mode circuits, which exploit
Kirchhoff's current law for summation. The entire analog core consists of DACs for the inputs, current
multipliers for weight multiplication, I-V converters, RBF circuits, and a sigmoid
circuit. The neural network module has 4 input neurons, 6 hidden neurons, and 1 output
neuron, and it is operated recursively to cover 25 scene categories.
Fig. 3. Overall architecture of the mixed-mode NNC processor.
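The recursive use of the small 4-6-1 network over the 25 categories might look as follows in software; the per-category scoring function is a placeholder, not the chip's actual datapath:

```python
def classify_recursive(x, category_params, forward):
    """Run the single-output 4-6-1 network once per category (recursive reuse
    of the same hardware) and pick the category with the highest score."""
    scores = [forward(x, p) for p in category_params]
    return max(range(len(scores)), key=lambda i: scores[i])

# Placeholder scorer: higher when x is closer to the category's parameter.
forward = lambda x, p: -abs(x - p)
winner = classify_recursive(7, list(range(25)), forward)
```

Reusing one small network 25 times trades latency for area and power, which is what lets the module stay at 4-6-1 while still producing 25-way decisions.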
The neural network parameters are stored in the memory bank of the digital controller,
which sets the corresponding parameter values in the analog core through the DAC
bank. The digital controller also performs on-line learning of the weights and RBF
shapes.
1. Analog Neural Network Core Circuits
The number of DACs and current multipliers required in the fully-connected layers
equals the number of weights. Instead of instantiating that many DACs and multipliers,
binary-weighted current mirrors are implemented to save area and power, as shown in
Fig. 4. They perform weight multiplication when the NNC is used in MLP-mode. In RBFN-mode,
the multipliers in the first layer are used as wire connections and those in the second
layer are used for weight multiplication. The input current is multiplied by two
4-b weight segments, $w[3:0]$ and $w[7:4]$, to save area. The outputs mirrored through
M1~M2 and M3~M4 represent the LSBs and MSBs, respectively, and they have different sizes.
The final output current is given by (3).
Fig. 4. 8-bit current multiplier.
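The nibble-wise weight split can be checked numerically with a behavioral sketch; this models the arithmetic only, not the current mirrors themselves:

```python
def multiply_8b(i_in, w):
    """8-b weight multiply via two 4-b segments: the MSB nibble's mirror is
    sized 16x the LSB nibble's, so summing the two outputs recovers i_in * w."""
    lsb = w & 0xF          # w[3:0], low nibble
    msb = (w >> 4) & 0xF   # w[7:4], high nibble
    return i_in * lsb + i_in * 16 * msb  # equals i_in * w
```

The point of the split is hardware cost: two 4-b mirror banks are far smaller than one full 8-b binary-weighted bank, while the summed currents still realize the full 8-b product.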
Fig. 5 shows the sigmoid circuit for the output neuron. Bias voltages $V_{BP}$ and $V_{BN}$
control the transient points and output range. The reference voltages define shape
and slope of the sigmoid function, and setting their values can also provide approximately
linear functions. The controllable sigmoid circuit saves area and power of the NNC
processor.
Fig. 5. Controllable sigmoid circuit for activation and measured waveform.
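A behavioral model of such a controllable sigmoid, with parameters loosely standing in for the bias and reference voltages (the numeric values are illustrative, not measured):

```python
import math

def sigmoid_out(v, v_mid=0.6, slope=10.0, i_max=1.0):
    """Sigmoid with a controllable transition point (v_mid), slope, and output
    range (i_max), mimicking the circuit's voltage-programmable shape."""
    return i_max / (1.0 + math.exp(-slope * (v - v_mid)))
```

With a small slope the curve is nearly linear around v_mid, which matches the text's note that the reference settings can also approximate linear functions.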
For low power consumption, the analog core operates with currents on the order of nA.
As a result, the circuit is highly sensitive to the environmental noises of $\Delta T$ and
$\Delta V_{DD}$ variations. To compensate for such noise, all the analog circuits adopt
a stable current reference from [21] with modifications, as shown in Fig. 6. The stacked output nodes generate a stable current whose amount is controlled
by the 5-b switches $BS[4:0]$.
Fig. 6. Environmental noise robust current source.
Fig. 7(a) shows the proposed RBF circuit, which is highly controllable with six parameters:
$V_{ref1}$, $V_{ref2}$, $B[4:0]$, $I_{x}$, $Sel_{-} p$, and $F_{-} Sel$. Since the bias voltages
$V_{ref1}$ and $V_{ref2}$ define the transient points of the V-I curve, their combination
defines the center and width of the RBF and the approximated sigmoid function shape. The switches
$B[4:0]$ control the $g_{m}$ of the curve; the height is set by $I_{x}$, which is controlled
by the binary switches $BS[4:0]$ in the current source; and the up/down phase of the function is set
by the multiplexer $Sel_{-} p$. The MUX, $F_{-} Sel$, and a level shifter invert the input voltage
domain to generate much sharper bump functions, as described in [20]. Fig. 7(b) shows measured waveforms of the circuit. The circuit generates various shapes of
RBF (black, red, green, blue) as well as the sigmoid-shaped function depicted with purple
dots, where $V_{ref1}$ is 0 V and $V_{ref2}$ is 0.6 V. Moreover, it reduces the $\Delta
T$ noise error by 92.7% under a [-37℃, 87℃] test range, and achieves a stable output current
within a ±0.2 V window of $\Delta V_{DD}$, where $V_{DD}$ is 1.2 V.
Fig. 7. (a) The highly controllable RBF circuit with noise compensation, (b) Measured
analog waveforms of RBFs show diversity of activation functions with noise compensation.
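One common behavioral model of a bump circuit is the difference of two shifted sigmoids, with the two reference voltages setting the rising and falling transitions; this sketch is an assumption about the transfer shape, not the transistor-level behavior:

```python
import math

def bump(v, v_ref1, v_ref2, gm=20.0, height=1.0):
    """Bump (RBF-like) V-I curve: a rising edge near v_ref1 and a falling edge
    near v_ref2, so their midpoint is the center and their gap the width."""
    s = lambda u: 1.0 / (1.0 + math.exp(-gm * u))
    return height * (s(v - v_ref1) - s(v - v_ref2))
```

Pushing v_ref2 far above the input range degenerates the bump into a single sigmoid edge, which is consistent with one circuit serving both RBFN-mode and the approximated-MLP mode.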
2. Digital Controller for Learning
Fig. 8 shows the digital controller, which consists of a control unit, a learning unit, and
a configuration memory that controls the analog parameters through the DAC bank. In the learning
unit, the K-means Clustering Accelerator (KCA) and the Back-Propagation Accelerator
(BPA) are used for learning the RBF parameters and the weights of the fully-connected
layers. In RBFN-mode, the current multipliers in the first layer work as wire connections,
and the NNC FSM controller sets the RBF parameters, which are trained by the KCA, to
the analog core. The BPA consists of a 4-way SIMD Multiply-and-Accumulate array
for back-propagation and a sum-of-squared-distance unit for loss computation. The KCA
contains a centroid unit that finds the center of each cluster, or class, and an RBF identifier
that finds the shape of the RBFs. The KCA is not used in MLP-mode, in which the NNC FSM
controller sets the trained weights via the current multipliers in the first layer.
The learning unit is used only for on-line training of the weights; it is not needed
for feed-forward classification once the neural network parameters are trained.
Fig. 8. Digital controller architecture.
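The centroid step that the KCA accelerates corresponds to one standard K-means iteration, sketched here in software as a functional model rather than the hardware datapath:

```python
def kmeans_step(points, centers):
    """One K-means iteration: assign each point to its nearest center by
    squared Euclidean distance, then recompute centroids. The converged
    centroids serve as the RBF centers."""
    k = len(centers)
    clusters = [[] for _ in range(k)]
    for p in points:
        j = min(range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
        clusters[j].append(p)
    # Empty clusters keep their old center.
    return [[sum(dim) / len(c) for dim in zip(*c)] if c else list(centers[i])
            for i, c in enumerate(clusters)]
```

Iterating this step until the centers stop moving yields the cluster centers; a width (sigma) per center can then be derived from the spread of its assigned points.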
IV. IMPLEMENTATION & MEASUREMENT
Fig. 9 shows the chip micrograph and the performance summary of the proposed NNC processor.
It is fabricated in a 0.13 $\mu m$ CMOS process as a part of the mobile object recognition
processor [8]. The NNC processor occupies 0.140 mm2 and consumes 2.20 mW running at 200 MHz in the digital domain, while the whole SoC [8] occupies 25.0 mm2 and consumes 260 mW on average. The power consumption of the analog neural network
core is only 723 $\mu W$ thanks to the current-mode circuits. Compared with a conventional
fully-digital implementation, the proposed processor saves 84.0% of area and 82.2% of power
by utilizing the mixed-mode architecture.
Fig. 9. Chip photograph and summary table.
Table 1 compares the performance with analog/digital ASIC and FPGA implementations.
We define complexity, which represents the number of weights, to compare power and
area efficiencies. Also, the efficiencies of [11] are scaled to a 0.13 $\mu m$ process by applying Dennard scaling for comparison. Owing to
the natural characteristics of the spiking neural network, [11] shows the greatest power efficiency and the highest complexity, but it requires large area
to implement every neuron, so its area efficiency is lower than that of this work. In addition,
this work can support both MLP and RBFN through its reconfigurable architecture. Compared
to analog circuits, mixed-mode implementations are advantageous for obtaining high programmability.
Among the mixed-mode processors [18,19], this work achieves the highest efficiency in terms of power and area due to its
recursive operating architecture for 25-category classification.
Table 1. Comparisons with Neural Network Processors
| Reference | Classifier Type | Signal Type | Complexity [# of weights] | Programmability | Process | Power [mW] | Area [mm2] | Power $\eta$ [#weight/mW] | Area $\eta$ [#weight/mm2] |
|---|---|---|---|---|---|---|---|---|---|
| Seo [11]* | SNN 1) | Digital | 64k (256x256) | High | 45 nm | ~3.00 | 4.20 | 7560 | 1870 |
| Yang [13] | RBFN | Digital (FPGA) | 135 | Extremely High | 0.18 μm | 967 | N/A | 0.140 | N/A |
| Lont [16] | MLP | Analog | 161 | Very Low | 3 μm | 25.0 | 2.40 | 6.44 | 67.1 |
| Peng [17] | RBFN | Analog | 14 | Low | 0.5 μm | 2.24 2) | 0.0482 2) | 6.25 | 290 |
| Kim [18] | NFL 3) | Mixed | 12 (3x4) | High | 0.13 μm | 2.83 | 0.163 | 4.24 | 73.6 |
| Oh [19] | NFL 3) | Mixed | 27 | High | 0.13 μm | 1.20 | 0.765 | 271 | 35.3 |
| This Work | MLP/RBFN | Mixed | 750 (30x25) | High | 0.13 μm | 2.20 | 0.140 | 341 | 5360 |

1) SNN: Spiking Neural Network; 2) Numbers are available only with RBF circuits; 3) NFL: Neuro-Fuzzy Logic
* Power dissipation differs by the variants in [11]; power and area efficiencies are scaled to the 0.13 $\mu m$ process.
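The efficiency figures for this work in Table 1 follow directly from the complexity and the measured power and area:

```python
def efficiencies(n_weights, power_mw, area_mm2):
    """Power efficiency [#weights/mW] and area efficiency [#weights/mm^2],
    as defined for Table 1."""
    return n_weights / power_mw, n_weights / area_mm2

# This work: 750 weights, 2.20 mW, 0.140 mm^2.
p_eff, a_eff = efficiencies(750, 2.20, 0.140)
# roughly 341 weights/mW and 5360 weights/mm^2, matching the table entries
```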
Fig. 10 shows the measurement process of the proposed processor, which serves as visual
attention for the entire object recognition SoC [8]. The input image is decomposed into 128x128-pixel macro-blocks, and HMAX is performed
over each block to extract a statistical descriptor vector that becomes the input to
the RBFN. Then, each macro-block is classified into one of the 25 pretrained scene categories
by the recursive operation of the RBFN. Finally, the input image turns into a spatially organized
scene map, and object recognition is performed within the macro-blocks. The scene classification
result provides the likelihood of the target object to the object recognition pipeline,
so that only the correct objects of interest are detected. For example, in safe driving,
drivers are interested in moving vehicles on the road, not vehicles on advertisements.
Scene classification with the RBFN provides the contextual information to detect objects
in the road scene category. Fig. 11 shows the evaluation platform and results. The SoC is integrated with the multimedia
expansion board and evaluated in a city-view experiment setup, where the target object
is a toy car on the road and the distractor is a vehicle on an advertising board.
The RBFN scene classification produces the context-aware map depicted at the bottom right,
and only the target object in the road scene is recognized while the advertisement is
neglected.
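The macro-block decomposition and per-block classification can be sketched as follows; classify_block here stands in for the HMAX-plus-RBFN pipeline:

```python
def scene_map(image_w, image_h, classify_block, block=128):
    """Decompose the frame into block x block macro-blocks and classify each
    into one of the 25 scene categories, yielding a spatial scene map."""
    return [[classify_block(x, y) for x in range(0, image_w, block)]
            for y in range(0, image_h, block)]

# A 256x256 frame yields a 2x2 scene map (one label per 128x128 macro-block).
smap = scene_map(256, 256, lambda x, y: 0)
```

The resulting map is what gates the recognition stage: object detection only runs on macro-blocks whose scene label is consistent with the target object.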
Fig. 10. Measurement process of scene classification for object recognition.
Fig. 11. Evaluation system and results.
Thanks to the mixed-mode implementation of the proposed NNC processor, the overall visual
attention accuracy is increased to 84%, a 1.40x improvement over the conventional
visual attention model. As a result, the entire SoC [8] achieved 96% object recognition accuracy in a test of 200 objects with 25 scene
categories. In addition to the scene classification with the HMAX descriptor, the standalone
classification accuracy was measured with handcrafted test vectors, and the proposed NNC processor
achieved 92%.
V. CONCLUSIONS
In this work, a reconfigurable mixed-mode neural network classifier processor is proposed
for robust and low-power scene classification as a part of a mobile object recognition
processor. It consists of noise-tolerant analog circuits that compensate for temperature
and supply voltage variations in order to achieve high classification accuracy, and
it supports both MLP and RBFN. The proposed processor, fabricated in a 0.13 μm CMOS process,
consumes 2.20 mW running at 200 MHz and achieves 92% classification accuracy. Thanks
to the analog-digital mixed-mode implementation, the proposed neural network classifier
processor reduces area and power by 84.0% and 82.2%, respectively, compared with a fully-digital ASIC
implementation.
ACKNOWLEDGMENTS
This work was supported by the research fund (1.180081.01) of UNIST.
REFERENCES
Krizhevsky A., et al, 2012, ImageNet classification with deep convolutional neural
networks, in Advances in Neural Information Processing Systems (NIPS), Vol. 25
Szegedy C., et al , Jun. 2015, Going deeper with convolutions, IEEE Conf. on Computer
Vision and Pattern Recognition (CVPR), pp. 1-9
He K., Zhang X., Ren S., Sun J., Jun. 2016, Deep residual learning for image recognition,
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 770-778
Xie S., et al , Jul. 2017, Aggregated residual transformations for deep neural networks,
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 5987-5995
Redmon J., et al , Jun. 2016, You only look once: unified, real-time object detection,
IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 779-788
Ou Y., Oyang Y., Aug. 2005, A novel radial basis function network classifier with
centers set by hierarchical clustering, in International Joint Conference on Neural
Networks, pp. 1383-1388
Sardar S., et al , Nov. 2011, A hardware/software co-design model for face recognition
using cognimem neural network chip, in IEEE International Conference on Image Information
Processing, pp. 1-6
Park J., et al , 2013, A 646GOPS/W multi-classifier many-core processor with cortex-like
architecture for super-resolution recognition, IEEE International Solid-State Circuits
Conference Digest of Tech. Papers, pp. 168-169
Kim G., et al, Jan. 2015, A 1.22 TOPS and 1.52 mW/MHz augmented reality multicore
processor with neural network NoC for HMD applications, IEEE Journal of Solid-State
Circuits, Vol. 50, No. 1, pp. 113-124
Hong I., et al , Jan. 2016, A 2.71nJ/pixel gaze-activated object recognition system
for low-power mobile smart glasses, IEEE Journal of Solid-State Circuits, Vol. 51,
No. 1, pp. 45-55
Seo J., et al , Oct. 2011, A 45nm CMOS neuromorphic chip with a scalable architecture
for learning in networks of spiking neurons, in Proceedings of IEEE Custom Integrated
Circuits Conference
Ienne P., et al , Aug. 1996, Special-purpose digital hardware for neural networks:
an architectural survey, Journal of VLSI Signal Processing Systems, Vol. 13, No. 1,
pp. 5-25
Yang F., Paindavoine M., Sept. 2003, Implementation of an RBF neural network on embedded
systems: real-time face tracking and identity verification, IEEE Transactions on Neural
Networks, Vol. 14, No. 5, pp. 1162-1175
Du K. L., Swamy M. N. S., 2006, Neural Networks in a Softcomputing Framework, London,
Springer, Vol. 6, No. 14, pp. 285
Kang K., Shibata T., Jul. 2010, An on-chip-trainable gaussian-kernel analog support
vector machine, IEEE Transactions on Circuits and Systems I, Vol. 57, No. 7, pp. 1513-1524
Lont J., Guggenbuhl W., May 1992, Analog CMOS implementation of a multilayer perceptron
with nonlinear synapses, IEEE Transactions on Neural Networks, Vol. 3, No. 3, pp.
457-465
Peng S., Hasler P., Anderson D., Oct. 2007, An analog programmable multi dimensional
radial basis function based classifier, IEEE Transactions on Circuits and Systems,
Vol. 54, No. 10, pp. 2148-2158
Kim M., et al , 2009, A 54GOPS 51.8mW analog-digital mixed mode neural perception
engine for fast object detection, in IEEE Custom Integrated Circuits Conference, pp.
649-652
Oh J., Lee S., Yoo H. J., May 2013, 1.2mW online learning mixed-mode intelligent inference
engine for low-power real-time object recognition processor, IEEE Transactions on
VLSI Systems, Vol. 21, No. 5, pp. 921-933
Lee K., et al, May 2013, A multi-modal and tunable radial-basis-function circuit with
supply and temperature compensation, in Proceedings of IEEE International Symposium
on Circuits and Systems, pp. 1608-1611
Yoo C., Park J., Dec. 2007, CMOS current reference with supply and temperature compensation,
IEEE Electronics Letters, Vol. 43, No. 25, pp. 1422-1424
Author
(S’12-M’17) received B.S., M.S., and Ph.D. degrees in the School of Electrical Engineering
from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea in
2012, 2014 and 2017, respectively.
From 2017 to 2018, he was a postdoctoral researcher at the Information Engineering
and Electronics Research Institute, KAIST, Daejeon, Korea, and a Principal Engineer
at UX Factory Inc., Pangyo, Korea.
Now he is an Assistant Professor at the School of Electrical and Computer Engineering,
Ulsan National Institute of Science and Technology (UNIST).
His research interests include mixed-mode neuromorphic SoC, deep learning processor,
Network-on-Chip architectures, and intelligent computer vision processor for mobile
devices and autonomous vehicles.
(S’09-M’15) received Ph.D. degree in Electrical Engineering from Korea Advanced Institute
of Science and Technology (KAIST), Daejeon, Korea in 2014.
His Ph.D. research focused on System-on-a-Chip (SoC) architectures for energy-efficient
vision processing and artificial intelligence.
He is interested in customized architectures and circuits for computationally intensive
algorithm such as computer vision, machine learning, and their integration on the
mobile platform.
Since 2015, he has been running a start-up, UX Factory Inc., which is dedicated to
delivering AI solutions derived from Software-System-on-Chip technologies.
(M’95 – SM’04 – F’08) received the B.S. degree in electronics engineering from Seoul
National University, Seoul, South Korea, in 1983, and the M.S. and Ph.D. degrees in
electrical engineering from Korea Advanced Institute of Science and Technology (KAIST),
Daejeon, in 1985 and 1988, respectively.
From 2001 to 2005, he was the Director of Korean System Integration and IP Authoring
Research Center (SIPAC), South Korea. From 2003 to 2005, he was a full-time Advisor
to the Korean Ministry of Information and Communication, South Korea, and the National
Project Manager for System-on-Chip and Computer.
In 2007, he founded System Design Innovation & Application Research Center (SDIA)
at KAIST.
Since 1998, he has been with the Department of Electrical Engineering, KAIST, where
he is currently a Full Professor.
He has coauthored DRAM Design (Hongrung, 1996), High Performance DRAM (Sigma, 1999),
Future Memory: FRAM (Sigma, 2000), Networks on Chips (Morgan Kaufmann, 2006), Low-Power
NoC for High-Performance SoC Design (CRC, 2008), Circuits at the Nanoscale (CRC, 2009),
Embedded Memories for Nano-Scale VLSIs (Springer, 2009), Mobile 3D Graphics SoC from
Algorithm to Chip (Wiley, 2010), Bio-medical CMOS ICs (Springer, 2011), Embedded Systems
(Wiley, 2012), and Ultra-Low-Power Short-Range Radios (Springer, 2015).
His current research interests include computer vision system-on-chip, body-area networks,
and biomedical devices and circuits.
Dr. Yoo has been serving as the General Chair of the Korean Institute of Next Generation
Computing since 2010.
He was a member of the Executive Committee of ISSCC, the Symposium on VLSI Circuits,
and A-SSCC, the TPC Chair of A-SSCC 2008 and ISWC 2010, an IEEE Distinguished Lecturer
from 2010 to 2011, the Far East Chair of ISSCC from 2011 to 2012, the Technology Direction
Sub-Committee Chair of ISSCC 2013, the TPC Vice Chair of ISSCC 2014, and the TPC Chair
of ISSCC 2015.
He was a recipient of the Electronic Industrial Association of Korea Award for his
contribution to DRAM technology in 1994, the Hynix Development Award in 1995, the
Korea Semiconductor Industry Association Award in 2002, the Best Research of KAIST
Award in 2007, the Scientist/Engineer of this month Award from the Ministry of Education,
Science, and Technology of Korea in 2010, the Best Scholarship Awards of KAIST in
2011, and the Order of Service Merit from the Ministry of Public Administration and
Security of Korea in 2011.
He was a co-recipient of the ASP-DAC Design Award 2001, the Outstanding Design Awards
of 2005, 2006, 2007, 2010, 2011, 2014 A-SSCC, and the Student Design Contest Award
of 2007, 2008, 2010, 2011 DAC/ISSCC.