
  1. Kyuho Jason Lee (School of Electrical and Computer Engineering, UNIST)
  2. Junyoung Park (UX Factory Inc.)
  3. Hoi-Jun Yoo (School of Electrical Engineering, KAIST)



Keywords: Mixed-mode SoC, classifier, neural network processor, multi-layer perceptron, radial basis function network

I. INTRODUCTION

Deep neural networks have been actively developed as classifiers in various computer vision applications owing to their high classification accuracy [1-5]. However, they still involve too much computation and too large a memory footprint to be deployed in battery-driven mobile devices. Besides deep neural networks, Support Vector Machines (SVMs) and conventional neural networks have been commonly used for mobile applications, but the former suffers from massive computation and data storage compared to the latter. Among the variants of neural networks, the Radial Basis Function Network (RBFN) and the Multilayer Perceptron (MLP) are most commonly used for classification on account of their high classification accuracy and fast learning compared to other types of classifiers [6,7]. Mobile applications such as object recognition in unmanned aerial vehicles [8] and head-mounted displays [9,10] require low power consumption while preserving high classification accuracy and fast processing speed. However, previous neural network implementations could not achieve such high accuracy and low power consumption at the same time because of the inherent advantages and disadvantages of analog and digital circuit implementations [11,19].

Digital implementations of neural networks [11-13] can achieve high accuracy and programmability with noise tolerance, but they consume large power and area. Hundreds of weights must be stored in SRAM, along with a number of nonlinear activation functions in look-up tables (LUTs). The LUT values are usually approximated because storing every entry requires a huge memory: holding one activation function at 8-b resolution takes 256 B, and digital multiplication doubles the precision, so an external memory is often needed to store the parameters. Moreover, the activation function could not be evaluated in parallel for multiple neurons, which degrades the processing time [14]. On the other hand, analog VLSI implementations provide low-cost parallelism and low-power computation such as current-mode addition, but their inaccurate circuit parameters, noise, and low precision degrade classification accuracy [15-17]. Several mixed-mode SoCs took advantage of both analog and digital implementations to obtain low power consumption within a small area; however, they did not address noise compensation, even though mobile devices suffer from supply voltage fluctuation and temperature variation that directly degrade accuracy [18,19].

In this paper, an analog-digital mixed-mode neural network classifier (NNC) is proposed for mobile scene classification. Its highly controllable radial basis function (RBF) circuit generates a sigmoid-shaped function as well as various shapes of RBF [20], supporting both an approximated MLP and an RBFN. By adopting environmental noise compensation in the analog core, namely supply voltage ($\Delta V_{DD}$) and temperature ($\Delta T$) compensation, the proposed SoC achieves 92% classification accuracy.

The rest of the paper is organized as follows. Section II introduces the NNC and its basic operation, and Section III explains the hardware architecture of the proposed chip. Section IV shows implementation and measurement results, followed by the conclusion in Section V.

II. NEURAL NETWORK CLASSIFIER

Fig. 1(a) shows a typical neural network architecture. It consists of an input layer ($X$), hidden layers ($H$), and an output layer ($Z$). Every input is fully connected to the hidden neurons; the inputs are multiplied with weights ($c$, $w$, $v$) and summed up before the transfer function $f(\cdot)$. The output of each hidden neuron $j$ is given by (1), where the types of activation functions are shown in Fig. 1(b).

Fig. 1. (a) Neural network architecture, (b) Types of activation functions. Bell-shaped or Gaussian functions for RBFN; Sigmoid and ReLU for MLP, (c) Basic concepts of linear and non-linear classification. Left diagram = MLP, right diagram = RBFN.


(1)
$H_{j}=f_{j}\left(\sum_{i} X_{i} c_{ij}\right)$

Among the different types of activation functions in Fig. 1(b), the sigmoid and ReLU functions are most widely used for the MLP and deep neural networks, while RBFs are used for Support Vector Machines and the RBFN. The output of the RBFN is represented as (2), where $\vec{r}$ is the input vector, $w_{ij}$ is the weight between the $i$th center ($\vec{c_{i}}$) of the RBF and output $j$, $\sigma_{i}$ is the width of the $i$th center, $\|\cdot\|$ is the Euclidean norm on the input space, and $\phi$ is the output of each hidden RBF node, which corresponds to a bell-shaped function such as the Gaussian.

(2)
$O_{j}=\sum_{i=1}^{N} w_{i j} \phi\left(\left\|\vec{r}-\overrightarrow{c_{i}}\right\|, \sigma_{i}\right)$
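To make (1) and (2) concrete, the following NumPy sketch evaluates an MLP-style hidden layer with a sigmoid activation and an RBFN output layer with a Gaussian $\phi$. The 4-input/6-hidden sizing mirrors the network described in Section III, but all parameter values, function names, and the Gaussian width convention are illustrative assumptions rather than the chip's actual parameters.

```python
# Minimal NumPy sketch of the forward passes in (1) and (2).
# Network sizes follow the paper (4 inputs, 6 hidden nodes); all parameter
# values below are made-up placeholders for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# --- MLP-style hidden layer, Eq. (1): H_j = f(sum_i c_ij * X_i) ---
def mlp_hidden(x, c, f=lambda s: 1.0 / (1.0 + np.exp(-s))):
    """x: (n_in,), c: (n_in, n_hidden) -> (n_hidden,) hidden activations."""
    return f(x @ c)

# --- RBFN output, Eq. (2): O_j = sum_i w_ij * phi(||r - c_i||, sigma_i) ---
def gaussian_rbf(dist, sigma):
    return np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

def rbfn_output(r, centers, sigmas, w):
    """r: (n_in,), centers: (n_hidden, n_in), sigmas: (n_hidden,),
    w: (n_hidden, n_out) -> (n_out,) network outputs."""
    dist = np.linalg.norm(r - centers, axis=1)   # Euclidean norm per center
    phi = gaussian_rbf(dist, sigmas)             # bell-shaped activations
    return phi @ w

x = rng.uniform(0, 1, 4)                         # 4 input neurons
c = rng.normal(size=(4, 6))                      # input-to-hidden weights
centers = rng.uniform(0, 1, (6, 4))              # 6 RBF centers
sigmas = np.full(6, 0.5)                         # RBF widths
w = rng.normal(size=(6, 1))                      # hidden-to-output weights

print("MLP hidden activations:", mlp_hidden(x, c))
print("RBFN output:", rbfn_output(x, centers, sigmas, w))
```

Running the sketch with random parameters only illustrates the data flow; on the chip the trained parameters are produced by the learning unit described in Section III.2.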

The operation of neural network classification differs according to the type of activation function, as depicted in Fig. 1(c). With the MLP, the number of hidden layers determines the combinations of linear decision boundaries, whereas the RBF enables nonlinear classification boundaries. Therefore, the MLP and RBFN show different classification accuracy under the same network size or computational cost. Compared with today's heavy deep neural networks, the RBFN performs well on simple classification tasks at low complexity, but its performance depends on the shape, center, and width of the RBFs. If the shape of an RBF is corrupted by noise, resulting in unwanted overlaps among the RBFs as depicted in Fig. 2, the trained weight values are no longer reliable, which severely degrades accuracy. Therefore, the NNC should provide versatile RBF shapes with stability. To solve this noise problem, the proposed NNC processor contains a highly controllable RBF circuit that can generate various RBFs as well as a sigmoid-shaped function for reconfigurable processing.

Fig. 2. Changes of RBF shape due to environmental noise $\Delta \mathrm{V}_{\mathrm{DD}}$.


III. MIXED-MODE HARDWARE ARCHITECTURE

Fig. 3 shows the overall hardware architecture of the proposed mixed-mode NNC processor. The analog datapath performs feed-forward neural network classification. For low power consumption, the analog core is designed with current-mode circuits, which exploit Kirchhoff's current law for summation. The analog core consists of DACs for the inputs, current multipliers for weight multiplication, I-V converters, RBF circuits, and a sigmoid circuit. The neural network module has 4 input neurons, 6 hidden neurons, and 1 output neuron, and it is operated recursively to cover 25 scene categories.
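As a rough behavioral sketch of this recursive operation, the loop below reuses one single-output 4-6-1 network once per scene category by loading that category's parameter set and then taking the best-scoring class. The `forward` function and the per-category parameter dictionaries are hypothetical stand-ins for the chip's programming interface, used only to illustrate the idea.

```python
# Hypothetical sketch of recursive 25-category classification with a single
# output neuron: evaluate the same 4-6-1 RBFN once per category and keep
# the best-scoring class.  `params_per_class` and `forward` are illustrative
# stand-ins, not the chip's actual programming interface.
import numpy as np

def forward(x, p):
    """4-6-1 RBFN forward pass for one category's parameter set."""
    dist = np.linalg.norm(x - p["centers"], axis=1)
    phi = np.exp(-(dist ** 2) / (2.0 * p["sigmas"] ** 2))
    return float(phi @ p["w"])                       # scalar output neuron

def classify(x, params_per_class):
    scores = [forward(x, p) for p in params_per_class]  # 25 recursive passes
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(1)
params_per_class = [dict(centers=rng.uniform(0, 1, (6, 4)),
                         sigmas=np.full(6, 0.5),
                         w=rng.normal(size=6)) for _ in range(25)]
label, _ = classify(rng.uniform(0, 1, 4), params_per_class)
print("predicted scene category:", label)
```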

Fig. 3. Overall architecture of the mixed-mode NNC processor.


The neural network parameters are stored in the memory bank of the digital controller, which sets the corresponding parameter values in the analog core through the DAC bank. The digital controller also performs on-line learning of the weights and RBF shapes.

1. Analog Neural Network Core Circuits

In a fully-connected layer, the number of DACs and current multipliers should normally equal the number of weights. Instead of instantiating that many DACs and multipliers, binary-weighted current mirrors are implemented to save area and power, as shown in Fig. 4. They perform weight multiplication when the NNC is used in MLP mode. In RBFN mode, the multipliers in the first layer act as wire connections and those in the second layer are used for weight multiplication. The input current is multiplied by the two 4-b weight nibbles $w[3:0]$ and $w[7:4]$ to save area. The outputs mirrored through M1~M2 and M3~M4 represent the LSBs and MSBs, respectively, using differently sized transistors. The final output current is given by (3).

Fig. 4. 8-bit current multiplier.


(3)
$\mathrm{I}_{\mathrm{out}}=\mathrm{I}_{\mathrm{in}}\left(\sum_{i=0}^{7} \mathrm{w}_{i} \times 2^{i}\right)$
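A minimal behavioral model of (3), assuming the ideal case where the 8-b weight is split into the two 4-b nibbles $w[3:0]$ and $w[7:4]$ mirrored by M1~M2 and M3~M4; mirror mismatch and noise are not modeled.

```python
# Behavioral model of the 8-b binary-weighted current multiplier of Eq. (3):
# I_out = I_in * sum_i(w_i * 2^i), with the weight split into LSB and MSB
# nibbles as in Fig. 4.  Ideal mirrors only.
def multiply_current(i_in_nA, w):                 # w: 8-b integer weight, 0..255
    lsb = w & 0x0F                                # w[3:0], mirrored by M1-M2
    msb = (w >> 4) & 0x0F                         # w[7:4], mirrored by M3-M4
    return i_in_nA * (lsb + (msb << 4))           # = i_in * w

assert multiply_current(1, 0b10110011) == 0xB3    # sanity check: 179x
print(multiply_current(2.5, 0b10110011), "nA")    # 447.5 nA
```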

Fig. 5 shows the sigmoid circuit for the output neuron. The bias voltages $V_{BP}$ and $V_{BN}$ control the transition points and the output range. The reference voltages define the shape and slope of the sigmoid function, and setting their values appropriately can also provide an approximately linear function. The controllable sigmoid circuit saves area and power in the NNC processor.
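The following sketch is a purely behavioral stand-in for this controllable sigmoid: a transition window between two reference levels plays the role of the reference voltages, so narrowing the window steepens the slope and widening it yields an approximately linear transfer curve. The mapping from bias voltages to these parameters is an assumption for illustration, not the circuit's actual transfer function.

```python
# Hypothetical behavioral model of a controllable sigmoid: the transition
# window [v_low, v_high] mimics the reference voltages, and the output
# swings between i_min and i_max.  A narrow window gives a steep slope,
# a wide window an approximately linear region.
import numpy as np

def controllable_sigmoid(v_in, v_low, v_high, i_min=0.0, i_max=1.0):
    center = 0.5 * (v_low + v_high)
    width = max(v_high - v_low, 1e-9)
    s = 1.0 / (1.0 + np.exp(-8.0 * (v_in - center) / width))  # slope ~ 1/width
    return i_min + (i_max - i_min) * s

v = np.linspace(0.0, 1.2, 7)
print(np.round(controllable_sigmoid(v, 0.4, 0.8), 3))  # steep transition
print(np.round(controllable_sigmoid(v, 0.1, 1.1), 3))  # near-linear region
```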

Fig. 5. Controllable sigmoid circuit for activation and measured waveform.


For low power consumption, the analog core operates with currents on the order of nA. As a result, the circuit becomes sensitive to the environmental variations $\Delta T$ and $\Delta V_{DD}$. To compensate for such noise, all of the analog circuits adopt a stable current reference modified from [21], as shown in Fig. 6. The stacked output nodes generate a stable current whose amount is controlled by the 5-b switches $BS[4:0]$.

Fig. 6. Environmental noise robust current source.


Fig. 7(a) shows the proposed RBF circuit, which is highly controllable with six parameters: $V_{ref1}$, $V_{ref2}$, $B[4:0]$, $I_{x}$, $Sel\_p$, and $F\_Sel$. Since the bias voltages $V_{ref1}$ and $V_{ref2}$ define the transition points of the V-I curve, their combination sets the center and width of the RBF and of the approximated sigmoid function. The switches $B[4:0]$ control the $g_{m}$ of the curve; the height is set by $I_{x}$, which corresponds to the binary switches $BS[4:0]$ in the current source; and the up/down phase of the function is set by the multiplexer $Sel\_p$. The MUX, $F\_Sel$, and a level shifter invert the input voltage domain to generate much sharper bump functions, as described in [20]. Fig. 7(b) shows measured waveforms of the circuit. The circuit generates various shapes of RBF (black, red, green, blue) as well as a sigmoid-shaped function, shown as purple dots, where $V_{ref1}$ is 0 V and $V_{ref2}$ is 0.6 V. Moreover, it reduces the $\Delta T$-induced error by 92.7% when tested over [-37°C, 87°C], and achieves a stable output current within a ±0.2 V window of $\Delta V_{DD}$ at $V_{DD}$ = 1.2 V.
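A behavioral sketch of the six-parameter RBF circuit under the control roles stated above: $V_{ref1}$/$V_{ref2}$ set the center and width, $B[4:0]$ sets the edge steepness ($g_m$), $I_x$ sets the height, and $Sel\_p$ flips the phase. The soft product of two sigmoid edges is only an illustrative approximation of the measured bell shapes, not the transistor-level behavior.

```python
# Hypothetical behavioral model of the controllable RBF of Fig. 7(a).
# Center/width follow V_ref1 and V_ref2, gm_code (B[4:0]) sharpens the
# edges, height follows I_x (BS[4:0] in the current source), and sel_p
# inverts the bump.  The product of two sigmoid edges is an illustrative
# approximation only.
import numpy as np

def rbf_bump(v_in, v_ref1, v_ref2, gm_code=16, i_x_nA=100.0, sel_p=True):
    k = 4.0 + gm_code                                   # edge steepness from B[4:0]
    rise = 1.0 / (1.0 + np.exp(-k * (v_in - v_ref1)))   # rising edge at V_ref1
    fall = 1.0 / (1.0 + np.exp(+k * (v_in - v_ref2)))   # falling edge at V_ref2
    bump = rise * fall                                  # bell between the edges
    if not sel_p:
        bump = 1.0 - bump                               # phase inversion
    return i_x_nA * bump

v = np.linspace(0.0, 1.2, 7)
print(np.round(rbf_bump(v, 0.3, 0.9), 1))               # bell-shaped RBF
print(np.round(rbf_bump(v, 0.0, 0.6), 1))               # sigmoid-like when V_ref1 = 0
```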

Fig. 7. (a) The highly controllable RBF circuit with noise compensation, (b) Measured analog waveforms of RBFs show diversity of activation functions with noise compensation.


2. Digital Controller for Learning

Fig. 8 shows the digital controller, which consists of a control unit, a learning unit, and a configuration memory that controls the analog parameters through the DAC bank. In the learning unit, the K-means Clustering Accelerator (KCA) and the Back-Propagation Accelerator (BPA) learn the RBF parameters and the weights of the fully-connected layers, respectively. In RBFN mode, the current multipliers in the first layer work as wire connections and the NNC FSM controller sets the RBF parameters trained by the KCA in the analog core. The BPA consists of a 4-way SIMD multiply-and-accumulate array for back-propagation and a sum-of-squared-distance unit for loss computation. The KCA contains a centroid unit that finds the center of each cluster, or class, and an RBF identifier that finds the shape of the RBFs. The KCA is not used in MLP mode, where the NNC FSM controller configures the correct weights through the current multipliers in the first layer. The learning unit is used only for on-line training of the weights; it is not needed for feed-forward classification once the neural network parameters are trained.
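The NumPy sketch below mirrors what the learning unit computes on toy data: a few k-means iterations (the KCA's role) yield RBF centers and per-cluster widths, and gradient descent on a sum-of-squared-error loss (the BPA's role) updates the output weights. The hyper-parameters, the width heuristic, and the toy labels are assumptions for illustration only.

```python
# Illustrative NumPy sketch of the on-line learning flow: k-means for the
# RBF centers/widths (the KCA's role) followed by gradient descent on a
# sum-of-squared-error loss for the output weights (the BPA's role).
# Toy data, labels, and hyper-parameters are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 4))                      # toy 4-d descriptors
y = (X.sum(axis=1) > 2.0).astype(float)              # toy binary labels

# --- KCA: k-means for 6 centers; width = mean intra-cluster distance ---
centers = X[rng.choice(len(X), 6, replace=False)]
for _ in range(10):
    assign = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                        else centers[j] for j in range(6)])
sigmas = np.array([np.linalg.norm(X[assign == j] - centers[j], axis=1).mean()
                   if np.any(assign == j) else 0.5 for j in range(6)]) + 1e-6

# --- BPA: gradient descent on the RBFN output weights ---
phi = np.exp(-((X[:, None] - centers) ** 2).sum(-1) / (2 * sigmas ** 2))
w = np.zeros(6)
for _ in range(2000):
    err = phi @ w - y                                # prediction error
    w -= 0.05 * (phi.T @ err) / len(X)               # gradient of 0.5*sum(err^2)

print("toy training accuracy:", np.mean((phi @ w > 0.5) == (y > 0.5)))
```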

Fig. 8. Digital controller architecture.


IV. IMPLEMENTATION & MEASUREMENT

Fig. 9 shows the chip micrograph and the performance summary of the proposed NNC processor. It is fabricated in a 0.13 $\mu m$ CMOS process as part of the mobile object recognition processor [8]. The NNC processor occupies 0.140 mm² and consumes 2.20 mW running at 200 MHz in the digital domain, while the whole SoC [8] occupies 25.0 mm² and consumes 260 mW on average. The power consumption of the analog neural network core is only 723 $\mu W$ thanks to the current-mode circuits. Compared with a conventional fully-digital implementation, the proposed processor saves 84.0% of area and 82.2% of power by utilizing the mixed-mode architecture.

Fig. 9. Chip photograph and summary table.


Table 1 compares this work with analog/digital ASIC and FPGA implementations. We define complexity as the number of weights in order to compare power and area efficiencies. The efficiencies of [11] are scaled to a 0.13 $\mu m$ process by applying Dennard scaling for fair comparison. Owing to the nature of spiking neural networks, [11] shows the greatest power efficiency and the highest complexity, but it requires a large area to implement every neuron, so its area efficiency is lower than that of this work. In addition, this work supports both MLP and RBFN with its reconfigurable architecture. Compared with analog circuits, mixed-mode implementations are advantageous for obtaining high programmability. Among the mixed-mode processors [18,19], this work achieves the highest power and area efficiency owing to its recursive operating architecture for 25-category classification.
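As a sanity check on how the efficiency columns of Table 1 are derived (complexity divided by power or by area), the short computation below reproduces the power and area efficiencies reported for this work.

```python
# Efficiency metrics as defined in the text: complexity (# of weights)
# divided by power [mW] and by area [mm^2].  Values taken from the
# "This Work" row of Table 1.
weights = 750                 # 30 x 25
power_mW = 2.20
area_mm2 = 0.140

print(round(weights / power_mW))   # 341  (# of weights per mW)
print(round(weights / area_mm2))   # 5357, reported as ~5360 (# of weights per mm^2)
```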

Table 1. Comparisons with Neural Network Processors

Reference | Classifier Type | Signal Type | Complexity [# of weights] | Programmability | Process | Power [mW] | Area [mm²] | Power $\eta$ [# of weights/mW] | Area $\eta$ [# of weights/mm²]
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Seo [11]* | SNN 1) | Digital | 64k (256x256) | High | 45 nm | ~3.00 | 4.20 | 7560 | 1870
Yang [13] | RBFN | Digital (FPGA) | 135 | Extremely High | 0.18 μm | 967 | N/A | 0.140 | N/A
Lont [16] | MLP | Analog | 161 | Very Low | 3 μm | 25.0 | 2.40 | 6.44 | 67.1
Peng [17] | RBFN | Analog | 14 | Low | 0.5 μm | 2.24 2) | 0.0482 2) | 6.25 | 290
Kim [18] | NFL 3) | Mixed | 12 (3x4) | High | 0.13 μm | 2.83 | 0.163 | 4.24 | 73.6
Oh [19] | NFL 3) | Mixed | 27 | High | 0.13 μm | 1.20 | 0.765 | 271 | 35.3
This Work | MLP/RBFN | Mixed | 750 (30x25) | High | 0.13 μm | 2.20 | 0.140 | 341 | 5360

1) SNN: Spiking Neural Network; 2) Numbers are available only for the RBF circuits; 3) NFL: Neuro-Fuzzy Logic

* Power dissipation differs among the variants in [11]; power and area efficiencies are scaled to a 0.13 μm process.

Fig. 10 shows the measurement flow of the proposed processor, which serves as the visual attention stage of the entire object recognition SoC [8]. The input image is decomposed into 128x128-pixel macro-blocks, and HMAX is performed on each block to extract a statistical descriptor vector that becomes the input to the RBFN. Each macro-block is then classified into one of the 25 pre-trained scene categories by the recursive operation of the RBFN. Finally, the input image is turned into a spatially organized scene map, and object recognition is performed within the macro-blocks. The scene classification result provides the likelihood of a target object to the object recognition pipeline, so only the correct objects of interest are detected. For example, in a driving scenario, drivers are interested in moving vehicles on the road, not vehicles shown on an advertisement. Scene classification with the RBFN provides the contextual information needed to detect objects in the road scene category. Fig. 11 shows the evaluation platform and results. The SoC is integrated with a multimedia expansion board and evaluated in a city-view experimental setup, where the target object is a toy car on the road and the distractor is a vehicle on an advertising board. The RBFN scene classification produces the context-aware map depicted at the bottom right, and only the target object in the road scene is recognized while the advertisement is neglected.
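A high-level sketch of this flow: the frame is tiled into 128x128 macro-blocks, each block is reduced to a descriptor and classified into one of 25 scene categories, and the per-block labels form the scene map. Here `extract_descriptor` is a crude placeholder for the HMAX stage and `classify_scene` stands in for the recursive RBFN sketched earlier, so both are assumptions rather than the SoC's actual interfaces.

```python
# Illustrative sketch of the scene-classification flow in Fig. 10: tile the
# frame into 128x128 macro-blocks, turn each block into a descriptor
# (placeholder for HMAX), classify it into one of 25 scene categories, and
# collect the labels into a spatial scene map.
import numpy as np

def extract_descriptor(block):
    """Placeholder for the HMAX statistical descriptor (here: 4 crude stats)."""
    return np.array([block.mean(), block.std(), block.min(), block.max()])

def classify_scene(desc, n_classes=25):
    """Placeholder for the recursive RBFN; here a fixed random linear scorer."""
    rng = np.random.default_rng(42)                  # fixed weights for the demo
    scores = rng.normal(size=(n_classes, desc.size)) @ desc
    return int(np.argmax(scores))

def scene_map(frame, block=128):
    h, w = frame.shape
    labels = np.zeros((h // block, w // block), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            tile = frame[by*block:(by+1)*block, bx*block:(bx+1)*block]
            labels[by, bx] = classify_scene(extract_descriptor(tile))
    return labels

frame = np.random.default_rng(3).uniform(0, 255, (512, 640))
print(scene_map(frame))                              # 4 x 5 grid of scene labels
```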

Fig. 10. Measurement process of scene classification for object recognition.


Fig. 11. Evaluation system and results.


Thanks to the mixed-mode implementation of the proposed NNC processor, the overall visual attention accuracy is increased to 84%, a 1.40x improvement over the conventional visual attention model. As a result, the entire SoC [8] achieved 96% object recognition accuracy in a test of 200 objects across the 25 scene categories. In addition to the scene classification with the HMAX descriptor, the classification accuracy of the NNC alone was measured with handcrafted test vectors, and the proposed NNC processor achieved 92%.

V. CONCLUSIONS

In this work, a reconfigurable mixed-mode neural network classifier processor is proposed for robust, low-power scene classification as part of a mobile object recognition processor. It consists of noise-tolerant analog circuits that compensate for temperature and supply voltage variations to achieve high classification accuracy, and it supports both MLP and RBFN. The proposed processor, fabricated in a 0.13 μm CMOS process, consumes 2.20 mW running at 200 MHz and achieves 92% classification accuracy. Thanks to the analog-digital mixed-mode implementation, the proposed neural network classifier processor reduces area and power by 84.0% and 82.2%, respectively, compared with a fully-digital ASIC implementation.

ACKNOWLEDGMENTS

This work was supported by the research fund (1.180081.01) of UNIST.

REFERENCES

[1] Krizhevsky A., et al., 2012, ImageNet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems (NIPS), Vol. 25.
[2] Szegedy C., et al., Jun. 2015, Going deeper with convolutions, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 1-9.
[3] He K., Zhang X., Ren S., Sun J., Jun. 2016, Deep residual learning for image recognition, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 770-778.
[4] Xie S., et al., Jul. 2017, Aggregated residual transformations for deep neural networks, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 5987-5995.
[5] Redmon J., et al., Jun. 2016, You only look once: unified, real-time object detection, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 779-788.
[6] Ou Y., Oyang Y., Aug. 2005, A novel radial basis function network classifier with centers set by hierarchical clustering, in International Joint Conference on Neural Networks, pp. 1383-1388.
[7] Sardar S., et al., Nov. 2011, A hardware/software co-design model for face recognition using Cognimem neural network chip, in IEEE International Conference on Image Information Processing, pp. 1-6.
[8] Park J., et al., 2013, A 646GOPS/W multi-classifier many-core processor with cortex-like architecture for super-resolution recognition, IEEE International Solid-State Circuits Conference Digest of Tech. Papers, pp. 168-169.
[9] Kim G., et al., Jan. 2015, A 1.22 TOPS and 1.52 mW/MHz augmented reality multicore processor with neural network NoC for HMD applications, IEEE Journal of Solid-State Circuits, Vol. 50, No. 1, pp. 113-124.
[10] Hong I., et al., Jan. 2016, A 2.71 nJ/pixel gaze-activated object recognition system for low-power mobile smart glasses, IEEE Journal of Solid-State Circuits, Vol. 51, No. 1, pp. 45-55.
[11] Seo J., et al., Oct. 2011, A 45 nm CMOS neuromorphic chip with a scalable architecture for learning in networks of spiking neurons, in Proceedings of IEEE Custom Integrated Circuits Conference.
[12] Ienne P., et al., Aug. 1996, Special-purpose digital hardware for neural networks: an architectural survey, Journal of VLSI Signal Processing Systems, Vol. 13, No. 1, pp. 5-25.
[13] Yang F., Paindavoine M., Sept. 2003, Implementation of an RBF neural network on embedded systems: real-time face tracking and identity verification, IEEE Transactions on Neural Networks, Vol. 14, No. 5, pp. 1162-1175.
[14] Du K. L., Swamy M. N. S., 2006, Neural Networks in a Softcomputing Framework, London, Springer, Vol. 6, No. 14, p. 285.
[15] Kang K., Shibata T., Jul. 2010, An on-chip-trainable Gaussian-kernel analog support vector machine, IEEE Transactions on Circuits and Systems I, Vol. 57, No. 7, pp. 1513-1524.
[16] Lont J., Guggenbuhl W., May 1992, Analog CMOS implementation of a multilayer perceptron with nonlinear synapses, IEEE Transactions on Neural Networks, Vol. 3, No. 3, pp. 457-465.
[17] Peng S., Hasler P., Anderson D., Oct. 2007, An analog programmable multidimensional radial basis function based classifier, IEEE Transactions on Circuits and Systems, Vol. 54, No. 10, pp. 2148-2158.
[18] Kim M., et al., 2009, A 54GOPS 51.8mW analog-digital mixed-mode neural perception engine for fast object detection, in IEEE Custom Integrated Circuits Conference, pp. 649-652.
[19] Oh J., Lee S., Yoo H. J., May 2013, 1.2 mW online learning mixed-mode intelligent inference engine for low-power real-time object recognition processor, IEEE Transactions on VLSI Systems, Vol. 21, No. 5, pp. 921-933.
[20] Lee K., et al., May 2013, A multi-modal and tunable radial-basis-function circuit with supply and temperature compensation, in Proceedings of IEEE International Symposium on Circuits and Systems, pp. 1608-1611.
[21] Yoo C., Park J., Dec. 2007, CMOS current reference with supply and temperature compensation, Electronics Letters, Vol. 43, No. 25, pp. 1422-1424.

Author

Kyuho Jason Lee

(S’12-M’17) received B.S., M.S., and Ph.D. degrees in the School of Electrical Engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea in 2012, 2014 and 2017, respectively.

From 2017 to 2018, he was a postdoctoral researcher at the Information Engineering and Electronics Research Institute, KAIST, Daejeon, Korea, and a Principal Engineer at UX Factory Inc., Pangyo, Korea.

Now he is an Assistant Professor at the School of Electrical and Computer Engineering, Ulsan National Institute of Science and Technology (UNIST).

His research interests include mixed-mode neuromorphic SoC, deep learning processor, Network-on-Chip architectures, and intelligent computer vision processor for mobile devices and autonomous vehicles.

Junyoung Park

(S’09-M’15) received Ph.D. degree in Electrical Engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea in 2014.

His Ph.D. research focused on System-on-a-Chip (SoC) architectures for energy-efficient vision processing and artificial intelligence.

He is interested in customized architectures and circuits for computationally intensive algorithms such as computer vision and machine learning, and in their integration on mobile platforms.

Since 2015, he has been running a start-up, UX Factory Inc., which is dedicated to delivering AI solutions derived from Software-System-on-Chip technologies.

Hoi-Jun Yoo

(M’95 – SM’04 – F’08) received the B.S. degree in electronics engineering from Seoul National University, Seoul, South Korea, in 1983, and the M.S. and Ph.D. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, in 1985 and 1988, respectively.

From 2001 to 2005, he was the Director of Korean System Integration and IP Authoring Research Center (SIPAC), South Korea. From 2003 to 2005, he was a full-time Advisor to the Korean Ministry of Information and Communication, South Korea, and the National Project Manager for System-on-Chip and Computer.

In 2007, he founded System Design Innovation & Application Research Center (SDIA) at KAIST.

Since 1998, he has been with the Department of Electrical Engineering, KAIST, where he is currently a Full Professor.

He has coauthored DRAM Design (Hongrung, 1996), High Performance DRAM (Sigma, 1999), Future Memory: FRAM (Sigma, 2000), Networks on Chips (Morgan Kaufmann, 2006), Low-Power NoC for High-Performance SoC Design (CRC, 2008), Circuits at the Nanoscale (CRC, 2009), Embedded Memories for Nano-Scale VLSIs (Springer, 2009), Mobile 3D Graphics SoC from Algorithm to Chip (Wiley, 2010), Bio-medical CMOS ICs (Springer, 2011), Embedded Systems (Wiley, 2012), and Ultra-Low-Power Short-Range Radios (Springer, 2015).

His current research interests include computer vision system-on-chip, body-area networks, and biomedical devices and circuits.

Dr. Yoo has been serving as the General Chair of the Korean Institute of Next Generation Computing since 2010.

He was a member of the Executive Committee of ISSCC, the Symposium on VLSI Circuits, and A-SSCC, the TPC Chair of A-SSCC 2008 and ISWC 2010, an IEEE Distinguished Lecturer from 2010 to 2011, the Far East Chair of ISSCC from 2011 to 2012, the Technology Direction Sub-Committee Chair of ISSCC 2013, the TPC Vice Chair of ISSCC 2014, and the TPC Chair of ISSCC 2015.

He was a recipient of the Electronic Industrial Association of Korea Award for his contribution to DRAM technology in 1994, the Hynix Development Award in 1995, the Korea Semiconductor Industry Association Award in 2002, the Best Research of KAIST Award in 2007, the Scientist/Engineer of this month Award from the Ministry of Education, Science, and Technology of Korea in 2010, the Best Scholarship Awards of KAIST in 2011, and the Order of Service Merit from the Ministry of Public Administration and Security of Korea in 2011.

He was a co-recipient of the ASP-DAC Design Award 2001, the Outstanding Design Awards of 2005, 2006, 2007, 2010, 2011, 2014 A-SSCC, and the Student Design Contest Award of 2007, 2008, 2010, 2011 DAC/ISSCC.