Do Hyeon-Gu, Choi Seongrim, Hwang Jaemin, Kim Ara, and Nam Byeong-Gyu
(Department of Computer Science and Engineering, Chungnam National University, 99,
Daehak-ro, Yuseong-gu, Daejeon, 305-764, Korea)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Index Terms
Gesture recognition, hidden Markov model (HMM), Viterbi algorithm, binary search orientation, model clustering
I. INTRODUCTION
In recent years, gesture recognition has been widely studied as a natural user interface
(NUI) method for wearable smart devices such as head-mounted displays (HMDs) and smart
bands (1). Hand gesture recognition can be a secure NUI method for wearable devices
because, unlike speech recognition, it rarely reveals the user's intention in public spaces.
Hand gesture recognition falls into two categories: pose recognition on hand shapes
and motion recognition on hand movements. This work focuses on the motion recognition
of handwritings (2) because its expressiveness suits versatile NUIs. Sequence matching
plays a key role in this category, and thus the hidden Markov model (HMM) is widely
adopted for its best-in-class sequence matching accuracy (3). In HMM algorithms, complex arithmetic operations and excessive memory bandwidth
are the most challenging issues for low-power, real-time realizations
(4,5). There have been studies on hardware acceleration of HMM algorithms (4,5), but they focused on speech recognition only and did not account for the compute-intensive
motion orientation that is mandatory for the gesture recognition problem.
A few studies on gesture recognition hardware are available (1), but they focused on pose recognition, which does not involve the HMM this work
focuses on.
In this paper, we propose a low-power HMM accelerator accommodating a lightweight
motion orientation module for a gesture recognition interface on wearable smart devices
(6). In this accelerator, a binary search method is exploited in the motion orientation
module to reduce its computational complexity by replacing the division and arctangent
operations in computing motion orientations with simple multiplications and lookup
tables. In addition, gesture models are clustered in the gesture database to reduce
unnecessary external memory transactions. As a result, the proposed HMM accelerator
shows a 25.6% power reduction from a vanilla hardware implementation of the gesture
HMM.
Fig. 1. HMM-based gesture recognition process.
II. HMM ACCELERATOR
Fig. 1 briefly describes the gesture recognition process using the HMM algorithm.
Basically, this is a stochastic process that finds the meaning of a given gesture input
based on stochastic matching between the input hand motion and a gesture model from
the gesture database (DB). Each gesture model in the DB is defined by state transitions,
where each state corresponds to a hand motion and makes a transition to the next state
if it matches the given input hand motion. A gesture model is chosen as the
recognition result when it reaches its final state successfully.
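As a toy illustration of this matching process (not the chip's probabilistic implementation), a gesture model can be viewed as a chain of expected motion symbols that advances one state on each match and succeeds on reaching its final state:

```python
def match_score(model, observations):
    """Toy left-to-right matcher: advance to the next state whenever the
    incoming orientation symbol matches; succeed on reaching the final
    state. Illustrative only; the real decoder accumulates HMM
    probabilities instead of exact symbol matches."""
    state = 0
    for symbol in observations:
        if symbol == model[state]:
            state += 1
            if state == len(model):
                return True
    return False

# A hypothetical "swipe right, then down" model over 16 orientations,
# assuming interval 0 = east and interval 12 = south.
model = [0, 0, 12]
print(match_score(model, [0, 0, 1, 12]))  # True
```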
The overall architecture of the proposed gesture HMM accelerator is depicted in Fig. 2. It consists of a motion orientation stage and a gesture matching stage. The motion orientation
stage computes the direction of the hand movement and transfers this direction to the
following gesture matching stage. The gesture matching stage fetches the gesture models
from the gesture DB and selects the model that matches the given direction with the
highest probability as the final recognition result.
Fig. 2. Overall architecture of the proposed gesture HMM accelerator.
1. Motion Orientation
Motion orientation specifies the direction of a motion vector in a hand movement. The
motion vector (∆x, ∆y) is determined from the difference between two successive
hand positions ($x_{t}, y_{t}$) and ($x_{t+1}, y_{t+1}$) on the hand movement. Hence,
the motion orientation θ of the motion vector is computed by Eq. (1):

$\theta = \tan^{-1}(\Delta y / \Delta x)$ (1)

which involves compute-intensive division and arctangent operations. Therefore,
in this work, the orientation θ is quantized into 16 intervals for a reasonable complexity
of the HMM algorithm, which is found to be sufficient quantization for representing
on-line handwriting (7,8).
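For reference, a direct software evaluation of Eq. (1) followed by the 16-way quantization might look like the sketch below; these are exactly the division and arctangent operations the proposed hardware avoids.

```python
import math

def quantize_orientation(dx, dy, bins=16):
    """Naive baseline: arctangent of the dy/dx ratio, quantized into
    `bins` equal intervals over [0, 360) degrees."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return int(angle // (360.0 / bins))

print(quantize_orientation(1, 1))  # 45 deg -> interval 2
```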
A simplified hardware design of Eq. (1) can be found in (9), where Eq. (1) is used to find the rotation angle in producing SIFT descriptors for object recognition.
They simplified the division operation using a multiplication and a reciprocation
table for reduced hardware complexity. However, this approach is limited by the number
of bits in the divisor because the size of the reciprocation table explodes accordingly.
This complication arises particularly with high-resolution input images, which produce
a ∆x with a wide dynamic range that becomes the divisor in Eq. (1). Therefore, we propose a novel orientation scheme scalable to the input image resolution
by adopting a binary search method to find the quantized orientation interval that a motion
vector resides in. We divide the motion vector space into 16 orientation intervals
and assign a pre-computed arctangent value to each interval. Thanks to this method,
we just need to find the proper orientation interval to get an orientation angle for
a given motion vector. This approach avoids the arctangent and division operations,
and exploits simple comparisons and a vector rotation that are independent of the
dynamic range of ∆x.
Fig. 3. Binary search orientation (a) step 1: comparison to boundary x=0, (b) step
2: comparison to boundary y=0, (c) step 3: comparison to one of two boundaries,
y=x or y=−x, (d) step 4: comparison to one of four boundaries rotated by −22.5°:
x’=0, y’=0, y’=x’, or y’=−x’.
The proposed binary search orientation scheme is illustrated in Fig. 3. The binary search determines the orientation interval for a given motion vector
(∆x, ∆y) by simply comparing the vector with the search boundaries x=0, y=0, y=±x, and
a −22.5° rotation of each. Each boundary divides the vector space into two subspaces,
and the orientation interval of the motion vector is examined step by step by narrowing
down the search space to the subspace selected in the previous search step, as
described in Fig. 3(a)-(d). The search steps shown in Fig. 3(a)-(c) involve simple comparisons, but the step in Fig. 3(d) involves comparisons to the search boundaries rotated by −22.5°. These comparisons
to the rotated boundaries can be expressed as Eq. (2), where the comparison is made in the −22.5°-rotated coordinates (x’, y’), and (∆x’, ∆y’)
is the representation of the motion vector (∆x, ∆y) in the rotated space. The parameters
α and β in Eq. (2) indicate the signs that specify each search boundary.
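A functionally equivalent software sketch of this four-step search is given below; the pre-computed boundary direction vectors stand in for the chip's rotation-coefficient lookup table, and each boundary comparison costs only two multiplications, with no division or arctangent. This is an illustrative model, not the exact datapath.

```python
import math

# Unit direction vectors of the 16 sector boundaries at k * 22.5 deg.
# (These play the role of the chip's pre-computed rotation coefficients.)
_BOUNDS = [(math.cos(math.radians(22.5 * k)), math.sin(math.radians(22.5 * k)))
           for k in range(16)]

def _ccw_of(dx, dy, k):
    """True if (dx, dy) lies counter-clockwise of boundary k.
    Two multiplications and one comparison; no division, no arctangent."""
    bx, by = _BOUNDS[k]
    return bx * dy - by * dx >= 0

def orientation_interval(dx, dy):
    """16-way quantized orientation of (dx, dy) by binary search:
    two sign tests pick the quadrant, then two boundary comparisons
    narrow it down to one 22.5-degree interval."""
    # Steps 1-2: boundaries x = 0 and y = 0 select the quadrant.
    if dx > 0 and dy >= 0:
        base = 0
    elif dx <= 0 and dy > 0:
        base = 4
    elif dx < 0 and dy <= 0:
        base = 8
    else:
        base = 12
    # Step 3: boundary y = +/-x (index base + 2) halves the quadrant.
    sector = base + 2 if _ccw_of(dx, dy, (base + 2) % 16) else base
    # Step 4: the -22.5-degree rotated boundary halves the octant.
    if _ccw_of(dx, dy, (sector + 1) % 16):
        sector += 1
    return sector % 16
```

The result is independent of the magnitude of ∆x, which is what makes the scheme scale to high-resolution inputs.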
This binary search orientation is implemented in hardware as depicted in Fig. 4. The hardware consists of four steps realizing the search steps illustrated in Fig. 3. First, ∆x and ∆y are tested against the search boundaries x=0 and y=0 (i.e., ∆x>0, ∆y>0)
to select a search space. Then, the test results are used together to determine the
next search boundary between y=x and y=−x, and ∆x and ∆y are tested with this search
boundary (i.e., ∆y>±∆x) to select the next search space accordingly. Finally, all
the test results above are used together to determine one of the rotated search boundaries,
and Eq. (2) is evaluated to select the final search space that corresponds to one of the 16 quantized
orientations. The boundary comparisons represented by Eq. (2) require two multiplications with rotation coefficients. Hence, we implement Eq. (2) using two multipliers and an 8-entry lookup table that is independent of the input image
resolution. As a result, our motion orientation hardware reduces its power dissipation
by 56.1% for a high-definition (HD) input stream compared with the conventional approach
using a reciprocation table (9).
Fig. 4. Structure for binary search orientation (a) coefficients for the rotated boundary
comparisons, (b) hardware organization.
2. Gesture Matching
Fig. 5. Model clustering scheme.
Fig. 6. Viterbi decoder architecture using logarithmic arithmetic.
The gesture matching stage fetches gesture models from the gesture DB and selects the
one with the highest matching probability using the Viterbi decoder of the HMM algorithm.
However, a huge memory bandwidth is required for model fetching as the number
of gesture models increases, because each gesture model contains thousands of parameters
such as the transition, observation, and initial probabilities associated with the state
transitions in the model. Therefore, we exploit gesture model clustering in
the gesture matching process to reduce the effective bandwidth to external memory.
The proposed model clustering scheme is illustrated in Fig. 5. Gesture models with similar matching probabilities are clustered together, and the
one with the probability closest to the average value in each cluster is chosen to
represent the cluster. Thus, in the first step, the gesture matching stage fetches only
the representative gesture model from each cluster and selects the one with the highest
probability using the Viterbi decoder. In the next step, it fetches the cluster
associated with the selected representative model and determines the gesture model
with the highest probability within that cluster as the final gesture recognized by
the system. An 8 kB gesture cache buffers the clustered gesture models for
a further reduction in external bandwidth, as these gesture models are accessed repeatedly
by the Viterbi decoder.
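The two-step matching over clustered models can be sketched as follows, assuming a hypothetical `score` function that returns the Viterbi matching probability of a single model; only the representatives, then the members of one winning cluster, are ever fetched rather than the whole DB:

```python
def clustered_match(clusters, score):
    """Two-step gesture matching over clustered models (illustrative).
    `clusters` maps each representative model to its member models;
    `score` is a hypothetical function returning the Viterbi matching
    probability of one model."""
    # Step 1: score one representative per cluster.
    best_rep = max(clusters, key=score)
    # Step 2: score only the members of the winning cluster.
    return max(clusters[best_rep], key=score)

# Hypothetical probabilities for a tiny two-cluster DB.
scores = {"repA": 0.9, "repB": 0.3, "A1": 0.7, "A2": 0.95, "B1": 0.2}
clusters = {"repA": ["A1", "A2"], "repB": ["B1"]}
print(clustered_match(clusters, scores.get))  # "A2"
```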
The Viterbi decoder module exploits logarithmic arithmetic for reduced arithmetic
complexity because it is a multiplication-oriented block. Logarithmic arithmetic
is well known to be effective for multiplication and division by converting them into
simple addition and subtraction in the logarithmic domain (10). All the off-line parameters in the Viterbi decoder are pre-converted into the logarithmic
domain, and thus logarithmic converters are unnecessary in this module. In addition,
the results do not need antilogarithmic converters because we just need to choose the
maximum value among them, and the antilogarithmic conversion is a monotonically increasing
function that preserves the order among results. Fig. 6 shows the proposed Viterbi decoder module; the adders in the gray area replace
the multipliers using logarithmic arithmetic.
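A minimal log-domain Viterbi recursion illustrating this idea (with hypothetical small matrices, not the chip's fixed-point parameters) might look like the following; every probability product of the standard recursion becomes an addition, and no antilog is needed because max() preserves the order of log-domain values:

```python
import math

def viterbi_log(log_pi, log_A, log_B, obs):
    """Viterbi recursion entirely in the log domain. log_pi, log_A,
    log_B are pre-converted initial, transition, and observation
    probabilities; obs is the sequence of quantized orientation
    symbols. Returns the log-likelihood of the best state path."""
    n = len(log_pi)
    # Initialization: products become sums in the log domain.
    delta = [log_pi[i] + log_B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Recursion: max over predecessors, additions instead of
        # multiplications.
        delta = [max(delta[i] + log_A[i][j] for i in range(n)) + log_B[j][o]
                 for j in range(n)]
    return max(delta)
```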
Fig. 7. Die photograph and chip characteristics.
Fig. 8. Power reduction from each of the proposed schemes.
III. IMPLEMENTATION RESULTS
The proposed HMM accelerator with the motion orientation function is fabricated in a
65 nm CMOS technology. Its chip photograph and performance summary are given in Fig. 7. It realizes real-time gesture recognition while consuming 6.4 mW at 133 MHz.
Table 1. Comparison with other works

|                           | Gesture HMM*                         | Gesture HMM (This Work) | Speech HMM (5)                |
|---------------------------|--------------------------------------|-------------------------|-------------------------------|
| Implementation            | General-purpose processor (OMAP4430) | Hardware accelerator    | Hardware accelerator          |
| Technology (nm)           | N/A                                  | 65                      | 65                            |
| No. of models             | 2,296                                | 2,296                   | 5,000                         |
| Power (mW)                | 847                                  | 6.4 (@ 1.2 V)           | 23 (@ 1.2 V), 6.0 (@ 0.85 V)  |
| Operating frequency (MHz) | 600                                  | 133 (@ 1.2 V)           | 110 (@ 1.2 V), 50 (@ 0.85 V)  |

* Measured with a TI multimedia evaluation board (OMAP 4430)
Fig. 9. Configuration of evaluation system.
To the best of our knowledge, this is the first HMM accelerator for gesture recognition.
Therefore, our work is compared with a general-purpose processor (i.e.,
OMAP4430) solution of gesture recognition on a 2,296-handwriting data set (2), as summarized in Table 1. The proposed optimization schemes are evaluated by the power reductions presented
in Fig. 8. The binary search method in motion orientation achieves a 12.6% power reduction, the
model clustering contributes an additional 7.9%, and the logarithmic arithmetic reduces
5.1% more, resulting in a 25.6% reduction in total from a vanilla hardware implementation.
Our design is also compared with the state-of-the-art HMM accelerator for speech recognition
(5), as presented in Table 1.
Fig. 9 shows the evaluation system for the proposed HMM accelerator chip. A camera board is
attached to stream in QVGA hand gesture images at 30 fps and detect the fingertips
in the incoming images. The coordinates of the fingertip are delivered to the proposed
HMM accelerator on the test board to conduct the gesture recognition. The handwriting
recognition results are presented in Fig. 10. The left side of each image shows the input image frame augmented with traces of
finger movements, and the right side shows the sequence of motion orientations determined
by the HMM accelerator.
Fig. 10. Handwriting recognition results (a) case for character A, (b) case for character
D, (c) case for character N.
Fig. 10(a)-(c) show the recognition results of the handwritings for characters A, D, and N, respectively.
The upper parts show handwriting gestures with regular strokes, and the lower parts
show handwritings with irregular strokes. As shown in Fig. 10(a), a total of five strokes are needed for character A with regular strokes, of which
strokes 1, 3, and 5 constitute the character. The start point of the gesture can
shift to the lower left side with irregular strokes, and the recognition is properly
done in this case as well. Fig. 10(b) shows the recognition results for character D; the result is correct
even for irregular strokes with a smaller character size. Fig. 10(c) shows that the recognition is still successful even for slanted characters with an irregular
sequence of strokes. These results show that the proposed HMM accelerator is
robust across various handwriting gestures.
IV. CONCLUSION
In this paper, the world's first HMM accelerator with motion orientation and model
clustering is presented for a gesture recognition UI on wearable devices. Binary
search orientation is proposed to avoid the division and arctangent operations and
attains a 12.6% power reduction. Model clustering in the gesture DB achieves an additional
7.9% reduction, and logarithmic arithmetic in the Viterbi decoder contributes 5.1%
more. Thanks to these schemes combined, the proposed gesture
HMM accelerator demonstrates a 25.6% power reduction compared with a vanilla hardware
implementation of the gesture-recognizing HMM.
ACKNOWLEDGMENTS
This work was supported by research fund of Chungnam National University. The authors
would like to thank IDEC for chip fabrication and CAD support.
REFERENCES
Park S., Choi S., Lee J., Kim M., Park J., Yoo H.-J., 2016, A 126.1mW Real-Time Natural
UI/UX Processor with Embedded Deep-Learning Core for Low-Power Smart Glasses, ISSCC
Dig. Tech. Papers, pp. 254-256
Günter S., Bunke H., Apr 2004, HMM-based handwritten word recognition: On the optimization
of the number of states, training iterations and Gaussian components, Pattern Recognition,
Vol. 37, No. 10, pp. 2069-2079
Farra N., Raffa G., Nachman L., Hajj H., 2011, Energy-Efficient Mobile Gesture Recognition
with Computation Offloading, Proc. Int. Conf. Energy Aware Comput., pp. 1-6
Fahmy S. A., Cheung P. Y. K., Luk W., 2005, Hardware Acceleration of Hidden Markov
Model Decoding for Person Detection, in Proc. Conf. Des., Autom. Test Eur. (DATE),
Vol. 3, pp. 8-13
Price M., Glass J., Chandrakasan A. P., Jan 2015, A 6 mW, 5,000-Word Real-Time Speech
Recognizer Using WFST Models, IEEE J. Solid-State Circuits, Vol. 50, No. 1, pp. 102-112
Choi S., et al., Nov 2016, A Low-Power Real-Time Hidden Markov Model Accelerator
for Gesture User Interface on Wearable Devices, in IEEE Asian Solid-State Circuits
Conf., pp. 261-264
Lee H.-K., Kim J. H., Oct 1999, An HMM-Based Threshold Model Approach for Gesture
Recognition, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 21, No. 10,
pp. 961-973
Lee S.-H., Kim J. H., Nov 1997, Augmenting the Discrimination Power of HMM by NN for
On-Line Cursive Script Recognition, Applied Intelligence, Vol. 7, No. 4, pp. 304-314
Hong I., Bong K., Shin D., Park S., Lee K. J., Kim Y., Yoo H.-J., Jan 2016, A 2.71
nJ/Pixel Gaze-Activated Object Recognition System for Low-Power Mobile Smart Glasses,
IEEE J. Solid-State Circuits, Vol. 51, No. 1, pp. 45-55
Nam B.-G., Yoo H.-J., May 2009, An embedded stream processor core based on logarithmic
arithmetic for a low-power 3-D graphics SoC, IEEE J. Solid-State Circuits, Vol. 44,
No. 5, pp. 1554-1570
Author
received the B.S. degree in computer science and engineering from the Chungnam National
University (CNU), Daejeon, in 2017, where he is currently working toward the M.S.
degree.
His current research interests include machine learning processors and wearable SoC
design.
received the B.S. and M.S. degrees in computer science and engineering from the Chungnam
National University (CNU), Daejeon, in 2011 and 2013, respectively, where he is currently
working toward the Ph.D. degree.
His current research interests include object recognition processor and wearable SoC
design.
He is a co-recipient of the IEEE Asian Solid-State Circuits Conference (A-SSCC) Distinguished
Design Award in 2016.
received the B.S. degrees in mechatronics and computer science and engineering (double
major) and the M.S. degree in computer science and engineering from Chungnam National
University (CNU), Daejeon, in 2014 and 2016, respectively. He is currently with
Hanwha Systems.
His research interests include mobile GPU, digital arithmetic circuits, and system
software platforms.
received the B.S. degree in information and communication engineering from Hanbat
National University and the M.S. degree in computer science and engineering from Chungnam
National University (CNU), Daejeon, in 2016 and 2018, respectively. She is currently
with Satreci.
Her research interests include wearable SoC and low-power SoC design.
received his B.S. degree (summa cum laude) in computer engineering from Kyungpook
National University, Daegu, Korea, in 1999, M.S. and Ph.D. degrees in electrical engineering
and computer science from Korea Advanced Institute of Science and Technology (KAIST),
Daejeon, Korea, in 2001 and 2007, respectively.
His Ph.D. work focused on low-power GPU design for wireless mobile devices.
In 2001, he joined the Electronics and Telecommunications Research Institute (ETRI), Daejeon,
Korea, where he was involved in a network processor design for the InfiniBand™ protocol.
From 2007 to 2010, he was with Samsung Electronics, Giheung, Korea, where he worked
on the world's first 1-GHz ARM Cortex™ microprocessor design.
Dr. Nam is currently with Chungnam National University, Daejeon, Korea, as an associate
professor.
His current interests include mobile GPU, machine learning processor, microprocessor,
low-power SoC and embedded software.
He co-authored the book Mobile 3D Graphics SoC: From Algorithm to Chip (Wiley, 2010)
and presented tutorials on mobile processor design at IEEE ISSCC 2012 and IEEE A-SSCC
2011.
He was a recipient of the CNU Recognition of Excellent Professors in 2013 and co-recipient
of the A-SSCC Distinguished Design Award in 2016.
Prof. Nam has served as the Chair of Digital Architectures and Systems (DAS) subcommittee
of ISSCC from 2017 to 2019.
He was a member of the Technical Program Committees for ISSCC (2011-2019), A-SSCC
(2011-2018), COOL Chips (2011-2018), VLSI-DAT (2011-2018), ASP-DAC (2015-2016), and
ISOCC (2015-2018) and the Steering Committee for the IC Design Education Center (IDEC)
from 2013 to 2018.
He was a Guest Editor for the IEEE Journal of Solid-State Circuits (JSSC) in 2013
and is an Associate Editor for the IEIE Journal of Semiconductor Technology and Science
(JSTS).