
  1. (Department of Computer Science and Engineering, Chungnam National University, 99, Daehak-ro, Yuseong-gu, Daejeon, 305-764, Korea)

Gesture recognition, hidden Markov model (HMM), Viterbi algorithm, binary search orientation, model clustering


Recently, gesture recognition has been widely studied as a natural user interface (NUI) method for wearable smart devices such as head-mounted displays (HMDs) and smart bands (1). Hand gesture recognition can be a secure NUI method on wearable devices because, unlike speech recognition, it rarely reveals the user’s intention in public spaces. Hand gesture recognition falls into two categories: pose recognition based on hand shapes and motion recognition based on hand movements. This work focuses on the motion recognition of handwritings (2) because its expressive fluency suits versatile NUIs. Sequence matching plays a key role in this category, and thus the hidden Markov model (HMM) is widely adopted for its best-in-class sequence matching accuracy (3). In HMM algorithms, complex arithmetic operations and excessive memory bandwidth are the most challenging issues for low-power, real-time realizations (4,5). There have been studies on hardware acceleration of HMM algorithms (4,5), but they focused on speech recognition only and did not account for the compute-intensive motion orientation that is mandatory for gesture recognition. A few studies on gesture recognition hardware are available (1), but they focused on pose recognition, which does not involve the HMM this work focuses on.

In this paper, we propose a low-power HMM accelerator accommodating a lightweight motion orientation module for a gesture recognition interface on wearable smart devices (6). In this accelerator, a binary search method is exploited in the motion orientation module to reduce its computational complexity by replacing the division and arctangent operations of the orientation computation with simple multiplications and lookup tables. In addition, gesture models are clustered in the gesture database to reduce unnecessary external memory transactions. As a result, the proposed HMM accelerator shows a 25.6% power reduction over a vanilla hardware implementation of the gesture HMM.

Fig. 1. HMM-based gesture recognition process.



Fig. 1 gives a brief description of the gesture recognition process using the HMM algorithm. Basically, this is a stochastic process that finds the meaning of a given gesture input based on the stochastic matching between the input hand motion and a gesture model from the gesture database (DB). Each gesture model in the DB is defined by state transitions, where each state corresponds to a hand motion and makes a transition to the next state if it matches the given input hand motion. A gesture model is chosen as the recognition result when it reaches its final state successfully.
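For illustration, such a gesture model can be represented as a discrete left-to-right HMM whose observation symbols are quantized motion directions. The following sketch is purely hypothetical (the state count, symbol alphabet size, and random emission probabilities are illustrative assumptions, not the chip's actual DB format):

```python
import numpy as np

N_STATES, N_SYMBOLS = 4, 16   # hypothetical motion states; 16 quantized directions

def random_left_to_right_hmm(seed=0):
    """Toy left-to-right gesture HMM: each state self-loops or advances."""
    rng = np.random.default_rng(seed)
    A = np.zeros((N_STATES, N_STATES))
    for i in range(N_STATES - 1):
        A[i, i] = A[i, i + 1] = 0.5                       # stay in the motion, or advance
    A[-1, -1] = 1.0                                       # final state absorbs
    B = rng.dirichlet(np.ones(N_SYMBOLS), size=N_STATES)  # per-state symbol probabilities
    pi = np.zeros(N_STATES)
    pi[0] = 1.0                                           # always start at the first motion
    return A, B, pi
```

The left-to-right transition structure is what lets a model "reach its final state" only when the observed motion sequence progresses through its states in order.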

The overall architecture of the proposed gesture HMM accelerator is depicted in Fig. 2. It consists of motion orientation and gesture matching stages. The motion orientation stage computes the direction of hand movement and transfers this direction to the following gesture matching stage. The gesture matching stage fetches the gesture models from the gesture DB and selects the model that matches the given direction with the highest probability as its final recognition result.

Fig. 2. Overall architecture of the proposed gesture HMM accelerator.


1. Motion Orientation

Motion orientation specifies the direction of a motion vector in the hand movement. The motion vector (∆x, ∆y) is determined as the difference between two successive hand positions ($x_{t}, y_{t}$) and ($x_{t+1}, y_{t+1}$) along the hand movement. The motion orientation θ of the motion vector is then computed by Eq. (1), which involves compute-intensive division and arctangent operations. In this work, the orientation θ is quantized into 16 intervals for a reasonable complexity of the HMM algorithm, which is found to be sufficient quantization for representing on-line handwriting (7,8).

$$ \theta=\arctan \left(\frac{\Delta y}{\Delta x}\right) $$
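In software, Eq. (1) together with the 16-interval quantization reduces to an arctangent followed by a binning step. A minimal floating-point baseline (the very computation the accelerator avoids in hardware) might look like:

```python
import math

def orientation_bin(dx, dy, n_bins=16):
    """Quantize the direction of motion vector (dx, dy) into n_bins intervals.

    atan2 covers the full circle and handles dx = 0, avoiding an explicit
    division; the modulo maps the angle into [0, 360) before binning.
    """
    theta = math.degrees(math.atan2(dy, dx)) % 360.0
    return int(theta // (360.0 / n_bins))   # 22.5-degree bins for n_bins = 16
```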

A simplified hardware design of Eq. (1) can be found in (9), where Eq. (1) is used to find the rotation angle in producing SIFT descriptors for object recognition. They simplified the division using a multiplication and a reciprocation table for reduced hardware complexity. However, this approach is limited by the number of bits in the divisor because the size of the reciprocation table explodes accordingly. This complication arises particularly with high-resolution input images, which produce a ∆x with a wide dynamic range that becomes the divisor in Eq. (1). Therefore, we propose a novel orientation scheme scalable to the input image resolution by adopting a binary search method to find the quantized orientation interval in which a motion vector resides. We divide the motion vector space into 16 orientation intervals and assign a pre-computed arctangent value to each interval. Thanks to this method, we only need to find the proper orientation interval to obtain the orientation angle of a given motion vector. This approach avoids the arctangent and division operations, exploiting simple comparisons and a vector rotation that are independent of the dynamic range of ∆x.
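The table-size problem can be made concrete with a quick sketch of the reciprocation-table idea of (9); the fixed-point fraction width here is an illustrative assumption:

```python
FRAC_BITS = 16  # assumed fixed-point fraction width, for illustration

def build_recip_table(dx_bits):
    """Pre-computed fixed-point reciprocals 1/dx for every possible divisor.

    One entry is needed per representable |dx| value, so the table holds
    2**dx_bits entries: it doubles with every extra bit of coordinate range,
    which is why the scheme of (9) scales poorly to high-resolution inputs.
    """
    return [0] + [round((1 << FRAC_BITS) / d) for d in range(1, 1 << dx_bits)]

for bits in (8, 10, 12):   # e.g. growing input image resolutions
    print(f"{bits}-bit divisor -> {1 << bits} table entries")
```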

Fig. 3. Binary search orientation (a) step 1: comparison to boundary x=0, (b) step 2: comparison to boundary y=0, (c) step 3: comparison to one of two boundaries, y=x or y=−x, (d) step 4: comparison to one of four boundaries rotated by −22.5°: x’=0, y’=0, y’=x’, or y’=−x’.


The proposed binary search orientation scheme is illustrated in Fig. 3. The binary search determines the orientation interval of a given motion vector (∆x, ∆y) by simply comparing the vector against the search boundaries x=0, y=0, y=±x, and each of these rotated by −22.5°. Each boundary divides the vector space into two subspaces, and the orientation interval of the motion vector is examined step-by-step by narrowing the search space down to the subspace selected in the previous search step, as described in Fig. 3(a)-(d). The search steps in Fig. 3(a)-(c) contain simple comparisons, but the step in Fig. 3(d) involves comparisons to the search boundaries rotated by −22.5°. These comparisons to the rotated boundaries can be expressed as Eq. (2), where the comparison is made in the −22.5° rotated coordinates (x’, y’), and (∆x’, ∆y’) is the representation of the motion vector (∆x, ∆y) in the rotated space. The parameters α and β in Eq. (2) indicate the signs that specify each search boundary.

\begin{align} \begin{array}{rl} \alpha\,\Delta y'-\beta\,\Delta x' &> 0\\ \alpha\left(\sin 22.5^{\circ}\cdot \Delta x+\cos 22.5^{\circ}\cdot \Delta y\right)-\beta\left(\cos 22.5^{\circ}\cdot \Delta x-\sin 22.5^{\circ}\cdot \Delta y\right) &> 0\\ \left(\alpha\cos 22.5^{\circ}+\beta\sin 22.5^{\circ}\right)\cdot \Delta y-\left(\beta\cos 22.5^{\circ}-\alpha\sin 22.5^{\circ}\right)\cdot \Delta x &> 0\\ \equiv A\cdot \Delta y-B\cdot \Delta x &> 0\\ \multicolumn{2}{l}{\textit{where}\ \left(\alpha,\beta\right)\in\left\{\left(0,-1\right),\left(0,1\right),\left(1,0\right),\left(-1,0\right),\left(1,-1\right),\left(-1,1\right),\left(1,1\right),\left(-1,-1\right)\right\}} \end{array} \end{align}

This binary search orientation is implemented in hardware as depicted in Fig. 4. The hardware consists of four stages realizing the search steps illustrated in Fig. 3. First, ∆x and ∆y are tested against the search boundaries x=0 and y=0 (i.e., ∆x>0, ∆y>0) to select a search space. Then, the test results are used together to determine the next search boundary between y=x and y=−x, and ∆x and ∆y are tested against this boundary (i.e., ∆y>±∆x) to select the next search space accordingly. Finally, all the test results above are used together to determine one of the rotated search boundaries, and Eq. (2) is evaluated to select the final search space corresponding to one of the 16 quantized orientations. The boundary comparisons represented by Eq. (2) require two multiplications with the rotation coefficients. Hence, we implement Eq. (2) using two multipliers and an 8-entry lookup table that is independent of the input image resolution. As a result, our motion orientation hardware reduces power dissipation by 56.1% for a high-definition (HD) input stream compared with the conventional approach using a reciprocation table (9).
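A behavioral model of the four-step binary search can be sketched as below; floating-point comparisons are used here for clarity, whereas the hardware uses fixed-point coefficients. Steps 1-3 hit boundaries at multiples of 45° (so their tests reduce to sign and ±∆x comparisons), and step 4 hits the −22.5°-rotated boundaries, whose test coefficients are exactly the A and B values of Eq. (2) held in the 8-entry table:

```python
import math

def orient16(dx, dy):
    """16-level motion orientation via a 4-step binary search (Fig. 3).

    Each step halves the angular search interval with a single half-plane
    test against the interval's bisecting boundary ray: no division and no
    arctangent, only multiplications and a comparison.
    """
    lo, hi = 0.0, 360.0
    for _ in range(4):
        mid = 0.5 * (lo + hi)
        # side of the boundary ray at angle `mid`: sign of the cross product
        if math.cos(math.radians(mid)) * dy - math.sin(math.radians(mid)) * dx > 0:
            lo = mid          # vector lies counterclockwise of the boundary
        else:
            hi = mid          # vector lies clockwise of (or on) the boundary
    return int(lo // 22.5)    # index of the selected 22.5-degree interval
```

Since the step-4 boundaries are odd multiples of 22.5°, their cosine/sine pairs form a fixed 8-entry table, so the comparison cost stays at two multiplications regardless of the coordinate bit width.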

Fig. 4. Structure for binary search orientation (a) coefficients for the rotated boundary comparisons, (b) hardware organization.


2. Gesture Matching

Fig. 5. Model clustering scheme.


Fig. 6. Viterbi decoder architecture using logarithmic arithmetic.


The gesture matching stage fetches gesture models from the gesture DB and selects the one with the highest matching probability using the Viterbi decoder of the HMM algorithm. However, a huge memory bandwidth is required for the model fetching as the number of gesture models increases, because each gesture model contains thousands of parameters such as the transition, observation, and initial probabilities associated with its state transitions. Therefore, we exploit gesture model clustering in the gesture matching process to reduce the effective bandwidth to external memory.

The proposed model clustering scheme is illustrated in Fig. 5. Gesture models with similar matching probabilities are clustered together, and the one with the probability closest to the average of each cluster is chosen to represent the cluster. In the first step, the gesture matching stage fetches only the representative gesture model of each cluster and selects the one with the highest probability using the Viterbi decoder. In the next step, it fetches the cluster associated with the selected representative model and determines the gesture model with the highest probability within that cluster as the final gesture recognized by the system. An 8 kB gesture cache is used to buffer the clustered gesture models for a further reduction in external bandwidth, as these gesture models are accessed repeatedly by the Viterbi decoder.
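The two-step matching over clustered models can be sketched as follows; the `clusters` layout and the `score` callback are illustrative assumptions, with `score` standing in for the Viterbi log-probability of a model given the observed motion sequence:

```python
def recognize(clusters, observation, score):
    """Two-step gesture matching over clustered models (Fig. 5).

    `clusters` maps each cluster's representative model to the full list of
    models in that cluster. Step 1 scores only the representatives; step 2
    scores only the winning cluster, so models in all other clusters are
    never fetched from external memory.
    """
    best_rep = max(clusters, key=lambda rep: score(rep, observation))
    return max(clusters[best_rep], key=lambda m: score(m, observation))
```

With C clusters of M models each, a query scores roughly C + M models instead of C·M, which is where the external-bandwidth saving comes from.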

The Viterbi decoder module exploits logarithmic arithmetic for reduced arithmetic complexity because it is a multiplication-oriented block. Logarithmic arithmetic is well known to be effective for multiplication and division by converting them into simple addition and subtraction in the logarithmic domain (10). All the off-line parameters of the Viterbi decoder are pre-converted into the logarithmic domain, and thus logarithmic converters are unnecessary in this module. In addition, the results do not need antilogarithmic converters because we only need to choose the maximum value among them, and the antilogarithmic conversion is a monotonically increasing function that preserves the order among the results. Fig. 6 shows the proposed Viterbi decoder module; the adders in the gray area are the replacements of the multipliers enabled by the logarithmic arithmetic.
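A minimal log-domain Viterbi recursion illustrating this point is sketched below (NumPy floating point here, whereas the hardware uses pre-converted fixed-point logarithmic parameters): every multiplication of the linear-domain recursion becomes an addition, and no antilog is taken at the end because the maximum log score selects the same model as the maximum probability would.

```python
import numpy as np

def viterbi_log(log_pi, log_A, log_B, obs):
    """Best-path Viterbi score computed entirely in the log domain.

    log_pi, log_A, log_B are the initial, transition, and observation
    probabilities pre-converted to logarithms; obs is the sequence of
    quantized orientation symbols. Products of path probabilities turn
    into sums of log terms.
    """
    delta = log_pi + log_B[:, obs[0]]                   # initialization
    for o in obs[1:]:
        # delta[j] = max_i (delta[i] + log_A[i, j]) + log_B[j, o]
        delta = np.max(delta[:, None] + log_A, axis=0) + log_B[:, o]
    return float(np.max(delta))                         # best-path log score
```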

Fig. 7. Die photograph and chip characteristics.


Fig. 8. Power reduction from each of the proposed schemes.



The proposed HMM accelerator with motion orientation function is fabricated using 65 nm CMOS technology. Its chip photograph and performance summary are given in Fig. 7. It realizes real-time gesture recognition while consuming 6.4 mW at 133 MHz.

Table 1. Comparison with other works

                      Gesture HMM (This Work)   HMM (5)                General purpose processor (OMAP4430)*
Type                  Hardware Accelerator      Hardware Accelerator   General purpose processor
Technology (nm)       65                        –                      –
No. of Models         –                         –                      –
Power (mW)            6.4 (@ 1.2 V)             6.0 (@ 0.85 V)         23 (@ 1.2 V)
Frequency (MHz)       133 (@ 1.2 V)             50 (@ 0.85 V)          110 (@ 1.2 V)

* Measured with TI multimedia evaluation board (OMAP 4430)

Fig. 9. Configuration of evaluation system.


To the best of our knowledge, this is the first HMM accelerator for gesture recognition. Therefore, our work is compared with a general purpose processor (i.e., OMAP4430) solution for gesture recognition on a 2,296-sample handwriting data set (2), as summarized in Table 1. The proposed optimization schemes are evaluated through the power reductions presented in Fig. 8. The binary search method in motion orientation achieves a 12.6% power reduction, the model clustering contributes an additional 7.9%, and the logarithmic arithmetic reduces 5.1% more, resulting in a 25.6% total reduction from a vanilla hardware implementation. Our design is also compared with the state-of-the-art HMM accelerator for speech recognition (5), as presented in Table 1.

Fig. 9 shows the evaluation system for the proposed HMM accelerator chip. A camera board is attached to stream in QVGA hand gesture images at 30 fps and to detect the fingertips in the incoming images. The coordinates of the fingertip are delivered to the proposed HMM accelerator on the test board to conduct the gesture recognition. The handwriting recognition results are presented in Fig. 10. The left side of each image shows the input frame augmented with traces of finger movements, and the right side shows the sequence of motion orientations determined by the HMM accelerator.

Fig. 10. Handwriting recognition results (a) case for character A, (b) case for character D, (c) case for character N.


Fig. 10(a)-(c) show the recognition results of the handwritings for characters A, D, and N, respectively. The upper parts show handwriting gestures with regular strokes, and the lower parts show handwritings with irregular strokes. As shown in Fig. 10(a), a total of five strokes are needed for the character A with regular strokes, of which strokes 1, 3, and 5 constitute the character A. With irregular strokes, the start point of the gesture can move to the lower left side, and the recognition is performed properly in this case as well. Fig. 10(b) shows the recognition results for the character D; the result is correct even for irregular strokes with a smaller character size. Fig. 10(c) shows that the recognition is still successful for slanted characters with an irregular sequence of strokes. These results show that the proposed HMM accelerator is robust across various handwriting gestures.


In this paper, the world’s first HMM accelerator with motion orientation and model clustering is presented for a gesture recognition UI on wearable devices. The binary search orientation is proposed to avoid the division and arctangent operations and attains a 12.6% power reduction. Model clustering in the gesture DB achieves an additional 7.9% reduction, and logarithmic arithmetic in the Viterbi decoder contributes a further 5.1%. With these schemes combined, the proposed gesture HMM accelerator demonstrates a 25.6% power reduction compared with a vanilla hardware implementation of the gesture HMM.


This work was supported by research fund of Chungnam National University. The authors would like to thank IDEC for chip fabrication and CAD support.


Park S., Choi S., Lee J., Kim M., Park J., Yoo H.-J., 2016, A 126.1 mW Real-Time Natural UI/UX Processor with Embedded Deep-Learning Core for Low-Power Smart Glasses, ISSCC Dig. Tech. Papers, pp. 254-256
Günter S., Bunke H., Apr. 2004, HMM-based handwritten word recognition: On the optimization of the number of states, training iterations and Gaussian components, Pattern Recognition, Vol. 37, No. 10, pp. 2069-2079
Farra N., Raffa G., Nachman L., Hajj H., 2011, Energy-Efficient Mobile Gesture Recognition with Computation Offloading, Proc. Int. Conf. Energy Aware Comput., pp. 1-6
Fahmy S. A., Cheung P. Y. K., Luk W., 2005, Hardware Acceleration of Hidden Markov Model Decoding for Person Detection, Proc. Conf. Des., Autom. Test Eur. (DATE), Vol. 3, pp. 8-13
Price M., Glass J., Chandrakasan A. P., Jan. 2015, A 6 mW, 5,000-Word Real-Time Speech Recognizer Using WFST Models, IEEE J. Solid-State Circuits, Vol. 50, No. 1, pp. 102-112
Choi S., et al., Nov. 2016, A Low-Power Real-Time Hidden Markov Model Accelerator for Gesture User Interface on Wearable Devices, IEEE Asian Solid-State Circuits Conf., pp. 261-264
Lee H.-K., Kim J. H., Oct. 1999, An HMM-Based Threshold Model Approach for Gesture Recognition, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 21, No. 10, pp. 961-973
Lee S.-H., Kim J. H., Nov. 1997, Augmenting the Discrimination Power of HMM by NN for On-Line Cursive Script Recognition, Applied Intelligence, Vol. 7, No. 4, pp. 304-314
Hong I., Bong K., Shin D., Park S., Lee K. J., Kim Y., Yoo H.-J., Jan. 2016, A 2.71 nJ/Pixel Gaze-Activated Object Recognition System for Low-Power Mobile Smart Glasses, IEEE J. Solid-State Circuits, Vol. 51, No. 1, pp. 45-55
Nam B.-G., Yoo H.-J., May 2009, An embedded stream processor core based on logarithmic arithmetic for a low-power 3-D graphics SoC, IEEE J. Solid-State Circuits, Vol. 44, No. 5, pp. 1554-1570


Hyeon-Gu Do

received the B.S. degree in computer science and engineering from Chungnam National University (CNU), Daejeon, in 2017, where he is currently working toward the M.S. degree.

His current research interests include machine learning processors and wearable SoC design.

Seongrim Choi

received the B.S. and M.S. degrees in computer science and engineering from Chungnam National University (CNU), Daejeon, in 2011 and 2013, respectively, where he is currently working toward the Ph.D. degree.

His current research interests include object recognition processor and wearable SoC design.

He is a co-recipient of the IEEE Asian Solid-State Circuits Conference (A-SSCC) Distinguished Design Award in 2016.

Jaemin Hwang

received the B.S. degrees in mechatronics and in computer science and engineering (double major) and the M.S. degree in computer science and engineering from Chungnam National University (CNU), Daejeon, in 2014 and 2016, respectively. He is currently with Hanwha Systems.

His research interests include mobile GPU, digital arithmetic circuits, and system software platforms.

Ara Kim

received the B.S. degree in information and communication engineering from Hanbat National University and the M.S. degree in computer science and engineering from Chungnam National University (CNU), Daejeon, in 2016 and 2018, respectively. She is currently with Satreci.

Her research interests include wearable SoC and low-power SoC design.

Byeong-Gyu Nam

received his B.S. degree (summa cum laude) in computer engineering from Kyungpook National University, Daegu, Korea, in 1999, M.S. and Ph.D. degrees in electrical engineering and computer science from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2001 and 2007, respectively.

His Ph.D. work focused on low-power GPU design for wireless mobile devices.

In 2001, he joined the Electronics and Telecommunications Research Institute (ETRI), Daejeon, Korea, where he was involved in a network processor design for the InfiniBand™ protocol.

From 2007 to 2010, he was with Samsung Electronics, Giheung, Korea, where he worked on the world’s first 1-GHz ARM Cortex™ microprocessor design.

Dr. Nam is currently with Chungnam National University, Daejeon, Korea, as an associate professor.

His current interests include mobile GPU, machine learning processor, microprocessor, low-power SoC and embedded software.

He co-authored the book Mobile 3D Graphics SoC: From Algorithm to Chip (Wiley, 2010) and presented tutorials on mobile processor design at IEEE ISSCC 2012 and IEEE A-SSCC 2011.

He was a recipient of the CNU Recognition of Excellent Professors in 2013 and co-recipient of the A-SSCC Distinguished Design Award in 2016.

Prof. Nam has served as the Chair of Digital Architectures and Systems (DAS) subcommittee of ISSCC from 2017 to 2019.

He was a member of the Technical Program Committees for ISSCC (2011-2019), A-SSCC (2011-2018), COOL Chips (2011-2018), VLSI-DAT (2011-2018), ASP-DAC (2015-2016), and ISOCC (2015-2018) and the Steering Committee for the IC Design Education Center (IDEC) from 2013 to 2018.

He was a Guest Editor for the IEEE Journal of Solid-State Circuits (JSSC) in 2013 and is an Associate Editor for the IEIE Journal of Semiconductor Technology and Science (JSTS).