(Yongchul Jung)
^{1}
(Jaechan Cho)
^{1}
(Seongjoo Lee)
^{2}
(Yunho Jung)
^{1}

(School of Electronics and Information Engineering, Korea Aerospace University, Goyangsi,
South Korea)

(Department of Information and Communication Engineering, Sejong University, Seoul,
South Korea)
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index Terms
FFT, hand gesture recognition, MDC, radar
I. INTRODUCTION
Research on humancomputer interaction (HCI) has become increasingly important as
userfriendly interfaces can help to directly employ natural communication and operation
skills of humans for controlling the myriad of currently available electronic devices.
Hand gesture recognition (HGR) is one of the simplest ways for interacting with a
computer, as the movement and orientation of the hands and fingers can be encoded
to easily command devices ^{(1}^{5)}. Most gesture recognition systems are based on computer vision and usually rely on
a camera and an image recognition algorithm. However, vision based HGR presents several
limitations under conditions such as variable lighting and dynamic background, and
can be ineffective for realtime interaction. Moreover, a highresolution and speed
camera is required to acquire fast and fine gestures of the fingers, and thus the
hardware cost can be high. Unlike vision technology, radio detection and ranging (radar)
is insensitive to variable lighting, and an HGR system using this technology can feature
reduced cost, computational complexity, and power consumption ^{(6)}.
The fast Fourier transform (FFT) processor is one of the most complex units in radar
systems. Therefore, reducing its complexity can notably contribute to a more efficient
hardware design ^{(7)}. The number of FFT points N for radar system is given by
where ${\lambda}$ is the wavelength of the radar signal and ${\Delta}$v is velocity
resolution of target ^{(8)}. T is maximum pulse repetition interval (PRI), which is defined as
where v$_{\mathrm{max}}$ is the maximum velocity of target, which for finger movement
can reach up to 18 km/h. Likewise, a suitable velocity resolution for HGR system is
0.05 km/h. Moreover, at least a threechannel receiver is required to estimate the
azimuth and altitude angles from the phase difference among pairs of signals ^{(9)}, and therefore the FFT processor for HGR radar should support at least 600\textasciitilde{}720point
operation for threechannel input data streams ^{(10)}. To further optimize the FFT calculation, radix2$^{2}$ or radix2$^{3}$ algorithms
allow to reduce the number of nontrivial multiplications and can be adopted in HGR
radar. Still, the radix2 architecture requires a 1024point FFT processor, thus increasing
the hardware requirements ^{(11)}.
Multichannel FFT processors are commonly designed by using one FFT processor per
data stream. In addition, a pipeline FFT architecture using singlepath delay feedback
(SDF) provides the lowest number of nontrivial multiplications, which are the most
complex operations during FFT calculation. The increasing hardware complexity of implementing
SDF in multichannel systems can be reduced using a multipath delay commutator (MDC)
architecture to simultaneously handle multiple input data streams through a single
FFT processor ^{(12)}. This alternative has been shown to be more area efficient than the SDF architecture
for multiple channels.
In this paper, we propose a threechannel 729point FFT processor for HGR radar based
on radix3$^{2}$ algorithm and MDC architecture to minimize the number of nontrivial
multiplications. The rest of the paper is organized as follows. Section 2 presents
the FFT for HGR, and the hardware architecture of the proposed FFT processor is described
in Section 3. Section 4 presents the design and implementation of the proposed FFT
processor. In Section 5, we draw some conclusions.
II. FFT FOR HGR
The Npoint discrete Fourier transform (DFT) is defined as
Based on the divide and conquer algorithm, indices n and k can be written as
where 0 ${\leq}$ n$_{1}$ ${\leq}$ 2, 0 ${\leq}$ k$_{1}$ ${\leq}$ 2, 0 ${\leq}$ n$_{2}$
${\leq}$ 2, 0 ${\leq}$ k$_{2}$ ${\leq}$ 2, 0 ${\leq}$ n$_{3}$ ${\leq}$ (N/9  1),
and 0 ${\leq}$ k$_{3}$ ${\leq}$ (N/9  1). Replacing (4) and (5) in (3) we obtain
In (6), BF$_{3}$((N/9)n$_{2 }$+ n$_{3}$, k$_{1}$) corresponds to the radix3 decimationinfrequency
(DIF) butterfly operator expressed as
Similarly, H (n$_{3}$, k$_{1}$, k$_{2}$) can be written as
Therefore, (6) comprises a twostep radix3 butterfly operation and a N/9point DFT ^{(20)}. Term$W_{N}^{nk}$is a twiddle factor, whose multiplication can be divided into a
trivial and a nontrivial multiplication according to indices n and k. Therefore,
the$W_{9}^{n_{2}k_{1}}$multiplication in (6) can be expanded as
where$~ n_{2}k_{1}\in \left\{0,1,2,4\right\}. $As$W_{9}^{n_{2}k_{1}}$is a simply constant
scaling, (9) can be designed by a trivial multiplication using shifters and adders, and the radix3$^{2}$
algorithm has the same number of nontrivial multiplications as radix9 algorithm
but maintaining the butterfly structure of radix3 algorithm.
Table 1 lists the hardware requirements of threechannel R2$^{2}$SDF, R2$^{3}$SDF, and R3$^{2}$MDC
FFT processors, where R denotes the radix algorithm. To compare these FFT architectures,
the area of complex multiplier is given in terms of equivalent adders. If a 16bit
complex multiplier is implemented with three real multipliers and five real adders,
it is equivalent to fifty adders ^{(13)}. As shown in Table 1, the proposed implementation, R3$^{2}$MDC diminishes the number of adders by 35.8%
compared to the R2$^{2}$SDF architecture, and by 24.3% compared to the R2$^{3}$SDF
architecture. Fig. 1 shows the required number of adders according to the number of FFT points. The proposed
algorithm can be considered as the most areaefficient for a number of FFT points
between 2$^{3q}$ + 1 and 3$^{\mathrm{2q}}$, with q {\textgreater} 1.
Table 1. Hardware requirements for 3channel R2$^{2}$SDF, R2$^{3}$SDF, and R3$^{2}$MDC
architectures

Complex Multiplier

Complex
Adder

Total
Adder

Reduction
(%)

Trivial

Non
trivial

R2$^{2}$SDF

15

12

60

720



R2$^{3}$SDF

18

9

80

610

15.3

R3$^{2}$MDC

9

6

81

462

35.8

Fig. 1. Number of adders for R2$^{2}$SDF, R2$^{3}$SDF, and R3$^{2}$MDC architectures
according to the number of FFT points.
III. FFT PROCESSOR HARDWARE ARCHITECTURE
Fig. 2 shows the hardware architecture of the proposed threechannel 729point FFT processor,
which consists of data mapping modules DMM1 to DMM5, radix3 butterfly modules R3BM1
to R3BM6, and a data reordering module (DRM). DMM1 consists of a multiplexer and a
dualport random access memory (DPRAM) for reconstructing the input data stream and
transmitting it to the next stage. R3BM1, R3BM3, and R3BM5 are composed of one radix3
butterfly (R3BF) operator and three trivial multipliers, whereas R3BM2 and R3BM4 are
composed of one R3BF operator and three nontrivial multipliers, and R3BM6 is composed
of only one R3BF operator. DMM2 to DMM6 reconstruct the input data using a delay component
and a commutator and transfer the reconstructed data to the next stage. Finally, the
DRM reconstructs the outputs of module R3BM6 with a structure similar to that of DMM1,
consisting of multiplexer and DPRAM.
Fig. 2. The hardware architecture of the proposed FFT processor.
1. DMM1 and DRM
Both DMM1 and DRM reconstruct data using a delay component for the next stage. In
general, the delay component is implemented as a shift register, and the area that
this operation occupies notably increases with the FFT length and number of data paths.
In ^{(14)}, a method for reducing the area using DPRAM instead of shift registers is proposed.
Likewise, in the proposed FFT processor, we design DMM1 and DRM using three DPRAM
modules and six multiplexers as shown in Fig. 3, where the data scheduling of DMM1 is illustrated in Fig. 4. First, input data streams a0 to a728, b0 to b728, and c0 to c728 enter into DMM1
(Fig. 4(a)) and are reconstructed by the leftside multiplexers (Fig. 4(b)). These reconstructed data streams are registered in the corresponding DPRAM and
then retrieved (Fig. 4(c)). Finally, the DPRAM outputs are reordered by the rightside multiplexers (Fig. 4(d)) and transferred to R3BM1.
Fig. 3. Block diagram of DMM1.
Fig. 4. Data reordering of DMM1.
Fig. 5. Hardware architecture for multiplication of 0.765625  j0.640625.
2. R3BM1, R3BM3 and R3BM5
The twiddle factors of R3BM1, R3BM3, and R3BM5 are$W_{9}^{~ n_{2}k_{1}}$ as in (6). The possible twiddle factors are $W_{9}^{~ 1}$,$W_{9}^{~ 2}$and$W_{9}^{~ 4}$because$n_{2}k_{1}\in
\left\{0,1,2,4\right\},$and expressed as
where 0.7660 and 0.6428 being approximated as
The calculation in (13) and (14) can be implemented with shifters and adders as shown in Fig. 5. Therefore, multiplying twiddle factor$W_{9}^{~ 1}$is a trivial operation because
it can be implemented using shifters and adders. Likewise, multiplying twiddle factors$W_{9}^{~
2}$and$W_{9}^{~ 4}$can be implemented by using (15) to (18) as shown in Fig. 6 and 7, respectively.
Fig. 6. Hardware architecture for multiplication of 0.171875  j0.984375.
Fig. 7. Hardware architecture for multiplication of 0.93750j0.34375.
3. DMM2 to DMM6
DMM2 to DMM6 are composed of shift registers and commutators without DPRAM, because
their delays are small. As shown in Fig. 2, DMM2 has shift registers with sizes of 81 and 162, whose data reordering pattern
is depicted in Fig. 8. The three input data streams of DMM2 are simultaneously transferred from R3BM1,
with the second and third data streams delayed by shift registers of sizes 81 and
162, respectively. Commutators switch each delayed data and introduce additional delays
into the first and second data streams, as shown in Fig. 8. The aligned data streams are transferred to the next stage, R3BM2. DMM3 to DMM6
are similarly implemented with shift registers of different sizes and the same alignment
pattern of DMM2.
Fig. 8. Data reordering of DMM2.
IV. DESIGN AND IMPLEMENTATION
The proposed FFT processor was designed using hardware description language (HDL)
and synthesized to gatelevel circuits using a standard cell library of 65 nm CMOS
process. A 12bit word for the real and imaginary data paths was selected to satisfy
the requirement for a signaltoquantizationnoise ratio (SQNR) of 40 dB, as detailed
in Table 2. The proposed architecture results in a die size of 0.81 mm$^{2}$ with the 219K logic
gates and memory of 18 KB, as specified in Table 3 and 4. Given that R3BM2 and R3BM4 include three nontrivial multipliers, which are the most
complex operations in the FFT processor, their area proportion is high. Fig. 9 shows the layout of the proposed FFT processor, which was compared to similar recent
developments ^{(15}^{19)} as summarized in Table 5. For a fair comparison, we normalized the area as
where M, N, and Tech are the number of the parallel data paths, the FFT size, and
the process technology in nanometers, respectively. Table 5 shows that the normalized area of the proposed FFT processor is the smallest among
the different processors, because it requires the minimum number of nontrivial multipliers.
Table 2. SQNR for word lengths between 10 and 15 bits
Bit

SQNR (dB)

Bit

SQNR (dB)

10

32

13

51

11

40

14

58

12

46

15

63

Table 3. Logic gate count of the proposed FFT processor
Module

Gate Count (K)

Proportion (%)

R3BM1

15

6.85

DMM2

34

15.53

R3BM2

43

19.63

DMM3

30

13.7

R3BM3

15

6.85

DMM4

12

5.48

R3BM4

43

19.63

DMM5

6

2.74

R3BM5

15

6.85

DMM6

5

2.28

R3BM6

1

0.46

Total

219

100

Table 4. Key features of the implemented FFT processor
Parameter

Value

Technology

65 nm 1P9M CMOS

Operating Frequency

96 MHz

Internal Memory

18K bytes

Voltage

I/O

2.5 V

Core

1.2 V

Core Size

0.81 mm$^{2}$

Table 5. Comparison of the proposed FFT processor with recent developments

[15]

[16]

[17]

[18]

[19]

This work

FFT Length

1024

128~2048

1024, 8192

128~2048

1024 ~ 8192

729

FFT Architecture

MDC

SDF

Single

MDC

SDF

MDC

FFT Algorithm

Radix4

Radix2

Radix2$^{3}$

Radix4/8

Balanced binarytree

Radix3$^{2}$

Number of Channel

4

1

1

4

1

3

Frequency (MHz)

30

40

56

40

112

96

Word Length (Bit)

16

32

22

12

24

24

SQNR (dB)

N.A.

N.A.

40

N.A.

55

46

Max. Throughput @ R Hz

4R

R

R

4R

R

3R

Technology (nm)

65

180

180

90

180

65

Area (mm$^{2}$)

8.299

6.76

4.84

3.1

3.52

0.81

Normalized Area

207.48

80.14

48.55

36.75

35.31

28.39

Normalized Area / SQNR

N.A.

N.A.

1.214

N.A.

0.642

0.617

Fig. 9. Layout of the proposed FFT processor
V. CONCLUSIONS
We propose an areaefficient FFT processor for HGR radar. The radix3$^{2}$ algorithm
and MDC pipelined hardware architecture allow to minimize the number of nontrivial
multiplications and support the threechannel 729point FFT operation required for
HGR. The FFT processor is implemented using a 65 nm CMOS process, whose normalized
area is the smallest among similar FFT processors. Moreover, the proposed algorithm
and hardware architecture can achieve high performance in a small area, and we expect
it to be suitable for compact HGR radar.
ACKNOWLEDGMENTS
This work was supported by the Technology Innovation Program, 10080619, funded by
the Ministry of Trade, Industry and Energy (MOTIE, Korea) and CAD tools were supported
by IDEC.
REFERENCES
Pickering C. A., Mar. 2005, The search for a safer driver interface: A review of gesture
recognition human machine interface, Computing & Control Engineering Journal., Vol.
16, No. 1, pp. 34
Pavlovic V. I., Sharma R., Huang T. S., Jul. 1997, Visual interpretation of hand gestures
for humancomputer interaction: A review, Pattern Analysis and Machine Intelligence,
IEEE Transactions on, Vol. 19, pp. 677695
Suarez J., Murphy R. R., Sep 2012, Hand gesture recognition with depth images: A review,
Robot and Human Interactive, IEEE Intenational Symposium on, pp. 411417
ParadaLoira F., GonzalezAgulla E., AlbaCastro J., June 2014, Hand gestures to control
infotainment equipment in cars, Intelligent Vehicles, IEEE Symposium on, pp. 16
Jung Y., Lim E., Jung Y., Feb. 2017, LowComplexity 3Channnel FFT Processor for Hand
Gesture Recognition Radar Systems, Korea Conference on Semiconductors, pp. 77
Lien J., Gillian N., Karagozler M.. E., Amlihood P., Schwesig C., Olson E., Raja H.,
Poupyrev I., July 2016, Soli: Ubiquitos Gesutre sensing with millimeter wave radar,
ACM Transaction on Graphics, Vol. 35, No. 4
Nathanson F. E., Reilly J. P., Cohen M. N., 1999, Radar Design Principles, 2nd ed.,
SciTech Publishing
Kronauage M., Rohling H., Oct. 2014, New Chirp Sequence Radar Waveform, Aerospace
and Electronic Systems, IEEE Transaction On, Vol. 50, No. 4, pp. 28702877
Molchanov P., Gupta S., Kim K., Pulli K., May 2015, Multisensor system for driver's
handgesture recognition, Automatic Face and Gesture Recognition, IEEE Conference
on, pp. 18
Molchanov P., Gupta S., Kim K., Pulli K., May 2015, Shortrange FMCW monopulse radar
for handgesture sensing, Radar Conference IEEE, pp. 14911496
Nguyen T., Lee H., Feb. 2017, Highthroughput Lowcomplexity Mixedradix FFT Processor
using a Dualpath Shared Complex Constant Multiplier, Semiconductor Technology and
science, IEIE Journal of, Vol. 17, No. 1
Jung Y., Yoon H., Kim J., Feb 2003, New efficient FFT algorithm and pipeline implementation
results for OFDM/DMT applications, Consumer Electronics, IEEE Transaction on, Vol.
49, No. 1, pp. 1420
Sansaloni T., PerezPascual A., Torees V., Valls J., Sep. 2005, Efficient pipeline
FFT processors for wireless LAN MIMOOFDM systems, Elecotronics Letters, Vol. 41,
No. 19, pp. 10431044
Yoshizawa S., Nishi K., May 2008, An Area and Power Efficient Pipeline FFT Processor
for 8x8 MIMOOFDM Systems, Circuits and Systems, IEEE International Symposium on,
pp. 12481251
Seok M., et al. , Feb., 2011, A 0.27V 30MHz 17.7nJ/transform 1024pt complex FFT core
with superpipelining, SolidState Circuits Conference, 2011. ISSCC 2011. Digest of
Technical Papers, IEEE Internaltional, pp. 342344
Chhatbar T. D., Darji A. D., Jan., 2009, High Speed High Throughput FFT /IFFT Processor
ASIC for Mobile Wimax, Emerging Trends in Engineering and Technology, 2nd IEEE International
Conference on, pp. 402405
Lin Y. W., Liu H. Y., Lee C. Y., Nov., 2004, Dynamic scaling FFT processor for DVBT
applications, Solid State Circuits, IEEE Journal of, Vol. 39, No. 11, pp. 20052013
Yang K. J., Tsai S. H., Chuang G. C. H., Apr., 2013, MDC FFT/IFFT Processor With Variable
Length for MIMOOFDM Systems, Very Large Scale Integration Systems, IEEE Transactions
on, Vol. 21, No. 4, pp. 720731
Lee H., Park I., Apr., 2007, Balanced BinaryTree Decomposition for AreaEfficient
Pipelined FFT Processing, Circuits and Systems I: Regular Papers, IEEE Transactions
on, Vol. 54, No. 4, pp. 889900
Shih X. Y., Liu Y. Q., Chou H. R., Jun., 2017, 48Mode Reconfigurable Design of
SDF FFT Hardware Architecture Using Radix3^2 and Radix2^3 Design Approaches, IEEE
TCASI, Vol. 64, No. 6, pp. 
Author
Yongchul Jung received the B.S. and M.S. degrees in the School of Electronics and
Information Engineering from Korea Aerospace University, Goyang, Korea, in 2015 and
2017, respectively. He is currently working towards the Ph.D. degree in the School
of Electronics and Information Engineering, Korea Aerospace University. His research
interests include the signal processing algorithm and VLSI implementation for the
radar signal processing system.
Jaechan Cho received the B.S. and M.S. degrees in the School of Electronics and Information
Engineering from Korea Aerospace University, Goyang, Korea, in 2015 and 2017, respectively.
He is currently working towards the Ph.D. degree in the School of Electronics and
Information Engineering, Korea Aerospace University. His research interests include
the signal processing algorithm and VLSI implementation for the image processing system.
Seongjoo Lee received his B.S., M.S., and Ph.D. degrees in Department of Electrical
and Electronic Engineering from Yonsei University, Seoul, Korea, in 1993, 1998, and
2002, respectively. From 2002 to 2003, he was a senior research engineer at the IT
SOC Research Center, Yonsei University, Seoul, Korea. From 2003 to 2005, he was a
senior engineer in Samsung Electronics Co. Ltd., Suwon, Korea. He was a research professor
at the IT Center and the IT SoC Research Center, Yonsei University, Seoul, Korea from
2005 and to 2006. He is currently a professor in the Department of Information and
Communication Engineering at Sejong University, Seoul, Korea. His current research
interests include PN code acquisition algorithms, cdma2000 modem SoC design, CDMA
communication, and SoC design for image processing.
Yunho Jung received the B.S., M.S., and Ph.D. degrees in Department of Electrical
and Electronic Engineering from Yonsei University, Seoul, Korea, in 1998, 2000, and
2005, respectively. From 2005 to 2007, he was a senior engineer in Samsung Electronics,
Suwon, Korea. From 2007 to 2008, he was a research professor at Institute of TMS Information
Technology, Yonsei University, Seoul, Korea. He is currently a professor in the School
of Electronics and Information Engineering, Korea Aerospace University, Goyang, Korea.
His research interests include the signal processing algorithm and VLSI implementation
for the wireless communication and image processing systems.