Do Hyeon-Gu
Choi Seongrim
Woo Junsik
Kim Ara
Nam Byeong-Gyu
(Department of Computer Science and Engineering, Chungnam National University, 99,
Daehak-ro, Yuseong-gu, Daejeon, 305-764, Korea)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Index Terms
Wake-up detector, gesture recognition, energy-harvesting, dual-mode CMOS image sensor, subthreshold SRAM
I. INTRODUCTION
Hand gesture recognition is gaining attention as a natural user interface (NUI) mechanism
for wearable smart devices such as smart watches and head-mounted displays because,
unlike speech recognition and other approaches, it does not disclose the user's intention
in public (1). However, the always-on sensing nature of the gesture user interface demands large energy
dissipation, which becomes worse due to the compute-intensive vision processing required
for gesture recognition. Recently, wake-up solutions have been proposed to extend
the battery lifetime of the NUI system by turning off the main functional blocks in standby
mode and keeping only the wake-up detectors always alive, as shown in Fig. 1 (2,3). There have been several studies on wake-up detectors for speech recognition,
face recognition, and surveillance applications, but none of them target gesture
recognition systems.
In this paper, we propose a self-powered always-on vision wake-up detector for the
gesture user interface on wearable smart devices (4). The proposed wake-up detector accommodates four key features to enable the self-powered
operation. First, we propose a near-threshold imaging-harvesting dual-mode CMOS image
sensor (CIS) based on 0.6 V 3T pixels. In this dual-mode sensor, the number of stacked
devices in the pixel comparators is reduced for its near-threshold operation. Second,
we present a subthreshold SRAM with disturb-free 0.3 V 10T bitcells. We eliminate
all the contentions arising in a bitcell for subthreshold operations by adopting proper
gating schemes to the bitcell. Third, a hand detection engine is devised based on a
proposed descriptor algorithm optimized for hand detection. It uses modified Haar-like
filters that are robust to skin color variations to improve the hand detection accuracy.
Finally, a switched-capacitor DC-DC converter is integrated for lightweight
system design. It converts the energy harvested from the CIS to the proper voltage
levels used in the chip. As a result, our wake-up detector achieves self-powered operation
of the always-on vision-based wake-up detection.
Fig. 1. Conceptual diagram of wake-up detection.
II. SELF-POWERED VISION WAKE-UP DETECTOR
1. System Architecture
The overall system architecture of the proposed self-powered wake-up detector is
shown in Fig. 2. It consists of four major functional blocks, i.e., an energy-harvesting CIS, a frame buffer
SRAM, a hand detection engine, and a switched-capacitor (SC) DC-DC converter. The energy-harvesting
CIS works in dual modes of imaging and harvesting. In the imaging mode, the CIS captures
incoming images and stores them in the frame buffer. The hand detection engine then
finds hand shapes in the images and produces a wake-up signal to the gesture UI system
upon hand detection. In the energy-harvesting mode, pixels in the CIS are used as photovoltaic
cells to harvest energy from incident light. From the energy harvested by the CIS, the SC DC-DC converter
produces a 0.3 V supply for the digital blocks, such as the frame buffer and the hand detection
engine, and a 0.6 V supply for the analog blocks, including the CIS and the SC DC-DC converter itself.
The chip initially runs from the external battery,
but it switches to the on-die supply voltage for self-powered operation when the harvested
energy reaches a sufficient voltage level of 0.6 V.
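To make the control flow concrete, the following Python sketch gives a purely behavioral model of the alternating dual-mode operation described above; all names are illustrative stand-ins rather than actual signals or blocks of the chip.

```python
# Illustrative behavioral model of the alternating dual-mode operation
# (hypothetical names; not taken from the chip's actual design).

def run_wake_up_detector(frames, harvested_volts):
    """Alternate harvesting and imaging phases over a sequence of frames.

    frames          : iterable of captured images (imaging phase)
    harvested_volts : iterable of harvested analog-rail voltages (harvesting phase)
    Yields "WAKE_UP" whenever a hand is detected in a frame.
    """
    on_battery = True
    for frame, v_rail in zip(frames, harvested_volts):
        # Harvesting phase: the SC DC-DC converter regulates the 0.3 V / 0.6 V
        # rails; the external battery is released once 0.6 V is available on-die.
        if on_battery and v_rail >= 0.6:
            on_battery = False
        # Imaging phase: the frame is stored and searched for a hand shape.
        if looks_like_hand(frame):          # placeholder for the detection engine
            yield "WAKE_UP"

def looks_like_hand(frame) -> bool:
    # Stand-in for the hand detection engine of Section II-4.
    return bool(frame)
```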
Fig. 2. Overall architecture of vision wake-up detector.
Fig. 3. Cross section diagrams of photodiodes (a) conventional photodiode, (b) dual-mode
photodiode.
2. Near-threshold Dual-mode CIS with 3T Pixels
The dual-mode CIS with integrated imaging and harvesting operations has been studied
for area- and energy-efficient system designs (5). The dual-mode CIS operates in two phases, i.e., a harvesting phase and an imaging phase,
serving as both the energy source and the image sensor of a system. The photodiode
(PD) in conventional CIS pixels (6) cannot work as a photovoltaic cell because photovoltaic operation requires forward
bias, whereas the diode is fixed in reverse bias as shown in Fig. 3(a). Therefore, another diode (PD2) is added to the dual-mode pixel (5) that can be switched to forward bias by connecting its cathode to ground in the energy-harvesting
mode (EHM), as described in Fig. 3(b). This additional diode incurs no area overhead since it is built
upon the existing diode (PD1), and it even improves the fill factor in the imaging
mode.
Fig. 4. Comparison of energy-harvesting pixel structures (a) dual-mode pixel, (b)
DPS pixel, (c) DPS pixel with shared-comparator, (d) low-voltage dual-mode DPS pixel.
The dual-mode pixels incorporate a source follower (SF) to transfer the pixel value
to the comparator in the analog-to-digital converter (ADC), as illustrated in Fig. 4(a), but this SF limits the dynamic range under low-voltage operation.
Meanwhile, the digital pixel sensor (DPS) has been studied for low-voltage operation
of the CIS (7). It moves the comparator out of the ADC so that each pixel compares its own value
within the pixel and directly outputs the result to the following ADC stage, as described in
Fig. 4(b). This approach eliminates the SF from the pixel and thus improves the dynamic range
under low-voltage operation. However, the DPS structure incurs a large area overhead
due to the comparator incorporated in each pixel, and therefore a shared-comparator
structure (8) has been proposed for the DPS, as shown in Fig. 4(c). It mitigates the area issue by leaving only the input device of the comparator in
each pixel and sharing the rest of the comparator among the
pixels in the same column. In this structure, a row-select device is added to each
pixel for selective activation of the input device.
We adopt this shared-comparator DPS structure in our dual-mode pixel, as
shown in Fig. 4(d), for low-voltage operation. However, in our structure, the row-select device of Fig. 4(c) is taken out of the input stack and shunted with the diodes to reduce the number
of stacked devices, allowing the supply voltage to be lowered further to the near-threshold level.
This structure also reduces the mismatch across pixels because process variation
impacts only a single input device rather than two stacked devices.
Thanks to this proposed dual-mode DPS structure, we achieve 0.6 V near-threshold
operation of the dual-mode CIS.
3. Subthreshold SRAM with Disturb-free 10T Bitcells
The conventional 6T SRAM bitcell achieves high memory density with its simple structure,
but its low-voltage operation is limited to around 0.7 V owing to several disturbance
issues associated with the 6T structure, such as read disturbance, write disturbance,
half-selection, and hold stability (9). Recently, a 12T bitcell was proposed to eliminate all the disturbances arising in
a bitcell for robust subthreshold operation by adopting read-buffer, cross-point
write-wordline, and power-cutoff structures (10). However, the large transistor count of this structure impacts memory density, so
we propose a novel disturb-free 10T bitcell that addresses the density issue without
sacrificing robustness in the subthreshold region.
The proposed 10T SRAM bitcell is described in Fig. 5. It consists of a data storage (M1-M6) and a read/write port (M7-M10), as shown in Fig. 5(a). It adopts a single-bitline structure for area efficiency by unifying the
dual read/write ports used in the 12T bitcell. The cell layout in Fig. 5(b) shows the layers from the diffusion up to metal-3. We place the PMOS transistors
(M1-M4) on the left side of the cell and the NMOS transistors (M5-M10) on the right
side to secure space for preventing the well proximity effect (WPE). We route the two
write wordlines (i.e., WWLA and WWLB) between the PMOS and NMOS transistors to exploit
the area secured for WPE prevention. The layout measures 1.915 μm in width and 1.535
μm in height, resulting in 2.94 μm², which is 25.4% smaller than
the 12T bitcell (10).
In this 10T structure, PMOS device M3 or M4 is turned off during write operations
to eliminate the write disturbance arising from the contention between the pull-up device
(M1 or M2) and the write driver. The read buffer composed of M9 and M10 isolates the data
storage from the bitline to avoid any read disturbance caused by a low-impedance path
between the storage and the bitline. The half-select issue is also resolved because the storage node
can be connected to the bitline only in fully selected bitcells,
i.e., both the row-select wordline (RWL) and the column-select write wordline (WWLA or
WWLB) must be active together for a selection to be made. In addition, the source
voltage of the read buffer is held high during the hold state to minimize bitline
leakage, which increases the hold margin of the bitcell.
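As a minimal behavioral illustration of these gating rules (not the chip's actual control logic), the sketch below captures when a bitcell is accessed and why half-selected cells remain undisturbed.

```python
# Hypothetical behavioral sketch of the disturb-free access rules described above.

def cell_access(rwl: bool, wwl: bool, write: bool) -> str:
    """Describe what happens to one 10T bitcell given its selects.

    rwl   : row-select wordline for this row
    wwl   : column-select write wordline (WWLA or WWLB) for this column
    write : True for a write cycle, False for a read cycle
    """
    fully_selected = rwl and wwl
    if not fully_selected:
        return "hold"                      # half-selected cells keep their data
    if write:
        # The pull-up path (M1/M2) is cut off by M3/M4, so the write driver
        # sees no contention when overwriting the storage node.
        return "write, pull-up path disabled"
    # Reads go through the M9/M10 read buffer, which isolates the storage
    # node from the bitline.
    return "read via read buffer"

assert cell_access(rwl=True,  wwl=False, write=True) == "hold"   # row half-select
assert cell_access(rwl=False, wwl=True,  write=True) == "hold"   # column half-select
```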
Fig. 6 shows 50k Monte Carlo simulations of the 0.3 V operation of the conventional 12T
and the proposed 10T bitcells. Fig. 6(a)-(c) present the read noise margin, write noise margin, and hold margin
of the 12T and 10T bitcells, respectively, and show that the proposed 10T bitcell has
noise margins comparable to those of the 12T bitcell despite its smaller transistor count.
Fig. 5. Proposed 10T SRAM bitcell (a) schematic diagram, (b) cell layout, (c) timing
diagram.
4. Hand Detection Engine with Skin-color Invariant Haar-like Filters
Fig. 6. Monte Carlo simulation results (a) read noise margin, (b) write noise margin,
(c) hold noise margin.
Fig. 7. Modified Haar-like filter.
AdaBoost is a classification algorithm that detects target objects through the votes
of simple classifiers such as Haar-like filters (11). AdaBoost is widely used for its low complexity but is vulnerable to skin color variations
since the Haar-like filter basically works on pixel intensity. Therefore, we propose
a modified Haar-like filter that is robust to skin color variations by utilizing the
number of corners in the image rather than the pixel intensity. The modified Haar-like
filter compares the attributes of gray and white rectangular regions in a similar
way to the traditional Haar-like filter, as shown in Fig. 7, but it compares the number of corners rather than the pixel intensity of the gray
and white regions. During the filtering, the gray region is examined to see whether it
contains enough corners to be a salient part of the image by comparing its corner count
with that of the neighboring white region. The salient regions from this modified Haar-like
filtering give more robust hand detection results since the variation in the number
of corners remains small even when there is a large variation in skin color. Each
pixel in this modified Haar-like filter stores the integral number of corners over the
region from the origin to that pixel, as illustrated in Fig. 8. In this way, the total number of corners inside an arbitrary region can be simply calculated
from the values of the four pixels bounding the region, also illustrated in Fig. 8.
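To make the descriptor concrete, the sketch below shows one way the integral number of corners and a modified Haar-like feature can be evaluated, following the description above; the corner map, region coordinates, and comparison rule are illustrative placeholders rather than the on-chip implementation.

```python
import numpy as np

def integral_corners(corner_map: np.ndarray) -> np.ndarray:
    """corner_map[y, x] is 1 where a corner was detected, 0 otherwise.
    Returns the integral image of corner counts (origin-to-pixel sums)."""
    return corner_map.cumsum(axis=0).cumsum(axis=1)

def corners_in_rect(ii: np.ndarray, top: int, left: int, bottom: int, right: int) -> int:
    """Total corners inside rows [top, bottom) and columns [left, right),
    computed from the four integral values bounding the region."""
    a = ii[top - 1, left - 1] if top > 0 and left > 0 else 0
    b = ii[top - 1, right - 1] if top > 0 else 0
    c = ii[bottom - 1, left - 1] if left > 0 else 0
    d = ii[bottom - 1, right - 1]
    return int(d - b - c + a)

def modified_haar_feature(ii: np.ndarray, gray_rect, white_rect) -> bool:
    """Modified Haar-like filter: compare corner counts instead of intensities.
    The gray region is taken as 'salient' when it holds more corners than the
    neighboring white region (placeholder decision rule)."""
    return corners_in_rect(ii, *gray_rect) > corners_in_rect(ii, *white_rect)
```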
This algorithm is evaluated with the Cambridge hand dataset (12) shown in Fig. 9. The dataset consists of five sets of hand gestures captured under different lighting conditions,
and each set includes five hand postures. As shown in Fig. 10, skin color variations due to the lighting conditions lower the accuracy of
the conventional Haar-like filter but rarely affect the accuracy of the proposed
modified Haar-like filter, which maintains its accuracy above 90%.
Fig. 8. Calculation of integral number of corners for modified Haar-like filter.
Fig. 9. Cambridge hand dataset for five lighting conditions (12).
Fig. 10. Haar-like filters under different lighting conditions.
Fig. 11. Hand detection engine (a) overall architecture with buffer reuse scheme,
(b) timing diagram of the engine.
Fig. 11(a) shows the architecture of our hand detection engine, which accommodates a buffer-reuse capability
for storing the integral numbers. The engine consists of corner detection, integral
number generation, and AdaBoost classifier blocks. As described in Fig. 11(b), the wake-up detector alternates between harvesting and imaging phases, and
the CIS does not use the frame buffer during the harvesting phase. Thus, the hand detection
engine can reuse the idle frame buffer during the harvesting phase to store the computed
integral number of corners. This approach eliminates the extra storage otherwise required for the
integral numbers. In addition, the AdaBoost weights are stored in a small lookup table,
avoiding accesses to external memory.
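The final classification can be pictured as a weighted vote over the modified Haar-like features, with the weights read from the lookup table. The sketch below is a generic AdaBoost-style evaluation under that assumption; the classifier count, weights, and threshold are illustrative only, and `feature_fn` stands for a feature evaluator such as the modified Haar-like filter sketched earlier.

```python
def adaboost_hand_score(ii, weak_classifiers, weight_lut, feature_fn) -> float:
    """Weighted vote of weak classifiers over the integral-corner image `ii`.

    weak_classifiers : list of (gray_rect, white_rect) feature definitions
    weight_lut       : per-classifier weights, e.g. from a small lookup table
    feature_fn       : boolean feature evaluator (e.g. modified_haar_feature)
    """
    score = 0.0
    for (gray_rect, white_rect), alpha in zip(weak_classifiers, weight_lut):
        vote = 1.0 if feature_fn(ii, gray_rect, white_rect) else -1.0
        score += alpha * vote
    return score

def detect_hand(ii, weak_classifiers, weight_lut, feature_fn, threshold=0.0) -> bool:
    # Assert the wake-up decision when the weighted vote exceeds the threshold.
    return adaboost_hand_score(ii, weak_classifiers, weight_lut, feature_fn) > threshold
```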
5. Switched-Capacitor DC-DC Converter
The proposed CIS harvests a 0.3 V supply that needs load regulation to reduce
the ripple caused by the non-uniform load current over time. Moreover, this
0.3 V supply must be boosted to 0.6 V for proper operation of the analog domain. Therefore,
we exploit a switched-capacitor (SC) DC-DC converter for cost-effective on-die regulation
and conversion of the energy harvested from the CIS. It consists of a loop controller,
a maximum power point tracking (MPPT) circuit, and a switched-capacitor power stage, as
described in Fig. 12(a). The controller uses an inverting amplifier and adopts pulse frequency modulation (PFM)
for closed-loop control of the power stage. The MPPT circuit monitors the harvested voltage
across the photodiode and maintains the 0.3 V MPP level by controlling the
charge transfer to the storage capacitor (Coff) for maximum harvesting efficiency.
Fig. 12(b) illustrates how the 0.6 V supply is generated from the power stage. In phase Φ1,
the 0.3 V on Coff is transferred to the flying capacitor (Cfly) of the power stage
through charge sharing between Coff and Cfly with a 100:1 capacitance ratio,
which minimizes the voltage degradation from the charge sharing. The power stage then
boosts the 0.3 V up to 0.6 V in phase Φ2 through capacitive coupling of Cfly.
This conversion mechanism achieves 82% power efficiency.
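As a rough sanity check of this mechanism, using only the 0.3 V, 0.6 V, and 100:1 figures given above and assuming a fully discharged Cfly before Φ1 and a Cfly bottom plate switched from ground to the 0.3 V rail in Φ2, the ideal behavior is

\[
V_{\Phi 1} \approx \frac{C_{\mathrm{off}}}{C_{\mathrm{off}}+C_{\mathrm{fly}}}\times 0.3\,\mathrm{V}
= \frac{100}{101}\times 0.3\,\mathrm{V}\approx 0.297\,\mathrm{V},
\qquad
V_{\Phi 2} \approx V_{\Phi 1}+0.3\,\mathrm{V}\approx 0.6\,\mathrm{V},
\]

i.e., the 100:1 ratio limits the charge-sharing droop to about 1%, and the capacitive coupling in Φ2 ideally doubles the rail to 0.6 V. Parasitics and load current reduce these ideal values in practice, which is consistent with the reported 82% power efficiency.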
Fig. 12. The proposed SC DC-DC converter (a) overall structure, (b) power stage operations.
Fig. 13. Chip photograph and characteristics.
Fig. 14. Power reductions in analog and digital domains.
III. IMPLEMENTATION RESULTS
The proposed self-powered always-on vision wake-up detector is fabricated in a 65
nm CMOS process. Fig. 13 shows the chip micrograph and performance summary. The average power consumption is measured
to be 26 μW at 250 kHz, and the generated power is 32 μW at 60 kLux (sunny day), thereby
allowing self-powered operation of the vision wake-up detection.
Fig. 14 shows the power consumption of the vision wake-up detector according to the supply
voltage. The supply voltage for the analog domain stays at its minimum of 0.6 V,
while that for the digital blocks goes down to 0.3 V thanks to the subthreshold design
of the SRAM.
Table 1. Comparison with related works
| | Gesture Wake-up Detector (This Work) | Speech Wake-up Detector (2) | Human Detector (3) |
| Functionality | Hand Detection | Voice Detection | Feature Extraction Only |
| Technology (nm) | 65 | 90 | 180 |
| Supply Voltage (V) | 0.3 (digital) / 0.6 (analog) | N/A | 0.8 (digital) / 1.3 (analog) |
| Operating Frequency (MHz) | 0.25 | 0.01 | 25 |
| Power Consumption (μW) | 26 @ 15 fps | 6 | 51.06 @ 15 fps / 3.31 @ 1 fps |
| Generated Power (μW) | 32 @ 60 kLux | N/A | N/A |
Table 1 summarizes the chip performance and compares this work with other state-of-the-art
wake-up detectors such as the speech detector (2) and the human detector (3). The chip runs at 15 fps drawing 26 μW under 0.3 V and 0.6 V supplies for the digital and analog
blocks, respectively. This work shows power consumption comparable to the others
and demonstrates self-powered operation of wake-up detection for the first
time.
IV. CONCLUSION
A self-powered always-on vision wake-up detector is presented for the gesture UI system
on wearable devices. It incorporates a near-threshold imaging-harvesting dual-mode CIS
with 0.6 V 3T pixels for self-powered operation. A subthreshold SRAM with disturb-free
0.3 V 10T bitcells is also presented for deep low-voltage operation of the chip. A hand
detection engine using a skin-color-invariant Haar-like filter is devised to improve
the robustness of hand detection. Finally, a switched-capacitor DC-DC converter is integrated
for on-die regulation and conversion of the power harvested from the CIS.
Thanks to these features, the proposed vision wake-up detector chip demonstrates
self-powered wake-up detection for the first time.
ACKNOWLEDGMENTS
This work was supported by research fund of Chungnam National University. The authors
would like to thank IDEC for chip fabrication and CAD support.
REFERENCES
Choi S., et al. , Nov 2016, A Low-Power Real-Time Hidden Markov Model Accelerator
for Gesture User Interface on Wearable Devices, IEEE Asian Solid-State Circuits Conf.,
pp. 261-264
Badami K., et al. , Feb 2015, Context-Aware Hierarchical Information-Sensing in a
6μW 90nm CMOS Voice Activity Detector, IEEE Int. Solid-State Circuits Conf., pp. 430-431
Choi J., et al. , Jan 2014, A 3.4-μW Object-Adaptive CMOS Image Sensor With Embedded
Feature Extraction Algorithm for Motion-Triggered Object-of-Interest Imaging, IEEE
J. Solid-State Circuits, Vol. 49, No. 1, pp. 289-300
Cho S., et al. , Nov 2017, A Self-Powered Always-On Vision-based Wake-up Detector
for Wearable Gesture user Interfaces, IEEE Asian Solid-State Circuits Conf., pp. 245-248
Ay S. U., Dec 2011, A CMOS Energy Harvesting and Imaging (EHI) Active Pixel Sensor
(APS) Imager for Retinal Prosthesis, IEEE Trans. Biomed. Circuits Syst., Vol. 5, No.
6, pp. 535-545
Köklü G., et al. , May 2013, Characterization of standard CMOS compatible photodiodes
and pixels for Lab-on-Chip devices, IEEE Int. Symp. on Circuits and Systems, pp. 1075-1078
Couniot N., et al., Oct 2015, A 65 nm 0.5 V DPS CMOS Image Sensor With 17 pJ/Frame.Pixel
and 42 dB Dynamic Range for Ultra-Low-Power SoCs, IEEE J. Solid-State Circuits, Vol.
50, No. 10, pp. 2419-2430
Ho D., et al. , May 2012, CMOS 3-T Digital Pixel Sensor with In-Pixel Shared Comparator,
IEEE Int. Symp. on Circuits and Systems, pp. 930-933
Calhoun B. H., Chandrakasan A. P., Mar 2007, A 256-kb 65-nm Sub-threshold SRAM Design
for Ultra-Low-Voltage Operation, IEEE J. Solid-State Circuits, Vol. 42, No. 3, pp.
680-688
Chiu Y.-W., et al. , Sept 2014, 40 nm bit-interleaving 12T subthreshold SRAM with
data-aware write-assist, IEEE Trans. Circuits Syst. I, Reg. Papers, Vol. 61, No. 9,
pp. 2578-2585
Viola P., Jones M., Dec 2001, Rapid Object Detection using a Boosted Cascade of Simple
Features, IEEE Computer Vision and Pattern Recognition, Vol. 1, pp. 511-518
Kim T.-K., et al. , 2009, Canonical correlation analysis of video volume tensors for
action categorization and detection, IEEE Trans. Pattern Analysis and Machine Intelligence,
Vol. 31, No. 8, pp. 1415-1428
Author
received the B.S. degree in computer science and engineering from the Chungnam National
University (CNU), Daejeon, in 2017, where he is currently working toward the M.S.
degree.
His current research interests include machine learning processors and wearable SoC
design.
received the B.S. and M.S. degrees in computer science and engineering from the Chungnam
National University (CNU), Daejeon, in 2011 and 2013, respectively, where he is currently
working toward the Ph.D. degree.
His current research interests include object recognition processor and wearable SoC
design.
He is a co-recipient of the IEEE Asian Solid-State Circuits Conference (A-SSCC) Distinguished
Design Award in 2016.
received the B.S. degree in electronic engineering and the M.S. degree in computer science
and engineering from the Chungnam National University (CNU), Daejeon, in 2013 and
2016, respectively. He is currently with Silicon Works.
His research interests include wearable SoC and low-power SoC design.
received the B.S. degree in information and communication engineering from the Hanbat
National University in 2016 and the M.S. degree in computer science and engineering from
the Chungnam National University (CNU), Daejeon, in 2018. She is currently with Satreci.
Her research interests include wearable SoC and low-power SoC design.
received his B.S. degree (summa cum laude) in computer engineering from Kyungpook
National University, Daegu, Korea, in 1999, M.S. and Ph.D. degrees in electrical engineering
and computer science from Korea Advanced Institute of Science and Technology (KAIST),
Daejeon, Korea, in 2001 and 2007, respectively.
His Ph.D. work focused on low-power GPU design for wireless mobile devices.
In 2001, he joined the Electronics and Telecommunications Research Institute (ETRI), Daejeon,
Korea, where he was involved in a network processor design for the InfiniBand™ protocol.
From 2007 to 2010, he was with Samsung Electronics, Giheung, Korea, where he worked
on the world's first 1 GHz ARM Cortex™ microprocessor design.
Dr. Nam is currently with Chungnam National University, Daejeon, Korea, as an associate
professor.
His current interests include mobile GPU, machine learning processor, microprocessor,
low-power SoC and embedded software.
He co-authored the book Mobile 3D Graphics SoC: From Algorithm to Chip (Wiley, 2010)
and presented tutorials on mobile processor design at IEEE ISSCC 2012 and IEEE A-SSCC
2011.
He was a recipient of the CNU Recognition of Excellent Professors in 2013 and co-recipient
of the A-SSCC Distinguished Design Award in 2016.
Prof. Nam has served as the Chair of Digital Architectures and Systems (DAS) subcommittee
of ISSCC from 2017 to 2019.
He was a member of the Technical Program Committees for ISSCC (2011-2019), A-SSCC
(2011-2018), COOL Chips (2011-2018), VLSI-DAT (2011-2018), ASP-DAC (2015-2016), and
ISOCC (2015-2018) and the Steering Committee for the IC Design Education Center (IDEC)
from 2013 to 2018.
He was a Guest Editor for the IEEE Journal of Solid-State Circuits (JSSC) in 2013
and is an Associate Editor for the IEIE Journal of Semiconductor Technology and Science
(JSTS).