  1. (Department of Computer Science and Engineering, Chungnam National University, 99, Daehak-ro, Yuseong-gu, Daejeon, 305-764, Korea)

Wake-up detector, gesture recognition, energy-harvesting, dual-mode CMOS image sensor, subthreshold SRAM


Hand gesture recognition is gaining attention as a natural user interface (NUI) mechanism for wearable smart devices such as smart watches and head-mounted displays. Unlike speech recognition and other approaches, it offers a security benefit: the user's intention is not disclosed in public (1). However, the always-on sensing nature of the gesture user interface demands large energy dissipation, which the compute-intensive vision processing for gesture recognition makes worse. Recently, wake-up solutions have been proposed to extend the battery lifetime of the NUI system by turning off the main functional blocks in standby mode and keeping only the wake-up detectors always alive, as shown in Fig. 1 (2,3). There have been several studies on wake-up detectors for speech recognition, face recognition, and surveillance applications, but none of them targeted gesture recognition systems.

In this paper, we propose a self-powered always-on vision wake-up detector for the gesture user interface on wearable smart devices (4). The proposed wake-up detector has four key features that enable self-powered operation. First, we propose a near-threshold imaging-harvesting dual-mode CMOS image sensor (CIS) based on 0.6 V 3T pixels. In this dual-mode sensor, the number of stacked devices in the pixel comparators is reduced to allow near-threshold operation. Second, we present a subthreshold SRAM with disturb-free 0.3 V 10T bitcells. We eliminate all the contentions arising in a bitcell during subthreshold operation by applying proper gating schemes to the bitcell. Third, a hand detection engine is devised based on a proposed descriptor algorithm optimized for hand detection. It uses modified Haar-like filters that are robust to skin color variations to improve the detection accuracy. Finally, a switched-capacitor DC-DC converter is integrated for a lightweight system design. It converts the energy harvested from the CIS to the voltage levels used in the chip. As a result, our wake-up detector achieves self-powered operation of always-on vision-based wake-up detection.

Fig. 1. Conceptual diagram of wake-up detection.



1. System Architecture

The overall system architecture of the proposed self-powered wake-up detector is shown in Fig. 2. It consists of four major functional blocks: the energy-harvesting CIS, the frame buffer SRAM, the hand detection engine, and the switched-capacitor (SC) DC-DC converter. The energy-harvesting CIS works in two modes, imaging and harvesting. In the imaging mode, the CIS captures incoming images and stores them in the frame buffer. The hand detection engine then searches the images for hand shapes and, when a hand is detected, issues a wake-up signal to the gesture UI system. In the energy-harvesting mode, the pixels in the CIS are used as photovoltaic cells to harvest energy from incident light. From the harvested energy, the SC DC-DC converter then produces a 0.3 V supply for the digital blocks, such as the frame buffer and the hand detection engine, and a 0.6 V supply for the analog blocks, including the CIS and the SC DC-DC converter itself. The chip initially runs from the external battery, but it switches to the on-die supply for self-powered operation once the harvested energy reaches a sufficient voltage level of 0.6 V.

Fig. 2. Overall architecture of vision wake-up detector.


Fig. 3. Cross section diagrams of photodiodes (a) conventional photodiode, (b) dual-mode photodiode.


2. Near-threshold Dual-mode CIS with 3T Pixels

The dual-mode CIS with integrated imaging and harvesting operations has been studied for area- and energy-efficient system designs (5). The dual-mode CIS operates in two phases, i.e., a harvesting phase and an imaging phase, so that it serves as both the energy source and the image sensor of a system. The photodiode (PD) in conventional CIS pixels (6) cannot work as a photovoltaic cell because photovoltaic operation needs forward bias, whereas its anode is fixed in reverse bias as shown in Fig. 3(a). Therefore, another diode (PD2) is added to the dual-mode pixel (5); it can be switched to forward bias by connecting its cathode to ground in the energy harvesting mode (EHM), as described in Fig. 3(b). This additional diode does not incur any area overhead in the pixel because it is built upon the existing diode (PD1), and it also improves the fill factor in the imaging mode.

Fig. 4. Comparison of energy-harvesting pixel structures (a) dual-mode pixel, (b) DPS pixel, (c) DPS pixel with shared-comparator, (d) low-voltage dual-mode DPS pixel.


The dual-mode pixels incorporate a source follower (SF) to transfer the pixel value to the comparator in the analog-to-digital converter (ADC), as illustrated in Fig. 4(a), but this SF limits the dynamic range under low-voltage operation. Meanwhile, the digital pixel sensor (DPS) has been studied for low-voltage operation of the CIS (7). It moves the comparator out of the ADC and into the pixel, so that each pixel compares its own value and outputs the result directly to the following ADC, as described in Fig. 4(b). This approach eliminates the SF from the pixel and thus improves the dynamic range under low-voltage operation. However, the DPS structure incurs a large area overhead due to the comparator incorporated in each pixel. Therefore, a shared-comparator structure (8) has been proposed for the DPS, as shown in Fig. 4(c). It mitigates the area issue by leaving only the input device of the comparator in each pixel and sharing the rest of the comparator among the pixels in the same column. In this structure, a row-select device is added to each pixel for selective activation of the input device.

We adopt this shared-comparator DPS structure in our dual-mode pixel for low-voltage operation, as shown in Fig. 4(d). However, in our structure, the row-select device in Fig. 4(c) is taken out of the input stack and shunted with the diodes. This reduces the number of stacked devices and allows the supply voltage to be lowered further, down to the near-threshold level. The structure also reduces the mismatch across pixels because process variation affects only the single input device rather than two stacked devices. Thanks to this proposed dual-mode DPS structure, we achieve near-threshold operation of the dual-mode CIS at 0.6 V.

3. Subthreshold SRAM with Disturb-free 10T Bitcells

The conventional 6T SRAM bitcell achieves high memory density with its simple structure, but its low-voltage operation is limited to around 0.7 V owing to several disturbance issues associated with the 6T structure, such as read disturbance, write disturbance, half selection, and hold stability (9). Recently, a 12T bitcell was proposed that eliminates all the disturbances arising in a bitcell for robust subthreshold operation by adopting a read buffer, cross-point write wordlines, and power cutoff structures (10). However, the large transistor count of this structure reduces memory density, so we propose a novel disturbance-free 10T bitcell that addresses the density issue without sacrificing robustness in the subthreshold region.

The proposed 10T SRAM bitcell is described in Fig. 5. It consists of the data storage (M1-M6) and the read/write port (M7-M10), as shown in Fig. 5(a). For area efficiency, it uses a single-bitline structure that unifies the separate read and write ports of the 12T bitcell. The cell layout in Fig. 5(b) shows the layers from diffusion up to metal-3. We place the PMOS transistors (M1-M4) on the left side of the cell and the NMOS transistors (M5-M10) on the right side to secure the spacing needed to prevent the well proximity effect (WPE). We route the two write wordlines (i.e., WWLA and WWLB) between the PMOS and NMOS transistors to make use of the area reserved for WPE prevention. The layout measures 1.915 μm in width and 1.535 μm in height, resulting in an area of 2.94 μm², which is 25.4% smaller than that of the 12T bitcell (10).
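The quoted cell area can be checked directly from the layout dimensions. The short Python sketch below does that arithmetic; the 12T figure it prints is only the value implied by the stated 25.4% reduction under the assumption noted in the comment, not a number taken from (10).

```python
# Layout dimensions of the proposed 10T bitcell, as stated in the text.
WIDTH_UM = 1.915
HEIGHT_UM = 1.535
AREA_REDUCTION = 0.254  # stated saving relative to the 12T bitcell (10)

area_10t_um2 = WIDTH_UM * HEIGHT_UM

# Assumption: "25.4% less area" means area_10t = (1 - 0.254) * area_12t.
implied_area_12t_um2 = area_10t_um2 / (1.0 - AREA_REDUCTION)

print(f"10T bitcell area: {area_10t_um2:.2f} um^2")
print(f"Implied 12T bitcell area: {implied_area_12t_um2:.2f} um^2")
```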

In this 10T structure, the PMOS device M3 or M4 is turned off during write operations to eliminate the write disturbance arising from the contention between the pull-up device (M1 or M2) and the write driver. The read buffer composed of M9 and M10 isolates the data storage from the bitline to avoid any read disturbance caused by a low-impedance path between the storage and the bitline. The half-select problem is also resolved because the storage node can be connected to the bitline only in a fully selected bitcell, i.e., both the row-select wordline (RWL) and the column-select write wordline (WWLA or WWLB) must be active together for a selection to be made. In addition, the source voltage of the read buffer is held high during the hold state to minimize bitline leakage, which increases the hold margin of the bitcell.
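The half-select condition above can be illustrated with a purely behavioral Python sketch. It models only the selection logic (row wordline AND column write wordline), not the circuit; the function names and the `use_wwla` flag that picks which of the two write wordlines is asserted are hypothetical.

```python
def storage_connected(rwl: bool, wwla: bool, wwlb: bool) -> bool:
    """Storage node reaches the bitline only when the row wordline and one
    of the column write wordlines are asserted together."""
    return rwl and (wwla or wwlb)


def connected_cells(rows, cols, sel_row, sel_col, use_wwla=True):
    """Return the (row, col) cells whose storage is connected during a write
    aimed at (sel_row, sel_col) in a rows x cols array."""
    hits = []
    for r in range(rows):
        for c in range(cols):
            rwl = (r == sel_row)                    # shared along a row
            wwla = (c == sel_col) and use_wwla      # shared along a column
            wwlb = (c == sel_col) and not use_wwla  # shared along a column
            if storage_connected(rwl, wwla, wwlb):
                hits.append((r, c))
    return hits


# Cells on the selected row or column alone stay isolated.
print(connected_cells(4, 4, sel_row=1, sel_col=2))
```

In this model, cells that share only the row or only the column with the target never see both signals, which is the property the text relies on.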

Fig. 6 shows the results of 50k Monte Carlo simulations of 0.3 V operation for the conventional 12T and the proposed 10T bitcells. Fig. 6(a)-(c) plot the read noise margin, write noise margin, and hold margin, respectively, and show that the proposed 10T bitcell has noise margins comparable to those of the 12T bitcell despite its smaller transistor count.

Fig. 5. Proposed 10T SRAM bitcell (a) schematic diagram, (b) cell layout, (c) timing diagram.


4. Hand Detection Engine with Skin-color Invariant Haar-like Filters

Fig. 6. Monte Carlo simulation results (a) read noise margin, (b) write noise margin, (c) hold noise margin.


Fig. 7. Modified Haar-like filter.


AdaBoost is a classification algorithm that detects target objects through the votes of simple classifiers such as Haar-like filters (11). AdaBoost is widely used for its low complexity, but it is sensitive to skin color variations because the Haar-like filter works on pixel intensity. Therefore, we propose a modified Haar-like filter that is robust to skin color variations by using the number of corners in the image rather than the pixel intensity. The modified Haar-like filter compares the attributes of gray and white rectangular regions in a similar way to the traditional Haar-like filter, as shown in Fig. 7, but it compares the number of corners rather than the pixel intensity of the two regions. During the filtering, the gray region is examined to see whether it has enough corners to be a salient part of the image by comparing its corner count with that of the neighboring white region. The salient regions obtained from this modified Haar-like filtering give more robust hand detection results, since the variation in the number of corners remains small even when the skin color varies widely. Each pixel in this modified Haar-like filter stores the integral number of corners, i.e., the number of corners within the region extending from the origin to that pixel, as illustrated in Fig. 8. In this way, the total number of corners inside any rectangular region can be calculated simply from the values of the four pixels bounding the region.
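The integral number of corners works like a summed-area table built over a corner map instead of pixel intensities. The Python sketch below illustrates the four-value lookup under stated assumptions: the input is a binary corner mask from some corner detector (not specified here), regions are given as inclusive (top, left, bottom, right) indices, and the filter response is taken as a simple difference of corner counts. The function names and the difference rule are illustrative, not the chip's implementation.

```python
import numpy as np


def integral_corner_map(corner_mask):
    """Entry (y, x) holds the number of corners in rows 0..y and columns 0..x."""
    return corner_mask.astype(np.int64).cumsum(axis=0).cumsum(axis=1)


def corners_in_region(integral, top, left, bottom, right):
    """Corner count inside an inclusive rectangle, from four integral values."""
    total = integral[bottom, right]
    if top > 0:
        total -= integral[top - 1, right]
    if left > 0:
        total -= integral[bottom, left - 1]
    if top > 0 and left > 0:
        total += integral[top - 1, left - 1]
    return int(total)


def modified_haar_response(integral, gray, white):
    """Assumed response: corners in the gray region minus corners in the white one."""
    return corners_in_region(integral, *gray) - corners_in_region(integral, *white)


# Toy 4x4 corner mask: three corners in the top half, one in the bottom half.
mask = np.zeros((4, 4), dtype=bool)
for y, x in [(0, 0), (1, 1), (1, 2), (3, 3)]:
    mask[y, x] = True

integral = integral_corner_map(mask)
gray_region = (0, 0, 1, 3)   # top two rows
white_region = (2, 0, 3, 3)  # bottom two rows
print(modified_haar_response(integral, gray_region, white_region))
```

Because every region query costs at most four reads, the per-window cost does not depend on the region size, which is what makes the scheme attractive for a low-power engine.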

This algorithm is evaluated with the Cambridge hand dataset (12) shown in Fig. 9. The dataset consists of five sets of hand gestures captured under different lighting conditions, and each set includes five hand postures. As shown in Fig. 10, the skin color variations caused by the lighting conditions lower the accuracy of the conventional Haar-like filter but barely affect the proposed modified Haar-like filter, whose accuracy stays above 90%.

Fig. 8. Calculation of integral number of corners for modified Haar-like filter.


Fig. 9. Cambridge hand dataset for five lighting conditions (12).


Fig. 10. Haar-like filters under different lighting conditions.


Fig. 11. Hand detection engine (a) overall architecture with buffer reuse scheme, (b) timing diagram of the engine.


Fig. 11(a) shows the architecture of our hand detection engine, which reuses the frame buffer to store the integral numbers. The engine consists of corner detection, integral number generation, and AdaBoost classifier blocks. As described in Fig. 11(b), the wake-up detector alternates between harvesting and imaging phases, and the CIS does not use the frame buffer in the harvesting phase. Thus, the hand detection engine can reuse the idle frame buffer in the harvesting phase to store the computed integral numbers of corners. This approach saves the extra storage that the integral numbers would otherwise require. In addition, the AdaBoost weights are stored in a small lookup table, which avoids accesses to external memory.
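As a rough illustration of how a classifier block can combine filter responses with weights held in a lookup table, the Python sketch below applies the boosted decision rule of (11): each weak classifier casts a weighted vote, and the window is accepted when the votes reach half of the total weight. The table contents, thresholds, and function name are hypothetical; the actual weights on the chip are not given in the text.

```python
# Hypothetical lookup table: one (response threshold, vote weight) pair per
# weak classifier. Real values would come from offline AdaBoost training.
WEAK_CLASSIFIER_LUT = [(2, 0.9), (1, 0.6), (3, 0.4)]


def adaboost_vote(responses, lut=WEAK_CLASSIFIER_LUT):
    """Return True when the weighted votes reach half of the total weight."""
    score = sum(alpha for resp, (thr, alpha) in zip(responses, lut) if resp >= thr)
    total = sum(alpha for _, alpha in lut)
    return score >= 0.5 * total


# Each entry is the response of one modified Haar-like filter on a window.
print(adaboost_vote([2, 1, 0]))  # first two weak classifiers vote "hand"
print(adaboost_vote([2, 0, 0]))  # only the first one votes "hand"
```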

5. Switched-Capacitor DC-DC Converter

The proposed CIS harvests a 0.3 V supply, which needs load regulation to suppress the ripple caused by a load current that varies over time. Moreover, this 0.3 V supply must be boosted to 0.6 V for proper operation of the analog domain. Therefore, we use a switched-capacitor (SC) DC-DC converter for cost-effective on-die regulation and conversion of the energy harvested from the CIS. It consists of a loop controller, a maximum power point tracking (MPPT) circuit, and a switched-capacitor power stage, as described in Fig. 12(a). The controller uses an inverting amplifier and adopts pulse frequency modulation (PFM) for closed-loop control of the power stage. The MPPT circuit monitors the harvested voltage across the photodiode and holds it at the 0.3 V MPP level by controlling the charge transfer to the storage capacitor (Coff), which maximizes the harvesting efficiency. Fig. 12(b) illustrates how the power stage generates the 0.6 V supply. In phase Φ1, the 0.3 V on Coff is transferred to the flying capacitor (Cfly) of the power stage through charge sharing between Coff and Cfly; their 100:1 capacitance ratio minimizes the voltage drop caused by the charge sharing. The power stage then boosts the 0.3 V to 0.6 V in phase Φ2 through capacitive coupling of Cfly. This conversion mechanism achieves a power efficiency of 82%.
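The benefit of the 100:1 capacitance ratio follows from charge conservation. The Python sketch below estimates the level on Cfly after phase Φ1 and an idealized phase Φ2 output. It rests on assumptions that the text does not state: Cfly starts fully discharged (the worst case), the bottom plate of Cfly is lifted by 0.3 V in Φ2, and there are no load or parasitic losses.

```python
V_MPP = 0.3        # level held on Coff by the MPPT circuit (from the text)
CAP_RATIO = 100.0  # Coff : Cfly capacitance ratio (from the text)


def shared_voltage(v_off, ratio, v_fly_initial=0.0):
    """Common voltage after Coff and Cfly are connected (charge conserved).

    With Coff = ratio * Cfly:
    ratio * v_off + v_fly_initial = (ratio + 1) * v_shared
    """
    return (ratio * v_off + v_fly_initial) / (ratio + 1.0)


v_phi1 = shared_voltage(V_MPP, CAP_RATIO)
droop_mv = (V_MPP - v_phi1) * 1e3

# Assumption: ideal, unloaded boost that lifts Cfly's bottom plate by V_MPP.
v_phi2_ideal = v_phi1 + V_MPP

print(f"After phase 1: {v_phi1:.4f} V (droop {droop_mv:.2f} mV)")
print(f"Ideal phase 2 output: {v_phi2_ideal:.4f} V")
```

Under these assumptions the charge sharing costs only about 1% of the harvested level, whereas a 1:1 ratio would halve it.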

Fig. 12. The proposed SC DC-DC converter (a) overall structure, (b) power stage operations.


Fig. 13. Chip photograph and characteristics.


Fig. 14. Power reductions in analog and digital domains.



The proposed self-powered always-on vision wake-up detector is fabricated in a 65 nm CMOS process. Fig. 13 shows the chip micrograph and performance summary. The measured average power consumption is 26 μW at 250 kHz, and the generated power is 32 μW at 60 kLux (sunny day), which allows self-powered operation of the vision wake-up detection.
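The self-powered claim amounts to a simple power budget, which the Python sketch below works through using the measured figures. The 15 fps frame rate is the one reported in Table 1; the energy-per-frame value is derived here for illustration and is not a number quoted by the authors.

```python
CONSUMED_UW = 26.0   # measured average power at 250 kHz (from the text)
GENERATED_UW = 32.0  # harvested power at 60 kLux (from the text)
FRAME_RATE_FPS = 15  # frame rate reported in Table 1

margin_uw = GENERATED_UW - CONSUMED_UW
coverage = GENERATED_UW / CONSUMED_UW
energy_per_frame_uj = CONSUMED_UW / FRAME_RATE_FPS  # uW / (frames/s) = uJ/frame

print(f"Power margin: {margin_uw:.1f} uW")
print(f"Harvested / consumed: {coverage:.2f}")
print(f"Energy per frame: {energy_per_frame_uj:.2f} uJ")
```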

Fig. 14 shows the power consumption of the vision wake-up detector as a function of the supply voltage. The supply voltage for the analog domain stays at its minimum of 0.6 V, while that for the digital blocks goes down to 0.3 V thanks to the subthreshold design of the SRAM.

Table 1. Comparison with related works

| | Gesture Wake-up (This Work) | Speech Wake-up Detector (2) | Human Detector (3) |
| Function | Hand Detection | Voice Detection | Feature Extraction Only |
| Technology (nm) | | | |
| Supply Voltage (V) | 0.3 (digital), 0.6 (analog) | | 0.8 (digital), 1.3 (analog) |
| Frequency (MHz) | | | |
| Power Consumption (μW) | 26 @ 15 fps | | 51.06 @ 15 fps, 3.31 @ 1 fps |
| Generated Power (μW) | 32 @ 60 kLux | | |


Table 1 summarizes the chip performance and compares this work with other state-of-the-art wake-up detectors, namely the speech detector (2) and the human detector (3). This chip runs at 15 fps and draws 26 μW at 0.3 V and 0.6 V for the digital and analog blocks, respectively. Its power consumption is comparable to that of the other designs, and it demonstrates self-powered operation of wake-up detection for the first time.


A self-powered always-on vision wake-up detector is presented for the gesture UI system on wearable devices. It incorporates a near-threshold imaging-harvesting dual-mode CIS with 0.6 V 3T pixels for self-powered operation. A subthreshold SRAM with disturb-free 0.3 V 10T bitcells is also presented for deep low-voltage operation of the chip. A hand detection engine using a skin-color-invariant Haar-like filter is devised to improve the robustness of hand detection. Finally, a switched-capacitor DC-DC converter is integrated for on-die regulation and conversion of the power harvested from the CIS. Thanks to these features, the proposed vision wake-up detector chip demonstrates self-powered operation of wake-up detection for the first time.


This work was supported by the research fund of Chungnam National University. The authors would like to thank IDEC for chip fabrication and CAD support.


(1) Choi S., et al., Nov 2016, A Low-Power Real-Time Hidden Markov Model Accelerator for Gesture User Interface on Wearable Devices, IEEE Asian Solid-State Circuits Conf., pp. 261-264
(2) Badami K., et al., Feb 2015, Context-Aware Hierarchical Information-Sensing in a 6μW 90nm CMOS Voice Activity Detector, IEEE Int. Solid-State Circuits Conf., pp. 430-431
(3) Choi J., et al., Jan 2014, A 3.4-μW Object-Adaptive CMOS Image Sensor With Embedded Feature Extraction Algorithm for Motion-Triggered Object-of-Interest Imaging, IEEE J. Solid-State Circuits, Vol. 49, No. 1, pp. 289-300
(4) Cho S., et al., Nov 2017, A Self-Powered Always-On Vision-based Wake-up Detector for Wearable Gesture User Interfaces, IEEE Asian Solid-State Circuits Conf., pp. 245-248
(5) Ay S. U., Dec 2011, A CMOS Energy Harvesting and Imaging (EHI) Active Pixel Sensor (APS) Imager for Retinal Prosthesis, IEEE Trans. Biomed. Circuits Syst., Vol. 5, No. 6, pp. 535-545
(6) Köklü G., et al., May 2013, Characterization of standard CMOS compatible photodiodes and pixels for Lab-on-Chip devices, IEEE Int. Symp. on Circuits and Systems, pp. 1075-1078
(7) Couniot N., et al., Oct 2015, A 65 nm 0.5 V DPS CMOS Image Sensor With 17 pJ/Frame.Pixel and 42 dB Dynamic Range for Ultra-Low-Power SoCs, IEEE J. Solid-State Circuits, Vol. 50, No. 10, pp. 2419-2430
(8) Ho D., et al., May 2012, CMOS 3-T Digital Pixel Sensor with In-Pixel Shared Comparator, IEEE Int. Symp. on Circuits and Systems, pp. 930-933
(9) Calhoun B. H., Chandrakasan A. P., Feb 2007, A 256-kb 65-nm Sub-threshold SRAM Design for Ultra-Low-Voltage Operation, IEEE J. Solid-State Circuits, Vol. 42, No. 3, pp. 680-688
(10) Chiu Y.-W., et al., Sept 2014, 40 nm bit-interleaving 12T subthreshold SRAM with data-aware write-assist, IEEE Trans. Circuits Syst. I, Reg. Papers, Vol. 61, No. 9, pp. 2578-2585
(11) Viola P., Jones M., Dec 2001, Rapid Object Detection using a Boosted Cascade of Simple Features, IEEE Computer Vision and Pattern Recognition, Vol. 1, pp. 511-518
(12) Kim T.-K., et al., 2009, Canonical correlation analysis of video volume tensors for action categorization and detection, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 31, No. 8, pp. 1415-1428


Hyeon-Gu Do

received the B.S. degree in computer science and engineering from the Chungnam National University (CNU), Daejeon, in 2017, where he is currently working toward the M.S. degree.

His current research interests include machine learning processors and wearable SoC design.

Seongrim Choi

received the B.S. and M.S. degrees in computer science and engineering from the Chungnam National University (CNU), Daejeon, in 2011 and 2013, respectively, where he is currently working toward the Ph.D. degree.

His current research interests include object recognition processor and wearable SoC design.

He is a co-recipient of the IEEE Asian Solid-State Circuits Conference (A-SSCC) Distinguished Design Award in 2016.

Junsik Woo

received the B.S. degree in electronic engineering and the M.S. degree in computer science and engineering from Chungnam National University (CNU), Daejeon, in 2013 and 2016, respectively. He is currently with Silicon Works.

His research interests include wearable SoC and low-power SoC design.

Ara Kim

received the B.S. degree in information and communication engineering from Hanbat National University and the M.S. degree in computer science and engineering from Chungnam National University (CNU), Daejeon, in 2016 and 2018, respectively. She is currently with Satreci.

Her research interests include wearable SoC and low-power SoC design.

Byeong-Gyu Nam

received his B.S. degree (summa cum laude) in computer engineering from Kyungpook National University, Daegu, Korea, in 1999, and his M.S. and Ph.D. degrees in electrical engineering and computer science from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2001 and 2007, respectively.

His Ph.D. work focused on low-power GPU design for wireless mobile devices.

In 2001, he joined Electronics and Telecommunications Research Institute (ETRI), Daejeon, Korea, where he was involved in a network processor design for the InfiniBand™ protocol.

From 2007 to 2010, he was with Samsung Electronics, Giheung, Korea, where he worked on the world's first 1-GHz ARM Cortex™ microprocessor design.

Dr. Nam is currently with Chungnam National University, Daejeon, Korea, as an associate professor.

His current interests include mobile GPU, machine learning processor, microprocessor, low-power SoC and embedded software.

He co-authored the book Mobile 3D Graphics SoC: From Algorithm to Chip (Wiley, 2010) and presented tutorials on mobile processor design at IEEE ISSCC 2012 and IEEE A-SSCC 2011.

He was a recipient of the CNU Recognition of Excellent Professors in 2013 and co-recipient of the A-SSCC Distinguished Design Award in 2016.

Prof. Nam has served as the Chair of Digital Architectures and Systems (DAS) subcommittee of ISSCC from 2017 to 2019.

He was a member of the Technical Program Committees for ISSCC (2011-2019), A-SSCC (2011-2018), COOL Chips (2011-2018), VLSI-DAT (2011-2018), ASP-DAC (2015-2016), and ISOCC (2015-2018) and the Steering Committee for the IC Design Education Center (IDEC) from 2013 to 2018.

He was a Guest Editor for the IEEE Journal of Solid-State Circuits (JSSC) in 2013 and is an Associate Editor for the IEIE Journal of Semiconductor Technology and Science (JSTS).