Hyeon-Gu Do
                     Seongrim Choi
                     Junsik Woo
                     Ara Kim
                     Byeong-Gyu Nam
                        (Department of Computer Science and Engineering, Chungnam National University, 99,
                        Daehak-ro, Yuseong-gu, Daejeon, 305-764, Korea)
                        
 
            
            
            Copyright © The Institute of Electronics and Information Engineers(IEIE)
            
            
            
            
            
               
                  
Index Terms
               
               Wake-up detector, gesture recognition, energy-harvesting, dual-mode CMOS image sensor, subthreshold SRAM
             
            
          
         
            
                  I. INTRODUCTION
                
                  Hand gesture recognition is gaining attention as a natural user interface (NUI) mechanism
                  for wearable smart devices such as smart watches and head-mounted displays. Unlike speech
                  recognition and similar approaches, it offers a security benefit: the user's intention is
                  not disclosed in public (1). However, the always-on sensing required by a gesture user
                  interface dissipates considerable energy, and the compute-intensive vision processing needed
                  for gesture recognition makes this worse. Recently, wake-up solutions have been proposed to
                  extend the battery lifetime of NUI systems by turning off the main functional blocks in standby
                  mode and keeping only the wake-up detector alive, as shown in Fig. 1 (2,3). Wake-up detectors
                  have been studied for speech recognition, face recognition, and surveillance applications,
                  but not for gesture recognition systems.
                  
               
               
                  In this paper, we propose a self-powered always-on vision wake-up detector for the
                  gesture user interface on wearable smart devices (4). The proposed wake-up detector combines
                  four key features to enable self-powered operation. First, we propose a near-threshold
                  imaging-harvesting dual-mode CMOS image sensor (CIS) based on 0.6 V 3T pixels. In this
                  dual-mode sensor, the number of stacked devices in the pixel comparators is reduced to allow
                  near-threshold operation. Second, we present a subthreshold SRAM with disturb-free 0.3 V 10T
                  bitcells. All contentions arising in a bitcell under subthreshold operation are eliminated
                  by applying proper gating schemes to the bitcell. Third, a hand detection engine is devised
                  around a descriptor algorithm optimized for hand detection. It uses modified Haar-like
                  filters that are robust to skin color variations to improve the detection accuracy.
                  Finally, a switched-capacitor DC-DC converter is integrated on the same die for a lightweight
                  system design. It converts the energy harvested from the CIS to the voltage levels used in
                  the chip. As a result, the always-on vision-based wake-up detector operates self-powered.
                  
               
               
                  
                        
                        
Fig. 1. Conceptual diagram of wake-up detection.
                       
                  
               
             
            
                  II. SELF-POWERED VISION WAKE-UP DETECTOR
               
                     1. System Architecture
                  
                     The overall system architecture of the proposed self-powered wake-up detector is
                     shown in Fig. 2. It consists of four major functional blocks: the energy-harvesting CIS,
                     the frame buffer SRAM, the hand detection engine, and the switched-capacitor (SC) DC-DC
                     converter. The energy-harvesting CIS works in two modes, imaging and harvesting. In the
                     imaging mode, the CIS captures incoming images and stores them in the frame buffer. The
                     hand detection engine then searches the images for hand shapes and, when a hand is detected,
                     issues a wake-up signal to the gesture UI system. In the energy-harvesting mode, the pixels
                     in the CIS are used as photovoltaic cells to harvest energy from incident light. From the
                     harvested energy, the SC DC-DC converter produces a 0.3 V supply for the digital blocks,
                     such as the frame buffer and the hand detection engine, and a 0.6 V supply for the analog
                     blocks, including the CIS and the SC DC-DC converter itself. The chip initially runs from
                     the external battery, but it switches to the on-die supply for self-powered operation once
                     the harvested supply reaches a sufficient voltage level of 0.6 V.
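
                     The hand-over from the external battery to the on-die supply can be pictured with the
                     short Python sketch below. It is only an illustration under stated assumptions: the names
                     `select_supply` and `run` are hypothetical, and the one-way (latched) hand-over is assumed
                     here because the text does not describe a fallback to the battery. Only the 0.6 V threshold
                     comes from the text.

```python
BATTERY = "external battery"
ON_DIE = "on-die harvested supply"
SWITCH_THRESHOLD_V = 0.6  # switching level stated in the text


def select_supply(current_source, harvested_voltage):
    """Return the supply source after observing the harvested voltage.

    Assumption: once the chip has switched to the on-die supply it stays
    there; a fallback path to the battery is not modeled.
    """
    if current_source == ON_DIE:
        return ON_DIE
    if harvested_voltage >= SWITCH_THRESHOLD_V:
        return ON_DIE
    return BATTERY


def run(voltage_trace):
    """Apply select_supply to a sequence of harvested-voltage samples."""
    source = BATTERY  # the chip initially runs from the external battery
    history = []
    for voltage in voltage_trace:
        source = select_supply(source, voltage)
        history.append(source)
    return history
```

                     For a hypothetical voltage trace such as `run([0.1, 0.45, 0.6, 0.55])`, the first two
                     samples keep the chip on the battery and the remaining two run from the on-die supply.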
                     	  
                  
                  
                     
                           
                           
Fig. 2. Overall architecture of vision wake-up detector.
                          
                     
                  
                  
                     
                           
                           
Fig. 3. Cross section diagrams of photodiodes (a) conventional photodiode, (b) dual-mode
                              photodiode.
                           
                          
                     
                  
                
               
                     2. Near-threshold Dual-mode CIS with 3T Pixels
                  
                     The dual-mode CIS, which integrates imaging and harvesting operations, has been studied
                     for area- and energy-efficient system designs (5). It operates in two phases, a harvesting
                     phase and an imaging phase, so that it serves as both the energy source and the image sensor
                     of a system. The photodiode (PD) in conventional CIS pixels (6) cannot work as a photovoltaic
                     cell because photovoltaic operation requires forward bias, whereas its anode is fixed in
                     reverse bias, as shown in Fig. 3(a). Therefore, another diode (PD2) is added to the dual-mode
                     pixel (5); it can be switched to forward bias by connecting its cathode to ground in the
                     energy harvesting mode (EHM), as described in Fig. 3(b). This additional diode incurs no area
                     overhead in the pixel because it is built upon the existing diode (PD1), and it also improves
                     the fill factor in the imaging mode.
                     	  
                  
                  
                     
                           
                           
Fig. 4. Comparison of energy-harvesting pixel structures (a) dual-mode pixel, (b)
                              DPS pixel, (c) DPS pixel with shared-comparator, (d) low-voltage dual-mode DPS pixel.
                           
                          
                     
                  
                  
                     The dual-mode pixels incorporate a source follower (SF) to transfer the pixel value
                     to the comparator in the analog-to-digital converter (ADC), as illustrated in Fig. 4(a), but
                     this SF limits the dynamic range under low-voltage operation. Meanwhile, the digital pixel
                     sensor (DPS) has been studied for low-voltage operation of the CIS (7). It moves the comparator
                     out of the ADC and into the pixel, so that each pixel compares its own value locally and
                     outputs the result directly to the following ADC, as described in Fig. 4(b). This approach
                     eliminates the SF from the pixel and thus improves the dynamic range under low-voltage
                     operation. However, the DPS structure incurs a large area overhead because a comparator is
                     incorporated in each pixel. A shared-comparator structure (8) has therefore been proposed for
                     the DPS, as shown in Fig. 4(c). It mitigates the area issue by leaving only the input device
                     of the comparator in each pixel and sharing the rest of the comparator among the pixels in
                     the same column. In this structure, a row-select device is added to each pixel to activate
                     its input device selectively.
                     
                  
                  
                     We adopt this shared-comparator DPS structure for our dual-mode pixel, as shown in
                     Fig. 4(d), for low-voltage operation. In our structure, however, the row-select device in
                     Fig. 4(c) is taken out of the input stack and shunted with the diodes. This reduces the number
                     of stacked devices and allows the supply voltage to be lowered further, down to the
                     near-threshold level. The structure also reduces mismatch across pixels because process
                     variation affects only the single input device rather than both stacked devices. With the
                     proposed dual-mode DPS structure, the dual-mode CIS achieves near-threshold operation at 0.6 V.
                     
                  
                
               
                     3. Subthreshold SRAM with Disturb-free 10T Bitcells
                  
                     The conventional 6T SRAM bitcell achieves high memory density with its simple structure,
                     but its low-voltage operation is limited to around 0.7 V owing to several disturbance
                     issues associated with the 6T structure, such as read disturbance, write disturbance,
                     half selection, and hold stability (9). Recently, a 12T bitcell was proposed that eliminates
                     all the disturbances arising in a bitcell for robust subthreshold operation by adopting a
                     read buffer, cross-point write wordlines, and power cutoff structures (10). However, the large
                     transistor count of this structure hurts memory density, so we propose a novel
                     disturbance-free 10T bitcell that addresses the density issue without sacrificing robustness
                     in the subthreshold region.
                     	  
                  
                  
                     The proposed 10T SRAM bitcell is described in Fig. 5. It consists of data storage (M1-M6)
                     and a read/write port (M7-M10), as shown in Fig. 5(a). It uses a single-bitline structure
                     for area efficiency by unifying the dual read/write ports used in the 12T bitcell. The cell
                     layout in Fig. 5(b) shows the layers from diffusion up to metal-3. We place the PMOS
                     transistors (M1-M4) on the left side of the cell and the NMOS transistors (M5-M10) on the
                     right side to secure the spacing needed to prevent the well proximity effect (WPE). We route
                     the two write wordlines (WWLA and WWLB) between the PMOS and NMOS transistors to make use of
                     the area reserved for WPE prevention. The layout measures 1.915 μm in width and 1.535 μm in
                     height, giving an area of 2.94 μm², which is 25.4% smaller than that of the 12T bitcell (10).
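
                     As a quick check of the area figures quoted above, the arithmetic can be written out as
                     follows. The first line uses only the stated cell dimensions. The second line is an
                     inference rather than a reported result: it assumes the 25.4% saving is measured against
                     the 12T cell area directly, which would imply a 12T cell of roughly 3.94 μm².

```latex
\begin{align}
A_{10T} &= 1.915\,\mu\text{m} \times 1.535\,\mu\text{m} \approx 2.94\,\mu\text{m}^2 \\
A_{12T} &\approx \frac{A_{10T}}{1 - 0.254} = \frac{2.94\,\mu\text{m}^2}{0.746} \approx 3.94\,\mu\text{m}^2
\end{align}
```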
                     	  
                  
                  
                     In this 10T structure, PMOS device M3 or M4 is turned off during write operations
                     to eliminate the write disturbance arising from contention between the pull-up device
                     (M1 or M2) and the write driver. The read buffer composed of M9 and M10 isolates the data
                     storage from the bitline to avoid any read disturbance caused by a low-impedance path
                     between the storage node and the bitline. The half-select issue is also resolved because
                     the storage node can be connected to the bitline only in fully selected bitcells; that is,
                     both the row-select wordline (RWL) and the column-select write wordline (WWLA or WWLB)
                     must be active for a cell to be selected. In addition, the source voltage of the read
                     buffer is held high during the hold state to minimize bitline leakage, which increases
                     the hold margin of the bitcell.
                     	  
                  
                  
                     Fig. 6 shows 50k Monte Carlo simulation runs of the conventional 12T and the proposed
                     10T bitcells operating at 0.3 V. Fig. 6(a)-(c) present the read noise margin, write noise
                     margin, and hold margin of the two bitcells, respectively, and show that the proposed 10T
                     bitcell has noise margins comparable to those of the 12T bitcell despite its smaller
                     transistor count.
                     
                     	  
                  
                  
                     
                           
                           
Fig. 5. Proposed 10T SRAM bitcell (a) schematic diagram, (b) cell layout, (c) timing
                              diagram.
                           
                          
                     
                  
                
               
                     4. Hand Detection Engine with Skin-color Invariant Haar-like Filters
                  
                     
                           
                           
Fig. 6. Monte Carlo simulation results (a) read noise margin, (b) write noise margin,
                              (c) hold noise margin.
                           
                          
                     
                  
                  
                     
                           
                           
Fig. 7. Modified Haar-like filter.
                          
                     
                  
                  
                     AdaBoost is a classification algorithm that detects target objects through the votes
                     of simple classifiers such as Haar-like filters (11). It is widely used for its low complexity
                     but is sensitive to skin color variations, since the Haar-like filter works on pixel
                     intensity. Therefore, we propose a modified Haar-like filter that is robust to skin color
                     variations by using the number of corners in the image rather than the pixel intensity. The
                     modified filter compares the attributes of gray and white rectangular regions in the same
                     way as the traditional Haar-like filter, as shown in Fig. 7, but the attribute compared is
                     the number of corners rather than the pixel intensity. During filtering, the gray region is
                     examined to see whether it has enough corners to be a salient part of the image by comparing
                     its corner count with that of the neighboring white region. The salient regions found by
                     this modified Haar-like filtering give more robust hand detection results because the number
                     of corners varies little even when the skin color varies widely. Each pixel position in the
                     modified Haar-like filter stores the integral number of corners, i.e., the number of corners
                     within the region spanning from the origin to that pixel, as illustrated in Fig. 8. In this
                     way, the total number of corners inside any rectangular region can be calculated from the
                     values at just the four pixels bounding the region.
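
                     The integral corner count and the four-value region lookup described above can be sketched
                     in Python as follows. This is a minimal illustration, not the engine's implementation: the
                     function names, the zero-padded table layout, and the difference-based filter response are
                     assumptions made for the example, and corner detection itself is taken as a given 0/1 mask.

```python
def integral_corner_map(corner_mask):
    """Build a summed-area table from a 0/1 corner mask.

    The table carries one extra row and column of zeros, so table[y][x]
    is the number of corners in rows [0, y) and columns [0, x).
    """
    height = len(corner_mask)
    width = len(corner_mask[0]) if height else 0
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += corner_mask[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def corners_in_region(table, top, left, bottom, right):
    """Count corners in rows [top, bottom) and columns [left, right).

    Only the four table entries at the region's bounding positions are read.
    """
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def modified_haar_response(table, gray, white):
    """Corner count of the gray rectangle minus that of the white one.

    gray and white are (top, left, bottom, right) tuples. Using the plain
    difference as the response is an assumption for illustration.
    """
    return corners_in_region(table, *gray) - corners_in_region(table, *white)


# Hypothetical 4x4 corner mask (1 marks a detected corner).
mask = [
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
table = integral_corner_map(mask)
left_block = corners_in_region(table, 0, 0, 2, 2)    # 3 corners
right_block = corners_in_region(table, 0, 2, 2, 4)   # 0 corners
response = modified_haar_response(table, (0, 0, 2, 2), (0, 2, 2, 4))
```

                     Because every region query reads four table entries regardless of the region size, the cost
                     of evaluating a filter does not grow with the rectangle dimensions.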
                     	  
                  
                  
                     This algorithm is evaluated with the Cambridge hand dataset (12) shown in Fig. 9. The dataset
                     consists of five sets of hand gestures captured under different lighting conditions, and each
                     set includes five hand postures. As shown in Fig. 10, the skin color variations caused by the
                     lighting conditions lower the accuracy of the conventional Haar-like filter but hardly affect
                     the proposed modified Haar-like filter, whose accuracy stays above 90%.
                     	  
                  
                  
                     
                           
                           
Fig. 8. Calculation of integral number of corners for modified Haar-like filter.
                          
                     
                  
                  
                     
                           
                           
Fig. 9. Cambridge hand dataset for five lighting conditions (12).
                           
                          
                     
                  
                  
                     
                           
                           
Fig. 10. Haar-like filters under different lighting conditions.
                          
                     
                  
                  
                     
                           
                           
Fig. 11. Hand detection engine (a) overall architecture with buffer reuse scheme,
                              (b) timing diagram of the engine.
                           
                          
                     
                  
                  
                     Fig. 11(a) shows the architecture of our hand detection engine, which reuses the frame
                     buffer to store the integral numbers. The engine consists of corner detection, integral
                     number generation, and AdaBoost classifier blocks. As described in Fig. 11(b), the wake-up
                     detector alternates between harvesting and imaging phases, and the CIS does not use the
                     frame buffer in the harvesting phase. Thus, the hand detection engine can reuse the idle
                     frame buffer during the harvesting phase to store the computed integral numbers of corners.
                     This approach avoids the extra storage otherwise required for the integral numbers. In
                     addition, the AdaBoost weights are stored in a small lookup table, which avoids accesses
                     to external memory.
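
                     The classifier stage can be illustrated with a weighted vote in the style of the boosted
                     classifier of (11). The sketch below is not the engine's implementation: the lookup table
                     contents, the thresholds, and the names `WEAK_CLASSIFIER_LUT` and `adaboost_vote` are
                     hypothetical, and the half-of-total-weight decision rule is the common convention from
                     (11), assumed here for the example.

```python
# Hypothetical lookup table: one (threshold, polarity, weight) entry per weak
# classifier. All values are made up for illustration.
WEAK_CLASSIFIER_LUT = [
    (2, 1, 0.9),
    (1, 1, 0.4),
    (1, 1, 0.7),
]


def weak_vote(response, threshold, polarity):
    """Return 1 if the filter response passes the threshold, else 0.

    polarity is +1 or -1 and selects the direction of the comparison.
    """
    return 1 if polarity * response >= polarity * threshold else 0


def adaboost_vote(responses, lut=WEAK_CLASSIFIER_LUT):
    """Combine weak-classifier votes into a single detection decision.

    The window is accepted when the weighted sum of votes reaches half of
    the total weight.
    """
    score = 0.0
    total_weight = 0.0
    for response, (threshold, polarity, weight) in zip(responses, lut):
        score += weight * weak_vote(response, threshold, polarity)
        total_weight += weight
    return score >= 0.5 * total_weight


hand_like = adaboost_vote([3, 0, 2])    # votes 1, 0, 1
background = adaboost_vote([0, 5, 0])   # votes 0, 1, 0
```

                     Here the responses would be corner-count comparisons from the modified Haar-like filters;
                     keeping the per-classifier parameters in a small table mirrors the lookup-table storage
                     mentioned above.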
                     	  
                  
                
               
                     5. Switched-Capacitor DC-DC Converter
                  
                     The proposed CIS harvests a 0.3 V supply voltage, which needs load regulation to reduce
                     the ripple caused by load current that varies over time. Moreover, this 0.3 V supply must
                     be boosted to 0.6 V for proper operation of the analog domain. Therefore, we use a
                     switched-capacitor (SC) DC-DC converter for cost-effective on-die regulation and conversion
                     of the energy harvested from the CIS. It consists of a loop controller, a maximum power
                     point tracking (MPPT) circuit, and a switched-capacitor power stage, as described in
                     Fig. 12(a). The controller uses an inverting amplifier and adopts pulse frequency modulation
                     (PFM) for closed-loop control of the power stage. The MPPT circuit monitors the harvested
                     voltage across the photodiode and holds it at the 0.3 V maximum power point (MPP) by
                     controlling the charge transfer to the storage capacitor (Coff), which maximizes the
                     harvesting efficiency. Fig. 12(b) illustrates how the power stage generates the 0.6 V supply.
                     In phase Φ1, the 0.3 V on Coff is transferred to the flying capacitor (Cfly) of the power
                     stage through charge sharing between Coff and Cfly; their 100:1 capacitance ratio minimizes
                     the voltage degradation caused by the charge sharing. The power stage then boosts the 0.3 V
                     up to 0.6 V in phase Φ2 through capacitive coupling of Cfly. This conversion mechanism
                     achieves a power efficiency of 82%.
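
                     The benefit of the 100:1 capacitance ratio can be seen from a simple charge-conservation
                     estimate. The sketch below is an idealized model under stated assumptions, not a
                     description of the actual power stage: Cfly is assumed to start fully discharged (the worst
                     case), the switches are lossless, there is no load, and in phase Φ2 the bottom plate of
                     Cfly is assumed to be raised by the same shared voltage. The function names are
                     hypothetical, and the capacitances are expressed only as a ratio.

```python
def charge_share(v_a, c_a, v_b, c_b):
    """Common voltage after two capacitors are connected (charge conservation)."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


def ideal_boost(v_cap, v_step):
    """Top-plate voltage after the bottom plate is raised by v_step.

    Ideal capacitive coupling: no parasitic capacitance and no load current.
    """
    return v_cap + v_step


V_MPP = 0.3  # voltage held on Coff by the MPPT circuit, from the text

# Phase 1: Coff (100 units) shares charge with a discharged Cfly (1 unit).
v_phi1 = charge_share(V_MPP, 100.0, 0.0, 1.0)   # about 0.297 V

# Phase 2: the bottom plate of Cfly is lifted by the same voltage (assumption).
v_phi2 = ideal_boost(v_phi1, v_phi1)            # about 0.594 V

# For contrast: equal capacitors would halve the voltage in phase 1.
v_phi1_equal_caps = charge_share(V_MPP, 1.0, 0.0, 1.0)  # 0.15 V
```

                     Under these assumptions the charge sharing costs only about 1% of the 0.3 V level, whereas
                     equally sized capacitors would lose half of it before the boosting phase.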
                     	
                  
                  
                     
                           
                           
Fig. 12. The proposed SC DC-DC converter (a) overall structure, (b) power stage operations.
                          
                     
                  
                  
                     
                           
                           
Fig. 13. Chip photograph and characteristics.
                          
                     
                  
                  
                     
                           
                           
Fig. 14. Power reductions in analog and digital domains.
                          
                     
                  
                
             
            
                  III. IMPLEMENTATION RESULTS
               
                  The proposed self-powered always-on vision wake-up detector is fabricated in a 65 nm
                  CMOS process. Fig. 13 shows a chip micrograph and performance summary. The average power
                  consumption is measured to be 26 μW at 250 kHz, and the generated power is 32 μW at 60 kLux
                  (sunny day), which allows self-powered operation of the vision wake-up detection.
                  
               
               
                  Fig. 14 shows the power consumption of the vision wake-up detector as a function of the
                  supply voltage. The supply voltage of the analog domain stays at its minimum of 0.6 V,
                  while that of the digital blocks goes down to 0.3 V thanks to the subthreshold design
                  of the SRAM.
                  
               
               
                  
                  
                  
                        
                        
Table 1. Comparison with related works
                     
                     
                        
                        
                              
                                 
                                    |   | Gesture Wake-up Detector (This Work) | Speech Wake-up Detector (2) | Human Detector (3) |
                                    | Functionality | Hand Detection | Voice Detection | Feature Extraction Only |
                                    | Technology (nm) | 65 | 90 | 180 |
                                    | Supply Voltage (V) | 0.3 (digital), 0.6 (analog) | N/A | 0.8 (digital), 1.3 (analog) |
                                    | Operating Frequency (MHz) | 0.25 | 0.01 | 25 |
                                    | Power Consumption (μW) | 26 @ 15 fps | 6 | 51.06 @ 15 fps, 3.31 @ 1 fps |
                                    | Generated Power (μW) | 32 @ 60 kLux | N/A | N/A |
                           
                        
                     
                   
                  
                  
               
               
                  Table 1 summarizes the chip performance and compares this work with other state-of-the-art
                  wake-up detectors, namely a speech detector (2) and a human detector (3). This chip runs at
                  15 fps while drawing 26 μW with 0.3 V and 0.6 V supplies for the digital and analog blocks,
                  respectively. This work shows power consumption comparable to that of the others and
                  demonstrates self-powered wake-up detection for the first time.
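
                  The self-powered claim rests on a simple power budget, which the sketch below works
                  through using only the figures reported above (26 μW consumed, 32 μW generated at 60 kLux,
                  15 fps, 250 kHz). The derived quantities (margin, energy per frame, clock cycles per frame)
                  are back-of-the-envelope values computed for illustration, not measured results, and the
                  function names are hypothetical.

```python
CONSUMED_W = 26e-6      # average power consumption, from the text
GENERATED_W = 32e-6     # harvested power at 60 kLux, from the text
FRAME_RATE_HZ = 15.0    # frame rate, from Table 1
CLOCK_HZ = 250e3        # operating frequency, from the text


def power_margin(generated_w, consumed_w):
    """Surplus of harvested power over consumed power, in watts."""
    return generated_w - consumed_w


def energy_per_frame(consumed_w, frame_rate_hz):
    """Average energy spent per processed frame, in joules."""
    return consumed_w / frame_rate_hz


def cycles_per_frame(clock_hz, frame_rate_hz):
    """Clock cycles available per frame period."""
    return clock_hz / frame_rate_hz


margin_w = power_margin(GENERATED_W, CONSUMED_W)          # about 6 uW
frame_energy_j = energy_per_frame(CONSUMED_W, FRAME_RATE_HZ)  # about 1.73 uJ
frame_cycles = cycles_per_frame(CLOCK_HZ, FRAME_RATE_HZ)      # about 16,667
```

                  At 60 kLux the harvested power exceeds the consumption by about 6 μW, which is the headroom
                  behind the self-powered operation; under dimmer light the margin would shrink accordingly.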
                  
               
             
            
                  IV. CONCLUSION
               
                  A self-powered always-on vision wake-up detector is presented for the gesture UI system
                  on wearable devices. It incorporates a near-threshold imaging-harvesting dual-mode CIS
                  with 0.6 V 3T pixels for self-powered operation. A subthreshold SRAM with disturb-free
                  0.3 V 10T bitcells is also presented for deep low-voltage operation of the chip. A hand
                  detection engine using skin-color invariant Haar-like filters is devised to improve the
                  robustness of hand detection. Finally, a switched-capacitor DC-DC converter is integrated
                  for on-die regulation and conversion of the power harvested from the CIS. Thanks to these
                  features, the proposed vision wake-up detector chip demonstrates self-powered wake-up
                  detection for the first time.
                  
               
             
          
         
            
                  ACKNOWLEDGMENTS
               
                  This work was supported by the research fund of Chungnam National University. The authors
                  would like to thank IDEC for chip fabrication and CAD support.
                  
               
             
            
                  
                     REFERENCES
                  
                     
                        
                        (1) Choi S., et al., Nov 2016, A Low-Power Real-Time Hidden Markov Model Accelerator
                           for Gesture User Interface on Wearable Devices, IEEE Asian Solid-State Circuits Conf.,
                           pp. 261-264

                        (2) Badami K., et al., Feb 2015, Context-Aware Hierarchical Information-Sensing in a
                           6μW 90nm CMOS Voice Activity Detector, IEEE Int. Solid-State Circuits Conf., pp. 430-431

                        (3) Choi J., et al., Jan 2014, A 3.4-μW Object-Adaptive CMOS Image Sensor With Embedded
                           Feature Extraction Algorithm for Motion-Triggered Object-of-Interest Imaging, IEEE
                           J. Solid-State Circuits, Vol. 49, No. 1, pp. 289-300

                        (4) Cho S., et al., Nov 2017, A Self-Powered Always-On Vision-based Wake-up Detector
                           for Wearable Gesture User Interfaces, IEEE Asian Solid-State Circuits Conf., pp. 245-248

                        (5) Ay S. U., Dec 2011, A CMOS Energy Harvesting and Imaging (EHI) Active Pixel Sensor
                           (APS) Imager for Retinal Prosthesis, IEEE Trans. Biomed. Circuits Syst., Vol. 5, No.
                           6, pp. 535-545

                        (6) Köklü G., et al., May 2013, Characterization of standard CMOS compatible photodiodes
                           and pixels for Lab-on-Chip devices, IEEE Int. Symp. on Circuits and Systems, pp. 1075-1078

                        (7) Couniot N., et al., Oct 2015, A 65 nm 0.5 V DPS CMOS Image Sensor With 17 pJ/Frame.Pixel
                           and 42 dB Dynamic Range for Ultra-Low-Power SoCs, IEEE J. Solid-State Circuits, Vol.
                           50, No. 10, pp. 2419-2430

                        (8) Ho D., et al., May 2012, CMOS 3-T Digital Pixel Sensor with In-Pixel Shared Comparator,
                           IEEE Int. Symp. on Circuits and Systems, pp. 930-933

                        (9) Calhoun B. H., Chandrakasan A. P., Feb 2007, A 256-kb 65-nm Sub-threshold SRAM Design
                           for Ultra-Low-Voltage Operation, IEEE J. Solid-State Circuits, Vol. 42, No. 3, pp.
                           680-688

                        (10) Chiu Y.-W., et al., Sept 2014, 40 nm bit-interleaving 12T subthreshold SRAM with
                           data-aware write-assist, IEEE Trans. Circuits Syst. I, Reg. Papers, Vol. 61, No. 9,
                           pp. 2578-2585

                        (11) Viola P., Jones M., Dec 2001, Rapid Object Detection using a Boosted Cascade of Simple
                           Features, IEEE Computer Vision and Pattern Recognition, Vol. 1, pp. 511-518

                        (12) Kim T.-K., et al., 2009, Canonical correlation analysis of video volume tensors for
                           action categorization and detection, IEEE Trans. Pattern Analysis and Machine Intelligence,
                           Vol. 31, No. 8, pp. 1415-1428

 
                   
                
             
            Author
             
            
            
               Hyeon-Gu Do received the B.S. degree in computer science and engineering from Chungnam
               National University (CNU), Daejeon, in 2017, where he is currently working toward the M.S.
               degree. His current research interests include machine learning processors and wearable SoC
               design.
               
            
             
            
            
               Seongrim Choi received the B.S. and M.S. degrees in computer science and engineering from
               Chungnam National University (CNU), Daejeon, in 2011 and 2013, respectively, where he is
               currently working toward the Ph.D. degree. His current research interests include object
               recognition processors and wearable SoC design. He is a co-recipient of the IEEE Asian
               Solid-State Circuits Conference (A-SSCC) Distinguished Design Award in 2016.
               
            
             
            
            
               Junsik Woo received the B.S. degree in electronic engineering and the M.S. degree in computer
               science and engineering from Chungnam National University (CNU), Daejeon, in 2013 and
               2016, respectively. He is currently with Silicon Works. His research interests include
               wearable SoC and low-power SoC design.
               
            
             
            
            
               Ara Kim received the B.S. degree in information and communication engineering from Hanbat
               National University and the M.S. degree in computer science and engineering from Chungnam
               National University (CNU), Daejeon, in 2016 and 2018, respectively. She is currently
               with Satreci. Her research interests include wearable SoC and low-power SoC design.
               
            
             
            
            
               Byeong-Gyu Nam received his B.S. degree (summa cum laude) in computer engineering from
               Kyungpook National University, Daegu, Korea, in 1999, and the M.S. and Ph.D. degrees in
               electrical engineering and computer science from Korea Advanced Institute of Science and
               Technology (KAIST), Daejeon, Korea, in 2001 and 2007, respectively. His Ph.D. work focused
               on low-power GPU design for wireless mobile devices. In 2001, he joined Electronics and
               Telecommunications Research Institute (ETRI), Daejeon, Korea, where he was involved in a
               network processor design for the InfiniBand™ protocol. From 2007 to 2010, he was with Samsung
               Electronics, Giheung, Korea, where he worked on the world's first 1-GHz ARM Cortex™
               microprocessor design. Dr. Nam is currently with Chungnam National University, Daejeon,
               Korea, as an associate professor. His current interests include mobile GPUs, machine
               learning processors, microprocessors, low-power SoCs, and embedded software.

               He co-authored the book Mobile 3D Graphics SoC: From Algorithm to Chip (Wiley, 2010)
               and presented tutorials on mobile processor design at IEEE ISSCC 2012 and IEEE A-SSCC
               2011. He was a recipient of the CNU Recognition of Excellent Professors in 2013 and a
               co-recipient of the A-SSCC Distinguished Design Award in 2016. Prof. Nam served as the
               Chair of the Digital Architectures and Systems (DAS) subcommittee of ISSCC from 2017 to
               2019. He was a member of the Technical Program Committees for ISSCC (2011-2019), A-SSCC
               (2011-2018), COOL Chips (2011-2018), VLSI-DAT (2011-2018), ASP-DAC (2015-2016), and
               ISOCC (2015-2018) and of the Steering Committee for the IC Design Education Center (IDEC)
               from 2013 to 2018. He was a Guest Editor for the IEEE Journal of Solid-State Circuits (JSSC)
               in 2013 and is an Associate Editor for the IEIE Journal of Semiconductor Technology and
               Science (JSTS).