Metku Prashanthi (1), Kim Kyung Ki (2), Kim Yong-Bin (3), Choi Minsu (4)

- (1) GlobalFoundries, Essex Junction, VT, USA
- Department of Electronic Engineering, Daegu University, Gyeongsan, Korea
- Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
- Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, USA
 
            
            
Copyright © The Institute of Electronics and Information Engineers (IEIE)
            
            
            
            
            
               
                  
Index Terms
               
               Null-convention logic, gate diffusion input, clockless design, transistor count reduction, simulation
             
            
          
         
            
                  I. INTRODUCTION
Clocking has become a very complex task as technology scales. The increasing clock rate that accompanies decreasing transistor size makes clock skew a major problem. In fact, designing clock nets consumes a large portion of the design time (1). To achieve a tolerable skew, a large part of the chip area is allotted to clock drivers (2). This leads to high power dissipation, most prominently at clock edges, where switching occurs. As the trend toward higher clock frequencies and smaller feature sizes continues, the power dissipation and noise of synchronous circuits are increasing significantly (3). This growing power dissipation is a major concern for the emerging low-power industry and has encouraged renewed interest in asynchronous digital design.
               
Compared with synchronous circuits, delay-insensitive (DI) asynchronous paradigms offer lower power, noise and electromagnetic interference (4). Asynchronous circuits are classified into two types: bounded-delay and delay-insensitive models. Bounded-delay models assume that both gate and wire delays are bounded. One example of this type of model is micropipelines (5). Here, delays are added based on worst-case scenarios, and ensuring the correctness of the circuit requires extensive timing analysis of worst-case behavior. In contrast, delay-insensitive models assume that both gate and wire delays are unbounded. Here, wire forks are considered to be isochronic, that is, the component wire delays are much less than the logic element delays (6). This assumption remains valid even for future nanotechnologies. However, the wires connecting the components need not satisfy the isochronic assumption.
               
One of the most widely used techniques for delay-insensitive asynchronous logic design is Null Convention Logic (NCL) (7). NCL uses dual-rail or quad-rail encoding to represent logic 1, logic 0, null and invalid signals. For clock-free operation, NCL uses local handshaking performed by the completion detection register (8). NCL circuits are usually realized in CMOS technology, which has the potential for high speed but dissipates high power and occupies a large area. To reduce the area, semi-static implementations of NCL circuits have also been proposed (4). However, the semi-static implementation is limited by its weak feedback loop. To overcome these limitations, a novel approach leveraging the Gate Diffusion Input (GDI) method is proposed (9). GDI is a low-power design technique that was first introduced for synchronous circuits (10). A wide range of complex logic functions can be implemented with only two transistors using the GDI approach, which makes it suitable for designing low-power circuits with a reduced transistor count. In this work, the proposed approach is extensively verified by the design and simulation of multiple prototype arithmetic logic circuits.
               
This paper is organized as follows. Section II presents the preliminaries and a review of NCL and GDI. The proposed design is discussed in detail in Section III. Design and performance evaluation data, including area, power and latency, are presented in Section IV. Finally, the summary and concluding remarks are given in Section V.
               
             
            
                  II. PRELIMINARIES AND REVIEW
In current nanometer technologies, where ultra-low-power design is a goal, synchronous circuit designs are limited by their high power dissipation. Asynchronous circuits such as Null Convention Logic are a promising alternative. NCL gates, also known as threshold gates, are designed with a hysteresis loop to maintain delay insensitivity (11). Several CMOS implementations of NCL gates have been proposed, and each design has its own limitations. One of the most common limitations of a CMOS implementation is its area consumption. To overcome this and to reduce power dissipation, the low-power GDI design technique is applied to some NCL gates. This section gives a brief overview of NCL design and the GDI approach.
               
               
                     1. Null Convention Logic
NCL is a popular delay-insensitive methodology for designing asynchronous circuits. NCL circuits perform correctly regardless of when the inputs become available, which results in a clockless, DI circuit design (3). It is a self-timed logic paradigm in which data and control are integrated into a single signal. To achieve delay insensitivity, NCL circuits use dual-rail or quad-rail logic (12). Dual-rail logic consists of two wires, D0 and D1, whose values can be any one from the set {DATA0, DATA1, NULL}. The DATA0 state (D0 = 1, D1 = 0) represents Boolean logic 0, the DATA1 state (D0 = 0, D1 = 1) represents Boolean logic 1, and the NULL (empty) state (D0 = 0, D1 = 0) means that no DATA is available at the input. D0 = 1 and D1 = 1 corresponds to the invalid state (2). The two rails are mutually exclusive, so they can never be asserted simultaneously. Similarly, quad-rail logic has four wires, Q0, Q1, Q2 and Q3, whose states are taken from the set {DATA0, DATA1, DATA2, DATA3, NULL}. These rails are also mutually exclusive. To achieve delay-insensitive behavior, NCL circuits should possess two main characteristics: symbolic completeness and input completeness (2).
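
The dual-rail encoding described above can be sketched in a few lines of Python. This is only an illustrative model: the names `DualRail`, `encode` and `decode` are hypothetical and do not come from the paper.

```python
from enum import Enum


class DualRail(Enum):
    """Dual-rail states written as (D0, D1) rail values."""
    NULL = (0, 0)     # no data available
    DATA0 = (1, 0)    # Boolean logic 0
    DATA1 = (0, 1)    # Boolean logic 1
    INVALID = (1, 1)  # both rails asserted: not allowed


def encode(bit):
    """Map a Boolean value (or None for 'no data') to a dual-rail state."""
    if bit is None:
        return DualRail.NULL
    return DualRail.DATA1 if bit else DualRail.DATA0


def decode(d0, d1):
    """Map rail values back to 0, 1 or None (NULL); reject the invalid state."""
    state = DualRail((d0, d1))
    if state is DualRail.INVALID:
        raise ValueError("D0 and D1 must be mutually exclusive")
    if state is DualRail.NULL:
        return None
    return 1 if state is DualRail.DATA1 else 0
```

A quad-rail variant would follow the same pattern with four one-hot rails plus the all-zero NULL state.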
                  
NCL circuits are implemented using threshold gates. The basic NCL gate is the THmn gate, where 1 ≤ m ≤ n (13). Here, n is the total number of inputs and m is the number of inputs that must be asserted: at least m of the n inputs must be asserted before the output is asserted (12). For example, a TH23 gate asserts its output once any two of its three inputs are asserted.

The second type of NCL gate is the weighted threshold gate, denoted THmnWw1w2...wR, where w1, w2, ..., wR, each > 1, are the integer weights of input 1, input 2, ..., input R, respectively. Here, m ≥ wR > 1 and 1 ≤ R < n. There are 27 fundamental NCL gates, covering functions of two to four variables. To support DI circuit design, NCL gates have a built-in hysteresis (state-holding) capability: once the output is asserted, all the inputs must be de-asserted before the output is de-asserted. This hysteresis ensures that the gate is input complete, meaning that the output remains constant until all the inputs are de-asserted (2).
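
The threshold and hysteresis behavior described above can be captured in a small behavioral model. The sketch below is an assumption-level illustration in Python, not the authors' implementation; the class name `ThresholdGate` and its interface are hypothetical.

```python
class ThresholdGate:
    """Behavioral model of an NCL THmn gate, optionally weighted (THmnW...)."""

    def __init__(self, m, n, weights=None):
        self.m = m
        # Unweighted inputs have weight 1; e.g. TH34W2 uses weights [2, 1, 1, 1].
        self.weights = list(weights) if weights else [1] * n
        if len(self.weights) != n:
            raise ValueError("one weight per input is required")
        self.out = 0  # the gate starts de-asserted (NULL)

    def step(self, *inputs):
        """Apply one set of input values (0/1) and return the output."""
        if len(inputs) != len(self.weights):
            raise ValueError("wrong number of inputs")
        weighted_sum = sum(w * x for w, x in zip(self.weights, inputs))
        if weighted_sum >= self.m:
            self.out = 1   # threshold reached: assert the output
        elif not any(inputs):
            self.out = 0   # all inputs de-asserted: de-assert the output
        # Otherwise the gate holds its previous value (hysteresis).
        return self.out
```

For a TH22 gate, applying (1, 0), (1, 1), (0, 1) and (0, 0) in sequence yields 0, 1, 1 and 0: the output stays asserted while only one input remains high and drops only after both inputs return to 0.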
                  
                
               
                     2. Gate Diffusion Input (GDI) Approach
To allow simple implementation of all GDI gate functions in standard CMOS processes, a modified GDI model was introduced in (14). Fig. 1 illustrates the modified GDI basic cell (14). Table 1 shows the input configurations of the simple GDI cell that correspond to different Boolean functions. As in the conventional GDI cell, it has three inputs: G (the common gate input of the nMOS and the pMOS), P (the input to the source/drain of the pMOS) and N (the input to the source/drain of the nMOS). The bulks of the nMOS and pMOS transistors in the modified cell are permanently connected to GND and VDD, respectively. This adaptation is what enables simple implementation in standard CMOS processes (10). The influence of the bulk effect on circuit performance is very similar to that of the originally proposed GDI cell. With technology scaling, the impact of the source-to-bulk voltage on the transistor threshold voltage is greatly reduced, making this limitation less relevant in processes below 65 nm. The following equation shows the dependence of the transistor threshold voltage on the source-to-bulk voltage (14):
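
The expression is written out here in a commonly used textbook form that matches the symbols defined in the text. It should be read as an assumed form rather than a transcription of the equation in (14); $V_{DS}$ (the drain-source voltage) is a symbol introduced for this sketch.

```latex
V_{th} = V_{th0} + \gamma \left( \sqrt{\left| 2\phi_F \right| + V_{SB}} - \sqrt{\left| 2\phi_F \right|} \right) - \eta V_{DS}
```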
                  
                  
                     
                     
                     
                     
                     
                  
where $V_{SB}$ is the source-to-body voltage, $V_{th0}$ is the threshold voltage when $V_{SB}$ = 0, $\phi_F$ is the Fermi potential, γ is the linearized body coefficient, and η is the drain-induced barrier lowering (DIBL) coefficient.
                  
A variety of functions, as listed in Table 1, can be implemented using the modified GDI (MGDI) cell (15). GDI gates are more versatile and compact than static CMOS gates. For example, a multiplexer (MUX) designed with modified GDI requires only two transistors, whereas the CMOS design requires 12. The GDI approach is most effective for the AND, OR, F1 and F2 functions. F1 and F2 are the two basic GDI functions, and each of them forms a universal set (15). Therefore, in general, every digital circuit can be implemented using only F1 gates, only F2 gates, or a combination of both. Simple modifications of the input signals of the F1 and F2 gates yield different functions, allowing other functions to be synthesized more efficiently (14). Although MGDI reduces the transistor count, MGDI gates suffer a voltage drop at their outputs, which degrades performance.
                  
                  
                     
                           
                           
Fig. 1. Modified GDI basic cell (14).
                              
                           
                         
                     
                     
                  
                  
                     
                     
                     
                     
                           
                           
Table 1. Boolean function synthesis through input configuration of a simple GDI cell (15)
                              
                           
                        
                        
                           
                           
                           
                                 
                                    
| N | P | G | Out | Function |
| 0 | B | A | $\overline{A}B$ | F1 |
| B | 1 | A | $\overline{A} + B$ | F2 |
| 1 | B | A | $A + B$ | OR |
| B | 0 | A | $AB$ | AND |
| C | B | A | $\overline{A}B + AC$ | MUX |
| 0 | 1 | A | $\overline{A}$ | NOT |
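
The rows of Table 1 can be checked with an idealized logic-level model of the GDI cell, in which the pMOS passes P when G = 0 and the nMOS passes N when G = 1. The sketch below is an illustration under that assumption; it ignores voltage drops, and the names `gdi_cell`, `TABLE_1` and `check_table` are hypothetical.

```python
from itertools import product


def gdi_cell(g, p, n):
    """Ideal GDI cell: the pMOS passes P when G = 0, the nMOS passes N when G = 1."""
    return n if g else p


# Each entry pairs the (G, P, N) wiring of a Table 1 row with the expected function.
TABLE_1 = {
    "F1":  (lambda a, b, c: gdi_cell(a, b, 0), lambda a, b, c: (1 - a) & b),
    "F2":  (lambda a, b, c: gdi_cell(a, 1, b), lambda a, b, c: (1 - a) | b),
    "OR":  (lambda a, b, c: gdi_cell(a, b, 1), lambda a, b, c: a | b),
    "AND": (lambda a, b, c: gdi_cell(a, 0, b), lambda a, b, c: a & b),
    "MUX": (lambda a, b, c: gdi_cell(a, b, c), lambda a, b, c: ((1 - a) & b) | (a & c)),
    "NOT": (lambda a, b, c: gdi_cell(a, 1, 0), lambda a, b, c: 1 - a),
}


def check_table():
    """Return {function name: True/False} after comparing every input combination."""
    return {
        name: all(cell(a, b, c) == expected(a, b, c)
                  for a, b, c in product((0, 1), repeat=3))
        for name, (cell, expected) in TABLE_1.items()
    }
```

Calling `check_table()` compares each wiring against its Boolean expression over all eight input combinations.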
                              
                           
                        
                      
                     
                     
                     
                  
                
             
            
                  III. GATE DIFFUSION INPUT (GDI) BASED NCL CIRCUITS
With decreasing feature sizes, designs are required to reduce not only area but also power dissipation. Several CMOS implementation schemes have been introduced for NCL gates, including dynamic, static, semi-static (4), and differential. The static and semi-static implementations of C-elements have been extensively discussed in (6). The main drawback of the CMOS NCL design is that it occupies a large area and therefore dissipates considerable power. To address this limitation, modules of the NCL design are implemented using the GDI technique. GDI is a low-power design approach in which a wide range of complex circuits can be implemented using only two transistors. Hence, the GDI approach reduces not only the power dissipation but also the transistor count. The GDI implementation of NCL gates is proposed and discussed in detail in this section.
               
               
                     1. Static CMOS Implementation of NCL Gates
Generally, CMOS-based designs consist of one pull-up and one pull-down network that implement the set and reset functions, which are complements of each other (16). NCL threshold gates, however, are also designed with a hysteresis (state-holding) capability to ensure delay insensitivity (12). As a result, additional pull-up and pull-down networks, known as Hold0 and Hold1, are required to maintain this hysteresis so that the output does not change until all inputs are de-asserted (2). An NCL gate is therefore described by both a set and a hold equation: the set equation determines the gate functionality and when the output should be asserted, while the hold equation determines how long the output should remain asserted and is simply the OR of all the gate inputs (12).
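
The relationship between the four networks can be illustrated for a TH23 gate. The sketch below assumes the usual static NCL formulation, Z = set + (Z_prev · hold1) with complement Z' = reset + (Z_prev' · hold0), where reset is the complement of hold1 and hold0 is the complement of set. That formulation, the choice of TH23 and all function names are assumptions made for illustration; the text above states only the set and hold equations.

```python
from itertools import product


def set_fn(a, b, c):
    """TH23 set condition: at least two of the three inputs are asserted."""
    return (a & b) | (a & c) | (b & c)


def hold1(a, b, c):
    """Hold-1 condition: OR of all inputs (keeps an asserted output asserted)."""
    return a | b | c


def reset_fn(a, b, c):
    """Reset condition, assumed to be the complement of hold1 (all inputs at 0)."""
    return 1 - hold1(a, b, c)


def hold0(a, b, c):
    """Hold-0 condition, assumed to be the complement of set."""
    return 1 - set_fn(a, b, c)


def next_z(z, a, b, c):
    """Next output: Z = set + (Z_prev AND hold1)."""
    return set_fn(a, b, c) | (z & hold1(a, b, c))


def next_z_bar(z, a, b, c):
    """Next complemented output: Z' = reset + (Z_prev' AND hold0)."""
    return reset_fn(a, b, c) | ((1 - z) & hold0(a, b, c))


def networks_are_complementary():
    """Check that the two expressions never conflict for any state and input."""
    return all(next_z(z, a, b, c) == 1 - next_z_bar(z, a, b, c)
               for z, a, b, c in product((0, 1), repeat=4))
```

Under these assumptions the two expressions are exact complements, which is why the pull-up and pull-down paths of a static gate never conduct at the same time.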
                  
                
               
                     2. GDI Implementation of NCL Gates
To overcome the above limitations, a low-power design technique, modified Gate Diffusion Input (GDI), can be used to design NCL circuits. The basic GDI cell is shown in Fig. 4; inputs can also be applied to the sources of both the nMOS and the pMOS, which allows a wide range of circuits to be designed using only two transistors (15). However, a full output voltage swing cannot be obtained with GDI for all input combinations, which leads to a significant voltage drop at the final output, because a pMOS transistor is a strong pull-up but weak pull-down device and an nMOS transistor is a strong pull-down but weak pull-up device. This limitation can be addressed by using regenerative buffers (15). Thus, implementing NCL circuits with the GDI technique reduces not only the transistor count but also the power dissipation (15).
                  
                  
                     
                           
                           
Fig. 2. Basic GDI Implementation of TH22 gate.
                              
                           
                         
                     
                     
                  
1) Designing of the GDI NCL gates: Table 1 shows the input configurations corresponding to the respective Boolean functions (15). These configurations are used to design GDI-based NCL gates. An NCL gate is described by both a set and a hold equation: the set equation determines the gate functionality and when the output should be asserted, and the hold equation determines how long the output should remain asserted, which is simply the OR of all the gate inputs. The complete Boolean equation of a THmn gate is broken down into a series of AND, OR and MUX functions, and the GDI AND, OR and MUX configurations are then used to design the NCL TH gates. The basic GDI implementation of the NCL TH22 gate is depicted in Fig. 2. The Boolean expression of the TH22 gate is AB + Z(A + B), where Z is the previous output. Accordingly, the GDI AND and OR configurations are used to implement AB and (A + B), respectively. Finally, the GDI MUX configuration selects the set or hold state based on the previous result (i.e., based on Z). The GDI-based TH22 gate requires only 6 transistors, a 50% reduction in transistor count compared with the CMOS implementation. However, the voltage drop at the output affects the performance of the GDI NCL gates.
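
The decomposition of the TH22 gate into GDI AND, OR and MUX cells can be checked at the logic level. The sketch below reuses an idealized GDI cell (no voltage drop) and is an illustration only: the function names are hypothetical, and labeling the AND output as N1 and the OR output as N2 is an assumption.

```python
from itertools import product


def gdi_cell(g, p, n):
    """Ideal GDI cell: the pMOS passes P when G = 0, the nMOS passes N when G = 1."""
    return n if g else p


def gdi_and(a, b):
    return gdi_cell(a, 0, b)      # Table 1: P = 0, N = B, G = A


def gdi_or(a, b):
    return gdi_cell(a, b, 1)      # Table 1: P = B, N = 1, G = A


def gdi_mux(sel, in0, in1):
    return gdi_cell(sel, in0, in1)  # passes in0 when sel = 0, in1 when sel = 1


def th22_gdi(a, b, z_prev):
    """TH22 built from three two-transistor GDI cells (6 transistors in total)."""
    n1 = gdi_and(a, b)            # set term AB
    n2 = gdi_or(a, b)             # hold term A + B
    return gdi_mux(z_prev, n1, n2)


def th22_reference(a, b, z_prev):
    """Reference Boolean expression AB + Z(A + B)."""
    return (a & b) | (z_prev & (a | b))


def th22_matches_reference():
    return all(th22_gdi(a, b, z) == th22_reference(a, b, z)
               for a, b, z in product((0, 1), repeat=3))
```

The MUX form works because AB implies A + B: when Z = 0 only the set term can assert the output, and when Z = 1 the output stays asserted as long as any input is still high.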
                  
2) Analysis of voltage swing for the GDI NCL gates: The major drawback of the above method is that a full output voltage swing cannot be obtained for all input combinations, which leads to a significant voltage drop at the output. This limitation arises from the way the inputs are applied to the GDI cell. Because the pMOS transistor is a strong pull-up device and the nMOS transistor is a strong pull-down device, applying any voltage other than VDD to the pMOS source, or other than GND to the nMOS source, degrades the level at the output (drain): the output reaches only $V_{tp}$ when the pMOS passes a low level and only (VDD − $V_{tn}$) when the nMOS passes a high level. Here, $V_{tp}$ and $V_{tn}$ are the threshold voltages of the pMOS and nMOS transistors. This limitation can be illustrated for the GDI NCL TH22 gate above by examining the output voltages theoretically for all input combinations, assuming that all pMOS transistors have the same width and length and that all nMOS transistors likewise have the same width and length. The final output voltage for the different input combinations is as follows. When A = 0 and B = 0, the voltage at node N1 is $V_{tp}$ and the voltage at node N2 is $V_{tp}$. Assuming the previous state to be 0, the present output is greater than $V_{tp}$, which is a significant voltage drop, as shown in Fig. 3. When A = 0 and B = 1, the voltage at node N1 is $V_{tp}$ and the voltage at node N2 is VDD. Assuming the previous state was NULL, the present output is greater than $V_{tp}$, which again causes significant performance degradation. When A = 1 and B = 0, the voltage at node N1 is zero and the voltage at node N2 is (VDD − $V_{tn}$). Assuming the previous state to be 0, the present output is $V_{tp}$, so there is a voltage drop at the output. When A = 1 and B = 1, the voltage at node N1 is (VDD − $V_{tn}$) and the voltage at node N2 is (VDD − $V_{tn}$). Assuming the previous state to be 0, the present output is (VDD − $V_{tn}$), which is again a voltage drop at the output. Because NCL follows a hysteresis loop, in which the present output is fed back to compute the next result, these voltage drops also affect the following results and degrade performance, so it is essential to address this limitation. To overcome this performance degradation and obtain a full-swing output voltage, a regenerative buffer is used at the output of every GDI-based NCL THmn gate. Compared with the basic method above, adding the regenerative buffer increases the transistor count but solves the problem of performance degradation.
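
The internal node levels listed above can be reproduced with a first-order switch-level model. The sketch below is a simplified illustration, not a circuit simulation: the supply and threshold values are assumed example numbers, the gate input is treated as an ideal logic value, and N1 and N2 are taken to be the outputs of the GDI AND and OR cells, respectively.

```python
VDD = 1.0   # assumed supply voltage (V)
VTN = 0.3   # assumed nMOS threshold voltage (V)
VTP = 0.3   # assumed magnitude of the pMOS threshold voltage (V)


def gdi_level(g, p, n):
    """First-order output level of a GDI cell.

    g is the logic value (0 or 1) on the common gate; p and n are the
    voltages applied to the pMOS and nMOS diffusion inputs.
    """
    if g == 0:
        return max(p, VTP)        # pMOS conducts: it cannot pull below Vtp
    return min(n, VDD - VTN)      # nMOS conducts: it cannot pull above VDD - Vtn


def th22_internal_nodes(a, b):
    """Levels at N1 (GDI AND of A, B) and N2 (GDI OR of A, B) for ideal inputs."""
    vb = b * VDD
    n1 = gdi_level(a, p=0.0, n=vb)   # AND wiring: P = 0, N = B
    n2 = gdi_level(a, p=vb, n=VDD)   # OR wiring:  P = B, N = 1
    return n1, n2
```

With these assumed values, the four input combinations give (N1, N2) levels of (Vtp, Vtp), (Vtp, VDD), (0, VDD − Vtn) and (VDD − Vtn, VDD − Vtn), matching the cases discussed in the text.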
                  
                  
                     
                           
                           
Fig. 3. Approximate voltage drop across GDI TH22 gate for A=0, B=0 input combination.
                              
                           
                         
                     
                     
                  
3) Leakage current: The drain-to-source current of a transistor operating in the weak inversion region is called the subthreshold current. This subthreshold conduction is due to the diffusion current of the minority charge carriers and is given as (14):
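
A commonly used textbook form of the subthreshold current, written with the symbols defined in the text, is given here as an assumed form rather than a transcription of the equation in (14). The thermal voltage $V_T = kT/q$, which carries the temperature dependence, is a symbol introduced for this sketch.

```latex
I_{SUB} = K \, \frac{W}{L} \, e^{\frac{V_{GS} - V_{t}}{m V_T}} \left( 1 - e^{-\frac{V_{DS}}{V_T}} \right)
```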
                  
                  
                     
                     
                     
                     
                     
                  
where $I_{SUB}$ is a function of the transistor width (W), transistor length (L), temperature, drain-source voltage ($V_{DS}$), gate-source voltage ($V_{GS}$), threshold voltage ($V_{t}$) and process constants (K and m). Under weak inversion, the channel surface potential is almost constant across the channel and the current flow is determined by the diffusion of minority carriers due to a lateral concentration gradient (14).

Gate leakage: The gate leakage current is due to the flow of electrons through the oxide. Fowler-Nordheim tunneling and direct tunneling are the two tunneling mechanisms responsible for the gate leakage (14). The gate leakage increases exponentially as the oxide thickness is reduced.
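
For the Fowler-Nordheim case, the tunneling current density is commonly written in the form below. It is given as an assumed textbook form, not as the equation from (14), and the symbol $J_{FN}$ is introduced for this sketch.

```latex
J_{FN} = A \, E_{ox}^{2} \, \exp\!\left( -\frac{B}{E_{ox}} \right)
```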
                  
                  
                     
                     
                     
                     
                     
                  
where $V_{ox}$ is the oxide layer potential, $t_{ox}$ is the oxide layer thickness, A and B are constants, and $E_{ox}$ is the electric field over the oxide layer, which is given by $E_{ox} = V_{ox} / t_{ox}$.
                  
                  
                     
                     
                     
                     
                     
                  
The structure of the GDI cell provides a significant reduction in both the gate leakage current and the subthreshold leakage current compared with static CMOS gates (14). In static CMOS gates there is always a subthreshold leakage path for every possible input state, because the pull-up and pull-down networks are always connected to the supply voltage or ground. In GDI gates, by contrast, the connection of the pull-up and pull-down networks depends on the function being implemented.
                  
4) Multi-threshold techniques for reducing the threshold voltage drop: The voltage drop at the output of GDI gates degrades performance. Regenerative inverters can be used to avoid the voltage drop, but they increase the circuit area. Moreover, cascaded inverters increase static power dissipation because of the increased $V_{GS}$ of the off transistors. This issue limited the use of GDI in older technologies (14). Nanoscale processes, however, offer the option of fabricating transistors with different thresholds on the same die, which can solve the above problems. The best solution is to use low-threshold transistors in the paths where a voltage drop is expected, coupled with regenerative inverters designed using high-threshold transistors. In static CMOS, because of the increased subthreshold leakage, low-threshold transistors are usually not integrated in non-critical paths (14). In GDI, since the leakage currents are small, combining low- and high-threshold transistors does not produce leakage current as large as in static CMOS. Compared with static CMOS, the performance of individual GDI gates is still degraded by the use of these transistors. However, to achieve the same functionality, the total path length from input to output is shorter in GDI (for most functions), which compensates for the degradation of individual gates (14).
                  
5) Proposed generalized design approach for GDI NCL gates: By applying the GDI technique to asynchronous NCL designs in a nanoscale process, we can use multi-threshold techniques to reduce the threshold voltage drop. Alongside the multi-threshold techniques, introducing regenerative buffers/inverters eliminates the voltage drop by producing a full-swing voltage at the output. These two techniques can be used to design any NCL gate, and hence any NCL circuitry. Designing the GDI AND, OR and MUX cells with low-threshold transistors and the regenerative buffer with high-threshold transistors reduces not only the delay but also the power consumption, at the cost of an area overhead. As an example, the GDI-based NCL TH22 gate designed using the proposed method is illustrated in Fig. 4. It is designed as follows. First, the GDI AND and OR configurations are used to implement AB and A + B, and a GDI MUX then selects AB or A + B based on the value of Z: if Z = 0, AB is selected; otherwise, A + B is selected. These GDI AND, OR and MUX cells are implemented using low-threshold transistors for low-power design. The MUX result is then passed through regenerative buffers designed using high-threshold transistors to produce a full-swing output, which efficiently reduces the transistor count and power consumption. The other NCL THmn gates are designed similarly using the GDI technique. The numbers of transistors required to implement the 27 NCL gates using static CMOS (17) and GDI techniques have been compared, and the GDI NCL gates offer a 13.5% reduction in transistor count on average. Thus, the GDI implementation of NCL circuits reduces the transistor count, which leads to a decrease in power consumption.
                  
The proposed model is validated by realizing a variety of delay-insensitive NCL designs using GDI technology: a 4-bit ripple carry adder, an unpipelined 4×4 multiplier, a two-stage pipelined 4×4 NCL multiplier and an unpipelined NCL ALU. Each design is described in detail below.
                  
                  
                     
                           
                           
Fig. 4. Proposed GDI NCL TH22 gate.
                         
                     
                     
                  
                
               
                     3. Ripple Carry Adder
An NCL ripple carry adder (RCA) designed using GDI technology is presented here as the GDI RCA model. In this paper, a GDI model of a 4-bit RCA is proposed. The proposed model uses the low-power GDI technique to realize the NCL gates. The results show that the proposed model performs better in terms of transistor count and static and dynamic power dissipation. The 4-bit NCL ripple carry adder is built from input-complete, optimized NCL full adders, which are sandwiched between two DI registers. The optimized NCL full adder is designed using two TH23 and TH34W2 gates. Fig. 7 depicts the proposed optimized GDI NCL full adder, in which the TH23 and TH34W2 gates are implemented using GDI technology. Fig. 5 depicts the transistor-level implementation of the TH23 gate using the GDI technique, where a restoration buffer is added at the output to restore the signal and avoid any voltage drop. For low-power design, all of the circuit except the buffer is designed using low-threshold transistors. The buffer is realized using high-threshold transistors in order to restore the dropped voltage levels.
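
The behavior of such an adder can be sketched at the logic level. The model below assumes the widely used dual-rail arrangement in which each full adder has two TH23 gates producing the carry rails and two TH34W2 gates producing the sum rails, with the opposite carry rail as the weight-2 input. That gate assignment, the carry-in of DATA0 for the first stage, the omission of the DI registers and all names are assumptions made for illustration; the sketch says nothing about the transistor-level GDI design.

```python
class ThresholdGate:
    """Behavioral NCL threshold gate with hysteresis."""

    def __init__(self, m, weights):
        self.m, self.weights, self.out = m, weights, 0

    def step(self, *inputs):
        if sum(w * x for w, x in zip(self.weights, inputs)) >= self.m:
            self.out = 1              # threshold reached: assert
        elif not any(inputs):
            self.out = 0              # all inputs NULL: de-assert
        return self.out               # otherwise hold the previous value


class NCLFullAdder:
    """Dual-rail full adder; every signal is a (rail0, rail1) pair."""

    def __init__(self):
        self.c0 = ThresholdGate(2, (1, 1, 1))      # TH23: carry rail 0
        self.c1 = ThresholdGate(2, (1, 1, 1))      # TH23: carry rail 1
        self.s0 = ThresholdGate(3, (2, 1, 1, 1))   # TH34W2: sum rail 0
        self.s1 = ThresholdGate(3, (2, 1, 1, 1))   # TH34W2: sum rail 1

    def step(self, x, y, ci):
        co = (self.c0.step(x[0], y[0], ci[0]), self.c1.step(x[1], y[1], ci[1]))
        s = (self.s0.step(co[1], x[0], y[0], ci[0]),
             self.s1.step(co[0], x[1], y[1], ci[1]))
        return s, co


NULL = (0, 0)


def encode(bit):
    return (0, 1) if bit else (1, 0)   # DATA1 or DATA0


def decode(pair):
    return {(1, 0): 0, (0, 1): 1}[pair]


def ripple_add(adders, a, b):
    """Add two unsigned integers using one NULL and one DATA wavefront."""
    carry = NULL
    for fa in adders:                       # NULL wavefront resets every gate
        _, carry = fa.step(NULL, NULL, carry)
    carry, total = encode(0), 0
    for i, fa in enumerate(adders):         # DATA wavefront ripples the carry
        s, carry = fa.step(encode((a >> i) & 1), encode((b >> i) & 1), carry)
        total |= decode(s) << i
    return total | (decode(carry) << len(adders))
```

A 4-bit adder is obtained with `adders = [NCLFullAdder() for _ in range(4)]`; the same list can be reused for successive additions because each call begins with a NULL wavefront that returns every gate to the de-asserted state.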
                  
                  
                     
                           
                           
                              
                           
                         
                     
                     
                  
                  
                     
                           
                           
Fig. 5. GDI Implementation of TH23 gate.
                              
                           
                         
                     
                     
                  
                  
                     
                           
                           
Fig. 6. GDI Implementation of TH34W2 gate.
                              
                           
                         
                     
                     
                  
                  
                     
                           
                           
Fig. 7. GDI Model of Full Adder with DI Registers.
                              
                           
                         
                     
                     
                  
                
               
                     4. 4-Bit Multiplier
                  NCL multipliers are classified into non-pipelined and pipelined multipliers. In this
                     paper, GDI models of a 4-bit non-pipelined and a 4-bit pipelined NCL multiplier are
                     proposed. In the GDI models, all modules are implemented with the GDI technique to
                     overcome the limitations of the static CMOS design. The proposed models reduce the
                     transistor count (area) and the dynamic power relative to the CMOS designs.
                  
                  
                     
                           
                           
Fig. 8. GDI Model of Non-pipelined, 1-stage 4×4 multiplier.
                              
                           
                         
                     
                     
                  
                  Non-Pipelined Multiplier
                  Fig. 8 illustrates the proposed GDI model of the existing non-pipelined, 1-stage 4-bit multiplier (6), which uses the full-word completion version of the NCL multiplier
                     design. To reduce the transistor count and dynamic power dissipation, all modules
                     of the existing CMOS non-pipelined multiplier are replaced with GDI modules, resulting
                     in a model built from GDI-based gates. As depicted in Fig. 8, the GDI model consists
                     of 8-bit GDI registers, incomplete GDI AND gates, complete GDI AND gates, GDI half
                     adders (GDI HA) and GDI full adders (GDI FA). I and C denote the “incomplete
                     GDI AND” and “complete GDI AND” functions, respectively. The GDI multiplier also includes
                     GENS7 and the completion component, denoted as COMP. The 8-bit GDI registers at the
                     input and at the output control the flow of DATA and NULL wavefronts, as
                     shown in Fig. 8.
                  
                  
                     
                           
                           
Fig. 9. GDI Model of 2-stage 4×4 multiplier.
                              
                           
                         
                     
                     
                  
                  2-Stage Pipelined Multiplier
                  The proposed GDI model of the existing 2-stage 4-bit multiplier (6) using full-word completion is depicted in Fig. 9. It consists of an 8-bit GDI register, an 8-bit CMOS register, a 12-bit GDI register,
                     incomplete GDI AND (I), complete GDI AND (C), GDI half adders and the GDI full adder
                     (GDI FA). Here, a 12-bit GDI register is added between the HYBRID HA and GDI FA of
                     the proposed HYBRID non-pipelined, 1-stage 4-bit multiplier using full-word
                     completion, to obtain the 2-stage GDI 4-bit multiplier.
                  
                
               
                     5. Hybrid Non-pipelined ALU
                  The logic diagram of the proposed non-pipelined dual-rail GDI ALU is shown in Fig. 10. The existing non-pipelined dual-rail ALU (3) is modified to obtain the proposed model. The proposed model gives better performance
                     in terms of transistor count and power dissipation. It consists of dual-rail GDI registers,
                     completion components (COMP), a GDI Convert to MEAG function, a GDI Demultiplexer, GDI
                     NCL OR, GDI AND and GDI XOR functions, invert, shift right and shift left functions, a
                     GDI ripple carry subtractor and adder, two GDI Multiplexers and CMOS Carry Logic. The
                     Convert to MEAG function converts the three dual-rail signals into an 8-rail MEAG signal.
                     This conversion is carried out by the eight TH33 gates in the Convert to MEAG function.
                     The invert, shift right, and shift left operations are done by renaming the signals
                     and hence have no logic delay.
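The Convert to MEAG step can be illustrated with a short behavioural sketch. It assumes that each of the eight TH33 gates watches one rail from each of the three dual-rail select signals, so that exactly one output rail asserts for each select code. The mapping of select code to output index, the (rail0, rail1) ordering, and the names `th33`, `dual_rail` and `convert_to_meag` are illustrative assumptions rather than details given in the paper.

```python
def th33(a, b, c, prev=0):
    """TH33 with hysteresis: set when all inputs are 1, reset when all are 0."""
    if a and b and c:
        return 1
    if not (a or b or c):
        return 0
    return prev  # hold the previous output for partial input sets


def dual_rail(bit):
    """Dual-rail DATA encoding as (rail0, rail1) (assumed order)."""
    return (0, 1) if bit else (1, 0)


NULL = (0, 0)


def convert_to_meag(s2, s1, s0, prev=(0,) * 8):
    """Map three dual-rail signals to an 8-rail mutually exclusive group."""
    out = []
    for k in range(8):
        # Output k watches the rail that matches bit i of k on each select signal.
        rails = (s2[(k >> 2) & 1], s1[(k >> 1) & 1], s0[k & 1])
        out.append(th33(*rails, prev=prev[k]))
    return tuple(out)


# Example: select code 5 (binary 101) asserts only rail 5 of the MEAG signal.
meag = convert_to_meag(dual_rail(1), dual_rail(0), dual_rail(1))
```

Starting from the all-zero state, a NULL input keeps all eight rails at 0, and any DATA select code asserts exactly one rail, which is the property the downstream demultiplexer relies on.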
                  
                  
                     
                           
                           
Fig. 10. Non-pipelined Dual-Rail GDI ALU.
                              
                           
                         
                     
                     
                  
                  The GDI ripple-carry subtractor and adder consist of four GDI full adders. Based on
                     the select MEAG result, the GDI Demultiplexer selects the corresponding function.
                     The GDI Demultiplexer is realized using GDI TH22 gates, which pass the A, B,
                     and $C_{in}$/$B_{in}$ inputs, respectively. For the functions that do not require
                     the B input, the GDI Demultiplexer is designed using GDI TH34 gates, which also ensure
                     input-completeness with respect to B. The CMOS Carry Logic, in turn, generates $C_{out}$
                     and provides input-completeness with respect to the $C_{in}$/$B_{in}$ inputs. The CMOS
                     Multiplexers consist of TH14 and TH12 gates, which produce a single result by OR-ing each
                     rail of the demultiplexer signals.
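The routing idea in this paragraph can be sketched behaviourally: a TH22 gate pairs one MEAG select rail with one input rail, so an operand reaches only the selected function, and a TH1n gate (a plain OR) merges the per-function results rail by rail. The sketch below assumes eight functions, evaluates from the NULL state so that hysteresis can be ignored, and uses hypothetical helper names (`th22`, `th1n`, `demux`, `mux`). It does not model the TH34-based variant or the carry logic.

```python
def th22(a, b, prev=0):
    """TH22 with hysteresis: set when both inputs are 1, reset when both are 0."""
    if a and b:
        return 1
    if not (a or b):
        return 0
    return prev


def th1n(*inputs):
    """TH1n gate (TH12, TH14, ...): asserted when any input is asserted."""
    return int(any(inputs))


def demux(signal, select):
    """Route a dual-rail signal to the function chosen by a one-hot MEAG select.

    Returns one dual-rail signal per function; unselected ones stay NULL.
    """
    return [(th22(sel, signal[0]), th22(sel, signal[1])) for sel in select]


def mux(results):
    """Merge per-function dual-rail results by OR-ing each rail."""
    return (th1n(*(r[0] for r in results)), th1n(*(r[1] for r in results)))


# Example: a DATA1 operand, written (rail0, rail1) = (0, 1), routed to function 3.
select = (0, 0, 0, 1, 0, 0, 0, 0)
routed = demux((0, 1), select)
merged = mux(routed)
```

Because the select signal is mutually exclusive, at most one routed copy carries DATA, so OR-ing the rails in `mux` cannot assert both rails of the merged result.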
                  
                
             
            
                  IV. SIMULATION RESULTS
               This section presents the comparison results of different NCL circuits implemented
                  using CMOS and GDI technology. There are three different types of CMOS models. In the
                  High Threshold model (High $V_{th}$), the complete circuit is realized using only
                  high-threshold transistors. In the Low Threshold model (Low $V_{th}$), low-threshold
                  transistors are used to realize the design. Lastly, standard-threshold transistors
                  are used to design the Standard Threshold model (std $V_{th}$). The low-threshold
                  transistors offer high speed but high power consumption, high-threshold transistors
                  have low power and high latency, and standard-threshold transistors provide
                  medium delay and medium power dissipation. The GDI design performance is compared
                  individually with all three CMOS designs. The performance comparison is based on the
                  number of transistors and on static and dynamic power dissipation. The CMOS and GDI
                  designs are realized in 45 nm technology using the Cadence generic process design kit
                  (gpdk45). A process design kit contains the process technology and the information needed
                  to do device-level design in the Cadence environment. The schematics are implemented
                  in the Cadence Virtuoso tool with VDD = 1 V and temperature = 27 °C. The circuits are
                  simulated with the Spectre simulator in Cadence Virtuoso using gpdk45 high- and
                  low-threshold MOSFET transistors with a W/L ratio of 1. Note that all transistors for
                  all designs, both CMOS and GDI, are minimum sized.
               
               
                  
                  
                  
                  
                        
                        
Table 2. Simulated Results 4-Bit RCA using CMOS and GDI Technology
                           
                        
                     
                     
                        
                        
                        
                              
                                 
                                    | Design Technique | Static Power (nW) | Average Power (nW) | Dynamic Power (nW) | Transistor Count | 
                              
                                    | CMOS model 1 | 0.588 | 14.01 | 13.42 | 1128 | 
                              
                                    | CMOS model 2 | 9.77 | 32.21 | 22.44 | 1128 | 
                              
                                    | CMOS model 3 | 1.01 | 17.46 | 16.45 | 1128 | 
                              
                                    | GDI model | 1.63 | 13.79 | 12.16 | 960 | 
                           
                        
                     
                   
                  
                  
                  
               
               
                  
                  
                  
                  
                        
                        
Table 3. Performance Comparison of 4-Bit Unpipelined multiplier
                           
                        
                     
                     
                        
                        
                        
                              
                                 
                                    | Design Technique | Static Power (nW) | Average Power (nW) | Dynamic Power (nW) | Transistor Count | 
                              
                                    | CMOS model 1 | 1.58 | 21.06 | 19.48 | 2040 | 
                              
                                    | CMOS model 2 | 15.8 | 45.83 | 30.03 | 2040 | 
                              
                                    | CMOS model 3 | 1.66 | 25.93 | 24.27 | 2040 | 
                              
                                    | GDI model | 2.9 | 21.916 | 19.01 | 1760 | 
                           
                        
                     
                   
                  
                  
                  
               
               Simulations were carried out on all possible input patterns to calculate the static and
                  dynamic power dissipation. Dynamic power dissipation is the power dissipated during
                  the transient state (when the transistors of the circuit are switching
                  from one logic state to another). To compute the dynamic power, the average
                  power over all available input patterns is measured first. The static power is then
                  subtracted from the measured average power to obtain the dynamic power.
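The subtraction described above can be checked directly against the tabulated data. The sketch below applies it to the static and average power values of Table 3 (in nW); the function name `dynamic_power` and the dictionary layout are illustrative choices, not part of the original flow.

```python
def dynamic_power(average_nw, static_nw):
    """Dynamic power = measured average power minus static power (both in nW)."""
    return average_nw - static_nw


# (static power, average power) in nW, copied from Table 3.
table3 = {
    "CMOS model 1": (1.58, 21.06),
    "CMOS model 2": (15.8, 45.83),
    "CMOS model 3": (1.66, 25.93),
    "GDI model": (2.9, 21.916),
}

dynamic = {
    name: round(dynamic_power(average, static), 3)
    for name, (static, average) in table3.items()
}

for name, value in dynamic.items():
    print(f"{name}: {value} nW")
```

The recomputed values agree with the Dynamic Power column of Table 3 up to the rounding used in the table.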
               
               A. 4-bit Ripple Carry Adder - CMOS vs GDI
               The ripple carry adder presented in this paper is designed using four different models:
                  high-threshold, low-threshold and standard-threshold CMOS models and a GDI-based
                  RCA model. In CMOS model 1 the whole circuit is designed using high-threshold transistors;
                  similarly, CMOS model 2 is designed with low-threshold transistors, and standard-threshold
                  transistors are used in CMOS model 3. In the GDI RCA model, the complete circuit (full
                  adders, input and output registers) is designed using GDI technology. Table 2 shows the performance comparison of these designs in terms of power and transistor
                  count. Simulations are carried out using input test vectors that cover all possible
                  input combinations for a 4-bit RCA. The values tabulated in Table 2 correspond to the average over all possible input combinations.
               
               The GDI RCA model offers a 14% reduction in transistor count compared with each of the
                  CMOS models. Compared with the CMOS high-threshold, low-threshold and standard-threshold
                  models, the GDI model achieves 9.3%, 45.7% and 30.30% reductions in dynamic
                  power, respectively.
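Relative reductions of this kind follow from a one-line formula, (baseline - proposed) / baseline. The sketch below applies it to the dynamic power and transistor counts listed in Table 2; the helper name `percent_reduction` is illustrative, and figures recomputed from the rounded table entries need not coincide exactly with the percentages quoted in the text.

```python
def percent_reduction(baseline, proposed):
    """Reduction of `proposed` relative to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Dynamic power (nW) and transistor count, copied from Table 2.
cmos_dynamic_nw = {
    "CMOS model 1": 13.42,
    "CMOS model 2": 22.44,
    "CMOS model 3": 16.45,
}
gdi_dynamic_nw = 12.16
cmos_transistors, gdi_transistors = 1128, 960

dynamic_reduction = {
    name: percent_reduction(value, gdi_dynamic_nw)
    for name, value in cmos_dynamic_nw.items()
}
transistor_reduction = percent_reduction(cmos_transistors, gdi_transistors)

for name, value in dynamic_reduction.items():
    print(f"Dynamic power vs {name}: {value:.1f}% lower")
print(f"Transistor count: {transistor_reduction:.1f}% lower")
```

The same helper can be reused for Tables 3 to 5; a negative result indicates that the GDI value is higher than the CMOS baseline.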
               
               B. 4-bit NCL Multiplier CMOS vs GDI
               The CMOS and GDI design comparison can also be extended to multipliers. Two types
                  of 4-bit NCL multipliers, a 4-bit unpipelined multiplier and a 4-bit pipelined multiplier,
                  are designed, and their simulation results are discussed below.
               
               1) 4-bit Unpipelined Multiplier: The four models of the unpipelined NCL multiplier designed
                  in this paper comprise three different CMOS models and the GDI model. The GDI
                  model results are compared with those of the CMOS models. As seen from Table 3, the GDI design gives the best performance in terms of the number of transistors used and
                  dynamic power dissipation when compared to the CMOS models. The dynamic power is improved
                  by 2.4%, 36.6% and 21.6% when compared with CMOS model 1, CMOS model 2 and CMOS model
                  3, respectively. In comparison to the CMOS models, the GDI model offers a 13.7% reduction
                  in transistor count, thus reducing both dynamic power and area.
               
               
                  
                  
                  
                  
                        
                        
Table 4. 4-Bit Two Stage Pipelined Multiplier Simulation Results for CMOS and GDI
                           Technology
                           
                        
                     
                     
                        
                        
                        
                              
                                 
                                    | Design Technique | Static Power (nW) | Average Power (nW) | Dynamic Power (nW) | Transistor Count | 
                              
                                    | CMOS model 1 | 2.17 | 28.34 | 26.17 | 2574 | 
                              
                                    | CMOS model 2 | 20.9 | 68.40 | 47.5 | 2574 | 
                              
                                    | CMOS model 3 | 1.23 | 33.915 | 32.685 | 2574 | 
                              
                                    | GDI model | 3.72 | 29.56 | 25.84 | 2238 | 
                           
                        
                     
                   
                  
                  
                  
               
               
                  
                  
                  
                  
                        
                        
Table 5. Performance Comparison of Non-Pipelined Dual-Rail CMOS and GDI ALU
                           
                        
                     
                     
                        
                        
                        
                              
                                 
                                    | Design Technique | Static Power (nW) | Average Power (nW) | Dynamic Power (nW) | Transistor Count | 
                              
                                    | CMOS model 1 | 1.95 | 19.116 | 17.16 | 4084 | 
                              
                                    | CMOS model 2 | 23.9 | 54.25 | 30.53 | 4084 | 
                              
                                    | CMOS model 3 | 3.09 | 24.55 | 21.435 | 4084 | 
                              
                                    | GDI model | 5.54 | 23.96 | 18.42 | 3520 | 
                           
                        
                     
                   
                  
                  
                  
               
               2) Two Stage Pipelined Multiplier: The performance of the two-stage pipelined multiplier,
                  designed using three different CMOS approaches and GDI, is discussed below. To reduce
                  power dissipation and area, a GDI model employing the low-power GDI technique is proposed.
                  Table 4 presents the simulation results of the three CMOS models and the GDI model. The average
                  power values are averaged over all possible input transitions of the 4-bit
                  multiplier. As shown, the GDI pipelined multiplier yields 1.2%, 45.6% and
                  20.9% decreases in dynamic power relative to CMOS models 1, 2 and 3, respectively. In
                  addition, the transistor count is decreased by 13.4% when compared to all the CMOS
                  pipelined multiplier designs.
               
               C. Hybrid Non-pipelined ALU
               The performance of the non-pipelined ALU, designed using the CMOS and GDI approaches,
                  is discussed below. To prevent threshold voltage drops from propagating inside
                  the circuit and to exploit the advantages of the low-power GDI technique, a circuit built
                  from GDI NCL gates is proposed. Table 5 presents the simulation results of the CMOS and GDI models. As shown,
                  the GDI non-pipelined ALU design yields 39% and 14% decreases in dynamic
                  power dissipation when compared to CMOS model 2 and model 3, respectively. However, the
                  dynamic power of the GDI model is 6% higher than that of CMOS model 1. This difference
                  is due to the threshold type of the transistors used in these models. CMOS model 1 consists
                  only of high-threshold transistors, which dissipate less power, whereas the GDI model uses
                  both high- and low-threshold transistors, and the low-threshold transistors are the reason
                  for its increased power. The GDI model has a 13% lower transistor count than
                  all the CMOS models.
               
             
            
                  V. CONCLUSIONS
               In this paper, a novel GDI NCL model is proposed to address the limitations of the
                  existing CMOS NCL design. The main drawback of the CMOS NCL design is that it occupies
                  a large area. To address this limitation, the modules of the NCL design are implemented
                  using the GDI technique. GDI is a low-power design approach in which a wide range of
                  complex functions can be implemented using only two transistors. Hence, the GDI approach
                  reduces not only the power dissipation but also the transistor count.
               
               However, when the NCL gates are designed using the GDI technique, there is a considerable
                  voltage drop at their outputs. This problem is addressed by using low-threshold transistors
                  where a voltage drop is expected, and high-threshold transistors for the regenerative
                  inverters at the output. The proposed idea is implemented in various NCL circuits,
                  namely the RCA, the unpipelined and pipelined multipliers, and the unpipelined ALU.
                  Compared to the CMOS designs, the GDI models have a lower transistor count and, in most
                  cases, lower dynamic power dissipation.
               
             
          
         
            
            
                  
                     REFERENCES
                  
                     
                        
                        Mader R., Friedman E. G., Litman A., Kourtev I. S., May 2002, Large scale clock skew
                           scheduling techniques for improved reliability of digital synchronous VLSI circuits,
                           IEEE International Symposium on Circuits and Systems (ISCAS 2002), Vol. 1, pp. I-357

 
                     
                        
                        Smith S. C., DeMara R. F., Yuan J. S., Ferguson D., Lamb D., 2004, Optimization of
                           null convention self-timed circuits, INTEGRATION, the VLSI journal, Vol. 37, No. 3,
                           pp. 135-165

 
                     
                        
                        Bandapati S. K., Smith S. C., 2007, Design and characterization of null convention
                           arithmetic logic units, Microelectronic engineering, Vol. 84, No. 2, pp. 280-287

 
                     
                        
                        Parsan F. A., Smith S. C., Oct 2012, CMOS implementation of static threshold gates
                           with hysteresis: A new approach, in Proc. IEEE/IFIP 20th Int VLSI and System-on-Chip
                           (VLSI-SoC) Conf, pp. 41-45

 
                     
                        
                        Bonam R., Chaudhary S., Yellambalase Y., Choi M., Aug 2007, Clock-free nanowire crossbar
                           architecture based on null convention logic (ncl), in Proc. 7th IEEE Conf. Nanotechnology
                           (IEEE NANO), pp. 85-89

 
                     
                        
                        Smith S. C., 2001, Gate and throughput optimizations for null convention self-timed
                           digital circuits, Ph.D. dissertation, University of Central Florida, Orlando, Florida

 
                     
                        
                        Choi M., Kang B.-H., Kim Y.-B., Kim K. K., Nov 2014, Asynchronous circuit design using
                           new high speed ncl gates, in Proc. Int. SoC Design Conf. (ISOCC), pp. 13-14

 
                     
                        
                        Parsan F. A., Smith S. C., Aug 2012, CMOS implementation comparison of ncl gates,
                           in Proc. IEEE 55th Int. Midwest Symp. Circuits and Systems (MWSCAS), pp. 394-397

 
                     
                        
                        Metku P., Kim K. K., Kim Y.-B., Choi M., Oct 2018, Low-power null convention logic
                           multiplier design based on gate diffusion input technique, in 2018 International SoC
                           Design Conference (ISOCC), pp. 233-234

 
                     
                        
                        Morgenshtein A., Yuzhaninov V., Kovshilovsky A., Fish A., 2014, Full-swing gate diffusion
                           input logic: case-study of low-power CLA adder design, INTEGRATION, the VLSI journal,
                           Vol. 47, No. 1, pp. 62-70

 
                     
                        
                        Fant K. M., Brandt S. A., Oct. 27 199, Null convention logic system, US Patent 5,828,228

 
                     
                        
                        Smith S. C., DeMara R. F., Yuan J. S., Hagedorn M., Ferguson D., 2002, Null convention
                           multiply and accumulate unit with conditional rounding, scaling, and saturation,
                           Journal of Systems Architecture, Vol. 47, No. 12, pp. 977-998

 
                     
                        
                        Sobelman G. E., Fant K., May 1998, Cmos circuit design of threshold gates with hysteresis,
                           IEEE International Symposium on Circuits and Systems (ISCAS1998), Vol. 2, pp. 61-64

 
                     
                        
                        Morgenshtein A., Shwartz I., Fish A., Nov 2010, Gate diffusion input (gdi) logic in
                           standard CMOS nanoscale process, in Proc. IEEE 26-th Convention of Electrical and
                           Electronics Engineers in Israel, pp. 000 776-000 780

 
                     
                        
                        Morgenshtein A., Fish A., Wagner I.A., May 2002, Gate-diffusion input (gdi) - a technique
                           for low power design of digital circuits: analysis and characterization, IEEE International
                           Symposium on Circuits and Systems (ISCAS2002), Vol. 1, pp. I–477-I–480

 
                     
                        
                        Parsan F. A., Smith S. C., Aug 2012, Cmos implementation comparison of ncl gates,
                           in Circuits and Systems (MWSCAS), 2012 IEEE 55th International Midwest Symposium on

 
                     
                        
                        Smith S. C., Di J., 2009, Designing asynchronous circuits using null convention logic
                           (ncl), Synthesis Lectures on Digital Circuits and Systems, Vol. 4, No. 1, pp. 1-96

 
                   
                
             
            Author
             
             
             
            
            
               Prashanthi Metku is from Hyderabad, India. 
            
            She received her B.Tech degree in Electronic and Communication Engineering from Jawaharlal
               Nehru Technological University, Hyderabad, India, in 2011 and M.Tech degree in Electronic
               Engineering from Pondicherry University, India, in 2014. 
            
            She is currently pursuing her Ph.D. degree in Computer Engineering at Missouri
               University of Science and Technology, United States. 
            
            Her interests include CMOS circuit design and Error Correction Codes.
               
            
            
            
               Kyung Ki Kim received his B.S. and M.S. degrees in Electronic Engineering from Yeungnam
               University, South Korea, in 1995 and 1997, respectively. 
            
            He was a candidate for Ph.D. in Computer Science from Sogang University, South Korea
               from 1997 to 1999, and received his Ph.D. Degree in Computer Engineering from Northeastern
               University, Boston, USA in 2008. 
            
            He was a member of technical staff with Sun Microsystems, Santa Clara, CA in 2008
               and a senior researcher with Illinois Institute of Technology, Chicago, USA in 2009.
               
            
            Since March 2010, he has been with the school of Electronic and Electrical Engineering,
               Daegu University, Korea, where he is currently an Associate Professor. 
            
            His current research focuses on neuromorphic architecture, high speed low power VLSI
               design, asynchronous design, electronic CAD and nano-electronics.
               
            
            
            
               Yong-Bin Kim received the B.S. degree in electrical engineering from Sogang University,
               Seoul, Korea, the M.S. degree in electrical engineering from New Jersey Institute
               of Technology, Newark, NJ, USA, and the Ph.D. degree in electrical and computer engineering
               from Colorado State University, Fort Collins, CO, USA. 
            
            He was a member of the technical staff with Electronics and Telecommunications Research
               Institute(ETRI), Daejon, Korea from 1982 to 1987. 
            
            He was a Senior Design Engineer with Intel Corp., Hillsboro, OR, USA, from 1990 to
               1993, involved in Intel Pentium Pro CPU chip design. 
            
            He was a Member of Technical Staff with Hewlett Packard Co., Fort Collins, CO, USA
               from 1993 to 1996, involved in HP PA-8000 RISC microprocessor chip design. 
            
            He was a Staff Engineer with Sun Microsystems, Palo Alto, CA, USA from 1996 to
               1998, involved in 1.5 GHz Ultra Sparc5 CPU chip design. 
            
            He was an Assistant Professor with the Department of Electrical and Computer Engineering
               of the University of Utah, Salt Lake City, UT, USA from 1998 to 2000. 
            
            He is currently a Professor with the Department of Electrical and Computer Engineering
               at Northeastern University, Boston, MA, USA. 
            
            His research focuses on low-power analog and digital circuit design as well as high-speed
               low-power VLSI circuit design and methodology.
               
            
            
            
               Minsu Choi received his B.S., M.S. and Ph.D. degrees in Computer Science from Oklahoma
               State University in 1995, 1998 and 2002, respectively. 
            
            He is currently an associate professor of Electrical and Computer Engineering at Missouri
               University of Science & Technology (Missouri S&T). 
            
            His research mainly focuses on Computer Architecture & VLSI, Crypto-hardware design,
               Nanoelectronics, Embedded Systems, Fault Tolerance, Testing, Quality Assurance, Reliability
               Modeling and Analysis, Configurable Computing, Parallel & Distributed Systems and
               Dependable Instrumentation & Measurement. 
            
            He has won two outstanding teaching awards at MST in 2008 and 2009. 
            He is a senior member of IEEE and a member of Golden Key National Honor Society and
               Sigma Xi.