Null Convention Logic (NCL) is a robust clockless technique for designing asynchronous delay-insensitive circuits. NCL circuits are usually designed with the traditional complementary metal oxide semiconductor (CMOS) approach, which tends to occupy a large area. To address this issue, the low power Gate Diffusion Input (GDI) design technique is introduced for designing NCL circuits. GDI is a promising alternative to static CMOS design: it reduces area and power consumption while keeping the logic design simple. In this paper, novel GDI-based NCL designs are proposed and evaluated. However, the reduced voltage swing of GDI cells leads to a considerable voltage drop at the output. This limitation is addressed by using low threshold transistors where a voltage drop is expected and high threshold transistors for the regenerative inverters at the output. The proposed approach has been verified by designing an NCL ripple carry adder (RCA), an unpipelined multiplier, a pipelined multiplier, and an unpipelined ALU using the GDI technique. These models are designed and simulated in Cadence Virtuoso, and an average reduction of 13.5% in transistor count is observed for the GDI-based NCL models compared with the CMOS models.



- (GlobalFoundries, Essex Junction, VT, USA)
- (Department of Electronic Engineering, Daegu University, Gyeongsan, Korea)
- (Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA)
- (Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, USA)

## I. INTRODUCTION

Clocking has become a very complex task for circuits due to technology scaling. The
increasing clock rate, driven by decreasing transistor size, leads to a major
problem of clock skew. In fact, designing clock nets consumes a large portion of the
design time ^{(1)}. To achieve a tolerable skew, a large part of the chip area is allotted to
clock drivers ^{(2)}. This leads to high power dissipation, prominently at clock edges, where switching
occurs. As the trend toward higher clock frequencies and smaller feature sizes continues,
the power dissipation and noise of synchronous circuits are increasing significantly ^{(3)}. This increasing power dissipation is a major concern for the emerging low power
industry, and it has encouraged renewed interest in asynchronous digital design.

Compared with synchronous circuits, delay-insensitive (DI) asynchronous paradigms
offer lower power, noise, and electromagnetic interference ^{(4)}. Asynchronous circuits are classified into two types: bounded-delay and delay-insensitive
models. Bounded-delay models assume that both gate and wire delays are bounded.
One example of this type of model is micropipelines ^{(5)}. Here, delays are added based on worst-case scenarios, and extensive timing analysis of worst-case
behavior is required to ensure the correctness of the circuit.
In contrast, delay-insensitive models assume that both gate and wire delays are
unbounded. Here, wire forks are considered to be isochronic, that is, the wire delays
within a component are much less than the logic element delays ^{(6)}. This assumption remains valid even for future nanotechnologies. However, the wires connecting
components need not satisfy the isochronic assumption.

One of the most widely used techniques for delay-insensitive asynchronous logic design is
Null Convention Logic (NCL) ^{(7)}. NCL uses dual-rail or quad-rail encoding to represent logic 1, logic 0, NULL,
and invalid signals. For clock-free operation, NCL uses local handshaking between registers,
controlled by completion detection ^{(8)}. NCL circuits are usually realized in CMOS technology, which has the potential for
high speed but dissipates considerable power and occupies a large area. To reduce
the area, semi-static implementations of NCL circuits have also been proposed ^{(4)}. However, the semi-static implementation is limited by its weak feedback loop.
To overcome these limitations, a novel approach leveraging the Gate Diffusion Input
(GDI) method is proposed ^{(9)}. GDI is a low power design technique that was first introduced
to obtain low power synchronous designs ^{(10)}. Using the GDI approach, a wide range of complex logic functions can be implemented
with only two transistors, which makes it suitable for designing low power circuits
with reduced transistor count. In this work, the proposed approach is extensively verified by
designing and simulating multiple prototype arithmetic logic circuits.

This paper is organized as follows. Section II presents preliminaries and a review of NCL and GDI. Section III discusses the proposed design in detail. Section IV presents design and performance evaluation data, including area, power, and latency. Finally, Section V provides a summary and concluding remarks.

## II. PRELIMINARIES AND REVIEW

In current nanometer technologies, with ultra-low power design as a goal, synchronous
circuit designs are limited by their high power dissipation. Asynchronous
circuits such as Null Convention Logic are a promising alternative.
NCL gates, also known as threshold gates, are designed with a hysteresis loop to maintain
delay insensitivity ^{(11)}. Several CMOS implementations of NCL gates have been proposed, and each design has
its own limitations. One of the most common limitations of a CMOS implementation
is its area consumption. To overcome this and to reduce power dissipation, the low
power GDI design technique is applied to NCL gates. This section
gives a brief overview of NCL design and the GDI approach.

### 1. Null Convention Logic

NCL is a popular delay-insensitive methodology for designing asynchronous circuits.
NCL circuits perform correctly regardless of when their inputs become available,
which results in a clockless, DI circuit design ^{(3)}. NCL is a self-timed logic paradigm in which data and control are integrated into
a single signal. To achieve delay insensitivity, NCL circuits use dual-rail
or quad-rail logic ^{(12)}. Dual-rail logic consists of two wires, D0 and D1, whose combined value is one of
{DATA0, DATA1, NULL}. The DATA0 state (D0 = 1, D1 = 0) represents Boolean logic
0, the DATA1 state (D0 = 0, D1 = 1) represents Boolean logic 1, and the NULL (empty)
state (D0 = 0, D1 = 0) means that no DATA is available at the input. D0 = 1 together with
D1 = 1 is an invalid state ^{(2)}. The two rails are mutually exclusive, so they can never
be asserted simultaneously. Similarly, quad-rail logic uses four wires, Q0, Q1, Q2, and Q3,
whose state is drawn from {DATA0, DATA1, DATA2, DATA3, NULL}. These
rails are also mutually exclusive. To achieve delay-insensitive
behavior, NCL circuits must possess two main characteristics: symbolic completeness and input
completeness ^{(2)}.
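The dual-rail and quad-rail conventions above can be captured in a few lines. The sketch below is illustrative only: the tuple order (rail0, rail1) and the helper names `encode`, `decode`, and `decode_quad` are hypothetical choices, not part of the paper. It simply rejects the invalid state and any non-mutually-exclusive rail group.

```python
# Dual-rail and quad-rail NCL encodings (tuple order: rail0 first).
NULL = (0, 0)
DATA0 = (1, 0)  # Boolean 0
DATA1 = (0, 1)  # Boolean 1


def encode(bit):
    """Encode 0, 1, or None (no data yet) as a dual-rail (D0, D1) pair."""
    if bit is None:
        return NULL
    return DATA1 if bit else DATA0


def decode(d0, d1):
    """Decode a dual-rail pair; D0 = D1 = 1 is the illegal INVALID state."""
    if d0 and d1:
        raise ValueError("invalid state: both rails asserted")
    if d1:
        return 1
    if d0:
        return 0
    return None  # NULL: no DATA available


def decode_quad(rails):
    """Decode a quad-rail group (Q0, Q1, Q2, Q3) into 0..3, or None for NULL."""
    asserted = [i for i, r in enumerate(rails) if r]
    if len(asserted) > 1:
        raise ValueError("quad-rail wires must be mutually exclusive")
    return asserted[0] if asserted else None


for value in (0, 1, None):
    print(value, "->", encode(value), "->", decode(*encode(value)))
```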

NCL circuits are implemented using threshold gates. The basic NCL gate is THmn, where
1 ≤ m ≤ n ^{(13)}. Here, n and m represent the total number of inputs and the number of inputs that must be asserted,
respectively: at least m of the n inputs must be asserted before the output is asserted
^{(12)}. The second type of NCL gate is the weighted threshold gate.

These gates are denoted THmnWw1w2...wR, where w1, w2, ..., wR, each > 1, are the integer
weights of input1, input2, ..., inputR, respectively. Here, m ≥ wR > 1 for the weight applied to
inputR, and 1 ≤ R < n. There are 27 fundamental NCL gates, comprising functions of two to
four variables. To make circuits delay-insensitive, NCL gates have a built-in hysteresis
(state-holding) capability: once the output is asserted, all the inputs
must be de-asserted before the output is de-asserted. This hysteresis ensures that the
gate is input-complete, meaning that the output remains asserted until all the inputs
are de-asserted ^{(2)}.
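A behavioral model makes the threshold-plus-hysteresis rule concrete. This is a minimal sketch, assuming a discrete-step evaluation (one call to `step` per input change). The class name `ThresholdGate` is hypothetical, and the model ignores timing entirely. A plain THmn gate is the special case in which all n weights are 1.

```python
class ThresholdGate:
    """Behavioral model of an NCL THmnWw1..wR threshold gate with hysteresis.

    m is the threshold; weights[i] is the integer weight of input i
    (1 for unweighted inputs, so a plain THmn gate has n weights of 1).
    """

    def __init__(self, m, weights):
        self.m = m
        self.weights = list(weights)
        self.z = 0  # current output, fed back to implement hysteresis

    def step(self, inputs):
        if len(inputs) != len(self.weights):
            raise ValueError("wrong number of inputs")
        weighted = sum(w for w, x in zip(self.weights, inputs) if x)
        if weighted >= self.m:
            self.z = 1  # enough (weighted) inputs asserted: assert the output
        elif not any(inputs):
            self.z = 0  # every input de-asserted: de-assert the output
        # otherwise keep the previous output (hysteresis / input completeness)
        return self.z


th23 = ThresholdGate(2, [1, 1, 1])       # TH23: 2 of 3 inputs
th34w2 = ThresholdGate(3, [2, 1, 1, 1])  # TH34W2: first input has weight 2

print([th23.step(v) for v in ([1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0])])
# [0, 1, 1, 0]
print([th34w2.step(v) for v in ([1, 0, 0, 0], [1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0])])
# [0, 1, 1, 0]
```

In both traces, the third input vector leaves the output asserted even though the threshold is no longer met. This is the hysteresis that keeps the gate input-complete.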

### 2. Gate Diffusion Input (GDI) Approach

To allow simple implementation of GDI gates (all functions) in standard CMOS processes,
a modified GDI model was introduced in ^{(14)}. Fig. 1 illustrates the modified GDI basic cell ^{(14)}, and Table 1 shows the input configurations of a simple GDI cell corresponding to different Boolean
functions. Like the conventional GDI cell, it has three inputs: G (the common gate input
of the nMOS and pMOS), P (the input to the source/drain of the pMOS), and N (the input
to the source/drain of the nMOS). In the modified cell, the bulks of the nMOS and pMOS transistors
are permanently connected to GND and VDD, respectively. This adaptation is what enables
implementation in standard CMOS processes
^{(10)}. The influence of the bulk effect on circuit performance is very similar to that
of the originally proposed GDI cell. With technology scaling, the impact of the source-to-bulk
voltage on the transistor threshold voltage is greatly reduced, making this limitation
less relevant in processes below 65 nm. The following equation shows the
dependence of the transistor threshold voltage on the source-to-bulk voltage ^{(14)}:

##### (1)

$$V_{th}=V_{th0}+\gamma\left(\sqrt{\left|2\varphi_{F}+V_{SB}\right|}-\sqrt{\left|2\varphi_{F}\right|}\right)-\eta V_{DS}$$

where $V_{SB}$ is the source-to-body voltage, $V_{th0}$ is the threshold voltage when $V_{SB} = 0$, $\varphi_{F}$ is the Fermi potential, $\gamma$ is the linearized body coefficient, $\eta$ is the drain-induced barrier lowering (DIBL) coefficient, and $V_{DS}$ is the drain-source voltage.
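To see how Eq. (1) behaves, the sketch below evaluates it directly. All parameter defaults ($V_{th0}$ = 0.40 V, $\gamma$ = 0.20 V^0.5, $2\varphi_F$ = 0.80 V, $\eta$ = 0.05) are illustrative assumptions, not gpdk45 model values. The only point being illustrated is the direction of each effect.

```python
import math


def threshold_voltage(v_sb, v_ds, vth0=0.40, gamma=0.20, two_phi_f=0.80, eta=0.05):
    """Eq. (1): Vth = Vth0 + gamma*(sqrt|2*phiF + VSB| - sqrt|2*phiF|) - eta*VDS.

    All default parameter values are illustrative assumptions (volts, and
    volts^0.5 for gamma); they are not taken from the gpdk45 models.
    """
    body = gamma * (math.sqrt(abs(two_phi_f + v_sb)) - math.sqrt(abs(two_phi_f)))
    dibl = eta * v_ds
    return vth0 + body - dibl


# A non-zero source-to-body voltage raises Vth (body effect); a larger
# drain-source voltage lowers it (DIBL).
for v_sb in (0.0, 0.2, 0.4):
    print(f"VSB = {v_sb:.1f} V -> Vth = {threshold_voltage(v_sb, v_ds=1.0):.3f} V")
```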

A variety of functions, as listed in Table 1, can be implemented using the modified GDI cell ^{(15)}. GDI gates are more versatile and compact than static CMOS gates. For example, a
multiplexer (MUX) requires only two transistors in modified GDI, whereas a CMOS design
requires 12 transistors. The GDI approach is most effective for the AND, OR, F1, and F2
functions. F1 and F2 are the two basic GDI functions, and each
of them forms a universal set ^{(15)}. Therefore, in general, every digital circuit can be implemented using only F1 gates, only
F2 gates, or a combination of both. Simple modifications of the input signals of the F1
and F2 gates yield different functions, allowing other functions to be synthesized
more efficiently ^{(14)}. Although modified GDI (MGDI) cells reduce the transistor count, they suffer a voltage drop at their
outputs, which degrades performance.

Table 1. Boolean function synthesis through input configuration of a simple GDI cell ^{(15)}

| N | P | G | Out | Function |
|---|---|---|---|---|
| 0 | B | A | $\overline{A}B$ | F1 |
| B | 1 | A | $\overline{A} + B$ | F2 |
| 1 | B | A | A + B | OR |
| B | 0 | A | AB | AND |
| C | B | A | $\overline{A}B + AC$ | MUX |
| 0 | 1 | A | $\overline{A}$ | NOT |
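Every row of Table 1 follows from one switching rule. When G = 0, the pMOS conducts and the output follows P. When G = 1, the nMOS conducts and the output follows N. The sketch below is a logic-level model that ignores threshold drops. It checks each input configuration from the table against its listed Boolean function for all input values. The names `gdi_cell` and `TABLE_1` are hypothetical.

```python
from itertools import product


def gdi_cell(g, p, n):
    """Logic-level GDI cell: G = 0 -> pMOS on, Out = P; G = 1 -> nMOS on, Out = N.
    Threshold-voltage drops are ignored at this level of abstraction."""
    return n if g else p


# name: (wiring (A, B, C) -> (G, P, N), expected Boolean function of A, B, C)
TABLE_1 = {
    "F1":  (lambda a, b, c: (a, b, 0), lambda a, b, c: (1 - a) & b),
    "F2":  (lambda a, b, c: (a, 1, b), lambda a, b, c: (1 - a) | b),
    "OR":  (lambda a, b, c: (a, b, 1), lambda a, b, c: a | b),
    "AND": (lambda a, b, c: (a, 0, b), lambda a, b, c: a & b),
    "MUX": (lambda a, b, c: (a, b, c), lambda a, b, c: ((1 - a) & b) | (a & c)),
    "NOT": (lambda a, b, c: (a, 1, 0), lambda a, b, c: 1 - a),
}


def check_table():
    """Exhaustively compare every Table 1 configuration with its function."""
    for name, (wiring, expected) in TABLE_1.items():
        for a, b, c in product((0, 1), repeat=3):
            g, p, n = wiring(a, b, c)
            assert gdi_cell(g, p, n) == expected(a, b, c), name
    return True


print("Table 1 configurations verified:", check_table())
```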

## III. GATE DIFFUSION INPUT (GDI) BASED NCL CIRCUITS

With decreasing feature sizes, designs are required to have not only reduced area
but also reduced power dissipation. Several CMOS implementation schemes have been
introduced for NCL gates, including dynamic, static, semi-static ^{(4)}, and differential. The static and semi-static implementations of C-elements have
been extensively discussed in ^{(6)}. The main drawback of CMOS NCL design is that it occupies a large area and thus
dissipates considerable power. To address this limitation, modules of the NCL design are implemented
using the GDI technique, a low power design approach in which
a wide range of complex circuits can be implemented using only two transistors. Hence,
the GDI approach reduces not only the power dissipation but also the transistor
count. The GDI implementation of NCL gates is proposed and discussed in detail
here.

### 1. Static CMOS Implementation of NCL Gates

Generally, CMOS-based designs consist of a pull-up and a pull-down network that
implement the set and reset functions, which are complements of each other ^{(16)}. However, NCL threshold gates are also designed with a hysteresis (state-holding)
capability to ensure delay insensitivity ^{(12)}. As a result, additional pull-up and pull-down networks, known as Hold0 and Hold1,
are required to maintain this hysteresis so that the output does not change until
all inputs are de-asserted ^{(2)}. An NCL gate is therefore described by a set equation and a hold equation. The set equation
determines the gate functionality, i.e., when the output should be asserted, and the hold equation determines how long
the output stays asserted; the hold equation is simply the OR of all the gate inputs
^{(12)}.
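The set/hold description can be written directly as a Boolean next-state equation, $Z^{+} = \text{set} + Z \cdot \text{hold1}$. The sketch below uses TH23 as an example, with set = AB + AC + BC and hold1 = A + B + C. It checks this equation exhaustively against the threshold definition: assert at 2 or more inputs, de-assert when all inputs are 0, otherwise hold. The choice of TH23 and the function names are illustrative.

```python
from itertools import product


def th23_set(a, b, c):
    # Set equation: at least two of the three inputs are asserted.
    return (a & b) | (a & c) | (b & c)


def th23_hold1(a, b, c):
    # Hold1 equation: the OR of all inputs keeps Z = 1 until every input is 0.
    return a | b | c


def th23_next(z, a, b, c):
    """Next output Z+ = set + Z * hold1 (the reset and Hold0 networks realize its complement)."""
    return th23_set(a, b, c) | (z & th23_hold1(a, b, c))


def threshold_reference(z, inputs, m):
    """Threshold-gate definition with hysteresis, used as the golden model."""
    if sum(inputs) >= m:
        return 1
    if not any(inputs):
        return 0
    return z


def check_th23():
    for z, a, b, c in product((0, 1), repeat=4):
        assert th23_next(z, a, b, c) == threshold_reference(z, (a, b, c), 2)
    return True


print("TH23 set/hold form matches the threshold definition:", check_th23())
```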

### 2. GDI Implementation of NCL Gates

To overcome the above limitations, the low power Modified Gate Diffusion
Input (GDI) design technique can be used to design NCL circuits. The basic representation of the
GDI cell is shown in Fig. 4, where inputs can also be applied to the sources of both the nMOS and the pMOS, allowing
a wide range of circuits to be designed with only two transistors ^{(15)}. However, with GDI a full output voltage swing cannot be obtained for all input combinations,
which leads to a significant voltage drop at the final output (the pMOS transistor
is a strong pull-up device and the nMOS transistor is a strong pull-down device, so each passes the opposite logic level with a threshold drop). This limitation
can be addressed by using regenerative buffers ^{(15)}. Thus, implementing NCL circuits with GDI technology reduces not only the
transistor count but also the power dissipation ^{(15)}.

1) Designing the GDI NCL gates: Table 1 shows the input configurations corresponding to the respective Boolean functions
^{(15)}. These configurations are used to design the GDI-based NCL gates. As described above, each NCL gate
is defined by a set equation, which determines when the output is asserted, and a hold equation, which determines
how long it stays asserted and is simply the OR of all the gate inputs. The complete Boolean equation
for a THmn gate is broken down into a series of AND, OR, and MUX functions, and the
GDI AND, OR, and MUX configurations are then used to design the NCL TH gates. The basic
GDI implementation of the NCL TH22 gate is depicted in Fig. 2. The Boolean expression of the TH22 gate is $Z^{+} = AB + Z(A + B)$, where Z is the previous output. Accordingly, the GDI
AND and OR configurations are used to generate AB and (A + B), respectively. Finally,
the GDI MUX configuration selects the set or hold term based on the previous
output Z. Compared with the CMOS implementation, the GDI-based TH22
gate requires only 6 transistors, halving the transistor count. However, the
voltage drop at the output affects the performance of the GDI NCL gates.
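The TH22 decomposition described above (a GDI AND, a GDI OR, and a GDI MUX steered by Z) can be checked at the logic level. This sketch ignores voltage levels and the output buffer. It only confirms that $\overline{Z}\,AB + Z(A + B)$ equals $AB + Z(A + B)$ for all eight input cases. The function names are hypothetical.

```python
from itertools import product


def gdi_cell(g, p, n):
    """Logic-level GDI cell: G = 0 passes P (pMOS path), G = 1 passes N (nMOS path)."""
    return n if g else p


def gdi_th22(a, b, z):
    """TH22 from three GDI cells (six transistors, no output buffer)."""
    n1 = gdi_cell(g=a, p=0, n=b)      # GDI AND: AB
    n2 = gdi_cell(g=a, p=b, n=1)      # GDI OR:  A + B
    return gdi_cell(g=z, p=n1, n=n2)  # GDI MUX: Z = 0 selects AB, Z = 1 selects A + B


def th22_reference(a, b, z):
    """Z+ = AB + Z(A + B)."""
    return (a & b) | (z & (a | b))


for a, b, z in product((0, 1), repeat=3):
    assert gdi_th22(a, b, z) == th22_reference(a, b, z), (a, b, z)
print("GDI AND/OR/MUX decomposition matches TH22 for all 8 cases")
```

The MUX form is valid because, when Z = 1, the set term AB is already contained in A + B. Selecting A + B alone therefore loses nothing.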

2) Analysis of voltage swing for the GDI NCL gates: The major drawback of the above method is that a full output voltage swing cannot be obtained for all input combinations, which leads to a significant voltage drop at the output. This limitation arises from the way inputs are applied to the GDI cell. Because the pMOS transistor is a strong pull-up device and the nMOS transistor is a strong pull-down device, applying any voltage other than VDD to the pMOS source, or other than GND to the nMOS source, produces a degraded level at the output (drain): a pMOS passing a logic 0 can only pull the output down to $|V_{tp}|$, and an nMOS passing a logic 1 can only pull it up to $(V_{DD} - V_{tn})$. Here, $V_{tp}$ and $V_{tn}$ are the threshold voltages of the pMOS and nMOS transistors. This limitation can be illustrated for the GDI NCL TH22 gate above by examining the output voltage for every input combination. Assume that all pMOS transistors are identical and all nMOS transistors are identical (same widths and lengths), and that the previous output is 0 (NULL). Node N1 carries AB and node N2 carries A + B:

- A = 0, B = 0: node N1 is at $|V_{tp}|$ and node N2 is at $|V_{tp}|$, so the present output is at least $|V_{tp}|$ instead of 0 V, a significant voltage drop, as shown in Fig. 3.
- A = 0, B = 1: node N1 is at $|V_{tp}|$ and node N2 is at $V_{DD}$, so the present output is again at least $|V_{tp}|$, causing significant performance degradation.
- A = 1, B = 0: node N1 is at 0 V and node N2 is at $(V_{DD} - V_{tn})$, so the present output is $|V_{tp}|$, again a degraded level.
- A = 1, B = 1: both N1 and N2 are at $(V_{DD} - V_{tn})$, so the present output is $(V_{DD} - V_{tn})$ instead of $V_{DD}$.

Because NCL gates use hysteresis (the present output is fed back to compute the next result), these voltage drops also affect subsequent evaluations and the following stages, degrading performance. It is therefore essential to address this limitation. To overcome this degradation and obtain a full-swing output, a regenerative buffer is placed at the output of every GDI-based NCL THmn gate. Compared with the bare GDI gate, the regenerative buffer increases the transistor count but eliminates the performance degradation.
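The four cases above follow from a first-order rule: a pMOS cannot pass a level below $|V_{tp}|$, and an nMOS cannot pass a level above $V_{DD} - V_{tn}$. The sketch below encodes only that rule. It ignores body effect and leakage, so it reproduces the listed node values but not any additional degradation. The threshold values of 0.35 V and the idealized buffer with a $V_{DD}/2$ switching point are assumptions, and the function names are hypothetical.

```python
VDD = 1.0   # supply voltage used in Section IV
VTN = 0.35  # illustrative nMOS threshold voltage (assumption)
VTP = 0.35  # illustrative |Vtp| of the pMOS (assumption)


def gdi_cell_v(g, p, n):
    """First-order analog model of a GDI cell.

    g is a full-swing logic input (0 or 1); p and n are node voltages.
    When g = 0 the pMOS conducts: it passes high levels fully but cannot pull
    the output below |Vtp|. When g = 1 the nMOS conducts: it passes low levels
    fully but cannot pull the output above VDD - Vtn.
    """
    if g:
        return min(n, VDD - VTN)
    return max(p, VTP)


def gdi_th22_v(a, b, z):
    """Unbuffered GDI TH22: N1 = GDI AND, N2 = GDI OR, output = GDI MUX on Z."""
    n1 = gdi_cell_v(a, p=0.0, n=b * VDD)  # AND: N = B, P = 0, G = A
    n2 = gdi_cell_v(a, p=b * VDD, n=VDD)  # OR:  N = 1, P = B, G = A
    return gdi_cell_v(z, p=n1, n=n2)      # MUX: Z = 0 -> N1, Z = 1 -> N2


def regenerative_buffer(v):
    """Two cascaded inverters, idealized with a switching point of VDD / 2."""
    return VDD if v > VDD / 2 else 0.0


for a, b in ((0, 0), (0, 1), (1, 0), (1, 1)):
    raw = gdi_th22_v(a, b, z=0)  # previous output NULL (Z = 0)
    print(f"A={a} B={b}: unbuffered {raw:.2f} V, buffered {regenerative_buffer(raw):.2f} V")
```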

3) Leakage current: The drain-to-source current of a transistor operating
in the weak inversion region is called the subthreshold current. This subthreshold conduction
is due to the diffusion current of the minority charge carriers and is given by ^{(14)}:

##### (2)

$$I_{SUB}=\frac{W}{L}K^{\prime}\left(1-e^{\frac{-V_{DS}}{V_{T}}}\right)e^{\frac{V_{GS}-V_{th}}{mV_{T}}}$$

where $I_{SUB}$ is a function of the transistor width (W), transistor length (L), temperature (through the thermal voltage $V_{T} = kT/q$),
drain-source voltage ($V_{DS}$), gate-source voltage ($V_{GS}$), threshold voltage ($V_{th}$),
and the process constants $K^{\prime}$ and m. Under weak inversion, the channel surface potential
is almost constant across the channel, and the current flow is determined by the diffusion
of minority carriers due to a lateral concentration gradient ^{(14)}.

Gate leakage: Gate leakage current is due to the flow of electrons through the gate oxide.
Fowler-Nordheim tunneling and direct tunneling are the two tunneling mechanisms responsible
for gate leakage ^{(14)}. Gate leakage increases exponentially as the oxide thickness is reduced.

##### (3)

$$J_{GT}=AE_{ox}^{2}\exp\left(\frac{-B}{E_{ox}}\left[1-\left(1-\frac{V_{ox}}{\phi_{ox}}\right)^{\frac{3}{2}}\right]\right)$$

where $V_{ox}$ is the potential across the oxide layer, $\phi_{ox}$ is the tunneling barrier height, $t_{ox}$ is the oxide thickness, A and B are constants, and $E_{ox}$ is the electric field across the oxide layer, given by $E_{ox} = V_{ox}/t_{ox}$.

The structure of the GDI cell provides a significant reduction in both gate
leakage current and subthreshold leakage current compared with static
CMOS gates ^{(14)}. In static CMOS gates, there is always a subthreshold leakage path for every possible
input state, because the pull-up and pull-down networks are always connected to the
supply voltage and ground. In GDI gates, by contrast, the connection of the pull-up
and pull-down networks depends on the function being implemented.

4) Multi-threshold techniques for reducing the threshold voltage drop: The voltage
drop at the output of GDI gates causes performance degradation. Regenerative
inverters are used to avoid the voltage drop, but they increase the circuit area. Moreover,
cascaded inverters increase static power dissipation because of
the increased VGS of their off transistors. This issue limited the use of GDI
technology in older processes ^{(14)}. However, nanoscale processes offer the option of fabricating transistors with different thresholds
on the same die, which can solve these problems. The best solution
is to use low threshold transistors in the path where a voltage drop is
expected, coupled with regenerative inverters designed with high threshold transistors.
Because of the increased subthreshold leakage in static CMOS, low threshold
transistors are usually not integrated into non-critical paths there ^{(14)}. Since leakage currents in GDI are small, combining low and high threshold
transistors does not cause the large leakage currents that it would in static CMOS. Compared
with static CMOS, the performance of an individual GDI gate is still degraded by the use of these transistors.
However, to achieve the same functionality, the total path length from input to
output is shorter in GDI (for most functions), which compensates for the individual gate
performance degradation ^{(14)}.
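Eq. (2) shows why threshold voltage matters so much for leakage. The off-state current scales as $e^{-V_{th}/(mV_{T})}$, so a small threshold difference produces a large leakage ratio. The sketch below evaluates Eq. (2) for two threshold flavors. The values $K'$ = 1 µA, m = 1.3, and $V_{th}$ = 0.30 V and 0.45 V are assumptions chosen only for illustration, not gpdk45 data. W/L = 1, VDD = 1 V, and 27 °C match the setup in Section IV.

```python
import math

BOLTZMANN = 1.380649e-23             # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_c=27.0):
    """V_T = kT/q in volts (27 C matches the simulation temperature in Section IV)."""
    return BOLTZMANN * (temp_c + 273.15) / ELEMENTARY_CHARGE


def i_sub(w_over_l, v_gs, v_ds, v_th, k_prime=1e-6, m=1.3, temp_c=27.0):
    """Eq. (2). k_prime (A) and m are illustrative assumptions, not gpdk45 values."""
    vt = thermal_voltage(temp_c)
    return (w_over_l * k_prime
            * (1.0 - math.exp(-v_ds / vt))
            * math.exp((v_gs - v_th) / (m * vt)))


# Off-state leakage (V_GS = 0, V_DS = VDD = 1 V) of a W/L = 1 device for two
# assumed threshold flavors.
low_vth_leak = i_sub(1.0, 0.0, 1.0, v_th=0.30)
high_vth_leak = i_sub(1.0, 0.0, 1.0, v_th=0.45)
ratio = low_vth_leak / high_vth_leak  # = exp(0.15 / (m * V_T)); K' cancels out
print(f"low-Vth / high-Vth leakage ratio: {ratio:.1f}")
```

Under these assumptions, the ratio is roughly two orders of magnitude. This is consistent with the argument above that low threshold devices are only acceptable where the structure, as in GDI, keeps leakage paths few.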

5) Proposed generalized design approach for GDI NCL gates: By implementing the GDI
technique for asynchronous NCL designs in a nanoscale process, we can use
multi-threshold techniques to reduce the threshold voltage drop. Along with the
multi-threshold techniques, regenerative buffers/inverters eliminate
the voltage drop by producing a full-swing voltage at the output. These two techniques
can be used to design any NCL gate and hence any NCL circuit. Designing the GDI AND,
OR, and MUX cells with low threshold transistors and the regenerative buffer with
high threshold transistors reduces not only the delay but also the power consumption,
at the cost of some area overhead. As an example, the GDI-based NCL TH22 gate designed with the proposed
method is illustrated in Fig. 4. It is designed as follows. First, the GDI AND
and OR configurations are used to generate AB and A + B, and then a GDI MUX
selects AB or A + B based on the value of Z: if Z = 0, AB is selected; otherwise, A + B
is selected. These GDI AND, OR, and MUX cells are implemented with low threshold
transistors for low power design. The MUX output is then passed through a regenerative
buffer built with high threshold transistors to produce a full-swing output, which
efficiently reduces the transistor count and power consumption. Other
NCL THmn gates are designed with the GDI technique in the same way. The numbers of transistors required to implement the
27 NCL gates using static CMOS ^{(17)} and GDI techniques were compared, and the GDI NCL gates were found to offer a 13.5% reduction
in transistor count on average. Thus, a GDI implementation of NCL circuits
reduces the transistor count, which in turn decreases power consumption.

The proposed model is validated by realizing a variety of delay-insensitive NCL designs using GDI technology: a 4-bit ripple carry adder, an unpipelined 4x4 multiplier, a two-stage pipelined 4x4 NCL multiplier, and an unpipelined NCL ALU. Each design is described in detail below.

### 3. Ripple Carry Adder

This section presents an NCL ripple carry adder (RCA) designed using GDI technology, in the form of a GDI model of a 4-bit RCA. The proposed model uses the low power GDI technique to realize the NCL gates, and the results show that it performs better in terms of transistor count and static and dynamic power dissipation. The 4-bit NCL ripple carry adder uses four input-complete, optimized NCL full adders sandwiched between two DI registers. The optimized NCL full adder is built from two TH23 gates and two TH34W2 gates. Fig. 7 depicts the proposed optimized GDI NCL full adder, in which the TH23 and TH34W2 gates are implemented using GDI technology. Fig. 5 depicts the transistor-level implementation of the TH23 gate using the GDI technique, where a restoration buffer is added at the output to restore the signal and avoid any voltage drop. For a low power circuit, everything except the buffer is designed with low threshold transistors; the buffer is realized with high threshold transistors so that it restores the dropped voltage levels.
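A behavioral sketch can check the full adder's dual-rail logic and the ripple connection. It assumes the commonly used optimized NCL full adder wiring: TH23 gates form the two carry rails, and each TH34W2 gate takes the opposite carry rail on its weight-2 input. The paper does not spell out this input mapping, so treat it as an assumption. The DI registers, completion logic, and all timing are omitted. Each DATA wavefront is followed by a NULL wavefront, as NCL requires.

```python
class ThresholdGate:
    """THmnWw1..wR gate with hysteresis: set at weighted sum >= m, reset when all inputs are 0."""

    def __init__(self, m, weights):
        self.m, self.weights, self.z = m, list(weights), 0

    def step(self, inputs):
        if sum(w for w, x in zip(self.weights, inputs) if x) >= self.m:
            self.z = 1
        elif not any(inputs):
            self.z = 0
        return self.z


class NCLFullAdder:
    """Dual-rail NCL full adder: two TH23 gates (carry rails) and two TH34W2
    gates (sum rails). The weight-2 input of each TH34W2 is assumed to be the
    opposite carry rail, as in the commonly used optimized NCL full adder."""

    def __init__(self):
        self.co0 = ThresholdGate(2, [1, 1, 1])
        self.co1 = ThresholdGate(2, [1, 1, 1])
        self.s0 = ThresholdGate(3, [2, 1, 1, 1])
        self.s1 = ThresholdGate(3, [2, 1, 1, 1])

    def step(self, x, y, ci):
        """x, y, ci are (rail0, rail1) pairs; returns (sum_pair, carry_pair)."""
        co0 = self.co0.step([x[0], y[0], ci[0]])
        co1 = self.co1.step([x[1], y[1], ci[1]])
        s0 = self.s0.step([co1, x[0], y[0], ci[0]])
        s1 = self.s1.step([co0, x[1], y[1], ci[1]])
        return (s0, s1), (co0, co1)


NULL = (0, 0)


def dual(bit):
    return (0, 1) if bit else (1, 0)


def rca4_data(adders, a, b, cin):
    """Propagate one DATA wavefront through the 4-bit ripple chain; return a + b + cin."""
    carry, total = dual(cin), 0
    for i, fa in enumerate(adders):
        (_, s1), carry = fa.step(dual((a >> i) & 1), dual((b >> i) & 1), carry)
        total |= s1 << i
    return total | (carry[1] << 4)


def rca4_null(adders):
    """Propagate a NULL wavefront, returning every gate output to 0."""
    carry = NULL
    for fa in adders:
        _, carry = fa.step(NULL, NULL, carry)


adders = [NCLFullAdder() for _ in range(4)]
print(rca4_data(adders, 9, 7, 1))  # DATA wavefront: 17
rca4_null(adders)                  # NULL wavefront before the next DATA
```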

### 4. 4-Bit Multiplier

NCL multipliers are classified into non-pipelined and pipelined multipliers. In this paper, GDI models of 4-bit non-pipelined and pipelined NCL multipliers are proposed. In the GDI models, all modules are implemented with the GDI technique to overcome the limitations of static CMOS design. The proposed models provide the best performance in terms of power and area.

Non-Pipelined Multiplier

Fig. 8 illustrates the proposed GDI model of the existing non-pipelined ^{(6)}, 1-stage 4-bit multiplier using the full-word completion version of the NCL multiplier
design. To reduce the transistor count and dynamic power dissipation, all the modules
of the existing CMOS non-pipelined multiplier are replaced with GDI modules,
resulting in a model built entirely from GDI-based gates. As depicted in
Fig. 8, the GDI model consists of 8-bit GDI registers, incomplete GDI AND gates, complete GDI
AND gates, GDI half adders (GDI HA), and GDI full adders (GDI FA). I and C denote the “incomplete
GDI AND” and “complete GDI AND” functions, respectively. The GDI multiplier also includes
GENS7 and the completion component, denoted COMP. The 8-bit GDI registers at the
input and output control the flow of DATA and NULL wavefronts, as
shown in Fig. 8.
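The completion component (COMP) decides when a whole word has switched. The paper does not detail its internals. The sketch below assumes the usual NCL structure: each dual-rail bit's rails are OR-ed (a TH12 gate), and the results are combined with hysteresis, as a tree of THnn gates would do. The acknowledge is taken as the inverted completion signal. The class and method names are hypothetical.

```python
class CompletionDetector:
    """Behavioral full-word completion detection (COMP) for a dual-rail word.

    Each bit's two rails are OR-ed (a TH12 gate), and the per-bit results are
    combined with hysteresis, as a tree of THnn gates would do. The detector
    asserts only when every bit holds DATA and de-asserts only when every bit
    is NULL; partial wavefronts leave it unchanged.
    """

    def __init__(self):
        self.complete = 0

    def step(self, word):
        present = [r0 | r1 for r0, r1 in word]  # TH12 per dual-rail bit
        if all(present):
            self.complete = 1
        elif not any(present):
            self.complete = 0
        return self.complete

    def ko(self):
        # Acknowledge to the preceding register: 1 = request for DATA, 0 = request for NULL.
        return 1 - self.complete


comp = CompletionDetector()
data_word = [(0, 1)] * 8
half_word = [(1, 0)] * 4 + [(0, 0)] * 4
null_word = [(0, 0)] * 8
print([comp.step(w) for w in (half_word, data_word, half_word, null_word)])
# [0, 1, 1, 0]
```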

2-Stage Pipelined Multiplier

The proposed GDI model of the existing 2-stage 4-bit multiplier ^{(6)} using full-word completion is depicted in Fig. 9. It consists of an 8-bit GDI register, an 8-bit CMOS register, a 12-bit GDI register,
incomplete GDI AND (I), complete GDI AND (C), GDI half adders, and GDI full adders
(GDI FA). The 2-stage 4-bit multiplier is obtained by inserting the 12-bit GDI register between the half adders and
the GDI FA stage of the proposed non-pipelined, 1-stage 4-bit multiplier using full-word
completion.

### 5. Hybrid Non-pipelined ALU

The logic diagram of the proposed non-pipelined dual-rail GDI ALU is shown in Fig. 10. The proposed model is obtained by modifying the existing non-pipelined dual-rail ALU ^{(3)}, and it performs better
in terms of transistor count and power dissipation. It consists of dual-rail GDI registers,
completion components (COMP), a GDI Convert to MEAG function, a GDI Demultiplexer, GDI
NCL OR, GDI AND, and GDI XOR functions, invert, shift right, and shift left functions, a GDI ripple
carry subtractor and adder, two GDI Multiplexers, and CMOS Carry Logic. The Convert
to MEAG function converts three dual-rail signals into an 8-rail MEAG (mutually exclusive assertion group) signal. This
conversion is carried out by the eight TH33 gates in the Convert to MEAG function.
The invert, shift right, and shift left operations are performed by renaming signals
and hence incur no logic delay.

The GDI ripple-carry subtractor and adder consist of four GDI full adders. Based on the MEAG select result, the GDI Demultiplexer selects the corresponding function. The GDI Demultiplexer is realized with GDI TH22 gates, which pass the A, B, and $C_{in}$/$B_{in}$ inputs, respectively. For functions that do not require the B input, the GDI Demultiplexer is designed with GDI TH34 gates, which also ensure input completeness with respect to B. The CMOS Carry Logic generates $C_{out}$ and provides input completeness for the $C_{in}$/$B_{in}$ inputs. The CMOS Multiplexers consist of TH14 and TH12 gates, which produce a single result by OR-ing each rail of the demultiplexed signals.
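The Convert to MEAG function described above maps three dual-rail signals onto eight mutually exclusive rails using eight TH33 gates. The sketch below is a behavioral illustration. It assumes that output k is driven by the rail of each select signal that matches bit pattern k, with sel[0] as the most significant bit. This ordering is an assumption; the paper does not specify it. Because a DATA signal asserts exactly one rail, only the matching TH33 gate sees three asserted inputs.

```python
from itertools import product


def th33(a, b, c, z):
    """TH33 with hysteresis: asserts when all three inputs are 1, resets when all are 0."""
    if a and b and c:
        return 1
    if not (a or b or c):
        return 0
    return z


def convert_to_meag(sel, prev):
    """Map three dual-rail signals to an 8-rail one-hot (MEAG) group.

    sel is [(rail0, rail1)] * 3, with sel[0] the most significant bit; prev
    holds the eight previous outputs. Output k is a TH33 gate fed by the rail
    of each signal that matches bit pattern k.
    """
    out = []
    for k, bits in enumerate(product((0, 1), repeat=3)):
        rails = [sel[i][bits[i]] for i in range(3)]
        out.append(th33(*rails, prev[k]))
    return out


state = [0] * 8
state = convert_to_meag([(0, 1), (1, 0), (0, 1)], state)  # DATA code 1, 0, 1
print(state)  # [0, 0, 0, 0, 0, 1, 0, 0]
```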

## IV. SIMULATION RESULTS

This section compares NCL circuits implemented using CMOS and GDI technology. There are three different CMOS models. The High Threshold model (High $V_{th}$) realizes the complete circuit using only high threshold transistors. The Low Threshold model (Low $V_{th}$) uses only low threshold transistors. The Standard Threshold model (std $V_{th}$) uses standard threshold transistors. Low threshold transistors offer high speed but high power consumption, high threshold transistors offer low power but high latency, and standard threshold transistors provide medium delay and medium power dissipation. The performance of the GDI design is compared individually with all three CMOS designs in terms of transistor count and static and dynamic power dissipation. The CMOS and GDI designs are realized in 45 nm technology using the Cadence generic process design kit (gpdk45). A process design kit contains the process technology data and other information needed for device-level design in the Cadence environment. The schematics are implemented in the Cadence Virtuoso tool with VDD = 1 V at a temperature of 27 °C. The circuits are simulated with the Spectre simulator in Cadence Virtuoso using gpdk45 high and low threshold MOSFETs with a W/L ratio of 1. All transistors in all designs, both CMOS and GDI, are minimum sized.

Table 2. Simulated Results 4-Bit RCA using CMOS and GDI Technology

Table 3. Performance Comparison of 4-Bit Unpipelined multiplier

Simulations were carried out on all possible input patterns to calculate static and dynamic power dissipation. Dynamic power dissipation is the power dissipated during transients, when the transistors switch from one logic state to another. To compute the dynamic power, the average power over all input patterns is first measured. The static power is then subtracted from this average to obtain the dynamic power.
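The dynamic-power bookkeeping described above, together with the percentage reductions reported in the subsections below, amounts to simple arithmetic. The sketch shows it with hypothetical numbers that are not results from this paper. The function names are illustrative.

```python
def dynamic_power(avg_power_per_pattern, static_power):
    """Dynamic power = (mean of the average power over all input patterns) - static power.

    Units follow the inputs (e.g. watts)."""
    if not avg_power_per_pattern:
        raise ValueError("need at least one input pattern")
    mean_avg = sum(avg_power_per_pattern) / len(avg_power_per_pattern)
    return mean_avg - static_power


def percent_reduction(reference, proposed):
    """Relative reduction of a GDI metric with respect to a CMOS reference, in percent."""
    return (reference - proposed) / reference * 100.0


# Hypothetical numbers for illustration only (not results from this paper).
p_dyn_cmos = dynamic_power([4.0e-6, 5.0e-6, 6.0e-6], static_power=1.0e-6)
p_dyn_gdi = dynamic_power([3.0e-6, 4.0e-6, 5.0e-6], static_power=0.5e-6)
print(f"dynamic power reduction: {percent_reduction(p_dyn_cmos, p_dyn_gdi):.1f} %")
```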

A. 4-bit Ripple Carry Adder - CMOS vs GDI

The Ripple Carry Adder presented in this paper is designed in four different models: high threshold, low threshold, and standard threshold CMOS models, and a GDI-based RCA model. CMOS model 1 is designed entirely with high threshold transistors, CMOS model 2 with low threshold transistors, and CMOS model 3 with standard threshold transistors. In the GDI RCA model, the complete circuit (full adders and input and output registers) is designed using GDI technology. Table 2 compares these designs in terms of power and transistor count. Simulations use input test vectors covering all possible input combinations of a 4-bit RCA, and the values in Table 2 are averages over all of these combinations.

The GDI RCA model offers a 14% reduction in transistor count compared with all the CMOS models. Compared with the high threshold, low threshold, and standard threshold CMOS models, the GDI model reduces dynamic power by 9.3%, 45.7%, and 30.3%, respectively.

B. 4-bit NCL Multiplier CMOS vs GDI

The comparison of CMOS and GDI designs can also be extended to multipliers. Two types of 4-bit NCL multipliers, a 4-bit unpipelined multiplier and a 4-bit pipelined multiplier, are designed, and their simulation results are discussed below.

1) 4-bit Unpipelined Multiplier: The four unpipelined NCL multiplier models designed in this paper consist of three CMOS models and the GDI model. As seen from Table 3, the GDI design gives the best performance in terms of the number of transistors and dynamic power dissipation compared with the CMOS models. The dynamic power is improved by 2.4%, 36.6%, and 21.6% compared with CMOS model 1, CMOS model 2, and CMOS model 3, respectively. The GDI model also offers a 13.7% reduction in transistor count compared with the CMOS models, reducing both dynamic power and area.

Table 4. 4-Bit Two Stage Pipelined Multiplier Simulation Results for CMOS and GDI Technology

Table 5. Performance Comparison of Non-Pipelined Dual-Rail CMOS and GDI ALU

2) Two-Stage Pipelined Multiplier: The two-stage pipelined multiplier is likewise designed using the three CMOS approaches and the GDI technique, with the GDI model employing the low-power GDI technique to reduce power dissipation and area. Table 4 presents the simulation results of the three CMOS models and the GDI model. The average power values are averaged over all possible input transitions of the 4-bit multiplier. As illustrated, the GDI pipelined multiplier decreases dynamic power by 1.2%, 45.6%, and 20.9% compared with CMOS models 1, 2, and 3, respectively. In addition, its transistor count is 13.4% lower than that of each CMOS pipelined multiplier design.

C. Hybrid Non-pipelined ALU

The performance of the non-pipelined ALU, designed using the CMOS and GDI approaches, is discussed below. To keep the threshold-voltage drop from propagating through the circuit while retaining the low-power advantages of the GDI technique, a hybrid design whose GDI NCL gates combine low-threshold and high-threshold transistors is proposed. Table 5 presents the simulation results of the CMOS and GDI models. As illustrated, the GDI non-pipelined ALU reduces dynamic power dissipation by 39% and 14% compared with CMOS model 2 and CMOS model 3, respectively. However, its dynamic power is 6% higher than that of CMOS model 1. This difference is due to the threshold voltages of the transistors used in each model: CMOS model 1 consists only of high-threshold transistors, which dissipate less power, whereas the GDI model uses both high- and low-threshold transistors, and the low-threshold transistors increase its power. The GDI model uses 13% fewer transistors than each of the CMOS models.

## IV. CONCLUSIONS

In this paper, a novel GDI NCL model is proposed to address the main limitation of existing CMOS NCL designs, namely their large area. To address this limitation, the modules of the NCL design are implemented using the GDI technique, a low-power design approach in which a wide range of logic functions can be implemented using only two transistors. Hence, the GDI approach reduces not only the power dissipation but also the transistor count.
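The two-transistor claim can be illustrated with a logic-level model of the basic GDI cell. In this cell a PMOS and an NMOS share the gate input G, while their diffusion inputs P and N are driven by signals instead of being tied to the supply rails. The sketch below uses a hypothetical helper, `gdi_cell`, and the input mappings commonly given for the basic GDI cell in the GDI literature (G = A in every case). The paper does not state which mappings its NCL gates use. The model ignores bulk connections and all analog behavior. It only flags the cases where a single pass transistor drives a weak level, which is the source of the output voltage drop the paper compensates for.

```python
from itertools import product


def gdi_cell(g, p, n):
    """Logic-level model of the basic GDI cell (hypothetical helper).

    G = 0: the PMOS conducts and passes P; G = 1: the NMOS conducts and passes N.
    Returns (out, weak), where weak marks a threshold-voltage-degraded level:
    a PMOS passing 0 or an NMOS passing 1.
    """
    if g == 0:
        return p, p == 0   # PMOS path: strong 1, weak 0
    return n, n == 1       # NMOS path: strong 0, weak 1


# name: (GDI input mapping with G = A, Boolean reference function)
GDI_FUNCTIONS = {
    "F1": (lambda a, b, c: gdi_cell(a, b, 0), lambda a, b, c: (1 - a) & b),  # A'B
    "F2": (lambda a, b, c: gdi_cell(a, 1, b), lambda a, b, c: (1 - a) | b),  # A' + B
    "OR": (lambda a, b, c: gdi_cell(a, b, 1), lambda a, b, c: a | b),
    "AND": (lambda a, b, c: gdi_cell(a, 0, b), lambda a, b, c: a & b),
    "MUX": (lambda a, b, c: gdi_cell(a, b, c), lambda a, b, c: b if a == 0 else c),  # A'B + AC
    "NOT": (lambda a, b, c: gdi_cell(a, 1, 0), lambda a, b, c: 1 - a),
}


def weak_cases(name):
    """(A, B, C) inputs for which the named function drives a weak output level."""
    impl, _ = GDI_FUNCTIONS[name]
    return [abc for abc in product((0, 1), repeat=3) if impl(*abc)[1]]


# Every mapping must match its Boolean reference for all input combinations.
for name, (impl, ref) in GDI_FUNCTIONS.items():
    for a, b, c in product((0, 1), repeat=3):
        assert impl(a, b, c)[0] == ref(a, b, c), name
```

For example, `weak_cases("NOT")` is empty because the inverter mapping always passes a strong level. The AND mapping, by contrast, drives weak levels whenever A = 0 and when A = B = 1. Cases like this motivate the regenerative output inverters.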

However, when NCL gates are designed using the GDI technique, there is a considerable voltage drop at their outputs. This problem is addressed by using low-threshold transistors where a voltage drop is expected and high-threshold transistors for the regenerative inverters at the output. The proposed approach is applied to several NCL circuits: the RCA, the unpipelined and pipelined multipliers, and the unpipelined ALU. Compared with the CMOS designs, the GDI models have lower transistor counts and, in every case except the ALU comparison with CMOS model 1, lower dynamic power dissipation.
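The abstract's figure of an average 13.5% reduction in transistor count can be checked against the per-design values reported in the results section. The sketch below assumes that the figure is an unweighted mean of the four designs, which the paper does not state. The 13.4% value is the one reported under the two-stage pipelined multiplier heading. The mean is 54.1 / 4 = 13.525%, which rounds to 13.5%.

```python
# Transistor-count reductions of the GDI models relative to the CMOS models,
# as reported in the results of this paper (percent).
TRANSISTOR_REDUCTION = {
    "4-bit RCA": 14.0,
    "4-bit unpipelined multiplier": 13.7,
    "4-bit two-stage pipelined multiplier": 13.4,
    "4-bit unpipelined ALU": 13.0,
}


def mean_reduction(reductions):
    """Unweighted arithmetic mean of the per-design reductions (assumed averaging)."""
    return sum(reductions.values()) / len(reductions)


print(f"{mean_reduction(TRANSISTOR_REDUCTION):.1f}%")  # 13.5%
```

A mean weighted by each design's transistor count could give a slightly different value. Computing it would require the absolute counts from Tables 2 to 5.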

### REFERENCES

## Author

Prashanthi Metku is from Hyderabad, India.

She received her B.Tech degree in Electronic and Communication Engineering from Jawaharlal Nehru Technological University, Hyderabad, India, in 2011 and M.Tech degree in Electronic Engineering from Pondicherry University, India, in 2014.

She is currently pursuing her Ph.D. degree in Computer Engineering at Missouri University of Science and Technology, United States.

Her interests include CMOS circuit design and Error Correction Codes.

Kyung Ki Kim received his B.S. and M.S. degrees in Electronic Engineering from Yeungnam University, South Korea, in 1995 and 1997, respectively.

He was a Ph.D. candidate in Computer Science at Sogang University, South Korea, from 1997 to 1999, and received his Ph.D. degree in Computer Engineering from Northeastern University, Boston, USA, in 2008.

He was a member of technical staff with Sun Microsystems, Santa Clara, CA in 2008 and a senior researcher with Illinois Institute of Technology, Chicago, USA in 2009.

Since March 2010, he has been with the School of Electronic and Electrical Engineering, Daegu University, Korea, where he is currently an Associate Professor.

His current research focuses on neuromorphic architecture, high speed low power VLSI design, asynchronous design, electronic CAD and nano-electronics.

Yong-Bin Kim received the B.S. degree in electrical engineering from Sogang University, Seoul, Korea, the M.S. degree in electrical engineering from New Jersey Institute of Technology, Newark, NJ, USA, and the Ph.D. degree in electrical and computer engineering from Colorado State University, Fort Collins, CO, USA.

He was a member of the technical staff with Electronics and Telecommunications Research Institute (ETRI), Daejon, Korea, from 1982 to 1987.

He was a Senior Design Engineer with Intel Corp., Hillsboro, OR, USA, from 1990 to 1993, involved in Intel Pentium Pro CPU chip design.

He was a Member of Technical Staff with Hewlett Packard Co., Fort Collins, CO, USA from 1993 to 1996, involved in HP PA-8000 RISC microprocessor chip design.

He was a Staff Engineer with Sun Microsystems, Palo Alto, CA, USA, from 1996 to 1998, involved in 1.5 GHz Ultra Sparc5 CPU chip design.

He was an Assistant Professor with the Department of Electrical and Computer Engineering of the University of Utah, Salt Lake City, UT, USA from 1998 to 2000.

He is currently a Professor with the Department of Electrical and Computer Engineering at Northeastern University, Boston, MA, USA.

His research focuses on low-power analog and digital circuit design as well as high-speed, low-power VLSI circuit design and methodology.

Minsu Choi received his B.S., M.S. and Ph.D. degrees in Computer Science from Oklahoma State University in 1995, 1998 and 2002, respectively.

He is currently an associate professor of Electrical and Computer Engineering at Missouri University of Science & Technology (Missouri S&T).

His research mainly focuses on Computer Architecture & VLSI, Crypto-hardware design, Nanoelectronics, Embedded Systems, Fault Tolerance, Testing, Quality Assurance, Reliability Modeling and Analysis, Configurable Computing, Parallel & Distributed Systems and Dependable Instrumentation & Measurement.

He has won two outstanding teaching awards at MST in 2008 and 2009.

He is a senior member of IEEE and a member of Golden Key National Honor Society and Sigma Xi.