The Journal of Semiconductor Technology and Science (JSTS) is an international, peer-reviewed, and open-access journal that is published bimonthly.
- Scope: semiconductor processes, devices, circuits, and MEMS.
- Editor-in-Chief: Prof. Woo Young Choi (ECE, Seoul National University)
- Indexed within Science Citation Index Expanded (SCIE), SCOPUS, Korea Citation Index (KCI), and other databases.

[

Research article

]

JSTS(Journal of Semiconductor Technology and Science)

IEIE Vol. 21, No. 6, p.483-494

ISSN (print) :

1598-1657

ISSN (online) :

2233-4866

Received : 19 October 2021, Revised : 12 November 2021, Accepted : 12 November 2021

DOI :

https://doi.org/10.5573/JSTS.2021.21.6.483

Transistor Count Reduction Technique for Clockfree Null-convention Arithmetic Logic Circuits

Prashanthi Metku¹, Kyung Ki Kim², Yong-Bin Kim³, Minsu Choi⁴

(¹GlobalFoundries, Essex Junction, VT, USA)
(²Department of Electronic Engineering, Daegu University, Gyeongsan, Korea)
(³Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA)
(⁴Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, USA)

* E-mail: choim@umsystem.edu, kkkim@daegu.ac.kr (corresponding)

License :

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.(www.theieie.org).

Abstract

Null Convention Logic (NCL) is a robust clockless technique for designing asynchronous delay-insensitive circuits. NCL circuits are usually designed in traditional complementary metal oxide semiconductor (CMOS) logic, which tends to occupy a large area. To address this issue, the low-power Gate Diffusion Input (GDI) design technique is introduced for designing NCL circuits. The GDI methodology is a promising alternative to static CMOS design: it reduces area and power consumption while keeping the logic design simple. In this paper, novel GDI-based NCL designs are proposed. However, the reduced voltage swing of the GDI approach leads to a considerable voltage drop at the output. This limitation is addressed by using low-threshold transistors where a voltage drop is expected and high-threshold transistors for the regenerative inverters at the output. The proposed approach has been verified by designing an NCL ripple carry adder (RCA), an unpipelined multiplier, a pipelined multiplier, and an unpipelined ALU using the GDI technique. These models are designed and simulated using Cadence Virtuoso, and an average reduction of 13.5% in transistor count is observed for the GDI-based NCL models compared to the CMOS models.

Index Terms

Null-convention logic, gate diffusion input, clockless design, transistor count reduction, simulation

I. INTRODUCTION

Clocking has become a very complex task because of technology scaling. The increasing clock rate that accompanies decreasing transistor size leads to a major problem of clock skew. In fact, designing clock nets consumes a large portion of the design time ⁽¹⁾. To achieve a tolerable skew, a large part of the chip area is allotted to clock drivers ⁽²⁾. This leads to high power dissipation, most prominently at clock edges, where switching occurs. As the trend toward higher clock frequency and smaller feature size continues, the power dissipation and noise of synchronous circuits increase significantly ⁽³⁾. Increasing power dissipation is a major concern for the emerging low-power industry, which has renewed interest in asynchronous digital designs.

In comparison to synchronous circuits, delay-insensitive (DI) asynchronous paradigms offer lower power, noise, and electromagnetic interference ⁽⁴⁾. Asynchronous circuits are classified into two types: bounded-delay and delay-insensitive models. Bounded-delay models consider both the gate and wire delays to be bounded. One example of this type of model is micropipelines ⁽⁵⁾. Here, delays are added based on worst-case scenarios, and extensive timing analysis of worst-case behavior is needed to ensure the correctness of the circuits. In contrast, delay-insensitive models assume that both the gate and wire delays are unbounded. Here, wire forks within a component are considered to be isochronic, that is, the wire delays within a component are much less than the logic element delays ⁽⁶⁾. This assumption remains valid even for future nanotechnologies. However, the wires connecting the components do not have to adhere to the isochronic assumption.

One of the most widely used techniques for delay-insensitive asynchronous logic design is Null Convention Logic (NCL) ⁽⁷⁾. NCL utilizes dual-rail or quad-rail encoding to represent logic 1, logic 0, null, and invalid signals. For clock-free operation, NCL uses local handshaking performed by the completion detection register ⁽⁸⁾. Usually, NCL circuits are realized in CMOS technology, which has the potential for high speed but dissipates high power and occupies a large area. To reduce the area, semi-static implementations of NCL circuits have also been proposed ⁽⁴⁾. However, the semi-static implementation is limited by its weak feedback loop. To overcome these limitations, a novel approach leveraging the Gate Diffusion Input (GDI) method is proposed ⁽⁹⁾. GDI is a low-power design technique that was first introduced for synchronous circuits ⁽¹⁰⁾. A wide range of complex logic functions can be implemented with only two transistors using the GDI approach, which makes it suitable for designing low-power circuits with a reduced transistor count. The proposed approach is extensively verified in this work by the design and simulation of multiple prototype arithmetic logic circuits.

This paper is organized as follows. Section II presents the preliminaries and a review of NCL and GDI. The proposed design is discussed in detail in Section III. Design and performance evaluation data, including area, power, and latency, are given in Section IV. Finally, the summary and concluding remarks are given in Section V.

II. PRELIMINARIES AND REVIEW

In current nanometer technology, with ultra-low-power design as a goal, synchronous circuit designs are limited by their high power dissipation. Asynchronous circuits such as Null Convention Logic are a promising alternative. NCL gates, also known as threshold gates, are designed with a hysteresis loop to maintain delay insensitivity ⁽¹¹⁾. Several CMOS implementations of NCL gates have been proposed, and each design has its own limitations. One of the most common limitations of a CMOS implementation is its area consumption. To overcome this and to reduce power dissipation, the low-power GDI design technique is applied to some NCL gates. This section gives a brief overview of NCL design and the GDI approach.

1. Null Convention Logic

NCL is a popular delay-insensitive methodology for designing asynchronous circuits. NCL circuits operate correctly regardless of when the inputs become available, which results in a clockless, DI circuit design ⁽³⁾. It is a self-timed logic paradigm in which data and control are integrated into a single signal. To achieve delay insensitivity, NCL circuits use dual-rail or quad-rail logic ⁽¹²⁾. A dual-rail signal consists of two wires, D0 and D1, whose values represent one state from the set {DATA0, DATA1, NULL}. The DATA0 state (D0 = 1, D1 = 0) represents Boolean logic 0, the DATA1 state (D0 = 0, D1 = 1) represents Boolean logic 1, and the NULL (empty) state (D0 = 0, D1 = 0) means that no DATA is available at the input. The combination D0 = 1 and D1 = 1 corresponds to the invalid state ⁽²⁾. The two rails are mutually exclusive, so they can never be asserted simultaneously. Similarly, a quad-rail signal has four wires, Q0, Q1, Q2, and Q3, which represent one state from the set {DATA0, DATA1, DATA2, DATA3, NULL}. These rails are also mutually exclusive. To achieve delay-insensitive behavior, NCL should possess two main characteristics: symbolic completeness and input completeness ⁽²⁾.
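The dual-rail encoding described above can be sketched in a few lines of Python. This is only an illustrative model: the names `DualRail`, `encode`, and `decode` are hypothetical and do not come from the paper.

```python
from enum import Enum


class DualRail(Enum):
    """States of one dual-rail NCL signal, stored as (D0, D1) wire values."""
    NULL = (0, 0)    # no DATA available
    DATA0 = (1, 0)   # Boolean logic 0
    DATA1 = (0, 1)   # Boolean logic 1


def decode(d0, d1):
    """Map the two rail values to a state; (1, 1) is the invalid state."""
    if d0 == 1 and d1 == 1:
        raise ValueError("invalid state: both rails asserted")
    return DualRail((d0, d1))


def encode(bit):
    """Encode a Boolean value, or None for 'no data', onto the (D0, D1) rails."""
    if bit is None:
        return DualRail.NULL.value
    return DualRail.DATA1.value if bit else DualRail.DATA0.value


if __name__ == "__main__":
    for value in (None, False, True):
        rails = encode(value)
        print(value, rails, decode(*rails).name)
```

Raising an error for (1, 1) mirrors the mutual-exclusion requirement on the two rails.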

NCL circuits are implemented using threshold gates. The basic NCL gate is THmn, where 1 ≤ m ≤ n ⁽¹³⁾. Here, n is the total number of inputs and m is the number of inputs that must be asserted: at least m out of the n inputs must be asserted before the output is asserted ⁽¹²⁾. The second type of NCL gate is the weighted threshold gate. These gates are denoted as THmnWw1w2...wR, where w1, w2, ..., wR, each > 1, are the integer weights of input 1, input 2, ..., input R, respectively. Here, an integer weight m ≥ wR > 1 is applied to input R, with 1 ≤ R < n. There are 27 fundamental NCL gates, covering functions of two to four variables. To support DI circuit design, NCL gates have a built-in hysteresis (state-holding) capability. This means that after the output is asserted, all the inputs must be de-asserted before the output is de-asserted. This hysteresis ensures that the gate is input complete, meaning that the output remains constant until all the inputs are de-asserted ⁽²⁾.
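As a behavioral illustration of the threshold and hysteresis rules above, the following Python sketch models a THmn gate with optional integer weights. The class name `ThresholdGate` and its interface are assumptions made for this example, not part of the paper.

```python
class ThresholdGate:
    """Behavioral model of an NCL THmn gate with hysteresis.

    weights defaults to 1 per input; a weighted gate such as TH34W2
    would use m=3 and weights=[2, 1, 1, 1].
    """

    def __init__(self, m, n, weights=None):
        self.m = m
        self.weights = list(weights) if weights is not None else [1] * n
        if len(self.weights) != n:
            raise ValueError("one weight per input is required")
        self.out = 0  # gate starts in the de-asserted (NULL) state

    def step(self, inputs):
        """Apply one set of input values (0/1) and return the output."""
        if len(inputs) != len(self.weights):
            raise ValueError("wrong number of inputs")
        total = sum(w for w, x in zip(self.weights, inputs) if x)
        if total >= self.m:
            self.out = 1          # set: weighted sum reaches the threshold
        elif not any(inputs):
            self.out = 0          # reset: only when all inputs are de-asserted
        return self.out           # otherwise hold the previous output


if __name__ == "__main__":
    th22 = ThresholdGate(2, 2)
    for pattern in ([1, 0], [1, 1], [0, 1], [0, 0]):
        print(pattern, th22.step(pattern))
```

The third pattern shows the hold behavior: with only one input still asserted, the output stays at 1 until both inputs return to 0.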

2. Gate Diffusion Input (GDI) Approach

To allow simple implementation of the GDI gates (all functions) in standard CMOS processes, a modified GDI model was introduced in ⁽¹⁴⁾. Fig. 1 illustrates the modified GDI basic cell ⁽¹⁴⁾. Table 1 shows the input configurations of the simple GDI cell that correspond to different Boolean functions. Like the conventional GDI cell, it has three inputs: G (the common gate input of both the nMOS and the pMOS), P (the input to the source/drain of the pMOS), and N (the input to the source/drain of the nMOS). The bulks of the nMOS and pMOS transistors in the modified cell are permanently connected to GND and VDD, respectively. This adaptation is what enables simple implementation in standard CMOS processes ⁽¹⁰⁾. The influence of the bulk effect on circuit performance is very similar to that of the originally proposed GDI cell. With technology scaling, the impact of the source-to-bulk voltage on the transistor threshold voltage is greatly reduced, which makes this limitation less relevant in processes below 65 nm. The following equation shows the dependence of the transistor threshold voltage on the source-to-bulk voltage ⁽¹⁴⁾:

(1)

$V_{th}=V_{th0}+\gamma\left(\sqrt{\left|2\varphi_{F}+V_{SB}\right|}-\sqrt{\left|2\varphi_{F}\right|}\right)-\eta V_{DS}$

where $V_{SB}$ refers to the source-to-body voltage, $V_{th0}$ is the threshold voltage when $V_{SB}$ = 0, $\varphi_{F}$ represents the Fermi potential, $\gamma$ denotes the linearized body coefficient, $V_{DS}$ is the drain-source voltage, and $\eta$ represents the drain-induced barrier lowering (DIBL) coefficient.
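To make the dependence in Eq. (1) concrete, the following Python sketch evaluates the threshold voltage with the usual square-root body-effect term. The function name and all numeric values are hypothetical placeholders chosen for illustration; they are not parameters of the gpdk45 process or of the paper.

```python
import math


def threshold_voltage(vth0, gamma, phi_f, v_sb, eta=0.0, v_ds=0.0):
    """Evaluate Eq. (1): body effect raises Vth, DIBL lowers it."""
    body = gamma * (math.sqrt(abs(2 * phi_f + v_sb)) - math.sqrt(abs(2 * phi_f)))
    return vth0 + body - eta * v_ds


if __name__ == "__main__":
    # Hypothetical device values, for illustration only.
    params = dict(vth0=0.40, gamma=0.20, phi_f=0.40)
    for v_sb in (0.0, 0.25, 0.5):
        print(f"V_SB = {v_sb:.2f} V -> Vth = {threshold_voltage(v_sb=v_sb, **params):.4f} V")
```

With $V_{SB}$ = 0 and $V_{DS}$ = 0 the expression reduces to $V_{th0}$, and a smaller $\gamma$ weakens the dependence on $V_{SB}$.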

A variety of functions, as listed in Table 1, can also be implemented using the modified GDI cell ⁽¹⁵⁾. GDI gates are more versatile and compact than static CMOS gates. For example, a multiplexer (MUX) designed using modified GDI requires only two transistors, whereas the CMOS design requires 12 transistors. The GDI approach is most effective for the AND, OR, F1, and F2 functions. F1 and F2 are the two basic functions in GDI, and each of them forms a universal set ⁽¹⁵⁾. Therefore, in general, every digital circuit can be implemented using only F1 gates, only F2 gates, or a combination of both. Simple modifications to the input signals of the F1 and F2 gates yield different functions, which allows other functions to be synthesized more efficiently ⁽¹⁴⁾. Although modified GDI (MGDI) reduces the transistor count, MGDI gates suffer a voltage drop at their outputs, which degrades performance.

Fig. 1. Modified GDI basic cell ⁽¹⁴⁾.

Table 1. Boolean function synthesis through input configuration of a simple GDI cell ⁽¹⁵⁾

N	P	G	Out	Function
0	B	A	$\overline{A}B$	F1
B	1	A	$\overline{A} + B$	F2
1	B	A	A + B	OR
B	0	A	AB	AND
C	B	A	$\overline{A}B + AC$	MUX
0	1	A	$\overline{A}$	NOT
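The rows of Table 1 can be checked with an idealized logic-level model of the GDI cell, in which the pMOS passes P when G = 0 and the nMOS passes N when G = 1. The sketch below is a minimal Python illustration; the function name `gdi_cell` is an assumption, and the model deliberately ignores the voltage drop discussed in the text.

```python
from itertools import product


def gdi_cell(g, p, n):
    """Idealized GDI cell: output follows P when G = 0 and N when G = 1."""
    return n if g else p


def check_table_1():
    """Compare each input configuration of Table 1 with its Boolean function."""
    for a, b, c in product((0, 1), repeat=3):
        not_a = 1 - a
        assert gdi_cell(a, p=b, n=0) == (not_a & b)               # F1
        assert gdi_cell(a, p=1, n=b) == (not_a | b)               # F2
        assert gdi_cell(a, p=b, n=1) == (a | b)                   # OR
        assert gdi_cell(a, p=0, n=b) == (a & b)                   # AND
        assert gdi_cell(a, p=b, n=c) == ((not_a & b) | (a & c))   # MUX
        assert gdi_cell(a, p=1, n=0) == not_a                     # NOT
    return True


if __name__ == "__main__":
    print("Table 1 consistent:", check_table_1())
```

Every function in the table uses the same two-transistor cell; only the signals wired to N, P, and G change.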

III. GATE DIFFUSION INPUT (GDI) BASED NCL CIRCUITS

As feature sizes decrease, designs must reduce not only area but also power dissipation. Several CMOS implementation schemes have been introduced for NCL gates, including dynamic, static, semi-static ⁽⁴⁾, and differential. The static and semi-static implementations of C-elements have been extensively discussed in ⁽⁶⁾. The main drawback of the CMOS NCL design is that it occupies a large area and therefore dissipates more power. To address this limitation, the modules of the NCL design are implemented using the GDI technique. GDI is a low-power design approach in which a wide range of complex functions can be implemented using only two transistors. Hence, the GDI approach reduces not only the power dissipation but also the transistor count. The GDI implementation of NCL gates is proposed and discussed in this section.

1. Static CMOS Implementation of NCL Gates

Generally, CMOS-based designs consist of one pull-up and one pull-down network that implement the set and reset functions, which are complements of each other ⁽¹⁶⁾. However, NCL threshold gates are also designed with a hysteresis state-holding capability to ensure delay insensitivity ⁽¹²⁾. As a result, additional pull-up and pull-down networks, known as Hold0 and Hold1, are required to maintain this hysteresis so that the output does not change until all inputs are de-asserted ⁽²⁾. An NCL gate is described by a set equation and a hold equation: the set equation determines the gate functionality and when the output is asserted, while the hold equation, which is simply the OR of all the gate inputs, determines how long the output remains asserted ⁽¹²⁾.

2. GDI Implementation of NCL Gates

To overcome the above limitations, a low-power design technique, modified Gate Diffusion Input (GDI), can be used to design the NCL circuits. The basic representation of the GDI cell is shown in Fig. 4, where inputs can also be applied to the sources of both the nMOS and pMOS transistors, which allows a wide range of circuits to be designed using only two transistors ⁽¹⁵⁾. However, with GDI the full output voltage swing cannot be obtained for all input combinations, which leads to a significant voltage drop at the final output (since the pMOS transistor is a strong pull-up device and the nMOS transistor is a strong pull-down device). This limitation can be addressed by using regenerative buffers ⁽¹⁵⁾. Thus, implementing the NCL circuits using GDI technology reduces not only the transistor count but also the power dissipation ⁽¹⁵⁾.

Fig. 2. Basic GDI Implementation of TH22 gate.

1) Designing of the GDI NCL gates: Table 1 shows the input configurations that correspond to the respective Boolean functions ⁽¹⁵⁾. These configurations are used for designing GDI-based NCL gates. An NCL gate is described by a set equation and a hold equation: the set equation determines the gate functionality and when the output is asserted, while the hold equation, which is simply the OR of all the gate inputs, determines how long the output remains asserted. The complete Boolean equation of a THmn gate is broken down into a series of AND, OR, and MUX functions, and the GDI AND, OR, and MUX configurations are then used to design the NCL TH gates. The basic GDI implementation of the NCL TH22 gate is depicted in Fig. 2. The Boolean expression of the TH22 gate is AB + Z(A + B), where Z is the previous output. Accordingly, the GDI AND and OR configurations are used to generate AB and (A + B), respectively. Finally, the GDI MUX configuration selects the set or hold term based on the previous result (i.e., based on Z). The GDI-based TH22 gate requires only 6 transistors, which reduces the transistor count by 50% compared to the CMOS implementation. However, the voltage drop at the output affects the performance of the GDI NCL gates.
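The decomposition of TH22 into GDI AND, OR, and MUX cells can be illustrated at the logic level with a short Python sketch. It uses an idealized cell model (P passed when G = 0, N passed when G = 1); the helper names are hypothetical, and the electrical voltage drop is not modeled.

```python
from itertools import product


def gdi_cell(g, p, n):
    """Idealized two-transistor GDI cell: passes P when G = 0, N when G = 1."""
    return n if g else p


def gdi_and(a, b):
    return gdi_cell(a, p=0, n=b)      # Table 1 AND configuration


def gdi_or(a, b):
    return gdi_cell(a, p=b, n=1)      # Table 1 OR configuration


def th22_next(a, b, z):
    """Next output of TH22 built from three GDI cells (6 transistors)."""
    set_term = gdi_and(a, b)          # AB: asserts the output
    hold_term = gdi_or(a, b)          # A + B: keeps the output asserted
    return gdi_cell(z, p=set_term, n=hold_term)   # MUX selected by previous Z


if __name__ == "__main__":
    for a, b, z in product((0, 1), repeat=3):
        print(f"A={a} B={b} Z={z} -> {th22_next(a, b, z)}")
```

Selecting AB when Z = 0 and A + B when Z = 1 is equivalent to AB + Z(A + B), because AB implies A + B.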

2) Analysis of voltage swing for the GDI NCL gates: The major drawback of the above method is that the full output voltage swing cannot be obtained for all input combinations, which leads to a significant voltage drop at the output. This limitation arises from the way the inputs are applied to the GDI cell. Since the pMOS transistor is a strong pull-up device and the nMOS transistor is a strong pull-down device, applying any voltage other than VDD to the pMOS source, or other than GND to the nMOS source, degrades the level at the output (drain): the output settles at $V_{tp}$ for the pMOS and at (VDD − $V_{tn}$) for the nMOS transistor. Here, $V_{tp}$ and $V_{tn}$ represent the threshold voltages of the pMOS and nMOS transistors, respectively. This limitation can be explained for the GDI NCL TH22 gate above by theoretically examining the output voltages for all the input combinations, assuming that all the pMOS transistors have the same width and length and likewise all the nMOS transistors. The final output voltage for the different input combinations is as follows:

- A = 0, B = 0: the voltage at node N1 is $V_{tp}$ and the voltage at node N2 is $V_{tp}$. Assuming the previous state to be 0, the present output is greater than $V_{tp}$, which is a significant voltage drop, as shown in Fig. 3.
- A = 0, B = 1: the voltage at node N1 is $V_{tp}$ and the voltage at node N2 is VDD. Assuming the previous state was NULL, the present output is greater than $V_{tp}$, which again degrades performance significantly.
- A = 1, B = 0: the voltage at node N1 is zero and the voltage at node N2 is (VDD − $V_{tn}$). Assuming the previous state to be 0, the present output is $V_{tp}$, so there is a voltage drop at the output.
- A = 1, B = 1: the voltage at node N1 is (VDD − $V_{tn}$) and the voltage at node N2 is (VDD − $V_{tn}$). Assuming the previous state to be 0, the present output is (VDD − $V_{tn}$), so there is a voltage drop at the output.

Since NCL follows a hysteresis loop, in which the present output is fed back to compute the next result, this voltage drop also affects the subsequent results and degrades performance. It is therefore essential to address this limitation. To overcome the performance degradation and obtain a full-swing output voltage, a regenerative buffer is used at the output of every GDI-based NCL THmn gate. Compared to the basic method above, the regenerative buffer increases the transistor count but solves the problem of performance degradation.
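The degraded levels in this analysis follow from a first-order pass-transistor model, sketched below in Python. The supply and threshold values are hypothetical placeholders, the function names are illustrative, and second-order effects (body effect, leakage, finite settling time) are ignored.

```python
def pmos_pass(v_in, v_tp):
    """Level at the drain of a conducting pMOS (gate at 0 V).

    A pMOS passes high levels fully but cannot pull the output below |Vtp|.
    """
    return max(v_in, abs(v_tp))


def nmos_pass(v_in, vdd, v_tn):
    """Level at the drain of a conducting nMOS (gate at VDD).

    An nMOS passes low levels fully but cannot pull the output above VDD - Vtn.
    """
    return min(v_in, vdd - v_tn)


if __name__ == "__main__":
    VDD, V_TN, V_TP = 1.0, 0.3, 0.3   # hypothetical values, for illustration only
    print("pMOS passing 0 V   ->", pmos_pass(0.0, V_TP))
    print("pMOS passing VDD   ->", pmos_pass(VDD, V_TP))
    print("nMOS passing 0 V   ->", nmos_pass(0.0, VDD, V_TN))
    print("nMOS passing VDD   ->", nmos_pass(VDD, VDD, V_TN))
```

Under this model, lowering the threshold voltage of the pass transistors directly reduces the lost swing, which is the motivation for using low-threshold devices in the GDI cells.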

Fig. 3. Approximate voltage drop across GDI TH22 gate for A=0, B=0 input combination.

3) Leakage current: The current that flows between the drain and source of a transistor operating in the weak inversion region is called the subthreshold current. This subthreshold conduction is due to the diffusion current of the minority charge carriers and is given as ⁽¹⁴⁾:

(2)

$I_{S U B}=\frac{W}{L} K^{\prime}\left(1-e^{\frac{-V_{D S}}{V_{T}}}\right)\left(e^{\frac{V_{G S}-V_{t h}}{m V_{T}}}\right)$

where $I_{SUB}$ is a function of the transistor width (W), the transistor length (L), the temperature (through the thermal voltage $V_{T}$), the drain-source voltage ($V_{DS}$), the gate-source voltage ($V_{GS}$), the threshold voltage ($V_{th}$), and the process constants ($K'$ and m). Under weak inversion, the channel surface potential is almost constant across the channel, and the current flow is determined by the diffusion of minority carriers due to a lateral concentration gradient ⁽¹⁴⁾.

Gate leakage: The gate leakage current is due to the flow of electrons through the oxide. Fowler-Nordheim tunneling and direct tunneling are the two tunneling mechanisms responsible for the gate leakage ⁽¹⁴⁾. The gate leakage increases exponentially as the oxide thickness is reduced.

(3)

$J_{G T}=A E_{o x}^{2} \exp \left(\frac{-B}{E_{o x}}\left[1-\left(1-\frac{V_{o x}}{\phi_{o x}}\right)^{\frac{3}{2}}\right]\right)$

where $V_{ox}$ is the oxide layer potential, $t_{ox}$ is the oxide layer thickness, A and B are constants, and $E_{ox}$ is the electric field over the oxide layer, which is given by:

(4)

$E_{o x}=\frac{V_{o x}}{t_{o x}}$

The structure of the GDI cell provides a significant reduction in both the gate leakage current and the subthreshold leakage current compared to static CMOS gates ⁽¹⁴⁾. In static CMOS gates there is always a subthreshold leakage path for every possible input state, because the pull-up and pull-down networks are always connected to the supply voltage or ground. In GDI gates, by contrast, the connections of the pull-up and pull-down networks depend on the function being implemented.
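The exponential sensitivity of the subthreshold current in Eq. (2) to the threshold voltage can be seen with a small numerical sketch. The Python function below is a direct transcription of Eq. (2); every numeric value (including the two threshold voltages and the thermal voltage of about 26 mV near room temperature) is a hypothetical placeholder, not a gpdk45 parameter.

```python
import math


def subthreshold_current(w_over_l, k_prime, v_gs, v_ds, v_th, m, v_t=0.02585):
    """Evaluate Eq. (2) for the subthreshold drain current (arbitrary units)."""
    drain_term = 1.0 - math.exp(-v_ds / v_t)
    gate_term = math.exp((v_gs - v_th) / (m * v_t))
    return w_over_l * k_prime * drain_term * gate_term


if __name__ == "__main__":
    # Hypothetical off-state bias: V_GS = 0 V, V_DS = 1 V.
    common = dict(w_over_l=1.0, k_prime=1.0, v_gs=0.0, v_ds=1.0, m=1.5)
    low_vth = subthreshold_current(v_th=0.3, **common)
    high_vth = subthreshold_current(v_th=0.4, **common)
    print(f"leakage ratio (low Vth / high Vth): {low_vth / high_vth:.1f}")
```

In this form, each increase of the threshold voltage by $m V_{T}$ reduces the off-state current by a factor of e, which is why high-threshold devices leak far less than low-threshold ones.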

4) Multi-threshold techniques for reducing the threshold voltage drop: The voltage drop at the output of the GDI gates causes performance degradation. Regenerative inverters are used to avoid the voltage drop, but they increase the circuit area. Moreover, the use of cascaded inverters increases static power dissipation because of the increased $V_{GS}$ of the off transistors. This issue limited the use of GDI in older technologies ⁽¹⁴⁾. However, nanoscale processes offer the option of fabricating transistors with different thresholds on the same die, which can solve the above problems. The best solution is to use low-threshold transistors in the paths where a voltage drop is expected, coupled with regenerative inverters designed using high-threshold transistors. In static CMOS, the integration of low-threshold transistors in non-critical paths is usually avoided because of the increased subthreshold leakage ⁽¹⁴⁾. In GDI, the leakage currents are small, so combining low- and high-threshold transistors does not produce as large a leakage current as in static CMOS. Compared to static CMOS, the performance of an individual GDI gate is still degraded by the use of these transistors. However, to achieve the same functionality, the total path length from input to output is shorter in GDI (for most functions), which compensates for the degradation of individual gate performance ⁽¹⁴⁾.

5) Proposed generalized design approach for GDI NCL gates: By applying the GDI technique to asynchronous NCL designs in a nanoscale process, multi-threshold techniques can be used to reduce the threshold voltage drop. Along with the multi-threshold techniques, regenerative buffers/inverters eliminate the voltage drop by producing a full-swing voltage at the output. These two techniques can be used to design any NCL gate and hence any NCL circuitry. Designing the GDI AND, OR, and MUX cells with low-threshold transistors and the regenerative buffer with high-threshold transistors reduces not only the delay but also the power consumption, at the cost of an area overhead. As an example, the GDI-based NCL TH22 gate designed using the proposed method is illustrated in Fig. 4. The design proceeds as follows. First, the GDI AND and OR configurations are used to generate AB and A + B, and then a GDI MUX selects AB or A + B based on the value of Z: if Z = 0, AB is selected; otherwise, A + B is selected. These GDI AND, OR, and MUX cells are implemented using low-threshold transistors for low-power design. Then, the MUX result is passed through regenerative buffers designed using high-threshold transistors to produce a full-swing output. This efficiently reduces the transistor count and power consumption. The other NCL THmn gates are designed similarly using the GDI technique. The numbers of transistors required to implement the 27 NCL gates using static CMOS ⁽¹⁷⁾ and GDI techniques have been compared, and the GDI NCL gates offer a 13.5% reduction in transistor count on average. Thus, a GDI implementation of NCL circuits reduces the transistor count, which in turn decreases power consumption.

The proposed model is validated by realizing a variety of delay-insensitive NCL designs using GDI technology: a 4-bit ripple carry adder, an unpipelined 4x4 multiplier, a two-stage pipelined 4x4 NCL multiplier, and an unpipelined NCL ALU. Each design is described in detail below.

Fig. 4. Proposed GDI NCL TH22 gate.

3. Ripple Carry Adder

This section presents the GDI model of an NCL ripple carry adder (RCA). A GDI model of a 4-bit RCA is proposed, which uses the low-power GDI technique to realize the NCL gates. The results show that the proposed model performs better in terms of transistor count and static and dynamic power dissipation. To design the 4-bit NCL ripple carry adder, input-complete, optimized NCL full adders are used, sandwiched between two DI registers. The optimized NCL full adder is designed using two TH23 and TH34W2 gates. Fig. 7 depicts the proposed optimized GDI NCL full adder, in which the TH23 and TH34W2 gates are implemented using GDI technology. Fig. 5 depicts the transistor-level implementation of the TH23 gate using the GDI technique, where a restoration buffer is added at the output to restore the signal and avoid any voltage drop. For a low-power design, the whole circuit except the buffer is designed using low-threshold transistors. The buffer is realized using high-threshold transistors to restore the dropped voltage levels.

Fig. 5. GDI Implementation of TH23 gate.

Fig. 6. GDI Implementation of TH34W2 gate.

Fig. 7. GDI Model of Full Adder with DI Registers.

4. 4-Bit Multiplier

NCL multipliers are classified into non-pipelined and pipelined multipliers. In this paper, GDI models of 4-bit non-pipelined and pipelined NCL multipliers are proposed. In the GDI model, all the modules are implemented using the GDI technique to overcome the limitations of the static CMOS design. The proposed model provides the best performance in terms of power and area.

Fig. 8. GDI Model of Non-pipelined, 1-stage 4×4 multiplier.

Non-Pipelined Multiplier

Fig. 8 illustrates the proposed GDI model of the existing non-pipelined, 1-stage 4-bit multiplier ⁽⁶⁾, which uses the full-word completion version of the NCL multiplier design. To reduce the transistor count and dynamic power dissipation, all the modules of the existing CMOS non-pipelined multiplier are replaced with GDI modules, which results in a model consisting entirely of GDI-based gates. As depicted in Fig. 8, the GDI model consists of 8-bit GDI registers, incomplete GDI AND gates, a complete GDI AND gate, GDI half adders (GDI HA), and GDI full adders (GDI FA). I and C denote the "incomplete GDI AND" and "complete GDI AND" functions, respectively. The GDI multiplier also includes GENS7 and the completion component, denoted as COMP. The 8-bit GDI registers at the input and at the output are used to control the flow of DATA and NULL wavefronts, as shown in Fig. 8.

Fig. 9. GDI Model of 2-stage 4×4 multiplier.

2-Stage Pipelined Multiplier

The proposed GDI model of the existing 2-stage 4-bit multiplier ⁽⁶⁾ using full-word completion is depicted in Fig. 9. It consists of an 8-bit GDI register, an 8-bit CMOS register, a 12-bit GDI register, incomplete GDI AND (I), complete GDI AND (C), GDI half adders, and the GDI full adder (GDI FA). Here, a 12-bit GDI register is added between the HYBRID HA and the GDI FA of the proposed HYBRID non-pipelined, 1-stage 4-bit multiplier using full-word completion to obtain the 2-stage GDI 4-bit multiplier.

5. Hybrid Non-pipelined ALU

The logic diagram of the proposed non-pipelined dual-rail GDI ALU is shown in Fig. 10. The existing non-pipelined dual-rail ALU ⁽³⁾ is modified to obtain the proposed model, which gives better performance in terms of transistor count and power dissipation. It consists of dual-rail GDI registers, completion components (COMP), a GDI Convert to MEAG function, a GDI Demultiplexer, GDI NCL OR, GDI AND, GDI XOR, invert, shift right, and shift left functions, a GDI ripple-carry subtractor and adder, two GDI Multiplexers, and CMOS Carry Logic. The Convert to MEAG function converts the three dual-rail signals into an 8-rail MEAG signal. This conversion is carried out by the eight TH33 gates in the Convert to MEAG function. The invert, shift right, and shift left operations are done by renaming the signals and hence have no logic delay.

Fig. 10. Non-pipelined Dual-Rail GDI ALU.

The GDI ripple-carry subtractor and adder consist of four GDI full adders. Based on the select MEAG result, the GDI Demultiplexer selects the corresponding function. The GDI Demultiplexer is realized using GDI TH22 gates, which pass the A, B, and $C_{in}$/$B_{in}$ inputs, respectively. For the functions that do not require the B input, the GDI Demultiplexer is designed using GDI TH34 gates, which also ensure input completeness with respect to B. The CMOS Carry Logic, on the other hand, generates $C_{out}$ and provides input completeness with respect to the $C_{in}$/$B_{in}$ inputs. The CMOS Multiplexers consist of TH14 and TH12 gates, which produce a single result by OR-ing each rail of the demultiplexer signals.

IV. SIMULATION RESULTS

This section compares different NCL circuits implemented using CMOS and GDI technology. There are three types of CMOS models. In the high-threshold model (High $V_{th}$), the complete circuit is realized using only high-threshold transistors. In the low-threshold model (Low $V_{th}$), low-threshold transistors are used to realize the design. Lastly, standard-threshold transistors are used in the standard-threshold model (std $V_{th}$). Low-threshold transistors offer high speed but high power consumption, high-threshold transistors offer low power but high latency, and standard-threshold transistors provide medium delay and medium power dissipation. The GDI design is compared individually with all three CMOS designs. The comparison is based on the number of transistors and the static and dynamic power dissipation. The CMOS and GDI designs are realized in 45 nm technology using the Cadence proprietary generic process design kit (gpdk45). A process design kit contains the process technology and the information needed for device-level design in the Cadence environment. The schematics are implemented in the Cadence Virtuoso tool with VDD = 1 V and a temperature of 27 °C. The circuits are simulated with the Spectre simulator in Cadence Virtuoso using gpdk45 high- and low-threshold MOSFETs with a W/L ratio of 1. Note that all transistors in all designs, both CMOS and GDI, are minimum sized.

Table 2. Simulated Results of 4-Bit RCA using CMOS and GDI Technology

Design Technique	Static Power (nW)	Average Power (nW)	Dynamic Power (nW)	Transistor Count
CMOS model 1	0.588	14.01	13.42	1128
CMOS model 2	9.77	32.21	22.44	1128
CMOS model 3	1.01	17.46	16.45	1128
GDI model	1.63	13.79	12.16	960

Table 3. Performance Comparison of 4-Bit Unpipelined multiplier

Design Technique	Static Power (nW)	Average Power (nW)	Dynamic Power (nW)	Transistor Count
CMOS model 1	1.58	21.06	19.48	2040
CMOS model 2	15.8	45.83	30.03	2040
CMOS model 3	1.66	25.93	24.27	2040
GDI model	2.9	21.916	19.01	1760

Simulations were carried out on all possible input patterns to calculate the static and dynamic power dissipation. Dynamic power dissipation is the power dissipated in the transient state, when the transistors of the circuit are switching from one logic state to another. To compute the dynamic power, the average power over all the input patterns is measured first. The static power is then subtracted from the measured average power to obtain the dynamic power.
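The relation between the columns of Tables 2 and 3 (dynamic power = average power − static power) can be checked with a short Python sketch. The numbers are copied from the two tables; the function and variable names are illustrative only.

```python
# (static, average, dynamic) power in nW, copied from Tables 2 and 3.
TABLE_2_RCA = {
    "CMOS model 1": (0.588, 14.01, 13.42),
    "CMOS model 2": (9.77, 32.21, 22.44),
    "CMOS model 3": (1.01, 17.46, 16.45),
    "GDI model": (1.63, 13.79, 12.16),
}
TABLE_3_MULTIPLIER = {
    "CMOS model 1": (1.58, 21.06, 19.48),
    "CMOS model 2": (15.8, 45.83, 30.03),
    "CMOS model 3": (1.66, 25.93, 24.27),
    "GDI model": (2.9, 21.916, 19.01),
}


def dynamic_power(static_nw, average_nw):
    """Dynamic power is the measured average power minus the static power."""
    return average_nw - static_nw


def max_mismatch(table):
    """Largest gap (nW) between the recomputed and the tabulated dynamic power."""
    return max(abs(dynamic_power(s, avg) - dyn) for s, avg, dyn in table.values())


if __name__ == "__main__":
    for name, table in (("Table 2", TABLE_2_RCA), ("Table 3", TABLE_3_MULTIPLIER)):
        print(f"{name}: max mismatch = {max_mismatch(table):.3f} nW")
```

Any small mismatch reflects rounding of the tabulated values.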

A. 4-bit Ripple Carry Adder - CMOS vs GDI

The Ripple Carry Adder presented in this paper is designed using four different models: high-threshold, low-threshold, and standard-threshold CMOS models, and a GDI-based RCA model. In CMOS model 1 the whole circuit is designed using high-threshold transistors, CMOS model 2 is designed with low-threshold transistors, and CMOS model 3 uses standard-threshold transistors. In the GDI RCA model, the complete circuit (full adders and input and output registers) is designed using GDI technology. Table 2 shows the performance comparison of these designs in terms of power and transistor count. Simulations are carried out using input test vectors that cover all possible input combinations for a 4-bit RCA. The values tabulated in Table 2 are averages over all possible input combinations.
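An exhaustive stimulus set of this kind can be sketched as follows. This is an illustrative sketch only, not the authors' testbench: it assumes the adder has a carry-in (giving 2^9 operand combinations), uses the standard NCL dual-rail encoding with an assumed (rail1, rail0) ordering, and all function names are hypothetical.

```python
from itertools import product

# Standard NCL dual-rail states, written here as (rail1, rail0).
# The tuple ordering is an assumption made for this sketch.
NULL = (0, 0)   # no data present (spacer between DATA wavefronts)
DATA0 = (0, 1)  # logic 0
DATA1 = (1, 0)  # logic 1


def dual_rail(bit):
    """Encode one Boolean bit as a dual-rail DATA value."""
    return DATA1 if bit else DATA0


def rca_test_vectors(width=4, with_carry_in=True):
    """Yield every (a, b, cin) operand combination for a width-bit adder."""
    cins = (0, 1) if with_carry_in else (0,)
    yield from product(range(2 ** width), range(2 ** width), cins)


def encode_wavefront(a, b, cin, width=4):
    """Dual-rail encode a, then b (each LSB first), then the carry-in."""
    bits = [(a >> i) & 1 for i in range(width)]
    bits += [(b >> i) & 1 for i in range(width)]
    bits.append(cin)
    return [dual_rail(bit) for bit in bits]


vectors = list(rca_test_vectors())
null_wavefront = [NULL] * 9  # applied between consecutive DATA wavefronts
print(len(vectors), "DATA wavefronts, each followed by a NULL wavefront")
```

In an NCL simulation each DATA wavefront would be separated from the next by the all-NULL wavefront, so that every input returns to NULL before new data arrives.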

The GDI RCA model offers a 14% reduction in transistor count compared with all the CMOS models. Compared with the high-threshold, low-threshold, and standard-threshold CMOS models, the GDI model reduces dynamic power by 9.3%, 45.7%, and 30.30%, respectively.

B. 4-bit NCL Multiplier - CMOS vs GDI

The comparison between the CMOS and GDI designs can also be extended to multipliers. Two types of 4-bit NCL multipliers, an unpipelined multiplier and a pipelined multiplier, are designed, and their simulation results are discussed below.

1) 4-bit Unpipelined Multiplier: Four models of the unpipelined NCL multiplier are designed in this paper: three CMOS models and the GDI model. The results of the GDI model are compared with those of the CMOS models. As seen from Table 3, the GDI design gives the best performance in terms of the number of transistors used and dynamic power dissipation. The dynamic power is improved by 2.4%, 36.6%, and 21.6% compared with CMOS model 1, CMOS model 2, and CMOS model 3, respectively. The GDI model also offers a 13.7% reduction in transistor count compared with the CMOS models, thus reducing both dynamic power and area.
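The percentages quoted above follow from a simple relative-reduction formula applied to the Table 3 entries. The sketch below recomputes them; the input numbers are copied from Table 3, and the helper name `percent_reduction` and the constant names are hypothetical. The recomputed values should agree with the quoted figures to within roughly 0.1 percentage point, depending on how the last digit is rounded.

```python
def percent_reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Dynamic power (nW) and transistor counts copied from Table 3.
CMOS_DYNAMIC_NW = {
    "CMOS model 1": 19.48,
    "CMOS model 2": 30.03,
    "CMOS model 3": 24.27,
}
GDI_DYNAMIC_NW = 19.01
CMOS_TRANSISTORS = 2040
GDI_TRANSISTORS = 1760

for design, dynamic_nw in CMOS_DYNAMIC_NW.items():
    saving = percent_reduction(dynamic_nw, GDI_DYNAMIC_NW)
    print(f"GDI vs {design}: {saving:.1f}% lower dynamic power")

count_saving = percent_reduction(CMOS_TRANSISTORS, GDI_TRANSISTORS)
print(f"GDI vs CMOS: {count_saving:.1f}% fewer transistors")
```

A negative return value would indicate that the proposed design is worse than the baseline for that metric, which makes the same helper usable for comparisons where the GDI design does not win.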

Table 4. 4-Bit Two Stage Pipelined Multiplier Simulation Results for CMOS and GDI Technology

Design Technique	Static Power (nW)	Average Power (nW)	Dynamic Power (nW)	Transistor Count
CMOS model 1	2.17	28.34	26.17	2574
CMOS model 2	20.9	68.40	47.5	2574
CMOS model 3	1.23	33.915	32.685	2574
GDI model	3.72	29.56	25.84	2238

Table 5. Performance Comparison of Non-Pipelined Dual-Rail CMOS and GDI ALU

Design Technique	Static Power (nW)	Average Power (nW)	Dynamic Power (nW)	Transistor Count
CMOS model 1	1.95	19.116	17.16	4084
CMOS model 2	23.9	54.25	30.53	4084
CMOS model 3	3.09	24.55	21.435	4084
GDI model	5.54	23.96	18.42	3520

2) Two-Stage Pipelined Multiplier: The performance of the two-stage pipelined multiplier, designed using the three CMOS approaches and the GDI approach, is discussed below. To reduce power dissipation and area, a GDI model employing the low-power GDI technique is proposed. Table 4 presents the simulation results of the three CMOS models and the GDI model. The average power values are averaged over all possible input transitions of the 4-bit multiplier. As shown, the GDI pipelined multiplier reduces dynamic power by 1.2%, 45.6%, and 20.9% compared with CMOS model 1, CMOS model 2, and CMOS model 3, respectively. In addition, the transistor count is 13.4% lower than that of the CMOS pipelined multiplier designs.

C. Hybrid Non-pipelined ALU

The performance of the non-pipelined ALU, designed using the CMOS and GDI approaches, is discussed below. To keep threshold-voltage drops from propagating inside the circuit while retaining the low-power advantages of the GDI technique, a circuit built from GDI NCL gates is proposed. Table 5 presents the simulation results of the CMOS and GDI models. As shown, the GDI non-pipelined ALU reduces dynamic power dissipation by 39% and 14% compared with CMOS model 2 and CMOS model 3, respectively. However, the dynamic power of the GDI model is 6% higher than that of CMOS model 1. This difference arises from the threshold types of the transistors used: CMOS model 1 consists only of high-threshold transistors, which dissipate less power, whereas the GDI model uses both high- and low-threshold transistors, and the low-threshold transistors account for its higher power. The GDI model has a 13% lower transistor count than all the CMOS models.

IV. CONCLUSIONS

In this paper, a novel GDI NCL model is proposed to address the limitations of the existing CMOS NCL design. The GDI model contains modules implemented using the GDI technique. The main drawback of the CMOS NCL design is that it occupies a large area. To address this limitation, the modules of the NCL design are implemented using the GDI technique. GDI is a low-power design approach in which a wide range of complex logic functions can be implemented using only two transistors. Hence, the GDI approach reduces not only the power dissipation but also the transistor count.

However, when NCL gates are designed using the GDI technique, there is a considerable voltage drop at their outputs. This problem is addressed by using low-threshold transistors where a voltage drop is expected and high-threshold transistors for the regenerative inverters at the output. The proposed idea is implemented in various NCL circuits: the RCA, the unpipelined multiplier, the pipelined multiplier, and the unpipelined ALU. Compared with the CMOS designs, the GDI models have a lower transistor count and, in most comparisons, lower dynamic power dissipation.

Author

Prashanthi Metku is from Hyderabad, India.

She received her B.Tech degree in Electronic and Communication Engineering from Jawaharlal Nehru Technological University, Hyderabad, India, in 2011 and M.Tech degree in Electronic Engineering from Pondicherry University, India, in 2014.

She is currently pursuing her Ph.D. degree in Computer Engineering at Missouri University of Science and Technology, United States.

Her interests include CMOS circuit design and Error Correction Codes.

Kyung Ki Kim received his B.S. and M.S. degrees in Electronic Engineering from Yeungnam University, South Korea, in 1995 and 1997, respectively.

He was a Ph.D. candidate in Computer Science at Sogang University, South Korea, from 1997 to 1999, and received his Ph.D. degree in Computer Engineering from Northeastern University, Boston, USA, in 2008.

He was a member of technical staff with Sun Microsystems, Santa Clara, CA in 2008 and a senior researcher with Illinois Institute of Technology, Chicago, USA in 2009.

Since March 2010, he has been with the school of Electronic and Electrical Engineering, Daegu University, Korea, where he is currently an Associate Professor.

His current research focuses on neuromorphic architecture, high speed low power VLSI design, asynchronous design, electronic CAD and nano-electronics.

Yong-Bin Kim received the B.S. degree in electrical engineering from Sogang University, Seoul, Korea, the M.S. degree in electrical engineering from New Jersey Institute of Technology, Newark, NJ, USA, and the Ph.D. degree in electrical and computer engineering from Colorado State University, Fort Collins, CO, USA.

He was a member of the technical staff with Electronics and Telecommunications Research Institute(ETRI), Daejon, Korea from 1982 to 1987.

He was a Senior Design Engineer with Intel Corp., Hillsboro, OR, USA, from 1990 to 1993, involved in Intel Pentium Pro CPU chip design.

He was a Member of Technical Staff with Hewlett Packard Co., Fort Collins, CO, USA from 1993 to 1996, involved in HP PA-8000 RISC microprocessor chip design.

He was a Staff Engineer with Sun Microsystems, Palo Alto, CA, USA from 1996 to 1998, involved in 1.5 GHz Ultra Sparc5 CPU chip design.

He was an Assistant Professor with the Department of Electrical and Computer Engineering of the University of Utah, Salt Lake City, UT, USA from 1998 to 2000.

He is currently a Professor with the Department of Electrical and Computer Engineering at Northeastern University, Boston, MA, USA.

His research focuses on low-power analog and digital circuit design as well as high-speed low-power VLSI circuit design and methodology.

Minsu Choi received his B.S., M.S. and Ph.D. degrees in Computer Science from Oklahoma State University in 1995, 1998 and 2002, respectively.

He is currently an associate professor of Electrical and Computer Engineering at Missouri University of Science & Technology (Missouri S&T).

His research mainly focuses on Computer Architecture & VLSI, Crypto-hardware design, Nanoelectronics, Embedded Systems, Fault Tolerance, Testing, Quality Assurance, Reliability Modeling and Analysis, Configurable Computing, Parallel & Distributed Systems and Dependable Instrumentation & Measurement.

He has won two outstanding teaching awards at MST in 2008 and 2009.

He is a senior member of IEEE and a member of Golden Key National Honor Society and Sigma Xi.