
  1. (Department of EE, Incheon National University, Incheon 22012, Korea)



Charge saving and sharing circuit, in-memory computing, full adder, MRAM

I. INTRODUCTION

Over the past few decades, the volume of data being processed and stored has increased significantly. One of the most severe bottlenecks in conventional von Neumann computer architectures is the limited data bandwidth between the processor and memory [1-3]. Furthermore, data transfer between the processor and memory incurs high latency and energy consumption, which significantly degrades system performance and efficiency. This bandwidth limitation, known as the “memory wall,” has also increased data-movement overhead and leakage current [4]. In-memory computing (IMC), an idea proposed several decades ago, aims to address these challenges by incorporating processing units directly into the memory [5]. The fundamental concept is to preprocess data in memory and provide only intermediate results to the processor [2]. Such an architecture not only reduces data-transfer traffic and power overhead but also improves performance by executing simple logical operations within the memory [1].

In recent years, the emergence of new non-volatile memories (NVMs), such as resistive random access memory (RRAM), phase-change random access memory (PRAM), and spin-transfer torque magnetic random access memory (STT-MRAM), has opened up new possibilities for the efficient implementation of IMC [6]. The resistance-based storage mechanism of these NVM devices enables energy-efficient logic computation within the memory itself: logical operations can be performed, and the results can be stored in non-volatile form on the memory chip [7]. Among these NVMs, STT-MRAM has garnered significant attention, with various prototype demonstrations and early commercial products [2]. Extensive research efforts have been dedicated to improving the efficiency of STT-MRAM at the device, circuit, and architectural levels [6, 8-10]. In this paper, we explore IMC based on STT-MRAM.

Numerous STT-MRAM-based IMC approaches have been proposed at the architectural level [2,11]. The ability to activate multiple word lines (WLs) within a memory array simultaneously can be leveraged to execute various arithmetic, logic, and vector operations [12,13]. Activating memory cells concurrently enables AND and OR operations in a single stage using a pre-charge sense amplifier (PCSA) [11]. Furthermore, a full adder (FA) can be realized by integrating a logic tree into the PCSA [11]. However, such a multi-bit FA requires n + 1 stages to perform an n-bit addition. Although digital carry-lookahead (parallel-prefix) adders such as the Kogge-Stone adder (KSA), Brent-Kung adder, and Sklansky adder can significantly reduce the number of stages, they entail a large area overhead and are therefore unsuitable for memory arrays. Hence, to reduce the number of stages while keeping the overhead within a memory array small, analog circuits are preferable to digital circuits.

In this study, we propose a high-performance multi-bit FA that incorporates a charge saving and sharing (CSS) circuit operating in the analog domain [14]. Similar to a carry-skip adder, we precompute the carry for every 4 bits so that the 4-bit sum operations can be computed in parallel [15]. The carry for every 4 bits is computed with the CSS circuit, while each 4-bit sum operation is performed by the PCSA with an integrated logic tree [11]. As a result, the proposed method reduces the required number of stages from n + 1 to n/4 + 5 while keeping the area overhead small.

The remainder of this paper is structured as follows: Section II provides the background information on STT-MRAM and PCSA; Section III describes the implementation of the state-of-the-art multi-bit FA and the proposed multi-bit FA using the CSS circuit; Section IV presents the simulation results; and finally, Section V offers the conclusion.

II. BACKGROUND

1. STT-MRAM

Fig. 1(a) illustrates a magnetic tunnel junction (MTJ), the fundamental storage element of STT-MRAM. The MTJ comprises a free layer, a tunnel barrier, and a pinned layer. Commonly used tunnel-barrier materials include AlOx and MgO, while the free- and pinned-layer stacks typically use materials such as CoFeB, CoFe, Ru, and PtMn [16].

Fig. 1(b) shows the two states, parallel (P) and anti-parallel (AP), which are determined by the magnetization direction of the free layer relative to the pinned layer. Owing to the tunneling magneto-resistance (TMR) effect, the MTJ exhibits a different resistance in the P and AP states [17]; the TMR ratio is defined as

(1)
$ \mathrm{TMR}=\frac{\mathrm{R}_{\mathrm{H}}-\mathrm{R}_{\mathrm{L}}}{\mathrm{R}_{\mathrm{L}}}\times 100\mathrm{\% } $

The P state has low resistance (RL) and represents data ‘1’, whereas the AP state has high resistance (RH) and represents data ‘0’. Fig. 1(c) depicts the single 1T-1MTJ bit-cell configuration of STT-MRAM. During a write operation, ‘1’ is written by passing current from the bit line (BL) to the source line (SL), and ‘0’ is written by passing current from SL to BL.
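As a minimal numerical illustration of Eq. (1) and of the data encoding just described, the Python sketch below computes the TMR ratio and maps an MTJ state to its stored bit. The resistance values R_L = 3 kΩ and R_H = 6 kΩ are hypothetical placeholders chosen for readability, not device parameters from this work.

```python
# Hypothetical MTJ resistances for illustration only (not taken from this paper).
R_L = 3_000.0  # ohms, parallel (P) state
R_H = 6_000.0  # ohms, anti-parallel (AP) state


def tmr_percent(r_h: float, r_l: float) -> float:
    """TMR ratio of Eq. (1), in percent."""
    return (r_h - r_l) / r_l * 100.0


def stored_bit(state: str) -> int:
    """Encoding used in this paper: P (low resistance) -> 1, AP (high resistance) -> 0."""
    if state == "P":
        return 1
    if state == "AP":
        return 0
    raise ValueError(f"unknown MTJ state: {state!r}")


print(tmr_percent(R_H, R_L))              # 100.0
print(stored_bit("P"), stored_bit("AP"))  # 1 0
```

With these placeholder values the TMR ratio is 100%; substituting measured RL and RH values gives the actual ratio of a given device.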

Fig. 1. (a) MTJ; (b) Two states of MTJ; (c) 1T-1MTJ bit-cell structure of STT-MRAM.

2. PCSA

The PCSA depicted in Fig. 2 can execute read, AND/OR, carry, and sum operations [11]. The logic tree within the PCSA is used specifically for the FA operation. As listed in Table 1, L0 and L1 remain high for all operations except the sum (i.e., FA) operation.

Fig. 2. PCSA with the addition of a logic tree [11].
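Table 1. Control signals for read, AND, OR, carry, and sum operations [11]

| Operation | L0 | /L1 | L1 | /L0 | L2 | L3 |
|---|---|---|---|---|---|---|
| Read | 1 | 0 | 1 | 0 | 0 | 0 |
| AND | 1 | 0 | 1 | 0 | 1 | 0 |
| OR | 1 | 0 | 1 | 0 | 0 | 1 |
| Carry | 1 | 0 | 1 | 0 | /CIN | CIN |
| Sum | CIN | /COUT | COUT | /CIN | CIN | /CIN |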

A. Read Operation [13,18]

Fig. 3(a) shows the circuit for the read operation, in which L2 and L3 are deactivated, as indicated in Table 1. During the read operation, the PCSA compares the selected data cell (RL or RH) with a reference cell (RREF). RREF has a resistance value between RL and RH, as depicted in Fig. 4(a). The result of the read operation is shown in Fig. 5(a).

Fig. 3. (a) Circuit for read operation; (b) Circuit for AND and OR operations [19,20].
Fig. 4. (a) Resistance distribution of RL, RH, and RREF [21,22]; (b) Resistance distribution when RL, RH, and RREF are connected in parallel [11].
Fig. 5. (a) Results of read operation according to MTJ state; (b) Result of AND operation according to MTJ 'A' and 'B' states; (c) Results of OR operation based on MTJ 'A' and 'B' states [23].

B. AND and OR Operations [1,24]

A key approach for performing bitwise logic operations in an STT-MRAM macro is to form combinations of cell resistances and distinguish between them. By enabling two WLs of the array in Fig. 2 simultaneously, two MTJs are connected in parallel, as shown in Fig. 3(b), which extends the set of resistive states. Fig. 4(b) illustrates the resistance distributions of RL${\parallel}$RL, RH${\parallel}$RL, and RH${\parallel}$RH for two MTJs connected in parallel, along with the reference resistances used to distinguish these three values. These resistance combinations are connected to the PCSA: when only L2 is activated on the reference branch, OUT represents the AND operation, whereas when only L3 is activated, OUT represents the OR operation.
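The read, AND, and OR decisions described in this section can be captured by a purely behavioral Python sketch, assuming an ideal sense amplifier that outputs ‘1’ whenever the cell path is less resistive than the reference path. The resistance values and the midpoint placement of the references are hypothetical; in the PCSA of Fig. 2 the AND and OR references are selected with L2 and L3, whereas here they are simply constants (R_REF_AND, R_REF_OR), and the PCSA's discharge mechanism is not modeled.

```python
# Behavioral sketch of resistance-based read, AND, and OR sensing.
# All resistance values are hypothetical; data '1' corresponds to low resistance (RL).
R_L = 3_000.0  # ohms
R_H = 6_000.0  # ohms


def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two MTJs connected in parallel (two WLs enabled)."""
    return r1 * r2 / (r1 + r2)


def resistance(bit: int) -> float:
    """Map a stored bit to its MTJ resistance."""
    return R_L if bit else R_H


# Hypothetical references placed midway between adjacent resistance levels.
R_REF_READ = (R_L + R_H) / 2                                  # between RL and RH
R_REF_AND = (parallel(R_L, R_L) + parallel(R_L, R_H)) / 2     # between RL||RL and RL||RH
R_REF_OR = (parallel(R_L, R_H) + parallel(R_H, R_H)) / 2      # between RL||RH and RH||RH


def sense(r_cell: float, r_ref: float) -> int:
    """Ideal SA: output 1 when the cell path is less resistive than the reference."""
    return 1 if r_cell < r_ref else 0


def read(a: int) -> int:
    return sense(resistance(a), R_REF_READ)


def and_op(a: int, b: int) -> int:
    return sense(parallel(resistance(a), resistance(b)), R_REF_AND)


def or_op(a: int, b: int) -> int:
    return sense(parallel(resistance(a), resistance(b)), R_REF_OR)


print("A B AND OR")
for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_op(a, b), or_op(a, b))
```

Only the reference changes between the two logic operations: the AND reference sits below RL${\parallel}$RH, so only the all-‘1’ combination falls under it, while the OR reference sits above RL${\parallel}$RH, so any combination containing a ‘1’ does.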

III. MULTI-BIT FA

1. State-of-the-art Multi-bit FA [11]

Several papers have proposed the use of a PCSA for sum operations [11-13]. The sum operation proposed by Wang et al. [11] is executed using the PCSA equipped with the logic tree illustrated in Fig. 2.

A. Carry Operation

Fig. 6(a) shows the single-bit carry operation. The carry result, COUT, is given by the majority function MAJ(A, B, CIN), where MAJ(A, B, 0) reduces to the AND operation (i.e., AND(A, B)) and MAJ(A, B, 1) reduces to the OR operation (i.e., OR(A, B)). Accordingly, the carry is obtained by applying /CIN to L2 and CIN to L3, as listed in Table 1. In the figure, the red and blue paths correspond to the AND and OR operations, respectively.
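The reduction of MAJ(A, B, CIN) to AND or OR can be checked exhaustively with a short Python sketch. Here `maj` is a hypothetical helper implementing the Boolean majority function; it is not a component of the circuit in [11].

```python
from itertools import product


def maj(*bits: int) -> int:
    """Boolean majority: 1 if more than half of the inputs are 1."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def carry(a: int, b: int, cin: int) -> int:
    """COUT = MAJ(A, B, CIN); CIN selects AND (CIN = 0) or OR (CIN = 1)."""
    return maj(a, b, cin)


# CIN = 0 reduces MAJ to AND(A, B); CIN = 1 reduces it to OR(A, B).
for a, b in product((0, 1), repeat=2):
    assert carry(a, b, 0) == (a & b)
    assert carry(a, b, 1) == (a | b)
print("MAJ(A, B, CIN) matches AND/OR selection for all inputs")
```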

Fig. 6. Single-bit FA using PCSA: (a) Carry operation (red path when CIN= 0 and blue path when CIN= 1); (b) Sum operation when CIN= 0 (red path when COUT= 0 and blue path when COUT= 1); (c) Sum operation when CIN= 1 (red path when COUT= 0 and blue path when COUT= 1).

B. Sum Operation

The sum result is given by the five-input majority function MAJ(A, B, CIN, /COUT, /COUT), which is implemented by the control signals in Table 1: L0, /L1, L1, and /L0 are driven by CIN, /COUT, COUT, and /CIN, respectively. In Fig. 6(b) (CIN = 0), the red path represents the case where MAJ(A, B, 0, 1, 1) reduces to OR(A, B), and the blue path represents the case where MAJ(A, B, 0, 0, 0) evaluates to 0. In Fig. 6(c) (CIN = 1), the red path MAJ(A, B, 1, 1, 1) yields 1, and the blue path MAJ(A, B, 1, 0, 0) yields AND(A, B). The sum can therefore be obtained using the logic tree or by reusing the AND and OR operations. Because the sum operation requires COUT, COUT must be obtained in the preceding stage so that the sum can be computed in the following stage.
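The identity between the five-input majority MAJ(A, B, CIN, /COUT, /COUT) and the full-adder sum can likewise be verified over all eight input combinations. The Python sketch below is a Boolean model only; the helper names `maj` and `full_add` are illustrative.

```python
from itertools import product


def maj(*bits: int) -> int:
    """Boolean majority: 1 if more than half of the inputs are 1."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def full_add(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (SUM, COUT) using only majority functions."""
    cout = maj(a, b, cin)                   # carry, available from the preceding stage
    s = maj(a, b, cin, 1 - cout, 1 - cout)  # SUM = MAJ(A, B, CIN, /COUT, /COUT)
    return s, cout


# Compare against the Boolean full-adder definition for all 8 input combinations.
for a, b, cin in product((0, 1), repeat=3):
    assert full_add(a, b, cin) == (a ^ b ^ cin, (a + b + cin) // 2)
print("MAJ-based sum matches A xor B xor CIN")
```

Intuitively, when COUT = 1 the two /COUT inputs contribute no ‘1’s, so all three of A, B, and CIN must be ‘1’ for the sum to be ‘1’; when COUT = 0 they contribute two ‘1’s, so a single ‘1’ among A, B, and CIN suffices.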

Fig. 7(a) shows the schematic of the state-of-the-art multi-bit FA [11], and Fig. 7(b) illustrates the SAE signal for the PCSA. As Fig. 7(c) shows, the sum operation for the current bit and the carry operation for the next bit are executed concurrently. The final sum bit, Sn, is therefore obtained at stage n + 1.
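The stage schedule of Fig. 7(c) can be sketched functionally in Python, assuming one carry evaluation and one sum evaluation per stage as described above. The function name `ripple_fa_schedule` and the LSB-first bit-list representation are illustrative choices, and timing and the SAE signal are not modeled. Running it on the 16-bit operands used in Section IV reproduces both the result and the 17-stage count.

```python
def maj(*bits: int) -> int:
    """Boolean majority: 1 if more than half of the inputs are 1."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def ripple_fa_schedule(a_bits: list[int], b_bits: list[int], cin: int):
    """Simulate the stage schedule of the PCSA-based multi-bit FA.

    Bits are LSB first. In stage t (1-based), the carry of bit t is computed
    while the sum of bit t-1 is computed from the carry of the previous stage.
    Returns (sum bits LSB first, final carry-out, number of stages).
    """
    n = len(a_bits)
    carries = [cin]  # carries[i] is the carry into bit i (0-based)
    sums = [0] * n
    stages = 0
    for t in range(1, n + 2):  # stages 1 .. n+1
        stages += 1
        if t >= 2:  # sum of bit t-1, using the carry-out produced in stage t-1
            i = t - 2
            cout = carries[i + 1]
            sums[i] = maj(a_bits[i], b_bits[i], carries[i], 1 - cout, 1 - cout)
        if t <= n:  # carry of bit t
            i = t - 1
            carries.append(maj(a_bits[i], b_bits[i], carries[i]))
    return sums, carries[n], stages


def to_bits_lsb_first(s: str) -> list[int]:
    return [int(c) for c in reversed(s.replace(" ", ""))]


a = to_bits_lsb_first("1011 0111 1010 1100")
b = to_bits_lsb_first("0100 0011 0111 1001")
sums, cout, stages = ripple_fa_schedule(a, b, 1)
result = str(cout) + "".join(str(bit) for bit in reversed(sums))
print(result, stages)  # 01111101100100110 17
```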

Fig. 7. (a) Schematic of multi-bit FA [11]; (b) SAE signal for the PCSA; (c) Result of multi-bit FA according to the number of stages.

2. Proposed Multi-bit FA using CSS Circuit

Fig. 8(a) shows the array structure of the proposed multi-bit FA. Inputs A and B can be read simultaneously by closing the switch, or separately by opening it. Fig. 8(b) shows the schematic of the CSS circuit, which stores charge on its capacitors and shares that charge among them when the switches are closed.

Fig. 8. (a) Array structure for the proposed multi-bit FA; (b) Schematic of the CSS circuit.
Fig. 9. (a) 1 stage operation; (b) 1.5 stage operation; (c) 2 stage operation; (d) Result of SA as a function of stage; (e) SAE signal.

To obtain COUT(x+3) from the 4-bit addition A(x+3)A(x+2)A(x+1)A(x) + B(x+3)B(x+2)B(x+1)B(x) + CIN, the voltages VCIN, VA(x), VB(x), VA(x+1), VB(x+1), VA(x+2), VB(x+2), VA(x+3), and VB(x+3) are stored as VCAP1 through VCAP9, respectively. Each capacitor of the CSS circuit is sized according to the binary weight of the bit it stores:

(2)
$ CAP1=CAP2=CAP3 $
(3)
$ CAP4=CAP5=2CAP1 $
(4)
$ CAP6=CAP7=2^{2}CAP1 $
(5)
$ CAP8=CAP9=2^{3}CAP1 $

Relative to CAP1, CAP2, and CAP3, which store CIN and the least significant bits, the capacitors for the second bit are 2 times, those for the third bit 4 times, and those for the fourth bit 8 times as large. When all the switches are closed, charge sharing brings all the capacitors to the same voltage, VCSS:

(6)
$ V_{CSS}=\frac{\sum_{i=1}^{9}\left(V_{CAPi}\times CAPi\right)}{\sum_{i=1}^{9}CAPi} $
(7)
$ V_{REF}=\mathrm{VDD}\times \frac{CAP2+CAP4+CAP6+CAP8}{\sum_{i=1}^{9}CAPi} $
Fig. 10. (a) FA operation in parallel by 4 bits; (b) 4-bit adder; (c) Result as per stage.

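VREF is the reference voltage against which the SA compares VCSS to produce its output, OUT. With a logic ‘1’ stored as VDD and a ‘0’ as 0 V (as implied by Eq. (7)), Eqs. (2)-(6) give a total capacitance of 31 × CAP1 and VCSS = VDD × (A + B + CIN)/31, where A and B are the 4-bit group values; VREF = VDD × 15/31 therefore corresponds to a group sum of 15, the largest sum that does not produce a carry. COUT(x+3) is then read using the latch-type SA [25,26], as depicted in Fig. 8(b).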
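Under idealized assumptions (perfectly matched capacitors, a ‘1’ stored as VDD and a ‘0’ as 0 V, no charge loss, and an offset-free SA), Eqs. (2)-(7) can be modeled directly. The Python sketch below uses exact rational arithmetic (`fractions.Fraction`) and checks all 512 input combinations against the carry of an ordinary 4-bit addition. With Eq. (7) as written, a group sum of exactly 15 makes VCSS equal to VREF, so this sketch resolves that case as COUT = 0 through a strict comparison; the function names are illustrative.

```python
from fractions import Fraction

# Relative capacitor sizes from Eqs. (2)-(5), in units of CAP1, ordered as
# [CIN, A(x), B(x), A(x+1), B(x+1), A(x+2), B(x+2), A(x+3), B(x+3)].
CAP = [1, 1, 1, 2, 2, 4, 4, 8, 8]
VDD = Fraction(1)  # normalized supply; assumption: '1' stored as VDD, '0' as 0 V


def v_css(cin: int, a4: list[int], b4: list[int]) -> Fraction:
    """Charge-shared voltage of Eq. (6); a4 and b4 are the group's bits, LSB first."""
    stored = [cin]
    for a, b in zip(a4, b4):
        stored += [a, b]
    v_cap = [VDD * bit for bit in stored]
    return sum(v * c for v, c in zip(v_cap, CAP)) / sum(CAP)


def v_ref() -> Fraction:
    """Reference voltage of Eq. (7): CAP2 + CAP4 + CAP6 + CAP8 charged to VDD."""
    return VDD * (CAP[1] + CAP[3] + CAP[5] + CAP[7]) / sum(CAP)


def css_group_carry(cin: int, a4: list[int], b4: list[int]) -> int:
    """Ideal SA decision: COUT(x+3) = 1 when VCSS exceeds VREF."""
    return 1 if v_css(cin, a4, b4) > v_ref() else 0


def bits4(value: int) -> list[int]:
    """4-bit value to LSB-first bit list."""
    return [(value >> k) & 1 for k in range(4)]


# Exhaustive check: the comparison reproduces the carry out of a 4-bit addition.
for a in range(16):
    for b in range(16):
        for cin in (0, 1):
            assert css_group_carry(cin, bits4(a), bits4(b)) == int(a + b + cin >= 16)
print("VREF =", v_ref())  # VREF = 15/31
```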

Fig. 9 illustrates the process of calculating COUT for every 4 bits. In Fig. 9(a), which represents stage 1, A1-A4 and B1-B4 are read using the PCSA, and the read values, along with CIN, are stored on the capacitors of the CSS circuit. Fig. 9(b) corresponds to stage 1.5, in which the switches of the CSS circuit are closed to obtain VCSS, the shared voltage across the capacitors. Fig. 9(c) depicts stage 2: using the VCSS obtained in stage 1.5, the SA produces COUT4 (= C4, the carry-out of the fourth bit). At the same time, A5-A8 and B5-B8 are read using the PCSA and stored in the CSS circuit along with COUT4. Repeating this process for each group of 4 bits yields the result shown in Fig. 9(d).

Once the COUT values for every 4 bits have been obtained through the CSS circuit, the 4-bit adders depicted in Fig. 10(a) and (b) perform the sum operations in parallel, 4 bits at a time. The resulting sum values are shown in Fig. 10(c). In total, all the sum operations are completed within only n/4 + 5 stages.
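At a functional level (ignoring timing), the two phases can be combined in a Python sketch: a sequential chain of CSS carry decisions, one per 4-bit group, followed by group sums that depend only on their own carry-in and can therefore be evaluated in parallel. The internal structure of the 4-bit adder in Fig. 10 is not reproduced; as an assumption, each group sum is modeled by rippling the majority-based full adder. The CSS decision is expressed in integer charge units of CAP1 × VDD, which is equivalent to comparing Eqs. (6) and (7) because both share the same denominator.

```python
# Functional model of the proposed CSS-based multi-bit FA (timing not modeled).
CAP = [1, 1, 1, 2, 2, 4, 4, 8, 8]  # Eqs. (2)-(5): [CIN, A(x), B(x), ..., A(x+3), B(x+3)]


def maj(*bits: int) -> int:
    """Boolean majority: 1 if more than half of the inputs are 1."""
    return 1 if 2 * sum(bits) > len(bits) else 0


def css_group_carry(cin: int, a4: list[int], b4: list[int]) -> int:
    """Ideal CSS decision: stored charge (units of CAP1*VDD) vs. the Eq. (7) reference."""
    stored = [cin] + [bit for pair in zip(a4, b4) for bit in pair]
    charge = sum(bit * cap for bit, cap in zip(stored, CAP))
    reference = CAP[1] + CAP[3] + CAP[5] + CAP[7]  # CAP2 + CAP4 + CAP6 + CAP8 = 15
    return 1 if charge > reference else 0


def four_bit_sum(a4: list[int], b4: list[int], cin: int) -> list[int]:
    """Assumed 4-bit adder model: ripple of majority-based full adders (LSB first)."""
    sums, carry = [], cin
    for a, b in zip(a4, b4):
        cout = maj(a, b, carry)
        sums.append(maj(a, b, carry, 1 - cout, 1 - cout))
        carry = cout
    return sums


def css_multibit_add(a_bits: list[int], b_bits: list[int], cin: int) -> tuple[list[int], int]:
    """Add two LSB-first bit lists whose common length is a multiple of 4."""
    n = len(a_bits)
    if n % 4 or len(b_bits) != n:
        raise ValueError("operands must have equal length, a multiple of 4")
    group_carries = [cin]  # carry into each 4-bit group, ending with the final carry-out
    for g in range(0, n, 4):  # sequential: one CSS evaluation per group
        group_carries.append(css_group_carry(group_carries[-1], a_bits[g:g + 4], b_bits[g:g + 4]))
    sums: list[int] = []
    for k, g in enumerate(range(0, n, 4)):  # independent groups: parallel in hardware
        sums += four_bit_sum(a_bits[g:g + 4], b_bits[g:g + 4], group_carries[k])
    return sums, group_carries[-1]


def to_bits_lsb_first(s: str) -> list[int]:
    return [int(c) for c in reversed(s.replace(" ", ""))]


a = to_bits_lsb_first("1011 0111 1010 1100")
b = to_bits_lsb_first("0100 0011 0111 1001")
sums, cout = css_multibit_add(a, b, 1)
print(str(cout) + "".join(str(bit) for bit in reversed(sums)))  # 01111101100100110
```

The output matches the 16-bit result reported for Fig. 14, which is expected because the group carries produced by the CSS model equal the true 4-bit carries.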

IV. SIMULATION

The efficiency of the proposed MRAM-based IMC platform was evaluated by Cadence Spectre simulations with industry-compatible 28-nm model parameters.

Fig. 11 shows the read yield as a function of MTJ variation when STT-MRAM is read with the PCSA. The read yield decreases sharply as the MTJ variation increases. Because the proposed CSS circuit can also be used with SAs other than the PCSA, the read yield can be improved by employing an offset-canceling current-sampling SA [27], a single-cap offset-cancelled SA [28], or an offset-canceling single-ended SA [29], or by using a sensing circuit (SC) as a pre-amplifier for the STT-MRAM, such as the source-degeneration SC [30] or the body-voltage SC [31].

Fig. 11. Read yield based on MTJ variation.

Capacitance mismatch can affect the accuracy of the calculation results. As listed in Table 2, the CSS circuit operates correctly for mismatches of up to 8%, whereas from 9% onward the result is inverted.

Fig. 12 shows the performance as a function of the number of bits in the adder. As n increases, the advantage over the state-of-the-art multi-bit FA [11] grows; at n = 64, the number of stages is reduced by a factor of more than 3. As listed in Table 3, compared to the state-of-the-art multi-bit FA [11], the proposed multi-bit FA using the CSS circuit requires about 2 times the area and 1.6 times the energy. It therefore becomes advantageous over [11] from 16 bits onward, where the number of stages is roughly halved.
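The stage counts behind Fig. 12 and the overhead ratios quoted from Table 3 can be reproduced with simple arithmetic. The Python sketch below encodes the n + 1 and n/4 + 5 formulas stated in this paper (assuming n is a multiple of 4) and prints the stage ratio for several adder widths, together with the 16-bit area and energy ratios taken from Table 3.

```python
def stages_state_of_the_art(n: int) -> int:
    """Stage count of the PCSA-based ripple multi-bit FA [11]: n + 1."""
    return n + 1


def stages_css(n: int) -> int:
    """Stage count of the proposed CSS-based multi-bit FA: n/4 + 5 (n a multiple of 4)."""
    if n % 4:
        raise ValueError("n must be a multiple of 4")
    return n // 4 + 5


for n in (8, 16, 32, 64):
    ref, css = stages_state_of_the_art(n), stages_css(n)
    print(f"n={n:2d}: {ref:2d} vs {css:2d} stages, ratio {ref / css:.2f}")

# Overheads of the proposed design relative to [11], from Table 3 (16-bit case).
area_ratio = 6.09 / 2.92      # total size
energy_ratio = 969.3 / 598.7  # energy consumption
print(f"area x{area_ratio:.2f}, energy x{energy_ratio:.2f}")
```

For n = 16 the formulas give 17 and 9 stages (a ratio of about 1.9), and for n = 64 they give 65 and 21 (about 3.1), consistent with the trends described above.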

Fig. 12. Ratio of the stage count of the state-of-the-art multi-bit FA [11] to that of the proposed multi-bit FA using the CSS circuit, as a function of the number of bits.

The 16-bit values of A (A16-A1), B (B16-B1), and CIN are set to “1011 0111 1010 1100”, “0100 0011 0111 1001”, and “1”, respectively. Fig. 13 shows the results of the state-of-the-art multi-bit FA [11], and Fig. 14 shows those of the proposed multi-bit FA using the CSS circuit. Both produce the correct result. The state-of-the-art multi-bit FA [11] requires 17 stages for this operation, whereas the proposed multi-bit FA using the CSS circuit completes it in only 9 stages. Thus, for a 16-bit addition, incorporating the CSS circuit into the existing multi-bit FA roughly halves the number of required stages, from 17 to 9.

Fig. 13. 16-bit results from the state-of-the-art multi-bit FA [11]. “1011 0111 1010 1100” (A16-A1) + “0100 0011 0111 1001” (B16-B1) + “1” (CIN) = “0 1111 1011 0010 0110” (C16 S16-S1).
Fig. 14. 16-bit results of the proposed multi-bit FA using CSS circuit. “1011 0111 1010 1100” (A16-A1) + “0100 0011 0111 1001” (B16-B1) + “1” (CIN) = “0 1111 1011 0010 0110” (C16 S16-S1).

Table 3 compares the performance, energy consumption, and area of the three multi-bit FAs for a 16-bit addition. The evaluation parameters include the number of stages, the number and size of PCSAs with logic trees, the number and size of additional transistors, the number of memory read operations, and the energy consumption. The state-of-the-art multi-bit FA [11] offers the best area efficiency and the lowest energy consumption; however, it requires a large number of stages (poor performance). Although the KSA significantly reduces the number of stages, its large area overhead prevents it from being incorporated into the memory array. Other digital circuits, such as carry-lookahead, carry-select, and carry-skip adders, face similar area overhead challenges and likewise cannot be included in the memory array. To address this issue, the overhead must be kept small while the performance is improved, which can be achieved by leveraging the analog domain instead of the digital domain [34]. Compared to the state-of-the-art multi-bit FA, the proposed multi-bit FA with the CSS circuit requires approximately half the number of stages. It also uses far fewer additional transistors than the multi-bit FA with KSA (104.5 versus 2982). However, it requires more memory read operations than the other two multi-bit FAs (56 versus 32). The energy consumed by the capacitors is only 22.56 fJ, or 2.3% of the total energy consumption; the increase in energy consumption is therefore due mainly to the larger number of read operations. In summary, the proposed multi-bit FA, which exploits the analog domain, offers performance between that of the other two FAs while effectively addressing the area overhead problem of digital-domain designs. Nevertheless, its energy consumption still needs to be reduced.

Table 2. CSS circuit operation results under capacitance mismatch 1)

| Capacitance mismatch | 0% | 1% | 2% | 3% | 4% | 5% | 6% | 7% | 8% | 9% | 10% | 11% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Result | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Pass | Fail | Fail | Fail |

1) Simulated for the worst case, “1111” (A4-A1) + “0000” (B4-B1) + “1” (CIN), with the mismatch applied so that the capacitors storing ‘1’ become smaller and those storing ‘0’ become larger.
Table 3. Comparison of the 16-bit sum operation between the state-of-the-art multi-bit FA, the multi-bit FA using KSA, and the proposed multi-bit FA using the CSS circuit

| | State-of-the-art multi-bit FA [11] | Multi-bit FA using KSA [32,33] | Proposed multi-bit FA using CSS circuit |
|---|---|---|---|
| Computing domain | Digital | Digital | Analog + Digital |
| Number of computing stages (performance) | 17 | 1 + tpg + 4*tAO + txor | 9 |
| PCSA count (size 1)) | 16 (2.92 µm²) | 32 (5.84 µm²) | 32 (5.84 µm²) |
| Additional transistor count | 0 | 2982 | 104.5 |
| Additional size 1) | 0 µm² | 7.16 µm² | 0.25 µm² |
| Total size 1) (area overhead) | 2.92 µm² | 13 µm² | 6.09 µm² |
| Memory read operation count | 32 | 32 | 56 |
| Energy consumption | 598.7 fJ | 755.2 fJ | 969.3 fJ |

1) Pre-layout size, computed as the sum of width × length over the transistors.

V. CONCLUSIONS

In this paper, we propose a multi-bit FA designed specifically for high-performance sum operations in STT-MRAM-based IMC systems. The proposed multi-bit FA uses the CSS circuit in the analog domain to generate COUT for every 4 bits in parallel, followed by 4-bit sum operations in the digital domain. The resulting architecture uses stages more efficiently, requiring only n/4 + 5 stages for an n-bit addition compared to the n + 1 stages of the conventional design. Moreover, it significantly reduces the area overhead compared to digital-domain multi-bit FAs, making it feasible to integrate within a memory array. However, while the proposed circuit effectively reduces the number of stages, it requires twice as many PCSAs, plus additional circuitry, compared to the state-of-the-art multi-bit FA, and its energy consumption is also higher. Our future work will therefore focus on minimizing both the area overhead and the energy consumption of the proposed circuit.

ACKNOWLEDGMENTS

This work was supported by Incheon National University Research Grant in 2022. The EDA tool was supported by the IC Design Education Center (IDEC), Korea.

References

[1] C. Wang et al., "Computing-in-memory paradigm based on STT-MRAM with synergetic read/write-like modes," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2021, pp. 1-5.
[2] S. Jain et al., "Computing in memory with spin-transfer torque magnetic RAM," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 3, pp. 470-483, Mar. 2018.
[3] T. Na, "Ternary output binary neural network with zero-skipping for MRAM-based digital in-memory computing," IEEE Trans. Circuits Syst. II, Exp. Briefs, 2023.
[4] Z. He et al., "Exploring STT-MRAM based in-memory computing paradigm with application of image edge extraction," in Proc. 2017 IEEE Int. Conf. Computer Design (ICCD), Nov. 2017, pp. 439-446.
[5] H. S. Stone, "A logic-in-memory computer," IEEE Trans. Comput., vol. C-19, no. 1, pp. 73-78, Jan. 1970.
[6] T. Na et al., "STT-MRAM sensing: a review," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, no. 1, pp. 12-18, Jan. 2021.
[7] M. Zabihi et al., "In-memory processing on the spintronic CRAM: From hardware design to application mapping," IEEE Trans. Comput., vol. 68, no. 8, pp. 1159-1173, Aug. 2019.
[8] D. Apalkov et al., "Spin-transfer torque magnetic random access memory (STT-MRAM)," ACM J. Emerg. Technol. Comput. Syst. (JETC), vol. 9, no. 2, pp. 1-35, May 2013.
[9] R. Bishnoi et al., "Improving write performance for STT-MRAM," IEEE Trans. Magn., vol. 52, no. 8, pp. 1-11, Aug. 2016.
[10] L. Zhang et al., "Addressing the thermal issues of STT-MRAM from compact modeling to design techniques," IEEE Trans. Nanotechnol., vol. 17, no. 2, pp. 345-352, Mar. 2018.
[11] C. Wang et al., "Design of an area-efficient computing in memory platform based on STT-MRAM," IEEE Trans. Magn., vol. 57, no. 2, pp. 1-4, Feb. 2021.
[12] G. Patrigeon et al., "Design and evaluation of a 28-nm FD-SOI STT-MRAM for ultra-low power microcontrollers," IEEE Trans. Magn., vol. 7, no. 9, pp. 4982-4987, Sep. 2019.
[13] S. Angizi et al., "Design and evaluation of a spintronic in-memory processing platform for nonvolatile data encryption," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37, no. 9, pp. 1788-1801, Sep. 2018.
[14] H. Yu et al., "An adder using charge sharing and its application in DRAMs," in Proc. 2000 Int. Conf. Computer Design, Sep. 2000.
[15] V. Vijay et al., "A Review On N-Bit Ripple-Carry Adder Carry-Select Adder And Carry-Skip Adder," J. VLSI Circuits Syst., vol. 4, no. 01, pp. 27-32, Mar. 2022.
[16] J.-G. Zhu et al., "Magnetic tunnel junctions," Mater. Today, vol. 9, no. 11, pp. 36-45, Nov. 2006.
[17] M. Hosomi et al., "A novel nonvolatile memory with spin torque transfer magnetization switching: Spin-RAM," in IEDM Tech. Dig., Dec. 2005, pp. 459-462.
[18] Y. Luo et al., "A variation robust inference engine based on STT-MRAM with parallel read-out," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), Oct. 2020.
[19] S. Ikeda et al., "Magnetic tunnel junctions for spintronic memories and beyond," IEEE Trans. Electron Devices, vol. 54, no. 5, pp. 991-1002, May 2007.
[20] M. Zabihi et al., "Using spin-hall MTJs to build an energy-efficient in-memory computation platform," in Proc. 20th Int. Symp. Qual. Electron. Design (ISQED), Mar. 2019, pp. 52-57.
[21] E. Deng et al., "Low power magnetic full-adder based on spin transfer torque MRAM," IEEE Trans. Magn., vol. 49, no. 9, pp. 4982-4987, Sep. 2013.
[22] S. Lim et al., "Highly independent MTJ-based PUF system using diode-connected transistor and two-step postprocessing for improved response stability," IEEE Trans. Inf. Forensics Security, vol. 15, pp. 2798-2807, 2020.
[23] W. Zhao et al., "Design considerations and strategies for high-reliable STT-MRAM," Microelectron. Rel., vol. 51, no. 9, pp. 1454-1458, Sep. 2011.
[24] G. P. Devaraj et al., "Design and Analysis of Modified Pre-Charge Sensing Circuit for STT-MRAM," in Proc. 2021 Third Int. Conf. Intelligent Communication Technologies and Virtual Mobile Networks (ICICV), Mar. 2021, pp. 507-511.
[25] T. Na et al., "Comparative study of various latch-type sense amplifiers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 22, no. 2, pp. 425-429, Feb. 2014.
[26] B. Wicht et al., "Yield and speed optimization of a latch-type voltage sense amplifier," IEEE J. Solid-State Circuits, vol. 39, no. 7, pp. 1148-1158, Jul. 2004.
[27] T. Na et al., "Offset-canceling current-sampling sense amplifier for resistive nonvolatile memory in 65 nm CMOS," IEEE J. Solid-State Circuits, vol. 52, no. 2, pp. 496-504, Feb. 2017.
[28] Q. Dong et al., "A 1-Mb 28-nm 1T1MTJ STT-MRAM with single-cap offset-cancelled sense amplifier and in situ self-write-termination," IEEE J. Solid-State Circuits, vol. 54, no. 1, pp. 231-239, Jan. 2019.
[29] T. Na et al., "Offset-canceling single-ended sensing scheme with one-bit-line precharge architecture for resistive nonvolatile memory in 65-nm CMOS," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 11, pp. 2548-2555, Nov. 2019.
[30] J. Kim et al., "A novel sensing circuit for deep submicron spin transfer torque MRAM (STT-MRAM)," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 1, pp. 181-186, Jan. 2012.
[31] F. Ren et al., "A body-voltage-sensing-based short pulse reading circuit for spin-torque transfer RAMs (STT-RAMs)," in Proc. Int. Symp. Quality Electron. Design (ISQED), 2012, pp. 275-282.
[32] P. Chakali et al., "Design of High Speed Kogge-Stone Based Carry Select Adder," International Journal of Emerging Science and Engineering (IJESE), vol. 1, no. 4, pp. 2319-6378, Feb. 2013.
[33] R. Anjana et al., "Implementation of Vedic multiplier using Kogge Stone adder," in Proc. IEEE Int. Conf. Embedded Systems, Jul. 2014, pp. 28-31.
[34] T. Brächer and P. Pirro, "An analog magnon adder for all-magnonic neurons," J. Appl. Phys., vol. 124, no. 15, Oct. 2018.
Jangseok Yu

Jangseok Yu received the B.S. degree in Electronics Engineering from Incheon National University, Incheon, Republic of Korea, in 2024.

Geonwoo Lee

Geonwoo Lee is currently pursuing the B.S. degree in Electronics Engineering at Incheon National University, Incheon, Republic of Korea.

Taehui Na

Taehui Na received the B.S. and Ph.D. degrees in Electrical & Electronic Engineering from Yonsei University, Seoul, Republic of Korea, in 2012 and 2017, respectively. From 2017 to 2019, he was with Samsung Electronics Co., Ltd., Hwasung, Republic of Korea, where he worked on phase-change random access memory (PRAM) and high-performance NAND (ZNAND) core circuit designs. Since 2019, he has been a professor at Incheon National University, Incheon, Republic of Korea. His current research interests are focused on process-voltage-temperature variation tolerant and low-power circuit designs for memory, microcontroller unit, and neuromorphic SoC.