
  1. (Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea)



Keywords: 1T1C eDRAM, computing-in-memory, network-on-chip, spiking neural network, system-level efficiency

I. INTRODUCTION

Recently, Computing-In-Memory (CIM) has been actively researched to achieve high energy efficiency in processing Deep Neural Networks (DNNs). CIM mitigates data-movement bottlenecks by enabling in-memory data processing, offering significant advantages over conventional digital computing architectures: reduced data movement, support for parallel processing, and improved energy and area efficiency through the integration of memory and processing. Traditional CIM processors [1,2] employed multi-word-line (WL) driving techniques to further improve energy and area efficiency. These characteristics enable high-speed and energy-efficient large-scale deep learning computation. However, while Analog-to-Digital Converters (ADCs) play a critical role in CIM architectures, they also present significant challenges, including high power consumption, limited area efficiency, and degraded computational accuracy. In particular, ADCs account for approximately 59% of total power consumption [1] and 20% of processor area [2], creating a major obstacle to achieving higher energy and area efficiency.

To address these limitations, Spiking Neural Network (SNN)-based CIM [3,4] has been proposed as an effective alternative. By eliminating the power and area overhead caused by ADCs, SNN-CIM offers a promising solution. SNN-CIM achieves macro-level energy efficiency through two architectural features: spike-based accumulation and 1-bit thresholding. First, in conventional CIM architectures, multiply-and-accumulate (MAC) operations are central to data processing, requiring significant computational resources and energy. In contrast, SNN-CIM replaces these complex operations with simple accumulation. This is made possible by the inherent characteristics of SNNs, where signals are represented as binary spikes, simplifying the computational process. As shown in Fig. 2, during the integration phase, input signals are accumulated over time, enabling data processing without multiplication operations; this eliminates the computational cost of multiplication and significantly reduces overall power consumption. Second, during the firing phase, SNN-CIM replaces the complex ADCs typically used to determine whether a neuron's firing threshold has been met with a simple 1-bit comparator. In traditional CIM architectures, ADCs convert multi-bit analog signals to digital form for further processing, which involves considerable complexity. By introducing the firing phase, SNN-CIM avoids ADCs entirely, relying instead on 1-bit comparators to determine firing conditions. Together, these two features enable SNN-CIM to overcome the inefficiencies of traditional CIM architectures, achieving substantial improvements in energy and area efficiency at the macro level.
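The two phases described above can be sketched as a few lines of NumPy. This is a behavioral model only, not the paper's circuit: the function name `snn_cim_step` and the soft-reset behavior after firing are illustrative assumptions.

```python
import numpy as np

def snn_cim_step(spikes, weights, v_mem, v_th):
    """One SNN-CIM time step: integration, then 1-bit-threshold firing.

    spikes  : (n_in,) binary input spike vector (0/1)
    weights : (n_in, n_out) weights stored in the CIM array
    v_mem   : (n_out,) membrane potentials carried across time steps
    v_th    : firing threshold (the 1-bit comparator reference)
    """
    # Integration phase: because spikes are 0/1, the MAC degenerates into
    # summing the weight rows selected by active spikes -- no multiplies.
    active = spikes.astype(bool)
    v_mem = v_mem + weights[active].sum(axis=0)

    # Firing phase: a 1-bit comparison replaces the multi-bit ADC.
    out_spikes = (v_mem >= v_th).astype(np.uint8)
    # Soft reset on firing (an assumed reset policy, for illustration).
    v_mem = np.where(out_spikes == 1, v_mem - v_th, v_mem)
    return out_spikes, v_mem
```

Note that the only arithmetic in the integration phase is addition, which is exactly the simplification that lets SNN-CIM drop both multipliers and ADCs.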

However, one drawback of SNN-CIM is its large input activation memory requirement, as shown in Fig. 3. This stems from the representation method of SNNs, where n-bit data is represented as a $2^n$-length spike train composed of binary spikes (0 or 1). Despite its macro-level efficiency, frequent external memory accesses (EMA) for input activations degrade system-level energy efficiency. To mitigate the EMA issue, layer fusion has been introduced [5], allowing multiple layers to be processed by loading their weights simultaneously. Consequently, a high-density SNN-CIM is necessary to accommodate the extensive memory footprint required for such layer fusion. Existing SNN-CIM designs [3] are limited by a large peripheral logic area ($>69.5$%) and reliance on 8T SRAM cells, highlighting the need for more area-efficient peripheral logic and denser memory technologies such as eDRAM.
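The memory blow-up behind this drawback can be illustrated with a toy rate encoder (an assumed encoding for illustration; the paper does not specify its exact scheme): an n-bit value expands to a $2^n$-step binary train, so an 8-bit activation grows from 8 bits to 256 bits of storage and traffic.

```python
def rate_encode(value, n_bits):
    # Assumed rate-coding scheme (illustrative only): an n-bit integer
    # becomes a 2**n-length binary spike train containing `value` ones.
    length = 1 << n_bits            # 2**n time steps
    assert 0 <= value < length
    return [1 if t < value else 0 for t in range(length)]

train = rate_encode(5, 3)           # 3-bit value 5 -> 8-step spike train
```

The train carries the same information as the original 3 bits in 8 bits of storage, which is why input-activation memory and EMA dominate at the system level.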

Additionally, the rigid architectures of prior SNN-CIMs [3,4] are not adaptable to varying layer configurations, such as input channel sizes and kernel dimensions. Optimizing hardware utilization and throughput requires a flexible ratio of memory to peripheral logic that adapts to specific layer requirements. Previous fixed architectures result in suboptimal hardware utilization and diminished throughput. A reconfigurable architecture is therefore essential to dynamically adjust the memory and peripheral logic balance, ensuring optimal configurations for each layer and enhancing system-level energy efficiency.

This paper presents a high-density and reconfigurable SNN eDRAM-based CIM processor that introduces two major innovations to boost system-level energy efficiency: 1) A high-density Reconfigurable Neuro-Cell Array (ReNCA) design using a 1T1C eDRAM cell and area-efficient SNN peripheral logic, achieving a 41% reduction in area and 90% reduction in power compared to prior works. 2) A reconfigurable CIM architecture leveraging ReNCA and Dynamic Adjustable Neuron Link (DAN Link) to adapt memory and peripheral configurations to diverse layer requirements, thereby maximizing energy efficiency.

The proposed processor is implemented using a 28nm CMOS process and demonstrates outstanding performance: achieving 157.15 TOPS/W excluding EMA and 0.03 TOPS/W including EMA on ResNet18 with CIFAR-10, which is $10\times$ higher than [3], and 86.63 TOPS/W excluding EMA and 0.53 TOPS/W including EMA on ResNet18 with ImageNet.

Fig. 1. Advantages and bottleneck of analog-CIM.


Fig. 2. Operation of SNN-CIM.


Fig. 3. Limitation of previous SNN-CIM.


II. OVERALL ARCHITECTURE

As shown in Fig. 4, the proposed SNN eDRAM-CIM processor incorporates a high-density, reconfigurable 1T1C eDRAM CIM bank as its core component. The overall architecture includes 37 KB of input memory, 37 KB of output memory, four 1T1C eDRAM CIM banks, a top controller, and a spike encoder that converts input data into spike trains. These components are interconnected via a custom 2D mesh Network-on-Chip (NoC).

Each 1T1C eDRAM CIM bank comprises a $76\times16$ reconfigurable neuro-cell array (ReNCA), a dynamic adjustable neuron link (DAN Link), and a bank controller. The ReNCA includes an accumulation path designed to combine weights across ReNCAs. Each ReNCA is composed of a $64\times 64$ array of 1T1C cells, supported by 64 charge pumps (CP) [9] and 64 firing sense amplifiers (firing SAs). This structure can function in either memory mode (mem-mode) or peripheral mode (peri-mode), with the bank controller dynamically configuring the mode based on the specific layer requirements.

The DAN Link facilitates layer fusion by connecting multiple ReNCAs within the bank. In the $64\times64$ 1T1C cell array, cells associated with even-numbered word lines (WLs) are connected to the bit line (BL), while cells on odd-numbered WLs are linked to the bit line bar (BLB). This separation prevents overlap between the WLs associated with BL and BLB connections.

Each 1T1C bit cell occupies an area of 0.276 $\mu$m² and has a capacitance of 0.87 fF. The capacitor of the cell is constructed using both MOS and MOM capacitors, ensuring a compact and efficient design.

Fig. 4. Overall architecture.


III. RECONFIGURABLE NEURO-CELL ARRAY

Although SNN-CIM demonstrates superior macro-level energy efficiency over earlier CNN CIM designs [6-8], its system-level energy efficiency is limited due to the substantial power consumption incurred by intermediate activations. When input data is represented as a spike train, the power consumption for intermediate EMA increases by $3.88\times$ (@ResNet50, ImageNet, I $= 6$ bit, W $= 8$ bit, Micron 16Gb DDR4 SDRAM$\times16$, 600 MHz, based on architecture in [3]). The state-of-the-art SNN-CIM [3], which uses 8T SRAM cells and extensive peripheral logic, suffers from insufficient memory density, making efficient layer fusion impractical and limiting system-level energy efficiency to just 0.003 TOPS/W, including EMA power.

The proposed reconfigurable neuro-cell array (ReNCA) overcomes these limitations through three key innovations, as shown in Fig. 5. First, vertical and horizontal charge-sharing accumulation eliminates the requirement for additional integrated capacitors by leveraging the parasitic capacitance of bit lines (BL) and data bus lines (DB). Second, a charge pump (CP) with only two MOSFETs [9] is employed for membrane potential accumulation, reusing the 1T1C cell capacitors as the secondary capacitor (C2) in the CP structure.

Third, the eDRAM sense amplifier (SA) is reused as firing logic, functioning in both memory mode (mem-mode) and peripheral mode (peri-mode) through switch control. By effectively repurposing the 1T1C cell array to serve as both eDRAM bit cells and peripheral logic, the ReNCA architecture enhances system-level energy efficiency. The compact, high-density structure of the design allows multiple ReNCAs to be integrated, facilitating parallel operations within and across banks.

Fig. 5. Features of reconfigurable 1T1C neuro-cell array architecture.


1. 1T1C ReNCA Architecture

Fig. 6 illustrates the detailed architecture of the ReNCA. Switches are included to support both membrane accumulation and multi-bit operations on BL pairs. Each BL pair in the ReNCA is equipped with a charge pump (CP), which consists of two MOSFETs, four input switches, and two output switches.

Membrane integration between arrays is accomplished solely through the CP within the ReNCA, without requiring additional peripheral logic. This approach achieves a 17% reduction in area and a 90% reduction in power consumption compared to folding circuits [3] used for inter-cell array membrane integration. As depicted in Fig. 6, the charge pump (CP) input can be linked to the BL and BLB of the Nth ReNCA via switches (1, 2), while switches (3, 4) allow it to connect to the BL and BLB of the ($N+1$)th ReNCA. The CP output can then connect to the BL and BLB of the ($N+1$)th ReNCA via switches (5, 6). The firing SA operates either as a sense amplifier in memory mode or as a comparator in peripheral mode, depending on the selected mode. This integrated design reduces the area by 41% compared to architectures that rely on individually implemented cell arrays, CPs, SAs, and comparators.

Fig. 6. Comprehensive architecture of the reconfigurable neuro-cell array (ReNCA).


2. 1-bit SNN Computation with ReNCA

The computation process for 1-bit weight SNN using ReNCA, illustrated in Fig. 7, is divided into two main phases: integration (❶$\mathrm{\sim}$❸) and firing (❹). The steps involved are as follows: 1) Spatial Summation (❶): Weights are summed along the spatial dimension. 2) Temporal Accumulation (❷): The output from spatial summation is added to the membrane potential. 3) Input Channel Summation (❸): Weights from input channels are accumulated, and the result is further added to the membrane potential using temporal accumulation.

This sequence (❶ $\mathrm{\to}$ ❷ $\mathrm{\to}$ ❸ $\mathrm{\to}$ ❷) skips zero-valued spikes and continues until all input spikes for the current time step are processed. Finally, at the firing phase (❹), the output spike is generated.
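The per-time-step sequence above, including the zero-skipping, can be sketched behaviorally as follows. This is a simplified scalar model under stated assumptions: the function name and the flattened argument layout are illustrative, not the hardware interface.

```python
def integrate_time_step(spikes, w_spatial, w_channel, v_mem):
    """Sketch of the sequence (1) -> (2) -> (3) -> (2) for one time step.

    spikes    : binary input spikes for this time step
    w_spatial : per-input spatially summed weight contribution
    w_channel : per-input input-channel weight contribution
    v_mem     : scalar membrane potential carried across time steps
    """
    for spike, ws, wc in zip(spikes, w_spatial, w_channel):
        if spike == 0:
            continue        # zero-skipping: the WL is never driven, no cost
        v_mem += ws         # (1) spatial summation -> (2) temporal accumulation
        v_mem += wc         # (3) channel summation -> (2) temporal accumulation
    return v_mem
```

The loop body runs only for active spikes, mirroring how the input driver activates only the WLs linked to nonzero spikes.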

The detailed operation of the CIM processor is described below. In step 1, word lines (WLs) corresponding to active spikes in ReNCA 0 and ReNCA 1 are enabled. While the WLs are driven, the firing SA writes back the data. Once the WLs are disabled, spatial summation is executed between the two ReNCAs through charge sharing, utilizing BL parasitic capacitors. Parallel calculations are performed in the output channel direction for each ReNCA.

The BL voltage is then amplified and sent to the charge pump (CP) input. The CP utilizes its C$_2$ capacitor, which reuses the BL parasitic capacitance and cell capacitors within the ReNCA, to perform temporal accumulation. As depicted in Fig. 5(b), the simulated output voltage of the charge pump exhibited high linearity when four cell capacitors were utilized for temporal accumulation. Next, each ReNCA input driver skips WLs that correspond to zero-valued (inactive) spikes and activates only the WLs linked to active spikes.

After all the weights from the input spikes of the current time step are aggregated, the firing phase (❹) begins. In this phase, the membrane potential on each BL of ReNCA 2 is connected to the firing SA through switches, and output spikes are generated by comparing the BL voltage to a predefined threshold.
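The charge-domain arithmetic underlying these steps reduces to charge conservation: capacitors switched together settle to the charge-weighted average voltage. A minimal sketch, idealized to ignore amplifier gain, switch resistance, and leakage:

```python
def charge_share(voltages, caps):
    # Ideal charge sharing across connected capacitors:
    #   V_final = sum(C_i * V_i) / sum(C_i)   (total charge is conserved)
    total_q = sum(c * v for c, v in zip(caps, voltages))
    return total_q / sum(caps)

def fire(v_mem, v_th):
    # Firing SA in peri-mode: a 1-bit comparison against the threshold.
    return 1 if v_mem >= v_th else 0
```

For two equal BL parasitic capacitances at 0.4 V and 0.8 V, the shared node settles at 0.6 V; the firing SA then emits a spike only if that voltage reaches the predefined threshold.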

Fig. 7. 1-bit weight SNN computation flow.


IV. RECONFIGURABLE CIM ARCHITECTURE

Traditional CIM processors utilize a fixed ratio of memory to peripheral logic. To maximize hardware utilization across various layers, a flexible hardware configuration is essential. This paper presents a reconfigurable CIM architecture featuring a dual-mode Reconfigurable Neuro-Cell Array (ReNCA) and a Dynamic Adjustable Neuron Link (DAN Link), as illustrated in Fig. 8, for interconnecting multiple ReNCAs.

The ReNCA supports dual-mode operation. In memory mode (mem-mode), the ReNCA serves as a storage array for SNN weights, where the firing SA replaces the conventional sense amplifier. Conversely, in peripheral mode (peri-mode), the ReNCA serves as a peripheral circuit by reusing the 1T1C cell array. In this mode, the charge pump is used for membrane potential accumulation and supports multi-bit operations using the 1T1C cell capacitor. Additionally, the firing SA is repurposed as a comparator.

The DAN Link, designed as a 2D custom mesh-type NoC, interconnects ReNCAs and enables the bank controller to configure each ReNCA mode according to per-layer memory and peripheral logic requirements. This NoC structure facilitates layer fusion by establishing dynamic computational pathways within the bank.

Previous fixed CIM architectures suffer from a rigid memory-to-peripheral hardware ratio, leading to underutilization and reduced operation density. In contrast, the proposed reconfigurable CIM architecture dynamically adjusts this ratio, resulting in higher operation and memory densities. Specifically, it improves operation and memory density by up to 68.2% compared to conventional fixed CIM architectures. Furthermore, the proposed ReNCA and DAN Link-based CIM design achieves a $2.82{\times}$ improvement in density FoM compared to fixed CIM architectures. Ultimately, this reconfigurable architecture enhances system-level efficiency by $10\times$ for ResNet-18 on CIFAR-10 compared to the previous SNN-CIM [3].
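The memory-to-peripheral trade-off can be pictured with a toy per-layer allocator. This is purely illustrative: the paper does not describe the bank-controller policy at this granularity, so the function and its greedy split are assumptions.

```python
def configure_bank(n_rencas, n_weight_arrays):
    """Toy per-layer mode assignment for one bank.

    Give the layer as many ReNCAs as its weights require in mem-mode;
    the remainder operate in peri-mode as accumulation/firing logic.
    """
    mem = min(n_weight_arrays, n_rencas)
    return {"mem_mode": mem, "peri_mode": n_rencas - mem}
```

A weight-heavy layer would claim most ReNCAs as storage, while a compute-heavy layer frees more of them for peripheral duty; a fixed ratio can satisfy neither without waste, which is the utilization gap the reconfigurable design closes.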

Fig. 8. Reconfigurable CIM with dual-mode ReNCA.


V. MULTI-BIT EXTENSION IN RENCA

The charge pump (CP) in ReNCA is designed to be expandable, allowing it to handle multi-bit SNN operations. The CP processes 4-bit weights by converting them into an analog voltage. The operation begins by resetting the voltages of all cells in the array. Multi-bit functionality is realized by generating an analog output voltage (Vout) that reflects the significance of each bit position: Vout is scaled by adjusting the value of the C2 capacitor in the CP, with a scaling factor of one-half per bit position. The capacitance of C2 is determined by the number of unit capacitors connected to the bit line (BL) or bit line bar (BLB).
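The one-half-per-bit scaling amounts to a binary-weighted digital-to-analog mapping, sketched below as an idealized behavioral model. The name `weight_to_voltage` and the unit voltage `v_unit` are assumptions for illustration, not reported circuit values.

```python
def weight_to_voltage(bits_msb_first, v_unit):
    # Each bit position contributes half the voltage of the bit above it,
    # realized in hardware by halving C2 (fewer unit capacitors on BL/BLB).
    v_out, scale = 0.0, 1.0
    for b in bits_msb_first:
        v_out += b * scale * v_unit
        scale *= 0.5
    return v_out
```

For the 4-bit weight 1011 with v_unit = 0.1 V, this gives 0.1 + 0.025 + 0.0125 = 0.1375 V, so the analog output preserves each bit's significance.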

VI. IMPLEMENTATION RESULTS

Fig. 9 presents the layout photograph of the proposed 1T1C eDRAM-based SNN-CIM processor, fabricated in a 28 nm CMOS technology. The processor integrates 19 Mb of memory, over $20\times$ the capacity of the state-of-the-art SNN-CIM [3]. The chip occupies 10.47 mm$^2$ and includes four banks. The processor achieves an accuracy of 94.13% on ResNet-18 with CIFAR-10 and 72.03% on ResNet-50 with ImageNet, based on CNN-to-SNN conversion, operating at a clock frequency of 250 MHz with a low power consumption of 0.096 W. Table 1 compares the design with prior state-of-the-art CIM processors. This is the first SNN-CIM processor utilizing 1T1C eDRAM technology. Furthermore, unlike previous CIM architectures, it supports a reconfigurable CIM architecture that dynamically adjusts the memory-to-peripheral logic ratio according to layer requirements. The proposed CIM processor achieves remarkable energy efficiency on two levels. At the macro level, it delivers 382 TOPS/W to 1531.3 TOPS/W, outperforming previous CNN CIMs [6-8], and provides $7729\times$ and $55.6\times$ improvements in density FoM over the prior 1T1C CIM [7] and SNN-CIM [3], respectively. Ultimately, the processor achieves a $10\times$ increase in system-level energy efficiency compared to [3], as evaluated on ResNet-18 with CIFAR-10, including EMA.

Fig. 9. Chip photo & performance summary.


Table 1. Comparison table.


VII. CONCLUSION

The proposed 1T1C eDRAM-based SNN-CIM processor significantly enhances system-level efficiency through two main features. First, the high-density Reconfigurable Neuro-Cell Array (ReNCA), utilizing 1T1C bit cells and compact SNN peripheral logic, achieves a 41% reduction in area and 90% reduction in power consumption compared to [3]. Second, the dual-mode ReNCA and Dynamic Adjustable Neuron Link (DAN Link) enable a reconfigurable CIM architecture, improving system-level efficiency, including EMA, by $10\times$ over [3]. As a result, the processor demonstrates a state-of-the-art macro-level efficiency of 1531.3 TOPS/W and achieves $10\times$ higher system-level efficiency than the previous SNN-CIM [3].

ACKNOWLEDGMENTS

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) under the Graduate School of Artificial Intelligence Semiconductor (IITP-2024-RS-2023-00256472) grant funded by the Korea government (MSIT).

References

1 
J. Lee, H. Valavi, Y. Tang, and N. Verma, ``Fully row/column parallel in-memory computing SRAM macro employing capacitor-based mixed-signal computation with 5-b inputs,'' Proc. of 2021 Symposium on VLSI Circuits, pp. 1-2, 2021.
2 
H. Jia and M. Ozatay, ``A programmable neural-network inference accelerator based on scalable in-memory computing,'' Proc. of 2021 IEEE International Solid-State Circuits Conference (ISSCC), pp. 236-237, 2021.
3 
S. Kim, S. Kim, S. Um, S. Kim, K. Kim, and H.-J. Yoo, ``Neuro-CIM: A 310.4 TOPS/W neuromorphic computing-in-memory processor with low WL/BL activity and digital-analog mixed-mode neuron firing,'' Proc. of 2022 IEEE Symposium on VLSI Technology and Circuits, pp. 38-39, 2022.
4 
Y. Wang, ``Energy efficient RRAM spiking neural network for real time classification,'' Proc. of the 25th Great Lakes Symposium on VLSI, pp. 189-194, 2015.
5 
K. Goetschalckx and M. Verhelst, ``DepFiN: A 12 nm, 3.8 TOPS depth-first CNN processor for high-resolution image processing,'' Proc. of 2021 Symposium on VLSI Circuits, pp. 1-2, 2021.
6 
Z. Chen, X. Chen, and J. Gu, ``15.3 A 65 nm 3T dynamic analog RAM-based computing-in-memory macro and CNN accelerator with retention enhancement, adaptive analog sparsity and 44 TOPS/W system energy efficiency,'' Proc. of 2021 IEEE International Solid-State Circuits Conference (ISSCC), pp. 240-242, 2021.
7 
S. Xie, C. Ni, A. Sayal, P. Jain, F. Hamzaoglu, and J. P. Kulkarni, ``16.2 eDRAM-CIM: Compute-in-memory design with reconfigurable embedded-dynamic-memory array realizing adaptive data converters and charge-domain computing,'' Proc. of 2021 IEEE International Solid-State Circuits Conference (ISSCC), pp. 248-250, 2021.
8 
S. Xie, C. Ni, P. Jain, F. Hamzaoglu, and J. P. Kulkarni, ``Gain-cell CIM: Leakage and bitline swing aware 2T1C gain-cell eDRAM compute in memory design with bitline precharge DACs and compact Schmitt trigger ADCs,'' Proc. of 2022 IEEE Symposium on VLSI Technology and Circuits, pp. 112-113, 2022.
9 
T. Tanzawa and T. Tanaka, ``A dynamic analysis of the Dickson charge pump circuit,'' IEEE Journal of Solid-State Circuits, vol. 32, no. 8, pp. 1231-1240, Aug. 1997.
Sangmyoung Lee

Sangmyoung Lee received his B.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2024, where he is currently pursuing an M.S. degree at the Graduate School of AI Semiconductor. His current research interests include energy-efficient processing-in-memory accelerators and application-specific neuromorphic hardware.

Seryeong Kim

Seryeong Kim received her B.S. degree in electrical engineering from the Korea Aerospace University, South Korea, in 2021, and an M.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2023, where she is currently pursuing a Ph.D. degree. Her current research interests include SW-HW co-optimization, low-power system-on-chip design, and application-specific processor design.

Soyeon Kim

Soyeon Kim received her B.S. and M.S. degrees from the School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2019 and 2021, respectively, where she is currently pursuing a Ph.D. degree. Her research interests include energy-efficient deep learning processor design and intelligent computer vision systems.

Soyeon Um

Soyeon Um received her B.S. and M.S. degrees from the School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2020 and 2021, respectively, where she is currently pursuing a Ph.D. degree. Her current research interests include low-power deep learning and intelligent vision system-on-chip (SoC) design, energy-efficient processing-memory architecture, and application-specific neuromorphic hardware.

Sangjin Kim

Sangjin Kim received his B.S., M.S., and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2019, 2021, and 2024, respectively. He is currently a Postdoctoral Associate at KAIST. His research interests include computing-in-memory for low-power AI accelerators, processing-in-memory for energy-efficient AI systems, and hardware-software co-optimization for generative AI models.

Sangyeob Kim

Sangyeob Kim received his B.S., M.S., and Ph.D. degrees from the School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 2018, 2020, and 2023, respectively. He is currently a Post-Doctoral Associate at KAIST. His current research interests include energy-efficient system-on-chip design, especially focused on deep neural network accelerators, neuromorphic hardware, and computing-in-memory accelerators.

Wooyoung Jo

Wooyoung Jo received his B.S. and M.S. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2020 and 2022, respectively, where he is currently pursuing a Ph.D. degree. His current research interests include energy-efficient ASIC/SoCs, especially focused on computer vision, large language models, and deep learning algorithms for efficient processing.

Hoi-Jun Yoo

Hoi-Jun Yoo graduated from the Department of Electronics, Seoul National University, Seoul, South Korea, in 1983. He received his M.S. and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea, in 1985 and 1988, respectively. Dr. Yoo has served as a member of the Executive Committee for the International Solid-State Circuits Conference (ISSCC), the Symposium on Very Large-Scale Integration (VLSI), and the Asian Solid-State Circuits Conference (A-SSCC), the TPC Chair of the A-SSCC 2008 and the International Symposium on Wearable Computers (ISWC) 2010, an IEEE Distinguished Lecturer from 2010 to 2011, the Far East Chair of the ISSCC from 2011 to 2012, the Technology Direction Sub-Committee Chair of the ISSCC in 2013, the TPC Vice-Chair of the ISSCC in 2014, and the TPC Chair of the ISSCC in 2015. He is currently an ICT chair professor in the School of Electrical Engineering at KAIST, where he also serves as director of the System Design Innovation and Application Research Center (SDIA), the PIM Semiconductor Design Research Center (AI-PIM), and the KAIST Institute of Information Technology Convergence. He is also dean of the Graduate School of AI Semiconductor at KAIST. More details are available at http://ssl.kaist.ac.kr.