Mobile QR Code QR CODE

  1. (Department of Electrical and Information Engineering, Seoul National University of Science and Technology, Seoul, Korea)



Compute-in-memory, deep neural network, 3D NAND, variation

I. INTRODUCTION

Recently, artificial intelligence has been emerging to successfully process the complex tasks such as image classification and language translation. The deep learning algorithms are using larger and deeper networks, so called deep neural networks (DNNs), to achieve higher accuracy. Such trend requires tremendous number of computation which may incur lots of power consumption for the computing hardware. Specifically, processing the DNN on the conventional von-Neumann computing architecture may suffer from the energy consumption due to the large number of data movements between the dynamic random access memory (DRAM) and the processing units.

Therefore, many kinds of novel computing architectures have been proposed to replace the von-Neumann architecture. Compute-in-memory (CIM) has been proposed as a powerful candidate with its high energy efficiency and throughput owing to its parallel computation in memory. CIM utilizes the nonvolatile memory (NVM) array for the computation. The weights of the DNNs are stored as a conductance of the memory cell. Then, multiply-and-accumulation (MAC) operations are conducted by activating the multiple rows of the array and reading out the summed current along the column. The results of the digitized analog current represent the output of the MAC computation.

Every nonvolatile memory device has been investigated as a candidate for the CIM applications. Emerging NVMs such as resistive random access memory (RRAM) [1,2] and phase change memory (PCM) [3] have advantages in their multi-level characteristics and logic compatibility. But their small on and off current ratio make difficult to configuring large array size which may result in the low array efficiency of the chip. NOR Flash [4] could be another candidate thanks to its large on and off current ratio, however, recent NOR Flash technology has stopped at 28 nm technology node because of lots of fabrication challenges piled up.

Recently, 3D NAND Flash memory has been emerged as a CIM device candidate due to its ultra-high density and low on current [5,6]. Incomparable high density of 3D NAND enables storing and processing huge DNNs within a single 3D NAND die [7]. The fabrication technology of 3D NAND is already matured and further developments are on-going. Moreover, its inherent low on current is advantageous for low power and parallel computation.

In this paper, we investigate the current variation of the 3D NAND and its impact on the accuracy of the inference application system. We first develop the current variation from the various sources of the 3D NAND in detail. Then, the electrical parameters are incorporated into the software simulation to evaluate the inference accuracy degradation.

II. 3D NAND BASED CIM ARCHITECTURE AND NONIDEAL EFFECTS ON 3D NAND STRING CURRENT

Fig. 1 shows the MAC operation schemes on 3D NAND architecture. The string select line (SSL) input scheme [8] is depicted in Fig. 1(a), where the input voltages corresponding to the input data are applied to the string select lines (SSLs) while the string currents from the multiple cells are summed up along the bitline (BL). The analog-to-digital circuit (ADC) sense the aggregated currents and digitized results represent the output of MAC. Meanwhile, the BL input scheme [6] applies input voltage to the BLs and sense the output currents at source line (SL) as shown in Fig. 1(b). The significant bits (i.e. MSBs) of weights are duplicated to multiple SSLs. According to the applied input voltage at the bitlines, multi-bit weights can be processed at once. In terms of latency and energy efficiency, BL input scheme can take huge advantage when the network requires frequent input fetch. According to the target applications, the proper scheme must be exploited to achieve higher energy efficiency.

In spite of the advantages of the 3D NAND based CIM architecture, the inaccurate current of the memory cells may induce accuracy drop of the computation. Not different from other types of NVMs, 3D NAND string current would vary with various variation sources. The three critical factors that could affect the on-state current are shown in Fig. 2.

The retention characteristic of the charge trap flash (CTF) cells may cause serious impact on the on-state current. Due to the loss or gain of the trapped charges, i.e. electrons and holes, in the charge trap layer, threshold voltage of the individual cell may change over time. Not only the cells in the selected wordline to be read, but also the cells in the unselected wordlines that must be turned on during the read operation lose the charges over time as well. Therefore, the string current amount would decrease or increase compared to immediately after being programmed.

The operating temperature at read operation may also affect the on-state current. Because the grain boundaries of the polysilicon channel incur the temperature dependency, the temperature rise would cause the on-state current increase of the 3D NAND channel.

The pattern dependency of the NAND string is another critical factor that affect the on-state current. Although the cells at different NAND strings which are sharing the same wordline are erased (holes trapped in the charge trap layer) as shown in Fig. 2, the programmed states on the unselected wordlines cause on-state current variation, i.e. I$_{1}$ is higher than I$_{2}$. The current variation could be mitigated with high pass voltage (Vpass), but it cannot be completely eliminated.

Fig. 1. 3D NAND based in-memory MAC operation scheme with (a) SSL input scheme; (b) BL input scheme.
../../Resources/ieie/JSTS.2022.22.5.341/fig1.png
Fig. 2. Various current variation sources on 3D NAND string.
../../Resources/ieie/JSTS.2022.22.5.341/fig2.png

III. SIMULATION RESULTS

In this section, we show the quantitative array-level simulation result of the 3D NAND string current variation and the respective inference accuracy degradation. To conduct the array-level simulation, we used SPICE simulation with the BSIM parameters extracted from the measured 3D NAND device characteristics [6] as indicated in Fig. 3. The conventional BSIM parameters was modified to describe the I-V characteristics and threshold voltage window of the programmed and erased CTF cells. Fig. 4 shows the I-V curve of on and off state 3D NAND string current with modified BSIM parameters which was used throughout this work. To evaluate the string current variation induced by various device level nonideal effects (i.e. Vt shift, temperature dependent mobility variation, program and erase pattern in NAND string), variations were reflected on the electron mobility and flat band voltage of the extracted BSIM parameters. Then, to evaluate the NAND string current variation from these nonideal effects with SPICE simulation, the netlist for the NAND array built with 32 WL NAND string was written up. A single array is composed of 128 NAND strings to be able to extract the standard deviation of the current variation with randomly written data, which means 32 CTF cells of every NAND string are randomly programmed (0 or 1). The string current variation can be transferred to the change or variation of the pre-trained weight values. By incorporating the string current variation from SPICE simulation result into the weights of pre-trained model which are already quantized integer numbers, we evaluated the inference accuracy for the CIFAR-10 dataset on VGG-8 model. In the VGG-8 model, ReLU function was used as a nonlinear activation function. Throughout this work, the two-level input voltages (0 V and Vread) were used to represent binary input signals.

As mentioned in the previous section, charge loss or charge gain in the charge trapping layer over time change the threshold voltage (V$_{\mathrm{t}}$) of the CTF cells. For the MAC operation on 3D NAND array, the on-state (erased) current plays dominant role for the accurate computation. Therefore, the effect of the electron gain (or hole loss) of the on-state cells should be investigated. We assumed that all the erased cells gain or lose the same amount of charge at specific time which means the threshold voltage changes are identical for every erased cell. First, the 32 WL 3D NAND string current change with respect to the positive shift of the threshold voltage have been simulated as shown in Fig. 5(a). As the threshold voltage shift increases, the string current decrease gradually. 0.4 V of threshold voltage shift induces approximately 10 % current drop. We incorporated this current drop into the software simulation as weight mean value change to evaluate the inference accuracy drop of the CIFAR-10 dataset. Relatively high accuracy was maintained until 0.2 V of threshold voltage shift. However, accuracy drops to 10 % drastically with threshold voltage shift larger than 0.25 V. It can be noted that the threshold voltage shift must be carefully monitored not to exceed ~0.2 V.

The operating temperature dependency to the inference accuracy is shown in Fig. 6. Within the polysilicon channel, the grain boundaries impede the electron drift. The higher operating temperature helps to jump over this electrical barrier. From [9], the electron mobility at high temperature increases drastically, e.g., at 400 K, it is 30 % higher than at room temperature. If the sensing circuits are designed to operate at the room temperature near 300 K, the inference accuracy significantly drops at a temperature above 310 K if any temperature compensation schemes are not applied. It implies that the temperature compensation in the sensing circuits such as temperature dependent reference voltage or current must be implemented for the practical design.

The program and erase pattern dependency at the 32 WL 3D NAND string is shown in Fig. 7. Even though the 64 programmed cells and 64 erased cells have exactly same threshold voltage respectively, the randomly programmed pattern at the cells of unselected WLs affect the 3D NAND string current. It can be seen that both the threshold variation (-1 V to 0 V for erased cells) and the on-state current variation (2 nA to 3.5 nA for erased cells) exists. Because the on-state currents do not vary with higher than 1 V of WL voltage, the threshold voltage may not affect the total summed currents from multiple NAND strings. However, the on-state current variation directly affects the summed current amount so that the inference accuracy may be degraded. By applying the weight value variation with the on current standard variation results above, 84.5 % of inference accuracy for CIFAR-10 dataset was achieved which is around 5 % lower than the case without on-state current variation.

Fig. 3. Inference accuracy simulation flow of this work from device-level characterization to software simulation.
../../Resources/ieie/JSTS.2022.22.5.341/fig3.png
Fig. 4. Simulated I-V curve of on and off state 32 WL 3D NAND string current with modified BSIM parameters.
../../Resources/ieie/JSTS.2022.22.5.341/fig4.png
Fig. 5. (a) String current change with the threshold voltage shift of the erased cell; (b) CIFAR-10 inference accuracy with respect to the threshold voltage shift.
../../Resources/ieie/JSTS.2022.22.5.341/fig5.png
Fig. 6. Temperature dependency of the inference accuracy of 3D NAND based compute-in-memory system.
../../Resources/ieie/JSTS.2022.22.5.341/fig6.png
Fig. 7. Programmed pattern dependency at the 32 WL 3D NAND string current. 128 string currents are shown. The 64 cells on the selected WL are programmed and the other 64 cells are erased.
../../Resources/ieie/JSTS.2022.22.5.341/fig7.png

IV. CONCLUSION

The NAND string current variation induced by various sources was simulated. Accordingly, inference accuracy would be significantly degraded as the impact of the variation source increases. From this work, we provide the guideline of NAND Flash based CIM hardware with accuracy point of view. Noticeable degradation was shown when the threshold voltage shift of erased cells in NAND string shifts larger than 0.25 V, or temperature increases 10 degrees. The randomly programmed data pattern also degrades the accuracy as well. In conclusion, further device engineering and circuit design techniques must be developed to compensate the inference accuracy degradation of the NAND Flash based CIM engine.

ACKNOWLEDGMENTS

This study was financially supported by Seoul National University of Science and Technology.

References

1 
Wu W., et al , June 2018, A methodology to improve linearity of analog RRAM for neuromorphic computing, Symposium on VLSI Technology, art. No. 8510690, pp. 103-104DOI
2 
Lin P., et al , 2020, Three-dimensional memristor circuits as complex neural networks, Nature Electronics 3, pp. 225-232DOI
3 
Ambrogio S., et al , 2018, Equivalent-accuracy accelerated neural-network training using analogue memory, Nature, Vol. 558, pp. 60-67DOI
4 
Guo X., et al , 2017, Fast, energy-efficient, robust, and reproducible mixed-signal neuromorphic classifier based on embedded NOR flash memory technology, IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, pp. 6.5.1-6.5.4.DOI
5 
Wang P., et al , 2019, Three-dimensional NAND Flash for vector-matrix multiplication, IEEE Trans. VLSI Systems, Vol. 27, No. 4, pp. 988-991DOI
6 
Lue H. -T., et al , 2019, Optimal design methods to transform 3D NAND Flash into a high-density, high-bandwidth and low-power nonvolatile computing in memory (nvCIM) accelerator for deep-learning neural networks (DNN), IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, pp. 38.1.1-38.1.4.DOI
7 
Shim W., et al , Oct. 2020, Architectural design of 3D NAND Flash based compute-in-memory for inference engine, ACM/IEEE International Symposium on Memory Systems (MEMSYS), pp. 77-85DOI
8 
Shim W., et al , Feb. 2021, Technological Design of 3D NAND-Based Compute-in-Memory Architecture for GB-Scale Deep Neural Network, in IEEE Electron Device Letters, Vol. 42, No. 2, pp. 160-163DOI
9 
Resnati D., et al , Aug. 2018, Characterization and Modeling of Temperature Effects in 3-D NAND Flash Arrays-Part I: Polysilicon-Induced Variability, IEEE Transactions on Electron Devices, Vol. 65, No. 8, pp. 3199-3206DOI
Wonbo Shim
../../Resources/ieie/JSTS.2022.22.5.341/au1.png

Wonbo Shim received the B.S., Ph.D. degrees in electrical engi-neering from the Seoul National University, Korea in 2007 and 2013, respectively. He is currently an assistant professor of Department of Electrical and Information Engi-neering with Seoul National University of Science and Technology. From 2013 to 2019, he was a Senior Engineer with the Samsung Electronics, Hwaseong, Korea, in flash design team at memory division. From 2019 to 2021, he was a Postdoctoral Research Fellow in Georgia Institute of Technology. His current research interest includes energy efficient deep learning architecture design, nonvolatile memory device modeling for synaptic applications, and 3D NAND Flash design.