Mobile QR Code

1. (Department of Electronic Engineering, Gachon University, Sujeong-gu, Seongnam-si, Gyeonggi-do 13120, Korea)

Semiconductor devices, semiconductor memories, data processing, computer architecture, neuromorphic system, processing-in-memory (PIM), memory processing unit (MemPU)

## I. INTRODUCTION

The actual effective speed of a computer system is determined by speed of memory, and further, that of communication between processing and memory units. It is an undoubted fact that the intrinsic gate delay governs system speed most fundamentally, but we are not living in the era in which the processing speed of a central processing unit (CPU) is determined by the speed of transistor switching although the great deal of effort has been dedicated to shrinkage of transistor for higher switching speed and low power consumption. For being capable of accommodating the gigantic amount of data, stronger parallelism has been consistently required. Parallel computing, high-performance computing (HPC), distributed computing, and grid computing can be thought as the effort for increasing the system speed by physical segmentations of computers over space, operating in the time-division manner (1-4), which have been prevalent. In recent times, such computers are shrinkun into a chip with the highly scaled parallelism, which can be easily found in the contemporary multi-core CPUs and many-core graphic processing units (GPUs) (5-8). However, these technologies are highly dependent on scaling technology of transistors and what crucially matters is the logic operation speed, not taking the actual system speed determined at the level of massive nonvolatile memories into the serious consideration. The computing performances have been referred as the result of semiconductor logic technology but the importance of memory technologies is getting larger and larger as the high-performance computers and hardware-driven artificial intelligence (AI) become more big-data-driven and require expedited communication between the processor core and the ultra-high-density memory area (9). In this review, volatile and nonvolatile memory devices making up the most fundamental functional cells in the advanced computer architectures are surveyed in the highlights of their applications in the hardware-driven neuromorphic systems and processing-in-memory (PIM).

Fig. 1. Hierarchy in semiconductor memories for the neuromorphic and PIM applications.

Semiconductor memories can be categorized by two criteria as schematically shown in Fig. 1: (i) whether it is charge-storage type or resistance-changing type and (ii) whether it is volatile or nonvoltaile. Great majority of Si memory devices are found in the charge (or potential) storage type including static random-access memory (SRAM), dynamic random-access memory (DRAM), and flash memory. Although floating-gate (FG) structure had the great majority in the past flash memory technologies and it can be still found in the microcontroller units (MCUs) embedding the FG flash memories owing to its perfect Si processing compatibility, the predominence is taken by the charge-trap flash (CTF) technology in recent times. In the charge storage memory regime, the device evolutions have been progressed with a relatively high emphasis on novel device stucturing since the base materials that can be accommodated in the fabrication facility for mass chip production are not unlimited and the Si processing technologies are highly matured. On the other hand, in the regime of resistance-change memories, the mateirals are being sought without ceasing and the development and optimization of process architecture are of parallel concern. It needs to be clarified that resistance-change type and resistive-switching random access memory (RRAM) do not have the same definition but they have different set and subset relations as clearly grasped by Fig. 1. The resistance-change memory refers to all the memories in which the state changes can be made by the change in resistance, or equivalently, that in conductance. Phase-change random-access memory (PRAM), ferroelectric RAM (FRAM), magnetic RAM (MRAM), and RRAM belong to resistance-change memory technology.

The following sections have been organized covering all the hiararchies: neuromorphic applications based on charge-storage volatile memories, SRAM and DRAM, are surveyed in Chapter II. Those with nonvolatile memories are investigated in Chapter III, which is more specifically divided into Chapter III. 1 for charge-storage CTF and Chapter III. 2 for resistance-change memories including all of PRAM, FRAM, MRAM, and RRAM, in sequence.

## II. Volatile Memory Cells for Neuromorphic Applications

Fig. 2. Artificial intelligence and neuromorphic computing.

Fig. 3. Background and orientation of the artificial intelligence.

Neuromorphic computing is a new way of computing mimicking the behaviors of nervous system. The most fundamental nervous behavior is broken down into the multiplication-and-accumulation (MAC) operations that take place between neurons as schematically shown in the upper part in Fig. 2. Neuromorphic computing is a forward step by which AI can be implemented in a more physical way so that the MAC operations can be carried out with higher volume and energy efficiencies (10). For this goal, more specifically designed hardwares - integrated circuits and devices - are necessitated as shown in the lower part in Fig. 2. The early AI was realized in the highly algorithm-intensive manner, in which the volume and energy efficiencies were not substantially considered (11). More hardware-oriented state-of-the-art neuromorphic chips have been incessantly released with the full Si CMOS processing compatibility (12-14), where the synapses were made of SRAMs. AI has been primarily led by software and is pursuing machine learning as can be schematically shown in Fig. 3. Deep neural network (DNN) is a widely admitted way to realize machine leaning that essentially requires big data. Thus, to be a successful hardware neuromorphic system, the synaptic device or cell needs to equip higher scalability toward a high-density synapse array. However, the bulky SRAM composed of 6 transistors is not strategic to practically achieve the goal (13,14), and as the result, applications can be quite limited (15). Although it has been rarely reported until recent date in comparison with SRAM, dynamic random-access memory (DRAM) is another volatile meory cell that can be also utilized for the hardware-driven neuromorphic system as the synapse with higher cell scalability. It was reported that DRAM can be used in the accelerator for either convolutional neural network (CNN) or recurrent neural network (RNN) due to the area and cost effectiveness of DRAM (16). Even in case of the architecture of a CNN accelerator employing DRAM, the DRAM domain is not used for synaptic computing but for providing the compressed feature maps and kernal as schematically shown in Fig. 4 (16). It has not been explicitly addressed but the reason that DRAM has not been actively adopted for the neuromorphic computing can be found from the fact that the periodic refresh operations are required in the conventional DRAM cell.

Fig. 4. Architecture of a CNN accelerator with DRAM (16).

Fig. 5. Synaptic operation of a novel DRAM cell (17).

Neuromorphic computing architectures are specifically designed for higher energy efficiency and superb parallelism in big data processing. The loss of time and data bandwidth in the DRAM synapse array can be seriously concerned. Thus, if DRAM cells can be adopted in the neuromorphic applications as the synaptic units, the issue of data retainability should be resolved. A novel DRAM cell featuring two independent MOSFET devices, without capacitor, has been recently invented and presented (17,18). The first MOSFET takes charge of learning operations (potentiation and depression) and the second one takes charge of inference only, by which non-destructive inference operation and substantially increased data retention are warranted. Further, the invented DRAM cell can be operated in the dual modes: one for the stand-alone DRAM and the other for neuromorphic application depending on the magnitude of voltage pulse for program and erase operations. Fig. 5 demonstrates the output curves of the second MOSFET where the inference operation takes place. The functionality as a synapse cell with plausibly linear weight modulation capability in terms of number of learning pulses is clearly demonstrated in Fig. 5.

Although the usefulness and functions of the short-term memory (STM) in the hardware neuromorphic system can be differed from those in the biological nervous system (19,20), STM is essential in design and realization of time-series neuromorphic system based on RNN (21-23). Thus, the STM-oriented neuromorphic systems can be surely realized by volatile memories including SRAM and DRAM as surveyed above, and higher data capacity, energy efficiency, the time-invariant weight retainability can be realized by introducing the nonvolatile memory synapses as will be reviewed in the subsequent sections.

Fig. 6. Si-based floating-body synaptic transistor (SFST) (24).

Fig. 7. Device structure and potentiation process of semi-floating-gate synaptic transistor (SFGST): (a) Aerial view of the SFGST and its circuit symbol representation; (b) Cross-sectional view of the device; (c) Contour of hole current density during the potentiation through band-to-band tunneling (26).

## III. Nonvolatile Memory Cells for Neuromorphic Applications

### 1. Charge-trap Memory Synaptic Devices

All-circuit AI chip in Fig. 2 can be categorized into neuromorphic system since area and energy efficiencies are enhanced, in comparison with the software-driven AI, by the approach of more specific hardware design. Since the all-circuit AI chip has the Si processing compatibility, it had a higher chance to reach chip production earlier. However, a functional synaptic unit is composed of plural transistors so that there is much room to increase the area and energy efficiencies. It should be correct to express memory cells when it comes to SRAM or DRAM, the volatile memories, rather than memory devices. However, when dealing with nonvolatile memories, a single device can function as one synapse. In consequence, the nonvolatile memory synapse has higher device scalability and array density. Also, nonvolatile memories are superior to voltaile ones with regard to energy efficiency when they weave the synapse arrays for neuromorphic systems. An early single-device nonvolatile synapse was invented in the structure of floating body with charge-trap layer (24). The Si-based floating-body synaptic transistor (SFST) is capable of both STM and long-term memory (LTM) functions. SFST can be specifically understood as the combination of one-transistor (1T) DRAM and CTF memory for short- and long-term memories, respectively. Fig. 6 schematically shows the principles for the synaptic operations. The electron-hole pairs are generated by hot-carrier-induced impact ionization. The electrons are drifted into the drain junction and the holes are accumunlated in the floating body. A recent research results show that diffusion has the predominance over drift and recombination in determining retention of data in 1T DRAM (25). In other words, the accumulated holes in the ${\textit{p}}$-type body vanish by extremely fast diffusion of holes into the source and drain junctions, unless the potentiation pulses are repeated with short enough intervals. By this accumulation and fast diffusion, threshold voltage of the SFST is temporarily elevated and comes back to the initial value, which realizes the STM. Repeated potentiation pulses increase the population of the holes accumulated in the floating body and the holes have higher probability to be injected into the charge-trap layer by tunneling. By the trapped holes, the threshold voltage becomes invariant if there is no intended depression (erase) operation. The SFST necessitates a floating body for realizing the STM function but the holes can be also temporarily stored by preparing other type of storage. Fig. 7(a) schematically shows the semi-floating-gate synaptic transistor (SFGST) which can be fabricated on the bulk Si wafers (26). The potentiation takes place by band-to-band tunneling of holes from the channel into the semi-floating gate (SFG) of which one end is connected to the channel as shown in Fig. 7(b) and (c). STM is realized due to diffusion of the holes out of the SFG to the channel. It would be essential to realize high-density synaptic device array for processing massive data and vertical structuring can be a viable way of achieving the goal. Synaptic transistor with vertical channel can be designed as shown in Fig. 8(a), and further, a quantum well can be equipped for low-power learning operation and effective STM (27,28). The potentiation is performed by band-to-band tunneling through SiGe with higher power efficiency as the simulation results in Fig. 8(b) and (c).

Fig. 8. Quantum-well charge-trap synaptic transistor (CTS): (a) Schematic of the device structure; (b) Tunneling rate in the channel direction investigated by device simulation; (c) Change in energy-band diagram during a potentiation (28).

Fig. 9. Short-term memory functionality of CTS: (a) Increase of the hole concentration in the whole SiGe layer with potentiation pulse number; (b) The decay in the absence of a pulse (28).

Fig. 10. Highly linear conductance change of the CTS device with regard to number of learning pulses. The earlier 23 pulses are for potentiation and the latter ones are for depression (28).

Fig. 11. Device structure of core-shell dual-gate (CSDG) nanowire synaptic transistor: (a) Three-dimensional view. Cross-sectional views; (b) along; (c) across the channel (29).

The valence band offset (VBO) between SiGe and Si provides a quantum well for effective hole confinement for STM as shown in Fig. 9(a) and (b). Since the heterostructure quantum well acts as the floating body for holes, the synaptic transistor can be fabricated on the bulk Si wafers cost-effectively. By this structuring, both area and power efficiencies are obtained at the same time. Fig. 10 depicts the modulation of synaptic weight (electrical conductance) by the number of pulses in the learning processes of the charge-trap synapse (CTS) (28). Although the perfect linearity in weight modulation does not have to be fulfulled for off-chip learning, higher weight linearity is undoubtedly beneficial since the burdens in the peripheral circuits and supporting softwares can be greatly lessened. Further, the perfect linearity needs to be pursued in the on-chip learning neuromorphic system with full autonomy.

Nanowire synaptic transistor can be designed considering the geometrical similarity (Fig. 11(a)) with the three-dimensional vertical NAND (VNAND) products (29,30). The synaptic transistor is operated by core-shell dual gates (CSDG) and the charge-trap nitride layer is located on the shell gate side as schematically shown in Fig. 11(b) and (c). Voltages of large magnitudes are applied to the shell gate for potentiation and depression operations. The core gate assists the shell gate in learning operations, being applied with voltages of smaller magnitudes. Fig. 12(a) shows the weight modulation as a function of number of learning pulses. In order to obtain higher linearity, bias conditions for potentiation, depression, and inference need to be optimized. The synaptic weights obtained from the potentiation/depression data in Fig. 12(a) were used for off-chip training of a neural network in Fig. 12(b). In comparison with the purely software-based recognition accuracy of 92.3%, there are only marginal drops in accuracy as demonstrated in Fig. 12(c) and (d), which supports the merits of the CSDG synapse.

Fig. 12. Pattern recognition test of the CSDG nanowire synaptic transistor: (a) Modulation of synaptic weight in LTP and LTD characteristics of the CSDG device; (b) Schematic of the single-layer neural network made up of CSDG nanowire synaptic transistors for MNIST digit recognition. Digit recognition accuracy (%) as a functon of the number of training epochs at three different distinct depression voltages of the synaptic device for training with (c) 28 ${\times}$ 28; (d) 16 ${\times}$ 16 pixels. Insets of (c) and (d) show the MNIST images of digit “3” in the 28 ${\times}$ 28 and 16 ${\times}$ 16 resolutions, respectively (29).

Fig. 13. Phase-change memory (PCM) synapse array: (a) Schematics of 10 ${\times}$ 10 array and cell; (b) Optical microscope image of PCM cell array and TEM image of a single cell (31).

Fig. 14. 3D vertical ferroelectric HZO-based FTJ array characterization: (a) Schematic of high-density 3D vertical HZO-based FTJ synapse array; (b) zoomed-in schematic; (c) HRTEM image of the 3D TiN/FE-HZO/Pt devices; (d) Enlarged TEM image of the bottom cell corresponding to (c) (adapted from (32) with permission from Nanoscale).

### 2. Resistance-change Memory Synaptic Devices

As reviewed in the previous section, CTF synapses have high Si processing compatibility and can be made capable of both STM and LTM. Although the function of STM can be differed from the original one in the biological system in many aspects, one of the common functions is to manage the entire system in the energy-efficient manner. In the electronic system, sustaining the stronger connectivity with a larger weight requires a larger energy consumption. Since stronger connectivity implies that higher electrical conductivity, a synaptic transistor with a larger synaptic weight consumes more energy in performing inference operations at a given read voltage. From this point of view, the STM function acts as a filter discriminating less important signals - mistakenly sent signals, noises, less frequently incoming signals, etc. - that might be the sources for increasing the system power consumption by the synaptic components with unwantedly increased weights. However, STM function can be optional and can be prepared depending on system requirements and applications, and the CTF memories surveyed in the previous section can provide the plausible synaptic device solution.

Fig. 15. Schematic of MTJ-heavy metal (HM) binary synapse: (a) Cross-sectional view; (b) A significance driven LT-ST stochastic synapse comprising two MTJ-HM devices (33).

Fig. 16. (a) Construction of Si wedge. SEM image of the Si wedges for bottom electrode formation after the optimized wet etch process with 25% TMAH solution at room temperature; (b) TEM images of the cross-senctional view of Si wedge. The heavily ${\textit{p}}$-type-doped top of the Si wedge acts as the bottom electrode of a single synaptic device cell and a bitline in the array The width of the wedge top is 30 nm (Copyright (2021) The Japan Society of Applied Physics).

Fig. 17. Measurement for input voltage vector-conductivity matrix multiplication function of the resistive-switching synaptic device cross-point array: (a) Input voltage: (${\textit{V}}$$_{1}, {\textit{V}}$$_{2}$) = (1 V, 0 V); (b) Input voltage: (${\textit{V}}$$_{1}, {\textit{V}}$$_{2}$) = (0 V, 1 V); (c) Input voltage: (${\textit{V}}$$_{1}, {\textit{V}}$$_{2}$) = (1 V, 1 V); (d) Output current for input in (a); (e) Output current for input in (b); (f) Output current for input in (c) (Copyright (2021) The Japan Society of Applied Physics).

As can be inferred by Fig. 11(a), synapse is the connecting part between two neurons called pre- and post-synaptic neurons. The synapse is neither an organ nor an explicit structure but an aquaeous medium through which signals are propagating between the neurons. Thus, it will be closer to the reality to call it “connectivity” rather than a connecting part indeed. However, it is surely the place where two neurons meet each other so that the synapse can be treated as a two-terminal device in the electronic device sense. There can be deficiency in numbers or imperfection in functionality in realizing all the functions of a biological synapse by a device with only two terminals, and thus, assistant terminals can be added as confirmed by the charge-trap memory devices in the previous section. At the same time, a substantially large portion of researches on neuromorphic devices have been dedicated to the two-terminal synaptic devices owing to the great structural resemblance and simplicity in process integration. Resistive-switching random-access memory (RRAM), phase-change memory (PCM), ferroelectric tunnel junction (FTJ), and magnetic tunnel junction (MTJ) have been considered to be the candidates for the two-terminal synaptic devices. Fig. 13(a) and (b) through Fig. 15(a) and (b) demonstrate the synaptic devices and their arrays based on PCM, FTJ, and MTJ in the recent literature (31-33).

RRAM has relatively wider variety in the base material compared with PCM, FTJ, and MTJ which usually necessitate highly delicate control over the atomic compositions. Also, RRAM has a wide span of materials compatible with Si processing, which can be a merit in the massive production point of view, including IGZO, HfO$_{2}$, TiO$_{x}$, ZTO, Ta$_{2}$O$_{5}$, SiN$_{x}$, and GeO$_{x}$ (34-42). The resistive-switching synaptic device can be further optimized with regard to device structure for low-power operation. A novel structure of nanowedge can be adopted for low-voltage learning operations helped by an effective field concentration as shown in Fig. 16 (43). The most important feature of the hardware neuromorphic system becomes apparent when the vector matrix multiplication (VMM) operation is clearly shown, which should be the absolute index for the accelerated MAC operations in the ultra-light and fast hardware-driven AI. Fig. 17 demonstrates the experimental results on VMM operation in the fabricated nanowedge SiN$_{x}$ resistive-switching synaptic device array (43).

Table 1. Comparison among the reported synaptic devices

 2T DRAM (17) SFST (24) QW CTS (vertical) (28) PCM (31) HZO FTJ (32) MTJ-HM (33) Nanowedge RRAM (43) Volatility Volatile Nonvolatile Nonvolatile Nonvolatile Nonvolatile Nonvolatile Nonvolatile Mechanism Charge store Charge trap Charge trap Phase change Ferroelectric Magnetic Resistive-switching Type Charge-storage type Resistance-change type Reported area 250 nm × 250 nm 100 nm × 100 nm 100 nm × 30 nm 90-nm node 2.5 μm$^{2}$ π/4 × 100 × 40 nm$^{2}$ 30 nm × 30 nm Processing maturity Extremely high Extremely high Extremely high Extremely high High High High Predicted cell scalability High High Extremely high Extremely high High Moderate Extremely high Multilevel operation Possible (newly made in this work) Possible Possible Possible Possible Possible Possible Switching speed Extremely high High High High High Extremely high High Inference energy Low Extremely low Extremely low Extremely low Low High High

Table 1 shows the comparison among the reported synaptic devices introduced in Chap. II and III with respect to the representative characteristics, in the order of their appearances. The weight volatility, weight modulation mechanism, and type are identified on the first three rows, which could have been understood by Fig. 1. The cell areas reported in the references are listed on the fourth row. While some of the reported synaptic devices were fabricated and their cell areas were also explicitly clarified in the references, some of them were designed by device simulation and the cell areas were estimated by a set of critical dimensions given in the references. Processing maturity means the possibility that the invented synaptic devices can be accommodated by the current fabrication technology for commercial chip production. 2T DRAM, SFST, and QW CTS are fully compatible with the Si processing. Although ferroelectric and magnetic switching materials have been actively brought into the Si processing fabrication, there is still room for expanding the variety of materials. The resistive-switching materials also have a wide span of candidates and recent materials such as SiO$_{2}$ and Si$_{3}$N$_{4}$ ensure the Si processing compatibility. Based on the processing maturity, cell scalability has been further predicted, beyond the reported values, in which vertical CTS, PCM, and RRAM are highly scalable. All the reported synaptic devices are capable of multilevel operations, and in particular, it has been demonstrated that a peculiarly designed DRAM can be operated with multiple weights (20 weights in the report). The highest switching speeds are found in DRAM and MTJ synaptic devices and the lowest inference energies are realized by the charge-trap synaptic devices.

The hardware-oriented neuromorphic system is under active researches and developments for the highly mobile and energy-efficient AI. However, the essense comes from the mathematical backgrounds built up from the biological analogy. MAC operation is one of the examples. In other words, there might be still room that can be filled by the software that complementarily work with the developed hardware neuromorphic system. A recent study shows that a successful encounter between the fabricated hardware neural network and software approach can increase the intelligent performances of the system. The philosophy that agent and environment interact with each other through action and reward (Fig. 18(a)) substantially reduced the minimum number of car moves that let a targeted car out of the parking lot in a shorter time (Fig. 18(b)) (44). By the reinforcement learning in which a reward is given, the overall learning process can be shortened and it can be more effectively mimicking the way of learning in the biological system. The hardware-oriented AI would be more dependent on memory technologies which conduct the numerous operations with superb energy efficiency in the compact hardware, being grafted with software in part for higher intelligence.

## IV. Processing-in-memory (PIM)

Fig. 18. Learning results: (a) Process of reinforcement learning. Agent and environment interact with each other through action and reward; (b) Number of moves required to get the red car out of the area during the reinforcement learning process (adapted from (44) with permission from IEEE Transactions on Electron Devices).

Fig. 19. Different but same names for processing-in-memory.

Fig. 20. Memory bottleneck in the serial-processing computers.

Processing-in-memory (PIM) is one of the traditional technologies that have been developed in the very-large-scale integration (VLSI) area. The first idea came up with a terminology of logic-in-memory that features the SRAM working between the central processing unit (CPU) and slow high-density magnetic memory domain, dating back to 1970 (45). PIM has been explictly appearing since 1990’s and the majority of PIM technology is devoted by SRAM (46). There have been similar nomenclatures that can be understood in the same meaning of PIM as shown in Fig. 19: logic-in-memory (LIM), near-memory processing (NMP), in-memory processing (IMP), memory-centric processing, etc. In short, although PIM technologies have been developed for more than half a century in the computer architecture and VLSI fields, most of the dedication has been made in reducing the physical distance between CPU and memory domain by either shortening the interconnection or introducing a new architecture topology among functional blocks. In other words, all the above technologies have been realized “near” the memory. So, it cannot be denied that PIM has been a rather metaphorical terminology if seen from the device point of view. Coming back to the original motivation, PIM aims to get rid of the memory bottleneck or memory “wall” in the serially processing conventional computers schematically shown in Fig. 20. This should be true since the perceived speed on the end user’s side is defined by the speed of communication between the processing unit and memory domain rather than the speed of processing itself. Thinking about the device scaling limit due to quantum mechanical carrier behaviors and line-and-space pitch limit capped by parasitic resistances and capacitances, further breakthrough needs to be sought with more specifically designed semiconductor memory devices for making up the PIM cells.

The understanding of difference between PIM and neuromorphic system can be helpful. Fig. 21 shows the technological map of computer architectures. Computer architectures can be categorized into Von Neumann architecture and non-Von Neumann architecture although the latter is not prevalent yet. PIM has been indicating NMP so far, indeed. Recently, a part of functions of the processing unit are allocated into individual DRAM chips, which realizes “in-memory-array” processing (47,48). This is surely a new PIM technology advanced from NMP. However, the Von Neumann architecture is maintained in the in-memory-array processing. Whether the computer walks out of the Von Neumann architecture is not decidingly important if the motivation of PIM is reminded. PIM can embrace both Von Neumann and non-Von Neumann architectures only if the contributions are made in the direction of getting rid of memory wall. The literal PIM can be realized by cell-level-memory-and-processing technology, and here, the conventional architecture shall be broken. Neuromorphic computing does not completely belong to PIM but its majority is found as a subset of PIM. The reason that neuromorphic system is a subset to PIM from the task capability point of view is more succinctly glanced by the application landscape in Fig. 22. The applications can be grouped into three main categories based on the overall degree of required computational precision. A qualitative measure of the computational complexity and data accesses involved in the different applications is also shown (49). Although neuromorhpic is mainly focused on the accerlerated MAC operations, optionally with mult-level-operational memory devices, PIM is capable of carrying out both arithmetic operations including MAC and Boolean logic operations. PIM has not existed for AI although the ingredients can make the substantial contributions to it. Rather, PIM is more general and universal technology in which neuromorphic can be realized as a form of PIM. Thus, indicating neuromophic system or MAC accelerator as PIM can be misleading since they take only a part in PIM. Neuromphic chip cannot replace the conventional CPU completely but PIM aims to be the new CPU technology itself. In this regard, PIM might have a new differentiating name of memory processing unit (MemPU). The final destination of logic is the memory cell itself, and at this stage, the literal PIM is realized. Breaking the Von Neumann architecture is not the goal but it can be broken at some moment while taking the forward steps to the literal PIM. It needs to be reminded that the PIM is not related with AI nor non-Von Neumann computer architecture. Not all the technologies on memory devices and integrated circuits are aiming neuromorphic system but it can be admitted that all of them are pursuing PIM for lifting up the memory wall.

Fig. 21. Technological map of computer architectures.

## V. Memory Devices for PIM Cells

Fig. 22. Application landscape for in-memory computing (adapted from (49) with permission from Nature Nanotechnology).

SRAM and DRAM, volatile memories, showed the possibilities of implementing the cell-level memory and processing previously sketched in Fig. 21. Fig. 23 shows the in-memory computing schemes based on 8-T and 8$^{+}$-T SRAM cells in which Boolean operations of NAND, NOR, and XOR along with implication (IMP) and 2-bit read operation are realized. It is reported that 8$^{+}$-T SRAM cell in the differential mode achieves a latency of 1 ns and an average energy/bit of 29.67 fJ (50). SRAM can tackle into PIM technology in advance due to its high operation speed but lacks of area efficiency. One of the early ideas on PIM based on volatile memory is found in the realization of in-DRAM AND and OR operations (51), which evolves into an accelerator-in-memory for bulk bitwise operations (Ambit) soon (52). In Fig. 24, if ${\textit{A}}$, ${\textit{B}}$, and ${\textit{C}}$ represent the logical values of the three cells, then the final state of the bitline is ${\textit{AB}}$ + ${\textit{BC}}$ + ${\textit{CA}}$ (the bitwise majority function). Since the activation is a row-level operation in DRAM, the triple-row activation (TRA) operates on an entire row of DRAM cells and multi-kilobyte-wide bitwise AND/OR of two rows is conducted (52). Although the principles of individual memory devices are neither changed nor newly found, the full functionality for PIM can be expected when plural memory and logic devices are combined. As a result, PIM cell might be a more realistic terminology in many cases than PIM device. It would be more beneficial if the PIM technology is realized with a high capacity memory in the sense that the overall perceived speed of a system is determined by the speed of memory domain, as briefly forementioned, and the memory with the highest density decides the eventual system speed. Thus, although the state-of-the-art PIM chip is based on DRAM at this moment (47,48), nonvolatile memories would provide the driving force toward advanced PIM technologies just as in case of neuromorphic system. In the research level, Boolean operations are being obtained in the nonvolatile memories. Fig. 25 shows the XOR logic operation in the three-dimensional NAND flash memory array (53). The PIM cell is implemented by a single device and the operation is conducted by the combinations of bitline and wordline voltages. PIM cell composed of two transistors and one RRAM (2T-1R) was reported (54). Simultaneous operations of 2T-1R realizes the simultaneous logic-in-memory (SLIM) depending on input voltages applied on the logic transistor gates and on resistance state of the RRAM device. Fig. 26 demonstrates that NOR operation can be performed by the PIM cell as one of the feasible Boolean operations. It has been also reported that phase-change, ferroelectric, and magnetic memories can be employed in constructing a PIM cell that performs various set of Boolean operations (55-57).

Fig. 23. A summary of in-memory computing schemes proposed by 8-T and 8$^{+-}$T SRAM cells (adapted from (50) with permission from IEEE Transactions on Circuits and Systems I: Regular Papers).

Fig. 24. Triple-row activation for in-DRAM logic operation.

Fig. 27 depicts bar diagrams to make a good distinction between NMP and the cell-level (literal) PIM. The first bar at the top shows the total sequences taken when CPU and memory domain communicate and the total lenth implies the time required for a unit processing/memory operation between them. Advanced computer architecture aims to reduce the length of the bar: faster logic transistor for faster CPU, interconnection with smaller RC delay, memory devices with faster read/write speeds, and effective data processing methods need to be collectively developed. These advancements result in the shorter bar at the center. The past PIM technology has dedicated to reduction of time and energy loss in the interconnection by shortening the physical distance between CPU and memory, which can be the major feature of NMP. Advanced Von Neumann architecture can be developed by minimization of the individual time segments. In-memory-array processing can be still categorized into here. On the other hand, the cell-level PIM can lift off the interconnects by specifically designed PIM cells as shown by the bar at the bottom in Fig. 27, admitting that the time save in the logic/memory devices is getting more and more irreducible due to the physical and process limits. In this phase, Von Neumann architecture might be destructed.

Fig. 25. Logic operations for XOR with three steps (adapted from (53) with permission from IEEE Electron Device Letters).

Fig. 26. Four possible input operand combinations: (a) ${\textit{a}}$ = ${\textit{b}}$ = ‘0’; (b) ${\textit{a}}$ = ‘0’, ${\textit{b}}$ = ‘1’; (c) ${\textit{a}}$ = ‘1’, ${\textit{b}}$ = ‘0’; (d) ${\textit{a}}$ = ${\textit{b}}$ = ‘1’. Experimental results for NOR logic implemented using 2T-1R SLIM bitcell with device initial state: ‘11’ (e-h) and ‘01’ (i-l) (54).

Fig. 27. Conceptual comparison between near-memory PIM (NMP) and cell-level PIM technologies. NMP is dedicated to shorten the individual segments composing the time for whole data communiation between processor and memory domains. On the other hand, cell-level PIM can truncate one of more segments out of the entire communcation processes.

## VI. CONCLUSION

In this review, identities of volatile and nonvolatile memories have been contemplated in the view of neuromorphic and PIM technologies. Although neuromorphic system and PIM are not the same, they are not mutually exclusive at all since both of them can be implemented by memory devices. Although PIM is not targeting the AI but more widely applicable processors, both neuromorphic system and PIM resemble the human brain in which the various operations are occuring at the very place where the memory components are. Memory devices is taking the steering position for advanced computers and it should be high time to make the series contributions toward the future processor, MemPU.

### ACKNOWLEDGMENTS

This work was supported by IITP Grant funded by the Ministry of Science and ICT (MSIT) (2021001776) and NRF Grant funded by MSIT (2021M3F3A201037927).

### REFERENCES

1
Asanovic K., Bodik R., Demmel J., Keaveny T., Keutzer K., Kubiatowicz J., Morgan N., Patterson D., Sen K., Wawrzynek J., Wessel D., Yelick K., Oct 2009, A view of the parallel computing landscape, Commun. ACM, Vol. 52, No. 10, pp. 56-67
2
Kindratenko V., Trancoso P., May-Jun 2011, Trends in High-Performance Computing, Comput. Sci. Eng., Vol. 13, No. 3, pp. 92-95
3
Birman K. P., Dec 1993, The process group approach to reliable distributed computing, Commun. ACM, Vol. 36, No. 12, pp. 37-54
4
Foster I., Zhao Y., Raicu I., Lu S., Nov 2007, Cloud Computing and Grid Computing 360-Degree Compared, Proc. 2008 Grid Computing Environments Workshop (GCE), Austin, TX, USA, pp. 12-16
5
Loeffler J., Jun. 15, 2021, AMD Zen 4 Epyc CPU could be an epic 128-core, 256-thread monster, Techradar, online available at https://www.techradar.com/news/
6
Shilov A., Oct. 1, 2021, Arm-Based 128-Core Ampere CPUs Cost a Fraction of x86 Price, Tom’s Hardware, online available at https://www.tomshardware.com/news/ampere-altra-max-128-core-priced
7
IntelⓡCoreTMi9-10980XE Extreme Edition Processor (24.75M Cache, 3.00 GHz), online available at https://www.intel.com/content/www/us/en/products/sku/198017/intel-core-i910980xe-extreme-edition-processor-24-75m-cache-3-00-ghz/specifications.html
8
IntelⓡCoreTMi9-10980XE Extreme Edition Processor (24.75M Cache, 3.00 GHz), online available at https://www.intel.com/content/www/us/en/products/sku/198017/intel-core-i910980xe-extreme-edition-processor-24-75m-cache-3-00-ghz/specifications.html
9
Cho S., Sep 2021, Semiconductor Memory Devices for Hardware-Driven Neuromorphic Systems, MDPI Books
10
Mead C., Oct 1990, Neuromorphic Electronic Systems, Proc. IEEE, Vol. 78, No. 10, pp. 1629-1639
11
Silver D., et al. , Jan 2016, Mastering the game of Go with deep neural networks and tree search, Nature, Vol. 529, pp. 484-489
12
Moore D., Jun 2014, Neuromorphic Computing Gets Ready for the (Really) Big Time, Comm. ACM, Vol. 57, No. 6, pp. 13-15
13
Akopyan F., Oct 2015, TrueNorth: Design and Tool Flow of a 65 mW 1 Million Neuron Programmable Neurosynaptic Chip, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., Vol. 34, No. 10, pp. 1537-1557
14
Davies M., Wild A., Orchard G., Sandamirskaya Y., Guerra G. A. F., Joshi P., Plank P., Risbud S. R., May 2021, Advancing Neuromorphic Computing With Loihi: A Survey of Results and Outlook, Proc. IEEE, Vol. 109, No. 5, pp. 911-934
15
Andreou A. G., May 2016, Real-time sensory information processing using the TrueNorth Neurosynaptic System, 2016 IEEE International Symposium on Circuits and Systems (ISCAS), Montreal, Vol. qc, No. canada, pp. 22-25
16
Delbruck T., Liu S.-C., Systems, Data-Driven Neuromorphic DRAM-based CNN and RNN Accelerators, 2009 Sig. Proc. Soc. Asilomar Conference on Signals
17
Baek S., Yoo B. E., Lee I., Cho S., un. 30 - Jun. 2, 2021, Design of Compact 2T(0C) DRAM Cell Allowing Nondestructive Read Operation and Glance at Its Applications as Synaptic Device, in Proc. 2021 IEIE Summer Conf., pp. 515-516, Jeju, Korea
18
Cho S., Baek S., Nov. 4, 2021, Two-Transistor Memory Cell, Synaptic Cell and Neuron Mimic Cell Using the Same and Operation Method Thereof, Korean Patent filed, Vol. 10-2021-0150751
19
Wingfield nand A., Byrnes D. L., May 1972, Decay of Information in Short-Term Memory, Science, Vol. 176, No. 4035, pp. 690-692
20
Camina E., Güell F., Jun 2017, The Neuroanatomical, Neurophysiological and Psychological Basis of Memory: Current Models and Their Origins, Front. Pharmacol., Vol. 8, pp. 438-1-438-16
21
Botvinick M. M., Plaut D. C., Apr 2006, Short-Term Memory for Serial Order: A Recurrent Neural Network Model, Psychol. Rev., Vol. 113, No. 2, pp. 201-233
22
Liu J., Zhang H., Yu T., Ni D., Ren L., Yang Q., Lu B., Wang D., Heinen R., Axmacher N., Xue G., Dec 2020, Stable maintenance of multiple representational formats in human visual short-term memory, PNAS, Vol. 117, No. 51, pp. 32329-32339
23
Ichikawa K., Kaneko K., Aug 2021, Short-term memory by transient oscillatory dynamics in recurrent neural networks, Phys. Rev. Res., Vol. 3, No. 3, pp. 033193-1-033193-9
24
Kim H., Cho S., Sun M.-C., Park J., Hwang S., Park B.-G., Oct 2016, Simulation Study on Silicon-Based Floating Body Synaptic Transistor with Short- and Long-Term Memory Functions and Its Spike Timing-Dependent Plasticity, J. Semicond. Technol. Sci., Vol. 16, No. 5, pp. 657-663
25
Lee Y. J., Cho S., Dec 2021, Predominance of Carrier Diffusion in Determination of Data Retention in One-Transistor Dynamic Random-Access Memory, J. Semicond. Technol. Sci., Vol. 21, No. 6, pp. 406-411
26
Lee Y. J., Cho S., Dec 2021, Predominance of Carrier Diffusion in Determination of Data Retention in One-Transistor Dynamic Random-Access Memory,, Vol. 21, No. 6, pp. 406-411
27
Cho Y., Lee J. Y., Yu E., Han J.-H., Baek M.-H., Cho S., Park B.-G., Jan 2019, Design and Characterization of Semi-Floating-Gate Synaptic Transistor, Micromachines, Vol. 10, No. 1, pp. 32-41
28
Yu E., Cho S., Park B.-G., Sep 2019, A Silicon-Compatible Synaptic Transistor Capable of Multiple Synaptic Weights toward Energy-Efficient Neuromorphic Systems, Electronics, Vol. 8, No. 10, pp. 1102-1-1102-12
29
Yu E., Cho S., Roy K., Park B.-G., Aug 2020, A Quantum-Well Charge-Trap Synaptic Transistor with Highly Linear Weight Tunability, IEEE J. Electron Devices Soc., Vol. 8, pp. 834-840
30
Ansari Md. H. R., Kannan U. M., Cho S., Jul 2021, Core-Shell Dual-Gate Nanowire Charge-Trap Memory for Synaptic Operations for Neuromorphic Applications, Nanomater., Vol. 11, No. 7, pp. 1773-1-1773-14
31
Ansari Md. H. R., Cho S., Lee J.-H., Park B.-G., Dec 2021, Core-Shell Dual-Gate Nanowire Memory as a Synaptic Device for Neuromorphic Application, IEEE J. Electron Devices Soc., Vol. 9, pp. 1282-1289
32
Eryilmaz S. B., Kuzum D., Jeyasingh R., Kim S. B., Brightsky M., Lam C., Wong H.-S. P., Jul 2014, Brain-like associative learning using a nanoscale non-volatile phase change synaptic device array, Front. Neurosci., Vol. 8, pp. 205-1-205-11
33
Chen L., Wang T.-Y., Dai Y.-W., Cha M.-Y., Zhu H., Sun Q.-Q., Ding S.-J., Zhou P., Chua L., Zhang D. W., Sep 2018, Ultra-low power Hf0.5Zr0.5O2 based ferroelectric tunnel junction synapses for hardware neural network applications, Nanoscale, Vol. 10, No. 33, pp. 15826-15833
34
Srinivasan G., Sengupta A., Roy K., Jul 2016, Magnetic Tunnel Junction Based Long-Term Short-Term Stochastic Synapse for a Spiking Neural Network with On-Chip STDP Learning, Sci. Rep., Vol. 6, pp. 29545-1-2954513
35
Bang S., Kim M.-H., Kim T.-H., Lee D. K., Kim S., Cho S., Park B.-G., Dec 2018, Gradual switching and self-rectifying characteristics of Cu/α-IGZO/p+-Si RRAM for synaptic device application, Solid-State Electron., Vol. 150, pp. 60-65
36
Lee D. K., Kim M.-H., Kim T.-H., Bang S., Choi Y.-J., Kim S., Cho S., Park B.-G., Apr 2019, Synaptic behaviors of HfO2 ReRAM by pulse frequency modulation, Solid-State Electron., Vol. 154, pp. 31-35
37
Kim T.-H., Kim M.-H., Bang S., Lee D. K., Kim S., Cho S., Park B.-G., Jul 2020, Fabrication and Characterization of TiOx Memristor for Synaptic Device Application, IEEE Trans. Nanotechnol., Vol. 19, pp. 475-480
38
Ryu J.-H., Kim B., Hussain F., Ismail M., Mahata C., Oh T., Imran M., Min K. K., Kim T.-H., Yang B.-D., Cho S., Park B.-G., Kim Y., Kim S., Jul 2020, Zinc Tin Oxide Synaptic Device for Neuromorphic Engineering, IEEE Access, Vol. 8, pp. 130678-130686
39
Kim D., Jang J. T., Yu E., Park J., Min J., Kim D. M., Choi S.-J., Mo H.-S., Cho S., Roy K., Kim D., Aug 2020, Pd/IGZO/p+-Si Synaptic Device with Self-Graded Oxygen Concentration for Highly Linear Weight Adjustability and Improved Energy Efficiency, ACS Appl. Electron. Mater., Vol. 2, No. 8, pp. 2390-2397
40
Kang D., Jang J. T., Park S., Ansari Md. H. R., Bae J.-H., Choi S.-J., Kim D. M., Kim C., Cho S., Kim D., Apr 2021, Threshold-Variation-Tolerant Coupling-Gate α-IGZO Synaptic Transistor for More Reliably Controllable Hardware Neuromorphic System, IEEE Access, Vol. 9, pp. 59345-59352
41
Rasheed U., Ryu H., Mahata C., Khalil R. M. A., Imran M., Rana A. M., Kousar F., Kim B., Kim Y., Cho S., Hussain F., Kim S., Oct 2021, Resistive switching characteristics and theoretical simulation of a Pt/α-Ta2O5/TiN synaptic device for neuromorphic applications, J. Alloys Compd., Vol. 877, pp. 160204-1-160204-10
42
Kim S., Jung S., Kim M.-H., Chen Y.-C., Chang Y.-F., Ryoo K.-C., Cho S., Lee J.-H., Park B.-G., May 2018, Scaling Effect on Silicon Nitride Memristor with Highly Doped Si Substrate, Small, Vol. 14, No. 19, pp. 1704062-1-1704062-8
43
Lee J. Y., Kim Y., Kim M.-H., Go S., Ryu S. W., Lee J. Y., Ha T. J., Kim S. G., Cho S., Park B.-G., Mar 2019, Ni/GeOx/p+ Si resistive-switching random-access memory with full Si processing compatibility and its characterization and modeling, Vacuum, Vol. 161, pp. 63-70
44
Kim M.-H., Cho S., Park B.-G., May 2021, Nanoscale wedge resistive-switching synaptic device and experimental verification of vector-matrix multiplication for hardware neuromorphic application, Jpn. J. Appl. Phys., Vol. 60, No. 5, pp. 050905-1
45
Kim M.-H., Hwang S., Bang S., Kim T.-H., Lee D. K., Ansari Md. H. R., Cho S., Park B.-G., Sep 2021, A More Hardware-Oriented Spiking Neural Network Based on Leading Memory Technology and Its Application With Reinforcement Learning, IEEE Trans. Electron Devices, Vol. 68, No. 9, pp. 4411-4417
46
Stone H. S., Jan 1970, A Logic-in-Memory Computer, IEEE Trans. Compt., Vol. c-19, No. 1, pp. 73-78
47
Gokhale M., Holmes N., Iobst K., Apr 1995, Processing in Memory: The Terasys Massively Parallel PIM Array, IEEE Comput., Vol. 28, No. 4, pp. 23-31
48
UPMEM PIM Soluition: DRAM Processing Unit (DPU), UPMEM Official website, online available at https://www.upmem.com/technology/
49
HBM PIM: Memory redesigned to advance AI, Samsung official website, online available at https://www.samsung.com/semiconductor/solutions/technology/hbm-processing-in-memory/
50
Sebastian A., Gallo M. L., Khaddam-Aljameh R., Eleftheriou E., Jul 2020, Memory devices and applications for in-memory computing, Nat. Nanotechnol., Vol. 15, pp. 529-544
51
Agrawal A., Jaiswal A., Lee C., Roy K., Dec 2018, X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories, IEEE Trans. Circuits Syst. I Regul. Pap., Vol. 65, No. 2, pp. 4219-4232
52
Seshadri V., Hsieh K., Boroum A., Lee D., Kozuch M. A., Mutlu O., Gibbons P. B., Mowry T. C., Jul-Dec 2015, Fast Bulk Bitwise AND and OR in DRAM, IEEE Comput. Archit. Lett., Vol. 14, No. 2, pp. 127-131
53
Seshadri V., Lee D., Mullins T., Hassan H., Boroumand A., Kim J., Kozuch M. A., Mutlu O., Gibbons P. B., Mowry T. C., Abmit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology, Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-50), pp. 273-287
54
Lee J., Park B.-G., Kim Y., Sep 2019, Implementation of Boolean Logic Functions in Charge Trap Flash for In-Memory Computing, IEEE Electron Device Lett., Vol. 40, No. 9, pp. 1358-1361
55
Kingra S. K., Parmar V., Chang C.-C., B.-Hudec , Hou T.-H., Suri M., Feb 2020, SLIM: Simultaneous Logic-in-Memory Computing Exploiting Bilayer Analog OxRAM Device, Sci. Rep., Vol. 10, pp. 2567-1-2567-64
56
Li Y., Zhong Y. P., Deng Y. F., Zhou Y. X., Xu L., Miao X. S., Dec 2013, Nonvolatile “AND,” “OR,” and “NOT” Boolean logic gates based on phase-change memory, J. Appl. Phys., Vol. 114, No. 23, pp. 234503-1-234503-4
57
Kim M., Lee K., Kim S., Lee J.-H., Park B.-G., Kwon D., Nov 2021, Double-Gated Ferroelectric-Gate Field-Effect Transistor for Processing in Memory, IEEE Electron Device Lett., Vol. 42, No. 11, pp. 1607-1610
58
Gonzalez-Zalba M. F., Ciccarelli C., Zarbo L. P., Irvine A. C., Campion R. C., Gallagher B. L., Jungwirth T., Ferguson A. J., Wunderlich J., Apr 2015, Reconfigurable Boolean Logic Using Magnetic Single-Electron Transistors, PLoS One, Vol. 10, No. 4, pp. 0125142-1-0125142-8

## Author

##### Seongjae Cho

received the B.S. and the Ph.D. degrees in electrical engineering from Seoul National University, Seoul, Republic of Korea, in 2004 and 2010, respectively.

He worked as an Exchange Researcher at the National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Japan, in 2009.

Also, he worked as a Postdoctoral Researcher at Seoul National University in 2010 and at Stanford University, CA, USA, from 2010 to 2013.

He joined the Department of Electronic Engineering, Gachon University, Seongnam, Republic of Korea, in 2013, where he is currently working as an Associate Professor.

His current research interests include emerging memory technologies, advanced nanoscale CMOS devices, group-IV photonic devices, memory cells for neuromorphic and memory-centric processor technolo-gies.

He is a Senior Member of IEEE and a Lifetime Member of IEIE.