## References

D. Kreuzberger, N. Kühl, and S. Hirschl, “Machine learning operations (MLOps): Overview,
definition, and architecture,” IEEE Access, vol. 11, pp. 31866-31879, 2023.

W. Liu et al., “A survey of deep neural network architectures and their applications,”
Neurocomputing, vol. 234, pp. 11-26, 2017.

J. A. Suykens and J. Vandewalle, “Training multilayer perceptron classifiers based
on a modified support vector method,” IEEE Transactions on Neural Networks, vol. 10,
no. 4, pp. 907-911, 1999.

J.-W. Jang et al., “Sparsity-aware and re-configurable NPU architecture for Samsung
flagship mobile SoC,” in Proc. 2021 ACM/IEEE 48th Annual International Symposium on
Computer Architecture (ISCA), 2021, pp. 15-28.

A. Reuther et al., “AI accelerator survey and trends,” in Proc. 2021 IEEE High Performance
Extreme Computing Conference (HPEC), 2021, pp. 1-9.

A. Sebastian, M. Le Gallo, R. Khaddam-Aljameh, and E. Eleftheriou, “Memory devices
and applications for in-memory computing,” Nature Nanotechnology, vol. 15, no. 7,
pp. 529-544, 2020.

M. Imani, S. Gupta, and T. Rosing, “Ultra-efficient processing in-memory for data
intensive applications,” in Proc. 54th Annual Design Automation Conference, 2017,
pp. 1-6.

R. Hecht-Nielsen, “Theory of the backpropagation neural network,” in Neural Networks
for Perception, Elsevier, 1992, pp. 65-93.

A. Ganguly, R. Muralidhar, and V. Singh, “Towards Energy Efficient non-von Neumann
Architectures for Deep Learning,” in Proc. 20th International Symposium on Quality
Electronic Design (ISQED), 2019, pp. 335-342.

J. Li et al., “SmartShuttle: Optimizing off-chip memory accesses for deep learning
accelerators,” in Proc. 2018 Design, Automation & Test in Europe Conference & Exhibition
(DATE), 2018, pp. 343-348.

I. Daubechies, R. DeVore, S. Foucart, B. Hanin, and G. Petrova, “Nonlinear approximation
and (deep) ReLU networks,” Constructive Approximation, vol. 55, no. 1, pp. 127-172,
2022.

T. Liu, T. Qiu, and S. Luan, “Hyperbolic-tangent-function-based cyclic correlation:
Definition and theory,” Signal Processing, vol. 164, pp. 206-216, 2019.

J. Han and C. Moraga, “The influence of the sigmoid function parameters on the speed
of backpropagation learning,” in From Natural to Artificial Neural Computation: International
Workshop on Artificial Neural Networks, Malaga-Torremolinos, 1995, pp. 195-201.

A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep
convolutional neural networks,” Communications of the ACM, vol. 60, no. 6, pp. 84-90,
2017.

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,”
in Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.

Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, “A survey of convolutional neural networks:
analysis, applications, and prospects,” IEEE Transactions on Neural Networks and Learning
Systems, vol. 33, no. 12, pp. 6999-7019, Dec. 2022.

J. Yin et al., “Highly parallel GEMV with register blocking method on GPU architecture,”
Journal of Visual Communication and Image Representation, vol. 25, no. 7, pp. 1566-1573,
2014.

J. Cheng et al., “Cache-Major: A Hardware Architecture and Scheduling Policy for Improving
DRAM Access Efficiency in GEMV,” in Proc. 2022 IEEE 16th International Conference
on Solid-State & Integrated Circuit Technology (ICSICT), 2022, pp. 1-3.

S. Khan et al., “Transformers in vision: A survey,” ACM Computing Surveys (CSUR),
vol. 54, no. 10s, pp. 1-41, 2022.

P. C. Chen and Y. K. Hwang, “SANDROS: a dynamic graph search algorithm for motion
planning,” IEEE Transactions on Robotics and Automation, vol. 14, no. 3, pp. 390-403,
1998.

K. M. Chandy and J. Misra, “Distributed computation on graphs: Shortest path algorithms,”
Communications of the ACM, vol. 25, no. 11, pp. 833-837, 1982.

E. M. Palmer, “On the spanning tree packing number of a graph: a survey,” Discrete
Mathematics, vol. 230, no. 1-3, pp. 13-21, 2001.

C. C. Aggarwal and H. Wang, “A survey of clustering algorithms for graph data,” in
Managing and Mining Graph Data, Springer, 2010, pp. 275-301.

Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, “A comprehensive survey
on graph neural networks,” IEEE Transactions on Neural Networks and Learning Systems,
vol. 32, no. 1, pp. 4-24, 2020.

S. Zhang, H. Tong, J. Xu, and R. Maciejewski, “Graph convolutional networks: a comprehensive
review,” Computational Social Networks, vol. 6, no. 1, pp. 1-23, 2019.

M. Chen, Z. Wei, Z. Huang, B. Ding, and Y. Li, “Simple and deep graph convolutional
networks,” in Proc. International Conference on Machine Learning, 2020, pp. 1725-1735.

X. Zou, S. Xu, X. Chen, L. Yan, and Y. Han, “Breaking the von Neumann bottleneck:
architecture-level processing-in-memory technology,” Science China Information Sciences,
vol. 64, no. 6, p. 160404, 2021.

A. Ivanov, N. Dryden, T. Ben-Nun, S. Li, and T. Hoefler, “Data movement is all you
need: A case study on optimizing transformers,” in Proceedings of Machine Learning
and Systems, vol. 3, pp. 711-732, 2021.

G. F. Oliveira et al., “DAMOV: A new methodology and benchmark suite for evaluating
data movement bottlenecks,” IEEE Access, vol. 9, pp. 134457-134502, 2021.

Y. Kim, W. Yang, and O. Mutlu, “Ramulator: A fast and extensible DRAM simulator,”
IEEE Computer Architecture Letters, vol. 15, no. 1, pp. 45-49, 2015.

D. Sanchez and C. Kozyrakis, “ZSim: Fast and accurate microarchitectural simulation
of thousand-core systems,” ACM SIGARCH Computer Architecture News, vol. 41, no. 3,
pp. 475-486, 2013.

C.-K. Luk et al., “Pin: building customized program analysis tools with dynamic instrumentation,”
ACM SIGPLAN Notices, vol. 40, no. 6, pp. 190-200, 2005.

P. Rosenfeld, E. Cooper-Balis, and B. Jacob, “DRAMSim2: A cycle accurate memory system
simulator,” IEEE Computer Architecture Letters, vol. 10, no. 1, pp. 16-19, 2011.

J. D. Leidel and Y. Chen, “HMC-Sim: A simulation framework for hybrid memory cube
devices,” Parallel Processing Letters, vol. 24, no. 4, p. 1442002, 2014.

S. Mittal, “A survey of ReRAM-based architectures for processing-in-memory and neural
networks,” Machine Learning and Knowledge Extraction, vol. 1, no. 1, pp. 75-114, 2018.

N. Challapalle et al., “GaaS-X: Graph analytics accelerator supporting sparse data
representation using crossbar architectures,” in Proc. 2020 ACM/IEEE Annual International
Symposium on Computer Architecture (ISCA), 2020, pp. 433-445.

R. Yang et al., “Ternary content-addressable memory with MoS2 transistors for massively
parallel data search,” Nature Electronics, vol. 2, no. 3, pp. 108-114, 2019.

T. Yang et al., “PIMGCN: A ReRAM-based PIM design for graph convolutional network
acceleration,” in Proc. 2021 58th ACM/IEEE Design Automation Conference, 2021, pp.
583-588.

J. Chen et al., “GCIM: Toward Efficient Processing of Graph Convolutional Networks
in 3D-Stacked Memory,” IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, vol. 41, no. 11, pp. 3579-3590, Nov. 2022.

O. A. Alzubi et al., “An optimal pruning algorithm of classifier ensembles: dynamic
programming approach,” Neural Computing and Applications, vol. 32, pp. 16091-16107,
2020.

M. Yan et al., “HyGCN: A GCN accelerator with hybrid architecture,” in Proc. 2020
IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020,
pp. 15-29.

M. Fey and J. E. Lenssen, “Fast graph representation learning with PyTorch Geometric,”
arXiv preprint arXiv:1903.02428, 2019.

T. Gokmen, M. Onen, and W. Haensch, “Training deep convolutional neural networks with
resistive cross-point devices,” Frontiers in Neuroscience, vol. 11, p. 538, 2017.

X. Peng, R. Liu, and S. Yu, “Optimizing weight mapping and data flow for convolutional
neural networks on RRAM based processing-in-memory architecture,” in Proc. 2019 IEEE
International Symposium on Circuits and Systems, 2019, pp. 1-5.

A. Roohi, S. Angizi, D. Fan, and R. F. DeMara, “Processing-in-memory acceleration
of convolutional neural networks for energy-efficiency, and power-intermittency resilience,”
in Proc. 20th International Symposium on Quality Electronic Design (ISQED), 2019,
pp. 8-13.

Y. Wang, W. Chen, J. Yang, and T. Li, “Towards memory-efficient allocation of CNNs
on processing-in-memory architecture,” IEEE Transactions on Parallel and Distributed
Systems, vol. 29, no. 6, pp. 1428-1441, 2018.

H. Dbouk, S. K. Gonugondla, C. Sakr, and N. R. Shanbhag, “A 0.44-μJ/dec, 39.9-μs/dec,
recurrent attention in-memory processor for keyword spotting,” IEEE Journal of Solid-State
Circuits, vol. 56, no. 7, pp. 2234-2244, 2020.

A. K. Ramanathan et al., “Look-up table based energy efficient processing in cache
support for neural network acceleration,” in Proc. 2020 53rd Annual IEEE/ACM International
Symposium on Microarchitecture (MICRO), 2020, pp. 88-101.

M. He et al., “Newton: A DRAM-maker’s accelerator-in-memory (AiM) architecture for
machine learning,” in Proc. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture
(MICRO), 2020, pp. 372-385.

M. Zhou, W. Xu, J. Kang, and T. Rosing, “TransPIM: A Memory-based Acceleration via
Software-Hardware Co-Design for Transformer,” in Proc. 2022 IEEE International Symposium
on High-Performance Computer Architecture (HPCA), 2022, pp. 1071-1085.

J. Gómez-Luna et al., “Benchmarking memory-centric computing systems: Analysis of
real processing-in-memory hardware,” in Proc. 2021 12th International Green and Sustainable
Computing Conference (IGSC), 2021, pp. 1-7.

F. Devaux, “The true processing in memory accelerator,” in Proc. 2019 IEEE Hot Chips
31 Symposium (HCS), 2019, pp. 1-24.

H. Shin et al., “McDRAM: Low latency and energy-efficient matrix computations in DRAM,”
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.
37, no. 11, pp. 2613-2622, 2018.

S. Cho, H. Choi, E. Park, H. Shin, and S. Yoo, “McDRAM v2: In-dynamic random access
memory systolic array accelerator to address the large model problem in deep neural
networks on the edge,” IEEE Access, vol. 8, pp. 135223-135243, 2020.

Y. C. Kwon et al., “25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a
1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning
Applications,” in Proc. 2021 IEEE International Solid-State Circuits Conference, Feb.
2021, pp. 350-352.

S. Lee et al., “Hardware architecture and software stack for PIM based on commercial
DRAM technology: Industrial product,” in Proc. 2021 ACM/IEEE 48th Annual International
Symposium on Computer Architecture (ISCA), 2021, pp. 43-56.

J. H. Kim et al., “Aquabolt-XL: Samsung HBM2-PIM with in-memory processing for ML
accelerators and beyond,” in Proc. 2021 IEEE Hot Chips 33 Symposium (HCS), 2021, pp.
1-26.

D. Kwon et al., “A 1ynm 1.25 V 8Gb 16Gb/s/Pin GDDR6-Based Accelerator-in-Memory Supporting
1TFLOPS MAC Operation and Various Activation Functions for Deep Learning Application,”
IEEE Journal of Solid-State Circuits, vol. 58, no. 1, pp. 291-302, 2022.

A. Boroumand et al., “LazyPIM: Efficient support for cache coherence in processing-in-memory
architectures,” arXiv preprint arXiv:1706.03162, 2017.

M. Gao, J. Pu, X. Yang, M. Horowitz, and C. Kozyrakis, “TETRIS: Scalable and efficient
neural network acceleration with 3D memory,” in Proc. 22nd International Conference
on Architectural Support for Programming Languages and Operating Systems, 2017, pp.
751-764.

M. Drumond et al., “The Mondrian data engine,” in Proc. 44th
International Symposium on Computer Architecture, 2017.

G. Dai et al., “GraphH: A processing-in-memory architecture for large-scale graph
processing,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, vol. 38, no. 4, pp. 640-653, 2018.

M. Zhang et al., “GraphP: Reducing communication for PIM-based graph processing with
efficient data partition,” in Proc. 2018 IEEE International Symposium on High Performance
Computer Architecture (HPCA), 2018, pp. 544-557.

Y. Huang et al., “A heterogeneous PIM hardware-software co-design for energy-efficient
graph processing,” in Proc. 2020 IEEE International Parallel and Distributed Processing
Symposium (IPDPS), 2020, pp. 684-695.

Y. Zhuo et al., “GraphQ: Scalable PIM-based graph processing,” in Proc. 52nd Annual
IEEE/ACM International Symposium on Microarchitecture, 2019, pp. 712-725.

J. Gómez-Luna et al., “Machine Learning Training on a Real Processing-in-Memory System,”
in Proc. 2022 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2022, pp. 292-295.

C. Giannoula, I. Fernandez, J. Gómez-Luna, N. Koziris, G. Goumas, and O. Mutlu, “Towards
efficient sparse matrix vector multiplication on real processing-in-memory systems,”
arXiv preprint arXiv:2204.00900, 2022.

I. Fernandez et al., “Exploiting near-data processing to accelerate time series analysis,”
in Proc. 2022 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2022, pp. 279-282.

G. F. Oliveira, A. Boroumand, S. Ghose, J. Gómez-Luna, and O. Mutlu, “Heterogeneous
Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in
Machine Learning and Databases,” in Proc. 2022 IEEE Computer Society Annual Symposium
on VLSI (ISVLSI), 2022, pp. 273-278.

A. C. Jacob et al., “Compiling for the active memory cube,” IBM Research Division,
Tech. Rep. RC25644 (WAT1612-008), 2016.

S. Lloyd and M. Gokhale, “Design space exploration of near memory accelerators,” in
Proc. International Symposium on Memory Systems, 2018, pp. 218-220.

M. Gokhale, S. Lloyd, and C. Hajas, “Near memory data structure rearrangement,” in
Proc. International Symposium on Memory Systems, 2015, pp. 283-290.

S. Lloyd and M. Gokhale, “In-memory data rearrangement for irregular, data-intensive
computing,” Computer, vol. 48, no. 8, pp. 18-25, 2015.

A. Rodrigues, M. Gokhale, and G. Voskuilen, “Towards a scatter-gather architecture:
hardware and software issues,” in Proc. International Symposium on Memory Systems,
2019, pp. 261-271.

S. Lloyd and M. Gokhale, “Near memory key/value lookup acceleration,” in Proc. International
Symposium on Memory Systems, 2017, pp. 26-33.

J. Landgraf, S. Lloyd, and M. Gokhale, “Combining emulation and simulation to evaluate
a near memory key/value lookup accelerator,” arXiv preprint arXiv:2105.06594, 2021.

L. Ke et al., “Near-Memory Processing in Action: Accelerating Personalized Recommendation
With AxDIMM,” IEEE Micro, vol. 42, no. 1, pp. 116-127, 2022.

D. Lee et al., “Improving in-memory database operations with acceleration DIMM (AxDIMM),”
in Proc. Data Management on New Hardware, 2022, pp. 1-9.

S. Li et al., “DRISA: A DRAM-based reconfigurable in-situ accelerator,” in Proc. 50th
Annual IEEE/ACM International Symposium on Microarchitecture, 2017, pp. 288-301.

Q. Deng, L. Jiang, Y. Zhang, M. Zhang, and J. Yang, “DrAcc: A DRAM based accelerator
for accurate CNN inference,” in Proc. Annual Design Automation Conference, 2018, pp.
1-6.

S. Li et al., “SCOPE: A stochastic computing engine for DRAM-based in-situ accelerator,”
in Proc. 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO),
2018, pp. 696-709.

X. Xin, Y. Zhang, and J. Yang, “ELP2IM: Efficient and low power bitwise operation
processing in DRAM,” in Proc. 2020 IEEE International Symposium on High Performance
Computer Architecture (HPCA), 2020, pp. 303-314.

V. Seshadri et al., “Ambit: In-memory accelerator for bulk bitwise operations using
commodity DRAM technology,” in Proc. Annual IEEE/ACM International Symposium on Microarchitecture,
2017, pp. 273-287.

M. Ali et al., “IMAC: In-memory multi-bit multiplication and accumulation in 6T SRAM
array,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 67, no.
8, pp. 2521-2531, 2020.

S. Yin, Z. Jiang, J.-S. Seo, and M. Seok, “XNOR-SRAM: In-memory computing SRAM macro
for binary/ternary deep neural networks,” IEEE Journal of Solid-State Circuits, vol.
55, no. 6, pp. 1733-1743, 2020.

J. Heo, J. Kim, S. Lim, W. Han, and J.-Y. Kim, “T-PIM: An Energy-Efficient Processing-in-Memory
Accelerator for End-to-End On-Device Training,” IEEE Journal of Solid-State Circuits,
vol. 58, no. 3, pp. 600-613, March 2023.

A. Biswas and A. P. Chandrakasan, “Conv-RAM: An energy-efficient SRAM with embedded
convolution computation for low-power CNN-based machine learning applications,” in
Proc. 2018 IEEE International Solid-State Circuits Conference (ISSCC), 2018, pp. 488-490.

A. Shafiee et al., “ISAAC: A convolutional neural network accelerator with in-situ
analog arithmetic in crossbars,” ACM SIGARCH Computer Architecture News, vol. 44,
no. 3, pp. 14-26, 2016.

P. Chi et al., “PRIME: A novel processing-in-memory architecture for neural network
computation in ReRAM-based main memory,” ACM SIGARCH Computer Architecture News, vol.
44, no. 3, pp. 27-39, 2016.

X. Sun, S. Yin, X. Peng, R. Liu, J. Seo, and S. Yu, “XNOR-RRAM: A scalable and parallel
resistive synaptic architecture for binary neural networks,” in Proc. 2018 Design,
Automation & Test in Europe Conference & Exhibition, 2018, pp. 1423-1428.

T. Tang, L. Xia, B. Li, Y. Wang, and H. Yang, “Binary convolutional neural network
on RRAM,” in Proc. Asia and South Pacific Design Automation Conference (ASP-DAC),
2017, pp. 782-787.

M. Imani, S. Gupta, Y. Kim, and T. Rosing, “FloatPIM: In-memory acceleration of deep
neural network training with high precision,” in Proc. International Symposium on
Computer Architecture, 2019, pp. 802-815.

L. Song, X. Qian, H. Li, and Y. Chen, “PipeLayer: A pipelined ReRAM-based accelerator
for deep learning,” in Proc. IEEE International Symposium on High Performance Computer
Architecture (HPCA), 2017, pp. 541-552.

S. Angizi, Z. He, A. Awad, and D. Fan, “MRIMA: An MRAM-based in-memory accelerator,”
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.
39, no. 5, pp. 1123-1136, 2019.

A. D. Patil, H. Hua, S. Gonugondla, M. Kang, and N. R. Shanbhag, “An MRAM-based deep
in-memory architecture for deep neural networks,” in Proc. 2019 IEEE International
Symposium on Circuits and Systems (ISCAS), 2019, pp. 1-5.

S. Angizi, Z. He, F. Parveen, and D. Fan, “IMCE: Energy-efficient bit-wise in-memory
convolution engine for deep neural network,” in Proc. Asia and South Pacific Design
Automation Conference (ASP-DAC), 2018, pp. 111-116.

S. Angizi, Z. He, A. S. Rakin, and D. Fan, “CMP-PIM: An energy-efficient comparator-based
processing-in-memory neural network accelerator,” in Proc. Annual Design Automation
Conference, 2018, pp. 1-6.

Y. Long, T. Na, and S. Mukhopadhyay, “ReRAM-based processing-in-memory architecture
for recurrent neural network acceleration,” IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 26, no. 12, pp. 2781-2794, 2018.

S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol.
9, no. 8, pp. 1735-1780, 1997.

K. Cho et al., “Learning phrase representations using RNN encoder-decoder for statistical
machine translation,” arXiv preprint arXiv:1406.1078, 2014.

J. Han, H. Liu, M. Wang, Z. Li, and Y. Zhang, “ERA-LSTM: An efficient ReRAM-based
architecture for long short-term memory,” IEEE Transactions on Parallel and Distributed
Systems, vol. 31, no. 6, pp. 1328-1342, 2019.

N. Challapalle et al., “PSB-RNN: A processing-in-memory systolic array architecture
using block circulant matrices for recurrent neural networks,” in Proc. Design, Automation
& Test in Europe Conference & Exhibition, 2020, pp. 180-185.

A. Vaswani et al., “Attention is all you need,” Advances in Neural Information Processing
Systems, vol. 30, 2017.

X. Yang, B. Yan, H. Li, and Y. Chen, “ReTransformer: ReRAM-based processing-in-memory
architecture for transformer acceleration,” in Proc. 39th International Conference
on Computer-Aided Design, 2020, pp. 1-9.

A. Dosovitskiy et al., “An image is worth 16x16 words: Transformers for image recognition
at scale,” arXiv preprint arXiv:2010.11929, 2020.

S. M. Lakew, M. Cettolo, and M. Federico, “A comparison of transformer and recurrent
neural networks on multilingual neural machine translation,” arXiv preprint arXiv:1806.06957,
2018.

S. Lee and H. Kim, “GaussianMask: Uncertainty-aware Instance Segmentation based on
Gaussian Modeling,” in Proc. 26th International Conference on Pattern Recognition
(ICPR 2022), pp. 3851-3857, Aug. 2022.

J. Jang, H. Lee, and H. Kim, “Performance Analysis of Phase Change Memory System on
Various CNN Inference Workloads,” in Proc. 19th International SoC Design Conference
(ISOCC 2022), pp. 133-134, Oct. 2022.

J. Jang, H. Lee, and H. Kim, “Characterizing Memory Access Patterns of Various Convolutional
Neural Networks for Utilizing Processing-In-Memory,” in Proc. 2023 Int. Conf. on Electronics,
Information, and Communications (ICEIC 2023), pp. 358-360, Feb. 2023.

D. Chun, J. Choi, H.-J. Lee, and H. Kim, “CP-CNN: Computational Parallelization of
CNN-based Object Detectors in Heterogeneous Embedded Systems for Autonomous Driving,”
IEEE Access, vol. 11, pp. 52812-52823, 2023.

J. Lee and H. Kim, “Discrete Cosine Transformed Images Are Easy To Recognize in Vision
Transformers,” IEIE Transactions on Smart Processing & Computing, vol. 12, no. 1,
pp. 48-54, Feb. 2023.

D. Nguyen, N. Hung, H. Kim, and H.-J. Lee, “An Approximate Memory Architecture for
Energy Saving in Deep Learning Applications,” IEEE Transactions on Circuits and Systems
I: Regular Papers, vol. 67, no. 5, pp. 1588-1601, May 2020.

S. Lee, H. Lee, H.-J. Lee, and H. Kim, “Evaluation of Various Workloads in Filebench
Suitable for Phase-Change Memory,” IEIE Transactions on Smart Processing & Computing,
vol. 10, no. 2, pp. 160-166, Apr. 2021.

S. Cho, “Volatile and Nonvolatile Memory Devices for Neuromorphic and Processing-in-memory
Applications,” Journal of Semiconductor Technology and Science, vol. 22, no. 1, pp. 30-46,
Feb. 2022.