
  1. (Department of Electronic Systems Engineering, The University of Shiga Prefecture, Hikone, Shiga 522-8533, Japan)

FPGA, image processing, memory, IoT, machine learning


Recently, various sensor systems for Internet of Things (IoT) devices have been proposed and developed (1-4). Image processing is a key technology supporting such sensor systems and has been actively studied for applications in self-driving cars (5), factory automation systems (6), security technology (7), and so on. Because high-performance camera modules are readily available, the demand for processing high-resolution (4K or 8K) images to detect objects more precisely is increasing (8). These systems require small size, low power consumption, and high speed. However, it is difficult for a central processing unit (CPU) to process such large amounts of data in real time because it operates sequentially (9).

In contrast, a field-programmable gate array (FPGA) offers many advantages, including high-speed processing derived from its parallel operation. It is therefore well suited to image processing that involves large-scale computation at high speed (9-12).

Fig. 1 shows a typical FPGA-based image processing system. It consists of an FPGA, memory, and input/output devices (a camera and a monitor). In most cases, dynamic random-access memory (DRAM) is adopted as the memory because it has a large capacity and is mounted on general-purpose FPGA boards (13,14). The FPGA transfers image data from the camera to the monitor in the following steps.

1. The captured camera images are written to the memory and held there temporarily.

2. The image data is loaded from the memory into the FPGA for image processing.

3. The results of image processing are sent to the memory and held there temporarily.

4. The processed results in the memory are output to the monitor.
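To give a sense of why memory access becomes the bottleneck, the sketch below roughly estimates the pixel traffic that these four steps place on the shared memory, using the 640 × 480-pixel image size, 30 fps camera rate, and 60 fps monitor output of the experiments described later. It assumes one memory transfer per pixel, no burst grouping, and image processing that reads and writes each pixel once per frame at 30 fps. A real DDR3 controller moves data in bursts, so these numbers are only a proxy.

```python
WIDTH, HEIGHT = 640, 480   # image size used in the experiments
CAMERA_FPS = 30            # step 1: camera writes
PROCESS_FPS = 30           # steps 2 and 3: assumed processing rate
MONITOR_FPS = 60           # step 4: monitor reads


def pixel_transfers_per_second() -> dict:
    """Pixel transfers per second for each access kind, plus the total.

    Assumption: one memory transfer per pixel and no burst grouping.
    """
    pixels = WIDTH * HEIGHT
    traffic = {
        "camera_write": pixels * CAMERA_FPS,
        "process_read": pixels * PROCESS_FPS,
        "process_write": pixels * PROCESS_FPS,
        "monitor_read": pixels * MONITOR_FPS,
    }
    traffic["total"] = sum(traffic.values())
    return traffic


if __name__ == "__main__":
    for name, rate in pixel_transfers_per_second().items():
        print(f"{name:>13}: {rate / 1e6:.3f} Mpixel/s")
```

Under these assumptions, the monitor read alone accounts for as much traffic as the image processing reads and writes combined, because the monitor runs at twice the camera frame rate.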

As described above, the FPGA in an image processing system controls all the devices and receives and transmits image data at the appropriate timings. Because the frame rates of the camera and monitor are fixed in advance, the timing available for memory access for image processing is limited. When the interval between memory accesses for image processing is constant, reducing the frequency of such accesses avoids collisions with other access requests, but it also reduces the processing speed.

Fig. 1. FPGA-based image processing system


Fig. 2. FPGA-based image processing system

We have studied a new method to dynamically control memory access for image processing by monitoring the status of access requests from the monitor and camera controllers. The basic operating principle and preliminary experimental results were presented in our previous report (15). In this paper, we describe the memory access method in detail and present its implementation on an FPGA board with a DDR3 SDRAM. We also show that the proposed method processes images 1.65 times faster than the conventional method.

This paper is organized as follows. Section II explains the procedure for memory access in an FPGA-based image processing system. Section III describes the problems of a conventional image processing system. Section IV proposes a method to optimize memory access for image processing, and Section V presents measurement results obtained with the proposed method. Section VI discusses the advantages of the proposed method, and Section VII concludes the paper.


This section provides an overview of memory access in an FPGA-based image processing system. Fig. 2 shows the configuration of the image processing modules in the FPGA. These include device controllers for the camera, monitor, and memory, an image processing module, and a memory arbiter. There are four kinds of memory access: monitor read access, camera write access, and image processing read and write accesses. The memory can accept only one access at a time. When multiple modules request access simultaneously, only one request is accepted and processed, while the others may be lost (the collision mentioned in Section I). This can cause system malfunctions. Therefore, an arbiter is usually inserted between the modules and the memory to control the access timing of each controller and prevent conflicts caused by simultaneous requests (16). The memory arbiter thus transmits access requests to the memory controller one at a time.

In general, memory arbitration schemes are classified into three types: round robin, first in first out (FIFO), and priority (17,18). In the round robin method, each module is given a fixed time slot for accessing the memory controller, in a predefined order. A module's request is accepted only during its own slot; if the module has no request in its slot, the slot goes unused even when other modules have pending requests. Although requests never collide, this idle time makes it difficult to improve the processing speed. In the FIFO method, requests are accepted and executed in order of arrival at the arbiter. Unlike the round robin method, all of the memory access time is available to every module, but requests can collide and some may be lost. In particular, a lost write request from the camera cannot be recovered. Hence, the FIFO method is not suitable for real-time image processing.

In the priority method, each module is given a specific priority (low, medium, or high). In image processing, higher priorities should be assigned to the camera and monitor because they operate at fixed frame rates. Priority methods are further classified into fixed and dynamic types. In the fixed type, priorities are assigned to the modules in advance, so requests from the lower-priority image processing module may not be accepted. The dynamic type, on the other hand, gives priority to modules that access the memory frequently, which reduces idle time. However, a dynamic-priority arbiter is more difficult to implement because it requires complicated condition settings.

Fig. 3. Configuration and function of memory arbiter
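As a rough illustration of the difference between the round robin and fixed-priority schemes, the following sketch models a single arbitration decision for the four modules. The module names and the index-based priority order (monitor, camera, process read, process write) are assumptions made for illustration; the FIFO scheme and the dynamic-priority variant are not modeled.

```python
from typing import Optional, Sequence

# Hypothetical module order; index 0 has the highest fixed priority.
MODULES = ("monitor", "camera", "process_read", "process_write")


def round_robin_grant(slot: int, pending: Sequence[bool]) -> Optional[int]:
    """Round robin: slot k belongs to module k mod N and is wasted if that module is idle."""
    owner = slot % len(pending)
    return owner if pending[owner] else None


def fixed_priority_grant(pending: Sequence[bool]) -> Optional[int]:
    """Fixed priority: grant the highest-priority module that has a request."""
    for index, has_request in enumerate(pending):
        if has_request:
            return index
    return None


if __name__ == "__main__":
    pending = [False, False, True, True]   # only image processing is requesting
    print([round_robin_grant(s, pending) for s in range(4)])   # None marks an idle slot
    print(MODULES[fixed_priority_grant(pending)])
```

With only the image processing module requesting, round robin leaves the monitor and camera slots idle, whereas the fixed-priority grant serves a process request at once. If the camera or monitor always had a request, the same fixed-priority function would keep the process modules waiting, which is the drawback of the fixed type noted above.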

First, we consider the fixed-priority arbiter. Fig. 3 shows the configuration of the memory arbiter, which consists of an arbiter controller and registers. Specifically, it has two registers (register 1 and register 2) for each module to store that module's access requests. As shown in Fig. 3, the arbiter operates as follows. Each register 2 receives access requests from its module and transfers them to the corresponding register 1. The arbiter controller then selects one request and transfers it to the memory controller according to the following priority order. Note that higher priority is given to memory access from the camera and monitor controllers, which issue requests at regular intervals as mentioned above.

1. Memory controller is busy: no access.

2. Monitor register 1 has a request: monitor access.

3. Camera register 1 has a request: camera access.

4. Process read register 1 has a request: process read access.

5. Process write register 1 has a request: process write access.

6. No request from any register 1: no access.
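The priority rules above can be written as a small behavioral model of the arbiter controller. This is a hypothetical cycle-level sketch, not the authors' HDL: the class and method names are invented, each cycle is assumed to first grant at most one register 1 request and then move any waiting request from register 2 to register 1, and a new request written into an occupied register 2 is reported as overwritten.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Fixed priority order from rules 2-5: monitor > camera > process read > process write.
PRIORITY = ("monitor", "camera", "process_read", "process_write")


@dataclass
class ModulePort:
    reg2: Optional[str] = None   # receives the module's request
    reg1: Optional[str] = None   # holds the request the controller may grant

    def shift(self) -> None:
        """Move a waiting request from register 2 to register 1 if register 1 is free."""
        if self.reg1 is None and self.reg2 is not None:
            self.reg1, self.reg2 = self.reg2, None


@dataclass
class FixedPriorityArbiter:
    ports: Dict[str, ModulePort] = field(
        default_factory=lambda: {name: ModulePort() for name in PRIORITY}
    )

    def submit(self, module: str, request: str) -> bool:
        """Write a request into the module's register 2; True if an old one was overwritten."""
        port = self.ports[module]
        overwritten = port.reg2 is not None
        port.reg2 = request
        return overwritten

    def cycle(self, memory_busy: bool) -> Optional[str]:
        """One arbitration cycle; returns the request sent to the memory controller."""
        granted = None
        if not memory_busy:                      # rule 1: busy -> no access
            for name in PRIORITY:                # rules 2-5: first non-empty register 1
                port = self.ports[name]
                if port.reg1 is not None:
                    granted, port.reg1 = port.reg1, None
                    break
        for port in self.ports.values():         # registers 2 refill registers 1
            port.shift()
        return granted                           # rule 6: None when nothing is pending


if __name__ == "__main__":
    arbiter = FixedPriorityArbiter()
    arbiter.submit("process_read", "P1")
    arbiter.submit("camera", "C1")
    # Cycle 0 only moves requests into registers 1; cycle 1 is memory-busy.
    print([arbiter.cycle(busy) for busy in (False, True, False, False, False)])
```

In this example, the camera request is granted before the process read request even though both are waiting, and nothing is granted while `memory_busy` is true.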

Fig. 4. Process access permission timing


Here, the memory-busy state (priority order 1) occurs when the memory controller is refreshing the DRAM or when the buffer in the memory controller is full (13).


Because the memory arbiter operates according to the priority order described in Section II, memory access for image processing is permitted only if the process access permission is high, as shown in Fig. 4. Here, we focus on either reading or writing in image processing, and for simplicity, we assume that only one register is used.

The timings of image processing and of memory access requests are synchronized with the clock signal. The frequencies of processing and of requests equal the clock frequency divided by a constant integer, so access requests are sent to the arbiter at regular intervals. This causes the problems described below.
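The fixed request timing can be expressed as t_interval = N × T_clk, where N is the constant integer divider and T_clk is the clock period. The sketch below lists the request times within a window. The 100 MHz clock in the example is a hypothetical value chosen so that N = 14 gives the 140 ns interval reported in Section V, and the first request is assumed to be issued at t = 0.

```python
from typing import List


def request_times_ns(clock_mhz: float, divider: int, window_ns: float) -> List[float]:
    """Issue times of fixed-interval requests in [0, window_ns).

    The interval is t_interval = divider * T_clk; the first request is assumed at t = 0.
    """
    period_ns = 1000.0 / clock_mhz          # clock period T_clk in ns
    interval_ns = divider * period_ns       # constant request interval
    times: List[float] = []
    k = 0
    while k * interval_ns < window_ns:
        times.append(k * interval_ns)
        k += 1
    return times


if __name__ == "__main__":
    # Hypothetical 100 MHz clock: a divider of 14 gives a 140 ns interval.
    print(request_times_ns(clock_mhz=100.0, divider=14, window_ns=640.0))
```

With these assumed values, the function returns five request times within 640 ns, the same count as in the conventional measurement.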

Fig. 5 shows the flow of memory access for image processing. First, as described in Section II, a process access request sent from the image processing module is stored in a process register in the arbiter. When the process access permission is high, the arbiter controller can receive the request from the register, and the memory access is executed.

To increase the image processing speed, the access request interval t_interval should be as short as possible. In that case, however, a process access request may be overwritten and lost, as shown in Fig. 5, which prevents the system from operating properly. This happens when a new request (request 6) arrives before the existing request (request 5) in the register has been transferred to the memory. As a result, the existing request is overwritten by the new request and lost.

Fig. 5. Process access timing in shorter interval case


Fig. 6. Process access timing in longer interval case


Fig. 7. Active and inactive periods for monitor and camera access

To avoid this problem, the interval must be long enough that no requests are lost, as shown in Fig. 6. Unfortunately, this decreases the image processing speed. In addition, there are times when the arbiter could accept a request but none has been sent, so those process access permission periods are wasted.
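The trade-off between losing requests and wasting permission can be reproduced with a toy cycle-level model of the conventional scheme. It assumes a single process register (as in the simplified case above), a request arriving every `interval` cycles, and an arbitrary permission pattern given as a list of booleans; within a cycle, the new request arrives before the arbiter checks the permission. All names and the example pattern are hypothetical.

```python
from typing import List, Tuple


def simulate_fixed_interval(permission: List[bool], interval: int) -> Tuple[int, int, int]:
    """Single process register with requests every `interval` cycles.

    Returns (served, lost, idle): requests transferred to memory, requests
    overwritten before transfer, and cycles with permission but no request.
    """
    register_full = False
    served = lost = idle = 0
    for cycle, allowed in enumerate(permission):
        if cycle % interval == 0:            # a new request arrives at a fixed interval
            if register_full:
                lost += 1                    # the older request is overwritten
            register_full = True
        if allowed:                          # process access permission is high
            if register_full:
                served += 1
                register_full = False
            else:
                idle += 1                    # permission available but nothing to send
    return served, lost, idle


if __name__ == "__main__":
    permission = [True, False, False] * 4    # permission every third cycle
    print("short interval:", simulate_fixed_interval(permission, 1))
    print("long interval: ", simulate_fixed_interval(permission, 5))
```

With permission every third cycle over 12 cycles, an interval of 1 cycle returns `(4, 7, 0)`, so seven requests are overwritten, whereas an interval of 5 cycles returns `(2, 0, 2)`: nothing is lost, but two permission slots go unused.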

Fig. 8. Process access permission timing in inactive period


Fig. 9. Process access permission timing in inactive period


Furthermore, as shown in Fig. 7, there are two distinct periods: one in which access requests from the camera and monitor controllers occur frequently (active period) and one in which they occur rarely (inactive period). This is due to the specifications of the communication protocols for the camera and monitor. During an inactive period, more process access permission is available than during an active period, as shown in Fig. 8. With a fixed request interval, much of this available time is not used for image processing and is therefore wasted, as shown in Fig. 9.


The problems described in Section III stem from the constant intervals between process access requests, so it is desirable to control the request intervals dynamically. For example, by monitoring the memory status, more image processing requests can be accepted when memory traffic is light, while process access can be limited during an active period. Dynamic memory access is therefore effective both for preventing the loss of processing requests and for reducing the wasted time during inactive periods.

Fig. 10. Proposed stop-processing function

Hence, we propose a method that dynamically controls the intervals of memory access requests according to the memory state. It is a fixed-priority method and differs from the dynamic-priority method described in Section II in that the priority given to each module remains fixed. When the image processing module has priority (priority order 4 or 5 in Section II), as many image processing requests as possible are accepted while the memory status is monitored. Consequently, the arbiter can accept requests dynamically.

Because of its algorithm, the image processing module often issues read and write access requests simultaneously. If a process read or process write register 2 still holds a request, that request may be overwritten by the next one. We therefore designed a stop-processing function, shown in Fig. 10, with which the image processing module temporarily stops sending access requests to the arbiter while there is a request in the process read or process write register 2. Here, we assume that image processing has priority (priority order 4 or 5), as described in Section II. Each register 2 outputs an “existence” signal indicating whether it holds a request, and the OR of the two existence signals serves as the stop-processing signal for the image processing module. After the arbiter controller has transferred the requests in the read and write registers 1 and the requests in the read and/or write registers 2 have moved to the registers 1, the registers 2 are empty. At this point, both existence signals go low, the stop-processing signal (OR output) is turned off, and the image processing module resumes operation.
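The stop-processing logic itself reduces to a single OR of the existence signals. The sketch below expresses it in software; the `ProcessPort` class and its fields are hypothetical names for one register 2/register 1 pair, and the existence signal is taken to be high whenever register 2 holds a request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessPort:
    """Hypothetical model of one process register pair in the arbiter."""
    reg2: Optional[int] = None
    reg1: Optional[int] = None

    @property
    def exists(self) -> bool:
        """Existence signal: high while register 2 holds a request."""
        return self.reg2 is not None


def stop_processing(read_port: ProcessPort, write_port: ProcessPort) -> bool:
    """OR of the read and write existence signals; high means the module must pause."""
    return read_port.exists or write_port.exists


if __name__ == "__main__":
    read, write = ProcessPort(reg2=2, reg1=1), ProcessPort()
    print(stop_processing(read, write))      # register 2 occupied -> stop
    read.reg1 = None                         # arbiter controller transfers register 1
    read.reg1, read.reg2 = read.reg2, None   # register 2 moves to register 1
    print(stop_processing(read, write))      # both registers 2 empty -> resume
```

In this example, the stop signal stays high until the request in the read register 2 has moved to register 1, which mirrors the release condition described above.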

Fig. 11. Optimization of process access timing in inactive period


In this way, the image processing module stops and restarts according to the stop-processing signal, which reflects the memory state.

Fig. 11 shows the timing chart of memory access with the proposed method. While registers 1 and 2 have no requests, or while the process access permission is enabled, the image processing module continues to issue access requests. When the process access permission is not enabled, access requests are paused by the stop-processing function. In this way, the proposed method ensures that, in principle, no requests are lost and no time is wasted.
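Combining the register pipeline with the stop-processing signal gives the behavior described by this timing chart. The toy model below assumes a hypothetical image processing module that always has work and issues one request per cycle unless stopped, a single read-or-write path with one register 2 and one register 1, and the cycle order: the module issues, register 2 moves to register 1, and the arbiter serves if permission is high. Because a request is written only into an empty register 2, the model has no overwrite path, so it reports served requests, unused permission slots, and requests still pending at the end.

```python
from typing import List, Optional, Tuple


def simulate_stop_processing(permission: List[bool]) -> Tuple[int, int, int]:
    """Toy model of one process path with the stop-processing function.

    Returns (served, idle, pending): requests transferred to memory, cycles with
    permission but nothing in register 1, and requests left in the registers.
    """
    reg2: Optional[int] = None
    reg1: Optional[int] = None
    next_id = 0
    served = idle = 0
    for allowed in permission:
        stop = reg2 is not None                 # existence signal of register 2
        if not stop:                            # module issues only into an empty register 2
            reg2, next_id = next_id, next_id + 1
        if reg1 is None and reg2 is not None:   # register 2 -> register 1
            reg1, reg2 = reg2, None
        if allowed:                             # process access permission is high
            if reg1 is not None:
                served += 1
                reg1 = None
            else:
                idle += 1
    pending = (reg1 is not None) + (reg2 is not None)
    return served, idle, pending


if __name__ == "__main__":
    active_then_inactive = [True, False, False, False] * 2 + [True] * 4
    print(simulate_stop_processing(active_then_inactive))
```

For a permission pattern with a sparse active segment followed by four consecutive permission cycles, `[True, False, False, False] * 2 + [True] * 4`, the function returns `(6, 0, 0)`: every permission slot is used and no request is overwritten.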


As listed in Table 1, we implemented an FPGA-based image processing system with the proposed method and examined its operating characteristics. The image size was 640 × 480 pixels, and the maximum frame rate was 30 fps. A low-pass filter (LPF) (20), which is easy to implement, was adopted as the image processing to confirm the operation of the proposed system. The LPF computed the average pixel value within a 10 × 10-pixel patch over the entire 640 × 480-pixel input image. The output frame rate was set to 60 fps because of the monitor’s specifications. Fig. 12 shows the result of image processing, which confirms that the LPF operated properly and the image was averaged as expected.
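For checking the filter output in software, a reference model of the 10 × 10 averaging LPF can be written with a summed-area table. This is not the FPGA circuit, which would typically stream pixels through line buffers. The window placement (rows and columns from `y - 5` to `y + 4`) and the border handling (averaging only the pixels inside the image) are assumptions, since the text does not specify them.

```python
from typing import List


def box_filter(image: List[List[float]], size: int = 10) -> List[List[float]]:
    """Average over a size x size window around each pixel (simple low-pass filter).

    Assumed window: rows/columns from (y - size//2) to (y - size//2 + size - 1),
    clipped at the borders, where only the pixels inside the image are averaged.
    """
    height, width = len(image), len(image[0])
    # Summed-area table: sat[y][x] = sum of image[r][c] for r < y, c < x.
    sat = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += image[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    half = size // 2
    out: List[List[float]] = []
    for y in range(height):
        y0, y1 = max(0, y - half), min(height, y - half + size)
        row = []
        for x in range(width):
            x0, x1 = max(0, x - half), min(width, x - half + size)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            row.append(total / ((y1 - y0) * (x1 - x0)))
        out.append(row)
    return out


if __name__ == "__main__":
    # Synthetic 640 x 480 test pattern: left half dark, right half bright.
    frame = [[0 if x < 320 else 255 for x in range(640)] for _ in range(480)]
    smoothed = box_filter(frame)
    print(smoothed[240][315:325])   # the sharp edge is spread over about 10 pixels
```

The summed-area table makes each output pixel cost four lookups regardless of the window size, which keeps the reference model fast enough for full 640 × 480 frames.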

Table 1. Measurement conditions

Evaluation board: Nexys Video
Memory: MT41K256M16HA-187E [19]
Camera: OV5642 camera module (CMOS image sensor)

Fig. 12. LPF processing result


To examine the validity of the proposed method, both the conventional and the proposed processing systems were implemented on an FPGA. Output signals such as the monitor access, camera access, memory-busy, and LPF access signals were then evaluated with a logic analyzer. Here, LPF access includes both the read and write accesses for LPF image processing. As shown in Fig. 13, the memory access interval for image processing in the conventional method was fixed at 140 ns, which corresponded to only 5 accesses within 640 ns. During the active period indicated by (1) in Fig. 14(a), the proposed method made 9 memory accesses within 640 ns, 1.8 times as many as the conventional method, as shown in Fig. 14(b). Furthermore, during the inactive period indicated by (2) in Fig. 14(a), it made 14 memory accesses within 640 ns, 2.8 times as many, as shown in Fig. 14(c).
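The access counts reported above can be checked with simple arithmetic. With a fixed 140 ns interval and the first access at t = 0, a 640 ns window holds ceil(640/140) = 5 accesses, and the measured counts of the proposed method give the ratios quoted in the text. The counting convention (first access at t = 0) is an assumption.

```python
import math

WINDOW_NS = 640            # observation window used in Figs. 13 and 14
FIXED_INTERVAL_NS = 140    # conventional fixed interval (Fig. 13)
MEASURED_PROPOSED = {"active": 9, "inactive": 14}   # accesses per 640 ns, Fig. 14(b), (c)


def accesses_in_window(interval_ns: float, window_ns: float) -> int:
    """Fixed-interval accesses in [0, window_ns), assuming the first one at t = 0."""
    return math.ceil(window_ns / interval_ns)


conventional = accesses_in_window(FIXED_INTERVAL_NS, WINDOW_NS)
ratios = {period: count / conventional for period, count in MEASURED_PROPOSED.items()}

if __name__ == "__main__":
    print("conventional:", conventional)
    for period, ratio in ratios.items():
        print(f"{period}: {ratio:.1f}x")
```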

Next, as shown in Fig. 15, the image processing speed of the conventional method was 18.6 fps, which was limited by the minimum fixed interval that avoids losing image processing access requests. In contrast, the proposed method increased the processing speed to 30.7 fps, 1.65 times faster. The interval between image processing accesses changed dynamically between 10 and 800 ns, depending on the memory access status.

Fig. 13. Measurement results of memory access status by logic analyzer in conventional method


Fig. 14. Measurement results of memory access status by logic analyzer in proposed method

Fig. 16 shows the measured time for processing one image. The total time of the proposed method was 32.8 ms, about 60% of that of the conventional method. As shown in Fig. 16, the memory-free time, that is, the wasted time during which no module accesses the memory, was significantly reduced. Although we used two registers for each module in this implementation, the memory-free state could be almost completely eliminated by increasing the number of process registers as far as FPGA resources allow. The increase in memory-busy time indicates that the memory is used more efficiently, because the buffer in the memory controller saturates more readily as the number of accesses increases.

Fig. 15. Measurement results for image processing speed


Fig. 16. Measurement results of time for processing one image
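The frame-rate and frame-time figures can be cross-checked in the same way. The per-frame times below are simply the reciprocals of the reported frame rates, and the 32.8 ms value is the measured total from Fig. 16.

```python
CONVENTIONAL_FPS = 18.6      # Fig. 15, conventional method
PROPOSED_FPS = 30.7          # Fig. 15, proposed method
PROPOSED_FRAME_MS = 32.8     # Fig. 16, measured time for one image

speedup = PROPOSED_FPS / CONVENTIONAL_FPS
frame_ms = {
    "conventional": 1000.0 / CONVENTIONAL_FPS,   # about 53.8 ms per frame
    "proposed": 1000.0 / PROPOSED_FPS,           # about 32.6 ms per frame
}
time_ratio = PROPOSED_FRAME_MS / frame_ms["conventional"]

if __name__ == "__main__":
    print(f"speedup: {speedup:.2f}x")
    for name, ms in frame_ms.items():
        print(f"{name:>12}: {ms:.1f} ms/frame")
    print(f"measured proposed time / conventional frame time: {time_ratio:.2f}")
```

The ratio comes out at about 0.61, consistent with the roughly 60% stated above, and the speedup rounds to 1.65.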

These results show that the proposed method improved memory access for image processing and achieved 1.65 times faster processing than the conventional method, which verifies its effectiveness.


We now consider the advantages of the proposed method and the difficulties in implementing an image processing system with dynamic memory access control. Because our method does not change the priorities (fixed-priority type), the memory arbiter is easier to design than a dynamic-priority arbiter. In addition, when the image processing module has priority, the proposed arbiter decides whether to accept a new image processing request according to the stop-processing signal, which depends on the memory state. Therefore, it does not affect the operations of the camera and monitor, and arbitration is achieved dynamically with a simple configuration.

However, the intervals between image processing outputs are not fixed, which makes it difficult to adjust the timing of the monitor output or of additional processing. In our system, the variation in the intervals is absorbed by the memory because the processed image data is temporarily written to the memory and then output to the monitor. In general, though, memory access is not very fast. To further improve the processing speed, the processed data should be transferred directly to the monitor or to other image processing modules with appropriate timing adjustment.


We developed a novel memory access method to improve the processing speed of an FPGA-based image processing system. This method dynamically controls the intervals between memory access requests for image processing by monitoring the memory status. We implemented an image processing system with the proposed method and examined its characteristics. In the conventional method, the access interval was fixed, which limited the processing speed. With the proposed method, the number of image processing memory accesses within 640 ns was 1.8 times (active period) and 2.8 times (inactive period) that of the conventional method, and the overall processing speed was 1.65 times faster, without losing any memory access requests.


Part of this research was supported by Grants-in-Aid for Scientific Research, from the Japan Society for the Promotion of Science.


[1] Singh D., Tripathi G., Jara A. J., March 2014, A survey of Internet-of-things: Future vision, architecture, challenges and services, in 2014 IEEE World Forum on Internet of Things (WF-IoT), pp. 287-292.
[2] Chen S., Xu H., Liu D., Hu B., Wang H., August 2014, A vision of IoT: Applications, challenges, and opportunities with China perspective, IEEE Internet of Things Journal, Vol. 1, No. 4, pp. 349-359.
[3] Miraz M. H., Ali M., Excell P. S., Picking R., September 2015, A review on Internet of things (IoT), Internet of everything (IoE) and Internet of nano things (IoNT), in 2015 Internet Technologies and Applications (ITA), pp. 219-224.
[4] Yin Y., Zeng Y., Chen X., Fan Y., March 2016, The Internet of things in healthcare: An overview, Journal of Industrial Information Integration, Vol. 1, pp. 3-13.
[5] Altera, December 2013, FPGA-based control for electric vehicle and hybrid electric vehicle power electronics, available from: [last accessed August 2020].
[6] Rahmatov N., Paul A., Saeed F., Hong W., Seo H., Kim J., October 2019, Machine learning-based automated image processing for quality management in industrial Internet of things, International Journal of Distributed Sensor Networks, Vol. 15, No. 10.
[7] Du Y., Ives R., Nevel A., She J., January 2011, Editorial advanced image processing for defense and security applications, EURASIP Journal on Advances in Signal Processing, Vol. 2010.
[8] Matsuo Y., Sakaida S., November 2017, Super-resolution for 2K/8K television using wavelet-based image registration, in 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp. 378-382.
[9] Asano S., Maruyama T., Yamaguchi Y., August 2009, Performance comparison of FPGA, GPU and CPU in image processing, in 2009 International Conference on Field Programmable Logic and Applications, pp. 126-131.
[10] Torres-Huitzil C., Arias-Estrada M., 2004, Real-time image processing with a compact FPGA-based systolic architecture, Real-Time Imaging, Vol. 10, No. 3, pp. 177-187.
[11] Altera, 2013, Real-time challenges and opportunities in SoCs, available from: https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/wp/wp-01190-real-time-socs.pdf [last accessed August 2020].
[12] Hernandez-Lopez A., Torres-Huitzil C., Garcia-Hernandez J. J., July 2015, FPGA-based flexible hardware architecture for image interest point detection, International Journal of Advanced Robotic Systems, Vol. 12, pp. 1-15.
[13] Xilinx, 2018, Zynq-7000 SoC and 7 series devices memory interface solutions, available from: [last accessed August 2020].
[14] Xilinx, 2011, 7 series FPGAs memory interface solutions, available from: [last accessed August 2020].
[15] Nishiguchi K., Inoue T., Tsuchiya A., Ogohara K., Kishine K., October 2019, Optimization technique of memory traffic for FPGA-based image processing system, in 2019 International SoC Design Conference (ISOCC), pp. 46-47.
[16] Tigadi A., Guhilot H., November 2018, Design of an arbiter for two systems accessing a single DDR3 memory on a reconfigurable platform, International Journal of Information Engineering and Electronic Business, Vol. 10, pp. 14-20.
[17] Helal K. A., Attia S., Ismail T., Mostafa H., June 2015, Priority-select arbiter: An efficient round-robin arbiter, in 2015 IEEE 13th International New Circuits and Systems Conference (NEWCAS), pp. 1-4.
[18] Yang Y., Wu R., Zhang L., Zhou D., January 2015, An asynchronous adaptive priority round-robin arbiter based on four-phase dual-rail protocol, Chinese Journal of Electronics, Vol. 24, No. 1, pp. 1-7.
[19] Micron Technology, 2018, 4Gb: x4, x8, x16 DDR3L SDRAM, available from: https://www.micron.com/-/media/client/global/documents/products/data-sheet/dram/ddr3/4gb_ddr3l.pdf [last accessed August 2020].
[20] Gonzalez R., Woods R., 2018, Digital Image Processing, 4th ed., Pearson.


Kenta Nishiguchi

He received the B.E. degree in electronic systems engineering from the University of Shiga Prefecture in 2017.

Since then, he has been enrolled in the master's course of the Graduate School of Engineering, the University of Shiga Prefecture.

His research interests include FPGA-based circuits and systems.

Toshiyuki Inoue

Toshiyuki Inoue received the B.S., M.S., and Ph.D. degrees in Electrical, Electronic and Information Engineering from Osaka University, Osaka, Japan, in 2010, 2012, and 2015, respectively.

He joined the Department of Electronic Systems Engineering, the University of Shiga Prefecture, as an Assistant Professor in 2017.

His research interests include RF circuits for wireless communication, wireless sensor networks, radio-over-fiber technique and optoelectronics.

Dr. Inoue is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan and the Japan Society of Applied Physics (JSAP).

He received the Paper Award in 2013 from IEICE.

Rei Yamazaki

Rei Yamazaki is currently pursuing a B.E. degree in Electronic Systems Engineering at the University of Shiga Prefecture, Japan.

His research interests include FPGA-based systems and image processing.

Kazunori Ogohara

Kazunori Ogohara received the B.S., M.S., and Ph.D. degrees in science from Kyoto University, Kyoto, Japan, in 2005, 2007, and 2010, respectively.

He joined the Department of Electronic Systems Engineering, the University of Shiga Prefecture, in 2013 as an assistant professor, and has been a lecturer since 2019.

His research interests include Martian atmospheric science and semantic segmentation of Martian dust storms using machine learning.

Dr. Ogohara is a member of the Meteorological Society of Japan (JMS), Information Processing Society of Japan (IPSJ), and Division for Planetary Sciences of the American Astronomical Society (DPS-AAS).

He received the Outstanding Paper Award for Young Scientist from COSPAR in 2012.

Akira Tsuchiya

Akira Tsuchiya received the B.E., M.E. and Ph.D. degrees in Communications and Computer Engineering from Kyoto University, Kyoto, Japan, in 2001, 2003, and 2005, respectively.

From 2005, he was an Assistant Professor in the Department of Communications and Computer Engineering, Graduate School of Informatics, Kyoto University.

Since 2017, he has been an Associate Professor in the Department of Electronic Systems Engineering, the University of Shiga Prefecture, Shiga, Japan.

His research interests include the modeling and design of on-chip passive components for high-frequency CMOS circuits and high-speed analog circuit design.

He is a member of the IEEE, IEICE and IPSJ.