The Journal of Semiconductor Technology and Science (JSTS) is an international, peer-reviewed, and open-access journal that is published bimonthly.
- Scope: semiconductor processes, devices, circuits, and MEMS.
- Editor-in-Chief: Prof. Woo Young Choi (ECE, Seoul National University)
- Indexed within Science Citation Index Expanded (SCIE), SCOPUS, Korea Citation Index (KCI), and other databases.

[Research article]

JSTS(Journal of Semiconductor Technology and Science)

IEIE Vol. 19, No. 1, p.137-143

ISSN (print): 1598-1657

ISSN (online): 2233-4866

Received: 10 February 2019; Accepted: 13 February 2019

DOI: https://doi.org/10.5573/JSTS.2019.19.1.137

The Reinforcement Learning based Local Routing Optimization for Ad-hoc Network

Yongsu Lee¹†, Jongchan Woo¹, Hoi-Jun Yoo¹

¹School of EE, Korea Advanced Institute of Science and Technology

†Corresponding author, E-mail: yongsu.lee91@gmail.com

License :

This is an Open-Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.(www.theieie.org).

Abstract

A reinforcement learning based local routing optimization scheme using two different semiconductor chips, a hub IC and node ICs, is proposed for the ad-hoc network. The received signal strength indicator (RSSI) in each IC generates voltage information for analyzing the network quality between nodes, and the collected RSSI data are used as the input and the reward function of the learning agent. The Q-learning method is used for reinforcement learning. The chip, fabricated in a 0.18 μm CMOS process with a standard supply voltage of 1.5 V, achieves its lowest power consumption of 274 μW at a supply voltage of 0.8 V. The proposed reinforcement learning based local routing optimization for the ad-hoc network reduces total network power consumption by 64% compared with the conventional infrastructure based network.

Index Terms

Reinforcement learning, local routing optimization, Q-learning, semiconductor chip, hub, node, low-power consumption

I. INTRODUCTION

In wireless networks, each node transmits and receives data in one of two modes: infrastructure network mode or ad-hoc network mode. In infrastructure network mode, all nodes communicate with each other through a single access point such as a hub node or a wireless router. In ad-hoc network mode, each node, with its own wireless network adapter or transceiver, communicates directly with neighboring nodes and forwards local data toward the hub node. Each node in ad-hoc network mode can itself act as a router.

Ad-hoc network-based communication consumes much less power across the entire network than the infrastructure network [1, 2]. Fig. 1 illustrates the power consumed by each node for communication. The distance between nodes is modeled in proportion to the quality of the communication environment.

Fig. 1. Power consumption modeling in the infrastructure network and the ad-hoc network.

Various reinforcement learning methods have been used for routing in ad-hoc networks [3-5]. Many of them are formulated as a finite Markov Decision Process (MDP), a way of expressing the problem of learning from interaction to achieve a goal. An MDP involves three concepts: sensation, action, and goal. A trade-off arises between immediate and delayed reward.

An MDP consists of three components: state, action, and reward, as shown in Fig. 2. Each set of components includes a finite number of elements. The agent is the learner and decision maker, while the state and the reward are determined by the environment. This paper uses the received signal strength indicator (RSSI) block in the chip to generate the reward for the communication environment between nodes [6]. The RSSI generates voltage information for analyzing the network quality between nodes. The collected RSSI data are processed and used as the input and the reward function of the learning agent. Q-learning, a commonly used reinforcement learning technique for routing optimization, is adopted [7].

Fig. 2. Components of the proposed reinforcement learning system.

In this paper, the Q-learning update rule of Watkins and Dayan [8] is used. It can be summarized by the following equation:

$Q_{t}(s, a) \leftarrow\left(1-\alpha_{t}\right) Q_{t-1}(s, a)+\alpha_{t}\left[r_{t}+\gamma \max_{a^{\prime}} Q_{t-1}\left(s^{\prime}, a^{\prime}\right)\right]$

where $s$ is the state, $a$ is the action, $s^{\prime}$ is the next state, $a^{\prime}$ is a candidate action in the next state, $r_{t}$ is the reward, $\alpha_{t}$ is the learning rate, and $\gamma$ is the discount factor.
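As an illustration of the update rule, the following is a minimal tabular sketch in Python. The dictionary layout of the Q-table, the function name `q_update`, and the toy numbers are assumptions made for this example and are not taken from the chip implementation. Here `alpha` weights the new estimate and `gamma` weights the maximum term.

```python
def q_update(q, s, a, r, s_next, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps state -> {action: value}; this layout is a hypothetical choice.
    A next state with no entry in q is treated as terminal (value 0).
    """
    best_next = max(q[s_next].values()) if q.get(s_next) else 0.0
    q[s][a] = (1 - alpha) * q[s][a] + alpha * (r + gamma * best_next)
    return q[s][a]


# Toy example: state s0 starts at zero, state s1 already has learned values.
q = {"s0": {"a0": 0.0, "a1": 0.0}, "s1": {"a0": 1.0, "a1": 2.0}}
new_value = q_update(q, "s0", "a1", r=-3.0, s_next="s1", alpha=0.5, gamma=0.9)
```

With these numbers the target is -3.0 + 0.9 × 2.0 = -1.2, and blending it equally with the old value of 0 gives -0.6.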

In this paper, a reinforcement learning based local routing optimization scheme is proposed for the ad-hoc network. The ad-hoc network enables stable communication under harsh channel conditions with low-power operation. The intermediate nodes operate as repeaters, which relaxes the output power requirement of the transmitters in neighboring nodes.

The rest of this paper is organized as follows. Section II describes the circuit architecture of each IC. Section III describes the communication protocol and the FSM of the entire sequence. Section IV presents the implementation results, and Section V concludes the paper.

II. CIRCUIT ARCHITECTURE

Fig. 3 shows the overall block diagram of the proposed system. The system consists of two types of chips: the hub IC and the node ICs. In the hub IC, a 240 MHz clock signal is produced by a reference clock generator, and the hub broadcasts this clock to the node ICs. Thanks to a low-power injection-locking clock receiving scheme, every node IC can recover the same clock signal from the hub IC. The super-regenerative receiver based received signal strength indicator (RSSI) monitors the communication channel environment not only between each node and the hub but also between nodes [6].

Fig. 3. Overall block diagram of each IC – hub IC and node ICs.

The type of data transmitter is selected according to the channel condition between the node and the hub. When the channel condition is good, an FSK transmitter is selected to send data directly to the hub; when the channel condition is harsh, an OOK transmitter is used to send data to a neighboring node.
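The selection rule can be sketched as a one-line decision. The paper does not state the cut-off between a good and a harsh channel, so the `threshold_db` value below is purely hypothetical, as is the function name.

```python
def select_transmitter(attenuation_db, threshold_db=60.0):
    """Return "FSK" for a good node-to-hub channel, otherwise "OOK".

    threshold_db is a hypothetical cut-off, not a value from the paper.
    """
    return "FSK" if attenuation_db <= threshold_db else "OOK"


direct = select_transmitter(40.0)   # good channel: send straight to the hub
relayed = select_transmitter(80.0)  # harsh channel: send to a neighboring node
```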

The collected RSSI data are processed in the hub IC, and the routing table of the entire network system is generated after the Q-learning update.

III. PROTOCOL AND FSM

This section describes the protocol of the proposed ad-hoc network system. The 3-bit frame type code determines the type and the purpose of the data, as shown in Fig. 4. In the Beacon Frame, the components are used as follows:

Fig. 4. Frame structure according to frame type code.

1) Time Stamp (8 bytes)

After receiving the beacon frame, all the stations change their local clocks to this time. It helps with synchronization.

2) Beacon Interval (2 bytes)

It is the time interval between beacon transmissions. The beacon interval is expressed in Time Units (TU, 1 TU = 1024 μs) and is typically configured as 100 TU.

3) Error Frame (2 bytes)

It indicates the frame number of a missing frame or a bit-error frame.

4) Guard Interval (2 bytes)

It shows the time interval between each frame.

5) CRC (4 bytes)

Cyclic Redundancy Check for the FCS (Frame Check Sequence).
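The field sizes listed above can be summed, and the typical beacon interval converted to milliseconds, with a short sketch. The dictionary keys and function names are illustrative; only the byte counts and the TU definition come from the text, and any header fields not listed above are not counted.

```python
TU_US = 1024  # one Time Unit in microseconds, as stated in the text

# Field sizes in bytes, in the order the fields are listed above.
BEACON_FIELDS = {
    "time_stamp": 8,
    "beacon_interval": 2,
    "error_frame": 2,
    "guard_interval": 2,
    "crc": 4,
}


def beacon_fields_size():
    """Total size in bytes of the listed beacon fields."""
    return sum(BEACON_FIELDS.values())


def tu_to_ms(tu):
    """Convert a duration in Time Units to milliseconds."""
    return tu * TU_US / 1000.0


total_bytes = beacon_fields_size()
interval_ms = tu_to_ms(100)
```

Under these definitions the listed fields occupy 18 bytes, and a 100 TU beacon interval corresponds to 102.4 ms.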

The RSSI Frame exists only to allow the RSSI value to be calculated. It is therefore identical to a data frame without a payload.

The Attenuation Table Frame carries the attenuation values (calculated from the RSSI values) from the other nodes to the transmitting node. For example, if node 3 transmits the Attenuation Table Frame, the frame contains the attenuation values from nodes 1 to N (excluding node 3) to node 3. The Attenuation Table contains the identification of the nodes and their attenuation values. Since each node has its own Attenuation Table, the total number of tables equals the total number of nodes.
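A minimal sketch of how per-node attenuation tables could be merged into one square table is given below. The data layout, the zero-based node numbering, and the sample values in dB are assumptions for illustration only.

```python
def assemble_attenuation_table(rows, n):
    """Build an n x n table from per-node rows.

    rows[j] maps source node i -> attenuation measured at receiving node j.
    table[i][j] is the attenuation from node i to node j; the diagonal
    stays None because a node does not measure itself.
    """
    table = [[None] * n for _ in range(n)]
    for receiver, row in rows.items():
        for source, atten in row.items():
            if source != receiver:
                table[source][receiver] = atten
    return table


# Hypothetical three-node example (values in dB are made up).
rows = {
    0: {1: 30.0, 2: 55.0},
    1: {0: 31.0, 2: 28.0},
    2: {0: 54.0, 1: 29.0},
}
table = assemble_attenuation_table(rows, 3)
```

Each node contributes one column of the final table, so N nodes yield N rows of measurements and an N × N result.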

The Routing Table Frame is included in the Beacon Frame, which lets all nodes synchronize together. The Routing Table has the following components:

1) Source, Destination

The source node transmits to the destination node.

2) Attenuation

Attenuation value from the source to the destination. The source node sets its signal power based on this value.

3) Group

Group number of the transmission. The nodes nearest the hub are fixed to group 3; nodes that send frames to group 3 form group 2, and nodes that send frames to group 2 form group 1.

4) Priority

It indicates the priority of transmission within the same group. A node with higher priority transmits earlier than the others.

5) Payload Size

It indicates the payload size of the node. The code 0b00 means 300 bytes, the code 0b01 means 600 bytes, and the code 0b11 means 900 bytes.
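The routing table components can be modeled as a small record, as in the sketch below. The class and field names are illustrative. Two points are assumptions rather than statements from the paper: that a larger `priority` number means higher priority, and that group 1 transmits before group 2 and group 3 so that relays can forward what they receive. Node 0 stands in for the hub.

```python
from dataclasses import dataclass

# Payload-size codes exactly as listed in the text; other codes are undefined here.
PAYLOAD_BYTES = {0b00: 300, 0b01: 600, 0b11: 900}


@dataclass
class RoutingEntry:
    source: int
    destination: int
    attenuation: float  # the source sets its transmit power from this value
    group: int          # 3 = nearest the hub, 2 sends to 3, 1 sends to 2
    priority: int       # assumed: larger number transmits earlier in a group
    payload_code: int

    def payload_bytes(self):
        return PAYLOAD_BYTES[self.payload_code]


def transmit_order(entries):
    """Assumed ordering: group 1 first, then 2, then 3; higher priority first."""
    return sorted(entries, key=lambda e: (e.group, -e.priority))


entries = [
    RoutingEntry(5, 0, 20.0, 3, 1, 0b01),
    RoutingEntry(9, 5, 15.0, 2, 1, 0b00),
    RoutingEntry(7, 5, 18.0, 2, 2, 0b00),
]
order = [e.source for e in transmit_order(entries)]
```

In this example nodes 7 and 9 (group 2) transmit before node 5 (group 3), and node 7 goes first because of its higher priority.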

In the Data Frame, the components are used as follows:

1) PHY header

This field contains 7 bytes of preamble (0b10101010) followed by the Start of Frame code (SOF, 0b10101011). The last two bits of the SOF indicate the start of the frame.

2) Frame Control

Frame control contains 16 bits of control that indicate information about the type and the purpose of the frame.

3) Packet number

Current packet number of the node.

4) Address 1, Address 2

Source and Destination address depending on Frame Control bits.
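The PHY header described above can be written out byte by byte. The sketch below assumes byte-aligned reception for simplicity; the function names are illustrative and the two trailing payload bytes are made up.

```python
PREAMBLE_BYTE = 0b10101010  # 0xAA, repeated 7 times
SOF_BYTE = 0b10101011       # 0xAB, ends in the two bits 11


def phy_header():
    """Return the 8-byte PHY header: 7 preamble bytes followed by the SOF."""
    return bytes([PREAMBLE_BYTE] * 7 + [SOF_BYTE])


def find_sof(stream):
    """Return the index just past the first SOF byte, or -1 if none is found."""
    idx = stream.find(bytes([SOF_BYTE]))
    return -1 if idx < 0 else idx + 1


header = phy_header()
frame_start = find_sof(header + b"\x01\x02")
```

Because the preamble alternates 1 and 0, the SOF is the first byte that ends in two consecutive ones, which is how the receiver can tell where the frame begins.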

For the FSM, the variables and the functions of the FSM are described in Table 1 and Table 2.

Table 1. Description of the FSM variables in the node

Variable	Description
Current_node	Node number that is allocated to transmit currently
Node_number	Number of nodes in the system
Final_node_num	Number of nodes that transmit frame to hub
Current_node_in_final_group	Number of nodes that have transmitted frame to hub currently

Table 2. Description of the FSM functions in the node

Function	Description
Push_start_button()	Initiate transmission by push button in hub
Send_beacon_frame()	Send beacon frame globally
Receive_signal()	Receive frame from nodes for RSSI value calculation
RSSI_calculate()	Calculate RSSI value from received signal and update attenuation table
Receive_atten_table()	Receive attenuation table from nodes
Atten_table_update()	Add attenuation row received from node to hub's attenuation table
Q-Learning()	Q-Learning for generating routing table
Set_routing_table()	Set routing table and timing for data transmission based on attenuation table
Send_routing_table()	Send routing table to all nodes
Set_final_node_num()	Set the number of nodes that would transmit data to hub
Receive_data()	Receive data frame from nodes (final nodes)
Process_data()	Extract payload and process the data

The sequence of the FSM is illustrated in Fig. 5, and the details are as follows. First, the hub IC sends the Beacon Frame for global clock synchronization. Each node receives the beacon frame and sets its clock.

Fig. 5. FSM of the entire system.

Then, each node in turn, in order of node number, broadcasts the signal used for calculating the RSSI data. When the last node has transmitted and the RSSI calculation is complete, each node knows the attenuation from every other node to itself. To assemble the complete attenuation table of size n × n, each node then sends its attenuation table to the hub.

In the hub IC, the Q-learning update is performed on the complete attenuation table, and the routing table is generated through reinforcement learning. The routing table is sent globally to all nodes, and each node determines its data transmission timing from the routing table.
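One way the hub could derive next hops from the attenuation table is sketched below. The paper does not spell out the state, action, or reward encoding, so everything here is an assumption: the state is the node holding the data, the action is the next hop, the reward is the negative linear power implied by the attenuation, node 0 is the hub, and deterministic sweeps stand in for exploration. The attenuation values are made up.

```python
def learn_next_hops(atten_db, hub=0, alpha=0.5, gamma=1.0, sweeps=200):
    """Run tabular Q-learning sweeps and return the greedy next hop per node."""
    n = len(atten_db)
    nodes = [s for s in range(n) if s != hub]
    # q[s][a]: value of forwarding from node s to node a (initialized to zero).
    q = {s: {a: 0.0 for a in range(n) if a != s} for s in nodes}
    for _ in range(sweeps):
        for s in nodes:
            for a in q[s]:
                # Assumed reward: negative relative transmit power for this hop.
                reward = -(10.0 ** (atten_db[s][a] / 10.0))
                # The hub is terminal, so nothing is added after reaching it.
                best_next = 0.0 if a == hub else max(q[a].values())
                q[s][a] = (1 - alpha) * q[s][a] + alpha * (reward + gamma * best_next)
    return {s: max(q[s], key=q[s].get) for s in nodes}


# Hypothetical symmetric attenuation table in dB; node 0 is the hub.
ATTEN_DB = [
    [0, 30, 60, 80],
    [30, 0, 35, 62],
    [60, 35, 0, 33],
    [80, 62, 33, 0],
]
next_hops = learn_next_hops(ATTEN_DB)
```

With this table, node 1 sends directly to the hub, node 2 relays through node 1, and node 3 relays through node 2, because each short hop needs far less power than a long direct link.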

When a node transmits data to a nearby node, the receiving node adds the received data to its own payload and sends it on to the next node or the hub. Once the data transmission to the hub is complete, one data transmission sequence is finished, and the above process is repeated.
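The hub-side sequence can be summarized as an ordered list of the functions in Table 2. The sketch below is one reading of the text, not a transcription of Fig. 5: the per-node loops, the position of `Process_data`, and the `name:node` labels are assumptions, and `Push_start_button()` is omitted because it only starts the first cycle.

```python
def hub_cycle(node_number, final_node_num):
    """Return the hub-side steps of one cycle, in the assumed calling order."""
    steps = ["Send_beacon_frame"]
    # Each node transmits in turn so that RSSI values can be calculated.
    for node in range(1, node_number + 1):
        steps += [f"Receive_signal:{node}", f"RSSI_calculate:{node}"]
    # Each node then reports its attenuation table to the hub.
    for node in range(1, node_number + 1):
        steps += [f"Receive_atten_table:{node}", f"Atten_table_update:{node}"]
    steps += ["Q-Learning", "Set_routing_table",
              "Send_routing_table", "Set_final_node_num"]
    # Receive data until every final-group node has transmitted to the hub.
    current_node_in_final_group = 0
    while current_node_in_final_group < final_node_num:
        current_node_in_final_group += 1
        steps.append(f"Receive_data:{current_node_in_final_group}")
    steps.append("Process_data")
    return steps


steps = hub_cycle(node_number=3, final_node_num=2)
```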

IV. IMPLEMENTATION RESULTS

The electrical characteristics of the RSSI and the measurement results are shown in Fig. 6. The RSSI consists of seven rectifiers, each producing a current proportional to the input signal level within its corresponding range. The current is converted to a voltage and recorded as RSSI data, and the error over the -90 to -20 dBm input range is at most 4 dBm.

Fig. 6. RSSI operation and its measurement results.

Sixteen nodes are placed to compare the power consumption of the infrastructure network and the ad-hoc network, as shown in Fig. 7. In the infrastructure network, each node sends data directly to the hub. In the ad-hoc network, the nodes are divided into up to three groups and send data to adjacent nodes according to the network environment. The experimental conditions for the power measurement are shown in Table 3.

Fig. 7. Sixteen nodes and data transmission for power measurement.

Table 3. Experimental conditions for measurement

Parameter	Value
Bits for datum	24 bits/datum
Maximum Sampling Rate	1 kHz
Transmission bps (Node to Hub)	500 kHz
Transmission bps (Hub to Node / Node to Node)	200 kHz
Number of Nodes	16
Guard Interval	0.1 ms

The sequence of the routing table setting in the ad-hoc network is shown in Fig. 8(a). The routing table is updated once every ten cycles, so the effective time is 3.87 ms. The sequence of the data transmission is shown in Fig. 8(b); it takes 94.98 ms in total.
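The two timing figures can be combined to estimate how much of each cycle goes to routing setup. This is a sketch under two interpretations that the paper does not state explicitly: that the effective setup time and the data transmission time simply add up to one cycle, and that 3.87 ms is the full setup time divided by the ten-cycle update period.

```python
SETUP_EFFECTIVE_MS = 3.87  # effective routing-table setting time from the text
DATA_TX_MS = 94.98         # data transmission time from the text
UPDATE_PERIOD_CYCLES = 10  # the routing table is updated once every ten cycles

# Assumed: one cycle is the amortized setup time plus the data transmission time.
cycle_ms = SETUP_EFFECTIVE_MS + DATA_TX_MS
overhead = SETUP_EFFECTIVE_MS / cycle_ms

# Assumed: the un-amortized setup time is ten times the effective time.
full_setup_ms = SETUP_EFFECTIVE_MS * UPDATE_PERIOD_CYCLES
```

Under these assumptions one cycle lasts about 98.85 ms, of which roughly 3.9% is routing overhead.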

Fig. 8. (a) Sequence of the routing table setting, (b) Sequence of the data transmission.

As a result of the power measurement, a total of 1540 mW was consumed in the Infrastructure Network, and only 558 mW was dissipated in the Ad-hoc Network to transmit the same amount of data, reducing power consumption by 64%.
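The reported saving follows directly from the two measured totals, as the short check below shows. The function name is illustrative; the two power values are the ones given in the text.

```python
def power_reduction(baseline_mw, proposed_mw):
    """Fractional reduction of the proposed network relative to the baseline."""
    return (baseline_mw - proposed_mw) / baseline_mw


# Infrastructure network: 1540 mW; ad-hoc network: 558 mW (from the text).
reduction = power_reduction(1540.0, 558.0)
reduction_percent = round(reduction * 100)
```

The difference is 982 mW out of 1540 mW, about 63.8%, which rounds to the stated 64%.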

The chip photographs of hub and node IC are shown in Fig. 9, and the summary for chip performance is shown in Table 4.

Fig. 9. Chip photograph of hub and node IC.

Table 4. Chip performance summary

Process		0.18 μm 1P6M Mixed CMOS
Die Size		Hub IC: 2.35 × 2.55 mm² (Including Pads)
		Node IC: 2.35 × 2 mm² (Including Pads)
Total Power		Hub IC: 2.8 mW; Node IC: 274 μW (@ 0.8 V Supply)
Hub	FSK Rx	Channel Frequency	20/40MHz FSK
		Data Rate	500kbps
		Sensitivity	-75dBm
	RISC	Attenuation Table Update, Routing Table Update with Q-Learning
Node	Clock Rx	Channel Frequency	240MHz Always on
		Data Rate	500kbps
		Sensitivity	-62dBm
	OOK Rx	Channel Frequency	20MHz OOK
		Data Rate	200kbps
		Sensitivity	-70dBm

V. CONCLUSIONS

In this paper, a reinforcement learning based local routing optimization for the ad-hoc network is proposed. For the agent, the Q-learning method is adopted. The hub IC and multiple node ICs are implemented for data communication. The received signal strength indicator (RSSI) block converts the network quality between nodes into an electrical value: a rectifier current that is then converted to a voltage. The collected RSSI data are used for the reward function of the reinforcement learning, and the routing table for the ad-hoc network is generated from them.

The chip, fabricated in a 0.18 μm CMOS process with a standard supply voltage of 1.5 V, achieves its lowest power consumption of 274 μW at a supply voltage of 0.8 V. The proposed reinforcement learning based local routing optimization for the ad-hoc network reduces total network power consumption by 64% compared with the conventional infrastructure network model.

ACKNOWLEDGMENTS

This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No.2016-0-00207, Intelligent Processor Architectures and Application Software for CNN (Convolutional Neural Network)-RNN (Recurrent Neural Network)).

REFERENCES

[1] Gregori Enrico, Cherkasova Ludmila, Cugola Gianpaolo, Panzieri Fabio, Picco Gian P., May 2002, Web Engineering and Peer-to-Peer Computing, Springer

[2] Hsieh Hung-Yun, Sivakumar Raghupathy, Jun. 2001, Performance Comparison of Cellular and Multihop Wireless Networks: A Quantitative Study, ACM SIGMETRICS Performance Evaluation Review, Vol. 29, No. 1, pp. 113-122

[3] Reddy Prashant P., Veloso Manuela M., Sep. 2011, RSSI-based Physical Layout Classification and Target Tethering in Mobile Ad-hoc Networks, IEEE/RSJ International Conference on Intelligent Robots and Systems

[4] Beyens P., Peeters M., Steenhaut K., Nowe A., Mar. 2005, Routing with Compression in WSNs: A Q-Learning approach, Proc. of the 5th Eur. Wksp on Adaptive Agents and Multi-Agent Systems (AAMAS)

[5] Förster Anna, Dec. 2007, Machine Learning Techniques Applied to Wireless Ad-Hoc Networks: Guide and Survey, 3rd International Conference on Intelligent Sensors, Sensor Networks and Information

[6] Lee Yongsu, Yoo Hoi-Jun, Jul. 2017, A 274μW Clock Synchronized Wireless Body Area Network IC with Super-Regenerative RSSI for Biomedical Ad-Hoc Network System, 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)

[7] Boyan J. A., Littman M. L., 1994, Packet routing in dynamically changing networks: A reinforcement learning approach, Advances in Neural Information Processing Systems, Vol. 6

[8] Watkins C., Dayan P., 1992, Q-learning, Machine Learning

Author

Yongsu Lee

(S’13) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2013, 2015, and 2019, respectively.

His current research interests include deep-learning SoC and low-power SoC for wearable healthcare systems.

He is also interested in human body communication (HBC) SoC for low-power application.

Jongchan Woo

received the B.S. degree in electrical engineering from Korea Advanced Institute of Science and Technology (KAIST).

He is currently working toward the Ph.D. degree at Massachusetts Institute of Technology (MIT).

His current research interests include deep-learning SoC and human experience with sensor systems.

Hoi-Jun Yoo

graduated from the Electronic Department of Seoul National University, Seoul, Korea, in 1983 and received the M.S. and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, in 1985 and 1988, respectively.

Since 1998, he has been on the faculty of the Department of Electrical Engineering at KAIST, where he is now a full professor.

From 2001 to 2005, he was the director of Korean System Integration and IP Authoring Research Center (SIPAC).

From 2003 to 2005, he was the full time Advisor to Minister of Korea Ministry of Information and Communication and National Project Manager for SoC and Computer.

In 2007, he founded System Design Innovation & Application Research Center (SDIA) at KAIST.

Since 2010, he has served as the general chair of Korean Institute of Next Generation Computing.

His current interests are computer vision SoC, body area networks, biomedical devices and circuits.

He is a coauthor of DRAM Design (Korea: Hongrung, 1996), High Performance DRAM (Korea: Sigma, 1999), Future Memory: FRAM (Korea: Sigma, 2000), Networks on Chips (Morgan Kaufmann, 2006), Low-Power NoC for High-Performance SoC Design (CRC Press, 2008), Circuits at the Nanoscale (CRC Press, 2009), Embedded Memories for Nano-Scale VLSIs (Springer, 2009), Mobile 3D Graphics SoC from Algorithm to Chip (Wiley, 2010), Bio-Medical CMOS ICs (Springer, 2011), Embedded Systems (Wiley, 2012), and Ultra-Low-Power Short-Range Radios (Springer, 2015).