Yongsu Lee 1†, Jongchan Woo 1, Hoi-Jun Yoo 1
1 School of EE, Korea Advanced Institute of Science and Technology
Copyright © The Institute of Electronics and Information Engineers(IEIE)
Index Terms
Reinforcement learning, local routing optimization, Q-learning, semiconductor chip, hub, node, low-power consumption
I. INTRODUCTION
In wireless networks, each node transmits and receives data in one of two modes:
infrastructure network mode or ad-hoc network mode. In infrastructure network mode,
all nodes on a wireless network communicate with each other through a single access
point such as a hub node or a wireless router. In ad-hoc network mode, each node, with
its own wireless network adapter or transceiver, communicates directly with neighboring
nodes and relays locally collected data to the hub node. Each node in ad-hoc network
mode can therefore act as a router itself.
Ad-hoc network-based communication consumes much less power across the entire network
than infrastructure network-based communication[1,2]. Fig. 1 illustrates the power consumed by each node for communication. The distance between
each pair of nodes is modeled in proportion to the quality of the communication environment
between them.
Fig. 1. Power consumption modeling in the infrastructure network and the ad-hoc network.
Various reinforcement learning methods have been used for routing in ad-hoc networks[3-5]. Some reinforcement learning methods follow the finite Markov decision process (MDP),
a way to express the problem of learning from interaction to achieve a goal. An MDP involves
three concepts: sensation, action, and goal. A trade-off occurs between immediate
and delayed reward.
An MDP consists of three components: state, action, and reward, as shown in Fig. 2. Each set of components includes a finite number of elements. The agent is the learner
and decision maker, while the state and reward are determined by the environment. This paper
utilizes the received signal strength indicator (RSSI) block in the chip to generate
the reward for the communication environment between each pair of nodes[6]. The RSSI generates voltage information for analyzing the network quality between nodes.
The collected RSSI data are processed and used as the input and the reward function of the
learning agent. The Q-learning method, a commonly used reinforcement learning technique
for routing optimization, is adopted[7].
Fig. 2. Components of the proposed reinforcement learning system.
In this paper, the Q-learning update rule of Watkins and Dayan[8] is used, which can be summarized as the following equation:
$Q_{t}(s, a) \leftarrow \left(1-\alpha_{t}\right) Q_{t-1}(s, a) + \alpha_{t}\left[r_{t}+\gamma \max_{a^{\prime}} Q_{t-1}\left(s^{\prime}, a^{\prime}\right)\right]$
where $s$ is the state, $a$ is the action, $r_{t}$ is the reward, $s^{\prime}$ is the next state,
$a^{\prime}$ is an action available in $s^{\prime}$, $\alpha_{t}$ is the learning rate, and $\gamma$ is the
discount factor.
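As a minimal sketch of this update rule (not the authors' implementation), the following Python function applies one update to a tabular Q-function. The dictionary layout, the name `q_update`, and the values chosen for `alpha` (the weight given to the new estimate) and `gamma` (the weight given to the best value of the next state) are illustrative assumptions.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to q[(state, action)] and return the new value."""
    # Best value currently estimated for the next state (0 if it has no actions).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] = ((1 - alpha) * q[(state, action)]
                          + alpha * (reward + gamma * best_next))
    return q[(state, action)]


# Hypothetical example: node 2 forwards to node 1, which can reach the hub (node 0).
q = defaultdict(float)
q[(1, 0)] = -1.0  # assumed current estimate for "node 1 sends to hub"
new_value = q_update(q, state=2, action=1, reward=-3.0,
                     next_state=1, next_actions=[0])
```

With these numbers the new estimate is 0.5 * 0 + 0.5 * (-3 + 0.9 * (-1)) = -1.95.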
In this paper, a reinforcement-learning-based local routing optimization scheme is proposed
for the ad-hoc network. The ad-hoc network enables stable communication under harsh channel
conditions with low-power operation. The intermediate nodes operate as repeaters
to relax the output power requirement of the transmitters in the neighboring nodes.
The rest of this paper is organized as follows. Section II shows the architecture of the
circuit in each IC. Section III describes the communication protocol and the FSM of the
entire sequence. Section IV shows the implementation results, and the paper is concluded
in Section V.
II. CIRCUIT ARCHITECTURE
Fig. 3 shows the overall block diagram of the proposed system. The system consists of two
different types of chips: the hub IC and the node ICs. In the hub IC, a 240 MHz clock
signal is produced by a reference clock generator, and the hub broadcasts the clock
to multiple node ICs. Thanks to a low-power injection-locking clock receiving scheme, every
node IC can recover the same clock signal from the hub IC. The super-regenerative-receiver-based
received signal strength indicator (RSSI) monitors the communication channel
environment not only between each node and the hub but also between nodes[6].
Fig. 3. Overall block diagram of each IC – hub IC and node ICs.
The type of data transmitter is selected according to the channel condition between the
node and the hub. Under good channel conditions, an FSK transmitter is selected for
sending data directly to the hub; under harsh channel conditions, an OOK transmitter is
used for sending data to a neighboring node.
The collected RSSI data is processed in the hub IC, and the routing table of the entire
network system is generated after the Q-learning update.
III. PROTOCOL AND FSM
In this section, the protocol of the proposed ad-hoc network system is described. The type
and purpose of the data are determined by the 3-bit frame type code, as shown in Fig. 4.
In the Beacon Frame, each component is used as follows:
Fig. 4. Frame structure according to frame type code.
1) Time Stamp (8 bytes)
After receiving the beacon frame, all the stations change their local clocks to this
time. It helps with synchronization.
2) Beacon Interval (2 bytes)
It is the time interval between beacon transmissions. The beacon interval is expressed
in Time Units (TU, 1 TU = 1024 μs) and is typically configured as 100 TU.
3) Error Frame (2 bytes)
It indicates the frame number of a missing frame or a bit-error frame.
4) Guard Interval (2 bytes)
It shows the time interval between consecutive frames.
5) CRC (4 bytes)
Cyclic Redundancy Check for the FCS (Frame Check Sequence).
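The beacon fields listed above can be sketched as a fixed 18-byte layout. The field widths come from the list; the little-endian byte order, the use of the CRC-32 polynomial from `zlib`, the unit of the guard-interval value, and the function names are assumptions made for illustration only.

```python
import struct
import zlib

TU_US = 1024  # 1 TU = 1024 us, as stated in the text


def pack_beacon_fields(timestamp, beacon_interval_tu, error_frame, guard_interval):
    """Pack the four beacon fields and append a 4-byte CRC computed over them."""
    # "<QHHH": 8-byte, 2-byte, 2-byte, 2-byte unsigned fields (byte order assumed).
    fields = struct.pack("<QHHH", timestamp, beacon_interval_tu,
                         error_frame, guard_interval)
    crc = zlib.crc32(fields)  # CRC-32 is an assumed choice of polynomial
    return fields + struct.pack("<I", crc)


def beacon_crc_ok(body):
    """Return True if the trailing 4-byte CRC matches the preceding fields."""
    fields = body[:-4]
    (crc,) = struct.unpack("<I", body[-4:])
    return zlib.crc32(fields) == crc


body = pack_beacon_fields(timestamp=123456, beacon_interval_tu=100,
                          error_frame=0, guard_interval=100)
interval_ms = 100 * TU_US / 1000  # a 100 TU beacon interval in milliseconds
```

A receiver would call `beacon_crc_ok` before adopting the time stamp; a 100 TU interval corresponds to 102.4 ms.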
The purpose of the RSSI Frame is to calculate the RSSI value. Therefore, it is identical
to the Data Frame without payload.
The Attenuation Table Frame includes the attenuation values (calculated from the RSSI
values) from the other nodes to the transmitting node. For example, if node 3 is
transmitting the Attenuation Table Frame, it contains the attenuation values from nodes
1 to N (excluding node 3) to node 3. The Attenuation Table contains the identification
of the nodes and their attenuation values. Since each node has its own Attenuation Table,
the total number of tables equals the total number of nodes.
The Routing Table Frame is included in the Beacon Frame, which enables all nodes to
synchronize together. The Routing Table has the following components:
1) Source, Destination
The source node transmits to the destination node.
2) Attenuation
Attenuation value from the source to the destination. The source node sets its signal
power based on this value.
3) Group
Group number of the transmission. Nodes near the hub are fixed to group 3; nodes that
send frames to group 3 belong to group 2, and nodes that send frames to group 2 belong
to group 1.
4) Priority
It indicates the priority of transmission within the same group. A higher-priority node
transmits its signal earlier than the others.
5) Payload Size
It indicates the payload size of the node: the code 0b00 means 300 bytes, the code
0b01 means 600 bytes, and the code 0b11 means 900 bytes.
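The group rule can be illustrated with a short sketch that derives each node's group from its destination. The node numbering (0 for the hub), the dictionary representation of the routing table, and the function name are hypothetical; only the group rule and the payload-size codes are taken from the text.

```python
HUB = 0  # hypothetical: node ID 0 denotes the hub

# Payload-size codes as listed in the text.
PAYLOAD_SIZE_BYTES = {0b00: 300, 0b01: 600, 0b11: 900}


def assign_groups(next_hop):
    """Map each node to its group number, given {source: destination}.

    Nodes that send to the hub are group 3; a node that sends to a group-g
    node is group g - 1. Assumes the routes contain no loops.
    """
    groups = {}

    def group_of(node):
        if node not in groups:
            dest = next_hop[node]
            groups[node] = 3 if dest == HUB else group_of(dest) - 1
            if groups[node] < 1:
                raise ValueError(f"node {node} is more than three hops from the hub")
        return groups[node]

    for node in next_hop:
        group_of(node)
    return groups


# Hypothetical routes: 3 -> 2 -> 1 -> hub, and 4 -> hub.
groups = assign_groups({1: 0, 2: 1, 3: 2, 4: 0})
```

Here nodes 1 and 4 fall in group 3, node 2 in group 2, and node 3 in group 1.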
In the Data Frame, each component is used as follows:
1) PHY header
This field contains 7 bytes of preamble (0b10101010) followed by the Start of Frame code
(SOF, 0b10101011). The last two bits of the SOF indicate the start of the frame.
2) Frame Control
Frame control contains 16 control bits that indicate the type and the purpose of the
frame.
3) Packet number
Current packet number of the node.
4) Address 1, Address 2
Source and destination addresses, depending on the Frame Control bits.
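A short sketch of how a receiver might locate the start of a frame from the PHY header: the preamble and SOF bit patterns are those given above, while byte-aligned reception and the function name are simplifying assumptions.

```python
PREAMBLE_BYTE = 0b10101010  # 0xAA, repeated 7 times
SOF_BYTE = 0b10101011       # 0xAB, differs from the preamble only in its last bit

PHY_HEADER = bytes([PREAMBLE_BYTE] * 7 + [SOF_BYTE])


def find_frame_start(stream):
    """Return the index of the first byte after the SOF, or -1 if none is found."""
    # Require at least one preamble byte directly before the SOF (assumption).
    idx = stream.find(bytes([PREAMBLE_BYTE, SOF_BYTE]))
    return -1 if idx < 0 else idx + 2


# Hypothetical received stream: one noise byte, the PHY header, then two frame bytes.
stream = b"\x00" + PHY_HEADER + b"\x12\x34"
start = find_frame_start(stream)
```

Because the SOF ends in the bits 11 while every preamble byte ends in 10, the change in the last two bits marks the boundary between preamble and frame.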
The variables and the functions of the FSM are described in Table 1 and Table 2, respectively.
Table 1. Description of the FSM variables in the node
Variable | Description
Current_node | Node number that is allocated to transmit currently
Node_number | Number of nodes in the system
Final_node_num | Number of nodes that transmit frame to hub
Current_node_in_final_group | Number of nodes that have transmitted frame to hub currently
Table 2. Description of the FSM functions in the node
Function | Description
Push_start_button() | Initiate transmission by push button in hub
Send_beacon_frame() | Send beacon frame globally
Receive_signal | Receive frame from nodes for RSSI value calculation
RSSI_calculate() | Calculate RSSI value from received signal and update attenuation table
Receive_atten_table() | Receive attenuation table from nodes
Atten_table_update() | Add attenuation row received from node to hub's attenuation table
Q-Learning() | Q-Learning for generating routing table
Set_routing_table() | Set routing table and timing for data transmission based on attenuation table
Send_routing_table() | Send routing table to all nodes
Set_final_node_num() | Set the number of nodes that would transmit data to hub
Receive_data() | Receive data frame from nodes (final nodes)
Process_data() | Extract payload and process the data
The sequence of the FSM is illustrated in Fig. 5, and the details are as follows. First, the hub IC sends the Beacon Frame for global clock
synchronization. Each node receives the beacon frame and sets its clock.
Fig. 5. FSM of the entire system.
Then, each node in turn, in order of node number, sends the signal used for calculating
the RSSI data to all other nodes. When the transmission of the last node and the calculation
of the RSSI data are completed, each node knows the attenuation from every other node to
itself. Thus, to build the complete attenuation table of size n x n, each node sends its
attenuation table to the hub, which assembles the complete attenuation table.
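The assembly step at the hub can be sketched as follows. Each node reports the attenuation it measured from every other node to itself, and the hub places these values into one n x n table. The zero-based node indices, the dictionary format of a reported table, and the attenuation values are hypothetical.

```python
def assemble_attenuation_table(reports, n):
    """Build an n x n table where table[i][j] is the attenuation from node i to node j.

    reports[j] is the table sent by node j: {sender i: attenuation from i to j}.
    Diagonal entries stay None because a node does not measure itself.
    """
    table = [[None] * n for _ in range(n)]
    for receiver, row in reports.items():
        for sender, attenuation in row.items():
            table[sender][receiver] = attenuation
    return table


# Hypothetical reports from three nodes (attenuation values in dB).
reports = {
    0: {1: 10.0, 2: 25.0},  # measured at node 0
    1: {0: 11.0, 2: 12.0},  # measured at node 1
    2: {0: 24.0, 1: 13.0},  # measured at node 2
}
table = assemble_attenuation_table(reports, n=3)
```

Each report fills one column of the table, so n reports are needed before the table is complete.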
In the hub IC, the Q-learning is updated based on the complete attenuation table,
and the routing table is generated through reinforcement learning. The routing table
is sent globally to all nodes, and each node determines the data transmit timing based
on the routing table.
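To make the hub-side step concrete, the sketch below runs tabular Q-learning over a complete attenuation table and reads the next hop for each node from the learned values. It is an illustration under stated assumptions, not the authors' implementation: the reward (negative linear power needed to overcome the link attenuation), the use of index 0 for the hub, the deterministic sweep over all links instead of random exploration, and the parameter values are all hypothetical.

```python
def learn_next_hops(atten_db, hub=0, alpha=0.5, gamma=1.0, sweeps=200):
    """Return {node: next hop} learned by tabular Q-learning over all links."""
    n = len(atten_db)
    q = [[0.0] * n for _ in range(n)]  # q[s][a]: value of node s sending to node a
    for _ in range(sweeps):
        for s in range(n):
            if s == hub:
                continue  # the hub is the terminal state
            for a in range(n):
                if a == s:
                    continue
                # Assumed reward: negative linear power needed to overcome the link loss.
                reward = -(10 ** (atten_db[s][a] / 10))
                # Value of the best onward hop from a (zero once the hub is reached).
                best_next = 0.0 if a == hub else max(
                    q[a][b] for b in range(n) if b != a)
                q[s][a] = (1 - alpha) * q[s][a] + alpha * (reward + gamma * best_next)
    return {
        s: max((a for a in range(n) if a != s), key=lambda a: q[s][a])
        for s in range(n) if s != hub
    }


# Hypothetical 4 x 4 attenuation table in dB (index 0 is the hub).
ATTEN_DB = [
    [0, 10, 30, 40],
    [10, 0, 10, 30],
    [30, 10, 0, 10],
    [40, 30, 10, 0],
]
next_hops = learn_next_hops(ATTEN_DB)
```

In this example node 1 sends directly to the hub, node 2 relays through node 1, and node 3 relays through node 2, because three 10 dB hops cost far less transmit power than one 40 dB hop under the assumed reward.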
When data are transmitted to a nearby node, the receiving node adds them to its own
payload and sends the result to the next node or the hub. Once the data transmission to the
hub is complete, one data transmission sequence is completed, and the above process
is repeated.
IV. IMPLEMENTATION RESULTS
The electrical characteristics of the RSSI and the measurement results are shown in
Fig. 6. The RSSI consists of seven rectifiers, each producing a current proportional to
the input signal amplitude within its corresponding range. The current is converted to a
voltage and recorded as RSSI data, and the error over the -90 to -20 dBm input range is
at most 4 dBm.
Fig. 6. RSSI operation and its measurement results.
Sixteen nodes are placed to compare the power consumption of the infrastructure
network and the ad-hoc network, as shown in Fig. 7. In the infrastructure network, each node sends data directly to the
hub; in the ad-hoc network, the nodes are divided into up to three groups and send
data to adjacent nodes according to the network environment. The experimental
conditions for the power measurement are shown in Table 3.
Fig. 7. Sixteen nodes and data transmission for power measurement.
Table 3. Experimental conditions for measurement
Parameter | Value
Bits per datum | 24 bits/datum
Maximum Sampling Rate | 1 kHz
Transmission rate (Node to Hub) | 500 kbps
Transmission rate (Hub to Node / Node to Node) | 200 kbps
Number of Nodes | 16
Guard Interval | 0.1 ms
The sequence of the routing table setting in the ad-hoc network is shown in Fig. 8(a). Because the routing table is updated once every ten cycles, its effective time per cycle is
3.87 ms. The sequence of the data transmission is shown in Fig. 8(b); it takes 94.98 ms in total.
Fig. 8. (a) Sequence of the routing table setting, (b) Sequence of the data transmission.
As a result of the power measurement, a total of 1540 mW was consumed in the Infrastructure
Network, and only 558 mW was dissipated in the Ad-hoc Network to transmit the same
amount of data, reducing power consumption by 64%.
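The reported reduction follows directly from the two measured totals, as this short check shows (the variable names are illustrative):

```python
infrastructure_mw = 1540.0  # total power measured in the infrastructure network
ad_hoc_mw = 558.0           # total power measured in the ad-hoc network

reduction = (infrastructure_mw - ad_hoc_mw) / infrastructure_mw
reduction_percent = round(reduction * 100)
```

The saving is 982 mW out of 1540 mW, about 63.8%, which rounds to the stated 64%.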
The chip photographs of the hub and node ICs are shown in Fig. 9, and the chip performance is summarized in Table 4.
Fig. 9. Chip photograph of hub and node IC.
Table 4. Chip performance summary
Process | 0.18 $\mu$m 1P6M Mixed CMOS
Die Size | Hub IC: 2.35 × 2.55 mm² (Including Pads); Node IC: 2.35 × 2 mm² (Including Pads)
Total Power | Hub IC: 2.8 mW; Node IC: 274 $\mu$W (@ 0.8 V Supply)
Hub | FSK Rx | Channel Frequency | 20/40 MHz FSK
Hub | FSK Rx | Data Rate | 500 kbps
Hub | FSK Rx | Sensitivity | -75 dBm
Hub | RISC | Attenuation Table Update, Routing Table Update with Q-Learning
Node | Clock Rx | Channel Frequency | 240 MHz Always on
Node | Clock Rx | Data Rate | 500 kbps
Node | Clock Rx | Sensitivity | -62 dBm
Node | OOK Rx | Channel Frequency | 20 MHz OOK
Node | OOK Rx | Data Rate | 200 kbps
Node | OOK Rx | Sensitivity | -70 dBm
V. CONCLUSIONS
In this paper, a reinforcement-learning-based local routing optimization for the ad-hoc
network is proposed. For the agent, the Q-learning method is adopted. The hub IC and
multiple node ICs are implemented for data communication. The received signal strength
indicator (RSSI) block converts the network quality information between each pair of
nodes to an electrical current value. The collected RSSI data are used for the reward
function of the reinforcement learning, and the routing table for the ad-hoc network is
generated from them.
The chip, fabricated in a 0.18 μm CMOS process with a standard supply voltage of 1.5 V,
achieves its lowest power consumption of 274 μW at a supply voltage of 0.8 V. The proposed
reinforcement-learning-based local routing optimization for the ad-hoc network reduces
the total network power consumption by 64% compared to the conventional infrastructure
network model.
ACKNOWLEDGMENTS
This work was supported by the Institute for Information & communications Technology
Promotion (IITP) grant funded by the Korea government (MSIP) (No.2016-0-00207, Intelligent
Processor Architectures and Application Software for CNN (Convolutional Neural Network)-RNN
(Recurrent Neural Network)).
REFERENCES
[1] Gregori Enrico, Cherkasova Ludmila, Cugola Gianpaolo, Panzieri Fabio, Picco Gian
P., May 2002, Web Engineering and Peer-to-Peer Computing, Springer
[2] Hsieh Hung-Yun, Sivakumar Raghupathy, Jun. 2001, Performance Comparison of Cellular
and Multihop Wireless Networks: A Quantitative Study, ACM SIGMETRICS Performance Evaluation
Review, Vol. 29, No. 1, pp. 113-122
[3] Reddy Prashant P., Veloso Manuela M., Sep. 2011, RSSI-based Physical Layout Classification
and Target Tethering in Mobile Ad-hoc Networks, IEEE/RSJ International Conference
on Intelligent Robots and Systems
[4] Beyens P., Peeters M., Steenhaut K., Nowe A., Mar. 2005, Routing with Compression
in WSNs: A Q-Learning approach, Proc. of the 5th Eur. Wksp on Adaptive Agents and Multi-Agent
Systems (AAMAS)
[5] Förster Anna, Dec. 2007, Machine Learning Techniques Applied to Wireless Ad-Hoc Networks:
Guide and Survey, 3rd International Conference on Intelligent Sensors, Sensor Networks
and Information
[6] Lee Yongsu, Yoo Hoi-jun, Jul. 2017, A 274μW Clock Synchronized Wireless Body Area
Network IC with Super-Regenerative RSSI for Biomedical Ad-Hoc Network System, 39th
Annual International Conference of the IEEE Engineering in Medicine and Biology Society
(EMBC)
[7] Boyan J. A., Littman M. L., 1994, Packet routing in dynamically changing networks:
A reinforcement learning approach, Advances in Neural Information Processing Systems,
Vol. 6
[8] Watkins C., Dayan P., 1992, Q-learning, Machine Learning
Author
Yongsu Lee (S’13) received the B.S., M.S., and Ph.D. degrees in electrical engineering from
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2013, 2015,
and 2019, respectively.
His current research interests include deep-learning SoC and low-power SoC for wearable
healthcare systems.
He is also interested in human body communication (HBC) SoC for low-power applications.
Jongchan Woo received the B.S. degree in electrical engineering from Korea Advanced
Institute of Science and Technology (KAIST).
He is currently working toward the Ph.D. degree at the Massachusetts Institute of
Technology (MIT).
His current research interests include deep-learning SoC and human experience with
sensor systems.
Hoi-Jun Yoo graduated from the Electronic Department of Seoul National University, Seoul,
Korea, in 1983 and received the M.S. and Ph.D. degrees in electrical engineering from the
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, in 1985 and 1988,
respectively.
Since 1998, he has been on the faculty of the Department of Electrical Engineering at
KAIST, where he is now a full professor.
From 2001 to 2005, he was the director of the Korean System Integration and IP Authoring
Research Center (SIPAC).
From 2003 to 2005, he was the full-time Advisor to the Minister of the Korea Ministry of
Information and Communication and National Project Manager for SoC and Computer.
In 2007, he founded the System Design Innovation & Application Research Center (SDIA)
at KAIST.
Since 2010, he has served as the general chair of the Korean Institute of Next Generation
Computing.
His current interests are computer vision SoC, body area networks, and biomedical devices
and circuits.
He is a coauthor of DRAM Design (Korea: Hongrung, 1996), High Performance DRAM (Korea:
Sigma, 1999), Future Memory: FRAM (Korea: Sigma, 2000), Networks on Chips (Morgan
Kaufmann, 2006), Low-Power NoC for High-Performance SoC Design (CRC Press, 2008),
Circuits at the Nanoscale (CRC Press, 2009), Embedded Memories for Nano-Scale VLSIs
(Springer, 2009), Mobile 3D Graphics SoC from Algorithm to Chip (Wiley, 2010), Bio-Medical
CMOS ICs (Springer, 2011), Embedded Systems (Wiley, 2012), and Ultra-Low-Power Short-Range
Radios (Springer, 2015).