QTCP: Adaptive Congestion Control with Reinforcement Learning
IEEE Transactions on Network Science and Engineering
¶ Model
Q-learning + Kanerva Coding
¶ State
- average packet sending interval
- average interval between consecutive received ACKs
- average RTT
- Kanerva Coding :question:
  - works as function approximation
  - reduces the huge, continuous state space (see the sketch after this list)
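
A minimal sketch of how Kanerva coding could serve as the Q-value approximator here, assuming Euclidean distance and a fixed set of random prototypes; the `radius` threshold and all names are illustrative (the table's generalization tolerance factor $\beta$ presumably plays a similar role), not the paper's exact scheme:

```python
import numpy as np

class KanervaCoder:
    """Approximate Q(s, a) as the mean weight of prototypes 'close' to s."""

    def __init__(self, n_prototypes, state_dim, n_actions, radius, seed=0):
        rng = np.random.default_rng(seed)
        # Prototype states sampled from the (normalized) state space.
        self.prototypes = rng.uniform(0.0, 1.0, (n_prototypes, state_dim))
        self.radius = radius  # activation threshold (illustrative stand-in)
        self.weights = np.zeros((n_prototypes, n_actions))

    def active(self, state):
        # A prototype is active if it lies within `radius` of the state.
        dists = np.linalg.norm(self.prototypes - state, axis=1)
        return np.flatnonzero(dists < self.radius)

    def q_value(self, state, action):
        idx = self.active(state)
        return self.weights[idx, action].mean() if idx.size else 0.0

    def update(self, state, action, target, lr):
        # Nudge every active prototype's weight toward the TD target.
        idx = self.active(state)
        self.weights[idx, action] += lr * (target - self.q_value(state, action))
```

Only the handful of active prototypes is touched per update, which is what keeps the continuous 3-dimensional state tractable.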
¶ Action
- increase: +10, decrease: -1, keep: 0 (adjustment applied to the congestion window, cwnd)
¶ Reward
- Utility: $U = \alpha \times \log(\text{throughput}) - \sigma \times \log(\text{RTT})$
- reward ($U'$ = change in utility since the last reward update):
  - $a$ $(a > 0)$, if $U' > \epsilon$
  - $b$ $(b < 0)$, if $U' < -\epsilon$
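
A sketch of the utility/reward computation under that reading of $U'$; the constants `a`, `b`, `eps` and the zero reward for the in-between case $|U'| \le \epsilon$ are illustrative assumptions (the notes don't spell that case out):

```python
import math

def utility(throughput, rtt, alpha=1.0, sigma=1.0):
    # U = alpha * log(throughput) - sigma * log(RTT)
    return alpha * math.log(throughput) - sigma * math.log(rtt)

def reward(u_now, u_prev, eps=0.05, a=1.0, b=-1.0):
    du = u_now - u_prev          # U': utility change since the last update
    if du > eps:
        return a                 # utility clearly improved
    if du < -eps:
        return b                 # utility clearly degraded
    return 0.0                   # |U'| <= eps: assumption, not given in the notes
```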
¶ Training
Online!
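
The online loop could look like standard $\epsilon$-greedy Q-learning on top of the Kanerva coder sketched above, with the target $r + \gamma \max_{a'} Q(s', a')$; `env.apply` is a hypothetical stand-in for the NS-3 feedback, not an actual API:

```python
import random

ACTIONS = [+10, -1, 0]  # cwnd adjustments from the notes: increase, decrease, keep

def q_step(coder, state, cwnd, explore, lr, gamma, env):
    # Epsilon-greedy choice over the three cwnd adjustments.
    if random.random() < explore:
        a = random.randrange(len(ACTIONS))
    else:
        a = max(range(len(ACTIONS)), key=lambda i: coder.q_value(state, i))
    cwnd = max(1, cwnd + ACTIONS[a])

    # Hypothetical: send at the new cwnd, observe next state and reward.
    next_state, r = env.apply(cwnd)

    # Standard Q-learning target: r + gamma * max_a' Q(s', a').
    target = r + gamma * max(coder.q_value(next_state, i)
                             for i in range(len(ACTIONS)))
    coder.update(state, a, target, lr)
    return next_state, cwnd
```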
¶ Parameters
| Parameter | Value |
| --- | --- |
| Learning rate $\alpha$ | 0.95, decayed ×0.995 per second |
| Exploration rate | 0.1, decayed ×0.9995 per second |
| Discount factor $\gamma$ | 0.9 |
| Reward update interval | 0.23 s |
| Simulation time | 800 s |
| RTT | 120 ms |
| Buffer size | 200 packets |
| Generalization tolerance factor $\beta$ | 0.8 |
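
Reading "decayed ×0.995 per second" as multiplicative decay applied once per simulated second, the two decayed parameters would evolve as in this small sketch:

```python
def decayed_params(elapsed_s):
    # Multiplicative per-second decay, per the table above.
    lr = 0.95 * (0.995 ** elapsed_s)       # learning rate
    explore = 0.1 * (0.9995 ** elapsed_s)  # exploration rate
    return lr, explore
```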
¶ Environment setting
- NS-3 Simulator
- 2 Senders $\leftrightarrow$ Router $\leftarrow$ wireless $\rightarrow$ Router $\leftrightarrow$ 2 Receivers
¶ Fixed bandwidth
40 Mbps
¶ Dynamic bandwidth
60 Mbps for 40 s $\to$ 30 Mbps for 10 s $\to \dots$