QTCP: Adaptive Congestion Control with Reinforcement Learning

IEEE Transactions on Network Science and Engineering

Model

Q-learning + Kanerva Coding

State

  • average packet sending interval
  • average interval between consecutive ACKs received
  • average RTT
  • Kanerva Coding (see the sketch after this list)
    • works as a function approximator
    • reduces the huge, continuous state space
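
A minimal sketch of what the Kanerva-coding approximator could look like, assuming random prototype states over the normalized 3-D state, a fixed activation radius, and per-prototype Q-weights; the constants (`N_PROTOTYPES`, `RADIUS`) are illustrative, not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(0)
N_PROTOTYPES = 500       # number of prototype states (assumed)
N_ACTIONS = 3            # increase / decrease / keep
RADIUS = 0.15            # activation radius in normalized state space (assumed)

# Random prototypes over the normalized 3-D state:
# (avg send interval, avg ACK interval, avg RTT)
prototypes = rng.uniform(0.0, 1.0, size=(N_PROTOTYPES, 3))
theta = np.zeros((N_PROTOTYPES, N_ACTIONS))   # per-prototype Q weights

def active(state):
    """Indices of prototypes within RADIUS of the state (never empty)."""
    dists = np.linalg.norm(prototypes - state, axis=1)
    idx = np.flatnonzero(dists < RADIUS)
    return idx if idx.size else np.array([np.argmin(dists)])

def q_values(state):
    """Q(s, .) approximated as the mean weight of the activated prototypes."""
    return theta[active(state)].mean(axis=0)

def td_update(state, action, target, lr):
    """Nudge the activated prototypes' weights toward the TD target."""
    idx = active(state)
    theta[idx, action] += lr * (target - q_values(state)[action]) / idx.size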

Action

  • increase: +10, decrease: -1, keep: 0 (congestion-window adjustments; sketched below)
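
Hedged sketch of the action space as congestion-window deltas; whether the steps are in bytes or segments is not restated in these notes, so the unit is an assumption.

```python
ACTION_DELTAS = (+10, -1, 0)      # increase / decrease / keep

def apply_action(cwnd, action, cwnd_min=1):
    """Apply one discrete action to cwnd, never dropping below a floor."""
    return max(cwnd_min, cwnd + ACTION_DELTAS[action])
```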

Reward

  • Utility = $\alpha \times \log(\text{throughput}) - \sigma \times \log(\text{RTT})$
  • reward, where $U'$ is the change in utility since the last update (sketched in code below):
    1. $a \; (a > 0)$, if $U' > \epsilon$
    2. $b \; (b < 0)$, if $U' < -\epsilon$
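
The same rule in code. The utility weights $\alpha$, $\sigma$ and the constants $a$, $b$, $\epsilon$ are left as free parameters with placeholder defaults; the notes list only the two signed cases, so returning 0 inside the tolerance band is an assumption.

```python
import math

def utility(throughput, rtt, alpha=1.0, sigma=1.0):
    """U = alpha * log(throughput) - sigma * log(RTT)."""
    return alpha * math.log(throughput) - sigma * math.log(rtt)

def reward(u_new, u_old, eps=0.05, a=2.0, b=-2.0):
    """Reward the sign of the utility change U' = u_new - u_old."""
    du = u_new - u_old
    if du > eps:
        return a        # utility improved: positive reward
    if du < -eps:
        return b        # utility degraded: negative reward
    return 0.0          # within tolerance: neutral (assumed, not in the notes)
```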

Training

Online!
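
One way the online loop could tie the pieces together, reusing the Kanerva helpers sketched above. `env` is a hypothetical stand-in for the ns-3 hookup, and the epsilon-greedy structure is the standard Q-learning recipe rather than anything the notes spell out; parameter defaults follow the table below.

```python
import numpy as np

def run_online(env, sim_time=800.0, step=0.23, gamma=0.9,
               lr=0.95, lr_decay=0.995,
               explore=0.1, explore_decay=0.9995):
    """Online epsilon-greedy Q-learning over one live connection."""
    rng = np.random.default_rng(1)
    state = env.observe()                  # normalized 3-D state (hypothetical API)
    t = 0.0
    while t < sim_time:
        if rng.random() < explore:         # explore: random action
            action = int(rng.integers(N_ACTIONS))
        else:                              # exploit: greedy action
            action = int(np.argmax(q_values(state)))
        r, next_state = env.step(action, step)   # advance one reward-update interval
        target = r + gamma * q_values(next_state).max()
        td_update(state, action, target, lr)     # Kanerva-coded TD update
        state = next_state
        # per-second multiplicative decays, compounded per 0.23 s step
        lr *= lr_decay ** step
        explore *= explore_decay ** step
        t += step
```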

Parameters

| Parameter | Value |
| --- | --- |
| Learning rate $\alpha$ | 0.95, decayed $\times 0.995$ per second |
| Exploration rate | 0.1, decayed $\times 0.9995$ per second |
| Discount factor $\gamma$ | 0.9 |
| Reward update time | 0.23 s |
| Simulation time | 800 s |
| RTT | 120 ms |
| Buffer size | 200 packets |
| Generalization tolerance factor $\beta$ | 0.8 |
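
For scale, if those per-second multiplicative decays compound over the full 800 s run, the learning rate ends near $0.95 \times 0.995^{800} \approx 0.017$ and the exploration rate near $0.1 \times 0.9995^{800} \approx 0.067$.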

Environment setting

  • NS-3 Simulator
  • 2 Senders $\leftrightarrow$ Router $\leftrightarrow$ (wireless link) $\leftrightarrow$ Router $\leftrightarrow$ 2 Receivers

Fixed bandwidth

40 Mbps

![65%](fixed_bandwidth_throughput.png)

Dynamic bandwidth

60 Mbps for 40 s $\to$ 30 Mbps for 10 s $\to \dots$

![110%](dynamic_bandwidth_throughput.png)