Experience-Driven Congestion Control: When Multi-Path TCP Meets Deep Reinforcement Learning

IEEE Journal on Selected Areas in Communications (JSAC)

Overview

  • Scenario: Multi-Path TCP
  • One RL agent $\longleftrightarrow$ MPTCP flows on an end host
  • Implemented in the Linux kernel
  • Policy gradient, Actor-Critic
  • LSTM
  • Test scenario settings

Model

Flows $\to$ LSTM $\to$ RL

LSTM

state of all flows $s_t = [s_t^1, …, s_t^N] \to$ one output

State

  • Indices: flow $i$ (total: $N$), subflow $k$ (total: $K_i$), epoch $t$
    • Agent $\to$ Flows (TCP & MPTCP) $\to$ Subflows (exactly 1 for a regular TCP flow)
  • $s_t^{i, k} = [b, g, d, v, w]$
    • $b$: sending rate
    • $g$: goodput
    • $d$: average RTT
    • $v$: mean deviation of RTTs
    • $w$: cwnd (congestion window size)
  • $s_t = [s_t^1, …, s_t^N]$
  • $s_t^i = [s_t^{i, 1}, …, s_t^{i, K_i}]$
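The nesting above (state $\to$ flows $\to$ subflows) can be sketched in a few lines of Python. This is only an illustration of the data layout: the names `SubflowState`, `flow_state`, and `full_state` are hypothetical, and the numbers are made-up sample values, not measurements from the paper.

```python
from dataclasses import dataclass, astuple
from typing import List


@dataclass
class SubflowState:
    """Per-subflow measurements s_t^{i,k} = [b, g, d, v, w] for one epoch."""
    b: float  # sending rate
    g: float  # goodput
    d: float  # average RTT
    v: float  # mean deviation of RTTs
    w: float  # cwnd


def flow_state(subflows: List[SubflowState]) -> List[float]:
    """Concatenate subflow vectors into s_t^i (length 5 * K_i)."""
    out: List[float] = []
    for sf in subflows:
        out.extend(astuple(sf))
    return out


def full_state(flows: List[List[SubflowState]]) -> List[List[float]]:
    """Build s_t = [s_t^1, ..., s_t^N]; entries may differ in length."""
    return [flow_state(f) for f in flows]


# Illustrative values only: one MPTCP flow (2 subflows) and one regular TCP flow.
mptcp = [SubflowState(8.0, 7.5, 0.20, 0.01, 40.0),
         SubflowState(4.0, 3.9, 0.10, 0.02, 20.0)]
tcp = [SubflowState(6.0, 5.8, 0.15, 0.01, 30.0)]
s_t = full_state([mptcp, tcp])
```

Because $N$ and each $K_i$ can vary, $s_t$ has no fixed size, which fits the model's use of an LSTM to reduce the state of all flows to one output.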

Action

  • $a_t = [x_t^1, …, x_t^K]$
  • $x_t^k$: change applied to the current cwnd of subflow $k$ ($K$: number of subflows of the target flow)
  • DRL-CC only takes an action on one (target) MPTCP flow
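A minimal sketch of how such an action could be applied to the target flow's subflows. The additive update and the lower bound `MIN_CWND` are assumptions made for illustration; the notes do not say how the kernel implementation clamps or rounds the window.

```python
from typing import List

MIN_CWND = 1.0  # assumed lower bound (hypothetical), in packets


def apply_action(cwnds: List[float], action: List[float]) -> List[float]:
    """Add each per-subflow change x_t^k to the matching cwnd, clamped below."""
    if len(cwnds) != len(action):
        raise ValueError("one cwnd change is required per subflow")
    return [max(MIN_CWND, w + x) for w, x in zip(cwnds, action)]


# Target flow with 2 subflows: grow the first window, shrink the second.
new_cwnds = apply_action([40.0, 20.0], [5.0, -25.0])
```

Here the second subflow would drop below zero, so the sketch holds it at the assumed minimum instead.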

Reward

  • $r_t = \sum_{i=1}^{N} U(i, t)$
    • U depends on upper-layer apps

    • in paper: $U(i, t) = \log{g_t^i}$ ($g_t^i$: average goodput of flow $i$ during epoch $t-1$)
    • maximizing this utility function leads to proportional fairness (Why?)
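A short sketch of why the log utility gives proportional fairness, assuming the set $\mathcal{G}$ of feasible goodput vectors is convex. The symbols $\mathcal{G}$ and $g^*$ are notation introduced here, and the epoch index $t$ is dropped. The objective $\sum_i \log g^i$ is concave, so $g^*$ maximizes it over $\mathcal{G}$ exactly when no feasible direction increases it to first order:

$$
\nabla\Big(\sum_{i=1}^{N} \log g^i\Big)\Big|_{g^*} \cdot (g - g^*) \;=\; \sum_{i=1}^{N} \frac{g^i - g^{*i}}{g^{*i}} \;\le\; 0 \quad \text{for all } g \in \mathcal{G}.
$$

This inequality is the usual definition of proportional fairness: moving from $g^*$ to any other feasible allocation makes the sum of relative goodput changes non-positive. The base of the logarithm only rescales the objective, so it does not affect the argument.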

Training

Pre-training

  • Environment
    • using iPerf3 to continuously generate packets
    • 2 laptops $\longleftarrow$ Gigabit switch $\longrightarrow$ 2 servers
  • 1 MPTCP flow = 2 subflows; link: 8 Mbps bandwidth, 200 ms delay, 0.5% packet loss
  • 50000 epochs
  • 2.5 hours

Online Test

  • Benchmark
    • Jain’s fairness index: $\bar{x}^2 / \overline{x^2}$
    • goodput
  • General Environment
    • client $\longleftarrow$ 5 MPTCP flows $\longrightarrow$ server
    • transferring a document over HTTP, or generating traffic with iPerf3
    • 0.5ms to convergence
  • Parameters
    • delay: 50 $\to$ 400 ms
    • packet loss rate: 0.5% $\to$ 4%
    • bottleneck bandwidth: 2 $\to$ 16 Mbps
    • document: 2 $\to$ 8 MB
  • Scenarios
    • 4 (HTTP) + 3 (iPerf3) + 1 (wireless)
    • 5th: dynamic establishment and termination of MPTCP flows
      • establishment: Poisson process; each flow lasts 30 s
    • 6th: 5 MPTCP flows at the beginning; 1 subflow closed every 60 s
    • 7th: MPTCP and TCP flows coexist $\to$ TCP-friendliness
    • 8th: wireless environment
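Jain’s fairness index from the Benchmark list is easy to compute directly from per-flow goodputs. The sketch below is a plain-Python illustration; the function name `jain_index` and the sample goodput values are made up for the example.

```python
from typing import Sequence


def jain_index(x: Sequence[float]) -> float:
    """Jain's fairness index: (mean of x)^2 / (mean of x^2)."""
    n = len(x)
    if n == 0:
        raise ValueError("at least one value is required")
    mean = sum(x) / n
    mean_sq = sum(v * v for v in x) / n
    if mean_sq == 0:
        raise ValueError("index is undefined when all values are zero")
    return mean * mean / mean_sq


# Illustrative goodputs (Mbps) for 5 flows.
equal = jain_index([2.0, 2.0, 2.0, 2.0, 2.0])     # all flows get the same share
starved = jain_index([10.0, 0.0, 0.0, 0.0, 0.0])  # one flow takes everything
```

The index ranges from $1/n$ (one flow gets everything) to 1 (perfectly equal shares), so with 5 flows the two cases above give 1.0 and 0.2.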