Multi-Path TCP RL

Experience-Driven Congestion Control: When Multi-Path TCP Meets Deep Reinforcement Learning

IEEE Journal on Selected Areas in Communications (JSAC)

¶ Overview

Flows $\to$ LSTM $\to$ RL

state of all flows $s_t = [s_t^1, \dots, s_t^N] \to$ one output

Notation: flow $i$ (of $N$ in total), subflow $k$ (of $K_i$ in flow $i$), epoch $t$
- Agent $\to$ Flows (TCP & MPTCP) $\to$ Subflows (only 1 for a regular TCP flow)
Subflow state: $s_t^{i, k} = [b, g, d, v, w]$
- $b$: sending rate
- $g$: goodput
- $d$: average RTT
- $v$: mean deviation of RTTs
- $w$: congestion window (cwnd)
Flow state: $s_t^i = [s_t^{i, 1}, \dots, s_t^{i, K_i}]$
Global state: $s_t = [s_t^1, \dots, s_t^N]$
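
The nesting of the state (subflow inside flow inside global state) can be sketched in Python as below. This is only an illustration of the data layout described in these notes, not the paper's implementation; the names `SubflowStats`, `flow_state`, and `global_state` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubflowStats:
    """Measurements of one subflow during one epoch (hypothetical container)."""
    sending_rate: float  # b
    goodput: float       # g
    avg_rtt: float       # d
    rtt_mdev: float      # v, mean deviation of RTTs
    cwnd: float          # w

    def as_vector(self) -> List[float]:
        # Order follows s_t^{i,k} = [b, g, d, v, w].
        return [self.sending_rate, self.goodput, self.avg_rtt,
                self.rtt_mdev, self.cwnd]


def flow_state(subflows: List[SubflowStats]) -> List[List[float]]:
    """s_t^i: one 5-element vector per subflow (a regular TCP flow has one)."""
    return [sf.as_vector() for sf in subflows]


def global_state(flows: List[List[SubflowStats]]) -> List[List[List[float]]]:
    """s_t: the list of all flow states; its length N may vary over time."""
    return [flow_state(subflows) for subflows in flows]
```

Because both $N$ and $K_i$ can vary, the result is a ragged nested list rather than a fixed-size tensor, which is consistent with the notes' pipeline of feeding flows through an LSTM before the RL agent.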

$r_t = \sum_{i=1}^{N} U(i, t)$
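- the utility $U$ depends on the upper-layer applications
- in paper: $U=\lg{g_t^i}$ ($g_t^i$: average goodput of flow $i$ during the $t-1$ epoch)
- maximizing this utility function leads to proportional fairness: $\sum_i \log g^i$ is concave, so at its maximizer $g^*$ over a convex feasible set, every other feasible allocation $g$ satisfies $\sum_i (g^i - g^{*i}) / g^{*i} \le 0$, which is the definition of proportional fairness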
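
A minimal sketch of the reward, assuming the per-flow goodputs for the epoch are already measured. The function name `reward` and the `eps` floor (to avoid taking the log of zero) are assumptions made for illustration; the natural logarithm is used here, and a different base would only rescale the reward.

```python
import math
from typing import Sequence


def reward(goodputs: Sequence[float], eps: float = 1e-9) -> float:
    """r_t = sum_i log(g_t^i), with goodputs floored at eps (an assumption)."""
    return sum(math.log(max(g, eps)) for g in goodputs)


# With the same total goodput, the more even split earns the higher reward.
even = reward([4.0, 4.0])
skewed = reward([7.0, 1.0])
print(even > skewed)
```

The comparison at the end illustrates why the log utility discourages starving one flow in favor of another.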

Environment
- using iPerf3 to continuously generate packets
- 2 laptops $\longleftarrow$ Gigabit switch $\longrightarrow$ 2 servers
- 1 MPTCP flow = 2 subflows: 8 Mbps bandwidth, 200 ms delay, 0.5% packet loss rate
- 50000 epochs, 2.5 hours

Benchmark
- Jain’s fairness index: $\bar{x}^2 / \overline{x^2} = \left(\sum_{i=1}^{n} x_i\right)^2 / \left(n \sum_{i=1}^{n} x_i^2\right)$
- goodput
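
Jain's fairness index from the list above can be computed as in the following sketch. The function name `jain_index` is hypothetical, and raising an error for an empty or all-zero input is a design choice of this sketch, since the index is undefined there.

```python
from typing import Sequence


def jain_index(xs: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range [1/n, 1]."""
    n = len(xs)
    sum_sq = sum(x * x for x in xs)
    if n == 0 or sum_sq == 0:
        raise ValueError("index is undefined for empty or all-zero input")
    return sum(xs) ** 2 / (n * sum_sq)


print(jain_index([5.0, 5.0, 5.0, 5.0]))   # equal shares
print(jain_index([10.0, 0.0, 0.0, 0.0]))  # one flow takes everything
```

Equal goodputs give the maximum value of 1, while a single flow taking all the goodput gives the minimum of $1/n$.
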
General Environment
- client $\longleftarrow$ 5 MPTCP flows $\longrightarrow$ server
- traffic: document transfers over HTTP, or iPerf3
- 0.5ms to convergence
Parameters
- delay: 50 $\to$ 400 ms
- packet loss rate: 0.5 $\to$ 4%
- bottleneck bandwidth: 2 $\to$ 16 Mbps
- document: 2 $\to$ 8 MB
Scenarios
- 4 (HTTP) + 3 (iPerf3) + 1 (wireless)
- 5-th: dynamic establishment and termination of MPTCP flows
  - establishment: Poisson process; each flow lasts 30 s
- 6-th: 5 MPTCP flows at the beginning; 1 subflow is closed every 60 s
- 7-th: MPTCP and TCP coexist $\to$ TCP-friendliness
- 8-th: wireless environment
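
The dynamic-flow scenario (Poisson arrivals, each flow lasting 30 s) could be emulated with a schedule generator like the sketch below. The arrival rate and time horizon are not given in these notes, so `rate_per_s` and `horizon_s` are hypothetical parameters, as are the function names.

```python
import random
from typing import List, Tuple


def poisson_flow_schedule(rate_per_s: float, horizon_s: float,
                          lifetime_s: float = 30.0,
                          seed: int = 0) -> List[Tuple[float, float]]:
    """Return (start, end) times of flows arriving as a Poisson process.

    Inter-arrival times of a Poisson process are exponentially distributed
    with mean 1 / rate_per_s; every flow stays open for lifetime_s seconds.
    """
    rng = random.Random(seed)
    schedule = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= horizon_s:
            break
        schedule.append((t, t + lifetime_s))
    return schedule


def active_flows(schedule: List[Tuple[float, float]], t: float) -> int:
    """Number of flows open at time t."""
    return sum(1 for start, end in schedule if start <= t < end)


sched = poisson_flow_schedule(rate_per_s=0.1, horizon_s=600.0)
print(len(sched), active_flows(sched, 300.0))
```

A test harness could walk this schedule and start or stop an iPerf3 client at each start and end time.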