BSDCan 2011
The Technical BSD Conference

Speakers
Randall Stewart
Schedule
Day Talks - 2 - 2011-05-14
Room DMS 1150
Start time 11:30
Duration 01:00
Info
ID 242
Event type Lecture
Track Hacking
Language used for presentation English

An Investigation into Data Center Congestion with ECN

Data Center Congestion Control

Data centers place a unique set of demands on any transport protocol used within them. Griffen et al. [1] note that common datacenter communication patterns virtually guarantee incidents of incast. Vasudevan et al. [2] addressed incast by reducing RTO.min (the minimum retransmission timeout) of the transport protocol, but this did not remove the root cause of the problem: switch buffer overflow. Alizadeh et al. (DCTCP) [3] address the same problem by using ECN [4] with a new algorithm that not only eliminates incast but also reduces switch buffer occupancy, improving both elephant and mice flows within the datacenter. In this paper we revisit some of the DCTCP work with a few differences, namely:

  1. Instead of using only TCP [5], we separate the external transport protocol from the internal datacenter protocol. To achieve this separation, we use SCTP [6] instead of TCP for internal datacenter communication. This gives us more flexibility in the feature set available inside the datacenter, and at the same time ensures that changes to the internal transport stack will not adversely affect external communications on the Internet.

  2. When attempting to reproduce some of the DCTCP findings, we use existing switch products to provide the appropriate ECN marking.

  3. Instead of using the DCTCP algorithm, we define a less compute-intensive modification to ECN that we call Data Center Congestion Control (DCCC), and implement it within the FreeBSD SCTP stack.

  4. We compare four variants of SCTP: standard SCTP, SCTP with ECN, SCTP with DCCC, and an alternate form of DCCC that we call Dynamic DCCC. Dynamic DCCC can switch between regular ECN and DCCC based on the initial Round Trip Time (RTT) of the path.
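
The RTT-based selection described for Dynamic DCCC in item 4 can be illustrated with a minimal C sketch. This is not the FreeBSD SCTP implementation: every name here (cc_mode, path_state, select_cc_mode, path_record_rtt) is hypothetical, the 1000 microsecond cutoff is an assumed value, and the sketch further assumes that a short initial RTT indicates an intra-datacenter path (so DCCC is chosen) and that the decision is made once, on the first RTT sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none are taken from the FreeBSD SCTP stack. */
enum cc_mode { CC_MODE_ECN, CC_MODE_DCCC };

/* Assumed cutoff: an initial RTT below this is treated as an intra-datacenter path. */
#define DC_RTT_THRESHOLD_US 1000u

/* Per-path state: the first RTT sample and the mode chosen from it. */
struct path_state {
    uint32_t initial_rtt_us;
    enum cc_mode mode;
    int mode_set;               /* non-zero once the initial RTT has been seen */
};

/* Assumption: short RTT implies a datacenter path, so pick DCCC; otherwise regular ECN. */
static enum cc_mode select_cc_mode(uint32_t initial_rtt_us)
{
    return (initial_rtt_us < DC_RTT_THRESHOLD_US) ? CC_MODE_DCCC : CC_MODE_ECN;
}

/* Record an RTT sample; only the first sample on a path decides the mode. */
static void path_record_rtt(struct path_state *p, uint32_t rtt_us)
{
    assert(p != NULL);
    if (!p->mode_set) {
        p->initial_rtt_us = rtt_us;
        p->mode = select_cc_mode(rtt_us);
        p->mode_set = 1;
    }
}
```

With these assumptions, a zero-initialized path_state that first sees a 200 microsecond sample ends up in CC_MODE_DCCC and stays there on later samples, while a path whose first sample is 80 milliseconds falls back to CC_MODE_ECN. Keeping the decision per path means external, long-RTT peers would keep regular ECN behavior.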