Building a network stack for optimal throughput / low-latency trade-offs

Flink's network stack is designed with two goals in mind: (a) low latency for data passing through it, and (b) optimal throughput. It already strikes a good trade-off between the two, and we will continue to tune it further. Among the changes in Flink 1.3, we removed one major annoyance in the configuration of network buffers: too few buffers prevented jobs from running, while too many led to suboptimal performance because back-pressure kicked in too late. Flink 1.4 and later releases will bring further improvements by avoiding unnecessary copies in the network stack and by moving to a network-event-driven pull approach. In this talk, we will describe in detail how the network stack works with respect to memory management, buffer pools, and configuration, and how its design aims to always deliver the best combination of throughput and latency that the network and the processing pipeline can currently support. We will also present the latest changes and our plans for further improving the network stack.