Parameter Server on Flink, an approach for model-parallel machine learning
In this talk we present a Parameter Server implementation in Flink for model-parallel machine learning. To scale efficiently, some machine learning algorithms require not only that the input be processed in parallel, but also that the model itself be trained and stored in a distributed manner. The Parameter Server provides an abstraction layer over the distributed model, which makes such algorithms much easier to implement. We show how our Parameter Server can be used for model-parallel training in Flink through two use cases: text classification with the passive-aggressive algorithm and recommendation with matrix factorization. Our implementation is built entirely on top of the Streaming API, so Flink can also preprocess the data and even serve predictions within the same job. We built the Parameter Server implementation as part of STREAMLINE, an EU-funded project, for the text classification task of Internet Memory (France) and the TV program recommendation task of AlticeLabs (Portugal).
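To make the abstraction concrete, the following is a minimal single-process sketch in Python, not the Flink implementation described in the talk. It assumes the pull/push interface commonly associated with parameter servers: a worker pulls only the parameters it needs, computes an update, and pushes the deltas back. The names `ShardedParameterServer`, `pull`, `push`, and `pa_train_step` are hypothetical, and the update shown is the basic passive-aggressive rule for binary labels with hinge loss.

```python
from collections import defaultdict


class ShardedParameterServer:
    """Toy in-memory stand-in for a parameter server (hypothetical API)."""

    def __init__(self, num_shards):
        # Each shard owns a disjoint subset of the parameter keys,
        # mimicking a model that is partitioned across machines.
        self.shards = [defaultdict(float) for _ in range(num_shards)]

    def _shard(self, key):
        # Route a key to the shard that stores it.
        return self.shards[hash(key) % len(self.shards)]

    def pull(self, keys):
        """Return the current values of the requested keys (0.0 if unseen)."""
        return {k: self._shard(k)[k] for k in keys}

    def push(self, deltas):
        """Add each delta to the stored parameter value."""
        for k, d in deltas.items():
            self._shard(k)[k] += d


def pa_train_step(server, features, label):
    """One passive-aggressive step on a sparse example.

    features: dict mapping feature id -> value; label: +1 or -1.
    Only the weights of features present in the example are pulled.
    """
    weights = server.pull(features.keys())
    margin = label * sum(weights[k] * v for k, v in features.items())
    loss = max(0.0, 1.0 - margin)  # hinge loss
    norm_sq = sum(v * v for v in features.values())
    if loss == 0.0 or norm_sq == 0.0:
        return loss  # passive: the example is already classified with margin
    tau = loss / norm_sq  # aggressive: smallest step that fixes the margin
    server.push({k: tau * label * v for k, v in features.items()})
    return loss


if __name__ == "__main__":
    server = ShardedParameterServer(num_shards=2)
    print(pa_train_step(server, {1: 1.0, 2: 2.0}, label=1))
    print(server.pull([1, 2]))
```

Because a worker touches only the keys that occur in its example, the full weight vector never has to fit on a single worker, which is the property that matters for sparse text features.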