Time-Based Aggregates: The Good, the Bad and the Ugly
Time-based aggregations are a common use case in stream processing, but some of their details remain surprisingly difficult to grasp. Many of these difficulties stem from the distributed nature of topologies and the inevitability of failures. Flink's strong semantic foundations make it easy to avoid wrong results, but getting correct results reliably and in a timely manner is not trivial either. This is especially true when reading data from partitioned, and potentially skewed, Kafka topics. Some common mistakes and misunderstandings can lead to problems including, but not limited to, excessively large state or jobs that (seemingly) make no progress. We identify a couple of dos and don'ts to help you avoid common pitfalls and get your jobs up and running quickly.