The past, present, and future of streaming: Flink, Spark, and the gang
Reactive, real-time applications require real-time, eventful data flows. This is the premise on which a number of streaming frameworks have proliferated. The latest milestone was adding ACID capabilities, so let us take stock of where we are in this journey down the stream — or river.
Streaming is one of the top trends we’ve been keeping up with. The latest episode in that saga was adding ACID capabilities to Apache Flink, as covered by ZDNet’s Tony Baer last week. This announcement, made at Flink Forward in Berlin, was the backdrop for in-depth conversations we had with executives, engineers, and users, which may help put things in context.
To begin with, as Baer noted, there is an API for Flink that can be downloaded from GitHub, but it only works for a single stream. The version with the “runner” for multiple parallel streams is part of the data Artisans Platform – the commercial incarnation of Flink.
This is not at all surprising, as data Artisans, the vendor that provides support for Flink and employs a big part of its full-time contributors, has an open core policy. That’s a very common policy in the open source world, and one that data Artisans/Flink’s main competitor, Databricks/Apache Spark, also follows.