Apache Spark Streaming Framework
An Apache Spark Streaming Framework is a distributed data stream processing framework (for event stream processing) built on top of an Apache Spark framework.
- Context:
- It can ingest data in Mini Data Batches and perform RDD Transformations on them.
- …
- Counter-Example(s):
- See: Network Socket, Event Stream Processing, Lambda Architecture, Apache Flink, Apache Kafka, Apache Flume, ZeroMQ.
References
2019
- https://www.infoworld.com/article/3236869/what-is-apache-spark-the-big-data-platform-that-crushed-hadoop.html?page=2
- QUOTE: ... A criticism of the Spark Streaming approach is that microbatching, in scenarios where a low-latency response to incoming data is required, may not be able to match the performance of other streaming-capable frameworks like Apache Storm, Apache Flink, and Apache Apex, all of which use a pure streaming method rather than microbatches. …
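The latency criticism above follows from the batching model itself: records accumulate until the current mini-batch interval closes, so end-to-end latency cannot fall below the interval fixed when the StreamingContext is constructed. A minimal Scala sketch of that constraint (the 500 ms interval, master, and app name are illustrative assumptions):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Milliseconds, StreamingContext}

object LatencyFloorSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("LatencyFloorSketch")
    // Records buffer until the current 500 ms batch closes before any
    // processing starts, so 500 ms is also the minimum end-to-end latency.
    val ssc = new StreamingContext(conf, Milliseconds(500))
    // DStream definitions and an output operation would follow here,
    // then ssc.start() and ssc.awaitTermination().
  }
}
```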
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Apache_Spark#Spark_Streaming Retrieved:2017-2-2.
- QUOTE: Spark Streaming leverages Spark Core's fast scheduling capability to perform streaming analytics. It ingests data in mini-batches and performs RDD transformations on those mini-batches of data. This design enables the same set of application code written for batch analytics to be used in streaming analytics, thus facilitating easy implementation of lambda architecture. However, this convenience comes with the penalty of latency equal to the mini-batch duration. Other streaming data engines that process event by event rather than in mini-batches include Storm and the streaming component of Flink. Spark Streaming has support built-in to consume from Kafka, Flume, Twitter, ZeroMQ, Kinesis, and TCP/IP sockets.
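As a concrete illustration of mini-batch RDD processing, here is a minimal Scala sketch in the style of the canonical network word count from the Spark documentation; the host, port, and 1-second batch interval are illustrative assumptions:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object NetworkWordCountSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCountSketch")
    // Each 1-second mini-batch of input becomes one RDD in the DStream.
    val ssc = new StreamingContext(conf, Seconds(1))

    // TCP text source; feed it with e.g. `nc -lk 9999` (host/port illustrative).
    val lines = ssc.socketTextStream("localhost", 9999)

    // The same map/reduce transformations a batch Spark job would use,
    // which is what makes code reuse across a lambda architecture easy.
    val words  = lines.flatMap(_.split(" "))
    val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()             // start receiving and scheduling mini-batches
    ssc.awaitTermination()  // block until stopped or failed
  }
}
```

Note that the counts here are per-batch: each print shows only the words seen in that 1-second batch, and a stateful operator such as updateStateByKey would be needed for running totals.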
2017
- http://spark.apache.org/docs/latest/streaming-programming-guide.html
- QUOTE: Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Kinesis, or TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark’s machine learning and graph processing algorithms on data streams.
Internally, it works as follows. Spark Streaming receives live input data streams and divides the data into batches, which are then processed by the Spark engine to generate the final stream of results in batches.
Spark Streaming provides a high-level abstraction called discretized stream or DStream, which represents a continuous stream of data. DStreams can be created either from input data streams from sources such as Kafka, Flume, and Kinesis, or by applying high-level operations on other DStreams. Internally, a DStream is represented as a sequence of RDDs.
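To illustrate the window operations mentioned in the quote, a minimal Scala sketch of a sliding-window count over a DStream (host, port, and durations are illustrative assumptions; the window and slide durations must be multiples of the batch interval):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowedCountsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("WindowedCountsSketch")
    val ssc = new StreamingContext(conf, Seconds(2))

    val pairs = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map(word => (word, 1))

    // Word counts over the last 30 seconds, recomputed every 10 seconds.
    val windowedCounts = pairs.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))
    windowedCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

This form of reduceByKeyAndWindow recomputes each window from scratch; the variant that additionally takes an inverse reduce function updates windows incrementally but requires checkpointing.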
2016
- https://databricks.com/blog/2016/07/28/continuous-applications-evolving-streaming-in-apache-spark-2-0.html
- QUOTE: Since its release, Spark Streaming has become one of the most widely used distributed streaming engines, thanks to its high-level API and exactly-once semantics. Nonetheless, as these types of engines became common, we’ve noticed that developers often need more than just a streaming programming model to build real-time applications. At Databricks, we’ve worked with thousands of users to understand how to simplify real-time applications. In this post, we present the resulting idea, continuous applications, which we have started to implement through the Structured Streaming API in Apache Spark 2.0.
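For contrast with the DStream examples above, here is a minimal Scala sketch of the same word count on the Structured Streaming API introduced in Spark 2.0, following the pattern in Spark's documentation; the host, port, and output mode are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

object StructuredWordCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local[2]")
      .appName("StructuredWordCountSketch")
      .getOrCreate()
    import spark.implicits._

    // Streaming DataFrame over a socket source (host/port illustrative).
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // Declarative word count; the engine maintains the aggregate
    // incrementally as new rows arrive.
    val counts = lines.as[String]
      .flatMap(_.split(" "))
      .groupBy("value")
      .count()

    val query = counts.writeStream
      .outputMode("complete")   // emit the full updated table each trigger
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```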