Apache Kafka Platform

Context:
- It can (typically) be used to create a Publish-Subscribe Messaging System.
- It can be a Pull System (by contrast, Apache Flume is a push system).
- It can (typically) support Kafka Producers that write IT micro-events to Kafka.
- It can (typically) support Kafka Consumers that subscribe to IT micro-events (e.g. click events).
- It can provide a vast array of metrics on performance and resource utilization (to help you manage and scale a cluster).
- It can (typically) guarantee order (within a Kafka partition).
- It can be used to create a Kafka Messaging System Instance.
- It can have Kafka APIs, such as Kafka Connect API (for Kafka Connect).
- …
Example(s):
- Kafka v2.4.1 (~2020-03-20) [1].
- Kafka v1.0.0 (~2017-11-01).
- Kafka v0.11.0.2 (~2017-11-17).
- https://kafka.apache.org/downloads
- …
Counter-Example(s):
- Apache Flume.
- Fluentd (point-to-point).
- Amazon Kinesis.
See: Kafka Streams, AWS MSK, Distributed Messaging System, Near-line System, Apache YARN, Kafka SQL (KSQL), Kafka Cruise Control.

References

(Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Apache_Kafka Retrieved:2021-6-10.
- Apache Kafka is a framework implementation of a software bus using stream-processing. It is an open-source software platform developed by the Apache Software Foundation written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream processing library.
  Kafka uses a binary TCP-based protocol that is optimized for efficiency and relies on a "message set" abstraction that naturally groups messages together to reduce the overhead of the network roundtrip. This "leads to larger network packets, larger sequential disk operations, contiguous memory blocks [...] which allows Kafka to turn a bursty stream of random message writes into linear writes."
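
The "message set" idea in the excerpt above can be illustrated with a minimal sketch. It assumes a hypothetical length-prefixed framing (a 4-byte big-endian length followed by the payload); this is not Kafka's actual wire format, and the function names and the `max_batch_bytes` limit are invented for illustration. The point is only that several small messages travel in one contiguous buffer, so ten messages need two network writes instead of ten.

```python
import struct


def encode_message_set(messages):
    """Pack messages into one contiguous buffer (assumed framing: 4-byte length + payload)."""
    return b"".join(struct.pack(">I", len(m)) + m for m in messages)


def decode_message_set(buffer):
    """Walk the buffer sequentially and recover the original messages."""
    messages, pos = [], 0
    while pos < len(buffer):
        (length,) = struct.unpack_from(">I", buffer, pos)
        pos += 4
        messages.append(buffer[pos:pos + length])
        pos += length
    return messages


def batch(messages, max_batch_bytes):
    """Group messages so that each encoded batch stays within max_batch_bytes when possible."""
    batches, current, size = [], [], 0
    for m in messages:
        record_size = 4 + len(m)
        # Close the current batch if adding this record would exceed the limit.
        if current and size + record_size > max_batch_bytes:
            batches.append(current)
            current, size = [], 0
        current.append(m)
        size += record_size
    if current:
        batches.append(current)
    return batches


# Ten small click events become two buffers, i.e. two writes instead of ten.
events = [b"click:%d" % i for i in range(10)]
batches = batch(events, max_batch_bytes=64)
wire_buffers = [encode_message_set(b) for b in batches]
```

Each buffer can be written to a socket or appended to a file in a single call, which is the kind of larger, sequential operation the quoted passage describes.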

(Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Apache_Kafka Retrieved:2017-7-21.
- Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue architected as a distributed transaction log," making it highly valuable for enterprise infrastructures to process streaming data. Additionally, Kafka connects to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream processing library. The design is heavily influenced by transaction logs.

(Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Apache_Kafka#Description Retrieved:2017-7-21.
- Kafka stores messages which come from arbitrarily many processes called "producers". The data can thereby be partitioned in different "partitions" within different "topics". Within a partition the messages are indexed and stored together with a timestamp. Other processes called "consumers" can query messages from partitions. Kafka runs on a cluster of one or more servers and the partitions can be distributed across cluster nodes.
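
The description above (producers, topics, partitions, indexed and timestamped messages, consumers that query a partition) can be modelled with a small in-memory sketch. It does not use a Kafka client or broker; the `Topic` and `Record` classes are hypothetical, and the partitioner (CRC32 of the key modulo the partition count) is an assumption chosen for simplicity rather than Kafka's own hashing scheme.

```python
import time
import zlib
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int       # index of the message within its partition
    timestamp: float  # stored together with the message
    key: bytes
    value: bytes


@dataclass
class Topic:
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        # One append-only list per partition.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def produce(self, key, value):
        """Append to the partition chosen by an assumed key-hash partitioner."""
        p = zlib.crc32(key) % self.num_partitions
        log = self.partitions[p]
        record = Record(len(log), time.time(), key, value)
        log.append(record)
        return p, record.offset

    def consume(self, partition, offset, max_records=100):
        """Return records of one partition, in order, starting at the given offset."""
        return self.partitions[partition][offset:offset + max_records]


topic = Topic("clicks", num_partitions=3)
# Two users alternate; every event of a given user lands in the same partition.
placements = [topic.produce(b"user-%d" % (i % 2), b"event-%d" % i) for i in range(6)]
```

Because all records that share a key go to one partition and each partition is append-only, a consumer calling `topic.consume(partition, 0)` sees a given user's events in the order they were produced. Ordering across different partitions is not defined in this model.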

http://confluent.io/blog/apache-kafka-for-service-architectures/
- QUOTE: … The log-structured approach is itself a simple idea: a collection of messages, appended sequentially to a file. When a service wants to read messages from Kafka it ‘seeks’ to the position of the last message it read, then scans sequentially, reading messages in order, while periodically recording its new position in the log.
  Taking a log-structured approach has an interesting side effect. Both reads and writes are sequential operations. This makes them sympathetic to the underlying media, leveraging pre-fetch, the various layers of caching and naturally batching operations together. This makes them efficient. In fact, when you read messages from Kafka, the server doesn’t even import them into the JVM. Data is copied directly from the disk buffer to the network buffer. An opportunity afforded by the simplicity of both the contract and the underlying data structure. …
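
The read path described in the quote (seek to the last position, scan sequentially, record the new position) can be sketched with a plain file. This is a minimal illustration under stated assumptions: the `FileLog` class, the length-prefixed record layout, and the file name are hypothetical, and the sketch does not model the direct disk-buffer-to-network-buffer copy mentioned in the quote.

```python
import os
import struct
import tempfile


class FileLog:
    """Append-only log file; readers track their own byte position."""

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, message):
        # Writes always go to the end of the file (sequential write).
        with open(self.path, "ab") as f:
            f.write(struct.pack(">I", len(message)) + message)

    def read_from(self, position, max_messages=100):
        """Seek to position, scan forward, and return (messages, new_position)."""
        messages = []
        with open(self.path, "rb") as f:
            f.seek(position)
            while len(messages) < max_messages:
                header = f.read(4)
                if len(header) < 4:
                    break  # reached the end of the log
                (length,) = struct.unpack(">I", header)
                messages.append(f.read(length))
                position = f.tell()
        return messages, position


path = os.path.join(tempfile.mkdtemp(), "partition-0.log")
log = FileLog(path)
for i in range(3):
    log.append(b"msg-%d" % i)

# The reader consumes two messages and remembers where it stopped.
first, pos = log.read_from(0, max_messages=2)
log.append(b"msg-3")
# Later it resumes from the recorded position and picks up the rest in order.
rest, pos = log.read_from(pos)
```

The reader's whole state is the single integer `pos`, so resuming after a restart only requires persisting that number somewhere.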