Apache Kafka Platform

An Apache Kafka Platform is an open-source, distributed, partitioned, and replicated data stream processing platform managed by the Apache Kafka project.
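
The partitioned and replicated aspects show up directly when a topic is created. Below is a minimal sketch using the Java AdminClient that creates a topic with three partitions, each replicated to two brokers; the broker address, topic name, and counts are illustrative assumptions, not values taken from this article.

  import org.apache.kafka.clients.admin.AdminClient;
  import org.apache.kafka.clients.admin.AdminClientConfig;
  import org.apache.kafka.clients.admin.NewTopic;
  import java.util.Collections;
  import java.util.Properties;

  public class CreateTopicSketch {
      public static void main(String[] args) throws Exception {
          Properties props = new Properties();
          // Placeholder broker address for a local cluster.
          props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
          try (AdminClient admin = AdminClient.create(props)) {
              // A topic split into 3 partitions, each replicated to 2 brokers.
              NewTopic topic = new NewTopic("events", 3, (short) 2);
              admin.createTopics(Collections.singleton(topic)).all().get();
          }
      }
  }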



References

2021

  • (Wikipedia, 2021) ⇒ https://en.wikipedia.org/wiki/Apache_Kafka Retrieved:2021-6-10.
    • Apache Kafka is a framework implementation of a software bus using stream-processing. It is an open-source software platform developed by the Apache Software Foundation written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Kafka can connect to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream processing library.

      Kafka uses a binary TCP-based protocol that is optimized for efficiency and relies on a "message set" abstraction that naturally groups messages together to reduce the overhead of the network roundtrip. This "leads to larger network packets, larger sequential disk operations, contiguous memory blocks [...] which allows Kafka to turn a bursty stream of random message writes into linear writes."
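
The batching described above is exposed on the producer side through configuration: records destined for the same partition are accumulated into a batch (a "message set") before being sent. The following sketch, which is not part of the quoted passage, shows a Java producer with explicit batching settings; the broker address, topic name, and tuning values are illustrative assumptions.

  import org.apache.kafka.clients.producer.KafkaProducer;
  import org.apache.kafka.clients.producer.ProducerConfig;
  import org.apache.kafka.clients.producer.ProducerRecord;
  import org.apache.kafka.common.serialization.StringSerializer;
  import java.util.Properties;

  public class BatchingProducerSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
          props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
          props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
          // Accumulate up to 64 KB per partition before sending a batch ...
          props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);
          // ... and wait up to 10 ms for more records to fill the batch.
          props.put(ProducerConfig.LINGER_MS_CONFIG, 10);

          try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
              for (int i = 0; i < 1000; i++) {
                  producer.send(new ProducerRecord<>("events", "key-" + i, "value-" + i));
              }
          }
      }
  }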

2017a

  • (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Apache_Kafka Retrieved:2017-7-21.
    • Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a "massively scalable pub/sub message queue architected as a distributed transaction log," making it highly valuable for enterprise infrastructures to process streaming data. Additionally, Kafka connects to external systems (for data import/export) via Kafka Connect and provides Kafka Streams, a Java stream processing library. The design is heavily influenced by transaction logs.
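
Kafka Streams, mentioned above, is the Java library for writing stream processing applications over Kafka topics. The sketch below, not part of the quoted passage, builds a minimal topology that reads one topic, uppercases each value, and writes to another; the application id, broker address, and topic names are illustrative assumptions.

  import org.apache.kafka.common.serialization.Serdes;
  import org.apache.kafka.streams.KafkaStreams;
  import org.apache.kafka.streams.StreamsBuilder;
  import org.apache.kafka.streams.StreamsConfig;
  import java.util.Properties;

  public class UppercaseStreamSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-sketch");
          props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
          props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
          props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

          StreamsBuilder builder = new StreamsBuilder();
          // Read from one topic, transform each value, write to another topic.
          builder.<String, String>stream("input-topic")
                 .mapValues(value -> value.toUpperCase())
                 .to("output-topic");

          KafkaStreams streams = new KafkaStreams(builder.build(), props);
          streams.start();
          Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
      }
  }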

2017b

  • (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Apache_Kafka#Description Retrieved:2017-7-21.
    • Kafka stores messages which come from arbitrarily many processes called "producers". The data can thereby be partitioned in different "partitions" within different "topics". Within a partition the messages are indexed and stored together with a timestamp. Other processes called "consumers" can query messages from partitions. Kafka runs on a cluster of one or more servers and the partitions can be distributed across cluster nodes.
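
The producer/topic/partition/consumer model described above maps directly onto the Java consumer client. The following sketch, not part of the quoted passage, subscribes to a topic and prints the partition, offset, and timestamp carried by each record; the broker address, group id, and topic name are illustrative assumptions.

  import org.apache.kafka.clients.consumer.ConsumerConfig;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.serialization.StringDeserializer;
  import java.time.Duration;
  import java.util.Collections;
  import java.util.Properties;

  public class ConsumerSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
          props.put(ConsumerConfig.GROUP_ID_CONFIG, "sketch-group");
          props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
          props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

          try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              consumer.subscribe(Collections.singletonList("events"));
              while (true) {
                  ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                  for (ConsumerRecord<String, String> record : records) {
                      // Each record carries its partition, position (offset), and timestamp.
                      System.out.printf("partition=%d offset=%d timestamp=%d value=%s%n",
                              record.partition(), record.offset(), record.timestamp(), record.value());
                  }
              }
          }
      }
  }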

2017c

  • http://confluent.io/blog/apache-kafka-for-service-architectures/
    • QUOTE: … The log-structured approach is itself a simple idea: a collection of messages, appended sequentially to a file. When a service wants to read messages from Kafka it ‘seeks’ to the position of the last message it read, then scans sequentially, reading messages in order, while periodically recording its new position in the log.

      Taking a log-structured approach has an interesting side effect. Both reads and writes are sequential operations. This makes them sympathetic to the underlying media, leveraging pre-fetch, the various layers of caching and naturally batching operations together. This makes them efficient. In fact, when you read messages from Kafka, the server doesn’t even import them into the JVM. Data is copied directly from the disk buffer to the network buffer. An opportunity afforded by the simplicity of both the contract and the underlying data structure. …
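
The "seek to the last position, then scan sequentially" reading model in the quote corresponds to the consumer's assign/seek/poll/position calls. The sketch below, not part of the quoted passage, resumes reading a single partition from a remembered offset and then reports the new position; the broker address, topic, partition, and stored offset are illustrative assumptions.

  import org.apache.kafka.clients.consumer.ConsumerConfig;
  import org.apache.kafka.clients.consumer.ConsumerRecord;
  import org.apache.kafka.clients.consumer.ConsumerRecords;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.common.TopicPartition;
  import org.apache.kafka.common.serialization.StringDeserializer;
  import java.time.Duration;
  import java.util.Collections;
  import java.util.Properties;

  public class SeekAndScanSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
          props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
          props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

          long lastReadOffset = 42L; // position recorded on a previous run (illustrative)
          TopicPartition partition = new TopicPartition("events", 0);

          try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              consumer.assign(Collections.singletonList(partition));
              // "Seek" to the remembered position, then scan forward sequentially.
              consumer.seek(partition, lastReadOffset);
              ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
              for (ConsumerRecord<String, String> record : records) {
                  System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
              }
              // The reader periodically records its new position in the log.
              System.out.println("next position to resume from: " + consumer.position(partition));
          }
      }
  }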
