Apache Flume Platform

Context:
- It can be a Push System (Kafka is a pull system)
- It can be used to Rotate Log Files.
- It can support aggregating data from many sources into HDFS.
- It can be an ETL system (as opposed to an ELT system).
- It can be highly horizontally scalable.
- It can be extensible with custom data sources.
- It can use agents as node-like operators.
- It can perform transactional channel messaging.
- It can use Memory Channels (fast, but a crash loses any buffered data).
- It can use Disk (File) Channels, which are slower but retain data across a crash.
- It can use a newer Spillable Memory Channel, which spills to disk when it runs out of RAM.
- It can be load balanced and replicated redundantly.
- It can use a Sink/Source Design.
- It can be managed by the Apache Flume Project.
- It can … no explicit code execution
Example(s):
- Apache Flume v1.7.0 (2016-10-17)[1].
- …
Counter-Example(s):
- Apache Kafka Platform.
- Fluentd.
See: ETL Software System, Batch File, Streaming Data, Lambda Architecture.
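
The source, channel, and sink design noted in the Context can be illustrated with a minimal agent configuration. This is only a sketch in Flume's properties format: the agent name (a1), the component names (r1, c1, k1), the port, the directories, and the HDFS URL are all hypothetical, and the property keys should be checked against the user guide for the Flume version in use.

```properties
# Hypothetical single-agent flow: netcat source -> file channel -> HDFS sink.
# All names, ports, and paths below are illustrative assumptions.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens on a TCP port and turns each line of text into an event.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: file-backed, so buffered events survive an agent crash.
# Switching to "type = memory" trades that durability for speed.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data

# Sink: drains the channel and writes events to HDFS as plain text.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode.example.com/flume/events
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.channel = c1
```

Assuming the file is saved as example.conf (a hypothetical name), an agent would typically be started with something like bin/flume-ng agent --conf conf --conf-file example.conf --name a1. Note that a source lists its channels in the plural (channels), while a sink reads from exactly one (channel).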

References

(Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Apache_Flume Retrieved:2017-8-4.
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

http://flume.apache.org/
- Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.

http://cwiki.apache.org/FLUME/home.html
- Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic applications.
  It is written primarily in Java and has been tested on unix-like systems …