Apache Storm Framework
An Apache Storm Framework is a fault-tolerant distributed realtime computation system tailored for data stream processing.
- Context:
- It can be used for a Log File Processing System.
- …
- Example(s):
- Counter-Example(s):
- See: Apache Flume, Apache Hadoop, Storm (Event Processor), Distributed Stream Processing, Apache Mesos, Clojure, Stream Processing, MapReduce.
References
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Storm_(event_processor) Retrieved:2017-2-2.
- Apache Storm is a distributed stream processing computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz and team at BackType, the project was open sourced after being acquired by Twitter. It uses custom created "spouts" and "bolts" to define information sources and manipulations to allow batch, distributed processing of streaming data. The initial release was on 17 September 2011. A Storm application is designed as a "topology" in the shape of a directed acyclic graph (DAG) with spouts and bolts acting as the graph vertices. Edges on the graph are named streams and direct data from one node to another. Together, the topology acts as a data transformation pipeline. At a superficial level the general topology structure is similar to a MapReduce job, with the main difference being that data is processed in real time as opposed to in individual batches. Additionally, Storm topologies run indefinitely until killed, while a MapReduce job DAG must eventually end. Storm became an Apache Top-Level Project in September 2014 and was previously in incubation since September 2013.
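The topology structure described above (spouts as sources, bolts as transformations, streams as edges of a DAG) can be illustrated with a small in-process sketch. This is a toy model only and does not use the Apache Storm API: the names `sentence_spout`, `split_bolt`, `CountBolt`, and `run_topology` are hypothetical, the spout here is finite whereas a real Storm stream is unbounded, and everything runs in a single process rather than across a cluster.

```python
from collections import Counter, deque

# Toy single-process model of a spout/bolt topology.
# NOT the Apache Storm API; all names below are illustrative assumptions.


def sentence_spout():
    """Spout: the source of tuples. Finite here; a real stream is unbounded."""
    for sentence in ["the cow jumped", "the moon"]:
        yield (sentence,)


def split_bolt(tup):
    """Bolt: splits a sentence tuple into one tuple per word."""
    for word in tup[0].split():
        yield (word,)


class CountBolt:
    """Bolt: keeps a running count per word and emits (word, count)."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, tup):
        word = tup[0]
        self.counts[word] += 1
        yield (word, self.counts[word])


def run_topology(spout_name, spout, bolts, edges):
    """Push every spout tuple through the DAG described by `edges`.

    `edges` maps a node name to the names of its downstream bolts.
    Tuples emitted by bolts with no downstream node are collected and returned.
    """
    results = []
    queue = deque(
        (target, tup) for tup in spout() for target in edges.get(spout_name, [])
    )
    while queue:
        name, tup = queue.popleft()
        for out in bolts[name](tup):
            downstream = edges.get(name, [])
            if not downstream:
                results.append(out)
            for target in downstream:
                queue.append((target, out))
    return results


counter = CountBolt()
results = run_topology(
    spout_name="sentences",
    spout=sentence_spout,
    bolts={"split": split_bolt, "count": counter},
    edges={"sentences": ["split"], "split": ["count"]},
)
```

In this sketch `results` holds the running `(word, count)` tuples emitted by the final bolt, and `counter.counts` holds the totals. Unlike this toy loop, which ends when the spout is exhausted, a Storm topology keeps running until it is killed.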
2013
- http://storm-project.net/
- QUOTE: Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.