Apache Spark Cluster Instance
An Apache Spark Cluster Instance is a computing cluster that is based on the Apache Spark framework.
- Context:
- It can (typically) run Spark Applications / Spark Jobs.
- It can (typically) be composed of a Spark Cluster Manager (Standalone, Mesos, or YARN) and Spark Worker Nodes (which run tasks within Spark Executors).
- It can be composed of a Spark Driver Program and Spark Tasks (which run within Spark Executors), as in the sketch after this list.
- It can range from being a Long-Lived Spark Cluster / Permanent Spark Cluster to being a Short-Lived Spark Cluster / Temporary Spark Cluster.
- …
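A minimal Scala sketch of how these components fit together: a Spark Driver Program attaches to a Spark Cluster Manager via the master URL, and its Spark Tasks then run inside Spark Executors on the Worker Nodes. The master URL, host name, and app name here are illustrative assumptions, not fixed values ("yarn" or "mesos://host:5050" would select the other cluster managers).

```scala
import org.apache.spark.sql.SparkSession

object ClusterAttachSketch {
  def main(args: Array[String]): Unit = {
    // The master URL selects the Spark Cluster Manager:
    //   "spark://host:7077" -> Standalone, "yarn" -> YARN, "mesos://host:5050" -> Mesos.
    // "spark://master-host:7077" is an illustrative placeholder.
    val spark = SparkSession.builder()
      .appName("cluster-attach-sketch")
      .master("spark://master-host:7077")
      .getOrCreate()

    // The Driver Program plans the job; the resulting Spark Tasks run
    // inside Spark Executors on the Worker Nodes.
    val n = spark.range(0L, 1000000L).count()
    println(s"count = $n")

    spark.stop()
  }
}
```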
- Example(s):
- the j-1TCRZQMPC9R64 AWS EMR-based Spark Cluster.
- a Databricks-based Spark Cluster.
- …
- Counter-Example(s):
- a Scalding Cluster (for Scalding jobs).
- a MapReduce Cluster (such as a Hadoop cluster).
- See: HDFS, Hadoop Master Node, Spark Context, Grid Computing.
References
2015
- http://techblog.netflix.com/2015/03/can-spark-streaming-survive-chaos-monkey.html
- Component (Type): Behaviour on Component Failure:
- Driver (Process): Client Mode: the entire application is killed. Cluster Mode with supervise: the Driver is restarted on a different Worker Node.
- Master (Process): Single Master: the entire application is killed. Multi Master: a STANDBY Master is elected ACTIVE.
- Worker Process (Process): all child processes (Executor or Driver) are also terminated, and a new Worker Process is launched.
- Executor (Process): a new Executor is launched by the Worker Process.
- Receiver (Thread(s)): same as Executor, as Receivers are long-running tasks inside the Executor.
- Worker Node (Node): Worker, Executor, and Driver processes run on Worker Nodes, and the behavior is the same as killing them individually.
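A minimal Scala sketch of the resilience mechanisms behind the Driver and Receiver rows above, assuming a Spark Streaming application (the checkpoint directory, batch interval, and socket source are illustrative assumptions, not values from the source): spark.driver.supervise requests the standalone cluster-mode Driver restart, and checkpointing plus StreamingContext.getOrCreate lets a restarted application recover its Receiver-backed state.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ResilienceSketch {
  // Illustrative placeholder; any fault-tolerant store (e.g. HDFS) works.
  val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("resilience-sketch")
      // Standalone cluster deploy mode: restart the Driver on failure,
      // matching the "Cluster Mode with supervise" row above. This is
      // normally supplied at submit time via spark-submit --supervise.
      .set("spark.driver.supervise", "true")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir) // lets Receiver/DStream state survive restarts

    // Illustrative source: host and port are placeholder assumptions.
    ssc.socketTextStream("localhost", 9999).print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from a checkpoint if one exists; otherwise build a fresh context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```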
2012
- http://en.wikipedia.org/wiki/Apache_Hadoop#Architecture
- … A small Hadoop cluster will include a single master and multiple worker nodes. The master node consists of a JobTracker, TaskTracker, NameNode, and DataNode. A slave or worker node acts as both a DataNode and TaskTracker, though it is possible to have data-only worker nodes, and compute-only worker nodes; these are normally only used in non-standard applications.