Spark Worker Node
A Spark Worker Node is a cluster worker node in a Spark cluster.
- Context:
- It can have one or more Spark Node Partitions.
- See: Spark Master.
References
2018
- https://www.talend.com/blog/2018/03/05/intro-apache-spark-partitioning-need-know/
- QUOTE: Here are some of the basics of partitioning:
- Every node in a Spark cluster contains one or more partitions.
- The number of partitions used in Spark is configurable and having too few (causing less concurrency, data skewing and improper resource utilization) or too many (causing task scheduling to take more time than actual execution time) partitions is not good. By default, it is set to the total number of cores on all the executor nodes.
- Partitions in Spark do not span multiple machines.
- Tuples in the same partition are guaranteed to be on the same machine.
- Spark assigns one task per partition and each worker can process one task at a time.
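The partitioning basics quoted above can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's API: the function names (`default_partition_count`, `hash_partition`, `place_partitions`), the node names, and the round-robin placement are all hypothetical and chosen for illustration. It shows a default partition count equal to the total number of cores, records with the same key landing in the same partition, and each partition living on exactly one node.

```python
import zlib
from collections import defaultdict


def default_partition_count(cores_per_node):
    """Toy default: total number of cores across all nodes."""
    return sum(cores_per_node.values())


def hash_partition(records, num_partitions):
    """Group (key, value) records by a stable hash of the key.

    zlib.crc32 is used here only because it is deterministic across
    Python processes; it is an assumption of this sketch, not what
    Spark uses internally.
    """
    partitions = defaultdict(list)
    for key, value in records:
        index = zlib.crc32(repr(key).encode("utf-8")) % num_partitions
        partitions[index].append((key, value))
    return dict(partitions)


def place_partitions(num_partitions, nodes):
    """Assign each partition to exactly one node (round-robin, hypothetical)."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


# Hypothetical two-node cluster with two cores per node.
cores = {"worker-1": 2, "worker-2": 2}
num_partitions = default_partition_count(cores)

# Twenty records spread over five distinct keys.
records = [(k % 5, k) for k in range(20)]
partitions = hash_partition(records, num_partitions)
placement = place_partitions(num_partitions, list(cores))

for index in sorted(partitions):
    keys = sorted({key for key, _ in partitions[index]})
    print(f"partition {index} on {placement[index]}: keys {keys}")
```

Because a partition index maps to a single node in `placement`, and every record with a given key maps to a single partition index, all tuples sharing a key end up on the same machine in this model. Some partitions may be empty when there are fewer distinct keys than partitions, which is one simple way skew can appear.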