Data Redistribution Across Partitions Operation

References

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-partitions.html
- QUOTE: Depending on how you look at Spark (programmer, devop, admin), an RDD is about the content (developer’s and data scientist’s perspective) or how it gets spread out over a cluster (performance), i.e. how many partitions an RDD represents.
  A partition (aka split) is a logical chunk of a large distributed data set.
  Spark manages data using partitions that helps parallelize distributed data processing with minimal network traffic for sending data between executors.
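
The idea of partitions as logical chunks, and of redistributing records across them, can be illustrated with a minimal pure-Python sketch. This is not Spark code and does not use any Spark API: the function names `split_into_partitions` and `redistribute` are hypothetical, and the modulo rule only stands in for the general idea of a hash-style partitioner. In a real cluster, the movement of records between partitions would involve network traffic between executors.

```python
from typing import List


def split_into_partitions(data: List[int], num_partitions: int) -> List[List[int]]:
    """Split data into num_partitions contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional record.
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def redistribute(partitions: List[List[int]], num_partitions: int) -> List[List[int]]:
    """Move every record to the partition given by record % num_partitions.

    Illustrative assumption: integer records and a modulo rule, standing in
    for a hash-based partitioner.
    """
    result: List[List[int]] = [[] for _ in range(num_partitions)]
    for part in partitions:
        for record in part:
            result[record % num_partitions].append(record)
    return result


data = list(range(10))
initial = split_into_partitions(data, 3)   # three logical chunks
reshuffled = redistribute(initial, 2)      # same records, two partitions
print(initial)
print(reshuffled)
```

The records themselves are unchanged by redistribution; only their assignment to partitions differs, which is what determines how much work can run in parallel.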