Spark Datasets API
A Spark Datasets API is a Spark data structure API that combines the benefits of RDDs (strong typing, the ability to use lambda functions) with the benefits of Spark SQL's optimized execution engine.
- Example(s):
- …
- Counter-Example(s):
- See: Spark RDD, Spark DataFrame.
References
2016
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("DatasetExample")  // conf was assumed defined in the original excerpt; app name is illustrative
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// ScalaPerson and ScalaData are helper types from the quoted example's codebase.
val sampleData: Seq[ScalaPerson] = ScalaData.sampleData()
val dataset = sqlContext.createDataset(sampleData)

// Typed filter over the Dataset using a Scala lambda.
dataset.filter(_.age < 21)
2015
- http://spark.apache.org/docs/latest/sql-programming-guide.html#datasets
- QUOTE: A Dataset is a new experimental interface added in Spark 1.6 that tries to provide the benefits of RDDs (strong typing, ability to use powerful lambda functions) with the benefits of Spark SQL’s optimized execution engine. A Dataset can be constructed from JVM objects and then manipulated using functional transformations (map, flatMap, filter, etc.).
The unified Dataset API can be used both in Scala and Java. Python does not yet have support for the Dataset API, but due to its dynamic nature many of the benefits are already available (i.e. you can access the field of a row by name naturally row.columnName). Full python support will be added in a future release.
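As a concrete illustration of the pattern the quote describes, below is a minimal self-contained sketch (not taken from the quoted documentation) that constructs a Dataset from JVM objects and manipulates it with typed functional transformations. The Person case class, application name, and sample values are hypothetical.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical case class: a Dataset[Person] is strongly typed, unlike an untyped DataFrame row.
case class Person(name: String, age: Int)

object DatasetSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("DatasetSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._  // brings Encoders for case classes and primitives into scope

    // Construct a Dataset from JVM objects ...
    val people = sqlContext.createDataset(Seq(Person("Ann", 34), Person("Bob", 19)))

    // ... then manipulate it with typed functional transformations (filter, map).
    val adultNames = people.filter(_.age >= 21).map(_.name)
    adultNames.collect().foreach(println)  // prints: Ann

    sc.stop()
  }
}

This targets the Spark 1.6-era API described in the quote; in Spark 2.x and later the usual entry point is SparkSession, which replaced SQLContext.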