pyspark.sql Module

A pyspark.sql Module is a PySpark module that provides the Python API for Spark SQL, including DataFrames and SQL queries.

Context:
- It can contain:
  - a pyspark.sql.types module
  - a pyspark.sql.functions module
Counter-Example(s):
See: pyspark, s3a, SparkContext.

References

2017

https://spark.apache.org/docs/2.2.0/sql-programming-guide.html
- QUOTE: All data types of Spark SQL are located in the package of pyspark.sql.types. You can access them by doing
  from pyspark.sql.types import *

2017

http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html
- QUOTE:
  class pyspark.sql.SQLContext(sparkContext, sqlContext=None)
  Main entry point for Spark SQL functionality.
  - A SQLContext can be used create DataFrame, register DataFrame as tables, execute SQL over tables, cache tables, and read parquet files.
- Parameters:
  - sparkContext – The SparkContext backing this SQLContext.
  - sqlContext – An optional JVM Scala SQLContext. If set, we do not instantiate a new SQLContext in the JVM, instead we make all calls to this object.

2017

http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html

- QUOTE:
  - pyspark.sql.SparkSession: Main entry point for DataFrame and SQL functionality.
  - pyspark.sql.DataFrame: A distributed collection of data grouped into named columns.
  - pyspark.sql.Column: A column expression in a DataFrame.
  - pyspark.sql.Row: A row of data in a DataFrame.
  - pyspark.sql.GroupedData: Aggregation methods, returned by DataFrame.groupBy().
  - pyspark.sql.DataFrameNaFunctions: Methods for handling missing data (null values).
  - pyspark.sql.DataFrameStatFunctions: Methods for statistics functionality.
  - pyspark.sql.functions: List of built-in functions available for DataFrame.
  - pyspark.sql.types: List of data types available.
  - pyspark.sql.Window: For working with window functions.
