pyspark.sql Module

From GM-RKB

A pyspark.sql Module is a PySpark module that exposes Spark SQL and DataFrame functionality to Python.



References

2017

  • http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html
    • QUOTE:
      class pyspark.sql.SQLContext(sparkContext, sqlContext=None)
      Main entry point for Spark SQL functionality.
      • A SQLContext can be used to create DataFrame, register DataFrame as tables, execute SQL over tables, cache tables, and read parquet files.
    • Parameters:
      • sparkContext – The SparkContext backing this SQLContext.
      • sqlContext – An optional JVM Scala SQLContext. If set, we do not instantiate a new SQLContext in the JVM, instead we make all calls to this object.

2017

  • pyspark.sql.SparkSession: Main entry point for DataFrame and SQL functionality.
  • pyspark.sql.DataFrame: A distributed collection of data grouped into named columns.
  • pyspark.sql.Column: A column expression in a DataFrame.
  • pyspark.sql.Row: A row of data in a DataFrame.
  • pyspark.sql.GroupedData: Aggregation methods, returned by DataFrame.groupBy().
  • pyspark.sql.DataFrameNaFunctions: Methods for handling missing data (null values).
  • pyspark.sql.DataFrameStatFunctions: Methods for statistics functionality.
  • pyspark.sql.functions: List of built-in functions available for DataFrame.
  • pyspark.sql.types: List of data types available.
  • pyspark.sql.Window: For working with window functions.