pyspark.sql Module
A pyspark.sql Module is a PySpark module that provides Spark SQL functionality.
- Context:
- It can contain:
- a pyspark.sql.types module
- a pyspark.sql.functions module
- Counter-Example(s):
- See: pyspark, s3a, SparkContext.
References
2017
- https://spark.apache.org/docs/2.2.0/sql-programming-guide.html
- QUOTE: All data types of Spark SQL are located in the package of pyspark.sql.types. You can access them by doing
from pyspark.sql.types import *
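For illustration, a minimal sketch of building an explicit DataFrame schema from these types (the field names here are hypothetical):
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Hypothetical two-column schema: a nullable string and a nullable integer.
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True),
])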
2017
- http://spark.apache.org/docs/2.2.0/api/python/pyspark.sql.html
- QUOTE:
class pyspark.sql.SQLContext(sparkContext, sqlContext=None)
Main entry point for Spark SQL functionality. A SQLContext can be used to create DataFrames, register DataFrames as tables, execute SQL over tables, cache tables, and read Parquet files.
- Parameters:
- sparkContext – The SparkContext backing this SQLContext.
- sqlContext – An optional JVM Scala SQLContext. If set, we do not instantiate a new SQLContext in the JVM, instead we make all calls to this object.
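A minimal sketch of this SQLContext workflow, assuming a running Spark installation (the Parquet file path and table name are hypothetical):
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="example")  # the SparkContext backing this SQLContext
sqlContext = SQLContext(sc)           # main entry point for Spark SQL functionality

df = sqlContext.read.parquet("events.parquet")  # hypothetical file path
df.registerTempTable("events")                  # register the DataFrame as a table
sqlContext.sql("SELECT COUNT(*) FROM events").show()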
- QUOTE:
- pyspark.sql.SparkSession: Main entry point for DataFrame and SQL functionality.
- pyspark.sql.DataFrame: A distributed collection of data grouped into named columns.
- pyspark.sql.Column: A column expression in a DataFrame.
- pyspark.sql.Row: A row of data in a DataFrame.
- pyspark.sql.GroupedData: Aggregation methods, returned by DataFrame.groupBy().
- pyspark.sql.DataFrameNaFunctions: Methods for handling missing data (null values).
- pyspark.sql.DataFrameStatFunctions: Methods for statistics functionality.
- pyspark.sql.functions: List of built-in functions available for DataFrame.
- pyspark.sql.types: List of data types available.
- pyspark.sql.Window: For working with window functions.
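A minimal sketch of how these pieces compose, assuming Spark 2.x (the column names and sample rows are illustrative): SparkSession is the entry point, createDataFrame yields a DataFrame, groupBy() returns GroupedData, and the aggregate function comes from pyspark.sql.functions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example").getOrCreate()

# Illustrative DataFrame of (name, score) rows.
df = spark.createDataFrame(
    [("alice", 3), ("bob", 5), ("alice", 7)],
    ["name", "score"],
)

# groupBy() returns GroupedData; F.avg is a built-in aggregate from pyspark.sql.functions.
df.groupBy("name").agg(F.avg("score").alias("avg_score")).show()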