Data Frame-based Data Structure
Jump to navigation
Jump to search
A Data Frame-based Data Structure is a size-mutable heterogeneous indexed tabular structure with labeled axes.
- …
- Example(s):
- Counter-Example(s):
- See: Data Frame.
References
2015
- http://lab.getbase.com/pandarize-spark-dataframes/
- … DataFrames are a great abstraction for working with structured and semi-structured data. They are basically a collection of rows, organized into named columns. Think of relational database tables: DataFrames are very similar and allow you to do similar operations on them:
- slice data: select subset of rows or columns based on conditions (filters)
- sort data by one or more columns
- aggregate data and compute summary statistics.
- join multiple DataFrames
- What makes them much more powerful than SQL is the fact that this nice, SQL-like API is actually exposed in a full-fledged programming language. Which means we can mix declarative SQL-like operations with arbitrary code written in a general-purpose programming language.
DataFrames were popularized by R and then adopted by other languages and frameworks. For Python we have pandas, a great data analysis library, where DataFrame is one of the key abstractions.
- … DataFrames are a great abstraction for working with structured and semi-structured data. They are basically a collection of rows, organized into named columns. Think of relational database tables: DataFrames are very similar and allow you to do similar operations on them: