Parquet File
A Parquet File is a columnar data file stored in the Parquet file format.
- Context:
- It can be opened with a Parquet Library (such as PyArrow).
- Example(s):
- Counter-Example(s):
- See: Apache Thrift.
References
2015
- "How to Convert a CSV file to Apache Parquet Using Apache Drill."
- QUOTE: … Apache Drill … We are now ready to create our Parquet files using the "Create Table As Select" (aka CTAS):
alter session set `store.format`='parquet';
CREATE TABLE dfs.tmp.`/stats/airport_data/` AS
SELECT
  CAST(SUBSTR(columns[0],1,4) AS INT) `YEAR`,
  CAST(SUBSTR(columns[0],5,2) AS INT) `MONTH`,
  columns[1] as `AIRLINE`,
  columns[2] as `IATA_CODE`,
  columns[3] as `AIRLINE_2`,
  columns[4] as `IATA_CODE_2`,
  columns[5] as `GEO_SUMMARY`,
  columns[6] as `GEO_REGION`,
  columns[7] as `ACTIVITY_CODE`,
  columns[8] as `PRICE_CODE`,
  columns[9] as `TERMINAL`,
  columns[10] as `BOARDING_AREA`,
  CAST(columns[11] AS DOUBLE) as `PASSENGER_COUNT`
FROM dfs.`/opendata/Passenger/SFO_Passenger_Data/*.csv`