Apache Drill Platform
An Apache Drill Platform is an open-source distributed data querying platform that is an Apache project.
- Context:
- …
- Example(s):
- Apache Drill v1.10 (2017-03-16)[1].
- …
- Counter-Example(s):
- Impala Platform.
- Google's Dremel Platform, which underlies GCP BigQuery.
- Apache Druid.
- See: Hue, Apache HBase.
References
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Apache_Drill Retrieved:2017-4-19.
- Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open source version of Google's Dremel system which is available as an infrastructure service called Google BigQuery. One explicitly stated design goal is that Drill is able to scale to 10,000 servers or more and to be able to process petabytes of data and trillions of records in seconds. Drill is an Apache top-level project. Drill supports a variety of NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files. A single query can join data from multiple datastores. For example, you can join a user profile collection in MongoDB with a directory of event logs in Hadoop. Drill's datastore-aware optimizer automatically restructures a query plan to leverage the datastore's internal processing capabilities. In addition, Drill supports data locality, so it's a good idea to co-locate Drill and the datastore on the same nodes.
Apache Drill 1.9 added a dynamic UDF feature that lets users register and unregister UDFs on their own using the new CREATE FUNCTION USING JAR and DROP FUNCTION USING JAR commands.
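The cross-datastore join and the dynamic UDF commands described in this excerpt can be sketched as plain Drill SQL strings. The snippet below is a minimal illustration and not taken from the cited sources: the helper functions, the `mongo.app.users` collection, the `/logs/events` directory, the `user_id` and `name` columns, and the `my_udfs.jar` file name are all hypothetical, and the `mongo` and `dfs` prefixes assume storage plugins configured under those names.

```python
from typing import Tuple


def build_cross_store_join(users_table: str, events_dir: str) -> str:
    """Return a Drill SQL string joining a user collection with event logs."""
    return (
        "SELECT u.name, COUNT(*) AS event_count\n"
        f"FROM {users_table} AS u\n"
        f"JOIN {events_dir} AS e ON u.user_id = e.user_id\n"
        "GROUP BY u.name"
    )


def udf_statements(jar_name: str) -> Tuple[str, str]:
    """Return the statements that register and unregister a UDF JAR."""
    return (
        f"CREATE FUNCTION USING JAR '{jar_name}'",
        f"DROP FUNCTION USING JAR '{jar_name}'",
    )


# Hypothetical identifiers: a MongoDB collection and a directory of log files.
join_sql = build_cross_store_join("mongo.app.`users`", "dfs.`/logs/events`")
register_sql, unregister_sql = udf_statements("my_udfs.jar")

print(join_sql)
print(register_sql)
print(unregister_sql)
```

The strings are only composed here, not executed; running them requires a Drill cluster with both storage plugins enabled and, for the UDF statements, the JAR already copied to Drill's staging area.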
2017
- (Wikipedia, 2017) ⇒ https://en.wikipedia.org/wiki/Apache_Drill#Features Retrieved:2017-4-19.
- Schema-free JSON document model similar to MongoDB and Elasticsearch, without requiring a formal schema to be declared
- Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs
- Extremely user and developer friendly
- Pluggable architecture enables connectivity to multiple datastores
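As an illustration of the RESTful API listed above, the following minimal sketch prepares an HTTP request that submits a SQL query to a Drill node. It assumes a Drillbit whose web interface listens on `http://localhost:8047` (an assumption about the deployment), and the helper names `build_drill_request` and `run_query` are hypothetical. The sample table `cp.`employee.json`` refers to the demo data bundled on Drill's classpath.

```python
import json
import urllib.request


def build_drill_request(
    sql: str, base_url: str = "http://localhost:8047"
) -> urllib.request.Request:
    """Prepare a POST to Drill's /query.json endpoint without sending it."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(request: urllib.request.Request) -> list:
    """Send the request and return the result rows (needs a running Drillbit)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["rows"]


# Only the request object is built here; nothing is sent over the network.
request = build_drill_request("SELECT * FROM cp.`employee.json` LIMIT 3")
print(request.get_method(), request.full_url)
```

Calling `run_query(request)` against a live Drillbit would return the rows as a list of dictionaries. The same query could equally be issued through the ODBC or JDBC drivers mentioned in the list.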
2015
- https://drill.apache.org/faq/
- QUOTE: Apache Drill enables analysts, business users, data scientists and developers to explore and analyze this data without sacrificing the flexibility and agility offered by these datastores. Drill processes the data in-situ without requiring users to define schemas or transform data.
Drill is an innovative distributed SQL engine designed to enable data exploration and analytics on non-relational datastores. Users can query the data using standard SQL and BI tools without having to create and manage schemas. Some of the key features are:
- Schema-free JSON document model similar to MongoDB and Elasticsearch.
- Industry-standard APIs: ANSI SQL, ODBC/JDBC, RESTful APIs.
- Extremely user and developer friendly.
- Pluggable architecture enables connectivity to multiple datastores.