2017 DataIngestionfortheConnectedWor
- (Meehan et al., 2017) ⇒ John Meehan, Cansu Aslantas, Stan Zdonik, Nesime Tatbul, and Jiang Du. (2017). “Data Ingestion for the Connected World.” In: CIDR.
Subject Headings: Data Ingestion Task, Data Ingestion System.
Notes
Cited By
Quotes
Abstract
In this paper, we argue that in many "Big Data" applications, getting data into the system correctly and at scale via traditional ETL (Extract, Transform, and Load) processes is a fundamental roadblock to being able to perform timely analytics or make real-time decisions. The best way to address this problem is to build a new architecture for ETL which takes advantage of the push-based nature of a stream processing system. We discuss the requirements for a streaming ETL engine and describe a generic architecture which satisfies those requirements. We also describe our implementation of streaming ETL using a scalable messaging system (Apache Kafka), a transactional stream processing system (S-Store), and a distributed polystore (Intel's BigDAWG), as well as propose a new time-series database optimized to handle ingestion internally.
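The push-based idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use Kafka, S-Store, or BigDAWG; the class `StreamingETL`, the function `clean`, and the field names `sensor`, `temp_c`, and `temp_f` are hypothetical names chosen for illustration. It assumes that records are transformed as soon as they arrive and that small batches are loaded into the target store as a unit, where a traditional ETL job would instead wait for a periodic bulk run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


@dataclass
class StreamingETL:
    """Hypothetical push-based pipeline: each record is handled on arrival."""

    transform: Callable[[Record], Optional[Record]]
    batch_size: int = 2
    # In-memory stand-ins (assumptions): a staging buffer and the target store.
    pending: List[Record] = field(default_factory=list)
    store: List[Record] = field(default_factory=list)

    def push(self, record: Record) -> None:
        """Called by the data source for every new record (push, not poll)."""
        cleaned = self.transform(record)
        if cleaned is None:
            return  # record rejected by the transform step
        self.pending.append(cleaned)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Load the staged batch into the store as one unit, then clear it."""
        self.store.extend(self.pending)
        self.pending.clear()


def clean(record: Record) -> Optional[Record]:
    """Hypothetical transform: drop incomplete readings, normalize the rest."""
    temp_c = record.get("temp_c")
    if not isinstance(temp_c, (int, float)):
        return None
    return {"sensor": str(record["sensor"]).lower(), "temp_f": temp_c * 9 / 5 + 32}


etl = StreamingETL(transform=clean)
etl.push({"sensor": "A1", "temp_c": 20.0})
etl.push({"sensor": "A1", "temp_c": None})  # dropped by clean()
etl.push({"sensor": "B2", "temp_c": 100})   # second valid record triggers flush()
print(etl.store)
```

In a real deployment the staging buffer and the store would be replaced by a messaging system and a transactional engine, which is where the ordering and atomicity guarantees would actually come from; the sketch only shows the control flow.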
References
 | Author | title | year |
---|---|---|---|
2017 DataIngestionfortheConnectedWor | John Meehan, Cansu Aslantas, Stan Zdonik, Nesime Tatbul, Jiang Du | Data Ingestion for the Connected World | 2017 |