Extract-Transform-Load (ETL) Platform System

From GM-RKB
An [[Extract-Transform-Load (ETL) Platform System]] is a [[data processing system]] that can solve [[ETL task]]s by extracting [[data]] from [[source system]]s, transforming it according to [[business rule]]s, and loading it into [[target system]]s.
* <B>AKA:</B> [[ETL System]], [[Data Integration System]], [[Data Pipeline System]], [[Data Processing Pipeline]].
* <B>Context:</B>
** It can typically implement [[ETL workflow]]s for [[structured data processing]].
** It can typically manage [[data extraction]] from [[diverse source]]s through [[connector component]]s.
** It can typically perform [[data transformation]] using [[transformation rule]]s and [[data mapping]].
** It can typically execute [[data loading]] into [[target database]]s through [[loading protocol]]s.
** It can typically maintain [[data quality]] through [[validation rule]]s and [[cleansing process]]es.
** It can often provide [[ETL monitoring]] for [[process oversight]] and [[issue detection]].
** It can often support [[metadata management]] for [[data lineage tracking]] and [[impact analysis]].
** It can often implement [[error handling]] for [[failed operation recovery]].
** It can often enable [[job scheduling]] for [[automated execution]].
** It can often include [[logging mechanism]]s for [[audit purpose]]s and [[troubleshooting]].
** It can range from being a [[3rd-Party ETL Platform System]] to being a [[Custom ETL System]], depending on its [[implementation approach]].
** It can range from being a [[Batch ETL System]] to being a [[Real-Time ETL System]], depending on its [[processing mode]].
** It can range from being a [[Centralized ETL System]] to being a [[Distributed ETL System]], depending on its [[architecture pattern]].
** It can range from being a [[Simple ETL System]] to being a [[Complex ETL System]], depending on its [[transformation complexity]].
** It can range from being an [[On-Premise ETL System]] to being a [[Cloud-Based ETL System]], depending on its [[deployment environment]].
** ...
** It can have [[ETL Platform System Capabiliti]]es for [[data integration need]]s.
** It can provide [[Data Pipeline Component]]s for [[modular processing]].
** It can perform [[Data Quality Check]]s for [[integrity verification]].
** It can support [[Incremental Processing]] for [[efficiency optimization]].
** It can implement [[Parallel Execution]] for [[performance improvement]].
** ...
** It can integrate with [[Data Warehouse System]] for [[analytical data storage]].
** It can connect to [[Business Intelligence System]] for [[reporting function]]s.
** It can support [[Master Data Management System]] for [[reference data synchronization]].
** It can work with [[Data Governance System]] for [[compliance adherence]].
** It can interface with [[Enterprise Application]] for [[operational data exchange]].
** ...
* <B>Examples:</B>
** [[ETL System Implementation Type]]s, such as:
*** [[Commercial ETL System]]s, such as:
**** [[Informatica PowerCenter ETL System]] for [[enterprise data integration]].
**** [[IBM DataStage ETL System]] for [[large-scale data processing]].
**** [[Microsoft SSIS ETL System]] for [[Microsoft ecosystem integration]].
**** [[Oracle Data Integrator ETL System]] for [[Oracle environment integration]].
*** [[Open-Source ETL System]]s, such as:
**** [[Apache NiFi ETL System]] for [[dataflow automation]].
**** [[Talend Open Studio ETL System]] for [[Java-based integration]].
**** [[Pentaho Data Integration ETL System]] for [[Kettle-based processing]].
**** [[Airbyte ETL System]] for [[open-source data integration]].
** [[Domain-Specific ETL System]]s, such as:
*** [[Log File ETL System]]s, such as:
**** [[Splunk ETL System]] for [[log data processing]].
**** [[ELK Stack ETL System]] for [[log analytics processing]].
**** [[Fluentd ETL System]] for [[log collection integration]].
*** [[Healthcare ETL System]]s, such as:
**** [[Patient Data ETL System]] for [[electronic health record integration]].
**** [[Claims Processing ETL System]] for [[healthcare billing integration]].
**** [[Clinical Trial ETL System]] for [[research data processing]].
** [[Enterprise ETL System]]s, such as:
*** [[PlayStation ETL System]] for [[gaming analytics processing]].
*** [[Medable ETL System]] for [[clinical research data integration]].
*** [[Financial Institution ETL System]]s for [[regulatory reporting]].
*** [[Retail ETL System]]s for [[customer data integration]].
** ...
* <B>Counter-Examples:</B>
** [[Machine Learning System]], which focuses on [[algorithmic model training]] and [[predictive analytics]] rather than [[data movement]] and [[transformation]].
** [[Data Streaming System]], which processes [[continuous data flow]]s in [[real-time]] rather than [[batch-oriented extraction and loading]].
** [[Extract-Load-Transform (ELT) System]], which performs [[data transformation]] after [[loading]] rather than before it.
** [[Data Virtualization System]], which provides [[virtual data access]] without physically [[moving data]].
** [[API Integration System]], which connects [[application]]s through [[service interface]]s rather than [[data processing pipeline]]s.
* <B>See:</B> [[Data Warehouse System]], [[Data Streaming System]], [[Business Intelligence System]], [[ETL Development Framework]], [[Data Lake System]], [[Extract-Transform-Load (ETL) 3rd-Party Platform]].
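The extract-transform-load flow described in the definition above can be illustrated with a minimal sketch; the record fields, table name, and validation rule here are hypothetical illustrations, not drawn from any particular platform:

```python
import csv
import io
import sqlite3

def extract(csv_text):
    # Extract: read raw records from a CSV source (an in-memory string here).
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    # Transform: apply simple business rules -- normalize fields, and drop
    # rows that fail a validation rule (here, a missing email).
    cleaned = []
    for row in rows:
        if not row["email"]:
            continue
        cleaned.append({
            "name": row["name"].strip().title(),
            "email": row["email"].strip().lower(),
        })
    return cleaned

def load(rows, conn):
    # Load: write transformed rows into the target table.
    conn.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO customers (name, email) VALUES (:name, :email)", rows)
    conn.commit()

source = "name,email\n alice smith ,ALICE@EXAMPLE.COM\nbob,\n"
conn = sqlite3.connect(":memory:")
load(transform(extract(source)), conn)
print(conn.execute("SELECT name, email FROM customers").fetchall())
# -> [('Alice Smith', 'alice@example.com')]
```

A real platform wraps this same three-stage skeleton with the connectors, scheduling, monitoring, and error handling listed in the context bullets above.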


----
== References ==
=== 2013b ===
* http://en.wikipedia.org/wiki/Extract,_transform,_load#Tools
** Programmers can set up ETL processes using almost any [[programming language]], but building such processes from scratch can become complex. Increasingly, companies are buying ETL tools to help in the creation of ETL processes.<ref>[http://www.etltool.com/nieuws/2715_ETL_poll_produces_unexpected_results.htm ETL poll produces unexpected results]</ref>        <P>        By using an established ETL framework, one may increase one's chances of ending up with better connectivity and [[scalability]]{{citation needed|date=December 2011}}. A good ETL tool must be able to communicate with the many different [[relational database]]s and read the various file formats used throughout an organization. ETL tools have started to migrate into [[Enterprise Application Integration]], or even [[Enterprise Service Bus]], systems that now cover much more than just the extraction, transformation, and loading of data. Many ETL vendors now have [[data profiling]], [[data quality]], and [[Metadata (computing)|metadata]] capabilities. A common use case for ETL tools include converting CSV files to formats readable by relational databases. A typical translation of millions of records is facilitated by ETL tools that enable users to input csv-like data feeds/files and import it into a database with as little code as possible.        <P> ETL Tools are typically used by a broad range of professionals - from students in computer science looking to quickly import large data sets to database architects in charge of company account management, ETL Tools have become a convenient tool that can be relied on to get maximum performance. ETL tools in most cases contain a GUI that helps users conveniently transform data as opposed to writing large programs to parse files and modify data types - which ETL tools facilitate as much as possible.
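The CSV-to-relational-database use case mentioned in the quoted passage really does need very little code when the platform (or language runtime) supplies the connectors; this is a sketch using only the Python standard library, with a hypothetical in-memory feed and table name:

```python
import csv
import io
import sqlite3

# A hypothetical CSV feed (in-memory so the example is self-contained).
feed = io.StringIO("id,product,price\n1,widget,9.99\n2,gadget,14.50\n")

reader = csv.reader(feed)
header = next(reader)  # the first row names the target columns
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feed (%s)" % ", ".join(header))
conn.executemany(
    "INSERT INTO feed VALUES (%s)" % ", ".join("?" for _ in header),
    reader,  # stream the remaining rows straight into the table
)
conn.commit()
print(conn.execute("SELECT COUNT(*) FROM feed").fetchone()[0])  # -> 2
```

What a dedicated ETL tool adds over this sketch is exactly what the passage describes: type inference, format detection, a GUI for the transformation rules, and scale beyond what fits through a single loop.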


----

Latest revision as of 05:38, 7 April 2025
