Extract, transform, load
== Variations ==

=== In online transaction processing ===
[[File:Conventional ETL Diagram.jpg|alt=ETL diagram in the context of online transaction processing|thumb|ETL diagram in the context of [[online transaction processing]]<ref name="Kimball 2004" />]]
In [[online transaction processing]] (OLTP) applications, changes from individual OLTP instances are detected and logged into a snapshot, or batch, of updates. An ETL instance can be used to periodically collect all of these batches, transform them into a common format, and load them into a data lake or warehouse.<ref name="Kimball 2004" />

=== Virtual ETL ===
{{Unreferenced section|date=September 2024}}
[[Data virtualization]] can be used to advance ETL processing. Applying data virtualization to ETL makes it possible to handle the most common ETL tasks, [[data migration]] and application integration, across multiple dispersed data sources. Virtual ETL operates on an abstracted representation of the objects or entities gathered from a variety of relational, semi-structured, and [[unstructured data]] sources. ETL tools can leverage object-oriented modeling and work with entity representations that are persistently stored in a centrally located [[hub-and-spoke]] architecture. A collection of such representations, gathered from the data sources for ETL processing, is called a metadata repository; it can reside in memory or be made persistent. By using a persistent metadata repository, ETL tools can transition from one-time projects to persistent middleware, performing data harmonization and [[data profiling]] consistently and in near real time.

=== Extract, load, transform (ELT) ===
{{main|Extract, load, transform}}
[[Extract, load, transform]] (ELT) is a variant of ETL where the extracted data is loaded into the target system first.<ref name="AWS Data Warehousing 9" >Amazon Web Services, Data Warehousing on AWS, p. 9</ref> The architecture for the analytics pipeline should also consider where to cleanse and enrich data<ref name="AWS Data Warehousing 9" /> as well as how to conform dimensions.<ref name="Kimball 2004" /> Some of the benefits of an ELT process include speed and the ability to more easily handle both unstructured and structured data.<ref>{{Cite web |last=Mishra |first=Tanya |date=2023-09-02 |title=ETL vs ELT: Meaning, Major Differences & Examples |url=https://www.analyticsinsight.net/etl-vs-elt-meaning-major-differences-examples/ |access-date=2024-01-30 |website=Analytics Insight}}</ref>

[[Ralph Kimball]] and [[Joe Caserta]]'s book ''The Data Warehouse ETL Toolkit'' (Wiley, 2004), which is used as a textbook for courses teaching ETL processes in data warehousing, addressed this issue.<ref>{{Cite web|url=https://www.oreilly.com/library/view/the-data-warehouse/9780764567575/|title = The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data [Book]}}</ref>

Cloud-based data warehouses like [[Amazon Redshift]], Google [[BigQuery]], [[Microsoft Azure Synapse Analytics]] and [[Snowflake Inc.]] have been able to provide highly scalable computing power. This lets businesses forgo preload transformations and replicate raw data into their data warehouses, where they can transform it as needed using [[SQL]].

After ELT, data may be processed further and stored in a data mart.<ref>Amazon Web Services, Data Warehousing on AWS, 2016, p. 10</ref>

Most data integration tools skew towards ETL, while ELT is popular in database and data warehouse appliances. Similarly, it is possible to perform TEL (Transform, Extract, Load), where data is first transformed on a blockchain (as a way of recording changes to data, e.g., token burning) before being extracted and loaded into another data store.<ref>{{cite book |last1=Bandara |first1=H. M. N. Dilum |last2=Xu |first2=Xiwei |last3=Weber |first3=Ingo |title=Proceedings of the European Conference on Pattern Languages of Programs 2020 |chapter=Patterns for Blockchain Data Migration |year=2020 |pages=1–19 |doi=10.1145/3424771.3424796 |arxiv=1906.00239|isbn=9781450377690 |s2cid=219956181 }}</ref>
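
The ELT ordering described in this section (load raw data first, then transform it inside the target using SQL) can be illustrated with a minimal sketch. It is an illustration under stated assumptions and not a reference implementation: an in-memory SQLite database stands in for a cloud data warehouse, and the table names (<code>raw_orders</code>, <code>orders</code>), the sample rows, and the function <code>run_elt</code> are all hypothetical.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("1001", " alice ", "19.99"),
    ("1002", "BOB", "5.00"),
    ("1003", "alice", "not-a-number"),  # malformed amount; stays in staging only
]


def run_elt(conn, rows):
    """Load raw rows unchanged, then transform them with SQL in the target."""
    cur = conn.cursor()

    # Load: every column is stored as text, with no preload transformation.
    cur.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    cur.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

    # Transform: cleansing and type conversion run inside the target system.
    # The GLOB filter is a deliberately crude check that keeps only amounts
    # made up of digits and dots.
    cur.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE amount NOT GLOB '*[^0-9.]*'
        """
    )
    conn.commit()

    return cur.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for row in run_elt(connection, RAW_ORDERS):
        print(row)
    connection.close()
```

Because the raw table is kept alongside the transformed one, the transformation can be revised and rerun later without extracting the data again. In a conventional ETL pipeline the same cleansing would instead run before the load step.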