Extract, transform, load
=== Transform ===
In the [[data transformation]] stage, a series of rules or functions is applied to the extracted data to prepare it for loading into the end target.

An important function of transformation is [[data cleansing]], which aims to pass only "proper" data to the target. The challenge when different systems interact lies in how those systems interface and communicate. Character sets available in one system may not be available in others.

In other cases, one or more of the following transformation types may be required to meet the business and technical needs of the server or data warehouse:
* Selecting only certain columns to load (or selecting [[null (SQL)|null]] columns not to load). For example, if the source data has three columns (also called "attributes"), roll_no, age, and salary, then the selection may take only roll_no and salary. Alternatively, the selection mechanism may ignore all records where salary is not present (salary = null).
* Translating coded values (''e.g.'', if the source system codes male as "1" and female as "2", but the warehouse codes male as "M" and female as "F")
* Encoding free-form values (''e.g.'', mapping "Male" to "M")
* Deriving a new calculated value (''e.g.'', sale_amount = qty * unit_price)
* Sorting or ordering the data based on a list of columns to improve search performance
* [[Join (relational algebra)#Joins and join-like operators|Joining]] data from multiple sources (''e.g.'', lookup, merge) and [[Record linkage|deduplicating]] the data
* Aggregating (for example, rollup: summarizing multiple rows of data, such as total sales for each store and for each region)
* Generating [[surrogate key|surrogate-key]] values
* [[Transpose|Transposing]] or [[Pivot table|pivoting]] (turning multiple columns into multiple rows or vice versa)
* Splitting a column into multiple columns (''e.g.'', converting a [[comma separated values|comma-separated list]], specified as a string in one column, into individual values in different columns)
* Disaggregating repeating columns
* Looking up and validating the relevant data from tables or referential files
* Applying any form of data validation. Failed validation may result in full rejection of the data, partial rejection, or no rejection at all, so none, some, or all of the data is handed over to the next step, depending on the rule design and exception handling. Many of the above transformations may themselves raise exceptions, ''e.g.'', when a code translation encounters an unknown code in the extracted data.
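
Several of the transformation types listed above can be combined in one pass over the extracted records. The following is a minimal Python sketch, not a reference implementation. It assumes a hypothetical source record that carries all of the example fields at once (roll_no, age, salary, a gender code, qty, and unit_price). The names <code>GENDER_CODES</code> and <code>transform</code> are illustrative only. It shows column selection, null filtering, code translation, a derived value, sorting, and partial rejection of records that fail validation.

```python
# Hypothetical mapping from source-system codes to warehouse codes.
GENDER_CODES = {"1": "M", "2": "F"}


def transform(rows):
    """Return (accepted, rejected) for a list of extracted records.

    Each rejected entry is a (row, reason) pair, so the caller can decide
    between full rejection, partial rejection, or no rejection at all.
    """
    accepted, rejected = [], []
    for row in rows:
        # Selection: ignore records where salary is not present (null).
        if row.get("salary") is None:
            rejected.append((row, "missing salary"))
            continue

        # Translating coded values: an unknown code is a validation failure.
        code = GENDER_CODES.get(str(row.get("gender")))
        if code is None:
            rejected.append((row, "unknown gender code"))
            continue

        accepted.append({
            # Column selection: roll_no and salary are kept, age is dropped.
            "roll_no": row["roll_no"],
            "salary": row["salary"],
            "gender": code,
            # Deriving a new calculated value.
            "sale_amount": row["qty"] * row["unit_price"],
        })

    # Sorting on a chosen column before loading.
    accepted.sort(key=lambda r: r["roll_no"])
    return accepted, rejected


# Example input (made-up values for illustration only).
sample = [
    {"roll_no": 2, "age": 30, "salary": 500, "gender": "1", "qty": 3, "unit_price": 2.0},
    {"roll_no": 1, "age": 41, "salary": 700, "gender": "2", "qty": 1, "unit_price": 5.0},
    {"roll_no": 3, "age": 25, "salary": None, "gender": "1", "qty": 2, "unit_price": 1.0},
    {"roll_no": 4, "age": 25, "salary": 100, "gender": "9", "qty": 2, "unit_price": 1.0},
]
accepted, rejected = transform(sample)
```

In this sketch the rejected records are returned together with a reason instead of raising an exception, which corresponds to the "partial rejection" case: valid rows proceed to the load step while the failures can be logged or routed for review. A stricter rule design could instead abort the whole batch on the first failure.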