Data transformation

This article is about metadata transformation in computer science. For the statistical concept, see data transformation (statistics).

In computing, a data transformation converts a set of data values from the data format of a source data system into the data format of a destination data system. It is often used in a data warehouse system.

Data transformation can be divided into two steps:

data mapping, which maps data elements from the source data system to the destination data system and captures any transformation that must occur;
code generation, which creates the actual transformation program.

Mapping from data element to data element is frequently complicated by transformations that require one-to-many and many-to-one rules.
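As a rough illustration of such rules, the following sketch expresses a mapping specification as plain Python data. The field names (`full_name`, `birth_date`, and so on), the `MAPPING` structure, and the `apply_mapping` helper are all hypothetical and chosen only for this example; real mapping tools use their own formats.

```python
# Hypothetical mapping specification: each rule names its source elements,
# its destination elements, and a function that converts between them.
MAPPING = [
    # one-to-one: copy the value unchanged
    {"sources": ["id"], "targets": ["customer_id"], "fn": lambda v: (v,)},
    # one-to-many: split one source element into two destination elements
    {"sources": ["full_name"], "targets": ["first_name", "last_name"],
     "fn": lambda name: tuple(name.split(" ", 1))},
    # many-to-one: combine three source elements into one ISO date string
    {"sources": ["year", "month", "day"], "targets": ["birth_date"],
     "fn": lambda y, m, d: (f"{y:04d}-{m:02d}-{d:02d}",)},
]


def apply_mapping(record, mapping):
    """Build a destination record by applying every rule to a source record."""
    result = {}
    for rule in mapping:
        inputs = [record[name] for name in rule["sources"]]
        outputs = rule["fn"](*inputs)  # always a tuple, one value per target
        result.update(zip(rule["targets"], outputs))
    return result


source = {"id": 7, "full_name": "Ada Lovelace",
          "year": 1815, "month": 12, "day": 10}
destination = apply_mapping(source, MAPPING)
```

Keeping the rules as data rather than as hand-written code is what allows a later step to generate a program from them.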

The code generation step takes the data element mapping specification and creates an executable program that can be run on a computer system. Code generation can also create transformations in easy-to-maintain computer languages such as Java or XSLT.
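The idea of generating a program from a mapping specification can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool works: the `SPEC` format, the field names, and the helpers `generate_source` and `compile_transform` are assumptions made for the example.

```python
# Hypothetical specification: destination element -> Python expression over
# the source record, which the generated function receives as `src`.
SPEC = {
    "customer_id": "src['id']",
    "display_name": "src['last_name'].upper() + ', ' + src['first_name']",
}


def generate_source(spec, func_name="transform"):
    """Return Python source text for a function implementing the mapping."""
    lines = [f"def {func_name}(src):", "    return {"]
    for target, expression in spec.items():
        lines.append(f"        {target!r}: {expression},")
    lines.append("    }")
    return "\n".join(lines) + "\n"


def compile_transform(spec, func_name="transform"):
    """Generate the source, execute it, and return the resulting function."""
    namespace = {}
    # exec runs arbitrary code, so the specification must come from a
    # trusted author; this sketch performs no validation.
    exec(generate_source(spec, func_name), namespace)
    return namespace[func_name]


transform = compile_transform(SPEC)
row = transform({"id": 7, "first_name": "Ada", "last_name": "Lovelace"})
```

The generated text can also be written to a file and maintained by hand, which is the motivation for emitting a readable language in the first place.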

A master data recast is another form of data transformation, in which the entire database of data values is transformed or recast without extracting the data from the database. All data in a well-designed database is directly or indirectly related to a limited set of master database tables by a network of foreign key constraints. Each foreign key constraint depends on a unique database index from the parent database table. Therefore, when the proper master database table is recast with a different unique index, the directly and indirectly related data are also recast or restated. The related data may still be viewed in the original form, since the original unique index still exists with the master data. The database recast must also be done in such a way that it does not impact the applications architecture software.

When the data mapping is indirect via a mediating data model, the process is also called data mediation.
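A minimal sketch of mediation, assuming two hypothetical systems (a "CRM" source and a "billing" destination) and invented field names: each system is mapped only to and from the mediating model, never directly to the other.

```python
# All field names here are hypothetical. The mediating (canonical) model is
# the only format that each system's mapping needs to know about.

def crm_to_canonical(record):
    """Map a source-system record into the mediating model."""
    return {"name": record["FullName"], "email": record["EMail"].lower()}


def canonical_to_billing(record):
    """Map the mediating model into the destination system's format."""
    return {"cust_name": record["name"], "contact": record["email"]}


def mediate(record, to_canonical, from_canonical):
    """Transform indirectly: source -> mediating model -> destination."""
    return from_canonical(to_canonical(record))


billing_row = mediate(
    {"FullName": "Ada Lovelace", "EMail": "Ada@Example.org"},
    crm_to_canonical,
    canonical_to_billing,
)
```

With this arrangement, adding another system requires one pair of mappings to the mediating model rather than a separate mapping to every existing system.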

Transformational languages

Numerous languages are available for performing data transformation, varying in their accessibility (cost) and general usefulness. Many of them require a grammar to be provided, and in many cases the grammar is structured using something closely resembling Backus–Naur Form (BNF). Examples of such languages include:

AWK - one of the oldest and most popular textual data transformation languages;
Perl - a high-level language with both procedural and object-oriented syntax, capable of powerful operations on binary or text data;
Template languages - specialized for transforming data into documents (see also template processor);
TXL - prototyping language-based descriptions, used for source code or data transformation;
XSLT - the standard XML data transformation language (XQuery is a suitable alternative in many applications).

Although transformational languages are typically best suited for transformation, something as simple as regular expressions can be used to achieve useful transformations. Text editors such as Emacs and TextPad support regular expressions with arguments. This allows all instances of a particular pattern to be replaced with another pattern that reuses parts of the original match. For example:

foo ("some string", 42, gCommon);
bar (someObj, anotherObj);

foo ("another string", 24, gCommon);
bar (myObj, myOtherObj);

could both be transformed into a more compact form like:

foobar("some string", 42, someObj, anotherObj);
foobar("another string", 24, myObj, myOtherObj);

In other words, every invocation of foo with three arguments that is followed by an invocation of bar with two arguments would be replaced with a single function invocation using some or all of the original set of arguments.
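The replacement described above can be sketched with Python's standard `re` module. The pattern is an assumption tailored to the two sample invocations (a quoted string, an integer, and plain identifiers); it is not a general parser for function calls, and the names `SOURCE`, `PATTERN` and `result` are illustrative.

```python
import re

SOURCE = (
    'foo ("some string", 42, gCommon);\n'
    'bar (someObj, anotherObj);\n'
    '\n'
    'foo ("another string", 24, gCommon);\n'
    'bar (myObj, myOtherObj);\n'
)

PATTERN = re.compile(
    r'foo \(("[^"]*"), (\d+), \w+\);\s*'  # foo: keep arguments 1 and 2, drop the third
    r'bar \((\w+), (\w+)\);'              # bar: keep both arguments
)

# \1 to \4 refer back to the four captured arguments of each match.
result = PATTERN.sub(r'foobar(\1, \2, \3, \4);', SOURCE)
print(result)
```

Text outside the matches, such as the blank line between the two pairs, passes through unchanged.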

Another advantage of using regular expressions is that they will not fail the null transform test. The test is to run a sample program, in the transformational language of choice, through a transformation that performs no changes; the output should be identical to the input. Many transformational languages will fail this test.
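A small sketch of the null transform test using Python's `re` module; the sample text and variable names are assumptions made for illustration. Both substitutions below change nothing, so the output should equal the input exactly, including irregular whitespace.

```python
import re

# Sample input with a tab, repeated spaces and trailing whitespace, which a
# parse-and-reprint tool might normalise away.
SAMPLE = 'foo ("some string", 42, gCommon);\n\tbar (someObj,   anotherObj);  \n'

# A substitution whose pattern never matches leaves the input untouched.
no_match = re.sub(r'this pattern does not occur', 'x', SAMPLE)

# Replacing every match with itself (\g<0> is the whole match) is also an
# identity transform.
self_replace = re.sub(r'\w+', r'\g<0>', SAMPLE)

print(no_match == SAMPLE, self_replace == SAMPLE)
```

Because a regular expression substitution copies unmatched text verbatim, it has no opportunity to reformat the parts of the input it does not touch.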

External links

Extraction and Transformation at DMOZ

This article is issued from Wikipedia - version of the 5/28/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.
