6/15/11

Teradata Data Distribution and Data Access Methods

An index is a physical mechanism used to store and access the rows of a table. Indexes on
tables in a relational database function much like indexes in books— they speed up
Information retrieval.
In general, the Teradata Database uses indexes to:
• Distribute data rows.
• Locate data rows.
• Improve performance.
Indexed access is usually more efficient than searching all rows of a table.
• Ensure uniqueness of the index values.
Only one row of a table can have a particular value in the column or columns defined as a
Unique Index.
The Teradata Database supports the following types of indexes:
• Primary
• Secondary
• Join
• Hash
• Special indexes for referential integrity
These indexes are discussed in the following sections.

Primary Index : The Teradata Database requires one Primary Index (PI) for each table in the database, except for some data dictionary tables and global temporary tables.

Distributes rows.
Defines most common access path.
May be unique or non-unique.
May be null.
Physical access mechanism.
Required by Teradata Database for most tables.
64-column limit.
Values can be changed.

Primary Indexes and Data Distribution:
Unique Primary Indexes (UPIs) guarantee uniform distribution of table rows.
Non-Unique Primary Indexes (NUPIs) can cause skewed data. While not a guarantor of
uniform row distribution, the degree of uniqueness of the index will determine the degree of
uniformity of the distribution. Because all rows with the same PI value end up on the same
AMP, columns with a small number of distinct values that are repeated frequently do not
make good PI candidates.
The most efficient access method to data in a table is through the PI. For this reason, choosing
a PI should take the following design goal into consideration: choosing a PI that gives good
distribution of data across the AMPs must be balanced against choosing a PI that reflects the
most common usage pattern of the table.

0 comments:

Post a Comment