4/30/12

Netezza Performance Server :


Distribution Specification:
Each table in a Netezza RDBMS database has only one distribution key, which consists of
one to four columns. You can use the following SQL syntax to create distribution keys.

>To create an explicit distribution key, the Netezza SQL syntax is:
usage: create table [ ( [, … ] ) ] as
[ distribute on [hash] ( [ ,… ] ) ]
The phrase distribute on specifies the distribution key, the word hash is optional.

> To create a round-robin distribution key, the Netezza SQL syntax is:
usage: create table (col1 int, col2 int, col3 int);
distribute on random;
The phrase distribute on random specifies round-robin distribution.

> To create a table without specifying a distribution key, the Netezza SQL syntax is:
usage: create table (col1 int, col2 int, col3 int);
The NPS chooses a distribution key. There is no guarantee what that key is and it can
vary depending on the NPS software release.



Choosing the distribution key of table in Netezza

In Netezza the each table data is distributed across may servers(SPU's). This distribution of data results in parallel processing. So, we should be careful in choosing the right distribution key while creating a table.

There are two types of distribution methods:

  1. Hash algorithm
  2. Random algorithm
Random algorithm applies when you specify random() in the 'distribute on' clause.
Hash algorithm applies when you specify column names in the distribute on clause. When you don't specify distribute on clause while creating a table, by default, NPS distributes the data on first column in the create table statement.


Tips to choose a good distribution method:


  1. Choose columns that distributes rows evenly across all the SPU's.
  2. Select the columns withe unique values and high cardinality.
  3. Do not use random distribution even it distributes data equally across all the SPU's.
  4. Always specify the distribution clause even if you are distributing on first column in the table.
  5. Use same distribution key and datatypes for commonly joined tables.
  6. Do not distribute on floating point and boolean datatypes.

1 comments:

Anonymous said...

Its like you read my mind! You appear to know so much about this, like you wrote the book in it or something.

I think that you could do with some pics to drive the message home
a little bit, but other than that, this is magnificent blog.
An excellent read. I'll definitely be back.

Feel free to visit my blog post - bmi chart for women

Post a Comment