6/19/11

Optimize your DataStage Job Performance with relevant Environment Variables

DataStage has many environment variables that can be tweaked to optimize the performance of your jobs, and several more that collect extra diagnostic information when a job crashes so that you can get better traces.
If a DataStage job runs into a problem, or you simply want more detail about what it is doing, check the following variables.

$APT_CONFIG_FILE: This lets you specify the configuration file a job should use. You can keep several configuration files with different node counts and assign one to a job dynamically, based on criteria such as data volume or the time of the run.
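
As an illustration, a minimal two-node configuration file might look like the sketch below. The host name "etlserver", the /data/ds/* paths and the file location are assumptions; substitute the values for your own installation.

# Sketch only: "etlserver", the /data/ds/* paths and the config file location
# are placeholders; use the values for your own installation.
cat > /opt/IBM/InformationServer/Server/Configurations/two_node.apt <<'EOF'
{
  node "node1"
  {
    fastname "etlserver"
    pools ""
    resource disk "/data/ds/datasets" {pools ""}
    resource scratchdisk "/data/ds/scratch" {pools ""}
  }
  node "node2"
  {
    fastname "etlserver"
    pools ""
    resource disk "/data/ds/datasets" {pools ""}
    resource scratchdisk "/data/ds/scratch" {pools ""}
  }
}
EOF
# Point a job or project at it by setting $APT_CONFIG_FILE, for example:
export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/two_node.apt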

$APT_DUMP_SCORE: This creates a job run report (the "score") that shows the partitioning used, the degree of parallelism, data buffering, and any operators DataStage inserted into the flow. It is useful for finding out what your high-volume job is actually doing.

$APT_PM_PLAYER_TIMING: This option lets you see what each operator (player process) in a job is doing, in particular how much data it handles and how much CPU it consumes. It helps in identifying bottlenecks.
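
A quick way to turn both reporting variables on for a test run is sketched below. Treat it as an assumption: in practice they are usually added as project- or job-level parameters through the Administrator client, and where the exports belong (shell session or $DSHOME/dsenv) depends on your installation.

# Sketch: enable run-time diagnostics for a test run. Where these exports live
# (shell session, $DSHOME/dsenv, or job parameters) depends on your setup.
export APT_DUMP_SCORE=True        # write the job score: partitioning, buffering, inserted operators
export APT_PM_PLAYER_TIMING=True  # per-operator CPU and data volume figures in the job log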

One way to speed up very high-volume jobs is to pre-sort the data and make sure it is not re-sorted inside the DataStage job. This is done by turning off automatic sort insertion in those jobs:
$APT_NO_SORT_INSERTION: Stops DataStage from automatically inserting a sort in front of stages that need sorted data, such as Remove Duplicates. You can also add a Sort stage to the job and set its sort key mode to "Don't Sort (Previously Sorted)" to avoid the inserted sort in a specific job path.
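
As an illustration, the input could be pre-sorted outside DataStage on the Remove Duplicates key and the automatic sort then switched off. The file names, delimiter and key column below are assumptions.

# Sketch: file names, delimiter and key column are assumptions for illustration.
# Pre-sort the source file on the key the downstream stage needs...
sort -t'|' -k1,1 customers.txt > customers_sorted.txt
# ...then stop DataStage from inserting its own tsort in front of that stage.
export APT_NO_SORT_INSERTION=True

Only do this when the data really is sorted on the required key; if it is not, stages such as Remove Duplicates will quietly produce wrong results.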

Please refer to the following link for more details:
Optimize your DataStage Job Performance with relevant Environment Variables
