4/30/12

Difference Between DataStage Server Jobs and Parallel Jobs


DataStage parallel jobs can run in parallel on multiple nodes. Server jobs do not run on multiple nodes.
Parallel jobs support partition parallelism (round robin, hash, modulus, etc.); server jobs do not.
The Transformer in parallel jobs is compiled to C++. In server jobs, the Transformer is compiled to BASIC.
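
The partitioning methods named above can be pictured with a small Python sketch. This is only an illustration of the general ideas (round robin, hash, modulus), not DataStage's actual implementation; the function names, the sample rows, and the choice of CRC32 as the hash are all assumptions made for the example.

```python
import zlib

def round_robin(rows, n):
    """Deal rows out to n partitions in turn, giving an even spread."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Send rows with the same key value to the same partition.
    CRC32 is an arbitrary stand-in hash chosen for this sketch."""
    parts = [[] for _ in range(n)]
    for row in rows:
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[h % n].append(row)
    return parts

def modulus_partition(rows, n, key):
    """Partition on an integer key: partition number = key value mod n."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts

# Hypothetical sample data: six rows, three "nodes".
cities = ["Pune", "Oslo", "Pune", "Lima", "Oslo", "Lima"]
rows = [{"id": i, "city": c} for i, c in enumerate(cities)]

rr = round_robin(rows, 3)
hp = hash_partition(rows, 3, "city")
mp = modulus_partition(rows, 3, "id")

for name, parts in (("round robin", rr), ("hash", hp), ("modulus", mp)):
    print(name, [[r["id"] for r in p] for p in parts])
```

Round robin balances the row count but ignores the data; hash and modulus keep related rows together, which matters for key-based work such as joins and aggregation.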
--------------------------------------
A commonly stated difference is that server jobs usually run on Windows and parallel jobs on UNIX.
A server job runs on one node, whereas a parallel job can run on more than one node.
In practice server jobs also run on UNIX, and most major installations are on UNIX platforms. The major difference is in job architecture.
Server jobs process in sequence, one stage after another.
Parallel jobs process in parallel, using the configuration file to determine how many processing nodes are defined for parallel execution.

------------------------------------------

  1. The major difference between InfoSphere DataStage Enterprise and Server editions is that Enterprise Edition (EE) introduces parallel jobs. Parallel jobs support a completely new set of stages, which implement scalable, parallel data processing mechanisms. In most cases parallel jobs and stages look similar to the DataStage Server objects, but their capabilities are very different.
    In rough outline:
    • Parallel jobs are executable DataStage programs, managed and controlled by the DataStage Server runtime environment
    • Parallel jobs have a built-in mechanism for pipelining, partitioning and parallelism. In most cases no manual intervention is needed to apply those techniques optimally.
    • Parallel jobs are much faster at ETL tasks such as sorting, filtering and aggregating
  2. DataStage EE jobs are compiled into OSH (Orchestrate Shell script language).
    OSH executes operators: instances of executable C++ classes, pre-built components representing the stages used in DataStage jobs.
    Server jobs are compiled into BASIC, which runs as interpreted pseudo-code. This is why parallel jobs run faster, even when processed on one CPU.
  3. DataStage Enterprise Edition adds functionality to the traditional server stages, for instance record-level and column-level format properties.
  4. DataStage EE also brings completely new stages that implement the parallel concept, for example:
    • Enterprise Database Connectors for Oracle, Teradata & DB2
    • Development and Debug stages - Peek, Column Generator, Row Generator, Head, Tail, Sample ...
    • Data set, File set, Complex flat file, Lookup File Set ...
    • Join, Merge, Funnel, Copy, Modify, Remove Duplicates ...
  5. When processing large data volumes, DataStage EE jobs are the right choice. When dealing with smaller data volumes, however, server jobs may simply be easier to develop, understand and manage.
    When a company has both Server and Enterprise licenses, both types of jobs can be used.
  6. Sequence jobs are the same in the DataStage EE and Server editions.
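
The pipelining mentioned in point 1 can be illustrated with Python generators. This is a rough, single-threaded sketch of the idea only: it contrasts finishing each stage before the next starts with passing each row downstream as soon as it is ready. The stage names and the three-row input are hypothetical, and nothing here reflects how DataStage itself schedules operators.

```python
def read_rows(log):
    """Source stage: emits three rows, logging each one."""
    for i in range(3):
        log.append(f"read {i}")
        yield i

def transform(rows, log):
    """Middle stage: multiplies each row by 10."""
    for r in rows:
        log.append(f"transform {r}")
        yield r * 10

def load(rows, log):
    """Target stage: collects the rows it receives."""
    out = []
    for r in rows:
        log.append(f"load {r}")
        out.append(r)
    return out

def run_staged(log):
    """Each stage finishes completely before the next one starts."""
    extracted = list(read_rows(log))
    transformed = list(transform(extracted, log))
    return load(transformed, log)

def run_pipelined(log):
    """Each row flows through all stages before the next row is read."""
    return load(transform(read_rows(log), log), log)

staged_log, piped_log = [], []
staged_result = run_staged(staged_log)
piped_result = run_pipelined(piped_log)

print(staged_log)  # all reads, then all transforms, then all loads
print(piped_log)   # read, transform, load interleaved row by row
```

Both runs produce the same output rows; only the order of work differs. In a real parallel engine the stages would also run concurrently, which this sketch does not attempt to model.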

