6/19/11

Difference between Dataset, Fileset and Sequential File

Dataset:
0) The data set is the internal data format of the Orchestrate framework. Any other data used as a source in a parallel job is first converted into data set format (handled by the "import" operator), and any data written to another kind of target is converted out of data set format at the end (handled by the "export" operator). Reading or writing a data set skips that conversion, which is why it usually gives the best performance.
1) It stores data in binary, in DataStage's internal format, so reading from or writing to a data set takes less time than with any other source or target.
2) It preserves the partitioning scheme, so you don't have to partition the data again.
3) You cannot view the data without DataStage.
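
To make the import/export idea in point 0 concrete, here is a rough Python sketch. It is not Orchestrate's real format or operators: the record layout (`RECORD`) and the `import_rows` / `export_rows` helpers are hypothetical names made up for illustration. It only shows the general cost being described: text has to be parsed into a binary form before processing and formatted back afterwards, while data already in binary form needs neither step.

```python
import struct

# Hypothetical fixed-width layout: 4-byte int id, 8-byte float amount.
# This is NOT the real data set format, just a stand-in for "internal binary".
RECORD = struct.Struct("<id")

def import_rows(text):
    """Parse comma-delimited text into packed binary records ("import" step)."""
    out = bytearray()
    for line in text.splitlines():
        id_str, amount_str = line.split(",")
        out += RECORD.pack(int(id_str), float(amount_str))
    return bytes(out)

def export_rows(blob):
    """Format packed binary records back into comma-delimited text ("export" step)."""
    return "\n".join(f"{i},{a}" for i, a in RECORD.iter_unpack(blob))

source = "1,10.5\n2,20.25"
binary = import_rows(source)      # needed for a text source
round_trip = export_rows(binary)  # needed for a text target
```

A job that reads and writes the binary form directly would skip both `import_rows` and `export_rows`, which is the saving the list above attributes to data sets.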

Fileset:
0) The .ds file and the .fs file are the descriptor files of a data set and a file set respectively. The .fs file is stored in ASCII format, so you can open it directly to see the paths of the data files and the schema. A .ds file cannot be opened directly; to inspect it, use the Data Set Management utility in the client tools (such as Designer and Manager) or the orchadmin command-line tool.
1) It stores data in a format similar to a sequential file.
2) The only advantage of a file set over a sequential file is that it preserves the partitioning scheme.
3) You can view the data, but it appears in the order defined by the partitioning scheme.
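
The file set points above can be illustrated with a small Python toy. This is only a sketch under assumptions: the JSON descriptor layout, the part-file naming, and the `write_file_set` / `read_file_set` helpers are all hypothetical and do not reproduce the real .fs format. It shows a readable descriptor listing data file paths and a schema, one text file per partition, and rows coming back in partition order rather than in the original order.

```python
import json
import os
import tempfile

def write_file_set(rows, schema, n_parts, directory, name):
    """Partition rows by id modulo n_parts into text files, plus a text descriptor."""
    paths = [os.path.join(directory, f"{name}.part{p}.txt") for p in range(n_parts)]
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[row[0] % n_parts].append(row)  # row[0] is the integer key
    for path, part in zip(paths, parts):
        with open(path, "w") as f:
            for row in part:
                f.write(",".join(str(v) for v in row) + "\n")
    descriptor = os.path.join(directory, f"{name}.fs")
    with open(descriptor, "w") as f:
        # Hypothetical descriptor layout: schema plus the list of data files.
        json.dump({"schema": schema, "files": paths}, f, indent=2)
    return descriptor

def read_file_set(descriptor):
    """Read rows back one partition file at a time, as listed in the descriptor."""
    with open(descriptor) as f:
        meta = json.load(f)
    rows = []
    for path in meta["files"]:
        with open(path) as f:
            rows.extend(line.rstrip("\n").split(",") for line in f)
    return rows

rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
with tempfile.TemporaryDirectory() as d:
    desc = write_file_set(rows, ["id:int", "name:string"], 2, d, "demo")
    result = read_file_set(desc)

print(result)  # [['2', 'b'], ['4', 'd'], ['1', 'a'], ['3', 'c']]
```

The rows with even ids come back first because they landed in the first partition file, which mirrors point 3: the data is viewable, but in partition order.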
