Thursday, May 30, 2013

Generate Insert/Update Script for DataStage Oracle Connector Automatically

Here is an updated post.

For transactional (non-bulk) data load in DataStage using Oracle connectors, it is always the best option to use the "update then insert" mode. As shown in this post, the "update then insert" mode is as fast as the "insert then update" mode when it is 100% inserts, and outperform other modes when updates has higher percentage. Even if you know that the incoming data is 100% inserts, "update then insert" is still a better choice. Because your job may fail in the middle, and the data has been partially loaded. Now, if you restart the job, it is no longer 100% inserts.

In order to use the "update then insert" mode, you need to specify key column(s) in an Oracle connector stage, so that it knows which columns should be matched in where clause while generating update statement at run-time. Alternatively, you can provide insert/update statements yourself.

Writing insert/update statements by hand is tedious, especially if you have a lot of tables. The following SQL scripts generate insert/update statements automatically. You may paste the output into Oracle connector stages directly. Or you may wrap the scripts in a simple parallel job, and save the output into a text file. Use a sequence job to loop through hundreds of tables.

For update statement:
For insert statement:


Mohan Sukumar said...

Hai. Thanks for your code.

I am using Oracle 10G. I couldn't able to execute your code. i am getting error

ORA-00923: FROM keyword not found where expected

I think the error is in LISTArr Function. Please help me in this Issue.

Thanks in Advance.

Jing said...

listagg() is a 11g new feature. There are work arounds in 10g.