Tuesday, April 30, 2013

Performance Comparison of Upsert Options in DataStage Oracle Connector

DataStage Oracle connector stage can write data to database in UPSERT mode. Actually, there are 2 UPSERT modes, "Insert then update" and "Update then insert", as described below:
Write modeDescription
Insert then updateFor each input record, the connector first tries to insert the record as a new row in the target table. If the insert operation fails because of a primary key or unique constraint, the connector updates the existing row in the target table with the new values from the input record.
Update then insertFor each input record, the connector first tries to locate the matching rows in the target table and to update them with the new values from the input record. If the rows cannot be located, the connector inserts the record as a new row in the target table.
From the description, it's no-brainer to select a write mode if you know the majority of input records are inserts or updates. What about 50/50, or unpredictable input? Is it worthwhile to split the input records into pure inserts and pure updates? To answer the questions, I did some tests. The result is summarized in the following graph.

The result is surprising. "Update then insert" mode is always the fastest no matter what percentage of input is update. It makes life much easier. No more splitting stream. No more picking the correct mode. It is always "Update then insert."

Note: The test was done in DataStage v8.7.