Sunday, December 22, 2013

Apply Balanced Optimization Principles in DataStage Job Design

DataStage v8.5 introduced an optional add-on: balanced optimizer. The optimizer can redesign parallel jobs to improve performance. It applies the following principles in the redesign.
Minimize I/O and data movement
Reduce the amount of source data read by the job by performing computations within the source database. Where possible, move processing of data to the database and avoid extracting data just to process it and write it back to the same database.
Maximize optimization within source or target databases
Make use of the highly developed optimizations that databases achieve by using local indexes, statistics, and other specialized features.
Maximize parallelism
Take advantage of default InfoSphere DataStage behavior when reading and writing databases: use parallel interfaces and pipe the data through the job, so that data flows from source to target without being written to intermediate destinations.
The key takeaway is to do as much as possible within databases. It seems to me that an experienced DataStage developer could do a much better job in applying these principles to optimize a job than a piece of software. So let me give it a try on a real-world production job.
The original design is shown in Fig. 1. All data processing and transformation are performed within DataStage server, including,
  1. two slowly changing dimension stages,
  2. five lookup stages, and
  3. three transformer stages for adding timestamp columns.
Eight Oracle connector stages serve as data sources/targets only, reading 5 source tables and writing 3 target tables within the same database. No other tasks are performed within the connectors. This type of job is an ideal candidate for balanced optimization.

Figure 1: Original job design

The optimized job is shown in Fig. 2. A single Oracle connector stage replaces all 20 processing and data stages. The main data stream is bulk-loaded into a temp table (Fig. 3), and processed in after-SQL.
Figure 2: Redesigned job applying the principles of Balanced Optimization

Figure 3: Oracle connector configuration in the optimized job

Here are the after-SQL statements.

The performance gain is dramatic, dropping from ~30 seconds all the way down to ~3 seconds. In addition, I get two extra bonuses for free.

  1. The original job is not transactional. If it failed in the middle, the target tables might be in inconsistent states with partial results. In order to recover from the job failure, I need to do some cleanup. The optimized job, on the other hand, is transactional. The update to the target tables is carried out in the after-SQL statements. If they fail, the target tables are automatically rolled back.
  2. If the job fails because of data quality issues, such as lookup failure or unique constraint violation, it is not easy to find the offending rows in the original design. In the optimized design, all intermediate results are kept in the temp table within the same database as the target tables. It is fairly straightforward to pinpoint the offending rows by joining the temp and the target tables.
In conclusion, balanced optimization (BO) can greatly improve job performance, as well as simplify maintenance. However, it is not necessary to pay the additional license fee if the BO principles can been applied in job design by developers.

Saturday, December 21, 2013

Scrollable Table Control with Fixed Header


ISD provides a scrollable table control for displaying a lot of rows in a window with fixed height. It lacks, however, the most desirable feature of a scrollable table: a fixed table header. Here is a screenshot and a demo page. The previous versions of ISD do have a fixed-header scrollable table control. But it only works in IE, not in Chrome or Firefox.
A scrollable table is, in my opinion, not a good UI design in the first place. It significantly slows down initial page load. The issue (or the header) can be easily fixed. If your client or manager asks for a scrollable table, try your best to steer them away from it. If you can't, then follow these steps to implement a fixed-header scrollable table.


Step 1: Prepare a built-in scrollable table

Add a built-in scrollable table control onto your page. Split the merged header cell above row buttons. This is an important step. Any merged cell will screw up column width calculation while fixing the header row.

Step 2: Download JavaScript libraries

ISD has added out-of-box jQuery support in latest versions. If you still use old versions without jQuery, you can download it from jQuery website, and include it in master pages.
Another library is the jQuery Scrollable Table Plugin. Download it from its website, and put it under your project's root folder.

Step 3: Modify Styles.css

Add the following classes to the Styles.css file.

Step 4: Add JavaScript code to page prologue

Add the following JavaScript code to the page prologue.
Build and run.


Here is what you get on screen, and a demo page. Please note the long delay (~10 seconds) at initial page load, during which the table header appears not fixed. This endorses my previous suggestion: avoid scrollable tables as much as you can.