Saturday, August 12, 2017

Advanced SQL: Group Overlapping Events Using Analytic Window Functions

Problem: Group overlapping events

Given a list of events with start and end times, group them by their overlap. The problem is easier to understand with the following graph.

Solution 1: Use cross self-join

I found the following forums with a similar solution.
Here is my own implementation of the solution.
Even though the code is surprisingly short, it is very hard to understand or explain. I am intrigued to develop an alternative solution.

Solution 2: Use analytic window functions

Sort events by their start and end times. Then loop through the event sequence 3 times.
  1. In the 1st loop, at each row (event), calculate the max value of its previous events' end times (prev_max_end). This can be done using a running total calculation, an analytic function with a window clause (ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING). Compare prev_max_end against the current row's start time. If the start time > prev_max_end, then the current event starts a new group. In other words, a group boundary is detected at the current row. In the above graph, the detection is positive at Event 4. Event 1 automatically starts a new group because its pre_max_end is NULL.
  2. In the 2nd loop, at each row (event), count the running total of group boundaries. Again, magic of window clause (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). Every event within the same start and end boundaries will have the same count. For example, events 4, 5 and 6 all get running totals of 2, because they all see the boundaries at events 1 and 4. The running totals, therefore, can be used as group IDs.
  3. Finally in the 3rd loop, find min(start time) and max(end time) within each group as the group's start and end times.
Here is an implementation.
It is very easy to verify and understand the solution step-by-step on a demo execution, as shown below.


Performance comparison

In addition to easiness of understanding and explaining, solution 2 also has a much better performance. Solution 1 has 3 cross self joins, which result in O(N4) total row counts. On the other hand, solution 2 has no joins. It just loops through the same number of rows 3 times, O(N). On a 2000-events data set, solution 1 runs for 12 seconds, and solution 2 less than 1 second.

Saturday, August 5, 2017

Advanced SQL: Find groups of students who take the same courses

How to group students who take the same courses using SQL? The trick is to assign a unique value to the combination of courses taken by a student, and then group students by their assigned values. There are 2 options for the assignment.

  1. Assign every course a unique bit. For example, course A = 1 (0001), course B = 2 (0010), course C = 4 (0100), etc. Then, for all possible combinations of courses, their sum of the bits are unique.
  2. Sort and concatenate course IDs.
Option 2's string concatenation is not SQL standard. Different RDBMS has different syntax. Option 1, on the other hand, can be implemented in standard SQL. Below is an example for option 1.

Sunday, July 30, 2017

Query Neo4j using PowerShell and Bolt .Net driver

I failed to query Neo4j using PowerShell and the latest .Net driver (version 1.4.0) following the instructions from this blog post, which is based on version 1.0.2. The issue is that the new driver has changed its package dependencies. You need to import a different set of DLLs at the beginning of your PowerShell scripts, as shown in this code example.

Saturday, July 22, 2017

OBIEE Hack: Cleanup column headings of direct SQL reports with a single mouse click

OBIEE automatically enclose column headings with double quotes if they contain spaces. This is totally unnecessary and outright annoying. Here is a bookmarklet that can do the dirty work for you: Cleanup Headings.

Drag and drop it to your browser's bookmark bar. On a direct SQL report's edit page, click it. It will do the following cleaning tasks to all column headings:

  • Remove all double quotes.
  • Replace all underscores with spaces.
  • Uppercase first letters and lowercase remaining letters of all words, except for all CAPITAL words.

Browser compatibility: Google Chrome (yes), FireFox (yes), Opera (yes), IE (no).

Here is the JavaScript code:

Sunday, July 16, 2017

OBIEE Hack: Repeat all columns of a direct SQL report with a single mouse click

By default, OBIEE suppresses all columns of a direct SQL report. In other words, adjacent rows with the same value are merged like this.
Suppressed column format is excellent for viewing with human eyes. This is probably why OBIEE makes it the default format. However, it is a nightmare for sorting and filtering when exported to Excel.

In order to set a column to repeat format, at least 3 mouse clicks at different locations are required. It is common for reports to have 10+ columns, which means total 60+ mouse moves and clicks. It sure feels like a chore.

Drag and drop this bookmarklet to your bookmark bar: Repeat All Columns.

Now repeating all columns is just a single mouse click (of the bookmarklet).

Browser compatibility: Google Chrome (yes), FireFox (yes), Opera (yes), IE (no).

Here is the JavaScript code:

Saturday, July 8, 2017

OBIEE Hack: Save and Restore Column Formatting While Editing Direct SQL Reports

OBIEE's direct SQL reports query databases directly, bypassing its logical data model layer (RPD). This is a quick-and-dirty solution for ad-hoc reporting requests. However, there is an annoying issue with direct SQL reports. Every time you modify a report's SQL, you lose its column formatting. Reformatting the same reports again and again could be frustrating if your customers frequently ask for changes to their reports. I previously developed a client-server hack to preserve column formatting while editing direct SQL reports. This post is a pure client-side, simpler, and more elegant solution.

Just drag and drop this bookmarklet to your browser's bookmark bar: Direct SQL Wizard.

When you edit a report's SQL, click the bookmarklet. It will add 3 new buttons sandwiching the native "Validate SQL" button as shown in the following screenshot. The new buttons' function is self-evident.


Below is the bookmarklet's JavaScript code.
Browser compatibility: Google Chrome (yes), FireFox (yes), Opera (yes), IE (no).

Sunday, June 18, 2017

OBIEE Hack: Preserve Column Formatting While Editing Direct SQL

Update: I posted a better solution here.

OBIEE's direct SQL reports query databases directly, bypassing its logical data model layer (RPD). This is a quick-and-dirty solution for ad hoc reporting requests. However, there is an annoying issue with direct SQL reports. Every time you modify a report's SQL, you lose its column formatting. Reformatting the same reports again and again could be frustrating if your customers frequently ask for changes to their reports. Here is a hack to preserve column formatting while editing direct SQL reports.

When you modify and validate a direct SQL report's SQL, OBIEE reset all columns to default settings. However, the reset happens only at client side, not on the server. In other words, the original formatting is still saved on the server. It is possible to retrieve the original formatting from the server, and merge it with the modified SQL from your local computer. The bookmarklet below grabs and posts the following information to a web service.

  • OBIEE server host name
  • Full path of the report on the server
  • Modified and validated report in XML format to be saved on the server

The web service is where you do the merge. There are various technologies for implementing web services and manipulating XML. You can mix and match them based on what's already available in you company's technology stack.

Thursday, April 20, 2017

A subtle difference between PowerShell ForEach and ForEach-Object

This blog post explains the major differences between PowerShell's ForEach statement and ForEach-Object cmdlet. There is a subtle difference, however, not mentioned in the post.

ForEach statement will not loop into a $null array variable, while ForEach-Object cmdlet will loop it once with a $null value. This may bite you with NullExceptions.

Here is a code snippet to demonstrate the issue.