Sunday, October 28, 2018
Saturday, August 12, 2017
Advanced SQL: Group Overlapping Events Using Analytic Window Functions
Problem: Group overlapping events
Given a list of events with start and end times, group them by their overlap. The problem is easier to understand with the following graph.Solution 1: Use cross self-join
I found the following forums with a similar solution.
- https://stackoverflow.com/questions/2561130/merge-overlapping-date-intervals
- https://www.sqlservercentral.com/Forums/Topic826031-8-1.aspx
Even though the code is surprisingly short, it is very hard to understand or explain. I am intrigued to develop an alternative solution.
Solution 2: Use analytic window functions
Sort events by their start and end times. Then loop through the event sequence 3 times.- In the 1st loop, at each row (event), calculate the max value of its previous events' end times (prev_max_end). This can be done using a running total calculation, an analytic function with a window clause (ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING). Compare prev_max_end against the current row's start time. If the start time > prev_max_end, then the current event starts a new group. In other words, a group boundary is detected at the current row. In the above graph, the detection is positive at Event 4. Event 1 automatically starts a new group because its pre_max_end is NULL.
- In the 2nd loop, at each row (event), count the running total of group boundaries. Again, magic of window clause (ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). Every event within the same start and end boundaries will have the same count. For example, events 4, 5 and 6 all get running totals of 2, because they all see the boundaries at events 1 and 4. The running totals, therefore, can be used as group IDs.
- Finally in the 3rd loop, find min(start time) and max(end time) within each group as the group's start and end times.
It is very easy to verify and understand the solution step-by-step on a demo execution, as shown below.
Performance comparison
In addition to easiness of understanding and explaining, solution 2 also has a much better performance. Solution 1 has 3 cross self joins, which result in O(N4) total row counts. On the other hand, solution 2 has no joins. It just loops through the same number of rows 3 times, O(N). On a 2000-events data set, solution 1 runs for 12 seconds, and solution 2 less than 1 second.
Saturday, August 5, 2017
Advanced SQL: Find groups of students who take the same courses
How to group students who take the same courses using SQL? The trick is to assign a unique value to the combination of courses taken by a student, and then group students by their assigned values. There are 2 options for the assignment.
- Assign every course a unique bit. For example, course A = 1 (0001), course B = 2 (0010), course C = 4 (0100), etc. Then, for all possible combinations of courses, their sum of the bits are unique.
- Sort and concatenate course IDs.
Sunday, July 30, 2017
Query Neo4j using PowerShell and Bolt .Net driver
I failed to query Neo4j using PowerShell and the latest .Net driver (version 1.4.0) following the instructions from this blog post, which is based on version 1.0.2. The issue is that the new driver has changed its package dependencies. You need to import a different set of DLLs at the beginning of your PowerShell scripts, as shown in this code example.
Saturday, July 22, 2017
OBIEE Hack: Cleanup column headings of direct SQL reports with a single mouse click
OBIEE automatically enclose column headings with double quotes if they contain spaces. This is totally unnecessary and outright annoying. Here is a bookmarklet that can do the dirty work for you: Cleanup Headings.
Drag and drop it to your browser's bookmark bar. On a direct SQL report's edit page, click it. It will do the following cleaning tasks to all column headings:
Browser compatibility: Google Chrome (yes), FireFox (yes), Opera (yes), IE (no).
Here is the JavaScript code:
Drag and drop it to your browser's bookmark bar. On a direct SQL report's edit page, click it. It will do the following cleaning tasks to all column headings:
- Remove all double quotes.
- Replace all underscores with spaces.
- Uppercase first letters and lowercase remaining letters of all words, except for all CAPITAL words.
Browser compatibility: Google Chrome (yes), FireFox (yes), Opera (yes), IE (no).
Here is the JavaScript code:
Sunday, July 16, 2017
OBIEE Hack: Repeat all columns of a direct SQL report with a single mouse click
By default, OBIEE suppresses all columns of a direct SQL report. In other words, adjacent rows with the same value are merged like this.
Suppressed column format is excellent for viewing with human eyes. This is probably why OBIEE makes it the default format. However, it is a nightmare for sorting and filtering when exported to Excel.
In order to set a column to repeat format, at least 3 mouse clicks at different locations are required. It is common for reports to have 10+ columns, which means total 60+ mouse moves and clicks. It sure feels like a chore.
Drag and drop this bookmarklet to your bookmark bar: Repeat All Columns.
Now repeating all columns is just a single mouse click (of the bookmarklet).
Browser compatibility: Google Chrome (yes), FireFox (yes), Opera (yes), IE (no).
Here is the JavaScript code:
Suppressed column format is excellent for viewing with human eyes. This is probably why OBIEE makes it the default format. However, it is a nightmare for sorting and filtering when exported to Excel.
In order to set a column to repeat format, at least 3 mouse clicks at different locations are required. It is common for reports to have 10+ columns, which means total 60+ mouse moves and clicks. It sure feels like a chore.
Drag and drop this bookmarklet to your bookmark bar: Repeat All Columns.
Now repeating all columns is just a single mouse click (of the bookmarklet).
Browser compatibility: Google Chrome (yes), FireFox (yes), Opera (yes), IE (no).
Here is the JavaScript code:
Saturday, July 8, 2017
OBIEE Hack: Save and Restore Column Formatting While Editing Direct SQL Reports
OBIEE's direct SQL reports query databases directly, bypassing its logical data model layer (RPD). This is a quick-and-dirty solution for ad-hoc reporting requests. However, there is an annoying issue with direct SQL reports. Every time you modify a report's SQL, you lose its column formatting. Reformatting the same reports again and again could be frustrating if your customers frequently ask for changes to their reports. I previously developed a client-server hack to preserve column formatting while editing direct SQL reports. This post is a pure client-side, simpler, and more elegant solution.
Just drag and drop this bookmarklet to your browser's bookmark bar: Direct SQL Wizard.
When you edit a report's SQL, click the bookmarklet. It will add 3 new buttons sandwiching the native "Validate SQL" button as shown in the following screenshot. The new buttons' function is self-evident.
Below is the bookmarklet's JavaScript code.
Browser compatibility: Google Chrome (yes), FireFox (yes), Opera (yes), IE (no).
Just drag and drop this bookmarklet to your browser's bookmark bar: Direct SQL Wizard.
When you edit a report's SQL, click the bookmarklet. It will add 3 new buttons sandwiching the native "Validate SQL" button as shown in the following screenshot. The new buttons' function is self-evident.
Below is the bookmarklet's JavaScript code.
Browser compatibility: Google Chrome (yes), FireFox (yes), Opera (yes), IE (no).
Subscribe to:
Posts (Atom)