Sunday, July 16, 2017

OBIEE Hack: Repeat all columns of a direct SQL report with a single mouse click

By default, OBIEE suppresses all columns of a direct SQL report. In other words, adjacent rows with the same value are merged like this.
Suppressed column format is excellent for viewing with human eyes. This is probably why OBIEE makes it the default format. However, it is a nightmare for sorting and filtering when exported to Excel.

In order to set a column to repeat format, at least 3 mouse clicks at different locations are required. It is common for reports to have 10+ columns, which means total 60+ mouse moves and clicks. It sure feels like a chore.

Drag and drop this bookmarklet to your bookmark bar: Repeat All Columns.

Now repeating all columns is just a single mouse click (of the bookmarklet).

Saturday, July 8, 2017

OBIEE Hack: Save and Restore Column Formatting While Editing Direct SQL Reports

OBIEE's direct SQL reports query databases directly, bypassing its logical data model layer (RPD). This is a quick-and-dirty solution for ad-hoc reporting requests. However, there is an annoying issue with direct SQL reports. Every time you modify a report's SQL, you lose its column formatting. Reformatting the same reports again and again could be frustrating if your customers frequently ask for changes to their reports. I previously developed a client-server hack to preserve column formatting while editing direct SQL reports. This post is a pure client-side, simpler, and more elegant solution.

Just drag and drop this bookmarklet to your browser's bookmark bar: Direct SQL Wizard.

When you edit a report's SQL, click the bookmarklet. It will add 3 new buttons sandwiching the native "Validate SQL" button as shown in the following screenshot. The new buttons' function is self-evident.


Below is the bookmarklet's JavaScript code.
Browser compatibility: Google Chrome (yes), FireFox (yes), Opera (yes), IE (no).

Sunday, June 18, 2017

OBIEE Hack: Preserve Column Formatting While Editing Direct SQL

Update: I posted a better solution here.

OBIEE's direct SQL reports query databases directly, bypassing its logical data model layer (RPD). This is a quick-and-dirty solution for ad hoc reporting requests. However, there is an annoying issue with direct SQL reports. Every time you modify a report's SQL, you lose its column formatting. Reformatting the same reports again and again could be frustrating if your customers frequently ask for changes to their reports. Here is a hack to preserve column formatting while editing direct SQL reports.

When you modify and validate a direct SQL report's SQL, OBIEE reset all columns to default settings. However, the reset happens only at client side, not on the server. In other words, the original formatting is still saved on the server. It is possible to retrieve the original formatting from the server, and merge it with the modified SQL from your local computer. The bookmarklet below grabs and posts the following information to a web service.

  • OBIEE server host name
  • Full path of the report on the server
  • Modified and validated report in XML format to be saved on the server

The web service is where you do the merge. There are various technologies for implementing web services and manipulating XML. You can mix and match them based on what's already available in you company's technology stack.

Thursday, April 20, 2017

A subtle difference between PowerShell ForEach and ForEach-Object

This blog post explains the major differences between PowerShell's ForEach statement and ForEach-Object cmdlet. There is a subtle difference, however, not mentioned in the post.

ForEach statement will not loop into a $null array variable, while ForEach-Object cmdlet will loop it once with a $null value. This may bite you with NullExceptions.

Here is a code snippet to demonstrate the issue.

Monday, August 22, 2016

OrientDB Completely Misses the Point of Graph Databases

What is the point of a graph database? It is all about relationships.

First of all (as explained here), relationships are treated as first-class data in a graph database, instead of second-class metadata in a relational database. Data is queryable or discoverable, while metadata is not. You have to learn metadata through other methods, such as taking training courses, reading documentations, reverse-engineering, or data profiling. The more complex a relational database, the more metadata you need to learn before querying for data. For companies with complex business domains, it is very time-consuming and expensive for employees to learn metadata. It is even more expensive to keep metadata-rich employees.

Secondly (as explained here), you need a true graph query language to take full advantage of relationship data. The old-fashioned SQL is just not designed for this purpose. SQL assumes that you already know relationships as metadata. So it expects you to explicitly spell them out in FROM clauses. If you are trying to use SQL to query/discover unknown relationships in a graph database, you are putting the cart before the donkey.

OrientDB is a self-claimed graph database. However, it completely misses the above 2 points. This is evident when it tries to compare itself against a true graph database, Neo4j.
Comparison Point
OrientDB
Neo4j
Complex Domains Support schemas around graphs, i.e. vertex and edge hierarchies. Only support "flat" labels for vertices and edges.
Query Language OrientDB's query language is built on SQL. Considering most developers are familiar with SQL, working with OrientDB is just easier. Neo4j has its own Query Language called "Cypher" which requires training to learn a new language.

What are "schemas around graphs?" They are METADATA! Neo4j's not supporting schemas is a conscious  strategic design decision, not a trade-off between simplicity or capability. Hierarchy is just a tree structure, a specific type of graph. No matter how complex a domain hierarchy is, a graph database can model it natively as data. There is absolutely no need to introduce metadata. Modeling domain hierarchies as metadata instead of data in a graph database is like putting the donkey before a car.

Cypher provides liberating expressiveness for querying graphs. Its advantages over SQL is at the same level as comparing OOP to procedure languages. Criticizing Cypher on the base of developers' familiarity with SQL is like a swordsmith running a negative ad against a gun shop. "Every knight knows how to use a sword. It requires training to use a gun."

Sunday, August 14, 2016

A Graph is Worth a Thousand Words, Part 2

A Graph is Worth a Thousand Words, Part 1.

What are the things that a graph database can do, but a relational database can't? This is the burning question anyone from SQL world would ask when they first hear about graph database.

In part 1, I pointed out that a graph database allows you to discover/query complex relationships, while a relational database assumes you already know the relationships beforehand. This is mainly contributed to the fact that relationships are treated as first-class data in a graph database, and second-class metadata in a relational database.

There is another major reason why a graph database makes relationship discovery possible: its new graph-based query language. To discover relations implies that you don't know them in advance. Without knowing the relationships, you cannot write any SQL queries. You must explicitly spell out the relationships in a SQL query's FROM clause to join tables together. Using SQL to discover relationships is to put the cart before the donkey.

A graph query language, e.g. Cypher, allows you to query relationships without explicitly spelling out their specifics. This is equivalent to allowing wildcards in FROM clauses, such as "TABLE_A join *", or even "* join *". Not only the tables can be wildcarded, but also the join levels, like "A join (1..3) B" or "* join(*) *".

This makes graph databases extremely tool-friendly, or self-serving-friendly. The current generation of self-serving data discovery tools relies on predefined data models. You cannot point such a tool to an unknown relational database and start to discover. With a graph database and a Cypher-powered tool? Yes We Can!

Sunday, July 31, 2016

A Graph is Worth a Thousand Words, Part 1

What are the real advantages of graph database, neo4j in particular, over relational database? In other words, what are the things that a graph database can do while a relational database can't. This is the burning question a developer/analyst asks when he/she makes the decision whether or not to pick up the new technology. Here is neo4j's own sales pitch.
Why does My Enterprise Need a Graph Database? 
Today’s CIOs and CTOs don’t just need to manage larger volumes of data – they need to generate insight from their existing data. In this case, the relationships between data points matter more than the individual points themselves.
In order to leverage data relationships, organizations need a database technology that stores relationship information as a first-class entity. That technology is a graph database.
Ironically, legacy relational database management systems (RDBMS) are poor at handling data relationships. Their rigid schemas make it difficult to add different connections or adapt to new business requirements.
Not only do graph databases effectively store data relationships; they’re also flexible when expanding a data model or conforming to changing business needs.
Obviously, it is targeted at CIOs and CTOs. If you are a developer or analyst from RDBMS/SQL world, and feel somewhat detached from the above sales pitch, below is my translation for you.

Graph database maximizes discoverability of information

A relational database captures the information of a business system as a mixture of data (entities) and metadata (relationships). Metadata is as important as data. It glues data together, and gives data meanings. Yet it is treated as a second-class citizen. You can discover information (through queries) only from data. You cannot ask questions (query) about metadata. Metadata must be obtained from other (usually costly) methods BEFORE you write queries. Here is a simple example to illustrate the point.
This is the universe of information (some movies, persons and their relationships) to be captured. There are 2 different designs. 

Both designs capture exactly the same amount information. The difference is that a portion of the information (relationship type) is captured as data in design #2, and metadata in design #1. The more information captured as data in design #2 allows more types of questions to be asked. For example, is Tom Hanks related in anyway to Finding Dory? In order to find the same information in design #1, the question has to be broken down to 3 smaller ones. 1) Is Tom Hanks an actor in the movie? 2) Is Tom Hanks a director of the movie? 3) Does Tom Hanks rate the movie? The breaking down of the question requires prior knowledge of the metadata. The information captured in metadata is the information lost discoverability. The more complex a business system, the more relationships among entities, the more information captured as metadata, the more discoverability lost.

A graph database, on the other hand, captures ALL information as data, including relationships. Therefore, it maximizes a business system's discoverability, no matter how complex it is. This is why a graph (database) is worth a thousand words (of metadata in RDBMS).