A Code Thief's Brag: 2016

Monday, August 22, 2016

OrientDB Completely Misses the Point of Graph Databases

What is the point of a graph database? It is all about relationships.

First of all (as explained here), relationships are treated as first-class data in a graph database, instead of second-class metadata in a relational database. Data is queryable or discoverable, while metadata is not. You have to learn metadata through other methods, such as taking training courses, reading documentations, reverse-engineering, or data profiling. The more complex a relational database, the more metadata you need to learn before querying for data. For companies with complex business domains, it is very time-consuming and expensive for employees to learn metadata. It is even more expensive to keep metadata-rich employees.

Secondly (as explained here), you need a true graph query language to take full advantage of relationship data. The old-fashioned SQL is just not designed for this purpose. SQL assumes that you already know relationships as metadata. So it expects you to explicitly spell them out in FROM clauses. If you are trying to use SQL to query/discover unknown relationships in a graph database, you are putting the cart before the donkey.

OrientDB is a self-claimed graph database. However, it completely misses the above 2 points. This is evident when it tries to compare itself against a true graph database, Neo4j.

Comparison Point	OrientDB	Neo4j
Complex Domains	Support schemas around graphs, i.e. vertex and edge hierarchies.	Only support "flat" labels for vertices and edges.
Query Language	OrientDB's query language is built on SQL. Considering most developers are familiar with SQL, working with OrientDB is just easier.	Neo4j has its own Query Language called "Cypher" which requires training to learn a new language.

What are "schemas around graphs?" They are METADATA! Neo4j's not supporting schemas is a conscious strategic design decision, not a trade-off between simplicity or capability. Hierarchy is just a tree structure, a specific type of graph. No matter how complex a domain hierarchy is, a graph database can model it natively as data. There is absolutely no need to introduce metadata. Modeling domain hierarchies as metadata instead of data in a graph database is like putting the donkey before a car.

Cypher provides liberating expressiveness for querying graphs. Its advantages over SQL is at the same level as comparing OOP to procedure languages. Criticizing Cypher on the base of developers' familiarity with SQL is like a swordsmith running a negative ad against a gun shop. "Every knight knows how to use a sword. It requires training to use a gun."

Sunday, August 14, 2016

A Graph is Worth a Thousand Words, Part 2

A Graph is Worth a Thousand Words, Part 1.

What are the things that a graph database can do, but a relational database can't? This is the burning question anyone from SQL world would ask when they first hear about graph database.

In part 1, I pointed out that a graph database allows you to discover/query complex relationships, while a relational database assumes you already know the relationships beforehand. This is mainly contributed to the fact that relationships are treated as first-class data in a graph database, and second-class metadata in a relational database.

There is another major reason why a graph database makes relationship discovery possible: its new graph-based query language. To discover relations implies that you don't know them in advance. Without knowing the relationships, you cannot write any SQL queries. You must explicitly spell out the relationships in a SQL query's FROM clause to join tables together. Using SQL to discover relationships is to put the cart before the donkey.

A graph query language, e.g. Cypher, allows you to query relationships without explicitly spelling out their specifics. This is equivalent to allowing wildcards in FROM clauses, such as "TABLE_A join *", or even "* join *". Not only the tables can be wildcarded, but also the join levels, like "A join (1..3) B" or "* join(*) *".

This makes graph databases extremely tool-friendly, or self-serving-friendly. The current generation of self-serving data discovery tools relies on predefined data models. You cannot point such a tool to an unknown relational database and start to discover. With a graph database and a Cypher-powered tool? Yes We Can!

Sunday, July 31, 2016

A Graph is Worth a Thousand Words, Part 1

What are the real advantages of graph database, neo4j in particular, over relational database? In other words, what are the things that a graph database can do while a relational database can't. This is the burning question a developer/analyst asks when he/she makes the decision whether or not to pick up the new technology. Here is neo4j's own sales pitch.

Why does My Enterprise Need a Graph Database?

Today’s CIOs and CTOs don’t just need to manage larger volumes of data – they need to generate insight from their existing data. In this case, the relationships between data points matter more than the individual points themselves.

In order to leverage data relationships, organizations need a database technology that stores relationship information as a first-class entity. That technology is a graph database.

Ironically, legacy relational database management systems (RDBMS) are poor at handling data relationships. Their rigid schemas make it difficult to add different connections or adapt to new business requirements.

Not only do graph databases effectively store data relationships; they’re also flexible when expanding a data model or conforming to changing business needs.

Obviously, it is targeted at CIOs and CTOs. If you are a developer or analyst from RDBMS/SQL world, and feel somewhat detached from the above sales pitch, below is my translation for you.

Graph database maximizes discoverability of information

A relational database captures the information of a business system as a mixture of data (entities) and metadata (relationships). Metadata is as important as data. It glues data together, and gives data meanings. Yet it is treated as a second-class citizen. You can discover information (through queries) only from data. You cannot ask questions (query) about metadata. Metadata must be obtained from other (usually costly) methods BEFORE you write queries. Here is a simple example to illustrate the point.

This is the universe of information (some movies, persons and their relationships) to be captured. There are 2 different designs.

Both designs capture exactly the same amount information. The difference is that a portion of the information (relationship type) is captured as data in design #2, and metadata in design #1. The more information captured as data in design #2 allows more types of questions to be asked. For example, is Tom Hanks related in anyway to Finding Dory? In order to find the same information in design #1, the question has to be broken down to 3 smaller ones. 1) Is Tom Hanks an actor in the movie? 2) Is Tom Hanks a director of the movie? 3) Does Tom Hanks rate the movie? The breaking down of the question requires prior knowledge of the metadata. The information captured in metadata is the information lost discoverability. The more complex a business system, the more relationships among entities, the more information captured as metadata, the more discoverability lost.

A graph database, on the other hand, captures ALL information as data, including relationships. Therefore, it maximizes a business system's discoverability, no matter how complex it is. This is why a graph (database) is worth a thousand words (of metadata in RDBMS).

A Graph is Worth a Thousand Words, Part 2.

Monday, April 18, 2016

Tribal Wars 2 Building Manager

Introduction

As an (ex-)active Tribal Wars 2 (TW2) player with more than 50 villages, I felt the pain of leveling up buildings. Its "finish for free" feature is supposed to award players who are always online. You save 5 minutes here and 10 minutes there. The time savings quickly add up, and put you hours or even days ahead of less active players. However, you have to constantly loop through your villages to see which ones are in "finish for free" mode. Otherwise, you just miss the savings. It quickly becomes a chore when you have 30+ villages. I felt it more like a punishment than a reward.

Tribal Wars 2 Building Manager (TW2BM) comes to the rescue.

What is TW2BM?

TW2BM is a Google Spreadsheet, which color-codes your in-progress buildings' remaining times. Here is a screenshot of TW2BM.

The remaining time is adjusted for "finish for free" time based on headquarters levels, and counted down every minute. The villages are sorted by their remaining times, so the ones need your attention are always at the top. There is no need to scroll up and down the sheet even if you have a lot of villages. After finishing a building, you update the village with next building's construction time. The remaining time is automatically re-calculated; and the villages are re-sorted.

The best part of TW2BM is that you don't have to manually setup the spreadsheet or type in any village information. All you need is a browser bookmarklet and, of course, a Google account. The spreadsheet is a private document in your own Google account. Nobody else can see it.

How to use TW2BM?

First, click this link to create a template Spreadsheet on your Google Drive: Create TW2BM Spreadsheet.

Then, drag and drop this bookmarklet to your browser's bookmark bar: TW2BM.

Open a village's Headquarters screen, and click the bookmarklet. If this is the first time you click it, it automatically creates the world sheet for you. It also sends the village name, the headquarters' level and the current building's remaining time to the spreadsheet. If the village has never been registered before, a new row is added. Otherwise, it updates the existing row. Then go to the next village.

Disclaimer

Use of TW2BM seems a violation of TW2's bot/script rule. Your account may be banned. Use it at your own risk. However, I strongly feel that using TW2BM is not cheating. It doesn't interact with the game server directly, or mimic your mouse clicking or key stroking. All it does is collecting and managing your game information. Besides, who can tell you are using TW2BM unless you tell someone?