Monday, August 22, 2016

OrientDB Completely Misses the Point of Graph Databases

What is the point of a graph database? It is all about relationships.

First of all (as explained here), relationships are treated as first-class data in a graph database, rather than as second-class metadata as in a relational database. Data is queryable and discoverable; metadata is not. You have to learn metadata through other means, such as training courses, documentation, reverse-engineering, or data profiling. The more complex a relational database, the more metadata you need to learn before you can query its data. For companies with complex business domains, it is very time-consuming and expensive for employees to learn this metadata. It is even more expensive to retain the employees who have learned it.

Secondly (as explained here), you need a true graph query language to take full advantage of relationship data. Old-fashioned SQL was simply not designed for this purpose. SQL assumes that you already know the relationships as metadata, so it expects you to spell them out explicitly in FROM clauses. If you are trying to use SQL to query or discover unknown relationships in a graph database, you are putting the cart before the donkey.

OrientDB is a self-proclaimed graph database. However, it completely misses the two points above. This is evident in the comparison it draws between itself and a true graph database, Neo4j:

Comparison Point | OrientDB | Neo4j
Complex Domains | Support schemas around graphs, i.e. vertex and edge hierarchies. | Only support "flat" labels for vertices and edges.
Query Language | OrientDB's query language is built on SQL. Considering most developers are familiar with SQL, working with OrientDB is just easier. | Neo4j has its own Query Language called "Cypher" which requires training to learn a new language.

What are "schemas around graphs"? They are METADATA! Neo4j's not supporting schemas is a conscious, strategic design decision, not a trade-off between simplicity and capability. A hierarchy is just a tree, which is a specific type of graph. No matter how complex a domain hierarchy is, a graph database can model it natively as data. There is absolutely no need to introduce metadata. Modeling domain hierarchies as metadata instead of data in a graph database is like putting the donkey before a car.
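
To make the "hierarchy is just data" point concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory graph stored as (child, relationship, parent) edges; the node names and the IS_A relationship type are made up for illustration and are not part of any Neo4j or OrientDB API. The hierarchy lives entirely in the edges, so one generic traversal can answer questions about it at any depth.

```python
from collections import defaultdict

# Hypothetical domain hierarchy stored as ordinary (child, relationship, parent)
# edges. The hierarchy is data; no separate schema object describes it.
EDGES = [
    ("Car", "IS_A", "Vehicle"),
    ("Truck", "IS_A", "Vehicle"),
    ("SUV", "IS_A", "Car"),
    ("Sedan", "IS_A", "Car"),
    ("Vehicle", "IS_A", "Asset"),
]


def descendants(root, edges=EDGES, rel="IS_A"):
    """Return every node below `root` in the hierarchy, at any depth."""
    children = defaultdict(list)
    for child, r, parent in edges:
        if r == rel:
            children[parent].append(child)

    found, stack = set(), [root]
    while stack:  # depth-first walk down the tree
        node = stack.pop()
        for child in children[node]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


print(sorted(descendants("Vehicle")))  # ['Car', 'SUV', 'Sedan', 'Truck']
```

Adding another level, say a hypothetical ("Crossover", "IS_A", "SUV") edge, is just one more row of data; neither a schema nor the query has to change.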

Cypher provides liberating expressiveness for querying graphs. Its advantages over SQL are comparable to those of OOP over procedural languages. Criticizing Cypher on the basis of developers' familiarity with SQL is like a swordsmith running a negative ad against a gun shop: "Every knight knows how to use a sword. It requires training to use a gun."

Sunday, August 14, 2016

A Graph is Worth a Thousand Words, Part 2

A Graph is Worth a Thousand Words, Part 1.

What can a graph database do that a relational database can't? This is the burning question anyone from the SQL world asks on first hearing about graph databases.

In Part 1, I pointed out that a graph database allows you to discover and query complex relationships, while a relational database assumes you already know the relationships beforehand. This is mainly because relationships are treated as first-class data in a graph database, but as second-class metadata in a relational database.

There is another major reason why a graph database makes relationship discovery possible: its graph-based query language. Discovering relationships implies that you don't know them in advance. Without knowing the relationships, you cannot write a SQL query at all, because you must explicitly spell them out in the query's FROM clause to join tables together. Using SQL to discover relationships is putting the cart before the donkey.
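
A minimal illustration, using Python's built-in sqlite3 module with a hypothetical two-table schema (the person and department tables and the dept_id column are invented for the example): the query for someone's coworkers only works because its author already knows that person.dept_id points at department.id, and has to write that exact path into the FROM clause.

```python
import sqlite3

# Hypothetical relational schema: the person-department relationship exists
# only as a foreign key that the query author must already know about.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                         dept_id INTEGER REFERENCES department(id));
    INSERT INTO department VALUES (1, 'Sales'), (2, 'R&D');
    INSERT INTO person VALUES (1, 'Alice', 1), (2, 'Bob', 1), (3, 'Carol', 2);
    """
)

# To ask "who works with Alice?", the FROM clause must spell out the path
# person -> department -> person, via the dept_id column.
COWORKERS_SQL = """
    SELECT p2.name
    FROM person AS p1
    JOIN department AS d ON d.id = p1.dept_id
    JOIN person AS p2 ON p2.dept_id = d.id
    WHERE p1.name = ? AND p2.name <> p1.name
"""


def coworkers(name):
    """Return the names of people in the same department as `name`."""
    return [row[0] for row in conn.execute(COWORKERS_SQL, (name,))]


print(coworkers("Alice"))  # ['Bob']
```

Asking a different question, say how two people are connected through projects or managers, would require knowing yet more foreign keys and writing a new FROM clause for that specific path.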

A graph query language, e.g. Cypher, allows you to query relationships without explicitly spelling out their specifics. This is equivalent to allowing wildcards in FROM clauses, such as "TABLE_A join *", or even "* join *". Not only can the tables be wildcarded, but so can the join depth, as in "A join (1..3) B" or "* join(*) *".
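
The wildcard idea can be sketched in Python. This is a minimal illustration, not how Cypher or Neo4j is implemented: the graph, its node names and relationship types are hypothetical, and edges are followed in both directions, roughly what a Cypher pattern such as "(a)-[*1..3]-(b)" expresses. The traversal never names a relationship type or the kind of node at the far end; only the hop range is given.

```python
from collections import defaultdict

# Hypothetical graph mixing several kinds of entities (people, departments,
# projects, software) and several relationship types.
# Each edge is (source, relationship_type, target).
EDGES = [
    ("Alice", "WORKS_IN", "Sales"),
    ("Bob", "WORKS_IN", "Sales"),
    ("Bob", "MANAGES", "ProjectX"),
    ("ProjectX", "USES", "Postgres"),
    ("Carol", "KNOWS", "Alice"),
]


def variable_length_paths(start, min_hops, max_hops, edges=EDGES):
    """Yield every simple path from `start` that has min_hops..max_hops edges.

    Neither the relationship type nor the kind of node at the far end is
    specified, loosely mirroring "start join (min..max) *". Edges are followed
    in both directions; each step is reported as (from, relationship, to)
    in traversal order.
    """
    neighbors = defaultdict(list)
    for src, rel, dst in edges:
        neighbors[src].append((rel, dst))
        neighbors[dst].append((rel, src))

    def walk(node, path, visited):
        if len(path) >= min_hops:
            yield list(path)  # copy, because `path` keeps changing
        if len(path) == max_hops:
            return
        for rel, nxt in neighbors[node]:
            if nxt not in visited:  # never revisit a node (no cycles)
                path.append((node, rel, nxt))
                visited.add(nxt)
                yield from walk(nxt, path, visited)
                path.pop()
                visited.remove(nxt)

    yield from walk(start, [], {start})


for path in variable_length_paths("Alice", 1, 3):
    print(" -> ".join([path[0][0]] + [f"[{rel}] {to}" for _, rel, to in path]))
```

Widening the hop range to (1, 4) also reaches Postgres through ProjectX without any change to the traversal code; in SQL, each extra hop would mean another explicit join.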

This makes graph databases extremely tool-friendly, or self-service-friendly. The current generation of self-service data discovery tools relies on predefined data models. You cannot point such a tool at an unknown relational database and start discovering. With a graph database and a Cypher-powered tool? Yes We Can!