Monday, August 22, 2016

OrientDB Completely Misses the Point of Graph Databases

What is the point of a graph database? It is all about relationships.

First of all (as explained here), relationships are treated as first-class data in a graph database, rather than as second-class metadata, as they are in a relational database. Data is queryable and discoverable, while metadata is not. You have to learn metadata through other methods, such as taking training courses, reading documentation, reverse-engineering, or data profiling. The more complex a relational database, the more metadata you need to learn before you can query for data. For companies with complex business domains, it is very time-consuming and expensive for employees to learn metadata. It is even more expensive to keep metadata-rich employees.

Secondly (as explained here), you need a true graph query language to take full advantage of relationship data. Old-fashioned SQL is just not designed for this purpose. SQL assumes that you already know the relationships as metadata, so it expects you to spell them out explicitly in FROM clauses. If you are trying to use SQL to query or discover unknown relationships in a graph database, you are putting the cart before the donkey.
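To make the data-versus-metadata distinction concrete, here is a minimal Python sketch. It is not how Neo4j or OrientDB store anything; the node names, the relationship types, and the two helper functions are all made up for illustration. The only point is that when relationships are ordinary records, you can discover them by querying, without knowing their names in advance.

```python
# Hypothetical toy data: every relationship is an ordinary
# (source, type, target) record, i.e. data rather than schema.
EDGES = [
    ("alice", "WORKS_FOR", "acme"),
    ("alice", "MANAGES", "bob"),
    ("bob", "WORKS_FOR", "acme"),
    ("acme", "SUPPLIES", "globex"),
]


def relationship_types(edges):
    """Discover which relationship types exist by reading the data itself."""
    return sorted({rel for (_, rel, _) in edges})


def relationships_of(node, edges):
    """Return every relationship touching `node`, without naming a type up front."""
    return [(s, rel, t) for (s, rel, t) in edges if node in (s, t)]


if __name__ == "__main__":
    print(relationship_types(EDGES))
    for edge in relationships_of("acme", EDGES):
        print(edge)
```

Neither function needs to be told that WORKS_FOR or SUPPLIES exists. In a relational schema the equivalent question requires already knowing which tables and foreign keys to name in the FROM clause.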

OrientDB is a self-proclaimed graph database. However, it completely misses the two points above. This is evident when it tries to compare itself against a true graph database, Neo4j.
Comparison Point | OrientDB | Neo4j
Complex Domains | Support schemas around graphs, i.e. vertex and edge hierarchies. | Only support "flat" labels for vertices and edges.
Query Language | OrientDB's query language is built on SQL. Considering most developers are familiar with SQL, working with OrientDB is just easier. | Neo4j has its own Query Language called "Cypher" which requires training to learn a new language.

What are "schemas around graphs"? They are METADATA! Neo4j's lack of schema support is a conscious, strategic design decision, not a trade-off between simplicity and capability. A hierarchy is just a tree structure, a specific type of graph. No matter how complex a domain hierarchy is, a graph database can model it natively as data. There is absolutely no need to introduce metadata. Modeling domain hierarchies as metadata instead of data in a graph database is like putting the donkey before a car.
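As a rough illustration of modeling a hierarchy as data, here is a small Python sketch. The vehicle domain, the IS_A mapping, and both helper functions are invented for this example and have nothing to do with either product's API. The hierarchy is just more edges, and walking it is just a traversal.

```python
# Hypothetical domain: the type hierarchy is stored as plain IS_A edges
# (child -> parent), in the same way any other relationship would be.
IS_A = {
    "my_corolla": "sedan",
    "sedan": "car",
    "car": "vehicle",
    "truck": "vehicle",
}


def ancestors(node, is_a):
    """Walk the IS_A edges upward and return the chain of ancestors, nearest first."""
    chain = []
    seen = {node}
    while node in is_a:
        node = is_a[node]
        if node in seen:  # guard against accidental cycles in the data
            break
        seen.add(node)
        chain.append(node)
    return chain


def is_kind_of(node, kind, is_a):
    """True if `kind` appears anywhere above `node` in the hierarchy."""
    return kind in ancestors(node, is_a)


if __name__ == "__main__":
    print(ancestors("my_corolla", IS_A))
    print(is_kind_of("truck", "car", IS_A))
```

Adding a new level to the hierarchy means adding one more edge, not altering a schema, and the hierarchy itself can be queried like everything else.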

Cypher provides liberating expressiveness for querying graphs. Its advantages over SQL are on the same level as those of OOP over procedural languages. Criticizing Cypher on the basis of developers' familiarity with SQL is like a swordsmith running a negative ad against a gun shop: "Every knight knows how to use a sword. It requires training to use a gun."

2 comments:

Luca Garulli said...

Hi Jing,

OrientDB is a Multi-Model database first. This means you can mix different models all together. If you still want to use a pure Graph Database approach (like Neo4j), you can do that without any of the limitations you reported:

1) don't use schema and
2) don't use SQL but use the Pattern Matching approach: http://orientdb.com/pattern-matching-with-orientdb/

Unknown said...

Dear Jing,

I think you are missing the point: OrientDB offers you both the SQL syntax (a low-barrier entry for people coming from the relational-database world) AND pattern matching that is comparable to Cypher.

Both Neo4j and OrientDB are native graph databases; OrientDB just has a more ambitious plan, which makes it more complex to comprehend (it's a tradeoff, just like in any technology you choose).

Also, METADATA is nice to have in the database; otherwise you need to implement it yourself in your application code. But you don't need to use it: as Luca mentioned, just don't bother with it unless you need it.

Take a look at this introductory (free) course: https://www.udemy.com/orientdb-getting-started , it helped me to give OrientDB a fair treatment when comparing it with Neo4j for a couple projects.