The Year of the Graph Newsletter: September 2018
Knowledge graphs in Gartner’s hype cycle, machine learning extensions and visual tools for graph databases, Ethereum analytics with RDF, using Gremlin with R, SPARQL, and Spring, graph database research wins best paper award at VLDB, and benchmarking AWS Neptune.
Not bad for a typical summer vacation month such as August. This edition of the Year of the Graph newsletter had to be extended to make sure we include as much of the good stuff as possible.
Gartner’s hype cycle for 2018 was recently released, and knowledge graphs were included for the first time. If you wanted official proof it is the year of the graph, there you have it. When a hitherto niche technology gets in the spotlight, some explanations are in order, and Andreas Blumauer from the Semantic Web Company has a go at this.
Google has had a knowledge graph for a while now. But developing and using a knowledge graph at web scale is no easy feat. Diffbot claim to have managed to do just that, turning the web into the world’s largest knowledge graph.
There is a lot to be said about knowledge graphs, what they are, and how to build them. A graph database will be the foundation on which you build one, but that’s not the only thing you can use graph databases for. Neo4j’s Jennifer Reif talks about when graph databases make sense.
Here’s the thing about knowledge graphs: you don’t necessarily need to move all your data to a graph database in order to build one. But you do need to have the right pointers and metadata about your data, and for this you do need a graph database. Kurt Cagle from Semantical LLC describes the approach.
Since we’re at the semantic side of things in graphs, check out how Alethio and SANSA combined the SANSA stack for reading and querying large scale RDF data with two of the most classic graph algorithms, Connected Components and PageRank, to do analytics on the Ethereum network.
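For readers who have not met the two algorithms, here is a minimal sketch in plain Python. It is not the SANSA API and not Alethio's pipeline; the toy `transfers` edge list and the function names are made up for illustration, and a real Ethereum analysis would run on a distributed engine over far larger data.

```python
from collections import defaultdict

def connected_components(edges):
    """Weakly connected components via union-find (edge direction ignored)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for src, dst in edges:
        root_s, root_d = find(src), find(dst)
        if root_s != root_d:
            parent[root_s] = root_d

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; rank of dangling nodes is spread evenly."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if n not in out_links)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

# Hypothetical "who sent funds to whom" edges, purely illustrative.
transfers = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a"), ("x", "y")]
components = connected_components(transfers)
ranks = pagerank(transfers)
```

Connected Components groups accounts that are linked by any chain of transfers, while PageRank scores accounts by how much incoming flow they attract from other well-ranked accounts.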
RDF and graph analytics, check. RDF and machine learning, check too. Expect to see this more and more going forward. Here Pedro Oliveira from Stardog outlines how Stardog’s machine learning extensions for SPARQL do similarity search.
Neo4j also has some machine learning extensions. Lauren Shin, an intern at Neo4j, has developed some extensions for linear regression, which she outlines here.
Another contributor, Peter Heisig from Technische Universität Dresden, has built another Neo4j extension: a Graph View Editor for interacting with Neo4j without having to write Cypher.
More visual tools. Dave Bechberger built an IDE for running traversals and visualizing results for Tinkerpop-enabled graph databases. It’s still early stage, but if you are not a big fan of the console, this may work well for you. And it’s open source, so you can contribute too.
But that’s not the only reason Tinkerpop users have to rejoice. Microsoft also developed and open sourced a valuable resource for Tinkerpop-enabled graph databases: a Spring Data layer for Gremlin. If you like Spring Data, you will surely appreciate this.
Tinkerpop on a roll: Dharmen Punjani and Harsh Thakkar from the University of Bonn just released their Gremlin – SPARQL connector, which was included in Tinkerpop. This means you can now query Tinkerpop-enabled graph databases using SPARQL.
Wrapping up with Tinkerpop and Gremlin, Jeffrey Hanson from the University of Queensland shows how Gremlin can be used to find subgraphs in R. Hanson is a conservation scientist, drawn to graphs by problems he has to deal with in his work.
This goes to show the ubiquity of large graphs and the surprising challenges of graph processing. That was also the title of the user survey paper by Siddhartha Sahu and his co-authors, which won the best paper award at VLDB.
Did you ever wonder how fast AWS Neptune really is? Not as fast as TigerGraph, according to this benchmark published by TigerGraph’s VP of Engineering Mingxi Wu. Of course, benchmarks done by vendors should always be taken with a pinch of salt, but this may give you an idea.
Performance is important of course, but choosing a graph database is a hard exercise which should take many factors into account. The good news is, somebody did this already, so you don’t have to. The most comprehensive research on graph databases is out there. It will save you time and money, and ensure you choose what works for you. And if you’ve read this far, here’s a limited edition 33% off discount code for you: 33OFF
Would you like to receive the latest Year of the Graph Newsletter in your inbox each month? Easy – just sign up below. Have some news you think should be featured in an upcoming newsletter? Easy too – drop me a line here.