Neo4j’s roadmap in 2023: Cloud, Graph Data Science, Large Language Models and Knowledge Graphs

Neo4j recently announced new product features in collaboration with Google, as well as a new Chief Product Officer coming from Google: Sudhir Hasbe. We caught up with Hasbe to discuss what the future holds for Neo4j, as well as for the broader graph database space.
Sudhir Hasbe began his career as an engineer, but quickly transitioned into product development and management roles. His career includes stints at Microsoft and Google, notably as Google Cloud Senior Director of Product Management for Data Analytics. It was during his Google tenure that Hasbe got acquainted with Neo4j, even though his relationship with graphs did not start with Neo4j, but with Xbox.
From Xbox to Big Graph and Neo4j
Hasbe was involved in building features centered around social discovery for Xbox, which required understanding and leveraging social graphs. Although graph databases were not widely available at the time, Hasbe and his team experimented with building their own graph solutions on top of big data technologies. This experience sparked his interest in the power of graph analytics and its potential for personalized recommendations and insights.
Subsequently, Hasbe joined Zulily, a consumer technology marketplace, where he further explored graph technology. His team experimented with Neo4j’s open-source stack, gaining valuable insights into its capabilities. Later, at Google, Hasbe got more exposure to graph solutions such as JanusGraph, exploring the integration of graph technology with Bigtable.
Hasbe’s vision for data analytics in Google Cloud emphasized the need for diverse compute engines to extract value from collected data. Recognizing the value of graphs in this context, Hasbe coined the term “Big Graph” to describe the integration of graph analytics with BigQuery. This vision aligned with the evolving needs of organizations.
Although Google never rolled out a graph database product of its own, partnerships with industry leaders like Neo4j were instrumental in bringing “Big Graph” to fruition. Neo4j was part of a batch of partnerships with open-core vendors that Google announced in 2019. Their partnership aimed to empower customers with the ability to leverage graph technology for enhanced analytics.
After five and a half years at Google, Hasbe decided to explore new horizons. Graph and Neo4j caught his attention, and he joined the company as the Chief Product Officer after discussing with CEO and founder Emil Eifrem.
Large Language Models
The last product announcement to come out of Google Cloud Data Analytics on Hasbe’s watch was about Neo4j and BigQuery integration. The first product announcement to come out of Neo4j on Hasbe’s watch was about new product integrations with generative AI features in Google Cloud Vertex AI. Besides a sense of continuity, there were a few other noteworthy points about this announcement.
Those integrations are meant to enable enterprise customers to leverage natural language to interact with knowledge graphs, transform unstructured data into knowledge graphs, call Vertex AI services in real time to enrich knowledge graphs, ground large language models with knowledge graphs, and offer support for vector embeddings.
Unsurprisingly, all these integrations fall under one of the three ways people are attempting to use knowledge graphs and LLMs in tandem: using an LLM to create a new Knowledge Graph, using an LLM to access an existing Knowledge Graph, or using a Knowledge Graph to augment an LLM. But there’s more to unpack here.
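The third of those patterns, grounding an LLM with a Knowledge Graph, can be illustrated with a minimal sketch: facts are retrieved from a graph and prepended to the prompt, so the model answers from curated data rather than from its parametric memory alone. The toy triple store and prompt format below are purely illustrative assumptions, not Neo4j’s or Vertex AI’s actual APIs.

```python
# Toy in-memory knowledge graph as (subject, predicate, object) triples.
# In practice these facts would come from a graph database query.
triples = [
    ("Neo4j", "HAS_PRODUCT", "Aura"),
    ("Neo4j", "HAS_PRODUCT", "Graph Data Science"),
    ("Aura", "IS_A", "managed cloud service"),
]

def facts_about(entity):
    """Return every triple in which the entity appears as subject or object."""
    return [t for t in triples if entity in (t[0], t[2])]

def grounded_prompt(question, entity):
    """Build a prompt whose context section comes from the graph, not the model."""
    context = "\n".join(f"{s} {p} {o}" for s, p, o in facts_about(entity))
    return f"Answer using only these facts:\n{context}\n\nQuestion: {question}"

print(grounded_prompt("What is Aura?", "Aura"))
```

The point of the pattern is that the graph, not the model, is the source of truth: the model only has to phrase an answer from facts it was handed, which reduces the room for hallucination.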
First, this signifies a clear emphasis on LLM integration and support for AI on Neo4j’s part. As Hasbe shared, Google is just the beginning: Neo4j is working on releasing similar integrations with other providers, notably AWS and presumably Azure. In addition, Neo4j has put together an internal project code-named NaLLM to explore and demonstrate the synergies between Neo4j and Large Language Models.
Hasbe also noted that while these integrations are deployed with priority on Aura, Neo4j’s fully managed cloud, they are rolled out on the self-managed and on-premises versions too. We wondered whether integrations with custom LLMs, beyond OpenAI’s GPT series or Google’s PaLM, are planned. Hasbe acknowledged that this comes up a lot with clients, so it’s on Neo4j’s agenda, but at this time it’s hard to tell what it might look like.
Graph Data Science and Vector Embeddings
What Neo4j’s product roadmap has more clarity on, however, is Graph Data Science (GDS), Neo4j’s graph analytics and modeling platform. The touch point with Neo4j’s recent announcement was vector embeddings. This is something GDS has supported for a while and which, per Hasbe, is widely used. But the LLM wave brought renewed interest in embeddings, because they can offer a shortcut to interacting with LLMs that yields faster, cheaper and better results.
“The theory is, using vectorized information from an LLM, storing it in your store, and then running cosine similarity functions is a much more cost-effective way than trying to put all your enterprise data into an LLM and fine-tune it. There are limitations on fine-tuning and the number of tokens you can use, and most likely it’s going to hallucinate and give you some wrong information”, Hasbe noted.
This is why adding vector capabilities has been a sweeping trend across databases in the last few months. Of course, the usual dilemma comes up here too: should you use your existing database, which just added some vector capabilities, or a specialized vector database? The answer, as usual, is “it depends”. A specialized store will typically provide better and faster functionality, at the expense of added cost and operational complexity.
Today, Neo4j supports vectors via an in-memory cosine similarity function. But Hasbe said new specialized vector indexes, working natively on top of the storage layer, are coming soon to improve performance.
Knowledge Graphs
In a way, however, GDS and embeddings were the odd ones out in Neo4j’s announcement on product integrations with generative AI features in Google Cloud Vertex AI. Four out of those five integrations referred to Knowledge Graphs. Even though we’ve been keeping an eye on Neo4j for a long time, this is the first time we recall its messaging emphasizing Knowledge Graphs in this way.
Knowledge Graphs, like AI, is one of those super-hyped terms used in so many different ways by so many different people that it ends up almost meaningless. It has been claimed that there are over 100 different definitions of Knowledge Graphs. It seems somewhat intuitive, however, that a Knowledge Graph should be about knowledge, and therefore about meaning, semantics, and schema.
Historically, there have been two different categories of graph databases, based on the data model they utilize: labeled property graphs (LPG) and RDF. Although there is convergence under way, LPG has historically emphasized operations and ease of use, while RDF has emphasized meaning and semantics. Neo4j is a typical LPG graph database, so the Knowledge Graph messaging is an interesting twist. But maybe not the only one.
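The difference between the two models can be shown with plain Python structures (illustrative only, not any database’s actual storage format). In an LPG, relationships carry properties directly; in RDF, everything is a subject-predicate-object triple, so qualifying a relationship takes extra triples, here via a simple reification pattern. The names and values are invented for the example.

```python
# Labeled property graph: the relationship itself holds key/value properties.
lpg_relationship = {
    "start": "Alice",
    "type": "WORKS_AT",
    "end": "Acme",
    "properties": {"since": 2019},
}

# RDF: only triples exist, so attaching 'since' to the relationship
# requires a statement about the statement (reification).
rdf_triples = [
    ("Alice", "worksAt", "Acme"),
    ("stmt1", "subject", "Alice"),
    ("stmt1", "predicate", "worksAt"),
    ("stmt1", "object", "Acme"),
    ("stmt1", "since", "2019"),
]
```

The trade-off in miniature: the LPG form is compact and operationally convenient, while the RDF form makes every assertion a first-class, uniformly addressable statement, which is what gives RDF its strength for semantics and shared vocabularies.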
Although RDF, meaning, semantics, and schema never were first-class citizens in Neo4j, there is some support for them. More specifically, there is a plugin called neosemantics which enables the use of RDF and its associated vocabularies (like OWL, RDFS, SKOS and others) in Neo4j. Recently, neosemantics reached 1 million downloads.
This is an important milestone in and of itself, but to put it in context, consider that about 18 months ago neosemantics was at 20K downloads. Could this signify a change in what Neo4j’s users are looking for, and therefore, a change in direction for the company as well?
“We are following what our customers are telling us. And I think if there is more interest in building Knowledge Graphs and having more semantic graph capabilities in Neo4j, we will follow that and we will absolutely go build more capabilities to support organizations in that”, Hasbe said.
Cloud and Ease of use
That’s a departure from Eifrem’s position on RDF in 2020, but then again, the numbers speak for themselves. Hasbe noted that he is new in his role, having been with Neo4j for only three months, so he is still learning and evaluating options. He also shared that he has met with many customers in this short time.
Neo4j is building a customer advisory board and adding some of the largest organizations in the world to get continuous feedback. The goal, Hasbe said, is to learn from them and identify what capabilities will make more sense in the product.
But that probably does not mean that everything will change, or that Neo4j is starting again with a blank slate. One thing that has remained the same in Hasbe’s priorities compared to the existing Neo4j roadmap is the emphasis on ease of use. “Five seconds to sign up, five minutes to wow” is the mantra Hasbe used, giving credit for it to Microsoft. That includes a number of things, from dashboards to LLMs.
Another existing priority that will remain is the emphasis on cloud. That makes sense considering not only Hasbe’s background and experience, but also broader market trends and Neo4j specific data. More than 90% of users today deploy Neo4j in the cloud, whether that’s Aura or self-managed.
Neo4j is and will remain cloud-first, Hasbe said. Furthermore, the goal is to integrate with more cloud ecosystems beyond Google, such as Snowflake, Databricks and Azure Fabric.
There will be renewed emphasis on Graph Data Science as well. Hasbe identified two different audiences for GDS: developers who want to enrich applications with graph-based analytics and insights, and data scientists who do ad-hoc analysis. These are different types of workloads (the latter are ephemeral while the former are not), and the goal is to support both equally well.
Last but not least, scalability. The size of data in organizations is only growing, so being able to support a cloud scale architecture and cloud scale capabilities is going to be another big area of focus for Neo4j.
Even though the overarching strategy is subject to evaluation as Hasbe gains more experience in his role, the initial directions seem like a logical evolution of Neo4j’s historical strategy, combined with new trends and insights.