Aerospike Graph: A new entry in the graph database market, aiming to tackle complex problems at scale
“Graph database growth is going strong through the Trough of Disillusionment.” And “Graph Analytics go big and real-time.” These were two of the headlines of the Spring 2023 update of the Year of the Graph newsletter. In combination, they seem like an appropriate summary of the reasoning behind a new entry in the graph database market: Aerospike Graph, which Aerospike officially unveiled in June 2023.
Aerospike Graph is the culmination of a long journey that started about three years ago, as the company’s Chief Product Officer Lenley Hensarling shared with me during a recent podcast. We caught up to discuss the steps in that journey, as well as Aerospike’s differentiation in a very densely populated market. I captured some of the highlights of our discussion below.
Aerospike: An open-source NoSQL database
To understand Aerospike’s foray into graph, some context is necessary.
Aerospike has been around since 2014. The name “Aerospike” derives from the aerospike engine, a type of rocket nozzle that can maintain its output efficiency over a large range of altitudes. It is intended to evoke the software’s ability to scale up.
Going over Aerospike’s history and his own involvement with the company, Hensarling referred to Aerospike’s founders Brian Bulkowski and Srini V. Srinivasan. Bulkowski came from a networking and storage background, while Srinivasan came from a theoretical database background. The combination of those two backgrounds was what shaped Aerospike.
Aerospike has a shared-nothing, distributed architecture and is written in C. It operates in three layers: data storage, self-managed distribution, and a cluster-aware client. In 2014, the database was open-sourced under the AGPL 3.0 license for the server and the Apache License Version 2.0 for the client software development kit.
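To make the idea of a cluster-aware client more concrete, here is a minimal Python sketch of how a client in a shared-nothing system could route each key straight to the node that owns it. This is an illustration under stated assumptions, not Aerospike's actual client code: the class and function names, the round-robin partition assignment, and the use of SHA-1 as the hash are all hypothetical stand-ins, and the partition count is simply an illustrative constant.

```python
import hashlib

# Illustrative constant: number of fixed partitions the key space is split into.
N_PARTITIONS = 4096


def partition_id(key: str) -> int:
    """Map a key to a partition by hashing it (SHA-1 is a stand-in hash)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "little") % N_PARTITIONS


def build_partition_map(nodes):
    """Assign partitions to nodes round-robin.

    A real cluster would rebalance this map as nodes join or leave;
    the sketch only shows the lookup structure.
    """
    return {pid: nodes[pid % len(nodes)] for pid in range(N_PARTITIONS)}


class ClusterAwareClient:
    """Hypothetical client that holds the partition map locally."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.partition_map = build_partition_map(self.nodes)

    def node_for(self, key: str) -> str:
        # One local lookup, no coordinator hop: the client already
        # knows which node owns the partition for this key.
        return self.partition_map[partition_id(key)]


client = ClusterAwareClient(["node-a", "node-b", "node-c"])
owner = client.node_for("user:42")
```

Because the client resolves the owning node itself, no node has to act as a router for the others, which is the property a shared-nothing design relies on when nodes are added or removed.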
Aerospike’s hypothesis was that companies need to apply more data in a short time window with a predictable SLA and do that in a cost-effective manner in the cloud and on-premises. The engine started out as a key-value store, with “hybrid memory” as its main feature. As Hensarling explained, hybrid memory is the name Aerospike uses for its indexing approach.
Eventually, Aerospike expanded its initial offering to include the document model (JSON) as well as a SQL interface via Starburst. Graph was the next step and, according to Hensarling, the move was driven by customers.
Historically, Aerospike has had a large footprint in adtech as well as banking, financial services and payments. These are industries in which identity resolution is a key operation, and identity resolution is getting increasingly complex. More and more identity resolution solutions triangulate data to identify usage patterns that could point to identity theft or fraud.
What the Aerospike team noticed was that an increasing number of customers were developing custom solutions that leveraged graph processing as part of their identity resolution solutions. A common usage pattern emerged, as customers were building solutions leveraging the Apache TinkerPop open-source graph framework using Aerospike as the storage layer.
After talking to all those customers and evaluating the potential, Aerospike embarked on a journey to implement an “official” graph solution based on Apache TinkerPop. What encouraged Aerospike to invest in this was that customers were able to scale to petabytes while keeping access times low and data partitions balanced.
“We have a very large customer who’s in the payment systems business. And they were using a graph solution like the one we’re talking about here – built on TinkerPop at scale: billions of vertices and thousands of edges connecting them all. And the first time I saw that I went – Wow! Why don’t we do something in that space?”, Hensarling said.
Database performance and scalability
Aerospike embarked on its journey to graph roughly three years ago. However, it wasn’t all smooth sailing. The initial implementation created some hotspots that compromised performance and scalability. Realizing this, Aerospike ramped up their efforts and went back to the drawing board. A new product manager was hired to work on Aerospike Graph – Ishaan Biswas.
Biswas worked closely with Aerospike customers to understand their needs and learn from their experience. Eventually, a new team was built for Aerospike Graph, including Apache TinkerPop founder Marko Rodriguez. Rodriguez is a well-respected figure in the industry, and the team benefited immensely from his involvement.
The development of Aerospike Graph also involved key contributors in the Apache TinkerPop project. Their expertise enabled Aerospike Graph to overcome its early issues and create a graph layer that interfaces with the core Aerospike engine in a way that scales out horizontally in a shared-nothing architecture.
“If you have high throughput, and since it’s shared nothing [architecture], you can spin nodes up and down as you have to in terms of how much throughput you need and how many connections you’re going to have. You have the ability to scale while you have the persistence handled elsewhere. It’s a big cost savings when you have variable workloads, so there’s a great deal of elasticity built into the solution”, Hensarling said.
Since Aerospike is a new vendor in the graph database market emphasizing performance and scalability, it’s only natural to wonder whether it has benchmarks to back up those claims. According to Hensarling, Aerospike Graph was extensively benchmarked internally, but those benchmarks are not publicly available at this point.
Either way, benchmarks are always highly disputed and can only serve as an indication. Hensarling claims that Aerospike Graph has been battle tested with real workloads in many customer deployments and has earned its stripes.
Coincidentally, it was only a few days after Aerospike Graph’s official unveiling that another database vendor with a similar profile who had recently entered the graph database market announced its exit. In 2019, Redis introduced their graph database, citing similar reasoning: they wanted to offer performance and scalability. In 2023, they wound down RedisGraph saying:
“Many analyst reports predicted that graph databases would grow exponentially. However, based on our experience, companies often need help to develop software based on graph databases. It requires a lot of new technical skills, such as graph data modeling, query composition, and query optimization. As with any technology, graph databases have their limitations and disadvantages.
This learning curve is steep. Proof-of-concepts can take much longer than predicted and the success rate can be low relative to other database models. For customers and their development teams, this often means frustration. For database vendors like Redis, this means that the total pre-sales (as well as post-sales) investment is very high relative to other database models”.
Adding to this is the fact that the graph market is bustling with vendors vying for market share. A total of circa 50 solutions offer graph capabilities as per DB Engines if we take into account RDF stores as well. And peak hype for graph databases seems to be behind us. Should Aerospike be cautious about its investment?
Hensarling acknowledged that there is some truth in Redis’ reasoning. However, he emphasized that there is a fundamental difference between Redis and Aerospike: their audience. Redis is meant to be simple and joyful, according to its own RedisGraph End-of-Life Announcement. Aerospike, on the other hand, is meant for tackling complex problems at scale per Hensarling.
“We differentiate along a couple of specific vectors. First, the notion of real time. It’s not just being able to respond to queries with low latency. It’s the ability to do that consistently regardless of what the load is. Even when the data set has grown to significant size.
We have a T-shirt that we hand out at Meetups for Aerospike that says – ‘Write once, scale forever’. We call this aspirational scale: Even if your solution in the beginning has maybe thousands of users, not hundreds of thousands, not millions. You still need to plan for that number of users, that kind of throughput, from the start”, he noted.
Aerospike’s target audience does not expect that tackling their problems is going to be easy, and Aerospike does not expect that selling to this segment is going to be easy either. However, the goal is always to reduce friction as much as possible, and Aerospike is trying to do this in a number of ways.
To begin with, there’s a 60-day free trial for Aerospike Graph. The idea is to let people experiment with the product before they adopt it. In addition, Hensarling noted that Aerospike has put together resources to help developers get started, as well as some templates with pre-modeled graphs for specific use cases.
Those pre-modeled graphs are focused on identity resolution and may come in handy for developers not familiar with graph modeling. That is also related to another aspect of using Aerospike Graph, namely familiarizing oneself with Gremlin, Apache TinkerPop’s query language. Aerospike has partnered with G.V(), a dedicated Gremlin IDE that can be used for quick prototyping and iterative adjustments of data models and queries as well as to visualize connected datasets.
Gremlin differs from other graph query languages such as Cypher or SPARQL, however, in that it is imperative rather than declarative. In Gremlin, developers have to specify exactly how their graphs will be traversed, step by step. In other graph query languages, developers only have to specify the pattern in the data that they are looking for.
That means that Gremlin requires more involvement to use and may not appeal to everyone. However, Hensarling argues that this is actually in line with the types of use cases Aerospike Graph is aiming for. The reasoning there is that when trying to tackle complex problems at scale, it makes sense to be intricately involved in things such as data distribution and optimal path traversals to optimize solutions.
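The imperative versus declarative distinction described above can be sketched in plain Python over a toy in-memory graph. This is not Gremlin, Cypher, or SPARQL, and it does not use any Aerospike or TinkerPop API; the edge data, the "uses" label, and all function names are hypothetical. The first function spells out each hop in order, loosely in the spirit of a Gremlin traversal such as g.V(...).out('uses').in_('uses'). The second hands a pattern to a generic matcher and lets it find the bindings.

```python
# Toy graph as (source, edge_label, target) triples; hypothetical data.
EDGES = [
    ("acct:alice", "uses", "device:phone1"),
    ("acct:bob", "uses", "device:phone1"),
    ("acct:bob", "uses", "device:laptop7"),
    ("acct:carol", "uses", "device:laptop7"),
    ("acct:dave", "uses", "device:tablet3"),
]


def out(vertices, label):
    """Step along outgoing edges with the given label."""
    return [t for v in vertices for (s, l, t) in EDGES if s == v and l == label]


def in_(vertices, label):
    """Step along incoming edges with the given label."""
    return [s for v in vertices for (s, l, t) in EDGES if t == v and l == label]


def imperative_shared_device(start):
    """Imperative style: the caller dictates each hop, in order."""
    devices = out([start], "uses")      # hop 1: account -> devices
    accounts = in_(devices, "uses")     # hop 2: devices -> accounts
    return sorted({a for a in accounts if a != start})


def match(pattern, binding=None):
    """Declarative style: yield every binding that satisfies the pattern.

    Terms starting with '?' are variables; the matcher, not the caller,
    decides how the edges are scanned.
    """
    binding = binding or {}
    if not pattern:
        yield binding
        return
    (s, label, t), rest = pattern[0], pattern[1:]
    for (es, el, et) in EDGES:
        if el != label:
            continue
        new = dict(binding)
        ok = True
        for term, value in ((s, es), (t, et)):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, new)


def declarative_shared_device(start):
    pattern = [(start, "uses", "?d"), ("?other", "uses", "?d")]
    return sorted({b["?other"] for b in match(pattern) if b["?other"] != start})
```

Both functions answer the same identity-resolution style question, namely which other accounts share a device with a given account. The difference is who controls the traversal order: in the imperative version the developer does, which is the kind of control Hensarling argues matters for complex problems at scale.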
At this time, Aerospike Graph is meant to be used primarily in the types of use cases that inspired it: those requiring real-time graph processing. Adding OLAP capabilities for graph analytics is something that the team is already working on.
As far as the roadmap goes, the other big item on Aerospike’s list is making Aerospike Graph a part of its DBaaS offering. That may not sound like much, but it’s a lot of work that the team hopes will pay off as DBaaS becomes a cornerstone of Aerospike’s strategy.
All in all, the path that Aerospike has chosen for adding graph to its capabilities seems to make sense: starting out from what customers are doing, learning from their requirements and experience, building a team with the right expertise and treading with caution. What remains to be seen is whether the complexity bet will pay off, as well as how much the offering can be simplified to attract a wider audience.
To hear my interview with Aerospike’s Lenley Hensarling, tune in to our recent podcast discussion.
DISCLOSURE: This article is sponsored by Aerospike. Editorial control exercised by the author.