The Year of the Graph Newsletter: November 2018
October 2018 was the busiest month in the busiest year in graph history, hence the longest Year of the Graph newsletter to date. Neo4j lands a massive funding round, TinkerPop charts its path forward, the most important knowledge graph research event draws key industry presence, the W3C organizes a Workshop on Web Standardization for Graph Data, and new knowledge graphs and tools are out.
Neo4j just announced a new funding round of $80 million, almost as much as it had raised in the last 10 years combined. What this means in a nutshell: VCs validate the view that graph databases are going mainstream. Neo4j gets an even longer runway and the chance to widen its lead in the market. We will see further funding for the competition as well, probably sooner rather than later.
Read about the story behind our Series E fundraising announcement in this post by Neo4j CEO Emil Eifrem on the future of graph technology and AI.
After his sabbatical, Marko Rodriguez, Apache TinkerPop’s mastermind, returns with a paper outlining TinkerPop’s future. TinkerPop4 will follow the vision outlined last year, with development starting in spring 2019. An interesting read whether you are getting started with TinkerPop or have been using it for years.
“The Graph Traversal Machine: Close Encounters of the Fourth Kind” by @twarko and @spmallette. This article discusses the path forward on Apache TinkerPop’s next and final version: TinkerPop4 https://t.co/TzmgKoHaki #graphdb #nosql
If you are into more down-to-earth TinkerPop knowledge, Microsoft’s Jayanta Mondal explains how the way Gremlin queries are written impacts their performance. Gremlin is probably not like most query languages you are used to working with, and these tips apply not just to Azure Cosmos DB, but beyond it as well.
Gremlin is one of the most popular query languages for exploring and analyzing data modeled as property graphs. There are many graph-database vendors out there that support Gremlin as their query…
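Mondal’s point that query shape affects cost can be illustrated even outside Gremlin. Here is a minimal Python sketch (the graph, vertex names and properties are all invented for illustration, not taken from his post) showing why filtering before expanding neighbors touches fewer edges than filtering afterwards, which is the same intuition behind placing a Gremlin `has()` step before `out()`:

```python
# Toy property graph: vertices with a "role" property, plus an adjacency list.
# All names and data here are made up for illustration.
VERTICES = {
    "v1": {"role": "engineer"}, "v2": {"role": "manager"},
    "v3": {"role": "engineer"}, "v4": {"role": "sales"},
}
EDGES = {"v1": ["v2", "v3"], "v2": ["v4"], "v3": ["v4"], "v4": []}

def filter_then_expand():
    """Roughly g.V().has('role','engineer').out(): filter early."""
    edges_touched = 0
    result = set()
    for v, props in VERTICES.items():
        if props["role"] != "engineer":
            continue  # pruned before any neighbor expansion
        for n in EDGES[v]:
            edges_touched += 1
            result.add(n)
    return result, edges_touched

def expand_then_filter():
    """Expand every vertex first, filter afterwards: same answer, more work."""
    edges_touched = 0
    result = set()
    for v, props in VERTICES.items():
        for n in EDGES[v]:
            edges_touched += 1
            if props["role"] == "engineer":
                result.add(n)
    return result, edges_touched
```

Both traversals return the same vertices, but the early filter walks only the engineers’ edges, while the late filter walks every edge in the graph; on a real graph database that difference dominates query time.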
More Gremlin query tips from Microsoft’s Jayanta Mondal, this time on how to use pagination.
If you have looked for efficient solutions to paginate the results of your Gremlin queries, you may have stumbled upon this post on StackOverflow, however, perhaps only to realize that the pagination…
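The mechanics behind range-based pagination can be sketched in plain Python (the vertex list is invented; in Gremlin itself this corresponds to a `range(low, high)` step). The caveat the StackOverflow discussion raises is that, without server-side caching, each page typically re-runs the traversal from the start:

```python
def paginate(results, page, page_size):
    """Simulate Gremlin's range(low, high) step over a result list.

    In Gremlin this would look like g.V().hasLabel('person').range(low, high).
    Note that in a real traversal each page may re-execute the query up to
    `high`, which is the cost pagination strategies try to avoid.
    """
    low = page * page_size
    high = low + page_size
    return results[low:high]

# Stand-in for a list of vertices returned by a traversal.
people = [f"person{i}" for i in range(10)]
```

For example, `paginate(people, 0, 3)` yields the first page of three results, and `paginate(people, 3, 3)` yields the final, partial page.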
More query language news: if you’ve been following this topic, you must know about the Neo4j-led initiative for a common query language for property graphs. Now the W3C has embraced this initiative, and has announced a workshop on the topic to be held in Berlin this March.
Data is increasingly important for all organisations, especially with the rise of IoT and Big Data. The falling costs for storage and processing are driving interest in extracting competitive value from ever larger amounts of data through analytics and data hungry AI algorithms. In addition, organisations are seeking to exploit opportunities for sharing data within emerging digital ecosystems. W3C has an extensive suite of standards relating to data that were developed over two decades of experience.
ISWC is the most important semantic web conference. This year an incredible wealth of research was presented, but perhaps the highlight was the presence of industry giants such as Google, Microsoft, eBay and LinkedIn who elaborated on how they are using knowledge graphs at scale in the real world, without necessarily using Semantic Web technology. Juan Sequeda from Capsenta summarizes.
ISWC has been my go-to conference every year. This time it was very special for two reasons. First of all, it was my 10 year anniversary of attending ISWC (first one was ISWC2008 in Karlsruhe where I presented a poster that ultimately became the basis of my PhD research and also the foundational software of Capsenta).
One of the things that attracted attention at ISWC, and beyond, was Tim Berners-Lee’s initiative called Solid, which uses Semantic Web technology under the hood. Ruben Verborgh is one of the people working on it, and here he presents the vision and motivation behind Solid.
Personal data is being centralized at an unprecedented scale, and this comes with widely known and far-reaching consequences, considering the recent data scandals with companies such as Equifax and Facebook. Decentralizing personal data storage allows people to take back control of their data, and Semantic Web technologies can facilitate data integration at runtime…
One of the topics that permeated ISWC was using machine learning to enhance knowledge graphs. But perhaps this also works the other way round, according to researchers from DeepMind, Google Brain, MIT, and the University of Edinburgh. They propose the use of graph networks as a means to better generalize from one instance of a problem to another.
AI scientists at Google’s Google Brain and DeepMind units acknowledge machine learning is falling short of human cognition and propose that using models of networks might be a way to find relations between things that allow computers to generalize more broadly about the world.
Google has announced new features for search – Activity Cards, Collections and Dynamic organization of Search results. To enable these, Search has to understand interests and how they progress over time. To do this, Google has taken their existing Knowledge Graph and added a new layer, called the Topic Layer, engineered to deeply understand a topic space and how interests can develop over time as familiarity and expertise grow.
We’re introducing new features that help you resume tasks where you left off, keep track of ideas and content that you found useful, and get relevant suggestions of things to explore next.
dataCommons is a Google-led initiative for graph-shaped datasets which attempts to synthesize a single Knowledge Graph from publicly available data from open sources. It links references to the same entities across different datasets to nodes on the graph, so that users can access data about a particular entity aggregated from different sources. It can be accessed via Python notebooks, the dataCommons Knowledge Graph Browser, as well as APIs.
dataCommons attempts to synthesize a single Knowledge Graph from these different data sources. It links references to the same entities (such as cities, counties, organizations, etc.) across different datasets to nodes on the graph, so that users can access data about a particular entity aggregated from different sources.
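The entity-linking idea at the heart of dataCommons can be sketched in a few lines of Python. This is not the real dataCommons API or schema; the datasets, identifiers and properties below are invented to show how references from different sources resolve to one canonical node whose properties aggregate across datasets:

```python
# Two hypothetical source datasets that both describe the same county
# under different local identifiers (invented for illustration).
census = {"geoId/06085": {"name": "Santa Clara County", "population": 1938000}}
health = {"SCC-CA": {"name": "Santa Clara County", "obesity_rate": 0.19}}

# A link table resolving each local identifier to one canonical node id.
links = {"geoId/06085": "node/santa_clara", "SCC-CA": "node/santa_clara"}

def build_graph(datasets, links):
    """Merge records that resolve to the same canonical node."""
    graph = {}
    for dataset in datasets:
        for local_id, props in dataset.items():
            node = links[local_id]
            graph.setdefault(node, {}).update(props)
    return graph
```

After merging, a single lookup on the canonical node returns properties that originated in different datasets, which is exactly the aggregation dataCommons promises at Knowledge Graph scale.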
KBpedia is another comprehensive Knowledge Graph resource that has just been made available to the public. Its upper ontology (KKO), full knowledge graph, mappings to major knowledge bases, and logical concept groupings according to 70 largely disjoint typologies have just been open-sourced. The KBpedia knowledge structure combines seven ‘core’ public knowledge bases — Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL — into an integrated whole.
KBpedia is a comprehensive knowledge structure for promoting data interoperability and knowledge-based artificial intelligence, or KBAI. The KBpedia knowledge structure combines seven ‘core’ public knowledge bases – Wikipedia, Wikidata, schema.org, DBpedia, GeoNames, OpenCyc, and UMBEL – into an integrated whole. KBpedia’s upper structure, or knowledge graph, is the KBpedia Knowledge Ontology.
If you are still unsure as to what a Knowledge Graph is, Spencer Norris has a go at this, concluding they are closely related to ontologies, but the difference is not clear – size, maybe?
Recent attention from the research community has helped foment a significant debate among knowledge representation experts: what are knowledge graphs?
And what about the difference between ontologies and vocabularies? Holger Knublauch from TopQuadrant wonders, and a few pundits have their say.
Most of the published “ontologies” barely go beyond RDF Schema and simply define classes, properties, domains and ranges, but no OWL whatsoever. It would be more helpful to call those “vocabularies”, as their main contribution is to define names and URIs.
Back to knowledge graphs in the real world, here is one of the things Salesforce is using them for. What is an application network, and why does it need a graph? Phil Wainewright from Diginomica explains why MuleSoft, and Salesforce, are going graph. Hint: look out for more coming from Salesforce in the next few days.
What is an application network? And why does it need a graph? MuleSoft’s founder and CTO help explain today’s announcement at Dreamforce 2018
A new tool for working with Knowledge Graphs is out – Gra.fo from Capsenta. Gra.fo claims to be the only visual, collaborative, and real-time ontology and knowledge graph schema editor in existence. Many interesting features, perhaps the most surprising one: support for property graphs.
A common frustration we’ve encountered is the lack of adequate tooling around ontology and knowledge graph schema design. Many tools exist – some are overly complex, some are very expensive, and none allow one to work visually, collaboratively and in real time on a document with multiple concurrent users.
Neo4j works with property graphs, and not many people would think of it as their first choice for working with RDF. But if you are using Neo4j, and want to use RDF, there is a way – Jesus Barrasa’s Neosemantics plugin. Now there is a new version released, and Barrasa writes about some of the things you can do with it.
In this instalment of the QuickGraph series, I’ll show how to map a graph stored in Neo4j to an ontology (or schema, or vocabulary…) using the neosemantics extension. For this experiment, I’ve used the public IMDb Datasets available at https://datasets.imdbws.com/. I’ll use only a subset of the data (people, titles and genres) since the…
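The core of such a mapping is a translation table from property-graph labels and relationship types to ontology terms. Here is a minimal Python sketch of that idea; the labels, relationship types and IRIs are invented for illustration, and this is a conceptual stand-in, not the neosemantics plugin’s actual API:

```python
# Hypothetical mappings from property-graph labels and relationship
# types to ontology IRIs, in the spirit of a graph-to-RDF export.
label_mappings = {
    "Person": "http://schema.org/Person",
    "Movie": "http://schema.org/Movie",
}
rel_mappings = {"ACTED_IN": "http://schema.org/actor"}

def export_node(node_id, labels, label_mappings):
    """Translate a node's labels into RDF-style rdf:type statements."""
    return [(node_id, "rdf:type", label_mappings[l])
            for l in labels if l in label_mappings]

def export_rel(src, rel_type, dst, rel_mappings):
    """Translate one relationship into an RDF-style statement, if mapped."""
    if rel_type not in rel_mappings:
        return None
    return (src, rel_mappings[rel_type], dst)
```

Labels without a mapping are simply skipped, mirroring the common choice of exporting only the part of the graph the ontology covers.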
RDF is probably not something you would think of in the context of Docker. But Maximiliano Osorio, Carlos Buil-Aranda and Hernán Vargas have analyzed more than 150,000 Docker images hosted on Docker Hub, creating more than 100 million RDF triples that include links to external resources such as the Debian Package Tracking System and the Docker Store, and published their dataset.
DockerPedia is an RDF linked dataset that stores information about Docker images hosted on Docker Hub, including 4.5 million images, their layers and packages. We also provide vulnerability analysis of these packages obtained with Clair.
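The triple representation behind a dataset like this is easy to sketch. Below is a plain Python illustration, with invented identifiers and predicates rather than DockerPedia’s actual vocabulary, of storing image/layer/package facts as (subject, predicate, object) triples and matching them the way a single SPARQL triple pattern would:

```python
# Hypothetical triples in the spirit of DockerPedia; identifiers,
# predicates and the CVE are invented for illustration.
triples = [
    ("image:ubuntu-18.04", "hasLayer", "layer:abc"),
    ("layer:abc", "containsPackage", "pkg:openssl-1.1.0"),
    ("pkg:openssl-1.1.0", "hasVulnerability", "cve:CVE-0000-0001"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a basic graph pattern (None = wildcard),
    roughly what one SPARQL triple pattern does against a graph."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
```

Chaining such patterns (image to layer, layer to package, package to vulnerability) gives you the kind of query that answers “which images ship a vulnerable package”, which is precisely what publishing this data as RDF makes straightforward.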
How about using RDF datasets for testing hypotheses in the domain of politics? This is exactly what some academics are doing, using Wikidata.
How academics are using Wikidata to look for links between legislative behaviour and the biographies of Members of Congress
mySociety is a not-for-profit social enterprise that exists to invent and popularise websites and apps that enable citizens around the world to exert power over institutions and decision makers.
MarkLogic is usually not among the first names that come to mind when considering graph databases. MarkLogic’s value proposition is mainly built around working as an integration hub though, RDF is a very good match for this, and MarkLogic fully supports RDF and SPARQL. Now MarkLogic has just announced its data hub as a hosted service in the cloud.
New MarkLogic Data Hub Service delivers both agile data integration and agile data infrastructure – with unmatched security and governance, and predictable costs.
GraphDB also has a new version out, 8.7. We take a look at key new features, introducing data wrangling and embeddings as the foundation for semantic search and machine learning, and discuss where these may lead. If you want to hear from leaders and innovators in Graph Databases, Knowledge Graphs, Linked Data and AI and Machine Learning, there’s just a couple of days left to join us in Connected Data London.
With the knowledge graph hype in the air, more people than ever are looking to find out how and why the Googles, Airbnbs and Ubers of the world are using knowledge graphs, and how such practices can be adopted elsewhere. Usually with hype comes also confusion, and knowledge graphs are no exception.
If you are interested in fragments of the process of selecting a graph database, Louis Chen discusses his. Why fragments? Because only a very limited subset of the graph database options is evaluated, and a limited set of criteria is used as well. For the real deal, check out the Year of the Graph report.
Recently, I have been working with my client, Australia’s largest telecommunications company, to deliver a real-time data product. The product allows the client to visualize network performance/configuration overlaid on network and service topology.
To see the Year of the Graph research presented in November, you have 3 options: 1) Connected Data London, 2) Big Data Spain, and 3) Connected Data Berlin. The benefit of joining our Meetup in Berlin? It’s free, and you also get to see local heroes ArangoDB in action – check it out! Spots are limited, so get yours now to avoid disappointment.
What can you get for free in Berlin, that you would have to go out of your way to get in, say, London or Madrid? A few things, including the most up to date and comprehensive research on graph databases…
Would you like to receive the latest Year of the Graph Newsletter in your inbox each month? Easy – just sign up below. Have some news you think should be featured in an upcoming newsletter? Easy too – drop me a line here.