Semantic Web and CMS: a symbiotic relationship
How structured data and CMS can work with each other
Working in a company that specializes in Knowledge Management and does pretty much all of its front-end development in a web-based environment, you quickly come to the realization that a good CMS is a key success factor. ‘Good’ in this context means everything you would normally expect – reliability, performance, modularity, ease of use, extensibility, etc. – plus one very important, yet missing, element: support for rich, extensible and standardised metadata. In other words, Semantic Web support.
There is definitely a case to be made for the kind of symbiotic relationship that could be developed between portal/CMS systems and the Semantic Web: on the one hand, the ability that Semantic Web standards provide to transparently integrate content from other sources is something any portal/CMS system would benefit from.
On the other hand, having content producers annotate (or, better yet, give access to) their content using Semantic Web standards means we’re getting closer to solving the ‘chicken-and-egg problem of missing semantic representations on the Web and the lack of their evaluation by standard search engines‘.
So why, you ask, doesn’t every portal and CMS out there support features like these? Many reasons, IMHO: relative complexity, lack of maturity and, did I mention the chicken-and-egg problem? So, when faced with the need to find a platform that would cover our requirements in this department, the only real option we had (besides some Semantic Wikis) was Drupal 7. Sadly, that was not even an option.
Don’t get me wrong, I have nothing against Drupal and I can say that their DERI-powered implementation seems powerful and flexible – although I have not really played with it. However, the skillset in our organisation, our existing codebase and the overall features we need all dictated that we stick with our portal/CMS of choice: Liferay.
So, we went ahead and produced our own Linked Data Module for Liferay and entered the 2009 Linking Open Data Triplification Challenge. The module exemplifies the Outbound part of our Inbound/Outbound Linked Data approach; it was built with open-source software and standard vocabularies and has been released under the LGPL license on Sourceforge.
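To give a rough idea of what the Outbound part involves, here is a minimal sketch (not the module’s actual code) of how a CMS item could be described in RDF with a standard vocabulary, in this case Dublin Core Terms, using the Apache Jena API. The class name, method and example values are hypothetical.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.DCTerms;

public class BlogEntryExporter {

    // Hypothetical example: describe a CMS blog entry as RDF using the
    // Dublin Core Terms vocabulary, so it can be served as Linked Data.
    public static Model toRdf(String entryUrl, String title, String author) {
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("dcterms", DCTerms.getURI());

        Resource entry = model.createResource(entryUrl)
                .addProperty(DCTerms.title, title)
                .addProperty(DCTerms.creator, author);

        return model;
    }

    public static void main(String[] args) {
        Model model = toRdf("http://example.org/blog/semantic-web-and-cms",
                "Semantic Web and CMS: a symbiotic relationship", "Author Name");
        // Serialize as RDF/XML so a Linked Data client can consume it.
        model.write(System.out, "RDF/XML");
    }
}
```

Serialized like this, the same content the portal renders as HTML becomes machine-readable data that other systems can link to.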
Needless to say, we’re eating our own dogfood: most of the applications we have developed for the Inbound part of the approach, such as Tag Disambiguation and Distributed Contextual View Retrieval, use this module.
So, why the update now? Apparently, Semantic Web standards for CMS are gaining traction – most notably RDFa. For many people, RDFa is the low-hanging fruit of the Semantic Web stack and the part they focus their attention on. This is not surprising, since:
- RDFa is relatively simple to grasp – definitely simpler than RDF (let alone OWL and everything that goes with it) and similar in some ways to microformats, so people can relate to it.
- RDFa is also easy to implement (it’s all about embedding some markup in your HTML, basically) and
- RDFa provides immediate benefits: annotating web pages using RDFa means they get linked to other resources on the web and carry more metadata that can be interpreted by, e.g., search engines (Google Rich Snippets and Yahoo! SearchMonkey) to display and rank them appropriately (see the sketch after this list).
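To make the ‘embedding some markup’ point concrete, here is a minimal, hypothetical sketch of how a portlet might produce an RDFa-annotated HTML fragment (RDFa 1.0 syntax) for a blog entry using Dublin Core Terms. In a real portal this markup would of course live in the view templates, and the names and values below are made up.

```java
public class RdfaSnippet {

    // Hypothetical illustration only: build an HTML fragment with RDFa 1.0
    // attributes (about/property) using the Dublin Core Terms vocabulary.
    public static String render(String entryUrl, String title, String author) {
        return "<div xmlns:dcterms=\"http://purl.org/dc/terms/\" about=\"" + entryUrl + "\">\n"
             + "  <h1 property=\"dcterms:title\">" + title + "</h1>\n"
             + "  <span property=\"dcterms:creator\">" + author + "</span>\n"
             + "</div>";
    }

    public static void main(String[] args) {
        System.out.println(render("http://example.org/blog/semantic-web-and-cms",
                "Semantic Web and CMS: a symbiotic relationship", "Author Name"));
    }
}
```

The `about` attribute names the resource the page is talking about, while each `property` attribute attaches a piece of metadata to it; a crawler that understands RDFa can extract exactly the same triples we produced in the earlier RDF sketch.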
However, I think RDFa is just the tip of the iceberg, or the bait to get web developers hooked on Semantic Web goodies – depending on which way you look at it. The Semantic Web stack and vision go way beyond that: RDFa is good for annotating web pages, giving them the aforementioned benefits, but it is also possible to offer or access the same metadata via a SPARQL endpoint, which means you do not have to parse HTML to extract it and you can perform complex queries on it remotely.
So, while it is clear that RDFa is useful at the web page level, if what you are interested in is offering or accessing raw data, then it makes much more sense to do that via a SPARQL endpoint. Please note that I deliberately refrained from pointing out issues with both RDFa and SPARQL, although they do exist, because the point I want to make is that they are both usable and useful and are here to stay. So let’s start thinking about what they can do for you!
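As an illustration of that remote-query scenario, here is a minimal sketch of a client querying a SPARQL endpoint with Apache Jena ARQ (one common choice, not something mandated by anything above). The endpoint URL, the query and the assumption that the data uses Dublin Core Terms are all hypothetical.

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;

public class RemoteSparqlExample {

    public static void main(String[] args) {
        // Hypothetical endpoint URL; any public SPARQL endpoint would do.
        String endpoint = "http://example.org/sparql";

        // Ask the endpoint for entries and their titles, assuming the
        // published data uses the Dublin Core Terms vocabulary.
        String queryString =
                "PREFIX dcterms: <http://purl.org/dc/terms/>\n"
              + "SELECT ?entry ?title WHERE { ?entry dcterms:title ?title } LIMIT 10";

        Query query = QueryFactory.create(queryString);
        try (QueryExecution qexec = QueryExecutionFactory.sparqlService(endpoint, query)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                QuerySolution solution = results.next();
                System.out.println(solution.get("entry") + " : " + solution.get("title"));
            }
        }
    }
}
```

No HTML parsing is involved: the query runs on the remote side and only the results come back, which is exactly the kind of thing RDFa alone cannot give you.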
I’m glad to say that the Liferay people are keen on the idea, so we have been discussing ways to include the Linked Data Module in an upcoming version of Liferay (6.x), as well as incorporating more Semantic Web features. This is pretty exciting, so we’ll keep you posted!