Data Modeling for APIs. Part 1: setting the stage

Lately we’ve been engaged in the design of a data model for a project that aims to deliver an analytics API in the energy domain. As there is an ongoing debate in the consortium about the type of API to implement (RESTful vs. Web Services), we have been asked to provide feedback on the implications of each choice from a data modeling point of view. This was an opportunity to revisit the space, take stock of the latest developments and see how relevant they are in the real world today, so we will briefly recap our findings and thoughts in a series of posts.

[Figure: Data Modelling Today]

To begin with, when you set out to develop a data model that will be used directly by an API rather than by an application (which may or may not provide an API at some point), your considerations and priorities are a bit different: instead of starting off at the database level, you aim directly at the API level. But wait a minute, I hear you ask, I thought data modeling was supposed to begin at the conceptual level, right? Right. With the occasional quick-n-dirty hack as an exception, modeling at the database level typically starts with ERDs (or UML extensions for DB models). From the ERDs you would derive the db schema, from that the object domain model of the storage layer, and from that the object domain model of the API layer.
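To make the db-first chain concrete, here is a minimal sketch in Python of the three derived layers: db schema, storage-layer object and API-layer object. Everything in it is an assumption made for illustration: the `meter` entity, its columns and the `MeterRow` / `MeterResource` names are hypothetical and do not come from the project, and SQLite stands in for whatever database would actually be used.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass

# Step 1 (db schema): DDL derived from a hypothetical ERD with a single entity.
SCHEMA = """
CREATE TABLE meter (
    id INTEGER PRIMARY KEY,
    serial_no TEXT NOT NULL,
    installed_on TEXT NOT NULL
);
"""


# Step 2 (storage layer): an object that mirrors one table row.
@dataclass
class MeterRow:
    id: int
    serial_no: str
    installed_on: str


# Step 3 (API layer): the shape exposed to API clients.
# It is derived from the storage object and hides the surrogate key.
@dataclass
class MeterResource:
    serial: str
    installed: str


def to_resource(row: MeterRow) -> MeterResource:
    """Map a storage-layer object to its API-layer representation."""
    return MeterResource(serial=row.serial_no, installed=row.installed_on)


def load_meters(conn: sqlite3.Connection) -> list:
    """Read all rows of the meter table into storage-layer objects."""
    cur = conn.execute(
        "SELECT id, serial_no, installed_on FROM meter ORDER BY id"
    )
    return [MeterRow(*record) for record in cur.fetchall()]


def demo() -> str:
    """Walk one record through schema, storage object and API payload."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO meter (serial_no, installed_on) VALUES (?, ?)",
        ("SN-001", "2014-01-15"),
    )
    rows = load_meters(conn)
    conn.close()
    return json.dumps([asdict(to_resource(row)) for row in rows])


if __name__ == "__main__":
    print(demo())
```

Note that the API-layer shape is the last thing to be defined here, and every layer above it has to be touched when it changes. That is the ordering a db-first approach implies.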

This is a widely used, well-documented and well-understood technique with rich tooling support. It just works, and most architects and developers are familiar with it. That does not necessarily mean it is flawless or the best fit for an API data model, but in any case it will be useful for setting the stage in terms of how to evaluate our options.

So what would be the requirements for our data modeling technique? Below we introduce the features on which we will evaluate different options, and then look at how the db-first approach fares on each of them.

  1. Semantic clarity & expressiveness. A data modeling approach must offer a notation that enables elegant and complete modeling of the domain at hand.
  2. Modeling flexibility. Every data model, no matter how well-crafted, is bound to change at some point. Our data modeling approach needs to anticipate and allow changes in the model as seamlessly as possible.
  3. Ease of use & communication. A data model serves many needs, one of the most fundamental of which is the need to interact with project stakeholders, document their domain knowledge and validate your modeling. So our modeling approach needs to be simple enough for stakeholders to follow, with a notation that can also be visualized.
  4. Documentation & tooling support. Anything that makes a modeler’s life easier is well appreciated. Even the most elegant approach will not get much use if it is poorly documented and requires you to hand-craft your models. Support for generating additional artifacts from data models is also essential.
  5. Performance. While at first this does not seem related to a data model, one must keep in mind that, especially when talking about API use, your conceptual data model in the sky will have to stand the test of serialization and exchange of large data volumes. Although performance is not an integral property of the data modeling approach per se, the performance implications of the artifacts generated from it are something to consider.

How, then, does the good old db-first approach fare in terms of the above criteria?

  1. Semantic clarity & expressiveness. In the relational DB world, the data definition language (DDL) is SQL. While straightforward, SQL is not the most expressive way to model a domain: as commonly used it lacks, for example, a first-class notion of inheritance. The only semantic relationship that is clearly expressed in SQL is association, using foreign keys.
  2. Modeling flexibility. For non-trivial changes, a relational data model can be quite messy to update, since a change to the schema usually also means migrating the existing data and adapting the object models derived from it.
  3. Ease of use & communication. Being well-established, widely used and understood, and having a graphical notation (ER diagrams), relational data modeling scores quite well here.
  4. Documentation & tooling support. Again, the maturity of relational data modeling makes it shine in this department, as there is lots of documentation and excellent tooling support available.
  5. Performance. In the case of relational data models, serialization and data exchange do not depend on the modeling at this level, but rather on the domain object model the relational model is translated to and on the communication mechanism used at that level.
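The expressiveness point in the list above can be illustrated with a small sketch. It assumes a hypothetical domain (`Device`, `SmartMeter`, `reading`) and uses SQLite only as a convenient stand-in for a relational database. In the object model, "a smart meter is a device" is stated directly through inheritance. In the schema, the same fact is encoded as a foreign key, which is exactly how the plain association "a reading belongs to a smart meter" is encoded as well.

```python
import sqlite3
from dataclasses import dataclass


# Object model: the "is a" relationship is explicit in the class hierarchy.
@dataclass
class Device:
    id: int
    label: str


@dataclass
class SmartMeter(Device):
    interval_minutes: int


# Relational model (hypothetical tables): both relationships become foreign keys.
SCHEMA = """
CREATE TABLE device (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);
-- "smart_meter is a device", encoded as a shared primary key
CREATE TABLE smart_meter (
    device_id INTEGER PRIMARY KEY REFERENCES device(id),
    interval_minutes INTEGER NOT NULL
);
-- "reading belongs to a smart_meter", a plain association
CREATE TABLE reading (
    id INTEGER PRIMARY KEY,
    meter_id INTEGER NOT NULL REFERENCES smart_meter(device_id),
    kwh REAL NOT NULL
);
"""


def foreign_keys(conn: sqlite3.Connection, table: str) -> list:
    """Return (table, column, referenced_table, referenced_column) tuples."""
    # Each row of PRAGMA foreign_key_list is:
    # (id, seq, table, from, to, on_update, on_delete, match)
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return [(table, row[3], row[2], row[4]) for row in rows]


def describe_links() -> list:
    """Collect the foreign keys the schema declares for both relationships."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    links = foreign_keys(conn, "smart_meter") + foreign_keys(conn, "reading")
    conn.close()
    return links


if __name__ == "__main__":
    for link in describe_links():
        print(link)
```

Both links come back in the same shape. Nothing in the schema itself says that the first one means inheritance and the second one association; that knowledge lives in comments, naming conventions or the object model built on top.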

Now that the stage is set, stay tuned: next week we will cover how the type of API we are modeling for affects our choice of data modeling approach (and vice versa), in the case of RESTful vs. Web Service APIs.

Part 1: Setting the stage | Part 2: REST and JSON | Part 3: SOAP and XML | Part 4: Linked Data and SPARQL | Part 5: Modeling vs. Meta-Modeling
