Data Modeling for APIs. Part 2: REST and JSON

In the second part of this series of posts we start looking into the implications, from a data modeling perspective, of choosing between a SOAP and a REST approach to implementing APIs. For most people a SOAP API is associated with an XML data model, while a REST API is associated with a JSON data model. So before proceeding to explain why this ain’t necessarily so, let us first see what the options for modeling are in the XML and the JSON world respectively.

JSON is typically perceived as a format whose main advantage is that it is simple and lean: it can be used without knowing or caring about any underlying schema, happily hacking away setting and getting values. It is also considered more lightweight than XML in terms of serialization, as it is less verbose. JSON originally evolved as a serialization format for JavaScript, and this perception mirrors its primary context of use. Of course it was not long before people realized that the lack of a contract specifying the structure of the data exchanged between two points (i.e. a schema) may seem like a benefit at first, but is not necessarily one.
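To make the "setting and getting values" style concrete, here is a minimal sketch using Python's standard `json` module. The payload and its field names are made up for illustration.

```python
import json

# Hypothetical payload, as it might arrive from an API call.
raw = '{"id": 42, "name": "Ada", "tags": ["admin"]}'

user = json.loads(raw)          # parsing needs no schema at all
user["name"] = "Ada Lovelace"   # set a value
user["tags"].append("editor")   # change nested data freely
email = user.get("email")       # a field the sender never provided

print(json.dumps(user))
print(email)
```

Note that nothing complains about the missing `email` field: the consumer simply gets `None`. This is convenient, and it is also exactly the kind of silent mismatch that a contract would catch.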

This approach may work when the exchanged structures are simple and stable, and the parties in charge of the two communicating points are able to sync and maintain a common understanding of the shared data objects without a formal specification. But in many real-life situations these conditions simply cannot be met, and sooner or later the situation escalates into a maintenance nightmare. Hence the need for a schema became evident in the JSON world, and voilà: JSON Schema enters the picture.

Setting aside the debate over whether a schema makes sense at all for certain people and situations, the approach to using JSON Schema differs from the approach to using XML Schema. First of all, there is considerably less tooling to choose from than for XML Schema, and it’s not hard to imagine why: JSON is a format that has been around for a shorter time than XML and is used in situations where the existence of a schema is not always deemed necessary.

But what is perhaps even more important is that these tools seem to work in a kind of reverse / retrospective logic: you feed them some samples in the form of JSON fragments, and they reverse-engineer a schema out of them. Since this approach is not always enough to generate a complete schema, the outcome can then be edited manually to add the missing specifications. There are also a couple of visualization tools / hacks available to meet the need for a quick overview of a schema.
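The reverse-engineering logic can be illustrated with a small sketch. This is not the algorithm of any particular tool; `infer_schema` is a hypothetical helper that only covers the basic JSON types, and the sample object is invented.

```python
import json


def infer_schema(value):
    """Derive a JSON Schema-like dict from a single sample value."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Assumption: the first element is representative of all items.
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            # Guess: every key seen in the sample is required.
            "required": sorted(value),
        }
    raise TypeError(f"not a JSON value: {value!r}")


sample = json.loads('{"id": 42, "name": "Ada", "tags": ["admin"], "active": true}')
schema = infer_schema(sample)
print(json.dumps(schema, indent=2))
```

The two commented guesses (representative first item, all keys required) are the sort of thing a single sample cannot settle, which is why the generated schema usually needs a manual pass afterwards.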

Still, for some people / situations, using a schema to validate the data exchanged is overkill: sometimes all that is needed is for the data objects associated with API calls to be adequately described, so that anybody using the API can understand what is passed around and can extract values and create objects in a way that complies with the API designer’s intentions. So a schema is not, strictly speaking, part of the requirements there, although it can be present. This really falls under the category of API documentation, and you can check a nice writeup of the options here, with the notable exception* of Rest.li, which is LinkedIn’s recently open-sourced framework for REST API ecosystems.

So what could a schema development cycle in the JSON world look like, supposing you actually chose to include a schema in the first place rather than skipping that part and going directly to step 5 of the following process?

1. Create sample JSON for data objects to be exchanged
2. Feed those to the JSON Schema tool of choice in order to generate a corresponding schema
3. Review and modify/add to the generated schema as needed
4. (Optional) Use a visualization script to generate an overview of the schema
5. Use one of the available REST API metadata frameworks to generate documentation for the API
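Once a schema has been reviewed, it can also be used to check the data actually exchanged. The sketch below is a deliberately tiny validator that understands only the `type`, `properties`, `required` and `items` keywords; a real project would rely on a full JSON Schema validation library instead. `USER_SCHEMA`, `validate` and the sample objects are all hypothetical.

```python
# Type checks for the JSON Schema primitive types (bool is excluded from numbers).
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
}

# A hand-reviewed schema: "tags" was made optional after review.
USER_SCHEMA = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["id", "name"],
}


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means the instance is valid."""
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](instance):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}.{key}: required property missing")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(validate(instance[key], sub, f"{path}.{key}"))
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors.extend(validate(item, schema["items"], f"{path}[{i}]"))
    return errors


good = {"id": 42, "name": "Ada", "tags": ["admin"]}
bad = {"id": "42", "tags": ["admin", 7]}

print(validate(good, USER_SCHEMA))
for message in validate(bad, USER_SCHEMA):
    print(message)
```

Returning a list of messages rather than raising on the first problem makes it easier to report every mismatch to the other party in one go.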

Now let us see how this approach fares in terms of the criteria set previously to evaluate data modeling techniques for APIs:

Semantic clarity & expressiveness. JSON Schema has the capability to express basic definitions and constraints for data types contained in objects, and it also supports some more advanced features such as properties typed as other objects, inheritance and links. Note: there is also the recently standardized JSON-LD extension which deserves special reference in an upcoming extension of this post.
Modeling flexibility. With the above process as reference, introducing a change means going through the steps again, and the loop from step 2 to step 3 in particular could be a painful one. On the other hand, in terms of data exchanged, if there is no validation and no language-mapping tools are used, the implications are minor to non-existent.
Ease of use & communication. The lack of mature modeling and visualization tools for JSON gives it a low score on this dimension. A data model developed in JSON will be quite hard to communicate to anyone without a developer background.
Documentation & tooling support. There is some documentation on JSON Schema and some references to and documentation on REST API metadata frameworks, as well as a number of tools for modeling and visualization. Overall, however, it seems that this approach to modeling in JSON is not very popular, hence the lack of extensive documentation and examples.
Performance. This is one of the strong points of JSON, as it was designed to be quick and flexible in terms of serialization and has good tooling support. Of course, a text format will struggle to match a binary one, so if you really have a need for speed, a binary format such as BSON may be the way to go.
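As a rough illustration of the verbosity argument behind the performance criterion, the sketch below serializes the same made-up record as compact JSON and as element-per-field XML, then compares byte counts. The record and the XML layout are assumptions; other XML designs (attributes instead of elements, for instance) would give different numbers, and size is only one aspect of performance.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record used for both serializations.
record = {"id": 42, "name": "Ada", "email": "ada@example.com"}

# Compact JSON: no whitespace between tokens.
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")

# XML with one child element per field, without an XML declaration.
root = ET.Element("user")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
xml_bytes = ET.tostring(root)

print(len(json_bytes), "bytes as JSON")
print(len(xml_bytes), "bytes as XML")
```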

In the next part of the series, we’ll delve into the specifics of data modeling for SOAP APIs.

* Another late addition to this list was the FOS REST API Bundle for Symfony, brought to my attention via a LinkedIn thread and apparently getting some traction in the PHP community.

Part 1: Setting the stage | Part 2: REST and JSON | Part 3: SOAP and XML | Part 4: Linked Data and SPARQL | Part 5: Modeling vs. Meta-Modeling