Data modeling for APIs. Part 3: SOAP and XML
In the third part of this series of posts we take a look at some of the details of data modeling for APIs using an XML – SOAP approach. SOAP has been around for a while now, as it was designed as an object-access protocol in 1998 by Dave Winer, Don Box, Bob Atkinson, and Mohsen Al-Ghosein for Microsoft, where Atkinson and Al-Ghosein were working at the time. SOAP originally stood for ‘Simple Object Access Protocol’ but this acronym was dropped with Version 1.2 of the standard, which became a W3C recommendation in 2003. The SOAP specification is currently maintained by the XML Protocol Working Group of the World Wide Web Consortium.
At their core, SOAP Web Services are based on simple premises: the use of XML as a means of serializing objects and the use of a variety of protocols (such as HTTP, SMTP, TCP, or JMS) for wire transport. Thus SOAP Web Services achieve platform independence and protocol neutrality, as any implementation should be able to serialize/deserialize XML and send/receive it over the protocol of choice. SOAP revolutionized the Enterprise Application Integration domain, as it provided a standardized way for applications implemented on different platforms to communicate, and more specifically helped bridge the gap between the Java and Microsoft platforms.
SOAP Web Services come in two flavors, RPC and document style. In RPC style, the message body models a procedure call: the client sends a request naming the operation and its parameters and waits for a response or fault message from the server. In document style, a full XML document is passed between client and server inside a SOAP message. SOAP defines a message structure consisting of an envelope, an optional header, the body of the message and potentially a fault description, all encoded in XML. In addition to SOAP itself, the Web Service stack includes a discovery component (the UDDI registry) and a standardized way of describing service interfaces (WSDL). WSDL is an XML-based language for describing web service interfaces, while UDDI was meant to serve as a mechanism for registering service descriptions in order to enable service discovery and composition.
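To make the message structure concrete, here is a minimal sketch that assembles an envelope with an optional header and a body, using only the Python standard library. The envelope namespace is the SOAP 1.2 one; the application namespace, the `GetOrder` operation and the `AuthToken` header are made up for illustration and do not come from any real service.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# SOAP 1.2 envelope namespace.
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
# Hypothetical application namespace, invented for this example.
APP_NS = "http://example.org/orders"

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("ord", APP_NS)


def build_envelope(order_id: str, token: Optional[str] = None) -> ET.Element:
    """Build an envelope with an optional header and a mandatory body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    if token is not None:
        # The header is optional and carries out-of-band data.
        header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
        ET.SubElement(header, f"{{{APP_NS}}}AuthToken").text = token
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # RPC-flavored payload: an element named after the (hypothetical) operation.
    request = ET.SubElement(body, f"{{{APP_NS}}}GetOrder")
    ET.SubElement(request, f"{{{APP_NS}}}OrderId").text = order_id
    return envelope


def is_fault(envelope: ET.Element) -> bool:
    """Return True if the body carries a SOAP Fault instead of a payload."""
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    return body is not None and body.find(f"{{{SOAP_NS}}}Fault") is not None
```

Calling `ET.tostring(build_envelope("42", token="secret"), encoding="unicode")` yields the serialized message; a real client would then hand it to the transport of choice.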
Over time, additional Web Service specifications have been added to the core ones, collectively known as WS-* and covering aspects such as security and federation. In fact, this is one of the reasons SOAP Web Services have been criticised as being bloated and impractical. However, focusing on the data modeling aspect, it’s interesting to note that the use of SOAP Web Services enforces the use of XML Schema as well, typically included in the WSDL descriptions.
In practice many, if not most, SOAP Web Services evolve as a sort of wrapper layer added a posteriori over services originally developed as native applications. What this means in terms of data modeling is that XML Schema is used as an intermediary to enable data exchange, rather than as a data modeling language per se: applications have their own domain model, as per the language they are written in, and the tools that are used in order to expose the application API as a SOAP Web Service perform an auto-generated mapping from native classes and types to XML. On the client side, typically the same process is reversed, having client-side tools and libraries that generate stubs delegating calls to remote interfaces while also mapping the XML structures defined to classes and structures of the host language.
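As a rough illustration of what such an auto-generated mapping does, the sketch below serializes a native class to XML and back by walking its fields. The `Customer` class and both helper functions are hypothetical; real toolkits handle nesting, namespaces, optional values and many more types, whereas this sketch only covers flat classes with `str`, `int` or `float` fields.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Customer:
    """Hypothetical domain class, defined in the host language."""
    name: str
    age: int


def to_xml(obj) -> ET.Element:
    """Server side: emit one child element per field of a flat dataclass."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return root


def from_xml(cls, element: ET.Element):
    """Client side: rebuild an instance, converting text via each field's type."""
    kwargs = {f.name: f.type(element.findtext(f.name)) for f in fields(cls)}
    return cls(**kwargs)
```

Note that the XML here is purely derived from the class definition: the domain model lives in the host language, and the XML structure is a by-product of it.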
In situations like these, the use of XML as a data modeling language is virtually non-existent. Although the tooling used to wrap server and client implementations of SOAP Web Services may allow some customization of the mapping to/from XML, the data modeling itself takes place in the language of choice, typically using UML.
However, in situations where development is initiated with a view to exposing an API from the beginning, a different approach may be pursued, one that treats XML not just as the means of serialization, but uses XML Schema to develop the domain model from the start. Although this is not a typical approach, it is perfectly feasible and actually presents some advantages that make it an attractive choice for data modeling for APIs. So let us see what a schema development cycle in XML could look like:
1. Develop and document the domain model in XML, using XML Schema.
2. (Optional) Use a visualization tool to generate an overview of the schema.
3a. Use a framework to map from XML Schema to a domain object model in the language of choice, or alternatively
3b. Work directly with XML, utilizing an XML database in the back end.
4. Implement the API leveraging the generated domain model, adding CRUD support and custom functionality as needed.
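Steps 1 and 3a of this cycle can be sketched in a few lines. The schema below is a hypothetical one-type model, and `generate_classes` is a toy stand-in for a real schema-to-code framework: it assumes the schema uses the `xs` prefix, only looks at top-level complex types with a plain sequence of elements, and maps just three built-in types.

```python
import xml.etree.ElementTree as ET
from dataclasses import fields, make_dataclass

XS = "{http://www.w3.org/2001/XMLSchema}"

# Step 1: the domain model, written in XML Schema (hypothetical example).
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Product">
    <xs:sequence>
      <xs:element name="sku" type="xs:string"/>
      <xs:element name="quantity" type="xs:int"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

# Only a handful of built-in XSD types are mapped in this sketch.
TYPE_MAP = {"xs:string": str, "xs:int": int, "xs:double": float}


def generate_classes(schema_text: str) -> dict:
    """Step 3a: derive one dataclass per named complexType in the schema."""
    root = ET.fromstring(schema_text)
    classes = {}
    for ctype in root.findall(f"{XS}complexType"):
        members = [
            (el.get("name"), TYPE_MAP[el.get("type")])
            for el in ctype.findall(f"{XS}sequence/{XS}element")
        ]
        classes[ctype.get("name")] = make_dataclass(ctype.get("name"), members)
    return classes
```

With this in place, `generate_classes(SCHEMA)["Product"]` gives a class that can be instantiated as `Product(sku="A-1", quantity=3)`, which is the kind of generated domain model that step 4 builds on.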
Now let us see how this approach fares in terms of the criteria set previously to evaluate data modeling techniques for APIs:
- Semantic clarity & expressiveness. XML Schema has the capability to express basic definitions and constraints for data types contained in objects, and it also supports more advanced features such as complex types, namespaces, references and inheritance. However, it must be noted that modeling in XML is not always intuitive and straightforward, especially in cases involving complicated domain models.
- Modeling flexibility. The layered approach sketched above means that introducing a change entails going through the steps again. In addition, a change in the XML data model would break not only the server implementation, whose domain object model from step 3a has to be re-generated, but also the client, as a new schema means a new WSDL that will have to be consumed.
- Ease of use & communication. Rich tooling support for XML Schema gives it a good score on this dimension. While there is no widely accepted formalism for XML Schema visualization, there are ways to visualize XML Schemas as UML class diagrams, thus promoting ease of use and making the model accessible to a wider audience that is familiar with basic UML notation.
- Documentation & tooling support. There is a wide variety of tools that can be used to develop XML Schemas, both stand-alone and as plugins for IDEs, commercial and open source. Documentation is also abundant, as XML has been around for a long time and is well understood.
- Performance. While XML de/serialization and parsing are considered to be among the weak points of SOAP Web Services, along with the message verbosity imposed by XML, there has arguably been some progress on both issues. In terms of de/serialization, both implementations and the underlying hardware have come a long way since the early days of XML, so this is not the performance killer it once was. In terms of message verbosity, when this is an issue a binary XML approach can be adopted, even though this comes at the cost of additional performance overhead.
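To get a feel for the verbosity mentioned under the performance criterion, one can measure how much of a message is markup rather than actual data. The helper below is a rough sketch with a made-up name and sample message; it counts whitespace between elements as payload, so it is only meaningful for compact messages like the one shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical compact message, used only to exercise the helper.
MESSAGE = "<Product><sku>A-1</sku><quantity>3</quantity></Product>"


def markup_overhead(xml_text: str) -> float:
    """Return the fraction of the message's bytes that is markup, not text."""
    root = ET.fromstring(xml_text)
    # itertext() yields the character data of every element in the tree.
    payload = sum(len(text.encode("utf-8")) for text in root.itertext())
    total = len(xml_text.encode("utf-8"))
    return (total - payload) / total
```

For small records such as `MESSAGE`, the tags account for the vast majority of the bytes on the wire; the ratio improves as text content grows relative to element names.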
Even though this approach clearly has its own strengths and weaknesses, thanks to tooling support it comes with a property that makes it an interesting choice that can span both the REST and the SOAP approach. Focusing on the domain model first and treating it as the crystallization point that drives the API gives a REST-like quality to the API regardless of the actual mechanism used. This has been leveraged by tools such as Enunciate, which offer the option to support both REST and SOAP Web Services, using the same XML Schemas to describe both by means of WADL and WSDL respectively.
In the next part of the series we’ll look into a less popular, but very interesting alternative for data modeling for APIs, namely RDF and the Linked Data approach. Many thanks for the insight provided on the previous part of the series by members of the LinkedIn SOA SIG in this thread.
Part 1: Setting the stage | Part 2: REST and JSON | Part 3: SOAP and XML | Part 4: Linked Data and SPARQL | Part 5: Modeling vs. Meta-Modeling