Messages sent across messaging channels should conform to a contract. When the messages are SOAP messages, the data transferred between the services is an XML payload.
These messages should fulfill a contract agreed upon by the services bilaterally or even multilaterally. A contract is an obligation that two or more services accept for sending and receiving messages. If a message does not fulfill the contract, it is rejected. Such a contract is very important because it defines the boundary within which each application can change independently of the others. It also guarantees that a message is fully acceptable to every service that has entered into the agreement. The question that arises is how a contract can be designed to fulfill this need.
In this article I assume that all messages sent across are well-formed XML messages. Non-XML documents can be transformed into XML documents by intermediary layers. Today it is common to use SOAP as the messaging protocol, and the payload of a SOAP message contains XML.
XML Schema
A schema describes the structure of the information. It establishes a standard vocabulary for creating XML messages. It sets constraints against which the validity of messages passed between systems can be checked, and so defines the message boundaries. For example, if a message contained an order for an item but the system accidentally added duplicates, multiple items could be sold for a single unit price. Validating the XML message against a well-defined schema could have prevented this. Indirectly, the schema encapsulates the business rule: in a customer order, the same item cannot appear more than once under a single unit price.
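One way such a rule could be expressed is with an XML Schema identity constraint. The sketch below is illustrative only: the element and attribute names (order, item, sku, quantity) are assumptions and do not come from any particular system. The xsd:unique constraint makes a validator reject an order in which two item elements carry the same sku value.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical order message: names are assumed for illustration -->
  <xsd:element name="order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="item" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="sku" type="xsd:string" use="required"/>
            <xsd:attribute name="quantity" type="xsd:positiveInteger" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
    <!-- No two item children of one order may share the same sku -->
    <xsd:unique name="uniqueItemPerOrder">
      <xsd:selector xpath="item"/>
      <xsd:field xpath="@sku"/>
    </xsd:unique>
  </xsd:element>
</xsd:schema>
```

With this constraint in place, a sender that wants several units of one item has to state that in the quantity attribute instead of repeating the item element.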
There are basically two kinds of validity -- the validity of content models and the validity of specific units of data.
Content model validity tests whether the order and nesting of tags is correct.
Datatype validity is the ability to test whether specific units of information are of the correct type and fall within the specified legal values.
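The two kinds of validity can be seen side by side in a small schema. This is a sketch with assumed names (shipTo, name, street, zip); the five-digit pattern is only an example of a legal-value restriction.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="shipTo">
    <xsd:complexType>
      <!-- Content model: name, street and zip must appear once each,
           in exactly this order, nested directly under shipTo -->
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="street" type="xsd:string"/>
        <xsd:element name="zip">
          <!-- Datatype: the text of zip must be exactly five digits -->
          <xsd:simpleType>
            <xsd:restriction base="xsd:string">
              <xsd:pattern value="\d{5}"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A message that places zip before name fails content model validation, while a message whose zip contains letters fails datatype validation even though its tags are in the right order.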
The vocabulary of an XML Schema document is comprised of about thirty elements and attributes.
The datatyping power of XML Schema can be seen in the declaration for <phoneNumber> in figure 2.
We begin by defining a phoneNumberType datatype, a string that must be exactly three digits, a hyphen, exactly three more digits, another hyphen, and then four digits:
<xsd:simpleType name="phoneNumberType">
  <xsd:restriction base="xsd:string">
    <!-- three digits, hyphen, three digits, hyphen, four digits -->
    <xsd:pattern value="\d{3}-\d{3}-\d{4}"/>
  </xsd:restriction>
</xsd:simpleType>
Figure 2: Phone Datatype
With the phoneNumberType datatype defined, it is now easy to declare a <phoneNumber>
element of that type, as seen in figure 3.
<xsd:element name="phoneNumber" type="phoneNumberType"/>
Figure 3: The phone Element Type
Using this schema, any XML message can be validated for conformance to the phone number structure.
Typical Design
At first, the applications are created to generate the XML messages, and the schema is created as an afterthought.
The schemas then turn out to be mismatched, and transformations are created to integrate them. The complexity of
the message is now pushed into the transformation system. Moreover, serious incompatibilities between the schemas
make the transforms harder to create. This results in extremely complex transformation logic whose bugs are
difficult to track. To correct the system already built, some of the schemas are remodeled and complex exceptions
are added to the transformation logic. Finally, the integration is completed despite the irregularities
in the XML messages.
Improved Design
For an improved design, follow these steps:
- Identify the XML instance.
- Write a schema based on this XML structure.
- Use this schema as the contract for the messages.
- Transform the message as needed.
- Create similar messages that use the same data and more.
- Add on to the existing schema.
- Repeat the steps iteratively.
The application should be written only after this flow is established. If necessary, a simple application
can be created to test these messages. Sometimes the applications already exist; even in that case, start
by creating an efficient XML structure and then follow the steps as enumerated.
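The "add on to the existing schema" step can be sketched with type extension. The names below (OrderType, RushOrderType, deliverBy) are hypothetical and serve only to show the mechanism: a later message that needs the same data and more extends the existing type instead of copying it.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.mySchema.org"
            xmlns="http://www.mySchema.org"
            elementFormDefault="qualified">

  <!-- First iteration: type derived from the initial XML instance -->
  <xsd:complexType name="OrderType">
    <xsd:sequence>
      <xsd:element name="item" type="xsd:string"/>
      <xsd:element name="quantity" type="xsd:positiveInteger"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Later iteration: a similar message carries the same data and more,
       so the existing type is extended rather than duplicated -->
  <xsd:complexType name="RushOrderType">
    <xsd:complexContent>
      <xsd:extension base="OrderType">
        <xsd:sequence>
          <xsd:element name="deliverBy" type="xsd:date"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:element name="order" type="OrderType"/>
  <xsd:element name="rushOrder" type="RushOrderType"/>
</xsd:schema>
```

Because RushOrderType is derived from OrderType, a change to the shared fields is made in one place and both messages stay aligned.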
Location Transparency
The message being sent can carry an xsi:schemaLocation attribute that indicates the location of the schema.
However, the messages need not know where the schema is located. The schema information can instead be
stored in a database, keyed by the receiving address, and the appropriate schema is selected from the
receiving address to validate the message. This means the SOAP message is interrogated for its
endpoint address, and the routing mechanism uses this information to query the database and retrieve the
appropriate schema. The reason for keying on the receiving endpoint is that the same message can be
validated only for the information that endpoint needs, so more specific schemas can be created
as required.
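For comparison, this is roughly what a message looks like when it names its own schema. The attribute belongs to the XML Schema instance namespace and holds pairs of namespace and location; the order element and the schema URL are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!-- Hypothetical message that carries its own schema location hint -->
<order xmlns="http://www.mySchema.org"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.mySchema.org http://www.mySchema.org/schemas/order.xsd">
  <item>widget</item>
  <quantity>1</quantity>
</order>
```

With the database approach the attribute is omitted and the lookup happens in the routing layer. The mapping it consults might be pictured as follows; the format, addresses and file names are entirely hypothetical and only stand in for the database rows.

```xml
<!-- Hypothetical rendering of the endpoint-to-schema table; not a standard format -->
<schemaRegistry>
  <endpoint address="http://billing.example.org/orders" schema="order-billing.xsd"/>
  <endpoint address="http://shipping.example.org/orders" schema="order-shipping.xsd"/>
</schemaRegistry>
```

Here the same order message would be validated against a billing-specific schema for one endpoint and a shipping-specific schema for the other.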
Versioning
Versioning is another critical aspect of the schema. A message can change over time, and there may be a requirement for both the old and the new message to flow through the same system. Versioning also allows flexibility: if multiple receiving points need variations in the data, only the relevant portions of the schema have to be extended.
There are multiple approaches to versioning a schema.
- Using Schematron: Schematron is another schema language. Its rules can be embedded in the XSD inside
the <appinfo> element of the XML Schema document, with constraints expressed
using <assert> elements. A Schematron processor extracts these directives from the XSD document
and creates a schema that is then used to validate the instance document.
Figure 4 below shows an example of Schematron embedded in a schema that can be used to validate an
XML message for version 2.0.0.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.mySchema.org"
            xmlns="http://www.mySchema.org"
            xmlns:sc="http://www.ascc.net/xml/schematron"
            elementFormDefault="qualified">
  <xsd:annotation>
    <xsd:appinfo>
      <sc:title>Version Validation with Schematron</sc:title>
      <!-- Bind the prefix v to the target namespace for use in the XPath tests -->
      <sc:ns prefix="v" uri="http://www.mySchema.org"/>
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:element name="ver">
    <xsd:annotation>
      <xsd:appinfo>
        <sc:pattern name="Version check">
          <sc:rule context="v:ver">
            <!-- The assert text is reported when the test evaluates to false -->
            <sc:assert test="v:Ver = '2.0.0'"
                       diagnostics="equivalent">
              The version must be 2.0.0
            </sc:assert>
          </sc:rule>
        </sc:pattern>
        <sc:diagnostics>
          <sc:diagnostic id="equivalent">
            The version is incorrect
            v = <sc:value-of select="v:Ver"/>
          </sc:diagnostic>
        </sc:diagnostics>
      </xsd:appinfo>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Ver" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
    <!-- Other schema entries for the xml document -->
  </xsd:element>
</xsd:schema>
Figure 4: Schematron in an XML Schema.
- Using an XSLT expression: An XSLT stylesheet can also check the version. The only
disadvantage of this option is that an XSLT stylesheet has to be maintained for each XML
Schema document, which increases the complexity of maintaining and matching the right
documents with each other.
Figure 5 is an example of using XSLT to verify the version.
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.mySchema.org"
                version="1.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:choose>
      <!-- Ver is the child of ver declared in the schema of Figure 4 -->
      <xsl:when test="/xs:ver/xs:Ver = '2.0.0'">
        <xsl:text>The version is correct</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>The version is incorrect</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
Figure 5: XSLT example for checking the version in an XML message.
- Using a general-purpose programming language: This is done by creating another application
layer in the messaging pipeline that preprocesses each message and verifies its schema version. The
disadvantage is that this layer could become a bottleneck for the overall performance of the system.
This should be the least encouraged mechanism.
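Whichever approach is chosen, the check runs against an instance message. A minimal sketch of such a message is shown below; it assumes the ver element with a Ver child declared in Figure 4 and leaves out any other content the real message would carry.

```xml
<?xml version="1.0"?>
<!-- Hypothetical instance message; the ver/Ver structure follows Figure 4 -->
<ver xmlns="http://www.mySchema.org">
  <Ver>2.0.0</Ver>
</ver>
```

A message carrying any other value in Ver, for example 1.0.0, is the case each of the three mechanisms is meant to catch.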
Maintenance
As the applications grow, the number of schemas increases. In addition, existing schemas are enhanced and
multiple versions may have to be maintained side by side. All of this calls for a sound
maintenance plan for the schemas. Some of the best practices are:
- Create modular schemas so that they are highly normalized and can be reused easily. Because they are modular, they can be combined into larger schemas as needed by using the include and import elements.
- Use the redefine element, with which one schema document can override type and group definitions taken from another.
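The sketch below shows how these practices might look in a schema document. The file names (phone.xsd, address.xsd), the address namespace and AddressType are assumptions for illustration; phoneNumberType is the datatype from Figure 2, assumed here to live in phone.xsd.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.mySchema.org"
            xmlns="http://www.mySchema.org"
            xmlns:addr="http://www.mySchema.org/address"
            elementFormDefault="qualified">

  <!-- include: brings in definitions that share this target namespace -->
  <xsd:include schemaLocation="phone.xsd"/>

  <!-- import: brings in definitions that belong to a different namespace -->
  <xsd:import namespace="http://www.mySchema.org/address"
              schemaLocation="address.xsd"/>

  <xsd:element name="customer">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="phoneNumber" type="phoneNumberType"/>
        <xsd:element name="address" type="addr:AddressType"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A second schema document could use redefine in place of include to tighten the shared type. In this sketch the redefined type restricts itself so that only numbers beginning with 555 are accepted.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.mySchema.org"
            xmlns="http://www.mySchema.org"
            elementFormDefault="qualified">

  <!-- redefine: includes phone.xsd and overrides phoneNumberType -->
  <xsd:redefine schemaLocation="phone.xsd">
    <xsd:simpleType name="phoneNumberType">
      <xsd:restriction base="phoneNumberType">
        <xsd:pattern value="555-\d{3}-\d{4}"/>
      </xsd:restriction>
    </xsd:simpleType>
  </xsd:redefine>
</xsd:schema>
```

Both include and redefine require the referenced document to have the same target namespace as the including schema, or no target namespace at all; import is the mechanism for a different namespace.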
In conclusion, using XML Schema as the contract for the messages avoids data redundancy and misaligned
vocabularies, and enforces business rules.