First, what is an XML Schema?
XML Schema Definition Language (XSDL) is used to enforce consistency of the structure of objects in XML files. XSDL makes assertions like “all order objects in our company should look like this and include the following fields.” Also called XSD, XSDL is sort of like Electronic Data Interchange (EDI), which also establishes standards for how objects like orders and customers should be defined, except that schema definitions are decided by a programmer rather than a world-wide committee. A schema is a formal vocabulary agreement: it documents the data, limits the values of certain fields, and can establish default values where values are not supplied.
XML Schemas allow you to create complex data structures out of simple structures that are, in turn, built up from primitive types (like integers, strings, and floating-point numbers). XML Schemas provide mechanisms to enforce date and time formats, to create lists of objects, to restrict values to a range, to force a value to be selected from a predefined list of strings, and to import parts of other schemas.
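To make those mechanisms a little more concrete, here is a minimal sketch of a schema that uses each of them. Every name in it (“Order”, “OrderDate”, “Status”, “Quantity”, “QuantityType”, “StatusType”), the namespace “http://example.com/customer”, and the file “customer.xsd” are made up for illustration and are not part of any real schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Import part of another schema (hypothetical namespace and file name) -->
  <xs:import namespace="http://example.com/customer" schemaLocation="customer.xsd"/>

  <!-- Restrict a value to a range: an integer from 1 to 100 -->
  <xs:simpleType name="QuantityType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Force a value to come from a predefined list of strings -->
  <xs:simpleType name="StatusType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="open"/>
      <xs:enumeration value="shipped"/>
      <xs:enumeration value="cancelled"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- A complex structure built from the simple types above -->
  <xs:element name="Order">
    <xs:complexType>
      <xs:sequence>
        <!-- xs:date enforces the YYYY-MM-DD date format -->
        <xs:element name="OrderDate" type="xs:date"/>
        <!-- The default applies when the element is present but empty -->
        <xs:element name="Status" type="StatusType" default="open"/>
        <!-- maxOccurs="unbounded" turns a single value into a list -->
        <xs:element name="Quantity" type="QuantityType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With a schema like this, a validating parser would reject a “Quantity” of 0 or a “Status” of “lost”.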
You may have noticed that Microsoft Word has the ability to export documents as XML files. But did you also know that Word provides the ability to import an XML schema so that the content of your document conforms to a standard? That feature is not widely used and requires some training, but if you’re looking for a very small “class” in XML Schemas, you can search the web for simple tutorials on these kinds of features in your favorite word processing software. You don’t have to purchase any new software to experiment with schemas - you probably already have it.
What's the difference between XSLT and XSDL?
Don’t confuse Extensible Stylesheet Language Transformations (XSLT) with XSDL. XSLT is about morphing an XML document into some other format (perhaps into HTML, say, for display in a browser). XSDL is about enforcing data integrity and not about transforming the data into something else.
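For contrast with a schema, here is a minimal sketch of a transformation stylesheet. It assumes a hypothetical input document with a “Product” root element that contains a “ProductNum” child, and it rewrites that document as a tiny HTML page. It checks nothing about the data. It only reshapes it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Match the (hypothetical) Product root element of the input -->
  <xsl:template match="/Product">
    <html>
      <body>
        <!-- Copy the text of the ProductNum child into the output page -->
        <p>Product number: <xsl:value-of select="ProductNum"/></p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```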
OK - XML Schema syntax is easy. What's going to trip me up then?
The XML schema syntax is pretty easy, but the use of namespaces is horrific. Namespaces make XML data files and their schemas incredibly hard to read and make you long for comparatively easy to read IRS tax forms. Prefixes seem to appear wherever you least expect them and debugging messages that refer to namespaces (or “QName” values in those namespaces) are common and hard to interpret. Schemas are themselves XML documents that must conform to a schema specification (or a “schema schema” - my head is killing me). Schemas can be embedded in the actual XML file itself (rather than separated into another file). It’s best to get an XML editor (or better yet, a developer – one not prone to headaches) to read these for you. XML editors are essential to comprehend schema files of even modest complexity.
I'm a masochist. Give me more detail on XML Schemas.
The remainder of this section provides more detail than many people want to know about XML Schemas. You may want to skip ahead a bit... I know I do. To understand XML Schemas in detail, be sure you first understand a little bit about XML prefixes and XML namespaces. What? We don’t have chapters on those yet? Uh oh. You’re doomed. Oh wait. Those were covered back in the chapter on XML. Whew. Best read the XML chapter first, before launching into this one.
Recall that the XML chapter began with a simple three line sample XML file. Below is a 10 line sample schema file:
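<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="Product">
<xs:complexType>
<xs:sequence>
<xs:element name="ProductNum" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>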
This file (we’ll name it “product.xsd”) contains a schema for a corporate “Product” object. The product object is declared on line 3. Our product object has only one element in it – the product number. Our schema can then be tied to an XML file, which has the actual data in it. Our corresponding XML data file is shown below:
<?xml version="1.0" encoding="UTF-8"?>
This particular file contains only one product – product #666 (pretty scary, huh?). The first line of our XML data file is standard XML and has nothing to do with schemas. It sets the XML version and declares the character encoding, which tells the parser how the bytes in the file map to characters. Line 2, however, does have something to do with schemas. It tells the XML parser where it should look for the schema – in the file “product.xsd”. The third line is really just a continuation of the second line since it falls inside the “Product” tag. The actual mechanics of what happens in the parser when it hits that third line are very involved. However, the net effect is that the parser then knows to go get the “product.xsd” schema file specified in line 2 and apply it to this file. When this schema is applied, a validating parser returns an error if the tag “ProductNum” is changed to “ProductNo” in the data file, since the schema says it is supposed to be “ProductNum”.
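The listing above shows only the first line of the data file. A complete file consistent with the line-by-line description might look like the sketch below. The layout of lines 2 and 3 is an assumption reconstructed from that description, not the original listing.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Product xsi:noNamespaceSchemaLocation="product.xsd"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- Renaming this tag to ProductNo would fail validation against product.xsd -->
  <ProductNum>666</ProductNum>
</Product>
```

The “xsi” prefix is bound to the standard XML Schema instance namespace, which is what lets the parser recognize “noNamespaceSchemaLocation” as a schema hint rather than as ordinary data.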
In the schema file (product.xsd) you can see that the prefix on every tag is “xs”. It’s convention to use the prefix “xs” or “xsd” for the schema vocabulary (like “element”, “simpleType”, “complexType”, “date”, “integer”, etc.). So when you see the prefix “xs” or “xsd”, you know that the word that follows is part of the standard vocabulary that is used to define schemas.
The “root element” in our schema file is the “Product” object. We made the root object a complex type because it would ordinarily contain a lot of child objects. We made the first and only child object a “sequence”, to demonstrate how you enforce child elements to appear in the order listed in the schema. The only line in the schema that might actually be mapped to some real data is the product number.
In our example above, the vocabulary of the schema (what a “Product” object looks like) is not assigned to a namespace. That is why the data file points to the schema with the “noNamespaceSchemaLocation” attribute. Normally schemas specifically target a certain namespace that they want to create or add to. This is known as the “targetNamespace”. It’s more common to see schemas that begin like this:
<xsd:schema xmlns:xsd = ""
xmlns = ""
targetNamespace = "">
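As a sketch of how a target namespace plays out end to end, here is the product schema again, this time assigned to a namespace. The namespace “http://example.com/product” is a made-up placeholder, and setting “elementFormDefault” to “qualified” is one common choice assumed here, not something the example requires.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://example.com/product"
            targetNamespace="http://example.com/product"
            elementFormDefault="qualified">
  <!-- Product and ProductNum now belong to the target namespace -->
  <xsd:element name="Product">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="ProductNum" type="xsd:integer"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A matching data file would then declare the same namespace and use “schemaLocation” (a namespace followed by a file location) instead of “noNamespaceSchemaLocation”:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Product xmlns="http://example.com/product"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://example.com/product product.xsd">
  <ProductNum>666</ProductNum>
</Product>
```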
Note that schemas are optional for XML files. You don’t have to have one. If you remove all references to the schema file from the XML data file, the XML data file will still parse as well-formed XML (but it won’t have its content checked). Announcing intent to conform to a schema but omitting the actual integrity checking is not uncommon. You’ll see XML files declare that they intend to use the SOAP namespace like this:
<soap:Envelope xmlns:soap = ""
but those same files do not actually enforce the SOAP schema, which would look like this:
<soap:Envelope xmlns:soap = ""
xmlns = ""
xsi:schemaLocation = "">
A SOAP message that uses this second technique will be validated against a schema, so if you accidentally misspelled “<soap:Body>” as “<soap:upMyBody>”, you’d get an error. The advantage of skipping the schema in a SOAP message is that you avoid the performance hit of validation, but the cost is potentially corrupt data.
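For illustration only, a SOAP message that asks for validation might look like the sketch below. It assumes SOAP 1.1, whose envelope namespace is “http://schemas.xmlsoap.org/soap/envelope/”. Using that same URL as the schema location is an assumption, and the “GetProduct” payload is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/ http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <!-- Hypothetical payload; a real service defines its own elements -->
    <GetProduct>
      <ProductNum>666</ProductNum>
    </GetProduct>
  </soap:Body>
</soap:Envelope>
```

Dropping the two “xsi” attributes gives you the first, unvalidated style: the message still names the SOAP vocabulary, but nothing tells the parser where to find a schema to check it against.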