Schema Validation with JAXP
This tip presents an example Java program that validates an XML file against a W3C XML Schema. The example uses the Sun JAXP 1.2 API and JDK 1.4.0. JAXP 1.2 uses Apache Xerces as its underlying XML parser.
Note: This tip is current at the time of this writing. However, XML standards evolve quickly, and the parsing code may need to change whenever a new standard is introduced.
To make this example work, you must download and install the latest JAX Pack from Sun at http://java.sun.com/xml/javaxmlpack.html.
Reviewing the Code
Listing for: SimpleSchema.java
1:import javax.xml.parsers.*;
2:import org.xml.sax.*;
3:import org.xml.sax.helpers.*;
4:import java.io.*;
5:
6:// A Simple SAX Application using JAXP with Namespace support
7:public class SimpleSchema{
8:  private SAXParserFactory factory; // Creates parser object
9:  private SAXParser parser; // Holds a parser object
10:  private XMLReader xmlReader; // Object that parses the file
11:  private DefaultHandler handler; // Defines the handler for this parser
12:  private boolean valid = true;
13:
14:  // Set schema constants
15:  static final String JAXP_SCHEMA_LANGUAGE =
16:    "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
17:  static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
18:  static final String JAXP_SCHEMA_SOURCE = "http://java.sun.com/xml/jaxp/properties/schemaSource";
19:
20:
21:  public SimpleSchema() throws SAXException{
22:    try{
23:      factory = SAXParserFactory.newInstance();
24:      factory.setValidating(true);
25:      factory.setNamespaceAware(true);
26:
27:      if (factory.isValidating()){
28:        System.out.println("The parser is validating");
29:      }
30:
31:      //Create Parser
32:      parser = factory.newSAXParser();
33:
34:      // Enable Schemas
35:      parser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
36:
37:      //Create XMLReader
38:      xmlReader = parser.getXMLReader();
39:
40:      ContentHandler cHandler = new MyDefaultHandler();
41:      ErrorHandler eHandler = new MyDefaultHandler();
42:
43:      xmlReader.setContentHandler(cHandler);
44:      xmlReader.setErrorHandler(eHandler);
45:
46:    } catch (ParserConfigurationException e){
47:      e.printStackTrace();
48:    } catch (SAXException e){
49:      e.printStackTrace();
50:    }
51:  }
52:
53:  public void parseDocument(String xmlFile){
54:    try{
55:      xmlReader.parse(xmlFile);
56:      if (valid) {
57:        System.out.println("Document is valid!");
58:      }
59:    } catch (SAXException e){
60:      e.printStackTrace();
61:    } catch (IOException e){
62:      e.printStackTrace();
63:    } catch (Exception e){
64:      e.printStackTrace();
65:    }
66:  }
67:
68:  public static void main(String[] args){
69:    try {
70:      if (args.length != 1) {
71:        System.out.println(
72:          "Usage: java SimpleSchema " +
73:          "[XML Document Filename]");
74:        System.exit(0);
75:      }
76:      SimpleSchema xmlApp = new SimpleSchema();
77:      xmlApp.parseDocument(args[0]);
78:    } catch (SAXException e){
79:      e.printStackTrace();
80:    } catch (Exception e) {
81:      e.printStackTrace();
82:    }
83:  }
84:
85:  class MyDefaultHandler extends DefaultHandler{
86:    private CharArrayWriter buff = new CharArrayWriter();
87:    private String errMessage = "";
88:    /* With a handler class, just override the methods you need to use
89:    */
90:
91:    // Start Error Handler code here
92:    public void warning(SAXParseException e) {
93:      System.out.println("Warning Line " + e.getLineNumber() + ": " + e.getMessage() + "\n");
94:    }
95:
96:    public void error(SAXParseException e) {
97:      errMessage = new String("Error Line " + e.getLineNumber() + ": " + e.getMessage() + "\n");
98:      System.out.println(errMessage);
99:      valid = false;
100:    }
101:
102:    public void fatalError(SAXParseException e) {
103:      errMessage = new String("Error Line " + e.getLineNumber() + ": " + e.getMessage() + "\n");
104:      System.out.println(errMessage);
105:      valid = false;
106:    }
107:  }
108:}
There are a couple of differences you will notice right away compared with a standard XML parsing program. Lines 15-18 define constants for the parser properties used in the program. The JAXP_SCHEMA_LANGUAGE constant names the property that defines the language the schema is written in. The W3C_XML_SCHEMA constant is the value that identifies W3C XML Schema as the language used for validation. The JAXP_SCHEMA_SOURCE constant names the property to set when you wish to specify a schema document different from the one specified in the XML document. This example does not use that constant.
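Since the listing never uses JAXP_SCHEMA_SOURCE, here is a minimal sketch of how that property might be set. The class name SchemaSourceSketch and the isValid() helper are hypothetical and not part of the original listing. The sketch assumes the schema and the document are passed in as strings and that the parser accepts an InputStream as the schema source. Note that the schema language property is set before the schema source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; not part of the original SimpleSchema listing.
final class SchemaSourceSketch {
    static final String JAXP_SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
    static final String JAXP_SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";

    /** Returns true if the XML text validates against the schema text supplied by the caller. */
    static boolean isValid(String xml, String schema) {
        SAXParser parser;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.setNamespaceAware(true);
            parser = factory.newSAXParser();
            // Set the schema language first, then the schema source.
            parser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
            // Supply the schema explicitly instead of relying on the XML document.
            parser.setProperty(JAXP_SCHEMA_SOURCE,
                new ByteArrayInputStream(schema.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Parser could not be configured", e);
        }

        final boolean[] valid = { true };
        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    valid[0] = false; // validation error; parsing continues
                }
            });
        } catch (SAXException e) {
            valid[0] = false; // fatal errors surface as exceptions
        } catch (IOException e) {
            throw new IllegalStateException("Document could not be read", e);
        }
        return valid[0];
    }
}
```

A caller would pass the text of the XML document and the text of the schema, for example SchemaSourceSketch.isValid(xmlText, schemaText). Reading both from files instead would work the same way.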
Next, look at the SimpleSchema() constructor (line 21) for the key configuration steps used in creating our parsing object.
- Lines 23-25. Create a SAXParserFactory and set the features for this parser. In this case, validation and namespace support are both enabled. When using schemas, you must enable namespace support.
- Line 32. Create a SAXParser object.
- Line 35. Set the schema language.
- Line 38. Create an XMLReader object to parse the document.
- Lines 40-44. Create and set the ContentHandler and ErrorHandler. In this example, the ContentHandler is unnecessary because the program does no processing of elements or attributes. However, you would normally set a ContentHandler, so I left the code in. The ErrorHandler is required to see the error messages generated during parsing.
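For contrast, here is a rough sketch of what a ContentHandler that actually processes elements might look like. The class ItemCounterSketch and its countItems() method are hypothetical names introduced for illustration. The sketch counts the item elements in a todo document passed in as a string, and it leaves validation off to keep the focus on content handling.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; not part of the original SimpleSchema listing.
final class ItemCounterSketch extends DefaultHandler {
    private int items;

    // Called once for every start tag; localName is set because the
    // factory below is namespace aware.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("item".equals(localName)) {
            items++;
        }
    }

    /** Parses the XML text and returns the number of item elements found. */
    static int countItems(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            ItemCounterSketch handler = new ItemCounterSketch();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.items;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }
}
```

As in the main listing, the handler extends DefaultHandler and overrides only the callbacks it needs.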
The parser is invoked on line 55. This method call parses the file and calls the ErrorHandler whenever an error is encountered.
The Error Handling Code
Lines 85-107. This is the error handling code, and it is quite simple. Create warning(), error(), and fatalError() methods as shown. The appropriate method is called depending on the error generated. Warnings and errors do not stop parsing; fatal errors do.
Note: The following are sample XML files you can use to test the class.
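To illustrate the difference between errors and fatal errors, the following sketch collects messages in a list instead of printing them. The class ErrorCollectorSketch, its collect() method, and the abbreviated inline schema are assumptions made for illustration; the schema is a shortened stand-in for the todo schema, supplied through the schemaSource property so the example needs no files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class; not part of the original SimpleSchema listing.
final class ErrorCollectorSketch extends DefaultHandler {
    // Abbreviated stand-in for the todo schema (an assumption for this sketch).
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
      + " targetNamespace='http://abbeyworkshop.com/todo'"
      + " xmlns='http://abbeyworkshop.com/todo' elementFormDefault='qualified'>"
      + "<xs:complexType name='listType'><xs:sequence>"
      + "<xs:element name='item' type='xs:string' maxOccurs='unbounded'/>"
      + "</xs:sequence><xs:attribute name='name' type='xs:string'/></xs:complexType>"
      + "<xs:element name='todo'><xs:complexType><xs:sequence>"
      + "<xs:element name='list' type='listType' minOccurs='0' maxOccurs='unbounded'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    private final List<String> messages = new ArrayList<String>();

    @Override
    public void warning(SAXParseException e) {
        messages.add("Warning Line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void error(SAXParseException e) {
        // Validation errors land here; returning normally lets parsing continue.
        messages.add("Error Line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) {
        // Well-formedness problems land here; the parser stops afterwards.
        messages.add("Fatal Line " + e.getLineNumber() + ": " + e.getMessage());
    }

    /** Validates the XML text and returns every message reported by the parser. */
    static List<String> collect(String xml) {
        ErrorCollectorSketch handler = new ErrorCollectorSketch();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            parser.setProperty(
                "http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                "http://www.w3.org/2001/XMLSchema");
            parser.setProperty(
                "http://java.sun.com/xml/jaxp/properties/schemaSource",
                new ByteArrayInputStream(SCHEMA.getBytes(StandardCharsets.UTF_8)));
            try {
                parser.parse(new InputSource(new StringReader(xml)), handler);
            } catch (SAXException e) {
                // Thrown after a fatal error; the handler has already recorded it.
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parser setup or input failed", e);
        }
        return handler.messages;
    }
}
```

A document with several schema violations should produce several "Error" entries from a single parse, while a document that is not well formed should end with a "Fatal" entry.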
Listing for: todo.xml
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!-- A Todo List -->
<todo xmlns="http://abbeyworkshop.com/todo"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://abbeyworkshop.com/todo todo.xsd"
>
  <list name="List1">
    <item>Item 1</item>
    <item>Item 2</item>
    <item>Item 3</item>
  </list>
  <list name="List2">
    <item>Item one</item>
    <item>Item two</item>
    <item>Item three</item>
  </list>
</todo>
Listing for: todoS.xsd (save this file as todo.xsd, the name the sample XML documents reference in xsi:schemaLocation)
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="http://abbeyworkshop.com/todo"
  xmlns="http://abbeyworkshop.com/todo"
  elementFormDefault="qualified"
  attributeFormDefault="unqualified"
>

<xs:simpleType name="nameType">
  <xs:restriction base="xs:string">
    <xs:maxLength value="50" />
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="itemType">
  <xs:restriction base="xs:string">
  </xs:restriction>
</xs:simpleType>

<xs:complexType name="listType">
  <xs:sequence>
    <xs:element name="item" type="itemType" minOccurs="1" maxOccurs="unbounded" />
  </xs:sequence>
  <xs:attribute name="name" type="nameType" />
</xs:complexType>

<xs:complexType name="todoType">
  <xs:sequence>
    <xs:element name="list" type="listType" minOccurs="0" maxOccurs="unbounded" />
  </xs:sequence>
</xs:complexType>

<xs:element name="todo" type="todoType" />

</xs:schema>
Listing for: todoBad.xml
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!-- A Todo List -->
<todo xmlns="http://abbeyworkshop.com/todo"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://abbeyworkshop.com/todo todo.xsd"
>
  <list name="List1">
    <item attrib="text">Item 1</item><!-- Attrib not allowed -->
    <item>Item 2</item>
    <item>Item 3</item>
    <bad>Some text </bad> <!-- Undefined tag -->
  </list>
  <list name="List2">
    <item>Item one</item>
    <item>Item two</item>
    <item attrib="text">Item three</item>
    <item>Item 4</item>
  </list>
  <bad>Some text </bad> <!-- Undefined tag -->
</todo>