(Excerpt from "The MathML Handbook" by Pavi Sandhu)

A document that conforms to the basic rules of XML syntax is said to be well formed. Some typical rules a well-formed document must follow are:

  • It must have exactly one root element.
  • Every start tag must have a matching end tag (or the element must be written as a single empty-element tag, such as <name/>).
  • Elements cannot overlap.
  • Attribute values must be enclosed in quotation marks.
  • No element can have two attributes with the same name.
  • The character data of an element or attribute cannot contain any literal < or & signs; these characters must instead be written as the entity references &lt; and &amp;.
  • Comments cannot appear inside tags.
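The rules above can be tried out with any XML parser, since a parser must reject a document that is not well formed. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are made up for illustration, and each faulty sample breaks just one of the rules listed above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents; each faulty one violates a single rule.
samples = {
    "well formed": '<note lang="en"><to>Ann</to></note>',
    "two root elements": "<a/><b/>",
    "missing end tag": "<a><b></a>",
    "overlapping elements": "<a><b></a></b>",
    "unquoted attribute value": "<a x=1/>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "literal ampersand": "<a>fish & chips</a>",
    "comment inside a tag": "<a <!-- note --> />",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the first sample should be accepted. The parser stops at the first error it meets, so a document that breaks several rules is reported only once.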

When creating an XML document, you can define your own tags and attributes to create what is called free-form XML. As long as your document is well formed, standard XML tools such as parsers can accept and process it.
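As a small illustration of this point, the sketch below feeds a free-form document to Python's standard xml.etree.ElementTree parser. The recipe vocabulary (recipe, name, ingredient, amount, unit) is invented for this example; the parser has no knowledge of it and simply reports the structure it finds.

```python
import xml.etree.ElementTree as ET

# Hypothetical free-form document: the tag and attribute names are invented.
recipe = """<recipe serves="4">
  <name>Flatbread</name>
  <ingredient amount="500" unit="g">flour</ingredient>
  <ingredient amount="300" unit="ml">water</ingredient>
</recipe>"""

root = ET.fromstring(recipe)

# The parser exposes names and values, but attaches no meaning to them.
tags = [element.tag for element in root.iter()]
amounts = {item.text: item.get("amount") for item in root.findall("ingredient")}

print(tags)
print(amounts)
```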

Free-form XML offers the greatest flexibility since you can create your own tag names and assign to them any meaning you choose. However, free-form XML documents are not very useful as a medium for communicating information. If you exchange your document with other people, they may not know the specific meaning of the tag names you used. Hence, they cannot easily search, modify, or manipulate the information contained in your document.

For documents to be publicly exchanged, it is important that they are composed according to an agreed-upon standard. For this reason, most XML documents used for real-world applications are restricted to using a specific set of tags and attributes, whose meaning and usage are clearly defined. Such documents are said to conform to a specific document type.

The document type determines the list of tags and attributes that can be used in a document as well as other details (such as which elements can be nested inside other elements and in what order). In XML, information of this type is specified using a special document called a Document Type Definition (DTD). A DTD is written in a formal syntax defined by the XML specification. The DTD syntax is hard for humans to read because it is intended mainly for use by XML processors. Also, the DTD provides only the bare minimum of information needed to define a particular document format. To explain the meaning and usage of the various elements and attributes, some additional documentation is usually needed to supplement the DTD.

You can define your own DTDs to formalize the structure and vocabulary of any XML document that you create. Various groups have defined a large number of public DTDs to provide a standard XML document format for a specific purpose. Each XML format defined by a specific DTD is called an XML application, because it is an application of XML to a specific field, such as mathematics, music, or vector graphics.

A well-formed XML document that conforms to the rules specified by a particular DTD is said to be a valid document. Validity is a stronger requirement than well-formedness: every valid document is well formed, but a well-formed document need not be valid. Note that since validity is defined with respect to a specific DTD, a document can be valid with respect to one DTD but invalid with respect to another.
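The idea that validity is always relative to a document type can be sketched in a few lines of Python. This is only a toy model, not a DTD validator: each "document type" here is a hypothetical dictionary mapping an element name to the child element names it allows, and the check ignores element order, repetition, attributes, and text content. The names LIBRARY_TYPE, CATALOG_TYPE, and conforms are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Toy stand-ins for two DTDs (hypothetical): element name -> allowed child names.
LIBRARY_TYPE = {
    "library": {"book"},
    "book": {"title", "author"},
    "title": set(),
    "author": set(),
}
CATALOG_TYPE = {
    "catalog": {"item"},
    "item": {"title"},
    "title": set(),
}


def conforms(text, doc_type):
    """Return True if every element and every parent/child pairing is declared."""
    root = ET.fromstring(text)  # raises ParseError if text is not well formed
    for parent in root.iter():
        if parent.tag not in doc_type:
            return False
        if any(child.tag not in doc_type[parent.tag] for child in parent):
            return False
    return True


doc = "<library><book><title>T</title><author>A</author></book></library>"
print(conforms(doc, LIBRARY_TYPE))
print(conforms(doc, CATALOG_TYPE))
```

The same well-formed document passes the first check and fails the second, which mirrors the situation described above.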

A valid document must include a reference to the DTD to which it conforms. This is done using a document type declaration, which is a statement that appears at the beginning of the document, before the root element. Here is an example of a document type declaration for a MathML document:

<!DOCTYPE math SYSTEM "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd">

This declaration says the document's root element is math and the DTD for the document can be found at the URL "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd".

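The location given after the SYSTEM keyword can also be a relative URL (if the DTD is in the same file system as the document) or just a filename such as "mathml2.dtd" (if it is in the same directory as the document). A declaration can also use the PUBLIC keyword to identify the DTD by a formal public identifier, which is followed by the URL of the DTD, as shown here: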

<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd">
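An XML processor makes the parts of a document type declaration available to applications. As a rough sketch, Python's standard xml.dom.minidom module exposes them as the name, publicId, and systemId properties of the document's doctype node. The helper describe_doctype and the minimal <math/> documents are assumptions for this example; the parser only records the identifiers and does not download the DTD.

```python
from xml.dom.minidom import parseString

# Minimal documents built around the two kinds of declaration.
SYSTEM_DOC = (
    '<!DOCTYPE math SYSTEM "http://www.w3.org/TR/MathML2/dtd/mathml2.dtd">'
    "<math/>"
)
PUBLIC_DOC = (
    '<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN" '
    '"http://www.w3.org/TR/MathML2/dtd/mathml2.dtd">'
    "<math/>"
)


def describe_doctype(text):
    """Return (root element name, public identifier, system identifier)."""
    doctype = parseString(text).doctype
    return doctype.name, doctype.publicId, doctype.systemId


for label, text in (("SYSTEM", SYSTEM_DOC), ("PUBLIC", PUBLIC_DOC)):
    name, public_id, system_id = describe_doctype(text)
    print(f"{label}: root={name} public={public_id} system={system_id}")
```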

The two declarations above refer to external DTDs; that is, the DTD is a separate document identified by a URL. An XML document can also have an internal DTD, in which the DTD is included in the XML document itself. The declaration for an internal DTD is similar to that for an external DTD, except that instead of a URL, it contains the contents of the DTD enclosed in square brackets. For example:

<!DOCTYPE library [
  <!ELEMENT author (#PCDATA | born | died)*>
  <!ELEMENT born (#PCDATA)>
  <!ELEMENT died (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT book (title, author+)>
  <!ELEMENT library (book*)>
]>

Each <!ELEMENT ...> statement declares the name of an element and the content it can have, using the syntax defined for DTDs. For example:

<!ELEMENT title (#PCDATA)> means that a title element can contain any parsed character data, that is, ordinary text, possibly containing entity references but not containing any tags or child elements.

<!ELEMENT author (#PCDATA | born | died)*> means that each author element can contain ordinary text mixed with born and died elements. (This is the only form in which a DTD allows #PCDATA to be combined with child elements, so the declaration cannot fix the order or number of the born and died elements.)

<!ELEMENT book (title, author+)> means that each book element must contain one title element followed by one or more author elements.

<!ELEMENT library (book*)> means that each library element can contain zero or more book elements.
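The notation used in element content models is close to that of regular expressions: a comma means "followed by", | means "or", and ?, *, and + mean "optional", "zero or more", and "one or more". The following Python sketch uses that similarity to check an element's children against a content model. It is an illustration under simplifying assumptions, not a validator: the function names content_model_to_regex and children_match are made up, and models involving #PCDATA, EMPTY, or ANY are not handled.

```python
import re
import xml.etree.ElementTree as ET


def content_model_to_regex(model):
    """Translate a model such as "(title, author+)" into a compiled regex.

    The regex matches a string of child names, each followed by one space.
    """
    compact = re.sub(r"\s+", "", model)
    if "#" in compact or compact in ("EMPTY", "ANY"):
        raise ValueError("only element-content models are supported")
    # Turn every element name into a group that matches "name ".
    pattern = re.sub(
        r"[A-Za-z_][\w.\-]*",
        lambda m: "(?:" + re.escape(m.group(0)) + " )",
        compact,
    )
    # A comma means sequence, which is plain concatenation in a regex;
    # |, ?, *, + and parentheses have the same meaning in both notations.
    return re.compile(pattern.replace(",", ""))


def children_match(element, model):
    """Return True if the element's child names satisfy the content model."""
    names = "".join(child.tag + " " for child in element)
    return content_model_to_regex(model).fullmatch(names) is not None


good = ET.fromstring("<book><title>T</title><author>A</author><author>B</author></book>")
bad = ET.fromstring("<book><title>T</title></book>")  # no author element

print(children_match(good, "(title, author+)"))
print(children_match(bad, "(title, author+)"))
print(children_match(ET.fromstring("<library/>"), "(book*)"))
```

The first and third checks succeed, while the second fails because the model requires at least one author element after the title.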

An XML document can also have both an external and an internal DTD. That is, part of the DTD is in an external file and the rest is specified explicitly inside the document. The external part is called the external DTD subset, and the internal part is called the internal DTD subset. For example, the following document type declaration includes both an external DTD subset and an internal DTD subset (the declarations for the author, born, and died elements are made in the internal subset, while the remaining elements are declared in the external subset):

<!DOCTYPE library SYSTEM "library.dtd" [
  <!ELEMENT author (#PCDATA | born | died)*>
  <!ELEMENT born (#PCDATA)>
  <!ELEMENT died (#PCDATA)>
]>

Combining an internal and an external DTD subset is useful when you want to use an existing DTD but modify it slightly for a specific document (say, to add declarations for a small number of extra elements). The two parts of the DTD must be consistent. In particular, an element must not be declared in both subsets, since a valid DTD declares each element only once. For entity and attribute declarations, where repeated declarations are allowed, the declarations in the internal subset take precedence over those in the external subset.
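To see how a processor presents the two subsets, the sketch below parses a document whose declaration has both parts, using Python's standard xml.dom.minidom module. The empty <library/> document is an assumption for the example, and the author declaration is written in the mixed-content form (#PCDATA | born | died)*, which is the form the DTD grammar accepts for text combined with child elements. The parser reports the external subset only as a system identifier (it does not fetch library.dtd), while the internal subset is available as text.

```python
from xml.dom.minidom import parseString

# Hypothetical document combining an external and an internal DTD subset.
DOC = """<!DOCTYPE library SYSTEM "library.dtd" [
  <!ELEMENT author (#PCDATA | born | died)*>
  <!ELEMENT born (#PCDATA)>
  <!ELEMENT died (#PCDATA)>
]>
<library/>"""

doctype = parseString(DOC).doctype

print("root element:", doctype.name)
print("external subset:", doctype.systemId)
print("internal subset:")
print(doctype.internalSubset)
```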


Copyright © CHARLES RIVER MEDIA, INC., Massachusetts (USA) 2003
Printing of the online version is permitted exclusively for private use. Otherwise this chapter from the book "The MathML Handbook" is subject to the same provisions as those applicable for the hardcover edition: The work including all its components is protected by copyright. All rights reserved, including reproduction, translation, microfilming as well as storage and processing in electronic systems.

CHARLES RIVER MEDIA, INC., 20 Downer Avenue, Suite 3, Hingham, Massachusetts 02043, United States of America