xmltex

(Excerpt from "The MathML Handbook" by Pavi Sandhu)

The ORCCA converter and the XSLT MathML Library, discussed earlier, are both limited to translating individual MathML expressions into LaTeX. For large-scale document processing, for example as part of a publisher's workflow, it is useful to be able to process entire XML documents that contain embedded MathML while still using TeX as the formatting engine. This is a much more challenging task than converting individual formulas. However, David Carlisle has provided the foundation for this type of conversion in the form of a program called xmltex.

xmltex is a parser for XML documents and is written entirely in TeX. You can configure xmltex to trigger specific TeX commands when it encounters a particular type of element, attribute, processing instruction, or entity in the input XML document. xmltex thus serves as a valuable bridge that connects the worlds of TeX and XML. It allows TeX's powerful typesetting capabilities to be applied not just to TeX documents but to arbitrary XML documents. You can download the xmltex program along with documentation.

xmltex can process XML documents that combine elements from different namespaces; for example, XHTML documents that contain embedded MathML. The xmltex program by itself does not have any knowledge of specific XML formats. All information about a specific XML format must be specified in additional package files (with a .xmt extension). A separate xmt file is required for each XML document type, such as XHTML, DocBook, TEI, or MathML. By including a command of the following form in a catalog file, you can associate the namespace for a specific document type with a particular xmt file:

\NAMESPACE{URL}{xmt-file}

When xmltex processes an XML document and encounters elements from a particular namespace, it loads the xmt file corresponding to that namespace. For example, the following command specifies that the mathml2.xmt package should be loaded whenever the input XML document contains an element that belongs to the MathML namespace:

\NAMESPACE{http://www.w3.org/1998/Math/MathML}{mathml2.xmt}

The mathml2.xmt package is included with the standard xmltex distribution. It contains TeX commands for typesetting most of the common presentation MathML elements.

The catalog file, which specifies which xmt file should be associated with a particular namespace, has a .cfg file extension. You can define a specific catalog file for each document. So, for example, to typeset an XML document called test.xml, you would create a catalog file called test.cfg. If a document-specific catalog file is not found, the default configuration file xmltex.cfg is used.

Let’s look at a simple example of using xmltex to typeset an XHTML document that contains MathML. We can set up an xhtml.xmt file that defines LaTeX commands that correspond to each XHTML element used in this document. The following example shows the contents of this file.

Example: An xmt package that defines LaTeX commands for specific XHTML elements.

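% Bind the prefix "xhtml" to the XHTML namespace for use in this file.
\DeclareNamespace{xhtml}{http://www.w3.org/1999/xhtml}
% The root element opens and closes the LaTeX document.
\XMLelement{xhtml:html}
  {}
  {\documentclass{article}
   \begin{document}}
  {\end{document}}
% head and body produce no output of their own.
\XMLelement{xhtml:head}
  {}
  {}
  {}
\XMLelement{xhtml:body}
  {}
  {}
  {}
% \xmlgrab collects the element content and passes it to the end code as #1.
\XMLelement{xhtml:h1}
  {}
  {\xmlgrab}
  {\title{#1}
   \maketitle}
% Each p element becomes a separate paragraph.
\XMLelement{xhtml:p}
  {}
  {\par}
  {\par}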

The first line of the xhtml.xmt file declares the namespace to which all XHTML elements in the file belong. Each \XMLelement{name} declaration defines the LaTeX commands to be used when an element called name is encountered. The file contains LaTeX commands that correspond to the XHTML elements html, head, body, h1, and p. The following example shows an XHTML+MathML document called test.xml, which uses these XHTML elements together with h2 headings, for which xhtml.xmt defines no commands. The following figure shows how this file looks when displayed by Mozilla.

Example: An XHTML+MathML document called test.xml.

<html xmlns="http://www.w3.org/1999/xhtml">
  <head></head>
  <body>
    <h1>Using TeX to Typeset MathML</h1>
    <h2>Subscript and Superscript</h2>
    <p>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mrow>
          <msup><mi>a</mi><mn>1</mn></msup>
          <mo>+</mo> 
          <msub><mi>b</mi><mn>2</mn></msub>
        </mrow>
      </math>
    </p>
    <h2>Fraction</h2>
    <p>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mfrac>
          <mrow><mi>z</mi><mo>+</mo><mn>1</mn></mrow>
          <mrow><mi>z</mi><mo>-</mo><mn>2</mn></mrow>
        </mfrac>
      </math>
    </p>
    <h2>Radical</h2>
    <p>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mroot>
          <mrow><mi>x</mi><mo>+</mo><mn>1</mn></mrow>
          <mn>3</mn>
        </mroot>
      </math>
    </p>
    <h2>Subscript-superscript pair</h2>
    <p>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <msubsup>
          <mi>A</mi>
          <mi>i</mi>
          <mi>j</mi>
        </msubsup>
      </math>
    </p>
    <h2>Table</h2>
    <p>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mtable>    
          <mtr>
            <mtd><mn>1</mn></mtd><mtd><mn>2</mn></mtd><mtd><mn>3</mn></mtd>
          </mtr>  
          <mtr>
            <mtd><mi>a</mi></mtd><mtd><mi>b</mi></mtd><mtd><mi>c</mi></mtd>
          </mtr>
        </mtable>
      </math>
    </p>
  </body>
</html>

Figure: The file test.xml viewed in Mozilla.
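
The test.xml document also contains h2 headings, for which the xhtml.xmt package shown earlier declares nothing. A minimal sketch of one possible addition follows. It reuses the \xmlgrab pattern from the h1 declaration; the choice of \section* (an unnumbered LaTeX section heading) is an assumption made for illustration and is not part of the original package.

```latex
% Hypothetical addition to xhtml.xmt (not in the original package):
% grab the content of each h2 element and typeset it as an
% unnumbered section heading.
\XMLelement{xhtml:h2}
  {}
  {\xmlgrab}
  {\section*{#1}}
```

Any other LaTeX heading command could be substituted in the end code, depending on how the XHTML heading levels should map onto the document class.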

TeX cannot process XML files directly, only TeX files. Hence, to run xmltex on the XML document shown in the above example, you first need to create a text file, called test.tex, with the following lines in it:

\def\xmlfile{test.xml}
\input xmltex.tex

For the xhtml.xmt file to be automatically loaded whenever the XHTML namespace is encountered, you must create a catalog file called test.cfg that contains the following line:

\NAMESPACE{http://www.w3.org/1999/xhtml}{xhtml.xmt}
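
Because test.xml also contains MathML, its catalog could list the MathML mapping shown earlier alongside the XHTML one. The following sketch simply combines the two \NAMESPACE lines already given in this section. Whether the second line is strictly required depends on what your default configuration already provides, so treat it as an assumption rather than a requirement.

```latex
% test.cfg: a sketch of a document-specific catalog for test.xml.
% XHTML elements are handled by the xhtml.xmt package defined above.
\NAMESPACE{http://www.w3.org/1999/xhtml}{xhtml.xmt}
% MathML elements are handled by mathml2.xmt from the xmltex distribution.
\NAMESPACE{http://www.w3.org/1998/Math/MathML}{mathml2.xmt}
```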

You can then run the following command in your LaTeX installation to parse the XML file using xmltex:

latex test.tex

The result is a DVI file (see the next figure) called test.dvi that contains the typeset output produced by TeX. Alternatively, you can run the pdflatex command to generate a PDF file of the typeset output, as shown here:

pdflatex test.tex

Figure: The DVI file produced by typesetting test.xml using xmltex.

By suitably defining xmt package files for specific document types, you can typeset any XML document using TeX's formatting capabilities. A good example of this approach is the PassiveTeX project of Sebastian Rahtz. He has created a fotex.xmt package and a style file that provides a fairly complete implementation of the XSL-FO format. As discussed under XSLT primer, XSL-FO is a W3C standard for specifying the detailed layout and formatting of XML documents. You can use an XSLT stylesheet for transforming a document in any arbitrary XML format, such as XHTML or DocBook, into XSL-FO. Once an XSL-FO document is obtained, you can use Rahtz's package to typeset the document in TeX and directly create PDF files from the typeset output.
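
In catalog terms, this setup relies on the same mechanism described earlier: the XSL-FO namespace is associated with the fotex.xmt package. A sketch of such an entry is shown below, assuming the standard XSL-FO namespace URI. A PassiveTeX installation may already ship an equivalent line in its configuration, in which case nothing needs to be added by hand.

```latex
% Sketch of a catalog entry (an assumption, not copied from PassiveTeX itself).
% Elements in the XSL-FO namespace are handled by fotex.xmt.
\NAMESPACE{http://www.w3.org/1999/XSL/Format}{fotex.xmt}
```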

You can find more information on the PassiveTeX site, which provides sample input files and XSLT stylesheets for converting XML documents in TEI format into XSL-FO and then processing them with TeX to get PDF files as output. The site also provides an example of typesetting a fairly complex XML document that contains MathML.

The PassiveTeX project is a good prototype for how TeX can be used for typesetting arbitrary XML documents. The same approach can be applied to any other XML format, including XHTML+MathML. Of course, the task of writing a macro package that will translate all elements of a given XML format into their TeX equivalents can be quite challenging. However, once the initial implementation is done, the process is flexible and robust enough for large-scale adoption as part of a publisher's production workflow. This discussion shows that TeX can continue to play an important role for generating high-quality printed output, using XHTML+MathML documents as a source.

   


Copyright © CHARLES RIVER MEDIA, INC., Massachusetts (USA) 2003
Printing of the online version is permitted exclusively for private use. Otherwise this chapter from the book "The MathML Handbook" is subject to the same provisions as those applicable for the hardcover edition: The work including all its components is protected by copyright. All rights reserved, including reproduction, translation, microfilming as well as storage and processing in electronic systems.


CHARLES RIVER MEDIA, INC., 20 Downer Avenue, Suite 3, Hingham, Massachusetts 02043, United States of America