Introduction

(Excerpt from "The MathML Handbook" by Pavi Sandhu)

TeX (pronounced "tek") is a text-processing system that makes it easy to create professionally formatted documents. Donald Knuth, a computer scientist at Stanford University, created TeX. Knuth was dissatisfied with the quality of the typesetting in his multivolume work, The Art of Computer Programming. To overcome the limitations of the manual typesetting processes in use at the time (the mid-1970s), Knuth resolved to create a program that would allow documents to be typeset electronically. He began work on TeX in 1977, and the first version was completed in 1979. Knuth continued to develop and refine TeX until 1990, when he officially declared the source code to be frozen.

Today, TeX is widely used in many contexts where high-quality printed output is important. TeX provides a set of commands for specifying the precise details of a document's typeset appearance. There are commands, for example, to insert a fixed amount of vertical or horizontal space, to set the indentation of paragraphs or the alignment of text, to specify the font style, and so on. TeX automatically handles many subtle details of typesetting, such as kerning, ligatures, hyphenation, and line breaking. However, the author can also control each of these properties explicitly with specific TeX commands.

The process of creating a document using TeX involves several steps. The author first creates a plain text document called the source file, which contains TeX commands along with the content of the document. The TeX program then processes this source file, interpreting the TeX commands it contains to produce a device-independent (DVI) file as output. DVI is a page-description format that, like PostScript, specifies exactly how text is laid out on the page. The DVI file can either be displayed on screen using a viewer program or converted to PostScript for printing. This process can be less convenient than formatting documents in a word processor with a WYSIWYG graphical interface. However, the greater control and flexibility that TeX gives the author over the printed output more than justify the extra steps.
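The steps above lend themselves to scripting. The following Python sketch assumes a TeX distribution that supplies the tex and dvips commands on the search path; the file name story.tex and the helper names build_pipeline and run_pipeline are hypothetical, and the on-screen viewing step is left out.

```python
import shutil
import subprocess
from pathlib import Path


def build_pipeline(source: str) -> list[list[str]]:
    """Return the commands for the source -> DVI -> PostScript steps."""
    stem = Path(source).stem
    return [
        ["tex", source],                               # writes <stem>.dvi
        ["dvips", "-o", f"{stem}.ps", f"{stem}.dvi"],  # DVI -> PostScript
    ]


def run_pipeline(source: str) -> bool:
    """Run each step if the tools are installed; return False if any is missing."""
    commands = build_pipeline(source)
    if any(shutil.which(cmd[0]) is None for cmd in commands):
        return False
    for cmd in commands:
        # With stdin closed, TeX aborts on an error instead of waiting for
        # the user to type a correction; check=True then raises.
        subprocess.run(cmd, check=True, stdin=subprocess.DEVNULL)
    return True
```

Calling run_pipeline("story.tex") in a directory containing that file would leave story.dvi and story.ps behind; the intermediate DVI file could equally be opened in a viewer instead of being printed.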

A given TeX source file produces exactly the same output regardless of the type of computer or operating system the author uses. This makes TeX highly portable and platform independent, which is one key reason it appeals to technical authors. Implementations of TeX are freely available for all major computer platforms. In addition, many free and commercial applications for working with TeX, including editors, viewers, and font utilities, are available.

TeX is a large and complex system with a number of files and utility programs that interact with each other in complicated ways. An experienced TeX user can customize the system to create virtually any typographic effect or style desired. In practice, most TeX authors use a standard macro package, such as Plain TeX or LaTeX, for authoring documents. Each of these packages uses TeX's primitives, or low-level commands, to implement a set of high-level commands that provide an easier interface for creating specific types of formatted output.
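As a toy illustration of that layering, the Python sketch below stores parameterless "high-level" definitions and expands them repeatedly until only commands it does not know remain. The macro names \half and \hsq and the function expand are hypothetical; real TeX macros can take parameters and are expanded token by token, which this sketch does not attempt.

```python
import re

# Hypothetical high-level commands defined in terms of lower-level ones.
MACROS = {
    r"\half": r"\frac{1}{2}",
    r"\hsq": r"\half^2",  # a definition may use another macro
}

# A control sequence: a backslash followed by one or more letters.
CONTROL_SEQ = re.compile(r"\\[A-Za-z]+")


def expand(text: str, macros: dict[str, str], max_passes: int = 50) -> str:
    """Replace known control sequences by their definitions until none are left."""
    for _ in range(max_passes):
        # Unknown commands (such as \frac here) are left untouched.
        expanded = CONTROL_SEQ.sub(lambda m: macros.get(m.group(0), m.group(0)), text)
        if expanded == text:
            return text
        text = expanded
    raise RecursionError("macro expansion did not terminate")
```

For example, expand(r"\hsq + x", MACROS) first yields \half^2 + x and then \frac{1}{2}^2 + x, at which point no known macros remain.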

LaTeX, for example, includes commands for creating tables of contents, indexes, citations, cross-references, numbered sections, numbered equations, and so on. TeX does not directly support these types of elements. An individual author or publisher can also create style files that define the style of a particular type of document. By using a macro package in combination with one or more style files, a user can create complex documents formatted according to a specific style without having to know the technical details of TeX. In the rest of this chapter, we use the term "TeX" to refer to documents authored using any TeX package, such as Plain TeX, LaTeX, or AMS-TeX. The term "LaTeX" is used when we make comments that apply specifically to LaTeX documents.

TeX and MathML

TeX provides special commands for entering many different types of mathematical constructs, such as fractions, square roots, subscripts, and superscripts, as well as the large number of special characters and symbols important in mathematical notation. TeX automatically applies the various rules and conventions of mathematical typesetting, such as using different styles and spacing for numbers, identifiers, and operators.
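Presentation MathML makes the same three-way distinction explicit with the token elements mn (numbers), mi (identifiers), and mo (operators). The Python sketch below labels the tokens of a very simple formula this way. It is a toy under stated assumptions: it handles only single ASCII letters, numbers, and single-character operators, rejects control sequences such as \alpha, and does not reproduce TeX's actual classification rules; the function names are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# A number (optionally with a decimal part), a single letter, or any
# other single non-space character.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]|\S")


def classify(formula: str) -> list[tuple[str, str]]:
    """Pair each token with the MathML token element it corresponds to."""
    if "\\" in formula:
        raise ValueError("control sequences are not handled in this sketch")
    pairs = []
    for tok in TOKEN.findall(formula):
        if tok[0].isdigit():
            pairs.append(("mn", tok))  # number
        elif tok.isalpha():
            pairs.append(("mi", tok))  # identifier: one letter each, as in TeX
        else:
            pairs.append(("mo", tok))  # operator, relation, or fence
    return pairs


def to_mathml(formula: str) -> str:
    """Render the classified tokens as a flat presentation-MathML row."""
    cells = "".join(f"<{tag}>{escape(tok)}</{tag}>" for tag, tok in classify(formula))
    return f"<mrow>{cells}</mrow>"
```

For example, to_mathml("2x+y") returns <mrow><mn>2</mn><mi>x</mi><mo>+</mo><mi>y</mi></mrow>.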

Using TeX, an author can type complicated mathematical formulas quickly and easily using characters present on any standard keyboard. The TeX program then converts these formulas into beautifully typeset equations. This combination of simplicity in authoring and excellence in typographic output has made TeX the tool of choice for most physicists, mathematicians, and other researchers who produce complex technical documents. Most physics and mathematics journals specify TeX as their preferred submission format and use TeX extensively in their internal production processes.

Although TeX and MathML can both be used as markup languages for describing mathematics, they have many points of contrast. TeX is compact, while MathML is verbose. TeX is intended for authoring by humans, while MathML is best generated using software tools. TeX can describe only the appearance of mathematical formulas, while MathML can also describe their symbolic meaning. TeX is intended for producing printed output, while MathML is primarily for displaying mathematics in Web pages.
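To make the compactness contrast concrete, the Python sketch below builds the presentation MathML for the fraction that TeX writes as \frac{a+b}{2}, using the standard library's xml.etree.ElementTree module; the function name fraction_mathml is hypothetical. Even this small formula needs nested mfrac, mrow, mi, mo, and mn elements, which is one reason MathML is usually generated by software rather than typed by hand.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def fraction_mathml() -> str:
    """Build presentation MathML for the fraction (a+b)/2."""
    math = ET.Element("math", xmlns=MATHML_NS)
    frac = ET.SubElement(math, "mfrac")
    numerator = ET.SubElement(frac, "mrow")  # a + b
    for tag, text in [("mi", "a"), ("mo", "+"), ("mi", "b")]:
        ET.SubElement(numerator, tag).text = text
    ET.SubElement(frac, "mn").text = "2"     # denominator
    return ET.tostring(math, encoding="unicode")


tex = r"\frac{a+b}{2}"
mathml = fraction_mathml()
print(len(tex), len(mathml))
```

Here the TeX input is 13 characters long, while the serialized MathML is roughly ten times as long.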

For all these reasons, TeX and MathML play complementary roles, and authors can use one or the other depending on the context. As MathML becomes more widespread, conversion between TeX and MathML will become increasingly important. In particular, authors and publishers will need to convert the vast number of TeX documents stored in scientific databases into HTML or XHTML (with embedded MathML) for deployment on the Web.

An author who wants to convert a TeX document into a Web document that contains MathML has two main options:

  • Convert individual mathematical formulas in the document from TeX to MathML. The resulting MathML markup can then be pasted into an HTML document (produced using some other application) to produce a complete document that can be displayed in a Web browser.
  • Convert the entire TeX document into HTML or XHTML, with all formulas converted into MathML. This approach is better suited to large-scale conversion of complete documents.
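For the first option, the individually converted formulas still have to be placed in a Web page. The Python sketch below wraps MathML fragments, assumed to be complete math elements in the MathML namespace, in a minimal XHTML page, checking that each fragment is well-formed XML first. The function name make_page and the sample formula are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def make_page(title: str, formulas: list[str]) -> str:
    """Embed already-converted MathML fragments in a minimal XHTML page."""
    for fragment in formulas:
        root = ET.fromstring(fragment)  # raises ParseError if not well-formed
        if root.tag != f"{{{MATHML_NS}}}math":
            raise ValueError("each fragment must be a <math> element in the MathML namespace")
    body = "\n".join(f"  <p>{fragment}</p>" for fragment in formulas)
    return (
        f'<html xmlns="{XHTML_NS}">\n'
        f"<head><title>{escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body>\n"
        "</html>\n"
    )


formula = f'<math xmlns="{MATHML_NS}"><msqrt><mi>x</mi></msqrt></math>'
page = make_page("Square roots", [formula])
```

Because the MathML fragment carries its own namespace declaration, it remains valid when nested inside the XHTML body.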

We shall see examples of both types of conversions in this chapter.

Most TeX constructs can be easily mapped to equivalent presentation MathML elements. For example, the \frac and \sqrt commands in LaTeX correspond to the mfrac and msqrt elements in MathML. However, software that translates TeX to MathML faces two kinds of problems. First, complicated TeX constructs, especially those involving user-defined macros, are hard to interpret. Second, scripted expressions such as $(a+b)^2$ cause difficulties. When a program that parses a formula in linear order encounters a superscript, it may no longer know where the base expression begins. Hence, the program might not be able to translate an expression like $(a+b)^2$ into a single <msup>...</msup> element whose base is the whole group (a+b).
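One way around the superscript problem is to parse the formula into nested units before emitting any markup, so that a script attaches to whatever complete unit precedes it. The Python sketch below is a small recursive-descent translator built on that assumption. It handles only letters, digits, single-character operators, braces, parentheses, \frac, \sqrt, and one ^ or _ per base, and it rejects every other command, including user-defined macros. The names tex_to_mathml and Parser are hypothetical and do not correspond to any of the tools discussed in this chapter.

```python
import re
from xml.sax.saxutils import escape

# A control sequence such as \frac, or any single non-space character.
TOKEN = re.compile(r"\\[A-Za-z]+|\S")
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def wrap(items: list[str]) -> str:
    """Return a single item as-is; group several items in an <mrow>."""
    return items[0] if len(items) == 1 else "<mrow>" + "".join(items) + "</mrow>"


class Parser:
    def __init__(self, tex: str):
        self.tokens = TOKEN.findall(tex)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of formula")
        self.pos += 1
        return tok

    def parse_items(self, closer=None) -> list[str]:
        """Parse atoms up to `closer` (which is consumed) or to the end of input."""
        items = []
        while self.peek() is not None and self.peek() != closer:
            base = self.parse_atom()
            # The script attaches to the complete atom just parsed, which
            # may be a whole parenthesized group such as (a+b).
            if self.peek() == "^":
                self.advance()
                base = f"<msup>{base}{self.parse_atom(single=True)}</msup>"
            elif self.peek() == "_":
                self.advance()
                base = f"<msub>{base}{self.parse_atom(single=True)}</msub>"
            items.append(base)
        if closer is not None:
            self.advance()  # raises if the closing token is missing
        return items

    def parse_atom(self, single: bool = False) -> str:
        """Parse one atom; `single` mimics TeX taking one token as an argument."""
        tok = self.advance()
        if tok == "{":
            return wrap(self.parse_items("}"))
        if tok == "(":
            return wrap(["<mo>(</mo>", *self.parse_items(")"), "<mo>)</mo>"])
        if tok == r"\frac":
            num = self.parse_atom(single=True)
            den = self.parse_atom(single=True)
            return f"<mfrac>{num}{den}</mfrac>"
        if tok == r"\sqrt":
            return f"<msqrt>{self.parse_atom(single=True)}</msqrt>"
        if tok.startswith("\\"):
            # Unknown commands, including user-defined macros, are not handled.
            raise ValueError(f"unsupported command {tok}")
        if tok in ("^", "_"):
            raise ValueError(f"unexpected {tok}: combined sub/superscripts are not handled")
        if tok.isdigit():
            while not single and (self.peek() or "").isdigit():
                tok += self.advance()  # merge a run of digits into one number
            return f"<mn>{tok}</mn>"
        if tok.isalpha():
            return f"<mi>{tok}</mi>"
        return f"<mo>{escape(tok)}</mo>"


def tex_to_mathml(tex: str) -> str:
    body = wrap(Parser(tex).parse_items())
    return f'<math xmlns="{MATHML_NS}">{body}</math>'
```

With this approach, tex_to_mathml("(a+b)^2") places the whole parenthesized group inside the msup element: <msup><mrow><mo>(</mo><mi>a</mi><mo>+</mo><mi>b</mi><mo>)</mo></mrow><mn>2</mn></msup>. As in TeX, a script or \frac argument written without braces takes a single token, so x^10 is read as x to the first power followed by the digit 0.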

It is useful to keep these issues in mind while you try out the software tools discussed in the rest of this chapter. Most of the tools for converting between TeX and MathML are still at an early stage of development, mainly because MathML itself is relatively new. Hence, you may need some trial and error to fine-tune your results and get documents that display on the Web the way you want. However, these tools are under active development, so you can expect rapid progress both in the range of TeX constructs that can be translated and in the quality of the output.

Copyright © CHARLES RIVER MEDIA, INC., Massachusetts (USA) 2003
Printing of the online version is permitted exclusively for private use. Otherwise this chapter from the book "The MathML Handbook" is subject to the same provisions as those applicable for the hardcover edition: The work including all its components is protected by copyright. All rights reserved, including reproduction, translation, microfilming as well as storage and processing in electronic systems.


CHARLES RIVER MEDIA, INC., 20 Downer Avenue, Suite 3, Hingham, Massachusetts 02043, United States of America