Translating between content and presentation markup

(Excerpt from "The MathML Handbook" by Pavi Sandhu)

In the XSLT transformations discussed so far, both the input and the output were in the form of presentation MathML. However, XSLT is also an effective tool for transforming content MathML into presentation MathML, or vice versa.

Converting presentation MathML to content MathML is useful in situations where the mathematical meaning of the markup is important. For example, a student might want to copy a mathematical formula displayed in a Web page (using presentation MathML) and then paste the markup into a computer algebra system for evaluation. For many types of formulas, the mathematical meaning can be inferred from the notation and so a stylesheet can apply simple heuristic rules for transforming presentation markup into content markup.

For example, a superscript, as in x², can be interpreted as a power. However, this interpretation is not always accurate: an author might use x² to denote the second component of a vector x. In general, any notation described using presentation MathML can have more than one meaning, or sometimes no meaning at all. Hence, it is impossible to write a single stylesheet that is general enough to convert arbitrary presentation MathML into content MathML, even though such a conversion is possible in specific cases.
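
As a rough illustration of such a heuristic rule, the following sketch treats every msup element as a power. It is not taken from the book: it assumes MathML elements without a namespace (as in the examples later in this section) and handles only mi and mn operands.

Example: A sketch of a heuristic stylesheet that interprets superscripts as powers.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Drop whitespace-only text nodes so that only elements are processed. -->
  <xsl:strip-space elements="*"/>
  <xsl:template match="math">
    <math><xsl:apply-templates/></math>
  </xsl:template>
  <!-- Heuristic: read a superscript as base raised to exponent. -->
  <xsl:template match="msup">
    <apply>
      <power/>
      <xsl:apply-templates select="*[1]"/>
      <xsl:apply-templates select="*[2]"/>
    </apply>
  </xsl:template>
  <xsl:template match="mi">
    <ci><xsl:apply-templates/></ci>
  </xsl:template>
  <xsl:template match="mn">
    <cn><xsl:apply-templates/></cn>
  </xsl:template>
</xsl:stylesheet>

Given <msup><mi>x</mi><mn>2</mn></msup> inside a math element, this produces markup equivalent to <apply><power/><ci>x</ci><cn>2</cn></apply>. It gives the same result when the author meant the second component of a vector, which is exactly the ambiguity that limits this kind of conversion.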

A more realistic goal is to do the reverse transformation; that is, convert content markup into presentation markup. Since content markup specifies mathematical meaning unambiguously, this type of conversion is always possible. In principle, one can create a single stylesheet that is general enough to take almost any type of valid content markup expression and convert it into the corresponding presentation markup. Such a stylesheet can be very useful since it allows you to specify the notation for a formula independently of its mathematical meaning. For example, an author can create a technical paper in which all the formulas are specified using content MathML. A publisher can then use an XSLT stylesheet to convert the content MathML in the document into presentation MathML while applying specific notational rules that enforce the style of a particular journal.

The conversion of content markup into presentation markup is also of great importance when you are displaying MathML in Web browsers. Most browsers that support native display of MathML, such as Amaya, Mozilla 1.0, and Netscape 7.0, can recognize only MathML presentation tags. They cannot render content MathML. However, Mozilla 1.0 and Netscape 7.0 do have built-in support for XSLT transformations. Hence, one possible strategy for rendering content MathML in such browsers is to use an XSLT stylesheet that will convert any content MathML expression into presentation MathML. This can then be displayed using the browser’s native rendering abilities.
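
In browsers with built-in XSLT support, a stylesheet is usually attached to an XML document through the xml-stylesheet processing instruction. The fragment below is only a sketch: the file name content-to-presentation.xsl is hypothetical, and whether the transformed result is rendered depends on the browser. In practice the math element would also need to be in the MathML namespace (http://www.w3.org/1998/Math/MathML) for a browser to render it, a detail that the simplified examples in this section leave out.

Example: A sketch of a content MathML document that asks the browser to apply a stylesheet.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="content-to-presentation.xsl"?>
<math>
  <apply>
    <factorial/>
    <ci>n</ci>
  </apply>
</math>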

David Carlisle, as part of his work on the Universal MathML stylesheet, has successfully implemented this approach. As we saw under The Universal MathML stylesheet, this is a large and complex XSLT stylesheet that allows both content and presentation MathML to be displayed on a wide variety of browsers and using a variety of plug-ins. One feature of this stylesheet is that it includes templates for all the different types of content elements. When applied to any document that contains content MathML, the stylesheet automatically transforms each instance of content MathML in the document into an equivalent presentation MathML expression.

In this section, we focus on giving some simple examples of XSLT transformations for converting content MathML to presentation MathML. Of course, you do not need to write such a stylesheet from scratch; for most purposes, you can use the stylesheet already created by David Carlisle. The purpose of this section is to illustrate some of the general techniques and issues involved in doing such conversions. You can then use these techniques to customize the existing stylesheet, for example, to implement a different set of notational preferences.
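
One way to layer such customizations on top of an existing stylesheet is xsl:import, because templates in the importing stylesheet take precedence over the imported ones. The sketch below rests on assumptions: the file name base-stylesheet.xsl is hypothetical and stands for whichever content-to-presentation stylesheet you start from, and the override assumes that the base stylesheet matches elements without a namespace, as the examples in this section do. It replaces the usual notation for the factorial element (used in the examples that follow) with a made-up house style that writes fact(n).

Example: A sketch of a stylesheet that imports an existing stylesheet and overrides one template.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical file name; xsl:import must precede all other top-level elements. -->
  <xsl:import href="base-stylesheet.xsl"/>
  <!-- This template has higher import precedence than any template in the imported file. -->
  <xsl:template match="apply[factorial]">
    <mrow>
      <mi>fact</mi>
      <mo>(</mo>
      <xsl:apply-templates select="*[2]"/>
      <mo>)</mo>
    </mrow>
  </xsl:template>
</xsl:stylesheet>

All other content elements continue to be handled by the imported templates, so only the notation you want to change has to be written out.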

Factorials

As the first example, let’s write a stylesheet that will transform a content MathML document involving the factorial operator into presentation MathML. The following example shows a simple MathML document that we can use to illustrate this behavior.

Example: A content MathML document that uses the factorial element.

<math>
  <apply>
    <factorial/>
    <ci>n</ci>
  </apply>
</math>

The factorial operator is typically indicated by an exclamation mark after its operand; that is, the factorial of n is shown as n!. The following example shows a simple stylesheet that generates presentation markup using this notation.

Example: An XSLT stylesheet for transforming content MathML expressions involving the factorial operator.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="math">
    <math><xsl:apply-templates/></math>
  </xsl:template>
  <xsl:template match="ci">
    <mi><xsl:apply-templates/></mi>
  </xsl:template>
  <xsl:template match="apply[factorial]">
    <mrow><xsl:apply-templates select="*[2]"/><mo>!</mo></mrow>
  </xsl:template>
</xsl:stylesheet>

The key template in the stylesheet is the one for the apply element. The match attribute of this template has the value apply[factorial], which is the XPath pattern for an apply element having a factorial element as one of its children. When this template is matched, it first writes an opening mrow tag to the output. It then applies templates to the second child of the apply element, which happens to be a ci element. This triggers the template for the ci element, which simply copies the content of the matched ci element into the output and replaces the ci tags with mi tags. The template for the apply element then resumes and writes the literal element <mo>!</mo> to the output, followed by a closing mrow tag.

Applying the stylesheet to the above MathML document yields the following output:

<?xml version="1.0" encoding="utf-8"?>
<math>
  <mrow><mi>n</mi><mo>!</mo></mrow>
</math>

The above stylesheet is a relatively simple one and is not general enough to accommodate all the different contexts in which a factorial element might occur in a content MathML expression. It is easy to construct cases where the stylesheet breaks down. For example, suppose you used the content MathML expression for the factorial of 2n as the input expression instead of n, as shown in the following example.

Example: A MathML document that contains the content markup for (2n)!

<math>
  <apply>
    <factorial/>
    <apply><times/><cn>2</cn><ci>n</ci></apply>
  </apply>
</math>

Applying the same stylesheet to this expression yields the following output:

<?xml version="1.0" encoding="utf-8"?>
<math>
  <mrow>2<mi>n</mi><mo>!</mo></mrow>
</math>

This output is not valid presentation markup since the 2 is not enclosed in an mn element, as required by MathML. The reason for this is that the stylesheet does not contain a template for the cn element. Hence, when a cn element is encountered, the default template for it is used, which has the effect of simply copying the element's content without wrapping it in any tags. For cn elements to be processed properly, we can add the following template to the stylesheet:

<xsl:template match="cn">
  <mn><xsl:apply-templates/></mn>
</xsl:template>

With the addition of this template to the stylesheet, processing the MathML document yields the following output:

<?xml version="1.0" encoding="utf-8"?>
<math>
  <mrow>
    <mn>2</mn>
    <mi>n</mi>
    <mo>!</mo>
  </mrow>
</math>

This is better than before, since the 2 is now properly enclosed in an mn element. However, there is still a problem with the output expression. If it is rendered in a browser, it will appear as 2n!, which is ambiguous, since it could represent either the factorial of 2n or two times the factorial of n. Presentation markup provides the entity reference &InvisibleTimes; to distinguish such cases, and we could modify our stylesheet to insert this reference wherever multiplication is intended. But it would be better still to use parentheses to explicitly mark all complex operands to which the factorial operator is applied, as in (2n)!. You can achieve this by modifying the template for the apply element in the stylesheet to the form shown below:

<xsl:template match="apply[factorial]">
  <mrow>
    <xsl:choose>
      <xsl:when test="*[2][self::ci or self::cn]">
        <xsl:apply-templates select="*[2]"/>
        <mo>!</mo>
      </xsl:when>
      <xsl:otherwise>
        <mrow>
          <mo>(</mo>
          <xsl:apply-templates select="*[2]"/>
          <mo>)</mo>
        </mrow>
        <mo>!</mo>
      </xsl:otherwise>
    </xsl:choose>
  </mrow>
</xsl:template>

The difference between this and the template shown in the example "An XSLT stylesheet for transforming content MathML expressions involving the factorial operator" is that we have now included an xsl:choose element, with xsl:when and xsl:otherwise branches, for conditional processing. When the child element immediately following the factorial element is either a ci or a cn element, the same processing is done as in the earlier example. Otherwise, parentheses are placed around the operand. The revised template therefore takes into account the context in which the factorial element occurs. The following example shows the earlier stylesheet modified to include this more complex template as well as a template for the cn element.

Example: An XSLT stylesheet that causes parentheses to be placed around complex operands of the factorial operator.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="math">
    <math><xsl:apply-templates/></math>
  </xsl:template>
  <xsl:template match="ci">
    <mi><xsl:apply-templates/></mi>
  </xsl:template>
  <xsl:template match="cn">
    <mn><xsl:apply-templates/></mn>
  </xsl:template>
  <xsl:template match="apply[factorial]">
    <mrow>
      <xsl:choose>
        <xsl:when test="*[2][self::ci or self::cn]">
          <xsl:apply-templates select="*[2]"/>
          <mo>!</mo>
        </xsl:when>
        <xsl:otherwise>
          <mrow>
            <mo>(</mo>
            <xsl:apply-templates select="*[2]"/>
            <mo>)</mo>
          </mrow>
          <mo>!</mo>
        </xsl:otherwise>
      </xsl:choose>
    </mrow>
  </xsl:template>
</xsl:stylesheet>

Applying this stylesheet to the document in the example "A MathML document that contains the content markup for (2n)!" yields the following output, which renders correctly as (2n)!:

<?xml version="1.0" encoding="utf-8"?>
<math>
  <mrow><mrow><mo>(</mo><mn>2</mn><mi>n</mi><mo>)</mo></mrow><mo>!</mo></mrow>
</math>
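
The other option mentioned earlier, marking multiplication explicitly, could be sketched with a template for apply elements whose operator is times. This is an illustrative assumption rather than part of the stylesheets above. It writes the character reference &#8290; (U+2062, the character that &InvisibleTimes; stands for), because the entity name is not declared in a stand-alone stylesheet, and it assumes that times is the first child of the apply element.

Example: A sketch of a stylesheet that inserts an invisible times operator between the operands of a product.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="math">
    <math><xsl:apply-templates/></math>
  </xsl:template>
  <xsl:template match="ci">
    <mi><xsl:apply-templates/></mi>
  </xsl:template>
  <xsl:template match="cn">
    <mn><xsl:apply-templates/></mn>
  </xsl:template>
  <xsl:template match="apply[times]">
    <mrow>
      <!-- Visit every child after the times operator, that is, the operands. -->
      <xsl:for-each select="*[position() &gt; 1]">
        <!-- Emit the invisible times operator before every operand except the first. -->
        <xsl:if test="position() &gt; 1"><mo>&#8290;</mo></xsl:if>
        <xsl:apply-templates select="."/>
      </xsl:for-each>
    </mrow>
  </xsl:template>
</xsl:stylesheet>

For the product of 2 and n, this yields an mrow that contains <mn>2</mn>, an mo element holding the invisible times character, and <mi>n</mi>. A renderer or a computer algebra system can then tell the product apart from a two-character identifier, although the visual ambiguity of 2n! remains without the parentheses.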

Integrals

The last example demonstrated the importance of writing templates that take into account the context in which specific elements might occur in the input document. It is not enough to write a stylesheet that works as intended for a specific document. For the stylesheet to be generally useful, the template for any element must be sufficiently robust to work properly in all the contexts in which the element could occur. Also, since templates can be called recursively, it is important to ensure that the templates for different elements do not interact in unintended ways. Clearly, writing a stylesheet that is general enough to work for all documents that could be written in a particular XML format, such as MathML, is a challenging task and requires a great deal of testing and fine-tuning.

The importance of making templates as general as possible is illustrated in the next example. Suppose we want to create a stylesheet that will convert the content markup for an integral into the corresponding presentation markup. The following example shows the content markup for a simple integral, the integral of sin x with respect to x from a to b. Notice that the limits of the integral are specified using the qualifier elements lowlimit and uplimit.

Example: A MathML document showing the content markup for an integral.

<math>
  <apply>
    <int/>
    <bvar><ci>x</ci></bvar>
    <lowlimit><ci>a</ci></lowlimit>
    <uplimit><ci>b</ci></uplimit>
    <apply>
      <ci>sin</ci>
      <ci>x</ci>
    </apply>
  </apply>
</math>

The next example shows an XSLT stylesheet for transforming expressions like the one in the MathML document above into presentation markup.

Example: An XSLT stylesheet for transforming the content markup for an integral into presentation markup.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="math">
    <math><xsl:apply-templates/></math>
  </xsl:template>
  <xsl:template match="ci">
    <mi><xsl:apply-templates/></mi>
  </xsl:template>
  <xsl:template match="apply[sin]">
    <mrow><mi>sin</mi><mo>(</mo><xsl:apply-templates/><mo>)</mo></mrow>
  </xsl:template>
  <xsl:template match="apply[int]">
    <mrow>
      <msubsup>
        <mo>&#8747;</mo>
        <mrow><xsl:apply-templates select="lowlimit"/></mrow>
        <mrow><xsl:apply-templates select="uplimit"/></mrow>
      </msubsup>
      <xsl:apply-templates select="*[last()]"/>
      <mo>d</mo>
      <xsl:apply-templates select="bvar"/>
    </mrow>
  </xsl:template>
</xsl:stylesheet>

The key template in this stylesheet is the one whose match attribute is set to apply[int]. This template is matched by any apply element that has an int element as a child. The three important statements in this template are:

<xsl:apply-templates select="lowlimit"/>
<xsl:apply-templates select="uplimit"/>
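<xsl:apply-templates select="*[last()]"/>

These statements apply templates to the lowlimit element, the uplimit element, and the last child element of the apply element, respectively. *[last()] is an XPath expression that selects the last child element of the current node. On its own, last() is a function that returns a number (the size of the current context), not a node, so it cannot be used directly as the value of a select attribute.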

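Applying the above stylesheet to the above MathML document results in the following output. Because the integrand applies the identifier <ci>sin</ci> rather than the empty sin element, the apply[sin] template is not triggered, and the inner apply element is handled by XSLT's built-in template rules:

<?xml version="1.0" encoding="utf-8"?>
<math>
  <mrow>
    <msubsup>
      <mo>&#8747;</mo>
      <mrow><mi>a</mi></mrow>
      <mrow><mi>b</mi></mrow>
    </msubsup>
    <mi>sin</mi>
    <mi>x</mi>
    <mo>d</mo>
    <mi>x</mi>
  </mrow>
</math>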

If this presentation markup is viewed in a MathML-enabled browser, it renders as the integral of sin x dx from a to b, indicating that the transformation was successful.

However, the template used for the int element in the above stylesheet is not very general. Recall that in content markup, the limits of a definite integral can also be represented using other content elements, such as interval and condition. For example, the integral shown earlier can also be represented using the content markup in the following example.

Example: A content MathML document that represents an integral using the interval element.

<math>
  <apply>
    <int/>
    <bvar><ci>x</ci></bvar>
    <interval><ci>a</ci><ci>b</ci></interval>
    <apply>
      <ci>sin</ci>
      <ci>x</ci>
    </apply>
  </apply>
</math>

You can extend the above stylesheet to take into account this alternative method of representing the limits of an integral. The only template that needs to be changed is the one that matches apply[int]. You can generalize the xsl:apply-templates elements that select the lowlimit and uplimit elements by modifying the value of their select attribute. The modified elements look as follows:

<xsl:apply-templates select="lowlimit|interval/*[1]"/>

<xsl:apply-templates select="uplimit|interval/*[2]"/>

Here, lowlimit|interval/*[1] is an XPath expression that selects either a lowlimit element or the first child of the interval element. Similarly, uplimit|interval/*[2] selects either the uplimit element or the second child of the interval element. The modified stylesheet with these generalized elements is shown in the following example.

Example: A modified XSLT stylesheet for converting integrals from content markup to presentation markup.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="math">
    <math><xsl:apply-templates/></math>
  </xsl:template>
  <xsl:template match="ci">
    <mi><xsl:apply-templates/></mi>
  </xsl:template>
  <xsl:template match="apply[sin]">
    <mrow><mi>sin</mi><mo>(</mo><xsl:apply-templates/><mo>)</mo></mrow>
  </xsl:template>
  <xsl:template match="apply[int]">
    <mrow>
      <msubsup>
        <mo>&#8747;</mo>
        <mrow><xsl:apply-templates select="lowlimit|interval/*[1]"/></mrow>
        <mrow><xsl:apply-templates select="uplimit|interval/*[2]"/></mrow>
      </msubsup>
      <xsl:apply-templates select="*[last()]"/>
      <mo>d</mo>
      <xsl:apply-templates select="bvar"/>
    </mrow>
  </xsl:template>
</xsl:stylesheet>

Applying this stylesheet to the above MathML document results in the following output:

<?xml version="1.0" encoding="utf-8"?>
<math>
  <mrow>
    <msubsup>
      <mo>&#8747;</mo>
      <mrow><mi>a</mi></mrow>
      <mrow><mi>b</mi></mrow>
    </msubsup>
    <mi>sin</mi>
    <mi>x</mi>
    <mo>d</mo>
    <mi>x</mi>
  </mrow>
</math>

It is easy to generalize the stylesheet further so it can handle integrals in which the limits are represented using the condition element, as well as the interval or lowlimit and uplimit elements. This would require adding templates for all elements that can occur as child elements of the condition element.
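
As a rough sketch of that generalization, the stylesheet below handles an integral whose domain is given by a condition element. The details are assumptions made for illustration: the condition is shown as a subscript of the integral sign, the only condition supported is set membership (the in element), and the priority attribute is there so that this template would win over a plain apply[int] template if both were present in the same stylesheet.

Example: A sketch of a stylesheet for integrals whose domain is given by a condition element.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="math">
    <math><xsl:apply-templates/></math>
  </xsl:template>
  <xsl:template match="ci">
    <mi><xsl:apply-templates/></mi>
  </xsl:template>
  <!-- Assumed layout: the condition becomes a subscript of the integral sign. -->
  <xsl:template match="apply[int][condition]" priority="1">
    <mrow>
      <msub>
        <mo>&#8747;</mo>
        <mrow><xsl:apply-templates select="condition"/></mrow>
      </msub>
      <xsl:apply-templates select="*[last()]"/>
      <mo>d</mo>
      <xsl:apply-templates select="bvar"/>
    </mrow>
  </xsl:template>
  <!-- One possible child of condition: set membership, written with U+2208. -->
  <xsl:template match="apply[in]">
    <mrow>
      <xsl:apply-templates select="*[2]"/>
      <mo>&#8712;</mo>
      <xsl:apply-templates select="*[3]"/>
    </mrow>
  </xsl:template>
</xsl:stylesheet>

If the interval element in the earlier example is replaced by <condition><apply><in/><ci>x</ci><ci>D</ci></apply></condition>, the integral sign receives the subscript x ∈ D. Supporting other conditions, such as inequalities, would mean adding a template like the one for apply[in] for each relation that can appear inside condition.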

   

Copyright © CHARLES RIVER MEDIA, INC., Massachusetts (USA) 2003
Printing of the online version is permitted exclusively for private use. Otherwise this chapter from the book "The MathML Handbook" is subject to the same provisions as those applicable for the hardcover edition: The work including all its components is protected by copyright. All rights reserved, including reproduction, translation, microfilming as well as storage and processing in electronic systems.


CHARLES RIVER MEDIA, INC., 20 Downer Avenue, Suite 3, Hingham, Massachusetts 02043, United States of America