Atomic Steps

An Atomic Step performs a single operation. It contains no internal “subpipelines” or further steps; it carries out one simple action.

Figure: Atomic Steps in XProc

The important Atomic Steps

This section introduces and explains the most important Atomic Steps of XProc.

p:identity

<p:declare-step type="p:identity">
   <p:input port="source" sequence="true"/>
   <p:output port="result" sequence="true"/>
</p:declare-step>

The <p:identity> step makes a direct copy of whatever arrives on its input port (“source”) and re-outputs it unchanged on the output port (“result”).

Example

The following script re-outputs the entire content of the input port directly.

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
  <p:input port="source">
    <p:document href="FilmCollection.xml"/>
  </p:input>
  <p:output port="result"/>
  <p:identity/>
</p:declare-step>

p:xslt

<p:declare-step type="p:xslt">
   <p:input port="source" sequence="true" primary="true"/>
   <p:input port="stylesheet"/>
   <p:input port="parameters" kind="parameter"/>
   <p:output port="result" primary="true"/>
   <p:output port="secondary" sequence="true"/>
   <p:option name="initial-mode"/> <!-- QName -->
   <p:option name="template-name"/> <!-- QName -->
   <p:option name="output-base-uri"/> <!-- anyURI -->
   <p:option name="version"/> <!-- string --> 
</p:declare-step>

The <p:xslt> step performs XSLT transformations. It has three input ports: “source” receives the document to be transformed, “stylesheet” receives the XSLT stylesheet to be applied, and “parameters” receives optional stylesheet parameters. The step also has two output ports. The primary output port (“result”) provides the principal result of the transformation; the second output port (“secondary”) provides all other result documents (if any have been generated). In addition, four options are available. “initial-mode” names the mode in which the transformation starts; this mode must be defined in the underlying XSLT stylesheet. “template-name” names a named template in the XSLT stylesheet, and the transformation then starts with this template. “output-base-uri” sets the base URI for the output of the XSLT transformation. “version” indicates the XSLT version to be used (1.0 or 2.0). All options are optional.

Example

In this example an XSLT stylesheet is created that transforms the imported file into an XHTML page listing the film titles.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl" version="2.0">
  <xsl:output method="xhtml"/>
  <xsl:template match="/">
    <html>
      <body>
        <h1>FilmList</h1>
        <ul>
          <xsl:apply-templates select="//Title"/>
        </ul>
      </body>
    </html>
  </xsl:template>
  <xsl:template match="Title">
    <li>
      <xsl:value-of select="."/>
    </li>
  </xsl:template>
</xsl:stylesheet>

The XSLT stylesheet above generates an XHTML page from the imported XML file. The page contains an unordered list of the film titles. This stylesheet is now referenced from the XProc document.

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
 <p:input port="source">
  <p:document href="FilmCollection.xml"></p:document>
 </p:input>
 <p:output port="result"/>
 <p:xslt>
   <p:input port="stylesheet">
     <p:document href="xslt_stylesheet.xsl"></p:document>
   </p:input>
   <p:input port="parameters">
     <p:empty/>
   </p:input>
 </p:xslt>
</p:declare-step>

The <p:xslt> step defines the required input ports. On the “stylesheet” port, the XSLT stylesheet to be applied is supplied via <p:document>. For the purpose of this example, it is assumed that all files reside in the same directory. Since the stylesheet has no parameters, this has to be stated explicitly with <p:empty> (otherwise the unbound “parameters” port causes the processor to report an error).

The result of the transformation is as follows:

<html xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl">
 <body>
   <h1>FilmList</h1>
   <ul>
     <li>Star Wars: Episode IV - A New Hope</li>
     <li>Eraserhead</li>
     <li>Unforgiven</li>
   </ul>
 </body>
</html>
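
Where a stylesheet does declare parameters, the “parameters” port does not have to be bound to <p:empty>. The following is only a minimal sketch under an assumption: it supposes a hypothetical parameter named “heading”, declared as <xsl:param name="heading"/> in “xslt_stylesheet.xsl” (the stylesheet shown above does not declare it). The value is passed with <p:with-param>:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source">
    <p:document href="FilmCollection.xml"/>
  </p:input>
  <p:output port="result"/>
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="xslt_stylesheet.xsl"/>
    </p:input>
    <!-- "heading" is a hypothetical parameter; the stylesheet has to declare it -->
    <p:with-param name="heading" select="'My film list'"/>
  </p:xslt>
</p:declare-step>
```

The “select” attribute holds an XPath expression, which is why the string value is wrapped in an additional pair of single quotes.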

p:filter

<p:declare-step type="p:filter">
   <p:input port="source"/>
   <p:output port="result" sequence="true"/>
   <p:option name="select" required="true"/> <!-- XPathExpression -->
</p:declare-step>

The <p:filter> step passes on only selected parts of its input. Whereas a plain input port delivers the whole document, this step lets the user define or filter the content precisely with an appropriate XPath expression.

Example

In the following example all Title elements are selected from the input document.

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
  <p:input port="source">
    <p:document href="FilmCollection.xml"/>
  </p:input>
  <p:output port="result" sequence="true"/>
  <p:filter select="/FilmCollection/Film/Title"/>
</p:declare-step>

The XPath expression is given in the “select” attribute. The Boolean “sequence” attribute of <p:output> should be set to “true”, since an XPath expression normally produces several output documents (one document for each match).

The following output is produced by the XProc processor:

<Title>Star Wars: Episode IV - A New Hope</Title>
<Title>Eraserhead</Title>
<Title>Unforgiven</Title>
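
A sequence of documents is not always convenient for subsequent steps. As a hedged sketch, the matches can be collected into one document with the standard <p:wrap-sequence> step; the wrapper element name “Titles” is an arbitrary choice made for this illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source">
    <p:document href="FilmCollection.xml"/>
  </p:input>
  <p:output port="result"/>
  <!-- produces one document per matching Title element -->
  <p:filter select="/FilmCollection/Film/Title"/>
  <!-- wraps the whole sequence in a single Titles element (name chosen for this sketch) -->
  <p:wrap-sequence wrapper="Titles"/>
</p:declare-step>
```

Without the “group-adjacent” option, <p:wrap-sequence> places all incoming documents in a single wrapper, so the pipeline output here is one document.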

p:compare

<p:declare-step type="p:compare">
   <p:input port="source" primary="true"/>
   <p:input port="alternate"/>
   <p:output port="result" primary="false"/>
   <p:option name="fail-if-not-equal" select="'false'"/> <!-- boolean -->
</p:declare-step>

The <p:compare> step compares two documents for equality. If both documents are equal, the Boolean value “true” is output on the “result” port, otherwise “false”.

Example

In this example two documents are compared.

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0" name="Pipeline">
  <p:input port="source">
    <p:document href="FilmCollection.xml"/>
  </p:input>
  <p:input port="alternate">
    <p:document href="FilmCollection_2.xml"/>
  </p:input>
  <p:output port="result"/>
  <p:compare name="Compare" fail-if-not-equal="false">
    <p:input port="source">
      <p:pipe port="source" step="Pipeline"/>
    </p:input>
    <p:input port="alternate">
      <p:pipe port="alternate" step="Pipeline"/>
    </p:input>
  </p:compare>
  <p:identity>
    <p:input port="source">
      <p:pipe port="result" step="Compare"/>
    </p:input>
  </p:identity>
</p:declare-step>

In this example the documents “FilmCollection.xml” and “FilmCollection_2.xml” are compared. The pipeline is given a name (“Pipeline”) in the <p:declare-step> root element so that its ports can later be connected to the corresponding ports of the <p:compare> step. The <p:compare> step is named “Compare” and “fail-if-not-equal” is set to “false”. As a consequence, “true” or “false” is output as the result. If the option were set to “true”, the XProc processor would instead raise a dynamic error when the documents are unequal.

As stated above, the two input ports of <p:compare> are connected to the outer input ports via <p:pipe>. For this, the “step” attribute needs the name of the step to be linked (“Pipeline” in this example). Since <p:compare> is now provided with the appropriate documents, the comparison can be executed. The result is read and output by <p:identity>.

If there is a difference between the documents, the result is as follows:

<c:result>false</c:result>

p:directory-list

<p:declare-step type="p:directory-list">
   <p:output port="result"/>
   <p:option name="path" required="true"/> <!-- anyURI -->
   <p:option name="include-filter"/> <!-- RegularExpression -->
   <p:option name="exclude-filter"/> <!-- RegularExpression -->
</p:declare-step>

With <p:directory-list> it is possible to read the content of a directory in the file system. The desired directory is indicated by the “path” option. If the processor cannot find the directory, a dynamic error is raised. Furthermore, filtering rules can be defined. Each rule must be a regular expression according to the XPath 2.0 rules. “include-filter” determines which entries are included in the output; similarly, “exclude-filter” defines which entries are left out.

Example

In the following example the current directory (“.”) is listed.

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
  <p:input port="source"> </p:input>
  <p:output port="result"/>
  <p:directory-list path="."/>
</p:declare-step>

The output is as follows:

<c:directory name="directory" xml:base="file:../Bachelorarbeit/SourceCodes/directory/">
 <c:file name="directory.xpl"/>
 <c:file name="file.xml"/>
 <c:file name="file2.xml"/>
 <c:file name="file3.txt"/>
 <c:file name="file4.doc"/>
</c:directory>

If the user only wants files with the extension “*.txt” to be output, the <p:directory-list> element needs a filter option as attribute:

<p:directory-list include-filter=".*txt" path="."/>

The output would then be as follows:

<c:directory name="directory" xml:base="file:../Bachelorarbeit/SourceCodes/directory/">
 <c:file name="file3.txt"/>
</c:directory>

If the user wants to exclude all files with the extension “*.txt” from the output, the filtering rule has to be adjusted:

<p:directory-list exclude-filter=".*txt" path="."/>

The result would be as follows:

<c:directory name="directory" xml:base="file:/Users/tg/Documents/FH%20Worms/6_Semester/Bachelorarbeit/SourceCodes/directory/">
 <c:file name="example_14b_directory.xpl"/>
 <c:file name="file.xml"/>
 <c:file name="file2.xml"/>
 <c:file name="file4.doc"/>
</c:directory>

If both filters are used, the “include-filter” rule is applied first and then the “exclude-filter” rule.
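
A small sketch of both options used together; the two patterns are chosen purely for illustration and refer to the example directory listed above:

```xml
<!-- first keep only names matching .*\.xml, then drop names matching file2.* -->
<p:directory-list path="." include-filter=".*\.xml" exclude-filter="file2.*"/>
```

Assuming the directory content shown in the first listing, the include filter keeps “file.xml” and “file2.xml”, and the exclude filter then removes “file2.xml”, so only “file.xml” should remain in the output. Escaping the dot (“\.”) makes the pattern match a literal dot rather than any character.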

p:load

<p:declare-step type="p:load">
   <p:output port="result"/>
   <p:option name="href" required="true"/> <!-- anyURI -->
   <p:option name="dtd-validate" select="'false'"/> <!-- boolean -->
</p:declare-step>

The <p:load> step is very similar in function to <p:document>. Both import an external document identified by a URI. The primary difference is that <p:document> is an XProc binding element whereas <p:load> is a step. This means that it can appear in other places in the pipeline than <p:document>. Furthermore, the step supports validation against a DTD (provided that one is declared in the document to be imported). If “dtd-validate” is set to “true” and the document is not valid, a dynamic error is raised.

Example

The following example demonstrates the simplest possible use of <p:load>.

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
  <p:output port="result"/>
  <p:load href="FilmCollection.xml"/>
</p:declare-step>
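
DTD validation is switched on with the “dtd-validate” option. The following sketch assumes that “FilmCollection.xml” contains a DOCTYPE declaration referring to a DTD, which is not shown in this document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>
  <!-- assumption: FilmCollection.xml declares a DTD in its DOCTYPE -->
  <p:load href="FilmCollection.xml" dtd-validate="true"/>
</p:declare-step>
```

If the document is valid, the output is the same as in the previous example; otherwise the processor is expected to stop with a dynamic error.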

p:make-absolute-uris

<p:declare-step type="p:make-absolute-uris">
   <p:input port="source"/>
   <p:output port="result"/>
   <p:option name="match" required="true"/> <!-- XSLTMatchPattern -->
   <p:option name="base-uri"/> <!-- anyURI -->
</p:declare-step>

The <p:make-absolute-uris> step turns the content of the elements or attributes selected by an appropriate XSLT match pattern into absolute URIs. With the “base-uri” option, the user can indicate a base URI. All entries which are transformed into an absolute URI are resolved against this value, so in the simple case they have it as prefix.

Example

In the following example the content of all Director elements is transformed into absolute URIs with the base address "http://www.fh-worms.de/".

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
  <p:input port="source">
    <p:document href="FilmCollection.xml"/>
  </p:input>
  <p:output port="result"/>
  <p:make-absolute-uris base-uri="http://www.fh-worms.de/" match="/FilmCollection/Film/Director"/>
</p:declare-step>

The result of this process is as follows (excerpt):

<FilmCollection>
  <Film>
    <Title>Star Wars: Episode IV - A New Hope</Title>
    <Year>1978</Year>
    <Genre>SciFi</Genre>
    <Director>http://www.fh-worms.de/George%20Lucas</Director>
    <Producer>George Lucas</Producer>
    <Cast>
[...]

p:exec

<p:declare-step type="p:exec">
   <p:input port="source" primary="true" sequence="true"/>
   <p:output port="result" primary="true"/>
   <p:output port="errors"/>
   <p:output port="exit-status"/>
   <p:option name="command" required="true"/> <!-- string -->
   <p:option name="args" select="''"/> <!-- string -->
   <p:option name="cwd"/> <!-- string -->
   <p:option name="source-is-xml" select="'true'"/> <!-- boolean -->
   <p:option name="result-is-xml" select="'true'"/> <!-- boolean -->
   <p:option name="wrap-result-lines" select="'false'"/> <!-- boolean -->
   <p:option name="errors-is-xml" select="'false'"/> <!-- boolean -->
   <p:option name="wrap-error-lines" select="'false'"/> <!-- boolean -->
   <p:option name="path-separator"/> <!-- string -->
   <p:option name="failure-threshold"/> <!-- integer -->
   <p:option name="arg-separator" select="' '"/> <!-- string -->
   <p:option name="byte-order-mark"/> <!-- boolean -->
   <p:option name="cdata-section-elements" select="''"/> <!-- ListOfQNames -->
   <p:option name="doctype-public"/> <!-- string -->
   <p:option name="doctype-system"/> <!-- anyURI -->
   <p:option name="encoding"/> <!-- string -->
   <p:option name="escape-uri-attributes" select="'false'"/> <!-- boolean -->
   <p:option name="include-content-type" select="'true'"/> <!-- boolean -->
   <p:option name="indent" select="'false'"/> <!-- boolean -->
   <p:option name="media-type"/> <!-- string -->
   <p:option name="method" select="'xml'"/> <!-- QName -->
   <p:option name="normalization-form" select="'none'"/> <!-- NormalizationForm -->
   <p:option name="omit-xml-declaration" select="'true'"/> <!-- boolean -->
   <p:option name="standalone" select="'omit'"/><!-- "true" | "false" | "omit" -->
   <p:option name="undeclare-prefixes"/> <!-- boolean -->
   <p:option name="version" select="'1.0'"/> <!-- string -->
</p:declare-step>

With the <p:exec> step, external commands or programmes can be executed on the command line of the respective operating system. The content arriving on the input port (“source”) is passed to the command as its standard input. The programme to be executed is defined by the “command” option, which expects a string containing the command. The “args” option supplies arguments that influence the behaviour of the programme. A typical argument is “-h”, which usually prints the help text. Multiple arguments are separated by a space character; a different separator can be defined with “arg-separator”. With the “cwd” (“Current Working Directory”) option, the user can set the directory in which the command is executed. If the processor cannot change to this directory, a dynamic error is raised. If the option is not used, the working directory depends on the implementation. The Boolean options “source-is-xml” and “result-is-xml” tell the processor whether the input and the output of the command are XML or not. If “wrap-result-lines” and/or “wrap-error-lines” are set to “true”, each line of the respective output is enclosed in a <c:line> element. If “result-is-xml” is set to “true” at the same time as “wrap-result-lines”, a dynamic error is raised.

The “path-separator” option names the character that is used as path separator in the values of “command”, “args” and “cwd”. Every occurrence of this character in those values is replaced by the path separator of the respective operating system. Further options are available which describe the data in more detail (e.g. its kind under “media-type”) and control how it is serialized (e.g. whether it is indented under “indent”). Since these options are highly case-specific and not relevant to the core purpose of the step, they are not discussed further here.

The step also has three output ports. The primary output port “result” delivers the output of the executed programme. The “errors” port delivers whatever the programme wrote to the standard error output. The “exit-status” port always delivers an integer value that indicates whether the command was executed successfully. On success, the value <c:result>0</c:result> appears; on error, a non-zero value such as <c:result>1</c:result> appears.

Example

In the following example the Unix command “cat” is executed, which outputs the “FilmCollection.xml” document.

<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0" name="testpipe">
  <p:input port="source">
    <p:document href="FilmCollection.xml"/>
  </p:input>
  <p:output port="result" sequence="true">
    <p:empty/>
  </p:output>
  <p:exec command="cat" result-is-xml="true" name="exec"/>
  <p:identity>
    <p:input port="source">
      <p:pipe port="errors" step="exec"/>
    </p:input>
  </p:identity>
  <p:store href="Error.xml"/>
  <p:identity>
    <p:input port="source">
      <p:pipe port="exit-status" step="exec"/>
    </p:input>
  </p:identity>
  <p:store href="exit_status.xml"/>
  <p:identity>
    <p:input port="source">
      <p:pipe port="result" step="exec"/>
    </p:input>
  </p:identity>
  <p:store href="Result.xml"/>
</p:declare-step>

Each of the three output ports is caught by a <p:identity> step and saved to a file via <p:store>. The result is as follows (excerpt):

<c:result>
 <FilmCollection>
   <Film>
    <Title>Star Wars: Episode IV - A New Hope</Title>
    <Year>1978</Year>
    <Genre>SciFi</Genre>
[...]

Since the process was carried out successfully, there is no output on the “errors” port. Therefore, the created file “Error.xml” has no content. On the “exit-status” port, <c:result>0</c:result> is output accordingly (which corresponds to the content of “exit_status.xml”).
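
The “args” and “wrap-result-lines” options can be illustrated with a command whose output is plain text. This is only a sketch: it assumes a Unix-like system on which the command “ls” is available, and it binds the “source” port to <p:empty> because the command needs no input:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>
  <!-- result-is-xml must be false here, because the result lines are wrapped -->
  <p:exec command="ls" args="-l" result-is-xml="false" wrap-result-lines="true">
    <p:input port="source">
      <p:empty/>
    </p:input>
  </p:exec>
</p:declare-step>
```

The “result” port then delivers a <c:result> element in which each line printed by the command is enclosed in its own <c:line> element.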

p:store

<p:declare-step type="p:store">
   <p:input port="source"/>
   <p:output port="result" primary="false"/>
   <p:option name="href" required="true"/> <!-- anyURI -->
   <p:option name="byte-order-mark"/> <!-- boolean -->
   <p:option name="cdata-section-elements" select="''"/> <!-- ListOfQNames -->
   <p:option name="doctype-public"/> <!-- string -->
   <p:option name="doctype-system"/> <!-- anyURI -->
   <p:option name="encoding"/> <!-- string -->
   <p:option name="escape-uri-attributes" select="'false'"/> <!-- boolean -->
   <p:option name="include-content-type" select="'true'"/> <!-- boolean -->
   <p:option name="indent" select="'false'"/> <!-- boolean -->
   <p:option name="media-type"/> <!-- string -->
   <p:option name="method" select="'xml'"/> <!-- QName -->
   <p:option name="normalization-form" select="'none'"/> <!-- NormalizationForm -->
   <p:option name="omit-xml-declaration" select="'true'"/> <!-- boolean -->
   <p:option name="standalone" select="'omit'"/> <!-- "true"|"false"|"omit" -->
   <p:option name="undeclare-prefixes"/> <!-- boolean -->
   <p:option name="version" select="'1.0'"/> <!-- string -->
</p:declare-step>

With the help of <p:store>, an incoming document can be saved to a URI. The “href” value is of type “anyURI”, which means that the user is free to choose the target (normally a file name). Since <p:store> has no primary output port, the output of the surrounding pipeline must be bound explicitly (in the example below with <p:empty>). All other options are optional.

Example

In the following example the imported input is written to an XML file (“output.xml”).

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
 <p:input port="source">
  <p:document href="FilmCollection.xml"></p:document>
 </p:input>
 <p:output port="result"><p:empty/></p:output>
 <p:store href="output.xml"></p:store>
</p:declare-step>

The document “output.xml” is generated in the same directory in which the XProc pipeline document is located.
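
Although it is not primary, the “result” port of <p:store> can still be read explicitly. The following sketch names the step “store” (a name chosen for this illustration) and connects the port to the pipeline output via <p:pipe>; it also sets the “indent” option:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source">
    <p:document href="FilmCollection.xml"/>
  </p:input>
  <p:output port="result">
    <!-- read the non-primary result port of the step named "store" -->
    <p:pipe step="store" port="result"/>
  </p:output>
  <p:store name="store" href="output.xml" indent="true"/>
</p:declare-step>
```

Instead of an empty result, the pipeline now outputs a <c:result> element containing the absolute URI under which the document was stored.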
