XSLT and XPath function reference in alphabetical order
(Excerpt from “XSLT 2.0 & XPath 2.0” by Frank Bongers, chapter 5, translated from German)
String functions – pattern matching
Return value:
An xs:string: the input string in which every match of the regular expression has been replaced by the replacement string.
Call:
fn:replace($inputString?, $reg-ex, $replacementString?, $flags?)
Arguments:
$inputString: Optional. An xs:string input string in which those substrings that match the regular expression of the second argument are replaced by the replacement string of the third argument. If the empty sequence is passed, it is treated like the empty string.
$reg-ex: Obligatory. A regular expression against which the input string is tested.
$replacementString: Obligatory. An xs:string replacement string by which all non-overlapping matches of the regular expression in the input string are replaced. Within the replacement string, references in the form of subgroup variables $1 ... $N to subgroups 1 to N of the reg-ex match are possible. If the replacement string is empty or the empty sequence is passed as third argument, all matches in the input string are replaced by the empty string.
$flags: Optional. The flag argument controls how the regular expression is applied. If the empty string or the empty sequence is passed for $flags, or the flag argument is omitted, the default applies: matching is case-sensitive, and the input string is regarded as a single (not line-oriented) character string.
Purpose of use:
The fn:replace() function replaces every substring of the input string that matches the regular expression with the replacement string of the third argument. If two possible matches overlap, only the first one is replaced.
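The handling of overlapping matches can be tried out with a short sketch. It uses Python's re module purely as a stand-in for an XPath processor, on the assumption that both scan from left to right and replace only non-overlapping matches; the sample string is hypothetical.

```python
import re

# Hypothetical sample input: "aba" could match at offset 0 and at offset 2,
# but those two candidate matches overlap in the middle "a".
text = "ababa"

# Only the first of the two overlapping candidates is replaced;
# the scan then continues after it, where "ba" no longer matches.
result = re.sub("aba", "X", text)
print(result)  # Xba
```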
A match occurs if the regular expression matches at one or several points of the tested string. By default, upper and lower case are distinguished (case-sensitive), and line break characters within the input string are treated as normal characters (string mode). All substrings found by the match are replaced.
It is an error if the regular expression in its entirety matches the empty string (»Regular expression matches zero-length string«, err:FORX0003). However, a substring stored in a subgroup (captured substring) may well be empty; in that case no match was recorded for the subgroup in question.
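The zero-length rule can be illustrated with a minimal Python sketch. The helper names matches_empty_string and checked_replace are hypothetical, and the ValueError merely stands in for err:FORX0003; Python's own re.sub would accept such patterns without complaint.

```python
import re

def matches_empty_string(pattern: str) -> bool:
    """Return True if the whole pattern can match the zero-length string."""
    return re.fullmatch(pattern, "") is not None

def checked_replace(text: str, pattern: str, replacement: str) -> str:
    """Replace all matches, but reject patterns that match the empty string.

    The replacement is inserted literally (no subgroup references) to keep
    the sketch focused on the zero-length check.
    """
    if matches_empty_string(pattern):
        # Stand-in for err:FORX0003, not a real XPath error object.
        raise ValueError("FORX0003: regular expression matches zero-length string")
    return re.sub(pattern, lambda match: replacement, text)

print(checked_replace("abracadabra", "a", ""))  # brcdbr
print(matches_empty_string(".*?"))              # True
print(matches_empty_string("b(a?)"))            # False: only the subgroup may be empty
```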
Setting flags with the $flags argument:
In order to modify the behaviour of the regular expression, a string of so-called »flags« may be passed as fourth argument in addition to the input string, the regular expression and the replacement string. The permitted flags are based on the conventions of Perl.
Attention – deviation from the flags in the Perl syntax:
The Perl flag g (»global«) is not supported in XPath!
The argument may consist of the single letters m, i, s, x or (in any order) of sensible combinations of these. The empty string (or the empty sequence) is also permitted. An invalid flag argument (one containing characters other than the permitted ones) is answered by the processor with the error message »Invalid regular expression flags« (err:FORX0001).
|i||»ignore«: ignores upper and lower case in the string examined when matching.|
|m||»multiline«: takes line breaks in the string examined into account (permits multiple matches of the patterns ^ and $).|
|s||switches to the so-called »dot-all« mode. |
Explanation: The flag influences the behaviour of the meta character ».« (dot). Without the s flag, this meta character matches all characters (including whitespace!) except for the newline character #x0A (NL). With the s flag set, it matches the newline character as well.
|x||ignores the whitespace characters #x9, #xA, #xD and #x20 within the regular expression. If the flag is not set, whitespace characters are considered part of the expression and taken into account for the match. |
Explanation: With the x flag set, longer regular expressions can be laid out more clearly by using line breaks and tabs.
|''||The empty string is expressly permitted as value. It is equivalent to passing no flag argument.|
Table: Flag symbols permitted in XPath regular expressions
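The flag table can be mirrored in a small Python sketch. The helper to_python_flags is hypothetical, and the mapping to Python's re flags is an assumption of close analogy rather than exact equivalence (Python's verbose mode, for instance, also treats # as the start of a comment). The ValueError stands in for err:FORX0001.

```python
import re

# Assumed correspondence between the XPath flag letters and Python re flags.
_FLAG_MAP = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
}

def to_python_flags(flags: str = "") -> int:
    """Translate an XPath-style flag string such as 'im' into re flags."""
    result = 0
    for letter in flags:
        if letter not in _FLAG_MAP:
            # Stand-in for err:FORX0001; 'g' ends up here as well.
            raise ValueError(f"FORX0001: invalid flag {letter!r}")
        result |= _FLAG_MAP[letter]
    return result

# i: upper and lower case are treated alike.
print(re.sub("a", "X", "ABCabc", flags=to_python_flags("i")))         # XBCXbc
# No flag: the dot does not match the line feed, so nothing is replaced.
print(repr(re.sub("a.c", "X", "a\nc", flags=to_python_flags(""))))    # 'a\nc'
# s: the dot matches the line feed as well.
print(repr(re.sub("a.c", "X", "a\nc", flags=to_python_flags("s"))))   # 'X'
# x: whitespace inside the pattern is ignored.
print(re.sub("a b c", "X", "abc", flags=to_python_flags("x")))        # X
```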
The 'i' (»ignore«) flag makes the match case-insensitive. A match is reported even if the substring found differs from the pattern in upper and lower case:
fn:replace('ABCabc', 'a', 'X', 'i')
results in 'XBCXbc' because, with the 'i' flag set, both 'A' and 'a' are matched.
fn:replace('ABCabc', 'a', 'X')
results in 'ABCXbc' because without a flag, case sensitivity applies, which means only the lower-case 'a' is replaced.
With the 'm' (»multiline«) flag, the interpretation of the input string is switched from string mode to multiline mode, so that line break characters (#x0A) contained in it are regarded as line delimiters. This affects the meta characters ^ and $, which otherwise recognise only the beginning and the end of the character string: in multiline mode, they also match at the beginning and the end of each line:
fn:replace('abc
def', 'abc$', 'XXX', 'm')
results in 'XXX
def', because with the multiline flag the line break following the substring abc is recognised as an end of line, which the meta character $ matches.
fn:replace('abc
def', 'abc$', 'XXX')
returns, in contrast, the input string unchanged, since in this case the character string is examined as a whole (string mode). The only end the function recognises is the end of the string itself, and abc does not occur there, so no match occurs.
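The difference between string mode and multiline mode can be reproduced with the following sketch, again using Python's re module as an assumed stand-in for an XPath processor. The input is the string 'abc', a line feed and 'def', as implied by the explanation above.

```python
import re

text = "abc\ndef"  # 'abc', a line feed (#x0A), then 'def'

# Multiline mode: $ also matches directly before the line feed,
# so 'abc' at the end of the first line is found.
with_m = re.sub("abc$", "XXX", text, flags=re.MULTILINE)

# String mode: $ matches only at the very end of the string
# (Python additionally allows it before a final line feed),
# and the string ends in 'def', so nothing is replaced.
without_m = re.sub("abc$", "XXX", text)

print(repr(with_m))     # 'XXX\ndef'
print(repr(without_m))  # 'abc\ndef'
```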
Regular expressions – brief overview:
A brief overview of meta characters usable in regular expressions and their meaning can be found in the explanations for the fn:matches() function.
Attention – regular expressions in XPath are simplified compared to Perl:
Although regular expressions in XPath are largely based on the corresponding Perl syntax, there are simplifications and therefore deviations. A Perl reg-ex can thus not always be carried over to XPath unchanged.
Generation of back references by subgroup brackets:
Within the regular expression, subgroups can be marked by round brackets. The matches of each of these subgroups (captured substrings) can be treated like variables and referenced in the replacement string.
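A reference consists of the $ character followed by the number of the subgroup: $1 to $9 address the first nine subgroups, and $0 stands for the match as a whole. A multi-digit number such as $10 is only read as a reference to group 10 if the expression actually contains ten or more subgroups; otherwise it is read as $1 followed by the literal digit 0.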
If there is no corresponding subgroup for a reference, or the referenced nth group exists but did not take part in the match, the subgroup reference $n is replaced by the empty string.
If two alternatives of the pattern both match at the same position, the match of the first alternative is selected:
fn:replace("abcd", "(ab)|(a)", "[1=$1][2=$2]")
results in: "[1=ab][2=]cd".
Here, the variable reference $2 is replaced by the empty string: the second group (a) could also have matched at this position, but because the first alternative (ab) was selected, the second group did not take part in the match.
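The same behaviour can be observed in Python's re module, used here only as a stand-in. The sketch assumes Python 3.5 or later, where a subgroup that did not take part in the match is replaced by the empty string; Python writes subgroup references as \1 and \2 rather than $1 and $2.

```python
import re

# Both alternatives could match at offset 0; the first one, (ab), is chosen,
# so group 2 stays unmatched and contributes the empty string.
result = re.sub(r"(ab)|(a)", r"[1=\1][2=\2]", "abcd")
print(result)  # [1=ab][2=]cd
```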
Outside of subgroup references, the $ character can only be used as a literal character in the replacement string if a backslash is put in front of it (\$). Accordingly, a $ character without a preceding backslash must be followed immediately by a digit (0 to 9); otherwise the error »Invalid replacement string« (err:FORX0004) is reported.
A single backslash '\' must not occur in the replacement string either. It always has to be escaped with a further backslash in the form '\\'.
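The escaping rules for the replacement string can be summarised in a simplified Python sketch. The function xpath_replacement_to_python is hypothetical; it converts a replacement string written in the notation of this section into a template for Python's re.sub and handles only the single-digit references $1 to $9, the escaped dollar sign and the escaped backslash. Everything else after an unescaped $ or backslash is rejected with a ValueError that stands in for err:FORX0004.

```python
import re

def xpath_replacement_to_python(replacement: str) -> str:
    """Convert an fn:replace-style replacement string to a Python re template.

    Simplified: supports $1 to $9, an escaped dollar sign and an escaped
    backslash; any other use of $ or backslash raises ValueError.
    """
    out = []
    i = 0
    while i < len(replacement):
        ch = replacement[i]
        nxt = replacement[i + 1] if i + 1 < len(replacement) else ""
        if ch == "\\":
            if nxt == "\\":
                out.append("\\\\")        # stays an escaped backslash in the template
            elif nxt == "$":
                out.append("$")           # literal dollar sign
            else:
                raise ValueError("FORX0004: invalid replacement string")
            i += 2
        elif ch == "$":
            if nxt and nxt in "123456789":
                out.append(f"\\g<{nxt}>")  # Python's unambiguous group reference
            else:
                raise ValueError("FORX0004: invalid replacement string")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# Hypothetical usage: prefix every number with a literal dollar sign.
template = xpath_replacement_to_python(r"\$$1")
print(re.sub(r"(\d+)", template, "cost 5, tax 2"))  # cost $5, tax $2
```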
Example 1 – character replacement with fn:replace:
fn:replace("abracadabra", "bra", "X")
results in: "aXcadaX".
Example 2 – character replacement with fn:replace:
fn:replace("abracadabra", "a.*a", "X")
results in: "X".
Example 3 – character replacement with fn:replace:
fn:replace("abracadabra", "a.*?a", "X")
results in: "XcXbra".
Example 4 – character replacement with fn:replace:
fn:replace("abracadabra", "a", "")
results in: "brcdbr".
All found matches are replaced by the empty string.
Example 5 – character replacement with fn:replace:
fn:replace("abracadabra", "a(.)", "a$1$1")
results in: "abbraccaddabbra".
In this example the regular expression a(.) contains a bracketed subgroup. For each match, the subgroup variable $1 holds the character following an a, which is accordingly output twice via $1$1.
Example 6 – error by match on an empty string:
fn:replace("abracadabra", ".*?", "$1")
generates an error message since the .*? expression matches the empty string.
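Examples 1 to 5 can be cross-checked with the following sketch, which assumes that Python's re module behaves like an XPath processor for these simple patterns. Python writes the subgroup reference as \1 instead of $1. Example 6 is left out because Python accepts patterns that match the empty string and therefore does not report the error.

```python
import re

word = "abracadabra"

results = {
    "Example 1": re.sub("bra", "X", word),          # plain substring
    "Example 2": re.sub("a.*a", "X", word),         # greedy quantifier
    "Example 3": re.sub("a.*?a", "X", word),        # reluctant quantifier
    "Example 4": re.sub("a", "", word),             # empty replacement
    "Example 5": re.sub(r"a(.)", r"a\1\1", word),   # subgroup reference
}

for name, value in results.items():
    print(name, "->", value)
# Example 1 -> aXcadaX
# Example 2 -> X
# Example 3 -> XcXbra
# Example 4 -> brcdbr
# Example 5 -> abbraccaddabbra
```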
Function definition:
XPath 1.0:
The function is not available.
XPath 2.0:
fn:replace($input as xs:string?,
$pattern as xs:string,
$replacement as xs:string) as xs:string
fn:replace($input as xs:string?,
$pattern as xs:string,
$replacement as xs:string,
$flags as xs:string) as xs:string
Copyright © Galileo Press, Bonn 2008
Printing of the online version is permitted exclusively for private use. Otherwise this chapter from the book "XSLT 2.0 & XPath 2.0" is subject to the same provisions as those applicable for the hardcover edition: The work including all its components is protected by copyright. All rights reserved, including reproduction, translation, microfilming as well as storage and processing in electronic systems.
Galileo Press, Rheinwerkallee 4, 53227 Bonn, Germany