
XSLT and XPath function reference in alphabetical order

(Excerpt from “XSLT 2.0 & XPath 2.0” by Frank Bongers, chapter 5, translated from German)


fn:matches

Category:

String functions – pattern matching

Origin:

XPath 2.0

Return value:

A Boolean value of type xs:boolean: true if the regular expression matches the input string (or a part of it), otherwise false.

Call/Arguments:

fn:matches($inputString?, $reg-ex, $flags?)

$inputString:

An input string of type xs:string which is tested for whether the regular expression of the second argument matches it (or parts of it). If the empty sequence is passed here, it is treated as the empty string.

$reg-ex:

Required. A regular expression against which the string is tested.

$flags:

Optional. The $flags argument controls how the regular expression is applied. If no $flags argument is passed, the regular expression is »case-sensitive« and treats the input string as a single (not line-oriented) character string.

Purpose of use:

The fn:matches() function examines an input string and returns a Boolean value which states whether the regular expression passed with it matches the string. The input string is not changed.

A match occurs if the regular expression matches at any position in the examined string. By default, case is significant and line break characters within the string are treated as ordinary characters (string mode).
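
The »match at any position« behaviour can be tried out with a short sketch. Python's re.search is used here only as a stand-in for fn:matches(), on the assumption that both look for the pattern anywhere in the string; the helper name matches_anywhere is made up for this illustration and is not part of XPath.

```python
import re

def matches_anywhere(input_string: str, pattern: str) -> bool:
    """Return True if the pattern matches at any position of the string."""
    return re.search(pattern, input_string) is not None

# The pattern occurs in the middle of the string.
found_inside = matches_anywhere("abracadabra", "bra")      # True
# Anchored at both ends: the whole string must fit the pattern.
whole_string = matches_anywhere("abracadabra", "^a.*a$")   # True
# The string does not start with 'bra'.
wrong_start = matches_anywhere("abracadabra", "^bra")      # False
```

To test the whole string rather than a part of it, the pattern has to be anchored with ^ and $ as in the second call.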

To modify the behaviour of the regular expression, a string of so-called »flags« may be passed as third argument in addition to the input string and the regular expression. The permitted flags are based on the conventions of Perl.

Regular expressions – brief overview:

Regular expressions in XPath are largely based on the syntax of Perl; however, there are various simplifications and therefore also significant deviations. A regular expression that is valid in Perl therefore does not necessarily have the same effect in XPath. A brief overview of the meta characters used in regular expressions and their meanings follows further below.

Setting of flags with the $flags argument:

The argument may consist of the single letters m, i, s, x or of any sensible combination of these letters, in any order. The empty string is also permitted. If the $flags argument is invalid (that is, it contains characters other than the permitted ones), the processor raises the error »Invalid regular expression flags« (err:FORX0001).

Attention: difference from the flags in Perl syntax
The Perl flag g (»global«) is not supported in XPath!

Flag symbol	Description
i	»ignore case«: matching ignores upper and lower case in the examined string.
m	»multi-line«: respects line breaks in the examined string, so that the patterns ^ and $ can match at the beginning and end of every line.
s	»dot-all«: changes the behaviour of the meta character ».« (dot). Without the s flag, the dot matches all characters (including other whitespace!) except for the line break character #x0A (NL); with the s flag set, it matches the line break as well.
x	Causes the whitespace characters #x9, #xA, #xD and #x20 within the regular expression to be ignored. If the flag is not set, whitespace characters are part of the expression and are taken into account for the match. With the x flag set, very long regular expressions can be made more readable by using line breaks and tabs.
''	The empty string is expressly permitted. It has the same effect as omitting the flags argument.

Table: flag symbols permitted in XPath regular expressions
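
The flag handling can be approximated outside of XPath with the following minimal sketch. It assumes that Python's re flags IGNORECASE, MULTILINE, DOTALL and VERBOSE are close enough to i, m, s and x for simple patterns. The names FLAG_MAP and matches are hypothetical, and the ValueError merely plays the role of err:FORX0001. One known difference: Python's VERBOSE mode also treats # as the start of a comment, which the x flag described above does not.

```python
import re

# Assumed mapping from the XPath flag letters to Python's re flags.
FLAG_MAP = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
}

def matches(input_string: str, pattern: str, flags: str = "") -> bool:
    """Rough emulation of fn:matches() with a flags string such as 'im'."""
    combined = 0
    for letter in flags:
        if letter not in FLAG_MAP:
            # Stands in for err:FORX0001 in this sketch.
            raise ValueError(f"invalid regular expression flag: {letter!r}")
        combined |= FLAG_MAP[letter]
    return re.search(pattern, input_string, combined) is not None

no_flags = matches("ABC", "a", "")                        # False
ignore_case = matches("ABC", "a", "i")                    # True
free_spacing = matches("helloworld", "hello world", "x")  # True

# The Perl flag g is not among the permitted letters.
try:
    matches("ABC", "a", "g")
    global_flag_rejected = False
except ValueError:
    global_flag_rejected = True
```

Passing the empty string behaves like omitting the flags, and the letters can be combined in any order.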

Flag 'i':

The 'i' (ignore) flag causes upper and lower case in the examined string to be ignored. A match is reported even if the substring found differs from the pattern in case:

fn:matches('ABC', 'a', 'i')

returns true, since with the flag set the pattern 'a' matches both 'a' and 'A'.

fn:matches('ABC', 'a')

returns false, since without the flag case is significant.

Flag 'm':

The 'm' flag switches the interpretation of the input string from string mode (the default) to multi-line mode. With the 'm' flag set, the string passed is interpreted as »multi-line«, which means that the line break characters #x0A it contains are treated as line delimiters. This affects the meta characters ^ and $, which normally match the beginning and the end of the whole string: in multi-line mode they also match the beginning and the end of each line:

fn:matches('abc
def', 'abc$', 'm')

returns true: because of the multi-line flag, the line break after the substring abc is recognised as a line end, which the meta character $ matches.

fn:matches('abc
def', 'abc$')

returns false, since in this case the string is examined as a whole (string mode). The only end the function recognises is therefore the end of the string itself.
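
The difference between string mode and multi-line mode can be reproduced with a small sketch in Python, whose re.MULTILINE flag is assumed here to correspond to the 'm' flag. The variable names are illustrative only. Note that Python's $ also matches just before a line break at the very end of a string; the example avoids that case.

```python
import re

text = "abc\ndef"  # two lines separated by the line break character #x0A

# String mode: $ only matches at the end of the whole string.
end_string_mode = re.search("abc$", text) is not None                  # False
# Multi-line mode: $ also matches at the end of the first line.
end_multi_line = re.search("abc$", text, re.MULTILINE) is not None     # True

# The same applies to ^ and the beginning of the second line.
start_string_mode = re.search("^def", text) is not None                # False
start_multi_line = re.search("^def", text, re.MULTILINE) is not None   # True
```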

Meta character	Description
.	The dot represents any character except the line break (#x0A); with the s flag set, it also matches the line break.
^	The (leading) circumflex represents the beginning of the character string, or the beginning of a line when the m flag is set; it also serves as negation in character lists.
$	The (trailing) dollar sign represents the end of the character string, or the end of a line when the m flag is set.
|	The pipe symbol is used in the sense of »or« in order to choose between characters and groups.
+	Quantifier; the preceding character appears once or several times.
*	Quantifier; the preceding character appears zero or more times.
?	Quantifier; the preceding character appears zero times or once.
{n}	Quantifier; the preceding character appears exactly n times.
{n,}	Quantifier; the preceding character appears at least n times.
{n,m}	Quantifier; the preceding character appears at least n times, but at most m times.
()	Round brackets within the pattern are used for forming subgroups and for back references.
[]	Square brackets are used to specify character lists, character classes (char classes) or character ranges (char ranges).

Table: meta characters permitted within regular expressions

Meta character '.':

The dot '.' represents any character except the line break character #x0A, unless the s flag has been explicitly set; in that case it matches the line break as well (this corresponds to the behaviour in Perl).

fn:matches($inputString, '.')

returns true if $inputString contains at least one character other than a line break; for an empty string it returns false.
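
As the flags table notes for the s flag, the dot stops at the line break character #x0A unless dot-all mode is switched on. The following sketch checks this with Python's re module, assuming that re.DOTALL behaves like the s flag; the variable names are made up for the example.

```python
import re

only_line_break = "\n"

# Default: the dot does not match the line break, so nothing matches.
dot_default = re.search(".", only_line_break) is not None             # False
# Dot-all mode: the dot matches the line break as well.
dot_all = re.search(".", only_line_break, re.DOTALL) is not None      # True

# The same effect inside a longer pattern.
across_default = re.search("a.c", "a\nc") is not None                 # False
across_dot_all = re.search("a.c", "a\nc", re.DOTALL) is not None      # True
```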

Meta character '^':

The meta character '^' is placed in front of an expression to indicate that the expression must occur at the beginning of the string:

fn:matches($inputString, '^abc')

returns true if $inputString starts with 'abc'.

At the start of a character list, ^ negates the list: [^abc] matches any single character other than a, b or c.

Meta character '$':

The meta character '$' is placed after an expression to indicate that the expression must occur at the end of the string:

fn:matches($inputString, 'abc$')

returns true if $inputString ends with 'abc'.

Meta character '|':

The meta character '|' signals a choice between two subpatterns:

fn:matches($inputString, '^a|b$')

returns true if $inputString begins with 'a' or ends with 'b'.
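
The anchors belong to the individual branches here, not to the alternation as a whole. A short sketch in Python (re.search standing in for fn:matches(), helper name hypothetical) makes the difference to a grouped alternative visible:

```python
import re

def matches(input_string: str, pattern: str) -> bool:
    """Stand-in for fn:matches() without flags."""
    return re.search(pattern, input_string) is not None

starts_with_a = matches("apple", "^a|b$")   # True, first branch applies
ends_with_b = matches("crab", "^a|b$")      # True, second branch applies
neither = matches("xay", "^a|b$")           # False, no branch applies

# With brackets, both anchors apply to the whole alternative:
ungrouped_ab = matches("ab", "^a|b$")       # True, begins with 'a'
grouped_ab = matches("ab", "^(a|b)$")       # False, string is not just 'a' or 'b'
```

If the intention is »the whole string is either a or b«, the grouped form is the one to use.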

Meta character '+':

The meta character '+' indicates that the character preceding it (in this and the following examples, the meta character '.') appears once or several times:

fn:matches($inputString, 'a.+')

returns true if $inputString contains an 'a' followed by one or more characters (false if the string is empty, does not contain an 'a', consists of 'a' alone, or contains its only 'a' at the end of the string).

Meta character '*':

The meta character '*' indicates that the character preceding it appears zero or more times:

fn:matches($inputString, 'a.*')

returns true if $inputString contains an 'a', with or without following characters (false if the string is empty or does not contain an 'a').

Should the quantifiers + and * be greedy or reluctant?
The quantifiers + and * can be switched from »greedy« to »reluctant« by appending a question mark: +? or *?. For fn:matches() this makes no difference, since the function only reports whether a match occurs; it does matter for the functions fn:replace() and fn:tokenize().
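
Why the distinction is irrelevant for a pure yes/no test but not for a replacement can be seen in the following sketch. It uses Python's re.search and re.sub as assumed analogues of fn:matches() and fn:replace(); the variable names are illustrative.

```python
import re

text = "abcabc"

greedy = re.search("a.+", text)      # takes as many characters as possible
reluctant = re.search("a.+?", text)  # takes as few characters as possible

# Both patterns agree on whether a match occurs at all.
same_boolean = (greedy is not None) == (reluctant is not None)   # True

# The matched substrings differ, however.
greedy_span = greedy.group()         # 'abcabc'
reluctant_span = reluctant.group()   # 'ab'

# Consequently a replacement produces different results.
greedy_replaced = re.sub("a.+", "X", text)       # 'X'
reluctant_replaced = re.sub("a.+?", "X", text)   # 'XcXc'
```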

Meta character '?':

The meta character '?' indicates that the character preceding it appears zero times or once:

fn:matches($inputString, 'a.?')

returns true if $inputString contains an 'a', with or without one following character (false if the string is empty or does not contain an 'a').

Meta character '{n,m}':

The meta character '{n,m}' indicates that the character which precedes it appears at least n times and at the most m times:

fn:matches($inputString, 'a{2,4}')

returns true if $inputString contains one of the character sequences 'aa', 'aaa' or 'aaaa', that is, at least two consecutive 'a' characters.
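
Because a match at any position is sufficient, the upper bound of the quantifier only becomes noticeable once the pattern is anchored. A minimal sketch, again with Python's re.search as a stand-in for fn:matches() and with made-up variable names:

```python
import re

five_as = "aaaaa"

# Unanchored: four of the five characters already satisfy a{2,4}.
unanchored = re.search("a{2,4}", five_as) is not None     # True
# Anchored: the whole string would have to consist of 2 to 4 'a'.
anchored = re.search("^a{2,4}$", five_as) is not None     # False

# No two consecutive 'a' characters anywhere in the string.
separated = re.search("a{2,4}", "banana") is not None     # False
```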

Meta character '(...)':

The round brackets group a regular expression or define subpatterns for back references:

fn:matches($inputString, '(ab)|(cd)')

returns true if $inputString contains the substring 'ab' or 'cd'. fn:matches() does not return the captured subpatterns (unlike fn:replace(), which can refer to them in the replacement string), so in this example the brackets are not strictly needed; within fn:matches() they mainly serve to group alternatives and to give quantifiers a scope.
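
Round brackets still do useful work in a pure match test: they give quantifiers a scope, and a group can be referred to again inside the same pattern. The sketch below uses Python's re.search; it assumes that the in-pattern back reference \1 behaves the same way in an XPath 2.0 processor, which should be checked against the processor in use.

```python
import re

# Alternation between two groups, as in the example above.
either = re.search("(ab)|(cd)", "xxcdxx") is not None         # True

# The quantifier + applies to the whole group 'ab'.
repeated_group = re.search("^(ab)+$", "ababab") is not None   # True

# \1 demands a repetition of whatever the first group matched.
backref_hit = re.search(r"(ab)\1", "abab") is not None        # True
backref_miss = re.search(r"(ab)\1", "abcd") is not None       # False
```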

Meta character '[...]':

Square brackets mark a character list or a character range:

fn:matches($inputString, '[abc]')

returns true if $inputString contains at least one of the characters a, b or c.

fn:matches($inputString, '[a-d]')

returns true if $inputString contains one of the characters of the range a to d, which means a, b, c or d.
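
Character lists, ranges and negated lists can be compared side by side in a short sketch. Python's re.search again serves as an assumed equivalent of fn:matches(); the variable names are illustrative.

```python
import re

# Character list: one of a, b or c must occur somewhere.
in_list = re.search("[abc]", "xyzb") is not None        # True
# Character range: none of a, b, c, d occurs in the string.
in_range = re.search("[a-d]", "xyz") is not None        # False

# Negated list: some character other than a, b or c must occur.
negated_hit = re.search("[^abc]", "abcd") is not None   # True, because of 'd'
negated_miss = re.search("[^abc]", "cab") is not None   # False, only a, b, c
```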

Function definition:

XPath 1.0:

The function is not available.

XPath 2.0:

fn:matches($input as xs:string?, $pattern as xs:string) as xs:boolean

fn:matches($input as xs:string?, $pattern as xs:string, $flags as xs:string) as xs:boolean


Copyright © Galileo Press, Bonn 2008
Printing of the online version is permitted exclusively for private use. Otherwise this chapter from the book "XSLT 2.0 & XPath 2.0" is subject to the same provisions as those applicable for the hardcover edition: The work including all its components is protected by copyright. All rights reserved, including reproduction, translation, microfilming as well as storage and processing in electronic systems.

Galileo Press, Rheinwerkallee 4, 53227 Bonn, Germany