XSLT and XPath function reference in alphabetical order

(Excerpt from “XSLT 2.0 & XPath 2.0” by Frank Bongers, chapter 5, translated from German)


format-date

Category:

Type conversion and output formatting

Origin:

XSLT 2.0

Return value:

An xs:string; the input value formatted according to a pattern string and further optional parameters.

Call/Arguments:

format-date($date?, $formattingPattern, $language?, $calendar?, $location?)

$date:

Optional. A date value of type xs:date, which may contain a timezone. If the empty sequence is passed, the function also returns the empty sequence.

$formattingPattern:

Obligatory. An xs:string that determines the formatting of the resulting string in the form of a preset pattern (picture string).

$language:

Optional. An xs:string containing a language code in the form used for the xml:lang attribute, typically a two-letter code according to ISO 639-1 such as "en" or "de". The third parameter determines the language used for formatting the date string. It directly affects the names chosen for weekdays and months and, where applicable, other language-dependent details such as ordinal numbering. The appropriate identifiers and the abbreviations in common use must be available on the system.

$calendar:

Optional. An xs:string containing the abbreviation, in capital letters, of the calendar on which the formatting is to be based. If the argument is omitted or empty, an implementation-defined default is used, in most cases the ISO or Gregorian calendar. Which further calendars are supported depends on the implementation.

$location:

Optional. An xs:string containing a two-character country code according to ISO 3166-1. This parameter, which seems redundant given the language argument, geographically locates the value to be formatted so that the formatting can be adapted to that location.

Purpose of use:

The format-date() function determines the outer form in which an xs:date value passed as the first argument is output as an xs:string.

The appearance of the result is controlled by the obligatory second argument, a string used as formatting pattern (also referred to as picture string).

A function call without any further arguments uses default settings taken from the system environment. This concerns the choice of the calendar (e.g. Gregorian) and the language setting. The latter can, for example, influence the names and ordinal forms used or, when the time of day is output by the related functions, the display of the hour values (0–23 vs. 1–24, 0–11 vs. 1–12).

If the default settings are not wanted, an explicit language and the underlying calendar can be specified with two further optional arguments. A fifth argument binds the formatting of a value to a geographical location.
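The two call forms described above can be tried out with a small stylesheet. The following is only a minimal sketch: the template name main and the sample value are assumptions made for illustration, and the stylesheet has to be started at that named template (with Saxon, for example, via the -it:main option).

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical entry point; no source document is needed -->
  <xsl:template name="main">
    <!-- Assumed sample value: 31 December 2007 -->
    <xsl:variable name="d" as="xs:date" select="xs:date('2007-12-31')"/>

    <!-- Two-argument form: language, calendar and location use the defaults -->
    <xsl:value-of select="format-date($d, '[Y]-[M]-[D]')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- Five-argument form: explicit language, empty sequences for
         the calendar and location arguments -->
    <xsl:value-of select="format-date($d, '[Y]-[M]-[D]', 'en', (), ())"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Since the pattern contains only numerical components, both calls are expected to produce 2007-12-31.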

Attention: The following explanation also applies to other functions!
The following explanation applies likewise to the XSLT functions format-dateTime() and format-time() which, together with format-date(), form the group of the date formatting functions of XSLT 2.0.

The formatting pattern (picture string):

A formatting pattern (a simple example would be '[Y]-[M]-[D]') is composed of a series of string literals and wildcard symbols for the values to be formatted.

The wildcard symbols are marked by the square brackets surrounding them. Their most important part, the component symbol (component specifier), is replaced by the appropriate value of the date component to be formatted. Additional space characters or other whitespace characters (line breaks, etc.) within the brackets are ignored.

In the following example, Y, M and D are included as wildcards for year, month and day:

[Y]-[M]-[D]

The string literals are copied unchanged into the result string at the place assigned to them by their position in the formatting string. All parts of the formatting string not surrounded by square brackets (in the example above, [Y]-[M]-[D], the hyphens) count as string literals. The formatting string may contain any number of string literals, or none at all.

Escaping the square bracket for output as a literal
If a square bracket itself is part of a string literal, it must be escaped by doubling it: an opening bracket [ becomes [[ and a closing bracket ] becomes ]] (see example 9).

The result of a formatting of a date value by means of the above formatting string could be as follows:

[Y]-[M]-[D] => '2003-7-20' (without a length indication, the month is not padded with a leading zero)

The component symbols:

A number of abbreviations are defined for the component extraction. However, depending on the type of the value passed, the formatting string may only contain components which are actually present in the value.

That is why, for xs:date values as expected by the format-date() function, time-of-day components like hours, minutes, seconds, etc. cannot be extracted. The attempt leads to an error.

For the sake of completeness, the following table lists all defined component symbols; the ones not applicable to this function are specially marked.

Symbol  Component                        Presentation           Example
Y       year                             numerical              2003
M       month (related to the year)      numerical              12
D       day (related to the month)       numerical              30
d       day (related to the year)        numerical              365
F       day (weekday)                    identifier             Friday
W       week (calendar week)             numerical              52
w       week (of the month)              numerical              3
H       hour (24-hours day)*             numerical              23
h       hour (2 x 12-hours day)*         numerical              10
P       AM/PM indication*                identifier             AM
m       minute (related to the hour)*    numerical              59
s       second (related to the minute)*  numerical              30
f       seconds with fraction*           numerical              59.99
Z       UTC timezone                     duration               +2:00
z       GMT timezone                     identifier + duration  GMT +2:00
C       calendar used                    identifier             Gregorian
E       reference era                    identifier             [name of ruler]

Table: complete list of the component symbols (* = not applicable in the context)

Remark: The daytime components H, h, P, m, s and f cannot be used in connection with format-date().

The two timezone symbols are interpreted as follows:

Z stands for a timezone indication related to UTC time which can be outputted numerically as time difference or with the N modifier by ZN as timezone identifier (e.g. EST for Eastern Standard Time). If there is no difference (the time indication is already available in UTC), this is symbolised by the letter Z.

z stands for a timezone indication related to GMT time which is presented as time difference in the form GMT ± [hh:00], for example GMT +2:00.
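The difference between the two timezone symbols can be checked with a date value that carries a timezone. The following is a minimal sketch under stated assumptions: the template name main and the sample value (two hours ahead of UTC) are chosen for illustration only, and the exact text produced for each symbol should be verified with the processor actually used.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical entry point; no source document is needed -->
  <xsl:template name="main">
    <!-- Assumed sample value: 31 December 2007, two hours ahead of UTC -->
    <xsl:variable name="d" as="xs:date" select="xs:date('2007-12-31+02:00')"/>

    <!-- [Z]: timezone as a time difference related to UTC -->
    <xsl:value-of select="format-date($d, '[Y]-[M]-[D] [Z]')"/>
    <xsl:text>&#10;</xsl:text>

    <!-- [z]: timezone as a time difference with a GMT prefix -->
    <xsl:value-of select="format-date($d, '[Y]-[M]-[D] [z]')"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```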

The modifiers:

Beyond the simple value extraction, further formatting options concerning the presentation of each component can be given inside the brackets of its wildcard.

For this purpose, at most two optional modifiers are used: a primary and a secondary modifier, which are attached to the component symbol as a suffix. The secondary modifier may only be used in conjunction with a primary modifier. If an indicated presentation modifier is not supported by the implementation, the default presentation is used for the respective component.

For instance, by attaching the letter N, the wildcard for the month value M can be switched from the output of a numerical value to the output of the name of the month, in this case in capital letters:

[M] => 11

[MN] => NOVEMBER

An attached small n requests the name in small letters, and the combination Nn requests an initial capital letter followed by small letters:

[Mn] => november

[MNn] => November
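The effect of the presentation modifiers on one and the same component can be compared side by side. The following is a minimal sketch: the template name main, the sample value and the language "en" are assumptions, and the names only appear if the processor supports English month names.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical entry point; no source document is needed -->
  <xsl:template name="main">
    <!-- Assumed sample value: 31 December 2007 -->
    <xsl:variable name="d" as="xs:date" select="xs:date('2007-12-31')"/>

    <!-- One output line per pattern: numerical, name in capital letters,
         name in small letters, name with initial capital, Roman numerals -->
    <xsl:for-each select="('[M]', '[MN]', '[Mn]', '[MNn]', '[MI]')">
      <xsl:value-of select="concat(., ' => ', format-date($d, ., 'en', (), ()))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The first line of the output is expected to read [M] => 12; the remaining lines show the month in the presentation requested by the respective modifier.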

Language dependence of the name to be output
The selection of the name to be output depends on the language: »Tuesday« (en), »Dienstag« (de), »mardi« (fr), etc. Moreover, regional differences may also be taken into account (in German: »Samstag« or »Sonnabend« for Saturday). The language to be used is determined by the third function argument, given as a language code in the form used for the xml:lang attribute (for example "en", "de" or "fr").

There are the following primary modifiers:

Symbol     Meaning                                Example  Effect
A          alphabetic numbering, capital letters  MA       L (twelfth letter)
a          alphabetic numbering, small letters    Ma       l (twelfth letter)
N          identifier, capital letters            MN       DECEMBER
n          identifier, small letters              Mn       december
Nn         identifier, initial capital letter     MNn      December
I          Roman numerals, capital                MI       XII (December)
i          Roman numerals, small                  Mi       xii (December)
1 (digit)  numerals, Unicode set*                 M1       12 (December)

Table: primary modifiers

* Explanation of 1 (digit): Instead of the ASCII digit 1, any Unicode character with the decimal digit value 1 (according to the Unicode Character Database) can be used. The value is then presented with the family of digits indicated in this way. (In theory, for example, Arabic-Indic numbering can be requested with the Arabic-Indic digit one [U+0661] and Tibetan numbering with the Tibetan digit one [U+0F21].) In principle, any other character can also be used to indicate a numbering sequence starting with that character, but only if such a numbering is supported by the implementation. Leading zeros in the format token (e.g. 001) request padding with zeros (or the corresponding character) to a predetermined number of digits.

In addition, there are the following secondary modifiers:

Symbol  Meaning                                                            Example  Effect
t       traditional numbering*                                             D1t      31
o       ordinal numbering (note: only in addition to the primary modifier) D1o      31st (lang="en")

Table: secondary modifiers

The length indication:

Normally, the substring of each component is outputted in full length (full representation). However, the formatted output can also be limited per component by an added length indication to a certain number of characters to be used. Actually, this length indication is a range specification composed of indications for a minimum and a maximum length. The indication of a maximum length is optional within the length indication. If it is omitted, the component is outputted in full length (but with the indicated minimum length).

If the formatting exceeds the indicated maximum length (for example by demanding leading zeros for the numerical output) by means of the format token, the length indication has priority in case of doubt.

The length indicator is written behind the component indication (with possible modifiers) separated by a comma. Example:

'2007' is shortened to '07': [Y,2-2] => 07

'December' is shortened to 'Dec': [MNn,*-3] => Dec

In the second example, the star * stands for any integer, which means it leaves the minimum length open. The 3 limits the maximum length of the component string to three characters. Since most languages do not have month names with fewer than three characters, the names are effectively shortened to three characters. If the maximum length itself would be a star *, it can simply be omitted.

The way of shortening depends on the conventions of the selected language. In the case of names, the result may be a real abbreviation instead of a simple truncation (implementations can handle this differently). For example, when the name of the weekday is limited to four characters at the maximum, it can indeed be output as an abbreviation of only three characters, provided that this abbreviation is »common«:

'Tuesday' becomes 'Tues': [FNn,*-4] => Tues

but 'Friday' becomes 'Fri': [FNn,*-4] => Fri

In these examples, the minimum output length is not determined, so the output may be shorter than the maximum. An output of exactly four characters could be requested with [FNn,4-4].

In case the actual length of the value of the component to be output falls below the minimum length, the string is extended accordingly (padding). Numerical values are padded with leading zeros; other values are padded with space characters, which according to the specification should be appended:

[D]. [M]. [Y] => 1. 4. 2003

[D,2]. [M,2]. [Y] => 01. 04. 2003

Remark: If the shortening of the output is not reasonably possible (e.g. in case of longer Roman numerals), the component is outputted in full length, despite the length indication.
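The length indications discussed in this section can be tried out together. The following is a minimal sketch: the template name main, the sample value (1 April 2003) and the language "en" are assumptions chosen for illustration, and how names are abbreviated depends on the implementation.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical entry point; no source document is needed -->
  <xsl:template name="main">
    <!-- Assumed sample value: 1 April 2003 -->
    <xsl:variable name="d" as="xs:date" select="xs:date('2003-04-01')"/>

    <!-- Patterns: no length indication; minimum length 2 for day and month;
         year limited to exactly two digits; month name limited to at most
         three characters; weekday name limited to at most four characters -->
    <xsl:for-each select="('[D]. [M]. [Y]',
                           '[D,2]. [M,2]. [Y]',
                           '[Y,2-2]',
                           '[MNn,*-3]',
                           '[FNn,*-4]')">
      <xsl:value-of select="concat(., ' => ', format-date($d, ., 'en', (), ()))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Comparing the first two output lines shows the effect of the minimum length on the numerical components; the last two lines show how the processor in use shortens names.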

The calendar indication:

The presentation of date values according to different calendars is by no means a trivial matter. To adapt the return value of the date formatting functions accordingly, a particular calendar can be selected with the calendar parameter. At the moment, an implementation is only required to support formatting according to the Gregorian or the ISO calendar (which, in most cases, will already be the default calendar) or at least one of the other calendars listed. The support of further calendars, also beyond the ones in the recommended list, is optional for the implementation.

The storage and processing of date values in computer systems is strongly based on the Gregorian calendar (ISO calendar). However, this calendar is not used in all cultures, and conversion between calendars can be extremely difficult. Only for demonstration:

The Jewish calendar (referred to as AM »Anno Mundi« in the table) is based on solar and lunar cycles and has leap years with thirteen months, seven of them within each phase of 19 years (the years 3, 6, 8, 11, 14, 17 and 19). Depending on the type of year, a year therefore has between 353 and 385 days. Year 1 (AM) begins in the autumn of 3761 BC, reckoned in the Julian calendar. Therefore, the Christian year 2000 (AD) corresponds to the year 5760 (AM). Months of 29 and 30 days are used; days begin at sunset.

The Islamic calendar (referred to as AH »Anno Hegirae« in the table) is a lunar calendar whose months correspond to the lunar cycles. Since a year has only 354 days, the months shift through the seasons of the solar year over a period of about 33 years. The sighting of the crescent moon after new moon determines the beginning of a new month (for non-religious purposes, a table is used). Days begin at sunset. The year 2000 (AD) of the Christian calendar corresponds to the year 1421 (AH) of the Islamic calendar.

The historical Julian calendar (referred to as OS »Old Style« in the table) is a solar calendar which was introduced by Julius Caesar in the year 46 BC. (This calendar is partly still in use in Eastern European Orthodox churches!) The calendar year consists of 365 days, with every fourth year being a leap year of 366 days. However, a Julian year turned out to be about eleven minutes too long compared to the actual solar year (ca. one day every 128 years), which had accumulated to an error of ten full days by the 16th century.

The Gregorian calendar (referred to as AD »Anno Domini« in the table) was created due to a correction made by Pope Gregory XIII in the year 1582. Deviating from the Julian calendar, from the years with full hundreds only those are counted as leap years whose number is divisible by 400 (for this reason, in contrast to the year 2000, the year 1900 was not a leap year). Today, the Gregorian calendar is used in most parts of the world. However, the date of the actual introduction varies considerably.

The calendar according to ISO 8601 (referred to as ISO in the table) corresponds to the Gregorian calendar but, in contrast to it, does not allow country- or language-specific numberings for weekdays or calendar weeks (in particular, Monday is always day 1 and Sunday day 7; the first calendar week is the one which includes the first Thursday of the year). The ISO calendar therefore conforms with the date notation of XML Schema and is essentially aimed at software applications which have to process date values.

The following table lists more or less all currently available calendars which are supported by XSLT 2.0:

Abbreviation Calendar
AD Anno Domini, Christian calendar; identical to the Gregorian calendar, valid since 1582
AH Anno Hegirae, Muslim calendar (Islamic calendar); counts from the emigration of the Islamic Prophet Muhammad from Mecca to Medina (known as the Hijra) in 622 AD according to the Christian calendar (2000 AD corresponds to 1421 AH.)
AM Anno Mundi, Jewish calendar; outside of Israel only used for religious purposes (2000 AD corresponds to 5760 AM.)
AME Mawlid calendar; solar years since the birth of Muhammad (2000 AD corresponds to ca. 1433 AME.)
AP Anno Persici, Persian calendar (2000 AD corresponds to 1379 AP.)
AS Aji Saka calendar, Java; lunar calendar with 354 days per year; conversion cannot be determined
BE Buddhist calendar (2000 AD corresponds to 2543 BE.)
CB Cooch Behar calendar, West Bengal; conversion cannot be determined
CE general secularised calendar (Common Era); corresponds to AD without the religious background (2000 AD corresponds to 2000 CE.)
CL Chinese, lunar calendar (follows a 60-years cycle); conversion is problematic
CS Chula Sakarat calendar, Burma (2000 AD corresponds to 1362 CS.)
EE Ethiopian calendar (2000 AD corresponds to 1992 EE.)
FE Fasli calendar, India (2000 AD corresponds to 1410 FE.)
ISO ISO 8601 calendar (2000 AD corresponds to 2000 ISO.)
JE Japanese calendar, uncommon (2000 AD corresponds to 2660 JE.)
KE Khalsa calendar, Sikh calendar, India (2000 AD corresponds to ca. 1700 KE.)
KY Kali Yuga, India; age of Kali according to the Vedic scriptures (2000 AD corresponds to ca. 5102 KY.)
ME Malabar calendar, India (2000 AD corresponds to 1175 ME.)
MS monarchal solar calendar (2000 AD corresponds to 2559 MS.)
NS Nepal Sambat calendar (Bikram Sambat), Nepal (2000 AD corresponds to ca. 2057 NS.)
OS Old Style; Julian calendar until 1582 (2000 AD corresponds to 2000 OS.)
RS Rattanakosin calendar (Rata Nakosinook), Bangkok (2000 AD corresponds to 219 RS.)
SE Saka calendar, India (2000 AD corresponds to 1921 SE.)
SH Islamic solar calendar, Iran (2000 AD corresponds to 1379 SH.)
SS Shaka Samvat calendar, India (2000 AD corresponds to 1922 SS.)
TE Tripurabda calendar, India (2000 AD corresponds to 1410 TE.)
VE Vikrama calendar, India; conversion cannot be determined, probably corresponds to VS
VS Vikram Samvat calendar, India (2000 AD corresponds to 2057 VS.)

Table: the calendars (theoretically) supported in XSLT 2.0

Remark: this calendar list is not complete.
There are further eras and calendars in India (and not only there), such as Saptarsi Saka, Brihaspati Samvatsara, Gupta Saka, Vallabhi Saka and many more. Therefore, the table above could be optionally expanded.

The location indication:

The optional fifth argument of the function call, also known as country parameter, is intended, depending on the implementation, to take into account regional and historical differences in the interpretation of a calendar wherever this is not possible with the language parameter alone.

This parameter covers cases in which the calendrical calculations within a geographically delimited region change at a specific point in time, with the result that dates have to be formatted differently, depending on whether they lie before or after the change. For this purpose, the boundary of the respective region is to be determined, for example, by the political borders at the moment in question. If the date lies in the future, the currently existing borders are taken as a basis.

Especially in the Indian or East Asian region, such calendar changes sometimes date back only slightly more than five decades. Implementations dealing with corresponding data (for instance for the Thai market; Buddhist calendar BE) would not be able to take this into account by the language indication alone. In the European area such changes date back further: the change from the Julian calendar to the Gregorian calendar took place at different times, depending on the country and region: in Germany in 1700, in England in 1752, in Turkey in 1916. In China the Gregorian calendar was not introduced until the year 1949.

The $d date value of the xs:date type used in all following examples corresponds to the 31st December 2007.

Example 1 - simple date formatting, version 1:

format-date($d,"[Y]-[M]-[D]")

Results in: 2007-12-31.

In this case the number values of the respective components are simply extracted and outputted in the appropriate order provided with the string literals.

Example 2 - simple date formatting, version 2:

format-date($d,"[M]-[D]-[Y]")

Results in: 12-31-2007.

In principle, a variant of example 1. The order of the output corresponds to the order of the wildcards in the pattern string.

Example 3 - simple date formatting, version 3:

format-date($d,"[D]. [M]. [Y]")

Results in: 31. 12. 2007.

In this example, other string literals are used. The output corresponds to the convention in the German-speaking area. However, it is not achieved via the language argument, but directly by the form of the pattern string, which contains the required punctuation as string literals.

Example 4 - simple date formatting, Roman month:

format-date($d,"[D1] [MI] [Y]")

Results in: 31 XII 2007.

The value of the day component is output in decimal form (D1; D with modifier 1, which could also be omitted in this example), the month in capital Roman numerals (MI; M with modifier I). In this example, the string literals consist only of inserted space characters.

Example 5 - date formatting with date format declaration (1):

format-date($d,"[D1o] [MNn], [Y]", "en", (), ())

Results in: 31st December, 2007.

The name of the month is output according to the rules of the language indicated with the third argument "en", as an identifier with an initial capital letter followed by small letters (MNn; M with modifier Nn).

The day is output as an ordinal. Since this is only possible via the secondary modifier, the 1, although not »modifying« in this sense, must be present as primary modifier (D1o; D with 1 as primary and o as secondary modifier).

Here, the default calendar is used, because the empty sequence was passed for the calendar argument.

For formal reasons, the empty sequence was also passed for the unused location argument.

Example 6 - date formatting with date format declaration (2):

format-date($d,"[MNn] [D], [Y]", "en", (), ())

Results in: December 31, 2007.

Here, the same language setting as in example 5 is used, but the month precedes the day in the pattern string. The month value is output as an identifier with an initial capital letter followed by small letters (MNn; M with modifier Nn).

Example 7 - date formatting with date format declaration (3):

format-date($d,"[D] [MN,*-3] [Y]", "en", (), ())

Results in: 31 DEC 2007.

Here, the same language setting as in example 5 is also used. The month value is output as an identifier in capital letters (MN; M with modifier N). Additionally, the identifier is restricted to a maximum length of three characters, without indicating a lower threshold (*-3).

Additional space characters enforce a minimum number of characters of the output
If the selected language contains month identifiers with only two letters, these are output unchanged. However, if a lower threshold of three characters is indicated (3-3), the minimum number of characters would be enforced by padding with space characters.

Example 8 - date formatting with date format declaration (4):

format-date($d,"[Fn] [D] [Mn] [Y]", "sv", (), ())

Results in: måndag 31 december 2007.

The function call requests Swedish for the formatting; with the modifier n, the names of weekday and month are output in small letters, as is usual in Swedish. The processor must support output in the Swedish language. Otherwise it falls back to a language it does support and should indicate this in the result.

Since the empty sequence was passed for the (fourth) calendar argument, the default calendar applies, here the »Gregorian« one.

Example 9 - square brackets in the string literal:

format-date($d,"[[[Y]-[M]-[D]]]")

Results in: [2007-12-31].

In order to surround the date string with square brackets (as in this example), these have to be output as string literals. This is achieved by doubling each square bracket to be output literally. As a consequence, the pattern string begins with a sequence of three square brackets, of which the first two represent the string literal (including the escaping) and the third one marks the beginning of the first wildcard symbol. The situation at the end of the string is the same.

Example 10 - Jewish calendar with western numbering:

format-date($d, "[D] [Mn] [Y]", "he", "AM", ())

Results in: 22 טבת 5768.

Passing the calendar parameter with the value "AM" selects the Jewish calendar, so that the year 2007 is converted to the year 5768. The language setting "he" for Hebrew causes the output of the Hebrew month name. Because Hebrew is written from right to left, the components may be displayed in the visual order year, month, day, although the pattern string lists them as day, month, year.

Example 11 - Julian calendar (Old Style):

format-date($d, "[D] [MNn] [Y]", "en", "OS", ())

Results in: 18 December 2007.

During the formatting, the use of the Julian calendar shifts the day count from 31 December to 18 December. The shift of the input date 31.12.2007 by 13 days results from the differing length of the underlying calendar year (since the Julian year is too long, the output date moves backwards).

Function definition:

XSLT 1.0:

The function is not available.

XSLT 2.0:

format-date($value as xs:date?,
            $picture as xs:string) as xs:string?

format-date($value as xs:date?,
            $picture as xs:string,
            $language as xs:string?,
            $calendar as xs:string?,
            $country as xs:string?) as xs:string?
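Because both the first argument and the return value are declared with the occurrence indicator ?, an empty sequence passed as date value yields an empty sequence as result. The following minimal sketch shows one way of dealing with this; the function my:safe-date, its namespace urn:example:my, the placeholder text n/a and the template name main are hypothetical and serve only as illustration.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="urn:example:my"
    exclude-result-prefixes="xs my">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: formats a date, or returns a placeholder
       if the date value is the empty sequence -->
  <xsl:function name="my:safe-date" as="xs:string">
    <xsl:param name="value" as="xs:date?"/>
    <!-- format-date() returns the empty sequence for an empty $value,
         so the first item of the pair is then the placeholder -->
    <xsl:sequence select="(format-date($value, '[Y]-[M]-[D]'), 'n/a')[1]"/>
  </xsl:function>

  <!-- Hypothetical entry point; no source document is needed -->
  <xsl:template name="main">
    <xsl:value-of select="my:safe-date(xs:date('2007-12-31'))"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="my:safe-date(())"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The first call is expected to produce 2007-12-31, the second the placeholder n/a.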

   


Copyright © Galileo Press, Bonn 2008
Printing of the online version is permitted exclusively for private use. Otherwise this chapter from the book "XSLT 2.0 & XPath 2.0" is subject to the same provisions as those applicable for the hardcover edition: The work including all its components is protected by copyright. All rights reserved, including reproduction, translation, microfilming as well as storage and processing in electronic systems.


Galileo Press, Rheinwerkallee 4, 53227 Bonn, Germany