XPath and Layout Functions

What is XPath?

The speedata Publisher’s input is data encoded in the XML format. XML is a hierarchical data structure where there is a root element and each element can have attributes and children, that are either text or other elements. For example you can have an element articlegroup which contains ten article elements. Now the idea of XPath is to navigate through the XML tree and ask questions like: give me all articles that have a certain attribute. Or how many articles do I have in this article group?

XPath expressions

The speedata Publisher accepts XPath expressions in the attributes test and select. In all other attributes XPath expressions can be used via curly braces ({ and }). In the following example XPath expressions are used in the attribute width and in the attribute select of the element Value. The width of the textblock is taken from the variable $width, the contents of the paragraph is the value of the current node (whatever this is).

<Textblock width="{ $width }">
  <Paragraph>
    <Value select="."/>
  </Paragraph>
</Textblock>

The following XPath expressions are handled by the software:

Number: Return the value without change: "5"
Text: Return the text without change: 'hello world'
Arithmetic operation (*, div, idiv, +, -, mod). Example: ( 6 + 4.5 ) * 2
Variables. Example: $column + 2
Access to the current node (dot operator). Example: . + 2
Access to subelements. Examples: productdata, *, foo/bar, node(),
Parent nodes: ../
Filter in square brackets, for example article[1] selects the first article.
Access to subelements. Examples: productdata, *, foo/bar
Attribute access in the current node. Example @a
Attribute access in subnodes, for example foo/@bar
Boolean expressions: <, >, <=, >=, =, !=. Attention, the less than symbol < must be written in XML as <, the symbol > may be written as >. Example: $number > 6. Can be used in tests.
if/then/else expressions: if (…) then … else ….
for clauses for example: for $i in (1,2,3) return $i * 2 or for $i in 1 to 3 return $i * 2.
The well known axis expressions like following-sibling, parent, preceding-sibling.

See the test file of lxpath if in doubt.

If you are still uncertain about XPath, please follow the good tutorial at W3Schools.

XPath and namespaces

The speedata Publisher ignores XML namespaces by default. This is usually very helpful, as namespaces are not always easy to handle.

The XML file:

<bar:data xmlns:bar="somenamespace">
    <bar:child>
        <bar:sub>sub</bar:sub>
    </bar:child>
</bar:data>

can be addressed in the layout as follows:

<Layout xmlns="urn:speedata.de:2009/publisher/en"
    xmlns:sd="urn:speedata:2009/publisher/functions/en">

    <Record element="data">
      ...

although the namespace of the root element is somenamespace.

With <Options namespaces="strict" /> you can change this so that the namespace must be specified for <Record> and <ProcessNode>:

<Layout xmlns="urn:speedata.de:2009/publisher/en"
    xmlns:sd="urn:speedata:2009/publisher/functions/en"
    xmlns:sn="somenamespace">

    <Options namespaces="strict" />

    <Record element="sn:data">
        <PlaceObject>
            <Textblock>
                <Paragraph>
                    <Value select="local-name()"></Value>
                </Paragraph>
            </Textblock>
        </PlaceObject>
    </Record>
</Layout>

It is important that the namespace matches, the prefix (here: sn) is freely selectable as long as it is linked to the namespace.

<ProcessNode> works similar:

<ProcessNode select="*:child" />
<ProcessNode select="sn:child" />
<ProcessNode select="sn:*" />
<ProcessNode select="*" />

call the child element bar:child, as the namespaces of sn (from the layout) and bar (from the data) match. * is a ‘wildcard’ here, so it matches all names.

<ProcessNode select="child" /> will not work in the above case, as the empty namespace from the layout is urn:speedata.de:2009/publisher/en and does not match the namespace from the data.

You must also use namespaces in the XPath queries if they are specified in the data:

<Message select="count(sn:sub)" />

writes 1 to the log file. Without namespace a 0, as no element sub with the preset namespace is found.

The following XPath functions are known to the system:

There are three classes of XPath functions: standard XPath functions, the speedata Publisher specific ones and user defined functions. The layout specific functions are in the namespace urn:speedata:2009/publisher/functions/en (denoted by sd: below). The standard functions should behave like documented by the XPath 2.0 standard.

sd layout functions

sd:allocated(x,y,<areaname>,<framenumber>): Return true if the grid cell is allocated, false otherwise (since 2.3.71).
sd:alternating(<type>, <text>,<text>,.. ): On each call the next element will be returned. You can define more alternating sequences by using distinct type values. Example: sd:alternating('tbl', 'White','Gray') can be used for alternating color of table rules. To reset the state, use sd:reset-alternating(<type>).
sd:aspectratio(<imagename>,[pagenumber],[pdfbox]): Return the result of the division width by height of the given image. (< 1 for portrait images, > 1 for landscape). The arguments can contain a page number and a PDF box, see Specifying the page for layout functions for details.
sd:attr(<name>, …): Is the same as @name, but can be used to dynamically construct the attribute name. See example at sd:variable().
sd:count-saved-pages(<name>): Return the number of saved pages from <SavePages>.
sd:current-column(<name>): Return the current column. If name is given, return the column of the given frame.
sd:current-framenumber(<name>): Return the current frame number of given positioning area.
sd:current-page(): Return the current page number.
sd:current-row(<name>): Return the current row. If name is given, return the row of the given frame.
sd:decode-base64(<contents>): Expect a string encoded in base64 and return the binary contents.
sd:decode-html(<node>): Change text such as <i>italic</i> into HTML markup (<i>italic</i> in this case).
sd:dimexpr(<Unit>,<Expression>): Interprets the expression as a calculation and return the value as a scalar in the unit. Interprets variables. For example, say that $twocm is set to the string 2cm, sd:dimexpr('cm',' (40mm + $twocm) / 2 ') returns the number 3.0.
sd:dummytext([<count>]): Return a “Lorem ipsum… “ dummy text. The count defaults to 1.
sd:even(<number>): True if number is even. Example: sd:even(sd:current-page()).
sd:file-exists(<filename or URI schema>): True if file exists in the current search path. Otherwise it returns false.
sd:filecontents(<binarycontent>): Save the given contents into a file and return the file name.
sd:firstmark(pagenumber): The first marker of the given page number. Useful for headings in dictionaries where the first and the last entry of a page is given.
sd:first-free-row(<name>): Return the first free row of this area (experimental).
sd:format-number(Number or string, thousands separator, comma separator): Format the number and insert thousands separators and change comma separator. Example: sd:format-number(12345.67, ',','.') returns the string 12,345.67.
sd:format-string(object, object, … ,formatting instructions): Return a text string with the objects formatted as given by the formatting instructions. These instructions are the same as the instructions by the C function printf().
sd:group-height(<string>[, <unit>]): Return the given group’s height (in gridcells). See sd:group-width(…) If provided with an optional second argument, it returns the height of the group in multiples of this unit. For example sd:group-height('mygroup', 'in') returns the group height in inches.
sd:group-width(<string>[, <unit>]): Return the number of gridcells of the given group’s width. The argument must be the name of an existing group. Example: sd:group-width('My group'). See sd:group-height() for description of the second parameter.
sd:imageheight(<filename or URI schema>,[pagenumber],[pdfbox],[<unit>]): Natural height of the image in grid cells. Attention: if the image is not found, the height of the file-not-found placeholder will be returned. Therefore you need to check in advance if the image exists. If provided with an optional second argument, it returns the height of the image in multiples of this unit. For example sd:imageheight('myimage.pdf', 'in') returns the height of 'myimage.pdf' in inches. The arguments can contain a page number and a PDF box, see Specifying the page for layout functions for details.
sd:imagewidth(<filename or URI schema>,[pagenumber],[pdfbox],[<unit>]): Natural width of the image in grid cells. Attention: if the image is not found, the width of the file-not-found placeholder will be returned. Therefore you need to check in advance if the image exists. If provided with an optional second argument, it returns the width of the image in multiples of this unit. For example sd:imagewidth('myimage.pdf', 'in') returns the width of myimage.pdf in inches. The arguments can contain a page number and a PDF box, see Specifying the page for layout functions for details.
sd:keep-alternating(<type>): Use the current value of sd:alternating(<type>) without changing the value.
sd:lastmark(pagenumber): The last marker of the given page number. Useful for headings in dictionaries where the first and the last entry of a page is given.
sd:length(<what>[,<unit>]): Get the length of what in number of units (but without the unit). For example sd:length('_bleed','mm') returns 3 if the bleed value is set to 3mm. This function initializes a page. The unit defaults to 'mm'. what can be _bleed, _pagewidth or _pageheight (Since version 4.19.10.)
sd:loremipsum(): Same as sd:dummytext().
`sd:markdown(<text>): Renders the text as markdown. See Markdown.
sd:md5(<value>,<value>, …): Return the MD5 sum of the concatenation of each value as a hex string. Example: sd:md5('hello ', 'world') gives the string 5eb63bbbe01eeed093cb22bb8f5acdc3.
sd:merge-pagenumbers(<pagenumbers>,<separator for range>,<separator for space>, [hyperlinks]): Merge page numbers. For example the numbers "1, 3, 4, 5" are merged into 1, 3–5. Defaults for the separator for the range is an en-dash (–), default for the spacing separator is ', ' (comma, space). This function sorts the page numbers and removes duplicates. When the separator for range is empty, the page numbers are separated each with the separator for the space. If hyperlinks is set to true(), the page numbers become active. The default is false(). The function will show the user visible page numbers, which correspond to the logical page numbers by default.
sd:mode(<string>[,<string>…]): Returns true (true()) if one of the specified modes is set. A mode can be set from the command line or from the configuration file. See Control of the layout when calling the Publisher
sd:number-of-columns(): Number of columns in the current grid.
sd:number-of-pages(<filename or URI schema>): Determines the number of pages of a (PDF-)file.
sd:number-of-rows(): Number of rows in the current grid.
sd:odd(<number>): True if number is odd.
sd:pagenumber(<string>): Get the number of the page where the given mark is placed on. See the command <Mark>.
sd:pageheight(<unit>): Similar to sd:pagewidth(), just for the height.
sd:pagewidth(<unit>): Get the width of the page in number of units (but without the unit). For example a page with width 210mm sd:pagewidth("mm") returns 210. This function initializes a page. (Since version 4.13.8.)
sd:romannumeral(<number>): Convert the number into a lowercase Roman numeral.
sd:randomitem(<Value>,<Value>, …): Return one of the values.
sd:reset-alternating(<type>): Reset alternating so the next sd:alternating() starts again from the first element.
sd:sha1(<value>,<value>, …): Return the SHA-1 sum of the concatenation of each value as a hex string. Example: sd:sha1('hello ', 'world') gives the string 2aae6c35c94fcfb415dbe95f408b9ce91ee846ed.
sd:sha256(<value>,<value>, …): Return the SHA-256 sum of the concatenation of each value as a hex string. Example: sd:sha256('hello ', 'world') gives the string b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9.
sd:sha512(<value>[,<value>] …): Return the SHA-512 sum of the concatenation of each value as a hex string. Example: sd:sha512('hello ', 'world') gives the string 309ecc489c12d6eb4cc40f50c902f2b4d0ed77ee511a7c7a9bcd3ca86d4cd86f989dd35bc5ff499670da34255b45b0cfd830e81f605dcf7dc5542e93ae9cd76f.
sd:symbol(<value>[,<value>] …): Returns a symbol from the current font. Example: sd:symbol(123, 444) returns the characters with the glyph numbers (glyph id) 123 and 444. The glyph number can be determined e.g. with https://opentype.js.org/glyph-inspector.html (since version 5.1.10.)
sd:tounit(<string>,<string>[,<number>]): Return a scalar of the unit given in the second argument converted to the unit given in the first argument rounded to the digits in the third argument (defaults to 0 - return integer values). Example: sd:tounit('pt','1pc') returns 12, because there are 12pt in 1pc (pica point).
sd:variable(<name>, …): The same as $name. This function allows variable names to be constructed dynamically. Example: sd:variable('myvar',$num) – if $num contains the number 3, the resulting variable name is myvar3.
sd:variable-exists(<name>): True if variable name exists. Example: sd:variable-exists('my_bar') checks whether $my_bar is defined (variable names in this function have to be enclosed in single quotation marks if double quotation marks are used to delimit the XPath attribute).
sd:visible-pagenumber([<number>]): Return the user visible page number (as defined by matters) for the given real page number. Without argument, the current page number is used.

XPath functions

abs(<number>): Return the positive value of the number.
boolean(<value>): Return the effective boolean value of the argument.
codepoints-to-string( <codepoints> ): Convert the sequence of code points to a string.
ceiling(): Round to the higher integer. ceiling(-1.34) returns -1, ceiling(1.34) returns 2.
concat( <value>,<value>, … ): Create a new text value by concatenating the arguments.
contains(<haystack>,<needle>): True if haystack contains needle. contains('bana','na') returns true().
count(<text>): Counts all child elements with the given name. Example: count(article) counts how many child elements with the name article exist.
distinct-values(<sequence>): Returns the distinct values of the sequence. Example: distinct-values1,2,2,3,3,3 results in the sequence (1,2,3). (Since version 5.1.26.)
doc(<string>): Opens the file with the given filename and returns the contents of the file.
empty( <sequence> ): Checks, if the sequence is empty. For example non existing child elements or non existing attributes are “empty”.
ends-with ( <string>, <string>): Return true if the first string ends with the second string. Example: ends-with ( "tattoo", "too") returns true.
false(): Return false.
floor(): Returns the largest number with no fractional part that is not greater than the value of the argument.
format-number(<number>, <pattern>): Format the number according to the pattern. Example: format-number(12345.67, ',#0.00') returns the string 12,345.67. (Since version 5.1.25.)
last(): Return the number of elements of the same named sibling elements. Not yet XPath conform.
local-name(): Return the local name (without namespace) of the current element.
lower-case(<text>): Return the text in lowercase letters.
matches(<text>,<regexp>[,<flags>]): Return true if the regexp matches the text. Flags can be one of sim and are described in the spec: https://www.w3.org/TR/xpath-functions-31/#flags. Example: matches("banana", "^(.a)+$") returns true.
max(): Return the maximum value. max(1.1,2.2,3.3,4.4) returns 4.4.
min(): Return the minimum value. min(1.1,2.2,3.3,4.4) returns 1.1.
namespace-uri([nodeset]): The namespace URI of the first node in this node-set will be returned. If this argument is omitted, the current context node will be used.
number(<value>): Convert the argument to a number. Return “not a number” if the value cannot be converted.
not(): Negates the value of the argument. Example: not(true()) returns false().
normalize-space(<text>): Return the text without leading and trailing spaces. All newlines will be changed to spaces. Multiple spaces/newlines will be changed to a single space.
position(): Return the position of the current node.
replace(<input>,<regexp>, <replacement>): Replace the input using the regular expression with the given replacement text. Example: replace('banana', 'a', 'o') yields bonono.
root(element): Return the root element of the element.
round(<number>,<number>): Return the argument in the first parameter rounded to number of decimal places in the second parameter. The second parameter defaults to 0.
round-half-to-even(<number>,<number>): Return the argument in the first parameter rounded to number of decimal places in the second parameter. The second parameter defaults to 0. If the number ends in .5, it is rounded to the next even number (Banker’s Rounding). (Since version 5.1.25.)
starts-with ( <string>, <string>): Return true if the first string starts with the second string. Example: starts-with ( "tattoo", "tat") returns true.
string(<sequence>): Return the text value of the sequence e.g. the contents of the elements.
string-join(<sequence>,separator): Return the string value of the sequence, where each element is separated by the separator.
string-length(<string>): Return the length of the string in characters. Multi-byte UTF-8 sequences are counted as 1.
string-to-codepoints( <string> ): Convert the string to a sequence of code points.
substring(<input>,<start>,<length>): Return the part of the string input that starts at start and optionally has the given length. start can be (in contrast to the XPath specification) negative which counts from the end of the input.
substring-after(<string>,<string>]): Return the contents of the first string, that comes after the second string. Example: substring-after ( "tattoo", "tat") returns "too".
substring-before(<string>,<string>]): Return the contents of the first string, that comes before the second string. Example: substring-before ( "tattoo", "attoo") returns "t".
tokenize(<input>,<regexp>): This function returns a sequence of strings. The input text is read from left to right. When the regular expression matches the current position, the text read so far from the last match is returned. Example (from the great XPath / XSLT book by M. Key): tokenize("Go home, Jack!", "\W+") returns the sequence "Go", "home", "Jack", "".
translate(<input>,<from>,<to>): Replace all characters in input that are in from by the corresponding character in to. Example: translate('banana','an','uo') results in buouou. If to is shorter than from, the extra characters in from are deleted. Example: translate('banana','an','u') results in buu. (Since version 5.1.26)
true(): Return true.
unparsed-text(<filename>): Returns the contents of the file without interpretation.
upper-case(): Converts the text to capital letters: upper-case('text') results in TEXT.