Accessibility / tagged PDF

Support for creating accessible PDFs is still under development. The set of objects that can be tagged is therefore limited, as are the available roles (tag names) and the scope of this section of the manual.

What is tagged PDF?

Tagged PDF extends a PDF file with a logical structure of its content. The tags determine the role of each object in the output, for example a level 1 heading, a paragraph or an image. Tagging is mainly used to create accessible PDFs, i.e. PDFs that are optimized for screen readers or similar output devices.

Accessible PDFs require more than the technical side, which the speedata Publisher takes care of: the logical structure must also be set up sensibly. For example, images must be output with an alternative text, otherwise they are of limited use to readers who rely on assistive technology.

Example

In the following example, the document is not divided into further substructures. A heading (H1) and an image (Figure) are placed directly below the Document element.

<Layout xmlns="urn:speedata.de:2009/publisher/en"
	xmlns:sd="urn:speedata:2009/publisher/functions/en">

  <PDFOptions format="PDF/UA" />

  <StructureElement role="Document" id="doc" />

  <Record element="data">
    <PlaceObject>
      <Textblock>
        <Bookmark level="1" select="'My image collection'"/>
        <Paragraph role="H1" parent="doc">
          <Value>My image collection</Value>
        </Paragraph>
      </Textblock>
    </PlaceObject>

    <PlaceObject>
      <Image parent="doc"
             description="an ocean"
             file="ocean.pdf"
             width="4cm" />
    </PlaceObject>
  </Record>
</Layout>

Hierarchical structure

By default, a structure element with the role Document and the id doc is defined. This corresponds to the following declaration:

<StructureElement role="Document" id="doc" />

There are two ways to define further structure elements; both of the following examples create the same structure. Either nest the elements:

<StructureElement role="Document">
	<StructureElement role="Art">
		<StructureElement role="Sect" id="sect1a1"/>
		<StructureElement role="Sect" id="sect2a1"/>
	</StructureElement>
	<StructureElement role="Art">
		<StructureElement role="Sect" id="sect1a2"/>
		<StructureElement role="Sect" id="sect2a2"/>
	</StructureElement>
</StructureElement>

or you can use the parent attribute:

<StructureElement role="Document" id="doc" />
<StructureElement role="Art" id="art1" parent="doc" />
<StructureElement role="Art" id="art2" parent="doc" />
<StructureElement role="Sect" id="sect1a1" parent="art1"/>
<StructureElement role="Sect" id="sect2a1" parent="art1"/>
<StructureElement role="Sect" id="sect1a2" parent="art2"/>
<StructureElement role="Sect" id="sect2a2" parent="art2"/>

This allows the document to be structured dynamically based on the data.

The id attribute is necessary to specify the parent structure for paragraphs and images. For example, a paragraph and an image that both carry parent="sect1a1" are output in the first Sect section of the first article.
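
As a minimal sketch of how an id is referenced, the following fragment places a heading and an image in one section. It assumes that the structure elements from the parent attribute listing above are defined (in particular the id sect1a1) and reuses the file ocean.pdf from the first example; the heading text is a placeholder.

```xml
<!-- Sketch only: assumes a structure element with id="sect1a1" is defined -->
<Record element="data">
  <PlaceObject>
    <Textblock>
      <!-- parent refers to the id of the enclosing structure element -->
      <Paragraph role="H1" parent="sect1a1">
        <Value>First section</Value>
      </Paragraph>
    </Textblock>
  </PlaceObject>

  <PlaceObject>
    <!-- the image is attached to the same section as the heading -->
    <Image parent="sect1a1"
           description="an ocean"
           file="ocean.pdf"
           width="4cm" />
  </PlaceObject>
</Record>
```

Changing the parent value to another id, such as sect1a2, would attach the same objects to a different section without changing the rest of the layout.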

Roles (tag names)

The following role names (tags) are defined in the speedata Publisher. Others can be added on request:

Art, Div, Document, Figure, H1, H2, H3, H4, H5, H6, Lbl, P, Part, Sect, Span, TOC, TOCI
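
To illustrate how these role names might be used, the fragment below tags a sub-heading and a body paragraph. It is a sketch that assumes a structure element with the id doc exists (as in the first example) and that the role attribute of Paragraph accepts H2 in the same way it accepts H1 and P; the texts are placeholders.

```xml
<PlaceObject>
  <Textblock>
    <!-- Assumption: H2 is accepted here like H1 in the first example -->
    <Paragraph role="H2" parent="doc">
      <Value>A sub-heading</Value>
    </Paragraph>
    <!-- Ordinary body text is tagged with the role P -->
    <Paragraph role="P" parent="doc">
      <Value>Body text below the sub-heading.</Value>
    </Paragraph>
  </Textblock>
</PlaceObject>
```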

From the examples repository

This layout can also be found in the examples repository. It creates a PDF/UA-compliant PDF that fulfils the requirements for accessibility. Below the top-level Document element there is a Sect section, which contains a heading, a plain paragraph, a paragraph with a hyperlink, and an image.

(Run the speedata Publisher with sp --dummy, as no variable data is used here.)

<Layout xmlns="urn:speedata.de:2009/publisher/en"
	xmlns:sd="urn:speedata:2009/publisher/functions/en">
  <PDFOptions format="PDF/UA" />

  <StructureElement role="Document">
    <StructureElement role="Sect" id="section" />
  </StructureElement>

  <Record element="data">
    <PlaceObject>
      <Textblock>
        <Paragraph role="H1" parent="section">
          <B>
            <Value>A very short story</Value>
          </B>
        </Paragraph>
        <Paragraph role="P" parent="section">
          <Value>Once upon a time....</Value>
        </Paragraph>
        <Paragraph role="P" parent="section">
          <Value>This is a </Value>
          <A href="https://www.speedata.de"
             description="link to speedata.de">
            <Value>link to speedata.de</Value>
          </A>
          <Value>.</Value>
        </Paragraph>
      </Textblock>
    </PlaceObject>
    <PlaceObject>
      <Image
          width="8"
          file="ocean.pdf"
          parent="section"
          description="An image of an ocean" />
    </PlaceObject>
  </Record>
</Layout>

The output from the layout above is as expected.

Various tools can be used to check the structure of the document:

The accessibility checker outputs exactly the specified structure. The B element in the heading is not displayed in the structure.

In addition to a detailed review, Adobe Acrobat also provides a visual view of the structure.

You can use pdfuaanalyze to display the structure as an XML tree.

<Document>
  <Sect>
    <H1></H1>
    <P></P>
    <P>
      <Link></Link>
    </P>
    <Figure></Figure>
  </Sect>
</Document>

Checking the document

The following programs can be used to check accessibility: