Hyphenation / language settings

In most Western languages, hyphenation is necessary to give text in narrow columns an acceptable appearance. It is an integral part of the line-breaking algorithm, which allows the Publisher, for example, to avoid hyphenations in several consecutive lines.

Hyphenation in the Publisher is pattern-based and depends on the language. The language can be set globally via <Options mainlanguage="…"> or per paragraph.

<Options mainlanguage="German" />

switches the entire document to German hyphenation patterns, while

<Paragraph language="German">
    <Value>Autobahn</Value>
</Paragraph>

changes the language for only one paragraph. The available languages are described in the command reference under <Options>.

Instead of the written-out names such as German, you can use the language code. The two examples above can also be written as follows:

<Options mainlanguage="de" />

<Paragraph language="de">
    <Value>Autobahn</Value>
</Paragraph>

To check whether words are hyphenated correctly, you can display small marks at the hyphenation points with <Trace hyphenation="yes" />.

<Layout
  xmlns="urn:speedata.de:2009/publisher/en"
  xmlns:sd="urn:speedata:2009/publisher/functions/en">

  <Options mainlanguage="German" />
  <Trace hyphenation="yes" />

  <Record element="data">
    <PlaceObject>
      <Textblock width="3">
        <Paragraph>
          <Value>Autobahn</Value>
        </Paragraph>
      </Textblock>
    </PlaceObject>
  </Record>
</Layout>

results in the following:

[Figure: Hyphenation points shown in the text]

Via

<Hyphenation>er-go-no-mic</Hyphenation>

you can define hyphenation suggestions or exceptions for individual words. These words are then hyphenated only at the positions marked with a hyphen. The language attribute specifies the language to which the hyphenation exception applies.
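As a minimal sketch of language-specific exceptions, the following fragment defines one exception per language. The language codes en and de are the ones used elsewhere in this section; the German word is only an illustrative choice.

```xml
<!-- Illustrative exceptions: each one applies only to the named language -->
<Hyphenation language="en">er-go-no-mic</Hyphenation>
<Hyphenation language="de">Au-to-bahn</Hyphenation>
```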

Optical margin alignment, which is described in the section Optical margin alignment, can slightly reduce the number of hyphenations in the document.

Turn off paragraph hyphenations

For individual paragraphs you can switch off automatic hyphenation by defining a text format with hyphenate="no".

<DefineTextformat name="nohyphen" hyphenate="no"/>

No words are hyphenated in paragraphs marked in this way. The use of text formats is described in a separate section.
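A paragraph might then refer to the text format as sketched below. This assumes that the text format is selected with the textformat attribute of <Paragraph>; the long German word is only sample content.

```xml
<!-- Assumes the text format "nohyphen" has been defined as shown above -->
<Paragraph textformat="nohyphen">
  <Value>Donaudampfschifffahrtsgesellschaft</Value>
</Paragraph>
```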

The hyphenation character can also be changed using a text format:

<DefineTextformat name="dothyphen" hyphenchar="•"/>
[Figure: A different character for word hyphenation]

Use different languages within a paragraph

You can set the language for a textblock or a paragraph, and you can even set it for a piece of text by enclosing that text in <Span language="…"> and </Span>.

<Paragraph language="en">
  <Span language="de">
    <Value>Also schön, Guido Heffels,
           nachfolgend meine Textempfehlung
           für das Blindtextbuch.
    </Value>
  </Span>
  <Br />
  <Span>
    <Value>A wonderful serenity has taken
           possession of my entire soul, like these sweet
           mornings of spring which I enjoy with my whole
           heart.
    </Value>
  </Span>
</Paragraph>
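Setting the language on the textblock could look like the following sketch. It assumes that <Textblock> takes the same language attribute as <Paragraph> and that a paragraph-level setting takes precedence over the textblock; check both assumptions against the command reference.

```xml
<!-- Assumption: language on Textblock acts as the default for its paragraphs -->
<Textblock width="5" language="de">
  <Paragraph>
    <Value>Also schön, Guido Heffels, nachfolgend meine Textempfehlung.</Value>
  </Paragraph>
  <!-- Assumption: the paragraph setting overrides the textblock setting -->
  <Paragraph language="en">
    <Value>A wonderful serenity has taken possession of my entire soul.</Value>
  </Paragraph>
</Textblock>
```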

Allow hyphenations only on certain characters

A property of <Paragraph> lets you restrict the characters after which a line break may be inserted. This is often important for technical data that contains, for example, type designations of the form 12-345/AB, which should not be broken. In the following example, a line break may only be inserted after a slash:

<Paragraph allowbreak="/">
  <Value>https://download.speedata.de/publisher/development/</Value>
</Paragraph>

The default setting for allowbreak is “ -”, i.e. a break at a space or hyphen.

This is an experimental feature in the Publisher. It is likely to be associated with a text format in a future version.
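For the type designation mentioned above, a sketch could restrict breaks to spaces only. This assumes that allowbreak accepts a single space as its complete list of characters, in the same way that the default lists a space and a hyphen; the sentence is made-up sample content.

```xml
<!-- Only a space is listed, so 12-345/AB should stay on one line -->
<Paragraph allowbreak=" ">
  <Value>Please order the replacement part 12-345/AB today.</Value>
</Paragraph>
```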

Language settings for non-western languages

Some languages have special typesetting rules that affect not the hyphenation but the appearance of the text: characters can change their shape or position depending on where they are in the word. To use this feature, the following conditions must be met:

  1. mode="harfbuzz" must be activated at <LoadFontfile>.
  2. The language should be set correctly. If the language is not available in the list of supported languages, Other or -- (two dashes) must be used. If the language is not set correctly, layout errors might occur.
  3. The selected font must contain the appropriate characters.
<Layout xmlns="urn:speedata.de:2009/publisher/en"
    xmlns:sd="urn:speedata:2009/publisher/functions/en"
    version="4.1.7">

    <LoadFontfile name="NotoSansBengali-Regular"
                  filename="NotoSansBengali-Regular.ttf"
                  mode="harfbuzz" />
    <DefineFontfamily fontsize="10" leading="12" name="text">
        <Regular fontface="NotoSansBengali-Regular" />
    </DefineFontfamily>

    <Record element="data">
        <PlaceObject>
            <Textblock>
                <Paragraph language="Other">
                    <Value>আমি</Value>
                </Paragraph>
            </Textblock>
        </PlaceObject>
    </Record>
</Layout>
[Figure: The language is recognized by the system when set to Other.]

Right-to-left running text

If text is output that runs from right to left (e.g. Arabic), the direction of the paragraph must be specified with direction="rtl". Otherwise, the alignment may be wrong (the last line is left-aligned instead of right-aligned).

If the output text is not justified, start and end must be used for the alignment in the text format instead of leftaligned and rightaligned. start and end refer to the start position of the text and not to the orientation of the page (output area).

<Layout xmlns="urn:speedata.de:2009/publisher/en"
    xmlns:sd="urn:speedata:2009/publisher/functions/en"
    version="4.1.16">

  <LoadFontfile
    name="Amiri-Regular"
    filename="amiri-regular.ttf"
    mode="harfbuzz" />
  <DefineFontfamily fontsize="10" leading="12" name="text">
        <Regular fontface="Amiri-Regular" />
    </DefineFontfamily>

    <Record element="data">
        <PlaceObject>
            <Textblock width="5">
                <Paragraph direction="rtl">
                  <Value select="."/>
                </Paragraph>
            </Textblock>
        </PlaceObject>
    </Record>
</Layout>
<data>المادة 1 يولد جميع الناس أحرارًا متساوين في الكرامة والحقوق.
وقد وهبوا عقلاً وضميرًا وعليهم أن يعامل بعضهم بعضًا بروح الإخاء.</data>
[Figure: The text runs from right to left.]
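For text that is not justified, the alignment rule described above might be applied as in this sketch. The text format name rtl-start is hypothetical, and the fragment assumes that the text format is selected with the textformat attribute of <Paragraph>.

```xml
<!-- "start" refers to where the text begins: the right side for rtl text -->
<DefineTextformat name="rtl-start" alignment="start"/>

<Paragraph direction="rtl" textformat="rtl-start">
  <Value select="."/>
</Paragraph>
```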

Mixed text (right-to-left and left-to-right)

If text is output that runs both from right to left (rtl) and from left to right (ltr), the paragraph must be divided into individual segments and the writing direction must be changed between the segments. This so-called “bidi algorithm” is built into the speedata Publisher and is activated with bidi="yes":

<PlaceObject>
    <Textblock width="5">
        <Paragraph bidi="yes">
            <Value select="."/>
        </Paragraph>
    </Textblock>
</PlaceObject>
<data>العاشر ليونيكود (Unicode Conference)،
الذي سيعقد في 10-12 آذار 1997 مبدينة</data>
[Figure: The text direction is calculated separately for each section.]

If bidi="yes" is specified, the first part is taken as the main direction of the paragraph. In this case, specifying direction="rtl" is not necessary.

Rules for mixed text

  • Set the direction attribute if it is clear in which context the text should appear. If it is empty or not set, the content of the text decides which direction the paragraph should have. This works well in most cases, but not, for example, with mixed text that starts with a “wrong” direction.
  • If in doubt, set the attribute bidi to yes. The only drawback is that the publishing run might be a bit slower. Other differences should not occur.
  • The language setting (language) should either contain the correct language, be empty or be set to the language Other. The problem is that some language settings can cause an unwanted writing direction.
  • For text alignment (alignment at DefineTextformat) you should use start and end instead of leftaligned and rightaligned. start and end follow the direction of the paragraph.
  • The HarfBuzz font loader must be activated (mode="harfbuzz" at <LoadFontfile>).
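Taken together, the rules above might lead to a fragment like the following. It is a sketch rather than a complete layout: the font family definition is omitted, the text format name mixed is hypothetical, and using direction and bidi on the same paragraph is an assumption for the case where mixed text could start with the "wrong" direction.

```xml
<!-- HarfBuzz font loader activated, as in the examples above -->
<LoadFontfile name="Amiri-Regular"
              filename="amiri-regular.ttf"
              mode="harfbuzz" />

<!-- start/end instead of leftaligned/rightaligned -->
<DefineTextformat name="mixed" alignment="start"/>

<!-- direction fixes the main direction, bidi handles embedded ltr segments -->
<Paragraph direction="rtl" bidi="yes" textformat="mixed">
  <Value select="."/>
</Paragraph>
```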