Hyphenation / language settings
This page was automatically translated. Stay tuned for a human translation…
In most Western languages, hyphenation is necessary for narrow text columns to look acceptable. It is an integral part of the line-breaking algorithm, which allows the Publisher to avoid, for example, hyphenations on several consecutive lines.
In the Publisher, hyphenation is pattern-based and depends on the language. The language can be set globally via
<Options mainlanguage="…"> or per paragraph.
<Options mainlanguage="German" />
switches the entire document to German hyphenation patterns, while
<Paragraph language="German"> <Value>Autobahn</Value> </Paragraph>
changes the language for only one paragraph. The available languages are listed in the command reference.
Instead of written-out names such as
German, you can use the language code.
The two examples above can also be written as follows:
<Options mainlanguage="de" />
<Paragraph language="de"> <Value>Autobahn</Value> </Paragraph>
To check whether words are hyphenated correctly, you can have the Publisher draw small marks at the hyphenation points with
<Trace hyphenation="yes" />.
<Layout xmlns="urn:speedata.de:2009/publisher/en"
  xmlns:sd="urn:speedata:2009/publisher/functions/en">
  <Options mainlanguage="German" />
  <Trace hyphenation="yes" />
  <Record element="data">
    <PlaceObject>
      <Textblock width="3">
        <Paragraph>
          <Value>Autobahn</Value>
        </Paragraph>
      </Textblock>
    </PlaceObject>
  </Record>
</Layout>
results in the following:
You can define hyphenation suggestions or exceptions for individual words. These words are then hyphenated only at the positions marked with a hyphen.
Optical margin alignment, which is described in the section Optical margin alignment, can somewhat reduce the number of hyphenations in the document.
Turn off paragraph hyphenations
For single paragraphs you can switch off the automatic hyphenation by defining a text format with
<DefineTextformat name="nohyphen" hyphenate="no"/>
No words are hyphenated in paragraphs marked in this way. The use of text formats is described in a separate section.
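A minimal sketch of how such a text format might be applied. It assumes that the paragraph selects the format through its textformat attribute, which is not shown in this section; the width and the sample word are arbitrary.

```xml
<Layout xmlns="urn:speedata.de:2009/publisher/en"
  xmlns:sd="urn:speedata:2009/publisher/functions/en">
  <Options mainlanguage="German" />
  <!-- Text format that disables automatic hyphenation -->
  <DefineTextformat name="nohyphen" hyphenate="no"/>
  <Record element="data">
    <PlaceObject>
      <Textblock width="3">
        <!-- Assumption: textformat selects the format defined above -->
        <Paragraph textformat="nohyphen">
          <Value>Autobahn</Value>
        </Paragraph>
      </Textblock>
    </PlaceObject>
  </Record>
</Layout>
```

Under these assumptions the word should stay in one piece even though the text block is narrow.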
The hyphenation character can also be changed using a text format:
<DefineTextformat name="dothyphen" hyphenchar="•"/>
Use different languages within a paragraph
You can set the language for a text block, for a paragraph, and even for a piece of text by enclosing it in
<Span language="…"> and </Span>.
<Paragraph language="en">
  <Span language="de">
    <Value>Also schön, Guido Heffels, nachfolgend meine Textempfehlung für das Blindtextbuch. </Value>
  </Span>
  <Br />
  <Span>
    <Value>A wonderful serenity has taken possession of my entire soul, like these sweet mornings of spring which I enjoy with my whole heart. </Value>
  </Span>
</Paragraph>
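The Span example covers the finest granularity. For the coarser text block level, a sketch might look like the following. It assumes that the text block takes the same language attribute as the paragraph, which this section states only in prose; check the command reference before relying on it.

```xml
<!-- Assumption: Textblock accepts language just like Paragraph -->
<Textblock language="de" width="3">
  <Paragraph>
    <Value>Autobahn</Value>
  </Paragraph>
</Textblock>
```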
Allow hyphenations only on certain characters
The allowbreak attribute of
<Paragraph> limits the characters at which a line break may be inserted. This is often important for technical data containing, for example, type designations of the form
12-345/AB that should not be hyphenated. In the following example, a line break may only be inserted after a slash:
<Paragraph allowbreak="/"> <Value>https://download.speedata.de/publisher/development/</Value> </Paragraph>
The default value of allowbreak is “ -”, i.e. a line may break at a space or at a hyphen.
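Applied to the type designation mentioned above, a sketch could look like this. It assumes that allowbreak takes a plain list of characters, as the default value suggests, so that a space and a slash can be combined; the surrounding words are made up for illustration.

```xml
<!-- Assumption: the value is a list of break characters, here space and slash -->
<Paragraph allowbreak=" /">
  <Value>Type designation 12-345/AB</Value>
</Paragraph>
```

If the assumption holds, the line may break between the words and after the slash, but no longer at the hyphen inside 12-345.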
This is an experimental feature in the Publisher. It is likely to be associated with a text format in a future version.
Language settings for non-western languages
Some languages have special typesetting rules that affect not hyphenation but the appearance of the text: characters can change their shape or position depending on where they are in the word. To use this feature, the following conditions must be met:
- mode="harfbuzz" must be set when the font file is loaded.
- The language should be set correctly. If the language is not in the list of supported languages, -- (two dashes) must be used. If the language is not set correctly, layout errors might occur.
- The selected font must contain the appropriate characters.
<Layout xmlns="urn:speedata.de:2009/publisher/en"
  xmlns:sd="urn:speedata:2009/publisher/functions/en"
  version="4.1.7">
  <LoadFontfile name="NotoSansBengali-Regular" filename="NotoSansBengali-Regular.ttf" mode="harfbuzz" />
  <DefineFontfamily fontsize="10" leading="12" name="text">
    <Regular fontface="NotoSansBengali-Regular" />
  </DefineFontfamily>
  <Record element="data">
    <PlaceObject>
      <Textblock>
        <Paragraph language="Other">
          <Value>আমি</Value>
        </Paragraph>
      </Textblock>
    </PlaceObject>
  </Record>
</Layout>
Right-to-left running text
If the text runs from right to left (e.g. Arabic), the direction of the paragraph must be specified with direction="rtl".
Otherwise, the alignment may be wrong (the last line is left-aligned instead of right-aligned).
If the output text is not justified, start and
end must be used for the alignment in the text format instead of leftaligned and rightaligned. start and
end are based on the start position of the text and not on the orientation of the page (output area).
<Layout xmlns="urn:speedata.de:2009/publisher/en"
  xmlns:sd="urn:speedata:2009/publisher/functions/en"
  version="4.1.16">
  <LoadFontfile name="Amiri-Regular" filename="amiri-regular.ttf" mode="harfbuzz" />
  <DefineFontfamily fontsize="10" leading="12" name="text">
    <Regular fontface="Amiri-Regular" />
  </DefineFontfamily>
  <Record element="data">
    <PlaceObject>
      <Textblock width="5">
        <Paragraph direction="rtl">
          <Value select="."/>
        </Paragraph>
      </Textblock>
    </PlaceObject>
  </Record>
</Layout>
<data>المادة 1 يولد جميع الناس أحرارًا متساوين في الكرامة والحقوق. وقد وهبوا عقلاً وضميرًا وعليهم أن يعامل بعضهم بعضًا بروح الإخاء.</data>
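A short sketch of the alignment rule for right-to-left text that is not justified. It assumes that start is the counterpart of the end value, that the paragraph selects the text format through its textformat attribute, and that the format name rtlstart can be chosen freely; none of these details is shown in this section.

```xml
<!-- Assumption: "start" aligns to the side where the text begins,
     which is the right-hand side for right-to-left text -->
<DefineTextformat name="rtlstart" alignment="start"/>

<!-- Used inside the Textblock of a layout such as the one above -->
<Paragraph direction="rtl" textformat="rtlstart">
  <Value select="."/>
</Paragraph>
```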
Mixed text (right-to-left and left-to-right)
If the text runs both from right to left (rtl) and from left to right (ltr), the paragraph must be divided into individual segments, and the writing direction must change between the segments. The algorithm that does this, the so-called “bidi algorithm”, is built into the speedata Publisher
and is activated with bidi="yes" on the paragraph:
<PlaceObject>
  <Textblock width="5">
    <Paragraph bidi="yes">
      <Value select="."/>
    </Paragraph>
  </Textblock>
</PlaceObject>
<data>العاشر ليونيكود (Unicode Conference)، الذي سيعقد في 10-12 آذار 1997 مبدينة</data>
Rules for mixed text
- Set the direction attribute if it is clear in which context the text should appear. If it is empty or not set, the content of the text decides which direction the paragraph gets. This works well in most cases, but not, for example, with mixed text that starts with the “wrong” direction.
- If in doubt, set the attribute bidi to yes. The only drawback is that the publishing run might be a bit slower. Other differences should not occur.
- The language setting (language) should either contain the correct language, be empty, or be set to the language Other. The problem is that some language settings can cause an unwanted writing direction.
- For text alignment (alignment at DefineTextformat) you should use start and end, because they are oriented to the direction of the paragraph.
- The harfbuzz font loader must be activated.
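The rules above could be combined in one layout roughly as follows. This is a sketch, not a tested configuration: the font is borrowed from the right-to-left example, the format name bidistart is made up, and start as an alignment value as well as textformat on the paragraph are assumptions not spelled out in this section.

```xml
<Layout xmlns="urn:speedata.de:2009/publisher/en"
  xmlns:sd="urn:speedata:2009/publisher/functions/en">
  <!-- The harfbuzz font loader is required for correct shaping -->
  <LoadFontfile name="Amiri-Regular" filename="amiri-regular.ttf" mode="harfbuzz" />
  <DefineFontfamily fontsize="10" leading="12" name="text">
    <Regular fontface="Amiri-Regular" />
  </DefineFontfamily>
  <!-- Assumption: "start" follows the paragraph direction -->
  <DefineTextformat name="bidistart" alignment="start"/>
  <Record element="data">
    <PlaceObject>
      <Textblock width="5">
        <!-- Explicit direction, bidi switched on, neutral language -->
        <Paragraph direction="rtl" bidi="yes" language="Other" textformat="bidistart">
          <Value select="."/>
        </Paragraph>
      </Textblock>
    </PlaceObject>
  </Record>
</Layout>
```

Setting the direction explicitly covers mixed text that begins with a left-to-right segment, which is the case the content-based detection handles poorly.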