Performance Considerations
When processing large documents or documents with many formatting operations, performance can become an important consideration. This chapter describes various strategies to optimize the typesetting speed of the speedata Publisher.
HTML Parsing
One of the most significant performance factors is HTML parsing in paragraphs. By default, the Publisher parses HTML tags like <b>, <i>, <span>, etc. in all text content. This parsing happens for every paragraph and can be expensive, especially in documents with many small text blocks.
Disabling HTML Parsing
If your document does not use HTML formatting tags, you can significantly improve performance by disabling HTML parsing:
Global Setting
To disable HTML parsing for the entire document, use the <Options> command:
<Options html="off"/>
This can reduce typesetting time by up to 40% in documents with many paragraphs.
Local Setting
You can also control HTML parsing on a per-paragraph basis:
<Paragraph html="off">
  <Value>Text without HTML formatting</Value>
</Paragraph>
HTML Parsing Modes
The html attribute supports three values:
- all: Parse HTML in all paragraphs (default behavior).
- inner: Parse HTML only in child elements of the current data element.
- off: Disable HTML parsing completely. This provides the best performance, but HTML tags like <b> or <i> will not be interpreted.
Command Line Option
You can also set the HTML parsing mode from the command line:
sp --option html=off
This is particularly useful for batch processing or testing performance optimizations.
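To see how much the option saves for a particular job, one approach is to time a run with and without it. The following is a minimal sketch, not part of the Publisher: the helper name elapsed is hypothetical, and it assumes that sp is on your PATH and that it is started from a directory containing your layout and data files. Resolution is whole seconds, so it is only meaningful for longer runs.

```shell
# Hypothetical helper (not part of the Publisher): run a command and
# print the elapsed wall-clock time in whole seconds.
elapsed() {
  start=$(date +%s)
  # Discard the command's output. The exit status is ignored, so a run
  # that fails early will look deceptively fast; check the result separately.
  "$@" >/dev/null 2>&1 || true
  end=$(date +%s)
  echo "$((end - start))"
}

# Usage, from a directory holding your layout and data files:
#   elapsed sp
#   elapsed sp --option html=off
```

Comparing the two numbers for your own data is more reliable than relying on a general figure, since the gain depends on how many paragraphs the document contains.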
When to Use html="off"
Consider disabling HTML parsing when:
- Your data contains no HTML formatting tags
- Text formatting is handled entirely through text formats
- You are processing large documents with many paragraphs
- Performance is critical and you don’t need inline HTML formatting
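Before switching HTML parsing off, it can help to check whether the data really is free of HTML tags. The sketch below is a rough heuristic under stated assumptions: the function name has_html_tags is hypothetical, the tag list (b, i, u, span) is only an illustrative selection, and it assumes tags appear in the data file either literally or entity-escaped (&lt;b&gt;). A data element that happens to share one of these names would be reported as a false positive.

```shell
# Hypothetical check: succeeds (exit status 0) if the file appears to
# contain one of the listed HTML tags, written literally or entity-escaped.
# Extend the tag list to match the markup your data actually uses.
has_html_tags() {
  grep -Eq '(<|&lt;)/?(b|i|u|span)([[:space:]/>]|&gt;)' "$1"
}

# Usage:
#   if has_html_tags data.xml; then
#     echo "HTML tags found: keep HTML parsing enabled"
#   else
#     echo "no HTML tags found: html=off is likely safe"
#   fi
```

A negative result only covers the tags in the list, so treat it as a hint and compare the generated PDF against a run with parsing enabled.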
When to Keep HTML Parsing Enabled
Keep HTML parsing enabled when:
- Your data contains HTML tags like <b>, <i>, <span>, etc.
- You need inline formatting within paragraphs
- You are using CSS styles with HTML elements
- Document generation time is not critical