tkmilan.parser
Parsers for custom formats.
Reuse existing parsers as much as possible; don’t reinvent the wheel.
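Functions
- escape_LTML: Convert the string into safe Lite Text Markup Language.
- parse_LTML: Parse Lite Text Markup Language using the default settings.
Classes
- LTML: Lite Text Markup Language parser.
- LTML_Attributes: LTML attributes (see the main documentation).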
- class tkmilan.parser.LTML(tag_act: bool = True, tag_simplify: bool = True)
Bases: HTMLParser
Lite Text Markup Language parser.
LTML is a very small subset of the HTML specification, used to mark spans of text with metadata. See TAGS_FULL and TAGS_SE for the list of allowed tags. All text outside the tags is included as untagged spans.
The API is analogous to HTMLParser, but supports only the LTML subset. After creating the parser object, feed it data using the feed function, and reuse the parser with new data by calling the reset function.
Note
Chunked feed usage is not supported (not even detected); please send the entire text at once, or at least break it at the tags.
The output command list is available in cmdlist; it’s a list of model.TextElement. The text element main tag is the LTML tag name (i.e. for <span>TXT</span>, the text element main tag is span).
Most attributes are ignored, but others have a special meaning and restrictions:
- Attribute id: An optional identifier.
  This is sent as an id:$id text element tag, if present. The value must not have any : anywhere.
- Attribute class: A list of strings, separated by whitespace.
  This is sent as a class:$class text element tag, for each split class name. The value must not have any : anywhere.
- Attributes data-*: All the data attributes are collected in a dictionary, removing the data- prefix from the attribute name to get the key, and using the value as-is.
- Tag a; Attribute href: A simple string, NOT a URL as in HTML.
  "a:href" is sent as an a-$href text element tag. The value must not have any :: anywhere.
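The attribute rules above can be sketched as a small standalone function. This only illustrates the documented mapping and is not the code tkmilan uses: `attribute_tags` is a hypothetical helper, the order of the returned tags is arbitrary, and raising `ValueError` for a forbidden separator is an assumption (the documentation only states that such values must not appear).

```python
def attribute_tags(tag, attrs):
    """Return (extra_tags, data) for one element, following the rules above.

    Hypothetical helper for illustration; not part of tkmilan.
    """
    extra_tags = []
    data = {}
    for name, value in attrs.items():
        if name == "id":
            if ":" in value:
                raise ValueError("id must not contain ':'")
            extra_tags.append(f"id:{value}")
        elif name == "class":
            # One tag per whitespace-separated class name.
            for cls in value.split():
                if ":" in cls:
                    raise ValueError("class must not contain ':'")
                extra_tags.append(f"class:{cls}")
        elif name.startswith("data-"):
            # Strip the "data-" prefix to get the key; keep the value as-is.
            data[name[len("data-"):]] = value
        elif name == "href" and tag == "a":
            if "::" in value:
                raise ValueError("href must not contain '::'")
            extra_tags.append(f"a-{value}")
        # Any other attribute is ignored.
    return extra_tags, data


tags, data = attribute_tags(
    "a", {"id": "top", "class": "x y", "data-k": "v", "href": "home"}
)
print(tags)  # ['id:top', 'class:x', 'class:y', 'a-home']
print(data)  # {'k': 'v'}
```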
For the tags in TAGS_SIMPLE, if there are no attributes, the tag is “simplified” by turning it into a regular text span. This is controlled by the tag_simplify argument.
In addition to these tags, each tag type includes the so-called Automatic Counter tags (ACT). These are markers for each instance of each tag type, including its index position (starting at 0). The format is $tag::$index. This is controlled by the tag_act argument.
Note
As an example of the Automatic Counter tags, consider the following LTML:
<span>1</span><span>2</span>
This results in two elements with the following tags:
1: span (Main Tag), span::0 (ACT)
2: span (Main Tag), span::1 (ACT)
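The counter scheme in this example can be reproduced with the standard library alone. The sketch below is not the LTML implementation: `ACTSketch` is a hypothetical class that only shows how a per-tag counter yields the `$tag::$index` format.

```python
from collections import Counter
from html.parser import HTMLParser


class ACTSketch(HTMLParser):
    """Illustrative only: records the main tag and an ACT-style tag per element."""

    def __init__(self):
        super().__init__()
        self.counters = Counter()  # next index for each tag name
        self.elements = []         # (main tag, ACT tag) pairs, in document order

    def handle_starttag(self, tag, attrs):
        index = self.counters[tag]  # starts at 0 for every tag name
        self.counters[tag] += 1
        self.elements.append((tag, f"{tag}::{index}"))


parser = ACTSketch()
parser.feed("<span>1</span><span>2</span>")
parser.close()
print(parser.elements)  # [('span', 'span::0'), ('span', 'span::1')]
```

Each tag name keeps its own counter, so a `<b>` element between the two spans would receive `b::0` without shifting the span indexes.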
- Parameters:
tag_simplify (bool) – Simplify some tags (TAGS_SIMPLE). Defaults to True.
tag_act (bool) – Emit the Automatic Counter tags (ACT); see the class description. Defaults to True.
- feed(data)
Feed data to the parser.
Call this as often as you want, with as little or as much text as you want (may include '\n').
Inherited from HTMLParser.feed. For LTML, chunked feed usage is not supported; see the note in the class description.
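Because chunked feeding is not supported for LTML, a caller that receives text in pieces can buffer it and feed the parser once. A minimal sketch of that pattern follows. Two things in it are assumptions: `feed_all` is a hypothetical helper, and `TextCollector` is a standard-library stand-in used so the snippet runs without tkmilan installed.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Stand-in parser (assumption): records the text pieces it sees."""

    def reset(self):
        super().reset()
        self.pieces = []

    def handle_data(self, data):
        self.pieces.append(data)


def feed_all(parser, chunks):
    """Hypothetical helper: join all chunks, then feed the parser exactly once."""
    parser.reset()
    parser.feed("".join(chunks))
    parser.close()
    return parser


# The chunks split the tags in awkward places; joining first hides that
# from the parser.
collector = feed_all(TextCollector(), ["<b>bo", "ld</", "b> tail"])
print(collector.pieces)  # ['bold', ' tail']
```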
- parse(ltml: str) -> Sequence[TextElement]
Helper function to parse a standalone string.
- reset() -> None
Reset the instance. Loses all unprocessed data.
This is called when instantiating the parser.
Extended from HTMLParser.reset
- TAGS_FULL = {'a', 'b', 'i', 'span'}
Full Tags, like this: <tag>...</tag>
- TAGS_MODEL = {'a': <class 'tkmilan.model.TextElementInline'>, 'b': <class 'tkmilan.model.TextElementInline'>, 'br': <class 'tkmilan.model.TextElement_br'>, 'i': <class 'tkmilan.model.TextElementInline'>, 'span': <class 'tkmilan.model.TextElementInline'>}
Mapping from tag names to their corresponding model classes.
- TAGS_SE = {'br'}
Start/End Tags, like this: <tag />
- TAGS_SIMPLE = {'span'}
Simple Tags, that is, Full Tags that can be simplified.
See the tag_simplify option.
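Since only the tags in TAGS_FULL and TAGS_SE are allowed, it can be handy to check markup before parsing it. The sketch below uses only the standard library. The two sets are copied from the values documented above, `unsupported_tags` is a hypothetical helper, and nothing here describes how LTML itself reacts to a disallowed tag.

```python
from html.parser import HTMLParser

# Values copied from the documented constants.
TAGS_FULL = {"a", "b", "i", "span"}
TAGS_SE = {"br"}


class TagCollector(HTMLParser):
    """Collects every tag name that opens in the markup."""

    def __init__(self):
        super().__init__()
        self.seen = set()

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <br />, because the
        # default handle_startendtag() delegates to handle_starttag().
        self.seen.add(tag)


def unsupported_tags(markup):
    """Hypothetical pre-check: tag names outside TAGS_FULL and TAGS_SE."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen - (TAGS_FULL | TAGS_SE)


print(unsupported_tags("<b>bold</b><br /><u>underlined</u>"))  # {'u'}
print(unsupported_tags("<span>ok</span>"))  # set()
```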
- cmdlist: List[TextElement]
Output command list.
- class tkmilan.parser.LTML_Attributes(identifier: str | None = None, classes: AbstractSet[str] = <factory>, data: Mapping[str, str] = <factory>, href: str | None = None)
Bases: object
LTML attributes (see the main documentation).
- tkmilan.parser.escape_LTML(string: str, *, quote: bool = True) -> str
Convert the string into safe Lite Text Markup Language.
Wraps the upstream escape function, see html.escape.
Note
You can use the upstream function directly, instead of importing this.
- Parameters:
string (str) – The string in which to escape special LTML characters.
quote (bool) – Escape quotes too. Usually needed, since the result might be embedded in an attribute. Defaults to True.
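Since this function wraps `html.escape`, the upstream function shows the effect of the `quote` parameter. The example calls the standard library directly, as the note above allows; the sample string is made up for illustration.

```python
import html

raw = 'Tom & "Jerry" <friends>'

# Default: quotes are escaped too, so the result is safe inside an attribute value.
print(html.escape(raw))  # Tom &amp; &quot;Jerry&quot; &lt;friends&gt;

# quote=False leaves quotes alone; only &, < and > are replaced.
print(html.escape(raw, quote=False))  # Tom &amp; "Jerry" &lt;friends&gt;
```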
- tkmilan.parser.parse_LTML(ltml: str, __parser: LTML = <tkmilan.parser.LTML object>) -> Sequence[TextElement]
Parse Lite Text Markup Language using the default settings.
This always reuses the same parser instance, which avoids creating a new parser on every call.
- Parameters:
ltml (str) – The string to parse.
See also
See LTML for more advanced usage. In particular, see LTML.parse.
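The signature suggests that the shared parser is stored as a default argument value, which Python evaluates once, when the function is defined. A minimal sketch of that reuse pattern follows, built on the standard library so it runs without tkmilan. `SpanCollector` and `parse_text` are hypothetical stand-ins, and resetting before each call is an assumption about how such a helper would stay independent between calls.

```python
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Stand-in parser (assumption): collects the text between tags."""

    def reset(self):
        super().reset()
        self.spans = []

    def handle_data(self, data):
        self.spans.append(data)


def parse_text(markup, _parser=SpanCollector()):
    """Parse with one shared parser, created once at definition time."""
    _parser.reset()   # drop any state left over from the previous call
    _parser.feed(markup)
    _parser.close()
    return list(_parser.spans)  # copy, so later calls cannot change the result


first = parse_text("<b>one</b> two")
second = parse_text("three")
print(first)   # ['one', ' two']
print(second)  # ['three']
```

Returning a copy keeps earlier results stable even though every call shares the same parser object.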