Apertium Streamparser

Usage: streamparser.py [FILE]

Consumes input from a file (first argument) or stdin, parsing and pretty printing the readings of lexical units found.

class streamparser.Knownness

Level of knowledge associated with a LexicalUnit.

Values: known, unknown, biunknown, genunknown

class streamparser.known
class streamparser.unknown

Denoted by *, analysis not available.

class streamparser.biunknown

Denoted by @, translation not available.

class streamparser.genunknown

Denoted by #, generated form not available.
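As a rough illustration of the markers listed above, the following sketch maps the first character of an analysis to the name of a knownness level. It is a hypothetical stand-in that does not use the library's classes; `MARKERS` and `knownness_name` are names invented for this example.

```python
# Hypothetical stand-in, not the library's classes: maps the marker that
# starts an analysis to the name of the corresponding knownness level.
MARKERS = {"*": "unknown", "@": "biunknown", "#": "genunknown"}


def knownness_name(analysis: str) -> str:
    """Return the knownness level name implied by an analysis string."""
    # Anything that does not start with a marker is treated as known.
    return MARKERS.get(analysis[:1], "known")


print(knownness_name("*blargh"))         # unknown
print(knownness_name("@house"))          # biunknown
print(knownness_name("#casa<n><f><sg>"))  # genunknown
print(knownness_name("casa<n><f><sg>"))   # known
```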

class streamparser.LexicalUnit(lexical_unit)

A lexical unit consisting of a lemma and its readings.

lexical_unit

The lexical unit in Apertium stream format.

Type: str

wordform

The word form (surface form) of the lexical unit.

Type: str

wordbound_blank

The wordbound blank of the lexical unit.

Type: str

readings

The analyses of the lexical unit with sublists containing all subreadings.

Type: List[List[SReading]]

knownness

The level of knowledge of the lexical unit.

Type: Knownness

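To make the shape of `wordform` and `readings` concrete, here is a simplified sketch of how a unit in stream format could be split into a word form and a list of readings, each holding its subreadings. It is not the library's implementation: `split_unit` is a hypothetical helper, `SReading` is redefined locally from the documented fields, and escaped characters and wordbound blanks are ignored.

```python
import re
from collections import namedtuple

# Local stand-in mirroring the documented SReading fields.
SReading = namedtuple("SReading", ["baseform", "tags"])


def split_unit(unit: str):
    """Split 'wordform/analysis/...' into a word form and its readings.

    Simplified: ignores escaped characters and wordbound blanks.
    """
    wordform, *analyses = unit.split("/")
    readings = []
    for analysis in analyses:
        subreadings = []
        # Subreadings of one analysis are joined with '+'.
        for part in analysis.split("+"):
            baseform = part.split("<", 1)[0]
            tags = re.findall(r"<([^>]+)>", part)
            subreadings.append(SReading(baseform, tags))
        readings.append(subreadings)
    return wordform, readings


wordform, readings = split_unit("vino/vino<n><m><sg>/venir<vblex><ifi><p3><sg>")
print(wordform)  # vino
for reading in readings:
    print(reading)
```

Each inner list corresponds to one analysis, so an ambiguous word form such as the one above produces two inner lists with one subreading each.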
class streamparser.SReading

A single subreading of an analysis of a token.

baseform

The base form (lemma, lexical form, citation form) of the reading.

Type: str

tags

The morphological tags associated with the reading.

Type: List[str]

baseform

Alias for field number 0

tags

Alias for field number 1

streamparser.mainpos(reading, ltr=False)

Return the first part-of-speech tag of a reading. If there are several subreadings, the default is to return the first tag of the last subreading; with ltr=True, the first tag of the first subreading is returned instead. See http://beta.visl.sdu.dk/cg3/single/#sub-stream-apertium for more information.
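The selection rule described above can be sketched as follows. This is a stand-in written from the description, not the library's source; `mainpos_sketch` is a hypothetical name and `SReading` is redefined locally.

```python
from collections import namedtuple

# Local stand-in mirroring the documented SReading fields.
SReading = namedtuple("SReading", ["baseform", "tags"])


def mainpos_sketch(reading, ltr=False):
    """Return the first tag of the first (ltr=True) or last subreading."""
    subreading = reading[0] if ltr else reading[-1]
    return subreading.tags[0]


# One reading made of two subreadings.
reading = [SReading("dar", ["vblex", "imp"]), SReading("me", ["prn", "enc"])]
print(mainpos_sketch(reading))            # prn
print(mainpos_sketch(reading, ltr=True))  # vblex
```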

streamparser.parse(stream, with_text=False)

Generates lexical units from a character stream.

Parameters:
  • stream (Iterator[str]) – A character stream containing lexical units, superblanks and other text.
  • with_text (Optional[bool]) – A boolean defining whether to output preceding text with each lexical unit.
Yields:

LexicalUnit – The next lexical unit found in the character stream, if with_text is False.

(str, LexicalUnit) – A tuple of the text that separated the unit from the prior one and the next lexical unit found in the character stream, if with_text is True.
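The two output shapes can be illustrated with a simplified generator. This is a sketch under stated assumptions, not the library's parser: `parse_sketch` is a hypothetical name, it yields plain strings instead of LexicalUnit objects, and it ignores escaped characters and superblanks.

```python
def parse_sketch(stream, with_text=False):
    """Yield the contents of each ^...$ unit found in a character stream.

    Simplified stand-in: yields plain strings rather than LexicalUnit
    objects and ignores escapes and superblanks.
    """
    text, unit, in_unit = "", "", False
    for char in stream:
        if in_unit:
            if char == "$":
                # End of unit: emit it, optionally with the preceding text.
                yield (text, unit) if with_text else unit
                text, unit, in_unit = "", "", False
            else:
                unit += char
        elif char == "^":
            in_unit = True
        else:
            text += char


stream = "^the/the<det>$ ^cat/cat<n><sg>$"
print(list(parse_sketch(stream)))
# ['the/the<det>', 'cat/cat<n><sg>']
print(list(parse_sketch(stream, with_text=True)))
# [('', 'the/the<det>'), (' ', 'cat/cat<n><sg>')]
```

Because a string is already an iterator of characters, it can be passed directly as the stream in this sketch.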

streamparser.parse_file(f, **kwargs)

Generates lexical units from a file.

Parameters: f (file) – A file containing lexical units, superblanks and other text.
Yields: LexicalUnit – The next lexical unit found in the file.
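A character-stream parser needs single characters, while iterating over a file yields whole lines. One way to bridge the two, shown here as an assumption about how such a wrapper might work and not as the library's code, is to flatten the lines with `itertools.chain.from_iterable`. The helper name `chars` is hypothetical.

```python
import io
from itertools import chain


def chars(f):
    """Flatten a file's lines into a stream of single characters."""
    return chain.from_iterable(f)


# io.StringIO stands in for an open text file.
f = io.StringIO("^the/the<det>$\n^cat/cat<n><sg>$\n")
print(list(chars(f))[:5])  # ['^', 't', 'h', 'e', '/']
```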