Subword-level Language Model

From GM-RKB

A Subword-level Language Model is a Language Model that operates at the level of Subword Units.



References

2018a

2018b

2016

1989

  • (Mignosi, 1989) ⇒ Filippo Mignosi. (1989). “Infinite Words with Linear Subword Complexity.” In: Theoretical Computer Science, 65(2).
    • QUOTE: ... Let $A$ be a set and let $A^{*}$ be the free monoid generated by $A$. The elements of $A^{*}$ are said to be words. The empty word is denoted by $\wedge$ and we set $A^{+} := A^{*} \setminus \{1\}$. Let $A^{m}$ be the set of all words of $A^{+}$ of length $m$ and denote by $|u|$ the length of the word $u$. An infinite word over $A$ is a sequence of elements in $A^{+}$; its length is $+\infty$. The set $A$ is called an alphabet and accordingly, elements of $A$ are called letters. Now a word which is not a power of another word is called primitive. Let $f = a_{1} a_{2} \ldots$ be an infinite (or a finite) word. A word $w$ is called a subword of $f$ (but also a factor) if $w = \wedge$ or $w = a_{i} a_{i+1} \ldots a_{j}$, $i, j \in \mathbb{N}$, $i \leqslant j \leqslant |f|$. We set for short $w \mid f$. Let $F := F(f)$ be the set of the subwords of $f$.
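
The definition of a subword (factor) quoted above can be illustrated for finite words with a minimal Python sketch. The function names `factors` and `complexity` are hypothetical and do not come from the cited paper. The sketch represents words as Python strings and the empty word as the empty string, and it assumes the usual reading of "subword complexity" as the number of distinct subwords of a given length.

```python
def factors(word: str) -> set[str]:
    """Return the set F(word) of all subwords (factors) of a finite word.

    A factor is either the empty word or a contiguous block
    a_i a_{i+1} ... a_j of letters of the word.
    """
    result = {""}  # the empty word is a subword of every word
    n = len(word)
    for i in range(n):
        for j in range(i + 1, n + 1):
            result.add(word[i:j])  # contiguous block starting at i, ending before j
    return result


def complexity(word: str, m: int) -> int:
    """Count the distinct subwords of `word` that have length exactly m."""
    return sum(1 for w in factors(word) if len(w) == m)


if __name__ == "__main__":
    print(sorted(factors("abab"), key=lambda w: (len(w), w)))
    print(complexity("abaab", 2))
```

This brute-force enumeration examines every pair of positions, so it is only suitable for short words. It also does not cover infinite words, which the quoted definition also allows.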