Digital Text Document
A Digital Text Document is a digital item that is a typewritten document (a digital document composed of text items).
- Context:
- It can have a Title (or Header). (e.g. Martin Luther King Jr.'s "I Have a Dream" speech)
- It can be a Passage, a Paragraph, a Document, or a Webpage.
- It can be:
- Expository Text, where the Text is a kind of Textbook.
- Episodic Text, where the Text describes a series of related Events.
- Explanatory Text.
- It can have a Text Meaning (contain several complete Thoughts).
- It can be evaluated by a Text Measure (which produces a text score).
- It can contain Entity Mentions and Relation Mentions.
- It can be an Annotated Text Document.
- It can be in a File Format, such as a Text File Format or a PDF File Format, ...
- It can be divided into Textual Units.
- It can be associated with a Text Document Category.
- It can be a Member of a Text Document Corpus.
- It can range from being an Unrestricted-Topic Text Document to being a Domain-Specific Text Document.
- …
- Example(s):
- a Plaintext Document,
- a Webpage,
- a MS Word Document,
- a LibreOffice Writer Document,
- a Wikipedia Article, e.g. http://en.wikipedia.org/wiki/Text (without the illustrations),
- a Type Written Physical Document.
- …
- Counter-Example(s):
- A Digital Image, such as:
- a Handwritten Document,
- An XML Document,
- A Table,
- A Spreadsheet File,
- A Painting,
- A Song.
- a PDF Document (with images).
- See: Discourse-level Analysis, Document File Format, Text File.
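The Context items on Textual Units and Text Measures can be illustrated with a minimal sketch. It assumes that paragraphs are separated by blank lines and that sentences end in `.`, `!`, or `?`; both are simplifications, and the function names (`split_paragraphs`, `split_sentences`, `mean_sentence_length`) are hypothetical. Mean sentence length in words stands in for a Text Measure here only because it is easy to compute.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Divide a text into paragraph units (assumption: blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph: str) -> List[str]:
    """Naively divide a paragraph into sentence units.

    Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    Abbreviations such as 'e.g.' will be split incorrectly.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def mean_sentence_length(text: str) -> float:
    """A toy text measure: average number of whitespace-separated words per sentence."""
    sentences = [s for p in split_paragraphs(text) for s in split_sentences(p)]
    if not sentences:
        return 0.0
    return sum(len(s.split()) for s in sentences) / len(sentences)


# Hypothetical sample text with two paragraphs and three sentences.
sample = "Text has units. Units can be scored!\n\nA second paragraph here."
paragraphs = split_paragraphs(sample)
score = mean_sentence_length(sample)
```

A production system would normally use a trained sentence segmenter rather than a regular expression, and the choice of measure depends on the task.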
References
2006
- (Hirst, 2006) ⇒ Graeme Hirst. (2006). “Views of text-meaning in computational linguistics: Past, present, and future.” In: Computing, Philosophy, and Cognitive Science; Edited by G. Dodig-Crnkovic and S. Stuart.
- QUOTE: In this paper, I’ll use the word text to denote any complete utterance, short or long. In a computational context, a text could be a non-interactive document, such as a news article, a legal statute, or a memorandum, that a writer or author has produced for other people and which is to undergo some kind of processing by a computer. Or a text could be a natural-language utterance by a user in a spoken or typewritten interactive dialogue with another person or a computer: a turn or set of turns in a conversation. The term text-meaning, then, as opposed to mere word-meaning or sentence-meaning, denotes the complete in-context meaning or message of such texts at all levels of interpretation including subtext.
2000
- (Nigam et al., 2000) ⇒ Kamal Nigam, Andrew McCallum, Sebastian Thrun, and Tom M. Mitchell. (2000). “Text Classification from Labeled and Unlabeled Documents Using EM.” In: Machine Learning, 39(2/3). doi:10.1023/A:1007692713085
- QUOTE: First let us introduce some notation to describe text. A document, $d_i$ is considered to be an ordered list of word events $\langle w_{d_i,1}, w_{d_i,2}, \ldots \rangle$. We write $w_{d_i,k}$ for the word $w_t$ in position $k$ of document $d_i$ where $w_t$ is a word in the vocabulary $V = \langle w_1, w_2, \ldots, w_{\vert V \vert} \rangle$
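The notation in this quote maps directly onto a simple data structure. The sketch below is one possible reading, not the authors' implementation: a document is stored as an ordered list of vocabulary indices, so that the entry at position k is the index t of the word occurring there. The helper names (`build_vocabulary`, `encode`), the toy corpus, and the lowercase whitespace tokenization are all assumptions made for illustration.

```python
from typing import Dict, List


def build_vocabulary(documents: List[List[str]]) -> Dict[str, int]:
    """Map each distinct word to an index t, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for doc in documents:
        for word in doc:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(doc: List[str], vocab: Dict[str, int]) -> List[int]:
    """Represent a document as the ordered list of vocabulary indices of its word events."""
    return [vocab[word] for word in doc]


# Hypothetical toy corpus; splitting on whitespace is an assumed tokenization.
raw_texts = ["the cat sat", "the dog sat down"]
docs = [text.lower().split() for text in raw_texts]

vocab = build_vocabulary(docs)
encoded = [encode(d, vocab) for d in docs]
# encoded[i][k] is the vocabulary index of the word at (0-based) position k of document i.
```

Keeping the list ordered preserves word position, which the quoted definition requires; a bag-of-words count vector would discard it.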
1997
- (Marcu, 1997) ⇒ Daniel Marcu. (1997). “Rhetorical Parsing, Summarization, and Generation of Natural Language Texts.” PhD Thesis. University of Toronto.