Digital Text Item
A Digital Text Item is a text item that is a data item composed of digital linguistic characters.
- AKA: Digital Type Written Item, Unstructured Textual Artifact.
- Context:
- It can (typically) be encoded by a Character Encoding System.
- It can (typically) have a Digital Text Item Size.
- It can range from being a Human-Readable Digital Text Item to being a Machine-Readable Digital Text Item.
- It can be inputted to an Electronic Writing Device using a Text Entry Interface.
- It can range from being a Digital Text Document, to being a Digital Text Paragraph, to being a Digital Text Passage, to being a Digital Text Sentence, to being a Digital Text Phrase, to being a Digital Text Word, depending on its text item size.
- It can range from being a Short Digital Text Item to being a Long Digital Text Item.
- It can range from being a Grammatical Digital Text to being an Un-Grammatical Digital Text to being a Non-Grammatical Digital Text, depending on its grammaticality.
- It can range from being Unformatted Digital Text to being Formatted Digital Text.
- It can range from being a Single-Language Digital Text Item (e.g. an English digital text item) to being a Multi-Language Digital Text Item (e.g. a Japanese-English digital text item), depending on the natural languages used.
- It can be represented by a Digital Text Item Icon (text item icon).
- It can range from being a Raw Digital Text Item to being an Annotated Digital Text Item (such as a labeled text item or tokenized text item).
- It can be represented by a Text Item Record.
- It can have a Digital Text Item Location.
- It can be from a Text Dataset (such as a text corpus).
- It can be processed by a Text Processing System, such as a text translation system.
- …
- Example(s):
- a Plain Text,
- a Formatted Text,
- a Markup Language Text such as Hypertext, WikiText,
- a Natural Language Text,
- a Textual Email.
- a Twitter Posting.
- a Product Description Title.
- a Text File, such as a CPROD1 text item.
- …
- PDF Document,
- …
- Counter-Example(s):
- a Type Written Physical Document, such as an original Gutenberg Bible.
- a Hand-Written Item.
- a Programming Language Code such as Shell Script, Source Code,
- a Text Item Vector.
- a Spoken Expression.
- an Image Data Item.
- a Plot-Graph,
- Digital Audio Data,
- Vector Graphics,
- Visual Digital Data,
- Digital Video Data.
- See: Typesetting System, Written Language, Linguistic Artifact, History of Writing, Literary Theory, Literary Criticism, Type Written Physical Document, Electronic Document, Text Encoding Initiative, Natural Language Processing System, Natural Written Language, Programming Language, Markup Language, Formal Language, Text Error Correction System, Universal Coded Character Set, SVG, Computer String, Computer Character, Formatted Text, OHCO, Binary Files, ASCII, Unicode, UTF-8, UTF-16.
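The encoding and size properties listed in the Context above can be illustrated with a minimal Python sketch. The function name `describe_text_item`, the choice of encodings, and the sample string are hypothetical and chosen only for illustration; the sketch assumes the text item is held in memory as a Python `str`.

```python
def describe_text_item(text, encodings=("utf-8", "utf-16-le")):
    """Report the character count and the encoded byte size per encoding.

    Hypothetical helper: the same text item has one length in characters
    but a different size in bytes under each character encoding.
    """
    report = {"characters": len(text)}
    for name in encodings:
        report[name] = len(text.encode(name))
    return report


# Sample item mixing Latin and Japanese characters (written with escapes
# so the code points are unambiguous): "naive" with a diaeresis, a space,
# and three kanji.
sample = "na\u00efve \u65e5\u672c\u8a9e"
info = describe_text_item(sample)
print(info)
```

The item has 9 characters, but occupies 16 bytes in UTF-8 and 18 bytes in UTF-16 (little-endian, without a byte order mark), which is why a text item size is only meaningful together with its unit and encoding.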
References
2015
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/text_(literary_theory) Retrieved:2015-4-16.
- In literary theory, a text is any object that can be "read," whether this object is a work of literature, a street sign, an arrangement of buildings on a city block, or styles of clothing. It is a coherent set of signs that transmits some kind of informative message. [1] This set of symbols is considered in terms of the informative message's content, rather than in terms of its physical form or the medium in which it is represented.
Within the field of literary criticism, "text" also refers to the original information content of a particular piece of writing; that is, the "text" of a work is that primal symbolic arrangement of letters as originally composed, apart from later alterations, deterioration, commentary, translations, paratext, etc. Therefore, when literary criticism is concerned with the determination of a "text," it is concerned with the distinguishing of the original information content from whatever has been added to or subtracted from that content as it appears in a given textual document (that is, a physical representation of text).
Since the history of writing predates the concept of the "text", most texts were not written with this concept in mind. Most written works fall within a narrow range of the types described by text theory. The concept of "text" becomes relevant if and when a "coherent written message is completed and needs to be referred to independently of the circumstances in which it was created."
- ↑ Yuri Lotman - The Structure of the Artistic Text
2009a
- (WordNet, 2009) ⇒ http://wordnetweb.princeton.edu/perl/webwn?s=text
- S: (n) text, textual matter (the words of something written) "there were more than a thousand words of text"; "they handed out the printed text of the mayor's speech"; "he wants to reconstruct the original text"
- …
2009b
- (Wiktionary, 2009) ⇒ http://en.wiktionary.org/wiki/text#Noun
- …
- 4. (computing) Data which can be interpreted as human-readable text (often contrasted with binary data).
2006
- (Hirst, 2006) ⇒ Graeme Hirst. (2006). “Views of text-meaning in computational linguistics: Past, present, and future.” In: Computing, Philosophy, and Cognitive Science; Edited by G. Dodig-Crnkovic and S. Stuart.
- In this paper, I’ll use the word text to denote any complete utterance, short or long. In a computational context, a text could be a non-interactive document, such as a news article, a legal statute, or a memorandum, that a writer or author has produced for other people and which is to undergo some kind of processing by a computer. Or a text could be a natural-language utterance by a user in a spoken or typewritten interactive dialogue with another person or a computer: a turn or set of turns in a conversation. The term text-meaning, then, as opposed to mere word-meaning or sentence-meaning, denotes the complete in-context meaning or message of such texts at all levels of interpretation including subtext.
1996a
- (Wall et al., 1996) ⇒ Larry Wall, Tom Christiansen, and Randal L. Schwartz. (1996). “Programming Perl, 2nd edition.” O'Reilly. ISBN:1565921496
- text: Normally, a string or file containing primarily printable characters. The word has been usurped in some UNIX circles to mean the portion of your process that contains machine code to be executed.
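The "primarily printable characters" criterion can be approximated with a simple heuristic. The following is a minimal sketch, not a method taken from the cited book: the function name `looks_like_text`, the UTF-8 assumption, and the 0.95 threshold are all illustrative assumptions.

```python
def looks_like_text(data, threshold=0.95):
    """Heuristically decide whether a byte string is text.

    Assumptions (not from the source): the data is expected to be UTF-8,
    empty input counts as text, and at least `threshold` of the decoded
    characters must be printable or common whitespace.
    """
    if not data:
        return True
    try:
        decoded = data.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes that are not valid UTF-8 are treated as binary data.
        return False
    printable = sum(1 for ch in decoded if ch.isprintable() or ch in "\n\r\t")
    return printable / len(decoded) >= threshold


print(looks_like_text(b"A plain text line.\n"))
print(looks_like_text(b"\x00\x01\x02\x03abc"))
```

A heuristic like this can misclassify text stored in other encodings (for example UTF-16), so the encoding assumption should be adjusted to the data at hand.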
1996b
- (Sproat et al., 1996) ⇒ Richard Sproat, William A. Gale, Chilin Shih, and Nancy Chang. (1996). “A Stochastic Finite-state Word-Segmentation Algorithm for Chinese.” In: Computational Linguistics, 22(3).
- Any NLP application that presumes as input unrestricted text requires an initial phase of text analysis; such applications involve problems as diverse as machine translation, information retrieval, and text-to-speech synthesis (TTS). An initial step of any text analysis task is the tokenization of the input into words.
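The tokenization step mentioned in the quote can be sketched for whitespace-delimited languages as follows. This is a minimal regular-expression tokenizer, not the stochastic finite-state algorithm of the cited paper; the names `TOKEN_PATTERN` and `tokenize` are hypothetical.

```python
import re
from typing import List

# A token is either a run of word characters or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split a text item into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("An initial step is tokenization."))
# A text written without spaces between words comes back as one token.
print(tokenize("\u4e2d\u6587\u6587\u672c"))
```

The second call shows why this simple approach is insufficient for languages such as Chinese, where words are not separated by spaces: the whole character run is returned as a single token, which is the word-segmentation problem the cited paper addresses.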