Unformatted Text
An Unformatted Text is an Electronic Text that contains only textual characters, as entered through a text entry interface, without any formatting or typesetting information.
- AKA: Raw Textual Data, Plain Text.
- Context:
- It can include meta-data, markup language, source code and shell scripts.
- Example(s):
- "Plain text is the underlying content stream to which formatting can be applied."
- "Plain text is public, standardized, and universally readable."
- "A plain text document displays all computer characters before being encoded by a software engine to graphical symbols or typesetting features."
- …
- Counter-Example(s):
- Formatted Text such as:
- Graphics,
- Audio Data.
- See: Text Processing System, Typesetting System, Text Error Correction System, Universal Coded Character Set, SVG, Text Item, Computer String, Computer Character, Formatted Text, OHCO, Binary Files, Character Encoding System, ASCII, Unicode, UTF-8, UTF-16.
References
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Plain_text Retrieved:2020-2-16.
- In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.). It may also include a limited number of characters that control simple arrangement of text, such as spaces, line breaks, or tabulation characters (although tab characters can "mean" many different things, so are hardly "plain"). Plain text is different from formatted text, where style information is included; from structured text, where structural parts of the document such as paragraphs, sections, and the like are identified; and from binary files in which some portions must be interpreted as binary objects (encoded integers, real numbers, images, etc.).
The term is sometimes used quite loosely, to mean files that contain only "readable" content (or just files with nothing that the speaker doesn't prefer). For example, that could exclude any indication of fonts or layout (such as markup, markdown, or even tabs); characters such as curly quotes, non-breaking spaces, soft hyphens, em dashes, and/or ligatures; or other things.
In principle, plain text can be in any encoding, but occasionally the term is taken to imply ASCII. As Unicode-based encodings such as UTF-8 and UTF-16 become more common, that usage may be shrinking.
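The relationship between ASCII and the Unicode-based encodings can be illustrated with a minimal sketch. The sample strings are assumptions chosen for illustration; the codec names are those of Python's standard library.

```python
# Illustrative sample strings (assumptions, not taken from the source).
ascii_text = "plain text"
non_ascii_text = "naïve café"

# For pure ASCII content, the ASCII and UTF-8 byte sequences are identical.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Characters outside ASCII cannot be written in ASCII at all.
try:
    non_ascii_text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# The same text survives a round trip through UTF-8 and UTF-16.
round_trips = all(
    non_ascii_text.encode(enc).decode(enc) == non_ascii_text
    for enc in ("utf-8", "utf-16")
)

print(same_bytes, ascii_ok, round_trips)
```

This suggests why "plain text" and "ASCII" are easy to conflate: for text limited to the ASCII repertoire, an ASCII file and a UTF-8 file are byte-for-byte the same.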
Plain text is also sometimes used only to exclude "binary" files: those in which at least some parts of the file cannot be correctly interpreted via the character encoding in effect. For example, a file or string consisting of "hello" (in whatever encoding), followed by 4 bytes that express a binary integer that is not just a character, is a binary file, not plain text by even the loosest common usages. Put another way, translating a plain text file to a character encoding that uses entirely different numbers to represent characters does not change the meaning (so long as you know what encoding is in use), but for binary files such a conversion does change the meaning of at least some parts of the file.
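The "hello" example can be sketched as follows. This is only an illustration under stated assumptions: the integer value, the little-endian layout, and the choice of Latin-1 as the mistaken source encoding are all hypothetical.

```python
import struct

text = "hello"

# Plain text: transcoding changes the bytes but not the meaning.
utf8 = text.encode("utf-8")
utf16 = utf8.decode("utf-8").encode("utf-16-le")
text_preserved = utf16.decode("utf-16-le") == text

# Binary data: "hello" followed by a 4-byte little-endian unsigned integer.
value = 0xFFFF_FFFF  # hypothetical value, chosen so its bytes are not valid UTF-8
blob = text.encode("utf-8") + struct.pack("<I", value)

# The blob as a whole cannot be interpreted under the encoding in effect.
try:
    blob.decode("utf-8")
    decodes_as_text = True
except UnicodeDecodeError:
    decodes_as_text = False

# Treating the whole blob as Latin-1 text and transcoding it to UTF-8
# rewrites the integer's bytes, so the number can no longer be recovered.
transcoded = blob.decode("latin-1").encode("utf-8")
recovered = struct.unpack("<I", transcoded[-4:])[0]
integer_preserved = recovered == value

print(text_preserved, decodes_as_text, integer_preserved)
```

The text portion keeps its meaning across encodings, while the embedded integer does not, which is the distinction the passage draws between plain text and binary files.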
Files that contain markup or other meta-data are generally considered plain-text, so long as the markup is also in directly human-readable form (as in HTML, XML, and so on; as Coombs, Renear, and DeRose argue,[1] punctuation is itself markup; and no one considers punctuation to disqualify a file from being plain text).
The use of plain text rather than binary files enables files to survive much better "in the wild", in part by making them largely immune to computer architecture incompatibilities. For example, all the problems of endianness can be avoided (with encodings such as UCS-2 rather than UTF-8, endianness matters, but uniformly for every character, rather than for potentially-unknown subsets of it).
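The endianness remark can be made concrete with a small sketch. Python's standard library has no UCS-2 codec, so UTF-16 is used here as a stand-in; this is an assumption that holds only because the sample text stays within the Basic Multilingual Plane, where the two coincide.

```python
text = "Hi"  # illustrative sample (assumption)

utf8 = text.encode("utf-8")          # one byte per character here, no byte order
utf16_le = text.encode("utf-16-le")  # low byte first in every 2-byte unit
utf16_be = text.encode("utf-16-be")  # high byte first in every 2-byte unit

# Byte order matters for the 16-bit encoding, but uniformly: swapping the
# two bytes of every unit converts one form into the other.
swapped = b"".join(
    utf16_le[i:i + 2][::-1] for i in range(0, len(utf16_le), 2)
)

print(utf8, utf16_le, utf16_be, swapped == utf16_be)
```

Because the swap applies to every code unit in the same way, a reader who knows (or detects) the byte order can always recover the text, unlike a binary file where only some fields are multi-byte values.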
According to The Unicode Standard,
- "Plain text is a pure sequence of character codes; plain Unicode-encoded text is therefore a sequence of Unicode character codes."
- Styled text, also known as rich text, is any text representation containing plain text supplemented by information such as a language identifier, font size, color, or hypertext links.[2]
- Thus, representations such as SGML, RTF, HTML, XML, wiki markup, and TeX, as well as nearly all programming language source code files, are considered plain text. The particular content is irrelevant to whether a file is plain text. For example, an SVG file can express drawings or even bitmapped graphics, but is still plain text.
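As a rough illustration of the points above, the sketch below treats a tiny SVG document as a plain sequence of Unicode character codes. The SVG content is a made-up sample, not taken from the source.

```python
# A made-up, minimal SVG document (assumption for illustration only).
svg = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>'

# Plain text as "a pure sequence of character codes".
code_points = [ord(ch) for ch in svg]

# The character codes alone are enough to rebuild the document exactly.
rebuilt = "".join(chr(cp) for cp in code_points)

# The document also survives a round trip through a text encoding.
round_trip = svg.encode("utf-8").decode("utf-8")

print(code_points[:4], rebuilt == svg, round_trip == svg)
```

Although the file describes a drawing, nothing in it needs to be interpreted as anything other than characters, which is why it still counts as plain text.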
According to The Unicode Standard, plain text has two main properties in regard to rich text:
- "Plain text is the underlying content stream to which formatting can be applied."
- "Plain text is public, standardized, and universally readable."
- ↑ Coombs, James H.; Renear, Allen H.; DeRose, Steven J. (November 1987). "Markup systems and the future of scholarly text processing". Communications of the ACM. 30 (11): 933–947. CiteSeerX 10.1.1.515.5618. doi:10.1145/32206.32209.
- ↑ The Unicode Standard, version 6.1, General Structure, page 14