String Literal
A String Literal is a computer programming literal that represents a string item.
- Context:
- It can range from being a Single-Line String Literal to being a Multiline String Literal.
- It is usually enclosed between String Delimiters.
- ...
- Example(s):
- a C String Literal such as:
char str[] = "Hello World!";
char str[] = {'H','e','l','l','o',' ','W','o','r','l','d','!','\0'};  /* equivalent initializer built from character literals, with an explicit NUL terminator */
- ...
- a Python String Literal such as:
Input: an assignment to string1 →Output: what print(string1) displays
string1 = 'Hello World!'
→Hello World!
string1 = '"Hello World!"'
→"Hello World!"
string1 = 'H' + 'e' + 'l' + 'l' + 'o' + ' ' + 'W' + 'o' + 'r' + 'l' + 'd' + '!'
→Hello World!
string1 = u"\u20AC"
→€
- ...
- a Docstring.
- ...
- Counter-Example(s):
- a Character Literal,
- a Numeric Literal,
- an XML Literal.
- See: Programming Language, String Function, Source Code, String Processing Algorithm, Bracketed Sequence Delimiters, Escape Sequences, Sequence Delimiter Collision, Sequence Delimiter, Pattern Recognition Task, Sequence-to-Sequence Network.
References
2020a
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/String_literal Retrieved:2020-2-23.
- A string literal or anonymous string is a type of literal in programming for the representation of a string value within the source code of a computer program. Most often in modern languages this is a quoted sequence of characters (formally “bracketed delimiters”), as in x = "foo", where "foo" is a string literal with value foo – the quotes are not part of the value, and one must use a method such as escape sequences to avoid the problem of delimiter collision and allow the delimiters themselves to be embedded in a string. However, there are numerous alternate notations for specifying string literals, particularly more complicated cases, and the exact notation depends on the individual programming language in question. Nevertheless, there are some general guidelines that most modern programming languages follow.
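The points in the quote above (the quotes are not part of the value, and escape sequences resolve delimiter collision) can be illustrated with a minimal Python 3 sketch. The variable names are illustrative only and do not come from the cited source.

```python
# The quotes delimit the literal but are not part of its value.
x = "foo"
print(len(x))  # 3

# Delimiter collision: an unescaped double quote would end this literal early,
# so the embedded quotes are written as the escape sequence \".
escaped = "She said \"hi\""
# Alternative: choose a delimiter that does not occur in the text.
alternate = 'She said "hi"'
print(escaped == alternate)  # True
print(escaped)               # She said "hi"

# Escape sequences also encode characters that are hard to type, such as newline.
two_lines = "line1\nline2"
print(len(two_lines))        # 11: the \n escape is a single character
```

Switching delimiters avoids escaping only when the text contains just one kind of quote; escape sequences work in every case.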
2020b
- (Wikipedia, 2020b) ⇒ https://en.wikipedia.org/wiki/String_(computer_science)#Literal_strings Retrieved:2020-2-23.
- Sometimes, strings need to be embedded inside a text file that is both human-readable and intended for consumption by a machine. This is needed in, for example, source code of programming languages, or in configuration files. In this case, the NUL character doesn't work well as a terminator since it is normally invisible (non-printable) and is difficult to input via a keyboard. Storing the string length would also be inconvenient as manual computation and tracking of the length is tedious and error-prone.
Two common representations are:
- Surrounded by quotation marks (ASCII 0x22 double quote or ASCII 0x27 single quote), used by most programming languages. To be able to include special characters such as the quotation mark itself, newline characters, or non-printable characters, escape sequences are often available, usually prefixed with the backslash character (ASCII 0x5C).
- Terminated by a newline sequence, for example in Windows INI files.
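The two representations listed above can be contrasted in a short Python 3 sketch. It assumes Python's standard configparser module as a stand-in INI reader; other INI readers, including the Windows profile APIs, may treat quotes and whitespace differently, so this shows the general idea rather than the exact Windows behavior.

```python
import configparser

# Quote-delimited representation: the delimiters and the escape prefix
# are the ASCII characters named above.
print(hex(ord('"')), hex(ord("'")), hex(ord("\\")))  # 0x22 0x27 0x5c

# Newline-terminated representation: in INI-style text each value runs
# from after "=" to the end of its line, so quotes are not delimiters.
ini_text = """\
[greeting]
plain = Hello World!
quoted = "Hello World!"
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

plain = parser["greeting"]["plain"]
quoted = parser["greeting"]["quoted"]
print(repr(plain))   # 'Hello World!'
print(repr(quoted))  # '"Hello World!"' (configparser keeps the quotes as ordinary characters)
```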
2020c
- (Wikipedia, 2020c) ⇒ https://en.wikipedia.org/wiki/C_string_handling#Definitions Retrieved:2020-2-23.
- A string is defined as a contiguous sequence of code units terminated by the first zero code unit (often called the NUL code unit).[1] This means a string cannot contain the zero code unit, as the first one seen marks the end of the string. The length of a string is the number of code units before the zero code unit. The memory occupied by a string is always one more code unit than the length, as space is needed to store the zero terminator.
Generally, the term string means a string where the code unit is of type char, which is exactly 8 bits on all modern machines. C90 defines wide strings which use a code unit of type wchar_t, which is 16 or 32 bits on modern machines. This was intended for Unicode but it is increasingly common to use UTF-8 in normal strings for Unicode instead.
Strings are passed to functions by passing a pointer to the first code unit. Since char* and wchar_t* are different types, the functions that process wide strings are different than the ones processing normal strings and have different names.
String literals ("text" in the C source code) are converted to arrays during compilation. The result is an array of code units containing all the characters plus a trailing zero code unit. In C90 L"text" produces a wide string. A string literal can contain the zero code unit (one way is to put \0 into the source), but this will cause the string to end at that point. The rest of the literal will be placed in memory (with another zero code unit added to the end) but it is impossible to know those code units were translated from the string literal, therefore such source code is not a string literal.
- ↑ "The C99 standard draft + TC3" (PDF). §7.1.1p1. Retrieved 7 January 2011.
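The NUL-termination rules quoted above can be sketched from Python through its standard ctypes module, which mimics C char and wchar_t buffers. This is an approximation: create_string_buffer copies bytes at run time rather than compiling a literal, but the resulting layout (all characters plus one trailing zero code unit) is the one the quote describes.

```python
import ctypes

# A C-style string: create_string_buffer copies the bytes and appends
# a trailing zero (NUL) code unit, as a C compiler does for "Hello".
buf = ctypes.create_string_buffer(b"Hello")
print(ctypes.sizeof(buf))  # 6: five code units plus the NUL terminator
print(buf.value)           # b'Hello': .value reads up to the first NUL

# An embedded \0 ends the string early, although the remaining bytes
# (and another terminating NUL) are still stored in memory.
buf2 = ctypes.create_string_buffer(b"ab\0cd")
print(buf2.value)          # b'ab'
print(buf2.raw)            # b'ab\x00cd\x00'

# Wide strings use wchar_t code units, whose size is platform-dependent
# (typically 2 bytes on Windows and 4 bytes on most Unix-like systems).
wide = ctypes.create_unicode_buffer("text")
print(ctypes.sizeof(wide) == 5 * ctypes.sizeof(ctypes.c_wchar))  # True
```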
2020d
- (Python Doc., 2020) ⇒ "2.4.1. String literals". In: Python Documentation Content. Retrieved:2020-2-23.
- QUOTE: In plain English: String literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences. A prefix of 'u' or 'U' makes the string a Unicode string. Unicode strings use the Unicode character set as defined by the Unicode Consortium and ISO 10646. Some additional escape sequences, described below, are available in Unicode strings. A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.
In triple-quoted strings, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the string. (A “quote” is the character used to open the string, i.e. either ' or ".)
Unless an 'r' or 'R' prefix is present, escape sequences in strings are interpreted according to rules similar to those used by Standard C.
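The literal forms described in the quote can be tried in a short sketch. It assumes Python 3, whereas the quoted documentation describes Python 2; in Python 3 the u prefix is accepted but redundant, b produces a bytes object, and the combined ur prefix is no longer allowed.

```python
# Assumes Python 3; the documentation quoted above describes Python 2.

# Single and double quotes produce the same value.
single = 'Hello World!'
double = "Hello World!"
print(single == double)  # True

# Triple-quoted strings retain unescaped newlines and quotes.
triple = """She said "hi"
on two lines"""
print(triple.count("\n"))  # 1

# Escape sequences are interpreted in ordinary literals (C-like rules) ...
escaped = "a\tb"
print(len(escaped))  # 3: 'a', a TAB character, 'b'
# ... but a raw string keeps the backslash and the letter after it.
raw = r"a\tb"
print(len(raw))      # 4: 'a', a backslash, 't', 'b'

# In Python 3 a b prefix yields bytes; str literals are already Unicode,
# so the u prefix is accepted but changes nothing.
data = b"abc"
print(type(data).__name__)     # bytes
print(u"\u20AC" == "\u20AC")   # True
```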