Character Offset Identifier
A Character Offset Identifier is an Identifier of a Location in a Text Document that is based on counting Characters.
- Context:
- It can be a part of an Entity Mention Identifier.
- …
- Example(s):
- It is used in the ACE-2004 evaluation.
- See: Token Offset Identifier.
- ACE-2004
- Character Counting: The annotation files use a character-based, not byte-based, offset counting method: the first Unicode character has the offset of zero; the second character has the offset of one, and so on. The APF and ALF file formats do not count SGML tags in their offsets, while the AG format does count them. The source SGML files use the UNIX-style end-of-line character (LF), and each end-of-line character is counted as one character.
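The counting rules quoted above can be illustrated with a minimal Python sketch. It is not the ACE tooling: the helper names (`tagless_offset_map`, `mention_offsets`), the naive tag pattern, and the sample document are all assumptions made for illustration. It contrasts a tag-skipping character offset (as described for APF and ALF), a tag-counting character offset (as described for AG), and a byte offset under UTF-8.

```python
import re

# Naive SGML tag matcher (an assumption; real SGML may need a proper parser).
TAG_RE = re.compile(r"<[^>]*>")


def tagless_offset_map(sgml):
    """Return, for each character outside a tag, its offset in the raw SGML string."""
    raw_offsets = []
    pos = 0
    for match in TAG_RE.finditer(sgml):
        raw_offsets.extend(range(pos, match.start()))  # text before this tag
        pos = match.end()                              # skip the tag itself
    raw_offsets.extend(range(pos, len(sgml)))          # text after the last tag
    return raw_offsets


def mention_offsets(sgml, mention):
    """Locate the first occurrence of `mention` and report three zero-based start offsets."""
    raw_offsets = tagless_offset_map(sgml)
    text = "".join(sgml[i] for i in raw_offsets)       # document text with tags removed
    start = text.index(mention)                        # counts Unicode characters
    return {
        "char_no_tags": start,                         # tags not counted
        "char_with_tags": raw_offsets[start],          # tags counted
        "byte_no_tags": len(text[:start].encode("utf-8")),  # bytes, for contrast
    }


# Hypothetical sample document; each LF counts as one character.
sample = "<DOC>\n<TEXT>\nZoë met Ann.\n</TEXT>\n</DOC>\n"
print(mention_offsets(sample, "Ann"))
# {'char_no_tags': 10, 'char_with_tags': 21, 'byte_no_tags': 11}
```

In this sample the character offset and the byte offset differ by one because "ë" occupies two bytes in UTF-8 but counts as a single character, which is why a character-based identifier is only unambiguous once the counting convention (characters or bytes, tags counted or not) is stated.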