Character Offset Identifier

From GM-RKB

A Character Offset Identifier is an Identifier of a Location in a Text Document that is expressed as a count of Characters (rather than bytes) from the start of the document.


  • ACE-2004
    • Character Counting: The annotation files use a character-based, not byte-based, offset counting method: the first Unicode character has the offset of zero; the second character has the offset of one, and so on. The APF and ALF file formats do not count SGML tags in their offsets, while the AG format does count them. The source SGML files use the UNIX-style end-of-line character (LF), and each end-of-line character is counted as one character.
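The counting rules quoted above can be illustrated with a minimal Python sketch. It is not part of the ACE-2004 tooling: the function names, the regular expression used to recognize SGML tags, and the sample string are all assumptions made for illustration. The sketch shows (a) that a zero-based character offset differs from a byte offset once a non-ASCII character appears, and (b) how an offset that skips SGML tags (as the quote says APF and ALF do) relates to an offset that counts them (as the quote says AG does).

```python
import re
from typing import List

# Assumption: a tag is anything between '<' and the next '>'.
# Real SGML can be more complex; this is only a simplification.
TAG_RE = re.compile(r"<[^>]*>")


def char_to_byte_offset(text: str, char_offset: int, encoding: str = "utf-8") -> int:
    """Convert a zero-based character offset into a byte offset."""
    return len(text[:char_offset].encode(encoding))


def strip_tags(sgml: str) -> str:
    """Return the text with tags removed; LF characters are kept and counted."""
    return TAG_RE.sub("", sgml)


def tagless_to_raw_offsets(sgml: str) -> List[int]:
    """For each character of the tag-stripped text, give its offset in the raw SGML."""
    mapping: List[int] = []
    pos = 0
    for match in TAG_RE.finditer(sgml):
        mapping.extend(range(pos, match.start()))  # characters before this tag
        pos = match.end()                          # skip over the tag itself
    mapping.extend(range(pos, len(sgml)))          # characters after the last tag
    return mapping


# Hypothetical sample document with LF line endings.
sample = "<DOC>\nCafé au lait\n</DOC>\n"

tagless_offset = strip_tags(sample).find("au")          # offset ignoring tags
raw_offset = tagless_to_raw_offsets(sample)[tagless_offset]  # offset counting tags
byte_offset = char_to_byte_offset(sample, raw_offset)   # UTF-8 byte offset
```

In this sample the word "au" sits at character offset 6 when tags are skipped, at character offset 11 when tags are counted, and at byte offset 12 in UTF-8, because "é" occupies two bytes but counts as one character.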