Inverted Index Data Structure
Jump to navigation
Jump to search
An Inverted Index Data Structure is an index data structure where each Index Data Record contains a set of labels and a pointer to a data record in a tabular data structure.
- AKA: Inverted File, Postings File, Inverted Index.
- Context:
- It can be:
- a Record-level Inverted Index … contains a list of references to documents for each word.
- a Word-level Inverted Index … additionally contains the positions of each word within a document
- It can be:
- Example(s):
"a": {2}
"banana": {2}
"is": {0, 1, 2}
"it": {0, 1, 2}
"what": {0, 1}"a": {(2, 2)}
"banana": {(2, 3)}
"is": {(0, 1), (0, 4), (1, 1), (2, 1)}
"it": {(0, 0), (0, 3), (1, 2), (2, 0)}
"what": {(0, 2), (1, 0)}
- See: Information Retrieval Algorithm.
References
2011
- http://en.wikipedia.org/wiki/Inverted_index
- QUOTE:In computer science, an inverted index (also referred to as postings file or inverted file) is an index data structure storing a mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents. The purpose of an inverted index is to allow fast full text searches, at a cost of increased processing when a document is added to the database. The inverted file may be the database file itself, rather than its index. It is the most popular data structure used in document retrieval systems,[1] used on a large scale for example in search engines. Several significant general-purpose mainframe-based database management systems have used inverted list architectures, including ADABAS, DATACOM/DB, and Model 204.
There are two main variants of inverted indexes: A record level inverted index (or inverted file index or just inverted file) contains a list of references to documents for each word. A word level inverted index (or full inverted index or inverted list) additionally contains the positions of each word within a document.[2] The latter form offers more functionality (like phrase searches), but needs more time and space to be created.
- QUOTE:In computer science, an inverted index (also referred to as postings file or inverted file) is an index data structure storing a mapping from content, such as words or numbers, to its locations in a database file, or in a document or a set of documents. The purpose of an inverted index is to allow fast full text searches, at a cost of increased processing when a document is added to the database. The inverted file may be the database file itself, rather than its index. It is the most popular data structure used in document retrieval systems,[1] used on a large scale for example in search engines. Several significant general-purpose mainframe-based database management systems have used inverted list architectures, including ADABAS, DATACOM/DB, and Model 204.
2007
- (Banko et al., 2007) ⇒ Michele Banko, Michael J. Cafarella, Stephen Soderland, Matt Broadhead, and Oren Etzioni. (2007). “Open Information Extraction from the Web.” In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2007).
- QUOTE:TEXTRUNNER is capable of responding to queries over millions of tuples at interactive speeds due to an inverted index distributed over a pool of machines. Each relation found during tuple extraction is assigned to a single machine in the pool. Every machine then computes an inverted index over the text of the locally-stored tuples, ensuring that each machine is guaranteed to store all of the tuples containing a reference to any relation assigned to that machine. (The inverted index itself is built using the Lucene open source search engine.)