SMART Information Retrieval System
A SMART Information Retrieval System is an Information Retrieval System that includes Vector Space Modelling, Relevance Feedback, and Rocchio Classification.
- AKA: SMART System, System for the Manipulation and Retrieval of Texts.
- Context:
- It was developed by Gerard Salton at Cornell University in the 1960s.
- Example(s):
- Counter-Example(s):
- See: Vector Representation, Similarity Computation, Gerard M. Salton, Word Vector Model, Information Extraction System, Cornell TREC Experiment.
References
2019
- (Wikipedia, 2019) ⇒ https://en.wikipedia.org/wiki/SMART_Information_Retrieval_System Retrieved:2019-12-21.
- The SMART (System for the Mechanical Analysis and Retrieval of Text) Information Retrieval System is an information retrieval system developed at Cornell University in the 1960s. Many important concepts in information retrieval were developed as part of research on the SMART system, including the vector space model, relevance feedback, and Rocchio classification.
Gerard Salton led the group that developed SMART. Other contributors included Mike Lesk.
The SMART system also provides a set of corpora, queries and reference rankings, taken from different subjects, notably:
- ADI: publications from information science reviews
- CACM: computer science
- Cranfield collection: publications from aeronautic reviews
- CISI: library science
- Medlars collection: publications from medical reviews
- Time magazine collection: archives of the generalist review Time in 1963
- To the legacy of the SMART system belongs the so-called SMART triple notation, a mnemonic scheme for denoting tf-idf weighting variants in the vector space model. The mnemonic for representing a combination of weights takes the form ddd.qqq, where the first three letters represent the term weighting of the collection document vector and the second three letters represent the term weighting of the query document vector. For example, ltc.lnn represents the ltc weighting applied to a collection document and the lnn weighting applied to a query document. The following tables establish the SMART notation. The gray letters in the first, fifth, and ninth columns are the scheme used by Salton and Buckley in their 1988 paper. [1] The bold letters in the second, sixth, and tenth columns are the scheme used in experiments reported thereafter.
- [1] Salton, G., & Buckley, C. (1988). Term-Weighting Approaches in Automatic Text Retrieval. Inf. Process. Manage., 24, 513-523.
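The SMART triple notation described above can be illustrated with a minimal Python sketch. It covers only a small subset of letters, and their meanings are an assumption based on the commonly cited convention (term frequency: n = raw count, l = 1 + log tf; document frequency: n = none, t = log(N/df); normalization: n = none, c = cosine), since the notation tables are not reproduced here. The function names smart_weights, parse_triple, and score are hypothetical and do not come from the SMART code base.

```python
import math


def smart_weights(tf, df, n_docs, scheme):
    """Weight one text under a three-letter scheme such as 'ltc' or 'lnn'.

    tf: dict mapping term -> raw term frequency in this text
    df: dict mapping term -> number of collection documents containing the term
    n_docs: number of documents in the collection
    """
    if len(scheme) != 3:
        raise ValueError("a SMART scheme has exactly three letters")
    tf_letter, df_letter, norm_letter = scheme
    weights = {}
    for term, freq in tf.items():
        if freq <= 0:
            continue
        # Term-frequency component (assumed meanings: n = raw, l = logarithmic).
        if tf_letter == "n":
            w = float(freq)
        elif tf_letter == "l":
            w = 1.0 + math.log(freq)
        else:
            raise ValueError(f"unsupported tf letter: {tf_letter}")
        # Document-frequency component (assumed meanings: n = none, t = idf).
        if df_letter == "t":
            doc_freq = df.get(term, 0)
            if doc_freq == 0:
                continue  # term unseen in the collection: no idf available
            w *= math.log(n_docs / doc_freq)
        elif df_letter != "n":
            raise ValueError(f"unsupported df letter: {df_letter}")
        weights[term] = w
    # Normalization component (assumed meanings: n = none, c = cosine).
    if norm_letter == "c":
        length = math.sqrt(sum(w * w for w in weights.values()))
        if length > 0:
            weights = {t: w / length for t, w in weights.items()}
    elif norm_letter != "n":
        raise ValueError(f"unsupported normalization letter: {norm_letter}")
    return weights


def parse_triple(notation):
    """Split 'ddd.qqq' into (document scheme, query scheme)."""
    doc_scheme, query_scheme = notation.split(".")
    if len(doc_scheme) != 3 or len(query_scheme) != 3:
        raise ValueError("expected the form ddd.qqq")
    return doc_scheme, query_scheme


def score(doc_tf, query_tf, df, n_docs, notation="ltc.lnn"):
    """Inner product of the weighted document and query vectors."""
    doc_scheme, query_scheme = parse_triple(notation)
    d = smart_weights(doc_tf, df, n_docs, doc_scheme)
    q = smart_weights(query_tf, df, n_docs, query_scheme)
    return sum(w * d.get(term, 0.0) for term, w in q.items())
```

Under these assumptions, score(doc_tf, query_tf, df, n_docs, "ltc.lnn") weights the document with ltc and the query with lnn before taking their inner product.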
1995
- (Buckley et al., 1995) ⇒ Chris Buckley, Amit Singhal, and Mandar Mitra (1995, November). "New retrieval approaches using SMART: TREC 4". In: Proceedings of the Fourth Text REtrieval Conference (TREC-4) (pp. 25-48).
- The Cornell TREC experiments use the SMART Information Retrieval System, Version 12, and are run on a dedicated Sun Sparc 20/51 with 160 Megabytes of memory and 27 Gigabytes of local disk.
SMART Version 12 is the latest in a long line of experimental information retrieval systems, dating back over 30 years, developed under the guidance of G. Salton. The new version is approximately 44,000 lines of C code and documentation.
SMART Version 12 offers a basic framework for investigations of the vector space and related models of information retrieval. Documents are fully automatically indexed, with each document representation being a weighted vector of concepts, the weight indicating the importance of a concept to that particular document (as described above). The document representatives are stored on disk as an inverted file. Natural language queries undergo the same indexing process. The query representative vector is then compared with the indexed document representatives to arrive at a similarity (equation (1)), and the documents are then fully ranked by similarity.
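The indexing and ranking pipeline described in this passage can be sketched in a few lines of Python. This is an illustrative toy, not SMART's C implementation. Whitespace tokenization, tf-idf weights, and cosine similarity are all assumptions, because the paper's weighting details and its equation (1) are not reproduced here. The names tokenize, build_index, and search are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokens stand in for indexed "concepts".
    return text.lower().split()


def build_index(docs):
    """Index a dict doc_id -> text; return (inverted_file, doc_norms, idf)."""
    n_docs = len(docs)
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())
    idf = {term: math.log(n_docs / d) for term, d in df.items()}

    inverted = defaultdict(list)  # term -> list of (doc_id, weight) postings
    norms = {}
    for doc_id, term_counts in counts.items():
        weights = {t: f * idf[t] for t, f in term_counts.items()}
        norms[doc_id] = math.sqrt(sum(w * w for w in weights.values()))
        for term, w in weights.items():
            if w > 0:
                inverted[term].append((doc_id, w))
    return dict(inverted), norms, idf


def search(query, inverted, norms, idf):
    """Index the query the same way, then rank documents by cosine similarity."""
    q_counts = Counter(tokenize(query))
    q_weights = {t: f * idf[t] for t, f in q_counts.items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))

    scores = defaultdict(float)
    for term, qw in q_weights.items():
        for doc_id, dw in inverted.get(term, []):
            scores[doc_id] += qw * dw

    ranked = [
        (doc_id, s / (q_norm * norms[doc_id]))
        for doc_id, s in scores.items()
        if s > 0
    ]
    # Highest similarity first; ties broken by document id for determinism.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

A call such as search("vector space", *build_index(docs)) returns (doc_id, similarity) pairs in decreasing order of similarity. Only documents sharing at least one weighted term with the query are scored, which is the main reason for storing the vectors as an inverted file.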
1993
- (Buckley et al., 1993) ⇒ Chris Buckley, Gerard Salton, and James Allan (1993). "Automatic Retrieval With Locality Information Using SMART". In: The First Text REtrieval Conference (TREC-1), NIST Special Publication 500-207.
1983
- (Salton & McGill, 1983) ⇒ Gerard Salton, and Michael J. McGill (1983). "The SMART and SIRE Experimental Retrieval Systems". In: Introduction To Modern Information Retrieval. McGraw-Hill. ISBN: 9780070544840, 0070544840.
- QUOTE: The SMART system distinguishes itself from more conventional retrieval systems in the following important respects: (1) it uses fully automatic indexing methods to assign content identifiers to documents and search requests; (2) it collects related documents into common subject classes, making it possible to start with specific items in a particular subject area and to find related items in neighboring subject fields; (3) it identifies the documents to be retrieved by performing similarity computations between stored items and incoming queries, and by ranking the retrieved items in decreasing order of their similarity with the query; and finally, (4) it includes automatic procedures for producing improved search statements based on information obtained as a result of earlier retrieval operations (...)
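Point (4) of the quote, improving search statements from earlier retrieval results, corresponds to relevance feedback. One well-known formulation is the Rocchio update, sketched below in its common textbook form. The formula and the default coefficients (alpha = 1.0, beta = 0.75, gamma = 0.15) are assumptions taken from that form, not from the quoted passage, and the function name rocchio is hypothetical.

```python
from collections import defaultdict


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reformulated query vector.

    query: dict term -> weight
    relevant, non_relevant: lists of dicts (term -> weight) for judged documents

    new_query = alpha * query
                + beta  * centroid(relevant)
                - gamma * centroid(non_relevant)
    Terms whose weight ends up non-positive are dropped.
    """
    new_query = defaultdict(float)
    for term, w in query.items():
        new_query[term] += alpha * w
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue  # no judged documents on this side: centroid undefined
        for doc in docs:
            for term, w in doc.items():
                new_query[term] += coeff * w / len(docs)
    return {term: w for term, w in new_query.items() if w > 0}
```

The reformulated vector can then be passed back to the same similarity computation used for the original query, so that one round of user judgments shifts the ranking toward the relevant documents.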
1981
- (Salton, 1981) ⇒ Gerard Salton (1981). "The Smart Environment For Retrieval System Evaluation — Advantages And Problem Areas". In: Information Retrieval Experiment, Karen Sparck Jones (Ed.), Butterworths.
- QUOTE: The Smart environment provides a test-bed for implementing and evaluating a large number of different automatic search and retrieval processes. In this chapter, the basic parameters underlying the Smart system design are briefly outlined, and a comparison is made with the characteristics of more conventional retrieval systems. The principal lessons learned from the Smart experiments are described, and some of the methodological problems raised by the system design are outlined. Finally, some comments are included about the disadvantages inherent in working in the laboratory, and the insights that can be gained in such a situation.
1975
- (Salton et al., 1975) ⇒ Gerard M. Salton, A. Wong, and C. Yang. (1975). “A Vector Space Model for Automatic Indexing.” In: Communications of the ACM, 18(11). doi:10.1145/361219.361220.
1965
- (Salton & Lesk, 1965) ⇒ Gerard Salton, and Michael E. Lesk (1965). The SMART automatic document retrieval systems — an illustration. Communications of the ACM, 8(6), 391-398.