Research Glossary
- THIS PAGE IS OUT-OF-DATE. ONE DAY IT WILL BE AUTOMATICALLY GENERATED
A
- Abductive Reasoning: Abductive Reasoning is the type of Reasoning where one adopts the Conclusion that most likely explains the known Facts.
- Active Learning: Active Learning is a type of machine learning where the algorithm can request labels for unlabeled data.
- Accuracy: Accuracy is the measure of a Predictive Model's ability to correctly label a previously unseen Test Case.
- Accuracy Estimation: Accuracy Estimation is the use of a Validation Process to approximate the true value of a Predictive Model's Accuracy based on a Data Sample.
- Adposition: An Adposition is a Word that is used to indicate a semantic relationship between an Entity and a modifying concept.
- Adjective: An Adjective is a Content Word that modifies a Noun or Pronoun.
- Adjective Phrase: An Adjective Phrase is a Phrase with an Adjective as its Head Word.
- Adjuncative Argument: An Adjuncative Argument is a Semantic Argument that modifies the meaning of another Semantic Argument.
- Adverb: An Adverb is a Word that modifies a Verb, an Adjective, or another Adverb.
- Affix: An Affix is a type of Bound Morpheme that can be added to a word to create a new word.
- Algorithm: An Algorithm is a well-specified sequence of steps that accepts an Input and produces an Output.
- Anaphor: An Anaphor is a Word that refers to the same concept as a nearby Word.
- Anaphora Resolution: Anaphora Resolution is the NLP Task of identifying Anaphors and their Referents.
- Antonym: An Antonym of a Word is a Word with opposite meaning.
- ArtEquAKT: ArtEquAKT is a Question Answering system.
- Association: An Association is a function of Conditional Dependence between two or more Objects.
- Attribute: An Attribute is a property of a Concept that has been Extensionally Defined.
- Automatic Speech Recognition:
- AutoSlog: AutoSlog is a system developed in the early 90s that automatically built dictionaries of----
B
- Bootstrapping: Bootstrapping is a technique used to improve the accuracy of an Induction Algorithm by Resampling with replacement.
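The resampling step behind Bootstrapping can be sketched as follows. This is a minimal illustration, not a full procedure: the function name `bootstrap_sample` and the tracking of "out-of-bag" cases are assumptions made for the example, and only the Python standard library is used.

```python
import random

def bootstrap_sample(cases, rng):
    """Draw len(cases) items from `cases` with replacement.

    Returns the resampled list and the "out-of-bag" cases that were
    never drawn (hypothetical helper, for illustration only).
    """
    n = len(cases)
    indices = [rng.randrange(n) for _ in range(n)]
    drawn = set(indices)
    sample = [cases[i] for i in indices]
    out_of_bag = [cases[i] for i in range(n) if i not in drawn]
    return sample, out_of_bag

rng = random.Random(0)  # fixed seed so the sketch is repeatable
cases = list(range(10))
sample, out_of_bag = bootstrap_sample(cases, rng)
print(len(sample))  # same size as the original data
```

Because sampling is with replacement, some cases usually appear more than once in `sample` while others are left out; the left-out cases can serve as a held-out set for that resample.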
C
- C4.5: C4.5 is a Decision Tree Algorithm.
- Causal Factor: A Causal Factor is a Factor that has a Causal Connection with the Feature being investigated.
- Chi Square Distribution: The Chi Square Distribution is the Distribution of the Chi Square Function.
- Center Embedding: Center Embedding is the process of embedding a Phrase in the middle of another Phrase of the same type.
- CiteSeer: CiteSeer is an online Information Retrieval service for scientific publications based on Meta-information such as author name and reference lists.
- Class Noun: A Class Noun is a Noun that refers to a Semantic Class.
- Classification: Classification is the Task of creating a Predictive Model that makes a Categorical valued Prediction (aka a Classifier).
- Classification Algorithm: An Algorithm that supports the task of Classification.
- Classifier: A Classifier is a Predictive Model whose Input is a Test Case and whose Output is a Categorical valued Prediction.
- Clause: A Clause is a sequence of Words that expresses a Proposition.
- Cluster: A Cluster is a set of similar Cases.
- Clustering: Clustering is the Task of developing a Clustering Model that places a Test Case into a Cluster.
- Concept: A Concept is a Representation of a Thing or a set of things that can be thought.
- Concept Class: A Concept Class is a Concept that contains one or more Concepts or Concept Instances that can be unambiguously grouped together by referring to a property that is shared by all of them.
- Concept Mapping: Concept Mapping is a visual technique to aid people in understanding concepts.
- Conclusion: A Conclusion is a Statement about the world that one believes to be true.
- Confidence Level: A Confidence Level is a Statistic of how sure one can be that a Prediction is True.
- Confounding Factor: A Confounding Factor is a Causal Factor that is not present in the Training Data of a Predictive Modeling Problem.
- Confusion Matrix: A Confusion Matrix is a Table that illustrates how well a Classifier predicts.
- Content Word: A Content Word is a Word with Semantic content.
- Contingency Table: A Contingency Table is used to examine the relationship between two random variables.
- Controlled Study: A Controlled Study is a model of Hypothesis Testing in which the performance of a Treatment Group is compared to the performance of a Control Group.
- Coreference Resolution: Coreference Resolution is the NLP Task of linking Nouns that refer to the same Concept.
- Correlation Coefficient: A Correlation Coefficient is a Statistic (between -1 and 1) that measures the degree to which two Continuous random variables are related.
- Corpus: A Corpus is a set of one or more Documents.
- Cost-Benefit Matrix: A Cost-Benefit Matrix is an input to the Classification Problem that allows predictive modelers to describe the costs and the benefits associated with each possible prediction.
- Cross-Document Coreference Resolution: Cross-Document Coreference Resolution is the task of Coreference Resolution across multiple documents.
- Cross-Validation: Cross-Validation is a Resampling technique used to produce stable Estimates of a Predictive Model's Accuracy.
- Crystal: Crystal is one of the earliest Information Extraction systems to automatically learn from a----
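The Confusion Matrix and Accuracy entries can be made concrete with a small sketch. It assumes the matrix is stored as counts of (actual label, predicted label) pairs; the function names and the spam/ham labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))

def accuracy(matrix):
    """Fraction of cases whose predicted label equals the actual label."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total

actual    = ["spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "ham",  "ham", "ham", "spam"]
matrix = confusion_matrix(actual, predicted)
print(matrix[("ham", "ham")])  # 2 ham cases were labelled correctly
print(accuracy(matrix))        # 3 of the 5 cases are correct
```

The off-diagonal counts, such as `matrix[("spam", "ham")]`, show which kinds of errors the Classifier makes, which a single Accuracy figure hides.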
D
- Data: Data is a set of one or more elementary facts.
- Data Cleaning: Data Cleaning is the task of removing errors in Structured Data.
- Deductive Reasoning: Deductive Reasoning is the type of Reasoning where one makes a Conclusion by simply transforming what is already known.
- Definite Clause: A Definite Clause is a Horn Clause with exactly one positive Literal.
- DENDRAL: DENDRAL was one of the first Expert Systems.
- Dictionary: A Dictionary is a Database of Lexemes and information on their syntactic and
- Discretization: Discretization is the process of dividing the range of a numeric variable into a fixed (discrete) number of 'bins'.
- Document: A Document is a unit of human readable Information.
- DUC: DUC is a series of conferences on Text Summarization.
- DTD: A DTD (Document Type Definition) is the BNF-style grammar used in XML to define the legal elements and the relationships between elements.
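As an illustration of the Discretization entry, the sketch below uses equal-width binning, which is only one of several possible binning schemes and is an assumption of this example. The function name `equal_width_bins` and the sample ages are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1.

    The range [min, max] is split into n_bins intervals of equal width.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # all values identical: a single bin
    width = (hi - lo) / n_bins
    # The maximum value would land on the upper edge, so clamp it
    # into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

ages = [18, 22, 35, 41, 58, 60]
bins = equal_width_bins(ages, 3)
print(bins)  # [0, 0, 1, 1, 2, 2]
```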
E
- EMYCIN: EMYCIN is the Expert System Shell built during the later stages of the MYCIN project.
- Expert System: An Expert System is a system that applies Deductive Reasoning to make Inferences about a specific Instance given some Background Knowledge.
- Extraction Rule: An Extraction Rule is a Classifier that maps a Text to one or more Entities.
F
- Fact: A Fact is a Statement whose truth can be verified in the world.
- First-Order Logic: First-Order Logic is a Logic System that allows the representation of Variables and Predicates, and Quantification over Variables.
- Function Word: A Function Word is a Word that has grammatical meaning but no lexical meaning.
G
- Ground Fact: A Ground Fact is a Predicate with instantiated Predicate Parameters.
H
- Head Word: A Head Word is the most important Word in a Phrase both grammatically and lexically.
- Homonym: A Homonym is a Word with more than one Word Sense.
- Homophone: A Homophone is a spoken Word with more than one Word Sense.
- Horn Clause: A Horn Clause is a Clause containing at most one positive Literal.
- Hypernym: A Hypernym is a Class Noun that represents a more general concept than some other given Noun.
- Hyponym: A Hyponym is a Noun that represents a more specific concept than some other given Class Noun.
- Hypothesis: A Hypothesis is a Statement that is proposed to explain some Phenomenon.
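The Horn Clause condition (at most one positive Literal) can be checked mechanically. The sketch below assumes a clause is represented as a list of (predicate name, is positive) pairs read as a disjunction; this representation and the function name are illustrative choices, not a standard.

```python
def is_horn_clause(clause):
    """Return True if the clause has at most one positive literal.

    `clause` is a list of (predicate_name, is_positive) pairs,
    interpreted as a disjunction of literals.
    """
    positives = sum(1 for _, is_positive in clause if is_positive)
    return positives <= 1

# mortal(X) OR NOT man(X), i.e. "man(X) implies mortal(X)"
rule = [("mortal", True), ("man", False)]
# rich(X) OR happy(X): two positive literals
not_horn = [("rich", True), ("happy", True)]

print(is_horn_clause(rule))      # True
print(is_horn_clause(not_horn))  # False
```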
I
- Inductive Reasoning: Inductive Reasoning is the type of Reasoning where one makes a Conclusion based on a set of Facts.
- Inductive Logic Programming: Inductive Logic Programming is the subfield of Machine Learning where the Model inferred is in First Order Logic.
- Information Extraction: Information Extraction is the NLP Task of extracting Structured Data from a Corpus.
- Information Retrieval: Information Retrieval is the NLP Task of identifying Documents that are relevant to a specified Query.
- Instance Noun: An Instance Noun is a Noun that conveys an instance of a Concept Class.
- Intensional Predicate: An Intensional Predicate is a Predicate that is defined by a set of rules (clauses).
J
- JAVELIN: JAVELIN is an open domain Question Answering System developed at CMU.
K
- Knowledge Base: A Knowledge Base is a Database of Concepts and (inter)Relations from some Domain.
- Knowledge Representation: Knowledge Representation is the study of formal methods to specify concepts and the constraints between them.
L
- Lexeme: A Lexeme is a Set of Terminal Words that Denote the same Word Sense. E.g. RUNNER, FAST1, FAST2.
- Lexical Database: A Lexical Database is a Database that contains a set of Lexemes.
- Lexicon: A Lexicon is the set of valid Terminal Words in a Language in some Domain.
M
- Machine Learning Algorithm: A Machine Learning Algorithm makes use of Inductive Logic to solve a Reasoning Task.
- Malapropism: A Malapropism is a Word that was used by mistake for a similar sounding Word.
- Message Understanding Conference: The Message Understanding Conference was a series of conferences that focused on the NLP task of Information Extraction.
- Metonymy: A Metonymy is a Figure of Speech where one substitutes a Word with another Word for a closely associated Concept.
- Model: A Model is a representation of some Thing that can help an Agent to Predict something about the thing's Behavior.
- Monosemy: A Monosemy is a (written?) Word that conveys only one meaning.
- Morpheme: A Morpheme is the smallest unit of Natural Language that conveys meaning and is grammatically significant.
- MUC: See Message Understanding Conference.
- MURAX: MURAX is a Question/Answering system for encyclopedias.
- MYCIN System: The MYCIN System was an Expert System developed in the 1970s to help physicians treat blood infections prior to the arrival of comprehensive blood-test results.
N
- Named Entity Recognition Task: A Named Entity Recognition Task is the NLP Task to Annotate the Named Entities in a Text.
- Natural Language Expression: A Natural Language Expression is a sequence of Words that can be Uttered and understood.
- Natural Language Morphological Theory: A Natural Language Morphological Theory is a Theory about the structure of Words in terms of Morphemes in a Natural Language.
- Natural Language Semantic Theory: A Natural Language Semantic Theory is a Theory of how to represent the Semantic Structure of a Natural Language Expression.
- Natural Language Syntactic Theory: A Natural Language Syntactic Theory is a Theory of how to represent the Syntactic Structure of a Natural Language Expression.
- NER: See Named Entity Recognition Task.
- Nominalization: Nominalization is a Figure of Speech in which a Noun is used that is derived from a Verb or an Adjective.
O
P
- Paraphrase: A Paraphrase is a restatement of a Statement in another way.
- Phrase: A Phrase is a group of Words that form a syntactic unit but has no Subject-Predicate combination and so cannot stand alone as a Sentence.
- Polysemy: Polysemy is a type of relationship between one (written) word and all others in the lexicon. A word is polysemous if it can convey more than one meaning to the reader.
- Predicate: See: Predicate Function, Predicate Phrase.
- Predicate Function: A Predicate Function is a Function that returns either a true or false value.
- Predicate Adjective: A Predicate Adjective is an Adjective that follows a Linking Verb (e.g. is, seems), and which agrees with the Subject in number, gender, and case.
- Predicate Argument: A Predicate Argument is one of the values accepted by a Predicate Function.
- Predicate Calculus: See First-Order Logic.
- Predicate Noun: A Predicate Noun is a Noun or Pronoun that follows a Linking Verb and refers to the same thing as the Subject.
- Predicate Phrase: A Predicate Phrase is a Verb Phrase that expresses what is said about a Subject.
- Preposition: A Preposition is an Adposition that is placed before the modifying concept.
- Prepositional Phrase: A Prepositional Phrase is a Phrase composed of a Preposition and its modifier.
- PROGOL: PROGOL is a Machine Learning Algorithm.
- Prolog: Prolog is a Computing Language that implements automated Deductive Reasoning.
  - Context:
    - Prolog comes from the phrase "Programming in Logic".
    - It was originally designed by A. Colmerauer and P. Roussel in 1971 for natural-language processing but has since been applied to several other AI problems.
  - See: Deductive Reasoning, LISP
- Pronoun Resolution: See Anaphora Resolution
- PropBank: PropBank is a Corpus derived from the Penn Treebank Corpus that has been enriched with Proposition structures.
  - Context:
    - In the formalism, semantic arguments are encoded with (A0-A5,AA), adjuncts with (AM-), references with (R-), and Verbs with (V).
    - Verb senses come from VerbNet.
  - See: Adjuncative Argument, Proposition, Semantic Argument, Semantic Role Labeling, [1], Palmer, Gildea, and Kingsbury, 2003
- Proposition: A Proposition is a Predicate (verb) and its set of Arguments.
  - AKA: Syntactic Constituent
  - Context:
    - A sentence often contains more than one proposition.
  - See: Semantic Role Labeling
- Propositional Learner: A Propositional Learner is a Machine Learning Algorithm that is capable of inducing Propositional Rules.
  - Context:
    - It covers techniques such as Decision Tree Induction, Neural Network Induction, and Instance Based Learning.
    - Propositional Learners are typically criticized for their inability to learn relations between observations, such as Ancestor(x,y).
  - See: Propositional Logic
- Propositional Logic: Propositional Logic is a system of Logic that operates on individual members of the domain.
  - AKA: Propositional Calculus
  - See: First-Order Logic, Deductive Reasoning, Knowledge Base
- Pronoun Resolution: Pronoun Resolution is the task of identifying the proper noun related to a pronoun in a paragraph.
  - See: Anaphora Resolution.
Q
- Question Answering: Question Answering is an NLP Task where an Answer must be returned to the provided Question.
R
- Random Variable: A Random Variable is a Variable whose Outcome is assigned by a Random Experiment.
- Reading Comprehension Task: The task of answering simple factual questions based on a small natural language passage.
- Reasoning: Reasoning is the process of making a Conclusion based on what is known to date.
- Refinement Operator: A Refinement Operator is a method of altering a Model.
- Relative Clause: A Relative Clause is a Linguistic Clause that begins with a Relative Pronoun and functions as an Adjective.
- Relation Recognition: Relation Recognition is the NLP Task of identifying semantic associations between Concepts that are implied in a Document.
- RSS: RSS is a lightweight dialect of XML used for describing metadata about Web sites.
S
- Second-Order Logic.
- Semantic Analysis.
- Semantic Argument: A semantic argument is an argument defined by verb-specific roles.
  - Example:
    - Sample semantic arguments include: agent, patient, and instrument.
  - See: Adjuncative Argument, Semantic Role Labeling.
- Semantic Parsing:
- Semantic Relation: A Semantic Relation is a relation between two or more Concepts that is True in some Domain.
- Semantic Role Labeling: Semantic Role Labeling is the NLP Task for the identification of the Propositions associated with each Predicate in a Sentence.
- Semantic Web: A nascent endeavor to create another World Wide Web where the information must be published in a format easy for computers to access (as opposed to the current WWW where the information is meant for human consumption). The standard encoding mechanism for the Semantic Web is OWL (and formerly RDF). An example of the Semantic Web is the www.BioPAX.org site.
  - See: Ontology, XML, OWL, www.semanticweb.org.
- Semi-Supervised Learning: Semi-supervised learning is a type of machine learning where an algorithm makes use of both labeled and unlabelled data.
- Sentence: A Sentence is a sequence of Terminal Words that conforms to some Natural Language Syntax.
- SEQUENTIAL-COVERING: SEQUENTIAL-COVERING is a supervised classification algorithm that performs a general-to-specific beam search through rule-space. The algorithm removes training examples covered by each discovered rule and then repeats until all the positive examples have been covered. The algorithm does not backtrack so the underlying LEARN-ONE-RULE must be effective.
- Statement: A Statement is something that may be True in some Domain.
  - AKA: Rule, Pattern
  - Example:
    - All men are mortal. By Inductive Reasoning.
    - I am mortal. By Deductive Reasoning.
    - Socrates was a man. By Abductive Reasoning.
  - See: Fact, Reasoning, Rule, Conclusion, Hypothesis, Semantic Relation
- Statistically Independent: Two Events are Statistically Independent if the probability that they both occur simultaneously is equal to the product of the probability that each occurs individually.
  - AKA: Stochastically Independent
  - Context:
    - Statistical independence is often written as P(A ∧ B) = P(A)·P(B).
    - Alternatively, two events are independent if learning that one of the events has occurred does not help you determine whether the other event has also occurred, i.e., P(A|B) = P(A).
  - See: Probability, Conditional Probability, Independent Variable.
- Structured Data:
- Subsumption: A subsumption relation specifies the relative generality of two concepts. More formally, concept A subsumes concept B if the definitions of A and B logically imply that members of B must also be members of A.
  - See: Instantiation.
- Synonym: A word x is a Synonym of another word y if they are similar enough in meaning that they can be interchanged in some situations without loss of meaning.
  - Context: Synonyms are often dependent on context.
  - Example(s): The word 'attorney' can be a synonym of the word 'lawyer', although the word 'attorney' is more typical in American English.
  - See: Antonym, Polysemy, Semantics, en.wikipedia.org/wiki/Synonym.
- Syntactic Relation: A Syntactic Relation is a relation between Words that conforms to some Grammar.
  - See: Semantic Relation, Grammar.
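The product rule in the Statistically Independent entry can be checked exactly on a small sample space. The sketch below assumes two fair six-sided dice with equally likely outcomes; the events and helper names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on outcomes."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

def independent(a, b):
    """True if P(A and B) equals P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

def first_even(o):
    return o[0] % 2 == 0

def sum_is_7(o):
    return o[0] + o[1] == 7

def sum_is_8(o):
    return o[0] + o[1] == 8

print(independent(first_even, sum_is_7))  # True:  1/12 == 1/2 * 1/6
print(independent(first_even, sum_is_8))  # False: 1/12 != 1/2 * 5/36
```

Using `Fraction` keeps the comparison exact, so the equality test is not affected by floating-point rounding.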
T
- Table: In relational databases, a structure (table) that contains a set of records (tuples).
  - See: Predicate, Relation.
- Target Verb: A Target Verb is a Verb that governs a Proposition.
- Taxonomy: A hierarchical classification of concepts, typically for a specific domain. The primary semantic organizing principle of taxonomies is class inclusion (is-a or subsumption relationships). Examples of taxonomies include the tree of life and library catalogues.
- Template: (AKA: forms or 'frames') The common table-like structure that is filled in during information extraction tasks. The elements of templates are often referred to as slots. Occasionally, information extraction is referred to as a "template filling" or "slot filling" exercise.
- Text Mining: The automated discovery of interesting patterns from human-readable sources. Typical sources include the Web, email, corporate databases with text information, and publication databases such as CiteSeer and MEDLINE. Text mining is sometimes referred to as data mining on unstructured text data.
  - See: CiteSeer, MEDLINE, Ontology, Structured Data, Unstructured Data.
- TF-IDF: TF-IDF is a function that estimates how well a term describes a document.
- Top-Down Learning: Top-Down Learning is the technique of starting from a general rule and proceeding by specializing it.
- TREC: Text Retrieval Conference. A conference sponsored by NIST with tracks in Information Extraction, Question Answering, and other NLP tasks.
- Tuple: In relational databases, an entry within a relation.
  - See: Ground Fact.
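The TF-IDF entry does not fix a formula, and many variants exist. The sketch below assumes one common form: term frequency is the term's share of the document's tokens, and inverse document frequency is the natural log of (number of documents / number of documents containing the term). The function name and the toy corpus are hypothetical.

```python
import math

def tf_idf(term, doc, corpus):
    """Score how well `term` describes `doc` relative to `corpus`.

    `doc` is a list of tokens and `corpus` is a list of such documents.
    Assumed variant: tf = count / len(doc), idf = ln(N / df).
    """
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in corpus if term in d)
    if df == 0:
        return 0.0  # term appears nowhere in the corpus
    return tf * math.log(len(corpus) / df)

corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
doc = corpus[0]
print(tf_idf("the", doc, corpus))  # 0.0, because "the" is in every document
print(tf_idf("sat", doc, corpus) > tf_idf("cat", doc, corpus))  # True
```

A term that occurs in every document scores zero, while a term that is frequent in one document and rare elsewhere scores highest, which matches the intuition of "describing" that document.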
U
- Unstructured Data: Unstructured Data is Data that is not in a format that is amenable to computer processing.
  - Context:
    - It is usable by humans.
    - It can be a Document.
    - It can be an audio file.
    - It can be a video file.
  - Example(s):
    - Wikipedia
    - A television news program
    - An internet news story
    - A telephone conversation
  - See: Data, Structured Data.
V
- VerbNet: VerbNet is a class-based verb lexicon.
- Verb: A Verb is a Content Word that expresses action, occurrence, or a state of being.
- Verb Phrase: A Verb Phrase is a Phrase whose Head is a Verb.
- View: In relational databases, an implicit relation.
  - See: Intensional Predicate, Relation.
W
- Web Search: Web Search is Information Retrieval from the Web.
  - Context:
    - A feature of the Web that can be exploited is the rich linking between documents.
  - See: Information Retrieval
- WebKB: A project started in the late-90s by Tom M. Mitchell to develop a knowledge base that mirrors the content of the Web.
  - See: [Craven et al. 1998]
- Word: See: Terminal Word, Lexeme.
- WordNet:
  - Context:
    - It can help to quantify the semantic similarity between two Words.
  - See: Lexical Database, Lexeme
- Word Play: Word Play is the intentional use of words that connote multiple meanings.
  - Example(s): Just for the pun of it. The title of the play "The Importance of Being Earnest".
  - See: Word Sense Disambiguation, Polysemy, Homonym
- Word Sense Disambiguation: Word Sense Disambiguation is the NLP task of identifying the intended Word Sense of a Terminal Word.
- Wrapper: A Wrapper is a procedure to extract a particular type of information from textual data.
  - See: Information Extraction.