Joint Inference Algorithm

A joint inference algorithm is a supervised model-based learning algorithm that optimizes for all the underlying inferences for a composite task at once.

AKA: Joint Inferencing.
Context:
- It can (in some but not all situations) outperform a state-of-the-art algorithm (such as a k-best pipeline algorithm). (Wick et al., 2009; Sutton & McCallum, 2005a)
Example(s):
- It can be used to jointly solve an entity mention recognition task and an entity mention coreference resolution task (Poon & Domingos, 2007).
- It can be used to jointly solve a named entity detection task and a named entity classification task (Daumé III & Marcu, 2005).
- …
Counter-Example(s):
- a k-Best List Algorithm.
- a Reranking Algorithm.
- a Pipelined Algorithm.
See: Joint Learning, Global Model, Local Model.

References

Google Scholar Search: http://scholar.google.com/scholar?q="joint+inference"+training+algorithm

2009

(Wick et al., 2009) ⇒ Michael Wick, Aron Culotta, Khashayar Rohanimanesh, and Andrew McCallum. (2009). “An Entity Based Model for Coreference Resolution.” In: Proceedings of the SIAM International Conference on Data Mining (SDM 2009).
- QUOTE: … we explicitly model entities and perform coreference and canonicalization jointly … our model allows first order logic features to be expressed over entire clusters, enabling us to model canonicalization and coreference simultaneously.
- It applies a Joint Inference Algorithm to the Composite Task of canonicalization and coreference resolution.

2007

(Poon & Domingos, 2007) ⇒ Hoifung Poon, and Pedro Domingos. (2007). “Joint Inference in Information Extraction.” In: Proceedings of the Twenty-Second National Conference on Artificial Intelligence (AAAI 2007).
- It applies a Joint Inference Algorithm to the combined tasks of entity mention recognition and entity mention coreference resolution.

2006

(Finkel et al., 2006) ⇒ Jenny Rose Finkel, Christopher D. Manning, and Andrew Y. Ng. (2006). “Solving the Problem of Cascading Errors: Approximate Bayesian Inference for Linguistic Annotation Pipelines.” In: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP 2006).
(Culotta et al., 2006) ⇒ Aron Culotta, Andrew McCallum, and Jonathan Betz. (2006). “Integrating Probabilistic Extraction Models and Data Mining to Discover Relations and Patterns in Text.” In: Proceedings of HLT-NAACL 2006.
- This work can also be viewed as part of a trend to perform joint inference across multiple language processing tasks (Miller et al., 2000; Roth and Yih, 2002; Sutton and McCallum, 2004).
(JINLP, 2006) Proposed workshop http://www.cs.umass.edu/~casutton/jinlp2006/
- In NLP there has been increasing interest in moving away from systems that make chains of local decisions independently, and instead toward systems that make multiple decisions jointly using global information. For example, NLP tasks are often solved by a pipeline of processing steps (from speech, to translation, to entity extraction, relation extraction, coreference and summarization)---each of which locally chooses its output to be passed to the next step. However, we can avoid accumulating cascading errors by joint decoding across the pipeline---capturing uncertainty and multiple hypotheses throughout. The use of lattices in speech recognition is well-established, but recently there has been more interest in larger, more complex joint inference, such as joint ASR and MT, and joint extraction and coreference.
- The main challenge in applying joint methods more widely throughout NLP is that they are more complex and more expensive than local approaches. Various models and approximate inference algorithms have been used to maintain efficiency, such as beam search, reranking, simulated annealing, and belief propagation, but much work remains in understanding which methods are best for particular applications, or which new techniques could be brought to bear.
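The contrast drawn above between a pipeline of local decisions and joint decoding can be sketched on a toy two-stage task. This is a minimal illustration, not the method of any cited paper: the mention ("Washington" in "Washington borders Oregon"), the label sets, the probability values, and the function names `pipeline_decode`, `kbest_decode`, and `joint_decode` are all hypothetical. Stage 1 tags the mention, stage 2 picks a relation given the tag.

```python
from itertools import product

# Hypothetical stage-1 scores: P(tag) for the mention "Washington".
TAG_PROB = {"PERSON": 0.55, "LOCATION": 0.45}

# Hypothetical stage-2 scores: P(relation | tag).
REL_PROB = {
    "PERSON": {"knows": 0.40, "borders": 0.30, "none": 0.30},
    "LOCATION": {"knows": 0.02, "borders": 0.90, "none": 0.08},
}


def joint_score(tag, rel):
    """Score of a complete (tag, relation) hypothesis."""
    return TAG_PROB[tag] * REL_PROB[tag][rel]


def pipeline_decode():
    """Each stage commits to its own local best output."""
    tag = max(TAG_PROB, key=TAG_PROB.get)
    rel = max(REL_PROB[tag], key=REL_PROB[tag].get)
    return tag, rel


def kbest_decode(k):
    """Keep the k best stage-1 outputs, then rescore full hypotheses."""
    top_tags = sorted(TAG_PROB, key=TAG_PROB.get, reverse=True)[:k]
    candidates = [(t, r) for t in top_tags for r in REL_PROB[t]]
    return max(candidates, key=lambda h: joint_score(*h))


def joint_decode():
    """Exhaustive search over all complete hypotheses."""
    relations = REL_PROB["PERSON"].keys()
    return max(product(TAG_PROB, relations), key=lambda h: joint_score(*h))


if __name__ == "__main__":
    for name, hyp in [
        ("pipeline", pipeline_decode()),
        ("2-best", kbest_decode(2)),
        ("joint", joint_decode()),
    ]:
        print(f"{name:8s} {hyp} score={joint_score(*hyp):.3f}")
```

With these made-up numbers the pipeline commits to PERSON at stage 1 and ends at a hypothesis scoring 0.220, while joint decoding finds (LOCATION, borders) at 0.405; this is the cascading-error effect on a small scale. Exhaustive search is only feasible because the toy space has six hypotheses. In realistic settings the space is too large to enumerate, which is why the approximate methods named above (beam search, reranking, simulated annealing, belief propagation) are used instead.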

2005

(Finkel et al., 2005) ⇒ Jenny Rose Finkel, Trond Grenager, and Christopher D. Manning. (2005). “Incorporating Nonlocal Information into Information Extraction Systems by Gibbs Sampling.” In: Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005). doi:10.3115/1219840.1219885.
(Sutton & McCallum, 2005a) ⇒ Charles Sutton, and Andrew McCallum. (2005). “Joint Parsing and Semantic Role Labeling.” In: Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL 2005).
- … Our current results are negative, because a locally-trained SRL model can return inaccurate probability estimates.
(Daumé III & Marcu, 2005) ⇒ Hal Daumé, III, and Daniel Marcu. (2005). “A Large-Scale Exploration of Effective Global Features for a Joint Entity Detection and Tracking Model.” In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT/EMNLP 2005). doi:10.3115/1220575.1220588
- Entity detection and tracking (EDT) is the task of identifying textual mentions of real-world entities in documents, extending the named entity detection and coreference resolution task by considering mentions other than names (pronouns, definite descriptions, etc.). Like NE tagging and coreference resolution, most solutions to the EDT task separate out the mention detection aspect from the coreference aspect. By doing so, these solutions are limited to using only local features for learning. In contrast, by modeling both aspects of the EDT task simultaneously, we are able to learn using highly complex, non-local features. We develop a new joint EDT model and explore the utility of many features, demonstrating their effectiveness on this task.

2004

(McCallum & Sutton, 2004) ⇒ Andrew McCallum, and Charles Sutton. (2004). “Piecewise Training with Parameter Independence Diagrams: Comparing Globally- and Locally-trained Linear-chain CRFs.” In: NIPS 2004 Workshop on Learning with Structured Outputs.

2003

(McCallum, 2003) ⇒ Andrew McCallum. (2003). “Efficiently Inducing Features of Conditional Random Fields.” In: Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence.

2000

(Miller et al., 2000) ⇒ Scott Miller, Heidi Fox, Lance Ramshaw, and Ralph Weischedel. (2000). “A Novel Use of Statistical Parsing to Extract Information from Text.” In: Proceedings of NAACL Conference (NAACL 2000).
