2003 IdentityUncertaintyAndCitationMatching

Subject Headings: Entity Linking Task.

Notes

(Wick et al., 2009) ⇒ Michael Wick, Aron Culotta, Khashayar Rohanimanesh, and Andrew McCallum. (2009). “An Entity Based Model for Coreference Resolution.” In: Proceedings of the SIAM International Conference on Data Mining (SDM 2009).
- NOTE: Wick et al. describe this paper's approach as a Probabilistic Algorithm that applies a Generatively Trained Directed Graphical Model to the Citation Matching Task.
- QUOTE: Pasula et al. [5] and Milch et al. [27] propose Bayesian network based on logical clauses for modeling the citation matching task. The model implicitly represents entities with distributions specific to certain attributes such as title or venue. However, we believe that the flexibility of discriminatively-trained models is an advantage for the coreference tasks since they more naturally handle overlapping and co-dependencies between features. Also, their approaches do not explicitly result in canonical records as ours does.

(Poon & Domingos, 2007) ⇒ H. Poon and Pedro Domingos. (2007). “Joint inference in information extraction.” In: Proceedings of the Twenty-Second National Conference on Artificial Intelligence (AAAI 2007).
- QUOTE: While a number of previous authors have taken steps in this direction (e.g., Pasula et al (2003), Wellner et al. (2004)), to our knowledge this is the first fully joint approach.

Identity uncertainty is a pervasive problem in real-world data analysis. It arises whenever objects are not labeled with unique identifiers or when those identifiers may not be perceived perfectly. In such cases, two observations may or may not correspond to the same object. In this paper, we consider the problem in the context of citation matching — the problem of deciding which citations correspond to the same publication. Our approach is based on the use of a relational probability model to define a generative model for the domain, including models of author and title corruption and a probabilistic citation grammar. Identity uncertainty is handled by extending standard models to incorporate probabilities over the possible mappings between terms in the language and objects in the domain. Inference is based on Markov chain Monte Carlo, augmented with specific methods for generating efficient proposals when the domain contains many objects. Results on several citation data sets show that the method outperforms current algorithms for citation matching. The declarative, relational nature of the model also means that our algorithm can determine object characteristics such as author names by combining multiple citations of multiple papers.
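The abstract's core idea, sampling over possible mappings from citations to publications with Markov chain Monte Carlo, can be illustrated with a small Python sketch. This is a toy stand-in, not the paper's relational probability model. Edit distance to a cluster medoid is assumed here as a crude proxy for the title corruption model, a fixed per-cluster penalty is assumed as a proxy for the prior over the number of publications, and the proposal simply relabels one citation at random instead of using the paper's specialized proposals. All function names and parameter values (`log_score`, `sample_matching`, `cluster_penalty`, `noise_cost`) are hypothetical.

```python
import math
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def clusters_of(labels):
    """Group citation indices by their publication label."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    return list(groups.values())


def log_score(labels, dist, cluster_penalty=3.0, noise_cost=1.0):
    """Unnormalized log-probability of a clustering (toy assumption)."""
    total = 0.0
    for members in clusters_of(labels):
        # Treat the medoid as the publication's "true" title (assumption).
        corruption = min(sum(dist[m][o] for o in members) for m in members)
        total -= cluster_penalty + noise_cost * corruption
    return total


def sample_matching(titles, steps=2000, seed=0):
    """Metropolis-Hastings over label vectors; returns the best clustering seen."""
    rng = random.Random(seed)
    n = len(titles)
    dist = [[edit_distance(a, b) for b in titles] for a in titles]
    labels = list(range(n))  # start with every citation as its own publication
    current = log_score(labels, dist)
    best, best_score = list(labels), current
    for _ in range(steps):
        i, new_label = rng.randrange(n), rng.randrange(n)
        old_label = labels[i]
        labels[i] = new_label  # symmetric proposal: relabel one citation
        proposed = log_score(labels, dist)
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed  # accept the move
            if current > best_score:
                best, best_score = list(labels), current
        else:
            labels[i] = old_label  # reject: restore the previous label
    return {frozenset(c) for c in clusters_of(best)}


if __name__ == "__main__":
    titles = [
        "Identity Uncertainty and Citation Matching",
        "Identity Uncertanty and Citation Matching",
        "Joint Inference in Information Extraction",
        "Joint Inference in Informaton Extraction",
    ]
    print(sample_matching(titles))
```

On this toy input, each misspelled title should end up in the same cluster as its correctly spelled counterpart. Because the proposal picks a citation and a new label uniformly at random, it is symmetric, so the acceptance test needs only the ratio of scores and no Hastings correction.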


	Author	volume	Date Value	title	type	journal	titleUrl	doi	note	year
2003 IdentityUncertaintyAndCitationMatching	Hanna Pasula, Bhaskara Marthi, Brian Milch, Ilya Shpitser, Stuart J. Russell			Identity Uncertainty and Citation Matching		Proceedings of Advances in Neural Information Processing	http://www.eecs.berkeley.edu/~russell/papers/nips02-citation.pdf			2003