Person Record Duplicate Detection Task
A Person Record Duplicate Detection Task is a Domain Specific Duplicate Record Detection Task that requires the detection of Person Records in a Person Record Set that refer to the same person.
- AKA: Person Duplicate Detection Task, Person Record Coreference Resolution Task.
- Context:
- It can involve the comparison of Person Record Data Attributes, such as Person Name and Person Birthdate.
- Example(s):
- The following two records likely refer to the same person: [Name=J. Smith; Phone=(555)123-1234] and [Name=John R. Smith; Phone=1-555-123-1234].
- an Author Citation Duplication Detection Task.
- …
- Counter-Example(s):
- See: Citation Record Duplicate Detection Task.
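The record pair in the example above can be compared with a short Python sketch. It is only an illustration under stated assumptions, not a method taken from the cited literature: the helper names (`normalize_phone`, `names_compatible`, `likely_same_person`) are hypothetical, phone numbers are assumed to be North American style with an optional leading `1`, and a single-letter first name is assumed to be an initial.

```python
import re


def normalize_phone(phone):
    """Keep digits only; drop a leading '1' from 11-digit numbers (assumption)."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def name_tokens(name):
    """Lowercase the name parts and strip trailing periods from initials."""
    return [t.strip(".").lower() for t in name.split() if t.strip(".")]


def names_compatible(a, b):
    """True if surnames match and first names agree, letting an initial match a full name."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb or ta[-1] != tb[-1]:
        return False
    first_a, first_b = ta[0], tb[0]
    if len(first_a) == 1 or len(first_b) == 1:
        return first_a[0] == first_b[0]
    return first_a == first_b


def likely_same_person(r1, r2):
    """Heuristic rule (assumption): same normalized phone and compatible names."""
    return (normalize_phone(r1["Phone"]) == normalize_phone(r2["Phone"])
            and names_compatible(r1["Name"], r2["Name"]))


record_a = {"Name": "J. Smith", "Phone": "(555)123-1234"}
record_b = {"Name": "John R. Smith", "Phone": "1-555-123-1234"}
print(likely_same_person(record_a, record_b))
```

A hard rule like this is deliberately strict. A practical system would more likely score several attributes, such as the birthdate, and apply a threshold to the combined score.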
References
2012
- (Ferreira et al., 2012) ⇒ Anderson A. Ferreira, Marcos André Gonçalves, and Alberto H. F. Laender. (2012). “A Brief Survey of Automatic Methods for Name Disambiguation.” In: SIGMOD Record, 41(2).
- QUOTE: In case of the author names attribute, a component corresponds to the name of a single unique author and is a reference [math]\displaystyle{ r_j }[/math] to a real author. In case of the other attributes, a component corresponds to a word/term. The objective of a disambiguation method is to produce a function that is used to partition the set of references to authors [math]\displaystyle{ \{r_1, \ldots, r_m\} }[/math] into n sets [math]\displaystyle{ \{a_1, \ldots, a_n\} }[/math], so that each partition [math]\displaystyle{ a_i }[/math] contains (all and ideally only all) the references to a same author.
To disambiguate the bibliographic citations of a DL, first we may split the set of references to authors into groups of references whose values of the author name attribute are ambiguous. These are called ambiguous groups (i.e., groups of references having the value of the author name attribute with similar names). The ambiguous groups may be obtained by using blocking methods [37] which address scalability issues avoiding the need for comparisons among all references.
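The two steps described in the quote, blocking references into ambiguous groups and then partitioning them into per-author sets, might be sketched as follows. This is a minimal illustration, not the survey's method: the blocking key (surname plus first initial), the `same_author` predicate, the shared-coauthor rule, and the sample references are all assumptions made up for the example, and author names are assumed to be non-empty.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(author_name):
    """Hypothetical blocking key: (surname, first initial), lowercased."""
    tokens = [t.strip(".").lower() for t in author_name.split() if t.strip(".")]
    return (tokens[-1], tokens[0][0])


def ambiguous_groups(references):
    """Group reference indices by blocking key, so only similar names get compared."""
    groups = defaultdict(list)
    for idx, ref in enumerate(references):
        groups[blocking_key(ref["author"])].append(idx)
    return dict(groups)


def partition(references, same_author):
    """Merge references inside each ambiguous group using union-find."""
    parent = list(range(len(references)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for group in ambiguous_groups(references).values():
        for i, j in combinations(group, 2):
            if same_author(references[i], references[j]):
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(references)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


# Made-up references for illustration only.
refs = [
    {"author": "A. Gupta", "coauthors": {"B. Lee"}},
    {"author": "Anil Gupta", "coauthors": {"B. Lee", "C. Diaz"}},
    {"author": "A. Gupta", "coauthors": {"D. Chen"}},
    {"author": "M. Rossi", "coauthors": {"B. Lee"}},
]


def shares_coauthor(r, s):
    """Toy evidence rule (assumption): two references match if they share a coauthor."""
    return bool(r["coauthors"] & s["coauthors"])


print(partition(refs, shares_coauthor))  # [[0, 1], [2], [3]]
```

Because comparisons happen only inside a block, the fourth reference is never compared with the first even though both list the same coauthor. That is the scalability trade-off blocking makes: fewer comparisons, at the risk of missing matches whose names fall into different blocks.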