Identity Resolution
See: Person Reference Normalization Task, Entity Resolution, Record Linkage, Data Matching Task, String Matching Task.
References
2018
- (Wikipedia, 2018) ⇒ https://en.wikipedia.org/wiki/Record_linkage#Data_matching Retrieved:2018-5-27.
- While entity resolution solutions include data matching technology, many data matching offerings do not fit the definition of entity resolution. Here are four factors that distinguish entity resolution from data matching, according to John Talburt, director of the UALR Center for Advanced Research in Entity Resolution and Information Quality:
- Works with both structured and unstructured records, and it entails the process of extracting references when the sources are unstructured or semi-structured
- Uses elaborate business rules and concept models to deal with missing, conflicting, and corrupted information
- Utilizes non-matching, asserted linking (associate) information in addition to direct matching
- Uncovers non-obvious relationships and association networks (i.e. who's associated with whom)
- In contrast to data quality products, more powerful identity resolution engines also include a rules engine and workflow process, which apply business intelligence to the resolved identities and their relationships. These advanced technologies make automated decisions and impact business processes in real time, limiting the need for human intervention.
2017a
- (Christen & Winkler, 2017) ⇒ Peter Christen and William E. Winkler (2017) "Record Linkage". In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA.
- QUOTE: Identifying and linking records that correspond to the same real-world entity in one or more databases is an increasingly important task in many data mining and machine learning projects. The aim of record linkage is to compare records within one (known as deduplication) or across two databases and classify the compared pairs of records as matches (pairs where both records are assumed to refer to the same real-world entity) and non-matches (pairs where the two records are assumed to refer to different entities). Formally, let us consider two databases (or files), A and B, and record pairs in the product space A × B (for the deduplication of a single database A, the product space is A × A). The aim of record linkage is to classify these record pairs into the classes of matches (links) and non-matches (non-links) (Christen 2012[1]). Depending upon the decision model used (Fellegi and Sunter 1969[2]; Herzog et al. 2007[3]), a third class of potential matches (potential links) might be used. These are difficult to classify record pairs that will need to be manually assessed and classified as matches or non-matches in a manual clerical review process.
Each record pair in A × B is assumed to correspond to either a true match or a true non-match. The space A × B is therefore partitioned into the set M of true matches and the set U of true non-matches. The objective of record linkage is to correctly classify record pairs from M into the class of matches and pairs from U into the class of non-matches.
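The pair classification described in this quote can be illustrated with a minimal sketch. It is not the method of Christen and Winkler: the toy records, the `similarity` function (Python's `difflib` string ratio), and the `upper` and `lower` thresholds are all assumptions chosen for illustration. Every pair in A × B is scored and placed into matches, potential matches (for clerical review), or non-matches.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical toy databases A and B; the field names are illustrative only.
A = [{"id": "a1", "name": "John Smith"}, {"id": "a2", "name": "Mary Jones"}]
B = [{"id": "b1", "name": "Jon Smith"}, {"id": "b2", "name": "Peter Brown"}]


def similarity(rec_a, rec_b):
    """Assumed comparison function: string similarity in [0, 1] on the name field."""
    return SequenceMatcher(None, rec_a["name"].lower(), rec_b["name"].lower()).ratio()


def classify_pairs(db_a, db_b, upper=0.85, lower=0.6):
    """Classify every pair in db_a x db_b into three classes.

    Pairs scoring at or above `upper` are matches, pairs scoring below
    `lower` are non-matches, and pairs in between are potential matches
    left for clerical review. Both thresholds are arbitrary assumptions.
    """
    matches, potential, non_matches = [], [], []
    for rec_a, rec_b in product(db_a, db_b):
        score = similarity(rec_a, rec_b)
        pair = (rec_a["id"], rec_b["id"])
        if score >= upper:
            matches.append(pair)
        elif score >= lower:
            potential.append(pair)
        else:
            non_matches.append(pair)
    return matches, potential, non_matches


matches, potential, non_matches = classify_pairs(A, B)
print("matches:", matches)
print("potential matches:", potential)
print("non-matches:", non_matches)
```

For deduplication of a single database, the same function could be called as `classify_pairs(A, A)`, although a practical version would skip self-pairs and symmetric duplicates.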
2017b
- (Bhattacharya & Getoor, 2017) ⇒ Indrajit Bhattacharya and Lise Getoor (2017) "Entity Resolution". In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA.
- QUOTE: A fundamental problem in data cleaning and integration (see Data Preparation) is dealing with uncertain and imprecise references to real-world entities. The goal of entity resolution is to take a collection of uncertain entity references (or references, in short) from a single data source or multiple data sources, discover the unique set of underlying entities, and map each reference to its corresponding entity. This typically involves two subproblems – identification of references with different attributes to the same entity and disambiguation of references with identical attributes by assigning them to different entities.
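The mapping from references to underlying entities can be sketched as a clustering step. The example below is only an illustration under stated assumptions: the `resolve_entities` helper, the string-similarity measure, and the `threshold` value are hypothetical, and the sketch covers only the identification subproblem (grouping differently written references). Disambiguating references with identical attributes would need additional context, such as co-occurring references, which is not modelled here.

```python
from difflib import SequenceMatcher


def resolve_entities(references, threshold=0.7):
    """Map each reference to an entity label via union-find clustering.

    Two references are linked when their string similarity reaches the
    assumed `threshold`; linked references share one entity label.
    """
    parent = list(range(len(references)))

    def find(i):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Keep the smaller index as the root so labels are stable.
            parent[max(root_i, root_j)] = min(root_i, root_j)

    for i in range(len(references)):
        for j in range(i + 1, len(references)):
            a, b = references[i].lower(), references[j].lower()
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                union(i, j)

    return [find(i) for i in range(len(references))]


refs = ["Lise Getoor", "L. Getoor", "Indrajit Bhattacharya", "I. Bhattacharya"]
labels = resolve_entities(refs)
for ref, label in zip(refs, labels):
    print(f"{ref!r} -> entity {label}")
```

Each label is the index of the first reference in its cluster, so references with the same label are assumed to denote the same entity.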
2016
- (Edwards et al., 2016) ⇒ Matthew Edwards, Stephen Wattam, Paul Rayson and Awais Rashid (2016, December). "Sampling labelled profile data for identity resolution" (PDF). In Big Data (Big Data), 2016 IEEE International Conference on (pp. 540-547). IEEE.
- QUOTE: Identity resolution tasks are a form of classification whereby two or more profiles of a person - often from different databases - are matched together based on the similarity of their features. The aim is to identify multiple profiles referring to the same individual, where a profile may include everything from simple biographical attributes to inferred characteristics such as writing style. The aim of identity resolution is to allow different sets of information about a person to be connected.
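Matching profiles on the similarity of their features might look like the following sketch. The profiles, the feature names (`name`, `location`, `style`), the weights, and the `threshold` are all hypothetical and are not taken from Edwards et al.; Jaccard overlap is simply one convenient similarity measure for set-valued features.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def profile_similarity(p, q, weights):
    """Weighted average of per-feature Jaccard similarities."""
    total = sum(weights.values())
    score = sum(w * jaccard(p.get(f, ()), q.get(f, ())) for f, w in weights.items())
    return score / total


def best_match(profile, candidates, weights, threshold=0.4):
    """Return the id of the most similar candidate, or None if none reaches the threshold."""
    if not candidates:
        return None
    score, candidate_id = max(
        (profile_similarity(profile, c, weights), c["id"]) for c in candidates
    )
    return candidate_id if score >= threshold else None


# Hypothetical profiles from two different databases.
profile = {
    "id": "x1",
    "name": {"alex", "morgan"},
    "location": {"leeds"},
    "style": {"cheers", "brilliant", "reckon"},  # stand-in for writing-style features
}
candidates = [
    {"id": "y1", "name": {"a", "morgan"}, "location": {"leeds"},
     "style": {"cheers", "reckon", "mate"}},
    {"id": "y2", "name": {"alex", "moore"}, "location": {"york"},
     "style": {"regards", "sincerely"}},
]
weights = {"name": 0.5, "location": 0.2, "style": 0.3}

print(best_match(profile, candidates, weights))
```

A learned classifier over the per-feature similarities would be a natural replacement for the fixed weights used here.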
20??
- http://www.information-management.com/
- IDENTITY RESOLUTION: Its Increasingly Important Role Inside Enterprise Data Management
- Not so long ago, identity resolution was a pretty obscure topic. But look at today’s headlines: Air travelers vetted against no-fly lists. Voter registrations verified by government databases. Mortgage defaults blamed on bad credit checks. Electronic health records promise huge cost savings. Drug pushers, tax cheats, and inside traders caught through data mining. Everywhere you turn, solutions to our biggest problems depend on correctly linking data to people within and across systems.
- [1] Christen P (2012) Data matching – concepts and techniques for record linkage, entity resolution, and duplicate detection. Data-centric systems and applications. Springer, Berlin/New York.
- [2] Fellegi IP, Sunter AB (1969) A theory for record linkage. J Am Stat Assoc 64(328):1183–1210.
- [3] Herzog TN, Scheuren FJ, Winkler WE (2007) Data Quality and Record Linkage Techniques. Springer, New York/London.