2008 FebrlAnOpenSourceDataCleaningDe
- (Christen, 2008) ⇒ Peter Christen. (2008). “Febrl -: An Open Source Data Cleaning, Deduplication and Record Linkage System with a Graphical User Interface.” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008). doi:10.1145/1401890.1402020
Subject Headings:
Notes
Cited By
- http://scholar.google.com/scholar?q=%22Febrl+-%3A+an+open+source+data+cleaning%2C+deduplication+and+record+linkage+system+with+a+graphical+user+interface%22+2008
- http://portal.acm.org/citation.cfm?doid=1401890.1402020&preflayout=flat#citedby
Quotes
Author Keywords
Abstract
Matching records that refer to the same entity across databases is becoming an increasingly important part of many data mining projects, as often data from multiple sources needs to be matched in order to enrich data or improve its quality. Significant advances in record linkage techniques have been made in recent years. However, many new techniques are either implemented in research proof-of-concept systems only, or they are hidden within expensive 'black box' commercial software. This makes it difficult for both researchers and practitioners to experiment with new record linkage techniques, and to compare existing techniques with new ones. The Febrl (Freely Extensible Biomedical Record Linkage) system aims to fill this gap. It contains many recently developed techniques for data cleaning, deduplication and record linkage, and encapsulates them into a graphical user interface (GUI). Febrl thus allows even inexperienced users to learn and experiment with both traditional and new record linkage techniques. Because Febrl is written in Python and its source code is available, it is fairly easy to integrate new record linkage techniques into it. Therefore, Febrl can be seen as a tool that allows researchers to compare various existing record linkage techniques with their own ones, enabling the record linkage research community to conduct their work more efficiently. Additionally, Febrl is suitable as a training tool for new record linkage users, and it can also be used for practical linkage projects with data sets that contain up to several hundred thousand records.
References
| | Author | volume | Date Value | title | type | journal | titleUrl | doi | note | year |
|---|---|---|---|---|---|---|---|---|---|---|
| 2008 FebrlAnOpenSourceDataCleaningDe | Peter Christen | | | Febrl -: An Open Source Data Cleaning, Deduplication and Record Linkage System with a Graphical User Interface | | | | 10.1145/1401890.1402020 | | |