2008 FebrlAnOpenSourceDataCleaningDe

From GM-RKB
Jump to navigation Jump to search

Subject Headings:

Notes

Cited By

Quotes

Author Keywords

Abstract

Matching records that refer to the same entity across data-bases is becoming an increasingly important part of many data mining projects, as often data from multiple sources needs to be matched in order to enrich data or improve its quality. Significant advances in record linkage techniques have been made in recent years. However, many new techniques are either implemented in research proof-of-concept systems only, or they are hidden within expensive 'black box' commercial software. This makes it difficult for both researchers and practitioners to experiment with new record linkage techniques, and to compare existing techniques with new ones. The Febrl (Freely Extensible Biomedical Record Linkage) system aims to fill this gap. It contains many recently developed techniques for data cleaning, deduplication and record linkage, and encapsulates them into a graphical user interface (GUI). Febrl thus allows even inexperienced users to learn and experiment with both traditional and new record linkage techniques. Because Febrl is written in Python and its source code is available, it is fairly easy to integrate new record linkage techniques into it. Therefore, Febrl can be seen as a tool that allows researchers to compare various existing record linkage techniques with their own ones, enabling the record linkage research community to conduct their work more efficiently. Additionally, Febrl is suitable as a training tool for new record linkage users, and it can also be used for practical linkage projects with data sets that contain up to several hundred thousand records.

References

,

 AuthorvolumeDate ValuetitletypejournaltitleUrldoinoteyear
2008 FebrlAnOpenSourceDataCleaningDePeter ChristenFebrl -: An Open Source Data Cleaning, Deduplication and Record Linkage System with a Graphical User Interface10.1145/1401890.1402020