Schema Matching Algorithm
A Schema Matching Algorithm is a coreference resolution algorithm that can be implemented by a Schema Matching System (to solve a schema matching task).
- …
- Counter-Example(s):
- See: Database Schema Integration Task.
References
2011
- http://en.wikipedia.org/wiki/Schema_matching#Approaches
- Approaches to schema integration can be broadly classified as those that exploit only schema information and those that exploit both schema-level and instance-level information (see Figure 2 of the source article).
- Schema-level matchers consider only schema information, not instance data. The available information includes the usual properties of schema elements, such as name, description, data type, relationship types (part-of, is-a, etc.), constraints, and schema structure. Working at the element level (atomic elements such as attributes of objects) or the structure level (combinations of elements that appear together in a structure), these matchers use such properties to identify matching elements in two schemas. Language-based or linguistic matchers use names and text (i.e., words or sentences) to find semantically similar schema elements. Constraint-based matchers exploit the constraints often contained in schemas. Such constraints define data types and value ranges, uniqueness, optionality, relationship types, cardinalities, etc. Constraints in the two input schemas are compared to determine the similarity of the schema elements.
- Instance-level matchers use instance-level data to gain insight into the contents and meaning of the schema elements. These are typically used in addition to schema-level matches to boost confidence in the match results, especially when the information available at the schema level is insufficient. Matchers at this level use linguistic and constraint-based characterizations of instances. For example, using linguistic techniques, it might be possible to look at the Dept, DeptName and EmpName instances to conclude that DeptName is a better match candidate for Dept than EmpName. Constraints, such as zip codes being 5 digits long or phone numbers following a fixed format, may allow such types of instance data to be matched.
- Hybrid matchers directly combine several matching approaches to determine match candidates based on multiple criteria or information sources. Most of these techniques also employ additional information such as dictionaries, thesauri, and user-provided match or mismatch information.
- Reusing matching information: Another initiative has been to reuse previous matching information as auxiliary information for future matching tasks. The motivation for this work is that structures or substructures often repeat, for example in schemas in the e-commerce domain. Such reuse of previous matches, however, needs to be applied carefully. It may make sense only for some part of a new schema or only in some domains. For example, Salary and Income may be considered identical in a payroll application but not in a tax reporting application. There are several open-ended challenges in such reuse that deserve further work.
- Sample Prototypes: Typically, implementations of such matching techniques can be classified as either rule-based or learner-based systems. The complementary nature of these approaches has prompted a number of applications that combine techniques depending on the nature of the domain or application under consideration. Among others, the references cited in the source article discuss such prototype systems along with the dimensions of their classification.
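The matcher categories quoted above can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems: the function names, the equal weights, the format-signature heuristic, and the sample columns (including PostalCode and all values) are assumptions made for illustration. It combines a schema-level linguistic matcher (name similarity) with an instance-level constraint matcher (value format overlap) into a simple hybrid matcher.

```python
import re
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Schema-level linguistic matcher: string similarity of element names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_pattern(value: str) -> str:
    """Abstract a value into a format signature (digit -> 9, letter -> A)."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def instance_similarity(values_a, values_b) -> float:
    """Instance-level constraint matcher: Jaccard overlap of value formats."""
    patterns_a = {value_pattern(v) for v in values_a}
    patterns_b = {value_pattern(v) for v in values_b}
    if not patterns_a or not patterns_b:
        return 0.0
    return len(patterns_a & patterns_b) / len(patterns_a | patterns_b)


def hybrid_match(source, target, w_name=0.5, w_inst=0.5):
    """Hybrid matcher: weighted sum of both scores (weights are assumed).

    Returns the best-scoring target element for each source element.
    """
    result = {}
    for s_name, s_values in source.items():
        scores = {
            t_name: w_name * name_similarity(s_name, t_name)
            + w_inst * instance_similarity(s_values, t_values)
            for t_name, t_values in target.items()
        }
        result[s_name] = max(scores, key=scores.get)
    return result


# Hypothetical schemas: element name -> sample instance values.
source = {"Dept": ["Sales", "Legal"], "Zip": ["02139", "94305"]}
target = {
    "DeptName": ["Sales", "Audit"],
    "EmpName": ["Ann Lee", "Bob Roy"],
    "PostalCode": ["10001", "60614"],
}

matches = hybrid_match(source, target)
print(matches)
```

In this toy setup, Dept is paired with DeptName rather than EmpName, and Zip is paired with PostalCode mainly because both columns share the five-digit format, even though their names have little in common. A real system would typically replace the fixed weights with tuned or learned ones and add auxiliary sources such as thesauri.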
2008
- (Wick et al., 2008) ⇒ Michael Wick, Khashayar Rohanimanesh, Karl Schultz, and Andrew McCallum. (2008). “A Unified Approach for Schema Matching, Coreference, and Canonicalization.” In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008).
2006
- (Wu et al., 2006) ⇒ Wensheng Wu, AnHai Doan, and Clement Yu. (2006). “WebIQ: Learning from the Web to Match Deep-Web Query Interfaces.” In: Proceedings of the 22nd International Conference on Data Engineering.
- WebIQ discovers the instances of an attribute using knowledge learned from the surface web.
- It uses Google search.
2005
- (Madhavan et al., 2005) ⇒ Jayant Madhavan, Philip A. Bernstein, AnHai Doan, and Alon Halevy. (2005). “Corpus-based Schema Matching.” In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005).
- (Shvaiko & Euzenat, 2005) ⇒ Pavel Shvaiko, and Jérôme Euzenat. (2005). “A Survey of Schema-based Matching Approaches.” In: Journal on Data Semantics IV, 3730/2005.
2004
- (Wang et al., 2004) ⇒ Jiying Wang, Ji-Rong Wen, Fred Lochovsky, and Wei-Ying Ma. (2004). “Instance-based Schema Matching for Web Databases by Domain-specific Query Probing.” In: Proceedings of the Thirtieth International Conference on Very Large Data Bases (VLDB 2004).
2003
- (Giunchiglia & Shvaiko, 2003) ⇒ Fausto Giunchiglia, and Pavel Shvaiko. (2003). “Semantic Matching.” In: The Knowledge Engineering Review, 18.
2002
- (Do & Rahm, 2002) ⇒ Hong-Hai Do, and Erhard Rahm. (2002). “COMA: A system for flexible combination of schema matching approaches.” In: Proceedings of the 28th VLDB Conference (VLDB 2002).
2001
- (Doan et al., 2001) ⇒ AnHai Doan, Pedro Domingos, and Alon Y. Halevy. (2001). “Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach.” In: ACM SIGMOD Record, 30(2). doi:10.1145/376284.375731
- (Madhavan et al., 2001) ⇒ Jayant Madhavan, Philip A. Bernstein, and Erhard Rahm. (2001). “Generic Schema Matching with Cupid.” In: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001).
- (Rahm & Bernstein, 2001) ⇒ Erhard Rahm, and Philip A. Bernstein. (2001). “A Survey of Approaches to Automatic Schema Matching.” In: The VLDB Journal: The International Journal on Very Large Data Bases.