DBPredictor Project
A DBPredictor Project is a research project that investigates the DBPredictor Algorithm (a Lazy Model-based Classification Algorithm).
- Context:
- It is currently hosted at http://www.cs.sfu.ca/~melli/DBPredictor
- See: Performance Metric, DatGen Project, GM-RKB Project.
References
1998
- (Melli, 1998) ⇒ Melli, G. (1998). A Lazy Model-Based Approach to On-Line Classification.
- Abstract: The growing access to large amounts of structured observations allows for more opportunistic uses of this data. An example of this, is the prediction of an event’s class membership based on a database of observations. When these predictions are supported by a high-level representation, we refer to these as knowledge based on-line classification tasks. Two common types of algorithms from machine learning research that may be applied to on-line classification tasks make use of either lazy instance-based (k-NN,IB1) or eager model-based (C4.5,CN2) approaches. Neither approach, however, appears to provide a complete solution for these tasks.
This thesis proposes a lazy model-based algorithm, named DBPredictor, that is suited to knowledge based on-line classification tasks. The algorithm uses a greedy top-down search to locate a probabilistic IF-THEN rule that will classify the given event. Empirical investigation validates this match. DBPredictor is shown to be as accurate as IB1 and C4.5 against general datasets. Its accuracy however, is more robust to irrelevant attributes than IB1, and more robust to underspecified events than C4.5. Finally, DBPredictor is shown to solve a significant number of classification requests before C4.5 can satisfy its first request.
These performance characteristics, along with the algorithm’s ability to avoid discretization of numerical attributes and its ability to be tightly-coupled with a relational database, suggests that DBPredictor is an appropriate algorithm for knowledge based on-line classification tasks.
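The abstract's description of a greedy top-down search for a probabilistic IF-THEN rule can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`entropy`, `lazy_rule_classify`), the `min_cover` parameter, the use of Shannon entropy as the scoring heuristic, and the restriction to categorical attributes are all assumptions made for illustration. The sketch starts from an empty rule that covers every record, then repeatedly adds the condition from the query event that most reduces class entropy among the covered records.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def lazy_rule_classify(records, labels, event, min_cover=2):
    """Greedily specialize an IF-THEN rule toward one query event.

    Hypothetical helper, not DBPredictor itself.
    records: list of dicts (attribute -> categorical value)
    labels:  list of class labels, parallel to records
    event:   dict holding the known attribute values of the query
    Returns (conditions, class_probabilities).
    """
    conditions = {}                      # IF part of the rule, initially empty
    covered = list(range(len(records)))  # indices of records the rule matches
    best_score = entropy([labels[i] for i in covered])

    while True:
        best = None
        # Only conditions that the query event itself satisfies are candidates.
        for attr, value in event.items():
            if attr in conditions:
                continue
            subset = [i for i in covered if records[i].get(attr) == value]
            if len(subset) < min_cover:
                continue  # too few supporting records (assumed threshold)
            score = entropy([labels[i] for i in subset])
            if score < best_score:
                best_score, best = score, (attr, value, subset)
        if best is None:
            break  # no candidate condition lowers the entropy further
        attr, value, covered = best
        conditions[attr] = value

    # THEN part: class distribution over the records the final rule covers.
    counts = Counter(labels[i] for i in covered)
    total = sum(counts.values())
    probs = {cls: n / total for cls, n in counts.items()}
    return conditions, probs
```

Because the rule is built only when a query arrives and only from attributes present in that query, no model is trained up front, and an event with missing attribute values simply offers fewer candidate conditions. Handling numerical attributes without discretization, which the abstract mentions, is outside the scope of this sketch.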