Model Evaluation Task
A Model Evaluation Task is an evaluation task that tests how well a predictive model represents some aspect of reality.
- Context:
- Input: a Predictive Model, Model Evaluation Data, and a Model Evaluation Metric.
- It can range from being an Offline Model Evaluation Task to being an Online Model Evaluation Task.
- It can range from being a Classification Model Evaluation Task to being a Ranking Model Evaluation Task to being an Estimation Model Evaluation Task.
- It can be solved by a Model Evaluation System (that implements a model evaluation algorithm).
- Example(s):
- Counter-Example(s):
- See: Model Design Task; Model Performance Evaluation; Model Artifact; Homeostatic Model Assessment; Overfitting; ROC Analysis; Resubstitution Estimate; Resubstitution Error.
References
2017
- (Webb, 2017) ⇒ Geoffrey I. Webb. (2017). “Model Evaluation.” In: (Sammut & Webb, 2017) p.683
- QUOTE: Model evaluation is the process of assessing a property or properties of a model. (...)
There are many metrics by which a model may be assessed. The relative importance of each metric varies from application to application. The primary considerations often relate to predictive efficacy — how useful will the predictions be in the particular context it is to be deployed. Measures relating to predictive efficacy include Accuracy, Lift, Mean Absolute Error, Mean Squared Error, Negative Predictive Value, Positive Predictive Value, Precision, Recall, Sensitivity, Specificity, and various metrics based on ROC analysis (...)
When assessing the predictive efficacy of a model learned from data, to obtain a reliable estimate of its likely performance on new data, it is essential that it not be assessed by considering its performance on the data from which it was learned. A learning algorithm must interpolate appropriate predictions for regions of the instance space that are not included in the training data. It is probable that the inferred model will be more accurate for those regions represented in the training data than for those that are not, and hence predictions are likely to be less accurate for instances that were not included in the training data. Estimates that have been computed on the training data are called resubstitution estimates. For example, the error of a model on the training data from which it was learned is called resubstitution error.
Algorithm evaluation techniques such as cross-validation, holdout evaluation, and bootstrap sampling are designed to provide more reliable estimates of the accuracy of the models learned by an algorithm than would be obtained by assessing them on the training data.
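The predictive-efficacy metrics Webb lists are each computed directly from a model's predictions and the corresponding true labels or values. The following is a minimal sketch in plain Python (function names and the example data are illustrative, not from the source) of how accuracy, precision, recall, and mean squared error are derived:
```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Positive predictive value: TP / (TP + FP)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(y_true, y_pred, positive=1):
    """Sensitivity: TP / (TP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

def mean_squared_error(y_true, y_pred):
    """Average squared difference between estimates and true values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Example: a binary classifier's predictions on five evaluation instances.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))   # 0.6
print(precision(y_true, y_pred))  # 0.666...
print(recall(y_true, y_pred))     # 0.666...
```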
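Webb's distinction between a resubstitution estimate and an estimate on unseen data can be made concrete by comparing the two, alongside a cross-validation estimate. A hedged sketch using scikit-learn (the dataset, classifier, and split sizes are assumptions chosen for illustration, not taken from the source):
```python
# Sketch: resubstitution vs. holdout vs. 10-fold cross-validation estimates.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Resubstitution estimate: accuracy on the data the model was learned from
# (typically optimistic, as the quoted passage explains).
resub_acc = accuracy_score(y_train, model.predict(X_train))

# Holdout estimate: accuracy on data withheld from training.
holdout_acc = accuracy_score(y_test, model.predict(X_test))

# 10-fold cross-validation estimate over the full dataset.
cv_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv_acc = cross_val_score(cv_model, X, y, cv=10).mean()

print(f"resubstitution accuracy: {resub_acc:.3f}")
print(f"holdout accuracy:        {holdout_acc:.3f}")
print(f"10-fold CV accuracy:     {cv_acc:.3f}")
```
The resubstitution figure will usually be the highest of the three, which is why holdout and cross-validation estimates are preferred when gauging likely performance on new data.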