Predictor Feature Function
A Predictor Feature Function is a function structure that is intended to provide useful information to a predictive model.
- Context:
- output: a Predictor Feature Value.
- It can (typically) be instantiated as a Predictor Feature Column in a Learning Dataset (as illustrated in the sketch after this list).
- It can be created by a Feature Population Job (created by a feature development task).
- It can be defined by a Feature Creation Task (from a feature vector acting as a learning record).
- It can (typically) be a member of a Feature Space.
- It can range from being a Categorical Predictor Variable to being an Ordinal Predictor Variable to being a Continuous Predictor Variable.
- It can range from being a Simple Predictor Feature to being a Complex Predictor Feature (such as an aggregational feature).
- It can range from being a Static Feature to being a Dynamic Feature.
- It can range from being a Predictive Feature to being an Unpredictive Feature (for some prediction task).
- It can range from being a Manually-Engineered Feature to being an Automatically-Generated Feature.
- It can be a part of an ML Feature Repository.
- It can be in a Correlated Feature Relationship (with another predictor feature).
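For illustration, the minimal sketch below (not from the source; record fields such as color, height_cm, and purchases are hypothetical) shows how predictor feature functions map learning records to predictor feature values, and how applying them over a dataset instantiates predictor feature columns:

```python
from statistics import mean

# Toy learning records; field names are hypothetical, for illustration only.
records = [
    {"color": "red",  "height_cm": 12.0, "purchases": [3, 5]},
    {"color": "blue", "height_cm": 15.5, "purchases": [1]},
    {"color": "red",  "height_cm":  9.2, "purchases": []},
]

# A categorical (indicator) predictor feature function.
def color_is_red(record):
    return 1 if record["color"] == "red" else 0

# A continuous predictor feature function.
def height_cm(record):
    return record["height_cm"]

# A complex (aggregational) predictor feature function.
def mean_purchase_count(record):
    return mean(record["purchases"]) if record["purchases"] else 0.0

# Applying each feature function to every learning record instantiates
# one predictor feature column of the learning dataset.
feature_functions = [color_is_red, height_cm, mean_purchase_count]
feature_columns = {f.__name__: [f(r) for r in records] for f in feature_functions}
print(feature_columns)
```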
- Example(s):
- a Text Token-based Predictor Feature (see the code sketch after this list), such as:
if token="and" then 1 else 0
- an Image-based Predictor Feature, such as:
if pixel=<255,255,255> then 1 else 0
- a Temporal Predictor Feature, such as a recency feature (e.g. the elapsed time period since the last event) or a frequency feature.
- a Spatial Predictor Feature, such as a Distance-based Feature, or a Coordinate-based Feature.
- a sepal length in cm, such as in Fisher's Iris Dataset.
- …
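A minimal code sketch of the token-based, image-based, and temporal (recency) examples above; the function names, input representations, and the choice of a white pixel and of days as the time unit are illustrative assumptions, not anything prescribed by the source:

```python
from datetime import datetime

# Text token-based predictor feature: if token = "and" then 1 else 0.
def token_is_and(token):
    return 1 if token == "and" else 0

# Image-based predictor feature: if pixel = <255,255,255> (white) then 1 else 0.
def pixel_is_white(pixel):
    return 1 if tuple(pixel) == (255, 255, 255) else 0

# Temporal (recency) predictor feature: elapsed days since the last event.
def days_since_last_event(last_event_time, as_of_time):
    return (as_of_time - last_event_time).total_seconds() / 86400.0

print(token_is_and("and"))              # -> 1
print(pixel_is_white((255, 255, 255)))  # -> 1
print(days_since_last_event(datetime(2020, 5, 1), datetime(2020, 5, 13)))  # -> 12.0
```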
- Counter-Example(s):
- a Target Attribute.
- a Latent Variable, such as a confounder.
- a Statistic Function Structure.
- a Probability Function Structure.
- See: Factor Analysis, Learning Task Dataset, Predictor Feature Detector, Exposure Variable, Feature Selection, Machine Learning, Regression Analysis.
References
2020
- (Wikipedia, 2020) ⇒ https://en.wikipedia.org/wiki/Feature_(machine_learning) Retrieved:2020-5-13.
- In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon being observed. Choosing informative, discriminating and independent features is a crucial step for effective algorithms in pattern recognition, classification and regression. Features are usually numeric, but structural features such as strings and graphs are used in syntactic pattern recognition.
The concept of "feature" is related to that of explanatory variable used in statistical techniques such as linear regression.
2015a
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/feature_(machine_learning) Retrieved:2015-7-8.
- … The initial set of raw features can be redundant and too large to be managed. Therefore, a preliminary step in many applications of machine learning and pattern recognition consists of selecting a subset of features, or constructing a new and reduced set of features to facilitate learning, and to improve generalization and interpretability.
Extracting or selecting features is a combination of art and science. It requires the experimentation of multiple possibilities and the combination of automated techniques with the intuition and knowledge of the domain expert.
2015b
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Dependent_and_independent_variables#Statistics_synonyms Retrieved:2015-6-6.
- An independent variable is also known as a "predictor variable", "regressor", "controlled variable", "manipulated variable", "explanatory variable", "exposure variable" (see reliability theory), "risk factor" (see medical statistics), "feature" (in machine learning and pattern recognition) or an "input variable."[1] [2] "Explanatory variable" is preferred by some authors over "independent variable" when the quantities treated as "independent variables" may not be statistically independent.[3] [4]
A dependent variable is also known as a "response variable", "regressand", "measured variable", "responding variable", "explained variable", "outcome variable", "experimental variable", and "output variable". If the independent variable is referred to as an "explanatory variable" (see above) then the term "response variable" is preferred by some authors for the dependent variable.
Variables may also be referred to by their form: continuous, binary (dichotomous), nominal categorical, and ordinal categorical, among others.
- ↑ Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9 (entry for "independent variable")
- ↑ Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9 (entry for "regression")
- ↑ Everitt, B.S. (2002) Cambridge Dictionary of Statistics, CUP. ISBN 0-521-81099-X
- ↑ Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms, OUP. ISBN 0-19-920613-9
2015c
- (Wikipedia, 2015) ⇒ http://en.wikipedia.org/wiki/Feature_(machine_learning) Retrieved:2015-6-6.
- In machine learning and pattern recognition, a feature is an individual measurable property of a phenomenon being observed. Choosing informative, discriminating and independent features is a crucial step for effective algorithms in pattern recognition, classification and regression. Features are usually numeric, but structural features such as strings and graphs are used in syntactic pattern recognition. The concept of "feature" is related to that of explanatory variable used in statistical techniques such as linear regression.
2012
- (Wikipedia, 2012) ⇒ http://en.wikipedia.org/wiki/Covariate
- QUOTE: In statistics, a covariate is a variable that is possibly predictive of the outcome under study. A covariate may be of direct interest or it may be a confounding or interacting variable.
The alternative terms explanatory variable, independent variable, or predictor, are used in a regression analysis. In econometrics, the term "control variable" is usually used instead of "covariate". In a more specific usage, a covariate is a secondary variable that can affect the relationship between the dependent variable and other independent variables of primary interest.
An example is provided by the analysis of trend in sea-level by Woodworth (1987). Here the dependent variable (and variable of most interest) was the annual mean sea level at a given location for which a series of yearly values were available. The primary independent variable was "time". Use was made of a "covariate" consisting of yearly values of annual mean atmospheric pressure at sea level. The results showed that inclusion of the covariate allowed improved estimates of the trend against time to be obtained, compared to analyses which omitted the covariate.
2011
- http://archive.ics.uci.edu/ml/datasets/Iris/
- QUOTE: ... Fisher's paper is a classic in the field and is referenced frequently to this day. …
- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
- class: Iris Setosa; Iris Versicolour; Iris Virginica
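As a usage sketch (assuming the scikit-learn library, which bundles this dataset, is installed), the four measurements act as predictor feature columns while the class attribute is the target attribute rather than a predictor feature:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# The four predictor feature columns: sepal/petal length and width, in cm.
print(iris.feature_names)
print(iris.data[:3])       # feature values for the first three learning records

# The class attribute is the target attribute, i.e. not a predictor feature.
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
print(iris.target[:3])
```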
2008
- (Wilson, 2008a) ⇒ Bill Wilson. (2008). “The Machine Learning Dictionary for COMP9414." University of New South Wales, Australia.
- QUOTE: attributes: An attribute is a property of an instance that may be used to determine its classification. For example, when classifying objects into different types in a robotic vision task, the size and shape of an instance may be appropriate attributes. Determining useful attributes that can be reasonably calculated may be a difficult job - for example, what attributes of an arbitrary chess end-game position would you use to decide who can win the game? This particular attribute selection problem has been solved, but with considerable effort and difficulty. Attributes are sometimes also called features.
2003
- (Guyon & Elisseeff, 2003) ⇒ Isabelle M. Guyon, and André Elisseeff. (2003). “An Introduction to Variable and Feature Selection.” In: The Journal of Machine Learning Research, 3.
- QUOTE: … We call “variable” the “raw” input variables and “features” variables constructed for the input variables. We use without distinction the terms “variable” and “feature” when there is no impact on the selection algorithms, e.g., when features resulting from a pre-processing of input variables are explicitly computed. The distinction is necessary in the case of kernel methods for which features are not explicitly computed (see section 5.3).
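Following that distinction, a minimal sketch with hypothetical raw variables (income, debt): the "variables" are the raw inputs, and the "features" (here a log transform and a ratio) are constructed from them by pre-processing:

```python
import math

# Raw input variables for one learning record (hypothetical values).
raw_variables = {"income": 52000.0, "debt": 13000.0}

# Features constructed from the raw input variables by pre-processing.
constructed_features = {
    "log_income":     math.log(raw_variables["income"]),
    "debt_to_income": raw_variables["debt"] / raw_variables["income"],
}
print(constructed_features)
```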
2000
- http://www.cse.unsw.edu.au/~billw/mldict.html#attribute
- QUOTE: An attribute is a property of an instance that may be used to determine its classification. For example, when classifying objects into different types in a robotic vision task, the size and shape of an instance may be appropriate attributes. Determining useful attributes that can be reasonably calculated may be a difficult job - for example, what attributes of an arbitrary chess end-game position would you use to decide who can win the game? This particular attribute selection problem has been solved, but with considerable effort and difficulty.
Attributes are sometimes also called features.
1998
- (Johnson & Wichern, 1998) ⇒ Richard A. Johnson, and Dean W. Wichern. (1998). “Applied Multivariate Statistical Analysis, 4th ed." Prentice Hall. ISBN:013834194X
- QUOTE: Regression analysis is the statistical methodology for predicting values of one or more response (dependent) variables from a collection of predictor (independent) variable values. It can also be used for assessing the effects of the predictor variables on the responses. Unfortunately, the name regression, culled from the title of the first paper on the subject by F. Galton [13], in no way reflects either the importance or breadth of application of this methodology. … Let [math]\displaystyle{ z_1, z_2, ..., z_r }[/math] be [math]\displaystyle{ r }[/math] predictor variables thought to be related to a response variable [math]\displaystyle{ Y }[/math] ...
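The model introduced in that passage is, in its classical form (stated here for completeness, not quoted from the source), the multiple linear regression model relating the [math]\displaystyle{ r }[/math] predictor variables to the response:
[math]\displaystyle{ Y = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_r z_r + \epsilon, }[/math]
where [math]\displaystyle{ \beta_0, \beta_1, \ldots, \beta_r }[/math] are unknown regression coefficients and [math]\displaystyle{ \epsilon }[/math] is a random error term.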