Expanded Basis Function
An Expanded Basis Function is a series expansion of a regression function in terms of basis functions (for linear regression).
- AKA: Basis Expansion, Basis Function Expansion.
- Context:
- It can be defined as: Given an input dataset [math]\displaystyle{ \{(x_1,y_1),(x_2,y_2),\dots,(x_N,y_N)\} }[/math], where [math]\displaystyle{ y_i \in I\!R }[/math] and [math]\displaystyle{ x_i \in I\!R^p }[/math], we choose a function [math]\displaystyle{ f(x) : I\!R^p\rightarrow I\!R }[/math] to predict [math]\displaystyle{ y }[/math] for a new [math]\displaystyle{ x }[/math] as [math]\displaystyle{ f(x)=\sum_{m=1}^M w_m \phi_m(x) = w \cdot \phi(x) }[/math], where [math]\displaystyle{ \phi(x): I\!R^p\rightarrow I\!R^M }[/math] is the basis expansion of [math]\displaystyle{ x }[/math] and [math]\displaystyle{ w \in I\!R^M }[/math] is a weight vector, so that [math]\displaystyle{ f }[/math] is a linear function of [math]\displaystyle{ w }[/math] (a minimal sketch is given after the See list below).
- Example(s):
- Counter-Example(s):
- See: Mapping Task, Regularization Algorithm, Training Set, Sparsity Regularization, High-dimensional Learning, Feature Selection.
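A minimal sketch of this definition, assuming scalar inputs, a simple monomial basis [math]\displaystyle{ \phi_m(x) = x^{m-1} }[/math], and a least-squares fit for the weights (the basis choice and helper names are illustrative assumptions, not part of the definition above):

```python
import numpy as np

def phi(x, M):
    """Hypothetical monomial basis expansion: phi_m(x) = x**m for m = 0..M-1."""
    return np.column_stack([x ** m for m in range(M)])

def fit_weights(x, y, M):
    """Least-squares estimate of w in f(x) = sum_m w_m * phi_m(x)."""
    Phi = phi(x, M)                              # design matrix, shape (N, M)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict(x_new, w):
    """Evaluate f(x) = sum_m w_m * phi_m(x) at new inputs."""
    return phi(x_new, len(w)) @ w

# toy usage: y is roughly quadratic in x, so a 3-term monomial basis suffices
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + 0.1 * np.random.randn(50)
w = fit_weights(x, y, M=3)
y_hat = predict(x, w)
```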
References
2010
- (Domke, 2010) ⇒ Justin Domke (2010). "Basis Expansions." Statistical Machine Learning, Notes 4.
- We have defined our linear methods as
- [math]\displaystyle{ f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} }[/math]
- Many machine learning textbooks, however, introduce linear methods with an explicit intercept[1] term [math]\displaystyle{ w_0 }[/math], as something like
- [math]\displaystyle{ f(\mathbf{x}) = w_0 + \mathbf{w} \cdot \mathbf{x} \quad (1.1) }[/math]
- In learning, both the parameters [math]\displaystyle{ \mathbf{w} }[/math] and [math]\displaystyle{ w_0 }[/math] need to be adjusted. We have not bothered with this because our original model can be made equivalent by “tacking” a constant term onto [math]\displaystyle{ \mathbf{x} }[/math]. Define the function [math]\displaystyle{ \phi }[/math], which just takes the vector [math]\displaystyle{ \mathbf{x} }[/math] and prepends a constant of 1:
- [math]\displaystyle{ \phi(\mathbf{x}) = (1, \mathbf{x})\quad (1.2) }[/math]
- Then, if we take all our training data and replace each element [math]\displaystyle{ (\hat{y}, \hat{\mathbf{x}}) }[/math] by [math]\displaystyle{ (\hat{y}, \phi(\hat{\mathbf{x}})) }[/math], we will have done the equivalent of adding an intercept term. This is a special example of a straightforward but powerful idea known as “basis expansion”.
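A minimal sketch of this intercept trick, assuming a least-squares fit (the data and helper names are illustrative):

```python
import numpy as np

def phi(X):
    """Prepend a constant 1 to each input vector: phi(x) = (1, x)."""
    ones = np.ones((X.shape[0], 1))
    return np.hstack([ones, X])

# toy training data: N examples with p features and a true intercept of 0.5
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 0.5 + X @ np.array([1.0, -2.0, 0.3]) + 0.1 * rng.normal(size=100)

# fitting the expanded data recovers both the intercept w_0 and the slope w
w_full, *_ = np.linalg.lstsq(phi(X), y, rcond=None)
w0, w = w_full[0], w_full[1:]    # intercept term and remaining weights
```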
2009
- (Hastie et al., 2009) ⇒ Trevor Hastie, Robert Tibshirani, and Jerome H. Friedman. (2009). “The Elements of Statistical Learning: Data Mining, Inference, and Prediction; 2nd edition.” Springer-Verlag. ISBN:0387848576
- QUOTE: (pg. 49): The goal is to obtain a useful approximation to [math]\displaystyle{ f(x) }[/math] for all [math]\displaystyle{ x }[/math] in some region of [math]\displaystyle{ I\!R^p }[/math], given the representations in [math]\displaystyle{ \mathcal{T} }[/math]. Although somewhat less glamorous than the learning paradigm, treating supervised learning as a problem in function approximation encourages the geometrical concepts of Euclidean spaces and mathematical concepts of probabilistic inference to be applied to the problem. This is the approach taken in this book. Many of the approximations we will encounter have associated a set of parameters [math]\displaystyle{ \theta }[/math] that can be modified to suit the data at hand. For example, the linear model [math]\displaystyle{ f(x) = x^T \beta }[/math] has [math]\displaystyle{ \theta = \beta }[/math]. Another class of useful approximators can be expressed as linear basis expansions
- [math]\displaystyle{ f_\theta(x) = \sum_{k=1}^K h_k(x)\theta_k \quad (2.30) }[/math]
- where the [math]\displaystyle{ h_k }[/math] are a suitable set of functions or transformations of the input vector [math]\displaystyle{ x }[/math].
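For example, a cubic polynomial in a scalar input [math]\displaystyle{ x }[/math] is one such linear basis expansion, with [math]\displaystyle{ h_k(x) = x^{k-1} }[/math] for [math]\displaystyle{ k = 1,\dots,4 }[/math]:
- [math]\displaystyle{ f_\theta(x) = \theta_1 + \theta_2 x + \theta_3 x^2 + \theta_4 x^3 }[/math]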
- QUOTE: (pg. 158-159): The core idea in this chapter is to augment/replace the vector of inputs [math]\displaystyle{ X }[/math] with additional variables, which are transformations of [math]\displaystyle{ X }[/math], and then use linear models in this new space of derived input features.
- Denote by [math]\displaystyle{ h_m(X) : I\!R^p \rightarrow I\!R }[/math] the m-th transformation of [math]\displaystyle{ X,\; m = 1, \cdots, M }[/math]. We then model
- [math]\displaystyle{ f(X) = \sum_{m=1}^M\beta_m h_m(X) \quad (5.1) }[/math]
- a linear basis expansion in [math]\displaystyle{ X }[/math]. The beauty of this approach is that once the basis functions [math]\displaystyle{ h_m }[/math] have been determined, the models are linear in these new variables, and the fitting proceeds as before.
- Some simple and widely used examples of the [math]\displaystyle{ h_m }[/math] are the following:
- [math]\displaystyle{ h_m(X) = X_m,\; m = 1, \cdots, p }[/math] recovers the original linear model.
- [math]\displaystyle{ h_m(X) = X_j^2 }[/math] or [math]\displaystyle{ h_m(X) = X_jX_k }[/math] allows us to augment the inputs with polynomial terms to achieve higher-order Taylor expansions. Note, however, that the number of variables grows exponentially in the degree of the polynomial. A full quadratic model in [math]\displaystyle{ p }[/math] variables requires [math]\displaystyle{ O(p^2) }[/math] square and cross-product terms, or more generally [math]\displaystyle{ O(p^d) }[/math] for a degree-d polynomial.
- [math]\displaystyle{ h_m(X) = \log(X_j ), \sqrt{X_j} , \cdots }[/math] permits other nonlinear transformations of single inputs. More generally one can use similar functions involving several inputs, such as [math]\displaystyle{ h_m(X) = ||X|| }[/math].
- [math]\displaystyle{ h_m(X) = I(L_m \leq X_k \lt U_m) }[/math], an indicator for a region of [math]\displaystyle{ X_k }[/math]. Breaking the range of [math]\displaystyle{ X_k }[/math] up into [math]\displaystyle{ M_k }[/math] such non-overlapping regions results in a model with a piecewise constant contribution for [math]\displaystyle{ X_k }[/math].
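A minimal sketch of expression (5.1) using some of the transformations listed above; the particular column choices, cut points, and the guard inside the log are illustrative assumptions, and the fit is ordinary least squares since the model is linear in the derived features:

```python
import numpy as np

def h(X):
    """Derived input features: the original inputs, a square, a cross-product,
    a log transform of one input, and an indicator for a region of X[:, 2]."""
    return np.column_stack([
        X,                                    # h_m(X) = X_m  (original linear model)
        X[:, 0] ** 2,                         # h_m(X) = X_j^2
        X[:, 0] * X[:, 1],                    # h_m(X) = X_j * X_k
        np.log(np.abs(X[:, 1]) + 1e-6),       # h_m(X) = log(X_j), guarded for this sketch
        ((0.0 <= X[:, 2]) & (X[:, 2] < 1.0)).astype(float),  # I(L_m <= X_k < U_m)
    ])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] ** 2 - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# once the h_m are fixed, f(X) = sum_m beta_m * h_m(X) is linear in beta
beta, *_ = np.linalg.lstsq(h(X), y, rcond=None)
y_hat = h(X) @ beta
```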
- ↑ The terminology “bias” is more common, but we will stick to “intercept”, since this has nothing to do with the “bias” we discuss in the bias-variance tradeoff.