Statistical Model Family

A statistical model family is a mathematical model family that defines a probability function set by means of a statistical model parameter vector [math]\displaystyle{ (S_\theta, \mathcal{B}_\theta, P_\theta, f_\theta, \mathcal{W}, \mathcal{P}) }[/math], where ...

AKA: Parametric Family of Probability Distributions, Class of Statistical Models.
Context:
- It can (typically) be a Human Processable Artifact (by statisticians).
- ...
- It can range from being a Pure Statistical Model Family to being a Mixture Statistical Model Family.
- It can range from being a Discrete Probability Distribution Family to being a Discrete/Continuous Probability Distribution Family to being a Continuous Probability Distribution Family.
- It can range from being a Univariate Probability Distribution Family to being a Multivariate Probability Distribution Family.
- It can range from being a Parametric Statistical Model Family to being a Nonparametric Statistical Model Family.
- It can range from being a Linear Probability Distribution to being a Non-Linear Probability Distribution.
- It can range from being a True Statistical Model to being a False Statistical Model (of some Stochastic Process).
- It can range from being a Correctly Specified Statistical Model to being a Misspecified Statistical Model (for some statistical modeling task).
- It can range from being a General Statistical Model to being a Domain-Specific Statistical Model.
- ...
- It can represent a Stochastic System.
- It can be supported by a Statistical Theory.
- It can be produced by a Probabilistic Modeling Algorithm/Statistical Modeling Algorithm.
- It can assign a Likelihood Estimate to Events.
- It can be a Learning Metamodel.
- ...
Example(s):
- a Linear Statistical Model, such as Linear Probability Functions and their uses in Linear Regression.
- a Non-Linear Statistical Model, including Exponential Probability Models like Logistic Regression Models and Gaussian Probability Models.
- a Probability Distribution Family, encompassing models like the Binomial Distribution and the Normal Distribution Family.
- a Probabilistic Graphical Model Family, such as Bayesian Networks and Conditional Random Fields.
- a Markov Chain Model used to predict state transitions in stochastic processes.
- a Bradley-Terry Model, specifically designed for pairwise comparison-based ranking.
- an application of a Bayesian Probability Model to update beliefs based on new evidence.
- ...
Counter-Example(s):
- a Deterministic Modeling System, such as finite state automata, or a classification function family.
- a Probability Function Structure.
- a Heuristic Model.
- an Inductive Logic Programming Theory.
See: Distribution Parameter, Statistical Learning, Bayesian Model, Stochastic Calculus, Probability Model Fitting.

References

2014

(Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/List_of_probability_distributions Retrieved:2014-10-28.
- Many probability distributions are so important in theory or applications that they have been given specific names.

(Wikipedia, 2014) ⇒ http://en.wikipedia.org/wiki/statistical_model Retrieved:2014-8-12.
- A statistical model is a formalization of relationships between variables in the form of mathematical equations. A statistical model describes how one or more random variables are related to one or more other variables. The model is statistical as the variables are not deterministically but stochastically related. In mathematical terms, a statistical model is frequently thought of as a pair [math]\displaystyle{ (Y, P) }[/math] where [math]\displaystyle{ Y }[/math] is the set of possible observations and [math]\displaystyle{ P }[/math] the set of possible probability distributions on [math]\displaystyle{ Y }[/math]. It is assumed that there is a distinct element of [math]\displaystyle{ P }[/math] which generates the observed data. Statistical inference enables us to make statements about which element(s) of this set are likely to be the true one.
  Most statistical tests can be described in the form of a statistical model. For example, the Student's t-test for comparing the means of two groups can be formulated as seeing if an estimated parameter in the model is different from 0. Another similarity between tests and models is that there are assumptions involved. Error is assumed to be normally distributed in most models.
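
The t-test remark above can be checked numerically. The following is a minimal Python sketch, not taken from the quoted source: it assumes the pooled-variance form of the two-sample test and a simple regression model y = beta0 + beta1 * g + error, where g is a 0/1 group indicator. The function names and the sample values are hypothetical.

```python
import math


def pooled_t_statistic(a, b):
    """Classical two-sample Student's t statistic with pooled variance."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_b - mean_a) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def slope_t_statistic(a, b):
    """t statistic for beta1 in y = beta0 + beta1 * g + error (g = 0 for a, 1 for b)."""
    g = [0.0] * len(a) + [1.0] * len(b)
    y = list(a) + list(b)
    n = len(y)
    g_bar, y_bar = sum(g) / n, sum(y) / n
    sxx = sum((gi - g_bar) ** 2 for gi in g)
    sxy = sum((gi - g_bar) * (yi - y_bar) for gi, yi in zip(g, y))
    beta1 = sxy / sxx                      # least-squares slope = difference in group means
    beta0 = y_bar - beta1 * g_bar          # least-squares intercept = mean of group a
    rss = sum((yi - beta0 - beta1 * gi) ** 2 for gi, yi in zip(g, y))
    se_beta1 = math.sqrt(rss / (n - 2) / sxx)
    return beta1 / se_beta1


# Hypothetical measurements for two groups.
group_a = [4.1, 5.0, 4.7, 5.3]
group_b = [5.9, 6.4, 5.6, 6.8, 6.1]
print(pooled_t_statistic(group_a, group_b), slope_t_statistic(group_a, group_b))
```

Under these assumptions the two statistics coincide, so testing whether the estimated slope differs from 0 is the same test as comparing the two group means.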

2013

http://en.wikipedia.org/wiki/Statistical_model#Formal_definition
- QUOTE: A statistical model is a collection of probability distribution functions or probability density functions (collectively referred to as distributions for brevity). A parametric model is a collection of distributions, each of which is indexed by a unique finite-dimensional parameter: [math]\displaystyle{ \mathcal{P}=\{\mathbb{P}_{\theta} : \theta \in \Theta\} }[/math], where [math]\displaystyle{ \theta }[/math] is a parameter and [math]\displaystyle{ \Theta \subseteq \mathbb{R}^d }[/math] is the feasible region of parameters, which is a subset of d-dimensional Euclidean space. A statistical model may be used to describe the set of distributions from which one assumes that a particular data set is sampled. For example, if one assumes that data arise from a univariate Gaussian distribution, then one has assumed a Gaussian model: [math]\displaystyle{ \mathcal{P}=\{\mathbb{P}(x; \mu, \sigma) = \frac{1}{\sqrt{2 \pi} \sigma} \exp\left\{ -\frac{1}{2\sigma^2}(x-\mu)^2\right\} : \mu \in \mathbb{R}, \sigma \gt 0\} }[/math].
  A non-parametric model is a set of probability distributions with infinite dimensional parameters, and might be written as [math]\displaystyle{ \mathcal{P}=\{\text{all distributions}\} }[/math]. A semi-parametric model also has infinite dimensional parameters, but is not dense in the space of distributions. For example, a mixture of Gaussians with one Gaussian at each data point is dense in the space of distributions. Formally, if d is the dimension of the parameter, and n is the number of samples, if [math]\displaystyle{ d \rightarrow \infty }[/math] as [math]\displaystyle{ n \rightarrow \infty }[/math] and [math]\displaystyle{ d/n \rightarrow 0 }[/math] as [math]\displaystyle{ n \rightarrow \infty }[/math], then the model is semi-parametric.
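
The Gaussian model in the quote above can be sketched in Python as a family of densities indexed by the parameter (mu, sigma), together with one common way of selecting a member from data. The maximum-likelihood step is an illustrative assumption and is not part of the quoted definition; the function names and data are hypothetical.

```python
import math


def gaussian_density(x, mu, sigma):
    """One member of the family: the density indexed by theta = (mu, sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive to lie in the parameter space")
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def log_likelihood(data, mu, sigma):
    """Log-likelihood of the data under the member indexed by (mu, sigma)."""
    return sum(math.log(gaussian_density(x, mu, sigma)) for x in data)


def fit_gaussian(data):
    """Maximum-likelihood choice of (mu, sigma): sample mean and divide-by-n std."""
    n = len(data)
    mu_hat = sum(data) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)
    return mu_hat, sigma_hat


# Hypothetical observations.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu_hat, sigma_hat = fit_gaussian(data)
print(mu_hat, sigma_hat, log_likelihood(data, mu_hat, sigma_hat))
```

Each call to `gaussian_density` with a fixed (mu, sigma) picks out one distribution; the model is the whole collection obtained by letting mu range over the reals and sigma over the positive reals.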

http://en.wikipedia.org/wiki/Exponential_family#The_meaning_of_.22exponential_family.22
- In probability and statistics, an exponential family is an important class of probability distributions sharing a certain form, specified below. ... Properly speaking, there is no such thing as "the" exponential family, but rather an exponential family, and properly speaking, it is not a "distribution" but a family of distributions that either is or is not an exponential family. The problem lies in the fact that we often say, e.g., "the normal distribution" when properly we mean something like "the family of normal distributions with unknown mean and variance". A family of distributions is defined by a set of parameters that can be varied, and what makes a family be an exponential family is a particular relationship between the domain of a family of distributions (the variable over which each distribution in the family is defined) and the parameters.
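
As an illustration of the "family of normal distributions with unknown mean and variance" being an exponential family, the Python sketch below rewrites the normal density in the standard textbook form h(x) * exp(eta . T(x) - A(eta)). This form is not spelled out in the quoted passage; the symbols eta (natural parameters), T (sufficient statistics), A (log-partition function), and h (base measure), and the function names, are assumptions introduced here.

```python
import math


def normal_pdf(x, mu, sigma):
    """Normal density in its usual (mu, sigma) parameterisation."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def natural_parameters(mu, sigma):
    """Map (mu, sigma) to eta = (mu / sigma^2, -1 / (2 sigma^2))."""
    return mu / sigma ** 2, -1.0 / (2 * sigma ** 2)


def log_partition(eta1, eta2):
    """A(eta) = -eta1^2 / (4 eta2) - 0.5 * log(-2 eta2); requires eta2 < 0."""
    return -eta1 ** 2 / (4 * eta2) - 0.5 * math.log(-2 * eta2)


def normal_pdf_exponential_form(x, mu, sigma):
    """Same density written as h(x) * exp(eta . T(x) - A(eta)) with T(x) = (x, x^2)."""
    eta1, eta2 = natural_parameters(mu, sigma)
    base_measure = 1 / math.sqrt(2 * math.pi)          # h(x), constant in x
    return base_measure * math.exp(eta1 * x + eta2 * x ** 2 - log_partition(eta1, eta2))


# The two parameterisations should agree at any point of the domain.
print(normal_pdf(1.3, 0.5, 2.0), normal_pdf_exponential_form(1.3, 0.5, 2.0))
```

The relationship between the domain and the parameters is visible in the exponent: x and the parameters interact only through the products eta1 * x and eta2 * x^2.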

2009

http://www.nature.com/nrg/journal/v5/n4/glossary/nrg1318_glossary.html
- Probabilistic Model: A model in which the data are modelled as random variables, the probability distribution of which depends on parameter values. ...
http://www.cdc.gov/drugresistance/community/program-planner/Glossary-Eval-Res.htm
- A model that is normally based on previous research and permits transformation of a specific impact measure into another specific impact measure ...

2008

(Georgii, 2008) ⇒ Hans-Otto Georgii. (2008). “Stochastics: introduction to probability theory and statistics.” Walter de Gruyter. ISBN:3110191458
- QUOTE: A statistical model is a triple [math]\displaystyle{ (X, F, P_v : v \in \Theta) }[/math] consisting of a sample space [math]\displaystyle{ X }[/math], a σ-algebra [math]\displaystyle{ F }[/math] on [math]\displaystyle{ X }[/math], and a class [math]\displaystyle{ \{P_v : v \in \Theta\} }[/math] of (at least two) probability measures on [math]\displaystyle{ (X, F) }[/math], which are indexed by an Index Set [math]\displaystyle{ \Theta }[/math].

2007

American Meteorological Society. (2007). “Glossary of Meteorology.” http://amsglossary.allenpress.com/glossary/browse?s=s&p=102
- QUOTE: stochastic model: A model of a system that includes some sort of random forcing. In many cases, stochastic models are used to simulate deterministic systems that include smaller-scale phenomena that cannot be accurately observed or modeled. As such, these small-scale phenomena are effectively unpredictable. A good stochastic model manages to represent the average effect of unresolved phenomena on larger-scale phenomena in terms of a random forcing.

2006

(Cox, 2006) ⇒ David R. Cox. (2006). “Principles of Statistical Inference.” Cambridge University Press. ISBN:9780521685672
- QUOTE: Key ideas about probability models and the objectives of statistical analysis are introduced. The differences between frequentist and Bayesian analyses are illustrated in a very special case. ... We use throughout the notation that observable random variables are represented by capital letters and observations by the corresponding lower case letters. ... A model, or strictly a family of models, specifies the density of [math]\displaystyle{ Y }[/math] to be [math]\displaystyle{ f_Y(y:z:\theta) }[/math] where [math]\displaystyle{ \theta \subset \Omega_\theta }[/math] is unknown. The distribution may depend also on design features of the study that generated the data. We typically simplify the notation to [math]\displaystyle{ f_Y(y; \theta) }[/math], although the explanatory variables [math]\displaystyle{ z }[/math] are frequently essential in specific applications. To choose the model appropriately is crucial to fruitful application.

2005

(Freedman, 2005) ⇒ David A. Freedman. (2005). “Statistical Models: theory and practice.” Cambridge University Press. ISBN:0521854830

2004

(Isaev, 2004) ⇒ Alexander Isaev. (2004). “Introduction to Mathematical Methods in Bioinformatics.” Springer. ISBN:3540219730
- QUOTE: A statistical model is a family of probability spaces [math]\displaystyle{ {(S_\theta, \mathcal{B}_\theta, P_\theta)} }[/math] and a family of random variables [math]\displaystyle{ {f_\theta} }[/math] with common range [math]\displaystyle{ \mathcal{W} \subset \Re }[/math], each defined on the respective space for [math]\displaystyle{ \theta \in \mathcal{P} }[/math], where [math]\displaystyle{ \mathcal{P} }[/math] is an index set. The variable [math]\displaystyle{ \theta }[/math] denotes the parameters of the model, the set [math]\displaystyle{ \mathcal{W} }[/math] is called the range of the model, and the index set [math]\displaystyle{ \mathcal{P} }[/math] the parameter space of the model. Hence a statistical model can be thought of as a family [math]\displaystyle{ \{(S_\theta, \mathcal{B}_\theta, P_\theta, f_\theta, \mathcal{W}, \mathcal{P})\} }[/math]. When studying and applying statistical models, one is primarily interested in the distributions of [math]\displaystyle{ f_\theta }[/math]. Therefore, often statistical models are not specified in full as in Definition 8.1, but only the family [math]\displaystyle{ \{F_\theta = F_{f_\theta}\} }[/math], [math]\displaystyle{ \theta \in \mathcal{P} }[/math], of the distribution functions of [math]\displaystyle{ f_\theta }[/math] is given. Once the family [math]\displaystyle{ \{F_\theta\}, \theta \in \mathcal{P} }[/math], is specified, one can construct a statistical model in the sense of Definition 8.1, for example, by setting [math]\displaystyle{ S_\theta = \Re }[/math], [math]\displaystyle{ \mathcal{B}_\theta = \mathcal{B}(\mathcal{F}_0) }[/math] (the σ-algebra of Borel sets in [math]\displaystyle{ \Re }[/math]), [math]\displaystyle{ P_\theta = P_{F_\theta} }[/math], [math]\displaystyle{ f_\theta(x)=x }[/math], [math]\displaystyle{ \mathcal{W} = \Re }[/math], for [math]\displaystyle{ \theta \in \mathcal{P} }[/math] (see Sect. 6.7). This model, however, is not always useful.
  We also remark that one can consider more general statistical models by allowing [math]\displaystyle{ f_\theta }[/math] to be vector-valued.

2003

(Davison, 2003) ⇒ Anthony C. Davison. (2003). “Statistical Models.” Cambridge University Press. ISBN:0521773393
- QUOTE: Models and likelihood are the backbone of modern statistics and data analysis. A statistical model is a probability distribution constructed to enable inference to be drawn or decisions made from data. The huge variety of such problems makes it hard to develop a single over-arching theory, but nevertheless common strands appear. Uniting them is the idea of a statistical model. The key feature of a statistical model is that variability is represented using probability distributions, which form the building-blocks from which the model is constructed. Typically it must accommodate both random and systematic variation. The randomness inherent in the probability distribution accounts for apparently haphazard scatter in the data, and systematic pattern is supposed to be generated by structure in the model. The art of modelling lies in finding a balance that enables the questions at hand to be answered or new ones posed. The complexity of the model will depend on the problem at hand and the answer required, so different models and analyses may be appropriate for a single set of data.

2002

(McCullagh, 2002) ⇒ Peter McCullagh. (2002). “What is a Statistical Model.” In: The Annals of Statistics, 30(5).
(Pitt et al., 2002) ⇒ Mark A. Pitt, In Jae Myung, and Shaobo Zhang. (2002). “Toward a Method of Selecting Among Computational Models of Cognition.” In: Psychological Review, 109(3).
- QUOTE: ... From a statistical standpoint, data are a sample generated from a true but unknown probability distribution, which is the regularity underlying the data. A statistical model is defined as a collection of probability distributions defined on experimental data and indexed by the model’s parameter vector, whose values range over the parameter space of the model. If the model contains as a special case the probability distribution that generated the data (i.e., the “true” model), then the model is said to be correctly specified; otherwise it is misspecified. ...
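
The distinction between a correctly specified and a misspecified model can be illustrated with a small Python sketch. It is not from the cited paper: it assumes a Gaussian data-generating distribution, two hypothetical candidate families, and uses the closed-form Kullback-Leibler divergence between Gaussians as a convenient measure of how close the best member of each family gets to the true distribution (zero exactly when the family contains it).

```python
import math


def kl_gaussian(mu_true, sigma_true, mu_model, sigma_model):
    """KL divergence from N(mu_true, sigma_true^2) to N(mu_model, sigma_model^2)."""
    return (math.log(sigma_model / sigma_true)
            + (sigma_true ** 2 + (mu_true - mu_model) ** 2) / (2 * sigma_model ** 2)
            - 0.5)


def best_kl_full_family(mu_true, sigma_true):
    """Family {N(mu, sigma^2)}: the closest member is the true distribution itself."""
    return kl_gaussian(mu_true, sigma_true, mu_true, sigma_true)


def best_kl_unit_variance_family(mu_true, sigma_true):
    """Family {N(mu, 1)}: the divergence is minimised by matching the mean only."""
    return kl_gaussian(mu_true, sigma_true, mu_true, 1.0)


# Hypothetical true distribution N(0, 2^2).
print(best_kl_full_family(0.0, 2.0))           # zero: correctly specified
print(best_kl_unit_variance_family(0.0, 2.0))  # positive: misspecified
```

With a true standard deviation of 2, the unit-variance family cannot reach zero divergence for any value of its parameter, which is what "misspecified" means in the quoted sense; the two-parameter family contains the true distribution as a special case.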

1987

(Hogg & Ledolter, 1987) ⇒ Robert V. Hogg and Johannes Ledolter. (1987). “Engineering Statistics.” Macmillan Publishing Company.
- QUOTE: In applied mathematics we are usually concerned with either deterministic or probabilistic models, although in many instances these are intertwined. ... a deterministic model because everything is known once ... conditions are specified.

1985

(Nagaev & Shkol'nik, 1985a) ⇒ A. V. Nagaev and S. M. Shkol'nik. (1985). “A Family of Probability Distributions.” In: Mathematical Notes, 37(4). doi:10.1007/BF01158189

1980

(Thomas & Ross, 1980) ⇒ Ewart A. C. Thomas, and Brian H. Ross. (1980). “On appropriate procedures for combining probability distributions within the same family.” In: Journal of Mathematical Psychology, 21(2). doi:10.1016/0022-2496(80)90003-6