Minimum mean square error

Revision as of 17:27, 9 August 2012 by WikiBot (talk | contribs) (Robot: Automated text replacement (-{{SIB}} + & -{{EH}} + & -{{EJ}} + & -{{Editor Help}} + & -{{Editor Join}} +))


In statistics, minimum mean square error (MMSE) describes the statistical estimator with the least possible mean squared error. Because they minimize this error, MMSE estimators are commonly described as optimal.

Let <math>{\scriptstyle\hat{\theta}}</math> be a point estimator of the parameter <math>\theta</math>. Its mean squared error is:

<math>\operatorname{MSE}( \hat{\theta} ) = \operatorname{E}( ( \hat{\theta} - \theta )^2 ).</math>
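The expectation in this definition can be approximated by simulation. The following is a minimal sketch, assuming normally distributed data with unit variance; the helper name `monte_carlo_mse` and all parameter values are illustrative choices, not part of the definition.

```python
import random
import statistics


def monte_carlo_mse(estimator, theta, sample_size, trials, seed=0):
    """Approximate E[(theta_hat - theta)^2] by averaging over simulated data sets."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(trials):
        # Draw one data set from N(theta, 1); the unit variance is an assumption.
        sample = [rng.gauss(theta, 1.0) for _ in range(sample_size)]
        squared_errors.append((estimator(sample) - theta) ** 2)
    return statistics.fmean(squared_errors)


# MSE of the sample mean as an estimator of theta, with 25 observations per data set.
mse_mean = monte_carlo_mse(statistics.fmean, theta=2.0, sample_size=25, trials=20000)
```

For the sample mean under these assumptions, the exact value is the variance divided by the sample size, 1/25 = 0.04, so the simulated figure should land close to that.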

Suppose there are two estimators <math>{\scriptstyle\hat{\theta}_1}</math> and <math>{\scriptstyle{\hat{\theta}_2}}</math> of the parameter <math>\theta</math>, and let <math>{\scriptstyle\operatorname{MSE}( \hat{\theta}_1 )}</math> and <math>{\scriptstyle\operatorname{MSE}( \hat{\theta}_2 )}</math> denote their mean square errors. The relative efficiency of <math>{\scriptstyle\hat{\theta}_1}</math> and <math>{\scriptstyle\hat{\theta}_2}</math> can then be defined as the ratio:

<math>\frac{ \operatorname{MSE}( \hat{\theta}_1 ) }{ \operatorname{MSE}( \hat{\theta}_2 ) }</math>
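As an illustration of this ratio, the sketch below compares the sample mean and the sample median as estimators of the center of a normal distribution. The function name `relative_efficiency`, the unit variance, and the sample sizes are assumptions made for the example.

```python
import random
import statistics


def relative_efficiency(estimator_1, estimator_2, theta, sample_size, trials, seed=0):
    """Estimate MSE(estimator_1) / MSE(estimator_2) on shared simulated data sets."""
    rng = random.Random(seed)
    sq_err_1 = 0.0
    sq_err_2 = 0.0
    for _ in range(trials):
        # Both estimators see the same N(theta, 1) data set.
        sample = [rng.gauss(theta, 1.0) for _ in range(sample_size)]
        sq_err_1 += (estimator_1(sample) - theta) ** 2
        sq_err_2 += (estimator_2(sample) - theta) ** 2
    # The 1/trials factors cancel in the ratio of the two averages.
    return sq_err_1 / sq_err_2


ratio = relative_efficiency(
    statistics.fmean, statistics.median, theta=0.0, sample_size=25, trials=10000
)
```

A ratio below 1 means the first estimator has the smaller mean square error. For normal data the mean is the more efficient of the two, and for large samples the ratio is known to approach 2/π, roughly 0.64.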

Operational Considerations

Unfortunately, the correct distribution from which to estimate the mean-squared error of the estimator is a point of contention between the Bayesian and frequentist schools of probability theory. Orthodox (frequentist) statistics employs a transformation of variables to obtain the probability distribution of the estimator from the sampling distribution, which gives the estimator's probability distribution independently of the data set actually obtained. This distribution correctly describes the variation of the estimator over all possible data sets.

<math>P(\hat{\theta}) = \int_{-\infty}^{\infty} \prod_{i=1}^{N} P(x_i|\theta) \, \delta(\hat{\theta} - \hat{\theta}(\vec{x})) \, d\vec{x}</math>
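The integral above is the distribution of the estimator's value when the data are drawn repeatedly at a fixed <math>\theta</math>, so it can be approximated by simulation. The sketch below assumes independent normal observations with unit variance; `sampling_distribution` is a hypothetical helper name.

```python
import random
import statistics


def sampling_distribution(estimator, theta, sample_size, trials, seed=0):
    """Return estimator values over many simulated data sets at a fixed theta."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        # Each data set is N independent draws from N(theta, 1) (an assumption).
        sample = [rng.gauss(theta, 1.0) for _ in range(sample_size)]
        estimates.append(estimator(sample))
    return estimates


estimates = sampling_distribution(statistics.fmean, theta=2.0, sample_size=25, trials=20000)
center = statistics.fmean(estimates)   # location of the sampling distribution
spread = statistics.stdev(estimates)   # its standard deviation
```

Note that the true <math>\theta</math> must be supplied to run the simulation: this distribution describes variation over hypothetical data sets, not the one data set in hand.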

Bayesian statistics instead holds that the correct distribution to use is that which represents the probability an observer would give to the variable after observing the actual data set.

<math>P(\theta | x_1, \ldots, x_N, I) = \frac {P(\vec{x} | \theta, I) P(\theta | I)} {P(\vec{x} | I)} \propto P(\vec{x} | \theta, I) P(\theta | I)</math>

where <math>I</math> represents some information the observer has about the nature of the variable <math>\theta</math>. This distribution correctly describes the observer's state of knowledge about the parameter to be estimated after taking the observed data set into consideration.
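The proportionality in Bayes' theorem can be evaluated numerically on a grid of candidate parameter values. The following is a rough sketch, assuming normal data with known unit variance and a flat prior over the grid; `grid_posterior` and the grid limits are illustrative choices.

```python
import math
import random


def grid_posterior(data, grid, prior, sigma=1.0):
    """Normalized posterior weights on `grid` for the mean of N(theta, sigma^2) data."""
    log_post = []
    for theta, p in zip(grid, prior):
        # Log-likelihood up to an additive constant that does not depend on theta.
        log_lik = sum(-0.5 * ((x - theta) / sigma) ** 2 for x in data)
        log_post.append(log_lik + math.log(p))
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)  # normalizing constant, the grid analogue of P(x | I)
    return [w / total for w in weights]


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(25)]
grid = [i / 100 for i in range(-200, 601)]   # theta from -2.00 to 6.00
prior = [1.0 / len(grid)] * len(grid)        # flat prior on the grid (an assumption)
posterior = grid_posterior(data, grid, prior)
posterior_mean = sum(t * w for t, w in zip(grid, posterior))
```

Unlike a simulation over hypothetical data sets, this calculation uses only the observed data and the prior; the true parameter value never enters.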

These alternative viewpoints can sometimes (but not always) produce the same mean ± standard deviation answer, for example when estimating the mean of a normally distributed data set (Jaynes).
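The normal-mean case can be checked directly. The sketch below assumes a known data variance and a conjugate normal prior, for which the posterior has a standard closed form; the function name `normal_posterior` and the prior settings are hypothetical.

```python
import math
import random
import statistics


def normal_posterior(data, sigma, prior_mean, prior_sd):
    """Posterior mean and sd of theta for N(theta, sigma^2) data and a normal prior."""
    n = len(data)
    # Precisions (inverse variances) of the prior and the data add.
    precision = 1.0 / prior_sd**2 + n / sigma**2
    mean = (prior_mean / prior_sd**2 + sum(data) / sigma**2) / precision
    return mean, math.sqrt(1.0 / precision)


rng = random.Random(1)
sigma = 1.0
data = [rng.gauss(2.0, sigma) for _ in range(25)]

# Frequentist answer: sample mean with standard error sigma / sqrt(N).
freq_estimate = statistics.fmean(data)
freq_std_error = sigma / math.sqrt(len(data))

# Bayesian answer with a very broad prior, then with an informative one.
post_mean, post_sd = normal_posterior(data, sigma, prior_mean=0.0, prior_sd=1e3)
tight_mean, tight_sd = normal_posterior(data, sigma, prior_mean=0.0, prior_sd=0.1)
```

With the broad prior the two answers coincide to many decimal places, while the informative prior pulls the posterior mean toward the prior mean, which is one way the viewpoints can disagree.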

References
