Degrees of freedom (statistics)

Revision as of 11:26, 10 June 2012 by WikiBot (talk | contribs)


Overview

In statistics, the term degrees of freedom has two distinct senses.

Residuals

In fitting statistical models to data, the vectors of residuals are often constrained to lie in a space of smaller dimension than the number of components in the vector. That smaller dimension is the number of degrees of freedom for error.

Perhaps the simplest example is this. Suppose

<math>X_1,\dots,X_n\,</math>

are random variables each with expected value μ, and let

<math>\overline{X}_n={X_1+\cdots+X_n \over n}</math>

be the "sample mean". Then the quantities

<math>X_i-\overline{X}_n\,</math>

are residuals that may be considered estimates of the errors <math>X_i-\mu</math>. The sum of the residuals (unlike the sum of the errors) is necessarily 0, so the residuals are constrained to lie in a space of dimension n − 1. If one knows the values of any n − 1 of the residuals, one can thus find the last one. One says that "there are n − 1 degrees of freedom for error."
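This constraint can be checked numerically. The following sketch (using made-up data, not from the article) verifies that the residuals about the sample mean sum to zero, so the last residual is determined by the other n − 1:

```python
# Hypothetical sample; any numbers would do.
samples = [2.0, 4.5, 3.1, 5.4, 1.0]
n = len(samples)

mean = sum(samples) / n
residuals = [x - mean for x in samples]

# The residuals sum to zero (up to floating-point rounding) ...
total = sum(residuals)

# ... so any n - 1 of them determine the remaining one.
last_from_others = -sum(residuals[:-1])
```

Here `total` is zero up to rounding error, and `last_from_others` reproduces `residuals[-1]`, illustrating the single linear constraint that removes one degree of freedom.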

A slightly less simple example is least-squares estimation of a and b in the model

<math>Y_i=a+bx_i+\varepsilon_i\ \mathrm{for}\ i=1,\dots,n</math>

where the ε<sub>i</sub>, and hence the Y<sub>i</sub>, are random. Let <math>\widehat{a}</math> and <math>\widehat{b}</math> be the least-squares estimates of a and b. Then the residuals

<math>e_i=y_i-(\widehat{a}+\widehat{b}x_i)\,</math>

are constrained to lie within the space defined by the two equations

<math>e_1+\cdots+e_n=0,\,</math>
<math>x_1 e_1+\cdots+x_n e_n=0.\,</math>

One says that there are n − 2 degrees of freedom for error.

The capital Y is used in specifying the model, and lower-case y in the definition of the residuals. That is because the former are hypothesized random variables and the latter are data.
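Both constraints can be verified directly from the closed-form least-squares estimates. The sketch below (with hypothetical data) fits the line and checks that the residuals satisfy the two equations, leaving n − 2 degrees of freedom:

```python
# Hypothetical data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form least-squares estimates of b and a.
b_hat = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
        / sum((x - x_bar) ** 2 for x in xs)
a_hat = y_bar - b_hat * x_bar

residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]

# The two linear constraints on the residuals
# (each holds up to floating-point rounding):
c1 = sum(residuals)                                # e_1 + ... + e_n = 0
c2 = sum(x * e for x, e in zip(xs, residuals))     # x_1 e_1 + ... + x_n e_n = 0
```

Two independent linear constraints confine the n residuals to an (n − 2)-dimensional space, which is why one says there are n − 2 degrees of freedom for error.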

Another simple and frequently seen example arises in multiple comparisons.

Parameters in probability distributions

The probability distributions of residuals are often parametrized by these numbers of degrees of freedom. Thus one speaks of a chi-square distribution, a Student's t-distribution, or a Wishart distribution with a specified number of degrees of freedom, or of an F-distribution with specified numbers of degrees of freedom in the numerator and in the denominator.

In the familiar uses of these distributions, the number of degrees of freedom takes only integer values. The underlying mathematics in most cases allows for fractional degrees of freedom, which can arise in more sophisticated uses.
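One common case where fractional degrees of freedom arise is the Welch–Satterthwaite approximation used in Welch's t-test for two samples with unequal variances. A sketch with hypothetical data:

```python
# Hypothetical two-sample data.
a = [4.2, 5.1, 3.8, 4.9, 5.3, 4.4]
b = [6.0, 7.2, 6.8]

def sample_var(xs):
    """Unbiased sample variance (divides by len - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n1, n2 = len(a), len(b)
v1 = sample_var(a) / n1
v2 = sample_var(b) / n2

# Welch-Satterthwaite effective degrees of freedom;
# generally not an integer.
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
```

The resulting `df` lies between min(n1, n2) − 1 and n1 + n2 − 2 but is typically a non-integer value, which is then used with the t-distribution.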


