Covariance matrix
Overview
In statistics and probability theory, the covariance matrix is a matrix of covariances between elements of a vector. It is the natural generalization to higher dimensions of the concept of the variance of a scalar-valued random variable.
Definition
If entries in the column vector
- <math>X = \begin{bmatrix}X_1 \\ \vdots \\ X_n \end{bmatrix}</math>
are random variables, each with finite variance, then the covariance matrix Σ is the matrix whose (i, j) entry is the covariance
- <math>
\Sigma_{ij} = \mathrm{E}\left[ (X_i - \mu_i)(X_j - \mu_j) \right] </math>
where
- <math>\mu_i = \mathrm{E}(X_i)\,</math>
is the expected value of the ith entry in the vector X. In other words, we have
- <math>
\Sigma = \begin{bmatrix}
\mathrm{E}[(X_1 - \mu_1)(X_1 - \mu_1)] & \mathrm{E}[(X_1 - \mu_1)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_1 - \mu_1)(X_n - \mu_n)] \\ \\ \mathrm{E}[(X_2 - \mu_2)(X_1 - \mu_1)] & \mathrm{E}[(X_2 - \mu_2)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_2 - \mu_2)(X_n - \mu_n)] \\ \\ \vdots & \vdots & \ddots & \vdots \\ \\ \mathrm{E}[(X_n - \mu_n)(X_1 - \mu_1)] & \mathrm{E}[(X_n - \mu_n)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_n - \mu_n)(X_n - \mu_n)]
\end{bmatrix}. </math>
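To illustrate the entry-wise definition numerically, here is a minimal sketch (assuming NumPy; the data and dimensions are made up for illustration and are not part of the article) that estimates each <math>\Sigma_{ij}</math> by averaging products of centred components and compares the result with the library routine numpy.cov:
<pre>
import numpy as np

rng = np.random.default_rng(0)

# Draw N observations of a 3-dimensional random vector X (illustrative numbers).
N = 100_000
X = rng.multivariate_normal(mean=[0.0, 1.0, -2.0],
                            cov=[[2.0, 0.5, 0.0],
                                 [0.5, 1.0, 0.3],
                                 [0.0, 0.3, 1.5]],
                            size=N)                      # shape (N, 3)

mu = X.mean(axis=0)                                      # estimates of E(X_i)

# Entry-wise estimate of Sigma_ij = E[(X_i - mu_i)(X_j - mu_j)].
n = X.shape[1]
Sigma = np.empty((n, n))
for i in range(n):
    for j in range(n):
        Sigma[i, j] = np.mean((X[:, i] - mu[i]) * (X[:, j] - mu[j]))

# numpy.cov with bias=True uses the same 1/N normalisation.
print(np.allclose(Sigma, np.cov(X, rowvar=False, bias=True)))   # True
</pre>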
As a generalization of the variance
The definition above is equivalent to the matrix equality
- <math>
\Sigma=\mathrm{E} \left[
\left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top
\right] </math>
This form can be seen as a generalization of the scalar-valued variance to higher dimensions. Recall that for a scalar-valued random variable X
- <math>
\sigma^2 = \mathrm{var}(X) = \mathrm{E}[(X-\mu)^2], \, </math>
where
- <math>\mu = \mathrm{E}(X).\,</math>
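As a quick sanity check of this correspondence, specialising the matrix form to the case n = 1 recovers the scalar variance:
- <math>
\Sigma = \mathrm{E}\left[ (X_1 - \mu_1)(X_1 - \mu_1)^\top \right] = \mathrm{E}\left[ (X_1 - \mu_1)^2 \right] = \operatorname{var}(X_1). </math>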
Conflicting nomenclatures and notations
Nomenclatures differ. Some statisticians, following the probabilist William Feller, call this matrix the variance of the random vector <math>X</math>, because it is the natural generalization to higher dimensions of the 1-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector <math>X</math>. Thus
- <math>
\operatorname{var}(\textbf{X}) = \operatorname{cov}(\textbf{X}) = \mathrm{E} \left[
(\textbf{X} - \mathrm{E} [\textbf{X}]) (\textbf{X} - \mathrm{E} [\textbf{X}])^\top
\right] </math>
However, the notation for the "cross-covariance" between two vectors is standard:
- <math>
\operatorname{cov}(\textbf{X},\textbf{Y}) = \mathrm{E} \left[
(\textbf{X} - \mathrm{E}[\textbf{X}]) (\textbf{Y} - \mathrm{E}[\textbf{Y}])^\top
\right] </math>
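As a small illustration of the cross-covariance (again a sketch assuming NumPy; the helper cross_cov and the data are hypothetical), for a p-dimensional <math>\textbf{X}</math> and a q-dimensional <math>\textbf{Y}</math> the result is a p × q matrix:
<pre>
import numpy as np

def cross_cov(X, Y):
    """Sample estimate of cov(X, Y) = E[(X - EX)(Y - EY)^T].

    X has shape (N, p), Y has shape (N, q); the result has shape (p, q).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    return Xc.T @ Yc / X.shape[0]

rng = np.random.default_rng(1)
X = rng.standard_normal((50_000, 3))                       # p = 3
Y = X[:, :2] + 0.1 * rng.standard_normal((50_000, 2))      # q = 2, correlated with X
print(cross_cov(X, Y).shape)                               # (3, 2)
</pre>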
The <math>\operatorname{var}</math> notation is found in William Feller's two-volume book ''An Introduction to Probability Theory and Its Applications'', but both forms are quite standard and there is no ambiguity between them.
Properties
For <math>\Sigma=\mathrm{E} \left[ \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top \right]</math> and <math> \mu = \mathrm{E}(\textbf{X})</math> the following basic properties apply:
- <math> \Sigma = \mathrm{E}(\mathbf{X X^\top}) - \mathbf{\mu}\mathbf{\mu^\top} </math>
- <math> \mathbf{\Sigma}</math> is positive semi-definite
- <math> \operatorname{var}(\mathbf{A X} + \mathbf{a}) = \mathbf{A}\, \operatorname{var}(\mathbf{X})\, \mathbf{A^\top} </math>
- <math> \operatorname{cov}(\mathbf{X},\mathbf{Y}) = \operatorname{cov}(\mathbf{Y},\mathbf{X})^\top</math>
- <math> \operatorname{cov}(\mathbf{X_1} + \mathbf{X_2},\mathbf{Y}) = \operatorname{cov}(\mathbf{X_1},\mathbf{Y}) + \operatorname{cov}(\mathbf{X_2}, \mathbf{Y})</math>
- If p = q, then <math>\operatorname{var}(\mathbf{X} + \mathbf{Y}) = \operatorname{var}(\mathbf{X}) + \operatorname{cov}(\mathbf{X},\mathbf{Y}) + \operatorname{cov}(\mathbf{Y}, \mathbf{X}) + \operatorname{var}(\mathbf{Y})</math>
- <math>\operatorname{cov}(\mathbf{AX}, \mathbf{BY}) = \mathbf{A}\, \operatorname{cov}(\mathbf{X}, \mathbf{Y}) \,\mathbf{B}^\top</math>
- If <math>\mathbf{X}</math> and <math>\mathbf{Y}</math> are independent, then <math>\operatorname{cov}(\mathbf{X}, \mathbf{Y}) = 0</math>
where <math>\mathbf{X}, \mathbf{X_1}</math> and <math>\mathbf{X_2}</math> are random <math>(p \times 1)</math> vectors, <math>\mathbf{Y}</math> is a random <math>(q \times 1)</math> vector, <math>\mathbf{a}</math> is a constant vector, and <math>\mathbf{A}</math> and <math>\mathbf{B}</math> are constant matrices of compatible dimensions.
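These identities can be checked numerically. The sketch below (an illustration assuming NumPy, not part of the article's derivation) uses sample covariances with the same 1/N normalisation throughout, for which the affine and bilinear identities hold exactly up to rounding:
<pre>
import numpy as np

rng = np.random.default_rng(2)
N, p, q = 10_000, 3, 2

X = rng.standard_normal((N, p))                  # rows are draws of X
Y = rng.standard_normal((N, q))                  # rows are draws of Y
A = rng.standard_normal((q, p))                  # maps p-vectors to q-vectors
B = rng.standard_normal((q, q))                  # maps q-vectors to q-vectors
a = rng.standard_normal(q)

def var(Z):                                      # sample covariance, 1/N normalisation
    Zc = Z - Z.mean(axis=0)
    return Zc.T @ Zc / Z.shape[0]

def cov(Z, W):                                   # sample cross-covariance
    Zc = Z - Z.mean(axis=0)
    Wc = W - W.mean(axis=0)
    return Zc.T @ Wc / Z.shape[0]

mu = X.mean(axis=0)

print(np.allclose(var(X), X.T @ X / N - np.outer(mu, mu)))        # Sigma = E(XX^T) - mu mu^T
print(np.allclose(var(X @ A.T + a), A @ var(X) @ A.T))            # var(AX + a) = A var(X) A^T
print(np.allclose(cov(X, Y), cov(Y, X).T))                        # cov(X, Y) = cov(Y, X)^T
print(np.allclose(cov(X @ A.T, Y @ B.T), A @ cov(X, Y) @ B.T))    # cov(AX, BY) = A cov(X, Y) B^T
</pre>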
The covariance matrix, though simple, is a useful tool in many different areas. From it a transformation matrix can be derived that completely decorrelates the data or, from another point of view, finds an optimal basis for representing the data compactly (see Rayleigh quotient for a formal proof and additional properties of covariance matrices). This is called principal component analysis (PCA) in statistics and the Karhunen-Loève transform (KL-transform) in image processing.
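A minimal sketch of this decorrelating transformation, assuming NumPy and illustrative data: the eigenvectors of the covariance matrix supply the new basis, and the covariance of the projected data is (numerically) diagonal.
<pre>
import numpy as np

rng = np.random.default_rng(3)

# Correlated 2-dimensional data (illustrative numbers).
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.2], [1.2, 1.0]], size=20_000)

Sigma = np.cov(X, rowvar=False)

# Eigendecomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(Sigma)

# Projecting the centred data onto the eigenvectors decorrelates it
# (principal component analysis / Karhunen-Loeve transform).
Y = (X - X.mean(axis=0)) @ eigvecs

C = np.cov(Y, rowvar=False)
print(np.allclose(C, np.diag(np.diag(C)), atol=1e-10))   # True: off-diagonal entries vanish
</pre>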
Which matrices are covariance matrices
From the identity
- <math>\operatorname{var}(\mathbf{a^\top}\mathbf{X}) = \mathbf{a^\top} \operatorname{var}(\mathbf{X}) \mathbf{a}\,</math>
and the fact that the variance of any real-valued random variable is nonnegative, it follows immediately that only a nonnegative-definite matrix can be a covariance matrix. The converse question is whether every nonnegative-definite symmetric matrix is a covariance matrix. The answer is "yes". To see this, suppose M is a p×p nonnegative-definite symmetric matrix. From the finite-dimensional case of the spectral theorem, it follows that M has a nonnegative-definite symmetric square root, which we may call <math>M^{1/2}</math>. Let <math>\mathbf{X}</math> be any p×1 column vector-valued random variable whose covariance matrix is the p×p identity matrix. Then
- <math>\operatorname{var}(M^{1/2}\mathbf{X}) = M^{1/2} (\operatorname{var}(\mathbf{X})) M^{1/2} = M.\,</math>
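This construction can be carried out numerically. Below is a sketch (assuming NumPy, with an arbitrary illustrative M) that builds the symmetric square root <math>M^{1/2}</math> from the spectral decomposition, applies it to a vector with identity covariance, and checks that the sample covariance of <math>M^{1/2}\mathbf{X}</math> is close to M:
<pre>
import numpy as np

rng = np.random.default_rng(4)

# An arbitrary nonnegative-definite symmetric matrix M (built here as G G^T).
G = rng.standard_normal((3, 3))
M = G @ G.T

# Symmetric square root via the spectral theorem: M = V diag(w) V^T.
w, V = np.linalg.eigh(M)
M_half = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
print(np.allclose(M_half @ M_half, M))                       # (M^{1/2})^2 = M

# X with covariance equal to the identity; then var(M^{1/2} X) = M.
N = 200_000
X = rng.standard_normal((N, 3))                              # rows are draws of X
Z = X @ M_half.T                                             # rows are draws of M^{1/2} X
print(np.allclose(np.cov(Z, rowvar=False), M, atol=0.1))     # approximately M
</pre>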
Complex random vectors
The variance of a complex scalar-valued random variable with expected value μ is conventionally defined using complex conjugation:
- <math>
\operatorname{var}(z) = \operatorname{E} \left[
(z-\mu)(z-\mu)^{*}
\right] </math>
where the complex conjugate of a complex number <math>z</math> is denoted <math>z^{*}</math>.
If <math>Z</math> is a column vector of complex-valued random variables, then we take the conjugate transpose, formed by both transposing and conjugating, and obtain the square matrix:
- <math>
\operatorname{E} \left[
(Z-\mu)(Z-\mu)^{*}
\right] </math>
where <math>Z^{*}</math> denotes the conjugate transpose of <math>Z</math>; this is consistent with the scalar case, since the transpose of a scalar is the scalar itself.
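A short sketch of the complex case, assuming NumPy and illustrative data: the matrix <math>\operatorname{E}[(Z-\mu)(Z-\mu)^{*}]</math> is formed with the conjugate transpose, and the result is Hermitian with real, nonnegative diagonal entries.
<pre>
import numpy as np

rng = np.random.default_rng(5)

# N draws of a 2-dimensional complex random vector Z (illustrative).
N = 100_000
Z = rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))
Z[:, 1] += 0.5 * Z[:, 0]                      # introduce some correlation

mu = Z.mean(axis=0)
Zc = Z - mu

# Sample version of E[(Z - mu)(Z - mu)^*], with * the conjugate transpose:
# entry (i, j) averages (Z_i - mu_i) * conj(Z_j - mu_j).
Sigma = Zc.T @ Zc.conj() / N

print(np.allclose(Sigma, Sigma.conj().T))      # Hermitian
print(np.allclose(np.diag(Sigma).imag, 0.0))   # real diagonal
print(np.all(np.diag(Sigma).real >= 0))        # nonnegative diagonal
</pre>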
Estimation
The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle. It involves the spectral theorem and the reason why it can be better to view a scalar as the trace of a 1 × 1 matrix than as a mere scalar. See estimation of covariance matrices.
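For concreteness, the estimator itself is the sample covariance centred at the sample mean with the 1/n normalisation (rather than 1/(n − 1)); a minimal sketch assuming NumPy and illustrative multivariate normal data:
<pre>
import numpy as np

rng = np.random.default_rng(6)

true_cov = np.array([[2.0, 0.7],
                     [0.7, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], true_cov, size=5_000)    # shape (n, 2)

n = X.shape[0]
Xc = X - X.mean(axis=0)

# Maximum-likelihood estimate for multivariate normal data: divide by n, not n - 1.
Sigma_mle = Xc.T @ Xc / n
print(np.allclose(Sigma_mle, np.cov(X, rowvar=False, bias=True)))   # True
</pre>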
Further Reading
- Covariance Matrix at MathWorld
- van Kampen, N. G. Stochastic processes in physics and chemistry. New York: North-Holland, 1981.