Pearson's r
Editor-In-Chief: C. Michael Gibson, M.S., M.D. [1]
In statistics, the Pearson product-moment correlation coefficient (r), sometimes referred to as the MCV or PMCC, is a common measure of the correlation between two variables X and Y. When measured in a population, it is designated by the Greek letter rho (ρ). When computed in a sample, it is designated by the letter r and is sometimes called "Pearson's r." Pearson's correlation reflects the degree of linear relationship between two variables and ranges from −1 to +1. A correlation of +1 means that there is a perfect positive linear relationship between the variables, a correlation of −1 means that there is a perfect negative linear relationship, and a correlation of 0 means that there is no linear relationship. In practice, sample correlations are rarely if ever exactly 0, 1, or −1. The sign of the coefficient indicates whether the relationship is positive or negative.[1]
The statistic is defined as the sum of the products of the standard scores of the two measures divided by the degrees of freedom.[1] If the data come from a sample, then
- <math>r = \frac {1}{n - 1} \sum ^n _{i=1} \left( \frac{X_i - \bar{X}}{s_X} \right) \left( \frac{Y_i - \bar{Y}}{s_Y} \right)</math>
where
- <math>\frac{X_i - \bar{X}}{s_X}, \bar{X}, \text{ and } s_X</math>
are the standard score, sample mean, and sample standard deviation (calculated using n − 1 in the denominator).[1]
If the data comes from a population, then
- <math>\rho = \frac {1}{n} \sum ^n _{i=1} \left( \frac{X_i - \mu_X}{\sigma_X} \right) \left( \frac{Y_i - \mu_Y}{\sigma_Y} \right)</math>
where
- <math>\frac{X_i - \mu_X}{\sigma_X}, \mu_X, \text{ and } \sigma_X</math>
are the standard score, population mean, and population standard deviation (calculated using n in the denominator).
The result obtained is equivalent to dividing the covariance between the two variables by the product of their standard deviations.
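The definitions above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the small data set are illustrative assumptions, not part of the original text. It computes the coefficient three ways: from sample standard scores (dividing by n − 1), from population standard scores (dividing by n), and as the covariance divided by the product of the standard deviations.

```python
from statistics import mean, stdev, pstdev


def pearson_sample(xs, ys):
    """Sum of products of sample standard scores, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # n - 1 in the denominator
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


def pearson_population(xs, ys):
    """Sum of products of population standard scores, divided by n."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # n in the denominator
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n


def pearson_from_covariance(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (stdev(xs) * stdev(ys))


# Hypothetical example data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r_sample = pearson_sample(xs, ys)
r_population = pearson_population(xs, ys)
r_covariance = pearson_from_covariance(xs, ys)
print(r_sample, r_population, r_covariance)  # all approximately 0.7746
```

Applied to the same data, the sample and population formulas return the same number, because the factors of n − 1 (or n) cancel as long as the standard deviations and the outer divisor use the same denominator.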
The coefficient ranges from −1 to 1. A value of 1 shows that a linear equation describes the relationship perfectly and positively, with all data points lying on the same line and with Y increasing with X. A value of −1 shows that all data points lie on a single line but that Y decreases as X increases. A value of 0 shows that a linear model is inappropriate – that there is no linear relationship between the variables.[1]
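The three boundary cases can be illustrated with a short sketch. The helper `pearson_r` and the data below are hypothetical, chosen so that the results are easy to verify by hand. The last case uses a curved (quadratic) relationship to show that a coefficient of 0 rules out a linear relationship only, not every kind of dependence.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's r as covariance over the product of standard deviations.

    The n - 1 (or n) factors cancel, so sums of squares are used directly.
    """
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

r_up = pearson_r(xs, [3 + 2 * x for x in xs])    # points on a rising line
r_down = pearson_r(xs, [3 - 2 * x for x in xs])  # points on a falling line
r_curve = pearson_r(xs, [x * x for x in xs])     # symmetric parabola

print(r_up, r_down, r_curve)  # approximately 1, -1 and 0
```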
The Pearson coefficient is a statistic which estimates the correlation of the two given random variables.
The linear equation that best describes the relationship between X and Y can be found by linear regression. This equation can be used to "predict" the value of one measurement from knowledge of the other. That is, for each value of X the equation calculates a value which is the best estimate of the values of Y corresponding to that specific value of X. We denote this predicted variable by Y′.
Any value of Y can therefore be defined as the sum of Y′ and the difference between Y and Y′:
- <math>Y = Y^\prime + (Y - Y^\prime).</math>
The variance of Y is equal to the sum of the variances of the two components of Y:
- <math>s_y^2 = s_{y^\prime}^2 + s^2_{y.x}.</math>
Since the coefficient of determination implies that <math>s_{y.x}^2 = s_y^2 (1 - r^2)</math>, we can derive the identity
- <math>r^2 = {s_{y^\prime}^2 \over s_y^2}.</math>
The square of r is conventionally used as a measure of the association between X and Y. For example, if <math>r^2</math> is 0.90, then 90% of the variance of Y can be "accounted for" by changes in X and the linear relationship between X and Y.[1]
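The variance decomposition can be verified on a small example. This is a sketch under stated assumptions: the data are hypothetical, the regression line is fitted by ordinary least squares, and all three variances use the same n − 1 denominator (some texts divide the residual variance by n − 2 instead, in which case the identity holds only approximately).

```python
from statistics import mean, variance

# Hypothetical example data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

mx, my = mean(xs), mean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

# Least-squares regression line Y' = intercept + slope * X.
slope = sxy / sxx
intercept = my - slope * mx

predicted = [intercept + slope * x for x in xs]     # Y'
residuals = [y - p for y, p in zip(ys, predicted)]  # Y - Y'

var_y = variance(ys)                # s_y^2
var_predicted = variance(predicted)  # s_{y'}^2
var_residual = variance(residuals)   # s_{y.x}^2 (n - 1 denominator assumed)

r = sxy / (sxx * syy) ** 0.5

print(var_y, var_predicted + var_residual)  # the two should agree
print(r ** 2, var_predicted / var_y)        # both approximately 0.6
```

In this example about 60% of the variance of Y is accounted for by the fitted line, and the remaining 40% is the variance of the residuals.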
See also
- Linear correlation (wikiversity)
- Spearman's rank correlation coefficient
References