{{SI}}
{{EH}}

In [[statistics]], the '''power transform''' is a family of transformations that map [[data]] from one space to another using power functions. It is a useful data (pre)[[processing]] technique for stabilizing variance, making the data more [[normal distribution]]-like and improving the correlation between variables, among other data stabilization purposes. The '''Box–Cox transformation''', introduced by statisticians [[George E. P. Box]] and [[David Cox (statistician)|David Cox]], is one particular way of parameterising a power transform that has advantageous properties.

==Definition==

The power transformation is defined as a continuously varying function, with respect to the power parameter ''λ'', in piecewise form that makes it continuous at the point of singularity (''λ'' = 0). For data vectors (''y''<sub>1</sub>,..., ''y''<sub>''n''</sub>) in which each ''y''<sub>''i''</sub> > 0, the power transform is

: <math>y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^\lambda-1}{\lambda(\operatorname{GM}(y))^{\lambda -1}} , &\mbox{if } \lambda \neq 0 \\ \operatorname{GM}(y)\log{y_i} , &\mbox{if } \lambda = 0 \end{cases}</math>

where

: <math> \operatorname{GM}(y) = (y_1\cdots y_n)^{1/n} \, </math>

is the [[geometric mean]] of the observations ''y''<sub>1</sub>, ..., ''y''<sub>''n''</sub>.
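
A minimal Python sketch of the definition above, assuming plain lists of positive floats; the function names are illustrative and not taken from any library. The geometric mean is computed from the mean of the logarithms, which avoids overflow in the product ''y''<sub>1</sub>⋯''y''<sub>''n''</sub>.

```python
import math
from typing import List, Sequence


def geometric_mean(y: Sequence[float]) -> float:
    """GM(y) = (y_1 * ... * y_n)^(1/n), computed via the mean of the logs."""
    if not y or any(v <= 0 for v in y):
        raise ValueError("need a non-empty sequence of positive observations")
    return math.exp(sum(math.log(v) for v in y) / len(y))


def power_transform(y: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform scaled by GM(y)^(lam - 1) (hypothetical helper)."""
    gm = geometric_mean(y)
    if lam == 0:
        # Limiting case lambda = 0: GM(y) * log(y_i)
        return [gm * math.log(v) for v in y]
    scale = lam * gm ** (lam - 1)
    return [(v ** lam - 1) / scale for v in y]


if __name__ == "__main__":
    data = [1.0, 2.0, 4.0, 8.0]
    for lam in (-1.0, 0.0, 0.5, 1.0):
        print(lam, [round(z, 3) for z in power_transform(data, lam)])
```

For ''λ'' = 1 the scale factor is 1, so the transform reduces to ''y''<sub>''i''</sub> − 1; evaluating a very small nonzero ''λ'' gives values close to the ''λ'' = 0 branch, which is the continuity the definition is built to provide.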

Including the (''λ'' − 1)th power of the geometric mean in the denominator keeps the transformed values in the same units of measurement as ''y'' for every ''λ''. This makes it possible to compare the sums of squares of [[errors and residuals in statistics|residuals]] obtained with different values of ''λ'' and to choose the value of ''λ'' that minimizes that sum.
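
The selection rule just described can be sketched as a grid search. This sketch assumes a single predictor ''x'' and an ordinary least-squares straight-line fit; the helper names and the grid are illustrative. Because the scaled transform keeps the units of ''y'', multiplying ''y'' by a constant ''c'' multiplies every residual sum of squares by ''c''<sup>2</sup> (for a model with an intercept), so the chosen ''λ'' does not depend on the measurement units.

```python
import math
from typing import List, Sequence


def scaled_power_transform(y: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform of positive data, scaled by GM(y)^(lam - 1)."""
    gm = math.exp(sum(math.log(v) for v in y) / len(y))
    if lam == 0:
        return [gm * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * gm ** (lam - 1)) for v in y]


def line_rss(x: Sequence[float], z: Sequence[float]) -> float:
    """Residual sum of squares of the least-squares line z = a + b*x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    return sum((zi - mz - b * (xi - mx)) ** 2 for xi, zi in zip(x, z))


def choose_lambda(x: Sequence[float], y: Sequence[float], grid: Sequence[float]) -> float:
    """Grid value of lambda whose scaled transform gives the smallest RSS."""
    return min(grid, key=lambda lam: line_rss(x, scaled_power_transform(y, lam)))
```

With data that are exactly log-linear in ''x'', the ''λ'' = 0 branch fits a straight line perfectly, so a grid containing 0 selects it, whether ''y'' is recorded in the original units or multiplied by 1000.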
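
For the unscaled transform (''y''<sup>''λ''</sup> − 1)/''λ'' (that is, the definition above with GM(''y'') set to 1), the value at ''y'' = 1 is 0 for any ''λ'', and the [[derivative]] with respect to ''y'' there is 1 for any ''λ''. Sometimes ''y'' is a rescaled version of some other variable, chosen so that ''y'' = 1 at a typical (for example, average) value.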

The transformation is a [[power (mathematics)|power]] transformation, but constructed so that it is [[continuous function|continuous]] in the parameter ''λ'' at ''λ'' = 0. It has proved popular in [[regression analysis]], including [[econometrics]].
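
For the unscaled form (''y''<sup>''λ''</sup> − 1)/''λ'', expanding the exponential shows why the ''λ'' = 0 branch is the logarithm, and differentiating gives the unit slope at ''y'' = 1:

: <math>\frac{y^\lambda - 1}{\lambda} = \frac{e^{\lambda \log y} - 1}{\lambda} = \log y + \frac{\lambda (\log y)^2}{2} + O(\lambda^2) \;\to\; \log y \quad (\lambda \to 0),</math>

: <math>\frac{d}{dy}\,\frac{y^\lambda - 1}{\lambda} = y^{\lambda - 1} \quad\mbox{and}\quad \frac{d}{dy}\log y = y^{-1},</math>

both of which equal 1 at ''y'' = 1. Multiplying by the constant GM(''y'')<sup>1 − ''λ''</sup> gives the scaled definition, whose limit as ''λ'' → 0 is GM(''y'') log ''y''<sub>''i''</sub>.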

Box and Cox also proposed a more general form of the transformation that incorporates a shift parameter ''α'', chosen so that every ''y''<sub>''i''</sub> + ''α'' > 0:

:<math>\tau(y_i;\lambda, \alpha) = \begin{cases} \dfrac{(y_i + \alpha)^\lambda - 1}{\lambda (\operatorname{GM}(y + \alpha))^{\lambda - 1}} & \mathrm{if}\ \lambda\neq 0, \\ \operatorname{GM}(y + \alpha)\log(y_i + \alpha) & \mathrm{if}\ \lambda=0,\end{cases}</math>

where GM(''y'' + ''α'') is the geometric mean of the shifted observations.
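
A minimal sketch of the shifted form, assuming ''α'' is supplied by the user rather than estimated, and normalizing by the geometric mean of the shifted values ''y''<sub>''i''</sub> + ''α''. The helper name is illustrative. The shift lets the transform be applied to data that contain zeros.

```python
import math
from typing import List, Sequence


def shifted_power_transform(y: Sequence[float], lam: float, alpha: float) -> List[float]:
    """Two-parameter Box-Cox transform tau(y_i; lam, alpha) (illustrative helper)."""
    shifted = [v + alpha for v in y]
    if any(s <= 0 for s in shifted):
        raise ValueError("alpha must make every y_i + alpha positive")
    # Geometric mean of the shifted observations, via the mean of their logs
    gm = math.exp(sum(math.log(s) for s in shifted) / len(shifted))
    if lam == 0:
        return [gm * math.log(s) for s in shifted]
    return [(s ** lam - 1) / (lam * gm ** (lam - 1)) for s in shifted]


if __name__ == "__main__":
    counts = [0.0, 1.0, 3.0, 7.0]  # contains a zero, so alpha = 0 would be rejected
    print(shifted_power_transform(counts, 0.5, alpha=1.0))
```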

If τ(''Y''; ''λ'', ''α'') follows a [[truncated normal distribution]], then ''Y'' is said to follow a [[Box–Cox distribution]].

==Use of the power transform==

* Power transforms are used in many fields, for example [http://portal.acm.org/citation.cfm?id=1172964.1173292&coll=&dl=acm&CFID=15151515&CFTOKEN=6184618 multi-resolution and wavelet analysis], [[statistical data analysis]], [http://www.andrologyjournal.org/cgi/reprint/23/5/629.pdf medical research], [http://www.springerlink.com/content/y25q020x24602701/ modeling of physical processes], [http://www.springerlink.com/content/mt81u60813077641/ geochemical data analysis], [http://www.blackwell-synergy.com/doi/abs/10.1111/j.1467-9876.2005.00476.x epidemiology] and many other clinical, environmental and social research areas.

== Example ==

The BUPA liver data set contains data on the liver enzymes [[Alanine transaminase|ALT]] and [[Gamma-glutamyl transpeptidase|γGT]]. The data can be found via the [[classic data sets]] page. Suppose we are interested in using log(γGT) to predict ALT. A plot of the data appears in panel (a) of the figure. The variance appears to be non-constant, so a Box–Cox transformation might help.

[[image:BUPA_BoxCox.JPG]]

The log-likelihood of the power parameter appears in panel (b). The horizontal reference line lies χ<sub>1</sub><sup>2</sup>(0.95)/2 ≈ 1.92 below the maximum; the values of ''λ'' whose log-likelihood lies above this line form an approximate 95% confidence interval for ''λ''. A value of ''λ'' close to zero appears adequate, so we take logs.
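
The interval read off panel (b) can be sketched as follows, assuming a straight-line model with one predictor (standing in for log(γGT)) and normal errors; the BUPA data themselves are not reproduced, and the function names and grid are hypothetical. For fixed ''λ'', maximizing the likelihood over the regression coefficients and error variance leaves, up to an additive constant, −(''n''/2) log(RSS/''n''), where RSS is the residual sum of squares of the geometric-mean-scaled transform; the scaling accounts for the Jacobian of the transformation.

```python
import math
from typing import List, Sequence, Tuple

CHI2_1_95 = 3.841458820694124  # 95% quantile of the chi-squared distribution with 1 df


def scaled_power_transform(y: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform of positive data, scaled by GM(y)^(lam - 1)."""
    gm = math.exp(sum(math.log(v) for v in y) / len(y))
    if lam == 0:
        return [gm * math.log(v) for v in y]
    return [(v ** lam - 1) / (lam * gm ** (lam - 1)) for v in y]


def line_rss(x: Sequence[float], z: Sequence[float]) -> float:
    """Residual sum of squares of the least-squares line z = a + b*x."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    return sum((zi - mz - b * (xi - mx)) ** 2 for xi, zi in zip(x, z))


def profile_loglik(x: Sequence[float], y: Sequence[float], lam: float) -> float:
    """Profile log-likelihood of lambda, up to an additive constant."""
    n = len(y)
    return -0.5 * n * math.log(line_rss(x, scaled_power_transform(y, lam)) / n)


def lambda_interval(
    x: Sequence[float], y: Sequence[float], grid: Sequence[float]
) -> Tuple[float, float, float]:
    """Grid MLE of lambda plus the smallest and largest grid values whose
    log-likelihood lies within CHI2_1_95 / 2 of the maximum (assumes a
    unimodal profile, so those grid values form one interval)."""
    ll = [profile_loglik(x, y, lam) for lam in grid]
    best = max(range(len(grid)), key=lambda i: ll[i])
    cutoff = ll[best] - CHI2_1_95 / 2
    inside = [lam for lam, v in zip(grid, ll) if v >= cutoff]
    return grid[best], min(inside), max(inside)
```

A finer grid gives a sharper interval at the cost of more likelihood evaluations; a root finder on ℓ(''λ'') − cutoff would be an alternative design.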

The transformation could possibly be improved by adding a shift parameter to the log transformation. Panel (c) of the figure shows the log-likelihood as a function of the shift parameter. Here the maximum of the likelihood is close to zero, suggesting that a shift parameter is not needed. The final panel shows the transformed data with a superimposed regression line.

Note that although Box–Cox transformations can greatly improve model fit, there are some problems they cannot fix. In the current example, the data are rather heavy-tailed, so the assumption of normality is not realistic, and a [[robust regression]] approach leads to a more precise model.

== Econometric application ==

Economists often characterize production relationships by some variant of the Box–Cox transformation.

Consider a common representation of production ''Q'' as dependent on services provided by a capital stock ''K'' and by labor hours ''N'', where τ denotes the unscaled Box–Cox transformation τ(''x'') = (''x''<sup>''λ''</sup> − 1)/''λ'' with the same ''λ'' applied to all three variables:

:<math>\tau(Q)=\alpha \tau(K)+ (1-\alpha)\tau(N).\,</math>

Solving for ''Q'' by inverting the Box–Cox transformation, we find

:<math>Q=\big(\alpha K^\lambda + (1-\alpha) N^\lambda\big)^{1/\lambda},\,</math>

which is known as the ''constant elasticity of substitution (CES)'' production function.
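
Assuming τ is the unscaled transform (''x''<sup>''λ''</sup> − 1)/''λ'' with one ''λ'' ≠ 0 shared by ''Q'', ''K'' and ''N'', the inversion is a short calculation: after multiplying both sides by ''λ'', the constant terms on the right sum to −''α'' − (1 − ''α'') = −1, matching the −1 on the left, and raising to the power 1/''λ'' gives the CES form.

: <math>\frac{Q^\lambda - 1}{\lambda} = \alpha\,\frac{K^\lambda - 1}{\lambda} + (1-\alpha)\,\frac{N^\lambda - 1}{\lambda} \;\Longrightarrow\; Q^\lambda - 1 = \alpha K^\lambda + (1-\alpha) N^\lambda - 1 \;\Longrightarrow\; Q = \big(\alpha K^\lambda + (1-\alpha) N^\lambda\big)^{1/\lambda}.</math>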

The CES production function is a [[homogeneous function]] of degree one.
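
Degree-one homogeneity of ''Q'' = (''αK''<sup>''λ''</sup> + (1 − ''α'')''N''<sup>''λ''</sup>)<sup>1/''λ''</sup> can be checked directly: scaling both inputs by any ''t'' > 0 scales output by ''t''.

: <math>\big(\alpha (tK)^\lambda + (1-\alpha)(tN)^\lambda\big)^{1/\lambda} = \big(t^\lambda\big)^{1/\lambda}\big(\alpha K^\lambda + (1-\alpha) N^\lambda\big)^{1/\lambda} = t\,Q.</math>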

When ''λ'' = 1, this produces the linear production function:

: <math>Q=\alpha K + (1-\alpha)N.\,</math>

In the limit ''λ'' → 0, it becomes the well-known [[Cobb-Douglas]] production function:

: <math>Q=K^\alpha N^{1-\alpha}.\,</math>
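
The two special cases, and the degree-one homogeneity, can be checked numerically with a short sketch. The function names and the parameter values (''K'' = 4, ''N'' = 9, ''α'' = 0.3) are arbitrary illustrations, and a very small ''λ'' stands in for the limit ''λ'' → 0, which the CES formula cannot evaluate directly.

```python
import math


def ces(K: float, N: float, alpha: float, lam: float) -> float:
    """CES output (alpha*K^lam + (1 - alpha)*N^lam)^(1/lam), for lam != 0."""
    return (alpha * K ** lam + (1 - alpha) * N ** lam) ** (1 / lam)


def cobb_douglas(K: float, N: float, alpha: float) -> float:
    """Cobb-Douglas output K^alpha * N^(1 - alpha)."""
    return K ** alpha * N ** (1 - alpha)


if __name__ == "__main__":
    K, N, alpha = 4.0, 9.0, 0.3
    print(ces(K, N, alpha, 1.0))  # linear case: alpha*K + (1 - alpha)*N
    print(ces(K, N, alpha, 1e-6), cobb_douglas(K, N, alpha))  # nearly equal
    print(ces(2 * K, 2 * N, alpha, 0.5) / ces(K, N, alpha, 0.5))  # about 2
```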

==Activities and demonstrations==

The [[SOCR]] resource pages contain a number of [http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_PowerTransformFamily_Graphs hands-on interactive activities] demonstrating the Box–Cox (power) transformation using Java applets and charts. These illustrate the effects of this transform on [[Qq plot|Q–Q plot]]s, X-Y [[scatterplot]]s, [[time-series]] plots and [[histogram]]s.

==References==

* {{cite journal | last = Box | first = George E. P. | authorlink = George EP Box | coauthors = [[David Cox (statistician)|Cox, D. R.]] | title = An analysis of transformations | journal = Journal of the Royal Statistical Society, Series B | volume = 26 | pages = 211–246 | date = 1964 | url=http://www.jstor.org/stable/2984418}}
* Carroll, RJ and Ruppert, D. [http://wiki.stat.ucla.edu/socr/uploads/b/b8/PowerTransformFamily_Biometrica609.pdf On prediction and the power transformation family]. Biometrika 68: 609–615.
* {{cite journal | last = DeGroot| first = M. H.| title = A Conversation with George Box | journal = Statistical Science | volume = 2 | pages = 239–258 | date = 1987| doi = 10.1214/ss/1177013223}}
* Handelsman, DJ. Optimal Power Transformations for Analysis of Sperm Concentration and Other Semen Variables. Journal of Andrology, Vol. 23, No. 5, September/October 2002.
* Gluzman, S and Yukalov, VI. Self-similar power transforms in extrapolation problems. Journal of Mathematical Chemistry, Volume 39, Number 1, January 2006, pages 47–56, DOI 10.1007/s10910-005-9003-7.
* Howarth, RJ and Earle, SAM. Application of a generalized power transformation to geochemical data. Mathematical Geology, Volume 11, Number 1, February 1979, pages 45–62, DOI 10.1007/BF01043245.
* Peters, JL, Rushton, L, Sutton, AJ, Jones, DR, Abrams, KR and Mugglestone, MA (2005). Bayesian methods for the cross-design synthesis of epidemiological and toxicological evidence. [[Journal of the Royal Statistical Society]]: Series C (Applied Statistics) 54 (1), 159–172, doi:10.1111/j.1467-9876.2005.00476.x

==External links==

* [http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_PowerTransformFamily_Graphs SOCR Power Transform Activities and Applets]
* [http://www.stat.uconn.edu/~studentjournal/index_files/pengfi_s05.pdf Box–Cox Transformation: An Overview, Pengfei Li]

[[Category:Statistics]]

{{SIB}}

[[de:Box-Cox-Transformation]]
[[eu:Box-Cox aldakuntza]]
[[pl:Przekształcenie Boxa-Coxa]]

{{WH}}
{{WS}}