Tweedie distributions
In probability and statistics, the Tweedie distributions are a family of probability distributions that includes continuous distributions such as the normal and gamma, the purely discrete scaled Poisson distribution, and the class of compound Poisson-gamma distributions, which have positive mass at zero but are otherwise continuous.[1] Tweedie distributions belong to the exponential dispersion model family, a generalization of the exponential family whose members serve as the response distributions for generalized linear models.
Tweedie distributions have a mean <math>\mu</math> and a variance <math>\phi \mu^p</math>, where <math>\phi>0</math> is a dispersion parameter, and <math>p</math>, called the index parameter, (uniquely) determines the distribution in the Tweedie family. Special cases include:
- <math>p=0</math> is the normal distribution
- <math>p=1</math> with <math>\phi=1</math> is the Poisson distribution
- <math>p=2</math> is the gamma distribution
- <math>p=3</math> is the inverse Gaussian distribution.
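The mean-variance relationship and the four named special cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `tweedie_variance` and the lookup table `SPECIAL_CASES` are hypothetical names introduced here, not part of any library.

```python
def tweedie_variance(mu, phi, p):
    """Return the Tweedie variance phi * mu**p for mean mu.

    Hypothetical helper for illustration only.
    """
    if 0 < p < 1:
        raise ValueError("no Tweedie distribution exists for 0 < p < 1")
    if phi <= 0:
        raise ValueError("the dispersion parameter phi must be positive")
    if p != 0 and mu <= 0:
        # Only the normal case (p = 0) allows a non-positive mean.
        raise ValueError("mu must be positive unless p = 0")
    return phi * mu ** p


# Named special cases listed in the text, keyed by the index parameter p.
SPECIAL_CASES = {
    0: "normal",
    1: "Poisson (when phi = 1)",
    2: "gamma",
    3: "inverse Gaussian",
}

if __name__ == "__main__":
    for p, name in SPECIAL_CASES.items():
        variance = tweedie_variance(2.0, 1.0, p)
        print(f"p = {p}: {name}; variance at mu = 2, phi = 1 is {variance}")
```

With a fixed mean and dispersion, the variance grows with the index parameter whenever the mean exceeds 1, which is one way to see how <math>p</math> controls the mean-variance relationship.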
Tweedie distributions exist for all real values of <math>p</math> except <math>0<p<1</math>.[2] Apart from the four special cases identified above, their probability density functions have no closed form. However, software is available that accurately computes the Tweedie densities (and cumulative distribution functions).[3][4]
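To give a feel for what such a numerical evaluation involves, the sketch below evaluates the density for <math>1<p<2</math> by directly summing the Poisson-gamma mixture that characterizes this range of the index parameter. It is not the algorithm of the software cited above. The function names, the truncation point `n_max`, and the mapping from <math>(\mu, \phi, p)</math> to the Poisson rate and the gamma shape and scale are assumptions of this sketch; the mapping is the standard one but is not stated in this article.

```python
import math


def cp_gamma_params(mu, phi, p):
    """Map (mu, phi, p) with 1 < p < 2 to (Poisson rate, gamma shape, gamma scale).

    Assumed standard parameterization; it reproduces mean mu and variance phi * mu**p.
    """
    if not 1 < p < 2:
        raise ValueError("this representation requires 1 < p < 2")
    lam = mu ** (2 - p) / (phi * (2 - p))    # Poisson rate
    alpha = (2 - p) / (p - 1)                # gamma shape of each summand
    scale = phi * (p - 1) * mu ** (p - 1)    # gamma scale of each summand
    return lam, alpha, scale


def tweedie_density(y, mu, phi, p, n_max=200):
    """Return P(Y = 0) when y == 0, otherwise the density at y > 0.

    The series is truncated at n_max terms, an assumption that is adequate
    only for moderate parameter values.
    """
    lam, alpha, scale = cp_gamma_params(mu, phi, p)
    if y < 0:
        return 0.0
    if y == 0:
        return math.exp(-lam)                # probability of zero gamma summands
    log_terms = []
    for n in range(1, n_max + 1):
        shape = n * alpha                    # n gamma summands add up to shape n * alpha
        log_poisson = -lam + n * math.log(lam) - math.lgamma(n + 1)
        log_gamma_pdf = ((shape - 1) * math.log(y) - y / scale
                         - shape * math.log(scale) - math.lgamma(shape))
        log_terms.append(log_poisson + log_gamma_pdf)
    top = max(log_terms)                     # log-sum-exp for numerical stability
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms)


if __name__ == "__main__":
    mu, phi, p = 2.0, 1.0, 1.5
    print("P(Y = 0) =", tweedie_density(0.0, mu, phi, p))
    for y in (0.5, 1.0, 2.0, 4.0):
        print(f"f({y}) =", tweedie_density(y, mu, phi, p))
```

A quick sanity check is that the point mass at zero plus the numerical integral of the density over the positive reals should be close to 1.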
The Tweedie distributions were so named by Bent Jørgensen after M.C.K. Tweedie, a medical statistician at the University of Liverpool, UK, who presented the first thorough study of these distributions in 1984.[1]
The index parameter <math>p</math> defines the type of distribution[2]:
- For <math>p<0</math>, the data <math>y</math> are supported on the whole real line, although the mean is restricted to <math>\mu>0</math>. No applications of these distributions are known.
- For <math>p=0</math> (the normal distribution), the data <math>y</math> and the mean <math>\mu</math> are supported on the whole real line.
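- For <math>0<p<1</math>, no distributions exist.
- For <math>p=1</math>, the distribution is a scaled Poisson distribution, supported on the non-negative integer multiples of <math>\phi</math> (the non-negative integers when <math>\phi=1</math>).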
- For <math>1<p<2</math>, the distribution is continuous on the positive reals, with an added point mass at exactly <math>Y=0</math>. For example, consider monthly rainfall.[5] When no rain falls, an exact zero is recorded; when rain falls, a continuous amount results. These distributions are also called the Poisson-gamma distributions, since they can be represented as the sum of a Poisson-distributed number of gamma random variables.[6] They are therefore a type of compound Poisson distribution.
- For <math>p\ge 2</math>, the data <math>y</math> are supported on the positive reals, and <math>\mu>0</math>. For <math>p>2</math> these distributions resemble the gamma distribution (which corresponds to <math>p=2</math>), but become progressively more right-skewed as <math>p</math> increases.
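The Poisson-gamma representation for <math>1<p<2</math> can be illustrated by simulation. The sketch below uses only the Python standard library. The function names, the seed, and the mapping from <math>(\mu, \phi, p)</math> to the Poisson rate and the gamma shape and scale are assumptions of this sketch (the mapping is the standard one but is not given in this article), and the simple Poisson sampler is suitable only for small rates.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw a Poisson(lam) count with Knuth's multiplication method (small lam only)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_tweedie(mu, phi, p, size, seed=0):
    """Simulate Tweedie draws for 1 < p < 2 as a Poisson sum of gamma variables."""
    if not 1 < p < 2:
        raise ValueError("the Poisson-gamma representation requires 1 < p < 2")
    lam = mu ** (2 - p) / (phi * (2 - p))    # assumed Poisson rate
    alpha = (2 - p) / (p - 1)                # assumed gamma shape per summand
    scale = phi * (p - 1) * mu ** (p - 1)    # assumed gamma scale per summand
    rng = random.Random(seed)
    draws = []
    for _ in range(size):
        n = poisson_draw(lam, rng)
        # A sum of n independent gamma(alpha, scale) variables is gamma(n * alpha, scale);
        # with n == 0 there is nothing to sum, which gives the exact zero.
        draws.append(rng.gammavariate(n * alpha, scale) if n > 0 else 0.0)
    return draws


if __name__ == "__main__":
    mu, phi, p = 2.0, 1.0, 1.5
    sample = simulate_tweedie(mu, phi, p, size=20000, seed=1)
    mean = sum(sample) / len(sample)
    variance = sum((y - mean) ** 2 for y in sample) / (len(sample) - 1)
    zero_fraction = sum(1 for y in sample if y == 0.0) / len(sample)
    print("sample mean:", mean, "target:", mu)
    print("sample variance:", variance, "target:", phi * mu ** p)
    print("fraction of exact zeros:", zero_fraction)
```

In the rainfall example, the Poisson count plays the role of the number of rain events in a month and each gamma variable the amount from one event, so a month with no events yields the exact zero.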
Applications
Applications of Tweedie distributions (apart from the four special cases identified) include:
- ecology[19]
- analysis of alcohol consumption in British teenagers[20]
- medical applications[21]
- fisheries[22]
References
- [1] Tweedie, M. C. K. 1984. An index which distinguishes between some important exponential families. In ‘Statistics Applications and New Directions’, Proceedings of the Indian Statistical Institute Golden Jubilee International Conference, J. K. Ghosh and J. Roy (eds), Indian Statistical Institute, Calcutta, pp. 579--604.
- [2] Jørgensen, B. 1987. Exponential dispersion models (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 49: 127--162.
- [3] Dunn, P. K. and Smyth, G. K. 2005. Series evaluation of Tweedie exponential dispersion model densities. Statistics and Computing 15: 267--280.
- [4] Dunn, P. 2007. The tweedie Package. http://cran.r-project.org/doc/packages/tweedie.pdf
- [5] Dunn, P. K. 2004. Occurrence and quantity of precipitation can be modelled simultaneously. International Journal of Climatology 24: 1231--1239.
- [6] Dunn, P. K. and Smyth, G. K. 2005. Series evaluation of Tweedie exponential dispersion model densities. Statistics and Computing 15(4): 267--280. http://portal.acm.org/citation.cfm?id=1093724.1093748&coll=&dl=
- [7] Haberman, S. and Renshaw, A. E. 1996. Generalized linear models and actuarial science. The Statistician 45: 407--436.
- [8] Renshaw, A. E. 1994. Modelling the claims process in the presence of covariates. ASTIN Bulletin 24: 265--286.
- [9] Jørgensen, B. and Paes de Souza, M. C. 1994. Fitting Tweedie's compound Poisson model to insurance claims data. Scand. Actuar. J. 1: 69--93.
- [10] Haberman, S. and Renshaw, A. E. 1998. Actuarial applications of generalized linear models. In Statistics in Finance, D. J. Hand and S. D. Jacka (eds), Arnold, London.
- [11] Mildenhall, S. J. 1999. A systematic relationship between minimum bias and generalized linear models. Proceedings of the Casualty Actuarial Society 86: 393--487.
- [12] Murphy, K. P., Brockman, M. J. and Lee, P. K. W. 2000. Using generalized linear models to build dynamic pricing systems. Casualty Actuarial Forum, Winter 2000.
- [13] Smyth, G. K. and Jørgensen, B. 2002. Fitting Tweedie's compound Poisson model to insurance claims data: dispersion modelling. ASTIN Bulletin 32: 143--157.
- [14] Davidian, M. 1990. Estimation of variance functions in assays with possible unequal replication and nonnormal data. Biometrika 77: 43--54.
- [15] Davidian, M., Carroll, R. J. and Smith, W. 1988. Variance functions and the minimum detectable concentration in assays. Biometrika 75: 549--556.
- [16] Aalen, O. O. 1992. Modelling heterogeneity in survival analysis by the compound Poisson distribution. Ann. Appl. Probab. 2: 951--972.
- [17] Hougaard, P., Harvald, B. and Holm, N. V. 1992. Measuring the similarities between the lifetimes of adult Danish twins born between 1881--1930. J. Amer. Statist. Assoc. 87: 17--24.
- [18] Hougaard, P. 1986. Survival models for heterogeneous populations derived from stable distributions. Biometrika 73: 387--396.
- [19] Perry, J. N. 1981. Taylor's power law for dependence of variance on mean in animal populations. J. Roy. Statist. Soc. Ser. C 30: 254--263.
- [20] Gilchrist, R. and Drinkwater, D. 1999. Fitting Tweedie models to data with probability of zero responses. Proceedings of the 14th International Workshop on Statistical Modelling, Graz, pp. 207--214.
- [21] Smyth, G. K. 1996. Regression analysis of quantity data with exact zeros. Proceedings of the Second Australia--Japan Workshop on Stochastic Models in Engineering, Technology and Management. Technology Management Centre, University of Queensland, 572--580.
- [22] Candy, S. G. 2004. Modelling catch and effort data using generalized linear models, the Tweedie distribution, random vessel effects and random stratum-by-year effects. CCAMLR Science 11: 59--80.
Further reading
- Kaas, R (2005). Compound Poisson distribution and GLM’s – Tweedie’s distribution. Handelingen van het contactforum 3rd Actuarial and Financial Mathematics Day (4 February 2005), 3-12. http://ucs.kuleuven.be/seminars_events/other/files/3afmd/Kaas.PDF
- Ohlsson, E and Johansson, B. Exact Credibility and Tweedie Models, University of Stockholm, Research report, October 2003. http://www.math.su.se/matstat/reports/seriea/2003/rep15/report.pdf
- Smith, CAB. (1997). Obituary: Maurice Charles Kenneth Tweedie, 1919-96. Journal of the Royal Statistical Society: Series A (Statistics in Society) 160 (1), 151–154. doi:10.1111/1467-985X.00052
- Smyth, G. K., and Jørgensen, B. (2002). Fitting Tweedie's compound Poisson model to insurance claims data: dispersion modelling. ASTIN Bulletin 32, 143-157. 6/2002 http://www.statsci.org/smyth/pubs/insuranc.pdf
- Tweedie, M. C. K. (1956) Some statistical properties of inverse Gaussian distributions. Virginia J. Sci. (N.S.) 7 (1956), 160--165.
- Tweedie distributions. http://www.statsci.org/s/tweedie.html
- Tweedie generalized linear model family. http://www.statsci.org/s/tweedief.html
- Examples of use of the model. http://www.sci.usq.edu.au/staff/dunn/Datasets/tech-glms.html#Tweedie