Beta-binomial model
Overview
In empirical Bayes methods, the Beta-binomial model is an analytic model where the likelihood function <math>L(x|\theta)</math> is specified by a binomial distribution
- <math>L(x|\theta) = \operatorname{Bin}(x,\theta)\,</math>
- <math> = {n\choose x}\theta^x(1-\theta)^{n-x}\,</math>
and the conjugate prior <math>\pi(\theta|\alpha,\beta)</math> is a Beta distribution
- <math>\pi(\theta|\alpha,\beta) = \mathrm{Beta}(\alpha,\beta)\,</math>
- <math> = \frac{1}{\mathrm{B}(\alpha,\beta)} \theta^{\alpha-1}(1-\theta)^{\beta-1}.</math>
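For concreteness, here is a minimal numerical sketch of this likelihood–prior pair using SciPy; the values n = 10, x = 7 and the hyperparameters α = 2, β = 3 are arbitrary illustrations, not taken from the text.

```python
import numpy as np
from scipy import stats

n, x = 10, 7            # number of trials and observed successes (arbitrary)
alpha, beta = 2.0, 3.0  # Beta prior hyperparameters (arbitrary)

theta = np.linspace(0.01, 0.99, 99)

# Binomial likelihood L(x|theta) = C(n,x) theta^x (1-theta)^(n-x)
likelihood = stats.binom.pmf(x, n, theta)

# Conjugate Beta(alpha, beta) prior density pi(theta|alpha,beta)
prior = stats.beta.pdf(theta, alpha, beta)
```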
Derivation of the posterior and the marginal
It is convenient to reparameterize the distributions so that the prior mean is a single parameter. Let
- <math>\pi(\theta|\mu,M) = \mathrm{Beta}(\mu,M)\,</math>
- <math> = \frac{\Gamma (M)}{\Gamma(\mu M)\Gamma(M(1-\mu))} \theta^{M\mu-1}(1-\theta)^{M(1-\mu)-1}</math>
where
- <math> \mu = \frac{\alpha}{\alpha+\beta}\,</math> and <math> M = \alpha+\beta\,</math>
so that
- <math>E(\theta|\mu,M) = \mu\,</math>
- <math> Var(\theta|\mu,M) = \frac{\mu(1-\mu)}{M+1}</math>
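Converting between the two parameterizations is a one-liner; the sketch below (the helper names `muM_to_ab` and `ab_to_muM` are ours, not from the text) checks the stated mean and variance against SciPy's Beta moments.

```python
from scipy import stats

def muM_to_ab(mu, M):
    """(mu, M) -> (alpha, beta): alpha = M*mu, beta = M*(1 - mu)."""
    return M * mu, M * (1.0 - mu)

def ab_to_muM(alpha, beta):
    """(alpha, beta) -> (mu, M): mu = alpha/(alpha+beta), M = alpha+beta."""
    return alpha / (alpha + beta), alpha + beta

mu, M = 0.3, 8.0                      # arbitrary illustrative hyperparameters
prior = stats.beta(*muM_to_ab(mu, M))

assert abs(prior.mean() - mu) < 1e-12                      # E(theta|mu,M) = mu
assert abs(prior.var() - mu * (1 - mu) / (M + 1)) < 1e-12  # var(theta|mu,M) = mu(1-mu)/(M+1)
```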
The posterior distribution <math>\rho(\theta|x)</math> is also a beta distribution,
- <math> \rho(\theta|x) = \mathrm{Beta}(x+M \mu ,\, n-x+M(1- \mu)), \, </math>
since
- <math> \rho(\theta|x) \propto L(x|\theta)\,\pi(\theta|\mu,M) = \frac{\Gamma (M) }{\Gamma(M\mu)\Gamma(M(1-\mu))}{n\choose x}\,\theta^{x+M\mu-1}(1-\theta)^{n-x+M(1-\mu)-1} \propto \theta^{x+M\mu-1}(1-\theta)^{n-x+M(1-\mu)-1},\,</math>
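In code, this conjugate update simply adds the observed counts to the prior parameters; a minimal sketch with arbitrary values:

```python
from scipy import stats

mu, M = 0.3, 8.0   # prior hyperparameters (arbitrary)
n, x = 10, 7       # observed data (arbitrary)

# Posterior is Beta(x + M*mu, n - x + M*(1 - mu))
posterior = stats.beta(x + M * mu, n - x + M * (1 - mu))

# Its mean, (x + M*mu) / (n + M), reappears below as the point estimate
assert abs(posterior.mean() - (x + M * mu) / (n + M)) < 1e-12
```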
The marginal distribution <math>m(x|\mu, M)</math> is given by
- <math> m(x|\mu,M) = \int_{0}^{1} L(x|\theta)\pi(\theta|\mu, M) \,d\theta </math>
- <math> = \frac{\Gamma (M) }{\Gamma(M\mu)\Gamma(M(1-\mu))}{n\choose x} \int_{0}^{1} \theta^{x+M\mu-1}(1-\theta)^{n-x+M(1-\mu)-1} \,d\theta </math>
- <math> = \frac{\Gamma (M) }{\Gamma(M\mu)\Gamma(M(1-\mu))}{n\choose x} \frac{\Gamma (x+M\mu)\Gamma(n-x+M(1-\mu)) }{\Gamma(n+M)} </math>
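The marginal above is exactly the beta-binomial pmf. A sketch evaluating it through the gamma-function expression (in log space for numerical stability) and cross-checking against `scipy.stats.betabinom` (available in SciPy 1.4+); the values of n, μ, M are arbitrary:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln, comb

def marginal_pmf(x, n, mu, M):
    """m(x|mu, M) evaluated from the gamma-function expression above."""
    log_m = (gammaln(M) - gammaln(M * mu) - gammaln(M * (1 - mu))
             + np.log(comb(n, x))
             + gammaln(x + M * mu) + gammaln(n - x + M * (1 - mu))
             - gammaln(n + M))
    return np.exp(log_m)

n, mu, M = 10, 0.3, 8.0                 # arbitrary illustrative values
xs = np.arange(n + 1)
m = marginal_pmf(xs, n, mu, M)

# Same thing via SciPy's beta-binomial with alpha = M*mu, beta = M*(1-mu)
assert np.allclose(m, stats.betabinom.pmf(xs, n, M * mu, M * (1 - mu)))
assert np.isclose(m.sum(), 1.0)         # a proper probability mass function
```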
Moment estimates
Because the marginal is a complex, non-linear function of gamma and digamma functions, it is quite difficult to obtain a marginal maximum likelihood estimate (MMLE) for the mean and variance. Instead, we use the method of iterated expectations to find the moments of the marginal distribution.
Let us write our model as a two-stage compound sampling model, where for each group <math>i</math> we observe <math>X_{i}</math> successes out of <math>n_{i}</math> trials:
- <math> X_{i}|\theta_{i} \sim \mathrm{Bin}(n_{i}, \theta_{i})\,</math>
- <math> \theta_{i} \sim \mathrm{Beta}(\mu,M)\,</math>, i.i.d.
We can find iterated moment estimates for the mean and variance using the moments for the distributions in the two-stage model:
- <math>E\left(\frac{X}{n}\right) = E\left[E\left(\frac{X}{n}|\theta\right)\right] = E(\theta) = \mu</math>
- <math>\mathrm{var}\left(\frac{X}{n}\right) = E\left[\mathrm{var}\left(\frac{X}{n}|\theta\right)\right] + \mathrm{var}\left[E\left(\frac{X}{n}|\theta\right)\right]</math>
- <math> = E\left[\left(\frac{1}{n}\right)\theta(1-\theta)|\mu,M\right] + \mathrm{var}\left(\theta|\mu,M\right)</math>
- <math> = \frac{1}{n}\,\mu(1-\mu)\frac{M}{M+1}+ \frac{\mu(1-\mu)}{M+1}</math> (using <math>E\left[\theta(1-\theta)|\mu,M\right] = \mu(1-\mu)\frac{M}{M+1}</math>)
- <math> = \frac{\mu(1-\mu)}{n}\left(1+\frac{n-1}{M+1}\right).</math>
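These two identities are easy to verify by simulating the two-stage model; a quick Monte Carlo sketch with arbitrary values of <math>\mu</math>, <math>M</math>, and <math>n</math>:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, M, n = 0.3, 8.0, 25          # arbitrary illustrative values
draws = 200_000

# Stage 1: theta ~ Beta(M*mu, M*(1-mu));  Stage 2: X ~ Bin(n, theta)
theta = rng.beta(M * mu, M * (1 - mu), size=draws)
x = rng.binomial(n, theta)
phat = x / n

print(phat.mean(), mu)                                           # E(X/n) = mu
print(phat.var(), mu * (1 - mu) / n * (1 + (n - 1) / (M + 1)))   # var(X/n)
```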
We now seek a point estimate <math>\tilde{\theta_{i}}</math> as a weighted average of the sample estimate <math>\hat{\theta_{i}}</math> and an estimate of the prior mean <math>\mu</math>. The sample estimate <math>\hat{\theta_{i}}</math> is given by
- <math>\hat{\theta_{i}} = \frac{x_{i}}{n_{i}}\,</math>
Therefore we need point estimates for <math>\mu</math> and <math>M</math>. The estimated mean <math>\hat{\mu}</math> is given as a weighted average
- <math>\hat{\mu} = \frac{\sum_{i=1}^{N} n_{i} \hat{\theta_{i}} }{\sum_{i=1}^{N} n_{i} } = \frac{\sum_{i=1}^{N} x_{i} }{\sum_{i=1}^{N} n_{i} }</math>
The hyperparameter <math>M</math> is obtained by equating the moment expression for the variance of the two-stage model to the pooled sample variance <math>s^{2}</math>:
- <math>\frac{1}{N} \sum_{i=1}^{N} \mathrm{var}\left(\frac{x_{i}}{n_{i}}\right) = \frac{1}{N} \sum_{i=1}^{N} \frac{\hat{\mu}(1-\hat{\mu})}{n_{i}} \left[1+\frac{n_{i}-1}{\hat{M}+1}\right] = s^{2}, </math>
where
- <math>s^{2} = \frac{N \sum_{i=1}^{N} n_{i} (\hat{\theta_{i}} - \hat{\mu})^{2} }{(N-1)\sum_{i=1}^{N} n_{i} }. </math>
We can now solve for <math>\hat{M}</math>:
- <math>\hat{M} = \frac{\hat{\mu}(1-\hat{\mu}) - s^{2}}{s^{2} - \frac{\hat{\mu}(1 - \hat{\mu})}{N}\sum_{i=1}^{N} n_{i}^{-1} }.</math>
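Putting the last few formulas together, here is a sketch of the moment estimation of <math>\hat{\mu}</math>, <math>s^{2}</math>, and <math>\hat{M}</math> on made-up count data (the arrays `x` and `n` below are purely illustrative):

```python
import numpy as np

# Hypothetical data: x[i] successes out of n[i] trials in group i
x = np.array([2, 30, 5, 25, 10])
n = np.array([20, 50, 15, 40, 25])
N = len(n)

theta_hat = x / n                  # per-group sample estimates
mu_hat = x.sum() / n.sum()         # pooled (weighted) estimate of mu

# Pooled sample variance of the theta_hat around mu_hat
s2 = N * np.sum(n * (theta_hat - mu_hat) ** 2) / ((N - 1) * n.sum())

# Method-of-moments estimate of M; the denominator is positive only when the
# groups are more dispersed than a single binomial would allow
p = mu_hat * (1 - mu_hat)
M_hat = (p - s2) / (s2 - p / N * np.sum(1.0 / n))
print(mu_hat, M_hat)               # roughly 0.48 and 6.72 for this data
```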
Given our point estimates for the prior, we may now plug these values in to obtain a point estimate for the posterior mean
- <math>\tilde{\theta_{i}} = E(\theta|x_{i},\hat{\mu},\hat{M}) = \frac{x_{i} + \hat{M}\hat{\mu}}{n_{i}+\hat{M}} = \frac{\hat{M}}{n_{i}+\hat{M}}\hat{\mu} + \frac{n_{i}}{n_{i}+\hat{M}}\frac{x_{i}}{n_{i}}</math>
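Continuing with the same made-up data, plugging <math>\hat{\mu}</math> and <math>\hat{M}</math> into the posterior mean gives one shrunken estimate per group (the values 0.48 and 6.72 are the approximate estimates from the previous sketch):

```python
import numpy as np

x = np.array([2, 30, 5, 25, 10])      # same hypothetical data as above
n = np.array([20, 50, 15, 40, 25])
mu_hat, M_hat = 0.48, 6.72            # approximate estimates from the previous sketch

# Plug-in posterior mean for each group: (x_i + M_hat*mu_hat) / (n_i + M_hat)
theta_tilde = (x + M_hat * mu_hat) / (n + M_hat)
print(theta_tilde)                    # each raw x_i/n_i is pulled toward mu_hat
```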
Shrinkage factors
We may write the posterior estimate as a weighted average:
- <math>\tilde{\theta_{i}} = \hat{B_{i}}\hat{\mu} + (1-\hat{B_{i}})\hat{\theta_{i}}</math>
where <math>\hat{B_{i}}</math> is called the shrinkage factor:
- <math>\hat{B_{i}} = \frac{\hat{M}}{\hat{M}+n_{i}}</math>
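The same estimates expressed through the shrinkage factor; this sketch checks that the weighted-average form agrees with the direct plug-in posterior mean (again using the made-up data and the approximate <math>\hat{\mu}</math>, <math>\hat{M}</math> from above):

```python
import numpy as np

x = np.array([2, 30, 5, 25, 10])      # same hypothetical data as above
n = np.array([20, 50, 15, 40, 25])
mu_hat, M_hat = 0.48, 6.72            # approximate estimates from above

B = M_hat / (M_hat + n)               # shrinkage factor per group: smaller n_i shrinks more
theta_hat = x / n
theta_tilde = B * mu_hat + (1 - B) * theta_hat

# Identical to the direct posterior mean (x_i + M_hat*mu_hat) / (n_i + M_hat)
assert np.allclose(theta_tilde, (x + M_hat * mu_hat) / (n + M_hat))
```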
Example
Maximum likelihood estimation
Improved estimates: James-Stein estimator