False discovery rate
False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. It controls the expected proportion of incorrectly rejected null hypotheses (type I errors) among the rejected hypotheses.[1] It is a less conservative procedure with greater power than familywise error rate (FWER) control,[2] at the cost of an increased likelihood of type I errors.
The q-value is defined to be the FDR analogue of the p-value: the q-value of an individual hypothesis test is the minimum FDR at which the test may be called significant. One approach is to estimate q-values directly rather than fixing a level at which to control the FDR.
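As an illustration, the following is a minimal Python sketch of q-value estimation built from Benjamini–Hochberg-adjusted p-values. It assumes the proportion of true null hypotheses is taken to be 1, as in the plain Benjamini–Hochberg setting; Storey's direct approach would substitute an estimate of that proportion. The function name is illustrative.

```python
import numpy as np

def q_values(p):
    """Sketch: q-values as monotonized Benjamini-Hochberg adjusted p-values.
    Assumes the true-null proportion is 1 (plain BH); Storey's direct
    approach would plug in an estimate of that proportion instead."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    # BH-adjusted value for the i-th smallest p-value: p_(i) * m / i
    adjusted = p[order] * m / np.arange(1, m + 1)
    # q_(i) = min over j >= i of adjusted_(j): a reverse running minimum
    q_sorted = np.minimum.accumulate(adjusted[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q
```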
Classification of m hypothesis tests
The following table defines some random variables related to the m hypothesis tests.
| | # declared non-significant | # declared significant | Total |
|---|---|---|---|
| # true null hypotheses | <math>U</math> | <math>V</math> | <math>m_0</math> |
| # non-true null hypotheses | <math>T</math> | <math>S</math> | <math>m - m_0</math> |
| Total | <math>m - R</math> | <math>R</math> | <math>m</math> |
- <math>m_0</math> is the number of true null hypotheses
- <math>m - m_0</math> is the number of false null hypotheses
- <math>U</math> is the number of true negatives
- <math>V</math> is the number of false positives
- <math>T</math> is the number of false negatives
- <math>S</math> is the number of true positives
- <math>H_1 \ldots H_m</math> are the null hypotheses being tested
- In <math>m</math> hypothesis tests of which <math>m_0</math> are true null hypotheses, <math>R</math> is an observable random variable, while <math>S</math>, <math>T</math>, <math>U</math>, and <math>V</math> are unobservable random variables.
The false discovery rate is given by <math>\mathrm{E}\!\left [\frac{V}{V+S}\right ] = \mathrm{E}\!\left [\frac{V}{R}\right ]</math>, where <math>\frac{V}{R}</math> is defined to be <math>0</math> when <math>R = 0</math>, and one wants to keep this value below a threshold <math>\alpha</math>.
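To make the definition concrete, the following Python sketch estimates <math>\mathrm{E}[V/R]</math> by simulation for the naive rule that rejects whenever <math>p \leq \alpha</math>. All settings here (1000 tests, 900 true nulls with uniform p-values, and a Beta-distributed alternative concentrated near zero) are illustrative assumptions, not from the source; under these assumptions the empirical FDR of the naive rule noticeably exceeds <math>\alpha</math>, which is what FDR-controlling procedures are designed to prevent.

```python
import numpy as np

# Monte Carlo sketch of the definition E[V/R], taking V/R = 0 when R = 0.
# m, m0, and the Beta alternative below are illustrative assumptions.
rng = np.random.default_rng(seed=0)
m, m0, alpha, reps = 1000, 900, 0.05, 2000
ratios = []
for _ in range(reps):
    p_null = rng.uniform(size=m0)              # p-values of true nulls
    p_alt = rng.beta(0.1, 10.0, size=m - m0)   # p-values of false nulls, near 0
    V = np.sum(p_null <= alpha)                # false positives
    S = np.sum(p_alt <= alpha)                 # true positives
    R = V + S
    ratios.append(V / R if R > 0 else 0.0)
print(f"Empirical FDR of the naive per-test rule p <= {alpha}: {np.mean(ratios):.3f}")
```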
Controlling procedures
Independent tests
The Benjamini–Hochberg procedure, a step-up procedure built on the Simes thresholds, ensures that the expected value <math>\mathrm{E}\!\left[ \frac{V}{V + S} \right]\,</math> is at most a given <math>\alpha</math> (Benjamini and Hochberg 1995). This procedure is valid when the <math>m</math> tests are independent. Let <math>H_1 \ldots H_m</math> be the null hypotheses and <math>P_1 \ldots P_m</math> their corresponding p-values. Order these values in increasing order and denote them by <math>P_{(1)} \ldots P_{(m)}</math>. For a given <math>\alpha</math>, find the largest <math>k</math> such that
- <math>P_{(k)} \leq \frac{k}{m} \alpha.</math>
Then reject (i.e. declare positive) all <math>H_{(i)}</math> for <math>i = 1, \ldots, k</math>. Note that the mean of the per-test thresholds <math>\frac{i}{m}\alpha</math> for these <math>m</math> tests is <math>\frac{\alpha(m+1)}{2m}</math>, which could be used as a rough FDR (RFDR) or "<math>\alpha</math> adjusted for <math>m</math> independent tests."
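A minimal Python sketch of this step-up rule follows; the function name and the example p-values are illustrative, not from the source.

```python
import numpy as np

def benjamini_hochberg(p, alpha=0.05):
    """Step-up procedure sketch: reject H_(1), ..., H_(k) for the largest k
    with P_(k) <= (k/m) * alpha. Returns a boolean mask in the input order."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # 0-based index of the largest k
        reject[order[: k + 1]] = True      # reject everything up to P_(k)
    return reject

# Illustrative usage with made-up p-values:
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))  # only the two smallest p-values are rejected
```

Note that the rule is a step-up one: <math>H_{(3)}</math> above fails its own threshold, but everything below the largest qualifying <math>P_{(k)}</math> would still be rejected had some larger <math>P_{(k)}</math> qualified.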
Dependent tests
The Benjamini–Yekutieli procedure controls the false discovery rate under more general dependence assumptions. This refinement modifies the threshold and finds the largest <math>k</math> such that:
- <math>P_{(k)} \leq \frac{k}{m \cdot c(m)} \alpha </math>
- If the tests are independent: <math>c(m) = 1</math> (same as above)
- If the tests are positively correlated (more precisely, under positive regression dependence): <math>c(m) = 1</math>
- Under arbitrary dependence, including negative correlation: <math>c(m) = \sum _{i=1} ^m \frac{1}{i}</math>
In the case of arbitrary dependence, <math>c(m)</math> can be approximated by using the Euler–Mascheroni constant <math>\gamma</math>:
- <math>\sum _{i=1} ^m \frac{1}{i} \approx \ln(m) + \gamma.</math>
Using the RFDR above, an approximate FDR (AFDR) for <math>m</math> dependent tests is the corresponding mean <math>\alpha</math>, namely AFDR = RFDR / (ln(<math>m</math>) + 0.57721...).
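A minimal sketch of this correction in Python, reusing the benjamini_hochberg function from the sketch above (the function name is again illustrative): the adjustment simply divides <math>\alpha</math> by the harmonic sum before running the same step-up rule.

```python
import numpy as np

def benjamini_yekutieli(p, alpha=0.05):
    """Sketch of the Benjamini-Yekutieli correction for arbitrary dependence:
    shrink alpha by c(m) = sum_{i=1}^m 1/i, then run the step-up rule."""
    m = len(p)
    c_m = np.sum(1.0 / np.arange(1, m + 1))    # harmonic sum, ~ ln(m) + 0.5772
    return benjamini_hochberg(p, alpha / c_m)  # step-up sketch defined earlier
```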
References
- ↑ Benjamini, Y.; Hochberg, Y. (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing". Journal of the Royal Statistical Society, Series B (Methodological). 57 (1): 289–300.
- ↑ Shaffer, J. P. (1995). "Multiple hypothesis testing". Annual Review of Psychology. 46: 561–584. http://dx.doi.org/10.1146/annurev.ps.46.020195.003021
- Benjamini, Yoav; Hochberg, Yosef (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing". Journal of the Royal Statistical Society, Series B (Methodological). 57 (1): 289–300.
- Benjamini, Yoav; Yekutieli, Daniel (2001). "The control of the false discovery rate in multiple testing under dependency". Annals of Statistics. 29 (4): 1165–1188. doi:10.1214/aos/1013699998.
- Storey, John D. (2002). "A direct approach to false discovery rates". Journal of the Royal Statistical Society, Series B (Methodological). 64 (3): 479–498. doi:10.1111/1467-9868.00346.
- Storey, John D. (2003). "The positive false discovery rate: A Bayesian interpretation and the q-value". Annals of Statistics. 31 (6): 2013–2035. doi:10.1214/aos/1074290335.