Cohen's kappa
Overview
Cohen's kappa coefficient is a statistical measure of inter-rater reliability for categorical items. It is generally thought to be a more robust measure than a simple percent-agreement calculation, since κ takes into account the agreement that would be expected to occur by chance. Cohen's kappa measures the agreement between two raters who each classify N items into C mutually exclusive categories.
The equation for κ is:
- <math>\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)},</math>
where Pr(a) is the relative observed agreement among the raters, and Pr(e) is the hypothetical probability of chance agreement, calculated from the observed frequencies with which each rater assigns each category. If the raters are in complete agreement then κ = 1. If there is no agreement among the raters other than what would be expected by chance (as given by Pr(e)), then κ = 0; values below 0 indicate less agreement than would be expected by chance.
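As a minimal illustrative sketch (not from Cohen's paper; the ratings and the function name `cohens_kappa` below are made up for this example), the following Python code computes Pr(a), Pr(e), and κ directly from the definitions above for two raters labelling the same ten items:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each label the same N items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)

    # Pr(a): relative observed agreement among the raters.
    pr_a = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Pr(e): hypothetical probability of chance agreement, computed from
    # each rater's observed category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    pr_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)

    return (pr_a - pr_e) / (1 - pr_e)

# Made-up ratings: two raters each sort 10 items into "yes"/"no".
a = ["yes", "yes", "no", "yes", "no", "no", "yes", "yes", "no", "yes"]
b = ["yes", "no",  "no", "yes", "no", "yes", "yes", "yes", "no", "yes"]
print(round(cohens_kappa(a, b), 3))  # 0.583
```

For these made-up ratings, Pr(a) = 0.8 and Pr(e) = 0.6 × 0.6 + 0.4 × 0.4 = 0.52, so κ = (0.8 - 0.52) / (1 - 0.52) ≈ 0.58.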
The seminal paper introducing kappa as a new technique was published by Jacob Cohen in the journal Educational and Psychological Measurement in 1960.
Note that Cohen's kappa measures agreement between two raters only. For a similar measure of agreement (Fleiss' kappa) used when there are more than two raters, see Fleiss (1981).
Significance
Landis and Koch[1] gave the following table for interpreting <math>\kappa</math> values. This table is, however, by no means universally accepted; Landis and Koch supplied no evidence to support it, basing it instead on personal opinion. It has been noted that these guidelines may be more harmful than helpful[2], since the number of categories and subjects affects the magnitude of the value: kappa tends to be higher when there are fewer categories.[3]
| <math>\kappa</math> | Interpretation |
|---|---|
| < 0 | Poor agreement |
| 0.00–0.20 | Slight agreement |
| 0.21–0.40 | Fair agreement |
| 0.41–0.60 | Moderate agreement |
| 0.61–0.80 | Substantial agreement |
| 0.81–1.00 | Almost perfect agreement |
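For illustration only, and with the caveats above, the bands in the table can be applied mechanically; the hypothetical helper below simply mirrors the table:

```python
def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) descriptive band."""
    if kappa < 0:
        return "Poor agreement"
    bands = [
        (0.20, "Slight agreement"),
        (0.40, "Fair agreement"),
        (0.60, "Moderate agreement"),
        (0.80, "Substantial agreement"),
        (1.00, "Almost perfect agreement"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")

print(landis_koch_label(0.583))  # "Moderate agreement"
```

For the example value κ ≈ 0.58 computed earlier, this mapping gives "Moderate agreement".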
See also
- Fleiss' kappa
Notes
1. Landis, J. R. and Koch, G. G. (1977) pp. 159–174
2. Gwet, K. (2001)
3. Sim, J. and Wright, C. C. (2005) pp. 257–268
References
- Cohen, J. (1960) "A coefficient of agreement for nominal scales" in Educational and Psychological Measurement, Vol. 20, pp. 37–46
- Fleiss, J. L. (1971) "Measuring nominal scale agreement among many raters" in Psychological Bulletin, Vol. 76, No. 5, pp. 378–382
- Fleiss, J. L. (1981) Statistical methods for rates and proportions. 2nd ed. (New York: John Wiley) pp. 212–236 (chapter 13: The measurement of interrater agreement)
- Gwet, K. (2001) Statistical Tables for Inter-Rater Agreement. (Gaithersburg: StatAxis Publishing)
- Landis, J. R. and Koch, G. G. (1977) "The measurement of observer agreement for categorical data" in Biometrics, Vol. 33, pp. 159–174
- Scott, W. (1955) "Reliability of content analysis: The case of nominal scale coding" in Public Opinion Quarterly, Vol. 17, pp. 321–325
- Sim, J. and Wright, C. C. (2005) "The Kappa Statistic in Reliability Studies: Use, Interpretation, and Sample Size Requirements" in Physical Therapy, Vol. 85, pp. 257–268
Further reading
- Fleiss, J. L. and Cohen, J. (1973) "The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability" in Educational and Psychological Measurement, Vol. 33, pp. 613–619
- Fleiss, J. L. (1981) Statistical methods for rates and proportions. 2nd ed. (New York: John Wiley) pp. 38–46
External links
- Cohen's Kappa Example
- Vassar: a Kappa worksheet with explanation, provided by Dr Lowry of Vassar
- Kappa Coefficients: A Critical Appraisal