Missing data


Editor-In-Chief: C. Michael Gibson, M.S., M.D. [1] Gonzalo Romero, M.D. [2]

Overview

Classification of missing data

Missing data can be classified into three categories, according to how the missingness relates to the independent variables and to the dependent (outcome) variable:

  1. Missing completely at random (MCAR)
  2. Missing at random (MAR)
  3. Missing not at random (MNAR)
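The three categories can also be stated more formally. The notation below is an assumption introduced here for illustration and does not come from the original text: R is the indicator of which values are missing, Y_obs is the observed part of the data, and Y_mis is the unobserved part, where the data include both the independent variables and the outcome.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(R) \\
\text{MAR:}\quad  & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = P(R \mid Y_{\mathrm{obs}}) \\
\text{MNAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) \neq P(R \mid Y_{\mathrm{obs}})
\end{align*}
```

Read this way, MCAR is the special case of MAR in which the missingness does not depend on the observed values either, and MNAR is the case in which it still depends on the unobserved values after conditioning on everything that was observed.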

Missing completely at random (MCAR)

Data are MCAR when the probability that a value is missing is independent of both observed and unobserved data. The missingness is therefore unrelated to the independent variables and to the outcome.

Missing at random (MAR)

Data are MAR when the missingness is not related to the outcome but is related to the independent variables (for example age, race, gender). The probability of a value being missing generally depends on observed values, not on the missing values themselves, so MAR does not correspond to the intuitive notion of 'random'. MAR missingness may still influence the results if the independent variable that drives it is also related to the outcome.

For example, older subjects might drop out of a treatment because walking difficulties prevent them from reaching the clinic center. Among older subjects, however, the likelihood of dropping out does not relate to the outcome.
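The dropout example can be illustrated with a small simulation. This is a minimal sketch under assumed values: the group means (50 and 60), the dropout probabilities (0.4 and 0.1), and the function name `simulate_mar` are hypothetical and chosen only for illustration. Dropout depends on the observed age group alone, so within each age group the subjects who left and those who stayed should have similar outcomes, while the overall mean of the remaining subjects can still be shifted.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def simulate_mar(n=20000):
    """Return (older, outcome, dropped) tuples under a MAR mechanism."""
    rows = []
    for _ in range(n):
        older = random.random() < 0.5
        # Hypothetical effect: the outcome differs between age groups.
        outcome = random.gauss(50 if older else 60, 10)
        # Dropout depends only on the observed age group, not on the outcome.
        p_drop = 0.4 if older else 0.1
        dropped = random.random() < p_drop
        rows.append((older, outcome, dropped))
    return rows


def mean(values):
    return sum(values) / len(values)


rows = simulate_mar()

# Within each age group, compare subjects who stayed with those who left.
within_group_gap = {}
for older in (True, False):
    stayed = [y for o, y, d in rows if o == older and not d]
    left = [y for o, y, d in rows if o == older and d]
    within_group_gap[older] = mean(stayed) - mean(left)
    label = "older" if older else "younger"
    print(f"{label}: stayed - left = {within_group_gap[older]:.2f}")

# Across all subjects, the complete-case mean is shifted because the
# older group (lower outcomes) is under-represented among those who stayed.
full_mean = mean([y for _, y, _ in rows])
complete_case_mean = mean([y for _, y, d in rows if not d])
print(f"all subjects: {full_mean:.2f}, completers only: {complete_case_mean:.2f}")
```

In this sketch the gap within each age group stays close to zero, whereas the mean among completers drifts toward the younger group's mean. That is the sense in which MAR missingness may influence results when the variable that drives it is related to the outcome.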

Missing not at random (MNAR)

Data are MNAR when the missingness is related to the outcome. It is present when the pattern of missing data is related to unobserved data, so the missing values cannot be predicted from the other values in the dataset.

MNAR is the most problematic type of missing data, since it would indicate that the dropouts were related to the therapy under study.
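A short simulation can show why this case is the most problematic. It is a sketch under assumed values, not a description of any particular study: the outcome distribution, the missingness probabilities, and the helper `complete_case_mean` are hypothetical. The same outcomes are thinned once with a constant missingness probability (MCAR) and once with a probability that depends on the outcome value itself (MNAR).

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def complete_case_mean(outcomes, p_missing):
    """Mean of the outcomes that remain observed.

    p_missing maps an outcome value to its probability of being missing.
    """
    observed = [y for y in outcomes if random.random() >= p_missing(y)]
    return sum(observed) / len(observed)


# Hypothetical outcomes for 20,000 subjects.
outcomes = [random.gauss(50, 10) for _ in range(20000)]
true_mean = sum(outcomes) / len(outcomes)

# MCAR: every value has the same 30% chance of being missing.
mcar_mean = complete_case_mean(outcomes, lambda y: 0.3)

# MNAR: subjects with low outcomes are more likely to drop out,
# so missingness depends on the very value that goes unobserved.
mnar_mean = complete_case_mean(outcomes, lambda y: 0.6 if y < 50 else 0.1)

print(f"true mean:       {true_mean:.2f}")
print(f"MCAR completers: {mcar_mean:.2f}")
print(f"MNAR completers: {mnar_mean:.2f}")
```

Under these assumptions the MCAR estimate stays close to the true mean, while the MNAR estimate is shifted upward because the low outcomes are the ones that disappear. Since the values that drive the dropout are not observed, the remaining data alone do not reveal the size of that shift.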

Handling missing data