Propensity score
Editor-In-Chief: C. Michael Gibson, M.S., M.D.
Overview
In medical research, observational studies do not allow investigators to control treatment assignment. As a result, covariates (e.g. age, sex) may differ substantially between treatment groups, which can bias estimates of treatment effects.
For example, in a study comparing outcomes between patients who received a lung transplant and those who did not, the transplant cohort may be older and have a lower body weight than the cohort that did not receive a transplant. Because the two cohorts are unbalanced on these two covariates, the estimated effect of lung transplant will be biased if these covariates also affect the outcome.
Propensity scores help to reduce this bias.
Definition
In the analysis of treatment effects, suppose that we have a binary treatment T, an outcome Y, and background variables X. The propensity score is defined as the conditional probability of treatment given background variables:
- <math>p(x) \ \stackrel{\mathrm{def}}{=}\ \Pr(T=1 | X=x).</math>
The propensity score was introduced by Rosenbaum and Rubin (1983) to provide an alternative method for estimating treatment effects when treatment assignment is not random, but can be assumed to be unconfounded. Let Y(0) and Y(1) denote the potential outcomes under control and treatment, respectively. Then treatment assignment is (conditionally) unconfounded if treatment is independent of potential outcomes conditional on X. This can be written compactly as
- <math>T \perp Y(0), Y(1) | X\,</math>
where <math>\perp</math> denotes statistical independence.
Rosenbaum and Rubin showed that if unconfoundedness holds, then
- <math>T \perp Y(0), Y(1) | p(X).</math>
While it is cognitively impossible to use the definition above to determine whether unconfoundedness holds in any specific situation, Pearl (2000) has shown that a simple graphical criterion, called the back-door criterion, provides an equivalent definition of unconfoundedness.
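To see why the result of Rosenbaum and Rubin is useful, the following sketch shows how it leads to an estimable average treatment effect. It assumes, beyond what is stated above, that every subject has a propensity score strictly between 0 and 1 and that the observed outcome is Y = T Y(1) + (1 - T) Y(0); both are standard assumptions, but neither is part of the definition given here. For t = 0, 1,
- <math>E[Y(t) | p(X)] = E[Y(t) | T=t, p(X)] = E[Y | T=t, p(X)],</math>
where the first equality uses unconfoundedness given p(X) and the second uses the definition of the observed outcome. Averaging over the distribution of p(X) then gives
- <math>E[Y(1) - Y(0)] = E\big[\, E[Y | T=1, p(X)] - E[Y | T=0, p(X)] \,\big],</math>
so comparing treated and control subjects who share the same propensity score, and then averaging those comparisons, recovers the average treatment effect under these assumptions.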
Application
There are three commonly used ways to incorporate propensity scores into an analysis of treatment effects: matching, stratification, and regression adjustment. In each of these methods, the propensity score is estimated in the same manner, but the way in which the score is used varies. One common way to estimate the propensity score is a logistic regression of treatment on clinically relevant or statistically significant baseline covariates. The advantage of the propensity score is that it approximates a randomized comparison of the treatment group with the control group with respect to the covariates in the model: when subjects are paired on the propensity score, both members of a pair had the same estimated probability of receiving the treatment.
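As a sketch of the logistic regression approach mentioned above, write the baseline covariates as x_1, ..., x_k and the regression coefficients as \beta_0, ..., \beta_k (notation introduced here for illustration only). The model for the propensity score is then
- <math>\log\frac{p(x)}{1-p(x)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,</math>
and the estimated propensity score for a subject with covariates x is obtained by plugging in the fitted coefficients:
- <math>\hat{p}(x) = \frac{1}{1+\exp\{-(\hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_k x_k)\}}.</math>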
Matching
One-to-one matching on the covariates themselves can be difficult, especially when there are numerous covariates to match on. It is easier to match on the propensity score because it is a single scalar assigned to each patient that summarizes all covariates in the model.
The first step is to match each treated subject with a control subject based on their respective propensity scores. Exact matching of scores is nearly impossible, so a range of acceptable values (a caliper) must be chosen. According to Rosenbaum and Rubin [1], a quarter of a standard deviation of the logit of the propensity score is an appropriate range. Once the subjects are paired, the means of the baseline covariates in the treatment and control groups are compared before and after matching. If the group means are more similar after matching than before, the matching has improved covariate balance and thereby reduced the bias in the estimated treatment effect that is due to those covariates.
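The matching range described above can be written as a simple rule. In this sketch, i indexes a treated subject, j a candidate control, and s denotes the standard deviation of the logit of the estimated propensity score; the symbols are introduced here for illustration, and whether s is computed over all subjects or pooled across the two groups is left open. Subject j is an acceptable match for subject i if
- <math>\left| \operatorname{logit}\hat{p}(x_i) - \operatorname{logit}\hat{p}(x_j) \right| \le 0.25\, s, \qquad \operatorname{logit}(p) = \log\frac{p}{1-p}.</math>
For instance, with a purely hypothetical value of s = 0.8, the allowed difference on the logit scale would be 0.25 × 0.8 = 0.2.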
Stratification
Another way to use the propensity score is through stratification. With this method, the propensity score is calculated and subjects are then divided into groups according to their score. Rosenbaum and Rubin [2] suggest stratifying the propensity score into quintiles because this usually eliminates over 90% of the bias in each covariate. Means of the baseline characteristics of treated subjects and controls are compared before and after stratification. For the post-stratification comparison of means, an adjustment is made using a categorical variable representing the propensity quintiles. One way to determine an overall treatment effect is to model the outcome predicted by treatment separately within each quintile and then combine the quintile-specific estimates. Another way is to model the outcome predicted by treatment and either the raw propensity score or the propensity quintile. A subset of covariates can also be included in this model.
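The first approach above, combining the estimates from each quintile, can be written as a weighted average. The choice of weights is an assumption of this sketch, since the text does not specify one; weighting by stratum size is one common option, and others (such as inverse-variance weights) are possible. Writing \hat\tau_k for the treatment effect estimated within quintile k, n_k for the number of subjects in that quintile, and n for the total number of subjects,
- <math>\hat\tau = \sum_{k=1}^{5} \frac{n_k}{n}\, \hat\tau_k.</math>
Because quintiles contain roughly equal numbers of subjects, each weight n_k/n is close to 1/5.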
Regression Adjustment
Regression adjustment is also a useful way to incorporate propensity scoring. With this method, a regression of treatment on a large set of background covariates is performed to obtain the propensity score. Once the propensity scores are obtained, a second regression of the outcome on treatment group and propensity score is used to estimate the treatment effect. A subset of important covariates can also be included in this second model. Both versions of the second model, with and without the subset of covariates, should lead to the same conclusions. Stratification and regression adjustment can also be combined, which may produce more accurate results than any one of the methods above used alone.
STATA Code
For the purposes of these examples, the data are entered as one line per subject. The survival outcome variables used in the code below (days2death and death) are not shown.
Subject ID | Lung transplant status | Sex | Age | BMI |
1 | 0 | male | 91 | 31.5 |
2 | 1 | female | 45 | 33.7 |
3 | 0 | female | 33 | 25 |
Matching
Generating Propensity Score
xi: logistic lungtransplant age i.sex bmi
predict propensity
- Note: Now we divide the propensity score into ranges to match on. To develop the ranges, look at the distribution of the propensity values.
gen propensity_class=1 if propensity<0.1
replace propensity_class=2 if 0.1<=propensity & propensity<0.2
replace propensity_class=3 if 0.2<=propensity & propensity<0.3
replace propensity_class=4 if 0.3<=propensity & propensity<0.4
replace propensity_class=5 if 0.4<=propensity & propensity<0.5
replace propensity_class=6 if 0.5<=propensity & propensity<0.6
replace propensity_class=7 if 0.6<=propensity & propensity<0.7
replace propensity_class=8 if 0.7<=propensity & propensity<0.8
replace propensity_class=9 if 0.8<=propensity & propensity<0.9
replace propensity_class=10 if 0.9<=propensity & propensity<=1
save "c:\transplant.dta", replace
- Note: Treated subjects are numbered within each propensity class so that each one is paired with a single control.
use "c:\transplant.dta", clear
keep if lungtransplant==1
bysort propensity_class (id): gen pair=_n
save "c:\transplant_yes.dta", replace
- Note: Control variables, including the outcome variables, are renamed so that they sit beside the treated subject's values after the merge.
use "c:\transplant.dta", clear
keep if lungtransplant==0
rename id id_no
rename lungtransplant lungtransplant_no
rename age age_no
rename sex sex_no
rename bmi bmi_no
rename days2death days2death_no
rename death death_no
bysort propensity_class (id_no): gen pair=_n
save "c:\transplant_no.dta", replace
- Note: The merge 1:1 syntax requires Stata 11 or later. Subjects without a partner in the same class are dropped.
merge 1:1 propensity_class pair using "c:\transplant_yes.dta"
tab _merge
keep if _merge==3
drop _merge
save "c:\matched_cohort.dta", replace
Now the dataset is arranged as follows (the pair, propensity, and outcome columns are not shown):
Propensity_class | Id | Id_no | Lungtransplant | Lungtransplant_no | Age | Age_no | sex | sex_no | bmi | bmi_no |
1 | 100 | 203 | 1 | 0 | 91 | 88 | female | female | 31.5 | 32 |
2 | 101 | 215 | 1 | 0 | 45 | 47 | male | male | 33.7 | 35 |
3 | 102 | 145 | 1 | 0 | 33 | 31 | male | female | 25 | 22.5 |
use "c:\matched_cohort.dta", clear
keep id lungtransplant age sex bmi days2death death
save "c:\matched_cohort_yes.dta", replace
use "c:\matched_cohort.dta", clear
keep id_no lungtransplant_no age_no sex_no bmi_no days2death_no death_no
rename id_no id
rename lungtransplant_no lungtransplant
rename age_no age
rename sex_no sex
rename bmi_no bmi
rename days2death_no days2death
rename death_no death
save "c:\matched_cohort_no.dta", replace
append using "c:\matched_cohort_yes.dta"
stset days2death, failure(death)
stcox lungtransplant
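The comparison of baseline characteristics described under Matching could be sketched as follows. This is a minimal illustration using built-in Stata commands, not a prescribed procedure; it assumes the appended matched cohort from the code above is in memory. Running the same commands on the full, unmatched dataset gives the pre-matching comparison.

```stata
* Compare covariate means between matched treated and control subjects
ttest age, by(lungtransplant)
ttest bmi, by(lungtransplant)
* sex is categorical, so compare its distribution across groups instead of means
tabulate sex lungtransplant, chi2
```

If the group means are closer in the matched cohort than in the full dataset, the matching has improved balance on these covariates.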
Stratification
Generating Propensity Score
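- Note: pscore is a user-written command, not part of official Stata, and must be installed before it can be used.
pscore lungtransplant age sex bmi, pscore(mypscore) blockid(myblock)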
Incorporating Propensity Score Stratification in the Model
stset days2death, failure(death)
stcox lungtransplant, strata(myblock)
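The quintile approach described under Stratification can also be sketched with built-in commands only. The variable names pshat and pquintile are hypothetical, and the example assumes the original one-line-per-subject dataset is in memory.

```stata
* Estimate the propensity score with a logistic regression of treatment
logistic lungtransplant age sex bmi
predict pshat
* Cut the estimated score into five equal-sized groups
xtile pquintile = pshat, nq(5)
* Check that each quintile contains both treated and control subjects
tabulate pquintile lungtransplant
stset days2death, failure(death)
* Option 1: Cox model stratified by propensity quintile
stcox lungtransplant, strata(pquintile)
* Option 2: propensity quintile entered as a categorical covariate
xi: stcox lungtransplant i.pquintile
```

Option 1 allows a separate baseline hazard in each quintile, while Option 2 assumes a common baseline hazard shifted by quintile; which is preferable depends on the data.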
Regression
Generating Propensity Score
logistic lungtransplant age sex bmi
predict propensity
Incorporating Propensity Score in the Model
stset days2death, failure(death)
- Model 1: Death predicted by lung transplant status (0/1) and propensity score
stcox lungtransplant propensity
- Model 2: Death predicted by lung transplant status (0/1), propensity score, and a set or subset of important covariates
stcox lungtransplant age sex bmi propensity
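Written out, the two models above correspond to the following proportional hazards specifications. This is a sketch in which h_0(t) is the baseline hazard, T is lung transplant status, \hat{p}(X) is the estimated propensity score, and the coefficient symbols \beta and \gamma are notation introduced here for illustration.
- Model 1: <math>h(t) = h_0(t)\exp\{\beta_1 T + \beta_2 \hat{p}(X)\}</math>
- Model 2: <math>h(t) = h_0(t)\exp\{\beta_1 T + \beta_2 \hat{p}(X) + \gamma_1\,\mathrm{age} + \gamma_2\,\mathrm{sex} + \gamma_3\,\mathrm{bmi}\}</math>
In both models, <math>\exp(\beta_1)</math> is the hazard ratio for lung transplant, which is the quantity stcox reports by default.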
References
- [1] Rosenbaum, P. R. and Rubin, D. B. (1985). "Constructing a control group using multivariate matched sampling methods that incorporate the propensity score," American Statistician, 39, 33-38.
- [2] Rosenbaum, P. R. and Rubin, D. B. (1984). "Reducing bias in observational studies using subclassification on the propensity score," Journal of the American Statistical Association, 79, 516-524.
Additional Resources
- D'Agostino, R. B. (2007). "Propensity Scores in Cardiovascular Research," Circulation, 115, 2340-2343.
- D'Agostino, R. B. (1998). "Tutorial in Biostatistics: Propensity Score Methods for Bias Reduction in the Comparison of a Treatment to a Non-Randomized Control Group," Statistics in Medicine, 17, 2265-2281.
- Pearl, J. (2000). Causality: Models, Reasoning, and Inference, Cambridge University Press.
- Rosenbaum, P. R., and Rubin, D. B., (1983), "The Central Role of the Propensity Score in Observational Studies for Causal Effects," Biometrika 70, 41-55.