USING PROPENSITY SCORES IN QUASI-EXPERIMENTAL DESIGNS
William M. Holmes
STATA COMMANDS FOR PROPENSITY USE
Shenyang Guo and Mark W. Fraser (2010) have written an excellent book on propensity score analysis that provides many Stata commands. While their treatment is more technical than this book, the Stata commands are quite useful. The support site for their book is http://ssw.unc.edu/psa/. The Stata Journal article by Becker and Ichino (2002) provides a good overview of propensity score use and an extensive set of Stata commands. It can be accessed at http://www.stata-journal.com/sjpdf.html?articlenum=st0026.
IMBALANCE ASSESSMENT
Imbalance may be assessed with the anova command or the ttest command. With ttest, the grouping variable is given in the by() option.
. anova confounder1 treatment
. ttest confounder1, by(treatment)
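A standardized mean difference is often reported alongside the t test, because it does not depend on sample size. The sketch below is one way to compute it from the results that ttest leaves behind. The data are simulated and the scalar name smd is made up for illustration.

```stata
* Simulated data: the variable names follow the examples above, the values are invented
clear
set seed 12345
set obs 500
generate confounder1 = rnormal()
generate treatment = runiform() < invlogit(0.8*confounder1)

* Two-sample t test of the confounder across treatment groups
ttest confounder1, by(treatment)

* Standardized mean difference: difference in means over the pooled SD.
* After ttest with by(), group 1 is treatment==0 and group 2 is treatment==1.
scalar smd = (r(mu_2) - r(mu_1)) / sqrt((r(sd_1)^2 + r(sd_2)^2) / 2)
display "Standardized difference = " %6.3f scalar(smd)
```

Run as a do-file. A larger absolute value indicates greater imbalance on the confounder.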
PROPENSITY ESTIMATION
Propensities may be estimated with logistic, probit, or linear regression or with discriminant function analysis commands when combined with the predict command.
. logistic treatment confounder1 confounder2 confounder3
. predict propen
. probit treatment confounder1 confounder2 confounder3
. predict propen
. regress treatment confounder1 confounder2 confounder3
. predict propen
. candisc confounder1 confounder2 confounder3, group(treatment)
. predict propen, pr group(#2)
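The choice of predict option matters for logit and logistic models. The sketch below, on simulated data, contrasts the pr option (predicted probability, the default) with the xb option (linear predictor on the log-odds scale). The names propen_xb and check are made up for illustration.

```stata
* Simulated data: the variable names follow the examples above, the values are invented
clear
set seed 12345
set obs 500
generate confounder1 = rnormal()
generate confounder2 = rnormal()
generate confounder3 = rnormal()
generate treatment = runiform() < invlogit(0.5*confounder1 - 0.3*confounder2)

* Logit model of treatment on the confounders
logit treatment confounder1 confounder2 confounder3

* pr gives the predicted probability of treatment, bounded by 0 and 1
predict propen, pr

* xb gives the linear predictor (log odds), which is not bounded by 0 and 1
predict propen_xb, xb

* The two are linked by the inverse logit transformation
generate double check = invlogit(propen_xb)
summarize propen propen_xb check
```

Strata or weights that assume a 0 to 1 scale need the probability, not the linear predictor.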
MATCHING
The user-written psmatch2 program provides a means for propensity score matching within Stata. It is not part of official Stata and can be installed with ssc install psmatch2. Alternative matching programs may be accessed with the R interface for Stata.
Exact and Coarsened Exact
See psmatch2 within Stata
Nearest Neighbor and Caliper
See psmatch2 within Stata
1-Many
See psmatch2 within Stata
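The sketch below shows how nearest neighbor, caliper, and 1-many matching might be requested with psmatch2. It assumes psmatch2 has been installed from SSC. The data are simulated, and the caliper width of 0.05 and the choice of 3 neighbors are arbitrary values chosen for illustration.

```stata
* Requires the user-written psmatch2 package (run once: ssc install psmatch2)
* Simulated data: the variable names follow the examples above, the values are invented
clear
set seed 12345
set obs 500
generate confounder1 = rnormal()
generate confounder2 = rnormal()
generate confounder3 = rnormal()
generate treatment = runiform() < invlogit(0.5*confounder1 - 0.3*confounder2)
generate outcome1 = 2*treatment + confounder1 + rnormal()

* 1-to-1 nearest neighbor matching within a caliper, on common support
psmatch2 treatment confounder1 confounder2 confounder3, ///
    outcome(outcome1) neighbor(1) caliper(0.05) common logit

* 1-to-many matching: up to 3 nearest neighbors within the same caliper
psmatch2 treatment confounder1 confounder2 confounder3, ///
    outcome(outcome1) neighbor(3) caliper(0.05) common logit

* Covariate balance check (pstest is installed with psmatch2)
pstest confounder1 confounder2 confounder3
```

Run as a do-file. psmatch2 stores the estimated propensity in _pscore and the matching weights in _weight.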
Optimized, Full, or Genetic
To do optimized, full, or genetic matching within Stata, the R interface must be used to access the corresponding R programs.
STRATIFYING
Creating propensity strata can be done with the recode command.
. recode propen (0/.2=1) (.2/.4=2) (.4/.6=3) (.6/.8=4) (.8/1=5), gen(propenstrata)
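Fixed-width strata can leave some strata with few cases. A common alternative is to form strata from quintiles of the estimated propensity, which the xtile command can do. The sketch below uses simulated data, and the name propenquint is made up for illustration.

```stata
* Simulated data: the variable names follow the examples above, the values are invented
clear
set seed 12345
set obs 500
generate confounder1 = rnormal()
generate confounder2 = rnormal()
generate treatment = runiform() < invlogit(0.5*confounder1 - 0.3*confounder2)

* Estimate the propensity as a predicted probability
quietly logit treatment confounder1 confounder2
predict propen, pr

* Five strata with roughly equal numbers of cases
xtile propenquint = propen, nquantiles(5)

* Check that each stratum holds both treated and untreated cases
tabulate propenquint treatment
```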
REGRESSION AND THE GENERAL LINEAR MODEL
Regression and General Linear Model estimation may be done with the regress and glm commands.
. regress outcome1 i.treatment
. glm outcome1 i.treatment, family(gaussian) link(identity)
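One common use of the propensity in a regression is as a covariate. The sketch below compares the treatment coefficient with and without that adjustment. The data are simulated with a true treatment effect of 2, and the scalar names b_naive and b_adj are made up for illustration.

```stata
* Simulated data: the true treatment effect is 2 and confounder1 affects
* both treatment and outcome (all values are invented)
clear
set seed 12345
set obs 2000
generate confounder1 = rnormal()
generate treatment = runiform() < invlogit(confounder1)
generate outcome1 = 2*treatment + 1.5*confounder1 + rnormal()

* Estimate the propensity as a predicted probability
quietly logit treatment confounder1
predict propen, pr

* Unadjusted estimate of the treatment effect
regress outcome1 i.treatment
scalar b_naive = _b[1.treatment]

* Estimate adjusted for the propensity
regress outcome1 i.treatment propen
scalar b_adj = _b[1.treatment]

display "Unadjusted: " %6.3f scalar(b_naive) "   Adjusted: " %6.3f scalar(b_adj)
```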
TWO-STAGE LEAST SQUARES
2SLS regression can be done with the ivregress command.
. ivregress 2sls outcome1 (treatment = instrument1), vce(robust)
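The sketch below runs the same ivregress command on simulated data in which an unobserved variable u biases the ordinary regression estimate. The data-generating values are invented, with a true treatment effect of 2. The estat firststage command reports the strength of the instrument.

```stata
* Simulated data: u is an unobserved confounder, instrument1 affects
* treatment but not the outcome directly (all values are invented)
clear
set seed 12345
set obs 2000
generate instrument1 = rnormal()
generate u = rnormal()
generate treatment = (0.8*instrument1 + u + rnormal()) > 0
generate outcome1 = 2*treatment + u + rnormal()

* Ordinary regression is biased because u is omitted
regress outcome1 treatment

* Two-stage least squares using instrument1 for treatment
ivregress 2sls outcome1 (treatment = instrument1), vce(robust)

* First-stage statistics for judging instrument strength
estat firststage
```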
SAMPLE WEIGHTING
Sample weighting is done within the analytical commands using pweight. For example, with a logit analysis, the command would be:
. logit treatment confounder1 confounder2 [pweight=ipa]
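The example above does not show how the weight variable ipa is built. The sketch below assumes it holds inverse propensity weights, 1/p for treated cases and 1/(1-p) for untreated cases, which is one common construction and may not be the one intended here. The data are simulated.

```stata
* Simulated data: the variable names follow the examples above, the values are invented
clear
set seed 12345
set obs 1000
generate confounder1 = rnormal()
generate confounder2 = rnormal()
generate treatment = runiform() < invlogit(0.5*confounder1 - 0.3*confounder2)
generate outcome1 = 2*treatment + confounder1 + rnormal()

* Estimate the propensity as a predicted probability
quietly logit treatment confounder1 confounder2
predict propen, pr

* Assumed construction of ipa: inverse of the probability of the
* treatment condition each case actually received
generate ipa = cond(treatment == 1, 1/propen, 1/(1 - propen))
summarize ipa

* Weighted outcome analysis
regress outcome1 i.treatment [pweight = ipa]
```

Very large weights in the summarize output point to propensities near 0 or 1.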
WEIGHTED LEAST SQUARES
Weighted Least Squares may be done within an analytical command (see sample weighting above) or with the user-written wls0 command.
. wls0 outcome treatment, wvar(ipa) type(abse) noconst graph
GENERALIZED LINEAR MODEL
. glm income educ jobexp i.black, family(gaussian) link(identity)
MISSING DATA ANALYSIS
Missing data can be examined with the user-written mdesc and mvpatterns commands or with Stata's built-in misstable command.
. mdesc confounder1 confounder2 confounder3
. mvpatterns confounder1 confounder2 confounder3
. misstable summarize
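The sketch below uses only the built-in misstable command, so no user-written programs are needed. The data are simulated, and the missing-data rates of 10 and 5 percent are arbitrary.

```stata
* Simulated data with values set to missing at random (all values are invented)
clear
set seed 12345
set obs 500
generate confounder1 = rnormal()
generate confounder2 = rnormal()
generate confounder3 = rnormal()
replace confounder1 = . if runiform() < 0.10
replace confounder2 = . if runiform() < 0.05

* Counts of missing and nonmissing values for each variable
misstable summarize confounder1 confounder2 confounder3

* Patterns of joint missingness across the variables
misstable patterns confounder1 confounder2 confounder3
```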
IMPUTATION OF MISSING DATA
Multiple imputation can be done with the mi impute command. The data must first be declared as mi data with mi set, and the variables to be imputed must then be registered. The example below imputes confounder1 from confounder2 and confounder3, creating 5 imputations with a random seed of 123. Estimation with the imputed data may be done with the mi estimate command.
. mi set mlong
. mi register imputed confounder1 confounder2
. mi impute regress confounder1 confounder2 confounder3, add(5) rseed(123)
. mi estimate: logit treatment confounder1 confounder2 confounder3
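The full sequence can be sketched on simulated data as follows. Only confounder1 has missing values here, so it is the only variable registered as imputed. Including treatment in the imputation model is a choice made for this illustration.

```stata
* Simulated data: confounder1 is set to missing for about 15 percent of
* cases after treatment has been generated (all values are invented)
clear
set seed 12345
set obs 500
generate confounder2 = rnormal()
generate confounder3 = rnormal()
generate confounder1 = 0.5*confounder2 + rnormal()
generate treatment = runiform() < invlogit(0.5*confounder1)
replace confounder1 = . if runiform() < 0.15

* Declare the data as mi data and register the variables
mi set mlong
mi register imputed confounder1
mi register regular confounder2 confounder3 treatment

* Create 5 imputations of confounder1 by linear regression
mi impute regress confounder1 confounder2 confounder3 treatment, add(5) rseed(123)

* Fit the treatment model on each imputation and combine the results
mi estimate: logit treatment confounder1 confounder2 confounder3
```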
Revised 1/31/2013