Chapter 24 Causal Inferense

Recall this fun advertisement

How come everyone in the past did not know what every kid knows these days: that cigarettes are bad for you. The reason is the difficulty in causal inference. Scientists knew about the correlations between smoking and disease, but no one could prove one caused the other. These could have been nothing more than correlations, with some external cause.

Cigarettes were declared dangerous without any direct causal evidence. It was in the USA’s surgeon general report of 1964 that it was decided that despite of the impossibility of showing a direct causal relation, the circumstantial evidence is just too strong, and declared cigarettes as dangerous.

24.1 Causal Inference From Designed Experiments

24.2 Causal Inference from Observational Data

24.2.1 Principal Stratification

Frumento et al. (2012)


24.2.2 Instrumental Variables


24.2.3 Propensity Scores


24.2.4 Direct Lieklihood


24.2.5 Regression Discontinuity

24.3 Bibliographic Notes

On the tail behind “smoking causes cancer” see NIH’s Reports of the Surgeon General.

24.4 Practice Yourself

Allard, Denis. 2013. “J.-P. Chiles, P. Delfiner: Geostatistics: Modeling Spatial Uncertainty.” Springer.

Analytics, Revolution, and Steve Weston. 2015. Foreach: Provides Foreach Looping Construct for R.

Anderson-Cook, Christine M. 2004. “An Introduction to Multivariate Statistical Analysis.” Journal of the American Statistical Association 99 (467). American Statistical Association: 907–9.

Arlot, Sylvain, Alain Celisse, and others. 2010. “A Survey of Cross-Validation Procedures for Model Selection.” Statistics Surveys 4. The author, under a Creative Commons Attribution License: 40–79.

Arnold, Taylor, Michael Kane, and Simon Urbanek. 2015. “Iotools: High-Performance I/O Tools for R.” arXiv Preprint arXiv:1510.00041.

Bai, Zhidong, and Hewa Saranadasa. 1996. “Effect of High Dimension: By an Example of a Two Sample Problem.” Statistica Sinica. JSTOR, 311–29.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. doi:10.18637/jss.v067.i01.

Benjamini, Yoav, and Daniel Yekutieli. 2001. “The Control of the False Discovery Rate in Multiple Testing Under Dependency.” Annals of Statistics. JSTOR, 1165–88.

Bonat, Wagner Hugo. 2018. “Multiple Response Variables Regression Models in R: The Mcglm Package.” Journal of Statistical Software 84 (1): 1–30.

Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2017. Shiny: Web Application Framework for R.

Chapple, Simon R, Eilidh Troup, Thorsten Forster, and Terence Sloan. 2016. Mastering Parallel Programming with R. Packt Publishing Ltd.

Christakos, George. 2000. Modern Spatiotemporal Geostatistics. Vol. 6. Oxford University Press.

Conway, Drew, and John White. 2012. Machine Learning for Hackers. “ O’Reilly Media, Inc.”

Cotton, Richard. 2017. Testing R Code. Chapman; Hall/CRC.

Cressie, Noel, and Christopher K Wikle. 2015. Statistics for Spatio-Temporal Data. John Wiley; Sons.

Davis, Timothy A. 2006. Direct Methods for Sparse Linear Systems. SIAM.

Diggle, Peter J, JA Tawn, and RA Moyeed. 1998. “Model-Based Geostatistics.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 47 (3). Wiley Online Library: 299–350.

Dowle, Matt, and Arun Srinivasan. 2017. Data.table: Extension of ‘Data.frame‘.

Efron, Bradley. 2012. Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Vol. 1. Cambridge University Press.

Eisenhart, Churchill. 1947. “The Assumptions Underlying the Analysis of Variance.” Biometrics 3 (1). JSTOR: 1–21.

Everitt, Brian, and Torsten Hothorn. 2011. An Introduction to Applied Multivariate Analysis with R. Springer Science & Business Media.

Fithian, William. 2015. “Topics in Adaptive Inference.” PhD thesis, STANFORD UNIVERSITY.

Foster, Dean P, and Robert A Stine. 2004. “Variable Selection in Data Mining: Building a Predictive Model for Bankruptcy.” Journal of the American Statistical Association 99 (466). Taylor & Francis: 303–13.

Friedman, Jerome, Trevor Hastie, and Robert Tibshirani. 2001. The Elements of Statistical Learning. Vol. 1. Springer series in statistics Springer, Berlin.

Frumento, Paolo, Fabrizia Mealli, Barbara Pacini, and Donald B Rubin. 2012. “Evaluating the Effect of Training on Wages in the Presence of Noncompliance, Nonemployment, and Missing Outcome Data.” Journal of the American Statistical Association 107 (498). Taylor; Francis Group: 450–66.

Gentle, James E. 2012. Numerical Linear Algebra for Applications in Statistics. Springer Science & Business Media.

Gilbert, John R, Cleve Moler, and Robert Schreiber. 1992. “Sparse Matrices in Matlab: Design and Implementation.” SIAM Journal on Matrix Analysis and Applications 13 (1). SIAM: 333–56.

Golub, Gene H, and Charles F Van Loan. 2012. Matrix Computations. Vol. 3. JHU Press.

Graham, RL. 1988. “Isometric Embeddings of Graphs.” Selected Topics in Graph Theory 3. Academic Press San Diego, CA: 133–50.

Greene, William H. 2003. Econometric Analysis. Pearson Education India.

Hotelling, Harold. 1933. “Analysis of a Complex of Statistical Variables into Principal Components.” Journal of Educational Psychology 24 (6). Warwick & York: 417.

Izenman, Alan Julian. 2008. “Modern Multivariate Statistical Techniques.” Regression, Classification and Manifold Learning. Springer.

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. Vol. 6. Springer.

Javanmard, Adel, and Andrea Montanari. 2014. “Confidence Intervals and Hypothesis Testing for High-Dimensional Regression.” Journal of Machine Learning Research 15 (1): 2869–2909.

Kalisch, Markus, and Peter Bühlmann. 2014. “Causal Structure Learning and Inference: A Selective Review.” Quality Technology & Quantitative Management 11 (1). Taylor & Francis: 3–21.

Kane, Michael J, John Emerson, Stephen Weston, and others. 2013. “Scalable Strategies for Computing with Massive Data.” Journal of Statistical Software 55 (14): 1–19.

Kempthorne, Oscar. 1975. “Fixed and Mixed Models in the Analysis of Variance.” Biometrics. JSTOR, 473–86.

Lantz, Brett. 2013. Machine Learning with R. Packt Publishing Ltd.

Leisch, Friedrich. 2002. “Sweave: Dynamic Generation of Statistical Reports Using Literate Data Analysis.” In Compstat, 575–80. Springer.

Leskovec, Jure, Anand Rajaraman, and Jeffrey David Ullman. 2014. Mining of Massive Datasets. Cambridge University Press.

McCullagh, Peter. 1984. “Generalized Linear Models.” European Journal of Operational Research 16 (3). Elsevier: 285–92.

Mohri, Mehryar, Afshin Rostamizadeh, and Ameet Talwalkar. 2012. Foundations of Machine Learning. MIT press.

Nadler, Boaz. 2008. “Finite Sample Approximation Results for Principal Component Analysis: A Matrix Perturbation Approach.” The Annals of Statistics. JSTOR, 2791–2817.

Pearson, Karl. 1901. “LIII. on Lines and Planes of Closest Fit to Systems of Points in Space.” The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2 (11). Taylor & Francis: 559–72.

Pinero, Jose, and Douglas Bates. 2000. “Mixed-Effects Models in S and S-Plus (Statistics and Computing).” Springer, New York.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Rabinowicz, Assaf, and Saharon Rosset. 2018. “Assessing Prediction Error at Interpolation and Extrapolation Points.” arXiv Preprint arXiv:1802.00996.

Ripley, Brian D. 2007. Pattern Recognition and Neural Networks. Cambridge university press.

Robinson, George K. 1991. “That Blup Is a Good Thing: The Estimation of Random Effects.” Statistical Science. JSTOR, 15–32.

Rosenblatt, Jonathan. 2013. “A Practitioner’s Guide to Multiple Testing Error Rates.” arXiv Preprint arXiv:1304.4920.

Rosenblatt, Jonathan D, and Yoav Benjamini. 2014. “Selective Correlations; Not Voodoo.” NeuroImage 103. Elsevier: 401–10.

Rosenblatt, Jonathan, Roee Gilron, and Roy Mukamel. 2016. “Better-Than-Chance Classification for Signal Detection.” arXiv Preprint arXiv:1608.08873.

Rosset, Saharon, and Ryan J Tibshirani. 2018. “From Fixed-X to Random-X Regression: Bias-Variance Decompositions, Covariance Penalties, and Prediction Error Estimation.” Journal of the American Statistical Association, no. just-accepted. Taylor & Francis.

Sammut, Claude, and Geoffrey I Webb. 2011. Encyclopedia of Machine Learning. Springer Science & Business Media.

Sarkar, Deepayan. 2008. Lattice: Multivariate Data Visualization with R. New York: Springer.

Schmidberger, Markus, Martin Morgan, Dirk Eddelbuettel, Hao Yu, Luke Tierney, and Ulrich Mansmann. 2009. “State of the Art in Parallel Computing with R.” Journal of Statistical Software 47 (1).

Searle, Shayle R, George Casella, and Charles E McCulloch. 2009. Variance Components. Vol. 391. John Wiley & Sons.

Shah, Viral, and John R Gilbert. 2004. “Sparse Matrices in Matlab* P: Design and Implementation.” In International Conference on High-Performance Computing, 144–55. Springer.

Shalev-Shwartz, Shai, and Shai Ben-David. 2014. Understanding Machine Learning: From Theory to Algorithms. Cambridge university press.

Shawe-Taylor, John, and Nello Cristianini. 2004. Kernel Methods for Pattern Analysis. Cambridge university press.

Simes, R John. 1986. “An Improved Bonferroni Procedure for Multiple Tests of Significance.” Biometrika 73 (3). Oxford University Press: 751–54.

Small, Christopher G. 1990. “A Survey of Multidimensional Medians.” International Statistical Review/Revue Internationale de Statistique. JSTOR, 263–77.

Tukey, John W. 1977. Exploratory Data Analysis. Reading, Mass.

Vapnik, Vladimir. 2013. The Nature of Statistical Learning Theory. Springer science & business media.

Venables, William N, and Brian D Ripley. 2013. Modern Applied Statistics with S-Plus. Springer Science & Business Media.

Venables, William N, David M Smith, R Development Core Team, and others. 2004. “An Introduction to R.” Network Theory Limited.

Wang, Chun, Ming-Hui Chen, Elizabeth Schifano, Jing Wu, and Jun Yan. 2015. “Statistical Methods and Computing for Big Data.” arXiv Preprint arXiv:1502.07989.

Weihs, Claus, Olaf Mersmann, and Uwe Ligges. 2013. Foundations of Statistical Algorithms: With References to R Packages. CRC Press.

Weiss, Robert E. 2005. Modeling Longitudinal Data. Springer Science & Business Media.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

———. 2011. “Testthat: Get Started with Testing.” The R Journal 3 (1): 5–10.

———. 2014. Advanced R. CRC Press.

Wickham, Hadley, and Romain Francois. 2016. Dplyr: A Grammar of Data Manipulation.

Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data.

Wilcox, Rand R. 2011. Introduction to Robust Estimation and Hypothesis Testing. Academic Press.

Wilkinson, GN, and CE Rogers. 1973. “Symbolic Description of Factorial Models for Analysis of Variance.” Applied Statistics. JSTOR, 392–99.

Wilkinson, Leland. 2006. The Grammar of Graphics. Springer Science & Business Media.

Xie, Yihui. 2015. Dynamic Documents with R and Knitr. Vol. 29. CRC Press.

———. 2016. Bookdown: Authoring Books and Technical Documents with R Markdown. CRC Press.


Frumento, Paolo, Fabrizia Mealli, Barbara Pacini, and Donald B Rubin. 2012. “Evaluating the Effect of Training on Wages in the Presence of Noncompliance, Nonemployment, and Missing Outcome Data.” Journal of the American Statistical Association 107 (498). Taylor; Francis Group: 450–66.