Statistics and R for the Life Sciences

Overview

We will learn the basics of statistical inference in order to understand and compute p-values and confidence intervals. We will provide examples by programming in R in a way that will help make the connection between concepts and implementation. Problem sets requiring R programming will be used to test understanding and ability to implement basic data analyses. We will use visualization techniques to explore new data sets and determine the most appropriate approach. We will describe robust statistical techniques as alternatives when data do not fit assumptions required by the standard approaches. We will also introduce the basics of using R scripts to conduct reproducible research.
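As a rough illustration of what computing a p-value and a confidence interval looks like in R, here is a minimal sketch. The data are simulated and the group names and parameter values are hypothetical; they are not taken from the course materials.

```r
# Simulated (hypothetical) measurements for a control and a treatment group
set.seed(1)
control   <- rnorm(12, mean = 24, sd = 3)
treatment <- rnorm(12, mean = 27, sd = 3)

# Two-sample t-test; with two samples, t.test() uses the Welch
# (unequal variance) version by default
result <- t.test(treatment, control)

result$p.value    # p-value for the null hypothesis of equal means
result$conf.int   # 95% confidence interval for the difference in means
```

Printing `result` directly shows the test statistic, degrees of freedom, p-value and confidence interval together.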

Suggested pre-requisites

  • PH207x: Health in Numbers: Quantitative Methods in Clinical and Public Health Research. This is another HarvardX course, which we recommend, but it is not a strict pre-requisite.
  • Basic programming skills. We will assume that learners are familiar with very basic programming concepts (variables, functions).
  • Familiarity with the R language. The course will use R in order to demonstrate data analyses. In the first week, we will have a refresher on the commands in R which you will need to use in the following weeks, but this is not a comprehensive R course, and we will not go in depth on R syntax. Please see below for online R resources.

Related Resources

Online R resources:

  • R reference card (PDF) by Tom Short (more can be found under Short Documents and Reference Cards here)
  • Quick-R: Quick online reference for data input, basic statistics, and plots
  • Thomas Girke's R & Bioconductor manuals
  • R programming class on Coursera, taught by Roger Peng, Jeff Leek, and Brian Caffo
  • The free "try R" class from Code School is also a good place to start: http://tryr.codeschool.com/
  • swirl: learn R interactively from within the R console

R Books:

  • Software for Data Analysis: Programming with R (Statistics and Computing) by John M. Chambers (Springer)
  • S Programming (Statistics and Computing) by Brian D. Ripley and William N. Venables (Springer)
  • Programming with Data: A Guide to the S Language by John M. Chambers (Springer)

Schedule

Week 1 : Introduction

  • Using RStudio
  • R programming skills
  • Getting organized

Week 2 : Probability Distributions

  • Introduction to random variables
  • Introduction to the null distribution
  • Probability distributions
  • The normal distribution
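To give a flavor of the Week 2 topics, the sketch below builds a null distribution by simulation and compares a tail probability with its normal approximation. The population, the group size and the observed difference are all hypothetical values chosen for illustration.

```r
set.seed(1)

# Hypothetical population of 10,000 measurements
population <- rnorm(10000, mean = 24, sd = 3.5)

# Null distribution: difference in means between two groups of 12,
# both drawn from the same population, repeated many times
n <- 12
null_diffs <- replicate(10000, {
  mean(sample(population, n)) - mean(sample(population, n))
})

# Proportion of null differences at least as extreme as a
# hypothetical observed difference of 3
observed <- 3
mean(abs(null_diffs) >= observed)

# Normal approximation of the same two-sided tail probability
2 * pnorm(-abs(observed), mean = 0, sd = sd(null_diffs))
```

The two tail probabilities should be close here because the simulated population is itself normal; with skewed data and small groups the approximation can be worse.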

Week 3 : Inference

  • t-tests
  • The Central Limit Theorem
  • Association tests
  • Monte Carlo methods
  • Permutation tests
  • Power
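As one example of the Week 3 material, a permutation test can be sketched in a few lines of base R. The data below are simulated, and the variable names are hypothetical; this is an illustration under those assumptions rather than the course's own code.

```r
set.seed(1)

# Simulated (hypothetical) data for two groups
control   <- rnorm(12, mean = 24, sd = 3)
treatment <- rnorm(12, mean = 27, sd = 3)
observed  <- mean(treatment) - mean(control)

# Under the null hypothesis the group labels are exchangeable,
# so we shuffle them and recompute the difference in means
pooled <- c(control, treatment)
n_trt  <- length(treatment)
perm_diffs <- replicate(5000, {
  idx <- sample(length(pooled), n_trt)   # indices relabeled as "treatment"
  mean(pooled[idx]) - mean(pooled[-idx])
})

# Two-sided permutation p-value; adding 1 to the numerator and the
# denominator keeps the estimate from being exactly zero
p_value <- (sum(abs(perm_diffs) >= abs(observed)) + 1) /
  (length(perm_diffs) + 1)
p_value
```

The same data can be passed to `t.test(treatment, control)` to compare the permutation p-value with the one from the t-distribution approximation.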

Week 4 : Exploratory Data Analysis and Robust Summaries

  • Exploratory data analysis
      • histogram
      • QQ-plot
      • boxplot
      • scatterplot
      • log transformation
  • Robust summaries
      • Median, MAD and Spearman correlation
      • Mann-Whitney-Wilcoxon test
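
The robust summaries listed above are all available in base R. The sketch below uses simulated data with a single artificial outlier (a hypothetical recording error) to show how the robust versions behave compared with the mean, standard deviation, Pearson correlation and t-test.

```r
set.seed(1)

# Simulated (hypothetical) paired measurements
x <- rnorm(50, mean = 10, sd = 2)
y <- x + rnorm(50, sd = 1)

# Introduce one gross outlier, e.g. a recording error
x_out <- x
x_out[1] <- 1000

# Location: the mean is pulled toward the outlier, the median barely moves
c(mean = mean(x_out), median = median(x_out))

# Spread: sd is inflated, the median absolute deviation (MAD) is not
c(sd = sd(x_out), mad = mad(x_out))

# Correlation: Pearson vs. rank-based Spearman
cor(x_out, y)
cor(x_out, y, method = "spearman")

# Mann-Whitney-Wilcoxon test: a rank-based alternative to the t-test
a <- rnorm(20, mean = 0)
b <- rnorm(20, mean = 1)
b[1] <- 100   # outlier in one group
wilcox.test(a, b)$p.value
t.test(a, b)$p.value
```

For the plots in the first half of the list, the corresponding base R functions are `hist()`, `qqnorm()` with `qqline()`, `boxplot()` and `plot()`.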