April 13, 2017

Session 3 - Agenda

  1. Statistical Distributions in R

Statistical Distributions in R:

  • R has many built-in statistical distributions
    • e.g., binomial, Poisson, normal, chi-squared, …
  • Each distribution in R has four functions:
    • Their names begin with "d", "p", "q", or "r", followed by the (abbreviated) name of the distribution, e.g., dbinom(), pnorm(); below, "dist" stands for that name
  • ddist(): gives the density (for a discrete distribution, the probability mass function)
  • pdist(): gives the cumulative distribution function (CDF)
  • qdist(): gives the quantile function (the inverse of the CDF)
  • rdist(): generates random numbers from the distribution
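The same d/p/q/r naming convention applies to every distribution. As a minimal sketch, here are all four functions for a Poisson distribution; the mean of 3 is an arbitrary choice for illustration. The last line shows that the CDF of a discrete distribution is the running sum of its probability mass function.

```r
# Poisson distribution with an arbitrarily chosen mean (lambda) of 3
lambda <- 3

d <- dpois(2, lambda)     # Pr[X = 2]                               (d: pmf)
p <- ppois(2, lambda)     # Pr[X <= 2]                              (p: CDF)
q <- qpois(0.5, lambda)   # smallest x with Pr[X <= x] >= 0.5       (q: quantile)
r <- rpois(5, lambda)     # 5 random draws                          (r: random)

# The CDF at 2 equals the sum of the probabilities at 0, 1 and 2
all.equal(p, sum(dpois(0:2, lambda)))
```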

Discrete Distribution: Binomial

  • Consider tossing a fair coin 10 times
  • The number of heads in the 10 tosses follows a binomial distribution with size = 10 and prob = 0.5
  • Let's calculate the probability of getting exactly five heads using the function dbinom()
str(dbinom) # binomial probability mass func
## function (x, size, prob, log = FALSE)
dbinom(5, 10, 0.5) # Pr[X = 5] = ?
## [1] 0.2460938
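To see where this number comes from, the sketch below writes out the binomial probability mass function, choose(n, k) * p^k * (1 - p)^(n - k), and compares it with dbinom(). The variable names n, k and p are just illustrative.

```r
# Binomial pmf written out by hand: choose(n, k) * p^k * (1 - p)^(n - k)
n <- 10; k <- 5; p <- 0.5
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)
by_hand                                # 252 / 1024
all.equal(by_hand, dbinom(k, n, p))

# dbinom() is vectorized: the pmf over every possible count sums to 1
sum(dbinom(0:10, size = 10, prob = 0.5))
```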

Discrete Distribution: Binomial

  • Next, let's calculate the probability of getting 5 or fewer heads using the function pbinom()
str(pbinom) # binomial CDF 
## function (q, size, prob, lower.tail = TRUE, log.p = FALSE)
pbinom(5, 10, 0.5) # Pr[X <= 5] = ?
## [1] 0.6230469
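The CDF of a discrete distribution is the sum of the pmf up to and including q, so pbinom() can be checked against dbinom(). The lower.tail argument, already visible in the str() output, switches to the upper tail; note that for lower.tail = FALSE the probability is Pr[X > q], strictly greater.

```r
# Pr[X <= 5] is the sum of the pmf over 0, 1, ..., 5
cdf_sum <- sum(dbinom(0:5, size = 10, prob = 0.5))
all.equal(cdf_sum, pbinom(5, 10, 0.5))

# lower.tail = FALSE gives the upper tail Pr[X > 5] (strictly greater)
upper <- pbinom(5, 10, 0.5, lower.tail = FALSE)
all.equal(upper, 1 - pbinom(5, 10, 0.5))
```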

Discrete Distribution: Binomial

  • Now, suppose we are given the probability 0.75 and want the number of heads at which the CDF reaches 0.75, using qbinom() (note that this is the inverse of pbinom()). Because the binomial is discrete, qbinom() returns the smallest count x such that Pr[X <= x] >= 0.75
str(qbinom) # binomial quantile func
## function (p, size, prob, lower.tail = TRUE, log.p = FALSE)
qbinom(0.75, 10, 0.5) # smallest ? s.t. Pr[X <= ?] >= 0.75
## [1] 6
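No count has a CDF of exactly 0.75 here, which is why qbinom() uses the "smallest x whose CDF reaches p" rule. A short check under that rule: compute the CDF at every count and pick the first one that reaches 0.75.

```r
# CDF at each possible count 0..10
cdf <- pbinom(0:10, size = 10, prob = 0.5)
round(cdf[6:7], 4)   # Pr[X <= 5] and Pr[X <= 6]: 0.6230 and 0.8281

# The smallest count whose CDF is at least 0.75 matches qbinom()
smallest <- (0:10)[which(cdf >= 0.75)[1]]
smallest == qbinom(0.75, 10, 0.5)
```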

Discrete Distribution: Binomial

  • Finally, let's generate 20 independent samples from a binomial(10, 0.5) distribution. This is equivalent to flipping a coin 10 times and counting the heads, repeated 20 times. Since the draws are random, your output will differ.
str(rbinom) # binomial random number generator
## function (n, size, prob)
rbinom(20, 10, 0.5) # 20 independent samples from binomial(10, 0.5)
##  [1] 4 4 5 4 6 8 7 5 6 3 4 4 6 6 4 4 6 4 5 7
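Two practical follow-ups, sketched below. First, set.seed() makes random draws reproducible (the seed 2017 is arbitrary). Second, the coin-flipping interpretation can be spelled out with Bernoulli draws (size = 1). With many samples, the sample mean should land near size * prob = 5.

```r
# Same seed, same draws: set.seed() makes results reproducible
set.seed(2017); a <- rbinom(20, 10, 0.5)
set.seed(2017); b <- rbinom(20, 10, 0.5)
identical(a, b)   # TRUE

# Flip a fair coin 10 times (Bernoulli draws, size = 1), count the heads,
# and repeat 20 times
flips <- replicate(20, sum(rbinom(10, size = 1, prob = 0.5)))

# With many samples, the sample mean approaches size * prob = 5
big <- rbinom(10000, 10, 0.5)
mean(big)
```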

Continuous Distribution: Standard Normal

  • Calculate the value of the probability density function at \(X = 0\)
str(dnorm) # normal pdf
## function (x, mean = 0, sd = 1, log = FALSE)
dnorm(x = 0, mean = 0, sd = 1)
## [1] 0.3989423
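The value 0.3989423 is 1 / sqrt(2 * pi), the peak of the standard normal density. The sketch below writes the normal pdf out by hand; normal_pdf() is a hypothetical helper for illustration, not an R function. It also shows that a density is not a probability: with a small sd it can exceed 1.

```r
# Hypothetical helper: normal pdf written out by hand
# f(x) = exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
normal_pdf <- function(x, mu = 0, sigma = 1) {
  exp(-(x - mu)^2 / (2 * sigma^2)) / (sigma * sqrt(2 * pi))
}
normal_pdf(0)                          # equals 1 / sqrt(2 * pi)
all.equal(normal_pdf(0), dnorm(0))

# A density value is not a probability: with sd = 0.1 it exceeds 1
dnorm(0, mean = 0, sd = 0.1)
```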

Continuous Distribution: Standard Normal

  • Calculate the probability that \(X \leq 0\)
str(pnorm) # normal CDF
## function (q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
pnorm(0, mean = 0, sd = 1) # Pr[X <= 0] = ?
## [1] 0.5
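For a continuous distribution, the CDF is the area under the density to the left of q. The sketch checks this numerically with integrate() and uses pnorm() for two other common questions: the probability of falling within about 1.96 standard deviations of the mean, and an upper tail via lower.tail = FALSE.

```r
# The CDF is the area under the density to the left of q
area <- integrate(dnorm, lower = -Inf, upper = 0)$value
area                                   # numerically close to 0.5

# Probability of falling within about 1.96 sd of the mean
pnorm(1.96) - pnorm(-1.96)             # roughly 0.95

# Upper tail Pr[X > 1.96] via lower.tail = FALSE
pnorm(1.96, lower.tail = FALSE)
```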

Continuous Distribution: Standard Normal

  • Find the value for which the CDF equals 0.975 (the 97.5th percentile)
str(qnorm) # normal quantile func
## function (p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(0.975, mean = 0, sd = 1) # Pr[X <= ?] = 0.975
## [1] 1.959964
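Because qnorm() is the inverse of pnorm(), a round trip returns the original probability. The sketch also uses the symmetry of the standard normal, the lower.tail argument, and the fact that quantiles of a normal with other parameters are shifted and scaled by mean and sd (the values 100 and 15 are arbitrary).

```r
# qnorm() undoes pnorm(): the round trip returns the original probability
z <- qnorm(0.975)
pnorm(z)                               # 0.975

# By symmetry of the standard normal, the 2.5% quantile is -z
all.equal(qnorm(0.025), -z)

# The same cutoff from the upper tail: Pr[X > ?] = 0.025
all.equal(qnorm(0.025, lower.tail = FALSE), z)

# Other means and sds: the quantile is mean + sd * z
all.equal(qnorm(0.975, mean = 100, sd = 15), 100 + 15 * z)
```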

Continuous Distribution: Standard Normal

  • Generate 10 independent random numbers from a standard normal distribution (your values will differ, since the draws are random)
str(rnorm) # normal random number generator
## function (n, mean = 0, sd = 1)
rnorm(10, mean = 0, sd = 1)
##  [1]  0.28732831  1.28870120  0.95492333 -1.05616106  0.71571889
##  [6] -0.06861534  0.25125574  1.13770718  0.31197057  1.73683798
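With a large sample, summaries of rnorm() draws should agree with the other three functions. A sketch, assuming an arbitrary seed of 2017: the sample mean and sd come out close to 0 and 1, and the proportion of draws at or below 1.96 comes out close to pnorm(1.96).

```r
set.seed(2017)                         # arbitrary seed, for reproducibility
x <- rnorm(10000, mean = 0, sd = 1)

mean(x)                                # close to 0
sd(x)                                  # close to 1
mean(x <= 1.96)                        # close to pnorm(1.96), about 0.975

# Draws from a normal with another mean and sd
y <- rnorm(5, mean = 100, sd = 15)
```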

Let's try plotting a normal curve (more on plotting later)

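x <- seq(from = -3, to = 3, by = 0.05)  # grid of x values from -3 to 3
y <- dnorm(x, mean = 0, sd = 1)         # standard normal density at each x
plot(x, y, type = "l")                  # type = "l" joins the points with a line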
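An alternative, sketched here with base graphics: curve() evaluates an expression in x over a range and draws it, so the grid does not need to be built by hand. The dashed overlay with sd = 2 is an arbitrary comparison showing how a larger sd flattens and widens the curve.

```r
# curve() evaluates an expression in x over [from, to] and draws it;
# it also invisibly returns the x and y values it used
res <- curve(dnorm(x, mean = 0, sd = 1), from = -3, to = 3,
             xlab = "x", ylab = "density")

# Overlay a wider normal (sd = 2) as a dashed line for comparison
curve(dnorm(x, mean = 0, sd = 2), from = -3, to = 3, add = TRUE, lty = 2)
legend("topright", legend = c("sd = 1", "sd = 2"), lty = c(1, 2))
```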

Break Time