Intro to R Workshop: Session 2

UCI Data Science Initiative

April 13, 2017

Session 2 - Agenda

  1. Vectorized Operations in R
  2. Reading and Writing Data in R
  3. Control Structures
  4. R Packages and Functions

Vectorized Operations

x <- 1:5
y <- c(1, 2, 6, 7, 10)
x + y # R does an element-by-element summation
## [1]  2  4  9 11 15
x < y
## [1] FALSE FALSE  TRUE  TRUE  TRUE
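
To make "element by element" concrete, the sketch below computes the same sum with an explicit loop and checks that it matches the vectorized result. The names loopSum and vecSum are introduced here for illustration only.

```r
x <- 1:5
y <- c(1, 2, 6, 7, 10)

# Explicit loop: add the i-th elements one pair at a time
loopSum <- numeric(length(x))
for (i in seq_along(x)) {
  loopSum[i] <- x[i] + y[i]
}

# Vectorized form: the same values in a single expression
vecSum <- x + y

all.equal(loopSum, vecSum)
## [1] TRUE
```

The vectorized form is usually shorter and easier to read than the loop.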

Vectorized Operations

x <- matrix(1:9, ncol = 3)
y <- matrix(rep(c(5,6,7), 3), ncol = 3)
x + y # R does an element-by-element summation
##      [,1] [,2] [,3]
## [1,]    6    9   12
## [2,]    8   11   14
## [3,]   10   13   16
x < y
##      [,1] [,2]  [,3]
## [1,] TRUE TRUE FALSE
## [2,] TRUE TRUE FALSE
## [3,] TRUE TRUE FALSE

Reading and Writing Data

The slides for the “Reading and Writing Data” section were mainly adapted from Dr. Roger D. Peng, Associate Professor at Johns Hopkins.

Main functions for reading data into R:

  1. read.table(), read.csv(): to read tabular data
  2. readLines(): to read lines of a text file
  3. source(), dget(): to read R code
  4. load(): to read saved workspaces

Reading and Writing Data

Main functions for writing data from R:

  1. write.table(), write.csv(): to write tabular data to file
  2. writeLines(): to write lines to a text file
  3. dump(), dput(): to write R code to a file
  4. save(): to save a workspace
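
The reading and writing functions come in pairs. A minimal sketch of three round trips follows, using temporary files so nothing is written to the working directory. The object and file names (tmpTxt, tmpR, tmpRda, myList, myList2) are made up for this example.

```r
# writeLines() / readLines(): plain text, one element per line
tmpTxt <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), tmpTxt)
txt <- readLines(tmpTxt)

# dput() / dget(): a single R object stored as R code
tmpR <- tempfile(fileext = ".R")
myList <- list(a = 1:3, b = "text")
dput(myList, file = tmpR)
myList2 <- dget(tmpR)
identical(myList, myList2)
## [1] TRUE

# save() / load(): one or more objects in R's binary format
tmpRda <- tempfile(fileext = ".RData")
save(myList, file = tmpRda)
rm(myList)
load(tmpRda)  # restores the object under its original name, myList
```

Note that dget() returns the object, so the result must be assigned, whereas load() recreates objects under the names they were saved with.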

read.table():

irisFile <- read.table(file = "iris.csv", sep=",", header = TRUE)
head(irisFile)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width     Species
## 1          5.1         3.5          1.4         0.2 Iris-setosa
## 2          4.9         3.0          1.4         0.2 Iris-setosa
## 3          4.7         3.2          1.3         0.2 Iris-setosa
## 4          4.6         3.1          1.5         0.2 Iris-setosa
## 5          5.0         3.6          1.4         0.2 Iris-setosa
## 6          5.4         3.9          1.7         0.4 Iris-setosa

write.table():

write.table(irisFile, file = "new_iris.csv", sep = ",", col.names = TRUE, row.names = FALSE) # row.names = FALSE keeps the header aligned with the data columns
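
For comma-separated files, read.csv() and write.csv() are convenience wrappers around read.table() and write.table() with sep = "," already set. The sketch below writes the built-in iris data set to a temporary file and reads it back. The names tmpCsv and irisBack are assumptions made for this example.

```r
tmpCsv <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing an extra, unnamed column of row labels
write.csv(iris, file = tmpCsv, row.names = FALSE)

# read.csv() assumes header = TRUE and sep = ","
irisBack <- read.csv(tmpCsv)

dim(irisBack)
## [1] 150   5
names(irisBack)
```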

Control Structures:

for loops:

print(paste("The year is", 2014))
## [1] "The year is 2014"
print(paste("The year is", 2015))
## [1] "The year is 2015"
print(paste("The year is", 2016))
## [1] "The year is 2016"

for loops:

for(i in 2014:2016){
  print(paste("The year is", i))
}
## [1] "The year is 2014"
## [1] "The year is 2015"
## [1] "The year is 2016"

for loops:

vec <- seq(2, 20, by = 2)
newvec <- vector("numeric", length = length(vec))
for(i in seq_along(vec)){ # seq_along() is safe even when vec is empty
  newvec[i] <- vec[i]^2
}
newvec
##  [1]   4  16  36  64 100 144 196 256 324 400
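
Because arithmetic in R is vectorized, this particular loop can also be written as a single expression. A minimal sketch, where the name newvecVectorized is introduced only for this example:

```r
vec <- seq(2, 20, by = 2)

# ^ is applied to every element of vec at once
newvecVectorized <- vec^2
newvecVectorized
##  [1]   4  16  36  64 100 144 196 256 324 400
```

Loops remain useful when each step depends on the result of the previous one.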

if/else statements:

x <- 7
if (x < 10){
  print("x is less than 10")
}else{
  print("x is greater than or equal to 10")
}
## [1] "x is less than 10"

Combining for loops and if/else statements:

for loops and if/else statements:

age <- sample(1:100, 10)
ageCat <- rep(NA, length(age))
for (i in seq_along(age)) {
  if (age[i] <= 35){
    ageCat[i] <- "Young"
  }else if (age[i] <= 55){
    ageCat[i] <- "Middle-Aged"
  }else{
    ageCat[i] <- "Old"
  }
}
age.df <- data.frame(age = age, ageCat = ageCat)
age.df[1:3,]
##   age      ageCat
## 1  30       Young
## 2  26       Young
## 3  41 Middle-Aged
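
The same categorization can be sketched without a loop using cut(), which bins a numeric vector into intervals. This is an alternative to the loop above, not a replacement for learning it. The fixed ages and the name ageCatCut are hypothetical, chosen so the result is reproducible.

```r
age <- c(30, 26, 41, 55, 56, 90)  # hypothetical fixed ages instead of sample()

# Intervals are (0, 35], (35, 55], (55, Inf], matching the <= tests in the loop
ageCatCut <- cut(age,
                 breaks = c(0, 35, 55, Inf),
                 labels = c("Young", "Middle-Aged", "Old"))

as.character(ageCatCut)
## [1] "Young"       "Young"       "Middle-Aged" "Middle-Aged" "Old"         "Old"
```

Unlike the loop, cut() returns a factor rather than a character vector.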

Functions and Packages:

  1. R has many built-in functions
  2. Each function has a name followed by (), e.g., mean()
  3. Arguments of a function are put within the parentheses
  4. R packages are a way to maintain collections of R functions and data sets
  5. Packages allow for easy, transparent and cross-platform extension of the R base system

Functions and Packages:

Terminology:

  1. Package: an extension of the R base system with code, data and documentation in a standardized format
  2. Library: a directory containing installed packages
  3. Repository: a website providing packages for installation
  4. Source: the original version of a package with human-readable text and code
  5. Base packages: part of the R source tree, maintained by R Core

How to install a package in R:

There are two main ways to install a package in R:

  1. Installing from CRAN: install a package directly from the repository
    • Using RStudio: Tools > Install Packages...
    • From R console: install.packages()
  2. Installing from Source: first download the add-on R package and then type the following in your console:
    • install.packages("path_to_file", repos = NULL, type = "source")
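
A common pattern is to install a package only if it is missing and then attach it with library(). The sketch below uses the base package "stats" as a stand-in so that it runs without network access; in practice pkg would be the name of whichever CRAN package is needed.

```r
# Stand-in package name (an assumption for this sketch); swap in any CRAN package
pkg <- "stats"

# requireNamespace() returns FALSE if the package is not installed
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)  # downloads from the configured CRAN mirror
}

# Attach the package so its functions are available in this session
library(pkg, character.only = TRUE)

# The libraries (directories) that R searches for installed packages
.libPaths()
```

Installation is needed once per library; library() must be called in every new R session.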

Functions in R

str(sample)
## function (x, size, replace = FALSE, prob = NULL)

Calling a function in R

Function arguments can either be matched by position within the parentheses or by name

sampSpace <- 1:6 
sample(sampSpace, 1) # arguments with default values can be omitted
## [1] 1
sample(size = 1, x = sampSpace) # no need to remember the order 
## [1] 5
sample(size = 1, sampSpace)
## [1] 5
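
Since sample() is random, the outputs above differ between runs. To check that positional and named matching really make the same call, the random seed can be fixed before each call. The seed value 42 and the names byPosition and byName are arbitrary choices for this sketch.

```r
sampSpace <- 1:6

set.seed(42)
byPosition <- sample(sampSpace, 1)         # matched by position: x, then size

set.seed(42)
byName <- sample(size = 1, x = sampSpace)  # matched by name, in any order

identical(byPosition, byName)
## [1] TRUE
```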

Writing Your Own Functions

yourFnName <- function(arg1, arg2, ...){
  statements # body of your code
  
  return(object) # what is to be returned
}
yourFnName(arg1, arg2, ...)

Writing Your Own Functions

myMin <- function(a, b, c){
  myMinVal <- min(a, b, c)
  return(myMinVal)
}

myMin(10, 20, 30)
## [1] 10
myMin(10, NA, 20) # how to fix this so it returns 10?
## [1] NA
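
One possible answer to the question above is to pass na.rm = TRUE to min(), which drops missing values before the minimum is taken. The function name myMinNA is made up for this sketch.

```r
myMinNA <- function(a, b, c){
  # na.rm = TRUE tells min() to ignore NA values
  myMinVal <- min(a, b, c, na.rm = TRUE)
  return(myMinVal)
}

myMinNA(10, NA, 20)
## [1] 10
```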

Some Useful Functions:

str():

str(str)
## function (object, ...)
str(sample)
## function (x, size, replace = FALSE, prob = NULL)
genderF <- factor(sample(c("Male", "Female"), 20, replace = TRUE))
str(genderF)
##  Factor w/ 2 levels "Female","Male": 1 2 1 1 2 1 2 2 2 1 ...

str():

myMat <- matrix(1:10, ncol = 5)
str(myMat)
##  int [1:2, 1:5] 1 2 3 4 5 6 7 8 9 10
myList <- list(numVec = 1:3, logVec = FALSE, charVec = LETTERS[1:4])
str(myList)
## List of 3
##  $ numVec : int [1:3] 1 2 3
##  $ logVec : logi FALSE
##  $ charVec: chr [1:4] "A" "B" "C" "D"

apply():

str(apply) # try ?apply for more info
## function (X, MARGIN, FUN, ...)

apply():

myMat <- matrix(1:10, ncol = 5)
myMat
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2    4    6    8   10
apply(myMat, 2, sum)
## [1]  3  7 11 15 19
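
The MARGIN argument selects the dimension to work over: 2 means columns, as above, and 1 means rows. A minimal sketch of the row-wise case, together with the built-in shortcuts rowSums() and colSums() for this common operation:

```r
myMat <- matrix(1:10, ncol = 5)

# MARGIN = 1: apply sum() to each row
apply(myMat, 1, sum)
## [1] 25 30

# Dedicated helpers for the same sums
rowSums(myMat)
## [1] 25 30
colSums(myMat)
## [1]  3  7 11 15 19
```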

apply():

myMat <- matrix(1:10, ncol = 5)
myMat[2,c(2, 5)] <- NA
myMat
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    3    5    7    9
## [2,]    2   NA    6    8   NA
apply(myMat, 2, sum, na.rm = TRUE)
## [1]  3  3 11 15  9

apply():

head(iris) # for more info, try ?iris
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

apply():

apply(iris[,-5], 2, quantile, probs = c(0.25, 0.75))
##     Sepal.Length Sepal.Width Petal.Length Petal.Width
## 25%          5.1         2.8          1.6         0.3
## 75%          6.4         3.3          5.1         1.8
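
FUN does not have to be an existing function; it can also be written inline. The sketch below, with the hypothetical name colRange, computes the spread (maximum minus minimum) of each numeric column of iris.

```r
# iris[, -5] drops the Species column, leaving the four numeric columns
colRange <- apply(iris[, -5], 2, function(col) max(col) - min(col))
colRange
```

The result is a named numeric vector with one entry per column.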

Other functions in the apply() family:
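
A minimal sketch of three commonly used members of the family follows. The choice of lapply(), sapply() and tapply() is an assumption about which functions are meant here, and the list myList is made up for the example.

```r
myList <- list(a = 1:3, b = 4:6)

# lapply(): apply a function to each list element, always returns a list
lapply(myList, sum)

# sapply(): like lapply(), but simplifies the result to a vector when possible
sapply(myList, sum)
##  a  b 
##  6 15

# tapply(): apply a function to groups of a vector defined by a factor
tapply(iris$Sepal.Length, iris$Species, mean)
```

See ?lapply and ?tapply for details.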

Break Time