4 Loops and functions

Loops are fundamental a programming concept as they get a lot of repetitive stuff done in very few lines of code. Paired with custom functions, we can begin to tackle complex programming problems.

4.1 For loops


For loops. Here’s what the syntax of a for loop looks like:

for (item in list_of_items) {
  do_something(item)
}

And here is an example:

for (i in 1:3) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3

In the previous example, we used the dummy variable i to take on some range of values. Notice that i can be called anything you want.

Try creating a for loop that prints the square of a number plus one for numbers ranging from 2 to 6.

4.1.1 Looping over multiple files

We turn our attention now to a (slightly more) useful example: how do we analyze multiple files with similar contents?

In this hypothetical example, we have 5 datasets with satellite coodinates at specific points orbiting the Earth. Suppose the files are similarly named (click on the files to download them):

Our goal is to determine the number of satellite coordinates per file.

First, retrieve the name of each file.

my_dir <- "data/04_intro-to-r"  # files are located in this location (on my computer)
my_files <- "locations-.*.txt"   # file names follow this pattern
data_files <- list.files(path = my_dir, pattern = my_files, full.names = TRUE)

Note that the asterisk in "*.txt" refers to “any name in this directory” whereas the ".txt" part ensures we are only selecting .txt files.

Next, determine the number of observations in each file. We will assume that each row corresponds to a single coordinate.

results <- vector(mode = "integer", length = length(data_files))
for (i in 1:length(data_files)) {
  data <- read.csv(data_files[i])
  count <- nrow(data)
  results[i] <- count
}

Now, store the output in a data frame and associate the file name with the count.

# initializing the data frame with empty columns
results <- data.frame(file_name = character(length(data_files)),
                      count = integer(length(data_files)),
                      stringsAsFactors = FALSE)

# reading the data into the data frame
for (i in 1:length(data_files)) {
  data <- read.csv(data_files[i])
  count <- nrow(data)
  results$file_name[i] <- data_files[i]
  results$count[i] <- count
}

# voila!
results
##                                     file_name count
## 1 data/04_intro-to-r/locations-2016-01-01.txt     4
## 2 data/04_intro-to-r/locations-2016-01-02.txt     8
## 3 data/04_intro-to-r/locations-2016-01-03.txt    10
## 4 data/04_intro-to-r/locations-2016-01-04.txt    10
## 5 data/04_intro-to-r/locations-2016-01-05.txt    12

Sometimes, we need to loop over more than a single range of numbers. For example, what if we want to select all pixels on a 2x3 rectangular screen? Here, we need to cover both the “x” and “y” pixel coodinates:

for (i in 1:2) {
  for (j in 1:3) {
    print(paste("i = " , i, "; j = ", j, sep=""))
  }
}
## [1] "i = 1; j = 1"
## [1] "i = 1; j = 2"
## [1] "i = 1; j = 3"
## [1] "i = 2; j = 1"
## [1] "i = 2; j = 2"
## [1] "i = 2; j = 3"

4.2 Functions

Sometimes, we will need to create custom functions. Luckily, we can define our own functions!


Functions. This is the general syntax for a function:

function_name <- function(arguments) {
  output_value <- do_something(inputs)
  return(output_value)
}

Remark: every function returns a value. Recall from your grade-school math class that functions take an input and return an output. In R, however, a function may or may not take user-defined input. This brings us to an extremely important point: creating a function does NOT run it. You must call the function to run it.

As an exercise, create a function called calc_vol that takes three parameters length, width, and height, and use those values to calculate the volume of the object. Then, call the function to calculate the volume of a 1x1x1 object and a 3x2x5 object.

Since R treats functions like a black box, you can’t access a variable that was created in a function. You must save the output of a function (to a variable) to use it later.

4.2.1 Conditionals within functions

We can use conditionals in a function for more complex tasks. As an exercise, create a function called pred_c19_cases to predict the number of COVID-19 cases in a population (note that these numbers are fictional):

  1. The function will have two parameters pop_size (population size) and vac_brand (vaccine brand).
  2. If the vaccine is Moderna, multiply pop_size by 0.941.
  3. If the vaccine is Pfizer, multiply pop_size by 0.950.
  4. If the vaccine is Astrazeneca, multiply pop_size by 0.870.
  5. Return the predicted cases by subtracting the number of healthy individuals from pop_size.

Now that we’ve got the basics of R under our belts, we can jump into the delightful world of data science 😄.