4 Loops and functions
Loops are a fundamental programming concept: they let us get a lot of repetitive work done in very few lines of code. Paired with custom functions, they let us begin to tackle complex programming problems.
4.1 For loops
For loops. Here’s what the syntax of a for loop looks like:
for (item in list_of_items) {
  do_something(item)
}
And here is an example:
for (i in 1:3) {
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
In the previous example, the dummy variable i takes on each value in the range 1:3, one at a time. Note that i is just a name: the loop variable can be called anything you want.
Try creating a for loop that prints the square of a number plus one for numbers ranging from 2 to 6.
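One possible solution, reading “the square of a number plus one” as n^2 + 1 (if you read it as (n + 1)^2 instead, only the expression inside print() changes):

```r
# Loop over the numbers 2 through 6 and print n squared plus one
for (n in 2:6) {
  print(n^2 + 1)
}
## [1] 5
## [1] 10
## [1] 17
## [1] 26
## [1] 37
```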
4.1.1 Looping over multiple files
We turn our attention now to a (slightly more) useful example: how do we analyze multiple files with similar contents?
In this hypothetical example, we have 5 datasets of satellite coordinates at specific points in orbit around the Earth. Suppose the files are similarly named:
- locations-2016-01-01.txt
- locations-2016-01-02.txt
- locations-2016-01-03.txt
- locations-2016-01-04.txt
- locations-2016-01-05.txt
Our goal is to determine the number of satellite coordinates per file.
First, retrieve the name of each file.
my_dir <- "data/04_intro-to-r"  # files are located in this directory (on my computer)
my_files <- "locations-.*.txt"  # file names follow this pattern (a regular expression)
data_files <- list.files(path = my_dir, pattern = my_files, full.names = TRUE)
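Note that the pattern argument of list.files() is a regular expression, not a shell wildcard. In "locations-.*.txt", the ".*" matches any sequence of characters (here, the date), and the "txt" part restricts the match to text files. Strictly speaking, an unescaped "." matches any single character, so "\\.txt$" is the more precise way to require a literal .txt extension at the end of the file name.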
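To see how the regular expression behaves without the original data, here is a small self-contained sketch. It creates a few empty files with made-up names in a temporary directory and lists only those matching the pattern. It also uses utils::glob2rx(), which converts a shell-style wildcard such as "locations-*.txt" into the equivalent regular expression, in case you are more used to wildcards.

```r
# Create a scratch directory with some hypothetical, empty files
demo_dir <- file.path(tempdir(), "list-files-demo")
dir.create(demo_dir, showWarnings = FALSE)
demo_names <- c("locations-2016-01-01.txt", "locations-2016-01-02.txt",
                "locations-notes.csv", "other.txt")
invisible(file.create(file.path(demo_dir, demo_names)))

# ".*" matches any run of characters; "\\.txt$" requires a literal ".txt" at the end
matched <- list.files(path = demo_dir, pattern = "locations-.*\\.txt$")
print(matched)
## [1] "locations-2016-01-01.txt" "locations-2016-01-02.txt"

# glob2rx() translates a wildcard pattern into a regular expression
glob_pattern <- utils::glob2rx("locations-*.txt")
print(glob_pattern)
## [1] "^locations-.*\\.txt$"
matched_glob <- list.files(path = demo_dir, pattern = glob_pattern)
```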
Next, determine the number of observations in each file. We will assume that each row corresponds to a single coordinate.
results <- vector(mode = "integer", length = length(data_files))
for (i in seq_along(data_files)) {  # seq_along() is safe even if no files match
  data <- read.csv(data_files[i])
  count <- nrow(data)
  results[i] <- count
}
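If you don’t have the data files at hand, the sketch below makes the counting loop runnable on its own. It writes two small CSV files with made-up coordinates to a temporary directory (names and contents are hypothetical) and counts the rows in each. It uses seq_along(data_files) rather than 1:length(data_files) because the former also behaves correctly when no files match (1:0 would loop over 1 and 0).

```r
# Write two hypothetical CSV files with 2 and 3 rows of coordinates
demo_dir <- file.path(tempdir(), "count-rows-demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = c(1, 2), y = c(3, 4)),
          file.path(demo_dir, "locations-a.txt"), row.names = FALSE)
write.csv(data.frame(x = c(5, 6, 7), y = c(8, 9, 10)),
          file.path(demo_dir, "locations-b.txt"), row.names = FALSE)

data_files <- list.files(path = demo_dir, pattern = "locations-.*\\.txt$",
                         full.names = TRUE)

# Count the rows (coordinates) in each file
results <- vector(mode = "integer", length = length(data_files))
for (i in seq_along(data_files)) {
  data <- read.csv(data_files[i])
  results[i] <- nrow(data)
}
print(results)
## [1] 2 3
```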
Now, store the output in a data frame and associate the file name with the count.
# initializing the data frame with empty columns
results <- data.frame(file_name = character(length(data_files)),
                      count = integer(length(data_files)),
                      stringsAsFactors = FALSE)
# reading the data into the data frame
for (i in seq_along(data_files)) {
  data <- read.csv(data_files[i])
  count <- nrow(data)
  results$file_name[i] <- data_files[i]
  results$count[i] <- count
}
# voila!
results
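Filling the data frame row by row works, but a common alternative is to compute the counts first and then build the data frame in a single call. The sketch below reuses the file names and counts from the output shown here; basename() (base R) drops the directory part of each path so the file_name column is easier to read.

```r
# File paths and counts taken from the results table in this section
data_files <- c("data/04_intro-to-r/locations-2016-01-01.txt",
                "data/04_intro-to-r/locations-2016-01-02.txt",
                "data/04_intro-to-r/locations-2016-01-03.txt",
                "data/04_intro-to-r/locations-2016-01-04.txt",
                "data/04_intro-to-r/locations-2016-01-05.txt")
counts <- c(4L, 8L, 10L, 10L, 12L)

# Build the whole data frame at once instead of row by row
summary_df <- data.frame(file_name = basename(data_files),
                         count = counts,
                         stringsAsFactors = FALSE)
summary_df
```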
## file_name count
## 1 data/04_intro-to-r/locations-2016-01-01.txt 4
## 2 data/04_intro-to-r/locations-2016-01-02.txt 8
## 3 data/04_intro-to-r/locations-2016-01-03.txt 10
## 4 data/04_intro-to-r/locations-2016-01-04.txt 10
## 5 data/04_intro-to-r/locations-2016-01-05.txt 12
Sometimes, we need to loop over more than a single range of numbers. For example, what if we want to select all pixels on a 2x3 rectangular screen? Here, we need to cover both the “x” and “y” pixel coordinates with nested loops:
for (i in 1:2) {
  for (j in 1:3) {
    print(paste("i = ", i, "; j = ", j, sep = ""))
  }
}
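To make the pixel idea concrete, the sketch below represents the 2x3 screen as a hypothetical logical matrix (rows indexed by i, columns by j) and uses the same nested loop to mark every pixel as visited. The inner loop runs to completion for each value of the outer loop, so all 2 x 3 = 6 pixels are covered exactly once.

```r
# A hypothetical 2x3 screen: FALSE means "not yet visited"
screen <- matrix(FALSE, nrow = 2, ncol = 3)
n_visited <- 0

for (i in 1:2) {       # rows
  for (j in 1:3) {     # columns
    screen[i, j] <- TRUE
    n_visited <- n_visited + 1
  }
}

all(screen)   # every pixel was visited
## [1] TRUE
n_visited
## [1] 6
```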
## [1] "i = 1; j = 1"
## [1] "i = 1; j = 2"
## [1] "i = 1; j = 3"
## [1] "i = 2; j = 1"
## [1] "i = 2; j = 2"
## [1] "i = 2; j = 3"
4.2 Functions
Sometimes, R’s built-in functions won’t do exactly what we need. Luckily, we can define our own functions!
Functions. This is the general syntax for a function:
function_name <- function(arguments) {
  output_value <- do_something(arguments)
  return(output_value)
}
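Here is a concrete (hypothetical) instance of that template: a function named square that takes one argument, x, and returns its square.

```r
# Define a function that squares its input
square <- function(x) {
  result <- x^2
  return(result)
}

square(4)
## [1] 16
square(1:3)  # ^ is vectorized, so this works on vectors too
## [1] 1 4 9
```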
Remark: every function returns a value; without an explicit return(), R returns the value of the last expression evaluated in the function body. Recall from your grade-school math class that functions take an input and return an output. In R, however, a function may or may not take user-defined input. Another extremely important point: creating a function does NOT run it. You must call the function to run it.
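The sketch below illustrates these points with a made-up function, say_hello, that takes no input. Defining it prints nothing; its body only runs when the function is called. It also relies on the implicit return: with no return() call, the value of the last expression is what comes back.

```r
# Defining the function does not run its body
say_hello <- function() {
  "Hello!"  # the value of the last expression is returned
}

# Calling the function runs the body and returns its value
say_hello()
## [1] "Hello!"
```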
As an exercise, create a function called calc_vol that takes three parameters, length, width, and height, and uses those values to calculate the volume of the object. Then, call the function to calculate the volume of a 1x1x1 object and a 3x2x5 object.
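One possible solution to the exercise, with the parameters in the order given above:

```r
# Volume of a rectangular box
calc_vol <- function(length, width, height) {
  volume <- length * width * height
  return(volume)
}

calc_vol(1, 1, 1)
## [1] 1
calc_vol(3, 2, 5)
## [1] 30
```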
Variables created inside a function are local to that function: they exist only while the function runs, so you can’t access them from outside it. To use a function’s result later, you must save its output to a variable.
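A short sketch of this behavior, using a hypothetical function rect_area: the variable area_inside exists only while the function runs, so the result has to be captured by assigning the function’s output to a new variable.

```r
rect_area <- function(width, height) {
  area_inside <- width * height  # local to rect_area()
  area_inside
}

my_area <- rect_area(2, 3)  # save the output to use it later
my_area
## [1] 6
exists("area_inside")  # the local variable is not visible out here
## [1] FALSE
```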
4.2.1 Conditionals within functions
We can use conditionals in a function for more complex tasks. As an exercise, create a function called pred_c19_cases to predict the number of COVID-19 cases in a population (note that these numbers are fictional):
- The function will have two parameters: pop_size (population size) and vac_brand (vaccine brand).
- If the vaccine is Moderna, multiply pop_size by 0.941.
- If the vaccine is Pfizer, multiply pop_size by 0.950.
- If the vaccine is Astrazeneca, multiply pop_size by 0.870.
- Return the predicted cases by subtracting the number of healthy individuals from pop_size.
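One possible solution, as a sketch. It reads each multiplier as the fraction of the population that stays healthy, matches the brand names exactly as spelled in the exercise, and, as an added assumption not in the exercise, stops with an error for any other brand.

```r
pred_c19_cases <- function(pop_size, vac_brand) {
  # Number of people predicted to stay healthy (fictional rates)
  if (vac_brand == "Moderna") {
    healthy <- pop_size * 0.941
  } else if (vac_brand == "Pfizer") {
    healthy <- pop_size * 0.950
  } else if (vac_brand == "Astrazeneca") {
    healthy <- pop_size * 0.870
  } else {
    stop("Unknown vaccine brand: ", vac_brand)
  }
  # Predicted cases = whole population minus the healthy individuals
  return(pop_size - healthy)
}

pred_c19_cases(1000, "Moderna")
## [1] 59
pred_c19_cases(1000, "Pfizer")
## [1] 50
```

Because the multipliers are decimals, results may carry tiny floating-point rounding errors; use round() if you need whole numbers of cases.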
Now that we’ve got the basics of R under our belts, we can jump into the delightful world of data science 😄.