Before we start developing our very own R packages, let’s review the basic building blocks of R programming: functions.
If you do not recall the basics of writing functions, or if you want a quick refresher, watch the video below.
Let’s establish some vocabulary moving forward. Consider the very simple function below:
add_or_subtract <- function(first_num, second_num = 2, type = "add") {
  if (type == "add") {
    first_num + second_num
  } else if (type == "subtract") {
    first_num - second_num
  } else {
    stop("Please choose `add` or `subtract` as the type.")
  }
}
The function name, `add_or_subtract`, is chosen by whoever writes the function.
The required arguments are the ones for which no default value is supplied; here, that is only `first_num`.
The optional arguments are the ones for which a default value is supplied; here, `second_num` (default `2`) and `type` (default `"add"`).
The body of the function is all the code between the curly braces `{ }` of the definition. This code is run in the function's own environment, a new one created for each call, rather than in the global environment. This means that assignments made in the body create local objects and do not alter anything outside the function.
(There are ways to cheat your way around this, but we will avoid them!)
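To see the function's environment at work, here is a minimal sketch using a hypothetical global variable `counter` and two hypothetical functions. The first assigns with `<-` inside the body and leaves the global `counter` alone; the second uses `<<-`, one of the "cheats" mentioned above, shown only so you can recognize it when you see it.

```r
counter <- 0

# Hypothetical function: assigning to `counter` inside the body creates a
# new, local `counter` in the function's environment. The global one is untouched.
increment_locally <- function() {
  counter <- counter + 1
  counter
}

increment_locally()  # returns 1
counter              # still 0

# Hypothetical function using a "cheat": `<<-` assigns into an enclosing
# environment (here, the global environment) instead of the local one.
increment_globally <- function() {
  counter <<- counter + 1
  invisible(counter)
}

increment_globally()
counter              # now 1
```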
The return value of the function is the object produced by the last expression it evaluates. Here there are two possible return values: `first_num + second_num` (when `type = "add"`) or `first_num - second_num` (when `type = "subtract"`). Any other `type` reaches `stop()`, which throws an error instead of returning a value.
When we use a function in code, this is referred to as a function call.
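Base R can list several of these pieces for us. The sketch below, which is not part of the original lesson, uses `formals()` to show the argument names and the defaults of the optional arguments, and `body()` to print the body. It repeats the definition of `add_or_subtract` so that it runs on its own.

```r
add_or_subtract <- function(first_num, second_num = 2, type = "add") {
  if (type == "add") {
    first_num + second_num
  } else if (type == "subtract") {
    first_num - second_num
  } else {
    stop("Please choose `add` or `subtract` as the type.")
  }
}

# Argument names, in the order they were declared:
names(formals(add_or_subtract))       # "first_num" "second_num" "type"

# Default values of the optional arguments:
formals(add_or_subtract)$second_num   # 2
formals(add_or_subtract)$type         # "add"

# The body, i.e. the code that runs on every function call:
body(add_or_subtract)
```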
Question 1: What will be returned by each of the following?
add_or_subtract
add_or_subtract
add_or_subtract(5, 6, type = "subtract")
add_or_subtract("orange")
add_or_subtract(5, 6, type = "multiply")
add_or_subtract("orange", type = "multiply")
Question 2:
Consider the following code:
first_num <- 5
second_num <- 3
result <- 8
result <- add_or_subtract(first_num, second_num = 4)
result_2 <- add_or_subtract(first_num)
In your Global Environment, what is the value of…
first_num
second_num
result
result_2
Most likely, you have so far only written functions for your own convenience. (Or for assignments, of course!)
We are now going to be designing functions for other people to use, and possibly even edit. This means we need to put some thought into the design of our functions.
Designing functions is somewhat subjective, but there are a few principles that apply, such as: no library() statements inside functions!
Identify five major violations of design principles for the following function:
Suppose you’ve done it: You’ve written the most glorious, beautiful, well-designed function of all time. It’s many lines long, and it relies on several sub-functions.
You run it, and... it doesn’t work.
How can you track down exactly where in your complicated functions something went wrong?
“Object of type ‘closure’ is not subsettable”, a talk by Jenny Bryan
Question 1: What does using the `traceback()` approach to debugging NOT tell you?
Question 2: Which of the following is NOT a disadvantage of using `browser()`?
Question 3: What is the most fun pronunciation of `debugonce()`?
As this is an advanced course, let’s take a moment to talk about two quirky details of how R handles functions.
In R, functions are objects.
That is, creating a function is not fundamentally different from creating a vector or a data frame.
Here we store the vector `1, 2, 3` in the object named `a`:
## [1] 1 2 3
Here we store the procedure “add one plus one” in the object named `a`:
## function(){
## 1+1
## }
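Because functions are objects, the things we do with other objects work for them too: we can copy them to a new name, store them in a list, and pass them as arguments to other functions. A short sketch; the names `b`, `summaries`, and `x` are made up for illustration.

```r
a <- function() {
  1 + 1
}

# Copying the object copies the function; the new name can be called too.
b <- a
b()  # 2

# Functions can be stored in a list, just like vectors...
summaries <- list(smallest = min, largest = max, middle = median)

x <- c(4, 8, 15, 16, 23, 42)

# ...and passed as arguments to other functions.
sapply(summaries, function(f) f(x))
# c(smallest = 4, largest = 42, middle = 15.5)
```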
For some strange reason, the word in R that means “object that’s a function” is closure (for example, `typeof(a)` returns "closure"). Have you ever gotten this error?
## Error in a[1]: object of type 'closure' is not subsettable
I bet you have! What happened here is that we tried to take a subset of `a` as though it were still a vector. But `a` now holds a function, not a vector, so this doesn’t work!
If you encounter this error in the wild, it’s probably because you tried to reference an object that doesn’t exist, but whose name happens to match an existing function (for example, `df` or `data`).
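A classic case is `df`, a popular name for a data frame. In a fresh session `df` already exists: it is `stats::df()`, the density function of the F distribution. This sketch reproduces the error by subsetting it before creating our own data frame (the column name `value` is made up), then fixes it. Note that R's primitive functions, such as `sum()`, report the type "builtin" rather than "closure".

```r
# Before we create our own `df`, the name refers to stats::df().
typeof(df)   # "closure"

# Subsetting it by mistake reproduces the familiar error.
msg <- tryCatch(df$value, error = function(e) conditionMessage(e))
msg          # "object of type 'closure' is not subsettable"

# Primitive functions have a different type.
typeof(sum)  # "builtin"

# The fix: create the data frame we actually meant to use.
df <- data.frame(value = c(1, 2, 3))
df$value     # 1 2 3
```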
Like most people, R’s goal is to avoid doing any unnecessary work.
When you “give” a value to an argument of a function, R only checks that it is valid R code (that you haven’t, say, forgotten a parenthesis). Then it says, “Yep, looks like R code to me!”, sets the unevaluated code aside as a so-called promise, and moves on with its life.
Only when that argument is actually used does R try to run the code.
Consider the following obvious problem:
## Warning in mean.default("orange"): argument is not numeric or logical: returning
## NA
## [1] NA
Now consider the following function:
What do you think will happen when we run:
Seems like it should be an error, right?
But wait! Try it out for yourself.
The function `silly_function` doesn’t use the `x` argument. Thus, R was “lazy” and never even bothered to run `mean("orange")`, so nothing went wrong: no error, and not even the warning we saw earlier.
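Here is a stand-in sketch of the same idea, using two hypothetical functions, `ignores_x()` and `uses_x()`. Passing `stop()` as the argument makes laziness easy to see, because evaluating it would always raise an error. `force()` is the base R function that evaluates an argument on purpose.

```r
# Hypothetical function: accepts `x` but never uses it.
ignores_x <- function(x) {
  "finished without looking at x"
}

# Neither argument is ever evaluated, so there is no warning and no error.
ignores_x(mean("orange"))
ignores_x(stop("this error is never raised"))

# Hypothetical function that does use `x`: force() evaluates it right away.
uses_x <- function(x) {
  force(x)
  "finished after evaluating x"
}

result <- tryCatch(
  uses_x(stop("now the error is raised")),
  error = function(e) conditionMessage(e)
)
result  # "now the error is raised"
```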
Suppose you want to write a function that takes a dataset, a categorical variable, and a quantitative variable, and returns the mean of the quantitative variable for each group.
You might think to yourself, “Easy!” and write something like this:
library(dplyr)

# First attempt: looks reasonable, but fails when we run it
means_by_group <- function(dataset, cat_var, quant_var) {
  dataset %>%
    group_by(cat_var) %>%
    summarize(means = mean(quant_var))
}
Okay, let’s run it!
## Error: Column `cat_var` is unknown
Dagnabbit! The function tried to group the data by a variable named `cat_var`, but the dataset `iris` doesn’t have any variables named `cat_var`!
What happened here is that the function `group_by()` uses non-standard evaluation: it accepts unquoted variable names as input.
Notice that we write `group_by(Species)`, not `group_by("Species")`. There are no quotation marks, because `Species` is a variable name, not a string.
Inside our function, `group_by()` sees the unquoted name `cat_var` and looks for a column with that literal name, not realizing that we actually meant to pass along the variable name `Species` into the function.
To solve this conundrum, we use a trick called tunnelling to “force” the unquoted name `Species` through to the function `group_by()`. It looks like this:
means_by_group <- function(dataset, cat_var, quant_var) {
  dataset %>%
    group_by({{ cat_var }}) %>%
    summarize(means = mean({{ quant_var }}))
}
Note: The tunnel, or “curly-curly” operator, `{{ }}`, comes from the tidyverse package rlang.
Now everything works!
## # A tibble: 3 x 2
## Species means
## <fct> <dbl>
## 1 setosa 5.01
## 2 versicolor 5.94
## 3 virginica 6.59
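For reference, here is a self-contained, runnable version of the fixed function, assuming the dplyr package is installed. The output above (three `Species` groups with means 5.01, 5.94, and 6.59, which match the `Sepal.Length` means in `iris`) suggests the call was `means_by_group(iris, Species, Sepal.Length)`; that call is a reconstruction. The second call, on `mtcars`, is an extra illustration that the same function works for any dataset and columns.

```r
library(dplyr)

means_by_group <- function(dataset, cat_var, quant_var) {
  dataset %>%
    group_by({{ cat_var }}) %>%
    summarize(means = mean({{ quant_var }}))
}

# Reconstructed call: mean sepal length for each iris species
means_by_group(iris, Species, Sepal.Length)

# The same function with a different dataset and different columns:
# mean miles per gallon for each number of cylinders
means_by_group(mtcars, cyl, mpg)
```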
In your time as an R developer, you may find yourself wanting to write even more complicated functions using non-standard evaluation. Some of these require much more machinery than the tunnelling trick, but all of them are possible!
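As a first step in that direction: rlang documents `{{ }}` as a shorthand for capturing an argument with `enquo()` and then injecting it into the call with `!!`. The sketch below writes the same grouped-means function in that longer form. The name `means_by_group_long` is hypothetical, and the code assumes dplyr (and therefore rlang) is installed. It should give the same result as the curly-curly version.

```r
library(dplyr)

# Longer-hand equivalent of {{ }}: capture each argument as a quosure
# with enquo(), then inject it into the dplyr verbs with !!.
means_by_group_long <- function(dataset, cat_var, quant_var) {
  cat_var <- rlang::enquo(cat_var)
  quant_var <- rlang::enquo(quant_var)
  dataset %>%
    group_by(!!cat_var) %>%
    summarize(means = mean(!!quant_var))
}

means_by_group_long(iris, Species, Sepal.Length)
```

The two-step form is more verbose, but it makes the capture step explicit, which becomes useful when you need to inspect or modify the captured expression before using it.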