Review: Writing Functions

Stat 431


Before we start developing our very own R Packages, let’s review the basic building blocks of R programming: functions.


Time Estimates:
     Videos: 55 min
     Readings: 30 min
     Activities: 0 min
     Check-ins: 6



Extra Resources:

Basics of Functions

If you do not recall the basics of writing functions, or if you want a quick refresher, watch the video below.


Optional Video: How to Write a Function


Anatomy of a function definition

Let’s establish some vocabulary moving forward. Consider the very simple function below:

add_or_subtract <- function(first_num, second_num = 2, type = "add") {
  
  if (type == "add") {
    first_num + second_num
  } else if (type == "subtract") {
    first_num - second_num
  } else {
    stop("Please choose `add` or `subtract` as the type.")
  }
  
}

The parts of this definition have the following names:

  • The function name, add_or_subtract, is chosen by whoever writes the function.
  • The required arguments are the ones for which no default value is supplied: here, first_num.
  • The optional arguments are the ones for which a default value is supplied: here, second_num (default 2) and type (default "add").
  • The body of the function is all the code inside the curly braces of the definition. This code runs in the environment of the function, rather than in the global environment. Ordinary assignments made in the body create objects in the function's environment, so code in the body does not have the power to alter anything outside the function. (There are ways to cheat your way around this… we will avoid them!)
  • The return values of the function are the possible objects that get returned: here, either first_num + second_num or first_num - second_num. If type is anything else, stop() raises an error instead of returning a value.

When we use a function in code, this is referred to as a function call.
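To see this vocabulary in action, here is a short sketch that makes a few function calls to add_or_subtract and uses base R's formals() to list its arguments and their defaults. The definition is repeated so the block runs on its own.

```r
# The example function from above, repeated so this block runs on its own
add_or_subtract <- function(first_num, second_num = 2, type = "add") {
  if (type == "add") {
    first_num + second_num
  } else if (type == "subtract") {
    first_num - second_num
  } else {
    stop("Please choose `add` or `subtract` as the type.")
  }
}

# formals() lists the arguments; optional ones carry their default values
names(formals(add_or_subtract))      # "first_num" "second_num" "type"
formals(add_or_subtract)$second_num  # 2

# Function calls: arguments can be matched by position or by name
add_or_subtract(10)                        # both defaults used: 12
add_or_subtract(10, 4, type = "subtract")  # 6

# Omitting the required argument fails as soon as the body needs it:
# add_or_subtract(second_num = 4)
# fails with: argument "first_num" is missing, with no default
```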


Check-In 1: Function Basics


Question 1: What will be returned by each of the following?

  1. 1
  2. -1
  3. 30
  4. An error defined in the function add_or_subtract
  5. An error defined in a different function that is called from inside add_or_subtract

Question 2:

Consider the following code:

In your Global Environment, what is the value of…

  1. first_num
  2. second_num
  3. result
  4. result_2

Canvas Link     

Good function design

Most likely, you have so far only written functions for your own convenience. (Or for assignments, of course!)

We are now going to be designing functions for other people to use, and possibly even edit. This means we need to put some thought into the design of our functions.


Required Reading: R4DS Chapter 19: Functions


Designing functions is somewhat subjective, but there are a few principles that apply:

  1. Choose good, descriptive names
    • Your function name should describe what it does, and usually involves a verb.
    • Your argument names should be simple and/or descriptive.
    • Names of variables in the body of the function should be descriptive.
  2. Output should be very predictable
    • Your function should always return the same object type, no matter what input it gets.
    • Your function should expect certain objects or object types as input, and give errors when it does not get them.
    • Your function should give errors or warnings for common mistakes.
    • Default values of arguments should only be used when there is a clear common choice.
  3. The body of the function should be easy to read.
    • Code should use good style principles.
    • There should be occasional comments to explain the purpose of the steps.
    • Complicated steps, or steps that are repeated many times, should be written into separate functions (sometimes called helper functions).
  4. Functions should be self-contained.
    • They should not rely on any information besides what is given as input.
    • (Relying on other functions is fine, though)
    • They should not alter the Global Environment
    • (do not put library() statements inside functions!)
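As a small illustration of these principles, here is a sketch of a hypothetical function, rescale_to_unit() (not from the reading), that rescales a numeric vector so its values run from 0 to 1. It has a verb-style name, checks its input type, warns about a common mistake, relies only on its input, and always returns a double vector.

```r
# Hypothetical example: rescale a numeric vector so its values run from 0 to 1
rescale_to_unit <- function(x) {
  # Predictable input: stop with an informative error on the wrong type
  if (!is.numeric(x)) {
    stop("`x` must be a numeric vector, not ", class(x)[1], ".", call. = FALSE)
  }

  # Missing values are ignored when finding the range and stay NA in the output
  value_range <- range(x, na.rm = TRUE)
  spread <- value_range[2] - value_range[1]

  # Common mistake: a constant vector cannot be rescaled, so warn about it
  if (isTRUE(spread == 0)) {
    warning("All values of `x` are equal; returning zeros.", call. = FALSE)
    # as.numeric() keeps the output type the same (double) in every case
    return(as.numeric(x - value_range[1]))
  }

  (x - value_range[1]) / spread
}

rescale_to_unit(c(2, 4, 6))   # 0.0 0.5 1.0
rescale_to_unit(c(2, NA, 6))  # 0 NA 1
```

Note that there is no library() call and no default argument: the function needs nothing beyond x, and there is no obvious "usual" value for x.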

Check-In 2: Function Design


Identify five major violations of design principles for the following function:

Canvas Link     

Debugging Functions

Suppose you’ve done it: You’ve written the most glorious, beautiful, well-designed function of all time. It’s many lines long, and it relies on several sub-functions.

You run it and - it doesn’t work.

How can you track down exactly where in your complicated functions something went wrong?


Required Video: Object of type closure is not subsettable

Object of type ‘closure’ is not subsettable - Jenny Bryan
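The check-in below asks about three debugging tools from base R: traceback(), browser(), and debugonce(). Here is a minimal sketch of where each one fits, assuming a hypothetical chain of functions (outer_fn(), middle_fn(), inner_fn()) whose bug lives two levels deep. The interactive steps are commented out because they only make sense in the console.

```r
# Hypothetical nested functions: the bug lives two levels down, in inner_fn()
inner_fn <- function(x) {
  log(x)  # fails when x is not numeric
}

middle_fn <- function(x) {
  inner_fn(x) + 1
}

outer_fn <- function(x) {
  middle_fn(x) * 2
}

outer_fn(1)  # works: (log(1) + 1) * 2 is 2

# The steps below are meant for an interactive session (e.g. the RStudio console).
# 1. Trigger the error, then print the stack of calls that led to it:
# outer_fn("a")
# traceback()
#
# 2. Step through inner_fn() line by line the next time it is called
#    (only that once, so there is nothing to switch off afterwards):
# debugonce(inner_fn)
# outer_fn("a")
#
# 3. Alternatively, add browser() as the first line of inner_fn()'s body to
#    pause there on every call; remember to delete it when you are done.
```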


Check-In 3: Debugging


Question 1: What does using the traceback approach to debugging NOT tell you?

  1. The function call that triggered the error.
  2. The sub-function where the error actually occurred.
  3. The value of the argument or object that caused the error.
  4. The text of the full error message.

Question 2: Which of the following is NOT a disadvantage of using browser()?

  1. You can’t insert it into existing functions.
  2. You can’t view variables in the function environment when it is running.
  3. You have to remember to take it out of your code when you are done with it.
  4. You have to run your code line-by-line until you find the error.

Question 3: What is the most fun pronunciation of debugonce()?

  1. “Debug Once”
  2. “Debut Gonky”
  3. “Debugoncé” like “Beyoncé”

Canvas Link     

Advanced Details

As this is an Advanced course, let’s take a moment to talk about a few quirky details of how R handles functions.

Objects of type closure

In R, functions are objects.

That is, creating a function is not fundamentally different from creating a vector or a data frame.

Here we store the vector 1, 2, 3 in the object named a:

a <- c(1, 2, 3)
a

## [1] 1 2 3

Here we store the procedure “add one plus one” in the object named a:

a <- function(){
  1+1
}
a

## function(){
##   1+1
## }

The word R uses for “object that’s a function” (for most functions) is closure, because a function “encloses” the environment it was created in. Have you ever gotten this error?

a[1]

## Error in a[1]: object of type 'closure' is not subsettable

I bet you have! What happened here is that we tried to take a subset of the vector a. But a is a function, not a vector, so this doesn’t work!

If you encounter this error in the wild, it’s probably because you tried to reference an object that doesn’t exist, using a name that happens to belong to an existing function (for example, df or data).
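A quick way to confirm that functions are just objects is to ask R for their type. This sketch uses a hypothetical function, add_one_plus_one(), and base R's typeof(), and then reproduces the error with df, a name that belongs to the built-in F-distribution density function stats::df() rather than to any data frame.

```r
# Functions are objects: they can be stored, printed, and inspected
add_one_plus_one <- function() {
  1 + 1
}
typeof(add_one_plus_one)       # "closure"
is.function(add_one_plus_one)  # TRUE

# Reproduce the error: no data frame called `df` exists here, so R finds the
# built-in function stats::df() and refuses to subset it
closure_error <- tryCatch(
  df$x,
  error = function(e) conditionMessage(e)
)
closure_error  # "object of type 'closure' is not subsettable"
```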

Lazy Evaluation

Like most people, R’s goal is to avoid doing any unnecessary work.

When you “give” a value to an argument of a function, R does a quick check to make sure you haven’t done anything too crazy, like forgotten a parenthesis. Then it says, “Yep, looks like R code to me!” and moves on with its life.

Only when that argument is actually used does R try to run the code.

Consider the following obvious problem:

mean("orange")

## Warning in mean.default("orange"): argument is not numeric or logical: returning NA
## [1] NA

Now consider the following function:

What do you think will happen when we run:

Seems like it should produce the same warning, right?

But wait! Try it out for yourself.

The function silly_function doesn’t use the x argument. Thus, R was “lazy”, and never even bothered to try to run mean("orange"), so we never see the warning.
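The same laziness can be shown in a self-contained sketch. The functions ignore_argument() and use_argument() are hypothetical stand-ins; base R's force() evaluates an argument immediately, which brings any problem with it to the surface right away.

```r
# Hypothetical function that never uses its argument `x`
ignore_argument <- function(x) {
  "I never looked at x!"
}

# mean("orange") is never evaluated, so no warning appears
ignore_argument(mean("orange"))

# Even a name that refers to nothing is fine, because `x` is never needed
ignore_argument(this_object_does_not_exist)

# force() evaluates the argument right away
use_argument <- function(x) {
  force(x)
  "I looked at x!"
}
# use_argument(this_object_does_not_exist)
# fails with: object 'this_object_does_not_exist' not found
```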

Non-Standard Evaluation and Tunnelling

Suppose you want to write a function that takes a dataset, a categorical variable, and a quantitative variable, and returns the mean of the quantitative variable within each group of the categorical variable.

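You might think to yourself, “Easy!” and write something like this:

library(dplyr)

means_by_group <- function(dataset, cat_var, quant_var) {
  
  dataset %>%
    group_by(cat_var) %>%
    summarize(means = mean(quant_var))
  
}

Okay, let’s run it!

means_by_group(iris, Species, Sepal.Length)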

## Error: Column `cat_var` is unknown

Dagnabbit! The function tried to group the data by a variable named cat_var - but the dataset iris doesn’t have any variables named cat_var!

What happened here is that the function group_by uses non-standard evaluation. This means it accepts a special kind of input: unquoted column names, which it looks up inside the dataset rather than treating them as ordinary R objects.

Notice that we say group_by(Species) not group_by("Species") - there are no quotation marks, because Species is a variable name, not a string.

In our function, group_by receives the unquoted name cat_var and looks for a column with exactly that name, not realizing that we actually meant to pass along the variable name Species from our function call.
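Base R can illustrate what “unquoted input” means. In this sketch, a hypothetical show_code() function uses substitute() to capture the code passed to its argument without evaluating it, and deparse() to turn that code into a string. dplyr itself relies on rlang's tidy evaluation rather than this exact approach, so treat this as an analogy for the mechanism, not as how group_by is implemented.

```r
# A hypothetical function that reports the code it was given, not its value
show_code <- function(x) {
  # substitute() returns the unevaluated expression passed to `x`;
  # deparse() turns that expression into a character string
  deparse(substitute(x))
}

show_code(Species)         # "Species" (no object named Species is needed)
show_code(mean("orange"))  # "mean(\"orange\")", and no warning

# Inside a wrapper, substitute() only sees the wrapper's own argument name
wrapper <- function(cat_var) {
  show_code(cat_var)
}
wrapper(Species)  # "cat_var", the same problem group_by ran into
```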

To solve this conundrum, we use a trick called tunnelling to “force” the unquoted name Species through to the function group_by. It looks like this:

means_by_group <- function(dataset, cat_var, quant_var) {
  
  dataset %>%
    group_by({{cat_var}}) %>%
    summarize(means = mean({{quant_var}}))
  
  
}

Note: The tunnel, or “curly-curly” operator, {{ }}, is from the tidyverse package rlang.

Now everything works!

means_by_group(iris, Species, Sepal.Length)

## # A tibble: 3 x 2
##   Species    means
##   <fct>      <dbl>
## 1 setosa      5.01
## 2 versicolor  5.94
## 3 virginica   6.59
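Tunnelling is not limited to one use per argument or to one dataset. As a sketch, a hypothetical summarize_by_group() embraces quant_var several times and works just as well on the built-in mtcars data; it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical extension of means_by_group(): several summaries per group.
# Each {{ }} passes the caller's unquoted column name through to dplyr.
summarize_by_group <- function(dataset, cat_var, quant_var) {
  dataset %>%
    group_by({{ cat_var }}) %>%
    summarize(
      n = n(),
      means = mean({{ quant_var }}),
      sds = sd({{ quant_var }})
    )
}

# The same function works on a different dataset and different columns
mtcars_summary <- summarize_by_group(mtcars, cyl, mpg)
mtcars_summary
```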

In your time as an R developer, you may find yourself wanting to write even more complicated functions that use non-standard evaluation. Some of these require much more than the tunnelling trick, but all are possible!