SSCC - Social Science Computing Cooperative Supporting Statistical Analysis for Research

1.8 Functions and parameters

A function is a set of commands that has been given a name. The commands of a function are designed to accomplish a specific task. A function may need data to accomplish its task. Data objects that are passed to a function are called parameters. A function can return a data object when it has completed its task. In summary, data is given to a function, the function does a specific task using this data, and it returns data.
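As a minimal sketch of this idea, the following code defines and then calls a small function. The function name add_tax and its parameters price and rate are hypothetical, chosen only for illustration.

```r
# Hypothetical function for illustration: adds tax to a price.
# `price` and `rate` are the parameters; the computed total is returned.
add_tax <- function(price, rate) {
  total <- price * (1 + rate)
  return(total)
}

# Data is given to the function, and the function returns data.
result <- add_tax(price = 100, rate = 0.25)
result
```

The value returned by the function is saved in result, where later code can use it.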

Most functions have multiple parameters. We need to tell the function which parameter each value is to be used for. This can be done by using the function's names for the parameters or by placing the values in the right position in the list of parameters.

Examples

  1. Passing parameters by name

    The seq() function generates a sequence of numbers. We need to specify a starting number, ending number, and the value to count by, for seq() to give us the sequence we want. The names of these parameters are from, to, and by. They are also in this order in the parameter list.

    seq(from = 1, to = 10, by = 2)
    [1] 1 3 5 7 9

    While this makes clear how each parameter is to be used by the seq() function, the use of from and to for a sequence is clear without the names. (Sequences have start and end values.)

  2. Passing parameters by position.

    This example uses the same function and the same parameter values as the prior example.

    seq(1, 10, 2)
    [1] 1 3 5 7 9

    Passing the parameters by position is nice when the parameters are clearly understood, as the 1 and the 10 are. It is less clear what the 2 would be used for.

    Some functions have more than 10 parameters. Counting commas to determine which parameter a value is given to would be burdensome.

  3. Using both position and names for parameters

    The convention in R is to pass the first and maybe second parameters by position, if it is clear what they are for. Parameters further down the list are passed by name.

    The prior examples would be better parameterized as follows.

    seq(1, 10, by = 2)
    [1] 1 3 5 7 9
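One consequence of passing parameters by name, sketched below, is that named values can be given in any order, because R matches them by name rather than by position. The object names by_name, by_position, and reordered are used only for illustration.

```r
# The same sequence requested three ways.
by_name     <- seq(from = 1, to = 10, by = 2)  # all parameters named
by_position <- seq(1, 10, 2)                   # all parameters by position
reordered   <- seq(by = 2, to = 10, from = 1)  # named, in a different order

# Each comparison checks that two of the calls produced the same vector.
identical(by_name, by_position)
identical(by_name, reordered)
```

Reordering named parameters is allowed, but keeping them in the function's own order is usually easier to read.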

A function can return only one data object. When a function has more than one object to return, the objects are collected into a list and the list is returned. The calling code then accesses the elements of the list as needed.
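A minimal sketch of returning several objects in one list follows. The function name value_summary and the element names smallest and largest are hypothetical.

```r
# Hypothetical function for illustration: it has two results to return,
# so it collects them into a single named list.
value_summary <- function(values) {
  list(smallest = min(values), largest = max(values))
}

result <- value_summary(c(4, 8, 15))

# The calling code accesses the elements of the list by name.
result$smallest
result$largest
```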

Many tasks require more than one function to be completed. This is by design in R. R provides functions as building blocks that are combined to accomplish tasks. There are several styles of writing code that uses multiple functions. One is nesting, where the result of one function is passed directly as a parameter to another function. The other is to save the results of functions as intermediate values and then pass these intermediate values to the next function.

Examples

  1. Nested functions.

    The following code creates a vector of 15 random numeric values. This vector is then rounded to two decimal places, sorted in descending order, and then head() displays a few of the largest values.

    set.seed(749875)
    number_data <- runif(n = 15, min = 0, max = 1000)
    
    head(sort(round(number_data, digits = 2), decreasing = TRUE))
    [1] 997.62 813.26 797.96 733.98 732.67 675.45

    To read the above base R code, one reads from the innermost parentheses to the outermost. This nesting of functions can make reading base R code challenging. A parameter can also end up far from the function it belongs to: here, decreasing = TRUE is separated from sort by the entire round() call.

  2. Saving intermediate values.

    This example does the same set of rounding, sorting, and head as in the prior example.

    number_round <- round(number_data, digits = 2)
    number_sort <- sort(number_round, decreasing = TRUE)
    head(number_sort)
    [1] 997.62 813.26 797.96 733.98 732.67 675.45

    This provides a more natural order of the functions. It does require the intermediate results to be saved. In this example, each intermediate result is used only by the function on the next line, so a lot of code is written just to manage the intermediate values.

Neither of these coding styles is perfect. They both have advantages and disadvantages. Most programmers use a blend of both approaches, choosing the style that makes their code as clear as possible.

The tidyverse uses the pipe operator, which allows for the natural order of functions found in the intermediate value approach without the need to save intermediate values.
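As a sketch of the piped style, the example below repeats the rounding, sorting, and head steps. It assumes R 4.1 or later and uses base R's native pipe |> as a stand-in, so that no package needs to be loaded. The tidyverse pipe is written %>% and could be used in the same places here. The object names piped and nested are used only for illustration.

```r
set.seed(749875)
number_data <- runif(n = 15, min = 0, max = 1000)

# Each step passes its result as the first parameter of the next function.
piped <- number_data |>
  round(digits = 2) |>
  sort(decreasing = TRUE) |>
  head()

# The nested form of the same steps, for comparison.
nested <- head(sort(round(number_data, digits = 2), decreasing = TRUE))

# Checks that both styles produce the same values.
identical(piped, nested)
```

The functions appear in the order they are applied, and each parameter stays next to the function it belongs to.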