R, pipes, & readability

Or: How I Learned to Stop Worrying and Love the Right Assignment Operator

pipes

Published

2023-12-30

In version 4.1.0^[1], R introduced the native pipe operator:

R now provides a simple native forward pipe syntax |>. The simple form of the forward pipe inserts the left-hand side as the first argument in the right-hand side call. The pipe implementation as a syntax transformation was motivated by suggestions from Jim Hester and Lionel Henry.

^[1] R News
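
Because the native pipe is described as a syntax transformation, the parser should rewrite a piped expression into an ordinary nested call before anything is evaluated. Here is a minimal sketch of how to check that, assuming R 4.1.0 or later. The names df, foo, and bar are placeholders that never need to exist, since quote() only captures the parsed expression.

```r
# quote() returns the parsed expression without evaluating it, so the
# placeholder names df, foo, and bar are never looked up.
piped  <- quote(df |> foo() |> bar())
nested <- quote(bar(foo(df)))

# Printing the piped expression shows the nested call the parser produced.
print(piped)

# The two expressions are the same call object.
identical(piped, nested)
```

This is also why the native pipe adds no function-call overhead of its own: by the time the code runs, the pipe is already gone.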

This is an incredibly useful tool, as it allows functions to be composed into clear, sequential pipelines^[2]. Personally, I find piped code far more readable than the more common “onion-style” nested function calls, since pipes let me read from left to right (or top to bottom) rather than from the inside out:

^[2] I believe that the native pipe was prompted by the popularity of the forward-pipe operator, %>%, in the magrittr package, which is used extensively within the Tidyverse, a widely used ecosystem of data science packages for R.

# Onion style
quux(bar(foo(df)))

# Pipes!
df |>
  foo() |>
  bar() |>
  quux()

In the pipe example above, I think that giving each function its own line also makes the code easier to parse at a glance. While I don’t have any cold hard data to support this, I’d venture that pipes and composable functions are a major part of why the Tidyverse has been so successful.
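
For a version of the comparison that actually runs, here is a small sketch that swaps the placeholder functions for base R ones. The vector x and the choice of abs(), sum(), and sqrt() are my own illustrative assumptions, and the native pipe needs R 4.1.0 or later.

```r
x <- c(-4, 9, -12)

# Onion style: read from the innermost call outwards.
onion <- sqrt(sum(abs(x)))

# Pipe style: read top to bottom, one step per line.
piped <- x |>
  abs() |>
  sum() |>
  sqrt()

# Both spellings compute the same value.
identical(onion, piped)
```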

A more unusual pattern that I like to use with pipes is to assign the result of a pipeline using the right assignment operator, ->, because this follows the natural flow of the code. With this approach, I don’t have to jump back to the beginning of the pipeline to remind myself which variable I’m binding the result to. Compare the following examples.

# Left assignment
result <- df |>
            foo() |>
            bar() |>
            quux()

# Right assignment
df |>
  foo() |>
  bar() |>
  quux() -> result
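
If the right-assignment form looks suspicious, it may help to know that, as far as I can tell, the parser treats both spellings as the same call: -> binds more loosely than |>, so the whole pipeline is assigned, not just the last step. Below is a small sketch to check this, assuming R 4.1.0 or later. The name total and the base R functions in the runnable half are my own illustrative choices.

```r
# Both spellings parse to the same call. quote() captures the expression
# unevaluated, so the placeholder names df, foo, and result are never looked up.
right <- quote(df |> foo() -> result)
left  <- quote(result <- df |> foo())
identical(right, left)

# A runnable version with base R functions: the value of the entire
# pipeline ends up in `total`.
c(-4, 9, -12) |>
  abs() |>
  sum() |>
  sqrt() -> total
total
```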

Usually, the right assignment operator is considered bad form. For example, the assignment_linter in the lintr package specifies that the right assignment operator should not be allowed by default^[3]. While I agree with this as a rule of thumb, I nevertheless think that pipes are an exception for the sake of readability. As Abelson and Sussman noted in the preface to the first edition of SICP:

^[3] Note that the assignment_linter is a default linter.

[P]rograms must be written for people to read, and only incidentally for machines to execute.

As an alternative to the two examples shown above, magrittr also has a compound assignment pipe operator, %<>%, which pipes the left-hand side into the right-hand side and then assigns the result back to the left-hand side. For example, df %<>% foo() %>% bar() %>% quux() is essentially equivalent to df <- df |> foo() |> bar() |> quux().
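
To make the compound assignment pipe concrete, here is a minimal sketch that assumes the magrittr package is installed. The vectors x and y and the base R functions are illustrative stand-ins for df, foo(), bar(), and quux().

```r
library(magrittr)  # provides %>% and %<>%

x <- c(-4, 9, -12)
y <- x

# Compound assignment pipe: send x through the pipeline and overwrite x.
x %<>% abs() %>% sum() %>% sqrt()

# The same update spelled with the native pipe and left assignment.
y <- y |> abs() |> sum() |> sqrt()

# Both variables now hold the same result.
identical(x, y)
```

Note that %<>% overwrites its left-hand side in place, so it only fits when you really do want to replace the original object.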