Exploring NSE: enquo, quos and ...

As you get more interested in building your own custom functions, you quickly realise that unless those functions are tidyverse friendly, standardising your code workflow becomes a problem. So, how do you make your custom functions play well with your favourite tidyverse packages? Our friendly little helpers are going to be enquo and quos. I am going to build a function that calculates the proportion and cumulative proportion of a grouping variable.

suppressPackageStartupMessages(library(dplyr))

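prop_count <- function(df, vars){
  # Capture the column the user typed (unevaluated) as a quosure
  vars_col <- enquo(vars)
  
  # Show what was captured
  print(vars_col)
  
  df %>% 
    # !! unquotes the quosure so count() sees the column itself
    count(!!vars_col, sort = TRUE) %>% 
    mutate(prop_n = prop.table(n)) %>% 
    mutate(cumsum_n = cumsum(prop_n)) 
}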

dplyr::starwars %>% 
  prop_count(homeworld)
## <quosure>
##   expr: ^homeworld
##   env:  000000000C4567B8
## # A tibble: 49 x 4
##    homeworld     n prop_n cumsum_n
##    <chr>     <int>  <dbl>    <dbl>
##  1 Naboo        11 0.126     0.126
##  2 Tatooine     10 0.115     0.241
##  3 <NA>         10 0.115     0.356
##  4 Alderaan      3 0.0345    0.391
##  5 Coruscant     3 0.0345    0.425
##  6 Kamino        3 0.0345    0.460
##  7 Corellia      2 0.0230    0.483
##  8 Kashyyyk      2 0.0230    0.506
##  9 Mirial        2 0.0230    0.529
## 10 Ryloth        2 0.0230    0.552
## # ... with 39 more rows
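The printed quosure has two parts, an expression and an environment. As a small sketch of why the environment matters (assuming the rlang package is installed; make_quosure is a hypothetical helper used only for illustration), a quosure is evaluated where it was created, not where it is used:

```r
library(rlang)

# Hypothetical helper: creates a quosure inside its own environment
make_quosure <- function() {
  x <- 10        # local to the function
  quo(x * 2)     # captures the expression and this function's environment
}

x <- 1           # a different x in the calling environment
q <- make_quosure()

quo_get_expr(q)  # the quoted expression, x * 2
eval_tidy(q)     # evaluated with the captured x = 10, giving 20
```

This is what lets a function receive a column name as a bare expression and still evaluate it correctly later.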

From the output we can see that a quosure is a quoted expression bundled with the environment in which it was created. Inside the function, the bang-bang operator (!!) unquotes it, so count() sees the column homeworld rather than the name vars_col. What happens when we want the proportional count across multiple variables?

dplyr::starwars %>% 
  prop_count(homeworld, species)
## Error in prop_count(., homeworld, species): unused argument (species)

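We get an error because prop_count() only defines a single column argument (vars), so species arrives as an extra argument that the function does not know what to do with. We want our function to accommodate multiple grouping variables. This is where quos and ... come in. The ellipsis (...) accepts any number of arguments, quos() captures them as a list of quosures, and !!! (unquote-splice) passes the elements of that list to count() as separate arguments.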

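prop_count <- function(df, ...){
  # Capture every column passed through ... as a list of quosures
  vars_col <- quos(...)
  
  # Show what was captured
  print(vars_col)
  
  df %>% 
    # !!! splices the list so each quosure becomes its own argument
    count(!!!vars_col, sort = TRUE) %>% 
    mutate(prop_n = prop.table(n)) %>% 
    mutate(cumsum_n = cumsum(prop_n)) 
}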

dplyr::starwars %>% 
  prop_count(homeworld, species)
## [[1]]
## <quosure>
##   expr: ^homeworld
##   env:  000000000BFAE918
## 
## [[2]]
## <quosure>
##   expr: ^species
##   env:  000000000BFAE918
## # A tibble: 58 x 5
##    homeworld species      n prop_n cumsum_n
##    <chr>     <chr>    <int>  <dbl>    <dbl>
##  1 Tatooine  Human        8 0.0920   0.0920
##  2 Naboo     Human        5 0.0575   0.149 
##  3 <NA>      Human        5 0.0575   0.207 
##  4 Alderaan  Human        3 0.0345   0.241 
##  5 Naboo     Gungan       3 0.0345   0.276 
##  6 Corellia  Human        2 0.0230   0.299 
##  7 Coruscant Human        2 0.0230   0.322 
##  8 Kamino    Kaminoan     2 0.0230   0.345 
##  9 Kashyyyk  Wookiee      2 0.0230   0.368 
## 10 Mirial    Mirialan     2 0.0230   0.391 
## # ... with 48 more rows
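To see what !!! does in isolation, here is a minimal sketch, assuming the rlang package is installed. The symbol df is only a placeholder: expr() builds the call without running it, so nothing is evaluated.

```r
library(rlang)

# quos() returns a list with one quosure per captured argument
vars <- quos(homeworld, species)
length(vars)

# !!! splices the list, so each quosure becomes a separate argument
spliced <- expr(count(df, !!!vars))
spliced

# Function name + df + the two spliced columns = 4 elements
length(spliced)
```

Using !! here instead would insert the whole list as one argument, which is not what count() expects.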

Now our function accommodates multiple inputs in the tidyverse fashion! If you feel like reading more about non-standard evaluation, go read the full documentation.
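As an aside, newer versions of rlang (0.4.0 and later) offer a shorter spelling of the same pattern. The sketch below is an alternative, not the approach used above: {{ }} combines enquo() and !! in one step, and ... can be forwarded to count() directly. The names prop_count_one and prop_count_many are hypothetical, chosen for illustration.

```r
suppressPackageStartupMessages(library(dplyr))

# Single column: {{ }} quotes and unquotes the argument in one step
prop_count_one <- function(df, var) {
  df %>%
    count({{ var }}, sort = TRUE) %>%
    mutate(prop_n = prop.table(n),
           cumsum_n = cumsum(prop_n))
}

# Multiple columns: ... is passed straight through to count()
prop_count_many <- function(df, ...) {
  df %>%
    count(..., sort = TRUE) %>%
    mutate(prop_n = prop.table(n),
           cumsum_n = cumsum(prop_n))
}

one  <- prop_count_one(starwars, homeworld)
many <- prop_count_many(starwars, homeworld, species)
```

The explicit enquo()/quos() versions remain useful when you need to inspect or modify the captured expressions before using them.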
