This post is motivated by the following question asked on Stack Overflow:

  • How to add metadata to a tibble?

I thought the answer Benjamin gave to the above question offered an eloquent solution for storing and accessing column descriptions for data.frame objects. His answer used the set_label() and get_label() functions from the labelVector package. These functions provide a clean, concise solution to the task, as shown here using the mtcars dataset.

# install the 'labelVector' package from GitHub, if necessary ...
if (!requireNamespace('labelVector', quietly = TRUE)) {
  devtools::install_github('nutterb/labelVector')
}

library(labelVector)
library(magrittr)

mtcars_labelled <- mtcars %>%
  set_label(x = ., 
            mpg = 'miles per U.S. gallon',
            cyl = 'number of cylinders',
            disp = 'engine displacement (cu. in.)',
            hp = 'gross horsepower',
            drat = 'rear axle ratio',
            wt = 'weight (in thousands of pounds)',
            qsec = '1/4 mile time (in seconds)',
            vs = 'engine cyclinder configuration',
            am = 'transmission configuration',
            gear = 'number of forward gears',
            carb = 'number of carburetor barrels')

# print column name with label ...
print(paste0(colnames(mtcars_labelled), ': ', get_label(mtcars_labelled)))
##  [1] "mpg: miles per U.S. gallon"         
##  [2] "cyl: number of cylinders"           
##  [3] "disp: engine displacement (cu. in.)"
##  [4] "hp: gross horsepower"               
##  [5] "drat: rear axle ratio"              
##  [6] "wt: weight (in thousands of pounds)"
##  [7] "qsec: 1/4 mile time (in seconds)"   
##  [8] "vs: engine cyclinder configuration" 
##  [9] "am: transmission configuration"     
## [10] "gear: number of forward gears"      
## [11] "carb: number of carburetor barrels"

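Base R provides the attr() function, which allows an arbitrary number of attributes to be stored with any R object. The str() output shows that set_label() stores each description in the column's 'label' attribute (and also adds the 'labelled' class to the column), which for a single column is what attr(df$colname, 'label') <- 'label description' does.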

str(mtcars_labelled)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg :Classes 'labelled', 'numeric'  atomic [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##   .. ..- attr(*, "label")= chr "miles per U.S. gallon"
##  $ cyl :Classes 'labelled', 'numeric'  atomic [1:32] 6 6 4 6 8 6 8 4 4 6 ...
##   .. ..- attr(*, "label")= chr "number of cylinders"
##  $ disp:Classes 'labelled', 'numeric'  atomic [1:32] 160 160 108 258 360 ...
##   .. ..- attr(*, "label")= chr "engine displacement (cu. in.)"
##  $ hp  :Classes 'labelled', 'numeric'  atomic [1:32] 110 110 93 110 175 105 245 62 95 123 ...
##   .. ..- attr(*, "label")= chr "gross horsepower"
##  $ drat:Classes 'labelled', 'numeric'  atomic [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##   .. ..- attr(*, "label")= chr "rear axle ratio"
##  $ wt  :Classes 'labelled', 'numeric'  atomic [1:32] 2.62 2.88 2.32 3.21 3.44 ...
##   .. ..- attr(*, "label")= chr "weight (in thousands of pounds)"
##  $ qsec:Classes 'labelled', 'numeric'  atomic [1:32] 16.5 17 18.6 19.4 17 ...
##   .. ..- attr(*, "label")= chr "1/4 mile time (in seconds)"
##  $ vs  :Classes 'labelled', 'numeric'  atomic [1:32] 0 0 1 1 0 1 0 1 1 1 ...
##   .. ..- attr(*, "label")= chr "engine cyclinder configuration"
##  $ am  :Classes 'labelled', 'numeric'  atomic [1:32] 1 1 1 0 0 0 0 0 0 0 ...
##   .. ..- attr(*, "label")= chr "transmission configuration"
##  $ gear:Classes 'labelled', 'numeric'  atomic [1:32] 4 4 4 3 3 3 3 4 4 4 ...
##   .. ..- attr(*, "label")= chr "number of forward gears"
##  $ carb:Classes 'labelled', 'numeric'  atomic [1:32] 4 4 1 1 2 1 4 2 2 4 ...
##   .. ..- attr(*, "label")= chr "number of carburetor barrels"
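
Since the label is an ordinary attribute, other metadata can sit next to it on the same column. The following is a minimal base R sketch of that idea; the 'units' attribute name is an assumption chosen for illustration and is not something labelVector defines or reads.

```r
# Base R only; no packages required.
df <- mtcars

# store two independent attributes on one column
attr(df$mpg, 'label') <- 'miles per U.S. gallon'
attr(df$mpg, 'units') <- 'mpg'   # hypothetical attribute name, for illustration only

# list the names of every attribute stored on the column
print(names(attributes(df$mpg)))

# read a single attribute back
print(attr(df$mpg, 'units'))
```

Only the 'label' attribute is the one shown in the str() output above; any other attribute would have to be read back with attr() directly.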

set_label() can be further shown to be equivalent to attr() by assigning a label using attr() and accessing it with get_label().

df <- mtcars

# assign label using `attr()`
attr(df$mpg, 'label') <- 'miles per U.S. gallon'

# access attribute with `get_label()`
get_label(df$mpg) %>% print()
## [1] "miles per U.S. gallon"

The main advantage of using set_label() over attr() is the ability to assign multiple labels in a single function call, which makes the code easy to read and write.
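
For comparison, here is a rough sketch of what labelling several columns looks like with attr() alone. It is one possible base R approach, not how labelVector is implemented; the `labels` vector and the loop are illustrative names and structure.

```r
# Base R only; no packages required.
df <- mtcars

# named character vector: names are column names, values are labels
labels <- c(mpg = 'miles per U.S. gallon',
            cyl = 'number of cylinders',
            hp  = 'gross horsepower')

# attr() sets one attribute on one object, so each column needs its own call
for (col in names(labels)) {
  attr(df[[col]], 'label') <- labels[[col]]
}

# read the labels back from the labelled columns
stored <- vapply(df[names(labels)],
                 function(x) attr(x, 'label'),
                 character(1))
print(stored)
```

The loop and the separate lookup step are the boilerplate that a single set_label() call avoids.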

set_label() and get_label() can also be used on standalone vectors, but there they offer no advantage over attr(). Perhaps one day the functionality will be extended to other R objects as well.
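
To illustrate the vector case, the sketch below labels two standalone vectors using base R only. The vectors `x` and `y` are made-up examples, and structure() is shown simply as a one-line alternative to attr().

```r
# Base R only; no packages required.

# label an existing vector with attr()
x <- c(21, 21, 22.8, 21.4)
attr(x, 'label') <- 'miles per U.S. gallon'

# or create and label a vector in one step with structure()
y <- structure(c(6, 6, 4, 6), label = 'number of cylinders')

# read the labels back
print(attr(x, 'label'))
print(attr(y, 'label'))
```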