Data tidying with tidyr:: Cheatsheet Tidy data is a way to organize tabular data in a consistent data structure across packages. A table is tidy if: 111 Each variable is in its own column Each observation, or case, is in its own row Access variables as vectors Preserve cases in vectorized operations Tibbies AN ENHANCED DATA FRAME Tibbies are a table format provided by the tibble package. They inherit the data frame class, but have improved behaviors: • Subset a new tibble with ], a vector with [[ and $. • No partial matching when subsetting columns. • Display concise views of the data on one screen. options(tibble.print_max = n, tibble. print_min = m, tibble.width = Inf) Control default display settings. View() or glimpse)) View the entire data set. CONSTRUCT A TIBBLE tibble)...) Construct by columns. tibble(x = 1:3, y = c("a", "b", "c")) tribble)...) Construct by rows. tribble(~x, ~y, 1, "a", 2, "b", 3, "c") A tibble: 3x2 11a 2 2b 3 3c as_tibble(x,...) Convert a data frame to a tibble. enframe(x, name = "name", value = "value") Convert a named vector to a tibble. Also deframe(). is_tibble(x) Test whether x is a tibble. ©Stud 10 Reshape Data - Pivot data to reorganize values into a new layout. table4a KK1WTW HUHU [1 0.7k b 37k 80k C 212k 213k b 37k C 212k a IFTiTTl 2k b 80k c Bililil 213k table2 1333 a 1999 3 0.7k iw a 1999 0.7k 19m a 1999 IBS 19m a 2000 2k 20m a 2000 j 2k b 1999 37k 172m a 2000 WEHM 20m b 2000 80k 174m b 1999 37k C 1999 212k 1t b 1999 BEH 172m C 2000 213k 1t b 2000 80k b 2000 BEH 174m C 1999 212k C 1999 BEH 1t C 2000 213k C 2000 WHOM 1t pivot_longer(data, cols, names_to = "name", values_to = "value", values_drop_na = FALSE) "Lengthen" data by collapsing several columns into two. Column names move to a new names_to column and values to a newvalues_to column. pivot_longer(table4a, cols = 2:3, namesjo - year", values_to = "cases") pivot_wider(data, names_from = "name", values_from = "value") The inverse of pivot_longer(). "Widen" data by expanding two columns into several. One column provides the new column names, the other the values. pivot_wider(table2, names_from = type, values from = count) Split Cells table5 a 19 a 20 b 19 b 20 ■ Use these functions to split or combine cells into individual, isolated values. _table 3_ a 1999 0.7k/ a 2000 2k/ b 1999 37k/ b 2000 80k/ a 19 a 20 b 19 b 20 a 1999 0.7k a 2000 2k b 1999 37k b 2000 80k _table3 a 1999 0.7k/ a 2000 ?kHiIM—^. I b 1999 37k/ b 2000 80k/ a 1999 0.7k 1999 a 2000 2k a 2000 b 1999 37k b 1999 b 2000 80k b 2000 unite(data, col,..., sep = remove = TRUE, na.rm = FALSE) Collapse cells across several columns into a single column. unite(table5, century, year, col = "year", sep = "") separate(data, col, into, sep = "[A[:alnum:]]+", remove = TRUE, convert = FALSE, extra = "warn", fill = "warn",...) Separate each cell in a column into several columns. Also extract)) separate(table3, rate, sep = "/", into = c("cases", "pop")) separate rowsfdata,sep = "[A[:alnum:].]+", convert = FALSE) Separate each cell in a column into several rows. separate_rows(table3, rate, sep = "/") Expand Tables Create new combinations of variables or identify implicit missing values (combinations of variables not present in the data). a 1 3 . B 1 4 EÜEE1 expand(data,...) Create a * a 2 new tibble with all possible g 2 combinations of the values of the variables listed in ... Drop other variables. expand(mtcars, cyl, gear, ca rb) a 1 3 B 1 4 EOtaiS] complete(data, ,fill = list()) Add missing possible combinations of values of variables listed in ... Fill remaining variables with NA. complete(mtcars, cyl, gear, ca rb) Handle Missing Values Drop or replace explicit missing values (NA). a 1 B na C na a 1 B na C na Eld drop na(data,...) Drop □ 3 rows containing NA's in .. columns. drop_na(x, x2) nira a 1 B 1 C 1 111 TZW fill(data,..., .direction = "down") Fill in NA's in ... columns using the next or previous value. fill(x,x2) replace_na(data, replace) Specify a value to replace NA in selected columns. replace_na(x, list(x2 = 2)) RStudio8isatrademarkofRStudio,PBC • CCBYSA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn moreattidyr.tidyverse.org • tibble 3.1.2 • tidyr 1.1.3 • Updated: 2021-08 Nested Data A nested data frame stores individual tables as a list-column of data frames within a larger organizing data frame. List-columns can also be lists of vectors or lists of varying data types. Use a nested data frame to: • Preserve relationships between observations and subsets of data. Preserve the type of the variables being nested (factors and datetimes aren't coerced to character). • Manipulate many sub-tables at once with purrr funcitons like map(), map2(), or pmap() or with dplyr rowwisef) grouping. CREATE NESTED DATA nest(data,...) Moves groups of cells into a list-column of a data frame. Use alone or with dplyr::group_by(): 1. Group the data frame with group_by() and use nest() to move the groups into a list-column. n_storms <- storms %>% group_by(name) %>% nest() 2. Use nest(new_col = c(x, y)) to specify the columns to group using dplyr::select() syntax. n_storms <- storms %>% nestfdata = c(yeanlong)) "cell" contents RESHAPE NESTED DATA unnest(data, cols,keep_empty = FALSE) Flatten nested columns back to regular columns. The inverse of nest(). n_storms%>% unnest(data) unnest_longer(data, col, values_to = NULL, indices_to = NULL) Turn each element of a list-column into a row. starwars%>% select(name, films) %>% unnestjonger(films) KB misa KB Ig] ESU Amy 1975 27.5 -79.0 Amy 1975 27.5 -79.0 Amy 1975 28.5 -79.0 Amy 1975 28.5 -79.0 nested data frame Amy 1975 29.5 -79.0 Amy 1975 29.5 -79.0 Bob 1979 22.0 -96.0 Bob 1979 22.0 -96.0 _^ Amy Bob 1979 22.5 -95.3 Bob 1979 22.5 -95.3 Bob Bob 1979 23.0 -94.6 Bob 1979 23.0 -94.6 Zeta Zeta 2005 23.9 -35.6 Zeta 2005 23.9 -35.6 Zeta 2005 24.2 -36.1 Zeta 2005 24.2 -36.1 Zeta 2005 24.7 -36.6 Zeta 2005 24.7 -36.6 1975 1975 1975 27.5 -79.0 28.5 -79.0 29.5 -79.0 Index list-columns with [ . n_storms$data[[l]] 1979 1979 1979 2005 2005 2005 22.0 -96.0 22.5 -95.3 23.0 -94.6 ^^^^H name films 1 Luke TheEmpire Strik... Luke Revenge of the S... Luke Return of the Jed... Luke I C-3PO TheEmpire Strik... C-3PO 1 C-3PO Attack of the CI... R2-D2 C-3PO The Phantom M... R2-D2 TheEmpire Strik... R2-D2 Attack of the CI... R2-D2 The Phantom M... 23.9 -35.6 24.2 -36.1 24.7 -36.6 CREATE TIBBLES WITH LIST-COLUMNS tibble::tribble(...) Makes list-columns when needed tribble( -max, ~seq, unnest_wider(data, col) Turn each element of a list-column into a regular column. starwars%>% select(name, films) %>% unnest wider(films) 3, 1 4, 1 5, 1 3 Luke 4 C-3PO 5 R2-D2 TfT^a EH EH EH Luke 1 The Empire... Revenge of... Return of... C-3PO 1 The Empire... Attack of... The Phantom... R2-D2 The Empire... Attack of... The Phantom... tibble::tibble(...) Saves list input as list-columns. tibble(max = c(3,4,5), seq = list(l:3,1:4,1:5)) tibble::enframe(x, name="name", value="value") Converts multi-level list to a tibble with list-cols. enframe(list('3'=l:3, '4'=1:4, '5'=1:5), 'max', 'seq') OUTPUT LIST-COLUMNS FROM OTHER FUNCTIONS dplyr::mutate(), transmute)), and summarise)) will output list-columns if they return a list. mtcars%>% group_by(cyl) %>% summarisefq = list(quantile(mpg))) hoist).data, .col,.remove = TRUE) Selectively pull list components out into their own top-level columns. Uses purrr::pluck() syntax for selecting from lists. starwarsc e select(name, films) %>% hoist(films, first_film = 1, second_film = 2) ■QE^HI BEnS^BB3SS3U Luke HHHHJ The Empire... B C-3PO TheEmpire... Revenge of... C-3PO Attack of... R2-D2 R2-D2 TheEmpire... Attack of... TRANSFORM NESTED DATA A vectorized function takes a vector, transforms each element in parallel, and returns a vector of the same length. By themselves vectorized functions cannot work with lists, such as list-columns. dplyr::rowwise(.data,...) Group data so that each row is one group, and within the groups, elements of list-columns appear directly (accessed with [[), not as lists of length one. When you use rowwise)), dplyr functions will seem to apply functions to list-columns in a vectorized fashion. . í ! fUnj fun( result 1 result 2 result 3 Apply a function to a list-column and create a new list-column. n_storms%>% dimi) returns two values per row rowwiseO %>% f mutate(n = list(dim(data)))-^E Apply a function to a list-column and create a regular column. n_storms%>% rowwiseO %>% mutatefn = nrow(data)' nrow() returns one integer per row Collapse multiple list-columns into a single list-column. appendO returnsa list for each row, so col type must be list starwars %>% rowwiseO %>% mutate(transport = list(append(vehicles, starships)) Apply a function to multiple list-columns. lengthQ returns one integer per row starwars %>% rowwiseO %>% mutate(n_transports = length(c(vehicles, starships)) Studio See purrr package for more list functions. RStudio8 is a trademark of RStudio, PBC • CCBYSA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn moreattidyr.tidyverse.org • tibble 3.1.2 • tidyr 1.1.3 • Updated: 2021-08