Work with strings with stringr: CHEAT SHEET

The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.

Detect Matches

str_detect(string, pattern)
    Detect the presence of a pattern match in a string.
    str_detect(fruit, "a")

str_which(string, pattern)
    Find the indexes of strings that contain a pattern match.
    str_which(fruit, "a")

str_count(string, pattern)
    Count the number of matches in a string.
    str_count(fruit, "a")

str_locate(string, pattern)
    Locate the positions of pattern matches in a string. Also str_locate_all.
    str_locate(fruit, "a")

Subset Strings

str_sub(string, start = 1L, end = -1L)
    Extract substrings from a character vector.
    str_sub(fruit, 1, 3); str_sub(fruit, -2)

str_subset(string, pattern)
    Return only the strings that contain a pattern match.
    str_subset(fruit, "b")

str_extract(string, pattern)
    Return the first pattern match found in each string, as a vector. Also str_extract_all to return every pattern match.
    str_extract(fruit, "[aeiou]")

str_match(string, pattern)
    Return the first pattern match found in each string, as a matrix with a column for each ( ) group in pattern. Also str_match_all.
    str_match(sentences, "(a|the) ([^ ]+)")

Manage Lengths

str_length(string)
    The width of strings (i.e. number of code points, which generally equals the number of characters).
    str_length(fruit)

str_pad(string, width, side = c("left", "right", "both"), pad = " ")
    Pad strings to constant width.
    str_pad(fruit, 17)

str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...")
    Truncate the width of strings, replacing content with ellipsis.
    str_trunc(fruit, 3)

str_trim(string, side = c("both", "left", "right"))
    Trim whitespace from the start and/or end of a string.
    str_trim(fruit)

Mutate Strings

str_sub() <- value
    Replace substrings by identifying the substrings with str_sub() and assigning into the results.
    str_sub(fruit, 1, 3) <- "str"

str_replace(string, pattern, replacement)
    Replace the first matched pattern in each string.
    str_replace(fruit, "a", "-")

str_replace_all(string, pattern, replacement)
    Replace all matched patterns in each string.
    str_replace_all(fruit, "a", "-")

str_to_lower(string, locale = "en")¹
    Convert strings to lower case.
    str_to_lower(sentences)

str_to_upper(string, locale = "en")¹
    Convert strings to upper case.
    str_to_upper(sentences)

str_to_title(string, locale = "en")¹
    Convert strings to title case.
    str_to_title(sentences)

Join and Split

str_c(..., sep = "", collapse = NULL)
    Join multiple strings into a single string.
    str_c(letters, LETTERS)

str_c(..., sep = "", collapse = NULL)
    Collapse a vector of strings into a single string.
    str_c(letters, collapse = "")

str_dup(string, times)
    Repeat strings times times.
    str_dup(fruit, times = 2)

str_split_fixed(string, pattern, n)
    Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match). Also str_split to return a list of substrings.
    str_split_fixed(fruit, " ", n = 2)

glue::glue(..., .sep = "", .envir = parent.frame(), .open = "{", .close = "}")
    Create a string from strings and {expressions} to evaluate.
    glue::glue("Pi is {pi}")

glue::glue_data(.x, ..., .sep = "", .envir = parent.frame(), .open = "{", .close = "}")
    Use a data frame, list, or environment to create a string from strings and {expressions} to evaluate.
    glue::glue_data(mtcars, "{rownames(mtcars)} has {hp} hp")

Order Strings

str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)¹
    Return the vector of indexes that sorts a character vector.
    x[str_order(x)]

str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)¹
    Sort a character vector.
    str_sort(x)

Helpers

str_conv(string, encoding)
    Override the encoding of a string.
    str_conv(fruit, "ISO-8859-1")

str_view(string, pattern, match = NA)
    View HTML rendering of first regex match in each string.
    str_view(fruit, "[aeiou]")

str_view_all(string, pattern, match = NA)
    View HTML rendering of all regex matches.
    str_view_all(fruit, "[aeiou]")

str_wrap(string, width = 80, indent = 0, exdent = 0)
    Wrap strings into nicely formatted paragraphs.
    str_wrap(sentences, 20)

¹ See bit.ly/ISO639-1 for a complete list of locales.

RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at stringr.tidyverse.org • Diagrams from @LVaudor • stringr 1.2.0 • Updated: 2017-10

Need to Know

Pattern arguments in stringr are interpreted as regular expressions after any special characters have been parsed.

In R, you write regular expressions as strings, sequences of characters surrounded by quotes ("") or single quotes ('').

Some characters cannot be represented directly in an R string. These must be represented as special characters, sequences of characters that have a specific meaning, e.g.

Special Character    Represents
\\                   \
\n                   new line

Run ?"'" to see a complete list.

Because of this, whenever a \ appears in a regular expression, you must write it as \\ in the string that represents the regular expression.

Use writeLines() to see how R views your string after all special characters have been parsed.

    writeLines("\\.")
    # \.
    writeLines("\\ is a backslash")
    # \ is a backslash

INTERPRETATION

Patterns in stringr are interpreted as regexes. To change this default, wrap the pattern in one of:

regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...)
    Modifies a regex to ignore cases, match end of lines as well as end of strings, allow R comments within regexes, and/or to have . match everything including \n.
    str_detect("I", regex("i", TRUE))

fixed()
    Matches raw bytes but will miss some characters that can be represented in multiple ways (fast).
    str_detect("\u0130", fixed("i"))

coll()
    Matches raw bytes and will use locale-specific collation rules to recognize characters that can be represented in multiple ways (slow).
    str_detect("\u0130", coll("i", TRUE, locale = "tr"))

boundary()
    Matches boundaries between characters, line_breaks, sentences, or words.
    str_split(sentences, boundary("word"))

Regular Expressions

Regular expressions, or regexps, are a concise language for describing patterns in strings.

MATCH CHARACTERS

see <- function(rx) str_view_all("abc ABC 123\t.!?\\(){}\n", rx)

string (type this)  regexp (to mean this)  matches (which matches this)                example
                    a (etc.)               a (etc.)                                    see("a")
\\.                 \.                     .                                           see("\\.")
\\!                 \!                     !                                           see("\\!")
\\?                 \?                     ?                                           see("\\?")
\\\\                \\                     \                                           see("\\\\")
\\(                 \(                     (                                           see("\\(")
\\)                 \)                     )                                           see("\\)")
\\{                 \{                     {                                           see("\\{")
\\}                 \}                     }                                           see("\\}")
\\n                 \n                     new line (return)                           see("\\n")
\\t                 \t                     tab                                         see("\\t")
\\s                 \s                     any whitespace (\S for non-whitespaces)     see("\\s")
\\d                 \d                     any digit (\D for non-digits)               see("\\d")
\\w                 \w                     any word character (\W for non-word chars)  see("\\w")
\\b                 \b                     word boundaries                             see("\\b")
                    [:digit:]¹             digits                                      see("[:digit:]")
                    [:alpha:]¹             letters                                     see("[:alpha:]")
                    [:lower:]¹             lowercase letters                           see("[:lower:]")
                    [:upper:]¹             uppercase letters                           see("[:upper:]")
                    [:alnum:]¹             letters and numbers                         see("[:alnum:]")
                    [:punct:]¹             punctuation                                 see("[:punct:]")
                    [:graph:]¹             letters, numbers, and punctuation           see("[:graph:]")
                    [:space:]¹             space characters (i.e. \s)                  see("[:space:]")
                    [:blank:]¹             space and tab (but not new line)            see("[:blank:]")
                    .                      every character except a new line           see(".")

[Diagram: nesting of character classes. [:alnum:] contains [:digit:] (0123456789) and [:alpha:]; [:alpha:] contains [:lower:] (abcdef) and [:upper:] (ABCDEF).]

¹ Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]]

ALTERNATES

alt <- function(rx) str_view_all("abcde", rx)

regexp    matches          example
ab|d      or               alt("ab|d")
[abe]     one of           alt("[abe]")
[^abe]    anything but     alt("[^abe]")
[a-c]     range            alt("[a-c]")

QUANTIFIERS

quant <- function(rx) str_view_all(".a.aa.aaa", rx)

regexp    matches            example
a?        zero or one        quant("a?")
a*        zero or more       quant("a*")
a+        one or more        quant("a+")
a{n}      exactly n          quant("a{2}")
a{n, }    n or more          quant("a{2,}")
a{n, m}   between n and m    quant("a{2,4}")

ANCHORS

anchor <- function(rx) str_view_all("aaa", rx)

regexp    matches            example
^a        start of string    anchor("^a")
a$        end of string      anchor("a$")

GROUPS

ref <- function(rx) str_view_all("abbaab", rx)

Use parentheses to set precedence (order of evaluation) and create groups.

regexp    matches            example
(ab|d)e   sets precedence    alt("(ab|d)e")

Use an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern. Refer to each group by its order of appearance.

string (type this)  regexp (to mean this)  matches (which matches this)  example (the result is the same as ref("abba"))

LOOKAROUNDS

look <- function(rx) str_view_all("bacad", rx)

regexp    matches            example
a(?=c)    followed by        look("a(?=c)")
a(?!c)    not followed by    look("a(?!c)")
(?<=b)a   preceded by        look("(?<=b)a")
(?