Work with strings with stringr
CHEATSHEET
The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.
Detect Matches
str_detect(string, pattern) Detect the presence of a pattern match in a string.
str_detect(fruit, "a")
str_which(string, pattern) Find the indexes of strings that contain a pattern match.
str_which(fruit, "a")
str_count(string, pattern) Count the number of matches in a string.
str_count(fruit, "a")
str_locate(string, pattern) Locate the positions of pattern matches in a string. Also str_locate_all. str_locate(fruit, "a")
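As a rough illustration of the four detection functions above, the sketch below runs them on a small made-up vector `x` (an assumption; the cheatsheet examples use stringr's built-in `fruit` dataset instead).

```r
library(stringr)

# Hypothetical sample vector, not part of the cheatsheet
x <- c("apple", "banana", "pear", "kiwi")

detected  <- str_detect(x, "a")  # TRUE TRUE TRUE FALSE
positions <- str_which(x, "a")   # 1 2 3
counts    <- str_count(x, "a")   # 1 3 1 0
first_hit <- str_locate(x, "a")  # matrix with columns "start" and "end"; NA row for "kiwi"
```

Note that `str_locate()` reports only the first match per string, while `str_locate_all()` returns a list with one matrix per string.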
Subset Strings
str_sub(string, start = 1L, end = -1L) Extract substrings from a character vector.
str_sub(fruit, 1, 3); str_sub(fruit, -2)
str_subset(string, pattern) Return only the strings that contain a pattern match.
str_subset(fruit, "b")
str_extract(string, pattern) Return the first pattern match found in each string, as a vector. Also str_extract_all to return every pattern match. str_extract(fruit, "[aeiou]")
str_match(string, pattern) Return the first pattern match found in each string, as a matrix with a column for each () group in pattern. Also str_match_all.
str_match(sentences, "(a|the) ([^ ]+)")
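The subsetting functions above can be compared side by side in a short sketch. The vector `x` and the sentence passed to `str_match()` are invented for illustration.

```r
library(stringr)

# Hypothetical inputs, not part of the cheatsheet
x <- c("apple", "banana", "pear")

head3   <- str_sub(x, 1, 3)           # "app" "ban" "pea"
tail2   <- str_sub(x, -2)             # "le" "na" "ar"
with_b  <- str_subset(x, "b")         # "banana"
vowel1  <- str_extract(x, "[aeiou]")  # "a" "a" "e"

# One row per input string: full match, then one column per () group
m <- str_match("the quick fox", "(a|the) ([^ ]+)")
# m[1, ] is "the quick" "the" "quick"
```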
Manage Lengths
str_length(string) The width of strings (i.e. number of code points, which generally equals the number of characters). str_length(fruit)
str_pad(string, width, side = c("left", "right", "both"), pad = " ") Pad strings to constant width. str_pad(fruit, 17)
str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...") Truncate the width of strings, replacing content with ellipsis.
str_trunc(fruit, 3)
str_trim(string, side = c("both", "left", "right")) Trim whitespace from the start and/or end of a string. str_trim(fruit)
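A minimal sketch of the length helpers above, using made-up literal strings so the results are easy to check by eye.

```r
library(stringr)

lens    <- str_length(c("apple", "  kiwi  "))             # 5 8
padded  <- str_pad("apple", 8)                            # "   apple" (left padding is the default)
dashed  <- str_pad("apple", 8, side = "right", pad = "-") # "apple---"
short   <- str_trunc("banana split", 9)                   # "banana..." (ellipsis counts toward width)
trimmed <- str_trim("  kiwi  ")                           # "kiwi"
```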
Mutate Strings
str_sub() <- value. Replace substrings by identifying the substrings with str_sub() and assigning into the results.
str_sub(fruit, 1,3) <- "str"
str_replace(string, pattern, replacement) Replace the first matched pattern in each string. str_replace(fruit, "a", "-")
str_replace_all(string, pattern, replacement) Replace all matched patterns in each string. str_replace_all(fruit, "a", "-")
str_to_lower(string, locale = "en")¹ Convert strings to lower case. str_to_lower(sentences)
str_to_upper(string, locale = "en")¹ Convert strings to upper case. str_to_upper(sentences)
str_to_title(string, locale = "en")¹ Convert strings to title case. str_to_title(sentences)
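The mutating functions above might be exercised as follows. The vector `x` is a hypothetical stand-in for `fruit`, and a copy `y` is modified so that the original stays available for the later calls.

```r
library(stringr)

# Hypothetical sample vector, not part of the cheatsheet
x <- c("apple", "banana")

# Assigning into str_sub() replaces the selected substring in place
y <- x
str_sub(y, 1, 3) <- "str"                  # "strle" "strana"

first_a <- str_replace(x, "a", "-")        # "-pple" "b-nana"
all_a   <- str_replace_all(x, "a", "-")    # "-pple" "b-n-n-"

upper <- str_to_upper("a string")          # "A STRING"
lower <- str_to_lower("A STRING")          # "a string"
title <- str_to_title("a string")          # "A String"
```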
Join and Split
str_c(..., sep = "", collapse = NULL) Join multiple strings into a single string. str_c(letters, LETTERS)
str_c(..., sep = "", collapse = NULL) Collapse a vector of strings into a single string.
str_c(letters, collapse = "")
str_dup(string, times) Repeat strings times times. str_dup(fruit, times = 2)
str_split_fixed(string, pattern, n) Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match). Also str_split to return a list of substrings. str_split_fixed(fruit, " ", n = 2)
glue::glue(..., .sep ="", .envir = parent.frame(), .open = "{", .close ="}") Create a string from strings and {expressions} to evaluate. glue::glue("Pi is {pi}")
glue::glue_data(.x, ..., .sep = "", .envir = parent.frame(), .open = "{", .close = "}") Use a data frame, list, or environment to create a string from strings and {expressions} to evaluate. glue::glue_data(mtcars, "{rownames(mtcars)} has {hp} hp")
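To make the difference between `sep` and `collapse` concrete, here is a small sketch with made-up inputs. It assumes the glue package is installed alongside stringr; the variable `n` exists only for this example.

```r
library(stringr)

joined    <- str_c("a", "b", sep = "-")          # "a-b"
pairwise  <- str_c(c("a", "b"), c("X", "Y"))     # "aX" "bY" (element-wise join)
collapsed <- str_c(c("a", "b", "c"), collapse = "")  # "abc" (one string from a vector)
doubled   <- str_dup("ab", 2)                    # "abab"

# n = 2 stops splitting after the first separator
parts <- str_split_fixed("a b c", " ", n = 2)    # 1 x 2 matrix: "a" "b c"

# glue evaluates the expression inside {} in the calling environment
n <- 3
msg <- as.character(glue::glue("n is {n}"))      # "n is 3"
```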
Order Strings
str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)¹ Return the vector of indexes that sorts a character vector. x[str_order(x)]
str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)¹ Sort a character vector. str_sort(x)
Helpers
str_conv(string, encoding) Override the encoding of a string. str_conv(fruit, "ISO-8859-1")
str_view(string, pattern, match = NA) View HTML rendering of first regex match in each string. str_view(fruit, "[aeiou]")
str_view_all(string, pattern, match = NA) View HTML rendering of all regex matches. str_view_all(fruit, "[aeiou]")
str_wrap(string, width = 80, indent = 0, exdent = 0) Wrap strings into nicely formatted paragraphs. str_wrap(sentences, 20)
¹ See bit.ly/ISO639-1 for a complete list of locales.
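A brief sketch of the ordering functions and `str_wrap()`, using invented vectors. The `numeric = TRUE` call relies on the `numeric` argument shown in the signatures above.

```r
library(stringr)

# Hypothetical sample vector, not part of the cheatsheet
x <- c("pear", "apple", "banana")

idx    <- str_order(x)   # 2 3 1: positions of the elements in sorted order
sorted <- str_sort(x)    # "apple" "banana" "pear"; same as x[str_order(x)]

# numeric = TRUE compares embedded digits as numbers rather than character by character
natural <- str_sort(c("a10", "a2", "a1"), numeric = TRUE)  # "a1" "a2" "a10"

# str_wrap() inserts "\n" so that lines stay near the requested width
wrapped <- str_wrap("The quick brown fox", width = 10)
```

Printing `wrapped` with `writeLines()` or `cat()` shows the line breaks.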
RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at stringr.tidyverse.org • Diagrams from @LVaudor • stringr 1.2.0 • Updated: 2017-10
Need to Know
Pattern arguments in stringr are interpreted as regular expressions after any special characters have been parsed.
In R, you write regular expressions as strings, sequences of characters surrounded by double quotes ("") or single quotes ('').
Some characters cannot be represented directly in an R string. These must be represented as special characters, sequences of characters that have a specific meaning, e.g.
Special Character Represents
\\ \
\n new line
Run ?"'" to see a complete list
Because of this, whenever a \ appears in a regular expression, you must write it as \\ in the string that represents the regular expression.
Use writeLines() to see how R views your string after all special characters have been parsed.
writeLines("\\.") # \.
writeLines("\\ is a backslash") # \ is a backslash
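The double-escaping rule can be checked directly. The sketch below, with made-up test strings, contrasts an escaped dot with an unescaped one and matches a literal backslash.

```r
library(stringr)

pattern <- "\\."          # the R string holds two characters: \ and .
writeLines(pattern)       # prints \.
n_chars <- nchar(pattern) # 2

# Escaped dot matches only a literal "."
escaped   <- str_detect(c("a.b", "ab"), "\\.")  # TRUE FALSE
# Unescaped dot matches any character
unescaped <- str_detect(c("a.b", "ab"), ".")    # TRUE TRUE

# A literal backslash needs \\ in the regex, hence "\\\\" in the R string
has_backslash <- str_detect("a\\b", "\\\\")     # TRUE
```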
INTERPRETATION
Patterns in stringr are interpreted as regexes. To change this default, wrap the pattern in one of:
regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...) Modifies a regex to ignore cases, match end of lines as well as end of strings, allow R comments within regexes, and/or to have . match everything including \n.
str_detect("I", regex("i", TRUE))
fixed() Matches raw bytes but will miss some characters that can be represented in multiple ways (fast). str_detect("\u0130", fixed("i"))
coll() Matches raw bytes and will use locale-specific collation rules to recognize characters that can be represented in multiple ways (slow). str_detect("\u0130", coll("i", TRUE, locale = "tr"))
boundary() Matches boundaries between characters, line_breaks, sentences, or words. str_split(sentences, boundary("word"))
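A short sketch of how the wrappers change matching, using invented inputs. `coll()` is left out because its results depend on the locale data available on the machine.

```r
library(stringr)

# Default regex matching is case sensitive
plain  <- str_detect("I", "i")                             # FALSE
nocase <- str_detect("I", regex("i", ignore_case = TRUE))  # TRUE

# fixed() treats the pattern literally, so "." is no longer a wildcard
literal_dot <- str_detect(c("a.b", "ab"), fixed("."))      # TRUE FALSE
regex_dot   <- str_detect(c("a.b", "ab"), ".")             # TRUE TRUE

# boundary("word") splits at word boundaries and drops the whitespace between words
words <- str_split("Hello big world", boundary("word"))[[1]]  # "Hello" "big" "world"
```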
Regular Expressions
Regular expressions, or regexps, are a concise language for describing patterns in strings.
MATCH CHARACTERS
see <- function(rx) str_view_all("abc ABC 123\t.!?\\(){}\n", rx)
string (type this)	regexp (to mean this)	matches (which matches this)	example
	a (etc.)	a (etc.)	see("a")
\\.	\.	.	see("\\.")
\\!	\!	!	see("\\!")
\\?	\?	?	see("\\?")
\\\\	\\	\	see("\\\\")
\\(	\(	(	see("\\(")
\\)	\)	)	see("\\)")
\\{	\{	{	see("\\{")
\\}	\}	}	see("\\}")
\\n	\n	new line (return)	see("\\n")
\\t	\t	tab	see("\\t")
\\s	\s	any whitespace (\S for non-whitespaces)	see("\\s")
\\d	\d	any digit (\D for non-digits)	see("\\d")
\\w	\w	any word character (\W for non-word chars)	see("\\w")
\\b	\b	word boundaries	see("\\b")
	[:digit:]¹	digits	see("[:digit:]")
	[:alpha:]	letters	see("[:alpha:]")
	[:lower:]	lowercase letters	see("[:lower:]")
	[:upper:]	uppercase letters	see("[:upper:]")
	[:alnum:]	letters and numbers	see("[:alnum:]")
	[:punct:]	punctuation	see("[:punct:]")
	[:graph:]	letters, numbers, and punctuation	see("[:graph:]")
	[:space:]	space characters (i.e. \s)	see("[:space:]")
	[:blank:]	space and tab (but not new line)	see("[:blank:]")
	.	every character except a new line	see(".")
[Diagram: [:alnum:] comprises [:digit:] (0123456789) and [:alpha:]; [:alpha:] comprises [:lower:] (abcdef) and [:upper:] (ABCDEF)]
¹ Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]]
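The character-matching rows above can be checked by counting matches instead of viewing them. This sketch reuses the test string from `see()` and, as a cautious assumption, writes the POSIX class in the doubly bracketed form from the footnote.

```r
library(stringr)

# Same test string that see() uses above
s <- "abc ABC 123\t.!?\\(){}\n"

digits  <- str_extract_all(s, "\\d")[[1]]  # "1" "2" "3"
n_word  <- str_count(s, "\\w")             # 9: abc, ABC, and 123
n_space <- str_count(s, "\\s")             # 4: two spaces, one tab, one new line
n_upper <- str_count(s, "[[:upper:]]")     # 3: A, B, C
n_dot   <- str_count(s, "\\.")             # 1: the literal "."
n_slash <- str_count(s, "\\\\")            # 1: the literal backslash
```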
ALTERNATES
alt <- function(rx) str_view_all("abcde", rx)
regexp	matches	example
ab|d	or	alt("ab|d")
[abe]	one of	alt("[abe]")
[^abe]	anything but	alt("[^abe]")
[a-c]	range	alt("[a-c]")
ANCHORS
anchor <- function(rx) str_view_all("aaa", rx)
regexp	matches	example
^a	start of string	anchor("^a")
a$	end of string	anchor("a$")
LOOKAROUNDS
look <- function(rx) str_view_all("bacad", rx)
regexp	matches	example
a(?=c)	followed by	look("a(?=c)")
a(?!c)	not followed by	look("a(?!c)")
(?<=b)a	preceded by	look("(?<=b)a")
(?<!b)a	not preceded by	look("(?<!b)a")
QUANTIFIERS
quant <- function(rx) str_view_all(".a.aa.aaa", rx)
regexp	matches	example
a?	zero or one	quant("a?")
a*	zero or more	quant("a*")
a+	one or more	quant("a+")
a{n}	exactly n	quant("a{2}")
a{n,}	n or more	quant("a{2,}")
a{n,m}	between n and m	quant("a{2,4}")
GROUPS
ref <- function(rx) str_view_all("abbaab", rx)
Use parentheses to set precedence (order of evaluation) and create groups.
regexp	matches	example
(ab|d)e	sets precedence	alt("(ab|d)e")
Use an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern. Refer to each group by its order of appearance.
string (type this)	regexp (to mean this)	matches (which matches this)	example (the result is the same as ref("abba"))
\\1	\1 (etc.)	first () group, etc.	ref("(a)(b)\\2\\1")
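To see what these patterns match without the HTML viewer, one option is to extract or locate the matches instead. The sketch below reuses the test strings from the helper functions above and swaps `str_view_all()` for `str_extract_all()` and `str_locate_all()`.

```r
library(stringr)

# Alternates on "abcde"
alt_or  <- str_extract_all("abcde", "ab|d")[[1]]     # "ab" "d"
alt_not <- str_extract_all("abcde", "[^abe]")[[1]]   # "c" "d"

# Quantifiers on ".a.aa.aaa"
q_plus <- str_extract_all(".a.aa.aaa", "a+")[[1]]    # "a" "aa" "aaa"
q_two  <- str_extract_all(".a.aa.aaa", "a{2,}")[[1]] # "aa" "aaa"

# Anchor: only the first "a" of "aaa" sits at the start of the string
n_start <- str_count("aaa", "^a")                    # 1

# Lookarounds on "bacad": the lookaround itself is not part of the match
followed     <- str_locate_all("bacad", "a(?=c)")[[1]]   # one match at position 2
not_preceded <- str_locate_all("bacad", "(?<!b)a")[[1]]  # one match at position 4

# Back-references: \\2 repeats group 2 ("b"), \\1 repeats group 1 ("a")
backref <- str_extract("abbaab", "(a)(b)\\2\\1")     # "abba"
```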