Google Analytics Training
Implementing Google Analytics
Using Regular Expressions

Regular Expressions (RegEx)

Regular Expressions are patterns used to match text:
■ They can contain characters and metacharacters
■ They are used in GA to match or capture portions of a data field
■ Within Google Analytics, Regular Expressions are most useful in defining filters and tracking goals

Filters in Google Analytics use Regular Expressions to match data and perform an action when a match is achieved
• e.g. You would use a Regular Expression to exclude a range of IP addresses

A regular expression is a set of characters and metacharacters that are used to match text in a specified pattern. You can use regular expressions to configure flexible goals and powerful filters. For example, if you want to create a filter that filters out a range of IP addresses, you'll need to enter a string that describes the range of the IP addresses that you want excluded from your traffic.

Let's start off by looking at each metacharacter. Metacharacters are characters that have special meanings in regular expressions.

Dot .

. Match any single character

Act ., Scene 3 would match "Act 1, Scene 3" and "Act 2, Scene 3"
The operative word here is single. The regex would not match "Act 10, Scene 3"
What will? Act .., Scene 3
To make your regex more flexible, you could use a quantifier like the + sign: Act .+, Scene 3
But we'll talk about repetition a bit later.

Use the dot as a wildcard to match any single character. The operative word here is "single", as the regex would NOT match "Act 10, Scene 3". The dot only allows one character, and the number ten contains two characters, a 1 and a 0. How would you write a regular expression that would match "Act 10, Scene 3"? You could use two dots. To make your regex more flexible, and match EITHER "Act 1, Scene 3" or "Act 10, Scene 3", you could use a quantifier like the + sign. But we'll talk about repetition a bit later in this module.
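The dot examples from the slide can be tried out with a short script. This is a minimal sketch using Python's re module as a stand-in; the regex engine inside Google Analytics may differ in details, and the helper name matches is made up for this illustration. It uses whole-string matching so that the results line up with the slide.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# The three patterns discussed on the slide.
one_char = r"Act ., Scene 3"      # exactly one character after "Act "
two_chars = r"Act .., Scene 3"    # exactly two characters
one_or_more = r"Act .+, Scene 3"  # one or more characters

for text in ("Act 1, Scene 3", "Act 10, Scene 3"):
    print(text,
          matches(one_char, text),
          matches(two_chars, text),
          matches(one_or_more, text))
```

Only the pattern with the + quantifier accepts both strings, which is why the slide suggests it as the more flexible choice.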

Backslash \

\ Escape the special meaning of metacharacters

U.S. Holiday would match "UPS. Holiday", "U.Sb Holiday", and "U3Sg Holiday"
Remember that the dot is a special character that matches with any single character, so if you want to treat a dot like a regular dot, you have to escape it with the backslash.
U\.S\. Holiday matches only with "U.S. Holiday"
Will 192.168.1.1 match only this IP address? No. The dots are wildcards, so this regex would match many strings like "192.168.151", "1921168.1.1", and others. To match only this one IP address, you need to escape the dots with backslashes: 192\.168\.1\.1

Backslashes allow you to use special characters, such as the dot, as though they were literal characters. Enter the backslash immediately before each metacharacter you would like to escape. "U.S. Holiday" written this way, with periods after the U and the S, would match a number of unintended strings, including UPS. Holiday, U.Sb Holiday, and U3Sg Holiday. Remember that the dot is a special character that matches with any single character, so if you want to treat a dot like a regular dot, you have to escape it with the backslash. You'll use backslashes a lot, because dots are used so frequently in precisely the strings you are trying to match, like URLs and IP addresses. For example, if you are creating a filter to exclude an IP address, remember to escape the dots.

Character Sets and Ranges [ ]

[ ] Match one item in this character set
- Use a hyphen inside a character set to specify a range

[uU]\.[sS]\. Holiday matches "u.s. Holiday" and "U.S. Holiday"
Why won't it match "US Holiday" or "u.s. holiday"? The regex requires periods after the U and the S, and the H to be capitalized.
[uU]\.?[sS]\.? [hH]oliday will match all of the above
You can also use a hyphen - to specify a range of characters.
[0-9] matches 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
You can negate matches using a caret ^ after the opening square bracket [
[^0-9] does not match 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Again, we'll talk about repetition in the next slide.

Use square brackets to enclose all of the characters you want as match possibilities. So, in the slide, you're trying to match the string U.S. Holiday, regardless of whether the U and the S are capitalized. However, the expression won't match U.S. Holiday unless periods are used after both the U and the S. The expression also requires that the H is capitalized. There is a regex you can write to match all of these variations. The question mark used here is another "quantifier", like the '+' sign mentioned earlier.

You can either individually list all the characters you want to match, as we did in the first example, or you can specify a range. Use a hyphen inside a character set to specify a range. So instead of typing square bracket 0123456789, you can type square bracket 0 dash 9. And, you can negate a match using a caret after the opening square bracket. Typing square bracket caret zero dash nine will exclude all numbers from matching.

Note that later in this module, you will see the caret used a different way, as an anchor. The use of the caret shown here is specific to character sets, and the negating behaviour occurs only when the caret is used after the opening square bracket in a character set.

Quantifiers and Repetition ? + *

? Match zero or one of the previous item
+ Match one or more of the previous item
* Match zero or more of the previous item

31? matches 3, 31
31+ matches 31, 311, 3111, 31111
31* matches 3, 31, 311, 3111, 31111
Specify repetition using {minimum,maximum}
31{2} matches 311. Does not match 3, 31, or 3111.
31{1,3} matches 31, 311, 3111. Does not match 3, or 31111.
Recall that a dot . matches any single character. What would you use to match a wildcard of indeterminate length?
.* This will match any string of any size

Now, let's talk about using quantifiers to indicate repetition. In earlier examples, we've used the plus sign and the question mark. The question mark requires either zero or one of the preceding character. In the expression "31?", the preceding character is a 1. So, both 3 and 31 would match. The plus sign requires at least one of the preceding character. So, "31+" wouldn't match just a 3. It would match 31, 311, and so on. The asterisk requires zero or more of the preceding character. In the expression "31*", the preceding character is a 1. So it would match 3, 31, 311, and so forth.

You can also specify repetition using a minimum and maximum number inside curly brackets. Recall that a dot matches any single character. What would you use to match a wildcard of indeterminate length? Dot star will match a string of any size. Dot star is an easy way to say "match anything" and is commonly used in Google Analytics goals and filters.

Grouping ( ) |

( ) Group and remember contents as an item
| Either/Or

(U\.S\.|US|u\.s\.|us) Holiday matches "U.S. Holiday", "US Holiday", "u.s. Holiday", and "us Holiday"
Will it match "U.S Holiday"? No, because it isn't one of the options we listed
(U\.?S\.?|u\.?s\.?) will match all of the above

It is handy to use the parentheses and the pipe symbol (also known as the OR symbol) together. Basically, you can just list the strings you want to match, separating each string with a pipe symbol, and enclosing the whole list in parentheses. Here, we've listed four variations of "US" that we'll accept as a match for US Holiday. If it's not in the list, it won't get matched. That's why "US Holiday" won't get matched if one of the periods is missing. In our list, we've accounted for both periods missing, but not for just one period missing. Using question marks, the second regex in the slide will match all of the above.
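The escaping, character set, quantifier, and grouping slides above can be checked in one pass. The following is a rough sketch in Python's re module, not Google Analytics itself; the helper name full is invented here, and whole-string matching is assumed so that "does not match" means the same thing as on the slides. The word Holiday is appended to the second grouping pattern so it can be compared with the first.

```python
import re


def full(pattern: str, text: str) -> bool:
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# Escaping: an unescaped dot is a wildcard, an escaped dot is a literal period.
unescaped_ip = r"192.168.1.1"
escaped_ip = r"192\.168\.1\.1"

# Character sets combined with optional periods.
holiday_set = r"[uU]\.?[sS]\.? [hH]oliday"

# Quantifiers: each one applies to the preceding character, the 1.
candidates = ["3", "31", "311", "3111", "31111"]
quantified = {
    pattern: [c for c in candidates if full(pattern, c)]
    for pattern in (r"31?", r"31+", r"31*", r"31{2}", r"31{1,3}")
}

# Grouping: an explicit list of options versus optional periods.
listed = r"(U\.S\.|US|u\.s\.|us) Holiday"
flexible = r"(U\.?S\.?|u\.?s\.?) Holiday"

print(full(unescaped_ip, "192.168.151"), full(escaped_ip, "192.168.151"))
print(full(holiday_set, "u.s. holiday"), full(holiday_set, "US Holiday"))
for pattern, matched in quantified.items():
    print(pattern, matched)
print(full(listed, "U.S Holiday"), full(flexible, "U.S Holiday"))
```

Listing every candidate string next to each pattern makes it easy to spot a quantifier that accepts more or less than intended.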

Anchors ^ $

^ Start of a string
$ End of a string

^US matches "US Holiday", but not "Next Monday is a US Holiday"
Holiday$ matches "US Holiday", but not "US Holiday Schedule"
Anchors can be useful when specifying an IP address:
192\.168\.1\.1$ matches "192.168.1.1" but not "192.168.1.15"
^72\.168\.1\.1 matches "72.168.1.1" but not "172.168.1.1"

The caret signals the beginning of an expression. In order to match, the string must BEGIN with what the regex specifies. The dollar sign says, if there are any more characters after the END of this string, then it's not a match. So, caret US means start with US. US Holiday matches, but "Next Monday is a US Holiday" does not match. Holiday$ means end with Holiday. US Holiday still matches, but "US Holiday Schedule" does not match. Anchors can be useful when specifying an IP address. Take a look at these examples.

Shorthand Character Classes \d \s \w

\d Match any number (same as [0-9])
\s Match any whitespace
\w Match any letter, number, or underscore (same as [A-Za-z0-9_])

\d{1,5}\s\w* matches "345 Embarcadero"
However, just "Embarcadero" does not. Why not? Numbers are required as part of the regular expression.
(\d{1,5}\s)?\w* will match all of the above
Note that "1600 Amphitheatre Parkway" would not match either. You would have to use something like: \d{1,5}\s\w+\s\w+

Some character classes are used so commonly that there is a shorthand you can use instead of writing out the ranges within square brackets. Let's look at the example of a simplified regex that could match an address. Backslash d means match any one digit zero through nine. Use curly brackets and a minimum and maximum number to specify how many digits to match. Backslash d followed by 1 comma 5 in curly brackets means that the address must contain at least one digit, and at most five digits. Backslash s means that the number should be followed by one space. Backslash w means match any alphanumeric character, and the star means include as many alphanumeric characters as you want.

"345 Embarcadero" matches, but just "Embarcadero" does not, because this regex requires the string to start with a number. If you want to make the number optional, group the first part of the regex with parentheses, including the space, and follow it with a question mark. Note that an address like "1600 Amphitheatre Parkway" would not match either, because the regex does not account for the space between Amphitheatre and Parkway. The slide shows one way you could account for this.

RegEx Review

[ ] Matches a Range of Characters
( ) Groups statements
| Either/Or

Example: ([Gg]oogle|[Yy]ahoo)
Matches: Google, google, Yahoo, yahoo

In the example on the slide, we've created an expression that will match the strings Google or Yahoo, regardless of whether or not Google and Yahoo are capitalized.

RegEx Review

( ) Groups statements
. Match Any Single Character
* Matches Zero Or More Characters
\ Escape
| Either/Or
$ End of a string

Sample URL: .*index\.php\?dl=video/trailers/(internet|theatrical)$
Matches:
(anything)index.php?dl=video/trailers/internet
(anything)index.php?dl=video/trailers/theatrical

Here, we've created an expression that will match URLs for internet and theatrical movie trailers. The first part of the expression indicates that the URL can begin with anything. Then the expression specifies that the URL must end with index.php?dl=video/trailers/ and then either internet or theatrical. The dollar sign ensures that URLs that are any longer than this won't get included.

Common Uses for Regular Expressions

There are several common uses for Regular Expressions within Google Analytics:
■ Creating filters
- Filter Internal Traffic: 205\.112\.23[2345]
■ Setting up goals
- Making a Goal out of Several Pages: .*index\.php\?dl=video/trailers/(internet|theatrical)$
■ Tracking equivalent pages
- /downloads/casestudy/.*
■ Filtering data within the reporting interface
- ([Gg]oogle|[Yy]ahoo)

You'll find lots of applications for regular expressions in Google Analytics. Some common examples are:
* filtering out internal traffic by specifying a set of IP addresses
* setting up a goal that needs to match multiple URLs
* tracking equivalent pages in a funnel
* and using the filter box that appears on your reports to find specific entries in a table.

RegEx Filters

[Screenshot: custom filter settings. Filter Pattern: googlestore\.com]
Use Regular Expressions to create filters that include or exclude certain types of traffic

Here's an example of a custom filter that uses a very simple regular expression.

RegEx Goals

[Screenshot: Goal Settings form. Goal URL: .*index\.php\?dl=video/trailers/(internet|theatrical)$]
Use RegEx to set up goals where several different types of pages constitute a goal or step in the funnel process

Here's a regular expression used to define a goal URL.

RegEx and Tracking Equivalent Pages

If your site has multiple pages or steps that are equivalent in value (e.g. whitepaper or case study downloads), you can use Regular Expressions to group them
Regular Expressions allow you to create one funnel as opposed to tracking each page or action individually
Example: /downloads/casestudy/.*

Here's how you might use regular expressions to group pages or funnel steps on your site. Using a regular expression allows you to track them as one funnel step rather than tracking each page or action individually. Learn how goals and funnels work in the module on goals.
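To see how the anchor, shorthand, goal, and equivalent-page patterns behave, they can be run against a few sample strings. This is an illustrative sketch in Python's re module only. It assumes partial matching for filters and goals (which is what the anchor slide implies) and whole-string matching for the address example; the request URIs and the helper name found are invented for the sketch.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Return True when the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


# Patterns taken from the slides.
GOAL_URL = r".*index\.php\?dl=video/trailers/(internet|theatrical)$"
EQUIVALENT_PAGES = r"/downloads/casestudy/.*"
REPORT_FILTER = r"([Gg]oogle|[Yy]ahoo)"

# Hypothetical request URIs, made up for this example.
pages = [
    "/site/index.php?dl=video/trailers/internet",
    "/site/index.php?dl=video/trailers/theatrical",
    "/site/index.php?dl=video/trailers/theatrical/hd",
    "/downloads/casestudy/retail.pdf",
]
goal_pages = [p for p in pages if found(GOAL_URL, p)]
funnel_pages = [p for p in pages if found(EQUIVALENT_PAGES, p)]
print(goal_pages)
print(funnel_pages)
print([s for s in ["google", "Yahoo", "bing"] if found(REPORT_FILTER, s)])

# Anchors: without the $, the pattern also matches the longer address.
print(found(r"192\.168\.1\.1", "192.168.1.15"))   # unanchored
print(found(r"192\.168\.1\.1$", "192.168.1.15"))  # anchored at the end
print(found(r"^72\.168\.1\.1", "172.168.1.1"))    # anchored at the start

# Shorthand classes: the two-word street name needs the longer pattern.
short_address = r"\d{1,5}\s\w*"
long_address = r"\d{1,5}\s\w+\s\w+"
print(re.fullmatch(short_address, "1600 Amphitheatre Parkway") is not None)
print(re.fullmatch(long_address, "1600 Amphitheatre Parkway") is not None)
```

The third sample URI is there to show the effect of the dollar sign: it continues past "theatrical", so the goal pattern rejects it.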

RegEx Within the Report Interface

Use Filter box within the report interface to see subsets of info
Instead of exporting bits of info at a time, you can select and filter the data you want to work with

And, here's an example of using regular expressions within your reports. We're using the Find box to display all the rows in the table that contain Google or Yahoo.

RegEx Generator for IP Address Ranges

Use this tool to create regular expressions involving a range of IP Addresses:
http://www.google.com/support/googleanalytics/bin/answer.py?hl=en&answer=55572
[Screenshot: Google Analytics Help Center article, "How do I exclude traffic from a range of IP addresses?"]

Google Analytics provides a tool that makes it easier to generate a regular expression that matches a range of IP addresses. It's called the Regular Expression Generator and you can find it at the URL shown in the slide. Or, you can search for Regular Expression Generator in the Google Analytics Help Center.

Points to Remember

You'll find a number of useful applications for regex as you use Google Analytics. But, it's important that you think through all the implications of each expression that you use when you set up a filter or a goal. It's easy to make a mistake and not get the data or the result you're looking for.

■ Regular Expressions are a powerful search/find tool
■ Can be used with filters, goals, and within the reporting interface
■ Carefully test your RegEx statements
- Small mistakes can have a big impact
- Even correctly made RegEx statements may have flaws
- Test, test, test, then consider having someone else test for you
■ Search for "regex" to find additional resources on the web

Set up a duplicate profile to test your regex statements. After enough data has been collected, check your results and make sure they're what you expect. Remember to always maintain a backup profile that includes all your data. There are lots of regex resources on the web. To get started, just search for regex.
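
Before a pattern goes into a filter or goal, one way to follow the "test, test, test" advice is to list strings that should and should not match and check them all at once. The sketch below does this in Python as an offline complement to a duplicate profile, not a replacement for it. The IP range (192.168.1.1 through 192.168.1.25), both patterns, and the helper name check are hypothetical examples, and Python's re module may not behave identically to the engine in Google Analytics.

```python
import re

# Hypothetical filter pattern meant to match 192.168.1.1 through 192.168.1.25.
INTERNAL_TRAFFIC = re.compile(r"^192\.168\.1\.([1-9]|1[0-9]|2[0-5])$")

# A careless attempt at the same range: unescaped dots, no anchors,
# and [1-25] is a character set (1, 2, or 5), not a numeric range.
CARELESS = re.compile(r"192.168.1.[1-25]")


def check(pattern, should_match, should_not_match):
    """Return a list of failure messages; an empty list means every case passed."""
    failures = []
    for text in should_match:
        if not pattern.search(text):
            failures.append(f"expected a match: {text}")
    for text in should_not_match:
        if pattern.search(text):
            failures.append(f"unexpected match: {text}")
    return failures


inside = [f"192.168.1.{n}" for n in range(1, 26)]
outside = [
    "192.168.1.0",
    "192.168.1.26",
    "192.168.1.250",
    "1192.168.1.1",
    "192x168x1x1",
    "192.168.11.1",
]

failures = check(INTERNAL_TRAFFIC, inside, outside)
careless_failures = check(CARELESS, inside, outside)

print(failures or "all cases passed")
for message in careless_failures:
    print(message)
```

The careless pattern looks plausible at a glance, which is the point of the slide: small mistakes only show up when the expression is run against cases chosen to break it.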