What Kind of Regular Expression Engine Does Google Analytics Use and Can I Use Negative Regex in Google Analytics?

 

You’ve started delving into regex. You either (understand it)|(don\’t understand it), but chances are, if you were able to read that joke, you’re in the know. Congrats! Now that you know a little bit about regex and how it works, you’re probably wondering how you can apply it to Google Analytics, Google Tag Manager, Google Data Studio, and any other Google product you can get your hands on. 

While your basic knowledge of regex will no doubt take you far, if you start to get more in depth into regex, you’ll need to know your limitations when it comes to Google products. I’ll start off with a background on regex, so easy a caveman could understand it! Then I’ll go into what kind of regular expression Google uses and whether or not you can use negative regular expressions in Google Analytics goals. 

An Introduction to Commonly Used Regex Terms

A long, long time ago in a world very far away, I thought there was only one type of regex. Boy was I wrong. As I began reading more and more, I was running into words like regex flavors, regex engines, and regex syntax. After hours and hours of research, I think the best way to explain regex flavors, regex engines, and regex syntax is with pets. 

Disclaimer: With the below metaphor, we’re going to have to assume that there are no outliers among dogs. That’s right, your dog who acts like a cat, your calm Jack Russell terrier, and your golden retriever who absolutely hates people all don’t count here. 

I love dogs. There are many different breeds of dogs. I love regex. There are many different flavors of regex. Each breed of dog has a different look. Some have long hair that you have to groom constantly, others have short oily hair, some coats are all one color, while other coats are different colors.

Overall, dogs have four legs, a tail, paws, and similar body shapes. That’s not to say a Great Dane looks like a Maltese, but from far away you can tell they’re both dogs.  Regular expressions, similarly, have different syntaxes. That is to say, some have long hair that you have to groom constantly…just kidding. In all seriousness though, although the regex can be arranged differently, you still can kind of tell from afar that it’s a regular expression. 

In addition to looking different, different breeds have different characteristics. A Jack Russell Terrier has more energy than a bulldog, golden retrievers are known for being incredibly kid friendly, beagles howl something wonderful/terrible, and still other breeds of dogs are known for being a little bit more particular about who they choose to get along with. Often when describing a breed, you’re also describing the way they act and look. When I mention that I have a chocolate lab, you understand that it is brown in color, approximately 60 – 70 lbs., and loves water. In the same way that breeds have characteristics, different regular expression flavors support different behaviors.

In the beginning we associated dog breeds with flavors of regex, but we didn’t get into the technical definition. The technical definition of a flavor is the syntax and behavior supported by a particular regex engine, where a regex engine is just the “machine” that processes the regex. Going back to our dog example, each breed is going to look a different way and have different characteristics.

Unfortunately for my extended metaphor, there’s no way to compare a regular expression engine to a dog. A regular expression engine is what actually processes your regular expression.

I’m going to finish my metaphor with a common-sense reminder. Some breeds don’t get along with other breeds. There’s a reason why there’s a big dog section and a small dog section in every dog park. Sure, some can live in harmony with each other (we’ve all seen that one little dog that just loves to play with big dogs), but others can’t. Similarly, different regular expression flavors are not always compatible with each other: a pattern written for one engine may not work in another.

What kind of regular expression does Google Analytics use? 

This is the kind of question that you don’t care about until you need to know the answer and can’t find any Google help document to guide you to it. Google’s regular expression engine of choice is RE2, sometimes referred to as ‘golang’ regex because Go’s standard regexp package accepts the same syntax. If you’re using very simple regular expressions, this doesn’t matter. Here’s why it became a problem for me though: it doesn’t support negative regular expressions, meaning lookaround assertions such as the negative lookahead (?!...).

For example, if you’re setting up a destination goal and you want to capture any time someone submits a thank you form, EXCEPT for the help/support thank you form, you are out of luck. RE2 does not support negative regular expressions at the time of writing this article. So if you’re looking for the answer to “Can I exclude something with regex in Google Analytics goals?”, the answer is no.
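To make the limitation concrete, here is a minimal Python sketch of the kind of negative lookahead you might reach for. The page paths are hypothetical, made up purely for illustration. Python’s built-in re module accepts the (?!...) syntax, which is exactly the part RE2 rejects, so treat this as a picture of what you can’t paste into a Google Analytics goal rather than a workaround.

```python
import re

# Hypothetical thank-you page paths, invented for this example.
pages = ["/contact/thank-you", "/demo/thank-you", "/support/thank-you"]

# Negative lookahead: match any thank-you page unless the path starts with /support/.
# Python's re module accepts (?!...); RE2 does not support this syntax.
thank_you_except_support = re.compile(r"^/(?!support/)[^/]+/thank-you$")

matched = [page for page in pages if thank_you_except_support.search(page)]
print(matched)
```

The support page is filtered out only because of the (?!support/) piece, which is the part that has no direct equivalent in RE2.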

Summary 

A regular expression engine is the actual processor of your regular expression. The syntax is how your regular expression looks and the flavor is a combination of the syntax and the different behaviors associated with a regular expression engine. Unfortunately, the flavor used by Google Analytics doesn’t support negative regular expressions, so you can’t use negative regular expressions in your Google Analytics goals. 

 


Regular Expressions for Misspelled Words

 

If you’ve spent any time at all parsing through data manually, you may have stumbled on the term “regex” once or twice. Regex, short for regular expressions, are patterns that let you search through data more easily. Although regex can seem daunting at first with its code-like language, it can be learned fairly easily and can save you a lot of time.

How do regular expressions save time? Quite simply, you’re not going through and manually adding a filter for every single variation of a word. There are tons of useful articles about regex, but I find the easiest way to explain the concept and the most commonly used application of regex is through misspelled names. This tutorial will guide you through the most basic form of regex examples and then slowly go into more complex uses. 

Regex 101 

To begin with, you’ll need to know a couple of simple regular expression building blocks.

  • (Parentheses) are meant to contain something, just like you would in a simple math expression. 
  • | means or. As in you either know regex | you don’t know regex. 

For example, if your name is Ashley, but some people spell it Ashleigh, you can use the formula Ashle(y|igh). Other examples are: 

  • (Eri|Aaro)n matches Erin and Aaron 
  • Charl(ie|ey) matches Charlie and Charley 
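If you want to try the “or” examples above yourself, here is a small Python sketch using the built-in re module. The helper name matches_all is just something I made up for this illustration, and re.fullmatch is used so that the whole name has to fit the pattern.

```python
import re

# Each pattern from the examples above, paired with the spellings it should cover.
or_patterns = {
    r"Ashle(y|igh)": ["Ashley", "Ashleigh"],
    r"(Eri|Aaro)n": ["Erin", "Aaron"],
    r"Charl(ie|ey)": ["Charlie", "Charley"],
}

def matches_all(pattern, names):
    """Return True if every name is fully matched by the pattern."""
    return all(re.fullmatch(pattern, name) is not None for name in names)

or_results = {pattern: matches_all(pattern, names) for pattern, names in or_patterns.items()}
print(or_results)

# A spelling that is not one of the listed alternatives does not match.
print(re.fullmatch(r"Ashle(y|igh)", "Ashlee"))
```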

Regex for If a Letter Might or Might Not Be There 

The “or” feature is great if you have a simple variation on your name, but if you’re an Anna, Collin, or Meghan, that seems like a lot of work for one little variation. With these names it’s not a drastic misspelling, you’ve just got some Meghans without an h, or an Anna or Collin without the extra middle letter. That’s when the ? comes in handy.

  • ? means the preceding letter (or parenthesized group) might be there or it might not be. As in, this next regex joke might (not)? make sense.

For example, our friend Anna could use the regex formula Ann?a. Other examples are: 

  • Anne? matches Anne and Ann 
  • Megh?an matches Meghan and Megan 
  • Coll?in matches Colin and Collin 
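The optional-letter examples can be checked the same way. This is a rough Python sketch with the built-in re module, where each pattern is paired with the two spellings it is meant to cover.

```python
import re

# Patterns with an optional letter, and the spellings each one should accept.
optional_patterns = {
    r"Ann?a": ["Anna", "Ana"],
    r"Anne?": ["Anne", "Ann"],
    r"Megh?an": ["Meghan", "Megan"],
    r"Coll?in": ["Collin", "Colin"],
}

optional_results = {
    pattern: [name for name in names if re.fullmatch(pattern, name)]
    for pattern, names in optional_patterns.items()
}
print(optional_results)

# The ? only makes one letter optional, so a third "n" is still rejected.
print(re.fullmatch(r"Ann?a", "Annna"))
```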

Regex for Begins With and Anything 

The next two regex formulas are most useful for when you’ve given up all hope of spelling someone’s name right. That’s right, I’m looking at you Kathryn/Catherine/Katherine/Cathryn/Kathy/Cathy. All we know is that your name begins with a C or K, has an “ath” in it somewhere and ends with anything. That’s right: anything.

  • The regex to cover anything and everything is .* and when used in a larger formula, it’s seen as (.*) 
  • ^ means begins with. The symbol is called a caret, and in the same way a carrot leads a horse, your caret (^) can lead your regex, making very clear what your formula begins with.

For example, our final formula to include people with a first name of Katherine/Cathryn/Kathy/Cathie, but not Mary Catherine or Ann Kathryn, would be ^(K|C)ath(.*)

This expression means that no matter what the name ends up being, it definitely starts (^) with a K or (|) a C, has “ath” after that, and then ends in anything. Another example is:

  • ^(K|C)h?rist(e|i)n(.*) matches Christine, Kristina, Kristin, Kristen 
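Here is a short Python sketch of the two “begins with” formulas, again with the built-in re module. It uses re.search, which looks anywhere in the text, so the ^ is what keeps Mary Catherine and Ann Kathryn out.

```python
import re

# Names to filter; the last two contain the pattern but do not begin with it.
names = ["Katherine", "Cathryn", "Kathy", "Cathie", "Mary Catherine", "Ann Kathryn"]

# ^ anchors the match to the start of the text, and (.*) allows anything afterwards.
begins_with_kath = re.compile(r"^(K|C)ath(.*)")
kept = [name for name in names if begins_with_kath.search(name)]
print(kept)

# The second formula from the list above.
begins_with_christ = re.compile(r"^(K|C)h?rist(e|i)n(.*)")
christ_names = ["Christine", "Kristina", "Kristin", "Kristen"]
christ_kept = [name for name in christ_names if begins_with_christ.search(name)]
print(christ_kept)
```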

Regex for Ends With 

Lastly, we’ve got a regex for that one time when you were half listening to a conversation and only heard the end of someone’s name.  

  • $ means ends with, as in all jobs end with payment (we hope).  

So whether you got introduced to Kevin or Devin or Kevan or Devan, we may never know, but at least regex will help you get in the ballpark. For Kevin or Devan we would use the formula (.*)ev(i|a)n$ 

This expression means it doesn’t matter what the name starts with, it definitely ends in evin or evan. This formula even includes our friend Evan, provided the match ignores case, since his name starts with a capital E!
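A minimal Python sketch of the “ends with” formula follows. One thing worth knowing is that Python’s re module is case-sensitive unless you say otherwise, so the lowercase ev in the formula only picks up the capital E in Evan when the re.IGNORECASE flag is passed. The name Evans is included just to show the $ doing its job.

```python
import re

names = ["Kevin", "Devin", "Kevan", "Devan", "Evan", "Evans"]

# $ anchors the match to the end of the text, and (.*) allows anything before "ev".
ends_with_evin_or_evan = r"(.*)ev(i|a)n$"

# Case-sensitive by default: "Evan" is missed because its "E" is uppercase.
case_sensitive = [name for name in names if re.search(ends_with_evin_or_evan, name)]
print(case_sensitive)

# Ignoring case picks up "Evan" too; "Evans" still fails because of the $.
case_insensitive = [
    name for name in names if re.search(ends_with_evin_or_evan, name, re.IGNORECASE)
]
print(case_insensitive)
```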

Summary 

I’ve listed 5 common regular expression symbols to help you with misspelled names. To summarize:

    • | means or
    • ? means the letter before may or may not be there 
    • .* means anything 
    • ^ means starts with
    • $ means ends with 
