Regular Expressions for Misspelled Words

 

If you’ve spent any time parsing through data manually, you may have stumbled on the term “regex” once or twice. Regex, short for regular expressions, are patterns written as strings that let you search through data more easily. Although their code-like syntax can seem daunting at first, regular expressions are fairly easy to learn and can save you a lot of time. 

How do regular expressions save time? Quite simply, you don’t have to add a separate filter by hand for every variation of a word. There are tons of useful articles about regex, but I find that misspelled names are the easiest way to explain the concept, and they’re one of the most common reasons to use it. This tutorial starts with the most basic regex examples and gradually moves to more complex uses. 

Regex 101 

To begin with, you’ll need to know two simple regex symbols:

  • (Parentheses) group part of a formula together, just as they do in a simple math expression. 
  • | means “or,” as in: you either know regex | you don’t know regex. 

For example, if your name is Ashley, but some people spell it Ashleigh, you can use the formula Ashle(y|igh). Other examples are: 

  • (Eri|Aaro)n matches Erin and Aaron 
  • Charl(ie|ey) matches Charlie and Charley 
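If you want to try these formulas yourself, here’s a minimal sketch in Python using its built-in re module. Python is just one choice; the post doesn’t assume any particular tool, and other regex tools may differ in small details. re.fullmatch only succeeds when the whole name fits the pattern, which keeps the check strict. The sample names are the ones from the examples above.

```python
import re

# Each pattern uses ( ) to group the alternatives and | to mean "or".
examples = {
    r"Ashle(y|igh)": ["Ashley", "Ashleigh"],
    r"(Eri|Aaro)n": ["Erin", "Aaron"],
    r"Charl(ie|ey)": ["Charlie", "Charley"],
}

for pattern, names in examples.items():
    for name in names:
        # fullmatch returns a Match object only if the entire name fits the pattern
        matched = re.fullmatch(pattern, name) is not None
        print(f"{pattern:<14} {name:<9} {matched}")

# A spelling outside the listed alternatives does not match.
print(re.fullmatch(r"Ashle(y|igh)", "Ashlee"))  # None
```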

Regex for When a Letter Might or Might Not Be There 

The “or” feature is great if your name has a simple variation, but if you’re an Anna, Collin, or Meghan, it seems like a lot of work for one little difference. These names aren’t drastically misspelled; you’ve just got some Meghans without an h, and some Annas or Collins without the doubled middle letter. That’s when the ? comes in handy. 

  • ? means the preceding letter (or group in parentheses) might or might not be there. As in: this next regex joke might (not)? make sense. 

For example, our friend Anna could use the regex formula Ann?a, which matches both Anna and Ana. Other examples are: 

  • Anne? matches Anne and Ann 
  • Megh?an matches Meghan and Megan 
  • Coll?in matches Colin and Collin 
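The same kind of quick check works for ?. This sketch again assumes Python’s re module. The joke sentence is written as (not )? with the space moved inside the group, so both versions of the sentence match without needing a double space.

```python
import re

# "?" makes the single character right before it optional.
examples = {
    r"Ann?a": ["Anna", "Ana"],
    r"Anne?": ["Anne", "Ann"],
    r"Megh?an": ["Meghan", "Megan"],
    r"Coll?in": ["Collin", "Colin"],
}

for pattern, names in examples.items():
    for name in names:
        # Both spellings should print True.
        print(pattern, name, re.fullmatch(pattern, name) is not None)

# "?" after a parenthesized group makes the whole group optional.
joke = r"this next regex joke might (not )?make sense"
print(re.fullmatch(joke, "this next regex joke might make sense") is not None)      # True
print(re.fullmatch(joke, "this next regex joke might not make sense") is not None)  # True
```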

Regex for Begins With and Anything 

The next two regex formulas are most useful when you’ve given up all hope of spelling someone’s name right. That’s right, I’m looking at you, Kathryn/Catherine/Katherine/Cathryn/Kathy/Cathy. All we know is that your name begins with a C or K, followed by “ath”, and ends with anything. That’s right: anything. 

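  • The regex to cover anything and everything is .* where the dot matches any single character (except a line break) and the * means “zero or more times.” In a larger formula it’s often written as (.*), though the parentheses are optional.
  • ^ means begins with. In the same way a carrot leads a horse, your caret (^) leads your regex, making it very clear what your formula begins with.  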

For example, a formula that includes people with a first name of Katherine/Cathryn/Kathy/Cathie, but not Mary Catherine or Ann Kathryn, would be ^(K|C)ath(.*) 

This expression means that no matter what the name ends up being, it definitely starts (^) with a K or (|) a C, has “ath” right after that, and then ends in anything. Another example is:  

  • ^(K|C)h?rist(e|i)n(.*) matches Christine, Kristina, Kristin, Kristen 
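To see why the ^ matters, this sketch runs both formulas with Python’s re.search, which, like many search boxes and filters, looks for a match anywhere in the text. The list of full names is made up for illustration, and it assumes each entry starts with the person’s first name.

```python
import re

# Hypothetical full names, as they might appear in a data column.
names = ["Katherine Smith", "Cathryn Lee", "Kathy Jones", "Cathie Park",
         "Mary Catherine Doe", "Ann Kathryn Roe"]

anchored = r"^(K|C)ath(.*)"   # must START with K or C, then "ath", then anything
unanchored = r"(K|C)ath(.*)"  # same, but allowed to match anywhere in the text

# re.search scans the whole string; ^ pins the match to the very beginning.
print([n for n in names if re.search(anchored, n)])
# ['Katherine Smith', 'Cathryn Lee', 'Kathy Jones', 'Cathie Park']
print([n for n in names if re.search(unanchored, n)])
# all six names match, including 'Mary Catherine Doe' and 'Ann Kathryn Roe'

christine = r"^(K|C)h?rist(e|i)n(.*)"
print([n for n in ["Christine", "Kristina", "Kristin", "Kristen", "Kirsten"]
       if re.search(christine, n)])
# ['Christine', 'Kristina', 'Kristin', 'Kristen']
```

Note that Kirsten falls outside the formula: after the K it needs an r (with an optional h in between), but Kirsten has an i there.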

Regex for Ends With 

Lastly, we’ve got a regex for that one time when you were half-listening to a conversation and only heard the end of someone’s name.  

  • $ means ends with, as in all jobs end with payment (we hope).  

Whether you were introduced to Kevin, Devin, Kevan, or Devan, we may never know, but at least regex will get you in the ballpark. To cover all four, we would use the formula (.*)ev(i|a)n$ 

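This expression means it doesn’t matter what the name starts with, as long as it ends in evin or evan. It can even include our friend Evan, but only if your tool ignores case: in a case-sensitive search, the capital E in Evan won’t match the lowercase ev, so you’d use (.*)(e|E)v(i|a)n$ instead. 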
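Here’s a sketch of the ends-with formula in Python, again using re.search on a made-up list of names. It also checks our friend Evan: Python matches case-sensitively by default, so Evan only matches once you pass the re.IGNORECASE flag.

```python
import re

pattern = r"(.*)ev(i|a)n$"  # anything, then "ev", then i or a, then "n" at the very end
names = ["Kevin", "Devin", "Kevan", "Devan", "Evan", "Kevin Smith"]

# Case-sensitive by default: "Evan" fails because its "E" is uppercase,
# and "Kevin Smith" fails because $ requires "evin"/"evan" at the very end.
print([n for n in names if re.search(pattern, n)])
# ['Kevin', 'Devin', 'Kevan', 'Devan']

# With re.IGNORECASE, "Ev" counts as "ev", so Evan is included too.
print([n for n in names if re.search(pattern, n, re.IGNORECASE)])
# ['Kevin', 'Devin', 'Kevan', 'Devan', 'Evan']
```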

Summary 

I’ve covered five common regex symbols to help you with misspelled names. To summarize: 

  • | means or
  • ? means the letter (or group) before it may or may not be there 
  • .* means anything 
  • ^ means starts with
  • $ means ends with 
