Take control of your data and filter out what doesn’t help you to improve your search strategy. Learn how with this guide to regex for SEO.
Perhaps you’ve heard of regex but aren’t quite sure how it can be used in SEO or whether it fits into your own strategy.
Regular expressions, or ‘regex’, are like an in-line programming language for text searches, allowing you to include complex search strings, partial matches and wildcards, case-insensitive searches, and other advanced instructions.
You can think of them as searching for a pattern, rather than a specific string of text.
Therefore, they can help you to find entire sets of search results that, at first glance, may appear to have little in common with each other.
Regular expressions are a language all their own, and the first time you see one, it can look quite alien.
But they are quite easy to learn and can be used across JavaScript, Python and other programming languages, making them a versatile and powerful SEO tool.
In this guide, you’ll learn common regex operators, how to use more advanced regex filters for SEO, how to use regex in Google Analytics and Google Search Console, and more.
You’ll find examples of regex at work in different ways in SEO, too.
A regular expression typically includes a combination of text that will match exactly in the search results, along with several operators that act more like wildcards to achieve a pattern match rather than an exact text match.
This can include a single-character wildcard, a match for one or more characters, or a match for zero or more characters, as well as optional characters, nested sub-expressions in parentheses, and ‘or’ functions.
By combining these different operations together, you can build a complex expression that can achieve very far-reaching, yet very specific results.
A few examples of common regex operators include:
. A wildcard match for any single character.
.* A match for zero or more characters.
.+ A match for one or more characters.
\d A match for any single numerical digit 0-9.
? Inserted after a character to make it an optional part of the expression.
| A vertical line or ‘pipe’ character indicates an ‘or’ function.
^ Used to denote the start of a string.
$ Used to denote the end of a string.
( ) Used to nest a sub-expression.
\ Inserted before an operator or special character to ‘escape’ it.
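To see how a few of these operators behave in practice, here is a minimal sketch using Python’s built-in re module. The URL paths and the matching helper are invented for illustration only, and re.search is used so that a pattern can match anywhere in the string unless it is anchored.

```python
import re

# Hypothetical URL paths, invented purely for illustration.
paths = ["/blog/", "/blog/seo-tips", "/blog/2021/regex", "/shop/item-7", "/Blog/news"]

def matching(pattern, items):
    """Return the items in which the pattern is found anywhere (partial match)."""
    return [item for item in items if re.search(pattern, item)]

# ^ anchors the match to the start of the string (case-sensitive by default).
starts_with_blog = matching(r"^/blog/", paths)

# \d matches a single digit and $ anchors the match to the end of the string.
ends_with_digit = matching(r"\d$", paths)

# ( | ) groups an 'or' choice; .+ requires at least one more character.
blog_or_shop_page = matching(r"^/(blog|shop)/.+", paths)

# ? makes the preceding character optional.
optional_s = matching(r"tips?$", ["seo-tip", "seo-tips", "seo-tipster"])

# \ escapes the dot so it matches a literal full stop, not any character.
literal_dot = matching(r"\.html$", ["page.html", "pagexhtml"])

print(starts_with_blog)   # ['/blog/', '/blog/seo-tips', '/blog/2021/regex']
print(ends_with_digit)    # ['/shop/item-7']
print(blog_or_shop_page)  # ['/blog/seo-tips', '/blog/2021/regex', '/shop/item-7']
print(optional_s)         # ['seo-tip', 'seo-tips']
print(literal_dot)        # ['page.html']
```

Note that ‘/blog/’ on its own is not matched by the third pattern, because .+ needs at least one character after the trailing slash.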
Some programming languages, such as JavaScript, allow the inclusion of ‘flags’ after the regex pattern itself, and these can further affect the outcome:
g Returns all matches instead of just the first one.
i Returns case-insensitive results.
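m Activates multiline mode, so ^ and $ also match at the start and end of each line.
s Activates ‘dotall’ mode, so . also matches line breaks.
u Activates full Unicode support.
y Matches only at the position stored in the pattern’s lastIndex property (‘sticky’ mode).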
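The flags listed above are JavaScript’s. As a rough comparison, the sketch below shows approximate counterparts in Python’s re module: re.IGNORECASE, re.MULTILINE and re.DOTALL stand in for i, m and s, findall plays the role of g, and passing a start position to a compiled pattern’s match method behaves much like sticky mode. Python 3 string patterns are Unicode-aware by default, so there is no separate u flag. The sample text is invented for illustration.

```python
import re

# Hypothetical three-line text, invented for illustration.
text = "SEO tips\nseo tools\nSeo audits"

# Without flags, search returns only the first case-sensitive match.
first_only = re.search(r"seo", text).group()

# findall returns every match (like g); IGNORECASE ignores case (like i).
all_matches = re.findall(r"seo", text, re.IGNORECASE)

# MULTILINE lets ^ match at the start of every line (like m).
line_starts = re.findall(r"^seo \w+", text, re.IGNORECASE | re.MULTILINE)

# By default . does not match a line break; DOTALL changes that (like s).
without_dotall = re.search(r"tips.seo", text)
with_dotall = re.search(r"tips.seo", text, re.DOTALL)

# Matching at an explicit position is similar to sticky mode (y):
# 'tools' starts at index 13, so it matches there but not at index 12.
tools = re.compile(r"tools")
at_13 = tools.match(text, 13)
at_12 = tools.match(text, 12)

print(first_only)              # seo
print(all_matches)             # ['SEO', 'seo', 'Seo']
print(line_starts)             # ['SEO tips', 'seo tools', 'Seo audits']
print(without_dotall is None)  # True
print(with_dotall is not None) # True
print(at_13 is not None, at_12 is None)  # True True
```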
As you can see, together these operators and flags start to build up to a complex logical language, giving you the ability to achieve very specific results across large, unordered data sets.
Regex can be used to explore the queries different user segments use, which queries are common to specific content areas, which queries drive traffic to specific parts of your site, and more.
Hamlet Batista, for example, has demonstrated how to use regex in Python to analyze server log files.
And Chris Long has shown how to use regex to extract the position, item, and name of the breadcrumbs associated with each URL of your site as part of a scalable keyword research and segmentation process.
Google encourages SEO pros to share examples of how they’re using regex on Twitter using the hashtag #performanceregex.
Here are a couple of tips from SEO Twitter (you’ll notice it’s a pretty quiet hashtag – add your own examples if you have them!):
Use slug$ in a filter to see a list of every page/keyword that ends on “slug”. Very important if you have to manage large websites 🖤#performanceregex
— hannes-jeremia jaacks (@HannesJaacks) December 31, 2021
I’ve compiled quite an extensive library of #regularexpressions for #googlesearchconsole. 😎
Hit me with any other ideas, happy to add them. @danielwaisberg @DanielHereMe @CyrusShepard @5le @DataChaz #performanceregex #regex #seo https://t.co/BKX9UCGrOU
— JC Chouinard (@ChouinardJC) June 17, 2021
One of the most common uses of regex for SEO is in Google Analytics, where regular expressions can be used to set up filters so that you only see the data you want to see.
In this sense, the expression is used to exclude results, rather than to generate a set of inclusive search results.
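For example, if you want to exclude data from IP addresses on your local area network, you might filter out ^192\.168\.\d{1,3}\.\d{1,3}$ to remove the full range from 192.168.0.0 to 192.168.255.255. Note that the dots are escaped: an unescaped . matches any character, and a bare * repeats the character before it rather than acting as a standalone wildcard.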
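As a quick check of how such an IP filter behaves, here is a small sketch in Python’s re module. The list of addresses is invented, and the pattern is one reasonable way to write the 192.168.x.x range as a regular expression rather than the only one.

```python
import re

# Escaped dots match literal dots; \d{1,3} matches one to three digits.
# Assumption: this is a loose check, so it would also accept 192.168.999.999.
local_ip = re.compile(r"^192\.168\.\d{1,3}\.\d{1,3}$")

# Hypothetical values, invented for illustration.
ips = ["192.168.0.1", "192.168.255.255", "192.169.0.1", "10.192.168.1.1", "192x168y0z1"]

excluded = [ip for ip in ips if local_ip.search(ip)]
kept = [ip for ip in ips if not local_ip.search(ip)]

print(excluded)  # ['192.168.0.1', '192.168.255.255']
print(kept)      # ['192.169.0.1', '10.192.168.1.1', '192x168y0z1']
```

The ^ and $ anchors stop the pattern from matching an address that merely contains 192.168 somewhere in the middle.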
As a more complex example, let’s imagine you have two brands: regex247 and regex365.
You might want to filter results that match any combination of URLs that contain these brand names, such as regex247.biz or www.regex365.org.
One way to do this is with a fairly simple ‘or’ expression:
.*regex247.*|.*regex365.*
This would remove all matching URLs from your Analytics data, including subfolder paths and specific page URLs that appear on those domain names.
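The ‘or’ expression above can be tried out on a handful of sample values before it goes anywhere near your reports. The sketch below uses Python’s re module and invented hostnames; the shorter pattern is an assumption added for comparison, and it is only equivalent when the tool does partial matching (as re.search does here).

```python
import re

# The 'or' expression from the text above.
brand_filter = re.compile(r".*regex247.*|.*regex365.*")

# Assumed shorter form: the same result under partial matching.
shorter = re.compile(r"regex(247|365)")

# Hypothetical hostnames and paths, invented for illustration.
hostnames = [
    "regex247.biz",
    "www.regex365.org",
    "blog.regex247.biz/pricing",
    "example.com",
    "regex24.com",
]

removed = [h for h in hostnames if brand_filter.search(h)]
removed_shorter = [h for h in hostnames if shorter.search(h)]

print(removed)                     # ['regex247.biz', 'www.regex365.org', 'blog.regex247.biz/pricing']
print(removed == removed_shorter)  # True
```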
It is worth noting that – similar to your robots.txt file – a poorly written regular expression can quite easily filter out most or all of your data by including an unrestricted wildcard match.
The good news is that in many SEO cases, the filter is only applied to your data at the reporting stage, and by editing or deleting your regular expression, you can restore full visibility to your data.
You can also test regular expressions on a number of online testing tools, in order to see if they achieve the intended outcome – allowing you to ‘sandbox’ your expressions before you let them loose across your entire data set.
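If you would rather sandbox a pattern locally, one simple approach is to measure what share of a sample it matches. This is a minimal sketch, assuming a small invented list of page paths and a hypothetical match_share helper; a share close to 1.0 is a hint that the pattern contains an unrestricted wildcard.

```python
import re

def match_share(pattern, rows):
    """Return the fraction of rows that the pattern matches anywhere."""
    compiled = re.compile(pattern)
    hits = sum(1 for row in rows if compiled.search(row))
    return hits / len(rows)

# Hypothetical page paths, invented for illustration.
sample_pages = ["/", "/blog/regex-guide", "/blog/seo-audit", "/contact", "/shop/cart"]

targeted = match_share(r"^/blog/", sample_pages)
unrestricted = match_share(r".*", sample_pages)
# A stray trailing pipe adds an empty alternative, which matches everything.
stray_pipe = match_share(r"/blog/|", sample_pages)

print(targeted)      # 0.4
print(unrestricted)  # 1.0
print(stray_pipe)    # 1.0
```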
To create regex filters on Google Analytics, first, navigate to the type of Report you want to create (e.g. Behaviour > Site Content > All Pages or Acquisition > All Traffic > Source/Medium).
Below the graph, at the top of the data table, look for the search box and click advanced to display the advanced filter options.
Here you can include or exclude data based on a particular dimension or metric. In the dropdown list after you select your dimension, choose Matching RegExp and then enter your expression into the text box.
To create an ‘or’ expression in Google Analytics, just include the pipe character (the | vertical stroke symbol) between the appropriate segments of your expression.
Google Analytics regular expressions do not support ‘and’ statements within a single regex; however, you can just add another filter to achieve this.
Below your first regex, just click Add a dimension or metric and enter your next regex. In this way, you can stack as many expressions as you want and they will be processed as a single logical ‘and’ statement when filtering your data.
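The stacking behaviour described above can be pictured as follows. This is only an illustration in Python of the ‘and’ logic, not how Google Analytics is implemented; the rows, dimension names and apply_filters helper are all invented.

```python
import re

# Hypothetical report rows, invented for illustration.
rows = [
    {"page": "/blog/regex-guide", "source": "google"},
    {"page": "/blog/seo-audit", "source": "newsletter"},
    {"page": "/shop/cart", "source": "google"},
    {"page": "/blog/news", "source": "bing"},
]

# Each filter is a (dimension, regex) pair; the pipe gives 'or' within a filter.
filters = [
    ("page", r"^/blog/"),
    ("source", r"google|bing"),
]

def apply_filters(rows, filters):
    """Keep only rows that satisfy every filter (a logical 'and')."""
    return [
        row for row in rows
        if all(re.search(pattern, row[dimension]) for dimension, pattern in filters)
    ]

kept_pages = [row["page"] for row in apply_filters(rows, filters)]
print(kept_pages)  # ['/blog/regex-guide', '/blog/news']
```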
In 2021, Google Search Console began supporting the RE2 syntax of regex, allowing webmasters to include and exclude data within the user interface.
You’ll find all metacharacters supported by Google Search Console in this RE2 regex syntax reference on GitHub.
At the time of writing, there is a limit of 4,096 characters (which is usually enough…).
For example, in Search Console you can filter for queries containing a specific brand and the variations users might type, such as Facebook:
.*facebook.*|face*book.*|fb.*|fbook.*|f*book.*
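Because * repeats only the character immediately before it, it is worth checking a brand pattern like this against sample queries before relying on it. The sketch below uses Python’s re module (not RE2 itself, though these particular constructs behave the same way in both) and an invented query list. The tightened pattern is one possible alternative offered as an assumption, not part of the original tip.

```python
import re

# The brand pattern from the text above.
brand = re.compile(r".*facebook.*|face*book.*|fb.*|fbook.*|f*book.*")

# Assumed alternative: optional 'e' and space, plus word boundaries around 'fb'.
tightened = re.compile(r"fac(e ?)?book|\bfb\b|\bfbook")

# Hypothetical search queries, invented for illustration.
queries = ["facebook login", "facbook", "face book", "fb marketplace", "book club", "instagram"]

brand_hits = [q for q in queries if brand.search(q)]
tightened_hits = [q for q in queries if tightened.search(q)]

# f*book allows zero 'f' characters, so any query containing 'book' matches.
print(brand_hits)
# ['facebook login', 'facbook', 'face book', 'fb marketplace', 'book club']
print(tightened_hits)
# ['facebook login', 'facbook', 'face book', 'fb marketplace']
```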
Filter out users finding your website through “commercial” intent terms:
.*(best|top|alternate|alternative|vs|versus|review*).*
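Short terms such as ‘top’ and ‘vs’ can also appear inside longer words, so a quick local check can show what a pattern like this really captures. The sketch below again uses Python’s re module and invented queries; the version with \b word boundaries is an assumed refinement, not something taken from the original expression.

```python
import re

# The commercial-intent pattern from the text above.
commercial = re.compile(r".*(best|top|alternate|alternative|vs|versus|review*).*")

# Assumed refinement: \b limits each term to whole words.
whole_words = re.compile(r"\b(best|top|alternate|alternative|vs|versus|reviews?)\b")

# Hypothetical search queries, invented for illustration.
queries = [
    "best seo tools",
    "semrush vs ahrefs",
    "ahrefs reviews",
    "how to use regex",
    "laptop stand",
]

commercial_hits = [q for q in queries if commercial.search(q)]
whole_word_hits = [q for q in queries if whole_words.search(q)]

# 'laptop stand' matches the first pattern because 'laptop' contains 'top'.
print(commercial_hits)
# ['best seo tools', 'semrush vs ahrefs', 'ahrefs reviews', 'laptop stand']
print(whole_word_hits)
# ['best seo tools', 'semrush vs ahrefs', 'ahrefs reviews']
```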
Finally, why does all this matter?
Well, it’s all about taking control of your data and filtering out the parts of it that don’t help you to improve your SEO – whether that’s particular pages or parts of your website, traffic from a specific source or medium, or your own local network data.
You can create quite simple regular expressions to achieve a basic ‘include’ or ‘exclude’ filter, or write longer expressions that work similarly to programming code to achieve complex and very specific results.
And with the right regex for each campaign, you can verify that your SEO efforts are achieving your aims, ambitions, and outcomes – a powerful way to prove positive ROI on your future SEO investments.
Featured Image: Optura Design/Shutterstock
I’m Head of Research & Development at SALT.agency, a bespoke technical SEO consultancy with offices in the UK and the …