A good[ish] website

Web development blog, loads of UI and JavaScript topics

Easily digestible introduction to Regex in JavaScript

Filed under: JavaScript— Tagged with: regex

An easily approachable but complete look into Regex for beginners and intermediates (probably).


"Let’s learn Regex," I said deterministically, and smashed my head through a dry wall. After wiping the dust off from my shoulders I spat on my palms and sat in front of the computer.

Different flavors of regex

Regular Expressions come in different flavors, the most adopted one might be the Regex syntax that came bundled along Perl, that was later inherited by Java, Python, Ruby, and JavaScript (plus many more). You may read more about from WikiPedia.

This post looks into using regex with JavaScript, mainly, but due to the universal nature of Regular Expressions, the patterns might be usable in other languages, too.

There’s a Wikipedia entry comparing different regex engines if you’d like to know more about it.

A table showcasing different flavors of Regex.
List fo regular expression libraries

Defining a regex pattern

At first, Regex looks like what your grandmother might think programming languages look like: unfathomable garble. The syntax is dense and completely unexpressive, each character is packed full of meaning.

In JavaScript, all regex starts and ends with forward slash: /pattern/, (expect when using the RegExp() constructor, more on that later). Also it's very often coupled with the global g flag, like so: /pattern/g, the g means it matches all occurrences in a given string, without it, only first occurrence is matched.

Simplest regex pattern

At its simplest, a pattern can be just a word:


That will match the word "hello" once in the given string. The following will match all "hello" words, because it uses the global flag:


Then you would use it with JavaScript like so:

const greeting = 'Well, hello'.replace(/hello/g, 'hi')
// Well, hi

On the above case the replace method was used, but there are many others, more on the JavaScript regex methods later on in the article.

To do more advanced regexs you need to know about the metacharacters.

The metacharacters

Metacharacters are the juice of Regex, they make all the complex matching patterns possible. Below I have a quick reference of them:

ℹ️ Editors note: whenever you see | that means the pipe |. MDX has trouble rendering pipes inside tables even if they’re marked as code, and this blog uses MDX.

Char nameCharNameExplanation
Pipe or vertical bar|Alternation/hello | hey/ will match "hello" or "hey".
The question mark?Greedy Quantifier/https?/ makes the preceding char optional, the "s" in this case.
Dot or period.Any character/hello./ matches any one character after the "hello".
Caret^Anchor, begins with/^hello/ matches a string if it begins with "hello".
Dollar symbol$Anchor, ends with/hello$/ matches a string if it ends with "hello".
Star or asterisk*Greedy quantifierRepeat the preceding character zero or more times, /hello.*\.jpg/ matches "hello.jpg" or "hello-anything.jpg".
The plus sign+Greedy quantifierRepeat the preceding character one or more times.
Parenthesis()Grouping / capturing/Pop(Tarts)?/ matches "Pop" and "PopTarts".
Square brackets[]Character class/[0-9a-f]/ matches all characters used to formulate a hex number. Basically place the wanted characters inside the brackets, can use ranges.
Curly braces{}Quantifiers/x{6}/ matches "xxxxxx"

Let’s look escaping first and then what all the meta characters do.

Escaping the meta characters

Incidentally, if you want to target any of the meta characters, they need to be escaped with a backslash, in this example I want to target the question mark (Greedy Quantifier):


All character in regex patterns that need to be escaped are:

. / \ ^ $ * + ? ( ) [ ] { } |

Inside character classes ([]) the requirement is looser, only the following characters require escaping:

^ - ]

The double escape

When you’re using the RegExp constructor, the shorthand character classes need to be doubly escaped (more on those later):

const regex = new RegExp('\\s', 'g')

Because when using the constructor, you’re passing in a string (not a regex), and it’s parsed by using the string literal parser, and the backslash \ is an escape character in the string universe, so you’re just passing in a plain s. This why the backlash needs also to be escaped.

The forward slash / should also be escaped, so that it's not accidentally interpret as the regex closing character.

Now we got that out of the way, let's dig into the metacharacters.

The alternation character, pipe |

The following would match 'hello' and 'hey':


Very suitable use-case for the alternation character is to match single and double quotes:


Quantifier, question mark ?

The question mark makes the preceding character optional.

One good example use-case for this is to match http and https:


In this case the 's' is optional.

Another classic example: colou?r matches both the British colour and the simplified American spelling color.

Anchors, dollar $ and caret ^

The anchors match the beginning or ending of a string: beginning with caret ^ ending with $.

Anchors match a position, not character, so on it's own, the only finds zero length matches. Let's look the caret first:


That'll match only all URLs pointing to this site.

The dollar $ is pretty much the same as the caret, but targets the ending of a string, and it's semantically placed at the end:


The dot .

Dot, or the period is really powerful, it targets any character once (except line break), take the below example:


In the above any single character after the hello is selected.

On it's own the the dot is kind of boring, but when coupled with the repeater characters it comes super handy. Let's look the repeaters next.

Repeater characters: star * and plus +

The star or the asterisk character will repeat the preceding character zero or more times.


The above would match "hello", "helloo", "helloooo", and so on. Not very useful eh? Couple that with the upper mentioned dot and it becomes super useful!

The below will target all sort of different favicon names:


Will match, for example:

  • favicon.png ✅
  • favicon-16x16.png ✅
  • favicon-asddsfasdfasdf.png ✅

The plus character is almost same as the star, only difference is that it will repeat the preceding character one or more times.


Would not match match:

  • favicon.png ❌

Too greedy matching

In the case of /favicon.*\.png/ you might notice that the match is wildly too optimistic, which is the problem and the virtue of the dot. Greedy regular expressions in wrong places can be security hazards. In the favicon matching case, we probably only need to match numbers, letters, and the hyphen (the letters appearing in the Latin alphabet, that is). We can make the match less greedy with character classes, let’s look those next.

Character Classes with square brackets [ ]

Between the angle brackets you can define the characters you want to match. For example:

/[- /.]/

Would conveniently target all common separators in different date formats:

  • 2016-03-18
  • 2016/03/18
  • 2016.03.18
  • 2016 03 18

Like mentioned earlier, the dot is not needed to be escaped inside a character class.

All numbers could be matched like so:


But that’s kind of silly, luckily the smart folks at the Regular Expressions head quarters — in a seedy business park somewhere in suburban America – know this, and they have given us the ranges:


Within the square brackets [ ] The hyphen sets the range. Same range can be applied to upper or lowercase letters, like so:


They can also be combined:


a-z includes all letters from the Roman alphabet (the English language alphabet). But for example, the Finnish alphabet include more characters, and the Germans have four special letters Ä, Ö, Ü, and ß.

When needed, special characters can be simply added into the character class:


Character classes can be negated by placing a caret after the opening square bracket:


The above makes sure the match has no numbers.

At times, the character classes might become too long and complex, that’s when the shorthand character classes come handy.

Shorthand character classes

The character classes might get too long in some cases, so there are (a ton of) convenient shorthands that do the same thing.

Here’s a table of all the character classes in JavaScript:

\b Word boundary
Matches a character that is not followed or preceded by another "word character", but rather a space, tab, or a line-break. In a string "there was the boat" a regex /the/ would match "there was the boat", but /\bthe\b/ would only match the full word: "there was the boat"
\B Non-word boundary
Matches if the previous and next character are of same type. Kind of opposite of the aforementioned \b. \Be would match the second "e" in "erection", but not the first.
\cX Control character
Matches a control character. X represents a control character between A and Z.
\d Digit
Matches only numbers, exactly same as [0-9].
\D Non-digit
Same as the negated digit character group [^0-9].
\f Feed character
Matches form feed character.
\n Linefeed
Matches a linefeed.
\r carriage return
Matches carriage return.
\s Single whitespace character
That's to say: line feed, tab, space, form feed. Same as going: [ \f\n\r\t\v\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. The pattern /\.\s/g would match all the dots with a following whitespace "Lorem. Ipsum. Dolor.yms.sit.".
\S Single non-whitespace character
Pretty much same as: [^ \f\n\r\t\v\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]. The pattern /\.\S/g would match: "Lorem. Ipsum. Dolor.yms.sit."
\t Tab
Matches a tab.
\v Vertical tab
Matches a vertical tab.
\w Word character
Same as [A-Za-z0-9_]. Note the underscore.
\W Non-word character
Same as [^A-Za-z0-9_]. Note the underscore.
\0 Null character
Matches a Null character.

Quantifiers, with curly braces { }

We've touched quantifiers already in this article, in the shorthand form of the question mark ?, the plus +, and the asterisk *. Below is a list of how you would write these quantifiers using curly braces:

  • /x?/ match x zero or one times, same as: /x{0,1}/
  • /x+/ match x zero or more times, same as: /x{0,}/
  • /x*/ match x one or more times, same as: /x{1,}/

Matching an MD5 hash is a great example case, the MD5 hash is a 16 byte string consisting of 32 hexadecimal characters, in simplified terms, numbers and letters:


Or with a shorthand:


That would work for most cases, but since MD5 hash consists of hexadecimal numbers, which is to say, all numbers, and letters from A to F, then our optimized pattern would look something like:


Grouping and Capturing, with parentheses ( )



Also called as capturing groups, the regex patterns wrapped inside parentheses can be subjected to quantifiers or alternation, or the match can be remembered and returned for later use. In JavaScript the return value can be accessed with '$1...'.

Same way as the question mark makes the preceding character optional, you can make a string of characters optional:


Would match "pop" and "poptarts".

Alteration can be used in normal fashion inside the group, which make it super useful:


And here's how to later on use the remembered captured group, note the '$1':

var foo = 'PopTarts'.replace(/Pop(Tarts)?/, 'Toaster$1')

console.log(foo) // ToasterTarts

The captured group can be also used within the same regex pattern, referring to them with escaped numbers 1... If we hypothetically needed to target an HTML element <span>hello</span>, we could use the capturing group to match the closing tag:


See the examples section on how to match an HTML tag.

The non-capturing groups



Capturing groups come with performance penalty, so use them only when you need to access the returned value. Otherwise use a non-capturing group. This is done by placing a question mark and colon ?: in the very beginning of the group:


Named capturing groups



JavaScript has the concept of named capturing groups, they’re what you’d think they are, and you'd used them like this:


Put that into a match:

const foo = '<span>hello</span>'.match(

The groups are now in the output as key/value pairs:

  0: '<span>hello</span>',
  1: 'span',
  2: 'span',
  groups: { openingTag: 'span', closingTag: 'span' },
  index: 0,
  input: '<span>hello</span>',

Regex flags

Flags are single letters that are added after the pattern, they change the way that specific regex behaves.

/foo/ matches the first "foo" in a string and then stops, but /foo/g matches all instances of "foo". When using the case-insensitive flag i, the regex pattern /foo/gi matches all instances of "FOO" and "foo".

In JavaScript, the following flags can be used:

gGlobalMatch all occurrences, default is to match only the first
iIgnore caseMake the Regex match case insensitive
mMultilineChanges how the anchors ^ & & behave
uUnicodeTreat pattern as a sequence of unicode code points
yStickymatches only from the index indicated by the lastIndex property of this regular expression in the target string (and does not attempt to match from any later indexes)

Using the regex patterns in JavaScript

There’s two ways to define regex patterns in JavaScript:

The literal notation
The pattern is passed in as a regex: /ab+c/i,
The constructor function
The constructor function can take the pattern as a string new RegExp('ab+c', 'i') or as a regex new RegExp(/ab+c/, 'i')

The regex object

Regex patters in JavaScript are objects:

console.log(typeof /foo/)
// object

The object has one property: lastIndex, which defines the index to start the match from. I find myself only needing this when working with the exec command, see the exec section of the article for more.

What can regex do?

In general, there are four things you can do with regex:

  1. Replace text.
  2. See if a given patterns is found from the text.
  3. Extract more complex patterns out of text.
  4. Define point where to split a string into an array.

JavaScript provides few handy functions to deal with these.

Replacing: String.prototype.replace()


str.replace(regexp|substr, newSubStr|function[, flags])

Replace must be one of the most used Regex related methods in JavaScript.

Simple example:

const greeting = 'Good morning'
const modifiedGreeting = greeting.replace(/morning$/, 'evening')

console.log(modifiedGreeting) // Good evening

Replace is common to use with a callback function:

const greeting = 'Good morning'
const greetingModifier = match => console.log('Bad ' + match)

greeting.replace(/morning$/, greetingModifier) // Bad morning

The replace method’s callback function takes the output from a capturing group as its parameters. To demonstrate this let’s make a simple helper to replace placeholder values like these {{foo}}.

But first, here’s the anatomy of the callback arguments:

const corpus = 'We have {{fruitCount}} fruits, and {{veggieCount}} of veggies.'

// 3 capturing groups, mostly for show.
corpus.replace(/({{)(\w+)(}})/gi, ( => {

The above clog spits out the arguments for each match (this being the latter):

  "We have {{fruitCount}} fruits, and {{veggieCount}} of veggies."

This is the exact anatomical structure of the callback parameters:

Given namevalue
matchThe full match from the regex.
g1, g2...An infinite amount of possible capturing groups.
offsetThe position of the match in the original string.
stringThe whole original string.
groupsObject where key is the group’s name and value the match.

And here’s the little helper that replaces the stubs globally:

const corpus = 'We have {{fruitCount}} fruits, and {{veggieCount}} of veggies.'
const replacements = {
  fruitCount: 5,
  veggieCount: 7
const replacePlaceholders = corpus =>
  corpus.replace(/{{(\w+)}}/gi, (match, group1) => replacements[group1])

// We have 5 fruits, and 7 of veggies.

Or, like mentioned earlier, the group can also be accessed with the dollar notation '$1'...:

const tarts = 'PopTarts'.replace(/Pop(Tarts)?/, 'Blob$1')

console.log(tarts) // BlobTarts

Testing: RegExp.prototype.test()



test is a super simple no thrills Regex matcher method that returns a boolean, and is good for conditions, like so:

const isSecureUrl = /^https:\/\//.match('')

if (isSecureUrl) {
  console.log('Yup, is secure')



Search is almost like test (above), but it returns the position of the match instead of a boolean, and -1 if no match was found (similarly as indexOf):

console.log('bad unboxing'.search(/box/))
// 6
console.log('bad unboxing'.search(/duck/))
// -1

Split a string: String.prototype.split()


str.split([separator[, limit]])

The split() method splits a string into an array on the spot where it finds the defined separator, and the separator can be a regex pattern. If you come from the PHP world, it’s basically the same as explode.

The following will split the string on any whitespace character (space, tab, form feed, or line feed):

const corpus = 'Lorem\tipsum\ndolor'
const myArray = corpus.split(/\s/)

console.log(myArray) // ["Lorem", "ipsum", "dolor"]

The second, optional, parameter in split is limit:

const corpus = 'Lorem ipsum dolor'
const myArray = corpus.split(/\s/, 2)

console.log(myArray.length) // 2
console.log(myArray) // ["Lorem", "ipsum"]

Matching: String.prototype.match()



What’s cool about match is that it returns an array, with a lot of useful crap in it.

This example finds placeholder values {{foo}}, which can be later on filled with actual values coming from a database or whatever:

const corpus = 'We have {{fruitCount}} fruits in the store.'
const pattern = /({{)(\w+)(}})/i
const matches = corpus.match(pattern)

That gives us an array looking like this:

['{{fruitCount}}', '{{', 'fruitCount', '}}']

Where matches[0] is the full match, and the rest are matches from the capturing groups (the stuff inside parens).

Match and the global flag

Maybe you noticed the absence of the g flag, that’s because match won’t really work the same way with the global flag.

const corpus =
  'We have {{amountOfFruit}} fruits, and {{amountOfVeggies}} of veggies.'
// Now with g
const pattern = /({{)(\w+)(}})/gi
// ["{{amountOfFruit}}", "{{amountOfVeggies}}"]

That won’t give you the groups, the groups are are ignored.

It’s not the best tool for extracting multiple placeholder values, since we need access to the group that contains the value between the double curly braces. But matchAll or exec can handle patterns using the global flag.

Matching globally: String.prototype.matchAll()



The matchAll regex method is a bit like what match should’ve been. It’s exclusively meant for global matches, actually, it throws an error if the g flag is not used.

const corpus = 'We have {{fruitCount}} fruits, and {{veggieCount}} of veggies.'
const regex = RegExp('({{)(w+)(}})', 'gi')
const matches = corpus.matchAll(regex)

// It has to be spread, because matchAll returns and iterator.
// [
//   ['{{fruitCount}}', '{{', 'fruitCount', '}}'],
//   ['{{veggieCount}}', '{{', 'veggieCount', '}}']
// ]

The return value from matchAll is an iterator, so "normal" array methods can’t be used on it, but it can be spread: [...matches], and for...of loop works on it, or Array.from(matches) also turns it into a normal array that can be worked on.

matchAll is relatively new, so IE11 doesn’t support it.

Finding successive matches: String.prototype.exec()



The exec method executes a search on an input string and returns the results array, or null if no match was found.

exec can do pretty much everything matchAll can, but it has a more a 90s-type-of-archaic-feel to it. Plus, it updates the RegExp object’s lastIndex property every time it finds a match (see below).

Here’s the above example written with exec which returns all the placeholders and their capturing groups:

const extractPlaceholders = corpus => {
  const pattern = /({{)(\w+)(}})/gi
  let array
  const placeholders = []

  while ((array = pattern.exec(corpus)) !== null) {

  return placeholders

const placeholders = extractPlaceholders(
  'We have {{amountOfFruit}} fruits, and {{amountOfVeggies}} of veggies.'

That extractPlaceholders helper function gives us a nested array:

  ['{{amountOfFruit}}', '{{', 'amountOfFruit', '}}'],
  ['{{amountOfVeggies}}', '{{', 'amountOfVeggies', '}}']

RegExp object’s lastIndex property

One thing to note, is that the RegExp object is stateful when using g or y flags, which means that every time a match is found, the RegExp.lastIndex is updated to reflect the end position of the current match.

Here’s how exec works basically:

  1. It looks for a match from a string,
  2. when match happens it updates the lastIndex in the RegExp object,
  3. and continues matching forward from that positioning.

Here’s a little function that shows the beginning positions of a word in a corpus of text:

const getWordPositions = (pattern, corpus) => {
  const regex = RegExp(pattern, 'gi')
  let array
  let wordPositionsInCorpus = []

  while ((array = regex.exec(corpus)) !== null) {
    // The `lastIndex` is now the end of the match.
    const matchStart = regex.lastIndex - array[0].length

  return wordPositionsInCorpus

const wordPositions = getWordPositions(
  'The quick brown fox jumps over the lazy dog'

// [0, 31]

Regex examples

No programming related article is complete without examples. Here’s some common things, with explanations, what one can do with Regex while programming in JavaScript. This is obviously just a scrape on the surface, sites like RegexLib offer a stupefyingly large array of ready-made Regex patterns. Also, like anything nowadays, there’s probably a prepackaged solution for you to use, just dig the npm.

Match an internal URL

Internal URL is easy to match since it has your site domain in it and it and starts with "http" or "https", or with a forward slash /:

│     │                └┬─┘└┬┘
│     │                 │   └Has nothing or anything after the /
│     └Optional s       └Or begins with a /
└─Should begin with http

This would match:

  • /
  • /foo

Here's another way to write the same thing:


Match URL (without port)

In this hectic Internet era that we live in, where everything is possible, like really messed up TLDs, it’s very difficult with 100% accuracy to match a domain name.

Here’s a Regex for a traditional TLD (since the new TLDs this is not as effective anymore):

/^(https?:\/\/)?([\da-z.\-]+)\.([a-z.]{2,6})([/w .\-]*)*/?$/
  └─────┬─────┘ └────┬─────┘ └─────┬─────┘└────┬────┘ └─┬─┘
        │            │             │           │        └─Optional forward slash at the end
        │            │             │           └─Any word character, dot, hyphen, 0 or more times
        │            │             └─The TLD, letters, 2 to 6 times
        │            └─Numbers & letters, 1 or more times
        └─http:// or https://, optional

Match a hexadecimal value

Hex values can be #dddddd or #ddd or also without the hash ddd:

 ││ └────┬────┘└─────┬────┘
 ││      │           └─Or any number or a-f 3 times
 ││      └─Any number or a-f 6 times
 │└─Optional hash
 └─Begins with

Target MD5 hash

MD5 hashes, these things: 7e18a1b53182d6124253453811b67eb0, 32 digits long hex values. The following will match the hash if it’s completely on its own:


If, for example, part of a filename, like so: global.7e18a1b53182d6124253453811b67eb0.js, it won’t match, because of the ^ and $. The following matches the hash in that filename:


Matching an HTML tag

You really almost never n͠eed̸ ͢t͝o do͢ ̨t͏h̛is, and if you’re doing this, you’re doing s̺̠o̻̭̞̞m̶͕͙ͩͯ̅ͅe̱͖͖̹̭̜͋̅̾̃̓̈́̕ṯ͇̯̤͇͇̤͌̌͛ͯͤ̑̈́ĥ̪̽ͅiņ͇̞̯ǵ̴ͨ͒̾ ͥ̓ṱ̜̬͇̮̱̰̽ͮ͊͋̅̓̿e̘̥̙̝̼͖̬ͪ͐ͧ͗ͥͧ͊͠rr̵̥̬̖̟í̛͔̹̙ͭͤb̙̓ḻ͟y w̻̖̿̄r͎͍̅͑ö̻́n̈͊ͦ͊̽̾g̶̻ and you need to stop. But all the more reason, this is a great example of a complex Regex pattern and of what Regular Expressions does if you cut its head off and let it roam free:

                               ┌─4) Non capturing group to match the closing tag
||└────┬───┘└──┬──┘|   |└─┬┘└──┬─┘└──┬──┘ |
||     |       |   |   |  |    |     |    └─End of string
||     |       |   |   |  |    |     └─Or the end of a self-closing tag " />"
||     |       |   |   |  |    └─The "1" refers to the result of the first capturing group
||     |       |   |   |  └─3) Followed by any character 0 or more times
||     |       |   |   └─Closing carrot ">"
||     |       |   └─Zero or more times
||     |       └─2) Anything but "<" one or more times
||     └─1) a to z and "-" one or more times
└─Beginning of string

Below I’ve numbered the groups above, so you can see what they exactly do:

<span class="hello">foobar</span>
└─┬─┘└──────┬─────┘ └──┬─┘└──┬──┘
  |         |          |     |
  └─1       └─2        └─3   └─4

Using variables in Regex patters

This will not work:

const haystack = 'will not work'
const needle = 'work'

haystack.replace(/needle/g, 'twerk')

But the RegExp constructor can take variables:

const haystack = 'something is wrong.'
const needle = 'something'
const regex = new RegExp(needle, 'g')

haystack.replace(regex, 'everything')

Regex testing tools

These are priceless! And there’s so many, just search for "regex tester".

I find to be all I need.


To wrap things up I could summarize:

  • Use test or search when you want a simple boolean match.
  • Use match when you want to match a single instance of a string.
  • Use matchAll or exec when you want to match globally.
  • Use exec when you want to target successive matches.
  • Use replace when you want to replace values and use capturing groups.
  • Use split when you want to split strings into arrays.

Regex is one of those things you want to use everywhere right after you’ve learned it. But often there’s a better way to do it.

Thanks to and MDN Regular Expressions articles for great references!

Comments would go here, but the commenting system isn’t ready yet, sorry. Tweet me @hiljaa if you want to make a correction etc.

  • © 2021 Antti Hiljá
  • About
  • Follow me in Twatter → @hiljaa
  • All rights reserved yadda yadda.
  • I can put just about anything here, no one reads the footer anyways.
  • console.log('Smash the patriarchy!')
  • I love u!