As with everything in this language, we have an object: it’s called RegExp. We can create one in two ways:
- Regular expression literal
var query = /pattern/flags; Everything between the two "/"s is taken as the pattern, and the flags come after the closing slash. Writing the expression out directly like this gives better performance, but because the pattern is hardcoded it isn’t suited to building a pattern up from variables.
- Regular expression constructor
var query = new RegExp('pattern','flags'); A constructor function that lets us pass in values (i.e. we can replace the ‘pattern’ and ‘flags’ arguments with variables). This is slower, but it is great for producing our patterns dynamically, for example from user input.
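To make the difference concrete, here is a minimal sketch of building a pattern from a variable. The escapeRegExp helper and the userInput value are assumptions made up for this illustration, not part of the language. Escaping matters because characters like "." have a special meaning inside a pattern.

```javascript
// Hypothetical helper: backslash-escapes characters that are special in a pattern
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var userInput = 'a.b';   // assumed to come from the user
var literal = /a.b/g;    // hardcoded: "." matches any character
var dynamic = new RegExp(escapeRegExp(userInput), 'g'); // "." matches only a dot

var literalHits = 'a.b axb'.match(literal);
var dynamicHits = 'a.b axb'.match(dynamic);

console.log(literalHits);    // ["a.b", "axb"]
console.log(dynamicHits);    // ["a.b"]
console.log(dynamic.source); // a\.b
```

If the input is meant to be a real pattern rather than plain text to find, the escaping step would be skipped.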
Now we have the RegExp object. As it is an extension of the main Object ‘class’ it has all the default properties, but it also has a few unique ones of its own:
source: string. The pattern we’re searching with. This is the crazy gibberish we’ll be getting into after laying out the context within which we use it.
lastIndex: int. The index at which to start the next search, so if you want to start searching from the 2nd character in a string, this would be set to 1 (for 0-based numbering). It defaults to 0. With the g (or y) flag set, after a result is found it is set to the index of the first character after that result in the main string. So, if you search for a 3-letter word and the pattern matches one that starts at the 2nd character (index 1), lastIndex will then be set to 4. We will clarify this concept soon.
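As a quick check of that arithmetic, here is a small sketch. The pattern and the sample string are made up for this illustration.

```javascript
var wordPattern = /[a-z]{3}/g;      // hypothetical pattern: any three lowercase letters
var sample = ' cat';                // 'cat' starts at the 2nd character (index 1)

console.log(wordPattern.lastIndex); // 0
var hit = wordPattern.exec(sample);
console.log(hit.index);             // 1
console.log(wordPattern.lastIndex); // 4 (index 1 plus 3 characters)
```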
The other unique properties on RegExp hold the values set by the flags we pass or write in:
ignoreCase: Boolean (i). As the name says, the case of the letters is ignored when matching, so ‘a’ also matches ‘A’.
global: Boolean (g). Without this, only one result will be returned from a string: the first one, over and over again. The g flag means search (with your pattern) throughout the entire string, so we can get all the results.
multiline: Boolean (m). This treats lines as separate strings: ^ (beginning of a string) and $ (end of a string) will hook onto the start and end of each line. Without the m flag, they only match the start and end of the entire string.
sticky: Boolean (y). Will only look for a match starting exactly at lastIndex, and nowhere else. This started out as a Firefox-only feature, but it is part of the ES6 spec and current engines support it.
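The m and y flags are the least obvious of the bunch, so here is a minimal sketch of what they change. The sample text is made up for this illustration, and the y part assumes an engine that supports the sticky flag.

```javascript
var text = 'cat\ncar';   // two lines: 'cat' at index 0, 'car' at index 4

// Without m, ^ only matches at the very start of the string
var withoutM = text.match(/^ca\w/g);
console.log(withoutM);   // ["cat"]

// With m, ^ also matches at the start of each line
var withM = text.match(/^ca\w/gm);
console.log(withM);      // ["cat", "car"]

// With y, a match must begin exactly at lastIndex
var sticky = /ca\w/y;
sticky.lastIndex = 1;
var atOne = sticky.test(text);
console.log(atOne);      // false: nothing matches starting at index 1

sticky.lastIndex = 4;
var atFour = sticky.test(text);
console.log(atFour);           // true: 'car' starts at index 4
console.log(sticky.lastIndex); // 7
```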
Let’s get into a simple example. This pattern just looks for any exact matches of “Geeks Trick” and has the three cross-browser compatible flags set: i, g, and m. The order of the flags doesn’t matter.
var myRegExp = /Geeks Trick/igm;
alert("Source: " + myRegExp.source + "\n Ignore Case : " + myRegExp.ignoreCase + "\n Global : " + myRegExp.global + "\n Multiline : " + myRegExp.multiline + "\n Last Index: " + myRegExp.lastIndex);
/Geeks Trick/igm Look for every match g of exactly ‘Geeks Trick’, but ignore the case i and treat new lines as string boundaries m
The functions below are the guys that take a RegExp object and use it to work out some kind of result from a string. They search, they check, they make new strings, they build new arrays, they do all we could possibly (I think) want them to do! For each example in this section, we will use a fairly simple pattern, /a\w+/, which will match any word beginning with ‘a’. We will also use this on the string ‘aa ab ac’, so that we can see what’s going on with each function quite clearly. You can copy and paste these examples into the console to follow along; just make sure you paste these two values in first:
var stringToSearch = 'aa ab ac';
var myRegExp = /a\w+/g;
/a\w+/g Look for any match g, where ‘a’ is followed by one or more + word characters \w
In cases where a search may match multiple substrings, exec() lets us iterate through them. It is best to use it within a loop, but I will be explicit in the example for clarity. Also, this will only work with the global g flag set on our RegExp; otherwise the lastIndex property will not increment and we will be stuck on the first result. So set it! Then loop. Eventually we will reach the final result. After that, exec() will return null once, then start from the beginning again.
console.log(myRegExp.lastIndex); // 0
console.log(myRegExp.exec(stringToSearch)); // ["aa", index: 0, input: "aa ab ac"]
console.log(myRegExp.lastIndex); // 2
console.log(myRegExp.exec(stringToSearch)); // ["ab", index: 3, input: "aa ab ac"]
console.log(myRegExp.lastIndex); // 5
console.log(myRegExp.exec(stringToSearch)); // ["ac", index: 6, input: "aa ab ac"]
console.log(myRegExp.lastIndex); // 8
console.log(myRegExp.exec(stringToSearch)); // null
console.log(myRegExp.lastIndex); // 0
console.log(myRegExp.exec(stringToSearch)); // ["aa", index: 0, input: "aa ab ac"]
console.log(myRegExp.lastIndex); // 2
// and so on
Besides the matched text, each result array carries two extra properties:
- index : The index of the result’s first character in the original string (so the first result gets 0 as ‘aa’ is at the beginning of the original string, and the second gets 3 as the first letter of ‘ab’ is the fourth character of the original string).
- input : The original string.
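For completeness, here is a sketch of the loop form mentioned earlier, using the same string and pattern. The found array and the "text@index" format are just choices made for this illustration.

```javascript
var stringToSearch = 'aa ab ac';
var myRegExp = /a\w+/g;

var found = [];
var result;
// exec() returns null after the last match, which ends the loop
while ((result = myRegExp.exec(stringToSearch)) !== null) {
  found.push(result[0] + '@' + result.index);
}

console.log(found);              // ["aa@0", "ab@3", "ac@6"]
console.log(myRegExp.lastIndex); // 0, reset by the null result
```

Without the g flag this loop would never end, because exec() would keep returning the first match.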
Say you’d like to check if your pattern is actually going to match anything in a string. test() simply returns true for yes and false for no. Super simple! Note that with the global flag set on our RegExp, this will also iterate through the results, updating lastIndex in exactly the same way as exec(). So if you’ve done some work with a global pattern and are about to test, it’s probably a good idea to set myRegExp.lastIndex = 0; first so you don’t get a false negative by accidentally testing from the end of the original string.
For this example we can again paste into the console (if you’ve already put in the two variables there’s no need to put them in again). Run it a few times to see what happens:
console.log(myRegExp.test(stringToSearch)); // true (false once the matches have run out)
console.log(myRegExp.lastIndex); // n: moves forward on each run, then drops back to 0
You could also set a non-global pattern, var myRegExp = /a\w+/;, and try the above examples again just to see how they react.
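Here is a small sketch of that false negative and the fix. Setting lastIndex to 8 by hand stands in for the state that earlier exec() calls on a global pattern would leave behind.

```javascript
var stringToSearch = 'aa ab ac';
var myRegExp = /a\w+/g;

myRegExp.lastIndex = 8;   // as if all three matches had already been consumed
var withoutReset = myRegExp.test(stringToSearch);
console.log(withoutReset); // false, even though the string clearly matches

myRegExp.lastIndex = 8;   // same starting state again
myRegExp.lastIndex = 0;   // ...but this time reset before testing
var withReset = myRegExp.test(stringToSearch);
console.log(withReset);    // true
```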
match() returns an array of the results, handy if you’re looking to count how many results we get! Watch out for that global flag though: without it we’ll only get the first result. You might also note that the above two functions (exec and test) are methods of the RegExp object (as in, we call myRegExp.method('and pass in the string here');). The remaining 4 functions (match, search, replace, and split) are methods of the String object, so the format flips: we call them on the string and pass in the RegExp.
stringToSearch.match(myRegExp); // ["aa","ab","ac"]
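A short sketch of how the result changes with and without the global flag, and what comes back when nothing matches. The /zz/ pattern is just a made-up example of something that isn’t in the string.

```javascript
var stringToSearch = 'aa ab ac';

var all = stringToSearch.match(/a\w+/g);
console.log(all.length);   // 3

// Without g we get one result, shaped like the array exec() returns
var first = stringToSearch.match(/a\w+/);
console.log(first[0]);     // "aa"
console.log(first.index);  // 0
console.log(first.length); // 1

var none = stringToSearch.match(/zz/g);
console.log(none);         // null, not an empty array
```

Because of that null, it’s worth checking the result before reading its length when counting matches.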
search() is kind of similar to test() in that it allows us to check if we have anything in the string that our pattern matches. But it returns the index of the first match, even with the global flag set and lastIndex set beyond the first match. Also, given that it returns a number, -1 is used to indicate there are no matches.
myRegExp.lastIndex = 5; // just to prove search ignores this
stringToSearch.search(myRegExp); // 0
stringToSearch.search(/ad/); // -1, as this finds nothing
replace() is find and replace! Our RegExp is used to match parts of a string, which are then replaced by another string that we define. The replace function takes two arguments: our RegExp goes in as the first and the replacement string goes in as the second.
var newString = stringToSearch.replace(myRegExp, 'replacement!');
console.log(stringToSearch); // "aa ab ac"
console.log(newString); // "replacement! replacement! replacement!"
NOTE : if we don’t pass in a replacement string, ‘undefined’ is used instead. Am I the only one who finds this weirdly amusing? If you’re looking to use replace() to delete substrings, pass in '' (an empty string) as the second argument.
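A quick sketch of those edge cases, using the same string as before. The 'X' replacement is just a placeholder chosen for this illustration.

```javascript
var stringToSearch = 'aa ab ac';

var deleted = stringToSearch.replace(/a\w+/g, '');
console.log(deleted);      // "  " (only the two spaces are left)

var firstOnly = stringToSearch.replace(/a\w+/, 'X');
console.log(firstOnly);    // "X ab ac" (no g flag, so only the first match is replaced)

var noReplacement = stringToSearch.replace(/a\w+/g);
console.log(noReplacement); // "undefined undefined undefined"
```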
split() is for those occasions when we have a string and wish to split it up into an array of substrings, with the separators defined by something we can match, like commas, or something far more complex or variable if the string is a mess.
myRegExp = / /; // just a space
var spaceSeparatedArray = stringToSearch.split(myRegExp);
console.log(spaceSeparatedArray); // ["aa", "ab", "ac"]
Note, again, the pattern isn’t global but split() doesn’t care.
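For the messier case, here is a minimal sketch. The input string and the choice of separators (commas or semicolons with stray spaces around them) are assumptions made up for this illustration.

```javascript
var messy = 'aa ,ab,  ac ; ad';

// A separator is a comma or semicolon, plus any whitespace on either side
var parts = messy.split(/\s*[,;]\s*/);
console.log(parts); // ["aa", "ab", "ac", "ad"]
```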