Regular Expressions Part 3: Negation and Assertions

Before reading this article, I highly recommend reading the previous parts of this series.

This article will focus on how to tell the regular expression engine to exclude certain characters, or classes of characters, from a match.

We will also explore how to make a match depend on the text that appears immediately to its right or to its left.

One form of negation is supported in character classes.

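The following is an example of a negated character class:

    var myExpression = new Regex(@"[^aeiou]");

The caret (^) at the start of the character class negates it, so this expression matches any single character that is not a lowercase vowel. That includes consonants, but also digits, punctuation, white space and uppercase vowels.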
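To make the behavior of the negated class concrete, here is a minimal sketch that collects every match in a sample string. The class name NegatedClassDemo, the helper NonVowels and the input "Audio 42" are made up for illustration; only the pattern comes from the example above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of any library.
public static class NegatedClassDemo
{
    // Returns every single-character match of [^aeiou] in the input, in order.
    public static string[] NonVowels(string input)
    {
        var myExpression = new Regex(@"[^aeiou]");
        return myExpression.Matches(input)
                           .Cast<Match>()
                           .Select(m => m.Value)
                           .ToArray();
    }

    public static void Main()
    {
        // The uppercase A, the space and the digits all match;
        // only the lowercase vowels u, i and o are skipped.
        Console.WriteLine(string.Join(",", NonVowels("Audio 42"))); // A,d, ,4,2
    }
}
```

Note that each match is exactly one character long, because the character class has no quantifier attached to it.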

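Another form of negation can be seen when using the special character classes. Each uppercase class is the negation of its lowercase counterpart (\s, \d and \w).

Here is a list of the negated classes that can be used in a Regex pattern:

    \S Matches any character that is not a white space character.
    \D Matches any character that is not a digit.
    \W Matches any character that is not a "word" character (in other words, anything other than a through z, A through Z, the _ character and digits; by default, .NET also treats letters and digits from other scripts as word characters).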
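The three classes above can be compared side by side on one sample string. The following is a small sketch, not a definitive implementation: the class NegatedShorthandDemo, the helper Bracketed and the input "Room 42_b!" are assumptions chosen for illustration, and the + quantifier is added only to group neighboring characters into runs.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of any library.
public static class NegatedShorthandDemo
{
    // Returns each match of the pattern wrapped in brackets, so that
    // white space inside a match stays visible when printed.
    public static string Bracketed(string pattern, string input)
    {
        var matches = Regex.Matches(input, pattern).Cast<Match>();
        return string.Concat(matches.Select(m => "[" + m.Value + "]"));
    }

    public static void Main()
    {
        const string input = "Room 42_b!";

        Console.WriteLine(Bracketed(@"\S+", input)); // [Room][42_b!]
        Console.WriteLine(Bracketed(@"\D+", input)); // [Room ][_b!]
        Console.WriteLine(Bracketed(@"\W+", input)); // [ ][!]
    }
}
```

The underscore counts as a word character, which is why \W+ skips it and only finds the space and the exclamation mark.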

Another concept that goes hand in hand with negation is called assertions. An assertion checks the text immediately to the right of the current position (a lookahead) or immediately to the left of it (a lookbehind), without making that text part of the match.

Here's an example:

    var myExpression = new Regex(@"(q(?!p))");

As one can see, the expression q(?!p) is enclosed in a group.

The expression inside the group means that the engine should match the letter q only if it is not immediately followed by the letter p. The p is only checked, never consumed, so it does not become part of the match.

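Every time we want to say, "x not followed by y," we write x(?!y): the syntax ?! followed by the expression that must not appear.

The ?! and that expression need to be enclosed in parentheses of their own. That is the reason for the inner set of parentheses in our example above. The outer set is an ordinary capturing group and is not required for the assertion to work.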
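As a rough illustration of the negative assertion, the sketch below reports where q(?!p) matches in a sample string. The class NegativeLookaheadDemo, the helper MatchPositions and the input "qp qa q" are hypothetical names and data introduced for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of any library.
public static class NegativeLookaheadDemo
{
    // Returns the zero-based index of every q that is not
    // immediately followed by p.
    public static int[] MatchPositions(string input)
    {
        var myExpression = new Regex(@"(q(?!p))");
        return myExpression.Matches(input)
                           .Cast<Match>()
                           .Select(m => m.Index)
                           .ToArray();
    }

    public static void Main()
    {
        // The q at index 0 is followed by p, so it is rejected.
        // The q at index 3 is followed by a, and the q at index 6
        // is followed by nothing at all, so both are matched.
        Console.WriteLine(string.Join(",", MatchPositions("qp qa q"))); // 3,6
    }
}
```

The last q shows that "not followed by p" is also satisfied at the end of the string, where there is no next character to check.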

How, then, should we tell the regular expression engine to search for the letter q, only if it is immediately followed by the letter p?

The only change that needs to be made is swapping the character ! for the character =.

Here's the modified example:

    var myExpression = new Regex(@"(q(?=p))");

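The positive assertion can be tried out the same way. This is a minimal sketch under the same assumptions as before: the class PositiveLookaheadDemo, the helper FirstMatch and the sample inputs are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of any library.
public static class PositiveLookaheadDemo
{
    // Returns the first q that is immediately followed by p, if any.
    public static Match FirstMatch(string input)
    {
        var myExpression = new Regex(@"(q(?=p))");
        return myExpression.Match(input);
    }

    public static void Main()
    {
        Match match = FirstMatch("qp qa q");

        Console.WriteLine(match.Success); // True
        Console.WriteLine(match.Index);   // 0
        Console.WriteLine(match.Value);   // q

        // Without a p directly after any q, nothing matches.
        Console.WriteLine(FirstMatch("qa q").Success); // False
    }
}
```

The matched value is q alone, not qp: the assertion requires the p to be there but leaves it out of the match.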
In future articles, we will explore some replacement patterns, as well as some guidelines on how to incorporate regular expressions in production code.