Learn About Regular Expressions In C#

A regular expression is a pattern-matching technique that lets us check whether a given input conforms to a specified format. .NET provides the Regex class for this purpose; to use it, import the System.Text.RegularExpressions namespace. The Regex class has an IsMatch() method, which determines whether a given input matches the specified pattern.
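As a minimal sketch of the API described above, IsMatch can be called either on a Regex instance or through the static Regex.IsMatch overload. The class name IsMatchSketch and the pattern "[a-z]" are assumptions chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class IsMatchSketch
{
    // Constructed once and reused for every call.
    private static readonly Regex LowerCaseLetter = new Regex("[a-z]");

    public static bool ViaInstance(string input) => LowerCaseLetter.IsMatch(input);

    // The static overload takes the input first, then the pattern.
    public static bool ViaStatic(string input) => Regex.IsMatch(input, "[a-z]");

    static void Main()
    {
        Console.WriteLine(ViaInstance("test123")); // True
        Console.WriteLine(ViaStatic("TEST123"));   // False: no lower-case letter
    }
}
```

Keeping a Regex instance in a field is usually the more convenient choice when the same pattern is applied to many inputs.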
 

Why do we need regular expressions?

 
Suppose we have to save an email ID in our application. Before saving any input value, we must validate its format. Validating it with a regular expression is easier than writing our own custom logic. Similarly, any other kind of input format validation (password validation, input restriction, etc.) can be done with a regular expression.

To define our own patterns, we must know what the predefined characters and operators of regular expressions mean. Let's begin with a simple example.
    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            string input = "test123";
            string regexPattern = "[a-z]";

            // IsMatch returns true if the pattern matches anywhere in the input.
            if (new Regex(regexPattern).IsMatch(input)) {
                Console.WriteLine("Pattern matched");
            }
            else {
                Console.WriteLine("Pattern did not match");
            }

            Console.ReadKey();
        }
    }
Here the variable "input" holds the user input and "regexPattern" holds the pattern we wish to enforce.

new Regex(regexPattern) creates a Regex object from our desired pattern.

new Regex(regexPattern).IsMatch(input) checks whether the given input contains a match for that pattern.

Output

    Pattern matched

Let's understand the meaning of the "[a-z]" expression. [] is a character class that matches any single character listed inside it (here, any character from a to z), and it is case-sensitive. A few more commonly used character classes are described below.
 
Character class               Description
[<specific_character(s)>]     Matches any single character listed in <specific_character(s)>
[^<specific_character(s)>]    Matches any single character that is not listed in <specific_character(s)>
.                             Matches any character except \n (new line)
\w                            Matches any word character, that is, any alphanumeric character or underscore
\W                            Matches any non-word character
\s                            Matches any white-space character, such as a tab or space
\S                            Matches any non-white-space character
\d                            Matches any digit character
\D                            Matches any non-digit character
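
The classes in the table can be tried out one by one. The following is a small sketch with sample strings picked for illustration (the class name CharacterClassSketch and the helper Contains are hypothetical). It also shows that \d in .NET accepts Unicode digits, not only 0 to 9.

```csharp
using System;
using System.Text.RegularExpressions;

static class CharacterClassSketch
{
    // True when the pattern matches somewhere in the input.
    public static bool Contains(string input, string pattern) => Regex.IsMatch(input, pattern);

    static void Main()
    {
        Console.WriteLine(Contains("a_1", @"\w"));    // True
        Console.WriteLine(Contains("@#$", @"\w"));    // False
        Console.WriteLine(Contains("@#$", @"\W"));    // True
        Console.WriteLine(Contains("a b", @"\s"));    // True
        Console.WriteLine(Contains("abc", @"\d"));    // False
        Console.WriteLine(Contains("abc", "[^abc]")); // False: every character is excluded
        Console.WriteLine(Contains("\n", "."));       // False: "." does not match a new line

        // \d is Unicode-aware by default; [0-9] is limited to ASCII digits.
        Console.WriteLine(Contains("\u0663", @"\d"));   // True (Arabic-Indic digit three)
        Console.WriteLine(Contains("\u0663", "[0-9]")); // False
    }
}
```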
 
Let's understand this in detail.
 

Any Alphabet Match 

 
Suppose we want to find out whether the given input contains any lower-case letter (that is, any character from a to z). In that case we use the [] character class, and our expression is [a-z]. We have already tried this in the example above. Now suppose we wish to see whether the input contains an "a", a "b" or a "c".

Then our pattern should look like [abc]. Let's verify this.
    static void Main()
    {
        Program p = new Program();
        p.AnyCharacterMatch();
        Console.ReadKey();
    }

    // any input that contains "a", "b" or "c"
    void AnyCharacterMatch() {
        List<string> inputs = new List<string>(){
            "123",
            "@#$",
            "test123",
            "123a*@",
            "cab123",
            "*#c12"
        };
        string regexPattern = "[abc]";

        foreach (string input in inputs) {
            if (new Regex(regexPattern).IsMatch(input)) {
                Console.WriteLine("Pattern matched");
            }
            else {
                Console.WriteLine("Pattern did not match");
            }
        }
    }

Output

    Pattern did not match
    Pattern did not match
    Pattern did not match
    Pattern matched
    Pattern matched
    Pattern matched
 
 
Only the strings that contain "a", "b" or "c" match our pattern.
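
IsMatch only reports whether a match exists. To see which character matched and where, Regex.Match can be used instead. The sketch below is illustrative only; the class name FirstMatchSketch and the helper Describe are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class FirstMatchSketch
{
    // Describes the first match of the pattern in the input, if there is one.
    public static string Describe(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern);
        return match.Success
            ? $"'{match.Value}' at index {match.Index}"
            : "no match";
    }

    static void Main()
    {
        Console.WriteLine(Describe("cab123", "[abc]"));  // 'c' at index 0
        Console.WriteLine(Describe("123a*@", "[abc]"));  // 'a' at index 3
        Console.WriteLine(Describe("*#c12", "[abc]"));   // 'c' at index 2
        Console.WriteLine(Describe("test123", "[abc]")); // no match
    }
}
```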
 

Any Number Match 

 
Likewise, if we wish to find whether a given string contains any digit, our pattern will be "[0-9]".
    static void Main()
    {
        Program p = new Program();
        p.AnyNumberMatch();
        Console.ReadKey();
    }

    // any input that contains a digit
    void AnyNumberMatch() {
        List<string> inputs = new List<string>(){
            "@#$",
            "test",
            "test123",
            "123a*@",
            "123"
        };
        string regexPattern = "[0-9]";

        foreach (string input in inputs) {
            if (new Regex(regexPattern).IsMatch(input)) {
                Console.WriteLine("Pattern matched for " + input);
            }
            else {
                Console.WriteLine("Pattern did not match for " + input);
            }
        }
    }

Output

    Pattern did not match for @#$
    Pattern did not match for test
    Pattern matched for test123
    Pattern matched for 123a*@
    Pattern matched for 123
 

 

AND operator Match

 
Similarly, suppose we wish to check whether the given input contains a lower-case letter immediately followed by a digit. Our regular expression should look like "[a-z][0-9]", meaning one character from a to z followed by one character from 0 to 9. Now suppose we wish to check for an upper-case letter, a lower-case letter and a digit, in that order. Our pattern should look like "[A-Z][a-z][0-9]".
 
Note
The order of character classes ("[]") is important here.
 
Example:

    static void Main()
    {
        Program p = new Program();
        p.MandatorySequenceMatch();

        Console.ReadKey();
    }

    // one upper-case letter, one lower-case letter and a digit in sequence
    void MandatorySequenceMatch() {
        List<string> inputs = new List<string>(){
            "123",
            "@#$",
            "test123",
            "test123ABC",
            "TestA123",
            "XYZabc0987",
            "Ab1",
            "test123Ab1"
        };
        string regexPattern = "[A-Z][a-z][0-9]";

        foreach (string input in inputs) {
            if (new Regex(regexPattern).IsMatch(input)) {
                Console.WriteLine("Pattern matched for " + input);
            }
            else {
                Console.WriteLine("Pattern did not match for " + input);
            }
        }
    }

Output

    Pattern did not match for 123
    Pattern did not match for @#$
    Pattern did not match for test123
    Pattern did not match for test123ABC
    Pattern did not match for TestA123
    Pattern did not match for XYZabc0987
    Pattern matched for Ab1
    Pattern matched for test123Ab1
 
 
Explanation
 
Pattern "[A-Z][a-z][0-9]" matches any string that contains one upper-case letter, one lower-case letter and one digit, in that order and directly next to each other. For example, it matches "Ab1" in "Ab1", "Cd2" in "abCd2" and "Xy7" in "abcXy7345tesT".

The input strings "123", "@#$" and "test123" lack a digit, an upper-case letter or a lower-case letter. Hence the pattern does not match.

In "test123ABC" and "TestA123", the upper-case letter, lower-case letter and digit do not appear in the required order. Hence the pattern does not match.

In "XYZabc0987", there are several upper-case letters, lower-case letters and digits, but no substring consists of exactly one upper-case letter, one lower-case letter and one digit in a row. Hence the pattern does not match.

"Ab1" is exactly the pattern we are expecting as per the expression.

In "test123Ab1", the substring "Ab1" matches the pattern. Hence the given input matches the pattern too.

Note
A sequence of [] character classes behaves like an AND operator and must be matched in strict order.
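
When the requirement is only that all three kinds of character appear somewhere, in any order, a sequence of character classes is too strict. One possible approach, sketched below, uses positive lookaheads (described in the Password Match section). The class name AnyOrderSketch and its member names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnyOrderSketch
{
    // Strict sequence: upper-case letter, lower-case letter, digit, side by side.
    const string InSequence = "[A-Z][a-z][0-9]";

    // Each lookahead scans from the start of the input, so order does not matter.
    const string AnyOrder = "^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])";

    public static bool MatchesInSequence(string input) => Regex.IsMatch(input, InSequence);
    public static bool ContainsAllThree(string input) => Regex.IsMatch(input, AnyOrder);

    static void Main()
    {
        foreach (string input in new[] { "test123ABC", "XYZabc0987", "Ab1", "test123" })
        {
            Console.WriteLine(input + ": sequence=" + MatchesInSequence(input)
                + ", any order=" + ContainsAllThree(input));
        }
        // test123ABC: sequence=False, any order=True
        // XYZabc0987: sequence=False, any order=True
        // Ab1: sequence=True, any order=True
        // test123: sequence=False, any order=False
    }
}
```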
 

Start character Match

 
Now suppose we wish to check that the input starts with a digit. For such a condition we begin the pattern with "^". The "^" anchor requires the character class that follows it (0 to 9 in the case of "^[0-9]") to match at index 0 of the given input string. Without it, Regex tries to match the pattern at any position. Let's look at the code below.
 
Example:

    static void Main()
    {
        Program p = new Program();
        p.MustStartWithNumber();

        Console.ReadKey();
    }

    // must start with 0 to 9
    void MustStartWithNumber() {
        List<string> inputs = new List<string>(){
            "test",
            "test123",
            "Test123ABC",
            "*1abc",
            "123",
            "123Test",
            "7TEST"
        };
        string regexPattern = "^[0-9]";

        foreach (string input in inputs) {
            if (new Regex(regexPattern).IsMatch(input)) {
                Console.WriteLine("Pattern matched for " + input);
            }
            else {
                Console.WriteLine("Pattern did not match for " + input);
            }
        }
    }

Output

    Pattern did not match for test
    Pattern did not match for test123
    Pattern did not match for Test123ABC
    Pattern did not match for *1abc
    Pattern matched for 123
    Pattern matched for 123Test
    Pattern matched for 7TEST
 
 
Explanation
 
The inputs "test", "test123", "Test123ABC" and "*1abc" do not start with a digit. Hence the pattern does not match them.

The inputs "123", "123Test" and "7TEST" have a digit as their first character. Hence the pattern matches.
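
The "^" character has two different meanings: at the start of a pattern it anchors the match to the beginning of the input, while as the first character inside [] it negates the character class. A short sketch of the difference follows; the class name CaretSketch and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaretSketch
{
    // "^" outside brackets: anchor to the start of the input.
    public static bool StartsWithDigit(string s) => Regex.IsMatch(s, "^[0-9]");

    // "^" inside brackets: any character that is NOT a digit.
    public static bool ContainsNonDigit(string s) => Regex.IsMatch(s, "[^0-9]");

    // Both meanings combined: the first character is not a digit.
    public static bool StartsWithNonDigit(string s) => Regex.IsMatch(s, "^[^0-9]");

    static void Main()
    {
        Console.WriteLine(StartsWithDigit("7TEST"));    // True
        Console.WriteLine(ContainsNonDigit("7TEST"));   // True
        Console.WriteLine(StartsWithNonDigit("7TEST")); // False
        Console.WriteLine(ContainsNonDigit("123"));     // False
    }
}
```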
 

End Character Match

 
Similarly, if we wish to check that the input ends with a digit, we use "$". It checks whether the last character of the given input string matches the character class that precedes it. Let's look at the code below.
 
Example:

    static void Main()
    {
        Program p = new Program();
        p.MustEndtWithNumber();

        Console.ReadKey();
    }

    // must end with 0 to 9
    void MustEndtWithNumber() {
        List<string> inputs = new List<string>(){
            "test",
            "Test123A*",
            "123Test",
            "test123",
            "123",
            "*@Ab1"
        };
        string regexPattern = "[0-9]$";

        foreach (string input in inputs) {
            if (new Regex(regexPattern).IsMatch(input)) {
                Console.WriteLine("Pattern matched for " + input);
            }
            else {
                Console.WriteLine("Pattern did not match for " + input);
            }
        }
    }

Output

    Pattern did not match for test
    Pattern did not match for Test123A*
    Pattern did not match for 123Test
    Pattern matched for test123
    Pattern matched for 123
    Pattern matched for *@Ab1
 

 
Explanation
 
In "test", "Test123A*" and "123Test", the last character is not a digit. Hence the pattern does not match.

In "test123", "123" and "*@Ab1", the last character is a digit. Hence the pattern matches.
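
One detail worth knowing: in .NET, "$" also matches just before a new line that ends the input, so an input with a trailing "\n" still passes. Where that matters, the "\z" anchor matches only at the very end. The sketch below illustrates the difference; the class name EndAnchorSketch and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class EndAnchorSketch
{
    // "$" accepts the end of the input or the position before a final "\n".
    public static bool EndsWithDigitDollar(string s) => Regex.IsMatch(s, "[0-9]$");

    // "\z" accepts only the very end of the input.
    public static bool EndsWithDigitStrict(string s) => Regex.IsMatch(s, @"[0-9]\z");

    static void Main()
    {
        Console.WriteLine(EndsWithDigitDollar("test123"));   // True
        Console.WriteLine(EndsWithDigitStrict("test123"));   // True
        Console.WriteLine(EndsWithDigitDollar("test123\n")); // True
        Console.WriteLine(EndsWithDigitStrict("test123\n")); // False
    }
}
```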
 

OR operator Match

 
Now suppose we wish to match either one group of characters or another. For example, a website address must end with ".com" or ".in". For such conditions, we use the OR operator ("|"). Let's see it in the example below.
    static void Main()
    {
        Program p = new Program();
        p.OrPatternMatch();

        Console.ReadKey();
    }

    // must end with ".com" or ".in"
    void OrPatternMatch() {
        List<string> inputs = new List<string>(){
            "test",
            "Test1.abc",
            "123.comabc",
            "*123.com",
            "test123.in"
        };
        string regexPattern = @"(\.com|\.in)$";

        foreach (string input in inputs) {
            if (new Regex(regexPattern).IsMatch(input)) {
                Console.WriteLine("Pattern matched for " + input);
            }
            else {
                Console.WriteLine("Pattern did not match for " + input);
            }
        }
    }

Output

    Pattern did not match for test
    Pattern did not match for Test1.abc
    Pattern did not match for 123.comabc
    Pattern matched for *123.com
    Pattern matched for test123.in

 
Explanation
 
Here our regular expression is @"(\.com|\.in)$". The "@" prefix makes it a verbatim string literal, so the backslashes are not treated as C# escape characters. We have used "\." because "." on its own is a regular expression metacharacter that matches any character; writing "\." tells the engine that we mean a literal dot. So the expression (\.com|\.in) matches either ".com" or ".in". At the end we have used "$" to anchor the match to the end of the input.

In "test" and "Test1.abc", neither ".com" nor ".in" is present. Hence the pattern does not match.

"123.comabc" contains ".com", but not at the end. Hence the pattern does not match.

"*123.com" and "test123.in" end with ".com" and ".in" respectively. Hence the pattern matches.
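
Matching is case-sensitive by default, so an input such as "TEST123.COM" would be rejected by this pattern. A possible variation, sketched below, passes RegexOptions.IgnoreCase and uses a non-capturing group (?:...) since the matched suffix is not read back. The class name SuffixSketch and the sample inputs are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class SuffixSketch
{
    private static readonly Regex CaseSensitive = new Regex(@"\.(?:com|in)$");
    private static readonly Regex CaseInsensitive = new Regex(@"\.(?:com|in)$", RegexOptions.IgnoreCase);

    public static bool EndsWithSuffix(string s) => CaseSensitive.IsMatch(s);
    public static bool EndsWithSuffixAnyCase(string s) => CaseInsensitive.IsMatch(s);

    static void Main()
    {
        Console.WriteLine(EndsWithSuffix("TEST123.COM"));        // False
        Console.WriteLine(EndsWithSuffixAnyCase("TEST123.COM")); // True
        Console.WriteLine(EndsWithSuffixAnyCase("test123.in"));  // True
        Console.WriteLine(EndsWithSuffixAnyCase("123.comabc"));  // False
    }
}
```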
 

Minimum Character Match

 
Suppose we wish to check that the input contains at least one special character such as *, @, # or &. In this case we can use a minimum-count quantifier: [character_group]{n,}
Here "n" is the minimum number of consecutive characters expected from the given character group.
 
Example:

    static void Main()
    {
        Program p = new Program();
        p.SpecialCharacterMatch();

        Console.ReadKey();
    }

    // must contain at least one of *, @, # or &
    void SpecialCharacterMatch() {
        List<string> inputs = new List<string>(){
            "test",
            "abc%",
            "*123.com",
            "123@test#",
            "test123##**abc"
        };
        string regexPattern = @"[*@#&]{1,}";

        foreach (string input in inputs) {
            if (new Regex(regexPattern).IsMatch(input)) {
                Console.WriteLine("Pattern matched for " + input);
            }
            else {
                Console.WriteLine("Pattern did not match for " + input);
            }
        }
    }

Output

    Pattern did not match for test
    Pattern did not match for abc%
    Pattern matched for *123.com
    Pattern matched for 123@test#
    Pattern matched for test123##**abc
 
 
Explanation
 
"test" does not contain any special character. Hence the pattern does not match.

"abc%" contains the special character "%", but none of the special characters we defined. Hence the pattern does not match.

"*123.com", "123@test#" and "test123##**abc" each contain one or more of the defined characters. Hence the pattern matches.
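
The quantifier {1,} means the same as "+". It makes no difference to IsMatch here, but it changes what each match covers, which shows up when counting with Regex.Matches. The following sketch is illustrative only; the class name QuantifierSketch and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuantifierSketch
{
    // One match per special character.
    public static int CountCharacters(string s) => Regex.Matches(s, "[*@#&]").Count;

    // One match per run of consecutive special characters.
    public static int CountRuns(string s) => Regex.Matches(s, "[*@#&]{1,}").Count;

    // "+" is shorthand for {1,}, so this gives the same result as CountRuns.
    public static int CountRunsWithPlus(string s) => Regex.Matches(s, "[*@#&]+").Count;

    static void Main()
    {
        Console.WriteLine(CountCharacters("test123##**abc"));   // 4
        Console.WriteLine(CountRuns("test123##**abc"));         // 1
        Console.WriteLine(CountRunsWithPlus("test123##**abc")); // 1
        Console.WriteLine(CountRuns("123@test#"));              // 2
    }
}
```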
 

Maximum Character Match

 
Likewise, if we wish to restrict the maximum character count, we can define the pattern as "[character_group]{m,n}", where m is the minimum and n is the maximum count.

Example:

    regexPattern = "^[a-z]{1,5}$";

Here {1,5} means at least 1 and at most 5 characters.
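
The "^" and "$" anchors are what make the upper limit effective: without them, a longer input still contains a substring of 1 to 5 letters and therefore matches. A short sketch under that observation follows (the class name LengthSketch and its method names are hypothetical).

```csharp
using System;
using System.Text.RegularExpressions;

static class LengthSketch
{
    // Anchored: the whole input must be 1 to 5 lower-case letters.
    public static bool IsShortLowerCase(string s) => Regex.IsMatch(s, "^[a-z]{1,5}$");

    // Unanchored: any substring of 1 to 5 lower-case letters is enough.
    public static bool ContainsShortLowerCase(string s) => Regex.IsMatch(s, "[a-z]{1,5}");

    static void Main()
    {
        Console.WriteLine(IsShortLowerCase("abc"));          // True
        Console.WriteLine(IsShortLowerCase("abcdef"));       // False: six characters
        Console.WriteLine(IsShortLowerCase(""));             // False: fewer than one
        Console.WriteLine(ContainsShortLowerCase("abcdef")); // True
    }
}
```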
 

Fixed Characters Match

 
Now suppose we wish to validate the domain email address of an organization.

Example: all users in an organization have an email ID of the form [combination of letters, digits, "." and/or "_"]@domain.com.

Email ID rules:
  • Must start with a letter.
  • At least 3 characters before @domain.com.
  • Only the "." and "_" special characters are allowed.
  • Must contain @domain.com at the end.
Our pattern will look like the one below.
    static void Main()
    {
        Program p = new Program();
        p.MustContainFixedCharactersMatch();

        Console.ReadKey();
    }

    // letter first, at least 3 characters, then the fixed "@domain.com" suffix
    void MustContainFixedCharactersMatch() {
        List<string> inputs = new List<string>(){
            "test",
            "abc.09@abc",
            "abcdomaincom",
            "123abc@domain.com",
            "a@domain.com",
            "test@domain.com",
            "test_23@domain.com"
        };
        // "\." matches a literal dot; an unescaped "." would match any character
        string regexPattern = @"^[a-zA-Z][a-zA-Z0-9_.]{2,}\@domain\.com$";

        foreach (string input in inputs) {
            if (new Regex(regexPattern).IsMatch(input)) {
                Console.WriteLine("Pattern matched for " + input);
            }
            else {
                Console.WriteLine("Pattern did not match for " + input);
            }
        }
    }

Output

    Pattern did not match for test
    Pattern did not match for abc.09@abc
    Pattern did not match for abcdomaincom
    Pattern did not match for 123abc@domain.com
    Pattern did not match for a@domain.com
    Pattern matched for test@domain.com
    Pattern matched for test_23@domain.com

Explanation

Pattern: @"^[a-zA-Z][a-zA-Z0-9_.]{2,}\@domain\.com$"

^[a-zA-Z]: checks that the first character of the given input is a letter.

[a-zA-Z0-9_.]{2,}: We need at least 3 characters before the domain name. The previous expression has already consumed the first character, so the minimum count here is 2. This part of the email ID can contain any letter, digit, "_" or ".". Inside a character class, "." stands for a literal dot.

\@: simply represents the character "@". The backslash is optional, since "@" has no special meaning in a regular expression.

domain\.com$: the fixed domain identifier, which the email ID must end with. The dot is written as "\." so that it matches only a literal ".".
 
"test", "abc.09@abc" and "abcdomaincom" do not end with @domain.com. Hence the expression does not match.

In "123abc@domain.com", the first character is not a letter. Hence the expression does not match.

In "a@domain.com", there are fewer than 3 characters before the domain name. Hence the expression does not match.

"test@domain.com" and "test_23@domain.com" satisfy all the email ID rules. Hence the expression matches.
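
Two details of this kind of pattern can be checked in isolation. An unescaped "." matches any character, so "domain.com" written without a backslash also accepts "domainXcom". A named group can return the part before the "@". The sketch below is illustrative; the class name EmailSketch, the group name "local" and the sample inputs are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

static class EmailSketch
{
    // Escaped dot, plus a named group around the part before "@".
    private static readonly Regex Address =
        new Regex(@"^(?<local>[a-zA-Z][a-zA-Z0-9_.]{2,})@domain\.com$");

    // Same shape with an unescaped dot, kept only for comparison.
    private static readonly Regex UnescapedDot =
        new Regex(@"^[a-zA-Z][a-zA-Z0-9_.]{2,}@domain.com$");

    // Returns the part before "@", or an empty string when the input is rejected.
    public static string LocalPart(string email)
    {
        Match match = Address.Match(email);
        return match.Success ? match.Groups["local"].Value : "";
    }

    public static bool MatchesWithUnescapedDot(string email) => UnescapedDot.IsMatch(email);

    static void Main()
    {
        Console.WriteLine(LocalPart("test_23@domain.com"));            // test_23
        Console.WriteLine(LocalPart("test@domainXcom") == "");         // True: rejected
        Console.WriteLine(MatchesWithUnescapedDot("test@domainXcom")); // True: "." matched "X"
    }
}
```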
 

Password Match

 
Now suppose we wish to validate a password. The password criteria are as follows.
  • Minimum length of 8 characters.
  • Must start with a letter.
  • Must contain at least one upper-case letter, one lower-case letter and one digit.
  • Must contain a special character out of *, # and _.
Here is the code for it.
    static void Main()
    {
        Program p = new Program();
        p.PasswordMatch();

        Console.ReadKey();
    }

    // lookaheads check each required character type; the rest checks start and length
    void PasswordMatch() {
        List<string> inputs = new List<string>(){
            "test",
            "123",
            "test*123",
            "tesT*1",
            "tesT*14fgff",
            "tesT*6d6@14fgff"
        };
        string regexPattern = @"^(?=\S*[a-z])(?=\S*[A-Z])(?=\S*\d)(?=\S*[*#_])[a-zA-Z]\S{7,}$";

        foreach (string input in inputs) {
            if (new Regex(regexPattern).IsMatch(input)) {
                Console.WriteLine("Pattern matched for " + input);
            }
            else {
                Console.WriteLine("Pattern did not match for " + input);
            }
        }
    }

Output

    Pattern did not match for test
    Pattern did not match for 123
    Pattern did not match for test*123
    Pattern did not match for tesT*1
    Pattern matched for tesT*14fgff
    Pattern matched for tesT*6d6@14fgff

Explanation

Regular expression pattern: @"^(?=\S*[a-z])(?=\S*[A-Z])(?=\S*\d)(?=\S*[*#_])[a-zA-Z]\S{7,}$"

Here (?=...) is a positive lookahead. It asserts that the text ahead matches the expression written after "?=" without consuming any characters, so several lookaheads can test the same input one after another.

(?=\S*[a-z]): must contain a lower-case letter, with no white space before it

(?=\S*[A-Z]): must contain an upper-case letter, with no white space before it

(?=\S*\d): must contain a digit, with no white space before it

(?=\S*[*#_]): must contain one of the special characters *, # or _

[a-zA-Z]: the first character must be a letter

\S{7,}: at least 7 more non-white-space characters must follow, which gives a minimum length of 8
 
"test", "123" and "test*123" do not contain all the required character types, that is, an upper-case letter, a lower-case letter, a digit and a special character. Hence the expression does not match.

"tesT*1" is shorter than 8 characters. Hence the expression does not match.

"tesT*14fgff" and "tesT*6d6@14fgff" satisfy all the password criteria. Hence the expression matches.
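
A single pattern only answers yes or no. When the user should be told which rule failed, one possible alternative is to test each criterion with its own small pattern. This is a sketch under the criteria listed above, not a replacement for the pattern in the example; the class name PasswordRulesSketch, the method FailedRules and the message texts are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PasswordRulesSketch
{
    // Returns a description of every rule the password breaks (empty when valid).
    public static List<string> FailedRules(string password)
    {
        var failures = new List<string>();
        if (password.Length < 8) failures.Add("minimum length is 8");
        if (!Regex.IsMatch(password, "^[a-zA-Z]")) failures.Add("must start with a letter");
        if (!Regex.IsMatch(password, "[a-z]")) failures.Add("needs a lower-case letter");
        if (!Regex.IsMatch(password, "[A-Z]")) failures.Add("needs an upper-case letter");
        if (!Regex.IsMatch(password, "[0-9]")) failures.Add("needs a digit");
        if (!Regex.IsMatch(password, "[*#_]")) failures.Add("needs one of * # _");
        return failures;
    }

    static void Main()
    {
        foreach (string password in new[] { "test*123", "tesT*1", "tesT*14fgff" })
        {
            List<string> failures = FailedRules(password);
            Console.WriteLine(password + ": "
                + (failures.Count == 0 ? "ok" : string.Join("; ", failures)));
        }
        // test*123: needs an upper-case letter
        // tesT*1: minimum length is 8
        // tesT*14fgff: ok
    }
}
```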
 
Here is the complete code that we have discussed so far.
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            Program p = new Program();

            p.AnyCharacterMatch();
            p.AnyNumberMatch();
            p.MandatorySequenceMatch();
            p.MustStartWithNumber();
            p.MustEndtWithNumber();
            p.OrPatternMatch();
            p.SpecialCharacterMatch();
            p.MustContainFixedCharactersMatch();

            Console.ReadKey();
        }

        // letter first, at least 3 characters, then the fixed "@domain.com" suffix
        void MustContainFixedCharactersMatch() {
            List<string> inputs = new List<string>(){
                "test",
                "abc.09@abc",
                "abcdomaincom",
                "123abc@domain.com",
                "a@domain.com",
                "test@domain.com",
                "test_23@domain.com"
            };
            // "\." matches a literal dot; an unescaped "." would match any character
            string regexPattern = @"^[a-zA-Z][a-zA-Z0-9_.]{2,}\@domain\.com$";

            foreach (string input in inputs) {
                if (new Regex(regexPattern).IsMatch(input)) {
                    Console.WriteLine("Pattern matched for " + input);
                }
                else {
                    Console.WriteLine("Pattern did not match for " + input);
                }
            }
        }

        // must contain at least one of *, @, # or &
        void SpecialCharacterMatch() {
            List<string> inputs = new List<string>(){
                "test",
                "abc%",
                "*123.com",
                "123@test#",
                "test123##**abc"
            };
            string regexPattern = @"[*@#&]{1,}";

            foreach (string input in inputs) {
                if (new Regex(regexPattern).IsMatch(input)) {
                    Console.WriteLine("Pattern matched for " + input);
                }
                else {
                    Console.WriteLine("Pattern did not match for " + input);
                }
            }
        }

        // must end with ".com" or ".in"
        void OrPatternMatch() {
            List<string> inputs = new List<string>(){
                "test",
                "Test1.abc",
                "123.comabc",
                "*123.com",
                "test123.in"
            };
            string regexPattern = @"(\.com|\.in)$";

            foreach (string input in inputs) {
                if (new Regex(regexPattern).IsMatch(input)) {
                    Console.WriteLine("Pattern matched for " + input);
                }
                else {
                    Console.WriteLine("Pattern did not match for " + input);
                }
            }
        }

        // must end with 0 to 9
        void MustEndtWithNumber() {
            List<string> inputs = new List<string>(){
                "test",
                "Test123A*",
                "123Test",
                "test123",
                "123",
                "*@Ab1"
            };
            string regexPattern = "[0-9]$";

            foreach (string input in inputs) {
                if (new Regex(regexPattern).IsMatch(input)) {
                    Console.WriteLine("Pattern matched for " + input);
                }
                else {
                    Console.WriteLine("Pattern did not match for " + input);
                }
            }
        }

        // must start with 0 to 9
        void MustStartWithNumber() {
            List<string> inputs = new List<string>(){
                "test",
                "test123",
                "Test123ABC",
                "*1abc",
                "123",
                "123Test",
                "7TEST"
            };
            string regexPattern = "^[0-9]";

            foreach (string input in inputs) {
                if (new Regex(regexPattern).IsMatch(input)) {
                    Console.WriteLine("Pattern matched for " + input);
                }
                else {
                    Console.WriteLine("Pattern did not match for " + input);
                }
            }
        }

        // one upper-case letter, one lower-case letter and a digit in sequence
        void MandatorySequenceMatch() {
            List<string> inputs = new List<string>(){
                "123",
                "@#$",
                "test123",
                "test123ABC",
                "TestA123",
                "XYZabc0987",
                "Ab1",
                "test123Ab1"
            };
            string regexPattern = "[A-Z][a-z][0-9]";

            foreach (string input in inputs) {
                if (new Regex(regexPattern).IsMatch(input)) {
                    Console.WriteLine("Pattern matched for " + input);
                }
                else {
                    Console.WriteLine("Pattern did not match for " + input);
                }
            }
        }

        // any input that contains a digit
        void AnyNumberMatch() {
            List<string> inputs = new List<string>(){
                "@#$",
                "test",
                "test123",
                "123a*@",
                "123"
            };
            string regexPattern = "[0-9]";

            foreach (string input in inputs) {
                if (new Regex(regexPattern).IsMatch(input)) {
                    Console.WriteLine("Pattern matched for " + input);
                }
                else {
                    Console.WriteLine("Pattern did not match for " + input);
                }
            }
        }

        // any input that contains "a", "b" or "c"
        void AnyCharacterMatch() {
            List<string> inputs = new List<string>(){
                "123",
                "@#$",
                "test123",
                "123a*@",
                "cab123",
                "*#c12"
            };
            string regexPattern = "[abc]";

            foreach (string input in inputs) {
                if (new Regex(regexPattern).IsMatch(input)) {
                    Console.WriteLine("Pattern matched");
                }
                else {
                    Console.WriteLine("Pattern did not match");
                }
            }
        }
    }
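
Every method in the listing repeats the same loop. One way to shorten it, shown here only as a sketch, is a helper that takes the pattern and the inputs and builds the Regex once per pattern instead of once per input. The names ReportSketch, Describe and Report are hypothetical and not part of the listing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ReportSketch
{
    // Builds the same message the listing prints for one input.
    public static string Describe(Regex regex, string input) =>
        (regex.IsMatch(input) ? "Pattern matched for " : "Pattern did not match for ") + input;

    // Checks every input against one pattern and prints the result.
    public static void Report(string pattern, IEnumerable<string> inputs)
    {
        var regex = new Regex(pattern); // created once per pattern
        foreach (string input in inputs)
        {
            Console.WriteLine(Describe(regex, input));
        }
    }

    static void Main()
    {
        Report("^[0-9]", new[] { "test", "123Test" });
        Report("[0-9]$", new[] { "123Test", "test123" });
    }
}
```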
We can save time by using regular expressions for any kind of pattern matching, rather than writing language-specific logic by hand. They are useful in scenarios like the ones explained above. I hope this was useful for you. Thanks for taking your valuable time to read this.