SOUNDEX And DIFFERENCE In SQL Server

Sometimes we are required to compare strings based on how the string sounds when spoken. For such a task SQL Server provides SOUNDEX and DIFFERENCE function. In this article we will read about both functions.

SOUNDEX

SOUNDEX converts an alphanumeric string to a four-character code that is based on how the string sounds when spoken. The first character of the code is the first character of character_expression, converted to upper case. The second through fourth characters of the code are numbers that represent the letters in the expression. SOUNDEX ignores the vowels(‘A’,’E’,’I’,’O’,’U’) from alphanumeric strings until that are not the first character of the string. If alphanumeric string is small and necessary to generate a four-character code then extra zeroes are added at the end of the string.

Syntax


Example



In the above example we generate SOUNDX code for strings. Now a question arises --  how does SQL Server generate SOUNDEX code for given alpha numeric string? As I described, that first letter of SOUNDEX code is the first letter of the alpha numeric string and the last 3 digits are assigned according this table.

Number Represent the Letters
1 B,F,P,V
2 C, G, J, K, Q, S, X, Z
3 D, T
4 L
5 M,N
6 R

A, E, I, O, U, H, W, and Y letters are ignored from alpha numeric string.

You can find more information about “SOUNDEX system” here. Soundex

For “csharpcorner” we get “C612” code. In this code “C” is for first letter, 6 for “R”, 1 for “P” and 2 for “C”.

DIFFERENCE

The DIFFERENCE function compares two expressions and assigns a value between 0 and 4, with 0 being little to no similarity and 4 representing the same or very similar phrases. This value is derived from the number of characters in the SOUNDEX of each phrase that are the same.

Syntax



Example



Output



In the above example we get the value between 0 to 4 that represents the similarity betwen the SOUNDEX of two words.

Now we have knowledge of soundex but a question arises --  what is the use of soundex or where can we implement it in our project. Generally SOUNDEX is used in a search engine. SOUNDEX is used in FULL-Text search where we want to search similar words.

Let us create a table and insert some data into this.

  1. DECLARE @tbl_Soundex AS TABLE  
  2.     (  
  3.         Word varchar(max)  
  4.     );  
  5.   
  6. INSERT INTO @tbl_Soundex  
  7. SELECT 'Symth'  
  8. UNION ALL  
  9. SELECT 'Alwar'  
  10. UNION ALL  
  11. SELECT 'csharpcorner'  
  12. UNION ALL  
  13. SELECT 'csharpcor'  
  14. UNION ALL  
  15. SELECT 'seesharpcorner'  
  16. UNION ALL  
  17. SELECT 'Choudhary'  
  18.   
  19. SELECT * FROM @tbl_Soundex ts  
Table



Example 1

We have an entry for “Symth” in the table. We have a person who wants to search for “Symth” but he writes “Smith” instead of “Symth”. In this case we try to use the like operator then we don't get any result.



We can solve this problem by using the SOUNDEX operator because SOUNDS of both word (‘Smyth’,’Smith’) are the same.



After using the SOUNDEX function we find a match for ‘Smith’ into the table.

Example 2

My name is “Pankaj” and I used “Choudhary” as my surname. But many time I find that people use “Chaudhary” or “Chowdhry” instead of “Choudhary”. We have a table that contains the following data.



Now we want to search for all persons who use “Choudhary” as a surname. If we use the like operator then we only get a single record.



Above result is not the correct result because there is another person who uses different spellings for “Choudhary” but it sounds the same . In this we can use the SOUNDEX function to find out all similar results.



Conclusion

Use SOUNDEX and DIFFERENCE function when you want to perform searching on the SOUND or pronunciation of the words. Thanks for reading the article.


Similar Articles