Reader Level:
ARTICLE

Data Compression in SQL Server 2008

Posted by Dhananjay Kumar Articles | SQL Server January 17, 2011
SQL Server 2008 automatically compresses data stored in database. SQL server does Lossless data compression.
  • 0
  • 0
  • 6900

SQL Server 2008 automatically compresses data stored in database. SQL server does Lossless data compression. 

1.gif
 
SQL server uses Dictionary based compression algorithm. 

Row Level Data Compression 

For Row level Data Compression SQL Server does not use explicitly any standard compression algorithm. It works on very simple algorithm. Say, 
  1. You created a column of CHAR(50)
  2. Normally SQL server requires 50 bytes regardless of the actual byte needed by your Data. 
  3. If you are storing "DEBUG MODE" in that column then you really need 10 bytes of storage. 
So in Row Level Data Compression, rather than fixed format data storage, SQL Server stores data in a variable format. 

2.gif
 
We need to say at time of table creation that DATA_COMPRESSION = ROW

Estimate Row Level compression saving 

3.gif
 
In above query dbo is name of the schema and TempTable is name of the table and ROW parameter says to estimate Row level estimation.  

Page Level Data Compression 

In SQL server 2005 page level compression was done by minimizing data redundancy but in SQL Server 2008 it is performed by 
  1. Reducing Data Redundancy 
  2. Lossless Data Compression algorithm 
  3. Column Prefix compression 
4.gif

With column prefix compression first SQL Server identifies repeated byte sequence in beginning of column in all rows on the page. If the same column has the same byte pattern in more than one row then SQL server stores the byte pattern once and replaces the other byte patterns with the pointer. 

SQL Server creates Dictionary per page and stores repeating vales of the page in the dictionary and performs the compression on Dictionary. 

Compression Saving is directly proportional to repeated byte patterns 

5.gif
 
Estimate Page Level compression saving

6.gif
 
In the above query dbo is the name of the schema and TempTable1 is the name of the table and PAGE parameter says to estimate Row level estimation.

COMMENT USING

Trending up