Learn About SQL Server Architecture

This is my very first article on SQL Server Architecture. I tried to make this as simple as possible. I just want to share what I have learned. If there is any error or wrong information, please leave a comment; I will be obliged to rectify that and grow with that. Happy learning!

SQL Server Architecture consists of the following.

  1. Page Architecture
  2. Database Architecture
  3. Network Architecture
  4. Detailed SQL Architecture
Page Architecture

The data in the database is saved in the least unit, that is, the data page. The size of the data page is 8 KB. Data page consists of a Page Header, Data Row, and Row Offset.

Page Header

It is the header of the data page and is 96 bytes. It holds metadata information about the page, such as Page Type (Data Page, Index Page, Text Page, GAM, IAM, Page Free Space), Page Number, Pointer to next page and previous page, Amount of Free Space, Amount of Space Used, Allocation Id of the object that owns that object.

Data Rows

This stores the actual data which is 8060 Bytes. If data in any row exceeds the limit of 8060 Bytes then the rest of the data is stored in the new page or a series of new pages known as allocation unit and the pointer to that is stored in the header.

Row Offset

It saves the data row's pointer in the reverse order. The last row in the data page may be the first record in the row offset. It holds the information, that is how far is the specific row is from the header.

SQL Server Architecture

 

The data in the data file is stored as extents. An extent is a combination of 8 continuous pages of 8KB, that means the size of every extent is 64KB. Now, as discussed earlier, if the data in a row exceeds 8060Bytes in the data page, the rest of the data is stored in the next page and a pointer to that is stored in the header of the previous page. Next may be a page or a combination of pages known as allocation unit.

SQL Server Architecture

There are two types of extents.

  1. Uniform Extent - In which the data pages belong to one object.
  2. Mixed Extent - In which the pages are shared by different objects.
SQL Server Architecture
Database Architecture

SQL Server database is a logical collection of data. Data is stored in two sets of files - Data Files and Log Files. An Extent consists of 8 - 8KB continuous pages and in similar manner the data file consists of extents. Log files store all modifications that are made to the database such as DDL, DML operations.

Data Files

When we create a database there are two files created for every database  -- that is the Data File(.mdf) and Log File(.ldf) and a Primary filegroup (a container) is created in which .mdf (Primary Data File) resides. Log file cannot be contained in a Filegroup. It is always recommended to store the Log File in a different disk separate from data file.

SQL Server Architecture

We can create multiple data files, created with .ndf extension and they can be grouped in a secondary filegroup to be able to separate the production data and sort it out.

If we add two secondary data files(.ndf) in a separate filegroup, the SQL Server stores the data in data files (.ndf) in a round robin basis. That is, first extent is saved in the first file and the second in the second file and the third again in the first and so on.The primary data file  (.mdf) when created stores additional information as Database configuration, user info, Collation, Recovery Model, roles etc.

SQL Server Architecture

Log Files

When we create a database a default file is created to save all the modifications made on the objects in a Log File created with the extension of .ldf. It is always recommended to have a separate disk space for log files as they grow according to the number of transactions.

They should be regularly backed up in order to recover from any failure as they provide point-in-time recovery of the database, helping to restore the database in a consistent state before the failure, by rolling back the uncommitted transactions or rolling forward to the committed transactions.

Log files save all the modifications made by DML and DDL queries on the database objects. Log files initially save all the transactions in the log Files in  a log buffer by providing the Log Sequential Number for every Transaction. Before saving (or we should say committing) any data to log files, the log data is saved in the log buffer in the buffer cache of SQL Server. All the committed transaction data (Dirty Log) is saved to the transaction log before the committed data (Dirty Pages) is saved in the data files. This is known as Write Ahead Logging (WAL). We will discuss the entire Transaction Management in the later part as well. So as of now just keep in mind that the Log files save all the transactional logs.

SQL Server Architecture

Unlike Data files, log files save data in Virtual Log Files rather than pages. The above diagram clearly represents how the data is saved in the Virtual log File inside a Physical Log File. The transaction Manager assigns an LSN to every transaction to be able to isolate it from other transactions and save the log in the Log Cache. When checkpoint happens it first saves all the committed transaction logs into Log Files and then pushes all the dirty pages to the data files and marks them as Clean.

The process of saving transaction log data prior to pushing dirty pages is known as Write Ahead Logging (WAL). The checkpoint pushes all dirty pages (Pages which are modified in the cache but not yet saved to the disk) to the disk and marks them as clean but does not deallocate the pages from cache.

Lazywriter does the same thing but for a different purpose. It flushes the dirty pages to the disk and deallocates the clean pages in order to provide space in the memory. SQL Serve performs lazy writer when it comes under memory pressure.

Network Architecture

SQL Server uses different protocols for connecting to the SQL Engine and services.

  1. Shared Memory
    It is used to connect the stand alone application where the client is running on the same computer as the server.

  2. TCP/IP
    The most commonly used protocol in SQL Server to connect to the services. Default TCP/IP port is 1433. You can use any user defined port as well to connect with the SQL Server. It is recommended to change your default port after installing SQL Server.

  3. Named Pipes
    TCP/IP are comparable protocols. Named Pipes are developed for LAN, but it can be inefficient for WAN (Wide Area Network).
Detailed SQL Server Architecture

SQL Server Architecture

Following is the process which a SQL Query goes through.

  1. The SQL query is fired at the application end. From there it is converted into TDS (Tabular Data Stream) Packets using ODBC or OLEDB or SNAC (SQL Native Client) or SNI (SQL Server Network Interface). These are data access protocols which are used to access all types of data and grind them into TDS packets to be encapsulated within network packets to be travelled through network protocols from one endpoint to other.
  2. Once TDS packets reach the server endpoint, the SNI decapsulates those packets into SQL commands.
  3. The commands are passed through Query Parser or Command Parser and it checks the query for any Syntactical (Syntax) or Semantical (Logical) errors and if any error occurrs it returns the error to the N/W protocol layer.
  4. If it passes through the command parser, the next step is to generate a query plan. The query optimizer selects a cost effective plan provided to it using Query Tree (It uses certain algorithms to generate different query plans and presents it to the Query Optimizer.
  5. Query optimizer then selects a cost effective plan and presents the query to Query Executor. To execute a plan Query Executor needs data as well. So it passes the request to the Access Methods which is a collection of codes which provides an interface to retrieve data and present it to Query Executor after valid conversion using OLEDB. It itself does not do this work, rather it asks the buffer manager for the data.
  6. If the data is there in the Buffer, Access methods fetch those data pages and pass them to the query executor to execute the query. If the query plan is already in the Plan cache the executor uses that plan.
  7. The work of access method is to check if the query is select or non-select (DML). If the query is non-select the access methods contact Transaction Manager. Transaction Manager has two components,

    1. Log Manager
      Logs the events that will be modifying the data into Log Buffer in Buffer Pool.

    2. Lock Manager
      Assigns a lock on that transaction to provide data consistency and isolation.

  8. Transaction Manager generates a Log Sequential Number(LSN) for that transaction and records the events that will be modifying the data in Log Buffer and the Transaction will make changes in the buffer cache using only locking mechanism to be isolated from any other transaction who wants to modify the same data. Changes are not directly made to the data pages on the disk.

These modified pages reside in the buffer cache and are known as Dirty Pages, as they are not written to the disk as of yet. Now there is a checkpoint process that is an automatic recurring event in SQL server and runs in the background. When checkpoint happens, it flushes all the dirty pages (Modified Pages) to the disk and marks the pages as clean (pages not modified since last fetch) in the buffer cache, but does not deallocate those pages from cache. Before it does that the Log records are pushed into a Virtual Log File (data page of log file) with the LSN in Transaction Log from Log Buffer. This process of writing to the Log File before writing to the disk is known as Write Ahead Logging.

Lazy Writer, again a background process, also flushes the pages out of the buffer pool to the disk. When the SQL server comes under the memory pressure, lazy writer deallocates the pages which are residing there unused, also the clean pages from the memory,  and writes the dirty pages to the disk to be able to make some memory space for other operations.

Declaration

The images used in this article do not belong to me and have been used only for reference purposes to be able to help with better understanding of the subject. I will remove them in case of any conflict.