Chapter 8: Data Storage Design and SQL Server

Posted by Apress Free Book | ADO.NET & Database December 16, 2008
This chapter will explore the most common storage technologies available to you. It will also present some benefits and concerns of different storage designs so that you can more easily choose what suits your solution. We will finish the chapter with a presentation of SQL Server, examining its architecture and how it fits into your storage policy.

SQL Server Architecture

We have already mentioned that SQL Server is quite easy to use, but this can also be one of its drawbacks when it comes to people's perceptions of the product. As long as you understand what it does and can recognize when it misbehaves, we argue that there is no value in tuning a database server manually when the built-in logic can do it for you. You want your database administrators and IT staff spending time on other things, such as designing, planning, and deploying applications. Obviously such staff still need to maintain and monitor your existing SQL Server system; if it behaves strangely, they must have the knowledge to troubleshoot it. This should not be their main occupation, however. A database that requires IT staff to spend hours and hours just keeping it up and running takes too much time away from their other tasks.

To get a better understanding of SQL Server, we will outline its architecture in the following sections.

Memory Management and Memory Architecture

Memory management is one of the areas of SQL Server that requires little or no manual work, at least in most cases. By default, SQL Server dynamically allocates and deallocates memory as needed. This way SQL Server itself optimizes memory for best performance, based on the amount of physical memory it has to work with. SQL Server uses the virtual memory in Windows for two main components (see Figure 8-5):

  • The memory pool
     
  • Executable code



 Figure 8-5. SQL Server's two main components in the SQL Server address space

The memory pool is an area of memory from which several objects allocate their memory. These objects are as follows:

  • System-level data structures: These hold data that is global to the SQL Server instance, such as database descriptors and the lock table.
     
  • Buffer cache: This cache stores data and index pages read from a database. If these items need to be accessed again, they can be found in the cache, which speeds up processing.
     
  • Procedure cache: This holds the execution plans so that they can be reused for stored procedures and Transact-SQL statements.
     
  • Log cache: The log cache is used to hold pages that are read from and written to the log. Each database has one log cache.
     
  • Connection context: This constitutes a set of data structures that keeps a record of the current state of a connection.

The size of the memory pool varies with the number of user requests, simply because the objects within it vary in size. The various data structures are allocated when a new database is defined, and likewise when a table or view is referenced.

The buffer cache, log cache, and procedure cache are constantly managed and adjusted by SQL Server to optimize performance for the current workload. If the procedure cache needs more memory, SQL Server dynamically increases the size of the whole memory pool by allocating more physical memory. If no more physical memory is available, it instead tries to resize the other objects within the memory pool.

You can control the size of the memory pool by setting minimum and maximum values for it. You do not need to control the size of the objects within the memory pool, as SQL Server handles this best alone.

The executable code is more or less the SQL Server engine. It includes the code for the engine itself, as well as the DLLs and executables for Open Data Services and Net-Libraries.

Let us take a closer look at the buffer cache, since this is an important part of SQL Server. The buffer cache consists of a number of memory pages that are initially free when SQL Server starts up. When SQL Server reads a page from disk, that page is stored in the first page on the buffer cache's free list. When the page is subsequently read or modified, whether by the same process or another, it is read from the cache instead of from disk. This reduces the amount of physical I/O required for the operation, and performance increases.

A page that has been modified in the buffer cache but not yet written to disk is called a dirty page. Whether a page is dirty is recorded in each buffer page's header, which also holds a reference counter that is incremented and decremented with each reference to the page. The buffer cache is scanned periodically, and if a page has been referenced fewer than three times since the last scan and is not dirty, the page is considered to be free. It is then added to the free list, which we talked about in Chapter 4. If the page is dirty, the modifications are written to disk first. This work is done by the worker processes of SQL Server.

To prevent the free list from being too small, a worker process called the lazywriter periodically checks to see that the free list does not fall below a certain size. The size depends on the size of the buffer cache. If the free list is becoming too small, the lazywriter scans the cache and reclaims unused pages and pages that have a reference counter set to zero.

NOTE The lazywriter is mostly used in very I/O-intense systems, because the other threads can handle these tasks on less heavily used systems.

To allow the buffer cache to have as much memory as possible, you should see to it that your system has enough physical memory (RAM) for the task it performs. Memory is cheap these days, so this does not have to be too expensive even for small companies.

Memory Configuration in SQL Server

As we have mentioned, SQL Server dynamically adjusts the size of the memory pool. It not only allocates memory as its components need it, but also deallocates memory if other applications need it. It only releases memory when another process asks for it, however; otherwise it maintains the pool at its current size.

The virtual memory SQL Server uses is maintained at 5MB less than the available physical memory. This prevents excessive memory swapping to disk, while still giving SQL Server the largest memory pool possible. SQL Server always tries to keep 5MB of physical memory free on the system, deallocating memory from its pool if another process tries to use it.

To make sure SQL Server always has a certain minimum memory pool, you can set the min server memory option so that SQL Server does not release memory below this value. The min server memory option is always set in megabytes. You can also set the max server memory option so that SQL Server does not take too much memory from other applications running on the server. Use these options carefully so that no application (including SQL Server) starts any excessive swapping. You can keep an eye on this by using the Performance MMC (see Figure 8-6) and the Pages/sec counter of the Memory object. This counter shows the paging on the system. A value between 0 and 100 is considered normal.



 Figure 8-6. The Memory: Pages/sec counter in the Performance MMC

The best approach is to let SQL Server handle memory usage and not to set the manual options. This is especially true when only SQL Server runs on the machine. When you have other applications running, you might need to manually configure these settings, however. With all the consolidation going on these days, it is not unlikely that SQL Server has to share room with other applications, so it could be nice to know these options are available. The following code shows how to set these values from Transact-SQL:

sp_configure 'show advanced options', 1
go
RECONFIGURE
go
sp_configure 'min server memory', 256
go
RECONFIGURE WITHOUT OVERRIDE
go

You can also set minimum and maximum memory values from Enterprise Manager if you want to, as shown in Figure 8-7.



 Figure 8-7. Setting the memory pool size from SQL Server Enterprise Manager

Data Files and Log Files

No database manager would be of any value if you did not store your data somewhere. SQL Server stores its data in a set of operating system files. Each file is made up of smaller parts called pages. There are three kinds of files that SQL Server uses (see Figure 8-8):

  • Primary data files
     
  • Secondary data files
     
  • Log files



 Figure 8-8. The different files in SQL Server

Primary data files contain database information such as startup information for the database, pointers to the other files of the database, system tables, and objects. These files can also contain database data and objects. There is one primary data file for every database, and you can identify this file by its .mdf extension.

Secondary data files, which sport the extension .ndf, are an option for every database. You do not need to have one of these, but can instead use the primary data file if you want. Secondary data files hold data and objects that are not in the primary file. Here we find tables and indexes. The reason you might want to use secondary files is that you can spread the data on separate disks, thereby improving I/O performance.

Log files hold all the information about the transactions in the database, and they cannot be used to hold any other data. You can have one or more log files (like the ones shown in Figure 8-8 with an extension of .ldf), but our recommendation is to use one only. This is because the log file is written to sequentially, and hence does not benefit from being spread across several physical disks. To enhance performance, you should instead try to use RAID 1+0, and place the log file on such hardware.
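As an illustration, the following Transact-SQL creates a database with one primary data file, one secondary data file on a separate disk, and a single log file. The database name, logical file names, paths, and sizes here are our own examples, not from any particular system; adjust them to your environment:

CREATE DATABASE Sales
ON PRIMARY
( NAME = Sales_Primary,
  FILENAME = 'c:\data\Sales_Primary.mdf',
  SIZE = 50MB ),
( NAME = Sales_Secondary,
  FILENAME = 'd:\data\Sales_Secondary.ndf',
  SIZE = 100MB )
LOG ON
( NAME = Sales_Log,
  FILENAME = 'e:\log\Sales_Log.ldf',
  SIZE = 25MB )
go

Note how the secondary file is placed on a different physical disk than the primary file, which is what gives the I/O benefit described earlier.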

One important thing to understand about the log file is that it actually consists of several files, regardless of whether you use only one or not. These files are called virtual log files (VLFs) and are created by SQL Server (see Figure 8-9). Of these virtual log files, only one is active at a time. When a backup of the log is performed, all inactive VLFs are cleared. The active VLF is never cleared, so contrary to what you might have been told, a log backup does not clear the whole log. But do not worry, this is not a problem.



 Figure 8-9. The virtual log files in a log file

If you have too many VLFs, the overhead of maintaining them can degrade performance. To view how many VLFs exist in your databases, you could use an undocumented DBCC command, which returns as many rows as there are VLFs in the database. Execute DBCC LOGINFO and have a look at how your database appears. In Figure 8-10, we have run DBCC LOGINFO against the Northwind database on our system. As you can see, a total of four VLFs exist in our database.
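The command itself is as simple as this (here run against Northwind, as in our example):

USE Northwind
go
DBCC LOGINFO
go

Each row returned describes one VLF, so counting the rows tells you how many VLFs the log file contains.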



 Figure 8-10. Executing DBCC LOGINFO against the Northwind database

To reduce the number of VLFs, you should perform frequent log backups. Do not forget to also shrink the log file with the command DBCC SHRINKFILE (logfilename, TRUNCATEONLY); otherwise the number of VLFs will not be reduced.
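A sketch of such a maintenance sequence might look like the following; the backup path is made up for illustration, and the logical log file name (Northwind_log is the default for the Northwind database) may differ on your system:

BACKUP LOG Northwind TO DISK = 'e:\backup\Northwind_log.bak'
go
DBCC SHRINKFILE (Northwind_log, TRUNCATEONLY)
go

The log backup clears the inactive VLFs first, which is what allows the subsequent shrink to actually release space at the end of the file.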

When designing a log file, make a good estimate of the size you will need and create the file at that size. If auto-growth is enabled on a small log file, new VLFs are created every time the log grows, so it is important to make the initial size as close to what you will need as possible.
