FREE BOOK

Chapter 8: Data Storage Design and SQL Server

Posted by Apress Free Book | ADO.NET & Database December 16, 2008
This chapter will explore the most common storage technologies available to you. It will also present some benefits and concerns about different storage designs so that you can more easily choose what suits your solution. We will finish the chapter with a presentation of SQL Server-examining its architecture and how it fits into your storage policy.

Clustering

In previous chapters, you have seen the two ways you can cluster Windows servers: Network Load Balancing (NLB) and Microsoft Cluster Service (MSCS). You can only use MSCS with SQL Server, but as you will see, several servers can share the workload of a database server in a similar way as NLB. First, we will start with a look at using MSCS with SQL Server.

MSCS

SQL Server is built as a cluster-aware application and can therefore use MSCS to provide high availability. In Windows Server 2003, you can have up to eight nodes in a cluster, and you can also disperse the nodes across geographic locations. By using MSCS with SQL Server, you can build a back-end data store solution that provides your applications with a secure and highly available data feed. Figure 8-17 shows a typical four-node cluster exposed as a single virtual server to the clients.



Figure 8-17. A MSCS cluster with four nodes

The disks used to store the database and its transaction log(s) are shared between the nodes in the cluster.

NOTE To avoid a single point of failure, duplicate all fiber channels and other communications hardware your cluster uses.

Remember that only one instance of SQL Server can access the database at a time, and when a node fails, it takes a little time before the failover node is online and processing requests. If the failure depends on a corrupt database, obviously the failover node will not work either. Database corruption like the kind a user can make, called logical corruption, and the type that occurs within a database, called internal corruption, is not prevented by clustering. This means you must always provide a disaster recovery plan, and also make certain that it works when you need it.

This sounds so logical and easy, but this is not always the case. One time, a client of ours experienced an erroneous update in their sales batches during the night, which corrupted 25 customer databases. All of the customers had regular database backups scheduled, but only 15 of them were able to perform a restore operation. We ended up writing scripts and doing tricks so the databases were rolled back to an earlier date, and all batches were then inserted again, this time with correct data. That took us close to a week. Do not put yourself in this situation we can guarantee it is not pleasant.

Index Tuning

Indexes are one of a database designer's best friends. Efficient index design is a crucial part of achieving good performance. An index reduces the amount of I/O necessary to retrieve requested data by helping SQL Server to quickly determine where data is stored, hence making retrieval quicker.

If you do not have an index in your table, all data must be read and compared to what you are looking for. Compare this to how hard and slow it would be to find a certain topic in a book without a table of content or index, and you soon understand the importance of good indexing. By using indexes, you speed up the process of finding what you are looking for. There are two kinds of indexes that SQL Server uses:

  • Clustered indexes
     
  • Nonclustered indexes

A clustered index can be compared to a phone directory. All data is sorted alphabetically, in the phone directory by last name and first name, and in a database depending on what column(s) the index is created on. There can only exist one clustered index in a table, since the clustered index dictates the physical storage order of the data.

A nonclustered index can be compared to an index in a book. The data is stored in one place and the index, with pointers to the data, is stored in another. The index and the items it consists of are stored in the order of the index key values. The data in the table can be stored in a different order. You can, for instance, store it in the order of a clustered index, but if you do not have a clustered index in your table, it can be stored in any possible way. SQL Server searches the index to find the location of the data, and then retrieves it from that location. You can have multiple nonclustered indexes in a table, and use different ones depending on your queries.

When to Use Which Index

If you have a column in a table where data is unique for every row, you should consider having a clustered index on that column. An example of this is a column with Social Security numbers or employee numbers. Clustered indexes are very good at finding a specific row in this case. This can be effective for an OLTP application, where you look for a single row and need access to it quickly.

Another great opportunity for using clustered indexes is when you search a table for a range of values in a column(s). When SQL Server has found the first value in the range, all others are sure to follow after it. This improves performance of data retrieval.

If you have columns often used in queries that have a GROUP BY or ORDER BY clause, you can also consider using a clustered index. This eliminates the need for SQL Server to sort the data itself, since it is already sorted by the index.

TIP We have learned from experience that it is not a good idea to have a clustered index on a GUID (or any other random value). The clustered index works well for columns you know will be sorted a particular way (according to last name, for example), because you do not need to perform a sort after retrieval.You will rarely (if ever) sort or seek data according to their GUIDs (or random values for that matter) because that would only increase the overhead during and after data retrieval, since you then would have to resort the data to be able to use it properly.

Nonclustered indexes are used to great advantage if a column or columns contain a large number of distinct values. (Refer to a combination of last name and first name here to get the picture. There can be many John Smiths in a table, for example.)

You can also use the nonclustered indexes when queries do not return large data sets. The opposite is, of course, true for clustered indexes.

If you have queries that use search conditions (WHERE clauses) to return an exact match, you can also consider using nonclustered indexes on the columns in the WHERE clause.

NOTE SQL Server comes with a tool called the Index Tuning Wizard, which can be used to analyze your queries and suggest the indexes that you should create. Check it out on a few queries and tables, and do some testing.We will cover this feature more later in this chapter.

Total Pages : 12 89101112

comments