FREE BOOK

Chapter 8: Data Storage Design and SQL Server

Posted by Apress Free Book | ADO.NET & Database December 16, 2008
This chapter will explore the most common storage technologies available to you. It will also present some benefits and concerns about different storage designs so that you can more easily choose what suits your solution. We will finish the chapter with a presentation of SQL Server-examining its architecture and how it fits into your storage policy.

Query Optimizer

SQL Server 2000 comes with a built-in query optimizer that in most cases chooses the most effective index to use. When you design your indexes, you should provide the query optimizer with a carefully thought-out selection of indexes to choose from. It will in most cases select the best index from these.

The task of the query optimizer is to select an index when it attempts to optimize performance. It will not use an index that will degrade performance. The truth is that having an index does not necessarily mean you get the best performance, especially if an index is designed poorly. So you should be grateful that there is logic built into the query optimizer to help you out. The rule is to test your indexes as you test all other solutions to find the most effective ones.

Index Tuning Wizard

As you saw in the previous section, in SQL Server you do have available a tool that helps you analyze your queries and suggest indexes for your tables. If you were going to do this manually, you would have quite a task in front of you. First of all, you would need to have an expert understanding of the database. If you are the database designer, this would not be a problem, but that is not always the case. You would also need to have deep knowledge of how SQL Server works internally. If you are like most developers, you just do not want to dig that deep into the tools you use, and you should not need to either. That is why it is good to have tools that make your lives easier.

To use the Index Tuning Wizard, you need a sample of the normal workload your SQL Server has to deal with. You can use this as input on how you should design your indexes, so you know they will be effective. Only it will be the Index Tuning Wizard that makes the recommendations for you.

To create the sample, you use another tool that ships with SQL Server: the SQL Profiler. Once you are satisfied with the trace SQL Profiler creates, you can let the Index Tuning Wizard analyze the workload and a sample Transact-SQL statement. When done, the wizard will recommend an index configuration that will improve the performance of the database. At this moment, you can choose to let it implement the suggested changes immediately, schedule the implementation for later, or save it to a SQL script that you can run at any time.

NOTE Run the Index Tuning Wizard against a test SQL Server. It consumes quite a lot of CPU from the server, which could create problems in a production environment. Also, try to use a separate computer to run the wizard from, so it is not running on the same server where the SQL Server instance is installed.

Partitioned Views

You cannot use Network Load Balancing to distribute the workload for a SQL Server. Another method lets you accomplish the same thing, however, or at least something similar.

Take a look at an example. If you have a customer database that only stores information about your customers A–Z, you can partition the data. One partition could hold all customers in the range from A–E, another F–I, and so on. Once you have decided which way to partition the data, you must select one out of two following choices of storing it for best performance:

  • Local partitioned view
     
  • Distributed partitioned view

Local Partitioned Views

A local partitioned view stores all partitions in the same instance of your SQL Server. (See the section "Separating Read Data from Write Data" later in this chapter for a discussion on when this is a good solution.)

Distributed Partitioned Views

A distributed partitioned view uses a federation of servers to store data. Although this sounds like something out of Star Wars, it is really quite simple. A federation of servers consists of two or more servers, each holding one partition of the data (see Figure 8-18).


Figure 8-18. A federation of SQL Servers

Every server runs a separate instance of SQL Server. The servers are administered independently of each other, but they cooperate to share the processing load of the data. The horizontally partitioned table is exposed as a single table through the use of partitioned views.

On every server, the other SQL Servers are added as linked servers so you can execute distributed queries on them. You then create a distributed partitioned view on each of the servers as the code listing that follows shows, so that clients can access this view on any of the servers and still get the view of the full table. As you see, the view is a set of SQL statements whose individual result sets are combined into one, using the UNION ALL statement in Transact-SQL.

create view CustomerView as
select * from server01.MyDatabase.dbo.Customers_01
union all
select * from server02.MyDatabase.dbo.Customers_02
union all
select * from server03.MyDatabase.dbo.Customers_03
union all
select * from server04.MyDatabase.dbo.Customers_04

You can also make the view updatable if you want to, which further adds to the flexibility of this solution.

Remember that you can use the servers for storing other data at the same time as they store a partition. The servers are not dedicated to this partition in any way. You might only need to partition one large table, in which case it would be a pity not to use the servers’ capacity to their full extent.

By using this method, you can let several servers share the burden of handling a large table. Clients can access any of the servers and get results from all of them by using one single view, allowing this solution to scale even though you do not use NLB. Since you can use a MSCS cluster behind each of the participating servers, you can also achieve high availability with this technique.

NOTE Most common way of implementing distributed partitioned views is by letting the client applications access the member server that has most of the data required by the SQL statement. This is called collocating the SQL statement with the data it requires.

Separating Read Data from Write Data

There is a way to reduce the impact locking has on your database. As you may recall, locking can reduce performance, and if you have a lot of read activity,a write command should not lock these reads. By defining vertical partitions, you can separate read columns from write columns and place them in different tables. This can improve performance in your applications.

You could also use horizontal partitioning to separate read data from writedata. Say you have a table that stores sales data for your company. Data is written on a daily basis, and all other data is maintained for reporting purposes. This kind of table could grow extremely large and unmanageable. By partitioning the data horizontally, you can enhance performance on write and read operations. You could create a table for the current month, another for the current year, and a third for the rest of the data. How you do the partitioning is determined by the way the table is used. After that, you create a view that lets you display data from all tables.

By partitioning this way, you also enhance backup performance, since you do not need to back up all historical data every day.

During the design phase, you should carefully consider how your data is going to be accessed so you minimize locking.

Query Tuning

Even if you have top-of-the-line hardware and it is tuned extremely well, you might still experience performance troubles. It is as important to optimize your queries as it is to optimize your hardware. Although writing queries is quite easy, if you have no knowledge of query design issues, your queries could degrade performance. Resources are used intensively especially in two particular cases: queries returning large result sets and highly nonunique WHERE clauses. We have seen some horrible queries in our times (not to mention the embarrassing queries we ourselves have produced over the years), despite the fact that the developers who wrote them should have known better.

Sometimes you cannot avoid using very complex and resource-intensive queries; however, you could try to move the processing of these to a stored procedure, where such a query probably would execute better than if the application issued it. This would also reduce network traffic if the query returned a large result set.

If this still does not help, you could also add more CPUs to your server if your hardware allows this possibility. This way SQL Server can execute queries in parallel, and performance will increase.

What you should do first of all is to go over the design of the query one more time. Is there really no way you can rewrite it to make it perform better? If you use cursors in your query, you should definitely see if you can make them more effective. It might also be possible to exchange the cursor for a GROUP BY or CASE statement instead.

If your application uses a loop that executes a parameterized stored procedure, every time you travel down the loop, a round-trip is made to the SQL Server. This slows down performance. Instead, you should try to rewrite the application and the stored procedure to create a single, more complex query using a temporary table. This way the SQL Server query optimizer can better optimize the execution of the query. A temporary table causes a recompile of the stored procedure, however, which negatively affects performance. So if you find it causing problems, you could use a table variable instead. Try to avoid getting too large of a result set back, since this also degrades performance. This can be like walking on the edge of a blade, so test, test, and test.

NOTE Stored procedures are not all good, unfortunately. See the section "Stored Procedures" later in this chapter for specifics.

A long-running query costs a lot since it might prevent others from getting their queries executed. It also consumes a lot of resources. You can stop these queries by setting the query governor configuration option. The default is that the query governor allows all queries to execute. You can set the query governor to not allow queries that will take more than a specified number of seconds to execute. It also allows you to control this type of query execution on each separate connection, or on all if you wish. The query governor estimates the cost of the queries and stops long-running queries before they have even started executing. Since it does not allow them to run, the overhead of this process is quite low.

Total Pages : 12 89101112

comments