Understanding Transactions in SQL Server

Vidya Vrat Agarwal
7y
603k
0
24

Article

Transactions in SQL Server

For any business, transactions that may be comprised of many individual operations and even other transactions, play a key role.

Transactions are essential for maintaining data integrity, both for multiple related operations and when multiple users that update the database concurrently.

This article will specifically talk about the concepts related to transactions and how transactions can be used in the context of a SQL Server database. Besides, a transaction is a fundamental concept and this article will be helpful for relating transaction concepts with other databases as well.

Note: I used SQL Server 2012 in the article but you can use SQL Server 2008 as well.

In this article, I'll cover the following:

What a transaction is
When to use transactions
Understanding ACID properties
Design of a Transaction
Transaction state
Specifying transaction boundaries
T-SQL statements allowed in a transaction
Local transactions in SQL Server 2012
Distributed transactions in SQL Server 2012
Guidelines to code efficient transactions
How to code transactions

What Is a Transaction?

A transaction is a set of operations performed so all operations are guaranteed to succeed or fail as one unit.

Transaction is all or none

A common example of a transaction is the process of transferring money from a checking account to a savings account.

This involves two operations:

Deducting money from the checking account and

Note: in the USA a checking account is like a current account in India
Adding it to the savings account.

Both must succeed together and the changes must be committed to the accounts, or both must fail together and rolled back so that the accounts are maintained in a consistent state. Under no circumstances should money be deducted from the checking account but not added to the savings account (or vice versa), you would at least not want this to happen with the transactions occurring with your bank accounts.

By using a transaction concept, both the operations, namely debit and credit, can be guaranteed to succeed or fail together. So both accounts remain in a consistent state all the time.

When to Use Transactions

You should use transactions when several operations must succeed or fail as a unit. The following are some frequent scenarios where use of transactions is recommended:

In batch processing, where multiple rows must be inserted, updated, or deleted as a single unit
Whenever a change to one table requires that other tables be kept consistent
When modifying data in two or more databases concurrently
In distributed transactions, where data is manipulated in databases on various servers

When you use transactions, you put locks on data that is pending for permanent change to the database. No other operations can take place on locked data until the acquired lock is released. You could lock anything from a single row up to the entire database. This is called concurrency, which means how the database handles multiple updates at one time.

In the bank example above, locks will ensure that two separate transactions don't access the same accounts at the same time. If they do then either deposits or withdrawals could be lost.

Note: it's important to keep transactions pending for the shortest period of time. A lock stops others from accessing the locked database resource. Too many locks, or locks on frequently accessed resources, can seriously degrade performance.

Understanding ACID Properties

A transaction is characterized by four properties, often referred to as the ACID properties: atomicity, consistency, isolation, and durability.

Note: The term ACID was coined by Andreas Reuter in 1983.

Atomicity: A transaction is atomic if it's regarded as a single action rather than a collection of separate operations. So, only when all the separate operations succeed does a transaction succeed and is committed to the database. On the other hand, if a single operation fails during the transaction then everything is considered to have failed and must be undone (rolled back) if it has already taken place. In the case of the order-entry system of the Northwind database, when you enter an order into the Orders and Order Details tables, data will be saved together in both tables, or it won't be saved at all.
Consistency: The transaction should leave the database in a consistent state, whether or not it completed successfully. The data modified by the transaction must comply with all the constraints placed on the columns in order to maintain data integrity. In the case of Northwind, you can't have rows in the Order Details table without a corresponding row in the Orders table, since this would leave the data in an inconsistent state.
Isolation: Every transaction has a well-defined boundary; that is, it is isolated from another transaction. One transaction shouldn't affect other transactions running at the same time. Data modifications made by one transaction must be isolated from the data modifications made by all other transactions. A transaction sees data in the state it was in before another concurrent transaction modified it, or it sees the data after the second transaction has completed, but it doesn't see an intermediate state.
Durability: Data modifications that occur within a successful transaction are kept permanently within the system regardless of what else occurs. Transaction logs are maintained so that should a failure occur the database can be restored to its original state before the failure. As each transaction is completed, a row is entered in the database transaction log. If you have a major system failure that requires the database to be restored from a backup then you could then use this transaction log to insert (roll forward) any successful transactions that have taken place.

Every database software that offers support for transactions enforces these four ACID properties automatically.

Design of a Transaction

Transactions represent real-world events such as bank transactions, airline reservations, remittance of funds, and so forth.

The purpose of transaction design is to define and document the high-level characteristics of transactions required on the database system, including the following:

Data to be used by the transaction
Functional characteristics of the transaction
Output of the transaction
Importance to users
Expected rate of usage

The following are the three main types of transactions:

Retrieval transactions: Retrieves data from the database to be displayed on the screen.
Update transactions: Inserts new records, deletes old records, or modifies existing records in the database.
Mixed transactions: Involves both the retrieval and updating of data.

Transaction State

In the absence of failures, all transactions complete successfully. However, a transaction may not always complete its execution successfully. Such a transaction is termed aborted.

A transaction that completes its execution successfully is said to be committed. Figure 1-1 shows that if a transaction has been partially committed then it will be committed but only if it has not failed and if the transaction has failed, it will be aborted.

Figure 1-1. States of a transaction

Specifying Transaction Boundaries

SQL Server transaction boundaries help you to identify when a SQL Server transaction starts and ends by using API functions and methodsas in the following:

Transact-SQL statements: Use the BEGIN TRANSACTION, COMMIT TRANSACTION, COMMIT WORK, ROLLBACK TRANSACTION, ROLLBACK WORK, and SET IMPLICIT_TRANSACTIONS statements to delineate transactions.
API functions and methods: Database APIs such as ODBC, OLE DB, ADO, and the .NET Framework SqlClient namespace contain functions or methods used to delineate transactions. These are the primary mechanisms used to control transactions in a database engine application.

Each transaction must be managed by only one of these methods. Using both methods on the same transaction can lead to undefined results.

T-SQL Statements Allowed in a Transaction

You can use all T-SQL statements in a transaction, except for the following statements: ALTER DATABASE, RECONFIGURE, BACKUP, RESTORE, CREATE DATABASE, UPDATE STATISTICS, and DROP DATABASE.

Local Transactions in SQL Server 2012

All database engines are supposed to provide built-in support for transactions. Transactions that are restricted to only a single resource or database are known as local transactions. Local transactions can be in one of the following four transaction modes:

Autocommit Transactions Autocommit mode is the default transaction management mode of SQL Server. Every T-SQL statement is committed or rolled back when it is completed. If a statement completes successfully, it is committed; if it encounters any errors, it is bound to roll back. A SQL Server connection operates in autocommit mode whenever this default mode has not been overridden by any type transactions.
Explicit Transactions Explicit transactions are those in which you explicitly control when the transaction begins and when it ends. Prior to SQL Server 2000, explicit transactions were also called user-defined or user-specified transactions.
T-SQL scripts for this mode use the BEGIN TRANSACTION, COMMIT TRANSACTION, and ROLLBACK TRANSACTION statements. Explicit transaction mode lasts only for the duration of the transaction. When the transaction ends, the connection returns to the transaction mode it was in before the explicit transaction was started.
Implicit Transactions When you connect to a database using SQL Server Management Studio and execute a DML query, the changes are automatically saved. This occurs because, by default, the connection is in autocommit transaction mode. If you want no changes to be committed unless you explicitly indicate so, you need to set the connection to implicit transaction mode.

You can set the database connection to implicit transaction mode by using SET IMPLICIT TRANSACTIONS ON|OFF.

After implicit transaction mode has been set to ON for a connection, SQL Server automatically starts a transaction when it first executes any of the following statements: ALTER TABLE, CREATE, DELETE, DROP, FETCH, GRANT, INSERT, OPEN, REVOKE, SELECT, TRUNCATE TABLE, and UPDATE.

The transaction remains in effect until a COMMIT or ROLLBACK statement has been explicitly issued. This means that when, say, an UPDATE statement is issued on a specific record in a database, SQL Server will maintain a lock on the data scoped for data modification until either a COMMIT or ROLLBACK is issued. In case neither of these commands are issued, the transaction will be automatically rolled back when the user disconnects. This is why it is not a best practice to use implicit transaction mode on a highly concurrent database.
Batch-Scoped Transactions A connection can be in batch-scoped transaction mode, if the transaction running in it is Multiple Active Result Sets (MARS) enabled. Basically MARS has an associated batch execution environment, since it allows ADO .NET to take advantage of SQL Server 2012's capability of having multiple active commands on a single connection object.

When MARS is enabled, you can have multiple interleaved batches executing at the same time, so all the changes made to the execution environment are scoped to the specific batch until the execution of the batch is complete. Once the execution of the batch completes, the execution settings are copied to the default environment. Thus a connection is said to be using batch-scoped transaction mode if it is running a transaction, has MARS enabled on it, and has multiple batches running at the same time.

MARS allows executing multiple interleaved batches of commands. However, MARS does not let you have multiple transactions on the same connection; it only allows having Multiple Active Result Sets.

Distributed Transactions in SQL Server 2012

In contrast to local transactions that are restricted to a single resource or database, distributed transactions span two or more servers, that are known as resource managers. Transaction management needs to be coordinated among the resource managers via a server component known as a transaction manager or transaction coordinator. SQL Server can operate as a resource manager for distributed transactions coordinated by transaction managers such as the Microsoft Distributed Transaction Coordinator (Microsoft DTC).

A transaction with a single SQL Server that spans two or more databases is actually a distributed transaction. SQL Server, however, manages the distributed transaction internally.

At the application level, a distributed transaction is managed in much the same way as a local transaction. At the end of the transaction, the application requests the transaction to be either committed or rolled back. A distributed commit must be managed differently by the transaction manager to minimize the risk that a network failure might lead you to a situation when one of the resource managers is committing instead of rolling back the transactions due to failure caused by various reasons. This critical situation can be handled by managing the commit process in two phases, also known as a two-phase commit:

Prepare phase: When the transaction manager receives a commit request, it sends a prepare command to all of the resource managers involved in the transaction. Each resource manager then does everything required to make the transaction durable, and all buffers holding any of the log images for other transactions are flushed to disk. As each resource manager completes the prepare phase, it returns success or failure of the prepare phase to the transaction manager.
Commit phase: If the transaction manager receives successful prepares from all of the resource managers then it sends a COMMIT command to each resource manager. If all of the resource managers report a successful commit then the transaction manager sends a notification of success to the application. If any resource manager reports a failure to prepare, the transaction manager sends a ROLLBACK statement to each resource manager and indicates the failure of the commit to the application.

Guidelines to Code Efficient Transactions

We recommend you use the following guidelines while coding transactions to make them as efficient as possible:

Do not require input from users during a transaction.

Get all required input from users before a transaction is started. If additional user input is required during a transaction then roll back the current transaction and restart the transaction after the user input is supplied. Even if users respond immediately, human reaction times are vastly slower than computer speeds. All resources held by the transaction are held for an extremely long time, that has the potential to cause blocking problems. If users do not respond then the transaction remains active, locking critical resources until they respond, that may not happen for several minutes or even hours.
Do not open a transaction while browsing through data, if at all possible.

Transactions should not be started until all preliminary data analysis has been completed.
Keep the transaction as short as possible.

After you know the modifications that need to be made, start a transaction, execute the modification statements, and then immediately commit or roll back. Do not open the transaction before it is required.
Make intelligent use of lower cursor concurrency options, such as optimistic concurrency options.

In a system with a low probability of concurrent updates, the overhead of dealing with an occasional "somebody else changed your data after you read it" error can be much lower than the overhead of always locking rows as they are read.
Access the least amount of data possible while in a transaction.

The smaller the amount of data that you access in the transaction, the fewer the number of rows that will be locked, reducing contention between transactions.

How to Code Transactions

The following three T-SQL statements control transactions in SQL Server:

BEGIN TRANSACTION: This marks the beginning of a transaction.
COMMIT TRANSACTION: This marks the successful end of a transaction. It signals the database to save the work.
ROLLBACK TRANSACTION: This denotes that a transaction hasn't been successful and signals the database to roll back to the state it was in prior to the transaction.

Note: there is no END TRANSACTION statement. Transactions end on (explicit or implicit) commits and rollbacks.

Coding Transactions in T-SQL

You'll use a Stored Procedure to practice coding transactions in SQL. It's an intentionally artificial example but representative of transaction processing fundamentals. It keeps things simple so you can focus on the important issue of what can happen in a transaction.

Warning: Using ROLLBACK and COMMIT inside a Stored Procedures typically requires careful consideration of what transactions may already be in progress and have led to the Stored Procedure call. The example runs by itself, so you don't need to be concerned with this here, but you should always consider whether it's a potential issue.

Try It Out: Creating a Parent-Child relationship

Before you code the transaction let's create two tables to apply our transaction code and understand the transaction concept.

Open SQL Server Management Studio, and in the Object Explorer, select our previously created database SQL2012Db, right-click and click on "New Query".
Enter the following SQL statement to create tables with a primary-key and foreign-key, in other words a parent-child relationship. The Person table will have a primary-key column that will be referenced by the PersonDetails table via a foreign key column as shown below here.

Listing 1-1. Create Parent-Child relationship
1. create table Person
2. (
3. PersonID nvarchar(5)primary key not null,
4. FirstName nvarchar(10)not null,
5. Company nvarchar(15)
6. )
7. create table PersonDetails
8. (
9. PersonID nvarchar(5)FOREIGN KEY REFERENCES dbo.Person(PersonID),
10. Address nvarchar(30)
11. )
Now click "Execute". And it should show the status as "Command(s) completed successfully" as shown in Image 1-2 below.

Figure 1-2. Executing the Create table statement (parent-child relationship)
Next let's insert some data into the Person and PersonDetails table, by executing the statement below, and click "Execute".

Listing 1-2. Create Parent-Child relationship
1. Insert into Person
2. values('Vidvr','Vidya Vrat','Lionbridge Inc'),
3. ('Rupag','Rupali','Pearl Solutions')
This statement should show the status "2 row(s) affected".

Since a child can have only those records that map to the parent, hence we can only insert child records into PersonDetails for those PersonIDs that are already available in the Person table.
1. Insert into PersonDetails
2. values('Vidvr','Bellevue WA 98007'),
3. ('Rupag','Bellevue WA 98007')

As you can see, the child table's PersonID matches with the parent table. So now we have a perfect parent-child relationship, where we have two parent records and 2 matching child records in the Person and PersonDetails tables respectively as shown in Figure 1-3 below:

Figure 1-3. Showing Parent-Child relationship between Person and PersonDetails table

Try It Out: Coding a Transaction in T-SQL

Here, you'll code a transaction based on the Person and PersonDetails table, where we will use SQL Server's primary-key and foreign-key rules to understand how transactions work. The Person table has three columns; two columns, PersonID and FirstName, don't allow null values, and PersonID is also a primary key column. In other words only unique values are allowed. Also, the last column Company allows null values.

Similarly, the PersonDetails table is a foreign-key or child table, it has a PersonID column that is a foreign-key column and reference to Person.PersonID. It also has an Address column. The child or foreign-key table can only have those records that has a matching Primary-key column value available in the Parent or Primary-key table as shown in Figure 1-3 above, if a child record is inserted that doesn't have a matching parent or primary ley value then it will result in an error and not insterted into the child table.
In Object Explorer, select the SQL2012Db database, and click the New Query button.
Create a Stored Procedure named sp_Trans_Test using the code in Listing 1-3.

Listing 1-3. spTransTest
1. create procedure sp_Trans_Test
2. @newpersonid nvarchar(5),
3. @newfirstname nvarchar(10)
4. @newcompanyname nvarchar(15),
5. @oldpersonid nvarchar(5)
6. as
7. declare @inserr int
8. declare @delerr int
9. declare @maxerr int
10. set @maxerr = 0
11. BEGIN TRANSACTION
13. -- Add a person
14. insert into person (personid, firstname, company)
15. values(@newpersonid, @newfirstname, @newcompanyname)
17. -- Save error number returned from Insert statement
18. set @inserr = @@error
19. if @inserr > @maxerr
20. set @maxerr = @inserr
22. -- Delete a person
23. delete from person
24. where personid = @oldpersonid
26. -- Save error number returned from Delete statement
27. set @delerr = @@error
28. if @delerr > @maxerr
29. set @maxerr = @delerr
31. -- If an error occurred, roll back
32. if @maxerr <> 0
33. begin
34. ROLLBACK
35. print 'Transaction rolled back'
36. end
37. else
38. begin
39. COMMIT
40. print 'Transaction committed'
41. end
42. print 'INSERT error number:'+ cast(@inserras nvarchar(8))
43. print 'DELETE error number:'+ cast(@delerras nvarchar(8))
44. return @maxerr
Enter the following query in the same query windows as the Listing 1-3 code. Select the statement as shown in Figure 1-2, and then click "Execute" to run the query.
1. exec sp_Trans_Test 'Pearl', 'Vamika ', null, 'Agarw'
The results window should show a return value of zero, and you should see the same messages as shown in Figure 1-4.

Figure 1-4. Executing the Stored Procedure in the same query window, enter the following SELECT statement:
1. Select * from Person
Select the statement as shown in Figure 1-3 and then click the "Execute" button. You will see that the person named "Vamika" has been added to the table, as shown in the Results tab in Figure 1-3.

Figure 1-5. Row inserted in a transaction
Add another person with the parameter values. Enter the following statement and execute it as you've done previously with other similar statements.
1. EXEC sp_Trans_Test 'Spark', 'Arshika ', null, 'Agarw'
You should get the same results as shown earlier in Figure 1-4 in the Messages tab.
Try the SELECT statement shown in Figure 84 one more time. You should see that person "Arshika" has been added to the Person table. Both Person "Vamika" and "Arshika" have no child records in the PersonDetails table.

How It Works

In the Stored Procedure, you define four input parameters:

create procedure sp_Trans_Test
@newpersonid nvarchar(5),
@newfirstname nvarchar(10),
@newcompanyname nvarchar(15),
@oldpersonid nvarchar(5)

You also declare three local variables:

declare @inserr int
declare @delerr int
declare @maxerr int

These local variables will be used with the Stored Procedure, so you can capture and display the error numbers returned if any from the INSERT and DELETE statements.

You mark the beginning of the transaction with a BEGIN TRANSACTION statement and follow it with the INSERT and DELETE statements that are part of the transaction. After each statement, you save the return number for it.

BEGIN TRANSACTION
-- Add a person
insert into person (personid, firstname, company)
values(@newpersonid, @newfirstname, @newcompanyname)
-- Save error number returned from Insert statement
set @inserr = @@error
if @inserr > @maxerr
set @maxerr = @inserr
-- Delete a person
delete from person
where personid = @oldpersonid
-- Save error number returned from Delete statement
set @delerr = @@error
if @delerr > @maxerr
set @maxerr = @delerr

Error handling is important at all times in SQL Server, and it's never more than inside transactional code. When you execute a T-SQL statement, there's always the possibility that it may not succeed. The T-SQL @@ERROR function returns the error number for the last T-SQL statement executed. If no error occurred then @@ERROR returns zero.

@@ERROR is reset after every T-SQL statement (even SET and IF) is executed, so if you want to save an error number for a specific statement then you must store it before the next statement executes. That's why you declare the local variables @inserr and @delerr and @maxerr.

If @@ERROR returns any value other than 0, an error has occurred, and you want to roll back the transaction. You also include PRINT statements to report whether a rollback or commit has occurred.

-- If an error occurred, roll back
if @maxerr <> 0
begin
ROLLBACK
print 'Transaction rolled back'
end
else
begin
COMMIT
print 'Transaction committed'
end

Tip: T-SQL (and standard SQL) supports various alternative forms for keywords and phrases. You've used just ROLLBACK and COMMIT here.

Then you add some more instrumentation, so you could see what error numbers are encountered during the transaction.

print 'INSERT error number:' + cast(@inserr as nvarchar(8))
print 'DELETE error number:' + cast(@delerr as nvarchar(8))
return @maxerr

Now let's look at what happens when you execute the Stored Procedure. You run it twice, first by adding person "Pearl" and next by adding person "Spark", but you also enter the same nonexistent person "Agarw" to delete each time. If all statements in a transaction are supposed to succeed or fail as one unit then why does the INSERT succeed when the DELETE doesn't delete anything?

Figure 1-4 should make everything clear. Both the INSERT and DELETE return error number zero. The reason DELETE returns error number zero even though it has not deleted any rows is that when a DELETE doesn't find any rows to delete, T-SQL doesn't treat that as an error. In fact, that's why you use a nonexistent person. Excluding these recently added persons Pearl and Spark. Other records have child records in the PersonDetails table as shown in Figure 1-3 and you can't delete these existing persons unless you delete their details from the PersonDetails table first.

Try It Out: What Happens When the First Operation Fails

In this example, you'll try to insert a duplicate person and delete an existing person. Add person "Pearl" and delete person "Spark" by entering the following statement, and then click the Execute button.

exec sp_Trans_Test 'Pearl', 'Vamika', null, 'Spark'

The result should appear as in Figure 1-6.

Figure 1-6. First operation failed, second operation rolled back

In the Messages pane shown in Figure 1-6, note that the entire transaction was rolled back because the INSERT failed and was terminated with error number 2627 (whose error message appears at the top of the window). The DELETE error number was 0, meaning it executed successfully but was rolled back. (If you check the table then you'll find that person "Spark" still exists in the Person table.)

How It Works

Since person "Pearl" already exists and as you know and shown in Figure 1-2, the Person table's PersonID column in the primary key and it can only contain unique values. This is why SQL Server prevents the insertion of a duplicate, so the first operation fails. The second DELETE statement in the transaction is executed, and person "Spark" was deleted since it doesn't have any child records in the PersonDetails table; but because gmaxerr isn't zero (it's 2627, as you see in the Results pane), you roll back the transaction by undoing the deletion of customer "Spark". As a result you see all the records in the table as it is.

Try It Out: What Happens When the Second Operation Fails

In this example, you'll insert a valid new person and try to delete a person who has child records in PersonDetails table.

Add person "ag" and delete person "Vidvr" by entering the following statement, and then click the "Execute" button.

exec sp_Trans_Test 'ag', 'Agarwal ',null, 'Vidvr'

The result should appear as in Figure 1-7.

Figure 1-7. Second operation failed, first operation rolled back.

In the Messages window shown in Figure 1-7, note that the transaction was rolled back because the DELETE failed and was terminated with error number 547 (the message for which appears at the top of the window). The INSERT error number was 0, so it apparently executed successfully but was rolled back. (If you check the table then you'll find "ag" is not a person.)

How It Works

Since person "ag" doesn't exist, SQL Server inserts the row, so the first operation succeeds. When the second statement in the transaction is executed, SQL Server prevents the deletion of customer "Vidvr" because it has child records in the PersonDetails table, but since gmaxerr isn't zero (it's 547, as you see in the Results pane), the entire transaction is rolled back.

Try It Out: What Happens When Both Operations Fail

In this example, you'll try to insert an invalid new person, in other words one with a duplicate name and try to delete an undeletable one. In other words that has child records in the PersonDetails table.

Add person "Pearl" and delete customer Rupag by entering the following statement, and then click the "Execute" button.

exec sp_Trans_Test 'Pearl', 'Vamika', null,'Rupag'

The result should appear as in Figure 1-8.

Figure 1-8. Both operations rolled back

In the Messages window shown in Figure 1-8, note that the transaction was rolled back (even though neither statement succeeded, so there was nothing to roll back) because gmaxerr returns 2627 for the INSERT and 547 for the DELETE. Error messages for both failing statements are displayed at the top of the window.

How It Works

By now, you should understand why both statements failed. This happened because the first statement couldn't insert a duplicate record and the second statement couldn't delete a record that has associated child records. This is why the Message pane in Figure 1-8 shows both the errors explicitly mentioning duplicate key and conflict reference with child records.

Summary

This article covered the fundamentals of transactions, from concepts such as understanding what transactions are, to ACID properties, local and distributed transactions, guidelines for writing efficient transactions, and coding transactions in T-SQL. Although this article provides just the fundamentals of transactions, you now know enough about coding transactions to handle basic transactional processing and implement it using C# and ADO .NET.