Introduction to SQL Server FileStream in 2008

Abhishek Yadav
9y
26.7k
0
2

Article

Hello readers. A week ago I was going through some topics related to SQL Server 2008 where I landed on FileStream. At first I thought it's like any other normal topic, but when I began digging into it further, I came to understand that it's like one of the interesting concepts from a developer's and DBA's point of view. After going through many videos, articles, links and other resources, I found that it's an extraordinary functionality provided by SQL Server that solved the problem of storing and managing unstructured data and increases the performance of our application or our database.

In this article we're going to see what FileStream is in SQL Server, why it was introduced, the use of FileStream and so on. We'll also see an example of where we store not only images but also other files.

I've divided this article into the following 7 sections:

Introduction and Overview.
When to Use FileStream.
Benefits of using FileStream.
Limitations of Using FileStream.
Things to consider when creating FileStream.
Enabling and Configuration.
Example and Demonstration.

Let's begin with our article.

Introduction and Overview

FileStream was introduced in SQL Server 2008 that allows us to store unstructured data. Using FileStream, we can store BLOB data, like images, music files, any documents, videos and so on. In short, any kind of files that are larger in size can be easily stored using FileStream.

Prior to SQL Server 2008, we had a datatype called "IMAGE" that was used to store the largest amount of binary data.

The value range for an IMAGE datatype falls in the range of 0-2million bytes. Its fixed storage size was 1 byte per binary byte. And with the introduction of SQL Server 2008, the Image datatype was replaced by “VARBINARY (MAX)”.

Earlier, all media files like images, videos or any documents are used to stored in database tables using IMAGE datatype. This approach had some disadvantages, like file streaming becomes slow and poor performance.

But FileStream came up with a new way of storing all these files on your system disk instead of in database files. Isn't that cool? Instead of storing all those image, videos and documents in data files, those files can be directly stored in the Windows file system.

Figure 1: Windows File

As in the preceding image I tried to show how earlier things were saved in a data file, but in a FileStream all files are stored in your drive and its location is entered in a table as a pointer.

This is considered to be the best approach because a SQL Server buffer is not used in this process and therefore this memory can be used or is available for query processing.

But when researching I found that most of the places it gives an indication that the FileStream feature can be implemented as a datatype. But this is not true, a FileStream is not a kind of Datatype.

“FileStream is not a kind of Datatype”　　　　

Instead, FileStream is an attribute assigned to a VARBINARY (MAX) column. So when we apply this attribute to any varbinary (MAX) column, that column automatically becomes a “FileStream Enabled Column”.

So when we store any kind of data in this column, it'll be stored in your Windows file system and a pointer kind of thing is stored in the respective table that points to the data.

Earlier, the size of BLOB data is limited to 2 GB only but in FileStream it's not. If you're creating a column as a FileStream enabled column then the size of the file is restricted by size of the disk.

FileStream data must be stored in a FileStream filegroup only.

Until here I'm sure we all understand what FileStream is and the benefits of using a file stream over the old method.

When to use FileStream

The following are the key points to be considered for potential use of a FileStream in our application.

The object to be stored is larger than 1 MB.
FileStream will use the Windows API for streaming the files so that files can be accessed faster.
This can be helpful when creating an application that uses a middle tier for the application logic.
FileStream uses a Windows buffer instead of a SQL Server cache for caching the file accessed.

Benefits of using FileStream

This newly added feature offers various advantages as in the following.

Transactional Consistency: If we store any BLOB data in the application then that uses SQL Server storage. That operation involves inserting new records into the table along with relational data and BLOB data. And if the BLOB operation fails within the scope of an open transaction, a COMMIT or ROLLBACK affects the changes to the BLOB table just like any other column or table in SQL Server.
　　　　
Security: Access or Denied permissions can be granted on a FileStream enabled column just like any other column in SQL Server.

Point-In-Time Restore: As changes in a FileStream-enabled column is captured in the transaction log, the log can be backed up and that allows a point-in-time restore.

Reduce overhead on SQL Server: When accessing data using the API, FileStream operations no longer degrade SQL Server internal memory.

Limitation of using FileStream

Every new feature that comes with advantages also ships with some limitations. Some of the limitations are listed below.

Database Mirroring does not support FileStream data.
Transparent Data Encryption does not work with FileStream data.
Database Snapshots cannot be created for FileStream filegroups.
SQL Server Express supports FileStreams. The 10 GB database size does not include the FileStream Data Container.

Things to consider when creating a FileStream

The column should store data on file system, specify the FILESTREAM attribute on a VARBINARY (MAX) column. FILESTREAM data must be stored in a FileStream Filegroup. (A FileStream Filegroup is nothing but a special kind of group that uses a system directory instead of files themselves. These directories are called a data container.)
When a table contains a FileStream column, each row must have a unique row id with not null constraint.
A FileStream Data Container cannot be nested.
When using Cluster Failover, a FileStream Filegroup must be on a shared drive.
A FileStream filegroup can be on compressed volumes.
When you delete data from a FileStream column, the record will be deleted from the column but the file saved in the data container will be removed when the checkpoint runs.

Enabling and configuring FileStream

The FileStream feature remains disabled if you didn't enable it during installation. If you installed a SQL Server instance without enabling FileStream then do not to worry. The following is the procedure to enable FileStream.

Open SQL Server Configuration Manager then select SQL Services then right/double-click on your instance.

Figure 2: SQL Server Manager

Select the “FILESTREAM” tab.

Figure 3: File Stream

On the FileStream tab, enable all 3 options.

Figure 4: Properties

Click OK.

Then restart your SQL Services.

Figure 5: SQL Services

After rebooting your SQL Server Services, open SSMS with a new query window.

Execute the following query to configure FileStream.

USE master
GO
sp_configure 'filestream access level', 2
GO

The following are the values with its definition. We're enabling it for T-SQL and Win32 streaming access.

Value	Defination
0	Disables FILESTREAM support for this instance.
1	Enables FILESTREAM for Transact-SQL access.
2	Enables FILESTREAM for Transact-SQL and Win32 streaming access.

On running this command, we'll get the following message:

Configuration option 'filestream access level' changed from 0 to 2. Run the RECONFIGURE statement to install.

As you see in the message "FileStream access level" changed from 0 to 2. That means it was disabled and now it's enabled to execute FileStream for “Transact-SQL and Win32 streaming access”.

Figure 6: Acces Level

After configuration of FileStream you'll get a message to run the RECONFIGURE statement to install it.

Execute RECONFIGURE command as shown below.

Figure 7: Reconfigure

Now we're done with the enabling and configuration part. Let's move towards an example.

Example and Demonstration

First we'll create a database with FileStream. We can do this action in 2 ways, either T-SQL or GUI.

With GUI

Right-click on Database then select New Database.

Figure 8: DataBase

On the General tab, give your database a name and owner of your choice.

Figure 9: New DataBase

Now click on the “Add” button to add your FileStream and enter the following information.

Logical Name: Name of your FileStream.

File Type: FileStream Data.

Path: Location of your FileStream data container.

Figure 10: Data Container

Now when you click on the OK button you'll get an error message as shown below.

Figure 11: Error

This is because we didn't specify a filegroup for this FileStream.

Figure 12: File Group

So let's create a filegroup for our FileStream. To do this, use the following procedure.

Go to the Filegroup tab then click on the "Add" button in the FileStream section and provide your filegroup a name.

Figure 13: File Group Tab

The following is the T-SQL code for the preceding process.

CREATE DATABASE [FileDptDB] ON PRIMARY (
NAME = N 'FileDptDB', FILENAME = N 'A:\Data Files\FileDptDB.mdf',
SIZE = 3072KB, FILEGROWTH = 1024KB
),
FILEGROUP [FileDptDb_FileStream] CONTAINS FILESTREAM (
NAME = N 'FileDptDB_Files', FILENAME = N 'A:\Data Files\FileDptDB_Files'
) LOG ON (
NAME = N 'FileDptDB_log', FILENAME = N 'A:\Data Files\FileDptDB_log.ldf',
SIZE = 6144KB, FILEGROWTH = 10 %
) GO

The preceding query will create a Data file (.MDF file), Log file (.LDF file) and 1 FileStream Data container (Folder). The following is the final result for it.

Figure 14: Data File

The folder shown in the preceding image, the folder “FileDptDB_Files”, is called a “FileStream Data Container” where the FileStream data will be stored.

The most interesting thing is, if any folder is available of the same name that you specify when creating a FileStream Database, SQL Server will throw an error as shown below.

Msg 5170, Level 16, State 2, Line 1
Cannot create file 'A:\Data Files\FileDptDB_Files' because it already exists. Change the file path or the file name and retry the operation.

The preceding error simply states that a folder with the same name should not exist when creating a FileStream database.

When you open a “FileStream Data Container”, in other words the folder, you'll see the following 2 things:

$FSLOG
Filestream.hdr

Figure 15: File Dpt DB

The folder $FSLOG is nothing but a log file for FileStream and filestream.hdr is the file that stores the metadata information used by SQL Server.

Note: Do not alter/modify this file.

Now we're ready with our database that has a FileStream to store unstructured data.

Now, let's create a table where we'll store our data. The column that stores the actual data is known as a FileStream column. This FileStream column will be created using a VARBINARY (MAX) data type that has a FILESTREAM attribute.

CREATE table tblFilesForAll (
FileId int primary key identity,
ID uniqueidentifier rowguidcol unique not null,
FileName varchar(15),
ActualFile varbinary(MAX) filestream,
Description varchar(15)
)

In the preceding query, we've added a column named "ID" that is of type “UNIQUEIDENTIFIER” and “ROWGUIDCOL”, this is because a FileStream column can be created only on a table that has a UNIQUEIDENTIFIER column marked as ROWGUIDCOL.

Using ROWGUIDCOL also has some limitations like:

Column cannot allow NULL values.
Column either should have unique attribute or we can make the entire column a primary key.
Column needs to be UNIQUEIDENTIFIER data type.

Now, when we execute the preceding query, SQL Server will create a folder where it'll store the files.

Now, we've created our table. Looks as simple as other tables. Ahhhh!!! Let's look at the "FileStream Data container" folder.

Figure 16: Container

If you notice, SQL Server created a new folder with a GUID value as its name and when you open that folder, you'll find another folder such like that.

Figure 17: Guid

Why this folder and what is it all about?

This second folder is nothing but your filestream enabled column that you've created in the table.

In graphically, it looks as in:

Figure 18: Grafically

Until here SQL Server created a folder for the table and for the column.

Let's insert data into this table.

We'll use Openrowset to insert BLOB data. Openrowset is a function that allows reading data from many sources.

This Openrowset() came with SQL Server 2005 with an attribute BULK. This BULK attribute can be use to load content from disk into a varbinary (MAX) column and then inserts that data into a FileStream column. So we'll also use this BULK attribute.

DECLARE @actualFile varbinary(MAX);
SELECT
@actualFile = CAST(
bulkcolumn as varbinary(max)
)
FROM
OPENROWSET(BULK 'A:\MyPic.jpg', SINGLE_BLOB) as MyData;
INSERT INTO tblFilesForAll
VALUES
(
NEWID(),
'My Picture',
@actualFile,
'Demo Pic'
)
SELECT
@actualFile = CAST(
bulkcolumn as varbinary(max)
)
FROM
OPENROWSET(
BULK 'A:\Recovery Models in SQL Server.docx',
SINGLE_BLOB
) as MyData;
INSERT INTO tblFilesForAll
VALUES
(
NEWID(),
'Recovery Model',
@actualFile,
'My Article'
) GO

You can also tweak the preceding complex query very simply as shown below.

INSERT INTO tblFilesForAll(
ID, FileName, Description, ActualFile
)
SELECT
NEWID(),
'My Picture',
'Demo Pic',
CAST(
bulkcolumn as varbinary(max)
)
from
OPENROWSET(BULK 'A:\MyPic.jpg', SINGLE_BLOB) as Data GO INSERT INTO tblFilesForAll(
ID, FileName, Description, ActualFile
)
SELECT
NEWID(),
'Recovery Model',
'My Article',
CAST(
bulkcolumn as varbinary(max)
)
from
OPENROWSET(
BULK 'A:\Recovery Models in SQL Server.docx',
SINGLE_BLOB
) as Data

With this query, we've entered 2 records into the table as shown below.

Figure 19: Enter Record

Our table stored 1 image file and 1 Word document file. So is this our actual data? Is this table storing those 2 files? Where did the data go?

Well, our data is stored in the filestream data container (that SQL Server created earlier) as shown below.

Figure 20: Data Stored

So when SQL Server stores any BLOB data into a FileStream Data Container, an exact copy of the file is stored in the file system.

SQL Server doesn't change/modify/remove anything when storing the file. If you compare the original file with this stored file, you'll find everything is the same.

Let's check it.

Figure 21: File Location

If you see, in the left pane, our actual file is stored and its size is 46,286 bytes and when SQL Server stored that files in FileStream Data Container, the file isn't modified. Both of the files are of the same size.

Now, if you'll open the file to view the content, it'll open like any other file.

Figure 22: Other File

So, we've accomplished our target. We've stored our BLOB data successfully in the table that has a filestream enabled column. With this, I'm winding up this article.

In this article we've learn what filestream is and how to enable filestream. We've also seen its benefits and limitations. We also created tables with a FileStream enabled column where SQL Server stores data. We tries inserting some data and we also compared and cross-checked our stored data with the actual data.

In my next article, we'll learn how to play with these files and how to update and alter these stored files. Until then keep learning and keep sharing.

If there's any mistake in this article then please let me know. Please provide your valuable feedback and comments that enable me to provide a better article the next time.