SQL Server Integration Services: How to ETL (Extract, Transform, Load)

As data may come from several sources into the data warehouse, there is often a need to import data from flat files into the system. These flat files may be text files or csv files. The question that arises is, how do we import the data into database. The easiest solution is to use SQL Server Integration Services that is shipped along with MS SQL server. In order to demonstrate the solution, I am using SQL Server 2008 here.

Problem Description

We have a list of customers in a text files. Every row, but the first one, describes the customer details such as customer’s:

  1. FirstName
  2. LastName
  3. Address
  4. PhoneNumber

These values are separated by commas for simplicity. However, you are free to choose any punctuation. It’s always advisable to keep column name also in the very first row so that they give some meaning to values separated by commas in the rows below.

table image

Approach

SQL Server integration services (SSIS) facilitates transporting data from source to destination but it is much more than that. We will see that shortly. Destination can be any database, Excel file or flat files itself. However, that depends on the business needs.

I shall limit my approach to a set of tools for this article and will take them to a higher level gradually in future articles.

  1. Firstly, create a SSIS project.

     SSIS project

  2. Name it and save it.

  3. On the dataflow tab in the middle, drag and drop a flat file source.

    dataflow tab

    flat file source

  4. Give flat file source a name and double click on it.

    dataflow
      
  5. If necessary, give connection a name and click on New to create a flat file source connection.

  6. Browse to the flat file on the disk.

    Browse to the flat file

  7. Select the check box to consider the first row of the flat file for column names.

    Select the check box

  8. Click on the mapping in the left hand pane and change names in the output column if you wish to.

    output column

After performing the above steps, our job is to bring the data from our source flat file to our destination. Let’s say our destination is a table named “Customer” in a database in SQL Server. Before you drag a destination OLE DB on to data flow tab, make sure you have created a customer table in SQL Server. If not, do it using the script below. For the easiness, I have created an auto created Customer ID in the table. Feel free to have it in the flat file directly.

  1. createtable Customers   
  2. ( CustomerId IntIdentity(1,1),  
  3.    FirstName Varchar(50)NotNull,  
  4.    LastName Varchar(50)NotNULL,  
  5.    AddressVarchar(150)NotNULL,  
  6.    PhoneNumber Varchar(10)NotNull  
  7.   
  8. )  
  9. Go  
Execute query and check the table created. It should look like as shown below.

table

Let’s follow the same steps for destination as we did for source flat file connection.

 

  1. Drag and drop a OLE DB destination on to 'Data Flow' tab in the middle

    drop a OLE DB destination

    On 'Data Flow' tab:

    On Dataflow tab

  2. Assign a meaningful name to the dragged OLE DB destination task and connect the green arrow from flat file source to OLE DB destination.

    source to OLE DB destination

  3. Double click on the OLE db destination to configure as shown below.

    OLE db destination

  4. If you already have a connection manager, go ahead and select it or else create a new one by selecting on New Button.

    selecting on New Button

  5. Choose data connection if already shown in the list or create a new one and click on.

  6. Select the destination table you want to import the data to.

    Select the destination table

  7. Map the columns by clicking on the Mapping in the left hand pane and map the columns from Source to Destination table correctly (leave the CustomerID column in destination as it is because it’s an identity column therefore incremented automatically) and click on Ok button.

    CustomerID

  8. Execute the package by right click on the package in the solution explorer.

    Execute the package

  9. In order to confirm the output, check the table entries in the SQL Server.

I have made a lot of assumptions in this article such as No Null values and no loss in data conversion. However, in real life scenarios, we need to take care of them by ourselves. Integrated features in SSIS help us to achievethem. I shall explain most of the features in my upcoming articles.

Run Query

 
Read more articles on SQL Server Integration Services (SSIS):