An Introduction to SQL Server Integration Service (SSIS)

Integration Services is a platform for building high performance data integration and workflow solutions, including extraction, transformation and loading (ETL) operations for data warehousing.
This includes graphical tools and wizards for building and debugging packages.

Uses of Integration Services

One use of Integration Services is to merge data from multiple data stores and update the data to data warehouses and/or data marts. Create the Data Transformation process logic and automate the data loading process.

ETL Working Model

data wharehouse

Reference: Introduction to Microsoft Business Intelligence (MSBI).

Architecture of Integration Services

data flow task

Image source: TechNet (MSDN)

Some important components to using Integration Services:

  • SSIS Designer
  • Runtime engine
  • Tasks and other executables
  • Data Flow engine and Data Flow components
  • API or object model
  • Integration Services Service
  • SQL Server Import and Export Wizard
  • Other tools, wizards and command prompt utilities
SSIS Packages

This is the collection of data flow, control flow, variables, events, connections and configurations that are stored in SQL Server, SSIS package store, or in the file system.

The package is the unit of work that is retrieved, executed and saved.

Control Flow Elements

It provides the following three types of control flow elements:
  • Containers that provide structure in packages
  • Tasks that provide functionality
  • Precedence constraints

container task
Image source: TechNet(MSDN)

Integration Services Containers

  • The Foreach Loop container
  • For Loop container
  • The Sequence container

Integration Services Tasks

ssis toolbox

  • Analysis Services Execute DDL Task

    Executes Data Definition Language (DDL) statements in the package.

  • Analysis Services Processing Task

    Processes the cubes and dimensions in the package.

  • Back Up Database Task

    SQL Server database backup using the package.

  • Bulk Insert Task

    Provides an efficient way to copy large amounts of data into a SQL Server table or view.

  • CDC Control Task

    Controls the life cycle of Change Data Capture (CDC) packages.

  • Check Database Integrity Task

    We can check the structural integrity of all the objects in the specified database.

  • Data Mining Query Task

    Runs prediction queries based on data mining models built in Analysis Services.

  • Data Profiling Task and Viewer

    Provides data profiling functionality in the process of extracting, transforming and loading data.

  • Execute Package Task

    Runs other packages as part of a workflow.

  • Execute Process Task

    Helps to run an application or batch file as part of a SQL SSIS package.

  • Execute SQL Task

    Executes SQL statements or a Stored Procedure from an SSIS package.

  • Execute T-SQL Statement Task

    Execute Transact-SQL statements.

  • File System Task

    For the operations on files and directories.

  • FTP Task

    Uploads or downloads and manages directories.

  • History Cleanup Task

    Deletes entries in the history tables in the SQL Server msdb database.

  • Maintenance Cleanup Task

    Removes files related to maintenance plans, including database backup files and reports created by maintenance plans.

  • Message Queue Task

    Allows you to use MSMQ to send and receive messages between SQL Server Integration Services packages.

  • Notify Operator Task

    Sends notification messages to a SQL Server Agent operator.

  • Rebuild Index Task

    Rebuild indexes in SQL Server database tables and views.

  • Reorganize Index Task

    Reorganizes indexes in SQL Server database tables and views.

  • Script Task

    We have an option to write code to do functions that are not available in the built-in tasks and transformations that SSIS provides.

  • Send Mail Task

    Send E-mail from the package.

  • Shrink Database Task

    Reduces the size of the SQL database data and log file.

  • Transfer Database Task

    Transfers a SQL Server database between two instances of SQL Server.

  • Transfer Error Messages Task

    Configured to transfer all error messages.

  • Transfer Jobs Task

    Transfers one or more SQL Server Agent jobs between instances of SQL Server.

  • Transfer Logins Task

    Transfers one or more logins between instances of SQL Server.

  • Transfer Master Stored Procedures Task

    Transfers one or more user-defined Stored Procedures between master databases on instances of SQL Server.

  • Transfer SQL Server Objects Task

    Transfers one or more types of objects in a SQL Server database between instances of SQL Server.

  • Select Objects to Transfer

    Use this dialog box to select objects to transfer from one instance of SQL Server to another by using an Integration Services transfer task.

  • Update Statistics Task

    Updates information about the distribution of key values for one or more statistics groups (collections) in the specified table or indexed view.

  • Web Service Task

    Executes a Web service method.

  • WMI Data Reader Task

    Runs queries using the Windows Management Instrumentation (WMI) Query Language that returns information from WMI about a computer system.

  • WMI Event Watcher Task

    Watches for a Windows Management Instrumentation (WMI) event using a Management Instrumentation Query Language (WQL) event query to specify events of interest.

  • XML Task

    The XML task is used to work with XML data.

Precedence Constraints

Precedence constraints link executables, containers and tasks in packages into a control flow and specify conditions that determine whether the executable run.

execute SQL task

Edit option in Precedence constraints.

constraint

Data Flow Elements

There are mainly the following three types of data flow components:

  • Sources
  • Transformation
  • Destination

The diagram includes the various data flow components:

external column
Image source: TechNet(MSDN).

Sources

Sources extract data from data stores such as tables and views in relational databases, files and Analysis Services databases.

Transformations

Transformations modify, summarize and clean data.

Destinations

Destinations load data into data stores or create in-memory datasets.

destination assistant

  • Source Assistant: Helps to create a source component and connection manager.

  • Destination Assistant: Helps to create a destination component and connection manager.

  • ADO NET Source: The ADO NET source consumes data from a .NET provider and makes the data available to the data flow.

  • ADO NET Destination: The ADO NET destination loads data into a variety of ADO.NET-compliant databases that use a database table or view.

  • ADO.NET Connection Manager: Enables a package to access data sources using a .NET provider.

  • ADO NET Custom Properties: The ADO .NET source has both custom properties and the properties common to all data flow components.

  • ADO Connection Manager: This will enable a package to access data sources using a .NET provider.

  • CDC Flow Components: Help SSIS developers work with CDC and reduce the complexity of CDC packages.

  • Excel Source: The Excel source extracts data from worksheets or ranges in Microsoft Excel workbooks.

  • Excel Destination: The Excel destination loads data into worksheets or ranges in Microsoft Excel workbooks.

  • Excel Connection Manager: Enables a package to connect to an existing Microsoft Excel workbook file.

  • Connect to an Excel Workbook: To connect an SSIS package to an Excel file requires an Excel connection manager.

  • Excel Custom Properties: The Excel source has both custom properties and the properties common to all data flow components.

  • File Connection Manager: Enables a package to reference an existing file or folder, or to create a file or folder at run time.

  • Flat File Source: Reads data from a text file.

  • Flat File Destination: Writes data to a text file.

  • Flat File Connection Manager: This will enable a package to access data in a flat file.

  • Multiple Flat Files Connection Manager: It will help to enable a package to access data in multiple flat files.

  • ODBC Flow Components: For creating an ODBC data flow using SQL Server 2014 Integration Services (SSIS).

  • ODBC Connection Manager: Enable a package to connect to a data source using an OLE DB provider.

  • OLE DB Source: Extracts data from a variety of OLE DB-compliant relational databases using a database table, a view, or an SQL command.

  • OLE DB Destination: Loads data into a variety of OLE DB-compliant databases using a database table or view or an SQL command.

  • OLE DB Connection Manager: Enables a package to connect to a data source by using an OLE DB provider.

  • Raw File Source: Reads raw data from a file.

  • Raw File Destination: Writes raw data to a file.

  • Data Mining Model Training Destination: Trains data mining models by passing the data that the destination receives through the data mining model algorithms.

  • DataReader Destination: Exposes the data in a data flow by using the ADO.NET DataReader interface.

  • Dimension Processing Destination: Loads and processes an SQL Server Analysis Services dimension.

  • Partition Processing Destination: Loads and processes an SQL Server Analysis Services partition.

  • Recordset Destination: Creates and populates an in-memory ADO recordset.

  • SQL Server Compact Edition Destination: Writes data to SQL Server Compact databases.

  • SQL Server Compact Edition Connection Manager: Enables a package to connect to a SQL Server Compact database.

  • SQL Server Destination: Connects to a local SQL Server database and bulk loads data into SQL Server tables and views.
  • XML Source: Reads an XML data file and populates the columns in the source output with the data.

  • Connect to an Access Database: Connect an SSIS package to access database require an OLEDB connection manager and a data provider.

  • Integration Services Paths: A path connects two components in a data flow by connecting the output of one data flow component to the input of another component.

  • Data in Data Flows: A set of data types that are used in data flows like data type conversion, data type comparison, Data Flow Troubleshooting and so on.

  • Error Handling in Data: Handles the errors.

  • Data Flow Tab: Creates data flows in a package.

  • Data Flow Task: Encapsulates the data flow engine that moves data between sources and destinations.

  • Data Viewer: Displays the data buffer by buffer as data moves between two data flow components.

  • OData Connection Manager: Enables a package to connect to an OData source.

  • OData Source: Consumes data from Open Data Protocol (OData) services.

  • Integration Services Transformations: Are the components in the data flow of a package that aggregates, merges, distributes and modifies data.

  • Aggregate Transformation: Does the aggregate functions, for example: Average.

  • Audit Transformation: Enables the data flow in a package to include data about the environment in which the package runs.

  • Balanced Data Distributor Transformation: Distributes buffers of the incoming rows uniformly across outputs on separate threads.

  • Character Map Transformation: Does the string functions like lower to upper and so on.

  • Conditional Split Transformation: Can route data rows to multiple outputs depending on the content of the data.

  • Copy Column Transformation: Creates new columns by copying input columns and adding the new columns to the transformation output.

  • Data Conversion Transformation: Converts the data in an input column to a different data type and then copies it to a new output column.

  • Data Mining Query Transformation: Performs prediction queries against data mining models.

  • DQS Cleansing Transformation: Performs the Data quality services.

  • Derived Column Transformation: Creates new column values by applying expressions to transformation input columns.

  • Export Column Transformation: Reads data in a data flow and inserts the data into a file.

  • Fuzzy Grouping Transformation: Identifies rows of data that are likely to be duplicates and selects a canonical row of data to use in standardizing the data.

  • Fuzzy Lookup Transformation: Performs data cleaning tasks such as standardizing data, correcting data and providing missing values.

  • Import Column Transformation: Reads data from files and adds the data to columns in a data flow.

  • Lookup Transformation: Performs lookups by joining data in input columns with columns in a reference dataset.

  • Merge Transformation: combines two sorted datasets into a single dataset.

  • Merge Join Transformation: Provides an output that is generated by joining two sorted datasets using a FULL, LEFT, or INNER joins.

  • Multicast Transformation: Distributes its input to one or more outputs.

  • OLE DB Command Transformation: Runs an SQL statement for each row in a data flow.

  • Percentage Sampling Transformation: Creates a sample data set by selecting a percentage of the transformed input rows.

  • Pivot Transformation: Makes a normalized data set into a less normalized, but more compact version by pivoting the input data on a column value.
  • Row Count Transformation: Counts rows as they pass through a data flow and stores the final count in a variable.

  • Row Sampling Transformation: Obtains a randomly selected subset of an input dataset.

  • Script Component: Hosts a script and enables a package to include and run custom script code.

  • Slowly Changing Dimension Transformation: Coordinates the updating and inserting of records in a data warehouse dimension tables.

  • Sort Transformation: Sorts input data in ascending or descending order and copies the sorted data to the transformation output.

  • Term Extraction Transformation: Extracts terms from text in a transformation input column and then writes the terms to a transformation output column.

  • Term Lookup Transformation: Matches terms extracted from text in a transformation input column with terms in a reference table.

  • Union All Transformation: Combines multiple inputs into one output.

  • Unpivot Transformation: Makes a normalized dataset into a more normalized version by expanding values from multiple columns in a single record into multiple records with the same values in a single column.

Integration Services Connections

SSIS packages use connections to do various tasks and to implement Integration Services.

Some of the examples, like connecting to source and destination, connecting to relational databases, connecting to Analysis Services and so on.

Connection Managers

SQL Server Integration Services provides various types of connection managers that enable packages to connect to a variety of data sources and servers:

  • Built-in connection managers.
  • Available for download from the Microsoft Web site.
  • Custom connection manager.

The following are some of the examples:

  • ADO
  • ADO.NET
  • CACHE
  • Excel
  • FILE
  • FLATFILE
  • FTP
  • HTTP
  • MSMQ
  • MSOLAP100
  • MULTIFILE
  • MULTIFLATFILE
  • OLEDB
  • ODBC
  • SMOServer
  • SMTP
  • SQLMOBILE
  • WMI

References

SQL Server Integration Services

SQL Server Integration Services

Conclusion

In this article we learned some basics of SQL Server Integration Services.