
SSIS – Non-Blocking, Partially Blocking and Fully Blocking

Data flow transformations in SSIS use memory buffers in different ways, and the way a transformation uses memory can affect the performance of your package. Transformations are classified into three categories according to their buffer usage:

  1. Non-blocking
  2. Semi-blocking
  3. Fully blocking

Every data flow component is also either synchronous or asynchronous.

Synchronous vs Asynchronous:

  • Synchronous components: The output of a synchronous component uses the same buffer as the input. The input buffer can be reused because the output of a synchronous component always contains exactly the same number of records as the input: number of records IN == number of records OUT.
  • Asynchronous components: The output of an asynchronous component uses a new buffer. The input buffer cannot be reused because an asynchronous component can output more or fewer records than it receives.

The main point is that synchronous components reuse buffers and are therefore generally faster than asynchronous components.
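The buffer distinction can be pictured with a small Python analogy. This is only a sketch: real SSIS buffers are managed by the data flow engine, a "buffer" here is just a list of row dictionaries, and the function names `uppercase_in_place` and `totals_into_new_buffer` are hypothetical.

```python
# Analogy only: a "buffer" is modelled as a Python list of row dicts.

def uppercase_in_place(buffer):
    """Synchronous-style step: changes rows inside the buffer it was given."""
    for row in buffer:
        row["customer"] = row["customer"].upper()
    return buffer  # same list object, same number of rows


def totals_into_new_buffer(buffer):
    """Asynchronous-style step: builds a new buffer with a different row count."""
    totals = {}
    for row in buffer:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return [{"customer": c, "total": t} for c, t in totals.items()]


rows = [
    {"customer": "ann", "amount": 10},
    {"customer": "bob", "amount": 25},
    {"customer": "ann", "amount": 40},
]

same = uppercase_in_place(rows)
fresh = totals_into_new_buffer(rows)

print(same is rows)            # the input buffer was reused
print(fresh is rows)           # a separate buffer was created
print(len(rows), len(fresh))   # row counts in and out differ for the aggregate
```

The first step behaves like a Character Map (rows in == rows out, nothing copied), while the second behaves like an Aggregate, which has to allocate new storage because its output has a different shape and row count.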

All source adapters are asynchronous; they create two buffers, one for the success output and one for the error output.

All destination adapters, on the other hand, are synchronous.

Here are some tips that will help you minimize your use of asynchronous transformations:

  • Instead of using a Merge Join in your Data Flow, perform the joins in the source query or in a transform environment.
  • If you absolutely have to use Merge Join, make sure that you sort the data in your source query. In the Advanced Editor, set the “IsSorted” property to true, and set the Sort Key Position on the Output Columns to the appropriate values.
  • Instead of using an Aggregate transform to perform a count, consider using Row Count, which is a synchronous transformation.
  • If you are using a Pivot or Unpivot transformation and it is performing poorly, consider using staging tables in your solution in order to leverage your SQL Server environment to perform these transformations instead of doing it in SSIS.
  • Even though Union All is a semi-blocking asynchronous transformation, you will probably not achieve better performance by replacing the Union All with multiple inserts.
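
As a rough illustration of the first two tips, the sketch below pushes both the join and the sort into the source query. It uses Python's built-in `sqlite3` module as a stand-in for the real source database, and the `customers` and `orders` tables are hypothetical. In an actual package the query would be the SQL command of the source component, with "IsSorted" and the Sort Key Position set to match the ORDER BY.

```python
import sqlite3

# In-memory SQLite database standing in for the relational source (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 2, 25.0), (11, 1, 10.0), (12, 1, 40.0);
""")

# The join and the sort happen in the database, so the data flow receives
# rows that are already joined and ordered by customer_id.
source_query = """
    SELECT c.customer_id, c.name, o.order_id, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
"""
joined_rows = conn.execute(source_query).fetchall()

for row in joined_rows:
    print(row)
```

With the join done in the source, the data flow no longer needs a Merge Join or a Sort for this step, which removes one semi-blocking and one fully blocking transformation from the package.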

| Property | Non-blocking | Semi-blocking | Fully blocking |
|---|---|---|---|
| Synchronous or asynchronous | Synchronous | Asynchronous | Asynchronous |
| Number of rows in == number of rows out | True | Usually False | Usually False |
| Must read all input before they can output | False | False | True |
| New buffer created? | False | True | True |
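
The difference between the three categories compared above comes down to how much input a transformation must read before it can emit its first output row. The following Python sketch mimics that with generators. It is an analogy, not the SSIS engine: `source`, `derived_column`, `merge_sorted`, `sort_rows` and `first_output` are hypothetical names, and `heapq.merge` stands in for a Merge over two sorted inputs.

```python
import heapq

events = []  # records every row read from a source


def source(name, values):
    """Yield rows one at a time and log each read."""
    for v in values:
        events.append(f"read {name}:{v}")
        yield v


def derived_column(rows):
    """Non-blocking: each row is emitted as soon as it is read."""
    for v in rows:
        yield v * 10


def merge_sorted(left, right):
    """Semi-blocking: needs rows from both inputs, but not all of them."""
    return heapq.merge(left, right)


def sort_rows(rows):
    """Fully blocking: consumes every input row before emitting the first."""
    yield from sorted(rows)


def first_output(rows):
    """Pull one output row and report how many source reads it required."""
    events.clear()
    first = next(rows)
    return first, len(events)


non_blocking = first_output(derived_column(source("a", [3, 1, 2])))
semi_blocking = first_output(
    merge_sorted(source("a", [1, 4, 6]), source("b", [2, 3, 5]))
)
fully_blocking = first_output(sort_rows(source("a", [3, 1, 2])))

print("non-blocking  :", non_blocking)
print("semi-blocking :", semi_blocking)
print("fully blocking:", fully_blocking)
```

The non-blocking step produces a row after a single read, the merge has to look at both inputs but only at part of each, and the sort cannot produce anything until it has read the whole input. That last behaviour is why fully blocking transformations are the most expensive in memory.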

SSIS transformations categorized:

| Non-blocking transformations | Semi-blocking transformations | Blocking transformations |
|---|---|---|
| Audit | Data Mining Query | Aggregate |
| Character Map | Merge | Fuzzy Grouping |
| Conditional Split | Merge Join | Fuzzy Lookup |
| Copy Column | Pivot | Row Sampling |
| Data Conversion | Unpivot | Sort |
| Derived Column | Term Lookup | Term Extraction |
| Lookup | Union All | |
| Multicast | | |
| Percent Sampling | | |
| Row Count | | |
| Script Component | | |
| Export Column | | |
| Import Column | | |
| Slowly Changing Dimension | | |
| OLE DB Command | | |

SSIS Data Flow Transformations with Descriptions

| S No | Transformation | Description |
|---|---|---|
| 1 | Aggregate | Summing or averaging the products purchased by a customer online to produce the final amount. |
| 2 | Audit | Adding audit information (created date, user name, server name, etc.) to the rows loaded into your destination table. |
| 3 | Character Map | String manipulation, such as converting to lowercase or uppercase. |
| 4 | Conditional Split | Splitting your data across several destinations based on conditions. |
| 5 | Copy Column | Creating a copy of a column's data under an alias name. |
| 6 | Data Conversion | Converting a column's data type (for example, Unicode string → string). |
| 7 | Data Mining Query | Running prediction queries against a data mining model using the input data. |
| 8 | Derived Column | Adding a courtesy title (Mr., Mrs., Dr., etc.) before the name and removing leading and trailing spaces. |
| 9 | Export Column | Writing file data held in a column (documents, PDFs, images) out to files in a particular folder. |
| 10 | Fuzzy Grouping | Grouping customer rows whose names are likely duplicates of each other so that they can be standardized. |
| 11 | Fuzzy Lookup | Matching a customer name against a reference (master) table even when the values do not match exactly. |
| 12 | Import Column | Loading files (documents, PDFs, images) received from different systems and saved under a particular folder into a column of the data flow, so they can be mapped to the master table. |
| 13 | Lookup | Joining incoming rows to a reference table, for example mapping region-wise data to employee master information. |
| 14 | Merge | Combining two sorted inputs, such as master and child employee data, into a single dataset. |
| 15 | Merge Join | Joining two sorted inputs, such as master and child employee tables, into a single dataset using an inner, left outer or full outer join. |
| 16 | Multicast | Similar to Conditional Split, but every row is sent to all of the outputs. |
| 17 | OLE DB Command | Running a SQL statement for each row of the data flow, for example an update that flags a message as sent for every customer who made a payment today. |
| 18 | Percentage Sampling | Taking a sample of a given percentage of the rows, similar to "SELECT TOP 10 PERCENT" in T-SQL. |
| 19 | Pivot | Converting rows to columns. |
| 20 | Row Count | Counting the rows that pass through and storing the count in a variable. |
| 21 | Row Sampling | Taking a sample of a given number of rows, similar to "SELECT TOP 10" in T-SQL. |
| 22 | Script Component | Used where we need custom code or .NET Framework assemblies. |
| 23 | Slowly Changing Dimension | Used when we need to keep historic values of dimension data. |
| 24 | Sort | Sorting the data to get the desired result, for example to find the customer who made the highest payment on a particular day. |
| 25 | Term Extraction | Extracting terms (nouns and noun phrases) from a large body of text into a formatted output set. |
| 26 | Term Lookup | Counting how often terms from a reference table occur in a large body of text. |
| 27 | Union All | Combining data from several inputs into a single dataset. |
| 28 | Unpivot | The reverse of the Pivot operation (columns to rows). |
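
Conditional Split and Multicast are easy to confuse, so here is a minimal Python sketch of the routing difference. It is an illustration only: the function names, the output names and the sample payments are hypothetical, and each output is modelled as a plain list.

```python
def conditional_split(rows, conditions, default="default"):
    """Send each row to the first output whose condition matches."""
    outputs = {name: [] for name in conditions}
    outputs[default] = []
    for row in rows:
        for name, predicate in conditions.items():
            if predicate(row):
                outputs[name].append(row)
                break
        else:
            # No condition matched, so the row goes to the default output.
            outputs[default].append(row)
    return outputs


def multicast(rows, output_names):
    """Send every row to every output."""
    return {name: list(rows) for name in output_names}


payments = [
    {"customer": "Ann", "amount": 10},
    {"customer": "Bob", "amount": 250},
    {"customer": "Cy", "amount": 75},
]

split = conditional_split(
    payments,
    {
        "large": lambda r: r["amount"] >= 100,
        "medium": lambda r: r["amount"] >= 50,
    },
)
copies = multicast(payments, ["archive", "reporting"])

print({name: len(rows) for name, rows in split.items()})
print({name: len(rows) for name, rows in copies.items()})
```

Each payment lands in exactly one output of the split, whereas the multicast hands the full set of payments to both outputs.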

SSAS STORAGE MODES

In SQL Server Analysis Services 2008, we have three storage mode options available to us: Relational Online Analytical Processing (ROLAP), Multidimensional Online Analytical Processing (MOLAP) and Hybrid Online Analytical Processing (HOLAP). There are advantages and disadvantages to each, so I figured I’d take a few minutes to give a quick overview describing the storage modes and laying out some of the pros and cons of each.

Relational Online Analytical Processing (ROLAP)

The ROLAP storage mode allows the detail data and aggregations to be stored in the relational database. If you plan on using ROLAP, you need to make sure that your database is carefully designed or you’ll run into some bad performance issues.

Pros:

  • Since the data is kept in the relational database instead of on the OLAP server, you can view the data in almost real time.
  • Also, since the data is kept in the relational database, it allows for much larger amounts of data, which can mean better scalability.
  • Low latency.

Cons:

  • With all the data being stored in the relational database, query performance is going to be much slower than MOLAP.
  • You must maintain a permanent connection with the relational database to use ROLAP.

Multidimensional Online Analytical Processing (MOLAP)

MOLAP is the default and thus most frequently used storage mode. With MOLAP storage, the data and aggregations are stored in a multidimensional format, compressed and optimized for performance. This is both good and bad. When a cube with MOLAP storage is processed, the data is pulled from the relational database, the aggregations are performed, and the data is stored in the AS database.
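
The trade-off between the two modes described so far can be sketched in a few lines of Python. This is an analogy under stated assumptions, not how Analysis Services is implemented: an in-memory SQLite database plays the relational source, a dictionary plays the processed cube, and `process_cube` and `rolap_total` are hypothetical names.

```python
import sqlite3

# In-memory SQLite database standing in for the relational source (assumption).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100.0), ("West", 50.0), ("East", 25.0)],
)


def process_cube():
    """MOLAP-style: pull the data, aggregate it, and keep a separate copy."""
    rows = db.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows)


def rolap_total(region):
    """ROLAP-style: answer every query from the relational source."""
    row = db.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


cube = process_cube()                                   # snapshot at processing time
db.execute("INSERT INTO sales VALUES ('West', 30.0)")   # the source changes later

molap_before = cube["West"]       # still the old snapshot
rolap_now = rolap_total("West")   # sees the new row immediately
cube = process_cube()             # reprocessing refreshes the cube
molap_after = cube["West"]

print(molap_before, rolap_now, molap_after)
```

The dictionary lookup is cheap but stale until the cube is reprocessed, while the live query is always current but pays the cost of hitting the relational database every time.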

Pros:

  • Since the data is stored on the OLAP server in optimized format, queries (even complex calculations) are faster than ROLAP.
  • The data is compressed so it takes up less space.
  • And because the data is stored on the OLAP server, you don’t need to keep the connection to the relational database.
  • Cube browsing is fastest using MOLAP.

Cons:

  • Because you don’t have a real time connection to the relational database, you need to frequently process the cube to update your data.
  • If there’s a large amount of data, processing is going to take longer.
  • There’s also an additional amount of storage since a copy of the relational database is kept on the OLAP server.
  • High latency.

Hybrid Online Analytical Processing (HOLAP)

HOLAP is a combination of MOLAP and ROLAP. HOLAP stores the detail data in the relational database but stores the aggregations in multidimensional format. Because of this, the aggregations will need to be processed when changes occur. With HOLAP you kind of have medium query performance: not as slow as ROLAP, but not as fast as MOLAP. If, however, you were only querying aggregated data or using a cached query, query performance would be similar to MOLAP. But when you need to get that detail data, performance is closer to ROLAP.
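
The way HOLAP splits the work can be pictured with a small Python sketch. It is an analogy only: an in-memory SQLite database stands in for the relational database, a dictionary stands in for the multidimensional aggregations, and `region_total` and `region_orders` are hypothetical names.

```python
import sqlite3

# Detail data stays in the relational store (SQLite used as a stand-in).
relational = sqlite3.connect(":memory:")
relational.execute(
    "CREATE TABLE sales (region TEXT, order_id INTEGER, amount REAL)"
)
relational.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", 1, 100.0), ("West", 2, 50.0), ("East", 3, 25.0)],
)

# Aggregations are precomputed and held separately, as in HOLAP.
aggregations = dict(
    relational.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
)


def region_total(region):
    """Aggregate query: answered from the precomputed aggregations."""
    return aggregations[region], "multidimensional"


def region_orders(region):
    """Detail query: has to go back to the relational store."""
    rows = relational.execute(
        "SELECT order_id, amount FROM sales WHERE region = ? ORDER BY order_id",
        (region,),
    ).fetchall()
    return rows, "relational"


total_result = region_total("East")
detail_result = region_orders("East")

print(total_result)
print(detail_result)
```

Queries that stay at the aggregate level never touch the relational store, which is where the MOLAP-like speed comes from. Any request for individual orders falls through to the relational database and runs at ROLAP-like speed.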

Pros:

  • HOLAP is best used when large amounts of aggregations are queried often with little detail data, offering high performance and lower storage requirements.
  • Cubes are smaller than MOLAP since the detail data is kept in the relational database.
  • Processing time is less than MOLAP since only aggregations are stored in multidimensional format.
  • Low latency since processing takes place when changes occur and detail data is kept in the relational database.

Cons:

  • Query performance can head downhill fast when more detail data is queried from the relational database.