Slowly Changing Dimensions (SCD) Types

Slowly Changing Dimensions (SCD) are an important data warehouse concept used to preserve the history of data in an analytical system. Because a data warehouse is used to analyze historical data, it is essential to store the different states that data passes through. In data warehousing, we have fact and dimension tables to store the data; dimension tables are used to analyze the measures in the fact tables. In a typical data environment, data originates in operational databases and is extracted, transformed, and loaded (ETL) into the data warehouse to suit the analytical environment. Customer and Product are examples… Read more
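For the most common case, a Type 2 change, the current row for a business key is closed off and a new row is inserted so the old state remains queryable. Below is a minimal sketch of that logic using pandas; the table, column names and dates are illustrative assumptions, not taken from the article.

```python
import pandas as pd

# Current state of the customer dimension (SCD Type 2): one row per version.
dim_customer = pd.DataFrame({
    "customer_key": [1],                              # surrogate key
    "customer_id": ["C001"],                          # natural/business key
    "city": ["London"],
    "effective_from": [pd.Timestamp("2020-01-01")],
    "effective_to": [pd.Timestamp("9999-12-31")],
    "is_current": [True],
})

# Incoming change from the operational system: the customer has moved.
change = {"customer_id": "C001", "city": "Paris",
          "load_date": pd.Timestamp("2024-06-01")}

# Step 1: expire the currently active row for that business key.
current = (dim_customer["customer_id"] == change["customer_id"]) & dim_customer["is_current"]
dim_customer.loc[current, "effective_to"] = change["load_date"]
dim_customer.loc[current, "is_current"] = False

# Step 2: insert a new current row with a new surrogate key,
# keeping the old row so history can still be reported on.
new_row = pd.DataFrame([{
    "customer_key": dim_customer["customer_key"].max() + 1,
    "customer_id": change["customer_id"],
    "city": change["city"],
    "effective_from": change["load_date"],
    "effective_to": pd.Timestamp("9999-12-31"),
    "is_current": True,
}])
dim_customer = pd.concat([dim_customer, new_row], ignore_index=True)
print(dim_customer)
```

A Type 1 change would instead simply overwrite the city on the existing row and keep no history at all.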

Types of Dimensions in Data warehouse

What is a dimension? A dimension table contains descriptive data about the business. The primary keys of the dimension tables are referenced by foreign keys in the fact tables, and the remaining columns in the dimension hold information about the objects related to the business, e.g. Product, Customer, Orders, Company, Date, etc. Slowly changing dimensions refer to how data in your data warehouse changes over time: rows keep the same natural key, but the other data columns may or may not change over time depending on the SCD type of the dimension. Slowly changing dimensions are important in data… Read more
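As a quick illustration of how a dimension gives meaning to the measures in a fact table, here is a small pandas sketch; the product and sales tables and their column names are made up for the example.

```python
import pandas as pd

# Dimension table: descriptive attributes, keyed by a surrogate key.
dim_product = pd.DataFrame({
    "product_key": [10, 11],
    "product_name": ["Keyboard", "Mouse"],
    "category": ["Accessories", "Accessories"],
})

# Fact table: numeric measures plus foreign keys pointing at dimension rows.
fact_sales = pd.DataFrame({
    "product_key": [10, 10, 11],
    "date_key": [20240601, 20240602, 20240602],
    "sales_amount": [45.0, 45.0, 19.0],
})

# Joining the fact to the dimension lets us analyse the measure
# by the dimension's descriptive attributes.
report = (
    fact_sales.merge(dim_product, on="product_key")
              .groupby("product_name", as_index=False)["sales_amount"]
              .sum()
)
print(report)
```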

Star Schema vs Snowflake Schema: differences

What is a star schema? A star schema is a database schema that arranges data in a star-shaped layout. It consists of a central table, called the “fact table,” and a number of other tables connected directly to it, called “dimension tables.” The fact table contains information about metrics or measures, while the dimension tables contain information about descriptive attributes. The star schema is very simple and easy to understand, making it ideal for cloud data warehousing and business intelligence applications. What is a snowflake schema? A snowflake schema is a type of database schema that is used to store data in a more… Read more
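To make the difference concrete, the small pandas sketch below contrasts the two layouts: in a star schema the fact table joins one denormalized dimension, while in a snowflake schema the same attribute sits in a normalized sub-dimension and costs an extra join. Table and column names are illustrative assumptions, not from the article.

```python
import pandas as pd

fact_sales = pd.DataFrame({"product_key": [10, 11], "sales_amount": [45.0, 19.0]})

# Star schema: one denormalized dimension, so a single join from the fact table.
dim_product_star = pd.DataFrame({
    "product_key": [10, 11],
    "product_name": ["Keyboard", "Mouse"],
    "category_name": ["Accessories", "Accessories"],
})
star = fact_sales.merge(dim_product_star, on="product_key")

# Snowflake schema: the dimension is normalized, so the category lives in its
# own table and reaching the same attribute costs an extra join.
dim_product_snow = pd.DataFrame({
    "product_key": [10, 11],
    "product_name": ["Keyboard", "Mouse"],
    "category_key": [1, 1],
})
dim_category = pd.DataFrame({"category_key": [1], "category_name": ["Accessories"]})
snowflake = (
    fact_sales.merge(dim_product_snow, on="product_key")
              .merge(dim_category, on="category_key")
)

# Both layouts answer the same question; the star version needs fewer joins.
print(star.groupby("category_name")["sales_amount"].sum())
print(snowflake.groupby("category_name")["sales_amount"].sum())
```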

Difference between PySpark and Python

PySpark is the Python API for Apache Spark. Spark itself is written in the Scala programming language, and PySpark lets you use Python to work with it and deal with data. Spark is a big data computational engine, whereas Python is a programming language. To work with PySpark, one needs to have basic knowledge of both Python and Spark. The market demand for PySpark and Python is expected to increase over the next few years. Both have their own features, limitations, and differences, so let’s check in which aspects they differ. PySpark: PySpark is a Python-based API used for the Spark implementation and… Read more
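The snippet below shows the same aggregation written first in plain Python and then with the PySpark DataFrame API; it assumes a local pyspark installation and uses made-up sample data.

```python
# Plain Python: the aggregation runs in a single process on one machine.
orders = [("Keyboard", 45.0), ("Mouse", 19.0), ("Keyboard", 45.0)]
totals = {}
for product, amount in orders:
    totals[product] = totals.get(product, 0.0) + amount
print(totals)

# PySpark: the same aggregation expressed through the Spark DataFrame API,
# which can distribute the work across a cluster (here it just runs locally).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[*]").appName("pyspark-vs-python").getOrCreate()
df = spark.createDataFrame(orders, ["product", "amount"])
df.groupBy("product").agg(F.sum("amount").alias("total")).show()
spark.stop()
```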

ACID Properties

In the context of database transaction processing, the acronym ACID refers to the four key properties of a transaction: Atomicity, Consistency, Isolation, and Durability. Atomicity: all changes to data are performed as if they were a single operation; that is, either all of the changes are performed or none of them are. For example, in an application that transfers funds from one account to another, the atomicity property ensures that if a debit is made successfully from one account, the corresponding credit is made to the other account. Consistency: data is in a consistent state when a transaction starts and when… Read more
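A minimal sketch of that funds-transfer example using Python's built-in sqlite3 module; the accounts table, names and amounts are invented for illustration. The debit and credit either both commit or are both rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "  name TEXT PRIMARY KEY,"
    "  balance REAL NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100.0), ("bob", 50.0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
    except sqlite3.Error:
        # The CHECK constraint fires if src would go negative,
        # and neither update is persisted.
        print(f"transfer of {amount} rolled back")

transfer(conn, "alice", "bob", 30.0)   # succeeds: both rows change together
transfer(conn, "alice", "bob", 500.0)  # fails: balances are left untouched
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
```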

SSIS – Non Blocking, Partially Blocking and Full Blocking

Data flow transformations in SSIS use memory/buffers in different ways, and the way a transformation uses memory can impact the performance of your package. Based on their memory/buffer usage, transformations are classified into 3 categories: 1. Non Blocking, 2. Semi Blocking, 3. Full Blocking. All the data flow components can also be categorized as either synchronous or asynchronous. Synchronous vs asynchronous: the output of a synchronous component uses the same buffer as the input. Reusing the input buffer is possible because the output of a synchronous component always contains exactly the same number of records as the input. … Read more
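SSIS transformations are configured in the designer rather than written as code, but the buffering idea can be illustrated with a small Python analogy (purely illustrative, not SSIS code): a row-by-row filter behaves like a non-blocking transformation and emits output as soon as each input row arrives, while a sort behaves like a full-blocking one and must consume every row before producing any output.

```python
def source_rows():
    """Pretend data source that yields rows one at a time."""
    for value in [5, 3, 8, 1]:
        print(f"source emits {value}")
        yield value

def filter_rows(rows):
    """Non-blocking style: each output row is produced as soon as its
    input row arrives, so rows stream straight through the same 'buffer'."""
    for value in rows:
        if value > 2:
            yield value

def sort_rows(rows):
    """Full-blocking style: every input row must be collected before
    the first output row can be produced."""
    buffered = list(rows)          # consumes the entire input first
    for value in sorted(buffered):
        yield value

print("streaming filter (non-blocking):")
for value in filter_rows(source_rows()):
    print(f"  output {value}")

print("sort (full blocking):")
for value in sort_rows(source_rows()):
    print(f"  output {value}")
```

Running it shows the filter's output interleaved with the source rows, whereas the sort prints nothing until the source is exhausted.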

SSIS to SQL Server Data Types Matching

| SSIS Data Type | SSIS Expression | SQL Server |
| --- | --- | --- |
| single-byte signed integer | DT_I1 | |
| two-byte signed integer | DT_I2 | smallint |
| four-byte signed integer | DT_I4 | int |
| eight-byte signed integer | DT_I8 | bigint |
| single-byte unsigned integer | DT_UI1 | tinyint |
| two-byte unsigned integer | DT_UI2 | |
| four-byte unsigned integer | DT_UI4 | |
| eight-byte unsigned integer | DT_UI8 | |
| float | DT_R4 | real |
| double-precision float | DT_R8 | float |
| string | DT_STR, «length», «code_page» | char, varchar |
| Unicode text stream | DT_WSTR, «length» | nchar, nvarchar, sql_variant, xml |
| date | DT_DATE | date |
| Boolean | DT_BOOL | bit |
| numeric | DT_NUMERIC, «precision», «scale» | decimal, numeric |
| decimal | DT_DECIMAL, «scale» | decimal |
| currency | DT_CY | smallmoney, money |
| unique identifier | DT_GUID | uniqueidentifier |
| byte stream | DT_BYTES, «length» | binary, varbinary, timestamp |
… Read more

SSIS Data Flow Tasks with Descriptions

| S No | Transformation | Description |
| --- | --- | --- |
| 1 | Aggregate | Summing / averaging a total of the products purchased by a customer online to produce the final amount. |
| 2 | Audit | When you want to load audit information (Created Date, User Name, Server Name, etc.) into your destination table. |
| 3 | Character Map | String manipulation, like lowercase, uppercase, etc. |
| 4 | Conditional Split | When you want to split your data to many destinations based on conditions. |
| 5 | Copy Column | To have the same column data with an alias name. |
| 6 | Data Conversion | To convert the data type (Unicode string → string, …). |
| 7 | Data Mining Query | Evaluating the input data against… |
… Read more

Log Shipping Complete Tutorial Part 2 – Step by Step Log Shipping

Permissions: to set up log shipping you must have sysadmin rights on the server. Minimum requirements: SQL Server 2005 or later, in the Standard, Workgroup or Enterprise edition, must be installed on all server instances involved in log shipping; the servers involved in log shipping should have the same case-sensitivity settings; the database must use the full or bulk-logged recovery model; a shared folder is needed for copying the T-Log backup files; and the SQL Server Agent service must be configured properly. In addition, you should use the same version of SQL Server on both ends. It is possible to log ship from SQL 2005 to… Read more

Log Shipping Complete Tutorial Part 1

What is SQL Server log shipping? SQL Server log shipping is a technique that involves two or more SQL Server instances and the copying of transaction log backups from one instance to another. The process of transferring and restoring the transaction log files is automated across the SQL Servers, and as a result there are two copies of the data in two separate locations. A log shipping session involves the following steps: backing up the transaction log file on the primary SQL Server instance; copying the transaction log backup file across the network to one or more secondary… Read more
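In practice the backup, copy and restore jobs are created for you by SQL Server Agent when log shipping is configured, but the three steps can be sketched in Python; the snippet below assumes pyodbc is installed and uses made-up server names, database name and share paths.

```python
import shutil
import pyodbc

# Illustrative names only; real deployments rely on the log shipping jobs that
# SQL Server Agent creates rather than a hand-written script like this.
PRIMARY = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=PRIMARY01;Trusted_Connection=yes"
SECONDARY = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=SECONDARY01;Trusted_Connection=yes"
BACKUP_FILE = r"\\PRIMARY01\logship\Sales_0001.trn"
COPY_FILE = r"\\SECONDARY01\logship\Sales_0001.trn"

def run(conn_str, sql):
    # BACKUP/RESTORE cannot run inside a user transaction, hence autocommit=True;
    # draining the result sets lets the command finish before we disconnect.
    conn = pyodbc.connect(conn_str, autocommit=True)
    cursor = conn.cursor()
    cursor.execute(sql)
    while cursor.nextset():
        pass
    conn.close()

# Step 1: back up the transaction log on the primary instance.
run(PRIMARY, f"BACKUP LOG [Sales] TO DISK = N'{BACKUP_FILE}'")

# Step 2: copy the backup file across the network to the secondary server.
shutil.copy2(BACKUP_FILE, COPY_FILE)

# Step 3: restore it on the secondary, leaving the database ready to accept
# the next log backup (WITH NORECOVERY).
run(SECONDARY, f"RESTORE LOG [Sales] FROM DISK = N'{COPY_FILE}' WITH NORECOVERY")
```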