Tuesday, December 13, 2022

Azure DevOps - Quick Solution - "No hosted parallelism has been purchased or granted" Error -- Fixed!

With its new policy, Microsoft no longer gives you a free grant of hosted agents by default; the free parallelism grant has to be requested.

To build your code or deploy your software using Azure Pipelines, you need at least one agent. As you add more code and people, you'll eventually need more.

When your pipeline runs, the system begins one or more jobs. An agent is computing infrastructure with installed agent software that runs one job at a time.

2 types of Agents:

1.      Microsoft Hosted Agents

2.      Self-Hosted Agents

Microsoft-hosted: Microsoft hosts and manages the agent for you

Self-hosted: you host the agent that runs the job yourself

 

If you do not want to wait the 2 business days for the free grant to be approved, or you are just running tests, you can host a self-hosted agent on a virtual machine or on your own local machine.






Monday, November 14, 2022

Azure Data Factory - Copy multiple SQL tables incrementally using a watermark table (Delta Load) ADF



Here we are doing an incremental copy of multiple SQL tables from one database to another database.

Please refer to the video below for a step-by-step explanation of the pipeline for a single SQL database:

https://youtu.be/AOClU3s9jXw


For this video we connected to a SQL database in an on-premises environment. To learn how to set up the Self-Hosted Integration Runtime,

Please refer:

https://youtu.be/d9Xp2pnjYcI








Wednesday, October 26, 2022

Azure Data Factory - Copying files from On premise Linux or Unix Server to Azure Datalake using ADF

Here we are copying files from a Linux virtual machine to ADLS. We create a Samba file share on Linux and add that file share to Windows; using the self-hosted IR we access the Linux file share and copy the files from Linux to Azure Data Lake.




Link for configuring the Self-Hosted Integration Runtime to access on-premises Windows files:

https://youtu.be/d9Xp2pnjYcI


To install samba utility:

-- sudo apt install samba -y


To enable and start the Samba service:

-- sudo systemctl enable --now smbd


To check status of service:

-- sudo systemctl status smbd


To create a samba user:

-- sudo smbpasswd -a (username)


Configuring Samba share from Linux to Windows

Copy files from Linux to Azure Datalake using ADF


Tuesday, September 06, 2022

Azure Data Factory - Create a Customer Managed Key and encrypt ADF using that CMK



By default, data is encrypted with a Microsoft-managed key that is uniquely assigned to your data factory. For extra security, you can use the customer-managed keys feature in Azure Data Factory.

When you specify a customer-managed key, Data Factory uses both the factory system key and the CMK to encrypt customer data. Missing either one results in access to the data and the factory being denied.


Reference Documentation Link:  https://docs.microsoft.com/en-us/azure/data-factory/enable-customer-managed-key




Monday, September 05, 2022

Azure Data Factory - Send an alert email when a blob is created or deleted from ADLS Gen2 using ADF




Alert notification is an important part of ADF, and we can achieve it by following the steps in the video. Here we receive email alerts in Gmail whenever a file is created or deleted in our storage account.


Refer to the video below to get a custom alert email when a pipeline fails: https://youtu.be/KorFyv5FntY





Tuesday, August 30, 2022

Azure Data Factory - Copy multiple files from HTTP website dynamically to Data Lake using ADF



Here we are copying multiple files dynamically from an HTTP website. We are using GitHub as the source website to copy multiple files to Azure Data Lake Gen2.


Previous video on copying a single HTTP file: https://youtu.be/PNN5VPoP2zQ




Azure Data Factory - Copy data from HTTP website (GitHub) to Azure Data lake using ADF



Here we are copying a CSV file from an HTTP website (GitHub) to Azure Data Lake Storage using the Copy activity in ADF.


Refer to the link below for copying multiple files from an HTTP website dynamically to ADLS:  https://youtu.be/K5ND-pyD3yE




Monday, August 29, 2022

Azure Data Factory - Incremental load or Delta load using a watermark Table

Here the source receives incoming rows stamped with a date value. We keep a watermark table that records how far the data has already been copied, which lets us incrementally load only the new data into the destination and avoid copying old data. A conceptual sketch of the pattern follows the setup link below.


Refer to the video below to set up a self-hosted IR to use SQL Server from a local machine:

https://youtu.be/d9Xp2pnjYcI
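
For intuition, here is a minimal sketch of the watermark pattern written in Python with pyodbc, outside ADF. In the pipeline itself this logic is built with Lookup, Copy and Stored Procedure activities; the connection strings, the WatermarkTable/SourceTable names and the LastModifiedTime column below are illustrative assumptions, not the exact objects used in the video.

import pyodbc

# Connect to the source and destination databases (placeholder connection strings).
src = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=src;DATABASE=SrcDb;UID=user;PWD=pass")
dst = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=dst;DATABASE=DstDb;UID=user;PWD=pass")

# 1. Read the old watermark recorded for this table.
old_wm = dst.cursor().execute(
    "SELECT WatermarkValue FROM WatermarkTable WHERE TableName = ?", ("SourceTable",)
).fetchone()[0]

# 2. Read the new watermark (the latest date currently in the source).
new_wm = src.cursor().execute("SELECT MAX(LastModifiedTime) FROM SourceTable").fetchone()[0]

# 3. Copy only the rows that arrived between the two watermarks.
rows = src.cursor().execute(
    "SELECT * FROM SourceTable WHERE LastModifiedTime > ? AND LastModifiedTime <= ?",
    (old_wm, new_wm),
).fetchall()
# ... write 'rows' to the destination table here ...

# 4. Move the watermark forward so the next run starts where this one ended.
cur = dst.cursor()
cur.execute("UPDATE WatermarkTable SET WatermarkValue = ? WHERE TableName = ?", (new_wm, "SourceTable"))
dst.commit()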








Friday, August 26, 2022

Azure Data Factory - Copy multiple files from AWS S3 to Azure Data Lake Storage using ADF pipeline



Here we are getting all the files in a folder in Amazon S3 and copying them all dynamically using Azure Data Factory.


To avoid hard-coding sensitive information, it is recommended to use Azure Key Vault. Refer to the video below for how to use Azure Key Vault in ADF:

https://youtu.be/2lPlReATWew





Wednesday, August 24, 2022

Azure Data Factory - Create Managed VNET Integration runtime and creating Managed Private Endpoint



In this video we first create a Managed VNET Integration Runtime, then establish a managed private endpoint connection to ADLS Gen2 and run a Copy activity using the Managed VNET Integration Runtime.



Monday, August 22, 2022

Azure Data Factory - Copy specific range of cells in Excel along with sheets dynamically to CSV



Here we are copying only a specific range of cells from an Excel file; this lets us exclude unwanted cells from the Excel file while copying.


Below is the video that explains copying Excel sheets dynamically in detail:

https://youtu.be/n5phQgvIaxY





Thursday, August 18, 2022

Azure Data Factory - Copy multiple sheets in excel dynamically into multiple files/tables /SQL table



We are copying multiple Excel sheets dynamically into multiple files, creating one CSV file per sheet.

We are also copying these sheets into a single SQL table, and into multiple tables named after the sheets.





Friday, July 15, 2022

Azure Data Factory - Run SSIS packages in ADF as File System (Project Model) using SSIS-IR



In this video we are using the Project Deployment Model: we take the .ispac file generated by the build and upload it, along with the package (.dtsx), to an Azure File Share.

If you want to know about the File System (Package) model,

Please visit: https://youtu.be/LzvoUwriio8




Azure Data Factory - Run SSIS packages in ADF as File System (Package Model) using SSIS-IR



In this video we are using the Package Deployment Model and storing our configuration file (.dtsConfig) and package (.dtsx) in an Azure File Share.

Using Azure Data Factory, we call the package as File System (Package) and supply the configuration file along with the package.

ADF reads the configuration file and performs the action based on the content/variables that we provide in it.




Thursday, July 14, 2022

Azure Data Factory - Lift & Shift SSIS - Execute SSIS package in ADF using Project Deployment Model (SSISDB) by creating an SSIS Integration Runtime


In this video we are using the Project Deployment Model. We deploy our project into SSISDB, which is hosted on an Azure SQL Database server.

We then execute the packages hosted in SSISDB from Azure Data Factory.



Wednesday, July 13, 2022

Azure Data Factory - Copying Today's files with date and timestamp in name of the file

Azure Data Factory - Incremental data copy - Copy files that are created / modified today

This helps us to get the files that were added or modified on today's date.


Similarly, there are multiple approaches to getting today's files:

If you want only non-empty files that were modified or created today, please follow the video below:

https://youtu.be/LOo9JC-HtLk


If you want to copy today's files that have the date and timestamp in the file name, copying based on the name, please follow the video below:

https://youtu.be/u67xQ1u6NjU




Azure Data Factory - Copy files which are not empty and last modified is today to ADLS container



This approach helps us to get only files which are not empty


Expression is @greaterOrEquals(activity('Lookup1').output.count,2)

This counts rows including the header. Sometimes a file contains a header but no further rows, so it is effectively empty. The expression above treats a file as non-empty only when the row count is 2 or more.


For the True condition:

Take a Copy activity.

The source is the same dataset given to the Lookup activity, with a FileName parameter whose value is @item().name.

The sink is a dataset that points to a folder, with the copy behavior set to Preserve hierarchy.

For the False condition:

Just add a Wait activity.




Tuesday, July 12, 2022

Azure Data Factory - Read Files From On Premise File System to Azure Blob Storage - Practical Demo

To access an on-premises file system, we need to set up a Self-Hosted Integration Runtime.

The default Azure integration runtime is scoped to the Azure environment; to access files or resources outside Azure, we need to configure a Self-Hosted Integration Runtime.


Here, to simulate an on-premises environment, we used a Windows 10 virtual machine and installed the Self-Hosted IR on it.





Monday, May 16, 2022

Azure Databricks - Mounting Azure Datalake using Access keys or SAS


Mounting object storage to DBFS allows you to access objects in object storage as if they were on the local file system.

In effect, we access them through the Databricks File System (DBFS).


We can mount the file system using dbutils.fs.mount()

We can mount Azure Blob Storage either by using the storage account key or by using a SAS token, as in the sketch below.
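
A minimal sketch of mounting with the account key, assuming a container named mycontainer, a storage account named mystorageacct and a Databricks secret scope named myscope that holds the key (all of these names are placeholders). For SAS, the config key becomes fs.azure.sas.<container>.<account>.blob.core.windows.net with the SAS token as its value.

# Mount an Azure Blob Storage container to DBFS using the storage account key.
dbutils.fs.mount(
    source="wasbs://mycontainer@mystorageacct.blob.core.windows.net",
    mount_point="/mnt/mydata",
    extra_configs={
        "fs.azure.account.key.mystorageacct.blob.core.windows.net":
            dbutils.secrets.get(scope="myscope", key="storage-account-key")
    }
)

# The mounted storage now behaves like a local path.
display(dbutils.fs.ls("/mnt/mydata"))

# To detach the mount later:
# dbutils.fs.unmount("/mnt/mydata")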




Sunday, May 15, 2022

Azure Databricks - dbutils - FS command with all its utilities with Example

dbutils is a set of utilities that helps to perform certain tasks in Azure Databricks.
dbutils is only supported inside Databricks notebooks.

The available utilities are:
fs - Manipulates the Databricks filesystem (DBFS) from the console
1. cp
2. head  
3. ls  
4. mkdirs 
5. mv 
6. put
7. rm

Let us look at each of the above commands with an example, shown below.
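
A quick sketch that exercises each of these commands in a notebook cell; the /tmp/dbutils_demo path is just an illustrative location.

demo_dir = "/tmp/dbutils_demo"

dbutils.fs.mkdirs(demo_dir)                                      # create a directory in DBFS
dbutils.fs.put(demo_dir + "/hello.txt", "hello, DBFS", True)     # write a small file (overwrite=True)
print(dbutils.fs.head(demo_dir + "/hello.txt"))                  # print the first bytes of the file
dbutils.fs.cp(demo_dir + "/hello.txt", demo_dir + "/copy.txt")   # copy a file
dbutils.fs.mv(demo_dir + "/copy.txt", demo_dir + "/moved.txt")   # move / rename a file
display(dbutils.fs.ls(demo_dir))                                 # list the directory contents
dbutils.fs.rm(demo_dir, True)                                    # remove the directory recursively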



Azure Databricks- Importing CSV file into DataBricks File System with PySpark Code

 

Azure DataBricks File System

Databricks File System (DBFS) is a distributed file system mounted into an Azure Databricks workspace and available on Azure Databricks clusters. 

DBFS is an abstraction on top of scalable object storage


The default storage location in DBFS is known as the DBFS root.

  /FileStore: Imported data files, generated plots, and uploaded libraries

  /databricks-datasets: Sample public datasets.

  /databricks-results: Files generated by downloading the full results of a query.

The video below contains an example of importing a CSV file into DBFS and performing some transformations on it; a short sketch follows.
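
A minimal PySpark sketch of reading an uploaded CSV from DBFS and applying a simple transformation; the file path and the column names are illustrative assumptions.

# Files uploaded through the UI typically land under /FileStore/tables/.
df = spark.read.csv("/FileStore/tables/sales.csv", header=True, inferSchema=True)

df.printSchema()   # inspect the inferred column types
df.show(5)         # preview the first rows

# Example transformation: keep two columns and filter rows (column names are assumed).
result = df.select("OrderId", "Amount").filter(df.Amount > 100)
result.show()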





Azure Databricks - Creating simple Dataframe from list with PySpark code

Azure Databricks

Azure Databricks is a platform that provides computational resources and an integrated interface for writing code to perform data transformations.

It saves the time of setting up environments for Python, R, Scala and SQL; it provides all of them with zero configuration.

It provides a Workspace, Clusters and Notebooks for writing your code.

1. Workspace is an environment provided to you by Azure Databricks.

2. Cluster is a set of computation resources and configurations on which we can run workloads.

3. Notebook is a web-based interface to a document that contains code, visualizations and narrative text.

The video below shows a simple way to create a DataFrame in Azure Databricks; a minimal sketch follows.
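
A minimal sketch of creating a DataFrame from a Python list; the column names and sample data are illustrative.

# Build a small DataFrame from a list of tuples, with an explicit column list.
data = [("Alice", 34), ("Bob", 29), ("Carol", 41)]
columns = ["name", "age"]

df = spark.createDataFrame(data, columns)

df.printSchema()
df.show()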



Azure Data Factory - Data Flows - DERIVED COLUMN and SORT Transformation

 DERIVED COLUMN Transformation

This helps to modify an existing column or generate a new one based on a condition we define. It can also generate a new column from the data in existing columns.


SORT Transformation

As the name suggests, this sorts the data based on the column(s) we provide.

The video below has a complete explanation of both with examples; a rough PySpark analogy follows.
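
For intuition only, here is roughly how the same Derived Column and Sort logic looks in PySpark; this is an analogy, not ADF data flow syntax, and the sample data and column names are assumed.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()   # already available in a Databricks notebook
df = spark.createDataFrame([("Ada", "Lovelace"), ("Alan", "Turing")], ["first_name", "last_name"])

# Derived Column analogy: build a new column from existing ones.
df2 = df.withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))

# Sort analogy: order the rows by the derived column.
df2.orderBy(F.col("full_name").asc()).show()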



Azure Data Factory - Data Flows - CONDITIONAL SPLIT Transformation

Conditional Split Transformation - This helps to route the data into different streams based on the matching conditions that we define.

This is similar to the CASE statement we use in programming languages. We can give multiple conditions and split the data into different streams. The following video has a detailed example with an explanation of the Conditional Split transformation; a rough PySpark analogy follows below.
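
Again as an analogy only (not ADF data flow syntax), splitting one dataset into streams by condition in PySpark might look like this; the sample data and thresholds are assumed.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()   # already available in a Databricks notebook
df = spark.createDataFrame([("A", 150), ("B", 40), ("C", 90)], ["id", "amount"])

# Conditional Split analogy: each stream gets the rows matching its condition,
# and a default stream catches everything else.
high    = df.filter(F.col("amount") >= 100)
medium  = df.filter((F.col("amount") >= 50) & (F.col("amount") < 100))
default = df.filter(F.col("amount") < 50)

high.show(); medium.show(); default.show()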




Friday, May 13, 2022

Azure Data Factory - Copy specific files from one folder to another in Data Lake

Using the Get Metadata, ForEach and If Condition activities in Azure Data Factory, we can read the properties of the files in a folder, apply our conditions, and copy the desired files to another location in the Data Lake.

In this video we use file size as the criterion: we pick the files that are larger than 30 KB and move them to another location.





Monday, May 09, 2022

Azure Data Factory - Dynamic Data Loading using parameters to different SQL Tables.

Dynamic data loading helps us load data into resources without hard-coding names, by parameterizing the values.

In this video we load data from different CSV files in different folders into different tables in an Azure SQL Database. We use the Lookup and ForEach activities to achieve the dynamic load, and a Copy activity to copy the data.



Azure Data Factory - Triggers - Schedule Trigger with Example

A schedule trigger executes a pipeline on a defined schedule (start time, recurrence, end time). We can also define an end date for a schedule trigger.

Features:

a. You can specify specific dates, months, or days of the week on which to run the trigger within a given duration. For example, a trigger can be configured to run the pipeline every Monday, Wednesday and Friday at 1:25 AM and 8:25 AM.

b. One trigger can be attached to multiple pipelines.

c. Many triggers can also be attached to a single pipeline.

d. We cannot schedule a trigger for a past date (which wouldn't make sense).

Below is the video to demonstrate how it works




Sunday, May 08, 2022

Global Certifications: