Tuesday, December 13, 2022

Azure DevOps - Quick Solution - "No hosted parallelism has been purchased or granted" Error -- Fixed!

With its new policy, Microsoft no longer gives you a free grant of hosted agents by default; the free parallelism grant has to be requested.

To build your code or deploy your software using Azure Pipelines, you need at least one agent. As you add more code and people, you'll eventually need more.

When your pipeline runs, the system begins one or more jobs. An agent is computing infrastructure with installed agent software that runs one job at a time.

2 types of Agents:

1.      Microsoft Hosted Agents

2.      Self-Hosted Agents

Microsoft-hosted: Microsoft hosts and manages the agent for you

Self-hosted: you host the agent that runs the job yourself

 

If you do not want to wait the 2 business days for the free grant to be approved, or you are just running tests, you can host a self-hosted agent on a virtual machine or on your own local machine.






Monday, November 14, 2022

Azure Data Factory - Copy multiple SQL tables incrementally using a watermark table (Delta Load) ADF



Here we are doing an incremental copy of multiple SQL tables from one database to another database.

Please refer to the video below for a step-by-step explanation of the pipeline for a single SQL database:

https://youtu.be/AOClU3s9jXw


For this video we connected to a SQL database in an on-premises environment. To learn how to set up the Self-Hosted Integration Runtime,

Please refer:

https://youtu.be/d9Xp2pnjYcI








Wednesday, October 26, 2022

Azure Data Factory - Copying files from On premise Linux or Unix Server to Azure Datalake using ADF

Here we are copying files from a Linux virtual machine to ADLS. We create a Samba file share on Linux and add that file share to Windows; using the self-hosted IR we access the Linux file share and copy the files from Linux to Azure Data Lake.




Link for configuring the Self-Hosted Integration Runtime to access on-premises Windows files:

https://youtu.be/d9Xp2pnjYcI


To install samba utility:

-- sudo apt install samba -y


To enable and start the Samba service:

-- sudo systemctl enable --now smbd


To check status of service:

-- sudo systemctl status smbd


To create a samba user:

-- sudo smbpasswd -a (username)


Configuring Samba share from Linux to Windows

Copy files from Linux to Azure Datalake using ADF


Tuesday, September 06, 2022

Azure Data Factory - Create a Customer Managed Key and encrypt ADF using that CMK



By default, data is encrypted with a Microsoft-managed key that is uniquely assigned to your data factory. For extra security, you can use the customer-managed keys feature in Azure Data Factory.

When you specify a customer-managed key, Data Factory uses both the factory system key and the CMK to encrypt customer data. Missing either one results in access to the data and the factory being denied.


Reference Documentation Link:  https://docs.microsoft.com/en-us/azure/data-factory/enable-customer-managed-key




Monday, September 05, 2022

Azure Data Factory - Send an alert email when a blob is created or deleted from ADLS Gen2 using ADF




Alert notification is an important part of ADF, and we can achieve it by following the steps in the video. Here we receive email alerts in Gmail whenever a file is created or deleted in our storage account.


Refer to the video below to get a custom alert email when a pipeline fails: https://youtu.be/KorFyv5FntY





Tuesday, August 30, 2022

Azure Data Factory - Copy multiple files from HTTP website dynamically to Data Lake using ADF



Here we are copying multiple files dynamically from an HTTP website. We are using GitHub as the source website to copy multiple files to Azure Data Lake Gen2.


Previous video on copying a single HTTP file: https://youtu.be/PNN5VPoP2zQ




Azure Data Factory - Copy data from HTTP website (GitHub) to Azure Data lake using ADF



Here we are copying a CSV file from an HTTP website (GitHub) to Azure Data Lake Storage using the Copy activity in ADF.


Refer to the link below for copying multiple files from an HTTP website dynamically to ADLS:  https://youtu.be/K5ND-pyD3yE




Monday, August 29, 2022

Azure Data Factory - Incremental load or Delta load using a watermark Table

Here the source receives incoming rows stamped with a date value. We keep a watermark table that records how far the data has already been copied, which lets us incrementally load only the new data into the destination and avoid copying old data. A conceptual sketch of the pattern follows the setup link below.


Refer to the video below to set up a self-hosted IR to use SQL Server from a local machine:

https://youtu.be/d9Xp2pnjYcI
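
For intuition, here is a minimal sketch of the watermark pattern written in Python with pyodbc, outside ADF. In the pipeline itself this logic is built with Lookup, Copy and Stored Procedure activities; the connection strings, the WatermarkTable/SourceTable names and the LastModifiedTime column below are illustrative assumptions, not the exact objects used in the video.

import pyodbc

# Connect to the source and destination databases (placeholder connection strings).
src = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=src;DATABASE=SrcDb;UID=user;PWD=pass")
dst = pyodbc.connect("DRIVER={ODBC Driver 18 for SQL Server};SERVER=dst;DATABASE=DstDb;UID=user;PWD=pass")

# 1. Read the old watermark recorded for this table.
old_wm = dst.cursor().execute(
    "SELECT WatermarkValue FROM WatermarkTable WHERE TableName = ?", ("SourceTable",)
).fetchone()[0]

# 2. Read the new watermark (the latest date currently in the source).
new_wm = src.cursor().execute("SELECT MAX(LastModifiedTime) FROM SourceTable").fetchone()[0]

# 3. Copy only the rows that arrived between the two watermarks.
rows = src.cursor().execute(
    "SELECT * FROM SourceTable WHERE LastModifiedTime > ? AND LastModifiedTime <= ?",
    (old_wm, new_wm),
).fetchall()
# ... write 'rows' to the destination table here ...

# 4. Move the watermark forward so the next run starts where this one ended.
cur = dst.cursor()
cur.execute("UPDATE WatermarkTable SET WatermarkValue = ? WHERE TableName = ?", (new_wm, "SourceTable"))
dst.commit()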








Friday, August 26, 2022

Azure Data Factory - Copy multiple files from AWS S3 to Azure Data Lake Storage using ADF pipeline



Here we are getting all the files in a folder in Amazon S3 and copying them all dynamically using Azure Data Factory.


To avoid hard-coding sensitive information, it is recommended to use Azure Key Vault. Refer to the video below for how to use Azure Key Vault in ADF:

https://youtu.be/2lPlReATWew





Wednesday, August 24, 2022

Azure Data Factory - Create Managed VNET Integration runtime and creating Managed Private Endpoint



In this video we first create a Managed VNET Integration Runtime, then establish a managed private endpoint connection to ADLS Gen2 and run a Copy activity using the Managed VNET Integration Runtime.



Monday, August 22, 2022

Azure Data Factory - Copy specific range of cells in Excel along with sheets dynamically to CSV



Here we are copying only a specific range of cells from an Excel file; this lets us exclude unwanted cells from the Excel file while copying.


Below is the video that explains copying Excel sheets dynamically in detail:

https://youtu.be/n5phQgvIaxY





Thursday, August 18, 2022

Azure Data Factory - Copy multiple sheets in excel dynamically into multiple files/tables /SQL table



We are copying multiple Excel sheets dynamically into multiple files, creating one CSV file per sheet.

We are also copying these sheets into a single SQL table, and into multiple tables named after the sheets.





Friday, July 15, 2022

Azure Data Factory - Run SSIS packages in ADF as File System (Project Model) using SSIS-IR



In this video we are using the Project Deployment Model: we take the .ispac file generated by the build and upload it, along with the package (.dtsx), to an Azure File Share.

If you want to know about the File System (Package) model,

Please visit: https://youtu.be/LzvoUwriio8




Azure Data Factory - Run SSIS packages in ADF as File System (Package Model) using SSIS-IR



In this video we are using the Package Deployment Model and storing our configuration file (.dtsConfig) and package (.dtsx) in an Azure File Share.

Using Azure Data Factory, we call the package as File System (Package) and supply the configuration file along with the package.

ADF reads the configuration file and performs the action based on the content/variables that we provide in it.




Thursday, July 14, 2022

Azure Data Factory - Lift & Shift SSIS - Execute SSIS package in ADF using Project Deployment Model (SSISDB) by creating an SSIS Integration Runtime


In this video we are using the Project Deployment Model. We deploy our project into SSISDB, which is hosted on an Azure SQL Database server.

We then execute the packages hosted in SSISDB from Azure Data Factory.



Wednesday, July 13, 2022

Azure Data Factory - Copying Today's files with date and timestamp in name of the file

Azure Data Factory - Incremental data copy - Copy files that are created / modified today

This helps us to get the files that were added or modified on today's date.


Similarly, there are multiple approaches to getting today's files:

If you want only non-empty files that were modified or created today, please follow the video below:

https://youtu.be/LOo9JC-HtLk


If you want to copy today's files that have the date and timestamp in the file name, copying based on the name, please follow the video below:

https://youtu.be/u67xQ1u6NjU




Azure Data Factory - Copy files which are not empty and last modified is today to ADLS container



This approach helps us to get only files which are not empty


Expression is @greaterOrEquals(activity('Lookup1').output.count,2)

This counts rows including the header. Sometimes a file contains a header but no further rows, so it is effectively empty. The expression above treats a file as non-empty only when the row count is 2 or more.


For the True condition:

Take a Copy activity.

The source is the same dataset given to the Lookup activity, with a FileName parameter whose value is @item().name.

The sink is a dataset that points to a folder, with the copy behavior set to Preserve hierarchy.

For the False condition:

Just add a Wait activity.




Tuesday, July 12, 2022

Azure Data Factory - Read Files From On Premise File System to Azure Blob Storage - Practical Demo

To access an on-premises file system, we need to set up a Self-Hosted Integration Runtime.

The default Azure integration runtime is scoped to the Azure environment; to access files or resources outside Azure, we need to configure a Self-Hosted Integration Runtime.


Here, to simulate an on-premises environment, we used a Windows 10 virtual machine and installed the Self-Hosted IR on it.





Monday, May 16, 2022

Azure Databricks - Mounting Azure Datalake using Access keys or SAS


Mounting object storage to DBFS allows you to access objects in object storage as if they were on the local file system.

In effect, we access them through the Databricks File System (DBFS).


We can mount the file system using dbutils.fs.mount()

We can mount Azure Blob Storage either by using the storage account key or by using a SAS token, as in the sketch below.
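
A minimal sketch of mounting with the account key, assuming a container named mycontainer, a storage account named mystorageacct and a Databricks secret scope named myscope that holds the key (all of these names are placeholders). For SAS, the config key becomes fs.azure.sas.<container>.<account>.blob.core.windows.net with the SAS token as its value.

# Mount an Azure Blob Storage container to DBFS using the storage account key.
dbutils.fs.mount(
    source="wasbs://mycontainer@mystorageacct.blob.core.windows.net",
    mount_point="/mnt/mydata",
    extra_configs={
        "fs.azure.account.key.mystorageacct.blob.core.windows.net":
            dbutils.secrets.get(scope="myscope", key="storage-account-key")
    }
)

# The mounted storage now behaves like a local path.
display(dbutils.fs.ls("/mnt/mydata"))

# To detach the mount later:
# dbutils.fs.unmount("/mnt/mydata")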




Sunday, May 15, 2022

Azure Databricks - dbutils - FS command with all its utilities with Example

dbutils is a set of utilities that helps to perform certain tasks in Azure Databricks.
dbutils is only supported inside Databricks notebooks.

The available utilities are:
fs - Manipulates the Databricks filesystem (DBFS) from the console
1. cp
2. head  
3. ls  
4. mkdirs 
5. mv 
6. put
7. rm

Let us look at each of the above commands with an example, shown below.
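
A quick sketch that exercises each of these commands in a notebook cell; the /tmp/dbutils_demo path is just an illustrative location.

demo_dir = "/tmp/dbutils_demo"

dbutils.fs.mkdirs(demo_dir)                                      # create a directory in DBFS
dbutils.fs.put(demo_dir + "/hello.txt", "hello, DBFS", True)     # write a small file (overwrite=True)
print(dbutils.fs.head(demo_dir + "/hello.txt"))                  # print the first bytes of the file
dbutils.fs.cp(demo_dir + "/hello.txt", demo_dir + "/copy.txt")   # copy a file
dbutils.fs.mv(demo_dir + "/copy.txt", demo_dir + "/moved.txt")   # move / rename a file
display(dbutils.fs.ls(demo_dir))                                 # list the directory contents
dbutils.fs.rm(demo_dir, True)                                    # remove the directory recursively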



Azure Databricks- Importing CSV file into DataBricks File System with PySpark Code

 

Azure DataBricks File System

Databricks File System (DBFS) is a distributed file system mounted into an Azure Databricks workspace and available on Azure Databricks clusters. 

DBFS is an abstraction on top of scalable object storage


The default storage location in DBFS is known as the DBFS root.

  /FileStore: Imported data files, generated plots, and uploaded libraries

  /databricks-datasets: Sample public datasets.

  /databricks-results: Files generated by downloading the full results of a query.

The video below contains an example of importing a CSV file into DBFS and performing some transformations on it; a short sketch follows.
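
A minimal PySpark sketch of reading an uploaded CSV from DBFS and applying a simple transformation; the file path and the column names are illustrative assumptions.

# Files uploaded through the UI typically land under /FileStore/tables/.
df = spark.read.csv("/FileStore/tables/sales.csv", header=True, inferSchema=True)

df.printSchema()   # inspect the inferred column types
df.show(5)         # preview the first rows

# Example transformation: keep two columns and filter rows (column names are assumed).
result = df.select("OrderId", "Amount").filter(df.Amount > 100)
result.show()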





Azure Databricks - Creating simple Dataframe from list with PySpark code

Azure Databricks

Azure Databricks is a platform that provides computational resources and an integrated interface for writing code to perform data transformations.

It saves the time of setting up environments for Python, R, Scala and SQL; it provides all of them with zero configuration.

It provides a Workspace, Clusters and Notebooks for writing your code.

1. Workspace is an environment provided to you by Azure Databricks.

2. Cluster is a set of computation resources and configurations on which we can run workloads.

3. Notebook is a web-based interface to a document that contains code, visualizations and narrative text.

The video below shows a simple way to create a DataFrame in Azure Databricks; a minimal sketch follows.
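
A minimal sketch of creating a DataFrame from a Python list; the column names and sample data are illustrative.

# Build a small DataFrame from a list of tuples, with an explicit column list.
data = [("Alice", 34), ("Bob", 29), ("Carol", 41)]
columns = ["name", "age"]

df = spark.createDataFrame(data, columns)

df.printSchema()
df.show()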



Azure Data Factory - Data Flows - DERIVED COLUMN and SORT Transformation

 DERIVED COLUMN Transformation

This helps to modify an existing column or generate a new one based on a condition we define. It can also generate a new column from the data in existing columns.


SORT Transformation

As the name suggests, this sorts the data based on the column(s) we provide.

The video below has a complete explanation of both with examples; a rough PySpark analogy follows.
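
For intuition only, here is roughly how the same Derived Column and Sort logic looks in PySpark; this is an analogy, not ADF data flow syntax, and the sample data and column names are assumed.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()   # already available in a Databricks notebook
df = spark.createDataFrame([("Ada", "Lovelace"), ("Alan", "Turing")], ["first_name", "last_name"])

# Derived Column analogy: build a new column from existing ones.
df2 = df.withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))

# Sort analogy: order the rows by the derived column.
df2.orderBy(F.col("full_name").asc()).show()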



Azure Data Factory - Data Flows - CONDITIONAL SPLIT Transformation

Conditional Split Transformation - This helps to route the data into different streams based on the matching conditions that we define.

This is similar to the CASE statement we use in programming languages. We can give multiple conditions and split the data into different streams. The following video has a detailed example with an explanation of the Conditional Split transformation; a rough PySpark analogy follows below.
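
Again as an analogy only (not ADF data flow syntax), splitting one dataset into streams by condition in PySpark might look like this; the sample data and thresholds are assumed.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()   # already available in a Databricks notebook
df = spark.createDataFrame([("A", 150), ("B", 40), ("C", 90)], ["id", "amount"])

# Conditional Split analogy: each stream gets the rows matching its condition,
# and a default stream catches everything else.
high    = df.filter(F.col("amount") >= 100)
medium  = df.filter((F.col("amount") >= 50) & (F.col("amount") < 100))
default = df.filter(F.col("amount") < 50)

high.show(); medium.show(); default.show()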




Friday, May 13, 2022

Azure Data Factory - Copy specific files from one folder to another in Data Lake

Using the Get Metadata, ForEach and If Condition activities in Azure Data Factory, we can read the properties of the files in a folder, apply our conditions, and copy the desired files to another location in the Data Lake.

In this video we use file size as the criterion: we pick the files that are larger than 30 KB and move them to another location.





Monday, May 09, 2022

Azure Data Factory - Dynamic Data Loading using parameters to different SQL Tables.

Dynamic data loading helps us load data into resources without hard-coding names, by parameterizing the values.

In this video we load data from different CSV files in different folders into different tables in an Azure SQL Database. We use the Lookup and ForEach activities to achieve the dynamic load, and a Copy activity to copy the data.



Azure Data Factory - Triggers - Schedule Trigger with Example

A schedule trigger executes a pipeline on a defined schedule (start time, recurrence, end time). We can also define an end date for a schedule trigger.

Features:

a. You can specify specific dates, months, or days of the week on which to run the trigger within a given duration. For example, a trigger can be configured to run the pipeline every Monday, Wednesday and Friday at 1:25 AM and 8:25 AM.

b. One trigger can be attached to multiple pipelines.

c. Many triggers can also be attached to a single pipeline.

d. We cannot schedule a trigger for a past date (which wouldn't make sense).

Below is the video to demonstrate how it works




Sunday, May 08, 2022

Global Certifications: