
Monday, May 16, 2022

Azure Databricks - Mounting Azure Data Lake using Access Keys or SAS

Mounting Azure Data Lake Storage to Azure Databricks

Mounting object storage to DBFS allows you to access objects in object storage as if they were on the local file system.

Once mounted, it is as if we are accessing the Databricks File System (DBFS) directly.


We can mount the file system using dbutils.fs.mount()

We can mount Azure Blob Storage either by using the storage account Access Key or by using a SAS token, as shown in the sketch below.
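
A minimal sketch of both options, assuming placeholder names for the storage account, container, mount points, and secret scope (keys are read from a secret scope rather than hard-coded):

# Option 1: mount a container using the storage account Access Key.
# <storage-account>, <container-name>, the mount points and the secret scope
# name are placeholders.
dbutils.fs.mount(
    source="wasbs://<container-name>@<storage-account>.blob.core.windows.net",
    mount_point="/mnt/datalake",
    extra_configs={
        "fs.azure.account.key.<storage-account>.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-account-key")
    }
)

# Option 2: mount the same container using a SAS token instead of the account key.
dbutils.fs.mount(
    source="wasbs://<container-name>@<storage-account>.blob.core.windows.net",
    mount_point="/mnt/datalake-sas",
    extra_configs={
        "fs.azure.sas.<container-name>.<storage-account>.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="sas-token")
    }
)

# Once mounted, files are accessible through the mount point, e.g.:
# display(dbutils.fs.ls("/mnt/datalake"))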




Sunday, May 15, 2022

Azure Databricks - dbutils - fs Command and Its Utilities with Examples

dbutils is a utility module that helps perform common tasks in Azure Databricks.
dbutils is only supported inside Databricks notebooks.

The available utilities are:
fs - Manipulates the Databricks filesystem (DBFS) from the console
1. cp
2. head  
3. ls  
4. mkdirs 
5. mv 
6. put
7. rm

Let us look at each of the above utilities; a quick sketch follows.
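
A minimal sketch of the commands listed above, run against a placeholder directory under /FileStore:

# Create a directory and write a small text file into it
dbutils.fs.mkdirs("/FileStore/demo")
dbutils.fs.put("/FileStore/demo/sample.txt", "hello DBFS", True)  # third argument: overwrite

# Preview the first bytes of the file and list the directory contents
dbutils.fs.head("/FileStore/demo/sample.txt")
dbutils.fs.ls("/FileStore/demo")

# Copy, then move (rename) the file
dbutils.fs.cp("/FileStore/demo/sample.txt", "/FileStore/demo/sample_copy.txt")
dbutils.fs.mv("/FileStore/demo/sample_copy.txt", "/FileStore/demo/sample_moved.txt")

# Remove the directory and everything in it
dbutils.fs.rm("/FileStore/demo", recurse=True)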



Azure Databricks - Importing a CSV File into the Databricks File System with PySpark Code

 

Azure Databricks File System

Databricks File System (DBFS) is a distributed file system mounted into an Azure Databricks workspace and available on Azure Databricks clusters. 

DBFS is an abstraction on top of scalable object storage


The default storage location in DBFS is known as the DBFS root. It contains several default directories:

  /FileStore: Imported data files, generated plots, and uploaded libraries

  /databricks-datasets: Sample public datasets.

  /databricks-results: Files generated by downloading the full results of a query.

The video below contains an example of importing a CSV file to DBFS and performing some transformations on it.
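
A minimal PySpark sketch of that workflow; the file path and column names below are placeholders:

from pyspark.sql import functions as F

# Read a CSV file that was uploaded to DBFS (e.g. via the UI into /FileStore/tables)
df = (spark.read
      .option("header", True)
      .option("inferSchema", True)
      .csv("/FileStore/tables/sales.csv"))

# Example transformation: keep rows above a threshold and add a derived column
transformed = (df
               .filter(F.col("amount") > 100)
               .withColumn("amount_with_tax", F.col("amount") * 1.1))

transformed.show(5)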





Azure Databricks - Creating a Simple DataFrame from a List with PySpark Code

Azure Databricks

Azure Databricks is a platform that provides computational resources and an integrated interface for writing code to perform data transformations.

It saves the time needed to set up environments for Python, R, Scala, and SQL; all of them are provided with zero configuration.

It provides a Workspace, Clusters, and Notebooks in which to write your code.

1. Workspace is the environment provided to you by Azure Databricks.

2. Cluster is a set of computational resources and configurations on which we can run workloads.

3. Notebook is a web-based interface to a document that contains code, visualizations, and narrative text.

The video below shows a simple way to create a DataFrame in Azure Databricks.
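
A minimal sketch, with placeholder column names and sample data:

# Create a simple DataFrame from a Python list of tuples
data = [(1, "Alice", 3000), (2, "Bob", 4500), (3, "Carol", 5200)]
columns = ["id", "name", "salary"]

df = spark.createDataFrame(data, schema=columns)
df.show()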



Azure Data Factory - Data Flows - DERIVED COLUMN and SORT Transformation

 DERIVED COLUMN Transformation

This helps modify an existing column or generate a new one based on a condition we define. It can also generate a new column from the data in existing columns.


SORT Transformation

As the name suggests, this sorts the data based on the column we provide.
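
For comparison only, the same two operations expressed in PySpark might look like this; the DataFrame, column names, and derivation rule are placeholders, and the actual Data Flow is configured in the ADF designer rather than in code:

from pyspark.sql import functions as F

df = spark.createDataFrame([("Alice", 3000), ("Bob", 4500)], ["name", "salary"])

# Derived Column: generate a new column from existing column data
derived = df.withColumn("bonus", F.col("salary") * 0.10)

# Sort: order the rows by the column we provide
derived.orderBy(F.col("bonus").desc()).show()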

The video below has a complete explanation of both, with examples.



Azure Data Factory - Data Flows - CONDITIONAL SPLIT Transformation

Conditional Split Transformation - This helps create different data streams based on the matching conditions we provide.

This is similar to the CASE statement we use in programming languages. We can provide multiple conditions and split the data into different streams. The following video has a detailed example with an explanation of the Conditional Split transformation.
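
For comparison only, a conceptual PySpark equivalent of a Conditional Split, routing rows into separate streams based on matching conditions (the DataFrame and conditions are placeholders; the actual transformation is configured in the ADF Data Flow designer):

from pyspark.sql import functions as F

df = spark.createDataFrame([("A", 120), ("B", 45), ("C", 300)], ["product", "amount"])

# First condition stream: rows that match the condition
high_value = df.filter(F.col("amount") >= 100)

# Default stream: everything that did not match
low_value = df.filter(F.col("amount") < 100)

high_value.show()
low_value.show()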




Friday, May 13, 2022

Azure Data Factory - Copy specific files from one folder to another in Data Lake

Using the Get Metadata, ForEach, and If Condition activities in Azure Data Factory, we can read the properties of the files in a folder, apply our conditions, and copy the desired files to another location in the Data Lake.

In this video, we use file size as the parameter for picking files: files larger than 30 KB are selected and moved to another location.
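
For comparison only, the same file-size filter could be sketched in a Databricks notebook with dbutils (the mount paths are placeholders; the post itself implements this with ADF activities, not code):

source_dir = "/mnt/datalake/input"
target_dir = "/mnt/datalake/selected"

# dbutils.fs.ls returns FileInfo objects whose size attribute is in bytes
for file_info in dbutils.fs.ls(source_dir):
    if file_info.size > 30 * 1024:  # keep only files larger than 30 KB
        dbutils.fs.cp(file_info.path, target_dir + "/" + file_info.name)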





Monday, May 09, 2022

Azure Data Factory - Dynamic Data Loading using parameters to different SQL Tables.

Dynamic data loading helps us load data into resources without hard-coded names, using the concept of parameterized values.

In this video we load data from CSV files in different folders into different tables in an Azure SQL Database. We use the Lookup and ForEach activities to achieve the dynamic data load, and the Copy activity to copy the data.



Azure Data Factory - Triggers - Schedule Trigger with Example

A Schedule trigger executes a pipeline at a particular scheduled time (start time, recurrence, end time). We can define an end date for a Schedule trigger.

Features:

a. You can specify a specific date, month, or days of the week on which to run the trigger within a given duration.

For example, a trigger can be configured to run a pipeline every Monday, Wednesday, and Friday at 1:25 AM and 8:25 AM.

b. A single trigger can be attached to multiple pipelines.

c. Multiple triggers can also be attached to a single pipeline.

d. We cannot schedule a trigger for a past date (which would not make sense).

Below is a video demonstrating how it works.




Sunday, May 08, 2022

Global Certifications: