Friday, June 23, 2023

Azure Synapse Analytics - Can 2 different Spark notebooks connect to the same Spark pool and execute in parallel?

 

Azure Synapse Analytics – Spark

 

FAQ 1: Can 2 different Spark notebooks connect to the same Spark pool and execute in parallel?

 

Answer: Yes

My workspace has a capacity of 80 vCores.

As an example, suppose you have created a Spark pool with node size Small (4 vCores, 32 GB memory) and 8 nodes.

Total pool size = 4 vCores x 8 nodes = 32 vCores


You can set the number of nodes that each notebook uses.

Notebook 1:

Total 3 nodes (1 driver node and 2 executor nodes)

4 vCores x 3 nodes = 12 vCores used

Notebook 2:

Total 4 nodes (1 driver node and 3 executor nodes)

4 vCores x 4 nodes = 16 vCores used

 

 




12 vCores + 16 vCores = 28 vCores

Of the pool's 32 vCores, 28 vCores are in use, which is 87.5% utilization.

So you can run 2 notebooks in parallel on a single Spark pool, as long as their combined vCores fit within the pool size.
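The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation in this example; the function and variable names are made up for illustration and are not part of any Synapse API.

```python
# Figures taken from the example: node size Small = 4 vCores, pool of 8 nodes.
VCORES_PER_NODE = 4
POOL_NODES = 8


def vcores_used(driver_nodes, executor_nodes, vcores_per_node=VCORES_PER_NODE):
    """vCores consumed by one notebook session (driver + executors)."""
    return (driver_nodes + executor_nodes) * vcores_per_node


pool_vcores = POOL_NODES * VCORES_PER_NODE       # 8 x 4 = 32
notebook_1 = vcores_used(1, 2)                   # 3 nodes x 4 = 12
notebook_2 = vcores_used(1, 3)                   # 4 nodes x 4 = 16
total_used = notebook_1 + notebook_2             # 12 + 16 = 28
utilization_pct = 100 * total_used / pool_vcores

# Both notebooks fit only if their combined vCores do not exceed the pool.
fits_in_pool = total_used <= pool_vcores

print(f"{total_used} of {pool_vcores} vCores used ({utilization_pct}%)")
```

Changing the driver and executor counts shows quickly whether a third notebook would still fit in the remaining 4 vCores.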

 

FAQ 2: Can these notebooks share variables or temporary views with each other, since they are attached to the same pool?

Answer: No

 

Explanation:

Apache Spark for Synapse is designed as a job service and not a cluster model. It creates a separate Apache Spark application to run each notebook, so variables and temporary views created in one notebook are not visible in the other.
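Since each notebook runs as its own Spark application, one common workaround is to persist the data to storage in the first notebook and read it back in the second. The sketch below assumes both notebooks can reach the same ADLS Gen2 location; the path, the view name `orders_view`, and the two function names are hypothetical placeholders, and `df` / `spark` are the DataFrame and session objects that a Synapse notebook already provides.

```python
# Hypothetical shared location; replace the placeholders with a real container and account.
SHARED_PATH = "abfss://<container>@<account>.dfs.core.windows.net/shared/orders"


def publish_from_notebook_1(df, path=SHARED_PATH):
    """Run in Notebook 1: keep a local temp view and also persist the data."""
    # The temp view is visible only inside this notebook's Spark application.
    df.createOrReplaceTempView("orders_view")
    # The Parquet files are visible to any session that can access the storage.
    df.write.mode("overwrite").parquet(path)


def load_in_notebook_2(spark, path=SHARED_PATH):
    """Run in Notebook 2: read the persisted files instead of querying orders_view."""
    return spark.read.parquet(path)
```

Querying `orders_view` from Notebook 2 would fail, because that view exists only in Notebook 1's session; the files written to storage are what the two notebooks have in common.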

 



 

 

Friday, June 09, 2023

Azure Data Factory + Azure Synapse Analytics - END to END development Project course - Grab 50% OFF COUPON and ENROLL NOW!

Coupon Code link: 

https://www.udemy.com/course/azure-data-factory-synapse-analytics-end-to-end-etl-project/?couponCode=NEWJUNE60


450+ students have already enrolled. Join the course through the link above to get access with 50% OFF!

Throughout this course, you'll gain practical, hands-on experience with Azure Data Factory and Azure Synapse Analytics, learning how to use these data engineering tools to build an effective ETL solution. You'll explore the features and capabilities of these platforms, as well as their integration with other Azure services such as:


1. Azure SQL Database

2. Azure Synapse Analytics 

3. Azure Key Vault 

4. Azure Data Factory for Orchestration

5. Azure Storage solutions (Azure Datalake Gen2)

6. Microsoft Power BI

7. Azure Logic Apps

============================

LinkedIn: https://www.linkedin.com/in/shanmukh-sattiraju/


Watch the video below for the project architecture:



Azure Synapse Analytics - Reading files from Azure Datalake and Writing to ADLS using PySpark

Accessing a storage account from Azure Synapse Analytics

The storage account can be accessed directly through a linked service.

With a linked service, we can authenticate either by "Account key" or by "User assigned Managed Identity".


Microsoft reference documentation link:


https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-secure-credentials-with-tokenlibrary?pivots=programming-language-python
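As a rough sketch of what reading from and writing to ADLS through a linked service can look like in PySpark: the configuration keys and provider class names below follow the Microsoft documentation linked above as I understand it, so please verify them against that page. The helper names, the `auth` labels, and the placeholder values are my own assumptions for illustration, and `spark` is the session a Synapse notebook already provides.

```python
def abfss_path(container, account, relative_path):
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path.lstrip('/')}"


def linked_service_conf(linked_service_name, auth):
    """Spark settings for a linked service; auth is 'account_key' or 'managed_identity'."""
    conf = {"spark.storage.synapse.linkedServiceName": linked_service_name}
    if auth == "account_key":
        # Account key linked service: access goes through a SAS token provider.
        conf["fs.azure.account.auth.type"] = "SAS"
        conf["fs.azure.sas.token.provider.type"] = (
            "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedSASProvider"
        )
    elif auth == "managed_identity":
        # Managed identity linked service: access goes through an OAuth token provider.
        conf["fs.azure.account.oauth.provider.type"] = (
            "com.microsoft.azure.synapse.tokenlibrary.LinkedServiceBasedTokenProvider"
        )
    else:
        raise ValueError(f"unsupported auth type: {auth}")
    return conf


def copy_csv_to_parquet(spark, conf, source, target):
    """Apply the settings, read a CSV file from ADLS, and write it back as Parquet."""
    for key, value in conf.items():
        spark.conf.set(key, value)
    df = spark.read.csv(source, header=True)
    df.write.mode("overwrite").parquet(target)
    return df


# Example use inside a Synapse notebook (all names are placeholders):
# conf = linked_service_conf("<linked service name>", "managed_identity")
# src = abfss_path("<container>", "<account>", "input/data.csv")
# dst = abfss_path("<container>", "<account>", "output/data")
# copy_csv_to_parquet(spark, conf, src, dst)
```

Keeping the path and configuration builders separate from the Spark call makes it easy to switch between the two authentication options without touching the read and write logic.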



Global Certifications: