lectio-uninapoli-2021

Azure Synapse

Contents

General information

Synapse Studio is an integrated Azure tool that allows managing all the steps of an Analytics pipeline, within a single interface. In particular:

All the configuration and objects created within Synapse can be versioned with a Git repository.

Demo

Create resources on Azure

  1. Create a resource group for your project
  2. Open resource group, select Add –> MarketPlace –> Azure Synapse Analytics (not private hubs) –> Create
  3. On the current screen, click on account name –> create new and select a name, same for file system name

image.png

  1. Add the remaining configuration for Synapse workspace as in the screenshot

image.png

  1. Click on review and create, then on Create

image.png

Create connection to external Database and a Serverless SQL Pool

  1. Open the resource group of the project
  2. Open Synapse Workspace
  3. Click on Open Synapse Studio image

NB: Synapse allows you to version all the configurations in a GIT Repository. We won’t go into details here, but you can switch between “Synapse Live” (no versioning) to a versioning in GIT in the upper left corner of the screen.

image.png

  1. Click on Manage –> Linked Services –> New

image.png

  1. Select Azure SQL Database and click on continue
  2. Provide the required information.

image.png

  1. Click on test connection on the bottom of the page to make sure connection is successful, then click on create.

  2. Click on the “Data” icon on the left, then on the “+” and SQL Database.

image.png

  1. Select “Serverless” and provide a name

image.png

Create a Power BI workspace and connect Synapse to it

  1. Open powerbi.com
  2. Login with your own company account
  3. Click on workspaces –> Create a Workspace

image.png

  1. Provide name and description and click on save

  2. Go back to Synapse, click on Manage –> Linked services –> New

image.png

  1. Select Connect to Power BI or search for Power BI
  2. Provide a name for the linked service and the workspace name you have just created in Power BI, then click on Create

image.png

  1. Click on Publish All to validate all modifications

image.png

Create ETL Pipelines to import data

The subsequent step must be executed for each business process you want to analyze, with facts and dimensions

  1. Click on the Storage sign on the left (“Data”), then on the linked tab and on the primary data lake object as in the figure. Then, click on “New folder” to create a new folder with the name of your business process.

image.png

  1. Click on “integrate” button on the left image.png

  2. Click on the plus “+” button on the right of “integrate”, then select “Pipeline”

image.png

  1. Provide a name for the pipeline

The subsequent step must be repeated for each table you want to create in the Data Lake

  1. On the left bar, expand “Move and Transform” and select “Copy data” tool image.png

  2. Provide a name for the step, like copy_[table_name]
  3. Click on the source tab and apply the following steps:
    • click on “+ New” button on the right of “Source Dataset”
    • Choose Azure SQL Database as input (or your source DB)
    • Fulfill the properties form with a name and the linked service we created for the external database, select the table (or “None” if you want to import via a query and not the full table), then click on OK.

image.png

image.png

  1. Go the the sink tab, click on “+ New” on the right of “Sink dataset”
  2. Select “Azure Data lake storage Gen2”, then “Parquet”
  3. In the subsequent form, provide a name and select the workspace default storage as linked service.

image.png

  1. Click on the folder on the right of the “file path” section, then select the path in the data lake where you want to drop your data (double click on a folder to open it). You may choose the folder you created for the business process in previous steps.
  2. If you wrote a query in the source section, also add the file name on the last text form of “File path” section, with “.parquet” as an extension. If you selected a table, leave it blank.

  3. Click on the “Debug” button above to test the pipeline. (NB: this will trigger the import and impute costs) and verify if the pipeline runs successfully, in the output tab below

image.png

image.png

  1. If the pipeline terminates correctly, verify if data retrieved are correct. Click on “Data” on the left, then on “Linked” tab, click on the primary data lake as in the image below. Then, navigate to the path you selected on the steps below

image.png

Now we will create a view on the Data Lake. You may want to create external tables instead of of views. In this case, in next step select “Create an External Table” instead of SELECT TOP rows. Here an analysis of the differences. image.png

  1. Right click on the file you created, and select “New SQL Script” –> Select TOP 100 Rows

  2. Click on Run and verify if results are expected.

image.png image.png

  1. Now, modify the script removing the TOP 100 text and adding at the beginning “CREATE VIEW [name] AS”, as in the figure below. Also, select the database you created in the Preliminary steps at the right of use database, then click on “Run”

  2. You may want to provide a name to the Script for creating the view (on the right bar), while this is not strictly necessary

image.png

  1. Click on “Validate all” tab above, and if the systems shows no errors, then click on “Commit All”

image.png

Add trigger for the pipeline

You should now define triggers to schedule the activation of the pipeline.

  1. On the created pipeline, click on Add Trigger –> “New/Edit”

image.png

  1. On the dropdown list, click on “New” TODO: I think that if you already created a trigger for another pipeline, you can reuse it and avoid creating a new one. To verify.
  2. Create a new trigger providing all requested information. You may select “No” on activated to suspend the trigger and activate it later.

image.png

Generate PowerBI Dashboards

  1. Click on the paper icon on the left (develop), then click on Power BI datasets as in the figure, then click on New Power BI datasets

image.png

  1. Download the PowerBI file, double click on it to open PowerBI

image.png

  1. From PowerBI, connect with your Microsoft account

image.png

  1. Select the views you created in the Import data from external source step and click on Load

image.png

  1. Select Direct Query or Import and validate
  2. Verify the relationships in the correspondent view image.png
  3. Prepare your analysis
  4. When you finish, click on file –> publish and select the Workspace of Power BI created on previous step
  5. You can now see the report in Synapse, in Develop tab

image.png

Agenda

  1. Presentation :clock12: (00:00)
  2. Introduction
  3. Azure and microsoft resources :clock1230: (00:30)
  4. Azure Synapse :clock1: (01:00)
  5. Q&A :clock2: (02:00)