lectio-uninapoli-2021

Introduction

Contents

Data ingestion and analytics

Modern IT architectures have data coming from many different sources, including application databases, social media interactions, user reviews, and external partners.

The opportunity offered by analyzing these data to take decisions, from immediate actions to long-term business plans, is huge. Some examples:

image

From: Luke Chesser

Data Ingestion challenges

However, efficiently retrieving and transforming these data to make them exploitable for analysis poses many challenges:

These challenges are faced with complex solutions involving data ingestion and analytical tools. Such solutions are sometimes called Business Intelligence solutions, or Big Data architectures, depending on the volume of data coming from the sources.

Traditional Architecture

A typical architecture for a traditional data ingestion process is composed of several components.

image

Icons from flaticons.com

Data Warehouse

Storage containing data retrieved from different applications, unified, simplified, and adapted to the analysis needs.

Divided into different data marts, each representing a business process (e.g. sales, inventory, orders). Each data mart provides:

ETL

Process for retrieving data from external sources and loading them into a Data Warehouse.

ETL processes can operate in two ways:

ETL processes can be implemented with ad-hoc code, but more commonly tools allow defining ETL processes (or pipelines) with simple graphical interfaces. We will see in the next chapters how to implement an ETL pipeline with some Microsoft cloud tools in Azure.
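As a minimal ad-hoc sketch of the three ETL steps, here is a plain Python example. The CSV column names, the `fact_sales` table, and the use of SQLite as a stand-in warehouse are all illustrative assumptions, not part of any specific tool:

```python
import csv
import sqlite3

def extract(path):
    # Extract: read raw rows from a CSV export of a source application.
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: clean and normalize the raw rows for analysis.
    out = []
    for r in rows:
        out.append({
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip().title(),
            "amount_eur": round(float(r["amount"]), 2),
        })
    return out

def load(rows, conn):
    # Load: write the cleaned rows into a warehouse table
    # (SQLite here stands in for a real Data Warehouse).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount_eur REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO fact_sales "
        "VALUES (:order_id, :customer, :amount_eur)",
        rows,
    )
    conn.commit()
```

Graphical ETL tools generate the equivalent of these steps behind the scenes, while also handling scheduling, retries, and monitoring.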

Analytical tools

Tools used to present data retrieved in the data ingestion process. Concretely, analytical tools offer several functions:

Some well-known analytical tools are Tableau and Power BI; even Excel can be considered one.

Modern components

Current architectures often vary from traditional ones, taking into account modern challenges and opportunities.

image

Icons from flaticons.com

Data Lake

A simple storage tool, typically similar to a classical “File System”, that collects all data from external sources in whatever format they arrive, without the need to transform them.

In this way, if the Data Warehouse model changes or additional fields must be retrieved, data can be taken from the Data Lake without extracting them again from the source, where they might also have been deleted in the meantime.

In this case, ETL processes can be called ELT, as the load happens before the transformations.
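The ELT pattern above can be sketched as follows. Local folders stand in for cloud storage, and the date-partitioned layout and helper names are illustrative assumptions:

```python
import json
import pathlib
from datetime import datetime, timezone

def land_raw(lake_root, source, payload):
    # "L" first: write the payload to the lake exactly as received,
    # untransformed, under a per-source, per-day partition.
    day = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    folder = pathlib.Path(lake_root) / source / day
    folder.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%H%M%S%f")
    path = folder / f"{stamp}.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path

def transform_later(path):
    # "T" later: an independent step re-reads the raw file from the lake
    # and applies whatever transformation the warehouse model needs.
    record = json.loads(pathlib.Path(path).read_text(encoding="utf-8"))
    return {k.lower(): v for k, v in record.items()}
```

Because the raw file stays in the lake, the transformation can be rerun with a new warehouse model without going back to the original source.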

Machine Learning

ML tools are more and more often used to enrich data retrieved from external sources. We saw some of these tools in former lessons on ML and cognitive services. Ideas:
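As a toy illustration of such an enrichment step, the sketch below tags each review with a sentiment label. A real pipeline would call a trained model or a cognitive-services API; the rule-based word lists here are purely an assumption for demonstration:

```python
# Hypothetical word lists standing in for a real sentiment model.
POSITIVE = {"great", "excellent", "good", "love"}
NEGATIVE = {"bad", "terrible", "poor", "hate"}

def enrich_with_sentiment(review):
    # Add a "sentiment" field to the record without altering the rest,
    # mimicking how an ML service enriches ingested data.
    words = {w.strip(".,!?").lower() for w in review["text"].split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return {**review, "sentiment": label}
```

In an ELT architecture this step would typically run on raw reviews already landed in the Data Lake, writing the enriched records to the warehouse.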

Other: IoT, Spark, Search Engines

image

Icons from flaticons.com

Agenda

  1. Presentation :clock12: (00:00)
  2. Introduction
  3. Azure and Microsoft resources :clock1230: (00:30)
  4. Azure Synapse :clock1: (01:00)
  5. Q&A :clock2: (02:00)