lectio

Vision

Contents

APIs

Face

The Azure Cognitive Services Face service provides algorithms that detect, recognize, and analyze human faces in images. The ability to process human face information is important in many different software scenarios.

Face Detection

The Face service detects human faces in an image and returns the rectangle coordinates of their locations. Optionally, face detection can extract a series of face-related attributes. Examples are head pose, gender, age, emotion, facial hair, and glasses.

The face detection feature is also available through the Computer Vision API.

image

For more information on face detection, see the Face detection concepts article. Also see the Detect API reference documentation.
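As a sketch of what a Detect call returns, the Go snippet below decodes a trimmed-down sample of the JSON array the endpoint sends back. Only the faceId and faceRectangle fields are modeled here; the optional attribute fields are omitted, so treat this as a minimal subset rather than a complete client.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FaceRectangle holds the pixel coordinates of a detected face.
type FaceRectangle struct {
	Top    int `json:"top"`
	Left   int `json:"left"`
	Width  int `json:"width"`
	Height int `json:"height"`
}

// DetectedFace is a subset of one element of the Detect response array.
type DetectedFace struct {
	FaceID        string        `json:"faceId"`
	FaceRectangle FaceRectangle `json:"faceRectangle"`
}

// parseDetectResponse decodes the JSON array returned by the Detect endpoint.
func parseDetectResponse(body []byte) ([]DetectedFace, error) {
	var faces []DetectedFace
	err := json.Unmarshal(body, &faces)
	return faces, err
}

func main() {
	// A trimmed-down sample response in the shape Detect returns.
	sample := []byte(`[{"faceId":"c5c24a82-6845-4031-9d5d-978df9175426",
		"faceRectangle":{"top":131,"left":177,"width":162,"height":162}}]`)
	faces, err := parseDetectResponse(sample)
	if err != nil {
		panic(err)
	}
	for _, f := range faces {
		fmt.Printf("face %s at (%d,%d) %dx%d\n",
			f.FaceID, f.FaceRectangle.Left, f.FaceRectangle.Top,
			f.FaceRectangle.Width, f.FaceRectangle.Height)
	}
}
```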

Face Verification

The Verify API performs an authentication check between two detected faces, or between one detected face and one person object.

Practically, it evaluates whether two faces belong to the same person. This capability is potentially useful in security scenarios. For more information, see the Face recognition concepts guide or the Verify API reference documentation.
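A minimal Go sketch of the two request shapes Verify accepts, face-to-face and face-to-person, plus the isIdentical/confidence result it returns. The JSON field names follow the documented API; the IDs used in main are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// VerifyResult mirrors the JSON the Verify API returns.
type VerifyResult struct {
	IsIdentical bool    `json:"isIdentical"`
	Confidence  float64 `json:"confidence"`
}

// faceToFaceBody builds the request body for face-to-face verification.
func faceToFaceBody(faceID1, faceID2 string) ([]byte, error) {
	return json.Marshal(map[string]string{
		"faceId1": faceID1,
		"faceId2": faceID2,
	})
}

// faceToPersonBody builds the request body for face-to-person verification.
func faceToPersonBody(faceID, personGroupID, personID string) ([]byte, error) {
	return json.Marshal(map[string]string{
		"faceId":        faceID,
		"personGroupId": personGroupID,
		"personId":      personID,
	})
}

func main() {
	body, err := faceToFaceBody("face-a", "face-b")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))

	var res VerifyResult
	if err := json.Unmarshal([]byte(`{"isIdentical":true,"confidence":0.91}`), &res); err != nil {
		panic(err)
	}
	fmt.Println(res.IsIdentical, res.Confidence)
}
```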

Find Similar Faces

The Find Similar API compares a target face with a set of candidate faces to find a smaller set of faces that look similar to the target face. Two working modes, matchPerson and matchFace, are supported.

The matchPerson mode returns similar faces after it filters for the same person by using the Verify API.

The matchFace mode ignores the same-person filter. It returns a list of similar candidate faces that might or might not belong to the same person. For more information, see the Face recognition concepts guide or the Find Similar API reference documentation.

Face Grouping

The Group API divides a set of unknown faces into several groups based on similarity. Each group is a disjoint proper subset of the original set of faces. All of the faces in a group are likely to belong to the same person, but a single person can appear in several different groups, differentiated by another factor such as expression. For more information, see the Face recognition concepts guide or the Group API reference documentation.
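A short Go sketch of consuming a Group response: alongside the groups themselves, the API returns a messyGroup of faces that could not be placed with any other face. The struct below models only those two fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GroupResult mirrors the Group API response: groups of similar faces plus
// a messyGroup of faces that did not fit into any group.
type GroupResult struct {
	Groups     [][]string `json:"groups"`
	MessyGroup []string   `json:"messyGroup"`
}

// parseGroupResult decodes a Group API response body.
func parseGroupResult(body []byte) (GroupResult, error) {
	var r GroupResult
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	sample := []byte(`{"groups":[["f1","f2"],["f3"]],"messyGroup":["f4"]}`)
	r, err := parseGroupResult(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d groups, %d ungrouped faces\n", len(r.Groups), len(r.MessyGroup))
}
```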

Person Identification

The Identify API is used to identify a detected face against a database of people. This feature might be useful for automatic image tagging in photo management software. You create the database in advance, and you can edit it over time.

The following image shows an example of a database named “myfriends”. Each group can contain up to 1 million different person objects. Each person object can have up to 248 faces registered.

image

After you create and train a database, you can identify a newly detected face against the group. If the face is identified as a person in the group, that person object is returned.
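The Identify response is a list of candidate persons per queried face, each with a confidence score. The Go sketch below decodes a sample response and picks the highest-confidence candidate; in a real application you would also apply a minimum-confidence threshold before trusting the match.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Candidate is one possible identity returned by the Identify API.
type Candidate struct {
	PersonID   string  `json:"personId"`
	Confidence float64 `json:"confidence"`
}

// IdentifyResult holds the candidates for one queried face.
type IdentifyResult struct {
	FaceID     string      `json:"faceId"`
	Candidates []Candidate `json:"candidates"`
}

// bestCandidate returns the highest-confidence candidate, or false if
// the face matched nobody in the group.
func bestCandidate(r IdentifyResult) (Candidate, bool) {
	if len(r.Candidates) == 0 {
		return Candidate{}, false
	}
	best := r.Candidates[0]
	for _, c := range r.Candidates[1:] {
		if c.Confidence > best.Confidence {
			best = c
		}
	}
	return best, true
}

func main() {
	sample := []byte(`[{"faceId":"q1","candidates":[
		{"personId":"p1","confidence":0.92},{"personId":"p2","confidence":0.48}]}]`)
	var results []IdentifyResult
	if err := json.Unmarshal(sample, &results); err != nil {
		panic(err)
	}
	if c, ok := bestCandidate(results[0]); ok {
		fmt.Printf("face %s identified as %s (%.2f)\n",
			results[0].FaceID, c.PersonID, c.Confidence)
	}
}
```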

Containers

Use the Face container to detect, recognize, and identify faces by installing a standardized Docker container closer to your data.

Computer Vision

Azure’s Computer Vision service provides developers with access to advanced algorithms that process images and return information, depending on the visual features you’re interested in. For example, Computer Vision can determine if an image contains adult content, or it can find all of the human faces in an image.

Analyze Images

You can analyze images to detect and provide insights about their visual features and characteristics. All of the features listed below are provided by the Analyze Image API.

Tag visual features: Identify and tag visual features in an image, from a set of thousands of recognizable objects, living things, scenery, and actions. When the tags are ambiguous or not common knowledge, the API response provides hints to clarify the context of the tag. Tagging isn’t limited to the main subject, such as a person in the foreground, but also includes the setting (indoor or outdoor), furniture, tools, plants, animals, accessories, gadgets, and so on.

Detect objects: Object detection is similar to tagging, but the API returns the bounding box coordinates for each tag applied. For example, if an image contains a dog, a cat, and a person, the Detect operation lists those objects together with their coordinates in the image. You can use this functionality to process further relationships between the objects in an image. It also lets you know when there are multiple instances of the same tag in an image.

Detect brands: Identify commercial brands in images or videos from a database of thousands of global logos. You can use this feature, for example, to discover which brands are most popular on social media or most prevalent in media product placement.

Categorize an image: Identify and categorize an entire image, using a category taxonomy with parent/child hereditary hierarchies. Categories can be used alone, or with our new tagging models. Note: Currently, English is the only supported language for tagging and categorizing images.

Describe an image: Generate a description of an entire image in human-readable language, using complete sentences. Computer Vision’s algorithms generate various descriptions based on the objects identified in the image. The descriptions are each evaluated and a confidence score generated. A list is then returned, ordered from highest confidence score to lowest.

Detect faces: Detect faces in an image and provide information about each detected face. Computer Vision returns the coordinates, rectangle, gender, and age for each detected face. Note: Computer Vision provides a subset of the Face service functionality. You can use the Face service for more detailed analysis, such as facial identification and pose detection.

Detect image types: Detect characteristics about an image, such as whether an image is a line drawing or the likelihood of whether an image is clip art.

Detect domain-specific content: Use domain models to detect and identify domain-specific content in an image, such as celebrities and landmarks. For example, if an image contains people, Computer Vision can use a domain model for celebrities to determine if the people detected in the image are known celebrities.

Detect the color scheme: Analyze color usage within an image. Computer Vision can determine whether an image is black & white or color and, for color images, identify the dominant and accent colors.

Generate a thumbnail: Analyze the contents of an image to generate an appropriate thumbnail for that image. Computer Vision first generates a high-quality thumbnail and then analyzes the objects within the image to determine the area of interest. Computer Vision then crops the image to fit the requirements of the area of interest. The generated thumbnail can be presented using an aspect ratio that is different from the aspect ratio of the original image, depending on your needs.

Get the area of interest: Analyze the contents of an image to return the coordinates of the area of interest. Instead of cropping the image and generating a thumbnail, Computer Vision returns the bounding box coordinates of the region, so the calling application can modify the original image as desired.
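The features above are selected with the visualFeatures query parameter of the Analyze Image call. The Go sketch below only builds the request URL: the /vision/v3.1/analyze path reflects one API version and may differ for your resource, and the endpoint shown in main is a placeholder.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// analyzeURL builds the Analyze Image request URL for a given endpoint and
// set of visual features (e.g. Tags, Objects, Brands, Categories,
// Description, Faces, ImageType, Color, Adult).
func analyzeURL(endpoint string, features []string) string {
	q := url.Values{}
	q.Set("visualFeatures", strings.Join(features, ","))
	return strings.TrimRight(endpoint, "/") + "/vision/v3.1/analyze?" + q.Encode()
}

func main() {
	// Placeholder endpoint; use your own Cognitive Services resource endpoint.
	u := analyzeURL("https://westeurope.api.cognitive.microsoft.com",
		[]string{"Description", "Tags", "Faces"})
	fmt.Println(u)
}
```

The actual request is then a POST with the image URL or binary data in the body and the subscription key in the Ocp-Apim-Subscription-Key header.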

Extract text from images

You can use the Computer Vision Read API to extract printed and handwritten text from images into a machine-readable character stream. The Read API uses our latest models and works with text on a variety of surfaces and backgrounds, such as receipts, posters, business cards, letters, and whiteboards. Currently, English and Spanish are the only supported languages.

You can also use the optical character recognition (OCR) API to extract printed text in several languages. If needed, OCR corrects the rotation of the recognized text and provides the frame coordinates of each word. OCR supports 25 languages and automatically detects the language of the recognized text.
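Because the Read API is asynchronous, you poll an operation URL until the status is succeeded and then walk the result for text. The Go sketch below skips the HTTP calls and shows just the decoding step, using a trimmed-down sample in the documented readResults/lines shape.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// readOperation models a minimal subset of the Read operation result:
// pages of recognized lines of text.
type readOperation struct {
	Status        string `json:"status"`
	AnalyzeResult struct {
		ReadResults []struct {
			Lines []struct {
				Text string `json:"text"`
			} `json:"lines"`
		} `json:"readResults"`
	} `json:"analyzeResult"`
}

// extractText joins every recognized line of a completed Read operation.
func extractText(body []byte) (string, error) {
	var op readOperation
	if err := json.Unmarshal(body, &op); err != nil {
		return "", err
	}
	var lines []string
	for _, page := range op.AnalyzeResult.ReadResults {
		for _, l := range page.Lines {
			lines = append(lines, l.Text)
		}
	}
	return strings.Join(lines, "\n"), nil
}

func main() {
	sample := []byte(`{"status":"succeeded","analyzeResult":{"readResults":[
		{"lines":[{"text":"Hello"},{"text":"world"}]}]}}`)
	text, err := extractText(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```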

Moderate content in images

You can use Computer Vision to detect adult content in an image and return confidence scores for different classifications. The threshold for flagging content can be set on a sliding scale to accommodate your preferences.
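The sliding-scale behavior comes from the raw adultScore and racyScore values in the response: the boolean flags use the service's default cut-off, but you can apply your own threshold instead. A minimal Go sketch, with the threshold value chosen by the application:

```go
package main

import "fmt"

// AdultInfo is the "adult" block of an Analyze Image response when the
// Adult visual feature is requested.
type AdultInfo struct {
	IsAdultContent bool    `json:"isAdultContent"`
	IsRacyContent  bool    `json:"isRacyContent"`
	AdultScore     float64 `json:"adultScore"`
	RacyScore      float64 `json:"racyScore"`
}

// shouldFlag applies an application-chosen threshold on top of the raw
// confidence scores, so the sensitivity can be tuned per scenario.
func shouldFlag(a AdultInfo, threshold float64) bool {
	return a.AdultScore >= threshold || a.RacyScore >= threshold
}

func main() {
	a := AdultInfo{AdultScore: 0.12, RacyScore: 0.65}
	fmt.Println(shouldFlag(a, 0.5)) // flagged because RacyScore >= 0.5
}
```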

Use containers

Use Computer Vision containers to recognize printed and handwritten text locally by installing a standardized Docker container closer to your data.

Custom Vision Service

Azure Custom Vision is a cognitive service that lets you build, deploy, and improve your own image classifiers. An image classifier is an AI service that applies labels (which represent classes) to images, according to their visual characteristics. Unlike the Computer Vision service, Custom Vision allows you to specify the labels to apply.

The Custom Vision service uses a machine learning algorithm to apply labels to images. You, the developer, must submit groups of images that feature and lack the characteristics in question. You label the images yourself at the time of submission. The algorithm then trains on this data and calculates its own accuracy by testing itself on those same images. Once the algorithm is trained, you can test, retrain, and eventually use it to classify new images according to the needs of your app. You can also export the model itself for offline use.
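Once the model is trained and published, a prediction call returns one probability per tag. The Go sketch below decodes a sample prediction response in the documented tagName/probability shape and picks the most probable label; the tag names are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Prediction is one label/probability pair from a Custom Vision classifier.
type Prediction struct {
	TagName     string  `json:"tagName"`
	Probability float64 `json:"probability"`
}

// topLabel returns the most probable label, or "" when there are none.
func topLabel(preds []Prediction) string {
	best := ""
	bestP := -1.0
	for _, p := range preds {
		if p.Probability > bestP {
			best, bestP = p.TagName, p.Probability
		}
	}
	return best
}

func main() {
	sample := []byte(`{"predictions":[
		{"tagName":"hemlock","probability":0.05},
		{"tagName":"japanese cherry","probability":0.95}]}`)
	var resp struct {
		Predictions []Prediction `json:"predictions"`
	}
	if err := json.Unmarshal(sample, &resp); err != nil {
		panic(err)
	}
	fmt.Println(topLabel(resp.Predictions))
}
```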

Form Recognizer (preview)

Azure Form Recognizer is a cognitive service that uses machine learning technology to identify and extract text, key/value pairs, and table data from form documents. It ingests text from forms and outputs structured data that includes the relationships in the original file. You quickly get accurate results that are tailored to your specific content without heavy manual intervention or extensive data science expertise. Form Recognizer comprises custom models, the prebuilt receipt model, and the Layout API. You can call Form Recognizer models by using a REST API to reduce complexity and integrate the service into your workflow or application.

Form Recognizer is made up of the following services:

  Custom models: extract key/value pairs and table data from forms, using models trained on your own form data.
  Prebuilt receipt model: extract data from sales receipts.
  Layout API: extract text and table structure from documents.

Ink Recognizer (Retiring)

The Ink Recognizer API preview ended on August 26, 2020. If you have existing Ink Recognizer resources, you can continue using them until the service is fully retired on January 31, 2021.

The Ink Recognizer Cognitive Service provides a cloud-based REST API to analyze and recognize digital ink content. Unlike services that use Optical Character Recognition (OCR), the API requires digital ink stroke data as input. Digital ink strokes are time-ordered sets of 2D points (X,Y coordinates) that represent the motion of input tools such as digital pens or fingers. It then recognizes the shapes and handwritten content from the input and returns a JSON response containing all recognized entities.

A flowchart describing sending an ink stroke input to the API

Video Indexer

Video Indexer provides the ability to extract deep insights from video, with no need for data analysis or coding skills, using machine learning models based on multiple channels: voice, vocals, and visuals. You can further customize and train the models. The service enables deep search, reduces operational costs, enables new monetization opportunities, and creates new user experiences on large video archives, all with a low barrier to entry.

Demo

References

Agenda

  1. Presentation :clock12: (00:00)
  2. Introduction
  3. Azure Cognitive Services :clock3: (00:15)
  4. Telegram Bot with Go
  5. Vision :clock1: (01:00)
  6. Language :clock130: (01:30)
  7. Decision
  8. Q&A (01:55)