What exactly are Cognitive Services and what are they for? Cognitive Services are a set of machine learning algorithms that Microsoft has developed to solve problems in the field of Artificial Intelligence (AI). The goal of Cognitive Services is to democratize AI by packaging it into discrete components that are easy for developers to use in their own apps. Web and Universal Windows Platform developers can consume these algorithms through standard REST calls over the Internet to the Cognitive Services APIs.
The Cognitive Services APIs are grouped into five categories…
- Vision—analyze images and videos for content and other useful information.
- Speech—tools to improve speech recognition and identify the speaker.
- Language—understanding sentences and intent rather than just words.
- Knowledge—tracks down research from scientific journals for you.
- Search—applies machine learning to web searches.
So why is it worthwhile to provide easy access to AI? Anyone watching tech trends realizes we are in the middle of a period of huge AI breakthroughs, with computers beating chess champions, Go masters and Turing tests. All the major technology companies are in an arms race to hire the top AI researchers.
Alongside the high-profile AI problems that researchers work on, like how to beat the Turing test and how to model computer neural networks on human brains, are discrete problems that developers care about, like tagging our family photos and finding an even lazier way to order our favorite pizza on a smartphone. The Cognitive Services APIs are a bridge that lets web and UWP developers use the resources of major AI research to solve developer problems. Let’s get started by looking at the Vision APIs.
Cognitive Services Vision APIs
The Vision APIs are broken out into five groups of tasks…
- Computer Vision—Distill actionable information from images.
- Content Moderator—Automatically moderate text, images and videos for profanity and inappropriate content.
- Emotion—Analyze faces to detect a range of moods.
- Face—identify faces and similarities between faces.
- Video—Analyze, edit and process videos within your app.
Because the Computer Vision API on its own is a huge topic, this post will deal mainly with its capabilities, as an entryway to the others. The description of how to use it, however, will give you a good sense of how to work with the other Vision APIs.
Note: Many of the Cognitive Services APIs are currently in preview and are undergoing improvement and change based on user feedback.
One of the biggest things that the Computer Vision API does is tag and categorize an image based on what it can identify inside that image. This is closely related to a computer vision problem known as object recognition. In its current state, the API recognizes about 2000 distinct objects and groups them into 87 classifications.
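Since the APIs are consumed through REST calls, it can help to see roughly what a tagging and categorization request looks like on the wire. The following is only a minimal sketch under stated assumptions, not the service's definitive contract: the class and method names (`VisionRequestSketch`, `BuildAnalyzeRequest`, `AnalyzeAsync`) are hypothetical, the `endpoint` base URL is a placeholder you would replace with the one issued alongside your key, and the `vision/v1.0/analyze` path and `visualFeatures` values reflect the preview API and may change.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of any Microsoft SDK.
public static class VisionRequestSketch
{
    // Builds (but does not send) a POST request asking the service to
    // tag and categorize an image supplied as raw bytes.
    // ASSUMPTION: `endpoint` is the base URL issued with your key, and the
    // "vision/v1.0/analyze" path matches the API version you registered for.
    public static HttpRequestMessage BuildAnalyzeRequest(
        string endpoint, string subscriptionKey, byte[] imageBytes)
    {
        // Request only the two features discussed above.
        var uri = new Uri(endpoint.TrimEnd('/') +
            "/vision/v1.0/analyze?visualFeatures=Tags,Categories");

        var request = new HttpRequestMessage(HttpMethod.Post, uri);

        // The subscription key travels in a request header, not in the URL.
        request.Headers.Add("Ocp-Apim-Subscription-Key", subscriptionKey);

        // The image itself is sent as a binary request body.
        request.Content = new ByteArrayContent(imageBytes);
        request.Content.Headers.ContentType =
            new MediaTypeHeaderValue("application/octet-stream");

        return request;
    }

    // Sends the request and returns the JSON response body as text.
    public static async Task<string> AnalyzeAsync(
        HttpClient client, string endpoint, string subscriptionKey, byte[] imageBytes)
    {
        using (var request = BuildAnalyzeRequest(endpoint, subscriptionKey, imageBytes))
        using (var response = await client.SendAsync(request))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Keeping request construction separate from sending makes the shape of the call easy to inspect without touching the network. In practice a client library can hide these details behind a single method call.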
Using the Computer Vision API is pretty easy. There are even samples available for using it on a variety of development platforms including NodeJS, the Android SDK and the Swift SDK. Let’s do a walkthrough of building a UWP app with C#, though, since that’s the focus of this blog.
The first thing you need to do is register at the Cognitive Services site and request a key for the Computer Vision Preview (by clicking on one of the “Get Started for Free” buttons).
Next, create a new UWP project in Visual Studio and add the ProjectOxford.Vision NuGet package by opening Tools | NuGet Package Manager | Manage Packages for Solution and selecting it. (Project Oxford was an earlier name for the Cognitive Services APIs.)
For a simple user interface, you just need an Image control to preview the image, a Button to send the image to the Computer Vision REST Services and a TextBlock to hold the results. The workflow for this app is to select an image -> display the image -> send the image to the cloud -> display the results of the Computer Vision analysis...