Microsoft Cognitive Services Leap Forward

I have written before about data science and how every developer should consider learning at least a little about it. Like most things, once you learn the basics, it's another useful tool to have in your belt of engineering tricks to solve problems for customers. Just as you don't need to be a mathematician to do basic equations and balance your household budget, you do not need to be a full-fledged data scientist or have a PhD to use the power of Machine Learning and so-called Artificial Intelligence.



Microsoft and others are extremely good at taking big technologies and building services around them that bring what may seem difficult within reach of the general development community. If you have time to spend learning the deep core fundamentals of this branch of computing, that's great; but where other things take priority and you still need to utilise the power of Machine Learning and Artificial Intelligence, there are some great APIs you can hook into. Recently, Microsoft rolled out further advances in the area under the banner of 'Cognitive Services'... and they are really quite impressive.

The offering is provided by Microsoft as a series of REST-based API services, available here. By exposing the power of the underlying services via a web-based API, the service allows us to integrate very powerful features into modern applications, both web and mobile. The APIs offered cover a broad range that includes Vision, Speech, Language, Knowledge and Search.

 

The first question I usually ask of new technology offerings is 'what can it do for me?' ... from here, I decide if it is worth looking into further or not. I often find that examples given by vendors may not be applicable to me immediately, but sit at the back of my head and become useful at a later stage. Let's take a look at some of what is there and see if it is worth deeper investigation.

The Vision service covers Computer Vision, Content Moderation, Emotion, Face and Video. Here are some examples of how these are useful.

Using the Face Detection API, you can upload an image (or point to the URL of an image online), and the API will return information about any faces located in the image. In this example, you can see my handsome visage and what are called 'face landmarks' that the API identified. These can be used in conjunction with other services, as you will see later.
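To give a feel for what 'pointing the API at a URL' looks like in code, here is a minimal Python sketch that builds (but does not send) a face detection request. Treat the details as assumptions to check against the official reference: the `westus` host and `/face/v1.0/detect` path, the `returnFaceLandmarks` parameter and the `Ocp-Apim-Subscription-Key` header are what I would expect for a typical subscription, and `build_face_detect_request` is just my own helper name.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a region-specific endpoint; yours may differ, so check your subscription.
FACE_DETECT_URL = "https://westus.api.cognitive.microsoft.com/face/v1.0/detect"


def build_face_detect_request(image_url, subscription_key, endpoint=FACE_DETECT_URL):
    """Build (but do not send) a POST request asking for faces and their landmarks."""
    # Assumed query parameter that asks the service to include face landmarks.
    query = urllib.parse.urlencode({"returnFaceLandmarks": "true"})
    # The image is referenced by URL in a small JSON body.
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumed header name for the subscription key.
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
    return urllib.request.Request(
        f"{endpoint}?{query}", data=body, headers=headers, method="POST"
    )


request = build_face_detect_request("https://example.com/photo.jpg", "YOUR_KEY")
print(request.get_method(), request.full_url)

# To actually call the service (needs a valid key and network access):
# with urllib.request.urlopen(request) as response:
#     faces = json.load(response)
```

Keeping the request building separate from the sending makes it easy to inspect or unit test what you are about to post before spending any of your API allowance.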

 

One use of the API is verifying whether two facial images belong to the same person. By examining the probability score returned, you can gauge how confident the service is in the match.
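In practice you would probably compare that score against a threshold of your own choosing. The sketch below shows the idea in Python. The response shape (an `isIdentical` flag plus a `confidence` value between 0 and 1) is an assumption on my part, the sample JSON is made up for illustration, and `is_same_person` is a hypothetical helper rather than anything provided by the service.

```python
import json


def is_same_person(verify_response_json, threshold=0.5):
    """Return (match, confidence) for a verification response.

    Assumes the response carries a 'confidence' value between 0 and 1.
    """
    result = json.loads(verify_response_json)
    confidence = float(result["confidence"])
    # Apply our own threshold rather than relying only on the service's flag.
    return confidence >= threshold, confidence


# Made-up sample response, for illustration only.
sample = '{"isIdentical": true, "confidence": 0.82}'
match, confidence = is_same_person(sample, threshold=0.7)
print(match, confidence)
```

How strict you make the threshold depends on the application: unlocking a door deserves a much higher bar than suggesting a photo tag.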

 

The Face API and its corresponding moving-picture counterpart, the Video API, can also be used to extract a deeper understanding from images. The following examples show the service identifying the gender of a person in the image together with an age estimate.



Emotions can also be detected and reported on.

 

Let's look at another very powerful example. This one analyzes an uploaded image of a swimmer, and is capable of giving us information about what's happening in the image - very impressive!

 

Before we move on, let us look at analyzing text that's embedded within images. These two examples demonstrate text being extracted that's both typeset and handwritten. As you can see, all data is of course returned in JSON format.
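Because the results come back as JSON, turning them into plain lines of text takes only a few lines of code. The Python sketch below assumes a nested shape of regions, then lines, then words, each word carrying a `text` field. That shape is my assumption about the response, the sample data is invented, and `extract_lines` is a hypothetical helper.

```python
def extract_lines(ocr_result):
    """Flatten an OCR-style result into a list of text lines.

    Assumes the nesting regions -> lines -> words, with each word
    holding its recognised string under 'text'.
    """
    lines = []
    for region in ocr_result.get("regions", []):
        for line in region.get("lines", []):
            words = [word["text"] for word in line.get("words", [])]
            lines.append(" ".join(words))
    return lines


# Invented sample, for illustration only.
sample = {
    "regions": [
        {
            "lines": [
                {"words": [{"text": "Hello"}, {"text": "world"}]},
                {"words": [{"text": "Cognitive"}, {"text": "Services"}]},
            ]
        }
    ]
}
print(extract_lines(sample))
```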

 

 

The text analysis service is equally powerful, offering a lot of opportunity to add value to the systems we develop. Among other things, text analysis can extract key phrases, detect the topic of the text, and identify the language the text is written in. At the time of writing (April 2017), sentiment analysis is only provided in English, French, Spanish, and Portuguese, but more languages are to follow.
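As a rough sketch of how sentiment analysis might fit into your own code, the Python below builds a request body and turns returned scores into simple labels. The `documents` list with `id`, `language` and `text` fields, and a `score` between 0 and 1 in the response, are my assumptions about the payload shape. Both helper names, the thresholds and the sample response are invented for illustration.

```python
import json


def build_sentiment_payload(texts, language="en"):
    """Build a JSON body with one document per text (assumed request shape)."""
    documents = [
        {"id": str(index), "language": language, "text": text}
        for index, text in enumerate(texts, start=1)
    ]
    return json.dumps({"documents": documents})


def label_scores(response_json, positive_at=0.6, negative_at=0.4):
    """Map each document id to a label, assuming a 0..1 'score' per document."""
    labels = {}
    for document in json.loads(response_json)["documents"]:
        score = document["score"]
        if score >= positive_at:
            labels[document["id"]] = "positive"
        elif score <= negative_at:
            labels[document["id"]] = "negative"
        else:
            labels[document["id"]] = "neutral"
    return labels


payload = build_sentiment_payload(["I love this product", "This was a waste of money"])

# Invented sample response, for illustration only.
sample_response = '{"documents": [{"id": "1", "score": 0.93}, {"id": "2", "score": 0.08}]}'
print(label_scores(sample_response))
```

The cut-off values are arbitrary choices of mine; you would tune them against real feedback from your own users.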


Another text-based fundamental that is provided as part of the service is translation. Translation for speech is supported for 9 languages, and for text a massive 60 languages. One of the things you discover as you move between industries is that each sector has its own domain-specific language in which it communicates. To help with this, there is a custom translation system that you can feed with your own specialised dictionaries.

One specialist, domain-specific area catered for by Cognitive Services is the Academic Knowledge API. This service offers a number of very interesting ways to interact with academic research papers. Available functionality includes paper similarity matching, graph search that lets you follow citations (with queries that can be expressed as lambda expressions), and natural language interpretation options.

 

Despite a plethora of content being published on a website, users can sometimes get lost and simply cannot find the information they require. In this case it's useful to create a 'frequently asked questions' section. However, like the content itself, this can be bothersome to both create and maintain. The 'QnA' service allows you to create an FAQ service from existing content, using natural language analysis to derive the questions from the content.

 

I think that most web developers have at some stage been asked to develop, or at least interact with, some kind of eCommerce website. Three APIs in the 'Recommendations API' should prove very useful for this sector. As the name suggests, this API uses cognitive intelligence to recommend products to customers. Options include 'Frequently bought together', 'Item to item', and 'Personalized user recommendations'. These APIs allow you to offer the kind of services normally only found on bigger sites, such as 'customers who liked this product also...', 'because you watched this movie you might also like...' etc.


There's a lot of seriously interesting stuff in Cognitive Services that is worth checking out. Even if you don't have a use for the service now, it may trigger a thought in future, so it is worth spending some time familiarising yourself with the options. As an aside, you can sign up for free and there is a very reasonable monthly allowance you can use for testing without having to pay anything.

Useful links
Intro videos