R And Python Drive SQL Server 2017 - Microsoft’s Move Into Machine Learning

Why Python is the ideal fit for ML in SQL Server 2017, plus a look at how Anaconda relates to Python.

Python development companies have reason to celebrate: Microsoft is showing clear signs of leaning towards machine learning (ML). The extended R and Python support in SQL Server 2017's recently announced production-quality beta release is conclusive proof of what appeared to be in the making when SQL Server 2016 shipped with support for embedded R code.

In fact, Microsoft has strongly reinforced this belief by renaming its R-related features under the "machine learning" banner (SQL Server R Services became SQL Server Machine Learning Services). Both regular developers and data scientists benefit from what SQL Server 2017 adds: developers keep working in the T-SQL they already know (it is still supported), while data scientists can use R and Python for statistical programming, which is built deep into both languages.

While R caters to statistical analytics experts, Python offers a comparatively easy on-ramp: an analytical programming language that brings data and code together.

Python – why it is the ideal fit for ML in SQL Server 2017

Of late, there has been wide demand for Python development companies in the enterprise application market, thanks to the following advantages of Python in SQL Server 2017:

1. Developers can work without extracting query data sets

Most developers are already familiar with T-SQL, and SQL Server 2017 maintains backward compatibility with its previous version, so existing T-SQL code and skills carry over. Python itself is easy to learn. Because the Python code runs inside SQL Server, there is no need to extract query datasets and then load them into a separate application, which takes external data storage and processing out of the equation. Avoiding that data movement can also speed up analytical (OLAP-style) workloads considerably. Most importantly, because the Python code is wrapped in stored procedures (through the sp_execute_external_script system procedure) and called from T-SQL, developers do not need to be Python experts to use it.
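In SQL Server 2017 the script itself is passed to sp_execute_external_script from T-SQL, which needs a live SQL Server instance, so the following is only an analogy. It uses Python's built-in sqlite3 module, with SQLite standing in for SQL Server, to register a Python aggregate inside the database engine. The statistic is then computed within the query, and only one result row per group comes back instead of every raw row. The table `sales`, its columns, the aggregate name `py_stdev`, and the helpers `demo_connection` and `spread_by_region` are all hypothetical.

```python
import sqlite3
import statistics

# SQLite is a stand-in for SQL Server here. This illustrates "run Python next
# to the data"; it is NOT the sp_execute_external_script mechanism itself.


class Stdev:
    """Hypothetical aggregate: sample standard deviation computed in-query."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row in the group; NULLs are skipped like SQL aggregates do.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # statistics.stdev needs at least two data points.
        if len(self.values) < 2:
            return None
        return statistics.stdev(self.values)


def demo_connection():
    """Build an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10.0), ("east", 14.0), ("west", 3.0), ("west", 5.0), ("west", 7.0)],
    )
    return conn


def spread_by_region(conn):
    """Return (region, stdev of amount) rows computed by the database engine."""
    conn.create_aggregate("py_stdev", 1, Stdev)
    cur = conn.execute(
        "SELECT region, py_stdev(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(spread_by_region(demo_connection()))
```

Only the two summary rows cross the database boundary; the individual sales rows never leave the engine, which is the property the paragraph above describes.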

2. Maintains data sovereignty and compliance

Because Python has direct access to the data inside the database, developers can work within SQL Server's security boundaries. This helps preserve data confidentiality and security, since the data does not have to be transferred out of the server. Developers work directly on the data in place rather than on exported extracts. Code written once can then be deployed and run locally, on-premises, or in the cloud, as the application requires.

3. Microsoft offers a variety of Python-based tools

Although SQL Server 2017 was still labelled a production-quality beta release, it is a mature product that offers an impressive, Python-capable ML toolset. Installing the Python feature installs a Python interpreter along with other ML tools, such as Anaconda (a Python-based data analysis distribution) and Microsoft's revoscalepy package. Beyond these, developers can add further ML packages through the built-in package management functionality. Large open-source packages such as Microsoft Cognitive Toolkit (CNTK) and Google TensorFlow add GPU-accelerated deep learning to the existing functionality.
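To see which of these tools a script can actually use, one option is to ask the Python runtime directly. The sketch below is a minimal, hypothetical helper (`package_report` is not a Microsoft API) built on the standard library's `importlib.util.find_spec`, which reports whether each package can be imported without importing it. The package list is an assumption: revoscalepy and microsoftml are Microsoft's packages for SQL Server 2017 Machine Learning Services, while tensorflow and cntk would have to be installed separately. Run outside the SQL Server Python environment, expect most of them to show as missing.

```python
import importlib.util

# Assumed package names to look for. revoscalepy/microsoftml come from SQL
# Server Machine Learning Services; tensorflow/cntk are optional extras.
WANTED = ["revoscalepy", "microsoftml", "pandas", "numpy", "tensorflow", "cntk"]


def package_report(names):
    """Map each top-level package name to True if it is importable here."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, present in package_report(WANTED).items():
        print(f"{name:12} {'installed' if present else 'missing'}")
```

Checking with `find_spec` rather than a bare `import` avoids the cost and side effects of loading large libraries such as TensorFlow just to confirm they exist.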

Anaconda – the big-data-oriented distribution of Python

Anaconda is a Python-based distribution that helps visualize data and add machine learning capabilities to the database. It is particularly useful when working with big data, and it reflects how Python has evolved to process larger chunks of data. It can also work with data held in Hadoop clusters and Amazon S3, so data scientists with Hadoop experience can bring their existing skill set to SQL Server by using the Anaconda toolset for Python.
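Working with big data usually means processing it one chunk at a time instead of loading everything into memory. The following is a generic sketch of that idea, not Anaconda's or revoscalepy's API: each chunk is reduced to a small (count, mean, sum of squared deviations) summary, and the summaries are merged with Chan et al.'s pairwise update, so the overall mean and sample variance match a single pass over all the data. The function names and the default chunk size are illustrative choices.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# (count, mean, M2), where M2 is the sum of squared deviations from the mean.
Stats = Tuple[int, float, float]


def chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most `size` values from any iterable."""
    it = iter(values)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def chunk_stats(block: List[float]) -> Stats:
    """Summarise one non-empty chunk; only this small tuple is kept."""
    n = len(block)
    mean = sum(block) / n
    m2 = sum((x - mean) ** 2 for x in block)
    return n, mean, m2


def merge(a: Stats, b: Stats) -> Stats:
    """Combine two partial summaries using Chan et al.'s pairwise formula."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def streaming_mean_variance(values: Iterable[float], size: int = 1000) -> Tuple[float, float]:
    """Mean and sample variance, computed one chunk at a time."""
    total: Stats = (0, 0.0, 0.0)
    for block in chunks(values, size):
        total = merge(total, chunk_stats(block))
    n, mean, m2 = total
    if n < 2:
        raise ValueError("need at least two values")
    return mean, m2 / (n - 1)
```

For example, `streaming_mean_variance(range(1, 11), size=3)` returns a mean of 5.5 and a sample variance of about 9.17, the same as computing over all ten values at once. Because `merge` only needs the three-number summaries, the same approach works when chunks are processed on different machines.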

Conclusion

The natural evolution of Microsoft's machine learning features, from plain T-SQL, to the Azure-focused U-SQL (which combines SQL-style queries with C#), to R support in SQL Server 2016, has led to the inclusion of the most popular language in the ML world, Python, in SQL Server 2017. This lets applications leverage Microsoft's performance and scaling features inside the database itself. A further advantage of being part of Microsoft's product suite is that both on-premises and cloud hosting are supported, and SQL Server 2017 itself also runs on Linux and, through Docker containers, on macOS.

It aims to eliminate the need for separate third-party database management, data analytics, and business intelligence systems, since all of them can be combined into a single solution built with Python. It effectively makes the database intelligent!