The One Minute AI - Azure Databricks Overview

In this article, you will get a one-minute overview of Azure Databricks.

Series introduction

 
Welcome to a new series of short articles about Artificial Intelligence, specifically the Azure AI stack. The objective is that you will learn about an Azure-based AI service in no more than one minute and thus quickly become familiar with the entire stack over a short period of time. These are going to be short, easily digestible articles, so let's get started!
 

What is Azure Databricks?

 
Azure Databricks is an Apache Spark-based analytics platform optimized for Microsoft Azure's cloud services platform, giving Azure users a single platform for Big Data processing and Machine Learning. It integrates with Azure services such as SQL Data Warehouse, Power BI and Azure Active Directory. Because of this integration, it can provide streamlined workflows and collaborative workspaces that bring together the work and needs of data engineers, data scientists and business analysts. Azure Databricks includes all open-source Apache Spark cluster technologies and capabilities.
 

What can Azure Databricks do?

 
Azure Databricks connects to all Azure storage options. For example, it can read from and write to file-based storage, such as Azure Data Lake Store and Blob storage, as well as relational databases, including Azure SQL Database and SQL Data Warehouse, and NoSQL data stores. It can also connect to streaming sources such as Event Hubs or Apache Kafka on HDInsight.
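As a minimal sketch of what reading from and writing to file-based storage can look like, the Python below builds storage URIs and copies a Parquet dataset between two stores. The `wasbs://` and `adl://` schemes are the ones used for Blob storage and Azure Data Lake Store (Gen1). The helper names, account names and paths are hypothetical, and the sketch assumes that access credentials for both stores have already been configured on the cluster.

```python
def blob_uri(container: str, account: str, path: str = "") -> str:
    """Build a wasbs:// URI for a path inside an Azure Blob storage container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def data_lake_store_uri(account: str, path: str = "") -> str:
    """Build an adl:// URI for a path inside Azure Data Lake Store (Gen1)."""
    return f"adl://{account}.azuredatalakestore.net/{path.lstrip('/')}"


def copy_parquet(spark, source_uri: str, target_uri: str) -> None:
    """Read a Parquet dataset from one store and write it to another.

    `spark` is an existing SparkSession (in a Databricks notebook it is
    already defined). Credentials for both stores are assumed to be set up.
    """
    df = spark.read.parquet(source_uri)
    df.write.mode("overwrite").parquet(target_uri)


# Hypothetical account, container and path names, for illustration only.
source = blob_uri("raw", "examplestore", "/events/2020")
target = data_lake_store_uri("examplelake", "curated/events")
```

In a notebook, `copy_parquet(spark, source, target)` would then move the data from Blob storage into the data lake.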
 
With Azure Databricks, compute tasks that might otherwise be spread across separate services, for example Azure Data Lake Analytics, Stream Analytics and Azure Machine Learning, can be implemented in a single workspace.
 
Batch ETL jobs can be developed in the workspace and then scheduled using either the Databricks scheduler or Azure Data Factory. Machine learning models can be created and deployed in the workspace, and jobs for processing streaming data can be developed and deployed to clusters within the workspace.
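To make the scheduling idea concrete, here is a rough sketch of a request body for a nightly notebook job with a cron schedule. The field names follow the Databricks Jobs API (version 2.1) as I understand it, so check the current API reference before relying on them. The function name, job name, notebook path and cluster ID are all hypothetical, and the sketch only builds the payload; it does not call the service.

```python
import json


def nightly_job_payload(notebook_path: str, cluster_id: str,
                        cron: str = "0 0 2 * * ?",
                        timezone: str = "UTC") -> dict:
    """Build a job definition that runs one notebook on a Quartz cron schedule.

    The default cron expression means 02:00 every day
    (fields: seconds, minutes, hours, day-of-month, month, day-of-week).
    """
    return {
        "name": "nightly-etl",  # hypothetical job name
        "tasks": [
            {
                "task_key": "run_etl_notebook",
                "existing_cluster_id": cluster_id,
                "notebook_task": {"notebook_path": notebook_path},
            }
        ],
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
            "pause_status": "UNPAUSED",
        },
    }


if __name__ == "__main__":
    # Hypothetical notebook path and cluster ID, for illustration only.
    payload = nightly_job_payload("/Shared/etl/nightly", "0000-000000-example")
    print(json.dumps(payload, indent=2))
```

Azure Data Factory is the alternative when the notebook is one step in a larger pipeline that also moves data between other Azure services.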
 
Azure Databricks allows you to:
  • use machine learning tools to combine data at any scale and deploy custom machine learning models.
  • bring together all your data, at any scale, in a data warehouse.
  • gain insights through analytics, operational reports and analytical dashboards.
  • capture data from any streaming source and process it in near real time.
In short, Azure Databricks is a managed Apache Spark platform optimized for the cloud. It offers one-click deployment and auto-scaling, along with monitoring tools, security controls and an interactive notebook environment, all of which make it simpler and more cost-efficient to run large-scale Spark workloads.
 
Find out more:
  • https://azure.microsoft.com/en-us/services/databricks/
  • https://databricks.com/introducing-azure-databricks
  • https://databricks.com/blog/2017/11/15/introducing-azure-databricks.html