Interactive Data Analytics And Data Visualization With R In Apache Zeppelin

Hi Friends! I started looking for options to add interactive data analytics and data visualization to my custom web page, and I want to share one of the best options I found: the open source technology Apache Zeppelin. It integrates easily with my web application, runs in the cloud (AWS and Azure), is 100 percent open source, and makes it easy to mix languages in the same notebook.

As per the wiki:

"Zeppelin is a modern web-based tool for data scientists to collaborate on large-scale data exploration and visualization projects. It is a notebook-style interpreter that enables collaborative analysis sessions to be shared between users. Zeppelin is independent of the execution framework itself. It is a collaborative data analytics and visualization tool for distributed, general-purpose data processing systems such as Apache Spark, Python, and Shell, including data visualization using AngularJS templates. Currently, Apache Zeppelin supports many interpreters, such as Apache Spark, Python, JDBC, Markdown, and Shell."

In detail, the list of interpreters supported by Apache Zeppelin is shown in the image below.

The notebook is the place for all your needs: data ingestion, data discovery, data analytics, data visualization, and collaboration. Compared with Microsoft technologies, this tool is very similar to WWF (Windows Workflow Foundation) or PowerShell pipeline operations: the output of the first notebook paragraph is the input to the second one. Use Apache Spark and R as data analytics tools (part of data science), similar to SSAS (SQL Server Analysis Services).

Zeppelin provides a simple interpreter model for all of these languages. I write most of my code in Python and build the data visualization reports using AngularJS templates, as shown in some of the templates given below.

Deployment is pretty simple. A team can use Zeppelin as an innovation tool for people in the domain verticals working on data analysis and IoT. People start implementing their ideas in the hosted cloud portal; then, if an idea looks like it will generate money, simply export the notebooks (the complete workflows) as JSON and deploy them behind an API.
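Once exported, a note is just JSON that can be inspected or post-processed with ordinary tooling. A minimal sketch, assuming a made-up note (the note name, paragraph titles, and paragraph text below are invented, though the top-level "name"/"paragraphs"/"text" keys mirror what Zeppelin's export produces):

```python
import json

# Hypothetical exported Zeppelin note. The structure mirrors a real
# note.json export, but the content here is made up for illustration.
note_json = """
{
  "name": "Sales Analysis",
  "paragraphs": [
    {"title": "Load data", "text": "%python\\ndf = load_sales()"},
    {"title": "Report", "text": "%sql\\nselect region, sum(amount) from sales group by region"}
  ]
}
"""

def list_interpreters(note):
    # The first line of each paragraph is its interpreter directive.
    return [p["text"].splitlines()[0] for p in note.get("paragraphs", [])]

note = json.loads(note_json)
print(note["name"], list_interpreters(note))
```

A script like this makes it easy to audit which interpreters a notebook uses before deploying it to another environment.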

Apache Zeppelin can be installed on Windows either by downloading the installer “Sparklet-0.4.6.msi” (around 501 MB) or by directly using the “Spark standalone VM” virtual machine available in Azure or AWS. You can also download this VM from the online portal and run it on your local machine using Oracle VirtualBox.

Currently, I have installed Zeppelin on Amazon EMR, but Microsoft Azure also offers Spark + Zeppelin as a service.

Installation on Windows

  1. Just type “Apache Spark & Zeppelin Installer for Windows 64 bit” into any available search portal and download the Windows installer. I downloaded “Sparklet-0.4.6.msi”, which is around 501 MB.
  2. Before installation, make sure that the following prerequisites are installed on your system:
    • Java Development Kit 7 or a later version
    • Microsoft Visual C++
    • If you want to use PySpark (the Python API for Spark), ensure that a Python distribution is installed on your system.
  3. Double-click the downloaded installer (Sparklet-0.4.6.msi) and follow the steps to complete the installation.






After the installation completes, start Zeppelin from Windows -> Start -> Run Zeppelin.

Create a new notebook. From the header pane, click Notebook, and then click Create New Note.


On the new notebook's web page, click the heading and change the name of the notebook (if you want to). Press ENTER to save the name change. Also, make sure that the notebook header shows a connected status in the top-right corner.


The user screen looks as shown below, with the following list of icons.

  1. Run all paragraphs (shortcut: Shift + Enter).
  2. To show/hide the code or the output, use the next two icons, as shown below.
  3. The import icons are the ones normally used every time.
  4. Export the notebook in JSON format to deploy it further on the cloud or in other environments. (A sample exported JSON code file is attached to this article.)
  5. Run the scheduler. It is very useful for automating runs with a CRON expression or scheduling them at specified time intervals.
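The schedule is given as a cron expression (Zeppelin's scheduler is based on Quartz, whose expressions include a leading seconds field). A toy validity check, just to illustrate the format, and not a real Quartz parser:

```python
import re

# Quartz cron expressions have 6 or 7 whitespace-separated fields
# (seconds, minutes, hours, day-of-month, month, day-of-week, optional
# year). This sketch only checks field count and allowed characters.
CRON_FIELD = re.compile(r"^[\d*,/\-?LW#A-Za-z]+$")

def looks_like_quartz_cron(expr):
    fields = expr.split()
    return len(fields) in (6, 7) and all(CRON_FIELD.match(f) for f in fields)

# "Every 5 minutes" in Quartz syntax:
print(looks_like_quartz_cron("0 0/5 * * * ?"))  # True
```

In the notebook UI you would paste the expression itself (e.g. `0 0/5 * * * ?`) into the scheduler dialog; the code above is only a sanity check on its shape.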


Workflow of Notebook Language Interpreter

Below is a list of sample code in different language interpreters.

Sample code with Python

Data Visualization with Apache Zeppelin

Reports can be displayed in dashboards using Apache Zeppelin in four formats: “Text”, “HTML”, “Table”, and “AngularJS Templates”.

Sample shell script with data visualization using AngularJS templates

The Angular display system treats the output as an AngularJS view template. It compiles the template and displays it inside Zeppelin.
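A tiny sketch of that idea: any paragraph output that begins with `%angular` is compiled as an AngularJS template. The template below is made up; inside a real notebook, a helper such as `z.angularBind` would push the bound values into scope, while here we only construct the output string:

```python
# Sketch of the Angular display system: output starting with "%angular"
# is rendered as an AngularJS template. {{total}} is a hypothetical
# binding; in a notebook it would come from z.angularBind("total", ...).
def angular_paragraph(template):
    return "%angular " + template

para = angular_paragraph("<h3>Total sales: {{total}}</h3>")
print(para)
```

Printing `para` as a paragraph's output is what makes Zeppelin hand it to the Angular display system instead of rendering it as plain text.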




Sample Python script for data visualization in table format, which can then easily be rendered in different graphical formats.
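A minimal sketch of such a script (the sales figures are invented): output that begins with `%table` and uses tab-separated columns and newline-separated rows is rendered by Zeppelin as an interactive table that can be switched to the various chart types:

```python
# Build a paragraph output in Zeppelin's %table display format:
# header row and data rows are tab-separated, one row per line.
def to_zeppelin_table(header, rows):
    lines = ["\t".join(header)]
    lines += ["\t".join(str(v) for v in row) for row in rows]
    return "%table " + "\n".join(lines)

out = to_zeppelin_table(["region", "amount"], [("East", 120), ("West", 95)])
print(out)
```

In a notebook, printing this string makes the output chartable with the built-in bar, pie, and line chart options.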



Sample code with the SQL interpreter
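In a Zeppelin notebook, the `%sql` interpreter typically queries tables registered by Spark. As a self-contained stand-in for the same query pattern, here is plain Python with sqlite3 (the table name and data are made up):

```python
import sqlite3

# Stand-in for a Zeppelin %sql paragraph: in a notebook, %sql runs
# against Spark temp tables; here an in-memory SQLite table plays
# that role with an illustrative aggregation query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("East", 120), ("West", 95), ("East", 40)])

rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('East', 160), ('West', 95)]
conn.close()
```

In Zeppelin, the result of such a query is rendered automatically as a table with the chart toolbar, with no `%table` formatting needed.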



Summary & Resources

Apache Zeppelin makes it easy to mix languages in the same notebook, is well suited to advanced data science work, and is easily deployed in the cloud or other environments.

In my next article, I plan to cover some use cases with detailed sample code, along with deployment options in the cloud and other environments. Follow the link for further learning.