Machine Learning - Configuring Windows 10 PC To Run RevoScaleR Within Jupyter Notebooks

Introduction

 
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
 
The RevoScaleR library, developed by Microsoft, is a collection of portable, scalable, and distributable R functions for importing, transforming, and analyzing data at scale. You can use it for descriptive statistics, generalized linear models, k-means clustering, logistic regression, classification and regression trees, and decision forests. Its functions run on the RevoScaleR interpreter, which is built on open-source R and engineered to leverage the multithreaded and multinode architecture of the host platform.
 
The RevoScaleR library is included in Machine Learning Server and Microsoft R Client. This tutorial shows how to use RevoScaleR with Jupyter Notebooks on a Windows 10 PC. It covers the following topics:
  1. How to install Anaconda on a Windows 10 PC.
  2. How to install the r-essentials package.
  3. How to switch the environment from CRAN R to Microsoft R Open.
  4. How to install Microsoft R Client.
  5. How to test that Jupyter Notebook and the R environment are properly configured.
 
Step 1. Installing Anaconda
 
Anaconda is a distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment.
 
Anaconda can be downloaded from its official website. We will be using the Individual Edition (Distribution):
 
https://www.anaconda.com/products/individual
 
Anaconda download page
 
Download the 64-bit or 32-bit installer, whichever matches your Windows 10 installation. After the download finishes, complete the installation with the default options.
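 
Before moving on, it can help to confirm that the Anaconda Prompt can actually find the conda and jupyter commands. The following is only a minimal sketch and not part of the original setup: it assumes you save it under a hypothetical name such as check_tools.py and run it with the Python that ships with Anaconda. The helper name locate_tools is illustrative.

```python
import shutil


def locate_tools(names):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # conda and jupyter are the two commands used in the remaining steps.
    for name, path in locate_tools(["conda", "jupyter"]).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If either command is reported as not found, the script was probably started outside the Anaconda Prompt (anaconda3).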
 
Step 2. Installing r-essentials
 
The r-essentials package installs R together with a set of commonly used R packages. To install it with conda, use the “Anaconda Prompt (anaconda3)” that was installed in the previous step.
 
Go to Start and open Anaconda Prompt (anaconda3).
 
In the Anaconda Prompt, run the following command:
 
> conda install -c r r-essentials
 
Complete the installation.
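 
Once the installation finishes, you may want to check that Jupyter can see an R kernel. The sketch below rests on an assumption the steps above do not state explicitly: that r-essentials registers an R kernel (IRkernel) with Jupyter. It reads the output of jupyter kernelspec list --json and reports the kernels whose declared language is R. The function name r_kernel_names is illustrative.

```python
import json
import shutil
import subprocess


def r_kernel_names(kernelspec_json):
    """Return the names of kernels whose declared language is R."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return sorted(
        name
        for name, info in specs.items()
        if info.get("spec", {}).get("language", "").lower() == "r"
    )


if __name__ == "__main__":
    jupyter = shutil.which("jupyter")  # full path to the launcher, if any
    if jupyter is None:
        print("jupyter was not found on PATH; run this from the Anaconda Prompt")
    else:
        result = subprocess.run(
            [jupyter, "kernelspec", "list", "--json"],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            print("jupyter kernelspec list failed:", result.stderr.strip())
        else:
            print("R kernels:", r_kernel_names(result.stdout) or "none found")
```

An empty result does not necessarily mean the installation failed; it may only mean the kernel is registered in a different conda environment than the one that is currently active.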
 
Step 3. Switch the environment from CRAN R to Microsoft R Open
 
Because the RevoScaleR package is not available for CRAN R, we have to install Microsoft R Open, the enhanced distribution of R from Microsoft. It is a complete and free open-source platform for statistical analysis and data science.
 
To switch the environment from CRAN R to Microsoft R Open, run the following command in the Anaconda Prompt (anaconda3):
 
> conda install mro-base
 
Complete the installation.
 
Step 4. Installing Microsoft R Client
 
Microsoft R Client is a free, community-supported data science tool for high-performance analytics. Microsoft R Server and Microsoft R Client offer virtually identical packages, but each one targets different scenarios. R Client is intended for data scientists who create solutions that run locally.
 
To install Microsoft R Client, run the following command in the Anaconda Prompt (anaconda3):
 
> conda install -c r r-mrclient
 
Complete the installation.
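 
With the three conda installs done, a quick way to confirm that nothing was skipped is to compare the output of conda list --json against the package names used in Steps 2 to 4. This is a minimal sketch under the assumption that all three packages were installed into the currently active environment. The helper names installed_versions and missing_packages are illustrative.

```python
import json
import shutil
import subprocess

# Package names taken from the conda commands in Steps 2 to 4.
REQUIRED = ["r-essentials", "mro-base", "r-mrclient"]


def installed_versions(conda_list_json):
    """Map package name to version from the text printed by `conda list --json`."""
    return {pkg["name"]: pkg["version"] for pkg in json.loads(conda_list_json)}


def missing_packages(required, installed):
    """Return the required package names that are absent from `installed`."""
    return [name for name in required if name not in installed]


if __name__ == "__main__":
    conda = shutil.which("conda")  # full path, so Windows launchers resolve
    if conda is None:
        print("conda was not found on PATH; run this from the Anaconda Prompt")
    else:
        result = subprocess.run(
            [conda, "list", "--json"], capture_output=True, text=True
        )
        if result.returncode != 0:
            print("conda list failed:", result.stderr.strip())
        else:
            installed = installed_versions(result.stdout)
            for name in REQUIRED:
                print(f"{name}: {installed.get(name, 'not installed')}")
            print("missing:", missing_packages(REQUIRED, installed))
```

If a package is listed as missing, rerunning the corresponding conda install command from the step where it was introduced is the simplest fix.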
 
Step 5. Running the Jupyter Notebook and testing the environment
 
Jupyter Notebook can be started from the Anaconda Prompt (anaconda3) by running the following command:
 
> jupyter notebook
 
Once you enter the command, Jupyter Notebook should open automatically in your default browser. If it does not, read the output in the Anaconda Prompt (anaconda3) and open the URL shown there manually in your browser.
 
Command Jupyter Notebook
 
In the browser window, navigate to your desired folder and create your notebook. In this example, I navigated to the Desktop and created a notebook named RevoScaleR.ipynb.
 
As the screenshot below shows, running library(RevoScaleR) followed by sessionInfo() confirms that the RevoScaleR library is loaded and ready.
 
Testing the environment
 
Your Jupyter environment is now ready for Big Data analysis with the RevoScaleR library.
 

Summary

 
In this article, I discussed how to install Jupyter Notebooks using Anaconda and configure them to use the RevoScaleR library. Along the way, we installed the r-essentials package so that Jupyter Notebook could use CRAN R, switched the environment to Microsoft R Open, and then installed Microsoft R Client, which contains the RevoScaleR library. Finally, we tested the environment to check whether everything was configured correctly.
 
I created this tutorial as the starting point for my future Big Data analysis articles. I’ll be using Jupyter Notebooks and the RevoScaleR library in most of them.
 
References
  • Project Jupyter [Internet]. [cited 2021 Jan 14]. Available from: https://www.jupyter.org
  • dphansen. RevoScaleR package for R (Machine Learning Server) [Internet]. [cited 2021 Jan 14]. Available from: https://docs.microsoft.com/en-us/machine-learning-server/r-reference/revoscaler/revoscaler
  • Anaconda (Python distribution). In: Wikipedia [Internet]. 2020 [cited 2021 Jan 14]. Available from: https://en.wikipedia.org/w/index.php?title=Anaconda_(Python_distribution)&oldid=994998205
  • Microsoft R Open: The Enhanced R Distribution. MRAN [Internet]. [cited 2021 Jan 14]. Available from: https://mran.microsoft.com/open
  • dphansen. Microsoft R Server vs R Client: Scale [Internet]. [cited 2021 Jan 14]. Available from: https://docs.microsoft.com/en-us/machine-learning-server/what-is-r-client-r-server