Google Cloud Launches Beta Version of SparkR Job Types in Cloud Dataproc

Google has recently launched a beta version of SparkR jobs on Cloud Dataproc, a cloud service that lets you run Apache Spark and Apache Hadoop in a cost-effective manner. The service also supports distributed machine learning with MLlib.
 
"With GCP, you can build large-scale models to analyze datasets of sizes that previously would have required huge upfront investments in high-performance computing infrastructures." wrote the company. 
  
According to the company, SparkR jobs build out R support on GCP. SparkR is a package that provides a lightweight front-end for using Apache Spark from R; it can be used to process large Cloud Storage datasets and to perform computationally intensive work. It also lets developers use "dplyr-like operations" to transform and summarize tabular data, with rows and columns, on datasets stored in Cloud Storage.
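
As a rough illustration, the following SparkR sketch reads a CSV dataset from Cloud Storage and runs a dplyr-like filter, group, and aggregate. The bucket path, column names, and app name are hypothetical placeholders, not values from the announcement.

    library(SparkR)

    # Start the Spark session; on a Dataproc cluster the connection
    # details are already configured.
    sparkR.session(appName = "sparkr-example")

    # Read a CSV dataset directly from Cloud Storage
    # (hypothetical bucket and schema).
    sales <- read.df("gs://my-bucket/sales/*.csv",
                     source = "csv", header = "true", inferSchema = "true")

    # dplyr-like operations: filter rows, group, and summarize.
    big_sales <- filter(sales, sales$amount > 100)
    by_region <- agg(groupBy(big_sales, "region"),
                     total = sum(big_sales$amount))

    showDF(by_region)

    sparkR.session.stop()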
 
[Image: beta release of SparkR jobs. Source: Google]
 
The R programming language is well suited to building data analysis tools and statistical applications. Using GCP's Cloud Dataproc Jobs API, it becomes easy to submit SparkR jobs to a cluster without having to open firewalls to access web-based IDEs or SSH onto the master node. Developers can also automate the repeatable R statistics they need to run on their datasets.
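
For instance, a SparkR script can be submitted through the Jobs API with the gcloud CLI; the script name, cluster name, and region below are hypothetical:

    gcloud beta dataproc jobs submit spark-r analysis.R \
        --cluster=my-cluster \
        --region=us-central1

Because the job goes through the Jobs API rather than an interactive session, the same command can be scheduled, for example from a cron job, to automate recurring analyses.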
 
Additionally, running R on GCP helps avoid the infrastructure barriers that limit how much of your data you can analyze, such as having to sample datasets down because of compute or data-size constraints.
  
For more information, you can read Google's official announcement.