NVIDIA Scales Apache Spark

NVIDIA and Databricks together optimize Spark with the RAPIDS software suite for Databricks, bringing GPU acceleration to ML workloads.

NVIDIA recently announced that it is collaborating with the open-source community to bring end-to-end GPU acceleration to Apache Spark 3.0.
 
NVIDIA said that with the release of Spark 3.0, ML engineers will for the first time be able to apply GPU acceleration to ETL (extract, transform, load) data processing workloads, which are widely handled using SQL database operations.
 
AI model training will also be able to run on the same Spark cluster as those ETL workloads, rather than as separate processes on separate infrastructure.
 
Adobe, one of the first organizations to work with a preview release of Spark 3.0 running on Databricks, achieved a sevenfold performance improvement and about 90 percent cost savings in an initial test using GPU-accelerated data analytics for product development in Adobe Experience Cloud.
 
According to the announcement, NVIDIA and Databricks have collaborated to optimize Spark with the RAPIDS software suite for Databricks, bringing GPU acceleration to ML workloads running on Databricks.
 
 
Source: NVIDIA 
 
Matei Zaharia, the original creator of Apache Spark and chief technologist at Databricks, said, "Our continued work with NVIDIA improves performance with RAPIDS optimizations for Apache Spark 3.0 and Databricks to benefit our joint customers like Adobe. ... These contributions lead to faster data pipelines, model training and scoring, that directly translate to more breakthroughs and insights for our community of data engineers and data scientists."

