Apache Spark 3.0.0 Is Out

Apache Spark 3.0.0 features adaptive query execution, dynamic partition pruning, and ANSI SQL compliance.

The Apache Spark project has released version 3.0.0.
 
According to the official announcement, the release is based on git tag v3.0.0, which includes all commits up to June 10, and builds on many of the innovations from Spark 2.x. Apache Spark 3.0.0 resolves more than 3,400 tickets, the work of more than 440 contributors.
 
The new release features adaptive query execution, dynamic partition pruning, improvements to the pandas APIs, and ANSI SQL compliance.
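As a rough illustration of how these SQL features are switched on, the sketch below collects the relevant Spark SQL configuration keys in Python. The dictionary name `SPARK3_FEATURE_FLAGS` and the helper `apply_feature_flags` are hypothetical names introduced for this example; the configuration keys themselves are the ones documented for Spark 3.0. To my understanding, adaptive query execution and ANSI mode are off by default in 3.0.0 while dynamic partition pruning is on, so check the configuration reference for the version you run.

```python
# Spark SQL configuration keys for three features highlighted in the release.
# The dictionary and helper names are illustrative, not part of Spark.
SPARK3_FEATURE_FLAGS = {
    # Re-optimize query plans at runtime using shuffle statistics.
    "spark.sql.adaptive.enabled": "true",
    # Skip partitions of a fact table based on filters on a joined dimension table.
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
    # Follow ANSI SQL semantics, e.g. raise errors on arithmetic overflow.
    "spark.sql.ansi.enabled": "true",
}


def apply_feature_flags(builder, flags=SPARK3_FEATURE_FLAGS):
    """Call builder.config(key, value) for every flag and return the builder."""
    for key, value in flags.items():
        builder = builder.config(key, value)
    return builder
```

With PySpark installed, this could be used as `apply_feature_flags(SparkSession.builder.appName("demo")).getOrCreate()`. The helper only relies on the builder having a chainable `config(key, value)` method.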
 
Apache Spark 3.0.0 also brings a new UI for structured streaming and speedups of up to 40x for calling R user-defined functions. Other highlights include an accelerator-aware scheduler and SQL reference documentation.
 
 
Source: spark.apache.org 
 
The team said that Spark SQL is the most active component in the new release, accounting for more than 45% of the resolved tickets. These enhancements benefit all the higher-level libraries, including structured streaming and MLlib, and the higher-level APIs, including SQL and DataFrames. On the TPC-DS 30TB benchmark, Spark 3.0 is roughly two times faster than Spark 2.4.
 
Spark 3.0 also improves Python functionality and usability, with a pandas UDF API redesigned around Python type hints and new pandas UDF types. The release features more Pythonic error handling as well.
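To give a sense of what the type-hint style looks like, here is a minimal sketch of a Series-to-Series pandas UDF. The function names `to_fahrenheit` and `as_spark_udf` and the temperature example are assumptions made for illustration; they do not come from the announcement. The logic is written as a plain pandas function so it can be tried without a cluster, and PySpark is imported only when the function is wrapped.

```python
import pandas as pd


def to_fahrenheit(celsius: pd.Series) -> pd.Series:
    """Convert a batch of Celsius readings to Fahrenheit.

    The pd.Series -> pd.Series hints are what Spark 3.0 reads to infer
    a scalar (Series to Series) pandas UDF.
    """
    return celsius * 9.0 / 5.0 + 32.0


def as_spark_udf():
    """Wrap to_fahrenheit as a pandas UDF (requires PySpark 3.0 or later)."""
    # Imported lazily so the pandas logic above runs without Spark installed.
    from pyspark.sql.functions import pandas_udf

    # Only the return type is given; the UDF type comes from the type hints.
    return pandas_udf(to_fahrenheit, returnType="double")
```

On a DataFrame `df` with a numeric column named `celsius` (a hypothetical column), the wrapped function could be applied as `df.select(as_spark_udf()("celsius"))`.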
 
For the full list of changes and improvements, see the official announcement.