Databricks  

Databricks Runtime 18.1 (Beta) — A Big Leap Forward for Data & AI Teams

Introduction

Databricks has released Runtime 18.1 (Beta). It brings enhancements across Auto Loader, Delta Lake, SQL, streaming, geospatial processing, and performance, and it upgrades to Apache Spark 4.1.0. Building on 18.0, the release targets faster pipelines, more flexible schema handling, and more reliable transactions. Below is a breakdown of what is new and why it matters.

Key New Features & Improvements

Auto Loader Enhancements

Auto Loader now uses file events by default when they are available, which reduces directory listing costs and improves latency. You can still override this behavior with:

  • cloudFiles.useIncrementalListing

  • cloudFiles.useNotifications

To opt out of file events entirely, set:

  • cloudFiles.useManagedFileEvents = false
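Here is a minimal sketch of how these settings might be passed to Auto Loader. The helper `build_autoloader_options` is hypothetical, and so is the volume path in the comment. The option keys use the `cloudFiles.` prefix that Auto Loader options carry, but you should confirm valid values and defaults in the Databricks Auto Loader documentation. The function is plain Python, so it runs without a cluster.

```python
def build_autoloader_options(file_format, use_file_events=True,
                             use_notifications=None, incremental_listing=None):
    """Hypothetical helper that assembles Auto Loader reader options as strings."""
    opts = {"cloudFiles.format": file_format}
    if not use_file_events:
        # Opt out of the file-events default
        opts["cloudFiles.useManagedFileEvents"] = "false"
    if use_notifications is not None:
        # Explicitly choose (or refuse) notification-based file discovery
        opts["cloudFiles.useNotifications"] = "true" if use_notifications else "false"
    if incremental_listing is not None:
        if incremental_listing not in ("auto", "true", "false"):
            raise ValueError("incremental_listing must be 'auto', 'true', or 'false'")
        opts["cloudFiles.useIncrementalListing"] = incremental_listing
    return opts


print(build_autoloader_options("json", use_file_events=False))
# {'cloudFiles.format': 'json', 'cloudFiles.useManagedFileEvents': 'false'}

# On Databricks the dict would feed a streaming reader, for example:
# df = (spark.readStream.format("cloudFiles")
#       .options(**build_autoloader_options("json", use_file_events=False))
#       .load("/Volumes/main/raw/landing/events"))  # hypothetical path
```

Keeping the options in one dict makes it easy to switch discovery modes per environment without touching the reader code.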

Delta Lake & Unity Catalog Improvements

Optimized Writes for CRTAS

Partitioned Unity Catalog tables created with CREATE OR REPLACE TABLE AS SELECT (CRTAS) now use optimized writes automatically, producing fewer, larger files.
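The following toy simulation shows why this reduces file counts. It is not Databricks' implementation. It assumes each write task emits one file per partition value it holds, and it models optimized writes as shuffling rows by partition value before writing. The row-based `max_rows_per_file` limit is a hypothetical stand-in for the real size-based file target.

```python
from collections import Counter


def files_without_optimized_writes(tasks):
    """Each task writes one file for every distinct partition value it holds."""
    return sum(len(set(task)) for task in tasks)


def files_with_optimized_writes(tasks, max_rows_per_file):
    """Rows are grouped by partition value first, then split into files of
    at most max_rows_per_file rows (ceiling division per partition)."""
    rows_per_partition = Counter(value for task in tasks for value in task)
    return sum(-(-n // max_rows_per_file) for n in rows_per_partition.values())


# 4 tasks, each holding 30 rows spread evenly over the same 3 dates
dates = ["2024-01-01", "2024-01-02", "2024-01-03"]
tasks = [dates * 10 for _ in range(4)]

print(files_without_optimized_writes(tasks))                      # 12
print(files_with_optimized_writes(tasks, max_rows_per_file=100))  # 3
```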

Schema Evolution with INSERT

The new WITH SCHEMA EVOLUTION clause allows automatic schema evolution during:

  • INSERT INTO

  • INSERT OVERWRITE

  • INSERT INTO … REPLACE

It handles:

  • Adding new columns

  • Widening column types

  • Preserving NULL struct values, even when struct field order differs
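You can model these rules in plain Python to make them concrete. This is a simplified sketch, not Delta Lake's logic. Schemas are dicts mapping column names to type names. `WIDENINGS` is a hypothetical subset of allowed type changes, so check the Delta type widening documentation for the real list. A struct value is a dict, and a NULL struct is `None`.

```python
# Hypothetical subset of widening rules, for illustration only
WIDENINGS = {
    ("byte", "short"), ("byte", "int"), ("byte", "long"),
    ("short", "int"), ("short", "long"),
    ("int", "long"),
    ("float", "double"),
}


def evolve_schema(target, source):
    """Return the target schema after an INSERT with schema evolution."""
    evolved = dict(target)  # keep existing column order
    for col, src_type in source.items():
        tgt_type = evolved.get(col)
        if tgt_type is None:
            evolved[col] = src_type          # new column is appended
        elif tgt_type == src_type or (src_type, tgt_type) in WIDENINGS:
            continue                         # same type, or narrower data fits
        elif (tgt_type, src_type) in WIDENINGS:
            evolved[col] = src_type          # column type is widened
        else:
            raise TypeError(f"cannot evolve {col}: {tgt_type} -> {src_type}")
    return evolved


def align_struct(value, target_fields):
    """Reorder a struct's fields by name; a NULL struct stays NULL
    instead of becoming a struct whose fields are all NULL."""
    if value is None:
        return None
    return {name: value.get(name) for name in target_fields}


target = {"id": "int", "amount": "float"}
source = {"amount": "float", "id": "long", "country": "string"}
print(evolve_schema(target, source))
# {'id': 'long', 'amount': 'float', 'country': 'string'}
print(align_struct({"b": 2, "a": 1}, ["a", "b"]))  # {'a': 1, 'b': 2}
print(align_struct(None, ["a", "b"]))              # None
```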

Delta Sharing

Delta Sharing now supports multi‑statement transactions for shared tables using pre‑signed URLs or cloud tokens.

SQL & Scripting Enhancements

New SQL Functions

  • parse_timestamp: Photon-accelerated, multi-pattern timestamp parsing

  • Approximate top‑k sketch functions:

    • approx_top_k_accumulate

    • approx_top_k_combine

    • approx_top_k_estimate

  • Tuple sketch functions for distinct counting and key‑summary aggregation
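The accumulate, combine, and estimate split follows the usual pattern for mergeable sketches: build a small summary per partition, merge the summaries, then read off the answer. The source does not say which algorithm Databricks uses, so the Python sketch below substitutes the well-known Misra-Gries frequent-items summary. The function names only mirror the SQL workflow and are not the Databricks API. Reported counts are lower bounds on the true frequencies.

```python
from collections import Counter


def accumulate(items, capacity):
    """Build a Misra-Gries summary that keeps at most `capacity` counters."""
    counters = {}
    for x in items:
        if x in counters:
            counters[x] += 1
        elif len(counters) < capacity:
            counters[x] = 1
        else:
            # Table full: decrement every counter and drop those reaching zero
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def combine(a, b, capacity):
    """Merge two summaries: add counters, then subtract the (capacity+1)-th
    largest count and keep only counters that stay positive."""
    merged = Counter(a)
    merged.update(b)
    if len(merged) <= capacity:
        return dict(merged)
    cut = sorted(merged.values(), reverse=True)[capacity]
    return {k: v - cut for k, v in merged.items() if v - cut > 0}


def estimate(summary, k):
    """Return up to k (item, lower-bound count) pairs, most frequent first."""
    return sorted(summary.items(), key=lambda kv: (-kv[1], str(kv[0])))[:k]


p1 = ["a"] * 5 + ["b"] * 3 + ["c"]        # partition 1
p2 = ["a"] * 4 + ["d"] * 2 + ["b"]        # partition 2
s1 = accumulate(p1, capacity=2)           # {'a': 4, 'b': 2}
s2 = accumulate(p2, capacity=2)           # {'a': 3, 'd': 1}
merged = combine(s1, s2, capacity=2)      # {'a': 6, 'b': 1}
print(estimate(merged, k=1))              # [('a', 6)]
```

Because summaries merge without revisiting raw rows, per-partition sketches can be stored and combined later, which is the main reason to split the work into three functions.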

SQL Cursor Support

Compound SQL statements now support:

  • DECLARE CURSOR

  • OPEN

  • FETCH

  • CLOSE

This enables row-by-row processing of query results within SQL scripts.
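Most database client APIs expose the same cursor lifecycle. Python's built-in `sqlite3` module serves here as a language-neutral analogy; it is not Databricks SQL scripting syntax. The table and data are hypothetical. The DB-API also folds declaring and opening a cursor into `cursor()` plus `execute()`, so the mapping to DECLARE and OPEN is approximate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")  # hypothetical table
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10.0), (2, 25.5), (3, 7.25)])

cur = conn.cursor()                                        # roughly DECLARE CURSOR
cur.execute("SELECT id, amount FROM orders ORDER BY id")   # roughly OPEN
running_total = 0.0
while True:
    row = cur.fetchone()                                   # roughly FETCH
    if row is None:                                        # no rows left
        break
    running_total += row[1]                                # per-row logic
cur.close()                                                # roughly CLOSE
conn.close()

print(running_total)  # 42.75
```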

Behavioral Changes

  • FILTER clause now works with MEASURE aggregate functions

  • Timestamp partitions now use Spark session timezone instead of JVM timezone

  • DESCRIBE FLOW is now a reserved keyword
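The time-zone change matters because the same instant renders as a different timestamp string, and potentially a different partition value, depending on which zone formats it. This stdlib-only sketch uses hypothetical fixed offsets to stand in for a JVM default zone and a Spark session zone (the `spark.sql.session.timeZone` setting). It illustrates the general effect, not Spark's partition-path code.

```python
from datetime import datetime, timedelta, timezone


def partition_value(instant, tz):
    """Format an aware timestamp the way a partition value might be rendered in tz."""
    return instant.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S")


instant = datetime(2024, 3, 1, 23, 30, tzinfo=timezone.utc)
jvm_tz = timezone.utc                       # hypothetical JVM default zone
session_tz = timezone(timedelta(hours=5))   # hypothetical session zone, UTC+05:00

print(partition_value(instant, jvm_tz))      # 2024-03-01 23:30:00
print(partition_value(instant, session_tz))  # 2024-03-02 04:30:00
```

When the two zones differ, even the date component can move, so check existing timestamp-partitioned tables after upgrading.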

Streaming Improvements

  • Automatic streaming type widening for Delta tables

  • New configurations allow stricter control if required

Geospatial Performance Boost

Geospatial Boolean set operations (such as intersection, union, and difference) now use a new, faster implementation. Results may differ slightly from earlier releases beyond the 15th decimal place.
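Pipelines or tests that compare geometry output for exact equality may start failing on these tiny differences. Below is a minimal sketch of tolerance-based comparison with `math.isclose`. Coordinates are plain (x, y) tuples, and the `1e-9` tolerance is an assumption you should tune for your data. `math.nextafter` (Python 3.9+) simulates a last-bit change.

```python
import math


def coords_match(a, b, abs_tol=1e-9):
    """True if two vertex lists have the same length and every coordinate
    agrees within abs_tol (rel_tol disabled so the bound is absolute)."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x1, x2, rel_tol=0.0, abs_tol=abs_tol)
        and math.isclose(y1, y2, rel_tol=0.0, abs_tol=abs_tol)
        for (x1, y1), (x2, y2) in zip(a, b)
    )


old = [(-122.419415, 37.774929), (-122.4, 37.8)]
# Same vertices, with one coordinate nudged by a single floating-point step
new = [(math.nextafter(-122.419415, 0.0), 37.774929), (-122.4, 37.8)]

print(old == new)              # False
print(coords_match(old, new))  # True
```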

DataFrame & Compute Enhancements

  • DataFrame checkpoints now support Unity Catalog volume paths

  • Calling .cache() no longer re-executes SQL commands such as SHOW TABLES
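Unity Catalog volume paths follow the `/Volumes/<catalog>/<schema>/<volume>/...` layout. A small, hypothetical helper can build one for checkpoint storage. The catalog, schema, and volume names below are placeholders. The commented Spark calls assume a cluster where `spark.sparkContext` is accessible.

```python
from pathlib import PurePosixPath


def volume_path(catalog, schema, volume, *subdirs):
    """Build a /Volumes/<catalog>/<schema>/<volume>/... path (hypothetical helper)."""
    for part in (catalog, schema, volume, *subdirs):
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return str(PurePosixPath("/Volumes", catalog, schema, volume, *subdirs))


ckpt_dir = volume_path("main", "analytics", "checkpoints", "daily_job")
print(ckpt_dir)  # /Volumes/main/analytics/checkpoints/daily_job

# On a cluster, the directory could then back DataFrame checkpoints, e.g.:
# spark.sparkContext.setCheckpointDir(ckpt_dir)
# df = df.checkpoint()
```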

Cloud & External System Improvements

  • DATETIMEOFFSET type support for Azure Synapse

  • Google BigQuery table descriptions now appear as table comments
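A DATETIMEOFFSET value carries a local date-time plus a UTC offset, for example `2024-05-01 10:00:00.1234567 +02:00`, with up to seven fractional-second digits. The Python sketch below normalizes such a literal to a UTC instant. The source does not describe how Databricks maps the type internally, so treat `parse_datetimeoffset` as a hypothetical illustration. Python datetimes stop at microseconds, so the seventh fractional digit is truncated.

```python
import re
from datetime import datetime, timedelta, timezone

# date, time, optional 1-7 fractional digits, then a +hh:mm / -hh:mm offset
_DTO = re.compile(
    r"(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,7}))? ([+-])(\d{2}):(\d{2})"
)


def parse_datetimeoffset(text):
    """Parse a DATETIMEOFFSET-style literal into an aware datetime in UTC."""
    m = _DTO.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a DATETIMEOFFSET literal: {text!r}")
    y, mo, d, h, mi, s, frac, sign, oh, om = m.groups()
    # Pad to 7 digits, then keep 6 (microseconds); the 100 ns digit is dropped
    micros = int((frac or "0").ljust(7, "0")[:6])
    offset = timedelta(hours=int(oh), minutes=int(om))
    if sign == "-":
        offset = -offset
    local = datetime(int(y), int(mo), int(d), int(h), int(mi), int(s),
                     micros, tzinfo=timezone(offset))
    return local.astimezone(timezone.utc)


print(parse_datetimeoffset("2024-05-01 10:00:00.1234567 +02:00"))
# 2024-05-01 08:00:00.123456+00:00
```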

Apache Spark 4.1.0 Included

Databricks Runtime 18.1 ships with Apache Spark 4.1.0, bringing:

  • Major performance fixes

  • Improved pandas interoperability

  • New geospatial type support

  • Arrow and Pandas UDF improvements

  • Streaming enhancements

  • Stability and error‑handling improvements

Summary

Databricks Runtime 18.1 (Beta) builds on 18.0 with improvements across Auto Loader, Delta Lake, Unity Catalog, SQL, streaming, geospatial processing, and compute behavior, while upgrading to Apache Spark 4.1.0. The release focuses on performance optimization, schema flexibility, transaction reliability, and improved interoperability across cloud systems and analytics workloads.