Best Practices For Dropping A Managed Delta Lake Table

Regardless of how we drop a managed table, the operation can take a significant amount of time, depending on the size of the data.

Delta Lake managed tables contain a lot of metadata in the form of transaction logs, and they can also contain duplicate data files. If a Delta table has been in use for a long time, it can accumulate a large amount of data.

In the Azure Databricks environment, there are two ways to drop tables:

  1. Run the DROP TABLE command in a notebook cell.
  2. Click Delete in the UI.

Even though we can delete tables in the background without affecting workloads, it is good practice to run DELETE FROM and VACUUM on a table before running the DROP TABLE command. This ensures that the table's metadata and data files are cleaned up before the data deletion starts.

For example, if we are trying to delete the Delta table "emptbl", run the following commands before starting the DROP TABLE command:

Run DELETE FROM

DELETE FROM emptbl

Run VACUUM with a retention interval of zero hours

VACUUM emptbl RETAIN 0 HOURS

These two steps reduce the amount of metadata and the number of uncommitted files that would otherwise increase the data deletion time.
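
The sequence above can also be scripted from a Python notebook cell. The following is a minimal sketch, not a definitive implementation. The helper names cleanup_statements and drop_managed_table are hypothetical, and spark is assumed to be the SparkSession that a Databricks notebook provides. Note that Delta Lake normally rejects a VACUUM with a retention shorter than its safety threshold, so the sketch temporarily turns off the spark.databricks.delta.retentionDurationCheck.enabled setting and restores it afterwards.

```python
import re

# Delta Lake safety check that blocks VACUUM with a very short retention interval.
RETENTION_CHECK = "spark.databricks.delta.retentionDurationCheck.enabled"


def cleanup_statements(table_name: str) -> list:
    """Return, in order, the SQL statements used to empty and drop the table."""
    # Accept only plain or dot-qualified identifiers, since the name is
    # interpolated directly into the SQL text.
    if not re.fullmatch(r"\w+(\.\w+){0,2}", table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return [
        f"DELETE FROM {table_name}",
        f"VACUUM {table_name} RETAIN 0 HOURS",
        f"DROP TABLE {table_name}",
    ]


def drop_managed_table(spark, table_name: str) -> None:
    """Run DELETE FROM, VACUUM, and DROP TABLE against the given session."""
    statements = cleanup_statements(table_name)
    # Assumption: disabling the check for this session is acceptable here
    # because the table is about to be dropped anyway.
    spark.conf.set(RETENTION_CHECK, "false")
    try:
        for statement in statements:
            spark.sql(statement)
    finally:
        # Restore the safety check so later VACUUM calls are protected again.
        spark.conf.set(RETENTION_CHECK, "true")
```

In a notebook, this would be called as drop_managed_table(spark, "emptbl"). Building the statements in a separate function makes it possible to inspect them before anything is run against the table.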