Create Python Wheel File And Deploy Production Pipelines With Python Wheel Task In Databricks

Python is a widely used language in the IT world, but packaging and distributing Python code across teams can be a complex task. A common solution is to build a wheel file and share that binary artifact securely across teams.

A Python wheel file has the extension .whl, and its filename encodes the Python version and the platform the wheel supports. Packaging Python code as wheel files has several benefits, including their smaller size.

Additionally, installing a wheel file avoids the step of building the package from a source distribution.
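As an aside, the tags in a wheel filename follow the PEP 427 convention `{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl`. A small sketch of pulling those fields apart (the filename below is just an example):

```python
# Split a wheel filename into its PEP 427 components:
# {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
def parse_wheel_name(filename):
    stem = filename[: -len(".whl")]
    name, version, py_tag, abi_tag, platform_tag = stem.split("-")
    return {"name": name, "version": version, "python": py_tag,
            "abi": abi_tag, "platform": platform_tag}

# py3-none-any means: any Python 3, no ABI dependency, any platform
print(parse_wheel_name("ingestion-0.0.1-py3-none-any.whl"))
```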

Create wheel file using the VS Code

Install Visual Studio Code.

Install the Python extension in Visual Studio Code.


Install Python 3.9


Setup Wheel Directory Folders and Files

We have to run the below commands to set up the files and folders needed to create the wheel file.

mkdir PythonWheelDemo
cd PythonWheelDemo
code .

After the directory is created, let's open it in VS Code and add the folders and files described in the following sections.


Create Setup File

First, we have to create the setup.py file. This file contains all the metadata about the package.

import setuptools
 
with open("README.md", "r") as fh:
    long_description = fh.read()
 
setuptools.setup(
    name="ingestion",
    version="0.0.1",
    author="Sagar Lad",
    author_email="azuretutorials@gmail.com",
    description="Package to create data ingestion",
    long_description=long_description,
    long_description_content_type="text/markdown",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.7',
)

Create Readme File

Next, we have to create the README.md file, which provides the package's long description.

# Example Package

This is a simple example package. You can use
[Github-flavored Markdown](https://guides.github.com/features/mastering-markdown/)
to write your content.

Create Init File

Then we have to create an __init__.py file. It is the mechanism that groups separate Python scripts into a single importable package.

from .ingestion import etl

Create Package Function File

Finally, we will need a Python package function file containing the code to be exposed as a function. In this demo, we simply create a function for a create table statement that can be run in Synapse or Databricks. It accepts a DataFrame along with the database and table names, and Spark is used to run the write.

def etl(df, database="meta", table="fan"):
    # Write the DataFrame as a managed table named database.table
    df.write.mode("overwrite").saveAsTable(f"{database}.{table}")
    print(f"Command executed: wrote to {database}.{table}")
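Since the demo function ultimately issues a create table statement, here is a minimal, Spark-free sketch of how such a statement could be assembled. The helper name and column list are illustrative, not part of the package:

```python
# Hypothetical helper: build the CREATE TABLE statement that an etl
# function could pass to spark.sql (names here are illustrative).
def create_table_sql(database, table, columns):
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"CREATE TABLE IF NOT EXISTS {database}.{table} ({cols})"

print(create_table_sql("meta", "fan", [("id", "INT"), ("name", "STRING")]))
# → CREATE TABLE IF NOT EXISTS meta.fan (id INT, name STRING)
```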

Install Python Wheel Package

Next, install the wheel package, which provides the bdist_wheel command:

pip install wheel

Install Check Wheel Package

Optionally, install check-wheel-contents to validate the built wheel:

pip install check-wheel-contents

Create & Verify Wheel File

Run the below command from the project root; the wheel file is written to the dist/ folder:

python setup.py bdist_wheel

You can then verify the result with check-wheel-contents dist/.

Databricks Python Wheel Task

Python wheel tasks can be executed on both interactive clusters and on job clusters. All the output is captured and logged as part of the task execution so it is easy to understand what happened without having to go into cluster logs.

To run a Job with a wheel, first build the Python wheel locally then upload it to cloud storage. Specify the path of the wheel in the task and choose the method as the entry point.
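As a sketch, a job task using the wheel might look like the following Jobs API JSON fragment. The task key, storage path, and entry point below are placeholders for your own values:

```json
{
  "tasks": [
    {
      "task_key": "ingest",
      "python_wheel_task": {
        "package_name": "ingestion",
        "entry_point": "etl"
      },
      "libraries": [
        {
          "whl": "dbfs:/FileStore/wheels/ingestion-0.0.1-py3-none-any.whl"
        }
      ]
    }
  ]
}
```

Note that the entry point must be resolvable from the installed package, for example via an entry point declared in setup.py.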


Conclusion

In this article, we explored how to create a Python wheel file for easier distribution using VS Code, and how to deploy Python wheel files to Databricks clusters with the Python wheel task.