Fine-Tuning Hyperparameters using Hyperdrive in Azure Machine Learning SDK

Azure Machine Learning supports hyperparameter tuning through Hyperdrive experiments. A Hyperdrive experiment launches multiple child runs, each with a different hyperparameter configuration. After all runs are complete, the best model can be evaluated and registered in Azure Machine Learning Studio.

In this article, you will walk through the process of tuning hyperparameters to optimize a model.

What are hyperparameters?

Hyperparameters differ from model parameters in that they cannot be learned from the data. They are decided before training the model, and they need to be tuned to obtain a model with optimal performance. Examples of hyperparameters include the number of layers in a neural network and the learning rate used by many machine learning algorithms, which determines how big a step the algorithm takes at each iteration. An example of settings chosen before training in the scikit-learn package would be:

train_test_split(X, y, test_size=0.9, random_state=0)

Here test_size is the proportion of the data to use in the test split (0.9 means 90%), and random_state is the seed used by the random number generator. Like hyperparameters, these values are set before training rather than learned, and they can be fine-tuned to create the best possible model.
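To make the distinction between parameters and hyperparameters concrete, here is a minimal sketch in plain Python that fits a single slope by gradient descent. The weight w is a model parameter learned from the data, while learning_rate is a hyperparameter fixed before training. The function fit_slope and the toy data are illustrative assumptions and are not part of the Azure ML code in this article.

```python
def fit_slope(xs, ys, learning_rate, steps=100):
    """Fit y = w * x by gradient descent; w is learned, learning_rate is not."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # The learning rate controls how big a step is taken each iteration
        w -= learning_rate * grad
    return w


# Toy data generated with a true slope of 2 (illustrative only)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

for lr in (0.001, 0.01, 0.05):
    print(lr, fit_slope(xs, ys, learning_rate=lr))
```

With these settings, the smallest learning rate is still well short of the true slope after 100 steps, while the two larger rates have converged. Differences like this are what hyperparameter tuning tries to find automatically.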

Log In to the Workspace

To log in to the workspace with the Azure ML Python SDK, you will need to authenticate again with Azure. When you run this cell for the first time, you are prompted to authenticate with Azure by clicking on a link and inputting a security code into a web page.

This block of code imports the azureml.core package which is used for interacting with Azure Machine Learning.

import azureml.core
from azureml.core import Workspace

# Load the workspace from the saved config file
ws = Workspace.from_config()
print('Ready to use Azure ML {} to work with {}'.format(azureml.core.VERSION, ws.name))
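Workspace.from_config() reads the workspace details from a config.json file, which is typically already present on an Azure ML compute instance. As a rough sketch of what that file contains, the snippet below writes one with placeholder values. The three keys are the ones the SDK expects; the values and the temporary folder are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values: replace with your own subscription, resource group, and workspace
config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

# Written to a temporary folder so this sketch has no side effects;
# in practice the file sits next to the notebook or in a .azureml subfolder
config_dir = Path(tempfile.mkdtemp())
config_path = config_dir / "config.json"
config_path.write_text(json.dumps(config, indent=4))

print(config_path.read_text())
```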

Prepare the Data

Use the Diabetes open dataset to train a regression model. Executing the code below registers the diabetes dataset within the Machine Learning Studio Workspace as a tabular dataset to be used in experiments.

from azureml.opendatasets import Diabetes
from azureml.core import Dataset

if "diabetes" not in ws.datasets:

    ds_name = "diabetes"

    # Get the Diabetes open dataset as a tabular dataset
    tab_data_set = Diabetes.get_tabular_dataset()

    # Register the tabular dataset
    try:
        print("Registering Dataset")
        tab_data_set = tab_data_set.register(
            workspace=ws,
            name=ds_name,
            description="Diabetes Sample",
            tags={"format": "CSV"},
            create_new_version=True,
        )
        print("Dataset is registered")
    except Exception as ex:
        print(ex)
else:
    print("Dataset already registered.")

Set Up Compute

A compute target must be selected to run the Hyperdrive experiment. Executing the code below looks up the compute available in the workspace and stores it in a variable that is used in a later code cell.

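from azureml.core.compute import ComputeTarget

# Pick a compute target from the workspace (the last one listed is used)
training_cluster = None
for compute in ComputeTarget.list(ws):
    training_cluster = ComputeTarget(workspace=ws, name=compute.name)

# Fail early instead of reporting success when nothing was found
if training_cluster is None:
    raise RuntimeError("No compute target was found in the workspace.")

print("Found compute target:", training_cluster.name)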

Create Training Script

A training script is needed to execute during each run. First, create a folder to hold the training script.

import os

experiment_folder = "diabetes_training-hyperdrive"
os.makedirs(experiment_folder, exist_ok=True)

print("The folder has been created.")

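A parameterized training script is created in the experiment_folder, with parameters for optimizing the alpha and tol arguments of the Ridge regression algorithm. The script loads the Diabetes dataset from the workspace and trains against it with the specified settings. Running the cell below in a notebook generates the script: the %%writefile magic on the first line saves the cell contents as diabetes_training.py, the file name the run configuration expects.

%%writefile $experiment_folder/diabetes_training.py
import os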
import argparse
import joblib
import math
from azureml.core import Dataset, Run
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Set alphas and tols parameters
parser = argparse.ArgumentParser()
parser.add_argument("--input-data", type=str)
parser.add_argument(
    "--alphas", type=float, dest="alpha_value", default=0.01, help="alpha rate"
)
parser.add_argument(
    "--tols", type=float, dest="tol_value", default=0.01, help="tol rate"
)
args = parser.parse_args()
alpha = args.alpha_value
tol = args.tol_value

# Get the experiment run context
run = Run.get_context()
ws = run.experiment.workspace

# Load the Diabetes dataset and split the data into training and test sets
diabetes = Dataset.get_by_id(ws, id=args.input_data).to_pandas_dataframe()

X, y = (
    diabetes[["AGE", "BMI", "S1", "S2", "S3", "S4", "S5", "S6", "SEX"]].values,
    diabetes["Y"].values,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=66
)

# Train the model with the specified alpha and tol arguments
model = Ridge(alpha=alpha, tol=tol)
model.fit(X=X_train, y=y_train)
y_pred = model.predict(X=X_test)
rmse = math.sqrt(mean_squared_error(y_true=y_test, y_pred=y_pred))
run.log("rmse", rmse)

# Files saved to the outputs folder are automatically uploaded to the experiment record in Azure ML Studio
os.makedirs("outputs", exist_ok=True)
model_name = "model_alpha_" + str(alpha) + ".pkl"
filename = "outputs/" + model_name
joblib.dump(value=model, filename=filename)

run.complete()
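Each child run receives its hyperparameter values as command-line flags, which the script reads with argparse. The sketch below repeats only the argument definitions from the training script and parses a hypothetical command line, to show how --alphas and --tols end up in alpha_value and tol_value. The helper parse_training_args and the placeholder dataset id are assumptions for illustration.

```python
import argparse


def parse_training_args(argv):
    """Parse the same flags the training script defines."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-data", type=str)
    parser.add_argument("--alphas", type=float, dest="alpha_value", default=0.01)
    parser.add_argument("--tols", type=float, dest="tol_value", default=0.01)
    return parser.parse_args(argv)


# Hypothetical command line for one child run; the dataset id is a placeholder
args = parse_training_args(
    ["--input-data", "<dataset-id>", "--alphas", "0.05", "--tols", "0.002"]
)
print(args.alpha_value, args.tol_value, args.input_data)

# With no flags, the defaults from the script apply
defaults = parse_training_args([])
print(defaults.alpha_value, defaults.tol_value)
```

Because of the dest argument, the flag names and the attribute names differ, while --input-data falls back to argparse's default naming and becomes input_data.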

Run a Hyperdrive Experiment

Tuning hyperparameters is similar to tuning a musical instrument. You play with the settings to determine which result is best. The Hyperdrive package helps automate this process to reduce the amount of tedious work it would take to test each configuration by hand. Run the next cell to start by importing the required packages for the Hyperdrive experiment.

from azureml.core import Environment
from azureml.core import ScriptRunConfig
from azureml.core import Experiment
from azureml.train.hyperdrive import (
    RandomParameterSampling,
    BanditPolicy,
    HyperDriveConfig,
    PrimaryMetricGoal,
    choice,
    uniform,
)
from azureml.widgets import RunDetails

print("Packages imported!")

Hyperparameters are the desired settings used to tweak a given algorithm. Azure Machine Learning provides the ability to automate the selection of these settings. Currently, the sampling methods supported are random sampling, grid sampling, and Bayesian sampling.

Executing the code below will use random sampling, which randomly picks values from the defined search space. With random sampling, you can use continuous hyperparameters that are drawn from a range of values instead of listing each value explicitly. This reduces some of the manual work in hyperparameter tuning.

# Parameter values for random sampling
params = RandomParameterSampling(
    {
        "--alphas": choice(0.001, 0.005, 0.01, 0.05, 0.1, 1.0, 2.0, 4.0, 8.0),
        "--tols": uniform(0.001, 0.01),
    }
)

print("Hyperparameters are set!")
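As a rough illustration of what random sampling over this search space means, the following plain-Python sketch draws a few configurations using the standard random module. It mirrors the choice and uniform definitions above but is only a conceptual stand-in: it is not how Hyperdrive is implemented, and the search_space layout and sample_configuration helper are assumptions made for this example.

```python
import random

# Search space mirroring the one in this article (layout is illustrative)
search_space = {
    "--alphas": ("choice", [0.001, 0.005, 0.01, 0.05, 0.1, 1.0, 2.0, 4.0, 8.0]),
    "--tols": ("uniform", (0.001, 0.01)),
}


def sample_configuration(space, rng):
    """Draw one hyperparameter configuration from the search space."""
    config = {}
    for name, (kind, spec) in space.items():
        if kind == "choice":
            # Discrete: pick one of the listed values
            config[name] = rng.choice(spec)
        elif kind == "uniform":
            # Continuous: any value between low and high
            low, high = spec
            config[name] = rng.uniform(low, high)
        else:
            raise ValueError("Unknown distribution: {}".format(kind))
    return config


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_configuration(search_space, rng) for _ in range(5)]
for sample in samples:
    print(sample)
```

Each printed dictionary corresponds to the flags one child run would receive.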

Run the next cell to create the run configuration. The run configuration defines the training script to use for each run and the compute target to apply for the runs. Also, the dataset is passed through as an input so that each run can use the Diabetes dataset.

# Get the training Diabetes dataset
diabetes_ds = ws.datasets.get("diabetes")

sklearn_env = Environment.get(workspace=ws, name="AzureML-Tutorial")
run_config = ScriptRunConfig(
    source_directory=experiment_folder,
    script="diabetes_training.py",
    arguments=["--input-data", diabetes_ds.as_named_input("diabetes")],
    compute_target=training_cluster,
    environment=sklearn_env,
)


print("The run configuration has been created!")

Create the HyperDriveConfig to define the experiment settings. This includes the random sampling parameters, the run configuration, the primary metric to minimize (rmse), and the limits on total and concurrent runs.

# Configure hyperdrive settings
hyperdrive = HyperDriveConfig(
    run_config=run_config,
    hyperparameter_sampling=params,
    policy=None,
    primary_metric_name="rmse",
    primary_metric_goal=PrimaryMetricGoal.MINIMIZE,
    max_total_runs=20,
    max_concurrent_runs=4,
)

print("The hyperdrive is ready to run!")
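The max_total_runs and max_concurrent_runs settings together give a rough sense of how long the experiment will take. The arithmetic below is a back-of-the-envelope sketch only: it assumes runs start in full batches and ignores queueing and compute start-up time, and minutes_per_run is a hypothetical figure rather than a measured one.

```python
import math

max_total_runs = 20
max_concurrent_runs = 4

# Number of "waves" if child runs were started in full batches
waves = math.ceil(max_total_runs / max_concurrent_runs)

# Assumed average duration of one child run, in minutes (hypothetical figure)
minutes_per_run = 2.0
estimated_minutes = waves * minutes_per_run

print(waves, estimated_minutes)
```

Raising max_concurrent_runs shortens the wall-clock time only if the compute target has enough nodes to run that many child runs at once.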

Run the experiment and review the results. This will take 10 - 20 minutes. The status will be displayed in the output as the experiment runs. When the experiment finishes, you may see an error message related to a bug inside the azureml.widgets package that can be ignored.

You can also switch over to Azure Machine Learning Studio and view the status of the run from the Experiments console.

# Run the experiment
experiment = Experiment(workspace=ws, name="diabetes_training_hyperdrive")
run = experiment.submit(config=hyperdrive)

# Show the status
RunDetails(run).show()
run.wait_for_completion()

_HyperDriveWidget output (abridged to the key fields):

"runId": "HD_77961f54-fea8-4514-8a1c-25a5743ce89e"
"target": "ca-41044-compute"
"status": "Completed"
"startTimeUtc": "2023-02-12T23:25:25.131619Z"
"endTimeUtc": "2023-02-12T23:34:58.151189Z"
"primary_metric_config": {"name": "rmse", "goal": "minimize"}
"score": "57.05435499812854"
"best_child_run_id": "HD_77961f54-fea8-4514-8a1c-25a5743ce89e_3"
"best_metric_status": "Succeeded"
"samplingMethod": "RANDOM"
"terminationPolicy": "Default"
"maxTotalRuns": 20
"maxConcurrentRuns": 4
"maxDurationMinutes": 10080

Get Best Performing Run

When all the runs have finished, you can execute the code below to determine the best-performing run based on the primary metric used in the experiment.

best_run = run.get_best_run_by_primary_metric()
if best_run is None:
    raise Exception("No best run was found")

best_run

Experiment	Id	Type	Status	Details Page	Docs Page
diabetes_training_hyperdrive	HD_77961f54-fea8-4514-8a1c-25a5743ce89e_3	azureml.scriptrun	Completed	Link to Azure Machine Learning Studio	Link to Documentation
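Once the best run is known, it is often useful to see which hyperparameter values produced it. The sketch below assumes the run object exposes get_metrics() and get_details(), as Azure ML runs do, and that the script arguments appear under runDefinition as a flat list of flag and value pairs. That layout is an assumption here, so check it against your own run details. FakeRun is a hypothetical stand-in with made-up values so the example can run without a workspace.

```python
def describe_best_run(best_run):
    """Collect the metric and the hyperparameter arguments of a finished run."""
    metrics = best_run.get_metrics()
    arguments = best_run.get_details()["runDefinition"]["arguments"]
    # Assumed layout: a flat list of [flag, value, flag, value, ...]
    settings = dict(zip(arguments[0::2], arguments[1::2]))
    return {
        "rmse": metrics["rmse"],
        "alpha": settings.get("--alphas"),
        "tol": settings.get("--tols"),
    }


class FakeRun:
    """Hypothetical stand-in for an Azure ML run, with made-up values."""

    def get_metrics(self):
        return {"rmse": 57.05}

    def get_details(self):
        return {
            "runDefinition": {
                "arguments": [
                    "--input-data", "<dataset-id>",
                    "--alphas", "0.1",
                    "--tols", "0.005",
                ]
            }
        }


summary = describe_best_run(FakeRun())
print(summary)
```

In a notebook, you would pass the best_run object from the cell above instead of FakeRun().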

Automating hyperparameter tuning removes much of the manual trial and error from the machine learning process. For more information on tuning hyperparameters, see Microsoft's documentation.

The full source code for this article is available here.

Thanks for reading

Thank you very much for reading. I hope you found this article interesting and that it will be useful to you in the future. If you have any questions or ideas you would like to discuss, it would be a pleasure to collaborate and exchange knowledge.