Chat With Your CSV Data Using Azure OpenAI

In this article, I’ll walk you through all the steps required to query your CSV data and get a response out of it using Azure OpenAI.


Let’s get started by importing the required packages.

Import Required Packages

Here are the packages which we need to import to get started:

import json
import pandas as pd
from dotenv import dotenv_values    

Read Configuration

First of all, we need to set a few variables with information from the Azure portal and Azure OpenAI Studio:

openai_api_version = "API_VERSION"
openai_api_key = "API_KEY"
openai_api_base = "ENDPOINT"

If you are not sure how to grab the above values, I would recommend you watch my video here.

Rather than hard-coding these values, you can keep them in an environment file and load them with dotenv_values:

config = dotenv_values(env_name)  # env_name: path to your environment file

openai_api_key = config['openai_api_key']
openai_api_base = config['openai_api_base']
openai_api_version = config['openai_api_version']

In my case, I've stored all the configuration values in the file referenced by env_name, and I'm reading them from there. Feel free to adapt the above lines of code to your own setup.
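A missing or misspelled key only shows up later as a KeyError or an authentication failure, so it can help to check the configuration up front. The following is a minimal sketch and not part of the original walkthrough: require_keys is a hypothetical helper, and a plain dict stands in for the value returned by dotenv_values.

```python
REQUIRED_KEYS = ("openai_api_key", "openai_api_base", "openai_api_version")


def require_keys(config, keys=REQUIRED_KEYS):
    """Return the requested values, or raise if any key is missing or empty."""
    missing = [k for k in keys if not config.get(k)]
    if missing:
        raise KeyError(f"Missing configuration values: {', '.join(missing)}")
    return {k: config[k] for k in keys}


# A plain dict standing in for dotenv_values(env_name); values are placeholders
sample = {
    "openai_api_key": "API_KEY",
    "openai_api_base": "ENDPOINT",
    "openai_api_version": "API_VERSION",
}
settings = require_keys(sample)
print(sorted(settings))
```

Failing early like this keeps configuration problems separate from problems with the API call itself.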

Preparing the Data

Next, we need to read a CSV (comma-separated values) file and load the data into a pandas DataFrame. This CSV file contains data about movies, which I grabbed from Kaggle.

df = pd.read_csv('MovieData.csv')

Here is the gist of the data:


To query this data, we first need to give each row some context. We can do that by combining the relevant columns into a single descriptive string per movie. Here is how you can do this:

df['combined'] = 'Movie: ' + df['name'] + ' ' + 'year: ' + df['year'].astype(str) + ' ' + 'duration: ' + df['duration'] + ' ' + 'certificate: ' + df['certificate']
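One caveat with this approach: when pandas concatenates a string with a missing value, the result for that row is also missing, so a movie without a certificate would end up with no combined text at all. Below is a minimal sketch of a per-row formatter that tolerates missing values. The names describe_row and is_missing, the "unknown" placeholder, and the sample row are all hypothetical and not taken from the dataset.

```python
import math


def is_missing(value):
    """True for None and for float NaN (how pandas marks missing cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def describe_row(row, placeholder="unknown"):
    """Build the combined text for one row given as a dict."""
    columns = (("Movie", "name"), ("year", "year"),
               ("duration", "duration"), ("certificate", "certificate"))
    parts = []
    for label, column in columns:
        value = row.get(column)
        text = placeholder if is_missing(value) else str(value)
        parts.append(f"{label}: {text}")
    return " ".join(parts)


# Hypothetical row with a missing certificate
row = {"name": "Example Movie", "year": 2001, "duration": "120 min",
       "certificate": float("nan")}
print(describe_row(row))
# Movie: Example Movie year: 2001 duration: 120 min certificate: unknown
```

With a DataFrame, the same function could be applied row by row, for example df.apply(lambda r: describe_row(r.to_dict()), axis=1).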

And this is what the combined column looks like:

Making a Call to Azure OpenAI Endpoint

At this point, we have our data ready. So, let’s go ahead and create an Azure OpenAI client.

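from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=openai_api_key,
    api_version="2023-10-01-preview",  # I'm using this version
    azure_endpoint=openai_api_base,
)

# df.head() returns only the first five rows, so the model sees just that sample
context = df.head().to_json(orient="records")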
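Because df.head() passes only the first five rows, any question about the whole dataset is answered from that small sample. One option for sending more rows while keeping the prompt bounded is to add records until a size budget is reached. This is a rough sketch under stated assumptions: build_context and max_chars are hypothetical names, the records are made up, and a character count is only a crude stand-in for the model's token limit.

```python
import json


def build_context(records, max_chars=4000):
    """Serialize as many leading records as fit within max_chars of JSON."""
    selected = []
    for record in records:
        candidate = json.dumps(selected + [record])
        if len(candidate) > max_chars:
            break
        selected.append(record)
    return json.dumps(selected)


# Hypothetical records; with pandas these could come from
# df.to_dict(orient="records")
records = [{"name": f"Movie {i}", "year": 2000 + i} for i in range(100)]
context = build_context(records, max_chars=200)
print(len(context) <= 200, len(json.loads(context)))
```

The budget should be tuned to the deployed model; anything that does not fit is simply left out, so answers still reflect only the rows that were sent.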

Once the client object is created successfully, we are good to go ahead and make an API call.

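response = client.chat.completions.create(
    model="DEPLOYMENT_NAME",  # placeholder: the name of your Azure OpenAI model deployment
    messages=[
        {"role": "system", "content": "You are a helpful assistant who answers only from the given Context."},
        {"role": "user", "content": "Context: " + context + "\n\n Query: " + query},
    ])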

Before executing the above lines, make sure to set the query variable to the question you want to ask. Here is mine:

query = "How many movies were not certified?" 
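The message construction, the API call, and reading the answer can be wrapped in one small function. The sketch below makes several assumptions: ask and build_messages are hypothetical helpers, the deployment name is a placeholder for your own Azure OpenAI model deployment, and the stand-in client only mimics the shape of a chat completion response so that the example runs without network access.

```python
from types import SimpleNamespace

SYSTEM_PROMPT = "You are a helpful assistant who answers only from the given Context."


def build_messages(context, query):
    """Build the system and user messages used in this walkthrough."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Context: " + context + "\n\n Query: " + query},
    ]


def ask(client, deployment, context, query):
    """Send the context and query, and return the text of the first choice."""
    response = client.chat.completions.create(
        model=deployment,
        messages=build_messages(context, query),
    )
    return response.choices[0].message.content


class FakeCompletions:
    """Stand-in that mimics the response shape; it does not call any API."""

    def create(self, model, messages):
        reply = SimpleNamespace(content=f"(stub reply to {len(messages)} messages)")
        return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=FakeCompletions()))
print(ask(fake_client, "DEPLOYMENT_NAME", "[]", "How many movies were not certified?"))
```

With the real client object, the call would look like ask(client, "DEPLOYMENT_NAME", context, query), with the placeholder replaced by your deployment name.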

If everything goes well, the service returns a response containing the model's answer. Here is the response that I received:


I hope you find this walkthrough useful.

If you find anything that is not clear, I recommend you watch my video recording, which demonstrates this flow from end to end.

Happy learning!
