Uploading Files To Snowflake Stage Using SnowSQL

Introduction

In this article, we will learn how to upload CSV and JSON files into a Snowflake stage using the SnowSQL client. Once a file is uploaded into the stage, we can perform a bulk copy operation to load the data from the file into a Snowflake table. Before working on the problem statement, we should have some knowledge of SnowSQL and Snowflake stages. SnowSQL is just a client connector, whereas a Snowflake stage is the location where we load our files. We can create internal and external stages in Snowflake. An external stage for AWS, Azure, or GCP can be created in Snowflake, provided we have the required keys for the respective cloud provider. In that case, if we have stored our CSV/JSON/XML or other supported semi-structured files in an S3 bucket or Azure Blob, we can simply stage them, and they will be available in Snowflake for loading into tables.

That is why this process is sometimes called ELT instead of ETL. In the ELT (Extract, Load, and Transform) process, we first extract data from a source, then load it into a storage location to stage it on Snowflake. Once we have the list of files in the stage, we can transform that data as per our requirements. We are going to discuss the following things in detail:

  1. SnowSQL
  2. Snowflake Stage
  3. Problem statement (uploading files to the stage and loading the data from the stage to a Snowflake table)

What Is SnowSQL?

SnowSQL is a command-line client used to connect to Snowflake. Instead of the Snowflake web interface, we can use the SnowSQL tool to execute SQL queries and DDL and DML commands, including loading and unloading data. To install SnowSQL, download the executable from the Snowflake download page.


Let us set up SnowSQL on our system to work on our problem statement. Perform the following steps to install and configure it.

Step 1. Download the executable from the given link and double-click it once it is downloaded (refer to the below screen).

setupScreen12

Step 2. Click on Next until you reach the below screen.

setupScreen3

If you look closely at the screen, you can see some instructions. We need to configure the account details in the ~/.snowsql/config file.

Step 3. Click on the Finish button.

Step 4. To check whether SnowSQL is installed, open the Run dialog (Windows key + R). You will get the below screen:

runTerminal

Step 5. Type %USERPROFILE%\.snowsql\ and then click on OK.

Step 6. You will see the below screen. Now open the config file in Notepad or Notepad++.

configFile

Step 7. Add the Snowflake account information based on the below screen.

You need to provide the accountname, region, username, password, dbname, schemaname, warehousename, and rolename. You can get this information from your Snowflake web interface.

accountInfo

Step 8. You can see [connections.my_example_connection] in the above screen. Here, my_example_connection is the connection name we will use to connect to Snowflake from the command prompt. You can give it any name.
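
For reference, a minimal config entry looks like the following. All values here are placeholders (the database, schema, and warehouse names match the ones used later in this article); substitute your own account details:

[connections.my_example_connection]
accountname = xy12345
region = us-east-1
username = your_username
password = your_password
dbname = TEST_DB
schemaname = MANAGEMENT
warehousename = COMPUTE_WH
rolename = SYSADMIN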

Step 9. Open the command prompt and type snowsql -v, as shown below.

cmdScreen

Step 10. Type snowsql -c my_example_connection and press the Enter key.

This command connects SnowSQL to Snowflake as per the given configuration. In this case, the prompt will appear as nitesh#COMPUTE_WH@TEST_DB.MANAGEMENT; refer to the below screen. Now you can run any SQL query or DDL or DML command.

cmd3

Now SnowSQL is installed and configured properly on your system. Let us discuss the Snowflake stage.

Snowflake Stage

A stage specifies the location where data files are stored so that the data in the files can be loaded into a table. You can explore more details in the Snowflake documentation.

The following types of Stages can be created in Snowflake:

  1. User stage
  2. Table Stage
  3. Internal Named Stage

While uploading a file into an internal stage, we must specify the stage name in the PUT command, and while loading data into a table from staged files, we must specify the stage name in the COPY INTO <table> command.
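
In general form, the two commands look like this (my_int_stage, my_table, and the file path are placeholder names):

PUT file://c:\folder\datafile.csv @my_int_stage;

COPY INTO my_table FROM @my_int_stage;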

User Stages

By default, each user has a Snowflake stage allocated for storing files. It is a convenient option if your files will only be accessed by a single user but need to be copied into multiple tables.

Characteristics and Limitations

  • It is referenced using @~; e.g. use LIST @~ to list the files in a user stage.
  • It cannot be altered or dropped.
  • It does not support setting file format options. Instead, you must specify the file format and copy options as part of the COPY INTO <table> command.
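
For example, loading a staged file from the user stage with inline format options might look like this (the table name, file name, and format options are placeholders/assumptions about your data):

copy into mytable
from @~/mydata.csv.gz
file_format = (type = 'CSV' field_delimiter = ',' skip_header = 1);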

Table Stages

By default, each table has a Snowflake stage allocated for storing files. It is a convenient option if your files need to be accessible to multiple users and only need to be copied into a single table.

Characteristics and Limitations

  • It has the same name as the table, e.g. a table named mytable has a stage referenced as @%mytable.
  • It cannot be altered or dropped.
  • It does not support setting file format options. Instead, you must specify the file format and copy options as part of the COPY INTO <table> command.
  • It does not support transforming data while loading it (i.e. using a query as the source for the COPY command).

We do not choose this option if the data in the files needs to be copied into multiple tables.

Internal Named Stages

Named stages are database objects that provide more flexibility for data loading. Because they are database objects, the security/access rules that apply to all objects apply to them as well:

  • Users who have appropriate privileges on the stage can load data into any table.
  • Named internal stages are optional but recommended when you plan regular data loads that could involve multiple users and/or tables.

Creating Internal Named Stage

To implement the solution for our problem, we need to create an internal named stage so we can upload files into it. Apart from creating a stage in Snowflake, we can also create a stage for AWS, Azure, or GCP by providing the appropriate keys for the cloud provider. For example, if we create a stage with an AWS S3 bucket key, we can see all the files we have loaded into the S3 bucket listed in the Snowflake stage. The stage just holds a reference, so once a file is listed in our stage, we can load its data into a table. In our scenario, we are creating a named stage in Snowflake, so instead of uploading files into S3, we upload them directly into the Snowflake stage. You can create a stage through the GUI or with a SQL query; ultimately, whatever we enter in the GUI is turned into SQL and run behind the scenes. I am creating the stage through the GUI (Graphical User Interface).
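
If you prefer SQL over the GUI, the equivalent of the steps below is a single statement (MYDATA is the stage name used later in this article):

create or replace stage MYDATA;

-- an external stage would instead reference the cloud location; placeholder URL and credentials:
-- create stage MY_S3_STAGE url = 's3://mybucket/path/' credentials = (aws_key_id = '...' aws_secret_key = '...');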

Perform The Following Steps To Create A Stage

Step 1. Log in to your Snowflake account. You will land on the worksheet.

Step 2. Click on Database in the header (beside the Share icon). Refer to the below screen.

selectDB

Step 3. Click on the database where you want to create the stage. In this case, I have selected TEST_DB. Once you click it, you will see the below screen.

CreateStage

Step 4. Click on Stages.

Step 5. Click on Create. You will see the below screen. Here you decide whether you want to create a stage for Snowflake, Amazon S3, Azure, or GCP.

CreateSuccess

Step 6. Select Snowflake Managed in the above screen and click on Next.

Step 7. Enter the stage name and select the schema. Refer to the below screen.

comment

Step 8. Click on Finish.

Step 9. In the below screen, you can see the stage we have created under the MANAGEMENT schema.

Uploading Files To Snowflake Stage Using SnowSQL

Now we have configured SnowSQL and created the stage as well. Let me give you brief information on file formats, as a file format is required while loading data from a stage into a table.

File Format

A file format is a named format definition that we create for a particular type of file, such as CSV, JSON, XML, or Avro. It helps us while loading data of different formats into Snowflake tables.

Here, I am creating file formats for the CSV and JSON files, and we will use these formats while loading data from the stage into a Snowflake table.

Creating File Format Inside The Database

Like a stage, we can create a file format inside the database. Perform the following steps:

Step 1. Click on File Format in the menu.

Step 2. Click on Create.

Step 3. Specify the format name and schema name with the required settings. Refer to the below screen.

FileFormat

Step 4. Click on the Finish button.

Perform the above steps once again to create the JSON file format. Refer to the below screen.

createJSON

Click on the Finish button.
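
As with the stage, the GUI generates SQL behind the scenes. The two file formats can also be created directly with statements like the following. The names match the CSV_FORMAT and JSON_FORMAT used later in this article, but the individual options (such as skip_header) are assumptions that should match your files:

create or replace file format CSV_FORMAT
  type = 'CSV'
  field_delimiter = ','
  skip_header = 1;

create or replace file format JSON_FORMAT
  type = 'JSON';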

Now we have everything ready to work on our problem statement.

Problem Statement

  • Uploading CSV and JSON files from the local system to Snowflake Stage
  • Loading data from the stage to a Snowflake table

Solution

As we have already set up and configured SnowSQL and the Snowflake stage, it will now be very easy for us to work on the solution. I have created two files. One is employee_list.json and the other is employeedata.csv. These files are available on the C drive inside the testdata folder.

Please have a look at the screens below for the JSON file data and the CSV file data. I have also attached testdata.zip here.

JSON File Data

JSONDATA

The CSV file data is shown in the below screen.

CSVDATA

Now, we must perform the following steps to load the files into the stage and then load the data from the stage into the Snowflake table.

Step 1. Open the command prompt.

Step 2. Type snowsql -v and press the Enter key.

It will show you the SnowSQL version installed on your system.

Step 3. Type snowsql -c my_example_connection and press the Enter key.

This command connects SnowSQL to Snowflake as per the given configuration. In this case, the prompt will appear as nitesh#COMPUTE_WH@TEST_DB.MANAGEMENT.

Step 4. Type PUT file://c:\testdata\employeedata.csv @MYDATA; and press the Enter key. Refer to the below screen.

cmd2

Step 5. Once the file is successfully loaded into the stage, you will see the below screen. The status will show as UPLOADED, and a compressed file, employeedata.csv.gz, will be created on the Snowflake stage.

LoadStage

Step 6. Type PUT file://c:\testdata\employee_list.json @MYDATA; and press the Enter key. Refer to the below screen.

The JSON file will be loaded into the Snowflake stage called MYDATA.

cmd4

Step 7. Open your Snowflake web interface and set the context:

Use role sysadmin; 

Use warehouse compute_wh;

Use database test_db;

Use schema management;

Step 8. Open the worksheet and type the below query.

List @MYDATA;

It will show the list of the files available in the stage.

StageList

Step 9. To see the CSV file data, run the below query.

Select $1,$2,$3,$4,$5 from @MYDATA/employeedata.csv.gz;

It will show the CSV data in the result set. Refer to the below screen.

CSVData

You can use the file format to see the data properly parsed.

Select $1,$2,$3,$4,$5 from @MYDATA/employeedata.csv.gz (FILE_FORMAT=>'CSV_FORMAT');

Refer to the below screen.

SelectData

Step 10. Perform the bulk insert operation. We have the user_data table in the TEST_DB database under the MANAGEMENT schema.
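
If the table does not exist yet, a minimal definition matching the three columns used below might look like this (the data types are assumptions based on the sample data):

create or replace table user_data (
  user_id number,
  user_name varchar,
  address varchar
);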

copy into user_data (user_id,user_name,address)

from (select $1,$2,$3 from @MYDATA/employeedata.csv.gz)

FILE_FORMAT = 'CSV_FORMAT';

Refer to the below screen.

CopyData

Step 11. The data has been loaded into the user_data table. To see the loaded data, run the below query.

select * from user_data;

Refer to the below screen:

SelectUser

Step 12. We have loaded the CSV file data into the user_data table. Now we have to load the JSON file data into a table. First, to view the JSON data in the stage, run the below query.

select parse_json($1) from @mydata/employee_list.json.gz (file_format => 'JSON_FORMAT');

Refer to the below screen.

ParseData

Step 13. Click on the result row so you can see the JSON format data. Refer to the below screen.

Result

Step 14. Create a table where we can store the JSON data as it is. Here, I am creating a table having one column whose data type is variant. Variant is a Snowflake data type that can store semi-structured data. Run the below query to create the table.

create or replace table test_table(user_data variant);

Step 15. Run the below query to load the JSON file data.

copy into test_table (user_data)

from (select parse_json($1) from @mydata/employee_list.json.gz)

FILE_FORMAT = 'JSON_FORMAT';

Step 16. To view the data loaded into test_table, run the below query.

select * from test_table;

Refer to the below screen.

queryScreen
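
Once the JSON is in the variant column, individual attributes can be queried with the colon path syntax. As a sketch, assuming each JSON object contains a name attribute (adjust the path to your actual structure):

select user_data:name::string as employee_name from test_table;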

Conclusion 

Files can be uploaded to a Snowflake stage with any Snowflake client connector. We can put files into the stage from the local system, and the data can then be loaded from the stage into a Snowflake table. If you have an S3 bucket or an Azure Blob container where you post/upload your data files, you can simply create a stage referencing the keys of the AWS S3 bucket or Azure Blob and start loading the data into tables.

I am happy to answer any questions you may have on this functionality.

Happy Learning!

