Snowflake And Its Features

Nitesh Jha
6y
16.8k
0
3

Article

Introduction

In this article, you will learn about Snowflake and its features.

What is Snowflake?

Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). It provides a data warehouse that is faster, easier to use, and far more flexible than traditional data warehouse offerings. It is a cloud data warehouse that runs on cloud infrastructures like AWS, Azure, and GCP (Google Cloud Provider). It can not be run on a private cloud or hosted infrastructure. When you create an account in Snowflake, it asks whether you want to create your account on AWS, Azure, or GCP.

As it is a warehouse, we can create a database, and perform SQL commands like DDL/DML. Not only this, but we can also create SQL function, Views, and stored procedure (in JavaScript). It is not a relational database, so it does not enforce the primary key and foreign key relationship. But you can use it if you want. The purpose of Snowflake is that it provides the data faster. You can pull the millions of records with a minute and visualize the data by BI tools.

Snowflake's Features

Data Warehouse as a Cloud service

As this is a cloud-based data warehouse, there is no hardware and software to configure, install, and manage it. Snowflake handles the maintenance, management, and tuning from their end.

Scalability (Storage and Compute)

Snowflake implements a separation of storage and computation. A database is used to store the data in Snowflake and computation can be done by a virtual warehouse. Snowflake stores data into S3 bucket because of the following reasons,

High availability
Durability
API to read data parts

User Experience (Performance)

No Indexing
No Performance Tuning
No Partitioning
No Physical Storage Design

In Snowflake, there is no indexing, no performance tuning, no portioning, no physical storage design. It simply works.

Supports Semi-structured data

Snowflake supports semi-structured data like Json, XML, Parquet, Avro, etc. You can load data as it is in the table.

Tuneable Pay per Use

You can configure it based on your requirement.

Please go through the Snowflake documentation on the Snowflake website for more details. Here is the link.

It has very good documentation, where they explain each and everything in a descriptive manner. However, I am sure you will need the help of Snowflake documentation for implementing and exploring their functionality.

We have different tools and connectors to work with Snowflake.

Integration Support by Snowflake

Data Integration

You can have ETL (Extraction, Transformation, and Loading) process in Snowflake using Informatica, Talend, Alooma, Kafka, etc.

Extraction: Extracting data from different source like csv, excel, Oracle etc
Transformation: Transforming or modifying the extracted data as per need using look up and rules.
Loading: Loading the transformed data into the destination source.

BI Tools

It enables analyzing, discovering, and reporting on the data using Power BI, Tableau, QuickSight, Qlik, Adobe, etc. A BI tool provides a facility the ability to visualize data through dashboards, charts, and other graphical outputs.

Machine Learning and Data Science

We can also refer to the advance analytical and AI (Artificial Intelligence) tool, Machine learning, Big Data tools for statistical and predictive modeling. BI tools can be focused on past data for reporting, but machine learning tools can be used to focus on large data set to discover a pattern and predict future trends. The tools are Databricks, DataRobot, Spark, etc.

Programmatic Interface

You can connect the snowflake using .Net, python, Node.Js, JDBC/ODBC driver. When you want to automate the DBA task in snowflake, it can be done by Python or other language connectors.

SQL Interface and Client

Snowflake has the following SQL interface:

Snowflake Web Interface: For an ad-hoc query, you can use the Web Interface. You need to log in with the Snowflake account and execute the SQL query
SnowSQL: You can perform a SQL query by installing the Snow SQL clients.
DBeaver: You can connect to Snowflake using the DBeaver client as well.

Snowflake Architecture

It is a combination of shared disk and shares nothing database architecture. Snowflake architecture is divided into three layers.

Service Layer
Query Processing Layer
Database Storage Layer

Cloud Services /Service Layer

This layer is responsible for authentication and access control management. It also manages the metadata information like if we create the warehouse then actually it creates the metadata only.

Query Processing

This layer is responsible for query processing. The SQL query can be processed using a Virtual warehouse, which is a parallel processing (MPP) compute cluster. You need to create the virtual warehouse as it is the same as EC2 instance of AWS, but instead of naming the virtual machine, it is called a virtual warehouse in Snowflake. Creating a warehouse is very easy in Snowflake, and it comes with different sizes like XS (extra small), S (small), M (medium), L (large), XL, XXL, XXXL. The SQL query will be processed using these virtual warehouses. By default, Warehouse will be suspended, but when we process query it will be activated. That is why Snowflake says you pay for storage and computation. If the warehouse was not in use, then you would pay only for storage.

Database Storage

Snowflake stores the data in the database. When we load the data in snowflake it automatically identifies the data for optimizing, compressing for internal purpose. The optimized data will be stored in cloud storage. Snowflake uses the AWS S3 bucket as cloud storage. As we know that S3 API can read the data parts based on the range. Whatever table we create inside the database, Snowflake splits that table into multiple partitions. Each partition called a micro partition, so it will be faster to access.

Snowflake Data Life cycle

To work with the data in Snowflake, Snowflake came up with the following mechanism is used to manage the data lifecycle.

Organizing Data
Storing Data
Querying Data
Working with Data
Removing Data

Organizing Data

You can create Database, schema, and table to organize data in Snowflake

Database and Scheme

There is a SQL command to create database and schema

Create database <database_name>
create schema <schema_name>
Table and View : - Create table <table_name> (parameter_name with data type)

Storing Data

Storing data in a table can be done with the below SQL query:

Insert: INSERT INTO TABLE ………
Bulk Insert: Copy INTO TABLE query

Querying Data

To query the data, we use the ‘Select’ query.

Select * from table_name

Working with Data

Once the data is loaded into the table, then you can perform the following activities.

Update, Merge, Delete
Cloning Database, schema and table by Create <object> Clone….command

Removing Data

To remove the data from the table, you can perform the below command:

Truncate
Drop

Almost all commands are the same as the SQL command in SQL server, or just a bit different. I will try to cover the main flow in my upcoming articles where all the above SQL command will be used.

Conclusion

I have covered the introduction of Snowflake with Features and architecture. You can use Snowflake if you are dealing with lots of data. You can visualize those data and perform machine learning on that to predict future trends. It will be faster in all aspects.

Happy Learning!