Getting Started With Amazon Redshift

Raj Kumar

What is Amazon Redshift?

Amazon Redshift is a fast, scalable data warehouse that makes it simple and cost-effective to analyze all your data across your data warehouse and data lake.

Amazon Redshift is a fully managed cloud-based data warehouse product designed for large-scale data set storage and analysis. It is also used to perform large-scale database migrations.

Redshift’s column-oriented database is designed to connect to SQL-based clients and business intelligence tools, making data available to users in real time. Based on PostgreSQL 8.0.2, Redshift delivers fast performance and efficient querying that help teams make sound business analyses and decisions.

Amazon reference

What is a Redshift Cluster?

An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases.

Benefits

Deepest integration with your data lake and AWS services
Best performance
Most scalable
Best Value
Easy to manage
Most secure and compliant

Read a more detailed reference here.

Getting Started

Log in to the AWS Console.
Enter your username and password.
Go to Services, then Analytics.
Search for Amazon Redshift.
Open the Redshift dashboard.

On the right side, you will see the option to create a cluster.

Create Cluster

On the left side, you can see other dashboard options like Dashboard, Clusters, Queries, etc.

Dashboard

In this article, I am not going in depth; I'll only show how easy it is to get started with Amazon Redshift, using the following steps.

Create a cluster
Create a schema and tables
Load data and integrate ETL
Run queries and integrate BI tools
Monitor and tune queries

Click the Create cluster button and provide the configuration details, such as the cluster identifier. I am using the free trial option.

Cluster button

Provide database configurations like a database name, port, username, and password.

Database configurations
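The same settings can also be supplied from a script instead of the console form. The following is a minimal sketch, assuming the boto3 `redshift` client and its `create_cluster` call. The helper names, the identifier `my-first-cluster`, the node type `dc2.large`, and the credential values are illustrative placeholders, not values taken from this walkthrough.

```python
def build_cluster_params(cluster_id, db_name, username, password,
                         port=5439, node_type="dc2.large"):
    """Collect the console form fields as keyword arguments for create_cluster."""
    if not cluster_id or not db_name:
        raise ValueError("cluster identifier and database name are required")
    return {
        "ClusterIdentifier": cluster_id,
        "ClusterType": "single-node",   # assumption: one node is enough for a trial
        "NodeType": node_type,          # assumption: use a type your account offers
        "DBName": db_name,
        "Port": port,                   # 5439 is the Redshift default port
        "MasterUsername": username,
        "MasterUserPassword": password,
    }


def create_cluster(client, params):
    """Send the request; `client` is expected to be boto3.client("redshift")."""
    return client.create_cluster(**params)


if __name__ == "__main__":
    params = build_cluster_params("my-first-cluster", "dev", "awsuser", "ChangeMe123")
    # Review the request (without the password) before sending it.
    print({k: v for k, v in params.items() if k != "MasterUserPassword"})
```

Keeping the parameter building separate from the API call makes it possible to review or test the request without touching an AWS account.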

Wait until the cluster has been created. You can follow its status under Clusters in the left-hand menu.

Cluster
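If the cluster is created from a script, the waiting step can be automated as well. This is a rough sketch, assuming a boto3 `redshift` client whose `describe_clusters` response reports the state in `Clusters[0]["ClusterStatus"]`. The function name, the polling interval, and the poll limit are arbitrary choices for illustration.

```python
import time


def wait_until_available(client, cluster_id, poll_seconds=30, max_polls=40,
                         sleep=time.sleep):
    """Poll describe_clusters until the cluster status is 'available'.

    Returns the number of polls used; raises TimeoutError if the limit is hit.
    """
    for attempt in range(1, max_polls + 1):
        response = client.describe_clusters(ClusterIdentifier=cluster_id)
        status = response["Clusters"][0]["ClusterStatus"]
        if status == "available":
            return attempt
        sleep(poll_seconds)  # the cluster is still being provisioned
    raise TimeoutError(f"{cluster_id} not available after {max_polls} polls")
```

boto3 also provides a built-in `cluster_available` waiter (`client.get_waiter("cluster_available")`) that does the same job; the explicit loop is shown only to make the logic visible.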

Once the cluster status is Available, go to the Editor tab and connect to the created cluster. Enter a database name, username, and password, then click Connect.

Database password

The query editor looks like this.

Query editor

I have already copied a CSV file to my S3 bucket. Next, I am going to create a table and load the file's data into it from the bucket. The CSV file looks like this.

CSV file
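Because the load maps CSV fields to table columns by position, it can be worth checking the header row before uploading the file. Here is a small sketch that compares a header with the six columns used for the orders table in this walkthrough. The function name and the idea of running this check locally are my own additions, not part of the Redshift workflow.

```python
import csv
import io

EXPECTED_COLUMNS = ["OrderDate", "Region", "Rep", "Item", "Units", "Total"]


def check_csv_header(text, expected=EXPECTED_COLUMNS):
    """Return a list of problems found in the header row; empty means it matches."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    header = [name.strip() for name in header]
    problems = []
    if len(header) != len(expected):
        problems.append(f"expected {len(expected)} columns, found {len(header)}")
    # Compare names position by position, ignoring case.
    for position, (found, wanted) in enumerate(zip(header, expected), start=1):
        if found.lower() != wanted.lower():
            problems.append(f"column {position}: expected {wanted}, found {found}")
    return problems
```

An empty list means the header lines up with the table. Any message in the list points to a column that would otherwise be loaded into the wrong place.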

Let us run two commands in the editor: one to create a new table and another to copy data from the S3 bucket into the Redshift table. Run the two queries manually, one at a time.

Redshift table

Queries

Create Table

-- One column per field in SalesOrders.csv, in file order
CREATE TABLE orders (
    OrderDate TIMESTAMP NULL,
    Region    VARCHAR(255) NULL,
    Rep       VARCHAR(255) NULL,
    Item      VARCHAR(255) NULL,
    Units     FLOAT NULL,
    Total     FLOAT NULL
);

Copy data from S3

COPY orders (OrderDate, Region, Rep, Item, Units, Total)
FROM 's3://rajsamplebucket/SalesOrders.csv'
IAM_ROLE '<Role-ARN>'
CSV
IGNOREHEADER 1;

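Note: Make sure the given role is associated with the cluster and can read the bucket. The AmazonS3ReadOnlyAccess policy is sufficient for COPY; AmazonS3FullAccess also works but grants more than the load needs.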
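When the same load has to be repeated for several files, the COPY statement can be assembled from its parts. The sketch below rebuilds the statement used above from a table name, column list, S3 URI, and role ARN. The function name and the simple quote check are assumptions made for illustration; it only builds the text and does not run anything against Redshift.

```python
def build_copy_statement(table, columns, s3_uri, role_arn, skip_header_rows=1):
    """Assemble a COPY statement for a CSV file stored in S3."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    if "'" in s3_uri or "'" in role_arn:
        raise ValueError("single quotes are not allowed in the URI or role ARN")
    column_list = ", ".join(columns)
    return (
        f"COPY {table} ({column_list})\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{role_arn}'\n"
        "CSV\n"
        f"IGNOREHEADER {int(skip_header_rows)};"
    )


if __name__ == "__main__":
    # Reproduces the statement from this article; <Role-ARN> stays a placeholder.
    print(build_copy_statement(
        "orders",
        ["OrderDate", "Region", "Rep", "Item", "Units", "Total"],
        "s3://rajsamplebucket/SalesOrders.csv",
        "<Role-ARN>",
    ))
```

The printed text can be pasted into the query editor exactly like the hand-written command.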

Verify the Data

Let's check whether the table now contains data. Run the following command in the editor.

SELECT *
FROM public.orders
LIMIT 10;

Rows returned
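The same check can be run outside the console, for example through the Redshift Data API (`boto3.client("redshift-data")`, using `execute_statement` followed by `get_statement_result`). The sketch below covers only the last step: turning a result in that response shape into plain Python dictionaries. The function name and the sample values in the usage part are made up for illustration.

```python
def records_to_rows(result):
    """Convert a get_statement_result response into a list of dicts."""
    names = [column["name"] for column in result["ColumnMetadata"]]
    rows = []
    for record in result["Records"]:
        values = []
        for field in record:
            if field.get("isNull"):
                values.append(None)
            else:
                # Each field is a one-entry dict such as {"stringValue": "East"}.
                values.append(next(iter(field.values())))
        rows.append(dict(zip(names, values)))
    return rows


if __name__ == "__main__":
    # Hypothetical response fragment with invented values.
    sample = {
        "ColumnMetadata": [{"name": "region"}, {"name": "units"}],
        "Records": [[{"stringValue": "East"}, {"doubleValue": 95.0}]],
    }
    print(records_to_rows(sample))
```

Column names usually come back in lowercase (for example `orderdate`), because Redshift folds unquoted identifiers to lowercase.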

You can visualize data using the Visualize button.

Visualize button

Line Chart

Line chart

Bar Chart

Conclusion

In this article, we learned how to get started with Amazon Redshift: how to create a cluster, how to create a table, and how to load data from S3 into a Redshift table.