Getting Started With Amazon Redshift

What is Amazon Redshift?

 
Amazon Redshift is a fast, scalable data warehouse that makes it simple and cost-effective to analyze all your data across your data warehouse and data lake.
 
Amazon Redshift is a fully managed, cloud-based data warehouse product designed for storing and analyzing large-scale data sets. It is also used to perform large-scale database migrations.
 
Redshift’s column-oriented database is designed to connect to SQL-based clients and business intelligence tools, making data available to users in real time. Based on PostgreSQL 8, Redshift delivers fast performance and efficient querying that help teams make sound business analyses and decisions.

What is a Redshift Cluster?

 
An Amazon Redshift data warehouse is a collection of computing resources called nodes, which are organized into a group called a cluster. Each cluster runs an Amazon Redshift engine and contains one or more databases.
 
Benefits
  1. Deepest integration with your data lake and AWS services
  2. Best performance
  3. Most scalable
  4. Best value
  5. Easy to manage
  6. Most secure and compliant

    Read a more detailed reference here.
Getting Started
  • Log in to the AWS Console
  • Enter your username and password
  • Go to Services and then go to Analytics
  • Search for Amazon Redshift
  • You are now ready to enter the Redshift dashboard 😊
On the right side, you will see the option to create a cluster.
 
On the left side, you can see other dashboard options like Dashboard, Clusters, Queries, etc.
 
In this article, I am not going too in-depth; I'll only show how easy it is to get started with Amazon Redshift in the following steps:
  1. How to create a cluster
  2. How to create schema and tables
  3. How to load data and integrate ETL
  4. How to run queries and integrate BI tools
  5. How to monitor and tune queries
Click the Create cluster button and provide the configuration details, such as the cluster identifier. I am using the Free trial option.
 
Provide database configurations like a database name, port, username, and password.
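
The same settings can also be supplied from a script instead of the console. The following is a minimal sketch, not part of the original walkthrough, that collects them into the keyword arguments accepted by boto3's Redshift `create_cluster` call. The helper name `build_cluster_params` and the node type `dc2.large` are assumptions chosen for illustration; 5439 is Redshift's default port.

```python
from typing import Any, Dict


def build_cluster_params(
    cluster_id: str,
    db_name: str,
    username: str,
    password: str,
    port: int = 5439,  # Redshift's default port
    node_type: str = "dc2.large",  # assumed node type; pick one available to you
) -> Dict[str, Any]:
    """Return keyword arguments for boto3's redshift create_cluster.

    Hypothetical helper: it only assembles and sanity-checks the values,
    it does not call AWS.
    """
    if not (cluster_id and db_name and username and password):
        raise ValueError("identifier, database name, username and password are required")
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return {
        "ClusterIdentifier": cluster_id,
        "ClusterType": "single-node",
        "NodeType": node_type,
        "DBName": db_name,
        "Port": port,
        "MasterUsername": username,
        "MasterUserPassword": password,
    }


# Usage (needs boto3 and AWS credentials, so it is left commented out):
# import boto3
# params = build_cluster_params("my-cluster", "dev", "awsuser", "<password>")
# boto3.client("redshift").create_cluster(**params)
```

Keeping the parameters in one function makes it easy to review them before anything is created in your account.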
 
Wait until the cluster is created; you can then see its updated status under the CLUSTERS icon.
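
If you prefer to wait from a script rather than refreshing the console, the check can be sketched as a simple polling loop. This is an illustrative sketch only: `wait_until_available` is a hypothetical helper, and `client` is assumed to behave like `boto3.client("redshift")`, whose `describe_clusters` response reports the status in the `ClusterStatus` field.

```python
import time
from typing import Any, Callable


def wait_until_available(
    client: Any,
    cluster_id: str,
    poll_seconds: float = 30.0,
    max_attempts: int = 40,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll describe_clusters until the cluster status is 'available'.

    Returns True once the cluster is available, or False if the
    attempts run out first.
    """
    for attempt in range(max_attempts):
        response = client.describe_clusters(ClusterIdentifier=cluster_id)
        status = response["Clusters"][0]["ClusterStatus"]
        if status == "available":
            return True
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(poll_seconds)
    return False


# Usage (needs boto3 and AWS credentials, so it is left commented out):
# import boto3
# ready = wait_until_available(boto3.client("redshift"), "my-cluster")
```

boto3 also ships a built-in waiter, `client.get_waiter("cluster_available")`, which does the same job; the loop above just makes the logic visible.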
 
Once the cluster status is Available, go to the Editor tab and connect to the cluster you created. Enter the database name, username, and password, then click Connect.
 
The query editor looks like this:
 
I have already copied a CSV file to my S3 bucket. I am going to create a table and load data into it from that bucket. My database looks like this:
 
Let us run two commands in the editor: one to create a new table and another to copy data from the S3 bucket into the Redshift table. Run the two queries manually, one after the other.
 
Queries
 
Create Table
  CREATE TABLE orders(
     OrderDate datetime NULL,
     Region nvarchar(255) NULL,
     Rep nvarchar(255) NULL,
     Item nvarchar(255) NULL,
     Units float NULL,
     Total float NULL);
Copy data from S3
  COPY orders(OrderDate,Region,Rep,Item,Units,Total)
  FROM 's3://rajsamplebucket/SalesOrders.csv'
  IAM_ROLE '<Role-ARN>'
  CSV
  IGNOREHEADER 1;
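Note
Make sure the given IAM role is associated with the cluster and is allowed to read the bucket. I attached the AmazonS3FullAccess policy; a read-only policy such as AmazonS3ReadOnlyAccess is also enough for COPY.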
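
If you load several files, it can help to generate the COPY command rather than retype it. Below is a minimal sketch under stated assumptions: `build_copy_statement` is a hypothetical helper (not a Redshift or AWS API) that only formats the statement shown above; it does not execute it and does not escape its inputs, so pass it only values you trust.

```python
from typing import Sequence


def build_copy_statement(
    table: str,
    columns: Sequence[str],
    s3_uri: str,
    iam_role_arn: str,
    ignore_header: int = 1,
) -> str:
    """Format a Redshift COPY statement for a CSV file stored in S3."""
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with 's3://'")
    if not columns:
        raise ValueError("at least one column is required")
    lines = [
        f"COPY {table} ({', '.join(columns)})",
        f"FROM '{s3_uri}'",
        f"IAM_ROLE '{iam_role_arn}'",
        "CSV",
    ]
    # IGNOREHEADER n skips the first n rows of the file (the header row here).
    if ignore_header > 0:
        lines.append(f"IGNOREHEADER {ignore_header}")
    return "\n".join(lines) + ";"


statement = build_copy_statement(
    "orders",
    ["OrderDate", "Region", "Rep", "Item", "Units", "Total"],
    "s3://rajsamplebucket/SalesOrders.csv",
    "<Role-ARN>",  # placeholder, replace with your role's ARN
)
print(statement)
```

The printed statement can be pasted into the query editor as is, once the role ARN placeholder is replaced.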
 
Let's check if our table has data or not.
 
Write the command in the editor:
 
select * from public.orders limit 10
 
You can visualize the data using the Visualize button, for example as a line chart or a bar chart.
 

Conclusion

 
In this article, we learned how to get started with Amazon Redshift: how to create a cluster, how to create a table, and how to load data from S3 into a Redshift table.