Serverless Data Lake Analytics Service In Alibaba Cloud

Alibaba Cloud offers a broad range of DT Plus services, and Data Lake Analytics is among the most in-demand in the industry. Alibaba Cloud Data Lake Analytics is a serverless, interactive, cloud-native analytics service that is fully managed by Alibaba Cloud and runs on massively parallel processing (MPP) compute nodes. Because there is no infrastructure to maintain, it is effectively a zero-maintenance service, and it is available to enterprise users on a Pay-As-You-Go model.

Alibaba Cloud Data Lake Analytics offers a standard SQL interface with broad SQL compatibility and a comprehensive set of built-in functions. You can connect to various data sources and clients using JDBC and ODBC connectors. Data Lake Analytics also integrates with BI products, which helps turn big data into insights and visualizations.
 
This also gives customers a cloud migration path with low migration cost. Data Lake Analytics can run complex analytics on data that comes from different sources in different formats. Using it, we can analyze data stored in Alibaba Cloud Object Storage Service (OSS) or Table Store, and we can also join the results to generate new insights.
 
Alibaba Cloud Data Lake Analytics is powered by a full massively parallel processing (MPP) architecture and provides vectorized execution optimization, pipelined operator execution, multi-tenant resource allocation, and priority scheduling.

Alibaba Cloud Object Storage

Using Alibaba Cloud Data Lake Analytics, we can analyze raw data in OSS such as logs, CSV, JSON, and Avro files. We can run queries against specific OSS folders, create tables over them, and integrate the results with BI tools. Data Lake Analytics can also query time-series data, pipeline data, logs, and post-ETL data stored in Table Store, whether against a single table or joined across multiple tables.
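As a sketch of querying OSS-resident log data, assuming a hypothetical table `oss_access_logs` has already been defined over CSV log files in an OSS folder, a typical aggregation might look like:

```sql
-- Hypothetical table assumed to be defined over CSV log files in OSS.
SELECT request_path,
       COUNT(*) AS hits
FROM   oss_access_logs       -- assumed OSS-backed table
WHERE  status_code >= 500
GROUP  BY request_path
ORDER  BY hits DESC
LIMIT  10;
```

Because the table is just a schema over files in OSS, the same query keeps working as new log files land in the folder.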

We can also join heterogeneous data sources: for example, with data in both OSS and Table Store, a single JOIN query can combine the two sources and turn them into insights. Data is isolated per account, so it is visible only to its owner. Once you activate Data Lake Analytics, the system grants your account access permissions to the database.
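A minimal sketch of such a cross-source join, assuming hypothetical schemas `tablestore_db` (mapped to Table Store) and `oss_db` (mapped to OSS) with `orders` and `customers` tables:

```sql
-- Hypothetical schemas: orders in Table Store, customer profiles in OSS.
SELECT c.customer_name,
       SUM(o.amount) AS total_spent
FROM   tablestore_db.orders o     -- assumed Table Store schema
JOIN   oss_db.customers c         -- assumed OSS schema
  ON   o.customer_id = c.customer_id
GROUP  BY c.customer_name;
```

The join executes inside Data Lake Analytics, so neither source needs to be copied into the other before analysis.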

Alibaba Cloud Data Lake Analytics offers many kinds of built-in functions: aggregate functions (which ignore null values and return null when there are no input rows); binary functions and operators; bitwise functions; conversion functions for casting between numeric and character types; date and time functions and operators; JSON functions and operators; mathematical functions and operators; string functions and operators; and window functions. Every table we create in DLA must belong to a parent database (schema), and the database name must be unique within each Alibaba Cloud region. Below is a sample table-creation query, which closely follows Hive syntax.
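To illustrate a few of these function families together, here is a hedged sketch over a hypothetical `sales` table (all column names are assumptions):

```sql
-- Aggregate, JSON, and window functions on a hypothetical sales table.
SELECT order_date,
       AVG(amount) OVER (PARTITION BY order_date)        AS daily_avg,    -- window + aggregate; NULLs ignored
       json_extract_scalar(payload, '$.channel')         AS channel,      -- JSON function
       RANK() OVER (ORDER BY amount DESC)                AS amount_rank   -- window function
FROM   sales;
```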

    CREATE EXTERNAL TABLE nation_text_string (
        N_NATIONKEY INT COMMENT 'column N_NATIONKEY',
        N_NAME STRING,
        N_REGIONKEY INT,
        N_COMMENT STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    STORED AS TEXTFILE LOCATION 'oss://your-bucket/path/to/nation_text';
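Once the external table is defined, it can be queried with ordinary SQL; for example, a simple filter on the region key:

```sql
-- Query the external table defined above; the data stays in OSS.
SELECT N_NAME, N_COMMENT
FROM   nation_text_string
WHERE  N_REGIONKEY = 1;
```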

Alibaba Cloud Data Lake Analytics is compatible with Hive's serialization/deserialization (SerDe) mechanism and supports data files in CSV, Parquet, ORC, RCFile, Avro, and JSON formats. Whenever we create a table from CSV, we need to choose the appropriate SerDe based on the contents of the CSV file.

For Example:

    CREATE EXTERNAL TABLE test_csv_opencsvserde (
        id STRING,
        name STRING,
        location STRING,
        create_date STRING,
        create_timestamp STRING,
        longitude STRING,
        latitude STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
        "separatorChar" = ",",
        "quoteChar"     = "\"",
        "escapeChar"    = "\\"
    )
    STORED AS TEXTFILE LOCATION 'oss://test-bucket-julian-1/test_csv_serde_1';
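Note that OpenCSVSerde exposes every column as STRING, which is why the table above declares even numeric fields that way; at query time, such fields typically need an explicit cast. A sketch:

```sql
-- OpenCSVSerde stores all columns as STRING, so cast numeric fields at query time.
SELECT id,
       name,
       CAST(longitude AS DOUBLE) AS lon,
       CAST(latitude  AS DOUBLE) AS lat
FROM   test_csv_opencsvserde;
```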