Deciphering Data To Uncover Hidden Insights - Understanding The Data

"The best vision is insight." – Malcolm Forbes. When it comes to data analytics for enterprises, nothing is more important than making accurate and reliable inferences from data. It is no surprise that enterprises are investing heavily in big data analytics, as they can reap larger profits with accurate insights. However, this is often easier said than done. Data collected from real-world applications is affected by many variables, which makes prediction challenging. Regardless, data analytics remains essential for many, if not all, businesses around the world.
 
In this article, I will walk you through the process of deciphering data to uncover the hidden insights in it.
 

Is this article series for me?

 
This article is meant for everyone! This includes students who want to familiarize themselves with the general concepts, professional data analysts who want to learn new ways to analyze data, and business decision-makers who want to get better insights from business data.
 

Prerequisites

 
This article covers the overall process of deciphering data from conceptual, practical, and best practice perspectives. Anyone with valid data can use this article as a guide to get insights from data with the help of open-source technologies. However, if you are doing data analytics for business intelligence, I strongly recommend using Alibaba Cloud Quick BI.
 
To use Alibaba Cloud Quick BI, you need to do the following:
  1. Create an Alibaba Cloud account.
  2. Add a valid payment method to your account.
  3. Enroll in the free trial of Quick BI Pro from your console.

Overview of the Article

 
For this article, we are going to look at:
  1. Domain - BFSI (Banking, Financial Services, and Insurance)
  2. Modules - From Understanding Data to Visual Stories
  3. Use cases - ATM Analytics, Customer 360
We will be covering the entire process of deciphering data. The overall process involves:
  1. Understanding the data
  2. Wrangling the data according to your business scenario (if needed)
  3. Ingesting the data
  4. Modeling the data
  5. Visualizing the data
This multi-part article covers how to understand, wrangle, ingest, model, and visualize the data from three viewpoints (conceptual, practical, and best practice).
 
In this first article of the series, we are going to see how to understand the data better.
 

Understanding the Data (Conceptual)

 
When it comes to big data, more data isn't necessarily better. Your data is only as good as your ability to understand and communicate it, which is why understanding the data is so essential.
 
Once you've got your data, you need to consider the following questions:
  1. What do you do with it?
  2. What should you look for?
  3. Which tools should you use?
You will need to address these questions for your data analysis to be effective. We will provide some generalized answers to them in this article.
 

What Do You Do with It?

 
We should analyze the data to understand the domain it belongs to. With the domain in mind, we can ask the right questions of the data to get insights out of it. For example, if the data shows ATM location details, transaction types, numbers of transactions, and transaction amounts, it clearly belongs to the BFSI domain.
 
After we determine the domain, we need to decide what types of insights we can infer from the given data. We will do this in the practical section.
 

What Should You Look For?

 
We should look for "interesting" insights. As discussed earlier, we need to ask the right questions of the data to understand it better and decipher insights.
 
For example, let's assume you have some understanding of the BFSI domain. Then, you should be able to differentiate the Facts (Measures) from the Dimensions (everything other than Measures) to get a clear idea about the data. Facts are the numeric values you aggregate, such as counts and amounts; dimensions are the attributes you group or filter them by.
 
We then need to understand which facts and dimensions are available and which questions are the right ones to ask of the data. We will do this in the practical section.
 

What Tools Should You Use?

 
We need to choose the right tool to wrangle, process, and visualize the data effectively. There are a lot of tools available in the market, each with its own strengths.
 
When deploying on the cloud, I prefer using Alibaba Cloud Quick BI, which handles the majority of the required tasks with ease and at an affordable price.
  1. Quick BI allows you to perform data analytics, exploration, and reporting on massive data sets with drag-and-drop features and a rich variety of visuals.
  2. Quick BI empowers enterprise users to view and explore data and make informed, data-driven decisions.
In this article, we are going to use Alibaba Cloud Quick BI as the tool to decipher the data and get insights out of it. We will explore how to do this in the practical section.
 

Understanding the Data (Practical)

 
As we discussed earlier, we are going to understand the data better with real use cases.
 
Use Case 1 - ATM Analytics
 
Here, we will use the data from the ATM dataset.
 
 

What Do You Do with It?

 
As mentioned previously, we know that this data belongs to the BFSI domain. Specifically, this data describes ATM transactions. Before digging deeper, we need to understand the domain basics and how business users will look at the data. Then we can proceed to the next question.
 

What Should You Look For?

 
As we discussed earlier, we need to ask the right questions to understand the data better. We need to differentiate the Facts (Measures) and Dimensions (Other than the Measures).
 
The facts include:
  1. no_of_withdrawals
  2. no_of_cub_card_withdrawals
  3. no_of_other_card_withdrawals
  4. total_amount_withdrawn
  5. amount_withdrawn_cub_card
  6. amount_withdrawn_other_card
The dimensions include:
  1. atm_name
  2. weekday
  3. festival_religion
  4. working_day
  5. holiday_sequence
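
The split above can also be approximated programmatically. The following is a minimal Python sketch, not part of the Quick BI workflow: it treats numeric columns as candidate facts and everything else as dimensions. The sample row values, the function name `split_facts_and_dimensions`, and the `force_dimensions` override are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical sample row: the column names come from the lists above,
# but every value here is invented purely for illustration.
sample_row = {
    "atm_name": "ATM A",
    "weekday": "MONDAY",
    "festival_religion": "NONE",
    "working_day": "W",
    "holiday_sequence": "WWW",
    "no_of_withdrawals": 50,
    "no_of_cub_card_withdrawals": 30,
    "no_of_other_card_withdrawals": 20,
    "total_amount_withdrawn": 100000,
    "amount_withdrawn_cub_card": 60000,
    "amount_withdrawn_other_card": 40000,
}


def split_facts_and_dimensions(
    row: dict, force_dimensions: Iterable[str] = ()
) -> Tuple[List[str], List[str]]:
    """Guess facts (numeric columns) and dimensions (all other columns)."""
    forced = set(force_dimensions)
    facts: List[str] = []
    dimensions: List[str] = []
    for column, value in row.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_numeric and column not in forced:
            facts.append(column)
        else:
            dimensions.append(column)
    return facts, dimensions


facts, dimensions = split_facts_and_dimensions(sample_row)
print("Facts:", facts)
print("Dimensions:", dimensions)
```

A type-based guess is only a starting point. A numeric column such as a customer's age is usually grouped by rather than summed, so the `force_dimensions` argument lets you move such columns to the dimensions by hand.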
After separating the facts and dimensions, we can now ask questions about the data. Questions may include:
  1. Total number of transactions
  2. Total transaction amount
  3. Top 5 ATMs by transaction volume
  4. Top 5 ATMs by the transaction amount
  5. Lowest 5 ATMs by transaction volume
  6. Lowest 5 ATMs by the transaction amount
  7. Number of different transactions by ATM
These questions are key to deriving insights from the data. Without the right questions, we can't derive the value we need from the data.
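
To make these questions concrete, here is a small Python sketch of how the totals and the top and lowest ATM rankings could be computed outside of a BI tool. The rows are hypothetical values invented for illustration, not records from the ATM dataset, and the helper names `total` and `rank_atms` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical rows using column names from the ATM use case.
# The values are made up and do not come from the real dataset.
rows = [
    {"atm_name": "ATM A", "no_of_withdrawals": 120, "total_amount_withdrawn": 600000},
    {"atm_name": "ATM B", "no_of_withdrawals": 80, "total_amount_withdrawn": 450000},
    {"atm_name": "ATM A", "no_of_withdrawals": 100, "total_amount_withdrawn": 500000},
    {"atm_name": "ATM C", "no_of_withdrawals": 40, "total_amount_withdrawn": 700000},
]


def total(rows: List[dict], fact: str) -> int:
    """Sum one fact over all rows (for example, total number of transactions)."""
    return sum(row[fact] for row in rows)


def rank_atms(
    rows: List[dict], fact: str, n: int = 5, lowest: bool = False
) -> List[Tuple[str, int]]:
    """Aggregate a fact per ATM and return the top (or lowest) n ATMs."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["atm_name"]] += row[fact]
    # Descending order for "top", ascending order for "lowest".
    ordered = sorted(totals.items(), key=lambda item: item[1], reverse=not lowest)
    return ordered[:n]


print("Total transactions:", total(rows, "no_of_withdrawals"))
print("Total amount:", total(rows, "total_amount_withdrawn"))
print("Top 2 by volume:", rank_atms(rows, "no_of_withdrawals", n=2))
print("Lowest 2 by amount:", rank_atms(rows, "total_amount_withdrawn", n=2, lowest=True))
```

Each question maps to one fact aggregated over one dimension, which is why separating facts from dimensions first makes the questions easy to express.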
 
Use Case 2 - Customer 360
 
Here, we will use the data from Customer360.
 
 

What Do You Do with It?

 
Like the previous use case, we know the data belongs to the BFSI domain, specifically to bank customer details. Before digging deeper, we need to understand the domain basics and how business users will look at the data. Then we can proceed to the next question.
 

What Should You Look For?

 
Similarly, we need to differentiate the Facts (Measures) and Dimensions (Other than the Measures).
 
The facts are:
  1. Balance
  2. Duration
  3. Campaign
  4. Pdays
  5. Previous
The dimensions are:
  1. Age
  2. Job
  3. Marital status
  4. Education
  5. Default
  6. Housing
  7. Loan
  8. Contact
  9. Day
  10. Month
  11. Poutcome
  12. Deposit
After separating the facts and dimensions, we can ask questions such as:
  1. Balance by job
  2. Balance by marital status
  3. Loan by age
  4. Loan by job
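
As a rough illustration, questions like these boil down to grouping rows by a dimension and summarizing a fact or a flag. The Python sketch below assumes that "Balance by job" means the average balance per job and that "Loan by job" means the number of customers with a loan per job; both interpretations, the `yes`/`no` encoding of the Loan column, the helper names, and the sample customers are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical customers: invented values, not taken from the Customer360 data.
customers = [
    {"Job": "admin", "Marital": "married", "Balance": 1000, "Loan": "yes"},
    {"Job": "admin", "Marital": "single", "Balance": 3000, "Loan": "no"},
    {"Job": "technician", "Marital": "married", "Balance": 500, "Loan": "yes"},
    {"Job": "technician", "Marital": "single", "Balance": 1500, "Loan": "yes"},
]


def average_by(rows: List[dict], dimension: str, fact: str) -> Dict[str, float]:
    """Average a fact for each distinct value of a dimension."""
    groups: Dict[str, list] = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[fact])
    return {key: mean(values) for key, values in groups.items()}


def count_by(rows: List[dict], dimension: str, flag: str, value: str = "yes") -> Dict[str, int]:
    """Count rows per dimension value where the flag column equals `value`."""
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        if row[flag] == value:
            counts[row[dimension]] += 1
    return dict(counts)


print("Balance by job:", average_by(customers, "Job", "Balance"))
print("Balance by marital status:", average_by(customers, "Marital", "Balance"))
print("Loan by job:", count_by(customers, "Job", "Loan"))
```

Whether the average, the sum, or the median is the right summary depends on the business question, so treat the choice of `mean` here as a placeholder.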
These questions are key to deriving insights from the data. Let's now look at the best practices for understanding data.
 

Understanding the Data (Best Practices)

 
Here are some of the best practices when trying to make sense out of data, particularly data relating to the two use cases above.
  • Determine the appropriate domain and understand the domain basics.
  • Always ask the right questions about the data

    • Which ATMs fall under the Transaction Volume Benchmark?
    • Which ATMs fall under Transaction Amount Benchmark?
    • Which ATMs fall under Hit Rate Benchmark?
    • Which ATMs perform well irrespective of External Influences?
    • Top Violators
    • Income or Profitability of ATMs

  • Have a clear understanding of Facts and Dimensions.
  • Name the columns meaningfully.
    • "Job" as "Job Category"
    • "Marital" as "Marital Status"
    • "pdays" as "Previous Days"
    • "poutcome" as "Previous Outcome"
  • Name the columns in sentence case and always use spaces instead of underscores
    • "Job_Category" as "Job Category"
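
The naming practices above can be sketched as a small Python helper. This is only one possible approach: the `FRIENDLY_NAMES` mapping reuses the renames listed above, while the function name `display_name` and the fallback rule (replace underscores with spaces and capitalize each word, as in the "Job Category" example) are assumptions.

```python
# Explicit renames for terse or abbreviated column names, taken from the
# examples above. Keys are lowercase so the lookup is case-insensitive.
FRIENDLY_NAMES = {
    "job": "Job Category",
    "marital": "Marital Status",
    "pdays": "Previous Days",
    "poutcome": "Previous Outcome",
}


def display_name(column: str) -> str:
    """Return a reader-friendly label for a raw column name."""
    key = column.strip().lower()
    if key in FRIENDLY_NAMES:
        return FRIENDLY_NAMES[key]
    # Fallback: underscores become spaces and each word is capitalized.
    words = key.replace("_", " ").split()
    return " ".join(word.capitalize() for word in words)


for raw in ["Job", "pdays", "Job_Category", "total_amount_withdrawn"]:
    print(raw, "->", display_name(raw))
```

The fallback cannot expand abbreviations or preserve acronyms on its own (for instance, it would turn `atm_name` into "Atm Name"), so such columns need their own entries in the mapping.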

Summary

 
I hope that this article gives you a better grasp of the basic principles of data analytics, specifically of understanding your data. In the next article of this series, we will explore how to wrangle the data. Please make sure that you have registered on Alibaba Cloud, because we will be using Quick BI in the other articles in this series. Stay tuned.
 
"Torture the data, and it will confess to anything." – Ronald Coase