Pandas with D-Tale

Sameer Shukla

Introduction

D-Tale is a relatively new library for interactive data exploration in Pandas. It visualizes DataFrame and Series objects in an interactive graphical interface and integrates seamlessly with Pandas. The interface offers options such as custom filtering, showing and hiding columns, handling duplicates, and data summarization, and it can generate heat maps, charts, and more. The back end of D-Tale is built with Flask and its front end with React. This article explains how to use D-Tale with Pandas and walks through some of its features.

Installation

D-Tale can be installed with either pip or conda (from the conda-forge channel). I used the conda-forge command for installation:

conda install dtale -c conda-forge

Using pip (the leading ! runs the command from a notebook cell):

!pip install dtale

Data Setup

We will work with a Kaggle dataset of YouTube trending video statistics (https://www.kaggle.com/datasnaek/youtube-new). The file used is ‘USvideos.csv’.

import pandas as pd
df = pd.read_csv('USvideos.csv')
df.columns

The df.columns call lists the column names of the data set.

D-Tale Features

D-Tale shows the data the way Pandas does. The magic is in the menu at the top-left corner, which offers a variety of options for data exploration, clean-up, and analysis. The code below displays the data in tabular format in the default browser:

import dtale
import pandas as pd
df = pd.read_csv('USvideos.csv')
# Start a D-Tale session for the DataFrame
dtaleDf = dtale.show(df)
# Open the D-Tale grid in the default browser
dtaleDf.open_browser()

The button in the top-left corner opens the main menu. Clicking it displays the available options.

Clicking a column header opens a drop-down menu with actions such as sorting, hiding, renaming, and deleting the column.

Let’s explore some of the options, such as Duplicates, Describe, and Heat Map. Duplicates helps us handle duplicate data, for example by removing duplicate columns or showing duplicates, after we select the column on which the operation should be performed.
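For readers who want to cross-check what the Duplicates option reports, the same kind of check can be sketched in plain Pandas. This is only an approximation of the idea, not D-Tale's own implementation. The toy rows and the `video_id` column name are assumptions made up for illustration.

```python
import pandas as pd

# Toy data invented for illustration; it is not taken from USvideos.csv.
# The column name "video_id" is an assumption.
df = pd.DataFrame(
    {
        "video_id": ["a1", "a1", "b2", "c3"],
        "likes": [10, 10, 25, 7],
    }
)

# Flag rows that repeat an earlier row on the chosen column.
is_repeat = df.duplicated(subset=["video_id"], keep="first")

# Show every row that belongs to a duplicated group.
duplicate_rows = df[df.duplicated(subset=["video_id"], keep=False)]

# Drop the repeats, keeping the first occurrence of each video_id.
deduplicated = df.drop_duplicates(subset=["video_id"], keep="first")

print(duplicate_rows)
print(deduplicated)
```

Passing `keep=False` marks all members of a duplicated group, which is convenient for inspection, while `keep="first"` is the usual choice when actually removing rows.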

The most interesting option is Describe, which provides a statistical analysis of the selected column. Say we want to examine the ‘likes’ and ‘dislikes’ columns: all we need to do is select the column, and the rest is done for us.
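The statistics shown by Describe overlap with what Pandas' own `describe()` returns, so a rough code equivalent is easy to sketch. The values below are toy numbers invented for the example, not figures from the dataset, and D-Tale's panel may show more detail than this.

```python
import pandas as pd

# Toy values invented for illustration; not taken from USvideos.csv.
df = pd.DataFrame(
    {
        "likes": [10, 20, 30, 40],
        "dislikes": [1, 2, 3, 4],
    }
)

# Summary statistics (count, mean, std, min, quartiles, max) per column.
summary = df[["likes", "dislikes"]].describe()

print(summary)
```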

Correlation
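As a point of comparison for a correlation view, a pairwise correlation matrix can be computed directly in Pandas. This is a minimal sketch on invented toy data. The `views` column name is an assumption, and whether D-Tale computes its matrix in exactly this way is not confirmed here.

```python
import pandas as pd

# Toy values invented for illustration; not taken from USvideos.csv.
# The "views" column name is an assumption.
df = pd.DataFrame(
    {
        "views": [100, 250, 300, 480],
        "likes": [10, 20, 30, 40],
        "dislikes": [1, 2, 3, 5],
    }
)

# Pairwise Pearson correlation (the default method) between numeric columns.
corr = df[["views", "likes", "dislikes"]].corr()

print(corr)
```

The result is a symmetric matrix with 1.0 on the diagonal, which is the kind of data a heat map is drawn from.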

A GroupBy can also be performed with D-Tale, equivalent to something like:

df.groupby(['trending_date'])['likes'].sum()

The GroupBy option is found under “Summarize Data”. Let’s run the same aggregation using D-Tale.
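To make it easier to compare the D-Tale result with code, here is the Pandas line from above run on a small toy frame. The dates and like counts are placeholders invented for the example, not values from the dataset.

```python
import pandas as pd

# Toy rows invented for illustration; the date strings are placeholders.
df = pd.DataFrame(
    {
        "trending_date": ["2017-11-14", "2017-11-14", "2017-11-15"],
        "likes": [100, 50, 30],
    }
)

# Total likes per trending date, as in the article's groupby line.
likes_per_day = df.groupby(["trending_date"])["likes"].sum()

print(likes_per_day)
```

The result is a Series indexed by `trending_date`. Calling `reset_index()` on it gives a two-column DataFrame, which is closer to a tabular view.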

Summary

This article gave a brief introduction to working with D-Tale and some insight into the options it provides. The library is still evolving, but the features it already provides and the way it works are impressive.