Creating DataFrames in pandas

CREATING DATAFRAMES

 
In my previous article, I introduced you to pandas, and we learned what a DataFrame and a Series are.

Now let's dig deeper. This is what a DataFrame (a two-dimensional, labeled data structure) looks like:

Report of Student A

             2018  2019  2020
    English    85    60    90
    Math       73    80    64
    Science    80    58    74
    French     64    96    87

We will use the above example throughout the article.
 
DataFrames like this can be created in several ways, for example from a dictionary or a NumPy array. In this article we will learn to create DataFrames from a:
  1. 2-D Dictionary
  2. 2-D NumPy Array
  3. Series Type Object
  4. DataFrame Object

Creating DataFrame from 2-D Dictionary

 
A dictionary is a mutable collection of key: value pairs. Refer to my article about Dictionaries in Python.

a) 2-D dictionary whose values are lists/ndarrays

(To understand lists, refer to my article.)

Let us take the above example, "Report of Student A", and try to create that DataFrame.
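    import pandas as pd

    # Each key becomes a column; each list holds that column's values.
    # The variable is named marks rather than dict so that the built-in dict type is not shadowed.
    marks = {'2018': [85, 73, 80, 64], '2019': [60, 80, 58, 96], '2020': [90, 64, 74, 87]}

    df = pd.DataFrame(marks)
    print(df)

OUTPUT

       2018  2019  2020
    0    85    60    90
    1    73    80    64
    2    80    58    74
    3    64    96    87
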
The row indexes here are 0, 1, 2, … because we did not specify an index, so pandas assigned the default one. To set our own index and create exactly the DataFrame shown at the start of the article, we use:

    pd.DataFrame(dictionary_object, index=['mention', 'index', 'you', 'want'])
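    import pandas as pd

    marks = {'2018': [85, 73, 80, 64], '2019': [60, 80, 58, 96], '2020': [90, 64, 74, 87]}

    # index supplies one row label per list element
    df = pd.DataFrame(marks, index=['English', 'Math', 'Science', 'French'])
    print(df)

OUTPUT

             2018  2019  2020
    English    85    60    90
    Math       73    80    64
    Science    80    58    74
    French     64    96    87

NOTE
The number of index labels must equal the number of rows; otherwise pandas raises an error. For example, if we use:

    df = pd.DataFrame(marks, index=['English', 'Math', 'Science'])

we get a ValueError, because only three index labels were given for four rows. (The exact wording of the message depends on the pandas version.)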
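
The length check described in the note above can be tried out with a small sketch. The names `marks` and `error_message` are only illustrative, and the text of the message is printed rather than assumed, since it differs between pandas versions.

```python
import pandas as pd

# Four marks per year, so the DataFrame needs four row labels.
marks = {'2018': [85, 73, 80, 64],
         '2019': [60, 80, 58, 96],
         '2020': [90, 64, 74, 87]}

error_message = None
try:
    # Only three labels for four rows: pandas refuses to build the DataFrame.
    pd.DataFrame(marks, index=['English', 'Math', 'Science'])
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Catching the exception like this is mainly useful for experimenting; in real code it is usually simpler to make sure the index list has one label per row.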
 
b) 2-D dictionary whose values are dictionaries

In this case we use a nested dictionary.

    import pandas as pd

    # Outer keys are the years; inner keys are the subjects.
    report = {'2018': {'English': 85, 'Math': 73, 'Science': 80, 'French': 64},
              '2019': {'English': 60, 'Math': 80, 'Science': 58, 'French': 96},
              '2020': {'English': 90, 'Math': 64, 'Science': 74, 'French': 87}}

    df = pd.DataFrame(report)
    print(df)

There is another way to build the nested dictionary: create the inner dictionaries first, then combine them.

    import pandas as pd

    year2018 = {'English': 85, 'Math': 73, 'Science': 80, 'French': 64}
    year2019 = {'English': 60, 'Math': 80, 'Science': 58, 'French': 96}
    year2020 = {'English': 90, 'Math': 64, 'Science': 74, 'French': 87}

    report = {'2018': year2018, '2019': year2019, '2020': year2020}
    df = pd.DataFrame(report)
    print(df)

NOTE
With a nested dictionary, it is easy to confuse the inner and outer keys. The rule is: the outer dictionary keys become the columns, and the inner dictionary keys become the rows.

So, in the above example, '2018', '2019', and '2020' are the outer dictionary keys and therefore the columns, while 'English', 'Math', 'Science', and 'French' are the inner dictionary keys and therefore the rows.
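
One way to check this rule is to inspect the labels of the resulting DataFrame. The sketch below reuses the `report` dictionary from the example; the name `by_year` is only illustrative.

```python
import pandas as pd

report = {'2018': {'English': 85, 'Math': 73, 'Science': 80, 'French': 64},
          '2019': {'English': 60, 'Math': 80, 'Science': 58, 'French': 96},
          '2020': {'English': 90, 'Math': 64, 'Science': 74, 'French': 87}}

df = pd.DataFrame(report)

print(list(df.columns))  # the outer keys (years)
print(list(df.index))    # the inner keys (subjects)

# A single mark is addressed by row label (inner key), then column label (outer key).
print(df.loc['Math', '2019'])

# If the years are wanted as rows instead, the DataFrame can be transposed.
by_year = df.T
print(by_year)
```

Transposing is just one option for swapping rows and columns; it is shown here only to make the inner/outer relationship visible.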
 

Creating DataFrame from 2-D ndarray (NumPy Array)


(To know more about NumPy, refer to my article about NumPy Array.)
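    import pandas as pd
    import numpy as np

    # Each tuple is one row (a subject); each position is one column (a year).
    # The values match the "Report of Student A" used in the other examples.
    arr = np.array([(85, 60, 90), (73, 80, 64), (80, 58, 74), (64, 96, 87)])
    df = pd.DataFrame(arr)
    print(df)

OUTPUT

        0   1   2
    0  85  60  90
    1  73  80  64
    2  80  58  74
    3  64  96  87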
 
By default, both the row index and the column labels are set to 0, 1, 2, 3, …, depending on the number of rows and columns. To name them, we can use:

    df = pd.DataFrame(arr, columns=['2018', '2019', '2020'], index=['English', 'Math', 'Science', 'French'])
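
Put together as a runnable sketch, using the marks from the "Report of Student A", the labeled version looks like this. The number of column labels has to match the number of array columns, and the number of index labels the number of array rows, which `arr.shape` makes easy to check.

```python
import numpy as np
import pandas as pd

# Each inner tuple is one subject (row); each position is one year (column).
arr = np.array([(85, 60, 90), (73, 80, 64), (80, 58, 74), (64, 96, 87)])

# (rows, columns): 4 index labels and 3 column labels are needed.
print(arr.shape)

df = pd.DataFrame(arr,
                  columns=['2018', '2019', '2020'],
                  index=['English', 'Math', 'Science', 'French'])
print(df)
```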
 

Creating DataFrame from Series Object

 
(To know more about Series, refer to my article.)

To create a DataFrame from a Series object, we go through two steps:
 
a) First, we create a Series.

    import pandas as pd

    student = pd.Series(['A', 'B', 'C'])
    print(student)

OUTPUT

    0    A
    1    B
    2    C
    dtype: object

(The dtype line may read differently depending on the pandas version.)
 
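b) Then we put the Series into a dictionary, whose key becomes the column name, and pass that dictionary to DataFrame. Here the Series is also given its own index, 1 to 3, which becomes the row index of the DataFrame.

    import pandas as pd

    stud = pd.Series(['A', 'B', 'C'], index=[1, 2, 3])
    data = {'Student': stud}  # key = column name, value = the Series

    df = pd.DataFrame(data)
    print(df)

OUTPUT

      Student
    1       A
    2       B
    3       C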
 
Now, if we want to create the DataFrame from the first example:
  • First, we create the Series. We need 3 columns, so we create 3 Series, each with the subjects as its index.
  • Then we put the Series into a dictionary with the column titles 2018, 2019, and 2020 as keys.

    import pandas as pd

    subjects = ['English', 'Math', 'Science', 'French']
    year1 = pd.Series([85, 73, 80, 64], index=subjects)
    year2 = pd.Series([60, 80, 58, 96], index=subjects)
    year3 = pd.Series([90, 64, 74, 87], index=subjects)

    # Keys become column names; each Series' index supplies the row labels.
    data = {'2018': year1, '2019': year2, '2020': year3}

    df = pd.DataFrame(data)
    print(df)

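
When a DataFrame is built from several Series, pandas lines the values up by index label, not by position. The sketch below illustrates this with a second Series that lists the subjects in a different order and leaves out French; the `year2` data here is hypothetical and used only to show the alignment.

```python
import pandas as pd

year1 = pd.Series([85, 73, 80, 64],
                  index=['English', 'Math', 'Science', 'French'])

# Hypothetical data: same subjects in a different order, with French missing.
year2 = pd.Series([58, 60, 80], index=['Science', 'English', 'Math'])

df = pd.DataFrame({'2018': year1, '2019': year2})
print(df)

# Values are matched by label, so Math in 2019 is 80 regardless of position.
print(df.loc['Math', '2019'])

# A label missing from one Series shows up as NaN in that column.
print(pd.isna(df.loc['French', '2019']))
```

This is why giving every Series the same index, as in the example above, produces a complete table with no missing values.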

Creating DataFrame from Another DataFrame

 
We do not have to repeat the whole procedure to create a second DataFrame. We can simply pass an existing DataFrame to the DataFrame constructor. This is helpful when we want two similar DataFrames.

    import pandas as pd

    subjects = ['English', 'Math', 'Science', 'French']
    year1 = pd.Series([85, 73, 80, 64], index=subjects)
    year2 = pd.Series([60, 80, 58, 96], index=subjects)
    year3 = pd.Series([90, 64, 74, 87], index=subjects)

    data = {'2018': year1, '2019': year2, '2020': year3}

    df = pd.DataFrame(data)
    print(df)

    # Build a second DataFrame directly from the first one.
    df2 = pd.DataFrame(df)
    print(df2)

OUTPUT

Both print calls display the same table:

             2018  2019  2020
    English    85    60    90
    Math       73    80    64
    Science    80    58    74
    French     64    96    87
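
One point worth knowing: according to the pandas documentation, the constructor's `copy` parameter defaults to not copying when the input is already a DataFrame, so `pd.DataFrame(df)` is not guaranteed to give independent data. If the second DataFrame is going to be modified separately, `df.copy()` states that intent explicitly. A minimal sketch, using a shortened two-subject version of the report for brevity:

```python
import pandas as pd

# Shortened version of the report, for illustration only.
df = pd.DataFrame({'2018': [85, 73], '2019': [60, 80]},
                  index=['English', 'Math'])

# An explicit, independent copy of the original DataFrame.
df2 = df.copy()

# Changing the copy leaves the original untouched.
df2.loc['English', '2018'] = 0

print(df.loc['English', '2018'])
print(df2.loc['English', '2018'])
```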

Summary

 
In this article, we learned how to create a DataFrame using different techniques. You now know the basics of pandas, Series, and DataFrame.

In my next article, we will learn about "DataFrame Attributes". Until then, practice by creating different DataFrames using the different techniques.
 
Feedback or queries related to this article are most welcome.
 
Thanks for reading.

