How To Give A Name To A Size Column In Python

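It is quite common to use size() in pandas, the Python data analysis library. On a DataFrame, size gives you the total number of elements (rows times columns). On grouped data, size() gives you the number of rows in each group.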
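A quick way to see the difference between the two is a small sketch on made-up data. The column names match the ones used later in this article, but the values are hypothetical. DataFrame.size is an attribute, while GroupBy.size() is a method.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame({
    "type_school": ["public", "public", "private", "public"],
    "interest": ["high", "low", "high", "high"],
})

# DataFrame.size is an attribute: number of rows * number of columns
print(df.size)  # 8

# GroupBy.size() is a method: number of rows in each group
group_sizes = df.groupby("type_school").size()
print(group_sizes)
```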

Now, if it is that easy and straightforward, why am I writing about it here?

Well, calculating the size with the size() function is very straightforward, but giving the resulting values a label can be a bit more complicated.

Let’s understand this with the help of an example.

Input Data

Here is how our sample data looks. It is stored as a CSV file.
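For readers who want to follow along without the original data.csv, here is a small made-up stand-in. Only the "type_school" and "interest" column names come from this article; the "grade" column and all values are hypothetical. Replacing io.StringIO(csv_text) with 'data.csv' reads a real file instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv (values are invented for illustration)
csv_text = """type_school,interest,grade
public,high,10
public,high,11
public,low,10
private,high,12
private,low,11
private,low,10
"""

# read_csv accepts any file-like object, so StringIO works like a file on disk
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```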

Scenario Explained

The idea is to group the data by two columns, "type_school" and "interest", and then show the number of rows in each group in a separate column.

Here is the sample code to achieve this.

import pandas as pd
df = pd.read_csv('data.csv')
data = df.groupby(['type_school','interest'])
data['size'] = data.size()
print(data)

The above code looks fine, but running it raises an error that says:

"TypeError: 'DataFrameGroupBy' object does not support item assignment"

Is there anything wrong with the above code? Any guesses?
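If you want to reproduce the error without a CSV file, a minimal sketch on made-up data looks like this. The names grouped and error_message are just illustrative.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame({
    "type_school": ["public", "public", "private"],
    "interest": ["high", "low", "high"],
})

grouped = df.groupby(["type_school", "interest"])

error_message = None
try:
    # grouped is a DataFrameGroupBy, not a DataFrame, so it has no item assignment
    grouped["size"] = grouped.size()
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'DataFrameGroupBy' object does not support item assignment
```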

Analysis

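The size() method of a DataFrameGroupBy object returns a Series containing the group sizes, indexed by the group keys. The DataFrameGroupBy object itself is not a DataFrame, so you cannot add a column to it with an assignment such as data['size'] = ...; that assignment is exactly what raises the TypeError.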

So, if you just want to see the group sizes, call size() on the grouped data and print the Series it returns:

import pandas as pd
df = pd.read_csv('data.csv')
# size() on the grouped data returns a Series of group sizes
data = df.groupby(['type_school','interest']).size()
print(data)

Running the above code gives you the output shown below:

At this point, the counts look correct; the only thing missing is a title for the count column.
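The reason the title is missing becomes clear when you inspect the object that size() returns. This sketch uses made-up data; it shows that the result is a Series whose index holds the group keys and whose name is None, so there is nothing to use as a column title.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame({
    "type_school": ["public", "public", "private", "public"],
    "interest": ["high", "low", "high", "high"],
})

sizes = df.groupby(["type_school", "interest"]).size()

print(type(sizes).__name__)     # Series
print(list(sizes.index.names))  # ['type_school', 'interest']
print(sizes.name)               # None, so the counts carry no label
```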

Giving Column Title

Because size() returns a Series, we can use its to_frame() method to turn it into a DataFrame, passing the desired column name as the parameter.

Below is the code to do this.

import pandas as pd
df = pd.read_csv('data.csv')
# to_frame('size') converts the Series into a DataFrame with a column named 'size'
data = df.groupby(['type_school','interest']).size().to_frame('size')
print(data)

On execution of the above lines of code, you will get the expected output as shown below.
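The same labeling can be checked on a small made-up DataFrame. This sketch shows to_frame('size') alongside one alternative, Series.reset_index(name='size'), which also names the counts but moves the group keys out of the index into ordinary columns. The variable names labeled and flat are just illustrative.

```python
import pandas as pd

# Hypothetical sample data, made up for illustration only
df = pd.DataFrame({
    "type_school": ["public", "public", "private", "public"],
    "interest": ["high", "low", "high", "high"],
})

keys = ["type_school", "interest"]

# to_frame(name): one-column DataFrame; the group keys stay in the MultiIndex
labeled = df.groupby(keys).size().to_frame("size")

# reset_index(name=...): names the counts and turns the keys into regular columns
flat = df.groupby(keys).size().reset_index(name="size")

print(labeled)
print(flat)
```

Which one to pick mostly depends on what comes next: to_frame() keeps the grouped index, which is handy for lookups by group, while reset_index() gives a flat table that is easier to export or merge. Recent pandas versions (1.1 and later, as far as I know) also accept groupby(keys, as_index=False).size(), which returns a flat DataFrame with a column already named "size".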

I hope you enjoyed labelling your columns :)

I hope you find this write-up useful. Do not forget to check out the recording of this article on my YouTube channel, Shweta Lodha.

