AGGREGATION IN PANDAS

In this tutorial, we will learn about aggregation in pandas by exploring aggregation functions such as min, max, sum, and mean.


Understanding Aggregation in Pandas

Pandas is a popular package for data analysis, partly because it integrates flexibly with other libraries. The aggregate function applies one or more operations along the rows or columns of a dataframe and reduces the data to summary values. Its syntax is:

df.aggregate(func, axis=0, *args, **kwargs)

Note: axis=0 (the default) applies the function down each column, giving one result per column, whereas axis=1 applies it across each row, giving one result per row.
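To make the axis argument concrete, here is a minimal sketch. The small dataframe called scores is hypothetical and exists only for this illustration; it is not the dataframe used in the rest of the tutorial.

```python
import pandas as pd

# Hypothetical frame, used only to illustrate the axis argument.
scores = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0 (default): the function runs down each column,
# so we get one result per column.
per_column = scores.agg('sum', axis=0)

# axis=1: the function runs across each row,
# so we get one result per row.
per_row = scores.agg('sum', axis=1)

print(per_column)
print(per_row)
```

Here per_column is indexed by the column labels (a and b), while per_row is indexed by the row labels (0, 1 and 2).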

Let’s create a dataframe that holds some numeric values, since most aggregation functions are meant for numeric rows or columns.

import pandas as pd

# Initialise the data as a dict of lists.
data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef'],
        'Salary': [30000, 40000, 25000, 32000],
        'Age': [25, 24, 27, 29]}

# Create DataFrame
df = pd.DataFrame(data, index=['Second','Fourth','Fifth','First'])

# Print the output.
print(df)




Let’s apply the aggregation function to our dataframe and find the min and max value of each column, including Salary and Age.

df.agg(['min','max'])

Output:

        Name    Occupation  Salary  Age
min      Ali         Actor   25000   24
max  Sanjeev  Entrepreneur   40000   29

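The returned data is confusing. The min and max of Salary and Age are calculated correctly, but for the Name and Occupation columns pandas simply returns the alphabetically first and last values. Each column is aggregated independently, so the Occupation in a row does not correspond to the Name in that row (Ali is a Chef, not an Actor). Mixing text and numeric columns in one aggregation is therefore misleading, so what alternatives do we have?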
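One option is to restrict the aggregation to the numeric columns, or to pass a dict that names the functions to run on each column. The sketch below rebuilds the same dataframe so that it runs on its own; the variable names numeric_summary and per_column_summary are chosen only for this example.

```python
import pandas as pd

data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef'],
        'Salary': [30000, 40000, 25000, 32000],
        'Age': [25, 24, 27, 29]}
df = pd.DataFrame(data, index=['Second', 'Fourth', 'Fifth', 'First'])

# Select only the numeric columns before aggregating,
# so the text columns never enter the result.
numeric_summary = df[['Salary', 'Age']].agg(['min', 'max'])
print(numeric_summary)

# Alternatively, pass a dict mapping each column to the functions it needs.
# Cells for functions not requested on a column are filled with NaN.
per_column_summary = df.agg({'Salary': ['min', 'max'], 'Age': ['mean']})
print(per_column_summary)
```

Both results contain only Salary and Age, so there are no Name and Occupation values to misread.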

We can also call the aggregation functions separately on just the labels we want. Let’s use some of the aggregate functions on a single label:

Aggregation in Pandas: Max Function

#using the max function on salary
df['Salary'].max()

Output:

40000

Aggregation in Pandas: Mean Function

#using the mean function on salary
df['Salary'].mean()

Output:

31750.0

Aggregation in Pandas: Median Function

#using the median function on salary
df['Salary'].median()

Output:

31000.0

Aggregation in Pandas: Sum Function

#using the sum function on salary
df['Salary'].sum()

Output:

127000

Aggregation in Pandas: Standard Deviation Function

#using the std (standard deviation) function on salary 
df['Salary'].std()

Output:

6238.322424070967
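The individual calls above can also be combined into a single agg call on the column. This is a minimal sketch: it rebuilds the Salary column as a standalone Series so that it runs on its own, and the names salary and salary_stats are chosen only for this example.

```python
import pandas as pd

# The Salary column from the tutorial's dataframe, rebuilt as a Series.
salary = pd.Series([30000, 40000, 25000, 32000],
                   index=['Second', 'Fourth', 'Fifth', 'First'],
                   name='Salary')

# Passing a list of function names returns one labelled value per function.
salary_stats = salary.agg(['max', 'mean', 'median', 'sum', 'std'])
print(salary_stats)
```

The result is a Series indexed by the function names, so a single statistic can be read with, for example, salary_stats['mean'].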

Aggregation in Pandas: Describe Function

#using the describe function on the whole dataframe (only numeric columns are summarised by default)
df.describe()

Output:

             Salary        Age
count      4.000000   4.000000
mean   31750.000000  26.250000
std     6238.322424   2.217356
min    25000.000000  24.000000
25%    28750.000000  24.750000
50%    31000.000000  26.000000
75%    34000.000000  27.500000
max    40000.000000  29.000000

This is one of the most important tutorials in this series, as it covers the basic aggregation functions you need when working with data, such as sum, max, min, mean, median, std, and describe (which also reports count). Another important way of condensing a dataframe into a summary is groupby, where you split the rows into groups based on one or more columns and apply aggregation functions to each group.
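As a brief preview of groupby, here is a minimal sketch. The Department column and its values are hypothetical and are not part of the dataframe used earlier; they are added only so that there is something to group on.

```python
import pandas as pd

# Hypothetical data: 'Department' does not appear in the tutorial's dataframe.
staff = pd.DataFrame({
    'Department': ['Kitchen', 'Clinic', 'Kitchen', 'Clinic'],
    'Salary': [30000, 40000, 25000, 32000],
})

# Split the rows by Department, then aggregate Salary within each group.
by_department = staff.groupby('Department')['Salary'].agg(['mean', 'max'])
print(by_department)
```

Each row of by_department corresponds to one department, and each column holds the result of one aggregation function.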