AGGREGATION IN PANDAS

In this tutorial, we will learn about aggregation in pandas by exploring aggregation functions such as min, max, sum, and mean.

Understanding Aggregation in Pandas

As we know, pandas is a great package for data analysis because it integrates flexibly with other libraries. The aggregate function applies one or more operations along the rows or columns of a DataFrame to summarize the data. Its syntax is:

df.aggregate(func, axis=0, *args, **kwargs)

Note: axis=0 (the default) applies the function to each column, moving down the index, whereas axis=1 applies it to each row, moving across the columns.
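To make the axis argument concrete, here is a minimal sketch. The scores frame and its column and row labels are hypothetical and used only for illustration; they are not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical numeric-only frame, used just to illustrate the axis argument.
scores = pd.DataFrame({'math': [80, 90], 'art': [70, 60]}, index=['r1', 'r2'])

# axis=0 (default): the function runs down each column, one result per column.
per_column = scores.agg('sum', axis=0)

# axis=1: the function runs across each row, one result per row.
per_row = scores.agg('sum', axis=1)

print(per_column)
print(per_row)
```

Here per_column is indexed by the column names, while per_row is indexed by the row labels.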

Let’s create a DataFrame that holds some numeric values, since aggregation is mostly applied to numeric rows or columns:

import pandas as pd

# Initialise the data as a dict of lists.
data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef'],
        'Salary': [30000, 40000, 25000, 32000],
        'Age': [25, 24, 27, 29]}

# Create DataFrame
df = pd.DataFrame(data, index=['Second','Fourth','Fifth','First'])

# Print the output.
print(df)

Let’s apply the aggregation function to our DataFrame. We will find the min and max values of Salary and Age by aggregating over the columns.

df.agg(['min','max'])

Output:

        Name    Occupation  Salary  Age
min      Ali         Actor   25000   24
max  Sanjeev  Entrepreneur   40000   29

The result is somewhat confusing. It did calculate the min and max of Salary and Age, but for the text columns it returned the alphabetical min and max of each column independently, so the Occupation values do not correspond to the Name values beside them (Ali is a Chef, not an Actor). Mixing these columns in one aggregation is therefore misleading, so what alternative do we have?
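One possible way around the mix-up, sketched here under the assumption that only the numeric columns are of interest, is to restrict the aggregation to those columns, or to pass a dict that maps each column to its own functions. The variable names numeric_summary and per_column_summary are illustrative only.

```python
import pandas as pd

# Same data as in the tutorial, rebuilt so this snippet runs on its own.
data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef'],
        'Salary': [30000, 40000, 25000, 32000],
        'Age': [25, 24, 27, 29]}
df = pd.DataFrame(data, index=['Second', 'Fourth', 'Fifth', 'First'])

# Option 1: select only the numeric columns before aggregating.
numeric_summary = df[['Salary', 'Age']].agg(['min', 'max'])

# Option 2: map each column to the functions that make sense for it.
# Cells with no requested function for that column are filled with NaN.
per_column_summary = df.agg({'Salary': ['min', 'max'], 'Age': ['mean']})

print(numeric_summary)
print(per_column_summary)
```

Either form keeps the text columns out of the result, so no unrelated Name and Occupation values end up side by side.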

We can also apply the aggregation functions separately to the labels we want. Let’s use some of the aggregate functions on a certain label: