MERGE IN PANDAS

In this tutorial, we are going to learn about the merge function in pandas and how we can join or merge two dataframes with the commonalities of data using inner, outer, left and right join.





Merge in Pandas

What is Merge in Pandas?

We have been working with 2-D data which is rows and columns in Pandas. In order to go on a higher understanding of what we can do with dataframes that are mostly identical and somehow would join them in order to merge the common values. We use a function called merge() in pandas that takes the commonalities of two dataframes just like we do in SQL. The Syntax for merge in pandas is:

DataFrame.merge(self, right, how=’inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=(‘_x’, ‘_y’), copy=True, indicator=False, validate=None)





Inner Join

Let’s merge two dataframes which will have common indexes and see different ways to merge their values:

import pandas as pd
df1 = pd.DataFrame({'Country':["India", "USA", "Canada","Pakistan"],
'Population':[1352642280, 329968629, 35151728, 212742631]})

df2 = pd.DataFrame({'Country':["India", "USA", "Brazil","Bangladesh"],
'Area/sqkm2':[3287263,9834000, 8511000, 147570]})

df3 = pd.merge(df1, df2, on='Country')
df3

Output:

inner

Above you can see that a simple merge has been displayed which displays the common values present in both the data frames, if you want to visually see this, then here is a quick at what exactly happened.

inner 2

Courtesy: w3schools.com

By default, merge creates the inner join which only takes values that are common, in our case India and USA were present in both the dataframes, so the merge function only printed values that were common in both the dataframes while ignoring other countries. However, we can create, left join, right join and outer join as well by predefining the ‘how’ attribute.





Left Merge in Pandas

As the name suggests the left join will only display values that are common between two dataframes as well as present fully in the left or first dataframe in our case.

left join

Courtesy: w3schools.com

df3 = pd.merge(df1, df2, on='Country', how='left')
df3

Output:

left join 2

You can see that the values that aren’t present or missing during the merge process have been replaced by NaN term.

Right Merge in Pandas

The left join will only display values that are common between two dataframes as well as present fully in the right dataframe in our case.

right join

Courtesy: w3schools.com

df3 = pd.merge(df1, df2, on='Country', how='right')
df3

merge-right





Outer Join

The outer join will merge all the values together of both the dataframes

Full Outer Join

Courtesy: w3schools.com

df3 = pd.merge(df1, df2, on='Country', how='outer')
df3

Output:

Full Outer Join 2