DATAFRAME IN PANDAS

In this tutorial, we will learn about the DataFrame, the most commonly used data structure in Pandas. We will also discuss how to create, index, and modify a DataFrame.

What is a DataFrame in Pandas?

As we learned earlier, a Series is a one-dimensional data structure. A DataFrame, by contrast, is a two-dimensional data structure with labeled axes (rows and columns). Whenever we work with DataFrames, we keep three things in mind:

  • Data to populate the dataframe
  • Rows
  • Columns
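The short sketch below passes all three ingredients to the constructor at once: the data, the row labels (`index`) and the column labels (`columns`). The numbers and labels are made up purely for illustration. It also confirms that the result has two axes and that each column is itself a Series.

```python
import pandas as pd

# Data: a list of rows; index: row labels; columns: column labels
df = pd.DataFrame([[1, 2], [3, 4]], index=['r1', 'r2'], columns=['a', 'b'])

print(df.ndim)            # 2 -> two labeled axes
print(df.shape)           # (2, 2) -> 2 rows, 2 columns
print(list(df.index))     # ['r1', 'r2']
print(list(df.columns))   # ['a', 'b']
print(type(df['a']))      # each column is a pandas Series
```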

Creating a DataFrame in Pandas

A Pandas DataFrame can be created from arrays, lists, and dictionaries, or loaded from external storage such as an SQL database, a CSV file, or an Excel sheet. There are therefore several ways to create a DataFrame, and we will look at a few of them to understand DataFrames better. In each case we call the DataFrame constructor and pass the data inside the parentheses. The basic syntax is:

df = pd.DataFrame(data)
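As a rough sketch of two of the other sources mentioned above, the example below builds the same small table from a 2-D NumPy array and from CSV text. The CSV is held in an `io.StringIO` object only so the example is self-contained; with a real file you would pass its path to `pd.read_csv` instead. `pd.read_sql` and `pd.read_excel` follow the same pattern but need a database connection or an Excel reader package, so they are left out here. The values and column names are illustrative.

```python
import io

import numpy as np
import pandas as pd

# From a 2-D NumPy array: each inner row becomes a DataFrame row
arr = np.array([[1, 2], [3, 4]])
df_from_array = pd.DataFrame(arr, columns=['x', 'y'])

# From CSV text; io.StringIO stands in for a real file path here
csv_text = "x,y\n1,2\n3,4\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_array)
print(df_from_csv)
```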

Creating a DataFrame in Pandas via List

A DataFrame can be created from a list or from a list of lists. For example:

import pandas as pd

# Creating a list
list_1 = ['banana', 'apple', 'orange', 'pear', 'avocado']

# Creating a DataFrame with a single column labeled 'Fruits'
df = pd.DataFrame(list_1, columns=['Fruits'])

# Printing the output
print(df)

Output:

  Fruits
0 banana
1 apple
2 orange
3 pear
4 avocado

The columns parameter lets you set your own column label. Without it, Pandas labels the single column 0.
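For the list-of-lists case, each inner list becomes one row. The sketch below assumes made-up colour values purely for illustration, and also shows the default integer column labels you get when `columns` is omitted.

```python
import pandas as pd

# Each inner list is one row; 'columns' labels the fields
rows = [['banana', 'yellow'], ['apple', 'red'], ['pear', 'green']]
df = pd.DataFrame(rows, columns=['Fruits', 'Colour'])
print(df)

# Without 'columns', Pandas falls back to integer labels
print(list(pd.DataFrame(rows).columns))  # [0, 1]
```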

Creating a DataFrame in Pandas via Dictionary

A DataFrame can also be created from a dictionary of lists, where the keys become the column names and each list holds the values of that column, as you can see in the example below:

import pandas as pd

# Create a dictionary: keys become column names, lists become column values
data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef']}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
print(df)

Output:

       Name           Occupation
0      Hira           Entrepreneur
1      Sanjeev        Doctor
2      Rahul          Actor
3      Ali            Chef

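Note: Indexing and slicing work in a DataFrame much as they do in a Series, with one difference to keep in mind: square brackets with a single label, such as df['Name'], select a column, while a slice such as df[0:3] selects rows.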
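A quick way to see that difference is to compare what each form of square-bracket access returns. The sketch below reuses the dictionary from the previous example.

```python
import pandas as pd

data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef']}
df = pd.DataFrame(data)

names = df['Name']                   # one label -> a single column (Series)
subset = df[['Name', 'Occupation']]  # list of labels -> DataFrame of those columns
middle = df[1:3]                     # slice -> rows at positions 1 and 2

print(names.tolist())  # ['Hira', 'Sanjeev', 'Rahul', 'Ali']
print(middle)
```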

Indexing DataFrame in Pandas

As with a Series, the index defaults to integers starting from 0. However, you can set your own index labels by assigning a list to the index attribute:

df.index = ['First', 'Second', 'Third', 'Fourth']
print(df)

Output:

           Name         Occupation
First      Hira         Entrepreneur
Second     Sanjeev      Doctor
Third      Rahul        Actor
Fourth     Ali          Chef
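The same labels can also be supplied directly through the constructor's `index` parameter. Once custom labels exist, `.loc` looks rows up by label and `.iloc` by integer position. This is a minimal sketch using the data from above.

```python
import pandas as pd

data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef']}
labels = ['First', 'Second', 'Third', 'Fourth']

# Same result as assigning df.index afterwards, but done in one step
df = pd.DataFrame(data, index=labels)

# .loc selects by label, .iloc by integer position
print(df.loc['Second', 'Occupation'])  # Doctor
print(df.iloc[1, 1])                   # Doctor
```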

Slicing a DataFrame

Slicing a DataFrame is as simple as slicing a Series or a regular Python list. Say you want to retrieve only some of the rows: you can slice the DataFrame by passing the integer positions of those rows. As with a list, the end position is excluded, so df[0:3] returns the first three rows (positions 0, 1 and 2).

print(df[0:3])

Output:

          Name         Occupation
First     Hira         Entrepreneur
Second    Sanjeev      Doctor
Third     Rahul        Actor
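When the index has custom labels, you can also slice by label with `.loc`. Unlike positional slicing, a label slice includes its end label, so the two calls in this sketch select the same three rows.

```python
import pandas as pd

data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef']}
df = pd.DataFrame(data, index=['First', 'Second', 'Third', 'Fourth'])

by_position = df.iloc[0:3]          # positions 0, 1, 2 (end excluded)
by_label = df.loc['First':'Third']  # labels First..Third (end included)

print(by_position.equals(by_label))  # True
```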

Modifying the Column Names

You can also change the column names. This is useful, for example, when you create a DataFrame from a dictionary, because the keys automatically become the column names. To change them, assign a new list of names to the columns attribute; the list must contain exactly one name per column:

df.columns = ['Persons', 'Jobs']
print(df)

Output:

          Persons     Jobs
First     Hira        Entrepreneur
Second    Sanjeev     Doctor
Third     Rahul       Actor
Fourth    Ali         Chef
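If you only want to rename some of the columns, `rename` with a mapping of old to new names is an alternative to replacing the whole list. Note that it returns a new DataFrame rather than changing the original, as this small sketch shows.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Hira', 'Sanjeev'],
                   'Occupation': ['Entrepreneur', 'Doctor']})

# Map old names to new ones; columns not listed keep their names
renamed = df.rename(columns={'Occupation': 'Jobs'})

print(list(renamed.columns))  # ['Name', 'Jobs']
print(list(df.columns))       # ['Name', 'Occupation'] (original unchanged)
```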

Dropping Rows and Columns

You can also delete rows and columns from a DataFrame with the drop method. Pass the label of the row or column together with the axis it lies on: axis=0 for rows and axis=1 for columns. For example, to remove the 'Jobs' column:

print(df.drop('Jobs', axis=1))

Output:

         Persons
First    Hira
Second   Sanjeev
Third    Rahul
Fourth   Ali
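Instead of an axis number, `drop` also accepts the `columns` keyword, which some readers find easier to read. The sketch below rebuilds the renamed DataFrame from above and checks that both spellings give the same result.

```python
import pandas as pd

df = pd.DataFrame({'Persons': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
                   'Jobs': ['Entrepreneur', 'Doctor', 'Actor', 'Chef']},
                  index=['First', 'Second', 'Third', 'Fourth'])

# These two calls are equivalent: both drop the 'Jobs' column
a = df.drop('Jobs', axis=1)
b = df.drop(columns='Jobs')
print(a.equals(b))  # True
```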

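The drop method works for rows as well. To remove the row labeled 'Third', we pass axis=0, which is also the default. Because drop returns a new DataFrame instead of modifying df, the 'Jobs' column dropped above is still present in this output: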

print(df.drop('Third', axis=0))

Output:

          Persons    Jobs
First     Hira       Entrepreneur
Second    Sanjeev    Doctor
Fourth    Ali        Chef
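To keep the result of a drop, assign it back to the variable (passing `inplace=True` is the other option). The sketch below also uses the `index` keyword with a list to remove several rows in one call.

```python
import pandas as pd

df = pd.DataFrame({'Persons': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
                   'Jobs': ['Entrepreneur', 'Doctor', 'Actor', 'Chef']},
                  index=['First', 'Second', 'Third', 'Fourth'])

# drop() returns a new DataFrame; reassigning keeps the change
df = df.drop(index=['Second', 'Third'])

print(list(df.index))  # ['First', 'Fourth']
```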