DATAFRAME IN PANDAS
In this tutorial, we will learn about the DataFrame, the most widely used data structure in Pandas, and discuss how to create, modify, and index one.
What is a DataFrame in Pandas?
Whereas a Series is a one-dimensional data structure, a DataFrame is a two-dimensional data structure with labeled axes (rows and columns). Whenever we work with DataFrames, we keep three things in mind:
- Data to populate the dataframe
- Rows
- Columns
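To make these three parts concrete, here is a minimal sketch that builds a small DataFrame and inspects its data, row labels, and column labels. The fruit names and quantities are hypothetical sample values chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({'Fruits': ['banana', 'apple'], 'Quantity': [3, 5]})

print(df.shape)           # (number of rows, number of columns)
print(list(df.index))     # row labels (integers starting from 0 by default)
print(list(df.columns))   # column labels
print(df.to_numpy())      # the data itself, as a NumPy array
```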
Creating a DataFrame in Pandas
A Pandas DataFrame can be created from arrays, lists, or dictionaries, or loaded from external storage such as SQL databases, CSV files, or Excel sheets. There are therefore multiple ways to create a DataFrame, and we will look at a few of them to understand it better. In each case we call the DataFrame constructor and pass the data inside the parentheses. The general syntax is:
df = pd.DataFrame(data)
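As a rough sketch of two of the other sources mentioned above, the example below builds one DataFrame from a NumPy array and another from CSV text. The column names a and b and the numbers are made up for this example, and io.StringIO stands in for a real CSV file on disk.

```python
import io

import numpy as np
import pandas as pd

# From a 2-D NumPy array; the column names are invented for this example
arr = np.array([[1, 2], [3, 4]])
df_array = pd.DataFrame(arr, columns=['a', 'b'])

# From CSV text; io.StringIO stands in for a real file path
csv_text = "a,b\n1,2\n3,4\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_array)
print(df_csv)
```

With a real file, you would pass its path to pd.read_csv instead of the StringIO object.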
Creating a DataFrame in Pandas via List
DataFrames can be created from a single list or from a list of lists. For example:
import pandas as pd

# Creating a list
list_1 = ['banana', 'apple', 'orange', 'pear', 'avocado']

# Creating the DataFrame and printing the output
df = pd.DataFrame(list_1, columns=['Fruits'])
print(df)
Output:
    Fruits
0   banana
1    apple
2   orange
3     pear
4  avocado
As the example shows, you can pass the ‘columns’ parameter to supply your own column label.
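A list of lists works the same way, with each inner list becoming one row. The sketch below assumes a hypothetical Quantity column with invented values, added only to show a second column.

```python
import pandas as pd

# Each inner list is one row; the quantities are hypothetical
rows = [['banana', 3], ['apple', 5], ['orange', 2]]

# One column label per position in the inner lists
df = pd.DataFrame(rows, columns=['Fruits', 'Quantity'])
print(df)
```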
Creating a DataFrame in Pandas via Dictionary
A DataFrame can also be created from a dictionary, where the keys become the column names, as you can see in the example below:
import pandas as pd

# Create a dictionary
data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef']}

# Create DataFrame
df = pd.DataFrame(data)

# Print the output.
print(df)
Output:
      Name    Occupation
0     Hira  Entrepreneur
1  Sanjeev        Doctor
2    Rahul         Actor
3      Ali          Chef
Note: Indexing and slicing work in a DataFrame in the same way as they do in a Series.
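One place where the connection to a Series is easy to see is column selection. The following sketch, which rebuilds the DataFrame from the dictionary example, shows that selecting one column yields a Series, while selecting a list of columns yields a DataFrame. The variable names names and subset are chosen here for illustration.

```python
import pandas as pd

data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef']}
df = pd.DataFrame(data)

names = df['Name']                    # one column label gives a Series
subset = df[['Name', 'Occupation']]   # a list of labels gives a DataFrame

print(type(names).__name__)
print(names[0])                       # Series indexing by the default integer label
print(type(subset).__name__)
```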
Indexing DataFrame in Pandas
As with a Series, the index defaults to integers starting from 0. However, you can set your own labels by assigning a list to the index attribute.
df.index = ['First', 'Second', 'Third', 'Fourth']
df
Output:
           Name    Occupation
First      Hira  Entrepreneur
Second  Sanjeev        Doctor
Third     Rahul         Actor
Fourth      Ali          Chef
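Once rows have custom labels, they can be looked up by label or by integer position. The sketch below is one way to do that with loc and iloc. It also passes the labels through the index parameter at construction time, an alternative to assigning the index attribute afterwards.

```python
import pandas as pd

data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef']}

# Labels supplied when the DataFrame is created
df = pd.DataFrame(data, index=['First', 'Second', 'Third', 'Fourth'])

print(df.loc['Second'])          # row selected by its label
print(df.iloc[1])                # the same row selected by integer position
print(df.loc['Third', 'Name'])   # a single cell: row label, then column label
```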
Slicing a DataFrame
Slicing a DataFrame is as simple as slicing a Series or a regular Python list. Let’s say that you want to retrieve the first few rows of your DataFrame. You can slice it by passing the integer positions of those rows; as with a list, the end position is excluded.
df[0:3]
Output:
           Name    Occupation
First      Hira  Entrepreneur
Second  Sanjeev        Doctor
Third     Rahul         Actor
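The same rows can also be sliced more explicitly with iloc (by position) or loc (by label). The sketch below rebuilds the labeled DataFrame and compares the two. Note that a label slice with loc includes its end label, while a position slice excludes its end position. The variable names by_position and by_label are chosen here for illustration.

```python
import pandas as pd

data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef']}
df = pd.DataFrame(data, index=['First', 'Second', 'Third', 'Fourth'])

by_position = df.iloc[0:3]           # positions 0, 1, 2 (end excluded)
by_label = df.loc['First':'Third']   # 'First' through 'Third' (end included)

print(by_position)
print(by_label)
```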
Modifying the Column Names
You can change the names of your columns as well. For example, when you create a DataFrame from a dictionary, the keys become the column names by default. To change them, assign a new list of names to the columns attribute:
df.columns = ['Persons', 'Jobs']
df
Output:
        Persons          Jobs
First      Hira  Entrepreneur
Second  Sanjeev        Doctor
Third     Rahul         Actor
Fourth      Ali          Chef
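If only some columns need new names, the rename method is an alternative worth knowing. The sketch below renames a single column through a mapping of old name to new name. Like most DataFrame methods, rename returns a new DataFrame by default, so the result is stored in a variable (renamed, a name chosen here for illustration).

```python
import pandas as pd

data = {'Name': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
        'Occupation': ['Entrepreneur', 'Doctor', 'Actor', 'Chef']}
df = pd.DataFrame(data)

# Map old column names to new ones; unlisted columns keep their names
renamed = df.rename(columns={'Name': 'Persons'})

print(list(renamed.columns))
print(list(df.columns))   # the original DataFrame is unchanged
```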
Dropping Rows and Columns
You can delete rows and columns from your DataFrame as well by passing the label of the row or column to the drop method, together with the axis it belongs to (axis 0 is for rows and axis 1 is for columns). Note that drop returns a new DataFrame and leaves the original unchanged unless you assign the result back. Let’s say we want to remove ‘Jobs’ from our columns:
df.drop('Jobs',axis=1)
Output:
        Persons
First      Hira
Second  Sanjeev
Third     Rahul
Fourth      Ali
We can use the drop method for rows as well. Let’s say we want to remove the third row:
df.drop('Third',axis=0)
Output:
        Persons          Jobs
First      Hira  Entrepreneur
Second  Sanjeev        Doctor
Fourth      Ali          Chef
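Notice that the ‘Jobs’ column is still present in the last output, because the earlier drop call did not modify df. The sketch below stores each result in its own variable (without_jobs and without_third, names chosen here for illustration) and uses the columns and index keywords, which are an equivalent and arguably more readable alternative to the axis argument.

```python
import pandas as pd

df = pd.DataFrame(
    {'Persons': ['Hira', 'Sanjeev', 'Rahul', 'Ali'],
     'Jobs': ['Entrepreneur', 'Doctor', 'Actor', 'Chef']},
    index=['First', 'Second', 'Third', 'Fourth'],
)

# drop returns a new DataFrame, so keep the result in a variable
without_jobs = df.drop(columns='Jobs')    # same as df.drop('Jobs', axis=1)
without_third = df.drop(index='Third')    # same as df.drop('Third', axis=0)

print(list(without_jobs.columns))
print(list(without_third.index))
print(df.shape)   # the original still has all four rows and both columns
```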