Pandas Tutorial Part #10 – Add/Remove DataFrame Rows & Columns

In this tutorial, we will learn how to add a new row or column to a DataFrame and change the values of existing rows and columns.

First, we will create a DataFrame, and then we will discuss how to add new elements to it and how to modify existing ones:

import pandas as pd

# List of Tuples
students = [('jack',    34, 'Sydney',   'Australia'),
            ('Riti',    30, 'Delhi',    'India'),
            ('Vikas',   31, 'Mumbai',   'India'),
            ('Neelu',   32, 'Bangalore','India'),
            ('John',    16, 'New York',  'US'),
            ('Mike',    17, 'Las Vegas', 'US')]

# Create a DataFrame object
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Display the DataFrame
print(df)

Output

    Name  Age       City    Country
a   jack   34     Sydney  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  Las Vegas         US

This DataFrame contains four columns and six rows.

Add a column to the DataFrame

To add a new column to the DataFrame, pass the new column name to the subscript operator ([]) of the DataFrame and assign the new values to it. For example, let’s add a new column ‘Budget’ to the DataFrame created above:

# Add a new column to the DataFrame
df['Budget'] = [2000, 3000, 4000, 3500, 4500, 2900]

# Display the DataFrame
print(df)

Output

    Name  Age       City    Country  Budget
a   jack   34     Sydney  Australia    2000
b   Riti   30      Delhi      India    3000
c  Vikas   31     Mumbai      India    4000
d  Neelu   32  Bangalore      India    3500
e   John   16   New York         US    4500
f   Mike   17  Las Vegas         US    2900

Each value in the list was assigned to the corresponding row of the new column, so the list must contain exactly one value per row. What if we want the new column to have the same value in every row?
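Before answering that, here is a quick check of two details of list assignment. The sketch uses a trimmed-down, illustrative version of the students DataFrame. It shows that a list with the wrong number of values raises a ValueError. It also shows that if you assign a pd.Series instead of a list, pandas matches the values to rows by index label rather than by position, and fills rows without a match with NaN.

```python
import pandas as pd

# A smaller version of the students DataFrame, for illustration only
df = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas'],
                   'Age': [34, 30, 31]},
                  index=['a', 'b', 'c'])

# A list must have exactly one value per row
try:
    df['Budget'] = [2000, 3000]        # only 2 values for 3 rows
except ValueError as err:
    print('ValueError:', err)

# A Series is aligned on the index labels, not on position;
# row 'b' has no matching label, so it gets NaN
df['Budget'] = pd.Series({'c': 4000, 'a': 2000})
print(df)
```

Because of this alignment, assigning a Series is safer than assigning a list when the values come from another DataFrame whose rows may be in a different order.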

Add a new column with the same values

To add a new column with the same value in every row, pass the column name to the subscript operator ([]) of the DataFrame and assign a scalar value. For example,

# Add a new column to the DataFrame
df['Marks'] = 0

# Display the DataFrame
print(df)

Output

    Name  Age       City    Country  Budget  Marks
a   jack   34     Sydney  Australia    2000      0
b   Riti   30      Delhi      India    3000      0
c  Vikas   31     Mumbai      India    4000      0
d  Neelu   32  Bangalore      India    3500      0
e   John   16   New York         US    4500      0
f   Mike   17  Las Vegas         US    2900      0

It added a new column ‘Marks’ to the DataFrame, with the same value (0) in every row.
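Assigning through [] always places the new column at the right-hand end. pandas also provides two alternatives:

- DataFrame.insert() adds the column at a chosen position.
- DataFrame.assign() returns a new DataFrame and leaves the original untouched.

Here is a minimal sketch on a small illustrative DataFrame. The ‘Grade’ column and its value are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas'],
                   'Age': [34, 30, 31]},
                  index=['a', 'b', 'c'])

# insert() modifies df in place and puts 'Marks' at position 1,
# i.e. between 'Name' and 'Age'; the scalar 0 is broadcast to every row
df.insert(1, 'Marks', 0)

# assign() returns a NEW DataFrame with the extra column; df is unchanged
df2 = df.assign(Grade='A')

print(list(df.columns))   # ['Name', 'Marks', 'Age']
print(list(df2.columns))  # ['Name', 'Marks', 'Age', 'Grade']
```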

Changing values of an existing column

If the column name passed to the subscript operator ([]) already exists in the DataFrame, the assignment replaces the values of that column. For example, let’s change the values of the column ‘Age’:

# Change the values of a column
df['Age'] = [31, 35, 36, 34, 31, 37]

# Display the DataFrame
print(df)

Output

    Name  Age       City    Country  Budget  Marks
a   jack   31     Sydney  Australia    2000      0
b   Riti   35      Delhi      India    3000      0
c  Vikas   36     Mumbai      India    4000      0
d  Neelu   34  Bangalore      India    3500      0
e   John   31   New York         US    4500      0
f   Mike   37  Las Vegas         US    2900      0

As the column ‘Age’ already exists in the DataFrame, all of its values were replaced.
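Assigning a list replaces the whole column. Two related patterns are often handy:

- Compute the new values from the existing ones with a vectorized expression.
- Change a single cell with loc[row_label, column_label].

Here is a sketch on a small illustrative DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti', 'Vikas'],
                   'Age': [34, 30, 31]},
                  index=['a', 'b', 'c'])

# Derive the new values from the old ones (adds 1 to every age)
df['Age'] = df['Age'] + 1

# Change only one cell: row 'b', column 'Age'
df.loc['b', 'Age'] = 40

print(df)
```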

Add a new Row to the DataFrame

To add a new row to the DataFrame, pass the new row’s index label to the loc[] indexer of the DataFrame and assign the new row values. For example,

# Add a new Row to the DataFrame
df.loc['g'] = ['Aadi', 35, 'Delhi', 'India']

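Here is a complete example that creates a new DataFrame and then adds a new row to it. The DataFrame is recreated here because the one modified in the previous sections now has six columns, so a four-item list would no longer fit.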

import pandas as pd

# List of Tuples
students = [('jack',    34, 'Sydney',   'Australia'),
            ('Riti',    30, 'Delhi',    'India'),
            ('Vikas',   31, 'Mumbai',   'India'),
            ('Neelu',   32, 'Bangalore','India'),
            ('John',    16, 'New York',  'US'),
            ('Mike',    17, 'Las Vegas', 'US')]

# Create a DataFrame object
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Display the DataFrame
print(df)

# Add a new Row to the DataFrame
df.loc['g'] = ['Aadi', 35, 'Delhi', 'India']

# Display the DataFrame
print(df)

Output

    Name  Age       City    Country
a   jack   34     Sydney  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  Las Vegas         US



    Name  Age       City    Country
a   jack   34     Sydney  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  Las Vegas         US
g   Aadi   35      Delhi      India

It added a new row with the index label ‘g’, using the list values as the values of the new row. The number of items in the list must equal the number of columns in the DataFrame. Otherwise, pandas raises a ValueError like this:

raise ValueError("cannot set a row with mismatched columns")
ValueError: cannot set a row with mismatched columns
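The following sketch reproduces that error on a small illustrative DataFrame and then shows two related behaviours.

First, if you assign a pd.Series instead of a list, pandas matches its entries to the columns by label, so their order does not matter.

Second, unlike the label-based loc[], the position-based iloc[] cannot add rows at all.

The names ‘Aadi’ and ‘Sam’ are just sample data.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'],
                   'Age': [34, 30],
                   'City': ['Sydney', 'Delhi']},
                  index=['a', 'b'])

# Too few values for the three columns -> ValueError
try:
    df.loc['c'] = ['Aadi', 35]
except ValueError as err:
    print('ValueError:', err)

# A Series is matched to the columns by label, so the order of its entries is irrelevant
df.loc['c'] = pd.Series({'City': 'Delhi', 'Name': 'Aadi', 'Age': 35})

# iloc[] works with positions only and cannot enlarge the DataFrame
try:
    df.iloc[3] = ['Sam', 20, 'Pune']
except IndexError as err:
    print('IndexError:', err)

print(df)
```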

Add a new Row with the same values

Instead of a sequence, we can also assign a scalar value to df.loc[row_label]. This adds a new row with the same value in every column. For example,

# Add a new Row to the DataFrame
df.loc['h'] = 0

# Display the DataFrame
print(df)

Output

    Name  Age       City    Country
a   jack   34     Sydney  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  Las Vegas         US
g   Aadi   35      Delhi      India
h      0    0          0          0

It added a new row with the index label ‘h’, and all the values in the new row are 0.
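Each loc[] assignment adds one row. To add several rows at once, a common approach is to build a second DataFrame with the same columns and combine the two with pd.concat().

Older tutorials use DataFrame.append() for this. That method was deprecated in pandas 1.4 and removed in pandas 2.0.

Here is a minimal sketch with made-up rows.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'],
                   'Age': [34, 30]},
                  index=['a', 'b'])

# The new rows use the same column names; their index labels become the row labels
new_rows = pd.DataFrame({'Name': ['Aadi', 'Sam'],
                         'Age': [35, 28]},
                        index=['g', 'h'])

# concat() returns a new DataFrame; df itself is not modified
combined = pd.concat([df, new_rows])
print(combined)
```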

Changing the existing row values

If the row index label passed to loc[] already exists in the DataFrame, the assignment replaces the contents of that row. For example, let’s change the values of row ‘b’:

# Change the values of existing row
df.loc['b'] = ['Justin', 45, 'Tokyo', 'Japan']

# Display the DataFrame
print(df)

Output

     Name  Age       City    Country
a    jack   34     Sydney  Australia
b  Justin   45      Tokyo      Japan
c   Vikas   31     Mumbai      India
d   Neelu   32  Bangalore      India
e    John   16   New York         US
f    Mike   17  Las Vegas         US
g    Aadi   35      Delhi      India
h       0    0          0          0

As row ‘b’ already exists in the DataFrame, all of its values were replaced.
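To change only some values of an existing row, pass loc[] both the row label and a list of column labels. For a single cell, at[] gives fast access to one value.

Here is a sketch on a small illustrative DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['jack', 'Riti'],
                   'City': ['Sydney', 'Delhi'],
                   'Country': ['Australia', 'India']},
                  index=['a', 'b'])

# Change two columns of row 'b'; its 'Name' keeps the old value
df.loc['b', ['City', 'Country']] = ['Tokyo', 'Japan']

# Change a single cell with at[]
df.at['a', 'Name'] = 'Jack'

print(df)
```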

Summary:

We learned how to add new rows and columns to a Pandas DataFrame, and how to change the values of existing rows and columns.
