This article shows how to check whether one or more columns exist in a Pandas DataFrame.
Suppose we have the following DataFrame:
    Name  Age       City    Country  Budget
a   jack   34     Sydney  Australia     200
b   Riti   30      Delhi      India     321
c  Vikas   31     Mumbai      India     333
d  Neelu   32  Bangalore      India     238
e   John   16   New York         US     262
f   Mike   17  las vegas         US     198
Now we want to check whether a column named ‘Age’ exists in this DataFrame. We might also have a list of names and want to check whether every column in that list exists in the DataFrame. Let’s see how to do that.
First, we will create a DataFrame from a list of tuples:
import pandas as pd

# List of tuples
students = [('jack', 34, 'Sydney', 'Australia', 200),
            ('Riti', 30, 'Delhi', 'India', 321),
            ('Vikas', 31, 'Mumbai', 'India', 333),
            ('Neelu', 32, 'Bangalore', 'India', 238),
            ('John', 16, 'New York', 'US', 262),
            ('Mike', 17, 'las vegas', 'US', 198)]

# Create a DataFrame object
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country', 'Budget'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Display the DataFrame
print(df)
Output:
    Name  Age       City    Country  Budget
a   jack   34     Sydney  Australia     200
b   Riti   30      Delhi      India     321
c  Vikas   31     Mumbai      India     333
d  Neelu   32  Bangalore      India     238
e   John   16   New York         US     262
f   Mike   17  las vegas         US     198
This DataFrame has five columns and six rows.
Check if a Column exists in DataFrame
A Pandas DataFrame exposes a columns attribute, which returns an Index object containing all the column names of the DataFrame. We can use the “in” operator on this Index object to check whether a name is among the column names. For example, let’s check whether the column ‘Age’ exists in the DataFrame created above:
# Check if a column named 'Age' exists in the DataFrame
if 'Age' in df.columns:
    print('Column exists in the DataFrame')
else:
    print('Column does not exist in the DataFrame')
Output:
Column exists in the DataFrame
Here, df.columns returned an Index object containing all the column names of the DataFrame, and we checked whether the name ‘Age’ was in it. Because the column exists in the DataFrame, the “in” operator returned True. Let’s look at a negative example:
# Check if a column named 'Experience' exists in the DataFrame
if 'Experience' in df.columns:
    print('Column exists in the DataFrame')
else:
    print('Column does not exist in the DataFrame')
Output:
Column does not exist in the DataFrame
In this example, ‘Experience’ does not exist in the DataFrame, so the “in” operator returned False.
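As a shorthand, the “in” operator can also be applied to the DataFrame itself, because membership on a DataFrame tests its column labels rather than its values. The sketch below uses a small stand-in DataFrame (the name demo_df is illustrative only) and also shows that the match is case-sensitive.

```python
import pandas as pd

# Small stand-in DataFrame; demo_df is an illustrative name, not from the article
demo_df = pd.DataFrame({'Name': ['jack', 'Riti'], 'Age': [34, 30]})

# Membership on the DataFrame itself checks the column labels
has_age = 'Age' in demo_df

# Column names are matched exactly, so a different case is a different name
has_lowercase_age = 'age' in demo_df.columns

# Values stored inside a column are not considered by this check
has_value = 'jack' in demo_df

print(has_age, has_lowercase_age, has_value)
```

Writing df.columns explicitly is a little longer but makes it obvious to a reader that column names, not cell values, are being tested.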
Check if multiple columns exist in Pandas DataFrame
Using all() and in operator
Suppose we have a list of column names and want to check whether all of them exist in a DataFrame. To do that, we can iterate over the names, check each one with the “in” operator, and pass the results to all(), which returns True only if every check succeeds. For example,
column_names = ['Age', 'Budget']

# Check if all of the column names in a list exist in the DataFrame
if all(col in df.columns for col in column_names):
    print('All column names exist in the DataFrame')
else:
    print('Not all column names exist in the DataFrame')
Output:
All column names exist in the DataFrame
Our list had two column names, ‘Age’ and ‘Budget’. We iterated over the names in this list and checked whether each of them exists in the DataFrame. Another way to achieve the same result is to use a set.
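Before moving on, note that when such a check fails it is often useful to know which names are missing. A minimal sketch, assuming a small stand-in DataFrame (demo_df) and a made-up list of requested names, collects them with a list comprehension:

```python
import pandas as pd

# Stand-in DataFrame and request list; both are illustrative assumptions
demo_df = pd.DataFrame({'Name': ['jack'], 'Age': [34], 'Budget': [200]})
column_names = ['Age', 'Salary', 'Budget', 'Department']

# Collect the requested names that are not columns, in their original order
missing = [col for col in column_names if col not in demo_df.columns]

if missing:
    print('Missing columns:', missing)
else:
    print('All column names exist in the DataFrame')
```

An empty list is falsy in Python, so the same variable serves both as the pass/fail flag and as the detail for an error message.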
Using Set and issubset()
Convert the list of names to a set and then call that set’s issubset() method, passing the DataFrame’s column names as the argument. The issubset() method returns True if every item of the calling set exists in the passed argument. For example,
column_names = ['Age', 'Budget']

# Check if all of the column names in a list exist in the DataFrame
if set(column_names).issubset(df.columns):
    print('All column names exist in the DataFrame')
else:
    print('Not all column names exist in the DataFrame')
Output:
All column names exist in the DataFrame
All the column names in the list exist in the DataFrame.
How did it work?
We converted the list of column names to a set and called its issubset() method. As the argument, we passed df.columns, i.e. all the column names of the DataFrame. issubset() returned True because every item of the set exists in the passed sequence of DataFrame column names.
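A few properties of this approach are worth seeing in isolation. The sketch below, which uses a small stand-in DataFrame (demo_df, an illustrative name), shows that issubset() accepts any iterable, that the equivalent <= operator expects a set on both sides, and that an empty list of names always passes the check.

```python
import pandas as pd

# Stand-in DataFrame; demo_df is an illustrative name
demo_df = pd.DataFrame({'Name': ['jack'], 'Age': [34], 'Budget': [200]})

# Duplicates in the list collapse when it becomes a set
wanted = set(['Age', 'Budget', 'Age'])

# issubset() accepts any iterable, so the Index can be passed directly
via_method = wanted.issubset(demo_df.columns)

# The <= operator form is written here with a set on both sides
via_operator = wanted <= set(demo_df.columns)

# An empty set is a subset of anything, so an empty list of names passes
empty_case = set().issubset(demo_df.columns)

print(via_method, via_operator, empty_case)
```

If an empty list of names should count as an error in your code, that case needs an explicit check of its own.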
Let’s check out a negative example,
column_names = ['Age', 'Budget', 'Department']

# Check if all of the column names in a list exist in the DataFrame
if set(column_names).issubset(df.columns):
    print('All column names exist in the DataFrame')
else:
    print('Not all column names exist in the DataFrame')
Output:
Not all column names exist in the DataFrame
Not all the column names in the list exist in the DataFrame, because there is no ‘Department’ column.
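The set-based check gives a single True or False. If a per-name result is more useful, one option is the isin() method of a Pandas Index. This is a sketch of an alternative rather than part of the approach above, and demo_df is again an illustrative stand-in DataFrame.

```python
import pandas as pd

# Stand-in DataFrame and request list; both are illustrative assumptions
demo_df = pd.DataFrame({'Name': ['jack'], 'Age': [34], 'Budget': [200]})
column_names = ['Age', 'Budget', 'Department']

# One boolean per requested name: True where the name is a column
found = pd.Index(column_names).isin(demo_df.columns)

# Pair each name with its flag for a readable report
report = dict(zip(column_names, found.tolist()))
print(report)
```

Calling found.all() on the resulting array gives the same overall answer as the issubset() check.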
Summary:
We learned how to check whether a single column or multiple columns exist in a Pandas DataFrame.