How to shuffle DataFrame rows in Pandas?

In this article, we will discuss how to shuffle DataFrame rows in Pandas. Shuffling rows is commonly used to randomize a dataset before using it to train a Machine Learning model.

Preparing the DataSet

To get started quickly, let’s create a sample DataFrame to experiment with. We’ll use the pandas library with some sample employee data.

import pandas as pd
import numpy as np

# List of Tuples
employees = [('Shubham', 'India', 'Tech', 5),
             ('Riti', 'India', 'Tech', 7),
             ('Shanky', 'India', 'PMO', 2),
             ('Shreya', 'India', 'Design', 2),
             ('Aadi', 'US', 'Tech', 11),
             ('Sim', 'US', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])
print(df)

Contents of the created dataframe are,

      Name Location    Team  Experience
0  Shubham    India    Tech           5
1     Riti    India    Tech           7
2   Shanky    India     PMO           2
3   Shreya    India  Design           2
4     Aadi       US    Tech          11
5      Sim       US    Tech           4

Method 1: Using pandas.DataFrame.sample() function

The sample() function in pandas is generally used to pick a random sample of rows from a DataFrame. We can also use it to shuffle the rows by setting the “frac” parameter to 1. The “frac” parameter sets the fraction of rows to return in the random sample, so setting it to 1 keeps all the rows and simply returns them in a random order.

# shuffle all rows; random_state makes the result reproducible
print(df.sample(frac=1, random_state=2022))

Output

      Name Location    Team  Experience
2   Shanky    India     PMO           2
3   Shreya    India  Design           2
0  Shubham    India    Tech           5
1     Riti    India    Tech           7
4     Aadi       US    Tech          11
5      Sim       US    Tech           4

As observed, the DataFrame rows are now in a random order, and each row keeps its original index label. We set “random_state” so that running the same code again produces the same shuffled order.
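Because sample() keeps the original index labels, a common follow-up (an illustrative addition, not part of the method above) is to call reset_index(drop=True) so the shuffled rows are numbered 0 to n-1. The sketch below also shows that a smaller “frac” returns only part of the rows, and that reusing the same random_state reproduces the same order. The variable names half, shuffled, and again are just for illustration.

```python
import pandas as pd

# Same sample data as in the article
employees = [('Shubham', 'India', 'Tech', 5),
             ('Riti', 'India', 'Tech', 7),
             ('Shanky', 'India', 'PMO', 2),
             ('Shreya', 'India', 'Design', 2),
             ('Aadi', 'US', 'Tech', 11),
             ('Sim', 'US', 'Tech', 4)]
df = pd.DataFrame(employees, columns=['Name', 'Location', 'Team', 'Experience'])

# frac=0.5 returns a random half of the rows (3 of the 6)
half = df.sample(frac=0.5, random_state=2022)

# frac=1 returns every row in random order; reset_index(drop=True)
# discards the old labels and numbers the shuffled rows 0..n-1
shuffled = df.sample(frac=1, random_state=2022).reset_index(drop=True)

# The same random_state reproduces exactly the same order
again = df.sample(frac=1, random_state=2022).reset_index(drop=True)

print(len(half), len(shuffled))  # 3 6
print(shuffled.equals(again))    # True
print(shuffled)
```

Passing drop=True prevents the old index from being added back as a new column.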

Method 2: Using shuffle from sklearn

The sklearn.utils module also provides a shuffle() function that can shuffle a pandas DataFrame. Let’s use it to shuffle the original DataFrame again. Since no random_state is passed here, the order will differ from run to run.

# import the shuffle() helper from scikit-learn
from sklearn.utils import shuffle

# shuffle rows (no random_state, so the order changes on every run)
print(shuffle(df))

Output

      Name Location    Team  Experience
5      Sim       US    Tech           4
2   Shanky    India     PMO           2
3   Shreya    India  Design           2
4     Aadi       US    Tech          11
0  Shubham    India    Tech           5
1     Riti    India    Tech           7
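In a Machine Learning workflow, the features and the labels often live in separate objects and must be shuffled with the same permutation. sklearn.utils.shuffle() accepts several arrays at once and applies one shared permutation to all of them, and random_state makes the result repeatable. The sketch below assumes, purely for illustration, that Experience is the target column and the remaining columns are the features.

```python
import pandas as pd
from sklearn.utils import shuffle

employees = [('Shubham', 'India', 'Tech', 5),
             ('Riti', 'India', 'Tech', 7),
             ('Shanky', 'India', 'PMO', 2),
             ('Shreya', 'India', 'Design', 2),
             ('Aadi', 'US', 'Tech', 11),
             ('Sim', 'US', 'Tech', 4)]
df = pd.DataFrame(employees, columns=['Name', 'Location', 'Team', 'Experience'])

# Illustrative split (an assumption): 'Experience' as the label, the rest as features
X = df.drop(columns='Experience')
y = df['Experience']

# One call shuffles X and y with the same permutation, so rows stay aligned;
# random_state makes the shuffle reproducible
X_shuffled, y_shuffled = shuffle(X, y, random_state=2022)

# Both outputs keep the original index labels, in the same shuffled order
print(X_shuffled.index.equals(y_shuffled.index))  # True
print(pd.concat([X_shuffled, y_shuffled], axis=1))
```

Shuffling X and y in two separate calls would, in general, break the row alignment between them unless both calls used the same random_state.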

Method 3: Using permutation from NumPy

Another way to shuffle the DataFrame rows is the numpy.random.permutation() function. Given a sequence, it returns a randomly permuted copy of it; given an integer n, it returns a random permutation of the range 0 to n-1. Here, we pass the length of the DataFrame to get a random ordering of the row positions, and then use iloc to select the rows in that order.

# shuffle using permutation function
print(df.iloc[np.random.permutation(len(df))])

Output

      Name Location    Team  Experience
5      Sim       US    Tech           4
1     Riti    India    Tech           7
0  Shubham    India    Tech           5
4     Aadi       US    Tech          11
2   Shanky    India     PMO           2
3   Shreya    India  Design           2
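NumPy also offers a Generator-based random API. The sketch below is an alternative to the call above, not something the original method requires: it creates a seeded generator with np.random.default_rng() and uses its permutation() method, so the shuffle can be reproduced in the same way random_state does for sample(). The seed value 2022 is arbitrary.

```python
import numpy as np
import pandas as pd

employees = [('Shubham', 'India', 'Tech', 5),
             ('Riti', 'India', 'Tech', 7),
             ('Shanky', 'India', 'PMO', 2),
             ('Shreya', 'India', 'Design', 2),
             ('Aadi', 'US', 'Tech', 11),
             ('Sim', 'US', 'Tech', 4)]
df = pd.DataFrame(employees, columns=['Name', 'Location', 'Team', 'Experience'])

# A seeded Generator yields a reproducible random permutation of 0..len(df)-1
rng = np.random.default_rng(2022)
positions = rng.permutation(len(df))

# iloc selects rows by position, in the permuted order
shuffled = df.iloc[positions]
print(shuffled)
```

Because iloc works on positions rather than labels, this approach also works when the DataFrame has a non-default index.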

The complete example is as follows,

import pandas as pd
import numpy as np
from sklearn.utils import shuffle

# List of Tuples
employees = [('Shubham', 'India', 'Tech', 5),
             ('Riti', 'India', 'Tech', 7),
             ('Shanky', 'India', 'PMO', 2),
             ('Shreya', 'India', 'Design', 2),
             ('Aadi', 'US', 'Tech', 11),
             ('Sim', 'US', 'Tech', 4)]

# Create a DataFrame object from list of tuples
df = pd.DataFrame(employees,
                  columns=['Name', 'Location', 'Team', 'Experience'])
print(df)

# sample DataFrame with random state
print(df.sample(frac=1, random_state=2022))

# shuffle rows
print(shuffle(df))

# shuffle using permutation function
print(df.iloc[np.random.permutation(len(df))])

Output:

      Name Location    Team  Experience
0  Shubham    India    Tech           5
1     Riti    India    Tech           7
2   Shanky    India     PMO           2
3   Shreya    India  Design           2
4     Aadi       US    Tech          11
5      Sim       US    Tech           4

      Name Location    Team  Experience
2   Shanky    India     PMO           2
3   Shreya    India  Design           2
0  Shubham    India    Tech           5
1     Riti    India    Tech           7
4     Aadi       US    Tech          11
5      Sim       US    Tech           4

      Name Location    Team  Experience
0  Shubham    India    Tech           5
1     Riti    India    Tech           7
3   Shreya    India  Design           2
5      Sim       US    Tech           4
2   Shanky    India     PMO           2
4     Aadi       US    Tech          11

      Name Location    Team  Experience
4     Aadi       US    Tech          11
5      Sim       US    Tech           4
3   Shreya    India  Design           2
1     Riti    India    Tech           7
2   Shanky    India     PMO           2
0  Shubham    India    Tech           5

Summary

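In this article, we discussed three ways to shuffle DataFrame rows in pandas: DataFrame.sample() with frac=1, shuffle() from sklearn.utils, and numpy.random.permutation() combined with iloc.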
