In this article, we will discuss how to shuffle DataFrame rows in Pandas. Shuffling rows is commonly done to randomize a dataset before using it to train a machine learning model.
Preparing DataSet
To quickly get started, let’s create a sample DataFrame to experiment with. We’ll build it with the pandas library from a small list of employee records.
    import pandas as pd
    import numpy as np

    # List of Tuples
    employees = [('Shubham', 'India', 'Tech', 5),
                 ('Riti', 'India', 'Tech', 7),
                 ('Shanky', 'India', 'PMO', 2),
                 ('Shreya', 'India', 'Design', 2),
                 ('Aadi', 'US', 'Tech', 11),
                 ('Sim', 'US', 'Tech', 4)]

    # Create a DataFrame object from list of tuples
    df = pd.DataFrame(employees,
                      columns=['Name', 'Location', 'Team', 'Experience'])

    print(df)
The contents of the created DataFrame are:

          Name Location    Team  Experience
    0  Shubham    India    Tech           5
    1     Riti    India    Tech           7
    2   Shanky    India     PMO           2
    3   Shreya    India  Design           2
    4     Aadi       US    Tech          11
    5      Sim       US    Tech           4
Method 1: Using pandas.DataFrame.sample() function
The sample() function from pandas is generally used to pick a random sample from a dataset. We can also use it to shuffle the rows by setting the “frac” parameter to 1. The “frac” parameter is the fraction of rows to return in the random sample, so a value of 1 returns every row of the DataFrame, in random order.
    # sample DataFrame with random state
    print(df.sample(frac=1, random_state=2022))
Output
          Name Location    Team  Experience
    2   Shanky    India     PMO           2
    3   Shreya    India  Design           2
    0  Shubham    India    Tech           5
    1     Riti    India    Tech           7
    4     Aadi       US    Tech          11
    5      Sim       US    Tech           4
As observed, the DataFrame rows have been shuffled into a random order. We passed “random_state” so that running the same code again reproduces the same order.
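The shuffled result keeps the original index labels, as the output above shows. If a fresh 0-based index is preferred, a common follow-up is to chain reset_index(drop=True) after sample(). Here is a minimal sketch of that idea; the small stand-in DataFrame and the name `shuffled` are illustrative assumptions, not part of the example above.

```python
import pandas as pd

# Small stand-in DataFrame (illustrative data only)
df = pd.DataFrame({'Name': ['Shubham', 'Riti', 'Shanky'],
                   'Experience': [5, 7, 2]})

# Shuffle all rows, then discard the old index labels
# so the result is numbered 0, 1, 2, ... again
shuffled = df.sample(frac=1, random_state=2022).reset_index(drop=True)

print(shuffled)
```

Without drop=True, reset_index() would keep the old labels as an extra column named "index", which is sometimes useful for tracing rows back to their original positions.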
Method 2: Using shuffle from sklearn
The sklearn.utils module also provides a shuffle() function that works on pandas DataFrames. Let’s use it to shuffle the original DataFrame again. Since no seed is passed here, the row order will differ from run to run.
    # import shuffle from scikit-learn
    from sklearn.utils import shuffle

    # shuffle rows
    print(shuffle(df))
Output
          Name Location    Team  Experience
    5      Sim       US    Tech           4
    2   Shanky    India     PMO           2
    3   Shreya    India  Design           2
    4     Aadi       US    Tech          11
    0  Shubham    India    Tech           5
    1     Riti    India    Tech           7
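Like sample(), the scikit-learn shuffle() function accepts a random_state argument for reproducible results. The sketch below assumes scikit-learn is installed and uses a small stand-in DataFrame; the names `first` and `second` are illustrative.

```python
import pandas as pd
from sklearn.utils import shuffle

# Small stand-in DataFrame (illustrative data only)
df = pd.DataFrame({'Name': ['Shubham', 'Riti', 'Shanky', 'Shreya'],
                   'Experience': [5, 7, 2, 2]})

# Passing the same random_state yields the same row order on every call
first = shuffle(df, random_state=42)
second = shuffle(df, random_state=42)

# Compare values and index labels of the two shuffled copies
print(first.equals(second))
```

shuffle() returns a shuffled copy and leaves the original df untouched, so the result has to be assigned to a variable to be kept.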
Method 3: Using permutation from NumPy
Another interesting way to shuffle the DataFrame rows is to use the numpy.random.permutation() function. It returns a single random permutation of a sequence or, when given an integer n, of the range 0 to n-1. Here, we pass it the DataFrame length to get the row positions 0 to len(df)-1 in random order, and then use iloc to select the rows in that order.
    # shuffle using permutation function
    print(df.iloc[np.random.permutation(len(df))])
Output
          Name Location    Team  Experience
    5      Sim       US    Tech           4
    1     Riti    India    Tech           7
    0  Shubham    India    Tech           5
    4     Aadi       US    Tech          11
    2   Shanky    India     PMO           2
    3   Shreya    India  Design           2
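The permutation approach can also be made reproducible. One option, sketched here as an assumption rather than as part of the original example, is NumPy's seeded Generator from numpy.random.default_rng(); the names `rng`, `order`, and `shuffled` and the stand-in data are illustrative.

```python
import numpy as np
import pandas as pd

# Small stand-in DataFrame (illustrative data only)
df = pd.DataFrame({'Name': ['Shubham', 'Riti', 'Shanky', 'Shreya'],
                   'Experience': [5, 7, 2, 2]})

# A seeded Generator produces the same permutation on every run
rng = np.random.default_rng(seed=2022)

# Row positions 0 .. len(df)-1 in random order
order = rng.permutation(len(df))

# Select the rows by position in the permuted order
shuffled = df.iloc[order]

print(shuffled)
```

Keeping `order` in a variable is handy when the same permutation has to be applied to a second object, such as a separate array of labels.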
The complete example is as follows:

    import pandas as pd
    import numpy as np
    from sklearn.utils import shuffle

    # List of Tuples
    employees = [('Shubham', 'India', 'Tech', 5),
                 ('Riti', 'India', 'Tech', 7),
                 ('Shanky', 'India', 'PMO', 2),
                 ('Shreya', 'India', 'Design', 2),
                 ('Aadi', 'US', 'Tech', 11),
                 ('Sim', 'US', 'Tech', 4)]

    # Create a DataFrame object from list of tuples
    df = pd.DataFrame(employees,
                      columns=['Name', 'Location', 'Team', 'Experience'])

    print(df)

    # sample DataFrame with random state
    print(df.sample(frac=1, random_state=2022))

    # shuffle rows
    print(shuffle(df))

    # shuffle using permutation function
    print(df.iloc[np.random.permutation(len(df))])
Output (the last two tables come from unseeded shuffles, so they will differ from run to run):

          Name Location    Team  Experience
    0  Shubham    India    Tech           5
    1     Riti    India    Tech           7
    2   Shanky    India     PMO           2
    3   Shreya    India  Design           2
    4     Aadi       US    Tech          11
    5      Sim       US    Tech           4

          Name Location    Team  Experience
    2   Shanky    India     PMO           2
    3   Shreya    India  Design           2
    0  Shubham    India    Tech           5
    1     Riti    India    Tech           7
    4     Aadi       US    Tech          11
    5      Sim       US    Tech           4

          Name Location    Team  Experience
    0  Shubham    India    Tech           5
    1     Riti    India    Tech           7
    3   Shreya    India  Design           2
    5      Sim       US    Tech           4
    2   Shanky    India     PMO           2
    4     Aadi       US    Tech          11

          Name Location    Team  Experience
    4     Aadi       US    Tech          11
    5      Sim       US    Tech           4
    3   Shreya    India  Design           2
    1     Riti    India    Tech           7
    2   Shanky    India     PMO           2
    0  Shubham    India    Tech           5
Summary
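In this article, we discussed three ways to shuffle DataFrame rows in pandas: DataFrame.sample() with frac=1, shuffle() from sklearn.utils, and numpy.random.permutation() combined with iloc.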