In this article, we will discuss how to split a DataFrame column into two columns in Python.
Table Of Contents
Overview Pandas DataFrame
In Pandas, the DataFrame contains three elements rows, columns, and data. It is a two-dimensional object which contains columns and rows. Where columns represent the content and rows representing the index. DataFrame is like a tabular data structure.
String column values in DataFrame
In Pandas, a DataFrame column can contain delimited string values. It means, multiple values in a single column that are either separated by dashes, whitespace, or comma. For example,
RollNo student_name student_address 0 10 Reema Surat_Gujarat 1 20 Rekha Pune_Maharastra 2 30 Jaya Delhi_Uttar Pradesh
Here, we have the requirement to split a single column into two different columns. For example, in the above DataFrame split the student_address column to two different columns “city” and “state” like,
RollNo student_name city state 0 10 Reema Surat Gujarat 1 20 Rekha Pune Maharastra 2 30 Jaya Delhi Uttar Pradesh
There are different ways to do this. Let’s discuss them one by one.
Frequently Asked:
Split DataFrame column into two columns using Series.str.split()
In pandas, DataFrame columns are called Series, and to convert the column into a string data we can use Series.str() function. The Series.str.split() function is used to break up single column values into multiple columns based on a specified separator or delimiter. The Series.str.split() function is similar to the Python string split() method, but split() method works on the all Dataframe columns, whereas the Series.str.split() method works on a specified column only.
Syntax of Series.str.split() method
Series.str.split(pat=None, n=-1, expand=False)
- pat: string type; It is a regular expression or a delimiter symbol to split on. By default it is whitespace. It is optional.
- n: int type; It specifies the number of splits, default is -1.
- expand: bool type; default is False
- If True, return DataFrame/MultiIndex expanding dimensionality.
- If False, return Series/Index, containing lists of strings.
Example of Series.str.split() method with an underscore as a delimiter to split the student_address column to two different columns “city” and “state”.
import pandas as pd # create a Dataframe df = pd.DataFrame({ 'RollNo': [10, 20, 30], 'student_name': ['Reema', 'Rekha', 'Jaya'], 'student_address': ['Surat_Gujarat', 'Pune_Maharastra', 'Delhi_Uttar Pradesh'] }) # show the dataframe print(df) print("***********") df[['city','state']] = df.student_address.str.split("_", expand=True) print(df)
Output
RollNo student_name student_address 0 10 Reema Surat_Gujarat 1 20 Rekha Pune_Maharastra 2 30 Jaya Delhi_Uttar Pradesh *********** RollNo student_name student_address city state 0 10 Reema Surat_Gujarat Surat Gujarat 1 20 Rekha Pune_Maharastra Pune Maharastra 2 30 Jaya Delhi_Uttar Pradesh Delhi Uttar Pradesh
In the above script, we have used the Series.str.split() method to split student_address column values into the city and state columns. The delimiter ‘‘ underscore is specified between both the values, so both values are separated by ‘‘. We passed ‘_’ as the first argument in the Series. str.split() method.
Example 2:
Split two different columns values into four new columns, where comma is the delimiter.
import pandas as pd # create a Dataframe df = pd.DataFrame({ 'RollNo': [10, 20, 30], 'student_name': ['Reema,Thakkar', 'Rekha,Chande', 'Jaya,Sachde'], 'student_address': ['Surat,Gujarat', 'Pune,Maharastra', 'Delhi,Uttar Pradesh'] }) # show the dataframe print(df) print('***********') # Split column student_name to Name and Surname df[['Name','Surname']]=df["student_name"].str.split(",",expand=True) # Split column student_address to City and State df[['City','State']]=df["student_address"].str.split(",",expand=True) print(df)
Output
RollNo student_name student_address 0 10 Reema,Thakkar Surat,Gujarat 1 20 Rekha,Chande Pune,Maharastra 2 30 Jaya,Sachde Delhi,Uttar Pradesh *********** RollNo student_name student_address Name Surname City State 0 10 Reema,Thakkar Surat,Gujarat Reema Thakkar Surat Gujarat 1 20 Rekha,Chande Pune,Maharastra Rekha Chande Pune Maharastra 2 30 Jaya,Sachde Delhi,Uttar Pradesh Jaya Sachde Delhi Uttar Pradesh
In the above script, we have used Series. str. split() method to split values in column student_name into two columns Name and Surname, and column student_address into columns City and State.
Split DataFrame column into two columns using apply() method
In Pandas, the apply() method can also be used to split one column values into multiple columns. The DataFrame.apply method() can execute a function on all values of single or multiple columns. Then inside that function, we can split the string value to multiple values. Then we can assign all these splitted values into new columns.
Syntax of DataFrame.apply() method
dataframe.apply(func, axis, raw, result_type, args, kwds)
- Parameters:
- func: Required. A function to apply to the DataFrame.
- axis: Default-0, 1
- raw: True or False Optional, default False.
- result_type: ‘expand’,’reduce’,’broadcast or None Optional, default None. Specifies how the result will be returned
- args: a tuple Optional, arguments to send into the function
- kwds: keyword arguments Optional, keyword arguments to send into the function
Example 1:
Example of DataFrame.apply() method to split a column into multiple columns. Where an underscore is the delimiter.
import pandas as pd # create a Dataframe df = pd.DataFrame({ 'RollNo': [10, 20, 30], 'student_name': ['Reema', 'Rekha', 'Jaya'], 'student_address': ['Surat_Gujarat', 'Pune_Maharastra', 'Delhi_Uttar Pradesh'] }) # show the dataframe print(df) # Split column student_address into two columns City and State df[['City', 'State']] = df["student_address"].apply(lambda x: pd.Series(str(x).split("_"))) print(df)
Output
RollNo student_name student_address 0 10 Reema Surat_Gujarat 1 20 Rekha Pune_Maharastra 2 30 Jaya Delhi_Uttar Pradesh RollNo student_name student_address City State 0 10 Reema Surat_Gujarat Surat Gujarat 1 20 Rekha Pune_Maharastra Pune Maharastra 2 30 Jaya Delhi_Uttar Pradesh Delhi Uttar Pradesh
In the above script, we have the used pandas DataFrame.apply() method to split a column student_address into two columns city and state. For that, we applied a lambda function on all values of column student_address. This lambda function broke each value in that column to two different values i.e. City and State. Then we assigned those values to two new columns.
Example 2:
Example of DataFrame.apply() method with comma as a delimiter, to split two different columns values into four new columns.
import pandas as pd # create a Dataframe df = pd.DataFrame({ 'RollNo': [10, 20, 30], 'student_name': ['Reema,Thakkar', 'Rekha,Chande', 'Jaya,Sachde'], 'student_address': ['Surat,Gujarat', 'Pune,Maharastra', 'Delhi,Uttar Pradesh'] }) # show the DataFrame print(df) print('*******************') # Split column student_name into Name and Surname df[['Name', 'Surname']] = df["student_name"].apply(lambda x: pd.Series(str(x).split(","))) # Split column student_address into City and State df[['City', 'State']] = df["student_address"].apply(lambda x: pd.Series(str(x).split(","))) print(df)
Output
RollNo student_name student_address 0 10 Reema,Thakkar Surat,Gujarat 1 20 Rekha,Chande Pune,Maharastra 2 30 Jaya,Sachde Delhi,Uttar Pradesh ******************* RollNo student_name student_address Name Surname City State 0 10 Reema,Thakkar Surat,Gujarat Reema Thakkar Surat Gujarat 1 20 Rekha,Chande Pune,Maharastra Rekha Chande Pune Maharastra 2 30 Jaya,Sachde Delhi,Uttar Pradesh Jaya Sachde Delhi Uttar Pradesh
In the above script, we have applied the pandas.DataFrame.apply() method to split two columns student_name and student_address into four new columns i.e. student_name into Name and Surname, and student_address into City and State.
Summary
In the article, we have discussed what is a string column in a DataFrame? The we listed out different ways to split string column into two columns and also explain each method with practical examples.