Convert a Unicode String to a String in Python

In this Python tutorial, you will learn how to convert a Unicode string to a string.

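A Unicode string is a sequence of Unicode code points, where each character is identified by a number. In Python 2, a Unicode string literal is written with the prefix u in front of the string. In Python 3, every str is already a Unicode string, so the u prefix is optional and kept only for compatibility.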

Example:

u"Hello Varun"
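
As a quick check of the point above, the following minimal sketch (assuming Python 3; the variable names plain and prefixed are only illustrative) compares a literal with and without the u prefix:

```python
# In Python 3 the u prefix is optional: both literals create the same str type.
plain = "Hello Varun"
prefixed = u"Hello Varun"

print(type(prefixed))      # the type is str in Python 3
print(plain == prefixed)   # the two strings compare equal
```

Both literals produce an ordinary str, which is why the remaining examples work the same with or without the prefix.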

Convert a Unicode string to a string using str()

Here, we will use str() to convert a Unicode string to a string. In Python 3 the input is already a str, so str() simply returns an equal string.

Syntax:

str(inp_str)

In this form, str() takes a single parameter.

Parameter:

inp_str is the Unicode string to convert.

Example:

In this example, we will convert the Unicode string u"Welcome to thisPointer" to a string using str().

# Consider the unicode string
inp_str= u"Welcome to thisPointer"

# Convert to string
print("Converted String: ",str(inp_str))

Output:

Converted String:  Welcome to thisPointer
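
The sketch below, assuming Python 3, shows how str() behaves on a str and on a bytes object. The bytes value raw is a hypothetical input added for illustration and is not part of the example above.

```python
# str() on a Unicode string returns an equal str.
text = u"Welcome to thisPointer"
converted = str(text)
print(converted == text, type(converted))

# On bytes, str() without an encoding returns the printable representation,
# including the b'' wrapper, rather than the decoded text.
raw = b"Welcome to thisPointer"
print(str(raw))

# Passing an encoding makes str() decode the bytes into text.
print(str(raw, "utf-8"))
```

If the value you hold is bytes rather than str, pass the encoding (or call decode()) to get readable text.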

Convert a Unicode string to UTF-8

Here, we will take a Unicode string and encode it to UTF-8 using the encode() method. UTF-8 encodes each character in the Unicode string as 1 to 4 bytes, depending on the character's code point. The result is a bytes object.

Syntax:

inp_str.encode('UTF-8')

Where inp_str is the Unicode string.

Example:

In this example, we will convert the Unicode string u"Welcome to thisPointer" to UTF-8.

# Consider the unicode string
inp_str= u"Welcome to thisPointer"

# Convert unicode string to UTF-8 encoding
inp_str=inp_str.encode('UTF-8')
print("Converted String: ", inp_str)

Output:

Converted String:  b'Welcome to thisPointer'

In the output above, every character takes 1 byte because the string contains only ASCII characters, and the b prefix shows that the result is a bytes object. To get the Unicode string back, use the decode() method.

Syntax:

inp_str.decode('UTF-8')

Example:

In this example, we will encode the Unicode string u"Welcome to thisPointer" to UTF-8 and then decode it back to a Unicode string.

# Consider the unicode string
inp_str= u"Welcome to thisPointer"

# Convert unicode string to UTF-8 encoding
inp_str=inp_str.encode('UTF-8')
print("Converted String: ", inp_str)

# Convert back
inp_str=inp_str.decode('UTF-8')
print("Actual String: ", inp_str)

Output:

Converted String:  b'Welcome to thisPointer'
Actual String:  Welcome to thisPointer
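
The examples above use only ASCII characters, so each one fits in a single byte. As a minimal sketch of the 1 to 4 byte range, the following encodes a few sample characters that were chosen for illustration and are not part of the original example:

```python
# Characters written as escapes: A, e with acute accent, euro sign, grinning-face emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the number of bytes, and the bytes themselves.
    print(hex(ord(ch)), len(encoded), encoded)

utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(utf8_lengths)  # [1, 2, 3, 4]
```

The byte count grows with the code point, which is why the length of the encoded bytes can differ from the length of the string.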

Convert a Unicode string to UTF-16

Here, we will take a Unicode string and encode it to UTF-16 using the encode() method. UTF-16 encodes each character as 2 bytes, or as 4 bytes for characters outside the Basic Multilingual Plane.

Syntax:

inp_str.encode('UTF-16')

Where inp_str is the Unicode string.

Example:

In this example, we will convert the Unicode string u"Welcome to thisPointer" to UTF-16.

# Consider the unicode string
inp_str= u"Welcome to thisPointer"

# Convert unicode string to UTF-16 encoding
inp_str=inp_str.encode('UTF-16')
print("Converted String: ", inp_str)

Output:

Converted String:  b'\xff\xfeW\x00e\x00l\x00c\x00o\x00m\x00e\x00 \x00t\x00o\x00 \x00t\x00h\x00i\x00s\x00P\x00o\x00i\x00n\x00t\x00e\x00r\x00'

In the output above, each character takes 2 bytes. The leading \xff\xfe is the byte order mark (BOM) that the 'UTF-16' codec adds to record the byte order. To get the Unicode string back, use the decode() method.

Syntax:

inp_str.decode('UTF-16')

Example:

In this example, we will encode the Unicode string u"Welcome to thisPointer" to UTF-16 and then decode it back to a Unicode string.

# Consider the unicode string
inp_str= u"Welcome to thisPointer"

# Convert unicode string to UTF-16 encoding
inp_str=inp_str.encode('UTF-16')
print("Converted String: ", inp_str)

# Convert back
inp_str=inp_str.decode('UTF-16')
print("Actual String: ", inp_str)

Output:

Converted String:  b'\xff\xfeW\x00e\x00l\x00c\x00o\x00m\x00e\x00 \x00t\x00o\x00 \x00t\x00h\x00i\x00s\x00P\x00o\x00i\x00n\x00t\x00e\x00r\x00'
Actual String:  Welcome to thisPointer
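
The first two bytes of the UTF-16 output are a byte order mark. The sketch below, assuming Python 3, compares the 'utf-16' codec with 'utf-16-le', which fixes the byte order and writes no mark. The short strings used here are illustrative only.

```python
import codecs

text = "Hi"

# 'utf-16' writes a 2-byte byte order mark in the platform's byte order,
# followed by 2 bytes per character.
with_bom = text.encode("utf-16")
print(len(with_bom), with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

# 'utf-16-le' fixes little-endian order and writes no byte order mark.
little_endian = text.encode("utf-16-le")
print(little_endian)  # b'H\x00i\x00'

# A character outside the Basic Multilingual Plane needs 4 bytes (a surrogate pair).
emoji = "\U0001F600"
print(len(emoji.encode("utf-16-le")))  # 4
```

Decoding with 'utf-16' reads the mark and removes it, so the round trip returns the original string.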

Convert a Unicode string to UTF-32

Here, we will take a Unicode string and encode it to UTF-32 using the encode() method. UTF-32 encodes every character in the Unicode string as exactly 4 bytes.

Syntax:

inp_str.encode('UTF-32')

Where inp_str is the Unicode string.

Example:

In this example, we will convert the Unicode string u"Welcome to thisPointer" to UTF-32.

# Consider the unicode string
inp_str= u"Welcome to thisPointer"

# Convert unicode string to UTF-32 encoding
inp_str=inp_str.encode('UTF-32')
print("Converted String: ", inp_str)

Output:

Converted String:  b'\xff\xfe\x00\x00W\x00\x00\x00e\x00\x00\x00l\x00\x00\x00c\x00\x00\x00o\x00\x00\x00m\x00\x00\x00e\x00\x00\x00 \x00\x00\x00t\x00\x00\x00o\x00\x00\x00 \x00\x00\x00t\x00\x00\x00h\x00\x00\x00i\x00\x00\x00s\x00\x00\x00P\x00\x00\x00o\x00\x00\x00i\x00\x00\x00n\x00\x00\x00t\x00\x00\x00e\x00\x00\x00r\x00\x00\x00'

In the output above, each character takes 4 bytes, preceded by a 4-byte byte order mark (\xff\xfe\x00\x00). To get the Unicode string back, use the decode() method.

Syntax:

inp_str.decode('UTF-32')

Example:

In this example, we will encode the Unicode string u"Welcome to thisPointer" to UTF-32 and then decode it back to a Unicode string.

# Consider the unicode string
inp_str= u"Welcome to thisPointer"

# Convert unicode string to UTF-32 encoding
inp_str=inp_str.encode('UTF-32')
print("Converted String: ", inp_str)

# Convert back
inp_str=inp_str.decode('UTF-32')
print("Actual String: ", inp_str)

Output:

Converted String:  b'\xff\xfe\x00\x00W\x00\x00\x00e\x00\x00\x00l\x00\x00\x00c\x00\x00\x00o\x00\x00\x00m\x00\x00\x00e\x00\x00\x00 \x00\x00\x00t\x00\x00\x00o\x00\x00\x00 \x00\x00\x00t\x00\x00\x00h\x00\x00\x00i\x00\x00\x00s\x00\x00\x00P\x00\x00\x00o\x00\x00\x00i\x00\x00\x00n\x00\x00\x00t\x00\x00\x00e\x00\x00\x00r\x00\x00\x00'
Actual String:  Welcome to thisPointer
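
To compare the three encodings side by side, the following sketch measures the encoded size of the same string. It uses the '-le' codec variants, an assumption made here so that no byte order mark is counted.

```python
text = "Welcome to thisPointer"

# Encoded size in bytes for each encoding, without a byte order mark.
sizes = {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
print(len(text), sizes)

# The plain 'utf-32' codec adds a 4-byte byte order mark on top of that.
print(len(text.encode("utf-32")))  # 92
```

For this 22-character ASCII string the sizes are 22, 44, and 88 bytes, so UTF-8 is the most compact of the three for ASCII-only text.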

Summary

In this Python string article, we saw how to convert a Unicode string to a string using str(). We also saw how to encode strings to UTF-8, UTF-16, and UTF-32 with encode(), and how to decode the resulting bytes back to Unicode strings with decode(). Happy learning.
