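In this Python tutorial, you will learn how to convert a Unicode string to a string.
A Unicode string is a sequence of Unicode code points, so it can represent characters from virtually any writing system. In Python 2, a Unicode string literal is written by placing the character "u" in front of the string. In Python 3, every str is already a Unicode string, so the "u" prefix is optional and is accepted only for backward compatibility.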
Example:
u"Hello Varun"
Convert a Unicode string to a string using str()
Here, we will use str() to convert a Unicode string to a string. In Python 3, where str is already a Unicode type, this call returns the same text unchanged.
Syntax:
str(inp_str)
In this form, str() takes a single parameter.
Parameter:
inp_str is the Unicode string to convert.
Example 1:
In this example, we will convert the Unicode string u"Welcome to thisPointer" to a string using str().
# Consider the unicode string
inp_str = u"Welcome to thisPointer"
# Convert to string
print("Converted String:", str(inp_str))
Output:
Converted String: Welcome to thisPointer
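As a quick check of what str() does here, the following is a minimal sketch that assumes Python 3, where every str is already a Unicode string. The variable name converted is introduced only for illustration.

```python
# Assumes Python 3, where str is already a Unicode string type.
inp_str = u"Welcome to thisPointer"

# str() applied to a str gives back text equal to the original
converted = str(inp_str)

print(type(inp_str).__name__)    # str
print(type(converted).__name__)  # str
print(converted == inp_str)      # True
```

Both values have the same type and the same content, so in Python 3 the call mainly matters for code that must also run on Python 2.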
Convert a Unicode string to UTF-8
Here, we will take a Unicode string and encode it to UTF-8 using the encode() method. UTF-8 encodes each character in the Unicode string as 1 to 4 bytes. The number of bytes depends on the character: ASCII characters take 1 byte, and other characters take 2, 3, or 4 bytes.
Syntax:
inp_str.encode('UTF-8')
Where inp_str is the Unicode string.
Example:
In this example, we will convert the Unicode string u"Welcome to thisPointer" to UTF-8.
# Consider the unicode string
inp_str = u"Welcome to thisPointer"
# Convert unicode string to UTF-8 encoding
inp_str = inp_str.encode('UTF-8')
print("Converted String:", inp_str)
Output:
Converted String: b'Welcome to thisPointer'
All the characters in the above string are ASCII, so each one is encoded as a single byte in UTF-8. The b prefix in the output shows that the result is a bytes object. To get the Unicode string back, you can use the decode() method on the encoded bytes.
Syntax:
inp_str.decode('UTF-8')
Example:
In this example, we will convert the Unicode string u"Welcome to thisPointer" to UTF-8 and then decode it back to a Unicode string.
# Consider the unicode string
inp_str = u"Welcome to thisPointer"
# Convert unicode string to UTF-8 encoding
inp_str = inp_str.encode('UTF-8')
print("Converted String:", inp_str)
# Convert back
inp_str = inp_str.decode('UTF-8')
print("Actual String:", inp_str)
Output:
Converted String: b'Welcome to thisPointer'
Actual String: Welcome to thisPointer
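The 1-to-4-byte range of UTF-8 is easier to see with characters beyond ASCII. The sketch below uses four sample characters chosen purely for illustration (they do not appear in the examples above), and the names samples, utf8_sizes, and round_trip are hypothetical.

```python
# Sample characters chosen for illustration: A, é, €, 😀
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each character needs in UTF-8
utf8_sizes = [len(ch.encode("UTF-8")) for ch in samples]
print(utf8_sizes)  # [1, 2, 3, 4]

# Decoding the encoded bytes gives back the original characters
round_trip = [ch.encode("UTF-8").decode("UTF-8") for ch in samples]
print(round_trip == samples)  # True
```

The ASCII letter takes one byte, while the emoji, which lies outside the Basic Multilingual Plane, takes four.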
Convert a Unicode string to UTF-16
Here, we will take a Unicode string and encode it to UTF-16 using the encode() method. UTF-16 encodes each character as 2 bytes, or as 4 bytes (a surrogate pair) for characters outside the Basic Multilingual Plane.
Syntax:
inp_str.encode('UTF-16')
Where inp_str is the Unicode string.
Example:
In this example, we will convert the Unicode string u"Welcome to thisPointer" to UTF-16.
# Consider the unicode string
inp_str = u"Welcome to thisPointer"
# Convert unicode string to UTF-16 encoding
inp_str = inp_str.encode('UTF-16')
print("Converted String:", inp_str)
Output:
Converted String: b'\xff\xfeW\x00e\x00l\x00c\x00o\x00m\x00e\x00 \x00t\x00o\x00 \x00t\x00h\x00i\x00s\x00P\x00o\x00i\x00n\x00t\x00e\x00r\x00'
In the above output, the first two bytes (\xff\xfe) are the byte order mark (BOM), and each character that follows takes 2 bytes. To get the Unicode string back, you can use the decode() method.
Syntax:
inp_str.decode('UTF-16')
Example:
In this example, we will convert the Unicode string u"Welcome to thisPointer" to UTF-16 and then decode it back to a Unicode string.
# Consider the unicode string
inp_str = u"Welcome to thisPointer"
# Convert unicode string to UTF-16 encoding
inp_str = inp_str.encode('UTF-16')
print("Converted String:", inp_str)
# Convert back
inp_str = inp_str.decode('UTF-16')
print("Actual String:", inp_str)
Output:
Converted String: b'\xff\xfeW\x00e\x00l\x00c\x00o\x00m\x00e\x00 \x00t\x00o\x00 \x00t\x00h\x00i\x00s\x00P\x00o\x00i\x00n\x00t\x00e\x00r\x00'
Actual String: Welcome to thisPointer
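The leading \xff\xfe bytes in the UTF-16 output are a byte order mark. As a rough sketch, the snippet below compares the 'UTF-16' codec with the 'UTF-16-LE' codec, which fixes the byte order and writes no BOM. The names with_bom, without_bom, and emoji are hypothetical, and the emoji is a sample character chosen for illustration.

```python
import codecs

inp_str = u"Welcome to thisPointer"

# 'UTF-16' prepends a BOM in the machine's native byte order
with_bom = inp_str.encode("UTF-16")
# 'UTF-16-LE' uses little-endian order and writes no BOM
without_bom = inp_str.encode("UTF-16-LE")

print(with_bom[:2] == codecs.BOM_UTF16)  # True
print(len(with_bom), len(without_bom))   # 46 44

# A character outside the Basic Multilingual Plane needs 4 bytes
emoji = "\U0001F600"
print(len(emoji.encode("UTF-16-LE")))    # 4

# Decode with the same codec that was used to encode
print(without_bom.decode("UTF-16-LE") == inp_str)  # True
```

The 22-character string takes 44 bytes without the BOM and 46 with it, which matches 2 bytes per character.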
Convert a Unicode string to UTF-32
Here, we will take a Unicode string and encode it to UTF-32 using the encode() method. UTF-32 encodes every character in the Unicode string as exactly 4 bytes.
Syntax:
inp_str.encode('UTF-32')
Where inp_str is the Unicode string.
Example:
In this example, we will convert the Unicode string u"Welcome to thisPointer" to UTF-32.
# Consider the unicode string
inp_str = u"Welcome to thisPointer"
# Convert unicode string to UTF-32 encoding
inp_str = inp_str.encode('UTF-32')
print("Converted String:", inp_str)
Output:
Converted String: b'\xff\xfe\x00\x00W\x00\x00\x00e\x00\x00\x00l\x00\x00\x00c\x00\x00\x00o\x00\x00\x00m\x00\x00\x00e\x00\x00\x00 \x00\x00\x00t\x00\x00\x00o\x00\x00\x00 \x00\x00\x00t\x00\x00\x00h\x00\x00\x00i\x00\x00\x00s\x00\x00\x00P\x00\x00\x00o\x00\x00\x00i\x00\x00\x00n\x00\x00\x00t\x00\x00\x00e\x00\x00\x00r\x00\x00\x00'
In the above output, the first four bytes (\xff\xfe\x00\x00) are the byte order mark (BOM), and each character that follows takes 4 bytes. To get the Unicode string back, you can use the decode() method.
Syntax:
inp_str.decode('UTF-32')
Example:
In this example, we will convert the Unicode string u"Welcome to thisPointer" to UTF-32 and then decode it back to a Unicode string.
# Consider the unicode string
inp_str = u"Welcome to thisPointer"
# Convert unicode string to UTF-32 encoding
inp_str = inp_str.encode('UTF-32')
print("Converted String:", inp_str)
# Convert back
inp_str = inp_str.decode('UTF-32')
print("Actual String:", inp_str)
Output:
Converted String: b'\xff\xfe\x00\x00W\x00\x00\x00e\x00\x00\x00l\x00\x00\x00c\x00\x00\x00o\x00\x00\x00m\x00\x00\x00e\x00\x00\x00 \x00\x00\x00t\x00\x00\x00o\x00\x00\x00 \x00\x00\x00t\x00\x00\x00h\x00\x00\x00i\x00\x00\x00s\x00\x00\x00P\x00\x00\x00o\x00\x00\x00i\x00\x00\x00n\x00\x00\x00t\x00\x00\x00e\x00\x00\x00r\x00\x00\x00'
Actual String: Welcome to thisPointer
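To put the three encodings side by side, the sketch below measures how many bytes the same string occupies in each one. The dictionary name sizes is hypothetical, and the counts for 'UTF-16' and 'UTF-32' include the byte order mark that those codecs prepend.

```python
inp_str = u"Welcome to thisPointer"

# Encoded size in bytes for each encoding (BOM included where the codec adds one)
sizes = {enc: len(inp_str.encode(enc)) for enc in ("UTF-8", "UTF-16", "UTF-32")}

print(len(inp_str))  # 22
print(sizes)         # {'UTF-8': 22, 'UTF-16': 46, 'UTF-32': 92}
```

For this all-ASCII string, UTF-8 is the most compact: 1 byte per character, against 2 bytes plus a 2-byte BOM for UTF-16 and 4 bytes plus a 4-byte BOM for UTF-32.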
Summary
In this Python String article, we have seen how to convert a Unicode string to a string using str(). We also saw how to encode strings to UTF-8, UTF-16, and UTF-32 with the encode() method and how to decode the resulting bytes back to Unicode strings with the decode() method. Happy Learning.