Python String encode() function

In this article, we will discuss how to use the encode() function of the string class in Python.

Since Python 3.0, strings are stored as Unicode. Unicode is a standard in which each character is assigned an integer code point in the range 0 to 0x10FFFF. In Python, a string is a sequence of zero or more code points, i.e. Unicode characters. The benefit is that a string can hold characters from languages other than English, such as Hindi or German characters.
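To make the idea of code points concrete, here is a small sketch using the built-in ord() and chr() functions. The variable names are only illustrative, and the Hindi characters are written as \u escapes so that the snippet does not depend on the encoding of the source file.

```python
# The same sample text used in the examples of this article: 'This is -- भफऱ'
sample_str = 'This is -- \u092d\u092b\u0931'

# ord() gives the integer code point of a single character
code_points = [ord(ch) for ch in sample_str[-3:]]
print([hex(cp) for cp in code_points])  # ['0x92d', '0x92b', '0x931']

# chr() goes the other way, from a code point to a character
rebuilt = ''.join(chr(cp) for cp in code_points)
print(rebuilt == sample_str[-3:])  # True

# len() counts code points, not bytes
print(len(sample_str))  # 14
```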

In Python, the string class provides the function encode(), which converts a string to a bytes object in a given encoding.

Syntax of str.encode()

str.encode(encoding='utf-8', errors='strict')

Arguments:

  • encoding: The encoding in which the string has to be encoded, like ‘UTF-8’ or ‘ascii’ etc.
    • Default value is ‘UTF-8’
  • errors: It describes the behavior when encoding fails. The default value is ‘strict’, and the standard values are,
    • ignore – encode() drops the unencodable characters from the returned bytes.
    • replace – encode() replaces each unencodable character with a question mark (?) in the returned bytes.
    • xmlcharrefreplace – encode() inserts an XML character reference (like &#NNNN;) in place of each unencodable character.
    • backslashreplace – encode() inserts a backslash escape sequence (\xNN, \uNNNN or \UNNNNNNNN) in place of each unencodable character.
    • namereplace – encode() inserts a \N{…} escape sequence in place of each unencodable character.
    • strict – encode() raises a UnicodeEncodeError exception on failure. It is the default behavior if no argument value is provided for errors.
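The non-strict handlers listed above can be compared side by side. The following is a minimal sketch, assuming the same sample text that the examples in this article use; the results dictionary is only an illustrative name.

```python
# Same sample text as in the examples: 'This is -- भफऱ'
sample_str = 'This is -- \u092d\u092b\u0931'

handlers = ('ignore', 'replace', 'xmlcharrefreplace',
            'backslashreplace', 'namereplace')

# Encode to ascii once per error handler and collect the results
results = {}
for handler in handlers:
    results[handler] = sample_str.encode('ascii', errors=handler)
    print(handler, '->', results[handler])
```

With xmlcharrefreplace the three Hindi characters become &#2349;&#2347;&#2353;, and with backslashreplace they become \u092d\u092b\u0931 (shown as doubled backslashes when the bytes object is printed).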

Returns:

  • It returns an encoded version of the calling string object, as a bytes object. If the errors value is ‘strict’ and encoding fails, it raises UnicodeEncodeError.

Important Point: As strings are immutable in Python, encode() does not modify the calling string. It returns a new object, which is of type bytes and not str.
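A quick way to check what encode() gives back is to inspect the type of the result and decode it again. This is only a sketch; the names text, encoded and decoded are illustrative.

```python
# Same sample text as in the examples: 'This is -- भफऱ'
text = 'This is -- \u092d\u092b\u0931'

encoded = text.encode('utf-8')

print(type(text))     # <class 'str'>
print(type(encoded))  # <class 'bytes'>

# The original string is unchanged: 14 characters, encoded into 20 bytes
print(len(text), len(encoded))  # 14 20

# bytes.decode() converts the bytes back to the original string
decoded = encoded.decode('utf-8')
print(decoded == text)  # True
```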

Let’s check out some examples,

Example 1: Encode a string to Utf-8 encoding in python using encode()

By default, encode() converts the string into UTF-8 encoding, so passing the encoding parameter with value ‘UTF-8’ is optional. For example,

sample_str = 'This is -- भफऱ'

# Encode a string to Utf-8 encoding in python using encode()
sample_str = sample_str.encode(encoding='UTF-8')

print(sample_str)

Output:

b'This is -- \xe0\xa4\xad\xe0\xa4\xab\xe0\xa4\xb1'

It returned a UTF-8 encoded version of the string. The behaviour will be the same if you call encode() without any parameter, i.e.

sample_str = 'This is -- भफऱ'

# Encode a string to Utf-8 encoding in python using encode()
sample_str = sample_str.encode()

print(sample_str)

Output:

b'This is -- \xe0\xa4\xad\xe0\xa4\xab\xe0\xa4\xb1'

Example 2: Encode a string to ascii encoding using encode() and ignore errors

If some characters of the string cannot be encoded in the given encoding and the errors parameter is ‘ignore’, then encode() skips the unencodable characters and encodes the rest. For example,

sample_str = 'This is -- भफऱ'

# Encode a string to ascii and ignore the unencodable characters
sample_str = sample_str.encode(encoding='ascii', errors='ignore')

print(sample_str)

Output:

b'This is -- '

Example 3: Encode a string to ascii encoding using encode() and replace unencodable Unicode with ?

If some characters of the string cannot be encoded in the given encoding and the errors parameter is ‘replace’, then encode() replaces each unencodable character with ‘?’ in the returned bytes. For example,

sample_str = 'This is -- भफऱ'

# Encode a string to ascii & replace the unencodable characters with '?'
sample_str = sample_str.encode(encoding='ascii', errors='replace')

print(sample_str)

Output:

b'This is -- ???'

Example 4: Encode an unencodable string to ascii encoding with the default strict error handling

If the string cannot be encoded in the given encoding and the errors parameter is not provided, then encode() raises a UnicodeEncodeError. For example,

sample_str = 'This is -- भफऱ'

# Encode a string to ascii & raise an error if any character is unencodable
sample_str = sample_str.encode(encoding='ascii')

print(sample_str)

Error:

Traceback (most recent call last):
  File ".\encode.py", line 27, in <module>
    sample_str = sample_str.encode(encoding='ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-13: ordinal not in range(128)
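If the program should not stop on such an error, the exception can be caught. The following is one possible sketch, not the only way to do it: to_ascii() is a hypothetical helper written for this illustration, and falling back to errors='replace' is an assumption about what the caller wants.

```python
def to_ascii(text):
    """Hypothetical helper: encode to ascii, fall back to '?' on failure."""
    try:
        return text.encode('ascii')
    except UnicodeEncodeError as exc:
        # The exception records which part of the string could not be encoded
        bad_part = exc.object[exc.start:exc.end]
        print(f"Cannot encode {bad_part!r} at positions "
              f"{exc.start}-{exc.end - 1}: {exc.reason}")
        # Assumed fallback: replace unencodable characters with '?'
        return text.encode('ascii', errors='replace')


# Same sample text as in the examples: 'This is -- भफऱ'
sample_str = 'This is -- \u092d\u092b\u0931'

print(to_ascii(sample_str))  # b'This is -- ???'
print(to_ascii('plain'))     # b'plain'
```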

Summary:

Today, we learned how to use the encode() function of the string class in Python.
