numpy.where()

In this article we will discuss how np.where() works in Python with the help of various examples.

Python’s NumPy module provides a function to select elements from two different sequences based on a condition on another NumPy array, i.e.

Syntax of np.where()

numpy.where(condition[, x, y])

Arguments:

  • condition: A conditional expression that evaluates to a NumPy array of bool values.
  • x, y: Arrays (optional, i.e. either both are passed or neither is).
    • If x & y are passed, np.where() returns elements selected from x where the condition is True and from y where it is False.

Returns:

  • If the x & y arguments are passed, it returns a new NumPy array built by selecting items from x & y according to the result of applying the condition to the original array.
  • If x & y are not passed and only the condition argument is passed, it returns the indices of the elements that are True in the bool array. The result is always a tuple of arrays, one for each axis, so a one-dimensional array gives a tuple containing a single array.
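
As a quick illustration of the two return forms described above, here is a minimal sketch. The array values and the names cond, picked and indices are made up for this illustration and are not part of the examples that follow.

```python
import numpy as np

arr = np.array([3, 8, 5, 10])
cond = arr > 5                      # bool array: [False, True, False, True]

# Three-argument form: take from x (arr) where cond is True, else from y (-arr)
picked = np.where(cond, arr, -arr)
print(picked.tolist())              # [-3, 8, -5, 10]

# One-argument form: a tuple of index arrays, one per dimension
indices = np.where(cond)
print(type(indices).__name__)       # tuple
print(indices[0].tolist())          # [1, 3]
```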

Let’s understand this with some examples.

Using numpy.where() with single condition

Suppose we have a numpy array and two lists of the same size,

arr = np.array([11, 12, 13, 14])

high_values = ['High', 'High', 'High', 'High']
low_values = ['Low', 'Low', 'Low', 'Low']

Now we want to convert this NumPy array arr to another array of the same size that contains values from the lists high_values and low_values. If a value in arr is greater than 12, it is replaced with the corresponding value from high_values, i.e. ‘High’. If a value in arr is less than or equal to 12, it is replaced with the corresponding value from low_values, i.e. ‘Low’. So, our new NumPy array should look like this,

['Low' 'Low' 'High' 'High']

We can do this using for loops and conditions, but np.where() is designed for exactly this kind of scenario. So, let’s use np.where() to get this done,

import numpy as np

# Create a Numpy array from a list
arr = np.array([11, 12, 13, 14])

high_values = ['High', 'High', 'High', 'High']
low_values = ['Low', 'Low', 'Low', 'Low']

# numpy where() with condition, x and y arguments
result = np.where(arr > 12, high_values, low_values)

print(result)

Output:

['Low' 'Low' 'High' 'High']

Here we converted the NumPy array arr to another array by picking values from two different lists based on a condition on the original array. For the first two values in arr the condition evaluated to False because they were not greater than 12, so np.where() selected the elements from the 2nd list, i.e. low_values. For the next two values in arr the condition evaluated to True because they were greater than 12, so it selected the elements from the 1st list, i.e. high_values.

Let’s understand in detail how it worked.

We passed three arguments to np.where(). The first argument is the condition on the NumPy array arr, which evaluates to a bool array, i.e.

arr > 12 ==> [False False True True]

Then numpy.where() went through the bool array position by position: for every True it took the corresponding element from the 1st list, i.e. high_values, and for every False it took the corresponding element from the 2nd list, i.e. low_values, i.e.

[False False True True] ==> ['Low' 'Low' 'High' 'High']

So, this is how we can use np.where() to process the contents of numpy array and create a new array based on condition on the original array.
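
To make the selection rule concrete, the sketch below compares np.where() with a plain Python list comprehension that applies the same "x if condition else y" rule element by element. The names mask, manual and vectorized are introduced only for this illustration. The comprehension shows the logic; it is not how NumPy implements the function internally.

```python
import numpy as np

arr = np.array([11, 12, 13, 14])
high_values = ['High', 'High', 'High', 'High']
low_values = ['Low', 'Low', 'Low', 'Low']

# The condition is evaluated first and produces a bool array
mask = arr > 12
print(mask.tolist())        # [False, False, True, True]

# Pure-Python equivalent of the selection rule
manual = [h if m else l for m, h, l in zip(mask, high_values, low_values)]

# The same selection done by np.where()
vectorized = np.where(mask, high_values, low_values)

print(manual)               # ['Low', 'Low', 'High', 'High']
print(vectorized.tolist())  # ['Low', 'Low', 'High', 'High']
```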

Using numpy.where() with multiple conditions

In the previous example we used a single condition in the np.where(), but we can use multiple conditions too inside the numpy.where(). For example,

# Create a numpy array from list
arr = np.array([11, 12, 14, 15, 16, 17])

# pass a combined condition expression along with x and y
result = np.where((arr > 12) & (arr < 16),
                  ['A', 'A', 'A', 'A', 'A', 'A'],
                  ['B', 'B', 'B', 'B', 'B', 'B'])

print(result)

Output:

['B' 'B' 'A' 'A' 'B' 'B']

Here we applied multiple conditions to the array arr, and combining them with & produced a single bool array. Then numpy.where() took the corresponding element from the first list for every True and the corresponding element from the 2nd list for every False, and built a new array from the selected values, i.e.

  • Values in arr for which the conditional expression returns True are 14 & 15, so these are replaced by the corresponding values in the first list.
  • Values in arr for which the conditional expression returns False are 11, 12, 16 & 17, so these are replaced by the corresponding values in the second list.

Example 2:

In the examples above, each list contained a single repeated value, but the lists can contain different values too, i.e.

# Create a numpy array from list
arr = np.array([11, 12, 14, 15, 16, 17])

# pass a combined condition expression along with x and y
result = np.where((arr > 12) & (arr < 16),
                  ['A', 'B', 'C', 'D', 'E', 'F'],
                  [1, 2, 3, 4, 5, 6])

print(result)

Output:

['1' '2' 'C' 'D' '5' '6']

It returned a new array built from the values selected from both lists based on the result of the multiple conditions on the NumPy array arr. Note that the integers from the second list appear as strings in the output, because a NumPy array holds a single data type, i.e.

  • Values in arr for which the conditional expression returns True are 14 & 15, so these are replaced by the corresponding values in the first list (‘C’ & ‘D’).
  • Values in arr for which the conditional expression returns False are 11, 12, 16 & 17, so these are replaced by the corresponding values in the second list (1, 2, 5 & 6).
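
The conditions above are combined with the element-wise & operator, and each comparison is wrapped in parentheses because & binds more tightly than > and <. Python’s and / or keywords do not work element-wise on arrays and raise a ValueError instead. The following sketch shows a few ways to combine conditions. The variable names and the 'X' / 'Y' labels are chosen only for this illustration.

```python
import numpy as np

arr = np.array([11, 12, 14, 15, 16, 17])

# Element-wise AND, written with the operator and with the function form
both = (arr > 12) & (arr < 16)
same_as_both = np.logical_and(arr > 12, arr < 16)

# Element-wise OR and NOT
either = (arr < 12) | (arr > 16)
negated = ~both

print(both.tolist())     # [False, False, True, True, False, False]
print(either.tolist())   # [True, False, False, False, False, True]
print(negated.tolist())  # [True, True, False, False, True, True]

# Any of these bool arrays can be used as the condition in np.where()
labels = np.where(either, 'X', 'Y')
print(labels.tolist())   # ['X', 'Y', 'Y', 'Y', 'Y', 'X']
```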

Use np.where() to select indexes of elements that satisfy multiple conditions

Suppose we have a new numpy array,

arr = np.array([11, 12, 13, 14, 15, 16, 17, 15, 11, 12, 14, 15, 16, 17])

Now we want to find the indexes of elements in this array that satisfy our given condition i.e. element should be greater than 12 but less than 16. For this we can use the np.where() by passing the condition argument only i.e.

# Create a numpy array from list
arr = np.array([11, 12, 13, 14, 15, 16, 17, 15, 11, 12, 14, 15, 16, 17])

# pass condition expression only
result = np.where((arr > 12) & (arr < 16))

print(result)

Output:

(array([ 2,  3,  4,  7, 10, 11], dtype=int64),)

It returned a tuple containing an array of indexes where condition evaluated to True in the original array arr.

How did it work?

In this case the condition expression is evaluated to a bool NumPy array, which is then passed to numpy.where(). where() returned a tuple of arrays, i.e. one for each dimension. As our array was one-dimensional, the tuple contained a single element: a new array holding the indices where the value was True in the bool array, i.e. the indexes of the items in the original array arr whose value is greater than 12 and less than 16.
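
The "one array per dimension" behaviour is easier to see with a two-dimensional input. The sketch below uses a small made-up array named grid, which does not appear elsewhere in this article. The returned tuple holds one array of row indices and one of column indices, and together they can be used to index back into the original array.

```python
import numpy as np

grid = np.array([[11, 14, 17],
                 [15, 12, 13]])

# Two dimensions, so the tuple has two index arrays
rows, cols = np.where((grid > 12) & (grid < 16))

print(rows.tolist())              # [0, 1, 1]
print(cols.tolist())              # [1, 0, 2]

# Use the index arrays to fetch the matching values
print(grid[rows, cols].tolist())  # [14, 15, 13]
```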

Using np.where() without any condition expression

In all the previous examples we passed a condition expression as the first argument, which is evaluated to a bool array. But we can also pass a bool array directly instead,

result = np.where([True, False, False],
                  [1, 2, 4],
                  [7, 8, 9])
print(result)

Output:

[1 8 9]

numpy.where() iterates over the bool array and for every True it yields corresponding element from the first list and for every False it yields corresponding element from the second list.

So, basically it returns an array with elements from the first list where the condition is True, and elements from the second list elsewhere.

Important Points about np.where()

  • We can either pass all 3 arguments or pass the condition argument only. Passing exactly two arguments to numpy.where() is not allowed.
  • The first argument is treated as a boolean array, which where() gets from evaluating the condition expression.
  • If we pass all 3 arguments to numpy.where(), the three arrays must have the same shape or be broadcastable to a common shape (a scalar, for example, is accepted for x or y). Otherwise it raises the following error,
    • ValueError: operands could not be broadcast together with shapes
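
A small sketch of how these rules behave in practice: scalars are accepted for x and y, sequences whose lengths cannot be matched to the condition raise the error above, and passing x without y is rejected. The names labels, shape_error and two_arg_error are illustrative only.

```python
import numpy as np

arr = np.array([11, 12, 13, 14])

# Scalars broadcast against the condition, so repeated lists are not required
labels = np.where(arr > 12, 'High', 'Low')
print(labels.tolist())   # ['Low', 'Low', 'High', 'High']

# A length-3 x cannot be broadcast against a length-4 condition
shape_error = None
try:
    np.where(arr > 12, ['High', 'High', 'High'], ['Low', 'Low', 'Low', 'Low'])
except ValueError as exc:
    shape_error = str(exc)
print(shape_error)

# Passing x without y is rejected as well
two_arg_error = None
try:
    np.where(arr > 12, 'High')
except ValueError as exc:
    two_arg_error = str(exc)
print(two_arg_error)
```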

Further Learning:

Find the index of value in Numpy Array using numpy.where()

Conclusion:

In this article we discussed how np.where() works and how we can use it to construct a new NumPy array based on conditions on another array.
