In this article, we will discuss what a structured numpy array is, how to create one, and how to sort it using different functions.
What is a Structured Numpy Array?
A structured numpy array is an array of structures (similar to a C struct). Numpy arrays are homogeneous, i.e. they can contain data of only one type. So, instead of creating a numpy array of int or float, we can also create a numpy array of structures that all share the same layout.
Let’s understand this with an example.
Suppose we want to create a numpy array whose elements have the following structure:
struct {
    char name[10];
    float marks;
    int gradeLevel;
}
It means each element in the numpy array should be a structure of the above type. Numpy arrays of this kind are called structured numpy arrays.
Let’s see how to create that,
Creating a Structured Numpy Array
First of all, import the numpy module, i.e.
import numpy as np
Now, to create a structured numpy array, we can pass a list of tuples containing the structure elements, i.e.
[('Sam', 33.3, 3), ('Mike', 44.4, 5), ('Aadi', 66.6, 6), ('Riti', 88.8, 7)]
But since the elements of a numpy array are homogeneous, how will the size and type of the structure be decided?
For that, we need to pass the type of the structure, i.e. its schema, in the dtype parameter. Let’s create a dtype for the above structure, i.e.
# Creating the type of a structure
dtype = [('Name', (np.str_, 10)), ('Marks', np.float64), ('GradeLevel', np.int32)]
Let’s create a numpy array based on this dtype i.e.
# Creating a Structured Numpy array
structuredArr = np.array([('Sam', 33.3, 3), ('Mike', 44.4, 5), ('Aadi', 66.6, 6), ('Riti', 88.8, 7)], dtype=dtype)
It will create a structured numpy array and its contents will be,
[('Sam', 33.3, 3) ('Mike', 44.4, 5) ('Aadi', 66.6, 6) ('Riti', 88.8, 7)]
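To see why the dtype matters, it may help to compare what numpy does with the same list of tuples when no dtype is given. The snippet below is a minimal sketch; the names records and plainArr are only illustrative and are not part of the original example.

```python
import numpy as np

records = [('Sam', 33.3, 3), ('Mike', 44.4, 5), ('Aadi', 66.6, 6), ('Riti', 88.8, 7)]

# Without a dtype, numpy treats the tuples as rows of a 2D array and
# converts every value to one common type (here, strings)
plainArr = np.array(records)
print(plainArr.shape)       # (4, 3)
print(plainArr.dtype.kind)  # U (unicode string)

# With the structured dtype, each tuple becomes a single element with typed fields
dtype = [('Name', (np.str_, 10)), ('Marks', np.float64), ('GradeLevel', np.int32)]
structuredArr = np.array(records, dtype=dtype)
print(structuredArr.shape)  # (4,)
```

So the dtype is what tells numpy to keep each tuple together as one record instead of spreading it across columns.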
Let’s check the data type of the numpy array created above,
print(structuredArr.dtype)
Output:
[('Name', '<U10'), ('Marks', '<f8'), ('GradeLevel', '<i4')]
It is the structure type, specifying a structure made of a Unicode string of up to 10 characters ('<U10'), a 64-bit float ('<f8') and a 32-bit int ('<i4').
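As a quick illustration of what this dtype gives us, the sketch below reads individual fields by name and checks the size of one element. The variable names field_names, names, marks and item_size are introduced here only for illustration.

```python
import numpy as np

# Same schema as above: 10-character string, float64, int32
dtype = [('Name', (np.str_, 10)), ('Marks', np.float64), ('GradeLevel', np.int32)]
structuredArr = np.array([('Sam', 33.3, 3), ('Mike', 44.4, 5),
                          ('Aadi', 66.6, 6), ('Riti', 88.8, 7)], dtype=dtype)

# Field names of the structure
field_names = structuredArr.dtype.names

# Selecting one field gives a regular array of that field's type
names = structuredArr['Name']
marks = structuredArr['Marks']

# Each element occupies a fixed number of bytes:
# 10 * 4 (U10) + 8 (f8) + 4 (i4) = 52
item_size = structuredArr.dtype.itemsize

print(field_names)
print(names)
print(marks)
print(item_size)
```

Because every element has the same fixed size, the array stays homogeneous even though each element holds three different kinds of values.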
How to Sort a Structured Numpy Array?
Suppose we have a very big structured numpy array and we want to sort it based on specific fields of the structure. For this, both numpy.sort() and numpy.ndarray.sort() provide a parameter ‘order’, which accepts a single field name or a list of field names. The structured numpy array is then sorted by the given fields, compared in the order in which they are listed.
Let’s see how to do that,
Sort the Structured Numpy array by field ‘Name’ of the structure
# Sort the Structured Numpy array by field 'Name' of the structure
modArr = np.sort(structuredArr, order='Name')
print('Sorted Array : ')
print(modArr)
Output:
Sorted Array :
[('Aadi', 66.6, 6) ('Mike', 44.4, 5) ('Riti', 88.8, 7) ('Sam', 33.3, 3)]
It sorted all the elements in this structured numpy array based on the first field of the structure, i.e. ‘Name’.
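The two functions differ in one practical way: numpy.sort() returns a sorted copy, while numpy.ndarray.sort() sorts the array in place. The following sketch shows the difference; the names arr, sortedCopy, first_before and first_after are only illustrative.

```python
import numpy as np

dtype = [('Name', (np.str_, 10)), ('Marks', np.float64), ('GradeLevel', np.int32)]
arr = np.array([('Sam', 33.3, 3), ('Mike', 44.4, 5),
                ('Aadi', 66.6, 6), ('Riti', 88.8, 7)], dtype=dtype)

# np.sort() returns a sorted copy; arr keeps its original order
sortedCopy = np.sort(arr, order='Name')
first_before = str(arr['Name'][0])

# ndarray.sort() sorts arr itself and returns None
result = arr.sort(order='Name')
first_after = str(arr['Name'][0])

print(first_before)  # Sam
print(first_after)   # Aadi
print(result)        # None
```

For a very big array, the in-place version avoids holding a second copy in memory, at the cost of losing the original order.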
Sort the Structured Numpy array by field ‘Marks’ of the structure
# Sort the Structured Numpy array by field 'Marks' of the structure
modArr = np.sort(structuredArr, order='Marks')
print('Sorted Array : ')
print(modArr)
Output:
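Sorted Array :
[('Sam', 33.3, 3) ('Mike', 44.4, 5) ('Aadi', 66.6, 6) ('Riti', 88.8, 7)]
It sorted all the elements in this structured numpy array based on the second field of the structure, i.e. ‘Marks’. Since the original array was already in ascending order of marks, the output matches the original order.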
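The sort above is ascending. One common way to get a descending order, shown here only as a sketch and not as part of the original example, is to reverse the ascending result with a slice. The name descArr is illustrative.

```python
import numpy as np

dtype = [('Name', (np.str_, 10)), ('Marks', np.float64), ('GradeLevel', np.int32)]
structuredArr = np.array([('Sam', 33.3, 3), ('Mike', 44.4, 5),
                          ('Aadi', 66.6, 6), ('Riti', 88.8, 7)], dtype=dtype)

# Sort ascending by 'Marks', then reverse to get highest marks first
descArr = np.sort(structuredArr, order='Marks')[::-1]

print('Sorted Array (descending by Marks) : ')
print(descArr)
```

Note that reversing also reverses the relative order of elements with equal marks, which does not matter here because all marks are distinct.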
Sort the Structured Numpy array by ‘Name’ & ‘GradeLevel’ fields of the structure
# Sort by Name & GradeLevel
modArr = np.sort(structuredArr, order=['Name', 'GradeLevel'])
print('Sorted Array : ')
print(modArr)
Output:
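Sorted Array :
[('Aadi', 66.6, 6) ('Mike', 44.4, 5) ('Riti', 88.8, 7) ('Sam', 33.3, 3)]
It sorted all the elements in this structured numpy array based on multiple fields of the structure, i.e. ‘Name’ first and then ‘GradeLevel’. Since all the names in this array are unique, ‘GradeLevel’ is never needed to break a tie, so the result is the same as sorting by ‘Name’ alone.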
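The second field only matters when two elements have the same value in the first field. The sketch below uses hypothetical data with a repeated name (not the data from the example above) to show ‘GradeLevel’ acting as the tie-breaker.

```python
import numpy as np

dtype = [('Name', (np.str_, 10)), ('Marks', np.float64), ('GradeLevel', np.int32)]

# Hypothetical data: 'Sam' appears twice, so the second sort key is used
arr = np.array([('Sam', 33.3, 7), ('Mike', 44.4, 5),
                ('Sam', 55.5, 3), ('Aadi', 66.6, 6)], dtype=dtype)

# Compare by 'Name' first; for equal names, compare by 'GradeLevel'
modArr = np.sort(arr, order=['Name', 'GradeLevel'])

print('Sorted Array : ')
print(modArr)
```

Here the two ‘Sam’ entries end up next to each other, with grade level 3 placed before grade level 7.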
Structured numpy arrays are useful when you want to load a big CSV file into a single numpy array and perform operations on it.
The complete example is as follows,
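As a rough sketch of that use case, the snippet below reads CSV data into a structured array with numpy.genfromtxt() and sorts it by one field. The CSV content is made up for illustration and is read from an in-memory string so the example is self-contained; in practice a file path would be passed instead. The names csv_text, csvArr and byMarks are assumptions of this sketch.

```python
import io
import numpy as np

# Hypothetical CSV content; a real file path could be passed instead of StringIO
csv_text = "Name,Marks,GradeLevel\nSam,33.3,3\nRiti,88.8,7\nMike,44.4,5\nAadi,66.6,6\n"

# One field per CSV column
dtype = [('Name', 'U10'), ('Marks', 'f8'), ('GradeLevel', 'i4')]

# skip_header=1 skips the column-title row; each remaining row becomes one element
csvArr = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                       skip_header=1, dtype=dtype, encoding='utf-8')

# Sort the loaded records by the 'Marks' field
byMarks = np.sort(csvArr, order='Marks')

print(csvArr.dtype)
print(byMarks)
```

Each column keeps its own type after loading, so numeric fields can be sorted or aggregated directly without converting from strings.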
import numpy as np

def main():
    print('*** Creating a Structured Numpy Array ***')
    # Creating the type of a structure
    dtype = [('Name', (np.str_, 10)), ('Marks', np.float64), ('GradeLevel', np.int32)]
    # Creating a Structured Numpy array
    structuredArr = np.array([('Sam', 33.3, 3), ('Mike', 44.4, 5), ('Aadi', 66.6, 6), ('Riti', 88.8, 7)], dtype=dtype)
    print('Contents of the Structured Numpy Array : ')
    print(structuredArr)
    print('Data type of the Structured Numpy Array : ')
    print(structuredArr.dtype)

    print('*** Sorting a Structured Numpy Array by <Name> field ***')
    # Sort the Structured Numpy array by field 'Name' of the structure
    modArr = np.sort(structuredArr, order='Name')
    print('Sorted Array : ')
    print(modArr)

    print('*** Sorting a Structured Numpy Array by <Marks> field ***')
    # Sort the Structured Numpy array by field 'Marks' of the structure
    modArr = np.sort(structuredArr, order='Marks')
    print('Sorted Array : ')
    print(modArr)

    print('*** Sorting a Structured Numpy Array by <Name> & <GradeLevel> fields ***')
    # Sort by Name & GradeLevel
    modArr = np.sort(structuredArr, order=['Name', 'GradeLevel'])
    print('Sorted Array : ')
    print(modArr)

if __name__ == '__main__':
    main()
Output:
*** Creating a Structured Numpy Array ***
Contents of the Structured Numpy Array :
[('Sam', 33.3, 3) ('Mike', 44.4, 5) ('Aadi', 66.6, 6) ('Riti', 88.8, 7)]
Data type of the Structured Numpy Array :
[('Name', '<U10'), ('Marks', '<f8'), ('GradeLevel', '<i4')]
*** Sorting a Structured Numpy Array by <Name> field ***
Sorted Array :
[('Aadi', 66.6, 6) ('Mike', 44.4, 5) ('Riti', 88.8, 7) ('Sam', 33.3, 3)]
*** Sorting a Structured Numpy Array by <Marks> field ***
Sorted Array :
[('Sam', 33.3, 3) ('Mike', 44.4, 5) ('Aadi', 66.6, 6) ('Riti', 88.8, 7)]
*** Sorting a Structured Numpy Array by <Name> & <GradeLevel> fields ***
Sorted Array :
[('Aadi', 66.6, 6) ('Mike', 44.4, 5) ('Riti', 88.8, 7) ('Sam', 33.3, 3)]