Introduction to Data and Plotting

Introduction to NumPy

Overview

  • Teaching: 15 min
  • Exercises: 10 min

Questions

  • What is NumPy?
  • Why should I use it?

Objectives

  • Use NumPy to convert lists to NumPy arrays.
  • Use NumPy to create arrays from scratch.
  • Manipulate and reshape NumPy arrays.

NumPy ('Numerical Python') is the standard module for numerical work in Python. Its main feature is its array data type, which allows very compact and efficient storage of homogeneous data (data that is all of the same type).

There is a standard convention for importing NumPy, which is to give it the short name np:

In [1]:
import numpy as np

Now that we have access to the numpy package we can start using its features.
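
To make the idea of homogeneous storage concrete, here is a minimal sketch that inspects the data type of two small arrays. The variable names ints and mixed are purely illustrative, and the exact integer type reported depends on your platform.

```python
import numpy as np

# Every element of an array shares a single data type (dtype).
ints = np.array([1, 2, 3])
print(ints.dtype)      # a platform-dependent integer type

# Mixing integers and floats promotes every element to a float type.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)     # float64

# itemsize is the number of bytes per element; nbytes is the total for the data.
print(mixed.itemsize, mixed.nbytes)
```

Because every element has the same size, NumPy can store the values side by side in one block of memory rather than as separate Python objects.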

Documentation

As you go through this material, you may find it useful to refer to the NumPy documentation, particularly the array objects section.

Creating arrays from lists

In many ways a NumPy array can be treated like a standard Python list, and much of the way you interact with it is identical. Given a list, you can create an array as follows:

In [2]:
python_list = [1, 2, 3, 4, 5, 6, 7, 8]
numpy_array = np.array(python_list)
print(numpy_array)
[1 2 3 4 5 6 7 8]
In [3]:
# ndim gives the number of dimensions
print(numpy_array.ndim)
Out[3]:
1
In [4]:
# the shape of an array is a tuple of its length in each dimension. In this case it is only 1-dimensional
print(numpy_array.shape)
Out[4]:
(8,)
In [5]:
# as in standard Python, len() gives a sensible answer
print(len(numpy_array))
Out[5]:
8
In [6]:
nested_list = [[1, 2, 3], [4, 5, 6]]
two_dim_array = np.array(nested_list)
print(two_dim_array)
[[1 2 3]
 [4 5 6]]
In [7]:
print(two_dim_array.ndim)
Out[7]:
2
In [8]:
print(two_dim_array.shape)
Out[8]:
(2, 3)
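
As a short sketch of the list-like behaviour described above, the following recreates the two arrays from this section and indexes them the way you would a list. It also shows what len() reports for a two-dimensional array, which the cells above do not cover.

```python
import numpy as np

numpy_array = np.array([1, 2, 3, 4, 5, 6, 7, 8])
two_dim_array = np.array([[1, 2, 3], [4, 5, 6]])

# Indexing and slicing follow the same rules as Python lists.
first = numpy_array[0]        # first element
last = numpy_array[-1]        # last element
middle = numpy_array[2:5]     # elements at positions 2, 3 and 4
print(first, last, middle)

# For a 2-D array, len() counts rows (the first dimension),
# while size counts every element.
print(len(two_dim_array))     # 2
print(two_dim_array.size)     # 6
```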

Creating arrays from scratch

When working with data, it is very common not to have it in a Python list already but to want to create some data from scratch. NumPy comes with a whole suite of functions for creating arrays. We will now run through some of the most commonly used.

The first is np.arange (meaning "array range"), which works in a very similar fashion to the standard Python range() function: it defaults to starting from zero, it doesn't include the number at the top of the range, and it allows you to specify a 'step':

In [9]:
np.arange(10)  # 0 .. n-1  (!)
Out[9]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [10]:
np.arange(1, 9, 2) # start, end (exclusive), step
Out[10]:
array([1, 3, 5, 7])
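
The similarity to range() can be checked directly. The sketch below compares the two for the same arguments, and also shows one difference: arange accepts a non-integer step. The variable names are illustrative only.

```python
import numpy as np

# The same start, stop and step give the same values as range().
from_range = list(range(1, 9, 2))
from_arange = np.arange(1, 9, 2)
print(from_range)             # [1, 3, 5, 7]
print(from_arange.tolist())   # [1, 3, 5, 7]

# Unlike range(), arange also accepts a non-integer step.
halves = np.arange(0, 2, 0.5)
print(halves.tolist())        # [0.0, 0.5, 1.0, 1.5]
```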

Next up is np.linspace (meaning "linear space"), which generates a given number of evenly spaced floating point numbers, starting from the first argument and going up to the second argument. The third argument defines how many numbers to create:

In [11]:
np.linspace(0, 1, 6)   # start, end, num-points
Out[11]:
array([0. , 0.2, 0.4, 0.6, 0.8, 1. ])

Note how it included the end point, unlike arange(). You can change this behaviour by using the endpoint argument:

In [12]:
np.linspace(0, 1, 5, endpoint=False)
Out[12]:
array([0. , 0.2, 0.4, 0.6, 0.8])
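
To see why 6 points with the end point and 5 points without it land on the same values, it can help to look at the spacing. This sketch uses the retstep argument of np.linspace, which is not used elsewhere in this lesson, to return the step alongside the array.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points.
points, step = np.linspace(0, 1, 6, retstep=True)
print(step)        # (1 - 0) / (6 - 1)

# With endpoint=False the interval is split into num equal steps instead.
open_points, open_step = np.linspace(0, 1, 5, endpoint=False, retstep=True)
print(open_step)   # (1 - 0) / 5
```

In both cases the spacing works out to 0.2, so the two arrays agree everywhere except for the final 1.0.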

np.ones creates an n-dimensional array filled with the value 1.0. The argument you give to the function defines the shape of the array:

In [13]:
np.ones((3, 3))  # reminder: (3, 3) is a tuple
Out[13]:
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

Likewise, you can create an array of any size filled with zeros:

In [14]:
np.zeros((2, 2))
Out[14]:
array([[0., 0.],
       [0., 0.]])
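
Both functions produce floating point values by default, as the decimal points in the output show. As a small sketch, the optional dtype argument asks for a different element type, and a single integer in place of the tuple gives a one-dimensional array. The variable names are illustrative.

```python
import numpy as np

# By default ones and zeros produce floating point values.
float_zeros = np.zeros((2, 2))
print(float_zeros.dtype)   # float64

# The dtype argument requests a different element type.
int_ones = np.ones((2, 3), dtype=int)
print(int_ones)

# A single integer instead of a tuple gives a 1-D array.
flat = np.zeros(4)
print(flat.shape)          # (4,)
```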

The np.eye function (named after the mathematical identity matrix, commonly labelled I) creates a square matrix of a given size with 1.0 on the diagonal and 0.0 elsewhere:

In [15]:
np.eye(3)
Out[15]:
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

The np.diag function creates a square matrix with the given values on the diagonal and 0 elsewhere:

In [16]:
np.diag([1, 2, 3, 4])
Out[16]:
array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])
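
The two functions are closely related, which the following sketch illustrates. It also shows a second behaviour of np.diag not used above: when given a two-dimensional array, it extracts the diagonal instead of building a matrix.

```python
import numpy as np

# Given a 1-D sequence, diag builds a square matrix around it.
matrix = np.diag([1, 2, 3, 4])

# Given a 2-D array, diag goes the other way and extracts the diagonal.
diagonal = np.diag(matrix)
print(diagonal)            # [1 2 3 4]

# eye(n) matches diag applied to n ones.
same = np.array_equal(np.eye(3), np.diag(np.ones(3)))
print(same)                # True
```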

Finally, you can fill an array with random numbers:

In [17]:
np.random.rand(4)  # uniform in [0, 1): 0 is included, 1 is not
Out[17]:
array([0.10694928, 0.88985274, 0.63606749, 0.59386516])
In [18]:
np.random.randn(4)  # standard normal (Gaussian) distribution
Out[18]:
array([-1.81972346, -0.13515826,  1.95490428,  0.70545204])

Try executing these cells multiple times and notice how you get a different result each time.
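
If you need the same "random" numbers on every run, for example to make an analysis repeatable, you can seed the generator. The sketch below uses np.random.default_rng, NumPy's newer generator interface, rather than the rand and randn functions shown above. The seed value 42 is an arbitrary choice.

```python
import numpy as np

# Two generators created with the same (arbitrary) seed ...
first_rng = np.random.default_rng(42)
second_rng = np.random.default_rng(42)

# ... produce identical sequences of uniform numbers in [0, 1).
first_uniform = first_rng.random(4)
second_uniform = second_rng.random(4)
print(np.array_equal(first_uniform, second_uniform))   # True

# standard_normal draws from the Gaussian distribution, like randn.
gaussian = first_rng.standard_normal(4)
print(gaussian.shape)                                  # (4,)
```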

print()

In each of these examples we have omitted the print(). How does including it change the output of the cell?

Different arrays

  • Create at least one one-dimensional array with each of arange, linspace and ones.
  • Create at least one two-dimensional array with each of zeros, eye and diag.
  • Create at least two arrays with different types of random numbers (e.g. uniform and Gaussian random numbers).
  • Look at the function np.empty. What does it do? When might this be useful?


Reshaping arrays

Behind the scenes, a multi-dimensional NumPy array is just stored as a linear segment of memory. The fact that it is presented as having more than one dimension is simply a layer of interpretation on top of that (sometimes called a view). This means that we can change that interpretive layer, and so change the shape of an array, very quickly (i.e. usually without NumPy having to copy any data around).

This is mostly done with the reshape() method on the array object:

In [19]:
my_array = np.arange(16)
my_array
Out[19]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15])
In [20]:
my_array.shape
Out[20]:
(16,)
In [21]:
my_array.reshape((2, 8))
Out[21]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15]])
In [22]:
my_array.reshape((4, 4))
Out[22]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

Note that if you check, my_array.shape will still return (16,): the reshaped array is simply a view on the original data, so reshape() has not changed my_array itself. If you want to edit the original object in place, you can use the resize() method.
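
The view behaviour can be checked directly. This is a minimal sketch: the names reshaped and another_array are illustrative, and np.shares_memory is used here only to confirm that no data was copied.

```python
import numpy as np

my_array = np.arange(16)
reshaped = my_array.reshape((4, 4))

# The original keeps its shape; the reshaped array shares its memory.
print(my_array.shape)                         # (16,)
print(np.shares_memory(my_array, reshaped))   # True

# Writing through the view is therefore visible in the original.
reshaped[0, 0] = 100
print(my_array[0])                            # 100

# resize() changes the shape of the array it is called on.
another_array = np.arange(16)
another_array.resize((2, 8))
print(another_array.shape)                    # (2, 8)
```

Because a view shares data with the original, take care when modifying a reshaped array if you still need the original values.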

You can also transpose an array using the transpose() method, which mirrors the array along its diagonal:

In [23]:
my_array.reshape((2, 8)).transpose()
Out[23]:
array([[ 0,  8],
       [ 1,  9],
       [ 2, 10],
       [ 3, 11],
       [ 4, 12],
       [ 5, 13],
       [ 6, 14],
       [ 7, 15]])
In [24]:
my_array.reshape((4,4)).transpose()
Out[24]:
array([[ 0,  4,  8, 12],
       [ 1,  5,  9, 13],
       [ 2,  6, 10, 14],
       [ 3,  7, 11, 15]])
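
To see what "mirroring along the diagonal" means for individual elements, the sketch below compares shapes and positions before and after transposing. It also uses the .T attribute, a shorthand for transpose() that is not used elsewhere in this lesson. The names wide and tall are illustrative.

```python
import numpy as np

wide = np.arange(16).reshape((2, 8))
tall = wide.transpose()

# Transposing reverses the order of the dimensions.
print(wide.shape, tall.shape)             # (2, 8) (8, 2)

# The element at [row, column] moves to [column, row].
print(wide[1, 3], tall[3, 1])             # 11 11

# .T is a shorthand attribute for transpose().
print(np.array_equal(wide.T, tall))       # True
```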

An array puzzle

Using the NumPy documentation, create, in one line, a NumPy array which looks like:

[10,  60,  20,  70,  30,  80,  40,  90,  50, 100]

Hint: you might need to use transpose(), reshape() and arange() as well as other functions from the "Shape manipulation" section of the documentation. Can you find a method which uses fewer than 4 function calls?


Key Points

  • np.array can convert Python lists to NumPy arrays.
  • NumPy provides many functions for initialising arrays, like arange, linspace, ones and zeros.
  • NumPy arrays can be reshaped and resized using the reshape and resize methods.