Performant Python

Performance

Overview:

Teaching: 5 min
Exercises: 15 min

Questions

How does using numpy help me write more efficient code?
How can I measure the performance of Python code?

Objectives

Know that NumPy and SciPy are implemented in C or Fortran and use other standard libraries to support their built-in functions.
Use timeit to time code execution.

Timeit

Python has a convenient timing module called timeit, which is also available as a magic command in IPython/Jupyter notebooks.

You can use it to measure the execution time of small code snippets.

From Python: import timeit and supply the code snippet as a string
From IPython: use the magic command %timeit
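As a minimal sketch of the plain-Python route, the snippet can be passed to timeit.timeit as a string. The statement being timed and the loop count are arbitrary choices for illustration, not taken from the lesson.

```python
import timeit

# The code to time is supplied as a string.
snippet = "sum(range(100))"

# Run the snippet n_loops times; timeit returns the TOTAL time in seconds.
n_loops = 100_000
total = timeit.timeit(snippet, number=n_loops)

print(f"{total / n_loops * 1e9:.0f} ns per loop")
```

Note that timeit.timeit returns the total for all loops, so dividing by the loop count gives a per-loop figure comparable to what %timeit prints.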

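By default, recent versions of IPython repeat the timing 7 times and report the mean and standard deviation per loop (older versions repeated 3 times and reported the best time). %timeit also tells you how many iterations (loops) it ran per repeat. You can specify the number of repeats and the number of iterations per repeat.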

%timeit -n <iterations> -r <repeats>  <code_snippet>
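Outside IPython, the same two settings correspond roughly to the number and repeat arguments of timeit.repeat. The following is a sketch only; the snippet and the values chosen are arbitrary.

```python
import timeit

n_iterations = 10_000  # plays the role of -n
n_repeats = 3          # plays the role of -r

# repeat() returns a list with one total time (in seconds) per repeat.
totals = timeit.repeat("sum(range(100))", number=n_iterations, repeat=n_repeats)

# Convert each total into a per-loop time.
per_loop = [t / n_iterations for t in totals]
print(f"best of {n_repeats}: {min(per_loop) * 1e6:.2f} us per loop")
```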

If you want to run multiple lines of code in a cell block then you need to use:

%%timeit -n <iterations> -r <repeats>
<code_snippet1>
<code_snippet2>

See

%timeit? for more information
https://docs.python.org/3/library/timeit.html

First, as always, we must import numpy:

import numpy as np

Let's create an array as before and change its shape.

nd = np.arange(100).reshape((10,10))

We can use the %timeit magic to time how long it takes to access elements of the array, and to explore which method of accessing an element is quicker.

# accessing element of 2d array
%timeit -n 10000000 -r 3 nd[5][5]
%timeit -n 10000000 -r 3 nd[(5,5)]

322 ns ± 7.12 ns per loop (mean ± std. dev. of 3 runs, 10000000 loops each)
159 ns ± 7.17 ns per loop (mean ± std. dev. of 3 runs, 10000000 loops each)
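A likely explanation for the difference is that the two expressions do different amounts of work. The sketch below breaks the chained form into its two steps; the variable names row, chained and direct are introduced here for illustration.

```python
import numpy as np

nd = np.arange(100).reshape((10, 10))

# Chained indexing is two operations: nd[5] first creates a 1-D view of row 5 ...
row = nd[5]
print(row.shape)                  # (10,)
print(np.shares_memory(row, nd))  # True: a view, not a copy

# ... and a second lookup then picks element 5 out of that view.
chained = row[5]

# Tuple indexing finds the element in a single operation.
direct = nd[5, 5]                 # the same as nd[(5, 5)]

print(chained == direct)          # True, both are 55
```

Creating the intermediate row object costs time even though no data is copied, which is consistent with the chained form taking roughly twice as long.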

Not all accesses are equal: here the single lookup nd[(5,5)] takes about half the time of the chained nd[5][5]. If we want to time multiple lines of code then we can use %%timeit with default settings:

%%timeit
nd[5][5]
nd[(5,5)]

469 ns ± 31.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Note, though, that we now get only the time for the entire code block. Using the default settings can be useful when we don't know how long a code block will take to execute, as timeit will decide how many loops to run in order to get reasonable statistics in a reasonable time.
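The standard-library timeit module offers a comparable automatic choice of loop count through Timer.autorange. This is a sketch of that mechanism only; IPython's own selection logic may differ in detail, and the snippet being timed is arbitrary.

```python
import timeit

timer = timeit.Timer("sum(range(100))")

# autorange() tries 1, 2, 5, 10, 20, 50, ... loops until one run
# takes at least 0.2 seconds, then returns (loops, total seconds).
n_loops, total = timer.autorange()

print(f"{n_loops} loops, {total / n_loops * 1e6:.2f} us per loop")
```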

Lists

Compare the performance of numpy arrays and lists. For example, create a list and an array of the same size:

size = 1000000
my_list = list(range(size))
my_array = np.arange(size)

Try timing some simple operations, e.g. multiply, square, raise to a power. You may need to import additional libraries to do this with lists, e.g. import math to get non-numpy implementations.
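One way to set up such a comparison outside a notebook is sketched below, using timeit.timeit with callables. The helper function names, the choice of operations and the loop count are all assumptions made for illustration.

```python
import math
import timeit

import numpy as np

size = 1000000
my_list = list(range(size))
# dtype is set explicitly (an assumption of this sketch) so that cubing
# does not overflow on platforms where the default integer is 32-bit.
my_array = np.arange(size, dtype=np.int64)

def list_double():
    return [x * 2 for x in my_list]

def array_double():
    return my_array * 2

def list_power():
    # math.pow is the non-numpy implementation; it returns floats.
    return [math.pow(x, 3) for x in my_list]

def array_power():
    return np.power(my_array, 3)

# timeit.timeit accepts a callable as well as a string.
for func in (list_double, array_double, list_power, array_power):
    total = timeit.timeit(func, number=3)
    print(f"{func.__name__}: {total / 3 * 1e3:.2f} ms per call")
```

Wrapping each operation in a function keeps the timed code identical in shape for the list and array versions, so the difference comes from the data structure rather than from the harness.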

Ranges

In many cases we do not need to create an array but can use the built-in or numpy range functions to perform loops directly. Try the following examples and again explore the performance of different functions.

size = 1000000

%timeit for x in range(size): x**2

%timeit for x in np.arange(size): x**2

What is the relative performance of the two implementations, and why do you think this is the case? Can you think of a more efficient way to obtain the same result?

Solution
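One possible answer, offered as a sketch rather than the lesson's official solution: looping over np.arange(size) hands each element to Python as a separate numpy scalar object, so the loop typically runs slower than the same loop over the built-in range. Replacing the loop with a single vectorised expression moves the work into compiled code. The function names below are made up for illustration.

```python
import timeit

import numpy as np

size = 1000000

def loop_range():
    # Plain Python loop over built-in integers.
    for x in range(size):
        x**2

def loop_arange():
    # Same loop, but every x is a numpy scalar created on the fly.
    for x in np.arange(size, dtype=np.int64):
        x**2

def vectorised():
    # One array expression; the loop runs inside numpy's compiled code.
    return np.arange(size, dtype=np.int64) ** 2

for func in (loop_range, loop_arange, vectorised):
    total = timeit.timeit(func, number=3)
    print(f"{func.__name__}: {total / 3 * 1e3:.2f} ms per call")
```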

Darts and calculating $\pi$

A Monte Carlo method (aka "throwing darts")

Geometry gives us an expression for $\pi$: the area of a circle of radius $r$ divided by the area of a square of side length $r$:

$$ \pi = \pi r^2 / r^2 $$

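We can estimate the ratio of these areas by throwing darts at a square that contains a quarter circle of the same radius. The quarter circle covers a fraction $\pi/4$ of the square, so if $N_{in}$ is the number of darts falling on the dart board (quarter circle), and $N_{tot}$ is the total number of trials (i.e. darts falling on the square), then: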

$$ \pi \approx 4 N_{in} / N_{tot} $$

Try using numpy arrays to compute the following:

Choose a sample size ntot.
Generate an array of random $x$ coordinates $0 \leq x < 1$.
Generate an array of random $y$ coordinates $0 \leq y < 1$.
Count the number of points for which $x^2 + y^2 < 1$.
Compute the approximation to $\pi$.
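The steps above could be sketched as follows. This is one possible approach, not the lesson's reference solution; the seed, the sample size and the use of np.random.default_rng are choices made here for illustration.

```python
import numpy as np

# Seeded generator so the sketch is reproducible; the seed is arbitrary.
rng = np.random.default_rng(12345)

# Sample size.
ntot = 1000000

# Random coordinates, uniform in [0, 1).
x = rng.random(ntot)
y = rng.random(ntot)

# Count the darts that land inside the quarter circle.
n_in = np.count_nonzero(x**2 + y**2 < 1)

# The quarter circle covers pi/4 of the unit square.
pi_estimate = 4 * n_in / ntot
print(pi_estimate)
```

Every step operates on whole arrays, so there is no explicit Python loop over the individual darts.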

Key Points:

You can use the timeit function/magic to evaluate the performance of your code
Using numpy does not by itself improve the performance of your code
Making appropriate use of numpy vectorisation (and of libraries compiled against the Intel Math Kernel Library!) can give significant performance improvements
Vectorisation can also make our code easier to read and write!