First, execute the code as provided:
import numpy as np

size = 1000000
%timeit for x in range(size): x**2
%timeit for x in np.arange(size): x**2
329 ms ± 19.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
288 ms ± 14.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
The performance here shows little apparent benefit from using numpy. The problem is that, while we are using numpy's arange to drive the loop, the calculation x**2 is still the same Python-level operation being applied one element at a time. If we want to make use of numpy's vectorisation we need to simplify our code:
%timeit np.arange(size)**2
6.03 ms ± 775 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
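As a quick sanity check (a sketch, not part of the original timings, assuming numpy has been imported as np), we can confirm that the vectorised expression produces the same values as the explicit loop:

import numpy as np

size = 1000000

# Explicit Python loop (as a list comprehension) versus the vectorised expression.
# dtype=np.int64 is requested explicitly so the squares do not overflow on
# platforms where the default integer type is 32-bit.
loop_result = [x**2 for x in range(size)]
vectorised_result = np.arange(size, dtype=np.int64)**2

print(np.array_equal(loop_result, vectorised_result))  # expect: True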
We now see a significant improvement in performance. Does this approach the billions of calculations per second that modern processors promise? Try this with other maths functions and operations.
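For example, here is a rough sketch (not from the original notebook; it assumes numpy is imported as np and uses the standard-library timeit and math modules) that compares a looped math.sqrt with the vectorised np.sqrt and estimates the throughput in operations per second:

import math
import timeit

import numpy as np

size = 1000000
values = np.arange(size)

# Looped square root: one Python-level function call per element.
loop_time = timeit.timeit(lambda: [math.sqrt(x) for x in range(size)], number=10) / 10

# Vectorised square root: a single call that loops in compiled code.
vector_time = timeit.timeit(lambda: np.sqrt(values), number=100) / 100

print(f"loop:       {loop_time * 1e3:8.2f} ms  (~{size / loop_time:.2e} ops/s)")
print(f"vectorised: {vector_time * 1e3:8.2f} ms  (~{size / vector_time:.2e} ops/s)")

Dividing the element count by the measured time gives an approximate rate that you can compare against the nominal speed of your processor.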