Interactive tools such as Jupyter notebooks and IPython are great for prototyping code and exploring data, but sooner or later, if we want to re-use our code or demonstrate reproducible workflows, we will want to use our program in a pipeline or run it in a shell script to process thousands of data files. To do that, we need to make our programs work like other Unix command-line tools. For example, we may want a program that reads a dataset and prints the average inflammation per patient.
You have several ways to run this episode. If you are comfortable using Linux editors and the terminal, you are welcome to do so. Otherwise you can create a file directly in notebooks: on the menu page where you create a new notebook, select Text File instead of Python3.6. Once the new file has opened, click on Untitled.txt and change its name as instructed. notebooks lets you edit the file in a number of different modes that replicate more advanced editors, which you should explore if you want to use notebooks for regular development.
As you might now expect, our first task will be to produce a program that sends a greeting to the world.
Let's start by creating a new Text File, renaming it hello_world.py, and entering:
print("Hello world!")
Create a new notebook in the same folder as your program and run it with
%run hello_world.py
Verify that this gives the output you would expect.
Often we will want to pass information into the program from the command line. Fortunately, Python's standard library includes the sys module, which allows us to do this. Copy your hello_world.py program to a new file hello.py and edit the file as follows:
import sys
print("Hello",sys.argv)
If we run our new program with the argument James, we should see the following output:
%run hello.py James
Hello ['./hello.py', 'James']
sys.argv means system argument values. The full set of values is presented as a list, and the first item in that list is the name of the program. We don't want to say hello to the name of the program, and generally we will want to ignore this item, so let's modify our program to consider only the rest of the list:
import sys
names = sys.argv[1:]
for name in names:
    print("Hello", name)
Make sure that you understand what we have done here, and why. Discuss with your neighbours to make sure everyone is following.
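To see the slice in isolation, here is a small sketch that uses an ordinary list in place of sys.argv. The name fake_argv and its contents are made up for illustration; the real sys.argv is filled in by Python when the program starts.

```python
# fake_argv is a hypothetical stand-in for sys.argv, for illustration only.
fake_argv = ['hello.py', 'Alan', 'Bob']

program_name = fake_argv[0]   # the first item is the program name
names = fake_argv[1:]         # everything after the first item

print(program_name)           # hello.py
print(names)                  # ['Alan', 'Bob']

# With no arguments the slice is simply empty, so a loop over it does nothing.
no_names = ['hello.py'][1:]
print(no_names)               # []
```

Because an empty slice is still a valid list, the program does not need a special case for being run with no names at all.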
We can now re-run our new program with the same command as before:
%run hello.py James
Hello James
Because we have generalised the program to operate on all arguments passed to it, we can also run:
%run hello.py Alan Bob Carl Dave
Hello Alan
Hello Bob
Hello Carl
Hello Dave
This already gives us a way to perform the same task on any number of arguments.
Next we will make some small changes to encapsulate the main part of our program in its own function, and then tell Python that this is what it should run when the program is executed:
import sys
def main():
    '''
    We can also add a docstring to remind our future selves that this program:
    Takes a list of arguments and says hello to each of them.
    '''
    names = sys.argv[1:]
    for name in names:
        print("Hello", name)
if __name__ == '__main__':
    main()
Run your program with the same arguments as before to check that you have not changed its behaviour. Note that we have also added a 'docstring' to our main function to explain what it does.
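One way to see what the last two lines of the program are for: Python sets the variable __name__ to '__main__' only when a file is run as a program. When the same file is imported, __name__ holds the module's name instead, so main() is not called. The sketch below is a variant of hello.py saved as a file, not a replacement for it. The greet function and the optional argv parameter are additions made here (they are not part of the lesson's program) so that the greeting logic can be called directly with a list of your own.

```python
import sys

def greet(names):
    '''Hypothetical helper: return one greeting string per name.'''
    return ["Hello " + name for name in names]

def main(argv=None):
    '''Say hello to each argument; argv defaults to the real sys.argv.'''
    if argv is None:
        argv = sys.argv
    for greeting in greet(argv[1:]):
        print(greeting)

if __name__ == '__main__':
    # Only reached when the file is run as a program, not when it is imported.
    main()
```

Calling main(['hello.py', 'Alan', 'Bob']) from another piece of code then behaves like running the program with the arguments Alan and Bob.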
Let's say we want to find the mean inflammation of each of the patients in the inflammation data we read in during the previous lesson.
First copy our template hello.py to inflammation_mean.py. Open inflammation_mean.py and edit it as follows:
import sys
def main():
    '''
    We can also add a docstring to remind our future selves that this program:
    Takes a list of files, and finds and prints the mean of each line of data.
    '''
    filenames = sys.argv[1:]
    for filename in filenames:
        data = read_csv_to_floats(filename)
        count = 0
        for line in data:
            count += 1
            print("File:", filename, "patient:", count, "average inflammation", mean(line))
if __name__ == '__main__':
    main()
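The program above calls read_csv_to_floats, which comes from the previous lesson and is not shown here. If you do not have it to hand, the following is a minimal sketch of what such a function might look like. It is an assumption about its behaviour (one list of floats per line of a comma-separated file), not necessarily the version you wrote earlier.

```python
def read_csv_to_floats(filename):
    '''
    Assumed behaviour: read a comma-separated file of numbers and
    return a list of rows, where each row is a list of floats.
    '''
    data = []
    with open(filename) as csv_file:
        for line in csv_file:
            line = line.strip()
            if line:  # skip blank lines
                data.append([float(value) for value in line.split(',')])
    return data
```

Each inner list corresponds to one line of the file, which is why main can treat each item of data as one patient.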
Now we need to add the mean(sample) function that we considered in episode 7. Add it between read_csv_to_floats() and main, so that both helper functions are defined before main is called:
def mean(sample):
    '''
    Takes a list of numbers, sample
    and returns the mean.
    '''
    sample_sum = 0
    for value in sample:
        sample_sum += value
    sample_mean = sample_sum / len(sample)
    return sample_mean
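As a quick sanity check, the result of mean can be compared with statistics.mean from Python's standard library on a small sample whose mean is easy to work out by hand. The sample values here are made up for illustration, and mean is repeated so the snippet runs on its own.

```python
import statistics

def mean(sample):
    '''Takes a list of numbers, sample, and returns the mean.'''
    sample_sum = 0
    for value in sample:
        sample_sum += value
    return sample_sum / len(sample)

# Made-up sample: the sum is 8.0 over 4 values, so the mean is 2.0.
sample = [0.0, 1.0, 3.0, 4.0]
print(mean(sample))             # 2.0
print(statistics.mean(sample))  # 2.0
```

Note that an empty list would make len(sample) zero, so mean([]) raises a ZeroDivisionError; the inflammation files do not contain empty rows, so the lesson's program does not guard against this.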
Now run your program with:
%run inflammation_mean.py inflammation-01.csv
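Your output should look something like the following (the path printed after File: is whatever filename you passed to the program, so it may differ from the one shown here):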
File: ../inflammation-01.csv patient: 1 average inflammation 5.45
File: ../inflammation-01.csv patient: 2 average inflammation 5.425
File: ../inflammation-01.csv patient: 3 average inflammation 6.1
File: ../inflammation-01.csv patient: 4 average inflammation 5.9
File: ../inflammation-01.csv patient: 5 average inflammation 5.55
File: ../inflammation-01.csv patient: 6 average inflammation 6.225
File: ../inflammation-01.csv patient: 7 average inflammation 5.975
File: ../inflammation-01.csv patient: 8 average inflammation 6.65
File: ../inflammation-01.csv patient: 9 average inflammation 6.625
File: ../inflammation-01.csv patient: 10 average inflammation 6.525
File: ../inflammation-01.csv patient: 11 average inflammation 6.775
File: ../inflammation-01.csv patient: 12 average inflammation 5.8
File: ../inflammation-01.csv patient: 13 average inflammation 6.225
File: ../inflammation-01.csv patient: 14 average inflammation 5.75
File: ../inflammation-01.csv patient: 15 average inflammation 5.225
File: ../inflammation-01.csv patient: 16 average inflammation 6.3
File: ../inflammation-01.csv patient: 17 average inflammation 6.55
File: ../inflammation-01.csv patient: 18 average inflammation 5.7
File: ../inflammation-01.csv patient: 19 average inflammation 5.85
File: ../inflammation-01.csv patient: 20 average inflammation 6.55
File: ../inflammation-01.csv patient: 21 average inflammation 5.775
File: ../inflammation-01.csv patient: 22 average inflammation 5.825
File: ../inflammation-01.csv patient: 23 average inflammation 6.175
File: ../inflammation-01.csv patient: 24 average inflammation 6.1
File: ../inflammation-01.csv patient: 25 average inflammation 5.8
File: ../inflammation-01.csv patient: 26 average inflammation 6.425
File: ../inflammation-01.csv patient: 27 average inflammation 6.05
File: ../inflammation-01.csv patient: 28 average inflammation 6.025
File: ../inflammation-01.csv patient: 29 average inflammation 6.175
File: ../inflammation-01.csv patient: 30 average inflammation 6.55
File: ../inflammation-01.csv patient: 31 average inflammation 6.175
File: ../inflammation-01.csv patient: 32 average inflammation 6.35
File: ../inflammation-01.csv patient: 33 average inflammation 6.725
File: ../inflammation-01.csv patient: 34 average inflammation 6.125
File: ../inflammation-01.csv patient: 35 average inflammation 7.075
File: ../inflammation-01.csv patient: 36 average inflammation 5.725
File: ../inflammation-01.csv patient: 37 average inflammation 5.925
File: ../inflammation-01.csv patient: 38 average inflammation 6.15
File: ../inflammation-01.csv patient: 39 average inflammation 6.075
File: ../inflammation-01.csv patient: 40 average inflammation 5.75
File: ../inflammation-01.csv patient: 41 average inflammation 5.975
File: ../inflammation-01.csv patient: 42 average inflammation 5.725
File: ../inflammation-01.csv patient: 43 average inflammation 6.3
File: ../inflammation-01.csv patient: 44 average inflammation 5.9
File: ../inflammation-01.csv patient: 45 average inflammation 6.75
File: ../inflammation-01.csv patient: 46 average inflammation 5.925
File: ../inflammation-01.csv patient: 47 average inflammation 7.225
File: ../inflammation-01.csv patient: 48 average inflammation 6.15
File: ../inflammation-01.csv patient: 49 average inflammation 5.95
File: ../inflammation-01.csv patient: 50 average inflammation 6.275
File: ../inflammation-01.csv patient: 51 average inflammation 5.7
File: ../inflammation-01.csv patient: 52 average inflammation 6.1
File: ../inflammation-01.csv patient: 53 average inflammation 6.825
File: ../inflammation-01.csv patient: 54 average inflammation 5.975
File: ../inflammation-01.csv patient: 55 average inflammation 6.725
File: ../inflammation-01.csv patient: 56 average inflammation 5.7
File: ../inflammation-01.csv patient: 57 average inflammation 6.25
File: ../inflammation-01.csv patient: 58 average inflammation 6.4
File: ../inflammation-01.csv patient: 59 average inflammation 7.05
File: ../inflammation-01.csv patient: 60 average inflammation 5.9
We can also run our program with all the inflammation data:
%run inflammation_mean.py inflammation-*.csv
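With inflammation-*.csv the program is expected to receive one argument per matching file, because the wildcard is expanded before main runs. If the program is ever started in a way that does not expand wildcards, the pattern arrives as a single literal argument. One possible way to handle that, sketched here as an optional extra rather than part of the lesson's program, is the standard glob module. The helper name expand_patterns is hypothetical.

```python
import glob

def expand_patterns(arguments):
    '''
    Hypothetical helper: expand any wildcard patterns in a list of
    arguments into the sorted filenames that match them.
    '''
    filenames = []
    for argument in arguments:
        matches = sorted(glob.glob(argument))
        if matches:
            filenames.extend(matches)
        else:
            # Nothing matched: keep the argument as given, so that a later
            # attempt to open it reports the missing file.
            filenames.append(argument)
    return filenames
```

In main, this would be used as filenames = expand_patterns(sys.argv[1:]); arguments that are already plain filenames pass through unchanged.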
We may also want to output our data to a file. In order to do this, modify your main function as follows:
def main():
    '''
    We can also add a docstring to remind our future selves that this program:
    Takes a list of files, and finds the mean of each line of data,
    writing the results to my_data.txt.
    '''
    filenames = sys.argv[1:]
    output_filename = "my_data.txt"
    output_file = open(output_filename, 'w')
    for filename in filenames:
        data = read_csv_to_floats(filename)
        count = 0
        for line in data:
            count += 1
            # Spaces inside the strings keep the fields apart in the output.
            output_file.write("File: " + filename + " patient: " + str(count) + " average inflammation: " + str(mean(line)) + "\n")
    output_file.close()
Note that, as with reading from files, we have to open and close the file. Also, the function file.write() can only take a single str as its parameter, so the write line is a little different from our earlier print call: we have to convert the numbers with str() and join the pieces ourselves. We also have to add an explicit newline at the end of each line, which is the reason for the "\n".
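An alternative sketch of the same output step uses a with block, which closes the file automatically when the block ends, and str.format to build the single string that write needs. The function name write_means and its parameters are hypothetical, and the averages are passed in directly so the example can be tried without the data files.

```python
def write_means(output_filename, filename, means):
    '''
    Hypothetical helper: write one line per patient to output_filename.
    means is a list of already-computed averages for the file filename.
    '''
    with open(output_filename, 'w') as output_file:
        count = 0
        for value in means:
            count += 1
            line = "File: {} patient: {} average inflammation: {}\n".format(
                filename, count, value)
            output_file.write(line)
    # The file has been closed by this point, even if an error occurred
    # inside the with block.
```

For example, write_means("my_data.txt", "inflammation-01.csv", [5.45, 5.425]) would write two lines, one for patient 1 and one for patient 2.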
Run your program and cat the file my_data.txt to verify that it has worked as intended.
Verify that you can also import your library and access the functions it defines. Because main() is only called when __name__ is '__main__', importing the file does not run the program. Remember that, as with undefined variables, if your function is not found then the library has not been correctly read in. Repeat the 'analysis' in the main function by explicitly assigning a value to filename and calling your read_csv_to_floats and mean functions.