This exercise requires you to clone the repository from: github.com/arc-bath/parsing. Make sure that the repository is not cloned into a directory or sub-directory of an existing git repository.
% git clone https://github.com/arc-bath/parsing.git
Once you have cloned the repository, change into its src directory and run the tests in test_ts_parser.py:
% cd parsing/src
% pytest test_ts_parser.py
You should see a lot of output from pytest, because most of the tests fail. The final line should contain a summary:
======================================== 8 failed, 1 passed in 0.37 seconds =========================================
The aim of this exercise is to modify the code in ts_parser.py so that it passes all of these tests. Let's begin by reducing the output produced by pytest, so that we can see more clearly what is happening:
% pytest --tb=short test_ts_parser.py
You should now see output that looks like:
================================================ test session starts ================================================
platform linux -- Python 3.6.3, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /home/rjg20/training/arc-training/now-code-repos/parsing/src, inifile:
collected 9 items
test_ts_parser.py .FFFFFFFF
===================================================== FAILURES ======================================================
_______________________________________________ test_read_ts_coords2 ________________________________________________
test_ts_parser.py:32: in test_read_ts_coords2
assert len(item) == 38
E assert 0 == 38
E + where 0 = len([])
________________________________________________ test_read_structs_a ________________________________________________
test_ts_parser.py:64: in test_read_structs_a
assert len(item) == 1
E assert 0 == 1
E + where 0 = len([])
________________________________________________ test_read_structs_b ________________________________________________
test_ts_parser.py:81: in test_read_structs_b
assert len(item) == 2
E assert 0 == 2
E + where 0 = len([])
________________________________________________ test_read_structs_c ________________________________________________
test_ts_parser.py:99: in test_read_structs_c
assert len(item) == 3
E assert 0 == 3
E + where 0 = len([])
________________________________________________ test_read_structs_d ________________________________________________
test_ts_parser.py:119: in test_read_structs_d
assert len(item) == 3
E assert 0 == 3
E + where 0 = len([])
________________________________________________ test_read_structs_e ________________________________________________
test_ts_parser.py:142: in test_read_structs_e
assert len(item) == 3
E assert 0 == 3
E + where 0 = len([])
________________________________________________ test_read_structs_f ________________________________________________
test_ts_parser.py:171: in test_read_structs_f
raise Exception('Expected empty line error not raised')
E Exception: Expected empty line error not raised
________________________________________________ test_read_structs_g ________________________________________________
test_ts_parser.py:188: in test_read_structs_g
raise Exception('Expected file termination error not raised')
E Exception: Expected file termination error not raised
======================================== 8 failed, 1 passed in 0.37 seconds =========================================
Writing code to process data files can take an inordinate amount of time. We will examine how to conditionally process a file of structures. While this may seem like a problem specific to simulation, it illustrates a much more general problem when dealing with data: how to read and write it. Even when standard formats ease the process of reading and writing data, processing that data and changing its format to meet the needs of our analysis will often be necessary.
You will read in structures (a data set) comprising element labels and their coordinates in the format:
<Element label> <x_coordinate> <y_coordinate> <z_coordinate>
e.g.
A 0.0 0.0 0.0
where the data are separated by spaces. Each structure consists of an unknown number of elements and is terminated by a line reading
** [x_coordinate] [y_coordinate] [z_coordinate]
The coordinates on this line are optional. The end of the set of structures is signified by the line:
## [x_coordinate] [y_coordinate] [z_coordinate]
Again, the coordinates are optional. While it may seem unnecessary, this line is important because it signifies that the previous step in our analysis completed successfully, i.e. the previous program didn't just stop midway through calculating the next structure.
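To make the line format concrete, here is a minimal Python sketch of how a single line might be classified and parsed. The helper names classify_line, parse_atom_line and parse_coordinate are hypothetical (they are not taken from ts_parser.py), and treating a whitespace-only line as empty is an assumption.

```python
import math


def parse_coordinate(token):
    """Convert one coordinate token to a float, falling back to NaN."""
    try:
        return float(token)
    except ValueError:
        return math.nan  # same value as float('nan')


def classify_line(line):
    """Label a line as 'empty', 'end_struct' (**), 'end_file' (##) or 'atom'."""
    fields = line.split()  # split on runs of whitespace
    if not fields:
        return 'empty'
    if fields[0] == '**':
        return 'end_struct'
    if fields[0] == '##':
        return 'end_file'
    return 'atom'


def parse_atom_line(line):
    """Split '<Element label> <x> <y> <z>' into (label, x, y, z).

    Missing or unconvertible coordinates become NaN.
    """
    fields = line.split()
    coords = [parse_coordinate(tok) for tok in fields[1:4]]
    coords += [math.nan] * (3 - len(coords))  # pad missing coordinates
    return fields[0], coords[0], coords[1], coords[2]


print(parse_atom_line('A 0.0 0.0 0.0'))  # ('A', 0.0, 0.0, 0.0)
```

Because ** and ## are checked only as the first field, the optional coordinates on those lines are simply ignored.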
You will modify the code ts_parser.py so that it reads in the structures, according to the above syntax, and processes the resulting data into lists of lists. The <elements> are read from each line, raising an Exception('Empty line in file') if a line is empty. The x, y and z coordinates are converted to floats, or to float('nan') if the conversion is not possible or the value is not present. An Exception('File termination Error') is raised if the file did not terminate with the ## line. Otherwise the function should:
return [ [[ elements ]],
         [[ x ]],
         [[ y ]],
         [[ z ]] ]
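One possible shape for the reading function is sketched below. It is a minimal sketch under stated assumptions, not the intended solution: the name read_structs is hypothetical, it accepts any iterable of lines (an open file or an io.StringIO), it stops at the ## line and ignores anything after it, and it discards atom lines that appear after the last ** line, since the exercise does not say how these should be handled.

```python
import io
import math


def _to_float(token):
    # Convert a coordinate token to float; unconvertible tokens become NaN
    try:
        return float(token)
    except ValueError:
        return math.nan


def read_structs(stream):
    """Read structures from an iterable of lines (hypothetical name).

    Returns [elements, x, y, z], where each entry holds one inner list
    per structure. Plain Exception is raised because the exercise asks for it.
    """
    elements, xs, ys, zs = [], [], [], []
    # Accumulators for the structure currently being read
    cur_e, cur_x, cur_y, cur_z = [], [], [], []
    terminated = False

    for line in stream:
        fields = line.split()
        if not fields:  # ASSUMPTION: whitespace-only lines count as empty
            raise Exception('Empty line in file')
        marker = fields[0]
        if marker == '##':  # end of the whole set of structures
            terminated = True
            break
        if marker == '**':  # end of one structure: store it, start a new one
            elements.append(cur_e)
            xs.append(cur_x)
            ys.append(cur_y)
            zs.append(cur_z)
            cur_e, cur_x, cur_y, cur_z = [], [], [], []
            continue
        # Ordinary line: element label followed by up to three coordinates
        coords = [_to_float(tok) for tok in fields[1:4]]
        coords += [math.nan] * (3 - len(coords))  # pad missing values
        cur_e.append(marker)
        cur_x.append(coords[0])
        cur_y.append(coords[1])
        cur_z.append(coords[2])

    if not terminated:  # no ## line: the previous program may have stopped early
        raise Exception('File termination Error')
    return [elements, xs, ys, zs]


# Example: two structures; atom B is missing its z coordinate
sample = io.StringIO("A 0.0 0.0 0.0\nB 1.0 2.0\n**\nC 1.5 1.5 1.5\n** 0 0 0\n##\n")
elements, x, y, z = read_structs(sample)
print(elements)  # [['A', 'B'], ['C']]
```

Accepting an iterable of lines rather than a filename keeps the function easy to test with in-memory strings; a real file can be passed with a with open(...) block.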
You will then write a second function that processes the data to:
return [ num_structs,
         [ elements_per_struct ],
         [ invalid_structs, [ list_of_invalid_structs ]] ]
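The second function could be sketched as follows. The exercise does not define what makes a structure invalid, so this sketch assumes a structure is invalid if any of its coordinates is NaN, and it reports invalid structures by 0-based index; the name summarise_structs is hypothetical. Check both choices against the provided tests before relying on them.

```python
import math


def summarise_structs(data):
    """Summarise the [elements, x, y, z] lists produced by a structure reader.

    ASSUMPTION: a structure is invalid if any of its coordinates is NaN;
    invalid structures are reported by 0-based index.
    """
    elements, xs, ys, zs = data
    num_structs = len(elements)
    elements_per_struct = [len(struct) for struct in elements]

    invalid = []
    for index, coords in enumerate(zip(xs, ys, zs)):
        # coords is (x_list, y_list, z_list) for a single structure
        if any(math.isnan(value) for axis in coords for value in axis):
            invalid.append(index)

    return [num_structs, elements_per_struct, [len(invalid), invalid]]


# Example input in the [[elements]], [[x]], [[y]], [[z]] layout
data = [
    [['A', 'B'], ['C']],
    [[0.0, 1.0], [1.5]],
    [[0.0, 2.0], [1.5]],
    [[0.0, math.nan], [1.5]],
]
print(summarise_structs(data))  # [2, [2, 1], [1, [0]]]
```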
A series of tests will help you to identify when your functions are performing correctly.