Solutions

Make a Series

  • A series of ten elements indexed by decade, from 1920 to 2010:
from pandas import Series
my_series = Series(range(10), index=range(1920, 2020, 10))
print('My series:')
print(my_series)
print()
print('My series for 1990:')
print(my_series[1990])
print()

Output:

My series:
1920    0
1930    1
1940    2
1950    3
1960    4
1970    5
1980    6
1990    7
2000    8
2010    9
dtype: int64

My series for 1990:
7
  • Another series with a repeated index:
another_series = Series(range(5), index=['a', 'b', 'b', 'c', 'd'])
print('Another series, but with duplicated index:')
print(another_series)
print()
print('Another series accessing duplicated index:')
print(another_series['b'])

Output:

Another series, but with duplicated index:
a    0
b    1
b    2
c    3
d    4
dtype: int64

Another series accessing duplicated index:
b    1
b    2
dtype: int64
  • A Series differs from a list in that all of its elements share a single dtype (mixed values are stored under the generic object dtype).
  • A Series differs from a dict in that index labels may repeat, so a single label can return several values.
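
The two points above can be checked directly. This is a minimal sketch; the names mixed and dup are illustrative and not part of the exercise.

```python
from pandas import Series

# Mixed values fall back to the generic object dtype
mixed = Series([1, 'two', 3.0])
print(mixed.dtype)

# A repeated label makes the index non-unique,
# and selecting that label returns every matching value
dup = Series(range(5), index=['a', 'b', 'b', 'c', 'd'])
print(dup.index.is_unique)
print(list(dup['b']))
```

Checking index.is_unique before a lookup is one way to tell whether a label will return a single value or a whole Series.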

Broadcasting

  • Two series of the same size, both with the default integer index, are combined element by element.
series_a = Series(range(5))
series_b = Series(range(5, 10))
print(series_a * series_b)

Output:

0     0
1     6
2    14
3    24
4    36
dtype: int64
  • Two series of different sizes, both with the default integer index, are combined element by element on the labels they share. Labels present only in the longer series yield NaN (not a number) in the result, and the result becomes float64 because NaN is a floating-point value.
series_c = Series(range(7))
print(series_a + series_c)

Output:

0    0.0
1    2.0
2    4.0
3    6.0
4    8.0
5    NaN
6    NaN
dtype: float64
  • Two series of different sizes, each with its own index, are combined label by label. Any label that appears in only one of the series yields NaN (not a number) in the result.
series_d = Series(range(5), index=range(10, 60, 10))
series_e = Series(range(7), index=range(30, 100, 10))
print(series_d + series_e)

Output:

10    NaN
20    NaN
30    2.0
40    4.0
50    6.0
60    NaN
70    NaN
80    NaN
90    NaN
dtype: float64
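
If the NaN values are not wanted, one option is the Series.add method with its fill_value parameter, which treats a label missing from one side as the given value. This goes beyond what the exercise asks for; it is a sketch of one possible approach, and the name total is illustrative.

```python
from pandas import Series

series_d = Series(range(5), index=range(10, 60, 10))
series_e = Series(range(7), index=range(30, 100, 10))

# Treat a label missing from one series as 0 instead of producing NaN
total = series_d.add(series_e, fill_value=0)
print(total)
```

Labels found in both series are summed as before, while labels found in only one keep that series' value.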


Making your own dataframe

  • To set up the dataframe as before:
from pandas import DataFrame
data = {'city': ['Paris', 'Paris', 'Paris', 'Paris',
                 'London', 'London', 'London', 'London',
                 'Rome', 'Rome', 'Rome', 'Rome'],
        'year': [2001, 2008, 2009, 2010,
                 2001, 2006, 2011, 2015,
                 2001, 2006, 2009, 2012],
        'pop': [2.148, 2.211, 2.234, 2.244,
                7.322, 7.657, 8.174, 8.615,
                2.547, 2.627, 2.734, 2.627]}
df = DataFrame(data)

Output: (no output)

  • To select the data for the year 2001:
print(df[df['year'] == 2001])

Output:

     city  year    pop
0   Paris  2001  2.148
4  London  2001  7.322
8    Rome  2001  2.547
  • To find all cities with a population of less than 2.6 million:
print(df[df['pop'] < 2.6].city)

Output:

0    Paris
1    Paris
2    Paris
3    Paris
8     Rome
Name: city, dtype: object
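
The selection above returns one row per matching record, so Paris appears four times. If only the distinct city names are wanted, the result can be reduced with unique(). This is a sketch of one possible follow-up, not part of the original exercise; the name small is illustrative.

```python
from pandas import DataFrame

data = {'city': ['Paris', 'Paris', 'Paris', 'Paris',
                 'London', 'London', 'London', 'London',
                 'Rome', 'Rome', 'Rome', 'Rome'],
        'year': [2001, 2008, 2009, 2010,
                 2001, 2006, 2011, 2015,
                 2001, 2006, 2009, 2012],
        'pop': [2.148, 2.211, 2.234, 2.244,
                7.322, 7.657, 8.174, 8.615,
                2.547, 2.627, 2.734, 2.627]}
df = DataFrame(data)

# Keep the rows below 2.6 million, then reduce to distinct city names
small = df[df['pop'] < 2.6]
print(small['city'].unique())
```

Bracket access is used for the 'pop' column because df.pop refers to the DataFrame.pop method rather than the column.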
