When and When Not To Use Comprehensions
Uses
Comprehensions can be used for building up new collections in lots of cases. Here we’re using a comprehension and a Counter object to find the set of letters that appear exactly once in a given string:
from collections import Counter

def find_unique_letters(text):
    return {
        char
        for char, count in Counter(text.lower()).items()
        if count == 1
    }
In general, comprehensions and generator expressions can be used for building a new iterable from an old iterable.
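As a small illustration of building new iterables from an old one, the sketch below derives a list, a set, a dictionary, and a generator-fed sum from the same list of words. The words list and the variable names are made up for this example.

```python
# Hypothetical sample data, not from the original text
words = ["apple", "Banana", "cherry", "apple"]

lengths = [len(word) for word in words]               # list comprehension
unique_lower = {word.lower() for word in words}       # set comprehension
length_by_word = {word: len(word) for word in words}  # dictionary comprehension
total_length = sum(len(word) for word in words)       # generator expression

print(lengths)
print(sorted(unique_lower))
print(length_by_word)
print(total_length)
```

In every case the result is a new object built from the old iterable, and the result is actually used afterward.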
Non-Uses
In general, comprehensions should not be used for printing or for executing code with side effects.
Comprehensions should never be used when you don’t care about the list or iterable you’re building up from them.
Here’s a for loop that writes data to a CSV file:
import csv

data = [('blue', 0.2), ('red', 0.3), ('green', 0.5)]

# Python 3: open CSV files in text mode with newline=''
with open('data.csv', mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for color, frequency in data:
        writer.writerow((frequency, color))
You might think it would be a good idea to turn that loop into a list comprehension like this:
import csv

data = [('blue', 0.2), ('red', 0.3), ('green', 0.5)]

with open('data.csv', mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    [writer.writerow((frequency, color)) for color, frequency in data]
This makes our code less clear though.
Comprehensions are for turning old iterables into new iterables. We’re creating a list just to throw out its values.
Consider if we kept the values:
import csv

data = [('blue', 0.2), ('red', 0.3), ('green', 0.5)]

with open('data.csv', mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    written = [writer.writerow((frequency, color)) for color, frequency in data]
That written variable is just a list of the values returned by writerow, which we never needed (in Python 3, the number of characters each call wrote):
>>> written
[10, 9, 11]
You shouldn’t use a list comprehension except for the specific case of making a new list out of an old list (or any new iterable out of any old iterable).
Looking For Uses
We weren’t making a new list in our CSV-writing code:
import csv

data = [('blue', 0.2), ('red', 0.3), ('green', 0.5)]

with open('data.csv', mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for color, frequency in data:
        writer.writerow((frequency, color))
But we could have been:
import csv

data = [('blue', 0.2), ('red', 0.3), ('green', 0.5)]

flipped_data = []
for color, frequency in data:
    flipped_data.append((frequency, color))

with open('data.csv', mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(flipped_data)
Notice that we’re using the writerows method here instead of writerow, which we used before. The writerows method accepts an iterable of rows (such as a list), not just one row.
Because we rewrote our code to make a new list we could make a list comprehension in this case.
flipped_data = [
    (frequency, color)
    for color, frequency in data
]
In fact we could even make a generator expression:
flipped_data = (
    (frequency, color)
    for color, frequency in data
)
Uses for comprehensions and generator expressions aren’t always obvious. Sometimes you need to write your code in a different shape to find them.
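To see the reshaped code working end to end, here is a minimal sketch that passes a generator expression straight to writerows. It writes to an in-memory io.StringIO buffer instead of data.csv; that substitution is an assumption made only so the result is easy to inspect.

```python
import csv
import io

data = [('blue', 0.2), ('red', 0.3), ('green', 0.5)]

# Generator expression: rows are produced one at a time as writerows consumes them
flipped_data = ((frequency, color) for color, frequency in data)

# io.StringIO stands in for a real file here (an assumption made for this sketch)
csv_file = io.StringIO(newline='')
writer = csv.writer(csv_file)
writer.writerows(flipped_data)

print(csv_file.getvalue())
```

Because writerows only needs to loop over its argument once, the generator expression works just as well as the list did.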
Efficiency
List, set, and dictionary comprehensions are often a bit faster than equivalent for loops, but their speed difference isn’t always huge.
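One way to check the speed claim on your own machine is with the standard library’s timeit module. The two function names below are made up for this sketch, and no timings are shown because they depend on the machine and the Python version.

```python
from timeit import timeit

numbers = range(1000)

def squares_with_loop():
    squares = []
    for n in numbers:
        squares.append(n**2)
    return squares

def squares_with_comprehension():
    return [n**2 for n in numbers]

# Each function is called 1000 times; timeit returns the total seconds taken
loop_time = timeit(squares_with_loop, number=1000)
comprehension_time = timeit(squares_with_comprehension, number=1000)
print(f"for loop:      {loop_time:.3f}s")
print(f"comprehension: {comprehension_time:.3f}s")
```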
Generator expressions are fairly different though. Generator expressions are often more memory efficient than list comprehensions because they don’t need to create a whole list, since they generate items as you loop over them.
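The memory difference can be roughly observed with sys.getsizeof. This is only a sketch: getsizeof measures the container object itself, not the integers a list refers to, and exact byte counts differ between Python versions.

```python
import sys

squares_list = [n**2 for n in range(100_000)]
squares_generator = (n**2 for n in range(100_000))

# The list holds references to all 100,000 results at once;
# the generator holds only its current state
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_generator))
```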
This list comprehension (commented out so you don’t try executing it) would run out of memory because it would try to make an infinitely long list:
>>> from itertools import count
>>> # numbers = [n**2 for n in count()]
...
This generator expression won’t do any work until we start looping over it, so very little memory will be taken up by it:
>>> numbers = (n**2 for n in count())
If we start looping over it, it’ll keep printing squares forever (press Ctrl-C to stop it):
>>> for n in numbers:
...     print(n)
...
So list comprehensions will run out of memory, while generator expressions will run out of time.
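If only a few items are needed from an infinite generator, one option is itertools.islice, which stops after a given number of items. A minimal sketch (the names first_five and sixth are made up for this example):

```python
from itertools import count, islice

numbers = (n**2 for n in count())

# islice stops after 5 items, so the infinite generator is never exhausted
first_five = list(islice(numbers, 5))
print(first_five)

# The generator remembers where it left off
sixth = next(numbers)
print(sixth)
```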
Generator Functions
Generator expressions aren’t the only way to make a generator.
A list comprehension is shorthand for a for loop that appends to a list, and a generator expression is likewise shorthand for a simple generator function.
This is a generator function:
def all_together(*iterables):
    for iterable in iterables:
        for x in iterable:
            yield x
We can copy-paste this into an equivalent generator expression:
def all_together(*iterables):
    return (
        x
        for iterable in iterables
        for x in iterable
    )
List comprehensions can be copy-pasted from a for loop with an optional if and an append. Generator expressions can likewise be copy-pasted from a for loop with an optional if and a yield.
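The copy-paste with an optional if can be sketched as follows. The function names even_squares and even_squares_expression are made up for this illustration.

```python
def even_squares(numbers):
    """Generator function: a for loop with an if and a yield."""
    for n in numbers:
        if n % 2 == 0:
            yield n**2

def even_squares_expression(numbers):
    """The same logic copy-pasted into a generator expression."""
    return (
        n**2
        for n in numbers
        if n % 2 == 0
    )

print(list(even_squares(range(10))))
print(list(even_squares_expression(range(10))))
```

The yielded value moves to the front, and the for and if clauses keep the same order they had in the loop.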
Extend and yield from
There are many cases where generator functions cannot be turned into generator expressions and many cases where for loops that build up lists cannot be turned into list comprehensions.
It’s not always obvious where you can use comprehensions.
Here’s a function that uses a list:
def all_together(*iterables):
    """The non-lazy list-returning version of all_together."""
    together = []
    for iterable in iterables:
        together.extend(iterable)
    return together
This for loop doesn’t have an append so it might seem like we can’t use a comprehension.
But we can! We first need to rewrite our code to use append instead of extend before we can copy-paste our way into a comprehension though:
def all_together(*iterables):
    """The non-lazy list-returning version of all_together."""
    together = []
    for iterable in iterables:
        for item in iterable:
            together.append(item)
    return together
Now we can copy-paste that loop into this comprehension:
def all_together(*iterables):
    """The non-lazy list-returning version of all_together."""
    return [
        item
        for iterable in iterables
        for item in iterable
    ]
We’ll have the same problem if we try to turn a generator function using yield from into a generator expression:
def all_together(*iterables):
    """The lazy generator version of all_together."""
    for iterable in iterables:
        yield from iterable
This generator function has to be turned into the one we saw before:
def all_together(*iterables):
    for iterable in iterables:
        for x in iterable:
            yield x
So that we can then copy-paste it into an equivalent generator expression:
def all_together(*iterables):
    return (
        x
        for iterable in iterables
        for x in iterable
    )
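As a quick check that the yield from version and the generator expression version behave the same, the sketch below runs both on the same inputs. The two versions are given distinct names here (an assumption made so they can coexist in one script), and the sample arguments are made up.

```python
def all_together_yield_from(*iterables):
    for iterable in iterables:
        yield from iterable

def all_together_expression(*iterables):
    return (
        x
        for iterable in iterables
        for x in iterable
    )

# Hypothetical sample inputs: a list, a tuple, and a string
args = ([1, 2], (3, 4), "ab")
print(list(all_together_yield_from(*args)))
print(list(all_together_expression(*args)))
```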
Scope
If we write a for loop, can we use the variables defined as we loop from outside of the loop?
>>> numbers = [1, 2, 3, 4]
>>> squares = []
>>> for n in numbers:
...     squares.append(n**2)
...
We can:
>>> n
4
If we write a list comprehension, can we use the variable m after the comprehension?
>>> numbers = [1, 2, 3, 4]
>>> squares = [m**2 for m in numbers]
Unlike for loops, we cannot access m from outside our comprehension:
>>> m
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'm' is not defined
What about for generator expressions?
>>> squares = (x**2 for x in numbers)
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
What if we evaluate the generator expression by looping over it?
>>> tuple(squares)
(1, 4, 9, 16)
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
There’s still no variable x.
Set comprehensions work the same way:
>>> squares = {x**2 for x in numbers}
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
As do dictionary comprehensions:
>>> squares = {x: x**2 for x in numbers}
>>> x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined
Comprehensions and generator expressions do not leak their scope in Python 3.
Note
In Python 2 you could access variables defined inside comprehensions from outside, but only for list comprehensions. List comprehensions didn’t have their own scope in Python 2! List comprehensions were invented before generator expressions which were invented before dictionary and set comprehensions. So list comprehensions were the only ones that didn’t have their own scope… until Python 3, which deemed this to be a “bug” instead of a “feature” and changed the scoping rules to be consistent.
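The Python 3 behavior can also be checked in a script rather than at the REPL. In this sketch the long variable names are made up so they cannot clash with anything else, and the last line shows one way to get at the final value without relying on a leaked variable.

```python
numbers = [1, 2, 3, 4]

for loop_variable in numbers:
    pass

squares = [comprehension_variable**2 for comprehension_variable in numbers]

# The for loop's variable is still around after the loop
print(loop_variable)

# The comprehension's variable is not
try:
    comprehension_variable
except NameError as error:
    print(error)

# If the last item is needed afterward, ask the collection for it explicitly
last_square = squares[-1]
print(last_square)
```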
Scope Oddities
This one is weird too:
class Cipher:
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    letter_a = alphabet[0]
    letters = {
        letter: ord(letter) - ord(letter_a)
        for letter in alphabet
    }
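If we execute this code in a Python REPL, we’ll get a NameError because letter_a is not defined.
This happens because dictionary comprehensions have their own scope, and code in that scope strangely cannot refer to variables in class scope (classes have their own scope). The name alphabet still works only because the iterable of the first for clause is evaluated in the class scope before the comprehension’s own scope is entered.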
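One possible workaround, offered here only as a sketch, is to fall back to a plain for loop inside the class body. The loop runs directly in class scope, so it can see letter_a.

```python
class Cipher:
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    letter_a = alphabet[0]

    # A plain for loop runs directly in the class body, so it can see letter_a
    letters = {}
    for letter in alphabet:
        letters[letter] = ord(letter) - ord(letter_a)
    del letter  # for loops leak their variable; avoid a stray class attribute

print(Cipher.letters['a'], Cipher.letters['z'])
```

The del line is there because the loop variable would otherwise become a class attribute named letter.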
Advanced Exercises
These exercises are all in the advanced.py file (except as noted) in the exercises directory. Edit the file to add the functions or fix the error(s) in the existing function(s). To run the test: from the exercises folder, type python test.py <function_name>, like this:
$ python test.py matrix_from_string
Matrix From String
Edit the matrix_from_string function so that it accepts a string and returns a list of lists of integers (found in the string).
Example:
>>> matrix_from_string("1 2\n10 20")
[[1, 2], [10, 20]]
Atbash Cipher
Instructions for this one can be found here.
You can test this one by typing:
$ python test.py encode
And:
$ python test.py decode
Memory-efficient CSV
Edit the function parse_csv so that it accepts a file object which contains a CSV file (including a header row) and returns a list of namedtuples representing each row. The file contains a partially implemented version that does not pass the tests.
Note
Python’s standard library has a csv module that makes reading and processing CSV files easy. It has a csv.reader object for reading CSV files that handles all the quoting and column separation for you. Each line in the file is read in as a list, with each element of the list being a column from the file. It also has a csv.DictReader object that reads each row into a dictionary where the key is the column name and the value is the string from the corresponding column. Using DictReader to read CSV files is convenient because CSV columns can be referenced by name (instead of positional order). However, there are some downsides to using DictReader. On Python versions before 3.6, CSV column ordering is lost because dictionaries were unordered. The space required to store each row is also unnecessarily large because dictionaries are not a very space-efficient data structure.
There is discussion of adding a NamedTupleReader to the csv module, but this hasn’t been implemented yet.
In the meantime, it’s not too difficult to use a csv.reader object to read a CSV file and then use a namedtuple to represent each row.
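The difference between csv.reader and csv.DictReader described in this note can be seen in a short sketch. The two-row sample text is made up, and io.StringIO stands in for an open file; the namedtuple step is left out because it is the exercise.

```python
import csv
import io

# Hypothetical two-row sample; io.StringIO stands in for an open file
csv_text = "state,capital\nAlabama,Montgomery\nAlaska,Juneau\n"

rows_as_lists = list(csv.reader(io.StringIO(csv_text)))
rows_as_dicts = list(csv.DictReader(io.StringIO(csv_text)))

# With csv.reader the header row is just another list
print(rows_as_lists[0])
print(rows_as_lists[1])

# With csv.DictReader the header is consumed and columns are looked up by name
print(rows_as_dicts[0]['capital'])
```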
Example with us-state-capitals.csv:
>>> with open('us-state-capitals.csv') as csv_file:
...     csv_rows = parse_csv(csv_file)
...
>>> csv_rows[:3]
[Row(state='Alabama', capital='Montgomery'), Row(state='Alaska', capital='Juneau'), Row(state='Arizona', capital='Phoenix')]
Deal Cards
Edit the get_cards and deal_cards functions. Some of them are partially implemented and may not pass the tests.
get_cards: returns a list of namedtuples representing cards. Each card should have suit and rank.
shuffle_cards: This function is provided for you.
deal_cards: accepts a deck and a number of cards as its arguments, removes the given number of cards from the end of the list and returns them (the example below gets 5 cards without passing a number).
Examples:
>>> from advanced import get_cards, shuffle_cards, deal_cards
>>> deck = get_cards()
>>> deck[:14]
[Card(rank='A', suit='spades'), Card(rank='2', suit='spades'), Card(rank='3', suit='spades'), Card(rank='4', suit='spades'), Card(rank='5', suit='spades'), Card(rank='6', suit='spades'), Card(rank='7', suit='spades'), Card(rank='8', suit='spades'), Card(rank='9', suit='spades'), Card(rank='10', suit='spades'), Card(rank='J', suit='spades'), Card(rank='Q', suit='spades'), Card(rank='K', suit='spades'), Card(rank='A', suit='hearts')]
>>> len(deck)
52
>>> shuffle_cards(deck)
>>> deck[-5:]
[Card(rank='9', suit='diamonds'), Card(rank='6', suit='hearts'), Card(rank='7', suit='diamonds'), Card(rank='K', suit='spades'), Card(rank='7', suit='clubs')]
>>> hand = deal_cards(deck)
>>> hand
[Card(rank='9', suit='diamonds'), Card(rank='6', suit='hearts'), Card(rank='7', suit='diamonds'), Card(rank='K', suit='spades'), Card(rank='7', suit='clubs')]
>>> len(deck)
47
>>> deck[-5:]
[Card(rank='5', suit='spades'), Card(rank='Q', suit='clubs'), Card(rank='Q', suit='spades'), Card(rank='2', suit='diamonds'), Card(rank='6', suit='clubs')]
Meetup
Instructions for this one can be found here.
You can test this one by typing:
$ python test.py meetup_day