When and When Not To Use Comprehensions

Uses

Comprehensions can be used for building up new collections in lots of cases. Here we’re using a set comprehension and a Counter object to find the set of letters that appear exactly once in a given string:

from collections import Counter

def find_unique_letters(text):
    return {
        char
        for char, count in Counter(text.lower()).items()
        if count == 1
    }

In general, comprehensions and generator expressions can be used for building a new iterable from an old iterable.
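
As a small illustration of that idea, here is a sketch in which each form builds a new iterable from an old one. The words list and the variable names are made up for this example:

```python
words = ['apple', 'Banana', 'cherry', 'apple']

# List comprehension: a new list of upper-cased words
shouted = [word.upper() for word in words]

# Set comprehension: the distinct lower-cased words
distinct = {word.lower() for word in words}

# Dictionary comprehension: each distinct word mapped to its length
lengths = {word: len(word) for word in distinct}

# Generator expression: lengths computed lazily and consumed by sum()
total_length = sum(len(word) for word in words)

print(shouted)
print(sorted(distinct))
print(total_length)
```

In every case the old iterable is left alone and a new one comes out the other side.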

Non-Uses

In general, comprehensions should not be used for printing or executing code with side effects.

Comprehensions should never be used when you don’t care about the list or iterable you’re building up with them.

Here’s a for loop that writes data to a CSV file:

import csv

data = [('blue', 0.2), ('red', 0.3), ('green', 0.5)]
with open('data.csv', mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for color, frequency in data:
        writer.writerow((frequency, color))

You might think it would be a good idea to turn that loop into a list comprehension like this:

import csv

data = [('blue', 0.2), ('red', 0.3), ('green', 0.5)]
with open('data.csv', mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    [writer.writerow((frequency, color)) for color, frequency in data]

This makes our code less clear though.

Comprehensions are for turning old iterables into new iterables. We’re creating a list just to throw out its values.

Consider if we kept the values:

import csv

data = [('blue', 0.2), ('red', 0.3), ('green', 0.5)]
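with open('data.csv', mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    written = [writer.writerow((frequency, color)) for color, frequency in data]

That written variable is just a list of values we never wanted. In Python 3, writerow returns whatever the file’s write method returned, which here is the number of characters written for each row:

>>> written
[10, 9, 11]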

You shouldn’t use a list comprehension except for the specific case of making a new list out of an old list (or any new iterable out of any old iterable).

Looking For Uses

We weren’t making a new list in our CSV-writing code:

import csv

data = [('blue', 0.2), ('red', 0.3), ('green', 0.5)]
with open('data.csv', mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for color, frequency in data:
        writer.writerow((frequency, color))

But we could have been:

import csv

data = [('blue', 0.2), ('red', 0.3), ('green', 0.5)]
flipped_data = []
for color, frequency in data:
    flipped_data.append((frequency, color))

with open('data.csv', mode='w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(flipped_data)

Notice that we’re using the writerows method here instead of writerow, which we used before. The writerows method accepts an iterable of rows, not just one row.

Because we rewrote our code to make a new list, we can now use a list comprehension:

flipped_data = [
    (frequency, color)
    for color, frequency in data
]

In fact we could even make a generator expression:

flipped_data = (
    (frequency, color)
    for color, frequency in data
)

Uses for comprehensions and generator expressions aren’t always obvious. Sometimes you need to write your code in a different shape to find them.
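
Putting those pieces together, here is a minimal sketch that feeds the generator expression straight into writerows. It writes to an in-memory io.StringIO object instead of data.csv (an assumption made only so the example leaves no file behind):

```python
import csv
import io

data = [('blue', 0.2), ('red', 0.3), ('green', 0.5)]

# Lazily swap each (color, frequency) pair; nothing is computed yet
flipped_data = (
    (frequency, color)
    for color, frequency in data
)

# io.StringIO stands in for a real file opened with newline=''
csv_file = io.StringIO(newline='')
writer = csv.writer(csv_file)
writer.writerows(flipped_data)  # writerows consumes the generator row by row

print(csv_file.getvalue())
```

Because writerows loops over whatever it is given, the flipped rows never need to exist as a complete list.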

Efficiency

List, set, and dictionary comprehensions are often a bit faster than for loops, but their speed difference isn’t always huge.
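
If you want to measure the difference yourself, a rough sketch using the standard library’s timeit module might look like this. The function names are made up for this example, and the timings will vary by machine and Python version, so no expected numbers are shown:

```python
from timeit import timeit

numbers = range(1000)

def squares_with_loop():
    squares = []
    for n in numbers:
        squares.append(n**2)
    return squares

def squares_with_comprehension():
    return [n**2 for n in numbers]

# Both approaches build the same list
assert squares_with_loop() == squares_with_comprehension()

# Time 200 calls of each function
loop_time = timeit(squares_with_loop, number=200)
comprehension_time = timeit(squares_with_comprehension, number=200)
print(f"for loop:      {loop_time:.4f}s")
print(f"comprehension: {comprehension_time:.4f}s")
```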

Generator expressions are fairly different though. They are often more memory efficient than list comprehensions because they don’t need to create a whole list: they generate items one at a time as you loop over them.
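
One rough way to see this is with sys.getsizeof. This is only a sketch: getsizeof reports the size of the container object itself, not of the integers it refers to, and the exact numbers depend on your Python build:

```python
import sys

squares_list = [n**2 for n in range(100_000)]
squares_generator = (n**2 for n in range(100_000))

# The list holds a reference to every item, so it grows with the input
print(sys.getsizeof(squares_list))

# The generator only stores its current state, so it stays small
print(sys.getsizeof(squares_generator))
```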

This list comprehension (commented out so you don’t try executing it) would run out of memory because it would try to make an infinitely long list:

>>> from itertools import count
>>> # numbers = [n**2 for n in count()]
...

This generator expression won’t do any work until we start looping over it, so it will take up very little memory:

>>> numbers = (n**2 for n in count())

If we start looping over it, it’ll print out the squares of 0, 1, 2, and so on, and it will never stop:

>>> for n in numbers:
...     print(n)
...

So list comprehensions will run out of memory, while generator expressions will run out of time.
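
If you only need the first few items of an infinite generator, one option is itertools.islice. This is a minimal sketch; the name first_five is made up for the example:

```python
from itertools import count, islice

numbers = (n**2 for n in count())

# islice stops after 5 items, so we never try to exhaust the generator
first_five = list(islice(numbers, 5))
print(first_five)  # [0, 1, 4, 9, 16]

# The generator remembers its position and resumes where it left off
print(next(numbers))  # 25
```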

Generator Functions

Generator expressions aren’t the only way to make a generator.

List comprehensions are to list-building for loops as generator expressions are to generator functions.

This is a generator function:

def all_together(*iterables):
    for iterable in iterables:
        for x in iterable:
            yield x

We can copy-paste this into an equivalent generator expression:

def all_together(*iterables):
    return (
        x
        for iterable in iterables
        for x in iterable
    )

List comprehensions can be copy-pasted from a for loop with an optional if and an append. Generator expressions can likewise be copy-pasted from a for loop with an optional if and a yield.
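
Here is a sketch of that copy-paste process with an if included. The even_squares functions are hypothetical examples, not part of the exercises:

```python
def even_squares(numbers):
    """Generator function: a for loop with an if and a yield."""
    for n in numbers:
        if n % 2 == 0:
            yield n**2

def even_squares_expression(numbers):
    """The same logic, with the loop pasted into a generator expression."""
    return (
        n**2
        for n in numbers
        if n % 2 == 0
    )

print(list(even_squares([1, 2, 3, 4])))             # [4, 16]
print(list(even_squares_expression([1, 2, 3, 4])))  # [4, 16]
```

The yielded value moves to the front, and the for and if lines follow in the same order they appeared in the loop.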

Extend and yield from

There are many cases where generator functions cannot be turned into generator expressions and many cases where for loops that build up lists cannot be turned into list comprehensions.

It’s not always obvious where you can use comprehensions.

Here’s a function that uses a list:

def all_together(*iterables):
    """The non-lazy list-returning version of all_together."""
    together = []
    for iterable in iterables:
        together.extend(iterable)
    return together

This for loop doesn’t have an append so it might seem like we can’t use a comprehension.

But we can! Before we can copy-paste our way into a comprehension, we first need to rewrite our code to use append instead of extend:

def all_together(*iterables):
    """The non-lazy list-returning version of all_together."""
    together = []
    for iterable in iterables:
        for item in iterable:
            together.append(item)
    return together

Now we can copy-paste that loop into this comprehension:

def all_together(*iterables):
    """The non-lazy list-returning version of all_together."""
    return [
        item
        for iterable in iterables
        for item in iterable
    ]

We’ll have the same problem if we try to turn a generator function using yield from into a generator expression:

def all_together(*iterables):
    """The lazy generator version of all_together."""
    for iterable in iterables:
        yield from iterable

This generator function has to be turned into the one we saw before:

def all_together(*iterables):
    for iterable in iterables:
        for x in iterable:
            yield x

So that we can then copy-paste it into an equivalent generator expression:

def all_together(*iterables):
    return (
        x
        for iterable in iterables
        for x in iterable
    )
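
As a quick sanity check, this sketch puts the yield from version and the generator expression version side by side under different (made-up) names and confirms that they produce the same items:

```python
def all_together_yield_from(*iterables):
    """Generator function that delegates to each iterable in turn."""
    for iterable in iterables:
        yield from iterable

def all_together_expression(*iterables):
    """Generator expression with two for clauses."""
    return (
        x
        for iterable in iterables
        for x in iterable
    )

args = ([1, 2], (3, 4), 'ab')
print(list(all_together_yield_from(*args)))  # [1, 2, 3, 4, 'a', 'b']
print(list(all_together_expression(*args)))  # [1, 2, 3, 4, 'a', 'b']
```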

Scope

If we write a for loop, can we use the loop variable after the loop has finished?

>>> numbers = [1, 2, 3, 4]
>>> squares = []
>>> for n in numbers:
...     squares.append(n**2)
...

We can:

>>> n
4

If we write a list comprehension, can we use the variable m after the comprehension?

>>> numbers = [1, 2, 3, 4]
>>> squares = [m**2 for m in numbers]

Unlike for loops, we cannot access m from outside our comprehension:

>>> m
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'm' is not defined

What about generator expressions?

>>> squares = (x**2 for x in numbers)
>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

What if we evaluate the generator expression by looping over it?

>>> tuple(squares)
(1, 4, 9, 16)
>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

There’s still no variable x.

Set comprehensions work the same way:

>>> squares = {x**2 for x in numbers}
>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

As do dictionary comprehensions:

>>> squares = {x: x**2 for x in numbers}
>>> x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'x' is not defined

Comprehensions and generator expressions do not leak their scope in Python 3.
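
One practical consequence, sketched here with a made-up outer variable, is that a comprehension will not overwrite a variable of the same name that already exists, while a for loop will:

```python
x = 'outer'
numbers = [1, 2, 3, 4]

squares = [x**2 for x in numbers]  # this x lives in the comprehension's scope
print(x)  # outer

for x in numbers:  # a for loop reuses (and overwrites) the outer x
    pass
print(x)  # 4
```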

Note

In Python 2 you could access variables defined inside comprehensions from outside, but only for list comprehensions. List comprehensions didn’t have their own scope in Python 2! List comprehensions were invented before generator expressions, which were invented before dictionary and set comprehensions, so list comprehensions were the only ones that didn’t have their own scope. Python 3 deemed this to be a “bug” instead of a “feature” and changed the scoping rules to be consistent.

Scope Oddities

This one is weird too:

class Cipher:
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    letter_a = alphabet[0]
    letters = {
        letter: ord(letter) - ord(letter_a)
        for letter in alphabet
    }

If we execute this code in a Python REPL, we’ll get a NameError here because letter_a is not defined.

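This happens because dictionary comprehensions have their own scope, and code in that scope strangely cannot refer to variables in the enclosing class scope (classes have their own scope, but nested scopes skip over it when looking up names). The one exception is the first iterable in the comprehension (alphabet here), which is evaluated in the class scope before the comprehension’s own scope is entered. That’s why alphabet is found but letter_a is not.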
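
One possible workaround (a sketch, not the only option) is to fall back to a plain for loop, which runs directly in the class scope:

```python
class Cipher:
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    letter_a = alphabet[0]

    # A plain for loop runs in the class scope, so it can see letter_a
    letters = {}
    for letter in alphabet:
        letters[letter] = ord(letter) - ord(letter_a)
    del letter  # don't leave the loop variable behind as a class attribute

print(Cipher.letters['a'], Cipher.letters['z'])  # 0 25
```

Another option is to build the dictionary outside the class body, where ordinary scoping rules apply.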

Advanced Exercises

These exercises are all in the advanced.py file (except as noted) in the exercises directory. Edit the file to add the functions or fix the error(s) in the existing function(s). To run the tests for a function, type python test.py <function_name> from the exercises folder, like this:

$ python test.py matrix_from_string

Matrix From String

Edit the matrix_from_string function to accept a string and return a list of lists of integers (found in the string).

Example:

>>> matrix_from_string("1 2\n10 20")
[[1, 2], [10, 20]]

Atbash Cipher

Instructions for this one can be found here.

You can test this one by typing:

$ python test.py encode

And:

$ python test.py decode

Memory-efficient CSV

Edit the function parse_csv so that it accepts a file object which contains a CSV file (including a header row) and returns a list of namedtuples representing each row. It contains a partially-implemented version that does not pass the tests.

Note

Python’s standard library has a csv module that makes reading and processing CSV files easy. It has a csv.reader object for reading CSV files that handles all the quoting and column separation for you. Each line in the file is read in as a list, with each element of the list being a column from the file. It also has a csv.DictReader object that reads each row into a dictionary where the key is the column name and the value is the string from the corresponding column.

Using DictReader to read CSV files is convenient because CSV columns can be referenced by name (instead of positional order). However, there are some downsides to using DictReader. In Python versions before 3.6, CSV column ordering is lost because dictionaries were unordered. The space required to store each row is also unnecessarily large because dictionaries are not a very space-efficient data structure.

There is discussion of adding a NamedTupleReader to the csv module, but this hasn’t been implemented yet.

In the meantime, it’s not too difficult to use a csv.reader object to open a CSV file and then use a namedtuple to represent each row.
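
To make the difference between the two readers described in this note concrete, here is a small sketch that reads the same in-memory CSV text with each of them. The csv_text string is made up for the example, using the first two rows shown in the output below:

```python
import csv
import io

csv_text = "state,capital\nAlabama,Montgomery\nAlaska,Juneau\n"

# csv.reader yields each row (including the header) as a list of strings
list_rows = list(csv.reader(io.StringIO(csv_text)))
print(list_rows[0])  # ['state', 'capital']
print(list_rows[1])  # ['Alabama', 'Montgomery']

# csv.DictReader uses the header row as keys and yields one mapping per row
dict_rows = list(csv.DictReader(io.StringIO(csv_text)))
print(dict_rows[0]['capital'])  # Montgomery
```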

Example with us-state-capitals.csv:

>>> with open('us-state-capitals.csv') as csv_file:
...     csv_rows = parse_csv(csv_file)
...
>>> csv_rows[:3]
[Row(state='Alabama', capital='Montgomery'), Row(state='Alaska', capital='Juneau'), Row(state='Arizona', capital='Phoenix')]

Deal Cards

Edit the get_cards and deal_cards functions. Some of them are partially implemented and may not pass the tests.

  • get_cards: returns a list of namedtuples representing cards. Each card should have suit and rank.

  • shuffle_cards: This function is provided for you.

  • deal_cards: accepts a deck and a number of cards (the example below deals 5 cards without passing a number), removes that many cards from the end of the deck, and returns them

Examples:

>>> from advanced import get_cards, shuffle_cards, deal_cards
>>> deck = get_cards()
>>> deck[:14]
[Card(rank='A', suit='spades'), Card(rank='2', suit='spades'), Card(rank='3', suit='spades'), Card(rank='4', suit='spades'), Card(rank='5', suit='spades'), Card(rank='6', suit='spades'), Card(rank='7', suit='spades'), Card(rank='8', suit='spades'), Card(rank='9', suit='spades'), Card(rank='10', suit='spades'), Card(rank='J', suit='spades'), Card(rank='Q', suit='spades'), Card(rank='K', suit='spades'), Card(rank='A', suit='hearts')]
>>> len(deck)
52
>>> shuffle_cards(deck)
>>> deck[-5:]
[Card(rank='9', suit='diamonds'), Card(rank='6', suit='hearts'), Card(rank='7', suit='diamonds'), Card(rank='K', suit='spades'), Card(rank='7', suit='clubs')]
>>> hand = deal_cards(deck)
>>> hand
[Card(rank='9', suit='diamonds'), Card(rank='6', suit='hearts'), Card(rank='7', suit='diamonds'), Card(rank='K', suit='spades'), Card(rank='7', suit='clubs')]
>>> len(deck)
47
>>> deck[-5:]
[Card(rank='5', suit='spades'), Card(rank='Q', suit='clubs'), Card(rank='Q', suit='spades'), Card(rank='2', suit='diamonds'), Card(rank='6', suit='clubs')]

Meetup

Instructions for this one can be found here.

You can test this one by typing:

$ python test.py meetup_day