Prime Time Programming, Part 1

2011-04-25 on Development, Primes, Profiling, ProjectEuler, Python

From time to time, I find myself caught up in the heady world of Project Euler. It's almost like playing Professor Layton or some other puzzle-solving game - mathematical or programming brain teasers, served in bite-sized pieces.

If you look at the Project Euler problems for any length of time, you'll notice common themes. One theme is prime numbers - many problems can't be solved without generating varying quantities of primes. To that end, I'd written a prime generator to be shared between problem solutions. Over time, I added functionality and "optimized" the generator to improve the running time of my solutions. Everything was great, until I tried Problem 315, whose solution required a list of primes between 10⁷ and 2×10⁷. My solution worked, but it ran really slowly - something like 12 minutes. Now, I'm not doing myself any favours by writing in Python and running on a 7-year-old laptop, but that's still too long.

My Original Generator

This is the slow-performing generator that I replaced when working on Problem 315. The squeamish reader may want to avert his eyes.

class PrimeGenerator:
    __primes_so_far = [5]
    __increment = 2

    def __init__(self):
        self.__next_index = -1


    def is_prime(self, n):
        if n == 2 or n == 3:
            return True
        elif n % 2 == 0 or n %3 == 0:
            return False
        elif n <= PrimeGenerator.__primes_so_far[-1]:
            # bisecting didn't really help
            return n in PrimeGenerator.__primes_so_far
        else:
            return self.__is_prime(n)

    def __is_prime(self, n):
        limit = math.sqrt(n)
        g = PrimeGenerator()
        g.next() # skip 2
        g.next() # skip 3
        p = g.next()
        while p <= limit:
            if  n % p == 0:
                return False
            p = g.next()
        return True

    def next(self):
        self.next = self.__next3
        return 2

    def __next3(self):
        self.next = self.__next5
        return 3

    def __next5(self):
        self.__next_index += 1
        if self.__next_index < len(PrimeGenerator.__primes_so_far):
            return PrimeGenerator.__primes_so_far[self.__next_index]

        candidate = PrimeGenerator.__primes_so_far[-1]

        while True:
            candidate += PrimeGenerator.__increment
            PrimeGenerator.__increment = 6 - PrimeGenerator.__increment

            if self.__is_prime(candidate):
                PrimeGenerator.__primes_so_far.append(candidate)
                return candidate

    def __iter__(self):
        return self</code></pre>

My eyes!

When I first went back to this code, I exclaimed, "What was I thinking?". I can think of two things:

The is_prime member was intended to help out for problems where I didn't have to create too many primes, but just had to check a few number for primality. This doesn't really belong here, and clutters the class. I'd be better off focusing on prime generation.
I was optimizing at least partly for the case where I'd want to get lists of primes a couple times—hence the indexing into an already-generated list of primes. If the generator were fast enough, this wouldn't be necessary.

Things I can't understand, even today. I'll have to blame them on then-ignorance, foolishness, and hasty modifications to the class:

Why the mucking about with __next3 and __next5? What did I have against yield?
Why is is_prime fussing with a whole new PrimeGenerator and skipping 2 and 3? Why not just go straight for the saved list of primes starting with 5?

In fact, just about the only defensible things (as we'll see below) in the whole class are:

ending the modulus checks once the candidate divisor is greater than the square root of the candidate number, and
the bit where the next__increment is formed by subtracting the current one from 6 - I was exploiting the fact that, past 3, for any prime p, p ≡ 1 (mod 6) or p ≡ 5 (mod 6).

How slow was it?

The generator took 551.763 seconds to generate primes less than 10⁷, as measured by the following:

def run(f):
    global highest
    start = datetime.datetime.now()
    count = 1
    for p in f:
        count += 1
        if p > highest: break
    end = datetime.datetime.now()
    elapsed = end-start
    return highest, count, elapsed.seconds + (elapsed.microseconds/1000000.0)

Where f is an instance of PrimeGenerator passed into the run method, and highest is a global that's been set to 10⁷.

Moving forward

Based on the extreme slowness and general horribleness of the code, I figured I'd be better off starting over. So, I chucked the generator and started afresh with the simplest generator I could write. I resolved to make incremental changes, ensuring that at each step, the code was:

correct (otherwise, why bother)
faster than the previous version
simple enough for me to understand

Let's see what happened.

Attempt 1: Trial Division

Trial division is one of the simplest methods for generating primes—you start counting at 2, and if no smaller positive integers (other than 1) divide the current number, it's prime. The naive implementation is very simple.

def naive_trial():
    n = 2
    while True:
        may_be_prime = True
        for k in xrange(2, n):
            if n % k == 0:
                may_be_prime = False
                break
        if may_be_prime:
            yield n
        n += 1

This method takes 113.804 seconds to generate primes below 100000—I couldn't wait for the full 10⁷ - it would probably be over 3 hours.

Trial until root

Fortunately, there are some pretty obvious optimizations one can make to the algorithm. The first comes from the observation that if there is a number k, 1 < k < n, that divides n, then there is a number j that divides n with 1 < j ≤ √n, so we can stop our trial once we've hit that point.

def trial_until_root():
    n = 2
    while True:
        may_be_prime = True
        for k in xrange(2, int(n**0.5)+1):
            if n % k == 0:
                may_be_prime = False
                break
        if may_be_prime:
            yield n
        n += 1

This method takes 468 seconds to generate primes up to 10⁷. A definite improvement (and already faster my original generator), but there's still room for more.

Trial by primes

Here's another observation about divisors of n: if there's a number k that divides n, then there's a prime number p ≤ k that divides n, since either k is prime or has prime factors. So if we keep a list of the primes found so far, we only need to check prime divisors that are less than √n.

def trial_by_primes():
    primes_so_far = []
    n = 2
    while True:
        may_be_prime = True
        for p in primes_so_far:
            if n % p == 0:
                may_be_prime = False
                break
            if p * p > n: # it's prime
                break
        if may_be_prime:
            primes_so_far.append(n)
            yield n
        n += 1

Now we're down to 136 seconds to generate primes below 10⁷. That was a worthwhile change, but we have to balance it against the fact that the generator now requires additional storage - the list of primes encountered so far. In this case, we're storing over 660,000 prime numbers in a list. Even an older laptop can handle this burden, but it's something to keep in mind.

That's odd

And by "that", I mean "all the prime numbers except for 2". There's no point checking the even numbers to see if they're prime. Let's see what happens when we skip them. The only tricky part (and it's not that tricky) is to make sure we still return 2 as our first prime.

def odd_trial_by_primes():
    primes_so_far = []
    yield 2
    n = 3
    while True:
        may_be_prime = True
        for p in primes_so_far:
            if n % p == 0:
                may_be_prime = False
                break
            if p * p > n: # it's prime
                break
        if may_be_prime:
            primes_so_far.append(n)
            yield n
        n += 2

This method takes 127 seconds to generate primes less than 10⁷. Not a huge improvement, but better than nothing, and it doesn't really complicate the code that much. I'll keep it. The reason we don't get a huge improvement here is that checking the even numbers for primeness doesn't take that much effort - they were eliminated as soon as we modded them by the first prime in primes_so_far. Still, it's a little quicker to jump right over them than to perform the division.

What about 3?

If skipping the numbers that are divisible by 2 paid off, will skipping those divisible by 3? As I noted above, every prime p greater than 3 satisfies p ≡ 1 (mod 6) or p ≡ 5 (mod 6). Let's use that. We'll take advantage of these facts:

if p ≡ 1 (mod 6) , then p+4 ≡ 5 (mod 6)
if p ≡ 5 (mod 6) , then p+2 ≡ 1 (mod 6)

So we want to alternate our step between 2 and 4. Fortunately 6 - 4 = 2 and 6 - 2 = 4, so we can use 6 - step as our next step.

def sixish_trial_by_primes():
    primes_so_far = []
    yield 2
    yield 3
    step = 2
    n = 5
    while True:
        may_be_prime = True
        for p in primes_so_far:
            if n % p == 0:
                may_be_prime = False
                break
            if p * p > n: # it's prime
                break
        if may_be_prime:
            primes_so_far.append(n)
            yield n
        n += step
        step = 6 - step

Now the time drops to 123 seconds to generate primes less than 10⁷. Clearly we've hit diminishing returns - we're saving two modulus operations on numbers that are divisible by 3 (but not 2), at the cost of a more complicated "step" calculation. We could continue on in this vein, but the gains are not likely to be large, and the complexity increases rapidly. Consider the next step: we'd make sure we don't test numbers divisible by 2, 3, or 5. That means (after 5) we only consider numbers whose remainders when divided by 30 are one of 1, 7, 11, 13, 17, 19, 23, or 29. The steps between numbers are 6, 4, 2, 4, 2, 4, 6, and 2. Who has the energy?

The problem with Trial Division

Trial division has a few things going for it:

it's simple to understand
there are some obvious optimizations that can make its performance tolerable

Ultimately, though, its downfall is that it takes a lot of work to verify that a large number is prime. Consider the largest prime number below 10⁷: 9999991. In order to verify that this is prime, we have to consider all prime factors less than √9999991. There are 446 of these. That's 446 divisions, just to verify that one number is prime.

We're unlikely to radically improve performance by continuing to tinker with trial division. It's time to throw the whole thing away again and try a new approach. Next time we'll do just that.