Thursday, December 16, 2010

Python generators

If you want to know about generators, I don't find the docs all that helpful, but you could check the PEPs, like this one.

Below is possibly the world's simplest Python generator: it's a function that includes the yield statement in a loop. It takes a number as the single argument and increments it once for each pass through the loop. The loop is endless: while True, but that's not a problem because we ask for one value at a time. Eventually we tire of the game and kill the process.


def g(n):
while True:
n += 1
yield n

f = g(13)
print f
x = f.next()
print x

while x <= 17:
x = f.next()
print x



<generator object g at 0x1004c7f50>
14
15
16
17
18

Notice that if we really wanted the test to work properly, we should reverse the order of the statements in the second while loop.

We can extend this idea to the problem of finding prime numbers---one of the oldest computational problems. In the code below we cache and then serve up the first four values. After that, the algorithm is that we test each prime number p in turn against the candidate n:

first, construct p*p
if p*p == n, n is not prime, it's the square of a prime
elif p divides n evenly, n is not prime
elif p*p > n, n is prime
else: test the next p

If n passes all the tests, it joins the club. Since all the primes beyond 5 end in 1, 3, 7, or 9, we test no other candidates.

I did three runs, a short one with v (verbose) = True to help test our logic, and longer ones with N = 10000 and N = 100000.
We grab N primes and look at the distribution.

testing 11 : 2 3 5 prime
testing 13 : 2 3 5 prime
testing 17 : 2 3 5 prime
testing 19 : 2 3 5 prime
testing 21 : 2 3 is a factor
testing 23 : 2 3 5 prime
testing 27 : 2 3 is a factor
testing 29 : 2 3 5 7 prime
testing 31 : 2 3 5 7 prime
testing 33 : 2 3 is a factor
testing 37 : 2 3 5 7 prime
testing 39 : 2 3 is a factor
testing 41 : 2 3 5 7 prime
testing 43 : 2 3 5 7 prime
testing 47 : 2 3 5 7 prime
testing 49 : 2 3 5 7 square of prime
testing 51 : 2 3 is a factor
testing 53 : 2 3 5 7 11 prime
testing 57 : 2 3 is a factor
testing 59 : 2 3 5 7 11 prime
testing 61 : 2 3 5 7 11 prime
testing 63 : 2 3 is a factor
testing 67 : 2 3 5 7 11 prime
testing 69 : 2 3 is a factor
testing 71 : 2 3 5 7 11 prime
{1: 5, 2: 1, 3: 5, 5: 1, 7: 5, 9: 3}
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]


{1: 2484, 2: 1, 3: 2515, 5: 1, 7: 2508, 9: 2491}


{1: 24967, 2: 1, 3: 25007, 5: 1, 7: 25015, 9: 25009}

Notice that the probability of ending in 1, 3, 7, or 9 is essentially identical. It would be interesting to look at the patterns in more detail, for example within a particular decade do 3 and 7 tend to cluster, etc.

v = False
def primes():
pL = [2,3,5,7]
for i in range(4):
yield pL[i]
base = 0
while True:
base += 10
for i in [1,3,7,9]:
n = base + i
if v: print 'testing', n, ':',
for p in pL:
if v: print p,
prod = p*p
if prod > n:
if v: print 'prime'
pL.append(n)
yield n
break
if prod == n:
if v: print 'square of prime'
break
if not n % p:
if v: print 'is a factor'
break

def test(N):
p = primes()
L = [p.next() for i in range(N)]
return L

L = test(100000)
D = dict(zip([1,2,3,5,7,9],[0]*6))
for n in L:
x = n % 10
D[x] += 1
print D
if v: print L[:16]

It's seems likely that this is not efficient. The tests for 2 and 3 seem unnecessary. We should just test whether n is even, whether the sum of the digits is divisible by 3, and never test 5 at all. There are other divisibility rules as well (wikipedia).

If you want an efficient prime number generator in Python, go here. Although unutbu is awesome, you should look at Alex Martelli's answer, which is short and awfully fast.

[UPDATE: I found several errors in an earlier version of this post.]