Thursday, June 25, 2009

Motif Discovery 10

This post will demonstrate the world's simplest Gibbs Sampler. We first construct a 2D distribution by hand. The Python code is:

# a very simple Gibbs sampler
import random,sys
#---------------------------------
# a discrete distribution
R = range(10)
distr = [
[1,1,2,2,1,1,1,1,1,1],
[1,3,4,5,2,1,1,1,1,1],
[1,4,7,6,2,1,1,1,1,1],
[1,3,4,3,2,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,2,2,3,2,1],
[1,1,1,1,1,2,3,4,2,1],
[1,1,1,1,1,2,2,5,3,1],
[1,1,1,1,1,1,1,1,2,1]]

def f(x,y):
return distr[x][y]
#---------------------------------
# save distribution for R plot
FH = open('results.txt','w')
for x in R:
for y in R:
# convert to 1-based index
t = (x+1,y+1,f(x,y))
L = [str(e) for e in t]
FH.write(' '.join(L) + '\n')
FH.close()


Two plots of the values (rotate the table above to see the correspondence):





The R code to do the plots:

Rcode
setwd('Desktop')
library(scatterplot3d)
m = read.table('results.txt',as.is=T)
par(las=1)
scatterplot3d(m,
type='h',pch=16,
cex.symbol=3,
cex.axis=2,
highlight.3d = T,
xlab='x',ylab='y',
zlab='z')

s = m[,3]
dim(s) = c(10,10)
s = t(s)
image(s,
col=topo.colors(10))


The sampler is simplicity itself:

def gibbsMove(L):
L = [t[0] for t in L]
S = sum(L)
cumulative = 0
# weight by score
r = random.random()
for i,s in enumerate(L):
cumulative += s
f = cumulative*1.0/S
if f > r: return i
raise ValueError('f < r')


Here are the top scores. The count is the number of times the sampler visited that value; the ratio of the count to the actual score (or value of the distribution at the point), is nearly constant.

c = count, s = score
c c/s ( s, x, y)
4208 601.14 ( 7.0 , 2 , 2 )
3789 631.5 ( 6.0 , 2 , 3 )
3239 647.8 ( 5.0 , 1 , 3 )
3103 620.6 ( 5.0 , 8 , 7 )
2636 659.0 ( 4.0 , 1 , 2 )
2551 637.75 ( 4.0 , 7 , 7 )
2465 616.25 ( 4.0 , 2 , 1 )
2466 616.5 ( 4.0 , 3 , 2 )
1929 643.0 ( 3.0 , 8 , 8 )
1894 631.33 ( 3.0 , 6 , 7 )
1901 633.67 ( 3.0 , 1 , 1 )
1893 631.0 ( 3.0 , 3 , 3 )
1842 614.0 ( 3.0 , 3 , 1 )
1971 657.0 ( 3.0 , 7 , 6 )
1315 657.5 ( 2.0 , 8 , 6 )
1374 687.0 ( 2.0 , 1 , 4 )
1268 634.0 ( 2.0 , 7 , 8 )
1394 697.0 ( 2.0 , 6 , 8 )
1363 681.5 ( 2.0 , 9 , 8 )
1227 613.5 ( 2.0 , 0 , 2 )


Here is the rest of the code needed to run the example and show the results:

x = random.choice(R)
y = random.choice(R)
L = [[f(x,y),x,y]]
t = ('x','y')

T = int(1E5)
for i in range(T):
L2 = list()
if random.choice(t) == 'x':
y = L[-1][2]
for x in R: L2.append((f(x,y),x,y))
else:
x = L[-1][1]
for y in R: L2.append((f(x,y),x,y))

j = gibbsMove(L2)
L.append(L2[j])
#---------------------------------
# show results
D = dict()
for t in L:
t = tuple(t)
if D.has_key(t): D[t] += 1
else: D[t] = 1

kL = sorted(D.keys(),key=lambda t: t[0])
kL.reverse()

def show(t):
counts = D[t]
score = t[0]
print str(counts).rjust(4),
ratio = 1.0*counts/score
print str(round(ratio,2)).ljust(7),
print '(' + str(round(score,1)).rjust(4),
print ',',t[1],',',t[2],')'

print 'c = count, s = score'
print ' c c/s ( s, x, y)'
for t in kL[:20]: show(t)
print '-' * 10
for t in kL[-20:]: show(t)


Update: fixed a bug in "show."