Thursday, July 23, 2009

Phylogenetic Trees: making the plot

We're on the last leg of the journey now. Having calculated all the relevant values we are ready to make the plot. I used R to do this so you can follow along if you want.

I find it is easier for me to write R code from Python to be executed later than it is to write R code as text that needs to sort or manipulate values to do various things. And I really doubt that you care exactly how I achieved this. (Let me know if you want a copy of the Python script). Here is a small taste of it.

Some code I write as text and load into Python to combine with the other parts:

code = open('src.Rcode.txt').read()
code = code.strip().split('\n\n')
FH.write(code[0] + '\n')


In this part, we go through each eNode, recover its x-position and the x-position of its parent node, and write the instruction that will draw a line connecting it those two points.

# coordinates for the lines and text
for k in eL: # horizontal
p = rT[k] # parent
x1 = str(round(xD[k],5))
x0 = str(round(xD[p],5)) # parent's x
y = str(round(yD[k],2))
FH.write('lines(c(' + x0 + ',')
FH.write(x1 + '),c(')
FH.write(y + ',' + y + '))\n')


The ta-ta-ta-taa (Melvin video at 2:51) figure is here:



You can compare it with the original plot by the APE package:



And here is the listing for the R code:


ex=c(0.08401,0.08853,0.0595,0.08516,0.0871,0.09976)
ey=c(1.0,2.0,3.0,4.0,5.0,6.0)
ix=c(0.00827,0.07219,0.03863,0.0)
iy=c(1.5,4.5,5.25,3.0)
par(cex=1.5,yaxt='n')
X = max(ex) + 0.15
Y = max(c(max(ey),max(iy))) + 2
plot(ex,ey,xlim=c(0,X),
ylim=c(0,Y),type="n",
xlab='',ylab='')
lines(c(0.00827,0.08401),c(1.0,1.0))
text(0.08401,1.0,label="Stenotrophomonas maltophilia",pos=4,font=3)
lines(c(0.00827,0.08853),c(2.0,2.0))
text(0.08853,2.0,label="Kingella oralis",pos=4,font=3)
lines(c(0.0,0.0595),c(3.0,3.0))
text(0.0595,3.0,label="Pseudomonas aeruginosa",pos=4,font=3)
lines(c(0.07219,0.08516),c(4.0,4.0))
text(0.08516,4.0,label="Salmonella typhi",pos=4,font=3)
lines(c(0.07219,0.0871),c(5.0,5.0))
text(0.0871,5.0,label="Escherichia coli",pos=4,font=3)
lines(c(0.03863,0.09976),c(6.0,6.0))
text(0.09976,6.0,label="Haemophilus parainfluenzae",pos=4,font=3)
lines(c(0.0,0.00827),c(1.5,1.5))
lines(c(0.03863,0.07219),c(4.5,4.5))
lines(c(0.0,0.03863),c(5.25,5.25))
lines(c(0.00827,0.00827),c(1.0,2.0))
lines(c(0.07219,0.07219),c(4.0,5.0))
lines(c(0.03863,0.03863),c(4.5,6.0))
lines(c(0.0,0.0),c(1.5,5.25))
points(ex,ey,pch=16,col="red")
points(ix,iy,pch=16,col="blue")