Saturday, February 19, 2011

QIIME (1)

Yesterday and this morning, I've been working on installing QIIME and doing the tutorial. The name stands for Quantitative Insights into Microbial Ecology (link). It's yet another major software project out of Rob Knight's lab.

I'd like to post on the issues/questions/answers I run into as I work through it. It'll also give me an opportunity to get back to Bioinformatics (e.g. multiple sequence alignments).

The first thing is that they recommend you use a virtual machine (VirtualBox) and run the QIIME Virtual Box image in it. The image is a huge download that packages Ubuntu Linux with all of the QIIME dependencies, correctly configured. Sounds like a great idea.

I started by trying this. The download (> 2 GB) ran overnight on my machine at work, and then I had to buy a new thumb drive to bring the file home, since the work machine doesn't have the recommended amount of memory for the VM (1 GB). But I ran into trouble: lots of aggravation with the different keys on Linux, plus VirtualBox itself and the extra work of figuring out copy/paste, moving files over, etc. The killer was when the VM prompted for an administrator's password, and of course I didn't have it. I didn't think one should be needed.

[ After a search of the Forum the mystery word is revealed to be: qiime. I shoulda guessed it. ]

So I decided instead to deal with the QIIME dependencies myself. I got almost all of them (24 or so), but for the "default pipeline" you don't need nearly that many. Especially if you already have PyCogent and matplotlib installed, as I do, I'd recommend just going down the list. It's not difficult, and it's better to work in an environment where you're comfortable.

There was really no trouble configuring them (at least the essentials).

#1: Python 2.6

I have it, since I'm on OS X. I ran into a little trouble because (from playing with MacPorts) I still had /opt/local/bin on my PATH, added in .bash_profile. The QIIME install picks up whichever python comes first on PATH and writes it into its scripts (the she-bang #! line), so I had to fix that later on.
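
A quick way to catch this before installing is to check whether /opt/local/bin sits ahead of /usr/bin on PATH, and which python actually wins. Here's a minimal sketch in Python 3; the function name macports_shadows_system is just my own, hypothetical helper:

```python
import os
import shutil


def macports_shadows_system(path):
    """True if /opt/local/bin (MacPorts) comes before /usr/bin in a PATH string."""
    dirs = [d.rstrip("/") for d in path.split(os.pathsep)]
    if "/opt/local/bin" not in dirs:
        return False
    if "/usr/bin" not in dirs:
        return True
    return dirs.index("/opt/local/bin") < dirs.index("/usr/bin")


if __name__ == "__main__":
    # shutil.which follows PATH order, just like the shell does.
    print("python on PATH:", shutil.which("python"))
    if macports_shadows_system(os.environ.get("PATH", "")):
        print("warning: /opt/local/bin shadows /usr/bin; scripts may get the MacPorts python")
```

If it warns, fixing .bash_profile and opening a new shell before running setup.py should save the later she-bang cleanup.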

#2: Numpy 1.3.0

I have this too.


>>> import numpy
>>> print numpy.__version__
1.3.0


Not sure what the default OS X version is, but easy_install will fix you up quickly.
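
One gotcha when checking a version like this in a script: comparing the strings directly says "1.10.0" < "1.3.0". A small sketch that compares numerically instead (the helper names are my own, hypothetical ones):

```python
def version_tuple(version):
    """Turn '1.3.0' (or '1.5.0rc1') into (1, 3, 0) / (1, 5, 0) for numeric comparison."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # drop suffixes like 'rc1' or 'dev'
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def at_least(installed, required):
    """True if installed >= required, treating missing components as 0."""
    a, b = version_tuple(installed), version_tuple(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# Usage, assuming numpy is importable:
#   import numpy
#   at_least(numpy.__version__, "1.3.0")
```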

#3: PyCogent

Of course, this has its own dependencies. But we've been through that before (here, here, here).

#4: MUSCLE

This is not one of the primary dependencies, but it's an old friend. I upgraded to the latest and greatest version, but it caused an error in the QIIME tests. Luckily I still had muscle3.6_src around. As usual, I put it in ~/Software and do:


ln -s ~/Software/muscle3.6_src/muscle ~/bin/muscle


and I put ~/bin on my $PATH in .bash_profile:


export PATH=$HOME/bin/:$PATH
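
If you'd rather script the "put it in ~/Software and link into ~/bin" routine, the same thing can be done from Python. A minimal sketch; link_into_bin is a hypothetical helper, and it only replaces an existing symlink, never a real file:

```python
import os


def link_into_bin(target, bin_dir="~/bin", name=None):
    """Create bin_dir/name -> target, like `ln -s target bin_dir/name`."""
    target = os.path.expanduser(target)
    bin_dir = os.path.expanduser(bin_dir)
    os.makedirs(bin_dir, exist_ok=True)
    link = os.path.join(bin_dir, name or os.path.basename(target))
    if os.path.islink(link):
        os.remove(link)  # swap out a stale link, e.g. one to a newer MUSCLE
    os.symlink(target, link)
    return link


# Usage: link_into_bin("~/Software/muscle3.6_src/muscle")
```

Replacing the link rather than the binary makes it easy to go back to an older version, which is exactly what happened with MUSCLE here.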


#5: MAFFT

It comes with an installer. Piece of cake. Check it:


> mafft


#6: uclust

See the QIIME install notes for this one. uclust is the work of Robert Edgar (also the creator of MUSCLE), who is a talented programmer but also a businessman. Hence no source code, and no 64-bit versions of some things. But there's a download link on the QIIME site. The only problem is that Safari added a .txt extension to the download; I thought something was wrong, went off on a wild goose chase, and ended up with a version that wasn't new enough. Enough said. Put it in ~/Software and link it as usual:


> uclust
uclust v1.2.21q
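
Typing each command by hand works, but with this many external programs a one-shot check is handy. A minimal sketch using Python 3's shutil.which; the list of names is just the commands linked into ~/bin in this post (FastTree and java come further down), so adjust it to taste:

```python
import shutil

# Command names as used in this post; an assumption, edit for your own setup.
REQUIRED = ["muscle", "mafft", "uclust", "FastTree", "java"]


def missing_tools(names, path=None):
    """Return the names that shutil.which cannot find (path=None means use $PATH)."""
    return [name for name in names if shutil.which(name, path=path) is None]


if __name__ == "__main__":
    for name in missing_tools(REQUIRED):
        print("not found on PATH:", name)
```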


#7: PyNAST

(download)

Unpack it, move it to ~/Software, and from inside that directory:


python setup.py install
cd tests
python all_tests.py


After installing MUSCLE and MAFFT, PyNAST passes all tests except:


AssertionError: Got (DnaSequence(ACGTACG... 23), DnaSequence(ACGTACG... 23)), but expected (DnaSequence(ACGTACG... 23), DnaSequence(ACGTACG... 23))


That's pretty silly.
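
It looks silly, but one plausible reason (a guess, not a diagnosis) is that the sequence repr shows only the first seven bases plus the length, so two 23-base sequences that differ after position 7 print identically. The toy class below is hypothetical, not PyCogent's DnaSequence; it just reproduces that kind of message:

```python
class Seq:
    """Toy sequence whose repr, like the message above, shows 7 bases + length."""

    def __init__(self, bases):
        self.bases = bases

    def __eq__(self, other):
        return isinstance(other, Seq) and self.bases == other.bases

    def __repr__(self):
        return "Seq(%s... %d)" % (self.bases[:7], len(self.bases))


# Both pairs are 23 bases long and share the first 7 bases.
got = (Seq("ACGTACG" + "A" * 16), Seq("ACGTACG" + "C" * 16))
expected = (Seq("ACGTACG" + "A" * 16), Seq("ACGTACG" + "G" * 16))

print(repr(got) == repr(expected))  # the printed forms match...
print(got == expected)              # ...but the sequences themselves don't
```

Comparing the full strings (e.g. [s.bases for s in got]) would show where the real difference is.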

#8: Greengenes files

greengenes core set data file (fasta)
greengenes alignment lanemask file (txt)


Where to put them? For now I put them in qiime_tutorial (see below), but eventually I'll want them somewhere in the QIIME tree.

#9: FastTree

fasttree 2.1.0 (src)


gcc -Wall -O3 -finline-functions -funroll-loops -o FastTree FastTree-2.1.0.c -lm


Move the binary to ~/Software and link it into ~/bin as usual.

#10: Java

I don't need to add the Java runtime. I got it from Apple a while ago (here):


> java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
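
To check this from a script instead, note that java -version writes its banner to stderr, not stdout. A small parsing sketch; java_version is a hypothetical helper, and the regex only assumes the quoted version string shown above:

```python
import re

# Matches 'version "1.6.0_22"', 'version "11.0.2"', or 'version "17"'.
_VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?(?:\.(\d+))?(?:_(\d+))?')


def java_version(banner):
    """Return (major, minor, micro, update) from `java -version` text, or None."""
    m = _VERSION_RE.search(banner)
    if m is None:
        return None
    return tuple(int(g) if g is not None else 0 for g in m.groups())


# Usage (requires java on PATH):
#   import subprocess
#   out = subprocess.run(["java", "-version"], capture_output=True, text=True)
#   java_version(out.stderr)
```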


#11: RDP Classifier

rdp_classifier-2.0.1 (src)

In .bash_profile:


export RDP_JAR_PATH=$HOME/Software/rdp_classifier/rdp_classifier-2.0.jar


Check that the jar name in RDP_JAR_PATH matches the file you actually have: the download was 2.0.1, but the jar I got was named 2.0, not 2.0.1.
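
Since the version in the file name is easy to get wrong, here's a sketch of a check that falls back to any rdp_classifier*.jar sitting in the same directory. find_rdp_jar is a hypothetical helper of mine; it only reports a path and does not change your environment:

```python
import glob
import os


def find_rdp_jar(env=None):
    """Return the jar named by RDP_JAR_PATH, or a same-directory match, or None."""
    env = os.environ if env is None else env
    configured = env.get("RDP_JAR_PATH")
    if not configured:
        return None
    if os.path.isfile(configured):
        return configured
    # The name may not match what was unpacked (e.g. 2.0 vs 2.0.1),
    # so look for any rdp_classifier*.jar next to where it should be.
    folder = glob.escape(os.path.dirname(configured))
    candidates = sorted(glob.glob(os.path.join(folder, "rdp_classifier*.jar")))
    return candidates[0] if candidates else None
```

If it returns a different file from the one you exported, fix the export in .bash_profile.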

#12: QIIME

That's it for the default pipeline, apart from QIIME itself:


svn co https://qiime.svn.sourceforge.net/svnroot/qiime/trunk


I followed their advice and put the install scripts in a special place (but I wouldn't do that again).


python setup.py install --install-scripts=~/bin/qiime/bin/
cd tests
python all_tests.py


The catch with the special directory is that you then need a .qiime_config file (see the docs) so QIIME can find its scripts.

Make sure to put it in ~/.qiime_config.
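
As I understand the docs, the file is just one setting per line: a key, whitespace, then the value, with # for comments. Treat that format, and qiime_scripts_dir as the key for the scripts location, as my assumptions to check against the QIIME documentation. A small reader like this (my own sketch, not QIIME's parser) makes it easy to see what you've actually set:

```python
import os


def parse_config(lines):
    """Parse 'key<whitespace>value' lines, skipping blank lines and # comments."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)
        value = fields[1].strip() if len(fields) > 1 else None
        settings[fields[0]] = os.path.expanduser(value) if value else None
    return settings


# Usage:
#   with open(os.path.expanduser("~/.qiime_config")) as fh:
#       print(parse_config(fh))
```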

The following aren't needed yet, but I'll list them here anyway:

#13: BLAST

blast-2.2.22

We've done that one before. The BLASTMAT variable must point to the NCBI data directory. In .bash_profile:


export BLASTMAT=$HOME/Software/blast-2.2.22/data


#14: Infernal

(download)


./configure
make
make check
sudo make install


#15: R

Run R and do:


install.packages('randomForest')


I upgraded from R 2.10.0 to 2.12.1, but it had an issue: trying to get the ape package from CRAN hung the app. I backed off to 2.11.1 (see here).

Still to come:

10 more bits of software.

But they're not needed for the first part of the tutorial, so we should just do that first. Oh, I also have Sphinx installed (from the PyCogent instructions), so the documentation got built. But I've just been working from the web version anyway. On to the fun stuff.