
Thursday 16 October 2014

Getting HHBlits to produce profiles

HHblits was introduced in the paper "HHblits: Lightning-fast iterative protein sequence searching by HMM-HMM alignment".

First, download HH-suite from ftp://toolkit.genzentrum.lmu.de/pub/HH-suite/. You will also need a database from ftp://toolkit.genzentrum.lmu.de/pub/HH-suite/databases/hhsuite_dbs/. I used uniprot20, as it was the most recent one at the time of writing.

Extract HH-suite and the database. To run hhblits, you need a FASTA file for each protein you want to search. I had individual files containing just the protein sequences, but hhblits needs proper FASTA formatting, which here simply means adding a header line starting with '>' before the sequence. This can be done as follows:


mkdir -p eddtrain
for f in /home/james/Desktop/new_pssm_and_ssp/pssm_new/EDD/infile/*; do
    i=$(basename "$f")
    echo '>' > "eddtrain/$i"    # FASTA header line
    cat "$f" >> "eddtrain/$i"   # the sequence itself
done

For the script above to work, you need a directory called 'eddtrain' in the current directory. The script reads every file in ...EDD/infile/ and creates a corresponding file in eddtrain/ with > as the first line and the protein sequence on the line(s) after it. To run hhblits on all of these files, use the following:

mkdir -p eddhmms
for f in eddtrain/*.txt; do
    i=$(basename "$f"); echo "$i"
    hhsuite-2.0.16-linux-x86_64/bin/hhblits -d uniprot20_2013_03/uniprot20_2013_03 -i "$f" -ohhm "eddhmms/$i" -n 4 -M first
done

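This writes one HMM per input file into the eddhmms directory, which must exist before the loop runs. Here -n 4 sets the number of search iterations and -ohhm names the output HMM file.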

For each position, the HMMs contain a set of 20 numbers (one per amino acid) which are transformed probabilities. They can be converted back into probabilities with the following formula, where N is the stored value ('*' stands for N = infinity, i.e. p = 0):

\[p = 2^{-N/1000}\]
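A minimal shell sketch of the conversion above, using awk for the floating-point arithmetic. The function name hhm_to_prob is hypothetical, and it converts one value at a time; extracting the 20 values from an .hhm file is not shown here.

```shell
#!/bin/sh
# hhm_to_prob (hypothetical helper): convert one transformed HHM value N
# back into a probability using p = 2^(-N/1000).
# '*' stands for N = infinity, so it maps to p = 0.
hhm_to_prob() {
    if [ "$1" = "*" ]; then
        printf '%.6f\n' 0
    else
        # awk does the floating-point maths; ^ is awk's power operator
        awk -v n="$1" 'BEGIN { printf "%.6f\n", 2 ^ (-n / 1000) }'
    fi
}
```

For example, hhm_to_prob 1000 prints 0.500000, since 2^(-1000/1000) = 0.5, and hhm_to_prob 0 prints 1.000000.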

Possible errors

Error in hhblits: input file missing!

You need to specify the input file with the -i flag, e.g. -i train/file.fasta.


Error in hhblits: database missing (see -d)

You need to specify the database with -d, e.g. -d uniprot20_2013_03/uniprot20_2013_03. You can get the database from ftp://toolkit.genzentrum.lmu.de/pub/HH-suite/databases/hhsuite_dbs/. Extract it as you would any other compressed archive.


Error in hhblits: Could not open A3M database uniprot20_2013_03_a3m_db, No such file or directory (needed to construct result MSA)

This happens when you pass the database directory to -d. Instead, you need to give the path to the database inside that directory, in the format shown for the previous error (e.g. -d uniprot20_2013_03/uniprot20_2013_03 rather than -d uniprot20_2013_03).
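To catch this mistake before starting a long batch run, a small check like the following may help. It is only a sketch: check_db is a hypothetical helper, and it tests only for the <name>_a3m_db file that appears in the error message above, not for any other files the database might need.

```shell
#!/bin/sh
# check_db (hypothetical helper): sanity-check the value you plan to pass to -d.
# hhblits appends "_a3m_db" to the -d value, as the error message shows,
# so that file should exist.
check_db() {
    if [ -d "$1" ]; then
        # A bare directory is the mistake behind the error above
        echo "'$1' is a directory; pass the database path inside it, e.g. $1/$(basename "$1")" >&2
        return 1
    fi
    if [ ! -e "${1}_a3m_db" ]; then
        echo "missing ${1}_a3m_db; is '$1' the right database path?" >&2
        return 1
    fi
    echo "ok: $1"
}
```

For example, check_db uniprot20_2013_03/uniprot20_2013_03 succeeds only if uniprot20_2013_03/uniprot20_2013_03_a3m_db exists, while check_db uniprot20_2013_03 is rejected as a directory.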

