
Thursday, 16 October 2014

Getting PSI-BLAST to create PSSMs

This post is a reference on how to generate PSSM (position-specific scoring matrix) profiles for a list of proteins using NCBI's PSI-BLAST.

First of all, you'll need the PSI-BLAST tool itself, which is part of BLAST+ (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/), and a database, e.g. the nr dataset (ftp://ftp.ncbi.nlm.nih.gov/blast/db/). For nr, download all the numbered files (nr.00.tar.gz to nr.26.tar.gz) and extract them all into one folder.
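
Fetching that many volumes by hand gets tedious, so here is a minimal POSIX shell sketch of the download-and-extract step. It assumes wget and tar are installed and uses the two-digit volume names quoted above (nr.00.tar.gz to nr.26.tar.gz); the number and naming of the volumes on the FTP site may have changed since this was written, so check the listing first. The function names nr_volume_urls and fetch_nr are made up for this example.

```shell
#!/bin/sh
# Minimal sketch: fetch and unpack the numbered nr volumes into one folder.
# Assumptions: wget and tar are installed, and the volumes are named
# nr.00.tar.gz, nr.01.tar.gz, ... as in this post (the current count and
# naming on the FTP site may differ).
NR_BASE=ftp://ftp.ncbi.nlm.nih.gov/blast/db

# Print the URL of every nr volume from 00 up to the given last index.
nr_volume_urls() {
  local last=$1 i=0
  while [ "$i" -le "$last" ]; do
    printf '%s/nr.%02d.tar.gz\n' "$NR_BASE" "$i"
    i=$((i + 1))
  done
}

# Download each volume into DEST and extract it there; stop at the first failure.
fetch_nr() {
  local last=$1 dest=$2 url
  mkdir -p "$dest" || return 1
  for url in $(nr_volume_urls "$last"); do
    wget -P "$dest" "$url" || return 1
    tar -xzf "$dest/${url##*/}" -C "$dest" || return 1
  done
}
```

For example, running 'fetch_nr 26 nr' would download the 27 archives into nr/ and unpack them there, giving the nr/ directory assumed in the rest of this post.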

Let's say you have a text file called prots.txt with one protein sequence per line, something like:


SQETFSDLWKLLPEN
STSRHKKLMFK
SPLPSQAMDDLMLSPDDIEQWFTEDPGP
SHLKSKKGQSTSRHKKLMFK

PSI-BLAST needs these as separate input files, so we need to create a query file for each one; we will do this at the same time as running PSI-BLAST. I will assume the nr dataset was extracted into a directory called nr/. To run PSI-BLAST, I change into the nr/ directory, which ensures PSI-BLAST can find the database. From there, run the following script (e.g. save it in a file called 'run.sh' and execute it):


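PSIBLAST=../ncbi-blast-2.2.26+-src/c++/GCC461-Release64/bin/psiblast
OUT=~/lab_work/nr_db
mkdir -p "$OUT/train" "$OUT/out" "$OUT/pssm"
c=1
while read -r seq
do
  [ -z "$seq" ] && continue   # skip blank lines
  echo "$c $seq"
  # write one FASTA query per protein, then build its PSSM with 3 iterations
  printf '>prot%d\n%s\n' "$c" "$seq" > "$OUT/train/train${c}.txt"
  "$PSIBLAST" -query "$OUT/train/train${c}.txt" -db nr -out "$OUT/out/out${c}.txt" \
    -num_iterations 3 -out_ascii_pssm "$OUT/pssm/pssm${c}.txt" -inclusion_ethresh 0.001 < /dev/null
  c=$((c + 1))
done < prots.txt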

The script above writes each line of prots.txt to its own file called trainN.txt, where N starts at 1 and goes up by one for each protein. Because the script is run from inside nr/, prots.txt has to be in that directory too (or be given with its full path). The directory for the train files must exist, as must directories called 'out' and 'pssm'; these are filled with the PSI-BLAST outputs and the PSSM files respectively.
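
After a long run it is worth checking that every protein actually ended up with a PSSM. The sketch below is a hypothetical helper, missing_pssms, that is not part of BLAST+. It assumes one sequence per line in prots.txt, numbers the proteins the same way as the script (blank lines skipped, N starting at 1), and prints each N whose pssmN.txt is missing or empty.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST+): print the number N of every protein
# whose pssmN.txt is missing or empty. Assumes one sequence per line in the
# protein list and numbers proteins like the run script: blank lines are
# skipped and N starts at 1.
missing_pssms() {
  local prots=$1 pssm_dir=$2 c=1 seq
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while read -r seq || [ -n "$seq" ]; do
    if [ -z "$seq" ]; then
      continue                      # blank lines get no number
    fi
    if [ ! -s "$pssm_dir/pssm$c.txt" ]; then
      echo "$c"                     # missing or empty PSSM file
    fi
    c=$((c + 1))
  done < "$prots"
}
```

For example, 'missing_pssms prots.txt ~/lab_work/nr_db/pssm' prints nothing when every protein has a PSSM; any number it does print points to the matching outN.txt report worth inspecting.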
