This post will be a reference on how to generate PSSM profiles for a list of proteins using NCBI's PSI-BLAST.
First of all, you'll need the PSI-BLAST tool itself from here: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ and a database e.g. the nr dataset from here: ftp://ftp.ncbi.nlm.nih.gov/blast/db/, just download all the numbered files e.g. nr.00.tar.gz -> nr.26.tar.gz and extract them all to a folder somewhere.
Let's say you have a text file containing protein sequences called prots.txt containing something like:
SQETFSDLWKLLPEN
STSRHKKLMFK
SPLPSQAMDDLMLSPDDIEQWFTEDPGP
SHLKSKKGQSTSRHKKLMFK
PSI-BLAST needs these as separate input files, so we need to create a fasta file for each one. We will do this at the same time as running PSI-BLAST. I will assume we extracted the nr dataset into a directory called nr/. Now to run PSI-BLAST, I move in to the nr/ directory. This ensures PSI-BLAST can find the dataset. From here we run the following script (e.g. put it in a file called 'run.sh' and execute it):
c=1
for i in `cat prots.txt`
do
echo $c $i
echo $i > ~/lab_work/nr_db/train/train${c}.txt
../ncbi-blast-2.2.26+-src/c++/GCC461-Release64/bin/psiblast -query ~/lab_work/nr_db/train/train${c}.txt -db nr -out ~/lab_work/nr_db/out/out${c}.txt -num_iterations 3 -out_ascii_pssm ~/lab_work/nr_db/pssm/pssm${c}.txt -inclusion_ethresh 0.001
c=`expr $c + 1`
done
The script above outputs each line of prots.txt into a file called trainN.txt, where N is a number that is incremented for each protein. You'll have to make sure the directory you want to put these train files in exists. There will also need to be directories called 'out' and 'pssm'. These will be filled with PSI-BLAST outputs and the pssm files respectively.
It was really helpful (Y)
ReplyDeleteAwesome blog. I would love to see true life prepared to walk, so please share more informative updates. Great work keeps it up. AZ-100 exam questions
ReplyDeleteThe Community-Driven Education is the best to help us and provide great results. The Project Ownership Where it Belongs is amazing and I like that you shared this post for us to know about these ideas. Also from ANS-C00 Practice Questions I realize that it is more helpful for us.
ReplyDelete