Thursday, 16 October 2014

Getting SPINE-X to Predict Protein Secondary Structure

SPINE-X is a tool that takes PSSM (position-specific scoring matrix) files as input and predicts a protein's secondary structure, with an accuracy of roughly 80%. It can be downloaded here:

Extract the tgz file, then cd into the resulting directory. It should contain bin, code, test and weights directories. If you don't already have one, you'll need a Fortran compiler; gfortran works fine. Then cd into the code directory and run the compile script, which should build all the required executables.
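Before running the compile script, you can sanity-check the unpacked archive. The bash function below is a hypothetical helper of my own (`check_spinex_layout` is not part of SPINE-X): it checks for the four directories listed above and warns if gfortran isn't on your PATH. It doesn't run the compile script for you.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SPINE-X): verify that an unpacked SPINE-X
# directory contains the bin, code, test and weights subdirectories.
# Also warns (without failing) if gfortran is not on the PATH.
# Returns 0 only if all four directories are present.
check_spinex_layout() {
    local root="$1"
    local missing=0
    local d
    for d in bin code test weights; do
        if [ ! -d "$root/$d" ]; then
            echo "missing directory: $root/$d" >&2
            missing=1
        fi
    done
    if ! command -v gfortran >/dev/null 2>&1; then
        echo "warning: gfortran not found on PATH" >&2
    fi
    return "$missing"
}
```

Usage, from the directory you extracted into (assuming the archive unpacks to spineXpublic, as in the path used later in this post): `check_spinex_layout spineXpublic && echo "ready to compile"`.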

The next caveat is that SPINE-X only accepts PSSM files with a '.mat' extension, so if yours use a different extension you'll need to rename them. For example, when I generated my PSSM files, I gave them a '.txt' extension. To rename them all at once, you can run the following:

for f in *.txt; do mv -- "$f" "${f%.txt}.mat"; done

To run the loop above, make sure you are in the directory containing the PSSM files. It changes the extension of every '.txt' file in that directory to '.mat'.
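If your file names might contain spaces, or you already have some '.mat' files you don't want overwritten, here is a slightly more defensive sketch of the same rename as a bash function (`rename_txt_to_mat` is a hypothetical name). It takes the directory as an argument, so you don't have to cd into it, and uses `mv -n` so an existing '.mat' file is never clobbered.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rename every *.txt file in a directory to *.mat.
# mv -n never overwrites an existing .mat file with the same base name.
rename_txt_to_mat() {
    local dir="$1"
    local f restore
    restore=$(shopt -p nullglob)   # remember the caller's nullglob setting
    shopt -s nullglob              # make *.txt expand to nothing if there are no matches
    for f in "$dir"/*.txt; do
        mv -n -- "$f" "${f%.txt}.mat"
    done
    eval "$restore"                # put nullglob back the way it was
}
```

Usage, from the directory above your PSSM folder: `rename_txt_to_mat pssm`. Quoting "$f" and globbing instead of parsing `ls` output is what makes names with spaces safe.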

Now move up one directory and run the following ('pssm' is the directory containing the PSSM files we just renamed):

ls pssm | sed -n 's/\.mat$//p' > list.txt

This creates a list of all the PSSM files without their extensions; SPINE-X adds '.mat' back automatically. Now we can finally run SPINE-X itself:

/home/james/Downloads/spineXpublic/ list.txt pssm/ 

The command runs a Perl script from the spineXpublic.tgz archive we downloaded originally. 'list.txt' is the file list we just created, and 'pssm/' is the directory containing all the PSSM files, renamed to have a '.mat' extension. This creates a new directory called 'spXout/' containing all the predicted secondary structure files.
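To catch mistakes before a long run, you could wrap the call in a small bash function. This is a hypothetical sketch (`run_spinex` is my own name, not part of SPINE-X) that takes the path to the SPINE-X Perl script as its first argument. It checks that every name in the list maps to a '.mat' file in the PSSM directory, runs the script, and reports how many files landed in spXout/. It assumes the script is executable and writes spXout/ into the current directory.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the SPINE-X Perl script.
# Assumptions (not from the SPINE-X docs): the script is executable, it is
# called from the directory holding the list file and the PSSM folder, and
# it writes spXout/ into the current directory.
run_spinex() {
    local script="$1" list_file="$2" pssm_dir="${3%/}"
    local name missing=0 expected produced

    [ -x "$script" ] || { echo "not executable: $script" >&2; return 1; }
    [ -f "$list_file" ] || { echo "missing list: $list_file" >&2; return 1; }

    # SPINE-X appends '.mat' itself, so each entry must match <pssm_dir>/<name>.mat
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        if [ ! -f "$pssm_dir/$name.mat" ]; then
            echo "no PSSM file for: $name" >&2
            missing=1
        fi
    done < "$list_file"
    [ "$missing" -eq 0 ] || return 1

    # Same argument order as the command above: list file, then PSSM directory
    "$script" "$list_file" "$pssm_dir/" || return 1

    [ -d spXout ] || { echo "spXout/ was not created" >&2; return 1; }
    expected=$(grep -c . "$list_file")                     # non-empty list entries
    produced=$(find spXout -type f | wc -l | tr -d ' ')    # files SPINE-X wrote
    echo "proteins listed: $expected, files in spXout/: $produced"
}
```

Usage, with SPINEX_SCRIPT set to the path of the Perl script: `run_spinex "$SPINEX_SCRIPT" list.txt pssm/`. If SPINE-X writes one output file per protein (an assumption), the two counts it prints should match.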
