Tuesday, 2 December 2014

Converting PDB files to FASTA format with Python

This is a simple script to convert PDB files (which contain information about protein structure) into FASTA format, which is just a plain-text listing of the amino acid sequence. The code handles one PDB file at a time. To convert several, run it in a loop, e.g. in bash (pdb2fasta.py is a placeholder name here; substitute whatever filename you saved the script under):

for i in *.pdb; do python pdb2fasta.py "$i" > "$i.fasta"; done

In operation, it looks for 'ATOM' lines, then checks the residue number on each one to see if it has changed. Each time the residue number changes, it outputs the current amino acid label. Feel free to use/modify this code for any purpose.
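
As a rough illustration of the logic described above, here is a minimal sketch. It is not necessarily identical to the original script and rests on a few assumptions of mine: the function name pdb_lines_to_sequence is made up for this example, residues are read from the fixed columns of the standard PDB ATOM record, the chain ID is compared along with the residue number, and any residue name outside the 20 common amino acids is written as 'X'.

```python
import sys

# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_lines_to_sequence(lines):
    """Return a one-letter sequence string from an iterable of PDB lines.

    Hypothetical helper for illustration only.
    """
    sequence = []
    previous_residue = None
    for line in lines:
        # Only ATOM records are considered (HETATM and others are skipped).
        if not line.startswith("ATOM"):
            continue
        # Fixed-width fields: residue name in columns 18-20.
        residue_name = line[17:20].strip()
        # Assumption: chain ID (column 22) plus residue number and
        # insertion code (columns 23-27) identify a residue.
        residue_id = (line[21:22], line[22:27])
        if residue_id != previous_residue:
            # A new residue has started, so record its label.
            # Assumption: unknown residue names become 'X'.
            sequence.append(THREE_TO_ONE.get(residue_name, "X"))
            previous_residue = residue_id
    return "".join(sequence)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python <script> structure.pdb > structure.fasta
    with open(sys.argv[1]) as handle:
        print(">" + sys.argv[1])
        print(pdb_lines_to_sequence(handle))
```

Because only consecutive lines are compared, each residue is emitted once no matter how many atoms it has. Note that this reads the sequence from the coordinates, so residues missing from the structure will also be missing from the output.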