Workflow for Functional Annotation
Summary
The Functional Annotation workflow template provides an easy way to annotate a whole proteome. The workflow runs several tools on the given protein sequences. BLAST assigns function based on similarity to sequences in existing protein databases such as UniProt and nr. PfamHMM identifies the functional domains present in each sequence. TMHMM and SignalP predict transmembrane helices and signal peptides, from which the user can infer subcellular localization. InterProScan assigns Gene Ontology terms to the protein sequences. The output of these programs is then processed by FAT-FunctionalAnnotationTool, which writes the protein sequences to a FASTA file with the annotations placed in the respective header lines.
Fig: Translated implementation
Standard Tools
BLAST, PfamHMM, TMHMM, SignalP, InterProScan
Custom tool
FAT-FunctionalAnnotationTool
Details of workflow and tools used:
- Input files
  - File containing FASTA-formatted protein sequences
- BLAST: Basic Local Alignment Search Tool (BLAST-2.2.14)
BLAST searches the query protein sequences against the available protein databases.
- PfamHMM
PfamHMM searches each query sequence against Pfam to predict the domains it contains.
- TMHMM
TMHMM predicts transmembrane helices in proteins.
- SignalP
SignalP predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes.
- InterProScan
InterProScan combines the protein signature recognition methods native to the InterPro member databases into one resource, with lookup of the corresponding InterPro and GO annotations.
- FAT-FunctionalAnnotationTool
FAT takes the outputs of the programs above and parses them to produce a FASTA-formatted file with the annotation details in the header line of each sequence.
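The idea of writing annotations into FASTA header lines can be illustrated with a minimal Python sketch. This is not FAT's actual implementation: the function name `annotate_fasta`, the ` | ` separator, and the `key=value` annotation strings are all assumptions made for illustration, since the header layout FAT produces is not described here.

```python
def annotate_fasta(fasta_text, annotations):
    """Append annotation strings to matching FASTA header lines.

    `annotations` maps a sequence ID (the first whitespace-delimited token
    after '>') to a list of annotation strings. The ' | ' separator is a
    hypothetical layout, not the format FAT is known to use.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            notes = annotations.get(seq_id, [])
            if notes:
                line = line + " | " + " | ".join(notes)
        out.append(line)
    return "\n".join(out) + "\n"


# Illustrative input: two sequences, annotations available for one of them.
example_fasta = ">prot1 hypothetical protein\nMKT\n>prot2\nMAA\n"
example_notes = {"prot1": ["Pfam=PF00069", "TMHMM=PredHel:2"]}
print(annotate_fasta(example_fasta, example_notes), end="")
```

Sequences without any annotation keep their original header, so the output always contains every input sequence.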
Mandatory options:
1. Input file containing fasta-formatted protein sequences
Optional input files:
1. Output file from BLASTP program
i. Identity cutoff [Default value: 90%]
ii. Query coverage cutoff [Default value: 80%]
2. Output file from PfamHMM program
3. Output file from TMHMM program
4. Output file from SignalP program
5. Output file from InterProScan program (XML output)
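To show how the identity and query coverage cutoffs might act on BLASTP results, here is a hedged Python sketch. It assumes the BLAST output is in the 12-column tabular format (query, subject, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score) and that query coverage is the aligned query span divided by the query length; the output format FAT actually expects, its coverage formula, and whether the cutoffs are inclusive are not stated in this document. The function names are hypothetical.

```python
def read_fasta_lengths(fasta_text):
    """Return {sequence ID: sequence length} for FASTA-formatted text."""
    lengths = {}
    seq_id = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            lengths[seq_id] = 0
        elif seq_id is not None:
            lengths[seq_id] += len(line)
    return lengths


def filter_hits(blast_text, query_lengths, min_identity=90.0, min_coverage=80.0):
    """Keep tabular BLAST hits that meet both cutoffs (treated as inclusive here)."""
    kept = []
    for line in blast_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # not a 12-column tabular record
        query, subject = cols[0], cols[1]
        identity = float(cols[2])
        q_start, q_end = int(cols[6]), int(cols[7])
        q_len = query_lengths.get(query)
        if not q_len:
            continue  # query length unknown, coverage cannot be computed
        # Assumed definition: percentage of the query covered by this alignment.
        coverage = 100.0 * (abs(q_end - q_start) + 1) / q_len
        if identity >= min_identity and coverage >= min_coverage:
            kept.append((query, subject, identity, round(coverage, 1)))
    return kept


# Illustrative data: one 100-residue query and three made-up hits.
example_fasta = ">q1\n" + "A" * 100 + "\n"
example_blast = "\n".join([
    "q1\tsp|P1\t95.00\t90\t4\t0\t1\t90\t5\t94\t1e-50\t180",
    "q1\tsp|P2\t85.00\t100\t15\t0\t1\t100\t1\t100\t1e-40\t150",
    "q1\tsp|P3\t99.00\t50\t0\t0\t1\t50\t1\t50\t1e-20\t100",
])
print(filter_hits(example_blast, read_fasta_lengths(example_fasta)))
```

With the default cutoffs of 90% identity and 80% query coverage, only the first made-up hit passes: the second fails on identity and the third on coverage.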