Workflow for Prediction of Antigenic sites

· Summary

The workflow is designed to predict potential antigenic regions in a given set of proteins from a pathogenic organism. In this workflow, multiple BLAST programs (BLASTP and TBLASTN) are run with the pathogen sequences as queries against host sequence databases (nr for BLASTP; human/mouse for TBLASTN). The E value for the BLAST searches is 0. A parser writes the IDs of rejected sequences (those with significant BLAST hits) to one file and the IDs of the remaining sequences to a separate file, which is passed to the next node. The parser takes the gene IDs from BLASTP/TBLASTN as input, and its output is used as input to the Sequence Retrieval Tool (SRT).
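The parser step can be sketched in Python as follows. This is a minimal sketch, not the workflow's actual parser. It assumes BLAST+ tabular output with the columns `qseqid sseqid bitscore qcovs` (for example from `-outfmt "6 qseqid sseqid bitscore qcovs"`), judges each query by its best-scoring hit, and treats queries with no hit as unique. The default thresholds (bit score <= 40, query coverage >= 50) are the shortlisting defaults stated in this document, applied literally; the function names `split_query_ids` and `write_ids` are hypothetical.

```python
import csv


def split_query_ids(blast_lines, all_query_ids, max_bitscore=40.0, min_qcov=50.0):
    """Split query IDs into (passed, rejected) lists.

    Assumed row layout (hypothetical): qseqid, sseqid, bitscore, qcovs,
    tab-separated. `blast_lines` may be an open file or any list of lines.
    """
    best = {}  # qseqid -> (bitscore, qcovs) of its best-scoring hit
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        qid, bits, qcov = row[0], float(row[2]), float(row[3])
        if qid not in best or bits > best[qid][0]:
            best[qid] = (bits, qcov)

    passed, rejected = [], []
    for qid in all_query_ids:
        if qid not in best:
            passed.append(qid)  # no hit in the host database: unique
            continue
        bits, qcov = best[qid]
        if bits <= max_bitscore and qcov >= min_qcov:
            passed.append(qid)  # best hit meets the shortlisting criteria
        else:
            rejected.append(qid)  # significant host hit: rejected
    return passed, rejected


def write_ids(ids, path):
    """Write one ID per line (e.g. separate files for passed and rejected IDs)."""
    with open(path, "w") as out:
        out.writelines(f"{i}\n" for i in ids)
```

Because queries without any hit never appear in BLAST tabular output, the full list of query IDs has to be supplied separately; otherwise the most promising (hit-free) sequences would be silently dropped.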

By default, sequences are shortlisted when their BLAST bit score is <= 40 and their query coverage is >= 50%. Sequences with a low score or no match are treated as unique pathogen sequences. These sequences are retrieved by the gene IDs extracted from the BLAST outputs by the multiple BLAST parser, and saved as a flat file of FASTA sequences using the in-house Sequence Retrieval Tool.
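The retrieval step can be sketched as a filter over a FASTA file. This is a minimal sketch under stated assumptions, not the in-house Sequence Retrieval Tool: it assumes the sequences are pulled from the original pathogen query FASTA, that the ID is the first whitespace-separated word of each header, and that output lines are wrapped at 60 residues. The names `read_fasta` and `retrieve_sequences` are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def retrieve_sequences(fasta_lines, wanted_ids, width=60):
    """Return FASTA text for records whose ID (first header word) is wanted."""
    wanted = set(wanted_ids)
    out = []
    for header, seq in read_fasta(fasta_lines):
        seq_id = (header.split() or [""])[0]  # assumed ID convention
        if seq_id in wanted:
            out.append(f">{header}")
            # wrap the sequence into fixed-width lines
            out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + ("\n" if out else "")
```

In use, `wanted_ids` would be the list of passing IDs from the parser, and the returned text would be written to the flat file handed to the prediction tools.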

The selected sequences are then used as input to netCTL, Antigenic, TargetP and ESLpred2 to predict T cell and B cell epitopes along with the subcellular location of each protein.

netCTL predicts cytotoxic T lymphocyte (CTL) epitopes. It was chosen because of its higher predictive performance compared with EpiJen, MAPPP, MHC-pathway and WAPP on all performance measures. The predicted T cell epitopes are stored in flat file format using the custom tool 'Map Potential Antigenic Output'.

Antigenic (EMBOSS) is used to predict B cell epitopes. The probable B cell epitopes are predicted and retrieved using the custom tool 'Map Potential Antigenic Output'.

TargetP and ESLpred2 are used to predict the subcellular localization of the given sequences. If a sequence is predicted to be secretory or extracellular, or to contain a signal peptide, more weight is given to its netCTL and Antigenic results.
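The document does not say how much extra weight is added, so the following is only a hypothetical sketch of the idea: proteins are ranked by their epitope counts, and the count is multiplied by an assumed boost factor (1.5 here, chosen purely for illustration) when the protein is predicted secretory or extracellular or to carry a signal peptide. The location labels, the `ProteinEvidence` record and the `priority`/`rank` functions are all assumptions, not outputs or APIs of TargetP, ESLpred2, netCTL or Antigenic.

```python
from dataclasses import dataclass

# Hypothetical normalized labels; the real tools use their own vocabularies.
FAVOURED_LOCATIONS = {"secretory", "extracellular"}


@dataclass
class ProteinEvidence:
    protein_id: str
    ctl_epitopes: int     # netCTL epitopes above the threshold
    bcell_epitopes: int   # Antigenic regions passing the minimum length
    location: str         # localization call, lower-cased
    signal_peptide: bool  # signal peptide predicted (e.g. a TargetP "SP" call)


def priority(ev: ProteinEvidence, boost: float = 1.5) -> float:
    """Epitope count, up-weighted by a hypothetical factor for favoured proteins."""
    base = ev.ctl_epitopes + ev.bcell_epitopes
    if ev.signal_peptide or ev.location in FAVOURED_LOCATIONS:
        return base * boost
    return float(base)


def rank(evidence):
    """Sort proteins from highest to lowest priority."""
    return sorted(evidence, key=priority, reverse=True)
```

A multiplicative boost keeps proteins with no predicted epitopes at zero regardless of localization, which matches the text's framing of localization as extra weight on the epitope results rather than evidence on its own.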

· Standard Tools

BLAST
NetCTL
Antigenic
ESLpred2
TargetP

· Custom Tools

Sequence Retrieval Tool
Map Potential Antigenic Output

· Parser

· GeneID from Multiple BLAST Outputs

Fig: Translated implementation

· netCTL parameters (the parameters used are shown in red)

a. Weight on C-terminal cleavage: The default value (0.15) gives optimal predictive performance on average. The user can change the relative weight on proteasomal cleavage by entering a different weight value.

b. Weight on TAP transport efficiency: The default value (0.05) gives optimal predictive performance on average. The user can modify the relative weight on TAP transport by entering a different weight value.

c. Threshold for epitope identification: Peptides with a combined prediction score greater than the threshold are marked as potential epitopes. In a large-scale benchmark identifying known CTL epitopes in proteins, the default value of 0.75 was found to correspond to a sensitivity of 0.65 and a specificity of 0.97.

d. MHC class I supertypes used (12): A1, A2, A3, A24, A26, B7, B8, B27, B39, B44, B58, B62
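How the three parameters interact can be sketched as follows. This assumes, as we understand the NetCTL method, that the combined score is a weighted sum of the MHC class I binding score, the C-terminal cleavage score and the TAP transport score, with the weights from items a and b; it is an illustration, not NetCTL's code or output format. The `PeptideScores` record and the functions `combined_score` and `epitopes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PeptideScores:
    peptide: str
    supertype: str   # one of the 12 supertypes, e.g. "A2"
    mhc: float       # MHC class I binding score (rescaled)
    cle: float       # proteasomal C-terminal cleavage score
    tap: float       # TAP transport efficiency score


def combined_score(p, w_cle=0.15, w_tap=0.05):
    """Assumed NetCTL-style combination: MHC + w_cle * cleavage + w_tap * TAP."""
    return p.mhc + w_cle * p.cle + w_tap * p.tap


def epitopes(peptides, threshold=0.75, w_cle=0.15, w_tap=0.05):
    """Return (peptide, supertype, score) for peptides scoring above the threshold."""
    hits = []
    for p in peptides:
        score = combined_score(p, w_cle, w_tap)
        if score > threshold:  # strictly greater, as described in item c
            hits.append((p.peptide, p.supertype, round(score, 4)))
    return hits
```

With the default weights, a peptide with scores (0.70, 0.90, 1.00) combines to 0.70 + 0.135 + 0.05 = 0.885 and is kept, whereas (0.50, 0.80, 1.20) gives 0.68 and falls below the 0.75 threshold.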

· TargetP parameters:

Organism group: non-plant; cleavage site predictions: performed.

Cutoff options: specificity > 0.95 (a predefined set of cutoffs that yielded this specificity on the TargetP test sets), specificity > 0.90 (a predefined set of cutoffs that yielded this specificity on the TargetP test sets), or user-defined cutoffs (0.00 to 1.00).

· ESLpred2 parameters:

Method 3 (hybrid approach) is used to predict the subcellular localization, with organism G (generalized dataset).

Method options: 1 for AAC (amino acid composition) with terminals; 2 for PSSM composition; 3 for hybrid

Organism options: A for animal dataset; F for fungi dataset; P for plant dataset; G for generalized dataset

· Antigenic parameter:

The minimum length of an antigenic region is set to 6; the allowed range is 1 to 50.
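The effect of the minimum-length parameter can be sketched as a run-length filter. This is a simplified illustration, not EMBOSS Antigenic's algorithm: it assumes a hypothetical per-residue boolean mask marking residues already judged antigenic, and keeps only contiguous stretches at least `min_len` residues long. The function name `antigenic_regions` is hypothetical; only the default of 6 and the 1 to 50 range come from this document.

```python
def antigenic_regions(mask, min_len=6):
    """Return 1-based inclusive (start, end) runs of True at least min_len long.

    `mask` is a hypothetical per-residue sequence of booleans indicating whether
    each residue lies in a predicted antigenic stretch.
    """
    if not 1 <= min_len <= 50:
        raise ValueError("min_len must be between 1 and 50")
    regions, start = [], None
    for i, flag in enumerate(list(mask) + [False]):  # sentinel closes a final run
        if flag and start is None:
            start = i  # a new run begins
        elif not flag and start is not None:
            if i - start >= min_len:  # run length is i - start
                regions.append((start + 1, i))
            start = None
    return regions
```

For a mask with runs of 6, 3 and 7 residues, the default keeps only the 6- and 7-residue runs; lowering `min_len` to 3 also keeps the short one.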

· BLAST parameters:

The E value is 0 for both BLASTP and TBLASTN. By default, sequences with a BLAST bit score <= 40 and query coverage >= 50% are passed to the next node.
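A BLAST+ invocation matching these settings can be assembled as an argument list, as in the sketch below. The flags `-query`, `-db`, `-evalue`, `-outfmt` and `-out` are standard BLAST+ options, and `qcovs` in the tabular format provides the per-subject query coverage used for filtering; the function name `blast_command`, the chosen output columns and the default output path are assumptions for illustration, not the workflow's own configuration.

```python
def blast_command(program, query_fasta, db, evalue=0.0, out_path="hits.tsv"):
    """Build a BLAST+ argument list that writes tabular output including qcovs."""
    if program not in {"blastp", "tblastn"}:
        raise ValueError("expected 'blastp' or 'tblastn'")
    return [
        program,
        "-query", query_fasta,
        "-db", db,
        "-evalue", str(evalue),
        # assumed columns: query ID, subject ID, bit score, query coverage per subject
        "-outfmt", "6 qseqid sseqid bitscore qcovs",
        "-out", out_path,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine with BLAST+ installed, using `nr` for BLASTP and a locally formatted human or mouse nucleotide database (name depends on the local setup) for TBLASTN.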