Introduction

 


The advent of new-generation sequencing technologies has sharply increased the number of genomes being sequenced per organism, leading to a steep rise in the volume of genomic data generated. Downstream analysis of such data requires specialized software and hardware that enable high-throughput processing. Comprehensive analysis of heterogeneous genomic data requires a large number of bioinformatics tools, many of which demand extensive preprocessing and frequent manual intervention. Our aim is to provide a flexible platform for running complex queries that can integrate and analyze large amounts of genomic data. Anvaya is a software system consisting of bioinformatics tools and databases that are loosely coupled in a coordinated system to execute a set of analysis tools in series or in parallel.

Anvaya Client Features


User

Login

 The user has to log in with appropriate credentials. The login ID and password can be obtained from the Anvaya Admin Interface.

Logout

 The user must log out every time before the client exits.

Project

New Project

 When the workflows client is started, after user login, a new project needs to be created in order to start creating a pipeline.

Open Project

 Anvaya Client allows user to open a previously executed project by using this option.

Delete Project

 A previously created project can be deleted using this option.

Workflow (Pipeline)

Create Workflow

 A workflow is created by dragging associated tools from the tool list onto the design canvas. The tools are then connected in logical order.

Save workflow

 The workflow created on the design canvas must be saved before execution.

Run Workflow

 When user clicks "Run Workflow", the user will need to transfer the input and other configuration files before actual execution of workflow. Once the file transfer is complete, the user can start executing the workflow.

Stop Workflow

  Workflow execution can be stopped midway by using this option.

Resume workflow

  Workflow execution can be resumed from a certain node using this option.

Status

  The status of the workflow can be viewed in tabular format on the status tab. The status is also depicted pictorially on the design canvas.

UI

Design Canvas

  Design Canvas is the tab that is used as the work area for creating the pipeline. The tools are dragged onto the design canvas for logical connection.

Project Explorer

  The Project Explorer tab allows the user to view the input, output and intermediate output files of the current project. The files must be transferred from the server (status tab) before they can be viewed in the Project Explorer.

Tool List

  The complete list of available tools is shown in the tool list. The tool list can be viewed sorted by functionality or in alphabetical order.

Pre Defined Workflows

  Anvaya provides a set of 11 pre-defined workflows for frequently used pipelines in genome annotation and comparative genomics, ranging from EST assembly and annotation to phylogenetic reconstruction and microarray analysis.

Node Properties

  The advanced parameters of a node (tool) can be set by double-clicking the node.

Connecting Tools

  The tools can be connected by single click of mouse in appropriate order.

Sticky Note

  Sticky Note allows user to store short notes regarding associated node or workflow. These can be minimized or hidden or expanded back for readability purpose.

Help

  Help is provided for Anvaya Client to assist user with different functionalities.

SubLayer

  Nodes (Tools) can be logically grouped together to form sublayer. The sublayer can be collapsed or expanded as per readability.

Rules Engine

  One of the unique features of "Anvaya" is the "rules engine", which defines rules for logical connections between the existing tools. "Anvaya" also offers the user novel functionality for exhaustive comparative analysis via "custom tools", which are tools with new functionality not available in standard tools. When a particular node is clicked, the nodes to which a connection from that node is allowed are highlighted.
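
  As an illustration of how such connection rules might be represented, the following minimal Python sketch stores, for each tool, the set of tools it may connect to, and looks up the nodes to highlight when a node is clicked. The rule table, the function names and the specific tool pairings are assumptions made for this example; they are not taken from Anvaya's actual rule set or implementation.

```python
# Hypothetical rule table: source tool -> tools that may follow it.
# The pairings below are illustrative only, not Anvaya's real rules.
ALLOWED_CONNECTIONS = {
    "Phred": {"CrossMatch"},
    "CrossMatch": {"SeqTrimmingTool"},
    "SeqTrimmingTool": {"SeqClean"},
    "SeqClean": {"BLAST", "Cap3"},
}


def connectable_targets(node):
    """Return the tools that may be connected from `node` (to be highlighted)."""
    return sorted(ALLOWED_CONNECTIONS.get(node, set()))


def can_connect(source, target):
    """Return True if the rule table allows an edge source -> target."""
    return target in ALLOWED_CONNECTIONS.get(source, set())
```

  With this table, clicking a "SeqClean" node would highlight "BLAST" and "Cap3", while an attempt to connect "Phred" directly to "Cap3" would be rejected.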

Tool tip

  Tool tip is provided on the tool nodes in the tool list. This gives the version of associated tool.

Alignment of Nodes

  The nodes can be group selected and can be arranged horizontally or vertically using this option.

Scroll View

  Scroll View allows user to span across the design canvas.

Workflows - Get Started


Once Anvaya client is installed, user will have to request login-id from Anvaya Admin interface. The user can then use the login-id and password to start using Anvaya Client. The server needs to be configured before login.

Anvaya - Software UI Layout


Anvaya has following layout components:

    - Design Canvas : Work Area for creating workflow pipelines

    - Project Explorer : To view input/output files for the project

    - Status Tab : For viewing tabular/pictorial status of workflow execution

    - Tool List: List of tools available functionality wise or in alphabetical order

    - Sub Layer Tree : Tree View for sub layers created in workflow pipeline

    - Scroll View : Snapshot view area which has handle to span across the complete design canvas

    - Console: Text Message area to display current execution messages for user

Workflows - Configurations


Client Configuration

Once the client is installed, it is configured with default values. The configuration file is available as conf/workflow.conf under the directory where the client was installed. The user will have to configure the WORKFLOW_HOME path to indicate the workspace directory for Anvaya Client. All the other paths are configurable through the client itself.

 

Server Configuration

Server configuration is available under the "Server>>Configure Server" option in the client menu. Here, the user will have to configure the server address, where the Anvaya services are installed along with user authentication details. These details are stored in encrypted format at client end for further usage.

 

Anvaya - Nodes


An Anvaya node represents an individual tool that is part of the created pipeline. Nodes can be logically connected to other nodes. Each node has an input-output panel for configuring the I/O files and advanced panels for configuring the associated tool in detail.

Workflows - Delete Node


If the user wants to delete a node from the pipeline, right-click on the node and then click the 'Delete' option.

Workflows - Drag Node


To create a pipeline, the node (tool) needs to be dragged from the tool list onto the design canvas. The user can drag tools from the alphabetical list or from the tool list sorted by functionality.

Workflows - Edit Node Properties


The user can edit a node by double-clicking it. The Properties dialog of the node will pop up. Each node has an input-output panel for configuring the inputs and outputs of the node. The parameters to be configured for each tool are available in the associated "advanced parameter" panel.

Anvaya - Project


In Anvaya, the user has to start by creating a project. Each project can have only one workflow pipeline defined. The user can open or delete previously created projects.

Delete Project


A previously created project can be deleted using this option.

New Project


When workflows client is started, after user login, a new project needs to be created in order to start creating a pipeline.

Open Project


Anvaya Client allows user to open a previously executed project by using this option.

Project Explorer


The Project Explorer tab allows the user to view the input, output and intermediate files created during the execution of the workflow. The client needs to send a request to transfer the files from the server to make them available in Project Explorer. To transfer the files, the user can use the Status Tab.

Pre-defined Workflow - EST Assembly


EST assembly is one of the challenging workflows in bioinformatics research. Anvaya provides the researcher with a single workflow that takes the raw trace files from sequencing machines and produces fully annotated, assembled ESTs.

The trace files from sequencing machines are processed by the PHRED software for base calling. PHRED produces files containing the sequences along with the corresponding quality values. The sequences are then processed by the Cross_match tool for vector masking. STT (Sequence Trimming Tool) takes these vector-masked EST sequences as input and removes the vector-masked nucleotides, polyA/T tails, polyC/G tails and adapter/linker sequences from them. STT also applies the same edits to the corresponding quality values. The STT-processed sequence files are then processed by Seqclean to remove contamination such as mitochondrial sequences, yielding the filtered ESTs. The corresponding changes can be applied to the quality values with the cln2qual tool from the Seqclean package. Host/parasite contamination can also be removed with Seqclean.

The filtered EST sequences thus obtained are then annotated using BLAST. The FAT tool parses the BLAST results and produces EST sequences with their annotation in the header lines. The "dbEST submission files Tool" takes these EST sequences along with their annotations as input and produces dbEST submission files in the format specified by NCBI.

The filtered EST sequences are submitted for assembly to the Cap3 assembly tool. The "Unique Transcripts EST Tool" combines the output files from Cap3 (assembled ESTs, i.e. the contigs file, and unassembled ESTs, i.e. the singlets file) and produces a single FASTA-formatted file containing the union of the contig and singlet sequences. This assembled EST dataset is further processed by BLAST, InterProScan and BLAST2GO for its functional annotation. The outputs from all these tools are parsed by FAT (Functional Annotation Tool) to produce FASTA-formatted sequences with the annotation information in their respective header lines. ESTScan helps in analyzing whether the sequenced ESTs are true ESTs or not.
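
The merging step performed by the Unique Transcripts EST Tool can be pictured as a simple union of two FASTA files. The Python sketch below is an illustration under that assumption only; the function names are hypothetical and the actual tool may apply additional processing (for example, renaming headers) that is not described here.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def merge_contigs_and_singlets(contigs_text, singlets_text):
    """Return one FASTA string holding all contigs followed by all singlets."""
    merged = read_fasta(contigs_text) + read_fasta(singlets_text)
    return "".join(f">{header}\n{seq}\n" for header, seq in merged)
```

Each sequence is written on a single line in the merged output, so multi-line records in the Cap3 contigs file are joined before being written.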

 

Pre-defined Workflow - Functional Annotation


The Functional Annotation workflow template provides an easy way of annotating a whole proteome set. This workflow uses multiple tools to annotate the given protein sequences. BLAST is used for functional annotation based on similarity with existing protein databases such as UniProt and nr. The PfamHmm tool is used to identify the functional domains present within a sequence. Using the programs TMHMM and SignalP, the user can predict subcellular localization. InterProScan is used to assign Gene Ontology terms to the given protein sequences. The output of the above programs is processed further by FAT (Functional Annotation Tool) to provide protein sequences with the annotations in the respective header lines of the FASTA sequences.

Pre-defined Workflow - Genome Annotation


The Genome Annotation workflow is used to annotate a newly sequenced genome. This workflow can be used for both prokaryotic and eukaryotic organisms.

Glimmer/Genscan predicts the genes from a given input genome sequence. In the case of prokaryotes, the Glimmer results are complemented by ribosomal binding site prediction from RBSfinder. The tRNA genes are predicted by the tRNAScan-SE program. Repeats in the genome are masked using the RepeatMasker program. The results from Glimmer/Genscan, RBSfinder, tRNAScan-SE and RepeatMasker are combined into a standardized format, namely GenBank format. This GenBank-formatted result file can be viewed with the Artemis visualization tool for detailed analysis.

The compseq utility from the EMBOSS package reports the composition of dimer/trimer/etc. words in a sequence. The freak utility from the EMBOSS package provides residue/base frequency tables or plots. The restrict utility from the EMBOSS package finds restriction enzyme cleavage sites within the given genome.

Pre-defined Workflow - Promoter Identification using Micro-array data


The workflow helps identify consensus patterns in the upstream regions of genes that cluster together, which may in turn lead to the identification of novel target genes for therapeutic purposes. Gene expression data is given as input to the workflow; the data is expected to be pre-normalized at the user's end. The normalized data is then parsed for cluster analysis (preferably K-Means or hierarchical clustering). For each cluster obtained, the corresponding gene IDs are obtained from the database, followed by retrieval of the upstream regions of the genes. The motif analysis tool MEME is used to identify conserved patterns/motifs. The motifs thus obtained are searched against the desired motif databases using the tool MAST.

 

Pre-defined Workflow - Motif Identification


The workflow identifies conserved patterns or motifs using DNA or protein sequences as input. The input sequences are searched against a preferred database using BLAST and the significant hits obtained (orthologs) are parsed for the next step of sequence retrieval. Motif discovery programs like MEME, AlignACE, MDScan, Weeder and Consensus are used for pattern identification. The conserved regions obtained are then parsed and analyzed through a custom tool, the "Motif Processing Tool".

 

Pre-defined Workflow - Ortholog Prediction


The workflow predicts orthologs given a set of two genomes using the criteria of bi-directional best hits. BLAST is used for detecting the orthologs. Initially the gene sets of two organisms are subjected to BLAST and orthologs are predicted in cases where the criteria of bi-directional best hits are satisfied. Genes are referred to as unique if they do not satisfy the criteria of bi-directional best hits. Such unique genes of a given organism are then searched against the genome of the other organism so as to cross-check if the genes have been missed due to the limitation of the gene prediction program. If significant matches are found at the genome level then it is not a unique gene and if no matches are detected then the genes are unique with respect to the other organism in consideration.
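
The bi-directional best hit criterion can be sketched in a few lines of Python. This is a minimal illustration, not the BBH tool's actual code: it assumes the two BLAST runs have already been reduced to (query, subject, bit score) tuples, and the function names and the use of bit score to rank hits are choices made for this example.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def unique_genes(all_genes, ortholog_pairs):
    """Genes of organism A that are not part of any bi-directional best hit."""
    paired = {a for a, _ in ortholog_pairs}
    return sorted(set(all_genes) - paired)
```

Genes returned by unique_genes correspond to the "unique" genes described above, which the workflow then searches against the genome of the other organism as a cross-check.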

Pre-defined Workflow - Phylogenetic Profiling


The workflow aims to infer functional linkages using phylogenetic profiling. The input protein sequences, usually from an organism with a completely sequenced genome, are searched against the proteome data of other organisms with completely sequenced genomes using either BLASTP or SSEARCH. The e-values obtained from the search for every protein sequence are parsed, normalised and represented as a profile vector/matrix in which the rows are individual proteins and the columns are organisms. The profiles obtained are analysed for statistical significance using measures such as mutual information content, Hamming distance and correlation coefficient.
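
To make the profile comparison concrete, the Python sketch below turns a row of e-values into a presence/absence profile and compares two profiles by Hamming distance and mutual information. It is an illustration under stated assumptions: the binary encoding and the 1e-5 cutoff are choices made for this example (the workflow's own normalisation is not specified here), and the function names are hypothetical.

```python
import math


def binary_profile(evalues, cutoff=1e-5):
    """1 where a homolog is detected (e-value at or below cutoff), else 0."""
    return [1 if e <= cutoff else 0 for e in evalues]


def hamming_distance(p, q):
    """Number of organisms in which the two profiles disagree."""
    return sum(x != y for x, y in zip(p, q))


def mutual_information(p, q):
    """Mutual information, in bits, between two equal-length binary profiles."""
    n = len(p)
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            joint = sum(1 for a, b in zip(p, q) if a == x and b == y) / n
            px = sum(1 for a in p if a == x) / n
            py = sum(1 for b in q if b == y) / n
            if joint > 0:
                mi += joint * math.log2(joint / (px * py))
    return mi
```

Two proteins with a small Hamming distance and high mutual information across many genomes tend to be present or absent together, which is the signal used to suggest a functional linkage.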

 

Pre-defined Workflow - Phylogeny


The workflow builds a phylogenetic tree of the orthologs detected for a given query sequence (nucleotide or protein) using a similarity search tool. User needs to provide only the query sequence to carry out a similarity search using BLAST against a chosen database. Parsers will be provided to read the output of BLAST and submit sequences to multiple sequence alignment tools like ClustalW, which would then pass the output to Phylip suite for reconstruction of phylogenetic tree of the orthologs. 

Pre-defined Workflow - Prediction for Potential Antigenic Sites


The workflow is designed to predict the potential antigenic regions in a given set of proteins of a pathogenic organism. In this workflow multiple BLAST programs (BLASTP, TBLASTN) are run for submitted Pathogen sequences against the host sequence (nr for BLASTP and human/mouse for TBLASTN) databases. The E value for BLAST searches is 0. A parser is also built to write the rejected Ids of sequences which show significant hits from BLAST into one file and write the remaining sequence Ids into a separate file to pass them to next node (Take the geneid as input from BLASTP/TBLASTN and output is used as input to SRT).

The shortlisted sequences have a BLAST bit score <= 40 and query coverage >= 50 by default. Sequences with a low score or no match are the unique pathogen sequences. These sequences are retrieved from the BLAST output using the GeneID from the multiple BLAST parser and saved as a flat file of FASTA sequences using the in-house Sequence Retrieval Tool.

Selected sequences are then used as input for tools like netCTL, Antigenic, TargetP and ESLpred2 to predict T cell and B cell epitopes along with their subcellular location.

netCTL predicts cytotoxic T lymphocyte epitopes. netCTL was chosen because of its higher predictive performance than EpiJen, MAPPP, MHC-pathway and WAPP on all performance measures. T cell epitopes are predicted and stored in flat file format using the custom tool "Map potential antigenic output".

Antigenic (emboss) is used to predict B cell epitopes. The probable B-cell epitopes are predicted and retrieved by using Map potential antigenic output custom tool.

TargetP and ESLpred2 are used to predict the subcellular localization of the given sequences. If a sequence is detected as a secretory or extracellular protein, or contains signal peptides, more weight is given to the netCTL and Antigenic results.

Pre-defined Workflow - Identification of Primer/Marker


The workflow is for identification of primer sequence. The input nucleotide sequence is subjected to tools like Primer3 for prediction of primers. The primers predicted are then subjected to search against the database of choice using BLASTN and to RNA secondary structure prediction. Outputs from BLASTN search and RNA secondary structure prediction are then combined to give an output, suggesting the most probable primer sequence pair. 

 

Pre-defined Workflow - Remote Ortholog and Conserved Domain Prediction


The workflow finds probable remote orthologs for a test input sequence. At the same time, it identifies conserved domains among the closely related sequences of the same input.

Tools available in Workflows


Sr.No | Functionality Name | Tools
1 | Gene Prediction | Glimmer-HMM, GenScan
2 | Annotation | RepeatMasker, TMHMM, InterProScan, SignalP, RBS_finder, PFAM, ESLPRED2, TargetP, ESTScan, Blast2GO
3 | RNA prediction | tRNAscan-SE, vRNAFold
4 | Assembly | Cap3, Phred, SeqClean, Cln2Qual
5 | DBsearch | BLAST, FASTA, MAST, PSIBlast, CrossMatch
6 | Motif Prediction | AlignACE, Weeder, Consensus, MDScan, MEME
7 | MSA | CLUSTALW
8 | Sequence properties | COMPSEQ, FREAK, RESTRICT, CHARGE
9 | Phylogeny | Seqboot, Dnadist, Protdist, Dnaml, Proml, Dnapars, Protpars, Fitch, Kitsch, Neighbor, Consense
10 | Custom Tools | BBH, OFB, CGF, SRT, ROT, OrthologousCluster, FAT, Unique Transcript EST, HammingDistance, CreateProfileVector, MapPotentialAntigenic, Motif Processing, SeqTrimmingTool, DbEstSubmission, HQ_EST, PredictPrimer, Cap3InputCustomization
11 | Primer Prediction | ePrimer3
12 | Epitope Prediction | NETCTL, ANTIGENIC
13 | Other | MutualInformation, Cluster

 

 

Tool Name          : AlignACE
  - Aligns Nucleic Acid Conserved Elements

Version                : 4.0

Description

AlignACE (Aligns Nucleic Acid Conserved Elements) is a program which finds sequence elements conserved in a set of DNA sequences

Reference : http://atlas.med.harvard.edu/

Tool Name          : Antigenic
  - From EMBOSS Package

Version                : 4.1.0

Description

Finds antigenic sites in proteins

Reference : http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/antigenic.html

Tool Name          : BBH
  -  BiDirectional Blast Hit (Anvaya Custom Tool)

Version                : 1.0

Description

It is a custom tool which performs two BLASTP programs at a time taking proteome of one of the organisms as input file and proteome of the other as database and vice-versa. It also fishes out the orthologs found using the bi-directional best hit criteria between the two organisms

Advanced Parameters

Parameter | Allowed Values | Default Value
Input sequence for organism1 in BLASTp (complete protein sequence) | Fasta sequences | None
Database (complete protein sequences of 2nd organism against which ortholog has to be predicted) | Fasta sequences | None

Tool Name          : Blast
  - Basic Local Alignment Search Tool

Version                : 2.2.14

Description

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.

Advanced Parameters

Parameter | Allowed Values | Default Value
Program Name | blastp, blastn, tblastn, tblastx, blastx | blastp
Database | Alphanumeric | nr
Query File | Alphanumeric | stdin
Expectation value (E) | Real | 10.0
Output file | Alphanumeric | stdout
Filter | T/F | T
Gap Opening Penalty | blastp: 7, 8, 9, 10, 11, 12 | -1
Gap Extension Penalty | blastp: 1, 2 | -1
Mismatch penalty | -1, -2, -3, -4, -5 | -3
Match score | 1, 2, 4 | 2
Number of hits to show alignments | Any integer | 250
No. of processors | Integer | 1
Scoring matrix | blastp: BLOSUM62, BLOSUM80, BLOSUM45, PAM30, PAM70 | BLOSUM62
Word size | blastn: 7, 11, 15; all others: 2, 3 | blastn: 11; all others: 3

Tool Name          : BLAST2GO
  - B2G4Pipe

Version                : 2.2.2

Description

Blast2GO is an ALL in ONE tool for functional annotation of (novel) sequences and the analysis of annotation data.

Reference : http://www.blast2go.org

Tool Name          : Cap3
  - A DNA sequence assembly program

Version                : Versions as with Phred 2.2.14

Description

Cap3 can clip 5' and 3' low-quality regions of reads. It uses base quality values in the computation of overlaps between reads, the construction of multiple sequence alignments of reads, and the generation of consensus sequences. The program also uses forward-reverse constraints to correct assembly errors and link contigs.

Tool Name          : CGF
  -  Convert to Genbank Format (Anvaya Custom Tool)

Version                : 1.0

Description

This program will take the input from different nodes and will summarize the results in genbank format.

Advanced Parameters

Mandatory options:

Optional input files:

Tool Name          : Charge
  - From EMBOSS Package

Version                : 4.1.0

Description

Charge reads a protein sequence and writes a file (or plots a graph) of the charges of the amino acids within a window of specified length as the window is moved along the sequence

Reference : http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/charge.html

Advanced Parameters

Parameter | Allowed Values | Default Value
Query input | Alphanumeric | -seqall
Sequence associated option | Integer | N/A
Sequence format | Alphanumeric (Fasta, Embl,) | Fasta
Produce graph | ps, hpgl, png, gif, x11 | ps
Window size | Integer | 5
Graphics | Toggle value Yes/No (png, ps) | No
Amino acids properties and molecular weight data file | Data file | Eamino.dat
Output File | Alphanumeric | outfile
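
The sliding-window calculation that Charge performs can be illustrated with a short Python sketch. This is an approximation written for this manual, not the EMBOSS source: it assumes the window value is the mean of the per-residue charges, and the charge table below (K and R +1, H +0.5, D and E -1, all others 0) is an assumption standing in for the values that the real program reads from the Eamino.dat data file.

```python
# Assumed per-residue charges; the real values come from Eamino.dat.
RESIDUE_CHARGE = {"K": 1.0, "R": 1.0, "H": 0.5, "D": -1.0, "E": -1.0}


def window_charges(sequence, window=5):
    """Mean charge of each window of `window` residues along the sequence."""
    values = [RESIDUE_CHARGE.get(res, 0.0) for res in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

A sequence shorter than the window yields an empty list, and the default window of 5 matches the default listed in the parameter table above.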

Tool Name          : Cln2Qual
  

Version                : 2.2.14

Description

Cln2Qual parses the trimming ("clear range") coordinates and trash codes from the cleaning report and applies them to the quality records.

Reference : http://jimmy.harvard.edu

Tool Name          : ClustalW
  - Multiple sequence alignment program

Version                : 0.13

Description

ClustalW is a general purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen.

Advanced Parameters

Parameter | Allowed Values | Default Value
Input | Alphanumeric | None
Output | Alphanumeric | Same as input file name
Type of sequence | protein or dna | Protein
Format of output | gcg, gde, phylip, pir, nexus, clustal | Clustal
Format of output tree | nj, phylip, dist, nexus |
Matrix for pairwise alignment | BLOSUM, PAM, GONNET, ID | Gonnet
Matrix for pairwise and multiple alignment | IUB, CLUSTALW or filename | IUB
Gap opening penalty | Float | DNA: 15.0; Protein: 10.0
Gap extension penalty | Float | DNA: 6.66; Protein: 0.1

Tool Name          : Cluster
  

Version                : 3.0

Description

The tool provides the most commonly used clustering methods for gene expression data analysis.

Reference : http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/

Tool Name          : CompSeq
  - From EMBOSS Package

Version                : 4.1.0

Description

Compseq counts composition of dimer/trimer/etc words in a sequence.

Advanced Parameters

Parameter | Allowed Values | Default Value
Input Sequence file | File | NA
Word size | 1 <= n < 20 | 2
Output file name | File | NA
Previously produced compseq outfile (can be used to set the expected frequencies of words in this analysis) | File | NA
Frame of word | 0 <= n < word_number | 0
Ignore code B and Z | Yes/No | Yes (Checked)
Reverse complement | Yes/No | No (Unchecked)
Calculate from Observed frequency | Yes/No | No (Unchecked)
Zero count | Yes/No | Yes (Checked)
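
The word counting that CompSeq reports can be pictured with the following Python sketch. It illustrates only the default behaviour assumed here, in which a window of the chosen word size is moved along the sequence one position at a time; the function name is hypothetical, and options such as frame, reverse complement and expected frequencies are not modelled.

```python
from collections import Counter


def count_words(sequence, word_size=2):
    """Count overlapping words of length `word_size` in the sequence."""
    seq = sequence.upper()
    return Counter(
        seq[i:i + word_size] for i in range(len(seq) - word_size + 1)
    )
```

For the sequence ACGACG and a word size of 2, the overlapping words are AC, CG, GA, AC and CG.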

Tool Name          : Consense
  - From PHYLIP Package

Version                : 3.67

Description

CONSENSE reads a file of computer-readable trees and prints out (and may also write out onto a file) a consensus tree. Basically the consensus tree consists of monophyletic groups that occur as often as possible in the data. If a group occurs in more than 50% of all the input trees it will definitely appear in the consensus tree. The tree printed out has at each fork a number indicating how many times the group which consists of the species to the right of (descended from) the fork occurred.

Advanced Parameters

Parameter | Allowed Values | Default Value
Input file | Alphanumeric | Intree
Output file | Alphanumeric | Outfile
Consensus method | Majority rule (extended), Strict, Majority Rule, Ml | Majority rule (extended)
Outgroup root | Yes/No | No, use as Outgroup species 1
Rooted tree (Trees to be treated as Rooted) | Yes/No | No
Terminal type | ANSI, IBM PC, None | ANSI
Print out the data at start of run | Yes/No | No
Print indications of progress of run | Yes/No | Yes
Print out tree | Yes/No | Yes
Write out trees onto tree file? | Yes/No | Yes

Tool Name          : Consensus
  

Version                : 6c

Description

Consensus is a pattern recognition program that can be used in identifying pattern in a set of unaligned DNA, RNA and protein sequences.

Reference : http://bifrost.wustl.edu/consensus/

Tool Name          : CrossMatch
  

Version                : 0.990319

Description

Cross_match is a program for rapid protein and nucleic acid sequence comparison and database search. 

Reference : http://jimmy.harvard.edu

Tool Name          : inCap3
  -  Cap3 Input Customization (Anvaya Custom Tool)

Version                : 1.0

Description

This is a customized tool provided as a part of Anvaya package. This will create the input file names compatible with the CAP3 software.

Input files:

Output files:

Sequence and qual file should be at the same location. User can not provide the name of output files. User can only provide the output tag. According to the output tag, the cap3 input files will be created.

Tool Name          : Db_EST
  -  DB EST Submission Tool (Anvaya Custom Tool)

Version                : 1.0

Description

This program is a part of Anvaya package, which creates the submission files for dbEST from the filtered and annotated EST sequences.

Input files:

Output file (needs to be specified as a parameter):

The output file contains records (EST sequence information) in NCBI dbEST submission format.

Tool Name          : dnadist
  - From PHYLIP Package

Version                : 3.67

Description

DnaDist uses nucleotide sequences to compute a distance matrix under one of four models of nucleotide substitution (F84, Kimura, Jukes-Cantor or LogDet); it can also produce a similarity table. The distance for each pair of species estimates the total branch length between the two species, and can be used in the distance matrix programs FITCH, KITSCH or NEIGHBOR. This is an alternative to using the sequence data itself in the maximum likelihood program DNAML or the parsimony program DNAPARS.

Advanced Parameters

Parameter | Allowed Values | Default Value
Input file | Alphanumeric | Infile
Output file | Alphanumeric | Outfile
Method | F84, Kimura, Jukes-Cantor, LogDet, Similarity Table | F84
Gamma distribution | Yes/No, Gamma+Invariant | No
Transition/transversion option | Any number or ratio | 2.0
Number of categories | Any number between 0-9 | Yes
Weights | Yes/No | No
Frequencies | Yes/No | No
Output file with distance matrix in lower triangular form | Square, Lower triangular | Square
Multiple data sets | Multiple data sets (type D), Multiple weights (type W) | No
Input Sequence | Interleaved, Sequential | Interleaved
Terminal Type | IBM PC, ANSI, None | ANSI
Print out the data at start of run | Yes/No | No
Print indications of progress of run | Yes/No | Yes
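
As a concrete example of one of the distance methods listed for dnadist, the Python sketch below computes the standard Jukes-Cantor distance, d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites between two aligned sequences. It is a textbook illustration rather than PHYLIP's implementation: the function name is hypothetical, and the handling of gaps and ambiguity codes (sites are simply skipped) is an assumption made for this example.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only sites where both sequences have an unambiguous base.
    sites = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in sites) / len(sites)
    if p == 0:
        return 0.0
    if p >= 0.75:
        # The correction is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)
```

Applying such a function to every pair of sequences gives a distance matrix of the kind that FITCH, KITSCH or NEIGHBOR take as input.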

Tool Name          : dnaml
  - From PHYLIP Package

Version                : 3.67

Description

Dnaml implements the maximum likelihood method for DNA sequences. This program is fairly slow, and can be expensive to run. 

Advanced Parameters

Parameter | Allowed Values | Default Value
Input file | Alphanumeric | Infile
Output file | Alphanumeric | Outfile
Search for best tree | Yes/No | Yes
Transition/transversion ratio | Any Number | 2.0
Empirical Base Frequencies | Yes/No | Yes
Number of categories | Any number between 1-9 | Yes
Hidden Markov Model rates | Constant rate, Gamma distributed rates, Gamma+Invariant sites, user-defined HMM of rates | Constant rate
Weight | Yes/No | No
Speedier but rougher analysis | Yes/No | Yes
Global Arrangement | Yes/No | No
Randomize input order of sequences? | Yes/No | No. Use input order
Outgroup root? | Yes/No | No, use as outgroup species 1
Analyze multiple Data sets | Multiple data sets (type D), Multiple weights (type W) | No
Input sequence | Sequential, Interleaved | Interleaved
Terminal Type | IBM PC, ANSI, None | ANSI
Print out the data at start of run | Yes/No | No
Print indications of progress of run | Yes/No | Yes
Print out tree? | Yes/No | Yes
Write out trees onto tree file? | Yes/No | Yes
Reconstruct hypothetical sequences? | Yes/No | No
Use lengths from user trees? | Yes/No | No
Rates at adjacent sites correlated? | Yes/No | No, they are independent

Tool Name          : dnapars
  - From PHYLIP Package

Version                : 3.67

Description

DnaPars carries out unrooted parsimony (analogous to Wagner trees)  on DNA sequences. The method of Fitch  is used to count the number of changes of base needed on a given tree. Other than that, the algorithm is a direct modification of program WAGNER (an ancestor of MIX which was formerly in this package).

Advanced Parameters

Parameter

Allowed Values

Default Value

Input file

Alphanumeric

Infile

Output file

 

Alphanumeric

Outfile

 

Search for best tree? 

No, use user trees in input file

Yes

Yes

Search option? 

More thorough search

More thorough search

Less thorough

Number of trees to save? 

Any number

10000

Randomize input order of sequences? 

No/Yes

No. Use input order

Outgroup root? 

No/Yes

No, use as outgroup species  1

Use Threshold parsimony? 

No/Yes

No, use ordinary parsimony

Use Transversion parsimony? 

 No/ Yes, count only transversions

No, count all steps

Sites weighted? 

 No/Yes

 

No

Analyze multiple data sets? 

Multiple data sets (type D)

 

No

Multiple weights  (type W)

 

Input sequences interleaved? 

Yes/No, sequential

Yes

Terminal type

IBM PC/ANSI/ none 

 

ANSI

Print out the data at start of run 

 

No/Yes

No

Print indications of progress of run 

Yes/No

No

Print out tree 

Yes/No

Yes

Print out steps in each site 

Yes/No

No

Print sequences at all nodes of tree

Yes/No

No

Write out trees onto tree file? 

Yes/No

Yes

Dot-differencing to display 

Yes/No

No

Tool Name          : ePrimer3
  - From EMBOSS Package

Version                : 4.1.0

Description

Picks PCR primers and hybridization oligos

Reference : http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/eprimer3.html

 

Tool Name          : ESLPred2
  

Version                : NA

Description

"ESLpred2" is an improved version of our previous most popular method, ESLpred, which can predict four major localizations (cytoplasmic, mitochondrial, nuclear and extracellular) with an accuracy of 88%.

Reference : http://www.imtech.res.in/raghava/eslpred2/

 

Advanced Parameters

Parameter | Allowed Values | Default Value
Query input | Alphanumeric | Stdin/Fasta format
Organism group | -A, -F, -P, -G | Generalized
Method for prediction | 1 (amino acid composition), 2 (PSSM composition), 3 (hybrid AAC, PSSM, PSI-BLAST) | 3
Output file | Alphanumeric | stdout

  

Tool Name          : ESTScan
  

Version                : 2.0b

Description

ESTScan is a program that can detect coding regions in DNA sequences, even if they are of low quality. ESTScan will also detect and correct sequencing errors that lead to frameshifts.

Reference : http://www.isrec.isb-sib.ch/ftp-server/ESTScan/

Tool Name          : FASTA
  

Version                : 3.4

Description

Provides sequence similarity searching against protein databases using the FASTA and SSEARCH programs. SSEARCH does a rigorous Smith-Waterman search for similarity between a query sequence and a database. GGSEARCH compares a protein or DNA sequence to a sequence database producing global-global alignment (Needleman-Wunsch). GLSEARCH compares a protein or DNA sequence to a sequence database. FASTA can be very specific when identifying long regions of low similarity especially for highly diverged sequences.

Reference : http://www.ebi.ac.uk/Tools/fasta/index.html

Tool Name          : FAT
  -  Functional Annotation Tool (Anvaya Custom Tool)

Version                : 1.0

Description

FAT parses the outputs of different programs and produces a FASTA-formatted file that carries the annotation details in the header line of each sequence.

Mandatory options:

1.      Input file containing fasta-formatted protein sequences

Optional input files:

 

Tool Name          : fitch
  - From PHYLIP Package

Version                : 3.67

Description

Fitch carries out Fitch-Margoliash, Least Squares, and a number of similar methods as described in the documentation file for distance methods.

Advanced Parameters

Parameter

Allowed Values

Default Value

Input file

 

Alphanumeric

Infile

Output file

 

alphanumeric

Outfile

Method

Fitch-Margoliash

Fitch-Margoliash

Minimum Evolution

Search for best tree

Yes

Yes

No, use user trees in input file

Power

 

Any Number

2.0

Negative branch lengths allowed? 

Yes/No

No

Lower-triangular data matrix? 

Yes/No

No

Upper-triangular data matrix?

Yes/No

No

Subreplicates

Yes/No

No

Global rearrangements?

Yes/No

No

Randomize input order of species? 

Yes/No

No. Use input order

Analyze multiple data sets?

Yes/No

No

Terminal type

IBM PC

 

ANSI

ANSI

 

None

 

Print out the data at start of run 

 

Yes/No

No

Print indications of progress of run   

Yes/No

Yes

Print out tree 

Yes/No

 

Yes

Write out trees onto tree file? 

Yes/No

 

Yes

Use lengths from user trees

Yes/No

 

Yes

Tool Name          : Freak
  - From EMBOSS Package

Version                : 4.1.0

Description

Freak takes one or more sequences as input and a set of bases or residues to search for. It then calculates the frequency of these bases/residues in a window as it moves along the sequence. The frequency is output to a data file or (optionally) plotted.

Advanced Parameters

Parameter | Allowed Values | Default Value
Sequence file | File | NA
Residue letters | Any string | “gc”
Output file | File | NA
Stepping value | Any integer value | 1
Averaging window | Any integer value | 30
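The windowed frequency calculation performed by Freak can be sketched as follows. This is an illustrative sketch only, not the EMBOSS code: the function name is hypothetical, the defaults mirror the table values ("gc", step 1, window 30), and reporting each window by its 1-based start position is an assumption.

```python
def window_frequencies(sequence, letters="gc", window=30, step=1):
    """Return (start_position, frequency) pairs for a window sliding along the sequence.

    The frequency is the fraction of characters in the window that belong to `letters`.
    Positions are 1-based window starts (an assumption made for this sketch).
    """
    wanted = set(letters.lower())
    seq = sequence.lower()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        hits = sum(1 for ch in chunk if ch in wanted)
        result.append((start + 1, hits / window))
    return result


# A short example with a small window so the numbers are easy to check by hand.
print(window_frequencies("GGCCAATT", letters="gc", window=4, step=2))
# prints [(1, 1.0), (3, 0.5), (5, 0.0)]
```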

     

Tool Name          : genscan
  

Version                : NA

Description

GenScan is a tool that identifies complete gene structures in genomic DNA. It is a GHMM-based (generalized hidden Markov model) gene finder for human sequences.

Reference : http://genes.mit.edu/GENSCAN.html

Advanced Parameters

Display Name | Allowed Values | Default Value
verbose output (extra explanatory info) | NA | NA
Print predicted coding sequences (nucleic acid) | NA | NA

Tool Name          : Glimmer
  - Gene Locator and Interpolated Markov ModelER

Version                : 3.02

Description

Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated Markov ModelER) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA.

Advanced Parameters

Tool Name          : Hamming
  -  Hamming Distance (Anvaya Custom Tool)

Version                : 1.0

Description

This is a customized tool provided as part of the Anvaya package.
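The manual does not describe this tool's inputs or outputs, so the following is only a sketch of the standard Hamming distance that the tool name refers to: the number of positions at which two equal-length sequences differ. The function name and example sequences are assumptions for illustration.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined only for sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("GATTACA", "GACTATA"))  # prints 2
```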

Tool Name          : HQ_EST
  -   (Anvaya Custom Tool)

Version                : 1.0

Description

This program is part of the Anvaya package.

Tool Name          : InterProScan
  

Version                : 4.4

Description

InterPro is an integrated database of predictive protein "signatures" used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. InterPro adds in-depth annotation, including GO terms, to the protein signatures.

Reference : http://www.ebi.ac.uk/interpro

Tool Name          : kitsch
  - From PHYLIP Package

Version                : 3.67

Description

Kitsch carries out the Fitch-Margoliash and Least Squares methods, plus a variety of others of the same family, with the assumption that all tip species are contemporaneous, and that there is an evolutionary clock (in effect, a molecular clock). This means that branches of the tree cannot be of arbitrary length, but are constrained so that the total length from the root of the tree to any species is the same. The quantity minimized is the same weighted sum of squares described in the Distance Matrix Methods documentation file.
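The weighted sum of squares mentioned above can be sketched as below, following the form given in the PHYLIP distance-methods documentation: each squared difference between an observed distance and the corresponding tree distance is divided by the observed distance raised to the Power parameter. The function and variable names are assumptions for illustration, and the number of replicates per pair is assumed to be 1.

```python
def weighted_sum_of_squares(observed, tree_dist, power=2.0):
    """Sum over species pairs of (observed - tree)^2 / observed^power.

    observed, tree_dist : dicts mapping a species pair to a distance (hypothetical format)
    power               : 2.0 corresponds to Fitch-Margoliash, 0.0 to unweighted least squares
    """
    total = 0.0
    for pair, d_obs in observed.items():
        d_tree = tree_dist[pair]
        total += (d_obs - d_tree) ** 2 / d_obs ** power
    return total


observed = {("A", "B"): 2.0, ("A", "C"): 4.0}
on_tree = {("A", "B"): 1.0, ("A", "C"): 4.0}
print(weighted_sum_of_squares(observed, on_tree))             # prints 0.25
print(weighted_sum_of_squares(observed, on_tree, power=0.0))  # prints 1.0
```

Under the clock assumption, Kitsch evaluates this quantity only on trees in which every tip is the same total distance from the root.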

Advanced Parameters

Parameter and Display Name

Allowed Values

Default Value

Input file

 

Alphanumeric

Infile

Output file

 

alphanumeric

Outfile

Method

Fitch-Margoliash

Fitch-Margoliash

Minimum Evolution

Search for best tree

Yes/No

Yes

Power

 

Any Number

2.0

Negative branch lengths allowed? 

Yes/No

No

Lower-triangular data matrix? 

Yes/No

No

Upper-triangular data matrix?

Yes/No

No

Subreplicates

Yes/No

No

Randomize input order of species? 

Yes/No

No. Use input order

Analyze multiple data sets?

 

Yes/No

No

Terminal type

IBM PC

 

ANSI

ANSI

 

None

 

Print out the data at start of run 

 

Yes/No

No

Print indications of progress of run   

Yes/No

Yes

Print out tree 

Yes/No

 

Yes

Write out trees onto tree file? 

Yes/No

 

Yes

Tool Name          : MAST
  - Motif Alignment and Search Tool

Version                : 4.1.0

Description

MAST is a tool for searching biological sequence databases for sequences that contain one or more of a group of known motifs.

Reference : http://meme.sdsc.edu/meme/mast-intro.html

Tool Name          : MDScan
  

Version                : 2004

Description

A Fast and Accurate Motif Finding Algorithm With Applications To Chromatin Immunoprecipitation Microarray Experiments

Reference : http://robotics.stanford.edu/~xsliu/MDscan/

Tool Name          : MEME
  

Version                : 4.1.0

Description

MEME is a motif-finding tool for DNA as well as protein sequences.

Reference : http://meme.nbcr.net

Tool Name          : MPAO
  -  Map Potential Antigenic Output (Anvaya Custom Tool)

Version                : 1.0

Description

Parses the outputs of different antigen prediction tools and produces a single file with the combined output.

Tool Name          : MPT
  -  Motif Processing Tool (Anvaya Custom Tool)

Version                : 1.0

Description

MPT is a custom tool that takes the outputs of different motif prediction tools and converts them into a text-based visual format that is easy to interpret and compare. It requires the output of at least two tools. By default it takes the first five motifs predicted by each tool; if a tool predicts fewer than five motifs, all of its predicted motifs are considered.

Tool Name          : Mutual Info
  

Version                : 0.64

Description

Calculates mutual information values (MIs) from a table of continuous and discontinuous variables. It is used in the detection of functional linkages by phylogenetic profiling.
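For two discrete profiles, mutual information can be computed from the observed joint and marginal frequencies. The sketch below uses the textbook definition and is not the code of the tool itself: the function name is hypothetical, only discrete values are handled (the binning of continuous variables is not shown), and the result is expressed in bits.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete profiles."""
    if len(xs) != len(ys):
        raise ValueError("profiles must have the same length")
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


# Presence/absence profiles of two genes across four genomes (illustrative data).
print(mutual_information([1, 0, 1, 0], [1, 0, 1, 0]))  # prints 1.0
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # prints 0.0
```

Identical profiles give the maximum value for a balanced binary profile (1 bit), while independent profiles give 0, which is why high MI between two phylogenetic profiles suggests a functional linkage.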

Tool Name          : neighbor
  - From PHYLIP Package

Version                : 3.67

Description

Neighbor implements the Neighbor-Joining method. It constructs a tree by successive clustering of lineages, setting branch lengths as the lineages join. The tree is not rearranged thereafter. The tree does not assume an evolutionary clock, so that it is in effect an unrooted tree.

Advanced Parameters

Parameter

Allowed Values

Default Value

Input file

 

Alphanumeric

Infile

Output file

 

alphanumeric

Outfile

Method

Neighbor-joining

Neighbor-joining

UPGMA tree

Lower-triangular data matrix? 

Yes/No

No

Upper-triangular data matrix?

 

Yes/No

No

Subreplicates

Yes/No

No

Randomize input order of species? 

Yes/No

No. Use input order

Analyze multiple data sets?

 

Yes/No

No

Terminal type

IBM PC

 

ANSI

ANSI

 

None

 
 

Print out the data at start of run 

 

Yes/No

 

No

 

Print indications of progress of run   

Yes/No

 

Yes

 

Print out tree 

Yes/No

 
 

Yes

Write out trees onto tree file? 

Yes/No

 

Yes

Outgroup  root

Yes/No

No, use as outgroup species  1

Tool Name          : NETCTL
  

Version                : 1.2

Description

NetCTL 1.2 predicts CTL epitopes in protein sequences

Reference : http://www.cbs.dtu.dk/services/NetCTL/

Tool Name          : OFB
  -  Ortholog From Blast (Anvaya Custom Tool)

Version                : 1.0

Description

OFB is a custom tool that reports the final orthologs found in two TBLASTN outputs, given the desired identity and query coverage, as well as the unique genes found in each of the two genomes. It takes three input files: the orthologsQuery.out file from BBH and the two TBLASTN outputs from the previous nodes. Using the given identity value and percentage query coverage, it looks for genuine orthologs that were missed previously and appends them to the file orthologsQuery.out. Genes that do not satisfy the given identity and percentage query coverage criteria are reported as unique genes.

Advanced Parameters

Tool Name          : OCP
  -  Orthologous Cluster of conserved Protein (Anvaya Custom Tool)

Version                : 1.0

Description

The Orthologous Cluster tool uses the .aln file produced by ClustalW and extracts the conserved regions from the given set of orthologous sequences.

Tool Name          : PFAM
  

Version                : Pfam_scan 0.7, Pfam 23.0

Description

The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).

Reference : http://pfam.sanger.ac.uk/

Tool Name          : Phred
  - Base calling Program

Version                : 2.2.14

Description

Phred is a base-calling program for DNA sequence traces. Phred reads DNA sequence chromatogram files and analyzes the peaks to call bases, assigning quality scores ("Phred scores") to each base call.
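A Phred quality score Q is related to the estimated probability P that a base call is wrong by Q = -10 log10(P). The short sketch below converts in both directions; the function names are chosen for illustration and are not part of Phred itself.

```python
from math import log10


def phred_quality(error_probability):
    """Quality score for a given probability that the base call is wrong."""
    return -10 * log10(error_probability)


def error_probability(quality):
    """Probability that the base call is wrong for a given quality score."""
    return 10 ** (-quality / 10)


print(round(phred_quality(0.001)))  # prints 30 (1 error in 1000 calls)
print(error_probability(20))        # prints 0.01 (1 error in 100 calls)
```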

Tool Name          : PredPrimer
  -  Predict Primer (Anvaya Custom Tool)

Version                : 1.0

Description

PredPrimer is a custom tool that reports a summarized output for the primers predicted by Primer3, after a similarity search against a desired database using BLASTN and secondary structure prediction using RNAfold. It takes two input files: the BLASTN output file and the RNAfold output. It reports a summary of at least two BLASTN hits for each query, together with the corresponding secondary structure prediction at a given temperature.

Tool Name          : Prof_Vec
  -  Create Profile Vector (Anvaya Custom Tool)

Version                : 1.0

Description

Reads in the results of independent Smith-Waterman database searches and creates a matrix containing normalized E-values.

The SSEARCH output is parsed in terms of percentage query overlap (default: 40%). Each E-value is normalized using the formula: normalized E-value = -1 / log E.

The tool reads in the outputs of independent searches (one per organism) and creates a matrix with normalized E-values. The following values are assigned when constructing the matrix:

5 --> no significant hits found for the query.

1 --> eVal >= 1

0 --> eVal = 0

2 --> eVal > 0 and < 1 but NOT satisfying the conditions.

normaliseEval --> eVal > 0 and < 1 and satisfying the conditions.
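The assignment rules above can be sketched as a single function. This is an illustration only, not the Prof_Vec source: the function and parameter names are hypothetical, the logarithm is assumed to be base 10 (the manual does not state the base), and "the conditions" are assumed to mean that the query overlap reaches the 40% default threshold.

```python
from math import log10


def profile_entry(evalue, query_overlap=None, min_overlap=40.0):
    """Matrix entry for one query against one organism.

    evalue        : best-hit E-value, or None when no significant hit was found
    query_overlap : percentage of the query covered by the hit (assumed condition)
    """
    if evalue is None:
        return 5.0                      # no significant hits found
    if evalue >= 1:
        return 1.0
    if evalue == 0:
        return 0.0
    if query_overlap is None or query_overlap < min_overlap:
        return 2.0                      # 0 < E < 1 but conditions not satisfied
    return -1.0 / log10(evalue)         # normalized E-value (log base 10 assumed)


print(profile_entry(None))                  # prints 5.0
print(profile_entry(1e-5, query_overlap=10))  # prints 2.0
```

With these assumptions, a hit with E = 1e-10 and sufficient overlap receives a normalized value of about 0.1, so stronger hits map to values closer to 0.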

Tool Name          : protml
  - From PHYLIP Package

Version                : 3.67

Description

PROTML is a PASCAL program for inferring evolutionary trees from protein (amino acid) sequences by using maximum likelihood. A maximum likelihood method for inferring trees from DNA or RNA sequences was developed by Felsenstein (1981). The method does not impose any constraint on the constancy of evolutionary rate among lineages.

Advanced Parameters

Parameter

Allowed Values

Default Value

Input file

 

Alphanumeric

Infile

Output file

alphanumeric

Outfile

 

Search for best tree

Yes/No

Yes

Models of amino acid change

Jones-Taylor-Thornton

 

Jones-Taylor-Thornton

Henikoff/Tillier PMB

 

Dayhoff PAM

 

Number of categories

Any number between 1- 9

 

Yes

Hidden Markov Model rates

Constant rate

 

Constant rate

Gamma distributed rates

 

Gamma+Invariant sites

 

user-defined HMM of rates

Weight

Yes/No

No

Speedier but rougher analysis

Yes/No

Yes

 

Global Arrangement

Yes/No

 

No

Randomize input order of sequences? 

Yes/No

No. Use input order

Outgroup root? 

Yes/No

No, use as outgroup species  1

Analyze multiple Data sets

Multiple data sets (type D)

 

No

Multiple weights  (type W)

Input sequence

Sequential

 

Interleaved

Interleaved

 

Terminal Type

IBM PC

 

ANSI

ANSI

 

None

 

Print out the data at start of run 

Yes/No

 

No

Print indications of progress of run 

 

Yes/No

Yes

Print out tree?

Yes/No

 

Yes

Write out trees onto tree file? 

 

Yes/No

Yes

Reconstruct hypothetical sequences? 

Yes/No

No

Use lengths from user trees?

 

Yes/No

No

Rates at adjacent sites correlated? 

Yes/No

No, they are independent

Tool Name          : protdist
  - From PHYLIP Package

Version                : 3.67

Description

ProtDist uses protein sequences to compute a distance matrix, under three different models of amino acid replacement. The distance for each pair of species estimates the total branch length between the two species, and can be used in the distance matrix programs FITCH, KITSCH or NEIGHBOR. This is an alternative to use of the sequence data itself in the parsimony program PROTPARS.

Advanced Parameters

Parameter

Allowed Values

Default Value

Input file

 

Alphanumeric

Infile

Output file

 

alphanumeric

Outfile

Method

 

JTT 

Jones-Taylor-Thornton matrix

Henikoff/Tillier PMB matrix

Dayhoff PAM matrix

Kimura formula

 

Similarity Table

Categories model

 
 

Gamma distribution

Yes/No

No

Gamma+Invariant

Number of categories

Any number between 0- 9

 

Yes

Weights

Yes/No

 

No

 

Multiple data sets

Multiple weights (type W)

 

No

Multiple data sets (type D)

 

Input Sequence

Sequential

 
 

Interleaved

Interleaved

 
 

Terminal Type

ANSI

 
 

ANSI

 

IBM PC

 

None

 
 

Print out the data at start of run 

 
 

Yes/No

No

Print indications of progress of run 

 

Yes/No

Yes

 

Genetic codes

Universal

 

Universal

Mitochondrial

Vertebrate mitochondrial

 

Fly mitochondrial

 

Yeast mitochondrial

 
 

Categories of Amino acids

Chemical

 
 

George/Hunt/Barker

 

George/Hunt/Barker

 

Hall

 

Ease of changing category of amino acid

Any number below 1.0. Can’t be negative.

  0.4570

Transition/transversion option

Any Number or ratio

2.0

 

Base Frequencies

Equal

 

Equal

Any number but the Frequencies must sum to 1.

Tool Name          : protpars
  - From PHYLIP Package

Version                : 3.67

Description

ProtPars infers an unrooted phylogeny from protein sequences, using a new method intermediate between the approaches of Eck and Dayhoff (1966) and Fitch (1971). Eck and Dayhoff (1966) allowed any amino acid to change to any other, and counted the number of such changes needed to evolve the protein sequences on each given phylogeny. This has the problem that it allows replacements which are not consistent with the genetic code, counting them equally with replacements that are consistent. Fitch, on the other hand, counted the minimum number of nucleotide substitutions that would be needed to achieve the given protein sequences. This counts silent changes equally with those that change the amino acid.

Advanced Parameters

Parameter

Allowed Values

Default Value

Input file

 

Alphanumeric

Infile

Output file

 

alphanumeric

Outfile

 

Search for best tree? 

No, use user trees in input file

Yes

Yes

Randomize input order of sequences

No/Yes

No. Use input order

Outgroup root

No/Yes

No, use as outgroup species  1

Use Threshold parsimony

No/Yes

No, use ordinary parsimony

Genetic code

Universal          

Universal

Mitochondria

Vertebrate mitochondrial

Fly mitochondrial

Yeast mitochondrial

Use Transversion parsimony

 No/ Yes, count only transversions

No, count all steps

Sites weighted

 No/Yes

 

No

Analyze multiple data sets

Multiple data sets (type D)

 

No

Multiple weights  (type W)

 

Input sequences interleaved 

Yes/No, sequential

Yes

Terminal type

IBM PC/ANSI/ none 

 

ANSI

Print out the data at start of run 

 

No/Yes

No

Print indications of progress of run 

Yes/No

No

Print out tree 

Yes/No

Yes

Print out steps in each site 

Yes/No

No

Print sequences at all nodes of tree

Yes/No

No

Write out trees onto tree file? 

Yes/No

Yes

Dot-differencing to display 

Yes/No

No

Tool Name          : psiBLAST
  

Version                : 2.0

Description

Position specific iterative BLAST (PSI-BLAST) refers to a feature of BLAST 2.0 in which a profile (or position specific scoring matrix, PSSM) is constructed (automatically) from a multiple alignment of the highest scoring hits in an initial BLAST search. The PSSM is generated by calculating position-specific scores for each position in the alignment. Highly conserved positions receive high scores and weakly conserved positions receive scores near zero. The profile is used to perform a second (etc.) BLAST search and the results of each "iteration" used to refine the profile. This iterative searching strategy results in increased sensitivity.

Reference : http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html

Tool Name          : RBSFinder
  

Version                : Not Available

Description

RBSfinder searches for regions in the vicinity of the gene start where the ribosome might bind. Based on its findings, RBSfinder may propose a different gene start. In most cases the use of RBSfinder increases the accuracy of gene start prediction.

Advanced Parameters

Parameter | Allowed Values
Genome sequence file in fasta | File name (Input File)
Glimmer output file | File name (Input File)
Output file | File name
Length of upstream region of gene, where RBS will be searched (in bp) | <=300
Consensus sequence | String of a, t, g, c (any length)
File containing position to relocate or check for RBS site | File name (Input File)

Tool Name          : RepeatMasker
  

Version                : 3.0

Description

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns).

Advanced Parameters

Parameter | Allowed Values | Default Value
Alternate search engine | Cross_match, WuBlast, Decypher | Cross_match
Parallel version | No. of processors available | NA
Slow search | N.A | Unchecked
Quick search | N.A | Unchecked
Rush job | N.A | Unchecked
Don’t mask low complexity regions / simple repeats | N.A | Unchecked
Only masks low complex/simple repeats | N.A | Unchecked
Do not mask small RNA or pseudo genes | N.A | Unchecked
Mask Alus, 7SLRNA, SVA and LTR5 | N.A | Unchecked
Maximum % divergence from consensus sequence | 0 to 100 | NA
Use custom library | File name (Input File) | NA
Cutoff score for masking repeats | > 0 | 225
Specify species of input sequence | String | N.A
Only clip E.coli insertion elements | N.A | Unchecked
Clip IS element before analysis | N.A | Unchecked
Skip bacterial insertion element check | N.A | Unchecked
Check for rodent specific repeats | N.A | Unchecked
Checks for primate specific repeats | N.A | Unchecked
Use matrices calculated for x% background GC level | 1 to 100 | N.A
Calculate GC content | N.A | Unchecked
Max. sequence length masked without fragmenting | Any integer | 300000 if -e = DeCypher, else 40000
Skip the steps in which repeats are excised | N.A | Unchecked
Write alignment in .align output file | N.A | Unchecked
Present alignment in the orientation of repeats | N.A | Unchecked
Outputs ambiguous DNA transposon in lower case | N.A | Unchecked
Mask Sequence in lower case | N.A | Unchecked
Returns repetitive regions in lowercase | N.A | Unchecked
Repetitive region masked with X’s | N.A | Unchecked
Reports simple repeats that may be polymorphic | N.A | N.A
Annotation with the HSP evidence | N.A | Unchecked
Create an additional output in xhtml format | N.A | Unchecked
Output in ACeDB format | N.A | Unchecked
Gene Feature Finding format output | N.A | Unchecked
Annotation output file not processed by ProcessRepeats | N.A | Unchecked
Output file in cross_match format | N.A | Unchecked
Create annotation file with fixed column width | N.A | Unchecked
Does not write final column with unique ID for each element | N.A | Unchecked

    

Tool Name          : Restrict
  - From EMBOSS Package

Version                : 4.1.0

Description

Restrict finds restriction enzyme cleavage sites. It uses the REBASE database of restriction enzymes to predict cut sites in a DNA sequence. The program lets you select a range of cuts, whether the DNA is circular, whether IUB ambiguity codes are used, and whether blunt ends, sticky ends or both are reported.

Advanced Parameters

Parameter

Allowed Values

Default Value

Input

Alphanumeric

Stdin

-Enzymes

Alphabetic

all

Minimum site length

Integer

4

Output

Alphanumeric

Stdin

Minimum cuts for an enzyme

Integer

1

Maximum cuts for an enzyme

Integer

2000000000

 

Fragment lengths

 

Boolean Yes/No

 

No

Solo

Fragment lengths of each enzyme

 

Boolean Yes/No

 

No

Only one fragment per enzyme

Boolean Yes/No

No

Allow blunt cut

Boolean Yes/No

Yes

Allow Sticky ends

Boolean Yes/No

Yes

Allow ambiguity

Boolean Yes/No

Yes

Span Plasmid/end sequence

Boolean Yes/No

No

commercial enzymes

Boolean Yes/No

No

Alternate RE datafile

Alphanumeric

 (Input file)

Stdin

Isoschizomers

limit

Boolean

Yes/No

Yes

Sort out Alphabetically

Boolean

Yes/No

No

   

Tool Name          : ROT
  -  Remote Ortholog Tool (Anvaya Custom Tool)

Version                : 1.0

Description

The output of PSI-BLAST is used as input for the Remote Ortholog Tool (ROT). ROT extracts the hits with 30% to 60% identity from each round of PSI-BLAST. These are probable remote ortholog sequences and can be saved in FASTA format for further analysis.
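The identity filter described above can be sketched as follows. This is not the ROT implementation: the function name and the hit representation (a list of dictionaries with hypothetical "id", "round" and "identity" keys) are assumptions, parsing of the PSI-BLAST report is not shown, and treating both bounds as inclusive is also an assumption.

```python
def remote_ortholog_hits(hits, low=30.0, high=60.0):
    """Keep hits whose percentage identity lies between `low` and `high` (inclusive)."""
    return [hit for hit in hits if low <= hit["identity"] <= high]


# Hypothetical hits already parsed from two PSI-BLAST rounds.
parsed_hits = [
    {"id": "seq1", "round": 1, "identity": 85.0},
    {"id": "seq2", "round": 1, "identity": 42.5},
    {"id": "seq3", "round": 2, "identity": 31.0},
    {"id": "seq4", "round": 2, "identity": 18.0},
]
print([hit["id"] for hit in remote_ortholog_hits(parsed_hits)])  # prints ['seq2', 'seq3']
```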

Tool Name          : SeqBoot
  - From PHYLIP Package

Version                : 3.67

Description

SEQBOOT is a general bootstrapping tool. It is intended to allow you to generate multiple data sets that are resampled versions of the input data set. Since almost all programs in the package can analyze these multiple data sets, this allows almost anything in this package to be bootstrapped, jackknifed, or permuted. SEQBOOT can handle molecular sequences, binary characters, restriction sites, or gene frequencies.
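For molecular sequences, a bootstrap replicate is built by sampling alignment columns with replacement until the replicate has as many sites as the original. The sketch below illustrates only this idea and is not the SEQBOOT code: the function names, the dictionary-based alignment format and the seed handling are assumptions, while the default of 100 replicates mirrors the parameter table.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of an alignment with replacement (one replicate)."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same sampled columns are used for every species, so columns stay intact.
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, replicates=100, seed=1):
    """Generate several resampled data sets from one input alignment."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(replicates)]


alignment = {"sp1": "ACGT", "sp2": "ACGA"}
replicate_sets = bootstrap_replicates(alignment, replicates=5, seed=1)
print(len(replicate_sets))  # prints 5
```

Each replicate can then be passed to a downstream program such as protdist or dnapars with the multiple data sets option enabled.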

Advanced Parameters

Parameter

Allowed Values

Default Value

Input file

 

Alphanumeric

Infile

Output file

 

alphanumeric

Outfile

Method

 

Molecular Sequence

Molecular Sequence

Discrete Morphological Characters

Restriction Sites

Gene Frequencies

 

Bootstrap

Bootstrap

Delete half Jackknife

 
 

Permute

Permute species for each character

Permute character order

Permute within species

 

Rewrite data

 

Sampling fraction

Regular

Regular

Altered

Block Bootstrap

Size of Block:  Any Number

 

1

Number of replicate datasets

 

Any Number

100

Weight

Read weights of characters: Yes/No

No

Categories

Read categories of sites: Yes/No

No

 

Data sets

Data sets

Just weights

Input Sequence

Interleaved

Interleaved

Sequential

Terminal Type

IBM PC

 

ANSI

None

Print out the data at start of run 

 

Yes/No

No

Print indications of progress of run 

Yes/No

No

Number of enzymes

Present in input file

Present in input file

 

Not present in input file

All alleles present at each locus 

No, one absent at each locus

No, one absent at each locus

 

Yes

Factors

Yes/No

No

Ancestors

Yes/No

No

Mixture of methods

Yes/No

No

Dot-differencing to display 

Yes/No

No

Output format

PHYLIP

  

PHYLIP

NEXUS

XML

Type of molecular sequences

DNA

RNA

Protein

DNA

Tool Name          : seqClean
  

Version                : 2.2.14

Description

A script for automated trimming and validation of ESTs or other DNA sequences by screening for various contaminants, low quality and low-complexity sequences.

Reference : http://jimmy.harvard.edu

Tool Name          : SignalP
  

Version                : 3.0

Description

SignalP predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks and hidden Markov models.

Reference : http://www.cbs.dtu.dk/services/SignalP/

Tool Name          : SRT
  -  Sequence Retrieval Tool (Anvaya Custom Tool)

Version                : 1.0

Description

Retrieves the FASTA sequences of the entries satisfying the above-mentioned criteria from the corresponding table of a database stored in an RDBMS such as MySQL. Storing standard databases such as UniProt and nr as MySQL tables speeds up retrieval, which would otherwise require extracting the sequences from a text file of up to 3 GB and would take considerably longer.

Tool Name          : STT
  -  Sequence Trimming Tool (Anvaya Custom Tool)

Version                : 1.0

Description

Sequence trimming tool will remove the vector-masked regions from the EST sequences. It also removes polyA/T tails, poly C/G tails, adaptor and linker sequences.

Input files:

Output files: (must be specified as a parameter)

 

Tool Name          : TargetP
  

Version                :  v1.1b

Description

TargetP predicts the subcellular location of eukaryotic proteins. The location assignment is based on the predicted presence of any of the N-terminal presequences: chloroplast transit peptide (cTP), mitochondrial targeting peptide (mTP) or secretory pathway signal peptide (SP).

Reference : http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?targetp

 

Advanced Parameters

 

Parameter | Allowed Values | Default Value
Query input | Alphanumeric | Stdin/Fasta format
Organism group | N/A | Non-plant
Include cleavage site prediction | N/A | N/A
Cutoff for predicting cTP | Float 0.0-1.0 | 0.0
Cutoff for predicting mTP | Float 0.0-1.0 | 0.0
Cutoff for SP | Float 0.0-1.0 | 0.0
Cutoff for other | Float 0.0-1.0 | 0.0
Output file | Alphanumeric | stdout

 

Tool Name          : TMHMM
  - Prediction of transmembrane helices in proteins

Version                : 2.0.c

Description

TMHMM predicts transmembrane helices and the location of the intervening loop regions.
If the whole sequence is labeled as inside or outside, the prediction is that it contains no membrane helices. It is probably not wise to interpret it as a prediction of location. The prediction gives the most probable location and orientation of transmembrane helices in the sequence.

Reference : http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?tmhmm

Advanced Parameters

Parameter | Command line mapping | Allowed Values
Input file | NA | Alphanumeric
Model | -v |

 

Tool Name          : tRNAScan
  

Version                : 1.21

Description

tRNA detection in large-scale genome sequence.

tRNAscan-SE detects ~99% of eukaryotic nuclear or prokaryotic tRNA genes, with a false positive rate of less than one per 15 gigabases and a search speed of about 30 kb/second.

 

Reference : http://selab.janelia.org/software.html

Tool Name          : UT_EST
  -  Unique Transcripts EST Tool (Anvaya Custom Tool)

Version                : 1.0

Description

This is a customized tool provided as part of the Anvaya package. It produces a concatenated file containing all the contigs and singlets.

Input files:

The output file name must be specified as an input parameter.

Tool Name          : vRNAFold
  

Version                : NA

Description

Predicts secondary structures of single-stranded RNA or DNA sequences.

Tool Name          : Weeder
  

Version                : 1.3

Description

Weeder is a program for finding novel motifs (transcription factor binding sites) conserved in a set of regulatory regions of related genes.

Reference : http://159.149.109.9/modtools/

Workflows - Pipeline


A Workflow-Pipeline is a logical connection of commonly used Bioinformatics tools, which are run either in serial mode or in parallel mode, to achieve a scientific target.

The pipeline is created by dragging appropriate tools from the available tool list, connecting them logically on the design canvas, and then setting the appropriate input-output files and advanced parameters.
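The distinction between serial and parallel execution can be illustrated by treating a pipeline as a directed graph of tools. The sketch below is not how Anvaya schedules workflows (the manual does not describe this): the function name, the dictionary format and the example connections are assumptions. It groups tools into stages, where the tools within one stage do not depend on each other and could run in parallel, while the stages themselves run in series.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_stages(dependencies):
    """Group tools into stages; each stage only needs the stages before it.

    dependencies : dict mapping a tool to the set of tools whose output it consumes
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                      # raises CycleError if the connections form a loop
    stages = []
    while sorter.is_active():
        ready = sorter.get_ready()        # tools whose inputs are all available
        stages.append(sorted(ready))
        sorter.done(*ready)
    return stages


# Hypothetical pipeline: SeqBoot feeds protdist and protpars; protdist feeds neighbor.
pipeline = {
    "protdist": {"SeqBoot"},
    "protpars": {"SeqBoot"},
    "neighbor": {"protdist"},
}
print(execution_stages(pipeline))
# prints [['SeqBoot'], ['protdist', 'protpars'], ['neighbor']]
```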

Create Workflow


A workflow is created by dragging associated tools from the tool list onto the design canvas. The tools are then connected in logical order.

Execute Workflow


When the user clicks "Run Workflow", the input and other configuration files must be transferred before the workflow is actually executed. Once the file transfer is complete, the user can start executing the workflow.

The status of the workflow can be viewed in tabular format on the status tab. The status is also depicted pictorially on the design canvas.

Save Workflow


The workflow created on the design canvas must be saved before execution.

Workflows - Glossary


Terminology | Description
Pipeline | Logical connection of tools
Rules Engine | A component of the Anvaya Client that controls the connection of tools. The rules engine defines which tools can be logically connected. It is pre-defined and should not be modified by the end user.
Custom Tool | Tools that are custom-made for Anvaya. They add value to the pre-defined workflows available in Anvaya.
SubLayer | If a workflow spans a large area of the design canvas, sub-parts of the workflow can be grouped together in a sub-layer, which the user can then collapse or expand.

Index