The advent of new-generation sequencing technologies has sharply increased the number of genomes being sequenced (per organism), leading to a steep rise in the volume of genomic data generated. Downstream analysis of such data requires specialized software and hardware that enable high-throughput analysis. Comprehensive analysis of heterogeneous genomic data requires a large number of bioinformatics tools that demand extensive preprocessing and manual intervention at frequent intervals. Our aim is to provide a flexible platform for running complex queries that can integrate and analyze large amounts of genomic data. Anvaya is a software package consisting of bioinformatics tools and databases that are loosely coupled in a coordinated system to execute a set of analysis tools in series or in parallel.
User |
|
Login |
The user has to log in with appropriate credentials. The login ID and password can be obtained from the Anvaya Admin Interface. |
Logout |
The user must log out every time before the client exits. |
Project |
|
New Project |
When the workflow client is started, after user login, a new project needs to be created in order to start creating a pipeline. |
Open Project |
Anvaya Client allows user to open a previously executed project by using this option. |
Delete Project |
A previously created project can be deleted using this option. |
Workflow (Pipeline) |
|
Create Workflow |
A workflow is created by dragging associated tools from the tool list onto the design canvas. The tools are then connected in logical order. |
Save workflow | The workflow created on design canvas must be saved before execution. |
Run Workflow |
When the user clicks "Run Workflow", the input and other configuration files must be transferred before the workflow is actually executed. Once the file transfer is complete, the user can start executing the workflow. |
Stop Workflow |
|
Resume workflow |
|
Status |
|
UI |
|
Design Canvas |
|
Project Explorer |
The Project Explorer tab allows the user to view the input, output and intermediate output files of the current project. The files must be transferred from the server (Status tab) before they can be viewed in the Project Explorer. |
Tool List |
|
Pre Defined Workflows | Anvaya provides a set of 11 pre-defined workflows for frequently used pipelines in genome annotation and comparative genomics, ranging from EST assembly and annotation to phylogenetic reconstruction and microarray analysis. |
Node Properties |
|
Connecting Tools |
|
Sticky Note |
|
Help |
|
SubLayer |
|
Rules Engine |
|
Tool tip |
|
Alignment of Nodes |
|
Scroll View |
|
Once the Anvaya client is installed, the user must request a login ID from the Anvaya Admin interface. The user can then use the login ID and password to start using the Anvaya Client. The server needs to be configured before login.
Anvaya has the following layout components:
- Design Canvas : Work Area for creating workflow pipelines
- Project Explorer : To view input/output files for the project
- Status Tab : For viewing Tabular/pictorial status of workflow execution
- Tool List: List of tools available functionality wise or in alphabetical order
- Sub Layer Tree : Tree View for sub layers created in workflow pipeline
- Scroll View : Snapshot view area which has handle to span across the complete design canvas
- Console: Text Message area to display current execution messages for user
Client Configuration
Once the client is installed, it is configured with default values. The configuration file is available as conf/workflow.conf under the directory where the client was installed. The user will have to configure the WORKFLOW_HOME path to indicate the workspace directory for the Anvaya Client. All the other paths are configurable through the client itself.
Server Configuration
Server configuration is available under the "Server>>Configure Server" option in the client menu. Here, the user will have to configure the server address, where the Anvaya services are installed along with user authentication details. These details are stored in encrypted format at client end for further usage.
An Anvaya node represents an individual tool that is part of the created pipeline. Nodes can be logically connected to other nodes. Each node is associated with an input-output panel for configuring the I/O files, and with advanced panels for configuring the associated tool in detail.
To delete a node from the pipeline, right-click on the node and then click the 'Delete' option.
To create a pipeline, the node (tool) needs to be dragged from the tool list onto the design canvas. The user can drag tools from the alphabetical list or from the tool list sorted by functionality.
The user can edit a node by double-clicking it. The Properties dialog of the node will pop up. Each node has an input-output panel for configuring the inputs and outputs of the node. The parameters to be configured for each tool are available in the associated "advanced parameter" panel.
In Anvaya, the user has to start by creating a project. Each project can have only one workflow pipeline defined. The user can open or delete previously created projects.
A previously created project can be deleted using this option.
When workflows client is started, after user login, a new project needs to be created in order to start creating a pipeline.
Anvaya Client allows user to open a previously executed project by using this option.
The Project Explorer tab allows the user to view the input, output and intermediate files created during the execution of the workflow. The client needs to send a request to transfer files from the server to make them available in the Project Explorer. To transfer the files, the user can make use of the Status Tab.
EST assembly is one of the more challenging workflows in bioinformatics research. Anvaya provides the researcher with a single workflow that takes the raw trace files from sequencing machines and produces fully annotated, assembled ESTs.
The trace files from sequencing machines are processed by PHRED for base calling. PHRED produces files containing the sequences along with their corresponding quality values. The sequences are then processed by the Cross_match tool for vector masking. STT (Sequence Trimming Tool) takes these vector-masked EST sequences as input and removes the vector-masked nucleotides, polyA/T tails, polyC/G tails and adapter/linker sequences. STT also applies the same edits to the corresponding quality values. The STT-processed sequence files are then processed by Seqclean to remove contamination such as mitochondrial sequences, yielding the filtered ESTs. The same modifications can be applied to the corresponding quality values with the cln2qual tool from the Seqclean package. Host/parasite contamination can also be removed using Seqclean.
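The idea of trimming a tail while keeping the quality values in step with the sequence can be illustrated with a minimal sketch. This is not the STT implementation; the function name `trim_poly_tail` and the `min_run` threshold are assumptions made for illustration only.

```python
def trim_poly_tail(seq, qual, base="A", min_run=10):
    """Trim a trailing run of `base` (e.g. a polyA tail) from seq and qual.

    seq  : nucleotide string
    qual : list of per-base quality values, same length as seq
    Returns the trimmed (seq, qual); unchanged if the run is shorter
    than min_run (a hypothetical threshold, not an STT default).
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    end = len(seq)
    # Walk back from the 3' end while the tail base continues.
    while end > 0 and seq[end - 1].upper() == base:
        end -= 1
    if len(seq) - end < min_run:
        return seq, qual
    # Apply the same cut to both records so they stay aligned.
    return seq[:end], qual[:end]


seq = "ACGTACGT" + "A" * 12
qual = [30] * len(seq)
trimmed_seq, trimmed_qual = trim_poly_tail(seq, qual)
print(trimmed_seq, len(trimmed_qual))
```

The key point is that every cut applied to the sequence is applied at the same coordinates to the quality record, so downstream tools that read both files see records of equal length.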
The filtered EST sequences thus obtained are then annotated using BLAST. The FAT tool parses the BLAST results and writes the EST sequences with their annotations in the header lines. The "dbEST submission files Tool" takes these annotated EST sequences as input and produces dbEST submission files in the format specified by NCBI.
The filtered EST sequences are submitted for assembly to the Cap3 assembly tool. The "Unique Transcripts EST Tool" combines the output files from Cap3 (assembled ESTs, i.e. the contigs file, and unassembled ESTs, i.e. the singlets file) into a single FASTA-formatted file containing the union of contig and singlet sequences. This assembled EST dataset is then processed by BLAST, InterProScan and BLAST2GO for its functional annotation. The outputs from all these tools are parsed by FAT (Functional Annotation Tool) to produce FASTA-formatted sequences with annotation information in the respective header lines. ESTScan helps in analyzing whether the sequenced ESTs are true ESTs or not.
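The union of contigs and singlets described above amounts to concatenating two FASTA files record by record. The following is a rough sketch of that step, not the Anvaya tool itself; the function names and the in-memory example records are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unique_transcripts(contig_lines, singlet_lines):
    """Return the union of contig and singlet records as FASTA text."""
    records = list(read_fasta(contig_lines)) + list(read_fasta(singlet_lines))
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Hypothetical records standing in for the Cap3 contigs and singlets files.
contigs = [">Contig1", "ACGTACGT", "GGCC"]
singlets = [">EST42", "TTGACA"]
print(unique_transcripts(contigs, singlets), end="")
```

In practice the two line iterables would be open file handles for the contigs and singlets files.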
The Functional Annotation workflow template provides an easy way of annotating a whole proteome set. This workflow uses multiple tools to annotate the given protein sequences. BLAST is used for functional annotation based on similarity with existing protein databases such as UniProt, nr, etc. The PfamHmm tool is used to identify the functional domains present within a sequence. Using the programs TMHMM and SignalP, the user can predict subcellular localization. InterProScan is used to assign Gene Ontology terms to the given protein sequences. The output of the above-mentioned programs is processed further by FAT (Functional Annotation Tool) to provide protein sequences with annotations in the respective header lines of the FASTA sequences.
The Genome Annotation workflow is used to annotate a newly sequenced genome. This workflow can be used for both prokaryotic and eukaryotic organisms.
Glimmer/Genscan predicts the genes from a given input genome sequence. In the case of prokaryotes, Glimmer results are complemented by ribosomal binding site prediction from RBSfinder. The tRNA genes are predicted by the tRNAScan-SE program. The repeats in the genome are masked using the RepeatMasker program. All the results from the Glimmer/Genscan, RBSfinder, tRNAScan and RepeatMasker programs are combined into a standardized format, i.e. GenBank format. The user can view this GenBank-formatted result file using the Artemis visualization tool for detailed analysis.
The Compseq utility from the EMBOSS package reports the composition of dimer/trimer/etc. words in a sequence. The freak utility from the EMBOSS package provides residue/base frequency tables or plots. The restrict utility from the EMBOSS package finds restriction enzyme cleavage sites within the given genome.
The workflow helps identify consensus patterns in the upstream regions of genes that cluster together, which may also lead to the identification of novel target genes for therapeutic purposes. Gene expression data is given as input to the workflow and is expected to be pre-normalized at the user's end. The normalized data is then parsed for cluster analysis (preferably K-Means or hierarchical clustering). For the different clusters obtained, the corresponding gene IDs are obtained from the database, followed by retrieval of the upstream regions of the genes. The motif analysis tool MEME is used to identify conserved patterns/motifs. The motifs thus obtained are searched against the desired motif databases using the tool MAST.
The workflow identifies conserved patterns or motifs using DNA or protein sequences as input. The input sequences are searched against a preferred database using BLAST, and the significant hits obtained (orthologs) are parsed for the next step of sequence retrieval. Motif discovery programs like MEME, AlignACE, MDScan, Weeder and Consensus are used for pattern identification. The conserved regions obtained are then parsed and analyzed through a custom tool, the "Motif Processing Tool".
The workflow predicts orthologs given a set of two genomes using the criteria of bi-directional best hits. BLAST is used for detecting the orthologs. Initially the gene sets of two organisms are subjected to BLAST and orthologs are predicted in cases where the criteria of bi-directional best hits are satisfied. Genes are referred to as unique if they do not satisfy the criteria of bi-directional best hits. Such unique genes of a given organism are then searched against the genome of the other organism so as to cross-check if the genes have been missed due to the limitation of the gene prediction program. If significant matches are found at the genome level then it is not a unique gene and if no matches are detected then the genes are unique with respect to the other organism in consideration.
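The bi-directional best hit criterion can be sketched in a few lines. The sketch below assumes the two BLAST runs have already been reduced to `(query, subject, bit_score)` tuples; that input shape, the function names and the example scores are assumptions for illustration and do not describe the BBH tool's actual interface.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical hits: organism A searched against B, and B against A.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 90.0),
          ("a2", "b2", 150.0), ("a3", "b1", 80.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a2", 140.0), ("b2", "a1", 95.0)]

orthologs = bidirectional_best_hits(a_vs_b, b_vs_a)
# Genes of A with hits but no reciprocal best hit are candidate "unique" genes.
unique_in_a = sorted({q for q, _, _ in a_vs_b} - {a for a, _ in orthologs})
print(orthologs)    # [('a1', 'b1'), ('a2', 'b2')]
print(unique_in_a)  # ['a3']
```

Here `a3` hits `b1`, but `b1`'s best hit is `a1`, so `a3` fails the criterion and would go on to the genome-level cross-check described above.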
The workflow aims to infer functional linkages using phylogenetic profiling. The input protein sequence(s), usually from an organism whose complete genome has been sequenced, is searched against the proteome data of other organisms with completely sequenced genomes using either BLASTP or SSEARCH. The e-values obtained from the search of every protein sequence are parsed, normalised and represented as a profile vector/matrix in which the rows are individual proteins and the columns are organisms. The profiles obtained are analysed for their statistical significance using parameters like mutual information content, Hamming distance and correlation coefficient.
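The three profile comparison measures named above can be sketched as follows. This is a minimal illustration that assumes the normalised e-values have already been binarised into presence/absence (1/0) profiles of equal length; the binarisation step, the function names and the example profiles are assumptions and not part of the Anvaya workflow as documented.

```python
import math
from collections import Counter


def hamming_distance(p, q):
    """Number of organisms where the two profiles disagree."""
    return sum(x != y for x, y in zip(p, q))


def pearson(p, q):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((x - mean_p) * (y - mean_q) for x, y in zip(p, q))
    var_p = sum((x - mean_p) ** 2 for x in p)
    var_q = sum((y - mean_q) ** 2 for y in q)
    if var_p == 0 or var_q == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / math.sqrt(var_p * var_q)


def mutual_information(p, q):
    """Mutual information (in bits) between two discrete profiles."""
    n = len(p)
    count_p, count_q, count_pq = Counter(p), Counter(q), Counter(zip(p, q))
    mi = 0.0
    for (x, y), c in count_pq.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) )
        mi += (c / n) * math.log2(c * n / (count_p[x] * count_q[y]))
    return mi


# Hypothetical presence/absence profiles across six organisms.
prot_a = [1, 1, 0, 0, 1, 0]
prot_b = [1, 1, 0, 0, 1, 0]   # same pattern as prot_a
prot_c = [0, 0, 1, 1, 0, 1]   # exact complement of prot_a

print(hamming_distance(prot_a, prot_b), hamming_distance(prot_a, prot_c))
print(pearson(prot_a, prot_b), pearson(prot_a, prot_c))
print(mutual_information(prot_a, prot_b), mutual_information(prot_a, prot_c))
```

Note that mutual information is high for both the identical and the complementary profile, whereas Hamming distance and correlation distinguish the two cases; this is one reason to examine more than one measure.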
The workflow builds a phylogenetic tree of the orthologs detected for a given query sequence (nucleotide or protein) using a similarity search tool. User needs to provide only the query sequence to carry out a similarity search using BLAST against a chosen database. Parsers will be provided to read the output of BLAST and submit sequences to multiple sequence alignment tools like ClustalW, which would then pass the output to Phylip suite for reconstruction of phylogenetic tree of the orthologs.
The workflow is for identification of primer sequence. The input nucleotide sequence is subjected to tools like Primer3 for prediction of primers. The primers predicted are then subjected to search against the database of choice using BLASTN and to RNA secondary structure prediction. Outputs from BLASTN search and RNA secondary structure prediction are then combined to give an output, suggesting the most probable primer sequence pair.
The workflow helps find probable remote orthologs for a test input sequence. It also allows conserved domains to be identified among the closely related sequences of the same input.
Tool Name : AlignACE
- Aligns Nucleic Acid Conserved Elements
Version : 4.0
AlignACE (Aligns Nucleic Acid Conserved Elements) is a program which finds sequence elements conserved in a set of DNA sequences
Reference : http://atlas.med.harvard.edu/
Tool Name : Antigenic
- From EMBOSS Package
Version : 4.1.0
Finds antigenic sites in proteins
Reference : http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/antigenic.html
Tool Name : BBH
- BiDirectional Blast Hit (Anvaya Custom Tool)
Version : 1.0
It is a custom tool which performs two BLASTP programs at a time taking proteome of one of the organisms as input file and proteome of the other as database and vice-versa. It also fishes out the orthologs found using the bi-directional best hit criteria between the two organisms
Parameter |
Allowed Values |
Default Value |
Input sequence for organism1 in BLASTp (complete protein sequence) |
Fasta sequences |
None |
Database (complete protein sequences of 2nd organism against which ortholog has to be predicted) |
Fasta Sequences |
None |
Tool Name : Blast
- Basic Local Alignment Search Tool
Version : 2.2.14
The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
Parameter |
Allowed Values |
Default Value |
Program Name |
blastp, blastn, tblastn, tblastx, blastx |
blastp |
Database |
Alphanumeric |
nr |
Query File |
Alphanumeric |
stdin |
Expectation value (E) |
Real |
10.0 |
Output file |
alphanumeric |
stdout |
Filter |
T/F |
T |
Gap Opening Penalty |
blastp:7,8,9,10,11,12 |
-1 |
Gap Extension Penalty |
blastp:1,2 |
-1 |
Mismatch penalty |
-1,-2,-3,-4,-5 |
-3 |
Match score |
1,2,4 |
2 |
Number of hits to show alignments |
Any integer |
250 |
No. of processors |
integer |
1 |
Scoring matrix |
blastp:BLOSUM62, BLOSUM80, BLOSUM45, PAM30, PAM70 |
BLOSUM62 |
Word size |
blastn :7,11, 15 all others: 2,3 |
blastn:11 all others:3 |
Tool Name : BLAST2GO
- B2G4Pipe
Version : 2.2.2
Blast2GO is an ALL in ONE tool for functional annotation of (novel) sequences and the analysis of annotation data.
Reference : http://www.blast2go.org
Tool Name : Cap3
- A DNA sequence assembly program
Version : Versions as with Phred 2.2.14
The program has a capability to clip 5' and 3' low-quality regions of reads. It uses base quality values in computation of overlaps between reads, construction of multiple sequence alignments of reads, and generation of consensus sequences. The program also uses forward-reverse constraints to correct assembly errors and link contigs. Results of CAP3 on four BAC data sets are presented.
Tool Name : CGF
- Convert to Genbank Format (Anvaya Custom Tool)
Version : 1.0
This program will take the input from different nodes and will summarize the results in genbank format.
Mandatory options:
Optional input files:
Tool Name : Charge
- From EMBOSS Package
Version : 4.1.0
Charge reads a protein sequence and writes a file (or plots a graph) of the charges of the amino acids within a window of specified length as the window is moved along the sequence
Reference : http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/charge.html
Parameter |
Allowed Values |
Default Value |
Query input |
Alphanumeric |
-seqall |
Sequence associated option |
Integer |
N/A |
Sequence format |
Alphanumeric (Fasta, Embl,) |
Fasta |
Produce graph |
ps, hpgl, png, gif, x11. |
ps |
Window size |
Integer |
5 |
Graphics |
Toggle value Yes/No (png, ps) |
No |
Amino acids properties and molecular weight data file |
Data file |
Eamino.dat |
Output File |
Alphanumeric |
outfile |
Tool Name : Cln2Qual
Version : 2.2.14
Cln2Qual parses the trimming ("clear range") coordinates and trash codes from the cleaning report and applies them to the quality records.
Reference : http://jimmy.harvard.edu
Tool Name : ClustalW
- Multiple sequence alignment program
Version : 0.13
ClustalW is a general purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences, and lines them up so that the identities, similarities and differences can be seen.
Parameter |
Allowed Values |
Default Value |
Input |
Alphanumeric |
None |
Output |
Alphanumeric |
Same as input file name |
Type of sequence |
protein or dna |
Protein |
Format of output |
gcg, gde, phylip, pir, nexus, clustal |
Clustal |
Format of output tree |
nj, phylip, dist, nexus |
|
Matrix for pairwise alignment |
BLOSUM, PAM, GONNET, ID |
Gonnet |
Matrix for pairwise and multiple alignment |
IUB, CLUSTALW or filename |
IUB |
Gap opening penalty |
Float |
DNA: 15.0 Protein: 10.0 |
Gap extension penalty |
Float |
DNA:6.66 Protein:0.1 |
Tool Name : Cluster
Version : 3.0
The tool provides the most commonly used clustering methods for gene expression data analysis.
Reference : http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/
Tool Name : CompSeq
- From EMBOSS Package
Version : 4.1.0
Compseq counts composition of dimer/trimer/etc words in a sequence.
Parameter |
Allowed Values |
Default Value |
Input Sequence file |
File |
NA |
Word size |
1 =< n < 20 |
2 |
Output file name |
File |
NA |
Previously produced compseq outfile. It can be used to set the expected frequencies of words in this analysis. |
File |
NA |
Frame of word |
0 =< n < word_number |
0 |
Ignore code B and Z |
Yes/no |
Yes (Checked) |
Reverse complement |
Yes/no |
No (Unchecked) |
Calculate from Observed frequency |
Yes/no |
No (Unchecked) |
Zero count |
Yes/no |
Yes (Checked) |
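The word counting that Compseq performs can be approximated with a short sketch. This covers only the default behaviour of counting every overlapping word of a given size; it does not reproduce Compseq's frame, reverse-complement or expected-frequency options, and the function name is hypothetical.

```python
from collections import Counter


def word_composition(seq, word=2):
    """Count overlapping words of length `word` in seq.

    Returns {word: (count, observed_frequency)}, sorted by word.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + word] for i in range(len(seq) - word + 1))
    total = sum(counts.values())
    return {w: (c, c / total) for w, c in sorted(counts.items())}


result = word_composition("ACGTAC", word=2)
for w, (count, freq) in result.items():
    print(w, count, round(freq, 3))
```

For the six-base example there are five overlapping dimers (AC, CG, GT, TA, AC), so AC is counted twice and the other words once each.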
Tool Name : Consense
- From PHYLIP Package
Version : 3.67
CONSENSE reads a file of computer-readable trees and prints out (and may also write out onto a file) a consensus tree. Basically the consensus tree consists of monophyletic groups that occur as often as possible in the data. If a group occurs in more than 50% of all the input trees it will definitely appear in the consensus tree. The tree printed out has at each fork a number indicating how many times the group which consists of the species to the right of (descended from) the fork occurred.
Parameter |
Allowed Values |
Default Value |
Input file |
Alphanumeric |
Intree |
Output file |
alphanumeric |
Outfile |
Consensus method |
Majority rule (extended) |
Majority rule (extended) |
Strict |
||
Majority Rule |
||
Ml |
||
Outgroup root |
Yes/No |
No, use as Outgroup species 1 |
Rooted tree (Trees to be treated as Rooted) |
Yes/No |
No |
Terminal type |
ANSI |
ANSI |
IBM PC |
||
None |
||
Print out the data at start of run |
Yes/No |
No |
Print indications of progress of run |
Yes/No |
Yes |
Print out tree |
Yes/No |
Yes |
Write out trees onto tree file? |
Yes/No |
Yes |
Tool Name : Consensus
Version : 6c
Consensus is a pattern recognition program that can be used in identifying pattern in a set of unaligned DNA, RNA and protein sequences.
Reference : http://bifrost.wustl.edu/consensus/
Tool Name : CrossMatch
Version : 0.990319
Cross_match is a program for rapid protein and nucleic acid sequence comparison and database search.
Reference : http://jimmy.harvard.edu
Tool Name : inCap3
- Cap3 Input Customization (Anvaya Custom Tool)
Version : 1.0
This is a customized tool provided as a part of Anvaya package. This will create the input file names compatible with the CAP3 software.
Input files:
Output files:
The sequence and qual files should be at the same location. The user cannot provide the names of the output files and can only provide the output tag. The Cap3 input files are created according to the output tag.
Tool Name : Db_EST
- DB EST Submission Tool (Anvaya Custom Tool)
Version : 1.0
This program is a part of Anvaya package, which creates the submission files for dbEST from the filtered and annotated EST sequences.
Input files:
Output file (needs to be specified as a parameter):
The output file containing records (EST sequence information) in NCBI dbEST submission format
Tool Name : dnadist
- From PHYLIP Package
Version : 3.67
DnaDist uses nucleotide sequences to compute a distance matrix, under three different models of nucleotide substitution. The distance for each pair of species estimates the total branch length between the two species, and can be used in the distance matrix programs FITCH, KITSCH or NEIGHBOR. This is an alternative to use of the sequence data itself in the maximum likelihood program DNAML or the parsimony program DNAPARS.
Parameter |
Allowed Values |
Default Value |
Input file |
Alphanumeric |
Infile |
Output file |
Alphanumeric |
Outfile |
Method |
F84 |
F84 |
Kimura |
||
Jukes-Cantor |
||
LogDet |
||
Similarity Table |
||
Gamma distribution |
Yes/No |
No |
Gamma+Invariant |
||
Transition/transversion option |
Any number or ratio |
2.0 |
Number of categories |
Any number between 0- 9 |
Yes |
Weights |
Yes/No |
No |
Frequencies |
Yes/No |
No |
Output file with distance matrix in lower triangular form |
Square |
Square |
Lower triangular |
||
Multiple data sets |
Multiple data sets (type D) |
No |
Multiple weights (type W) |
||
Input Sequence |
Interleaved |
Interleaved |
Sequential |
||
Terminal Type |
IBM PC |
ANSI |
ANSI |
||
None |
||
Print out the data at start of run |
Yes/No |
No |
Print indications of progress of run |
Yes/No |
YES |
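To make the "Method" option more concrete, the Jukes-Cantor correction, the simplest of the substitution models listed above, converts the observed proportion of differing sites p into a distance d = -(3/4) ln(1 - 4p/3). The sketch below is the textbook formula only; it is not PHYLIP's implementation, and its handling of gaps and ambiguous bases (skipping them) is an assumption made for illustration.

```python
import math


def jukes_cantor_distance(seq1, seq2):
    """Jukes-Cantor distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (a simplification; real programs may treat such sites differently).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in valid and b in valid]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        return float("inf")  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTTC")
print(round(d, 4))
```

Computing this value for every pair of sequences yields the kind of distance matrix that FITCH, KITSCH or NEIGHBOR take as input.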
Tool Name : dnaml
- From PHYLIP Package
Version : 3.67
Dnaml implements the maximum likelihood method for DNA sequences. This program is fairly slow, and can be expensive to run.
Parameter |
Allowed Values |
Default Value |
Input file |
Alphanumeric |
Infile |
Output file |
Alphanumeric |
Outfile |
Search for best tree |
Yes/No |
Yes |
Transition/transversion ratio |
Any Number |
2.0 |
Empirical Base Frequencies |
Yes/No |
Yes |
Number of categories |
Any number between 1- 9 |
Yes |
Hidden Markov Model rates |
Constant rate |
Constant rate |
Gamma distributed rates |
||
Gamma+Invariant sites |
||
user-defined HMM of rates |
||
Weight |
Yes/No |
No |
Speedier but rougher analysis |
Yes/No |
Yes |
Global Arrangement |
Yes/No |
No |
Randomize input order of sequences? |
Yes/No |
No. Use input order |
Outgroup root? |
Yes/No |
No, use as outgroup species 1 |
Analyze multiple Data sets |
Multiple data sets (type D) |
No |
Multiple weights (type W) |
||
Input sequence |
Sequential |
Interleaved |
Interleaved |
||
Terminal Type |
IBM PC |
ANSI |
ANSI |
||
None |
||
Print out the data at start of run |
Yes/No |
No |
Print indications of progress of run |
Yes/No |
Yes |
Print out tree? |
Yes/No |
Yes |
Write out trees onto tree file? |
Yes/No |
Yes |
Reconstruct hypothetical sequences? |
Yes/No |
No |
Use lengths from user trees? |
Yes/No |
No |
Rates at adjacent sites correlated? |
Yes/No |
No, they are independent |
Tool Name : dnapars
- From PHYLIP Package
Version : 3.67
DnaPars carries out unrooted parsimony (analogous to Wagner trees) on DNA sequences. The method of Fitch is used to count the number of changes of base needed on a given tree. Other than that, the algorithm is a direct modification of program WAGNER (an ancestor of MIX which was formerly in this package).
Parameter |
Allowed Values |
Default Value |
Input file |
Alphanumeric |
Infile |
Output file |
Alphanumeric |
Outfile |
Search for best tree? |
No, use user trees in input file |
Yes |
Yes |
||
Search option? |
More thorough search |
More thorough search |
Less thorough |
||
Number of trees to save? |
Any number |
10000 |
Randomize input order of sequences? |
No/Yes |
No. Use input order |
Outgroup root? |
No/Yes |
No, use as outgroup species 1 |
Use Threshold parsimony? |
No/Yes |
No, use ordinary parsimony |
Use Transversion parsimony? |
No/ Yes, count only transversions |
No, count all steps |
Sites weighted? |
No/Yes |
No |
Analyze multiple data sets? |
Multiple data sets (type D) |
No |
Multiple weights (type W) |
||
Input sequences interleaved? |
Yes/No, sequential |
Yes |
Terminal type |
IBM PC/ANSI/ none |
ANSI |
Print out the data at start of run |
No/Yes |
No |
Print indications of progress of run |
Yes/No |
No |
Print out tree |
Yes/No |
Yes |
Print out steps in each site |
Yes/No |
No |
Print sequences at all nodes of tree |
Yes/No |
No |
Write out trees onto tree file? |
Yes/No |
Yes |
Dot-differencing to display |
Yes/No |
No |
Tool Name : ePrimer3
- From EMBOSS Package
Version : 4.1.0
Picks PCR primers and hybridization oligos
Reference : http://emboss.sourceforge.net/apps/release/5.0/emboss/apps/eprimer3.html
Tool Name : ESLPred2
Version : NA
"ESLpred2" is an improved version of our previous most popular method, ESLpred , which can predict four major localizations (cytoplasmic, mitochondrial, nuclear and extracellular) with an accuracy of 88%.
Reference : http://www.imtech.res.in/raghava/eslpred2/
Advanced Parameters
Parameter |
Allowed Values |
Default Value |
Query input |
Alphanumeric |
Stdin/Fasta format |
Organism group |
-A , -F, -P, -G |
Generalized |
Method for prediction |
1 (amino acid composition), 2(PSSM composition), 3 (hybrid AAC,PSSM, PSI-BLAST) |
3 |
Output file |
Alphanumeric |
stdout |
Tool Name : ESTScan
Version : 2.0b
ESTScan is a program that can detect coding regions in DNA sequences, even if they are of low quality. ESTScan will also detect and correct sequencing errors that lead to frameshifts.
Reference : http://www.isrec.isb-sib.ch/ftp-server/ESTScan/
Tool Name : FASTA
Version : 3.4
Provides sequence similarity searching against protein databases using the FASTA and SSEARCH programs. SSEARCH does a rigorous Smith-Waterman search for similarity between a query sequence and a database. GGSEARCH compares a protein or DNA sequence to a sequence database producing global-global alignment (Needleman-Wunsch). GLSEARCH compares a protein or DNA sequence to a sequence database. FASTA can be very specific when identifying long regions of low similarity especially for highly diverged sequences.
Reference : http://www.ebi.ac.uk/Tools/fasta/index.html
Tool Name : FAT
- Functional Annotation Tool (Anvaya Custom Tool)
Version : 1.0
It takes the outputs of different programs and parses them to give the fasta-formatted file having annotation details in the header location of each sequence.
Mandatory options:
Optional input files:
Tool Name : fitch
- From PHYLIP Package
Version : 3.67
Fitch carries out Fitch-Margoliash, Least Squares, and a number of similar methods as described in the documentation file for distance methods.
Parameter |
Allowed Values |
Default Value |
Input file |
Alphanumeric |
Infile |
Output file |
alphanumeric |
Outfile |
Method |
Fitch-Margoliash |
Fitch-Margoliash |
Minimum Evolution |
||
Search for best tree |
Yes |
Yes |
No, use user trees in input file |
||
Power |
Any Number |
2.0 |
Negative branch lengths allowed? |
Yes/No |
No |
Lower-triangular data matrix? |
Yes/No |
No |
Upper-triangular data matrix? |
Yes/No |
No |
Subreplicates |
Yes/No |
No |
Global rearrangements? |
Yes/No |
No |
Randomize input order of species? |
Yes/No |
No. Use input order |
Analyze multiple data sets? |
Yes/No |
No |
Terminal type |
IBM PC |
ANSI |
ANSI |
||
None |
||
Print out the data at start of run |
Yes/No |
No |
Print indications of progress of run |
Yes/No |
Yes |
Print out tree |
Yes/No |
Yes |
Write out trees onto tree file? |
Yes/No |
Yes |
Use lengths from user trees |
Yes/No |
Yes |
Tool Name : Freak
- From EMBOSS Package
Version : 4.1.0
Freak takes one or more sequences as input and a set of bases or residues to search for. It then calculates the frequency of these bases/residues in a window as it moves along the sequence. The frequency is output to a data file or (optionally) plotted.
Parameter |
Allowed Values |
Default Value |
Sequence file |
File |
NA |
Residue letters |
Any string |
“gc” |
Output file |
File |
NA |
Stepping value |
Any integer value |
1 |
Averaging window |
Any integer value |
30 |
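The windowed frequency calculation can be sketched as below, using the defaults from the table above (letters "gc", step 1, window 30) as function defaults. This is an illustration of the idea rather than a reimplementation of Freak; the function name is hypothetical and the reported window coordinates may differ from what the EMBOSS program writes.

```python
def windowed_frequency(seq, letters="gc", window=30, step=1):
    """Return (start, fraction) per window.

    fraction is the share of residues in the window that belong to `letters`;
    start is the 0-based offset of the window in seq.
    """
    seq = seq.lower()
    wanted = set(letters.lower())
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        results.append((start, sum(c in wanted for c in chunk) / window))
    return results


profile = windowed_frequency("GGCCAATT", letters="gc", window=4, step=2)
print(profile)  # [(0, 1.0), (2, 0.5), (4, 0.0)]
```

In the example the GC fraction falls from 1.0 to 0.0 as the window slides from the GC-rich start of the sequence to the AT-rich end.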
Tool Name : genscan
Version : NA
GenScan is a tool to identify complete gene structures in genomic DNA. It is a GHMM-based gene finder for human sequences.
Reference : http://genes.mit.edu/GENSCAN.html
Display Name |
Allowed Values |
Default Value |
verbose output (extra explanatory info) |
NA |
NA |
Print predicted coding sequences (nucleic acid) |
NA |
NA |
Tool Name : Glimmer
- Gene Locator and Interpolated Markov ModelER
Version : 3.02
Glimmer is a system for finding genes in microbial DNA, especially the genomes of bacteria, archaea, and viruses. Glimmer (Gene Locator and Interpolated Markov ModelER) uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA
Tool Name : Hamming
- Hamming Distance (Anvaya Custom Tool)
Version : 1.0
This is a customized tool provided as a part of the Anvaya package.
Tool Name : HQ_EST
- (Anvaya Custom Tool)
Version : 1.0
This program is a part of Anvaya package.
Tool Name : InterProScan
Version : 4.4
InterPro is an integrated database of predictive protein "signatures" used for the classification and automatic annotation of proteins and genomes. InterPro classifies sequences at superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and important sites. InterPro adds in-depth annotation, including GO terms, to the protein signatures.
Reference : http://www.ebi.ac.uk/interpro
Tool Name : kitsch
- From PHYLIP Package
Version : 3.67
Kitsch carries out the Fitch-Margoliash and Least Squares methods, plus a variety of others of the same family, with the assumption that all tip species are contemporaneous, and that there is an evolutionary clock (in effect, a molecular clock). This means that branches of the tree cannot be of arbitrary length, but are constrained so that the total length from the root of the tree to any species is the same. The quantity minimized is the same weighted sum of squares described in the Distance Matrix Methods documentation file.
Parameter and Display Name | Allowed Values | Default Value |
Input file | Alphanumeric | Infile |
Output file | Alphanumeric | Outfile |
Method | Fitch-Margoliash; Minimum Evolution | Fitch-Margoliash |
Search for best tree | Yes/No | Yes |
Power | Any Number | 2.0 |
Negative branch lengths allowed? | Yes/No | No |
Lower-triangular data matrix? | Yes/No | No |
Upper-triangular data matrix? | Yes/No | No |
Subreplicates | Yes/No | No |
Randomize input order of species? | Yes/No | No. Use input order |
Analyze multiple data sets? | Yes/No | No |
Terminal type | IBM PC; ANSI; None | ANSI |
Print out the data at start of run | Yes/No | No |
Print indications of progress of run | Yes/No | Yes |
Print out tree | Yes/No | Yes |
Write out trees onto tree file? | Yes/No | Yes |
Tool Name : MAST
- Motif Alignment and Search Tool
Version : 4.1.0
MAST is a tool for searching biological sequence databases for sequences that contain one or more of a group of known motifs.
Reference : http://meme.sdsc.edu/meme/mast-intro.html
Tool Name : MDScan
Version : 2004
MDScan is a fast and accurate motif finding algorithm with applications to chromatin immunoprecipitation microarray experiments.
Reference : http://robotics.stanford.edu/~xsliu/MDscan/
Tool Name : MEME
Version : 4.1.0
MEME is a motif finding tool for DNA as well as protein sequences.
Reference : http://meme.nbcr.net
Tool Name : MPAO
- Map Potential Antigenic Output (Anvaya Custom Tool)
Version : 1.0
Parses the outputs of different antigen prediction tools and combines them into a single output file.
Tool Name : MPT
- Motif Processing Tool (Anvaya Custom Tool)
Version : 1.0
MPT is a custom tool that takes the outputs of different motif prediction tools and converts them into an easy-to-interpret, text-based visual format for comparison of results. It requires the output of at least two tools. By default it takes the first five motifs predicted by each tool; if a tool predicts fewer than five motifs, all of its predicted motifs are taken into consideration.
Tool Name : Mutual Info
Version : 0.64
Calculates mutual information (MI) values from a table of continuous and discontinuous variables. It is used in the detection of functional linkages by phylogenetic profiling.
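As an illustration of the quantity involved, the sketch below computes the mutual information (in bits) between two discrete profiles, such as presence/absence profiles of two genes across organisms. It is not the tool's own code: the function name `mutual_information` is hypothetical, and only the discrete case is covered (continuous variables would first need to be binned).

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information, in bits, between two discrete profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) )
        mi += p_ab * math.log2(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi


# Presence (1) / absence (0) of two genes across six organisms.
gene_a = [1, 1, 0, 0, 1, 0]
gene_b = [1, 1, 0, 0, 1, 0]
print(mutual_information(gene_a, gene_b))
```

Profiles that always co-occur score high, while statistically independent profiles score zero, which is why the measure is useful for proposing functional linkages.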
Tool Name : neighbor
- From PHYLIP Package
Version : 3.67
Neighbor implements the Neighbor-Joining method. It constructs a tree by successive clustering of lineages, setting branch lengths as the lineages join. The tree is not rearranged thereafter. The tree does not assume an evolutionary clock, so that it is in effect an unrooted tree.
Parameter | Allowed Values | Default Value |
Input file | Alphanumeric | Infile |
Output file | Alphanumeric | Outfile |
Method | Neighbor-joining; UPGMA tree | Neighbor-joining |
Lower-triangular data matrix? | Yes/No | No |
Upper-triangular data matrix? | Yes/No | No |
Subreplicates | Yes/No | No |
Randomize input order of species? | Yes/No | No. Use input order |
Analyze multiple data sets? | Yes/No | No |
Terminal type | IBM PC; ANSI; None | ANSI |
Print out the data at start of run | Yes/No | No |
Print indications of progress of run | Yes/No | Yes |
Print out tree | Yes/No | Yes |
Write out trees onto tree file? | Yes/No | Yes |
Outgroup root | Yes/No | No, use as outgroup species 1 |
Tool Name : NETCTL
Version : 1.2
NetCTL 1.2 predicts CTL epitopes in protein sequences.
Reference : http://www.cbs.dtu.dk/services/NetCTL/
Tool Name : OFB
- Ortholog From Blast (Anvaya Custom Tool)
Version : 1.0
OFB is a custom tool that reports the final orthologs, if found, from two TBLASTN outputs, given the desired identity and query coverage, as well as the unique genes found in each of the two genomes. It takes three input files: the orthologsQuery.out file from BBH and the two TBLASTN outputs from the previous nodes. Given the desired identity value and percentage query coverage, it looks for genuine orthologs that were missed previously and appends them to the file orthologsQuery.out. Genes that do not satisfy the given identity and percent query coverage criteria are reported as unique genes.
Tool Name : OCP
- Orthologous Cluster of conserved Protein (Anvaya Custom Tool)
Version : 1.0
The Orthologous Cluster tool uses the .aln file produced by ClustalW and extracts the conserved regions from the given set of orthologous sequences.
Tool Name : PFAM
Version : Pfam_scan 0.7, Pfam 23.0
The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).
Reference : http://pfam.sanger.ac.uk/
Tool Name : Phred
- Base calling Program
Version : 2.2.14
Phred is a base-calling program for DNA sequence traces. Phred reads DNA sequence chromatogram files and analyzes the peaks to call bases, assigning quality scores ("Phred scores") to each base call.
Tool Name : PredPrimer
- Predict Primer (Anvaya Custom Tool)
Version : 1.0
PredPrimer is a custom tool that summarizes the primers predicted by Primer3 after a similarity search against a desired database using BLASTN and a secondary structure prediction using RNAfold. It takes two input files: the BLASTN output file and the RNAfold output. It reports a summary of at least two hits reported by BLASTN for each query, together with the corresponding secondary structure prediction at a given temperature.
Tool Name : Prof_Vec
- Create Profile Vector (Anvaya Custom Tool)
Version : 1.0
Reads in the results of independent Smith-Waterman database searches (one per organism) and creates a matrix containing normalized E-values.
The SSEARCH output is parsed in terms of % query overlap (default: 40%). The E-value is normalized using the formula: normalized E-value = -1/log E.
The following assumptions are made when constructing the matrix:
5 --> refers to a query for which no significant hits were found.
1 --> refers to eVal >= 1
0 --> refers to eVal = 0
2 --> refers to eVal >0 and <1 BUT NOT satisfying conditions.
normaliseEval --> refers to eVal > 0 and < 1 satisfying conditions.
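The coding rules above can be sketched as follows. This is not the tool's own code. The manual does not state the base of the logarithm or the exact form of the "conditions", so the sketch assumes a base-10 logarithm and treats the condition as "query overlap is at least the 40% default". The names `profile_value` and `profile_vector` are hypothetical.

```python
import math

# Codes listed in the Prof_Vec description.
NO_HIT = 5.0          # no significant hit found for the query
EVALUE_GE_ONE = 1.0   # E-value >= 1
EVALUE_ZERO = 0.0     # E-value == 0
FAILED_FILTER = 2.0   # 0 < E-value < 1, but the hit fails the conditions


def profile_value(evalue, overlap_pct, min_overlap=40.0):
    """Map one search hit to its matrix entry (assumptions noted above)."""
    if evalue >= 1:
        return EVALUE_GE_ONE
    if evalue == 0:
        return EVALUE_ZERO
    if overlap_pct < min_overlap:      # assumed meaning of "conditions"
        return FAILED_FILTER
    return -1.0 / math.log10(evalue)   # assumed base-10 logarithm


def profile_vector(hits):
    """Build one matrix row from per-organism hits.

    Each element of hits is an (evalue, overlap_pct) tuple, or None when
    the search against that organism returned no significant hit.
    """
    return [NO_HIT if hit is None else profile_value(*hit) for hit in hits]


# One query searched against five organisms.
row = profile_vector([(1e-10, 85.0), None, (0.0, 90.0), (0.5, 10.0), (3.0, 70.0)])
print(row)
```

With a base-10 logarithm, an E-value of 1e-10 maps to roughly 0.1, so stronger hits give values closer to zero while the fixed codes 1, 2 and 5 mark the special cases.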
Tool Name : protml
- From PHYLIP Package
Version : 3.67
PROTML is a PASCAL program for inferring evolutionary trees from protein (amino acid) sequences by using maximum likelihood. A maximum likelihood method for inferring trees from DNA or RNA sequences was developed by Felsenstein (1981). The method does not impose any constraint on the constancy of evolutionary rate among lineages.
Parameter | Allowed Values | Default Value |
Input file | Alphanumeric | Infile |
Output file | Alphanumeric | Outfile |
Search for best tree | Yes/No | Yes |
Models of amino acid change | Jones-Taylor-Thornton; Henikoff/Tillier PMB; Dayhoff PAM | Jones-Taylor-Thornton |
Number of categories | Any number between 1 and 9 | Yes |
Hidden Markov Model rates | Constant rate; Gamma distributed rates; Gamma+Invariant sites; User-defined HMM of rates | Constant rate |
Weight | Yes/No | No |
Speedier but rougher analysis | Yes/No | Yes |
Global Arrangement | Yes/No | No |
Randomize input order of sequences? | Yes/No | No. Use input order |
Outgroup root? | Yes/No | No, use as outgroup species 1 |
Analyze multiple data sets | Multiple data sets (type D); Multiple weights (type W) | No |
Input sequence | Sequential; Interleaved | Interleaved |
Terminal Type | IBM PC; ANSI; None | ANSI |
Print out the data at start of run | Yes/No | No |
Print indications of progress of run | Yes/No | Yes |
Print out tree? | Yes/No | Yes |
Write out trees onto tree file? | Yes/No | Yes |
Reconstruct hypothetical sequences? | Yes/No | No |
Use lengths from user trees? | Yes/No | No |
Rates at adjacent sites correlated? | Yes/No | No, they are independent |
Tool Name : protdist
- From PHYLIP Package
Version : 3.67
ProtDist uses protein sequences to compute a distance matrix, under three different models of amino acid replacement. The distance for each pair of species estimates the total branch length between the two species, and can be used in the distance matrix programs FITCH, KITSCH or NEIGHBOR. This is an alternative to use of the sequence data itself in the parsimony program PROTPARS.
Parameter | Allowed Values | Default Value |
Input file | Alphanumeric | Infile |
Output file | Alphanumeric | Outfile |
Method | JTT; Henikoff/Tillier PMB matrix; Dayhoff PAM matrix; Kimura formula; Similarity Table; Categories model | Jones-Taylor-Thornton matrix |
Gamma distribution | Yes/No; Gamma+Invariant | No |
Number of categories | Any number between 0 and 9 | Yes |
Weights | Yes/No | No |
Multiple data sets | Multiple weights (type W); Multiple data sets (type D) | No |
Input Sequence | Sequential; Interleaved | Interleaved |
Terminal Type | ANSI; IBM PC; None | ANSI |
Print out the data at start of run | Yes/No | No |
Print indications of progress of run | Yes/No | Yes |
Genetic codes | Universal; Mitochondrial; Vertebrate mitochondrial; Fly mitochondrial; Yeast mitochondrial | Universal |
Categories of Amino acids | Chemical; George/Hunt/Barker; Hall | George/Hunt/Barker |
Ease of changing category of amino acid | Any number below 1.0. Can’t be negative. | 0.4570 |
Transition/transversion option | Any Number or ratio | 2.0 |
Base Frequencies | Equal; Any number but the Frequencies must sum to 1. | Equal |
Tool Name : protpars
- From PHYLIP Package
Version : 3.67
ProtPars infers an unrooted phylogeny from protein sequences, using a new method intermediate between the approaches of Eck and Dayhoff (1966) and Fitch (1971). Eck and Dayhoff (1966) allowed any amino acid to change to any other, and counted the number of such changes needed to evolve the protein sequences on each given phylogeny. This has the problem that it allows replacements which are not consistent with the genetic code, counting them equally with replacements that are consistent. Fitch, on the other hand, counted the minimum number of nucleotide substitutions that would be needed to achieve the given protein sequences. This counts silent changes equally with those that change the amino acid.
Parameter | Allowed Values | Default Value |
Input file | Alphanumeric | Infile |
Output file | Alphanumeric | Outfile |
Search for best tree? | No, use user trees in input file; Yes | Yes |
Randomize input order of sequences | No/Yes | No. Use input order |
Outgroup root | No/Yes | No, use as outgroup species 1 |
Use Threshold parsimony | No/Yes | No, use ordinary parsimony |
Genetic code | Universal; Mitochondria; Vertebrate mitochondrial; Fly mitochondrial; Yeast mitochondrial | Universal |
Use Transversion parsimony | No/ Yes, count only transversions | No, count all steps |
Sites weighted | No/Yes | No |
Analyze multiple data sets | Multiple data sets (type D); Multiple weights (type W) | No |
Input sequences interleaved | Yes/No, sequential | Yes |
Terminal type | IBM PC; ANSI; None | ANSI |
Print out the data at start of run | No/Yes | No |
Print indications of progress of run | Yes/No | No |
Print out tree | Yes/No | Yes |
Print out steps in each site | Yes/No | No |
Print sequences at all nodes of tree | Yes/No | No |
Write out trees onto tree file? | Yes/No | Yes |
Dot-differencing to display | Yes/No | No |
Tool Name : psiBLAST
Version : 2.0
Position specific iterative BLAST (PSI-BLAST) refers to a feature of BLAST 2.0 in which a profile (or position specific scoring matrix, PSSM) is constructed (automatically) from a multiple alignment of the highest scoring hits in an initial BLAST search. The PSSM is generated by calculating position-specific scores for each position in the alignment. Highly conserved positions receive high scores and weakly conserved positions receive scores near zero. The profile is used to perform a second (etc.) BLAST search and the results of each "iteration" used to refine the profile. This iterative searching strategy results in increased sensitivity.
Reference : http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html
Tool Name : RBSFinder
Version : Not Available
RBSfinder searches for regions in the vicinity of the gene start where the ribosome might bind. Based on its findings, RBSfinder might propose a different gene start. In most cases the use of RBSfinder increases the accuracy of gene start prediction.
Parameter | Allowed Values |
Genome sequence file in FASTA format | File name (Input File) |
Glimmer output file | File name (Input File) |
Output file | File name |
Length of upstream region of gene, where RBS will be searched (in bp) | <=300 |
Consensus sequence | String of a, t, g, c (any length) |
File containing position to relocate or check for RBS site | File name (Input File) |
Tool Name : RepeatMasker
Version : 3.0
RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns).
Parameter | Allowed Values | Default Value |
Alternate search engine | Cross_match, WuBlast, Decypher | Cross_match |
Parallel version | No. of processors available | NA |
Slow search | NA | Unchecked |
Quick search | NA | Unchecked |
Rush job | NA | Unchecked |
Don’t mask low complexity regions / simple repeats | NA | Unchecked |
Only masks low complex/simple repeats | NA | Unchecked |
Do not mask small RNA or pseudo genes | NA | Unchecked |
Mask Alus, 7SLRNA, SVA and LTR5 | NA | Unchecked |
Maximum % divergence from consensus sequence | 0 to 100 | NA |
Use custom library | File name (Input File) | NA |
Cutoff score for masking repeats | > 0 | 225 |
Specify species of input sequence | String | NA |
Only clip E.coli insertion elements | NA | Unchecked |
Clip IS element before analysis | NA | Unchecked |
Skip bacterial insertion element check | NA | Unchecked |
Check for rodent specific repeats | NA | Unchecked |
Checks for primate specific repeats | NA | Unchecked |
Use matrices calculated for x% background GC level | 1 to 100 | NA |
Calculate GC content | NA | Unchecked |
Max. sequence length masked without fragmenting | Any integer | If –e = DeCypher then default: 300000; else default: 40000 |
Skip the steps in which repeats are excised | NA | Unchecked |
Write alignment in .align output file | NA | Unchecked |
Present alignment in the orientation of repeats | NA | Unchecked |
Outputs ambiguous DNA transposon in lower case | NA | Unchecked |
Mask Sequence in lower case | NA | Unchecked |
Returns repetitive regions in lowercase | NA | Unchecked |
Repetitive region masked with X’s | NA | Unchecked |
Reports simple repeats that may be polymorphic | NA | NA |
Annotation with the HSP evidence | NA | Unchecked |
Create an additional output in xhtml format | NA | Unchecked |
Output in ACeDB format | NA | Unchecked |
Gene Feature Finding format output | NA | Unchecked |
Annotation output file not processed by ProcessRepeats | NA | Unchecked |
Output file in cross_match format | NA | Unchecked |
Create annotation file with fixed column width | NA | Unchecked |
Does not write final column with unique ID for each element | NA | Unchecked |
Tool Name : Restrict
- From EMBOSS Package
Version : 4.1.0
Restrict finds restriction enzyme cleavage sites. It uses the REBASE database of restriction enzymes to predict cut sites in a DNA sequence. The program lets you select a range of cuts, whether the DNA is circular, whether IUB ambiguity codes are used, and whether blunt ends, sticky ends or both are reported.
Parameter | Allowed Values | Default Value |
Input | Alphanumeric | Stdin |
-Enzymes | Alphabetic | all |
Minimum site length | Integer | 4 |
Output | Alphanumeric | Stdin |
Minimum cuts for an enzyme | Integer | 1 |
Maximum cuts for an enzyme | Integer | 2000000000 |
Fragment lengths | Boolean Yes/No | No |
Solo Fragment lengths of each enzyme | Boolean Yes/No | No |
Only one fragment per enzyme | Boolean Yes/No | No |
Allow blunt cut | Boolean Yes/No | Yes |
Allow Sticky ends | Boolean Yes/No | Yes |
Allow ambiguity | Boolean Yes/No | Yes |
Span Plasmid/end sequence | Boolean Yes/No | No |
Commercial enzymes | Boolean Yes/No | No |
Alternate RE datafile | Alphanumeric (Input file) | Stdin |
Isoschizomers limit | Boolean Yes/No | Yes |
Sort out Alphabetically | Boolean Yes/No | No |
Tool Name : ROT
- Remote Ortholog Tool (Anvaya Custom Tool)
Version : 1.0
The output of PSI-BLAST is used as input for the Remote Ortholog Tool (ROT). ROT extracts the hits with 30% to 60% identity from each round of PSI-BLAST. These are probable remote ortholog sequences and can be saved in FASTA format for further analysis.
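The identity window can be illustrated with a short sketch. It is not ROT's own code: the hit records, their field names (`round`, `subject`, `identity`) and the function `select_remote_orthologs` are hypothetical, and the sketch assumes that both bounds are inclusive and that the PSI-BLAST output has already been parsed.

```python
def select_remote_orthologs(hits, low=30.0, high=60.0):
    """Keep hits whose percent identity lies within [low, high]."""
    return [hit for hit in hits if low <= hit["identity"] <= high]


# Hypothetical, already-parsed PSI-BLAST hits from two iterations.
hits = [
    {"round": 1, "subject": "seqA", "identity": 82.5},
    {"round": 1, "subject": "seqB", "identity": 45.0},
    {"round": 2, "subject": "seqC", "identity": 31.2},
    {"round": 2, "subject": "seqD", "identity": 22.0},
]

for hit in select_remote_orthologs(hits):
    print(hit["round"], hit["subject"], hit["identity"])
```

Hits above the window are close homologs that an ordinary BLAST search would already find, while hits below it are too weak to be trusted, so only the middle band is kept as candidate remote orthologs.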
Tool Name : SeqBoot
- From PHYLIP Package
Version : 3.67
SEQBOOT is a general bootstrapping tool. It is intended to allow you to generate multiple data sets that are resampled versions of the input data set. Since almost all programs in the package can analyze these multiple data sets, this allows almost anything in this package to be bootstrapped, jackknifed, or permuted. SEQBOOT can handle molecular sequences, binary characters, restriction sites, or gene frequencies.
Parameter |
Allowed Values |
Default Value |
|
Input file |
Alphanumeric |
Infile |
|
Output file |
alphanumeric |
Outfile |
|
Method |
Molecular Sequence |
Molecular Sequence |
|
Discrete Morphological Characters |
|||
Restriction Sites |
|||
Gene Frequencies |
|||
Bootstrap |
Bootstrap |
||
Delete half Jackknife |
|||
Permute |
Permute species for each character |
||
Permute character order |
|||
Permute within species |
|||
Rewrite data |
|||
Sampling fraction |
Regular |
Regular |
|
Altered |
|||
Block Bootstrap |
Size of Block: Any Number |
1 |
|
Number of replicate datasets |
Any Number |
100 |
|
Weight |
Read weights of characters: Yes/No |
No |
|
Categories |
Read categories of sites: Yes/No |
No |
|
Data sets |
Data sets |
||
Just weights |
|||
Input Sequence |
Interleaved |
Interleaved |
|
Sequential |
|||
Terminal Type |
IBM PC |
||
ANSI |
|||
None |
|||
Print out the data at start of run
|
Yes/No |
No |
|
Print indications of progress of run |
Yes/No |
No |
|
Number of enzymes |
Present in input file |
Present in input file |
|
Not present in input file |
|||
All alleles present at each locus |
No, one absent at each locus |
No, one absent at each locus |
|
Yes |
|||
Factors |
Yes/No |
No |
|
Ancestors |
Yes/No |
No |
|
Mixture of methods |
Yes/No |
No |
|
Dot-differencing to display |
Yes/No |
No |
|
Output format |
PHYLIP |
PHYLIP |
|
NEXUS |
|||
XML |
|||
Type of molecular sequences |
DNA RNA Protein |
DNA |
Tool Name : seqClean
Version : 2.2.14
A script for automated trimming and validation of ESTs or other DNA sequences by screening for various contaminants, low quality and low-complexity sequences.
Reference : http://jimmy.harvard.edu
Tool Name : SignalP
Version : 3.0
SignalP predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal peptide/non-signal peptide prediction based on a combination of several artificial neural networks and hidden Markov models.
Reference : http://www.cbs.dtu.dk/services/SignalP/
Tool Name : SRT
- Sequence Retrieval Tool (Anvaya Custom Tool)
Version : 1.0
Retrieves the FASTA sequences of the entries satisfying the specified criteria from the corresponding table of a database stored in an RDBMS such as MySQL. Storing standard databases such as UniProt and nr as MySQL tables speeds up retrieval, which otherwise requires extracting sequences from a text file of up to 3 GB and takes a lot of time.
Tool Name : STT
- Sequence Trimming Tool (Anvaya Custom Tool)
Version : 1.0
The Sequence Trimming Tool removes the vector-masked regions from the EST sequences. It also removes poly A/T tails, poly C/G tails, adaptor and linker sequences.
Input files:
Output files: (need to be specified as a parameter)
Tool Name : TargetP
Version : v1.1b
TargetP predicts the subcellular location of eukaryotic proteins. The location assignment is based on the predicted presence of any of the N-terminal presequences: chloroplast transit peptide (cTP), mitochondrial targeting peptide (mTP) or secretory pathway signal peptide (SP).
Reference : http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?targetp
Advanced Parameters
Parameter | Allowed Values | Default Value |
Query input | Alphanumeric | Stdin/Fasta format |
Organism group | N/A | Non-plant |
Include cleavage site prediction | N/A | N/A |
Cutoff for predicting cTP | Float 0.0-1.0 | 0.0 |
Cutoff for predicting mTP | Float 0.0-1.0 | 0.0 |
Cutoff for SP | Float 0.0-1.0 | 0.0 |
Cutoff for other | Float 0.0-1.0 | 0.0 |
Output file | Alphanumeric | stdout |
Tool Name : TMHMM
- Prediction of transmembrane helices in proteins
Version : 2.0.c
TMHMM predicts transmembrane helices and the location of the intervening loop regions.
If the whole sequence is labeled as inside or outside, the prediction is that it contains no membrane helices. It is probably not wise to interpret it as a prediction of location. The prediction gives the most probable location and orientation of transmembrane helices in the sequence.
Reference : http://www.cbs.dtu.dk/cgi-bin/nph-sw_request?tmhmm
Parameter | Command line mapping | Allowed Values |
Input file | NA | Alphanumeric |
Model | -v |
Tool Name : tRNAScan
Version : 1.21
tRNA detection in large-scale genome sequence.
tRNAscan-SE detects ~99% of eukaryotic nuclear or prokaryotic tRNA genes, with a false positive rate of less than one per 15 gigabases, and with a search speed of about 30 kb/second.
Reference : http://selab.janelia.org/software.html
Tool Name : UT_EST
- Unique Transcripts EST Tool (Anvaya Custom Tool)
Version : 1.0
This is a customized tool provided as a part of the Anvaya package. It produces a concatenated file containing all the contigs and singlets.
Input files:
The output file name needs to be specified as an input parameter.
Tool Name : vRNAFold
Version : NA
Predicts secondary structures of single-stranded RNA or DNA sequences.
Tool Name : Weeder
Version : 1.3
Weeder is a program for finding novel motifs (transcription factor binding sites) conserved in a set of regulatory regions of related genes.
Reference : http://159.149.109.9/modtools/
A Workflow-Pipeline is a logical connection of commonly used Bioinformatics tools, which are run either in serial mode or in parallel mode, to achieve a scientific target.
The pipeline is created by dragging appropriate tools from the available tool list, connecting them logically on the design canvas, and then setting the appropriate input-output files and advanced parameters.
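The serial and parallel modes can be pictured as a dependency graph in which tools without unfinished predecessors are free to run at the same time. The sketch below illustrates that idea only and is not Anvaya's scheduler. The function `execution_stages` and the example pipeline are hypothetical, and it relies on the `graphlib` module from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def execution_stages(dependencies):
    """Group tools into stages; tools in one stage can run in parallel.

    dependencies maps each tool to the set of tools whose output it needs.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the connections form a loop
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every tool whose inputs are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical pipeline: seqboot feeds protdist, whose distance matrix is
# then used by neighbor and kitsch independently of each other.
pipeline = {
    "protdist": {"seqboot"},
    "neighbor": {"protdist"},
    "kitsch": {"protdist"},
}

for number, stage in enumerate(execution_stages(pipeline), start=1):
    print(number, stage)
```

In this example the first two stages hold a single tool each (serial mode), while the last stage holds both tree-building tools, which do not depend on each other and could run in parallel.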
A workflow is created by dragging associated tools from the tool list onto the design canvas. The tools are then connected in logical order.
When the user clicks "Run Workflow", the input and other configuration files must be transferred before the workflow is actually executed. Once the file transfer is complete, the user can start executing the workflow.
The status of the workflow can be viewed in tabular format on the status tab. The status is also depicted pictorially on the design canvas.
The workflow created on design canvas must be saved before execution.
Terminology | Description |
Pipeline | Logical connection of tools |
Rules Engine | A component of Anvaya Client that controls the connection of tools. The rules engine defines which tools can be logically connected. It is pre-defined and should not be modified by the end user. |
Custom Tool | Tools that are custom made for Anvaya. They add value to the pre-defined workflows available in Anvaya. |
SubLayer | If a workflow spans a large area of the design canvas, sub-parts of the workflow can be grouped together in a sub-layer, which can then be collapsed or expanded by the user. |