Algorithm

Data Set

Blood samples were collected from a total of 1037 unrelated animals belonging to twenty-two different Indian goat breeds. The breeds were selected from diverse geographical regions and climatic conditions, with varying utilities and body sizes. Genomic DNA was isolated from the blood samples using the SDS–proteinase K method. The quality and quantity of the extracted DNA were assessed on a NanoDrop 1000 (Thermo Scientific, USA) before further use. The data set comprises 55,000 allelic records from microsatellite-marker-based DNA fingerprinting of the 22 goat breeds. The list of breeds with their accession numbers is given in Table 1. These data cover 25 loci, viz. ILST008, ILSTS059, ETH225, ILSTS044, ILSTS002, OarFCB304, OarFCB48, OarHH64, OarJMP29, ILSTS005, ILSTS019, OMHC1, ILSTS087, ILSTS30, ILSTS34, ILSTS033, ILSTS049, ILSTS065, ILSTS058, ILSTS029, RM088, ILSTS022, OarAE129, ILSTS082 and RM4 (Table 2). Out of these 25 loci, 9 loci, viz. ETH225, OarFCB304, ILSTS065, ILSTS058, ILSTS029, RM088, OarAE129, ILSTS082 and RM4, were selected using feature selection (variable screening), and the system was trained on the data for these 9 loci to obtain the best model for breed identification.
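The variable-screening step can be pictured as ranking each locus by an informativeness score and retaining the best nine. The sketch below is illustrative only: STATISTICA's actual screening criterion is not described here, and the `scores` mapping (locus name to score) is a hypothetical input, not data from this study.

```python
def select_top_loci(scores, k=9):
    """Keep the k loci with the highest screening scores.

    scores: dict mapping locus name -> informativeness score
    (hypothetical values; higher means the locus discriminates
    better between breeds).
    """
    # Sort locus names by their score, best first, and keep the top k
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]
```

For example, `select_top_loci({"ETH225": 0.9, "RM4": 0.7, "ILSTS005": 0.2}, k=2)` would keep the two highest-scoring loci.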

Table 1. List of Goat Breeds with Accession Number
 
S.No.  Breed            Home Tract                    Accession Number
1      Attappady Black  Kerala                        INDIA_GOAT_0900_ATTAPADYBLACK_06001
2      Barbari          Uttar Pradesh and Rajasthan   INDIA_GOAT_2017_BARBARI_06002
3      Beetal           Punjab                        INDIA_GOAT_1600_BEETAL_06003
4      Black Bengal     West Bengal                   INDIA_GOAT_2100_BLACKBENGAL_06004
5      Changthangi      Jammu & Kashmir               INDIA_GOAT_0700_CHANGTHANGI_06005
6      Chegu            Himachal Pradesh              INDIA_GOAT_0600_CHEGU_06006
7      Gaddi            Himachal Pradesh              INDIA_GOAT_0600_GADDI_06007
8      Ganjam           Orissa                        INDIA_GOAT_1500_GANJAM_06008
9      Gohilwadi        Gujarat                       INDIA_GOAT_0400_GOHILWADI_06009
10     Jakhrana         Rajasthan                     INDIA_GOAT_1700_JAKHRANA_06010
11     Jharkhand Black  Jharkhand                     Applied for registration
12     Jamunapari       Uttar Pradesh                 INDIA_GOAT_2000_JAMUNAPARI_06011
13     Kanniadu         Tamil Nadu                    INDIA_GOAT_1800_KANNIADU_06012
14     Kutchi           Gujarat                       INDIA_GOAT_0400_KUTCHI_06013
15     Malabari         Kerala                        INDIA_GOAT_0900_MALABARI_06014
16     Marwari          Rajasthan                     INDIA_GOAT_1700_MARWARI_06015
17     Mehsana          Gujarat                       INDIA_GOAT_0400_MEHSANA_06016
18     Osmanabadi       Maharashtra                   INDIA_GOAT_1100_OSMANABADI_06017
19     Sangamneri       Maharashtra                   INDIA_GOAT_1100_SANGAMNERI_06018
20     Sirohi           Rajasthan and Gujarat         INDIA_GOAT_1704_SIROHI_06019
21     Surti            Gujarat                       INDIA_GOAT_0400_SURTI_06020
22     Zalawadi         Gujarat                       INDIA_GOAT_0400_ZALAWADI_06021
Source: National Bureau of Animal Genetic Resources, Karnal

 

Table 2. List of 25 loci along with the primer pairs

Locus      Forward Primer                   Reverse Primer                     Dye  Size Range (bp)
ILST008    gaatcatggattttctgggg             tagcagtgagtgaggttggc               FAM  167-195
ILSTS059   gctgaacaatgtgatatgttcagg         gggacaatactgtcttagatgctgc          FAM  105-135
ETH225     gatcaccttgccactatttcct           acatgacagccaagctgctact             VIC  146-160
ILSTS044   agtcacccaaaagtaactgg             acatgttgtattccaagtgc               NED  145-177
ILSTS002   tctatacacatgtgctgtgc             cttaggggtgtattccaagtgc             VIC  113-135
OarFCB48   gagttagtacaaggatgacaagaggcac     gactctagaggatcgcaaagaaccag         VIC  149-181
OarFCB304  ccctaggagctttcaataaagaatcgg      cgctgctgtcaactgggtcaggg            FAM  119-169
OarHH64    cgttccctcactatggaaagttatatatgc   cactctattgtaagaatttgaatgagagc      PET  120-138
OarJMP29   gtatacacgtggacaccgctttgtac       gaagtggcaagattcagaggggaag          NED  120-140
ILSTS005   ggaagcaatgaaatctatagcc           tgttctgtgagtttgtaagc               VIC  174-190
ILSTS019   aagggacctcatgtagaagc             acttttggaccctgtagtgc               FAM  142-162
ILSTS34    aagggtctaagtccactggc             gacctggtttagcagagagc               VIC  153-185
ILSTS033   tattagagtggctcagtgcc             atgcagacagttttagaggg               PET  151-187
ILSTS049   caattttcttgtctctcccc             gctgaatcttgtcaaacagg               NED  160-184
ILSTS065   gctgcaaagagttgaacacc             aactattacaggaggctccc               PET  105-135
ILSTS058   gccttactaccatttccagc             catcctgactttggctgtgg               PET  136-188
ILSTS029   tgttttgatggaacacagcc             tggatttagaccagggttgg               PET  148-191
RM088      gatcctcttctgggaaaaagagac         cctgttgaagtgaaccttcagaa            FAM  109-147
ILSTS022   agtctgaaggcctgagaacc             cttacagtccttggggttgc               PET  186-202
OarAE129   aatccagtgtgtgaaagactaatccag      gtagatcaagatatagaatatttttcaacacc   FAM  130-175
ILSTS082   ttcgttcctcatagtgctgg             agaggattacaccaatcacc               PET  100-136
RM4        cagcaaaatatcagcaaacct            ccacctgggaaggccttta                NED  104-127

MACHINE LEARNING TECHNIQUE

STATISTICA is an analytics software package developed by StatSoft. It provides data analysis, data management, data mining, and data visualization procedures. STATISTICA product categories include Enterprise (for use across a site or organization), Web-Based (for use with a server and web browser), Concurrent Network Desktop, and Single-User Desktop. The classification model was developed using the STATISTICA Automated Neural Networks (SANN) package; the code generated by SANN was saved in C++ and subsequently integrated into ASP.NET. The resulting server can therefore be used for goat breed identification without installing STATISTICA on one's own system.

ARTIFICIAL NEURAL NETWORKS (ANNs)

Artificial Neural Networks (ANNs) are non-linear mapping structures based on the function of the human brain. They are computational structures inspired by the observed processes in natural networks of biological neurons in the brain, and they can identify and learn correlated patterns between input data sets and corresponding target values. ANNs imitate the learning process of the human brain and can handle problems involving non-linear and complex data. They are powerful tools for classification and modelling, especially when the underlying data relationship is unknown. An ANN consists of simple, highly interconnected computational units called neurons. A very important feature of these networks is their adaptive nature: “learning by example” is employed in solving problems, where ‘learning’ can be understood as developing or evolving, while ‘example’ means the cases or objects presented to the network. ANNs are now increasingly recognized in the areas of classification and prediction, where regression models and other related statistical techniques have traditionally been employed. The neural network model can be claimed to be ‘data-driven’, i.e. free from the stringent model assumptions prevalent in other statistical models, and ANNs have been used for a wide variety of applications where statistical methods are traditionally employed.


They have a remarkable ability to derive and extract meaning, rules, and trends from complicated, noisy, and imprecise data. They can be used to extract patterns and detect trends that are governed by complicated mathematical functions that are too difficult, if not impossible, to model using analytic or parametric techniques. One of the abilities of neural networks is to accurately predict data that were not part of the training data set, a process known as generalization. Given these characteristics and their broad applicability, neural networks are suitable for applications of real world problems in research and science, business, and industry.

Multilayer perceptron (MLP): The most popular neural network architecture is the multilayer perceptron (MLP), a generalization of the single-layer perceptron. An MLP is a feedforward neural network architecture with unidirectional full connections between successive layers. Typically, the MLP network consists of a set of source nodes constituting the input layer (the independent variables in statistical terminology), one or more hidden layers of computation nodes, and an output layer of computation nodes (the dependent or response variables). The input signal propagates through the network in a forward direction, layer by layer. A multilayer perceptron has three distinctive characteristics:

  • The model of each neuron in the network includes a nonlinear activation function that is differentiable everywhere.

  • The network contains one or more layers of hidden neurons that are not part of the input or output of the network.

  • The network exhibits a high degree of connectivity determined by the synapses of the network. A change in the connectivity of the network requires a change in the population of synaptic connections or their weights.
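These three characteristics can be illustrated with a minimal forward pass through a single-hidden-layer MLP. This is a sketch only, with arbitrary toy weights, not the trained SANN model; the logistic function stands in for any everywhere-differentiable nonlinear activation.

```python
import math

def logistic(x):
    # Nonlinear activation, differentiable everywhere (first characteristic)
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: input layer -> hidden layer -> output layer.

    w_hidden[j][i] is the weight from input i to hidden unit j;
    w_out[k][j] is the weight from hidden unit j to output unit k.
    """
    # Hidden layer: neurons that are part of neither the input nor the
    # output of the network (second characteristic)
    hidden = [logistic(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Output layer: fully connected to the hidden layer, so the synaptic
    # weights determine the network's connectivity (third characteristic)
    return [logistic(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(w_out, b_out)]
```

Calling `mlp_forward([1.0, 0.5], [[0.2, -0.4], [0.7, 0.1]], [0.0, -0.1], [[1.0, -1.0]], [0.2])` propagates a two-element input through two hidden units to a single output in the unit interval.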

Radial basis function (RBF): This is another type of neural network architecture, perhaps the most popular after the MLP. In many ways RBF networks are similar to MLP networks. They too have unidirectional feedforward connections, and every neuron is fully connected to the units in the next layer; the neurons are arranged in a layered feedforward topology. RBF networks almost invariably consist of three layers: a transparent input layer, a hidden layer with a sufficiently large number of nodes, and an output layer. As the name implies, a radially symmetric basis function is used as the activation function of the hidden nodes. The transformation from the input nodes to the hidden nodes is non-linear, and training of this portion of the network is generally accomplished in an unsupervised fashion. The training of the network parameters between the hidden and output layers occurs in a supervised fashion based on the target outputs.
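A minimal forward pass for such a network might look as follows. This is an illustrative sketch only: the Gaussian basis, the shared width `sigma`, and the toy weights are assumptions for the example, not the configuration used in SANN.

```python
import math

def rbf_forward(x, centres, sigma, w_out, b_out):
    """Forward pass through a three-layer RBF network with one output."""
    # Hidden layer: radially symmetric (Gaussian) basis functions; each
    # activation depends only on the distance of x from that unit's centre.
    # The centres are typically fixed in an unsupervised fashion,
    # e.g. by clustering the training inputs.
    hidden = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                       / (2.0 * sigma ** 2))
              for c in centres]
    # Output layer: linear combination whose weights are trained in a
    # supervised fashion against the target outputs
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

An input lying exactly on a centre drives that hidden unit to its maximum activation of 1, so the output reduces to that unit's output weight plus the bias.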

Error Function: The error function is used to evaluate the performance of a neural network during training. It measures how close the network predictions are to the targets and, hence, how much weight adjustment the training algorithm should apply in each iteration. Thus, the error function serves as the eyes and ears of the training algorithm, indicating how well the network performs in its current state of training (and, hence, how much its weights should be adjusted). One common approach is the sum-of-squares error function; in this case, the network learns a discriminant function. The sum-of-squares error is simply the sum of the squared differences between the target and predicted outputs, taken over the entire training set. Thus:

E_sos = Σ_i Σ_k (y_ik − t_ik)²

where the outer sum runs over the training cases i, the inner sum over the output units k, y_ik is the network prediction for case i at output k, and t_ik is the corresponding target. This error function corresponds to the assumption that the target variables are drawn from a normal (Gaussian) distribution.
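In code, the sum-of-squares error over a training set is a direct transcription of the definition above, summing the squared differences over every training case and every output unit:

```python
def sum_of_squares_error(targets, predictions):
    """Sum of squared differences over all training cases and output units.

    targets, predictions: lists of rows, one row per training case,
    each row holding one value per output unit.
    """
    return sum((y - t) ** 2
               for t_row, y_row in zip(targets, predictions)
               for t, y in zip(t_row, y_row))
```

For example, a single case with targets `[1.0, 0.0]` and predictions `[0.5, 0.5]` contributes 0.25 + 0.25 = 0.5 to the error.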

ASSESSMENT OF THE PREDICTION ACCURACY

Computational models that are valid, relevant, and properly assessed for accuracy can be used for planning complementary laboratory experiments. The prediction quality was examined by testing the model, obtained after training the system, on a test data set. Several measures are available for the statistical estimation of the accuracy of prediction models. The common statistical measures are Sensitivity, Specificity, Precision or Positive Predictive Value (PPV), Negative Predictive Value (NPV), Accuracy and the Matthews correlation coefficient (MCC).

The Sensitivity indicates the ‘‘quantity’’ of predictions, i.e., the proportion of real positives correctly predicted. The Specificity indicates the ‘‘quality’’ of predictions, i.e., the proportion of real negatives correctly predicted. The PPV indicates the proportion of true positives among predicted positives (the “success rate”), while the NPV is the proportion of true negatives among predicted negatives. These measures are defined as follows:

Sensitivity = [TP/(TP + FN)] * 100
Specificity = [TN/(TN + FP)] * 100
PPV = [TP/(TP + FP)] * 100
NPV = [TN/(TN + FN)] * 100

Where TP and TN are the numbers of correctly predicted positive and negative cases, respectively (here, animals correctly assigned to a given breed and animals correctly identified as not belonging to it), while FP and FN are the numbers of wrongly predicted positive and negative cases, respectively.
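All of these measures, together with Accuracy and the MCC (computed here from their standard definitions, which the text does not spell out), follow directly from the four counts. A small sketch:

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard accuracy measures from the four confusion-matrix counts."""
    sensitivity = 100.0 * tp / (tp + fn)  # proportion of real positives correctly predicted
    specificity = 100.0 * tn / (tn + fp)  # proportion of real negatives correctly predicted
    ppv = 100.0 * tp / (tp + fp)          # success rate among predicted positives
    npv = 100.0 * tn / (tn + fn)          # success rate among predicted negatives
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    # Matthews correlation coefficient: +1 = perfect, 0 = random, -1 = inverse
    mcc = ((tp * tn - fp * fn)
           / ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy, "mcc": mcc}
```

For instance, a perfect test run (no false positives or negatives) yields 100% for every percentage measure and an MCC of 1.0.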