Algorithm

Data Set

Blood samples were collected from a total of 1037 unrelated animals belonging to twenty-two different Indian goat breeds. The breeds were selected from diverse geographical regions and climatic conditions, with varying utilities and body sizes. Genomic DNA was isolated from the blood samples using the SDS–proteinase K method. The quality and quantity of the extracted DNA were assessed on a NanoDrop 1000 (Thermo Scientific, USA) before further use. The data set comprises 55,000 allelic records from microsatellite-marker-based DNA fingerprinting of the 22 goat breeds. The list of breeds with their accession numbers is given in Table 1. These data cover 25 loci, viz. ILST008, ILSTS059, ETH225, ILSTS044, ILSTS002, OarFCB304, OarFCB48, OarHH64, OarJMP29, ILSTS005, ILSTS019, OMHC1, ILSTS087, ILSTS30, ILSTS34, ILSTS033, ILSTS049, ILSTS065, ILSTS058, ILSTS029, RM088, ILSTS022, OarAE129, ILSTS082 and RM4 (Table 2). Out of these 25 loci, 9 loci, viz. ETH225, OarFCB304, ILSTS065, ILSTS058, ILSTS029, RM088, OarAE129, ILSTS082 and RM4, were selected using feature selection (variable screening), and the system was trained on the data for these 9 loci to obtain the best model for breed identification.
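The variable-screening step can be pictured as ranking each locus by an informativeness score and retaining the best nine. The sketch below is illustrative only: STATISTICA's actual screening criterion is not described here, and the `scores` mapping (locus name to score) is a hypothetical input, not data from this study.

```python
def select_top_loci(scores, k=9):
    """Keep the k loci with the highest screening scores.

    scores: dict mapping locus name -> informativeness score
    (hypothetical values; higher means the locus discriminates
    better between breeds).
    """
    # Sort locus names by their score, best first, and keep the top k
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]
```

For example, `select_top_loci({"ETH225": 0.9, "RM4": 0.7, "ILSTS005": 0.2}, k=2)` would keep the two highest-scoring loci.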

Table 1. List of Goat Breeds with Accession Number
 
S.No.  Breed            Home Tract                    Accession Number
1      Attappady Black  Kerala                        INDIA_GOAT_0900_ATTAPADYBLACK_06001
2      Barbari          Uttar Pradesh and Rajasthan   INDIA_GOAT_2017_BARBARI_06002
3      Beetal           Punjab                        INDIA_GOAT_1600_BEETAL_06003
4      Black Bengal     West Bengal                   INDIA_GOAT_2100_BLACKBENGAL_06004
5      Changthangi      Jammu & Kashmir               INDIA_GOAT_0700_CHANGTHANGI_06005
6      Chegu            Himachal Pradesh              INDIA_GOAT_0600_CHEGU_06006
7      Gaddi            Himachal Pradesh              INDIA_GOAT_0600_GADDI_06007
8      Ganjam           Orissa                        INDIA_GOAT_1500_GANJAM_06008
9      Gohilwadi        Gujarat                       INDIA_GOAT_0400_GOHILWADI_06009
10     Jakhrana         Rajasthan                     INDIA_GOAT_1700_JAKHRANA_06010
11     Jharkhand Black  Jharkhand                     Applied for registration
12     Jamunapari       Uttar Pradesh                 INDIA_GOAT_2000_JAMUNAPARI_06011
13     Kanniadu         Tamil Nadu                    INDIA_GOAT_1800_KANNIADU_06012
14     Kutchi           Gujarat                       INDIA_GOAT_0400_KUTCHI_06013
15     Malabari         Kerala                        INDIA_GOAT_0900_MALABARI_06014
16     Marwari          Rajasthan                     INDIA_GOAT_1700_MARWARI_06015
17     Mehsana          Gujarat                       INDIA_GOAT_0400_MEHSANA_06016
18     Osmanabadi       Maharashtra                   INDIA_GOAT_1100_OSMANABADI_06017
19     Sangamneri       Maharashtra                   INDIA_GOAT_1100_SANGAMNERI_06018
20     Sirohi           Rajasthan and Gujarat         INDIA_GOAT_1704_SIROHI_06019
21     Surti            Gujarat                       INDIA_GOAT_0400_SURTI_06020
22     Zalawadi         Gujarat                       INDIA_GOAT_0400_ZALAWADI_06021
Source: National Bureau of Animal Genetic Resources, Karnal

 

Table 2. List of 25 loci along with the primer pairs

Locus      Forward Primer                   Reverse Primer                     Dye  Size Range (bp)
ILST008    gaatcatggattttctgggg             tagcagtgagtgaggttggc               FAM  167-195
ILSTS059   gctgaacaatgtgatatgttcagg         gggacaatactgtcttagatgctgc          FAM  105-135
ETH225     gatcaccttgccactatttcct           acatgacagccaagctgctact             VIC  146-160
ILSTS044   agtcacccaaaagtaactgg             acatgttgtattccaagtgc               NED  145-177
ILSTS002   tctatacacatgtgctgtgc             cttaggggtgtattccaagtgc             VIC  113-135
OarFCB48   gagttagtacaaggatgacaagaggcac     gactctagaggatcgcaaagaaccag         VIC  149-181
OarFCB304  ccctaggagctttcaataaagaatcgg      cgctgctgtcaactgggtcaggg            FAM  119-169
OarHH64    cgttccctcactatggaaagttatatatgc   cactctattgtaagaatttgaatgagagc      PET  120-138
OarJMP29   gtatacacgtggacaccgctttgtac       gaagtggcaagattcagaggggaag          NED  120-140
ILSTS005   ggaagcaatgaaatctatagcc           tgttctgtgagtttgtaagc               VIC  174-190
ILSTS019   aagggacctcatgtagaagc             acttttggaccctgtagtgc               FAM  142-162
ILSTS34    aagggtctaagtccactggc             gacctggtttagcagagagc               VIC  153-185
ILSTS033   tattagagtggctcagtgcc             atgcagacagttttagaggg               PET  151-187
ILSTS049   caattttcttgtctctcccc             gctgaatcttgtcaaacagg               NED  160-184
ILSTS065   gctgcaaagagttgaacacc             aactattacaggaggctccc               PET  105-135
ILSTS058   gccttactaccatttccagc             catcctgactttggctgtgg               PET  136-188
ILSTS029   tgttttgatggaacacagcc             tggatttagaccagggttgg               PET  148-191
RM088      gatcctcttctgggaaaaagagac         cctgttgaagtgaaccttcagaa            FAM  109-147
ILSTS022   agtctgaaggcctgagaacc             cttacagtccttggggttgc               PET  186-202
OarAE129   aatccagtgtgtgaaagactaatccag      gtagatcaagatatagaatatttttcaacacc   FAM  130-175
ILSTS082   ttcgttcctcatagtgctgg             agaggattacaccaatcacc               PET  100-136
RM4        cagcaaaatatcagcaaacct            ccacctgggaaggccttta                NED  104-127

MACHINE LEARNING TECHNIQUE

STATISTICA is an analytics software package developed by StatSoft. It provides data analysis, data management, data mining, and data visualization procedures. STATISTICA product categories include Enterprise (for use across a site or organization), Web-Based (for use with a server and web browser), Concurrent Network Desktop, and Single-User Desktop. The classification model was developed using the STATISTICA Automated Neural Networks (SANN) package; the code generated by SANN was saved in C++ and subsequently integrated into ASP.NET. The resulting server can therefore be used for goat breed identification without installing STATISTICA on one's own system.

ARTIFICIAL NEURAL NETWORKS (ANNs)

Artificial Neural Networks (ANNs) are non-linear mapping structures based on the function of the human brain. They are computational structures inspired by the observed processes in natural networks of biological neurons in the brain, and they can identify and learn correlated patterns between input data sets and corresponding target values. ANNs imitate the learning process of the human brain and can handle problems involving non-linear and complex data. They are powerful tools for classification and modelling, especially when the underlying data relationship is unknown. An ANN consists of simple, highly interconnected computational units called neurons. A very important feature of these networks is their adaptive nature: “learning by example” is employed in solving problems, where ‘learning’ can be understood as developing or evolving, while ‘example’ means the cases or objects presented to the network. ANNs are now increasingly recognized in the areas of classification and prediction, where regression models and other related statistical techniques have traditionally been employed. The neural network model can be claimed to be ‘data-driven’, i.e. free from the stringent model assumptions prevalent in other statistical models, and ANNs have been used for a wide variety of applications where statistical methods are traditionally employed.


They have a remarkable ability to derive and extract meaning, rules, and trends from complicated, noisy, and imprecise data. They can be used to extract patterns and detect trends that are governed by complicated mathematical functions that are too difficult, if not impossible, to model using analytic or parametric techniques. One of the abilities of neural networks is to accurately predict data that were not part of the training data set, a process known as generalization. Given these characteristics and their broad applicability, neural networks are suitable for applications of real world problems in research and science, business, and industry.

Multilayer perceptron (MLP): The most popular neural network architecture is the multilayer perceptron (MLP), a generalization of the single-layer perceptron. An MLP is a feedforward neural network architecture with unidirectional full connections between successive layers. Typically, the MLP network consists of a set of source nodes constituting the input layer (the independent variables in statistical terminology), one or more hidden layers of computation nodes, and an output layer of computation nodes (the dependent or response variables). The input signal propagates through the network in a forward direction, layer by layer. A multilayer perceptron has three distinctive characteristics:

  • The model of each neuron in the network includes a nonlinear activation function that is differentiable everywhere.

  • The network contains one or more layers of hidden neurons that are not part of the input or output of the network.

  • The network exhibits a high degree of connectivity determined by the synapses of the network. A change in the connectivity of the network requires a change in the population of synaptic connections or their weights.
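These three characteristics can be illustrated with a minimal forward pass through a single-hidden-layer MLP. This is a sketch only, with arbitrary toy weights, not the trained SANN model; the logistic function stands in for any everywhere-differentiable nonlinear activation.

```python
import math

def logistic(x):
    # Nonlinear activation, differentiable everywhere (first characteristic)
    return 1.0 / (1.0 + math.exp(-x))

def mlp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: input layer -> hidden layer -> output layer.

    w_hidden[j][i] is the weight from input i to hidden unit j;
    w_out[k][j] is the weight from hidden unit j to output unit k.
    """
    # Hidden layer: neurons that are part of neither the input nor the
    # output of the network (second characteristic)
    hidden = [logistic(sum(w * x for w, x in zip(row, inputs)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # Output layer: fully connected to the hidden layer, so the synaptic
    # weights determine the network's connectivity (third characteristic)
    return [logistic(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(w_out, b_out)]
```

Calling `mlp_forward([1.0, 0.5], [[0.2, -0.4], [0.7, 0.1]], [0.0, -0.1], [[1.0, -1.0]], [0.2])` propagates a two-element input through two hidden units to a single output in the unit interval.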

Radial basis function (RBF): This is another type of neural network architecture, perhaps the most popular after the MLP. In many ways RBF networks are similar to MLP networks. They too have unidirectional feedforward connections, and every neuron is fully connected to the units in the next layer; the neurons are arranged in a layered feedforward topology. RBF networks almost invariably consist of three layers: a transparent input layer, a hidden layer with a sufficiently large number of nodes, and an output layer. As the name implies, a radially symmetric basis function is used as the activation function of the hidden nodes. The transformation from the input nodes to the hidden nodes is non-linear, and training of this portion of the network is generally accomplished in an unsupervised fashion. The training of the network parameters between the hidden and output layers occurs in a supervised fashion based on the target outputs.
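A minimal forward pass for such a network might look as follows. This is an illustrative sketch only: the Gaussian basis, the shared width `sigma`, and the toy weights are assumptions for the example, not the configuration used in SANN.

```python
import math

def rbf_forward(x, centres, sigma, w_out, b_out):
    """Forward pass through a three-layer RBF network with one output."""
    # Hidden layer: radially symmetric (Gaussian) basis functions; each
    # activation depends only on the distance of x from that unit's centre.
    # The centres are typically fixed in an unsupervised fashion,
    # e.g. by clustering the training inputs.
    hidden = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c))
                       / (2.0 * sigma ** 2))
              for c in centres]
    # Output layer: linear combination whose weights are trained in a
    # supervised fashion against the target outputs
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

An input lying exactly on a centre drives that hidden unit to its maximum activation of 1, so the output reduces to that unit's output weight plus the bias.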

Error Function: The error function is used to evaluate the performance of a neural network during training. It measures how close the network predictions are to the targets and, hence, how much weight adjustment the training algorithm should apply in each iteration. Thus, the error function serves as the eyes and ears of the training algorithm, indicating how well the network performs in its current state of training (and, hence, how much its weights should be adjusted). One common approach is the sum-of-squares error function; in this case, the network learns a discriminant function. The sum-of-squares error is simply the sum of the squared differences between the target and predicted outputs, taken over the entire training set. Thus:

E_sos = Σ_i Σ_k (y_ik − t_ik)²

where the outer sum runs over the training cases i, the inner sum over the output units k, y_ik is the network prediction for case i at output k, and t_ik is the corresponding target. This error function corresponds to the assumption that the target variables are drawn from a normal (Gaussian) distribution.
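In code, the sum-of-squares error over a training set is a direct transcription of the definition above, summing the squared differences over every training case and every output unit:

```python
def sum_of_squares_error(targets, predictions):
    """Sum of squared differences over all training cases and output units.

    targets, predictions: lists of rows, one row per training case,
    each row holding one value per output unit.
    """
    return sum((y - t) ** 2
               for t_row, y_row in zip(targets, predictions)
               for t, y in zip(t_row, y_row))
```

For example, a single case with targets `[1.0, 0.0]` and predictions `[0.5, 0.5]` contributes 0.25 + 0.25 = 0.5 to the error.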

ASSESSMENT OF THE PREDICTION ACCURACY

Computational models that are valid, relevant, and properly assessed for accuracy can be used for planning complementary laboratory experiments. The prediction quality was examined by testing the model, obtained after training the system, on a test data set. Several measures are available for the statistical estimation of the accuracy of prediction models. The common statistical measures are Sensitivity, Specificity, Precision or Positive Predictive Value (PPV), Negative Predictive Value (NPV), Accuracy and the Matthews correlation coefficient (MCC).

The Sensitivity indicates the ‘‘quantity’’ of predictions, i.e., the proportion of real positives correctly predicted. The Specificity indicates the ‘‘quality’’ of predictions, i.e., the proportion of real negatives correctly predicted. The PPV indicates the proportion of true positives among predicted positives (the “success rate”), while the NPV is the proportion of true negatives among predicted negatives. These measures are defined as follows:

Sensitivity = [TP/(TP + FN)] * 100
Specificity = [TN/(TN + FP)] * 100
PPV = [TP/(TP + FP)] * 100
NPV = [TN/(TN + FN)] * 100

Where TP and TN are the numbers of correctly predicted positive and negative cases, respectively (here, animals correctly assigned to a given breed and animals correctly identified as not belonging to it), while FP and FN are the numbers of wrongly predicted positive and negative cases, respectively.
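All of these measures, together with Accuracy and the MCC (computed here from their standard definitions, which the text does not spell out), follow directly from the four counts. A small sketch:

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard accuracy measures from the four confusion-matrix counts."""
    sensitivity = 100.0 * tp / (tp + fn)  # proportion of real positives correctly predicted
    specificity = 100.0 * tn / (tn + fp)  # proportion of real negatives correctly predicted
    ppv = 100.0 * tp / (tp + fp)          # success rate among predicted positives
    npv = 100.0 * tn / (tn + fn)          # success rate among predicted negatives
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    # Matthews correlation coefficient: +1 = perfect, 0 = random, -1 = inverse
    mcc = ((tp * tn - fp * fn)
           / ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy, "mcc": mcc}
```

For instance, a perfect test run (no false positives or negatives) yields 100% for every percentage measure and an MCC of 1.0.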