Algorithm
Data Set

Blood samples were collected from a total of 1037 unrelated animals belonging to twenty-two different Indian goat breeds. The breeds were selected from diverse geographical regions and climatic conditions, with varying utilities and body sizes. Genomic DNA was isolated from the blood samples using the SDS-Proteinase K method, and the quality and quantity of the extracted DNA were assessed on a NanoDrop 1000 (Thermo Scientific, USA) before further use. The data set comprises 55,000 allelic records from microsatellite-marker-based DNA fingerprinting of the 22 goat breeds. The list of breeds with their accession numbers is given in Table 1. These data cover 25 loci, viz. ILST008, ILSTS059, ETH225, ILSTS044, ILSTS002, OarFCB304, OarFCB48, OarHH64, OarJMP29, ILSTS005, ILSTS019, OMHC1, ILSTS087, ILSTS30, ILSTS34, ILSTS033, ILSTS049, ILSTS065, ILSTS058, ILSTS029, RM088, ILSTS022, OarAE129, ILSTS082 and RM4 (Table 2). Out of these 25 loci, 9 loci, viz. ETH225, OarFCB304, ILSTS065, ILSTS058, ILSTS029, RM088, OarAE129, ILSTS082 and RM4, were selected using feature selection (variable screening), and the system was trained on the data for those 9 loci to obtain the best model for breed identification.

Table 1. List of Goat Breeds with Accession Number
Table 2. List of 25 loci along with the primer pairs
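The paper does not state which screening statistic was used to reduce the 25 loci to 9. As an illustrative sketch only, loci could be ranked by a one-way ANOVA F-ratio (between-breed versus within-breed variance) computed per locus, keeping the top k; the function names `f_ratio` and `screen_loci` and the F-ratio criterion itself are assumptions, not the paper's method:

```python
import numpy as np

def f_ratio(values, labels):
    """One-way ANOVA F-ratio for one locus: between-group vs. within-group variance."""
    classes = np.unique(labels)
    grand_mean = values.mean()
    n, k = len(values), len(classes)
    ss_between = sum(
        len(values[labels == c]) * (values[labels == c].mean() - grand_mean) ** 2
        for c in classes
    )
    ss_within = sum(
        ((values[labels == c] - values[labels == c].mean()) ** 2).sum()
        for c in classes
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def screen_loci(X, y, k=9):
    """Rank loci (columns of X) by F-ratio and return the indices of the top k."""
    scores = np.array([f_ratio(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]
```

A locus whose allele distribution differs strongly between breeds scores a high F-ratio and survives the screen; uninformative loci score near 1 and are dropped.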
MACHINE LEARNING TECHNIQUE

STATISTICA is an analytics software package developed by StatSoft. It provides data analysis, data management, data mining and data visualization procedures. STATISTICA product categories include Enterprise (for use across a site or organization), Web-Based (for use with a server and web browser), Concurrent Network Desktop and Single-User Desktop. The classification model was developed using the STATISTICA Automated Neural Networks (SANN) package; the generated code was saved in C++ and subsequently integrated into an ASP.NET application, so that the server can be used for goat breed identification without installing STATISTICA on the user's own system.
ARTIFICIAL NEURAL NETWORKS (ANNs)
Artificial Neural Networks (ANNs) are non-linear mapping structures based on the function of the human brain. They are computational structures inspired by the observed processes in natural networks of biological neurons, and they can identify and learn correlated patterns between input data sets and corresponding target values. ANNs imitate the learning process of the human brain and can handle problems involving non-linear and complex data. They are powerful tools for classification and modelling, especially when the underlying data relationship is unknown. An ANN consists of simple, highly interconnected computational units called neurons. A very important feature of these networks is their adaptive nature: "learning by example" is employed in solving problems, where "learning" can be understood as developing or evolving, and "example" means the cases or objects presented to the network. ANNs are now increasingly recognized in the areas of classification and prediction, where regression models and other related statistical techniques have traditionally been employed. The neural network model can be described as "data-driven", i.e. free from the stringent model assumptions prevalent in other statistical models.
Multilayer perceptron (MLP): The most popular form of neural network architecture is the multilayer perceptron (MLP), a generalization of the single-layer perceptron.
An MLP is a feedforward neural network architecture with unidirectional full connections between successive layers.
Typically, the MLP network consists of a set of source nodes constituting the input layer (known as independent variables in the statistical literature), one or more hidden layers of computation nodes, and an output layer of computation nodes (known as dependent or response variables).
The input signal propagates through the network in a forward direction, on a layer-by-layer basis.
A multilayer perceptron has three distinctive characteristics:
1. The model of each neuron in the network includes a non-linear, differentiable activation function.
2. The network contains one or more layers of hidden neurons that are part of neither the input nor the output of the network.
3. The network exhibits a high degree of connectivity, determined by its synaptic weights.
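A minimal forward pass through such a network can be sketched as follows. This is an illustration only, not the SANN-generated model: the sigmoid hidden activation, the single hidden layer of 5 nodes, and the encoding of the 9 selected loci as one numeric input each are all simplifying assumptions; only the 9 inputs and 22 breed classes come from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the row maximum for numerical stability
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer MLP: input -> sigmoid hidden layer -> softmax output.

    x : (n_samples, n_inputs) array of locus features
    Returns (n_samples, n_classes) breed probabilities, each row summing to 1.
    """
    h = sigmoid(x @ W1 + b1)      # hidden-layer activations
    return softmax(h @ W2 + b2)   # class (breed) probabilities
```

The softmax output layer is what makes the network's outputs interpretable as breed-membership probabilities.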
Radial basis function (RBF): This is another type of neural network architecture, perhaps the most popular type after the MLP. In many ways RBF networks are similar to MLP networks. They too have unidirectional feedforward connections, and every neuron is fully connected to the units in the next layer, with the neurons arranged in a layered feedforward topology. RBF networks almost invariably consist of three layers: a transparent input layer, a hidden layer with a sufficiently large number of nodes, and an output layer. As the name implies, a radially symmetric basis function is used as the activation function of the hidden nodes. The transformation from the input nodes to the hidden nodes is non-linear, and training of this portion of the network is generally accomplished in an unsupervised fashion.
The training of the network parameters between the hidden and output layers occurs in a supervised fashion based on target outputs.
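The two-stage RBF training just described can be sketched as follows. This is a generic illustration, not the paper's implementation: Gaussian basis functions, pre-chosen centres standing in for the unsupervised stage, and a least-squares fit of the output weights for the supervised stage are all assumptions.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Gaussian radial basis activation of each sample at each hidden-node centre."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_rbf(X, Y, centers, width):
    """Supervised stage: least-squares fit of the hidden-to-output weights."""
    H = rbf_design(X, centers, width)
    W, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return W

def predict_rbf(X, centers, width, W):
    """Forward pass: radial hidden layer followed by the linear output layer."""
    return rbf_design(X, centers, width) @ W
```

In practice the centres would come from an unsupervised procedure such as k-means clustering of the inputs; only the output weights are fitted against target outputs.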
The supervised training of the hidden-to-output weights assumes that the target variables are drawn from a multinomial distribution.

ASSESSMENT OF THE PREDICTION ACCURACY

Computational models that are valid, relevant and properly assessed for accuracy can be used to plan complementary laboratory experiments. The prediction quality was examined by testing the model, obtained after training the system, with the test data set.
Several measures are available for the statistical estimation of the accuracy of prediction models.
The common statistical measures are Sensitivity, Specificity, Precision or Positive predictive value (PPV), Negative predictive value (NPV), Accuracy and the Matthews correlation coefficient (MCC):

Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
PPV = TP / (TP + FP)
NPV = TN / (TN + FN)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
Here TP and TN are the numbers of correctly predicted positive and negative animals, respectively, i.e. animals correctly assigned to a given breed and animals correctly identified as not belonging to it, while FP and FN are the numbers of wrongly predicted positive and negative animals, respectively.
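These standard confusion-matrix statistics can be computed directly from the four counts; the helper name `binary_metrics` is illustrative:

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Confusion-matrix statistics, treating one breed as the positive class."""
    sensitivity = tp / (tp + fn)                 # true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    ppv = tp / (tp + fp)                         # precision / positive predictive value
    npv = tn / (tn + fn)                         # negative predictive value
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy, "mcc": mcc}
```

Unlike Accuracy, MCC remains informative when the breed classes are of unequal size, which matters in a 22-class setting where any one breed is a small fraction of the test set.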