MSLVP

Prediction of multiple subcellular localization of viral proteins

Algorithm

Machine Learning Technique:

Support vector machines (SVMs) were trained on the selected sequence features to predict subcellular localization in classification mode. SVMs offer a choice of kernel functions and adjustable parameters. The SVMlight software package (available at http://svmlight.joachims.org/) was used to construct the SVM classifiers. In this study, we used the radial basis function (RBF) kernel:

k(x, y) = exp(-γ ||x - y||²)

where x and y are feature vectors and γ > 0 is the kernel width parameter.
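As a hedged illustration, the RBF kernel can be written directly in Python. The helper `to_svmlight_line` is a hypothetical convenience function (not part of SVMlight) that writes one feature vector in SVMlight's sparse text input format, where each line holds a target value (+1 or -1 for binary classification) followed by index:value pairs with feature indices starting at 1. The γ value used for MSLVP is not stated here, so it is left as a parameter.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2) for equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def to_svmlight_line(label, features):
    """Format one example as an SVMlight input line: '<label> <index>:<value> ...'.

    Hypothetical helper: feature indices start at 1 and zero-valued features
    are omitted, as allowed by SVMlight's sparse format.
    """
    parts = [f"{label:+d}"]  # +1 / -1 class label
    for index, value in enumerate(features, start=1):
        if value != 0:
            parts.append(f"{index}:{value:g}")
    return " ".join(parts)
```

In SVMlight itself, the RBF kernel is selected with `svm_learn -t 2 -g <gamma>`, and the trade-off parameter is set with `-c`; the values used in this study are not given in this section.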

Features:

a) Amino Acid Composition

Amino acid composition is the fraction of each amino acid in a sequence. The fractions of all 20 natural amino acids were calculated using the following equation:

Fraction of amino acid X = Total number of residues of type X / Sequence length
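A minimal sketch of the amino acid composition formula, assuming sequences use the standard one-letter codes. As in the formula, the denominator is the full sequence length, so any non-standard residues (e.g. X) lower the fractions of the 20 standard ones rather than being dropped from the length.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return a 20-element vector: fraction of each amino acid in the sequence."""
    sequence = sequence.upper()
    length = len(sequence)  # full length, including any non-standard residues
    if length == 0:
        raise ValueError("empty sequence")
    return [sequence.count(aa) / length for aa in AMINO_ACIDS]
```

For example, `amino_acid_composition("MKKA")` assigns 0.5 to K, 0.25 each to M and A, and 0 to the other 17 residues.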

b) Dipeptide Composition

Dipeptide composition gives the fraction of each pair of adjacent residues (e.g. Phe-Phe, Gly-Leu) present in a sequence. The complete dipeptide composition (DPC) pattern is represented by a vector of 400 (20 × 20) elements. Dipeptide composition is calculated by the formula:

Fraction of dipeptide i = Total number of occurrences of dipeptide i / Total number of dipeptides in the sequence (sequence length - 1)
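A minimal sketch of the dipeptide composition, assuming the denominator is the number of overlapping dipeptides in the sequence (length - 1), so that the 400 fractions sum to 1 when only standard residues occur. Pairs containing non-standard residues are counted in the denominator but not assigned to any of the 400 features; this handling is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs: "AA", "AC", ..., "YY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element vector: fraction of each dipeptide in the sequence."""
    sequence = sequence.upper()
    total = len(sequence) - 1  # number of overlapping adjacent pairs (assumed denominator)
    if total < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]
```

A sliding window is used instead of `str.count`, because `str.count` does not count overlapping matches ("AAA" contains two AA dipeptides, but `"AAA".count("AA")` returns 1).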

c) Physicochemical Properties

We evaluated each of the 544 physicochemical properties available in the AAindex database individually. The 10 best-performing properties were then used to develop the physicochemical property ("physico") model.
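This section does not state how each property was converted into features or which 10 properties were selected. The sketch below assumes one common encoding, the average property value over the sequence, and uses the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) purely as an example scale; it is not claimed to be one of the 10 properties used here. `select_top_properties` is a hypothetical helper that ranks properties by a per-property score, such as a cross-validated accuracy.

```python
# Kyte-Doolittle hydropathy index (AAindex KYTJ820101), used only as an example scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_property(sequence, scale):
    """Average a per-residue property over residues that the scale defines (assumed encoding)."""
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    if not values:
        raise ValueError("no residues with a defined property value")
    return sum(values) / len(values)


def select_top_properties(scores, k=10):
    """Hypothetical helper: return the k property IDs with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Evaluating each AAindex property on its own and keeping the top 10 matches the selection procedure described above; the scoring criterion itself is an assumption.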

Evaluation:

The performance of the models constructed in this study was evaluated using 10-fold cross-validation. In 10-fold cross-validation, the relevant dataset was partitioned randomly into ten equally sized sets. Training and testing were carried out ten times, each time using one distinct set for testing and the remaining nine sets for training. Performance was computed using the following formulas:

   Sensitivity (S_n) = [TP / (TP+FN)]*100

   Specificity (S_p) = [TN / (TN+FP)]*100

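   Accuracy (A_c) = [(TP+TN) / (TP+FP+TN+FN)]*100

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives, respectively.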
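The evaluation procedure can be sketched as follows, assuming a fixed random seed for the partition and class labels where `positive` marks the positive class. When the number of examples is not a multiple of 10, the folds differ in size by at most one. The function names are hypothetical.

```python
import random


def cross_validation_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once, then dealt into k folds whose sizes differ by at
    most one; each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def performance(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy), each as a percentage."""
    sensitivity = tp / (tp + fn) * 100
    specificity = tn / (tn + fp) * 100
    accuracy = (tp + tn) / (tp + fp + tn + fn) * 100
    return sensitivity, specificity, accuracy
```

One common choice, assumed here rather than stated in the source, is to sum TP, TN, FP and FN over the ten test folds before applying `performance`; averaging the per-fold percentages is the main alternative.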