Frequently Asked Questions
Q1. What is MSLVP? MSLVP is a multi-label, two-tier prediction algorithm that annotates the subcellular localization of viral proteins using support vector machines.
Q2. What is the two-tier prediction algorithm? The first tier predicts whether an input sequence belongs to a single, double or multiple location. The second tier then predicts the specific sub-category within the first-tier location: one of eight for single, four for double and six for multiple locations.
Q3. What is the use of this algorithm? The MSLVP web server predicts subcellular locations at three levels: single (8 categories, including single-pass and multi-pass membrane), double (4) and multiple (6). It helps in the functional annotation of viral proteins. The subcellular locations of viral proteins in virus-infected cells are particularly important in explaining their pathogenesis.
Q4. How was data collected for MSLVP? We collected a comprehensive set of viral proteins with experimentally validated subcellular localization annotations from UniProt.
Q5. What is the total number of viral protein entries? We collected 4858 experimentally validated viral proteins with subcellular location annotation.
Q6. What filters were used to curate data from UniProt? To curate the data, the following filters were applied: (i) the taxonomy (OC) field includes "viruses" only; (ii) proteins annotated with terms like "probable", "potential" and "by similarity" were excluded, so that only experimental observations were included; (iii) sequences annotated as "fragments" or "polyprotein", and sequences shorter than 50 amino acid residues, were discarded.
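The filters listed in Q6 can be sketched as a single predicate over parsed UniProt entries. This is only an illustration under stated assumptions: the `Entry` record and its field names (`lineage`, `location_note`, `description`, `sequence`) are hypothetical and do not correspond to any particular parser, and the exact matching rules used for MSLVP are not given in this FAQ.

```python
from dataclasses import dataclass

# Evidence qualifiers that mark a non-experimental annotation (from Q6).
NON_EXPERIMENTAL = ("probable", "potential", "by similarity")
# Entry types that were discarded (from Q6).
EXCLUDED_TYPES = ("fragment", "polyprotein")
MIN_LENGTH = 50  # residues; shorter sequences are discarded


@dataclass
class Entry:
    """Hypothetical, already-parsed UniProt record (field names are assumptions)."""
    lineage: str        # taxonomy (OC) field
    location_note: str  # subcellular location annotation text
    description: str    # protein description text
    sequence: str       # amino acid sequence


def keep(entry: Entry) -> bool:
    """Return True if the entry passes all curation filters."""
    if "viruses" not in entry.lineage.lower():
        return False
    note = entry.location_note.lower()
    if any(term in note for term in NON_EXPERIMENTAL):
        return False
    desc = entry.description.lower()
    if any(term in desc for term in EXCLUDED_TYPES):
        return False
    return len(entry.sequence) >= MIN_LENGTH
```

Simple substring matching is used here for brevity; a real pipeline would more likely rely on the structured evidence fields of the UniProt record.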
Q7. What are the different sites used in MSLVP? MSLVP includes three types of sites, i.e. single, double and multiple (three or more) sites.
Q9. At what sequence identity thresholds is redundancy removed in MSLVP? We prepared data sets at 90% and 30% sequence identity. At 90%, MSLVP uses a non-redundant data set of 3480 viral proteins covering single (8), double (4) and multiple (6) locations. At 30% (where sequences sharing no more than 30% sequence identity were extracted using the CD-HIT tool), the data set is reduced to 1687 proteins.
Q8. How many entries are there at 90% sequence identity in MSLVP? Out of 3480 viral proteins, 2715 have a single location (1893 + 822 capsid entries). Double and multiple locations have 391 and 374 entries respectively.
Q9. How many entries are there at 30% sequence identity in MSLVP? Out of 1687 viral proteins, 1366 have a single location (1009 + 357 capsid entries). Double and multiple locations have 167 and 154 entries respectively.
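As a quick consistency check, the counts quoted for the 90% and 30% data sets can be added up. The sketch below assumes that "1893 + 822 capsid entries" means 1893 non-capsid single-location proteins plus 822 capsid proteins (and likewise for 1009 + 357); the dictionary keys are names chosen for illustration.

```python
# Counts quoted in the answers above, per sequence identity threshold.
DATASETS = {
    "90%": {"single_non_capsid": 1893, "capsid": 822,
            "double": 391, "multiple": 374, "total": 3480},
    "30%": {"single_non_capsid": 1009, "capsid": 357,
            "double": 167, "multiple": 154, "total": 1687},
}


def single_total(counts):
    """All single-location proteins, capsid included."""
    return counts["single_non_capsid"] + counts["capsid"]


def grand_total(counts):
    """Single + double + multiple location proteins."""
    return single_total(counts) + counts["double"] + counts["multiple"]


for name, counts in DATASETS.items():
    print(name, single_total(counts), grand_total(counts),
          grand_total(counts) == counts["total"])
# 90% 2715 3480 True
# 30% 1366 1687 True
```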
Q10. What are the different types of location in Single Location? Single Location comprises eight categories, i.e. (i) Cytoplasm; (ii) Nucleus; (iii) Extracell [Secreted, Extracellular]; (iv) Single-Pass membrane; (v) Multi-Pass membrane; (vi) Endoplasmic Reticulum; (vii) Others-Single; (viii) Capsid.
Q11. What are the different types of location in Double Location? Double Location comprises four categories, i.e. (i) Host nucleus. Host cytoplasm.; (ii) Host cytoplasm. Host cell membrane; Peripheral membrane protein; Cytoplasmic side.; (iii) Host nucleus. Extracell.; (iv) Others Double.
Q12. What are the different types of location in Triple Location? Triple (Multiple) Location comprises six categories, i.e. (i) Virion tegument. Host cytoplasm. Host nucleus.; (ii) Host cell membrane; Lipid-anchor; Cytoplasmic side. Host cytoplasm > host perinuclear region. Secreted.; (iii) Host nucleus > host nucleolus. Host cytoplasm. Secreted.; (iv) Host cytoplasm. Host nucleus. Host mitochondrion.; (v) Multiple; and (vi) Others Triple.
Q13. How is the data set divided for single, double and multiple locations? The data set is divided into training/test sets of 1703/190, 351/40 and 336/38 proteins for single, double and multiple locations respectively.
Q14. What are the different approaches used in MSLVP? For MSLVP, we implemented the "one-versus-other" and "one-versus-one" approaches.
Q15. What is the "one-versus-other" approach? In the "one-versus-other" approach, one class is treated as the positive class, while all remaining classes are pooled as the negative data set.
Q16. What is the "one-versus-one" approach? In the "one-versus-one" approach, each class is trained against every other class.
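The two schemes from Q15 and Q16 can be illustrated with a minimal sketch of how the binary problems are set up. This is the generic textbook construction, not necessarily how SVM-light or SVM multiclass are invoked inside MSLVP; the function names and the toy labels are assumptions made for the example. For k classes, "one-versus-other" yields k binary problems and "one-versus-one" yields k(k-1)/2, i.e. 8 and 28 for the eight single-location categories.

```python
from itertools import combinations


def one_versus_other_labels(labels, positive):
    """+1 for the chosen class, -1 for every other class."""
    return [1 if y == positive else -1 for y in labels]


def one_versus_one_pairs(classes):
    """Every unordered pair of distinct classes: one binary problem per pair."""
    return list(combinations(classes, 2))


def pair_subset(samples, labels, pair):
    """Keep only samples from the two classes in `pair`, relabelled +1 / -1."""
    first, second = pair
    return [(x, 1 if y == first else -1)
            for x, y in zip(samples, labels) if y in (first, second)]


classes = ["single", "double", "multiple"]
labels = ["single", "double", "single", "multiple"]

print(one_versus_other_labels(labels, "single"))
# [1, -1, 1, -1]
print(one_versus_one_pairs(classes))
# [('single', 'double'), ('single', 'multiple'), ('double', 'multiple')]
```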
Q17. Is a feature selection method applied in MSLVP? Yes, we applied a feature selection algorithm to choose the optimal feature set. For this, we used the maximum relevance minimum redundancy (mRMR) algorithm, a widely used feature selection method.
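To give an idea of what mRMR does, here is a small greedy sketch on discrete features. It uses the "difference" form of the criterion (relevance to the class minus mean redundancy with already selected features), with mutual information estimated from counts. Which mRMR variant or implementation MSLVP used is not stated in this FAQ, and continuous features such as compositions would first need to be discretized, so this should be read as an illustration only. The toy feature names `a`, `a_copy` and `b` are made up.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr(features, target, k):
    """Greedy mRMR. `features` maps a feature name to its list of values."""
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance[name] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the class is determined by two independent bits, a and b.
features = {
    "a":      [0, 0, 0, 0, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # fully redundant with "a"
    "b":      [0, 1, 0, 1, 0, 1, 0, 1],
}
target = [0, 1, 0, 1, 2, 3, 2, 3]

print(mrmr(features, target, 2))  # ['a', 'b']
```

All three toy features are equally relevant on their own, but once `a` is chosen the duplicate `a_copy` adds nothing new, so the second pick is `b`.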
Q18. Which machine learning techniques are used in MSLVP? In MSLVP, we used SVM multiclass V2.20 for the "one-versus-one" approach and SVM-light V6.02 for the "one-versus-other" approach.
Q19. What are the different features used in MSLVP? In MSLVP, we used amino acid composition (AAC), dipeptide composition (DPC), physicochemical properties (PHY) and their hybrids.
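AAC and DPC can be computed directly from a sequence, as sketched below. The sketch assumes the usual definitions (percentage of each of the 20 standard residues, and of each of the 400 ordered residue pairs) and builds a hybrid by simple concatenation; the exact scaling and hybrid construction used in MSLVP are not specified in this FAQ. PHY is left out because the list of physicochemical properties is not given here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: percentage of each residue (20 features)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {aa: 100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dpc(sequence):
    """Dipeptide composition: percentage of each ordered pair (400 features)."""
    seq = sequence.upper()
    total = len(seq) - 1  # number of overlapping dipeptides
    if total < 1:
        raise ValueError("sequence must have at least two residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
    return {pair: 100.0 * c / total for pair, c in counts.items()}


def hybrid(sequence):
    """AAC followed by DPC as one 420-dimensional feature vector."""
    return list(aac(sequence).values()) + list(dpc(sequence).values())


print(aac("AAAC")["A"])     # 75.0
print(len(hybrid("AAAC")))  # 420
```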
Q20. What is the average accuracy in MSLVP? For the 90% data set, we achieved accuracy, Matthews correlation coefficient (MCC) and ROC of 99.99%, 1.00, 1.00; 100.00%, 1.00, 1.00; and 99.90%, 1.00, 1.00 for single, double and multiple locations respectively. Similar results were achieved for the 30% data set.
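For reference, accuracy and MCC for one binary problem follow from the confusion matrix counts as shown in this short sketch. The standard definitions are used; the counts in the example are made up for illustration and are not MSLVP results, and this FAQ does not say how the per-class values were averaged.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Percentage of correctly classified samples."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to +1."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: a perfect classifier, then an imperfect one.
print(accuracy(50, 50, 0, 0), mcc(50, 50, 0, 0))  # 100.0 1.0
print(accuracy(45, 40, 10, 5))                    # 85.0
print(round(mcc(45, 40, 10, 5), 3))
```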
Q21. Is it validated independently? Yes, it is validated independently.