  • Step-1:sgRNA-Scanner


    Identifies potential sgRNAs in an input gene or genomic region by locating NGG PAM sequences on both the sense and antisense strands, using an in-house method.
    Example sgRNA sequence:
    
    	<    Distal |  Proximal  | PAM    >
    	<5'-NNNNNNNN|NNNNNNNNNNNN| NGG- 3'>
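
    The scanning step can be illustrated with a minimal Python sketch. It is not the in-house method itself, whose details are not given here; it only assumes a 20-nt protospacer (8 distal + 12 proximal nucleotides) followed by an NGG PAM, as in the example above, and searches the input and its reverse complement. The function names (scan_sgrnas, split_regions) are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_sgrnas(sequence, protospacer_len=20):
    """List candidate sites as (strand, start, protospacer, pam) tuples.

    Assumption: a candidate is `protospacer_len` unambiguous bases
    immediately followed by an NGG PAM. For the "-" strand, `start` is
    the offset within the reverse-complemented sequence.
    """
    sequence = sequence.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


def split_regions(protospacer):
    """Split a 20-nt protospacer into (distal 8 nt, proximal 12 nt)."""
    return protospacer[:-12], protospacer[-12:]
```

    Calling scan_sgrnas on a gene sequence returns every candidate on both strands; split_regions then separates the distal and proximal parts shown in the diagram.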
    
  • Step-2:geCRISPRc


    • Prediction algorithms were developed using 4569 experimentally validated sgRNAs from high-throughput studies in the literature: 1841 sgRNAs (Doench et al., Nature Biotech 2014), 1278 sgRNAs (Shalem et al., Science 2014), 1020 sgRNAs (Miguel et al., Nature Methods 2015) and 430 sgRNAs (Chari et al., Nature Methods 2015).

    • We used important sgRNA sequence features reported in earlier studies, namely nucleotide composition, binary profile, thermodynamic and structural properties, and their hybrids (22 models in total), to develop support vector machine (SVM) based predictive models.

    • The geCRISPRc algorithm was developed using 2090 sgRNAs (1021 potent sgRNAs with >50% efficiency and 1069 sgRNAs with <5% efficiency), divided into a training/testing dataset T1840 (895 positive + 945 negative) and a validation dataset V250 (126 positive + 124 negative).

    • Using the best SVM model for the geCRISPRc algorithm, we achieved an accuracy, Matthews correlation coefficient (MCC) and area under the receiver operating characteristic (ROC) curve of 87.17%, 0.75 and 0.92 during 10-fold cross validation (T1840), and of 88.80%, 0.78 and 0.94 on the independent dataset (V250).
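
    Two of the sequence features named above, nucleotide composition and binary profile, can be sketched in Python as follows. The exact encodings used for geCRISPR are not specified here, so this assumes the common definitions: percent frequency of each k-mer, and a 4-bit one-hot code per position. All function names are hypothetical, and thermodynamic and structural features are left out.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def composition(seq, k=1):
    """Percent frequency of every k-mer (k=1: mono-, k=2: dinucleotide).

    The output order follows itertools.product over "ACGT", so the
    vector length is always 4**k regardless of the sequence.
    """
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    total = len(seq) - k + 1
    for i in range(total):
        kmer = seq[i:i + k]
        if kmer in counts:  # k-mers with ambiguous bases are skipped
            counts[kmer] += 1
    return [100.0 * counts[kmer] / total for kmer in kmers]


def binary_profile(seq):
    """One-hot code per position: A=1000, C=0100, G=0010, T=0001."""
    vector = []
    for base in seq:
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector


def hybrid_features(seq):
    """Example hybrid: mono- and dinucleotide composition plus binary profile."""
    return composition(seq, 1) + composition(seq, 2) + binary_profile(seq)
```

    For a 20-nt sgRNA this hybrid gives a fixed-length vector of 4 + 16 + 80 = 100 values, which is the kind of input an SVM implementation expects.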
  • Step-3:geCRISPRr


    • Prediction algorithms were developed using 4569 experimentally validated sgRNAs from high-throughput studies in the literature: 1841 sgRNAs (Doench et al., Nature Biotech 2014), 1278 sgRNAs (Shalem et al., Science 2014), 1020 sgRNAs (Miguel et al., Nature Methods 2015) and 430 sgRNAs (Chari et al., Nature Methods 2015).

    • We used important sgRNA sequence features reported in earlier studies, namely nucleotide composition, binary profile, thermodynamic and structural properties, and their hybrids (22 models in total), to develop support vector machine (SVM) based predictive models.

    • The geCRISPRr algorithm was developed using 4139 sgRNAs (with efficiencies ranging from 0 to 100%), divided into a training/testing dataset T3619 and a validation dataset V520.

    • For the geCRISPRr algorithm, the best model achieved Pearson correlation coefficients (PCC) of 0.68 during 10-fold cross validation (T3619) and 0.69 on the independent dataset (V520).

    • The geCRISPR algorithms, developed on the largest collection of experimentally validated sgRNA sequences, perform better than the few existing methods, which were tested on smaller datasets.
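
    The Pearson correlation coefficient reported above compares predicted and measured sgRNA efficiencies. A minimal Python sketch of the standard formula is given below; the function name and the input values in the usage note are illustrative only and are not geCRISPR data.

```python
import math


def pearson(predicted, observed):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / math.sqrt(var_p * var_o)
```

    For example, pearson([10, 40, 80], [20, 50, 90]) is 1 up to floating-point rounding, because the second list is the first shifted by a constant. A value of 0.68 indicates a positive but imperfect linear relationship between predicted and measured efficiency.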
  • Step-4:geCRISPR-analysis


    • An off-target analysis is also integrated to search for similar sgRNAs (target sites) in the genomes of model organisms, i.e. Homo sapiens, Mus musculus, Drosophila melanogaster, Danio rerio and Caenorhabditis elegans.

                  
    This is accomplished in two modes. First, using the complete sgRNA sequence, with a query coverage cutoff of 80% and percent identity above 90% at the default e-value (10).
    
    Second, using the seed region of a particular sgRNA (the 12 nt immediately 5' of the PAM motif), with an e-value of 100 and a requirement of 100% query coverage and percent identity, because any mismatch in the seed region is more critical for specificity than one in the distal 8 nucleotides.

    Example sgRNA sequence:

        <    Distal | Proximal(seed) | PAM    >
        <5'-NNNNNNNN| NNNNNNNNNNNN   | NGG- 3'>

    • It also provides the secondary structure of sgRNAs.

    • The web interface of the geCRISPR web server is user friendly.

    • Output for all models is provided for each sgRNA sequence in both graphical and numerical form, using different coloring schemes to help the user interpret the results and select highly potent/effective sgRNAs. The user can also sort the output data.
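
    The two off-target search modes described in this step can be sketched as simple filters over alignment hits. This is a minimal Python illustration, not the server's implementation: the Hit record and function names are hypothetical, the search tool that produces the hits is not named in the text, and the 80% coverage cutoff is assumed to mean "at least 80%".

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment hit of an sgRNA query against a genome (hypothetical record)."""
    subject: str
    query_coverage: float    # percent of the query covered by the alignment
    percent_identity: float  # percent identical positions in the alignment


def seed_region(protospacer):
    """Return the 12 nt adjacent to the PAM (the 3' end of the 20-nt protospacer)."""
    return protospacer[-12:]


def passes_full_length(hit):
    """Mode 1: complete sgRNA, coverage >= 80% (assumed) and identity > 90%."""
    return hit.query_coverage >= 80.0 and hit.percent_identity > 90.0


def passes_seed(hit):
    """Mode 2: seed-only query, which must match completely and exactly."""
    return hit.query_coverage == 100.0 and hit.percent_identity == 100.0


def potential_off_targets(hits, mode="full"):
    """Keep only the hits that satisfy the cutoff for the chosen mode."""
    keep = passes_full_length if mode == "full" else passes_seed
    return [hit for hit in hits if keep(hit)]
```

    In the seed mode the query would be seed_region(protospacer) rather than the whole sgRNA, so a single mismatch anywhere in the 12 nt drops the hit, whereas the full-length mode tolerates a small number of mismatches.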