Modifying genomes with the CRISPR/Cas9 system is becoming increasingly popular. Good planning up front can save you a lot of time and effort, and can mean the difference between success and failure. Designing your guide RNAs (gRNAs) is the first major challenge.
Several online tools are available for designing gRNAs for CRISPR/Cas9 genome editing, including the recently developed CHOPCHOP, but these tools currently support only a defined set of model organisms.
A recent paper by Zhu et al. (2014) describes CRISPRseek, a new gRNA design tool that can be applied to non-model organisms as well as model organisms.
By writing their tool as a Bioconductor package in R, Zhu et al. have incorporated the algorithms currently used to design gRNAs (e.g., searching for protospacer adjacent motifs (PAMs) and weighting mismatches by their position in the target sequence), while still allowing user-defined flexibility. CRISPRseek lets the user set the maximum number of mismatches allowed when searching for off-targets, and it supports regular-expression pattern searching for the gRNA of interest, so motifs such as uracil stretches at the 5′ end can be eliminated from consideration automatically.
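To make the PAM search and pattern filtering concrete, here is a minimal Python sketch of the idea. It is not CRISPRseek's code or API (CRISPRseek is an R package); it assumes SpCas9 with an NGG PAM and a 20-nt protospacer, and the exclusion pattern (four consecutive Ts, which read as UUUU in the transcribed guide) and the function names are hypothetical, chosen only for illustration.

```python
import re

# A 20-nt protospacer immediately followed by an NGG PAM (SpCas9).
# The lookahead makes re.finditer report overlapping candidates.
PROTOSPACER_NGG = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_grnas(seq, exclude=r"TTTT"):
    """Return (strand, protospacer, start) for every NGG-adjacent 20-mer.

    `start` is the 0-based forward-strand coordinate of the protospacer's
    leftmost base. Protospacers matching the `exclude` regex are dropped;
    the default (four Ts, i.e. UUUU in the transcribed guide) is only an
    illustrative choice.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for m in PROTOSPACER_NGG.finditer(s):
            protospacer = m.group(1)
            if re.search(exclude, protospacer):
                continue
            start = m.start()
            if strand == "-":
                # Convert a reverse-complement coordinate to the forward strand.
                start = n - (m.start() + 20)
            hits.append((strand, protospacer, start))
    return hits
```

For example, `find_grnas("GACGTACGTACGTACGTACGTGG")` returns `[("+", "GACGTACGTACGTACGTACG", 0)]`. Passing a different regex as `exclude` is the sketch's stand-in for user-defined pattern filtering; off-target counting with a mismatch threshold would be a separate genome-wide step.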
In addition, two sequences differing by one or more SNPs (for example, two alleles) can be compared to identify allele-specific gRNAs. As a Bioconductor package, CRISPRseek works seamlessly with Biostrings and BSgenome, so the end user can either use an existing BSgenome genome package or forge their own genome if their organism of interest is not represented (see the BSgenomeForge vignette). As with all Bioconductor packages, there is a well-documented vignette (CRISPRseek vignette) that outlines the parameters of each function in the package through a combination of usage examples and argument definitions.
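The allele comparison can be pictured as a set difference of target sites. The Python sketch below is a simplified illustration, not CRISPRseek's implementation: it assumes an NGG PAM, scans only the forward strand, and uses hypothetical function names. Each site is kept as the full 23-mer (protospacer plus PAM), so a SNP that alters the protospacer and a SNP that destroys the PAM both show up as allele-specific sites.

```python
import re

# A full target site: 20-nt protospacer plus its NGG PAM (forward strand only).
SITE_NGG = re.compile(r"(?=([ACGT]{20}[ACGT]GG))")


def target_sites(seq):
    """Return the set of 23-mer target sites (protospacer + PAM) in `seq`."""
    return {m.group(1) for m in SITE_NGG.finditer(seq.upper())}


def allele_specific_sites(allele_a, allele_b):
    """Return (sites only in allele_a, sites only in allele_b).

    A SNP inside the protospacer changes the 23-mer, and a SNP that destroys
    the PAM removes the site entirely; either way the site becomes specific
    to one allele.
    """
    a, b = target_sites(allele_a), target_sites(allele_b)
    return a - b, b - a
```

A fuller version would also scan the reverse strand and prefer SNPs in the PAM-proximal seed region, where a single mismatch is most likely to prevent cleavage.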
CRISPRseek's output is convenient as well: it provides Excel files listing the single or paired gRNAs for each cleavage site, along with all possible restriction enzyme sites within the cleavage region, which speeds up downstream screening of knockouts.
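Why restriction sites near the cut are useful: an indel at the cut site will often destroy a recognition site that overlaps it, so loss of digestion flags edited alleles. The Python sketch below only illustrates that lookup and is not how CRISPRseek reports sites. It assumes SpCas9, which cuts 3 bp upstream of the PAM (between protospacer bases 17 and 18), a forward-strand protospacer, a hypothetical ±10 bp window, and a tiny illustrative enzyme table of palindromic sites.

```python
import re

# Illustrative subset of palindromic recognition sites; a real screen would
# use a full enzyme list (both strands matter for non-palindromic sites).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def cut_position(protospacer_start):
    """SpCas9 cuts 3 bp upstream of the PAM, i.e. after protospacer base 17.

    Returns the 0-based index of the first base 3' of the cut, for a
    protospacer on the forward strand.
    """
    return protospacer_start + 17


def enzymes_near_cut(seq, protospacer_start, flank=10):
    """Map enzyme name -> start positions of sites overlapping cut +/- flank."""
    seq = seq.upper()
    cut = cut_position(protospacer_start)
    lo, hi = max(0, cut - flank), min(len(seq), cut + flank)
    found = {}
    for name, site in ENZYMES.items():
        # Lookahead finds overlapping occurrences; keep those touching [lo, hi).
        starts = [m.start() for m in re.finditer(f"(?={site})", seq)
                  if m.start() < hi and m.start() + len(site) > lo]
        if starts:
            found[name] = starts
    return found
```

For a target such as `"ACGTACGTACGTACGAATTCAGG"` with the protospacer starting at 0, the EcoRI site at position 14 spans the cut, so `enzymes_near_cut` reports `{"EcoRI": [14]}`.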
As is often the case with bioinformatics analyses of non-model organisms, the “great freedom” afforded by Bioconductor comes with “great responsibility”, and defining your own genome takes a bit of reformatting. BSgenome did not seem to want to forge a genome composed of thousands of scaffolds (as is the case with most provisional genomes), but it was amenable to forging a single FASTA pseudomolecule containing the entire set of scaffolds, with adjacent scaffolds separated by runs of ~100 Ns.
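The scaffold-joining step can be scripted before handing the FASTA to BSgenome's forging workflow in R. The Python sketch below assumes a plain multi-FASTA of scaffolds; the function names, the pseudomolecule name, and the 60-column line width are hypothetical choices, while the default gap of 100 Ns follows the spacing described above. It also records each scaffold's offset in the pseudomolecule, which you will need later to translate hits back to scaffold coordinates.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the scaffold name.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def build_pseudomolecule(records, gap=100):
    """Join scaffolds with runs of `gap` Ns.

    Returns the joined sequence and a list of (name, start, length) tuples,
    where `start` is the scaffold's 0-based offset in the pseudomolecule.
    """
    parts, offsets, pos = [], [], 0
    for i, (name, seq) in enumerate(records):
        if i:
            parts.append("N" * gap)
            pos += gap
        offsets.append((name, pos, len(seq)))
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets


def write_fasta(name, seq, width=60):
    """Format a single FASTA record with fixed-width sequence lines."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{name}\n{body}\n"
```

In practice you would pass an open file handle for your scaffold FASTA to `read_fasta`, write the result of `write_fasta` to disk, and save the offsets table alongside it.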
This workaround makes it possible to forge a genome for a non-model organism, but it rules out including a masking or basefeatures file. As a result, identifying off-target hits takes longer, and those hits must then be checked manually to determine whether they are exonic. Even with these limitations, it is simple to forge your own genome (see Running_CRISPRseek_cheatsheet), and the tool has great documentation.
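Part of that manual check can be automated once you know where each scaffold sits in the pseudomolecule. The Python sketch below is a hypothetical helper, not part of CRISPRseek: it assumes an offsets table of `(name, start, length)` tuples sorted by start, 0-based half-open coordinates (subtract 1 from 1-based coordinates as reported in R), and exon annotations you supply yourself as per-scaffold interval lists.

```python
import bisect


def to_scaffold(pos, offsets):
    """Map a 0-based pseudomolecule position to (scaffold, 0-based offset).

    `offsets` is a list of (name, start, length) tuples sorted by start.
    Returns None if the position falls in an N gap between scaffolds.
    """
    starts = [start for _, start, _ in offsets]
    i = bisect.bisect_right(starts, pos) - 1
    if i < 0:
        return None
    name, start, length = offsets[i]
    if pos >= start + length:
        return None  # inside the N spacer after this scaffold
    return name, pos - start


def is_exonic(scaffold, offset, exons):
    """`exons` maps scaffold name -> list of half-open (start, end) intervals."""
    return any(s <= offset < e for s, e in exons.get(scaffold, []))
```

A linear scan over exons is fine for a handful of hits; for thousands of off-targets, an interval tree or sorted intervals with `bisect` would scale better.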
So if you’re designing gRNAs for an organism whose reference genome is not available in the online tools, give CRISPRseek a try.
I’ve attached a CRISPRseek cheatsheet (PDF) that I put together while getting set up for gRNA design on a project I am working on with Aedes aegypti.
If you have suggestions, comments or questions, feel free to attach them to this post.
Zhu LJ, Holmes BR, Aronin N, Brodsky MH (2014) CRISPRseek: A Bioconductor Package to Identify Target-Specific Guide RNAs for CRISPR-Cas9 Genome-Editing Systems. PLoS ONE 9: e108424. doi:10.1371/journal.pone.0108424