In the inaugural volume of The CRISPR Journal, Philippos Papathanos and Nikolai Windbichler describe a computational pipeline called redkmer, which identifies repetitive CRISPR target sites for X-shredding systems in insects.
Back in 2007, Windbichler et al. found that the endonuclease I-PpoI cleaves the 28S rDNA genes on the X chromosome of Anopheles gambiae and proposed that I-PpoI could be used as a sex distorter. Their lab group then placed I-PpoI expression under the control of the male, spermatogenesis-specific β2-tubulin promoter, so that the X chromosomes of X-bearing sperm were cut at multiple sites, or ‘shredded’, and the resulting offspring showed a >95% male-biased sex ratio. Later, they replaced I-PpoI with the CRISPR-Cas9 system, making the approach transferable to other insect species with an XY sex-determination system.
Targeting sites on the X chromosome, however, is difficult in many insect species because most sequenced insect genomes are distributed among multiple scaffolds that are not, a priori, assigned to a chromosome. In addition, even finished genome assemblies may struggle to connect repetitive sequences (which may contain repetitive CRISPR target sites) and thus discard data that cannot be assigned to a scaffold.
Papathanos and Windbichler avoid these problems by not using a finished assembly at all. Instead, they compare high-throughput sequencing data from females and males, using long PacBio reads as surrogate scaffolds and mapping short Illumina reads from each sex to them to determine a chromosome quotient (CQ) value for each long read. With this approach, the female-to-male coverage ratio of short reads mapped to a PacBio read is ~2:1 for reads of X origin (CQ=2), ~1:1 for autosomal reads (CQ=1), and ~0:1 for Y-linked reads (CQ=0).
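To make the CQ idea concrete, here is a minimal Python sketch of how a female-to-male coverage ratio could be computed for one long read and turned into a chromosome call. It is not redkmer's code: the function names, the library-size normalization, and the cutoff values are all assumptions chosen for illustration.

```python
def chromosome_quotient(female_hits, male_hits, female_total, male_total):
    """Female-to-male coverage ratio for one long read.

    Hit counts are scaled by each library's total read count (an assumed
    normalization) so that libraries of different depth are comparable.
    Returns None when there is no male coverage, since the ratio is undefined.
    """
    if male_hits == 0:
        return None
    return (female_hits * male_total) / (male_hits * female_total)


def classify(cq, x_min=1.5, a_min=0.7, a_max=1.3, y_max=0.3):
    """Assign a chromosome label from a CQ value.

    The cutoffs are hypothetical; they only bracket the expected values
    of CQ=2 (X), CQ=1 (autosome), and CQ=0 (Y).
    """
    if cq is None:
        return "unassigned"
    if cq >= x_min:
        return "X"
    if a_min <= cq <= a_max:
        return "autosome"
    if cq <= y_max:
        return "Y"
    return "ambiguous"


if __name__ == "__main__":
    # Toy numbers: equal library sizes, twice as many female as male hits.
    cq = chromosome_quotient(200, 100, 1_000_000, 1_000_000)
    print(cq, classify(cq))
```

In practice the cutoffs would need to be tuned to the observed CQ distribution, since real coverage ratios scatter around the ideal values.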
The assignment of the long PacBio reads to chromosomes is striking. And although redkmer was developed with the goal of identifying repetitive X-linked CRISPR targets, the pipeline could just as easily be co-opted to identify Y-specific loci or Y-specific CRISPR target sites.
Finally, to identify putative Cas9/Cpf1 target sites, redkmer uses a new CRISPR target identification tool, FlashFry, from Aaron McKenna and Jay Shendure, which came out a few months ago as a preprint on bioRxiv (https://www.biorxiv.org/content/early/2017/09/14/189068). FlashFry gives the user considerable flexibility, including the ability to set a high threshold for off-targets, which is critical for identifying CRISPR targets that occur many times on the X chromosome.
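To illustrate why a high off-target threshold matters, the sketch below enumerates candidate Cas9 sites (a 20-nt protospacer followed by an NGG PAM) on both strands and keeps only those that occur more than once. A multi-copy target necessarily has many identical "off-target" copies, which a strict threshold would filter out. This is a simplified stand-in, not FlashFry's algorithm or interface; the function names and the `min_copies` parameter are hypothetical.

```python
import re
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Lookahead so that overlapping candidate sites are all reported.
_CAS9_SITE = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def cas9_protospacers(seq):
    """Yield every 20-nt protospacer followed by an NGG PAM, on both strands."""
    forward = seq.upper()
    for strand in (forward, reverse_complement(forward)):
        for match in _CAS9_SITE.finditer(strand):
            yield match.group(1)


def repeated_targets(seqs, min_copies=2):
    """Count protospacers across sequences and keep those seen at least min_copies times."""
    counts = Counter()
    for seq in seqs:
        counts.update(cas9_protospacers(seq))
    return {site: n for site, n in counts.items() if n >= min_copies}


if __name__ == "__main__":
    site = "AC" * 10  # toy 20-nt protospacer
    reads = [site + "TGG", site + "AGG", "A" * 23]
    print(repeated_targets(reads))
```

Only exact matches are counted here; a real tool would also score near-matches with mismatches.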
Redkmer relies on a number of Linux/Unix-based dependencies (e.g., Jellyfish, bowtie, samtools, BLAST, R, FlashFry) and, as part of the open-source software community, is a valuable tool for CRISPR target discovery.