Description

This website designs sgRNAs for site-specific knockout experiments.

Tool input

The website sends you an email once the job is finished.

* Genome assembly:
This tool supports only the hg19, hg38, mm9, mm10, danRer7, dm6 and ce10 genomes for sgRNA target scanning.

* Spacer length
The output target sequence is spacer_len + PAM + 7 bp long, with the PAM shown in red.
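A minimal Python sketch of how such a target sequence could be cut from a chromosome sequence. The function name `target_sequence`, the 0-based `spacer_start` convention and the 3 bp PAM length are assumptions for illustration. For a minus-strand sgRNA the PAM and downstream bases lie to the left of the spacer on the plus strand, so that window is reverse-complemented.

```python
# Complement table for A/C/G/T/N in either case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def target_sequence(chrom_seq: str, spacer_start: int, spacer_len: int = 20,
                    strand: str = "+", pam_len: int = 3,
                    downstream: int = 7) -> str:
    """Return spacer + PAM + downstream bases, written 5'->3' on the sgRNA strand.

    Hypothetical helper: spacer_start is the 0-based leftmost plus-strand
    coordinate of the spacer.
    """
    flank = pam_len + downstream
    if strand == "+":
        left, right = spacer_start, spacer_start + spacer_len + flank
    elif strand == "-":
        # PAM and downstream bases sit to the left of the spacer on the plus strand.
        left, right = spacer_start - flank, spacer_start + spacer_len
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if left < 0 or right > len(chrom_seq):
        raise ValueError("target window runs off the chromosome")
    window = chrom_seq[left:right].upper()
    return window if strand == "+" else reverse_complement(window)
```

With a 20 bp spacer this yields a 30 bp sequence; with a 19 bp spacer, 29 bp.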

* Input region, given in any of the following formats:

• Region coordinates, e.g. "chr1:10000-10400" or "chr1 10000 10400"
• A RefSeq gene ID, e.g. "NM_003106"
• An official gene symbol, e.g. "Sox2"

Standard chromosome names for each genome:

• human/mouse/zebrafish: chr1, chr2, ...
• C. elegans: chrI, chrII, chrIII, ...
• fly: chr2L, chr2R, chr3L, chr3R, ...
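The input formats above can be told apart with a few regular expressions. This is a minimal sketch, not the site's actual parser: the function name `parse_region_input`, the accepted RefSeq prefixes (NM_, NR_, XM_, XR_) and the gene-symbol pattern are assumptions. Chromosome names are passed through unchanged, so they must already follow the naming convention of the chosen genome.

```python
import re
from typing import NamedTuple, Tuple, Union


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


# "chr1:10000-10400" or "chr1 10000 10400"
_COORD_RE = re.compile(r"^(chr\w+)[:\s]+(\d+)[-\s]+(\d+)$")
# RefSeq accessions such as NM_003106, optionally versioned (assumed prefixes)
_REFSEQ_RE = re.compile(r"^[NX][MR]_\d+(?:\.\d+)?$")
# Anything else that looks like a gene symbol, e.g. Sox2 or HLA-A
_SYMBOL_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.\-]*$")


def parse_region_input(text: str) -> Tuple[str, Union[Region, str]]:
    """Classify a region input as ("coordinate", Region), ("refseq", id) or ("symbol", name)."""
    s = text.strip()
    m = _COORD_RE.match(s)
    if m:
        region = Region(m.group(1), int(m.group(2)), int(m.group(3)))
        if region.start >= region.end:
            raise ValueError(f"start must be smaller than end: {text!r}")
        return "coordinate", region
    if _REFSEQ_RE.match(s):
        return "refseq", s
    if _SYMBOL_RE.match(s):
        return "symbol", s
    raise ValueError(f"unrecognized region input: {text!r}")
```

Coordinates are checked first, so an input such as "chr2L:5000-5600" is never mistaken for a gene symbol.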

* Design region: choose whether to design sgRNAs across the whole input region or only within its exons.

* Efficiency score cutoff
See below for details.

* Specificity score cutoff
See below for details.

Output

The output of this tool is a table with information and scores for each sgRNA. There are also two buttons: one downloads the whole table, and the other displays the sgRNAs in a genome browser.

Each column of the table is explained below.

• chrom: chromosome on which the sgRNA is located.
• start: start position of the sgRNA in the genome.
• end: end position of the sgRNA in the genome.
• hitseq: sgRNA sequence, including the PAM (in red) and 7 bp downstream.
• strand: strand on which the sgRNA is located.
• efficiency_score: efficiency of the sgRNA, based on its 29/30 bp sequence.
• specificity_score: specificity of the sgRNA, ranging from 0 to 100.
• conservation_score: average UCSC PhastCons score over the 29/30 bp sequence (19/20 bp guide + PAM + 7 bp).
• exon_overlap: whether the sgRNA overlaps any exon.
• DHS_overlap: whether the sgRNA overlaps any DNase I hypersensitive site (DHS) region from ENCODE.
• SNP overlap: whether any SNP is located within the sgRNA.
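The three overlap columns each reduce to an interval-overlap test, and the same check would serve the exon-only design option. A sketch, assuming half-open 0-based [start, end) intervals and one index per chromosome; the class and function names are hypothetical. Annotation intervals are merged and sorted once, so each query is a single binary search.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based (assumed convention)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


class IntervalIndex:
    """Overlap queries against one chromosome's annotation (exons, DHS, SNPs...)."""

    def __init__(self, intervals: List[Interval]):
        merged = merge_intervals(intervals)
        self.starts = [s for s, _ in merged]
        self.ends = [e for _, e in merged]  # sorted because intervals are disjoint

    def overlaps(self, start: int, end: int) -> bool:
        # First merged interval whose end lies strictly after the query start.
        i = bisect_right(self.ends, start)
        return i < len(self.starts) and self.starts[i] < end
```

A SNP at 0-based position p can be stored as the interval (p, p + 1).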

Example output:

The filter function at (1) quickly filters the sgRNAs in the result table. You can filter by several criteria and apply filters repeatedly. For example, to keep only sgRNAs in exon regions, select "exon_overlap", "unequals to", "False" and click "Filter".
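A downloaded table can be filtered the same way in a script. This sketch assumes the download is tab-separated with a header row using the column names above, and that the overlap columns hold the strings "True"/"False"; `read_table`, `apply_filter` and the operator names are hypothetical and only mirror the web filter.

```python
import csv
import io
import operator
from typing import Callable, Dict, List

Row = Dict[str, str]

# Operator names loosely mirroring the web filter; ordering operators need numeric columns.
OPS: Dict[str, Callable[[object, object], bool]] = {
    "equals to": operator.eq,
    "unequals to": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def _coerce(value: str):
    """Turn numeric strings into floats; leave everything else (e.g. "True") as text."""
    try:
        return float(value)
    except ValueError:
        return value


def read_table(text: str) -> List[Row]:
    """Parse a tab-separated table with a header row into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def apply_filter(rows: List[Row], column: str, op_name: str, value: str) -> List[Row]:
    """Keep rows where `row[column] <op> value` holds."""
    op = OPS[op_name]
    target = _coerce(value)
    return [row for row in rows if op(_coerce(row[column]), target)]
```

Filters chain just as on the page: `apply_filter(apply_filter(rows, "exon_overlap", "unequals to", "False"), "specificity_score", ">=", "50")` keeps exonic sgRNAs with a specificity score of at least 50.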

The button at (3) displays the results in the epigenome browser.

Conservation

The conservation score is the average PhastCons score over the 30 bp sequence (20 bp guide + PAM + 7 bp). The PhastCons score files downloaded from UCSC are listed below.

• hg19 phastCons46way vertebrate
• hg38 phastCons7way
• mm9 phastCons30way vertebrate
• mm10 phastCons60way placental
• danRer7 phastCons8way
• dm6 phastCons27way
• ce10 phastCons7way
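The averaging step itself is simple. A sketch, assuming the per-base PhastCons values for the region have already been extracted from the UCSC files into a Python list, with `None` where a base has no score, and that unscored bases are skipped rather than counted as zero; both choices and the function name are assumptions.

```python
from typing import Optional, Sequence


def conservation_score(per_base: Sequence[Optional[float]], window_start: int,
                       window_len: int = 30) -> Optional[float]:
    """Mean PhastCons score over [window_start, window_start + window_len).

    Bases without a score (None) are skipped; returns None if none are scored.
    """
    window = per_base[window_start:window_start + window_len]
    if len(window) < window_len:
        raise ValueError("window extends past the available scores")
    scored = [v for v in window if v is not None]
    return sum(scored) / len(scored) if scored else None
```

For a 19 bp spacer, pass `window_len=29`.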

Specificity score

For each spacer, we use BWA to map it back to the genome, allowing at most 4 mismatches, and collect all of the resulting mismatched sites. Each mismatched site is assigned a score using the formula from Hsu et al. (Nature Biotechnology 2013):

$S_{hit}(h) = \prod_{e\in\mathcal{M}}\left(1 - W[e]\right)\times\frac{1}{\left(\frac{19 - \bar{d}}{19}\times 4 + 1\right)}\times\frac{1}{n_{mm}^2}$

W = [0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.731, 0.828, 0.615, 0.804, 0.685, 0.583]

Here e runs over the set $\mathcal{M}$ of mismatch positions between the guide and the off-target site, and W holds the experimentally determined effect of a mismatch at each of the 20 guide positions on targeting. $\bar{d}$ is the mean pairwise distance between the mismatches, and $n_{mm}$ is the number of mismatches at this site.
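A direct Python transcription of the hit score, offered as a sketch. Assumptions not stated above: the guide and off-target site are compared over the 20 bp spacer only (PAM excluded); W is indexed from the PAM-distal end (index 0) to the PAM-proximal end (index 19); $\bar{d}$ is taken as the mean distance between consecutive mismatches; and a site with a single mismatch gets a distance factor of 1, since $\bar{d}$ is undefined there. The function name is hypothetical.

```python
from typing import List

# Mismatch-position weights, index 0 = PAM-distal, index 19 = PAM-proximal (assumed order).
W: List[float] = [0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079,
                  0.445, 0.508, 0.613, 0.851, 0.731, 0.828, 0.615, 0.804, 0.685, 0.583]


def hit_score(guide: str, site: str) -> float:
    """Hsu et al. style score for one off-target site, in [0, 1]."""
    if len(guide) != 20 or len(site) != 20:
        raise ValueError("expected 20 bp guide and 20 bp site (PAM excluded)")
    mismatches = [i for i, (a, b) in enumerate(zip(guide.upper(), site.upper())) if a != b]
    n_mm = len(mismatches)
    if n_mm == 0:
        return 1.0  # perfect match, i.e. the on-target site itself (assumed convention)
    # Term 1: product of (1 - W[e]) over the mismatch positions.
    term1 = 1.0
    for e in mismatches:
        term1 *= 1.0 - W[e]
    # Term 2: clustered mismatches (small d_bar) shrink the score.
    if n_mm == 1:
        term2 = 1.0
    else:
        d_bar = (mismatches[-1] - mismatches[0]) / (n_mm - 1)  # mean consecutive gap
        term2 = 1.0 / ((19 - d_bar) / 19 * 4 + 1)
    # Term 3: more mismatches give a smaller score.
    term3 = 1.0 / n_mm ** 2
    return term1 * term2 * term3
```

A single mismatch at the PAM-distal end (where W is 0) leaves the score at 1, while the same mismatch next to the PAM lowers it to 1 - 0.583 = 0.417.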

Once every mismatched site of the spacer has been scored, the specificity score of the spacer is computed as

$S_{guide} = \frac{100}{1 + \sum_{i=1}^{n_{hit}} S_{hit}(h_i)}$

where $h_1, \dots, h_{n_{hit}}$ are the mismatched sites collected for the spacer.

The specificity score ranges from 0 to 100; a higher score indicates a lower predicted off-target effect for the spacer.
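Given the hit scores, the guide score is a single expression. The sketch below assumes the caller passes only off-target sites and leaves the perfect on-target match out of the sum: if a perfect match were scored as 1 and included, no guide could exceed 50. The function name is hypothetical.

```python
from typing import Iterable


def guide_specificity(off_target_hit_scores: Iterable[float]) -> float:
    """S_guide = 100 / (1 + sum of S_hit over off-target sites).

    Returns 100 when no off-target sites were found; approaches 0 as the
    summed hit scores grow.
    """
    return 100.0 / (1.0 + sum(off_target_hit_scores))
```

For example, two off-target sites with hit scores 0.25 and 0.75 give 100 / (1 + 1) = 50.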

This figure shows the distribution of specificity scores for all sgRNAs in human hg38 and mouse mm10.

Efficiency score

This method was published in Genome Research (Han et al. 2015). The knockout efficiency calculation is based on the sequence of each sgRNA together with the region flanking the spacer. Each 20 bp spacer is extended by 10 bp on both sides, and the resulting 40 bp sequence is encoded as a 160-dimensional binary vector (40 positions × 4 possible nucleotides, A, C, G and T). An Elastic Net model is trained on these vectors using experimental data (Koike-Yusa et al., 2014; Wang et al., 2014) to obtain a weight for each nucleotide at each position. We selected 1,979 sgRNAs targeting a list of essential ribosomal genes and 1,573 sgRNAs targeting essential non-ribosomal genes from the experimental data. In our study, the 19 bp upstream of the PAM and the 7 bp downstream of the PAM contribute to sgRNA efficiency.
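To make the encoding concrete, here is a sketch of the 160-dimensional one-hot vector and of scoring it with a linear model. The A, C, G, T feature order, the function names and the weights are assumptions; no coefficients from the published model are reproduced here. In practice the weights would be the coefficients of a fitted model (for example the `coef_` and `intercept_` attributes of scikit-learn's `ElasticNet`), not placeholder values.

```python
from typing import List, Sequence

BASES = "ACGT"  # feature order within each position (assumption)


def one_hot_40bp(seq: str) -> List[int]:
    """Encode a 40 bp window (10 bp + 20 bp spacer + 10 bp) as 160 binary features."""
    seq = seq.upper()
    if len(seq) != 40:
        raise ValueError(f"expected 40 bp, got {len(seq)}")
    vec = [0] * (4 * len(seq))
    for i, base in enumerate(seq):
        j = BASES.find(base)
        if j < 0:
            raise ValueError(f"non-ACGT base {base!r} at position {i}")
        vec[4 * i + j] = 1  # exactly one feature set per position
    return vec


def linear_efficiency(seq: str, weights: Sequence[float], intercept: float = 0.0) -> float:
    """Hypothetical scoring: intercept + dot product of weights with the one-hot vector."""
    if len(weights) != 160:
        raise ValueError("expected 160 weights")
    x = one_hot_40bp(seq)
    return intercept + sum(w * xi for w, xi in zip(weights, x))
```

Because exactly one feature is set per position, the score is simply the intercept plus the sum of the weights of the observed nucleotide at each of the 40 positions.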

The coefficients of each nucleotide in the model are shown as a sequence logo below (Han et al. 2015).

This figure shows the distribution of efficiency scores (cutoff = 0) for all sgRNAs in human hg38 and mouse mm10. The recommended efficiency score range is 0.3 to 1.4.

Epigenome browser views

We also use the Wugb epigenome browser to display the regions and give a direct visualization of the overlaps. A modern web browser such as Chrome or Firefox may be needed to access all features.

Clicking a spacer in the first track shows the scores and sequence of that region. The color depth is calculated as specificity_score + 100 * efficiency_score, which reflects the overall quality of the spacer.
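The same combination can be used to rank spacers outside the browser. A sketch, assuming the recommended efficiency range of 0.3 to 1.4 is applied as a hard filter before ranking; the `Spacer` type and the function names are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Spacer(NamedTuple):
    name: str
    specificity_score: float
    efficiency_score: float


def quality(s: Spacer) -> float:
    """Combined quality, the same expression the browser uses for color depth."""
    return s.specificity_score + 100 * s.efficiency_score


def rank_spacers(spacers: Iterable[Spacer], eff_min: float = 0.3,
                 eff_max: float = 1.4) -> List[Spacer]:
    """Keep spacers in the recommended efficiency range, best combined quality first."""
    kept = [s for s in spacers if eff_min <= s.efficiency_score <= eff_max]
    return sorted(kept, key=quality, reverse=True)
```

For example, a spacer with specificity_score 80 and efficiency_score 0.5 has a combined quality of 80 + 50 = 130.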