ChiLin supports all species listed on the UCSC website, as long as the dependent data listed under [species] is provided.
We have packaged all dependent data for hg19, hg38, mm9, and mm10.
Data Name | Used by | Data Source |
---|---|---|
genome_index | bwa/bowtie/star | raw fasta indexed files |
genome_dir | bwa/bowtie/star | genome fasta files |
conservation | conservation_plot.py | wiggle files |

Genome version | Raw genome sequence | Masked genome sequence |
---|---|---|
hg19 | hg19_raw | hg19_mask |
hg38 | hg38_raw | hg38_mask |
mm9 | mm9_raw | mm9_mask |
mm10 | mm10_raw | mm10_mask |

Data Name | Used by | Data Source |
---|---|---|
chrom_len | samtools | UCSC table browser |
dhs | bedtools | Union DHS regions from Cistrome DB |
velcro | bedtools | blacklist regions |
geneTable | bedAnnotate | UCSC table browser |
[contamination] | bwa | Mycoplasma genome index (set by --mapper) |

Mycoplasma appears to be a major source of contamination, so we recommend downloading the Mycoplasma fasta and indexing it. The data is available at the link to the Mycoplasma genome, or from the NCBI Nucleotide database.
Then index it with bwa index -a is mycoplasma.fasta.
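The choice of `-a is` for Mycoplasma versus `-a bwtsw` for a full genome comes down to size: according to the BWA manual, the `is` algorithm does not work for databases larger than 2GB, while `bwtsw` handles whole mammalian genomes. A minimal sketch of that decision follows. It assumes the fasta file size in bytes is a good enough proxy for sequence length, and `pick_bwa_algo` is a hypothetical helper, not part of ChiLin or bwa.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a bwa index algorithm from the fasta size.
# Assumption: the file size in bytes approximates the number of bases.
pick_bwa_algo() {
    local fasta=$1
    local limit=${2:-2000000000}        # roughly 2 GB, the documented limit for "is"
    local size
    size=$(( $(wc -c < "$fasta") ))     # arithmetic expansion strips any padding from wc
    if [ "$size" -lt "$limit" ]; then
        echo is
    else
        echo bwtsw
    fi
}

# Usage (requires bwa on PATH):
#   bwa index -a "$(pick_bwa_algo mycoplasma.fasta)" mycoplasma.fasta
```

The optional second argument overrides the size limit, which is mainly useful for trying the helper on small files.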
Download the raw genome sequence data, extract the archive with tar xvfz, and concatenate the chromosome files with cat *fa > genome.fa. Then index the result:
bwa index -a bwtsw genome.fa
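The concatenation step can be sketched as a small function. This is an illustration under stated assumptions, not ChiLin's own tooling: it assumes the archive has already been extracted into a directory of per-chromosome `.fa` files, that the output file lives outside that directory, and `build_genome_fasta` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate per-chromosome fasta files into one file.
# Assumption: the output path is NOT inside src_dir, so the glob below
# can never pick up the output file itself.
build_genome_fasta() {
    local src_dir=$1 out=$2
    local f
    : > "$out"                        # start from an empty output file
    for f in "$src_dir"/*.fa; do
        [ -e "$f" ] || continue       # skip when the glob matches nothing
        cat "$f" >> "$out"
    done
}

# Usage sketch (directory and file names are placeholders):
#   build_genome_fasta chroms genome.fa
#   command -v bwa >/dev/null && bwa index -a bwtsw genome.fa
```

Keeping the output outside the source directory avoids a pitfall of running `cat *fa > genome.fa` twice in the same place: on the second run the glob also matches the existing `genome.fa`.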
Use the UCSC table browser step by step to download the gene table, then strip the header line from the downloaded file:
sed 1d species.refgene > sp.refgene
# download the per-chromosome phastCons conservation scores for hg19
wget -r -np -nd --accept=gz http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/phastCons46way/placentalMammals/
# convert each wigFix file to bigWig
for c in chr*wig*gz
do
    bw="${c%phastCons46way.placental.wigFix.gz}bw"
    echo "$bw"
    # chrom_len is the path to your reference chromosome information file
    gunzip -c "$c" | wigToBigWig stdin chrom_len "$bw"
done
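
wigToBigWig expects chrom_len to be a two-column, tab-separated file of chromosome names and lengths. The table above lists the UCSC table browser as its source. As a cross-check, the sketch below derives the same kind of table from a genome fasta with awk. It assumes Unix line endings, and `fasta_chrom_len` is a hypothetical helper name, not a ChiLin or UCSC tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "name<TAB>length" for every sequence in a fasta file.
fasta_chrom_len() {
    awk '
        /^>/ {
            # a new header: flush the previous sequence, if any
            if (name != "") printf "%s\t%d\n", name, len
            name = substr($1, 2)    # sequence name up to the first whitespace
            len = 0
            next
        }
        { len += length($0) }       # sequence line: count its bases
        END { if (name != "") printf "%s\t%d\n", name, len }
    ' "$1"
}

# Usage sketch:
#   fasta_chrom_len genome.fa > chrom_len
```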