Installation

The two clones are synchronized between https://github.com/cfce/chilin/. We have packaged all dependent software and species-specific data into .tar.gz archives.

Before installation, make sure that gcc, g++, make and java are in place. We provide installation instructions for these dependencies on common systems.

Dependent software list

Note

Python must be version 2.7 for MACS2 support.

Tool                    debian/centos/mac            Usage for ChiLin
python dev header       apt-get or yum or port       prerequisites
python setuptools       apt-get or yum or port       prerequisites
python numpy package    apt-get or yum or port       prerequisites
cython                  apt-get or yum or port       prerequisites
R                       apt-get or yum or manually   prerequisites
java/gcc/g++            apt-get/yum install/Xcode    prerequisites
ghostscript             apt-get or yum or manually   prerequisites
texlive-latex           apt-get or yum or manually   prerequisites
ImageMagick             apt-get or yum or manually   prerequisites
MACS2                   pypi                         peak calling
seqtk                   built-in                     packaged into chilin
bx-python               built-in                     packaged into chilin
FastQC                  built-in                     packaged into chilin
BWA                     built-in                     packaged into chilin
samtools                built-in                     packaged into chilin
bedtools                built-in                     packaged into chilin
bedClip                 built-in                     UCSC binary (packaged into chilin)
bedGraphToBigWig        built-in                     UCSC binary (packaged into chilin)
wigCorrelate            built-in                     UCSC binary (packaged into chilin)
wigToBigWig             built-in                     UCSC binary (packaged into chilin)
mdseqpos                built-in                     packaged into chilin
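
The Python 2.7 requirement noted above can be checked before installing anything else. The following is a minimal sketch, assuming that the python found on your PATH is the interpreter ChiLin will use. The helper name is_python27 is introduced here for illustration and is not part of ChiLin.

```shell
#!/bin/sh
# Succeed only when a "python --version" string reports a 2.7 release.
# (is_python27 is a hypothetical helper, not a ChiLin command.)
is_python27() {
    case "$1" in
        "Python 2.7"|"Python 2.7."*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v python >/dev/null 2>&1; then
    # Python 2 prints its version on stderr, so merge the streams.
    version=$(python --version 2>&1) || version="unknown"
else
    version="no python on PATH"
fi

if is_python27 "$version"; then
    echo "OK: $version"
else
    echo "WARNING: expected Python 2.7.x, found: $version"
fi
```

If the warning appears, install a 2.7 interpreter with your package manager before continuing.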

Ubuntu and debian installation

  • If you have administrator privileges, run the following commands.

    sudo apt-get update
    #python
    sudo apt-get install python-dev python-numpy python-setuptools cython python-pip
    #R
    sudo apt-get install r-base
    #java
    sudo apt-get install default-jre
    sudo apt-get install ghostscript
    sudo apt-get install imagemagick --fix-missing
    #Tex
    sudo apt-get install texlive-latex-base
    

Then continue to install chilin.


Centos and Fedora installation

For centos, use:

# enable the EPEL repository before installing packages that depend on it
sudo rpm -Uvh http://mirror.chpc.utah.edu/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo yum install python-devel numpy python-setuptools python-pip
sudo yum install tcl tcl-devel tk-devel
sudo yum install R
sudo yum install ImageMagick
sudo easy_install Cython
sudo yum install tetex

Then continue to install chilin.


Mac OS Installation

  • Install Xcode from the Mac App Store.
  • See the instructions for installing Java on Mac.
  • Type xcode-select --install to install the Command Line Tools.
  • Install MacPorts to make installing the dependent modules easier.

For Mac, we suggest using MacPorts. Before installing MacPorts, you need to have Xcode and Java installed:

## download and install MacPorts
# open https://distfiles.macports.org/MacPorts/ and download the right version
sudo port install py27-setuptools py27-pip py27-nose py27-cython py27-numpy @1.8.1  ## or use EPD to replace this
## Install R manually from http://cran.cnr.berkeley.edu/bin/macosx/
## For mac latex, install separately, download and click to install it
wget -c http://mirror.ctan.org/systems/mac/mactex/MacTeX.pkg

Install R, MacTex, ImageMagick and ghostscript manually.

Then continue to install chilin.


Install ChiLin

Test and install pipeline software

  • Type “which gcc g++ java make gs convert pdflatex R cython” to check that the prerequisites are installed.
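
Because which stays silent about commands it cannot find, a missing tool is easy to overlook in its output. As a rough sketch of the same check, the loop below prints only the missing commands. The function name missing_tools is an assumption made for this example and is not provided by ChiLin.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
# (missing_tools is a hypothetical helper, not a ChiLin command.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Same tool list as the "which" check above.
missing=$(missing_tools gcc g++ java make gs convert pdflatex R cython)
if [ -z "$missing" ]; then
    echo "All prerequisites found."
else
    echo "Missing prerequisites:"
    echo "$missing"
fi
```

Install anything reported as missing with apt-get, yum or port before installing ChiLin.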

After installing the prerequisites, install ChiLin as follows:

git clone http://github.com/cfce/chilin/
cd chilin
python setup.py install -f

Then, check your installation:

source chilin_env/bin/activate
# check ChiLin dependent software and data
python setup.py -l

If any software cannot be installed, consult its official documentation. In most cases, check the dependent software list to confirm that all prerequisites are installed; the usual culprits are numpy, cython, the gcc compiler, or the R package seqLogo. Look at the packages in the software directory to see what went wrong, and try apt-get, yum, port or pypi to fix the issue.

Lastly, you may need to check the R seqLogo package that mdseqpos depends on. Install and load the dependent R packages with:

R -e "source('http://bioconductor.org/biocLite.R');biocLite('seqLogo');library(seqLogo)"

Remember to activate your Python virtual environment with “source ${ChiLin_ROOT}/chilin_env/bin/activate” in every new shell, or put that line into your ${HOME}/.bashrc or ${HOME}/.bash_profile.
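
One way to add the activation line to a startup file without duplicating it on repeated runs is sketched below. It assumes that ChiLin_ROOT is set to the directory where you cloned ChiLin. The helper append_once is a name made up for this example.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# (append_once is a hypothetical helper, not a ChiLin command.)
append_once() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Only touch ~/.bashrc when ChiLin_ROOT has been set by the user,
# for example: export ChiLin_ROOT=$HOME/chilin
if [ -n "${ChiLin_ROOT:-}" ]; then
    append_once "source ${ChiLin_ROOT}/chilin_env/bin/activate" "${HOME}/.bashrc"
fi
```

Running the snippet twice leaves a single activation line in the file.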

Note

After installation, the config file is auto-generated, and the species-specific data directory defaults to db under the code root directory.

Download dependent data for hg38, hg19, mm9, or mm10

Under the ChiLin source code root directory:

# download from our cistrome server
mkdir -p db

# change directory to db
cd db

# download the one you need; this will be over 10 GB, so make sure your internet speed is over 100k/s, or it will be too slow
# human
wget -c http://cistrome.org/chilin/_downloads/hg19.tgz
wget -c http://cistrome.org/chilin/_downloads/hg19.tgz.md5 ## check md5
#wget -c http://cistrome.org/chilin/_downloads/hg38.tgz
#wget -c http://cistrome.org/chilin/_downloads/hg38.tgz.md5

# mouse
#wget -c http://cistrome.org/chilin/_downloads/mm9.tgz
#wget -c http://cistrome.org/chilin/_downloads/mm9.tgz.md5
#wget -c http://cistrome.org/chilin/_downloads/mm10.tgz
#wget -c http://cistrome.org/chilin/_downloads/mm10.tgz.md5

# check the md5sum for completeness of hg19
md5sum -c hg19.tgz.md5
tar xvfz hg19.tgz

# download the mycoplasma data if you are concerned about it contaminating your samples
wget -c http://cistrome.org/chilin/_downloads/mycoplasma.tgz
wget -c http://cistrome.org/chilin/_downloads/mycoplasma.tgz.md5
md5sum -c mycoplasma.tgz.md5
tar xvfz mycoplasma.tgz

# change back
cd ..

# once the downloads are complete, check your data and software installation
python setup.py -l
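
If you need several assemblies, the download, checksum and extraction steps above can be wrapped in one function. This is only a sketch: data_url and fetch_data are hypothetical names, the base URL is copied from the commands above, and md5sum is assumed to be available (it is not installed by default on Mac OS).

```shell
#!/bin/sh
# Base URL taken from the wget commands above.
CHILIN_DATA_URL=http://cistrome.org/chilin/_downloads

# Print the archive URL for one data set (hg19, hg38, mm9, mm10, mycoplasma).
data_url() {
    echo "$CHILIN_DATA_URL/$1.tgz"
}

# Download an archive and its checksum into ./db, verify it, then unpack it.
# The subshell keeps the caller's working directory unchanged.
fetch_data() {
    name=$1
    mkdir -p db
    (
        cd db || exit 1
        wget -c "$(data_url "$name")" "$(data_url "$name").md5" || exit 1
        # Stop before extracting if the checksum does not match.
        md5sum -c "$name.tgz.md5" || exit 1
        tar xzf "$name.tgz"
    )
}

# Usage, from the ChiLin source root:
#   fetch_data hg19 && fetch_data mycoplasma
```

Nothing is downloaded until fetch_data is called, so the definitions can safely be pasted into a shell first.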

If you

  • want to know more about dependent data
  • want to prepare new version of reference data by yourself
  • have a species assembly not in our list

see details about the dependent data.

Add species support in ChiLin

After preparing the software and reference data, you can skip this part if you are using our prepared hg38, hg19, mm10 or mm9 dependent data, because setup.py already generates chilin.conf.filled for you. If you have your own reference data, open chilin.conf.filled in your favorite text editor and append a section that adds support for your species.

Fill in a section like the following with the absolute paths to your own data, then append the filled section to the file:

[hg19]
genome_index =
# fasta file separated by chromosome, such as chr1.fa
genome_dir =
chrom_len =
dhs =
## blacklist region
velcro =
conservation =
geneTable =

[mm9]
genome_index =
# fasta file separated by chromosome, such as chr1.fa
genome_dir =
chrom_len =
dhs =
## blacklist region
velcro =
conservation =
geneTable =

[hg38]
genome_index =
# fasta file separated by chromosome, such as chr1.fa
genome_dir =
chrom_len =
dhs =
## blacklist region
velcro =
conservation =
geneTable =
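
After filling in a species section, it can help to confirm that no key was left empty. The awk-based check below is a sketch under the assumption that the conf file uses the plain "key = value" layout shown above. The function name empty_keys is invented for this example and is not a ChiLin command.

```shell
#!/bin/sh
# List the keys of one [section] in an INI-style file whose value is empty.
# (empty_keys is a hypothetical helper, not a ChiLin command.)
empty_keys() {
    conf_file=$1
    section=$2
    awk -v want="[$section]" '
        /^[ \t]*#/ { next }                               # skip comment lines
        /^\[/      { in_section = ($0 == want); next }    # track current section
        in_section && /=/ {
            key = $0; sub(/[ \t]*=.*/, "", key)           # text before "="
            val = $0; sub(/^[^=]*=[ \t]*/, "", val)       # text after "="
            sub(/[ \t]+$/, "", val)                       # trim trailing blanks
            if (val == "") print key
        }
    ' "$conf_file"
}

# Report unfilled hg19 keys if the generated conf exists in this directory.
if [ -f chilin.conf.filled ]; then
    empty_keys chilin.conf.filled hg19
fi
```

An empty result means every key in that section has a value; it does not verify that the paths exist.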

For details about these sections, see [species]. The rest of the conf template follows.

#Define the site-wide defaults for your system using ABSOLUTE PATHS
#When defining reference species (i.e. your species of interest)
#please refer to "Generating Species References" in README.md on how
#to generate the files
#------------------------------------------------------------------------------
# Tools
#------------------------------------------------------------------------------
#ChiLin is dependent on several tools, please specify the absolute path to
#these tools--ALL FIELDS ARE REQUIRED
#put bedClip, bedGraphToBigWig, bowtie, star, bwa, fastqc, bedtools, macs2, samtools, seqtk, wigCorrelate
#in executable PATH
#other system tool includes convert, pdflatex, R, python2.7
[tool]
mdseqpos =
macs2 =

#------------------------------------------------------------------------------
# Tool parameters
#------------------------------------------------------------------------------
#These are optional parameters for some tools defined above
#NOTE: not all tool parameters can be inputted in this conf--e.g. for bwa,
#the only thing you can affect here is the number of threads used.
[macs2]
#refer to the macs2 help message to find out what these mean, species for effective genome size
extsize = 146
# effective genome sizes, support hs, mm, other species, please refer to chromInfo
species = 
type = both
fdr = 0.01
keep_dup = 1

[reg]
## regulatory potential score prediction top peaks
peaks = 10000
dist = 100000

[conservation]
## for tf/dnase we suggest 400bp width around summit, for histone 4000
type = tf
peaks = 5000
width = 400

[seqpos]
peaks = 5000
mdscan_width = 200
mdscan_top_peaks = 200
seqpos_mdscan_top_peaks_refine = 500
width = 600
pvalue_cutoff = 0.001
db = cistrome.xml

#------------------------------------------------------------------------------
# Contamination
#------------------------------------------------------------------------------
#OPTIONAL- our contamination module can screen for any species defined below
#specify the species name and the path to the bwa index as follows: e.g.
#ECOLI = /some/path/ecoli
[contamination]
mycoplasma = mycoplasma
# ecoli =
# yeast =

Test installation

Test the installation with the demo data:

# non cluster server
cd demo
bash foxa1

# if you are using a slurm system
cd demo
# submit cluster script `foxa1`
sbatch foxa1

Check the demo data results:

du -h local/local.pdf ## quality report
# mac
open local/local.pdf
# linux
nautilus local/

For more options, see Manual.