Alignment

CPT3 align

long-read or short-read

Alignment of paired-end ChIA-PET and HiChIP sequencing data:

An example usage is:
    java -jar ChIA-PET.jar --fastq1 test_long_1.fastq.gz --fastq2 test_long_2.fastq.gz \
    --minimum_linker_alignment_score 14 --GENOME_INDEX hg38.fa --GENOME_LENGTH 3E9 \
    --CHROM_SIZE_INFO ChIA-PET_Tool_V3/chromInfo/hg38.chromSize.txt \
    --CYTOBAND_DATA ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt --SPECIES 1 \
    --output test_long --prefix GM12878 --thread 8 --fastp fastp \
    --start_step 1 --stop_step 2
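
When the alignment is launched from a script, the command above can be assembled programmatically. The following is a minimal Python sketch and not part of CPT3: the helper name `build_cpt3_command` and its step check are assumptions made for illustration; only the option names and values are taken from the example above.

```python
import shlex


def build_cpt3_command(jar_path, options):
    """Build the argument list for `java -jar <jar> --name value ...`.

    `options` maps option names (without the leading `--`) to values.
    This helper is hypothetical: it only formats arguments and does not
    know which options the tool actually accepts.
    """
    start = int(options.get("start_step", 1))
    stop = int(options.get("stop_step", 100))
    # The option text asks for --stop_step to be larger than --start_step.
    # Whether equal values are accepted is not stated, so this sketch only
    # rejects a stop step that lies before the start step.
    if stop < start:
        raise ValueError("stop_step must not be smaller than start_step")

    command = ["java", "-jar", jar_path]
    for name, value in options.items():
        command += ["--" + name, str(value)]
    return command


options = {
    "fastq1": "test_long_1.fastq.gz",
    "fastq2": "test_long_2.fastq.gz",
    "minimum_linker_alignment_score": 14,
    "GENOME_INDEX": "hg38.fa",
    "GENOME_LENGTH": "3E9",
    "CHROM_SIZE_INFO": "ChIA-PET_Tool_V3/chromInfo/hg38.chromSize.txt",
    "CYTOBAND_DATA": "ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt",
    "SPECIES": 1,
    "output": "test_long",
    "prefix": "GM12878",
    "thread": 8,
    "fastp": "fastp",
    "start_step": 1,
    "stop_step": 2,
}
command = build_cpt3_command("ChIA-PET.jar", options)
# Print the command as a single shell-quoted line for inspection.
print(shlex.join(command))
```

Passing `command` to `subprocess.run(command, check=True)` would then start the run; building the list first makes it easy to log or inspect the exact invocation.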

Parameters

[ Main parameters ]

Usage: java -jar <path of ChIA_PET.jar> [options]

Necessary options:

--fastq1

path of read1 fastq file

--fastq2

path of read2 fastq file

--autolinker

detect the linker automatically; --linker and --mode then do not need to be provided.

--mode

mode of the tool, 0: short read; 1: long read; needed for ChIA-PET data

--linker

path of the linker file, needed for ChIA-PET mode

--fastp

path of fastp, strongly suggested for ChIA-PET data.

--skipheader

skip the first N reads when detecting the linker, default 1000000.

--linkerreads

number of reads (N) used to detect the linker, default 100000.

--hichip

Y(es), N(o) [default], or O(nly print the restriction site file without running the other steps); needed for HiChIP data

--ligation_site

Either the name of a restriction enzyme, such as HindIII, MboI, DpnII, Bglii, Sau3AI, Hinf1, NlaIII, AluI, or the enzyme digestion site, such as A^AGCTT, ^GATC, ^GATC, A^GATCT, G^ANTC, CATG^, AG^CT or others. Multiple restriction enzymes can be separated by commas, such as G^ANTC,^GATC. A restriction site must contain '^' and otherwise only the characters 'ATCG'. This parameter is not needed if the genomic enzyme digestion file is provided with --restrictionsiteFile. Only needed for HiChIP data.

--ResRomove

Y or N, whether to remove PETs that fall in the same restriction contig, default: Y

--restrictionsiteFile

restriction site file. If this parameter is omitted and the restriction enzyme information is given with --ligation_site, the file is generated automatically. Only needed for HiChIP data.

--genomefile

path of the genome fasta file, needed when --ligation_site is given without --restrictionsiteFile. Only needed for HiChIP data.

--minfragsize

Minimum restriction fragment length to consider, default 20

--maxfragsize

Maximum restriction fragment length to consider, default 1000000

--minInsertsize

Minimum restriction fragment skip of mapped reads to consider, default 1

--fqmode

single-end or paired-end (default); in single-end mode for ChIA-PET data, only --fastq1 is required

--minimum_linker_alignment_score

minimum linker alignment score, default: 14

--GENOME_INDEX

the path of BWA index

--GENOME_LENGTH

the number of base pairs in the whole genome

--CHROM_SIZE_INFO

the file that contains the length of each chromosome; an example file is in ChIA-PET_Tool_V3/chrInfo. It is necessary for analysis beyond step 2. Note: make sure the chromosome names in this file are the same as the names in the genome file.

--CYTOBAND_DATA

the ideogram data used to plot intra-chromosomal peaks and interactions; an example file is in ChIA-PET_Tool_V3/chrInfo

--SPECIES

1: human; 2: mouse; 3: others
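
The site notation used by --ligation_site (a motif with '^' marking the cut position) can be illustrated with a short Python sketch. This is not the CPT3 implementation: the function names `parse_site` and `find_cut_positions` are hypothetical, the toy sequence is made up, and treating 'N' as a wildcard for any base is an assumption made because the example site G^ANTC contains it.

```python
import re


def parse_site(site):
    """Split a site such as 'A^AGCTT' into (motif, cut_offset)."""
    if site.count("^") != 1:
        raise ValueError("site must contain exactly one '^': " + site)
    motif = site.replace("^", "").upper()
    # Assumption: besides A, C, G, T the letter N is allowed as a wildcard.
    if re.fullmatch(r"[ACGTN]+", motif) is None:
        raise ValueError("unexpected character in site: " + site)
    # The index of '^' equals the number of motif bases before the cut.
    return motif, site.index("^")


def find_cut_positions(sequence, site):
    """Return the 0-based cut positions of `site` within `sequence`."""
    motif, offset = parse_site(site)
    # A lookahead is used so that overlapping occurrences are also found.
    pattern = re.compile("(?=(" + motif.replace("N", "[ACGT]") + "))")
    return [m.start() + offset for m in pattern.finditer(sequence.upper())]


# Several sites separated by commas, as in the option description.
sites = "G^ANTC,^GATC".split(",")
sequence = "TTGATCAAGAATCTT"  # made-up toy sequence
cuts = sorted(p for s in sites for p in find_cut_positions(sequence, s))
print(cuts)
```

Consecutive cut positions along a chromosome delimit the restriction fragments that options such as --minfragsize and --maxfragsize refer to.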

Other options:

--start_step

start with which step, 1: linker filtering; 2: mapping to genome; 3: removing redundancy; 4: categorization of PETs; 5: peak calling; 6: interaction calling; 7: visualizing, default: 1

--stop_step

stop with which step, 1: linker filtering; 2: mapping to genome; 3: removing redundancy; 4: categorization of PETs; 5: peak calling; 6: interaction calling; 7: visualizing, default: 100, should be bigger than --start_step

--output

path of output, default: ChIA-PET_Tool_V3/output

--prefix

prefix of output files, default: out

--minimum_tag_length

minimum tag length, default: 18

--maximum_tag_length

maximum tag length, default: 1000

--minSecondBestScoreDiff

the score difference between the best-aligned and the second-best aligned linkers, default: 3

--output_data_with_ambiguous_linker_info

whether to print the linker-ambiguity PETs, 0: not print; 1: print, default: 1

--printreadID

write read ID to bedpe file, default: N

--printallreads

print all reads regardless of strand, 0: print all; 1: only print valid-strand reads, default: 0

--search_all_linker

search all linkers in reads or just search one time, default: N

--thread

the number of threads used in linker filtering and mapping to genome, default: 1

--MAPPING_CUTOFF

cutoff of mapping quality score for filtering out low-quality or multiply-mapped reads, default: 20

--MERGE_DISTANCE

the distance limit to merge the PETs with similar mapping locations, default: 2

--SELF_LIGATION_CUFOFF

the distance threshold between self-ligation PETs and intra-chromosomal inter-ligation PETs, default: 8000 for ChIA-PET, and 1000 for HiChIP

--EXTENSION_LENGTH

the extension length from the location of each tag, default: 500; 1500 is suggested for single-end mode

--MIN_COVERAGE_FOR_PEAK

the minimum coverage to define peak regions, default: 5

--PEAK_MODE

1: peak region mode, which takes all the overlapping PET regions above the coverage threshold as peak regions; 2: peak summit mode, which takes the highest coverage of overlapping regions as peak regions, default: 2

--MIN_DISTANCE_BETWEEN_PEAK

the minimum distance between two peaks, default: 500

--GENOME_COVERAGE_RATIO

the estimated proportion of the genome covered by the reads, default: 0.8

--PVALUE_CUTOFF_PEAK

p-value to filter peaks that are not statistically significant, default: 0.00001

--INPUT_ANCHOR_FILE

a file which contains user-specified anchors for interaction calling, default: null

--PVALUE_CUTOFF_INTERACTION

p-value to filter false positive interactions, default: 0.5

--zipbedpe

gzip the bedpe-related files after the analysis is done, Y: gzip; N: do not gzip. default: N

--zipsam

Convert sam files to bam after the analysis is done. default: N

--deletesam

Delete sam files. default: N

--keeptemp

Keep temporary sam and bedpe files. default: N

--map_ambiguous

Also map ambiguous reads without linker. default: N

--skipmap

Skip mapping read1 and read2, start from paired R1.sam and R2.sam, only valid in HiChIP mode now. default: N

--macs2

path of macs2; macs2 callpeak is used to detect anchor peaks from the alignment file. default: N

--nomodel

macs2 parameter, whether or not to build the shifting model in macs2. default: N

--shortestP

extend peaks so that the shortest peak is longer than N for loop calling; 1500 is suggested, and 0 skips this step. default: 1500

--shortestA

extend anchors so that the shortest anchor is longer than N for loop calling; 0 skips this step. default: 0

--XOR_cluster

Whether to keep loops if only one of the two anchors overlaps with a peak. default: N

--addcluster

Keep all regions with a read count of more than 2 as potential anchors for loop calling. default: N. If macs2 reports fewer than 10000 peaks, this parameter is enabled automatically.
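
The role of the distance threshold set by --SELF_LIGATION_CUFOFF can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's code: the function name `categorize_pet`, the use of the absolute difference between the two tag positions as the distance, the strict comparison at the boundary, and the label "inter-chromosomal" are all assumptions. Only the default values 8000 and 1000 come from the option description.

```python
def categorize_pet(chrom1, pos1, chrom2, pos2, cutoff=8000):
    """Assign a PET to a category from the positions of its two tags.

    `cutoff` plays the role of the self-ligation distance threshold
    (8000 by default for ChIA-PET, 1000 for HiChIP).
    """
    if chrom1 != chrom2:
        # Assumption: tags on different chromosomes form their own category.
        return "inter-chromosomal"
    distance = abs(pos2 - pos1)
    # Assumption: a distance equal to the cutoff counts as inter-ligation.
    if distance < cutoff:
        return "self-ligation"
    return "intra-chromosomal inter-ligation"


# The same PET can fall into different categories under the two defaults.
print(categorize_pet("chr1", 10000, "chr1", 13000))               # ChIA-PET default
print(categorize_pet("chr1", 10000, "chr1", 13000, cutoff=1000))  # HiChIP default
```

With a span of 3000 bp, the PET lies below the ChIA-PET default and above the HiChIP default, which shows why the cutoff should match the library type.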

Note: To use CPT3, you need to first index the genome with Build index.

Tip

For feature requests or bug reports, please open an issue on GitHub.