Skip to content

CPT3 Usage Alignment

Alignment¶

CPT3 align
- long-read or short-read
- Parameters

CPT3 align ¶

long-read or short-read ¶

ChIA-PET, HiCHIP sequencing paired-end data alignment:

An example usage is:
    java -jar ChIA-PET.jar --fastq1 test_long_1.fastq.gz --fastq2 test_long_2.fastq.gz \
    --minimum_linker_alignment_score 14 --GENOME_INDEX hg38.fa --GENOME_LENGTH 3E9 \
    --CHROM_SIZE_INFO ChIA-PET_Tool_V3/chromInfo/hg38.chromSize.txt \
    --CYTOBAND_DATA ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt --SPECIES 1 \
    --output test_long --prefix GM12878 --thread 8 --fastp fastp \
    --start_step 1 --stop_step 2

Parameters ¶

[ Main paramaters ]
System.out.println("Usage: java -jar <path of ChIA_PET.jar> [options]
Necessary options:
--fastq1		path of read1 fastq file
--fastq2		path of read2 fastq file
--autolinker		detect linker by our program, then no need provide --linker and --mode paramater.
--mode		mode of tool, 0: short read; 1: long read, need for ChIA-PET data
--linker		path of linker file, need for ChIA-PET mode
--fastp		fastp path, strong suggest for ChIA-PET data.
--skipheader		skip header N reads for detect linker, default 1000000.
--linkerreads		N reads used for detect linker, default 100000.
--hichip		Y(es) or N(o)[default] or O(nly print restriction site file without run other step), need for hichip data
--ligation_site		It can be the name of restriction enzyme, such as HindIII, MboI, DpnII, Bglii, Sau3AI, Hinf1, NlaIII, AluI or the site of enzyme digestion, A^AGCTT, ^GATC, ^GATC, A^GATCT, G^ANTC, CATG^, AG^CT or others. multipe restriction enzyme can be seperated by comma, such as G^ANTC,^GATC. restriction site with '^' and contains 'ATCG' without other character!!! if the genomic enzyme digestion file --restrictionsiteFile is provided, this parameter does not need to be provided. only needed for hichip data
--ResRomove		Y or N, whether remove PET in same restriction contig. default: Y
--restrictionsiteFile			restriction site file, can be genarated while has --ligation_site and without this paramater or provide restriction enzyme information with --ligation_site, we will automatically generate the file. only needed for hichip data
--genomefile		genome fasta file path, needed for with --ligation_site and without --restrictionsiteFile only needed for hichip data
--minfragsizeMinimum			restriction fragment length to consider, default 20
--maxfragsize		Maximum restriction fragment length to consider, default 1000000
--minInsertsize		Minimum restriction fragment skip of mapped reads to consider, default 1
--fqmode		single-end or paired-end (default), only required --fastq1 when single-end mode for ChIA-PET data
--minimum_linker_alignment_score							minimum alignment score, 14 default
--GENOME_INDEX		the path of BWA index
--GENOME_LENGTH		the number of base pairs in the whole genome
--CHROM_SIZE_INFO		the file that contains the length of each chromosome, example file is in ChIA-PET_Tool_V3/chrInfo, this is necessary for > step 2 analysis. Note. please make sure chromosome name in this file is same as name in genome file!!!
--CYTOBAND_DATA		the ideogram data used to plot intra-chromosomal peaks and interactions, example file is in ChIA-PET_Tool_V3/chrInfo
--SPECIES		1: human; 2: mouse; 3: others
Other options:
--start_step		start with which step, 1: linker filtering; 2: mapping to genome; 3: removing redundancy; 4: categorization of PETs; 5: peak calling; 6: interaction calling; 7: visualizing, default: 1
--stop_step		stop with which step, 1: linker filtering; 2: mapping to genome; 3: removing redundancy; 4: categorization of PETs; 5: peak calling; 6: interaction calling; 7: visualizing, default: 100, should be bigger than --start_step
--output		path of output, default: ChIA-PET_Tool_V3/output
--prefix		prefix of output files, default: out
--minimum_tag_length		minimum tag length, default: 18
--maximum_tag_length		maximum tag length, default: 1000
--minSecondBestScoreDiff					the score difference between the best-aligned and the second-best aligned linkers, default: 3
--output_data_with_ambiguous_linker_info								whether to print the linker-ambiguity PETs, 0: not print; 1: print, default: 1
--printreadID		write read ID to bedpe file, default: N
--printallreads		print all reads no matter strand, default: 0[print all]; 1, only print valid strand reads.
--search_all_linker		search all linkers in reads or just search one time, default: N
--thread	the number of threads used in linker filtering and mapping to genome, default: 1
--MAPPING_CUTOFF		cutoff of mapping quality score for filtering out low-quality or multiply-mapped reads, default: 20
--MERGE_DISTANCE		the distance limit to merge the PETs with similar mapping locations, default: 2
--SELF_LIGATION_CUFOFF			the distance threshold between self-ligation PETs and intra-chromosomal inter-ligation PETs, default: 8000 for ChIA, and 1000 for HiChIP
--EXTENSION_LENGTH		the extension length from the location of each tag, default: 500, 1500 suggest for single-end mode
--MIN_COVERAGE_FOR_PEAK				the minimum coverage to define peak regions, default: 5
--PEAK_MODE		1: peak region mode, which takes all the overlapping PET regions above the coverage threshold as peak regions; 2: peak summit mode, which takes the highest coverage of overlapping regions as peak regions, default: 2
--MIN_DISTANCE_BETWEEN_PEAK						the minimum distance between two peaks, default: 500
--GENOME_COVERAGE_RATIO						the estimated proportion of the genome covered by the reads, default: 0.8
--PVALUE_CUTOFF_PEAK						p-value to filter peaks that are not statistically significant, default: 0.00001
--INPUT_ANCHOR_FILE						a file which contains user-specified anchors for interaction calling, default: null
--PVALUE_CUTOFF_INTERACTION \| p-value to filter false positive interactions, default: 0.5
--zipbedpe	gzip bedpe related file, after analysis done. default: N. Y for gzip, N for not.
--zipsam	Convert sam file to bam, after analysis done. default: N
--deletesam	Delete sam files. default: N
--keeptemp	Keep temp sam and bedpe file. default: N
--map_ambiguous		Also mapping ambiguous reads without linker. default: N
--skipmap		Skip mapping read1 and read2, start from paired R1.sam and R2.sam, only valid in HiChIP mode now. default: N
--macs2		macs2 path, using macs2 callpeak to detect anchor peak with alignment file. default: N
--nomodel		macs2 parameter, Whether or not to build the shifting model in macs2. default: N
--shortestP		extend and keep shorest peak length longer than N for loop calling, suggest 1500, user can set 0 to skip this step. default: 1500
--shortestA		extend and keep shorest anchor length longer than N for loop calling, user can set 0 to skip this step. default: 0
--XOR_cluster		Whether keep loops if only one side of anchor is overlap with peak. default: N
--addcluster		Keep all regions with more than 2 count reads as potential anchor for calling loop. default: N. if peaks number of macs2 smaller than 10000, this paramater will work automaticly.

Note: To use CPT3, you need to first index the genome with Build index.

Tip

For feature requests or bug reports please open an issue on github.