Alignment
CPT3 align
long-read or short-read
Alignment of paired-end ChIA-PET and HiChIP sequencing data.
An example usage is:
java -jar ChIA-PET.jar --fastq1 test_long_1.fastq.gz --fastq2 test_long_2.fastq.gz \
--minimum_linker_alignment_score 14 --GENOME_INDEX hg38.fa --GENOME_LENGTH 3E9 \
--CHROM_SIZE_INFO ChIA-PET_Tool_V3/chromInfo/hg38.chromSize.txt \
--CYTOBAND_DATA ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt --SPECIES 1 \
--output test_long --prefix GM12878 --thread 8 --fastp fastp \
--start_step 1 --stop_step 2
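When the command is launched from a pipeline script, it can help to assemble the argument list programmatically. The following is a minimal Python sketch and not part of CPT3: the helper name `build_cpt3_command` is hypothetical, and the flags and values are simply copied from the example invocation. It only builds and prints the command; it does not run Java.

```python
import shlex


def build_cpt3_command(jar, options, start_step=1, stop_step=2):
    """Return the argument list for one CPT3 run (hypothetical helper)."""
    # The --stop_step description says it should be bigger than --start_step.
    if stop_step <= start_step:
        raise ValueError("--stop_step should be bigger than --start_step")
    cmd = ["java", "-jar", jar]
    for flag, value in options.items():
        cmd += [f"--{flag}", str(value)]
    cmd += ["--start_step", str(start_step), "--stop_step", str(stop_step)]
    return cmd


# Values copied from the example invocation; adjust the paths for your data.
options = {
    "fastq1": "test_long_1.fastq.gz",
    "fastq2": "test_long_2.fastq.gz",
    "minimum_linker_alignment_score": 14,
    "GENOME_INDEX": "hg38.fa",
    "GENOME_LENGTH": "3E9",
    "CHROM_SIZE_INFO": "ChIA-PET_Tool_V3/chromInfo/hg38.chromSize.txt",
    "CYTOBAND_DATA": "ChIA-PET_Tool_V3/chromInfo/hg19_cytoBandIdeo.txt",
    "SPECIES": 1,
    "output": "test_long",
    "prefix": "GM12878",
    "thread": 8,
    "fastp": "fastp",
}

cmd = build_cpt3_command("ChIA-PET.jar", options)
print(shlex.join(cmd))
```

To actually start the run, the list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.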
Parameters
[ Main parameters ]

Usage: java -jar <path of ChIA_PET.jar> [options]

Necessary options:

| Option | Description |
|---|---|
| --fastq1 | Path of the read 1 FASTQ file. |
| --fastq2 | Path of the read 2 FASTQ file. |
| --autolinker | Detect the linker automatically; --linker and --mode then do not need to be provided. |
| --mode | Mode of the tool, 0: short read; 1: long read. Needed for ChIA-PET data. |
| --linker | Path of the linker file. Needed for ChIA-PET mode. |
| --fastp | Path of fastp. Strongly suggested for ChIA-PET data. |
| --skipheader | Skip the first N reads when detecting the linker, default: 1000000. |
| --linkerreads | Number of reads used to detect the linker, default: 100000. |
| --hichip | Y(es), N(o) [default], or O(nly print the restriction site file without running the other steps). Needed for HiChIP data. |
| --ligation_site | Either the name of a restriction enzyme, such as HindIII, MboI, DpnII, Bglii, Sau3AI, Hinf1, NlaIII, AluI, or the enzyme digestion site, such as A^AGCTT, ^GATC, ^GATC, A^GATCT, G^ANTC, CATG^, AG^CT or others. Multiple restriction enzymes can be separated by commas, such as G^ANTC,^GATC. A restriction site must include '^' and contain only the characters 'ATCG', with no other characters. If the genomic enzyme digestion file is provided with --restrictionsiteFile, this parameter does not need to be provided. Only needed for HiChIP data. |
| --ResRomove | Y or N, whether to remove PETs in the same restriction contig, default: Y. |
| --restrictionsiteFile | Restriction site file. If this parameter is omitted and restriction enzyme information is provided with --ligation_site, the file is generated automatically. Only needed for HiChIP data. |
| --genomefile | Genome FASTA file path. Needed when --ligation_site is given without --restrictionsiteFile. Only needed for HiChIP data. |
| --minfragsize | Minimum restriction fragment length to consider, default: 20. |
| --maxfragsize | Maximum restriction fragment length to consider, default: 1000000. |
| --minInsertsize | Minimum restriction fragment skip of mapped reads to consider, default: 1. |
| --fqmode | single-end or paired-end (default). Only --fastq1 is required in single-end mode for ChIA-PET data. |
| --minimum_linker_alignment_score | Minimum linker alignment score, default: 14. |
| --GENOME_INDEX | Path of the BWA index. |
| --GENOME_LENGTH | Number of base pairs in the whole genome. |
| --CHROM_SIZE_INFO | File that contains the length of each chromosome; an example file is in ChIA-PET_Tool_V3/chrInfo. This is necessary for analysis beyond step 2. Note: make sure the chromosome names in this file are the same as the names in the genome file. |
| --CYTOBAND_DATA | Ideogram data used to plot intra-chromosomal peaks and interactions; an example file is in ChIA-PET_Tool_V3/chrInfo. |
| --SPECIES | 1: human; 2: mouse; 3: others. |
Other options:

| Option | Description |
|---|---|
| --start_step | Step to start with, 1: linker filtering; 2: mapping to genome; 3: removing redundancy; 4: categorization of PETs; 5: peak calling; 6: interaction calling; 7: visualizing, default: 1. |
| --stop_step | Step to stop with, 1: linker filtering; 2: mapping to genome; 3: removing redundancy; 4: categorization of PETs; 5: peak calling; 6: interaction calling; 7: visualizing, default: 100. Should be bigger than --start_step. |
| --output | Path of the output, default: ChIA-PET_Tool_V3/output. |
| --prefix | Prefix of the output files, default: out. |
| --minimum_tag_length | Minimum tag length, default: 18. |
| --maximum_tag_length | Maximum tag length, default: 1000. |
| --minSecondBestScoreDiff | Score difference between the best-aligned and the second-best-aligned linkers, default: 3. |
| --output_data_with_ambiguous_linker_info | Whether to print the PETs with ambiguous linkers, 0: do not print; 1: print, default: 1. |
| --printreadID | Write the read ID to the bedpe file, default: N. |
| --printallreads | Print all reads regardless of strand, 0: print all [default]; 1: only print reads with a valid strand. |
| --search_all_linker | Search all linkers in reads or search only once, default: N. |
| --thread | Number of threads used in linker filtering and mapping to genome, default: 1. |
| --MAPPING_CUTOFF | Cutoff of the mapping quality score for filtering out low-quality or multiply-mapped reads, default: 20. |
| --MERGE_DISTANCE | Distance limit for merging PETs with similar mapping locations, default: 2. |
| --SELF_LIGATION_CUFOFF | Distance cutoff between self-ligation PETs and inter-ligation PETs, default: 8000 for ChIA-PET and 1000 for HiChIP. |
| --EXTENSION_LENGTH | Extension length from the location of each tag, default: 500. 1500 is suggested for single-end mode. |
| --MIN_COVERAGE_FOR_PEAK | Minimum coverage to define peak regions, default: 5. |
| --PEAK_MODE | 1: peak region mode, which takes all the overlapping PET regions above the coverage threshold as peak regions; 2: peak summit mode, which takes the highest coverage of overlapping regions as peak regions, default: 2. |
| --MIN_DISTANCE_BETWEEN_PEAK | Minimum distance between two peaks, default: 500. |
| --GENOME_COVERAGE_RATIO | Estimated proportion of the genome covered by the reads, default: 0.8. |
| --PVALUE_CUTOFF_PEAK | p-value cutoff to filter out peaks that are not statistically significant, default: 0.00001. |
| --INPUT_ANCHOR_FILE | File that contains user-specified anchors for interaction calling, default: null. |
| --PVALUE_CUTOFF_INTERACTION | p-value cutoff to filter out false positive interactions, default: 0.5. |
| --zipbedpe | Gzip the bedpe-related files after the analysis is done, Y: gzip; N: do not gzip, default: N. |
| --zipsam | Convert sam files to bam after the analysis is done, default: N. |
| --deletesam | Delete sam files, default: N. |
| --keeptemp | Keep temporary sam and bedpe files, default: N. |
| --map_ambiguous | Also map ambiguous reads without a linker, default: N. |
| --skipmap | Skip mapping of read 1 and read 2 and start from the paired R1.sam and R2.sam. Currently only valid in HiChIP mode, default: N. |
| --macs2 | Path of macs2; macs2 callpeak is used to detect anchor peaks from the alignment file, default: N. |
| --nomodel | macs2 parameter: whether or not to build the shifting model in macs2, default: N. |
| --shortestP | Extend peaks so that the shortest peak is longer than N for loop calling; 1500 is suggested, and 0 skips this step, default: 1500. |
| --shortestA | Extend anchors so that the shortest anchor is longer than N for loop calling; 0 skips this step, default: 0. |
| --XOR_cluster | Whether to keep loops if only one anchor overlaps a peak, default: N. |
| --addcluster | Keep all regions with a read count of more than 2 as potential anchors for loop calling, default: N. If the number of macs2 peaks is smaller than 10000, this parameter takes effect automatically. |
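The --CHROM_SIZE_INFO description asks that chromosome names match those in the genome file, and --GENOME_LENGTH expects the total number of base pairs. Both can be checked before a run. The sketch below is in Python and is not part of CPT3: the function names are hypothetical, and it assumes that the chromosome size file holds one whitespace-separated `name length` pair per line and that the genome is an uncompressed FASTA file. The toy files are written only to make the example self-contained.

```python
import os
import tempfile


def read_chrom_sizes(path):
    """Read 'name length' pairs (assumed file layout) into a dict."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                sizes[fields[0]] = int(fields[1])
    return sizes


def read_fasta_names(path):
    """Collect sequence names from the '>' header lines of a FASTA file."""
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    # Keep only the first word of the header line.
                    names.append(fields[0])
    return names


def compare_names(sizes, fasta_names):
    """Return names found only in the size file and only in the genome."""
    size_names = set(sizes)
    genome_names = set(fasta_names)
    return sorted(size_names - genome_names), sorted(genome_names - size_names)


# Toy input with a deliberate mismatch: "chr2" versus "2".
with tempfile.TemporaryDirectory() as tmp:
    size_path = os.path.join(tmp, "toy.chromSize.txt")
    fasta_path = os.path.join(tmp, "toy.fa")
    with open(size_path, "w") as handle:
        handle.write("chr1\t8\nchr2\t4\n")
    with open(fasta_path, "w") as handle:
        handle.write(">chr1\nACGTACGT\n>2\nACGT\n")
    sizes = read_chrom_sizes(size_path)
    only_in_sizes, only_in_genome = compare_names(
        sizes, read_fasta_names(fasta_path)
    )

print("total length:", sum(sizes.values()))
print("only in size file:", only_in_sizes)
print("only in genome:", only_in_genome)
```

If both lists are empty, the names agree. The printed total is one possible value for --GENOME_LENGTH, although the example invocation uses a rounded figure (3E9) instead.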
Note: Before using CPT3, you first need to index the genome as described in Build index.
Tip
For feature requests or bug reports, please open an issue on GitHub.