mahmoudibrahim edited this page Aug 18, 2016 · 5 revisions

Running JAMM

To run JAMM, just call the main JAMM bash script from your terminal. For example:

bash /path/to/JAMM/folder/JAMM.sh

JAMM Parameters

  • -s (Sample file directory - Required):

This is the path to a directory containing Sample files. All files need to end in .bed and follow the BED file specification (at least BED6), but without using any headers.

If multiple BED files are found in the directory, they will all be analyzed as replicates. Files that do not end with .bed will be ignored.

Note: If there are multiple BED files (replicates), the first segment of each file name (the part before the first ".") must be unique. "rep.1.bed" and "rep.2.bed" will NOT work, but "rep1.bed" and "rep2.bed" will.

Note: Chromosomes with "." in their name currently cause problems and will not be analyzed properly. We will fix this in future versions.
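The file naming rule above can be checked before a run. The following is a minimal sketch, not part of JAMM: duplicate_first_segments is a hypothetical helper, and it assumes that "first segment" means the text before the first "." in the file name.

```shell
# Hypothetical helper (not part of JAMM): print every first name segment
# that is shared by more than one .bed file in the given directory.
# Assumes "first segment" means the text before the first "." in the name.
duplicate_first_segments() {
    for f in "$1"/*.bed; do
        [ -e "$f" ] || continue        # directory contains no .bed files
        base="${f##*/}"                # strip the directory part
        printf '%s\n' "${base%%.*}"    # keep the text before the first "."
    done | sort | uniq -d              # print only repeated segments
}
```

Calling duplicate_first_segments /directory/with/sample/files prints nothing when all first segments are unique.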

  • -g (Genome size file - Required):

The genome size file is a tab-delimited file with the chromosome name in the first column and the chromosome size (length) in the second column. Chromosome names must match those in the BED files. The file must NOT contain any headers.

Genome size files for the most commonly used genome assemblies can be downloaded here. Otherwise, chromosome sizes for all UCSC genome assemblies can be obtained using the UCSC fetchChromSizes script.

Note: Chromosomes with "." in their name currently cause problems and will not be analyzed properly. We will fix this in future versions.
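The format requirements above lend themselves to a quick check with awk. The two helpers below are hypothetical and not part of JAMM. The first reports lines that are not "name, tab, size" (which also catches header lines) and chromosome names containing a "."; the second lists chromosomes that appear in a BED file but not in the genome size file.

```shell
# Hypothetical helpers (not part of JAMM).

# Report malformed lines and chromosome names that contain a dot.
# Exit status is non-zero if any problem was found.
check_genome_sizes() {   # usage: check_genome_sizes genome_size_file.txt
    awk -F '\t' '
        NF != 2 || $2 !~ /^[0-9]+$/ { printf "line %d: expected name<TAB>size\n", NR; bad = 1 }
        $1 ~ /\./ { printf "line %d: chromosome name contains a dot: %s\n", NR, $1; bad = 1 }
        END { exit bad }
    ' "$1"
}

# Print each chromosome found in the BED file but absent from the
# genome size file (each name is printed once).
missing_chromosomes() {  # usage: missing_chromosomes sample.bed genome_size_file.txt
    awk -F '\t' '
        NR == FNR { known[$1] = 1; next }        # first file: genome sizes
        !($1 in known) && !seen[$1]++ { print $1 }
    ' "$2" "$1"
}
```

Both helpers print nothing when the files are consistent with the rules described above.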

  • -o (Output directory - Required):

A directory for JAMM to create its output. See below for more information. This can be the same directory as the Sample directory.

  • -c (Control files directory - Optional):

If you have control files (like input for ChIP-Seq), you need to store them in a separate directory and give it here. This must be a different directory from the Sample files directory. Files need to end in .bed and follow the BED file specification (at least BED6).

If there are multiple files in this directory, they will be concatenated together.

Note: Chromosomes with "." in their name currently cause problems and will not be analyzed properly. We will fix this in future versions.

Default: None

  • -f (Fragment Length(s) - Optional):

Reads will be extended to the length provided here before creating read count pile-ups. For multiple files (including control), separate the numbers by commas. Order the numbers by the alphabetical order of the file names.

Setting this to 1 truncates reads to their 5' ends, which is suitable for DNase-Seq and ATAC-Seq.

Note: If you set "-t" to "paired" (starting version 1.0.5), fragment lengths specified here will only be used for bin size estimation.

Default: Estimated from data via cross-correlation analysis
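To make the effect of the fragment length concrete, here is a small awk sketch of strand-aware read extension. It is an illustration of the usual convention only, not JAMM's own code: extend_reads is a hypothetical name, the input is assumed to be BED6 with the strand in column 6, and coordinates are not clipped at chromosome ends.

```shell
# Illustration only, not JAMM's own code: extend each BED6 read to a fixed
# length, measured from its 5 prime end and respecting the strand (column 6).
extend_reads() {   # usage: extend_reads FRAGMENT_LENGTH < reads.bed
    awk -F '\t' -v OFS='\t' -v len="$1" '
        # plus strand: the 5 prime end is the start coordinate
        $6 == "+" { $3 = $2 + len }
        # minus strand: the 5 prime end is the end coordinate
        $6 == "-" { $2 = $3 - len; if ($2 < 0) $2 = 0 }
        { print }
    '
}
```

With a length of 1, each read collapses to the single base at its 5' end, which corresponds to the DNase-Seq and ATAC-Seq setting described above.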

  • -r (peak calling Resolution - Optional):

Setting this to peak will score and report peaks as they are found. Setting this to region will merge peaks that are in the same enriched window. This means that region is more suited if you are only interested in broader regions of enrichment and not in separating closely spaced peaks.

Setting this to window (available starting version 1.0.5) will skip signal clustering in local windows and will just score and report the whole window as one peak.

Available values: peak / region / window

Default: peak

  • -m (peak calling Mode - Optional):

Setting this to normal makes JAMM assume that read counts originate from enriched regions and non-enriched regions (peaks and noise). Setting this to narrow makes JAMM assume that read counts originate from enriched regions, tails of enriched regions and non-enriched regions (peaks, peak tails and noise).

This means that narrow mode is more suited to narrower regions and better separation of closely-spaced peaks.

This parameter will have no effect if "-r" is set to "window".

Available values: normal / narrow

Default: normal

  • -i (mixture model Initialization - Optional - starting version 1.0.7rev1):

Setting this to deterministic makes JAMM initialize the Gaussian mixture model using the top scoring 0.1% of the enriched windows discovered for every chromosome separately. This is the same as JAMMv1.0.4rev1.

Setting this to stochastic makes JAMM initialize the Gaussian mixture model using at most 20 of the top scoring 25% of the enriched windows discovered for every chromosome separately. Those 20 windows are chosen randomly. This is the same as JAMMv1.0.5 and JAMMv1.0.6revX.

Available values: deterministic / stochastic

Default: deterministic

  • -b (Bin size - Optional):

This is the size of the bins that JAMM partitions the genome into to check for enrichment.

Default: estimated from data

  • -e (window fold Enrichment - Optional - starting version 1.0.6rev1):

Windows must have at least this fold enrichment (sample signal divided by background signal) to be considered for further analysis. This defaults to 1, which matches the behavior of all previous versions: any window that has higher signal than background is considered.

By setting this to "auto", JAMM will estimate a minimum fold enrichment based on a log-normal distribution of fold enrichments.

You can set an arbitrary fold-enrichment cutoff by setting -e to your desired value.

Default: 1

  • -d (keep PCR Duplicates - Optional - starting version 1.0.7rev1):

Set this to y to keep all reads. Set it to n to keep only one read per basepair/strand in the genome (similar to samtools rmdup).

When -t is set to "paired", this option has no effect (all fragments are kept).

Available values: y / n

Default: n

NOTE: In v1.0.7rev1 ONLY, this option works only for sorted BED files (sorted, for example, with bedtools sort).
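If bedtools is not at hand, a BED file can also be sorted with the standard sort utility. The sketch below assumes that the ordering needed is by chromosome name and then by start coordinate, which is what bedtools sort produces by default; sort_bed is a hypothetical helper name.

```shell
# Sort a BED file by chromosome name (column 1), then numerically by
# start coordinate (column 2). Assumes tab-delimited input without headers.
sort_bed() {   # usage: sort_bed unsorted.bed > sorted.bed
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n "$1"
}
```

Write the result to a new file rather than redirecting onto the input file, since the redirection would empty the input before it is read.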

  • -t (alignment Type - Optional - starting version 1.0.5):

This is the type of alignment that you have. Setting this to single means you have single-end alignment files. Setting this to paired means you have paired-end alignment files.

For paired-end alignments, files still need to end in .bed but must be in BEDPE format.

Available values: single / paired

Default: single

  • -w (minimum Window size - Optional):

The minimum size of the windows that JAMM checks for peaks. Smaller windows are ignored. This parameter is set relative to the bin size. For example, setting it to 3 means that the minimum window size is 3 times the bin size.

Default: 2

  • -p (number of Processors used - Optional):

The number of processors or cores for parallel processing. This only affects the R scripts for fragment length estimation and peak finding.

JAMM uses the R package "parallel" which is maintained by the R Core team.

Default: 1
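When choosing a value for -p, it can help to look up how many cores the machine has. A minimal sketch, assuming a system where getconf knows the _NPROCESSORS_ONLN variable (common on Linux and macOS); the fallback value of 1 mirrors JAMM's default.

```shell
# Look up the number of online cores; fall back to 1 if it cannot be determined.
cores="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
echo "cores available: $cores"
```

The value in $cores, or a smaller number if the machine is shared, can then be passed to JAMM as -p "$cores".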

Secondary JAMM Parameters

The following parameters are not accessible from the command line. Users can modify them directly in JAMM's scripts under User-defined variables.

  • reportNoClust (Available in peakfinder.r - starting v1.0.7rev1):

This is to report windows where mixture model clustering did not work (did not converge or replicates did not agree on clustering assignments). The reported windows will be flagged with "noClust" in their name in the 4th column.

  • samplingSeed (Available in peakfinder.r - starting v1.0.7rev1):

The seed that JAMM will use for all random sampling and randomization steps. The default is 1011414, but you can randomize or change it here.

  • meanAdjust (Available in peakfinder.r - starting v1.0.6rev1):

This is to adjust the initial mean vector of every window before the EM clustering algorithm starts. Use this if you think JAMM is "missing" a lot of peaks. WARNING: not fully tested, use at your own risk!

  • cutoff (Available in peakfinder.r):

This is to set an arbitrary signal-to-noise ratio cutoff for finding enriched bins. By default JAMM sets the cutoff signal-to-noise ratio for bin enrichment determination to be the average SNR of the whole corresponding chromosome.

  • strict (Available in peakfinder.r):

By default JAMM does NOT require any fold enrichment for signal vs. background (strict = 1). Signal just has to be above background for a bin to be considered enriched. This is to guarantee that JAMM reports a large number of peaks. Note that JAMM is slower than typical peak finders mainly because of this.

  • defaultBins (Available in bincalculator.r):

The bin size search space for bin size estimation is determined by the fragment length. When the user sets a fragment length that is equal to or less than read length, the vector defined by defaultBins determines the search space for the optimal bin size.

It defaults to: seq(50, 50*15, by = 50) - which means the minimum bin size is 50 and the maximum is 750, at 50 basepair steps.
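For readers less familiar with R, the same search space can be listed from the shell. This is only an illustration of the values involved, assuming the seq utility is available; it does not change anything in bincalculator.r.

```shell
# Shell equivalent of R's seq(50, 50*15, by = 50): first value, step, last value.
bins="$(seq 50 50 750)"
# Print the candidate bin sizes on one line.
echo $bins
```

The list contains 15 candidate bin sizes, from 50 to 750 basepairs.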

Examples

  • Transcription Factor ChIP-Seq:

bash JAMM.sh -s /directory/with/sample/files -g genome_size_file.txt -o /directory/for/output -c /directory/with/input/files

  • DNase-Seq, two replicates:

bash JAMM.sh -s /directory/with/sample/files -g genome_size_file.txt -o /directory/for/output -f 1,1

  • Histone Modification ChIP-Seq, three replicates and input:

bash JAMM.sh -s /directory/with/sample/files -g genome_size_file.txt -o /directory/for/output -c /directory/with/input/files -f 150,139,122,155
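Before launching any of the commands above, it may be worth confirming that the required inputs exist. The function below is a hypothetical pre-flight check, not part of JAMM. It only tests that the sample directory exists and contains at least one .bed file, and that the genome size file exists.

```shell
# Hypothetical pre-flight check (not part of JAMM): verify the inputs for
# -s and -g before starting a run. Returns non-zero on the first problem.
jamm_preflight() {   # usage: jamm_preflight SAMPLE_DIR GENOME_SIZE_FILE
    [ -d "$1" ] || { echo "sample directory not found: $1" >&2; return 1; }
    ls "$1"/*.bed >/dev/null 2>&1 || { echo "no .bed files in: $1" >&2; return 1; }
    [ -f "$2" ] || { echo "genome size file not found: $2" >&2; return 1; }
}
```

A run could then be guarded like this: jamm_preflight /directory/with/sample/files genome_size_file.txt && bash JAMM.sh -s /directory/with/sample/files -g genome_size_file.txt -o /directory/for/output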

Further Notes

JAMM's default peak calling settings, -m normal -r peak, are applicable to a wide range of datasets.

Try -m normal -r region if you are really only interested in obtaining "broader" regions of enrichment and you are NOT interested in "narrower" peaks with accurate boundaries.

Try -m narrow -r peak if you want to increase and refine separation of closely spaced peaks, or if you are expecting very narrow peaks from high-resolution protocols.

More Questions?

If you still have questions on how to use JAMM, please email us at this email