Skip to main content

Mark Duplicates (Picard)

Picard - MarkDuplicates

mark-duplicates

Removes or marks duplicate reads in paired-end sequencing given identical 5' read positions. Read more in the Picard documentation.

File inputs (BAM)

This script filters BAM-type files so make sure your inputs are properly formatted and use the appropriate .bam extension. The script also supports bulk selection and processing of files.

caution

Make sure your BAM input files are sorted and indexed.

Output file (BAM & TXT)

The output BAM files are named based on the input filenames. The _dedup.bam suffix is used for each output. For example, for a given XXX.bam input file, a new XXX_dedup.bam file will be written to the user-selected output directory.

The output text files are also named based on the input filenames. The _dedup.metrics suffix is used for each output. For example, for a given XXX.bam input file, a new XXX_dedup.metrics file will be written to the user-selected output directory.

tip

Make sure if you change the output BAM filename that you keep the .bam file extension.

Example XXX_dedup.metrix output (TXT)


Remove Duplicates Option

Each output file can either remove one of the duplicate reads or mark them by changing the SAMFlag value to update the status as "duplicate." This checkbox option determines whether the duplicate reads are kept or just marked.

Generate BAI file (GUI only)

By checking this box, the script will automatically generate a BAI index file for each new filtered BAM file.

note

The CLI will not index the resulting BAM file. The user must use appropriate samtools/Picard command to generate the BAI.

Command Line Interface (Picard and Samtools)

CommandLine tools already exist for this function. This tool only exists as a GUI wrapper in ScriptManager.

Please see the Samtools markdup tool or the Picard MarkDuplicates tool for a command line tool that performs this function.