
File Formats

ScriptManager tools use a variety of standard file formats, including BAM, GFF, BED, and CDT, along with some custom formats. This guide helps users understand what information each format stores and find ScriptManager tools based on the format of their data.

Read More

This page gives only a brief description of each file format. Other resources on the internet provide more detailed descriptions and context for users who want further explanation (see links below).

Alignment Formats

SAM - Sequence Alignment Map

See BAM. ScriptManager does not generally support SAM files because of the computational strain they put on hardware. It is strongly recommended to compress SAM files into BAM format before analysis.
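As a rough sketch of the conversion recommended above, assuming samtools is installed and available on the PATH, a SAM file can be compressed to BAM with `samtools view -b`. The helper names and file paths here are hypothetical, and ScriptManager itself is not involved in this step.

```python
import subprocess


def sam_to_bam_command(sam_path, bam_path):
    """Build the samtools command that converts a SAM file to BAM."""
    # -b requests BAM output; -o names the output file.
    return ["samtools", "view", "-b", "-o", bam_path, sam_path]


def convert_sam_to_bam(sam_path, bam_path):
    """Run the conversion; raises CalledProcessError if samtools fails."""
    subprocess.run(sam_to_bam_command(sam_path, bam_path), check=True)
```

Calling `convert_sam_to_bam("sample.sam", "sample.bam")` would write the BAM file next to the original, provided samtools can be found.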

BAM - Binary Alignment Map

The binary form of the SAM file format, this is one of the most common formats used by ScriptManager. It is the output of aligners when aligning reads to a reference sequence. See Samtools documentation or the documentation from the alignment tool for specification info.
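Because BAM is binary, a file extension alone does not confirm the format. The sketch below is one way to check, relying on two facts from the SAM/BAM specification: BAM files are BGZF-compressed (which Python's `gzip` module can read) and the decompressed data begins with the magic bytes `BAM\1`. The function name is hypothetical and this is not how ScriptManager validates its inputs.

```python
import gzip


def looks_like_bam(path):
    """Return True if the file decompresses to data starting with the BAM magic bytes."""
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        # Not gzip-compatible (for example, a plain-text SAM file) or truncated.
        return False
```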

Related Tools:

Input / Output
bam-correlation
bam-indexer
bam-to-bedgraph
bam-to-bed
bam-to-gff
bam-to-scidx
filter-pip-seq (input and output)
md5checksum
merge-bam (input and output)
pe-stat
mark-duplicates (input and output)
scaling-factor
se-stat
signal-dup
sort-bam (input and output)
tag-pileup

Coordinate/Annotation Formats

BED - Browser Extensible Data

A text-based file format for storing information about genomic regions. ScriptManager supports 0-based and 1-based BED files.

Related Tools:

Input / Output
bam-to-bed
bed-to-gff
dna-shape-bed
expand-bed (input and output)
fasta-extract
filter-bed (input and output)
gff-to-bed
peak-align-ref
rand-coord
search-motif
sort-bed (input and output)
tag-pileup

GFF/GTF - General Feature Format

The GTF/GFF/GFF3 file specifications are documented in several places around the bioinformatics community. See Ensembl for specification info.

Importantly, note that both the start and end coordinates are 1-indexed and inclusive.
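To make the coordinate difference concrete, here is a minimal sketch of converting between GFF coordinates and the conventional BED interval, assuming the BED record is 0-based and half-open (start inclusive, end exclusive). The function names are hypothetical and are not the implementation of the bed-to-gff or gff-to-bed tools.

```python
def bed_to_gff_coords(bed_start, bed_end):
    """Convert a 0-based, half-open BED interval to a 1-based, inclusive GFF interval."""
    # Only the start shifts: the exclusive 0-based end equals the inclusive 1-based end.
    return bed_start + 1, bed_end


def gff_to_bed_coords(gff_start, gff_end):
    """Convert a 1-based, inclusive GFF interval to a 0-based, half-open BED interval."""
    return gff_start - 1, gff_end


# The first 100 bases of a chromosome in each convention.
print(bed_to_gff_coords(0, 100))  # (1, 100)
print(gff_to_bed_coords(1, 100))  # (0, 100)
```

In both conventions the interval length is preserved: `bed_end - bed_start` equals `gff_end - gff_start + 1`.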

Related Tools:

Input / Output
bam-to-gff
bed-to-gff
expand-gff (input and output)
gff-to-bed
peak-align-ref
rand-coord
signal-dup
sort-gff (input and output)
tile-genome

Sequence formats

FASTA

A simple, text-based format for representing DNA or protein sequences. Files in the FASTA format may have different extensions, including .fasta, .fna, .ffn, .frn, .fa, or even .txt.
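As an illustration of how simple the format is, the sketch below parses FASTA text in which each record starts with a `>` header line followed by one or more sequence lines. The function name is hypothetical, and the sketch keeps the whole header line as the record name rather than splitting off a description.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped; collect them for joining below.
            records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


example = ">seq1\nACGT\nTTAA\n>seq2\nGG\n"
print(parse_fasta(example))  # {'seq1': 'ACGTTTAA', 'seq2': 'GG'}
```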

Related Tools:

Input / Output
dna-shape-bed
dna-shape-fasta
fasta-extract (input and output)
four-color
randomize-fasta (input and output)
search-motif

Matrix formats

CDT - Clustered Data Table

A standard format for matrices, with two row headers and one column header. Values are separated by \t characters, making these files a subset of the TAB format. Read more about the format here.
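The sketch below shows one way to read such a matrix. It assumes a layout where the first line is the column header and the first two fields of every later line are the row headers, with numeric values in the remaining fields. The function name and sample labels are hypothetical, and CDT variants that carry extra columns or rows are not handled.

```python
import csv
import io


def parse_cdt(text):
    """Split CDT text into column labels, (id, name) row labels, and a numeric matrix."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    column_labels = header[2:]  # skip the labels of the two row-header columns
    row_labels = []
    values = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        row_labels.append((row[0], row[1]))
        values.append([float(x) for x in row[2:]])
    return column_labels, row_labels, values


example = "YORF\tNAME\t-1\t0\t1\ngeneA\tgeneA\t0.0\t2.5\t1.0\n"
columns, rows, matrix = parse_cdt(example)
print(columns)  # ['-1', '0', '1']
print(matrix)   # [[0.0, 2.5, 1.0]]
```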

Related Tools:

Input / Output
aggregate-data
composite
dna-shape-bed
dna-shape-fasta
heatmap
peak-align-ref
scale-matrix (input and output)
transpose-matrix (input and output)
sort-bed
tag-pileup

TAB/TSV - Tab-separated format

Also called "tab-delimited" format.

A text-based format for storing matrices with values separated by \t characters. These files can be easily viewed in Excel or Google Sheets.

Related Tools:

Input / Output
aggregate-data (input and output)
heatmap
tag-pileup
scale-matrix (input and output)

Image formats

PNG - Portable Network Graphics

A standard, lossless image format used for storing figures.

Related Tools:

Input / Output
bam-correlation
composite
four-color
heatmap
merge-heatmap (input and output)

Genome Browser Track formats

bedGraph

A format used for plotting one value of quantitative data across a genome or region. This format is most closely related to the wiggle format and is always 0-based.
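To illustrate the 0-based layout, the sketch below collapses a list of per-base values into bedGraph lines of the form chromosome, start, end, value, where the start is 0-based and the end is exclusive. The function name is hypothetical, zero-valued runs are skipped by choice, and this is not a description of how bam-to-bedgraph computes its output.

```python
def coverage_to_bedgraph(chrom, values):
    """Collapse per-base values (index 0 = first base) into bedGraph lines."""
    lines = []
    start = 0
    for pos in range(1, len(values) + 1):
        # Close the current run at the end of the list or when the value changes.
        if pos == len(values) or values[pos] != values[start]:
            if values[start] != 0:
                lines.append(f"{chrom}\t{start}\t{pos}\t{values[start]}")
            start = pos
    return lines


for line in coverage_to_bedgraph("chr1", [0, 2, 2, 3, 0]):
    print(line)
# chr1	1	3	2
# chr1	3	4	3
```

The second and third bases (0-based positions 1 and 2) share the value 2, so they are written as the single interval 1 to 3.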

Related Tools:

Input / Output
bam-to-bedgraph

scIDX - Strand-specific coordinate count

A lesser-used, 1-based format for storing the number of tags at a given coordinate. Files using this format may also use the .tab extension since it is a subset of the TAB format.

Related Tools:

Input / Output
bam-to-scidx (file has the .tab extension)

Generic formats

TXT - Text file

A standard format for storing text. Some text files may have the .out extension.

Related Tools:

Input / Output
bam-correlation
md5checksum
pe-stat
scaling-factor
se-stat
signal-dup

See our Tool Index for the full catalog of scripts.