File Formats
ScriptManager tools use a variety of standard file formats, including BAM, GFF, BED, and CDT, along with some custom formats. This guide helps users understand what information each format stores and find ScriptManager tools based on the format of their data.
Read More
This page gives only a brief overview of each file format. Other resources on the internet provide more detailed descriptions and context for users who want a fuller explanation of the data formats (see links below).
Alignment Formats
SAM - Sequence Alignment Map
See BAM. ScriptManager does not generally support SAM files because of the computational strain they put on hardware. It is strongly recommended to compress SAM files into BAM format before analysis.
BAM - Binary Alignment Map
The binary form of the SAM file format, and one of the most common formats used by ScriptManager. It is the output of aligners when aligning reads to a reference sequence. See the Samtools documentation or the documentation of the alignment tool for specification info.
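To check whether a file is already in BAM rather than SAM form, one option is to sniff its first bytes. The sketch below assumes only what the SAM/BAM specification states: a BAM file is BGZF (gzip-compatible) compressed and its decompressed stream starts with the four bytes `BAM\1`. The name `looks_like_bam` is hypothetical and not part of ScriptManager.

```python
import gzip

# First four bytes of a decompressed BAM stream, per the SAM/BAM specification.
BAM_MAGIC = b"BAM\x01"


def looks_like_bam(path):
    """Return True if the file is gzip/BGZF-compressed and starts with the BAM magic."""
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == BAM_MAGIC
    except (OSError, EOFError):
        # Not valid gzip data (for example a plain-text SAM file), so not a BAM file.
        return False
```

A plain-text SAM file fails the gzip check and returns `False`, which is a cue to convert it before running it through BAM-based tools.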
Related Tools:
Coordinate/Annotation Formats
BED - Browser Extensible Data
A text-based file format for storing information about genomic regions. ScriptManager supports 0-based and 1-based BED files.
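As an illustration of what a BED line stores, here is a minimal parsing sketch. It assumes the common six-column layout (chrom, start, end, name, score, strand) and the standard UCSC convention of a 0-based start with an exclusive end; `BedRecord` and `parse_bed_line` are hypothetical names, not ScriptManager code.

```python
from typing import NamedTuple


class BedRecord(NamedTuple):
    chrom: str
    start: int          # assumed 0-based
    end: int            # assumed exclusive
    name: str = "."
    score: str = "."
    strand: str = "."


def parse_bed_line(line):
    """Parse one tab-separated BED line; only the first three columns are required."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start, and end")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    # Optional columns (name, score, strand) fall back to "." when absent.
    return BedRecord(chrom, start, end, *fields[3:6])


record = parse_bed_line("chr1\t100\t150\tpeak_1\t0\t+\n")
print(record.end - record.start)  # region length under the 0-based, end-exclusive assumption: 50
```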
Related Tools:
Input | Output |
---|---|
bam-to-bed | |
bed-to-gff | |
dna-shape-bed | |
expand-bed | expand-bed |
fasta-extract | |
filter-bed | filter-bed |
gff-to-bed | |
peak-align-ref | |
rand-coord | |
search-motif | |
sort-bed | sort-bed |
tag-pileup | |
GFF/GTF - General Feature Format
The GTF/GFF/GFF3 file specifications are documented in several places around the bioinformatics community. See Ensembl for specification info.
Note that both the start and end coordinates are 1-indexed and inclusive.
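The coordinate convention matters when moving regions between formats. The sketch below converts between standard BED coordinates (assumed 0-based start, exclusive end) and GFF coordinates (1-based, inclusive). The function names are hypothetical, and this is not necessarily how the bed-to-gff or gff-to-bed tools are implemented.

```python
def bed_to_gff_coords(bed_start, bed_end):
    """Convert a 0-based, end-exclusive interval to a 1-based, inclusive one."""
    return bed_start + 1, bed_end


def gff_to_bed_coords(gff_start, gff_end):
    """Convert a 1-based, inclusive interval to a 0-based, end-exclusive one."""
    return gff_start - 1, gff_end


gff_start, gff_end = bed_to_gff_coords(100, 150)
print(gff_start, gff_end)            # 101 150
print(gff_end - gff_start + 1)       # inclusive length: 50, same as 150 - 100
```

Only the start shifts by one; the end value is numerically the same in both conventions, so the region length is preserved.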
Related Tools:
Input | Output |
---|---|
bam-to-gff | |
bed-to-gff | |
expand-gff | expand-gff |
gff-to-bed | |
peak-align-ref | |
rand-coord | |
signal-dup | |
sort-gff | sort-gff |
tile-genome | |
Sequence formats
FASTA
A simple, text-based format for representing DNA or protein sequences. Files in the FASTA format may have different extensions, including .fasta, .fna, .ffn, .frn, .fa, or even .txt.
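For readers unfamiliar with the layout, a FASTA record is a header line starting with `>` followed by one or more sequence lines. The following is a minimal reader sketch under that assumption; `read_fasta` is a hypothetical helper, not a ScriptManager function.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence may be wrapped over several lines; join them later.
            chunks.append(line)
        # Sequence lines that appear before any header are ignored.
    if header is not None:
        yield header, "".join(chunks)


records = list(read_fasta([">seq1 example", "ACGT", "TTAA", "", ">seq2", "GG"]))
print(records)  # [('seq1 example', 'ACGTTTAA'), ('seq2', 'GG')]
```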
Related Tools:
Input | Output |
---|---|
dna-shape-bed | |
dna-shape-fasta | |
fasta-extract | fasta-extract |
four-color | |
randomize-fasta | randomize-fasta |
search-motif | |
Matrix formats
CDT - Clustered Data Table
A standard format for matrices, with two row headers and one column header. Values are separated by \t characters, making these files a subset of the TAB format. Read more about the format here.
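A rough sketch of reading such a matrix follows. It assumes that "two row headers" means the first two columns label each row (for example an ID and a name), that the first line is the column header, and that all remaining cells are numeric. `read_cdt` is a hypothetical name used only for illustration.

```python
import csv
import io


def read_cdt(handle):
    """Read a tab-separated matrix with one header line and two label columns."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    column_labels = header[2:]           # skip the two row-label column titles
    row_labels, matrix = [], []
    for row in reader:
        if not row:
            continue                     # tolerate blank lines
        row_labels.append((row[0], row[1]))
        matrix.append([float(value) for value in row[2:]])
    return column_labels, row_labels, matrix


example = io.StringIO("YORF\tNAME\t0\t1\t2\nregion_1\tregion_1\t1\t0\t2.5\n")
columns, rows, values = read_cdt(example)
print(columns, rows, values)
```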
Related Tools:
Input | Output |
---|---|
aggregate-data | |
composite | |
dna-shape-bed | |
dna-shape-fasta | |
heatmap | |
peak-align-ref | |
scale-matrix | scale-matrix |
transpose-matrix | transpose-matrix |
sort-bed | |
tag-pileup | |
TAB/TSV - Tab-separated format
or "Tab-delimited" format
A text-based format for storing matrices with values separated by \t characters. These files can be easily viewed in Excel or Google Sheets.
Related Tools:
Input | Output |
---|---|
aggregate-data | aggregate-data |
heatmap | |
tag-pileup | |
scale-matrix | scale-matrix |
Image formats
PNG - Portable Network Graphics
A standard, lossless image format used for storing figures.
Related Tools:
Input | Output |
---|---|
bam-correlation | |
composite | |
four-color | |
heatmap | |
merge-heatmap | merge-heatmap |
Genome Browser Track formats
bedGraph
A format used for plotting one value of quantitative data across a genome or region. This format is most closely related to the wiggle format and is always 0-based.
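To illustrate the layout, each bedGraph data line holds a chromosome, a 0-based start, an exclusive end, and a value. The sketch below collapses a list of per-base values into such lines, merging runs of equal values and skipping zeros. `to_bedgraph_lines` is a hypothetical helper and does not reflect how bam-to-bedgraph is implemented.

```python
from itertools import groupby


def to_bedgraph_lines(chrom, values, offset=0):
    """Collapse per-base values into bedGraph lines.

    values[i] is the score at 0-based position offset + i.
    Runs of identical values become one line; zero-valued runs are skipped.
    """
    lines = []
    position = offset
    for value, run in groupby(values):
        length = sum(1 for _ in run)
        if value != 0:
            lines.append(f"{chrom}\t{position}\t{position + length}\t{value}")
        position += length
    return lines


for line in to_bedgraph_lines("chr1", [0, 2, 2, 2, 0, 5]):
    print(line)
# chr1	1	4	2
# chr1	5	6	5
```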
Related Tools:
Input | Output |
---|---|
bam-to-bedgraph | |
scIDX - Strand-specific coordinate count
A lesser-used, 1-based format for storing the number of tags at a given coordinate. Files using this format may also use the .tab extension since it is a subset of the TAB format.
Related Tools:
Input | Output |
---|---|
bam-to-scidx (file has the .tab extension) | |
Generic formats
TXT - Text file
A standard format for storing text. Some text files may have the .out extension.
Related Tools:
Input | Output |
---|---|
bam-correlation | |
md5checksum | |
pe-stat | |
scaling-factor | |
se-stat | |
signal-dup | |
See our Tool Index for the full catalog of scripts.