DNA Shape from FASTA File

dna-shape-fasta

Calculate intrinsic DNA shape parameters for the sequences in an input FASTA file. Based on data from the Rohs lab DNAshape server.

Based on the findings from the Rohs lab (Zhou et al., 2013; Li et al., 2017), a sliding-window approach using a 5 bp wide window is a strong predictor of local DNA shape. Using this approach, we can predict 13 DNA shape features. The electrostatic potential of the strand can also be calculated with a similar approach (Chiu et al., 2017).
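The sliding-window idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sliding_window_scores` and the table `TOY_TABLE` are hypothetical, and the numbers in the table are placeholders, not values from the Rohs lab pentamer tables. It is not the implementation used by ScriptManager.

```python
from typing import Dict, List, Optional

# Hypothetical pentamer -> score table. The values are made-up placeholders
# for illustration only, NOT real DNAshape values.
TOY_TABLE: Dict[str, float] = {
    "AAAAA": 1.0,
    "AAAAC": 2.0,
    "AAACG": 3.0,
}


def sliding_window_scores(
    seq: str, table: Dict[str, float], width: int = 5
) -> List[Optional[float]]:
    """Look up every full window of `width` bases in `table`.

    Returns one entry per window; None marks a window that is missing
    from the table (for example, one containing an N).
    """
    seq = seq.upper()
    return [table.get(seq[i:i + width]) for i in range(len(seq) - width + 1)]


if __name__ == "__main__":
    # A 7 bp sequence contains three overlapping 5 bp windows.
    print(sliding_window_scores("AAAAACG", TOY_TABLE))
```

Note that a sequence of length n yields only n - 4 full 5 bp windows, so in this sketch the bases at the ends of each sequence do not receive a score of their own.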

This script reads a series of nucleotide sequences from a FASTA file and, for every selected shape, determines the average score at each bp position.
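Averaging by position amounts to a column-wise mean over the per-sequence score rows. The sketch below assumes the scores are already in memory as equal-length lists and that missing values (represented here as `None`) are skipped; both the function name `average_by_position` and the skip-missing behavior are assumptions for illustration, not a description of the tool's internals.

```python
from typing import List, Optional, Sequence


def average_by_position(
    rows: Sequence[Sequence[Optional[float]]],
) -> List[Optional[float]]:
    """Column-wise mean over equal-length score rows.

    Missing values (None) are ignored; a column with no values at all
    yields None. Skipping missing values is an assumption of this sketch.
    """
    if not rows:
        return []
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all score rows must have the same length")

    averages: List[Optional[float]] = []
    for column in zip(*rows):
        present = [value for value in column if value is not None]
        averages.append(sum(present) / len(present) if present else None)
    return averages


if __name__ == "__main__":
    scores = [
        [1.0, 2.0, None],
        [3.0, None, None],
    ]
    print(average_by_position(scores))
```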

What do these shape options mean?

Below is a video introducing some of the shape measurements that we are trying to capture with these calculations.


File inputs (FASTA)

Each input set of FASTA-formatted sequences produces a series of shape scores, one per bp position. Because the scores are indexed by bp position, the input sequences should all be the same length and positionally anchored to the same kind of feature.
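Before running the tool, it can be useful to confirm that every sequence in the input has the same length. The following is a minimal sketch of such a check; the helper names `read_fasta` and `common_length` are hypothetical and are not part of ScriptManager.

```python
from typing import Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Tuple[str, List[str]]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:], []))
        elif records:
            records[-1][1].append(line)
    return [(header, "".join(parts)) for header, parts in records]


def common_length(records: List[Tuple[str, str]]) -> int:
    """Return the shared sequence length, or raise if the lengths differ."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) > 1:
        raise ValueError(f"sequences have differing lengths: {sorted(lengths)}")
    return lengths.pop() if lengths else 0


if __name__ == "__main__":
    example = [">site1", "ACGTAC", ">site2", "TTTTGG"]
    print(common_length(read_fasta(example)))
```

To check a file on disk, pass an open file handle to `read_fasta`, for example `read_fasta(open("input.fa"))`, where `input.fa` is a placeholder path.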

When using the GUI, make sure your input is properly formatted and uses the appropriate FASTA (.fa / .fa.gz / .fasta / ...) extensions.

Output file (CDT/TAB)

The average composites of the CDT output are displayed in the GUI output window.

A CDT (or composite) file is written for each selected shape. Each file is named after the input filename, with a suffix identifying the shape (_HelT.cdt, _MGW.cdt, _PropT.cdt, and _Roll.cdt).

For example, in the command-line execution, an -o myoutput argument can be provided and the resulting files should be called myoutput_MGW.cdt, myoutput_PTwist.cdt, myoutput_HTwist.cdt, or myoutput_Roll.cdt according to the shapes selected (or with .out if composite is selected).

Tip: The output matrix files use the same format as the output from Tag Pileup, so they can be visualized with Figure Generation's heatmap and composite tools.
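For downstream scripting, a CDT matrix can be loaded with a few lines of Python. The sketch below assumes a tab-delimited layout with one header row and two leading label columns followed by one numeric column per bp position; that layout is an assumption of this example, so check it against an actual output file before relying on it. The function name `read_cdt` is hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_cdt(handle: TextIO) -> Tuple[List[str], Dict[str, List[float]]]:
    """Read a tab-delimited matrix into (position labels, {row id: values}).

    Assumed layout (not confirmed by the documentation): a header row,
    then rows of <id> <name> <value> <value> ...
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    positions = header[2:]
    rows: Dict[str, List[float]] = {}
    for record in reader:
        if not record:
            continue  # skip blank lines
        rows[record[0]] = [float(value) for value in record[2:]]
    return positions, rows


if __name__ == "__main__":
    # Tiny in-memory example with made-up values.
    example = io.StringIO("YORF\tNAME\t0\t1\t2\nseqA\tseqA\t4.5\t5.0\t5.5\n")
    positions, rows = read_cdt(example)
    print(positions, rows)
```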

Command Line Interface

Usage:

java -jar ScriptManager.jar sequence-analysis dna-shape-fasta [-aghlprV]
[--avg-composite] [-o=<outputBasename>] <fastaFile>

Positional Input

Expects a FASTA-formatted file containing many sequences to be stacked on one another (such as the output of the fasta-extract tool).
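When the tool is called from a pipeline script, the command can be assembled as an argument list. The sketch below uses only the subcommand and flags shown on this page; the helper name `build_command` and the file names `input.fa` and `myoutput` are placeholders, and the jar path is assumed to be in the working directory.

```python
from typing import List, Sequence


def build_command(
    fasta: str,
    output_basename: str,
    shape_flags: Sequence[str],
    avg_composite: bool = False,
    jar: str = "ScriptManager.jar",
) -> List[str]:
    """Assemble the dna-shape-fasta call as an argument list."""
    cmd = ["java", "-jar", jar, "sequence-analysis", "dna-shape-fasta"]
    cmd += list(shape_flags)
    if avg_composite:
        cmd.append("--avg-composite")
    cmd += ["-o", output_basename, fasta]
    return cmd


if __name__ == "__main__":
    # Minor groove width and roll, plus the average composite.
    cmd = build_command("input.fa", "myoutput", ["-g", "-r"], avg_composite=True)
    print(" ".join(cmd))
    # To actually run it (requires Java and the jar to be present):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.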

Output Options

| Option | Description |
| --- | --- |
| -o, --output=<outputBasename> | Specify output basename (files for each shape indicated will share this base) |
| -z, --gzip | gzip output (default=false) |
| --avg-composite | Save average composite |

For each shape option given on the command line, a CDT file is generated with a suffix indicating the shape type calculated.

If the groove option is given, a file called <outputBasename>_MGW.cdt is generated. Similarly, for propeller, helical, and roll, the output matrix CDT files are named with the suffixes _PTwist.cdt, _HTwist.cdt, and _Roll.cdt, respectively.

Shape Options

| Option | Description | Units |
| --- | --- | --- |
| -g, --groove | Output minor groove width | Angstroms |
| -r, --roll | Output roll | Degrees |
| -p, --propeller | Output propeller twist | Degrees |
| -l, --helical | Output helical twist | Degrees |
| --electrostatic-potential | Output electrostatic potential | kT/e |
| --stretch | Output stretch | Angstroms |
| --buckle | Output buckle | Degrees |
| --shear | Output shear | Angstroms |
| --opening | Output opening | Degrees |
| --stagger | Output stagger | Angstroms |
| --tilt | Output tilt | Degrees |
| --slide | Output slide | Angstroms |
| --rise | Output rise | Angstroms |
| --shift | Output shift | Angstroms |
| --2013 | Output groove, roll, propeller twist, and helical twist (equivalent to -grpl) | |
| -a, --2021 | Output all 14 metrics | |