
Extract FASTA

fasta-extract

Generates a FASTA file from an indexed genome FASTA file and a BED file. If no FAI index is present in the genome FASTA's folder, the script generates one.
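An FAI index is what lets a region be read without scanning the whole genome. The Python sketch below is for illustration only (ScriptManager itself is a Java tool). It assumes the standard samtools faidx layout (name, length, byte offset of the first base, bases per line, bytes per line) and an uncompressed ASCII FASTA with Unix line endings and fixed-width sequence lines. `build_fai` and `fetch` are hypothetical helper names, not ScriptManager APIs.

```python
def build_fai(fasta_text: str) -> list[tuple[str, int, int, int, int]]:
    """Build samtools-style .fai records: (name, length, offset, linebases, linewidth).

    Assumes ASCII text with Unix line endings, so character offsets equal byte offsets.
    """
    records = []
    offset = 0  # byte offset of the current line within the file
    name = None
    length = seq_offset = line_bases = line_width = 0
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            if name is not None:
                records.append((name, length, seq_offset, line_bases, line_width))
            name = line[1:].split()[0]  # identifier = header text up to first whitespace
            length, seq_offset = 0, offset + len(line)
            line_bases = line_width = 0
        else:
            bases = len(line.rstrip("\n"))
            if line_bases == 0 and bases > 0:
                # the first sequence line fixes bases-per-line and bytes-per-line
                line_bases, line_width = bases, len(line)
            length += bases
        offset += len(line)
    if name is not None:
        records.append((name, length, seq_offset, line_bases, line_width))
    return records


def fetch(fasta_bytes: bytes, record: tuple, start: int, end: int) -> str:
    """Return bases [start, end) (0-based, half-open) using one .fai record."""
    _name, length, seq_offset, line_bases, line_width = record
    end = min(end, length)
    if start >= end:
        return ""

    def byte_at(pos: int) -> int:
        # skip whole lines (including their newline), then move within the line
        return seq_offset + (pos // line_bases) * line_width + pos % line_bases

    raw = fasta_bytes[byte_at(start):byte_at(end - 1) + 1]
    return raw.replace(b"\n", b"").decode("ascii")
```

For example, with `fasta = ">chr1 desc\nACGT\nAC\n>chr2\nGGGG\nTT\n"`, `build_fai(fasta)` returns `[("chr1", 6, 11, 4, 5), ("chr2", 6, 25, 4, 5)]`, and `fetch(fasta.encode(), build_fai(fasta)[1], 3, 6)` returns `"GTT"` by jumping straight to the computed byte offsets.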

File inputs (Genomic FASTA & BED)

BED files record coordinate regions without the underlying sequence. This tool looks up each region in the FASTA file the BED file is based on and writes the sequence of every genomic region to a new FASTA-formatted file. The input is usually a genome FASTA, but any FASTA works as long as the chromosome-name column of the BED file matches the FASTA sequence identifiers.
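As a rough illustration of the extraction step, this Python sketch holds the genome in a dictionary keyed by FASTA identifier and slices each BED interval out of it. BED coordinates are 0-based and half-open, which maps directly onto Python slicing. The function name `bed_to_fasta`, the choice to silently skip chromosomes missing from the FASTA, and the `chrom:start-end` fallback header are assumptions of this sketch, not documented ScriptManager behavior.

```python
def bed_to_fasta(genome: dict[str, str], bed_lines: list[str]) -> str:
    """Extract BED intervals from an in-memory genome and format them as FASTA.

    genome maps FASTA identifiers to sequences; BED column 1 must match a key.
    """
    out = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and UCSC header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in genome:
            continue  # assumption: regions on unknown chromosomes are skipped
        # BED name column if present; the coordinate fallback is hypothetical
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        # 0-based, half-open BED interval == Python slice [start:end]
        out.append(f">{name}\n{genome[chrom][start:end]}\n")
    return "".join(out)
```

With `genome = {"chr1": "AACCGGTT"}`, the BED line `chr1	2	5	peak1` yields the record `>peak1` with sequence `CCG`.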

When using the GUI, make sure your input is properly formatted and uses the appropriate BED (.bed or .bed.gz) and FASTA (.fa / .fa.gz / .fasta / ...) extensions.
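Recognizing inputs by extension, with transparent decompression of `.gz` files, can be sketched as below. The extension tuples only cover the examples named above and are an assumption of this sketch (ScriptManager accepts further FASTA extensions); `classify_input` and `open_input` are hypothetical names. Decompression uses Python's standard `gzip` module.

```python
import gzip

# Extensions recognized by this sketch only; the real tool accepts more FASTA extensions.
BED_EXTS = (".bed",)
FASTA_EXTS = (".fa", ".fasta")


def classify_input(path: str) -> tuple[str, bool]:
    """Return (file type, is_gzipped) based purely on the file name."""
    name = path.lower()
    gzipped = name.endswith(".gz")
    if gzipped:
        name = name[:-3]  # strip ".gz" to expose the underlying extension
    if name.endswith(BED_EXTS):
        return "bed", gzipped
    if name.endswith(FASTA_EXTS):
        return "fasta", gzipped
    raise ValueError(f"unrecognized extension: {path}")


def open_input(path: str):
    """Open a recognized input as text, decompressing .gz files on the fly."""
    _kind, gzipped = classify_input(path)
    return gzip.open(path, "rt") if gzipped else open(path, "rt")
```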

File Options

The 'Force Strandedness' option ensures that the strand information in the BED file is respected when sequences are extracted.
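This page does not spell out what respecting strand means. A common convention, used for example by `bedtools getfasta -s`, is that regions on the `-` strand (BED column 6) are reverse-complemented. A minimal Python sketch assuming that convention; `stranded_sequence` is a hypothetical name:

```python
# Complement table for upper- and lower-case A/C/G/T/N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def stranded_sequence(seq: str, strand: str, force_strand: bool = True) -> str:
    """Return seq as read on the given strand.

    Assumption: with strandedness forced, '-' regions are reverse-complemented;
    '+' and '.' regions (and all regions when not forced) are returned unchanged.
    """
    if force_strand and strand == "-":
        return seq.translate(_COMPLEMENT)[::-1]
    return seq
```

For instance, `stranded_sequence("AACGT", "-")` returns `"ACGTT"`, while the same call on the `+` strand returns the input unchanged.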

Command Line Interface

Usage:

java -jar ScriptManager.jar sequence-analysis fasta-extract [-cfhV] [-o=<output>] <genomeFile> <bedFile>

Positional Input

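Positional inputs are given in order: the reference genome FASTA file first (<genomeFile> in the usage line), then the BED file.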

Option                   Description
<fastaFile>              reference genome FASTA file
<bedFile>                the BED file of sequences to extract

Output Options

Option                   Description
-o, --output=<output>    Specify output file
-z, --gzip               gzip output (default=false)

Extract Options

Option                   Description
-c, --coord-header       use genome coordinates for the output FASTA headers (default is to use the BED file's name column)
-f, --force              force strandedness (default)
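The effect of the header choice can be sketched as follows. The coordinate layout `chrom:start-end(strand)` and the fallback to coordinates when a BED line has no name column are hypothetical; ScriptManager's exact header format is not documented here. `fasta_header` is likewise a hypothetical name.

```python
def fasta_header(fields: list[str], coord_header: bool) -> str:
    """Choose the FASTA header for one tab-split BED record.

    fields follow BED column order: chrom, start, end, name, score, strand, ...
    """
    chrom, start, end = fields[0], fields[1], fields[2]
    if coord_header or len(fields) < 4:
        strand = fields[5] if len(fields) > 5 else "."
        # hypothetical coordinate layout; the real tool's format may differ
        return f">{chrom}:{start}-{end}({strand})"
    return f">{fields[3]}"  # default: BED name column
```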