Search Motif

search-motif

Search for an IUPAC DNA sequence motif in FASTA files with mismatches allowed

File inputs (FASTA)

Each input FASTA file is searched for the user-provided motif. The input is typically a genomic FASTA file, but any FASTA-formatted file can be used.

When using the GUI, make sure each input file is properly formatted and has an appropriate FASTA extension (.fa / .fa.gz / .fasta / ...).

Search Options

Enter an IUPAC Motif

The IUPAC (International Union of Pure and Applied Chemistry) nucleotide code is a standard notation for DNA sequences in which each letter stands for either a single base or a set of possible bases. Some examples are listed below; consult the full IUPAC code for the complete list of symbols that this tool supports:

  • 'A': Adenine
  • 'T': Thymine
  • 'C': Cytosine
  • 'G': Guanine
  • 'R': Purine (A or G)
  • 'Y': Pyrimidine (C or T)
  • 'N': Any Nucleotide (A, T, C, or G)

These codes define the DNA pattern to search for within the input FASTA sequences.
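
As an illustration of how a motif written in IUPAC codes can be read, the sketch below expands each motif position into the set of bases it allows. The table holds the standard IUPAC DNA codes; the names `IUPAC_BASES` and `expand_motif` are hypothetical and are not part of ScriptManager, so treat this as a reading aid rather than the tool's implementation.

```python
# Standard IUPAC DNA codes mapped to the bases each code represents.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand_motif(motif):
    """Return, for each motif position, a string of the bases allowed there."""
    allowed = []
    for code in motif.upper():
        if code not in IUPAC_BASES:
            raise ValueError(f"not an IUPAC DNA code: {code!r}")
        allowed.append(IUPAC_BASES[code])
    return allowed


print(expand_motif("RYN"))  # ['AG', 'CT', 'ACGT']
```

Here the motif `RYN` reads as "a purine, then a pyrimidine, then any nucleotide".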

Enter Mismatches Allowed

The stringency of the motif search is controlled by the number of mismatched nucleotides tolerated when searching for the motif in the FASTA sequences. A mismatch is a position in the sequence where the nucleotide is not one of the nucleotides that the IUPAC motif allows at that position.
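
The mismatch rule described above can be sketched as a sliding-window scan. This is a minimal illustration under stated assumptions, not ScriptManager's own code: the function names are hypothetical, only the forward strand is scanned, and any sequence character outside A/C/G/T (such as a masked `N`) is counted as a mismatch.

```python
# Standard IUPAC DNA codes mapped to the bases each code represents.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def count_mismatches(motif, window):
    """Count positions where the window base is not allowed by the motif code."""
    return sum(base not in IUPAC_BASES[code] for code, base in zip(motif, window))


def search_motif(sequence, motif, max_mismatches=0):
    """Return (start, matched_text, mismatches) for each window within the limit.

    Hypothetical helper: forward strand only, 0-based start coordinates.
    """
    motif = motif.upper()
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - len(motif) + 1):
        window = sequence[start:start + len(motif)]
        mismatches = count_mismatches(motif, window)
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# With no mismatches allowed, only the exact match at position 0 is reported.
print(search_motif("ACGTTGCA", "RYG", max_mismatches=0))
# Allowing one mismatch admits three more windows.
print(search_motif("ACGTTGCA", "RYG", max_mismatches=1))
```

Raising the allowed mismatch count can only add hits, never remove them, which is why a larger value makes the search less stringent.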

Command Line Interface

Usage:

java -jar ScriptManager.jar sequence-analysis search-motif [-hV] -m=<motif>
[-n=<ALLOWED_MISMATCH>] [-o=<output>] <fastaFile>

Positional Input

Option        Description
<fastaFile>   reference genome FASTA file

Output Options

Option                  Description
-o, --output=<output>   specify output file
-z, --gzip              gzip output (default=false)

Search Options

Option                                Description
-m, --motif=<motif>                   the IUPAC motif to search for
-n, --mismatches=<ALLOWED_MISMATCH>   the number of mismatches allowed (default=0)
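
To show how the options above fit together, here is a small sketch that assembles the command line from Python. It uses only the flags listed in the tables; the helper name `build_search_motif_command`, the file names `genome.fa` and `motif_hits.out`, and the jar location are placeholders chosen for illustration.

```python
import shlex


def build_search_motif_command(fasta, motif, mismatches=0, output=None,
                               gzip=False, jar="ScriptManager.jar"):
    """Assemble the search-motif argument list (hypothetical helper)."""
    cmd = ["java", "-jar", jar, "sequence-analysis", "search-motif",
           f"-m={motif}", f"-n={mismatches}"]
    if output is not None:
        cmd.append(f"-o={output}")
    if gzip:
        cmd.append("-z")
    cmd.append(fasta)  # the positional <fastaFile> argument goes last
    return cmd


# Placeholder inputs: search genome.fa for TGACGTCA, allowing one mismatch.
cmd = build_search_motif_command("genome.fa", "TGACGTCA",
                                 mismatches=1, output="motif_hits.out")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the search, assuming Java and the ScriptManager jar are available at the given path.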