Making Genome Tracks Tutorial

How to turn your BAM files into genome track files that can be viewed in a genome browser

Goal: This tutorial provides a guide to transforming and filtering your ChIP-exo alignment data into files that can be uploaded by a genome browser.

info

❗ BigWig Warning ❗ Be careful about performing analysis steps using BigWig files instead of BAM files. BigWigs are excellent for visualizing data pileups in a genome browser but if you are processing data, we recommend you process data directly from BAM formats. From a information content and reproducibility standpoint, BAM files have notable advantages over BigWigs. Read more about some of these advantages

This tutorial is based on the processing script from Mittal et al, 2022 to turn BAM files into BigWig tracks with base-pair resolution for high resolution assays.

There will be a couple steps that need to be performed on the command line using UCSC binaries because we currently don't have a parser for BigWig files (in the works but not ready yet). In the meantime, here are the step-by-step instructions for using as much of ScriptManager's GUI as possible to generate the BigWig track.

Download ScriptManager (v0.14):

The current version of ScriptManager is available for download here. Make sure you have Java installed.

The file ScriptManager-v0.14.jar should be placed someplace locally accessible. For example on Mac OS on the Desktop (Permissions will need to be accepted) or someplace in your home directory (i.e. Macintosh HD/Users/userID/ScriptManager)

Download USCS binary `bedGraphToBigWig`

You will need to download the precompiled binary for converting bedGraphs to BigWig files. Navigate to UCSC executables FTP. Click the link to the OS that matches your machine and scroll down to download the bedGraphToBigWig binary.

Linux/MacOS

Make sure you modify the permissions so that you can execute the binary. You can do this with chmod

chmod 755 /path/to/bedGraphToBigWig

For example, if your bedGraphToBigWig binary downloaded to your Downloads folder,

chmod 755 ~/Downloads/bedGraphToBigWig

Download data

You need one file of sequencing data alignments (BAM) to complete this exercise and a file with the sacCer3 yeast reference genome chromosome sizes (chrom.sizes). Read more about the BAM file format here.

BAM File

This is the set of Reb1 read alignments from the Yeast Epigenome Project (YEP). See Rossi et al (2021) for more details.

Download sample BAM file

Navigate to www.yeastepigenome.org and search for Reb1
Select "META DATA"
Select "Direct Download"
Unzip the resulting file ‘12141_YEP.zip’ and inspect the contents of the new 12141_YEP folder. It should contain a file called 12141_filtered.bam.

XXXX.chrom.sizes Reference File

You will also need the chromosome sizes file of the yeast reference genome (sacCer3):

Download sacCer3.chrom.sizes (TXT)

danger

The downloaded sacCer3.chrom.sizes file may be using the roman numeral naming system (chrI chrII chrIII ...). You will need to modify this to the arabic (chr1 chr2 chr3 ...) numeral naming system if you plan to use the BAM file linked above. You can do this in a simple text editor.

Also, if the file is missing the 2-micron row, you will need to add it manually. Simply append the following tab-delimited row to the bottom of the sacCer3.chrom.sizes file.

2-micron    6318

Generate the Genome Tracks

1. Open ScriptManager

MacOS
Linux
Windows

Depending on your system permissions, you may need to be an administrator to open this for the first time. On Mac systems, this can be done by right-clicking the file and selecting ‘Open’ at the top.

caution

Some MacOS systems may not properly open the JAR by simply double-clicking on the JAR file so you may need to open your Terminal window and execute it from the command line by executing the jar file without any arguments or flags:

java -jar /path/to/ScriptManager.jar

If you're not sure how to type the path to ScriptManager, you can type java -jar (end with space) and then drag ScriptManager from Finder into your Terminal window and then press enter.

Once you see the main tool selection window, you're off to the races!

2. Create a whole-genome pileup using BAM Format Converter Tools

2.1. Navigate to BAM Format Converter ➡️ BAM to scIDX to do a genome-wide pileup of your BAM

2.2. Select Encoding information

Since this is a ChIP-exo dataset, we will select Read 1 (5' end) for the encoding. You may select insert size filters if you wish.

2.3. Execute pileup

This should output a 12141_filtered_READ1.tab file.

3. (Optional) Normalize the genome track

Skip this if you would like to make a track of the raw read counts.

ScriptManager includes a "Calculate Scaling Factor" tool under the "Read Analysis" tool group that can calculate scaling factors based on several methodologies including total tag normalization and the Normalization of ChIP-seq (NCIS) protocol. Read more in the docs. Please be sure to check that your normalization method is appropriate for your data.

3.1. Navigate to Read Analysis ➡️ Calculate Scaling Factor to calculate a scaling factor

3.2. Select the BAM file you want to create the data track from

3.3. Select a normalization method and execute

For this BAM file, the total tag method gives a scaling factor of 9.562259045426815.

3.4. Navigate to Read Analysis ➡️ Scale Matrix to apply the normalization to the data track

3.2. Input the normalization factor and adjust the start row to 2

3.3. Load the file and execute matrix scaling

You will be loading in the scIDX file (not the BAM).

This should output a 12141_filtered_READ1_SCALE.tab file.

4. Convert scIDX to BigWig

Since ScriptManager currently does not support converting BedGraph files to BigWigs, this step must be done in your terminal/on the command-prompt.

4.1. Turn your scIDX file into a bedGraph file

Shell
Excel

grep -v -f sacCer3.chrom.sizes 12141_filtered_READ1_SCALE.tab > 12141_filtered_READ1_SCALE.filtered.tab
sed '1d;2d' 12141_filtered_READ1_SCALE.filtered.tab | awk '{OFS="\t"}{FS="\t"}{print $1, $2, $2+1, $3}' |sort -k1,1 -k2,2n > 12141_filtered_READ1_SCALE_forward.bedgraph
sed '1d;2d' 12141_filtered_READ1_SCALE.filtered.tab | awk '{OFS="\t"}{FS="\t"}{print $1, $2, $2+1, $4}' |sort -k1,1 -k2,2n > 12141_filtered_READ1_SCALE_reverse.bedgraph

Open your file in excel and strip the file of the first two rows:

#2023-03-10 12:52:40.439;12141_filtered.bam;READ1
chrom   index   forward reverse value

You may also have read mapping to the very edges of the chromosome. These entries will also have to be removed.

Then rearrange the columns into BedGraph format by copying them in the following order (3rd column is the value of col2 plus one):

12141_filtered_READ1_SCALE_forward.bedGraph: col1, col2, (col2 +1), col3
12141_filtered_READ1_SCALE_reverse.bedGraph: col1, col2, (col2 +1), col4

...and then sort each of them by the first and then the second columns. This format is tab-delimited (not comma-delimited!) so save as a tab-delimited text file. Do not save this as an Excel file format (.xlsx, .xls, etc). This should be a .tsv or .tab format.

4.2. Turn your bedGraph file into a BigWig file

Shell

./bedGraphToBigWig 12141_filtered_READ1_SCALE_forward.bedGraph sacCer3.chrom.sizes my-track-basename_forward.bw
./bedGraphToBigWig 12141_filtered_READ1_SCALE_reverse.bedGraph sacCer3.chrom.sizes my-track-basename_reverse.bw

4.3. Load the BigWig into your favorite genome browser

Tah dah! You've made the genome tracks for a ChIP-exo dataset!

Command-Line shell script

The following shell commands mirror the manipulations described above to create BigWig genome tracks. This can serve as a template for you to write out your own workflows as bash scripts that execute command-line style ScriptManager.

SCRIPTMANAGER=/path/to/ScriptManager.jar
BGTOBW=/path/to/bedGraphToBigWig
CHRMSZ=/path/to/sacCer3.chrom.sizes
BAMFILE=/path/to/12141_filtered.bam
OUTPUT=/path/to/my-track-basename
FACTOR=9.562259045426815  # calculate this separately

# Index BAM if not already indexed
[ -f $BAMFILE.bai ] || samtools index $BAMFILE
# Pileup BAM along whole genome
java -jar $SCRIPTMANAGER bam-format-converter bam-to-scidx $BAMFILE -o $OUTPUT.raw.tab
# Only include pileups from chromosomes in the sacCer3.chrom.sizes reference
grep -v -f $CHRMSZ $OUTPUT.raw.tab > $OUTPUT.filtered.tab
# Scale SCIDX output (optional)
java -jar $SCRIPTMANAGER read-analysis scale-matrix $OUTPUT.filtered.tab -s $FACTOR -r 2 -o $OUTPUT.scaled.tab
# Convert to forward/reverse Bedgraph tracks
sed '1d;2d' $OUTPUT.scaled.tab | awk '{OFS="\t"}{FS="\t"}{print $1, $2, $2+1, $3}' |sort -k1,1 -k2,2n > $OUTPUT.forward.bedgraph
sed '1d;2d' $OUTPUT.scaled.tab | awk '{OFS="\t"}{FS="\t"}{print $1, $2, $2+1, $4}' |sort -k1,1 -k2,2n > $OUTPUT.reverse.bedgraph
# Compress to BigWig format
$BGTOBW $OUTPUT.forward.bedgraph $CHRMSZ $OUTPUT\_forward.bw
$BGTOBW $OUTPUT.reverse.bedgraph $CHRMSZ $OUTPUT\_reverse.bw

# Clean-up
rm $OUTPUT.raw.tab $OUTPUT.filtered.tab $OUTPUT.scaled.tab $OUTPUT.forward.bedgraph $OUTPUT.reverse.bedgraph

# Output files:
#  - /path/to/my-track-basename_forward.bw
#  - /path/to/my-track-basename_reverse.bw

Download ScriptManager (v0.14):​

Download USCS binary bedGraphToBigWig​

Download data​

BAM File​

XXXX.chrom.sizes Reference File​

Generate the Genome Tracks​

1. Open ScriptManager​

2. Create a whole-genome pileup using BAM Format Converter Tools​

2.1. Navigate to BAM Format Converter ➡️ BAM to scIDX to do a genome-wide pileup of your BAM​

2.2. Select Encoding information​

2.3. Execute pileup​

3. (Optional) Normalize the genome track​

3.1. Navigate to Read Analysis ➡️ Calculate Scaling Factor to calculate a scaling factor​

3.2. Select the BAM file you want to create the data track from​

3.3. Select a normalization method and execute​

3.4. Navigate to Read Analysis ➡️ Scale Matrix to apply the normalization to the data track​

3.2. Input the normalization factor and adjust the start row to 2​

3.3. Load the file and execute matrix scaling​

4. Convert scIDX to BigWig​

4.1. Turn your scIDX file into a bedGraph file​

4.2. Turn your bedGraph file into a BigWig file​

4.3. Load the BigWig into your favorite genome browser​

Command-Line shell script​