Downloading Data from PEGR
There are many ways to get your genomic data from PEGR.
Download through the browser
Note to Olivia: Add screenshots for downloading files from PEGR here
Bulk sample download
1. Clone CEGRcode/EGC_utility_scripts
Get the GitHub scripts for downloading data files from PEGR/Galaxy. We recommend using GitHub Desktop for this.
2. Install Dependencies
Check the README.md for the conda-style installation.
3. Get your credentials
You can get your PEGR_API_KEY by logging into www.pegr.org and going to your Account Profile page (linked in the top right of the screen).
Make sure you also know which email address PEGR uses for your account (if you log in with your Cornell NetID, then your PEGR_EMAIL is <mynetid>@cornell.edu).
You can also add these to your ~/.bashrc, ~/.bash_profile, or ~/.zshrc file for convenience:
export PEGR_API_KEY=ABCDEFGHIJKLMNO12345789 # paste PEGR API Key here
export USER_EMAIL=mypsuusername@psu.edu # paste email here
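Before running anything, it can help to confirm that the variables are actually visible in your current shell. The snippet below is a minimal sketch, not part of EGC_utility_scripts: missing_env_vars is a hypothetical helper, and the variable names are taken from the export lines above. The text above also mentions PEGR_EMAIL, so check the README.md for the name the scripts actually read.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Names copied from the export lines above (an assumption about what the
    # scripts read); swap in PEGR_EMAIL if that is what the README.md specifies.
    missing = missing_env_vars(["PEGR_API_KEY", "USER_EMAIL"])
    if missing:
        print("Not set in this shell:", ", ".join(missing))
    else:
        print("Both credentials are set.")
```

If a variable is reported as not set right after you edited your shell startup file, open a new terminal or source the file so that the export lines take effect.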
4. Get the list of samples you want to download
For example, you may have a file called mysamples.txt that contains the sample ids like this:
12141
21173
As long as the PEGR sample ids are in the first tab-delimited column, you can give these scripts any kind of flat text file:
12141 OtherMetadata1
21173 Whatever you want: and however you want
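To preview which ids a file like this would yield, you can pull out the first tab-delimited column yourself. This is a minimal sketch under the rule stated above, not the parsing code used by the scripts; read_sample_ids is a hypothetical helper, and the example lines assume a tab separates the id from the rest of the line.

```python
def read_sample_ids(lines):
    """Collect the first tab-delimited field of each non-blank line."""
    ids = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        ids.append(line.split("\t", 1)[0].strip())
    return ids


# Example lines mirroring the file contents shown above (tab after the id).
example = [
    "12141\tOtherMetadata1\n",
    "21173\tWhatever you want: and however you want\n",
    "\n",
]
print(read_sample_ids(example))  # prints ['12141', '21173']

# To check a real file, pass the open file handle instead:
#     with open("mysamples.txt") as handle:
#         print(read_sample_ids(handle))
```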
5. Execute any of the scripts from the EGC_utility_scripts directory
If you're not sure how to execute them, you can have them print usage statements using the help flag (-h):
python generate_FQ_file_from_PEGR.py -h
...or check the README.md for the full list of usage statements.
Make sure you are in the EGC_utility_scripts directory.
If you are downloading human data, or other data with several genome build options in PEGR, it is critical that you include the genome build information (-b). This does not apply to downloading FASTQ files, which are genome build-independent.