Step 2: Prepare Sample Data
GEMmaker can process locally stored RNA-seq files and can automatically download and process samples stored in the NCBI SRA database. A single run of GEMmaker can include both types of samples, or only local or only remote files.
Using Samples From NCBI SRA
GEMmaker supports automatic download and processing of samples from the NCBI SRA repository. To use samples from the SRA, first compile the list of NCBI SRA Run IDs for the samples you want to process. Run IDs typically start with an SRR, ERR, or DRR prefix. Do not confuse these with Experiment IDs, which typically start with SRX, ERX, or DRX. Place the run IDs in a file, one per line.
Example of a remote ID file:
SRR1058270
SRR1058271
SRR1058272
SRR1058273
SRR1058274
SRR1058275
SRR1058276
SRR1058277
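Before starting a run, it can help to confirm that the file contains run IDs rather than experiment IDs. The following is a minimal sketch, not part of GEMmaker: the function name `check_run_ids` is hypothetical, and the patterns assume that an ID is one of the prefixes named above followed only by digits.

```python
import re

# Assumption: a run ID is an SRR, ERR, or DRR prefix followed only by digits.
RUN_ID = re.compile(r"^[SED]RR\d+$")
# Assumption: an experiment ID is an SRX, ERX, or DRX prefix followed only by digits.
EXPERIMENT_ID = re.compile(r"^[SED]RX\d+$")


def check_run_ids(lines):
    """Split lines into (valid run IDs, rejected entries with a reason)."""
    valid, rejected = [], []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # ignore blank lines
        if RUN_ID.match(entry):
            valid.append(entry)
        elif EXPERIMENT_ID.match(entry):
            rejected.append((entry, "experiment ID, not a run ID"))
        else:
            rejected.append((entry, "unrecognized format"))
    return valid, rejected


if __name__ == "__main__":
    # Hypothetical input mixing a run ID, an experiment ID, and a stray label.
    valid, rejected = check_run_ids(["SRR1058270", "SRX1058271", "", "sample_3"])
    print(valid)
    print(rejected)
```

To check a real file, pass the open file object, for example `check_run_ids(open(path))`, where `path` points to your remote ID file.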
Using Samples Stored Locally
By default, GEMmaker expects FASTQ files to be uncompressed (not gzip-compressed). They can be stored in any directory on the local filesystem.
Paired FASTQ files
By default, paired files must have a _1.fastq and a _2.fastq suffix at the end of the filename. GEMmaker uses the _1 and _2 designation to differentiate and match paired files.
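The naming convention above can be checked before a run with a short script. This is a sketch under stated assumptions, not GEMmaker's own matching logic: the helper `match_pairs` is hypothetical, and it treats everything before the _1.fastq or _2.fastq suffix as the sample name.

```python
from pathlib import Path


def match_pairs(filenames):
    """Group *_1.fastq and *_2.fastq names by their shared sample prefix."""
    mates = {}
    for name in filenames:
        base = Path(name).name
        for mate in ("1", "2"):
            suffix = f"_{mate}.fastq"
            if base.endswith(suffix):
                # Assumption: the text before the suffix identifies the sample.
                sample = base[: -len(suffix)]
                mates.setdefault(sample, {})[mate] = name
    # A sample is paired only when both mates were found.
    paired = {s: (m["1"], m["2"]) for s, m in mates.items() if len(m) == 2}
    unpaired = sorted(s for s, m in mates.items() if len(m) != 2)
    return paired, unpaired


if __name__ == "__main__":
    # Hypothetical filenames: sample "a" has both mates, sample "b" has one.
    paired, unpaired = match_pairs(["a_1.fastq", "a_2.fastq", "b_1.fastq"])
    print(paired)
    print(unpaired)
```

Samples reported as unpaired are missing one mate, or have a mate whose filename does not follow the default suffix.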
Non-Paired FASTQ files
By default, if your data is non-paired, GEMmaker expects all files to have a _1.fastq suffix at the end of the filename.
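For non-paired data, a quick pre-flight check can list files that do not follow the default suffix. This is only an illustrative sketch: the function `misnamed_single_end` is hypothetical, and its `suffix` default simply mirrors the convention described above.

```python
def misnamed_single_end(filenames, suffix="_1.fastq"):
    """Return the filenames that do not end with the expected suffix."""
    return [name for name in filenames if not name.endswith(suffix)]


if __name__ == "__main__":
    # Hypothetical filenames: the second lacks _1, the third is compressed.
    print(misnamed_single_end(["s1_1.fastq", "s2.fastq", "s3_1.fastq.gz"]))
```

Any filename the function returns would need to be renamed (or decompressed, for a .gz file) to match the default expectation.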