Frequently Asked Questions
We accept four formats: BAM, CRAM, unaligned BAM (uBAM), and FASTQ. For BAM, CRAM, and uBAM, we expect proper read group information in the file header.
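As a rough pre-submission check, the read groups declared in a BAM/CRAM/uBAM header can be listed from the header text (for example, the output of `samtools view -H`). The sketch below is illustrative only: the function name `read_groups` is hypothetical, and because this FAQ does not spell out which tags count as "proper" read group information, it simply collects every @RG line that carries an ID tag.

```python
def read_groups(header_text):
    """Map each @RG ID found in SAM-style header text to its tags."""
    groups = {}
    for line in header_text.splitlines():
        # Header records are tab-separated; read groups start with @RG.
        if not line.startswith("@RG\t"):
            continue
        fields = line.split("\t")[1:]
        # Each field has the form TAG:value; split on the first colon only.
        tags = dict(f.split(":", 1) for f in fields if ":" in f)
        if "ID" in tags:
            groups[tags["ID"]] = tags
    return groups


if __name__ == "__main__":
    example = "@HD\tVN:1.6\n@RG\tID:rgA\tSM:sample1\tPL:ILLUMINA\n"
    for rg_id, tags in read_groups(example).items():
        print(rg_id, tags)
```

An empty result means the header declares no read groups, which is worth fixing before submission.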
Submit one FASTQ tarball per sample, in either .tar or .tar.gz format. If a sample has multiple read groups, include them all in the same tarball. Files inside the tarball must be named “readgroupname_[12s].(fq|fastq)(.gz)?”, so the extension can be fq, fastq, fq.gz, or fastq.gz. Use the prefixes readgroupname_1 and readgroupname_2 for paired-end reads, or readgroupname_s for single-end reads.
At this time, we cannot accept FASTQ chunks.
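The naming rule above can be checked before upload with a short script. This is a minimal sketch, not an official validator: the function names `check_fastq_names` and `check_tarball` are hypothetical, and the pairing check assumes that each read group has either both _1 and _2 files or a single _s file, which is one reading of the rule.

```python
import re
import tarfile

# Pattern from the naming rule: readgroupname_[12s].(fq|fastq)(.gz)?
FASTQ_NAME = re.compile(
    r"^(?P<read_group>.+)_(?P<end>[12s])\.(?:fq|fastq)(?:\.gz)?$"
)


def check_fastq_names(names):
    """Return (read_groups, problems) for a list of file names."""
    groups = {}
    problems = []
    for name in names:
        # Ignore any directory prefix stored in the tarball.
        base = name.rsplit("/", 1)[-1]
        match = FASTQ_NAME.match(base)
        if match is None:
            problems.append(f"{name}: unexpected file name")
            continue
        groups.setdefault(match.group("read_group"), set()).add(match.group("end"))
    # Assumption: a read group is either paired (_1 and _2) or single (_s).
    for read_group, ends in sorted(groups.items()):
        if ends not in ({"1", "2"}, {"s"}):
            problems.append(f"{read_group}: found suffixes {sorted(ends)}")
    return groups, problems


def check_tarball(path):
    """Apply the name check to every regular file in a .tar or .tar.gz."""
    with tarfile.open(path, "r:*") as tar:
        names = [member.name for member in tar.getmembers() if member.isfile()]
    return check_fastq_names(names)
```

For example, `check_tarball("sample1.tar.gz")` (a hypothetical file name) returns the read groups found and a list of problems; an empty problem list means every file name matched the pattern.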
What formats can I use to send processed/downstream genomic data?
We accept VCFs. There is no standard naming or header convention for submitted VCFs. If you submit VCFs, we prefer to receive the raw data as well.
Yes. Please fill out a new data inventory form for every dataset you submit. If you are going to submit data in three increments, please fill out the data inventory form for each of the three datasets.
Yes. For FASTQ naming, please see “What is the best format for FASTQ submission?” For BAM, CRAM, and unaligned BAM, we expect proper read group information in the file header, but we place no restrictions on the file names.