yetnone's Diary

A Ph.D. student somewhere. I never quite know what to say when asked what I work on.

Submitting (PacBio data) to SRA

How can I submit my (metagenomic) PacBio raw data (*.bax.h5) to the SRA database?

Read the quick guide

Data requirement (PacBio RS II)

Submission of data from the RS II instrument requires one (1) bas.h5 file and three (3) bax.h5 files. Do not link more than one PacBio RS II to an SRA run, and please do not change the bax.h5 file names from those indicated in the bas.h5 file.
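Before uploading, it may help to confirm that every file set is complete. The following is a minimal sketch, assuming the usual RS II naming in which a set shares one prefix (`<movie>.bas.h5` plus `<movie>.1.bax.h5` to `<movie>.3.bax.h5`); the function name and the example file names are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed naming: <movie>.bas.h5 and <movie>.1.bax.h5 ... <movie>.3.bax.h5
_PATTERN = re.compile(r"^(?P<movie>.+?)\.(?:(?P<part>[123])\.bax|bas)\.h5$")


def check_rs2_file_sets(filenames):
    """Return {movie: [problems]}; an empty list means the set looks complete."""
    sets = defaultdict(lambda: {"bas": 0, "bax": set()})
    for name in filenames:
        match = _PATTERN.match(name)
        if match is None:
            continue  # not a bas.h5/bax.h5 file, so ignore it
        entry = sets[match.group("movie")]
        if match.group("part"):
            entry["bax"].add(int(match.group("part")))
        else:
            entry["bas"] += 1

    report = {}
    for movie, entry in sets.items():
        problems = []
        if entry["bas"] != 1:
            problems.append(f"expected 1 bas.h5, found {entry['bas']}")
        missing = {1, 2, 3} - entry["bax"]
        if missing:
            problems.append(f"missing bax.h5 parts: {sorted(missing)}")
        report[movie] = problems
    return report


# Made-up file names: the first set is complete, the second is not.
files = [
    "m1_s1_p0.bas.h5", "m1_s1_p0.1.bax.h5",
    "m1_s1_p0.2.bax.h5", "m1_s1_p0.3.bax.h5",
    "m2_s1_p0.bas.h5", "m2_s1_p0.1.bax.h5",
]
report = check_rs2_file_sets(files)
```

In practice the list would come from something like `os.listdir()` on the data directory. The check only looks at file names; it does not open the HDF5 files or verify that the names match what the bas.h5 file records.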

Submitting steps

  1. Log in to or sign up for an NCBI account.
  2. Register your project and biological samples (BioProject and BioSample).
  3. Create your SRA data submission and upload sequence data files:

    • Submit SRA metadata - information that will link your project, samples/experiments and file names
    • Upload sequence data files in the SRA submission portal

Notes:

  • If you have already prepared all the data for uploading, you can register the BioProject and BioSample while creating the SRA submission; that is, you do not have to use the BioProject/BioSample submission portals.
  • If you only want to obtain the accession numbers (without uploading sequence data yet), you must register the BioProject and BioSample before creating the SRA submission.

What are accession numbers?

  • SRA (SUBMISSION)
    • SRP# (STUDY) — PRJNA# in BioProject
      • SRS# (SAMPLE) — SAMN# in BioSample
        • SRX# (EXPERIMENT)
          • SRR# (RUN)

We recommend using SRP# in publications.

All submissions have a SUB#. The SUB# is a non-public identifier that is used by software for tracking purposes.
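When juggling several identifiers, a small lookup can tell them apart by prefix. This is only a sketch covering the prefixes listed above; the function name is hypothetical and the accession numbers in the example are invented placeholders, not real records.

```python
import re

# Prefix -> record type, following the hierarchy listed above.
ACCESSION_TYPES = {
    "SRP": "SRA STUDY",
    "SRS": "SRA SAMPLE",
    "SRX": "SRA EXPERIMENT",
    "SRR": "SRA RUN",
    "PRJNA": "BioProject",
    "SAMN": "BioSample",
    "SUB": "submission (non-public)",
}

_ACCESSION = re.compile(r"^(SRP|SRS|SRX|SRR|PRJNA|SAMN|SUB)(\d+)$")


def classify_accession(accession):
    """Return the record type for an accession string, or None if unrecognized."""
    match = _ACCESSION.match(accession.strip().upper())
    return ACCESSION_TYPES[match.group(1)] if match else None


# Invented identifiers, for illustration only.
for example in ("SRP000000", "SAMN00000000", "SUB0000000", "hello"):
    print(example, "->", classify_accession(example))
```

Other prefixes exist (for example, records submitted through other archives), so `None` here means "not in this short table", not "invalid".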

On the human metagenomic sample

Human metagenomic studies may contain human sequences and require that the donor provide consent to archive their data in an unprotected database. If you would like to archive human metagenomic sequences in the public SRA database please contact the SRA and we will screen and remove human sequence contaminants from your submission.

  • I removed putative human genomic sequences myself, but apparently they can do it for you instead?

Register BioProject and BioSample (original doc)

Registering project and biological samples at the NCBI BioProject and the BioSample databases is a prerequisite for any public SRA submission. The BioProject and BioSample databases store data that relate to organizational and biological aspects of sequencing experiments.

BioProject

BioProject submission portal -> “New submission” -> Follow the wizard –…

You don’t have to provide BioSample accession(s) or register your sample(s) within the BioProject submission wizard.

…–> “Submit”

To update an existing record or a recent submission, please email your request to bioprojecthelp@ncbi.nlm.nih.gov and include your BioProject ID or Submission ID. Do not create a new submission to update an existing one!

BioSample

A BioSample is a record of a biological isolate with unique physical properties. Biological and technical replicates should (in most cases) not be considered unique BioSamples.

BioSample submission portal -> “New submission” -> Follow the wizard –…

In the Attributes section: download the Excel template, fill it out, convert it into a tab-delimited text file, and upload the text file.
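Excel can save a sheet as tab-delimited text directly, but if the filled-out template has been exported as CSV instead, the conversion can be scripted. A minimal sketch using only the standard library follows; the column names and values are placeholders, and the real header must be whatever the downloaded template contains.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert CSV text to tab-delimited text, dropping completely empty rows."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        if any(cell.strip() for cell in row):
            writer.writerow(row)
    return out.getvalue()


# Placeholder attributes; use the header row from the downloaded template.
csv_text = (
    "sample_name,organism,geo_loc_name\n"
    'S1,human gut metagenome,"Japan: Tokyo, Bunkyo"\n'
    ",,\n"
)
tsv_text = csv_to_tsv(csv_text)
print(tsv_text)
```

For a file on disk, read the CSV with `open(path, newline="")` and write the result to a `.txt` or `.tsv` file before uploading it in the wizard.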

To update BioSample’s attributes, contact BioSample staff at biosamplehelp@ncbi.nlm.nih.gov.

…–> “Submit”

Create SRA (original doc)

The SRA metadata describes the technical aspects of sequencing experiments: the sequencing libraries, preparation techniques and data files.

Each EXPERIMENT has a unique combination of replicate number + library + sequencing strategy + layout + instrument model.

A RUN is simply a manifest of the data file(s) derived from sequencing the library described by the associated EXPERIMENT.
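One way to picture the EXPERIMENT/RUN relationship is to group data files by that combination of fields. The sketch below assumes a list of dictionaries with hypothetical keys (`replicate`, `library`, `strategy`, `layout`, `instrument_model`, `filename`); these are not the column names of the actual SRA metadata template.

```python
from collections import defaultdict

# Hypothetical field names standing in for the five properties above.
EXPERIMENT_KEY = ("replicate", "library", "strategy", "layout", "instrument_model")


def group_files_by_experiment(rows):
    """Map each unique EXPERIMENT key to the data files sequenced from it."""
    experiments = defaultdict(list)
    for row in rows:
        key = tuple(row[field] for field in EXPERIMENT_KEY)
        experiments[key].append(row["filename"])
    return dict(experiments)


# Made-up rows: two files share one library, the third comes from another.
rows = [
    {"replicate": "1", "library": "libA", "strategy": "WGS", "layout": "single",
     "instrument_model": "PacBio RS II", "filename": "cell1.1.bax.h5"},
    {"replicate": "1", "library": "libA", "strategy": "WGS", "layout": "single",
     "instrument_model": "PacBio RS II", "filename": "cell1.2.bax.h5"},
    {"replicate": "1", "library": "libB", "strategy": "WGS", "layout": "single",
     "instrument_model": "PacBio RS II", "filename": "cell2.1.bax.h5"},
]
experiments = group_files_by_experiment(rows)
```

Here the result has two keys, that is, two EXPERIMENTs; the file lists under each key are what the RUN manifests would enumerate.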

SRA submission portal -> “Command line upload options” -> “Request preload folder” (-> Choose either “Aspera command line upload” or “FTP upload” for uploading your files later, and read its instructions) -> Upload all files via Aspera/FTP –…

Each file must be listed in the SRA metadata table you uploaded. If you are uploading a tar archive, list each file name, not the archive name.

  • I uploaded one .tar archive of .bax.h5 files per EXPERIMENT, and used the lftp command for the FTP upload.
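Since the metadata table needs the names of the files inside each archive, they can be read back from the .tar rather than typed by hand. A small sketch with the standard `tarfile` module follows; the helper name and the member names in the demonstration are made up, and the archive is built in memory only so the example is self-contained.

```python
import io
import tarfile


def list_archive_files(tar_source):
    """Return the names of regular files in a tar archive (path or file object)."""
    if isinstance(tar_source, (str, bytes)):
        archive = tarfile.open(name=tar_source, mode="r:*")
    else:
        archive = tarfile.open(fileobj=tar_source, mode="r:*")
    with archive:
        return [member.name for member in archive.getmembers() if member.isfile()]


# Build a tiny in-memory archive with placeholder contents for demonstration.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    for name in ("movie.1.bax.h5", "movie.2.bax.h5", "movie.3.bax.h5"):
        data = b"placeholder"
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
buffer.seek(0)

names = list_archive_files(buffer)
print(names)
```

With a real archive, pass its path instead, e.g. `list_archive_files("experiment1.tar")`, and copy the returned names into the metadata table.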

…–> “New submission” -> Follow the wizard

Upload data with Aspera

The Submission Portal offers two ways to transfer your sequence files: FTP or Aspera on the command line (recommended for all submissions), and browser-based HTTP/Aspera transfer (recommended only for small submissions and small files).
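For the FTP route, the upload can also be scripted. The sketch below uses Python's standard `ftplib` rather than lftp or Aspera; the host, user name, password and folder are placeholders that must be replaced with the values shown in the portal's FTP upload instructions, and the function names are hypothetical.

```python
import os
from ftplib import FTP


def upload_files(ftp, remote_dir, local_paths):
    """Upload files into remote_dir over an already logged-in FTP session."""
    ftp.cwd(remote_dir)
    uploaded = []
    for path in local_paths:
        name = os.path.basename(path)
        with open(path, "rb") as handle:
            ftp.storbinary(f"STOR {name}", handle)  # binary-mode transfer
        uploaded.append(name)
    return uploaded


def upload_to_preload_folder(local_paths):
    """Connect and upload; every value below is a placeholder, not a real setting."""
    host = "FTP_HOST_FROM_PORTAL"
    user = "USER_FROM_PORTAL"
    password = "PASSWORD_FROM_PORTAL"
    folder = "PRELOAD_FOLDER_FROM_PORTAL"
    with FTP(host) as ftp:
        ftp.login(user, password)
        return upload_files(ftp, folder, local_paths)
```

Nothing runs until `upload_to_preload_folder([...])` is called with the placeholders filled in. Keeping the transfer loop separate from the connection makes it easy to try against a test server first; for large files, the portal's recommended Aspera command line is likely the better choice.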