Summary

Last updated 5 months ago

The Summary contains at most three tabs: Summary Report, Nextclade Report, and Pangolin Report.

Summary Report

Metrics By Sample

This table provides a top-line summary of each of the analyzed samples.

At the top is the "Download CSV" button, which downloads the contents of the table as a comma-separated values (CSV) text file.

Next is the table itself, which contains one row per sample and the following columns:

  1. Sample: Name of the BaseSpace sample analyzed

  2. Status: Status of the sample analysis

  3. Input Reads: Total number of reads in input FASTQs

  4. Mapped Reads: Number of reads that map to reference sequences during short read alignment

  5. Num Genomes: Number of genomes chosen during the reference selection stage

  6. Virus: Name of the genome to which the reference sequence belongs

  7. % Callable: Percentage of bases in the reference sequence with coverage above the minimum read coverage depth for consensus sequence generation (10x by default)

    • Callable bases are genomic positions where read coverage is high enough for reliable variant calling, so the software can output a base call there. The cutoff is the minimum read coverage depth for consensus sequence generation (10x by default).

    • When generating consensus sequences, genomic positions below the threshold are hard-masked with "N" characters to avoid reference bias (inclusion of a reference base when the true base cannot be accurately determined).

    • This percentage is calculated over the length of the reference genome(s), not the final consensus sequence(s), which may be trimmed.
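
The relationship between the coverage threshold, callable bases, and "N" masking can be sketched in a few lines of Python. This is an illustration only, not the DRAGEN implementation: the function names and the per-position depth list are hypothetical, and only the 10x default comes from the description above. The sketch also assumes that a position exactly at the threshold counts as callable.

```python
MIN_DEPTH = 10  # default minimum read coverage depth from the text above


def percent_callable(depths, min_depth=MIN_DEPTH):
    """Percentage of reference positions with depth at or above min_depth.

    `depths` is a hypothetical list holding one read depth per reference
    position, so the denominator is the full reference length.
    """
    if not depths:
        return 0.0
    callable_count = sum(1 for d in depths if d >= min_depth)
    return 100.0 * callable_count / len(depths)


def mask_consensus(bases, depths, min_depth=MIN_DEPTH):
    """Replace bases at positions below min_depth with 'N' (hard-masking)."""
    return "".join(
        base if depth >= min_depth else "N"
        for base, depth in zip(bases, depths)
    )


# Made-up depths for a 10-base reference
depths = [0, 4, 12, 30, 30, 25, 11, 9, 10, 15]
print(percent_callable(depths))              # 7 of 10 positions -> 70.0
print(mask_consensus("ACGTACGTAC", depths))  # NNGTACGNAC
```

Because the denominator is the reference length, a consensus sequence that is later trimmed does not change the percentage.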

Pre-processing Metrics

This stacked bar plot contains counts of reads that fall into the following categories:

  • Removed in Downsampling: Reads that were removed during downsampling because the user specified a downsampling target in the Input Form under Advanced Workflow Settings

  • Removed in QC: Reads removed during pre-processing because they failed the quality thresholds

  • Removed as Duplicate: Reads that were labeled as duplicates during short read alignment. Duplicate removal can be disabled in the Input Form under Advanced Workflow Settings

  • Removed in Trimming: Reads that were removed in the initial sequence-based primer trimming step and were excluded from further processing

  • Removed in De-hosting: Reads that were filtered out as human reads based on kmer-based classification during pre-processing.

    • This improves the quality of downstream analysis and helps ensure that human sequences are not included in the output BAM files.

    • This is applied only if 'Amplicon Primer Set' was set to 'Custom' in the Input Form.

    • This can be disabled in the Input Form under Advanced Workflow Settings by unchecking "Remove off-target reads".

  • Removed as Off-target: Reads that were filtered out as off-target reads based on kmer-based classification during pre-processing

    • Similar to de-hosting, this improves the quality of downstream analysis.

    • Off-target is defined as not coming from the target organism, which is determined based on the 'Amplicon Primer Set' selection in the Input Form. For example, if the "Influenza A and B, Universal Primers" option is selected, a kmer database generated from a large collection of publicly available Influenza sequences is used to separate reads likely coming from Influenza from the rest.

    • This can be disabled in the Input Form under Advanced Workflow Settings by unchecking "Remove off-target reads".

  • Unmapped: Reads that were not aligned to any reference genomes

  • Mapped: Reads that were mapped to at least one reference genome
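
To compare samples with different sequencing depths, it can help to express each category as a share of all reads in the plot. The sketch below assumes the categories are mutually exclusive, so that their counts add up to the total (the stacked bar suggests this, but the text does not state it). The `category_shares` helper and the example counts are made up for illustration.

```python
# Category names as listed above
CATEGORIES = [
    "Removed in Downsampling",
    "Removed in QC",
    "Removed as Duplicate",
    "Removed in Trimming",
    "Removed in De-hosting",
    "Removed as Off-target",
    "Unmapped",
    "Mapped",
]


def category_shares(counts):
    """Return each category's percentage of the summed read counts.

    `counts` maps category name to read count; missing categories count as 0.
    """
    total = sum(counts.get(name, 0) for name in CATEGORIES)
    if total == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {name: 100.0 * counts.get(name, 0) / total for name in CATEGORIES}


# Hypothetical counts for one sample (not real data)
example_counts = {
    "Removed in Downsampling": 0,
    "Removed in QC": 5_000,
    "Removed as Duplicate": 20_000,
    "Removed in Trimming": 1_000,
    "Removed in De-hosting": 3_000,
    "Removed as Off-target": 1_000,
    "Unmapped": 10_000,
    "Mapped": 60_000,
}

shares = category_shares(example_counts)
for name in CATEGORIES:
    print(f"{name}: {shares[name]:.1f}%")
```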

Nextclade Report (optional)

This tab contains tables reporting the results of the Nextclade analysis performed on the generated consensus sequences across all samples. Nextclade is run if the "Enable NextClade" box is checked on the Input Form and one of the following is true:

  • 'Amplicon Primer Set' is set to a non-custom set whose reference has a Nextclade dataset available, and a valid consensus sequence was generated.

  • 'Amplicon Primer Set' is set to 'Custom' and one or more Nextclade datasets are selected under 'Custom Reference'. In this case, each of the selected Nextclade datasets is applied to each consensus sequence generated for every sample. This may result in multiple Nextclade results for each consensus sequence.

Each table contains a "Download CSV" button which allows the user to download the contents of the report as a text CSV file.
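
The conditions above, and the reason a custom run can yield several Nextclade results per consensus sequence, can be sketched as a small decision function. This is a reading of the rules as described on this page, not the app's actual code: the `nextclade_runs` function, its parameters, and the dataset and sequence names are all hypothetical.

```python
def nextclade_runs(enabled, primer_set, consensus_ids,
                   builtin_dataset=None, custom_datasets=()):
    """Return the (consensus_id, dataset) pairs Nextclade would be run on.

    enabled          -- state of the "Enable NextClade" box
    primer_set       -- 'Custom' or the name of a non-custom primer set
    consensus_ids    -- valid consensus sequences generated for a sample
    builtin_dataset  -- dataset tied to a non-custom set's reference, if any
    custom_datasets  -- datasets selected under 'Custom Reference'
    """
    if not enabled or not consensus_ids:
        return []
    if primer_set == "Custom":
        datasets = list(custom_datasets)
    else:
        datasets = [builtin_dataset] if builtin_dataset else []
    # Every selected dataset is applied to every consensus sequence.
    return [(cid, ds) for cid in consensus_ids for ds in datasets]


# Two consensus sequences and two selected datasets give four results.
runs = nextclade_runs(
    enabled=True,
    primer_set="Custom",
    consensus_ids=["sampleA_seq1", "sampleA_seq2"],
    custom_datasets=["dataset_x", "dataset_y"],
)
for consensus_id, dataset in runs:
    print(consensus_id, dataset)
```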

Pangolin Report (optional)

This tab contains tables reporting the results of the Pangolin analysis performed on the generated consensus sequences across all samples. Pangolin is run if the "Enable Pangolin" box is checked on the Input Form and one of the following is true:

  • 'Amplicon Primer Set' is set to a non-custom set with SARS-CoV-2 as reference (e.g. SARS-CoV-2, ARTIC v5.4.2 primers) and a valid consensus sequence was generated.

  • 'Amplicon Primer Set' is set to 'Custom'. In this case, Pangolin is applied to every consensus sequence generated for the sample since the software assumes all of them to be potentially SARS-CoV-2 sequences.

Each table contains a "Download CSV" button which allows the user to download the contents of the report as a text CSV file.

Detected Amplicons: Proportion of amplicons detected out of the total expected for the sample, which is used to determine whether the sample is of sufficient quality for variant calling. See this page for more details.

All content shown in the Nextclade Report tab is derived from the output of the Nextclade software. Please see the Nextclade documentation for more details.

All content shown in the Pangolin Report tab is derived from the output of the Pangolin software. Please see the Pangolin documentation for more details.

[Figure: Example of a Summary Report tab]