Summary

Last updated 6 months ago

The Summary contains at most three tabs: Summary Report, Nextclade Report, and Pangolin Report.

Summary Report

Metrics By Sample

This table provides a top-line summary of each of the analyzed samples.

At the top is the "Download CSV" button, which downloads the contents of the table as a comma-separated values (CSV) text file.
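
As a rough illustration of working with the downloaded file, the sketch below reads a CSV and lists samples whose % Callable falls under a chosen cutoff. It assumes the CSV header names match the column names described in this section; the sample rows, the "Completed" status value, the virus name, and the 90% cutoff are all hypothetical placeholders, and only a subset of columns is shown.

```python
import csv
import io

# Hypothetical excerpt of a downloaded "Metrics By Sample" CSV. The header
# names are ASSUMED to match the column names shown in the report.
EXAMPLE_CSV = """Sample,Status,Input Reads,Mapped Reads,Num Genomes,Virus,% Callable
sample_A,Completed,200000,150000,1,Virus X,98.5
sample_B,Completed,180000,9000,1,Virus X,42.0
"""


def low_callable_samples(csv_text, min_pct=90.0):
    """Return the names of samples whose % Callable is below min_pct."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row["Sample"] for row in rows if float(row["% Callable"]) < min_pct]


print(low_callable_samples(EXAMPLE_CSV))  # prints ['sample_B']
```

For a real download, the file contents can be passed in place of `EXAMPLE_CSV`, after checking the actual header row.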

Next is the table itself, which contains one row per sample and the following columns:

Sample: Name of the BaseSpace sample analyzed
Status: Status of the sample analysis
Input Reads: Total number of reads in input FASTQs
Mapped Reads: Number of reads that map to reference sequences during short read alignment
Detected Amplicons: Proportion of amplicons detected out of the total expected for the sample, which is used to determine whether the sample is of sufficient quality for variant calling
Num Genomes: Number of genomes chosen during the reference selection stage
Virus: Name of the genome to which the reference sequence belongs
% Callable: Percentage of bases in the reference sequence with coverage above the minimum read coverage depth for consensus sequence generation (10x by default)
- Callable bases are those for which reliable variant calling can be performed and therefore for which the software can output a base call. They are defined as genomic positions with read coverage above the minimum read coverage depth for consensus sequence generation (10x by default).
- When generating consensus sequences, genomic positions below the threshold are hard-masked with "N" characters to avoid reference bias (inclusion of a reference base when the true base cannot be accurately determined).
- This percentage is calculated over the length of the reference genome(s), not the final consensus sequence(s), which may be trimmed.
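
The % Callable definition and the hard-masking rule above can be sketched as follows. This is a minimal illustration, not the software's implementation: the function names are hypothetical, the per-position depth list is made-up input, and treating a depth equal to the threshold as callable is an assumption, since the text only says "above the minimum".

```python
MIN_DEPTH = 10  # default minimum read coverage depth stated in the text


def percent_callable(depths, min_depth=MIN_DEPTH):
    """Percentage of reference positions whose depth meets the threshold.

    The denominator is the reference length (len(depths)), not the length
    of a possibly trimmed consensus sequence.
    """
    if not depths:
        return 0.0
    # ASSUMPTION: depth == min_depth counts as callable.
    callable_count = sum(1 for d in depths if d >= min_depth)
    return 100.0 * callable_count / len(depths)


def mask_consensus(bases, depths, min_depth=MIN_DEPTH):
    """Hard-mask positions below the threshold with 'N'."""
    return "".join(b if d >= min_depth else "N" for b, d in zip(bases, depths))


depths = [25, 12, 9, 0, 10, 30, 3, 15]   # hypothetical per-position coverage
print(percent_callable(depths))            # prints 62.5
print(mask_consensus("ACGTACGT", depths))  # prints ACNNACNT
```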

Pre-processing Metrics

This stacked bar plot contains counts of reads that fall into the following categories:

Removed in Downsampling: Reads that were removed during downsampling because the user specified a downsampling target in the Input Form under Advanced Workflow Settings
Removed in QC: Reads removed as poor quality reads based on quality thresholds during pre-processing
Removed as Duplicate: Reads that were labeled as duplicates during short read alignment. Duplicate removal can be disabled in the Input Form under Advanced Workflow Settings
Removed in Trimming: Reads that were removed in the initial sequence-based primer trimming step and were excluded from further processing
Removed in De-hosting: Reads that were filtered out as human reads based on kmer-based classification during pre-processing.
- This improves the quality of downstream analysis and helps ensure that human sequences are not included in the output BAM files.
- This is applied only if 'Amplicon Primer Set' was set to 'Custom' in the Input Form.
- This can be disabled in the Input Form under Advanced Workflow Settings by unchecking "Remove off-target reads".
Removed as Off-target: Reads that were filtered out as off-target reads based on kmer-based classification during pre-processing
- Similar to de-hosting, this improves the quality of downstream analysis.
- Off-target is defined as not coming from the target organism, which is determined based on the 'Amplicon Primer Set' selection in the Input Form. For example, if "Influenza A and B, Universal Primers" option is selected, a kmer database generated from a large collection of publicly available Influenza sequences is used to separate reads likely coming from Influenza from the rest.
- This can be disabled in the Input Form under Advanced Workflow Settings by unchecking "Remove off-target reads".
Unmapped: Reads that were not aligned to any reference genomes
Mapped: Reads that were mapped to at least one reference genome
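
The kmer-based classification mentioned for de-hosting and off-target removal is not specified in detail here. The sketch below shows one common way such a filter can work, under stated assumptions: a set of kmers is built from known target sequences, and a read is kept when a sufficient fraction of its kmers appear in that set. The function names, the kmer size, the 0.5 threshold, and the toy sequences are all hypothetical, and reverse complements are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_kmer_db(sequences, k):
    """Collect the kmers of every target sequence into one set."""
    db = set()
    for s in sequences:
        db |= kmers(s, k)
    return db


def is_on_target(read, db, k, min_fraction=0.5):
    """True if at least min_fraction of the read's distinct kmers are in db."""
    read_kmers = kmers(read, k)
    if not read_kmers:  # read shorter than k: nothing to match
        return False
    hits = sum(1 for km in read_kmers if km in db)
    return hits / len(read_kmers) >= min_fraction


K = 5  # hypothetical kmer size
db = build_kmer_db(["ACGTACGTTAGCCGATAGCTAGGCTA"], K)  # toy "target" genome
reads = ["CGTTAGCCGATAG", "TTTTTTTTTTTTT"]
kept = [r for r in reads if is_on_target(r, db, K)]
removed_as_off_target = [r for r in reads if not is_on_target(r, db, K)]
```

In this toy run the first read is a substring of the target and is kept, while the second shares no kmers with it and would be counted as off-target.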

Nextclade Report (optional)

This tab contains tables reporting the results of the Nextclade analysis performed on the generated consensus sequences across all samples. Nextclade is run if the "Enable NextClade" box is checked on the Input Form and one of the following is true:

'Amplicon Primer Set' is set to a non-custom set whose reference has a Nextclade dataset available, and a valid consensus sequence was generated.
'Amplicon Primer Set' is set to 'Custom' and one or more Nextclade datasets are selected under 'Custom Reference'. In this case, each of the selected Nextclade datasets is applied to each consensus sequence generated for every sample. This may result in multiple Nextclade results for each consensus sequence.
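
The two conditions above can be expressed as a small decision function. This is only a sketch of the logic as described: the function name, parameter names, and dataset and sequence identifiers are hypothetical and do not correspond to any actual API of the software.

```python
from itertools import product


def nextclade_runs(enabled, primer_set, consensus_ids,
                   builtin_dataset=None, custom_datasets=()):
    """Return the (consensus_id, dataset) pairs Nextclade would be run on.

    enabled          -- whether the "Enable NextClade" box is checked
    primer_set       -- the 'Amplicon Primer Set' selection
    consensus_ids    -- valid consensus sequences that were generated
    builtin_dataset  -- dataset tied to a non-custom primer set, if any
    custom_datasets  -- datasets selected under 'Custom Reference'
    """
    if not enabled or not consensus_ids:
        return []
    if primer_set == "Custom":
        # Every selected dataset is applied to every consensus sequence.
        return list(product(consensus_ids, custom_datasets))
    if builtin_dataset is None:
        return []
    return [(cid, builtin_dataset) for cid in consensus_ids]


runs = nextclade_runs(True, "Custom", ["seq_1", "seq_2"],
                      custom_datasets=["dataset_A", "dataset_B"])
print(len(runs))  # prints 4
```

The cross product in the 'Custom' branch is why a single consensus sequence can appear with several Nextclade results.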

Each table contains a "Download CSV" button which allows the user to download the contents of the report as a text CSV file.

All content shown in the tab is derived from the output of the Nextclade software. See the Nextclade documentation for more details.

Pangolin Report (optional)

This tab contains tables reporting the results of the Pangolin analysis performed on the generated consensus sequences across all samples. Pangolin is run if the "Enable Pangolin" box is checked on the input form and one of the following is true:

'Amplicon Primer Set' is set to a non-custom set with SARS-CoV-2 as reference (e.g., SARS-CoV-2, ARTIC v5.4.2 primers) and a valid consensus sequence was generated.
'Amplicon Primer Set' is set to 'Custom'. In this case, Pangolin is applied to every consensus sequence generated for the sample since the software assumes all of them to be potentially SARS-CoV-2 sequences.

Each table contains a "Download CSV" button which allows the user to download the contents of the report as a text CSV file.

All content shown in the tab is derived from the output of the Pangolin software. See the Pangolin documentation for more details.
