Custom reference

In addition to the built-in options, DRAGEN Microbial Amplicon supports custom reference genomes and primer definitions. These files must be uploaded to a BaseSpace Project before they can be used. See the BaseSpace documentation for more information about importing files into BaseSpace.

In the app input form, select the 'Custom' option for 'Amplicon Primer Set'. Then expand the 'Custom Reference' settings to provide the following:

Custom Reference FASTA for Consensus Generation (required)
Custom Reference BED (optional)
Custom PCR Primer Definitions (optional)

Custom reference FASTA

If the 'Custom' option is selected for 'Amplicon Primer Set', the user must provide a custom FASTA file containing one or more reference sequences. These sequences serve as the target for read alignment and as the basis for generating consensus sequences. The software generates the required DRAGEN hash tables and other auxiliary files automatically, so there is no need to process the FASTA file with a separate app. Note that some of the reference sequences provided in the FASTA file may not be used for read alignment and consensus sequence generation.
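Before uploading, it can help to confirm that the FASTA file parses cleanly and that each reference sequence has a distinct, non-empty name. The following is a minimal sketch of such a check, not part of the DRAGEN app; the function names and the example sequences are hypothetical, and the rules it enforces (unique names, non-empty sequences) are assumptions about what makes a reference file sensible rather than documented requirements.

```python
def read_fasta_records(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # The sequence name is the first whitespace-delimited token after '>'.
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_reference(text):
    """Return a list of human-readable problems found in the FASTA text."""
    problems = []
    records = read_fasta_records(text)
    if not records:
        problems.append("no sequences found")
    seen = set()
    for name, seq in records:
        if not name:
            problems.append("sequence with empty name")
        if name in seen:
            problems.append(f"duplicate sequence name: {name}")
        seen.add(name)
        if not seq:
            problems.append(f"empty sequence: {name}")
    return problems


# Hypothetical two-sequence reference used only for illustration.
example = ">segment_1 hypothetical\nACGTACGT\nACGT\n>segment_2\nGGCC\n"
print(read_fasta_records(example))
print(check_reference(example))
```

An empty list from `check_reference` means the sketch found nothing to flag; it does not guarantee that the app will accept the file.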

Custom reference BED

Optionally, a reference BED file may be provided to add information about each reference sequence in the FASTA file, such as human-readable names to be used in the reports. For multi-segment genomes such as Influenza, this file assigns the segment name to each sequence, which allows the software to group individual segment sequences by genome. See the Reference BED file format page for the format of this file.

Custom PCR primer definitions

Optionally, a TSV file may be provided to define the primer sequences or binding locations, which are used for two purposes:

Primer sequences are trimmed from reads, which separates bases that may come from the primers themselves (which we do not want) from bases contributed by the biological sample (which we do want). This reduces reference bias that can incorrectly lower the observed allele frequency of true sequence variants in primer binding sites.
Primer locations are used to define the amplicons expected from PCR reactions. The read coverage within the unique (non-overlapping) amplicon regions is used to determine whether each amplicon is reliably detected. The percentage of detected amplicons is used to determine whether sufficient material exists to accurately call variants and generate consensus sequences from the sample.
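The amplicon detection idea described above can be sketched in a few lines. This is an illustration only, under stated assumptions: amplicon coordinates are taken as already derived from the primer locations, coordinates are 0-based and half-open, and an amplicon counts as detected when the mean depth over its unique region meets a cutoff. The function names, the `min_mean_depth` parameter, and the example values are hypothetical; the criteria the software actually applies are not stated here.

```python
def unique_regions(amplicons):
    """Map each amplicon name to the positions covered by that amplicon only.

    amplicons: list of (name, start, end) with 0-based, half-open coordinates.
    """
    cover_count = {}
    for _, start, end in amplicons:
        for pos in range(start, end):
            cover_count[pos] = cover_count.get(pos, 0) + 1
    return {
        name: [pos for pos in range(start, end) if cover_count[pos] == 1]
        for name, start, end in amplicons
    }


def detected_amplicons(amplicons, depth, min_mean_depth):
    """Return the names of amplicons whose unique region meets the depth cutoff."""
    detected = []
    for name, positions in unique_regions(amplicons).items():
        if not positions:
            continue  # no unique region, so it cannot be assessed in this sketch
        mean_depth = sum(depth[p] for p in positions) / len(positions)
        if mean_depth >= min_mean_depth:
            detected.append(name)
    return detected


# Hypothetical tiled amplicons with 2-base overlaps and a per-base depth track.
amplicons = [("amp1", 0, 10), ("amp2", 8, 18), ("amp3", 16, 26)]
depth = [30] * 10 + [0] * 6 + [25] * 10
found = detected_amplicons(amplicons, depth, min_mean_depth=10)
percent = 100.0 * len(found) / len(amplicons)
print(found, round(percent, 1))
```

In this example the middle amplicon has no coverage in its unique region, so two of the three amplicons are reported as detected. Restricting the calculation to unique regions matters because reads in an overlap could come from either neighboring amplicon.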

See the following pages for further information:

Nextclade datasets

Optionally, one or more Nextclade datasets can be selected for phylogenetic analysis of the consensus sequences generated from the samples. Every selected dataset is applied to every consensus sequence generated in every sample.
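Because every dataset is paired with every consensus sequence, the number of Nextclade analyses grows as the product of the two. The sketch below simply enumerates those pairings to make the scaling concrete; the sample names, sequence names, dataset names, and the `nextclade_jobs` function are hypothetical and do not reflect how the app schedules work internally.

```python
from itertools import product


def nextclade_jobs(consensus_by_sample, datasets):
    """List every (sample, consensus sequence, dataset) combination."""
    sequences = [
        (sample, seq_name)
        for sample, names in consensus_by_sample.items()
        for seq_name in names
    ]
    return [
        (sample, seq_name, dataset)
        for (sample, seq_name), dataset in product(sequences, datasets)
    ]


# Hypothetical run: three consensus sequences across two samples, two datasets.
consensus_by_sample = {"sample1": ["segA", "segB"], "sample2": ["segA"]}
datasets = ["dataset_X", "dataset_Y"]
jobs = nextclade_jobs(consensus_by_sample, datasets)
print(len(jobs))
```

Selecting only the datasets relevant to the expected organisms keeps this number, and the size of the resulting reports, manageable.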