# DRAGEN Microbial Amplicon App Documentation

## Overview

DRAGEN Microbial Amplicon is a software application designed to analyze sequencing data from amplicon library preps (both DNA and RNA) on microbiological samples, with an emphasis on viruses. Illumina sequencing reads are processed to generate consensus sequences that represent a best estimate of the population of viral sequences in each sample. Where appropriate, these consensus sequences are further analyzed by the phylogenetic analysis tools Nextclade and/or Pangolin to identify the clade or lineage of each sequence.

### Input

Data can be provided in one of the following ways:

* Samples / biosamples with FASTQ datasets (see details in library preparation documents)
* A project containing one or more samples / biosamples with FASTQ datasets
  * All samples / biosamples in the selected project will be analyzed

**Supported amplicon primer schemes**

* Chikungunya
  * [Grubaugh Lab](https://grubaughlab.com/open-science/amplicon-sequencing/)
  * Illumina
* Dengue
  * Serotype 1 - Illumina
  * All serotypes - [DengueSeq from Grubaugh Lab](https://github.com/grubaughlab/DENV-genomics)
* Influenza [A](https://doi.org/10.1007/978-1-61779-621-0_11)/[B](https://doi.org/10.1128/jcm.03265-13) - Universal
* Mpox
  * Pan-clade - [ARTIC](https://labs.primalscheme.com/detail/artic-inrb-mpox/2500/v1.0.0/)
  * Clade I - Illumina
  * Clade II - [Grubaugh Lab](https://dx.doi.org/10.17504/protocols.io.5qpvob1nbl4o/v1)
* RSV
  * [CDC](https://doi.org/10.1016/j.jviromet.2021.114335)
  * [WCCRRI](https://doi.org/10.1016/j.jcv.2023.105423)
* SARS-CoV-2 - ARTIC
  * [v5.4.2](https://community.artic.network/t/scheme-release-artic-sars-cov2-400-v5-4-2)
  * [v5.3.2, v4.1, v4, v3](https://github.com/artic-network/artic-ncov2019/tree/master/primer_schemes/nCoV-2019)
* Zika - [Grubaugh Lab](https://grubaughlab.com/open-science/amplicon-sequencing/)

**Custom genome and primer sets**

Users can upload custom files to provide a user-defined reference genome set and primer definitions.
Multiplexed amplicon panels targeting multiple organisms in the same reaction are supported.

{% content-ref url="/pages/vG1KKCccA8uSy3r0C1Zx" %}
[Custom reference](/dragen-microbial-amplicon/dragen-microbial-amplicon/custom.md)
{% endcontent-ref %}

### Pipeline steps

1. Trim and filter reads using [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic)
2. Remove off-target reads using the DRAGEN v4.3.6 k-mer classifier (for a custom reference, remove human reads using a modified version of the [SRA Human Read Scrubber tool](https://github.com/ncbi/sra-human-scrubber) v2.2.1)
3. For organisms with one default reference genome, skip this step. For organisms with multiple candidates, trim primer sequences in reads using [Trimmomatic](http://www.usadellab.org/cms/?page=trimmomatic), perform assembly using [MEGAHIT](https://github.com/voutcn/megahit), cluster contigs using [CD-HIT-EST](http://weizhong-cluster.ucsd.edu/cd-hit/), map contigs to candidate reference genomes using [minimap2](https://github.com/lh3/minimap2), then select reference genomes based on the mapping
4. Align reads to the default reference genome or selected reference genomes using DRAGEN
5. Trim primer sequences in aligned reads based on coordinates
6. Filter out samples with insufficient amplicon coverage
7. Call sequence variants from the alignments using DRAGEN and apply them to the corresponding reference genomes to create consensus sequences
8. If applicable, run Nextclade/Pangolin on the consensus sequences

{% content-ref url="/pages/A96kDmaXOrbxLJKQ1Vog" %}
[Pipeline Logic](/dragen-microbial-amplicon/dragen-microbial-amplicon/pipeline.md)
{% endcontent-ref %}

### Output

* Consensus sequences representing a best estimate of targeted sequences
* Tables and plots reporting read counts, coverage, and Nextclade/Pangolin results

{% content-ref url="/pages/X96CHXeHG7gtcF7xYQee" %}
[Output files](/dragen-microbial-amplicon/dragen-microbial-amplicon/output.md)
{% endcontent-ref %}

## Currently supported platforms

* BaseSpace Sequence Hub

## Important Notes

* The sequences are labeled according to the best match in the reference database. That database is not exhaustive, so the labels should not be taken as definitive for strain typing. If strain typing is needed, the built-in Nextclade and/or Pangolin tools can be used for supported organisms. Alternatively, a BLAST or similar search of nucleotide databases may provide a more detailed match.
* Because of sequence homology, a very small number of reads assigned to an organism can result in a sequence being generated for an organism that is not present in the sample (a false positive). Although the de novo assembly step of this software largely mitigates such instances, sequences with very low horizontal coverage (< 5%) should be treated with caution.
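
The caution in Important Notes about sequences with very low horizontal coverage (< 5%) can be applied as a simple post-hoc check. The sketch below is illustrative only and is not part of the app: it assumes horizontal coverage means the fraction of reference positions covered by at least one read, and that per-position read depths are already available as a list of integers. The names `horizontal_coverage`, `needs_caution`, and `min_depth` are hypothetical.

```python
from typing import Sequence

# The "< 5%" caution level quoted in Important Notes.
LOW_COVERAGE_THRESHOLD = 0.05


def horizontal_coverage(depths: Sequence[int], min_depth: int = 1) -> float:
    """Return the fraction of reference positions with depth >= min_depth.

    Assumption: one depth value per reference position; min_depth=1 is an
    illustrative choice, not a documented setting of the app.
    """
    if not depths:
        raise ValueError("depths must contain one entry per reference position")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def needs_caution(depths: Sequence[int], min_depth: int = 1) -> bool:
    """Flag a sequence whose horizontal coverage is below the 5% level."""
    return horizontal_coverage(depths, min_depth) < LOW_COVERAGE_THRESHOLD


if __name__ == "__main__":
    # Hypothetical 1,000-base reference with only 30 covered positions.
    sparse = [0] * 970 + [12] * 30
    print(horizontal_coverage(sparse), needs_caution(sparse))
```

A sequence flagged this way is not necessarily a false positive; the check only identifies candidates for closer review, for example with a BLAST search as suggested in the notes.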