📖Understanding JSON file format
The DRAGEN Microbial Enrichment Plus app outputs a comprehensive sample-level report.json file containing general metadata, version information, sample QC, microorganism and AMR marker results, and detailed test information. The app also generates additional convenience file formats, but these do not contain novel content.
Top-Level Node
The top-level section of the report JSON contains general metadata and version information.
Field | Description |
---|---|
.accession | Identifier used for the sample |
.deploymentEnvironment | Environment in which the results were produced |
.batchId | Identifier used for the batch of samples processed together |
.analysisId | Identifier used for the analysis |
.runId | Identifier used for the sequencing run |
.controlFlag | Indicates whether the sample is a control. It is based on the ControlFlag field in the sample |
.dragenVersion | DRAGEN release version |
.analysisPipelineVersion | Analysis Pipeline release version |
.testType | Type of test enrichment panel (e.g. RPIP, VSP V2, Custom) |
.testVersion | Test panel release version |
.testName | Name of the test panel, e.g. "Explify® Respiratory Pathogen ID/AMR Panel (RPIP) - Data Analysis Solution" |
.testUse | Test use. "For Research Use Only. Not for use in diagnostic procedures" |
.reportTime | Time the report was generated |
.warnings | List of warnings encountered during the analysis |
.errors | List of errors encountered during the analysis |
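As a minimal sketch of how the top-level fields might be read, the Python snippet below parses a report and collects a few of them. It assumes the leading dot in the table denotes a key at the root of the JSON object. The embedded sample document and all of its values are invented for illustration, and `summarize_metadata` is a hypothetical helper, not part of the app.

```python
import json

# Hypothetical, heavily truncated report.json content; every value is invented.
SAMPLE_REPORT = """
{
  "accession": "SAMPLE-001",
  "runId": "RUN-42",
  "testType": "RPIP",
  "warnings": ["example warning"],
  "errors": []
}
"""


def summarize_metadata(report: dict) -> dict:
    """Collect a few top-level fields, tolerating absent keys."""
    return {
        "accession": report.get("accession"),
        "runId": report.get("runId"),
        "testType": report.get("testType"),
        "warningCount": len(report.get("warnings") or []),
        "errorCount": len(report.get("errors") or []),
    }


report = json.loads(SAMPLE_REPORT)
summary = summarize_metadata(report)
print(summary)
```

With a real file, `json.load(open(path))` would replace the embedded string. Checking `.warnings` and `.errors` first is a reasonable habit before interpreting any downstream results.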
.qcReport Node
This section contains information about sample quality control (QC). The fields are relative to .qcReport
Field | Description |
---|---|
.sampleQc | Sample QC information |
.sampleQc.totalRawBases | Number of base pairs in sample before read QC processing |
.sampleQc.totalRawReads | Number of reads in sample before read QC processing |
.sampleQc.uniqueReads | Number of distinct reads in sample before read QC processing |
.sampleQc.uniqueReadsProportion | Proportion of distinct reads in sample before read QC processing |
.sampleQc.preQualityMeanReadLength | Average read length before read QC processing |
.sampleQc.postQualityMeanReadLength | Average read length after read QC processing |
.sampleQc.postQualityReads | Number of reads in sample after read QC processing |
.sampleQc.postQualityReadsProportion | Proportion of post-quality reads in sample relative to total raw reads |
.sampleQc.removedInDehostingReads | Number of host reads in sample removed during dehosting |
.sampleQc.removedInDehostingReadsProportion | Proportion of host reads in sample removed relative to total raw reads |
.sampleQc.entropy | Kmer entropy of reads after read QC processing |
.sampleQc.gContent | Proportion of guanine (G) base calls in reads after read QC processing |
.sampleQc.libraryQScore | Quality score of the library after read QC processing |
.sampleQc.enrichmentFactor | Enrichment factor information (calculation requires detection of an appropriate Internal Control) |
.sampleQc.enrichmentFactor.value | Enrichment factor value reflecting how well targeted regions were enriched |
.sampleQc.enrichmentFactor.category | Enrichment factor category: "poor", "fair", "good", or "not calculated" |
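The sample QC fields above can be combined into a quick sanity check. The sketch below recomputes the post-quality read proportion from the two read counts and reads the enrichment factor category. The example values are invented, `qc_summary` is a hypothetical helper, and whether the app stores proportions as fractions between 0 and 1 is an assumption.

```python
def qc_summary(report: dict) -> dict:
    """Summarize a few .qcReport.sampleQc fields from a parsed report."""
    sample_qc = (report.get("qcReport") or {}).get("sampleQc") or {}
    raw = sample_qc.get("totalRawReads")
    post = sample_qc.get("postQualityReads")
    # Post-quality reads relative to total raw reads, as the table describes.
    recomputed = post / raw if raw and post is not None else None
    enrichment = sample_qc.get("enrichmentFactor") or {}
    return {
        "reportedPostQualityProportion": sample_qc.get("postQualityReadsProportion"),
        "recomputedPostQualityProportion": recomputed,
        "enrichmentCategory": enrichment.get("category", "not calculated"),
    }


# Invented example values, for illustration only.
example_report = {
    "qcReport": {
        "sampleQc": {
            "totalRawReads": 1_000_000,
            "postQualityReads": 900_000,
            "postQualityReadsProportion": 0.9,
            "enrichmentFactor": {"value": 12.5, "category": "good"},
        }
    }
}

qc = qc_summary(example_report)
print(qc)
```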
.qcReport.sampleComposition Node
This section contains information about the composition of the sample. The fields are relative to .qcReport.sampleComposition
Field | Description |
---|---|
.readClassification | Proportion of reads classified to the following categories: |
.readClassification.targetedMicrobial | Targeted microbial |
.readClassification.targetedInternalControl | Targeted Internal Control |
.readClassification.untargeted | Untargeted |
.readClassification.ambiguous | More than one category |
.readClassification.unclassified | No category |
.readClassification.lowComplexity | Low complexity |
.targetedMicrobial | Proportion of targeted microbial reads classified to the following sub-categories: |
.targetedMicrobial.viral | Viral targeted |
.targetedMicrobial.bacterial | Bacterial targeted |
.targetedMicrobial.fungal | Fungal targeted |
.targetedMicrobial.parasitic | Parasitic targeted |
.targetedMicrobial.bacterialAmr | Bacterial AMR targeted |
.untargeted | Proportion of untargeted reads classified to the following sub-categories: |
.untargeted.viral | Viral untargeted |
.untargeted.bacterial | Bacterial untargeted |
.untargeted.fungal | Fungal untargeted |
.untargeted.parasitic | Parasitic untargeted |
.untargeted.bacterialAmr | Bacterial AMR untargeted |
.untargeted.internalControl | Internal Control untargeted |
.untargeted.human | Human untargeted |
.viral | Proportion of viral reads classified to the following categories: |
.viral.targeted | Viral targeted |
.viral.untargeted | Viral untargeted |
.viral.untargetedSubcategories | Proportion of viral untargeted reads classified to the following sub-categories: |
.viral.untargetedSubcategories.panel | Viral panel members |
.viral.untargetedSubcategories.phage | Viral phage |
.viral.untargetedSubcategories.other | Viral other (not a panel member or phage) |
.bacterial | Proportion of bacterial reads classified to the following categories: |
.bacterial.targeted | Bacterial targeted |
.bacterial.untargeted | Bacterial untargeted |
.bacterial.untargetedSubcategories | Proportion of bacterial untargeted reads classified to the following sub-categories: |
.bacterial.untargetedSubcategories.panel | Bacterial panel members |
.bacterial.untargetedSubcategories.ribosomalDna | Bacterial ribosomal DNA (16S) |
.bacterial.untargetedSubcategories.plasmid | Bacterial plasmids |
.bacterial.untargetedSubcategories.other | Bacterial other (not a panel member, ribosomal DNA, or plasmid) |
.fungal | Proportion of fungal reads classified to the following categories: |
.fungal.targeted | Fungal targeted |
.fungal.untargeted | Fungal untargeted |
.fungal.untargetedSubcategories | Proportion of fungal untargeted reads classified to the following sub-categories: |
.fungal.untargetedSubcategories.panel | Fungal panel members |
.fungal.untargetedSubcategories.ribosomalDna | Fungal ribosomal DNA (18S) |
.fungal.untargetedSubcategories.other | Fungal other (not a panel member or ribosomal DNA) |
.parasitic | Proportion of parasitic reads classified to the following categories: |
.parasitic.targeted | Parasitic targeted |
.parasitic.untargeted | Parasitic untargeted |
.parasitic.untargetedSubcategories | Proportion of parasitic untargeted reads classified to the following sub-categories: |
.parasitic.untargetedSubcategories.panel | Parasitic panel members |
.parasitic.untargetedSubcategories.ribosomalDna | Parasitic ribosomal DNA (18S) |
.parasitic.untargetedSubcategories.other | Parasitic other (not a panel member or ribosomal DNA) |
.human | Proportion of human reads classified to the following categories: |
.human.untargeted | Human untargeted |
.human.untargetedSubcategories | Proportion of human untargeted reads classified to the following sub-categories: |
.human.untargetedSubcategories.ribosomalDna | Human ribosomal DNA |
.human.untargetedSubcategories.codingSequence | Human coding sequence |
.human.untargetedSubcategories.other | Human other (not ribosomal DNA or coding sequence) |
.internalControl | Proportion of Internal Control reads classified to the following categories: |
.internalControl.targeted | Internal Control targeted |
.internalControl.untargeted | Internal Control untargeted |
.microbialAndInternalControl | Proportion of Microbial and Internal Control reads classified to the following categories: |
.microbialAndInternalControl.targeted | Microbial and Internal Control targeted |
.microbialAndInternalControl.untargeted | Microbial and Internal Control untargeted |
.bacterialAmr | Proportion of bacterial AMR reads classified to the following categories: |
.bacterialAmr.targeted | Bacterial AMR targeted |
.bacterialAmr.untargeted | Bacterial AMR untargeted |
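To illustrate how the nested composition fields might be navigated, the sketch below finds the read classification category with the largest proportion. The proportions are invented, `dominant_category` is a hypothetical helper, and the assumption that each `.readClassification` value is a plain number is not confirmed by this page.

```python
def dominant_category(report: dict):
    """Return (category, proportion) for the largest read classification, or None."""
    composition = (report.get("qcReport") or {}).get("sampleComposition") or {}
    classification = composition.get("readClassification") or {}
    # Keep only numeric entries so unexpected nested values are ignored.
    numeric = {
        key: value
        for key, value in classification.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }
    if not numeric:
        return None
    return max(numeric.items(), key=lambda item: item[1])


# Invented proportions, for illustration only.
example_report = {
    "qcReport": {
        "sampleComposition": {
            "readClassification": {
                "targetedMicrobial": 0.62,
                "targetedInternalControl": 0.03,
                "untargeted": 0.30,
                "ambiguous": 0.01,
                "unclassified": 0.03,
                "lowComplexity": 0.01,
            }
        }
    }
}

dominant = dominant_category(example_report)
print(dominant)
```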
.qcReport.internalControls Node
The value of the .qcReport.internalControls
field is an array of objects containing name and RPKM information for each Internal Control.
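A sketch of reading this array is shown below. The key names `name` and `rpkm` are assumptions inferred from the description ("name and RPKM information") and may differ in an actual report. The values are invented, and `internal_control_rpkm` is a hypothetical helper.

```python
def internal_control_rpkm(report: dict) -> dict:
    """Map each Internal Control name to its RPKM value."""
    controls = (report.get("qcReport") or {}).get("internalControls") or []
    # "name" and "rpkm" are assumed key names, not confirmed by the documentation.
    return {control.get("name"): control.get("rpkm") for control in controls}


# Invented example array, for illustration only.
example_report = {
    "qcReport": {
        "internalControls": [
            {"name": "Enterobacteria phage T7", "rpkm": 1520.4},
            {"name": "Hypothetical control B", "rpkm": 0.0},
        ]
    }
}

controls_by_name = internal_control_rpkm(example_report)
print(controls_by_name)
```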
.userOptions Node
This section gives information about analysis options specified by the user. The fields are relative to .userOptions
Field | Description |
---|---|
.quantitativeInternalControlName | The quantitative Internal Control used for microorganism absolute quantification (recommendation: Enterobacteria phage T7) |
.quantitativeInternalControlConcentration | The quantitative Internal Control concentration (recommendation: 1.21 x 10^7 copies/mL of sample) |
.readQcEnabled | Boolean field that indicates whether read QC (trimming and filtering based on quality and read length) was enabled |
.readClassificationSensitivity | (VSP V2 only) Sensitivity threshold for classifying reads. Determines whether alignment should proceed for a microorganism and/or reference sequence |
.targetReport.microorganisms[] Node
The value of the .targetReport.microorganisms[]
field is an array of objects containing information about detected microorganisms. The following table describes one .targetReport.microorganisms[]
object. The fields are relative to .targetReport.microorganisms[]
Field | Description |
---|---|
.class | Microorganism class ("viral", "bacterial", "fungal", "parasite") |
.name | Name of microorganism |
.coverage | Proportion of targeted microorganism reference sequence bases that appear in sample sequencing reads |
.ani | Average nucleotide identity of consensus sequence to targeted microorganism reference sequences |
.medianDepth | Median depth of sample sequencing reads aligned to targeted microorganism reference sequences, indicating the median number of times each targeted microorganism reference sequence base appears in sample sequencing reads |
.condensedDepthVector | Read depth across the targeted microorganism reference sequences, condensed to 256 bins |
.rpkm | Normalized representation of the number of sample sequencing reads aligned to targeted microorganism reference sequences (targeted Reads mapped Per Kilobase of targeted sequence per Million quality-filtered reads) |
.alignedReadCount | Number of sample sequencing reads that aligned to targeted microorganism reference sequences |
.kmerReadCount | (UPIP only) Number of sample sequencing reads classified to targeted microorganism reference sequences |
.absoluteQuantityRatio | Numerical absolute quantification value |
.absoluteQuantityRatioFormatted | Formatted absolute quantification value with units |
.phenotypicGroup | Grouping indicating general association with normal flora, colonization, or contamination from the environment or other sources, as well as general association with disease |
.associatedAmrMarkers | (Bacteria only) Information about the bacterial AMR markers associated with the microorganism |
.associatedAmrMarkers.applicable | Boolean indicating whether one or more bacterial AMR markers are associated with the microorganism |
.associatedAmrMarkers.detected | List of detected bacterial AMR markers associated with the microorganism |
.associatedAmrMarkers.predicted | List of predicted bacterial AMR markers associated with the microorganism |
.consensusGenomeSequences | (RPIP/VSP V2 viruses only) Information about the majority consensus genome (or segment) sequence |
.consensusGenomeSequences.sequence | Consensus genome (or segment) sequence bases |
.consensusGenomeSequences.referenceAccession | Accession of the reference genome (or segment) sequence |
.consensusGenomeSequences.referenceDescription | Description of the reference genome (or segment) sequence |
.consensusGenomeSequences.referenceLength | Length of the reference genome (or segment) sequence |
.consensusGenomeSequences.maximumAlignmentLength | Longest contiguous alignment between consensus sequence and reference genome (or segment) sequence |
.consensusGenomeSequences.maximumGapLength | Longest contiguous alignment gap (insertion or deletion) between consensus sequence and reference genome (or segment) sequence |
.consensusGenomeSequences.maximumUnalignedLength | Longest section of the reference genome (or segment) sequence not aligned to by consensus sequence |
.consensusGenomeSequences.coverage | Proportion of reference genome (or segment) sequence bases that appear in sample sequencing reads |
.consensusGenomeSequences.ani | Average nucleotide identity of consensus sequence to reference genome (or segment) sequence |
.consensusGenomeSequences.alignedReadCount | Number of sample sequencing reads that aligned to reference genome (or segment) sequence |
.consensusGenomeSequences.medianDepth | Median depth of sample sequencing reads aligned to reference genome (or segment) sequence, indicating the median number of times each reference genome (or segment) sequence base appears in sample sequencing reads |
.consensusGenomeSequences.targetAnnotation | List of targeted region annotations for the reference genome (or segment) sequence. Each annotation is a JSON object with the following fields: start (int), end (int), strand (string: "+", "-"), target_name (string), type (string) |
.consensusGenomeSequences.condensedDepthVector | Read depth across the reference genome (or segment) sequence, condensed to 256 bins |
.consensusTargetSequences | (RPIP viruses only) Information about the majority targeted region consensus sequences |
.consensusTargetSequences.sequence | Consensus targeted region sequence bases |
.consensusTargetSequences.name | Name of the targeted region |
.consensusTargetSequences.referenceAccession | Accession of the targeted region reference sequence |
.consensusTargetSequences.depthVector | Read depth across the targeted region reference sequence, not condensed |
.predictionInformation | Information about microorganism prediction results |
.predictionInformation.predictedPresent | Boolean indicating whether the microorganism passed its proprietary reporting logic algorithm |
.predictionInformation.notes | List of notes about the prediction result |
.predictionInformation.subpanels | List of pre-defined subpanels that the microorganism belongs to |
.predictionInformation.relatedMicroorganisms | Array of objects with information about genetically related microorganisms. See below for details |
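A common first step with this array is to keep only microorganisms whose `.predictionInformation.predictedPresent` is true. The sketch below does that and sorts the result by `.rpkm`. The microorganism entries are invented, `predicted_microorganisms` is a hypothetical helper, and sorting by RPKM is only one possible ordering.

```python
def predicted_microorganisms(report: dict) -> list:
    """Return microorganisms predicted present, highest RPKM first."""
    organisms = (report.get("targetReport") or {}).get("microorganisms") or []
    present = [
        organism
        for organism in organisms
        if (organism.get("predictionInformation") or {}).get("predictedPresent")
    ]
    return sorted(present, key=lambda organism: organism.get("rpkm") or 0, reverse=True)


# Invented entries, for illustration only.
example_report = {
    "targetReport": {
        "microorganisms": [
            {"name": "Staphylococcus aureus", "class": "bacterial", "rpkm": 120.0,
             "predictionInformation": {"predictedPresent": True}},
            {"name": "Candida albicans", "class": "fungal", "rpkm": 4.0,
             "predictionInformation": {"predictedPresent": False}},
            {"name": "Influenza A virus", "class": "viral", "rpkm": 3500.0,
             "predictionInformation": {"predictedPresent": True}},
        ]
    }
}

names = [organism["name"] for organism in predicted_microorganisms(example_report)]
print(names)
```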
.targetReport.microorganisms[].relatedMicroorganisms[] Node
The value of the .targetReport.microorganisms[].relatedMicroorganisms[]
field is an array of objects containing information about genetically related microorganisms. The following table describes one .targetReport.microorganisms[].relatedMicroorganisms[]
object. The fields are relative to .targetReport.microorganisms[].relatedMicroorganisms[]
Field | Description |
---|---|
.name | Name of related microorganism |
.onPanel | Boolean indicating whether the related microorganism is a panel member |
.kmerReadCount | (UPIP only) Number of sample sequencing reads classified to related microorganism reference sequences |
.coverage | Proportion of related microorganism reference sequence bases that appear in sample sequencing reads |
.ani | Average nucleotide identity of consensus sequence to related microorganism reference sequences |
.alignedReadCount | Number of sample sequencing reads that aligned to related microorganism reference sequences |
.targetReport.microorganisms[].variants[] Node
The value of the .targetReport.microorganisms[].variants[]
field is an array of objects containing information about variants for all VSP V2 viruses and select RPIP WGS viruses (SARS-CoV-2 & FluA/B/C). The following table describes one .targetReport.microorganisms[].variants[]
object. The fields are relative to .targetReport.microorganisms[].variants[]
Field | Description |
---|---|
.referenceAccession | Accession of reference genome (or segment) sequence used for variant calling |
.segment | (Segmented viruses only) Segment number of reference segment sequence |
.ntChange | Nucleotide change associated with variant |
.referencePosition | Variant position in reference genome (or segment) sequence |
.referenceAllele | Reference allele at variant position |
.variantAllele | Variant allele |
.depth | Variant depth, indicating the number of times variant allele appears in sample sequencing reads |
.alleleFrequency | Frequency of variant allele in sample sequencing reads |
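The `.depth` and `.alleleFrequency` fields lend themselves to simple filtering. The sketch below keeps variants that meet a minimum depth and allele frequency. The thresholds are arbitrary illustrations rather than recommended values, the variant entries and the `ntChange` notation are invented, and the assumption that `alleleFrequency` is a fraction between 0 and 1 is not confirmed by this page.

```python
def filter_variants(microorganism: dict, min_depth: int = 10, min_frequency: float = 0.5) -> list:
    """Keep variants with enough supporting depth and a high enough allele frequency."""
    return [
        variant
        for variant in microorganism.get("variants") or []
        if (variant.get("depth") or 0) >= min_depth
        and (variant.get("alleleFrequency") or 0) >= min_frequency
    ]


# Invented microorganism object, for illustration only.
example_microorganism = {
    "name": "Hypothetical virus",
    "variants": [
        {"ntChange": "A100G", "depth": 250, "alleleFrequency": 0.98},
        {"ntChange": "C200T", "depth": 4, "alleleFrequency": 0.75},
        {"ntChange": "G300A", "depth": 120, "alleleFrequency": 0.12},
    ],
}

kept = [variant["ntChange"] for variant in filter_variants(example_microorganism)]
print(kept)
```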
.targetReport.amrMarkers[] Node
The value of the .targetReport.amrMarkers[]
field is an array of objects containing information about detected bacterial AMR markers. The following table describes one .targetReport.amrMarkers[]
object. The fields are relative to .targetReport.amrMarkers[]
Field | Description |
---|---|
.class | Microorganism class ("bacterial") |
.cardModelType | Bacterial AMR marker model type in the Comprehensive Antibiotic Resistance Database (CARD) ("homolog", "protein variant", "rRNA variant") |
.cardGeneFamily | Bacterial AMR marker family in the Comprehensive Antibiotic Resistance Database (CARD) |
.name | Bacterial AMR marker name |
.cardName | Bacterial AMR marker name in the Comprehensive Antibiotic Resistance Database (CARD) |
.ncbiName | Bacterial AMR marker name in the National Center for Biotechnology Information (NCBI) |
.referenceAccession | Accession of the bacterial AMR marker reference sequence |
.coverage | Proportion of bacterial AMR marker reference sequence residues that appear in sample sequencing reads (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type) |
.pid | Percent identity of consensus sequence aligned to bacterial AMR marker reference sequence (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type) |
.medianDepth | Median depth of sample sequencing reads aligned to bacterial AMR marker reference sequence, indicating the median number of times each bacterial AMR marker sequence residue appears in sample sequencing reads (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type) |
.rpkm | Normalized representation of the number of sample sequencing reads aligned to bacterial AMR reference sequence (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type) |
.alignedReadCount | Number of sample sequencing reads that aligned to bacterial AMR reference sequence (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type) |
.nucleotideConsensusSequence | Nucleotide consensus sequence bases |
.proteinConsensusSequence | Protein consensus sequence residues |
.nucleotideDepthVector | Read depth across the bacterial AMR marker nucleotide reference sequence, not condensed |
.proteinDepthVector | Read depth across the bacterial AMR marker protein reference sequence, not condensed |
.associatedMicroorganisms | Information about the microorganisms associated with the bacterial AMR marker |
.associatedMicroorganisms.all | List of all microorganisms associated with the bacterial AMR marker |
.associatedMicroorganisms.detected | List of detected microorganisms associated with the bacterial AMR marker |
.associatedMicroorganisms.predicted | List of predicted microorganisms associated with the bacterial AMR marker |
.predictionInformation | Information about bacterial AMR marker prediction results |
.predictionInformation.predictedPresent | Boolean indicating whether the bacterial AMR marker passed its proprietary reporting logic algorithm |
.predictionInformation.confidence | Confidence level of bacterial AMR marker prediction ("high", "medium", "low") |
.predictionInformation.notes | List of notes about the prediction result |
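The prediction fields can be used to shortlist AMR markers. The sketch below keeps markers that are predicted present with an accepted confidence level and pairs each with its detected associated microorganisms. The marker entries are invented, `shortlist_amr_markers` is a hypothetical helper, and treating `.associatedMicroorganisms.detected` as a list of name strings is an assumption.

```python
def shortlist_amr_markers(report: dict, accepted=("high", "medium")) -> list:
    """Return (name, confidence, detected microorganisms) for accepted markers."""
    rows = []
    for marker in (report.get("targetReport") or {}).get("amrMarkers") or []:
        info = marker.get("predictionInformation") or {}
        if info.get("predictedPresent") and info.get("confidence") in accepted:
            associated = marker.get("associatedMicroorganisms") or {}
            rows.append((marker.get("name"), info.get("confidence"),
                         list(associated.get("detected") or [])))
    return rows


# Invented entries, for illustration only.
example_report = {
    "targetReport": {
        "amrMarkers": [
            {"name": "markerA",
             "predictionInformation": {"predictedPresent": True, "confidence": "high"},
             "associatedMicroorganisms": {"detected": ["Klebsiella pneumoniae"]}},
            {"name": "markerB",
             "predictionInformation": {"predictedPresent": True, "confidence": "low"},
             "associatedMicroorganisms": {"detected": []}},
            {"name": "markerC",
             "predictionInformation": {"predictedPresent": False, "confidence": "high"},
             "associatedMicroorganisms": {"detected": []}},
        ]
    }
}

shortlist = shortlist_amr_markers(example_report)
print(shortlist)
```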
.targetReport.amrMarkers[].variants[] Node
The value of the .targetReport.amrMarkers[].variants[]
field is an array of objects containing information about variants for bacterial AMR markers with "protein variant" or "rRNA variant" model types. The following table describes one .targetReport.amrMarkers[].variants[]
object. The fields are relative to .targetReport.amrMarkers[].variants[]
Field | Description |
---|---|
.category | Variant category ("Bacterial Variant; Known AMR") |
.referenceSourceMicroorganism | Microorganism that reference sequence is associated with in NCBI |
.comments | Comments about variant |
.product | Protein product of gene |
.ntChange | Nucleotide change associated with variant |
.referencePosition | Variant position in reference sequence |
.referenceAllele | Reference allele at variant position |
.variantAllele | Variant allele |
.depth | Variant depth, indicating the number of times variant allele appears in sample sequencing reads |
.alleleFrequency | Frequency of variant allele in sample sequencing reads |
.annotation | Type of change (e.g. "Nonsynonymous Variant") |
.aaChange | Amino acid change associated with variant |
.epistaticGroups | List of epistatic groups variant is associated with |
.targetReport.customReferences[] Node
This section contains information about custom reference detection results and is only present for custom database analyses. When only a custom reference FASTA file is provided (no BED file), each .targetReport.customReferences[]
object contains information for a single reference sequence. When both a FASTA and BED file are provided, each .targetReport.customReferences[]
object contains information for a single genome/microorganism, which can be a collection of one or more reference sequences. The fields are relative to .targetReport.customReferences[]
Field | Description |
---|---|
.name | Name of custom reference sequence, accession, or genome/microorganism |
.coverage | Proportion of custom reference sequence bases that appear in sample sequencing reads |
.ani | Average nucleotide identity of consensus sequence to custom reference sequence or, if specified, collection of one or more custom reference sequences |
.medianDepth | Median depth of sample sequencing reads aligned to custom reference sequence or, if specified, collection of one or more custom reference sequences, indicating the median number of times each custom reference sequence base appears in sample sequencing reads |
.condensedDepthVector | Read depth across custom reference sequence or, if specified, collection of one or more custom reference sequences, condensed to 256 bins |
.rpkm | Normalized number of sample sequencing reads aligned to custom reference sequence or, if specified, collection of one or more custom reference sequences (targeted Reads mapped Per Kilobase of targeted sequence per Million quality-filtered reads) |
.alignedReadCount | Number of sample sequencing reads that aligned to custom reference sequence or, if specified, collection of one or more custom reference sequences |
.consensusSequences | Array of objects with information about each consensus sequence |
.variants | Array of objects with information about variants detected in custom reference sequence or, if specified, collection of one or more custom reference sequences |
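The `.rpkm` description above (reads per kilobase of targeted sequence per million quality-filtered reads) corresponds to the standard RPKM normalization, sketched below. Exactly which sequence length and read count the app uses internally is not stated on this page, so the inputs here are assumptions and the numbers are invented.

```python
def rpkm(aligned_reads: int, targeted_length_bp: int, quality_filtered_reads: int) -> float:
    """Standard RPKM: reads per kilobase of sequence per million reads."""
    if targeted_length_bp <= 0 or quality_filtered_reads <= 0:
        raise ValueError("length and read count must be positive")
    kilobases = targeted_length_bp / 1_000
    millions_of_reads = quality_filtered_reads / 1_000_000
    return aligned_reads / kilobases / millions_of_reads


# Invented inputs: 500 aligned reads, 2,000 bp of sequence, 4 million filtered reads.
example_rpkm = rpkm(500, 2_000, 4_000_000)
print(example_rpkm)  # 62.5
```

Because the value is normalized by both sequence length and sequencing depth, it is more comparable across references and samples than the raw `.alignedReadCount`.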
.targetReport.customReferences[].consensusSequences[] Node
The value of the .targetReport.customReferences[].consensusSequences[]
field is an array of objects, each containing majority consensus sequence information for a single custom reference sequence. When only a FASTA file is provided (no BED file), there will be only one object in the array. When both a FASTA and BED file are provided, there may be more than one object in the array. The fields are relative to .targetReport.customReferences[].consensusSequences[]
Field | Description |
---|---|
.sequence | Majority consensus sequence bases |
.referenceAccession | Accession of custom reference sequence |
.referenceDescription | Description of custom reference sequence |
.referenceLength | Length of custom reference sequence |
.coverage | Proportion of custom reference sequence bases that appear in sample sequencing reads |
.ani | Average nucleotide identity of consensus sequence to custom reference sequence |
.medianDepth | Median depth of sample sequencing reads aligned to custom reference sequence, indicating the median number of times each custom reference sequence base appears in sample sequencing reads |
.depthVector | Read depth across custom reference sequence, not condensed |
.alignedReadCount | Number of sample sequencing reads that aligned to custom reference sequence |
.maximumAlignmentLength | Longest contiguous alignment between consensus sequence and custom reference sequence |
.maximumGapLength | Longest contiguous alignment gap (insertion or deletion) between consensus sequence and custom reference sequence |
.maximumUnalignedLength | Longest section of custom reference sequence not aligned to by consensus sequence |
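Since `.depthVector` is not condensed, it can be used to derive per-reference statistics. The sketch below computes a coverage proportion and a median depth from such a vector. It assumes the vector is a list with one integer depth per reference base and that a base counts as covered at depth 1 or more. The app's own definitions of `.coverage` and `.medianDepth` may differ, and the vector shown is invented.

```python
from statistics import median


def depth_statistics(depth_vector: list, min_depth: int = 1) -> dict:
    """Derive coverage proportion and median depth from a per-base depth vector."""
    if not depth_vector:
        return {"coverage": 0.0, "medianDepth": 0}
    covered = sum(1 for depth in depth_vector if depth >= min_depth)
    return {
        "coverage": covered / len(depth_vector),
        "medianDepth": median(depth_vector),
    }


# Invented eight-base depth vector, for illustration only.
example_stats = depth_statistics([0, 0, 5, 7, 9, 9, 4, 0])
print(example_stats)
```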
.targetReport.customReferences[].variants[] Node
The value of the .targetReport.customReferences[].variants[]
field is an array of objects, each containing information about a single detected variant. The fields are relative to .targetReport.customReferences[].variants[]
Field | Description |
---|---|
.ntChange | Nucleotide change associated with the variant |
.referenceAccession | Accession of custom reference sequence used for variant calling |
.referencePosition | Variant position in custom reference sequence |
.referenceAllele | Reference allele at variant position |
.variantAllele | Variant allele |
.depth | Variant depth, indicating the number of times variant allele appears in sample sequencing reads |
.alleleFrequency | Frequency of variant allele in sample sequencing reads |
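Because variants are nested under each custom reference, it can be convenient to flatten them into one row per variant. The sketch below does this for a parsed report. The entries are invented, `flatten_custom_variants` is a hypothetical helper, and the output column names are arbitrary choices.

```python
def flatten_custom_variants(report: dict) -> list:
    """Return one dict per variant, tagged with the parent custom reference name."""
    rows = []
    for reference in (report.get("targetReport") or {}).get("customReferences") or []:
        for variant in reference.get("variants") or []:
            rows.append({
                "reference": reference.get("name"),
                "accession": variant.get("referenceAccession"),
                "ntChange": variant.get("ntChange"),
                "depth": variant.get("depth"),
                "alleleFrequency": variant.get("alleleFrequency"),
            })
    return rows


# Invented entries, for illustration only.
example_report = {
    "targetReport": {
        "customReferences": [
            {"name": "Hypothetical genome 1", "variants": [
                {"referenceAccession": "ACC0001", "ntChange": "A10G",
                 "depth": 80, "alleleFrequency": 0.95},
                {"referenceAccession": "ACC0002", "ntChange": "T55C",
                 "depth": 40, "alleleFrequency": 0.60},
            ]},
            {"name": "Hypothetical genome 2", "variants": []},
        ]
    }
}

variant_rows = flatten_custom_variants(example_report)
print(len(variant_rows), variant_rows[0]["reference"])
```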