Understanding JSON file format

The DRAGEN Microbial Enrichment Plus app outputs a comprehensive sample-level report.json file containing general metadata, version information, sample QC results, microorganism and AMR marker results, and detailed test information. The app also generates additional convenience file formats, but these do not contain any content beyond what is in report.json.

Top-Level Node

The top-level section of the report JSON contains general metadata and version information.

Field | Description

.accession

Identifier used for the sample

.deploymentEnvironment

Environment in which the results were produced

.batchId

Identifier used for the batch of samples processed together

.analysisId

Identifier used for the analysis

.runId

Identifier used for the sequencing run

.controlFlag

Indicates whether the sample is a control. It is based on the ControlFlag field in the sample .tsv and can be set to “POS”, “NEG”, “BLANK”, or “-”

.dragenVersion

DRAGEN release version

.analysisPipelineVersion

Analysis Pipeline release version

.testType

Type of test enrichment panel (e.g. RPIP, VSP V2, Custom)

.testVersion

Test panel release version

.testName

Name of the test panel, e.g. "Explify® Respiratory Pathogen ID/AMR Panel (RPIP) - Data Analysis Solution"

.testUse

Test use statement, e.g. "For Research Use Only. Not for use in diagnostic procedures"

.reportTime

Time the report was generated

.warnings

List of warnings encountered during the analysis

.errors

List of errors encountered during the analysis
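The top-level fields can be read with any JSON parser. The following is a minimal Python sketch, assuming the field names listed in the table above. The sample values and the `summarize_report` helper are hypothetical and are not real app output.

```python
import json

# Hypothetical excerpt of a report.json file; values are illustrative only.
SAMPLE_REPORT = """
{
    "accession": "sample-001",
    "batchId": "batch-01",
    "controlFlag": "-",
    "testType": "RPIP",
    "warnings": ["example warning"],
    "errors": []
}
"""


def summarize_report(report: dict) -> dict:
    """Collect a few top-level fields, tolerating missing keys."""
    return {
        "accession": report.get("accession"),
        "testType": report.get("testType"),
        # Assumption of this sketch: "-" means the sample is not a control.
        "isControl": report.get("controlFlag", "-") in {"POS", "NEG", "BLANK"},
        "warningCount": len(report.get("warnings") or []),
        "errorCount": len(report.get("errors") or []),
    }


summary = summarize_report(json.loads(SAMPLE_REPORT))
print(summary)
```

For a file on disk, the same helper can be applied to the result of `json.load` on an opened report.json.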

.qcReport Node

This section contains information about sample quality control (QC). The fields are relative to .qcReport

Field | Description

.sampleQc

Sample QC information

.sampleQc.totalRawBases

Number of base pairs in sample before read QC processing

.sampleQc.totalRawReads

Number of reads in sample before read QC processing

.sampleQc.uniqueReads

Number of distinct reads in sample before read QC processing

.sampleQc.uniqueReadsProportion

Proportion of distinct reads in sample before read QC processing

.sampleQc.preQualityMeanReadLength

Average read length before read QC processing

.sampleQc.postQualityMeanReadLength

Average read length after read QC processing

.sampleQc.postQualityReads

Number of reads in sample after read QC processing

.sampleQc.postQualityReadsProportion

Proportion of post-quality reads in sample relative to total raw reads

.sampleQc.removedInDehostingReads

Number of host reads in sample removed during dehosting

.sampleQc.removedInDehostingReadsProportion

Proportion of host reads in sample removed relative to total raw reads

.sampleQc.entropy

Kmer entropy of reads after read QC processing

.sampleQc.gContent

Proportion of guanine (G) base calls in reads after read QC processing

.sampleQc.libraryQScore

Quality score of the library after read QC processing

.sampleQc.enrichmentFactor

Enrichment factor information (calculation requires detection of an appropriate Internal Control)

.sampleQc.enrichmentFactor.value

Enrichment factor value reflecting how well targeted regions were enriched

.sampleQc.enrichmentFactor.category

Enrichment factor category: "poor", "fair", "good", or "not calculated"
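As an illustration of how the relative field paths resolve, the sketch below looks up values under .qcReport and recomputes two proportions from the read counts, following the "relative to total raw reads" wording in the table. The `get_path` and `proportion_of_raw` helpers and all numbers are hypothetical. Whether the app stores proportions as fractions or percentages is not stated here, so the sketch assumes fractions.

```python
def get_path(node, path):
    """Resolve a dotted path such as '.sampleQc.totalRawReads' in nested dicts."""
    for key in path.strip(".").split("."):
        node = node[key]
    return node


# Hypothetical .qcReport excerpt; values are illustrative only.
QC_REPORT = {
    "sampleQc": {
        "totalRawReads": 2000000,
        "postQualityReads": 1800000,
        "removedInDehostingReads": 500000,
        "enrichmentFactor": {"category": "not calculated"},
    }
}


def proportion_of_raw(qc_report, count_field):
    """Return a read count as a fraction of total raw reads."""
    total = get_path(qc_report, ".sampleQc.totalRawReads")
    count = get_path(qc_report, count_field)
    return count / total if total else 0.0


post_quality = proportion_of_raw(QC_REPORT, ".sampleQc.postQualityReads")
dehosted = proportion_of_raw(QC_REPORT, ".sampleQc.removedInDehostingReads")
category = get_path(QC_REPORT, ".sampleQc.enrichmentFactor.category")
print(post_quality, dehosted, category)
```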

.qcReport.sampleComposition Node

This section contains information about the composition of the sample. The fields are relative to .qcReport.sampleComposition

Field | Description

.readClassification

Proportion of reads classified to the following categories:

.readClassification.targetedMicrobial

Targeted microbial

.readClassification.targetedInternalControl

Targeted Internal Control

.readClassification.untargeted

Untargeted

.readClassification.ambiguous

More than one category

.readClassification.unclassified

No category

.readClassification.lowComplexity

Low complexity

.targetedMicrobial

Proportion of targeted microbial reads classified to the following sub-categories:

.targetedMicrobial.viral

Viral targeted

.targetedMicrobial.bacterial

Bacterial targeted

.targetedMicrobial.fungal

Fungal targeted

.targetedMicrobial.parasitic

Parasitic targeted

.targetedMicrobial.bacterialAmr

Bacterial AMR targeted

.untargeted

Proportion of untargeted reads classified to the following sub-categories:

.untargeted.viral

Viral untargeted

.untargeted.bacterial

Bacterial untargeted

.untargeted.fungal

Fungal untargeted

.untargeted.parasitic

Parasitic untargeted

.untargeted.bacterialAmr

Bacterial AMR untargeted

.untargeted.internalControl

Internal Control untargeted

.untargeted.human

Human untargeted

.viral

Proportion of viral reads classified to the following categories:

.viral.targeted

Viral targeted

.viral.untargeted

Viral untargeted

.viral.untargetedSubcategories

Proportion of viral untargeted reads classified to the following sub-categories:

.viral.untargetedSubcategories.panel

Viral panel members

.viral.untargetedSubcategories.phage

Viral phage

.viral.untargetedSubcategories.other

Viral other (not a panel member or phage)

.bacterial

Proportion of bacterial reads classified to the following categories:

.bacterial.targeted

Bacterial targeted

.bacterial.untargeted

Bacterial untargeted

.bacterial.untargetedSubcategories

Proportion of bacterial untargeted reads classified to the following sub-categories:

.bacterial.untargetedSubcategories.panel

Bacterial panel members

.bacterial.untargetedSubcategories.ribosomalDna

Bacterial ribosomal DNA (16S)

.bacterial.untargetedSubcategories.plasmid

Bacterial plasmids

.bacterial.untargetedSubcategories.other

Bacterial other (not a panel member, ribosomal DNA, or plasmid)

.fungal

Proportion of fungal reads classified to the following categories:

.fungal.targeted

Fungal targeted

.fungal.untargeted

Fungal untargeted

.fungal.untargetedSubcategories

Proportion of fungal untargeted reads classified to the following sub-categories:

.fungal.untargetedSubcategories.panel

Fungal panel members

.fungal.untargetedSubcategories.ribosomalDna

Fungal ribosomal DNA (18S)

.fungal.untargetedSubcategories.other

Fungal other (not a panel member or ribosomal DNA)

.parasitic

Proportion of parasitic reads classified to the following categories:

.parasitic.targeted

Parasitic targeted

.parasitic.untargeted

Parasitic untargeted

.parasitic.untargetedSubcategories

Proportion of parasitic untargeted reads classified to the following sub-categories:

.parasitic.untargetedSubcategories.panel

Parasitic panel members

.parasitic.untargetedSubcategories.ribosomalDna

Parasitic ribosomal DNA (18S)

.parasitic.untargetedSubcategories.other

Parasitic other (not a panel member or ribosomal DNA)

.human

Proportion of human reads classified to the following categories:

.human.untargeted

Human untargeted

.human.untargetedSubcategories

Proportion of human untargeted reads classified to the following sub-categories:

.human.untargetedSubcategories.ribosomalDna

Human ribosomal DNA

.human.untargetedSubcategories.codingSequence

Human coding sequence

.human.untargetedSubcategories.other

Human other (not ribosomal DNA or coding sequence)

.internalControl

Proportion of Internal Control reads classified to the following categories:

.internalControl.targeted

Internal Control targeted

.internalControl.untargeted

Internal Control untargeted

.microbialAndInternalControl

Proportion of Microbial and Internal Control reads classified to the following categories:

.microbialAndInternalControl.targeted

Microbial and Internal Control targeted

.microbialAndInternalControl.untargeted

Microbial and Internal Control untargeted

.bacterialAmr

Proportion of bacterial AMR reads classified to the following categories:

.bacterialAmr.targeted

Bacterial AMR targeted

.bacterialAmr.untargeted

Bacterial AMR untargeted
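Because the sample composition is deeply nested, it can be convenient to flatten it into path/value pairs that match the field names in the table above. The sketch below shows one way to do that. The `flatten` helper and the sample values are hypothetical, and only a subset of the fields is included.

```python
def flatten(node, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf in a nested dict."""
    for key, value in node.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# Hypothetical .qcReport.sampleComposition excerpt; values are illustrative only.
SAMPLE_COMPOSITION = {
    "readClassification": {
        "targetedMicrobial": 0.40,
        "targetedInternalControl": 0.05,
        "untargeted": 0.45,
        "ambiguous": 0.02,
        "unclassified": 0.05,
        "lowComplexity": 0.03,
    },
    "viral": {
        "targeted": 0.9,
        "untargeted": 0.1,
        "untargetedSubcategories": {"panel": 0.5, "phage": 0.3, "other": 0.2},
    },
}

rows = dict(flatten(SAMPLE_COMPOSITION))

# Largest top-level read classification category in this sample.
classification = SAMPLE_COMPOSITION["readClassification"]
top_category = max(classification, key=classification.get)

for path, value in rows.items():
    print(path, value)
print("largest category:", top_category)
```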

.qcReport.internalControls Node

The value of the .qcReport.internalControls field is an array of objects containing name and RPKM information for each Internal Control. See the code block below for an example:

[
    {
        "name": "Allobacillus halotolerans",
        "rpkm": 0
    },
    {
        "name": "Armored RNA Quant Internal Process Control",
        "rpkm": 0
    },
    {
        "name": "Enterobacteria phage T7",
        "rpkm": 180323
    },
    {
        "name": "Escherichia virus MS2",
        "rpkm": 0
    },
    {
        "name": "Escherichia virus Qbeta",
        "rpkm": 0
    },
    {
        "name": "Escherichia virus T4",
        "rpkm": 0
    },
    {
        "name": "Imtechella halotolerans",
        "rpkm": 0
    },
    {
        "name": "Phocid alphaherpesvirus 1",
        "rpkm": 0
    },
    {
        "name": "Phocine morbillivirus",
        "rpkm": 0
    },
    {
        "name": "Truepera radiovictrix",
        "rpkm": 0
    }
]
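The array above can be scanned for Internal Controls that have signal. The sketch below uses a shortened copy of the example and treats any RPKM greater than zero as "detected". That rule is an assumption made for illustration, not the app's detection logic, and the `detected_internal_controls` helper is hypothetical.

```python
import json

# Shortened copy of the example array above.
INTERNAL_CONTROLS = json.loads("""
[
    {"name": "Allobacillus halotolerans", "rpkm": 0},
    {"name": "Enterobacteria phage T7", "rpkm": 180323},
    {"name": "Escherichia virus MS2", "rpkm": 0}
]
""")


def detected_internal_controls(controls):
    """Return {name: rpkm} for Internal Controls with a non-zero RPKM."""
    return {c["name"]: c["rpkm"] for c in controls if c["rpkm"] > 0}


detected = detected_internal_controls(INTERNAL_CONTROLS)
print(detected)
```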

.userOptions Node

This section gives information about analysis options specified by the user. The fields are relative to .userOptions

Field | Description

.quantitativeInternalControlName

The quantitative Internal Control used for microorganism absolute quantification (recommendation: Enterobacteria phage T7)

.quantitativeInternalControlConcentration

The quantitative Internal Control concentration (recommendation: 1.21 x 10^7 copies/mL of sample)

.readQcEnabled

Boolean field that indicates whether read QC (trimming and filtering based on quality and read length) was enabled

.readClassificationSensitivity

(VSP V2 only) Sensitivity threshold for classifying reads. Determines whether alignment should proceed for a microorganism and/or reference sequence

.targetReport.microorganisms[] Node

The value of the .targetReport.microorganisms[] field is an array of objects containing information about detected microorganisms. The following table describes one .targetReport.microorganisms[] object. The fields are relative to .targetReport.microorganisms[]

Field | Description

.class

Microorganism class ("viral", "bacterial", "fungal", "parasite")

.name

Name of microorganism

.coverage

Proportion of targeted microorganism reference sequence bases that appear in sample sequencing reads

.ani

Average nucleotide identity of consensus sequence to targeted microorganism reference sequences

.medianDepth

Median depth of sample sequencing reads aligned to targeted microorganism reference sequences, indicating the median number of times each targeted microorganism reference sequence base appears in sample sequencing reads

.condensedDepthVector

Read depth across the targeted microorganism reference sequences, condensed to 256 bins

.rpkm

Normalized representation of the number of sample sequencing reads aligned to targeted microorganism reference sequences (targeted Reads mapped Per Kilobase of targeted sequence per Million quality-filtered reads)

.alignedReadCount

Number of sample sequencing reads that aligned to targeted microorganism reference sequences

.kmerReadCount

(UPIP only) Number of sample sequencing reads classified to targeted microorganism reference sequences

.absoluteQuantityRatio

Numerical absolute quantification value

.absoluteQuantityRatioFormatted

Formatted absolute quantification value with units

.phenotypicGroup

Grouping indicating general association with normal flora, colonization, or contamination from the environment or other sources, as well as general association with disease

.associatedAmrMarkers

(Bacteria only) Information about the bacterial AMR markers associated with the microorganism

.associatedAmrMarkers.applicable

Boolean indicating whether one or more bacterial AMR markers are associated with the microorganism

.associatedAmrMarkers.detected

List of detected bacterial AMR markers associated with the microorganism

.associatedAmrMarkers.predicted

List of predicted bacterial AMR markers associated with the microorganism

.consensusGenomeSequences

(RPIP/VSP V2 viruses only) Information about the majority consensus genome (or segment) sequence

.consensusGenomeSequences.sequence

Consensus genome (or segment) sequence bases

.consensusGenomeSequences.referenceAccession

Accession of the reference genome (or segment) sequence

.consensusGenomeSequences.referenceDescription

Description of the reference genome (or segment) sequence

.consensusGenomeSequences.referenceLength

Length of the reference genome (or segment) sequence

.consensusGenomeSequences.maximumAlignmentLength

Longest contiguous alignment between consensus sequence and reference genome (or segment) sequence

.consensusGenomeSequences.maximumGapLength

Longest contiguous alignment gap (insertion or deletion) between consensus sequence and reference genome (or segment) sequence

.consensusGenomeSequences.maximumUnalignedLength

Longest section of the reference genome (or segment) sequence not aligned to by consensus sequence

.consensusGenomeSequences.coverage

Proportion of reference genome (or segment) sequence bases that appear in sample sequencing reads

.consensusGenomeSequences.ani

Average nucleotide identity of consensus sequence to reference genome (or segment) sequence

.consensusGenomeSequences.alignedReadCount

Number of sample sequencing reads that aligned to reference genome (or segment) sequence

.consensusGenomeSequences.medianDepth

Median depth of sample sequencing reads aligned to reference genome (or segment) sequence, indicating the median number of times each reference genome (or segment) sequence base appears in sample sequencing reads

.consensusGenomeSequences.targetAnnotation

List of targeted region annotations for the reference genome (or segment) sequence. Each annotation is a JSON object with the following fields: start (int), end (int), strand (string: "+", "-"), target_name (string), type (string)

.consensusGenomeSequences.condensedDepthVector

Read depth across the reference genome (or segment) sequence, condensed to 256 bins

.consensusTargetSequences

(RPIP viruses only) Information about the majority targeted region consensus sequences

.consensusTargetSequences.sequence

Consensus targeted region sequence bases

.consensusTargetSequences.name

Name of the targeted region

.consensusTargetSequences.referenceAccession

Accession of the targeted region reference sequence

.consensusTargetSequences.depthVector

Read depth across the targeted region reference sequence, not condensed

.predictionInformation

Information about microorganism prediction results

.predictionInformation.predictedPresent

Boolean indicating whether the microorganism passed its proprietary reporting logic algorithm

.predictionInformation.notes

List of notes about the prediction result

.predictionInformation.subpanels

List of pre-defined subpanels that the microorganism belongs to

.predictionInformation.relatedMicroorganisms

Array of objects with information about genetically related microorganisms. See below for details
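A common first step when consuming this array is to keep only the microorganisms flagged by .predictionInformation.predictedPresent and rank them by .rpkm. The sketch below shows this under the field names in the table above. The microorganism names, values, and the `predicted_present` helper are hypothetical.

```python
# Hypothetical .targetReport.microorganisms excerpt; values are illustrative only.
MICROORGANISMS = [
    {
        "class": "viral",
        "name": "Virus A",
        "rpkm": 1200.5,
        "predictionInformation": {"predictedPresent": True},
    },
    {
        "class": "bacterial",
        "name": "Bacterium B",
        "rpkm": 15.0,
        "predictionInformation": {"predictedPresent": False},
    },
    {
        "class": "bacterial",
        "name": "Bacterium C",
        "rpkm": 3400.0,
        "predictionInformation": {"predictedPresent": True},
    },
]


def predicted_present(microorganisms):
    """Return predicted-present microorganisms, highest RPKM first."""
    hits = [
        m for m in microorganisms
        if m.get("predictionInformation", {}).get("predictedPresent")
    ]
    return sorted(hits, key=lambda m: m.get("rpkm", 0), reverse=True)


names = [m["name"] for m in predicted_present(MICROORGANISMS)]
print(names)
```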

.targetReport.microorganisms[].relatedMicroorganisms[] Node

The value of the .targetReport.microorganisms[].relatedMicroorganisms[] field is an array of objects containing information about genetically related microorganisms. The following table describes one .targetReport.microorganisms[].relatedMicroorganisms[] object. The fields are relative to .targetReport.microorganisms[].relatedMicroorganisms[]

Field | Description

.name

Name of related microorganism

.onPanel

Boolean indicating whether the related microorganism is a panel member

.kmerReadCount

(UPIP only) Number of sample sequencing reads classified to related microorganism reference sequences

.coverage

Proportion of related microorganism reference sequence bases that appear in sample sequencing reads

.ani

Average nucleotide identity of consensus sequence to related microorganism reference sequences

.alignedReadCount

Number of sample sequencing reads that aligned to related microorganism reference sequences

.targetReport.microorganisms[].variants[] Node

The value of the .targetReport.microorganisms[].variants[] field is an array of objects containing information about variants for all VSP V2 viruses and select RPIP WGS viruses (SARS-CoV-2 & FluA/B/C). The following table describes one .targetReport.microorganisms[].variants[] object. The fields are relative to .targetReport.microorganisms[].variants[]

Field | Description

.referenceAccession

Accession of reference genome (or segment) sequence used for variant calling

.segment

(Segmented viruses only) Segment number of reference segment sequence

.ntChange

Nucleotide change associated with variant

.referencePosition

Variant position in reference genome (or segment) sequence

.referenceAllele

Reference allele at variant position

.variantAllele

Variant allele

.depth

Variant depth, indicating the number of times variant allele appears in sample sequencing reads

.alleleFrequency

Frequency of variant allele in sample sequencing reads
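Variant objects can be filtered on .depth and .alleleFrequency before downstream use. The sketch below assumes that alleleFrequency is stored as a fraction between 0 and 1. The thresholds, accession, and `filter_variants` helper are hypothetical and are not recommended cutoffs.

```python
# Hypothetical variants excerpt; values are illustrative only.
VARIANTS = [
    {"referenceAccession": "REF0001", "referencePosition": 100,
     "referenceAllele": "A", "variantAllele": "G",
     "depth": 250, "alleleFrequency": 0.98},
    {"referenceAccession": "REF0001", "referencePosition": 2050,
     "referenceAllele": "C", "variantAllele": "T",
     "depth": 8, "alleleFrequency": 0.75},
    {"referenceAccession": "REF0001", "referencePosition": 3100,
     "referenceAllele": "G", "variantAllele": "A",
     "depth": 400, "alleleFrequency": 0.12},
]


def filter_variants(variants, min_depth=10, min_frequency=0.5):
    """Keep variants that meet both the depth and allele frequency thresholds."""
    return [
        v for v in variants
        if v["depth"] >= min_depth and v["alleleFrequency"] >= min_frequency
    ]


positions = [v["referencePosition"] for v in filter_variants(VARIANTS)]
print(positions)
```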

.targetReport.amrMarkers[] Node

The value of the .targetReport.amrMarkers[] field is an array of objects containing information about detected bacterial AMR markers. The following table describes one .targetReport.amrMarkers[] object. The fields are relative to .targetReport.amrMarkers[]

Field | Description

.class

Microorganism class ("bacterial")

.cardModelType

Bacterial AMR marker model type in the Comprehensive Antibiotic Resistance Database (CARD) ("homolog", "protein variant", "rRNA variant")

.cardGeneFamily

Bacterial AMR marker family in the Comprehensive Antibiotic Resistance Database (CARD)

.name

Bacterial AMR marker name

.cardName

Bacterial AMR marker name in the Comprehensive Antibiotic Resistance Database (CARD)

.ncbiName

Bacterial AMR marker name in the National Center for Biotechnology Information (NCBI)

.referenceAccession

Accession of the bacterial AMR marker reference sequence

.coverage

Proportion of bacterial AMR marker reference sequence residues that appear in sample sequencing reads (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type)

.pid

Percent identity of consensus sequence aligned to bacterial AMR marker reference sequence (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type)

.medianDepth

Median depth of sample sequencing reads aligned to bacterial AMR marker reference sequence, indicating the median number of times each bacterial AMR marker sequence residue appears in sample sequencing reads (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type)

.rpkm

Normalized representation of the number of sample sequencing reads aligned to bacterial AMR reference sequence (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type)

.alignedReadCount

Number of sample sequencing reads that aligned to bacterial AMR reference sequence (protein alignment for "homolog" and "protein variant" model types; nucleotide alignment for "rRNA variant" model type)

.nucleotideConsensusSequence

Nucleotide consensus sequence bases

.proteinConsensusSequence

Protein consensus sequence residues

.nucleotideDepthVector

Read depth across the bacterial AMR marker nucleotide reference sequence, not condensed

.proteinDepthVector

Read depth across the bacterial AMR marker protein reference sequence, not condensed

.associatedMicroorganisms

Information about the microorganisms associated with the bacterial AMR marker

.associatedMicroorganisms.all

List of all microorganisms associated with the bacterial AMR marker

.associatedMicroorganisms.detected

List of detected microorganisms associated with the bacterial AMR marker

.associatedMicroorganisms.predicted

List of predicted microorganisms associated with the bacterial AMR marker

.predictionInformation

Information about bacterial AMR marker prediction results

.predictionInformation.predictedPresent

Boolean indicating whether the bacterial AMR marker passed its proprietary reporting logic algorithm

.predictionInformation.confidence

Confidence level of bacterial AMR marker prediction ("high", "medium", "low")

.predictionInformation.notes

List of notes about the prediction result
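The AMR marker fields can be combined to list, per microorganism, the markers that were predicted present at a given confidence level. The sketch below assumes that the entries of .associatedMicroorganisms.predicted are microorganism name strings, which is not confirmed by the table above. The marker names, values, and `markers_by_microorganism` helper are hypothetical.

```python
# Hypothetical .targetReport.amrMarkers excerpt; values are illustrative only.
AMR_MARKERS = [
    {
        "name": "marker-1",
        "cardModelType": "homolog",
        "associatedMicroorganisms": {
            "all": ["Bacterium B", "Bacterium C"],
            "detected": ["Bacterium C"],
            "predicted": ["Bacterium C"],
        },
        "predictionInformation": {"predictedPresent": True, "confidence": "high"},
    },
    {
        "name": "marker-2",
        "cardModelType": "protein variant",
        "associatedMicroorganisms": {
            "all": ["Bacterium C"],
            "detected": ["Bacterium C"],
            "predicted": ["Bacterium C"],
        },
        "predictionInformation": {"predictedPresent": True, "confidence": "low"},
    },
]


def markers_by_microorganism(markers, confidences=("high",)):
    """Group predicted-present marker names by predicted microorganism."""
    result = {}
    for marker in markers:
        info = marker.get("predictionInformation", {})
        if not info.get("predictedPresent"):
            continue
        if info.get("confidence") not in confidences:
            continue
        organisms = marker.get("associatedMicroorganisms", {}).get("predicted", [])
        for organism in organisms:
            result.setdefault(organism, []).append(marker["name"])
    return result


high_only = markers_by_microorganism(AMR_MARKERS)
any_confidence = markers_by_microorganism(
    AMR_MARKERS, confidences=("high", "medium", "low")
)
print(high_only)
print(any_confidence)
```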

.targetReport.amrMarkers[].variants[] Node

The value of the .targetReport.amrMarkers[].variants[] field is an array of objects containing information about variants for bacterial AMR markers with "protein variant" or "rRNA variant" model types. The following table describes one .targetReport.amrMarkers[].variants[] object. The fields are relative to .targetReport.amrMarkers[].variants[]

Field | Description

.category

Variant category ("Bacterial Variant; Known AMR")

.referenceSourceMicroorganism

Microorganism that reference sequence is associated with in NCBI

.comments

Comments about variant

.product

Protein product of gene

.ntChange

Nucleotide change associated with variant

.referencePosition

Variant position in reference sequence

.referenceAllele

Reference allele at variant position

.variantAllele

Variant allele

.depth

Variant depth, indicating the number of times variant allele appears in sample sequencing reads

.alleleFrequency

Frequency of variant allele in sample sequencing reads

.annotation

Type of change (e.g. "Nonsynonymous Variant")

.aaChange

Amino acid change associated with variant

.epistaticGroups

List of epistatic groups variant is associated with

.targetReport.customReferences[] Node

This section contains information about custom reference detection results and is only present for custom database analyses. When only a custom reference FASTA file is provided (no BED file), each .targetReport.customReferences[] object contains information for a single reference sequence. When both a FASTA and BED file are provided, each .targetReport.customReferences[] object contains information for a single genome/microorganism, which can be a collection of one or more reference sequences. The fields are relative to .targetReport.customReferences[]

Field | Description

.name

Name of custom reference sequence, accession, or genome/microorganism

.coverage

Proportion of custom reference sequence bases that appear in sample sequencing reads

.ani

Average nucleotide identity of consensus sequence to custom reference sequence or, if specified, collection of one or more custom reference sequences

.medianDepth

Median depth of sample sequencing reads aligned to custom reference sequence or, if specified, collection of one or more custom reference sequences, indicating the median number of times each custom reference sequence base appears in sample sequencing reads

.condensedDepthVector

Read depth across custom reference sequence or, if specified, collection of one or more custom reference sequences, condensed to 256 bins

.rpkm

Normalized number of sample sequencing reads aligned to custom reference sequence or, if specified, collection of one or more custom reference sequences (targeted Reads mapped Per Kilobase of targeted sequence per Million quality-filtered reads)

.alignedReadCount

Number of sample sequencing reads that aligned to custom reference sequence or, if specified, collection of one or more custom reference sequences

.consensusSequences

Array of objects with information about each consensus sequence

.variants

Array of objects with information about variants detected in custom reference sequence or, if specified, collection of one or more custom reference sequences
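The .rpkm description above (reads mapped per kilobase of targeted sequence per million quality-filtered reads) corresponds to the standard RPKM normalization, sketched below. The `rpkm` function and the numbers are hypothetical. Exactly which sequence length and read count the app uses as denominators is an assumption based on the wording of the field description.

```python
def rpkm(aligned_reads, target_length_bases, quality_filtered_reads):
    """Reads per kilobase of target sequence per million quality-filtered reads."""
    if target_length_bases <= 0 or quality_filtered_reads <= 0:
        raise ValueError("length and read count must be positive")
    kilobases = target_length_bases / 1000
    millions_of_reads = quality_filtered_reads / 1_000_000
    return aligned_reads / kilobases / millions_of_reads


# 500 aligned reads, 10 kb of target sequence, 2 million quality-filtered reads.
example = rpkm(500, 10_000, 2_000_000)
print(example)
```

Normalizing by both target length and sequencing depth lets values be compared across references of different lengths and across samples with different read counts.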

.targetReport.customReferences[].consensusSequences[] Node

The value of the .targetReport.customReferences[].consensusSequences[] field is an array of objects, each containing majority consensus sequence information for a single custom reference sequence. When only a FASTA file is provided (no BED file), there will be only one object in the array. When both a FASTA and BED file are provided, there may be more than one object in the array. The fields are relative to .targetReport.customReferences[].consensusSequences[]

Field | Description

.sequence

Majority consensus sequence bases

.referenceAccession

Accession of custom reference sequence

.referenceDescription

Description of custom reference sequence

.referenceLength

Length of custom reference sequence

.coverage

Proportion of custom reference sequence bases that appear in sample sequencing reads

.ani

Average nucleotide identity of consensus sequence to custom reference sequence

.medianDepth

Median depth of sample sequencing reads aligned to custom reference sequence, indicating the median number of times each custom reference sequence base appears in sample sequencing reads

.depthVector

Read depth across custom reference sequence, not condensed

.alignedReadCount

Number of sample sequencing reads that aligned to custom reference sequence

.maximumAlignmentLength

Longest contiguous alignment between consensus sequence and custom reference sequence

.maximumGapLength

Longest contiguous alignment gap (insertion or deletion) between consensus sequence and custom reference sequence

.maximumUnalignedLength

Longest section of custom reference sequence not aligned to by consensus sequence
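Since .depthVector is not condensed, per-base summaries can be derived from it directly. The sketch below computes the proportion of positions with non-zero depth and the median depth, mirroring the .coverage and .medianDepth descriptions above. It is an approximation under stated assumptions: the app may compute these values differently (for example, it may handle zero-depth positions in another way), and the `depth_summary` helper and the sample vector are hypothetical.

```python
from statistics import median


def depth_summary(depth_vector):
    """Summarize a per-base depth vector as coverage fraction and median depth."""
    if not depth_vector:
        return {"coverage": 0.0, "medianDepth": 0}
    covered = sum(1 for depth in depth_vector if depth > 0)
    return {
        "coverage": covered / len(depth_vector),
        # Assumption: the median is taken over all positions, including zeros.
        "medianDepth": median(depth_vector),
    }


# Hypothetical depth vector for a very short reference sequence.
result = depth_summary([0, 0, 4, 6, 8, 10, 12, 0])
print(result)
```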

.targetReport.customReferences[].variants[] Node

The value of the .targetReport.customReferences[].variants[] field is an array of objects, each containing information about a single detected variant. The fields are relative to .targetReport.customReferences[].variants[]

Field | Description

.ntChange

Nucleotide change associated with the variant

.referenceAccession

Accession of custom reference sequence used for variant calling

.referencePosition

Variant position in custom reference sequence

.referenceAllele

Reference allele at variant position

.variantAllele

Variant allele

.depth

Variant depth, indicating the number of times variant allele appears in sample sequencing reads

.alleleFrequency

Frequency of variant allele in sample sequencing reads
