🧬Custom references and BED files

Custom reference FASTA file:

A custom reference FASTA file is required to run the custom panel analysis. Sequence names in the FASTA file must be unique and should not contain spaces. If a FASTA header contains a space, only the part before the first space is used as the sequence name. For sequence names, it is recommended to use only letters, numbers, underscores (_), hyphens (-), parentheses ( and ), and periods (.). Other characters may cause the names to appear differently in the output. An example custom reference FASTA file is provided in the link below.
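The naming rules above can be checked locally before uploading. The following is a minimal Python sketch and is not part of the app. It takes the header text before the first whitespace as the sequence name, as described above, and flags empty, duplicate, or non-recommended names. The helper names `fasta_sequence_names` and `name_problems` are hypothetical, and the character set in `ALLOWED_NAME` is an assumption based on the recommended characters listed above.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumption: recommended characters are letters, digits, _, -, (, ) and .
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_\-().]+")


def fasta_sequence_names(lines: Iterable[str]) -> List[str]:
    """Return one sequence name per FASTA header line.

    The name is the header text before the first whitespace, which is how
    headers containing spaces are described to be interpreted.
    """
    names: List[str] = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            names.append(tokens[0] if tokens else "")
    return names


def name_problems(names: List[str]) -> List[str]:
    """Describe empty, duplicate, or non-recommended sequence names."""
    problems: List[str] = []
    for name, count in Counter(names).items():
        if not name:
            problems.append("empty sequence name")
            continue
        if count > 1:
            problems.append(f"duplicate sequence name: {name}")
        if not ALLOWED_NAME.fullmatch(name):
            problems.append(f"non-recommended characters in: {name}")
    return problems


if __name__ == "__main__":
    fasta = [
        ">KJ609206.1 Influenza A virus segment 4",
        "AGCAAAAGCAGGGG",
        ">KJ609206.1 second copy",
        "ACGT",
        ">sample:1",
        "ACGT",
    ]
    names = fasta_sequence_names(fasta)
    print(names)  # ['KJ609206.1', 'KJ609206.1', 'sample:1']
    print(name_problems(names))
    # ['duplicate sequence name: KJ609206.1', 'non-recommended characters in: sample:1']
```

Using `fullmatch` means a single disallowed character anywhere in the name is enough to flag it.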

The user may provide one or more reference genomes as targets for read alignment and as the basis for generating consensus sequences. At a minimum, the user must provide a FASTA file containing the reference genome sequences. To upload and select the reference FASTA file:

  1. Go to the "Projects" tab and click the folded paper icon (representing File) to open a dropdown menu.

  2. Click "Upload" and select "Files".

  3. On the upload page, select the "Other" format for FASTA files and upload the file as a BioSample.

  4. In the DRAGEN Microbial Enrichment Plus App, under 'Custom panel specification', use the 'Custom reference FASTA for consensus generation' control to select the uploaded FASTA file.

The software generates the required DRAGEN hash tables and other auxiliary files automatically, so the FASTA file does not need to be processed with a separate app.

Custom reference BED file (optional):

Optionally, a genome definition BED file may also be provided. The BED file gives the software additional information about each sequence in the FASTA file, such as a human-readable common name to use in the reports. For multi-segment genomes such as Influenza, the BED file provides the segment name of each sequence and indicates which segments belong to the same genome. To upload and select the BED file:

  1. Go to the "Projects" tab and click the folded paper icon (representing File) to open a dropdown menu.

  2. Click "Upload" and select "Files".

  3. On the upload page, select the "Other" format for BED files and upload the file as a BioSample.

  4. In the DRAGEN Microbial Enrichment Plus App, under 'Custom panel specification', use the 'Custom reference BED' dropdown to select the uploaded BED file containing the genome definition.

The format of the genome definition BED file is described below.

The file must be tab-delimited with at least 4 columns:

  1. chrom: the sequence name as it appears in the FASTA

  2. chromStart: start position (always set to 0)

  3. chromEnd: end position (sequence length)

  4. genomeName: name of the genome, target, or microorganism the sequence belongs to (e.g. Monkeypox virus clade II)

  5. segmentName (optional): the name of the segment or gene (e.g. Segment 4 (HA)). Set to 'Full' if the sequence is the full genome.
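Because chromStart is always 0 and chromEnd is the sequence length, most of each BED row can be derived from the FASTA file itself, leaving only the genome and segment names to be supplied by hand. Below is a minimal Python sketch under these assumptions: `read_fasta` and `genome_definition_bed` are hypothetical helpers, genome and segment names come from user-supplied dictionaries, and any sequence without a segment entry is labeled 'Full'.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each sequence name (header text before the first space) to its sequence.

    Names are assumed to be unique; a repeated name overwrites the earlier entry.
    """
    pieces: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            pieces[current] = []
        elif line and current is not None:
            pieces[current].append(line)
    return {name: "".join(chunks) for name, chunks in pieces.items()}


def genome_definition_bed(
    seqs: Dict[str, str],
    genome_of: Dict[str, str],
    segment_of: Dict[str, str],
) -> List[str]:
    """Build tab-delimited rows: chrom, chromStart, chromEnd, genomeName, segmentName."""
    rows: List[str] = []
    for name, seq in seqs.items():
        # Assumption: sequences with no segment entry are full genomes.
        segment = segment_of.get(name, "Full")
        rows.append("\t".join([name, "0", str(len(seq)), genome_of[name], segment]))
    return rows


if __name__ == "__main__":
    fasta = [">seqA some description", "ACGTACGT", "ACGT", ">seqB", "GGCC"]
    rows = genome_definition_bed(
        read_fasta(fasta),
        genome_of={"seqA": "Example virus A", "seqB": "Example virus B"},
        segment_of={},  # hypothetical input: no multi-segment genomes here
    )
    print(rows)
    # ['seqA\t0\t12\tExample virus A\tFull', 'seqB\t0\t4\tExample virus B\tFull']
```

Joining the rows with newlines and writing them to a `.bed` file would produce a file in the tab-delimited format described above.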

Sequence names in the "chrom" column of the BED file must match those in the FASTA file, and the same set of sequences must appear in both files. Each genome should have a unique name in the genomeName column. For example, if the FASTA file contains multiple Influenza genomes, they should not share the same name in the 4th column. Segments of a single multi-segment genome, in contrast, share one genome name.
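A quick consistency check between the two files can catch mismatched names or lengths before running the app. The Python sketch below assumes the FASTA sequence lengths are already available as a dictionary keyed by sequence name. `validate_bed` is a hypothetical helper, and flagging a repeated (genomeName, segmentName) pair is only a heuristic for two genomes sharing a name, not a rule known to be enforced by the software.

```python
from typing import Dict, List


def validate_bed(bed_lines: List[str], fasta_lengths: Dict[str, int]) -> List[str]:
    """Report problems with a genome definition BED relative to the FASTA sequences."""
    problems: List[str] = []
    bed_names = set()
    seen_pairs = set()
    for lineno, line in enumerate(bed_lines, start=1):
        if not line.strip():
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 4:
            problems.append(f"line {lineno}: expected at least 4 tab-delimited columns")
            continue
        chrom, start, end, genome = cols[:4]
        segment = cols[4] if len(cols) > 4 else ""
        bed_names.add(chrom)
        if start != "0":
            problems.append(f"line {lineno}: chromStart should be 0")
        if chrom in fasta_lengths and end != str(fasta_lengths[chrom]):
            problems.append(
                f"line {lineno}: chromEnd {end} != sequence length {fasta_lengths[chrom]}"
            )
        # Heuristic: the same genome name with the same segment twice suggests
        # two different genomes were given the same genomeName.
        if (genome, segment) in seen_pairs:
            problems.append(f"line {lineno}: genome name '{genome}' reused for segment '{segment}'")
        seen_pairs.add((genome, segment))
    # The same set of sequence names must appear in both files.
    for name in sorted(set(fasta_lengths) - bed_names):
        problems.append(f"missing from BED: {name}")
    for name in sorted(bed_names - set(fasta_lengths)):
        problems.append(f"missing from FASTA: {name}")
    return problems


if __name__ == "__main__":
    lengths = {"NC_012532.1": 10794, "KJ609206.1": 1727}
    bed = [
        "NC_012532.1\t0\t10794\tZika\tFull",
        # Deliberately wrong chromEnd to show a reported problem.
        "KJ609206.1\t0\t1700\tInfluenza A virus (A/Perth/16/2009(H3N2))\tSegment 4 (HA)",
    ]
    for problem in validate_bed(bed, lengths):
        print(problem)
    # line 2: chromEnd 1700 != sequence length 1727
```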

The BED file controls how sequences are labeled in the output JSON. If the custom reference FASTA file includes sequences from multiple segments, it is recommended to provide a BED file so that the segments are reported together under the results for their microorganism. Otherwise, each segment is treated independently, and not all segments may be used as references.

Example genome definition BED file

NC_012532.1	0	10794	Zika	Full
KJ609203.1	0	2292	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 1 (PB2)
KJ609204.1	0	2304	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 2 (PB1)
KJ609205.1	0	2168	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 3 (PA+PA-X)
KJ609206.1	0	1727	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 4 (HA)
KJ609207.1	0	1530	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 5 (NP)
KJ609208.1	0	1441	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 6 (NA)
KJ609209.1	0	1001	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 7 (M1+M2)
KJ609210.1	0	866	Influenza A virus (A/Perth/16/2009(H3N2))	Segment 8 (NS1+NEP)
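To illustrate how the genomeName column ties segments together, the following Python sketch parses the example above and groups sequence names by genome. It only mirrors the grouping described on this page and is not the app's own logic. `group_by_genome` is a hypothetical helper, and rows without a 5th column are assumed to be 'Full'.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The example genome definition BED file above, with tabs written as \t.
EXAMPLE_BED = (
    "NC_012532.1\t0\t10794\tZika\tFull\n"
    "KJ609203.1\t0\t2292\tInfluenza A virus (A/Perth/16/2009(H3N2))\tSegment 1 (PB2)\n"
    "KJ609204.1\t0\t2304\tInfluenza A virus (A/Perth/16/2009(H3N2))\tSegment 2 (PB1)\n"
    "KJ609205.1\t0\t2168\tInfluenza A virus (A/Perth/16/2009(H3N2))\tSegment 3 (PA+PA-X)\n"
    "KJ609206.1\t0\t1727\tInfluenza A virus (A/Perth/16/2009(H3N2))\tSegment 4 (HA)\n"
    "KJ609207.1\t0\t1530\tInfluenza A virus (A/Perth/16/2009(H3N2))\tSegment 5 (NP)\n"
    "KJ609208.1\t0\t1441\tInfluenza A virus (A/Perth/16/2009(H3N2))\tSegment 6 (NA)\n"
    "KJ609209.1\t0\t1001\tInfluenza A virus (A/Perth/16/2009(H3N2))\tSegment 7 (M1+M2)\n"
    "KJ609210.1\t0\t866\tInfluenza A virus (A/Perth/16/2009(H3N2))\tSegment 8 (NS1+NEP)\n"
)


def group_by_genome(bed_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Group (sequence name, segment name) pairs by the genomeName column."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in bed_text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        # Assumption: a missing segmentName column means the full genome.
        segment = cols[4] if len(cols) > 4 else "Full"
        groups[cols[3]].append((cols[0], segment))
    return dict(groups)


if __name__ == "__main__":
    for genome, members in group_by_genome(EXAMPLE_BED).items():
        print(f"{genome}: {len(members)} sequence(s)")
    # Zika: 1 sequence(s)
    # Influenza A virus (A/Perth/16/2009(H3N2)): 8 sequence(s)
```

All eight Influenza segments share one genomeName and therefore land in one group, while the Zika genome forms its own group.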
