Reference BED file format
A BED-like tab-separated value (TSV) file with no header row and with 4 or 5 columns:
accession
: each sequence accession as it appears in the Custom Reference FASTA header
start
: start position (always set to 0)
end
: end position (the sequence length)
genome
: full name of the virus the sequence belongs to (e.g. Influenza A H1N1)
segment (optional)
: how this sequence is labeled within the virus (e.g. Segment 4 (HA)). Set to 'Full' if the sequence is the full genome.
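As a minimal sketch of the column layout above, the Python function below parses one row of the file. It assumes rows are split on tab characters only, which matters because the genome name usually contains spaces. The names `RefBedRecord` and `parse_ref_bed_line` are hypothetical illustrations, not part of any tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefBedRecord:
    """One row of the reference BED file (hypothetical helper type)."""
    accession: str
    start: int
    end: int
    genome: str
    segment: Optional[str] = None  # 5th column; None when the row has only 4 columns


def parse_ref_bed_line(line: str) -> RefBedRecord:
    """Split one row on tabs and enforce the 4-or-5-column layout."""
    # Split on tabs only: genome and segment labels may contain spaces.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 tab-separated columns, got {len(fields)}: {line!r}")
    accession, start, end, genome = fields[:4]
    segment = fields[4] if len(fields) == 5 else None
    record = RefBedRecord(accession, int(start), int(end), genome, segment)
    # The start position is always 0 in this format.
    if record.start != 0:
        raise ValueError(f"{accession}: start must be 0, got {record.start}")
    return record
```

For example, `parse_ref_bed_line("NC_007366.1\t0\t1762\tInfluenza A virus (H3N2)\tSegment 4 (HA)")` yields a record whose `segment` is `"Segment 4 (HA)"`, while a 4-column row yields `segment=None`.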
Guidelines
This file affects how sequences are labeled in the output.
Sequence names must match those in the Custom Reference FASTA, and the same set of sequences must appear in both files.
If there are multiple viruses, their names should be unique. For example, if there are multiple Influenza genomes, they should not be labeled with the same virus name in the 4th column.
If the Custom Reference FASTA includes sequences from multiple segments, it is strongly recommended to provide this BED file. Otherwise, each segment is treated as an independent sequence, and some segments may not be used as references.
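The matching rules above can be checked before running an analysis. The following is a hypothetical Python sketch, not part of the tool itself. It assumes the accession is the first whitespace-delimited word of each FASTA header, and it also checks that `start` is 0 and `end` equals the sequence length, as described in the column list. It does not try to enforce the unique-virus-name rule, because that depends on knowing which sequences belong to which virus.

```python
from typing import Dict, List


def fasta_lengths(fasta_text: str) -> Dict[str, int]:
    """Map each FASTA record name to its sequence length.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    lengths: Dict[str, int] = {}
    name = None
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequences may wrap over several lines
    return lengths


def check_bed_against_fasta(bed_text: str, fasta_text: str) -> List[str]:
    """Return human-readable problems; an empty list means the two files agree."""
    problems: List[str] = []
    lengths = fasta_lengths(fasta_text)
    bed_names = set()
    for row in bed_text.splitlines():
        if not row.strip():
            continue
        fields = row.split("\t")
        accession, start, end = fields[0], int(fields[1]), int(fields[2])
        bed_names.add(accession)
        if start != 0:
            problems.append(f"{accession}: start is {start}, expected 0")
        if accession in lengths and end != lengths[accession]:
            problems.append(
                f"{accession}: end is {end}, but the FASTA sequence has length {lengths[accession]}"
            )
    # Both files must list exactly the same set of sequences.
    for missing in sorted(set(lengths) - bed_names):
        problems.append(f"{missing}: in FASTA but not in BED")
    for missing in sorted(bed_names - set(lengths)):
        problems.append(f"{missing}: in BED but not in FASTA")
    return problems
```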
Example
NC_012532.1	0	10794	Zika	Full
NC_007373.1	0	2341	Influenza A virus (H3N2)	Segment 1 (PB2)
NC_007372.1	0	2341	Influenza A virus (H3N2)	Segment 2 (PB1)
NC_007371.1	0	2233	Influenza A virus (H3N2)	Segment 3 (PA+PA-X)
NC_007366.1	0	1762	Influenza A virus (H3N2)	Segment 4 (HA)
NC_007369.1	0	1566	Influenza A virus (H3N2)	Segment 5 (NP)
NC_007368.1	0	1467	Influenza A virus (H3N2)	Segment 6 (NB+NA)
NC_007367.1	0	1027	Influenza A virus (H3N2)	Segment 7 (M1+M2)
NC_007370.1	0	890	Influenza A virus (H3N2)	Segment 8 (NS1+NEP)
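Because the genome and segment labels contain spaces, rows like those in the example are easiest to produce programmatically so that only tabs separate the columns. The following is a minimal, hypothetical Python sketch (`make_ref_bed` is an illustrative name) that builds the file contents from a table of sequence lengths and labels.

```python
from typing import Dict, Optional, Tuple


def make_ref_bed(lengths: Dict[str, int],
                 labels: Dict[str, Tuple[str, Optional[str]]]) -> str:
    """Build reference BED text: one tab-separated row per sequence, start fixed at 0."""
    rows = []
    for accession, length in lengths.items():
        genome, segment = labels[accession]
        fields = [accession, "0", str(length), genome]
        if segment is not None:  # omit the optional 5th column when no segment label is given
            fields.append(segment)
        rows.append("\t".join(fields))
    return "\n".join(rows) + "\n"
```

The lengths can come from any FASTA reader; writing the returned string to a file produces rows in the layout shown in the example, with rows in the same order as the `lengths` mapping.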