This page describes the data levels, metadata attributes, and file structure for bulk DNA sequencing.
Bulk DNA sequencing produces the DNA sequence of a biological sample. The sequence is then summarized as a list of variants relative to a given reference genome. This data model is intended to apply to assays including bulk tumor Whole Genome Sequencing (WGS), bulk tumor Whole Exome Sequencing (WES), bulk cell-free DNA (cfDNA) WES, bulk tumor targeted DNA sequencing, and bulk circulating tumor DNA (ctDNA) targeted DNA sequencing.
The defined metadata leverages existing common data elements from the Genomic Data Commons (GDC). The HTAN data model currently supports Levels 1, 2, and 3 of DNA sequencing data:
|Level Number||Definition||Example Data|
|1||Raw unaligned read data||FASTQ|
|2||Genome aligned reads||BAM|
|3||Sample-level summary||VCF/MAF|
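
As an illustration of how the level table might be applied in practice, the following Python sketch guesses a file's data level from its name. It is not part of the HTAN data model or any HTAN tooling: the function name `infer_level`, the suffix mapping, the `.fq` alias for FASTQ, and the handling of `.gz` compression are all assumptions made for this example. Only the pairing of FASTQ, BAM, and VCF/MAF with Levels 1, 2, and 3 comes from the table above.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to data level, based only on the
# example formats listed in the table. ".fq" is assumed as a common
# alternative suffix for FASTQ files.
LEVEL_BY_SUFFIX = {
    ".fastq": 1,  # raw unaligned read data
    ".fq": 1,
    ".bam": 2,    # genome-aligned reads
    ".vcf": 3,    # sample-level summary
    ".maf": 3,
}

# Assumption: files may be gzip-compressed, e.g. "reads.fastq.gz".
COMPRESSION_SUFFIXES = {".gz"}


def infer_level(filename):
    """Return the guessed data level (1, 2, or 3), or None if unrecognized."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Drop trailing compression suffixes so "x.vcf.gz" is treated like "x.vcf".
    while suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    if not suffixes:
        return None
    return LEVEL_BY_SUFFIX.get(suffixes[-1])


if __name__ == "__main__":
    for name in ["sample_R1.fastq.gz", "sample.bam", "sample.vcf.gz", "notes.txt"]:
        print(name, infer_level(name))
```

A suffix check like this is only a first-pass heuristic. The file name says nothing about whether the contents are valid, so a real submission workflow would still rely on the metadata attributes defined for each level.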