HTAN Bulk DNA Sequencing Data Standard


This page describes the data levels, metadata attributes, and file structure for bulk DNA sequencing.

Description of Assay

Bulk DNA sequencing determines the DNA sequence of a biological sample. The sequence is then summarized as a list of variants relative to a given reference genome. This data model is intended to apply to assays including bulk tumor Whole Genome Sequencing (WGS), bulk tumor Whole Exome Sequencing (WES), bulk cell-free DNA (cfDNA) WES, bulk tumor targeted DNA sequencing, and bulk circulating tumor DNA (ctDNA) targeted DNA sequencing.
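
As a rough illustration of what "a list of variants relative to a reference" means, the toy sketch below compares a sample sequence against a reference and reports single-base substitutions. It assumes the two sequences are already aligned and of equal length, which real pipelines do not require; production variant callers work from aligned reads and also handle insertions, deletions, and sequencing error. The function name list_substitutions and the example sequences are hypothetical and not part of the HTAN standard.

```python
def list_substitutions(reference: str, sample: str, start: int = 1):
    """Return (position, ref_base, alt_base) for every mismatching base.

    Toy assumption: both sequences are pre-aligned and equal in length.
    Positions are 1-based by default, as in VCF.
    """
    if len(reference) != len(sample):
        raise ValueError("toy example expects pre-aligned, equal-length sequences")
    return [
        (start + offset, ref_base, alt_base)
        for offset, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# Hypothetical sequences, for illustration only.
variants = list_substitutions("ACGTACGT", "ACGAACGC")
print(variants)  # [(4, 'T', 'A'), (8, 'T', 'C')]
```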

Metadata Levels

The defined metadata leverages existing common data elements from the Genomic Data Commons (GDC). The HTAN data model currently supports Levels 1, 2, and 3 of DNA sequencing data:

Level Number    Definition                 Example Data
1               Raw unaligned read data    FASTQ
2               Genome aligned reads       BAM
3               Sample level summary       VCF/MAF
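
The level table can be expressed as a small lookup, for instance to sort incoming files by data level before attaching metadata. The sketch below is one possible approach, not an HTAN tool: the name infer_level, the ".fq" alias, and the handling of ".gz" and ".bz2" compression suffixes are assumptions added for illustration; only the FASTQ, BAM, and VCF/MAF formats come from the table.

```python
from pathlib import Path

# Suffix-to-level mapping based on the table above.
# The ".fq" alias is an assumption, not part of the standard.
LEVEL_BY_SUFFIX = {
    ".fastq": 1,
    ".fq": 1,
    ".bam": 2,
    ".vcf": 3,
    ".maf": 3,
}

# Assumed compression suffixes to ignore when inspecting a filename.
COMPRESSION_SUFFIXES = {".gz", ".bz2"}


def infer_level(filename: str):
    """Return the data level (1, 2, or 3) implied by a filename, or None."""
    suffixes = [suffix.lower() for suffix in Path(filename).suffixes]
    # Drop trailing compression suffixes such as ".gz".
    while suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    if not suffixes:
        return None
    return LEVEL_BY_SUFFIX.get(suffixes[-1])


# Hypothetical filenames, for illustration only.
for name in ["sample_R1.fastq.gz", "tumor.bam", "calls.vcf.gz", "notes.txt"]:
    print(name, infer_level(name))
```

A suffix-based check is only a heuristic; the level recorded in a file's metadata remains the authoritative value.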

Data Schema

Bulk DNA Level 1: Bulk Whole Exome Sequencing raw files
Bulk DNA Level 2: Bulk Whole Exome Sequencing aligned files and QC
Bulk DNA Level 3: Bulk Whole Exome Sequencing called variants