HTAN Bulk RNA Sequencing Data Standard

Overview

This page describes the data levels, metadata attributes, and file structure for bulk RNA sequencing.

Description of Assay

Bulk RNA sequencing (RNA-seq) measures the average gene expression profile across all cells in a biological sample.
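
The toy sketch below shows what "average" means here. It assumes a simplified model in which a sample is a handful of cells and the bulk profile is the per-gene mean across those cells. Real bulk libraries pool RNA, so cells with more RNA contribute more. The cell values and genes A, B and C are made up, and `bulk_profile` is a hypothetical helper.

```python
# Toy illustration only: a made-up matrix of per-cell expression values
# (rows = cells, columns = genes A, B, C). Bulk RNA-seq never observes
# these per-cell values directly; it sees only the pooled signal.
from typing import List


def bulk_profile(cells: List[List[float]]) -> List[float]:
    """Return the per-gene mean across cells (the simplified 'average profile')."""
    if not cells:
        raise ValueError("need at least one cell")
    n_genes = len(cells[0])
    if any(len(row) != n_genes for row in cells):
        raise ValueError("all cells must report the same genes")
    return [sum(row[g] for row in cells) / len(cells) for g in range(n_genes)]


# Two hypothetical cell types with different expression of genes A, B, C.
cells = [
    [10.0, 0.0, 5.0],  # cell type 1
    [10.0, 0.0, 5.0],  # cell type 1
    [0.0, 8.0, 5.0],   # cell type 2
    [0.0, 8.0, 5.0],   # cell type 2
]
print(bulk_profile(cells))  # [5.0, 4.0, 5.0]
```

The bulk vector alone cannot show that gene A comes entirely from one cell type. That cell-level distinction is lost in the average.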

Metadata Levels

The metadata defined here reuses existing common data elements from the Genomic Data Commons (GDC). The HTAN data model currently supports Levels 1, 2, and 3 of bulk RNA sequencing data:

Level   Definition                            Example Data
1       Unaligned reads                       FASTQ
2       Aligned reads                         BAM
3       Gene-level expression, unnormalized   Gene & isoform expression-level data (.csv)
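
The table pairs each level with a typical file format. As a rough illustration, the hypothetical helper `infer_level` guesses a file's level from its name. It assumes files use the example formats above (FASTQ for Level 1, BAM for Level 2, CSV for Level 3). The suffix list, including the gzipped FASTQ variants, is our assumption. This is a heuristic, not part of the HTAN standard.

```python
# Sketch: guess the HTAN data level of a bulk RNA-seq file from its name,
# using the example formats in the table (FASTQ -> 1, BAM -> 2, CSV -> 3).
# The suffix list and the helper name are assumptions, not HTAN tooling.
from typing import Optional, Tuple

# (suffix, level) pairs; matching is case-insensitive.
_LEVEL_BY_SUFFIX: Tuple[Tuple[str, int], ...] = (
    (".fastq.gz", 1),
    (".fq.gz", 1),
    (".fastq", 1),
    (".fq", 1),
    (".bam", 2),
    (".csv", 3),
)


def infer_level(filename: str) -> Optional[int]:
    """Return 1, 2 or 3 for a recognised suffix, else None."""
    name = filename.lower()
    for suffix, level in _LEVEL_BY_SUFFIX:
        if name.endswith(suffix):
            return level
    return None


for f in ["sample1_R1.fastq.gz", "sample1.bam", "gene_counts.csv", "notes.txt"]:
    print(f, infer_level(f))
```

Returning `None` for unknown suffixes, rather than raising, lets a caller flag unrecognised files for manual review.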

Data Schema

Attribute              Label                Description
Bulk RNA-seq Level 1   BulkRNA-seqLevel1    Bulk RNA-seq [EFO_0003738]
Bulk RNA-seq Level 2   BulkRNA-seqLevel2    Bulk RNA-seq alignment protocol description
Bulk RNA-seq Level 3   BulkRNA-seqLevel3    Bulk RNA-seq gene expression matrices
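
Tools that check metadata need to look up these schema rows by label. A minimal sketch, assuming the rows are stored as a plain Python dictionary keyed by label, is shown below. The `SchemaEntry` record, its `level` field and the `lookup` helper are hypothetical illustrations, not HTAN tooling.

```python
# Sketch: the Data Schema rows as a lookup table keyed by label.
# SchemaEntry, its fields and lookup() are illustrative assumptions.
from typing import Dict, NamedTuple


class SchemaEntry(NamedTuple):
    attribute: str
    level: int
    description: str


BULK_RNASEQ_SCHEMA: Dict[str, SchemaEntry] = {
    "BulkRNA-seqLevel1": SchemaEntry("Bulk RNA-seq Level 1", 1, "Bulk RNA-seq [EFO_0003738]"),
    "BulkRNA-seqLevel2": SchemaEntry("Bulk RNA-seq Level 2", 2, "Bulk RNA-seq alignment protocol description"),
    "BulkRNA-seqLevel3": SchemaEntry("Bulk RNA-seq Level 3", 3, "Bulk RNA-seq gene expression matrices"),
}


def lookup(label: str) -> SchemaEntry:
    """Return the schema entry for a label; raise KeyError listing valid labels."""
    try:
        return BULK_RNASEQ_SCHEMA[label]
    except KeyError:
        valid = ", ".join(sorted(BULK_RNASEQ_SCHEMA))
        raise KeyError(f"unknown label {label!r}; expected one of: {valid}") from None


entry = lookup("BulkRNA-seqLevel2")
print(entry.attribute, entry.level)  # Bulk RNA-seq Level 2 2
```

Each label is its attribute name with the spaces removed. A lookup on "Bulk RNA-seq Level 2" therefore fails, and the error message lists the accepted labels.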