HTAN Single Cell/Single Nucleus RNA Sequencing Data Standard


This page describes the data levels, metadata attributes, and file structure for single cell and single nucleus RNA sequencing assays.

Description of Assay

Single cell RNA sequencing (scRNA-seq) is an emerging technology used to investigate the expression profiles of individual cells and/or nuclei. The technique is increasingly used to study the tumor microenvironment, which is composed of a heterogeneous population of cancer cells and tumor-adjacent stromal cells. In these experiments, tissues are enzymatically dissociated, and individual cells are isolated via microfluidics using oil droplet emulsion. As in bulk RNA sequencing, transcripts are reverse transcribed, amplified, and sequenced, but each cell's transcriptome is first uniquely tagged so that reads can be traced back to their cell of origin. While scRNA-seq captures both cytoplasmic and nuclear transcripts, single nucleus RNA sequencing (snRNA-seq) measures the transcriptome of individual nuclei. Advantages of snRNA-seq include differentiating cell states and identifying rare or novel cell types in heterogeneous populations.

Following the conventions of The Cancer Genome Atlas and the NCI Genomic Data Commons, data are divided into four levels:

Level Number | Definition | Example Data
1 | Raw data | FASTQs, unaligned BAMs
2 | Aligned primary data | Aligned BAMs
3 | Derived biomolecular data | Gene expression matrix files, VCFs
4 | Sample level summary data | t-SNE plot coordinates
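
The level definitions above can be expressed as a small lookup. The following is a minimal sketch, not part of the HTAN standard: the names `DataLevel`, `EXAMPLE_DATA_LEVELS`, and `level_for` are hypothetical, and the mapping covers only the example data types listed in the table. Note that a BAM file cannot be assigned a level from its file type alone, since unaligned BAMs are Level 1 and aligned BAMs are Level 2.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """The four data levels from the table above (names are illustrative)."""
    RAW = 1       # Raw data
    ALIGNED = 2   # Aligned primary data
    DERIVED = 3   # Derived biomolecular data
    SUMMARY = 4   # Sample level summary data


# Hypothetical lookup built only from the table's "Example Data" column.
EXAMPLE_DATA_LEVELS = {
    "fastq": DataLevel.RAW,
    "unaligned bam": DataLevel.RAW,
    "aligned bam": DataLevel.ALIGNED,
    "gene expression matrix": DataLevel.DERIVED,
    "vcf": DataLevel.DERIVED,
    "t-sne plot coordinates": DataLevel.SUMMARY,
}


def level_for(data_type: str) -> DataLevel:
    """Return the data level for one of the example data types."""
    key = data_type.strip().lower()
    try:
        return EXAMPLE_DATA_LEVELS[key]
    except KeyError:
        # "bam" alone is ambiguous: alignment status decides between 1 and 2.
        raise ValueError(f"no level defined for data type: {data_type!r}") from None


if __name__ == "__main__":
    for name in ("FASTQ", "Aligned BAM", "VCF"):
        print(name, "-> Level", int(level_for(name)))
```

Keying the lookup on a described data type rather than a file extension keeps the aligned/unaligned BAM distinction explicit.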

Data Schema:

scRNA-seq Level 1: Single-cell RNA-seq [EFO_0008913]
scRNA-seq Level 2: Alignment workflows downstream of scRNA-seq Level 1
scRNA-seq Level 3: Gene and isoform expression files
scRNA-seq Level 4: Relationships between cells, derived from Level 3 expression data and represented as t-SNE or UMAP coordinates per cell, plus all other cell-specific metadata (e.g., cell type)
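
To make the Level 4 description concrete, here is a minimal sketch of a per-cell record holding embedding coordinates and a cell-type annotation, written out as CSV. The class name `CellEmbedding`, the column names, the example cell identifiers, and the CSV layout are assumptions chosen for illustration; they are not the attribute names defined by the HTAN schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CellEmbedding:
    """One row of a hypothetical Level 4 table: one cell, one 2-D embedding."""
    cell_id: str     # illustrative identifier, not a real barcode
    embedding: str   # "tSNE" or "UMAP"
    dim_1: float     # first embedding coordinate
    dim_2: float     # second embedding coordinate
    cell_type: str   # cell-specific annotation, e.g. cell type

    def __post_init__(self):
        if self.embedding not in ("tSNE", "UMAP"):
            raise ValueError(f"unsupported embedding: {self.embedding!r}")


def to_csv(rows):
    """Serialize records to CSV text with a header row."""
    buf = io.StringIO()
    names = [f.name for f in fields(CellEmbedding)]
    writer = csv.DictWriter(buf, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


if __name__ == "__main__":
    cells = [
        CellEmbedding("cell_0001", "UMAP", 1.5, -2.0, "T cell"),
        CellEmbedding("cell_0002", "UMAP", -0.5, 3.25, "fibroblast"),
    ]
    print(to_csv(cells), end="")
```

One row per cell keeps the coordinates and the annotations joined on the same identifier, which is the relationship the Level 4 description implies.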