HTAN Single Cell/Single Nucleus RNA Sequencing Data Standard


This page describes the data levels, metadata attributes, and file structure for single cell and single nucleus RNA sequencing assays.

Description of Assay

Single cell RNA sequencing is an emerging technology used to investigate the expression profiles of individual cells and/or nuclei. This technique is becoming increasingly useful for investigating the tumor microenvironment, which is composed of a heterogeneous population of cancer cells and tumor-adjacent stromal cells. In these experiments, tissues are enzymatically dissociated, and individual cells are isolated via microfluidics using oil droplet emulsion. As in bulk RNA sequencing, the transcripts are then reverse transcribed, amplified, and sequenced, but here each cell's transcriptome is first uniquely tagged. While sc-RNA sequencing captures both cytoplasmic and nuclear transcripts, single nucleus RNA sequencing measures the transcriptome of individual nuclei. Advantages of sn-RNA sequencing include differentiating cell states and identifying rare or novel cell types in heterogeneous populations.

Metadata Levels

In alignment with The Cancer Genome Atlas and the NCI Genomic Data Commons, data are divided into four levels:

Level Number   Definition                  Example Data
1              Raw data                    FASTQs, unaligned BAMs
2              Aligned primary data        Aligned BAMs
3              Derived biomolecular data   Gene expression matrix files, VCFs
4              Sample level summary data   t-SNE plot coordinates
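
As an illustration of how the levels above might be applied when sorting files, here is a minimal Python sketch. It is not part of the HTAN standard: the names DataLevel, LEVEL_DEFINITIONS, and guess_level are hypothetical, and the file extensions it checks are assumptions chosen for the example. Because both unaligned and aligned BAMs appear in the table, a file extension alone cannot separate Level 1 from Level 2, so the caller has to say whether a BAM is aligned.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """The four data levels from the table above (names are hypothetical)."""
    RAW = 1
    ALIGNED_PRIMARY = 2
    DERIVED_BIOMOLECULAR = 3
    SAMPLE_SUMMARY = 4


# Definitions copied from the levels table.
LEVEL_DEFINITIONS = {
    DataLevel.RAW: "Raw data",
    DataLevel.ALIGNED_PRIMARY: "Aligned primary data",
    DataLevel.DERIVED_BIOMOLECULAR: "Derived biomolecular data",
    DataLevel.SAMPLE_SUMMARY: "Sample level summary data",
}


def guess_level(filename: str, aligned: bool = False) -> DataLevel:
    """Guess a data level from a file name.

    The extension lists are assumptions for illustration only. `aligned`
    is consulted only for BAM files, since a BAM may be Level 1
    (unaligned) or Level 2 (aligned).
    """
    name = filename.lower()
    if name.endswith((".fastq", ".fastq.gz", ".fq", ".fq.gz")):
        return DataLevel.RAW
    if name.endswith(".bam"):
        return DataLevel.ALIGNED_PRIMARY if aligned else DataLevel.RAW
    if name.endswith((".vcf", ".vcf.gz")):
        return DataLevel.DERIVED_BIOMOLECULAR
    # Expression matrices and t-SNE coordinates have no single standard
    # extension, so this sketch refuses to guess for anything else.
    raise ValueError(f"cannot infer a data level from file name: {filename!r}")


if __name__ == "__main__":
    for fname, is_aligned in [("sample_R1.fastq.gz", False),
                              ("sample.bam", False),
                              ("sample.bam", True),
                              ("calls.vcf", False)]:
        level = guess_level(fname, aligned=is_aligned)
        print(fname, "aligned" if is_aligned else "", "->",
              int(level), LEVEL_DEFINITIONS[level])
```

In practice the level of a file would be recorded in its metadata rather than inferred from its name; the sketch only shows how the table's categories relate to common file types.
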
Data Schema:
scRNA-seq Level 1
scRNA-seq files containing sequence read information, with or without alignment, as FASTQ or BAM files