National Human Genome Research Institute, National Institutes of Health

Genome-wide mapping of DNase Hypersensitive sites using Massively Parallel Signature Sequencing (MPSS)

Gregory E. Crawford¹, Ingeborg E. Holt¹, James Whittle¹, Bryn D. Webb¹, Denise Tai¹, Sean Davis¹, Elliott H. Margulies¹, YiDong Chen¹, John A. Bernat², David Ginsburg², Daixing Zhou³, Shujun Luo³, Thomas J. Vasicek³, Tyra G. Wolfsberg¹, and Francis S. Collins¹

¹ National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892
² University of Michigan, Department of Human Genetics, Ann Arbor, MI 48109
³ Solexa, Inc., Hayward, CA 94545

Abstract

A major goal in genomics is to understand how genes are regulated in different tissues, stages of development, diseases, and species. Mapping DNaseI hypersensitive (HS) sites within nuclear chromatin is a powerful and well-established method of identifying many different types of regulatory elements, but has in the past been limited to analysis of single loci. We have recently described a protocol to generate a genome-wide library of DNase HS sites. Here, we report high-throughput analysis, using massively parallel signature sequencing (MPSS), of 230,000 tags from a DNase library generated from quiescent human CD4+ T cells. Of the tags that uniquely map to the genome, we identified 14,200 clusters of sequences that group within close proximity to each other. Using a real-time PCR strategy, we determined that the majority of these clusters represent valid DNase HS sites. Approximately 80% of these DNase HS sites uniquely map within one or more annotated regions of the genome believed to contain regulatory elements, including regions 2 kb upstream of genes, CpG islands, and highly conserved sequences. Most DNase HS sites identified in CD4+ T cells are also hypersensitive in CD8+ T cells, B cells, hepatocytes, human umbilical vein endothelial cells (HUVECs), and HeLa cells. However, ~10% of the DNase HS sites are lymphocyte specific, indicating that this protocol can identify gene regulatory elements that control cell-type specificity. This strategy, which can be applied to any cell line or tissue, will enable a better understanding of how chromatin structure dictates cell function and fate.



Raw Data and UCSC Tracks

Raw Data: Individual Sequences | Raw Data: DNase HS clusters* | UCSC Tracks
chromosome 1 | chromosome 1 | chromosome 1
chromosome 1_random | chromosome 1_random | chromosome 1_random
chromosome 2 | chromosome 2 | chromosome 2
chromosome 2_random | chromosome 2_random | chromosome 2_random
chromosome 3 | chromosome 3 | chromosome 3
chromosome 4 | chromosome 4 | chromosome 4
chromosome 4_random | chromosome 4_random | chromosome 4_random
chromosome 5 | chromosome 5 | chromosome 5
chromosome 5_random | chromosome 5_random | chromosome 5_random
chromosome 6 | chromosome 6 | chromosome 6
chromosome 6_random | chromosome 6_random | chromosome 6_random
chromosome 7 | chromosome 7 | chromosome 7
chromosome 7_random | chromosome 7_random | chromosome 7_random
chromosome 8 | chromosome 8 | chromosome 8
chromosome 8_random | chromosome 8_random | chromosome 8_random
chromosome 9 | chromosome 9 | chromosome 9
chromosome 9_random | chromosome 9_random | chromosome 9_random
chromosome 10 | chromosome 10 | chromosome 10
chromosome 10_random | chromosome 10_random | chromosome 10_random
chromosome 11 | chromosome 11 | chromosome 11
chromosome 12 | chromosome 12 | chromosome 12
chromosome 13 | chromosome 13 | chromosome 13
chromosome 13_random | chromosome 13_random | chromosome 13_random
chromosome 14 | chromosome 14 | chromosome 14
chromosome 15 | chromosome 15 | chromosome 15
chromosome 15_random | chromosome 15_random | chromosome 15_random
chromosome 16 | chromosome 16 | chromosome 16
chromosome 17 | chromosome 17 | chromosome 17
chromosome 17_random | chromosome 17_random | chromosome 17_random
chromosome 18 | chromosome 18 | chromosome 18
chromosome 18_random | chromosome 18_random | chromosome 18_random
chromosome 19 | chromosome 19 | chromosome 19
chromosome 20 | chromosome 20 | chromosome 20
chromosome 21 | chromosome 21 | chromosome 21
chromosome 22 | chromosome 22 | chromosome 22
chromosome X | chromosome X | chromosome X
chromosome X_random | chromosome X_random | chromosome X_random
chromosome Y | chromosome Y | chromosome Y
chromosome M | chromosome M | chromosome M
chromosome Un_random | chromosome Un_random | chromosome Un_random
All Individual Sequences | All Clusters |

Note: The individual sequence files were updated on May 25, 2005. Before that date, the files did not contain the complete data set. If you downloaded data prior to May 25, 2005, please retrieve the data again to obtain the full list of coordinates. The DNase HS clusters files were not affected.


Table Column Headers

Individual Sequences:
chr: chromosome
coord: coordinate of DNase sequence
strand: strand of DNase sequence
2kb_upstream: + indicates that the sequence falls within 2 kb upstream of an mRNA RefSeq
CpG_Island: + indicates that the sequence falls within a CpG Island
MCS: + indicates that the sequence falls within an MCS (multi-species conserved sequence)

DNase HS clusters:
chr: chromosome
start: first coordinate of cluster
stop: last coordinate of cluster
name: cluster identifier
count: number of DNase sequences in cluster
2kb_upstream: + indicates that the midpoint of cluster falls within 2 kb upstream of an mRNA RefSeq
CpG_Island: + indicates that the cluster region overlaps with a CpG Island
MCS: + indicates that the cluster region overlaps with an MCS (multi-species conserved sequence)
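
The column list above can be turned into a small record type for downstream analysis. The following is a minimal sketch, not the format specification of the downloadable files: it assumes that each row is tab-separated, that the fields appear in the order documented above, and that any value other than "+" in an annotation column means "no hit". The names Cluster and parse_cluster_line, and the example row, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    chrom: str          # chr
    start: int          # first coordinate of cluster
    stop: int           # last coordinate of cluster
    name: str           # cluster identifier, e.g. 500bp_199_4
    count: int          # number of DNase sequences in cluster
    upstream_2kb: bool  # midpoint within 2 kb upstream of an mRNA RefSeq
    cpg_island: bool    # cluster region overlaps a CpG island
    mcs: bool           # cluster region overlaps an MCS


def parse_cluster_line(line: str) -> Cluster:
    """Parse one row of a DNase HS clusters file.

    Assumption: fields are tab-separated and in the documented order.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, stop, name, count = fields[:5]
    # Assumption: "+" marks an annotation hit; anything else (including a
    # missing trailing field) is treated as no hit.
    flags = [f.strip() == "+" for f in fields[5:8]]
    flags += [False] * (3 - len(flags))
    return Cluster(chrom, int(start), int(stop), name, int(count), *flags)


# Hypothetical row for illustration only; the coordinates are made up.
example = parse_cluster_line("chr1\t1000\t1400\t500bp_199_4\t4\t+\t\t+\n")
```

Converting the "+" markers to booleans makes it straightforward to filter, for example, clusters that overlap a CpG island but no MCS.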

Description of DNase HS clusters

DNase HS clusters are groups of DNaseI library sequences that map within 500 bases of each other. Each cluster has a unique identifier; the final number in the identifier gives the number of sequences that map within that cluster. For example, 500bp_199_4 denotes the cluster with the unique identifier 199, which contains 4 sequences separated from each other by less than 500 bp.
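
To illustrate the clustering rule and the identifier layout, here is a minimal sketch. It is not the authors' pipeline: it assumes that "within 500 bases of each other" means each sequence lies less than 500 bp from the previous sequence on the same chromosome (a neighbor-gap rule), which the page does not state explicitly. The function names and the example coordinates are hypothetical.

```python
def cluster_coordinates(coords, max_gap=500):
    """Group coordinates from one chromosome into clusters.

    Assumption: a coordinate joins the current cluster when it lies less
    than max_gap bases from the previous (sorted) coordinate.
    """
    clusters = []
    for pos in sorted(coords):
        if clusters and pos - clusters[-1][-1] < max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def parse_cluster_name(name):
    """Split an identifier such as 500bp_199_4 into (prefix, id, count)."""
    prefix, cluster_id, count = name.rsplit("_", 2)
    return prefix, int(cluster_id), int(count)


# Made-up coordinates for illustration only.
groups = cluster_coordinates([100, 450, 900, 2000, 2499, 5000])
# groups == [[100, 450, 900], [2000, 2499], [5000]]

parts = parse_cluster_name("500bp_199_4")
# parts == ("500bp", 199, 4)
```

Splitting the identifier from the right keeps the sequence count intact when it has more than one digit.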

Genome Assembly

Coordinates were derived using UCSC's Human July 2003 assembly (hg16, NCBI build 34).

Verification

A real-time PCR assay was used to verify DNaseI-hypersensitive sites in CD4+ T cells. Approximately 20% of individual sequences (singlets), 50% of clusters of 2 sequences, 80% of clusters of 3 sequences, and 100% of clusters of 4 or more sequences are valid.
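
The validation rates above can be used to weight clusters by size when estimating how many candidate sites in a list are likely to be real. The sketch below simply encodes the approximate percentages quoted in this section; the function names are hypothetical, and treating the rates as per-site probabilities is an assumption made for illustration.

```python
def approx_valid_fraction(count):
    """Approximate validation rate for a cluster of `count` sequences,
    using the rounded percentages quoted in the Verification section."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return 0.2  # singlets
    if count == 2:
        return 0.5
    if count == 3:
        return 0.8
    return 1.0      # clusters of 4 or more sequences


def expected_valid_sites(counts):
    """Sum the approximate validation rates over a list of cluster sizes.

    Assumption: each rate is treated as an independent per-site probability.
    """
    return sum(approx_valid_fraction(c) for c in counts)


# Five hypothetical candidate sites with 1, 2, 3, 4 and 7 sequences each.
estimate = expected_valid_sites([1, 2, 3, 4, 7])
```

A simple use is to filter on the count column, for example keeping only clusters of 3 or more sequences when a high validation rate matters more than sensitivity.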



* DNase HS clusters group individual sequence coordinates that map within 500 bp of each other.



Comments, suggestions and problems to bioinformatics@nhgri.nih.gov

