DNA methylation is a heritable epigenetic modification that occurs in some eukaryotes, whereby CpG dinucleotides are methylated at the C5 position of cytosine. Methylation of the 5’ regulatory regions of genes results in gene silencing. A substantial effort is underway within the epigenomics community to identify DNA methylation patterns on a genome-wide scale using microarray-based technologies, in order to characterize tumor cells, tissue-specific methylation, and DNA methylation inhibitors. An affinity-based method, methylated DNA immunoprecipitation (MeDIP), has been shown to be a powerful tool for isolating methylated DNA fragments. Roche NimbleGen recommends this sample preparation method because of its straightforward experimental setup, ease of use (it requires only a 5-methylcytidine antibody for enrichment), and sensitive methylation detection (as few as two methylcytosines per fragment) when coupled with NimbleGen DNA methylation arrays. Another affinity-based method, the methylated CpG-island recovery assay (MIRA), can also be used to enrich for methylated DNA. Other methods of enriching for hyper- or hypomethylated DNA fragments that can be hybridized to NimbleGen DNA methylation arrays include the use of various methylation-sensitive or methylation-resistant restriction enzyme cocktails.
Signal intensity data is extracted from the scanned images of each array using NimbleScan, NimbleGen’s data extraction software. Signal intensities for each probe are saved in pair files (.txt), the raw data format for ChIP-chip experiments.
Each feature on the array has a corresponding scaled log2-ratio. This is the ratio of the input signals for the experimental and test samples that were co-hybridized to the array. The log2-ratio is computed and scaled to center the ratio data around zero. Scaling is performed by subtracting the bi-weight mean for the log2-ratio values for all features on the array from each log2-ratio value. View log2-ratio data files (.gff) using SignalMap.
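The scaling step can be illustrated with a minimal Python sketch. It assumes the ratio is experimental signal divided by test signal, and it uses a one-step Tukey biweight estimate with tuning constants `c=5.0` and `eps=1e-4`; these constants and both function names are illustrative assumptions, not NimbleScan's documented implementation.

```python
import math
import statistics


def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight location estimate.

    c and eps are assumed tuning constants, not values taken from NimbleScan.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    num = 0.0
    den = 0.0
    for v in values:
        u = (v - med) / (c * mad + eps)
        # Values far from the median get zero weight, so outliers are ignored.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        num += w * v
        den += w
    return num / den


def scaled_log2_ratios(experimental, test):
    """Compute log2(experimental / test) per feature, then center around zero."""
    raw = [math.log2(e / t) for e, t in zip(experimental, test)]
    center = tukey_biweight(raw)
    return [x - center for x in raw]
```

Because the biweight estimate down-weights extreme values, a small number of strongly enriched features should not pull the center of the distribution away from zero.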
From the scaled log2-ratio data, a fixed-length window (750bp) is placed around each consecutive probe and the one-sided Kolmogorov-Smirnov (KS) test is applied to determine whether the probes are drawn from a significantly more positive distribution of intensity log-ratios than those in the rest of the array. The resulting score for each probe is the -log10 p-value from the windowed KS test around that probe. View p-value data files (.gff) using SignalMap.
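As a rough illustration of the windowed test, the sketch below scores each probe on a single chromosome. It makes several assumptions that are not stated above: the window is centered on the probe, the comparison set is every probe outside the window, and the p-value comes from the standard asymptotic approximation for the one-sided two-sample KS statistic rather than from whatever exact method NimbleScan uses. The function names are hypothetical.

```python
import bisect
import math


def ks_one_sided_score(window, rest):
    """Return -log10(p) for the alternative that `window` is shifted above `rest`.

    Uses the asymptotic approximation p ~ exp(-2 * D^2 * n*m / (n + m)),
    which is an assumption made for this sketch.
    """
    n, m = len(window), len(rest)
    if n == 0 or m == 0:
        return 0.0
    win, bg = sorted(window), sorted(rest)
    d_plus = 0.0
    # The empirical CDFs only change at data points, so checking those suffices.
    for v in win + bg:
        f_win = bisect.bisect_right(win, v) / n
        f_bg = bisect.bisect_right(bg, v) / m
        # A window with larger values has a CDF that lags behind the background.
        d_plus = max(d_plus, f_bg - f_win)
    return 2.0 * d_plus ** 2 * n * m / (n + m) / math.log(10)


def windowed_ks_scores(positions, ratios, window_bp=750):
    """Score every probe using the probes within window_bp / 2 of its position."""
    half = window_bp / 2
    scores = []
    for p in positions:
        inside = [r for q, r in zip(positions, ratios) if abs(q - p) <= half]
        outside = [r for q, r in zip(positions, ratios) if abs(q - p) > half]
        scores.append(ks_one_sided_score(inside, outside))
    return scores
```

This version recomputes the background for every probe, which is quadratic in the number of probes; it is meant to show the logic, not to process a full array efficiently.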
For each annotated gene, NimbleScan searches for peaks that appear in a specified promoter region around the transcription start site (TSS). The region searched is design-specific; for most mammalian designs, the search region spans from 5kb upstream to 1kb downstream of the TSS.
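The search region depends on the strand of the transcript, because "upstream" means lower coordinates on the plus strand and higher coordinates on the minus strand. The sketch below uses the 5 kb / 1 kb defaults quoted for most mammalian designs; the function names, and the choice to count any overlap between a peak and the region as a hit, are assumptions made for illustration.

```python
def promoter_region(tss, strand, upstream=5000, downstream=1000):
    """Return (start, end) of the search region around a TSS, strand-aware."""
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")


def peak_in_promoter(peak_start, peak_end, tss, strand):
    """True if the peak overlaps the search region (assumed overlap rule)."""
    start, end = promoter_region(tss, strand)
    return peak_start <= end and peak_end >= start
```

For example, a peak at 12,000 to 13,000 lies outside the region for a plus-strand TSS at 10,000 but inside the region for a minus-strand TSS at the same position.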
You can view the summary reports using spreadsheet software, such as Microsoft Excel:
- Report All Peaks – Lists all peaks and maps them to promoter regions. Each row in the report lists a peak-transcript pair. For each transcript, if more than one peak lies within the promoter region, there will be multiple rows for that transcript.
- Report Nearest Peak – Lists all peaks and maps them to promoter regions. Each row in the report lists a peak-transcript pair. For each transcript, if more than one peak lies within the promoter region, only the peak nearest to the TSS is reported.
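The relationship between the two reports can be sketched as a simple reduction: starting from the rows of the all-peaks report, keep one row per transcript. The sketch assumes that rows are dictionaries keyed by the report field names, that `Parent` identifies the transcript, and that "nearest" means the smallest absolute `FEATURE_TO_PEAK_DISTANCE`; the function name is hypothetical and ties keep the first row seen.

```python
def nearest_peak_rows(all_rows):
    """Keep, for each transcript, the row whose peak lies closest to the TSS."""
    best = {}
    for row in all_rows:
        key = row["Parent"]
        dist = abs(row["FEATURE_TO_PEAK_DISTANCE"])
        if key not in best or dist < abs(best[key]["FEATURE_TO_PEAK_DISTANCE"]):
            best[key] = row
    # Dictionaries preserve insertion order, so transcripts stay in input order.
    return list(best.values())
```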
To effectively analyze peak data, you should sort the data in summary reports according to peak score, gene name, chromosome, distance to TSS, etc. To sort data in Microsoft Excel, highlight row 1 and select Data -> Filter -> Auto Filter. You can then sort individual columns by ascending/descending values, top 10 values, or individual values.
The table below identifies the fields on the summary reports (.xls):
| Field | Description |
| --- | --- |
| PEAK_ID | An ID for each peak. |
| CHROMOSOME | Chromosome associated with the peak. |
| PEAK_START | First base of the peak on the chromosome. |
| PEAK_END | Last base of the peak on the chromosome. |
| PEAK_SCORE | The peak score, which is the average of the -log10 p-values of the probes within that peak. |
| FEATURE_TRACK | The annotation track against which peaks were mapped; it is the transcription start site for summary reports. |
| FEATURE_STRAND | Strand of the transcript. |
| FEATURE_START | First base of the feature on the chromosome. Note: For the transcription start site, feature size is 1; therefore, start and end positions are the same. |
| FEATURE_END | Last base of the feature on the chromosome. Note: For the transcription start site, feature size is 1; therefore, start and end positions are the same. |
| FEATURE_TO_PEAK_DISTANCE | Center-to-center distance of peak to feature. |
| Name | Gene symbol of the transcript. |
| Accession | GenBank accession number of the transcript. |
| description | Full gene name of the transcript. |
| ncbi_gene_id | NCBI Entrez GeneID of the transcript. |
| synonyms | Other alias symbol(s) of the transcript. |
| Parent | The internal identification number of the transcript from which this transcription start site is generated. |
| PEAK_ATTR | Attribute field from the peak GFF file. |
For each annotated gene, NimbleScan searches for peaks that appear in a specified promoter region around the transcription start site (TSS). The region searched is design-specific; for most mammalian designs, the search region spans from 5kb upstream to 1kb downstream of the TSS.
You can view the promoter reports using spreadsheet software, such as Microsoft Excel:
- Report_All_Peaks – Lists all peaks with an FDR ≤ 0.2 and maps them to promoter regions. Each row in the report lists a peak-transcript pair. For each transcript, if more than one peak lies within the promoter region, there will be multiple rows for that transcript.
- Report_Nearest_Peaks – Lists all peaks with an FDR ≤ 0.2 and maps them to promoter regions. Each row in the report lists a peak-transcript pair. For each transcript, if more than one peak lies within the promoter region, only the peak nearest to the TSS is reported.
To effectively analyze peak data, you should sort the data in promoter reports according to FDR, peak score, gene name, chromosome, distance to TSS, etc. To sort data in Microsoft Excel, highlight row 1 and choose Data -> Filter -> Auto Filter. You can then sort individual columns by ascending/descending values, top 10 values, or individual values.
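The same filtering and sorting can be scripted outside a spreadsheet. The following sketch assumes the promoter report has been saved as tab-delimited text with the column headers used in the report; the function name and the `max_fdr` parameter are hypothetical.

```python
import csv
import io


def sort_promoter_rows(tab_text, max_fdr=0.2):
    """Keep rows with PEAK_FDR <= max_fdr, sorted by FDR, then by descending score."""
    rows = list(csv.DictReader(io.StringIO(tab_text), delimiter="\t"))
    kept = [r for r in rows if float(r["PEAK_FDR"]) <= max_fdr]
    return sorted(kept, key=lambda r: (float(r["PEAK_FDR"]), -float(r["PEAK_SCORE"])))
```

Passing a smaller `max_fdr`, such as 0.05, narrows the list to the most confident peaks before sorting.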
The table below identifies the fields on the promoter reports (.xls):
| Field | Description |
| --- | --- |
| PEAK_ID | An ID for each peak. |
| CHROMOSOME | Chromosome associated with the peak. |
| PEAK_START | First base of the peak on the chromosome. |
| PEAK_END | Last base of the peak on the chromosome. |
| PEAK_SCORE | The log2-ratio of the fourth highest probe in the peak. |
| PEAK_FDR | FDR value of the peak. |
| FEATURE_TRACK | The annotation track against which peaks were mapped; it is the transcription start site for promoter reports. |
| FEATURE_STRAND | Strand of the transcript. |
| FEATURE_START | First base of the feature on the chromosome. Note: For the transcription start site, feature size is 1; therefore, start and end positions are the same. |
| FEATURE_END | Last base of the feature on the chromosome. Note: For the transcription start site, feature size is 1; therefore, start and end positions are the same. |
| FEATURE_TO_PEAK_DISTANCE | Center-to-center distance of peak to feature. |
| Name | Gene symbol of the transcript. |
| accession | GenBank accession number of the transcript. |
| description | Full gene name of the transcript. |
| ncbi_gene_id | NCBI Entrez GeneID of the transcript. |
| synonyms | Other alias symbol(s) of the transcript. |
| Parent | The internal identification number of the transcript from which this transcription start site is generated. |
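The PEAK_SCORE definition for promoter reports can be illustrated in a few lines of Python. The function name is hypothetical, and returning `None` for peaks with fewer than four probes is an assumption, since the behavior for such peaks is not described here.

```python
def promoter_peak_score(probe_log2_ratios):
    """Return the fourth highest log2-ratio among the probes in a peak."""
    if len(probe_log2_ratios) < 4:
        return None  # assumed handling; not specified for short peaks
    return sorted(probe_log2_ratios, reverse=True)[3]
```

Using the fourth highest value rather than the maximum means that a single unusually bright probe cannot determine the score on its own.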
If your array design is customized, some of the files described above may not be provided. For instance, annotation files (.gff) may not be readily available for less common genomes, which will result in no promoter reports being generated. In addition, the gene description file (.ngd) is available only for certain designs, since these files were replaced by annotation files (.gff) in newer designs. Also, if a positions file (.pos) is not available (because genomic coordinates were not provided for a custom design), no ratio files (.gff), peak data files (.gff), or promoter reports (.xls) are generated.
Many third-party packages can import and analyze NimbleGen ChIP-based DNA methylation data. Five third-party packages are listed below:
Elucidating the function and transcriptional network of large gene lists can be cumbersome. Using the Database for Annotation, Visualization and Integrated Discovery (DAVID), you can functionally annotate your DNA methylation data. A guide to using the DAVID website is available for download.