FAQ
Part 1: Agilent RNA Quality Check
- How much RNA do you need for a quality check?
- What's the detectable range of RNA concentration for the RNA quality check?
- What should I look for in the Agilent RNA Quality Check result?
Part 2: Microarrays - Before you Start
General:
- What is a DNA microarray and how does it help with cancer research?
- I'm a first time user. Can I get some help on experiment design?
- Why do you need replication in experiment design?
- How many replicates are needed?
- Does pooling RNA count as replication?
- How should I prepare my samples?
For Spotted Arrays:
- What is a spotted DNA microarray?
- What spotted microarrays are available in the facility? Can I get a genelist for available arrays?
- What's the price of spotted microarrays?
- How much RNA is needed to run a spotted microarray?
- What should I put in the submission form for cy5 and cy3 samples?
For Affymetrix Arrays:
- What is the Affymetrix GeneChip array?
- What's the price of Affymetrix arrays?
- What Affymetrix microarrays are available? Where can I get a genelist for available arrays?
- How many samples do I need to prepare for an Affymetrix microarray?
Part 3: Microarrays - After you get your results
- What options do I have for data analysis after I get my results?
- What are those numbers in the summary file I received with my spotted array results?
- What are the result files for a spotted array? What does each column in a .gpr file mean?
- What are the result files for an Affymetrix array? Which one should I use for which analysis program?
- I'm using GeneSpring. Where can I get the genome file for spotted arrays?
Part 1: Agilent RNA Quality Check:
How much RNA do you need for a quality check?
Answer: Please give us 3 μl of each RNA sample in a 0.5 ml tube. (Only 1 μl is needed to run the check, but we may need to rerun the sample if it doesn't run well the first time.)
What's the detectable range of RNA concentration for the RNA quality check?
Answer: On Nano Chip: total RNA: 5-500 ng/μl; mRNA: 25-250 ng/μl.
What should I look for in the Agilent RNA Quality Check result?
Answer: For information about the RNA Ladder we are using, check here.
For total RNA:
RNA quality: the ratio of 28S to 18S rRNA (28S/18S) is an indication of RNA quality. The ideal ratio is 2; a low ratio is an indication of degradation of the total RNA. On the gel image and the graph, the 28S and 18S rRNA should show up as two sharp bands.
RNA concentration: The Agilent RNA quality check can give you an estimate of the RNA concentration. If concentration is what you're concerned about, double check it on a spectrophotometer.
For mRNA:
rRNA contamination: the percentage of rRNA in your sample. The ideal number is <5%.
RNA concentration: The Agilent RNA quality check can give you an estimate of the RNA concentration. If concentration is what you're concerned about, double check it on a spectrophotometer.
Part 2: Microarray - Before you start:
General:
What is a DNA microarray and how does it help with cancer research?
Answer: Please check this PowerPoint presentation (15.9 MB).
I'm a first time user. Can I get some help on experiment design?
Answer: Before submitting samples for microarray experiments, Duke investigators are encouraged to make an appointment with Dr. Holly Dressman (668-1583 or by email) to discuss experimental design and the types of microarray technologies that are available to the user through the Duke Microarray Core Facility. It is important to begin a microarray experiment with a proper question that is well defined with independent experimental verification.
Why do you need replication in experiment design?
Answer: Replication is important because it offers statistical power that enables you to find real differences between experimental groups. With adequate replication, "real" differences in levels of gene expression can be distinguished from differences caused by random variation. Without replication, it is difficult to know whether observed differences are real or random. Statistical precision enables you to accurately characterize gene expression for a particular experimental unit. With adequate replication, you get a more accurate overall picture of expression. Without replication, you have more random variation, leading to a less accurate picture and no way to fully characterize the uncertainty in the data.
Sources of variation:
| Biological Variation Between... | Process Variation Caused By... |
| Strains | Quality of the Experimental Sample |
| Animals | Labeling Effects |
| Tissues | Hybridization Effects |
| Time | Background Effects |
How many replicates are needed?
Answer: There is no simple guidance on the number of replicates needed. A minimum of four or five replicates for each experimental condition and/or time point is a good starting point; however, more may be needed to achieve the goals of many experiments. Some general guidance follows:
| More replication is needed for... | Less replication is needed for... |
| Finding small differences in genes expressed at modest levels | Finding gross patterns among highly expressed genes |
| Experiments using tissue samples | Experiments using cell line samples |
| Experiments with no confirmatory testing | Experiments incorporating confirmatory testing such as Northern blots or Real-Time PCR |
Standard statistical methods support sample size calculations to determine how many samples are needed to detect a specified difference between groups with a required level of power. In concept, this can be done for microarray experiments too. However, sample size calculations are based on a known level of variation between samples. For microarrays, the reality is that:
(a) The expected level of variation is usually not well known in advance. Due to the high cost of microarrays and the large number of samples needed to accurately assess variance, it is usually not practical to follow the common statistical practice of gathering "pilot" data for the purpose of estimating variability.
(b) Variation between samples can differ for different genes, so the ideal number of replicates may differ as well. This makes it impossible to have a single rule that works in general, without applying some simplifying assumptions.
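To make the sample size calculation described above concrete, here is a minimal sketch using the standard normal-approximation formula for comparing two group means. It is an illustration only: the function name, the default significance level and power, and the example standard deviations are assumptions, and the calculation treats a single gene with a known level of variation, which (as noted above) is rarely the case for microarrays.

```python
import math
from statistics import NormalDist


def replicates_per_group(sd, difference, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-sided, two-sample comparison.

    sd         -- assumed standard deviation between samples (e.g. of log2 expression)
    difference -- smallest difference between group means you want to detect
    alpha      -- significance level for one gene (no multiple-testing correction)
    power      -- required probability of detecting the difference
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / difference) ** 2
    return math.ceil(n)


# Hypothetical inputs: detect a 2-fold change (1.0 on the log2 scale).
print(replicates_per_group(sd=0.5, difference=1.0))
print(replicates_per_group(sd=1.0, difference=1.0))
```

Doubling the assumed standard deviation roughly quadruples the number of replicates, which is why the required replication differs from gene to gene.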
Does pooling RNA count as replication?
Answer: No. A common practice involves pooling RNA from several experimental units (e.g. animals) in an effort to achieve more representative results. While pooling RNA in a replicated experiment may indeed improve statistical power and precision due to less variation across samples, pooling RNA is not a substitute for replicating an experiment. For instance:
- The practice of pooling RNA does not in itself provide a way to characterize random variation in the experiment, so replication is still needed to distinguish real differences.
- Pooling RNA can distort results if, for example, a particular experimental unit is problematic and contributes a misleading expression pattern that skews the results.
- Pooling RNA precludes the investigator from observing potentially interesting patterns in behavior across different animals or other experimental units.
How should I prepare my samples?
Answer: A number of RNA isolation methods are available to generate high-quality RNA for use in microarray experiments. We suggest using the Qiagen RNeasy kits for total RNA and the Qiagen Oligotex kits to isolate polyA+ RNA. Isolating polyA+ RNA is not required, but it can produce more consistent results because the isolation acts as a cleanup step that removes residual DNA and protein contamination, which may be present in varying amounts in total RNA samples. Trizol may also be used for RNA isolation; however, we strongly recommend that you do not remove any of the interphase layer during the extraction (it could introduce protein contaminants) and that you then clean up the RNA with the Qiagen RNeasy columns.
When working with RNA, remember to protect the RNA at all stages, and in particular take the following precautions and quality assurance steps:
- Autoclaving WILL NOT inactivate RNases, since they are quite stable, so use only RNase- and DNase-free tubes and aerosol-filtered pipette tips.
- Use only RNase-free H2O in all reactions. Use commercial sources if possible.
- Wear powder-free latex gloves, and change gloves often (powder can hurt the microarray images).
- Use RNaseZap (Ambion) to clean the workbench and pipetters before setting up reactions.
- Try to obtain the freshest sample possible from tissues and cell cultures.
- Place tissue samples in liquid nitrogen or on dry ice immediately after surgical removal (keep them on wet ice until then), and store them at -80C.
- Process only small pieces of tissue (a few hundred milligrams) for fast homogenization. When using Trizol, spin down the homogenized mixture and extract only the clear upper layer for further processing, to eliminate unwanted components.
- Avoid over-drying the final RNA pellet, since over-dried RNA is difficult to dissolve.
- Redissolve the RNA pellet in RNase-free H2O at 60-70C for 10 minutes.
- Check RNA quality by running the sample on an agarose gel, and look for two strong, unsmeared ribosomal bands (28S and 18S), with the upper band brighter than the lower. Note that the Microarray facility will always check the quality of submitted RNA on an Agilent Bioanalyzer before proceeding with probe preparation.
- Measure the OD of the samples and look for A260/A280 ratios of 1.5-1.8 for total RNA and >1.8-1.9 for mRNA (above 2.0 is excellent). Note: if these ratios are not attained, attempt additional cleanup with the Qiagen RNeasy kit, or repeat the extraction using a new sample.
- Always store RNA at -80C; for long-term storage (>6 months), suspend the RNA in 70% ethanol and store at -80C.
For Spotted Microarrays:
What is a spotted DNA microarray?
Answer: In this approach, distinct DNA fragments (cDNA or oligonucleotides) are attached as an array of distinct spots on a suitably treated glass microscope slide via a mechanical robotic spotting process. Two distinct probe DNA or mRNA mixtures - the reference and the test sample - are given fluorescent red and green labels and are combined in solution and applied to the array. The relative amounts of red and green fluorescence at each spot provide a measurement of the relative numbers of red and green labeled fragments attached at the spot, and thus of the relative numbers of fragments in the reference and test samples. The two-color system requires a compatible pair of dyes. The most commonly used are the Cy5 (Red) and Cy3 (green) fluorescent dyes. These dyes are relatively bright, stable, and they fluoresce when dry, so that the hybridized arrays can be fluorescently imaged in a dry state. On the other hand, these are patented, proprietary dyes, and their cost accounts for more than half of the cost of a spotted microarray experiment.
The probe typically consists of red fluorescently labeled mRNA (or, more commonly, the corresponding cDNA produced by in vitro reverse transcription) extracted from a test sample, and a green fluorescently labeled cDNA from a reference sample. These labeled probes are mixed in solution and hybridized to the array. Unbound probe is washed away, and the array is scanned by a fluorescent imaging system to yield red (upregulated), green (downregulated) and yellow (no difference) intensity measurements from each spot on the array. The ratio of these red and green intensities, suitably normalized, provides a measure of the change in mRNA levels between the test and reference populations, and thus of relative levels of gene expression in the test and reference sample. Compared to traditional techniques, this procedure is analogous to simultaneously carrying out thousands of Northern blots. The types of sample comparison used in experimental design and data analysis are direct and indirect comparisons, which are briefly described below.
i. Direct comparison is the basic comparative measurement between two samples. The reference sample is tissue or a cell line in a "normal" state, and the test sample is from the same type of cells in a diseased or otherwise altered state. In this case, the red/green ratio provides a direct comparison of the expression in the normal and altered cells for each gene on the array.
ii. Indirect comparison is the other type. Making all direct comparisons between N samples requires on the order of N×N comparisons, which gets large quickly. Instead, to compare or characterize many test samples, it is convenient to compare each one to a universal reference sample. This can be any mix of DNAs that reliably and repeatably lights up most spots on the array (so that the red/green ratio is meaningful at each spot), and is available in large enough quantities to use the same batch for many experiments. Convenient reference samples can be made from mixes of RNAs from several specific cell lines or tissues. In this approach, all test samples are hybridized against the reference sample, and are compared to each other only indirectly, using the results of their experimental comparisons to the reference.
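The difference between the two designs can be sketched in a few lines. This is an illustration under simplifying assumptions: the function names and example ratios are hypothetical, each distinct pair is counted once, and the ratios are assumed to be already normalized.

```python
import math


def direct_pair_count(n_samples):
    """Number of distinct sample pairs if every sample is compared directly."""
    return n_samples * (n_samples - 1) // 2


def indirect_log2_ratio(ratio_a_vs_ref, ratio_b_vs_ref):
    """log2(A/B) inferred from the A/reference and B/reference ratios."""
    return math.log2(ratio_a_vs_ref) - math.log2(ratio_b_vs_ref)


# Ten samples: 45 direct pairings, but only 10 arrays against a common reference.
print(direct_pair_count(10))

# Hypothetical gene: 4-fold above reference in sample A, equal to reference in B.
print(indirect_log2_ratio(4.0, 1.0))
```

Because every sample shares the same reference channel, the reference cancels out when two log ratios are subtracted, which is what allows samples from different arrays to be compared.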
Also check the spotted array section of this PowerPoint presentation (15.9 MB).
What spotted microarrays are available in the facility? Can I get a genelist for available arrays?
Answer: Please go to our Spotted Array Page and find the table of available arrays. You can download the genelist for all available arrays there.
What's the price of spotted microarrays?
Answer: Please go to our Spotted Array Page, which has a table of prices for all the services we offer.
How much RNA is needed to run a spotted microarray?
Answer: 10-25μg of total RNA in 10μl RNase-free water, or 1-3μg of mRNA in 10μl RNase-free water. We also need to do an Agilent RNA quality check before labeling and hybridization, so please also provide 3μl of the same sample at the same concentration in a separate tube.
What should I put in the submission form for cy5 and cy3 samples?
Answer: Generally you'll put your samples on one channel (usually cy5) and put reference RNA on the other channel (usually cy3). The relative intensity signals are presented as a ratio cy5/cy3 in the result file. For Human, Mouse and Rat we have universal reference RNA from Stratagene. For other organisms you have to provide your own reference RNA.
For Affymetrix Microarrays:
What is the Affymetrix GeneChip array?
Answer: The Affymetrix GeneChip system provides an approach to comparatively analyze genome-wide patterns of gene expression using miniaturized, high-density arrays of 25-mer oligonucleotide probes. The probe arrays are manufactured by Affymetrix's proprietary, light-directed chemical synthesis process, which generates high-density arrays of oligonucleotides, each at a predefined position on the array. These arrays are used to monitor gene expression for thousands of transcripts. Each transcript is represented by a probe set. A probe set is made up of probe pairs, each comprising a perfect match (PM) and a mismatch (MM) probe cell. This probe pairing strategy identifies and minimizes the effects of non-specific hybridization and background signal. The intensities of each probe pair are used to determine the expression measurement. This measurement is calculated for each probe set and is described in the form of qualitative and quantitative values using the Microarray Analysis Suite, version 5.0.
Briefly, target preparation starts with at least 10 micrograms of total RNA or 2 micrograms of polyA mRNA from tissue or cells. The RNA is converted to double-stranded cDNA, and an in vitro transcription reaction is then performed to produce biotin-labeled cRNA from the cDNA. The cRNA is fragmented before hybridization, and a hybridization cocktail is prepared that contains the fragmented cRNA, probe array controls, BSA and herring sperm DNA. The cRNA is hybridized to the oligonucleotide probes on the array for 16 hours at 45C. Immediately following hybridization, the hybridized probe arrays undergo an automated washing and staining protocol on the fluidics station and are then scanned on the Hewlett Packard GeneArray scanner, where patterns of hybridization are detected. The hybridization data are collected as light emitted from the fluorescent reporter groups already incorporated into the target, which is now bound to the probe array. Probes that perfectly match the target generally produce stronger signals than those that have mismatches. The scanner acquires an image of each of the probe cells, and the computer workstation automatically overlays two scanned images and averages the intensities of each probe cell for the greatest array sensitivity. Data generated from the scan are then analyzed using the Microarray Analysis Suite, version 5.0.
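To illustrate why each probe pair includes a mismatch cell, the sketch below subtracts the MM intensity from the PM intensity for every pair in a probe set and averages the differences. This is a deliberately simplified illustration and not the algorithm used by the Microarray Analysis Suite, version 5.0; the function name and intensity values are hypothetical.

```python
from statistics import mean


def average_difference(probe_pairs):
    """Mean of (PM - MM) over the probe pairs of one probe set.

    probe_pairs -- iterable of (perfect_match, mismatch) intensities.
    The MM intensity stands in for non-specific hybridization and background.
    """
    return mean(pm - mm for pm, mm in probe_pairs)


# Hypothetical probe set with three probe pairs.
pairs = [(1000, 200), (900, 300), (1100, 100)]
print(average_difference(pairs))
```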
Also check the Affymetrix array section in this PowerPoint presentation (15.9 MB).
What's the price of Affymetrix arrays?
Answer: Duke investigators must purchase arrays and bring them to the facility when submitting samples. To determine the costs for probe synthesis, hybridization, and analysis of Affymetrix arrays, please contact Holly Dressman by email. (Based on the volume of use, Duke investigators receive a considerable discount on the price of Affymetrix GeneChip arrays.)
What Affymetrix microarrays are available? Where can I get a genelist for available arrays?
Answer: Go to www.affymetrix.com to view the various genechips available. Go to "Additional Support" under each chip to download genelists.
How many samples do I need to prepare for an Affymetrix microarray?
Answer: 5-10μg of total RNA in 10μl RNase-free water; 1-3μg of mRNA in 10μl RNase-free water; 15μg of fragmented RNA in 40μl. For total RNA we also do an Agilent RNA quality check, so please also bring 3μl aliquots in separate tubes.
Part 3: Microarrays - After you get your results
What options do I have for data analysis after I receive the result?
Answer: If you choose to do your own analysis, we can help you with any supported software listed on our Data Analysis Page.
We can also perform data analysis for you. Please make an appointment with Dr. Holly Dressman (668-1583 or by email).
What are those numbers in the summary file I received with my spotted array results?
Answer: The summary file contains the following fields:
scan date: the date your arrays were scanned.
Genome ID: the identifier we use to track individual hybridizations. Every sample should have a distinct Genome ID.
Array: the array you are using. For example, MO30K means a mouse array with oligos from Operon and a size of 30K genes.
Sample: the sample names, given as cy5 sample name vs cy3 sample name.
635 signal/background: the signal-to-background ratio for the cy5 (635) channel. Expect this number to be above 2.
532 signal/background: the signal-to-background ratio for the cy3 (532) channel. Expect this number to be above 2.
635 background: the average (mean of medians) background intensity for the cy5 channel. Expect this number to be below 200, and most of the time below 100.
532 background: the average (mean of medians) background intensity for the cy3 channel. Expect this number to be below 200, and most of the time below 100.
635 signal/noise: the median signal-to-noise ratio for the cy5 channel. Expect this number to be above 2; a value higher than 5 means the cy5 sample hybridized very well.
532 signal/noise: the median signal-to-noise ratio for the cy3 channel. Expect this number to be above 2; a value higher than 5 means the cy3 sample hybridized very well.
PMT 635: the PMT (photomultiplier tube) gain we used to scan the cy5 channel, usually 400-700.
PMT 532: the PMT (photomultiplier tube) gain we used to scan the cy3 channel, usually 400-700.
F635 Median: the median foreground intensity for the cy5 channel, usually 300 and up.
F532 Median: the median foreground intensity for the cy3 channel, usually 300 and up.
not found: the percentage of features not found, that is, the percentage of spots that did not hybridize. This depends on your samples and on the array you are working with. A value below 25% means that almost all the genes on that array were detected in your samples (in the cy5 sample, the cy3 sample, or both).
cy5 RNA concentration: the starting cy5 sample concentration, as measured on the Nanodrop.
cy3 RNA concentration: the starting cy3 sample concentration, as measured on the Nanodrop.
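The expected ranges above can be turned into a quick screening check. The sketch below is one possible way to do that, not a facility tool: the function name and the dictionary keys are hypothetical and would need to match the column names in your own summary file, and the 25% limit on "not found" is treated as a soft flag because it depends on the samples and the array.

```python
def qc_warnings(summary):
    """Return a list of warnings for one summary row (dict of metric -> number)."""
    warnings = []
    for channel in ("635", "532"):
        if summary[f"{channel} signal/background"] <= 2:
            warnings.append(f"{channel} signal/background is not above 2")
        if summary[f"{channel} background"] >= 200:
            warnings.append(f"{channel} background is not below 200")
        if summary[f"{channel} signal/noise"] <= 2:
            warnings.append(f"{channel} signal/noise is not above 2")
    if summary["not found"] >= 25:
        warnings.append("not found is not below 25%")
    return warnings


# Hypothetical row: the cy3 (532) channel has a high background.
row = {
    "635 signal/background": 6.1, "532 signal/background": 3.4,
    "635 background": 80, "532 background": 240,
    "635 signal/noise": 7.2, "532 signal/noise": 4.0,
    "not found": 12,
}
for warning in qc_warnings(row):
    print(warning)
```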
What are the result files for a spotted array? What does each column in a .gpr file mean?
Answer: Data acquisition is performed using the Axon GenePix Pro 4000A interface. The Axon GenePix scanner refers to each spot on an array as a feature. Two dyes are used in the hybridization, Cy5 and Cy3. Cy5 is scanned at a wavelength of 635 nm and fluoresces red. Cy3 is scanned at a wavelength of 532 nm and fluoresces green. The Axon GenePix software refers to the intensities of the spots by the wavelength at which they are scanned. The Duke Microarray Facility calculates all ratios as intensity of Cy5 signal (red) / intensity of Cy3 signal (green). If a ratio is equal to 2, then the expression of the feature was twice as high in the Cy5 labeled sample as in the Cy3 labeled sample. The same logic holds when the intensity of the Cy3 signal is greater than the intensity of the Cy5 signal: the ratio is then a number less than 1.
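As a small worked example of reading these ratios, the sketch below converts a pair of intensities into the Cy5/Cy3 ratio, its log (base 2) value, and a fold change with a direction. The function name and intensities are hypothetical, and the inputs are assumed to be positive, background-subtracted values.

```python
import math


def summarize_ratio(cy5_signal, cy3_signal):
    """Return (ratio, log2 ratio, fold change, channel with higher expression)."""
    ratio = cy5_signal / cy3_signal
    log2_ratio = math.log2(ratio)
    if ratio > 1:
        return ratio, log2_ratio, ratio, "Cy5"
    if ratio < 1:
        # A ratio below 1 means the Cy3 sample is higher by a factor of 1/ratio.
        return ratio, log2_ratio, 1 / ratio, "Cy3"
    return ratio, log2_ratio, 1.0, "neither"


print(summarize_ratio(2000, 1000))  # twice as high in the Cy5 sample
print(summarize_ratio(500, 1000))   # twice as high in the Cy3 sample
```

On the log scale the two cases are symmetric (+1 and -1), which is one reason log ratios are often easier to compare than raw ratios.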
For each array experiment that is completed, five files are generated. The files are named identically except for the extension. The naming convention is as follows:
Project ID number_Genome ID number_slidenumber_slidelot_chip type_Cy5sample_Cy3sample (for example, 0089_477_001_01_HO21K_ko_wt.*)
| Extension | Description | Comments |
| *.TIFF (635nm and 532nm) | Image file, picture of scanned array | Can be viewed in Photoshop and PowerPoint |
| *.GPR | Tab-delimited text file with the raw data, see table below for content. | Can be opened in Excel to manipulate |
| *.JPG | Shows the array image with both channels overlaid | Can open in any operating system |
| *.GPS | GenePix Settings file. Acquisition, analysis and display settings are saved together as a binary GenePix settings file. This file also contains block and feature geometry, and can be used to apply a grid template to an image. | Used with the Axon GenePix software |
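If you need to sort or group result files by project, array type or sample, the spotted array naming convention above can be split into its fields. The sketch below is an illustration only: the class and function names are hypothetical, and it assumes that none of the seven fields contains an underscore, which may not hold for every sample name.

```python
from typing import NamedTuple


class SpottedArrayName(NamedTuple):
    project_id: str
    genome_id: str
    slide_number: str
    slide_lot: str
    chip_type: str
    cy5_sample: str
    cy3_sample: str


def parse_spotted_name(filename):
    """Split a spotted array result file name into its seven fields."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension, if any
    parts = stem.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated fields, got {len(parts)}")
    return SpottedArrayName(*parts)


info = parse_spotted_name("0089_477_001_01_HO21K_ko_wt.gpr")
print(info.chip_type, info.cy5_sample, info.cy3_sample)
```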
The data analysis output file is the *.GPR file. Each column in the GPR file is described below (info from http://www.axon.com/gn_GPR_Format_History.html):
| Column Title | Description |
| Block | the block number of the feature. |
| Column | the column number of the feature. |
| Row | the row number of the feature. |
| Name | the name of the feature derived from the Array List (up to 40 characters long, contained in quotation marks). |
| ID | the unique identifier of the feature derived from the Array List (up to 40 characters long, contained in quotation marks). |
| X | the X-coordinate in µm of the center of the feature-indicator associated with the feature, where (0,0) is the top left of the image. |
| Y | the Y-coordinate in µm of the center of the feature-indicator associated with the feature, where (0,0) is the top left of the image. |
| Dia | the diameter in µm of the feature-indicator. |
| F635 Median | median feature pixel intensity at wavelength #1 (635 nm). |
| F635 Mean | mean feature pixel intensity at wavelength #1 (635 nm). |
| F635 SD | the standard deviation of the feature pixel intensity at wavelength #1 (635 nm). |
| F635 CV | the coefficient of variation of feature pixel intensity. |
| B635 | the actual background value used for the feature in GenePix Pro calculations (as opposed to B635 Median, for example, which is the local median background.) This column is required because GenePix Pro 5.0 has global and negative control background subtraction methods. If you choose a non-local method, B635 is different to B635 Median |
| B635 Median | the median feature background intensity at wavelength #1 (635 nm). |
| B635 Mean | the mean feature background intensity at wavelength #1 (635 nm). |
| B635 SD | the standard deviation of the feature background intensity at wavelength #1 (635 nm). |
| B635 CV | the coefficient of variation of background pixel intensity. |
| % > B635 + 1 SD | the percentage of feature pixels with intensities more than one standard deviation above the background pixel intensity, at wavelength #1 (635nm). |
| % > B635 + 2 SD | the percentage of feature pixels with intensities more than two standard deviations above the background pixel intensity, at wavelength #1 (635nm). |
| F635 % Sat. | the percentage of feature pixels at wavelength #1 that are saturated. |
| F532 Median | median feature pixel intensity at wavelength #2 (532nm). |
| F532 Mean | mean feature pixel intensity at wavelength #2 (532nm). |
| F532 SD | the standard deviation of the feature intensity at wavelength #2 (532nm). |
| F532 CV | the coefficient of variation of feature pixel intensity. |
| B532 | the actual background value used for the feature in GenePix Pro calculations (as opposed to B532 Median, for example, which is the local median background.) This column is required because GenePix Pro 5.0 has global and negative control background subtraction methods. If you choose a non-local method, B532 is different to B532 Median |
| B532 Median | the median feature background intensity at wavelength #2 (532nm). |
| B532 Mean | the mean feature background intensity at wavelength #2 (532nm). |
| B532 SD | the standard deviation of the feature background intensity at wavelength #2 (532nm). |
| B532 CV | the coefficient of variation of background pixel intensity. |
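| % > B532 + 1 SD | the percentage of feature pixels with intensities more than one standard deviation above the background pixel intensity, at wavelength #2 (532nm). |
| % > B532 + 2 SD | the percentage of feature pixels with intensities more than two standard deviations above the background pixel intensity, at wavelength #2 (532nm). |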
| F532 % Sat. | the percentage of feature pixels at wavelength #2 that are saturated. |
| Ratio of Medians | the ratio of the median intensities of each feature for each wavelength, with the median background subtracted. |
| Ratio of Means | the ratio of the arithmetic mean intensities of each feature for each wavelength, with the median background subtracted. |
| Median of Ratios | the median of pixel-by-pixel ratios of pixel intensities, with the median background subtracted. |
| Mean of Ratios | the arithmetic mean of the pixel-by-pixel ratios of pixel intensities, with the median background subtracted. |
| Ratios SD | the standard deviation of pixel intensity ratios. |
| Rgn Ratio | the regression ratio. |
| Rgn R2 | the coefficient of determination for the current regression value. |
| F Pixels | the total number of feature pixels. |
| B Pixels | the total number of background pixels. |
| Circularity | a measure of circularity from 0 to 100, using a metric based on the variance of the distance of each boundary pixel to the centroid of the feature: 100 is most circular, 0 is most non-circular. Circular features always have a circularity of 100, square features always have a circularity of 79 (= π/4*100). |
| Sum of Medians | the sum of the median intensities for each wavelength, with the median background subtracted. |
| Sum of Means | the sum of the arithmetic mean intensities for each wavelength, with the median background subtracted. |
| Log Ratio | log (base 2) transform of the ratio of the medians. |
| F635 Median - B635 | the median feature pixel intensity at wavelength #1 with the median background subtracted. |
| F532 Median - B532 | the median feature pixel intensity at wavelength #2 with the median background subtracted. |
| F635 Mean - B635 | the mean feature pixel intensity at wavelength #1 with the median background subtracted. |
| F532 Mean - B532 | the mean feature pixel intensity at wavelength #2 with the median background subtracted. |
| F635 Total Intensity | the sum of all pixel intensities in the feature. |
| F532 Total Intensity | the sum of all pixel intensities in the feature. |
| SNR 635 | the signal-to-noise ratio of the feature, calculated as (F635 Mean - B635 Mean) / B635 SD. |
| SNR 532 | the signal-to-noise ratio of the feature, calculated as (F532 Mean - B532 Mean) / B532 SD. |
| Flags | the type of flag associated with a feature. |
| Normalize | flag column describing if the feature was used to calculate the normalization factors (1 for used, 0 for not used). |
| Autoflag | reports whether or not a feature has been flagged from the Flag Features dialog box. It applies to good and bad flags only. |
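If you prefer a script to Excel, a *.GPR file can be read as tab-delimited text. The sketch below is a starting point under stated assumptions rather than a complete reader: it assumes the Axon Text File layout, in which the second line gives the number of optional header records that precede the column titles, and the function names are hypothetical. It recomputes two of the columns described above, with the 532 nm signal-to-noise ratio formed from the 532 nm columns by analogy with SNR 635.

```python
import csv
import io
import math


def read_gpr(text):
    """Parse GPR file content into a list of dicts, one per feature."""
    lines = text.splitlines()
    # Assumed layout: line 1 is the ATF version, line 2 starts with the
    # number of optional header records, then the column titles and data.
    n_header_records = int(lines[1].split()[0])
    table = lines[2 + n_header_records:]
    reader = csv.DictReader(io.StringIO("\n".join(table)), delimiter="\t")
    return list(reader)


def snr_532(row):
    """(F532 Mean - B532 Mean) / B532 SD for one feature."""
    return (float(row["F532 Mean"]) - float(row["B532 Mean"])) / float(row["B532 SD"])


def log_ratio(row):
    """log2 of the ratio of background-subtracted medians, or None if undefined."""
    red = float(row["F635 Median - B635"])
    green = float(row["F532 Median - B532"])
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)
```

To use it on a real file, read the file into a string first, for example with `open(path, newline="").read()`, and pass the string to `read_gpr`.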
What are the result files for an Affymetrix array? Which one should I use for which analysis program?
Answer: For each array experiment that is performed using the Affymetrix GeneChip arrays, six files are generated (*.DAT, *.CEL, *.CHP, *.EXP, *.RPT, *.txt). The files are named identically except for the extension. The naming convention is as follows:
Project ID number_Genome ID number_chip type_sample name (for example, 0089_1212_HU95A_wt.*)
In the case of pairwise comparisons in the Affymetrix Microarray Suite v5.0, the comparison files use the following naming convention:
Sample_base_Sample_exp.txt
The types of extensions are as follows:
| Extension | Description | Comments |
| *.DAT | Scanned image of the GeneChip array | Can only be opened in Microarray Analysis Suite |
| *.CEL | Cell intensity file that calculates the average intensities for each cell and assigns it to an x,y coordinate position | Can be opened in Excel to manipulate |
| *.CHP | Contains analysis output | Can only be opened in Microarray Analysis Suite |
| *.EXP | Contains experimental information | Can only be opened in Microarray Analysis Suite |
| *.RPT | Contains quality control information about the chip | Can be opened in Excel to manipulate |
| *.txt | Contains analysis output | Can be opened in Excel to manipulate |
All data analyses will be given as *.txt files.
I'm using GeneSpring. Where can I get the genome file for spotted arrays?
Answer: Please go to our Spotted Array Page, where you can download GeneSpring genome files.



