Figure 2.16 Illustrating the steps that have to be taken to transform a microarray scan into a gene-expression data matrix. FR, fold regulation. [Flow diagram not reproduced; surviving labels: image file (TIFF, two different ones), data file (Excel).]

on the array that represent the same gene (this is always the case with oligonucleotide gene chips), the corrected spot intensity is averaged over all spots for the same gene.
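The averaging over replicate spots can be sketched as follows. This is a minimal illustration only: the function name, the gene identifiers, and the intensity values are hypothetical, and the input is assumed to be already background-corrected.

```python
from collections import defaultdict
from statistics import mean

def average_replicate_spots(spots):
    """Average corrected intensities over all spots that represent the same gene.

    `spots` is an iterable of (gene_id, corrected_intensity) pairs
    (a hypothetical input layout chosen for this sketch).
    """
    by_gene = defaultdict(list)
    for gene_id, intensity in spots:
        by_gene[gene_id].append(intensity)
    # One averaged value per gene, regardless of how many spots it has.
    return {gene_id: mean(values) for gene_id, values in by_gene.items()}

# Made-up example: geneA is represented by two spots, geneB by one.
spots = [("geneA", 1200.0), ("geneA", 1000.0), ("geneB", 300.0)]
averaged = average_replicate_spots(spots)
print(averaged)
```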

Estimating expression ratios. As explained in Section 2.3, transcription profiling with the use of microarrays is essentially relative. In spotted cDNA arrays a query sample is compared with a reference sample by competitive hybridization, and two signals, corresponding to two different fluorescent dyes, are read from each spot. When using gene chips, the reference and query samples are hybridized to different arrays, but in this case too, spots corresponding to the same probe are compared. The quotient of the two signals is defined as the expression ratio $T_i$ for each gene $i$:

$$T_i = \frac{Q_i}{R_i}$$

where $Q_i$ is the signal for gene $i$ in the query sample and $R_i$ is the signal for the same gene in the reference sample. In spotted cDNA microarrays, $Q$ derives from the Cy5 signal (red) and $R$ from the Cy3 signal (green), or vice versa. The expression ratios are always logarithmically transformed and generally the logarithm to the base 2 is applied. This transformation results in a quantity known as fold change or fold regulation (FR). As a result of the transformation, FR takes a value of 0 if there is no change, a value of 1 if there is a 2-fold increase, and a value of $-1$ if there is a 2-fold decrease in expression. A 4-fold increase results in FR = 2 and a 4-fold decrease in FR = $-2$. So:

$$\mathrm{FR}_i = \log_2 T_i = \log_2 Q_i - \log_2 R_i$$
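The relation between the two signals, the expression ratio, and FR can be checked with a few lines of code. The sketch below assumes strictly positive signals; the function names and the example intensities are illustrative and not taken from any particular software package.

```python
import math

def expression_ratio(q, r):
    """Expression ratio T = Q / R for one gene (signals must be positive)."""
    if q <= 0 or r <= 0:
        raise ValueError("signals must be positive to form a log ratio")
    return q / r

def fold_regulation(q, r):
    """FR = log2(Q / R): 0 for no change, +1 for 2-fold up, -1 for 2-fold down."""
    return math.log2(expression_ratio(q, r))

# Made-up (query, reference) signal pairs covering the cases named in the text.
examples = [(500.0, 500.0), (1000.0, 500.0), (250.0, 500.0),
            (2000.0, 500.0), (125.0, 500.0)]
for q, r in examples:
    print(f"Q={q:7.1f} R={r:6.1f} T={expression_ratio(q, r):5.2f} "
          f"FR={fold_regulation(q, r):+.1f}")
```

Because FR is symmetric around zero, a 2-fold increase and a 2-fold decrease have the same magnitude, which is not true of the untransformed ratios 2 and 0.5.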

It should be noted that the use of expression ratios has a disadvantage, namely that information about the actual signal intensity is lost. Genes that are expressed weakly in both the Q and R samples are thus treated in the same way as genes with an overall strong expression, if the relative up- or downregulation is the same. Whereas taking relative measures is common in transcription profiling, in other microarray applications, for example the detection of microbial diversity in the environment (see Chapter 4), absolute values are taken. Kerr and Churchill (2001) took a stand against the argument that transcription profiling is necessarily relative and they proposed that the two readings could be considered as two correlated measurements, as in a traditional incomplete block design, common in agricultural experimentation.

Normalization. For various reasons, the FR values obtained cannot be directly compared across replicate measurements or different experiments. The most common source of variation is the use of different amounts of RNA as the starting material from which the target sample was prepared. Another important source is unequal incorporation of dyes in the cDNAs. There are various strategies for normalization (Causton et al. 2003). One approach is to calculate the mean FR value of all probes on the array and to subtract this mean value from all other values (total intensity normalization). This ensures that the mean FR over the whole array is zero, that is, that the (geometric) mean expression ratio is unity. Another approach is to consider a regression of $\log Q_i$ against $\log R_i$. If the initial amount of RNA is exactly the same for the Q and R samples and if the labelling and detection efficiencies are also identical, such a plot would show a cluster of points around a straight line through the origin with slope 1 (of course, individual genes will lie apart from the line due to up- or downregulation). However, often the data do not fall exactly on such a straight line and show a curving trend (Fig. 2.17, left-hand panel). Since the interest lies in deviations from the diagonal, insight may be increased by rotating the plot by 45° and re-scaling the axes. This can be done by plotting $M = \log_2 Q - \log_2 R$ against $A = \log_2 Q + \log_2 R$, and in such a plot $M$ should be independent of $A$ (Fig. 2.17, right-hand panel). If this is not the case, one can correct the data by subtracting a quantity $c$, which depends on $A$ and is defined as the local deviation of the data from a horizontal line. This correction term is estimated for each value of $A$ by means of local weighted regression (loess; Smyth et al. 2003).
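The first two normalization strategies can be sketched in code. The example below is a simplified illustration, not the procedure of Smyth et al. (2003): the local fit is a plain tricube-weighted straight line without robustness iterations, the function names and the `frac` parameter are assumptions made for this sketch, and the signal values are invented. $A$ follows the definition given in the text (a sum of the two log signals); the mean of the two is also widely used and only rescales the axis.

```python
import math

def ma_values(q, r):
    """Return (M, A) as defined in the text: log2 Q - log2 R and log2 Q + log2 R."""
    m = [math.log2(qi) - math.log2(ri) for qi, ri in zip(q, r)]
    a = [math.log2(qi) + math.log2(ri) for qi, ri in zip(q, r)]
    return m, a

def mean_center(m):
    """Total intensity normalization: subtract the array-wide mean FR."""
    mu = sum(m) / len(m)
    return [mi - mu for mi in m]

def local_fit(a, m, a0, frac=0.5):
    """Estimate c(a0) from a tricube-weighted line through the nearest points."""
    k = min(len(a), max(2, math.ceil(frac * len(a))))
    h = sorted(abs(ai - a0) for ai in a)[k - 1]  # distance to k-th neighbour
    if h == 0.0:
        h = 1.0  # all neighbours share the same A; any positive width works
    sw = swx = swy = swxx = swxy = 0.0
    for ai, mi in zip(a, m):
        u = abs(ai - a0) / h
        if u >= 1.0:
            continue  # outside the local window, weight is zero
        w = (1.0 - u ** 3) ** 3
        x = ai - a0  # centre on a0 so the intercept is the fitted value
        sw += w
        swx += w * x
        swy += w * mi
        swxx += w * x * x
        swxy += w * x * mi
    if sw == 0.0:
        return sum(m) / len(m)  # degenerate window: fall back to the mean
    denom = sw * swxx - swx * swx
    if denom < 1e-12:
        return swy / sw  # no spread in A: use the weighted mean
    return (swxx * swy - swx * swxy) / denom  # intercept of the weighted fit

def loess_normalize(m, a, frac=0.5):
    """Subtract the locally estimated trend c(A) from every M value."""
    return [mi - local_fit(a, m, ai, frac) for ai, mi in zip(a, m)]

# Invented signals with an intensity-dependent bias in the query channel.
r = [2.0 ** e for e in (6, 7, 8, 9, 10, 11, 12, 13, 14)]
q = [ri * 2.0 ** (0.05 * (math.log2(ri) - 10) ** 2) for ri in r]
m, a = ma_values(q, r)
print([round(x, 2) for x in mean_center(m)])         # global shift only
print([round(x, 2) for x in loess_normalize(m, a)])  # trend in A removed
```

Mean-centring only shifts all values by a constant, so a curved trend in $M$ against $A$ survives it; the local fit is what removes the dependence on $A$.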

The third approach in data normalization is to use expression ratios of housekeeping genes as a basis for normalization. One can also use spiked controls that do not cross-hybridize with the

Figure 2.17 (a) Scatterplot of two signal intensities, $\log Q$ (from the query sample, e.g. fluorescence from Cy5) and $\log R$ (from the reference sample, e.g. Cy3) over all genes $i$ in a microarray expression analysis. Ideally one expects that the average of the data falls along a straight line with slope 1 through the origin. The slightly curved shape indicates intensity-dependent bias. (b) When plotted as $M = \log_2 Q - \log_2 R$ against $A = \log_2 Q + \log_2 R$, the bias is visualized more clearly. The data can be corrected by subtracting a term which depends on $A$ and is estimated by local weighted regression (loess). From Smyth et al. (2003) by permission of Humana Press.

target; for example photosynthesis genes from Arabidopsis on an insect microarray. These probes are then queried with a known amount of added RNA, so their signals provide a stable reference. Unlike in quantitative PCR, the use of housekeeping genes is not the preferred approach in the case of microarrays, because it ignores the multitude of information on the array (expression of only a few genes is used to correct for expression of thousands of others) and microarrays do not allow very precise measurement.

Data filtering. In addition to normalization it is also recommended to filter the data to remove dubious expression measurements. The most important data-filtering operation is to screen for low-intensity measurements that have a large inaccuracy. As a criterion, each fluorescence signal should be at least twice the standard deviation of the local background. Another issue in data filtering concerns the case where probes targeting the same transcript produce inconsistent results.
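The low-intensity criterion lends itself to a short sketch. The record layout and field names below are hypothetical, and requiring both the query and the reference signal to pass is one reading of "each fluorescence signal"; the values are invented.

```python
def passes_filter(signal, background_sd, factor=2.0):
    """True if the signal is at least `factor` times the local background SD."""
    return signal >= factor * background_sd

def filter_spots(spots, factor=2.0):
    """Keep only spots whose query and reference signals both pass the filter.

    Each spot is a dict with the (hypothetical) keys
    'gene', 'q', 'q_bg_sd', 'r' and 'r_bg_sd'.
    """
    kept = []
    for spot in spots:
        q_ok = passes_filter(spot["q"], spot["q_bg_sd"], factor)
        r_ok = passes_filter(spot["r"], spot["r_bg_sd"], factor)
        if q_ok and r_ok:
            kept.append(spot)
    return kept

# Made-up spots: geneB has a query signal below twice its background SD.
spots = [
    {"gene": "geneA", "q": 900.0, "q_bg_sd": 40.0, "r": 450.0, "r_bg_sd": 35.0},
    {"gene": "geneB", "q": 60.0, "q_bg_sd": 40.0, "r": 300.0, "r_bg_sd": 35.0},
]
kept = filter_spots(spots)
print([spot["gene"] for spot in kept])
```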

2.4.3 Statistical analysis of microarray experiments

The result of a single microarray experiment is a data file that can be viewed as a matrix with one very long column in which the FR values of all genes are noted. Usually one experiment involves several samples and these are taken together in one gene-expression matrix with a number of columns; for example different points in time, different physiological states of the organism, or different environments from which the RNA was isolated (Fig. 2.18).
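How single-column results combine into a gene-expression matrix can be illustrated as follows. This is only a sketch: the sample names, gene names, and FR values are invented, and marking a gene that is missing from a sample with `None` is an assumption of this example rather than a convention stated in the text.

```python
def build_expression_matrix(experiments):
    """Combine per-sample FR columns into one gene-by-sample matrix.

    `experiments` maps a sample name to a dict of gene -> FR value.
    Returns (genes, samples, matrix) with matrix[i][j] holding the FR of
    genes[i] in samples[j], or None if that gene was not measured there.
    """
    samples = list(experiments)
    genes = sorted({gene for column in experiments.values() for gene in column})
    matrix = [[experiments[s].get(g) for s in samples] for g in genes]
    return genes, samples, matrix

# Made-up time series with two samples (columns).
experiments = {
    "t0": {"geneA": 0.0, "geneB": 1.0},
    "t1": {"geneA": -1.0, "geneC": 2.0},
}
genes, samples, matrix = build_expression_matrix(experiments)
for gene, row in zip(genes, matrix):
    print(gene, row)
```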

Because gene-expression matrices are valuable resources for statistical analysis and a single investigator is often not able to exploit all possible data-analysis techniques, the data are often published on the Internet. This allows other researchers to compare expression profiles across studies, in much the same way as a genomic database is consulted by different people. Brazma et al. (2001) considered the requirements that such data matrices should meet in order to be valuable for the research community. They developed a standard known as minimum information about a microarray experiment (MIAME). This standard stipulates that publication of gene-expression matrices should be accompanied by details about:

• Experimental design (conditions, doses, replication, quality-control measures, etc.)

[Figure not reproduced; surviving labels: Samples, Sample annotation.]
