GLR: a statistical analysis program to identify differentially expressed genes from microarray data
GLR is a statistical program developed by Song Wang to identify differentially expressed genes from microarray data. The program is written in Microsoft Visual Basic and runs on any PC with a Windows operating system.
GLR implements a generalized likelihood ratio test based on the two-component model proposed by Rocke and Durbin (2001).
Download: Users can download the package as a zip file, GLR.zip, and unzip it on their computers.
Installation: Double-click "Setup.exe" to start the installation and follow the on-screen instructions. An error message about file registration sometimes appears; it can be ignored. To start the GLR program, click the START button on the Windows taskbar and select PROGRAM > GLR > GLR.
Use the program: Users should have the data file on their computer and open it by clicking the "open" button and selecting the file in the dialog window. Users should also specify the numbers of replicates in the control and experimental samples. Analysis begins when users click the "Analyze" button. The rank, gene name and GLR test statistic −2 log λ for each gene are displayed in the text box in descending order of the statistic. The results can be saved to a plain text or Rich Text file. The running time depends on the number of genes in the data set, the sample size and the CPU speed of the computer; for example, it takes 7 seconds per gene with a sample size of 8 on a Pentium III 1 GHz CPU.
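To make the output and the timing figure concrete, here is a minimal Python sketch (not GLR's Visual Basic code) that ranks genes by their statistic in descending order, formats rank, gene name and statistic as tab-delimited text, and scales the quoted 7 seconds per gene to a larger data set. The function names and the statistic values are hypothetical, and the timing estimate assumes the per-gene cost stays constant, which only holds for the quoted sample size and CPU.

```python
def rank_results(stats):
    """Sort {gene: statistic} in descending order and number the rows from 1."""
    ranked = sorted(stats.items(), key=lambda item: item[1], reverse=True)
    return [(rank, gene, value) for rank, (gene, value) in enumerate(ranked, start=1)]


def to_text(rows):
    """Tab-delimited lines: rank, gene name, statistic."""
    return "\n".join(f"{rank}\t{gene}\t{value:.4f}" for rank, gene, value in rows)


def estimated_hours(n_genes, seconds_per_gene=7.0):
    """Scale the quoted 7 s/gene timing (sample size 8, Pentium III 1 GHz) to n genes."""
    return n_genes * seconds_per_gene / 3600.0


# Hypothetical -2 log(lambda) values, for illustration only.
stats = {"hdeB": 1.2, "sanA": 5.6, "yhaS": 0.3}
print(to_text(rank_results(stats)))
print(f"{estimated_hours(10_000):.1f} h for 10,000 genes")
```

At 7 seconds per gene, a 10,000-gene data set would take about 19.4 hours under the same conditions.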
Data File Format: The expected file format is a tab-delimited plain text file. Each line in the file holds the expression data of one gene: the gene name followed by the background-subtracted expression levels for the control and experimental samples. Column headings and extra blank lines at the end of the file should be removed. Missing data or data that are below background should be coded as '0'. The total number of replicates should be less than or equal to 50, and the total number of genes should be less than 40,000.
Here's an example file:
hdeB 1.13E-03 1.33E-03 9.25E-04 9.86E-04
sanA 6.38E-04 0 3.43E-05 5.64E-05
yhaS 3.96E-05 3.26E-05 4.08E-05 4.24E-04
yeiL 1.24E-03 5.05E-05 0 3.44E-05
nuoJ 4.45E-04 5.53E-05 7.12E-05 6.17E-05
ycfC 0 2.61E-04 2.61E-04 2.65E-04
The first two columns of data are from control samples and the last two columns of data are from experimental samples.
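The format rules above can be checked before running GLR. The following Python sketch parses such a file into (gene, control values, experimental values), assuming, as in the example, that the first `n_control` value columns are control samples and the rest are experimental. `parse_glr_data` is a hypothetical helper, not part of the GLR package; unlike GLR, it silently skips blank lines.

```python
from typing import List, Tuple

MAX_REPLICATES = 50   # total replicates must be <= 50
MAX_GENES = 40_000    # number of genes must be < 40,000

Row = Tuple[str, List[float], List[float]]


def parse_glr_data(text: str, n_control: int, n_experimental: int) -> List[Row]:
    """Parse tab-delimited GLR input into (gene, control values, experimental values)."""
    n_total = n_control + n_experimental
    if n_total > MAX_REPLICATES:
        raise ValueError("total number of replicates must be <= 50")
    rows: List[Row] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # this sketch skips blank lines; GLR expects them removed
        fields = line.split("\t")
        if len(fields) != n_total + 1:
            raise ValueError(f"line {line_no}: expected {n_total + 1} fields, got {len(fields)}")
        # '0' codes missing or below-background data; it is kept as 0.0 here.
        values = [float(v) for v in fields[1:]]
        rows.append((fields[0], values[:n_control], values[n_control:]))
    if len(rows) >= MAX_GENES:
        raise ValueError("number of genes must be less than 40,000")
    return rows
```

For the example file, `parse_glr_data(text, 2, 2)` returns `("sanA", [6.38e-04, 0.0], [3.43e-05, 5.64e-05])` as its second row.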
The IHF data set and the permutation data sets used in our paper can be found in the data folder.
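The GLR test is based on the two-component model proposed by Rocke and Durbin:

$$y = \alpha + \mu e^{\eta} + \varepsilon,$$

where $y$ is the measured expression level, $\alpha$ is the mean background level, $\mu$ is the true expression level, and $\eta \sim N(0, \sigma_\eta^2)$ and $\varepsilon \sim N(0, \sigma_\varepsilon^2)$ are independent error terms.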
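According to the model, $y \approx \alpha + \varepsilon \sim N(\alpha, \sigma_\varepsilon^2)$ when $\mu \approx 0$. Therefore, $\alpha$ and $\sigma_\varepsilon^2$ can be estimated from the data of genes with the lowest expression intensity.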
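When $\mu$ is large, on the other hand, $y \approx \mu e^{\eta}$, $\log y \approx \log \mu + \eta$, and $\operatorname{Var}(\log y) \approx \sigma_\eta^2$. Therefore, $\sigma_\eta^2$ can be estimated from the data of genes with the highest expression intensities.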
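The two limiting cases can be checked by simulation. This Python sketch draws replicates from $y = \alpha + \mu e^{\eta} + \varepsilon$, estimates $\alpha$ and $\sigma_\varepsilon$ from genes with $\mu = 0$, and estimates $\sigma_\eta$ from the within-gene spread of $\log y$ for genes with a large $\mu$. All parameter values are hypothetical, and pooling within-gene variances of $\log y$ is one simple choice of estimator, not necessarily the one GLR uses.

```python
import math
import random
import statistics


def simulate_gene(mu, n, alpha, sigma_eps, sigma_eta, rng):
    """Draw n replicates of y = alpha + mu*exp(eta) + eps."""
    return [alpha + mu * math.exp(rng.gauss(0.0, sigma_eta)) + rng.gauss(0.0, sigma_eps)
            for _ in range(n)]


def estimate_low_end(low_genes):
    """alpha and sigma_eps from genes with mu ~ 0, where y ~ N(alpha, sigma_eps^2)."""
    pooled = [y for gene in low_genes for y in gene]
    return statistics.fmean(pooled), statistics.stdev(pooled)


def estimate_high_end(high_genes):
    """sigma_eta from genes with large mu, where log y ~ N(log mu, sigma_eta^2)."""
    variances = [statistics.variance([math.log(y) for y in gene]) for gene in high_genes]
    return math.sqrt(statistics.fmean(variances))


rng = random.Random(1)
# Hypothetical truth: alpha = 100, sigma_eps = 10, sigma_eta = 0.2, 8 replicates per gene.
low = [simulate_gene(0.0, 8, 100.0, 10.0, 0.2, rng) for _ in range(500)]
high = [simulate_gene(1e5, 8, 100.0, 10.0, 0.2, rng) for _ in range(500)]
alpha_hat, sigma_eps_hat = estimate_low_end(low)
sigma_eta_hat = estimate_high_end(high)
print(alpha_hat, sigma_eps_hat, sigma_eta_hat)
```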
To simplify the analysis, we transform the data: $z = y - \alpha$. Then $z = \mu e^{\eta} + \varepsilon$ is the sum of a normally distributed random variable ($\varepsilon$) and a lognormally distributed random variable ($\mu e^{\eta}$). The convolution formula can be used to determine the density function of $z$.
Convolution formula: For a sum $S = X_1 + X_2$, where $X_1$ and $X_2$ are independent random variables,

$$f_S(s) = \int_{-\infty}^{\infty} f_{X_1}(x)\, f_{X_2}(s - x)\, dx.$$
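As a quick numerical check of the convolution formula, the sum of two independent standard normal variables is $N(0, 2)$, so evaluating the integral at a point should reproduce that density. This Python sketch approximates the integral with the midpoint rule over $[-10, 10]$; the grid and the evaluation point are arbitrary illustrative choices.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def convolve_density(f1, f2, s, lo=-10.0, hi=10.0, n=20_000):
    """Approximate f_S(s) = integral of f1(x) * f2(s - x) dx with the midpoint rule."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += f1(x) * f2(s - x)
    return total * h


s = 0.7
approx = convolve_density(normal_pdf, normal_pdf, s)
exact = normal_pdf(s, 0.0, math.sqrt(2.0))  # N(0,1) + N(0,1) = N(0, 2)
print(approx, exact)
```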
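So the density function of $z$ is:

$$f(z \mid \mu) = \int_{0}^{\infty} f_\varepsilon(z - w)\, f_{\mu e^{\eta}}(w)\, dw = \int_{-\infty}^{\infty} \frac{1}{\sigma_\varepsilon}\,\phi\!\left(\frac{z - e^{x}}{\sigma_\varepsilon}\right) \frac{1}{\sigma_\eta}\,\phi\!\left(\frac{x - \log\mu}{\sigma_\eta}\right) dx,$$

where $\phi$ is the standard normal density and the second form uses the substitution $w = e^{x}$, $x = \log\mu + \eta$.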
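The density of $z$ can be evaluated numerically in the log-scale form, integrating over $x = \log\mu + \eta$ on $[\log\mu - 3, \log\mu + 3]$ as in the integration settings given for GLR. For brevity this Python sketch uses the plain midpoint rule rather than Boole's rule, and the parameter values are hypothetical. As a sanity check, the density should integrate to about 1 over $z$ when $\sigma_\eta$ is small enough that the $\pm 3$ window covers almost all of the lognormal mass.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def density_z(z, mu, sigma_eps, sigma_eta, width=3.0, n=2000):
    """f(z | mu) for z = mu*exp(eta) + eps (requires mu > 0).

    Integrates over x = log(mu) + eta on [log mu - width, log mu + width]
    with the midpoint rule; GLR itself uses Boole's rule.
    """
    log_mu = math.log(mu)
    a = log_mu - width
    h = 2.0 * width / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += normal_pdf(z - math.exp(x), 0.0, sigma_eps) * normal_pdf(x, log_mu, sigma_eta)
    return total * h


# Riemann sum over z with step 5, for hypothetical mu = 100, sigma_eps = 10, sigma_eta = 0.5.
mass = sum(density_z(z, 100.0, 10.0, 0.5) * 5.0 for z in range(-100, 2300, 5))
print(mass)
```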
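The GLR test statistic for the hypothesis $H_0: \mu_c = \mu_e$ (equal expression in the control and experimental samples) is defined by:

$$-2\log\lambda = -2\left[\ell(\hat\mu_0) - \ell_c(\hat\mu_c) - \ell_e(\hat\mu_e)\right],$$

where $\ell_c(\mu) = \sum_{i \in \text{control}} \log f(z_i \mid \mu)$, $\ell_e(\mu)$ is defined likewise over the experimental samples, $\ell(\mu) = \ell_c(\mu) + \ell_e(\mu)$, $\hat\mu_c$ and $\hat\mu_e$ are the separate maximum likelihood estimates, and $\hat\mu_0$ is the maximum likelihood estimate under $H_0$.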
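The structure of the statistic can be sketched generically: maximize the log-likelihood once on the pooled data (under $H_0$) and once per group, then take $-2$ times the difference. To keep the example self-contained and checkable, this Python sketch plugs in a stand-in model, a normal distribution with known variance, whose maximum likelihood estimate is the sample mean; it is not the two-component density. For that stand-in the statistic has the closed form $\frac{n_c n_e}{n_c + n_e}(\bar z_c - \bar z_e)^2 / \sigma^2$, which the test compares against.

```python
import math
from statistics import fmean


def glr_statistic(control, experimental, loglik, mle):
    """-2 log(lambda) for H0: mu_c = mu_e, given a log-likelihood and its maximizer."""
    pooled = list(control) + list(experimental)
    l0 = loglik(mle(pooled), pooled)  # one common mu under H0
    l1 = loglik(mle(control), control) + loglik(mle(experimental), experimental)
    return -2.0 * (l0 - l1)


SIGMA = 1.0  # stand-in model: normal with known sd, NOT the two-component density


def normal_loglik(mu, xs):
    """Log-likelihood of xs under N(mu, SIGMA^2)."""
    return sum(-0.5 * math.log(2.0 * math.pi * SIGMA ** 2) - (x - mu) ** 2 / (2.0 * SIGMA ** 2)
               for x in xs)


c = [1.0, 1.2, 0.8, 1.1]  # hypothetical control replicates
e = [2.0, 2.3, 1.9, 2.2]  # hypothetical experimental replicates
stat = glr_statistic(c, e, normal_loglik, fmean)
print(stat)
```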
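For numerical integration, we use Boole's rule:

$$\int_{x_1}^{x_n} g(x)\,dx \approx \frac{2h}{45}\sum_{j=0}^{k/4-1}\left[7g(x_{4j+1}) + 32g(x_{4j+2}) + 12g(x_{4j+3}) + 32g(x_{4j+4}) + 7g(x_{4j+5})\right], \qquad h = \frac{x_n - x_1}{k},$$

with $k = 2000$, $x_1 = \log\mu - 3$ and $x_n = \log\mu + 3$. This produces accurate results.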
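A direct Python transcription of composite Boole's rule, assuming $k$ is the number of equal subintervals (so it must be a multiple of 4). Boole's rule is exact for polynomials up to degree 5, which gives an easy check: $\int_0^1 x^4\,dx = 0.2$ is reproduced even with $k = 4$. With $k = 2000$, $\int_0^\pi \sin x\,dx = 2$ is matched to near machine precision. In the GLR setting the call would be `boole(g, math.log(mu) - 3, math.log(mu) + 3, k=2000)` for the integrand `g` of the density of $z$.

```python
import math


def boole(f, a, b, k=2000):
    """Composite Boole's rule for the integral of f over [a, b] with k subintervals."""
    if k % 4:
        raise ValueError("k must be a multiple of 4")
    h = (b - a) / k
    total = 0.0
    for i in range(0, k, 4):  # one panel of 4 subintervals (5 nodes) per step
        x = a + i * h
        total += (7 * f(x) + 32 * f(x + h) + 12 * f(x + 2 * h)
                  + 32 * f(x + 3 * h) + 7 * f(x + 4 * h))
    return 2 * h / 45 * total


print(boole(lambda x: x ** 4, 0.0, 1.0, k=4))
print(boole(math.sin, 0.0, math.pi))
```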
To find the maximum likelihood estimator $\hat\mu$ of $\mu$, we solve the equation $\dfrac{\partial \ell(\mu)}{\partial \mu} = 0$ numerically using the bisection method.
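The bisection step can be sketched as follows: approximate the score $\partial\ell/\partial\mu$ with a central difference and bisect on a bracket where it changes sign. To keep the check self-contained, this Python sketch uses a stand-in exponential model with mean $\theta$, whose maximum likelihood estimate is known to be the sample mean; in GLR the log-likelihood would instead come from the numerically integrated density of $z$. The bracket, step size and tolerance are illustrative assumptions.

```python
import math


def bisect_root(g, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of g in [lo, hi]; g(lo) and g(hi) must have opposite signs."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0 or hi - lo < tol:
            return mid
        if g_lo * g_mid < 0:
            hi = mid  # sign change lies in the left half
        else:
            lo, g_lo = mid, g_mid  # sign change lies in the right half
    return 0.5 * (lo + hi)


def mle_by_bisection(loglik, data, lo, hi):
    """Solve d loglik / d mu = 0 using a central-difference score and bisection."""
    def score(mu):
        d = 1e-6 * max(1.0, abs(mu))
        return (loglik(mu + d, data) - loglik(mu - d, data)) / (2.0 * d)
    return bisect_root(score, lo, hi)


def exp_loglik(theta, xs):
    """Stand-in model: exponential with mean theta; its MLE is the sample mean."""
    return sum(-math.log(theta) - x / theta for x in xs)


xs = [1.5, 2.0, 2.5, 3.0, 1.0]  # hypothetical data, sample mean 2.0
theta_hat = mle_by_bisection(exp_loglik, xs, 0.01, 100.0)
print(theta_hat)
```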