Supported Analysis Software

Profiler
Work within the Center for Applied Genomics and Technology has focused on developing statistical methods for supervised analysis that classify and predict breast cancer outcomes from gene expression data. One of these, Profiler, was developed to identify gene expression profiles that correlate with a phenotype of interest. The selected group of genes is then used in a binary regression analysis to identify gene expression patterns, expressed as principal components, that represent underlying structure in the data. The goal is to identify the patterns of gene expression that correlate most highly with, and define, the cellular state of interest. Programming work in the Duke center has focused on a graphical user interface that gives investigators access to the program. Two files are loaded into Profiler: a tab-delimited text file of the raw expression values and a text file of the gene names. All samples are normalized within the program, and a binary regression model is fit by testing numerous genes that define the principal components used to predict the phenotype of interest.

Tree Profiler
Tree Profiler uses classification and regression tree methods for binary classification. One approach that has proved useful in a number of studies, in cancer and in other contexts, is to use multiple metagene summaries as predictors of a phenotype. A metagene is a gene expression signature that represents a pattern of co-expression, generated by an initial clustering of the expression data. The classification tree strategy provides a mechanism for sampling many sources of data to predict a phenotype, such as ER status in breast tumors. The advantage of this approach is its ability to combine multiple forms of data: multiple metagenes (clusters), other genomic data such as DNA methylation patterns or DNA copy number patterns, protein profiles, and other biological and clinical data.
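
The idea of predicting a phenotype from a metagene can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Tree Profiler's implementation: the metagene is taken to be the plain average of the genes in a cluster, the "tree" is a single split (a one-level classification tree), and the function names, gene names, expression values and ER labels are all made up for the example.

```python
def metagene(expression, cluster):
    """Summarize one cluster of co-expressed genes as a per-sample mean.

    Assumption: a plain average stands in for whatever summary the real
    program computes.
    """
    n_samples = len(expression[cluster[0]])
    return [sum(expression[g][s] for g in cluster) / len(cluster)
            for s in range(n_samples)]


def best_split(values, labels):
    """Find the single threshold on one predictor with the fewest errors.

    Returns (errors, threshold, high_label), where high_label is the class
    predicted for samples whose value exceeds the threshold.
    """
    ordered = sorted(set(values))
    best = None
    # Candidate thresholds are midpoints between consecutive observed values.
    for lo, hi in zip(ordered, ordered[1:]):
        threshold = (lo + hi) / 2
        for high_label in (0, 1):
            errors = sum(
                (high_label if v > threshold else 1 - high_label) != y
                for v, y in zip(values, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, high_label)
    return best


# Hypothetical toy data: three genes measured in four tumors.
expression = {
    "gene_a": [1.0, 1.2, 3.0, 3.4],
    "gene_b": [0.8, 1.0, 3.2, 3.0],
    "gene_c": [2.0, 2.1, 1.9, 2.0],
}
er_status = [0, 0, 1, 1]  # 1 = ER positive (made-up labels)

summary = metagene(expression, ["gene_a", "gene_b"])
errors, threshold, high_label = best_split(summary, er_status)
print(errors, threshold, high_label)
```

A full classification tree would apply the same split search recursively to each resulting group and could choose among several predictors (other metagenes, copy number, clinical variables) at every node.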

Duke Integrated Genomics (DIG) Annotation System
A web-based data management and information system for retrieval of a variety of functional information sources linked to the genes included on most microarrays utilized within the Duke Microarray Center. The system also provides access to a powerful method for literature searching.

GATHER
GATHER (Gene Annotation Tool to Help Explain Relationships) is a computational tool that analyzes lists of genes identified in high-throughput experiments. It identifies significant Gene Ontology functions, biological pathways, interacting proteins, microRNA regulation, transcription factor regulation, and other biological systems to give deeper insight into the biology underlying a gene signature. It can also infer novel functions; in an evaluation over a broad range of gene groups, it successfully predicted 90% of the functions.
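
To give a feel for what "significant" annotation means for a gene list, here is a minimal over-representation sketch. It uses a one-sided hypergeometric test, a common choice for this kind of analysis; it is a stand-in, and no claim is made that GATHER scores annotations this way. The function names, gene identifiers and annotation sets are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enrich(gene_list, annotations, background):
    """Rank annotation terms by over-representation in gene_list.

    annotations: dict mapping a term to the set of genes that carry it.
    Returns (term, overlap, p_value) tuples, smallest p-value first.
    """
    universe = set(background)
    genes = set(gene_list) & universe
    results = []
    for term, members in annotations.items():
        members = members & universe
        overlap = len(genes & members)
        if overlap == 0:
            continue  # nothing to report for terms with no hits
        p = hypergeom_pvalue(overlap, len(genes), len(members), len(universe))
        results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 20 background genes and two annotation terms.
background = [f"g{i}" for i in range(20)]
annotations = {
    "cell cycle": {"g0", "g1", "g2", "g3", "g4"},
    "apoptosis": {"g5", "g6", "g7", "g8", "g9"},
}
signature = ["g0", "g1", "g2", "g10"]

ranked = enrich(signature, annotations, background)
for term, overlap, p in ranked:
    print(f"{term}: {overlap} genes, p = {p:.4f}")
```

In practice the p-values would also need correcting for the number of terms tested.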

ChipComparer
This program identifies the gene sets shared by different microarrays. It first maps each probeset ID on the two selected microarray chips (A and B) to the corresponding LocusID using the LocusLink and UniGene databases. It then reports the probeset ID pairs (one from A, one from B) that refer to the same gene locus (if the chips are for the same organism) or to orthologs (if the chips are for different organisms, using NCBI HomoloGene). To access ChipComparer, go to this web site.
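
The matching step described above amounts to a join on locus ID. The sketch below assumes the probeset-to-locus mappings have already been looked up and are available as dictionaries; the function name, the probeset IDs and the tables are invented for illustration and do not reflect ChipComparer's actual code or data.

```python
from collections import defaultdict


def common_probesets(chip_a, chip_b, orthologs=None):
    """Pair probesets from two chips that point at the same gene.

    chip_a, chip_b: dict mapping probeset ID -> locus ID.
    orthologs: optional dict mapping a locus ID in organism A to its
        ortholog's locus ID in organism B; omit it for same-organism chips.
    Returns sorted (probeset_a, probeset_b, locus_a) tuples.
    """
    # Index chip B by locus so each lookup is a dictionary access.
    by_locus_b = defaultdict(list)
    for probe, locus in chip_b.items():
        by_locus_b[locus].append(probe)

    pairs = []
    for probe_a, locus_a in chip_a.items():
        target = orthologs.get(locus_a) if orthologs is not None else locus_a
        for probe_b in by_locus_b.get(target, []):
            pairs.append((probe_a, probe_b, locus_a))
    return sorted(pairs)


# Hypothetical probeset IDs and locus IDs.
chip_a = {"A_1": 672, "A_2": 7157, "A_3": 1956}
chip_b = {"B_4": 672, "B_5": 672, "B_9": 7157}

same_organism = common_probesets(chip_a, chip_b)
print(same_organism)

# Cross-species case: a made-up ortholog table bridges the two ID spaces.
chip_c = {"C_1": 12189}
cross_species = common_probesets(chip_a, chip_c, orthologs={672: 12189})
print(cross_species)
```

Several probesets on one chip can map to the same locus, so the result is a list of pairs, not a one-to-one mapping.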

File Merger
This program merges the contents of the Source and Target files according to their shared identifiers or, when the identifiers differ, according to the correspondence defined in the Bridging file.
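
A merge of this kind can be sketched as follows. The file layout (tab-delimited, header row, identifier in the first column), the choice to keep only rows found in both files, and the function names are assumptions made for the example; File Merger's own formats and options may differ.

```python
import csv
import io


def read_table(text):
    """Parse tab-delimited text into (header, {identifier: other columns})."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    return header, {row[0]: row[1:] for row in body}


def merge(source_text, target_text, bridge=None):
    """Join Source and Target rows on the first column.

    bridge: optional dict mapping a Source identifier to the matching
        Target identifier, used when the two files name rows differently.
    Only rows present in both files are kept (an inner join, by assumption).
    """
    src_header, src = read_table(source_text)
    tgt_header, tgt = read_table(target_text)
    merged = [src_header + tgt_header[1:]]
    for key, values in src.items():
        tgt_key = bridge.get(key) if bridge is not None else key
        if tgt_key in tgt:
            merged.append([key] + values + tgt[tgt_key])
    return merged


# Hypothetical file contents.
source = "id\texpr\ng1\t2.5\ng2\t1.0\n"
target = "id\tsymbol\ng1\tBRCA1\ng3\tTP53\n"

by_shared_id = merge(source, target)
by_bridge = merge(source, target, bridge={"g2": "g3"})
print(by_shared_id)
print(by_bridge)
```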

DMD
DMD (Duke Microarray Database) is a local implementation of the Stanford Microarray Database (SMD), a standard web-enabled microarray analysis solution used by several academic research groups across the country. DMD gives users of spotted array data access to numerous data visualization tools, including slide viewing, expression histograms, and other plotting tools, as well as inter-experiment gene filtering, self-organizing map partitioning, and hierarchical clustering. DMD also provides the raw GenePix data files for users who prefer to analyze their data elsewhere. To use this resource, every user MUST create an IGSPnet account (which also serves as the DMD account). Support for this resource is limited to the information provided in the Stanford Microarray Database help section on the web site. For more information on how to use DMD, please go to this web site for tutorials.

RMAExpress
RMAExpress is a standalone program that computes gene expression summary values for Affymetrix GeneChip® data using the Robust Multichip Average (RMA) expression summary. It does not require R, nor does it depend on any component of the BioConductor project. RMA consists of three steps: background adjustment, quantile normalization (see the Bolstad et al. reference), and summarization. Published references for the RMA methodology and a user manual are available on the web site. Click here to go to the web site for more information and to download the software. You will need the CEL files and the array CDF file. You can obtain the CDF file from the Affymetrix web site: select the array of choice, find the section called Library files on the right-hand side, and download the zip file, which contains the CDF file.
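
Of the three RMA steps, quantile normalization is the easiest to show in isolation. The sketch below forces every array to share the same distribution of values by replacing each value with the mean of the values holding the same rank across arrays. It illustrates the general technique only; it is not RMAExpress code, it skips background adjustment and summarization, and the function name and intensities are hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length arrays (one list per chip).

    Ties within an array are broken by position, a simplification that a
    production implementation would handle more carefully.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the i-th smallest value across chips.
    reference = [sum(col[i] for col in sorted_cols) / len(columns)
                 for i in range(n)]

    normalized = []
    for col in columns:
        # order[rank] is the position of the value with that rank.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical intensities for three probes on two chips.
chip_1 = [5.0, 2.0, 3.0]
chip_2 = [4.0, 1.0, 6.0]

norm_1, norm_2 = quantile_normalize([chip_1, chip_2])
print(norm_1)  # [5.5, 1.5, 3.5]
print(norm_2)  # [3.5, 1.5, 5.5]
```

After normalization both chips contain exactly the same set of values, and each probe keeps the rank it had on its own chip.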

SAM
SAM (Significance Analysis of Microarrays) was developed at Stanford. It identifies genes with statistically significant changes in expression by assimilating a set of gene-specific t tests (click here for more information about SAM).
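
The gene-specific statistic behind this kind of analysis can be sketched as a t-like score with a small constant added to the denominator, so that genes with very low variance do not dominate the ranking. The sketch covers only that score for a two-class comparison. SAM itself also estimates the constant from the data and assesses significance by permutation, neither of which is shown. The function name, the parameter name `s0` as used here and the data are assumptions for the example.

```python
from math import sqrt


def relative_difference(group1, group2, s0=0.0):
    """Two-class score: difference in means over (pooled std. error + s0).

    With s0 = 0 this reduces to the ordinary equal-variance t statistic.
    A positive s0 damps the score of genes whose variance is near zero.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = sqrt((1 / n1 + 1 / n2) * ss / (n1 + n2 - 2))
    return (m1 - m2) / (s + s0)


# Hypothetical expression values for one gene in two conditions.
control = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]

plain_t = relative_difference(control, treated)
damped = relative_difference(control, treated, s0=0.5)
print(round(plain_t, 3), round(damped, 3))
```

Computing this score for every gene and comparing the ranked scores against those obtained after shuffling the sample labels is what turns the individual tests into a genome-wide analysis.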

PAM
PAM (Prediction Analysis for Microarrays) was developed at Stanford. It provides class prediction and survival analysis for genomic expression data mining. It performs sample classification from gene expression data via the "nearest shrunken centroid" method of Tibshirani, Hastie, Narasimhan and Chu (2002), "Diagnosis of multiple cancer types by shrunken centroids of gene expression," PNAS 99:6567-6572 (May 14, 2002). For survival outcomes, it implements the "supervised principal components" method; see the papers "Semi-supervised methods for predicting patient survival from gene expression" (Bair and Tibshirani, PLOS Biology) and "Prediction by supervised principal components" (Bair, Hastie, Paul and Tibshirani, Stanford technical report). Version 2.0 (Mar 7, 2005) features survival analysis via supervised principal components. PAM estimates prediction error via cross-validation and provides a list of significant genes whose expression characterizes each diagnostic class. It works with data from both cDNA and oligo microarrays and can also be applied to protein expression data and SNP chip data.
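
The reason a shrunken-centroid classifier yields a short gene list is the soft-thresholding step: each gene's standardized difference between a class centroid and the overall centroid is pulled toward zero by a fixed amount, and genes whose differences all reach zero no longer influence the classification. The sketch below shows only that step. It assumes the standardized differences have already been computed; the function names, gene names and numbers are hypothetical, and this is not PAM's code.

```python
def soft_threshold(d, delta):
    """Shrink d toward zero by delta; values within delta of zero become 0."""
    magnitude = abs(d) - delta
    if magnitude <= 0:
        return 0.0
    return magnitude if d > 0 else -magnitude


def surviving_genes(differences, delta):
    """Apply soft thresholding and keep genes with any nonzero difference.

    differences: dict mapping a gene to its list of standardized
        class-centroid differences, one entry per class.
    """
    shrunken = {gene: [soft_threshold(d, delta) for d in ds]
                for gene, ds in differences.items()}
    return {gene: ds for gene, ds in shrunken.items()
            if any(v != 0.0 for v in ds)}


# Hypothetical standardized differences for three genes and two classes.
differences = {
    "gene_a": [2.5, -2.5],
    "gene_b": [0.3, -0.3],
    "gene_c": [1.2, -0.4],
}

kept = surviving_genes(differences, delta=1.0)
print(sorted(kept))
```

The amount of shrinkage is a tuning parameter; choosing it by cross-validated prediction error is the role of the cross-validation mentioned above.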

Cluster
Cluster is a program developed by Michael Eisen for the analysis of gene expression data using hierarchical clustering, self-organizing maps (SOMs), k-means clustering, and principal component analysis. The hierarchical clustering methods are described in Eisen et al. (1998) PNAS 95:14863.
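
For readers new to hierarchical clustering, the sketch below builds a tree bottom-up: it starts with every gene in its own cluster and repeatedly joins the two closest clusters. It assumes a distance of one minus the Pearson correlation and pairwise average linkage; whether these match the options chosen in Cluster for a given analysis is not stated here. The function names and expression profiles are made up, and the brute-force search is suitable only for small examples.

```python
from math import sqrt


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for identical shapes, 2 for opposite."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 1.0 - cov / sqrt(vx * vy)


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (left, right, distance)."""
    clusters = [(name,) for name in sorted(profiles)]
    merges = []

    def dist(c1, c2):
        # Average of all pairwise distances between members of c1 and c2.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(pearson_distance(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        d, i, j = min(
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        joined = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(joined)
    return merges


# Hypothetical expression profiles across four conditions.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
}

merges = average_linkage(profiles)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

The order and height of the merges define the tree that a viewer can then draw next to the expression heat map.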

TreeView
TreeView is a program developed by Michael Eisen for visualizing the results of microarray data analysis. It lets one graphically browse the results of clustering and other analyses from Cluster. It supports tree-based and image-based browsing of hierarchical trees and offers multiple output formats for generating images for publication.

GeneSpring
The Microarray Core Facility has a site license for GeneSpring version 7 (www.silicongenetics.com). The software allows one to visualize, organize and manipulate gene expression data, and provides a host of tools for asking detailed questions about complex data sets. Sixteen transformations are available for creating powerful and flexible normalization scenarios. Normalization steps can be applied in virtually any order and include operations such as dye swapping experiments and median polishing. Scenarios can be saved and applied in other experiments.
GeneSpring offers visually intuitive filtering tools for both entry-level and advanced users. All visual filtering windows generate graphs of results in real time. These filters allow researchers to exclude particular conditions, set minimum and maximum values, and choose specific gene lists to filter. The advanced filtering window allows you to create complex Boolean expressions to identify genes with a highly specific expression pattern. GeneSpring also provides various analysis tools, such as t-tests, 2-way ANOVA tests and 1-way post-hoc tests, for reliably identifying differentially expressed genes. Its class prediction tools can identify genes capable of discriminating between one or more experimental parameters or sample phenotypes. Groups of genes identified by expression profiling can be further characterized by performing sequence searches for potential regulatory elements.
GeneSpring provides sophisticated clustering methods to uncover patterns in gene expression data and the relationships between these patterns. Researchers can use one or a combination of clustering options to characterize their data: gene trees (hierarchical clustering), experiment trees, self-organizing maps, k-means, Principal Components Analysis (PCA) and QT clustering. QT clustering is an unsupervised technique that allows you to specify both the minimum size and maximum correlation coefficient of each cluster. PCA allows you to reduce the complexity of your data by finding a small number of principal components that account for most of the data variability.
Pathways can also be explored with the pathway viewer, in which genes and their expression patterns can be visually characterized based on their location within a cellular pathway. Users can design their own pathway diagrams or directly import publicly available pathway maps, and can predict genes associated with discrete steps in the pathway of interest. You can download a demo of the software and then contact Zhengzheng Wei or Heather Hemric to obtain the site license password that will enable use of GeneSpring on your computer.