Download the user guide in PDF format here.

Introduction

Although statistical analysis software for microarrays plays an important role, extracting a list of differentially expressed genes is only a first step and is usually far from providing real insight into the biological subject under investigation and the molecular mechanisms underlying it. The next logical step is to combine these genes with the annotation available in various databases concerning their functional role, in order to highlight genes that are both statistically significant and biologically relevant, and which characterize or distinguish the biological subject under investigation. Functional analysis usually includes pathway analysis, to uncover genes with a certain expression profile that share the same pathway; a search for common regulatory elements among groups of genes; and gene functional analysis based on biological databases or ontologies. The Gene Ontology (GO) database provides such functional annotation in a hierarchical way, making it a valuable tool for the meta-analysis of microarray experiments. In addition, the Kyoto Encyclopedia of Genes and Genomes (KEGG) biological pathway database comprises a well-structured and constantly enriched library of molecular networks, which has been widely used as a reference for the biological interpretation of large-scale datasets generated by sequencing and other high-throughput experimental technologies.

StRAnGER is a web-based application that performs functional analysis of high-throughput genomic datasets. Starting from a list of significant genes derived from statistical and empirical thresholds, it uses the GO database and the KEGG pathway database, together with established statistical methods, to relate the significant genes to important nodes in the GO tree structure or to map them to over-represented metabolic pathways. In this way, cellular actions are seen as conceptual entities that are mapped as nodes to a hierarchical organizational schema such as the GO tree structure, where all functional annotations stem from the root nodes of molecular function, cellular component or biological process. The aim of the application is to suggest, as interesting targets for further biological research, whole molecular pathways or parts of them that incorporate a number of significantly differentially expressed genes of the list, rather than isolated genes whose measurements are more susceptible to systematic or random errors.

Regarding GO analysis, the rationale behind StRAnGER is to exploit an essential property of the GO tree structure and, consequently, of the population of GO terms (GOTs) derived from each significant gene list: many genes that are hierarchically lower (descendants) in the context of several biological functions are represented as ‘leaves’ in the GO tree structure, but are connected to hierarchically higher biological entities through the same tree structure and, as a result, inherit those GOTs too. The main goal of StRAnGER is to single out, among all the GOTs associated with the significant gene list, those associated with nodes higher in the GO hierarchy, which consequently encompass a number of genes that act on a specific biochemical pathway, and to rank them by statistical significance according to the p-value derived from a suitable over-representation test. In this way, the effect of noise on high-throughput genomic experiments is significantly mitigated, enabling specific biological objects to be targeted for further investigation.

Application usage

Data import

The first page of the application is used to import the data for the subsequent analysis. The following sections describe the procedure and the capabilities of the application.

Using a stored experimental result

If the user has performed microarray data analysis using the GRISSOM distributed platform, the results can be selected from the corresponding list and the user can proceed directly to the analysis parameters.

Gene list

StRAnGER’s basic input is a tab-delimited text file containing only unique gene identifiers that correspond either to the microarray platform used (e.g. Affymetrix Human DNA chip 133A) or to the public database chosen as the source of the genome background (e.g. Ensembl). A column with the p-value of each gene, as well as any further data columns (e.g. expression values for each gene), is optional and can be appended to the output. Other supported gene identifiers are HUGO gene names, GenBank accessions and Entrez IDs. When a file is uploaded, a wizard allows the user to specify the columns containing the information required for the subsequent analysis.
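
As an illustration of the input format described above, the following Python sketch reads a tab-delimited gene list and checks that the identifiers are unique. It is not part of StRAnGER: the function name `read_gene_list`, the column positions and the example identifiers and values are assumptions made for illustration only, and the file is assumed to have no header row.

```python
import csv
import io


def read_gene_list(handle, id_col=0, pvalue_col=None):
    """Read a tab-delimited gene list (hypothetical helper, not StRAnGER code).

    Returns the identifiers in file order and a dict of p-values
    (empty if no p-value column is given). Raises on duplicate identifiers.
    """
    ids = []
    pvalues = {}
    seen = set()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and rows with an empty identifier cell.
        if not row or not row[id_col].strip():
            continue
        gene_id = row[id_col].strip()
        if gene_id in seen:
            raise ValueError(f"duplicate gene identifier: {gene_id}")
        seen.add(gene_id)
        ids.append(gene_id)
        if pvalue_col is not None:
            pvalues[gene_id] = float(row[pvalue_col])
    return ids, pvalues


# Illustrative content: identifier, p-value, expression value (made-up numbers).
example = "1007_s_at\t0.003\t2.1\n1053_at\t0.041\t-1.4\n"
ids, pvalues = read_gene_list(io.StringIO(example), id_col=0, pvalue_col=1)
```

Running a check of this kind before uploading helps catch duplicated identifiers, which the input format does not allow.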

Background

StRAnGER offers a variety of sources and organisms for generating the background dataset, including Bioconductor array annotation packages and Ensembl genes for various organisms. The user can also upload their own annotation file, in tab-delimited text format, containing the minimum information required for StRAnGER analysis. In that case, a wizard allows the user to specify the columns containing the information required for the subsequent analysis; otherwise, StRAnGER continues automatically.

Particular care is needed when the user uploads or pastes a gene list and then selects a platform background from the drop-down menu. In this case, the user MUST select the type of gene identifier that matches the identifiers in the uploaded file or the pasted list.

Data selection

Gene List

The user should use the GeneID and p-value (optional) lists to select the corresponding columns in the gene list file uploaded for analysis. The GeneID selection is critical and must correspond to the identifier type selected on the previous page (ProbeID, Gene Symbol, Entrez ID or GenBank accession). If it does not, StRAnGER will not run! The same applies if the user has pasted a gene list instead of uploading a file. Additionally, StRAnGER allows the user to append to the output any additional column contained in the uploaded file by checking the corresponding checkboxes.

Background List

If a background file has been uploaded, the user should select the columns corresponding to the elements on the left: GeneID (which must be of the same type as the GeneID in the gene list!), Gene Name (Symbol), Gene Description and the column with ontological terms (GO or KEGG). The column with GO terms should contain the terms in the GO format, that is GO:XXXXXXX. It does not matter if the column contains other elements among the terms (e.g. descriptions), as StRAnGER will parse only the GO terms. The same applies to KEGG pathways, which should be in the form YYYYY, where each Y is a digit. KEGG pathway IDs should NOT be prefixed with the organism acronym (e.g. 00640 → correct, mmu00640 → wrong).
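
The identifier formats described above can be checked before uploading a background file. The following Python sketch shows one way to do this with regular expressions. It is only an illustration and does not reproduce StRAnGER's own parser: the helper names `extract_go_terms` and `is_valid_kegg_id` are hypothetical, and the patterns are derived solely from the formats stated above (GO: followed by seven digits, and five digits for KEGG).

```python
import re

# GO:XXXXXXX, i.e. "GO:" followed by exactly seven digits.
GO_TERM = re.compile(r"GO:\d{7}(?!\d)")
# YYYYY, i.e. exactly five digits and no organism prefix.
KEGG_ID = re.compile(r"\d{5}")


def extract_go_terms(cell):
    """Return all GO terms found in a cell, ignoring any surrounding text."""
    return GO_TERM.findall(cell)


def is_valid_kegg_id(value):
    """True only for a bare five-digit pathway ID such as 00640."""
    return KEGG_ID.fullmatch(value.strip()) is not None


cell = "GO:0006915 (apoptotic process); GO:0005515 protein binding"
go_terms = extract_go_terms(cell)

kegg_ok = is_valid_kegg_id("00640")      # bare ID, accepted
kegg_bad = is_valid_kegg_id("mmu00640")  # organism prefix, rejected
```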

Selecting the analysis parameters

Analysis parameters

The third page of the application (or the second, if the user analyzes stored experimental results) allows the user to specify the statistical analysis parameters for StRAnGER, as well as options regarding the type and graphical representation of the output.

Statistical test

Over-representation test

StRAnGER currently supports three statistical tests for the identification of enriched ontological terms, given a list of selected genes and the appropriate background. In all cases, n denotes the number of genes in the microarray/reference list, x the number of genes in the microarray/reference list associated with the term Ti, t the number of genes in the significant list and z the number of genes in the significant list annotated to the term Ti.
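
To show how the quantities n, x, t and z enter an over-representation test, the following Python sketch computes a one-sided hypergeometric p-value, that is, the probability of finding at least z genes annotated to the term Ti when t genes are drawn at random from the n genes of the reference list. The hypergeometric test is used here only as a common example of such a test; this guide does not confirm that it is one of the three tests StRAnGER offers, and the function name `hypergeom_pvalue` is hypothetical. The sketch requires Python 3.8 or later for `math.comb`.

```python
from math import comb


def hypergeom_pvalue(n, x, t, z):
    """Upper-tail hypergeometric probability P(Z >= z).

    n: genes in the reference list
    x: reference genes annotated to the term
    t: genes in the significant list
    z: significant genes annotated to the term
    """
    upper = min(x, t)  # z cannot exceed either x or t
    total = comb(n, t)  # all ways of drawing t genes out of n
    tail = sum(comb(x, k) * comb(n - x, t - k) for k in range(z, upper + 1))
    return tail / total


# Toy numbers: 10 reference genes, 4 annotated to the term,
# 5 significant genes, 3 of them annotated to the term.
p = hypergeom_pvalue(n=10, x=4, t=5, z=3)
```

With these toy numbers the tail sum is 66 out of 252 possible draws, so p is about 0.26 and the term would not be called over-represented at the usual thresholds.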

p-value cutoff

The p-value statistical threshold for the detection of over-represented ontological terms.

Bootstrap

Number of iterations

The number of bootstrap iterations that StRAnGER will perform in order to derive the robust cutoff for the distribution of enrichment elements.

Cutoff percentage (%)

The percentile threshold of the enrichment elements distribution that defines the acceptable cutoff for significant terms.

Bootstrap

The distribution that the application should bootstrap. Possible options are “Terms”, for bootstrapping ontological terms, or “Elements”, for bootstrapping the distribution of enrichment elements (the default StRAnGER algorithm).
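
The Python sketch below illustrates how the number of iterations and the cutoff percentage interact in a generic percentile bootstrap. It is not StRAnGER's algorithm, whose details are not given in this guide: the function names, the nearest-rank percentile and the averaging of the per-resample percentiles are all assumptions chosen to keep the example short.

```python
import math
import random


def percentile(sorted_values, pct):
    """Nearest-rank percentile of a non-empty list sorted in ascending order."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def bootstrap_cutoff(values, iterations=1000, pct=5.0, seed=0):
    """Average the pct-th percentile over bootstrap resamples of `values`.

    Hypothetical helper: each iteration resamples the values with
    replacement and records the requested percentile of the resample.
    """
    rng = random.Random(seed)  # fixed seed for reproducible results
    estimates = []
    for _ in range(iterations):
        resample = sorted(rng.choices(values, k=len(values)))
        estimates.append(percentile(resample, pct))
    return sum(estimates) / len(estimates)


# Made-up values standing in for a distribution to be bootstrapped.
scores = [0.001, 0.004, 0.01, 0.02, 0.03, 0.05, 0.08, 0.2, 0.4, 0.7]
cutoff = bootstrap_cutoff(scores, iterations=500, pct=10.0)
```

In a scheme of this kind, more iterations make the estimated cutoff more stable at the cost of a longer running time, while the percentage determines how far into the distribution the cutoff is placed.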

Run analysis on

Possible options are GO terms and KEGG pathways.

Output options

This section describes the possibilities regarding the output types of StRAnGER as well as the graphical output.

Graphical options

Output

One of “No visualization”, to skip the graphical output (for Gene Ontology analysis), or PDF, PNG or SVG, to produce the graphical output as a PDF document or as a PNG or SVG image.

Node shape

The shape of the nodes of the output GO tree.

Node outline color

The outline color of each node in the output GO tree. It can be one of the colors in the list.

Node fill color

The fill color of each node in the output GO tree. It can be one of the colors in the list.

Top scoring terms in tree

How many of the top-scoring GO terms to include in the output tree. If a large number is combined with a high ancestor level, the tree might become too complex to display.

Ancestor level

How many levels up in the GO hierarchy each output GO term should be connected to. If a large number is combined with a large number of top-scoring terms, the tree might become too complex to display.
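
The effect of the ancestor level on the size of the tree can be illustrated with a small Python sketch that collects the ancestors of a term up to a given number of levels. The hierarchy below is a toy example with placeholder names rather than real GO terms, and the function `ancestors_up_to` is hypothetical; it only illustrates why a high ancestor level adds nodes to the output tree.

```python
def ancestors_up_to(term, parents, levels):
    """Collect the ancestors of `term` reachable within `levels` steps upwards.

    `parents` maps each term to a list of its direct parents
    (a term may have more than one parent).
    """
    found = set()
    frontier = {term}
    for _ in range(levels):
        # Move one level up from every term in the current frontier.
        frontier = {p for t in frontier for p in parents.get(t, ())} - found
        if not frontier:
            break  # reached the top of the hierarchy
        found |= frontier
    return found


# Toy hierarchy with placeholder names (not real GO terms).
toy_parents = {
    "leaf": ["mid1", "mid2"],
    "mid1": ["top"],
    "mid2": ["top"],
    "top": ["root"],
}

one_level = ancestors_up_to("leaf", toy_parents, 1)
two_levels = ancestors_up_to("leaf", toy_parents, 2)
all_levels = ancestors_up_to("leaf", toy_parents, 10)
```

Each additional level can add several nodes per output term, so the number of nodes in the tree grows with both the ancestor level and the number of top-scoring terms.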

Output

This option determines the output format of the analysis results. Possible choices are “All”, for a complete output incorporating the significant terms with their statistics, the genes found under each term and any additional information regarding the genes; “Only stats”, for an output containing only the significant terms with some summary statistics; and “Only terms”, for a single-column output containing only the significant terms. In addition, the results can be displayed in HTML format. For GO-based analysis, GO terms are linked to AmiGO and the genes under these GO terms are linked to the GeneCards database. For KEGG-based analysis, the significant pathways are linked to the KEGG PATHWAY database and the genes mapped to these pathways are linked to the GeneCards database. Genes mapped on the significant pathways can be colored, but because the KEGG web service is slow this can take a substantial amount of time, so in this case it is suggested to receive the results via e-mail instead of displaying them directly. Two ways of retrieving the results are provided: the user can either download the results directly upon completion or provide an e-mail address to which the results will be sent.

Using the CERVis tool

The CERVis tool combines two or more StRAnGER outputs (for GO analysis only) into one graph. This allows the user, for example, to combine or compare the significant GO terms in an experimental design consisting of two doses of a specific drug. The user should select the graphical outputs and the number of files that will be supplied. The next page prompts the user to upload the files.

A small tutorial

This section presents a small tutorial on the very basic usage of StRAnGER. For any questions regarding further usage, difficulties or suggestions, please contact Panagiotis Moulos (pmoulos@eie.gr). First, download the test dataset from here. Extract the contents of the zip file and upload the files “GeneList.txt” and “BackgroundList.txt” using StRAnGER’s interface and the corresponding fields. Optionally, you can name your project (e.g. “Test”). Then, hit “Next”. Do NOT upload the test_dataset.zip file directly!
The following screen will appear, prompting you to select the columns from your files that correspond to the required elements. Pay particular attention to the GeneIDs in the gene list and background file, as this column is essential for the proper execution of StRAnGER. You can select any additional columns from the gene list file to be appended to the output.
Hit “Next”. The following screen will appear, prompting you to select the running options. For their meaning, please see the section above that describes the application. Make the necessary selections as shown in the following figure, which depicts the screen that appears after you hit “Next”.
You can also select whether to color KEGG pathways and whether to receive the results via e-mail. Hit “Submit”. The program will run for a while and the following screen will appear:
Click on “VIEW RESULTS”. The following web page will appear:
You can also download all the above results, compressed, by clicking on “DOWNLOAD RESULTS”. You can run another analysis by clicking on “Start new analysis”.

List of common mistakes!

In the months since StRAnGER has been up and running, we have been monitoring its usage and the success rate of its runs. Although usage is quite straightforward and most runs complete successfully, we have spotted several runs that fail. Trying to determine the reasons, we have found that most failures result from mistakes in usage and in the inputs supplied by users. While we are working to make the application more robust and self-explanatory, here is a list of common mistakes that users should be careful to avoid when performing an analysis: