Omics Data Analyser
This is an R program to visualize and analyze -omics data such as those from microarray, metabolomics and proteomics experiments. Bulk RNA-seq analysis starting from a raw count table is supported as of V2.2; single-cell RNA-seq analysis is to be supported in the future. The input is an Excel file containing intensity or count data, supplied as a list or table, together with the parameter settings. The output is an Excel file with figures and analysis result sheets.
Current version: V2.2. Tested R version: 4.3.2.
Highlighted updates in V2.2: Bulk RNA-seq analysis starting from a raw count table was added. DESeq2 is used for bulk RNA-seq data normalization and differential tests. clusterProfiler-based ORA and QEA were added for gene/protein data. A PLS-DA plot was added. Spearman and Kendall correlation plots were added. Analysis can now be run without erasing previous results. Bug fixes.
Highlighted updates in V2.1: Metabolite Set Enrichment Analysis (MSEA), Pathway Analysis (MSPA), and Joint-Pathway Analysis (JointPA) were implemented using MetaboAnalystR 3.0.3. If a list of metabolite IDs is supplied, Over Representation Analysis (ORA) can be done. If a differential test is specified, Quantitative Enrichment Analysis (QEA) can be done. For these analyses, metabolites must be identified by HMDB IDs or KEGG IDs. Alternatively, features can be metabolite names available in MetaboAnalyst's compound database. See MetaboAnalyst for details about MSEA, MSPA, and JointPA.
How to run this tool:
Step 1: Click on the Download button at the top right corner of this page to download the source code, then unzip it on your computer.
Step 2: Copy the data template file from the unzipped directory and open it in Excel. Read the instructions in the template.
Step 3: Copy your data to the RawData sheet of the template. Data can be a list or table with samples, features, feature descriptions and values. See the data template for details.
Step 4: Fill in the Parameters, Comparisons, Features, and Samples sheets as necessary.
Step 5: Run the program with your data and save the results in an Excel file. If visualization is enabled, a Figures folder will be created to save the plots in the Portable Network Graphics (.png) and PostScript (.ps) formats. If enrichment analysis is enabled, an EnrichmentAnalysis folder will be created to save the results. See below for ways of running the program.
- Running on the BioHPC @ UTSW. Log on to the BioHPC Portal, launch a Web Visualization node, open a terminal from there, and run the following:
sh /path_to_the_program/oda_analysis.sh /input_path/your_data_file.xlsx /output_path/ optional_BioHPC_queue_name
- Running on your local machine with a Singularity / Docker / Podman container. Make sure Singularity / Docker / Podman is installed and you can run it from a command line tool such as a Linux terminal or Windows CMD / PowerShell. Request a copy of the corresponding container from the author (see Contact). The Singularity form of the command is:
singularity exec oda_container Rscript /path_to_the_program/ODA/ODA.R /input_path/your_data_file.xlsx /output_path/
- Running on your local machine. Make sure R and required packages are installed and you can run the Rscript command from a command line tool such as a Linux terminal or Windows CMD / PowerShell.
Rscript /path_to_the_program/ODA/ODA.R /input_path/your_data_file.xlsx /output_path/
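For the local run, a small wrapper can catch common mistakes (wrong file type, missing input, Rscript not on the PATH) before R starts. The following is only a sketch and not part of ODA: the function names check_input and run_oda and the ODA_SCRIPT variable are hypothetical, and the paths are the same placeholders used in the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper (not shipped with ODA): validate arguments, then call Rscript.

# Location of ODA.R. Placeholder path from the command above; adjust to your install.
ODA_SCRIPT="${ODA_SCRIPT:-/path_to_the_program/ODA/ODA.R}"

# Return 0 if the argument names an existing .xlsx file, 1 otherwise.
check_input() {
    case "$1" in
        *.xlsx) ;;
        *) echo "Input must be an .xlsx file: $1" >&2; return 1 ;;
    esac
    [ -f "$1" ] || { echo "Input file not found: $1" >&2; return 1; }
}

# Validate the input, make sure Rscript is available, then run ODA.
run_oda() {
    input="$1"
    output="$2"
    check_input "$input" || return 1
    command -v Rscript >/dev/null 2>&1 || { echo "Rscript not found on PATH" >&2; return 1; }
    # Creating the output directory up front is an assumption; it is harmless
    # if ODA creates the directory itself.
    mkdir -p "$output" || return 1
    Rscript "$ODA_SCRIPT" "$input" "$output"
}

# Example call, commented out so that sourcing this file has no side effects:
# run_oda /input_path/your_data_file.xlsx /output_path/
```

The same checks could be placed in front of the container command by replacing the last line of run_oda with the singularity exec form.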
Examples: See the Examples folder for typical analysis settings and output files (more to be added).
- Raw_Data_QC: This example shows how to compare samples with quality controls to check whether the experiment worked well. For such comparisons use the raw, non-normalized data.
- QC_Excluded_Normalized_Data_Differential_Intensities: This example shows how to do differential intensity tests between biological samples under different experimental conditions. For such comparisons you should exclude quality controls and normalize data.
- Technical_Replicates: When there are technical replicates from the same biological subjects, the program averages them before normalization. This example shows you how to specify technical replicates.
- Multiple_Batches: If you have data from multiple experiments and suspect batch effects, the program can help identify them by visualization. Batches can be modeled when glm-based statistical tests are used. This example shows you how to specify batches.
- Paired_or_Matched_Samples: If you have paired or matched samples, the program can do paired statistical tests. This example shows you how to specify the pairing or matching.
- Interested_and_Excluded_Features: If you supply a list of features of interest, e.g. features found significant by differential intensity tests, the program can generate plots and a result sheet for those features only. The program can also exclude features from analysis if you supply a list. This example shows you how to specify those lists.
- List_Data: The program supports raw data in a table (features in rows and samples in columns) or list (features, samples, and values each in a column). While the above examples are all in the table format, this example shows you how to supply data in the list format.
- Ratio_Data: See this example for an analysis using ratio data such as percentages that are less than 1.
- MSEA_Analysis: See this example for Metabolite Set Enrichment Analysis results.
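To make the two layouts mentioned under List_Data concrete, the sketch below pivots list-format data (one feature, sample, and value per line) into table format (features in rows, samples in columns). It is an illustration only: ODA reads both layouts directly from the Excel file, so no conversion is required. The function name list_to_table, the tab-separated text input, the column headers, and the use of NA for missing combinations are all assumptions made for this example; the data template defines the real column names.

```shell
# Hypothetical helper: read tab-separated "Feature<TAB>Sample<TAB>Value" lines
# (first line is a header) from stdin and print a feature-by-sample table.
list_to_table() {
    awk -F '\t' -v OFS='\t' '
        NR == 1 { next }    # skip the header line
        {
            # Remember features and samples in order of first appearance.
            if (!($1 in fseen)) { fseen[$1] = 1; features[++nf] = $1 }
            if (!($2 in sseen)) { sseen[$2] = 1; samples[++ns] = $2 }
            value[$1, $2] = $3
        }
        END {
            # Header row: one column per sample.
            line = "Feature"
            for (j = 1; j <= ns; j++) line = line OFS samples[j]
            print line
            # One row per feature; NA marks a missing feature/sample pair.
            for (i = 1; i <= nf; i++) {
                line = features[i]
                for (j = 1; j <= ns; j++) {
                    if ((features[i], samples[j]) in value)
                        cell = value[features[i], samples[j]]
                    else
                        cell = "NA"
                    line = line OFS cell
                }
                print line
            }
        }'
}

# Usage with three made-up measurements:
printf 'Feature\tSample\tValue\nglucose\tS1\t10\nglucose\tS2\t12\nlactate\tS1\t5\n' | list_to_table
```

With this made-up input the result has the header Feature, S1, S2, followed by the rows glucose, 10, 12 and lactate, 5, NA (tab-separated).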
Citation:
DeVilbiss AW, Zhao Z, Martin-Sandoval MS, Ubellacker JM, Tasdogan A, Agathocleous M, Mathews TP, Morrison SJ. Metabolomic profiling of rare cell populations isolated by flow cytometry from tissues. eLife 2021;10:e61980. PMCID: PMC7847306
Contact:
Contact Zhiyu Zhao for comments / questions / suggestions about this software.