Zhiyu Zhao authored 266e1cb4

Omics Data Analyser

This is an R program to visualize and analyze -omics data such as those from metabolomics, lipidomics, proteomics, microarray, and bulk RNA-seq experiments. The input to this program is an intensity, count, or ratio data table along with parameter settings, all provided as spreadsheets in an Excel file. The output includes spreadsheets in an Excel file, figure folders, and an .rdata file for interactive visualization (under development).

FlowChart

Current version: V2.2.1. Tested R version: 4.3.2.

Updates in V2.2.1: A Coefficient of Variation (CV) column was added to the raw and normalized data sheets. clusterProfiler QEA and ORA analyses were added for the "Other" data type, using a customized DB in the .gmt format. Log2 fold changes were added to the comparison sheets to facilitate the clusterProfiler QEA analysis.
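The CV and log2 fold change columns can be illustrated with a short Python sketch. It assumes that CV is the sample standard deviation divided by the mean, expressed as a percentage, and that the fold change is log2(mean of group B / mean of group A) with group A as the reference. ODA's exact conventions (percent versus fraction, which group is the numerator, handling of zeros) are not stated here and may differ; the function names are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean, as a percentage (assumed convention)."""
    mean = statistics.mean(values)
    if mean == 0:
        # CV is undefined when the mean is zero.
        return float("nan")
    return statistics.stdev(values) / mean * 100


def log2_fold_change(group_a, group_b):
    """log2(mean(group_b) / mean(group_a)); group_a is treated as the reference (assumption)."""
    return math.log2(statistics.mean(group_b) / statistics.mean(group_a))


control = [100.0, 110.0, 90.0]
treated = [400.0, 380.0, 420.0]
print(round(coefficient_of_variation(control), 2))  # 10.0
print(log2_fold_change(control, treated))  # 2.0
```

A positive log2 fold change means the feature is higher in group B than in the reference group; a value of 2.0 corresponds to a fourfold increase.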

Highlighted updates in V2.2: Bulk RNA-seq analysis starting from a raw count table was added. DESeq2 is used for bulk RNA-seq data normalization and differential tests. clusterProfiler ORA and QEA analyses were added for gene/protein data. A PLS-DA plot was added. Spearman and Kendall correlation plots were added. Analyses can now be run without erasing previous results. Bug fixes.
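To show what DESeq2 normalization does to a raw count table, here is a minimal Python sketch of its default median-of-ratios size-factor estimation. ODA calls DESeq2 itself. This sketch does not reproduce DESeq2's negative binomial model or differential tests, and the function names are hypothetical.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts (same gene order in every sample).
    Genes with a zero count in any sample are skipped because their geometric mean is zero.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Log geometric mean of each gene across samples (only genes with all counts > 0).
    log_geo_means = []
    for g in range(n_genes):
        row = [counts[s][g] for s in samples]
        if all(c > 0 for c in row):
            log_geo_means.append((g, sum(math.log(c) for c in row) / len(row)))
    factors = {}
    for s in samples:
        # Ratio of each gene's count to its geometric mean, on the log scale.
        log_ratios = [math.log(counts[s][g]) - lgm for g, lgm in log_geo_means]
        factors[s] = math.exp(statistics.median(log_ratios))
    return factors


def normalize(counts):
    """Divide each sample's counts by its size factor."""
    sf = size_factors(counts)
    return {s: [c / sf[s] for c in counts[s]] for s in counts}


raw = {"A": [10, 20, 30, 0], "B": [20, 40, 60, 5]}
print(size_factors(raw))
print(normalize(raw))
```

Sample B has twice the sequencing depth of sample A, so its size factor is twice as large. After normalization, the first three genes have equal values in both samples. The fourth gene is ignored when estimating the factors because it has a zero count in sample A.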

Highlighted updates in V2.1: Metabolite Set Enrichment Analysis (MSEA), Pathway Analysis (MSPA), and Joint-Pathway Analysis (JointPA) were implemented using MetaboAnalystR 3.0.3. If a list of metabolite IDs is supplied, Over Representation Analysis (ORA) can be performed. If a differential test is specified, Quantitative Enrichment Analysis (QEA) can be performed. For these analyses, metabolites must be identified by HMDB IDs or KEGG IDs. Alternatively, features can be metabolite names that are available in MetaboAnalyst's compound database. See MetaboAnalyst for details about MSEA, MSPA, and JointPA.
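ORA, mentioned above for both the MetaboAnalyst and clusterProfiler analyses, is commonly a one-sided hypergeometric test. It asks whether a feature set contains more of your selected features than expected by chance. The Python sketch below reads sets from text in the tab-separated .gmt layout (set name, description, then members) and computes that p-value. The feature and set names are hypothetical. The sketch does not apply multiple-testing correction or ID mapping, both of which the real tools handle.

```python
from math import comb


def parse_gmt(text):
    """Parse .gmt text: name<TAB>description<TAB>member1<TAB>member2 ..."""
    sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        sets[fields[0]] = {f for f in fields[2:] if f}
    return sets


def ora_pvalue(hits, universe, member_set):
    """Return (overlap k, P(X >= k)) under the hypergeometric distribution."""
    members = member_set & universe
    selected = hits & universe
    N, K, n = len(universe), len(members), len(selected)
    k = len(selected & members)
    # Sum the probability of observing k or more set members among the n selected features.
    upper = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, upper / comb(N, n)


gmt_text = "SET_A\tna\tM1\tM2\tM3\tM4\nSET_B\tna\tM5\tM6\n"
universe = {f"M{i}" for i in range(1, 11)}  # 10 measured features
hits = {"M1", "M2", "M3"}  # e.g. significant features
for name, members in parse_gmt(gmt_text).items():
    print(name, ora_pvalue(hits, universe, members))
```

Here all three hits fall in SET_A, which has 4 of the 10 features, giving P = 4/120 ≈ 0.033. SET_B contains no hits, so its p-value is 1.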

How to run this tool:

Step 1: Click on the download link (password: ODA@CRI_UTSW) to download a copy of the ODA tool. Downloading is NOT necessary for CRI users. The files are shared at /archive/CRI/shared/Tools/zzy/ODA/Vx.y on BioHPC.

Step 2: Copy the data template file from your ODA directory, save it in your analysis folder, and open it in Excel. Read the instructions it contains. This is your input file.

Step 3: Copy your data table to the RawData sheet of your input file. Data should be a table with samples, features, optional feature descriptions, and values. See the data template for details.

Step 4: Fill in the Parameters, Comparisons, Features, and Samples sheets as necessary.

Step 5: Run the program with your input file and save results in an output folder. If visualization is enabled, a Figures folder will be created to save the plots. If enrichment analysis is enabled, an EnrichmentAnalysis folder will be created to save the results. See below for ways of running the program.

  1. Running on the BioHPC @ UTSW. Log on to the BioHPC Portal, launch a Web Visualization node, open a terminal from there, and run the following:
sh  /path_to_the_program/oda_analysis.sh  /input_path/your_data_file.xlsx  /output_path/  optional_BioHPC_queue_name
or
sh  /archive/CRI/shared/Tools/zzy/ODA/Vx.y/oda_analysis.sh  /input_path/your_data_file.xlsx  /output_path/  optional_BioHPC_queue_name	#For CRI users only.
  2. Running on your local machine with a Singularity / Docker / Podman container. Make sure Singularity / Docker / Podman is installed and that you can run it from a command-line tool such as a Linux terminal or Windows CMD / PowerShell. If you do not use Singularity, request a copy of the corresponding Docker / Podman container image from me.
singularity exec /path_to_the_program/Vx.y/r_with_packages_4.3.2.sif Rscript /ODA/ODA.R  /input_path/your_data_file.xlsx  /output_path/
  3. Running on your local machine using R. Make sure R and the required packages are installed and that you can run the Rscript command from a command-line tool such as a Linux terminal or Windows CMD / PowerShell. Request a copy of the ODA source files from me.
Rscript  /path_to_the_program/ODA/ODA.R  /input_path/your_data_file.xlsx  /output_path/
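If you have several input files, you can script the local R command above. The Python sketch below builds `Rscript ODA.R input.xlsx output/` for each file and writes results to one output folder per input, named after the file. The script path, the `run_oda.py` file name, and the per-file folder layout are assumptions. By default the sketch only prints the commands (dry run).

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical location of the ODA entry script; replace with your own copy.
ODA_SCRIPT = Path("/path_to_the_program/ODA/ODA.R")


def build_command(input_xlsx, output_dir, script=ODA_SCRIPT):
    """Return the argument list for: Rscript ODA.R input.xlsx output_dir/"""
    out = str(output_dir)
    if not out.endswith("/"):
        out += "/"  # the documented usage passes the output path with a trailing slash
    return ["Rscript", str(script), str(input_xlsx), out]


def run_all(inputs, output_root, dry_run=True):
    """Run ODA once per input file, using one output folder per file (assumed layout)."""
    for xlsx in map(Path, inputs):
        out_dir = Path(output_root) / xlsx.stem
        cmd = build_command(xlsx, out_dir)
        print(" ".join(cmd))
        if not dry_run:
            out_dir.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # stop at the first failing analysis


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage: python run_oda.py /output_root/ file1.xlsx file2.xlsx ...
    run_all(sys.argv[2:], sys.argv[1], dry_run=False)
```

For example, `run_all(["data/exp1.xlsx"], "results")` prints the command for `results/exp1/` without running it. Setting `check=True` makes a failed analysis stop the loop instead of going unnoticed.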

Examples: See the Examples folder for typical analysis settings and output files (more to be added).

  1. Raw_Data_QC: This example shows how to compare samples with quality controls to check whether the experiment worked well. For such comparisons, use the raw, non-normalized data.
  2. QC_Excluded_Normalized_Data_Differential_Intensities: This example shows how to do differential intensity tests between biological samples under different experimental conditions. For such comparisons, exclude quality controls and normalize the data.
  3. Technical_Replicates: When there are technical replicates from the same biological subjects, the program averages them before normalization. This example shows you how to specify technical replicates.
  4. Multiple_Batches: If you have data from multiple experiments and suspect batch effects, the program can help identify them through visualization. Batches can be modeled when glm-based statistical tests are used. This example shows you how to specify batches.
  5. Paired_or_Matched_Samples: If you have paired or matched samples, the program can do paired statistical tests. This example shows you how to specify the pairing or matching.
  6. Interested_and_Excluded_Features: If you supply a list of features (e.g., significant features from differential intensity tests), the program can generate plots and result sheets for the selected features only. The program can also exclude features from the analysis if you supply a list of them. This example shows you how to specify those lists.
  7. List_Data: The program supports raw data in a table (features in rows and samples in columns) or list (features, samples, and values each in a column). While the above examples are all in the table format, this example shows you how to supply data in the list format.
  8. Ratio_Data: See this example for an analysis using ratio data, such as percentages expressed as values less than 1.
  9. MSEA_Analysis: See this example for Metabolite Set Enrichment Analysis results.
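The List_Data and Technical_Replicates examples describe two reshaping steps: turning list-format data (one feature, sample, and value per row) into a feature-by-sample table, and averaging technical replicates of the same biological sample. The Python sketch below does both. The `replicate_of` mapping is hypothetical (in ODA, replicates are specified in the input file). The sketch also assumes an arithmetic mean and does not handle missing values.

```python
from collections import defaultdict
from statistics import mean


def list_to_table(records, replicate_of=None):
    """Pivot (feature, sample, value) records into {feature: {sample: value}}.

    replicate_of: optional dict mapping a technical-replicate name to its
    biological sample; values from replicates of the same sample are averaged.
    """
    replicate_of = replicate_of or {}
    grouped = defaultdict(lambda: defaultdict(list))
    for feature, sample, value in records:
        # Replicates collapse onto their biological sample; other samples keep their name.
        grouped[feature][replicate_of.get(sample, sample)].append(value)
    return {f: {s: mean(v) for s, v in by_sample.items()} for f, by_sample in grouped.items()}


records = [
    ("Glucose", "S1_r1", 10.0), ("Glucose", "S1_r2", 14.0), ("Glucose", "S2", 20.0),
    ("Lactate", "S1_r1", 5.0), ("Lactate", "S1_r2", 7.0), ("Lactate", "S2", 9.0),
]
table = list_to_table(records, replicate_of={"S1_r1": "S1", "S1_r2": "S1"})
print(table)
# {'Glucose': {'S1': 12.0, 'S2': 20.0}, 'Lactate': {'S1': 6.0, 'S2': 9.0}}
```

Averaging before normalization, as described for technical replicates, means each biological subject contributes one value per feature. Replicates are then not counted as independent samples in later statistical tests.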

Citation:

DeVilbiss AW, Zhao Z, Martin-Sandoval MS, Ubellacker JM, Tasdogan A, Agathocleous M, Mathews TP, Morrison SJ. Metabolomic profiling of rare cell populations isolated by flow cytometry from tissues. eLife 2021;10:e61980. PMCID: PMC7847306

Contact:

Contact Zhiyu Zhao with comments, questions, or suggestions, or to request features for this software.