Update file README.md

71d35d0f · Jefferson Chen · d35d34e5 · 71d35d0f
Commit 71d35d0f authored 9 months ago by Jefferson Chen
--- a/README.md
+++ b/README.md
+# scGNN-sciPENN
+## Set Up the Environment
+First, create the conda environment:
+```
+conda create -n scgnn_scipenn python==3.8.19
+conda activate scgnn_scipenn
+pip install -r ./requirements.txt
+pip install "numba==0.56.4" seaborn
+```
+Running scGNN also requires an [installation of R](https://www.r-project.org/). The R packages below are required to run scGNN (R version 4.1 or later is highly recommended). The first command assumes the `remotes` package is already installed.
+```
+remotes::install_github("satijalab/seurat", "seurat5", quiet = TRUE)
+install.packages(c("Matrix", "data.table", "pracma", "dplyr"))
+```
+## Get Protein Prediction from sciPENN
+To get the protein predictions from sciPENN, run the notebook **./sciPENN/preprocessing_data.ipynb**. For time efficiency, we randomly chose 8000 cells instead of using the entire dataset. The gene and protein expression matrices are stored in **./scGNN/sample_Data/pbmc/Gene/original_top_expression.csv** and **./scGNN/sample_Data/pbmc/Protein/original_top_expression.csv**, respectively.
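The random subsampling step might look like the following minimal sketch. It assumes a cells-by-features NumPy matrix and a fixed seed for reproducibility. The function name `subsample_cells`, the seed, and the toy data are illustrative and are not taken from the notebook.

```python
import numpy as np


def subsample_cells(expr, n_cells=8000, seed=0):
    """Return a random subset of rows (cells) and the sorted indices kept.

    `expr` is assumed to be a cells x features array. This helper is
    hypothetical and is not part of sciPENN or scGNN.
    """
    rng = np.random.default_rng(seed)
    n_keep = min(n_cells, expr.shape[0])  # never request more cells than exist
    # Sample without replacement, then sort so the original cell order is kept.
    idx = np.sort(rng.choice(expr.shape[0], size=n_keep, replace=False))
    return expr[idx], idx


# Toy data standing in for a real expression matrix (20 cells, 3 features).
toy = np.arange(20 * 3).reshape(20, 3)
subset, kept = subsample_cells(toy, n_cells=8)
print(subset.shape)
```

Applying the same `kept` indices to both the gene matrix and the protein matrix keeps the two tables paired cell by cell.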
+## Run LTMG
+scGNN requires LTMG results that correspond to the input data. To generate them, run the file **LTMG.R**.
+**Note**: Because sciPENN predictions can contain negative values, some genes and cells in the protein prediction are filtered out (this does not happen when using the original protein expression data). We therefore restrict the gene expression data to the same cells as the protein expression data.
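The idea of restricting both tables to a common set of cells can be sketched in Python with pandas. This is an illustration only and not the code in **LTMG.R**. It assumes features are rows and cell barcodes are column labels, and the helper name `restrict_to_shared_cells` is hypothetical.

```python
import pandas as pd


def restrict_to_shared_cells(gene_df, protein_df):
    """Keep only the cells (columns) present in both tables, in the same order.

    Assumes features are rows and cell barcodes are column labels.
    """
    protein_cells = set(protein_df.columns)
    # Preserve the gene table's column order for the shared cells.
    shared = [cell for cell in gene_df.columns if cell in protein_cells]
    return gene_df.loc[:, shared], protein_df.loc[:, shared]


# Toy tables: cell "c2" was filtered out of the protein prediction.
gene = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]], index=["G1", "G2"], columns=["c1", "c2", "c3"]
)
protein = pd.DataFrame([[0.5, 0.7]], index=["P1"], columns=["c3", "c1"])

gene_kept, protein_kept = restrict_to_shared_cells(gene, protein)
print(list(gene_kept.columns))  # ['c1', 'c3']
```

Reordering the protein columns to match the gene columns means that column `i` refers to the same cell in both tables.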
+## scGNN Iteration
+To run scGNN, use the following command:
+```
+nohup python -W ignore scGNN_v2.py --load_LTMG LTMG_0.1.csv \
+    --load_sc_dataset original_top_expression.csv \
+    --load_dataset_dir ./sample_Data \
+    --output_intermediate \
+    --output_preprocessed \
+    --output_dir ./output_pbmc \
+    --graph_AE_embedding_size 64 \
+    --load_dataset_1 pbmc/Gene \
+    --load_dataset_2 pbmc/Protein \
+    > scipenn_pbmc.out &
+```
+**Note**: Usually, `--load_dataset_1` is used to load the gene-related data and `--load_dataset_2` is used to load the protein-related data. For additional arguments and details, see the [scGNN 2.0 documentation](https://github.com/OSU-BMBL/scGNN2.0) and the [scGNN documentation](https://github.com/juexinwang/scGNN).
+## Methods Used
+![Mind Map](./img/Mindmap.png)
+The major change compared to the original scGNN method is the addition of protein expression as a second input. The two inputs are trained separately with scGNN and are merged only when predicting the cell clusters. To merge them, we use Seurat after the graph embeddings have been computed. The merging step is implemented in the **Seurat.R** file.
+## Result Comparison
+To compare our new method with sciPENN, we used the correlation of the predicted protein expression as our metric. For more details, run the notebook **./sciPENN/comparsion.ipynb**.
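A correlation-based comparison could be computed per protein as in the sketch below. It assumes Pearson correlation and cells-by-proteins arrays with matching row and column order. The README does not state which correlation the notebook uses, and `per_protein_correlation` is a hypothetical helper.

```python
import numpy as np


def per_protein_correlation(truth, pred):
    """Pearson correlation between measured and predicted values, one per protein.

    Both inputs are assumed to be cells x proteins arrays in matching order.
    """
    truth = np.asarray(truth, dtype=float)
    pred = np.asarray(pred, dtype=float)
    t = truth - truth.mean(axis=0)  # center each protein column
    p = pred - pred.mean(axis=0)
    denom = np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    # Constant columns have zero variance; report NaN instead of dividing by zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom > 0, (t * p).sum(axis=0) / denom, np.nan)


# Toy example: protein 0 is perfectly correlated, protein 1 perfectly anti-correlated.
truth = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
pred = np.array([[2.0, 3.0], [4.0, 2.0], [6.0, 1.0]])
corr = per_protein_correlation(truth, pred)
print(corr)
```

Averaging the per-protein values gives a single summary score for each method, which makes the two methods directly comparable.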
+## Acknowledgement 
+- Ruanfeng Pei
+- Yang Xie
+- Guanghua Xiao
+## Citations 
+- Lakkis, J., Schroeder, A., Su, K. et al. A multi-use deep learning method for CITE-seq and single-cell RNA-seq data integration with cell surface protein prediction and imputation. Nat Mach Intell 4, 940–952 (2022). https://doi.org/10.1038/s42256-022-00545-w
+- Gu, H., Cheng, H., Ma, A. et al. scGNN 2.0: a graph neural network tool for imputation and clustering of single-cell RNA-Seq data. Bioinformatics 38, 5322–5325 (2022). https://doi.org/10.1093/bioinformatics/btac684
+- Wang, J., Ma, A., Chang, Y. et al. Author Correction: scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses. Nat Commun 13, 2554 (2022). https://doi.org/10.1038/s41467-022-30331-6