Update file README.md

b71fd7b2 · Jefferson Chen · ac0ae745 · b71fd7b2
Commit b71fd7b2 authored 9 months ago by Jefferson Chen
--- a/README.md
+++ b/README.md
@@ -91,6 +91,8 @@ The output of the single cell reference $S$ would be in dimension $(k * m) \time
 For metric evaluation, we measure the correlation between each predicted spot and the corresponding actual spot, averaged over all spots, as expressed in the following equation.
+All comparisons and metric evaluations are stored in the file **./scGPT/Tangram_scGPT_Comparsion.ipynb**.
 $$\frac{1}{n_{spot}}\sum_{i=1}^{n_{spot}} corr(A_i, G_i)$$
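The averaged correlation above can be sketched in a few lines of plain Python. This is only an illustration, not the code from the notebook: it assumes $corr$ is the Pearson correlation, that `A` and `G` are lists of per-spot expression vectors in matching order, and the function names `pearson` and `mean_spot_correlation` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Assumption: constant vectors are rejected rather than skipped.
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def mean_spot_correlation(A, G):
    """Average of corr(A_i, G_i) over all spots i."""
    if not A or len(A) != len(G):
        raise ValueError("A and G must contain the same number of spots")
    return sum(pearson(a, g) for a, g in zip(A, G)) / len(A)
```

For example, `mean_spot_correlation([[1, 2, 3]], [[2, 4, 6]])` evaluates a single spot whose two vectors are perfectly linearly related.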
 **Note**: 
@@ -155,7 +157,9 @@ To run the result for Redeconve, go to folder **./scGPT/Redeconve**. Run **forma
 For PDAC dataset plotting, please visit the file **./scGPT/Redeconve/PDAC.R**
-To preprocess PDAC dataset for scGPT generation, please visith the file **./scGPT/Redeconve/PDAC_Data_Formatting.R**
+To preprocess PDAC dataset for scGPT generation, please visit the file **./scGPT/Redeconve/PDAC_Data_Formatting.R**
+To see where the overlapping common genes and marker genes are stored, please visit the file **./scGPT/Tangram_scGPT_Comparsion.ipynb**. Note that the overlapping genes must be filtered again when only a subset of the reference is used, for example when selecting 1000 cell references from the entire set: some genes may have zero expression across the selected cells and need to be filtered out.
 **Note**: It is highly recommended to use R version >=4.2.0. 
@@ -163,6 +167,8 @@ Seurat* -- Needs to be implemented
 ### Method Comparison with Redeconve & Seurat (Human Breast)
+In this dataset, we compared 5 different results. Redeconve's original result selected only 1000 cells as reference, so we used our method to generate a 1000-cell reference for comparison. We also generated the same number of cell references as the original single-cell expression data, which is 100000 cell references. The same procedure is applied to dataset 3 (Human Lymph Node).
 **Note**: We used the Human Breast dataset. The default hyperparameters are $ratio=0.8$, $k=1000$, $m=100$, and $n_{genes}$ is the overlap with marker genes. Notice that some genes are 0 across all 1000 selected cells (in the Redeconve method). Tangram therefore requires filtering out those zero-expression genes, resulting in **165** overlapping marker genes and **14655** overlapping genes in total.
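The zero-expression filtering described in this note can be illustrated with a minimal sketch. It is not the notebook's implementation: it assumes the selected single-cell reference is available as a dictionary mapping each gene name to its expression values across the selected cells, and the function name `filter_genes` and its arguments are hypothetical.

```python
def filter_genes(sc_expr, spatial_genes, marker_genes):
    """Return (common genes, common marker genes), both sorted.

    sc_expr: dict mapping gene name -> list of expression values across
             the selected reference cells (assumed layout).
    spatial_genes: iterable of gene names present in the spatial data.
    marker_genes: iterable of marker gene names.
    """
    # Keep only genes expressed in at least one selected cell.
    expressed = {gene for gene, values in sc_expr.items()
                 if any(v != 0 for v in values)}
    # Genes shared by the filtered reference and the spatial data.
    common = expressed & set(spatial_genes)
    # Marker genes that survive both the filter and the intersection.
    common_markers = common & set(marker_genes)
    return sorted(common), sorted(common_markers)
```

A gene that is nonzero somewhere in the full reference can still drop out here if it is zero in every selected cell, which is why the counts depend on which cells are chosen.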
@@ -193,7 +199,7 @@ Seurat* -- Needs to be implemented
 ### Method Comparison with Redeconve & Seurat (PDAC)
-**Note**: We used PDAC dataset. The default hyperparameters is $ratio=0.8$, $k=107$, $m=18$, $n_{genes}$ is the overlap with marker genes. After scGPT generation and filtering, there are **11960** intersected genes between the spatial and single cell expression data with **127** overlapped marker genes. 
+**Note**: We used the PDAC dataset. The default hyperparameters are $ratio=0.8$, $k=107$, $m=18$, and $n_{genes}$ is the overlap with marker genes. After scGPT generation and filtering, there are **11960** intersected genes between the spatial and single-cell expression data, with **127** overlapping marker genes. Since Redeconve used the entire 1926-cell reference for dataset 4 (PDAC), we did not have to generate a 1000-cell reference using scGPT.
 | Method      | Median   | Mean     |
 |-------------|----------|----------|