Update Readme.md

9296b7ca · Ahmed Abbas · 39c15ec7 · 9296b7ca
Commit 9296b7ca authored 1 year ago by Ahmed Abbas
--- a/HiC_prediction/Readme.md
+++ b/HiC_prediction/Readme.md
@@ -8,8 +8,42 @@ https://cloud.biohpc.swmed.edu/index.php/s/yyp8DidNzzH4tgb
 1. Go using the terminal to the downloaded folder `IMR90_data_CNN-ChIPr`
 2. Compile the C++ script `prepare_chipseq_data.cpp` by typing: `g++ prepare_chipseq_data.cpp -o imr90.out`
 3. Run the program by typing: `./imr90.out`
-4. An example file showing how to compile and run the program is `run_cpp.sh`. You can adjust the file as necessary for your working environment
+4. This step may take more than 1 day to finish
+5. An example file showing how to compile and run the program is `run_cpp.sh`. You can adjust the file as necessary for your working environment
+## Prepare the sequence files needed for training the model
+1. The script used for this step is: `get_seq_ip_files.R`
+2. To run the script, type in the terminal: `Rscript get_seq_ip_files.R`
+3. This step requires a working R environment with the `bedtoolsr` library installed
+4. This step may take more than 1 day to finish
+5. An example file showing how to run the script is `get_seq_files.sh`
+## Prepare the CTCF orientation flags and TADs flags files
+1. The script used for this step is: `get_CTCF_ORI_TADS_IMR90.R`
+2. To run the script, type in the terminal: `Rscript get_CTCF_ORI_TADS_IMR90.R`
+3. This step requires a working R environment with the `bedtoolsr` library installed
+4. An example file showing how to run the script is `get_ctcf_tads_ori_IMR90.sh`
+## The files needed for training are now ready and stored
+1. All the files needed for training the model are now ready and stored in the previously created folder: `Hi-C_data`
+## Training the model
+1. In the extracted folder `CNN-ChIPr-Data`, create a new folder and name it: `RAD21_model_GM12878`
+2. The file used for training the model is: `train_for_GM12878_RAD21.py`
+3. To train the model, you need an active Python environment with all the libraries imported at the top of `train_for_GM12878_RAD21.py` installed
+4. To train the model, type: `python train_for_GM12878_RAD21.py`
+5. Training may take several hours to finish. **I used a computer with 256GB memory in this step.**
+6. After this step finishes, the trained model will be in: `RAD21_model_GM12878/RAD21_all_inputs_trained_GM12878.h5`
+## Testing the model
+1. In the downloaded data, the folder `RAD21_inputs_K562` contains data to test the model and get results for the K562 cell line
+2. The K562 data was prepared in the same way as the GM12878 data in the previous steps
+3. The file used for testing the model is: `get_results_K562_RAD21.py`
+4. To run the script, type in the terminal: `python get_results_K562_RAD21.py`
+5. It should print the Pearson correlation between the predictions and the original K562 interactions for each chromosome.
+## For questions, comments, or bug reporting
+- Please contact ahmed.abbaselmahdi@utsouthwestern.edu
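
Reviewer note on the "Testing the model" step: the README says the test script prints a Pearson correlation per chromosome, but does not show how that value is obtained. The sketch below illustrates one way such a per-chromosome summary could be computed. It is not taken from `get_results_K562_RAD21.py`; the function names `pearson` and `per_chromosome_pearson`, the dictionary layout, and the toy values are all assumptions made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # The correlation is undefined when one vector is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def per_chromosome_pearson(
    data: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> Dict[str, float]:
    """Map each chromosome name to the correlation of (predicted, observed)."""
    return {chrom: pearson(pred, obs) for chrom, (pred, obs) in data.items()}


# Toy values, made up purely for illustration (not real K562 data).
toy = {
    "chr1": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]),
    "chr2": ([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]),
}

for chrom, r in per_chromosome_pearson(toy).items():
    print(f"{chrom}\t{r:.3f}")
```

In practice the predicted and observed vectors would come from the model output and the K562 interaction data. How those are loaded and flattened depends on the actual script and is not shown here.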