Commit 9296b7ca authored by Ahmed Abbas

Update Readme.md

parent 39c15ec7
@@ -8,8 +8,42 @@ https://cloud.biohpc.swmed.edu/index.php/s/yyp8DidNzzH4tgb
1. In the terminal, go to the downloaded folder `IMR90_data_CNN-ChIPr`
2. Compile the C++ script `prepare_chipseq_data.cpp` by typing: `g++ prepare_chipseq_data.cpp -o imr90.out`
3. Run the program by typing: `./imr90.out`
4. It may take more than 1 day to finish
5. An example file showing how to compile and run the program is `run_cpp.sh`. You can adjust the file as necessary for your working environment
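
The compile and run steps above can also be driven from Python, which makes it easy to log how long each one takes. This is only a sketch: the function names `build_commands` and `compile_and_run` are hypothetical, and it is not a replacement for the provided `run_cpp.sh`. It assumes `g++` is on your `PATH` and that you call it from the `IMR90_data_CNN-ChIPr` folder.

```python
import shlex
import subprocess
import time


def build_commands(source="prepare_chipseq_data.cpp", binary="imr90.out"):
    """Return the compile and run commands from steps 2 and 3 as argument lists."""
    compile_cmd = ["g++", source, "-o", binary]
    run_cmd = ["./" + binary]
    return compile_cmd, run_cmd


def compile_and_run(workdir="."):
    """Compile, then run, stopping at the first command that fails."""
    for cmd in build_commands():
        start = time.time()
        print("running:", shlex.join(cmd))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, cwd=workdir, check=True)
        print("finished in %.1f s" % (time.time() - start))
```

Calling `compile_and_run()` starts both steps. Because the run may take more than a day, you may prefer to start it inside a job scheduler or a persistent terminal session.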
## Prepare the sequence files needed for training the model
1. The script used for this step is: `get_seq_ip_files.R`
2. To run the script, type in the terminal: `Rscript get_seq_ip_files.R`
3. It needs an active R environment with the library `bedtoolsr` installed
4. It may take more than 1 day to finish this step
5. An example file showing how to run the script is `get_seq_files.sh`
## Prepare the CTCF orientation flags and TADs flags files
1. The script used for this step is: `get_CTCF_ORI_TADS_IMR90.R`
2. To run the script, type in the terminal: `Rscript get_CTCF_ORI_TADS_IMR90.R`
3. It needs an active R environment with the library `bedtoolsr` installed
4. An example file showing how to run the script is `get_ctcf_tads_ori_IMR90.sh`
## The files needed for training are now ready and stored
1. All the files needed for training the model are now ready and stored in the previously created folder: `Hi-C_data`
## Training the model
1. In the extracted folder `CNN-ChIPr-Data`, create a new folder and name it: `RAD21_model_GM12878`
2. The file used for training the model is: `train_for_GM12878_RAD21.py`
3. To train the model, you need an active Python environment with all the libraries listed at the top of `train_for_GM12878_RAD21.py` installed
4. To train the model, type: `python train_for_GM12878_RAD21.py`
5. It may take several hours to finish. **I used a computer with 256 GB of memory for this step.**
6. After this step finishes, the trained model will be in: `RAD21_model_GM12878/RAD21_all_inputs_trained_GM12878.h5`
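
Before launching a training run that takes several hours, it can help to check that every library imported by the training script is available. The sketch below is not part of the repository, and the helper names `top_level_imports` and `missing_packages` are hypothetical. It parses Python source text and reports the imported top-level packages that cannot be found in the current environment.

```python
import ast
import importlib.util


def top_level_imports(source):
    """Return the sorted top-level package names imported by Python source text."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x".
            names.add(node.module.split(".")[0])
    return sorted(names)


def missing_packages(source):
    """Return the imported packages that are not importable in this environment."""
    return [name for name in top_level_imports(source)
            if importlib.util.find_spec(name) is None]
```

To use it, read the script text with `open("train_for_GM12878_RAD21.py").read()` and pass it to `missing_packages`. Note that an import name can differ from the name used to install the package, so treat the result as a hint rather than an install list.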
## Testing the model
1. In the downloaded data, the folder `RAD21_inputs_K562` contains data to test the model and get results for the K562 cell line
2. This K562 data was prepared in the same way as the GM12878 data in the previous steps
3. The file used for testing the model is: `get_results_K562_RAD21.py`
4. To run the script, type in the terminal: `python get_results_K562_RAD21.py`
5. It should print the Pearson correlation values between the predictions and the original K562 interactions for each chromosome.
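
For reference, a per-chromosome Pearson correlation can be computed as in the sketch below. It uses the standard textbook definition with the Python standard library only. How `get_results_K562_RAD21.py` computes its values is not described here, so this is an illustration and not the script's actual code. The function names and the dictionary layout (chromosome name mapped to a list of values) are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # The correlation is undefined when one sequence is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def pearson_per_chromosome(predicted, observed):
    """Map each chromosome name to the correlation of its two value lists."""
    return {chrom: pearson(predicted[chrom], observed[chrom])
            for chrom in predicted}
```

Values close to 1 mean the predicted interactions track the original ones closely on that chromosome.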
## For questions, comments, or bug reporting
- Please contact ahmed.abbaselmahdi@utsouthwestern.edu