Commit 0fcb7c4f authored by Huabin Zhou's avatar Huabin Zhou

New test files and Tutorial file

parent f7c6d0d8
Showing
with 5213 additions and 0 deletions
Copyright (c) 2024 The University of Texas Southwestern Medical Center.
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted for academic research use only (subject to the limitations in the disclaimer below) provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
ANY USE OR REDISTRIBUTION OF THIS SOFTWARE FOR COMMERCIAL PURPOSES, WHETHER IN SOURCE OR BINARY FORM, WITH OR WITHOUT MODIFICATION, IS EXPRESSLY PROHIBITED; ANY USE OR REDISTRIBUTION BY A FOR-PROFIT ENTITY SHALL COMPRISE USE OR REDISTRIBUTION FOR COMMERCIAL PURPOSES.
NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY THIS LICENSE. THIS SOFTWARE, AND ANY ACCOMPANYING DOCUMENTATION, IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE OR ANY OF ITS ACCOMPANYING DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Tutorial for the CATM and TM
## Introduction
To simplify the batch processing of the program, each demo tomogram is placed in a separate folder. Here, we will test two chromatin datasets with tomogram IDs "s210" and "s68". We will walk through the test for "s210". Pre-computed results are also available in the results folder, which may be overwritten during the process.
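The per-tomogram folder layout is what lets the same `config.py` be reused for each dataset: the configuration files in this commit take the tomogram ID from the name of the directory above the working directory. A minimal sketch of that idea follows; the helper names `tomo_id_from_dir` and `output_prefix` are hypothetical and not part of the package.

```python
from pathlib import PurePosixPath


def tomo_id_from_dir(run_dir: str) -> str:
    """Return the name of the directory that contains run_dir.

    Hypothetical helper: mirrors `os.getcwd().split("/")[-2]` in config.py.
    """
    return PurePosixPath(run_dir).parent.name


def output_prefix(run_dir: str, suffix: str = ".test1.catm") -> str:
    """Build an output prefix such as "s210.test1.catm" (suffix is an assumption)."""
    return tomo_id_from_dir(run_dir) + suffix


# Running from test/s210/catm yields the tomogram ID "s210".
print(tomo_id_from_dir("/data/test/s210/catm"))
print(output_prefix("/data/test/s68/tm", suffix=".test1"))
```

Because the ID comes from the folder name, the commands must be run from inside the `catm` or `tm` subfolder of a tomogram folder.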
## Tune the parameters
All parameters are configured in the `config.py` file, where you can specify the input/output directories and detailed parameters for template matching and the clash resolver. You can tailor these settings to meet specific requirements.
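One such parameter is `missing_wedge = [30, 42]`, which the comments in `config.py` describe as corresponding to a [-60, +48] tilt range. The sketch below assumes each value is simply 90° minus the absolute tilt limit, which reproduces those numbers; the function name is hypothetical and the package itself may compute this differently.

```python
def missing_wedge_from_tilt_range(min_tilt_deg, max_tilt_deg):
    """Convert a tilt range in degrees to a [wedge, wedge] pair.

    Assumption: each wedge half-angle is 90 degrees minus the absolute
    value of the corresponding tilt limit.
    """
    return [90 - abs(min_tilt_deg), 90 - abs(max_tilt_deg)]


# A [-60, +48] tilt range gives the values used in config.py.
print(missing_wedge_from_tilt_range(-60, 48))
```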
## Inputs
In the input folder, you will find the following files:
1. `s210.test1.wi8Apx.mrc`: Denoised tomogram for visual inspection.
2. `s210.test1.lps25.mrc`: Low-pass filtered tomogram for template matching.
3. `s210.pick1.csv`: Picked particles, which can be manually selected or generated using AI-based pickers (such as DeepFinder).
4. `s210_0000000_ctf_8.00A.mrc`: CTF file for the tomogram, which can be obtained from Warp or Relion (optional).
5. `mono-8Apx-lps30-box30-rot12-core.mrc`: Mononucleosome template file.
6. `mask-mono-8Apx-lps30-box30-rot12-ex2-s2.mrc`: Soft mask for the template (optional).
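Before running the pipeline, it can help to confirm that these files are in place. The following sketch builds the expected file names from the tomogram ID, following the naming pattern of the list above, and reports any that are missing. The helper names are hypothetical, and the assumption that other tomograms (such as "s68") follow the same naming pattern is not confirmed by the source.

```python
import os


def expected_inputs(tomo_id, input_dir="inputs"):
    """Return the input paths named in the tutorial for one tomogram."""
    names = [
        f"{tomo_id}.test1.wi8Apx.mrc",                  # denoised tomogram
        f"{tomo_id}.test1.lps25.mrc",                   # low-pass filtered tomogram
        f"{tomo_id}.pick1.csv",                         # picked particles
        f"{tomo_id}_0000000_ctf_8.00A.mrc",             # CTF file (optional)
        "mono-8Apx-lps30-box30-rot12-core.mrc",         # template
        "mask-mono-8Apx-lps30-box30-rot12-ex2-s2.mrc",  # soft mask (optional)
    ]
    return [os.path.join(input_dir, name) for name in names]


def missing_inputs(tomo_id, input_dir="inputs"):
    """Return the expected input paths that do not exist on disk."""
    return [p for p in expected_inputs(tomo_id, input_dir) if not os.path.isfile(p)]


if __name__ == "__main__":
    for path in missing_inputs("s210"):
        print("missing:", path)
```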
## Run the CATM pipeline
```
cd test/s210/catm
```
Then simply run `catm`:
```
catm
```
This will take 1-5 minutes, depending on the number of available CPUs. If the program runs successfully, it will write the results in several formats, which we examine below.
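The result files included in this commit list the same particles with two sets of Euler angles: `phi, theta, psi` in the CSV and XML files, and `_rlnAngleRot, _rlnAngleTilt, _rlnAnglePsi` in the STAR file. Comparing the rows suggests the two are related by a 90° shift of the first and last angles, which is the usual relation between ZXZ and ZYZ conventions. The sketch below encodes that observed relation; it is inferred from the sample data, not taken from the package code, and the function names are hypothetical.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def zxz_to_zyz(phi, theta, psi):
    """Convert ZXZ Euler angles to ZYZ (rot, tilt, psi), all in degrees.

    Assumption inferred from the sample results:
    rot = phi - 90, tilt = theta, psi_zyz = psi + 90.
    """
    return (wrap_degrees(phi - 90.0), theta, wrap_degrees(psi + 90.0))


# First row of the results CSV: phi=166, theta=84, psi=96.
print(zxz_to_zyz(166.0, 84.0, 96.0))
```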
## Run the TM pipeline
```
cd ../tm
```
Simply run the `tm` command; this should take only a few seconds given the tiny size of the tomogram:
```
tm
```
After template matching, you can run a postprocessing script to filter particles by distance and remove clashing particles. The results will be mapped back to the tomogram. You can also specify the cutoff for cross-correlation coefficients, which by default are set to 0.2 and 0.3.
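To make the idea of distance-based cleaning concrete, here is a minimal sketch of one common approach: sort the candidates by cross-correlation coefficient, drop those below a cutoff, and greedily keep each candidate only if it is far enough from every particle already kept. This is an illustration under stated assumptions, not the logic of `plot_and_clean_results.py`; the function name, the `min_distance` parameter, and the dictionary layout are hypothetical.

```python
import math


def clean_by_distance(particles, min_distance, min_ccc=0.0):
    """Greedy distance-based cleaning of template matching hits.

    particles: list of dicts with keys "x", "y", "z", "ccc".
    Keeps the highest-scoring hits that are at least min_distance apart.
    """
    kept = []
    for p in sorted(particles, key=lambda p: p["ccc"], reverse=True):
        if p["ccc"] < min_ccc:
            continue  # below the cross-correlation cutoff
        pos = (p["x"], p["y"], p["z"])
        if all(math.dist(pos, (k["x"], k["y"], k["z"])) >= min_distance for k in kept):
            kept.append(p)
    return kept


hits = [
    {"x": 0, "y": 0, "z": 0, "ccc": 0.5},
    {"x": 1, "y": 0, "z": 0, "ccc": 0.4},   # too close to the first hit
    {"x": 10, "y": 0, "z": 0, "ccc": 0.3},
    {"x": 20, "y": 0, "z": 0, "ccc": 0.1},  # below the cutoff
]
print(clean_by_distance(hits, min_distance=5, min_ccc=0.2))
```

A purely distance-based filter like this cannot tell whether two nearby particles actually overlap, which is the motivation for the separate clash-removal step.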
To clean the particles based solely on distance (traditional method), run:
```
python plot_and_clean_results.py
```
Alternatively, to clean particles by distance and also remove steric clashes, use the following command. The cleaned file will be saved with the `*rc.mrc` suffix:
```
python plot_and_clean_results_remove_clash.py
```
## Examine the results
The easiest way to look at the results is with 3dmod or ChimeraX:
```
cd ..
3dmod catm/inputs/s210.test1.wi8Apx.mrc catm/results/s210.test1.catm.models.mrc tm/results/s210.test1.tm_*.mrc
```
As you may have observed, there are nucleosomes stacked within the crowded tomogram that are accurately assigned by CATM, but not by TM. In the vicinity of the coordinates [25, 10, 17], two nucleosomes are stacked in a face-on view. This is illustrated in the [CATM vs TM movie], where TM produces an incorrect orientation or an incorrect particle count, whereas CATM successfully assigns both nucleosomes. The same holds for the stacked nucleosomes in side-on view around [10, 24, 18] and [7, 19, 12]. Note that the recall of CATM depends on the pre-assigned coordinates, and that particles too close to the edges of the tomogram are excluded.
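Besides visual inspection, the assignments near a location of interest can be listed from a results CSV. The sketch below embeds the rows of the results CSV shown in this commit and selects the particles within a given radius of [25, 10, 17]. The radius of 7 pixels and the helper name `particles_near` are arbitrary choices for illustration.

```python
import csv
import io
import math

# Rows copied from the results CSV included in this commit.
RESULTS_CSV = """x,y,z,phi,theta,psi,ccc,model
25,8,11,166.0,84.0,96.0,0.470693,0
24,7,19,6.0,84.0,-54.0,0.466697,0
5,17,10,-135.0,39.0,-179.0,0.506189,0
33,24,30,67.0,119.0,-83.0,0.478312,0
32,14,28,-85.0,50.0,-35.0,0.45916,0
16,31,22,3.0,62.0,115.0,0.450286,0
11,33,27,124.0,70.0,-125.0,0.411558,0
8,22,19,-111.0,92.0,126.0,0.396037,0
"""


def particles_near(rows, center, radius):
    """Return the rows whose (x, y, z) lies within radius of center."""
    hits = []
    for row in rows:
        pos = (float(row["x"]), float(row["y"]), float(row["z"]))
        if math.dist(pos, center) <= radius:
            hits.append(row)
    return hits


rows = list(csv.DictReader(io.StringIO(RESULTS_CSV)))
for row in particles_near(rows, (25, 10, 17), radius=7):
    print(row["x"], row["y"], row["z"], row["ccc"])
```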
The angles for template matching are generated by random sampling on a 3D sphere, so you can expect small differences in the results between runs.
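To illustrate why results vary between runs, here is a minimal sketch of drawing random orientations whose axes are uniformly distributed on the sphere. It is not the sampler used by the package; the function name and the `seed` parameter are hypothetical. Drawing `cos(theta)` uniformly, rather than `theta` itself, avoids oversampling the poles.

```python
import math
import random


def random_orientations(n, seed=None):
    """Draw n random (phi, theta, psi) Euler angle triples in degrees.

    phi and psi are uniform in [-180, 180]; theta is drawn so that
    cos(theta) is uniform in [-1, 1], giving uniform coverage of the sphere.
    """
    rng = random.Random(seed)
    angles = []
    for _ in range(n):
        phi = rng.uniform(-180.0, 180.0)
        theta = math.degrees(math.acos(rng.uniform(-1.0, 1.0)))
        psi = rng.uniform(-180.0, 180.0)
        angles.append((phi, theta, psi))
    return angles


# 2000 matches number_of_angles in config.py; a fixed seed makes runs repeatable.
angles = random_orientations(2000, seed=0)
print(len(angles))
```

With `seed=None`, each run draws a different set of angles, which is one way small run-to-run differences of the kind described above can arise.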
\ No newline at end of file
[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
[project]
name = "catm" # Context-Aware Template Matching
version = "0.1.0"
description = "A software for template matching and clash resolving in cryo-ET"
authors = [{name = "Huabin Zhou", email = "huabin.zhou@utsouthwestern.edu"}]
readme = "README.md"
license = {file = "LICENSE.md"}
classifiers = [
"Intended Audience :: Science/Research",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Topic :: Cryo-EM Data Processing",
"Operating System :: OS Independent",
]
requires-python = ">=3.8"
dependencies = [ "numpy", "pandas", "mrcfile", "scikit-learn", "scipy", "lxml","starfile","tqdm","matplotlib"]
[project.scripts]
catm = "catm:main"
tm = "catm:main" #TODO:take the tm function in separated file
[project.optional-dependencies]
stats = [
"scipy>=1.7",
"statsmodels>=0.12",
]
dev = [
"vg",
"networkx",
"black",
"pytest",
"pycm",
"pytest-cov",
"pytest-xdist",
"mypy",
"pandas-stubs",
"pre-commit",
"flit",
"mpld3",
"jupyter",
]
docs = [
"numpydoc",
"nbconvert",
"ipykernel",
"sphinx<6.0.0",
"sphinx-copybutton",
"sphinx-issues",
"sphinx-design",
"pyyaml",
"pydata_sphinx_theme==0.10.0rc2",
]
[project.urls]
Source = "https://github.com/mwaskom/seaborn"
Docs = "http://seaborn.pydata.org"
\ No newline at end of file
# config.py
import os
"""
Inputs and configuration:
1. The tomogram, which should be black on a white background.
2. The templates, which need to be white on a black background, consistent with the usual convention.
3. The coordinates of the picked particles.
4. The missing wedge information; defaults to [30, 42], corresponding to a [-60, +48] tilt range.
5. The shrinkage factor; defaults to 0.3. This controls the contours of the template.
6. Further high-level parameters, such as the search depth.
"""
############################ input && output control ##################################
current_dir = os.getcwd()
split_path = current_dir.split("/")
# The tomogram ID is the parent directory name, e.g. ".../s210/catm" gives "s210"
tomoID = split_path[-2]
prefix = tomoID + ".test1.catm"
input_folder = "inputs/"
tomogram = input_folder + tomoID + ".test1.lps25.mrc"
templates = [input_folder + "mono-8Apx-lps30-box30-rot12-core.mrc"]
contour_level = [0.2]
# Masks are required to be the same shape as the templates
# One can create the mask in Relion with soft edges; extend 2 and soft 3 are recommended
masks = [
input_folder + "mask-mono-8Apx-lps30-box30-rot12-ex2-s2.mrc",
]
# Coordinates of picked particles; you may also include the angles for local search
df = input_folder + tomoID + ".pick1.csv" #'../cluster/s'+str(tomoID)+'.unetn_4.5.csv'
# We need the missing wedge information or a CTF model
# You might create a CTF model via Relion or Warp, which should be generic for data collected with the microscope
# If a CTF model is present, it will be used by default
ctf_model_file = (
input_folder + tomoID + "_0000000_ctf_8.00A.mrc"
) # path to the CTF model
# output path
output_path = "results/"
write_models = True # Generate a volume with assigned models
mpi_nn = -1 # -1 uses all available CPUs
############################ Template Matching Control ##################################
# Missing wedge information, used if no CTF model is found
missing_wedge = [30, 42] # [30, 42] corresponds to a [-60, +48] tilt range
# Shrinkage factor, related to the contour level of the volume, which can be determined in Chimera
# It is designed to control how close two objects are allowed to be
# Only template 1 will be used for the cleaning of clashes
shrinkage_factor = 1.1
# Number of angles for global template matching
number_of_angles = 2000
# The minimum CCCs kept after template matching
min_CCC = 0.1
# Range of local search angles, only for local search; defaults to False for global search
# For local search, the rough angles need to be provided in the coords.csv,
# with columns phi, theta, psi in intrinsic ZXZ convention
local_search_angles = False # False or True
# searching space
matching_space = 3 # in pixels, how far from the original position to search
############################ Clash Resolver Control ##################################
# Search depth: controls how many rotations are searched
search_depth = 300
# for development
bypass_TM = False
bypass_CR = False
sort_score = True # this is useful if you don't want to mix up the index
############################ For Development ##################################
pre_assigned_volume = (
None #'s'+tomoID+'-pre-assigned-vol.mrc' # path to the pre-assigned volume
)
bypass_optimizer = False # this will bypass the optimizer clash resolver
distance_tolerance = 1
adjust_ccc = None
# max distance between two particles to be considered as the same particle
# options for running only the general template matching
testTM = False
chunk_size = None # [304,776,776]#[152,194,194]
trimvol -x 386,425 -y 413,452 -z 75,114 ../../../cat-box30-linker/s210.lps25.mrc s210.test1.lps25.mrc
File added
File added
x,y,z
5,20,11
7,24,19
26,8,13
24,8,18
32,14,26
33,25,27
18,31,19
14,32,30
File added
File added
File added
This source diff could not be displayed because it is too large. You can view the blob instead.
25 8 11 75.99999999999999 84.00000000000001 -174.00000000000003
24 7 19 -84.0 84.0 36.000000000000014
5 17 10 135.0 38.99999999999999 -89.00000000000001
33 24 30 -23.000000000000007 119.0 6.999999999999997
32 14 28 -175.0 50.00000000000001 55.00000000000003
16 31 22 -87.00000000000001 62.0 -155.0
11 33 27 33.999999999999986 70.0 -35.000000000000014
8 22 19 159.0 92.0 -144.00000000000003
x,y,z,phi,theta,psi,ccc,model
25,8,11,166.0,84.0,96.0,0.470693,0
24,7,19,6.0,84.0,-54.0,0.466697,0
5,17,10,-135.0,39.0,-179.0,0.506189,0
33,24,30,67.0,119.0,-83.0,0.478312,0
32,14,28,-85.0,50.0,-35.0,0.45916,0
16,31,22,3.0,62.0,115.0,0.450286,0
11,33,27,124.0,70.0,-125.0,0.411558,0
8,22,19,-111.0,92.0,126.0,0.396037,0
File added
# Created by the starfile Python package (version 0.5.1) at 05:20:50 on 11/10/2024
data_
loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
_rlnCoordinateZ #3
_rlnAngleRot #4
_rlnAngleTilt #5
_rlnAnglePsi #6
_ccc #7
25 8 11 76.000000 84.000000 -174.000000 0.470693
24 7 19 -84.000000 84.000000 36.000000 0.466697
5 17 10 135.000000 39.000000 -89.000000 0.506189
33 24 30 -23.000000 119.000000 7.000000 0.478312
32 14 28 -175.000000 50.000000 55.000000 0.45916
16 31 22 -87.000000 62.000000 -155.000000 0.450286
11 33 27 34.000000 70.000000 -35.000000 0.411558
8 22 19 159.000000 92.000000 -144.000000 0.396037
<objlist>
<subtomo>
<object subtomo_idx="0" x="25" y="8" z="11" phi="166" theta="84" psi="96" CCC="0.470693" model="0.000000"/>
</subtomo>
<subtomo>
<object subtomo_idx="1" x="24" y="7" z="19" phi="6" theta="84" psi="-54" CCC="0.466697" model="0.000000"/>
</subtomo>
<subtomo>
<object subtomo_idx="2" x="5" y="17" z="10" phi="-135" theta="39" psi="-179" CCC="0.506189" model="0.000000"/>
</subtomo>
<subtomo>
<object subtomo_idx="3" x="33" y="24" z="30" phi="67" theta="119" psi="-83" CCC="0.478312" model="0.000000"/>
</subtomo>
<subtomo>
<object subtomo_idx="4" x="32" y="14" z="28" phi="-85" theta="50" psi="-35" CCC="0.459160" model="0.000000"/>
</subtomo>
<subtomo>
<object subtomo_idx="5" x="16" y="31" z="22" phi="3" theta="62" psi="115" CCC="0.450286" model="0.000000"/>
</subtomo>
<subtomo>
<object subtomo_idx="6" x="11" y="33" z="27" phi="124" theta="70" psi="-125" CCC="0.411558" model="0.000000"/>
</subtomo>
<subtomo>
<object subtomo_idx="7" x="8" y="22" z="19" phi="-111" theta="92" psi="126" CCC="0.396037" model="0.000000"/>
</subtomo>
</objlist>
# config.py
import os
"""
Inputs and configuration:
1. The tomogram, which should be black on a white background.
2. The templates, which need to be white on a black background, consistent with the usual convention.
3. The coordinates of the picked particles.
4. The missing wedge information; defaults to [30, 42], corresponding to a [-60, +48] tilt range.
5. The shrinkage factor; defaults to 0.3. This controls the contours of the template.
6. Further high-level parameters, such as the search depth.
"""
############################ input && output control ##################################
current_dir = os.getcwd()
split_path = current_dir.split("/")
# The tomogram ID is the parent directory name, e.g. ".../s210/tm" gives "s210"
tomoID = split_path[-2]
prefix = tomoID + ".test1"
input_folder = "inputs/"
tomogram = input_folder + tomoID + ".test1.lps25.mrc"
templates = [input_folder + "mono-8Apx-lps30-box30-rot12-core.mrc"]
contour_level = [0.2]
# Masks are required to be the same shape as the templates
# One can create the mask in Relion with soft edges; extend 2 and soft 3 are recommended
masks = [
input_folder + "mask-mono-8Apx-lps30-box30-rot12-ex2-s2.mrc",
]
# Coordinates of picked particles; you may also include the angles for local search
df = None # input_folder+'s68.pick1.csv'#'../cluster/s'+str(tomoID)+'.unetn_4.5.csv'
# We need the missing wedge information or a CTF model
# You might create a CTF model via Relion or Warp, which should be generic for data collected with the microscope
# If a CTF model is present, it will be used by default
ctf_model_file = (
input_folder + tomoID + "_0000000_ctf_8.00A.mrc"
) # path to the CTF model
# output path
output_path = "results/"
write_models = True # Generate a volume with assigned models
mpi_nn = -1 # -1 uses all available CPUs
############################ Template Matching Control ##################################
# Missing wedge information, used if no CTF model is found
missing_wedge = [30, 42] # [30, 42] corresponds to a [-60, +48] tilt range
# Shrinkage factor, related to the contour level of the volume, which can be determined in Chimera
# It is designed to control how close two objects are allowed to be
# Only template 1 will be used for the cleaning of clashes
shrinkage_factor = 1
# Number of angles for global template matching
number_of_angles = 2000
# The minimum CCCs kept after template matching
min_CCC = 0.1
# Range of local search angles, only for local search; defaults to False for global search
# For local search, the rough angles need to be provided in the coords.csv,
# with columns phi, theta, psi in intrinsic ZXZ convention
local_search_angles = False # False or True
# searching space
matching_space = 3 # in pixels, how far from the original position to search
############################ Clash Resolver Control ##################################
# Search depth: controls how many rotations are searched
search_depth = 200
# for development
bypass_TM = True
bypass_CR = False
sort_score = True # this is useful if you don't want to mix up the index
############################ For Development ##################################
pre_assigned_volume = (
None #'s'+tomoID+'-pre-assigned-vol.mrc' # path to the pre-assigned volume
)
bypass_optimizer = False # this will bypass the optimizer clash resolver
distance_tolerance = 2
adjust_ccc = None
# max distance between two particles to be considered as the same particle
# options for running only the general template matching
testTM = True
chunk_size = None # [304,776,776]#[152,194,194]
File added
File added