Vishruth Mullapudi authored bf562abe

Overview

Collected here is a set of scripts and analysis tools that make it easier to use Rosetta for large mutagenesis scans, such as alanine scanning of a protein structure (e.g., an amyloid fibril) or saturation mutagenesis. Included are protocols for both automated point-mutagenesis scans and pre-specified mutagenesis (i.e., via Rosetta resfiles), computing both the interface ddG and the total deltaREU of a structure, with and without backrub sampling.

Protocol captures detailing a run on a sample structure can be found in each protocol's Usage.md file. Portions of this project, specifically the RosettaScripts protocols and analysis scripts, are built on top of the Rosetta Flex ddG protocol as described by Barlow et al.

The first two variants leverage the Rosetta InterfaceDdgMover to calculate both the interface ddG of mutation as well as the total deltaREU of mutation, utilizing backrub sampling to improve sampling of backbone and sidechain conformational space.

  • The mutation scan protocol sequentially mutates every residue in the indicated chains to a given amino acid, using backrub sampling to calculate both the deltaREU of mutation and the interface ddG, and provides scripts to analyze the results and extract the resulting structures.
    • This method is useful for mutagenesis scans of both globular proteins and fibrils or other symmetric structures, since each mutation can be applied to a single chain or simultaneously to multiple homologous chains.
  • The mutation set protocol uses the same underlying Rosetta protocol, but trades the convenience of automatically mutating each residue for the flexibility of custom Rosetta resfiles, so that multiple simultaneous mutations can be modeled.
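To illustrate what the mutation scan protocol iterates over, the sketch below enumerates the positions of a point-mutation scan, either one chain at a time or with the same residue number mutated across several homologous chains at once. `scan_groups` is a hypothetical helper, not part of this repository, and its input (a mapping from chain ID to residue numbers) is an assumption; in practice those numbers would be read from the input PDB.

```python
from typing import Dict, List, Sequence, Tuple

# One job's worth of positions: (chain ID, residue number) pairs mutated together.
MutationGroup = List[Tuple[str, int]]


def scan_groups(
    residues_by_chain: Dict[str, Sequence[int]],
    chains: Sequence[str],
    simultaneous: bool = False,
) -> List[MutationGroup]:
    """Enumerate the positions visited by a point-mutation scan (hypothetical helper).

    With simultaneous=True, the same residue number is mutated in every listed
    chain at once (e.g., homologous chains of a fibril); otherwise each chain
    is scanned one residue at a time.
    """
    if simultaneous:
        # Only residue numbers present in every selected chain can be mutated together.
        shared = set(residues_by_chain[chains[0]])
        for chain in chains[1:]:
            shared &= set(residues_by_chain[chain])
        return [[(chain, resnum) for chain in chains] for resnum in sorted(shared)]
    # Independent scan: one single-position group per residue, chain by chain.
    return [[(chain, resnum)] for chain in chains for resnum in residues_by_chain[chain]]


if __name__ == "__main__":
    fibril = {"A": [1, 2, 3], "B": [1, 2, 3]}
    for group in scan_groups(fibril, ["A", "B"], simultaneous=True):
        print(group)
```

Each returned group would correspond to one mutant job; taking the intersection of residue numbers avoids requesting a simultaneous mutation at a position missing from one of the chains.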

The third and fourth variants of this protocol run faster by forgoing the InterfaceDdgMover and instead directly minimizing a mutant and a wildtype structure. They report only the total deltaREU of mutation, not the interface ddG, in exchange for reduced runtime.
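The two quantities compare different things: total deltaREU compares whole-structure energies of mutant and wildtype, while interface ddG compares their binding energies, each defined as the energy of the complex minus that of the unbound partners. The minimal sketch below shows only the arithmetic, assuming per-replicate total energies (in REU) have already been extracted. The function names are hypothetical, and averaging with a simple mean over paired replicates is an assumption that may differ from how the analysis scripts aggregate results. Positive values indicate a destabilizing mutation, since lower Rosetta energies are more favorable.

```python
from statistics import mean
from typing import Sequence


def delta_reu(mutant_totals: Sequence[float], wildtype_totals: Sequence[float]) -> float:
    """Total deltaREU: mean over paired replicates of E(mutant) - E(wildtype)."""
    if len(mutant_totals) != len(wildtype_totals):
        raise ValueError("mutant and wildtype replicates must be paired")
    return mean(m - w for m, w in zip(mutant_totals, wildtype_totals))


def interface_ddg(
    mut_complex: float, mut_unbound: float, wt_complex: float, wt_unbound: float
) -> float:
    """Interface ddG: dG_bind(mutant) - dG_bind(wildtype),
    where dG_bind = E(complex) - E(unbound partners)."""
    return (mut_complex - mut_unbound) - (wt_complex - wt_unbound)
```

Only the interface ddG needs the unbound-state energies, which is one reason the faster variants can skip the InterfaceDdgMover when that quantity is not required.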

Software Requirements

This code requires a working installation of Rosetta with RosettaScripts functionality.

  • This code has been tested and found to work with Rosetta versions 3.11-3.13 on RHEL Server 7, Ubuntu 20.04 and 22.04, and WSL Ubuntu 20.04. It should also work on macOS but has not been extensively tested there.

Python Dependencies

  • An environment file that can be imported into conda is provided. The following packages are required, along with Python 3.8.10:
biopython=1.78
pandas=1.2.4
toml=0.10.2
tqdm=4.59.0

Other library and Python3 versions may also work, but the code has not been tested with them.

Installation Guide

Simply clone this repository into the desired folder, e.g.

git clone https://git.biohpc.swmed.edu/s184069/flex_ddg_ala_scn_runner.git
cd flex_ddg_ala_scn_runner

If using conda, the environment can be installed with:

conda env create -f ddg_runner_spec.yml

The config.toml file for the protocol you wish to run will then need to be edited (details can be found in that protocol's Usage.md file). If Rosetta is already installed, total install time is under 5 minutes, depending on download times for the Python packages and the repository.

Workflow

This project is intended to offer a consistent workflow regardless of which protocol is used. The steps are generally as follows:

  1. Optionally preminimize the input structure in Rosetta using the fa_talaris_2014 scorefunction (or another scorefunction, if you change the Rosetta XML protocol script accordingly). Whichever scorefunction you choose, preminimize with the same one used for the ddG runs.
  2. If using a resfile-based protocol (i.e., any protocol other than mutation_scan_interface_ddg), generate resfiles defining the desired mutations.
  3. Edit the protocol's .toml file to set the run parameters (details in the protocol's Usage.md file).
  4. Run the protocol using `[ddg python executable].py` with the appropriate command-line flags (detailed in Usage.md).
  5. Run the protocol's analysis script.
  6. Optionally extract the structures for inspection or further analysis (e.g., RMSD calculations).
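For step 2, a resfile can be generated with a few lines of Python. This is a minimal sketch: `write_resfile` is a hypothetical helper, and the `NATRO` default header is an assumption, so check the protocol's Usage.md for the header its XML expects. Each body line follows the standard resfile form `<residue number> <chain> PIKAA <amino acid>`, and listing several positions in one file requests simultaneous mutations.

```python
from typing import Iterable, Tuple

CANONICAL_AA = "ACDEFGHIKLMNPQRSTVWY"


def write_resfile(mutations: Iterable[Tuple[int, str, str]], default: str = "NATRO") -> str:
    """Build resfile text mutating each (residue number, chain, one-letter aa) triple.

    `default` is the header behavior for residues not listed below `start`
    (NATRO keeps native rotamers; NATAA allows repacking at native identity).
    """
    lines = [default, "start"]
    for resnum, chain, aa in mutations:
        if len(aa) != 1 or aa.upper() not in CANONICAL_AA:
            raise ValueError(f"not a canonical one-letter amino acid code: {aa!r}")
        # PIKAA restricts this position to the listed amino acid.
        lines.append(f"{resnum} {chain} PIKAA {aa.upper()}")
    return "\n".join(lines) + "\n"
```

For example, `Path("A26A_B26A.resfile").write_text(write_resfile([(26, "A", "A"), (26, "B", "A")]))` would describe a double mutant to alanine at residue 26 of chains A and B (the file name is illustrative).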

Further Directions

  • A major improvement would be to adapt this work to leverage the fibril symmetry during the Rosetta relax and mutation process. This should improve runtimes as well as avoid edge effects that occur when making mutations at the end of a fibril model.
  • This program launches many Rosetta processes in parallel, each of which writes to its own output database. When performing alanine scans over large numbers of residues and/or replicates, this significantly slows down data analysis, because each ddG output database has to be opened and queried separately. Ideally, multiple processes would write output to a single shared database (or a small set of shared databases), reducing the number of file I/O operations, which are slow and incur additional latency, especially on compute-cluster filesystems.
  • Analysis of subsets of fibril residues in energetics calculations could be improved. Preliminary work has been done to support this, but analyzing, e.g., only the center chains of a fibril stack (to avoid edge effects) or per-residue energies is not currently as easy as it could be.
  • Refactor the code shared between method variants into a common library to reduce duplication.
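Short of changing how the Rosetta processes write output, a post-hoc consolidation step could reduce the cost of repeatedly opening many databases during analysis. Note that this is a different technique from the concurrent shared-database writes proposed above. The sketch assumes the per-run outputs are SQLite files sharing a schema, and copies one table (the table name is a placeholder, not a known Rosetta table) from each into a single combined file with a provenance column. `merge_tables` is hypothetical and not part of this repository.

```python
import sqlite3
from pathlib import Path
from typing import Iterable


def merge_tables(sources: Iterable[Path], dest: Path, table: str) -> int:
    """Copy `table` from each per-run SQLite database into one combined database.

    A `source_db` column records which file each row came from. Assumes every
    source defines `table` with the same columns. `table` is interpolated into
    SQL directly, so it must be a trusted identifier. Returns rows copied.
    """
    out = sqlite3.connect(dest)
    total = 0
    try:
        for i, src in enumerate(sources):
            # ATTACH must run outside a transaction; each loop iteration commits first.
            out.execute("ATTACH DATABASE ? AS src", (str(src),))
            if i == 0:
                # Create an empty combined table: provenance column plus the source schema.
                out.execute(
                    f"CREATE TABLE IF NOT EXISTS {table} AS "
                    f"SELECT '' AS source_db, * FROM src.{table} WHERE 0"
                )
            cur = out.execute(
                f"INSERT INTO {table} SELECT ? AS source_db, * FROM src.{table}",
                (str(src),),
            )
            total += cur.rowcount
            out.commit()
            out.execute("DETACH DATABASE src")
    finally:
        out.close()
    return total
```

After the merge, the analysis step would open one file and filter on `source_db` instead of opening every per-run database.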

Copyright (c) 2022 The University of Texas Southwestern Medical Center.

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted for academic research use only (subject to the limitations in the disclaimer below) provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

ANY USE OR REDISTRIBUTION OF THIS SOFTWARE FOR COMMERCIAL PURPOSES, WHETHER IN SOURCE OR BINARY FORM, WITH OR WITHOUT MODIFICATION, IS EXPRESSLY PROHIBITED; ANY USE OR REDISTRIBUTION BY A FOR-PROFIT ENTITY SHALL COMPRISE USE OR REDISTRIBUTION FOR COMMERCIAL PURPOSES.

NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY'S PATENT RIGHTS ARE GRANTED BY THIS LICENSE. THIS SOFTWARE, AND ANY ACCOMPANYING DOCUMENTATION, IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE OR ANY OF ITS ACCOMPANYING DOCUMENTATION, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.