Overview

Collected here is a set of scripts and analysis tools that facilitate using Rosetta for large mutagenesis scans, such as alanine scanning of a protein structure (e.g. amyloid fibrils) or saturation mutagenesis. Included are protocols for both automated point mutagenesis scans and pre-specified mutagenesis of structures (i.e., via Rosetta resfiles), covering both the interface ddG and the total deltaREU of a structure, with and without backrub sampling.

Protocol captures detailing a run on a sample structure can be found in each protocol's Usage.md file. Portions of this project, specifically the RosettaScripts protocols and analysis scripts, are built on top of the Rosetta Flex ddG protocol as described by Barlow et al.

The first two variants leverage the Rosetta InterfaceDdgMover to calculate both the interface ddG of mutation as well as the total deltaREU of mutation, utilizing backrub sampling to improve sampling of backbone and sidechain conformational space.

  • The mutation scan protocol contains a method to sequentially mutate all residues in indicated chains to a given amino acid using backrub sampling to calculate the deltaREU of mutation as well as the interface ddG, and analyze and extract those structures.
    • This method is useful for conducting mutagenesis scans on both globular proteins and fibril or other symmetric structures, as either one chain or multiple homologous chains can be mutated simultaneously.
  • The mutation set protocol uses the same underlying Rosetta protocol, but trades the convenience of automatically mutating each residue for the flexibility of specifying custom Rosetta resfiles, which allows multiple simultaneous mutations to be handled.
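As a rough illustration of how a scan over one or several homologous chains might enumerate its targets, the sketch below groups residues by position across the selected chains, so that each position can be mutated in all of them at once. It is not taken from this repository: the function names are hypothetical, it reads only standard fixed-column PDB ATOM records, and it assumes that homologous chains share residue numbering.

```python
def atom_line(serial, name, resname, chain, resseq):
    """Format a minimal fixed-column PDB ATOM record (demo helper only)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, resname, chain, resseq, 0.0, 0.0, 0.0)


def scan_positions(pdb_lines, chains):
    """Map each residue position to the (chain, residue name) pairs found there.

    Keys are (residue number, insertion code). Assumption: homologous chains
    use the same residue numbering, so one key covers every selected chain.
    """
    positions = {}
    seen = set()
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]                      # PDB column 22: chain ID
        if chain not in chains:
            continue
        # PDB columns 23-26: residue number; column 27: insertion code
        key = (int(line[22:26]), line[26:27].strip())
        if (chain, key) in seen:              # one entry per residue, not per atom
            continue
        seen.add((chain, key))
        positions.setdefault(key, []).append((chain, line[17:20].strip()))
    return positions


# Made-up three-chain input; only chains A and B are selected for the scan.
demo = [
    atom_line(1, "CA", "GLY", "A", 1),
    atom_line(2, "CA", "GLY", "B", 1),
    atom_line(3, "CA", "SER", "A", 2),
    atom_line(4, "CA", "SER", "C", 2),
]
for (resseq, icode), members in scan_positions(demo, {"A", "B"}).items():
    print(resseq, icode, members)
```

Each key then corresponds to one step of the scan, with every listed chain mutated to the target amino acid together.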

The third and fourth variants of this protocol allow for faster runs by replacing the InterfaceDdgMover with manual minimization of the mutant and wildtype structures. These variants report only the total deltaREU of mutation, not the interface ddG, in exchange for reduced runtime.
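The two reported quantities differ in what they subtract. The sketch below is a minimal illustration of the arithmetic only, using the conventional definitions (mutant minus wildtype); it is not the analysis code from this repository, the function names are hypothetical, and the example scores are made up.

```python
from statistics import mean


def total_delta_reu(wt_total, mut_total):
    """Change in total score (REU) of the whole structure upon mutation."""
    return mut_total - wt_total


def interface_ddg(wt_complex, wt_unbound, mut_complex, mut_unbound):
    """Change in binding energy upon mutation.

    The "unbound" score is the score of the separated partners. Under this
    convention, positive values indicate that the mutation weakens the interface.
    """
    dg_wt = wt_complex - wt_unbound
    dg_mut = mut_complex - mut_unbound
    return dg_mut - dg_wt


# Made-up scores for three replicates of one mutation, in REU:
# (wt_complex, wt_unbound, mut_complex, mut_unbound)
replicates = [
    (-100.0, -80.0, -98.0, -80.5),
    (-101.0, -80.5, -99.5, -81.0),
    (-99.5, -79.5, -97.0, -80.0),
]
ddgs = [interface_ddg(*scores) for scores in replicates]
print("mean interface ddG over replicates:", mean(ddgs))
```

Averaging over replicates, as in the last lines, is one common way to summarize repeated runs; other summaries are equally possible.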

Software Requirements

This code uses Rosetta; a working installation of Rosetta with RosettaScripts functionality is required.

  • This code has been tested and found to work with Rosetta versions 3.11-3.13 on RHEL Server 7, Ubuntu 20.04 and 22.04, and WSL Ubuntu 20.04. It should also work on macOS but has not been extensively tested on that platform.

Python Dependencies

  • An environment file that can be imported into conda is provided. The following packages are required, along with Python 3.8.10:
biopython=1.78
pandas=1.2.4
toml=0.10.2
tqdm=4.59.0

Other library and Python3 versions may also work, but the code has not been tested with them.
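When running outside the provided conda environment, it can help to compare the installed package versions against the tested ones. The following is a small optional sketch using only the standard library (importlib.metadata, available from Python 3.8); it is not part of this repository, and the check_versions helper is a hypothetical name.

```python
from importlib import metadata

# Versions listed above as tested.
PINNED = {
    "biopython": "1.78",
    "pandas": "1.2.4",
    "toml": "0.10.2",
    "tqdm": "4.59.0",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: (tested version, installed version or None, match?)}."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, ok) in check_versions(PINNED).items():
    status = "ok" if ok else "differs from tested version"
    print(f"{name}: tested {wanted}, installed {found} ({status})")
```

A mismatch does not necessarily mean the code will fail, only that the combination has not been tested.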

Installation Guide

Simply clone this repository into the desired folder, e.g.

git clone https://git.biohpc.swmed.edu/s184069/flex_ddg_ala_scn_runner.git
cd flex_ddg_ala_scn_runner

If using conda, the environment can be installed using

conda env create -f ddg_runner_spec.yml

The config.toml file for the protocol you wish to run will then need to be edited (details can be found in that protocol's Usage.md file). Total install time, if Rosetta is already installed, is under 5 minutes, depending on download times for the Python packages and the repository.

Workflow

This project is intended to offer a consistent workflow regardless of which protocol is used. The steps are generally as follows:

  1. Optionally preminimize the input structure in Rosetta using the fa_talaris_2014 scorefunction. If you change the Rosetta XML protocol script to use another scorefunction, preminimize with that same scorefunction, so that preminimization and the ddG runs are scored consistently.
  2. If working with a resfile-based protocol, i.e. any protocol other than mutation_scan_interface_ddg, generate resfiles defining the desired mutations.
  3. Edit the protocol's respective .toml file to change the run parameters (details in the protocol's respective Usage.md file)
  4. Run the protocol using '[ddg python executable].py' with the appropriate command line flags (detailed in Usage.md)
  5. Run the protocol's analysis script
  6. Optionally extract the structures for inspection/further analysis (e.g. RMSD calculations)
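For step 2, resfiles can be written by hand or generated. The sketch below shows one way to build resfile text for a set of simultaneous point mutations using standard Rosetta resfile syntax (a default command, a "start" line, then one "residue chain PIKAA amino-acid" line per mutation). The build_resfile function is hypothetical and not part of this repository, and the NATAA default header is an assumption; check the protocol's Usage.md for the header it expects.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_resfile(mutations, default="NATAA"):
    """Return resfile text for (residue number, chain, one-letter aa) tuples.

    `default` is the command applied to every residue not listed explicitly;
    NATAA here is an assumed choice, not a setting confirmed by this project.
    """
    lines = [default, "start"]
    for resnum, chain, aa in mutations:
        aa = aa.upper()
        if len(aa) != 1 or aa not in AMINO_ACIDS:
            raise ValueError(f"not a one-letter amino acid code: {aa!r}")
        lines.append(f"{resnum} {chain} PIKAA {aa}")
    return "\n".join(lines) + "\n"


# Mutate residue 10 to alanine in two homologous chains at once.
print(build_resfile([(10, "A", "A"), (10, "B", "A")]), end="")
```

The returned string can be written to a file and referenced from the protocol's .toml configuration in whatever way that protocol's Usage.md describes.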

Further Directions

  • A major improvement would be to adapt this work to leverage the fibril symmetry during the Rosetta relax and mutation process. This should improve runtimes as well as avoid edge effects that occur when making mutations at the end of a fibril model.
  • This program launches many Rosetta processes in parallel, each of which writes to its own output database. When performing alanine scans over large numbers of residues and/or replicates, this significantly slows down analysis, as each ddG output database has to be opened and queried separately. Ideally, multiple processes would write output to a single shared database or set of shared databases, reducing the number of file I/O operations (which can be slow and incur additional latency penalties, especially on compute cluster filesystems).
  • Support for analyzing subsets of fibril residues in energetics calculations could be improved. Preliminary work has been done to support this, but analyzing e.g. only the center chains of a fibril stack to avoid edge effects, or looking at per-residue energies, is not currently as easy as it could be.
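Regarding the database bullet above: short of having processes write to a shared database, one possible stopgap is to consolidate the per-run databases once after the runs finish, so that analysis queries a single file. The sketch below illustrates this post-hoc approach, which is a different technique from the shared-write design proposed above. It assumes the outputs are SQLite files; the merge_tables function and the "scores" table name are hypothetical and do not reflect the actual Rosetta output schema.

```python
import sqlite3


def merge_tables(db_paths, merged_path, table):
    """Copy every row of `table` from each source database into one database.

    A leading source_db column records which file each row came from.
    Returns the open connection to the merged database.
    """
    merged = sqlite3.connect(merged_path)
    created = False
    for path in db_paths:
        src = sqlite3.connect(path)
        try:
            cur = src.execute(f'SELECT * FROM "{table}"')
            cols = [d[0] for d in cur.description]
            if not created:
                col_defs = ", ".join(f'"{c}"' for c in ["source_db"] + cols)
                merged.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
                created = True
            placeholders = ", ".join("?" for _ in range(len(cols) + 1))
            merged.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})',
                ((path, *row) for row in cur),
            )
        finally:
            src.close()
    merged.commit()
    return merged
```

For example, merge_tables(list_of_db_paths, "merged.db3", "scores") would produce one file that the analysis step can open once instead of once per run. This trades a one-time copy for fewer file opens later; it does not remove the per-process writes during the run itself.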