PHENIX Documentation Home
Python-based Hierarchical ENvironment for Integrated Xtallography
PHENIX Documentation - version 1.4
1. Introduction to PHENIX
   a. What is PHENIX
   b. Installation
   c. How to set up your environment to use PHENIX
   d. Running PHENIX
   e. The PHENIX Graphical User Interface
   f. Main PHENIX Modules
   g. FAQS: Frequently asked questions
2. The PHENIX Wizards for Automation
   a. Using the PHENIX Wizards
   b. Automated Structure Solution using AutoSol
   c. Automated Molecular Replacement using AutoMR
   d. Automated Model Building and Rebuilding using AutoBuild
   e. Automated Ligand Fitting using LigandFit
3. Tools for analysing and manipulating experimental data in PHENIX
   a. Data quality assessment with phenix.xtriage
   b. Data quality assessment with phenix.reflection_statistics
   c. Structure factor file inspection and conversions
   d. Manipulating reflection data with phenix.xmanip
   e. Exploring the symmetry of your crystal with phenix.explore_metric_symmetry
4. Tools for substructure determination in PHENIX
   a. Substructure determination with phenix.hyss
   b. Comparison of substructure sites with phenix.emma
5. Tools for structure refinement and restraint generation in PHENIX
   a. Structure refinement with phenix.refine
   b. Determining non-crystallographic symmetry (NCS) from a PDB file with phenix.simple_ncs_from_pdb
   c. Finding and analyzing NCS from heavy-atom sites or a model with phenix.find_ncs
   d. Generating ligand coordinates and restraints using eLBOW
   e. Editing ligand restraints from eLBOW using REEL
   g. Generating hydrogen atoms for refinement using phenix.reduce
6. Other tools in PHENIX
   a. Documentation for the Phaser program
   b. Superimposing PDB files with phenix.superpose_pdbs
   c. Density modification with multi-crystal averaging with phenix.multi_crystal_average
   d. Correlation of map and model with get_cc_mtz_pdb
   e. Correlation of two maps with origin shifts with get_cc_mtz_mtz
   f. Rapid secondary structure fitting to a map with find_helices_strands
   g. PDB model: statistics, manipulations, Fcalc and more with phenix.pdbtools
   h. Running SOLVE/RESOLVE in PHENIX
   i. Automated ligand identification in PHENIX
   j. Finding all the ligands in a map with phenix.find_all_ligands
   k. Map one PDB file close to another using SG symmetry with phenix.map_to_object
7. Useful tools outside of PHENIX
   a. Manual model inspection and building with Coot
   b. MolProbity - An Active Validation Tool
8. PHENIX Examples and Tutorials
   b. Tutorial 1: Solving a structure using SAD data
   c. Tutorial 2: Solving a structure using MAD data
   d. Tutorial 3: Solving a structure using MIR data
   f. Tutorial 5: Solving a structure using Molecular Replacement
   g. Tutorial 6: Automatically rebuilding a structure solved by Molecular Replacement
   h. Tutorial 7: Fitting a flexible ligand into a difference electron density map
   i. Tutorial 8: Structure refinement
   j. Tutorial 9: Refining a structure in the presence of merohedral twinning
   k. Tutorial 10: Generating ligand coordinates and restraints for structure refinement
   l. Tutorial 11: Structure validation using MolProbity
9. Appendix
   a. PHENIX html documentation generation procedures
What is PHENIX
The PHENIX software suite is a highly automated system for macromolecular structure determination that can rapidly arrive at an initial partial model of a structure without significant human intervention, given moderate resolution and good quality data. This achievement has been made possible by the development of new algorithms for structure determination: maximum-likelihood molecular replacement (PHASER), heavy-atom search (HySS), template- and pattern-based automated model building (RESOLVE, TEXTAL), automated macromolecular refinement (phenix.refine), and iterative model building, density modification and refinement that can operate at moderate resolution (RESOLVE, AutoBuild). These algorithms are based on a highly integrated and comprehensive set of crystallographic libraries that have been built and made available to the community. The algorithms are tightly linked and made easily accessible to users through the PHENIX Wizards and the command line.

There are also a number of tools in PHENIX for handling ligands. Automated fitting of ligands into the electron density is facilitated via the LigandFit wizard. Besides being able to fit a known ligand into a difference map, the LigandFit wizard is capable of identifying ligands on the basis of the difference density alone. Stereochemical dictionaries for ligands whose chemical description is not available in the supplied monomer library, for use in restrained macromolecular refinement, can be generated with the electronic ligand builder and optimization workbench (eLBOW).
PHENIX builds upon Python, the Boost.Python library, and C++ to provide an environment for automation and scientific computing. Many of the fundamental crystallographic building blocks, such as data objects and tools for their manipulation, are provided by the Computational Crystallography Toolbox (cctbx). The computational tasks that perform complex crystallographic calculations are built on top of this. Finally, there are a number of different user interfaces available in PHENIX. To facilitate automated operation there is the Project Data Storage (PDS), which is used to store and track the results of calculations.
The PHENIX development team consists of members from Lawrence Berkeley Laboratory (Paul Adams's group), Los Alamos National Laboratory (Tom Terwilliger's group), Cambridge University (Randy Read's group) and Duke University (the Richardsons' group). Researchers from Texas A&M University (Tom Ioerger's and Jim Sacchettini's groups) participated in the first five years of PHENIX development. The development of PHENIX is funded by the National Institutes of Health (General Medicine) under grant P01GM063210, and the PHENIX Industrial Consortium.

Citing PHENIX

If you use PHENIX to solve a structure please cite this publication: PHENIX: building new software for automated crystallographic structure determination. P.D. Adams, R.W. Grosse-Kunstleve, L.-W. Hung, T.R. Ioerger, A.J. McCoy, N.W. Moriarty, R.J. Read, J.C. Sacchettini, N.K. Sauter and T.C. Terwilliger. Acta Cryst. D58, 1948-1954 (2002).

Publications

A number of publications describing PHENIX can be found at: http://www.phenix-online.org/papers/
Installation
You should obtain the latest distribution of PHENIX, including the binary bundles for your machine architecture. Unpack the tar file:

% tar xvf phenix-installer-<version>-<platform>.tar

Change to the installer directory:

% cd phenix-installer-<version>

To install:

% ./install                        [installs in /usr/local/phenix-<version> by default;
                                    requires root permissions]
% ./install --prefix=<directory>   [makes <directory>/phenix-<version> and installs there]

Note: <directory> must be an absolute path (i.e. one starting with a /). A relative path starting with ../ will not work correctly. Note: on Mac OS-X systems the binary installation must be installed in /usr/local.
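For example, a complete installation into a user-writable location might look like this (a sketch; the version and platform strings are illustrative, so substitute the names of the installer you actually downloaded):

% tar xvf phenix-installer-1.4-intel-linux-2.6-x86_64.tar
% cd phenix-installer-1.4
% ./install --prefix=/home/user/software

This would create /home/user/software/phenix-1.4 without requiring root permissions.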
Installation of the binary version of PHENIX requires no compilation, only the generation of some data files, so you will probably have to wait about 30 minutes for the installation to complete (depending on the performance of your installation platform). PHENIX is supported on most common Linux platforms and Mac OS-X. Currently, the following Redhat Linux platforms are tested, and therefore supported for the distribution:
● Redhat 8.0
● Redhat 9.0
● Redhat Enterprise Workstation 3 [+/- x86_64]
● Redhat Enterprise Server 4.2 [+/- x86_64]
● Fedora Core 3 [+/- x86_64]
● Fedora Core 5 [+/- x86_64]
● Fedora Core 6 [+/- x86_64]
Redhat versions prior to 8.0 are not supported. PHENIX should install on other Linux platforms such as Mandrake or SuSE. There are 4 different Linux installations available, based on the version of the kernel (2.4 or 2.6) and the CPU type (ix86 or x86_64). If it isn't clear which you need, type this command on the machine in question:
% uname -rm
The first item is the kernel version, the second is the machine hardware. Please download the appropriate installer based on this table:
Hardware   Kernel 2.4               Kernel 2.6
ix86       intel-linux-2.4          intel-linux-2.6
x86_64     intel-linux-2.4-x86_64   intel-linux-2.6-x86_64
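For example, the following hypothetical output indicates a 2.6 kernel on x86_64 hardware, so the intel-linux-2.6-x86_64 installer would be the appropriate choice:

% uname -rm
2.6.18-92.el5 x86_64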
Currently, the following other platforms are supported for the distribution:
● Mac OS-X (Intel and PPC) 10.4.10 or later (Tiger or Leopard)
For license information please see the LICENSE file. For the source of the components see SOURCES.
Space requirements

For the complete PHENIX installation you will need approximately 1.5 GB of disk space.
The PHENIX environment

Setting up your environment
Once you have successfully installed PHENIX, set up your environment by sourcing the phenix_env file in the PHENIX installation directory, for example:

% source /usr/local/phenix-<version>/phenix_env      [csh/tcsh users]

or

% . /usr/local/phenix-<version>/phenix_env.sh        [sh/bash users]

To run jobs remotely, you need to source the phenix_env in your .cshrc (or equivalent) file. The following environmental variables should now be defined (here with example values):
● PHENIX=/usr/local/phenix
● PHENIX_INSTALLER_DATE=080920070957
● PHENIX_VERSION=1.3
● PHENIX_RELEASE_TAG=final
● PHENIX_ENVIRONMENT=1
● PHENIX_MTYPE=intel-linux-2.6-x86_64
● PHENIX_MVERSION=linux
● PHENIX_USE_MTYPE=intel-linux-2.6-x86_64
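A quick way to confirm that the environment is set up correctly is to inspect these variables in your shell (a sketch using standard Unix commands; the values printed will reflect your own installation):

% echo $PHENIX
/usr/local/phenix
% env | grep PHENIX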
It is not necessary (or useful) to define environmental variables for SOLVE/RESOLVE for PHENIX. If you have them set in your environment they are ignored by PHENIX.
Documentation
You can find documentation in the PHENIX GUI (under the Help menu). Alternatively, you can use a web browser to view the documentation supplied with PHENIX by typing:
% phenix.doc
If this doesn't work because of browser installation issues then you can point a web browser to the correct location in your PHENIX installation (for example):
% firefox /usr/local/phenix-<version>/doc/index.html
or:
% mozilla $PHENIX/doc/index.html
For license information please see the LICENSE file. For the source of the components see SOURCES.
Help
You can join the PHENIX bulletin board and/or view the archives: http://www.phenix-online.org/mailman/listinfo/phenixbb

Alternatively you can send email to:

[email protected] (if you think you've found a bug)
[email protected] (if you'd like to ask us questions)
Running PHENIX
Different user interfaces are required depending on the needs of a diverse user community. There are currently three different user interfaces, each described below.
Command Line Interface
For a number of applications a command-line interface is most effective. This is particularly the case when rapid results are required, such as data quality assessment and twinning analysis, or substructure solution at the synchrotron beam line. Tools that facilitate ease of use in the early stages of structure solution, such as data analysis (phenix.xtriage), substructure solution (phenix.hyss) and reflection file manipulations such as the generation of a test set, reindexing and merging of data (phenix.reflection_file_converter), are available via simple command line interfaces. Another major application that is controlled via the command line interface is phenix.refine. To illustrate the command line interface, the command used to run the program that carries out data quality and twinning analyses is:

% phenix.xtriage my_data.sca [options]

Further options can be given on the command line, or can be specified via a parameter file:

% phenix.xtriage my_parameters.def

A similar interface is used for macromolecular refinement:

% phenix.refine my_model.pdb my_data.mtz
Although SCALEPACK and MTZ formats are indicated in the above examples, other reflection file formats such as D*TREK, CNS/XPLOR or SHELX can be used, as the format is detected automatically.
Help for all command line applications can be obtained by use of the --help flag:

% phenix.refine --help
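To sketch the parameter-file mechanism mentioned above, a small .def file for phenix.refine might look like the following (the parameter names follow the phenix.refine PHIL hierarchy; the values, and the file name my_parameters.def, are purely illustrative):

refinement {
  main {
    number_of_macro_cycles = 5
    simulated_annealing = True
  }
}

The file is then passed on the command line together with the model and data:

% phenix.refine my_model.pdb my_data.mtz my_parameters.def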
There are also many other command line tools (described in detail elsewhere in this documentation). If you use a shell with command completion, you can type the first part of a command, hit the command list key (<ctrl>-D in tcsh) and see a list of the available commands. For example, this is the start of the list of commands that begin with phenix.auto:

phenix.autobuild       phenix.automr       phenix.autosol
phenix.autobuild_1.3   phenix.automr_1.3   phenix.autosol_1.3
Note: all commands have both their regular name and a name qualified with the version. You can always use the version-qualified name to be certain which version of a command you are using (in case you have multiple versions of PHENIX or related applications installed).
The PHENIX GUI
To run the PHENIX Graphical Interface:

% phenix &

Please see the other documentation files to get more details about the PHENIX GUI.
Tasks and Strategies
The PHENIX strategy interface (in the GUI) provides a way to construct complex networks of tasks to perform a higher-level function. For example, the steps required to go from initial data to a first electron density map in a SAD experiment can be broken down into well-defined tasks (available from the task window in the GUI) which can be reused in other procedures. Instead of requiring the user to run these tasks in the correct order, they are connected together by the software developer, and can thus be run in an automated way. However, because the connection between tasks is dynamic, they can be reconfigured or modified, and new tasks introduced as necessary if problems occur. This provides the flexibility of user input and control, while still permitting complete automation when decision-making algorithms are incorporated into the environment. The tasks and their connection into strategies rely on the use of plain text task files written using the Python scripting language. This enables the computational algorithms to be used easily in a non-graphical environment. The PHENIX GUI permits strategies to be visualized and manipulated. These manipulations include loading a strategy distributed with PHENIX, then customizing it and saving it for future recall. Current tasks and strategies available include:
● Density modification: carries out a single run of RESOLVE
● Substructure solution: runs phenix.hyss
● Molecular replacement: computes rotation and translation functions with PHASER
● Model building: using TEXTAL or RESOLVE
● Ligand identification: using RESOLVE
Wizards
The decision-making in strategies is local, with decisions being made at the end of each task to determine the next path in the network. Crystallographers typically make decisions in a very similar way during structure solution; a program is run, the outputs manually inspected and a decision made about the next step in the process. By contrast, a wizard provides a user interface that can make more global decisions, by considering all of the available information at each step in the process. Wizards can be run from both the command line and the PHENIX GUI. Details on wizards can be found in:
● Using the PHENIX Wizards
● Automated Structure Solution using AutoSol
● Automated Molecular Replacement using AutoMR
● Automated Model Building and Rebuilding using AutoBuild
● Automated Ligand Fitting using LigandFit
PHENIX Graphical User Interface
Author
Nigel W. Moriarty
Purpose
To provide a simple and easy graphical interface to the features of the PHENIX package. In particular, the concept of a wizard that guides the user through the complex process of solving a protein structure is a powerful tool.
Screen Shots

[Screenshots of the PHENIX GUI]
Wizards

Wizards can be loaded by double-clicking on the Wizard menu to the left of the GUI. The wizard loads and provides an interface to request information from the user. Details on wizards can be found in:

Automated Structure Solution using AutoSol
Automated Molecular Replacement using AutoMR
Automated Model Building and Rebuilding using AutoBuild
Automated Ligand Fitting using LigandFit
Strategies
The main window of the PHENIX GUI is the strategy canvas. The strategy canvas allows the user to construct a strategy from the tasks in the menu in the left window. Choosing a task from the menu will attach that task to the mouse. The mouse cursor will change to a hand icon while it is in the canvas window with a task attached. Clicking inside the canvas will place the task on the closest grid point. Help on tasks can be obtained by right-clicking on the task menu item to reveal a pop-up menu.

A similar situation exists for the strategy menu. Choosing a strategy will load it into a new strategy canvas. These strategies are loaded with the default task inputs, providing a "clean slate" strategy for user customization. Right-clicking will reveal a pop-up menu of strategy loading options, including overwriting the current strategy or adding to the current strategy.

Tasks can be moved in one of two ways. The first involves using the left mouse button to click-drag-drop the task. This must be done in the title panel of the task. The second option is useful in situations where there is a delay between the mouse action and the GUI update, as happens when using a remote machine to run the GUI. Moving a task can be achieved by right-clicking on the title panel and then right-clicking where the task should be relocated. The mouse cursor will change to indicate the attachment of the task.
Each task has up to five buttons along the top of the title panel. The rightmost button deletes the task from the canvas. The leftmost button is a toggle that marks the task at which the calculation starts. The remaining three buttons are present if the appropriate function is available for that task. The uppermost task in the above figure displays all five buttons. The second button is the task parameter button; it displays a dialog that allows the user to edit the input and output of that task. The number and type of information is dependent on the task. The third button is the display button. This launches the appropriate windows for displaying the results of a task. For example, the 'import pdb' task will display the PDB header information in a text control, or the molecule in the molecular graphics program PyMOL. The fourth button is the help button; help for the task is displayed via this option.
Two tasks can be linked by moving the lower panels of one task over the title panel of another. There can be any number of connection panels associated with a task. Logical operations can be provided to choose the appropriate linkage to follow. The connection between tasks is not data flow but time flow. Each task obtains its data from the PHENIX Data Storage (PDS) server and sends its output data to the PDS server. Subsequent tasks can get data sent to the PDS server by previous tasks.
The colors of the tasks indicate the activity of the strategy. A purple task has finished running; the green task is the currently running task; blue indicates a task that hasn't been run; red indicates a task that failed during calculation; and yellow is used when a strategy run is stopped by the user. The task menu on the left side of the strategy canvas is divided into sub-menus. The "development" sub-menu contains experimental tasks. The "examples" sub-menu has some demonstrative tasks. The remaining sub-menus are self-explanatory.
The overview window in the bottom left corner allows navigation of the canvas when large strategies are used.
Main PHENIX Modules
Automated Structure Solution Using Experimental Phasing Techniques
Automated Structure Solution Via Molecular Replacement
Automated ligand density analysis
Calculating ligand geometries and defining chemical restraints
Data Analysis

Detection of twinning and other pathologies is facilitated via the program phenix.xtriage. This command-line driven program analyses an experimental data set and provides diagnostics that aid in the detection of common idiosyncrasies such as the presence of pseudo-translational symmetry, certain data processing problems, and twinning. Other sanity checks, such as a Wilson plot sanity check and an algorithm that tries to detect the presence of ice rings from the merged data, are performed as well. If twin laws are present for the given unit cell and space group, a Britton plot is computed, an H-test is performed and a likelihood-based method is used to provide an estimate of the twin fraction. Twin laws are deduced from first principles for each data set, avoiding the danger of overlooking twin laws through incomplete lookup tables. If a model is available, more efficient twin detection tools are available. The RvsR statistic is particularly useful in the detection of twinning in combination with pseudo-rotational symmetry. This statistic is computed by phenix.xtriage if calculated data is supplied together with the observed data. A more direct test for the presence of twinning is refinement of the twin fraction given an atomic model (which can be performed in phenix.refine). The command line utility phenix.twin_map_utils provides a quick way to refine a twin fraction given an atomic model and an X-ray data set, and also produces the corresponding maps.
Automated Structure Solution Using Experimental Phasing Techniques

Structure solution via SAD, MAD or SIR(AS) can be carried out with the AutoSol wizard. The AutoSol wizard performs heavy-atom location, phasing, density modification and initial model building in an automated manner. The heavy atoms are located with the substructure solution engine also used in phenix.hyss, a dual-space method similar to SHELXD and Shake-and-Bake. Phasing is carried out with PHASER for SAD cases and with SOLVE for MAD and SIR(AS) cases. Subsequent density modification is carried out with RESOLVE. The hand of the substructure is determined automatically on the basis of the quality of the resulting electron density map. It is noteworthy that the whole process is not necessarily linear: the wizard can decide to step back and (for instance) try another set of heavy atoms if appropriate. In the resulting electron density map, a model is built (currently limited to proteins). Further model completion can be carried out via the AutoBuild wizard. The AutoBuild wizard iterates model building and density modification with refinement of the model, in a scheme similar to other iterative model building methods such as ARP/wARP.
Automated Structure Solution Via Molecular Replacement

Structure solution via molecular replacement is facilitated via the AutoMR wizard. The wizard guides the user through setting up all necessary parameters to run a molecular replacement job with PHASER. The molecular replacement carried out by PHASER uses a likelihood-based scoring function, improving the sensitivity of the procedure and the ability to obtain reasonable solutions with search models that have relatively low sequence similarity to the crystal structure being determined. Besides the use of likelihood-based scoring functions, structure solution is enhanced by detailed bookkeeping of all search possibilities when searching for more than a single copy in the asymmetric unit or when the choice of space group is ambiguous. When a suitable molecular replacement solution is found, the AutoBuild wizard is invoked and rebuilds the molecular replacement model given the sequence of the structure under investigation.
Automated Model Building

Automated model building, given a starting model or a set of reasonable phases, can be carried out by the AutoBuild wizard. A typical AutoBuild job combines density modification, model building, macromolecular refinement and solvent model updates ('water picking') in an iterative manner. Various modes of building a model are available. Depending on the availability of a molecular model, model building can be carried out by locally rebuilding an existing model (rebuild in place) or by building in the density without any reference to an available model. Rebuilding in place is a powerful building scheme that is used by default for molecular replacement models that have a high sequence similarity to the sequence of the structure that is to be built. A fundamental feature of the wizard is that it builds various models, all from slightly different starting points. The dependency of the outcome of the model building algorithm on initial starting conditions provides a straightforward mechanism for obtaining a variety of plausible molecular models. It is not uncommon that certain sections of a map are built in one model but not in another. Combining these models allows the AutoBuild wizard to converge faster to a more complete model than a single model-building pass for a given set of phases would. Dedicated loop fitting algorithms are used to close gaps between chain segments. This feature, together with the water picking and side chain placement, typically results in highly complete models of high quality that need minimal manual intervention before they are ready for deposition.
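As a sketch of how such a job is launched from the command line (the file names are illustrative; data=, seq_file= and model= are standard AutoBuild keywords):

% phenix.autobuild data=data.mtz seq_file=sequence.dat model=mr_solution.pdb

Here rebuilding in place would be chosen automatically for a molecular replacement model with high sequence similarity; it can also be requested explicitly with rebuild_in_place=True.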
Structure Refinement

The refinement engine used by the AutoBuild and other wizards can also be run from the command line with the phenix.refine command. The phenix.refine program carries out likelihood-based refinement and can refine positional parameters, individual or grouped atomic displacement parameters, and individual or grouped occupancies. The refinement of anisotropic displacement parameters (individual or via a TLS parameterization) is also available. Positional parameters can be optimized using either traditional gradient-only based optimization methods or simulated annealing protocols. The command line interface allows the user to specify which part of the model should be refined in what manner. It is in principle possible to refine half of the molecule as a rigid group with grouped B values, while the other half of the molecule has a TLS parameterization. The flexibility of specifying the level of parameterization of the model is especially important for the refinement of low resolution data or when starting with severely incomplete atomic models. Another advantage of this flexibility in refinement strategy is that a user can perform a complex refinement protocol that carries out simulated annealing, isotropic B refinement and water picking in 'one go'. Another main feature of phenix.refine is the way in which the relative weights for the geometric and ADP restraints with respect to the X-ray target are determined. Considerable effort has been put into devising a good set of defaults and weight determination schemes that result in a good choice of parameters for the data set under investigation. Defaults can of course be overridden if the user chooses to. Besides being able to handle refinement against X-ray data, phenix.refine can carry out refinement against neutron data or against X-ray and neutron data simultaneously.
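To give a flavor of such a 'one go' protocol on the command line, the following sketch combines coordinate refinement, isotropic B refinement, simulated annealing and water picking (the file names are illustrative; strategy, simulated_annealing and ordered_solvent are phenix.refine keywords):

% phenix.refine my_model.pdb my_data.mtz \
    strategy=individual_sites+individual_adp \
    simulated_annealing=True ordered_solvent=True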
Automated ligand density analysis

Automated fitting of ligands into the electron density is facilitated via the LigandFit wizard. Ligand building is performed by finding an initial fit for the largest rigid domain of the ligand and extending the remaining part of the ligand from this initial 'seed'. Besides being able to fit a known ligand into a difference map, the LigandFit wizard is capable of identifying ligands on the basis of the difference density only. In the latter scheme, density characteristics for ligands occurring frequently in the PDB are used to provide the user with a range of plausible ligands.
Calculating ligand geometries and defining chemical restraints

Stereochemical dictionaries for ligands whose chemical description is not available in the supplied monomer library, for use in restrained macromolecular refinement, can be generated with the electronic ligand builder and optimization workbench (eLBOW). eLBOW generates a 3D geometry from a number of chemical input formats, including MOL2 or PDB files and SMILES strings. SMILES is a compact, chemically dense description of a molecule that contains all element and bonding information, and optionally other stereo information such as chirality. To generate a 3D geometry from an input format that contains no 3D geometry information, eLBOW uses a Z-matrix formalism in conjunction with a table of bond lengths calculated using the Hartree-Fock method with a 6-31G(d,p) basis set to obtain a Cartesian coordinate set. The geometry is then optionally optimized using the semi-empirical quantum chemistry method AM1. The AM1 optimization provides chemically meaningful and accurate geometries for the class of molecules typically complexed with proteins. eLBOW outputs the optimized geometry and a standard CIF restraint file that can be read in by phenix.refine, and that can also be used for real-space refinement during manual model building sessions in the program COOT. An interface is also available to use eLBOW within COOT.
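As a minimal sketch of generating restraints from a SMILES string (the ethanol SMILES is illustrative of typical eLBOW usage; check phenix.elbow --help for the full set of options in your installation):

% phenix.elbow --smiles="CCO"

This would produce a geometry file and a CIF restraint dictionary suitable for phenix.refine.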
PHENIX FAQS
Can I easily run a Wizard with some sample data?
What sample data are available to run automatically?
Are any of the sample datasets annotated?
Why does the AutoBuild Wizard say it is doing 2 rebuild cycles but I specified one?
What is the difference between overall_best.pdb and cycle_best_1.pdb in the AutoBuild Wizard?
How can I tell the AutoSol Wizard which columns to use from my mtz file?
How do I know what my choices of labels are for my data file?
What can I do if a Wizard says this version does not seem big enough?
Why does the AutoBuild Wizard just stop after a few seconds?
What do I do if the PHENIX GUI hangs?
Why does the GUI Parameters window say Invalid input parameters...do you want to continue?
Why is my TEMP0 directory empty after running a Wizard?
What is an R-free flags mismatch?
Can I use the AutoBuild wizard at low resolution?
Why doesn't COOT recognize my MTZ file from AutoBuild?
How should I cite PHENIX?
If you use PHENIX please cite: Adams, P.D., Grosse-Kunstleve, R.W., Hung, L.-W., Ioerger, T.R., McCoy, A.J., Moriarty, N.W., Read, R.J., Sacchettini, J.C., Sauter, N.K., Terwilliger, T.C. (2002). PHENIX: building new software for automated crystallographic structure determination. Acta Cryst. D58, 1948-1954.
Where can I find sample data?
You can find sample data in the directories located in: $PHENIX/examples. Additionally there is sample MR data in $PHENIX/phaser/tutorial.
Can I easily run a Wizard with some sample data?
You can run sample data with a Wizard with a simple command. To run the p9-sad sample data with the AutoSol wizard, you type:

% phenix.run_example p9-sad

This command copies the $PHENIX/examples/p9-sad directory to your working directory and executes the commands in the file run.csh.
What sample data are available to run automatically?

You can see which sample data are set up to run automatically by typing:

% phenix.run_example --help

This command lists all the directories in $PHENIX/examples/ that have a command file run.csh ready to use. For example:

% phenix.run_example --help
PHENIX run_example script. Fri Jul 6 12:07:08 MDT 2007
Use: phenix.run_example example_name [--all] [--overwrite]
Data will be copied from PHENIX examples into subdirectories of this working directory
If --all is set then all examples will be run (takes a long time!)
If --overwrite is set then the script will overwrite subdirectories
List of available examples: 1J4R-ligand a2u-globulin-mr gene-5-mad p9-build p9-sad
Are any of the sample datasets annotated?

The PHENIX tutorials listed on the main PHENIX web page will walk you through sample datasets, telling you what to look for in the output files. For example, Tutorial 1: Solving a structure using SAD data uses the p9-sad dataset as an example. It tells you how to run this example data in AutoSol and how to interpret the results.
Why does the AutoBuild Wizard say it is doing 2 rebuild cycles but I specified one?
The AutoBuild wizard adds a cycle just before the rebuild cycles in which nothing happens except refinement and grouping of models from any previous build cycles.
What is the difference between overall_best.pdb and cycle_best_1.pdb in the AutoBuild Wizard?

The AutoBuild Wizard saves the best model (and map coefficient file, etc.) for each build cycle nn as cycle_best_nn.pdb. The Wizard also copies the current overall best model to overall_best.pdb. In this way you can always pull the overall_best.pdb file and you will have the current best model. If you wait until the end of the run you will get a summary that lists the files corresponding to the best model. These will have the same contents as the overall_best files.
Can PHENIX do MRSAD?

Yes, PHENIX can run MRSAD (molecular replacement combined with SAD phasing) by determining the anomalous scatterer substructure from a model-phased anomalous difference Fourier. There are two simple ways to do this; both are described in the documentation.
How can I tell the AutoSol Wizard which columns to use from my mtz file?

The AutoSol Wizard will normally try to guess the appropriate columns of data from an input data file. If there are several choices, then you can tell the Wizard which one to use with the script command group_labels_list or the command-line keywords labels, peak.labels, infl.labels, etc. For example, if you have two input datafiles w1 and w2 for a 2-wavelength MAD dataset, and you want to select the w1(+) and w1(-) data from the first file and w2(+) and w2(-) from the second, you could put the following lines in a script file (see "How do I know what my choices of labels are for my data file" to know what to put in these lines):

input_file_list w1.mtz w2.mtz
group_labels_list 'w1(+) SIGw1(+) w1(-) SIGw1(-)' 'w2(+) SIGw2(+) w2(-) SIGw2(-)'

Note that all the labels for one set of anomalous data from one file are grouped together in each set of quotes. You could accomplish the same thing from the command line by specifying something like:

peak.data=w1.mtz infl.data=w2.mtz \
peak.labels='w1(+) SIGw1(+) w1(-) SIGw1(-)' \
infl.labels='w2(+) SIGw2(+) w2(-) SIGw2(-)'
How do I know what my choices of labels are for my data file?
You can find out what your choices of labels are by running the command:

% phenix.autosol show_labels=w1.mtz

This will provide a listing of the labels in w1.mtz and suggestions for their use in the PHENIX Wizards. For example, for w1.mtz this yields:

List of all anomalous datasets in w1.mtz:
'w1(+) SIGw1(+) w1(-) SIGw1(-)'

List of all datasets in w1.mtz:
'w1(+) SIGw1(+) w1(-) SIGw1(-)'

List of all individual labels in w1.mtz:
'w1(+)'
'SIGw1(+)'
'w1(-)'
'SIGw1(-)'

Suggested uses:
labels='w1(+) SIGw1(+) w1(-) SIGw1(-)'
input_labels='w1(+) SIGw1(+) None None None None None None None'
input_refinement_labels='w1(+) SIGw1(+) None'
input_map_labels='w1(+) None None'
What can I do if a Wizard says this version does not seem big enough?
The Wizards try to automatically determine the size of solve or resolve to use, but if your data is at very high resolution or you have a very large unit cell, you can get the message:

***************************************************
Sorry, this version does not seem big enough...
(Current value of isizeit is 30)
Unfortunately your computer will only accept a size of 30
with your current settings.
You might try cutting back the resolution
You might try "coarse_grid" to reduce memory
You might try "unlimit" allow full use of memory
***************************************************

You cannot get rid of this problem by specifying the resolution with resolution=4.0, because the Wizards use the resolution cutoff you specify in all calculations, but the high-resolution data is still carried along. The easiest solution to this problem is to edit your data file to have lower-resolution data. You can do it like this:

% phenix.reflection_file_converter huge.sca --sca=big.sca --resolution=4.0

A second solution is to tell the Wizard to ignore the high-resolution data explicitly with:

resolution=4.0 \
resolve_command="'resolution 200 4.0'" \
solve_command="'resolution 200 4.0'" \
resolve_pattern_command="'resolution 200 4.0'"

Note the two sets of quotes; both are required for this command-line input. These commands are applied after all other inputs in resolve/solve/resolve_pattern, and therefore all data outside these limits will be ignored.
Why does the AutoBuild Wizard say "Sorry, you need to define FP in labin" but AutoMR was able to read my data file just fine?

When you run AutoMR and let it continue on to the AutoBuild Wizard automatically, the AutoBuild Wizard guesses the input file contents separately from AutoMR. Usually it can guess correctly, but if it cannot, then you can tell it what the labels for FP SIGFP FreeR_flag are like this:

autobuild_input_labels="myFP mySIGFP myFreeR_flag"

where you can say None for anything that you do not want to define. This has an effect that is identical to specifying input_labels directly when you run AutoBuild.
Why does the AutoBuild Wizard just stop after a few seconds?
When you run AutoBuild from the command line, it writes the output to a file and says something like:

Sending output to AutoBuild_run_3_/AutoBuild_run_3_1.log

Usually if something goes wrong with the inputs it will give you an error message right on the screen. However, a few types of errors are only written to the log file, so if AutoBuild just stops after a few seconds, have a look at this log file; it should have an error message at the end of the file.
What do I do if the PHENIX GUI hangs?
If the GUI hangs (windows do not respond or windows display partially), you may want to try to kill it by clicking on the upper right corner, or right-clicking on the top bar of the GUI and closing it. If those fail, you can type control-C in the window where you started up the GUI. In either case, you can restart the GUI by typing phenix again. You may find it necessary to start phenix up, then close it down nicely with Project/Exit, and restart it (this gets rid of some files that are deleted when the GUI closes normally). You may also occasionally find it necessary to kill any jobs that are still running: run top, notice if there are python or resolve or solve jobs running that were part of your PHENIX job, then use k to kill those jobs while running top.
Why does the GUI Parameters window say Invalid input parameters...do you want to continue?
This happens if something in the window isn't correct. If no colored entry fields come up, have a look at the bottom where it says NAVIGATE SET VARIABLE AUTO MANUAL. The entry forms under these words should read "Choose method to run", "Choose variable to set" and "Manual" (or "Automatic"), unless you have intentionally set them. If that isn't it, look carefully at all the entries in the entire parameters window and make sure that they are of the type that is expected (file name, number, etc.). If that doesn't work, just click YES and carry on.
Why is my TEMP0 directory empty after running a Wizard?
By default all the working files in the TEMP subdirectories are deleted at the end of a Wizard run. If you want to keep these files, then you can specify clean_up=False.
How do I stop a Wizard?
You can stop a Wizard in two ways. For a "soft" stop, press the "Pause" button if you are running from the GUI, or create a file with the name STOPWIZARD in the directory where the Wizard is running (i.e., create AutoBuild_run_4_/STOPWIZARD to stop run 4 of the AutoBuild Wizard). For a hard stop from the GUI, you can select "Strategy" on the top line of the GUI and then select "Stop Strategy" at the bottom of the choices. That kills the Wizard and all associated jobs. You can still go on from there; select the Parameters window (the lines at the upper left of the now-yellow GUI window) and choose what to do next.
What is an R-free flags mismatch?
When you run AutoBuild or phenix.refine you may get this error message or a similar one:

************************************************************
Failed to carry out AutoBuild_build_cycle:
Please resolve the R-free flags mismatch.
************************************************************

phenix.refine keeps track of which reflections are used as the test set (i.e., not used in refinement but only in the estimation of overall parameters). The test set identity is saved as a hex digest and written to the output PDB file produced by phenix.refine as a REMARK record:

REMARK r_free_flags.md5.hexdigest 41aea2bced48fbb0fde5c04c7b6fb64

Then, when phenix.refine reads a PDB file and a set of data, it checks to make sure that the same test set is about to be used in refinement as was used in the previous refinement of this model. If it is not, you get the error message about an R-free flags mismatch. Sometimes the R-free flags mismatch error is telling you something important: you need to make sure that the same test set is used throughout refinement. In this case, you might need to change the data file you are using to match the one previously used with this PDB file. Alternatively you might need to start your refinement over with the desired data and test set. Other times the warning is not applicable. If you have two datasets with the same test set, but one dataset has one extra reflection that contains no data, only indices, then the two datasets will have different hex digests even though they are for all practical purposes equivalent. In this case you would want to ignore the hex-digest warning. If you get an R-free flags mismatch error, you can tell the AutoBuild Wizard to ignore the warning with:

skip_hexdigest=True

and you can tell phenix.refine to ignore it with:

refinement.input.r_free_flags.ignore_pdb_hexdigest=True

You can also simply delete the REMARK record from your PDB file if you wish to ignore the hex-digest warnings.
Can I use the AutoBuild wizard at low resolution?
The standard building with AutoBuild does not work very well at resolutions below about 3-3.2 A. In particular, the wizard tends to build strands into helical regions at low resolution. However, you can specify helices_strands_only=True and the wizard will just build regions that are helical or beta-sheet, using a completely different algorithm. This is much quicker than standard building, but much less complete as well.
Why doesn't COOT recognize my MTZ file from AutoBuild?
This happens if you use "auto-open MTZ" in COOT. COOT will say:

FAILED TO FIND COLUMNS FWT AND PHWT IN THAT MTZ FILE
FAILED TO FIND COLUMNS DELFWT AND PHDELFWT IN THAT MTZ FILE

The solution is to use "Open MTZ" and then to select the columns (usually FP PHIM FOMM, and yes, do use weights).
Using the PHENIX Wizards
Overview of Structure Determination with the PHENIX Wizards
Wizard data directories, sub-directories, Facts, and the PDS (Project Data Storage)
Running a Wizard using a multiprocessor machine or on a cluster
Basic operation of a Wizard from the GUI
Keeping track of multiple runs of a Wizard from the GUI
Setting parameters of a Wizard from the GUI
Navigating steps in a Wizard from the GUI
Running a Wizard from the command-line
Basic operation of a Wizard from the command-line
Keeping track of multiple runs of a Wizard from the command-line
Setting parameters of a Wizard from the command-line
Running a Wizard from a script
Differences between running from the command line and running a script
Basic operation of a Wizard from a script
Keeping track of multiple runs of a Wizard from a script
Setting parameters of a Wizard from a script
Specific limitations and problems
Purpose
Any Wizard can be run from the PHENIX GUI, from the command line, or from keyworded script files. All three versions are identical except in the way that they take commands and keywords from the user. This page describes how to run a Wizard and what a Wizard does in general. The specific Wizard help pages describe the details of each PHENIX Wizard.
Overview of Structure Determination with the PHENIX Wizards
You can use the AutoSol Wizard to solve structures by SAD, MAD, SIR/SIRAS, and MIR/MIRAS. The AutoMR Wizard can solve a structure by MR. The AutoMR and AutoSol Wizards together can carry out MRSAD. The AutoSol Wizard can also combine SAD, MAD, SIR, and MIR datasets and solve the structure using all available data. Once you have experimental or MR phases, you can carry out iterative model building, density modification, and refinement with the AutoBuild Wizard to improve your model. Finally, you can use the rebuild_in_place feature of the AutoBuild Wizard to make one very good final model. If your structure contains ligands, you can place them using the LigandFit Wizard.
This help page describes how to run the Wizards from a GUI, the command line, or a script. The individual Wizard documentation pages describe the strategies and commands for each Wizard:

● Automated Structure Solution using AutoSol
● Automated Molecular Replacement using AutoMR
● Automated Model Building and Rebuilding using AutoBuild
● Automated Ligand Fitting using LigandFit
Usage
Wizard data directories, sub-directories, Facts, and the PDS (Project Data Storage)
● The directory that you are in when you start up PHENIX is your working directory.

● Each run of a Wizard will have all its output data in a subdirectory of your working directory, named like this (for AutoSol run 3):

  AutoSol_run_3_/

● This subdirectory will have one or more temporary directories, such as:

  AutoSol_run_3_/TEMP0/

  which contain intermediate files. These temporary directories will be deleted when the Wizard is finished (unless you set the parameter clean_up to False).

● For OMIT and MULTIPLE-MODEL runs, the final OMIT maps and multiple models will be in a subdirectory of your run directory:

  AutoSol_run_3_/OMIT/
  AutoSol_run_3_/MULTIPLE_MODELS/

● All the parameter values, as well as any other information that a Wizard generates during its run, are stored in the PDS (Project Data Storage) and/or the Wizard Facts. The Facts are values of parameters and pointers to files in the PDS. The Facts keep track of the current knowledge available to the Wizard. Each time a step is completed by a Wizard, the new Facts are saved (overwriting old ones for that run). As the Facts define the state of the Wizard, the Wizard can be restarted at any time by loading the appropriate set of Facts.

● The PDS (Project Data Storage) will be in your working directory:

  ./PDS/

  The PDS contains the output of each of your runs for all Wizards and a record of all the Facts (parameters and data) for each run. If you delete a run using the PHENIX Wizard GUI or with a command like "phenix.autosol delete_runs=2", the corresponding entries in the PDS are also deleted. You can copy the PDS from one place to another. Note that if you delete directories such as "AutoSol_run_1_" by hand, the corresponding information remains in the PDS. For this reason it is best to use the GUI or specific commands to delete runs.
Running a Wizard using a multiprocessor machine or on a cluster

You can take advantage of a multiprocessor machine or a cluster when running the Wizards (currently this applies to the LigandFit and AutoBuild Wizards). For example, adding nproc=4 to a command-line command for a Wizard will use 4 processors to run the Wizard (if possible). Normally you will run the parallel processes in the background with the default of background=True. If you have a cluster with a batch queue, you can send subprocesses to the batch queue with run_command=qsub (or whatever your batch command is). In this case you will use background=False so that the batch queue can keep track of your jobs. The Wizards divide the data into nbatch batches during processing. The value of nbatch is set from 3 to 5 by default (depending on the Wizard) and is appropriate if you have up to nbatch processors. If you have more, then you may wish to increase nbatch to match the number of processors. The reason it is done this way is that the value of nbatch can affect the results that you get: the jobs are not split into exact replicates, but are rather run with different random numbers. If you want to get the same results, keep the same value of nbatch.
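For example, a parallel AutoBuild run on a four-processor machine might be launched like this (the file names are illustrative; nproc and nbatch are the keywords discussed above):

% phenix.autobuild data=data.mtz seq_file=sequence.dat nproc=4 nbatch=4

On a cluster with a qsub-style batch queue, the same run could instead use nproc=4 run_command=qsub background=False.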
Running a Wizard from a GUI

Basic operation of a Wizard from the GUI

● Start up the PHENIX GUI in your working directory by typing "phenix".

● Answer "yes" to the question "Do you want to make it a project directory?".

● Launch a Wizard from the PHENIX GUI by double-clicking on the name of the Wizard ("AutoSol") under "Wizards" in the Strategy Interface of the main GUI.

● The Wizard will come up in a blue window and will open a grey Parameters window asking you for information on what files to use and what to do.

● Enter the file names and make choices as necessary (NOTE: to select a file, click on the yellow box to the right of the file entry field. To add a new file entry field, click on the "Parameter group options" tab if present).

● Proceed to the next window by clicking "Continue" in the upper left corner of the grey Parameters window.
● The Wizard will guide you through the necessary inputs, then it will continue on its own until it is finished.

● When the Wizard is done, you can double-click on the Display icon (the little magnifying glass on the upper left of the blue Wizard window) to show a list of files and maps that can be displayed. (NOTE: The Display Options window is updated when you open it. Once this window is open you cannot open it again until you close it. Sometimes this window may be behind other windows, and this will prevent you from opening it again.)

● You can open the Parameters window any time the Wizard is stopped by clicking on the Parameters icon (4 little lines in the upper left corner of the blue Wizard window). This allows you to carry out some of the more advanced options below.

● Your output log file will be in a file called "AutoSol.1.output" for an AutoSol run. You can also see the same file by clicking on the "LOG" button at the lower right of the blue or green window.
Keeping track of multiple runs of a Wizard from the GUI
●
You can run more than one Wizard job at a time if you want. Each run of a Wizard is put in a separate sub-directory (e.g., "AutoSol_run_1_").
●
When you start a Wizard, it will start a new run of that Wizard.
●
If you want to continue on with the highest-numbered run of a Wizard, you can start the Wizard with the continue button for that Wizard (for example the continue_AutoSol button).
●
If you want to go back to a previous run, you can use the Run Control and Run Number selections near the bottom of any Parameters window (NOTE: to open the parameters window click on the lines at the upper left of the blue Wizard window). Select goto_run and choose a run number to go to.
●
If you want to copy a previous run and go on, use the Run Control and Run Number selections and select copy_run and choose a run number to copy. The Wizard will create a new run (with number equal to the highest previous number plus one) and carry on with it.
●
To see what runs are available, select View or Delete Runs in the Navigate tab at the lower left of any Parameters window.
●
If you want to stop the Wizard, hit the PAUSE button on the green Wizard window (the Wizard is green when running, blue or purple when stopped). NOTE: this may take a little time, particularly if Phaser or HYSS or phenix.refine are running. In those cases, if you really want to stop the Wizard right away, go to "Strategy" and then select "Stop Strategy" and it will be stopped.
Setting parameters of a Wizard from the GUI
●
You can set any parameter in a Wizard by selecting the variable in the Choose Variable to Set tab. The next time you click Continue, the Wizard will save all the current inputs as usual, and then instead of going on to the next step, it will open a window asking you for the new value of that variable. When you enter it and press Continue, the Wizard will continue on with what it was doing, but with this new value.
●
NOTE that some parameters (e.g., resolution) may affect many steps. If a prior step is affected by a parameter that is changed, the Wizard does not go back and change it. If you want the parameter change to affect something that has already been done, you need to re-run the corresponding step.
●
NOTE that you can set any SOLVE, RESOLVE or RESOLVE_PATTERN keyword when you are running a Wizard using the "resolve_command", "solve_command" or "resolve_pattern_command" keywords. These can be set in the GUI from the Choose Variable pull-down menu. You just type the command into the entry form, like this (for resolve_command):

res_start 4.0

telling resolve in this case to start density modification at a resolution of 4 A. This allows you to control what solve, resolve and resolve_pattern do more finely than you otherwise can in the Wizards.
Navigating steps in a Wizard from the GUI
●
When the Wizard is done or Paused, you can select any available step in the Navigate tab at the middle bottom of any Parameter window. This tells the Wizard to get any necessary inputs for that step and to then carry it out.
●
The Wizards normally start out in Manual mode (one step at a time, asking the user for inputs).
Once the necessary inputs are entered, the Wizard enters Automatic mode (no more asking for inputs until something required is missing). You can control this by specifying Manual or
Automatic in the Auto/Manual tab at the bottom right of any Wizard.
Running a Wizard from the command-line
Basic operation of a Wizard from the command-line
●
You can run a wizard from the command line like this (autosol is the AutoSol wizard): phenix.autosol data=w1.sca seq_file=seq.dat 2 Se
●
The command-line interpreter will try to interpret obvious information (2 means sites=2, Se means atom_type=Se) and will run the wizard.
●
To see all the information about this wizard and the keywords that you can set for this wizard, type: phenix.autosol --help all
●
Any wizard keyword can be entered at the command line (not just the ones labelled "command-line only"). The documentation for each wizard lists all the keywords that apply to that wizard.
●
If you want to stop a Wizard, you can create a file "STOPWIZARD" and put it in the subdirectory (e.g., AutoSol_run_2_/) where the Wizard is running. This is like hitting the PAUSE button on the GUI and stops the wizard cleanly.
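For example, on a Unix-like system (the run directory name here is hypothetical):

touch AutoSol_run_2_/STOPWIZARD   # asks run 2 of AutoSol to stop cleanly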
Keeping track of multiple runs of a Wizard from the command-line
●
When you start a Wizard from the command line, the default is to start a new run of that Wizard.
●
To see all the available runs of this Wizard, type: phenix.autosol show_runs
●
To delete runs 1,2 and 4-7 of this Wizard, type something like this: phenix.autosol delete_runs="1 2 4-7"
Note that the group of numbers is enclosed in quotes ("). This tells the input parser (iotbx.phil) that all these numbers go with the one keyword of delete_runs. Note also that there are no spaces around the "=" sign!
●
To go back to run 2 and carry on (remembering all previous inputs and possibly adding new ones, in this case setting the resolution) type something like: phenix.autosol run=2 resolution=3.0
●
To carry on with the current highest-numbered run (remembering all previous inputs and possibly adding new ones, in this case setting the resolution) type something like: phenix.autosol carry_on resolution=3.0
●
To copy run 2 to a new run and carry on from there (remembering all previous inputs and possibly adding new ones, in this case setting the resolution) type something like: phenix.autosol copy_run=2 resolution=3.0
Setting parameters of a Wizard from the command-line
When you run a Wizard from the command-line, two files are produced and put in the subdirectory of the Wizard (e.g., AutoBuild_run_3_/).
●
A parameters (".eff") file will be produced that you can edit to rerun the Wizard: phenix.autosol autosol.eff
This autosol.eff file (for AutoSol) contains the values of all the AutoSol parameters at the time of starting the Wizard.
Note that the syntax in the autosol.eff file is slightly different from the command-line syntax. From the command line, if a value has several parts, you enclose them in quotes and there are no spaces around the "=" sign: phenix.autosol ... input_phase_labels="FP PHIM FOMM"
In the .eff file, you MUST leave off the quotes or the three values will be treated as one, and you should leave blanks around the "=" sign:
input_phase_labels = FP PHIM FOMM
The reason these are different is that in the .eff file, the structure of the file and the brackets tell the PHIL parser what is grouped together, while from the command line, the quotes tell the parser what is to be grouped together.
●
A script file (".inp") with inputs in the format for running from a script is produced that you can edit and use like this: phenix.runWizard AutoSol AutoSol.inp
●
To get keyword help on a specific keyword you can type: phenix.autosol --help data # get help on the keyword data for autosol
●
To show current Facts (values of all parameters) for highest_numbered run: phenix.autosol show_facts
●
To show current Facts (values of all parameters) for run 3: phenix.autosol run=3 show_facts
●
To show current summary: phenix.autosol show_summary
●
When you use a keyword like data= you need to give enough information to specify this keyword uniquely. You can see all the keywords for each PHENIX Wizard or tool at the end of the documentation for that Wizard or tool. This will have entries like this (for AutoSol):
autosol
  sites= None Number of heavy-atom sites. (Command-line only)

which describes the keyword sites in the scope defined by autosol. You can explicitly specify this on the command line with autosol.sites=3, which in this case is exactly the same as sites=3.
●
NOTE that you can set any SOLVE, RESOLVE or RESOLVE_PATTERN keyword in PHENIX using the "resolve_command", "solve_command" or "resolve_pattern_command" keywords from the command line. The format is a little tricky: you have to put two sets of quotes around the command like this: resolve_command="'ligand_start start.pdb'" # NOTE ' and " quotes
This will put the text ligand_start start.pdb at the end of every temporary command file created to run resolve.
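As a hedged sketch combining this with an ordinary AutoSol command (file names hypothetical; res_start is the resolve keyword used as an example earlier):

phenix.autosol data=w1.sca seq_file=seq.dat sites=2 atom_type=Se \
  resolve_command="'res_start 4.0'"   # note the nested ' and " quotes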
Running a Wizard from a script
Differences between running from the command line and running a script
Command-line
The command-line is an easy way to run a Wizard and is recommended for all users. The command starts with phenix. plus the name of the Wizard in lower-case letters (phenix.autosol). Following this, all of the keywords are on the same line (or on continuation lines) and values are assigned with an "=" sign. The order of keywords makes no difference when running from the command line. A simple command is:

phenix.autosol data=w1.sca seq_file=seq.dat sites=2 atom_type=Se
Scripts
Normally scripts are for advanced users only (however, for running MIR or multiple datasets you have to use the GUI or a script). A script can contain both commands and keywords. Keywords are read in until a command is found, then the command is executed, then additional keywords are read in until another command is found, and so on. If the script file contains only keywords and no commands, then the keywords are read in and used as input to the Wizard, just as when running from the command line. In a script file, each line can contain a command or keyword and optional values for the command or keyword, separated by spaces. The keywords for scripts are a subset of the keywords for the command line, because the command-line interpreter has a number of special keywords (essentially shortcuts) to make typing at the command line easier. A script file assigns values to keywords by placing them on the same line, without any "=" signs. A sample script file "autosol.inp" that contains the same information as the command-line command shown above (but with the full keyword names, not the command-line shortcuts) is:
# autosol.inp
# script file with inputs for AutoSol Wizard.
# run with: phenix.runWizard AutoSol autosol.inp
#
input_file_list w1.sca    # script keyword is input_file_list not data
input_seq_file seq.dat    # script keyword is input_seq_file not seq_file
mad_ha_n 2                # script keyword is mad_ha_n not sites
mad_ha_type Se            # script keyword is mad_ha_type not atom_type
#
# end of autosol.inp
which you can run with: phenix.runWizard AutoSol autosol.inp
NOTE: The script interpreter will accept any keywords and values. If the keyword is not recognized, then it will write a warning to the log file, but it will not stop. This means that if you use the wrong name for a keyword, you will only find this out by looking at the beginning of the log file. The utility of this feature is that keywords set the value of the corresponding variable in the Wizard. If you know what you are doing, you can set any variable in the Wizard in this way, whether or not it is a keyword.
Basic operation of a Wizard from a script
●
You can run a wizard from a script like this (AutoSol wizard): phenix.runWizard AutoSol autosol.inp
The script file (autosol.inp) should contain keyword entries telling the Wizard what to do. The output will be written to the log file (e.g., AutoSol_run_1_/AutoSol_run_1_1.log).
●
The keywords that can be set in a script file include most of the keywords for command-line running, plus a set of control commands for running from a script. To see all the basic keywords for a wizard, make a script (e.g., keywords.inp) that says: list_keywords and then type: phenix.runWizard AutoSol keywords.inp
The keywords will be written to the log file (e.g., AutoSol_run_1_/AutoSol_run_1_1.log).
●
For help on a Wizard, your script file should say: help
●
Unlike running from the command-line, the order of entries in a script file can make a difference.
For example you can specify a group of inputs for one dataset and then start a new dataset.
●
If you want to stop a Wizard, you can create a file "STOPWIZARD" and put it in the subdirectory (e.g., AutoSol_run_2_/) where the Wizard is running. This is like hitting the PAUSE button on the GUI and stops the wizard cleanly.
Keeping track of multiple runs of a Wizard from a script
●
When you start a Wizard (whether from the command line or from a script), the default is to start a new run of that Wizard.
●
To see all the available runs of this Wizard, delete some runs, carry on with run 3, or copy run 4 into a new run, your script should say one of the following:

show_runs
delete_run_list 1 2 3-5
run 3
copy_run 4
Setting parameters of a Wizard from a script
●
You can set nearly any parameter using keywords from a script. For example: resolution 2.5
will set the overall high-resolution cutoff to 2.5 A.
Useful script commands
With the exception of show_runs and delete_runs, the output for each of these commands is written to the log file (e.g., AutoSol_run_1_/AutoSol_run_1_1.log).

help                         # print out this help message
show_runs                    # list all the runs that are saved
delete_runs 1 2 3-5 9:12     # delete runs 1 2 3-5 9-12
carry_on                     # continue on with the highest-numbered run
run 5                        # continue with run 5
copy_run 5                   # make a new copy of run 5 (with number equal to
                             # highest existing run number +1) and continue
                             # with this new copy
run 2 run_only DumpFacts     # list current values of all parameters in run 2 and stop
run_only nothing             # do nothing and stop
list_keywords                # list all the keywords and their possible values
run_list method_1 method_2   # run these methods and anything that follows automatically
run_only method_1 method_2   # run just these methods and stop
user_command method_1
list_methods                 # list all methods that can be run with run_list
These are a good way to run Wizards initially, and also a good way to change some parameters after stopping a run.
Note: these all have the form "keyword parameter", where the parameter must be enclosed in quotes if it is a string containing blanks. If the keyword contains the text "list" or the words "dataset_", "cell" or "input_labels", then the parameter can be a list of items separated by blanks:

cell 40 50 40 90 90 90

An empty list is indicated by "[]".
●
NOTE that you can set any SOLVE, RESOLVE or RESOLVE_PATTERN keyword in PHENIX using the "resolve_command", "solve_command" or "resolve_pattern_command" keywords from a script. The format is different from the command-line version: you don't have to put quotes around the command:

resolve_command ligand_start start.pdb # NOTE: quotes not necessary for script

This will put the text ligand_start start.pdb at the end of every temporary command file created to run resolve.
Specific limitations and problems:
●
In the GUI version of the Wizards, the Display Options window is updated only when you open it.
Further, once this window is open you cannot open it again until you close it. Sometimes this window may be behind other windows and this will prevent you from opening it again until you close the open window.
●
The Wizards use file names based on the names of your input files, but they do not differentiate between files with the same name coming from different directories. Consequently you should not use two files with different contents but with the same file name as inputs to a Wizard, even if they come from separate starting directories.
●
The command-line version of AutoSol cannot be used for MIR or for combining multiple datasets.
The script and GUI versions can be used instead for these cases.
●
If you stop a Wizard and continue on with a command such as phenix.autobuild run=2 then you can change most parameters with keywords just as if you were starting from scratch, but if you had previously changed a keyword away from the default, you cannot set it back to the default in this way (the Wizard ignores keywords that are the same as the default).
●
You should not work on the same run in two ways at the same time. This can lead to unpredictable results because the two runs will really be the same run and the data and databases for the two runs will be overwriting each other. This means you need to be careful that if you goto_run 1 of a Wizard in one window that you do not also goto_run 1 of the same
Wizard in another window. On the other hand, it is perfectly fine to work on run 1 of a Wizard in one window and run 2 of the same Wizard in another window.
●
The PHENIX Wizards can take most settings of most space groups; however, they can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting, due to difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.
Literature
Additional information
Automated structure solution with AutoSol
Python-based Hierarchical ENvironment for Integrated Xtallography
Datasets and Solutions in AutoSol
Analyzing and scaling the data
Finding heavy-atom (anomalously-scattering atom) sites
Running AutoSol separately in related space groups
Scoring of heavy-atom solutions
Density modification (including NCS averaging)
Preliminary model-building and refinement
Model viewing during model-building with the Coot-PHENIX interface
SAD dataset specifying solvent fraction
SAD dataset without model-building
SAD dataset, building RNA instead of protein
SAD dataset, selecting a particular dataset from an MTZ file
MRSAD -- SAD dataset with an MR model; Phaser SAD phasing including the model
Using an MR model to find sites and as a source of phase information (method #2 for MRSAD)
SAD dataset, reading heavy-atom sites from a PDB file written by phenix.hyss
MAD dataset, selecting particular datasets from an MTZ file
SAD with more than one anomalously-scattering atom
Specific limitations and problems
Author(s)
●
AutoSol Wizard: Tom Terwilliger
●
PHENIX GUI and PDS Server: Nigel W. Moriarty
●
HYSS: Ralf W. Grosse-Kunstleve and Paul D. Adams
●
Phaser: Randy J. Read, Airlie J. McCoy and Laurent C. Storoni
●
SOLVE: Tom Terwilliger
●
RESOLVE: Tom Terwilliger
●
TEXTAL: K. Gopal, T.R. Ioerger, R.K. Pai, T.D. Romo, J.C. Sacchettini
●
phenix.refine: Ralf W. Grosse-Kunstleve, Peter Zwart and Paul D. Adams
●
phenix.xtriage: Peter Zwart
Purpose
The AutoSol Wizard uses HYSS, SOLVE, Phaser, RESOLVE, TEXTAL, xtriage and phenix.refine to solve a structure and generate experimental phases with the MAD, MIR, SIR, or SAD methods. The Wizard begins with datafiles (.sca, .hkl, etc.) containing amplitudes of structure factors, identifies heavy-atom sites, calculates phases, carries out density modification and NCS identification, and builds and refines a preliminary model.
Usage
The AutoSol Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See
Running a Wizard from a GUI, the command-line, or a script
for details of how to run a Wizard. The command-line version will be described here, except for MIR and multiple datasets, which can only be run with the GUI or with a script.
How the AutoSol Wizard works
The basic steps that the AutoSol Wizard carries out are described below. They are: Setting up inputs, Analyzing and scaling the data, Finding heavy-atom (anomalously-scattering atom) sites, Scoring of heavy-atom solutions, Phasing, Density modification (including NCS averaging), and Preliminary model-building and refinement. The data for structure solution are grouped into Datasets and solutions are stored in Solution objects.
Setting up inputs
The AutoSol Wizard expects the following basic information:
(1) a datafile name (w1.sca or data=w1.sca)
(2) a sequence file (seq.dat or seq_file=seq.dat)
(3) how many sites to look for (2 or sites=2)
(4) what the anomalously-scattering atom is (Se or atom_type=Se)
(5) If you have SAD or MAD data, then it is helpful to add f_prime and f_double_prime for each wavelength.
You can also specify many other parameters, including resolution, number of sites, whether to search in a thorough or quick fashion, how thoroughly to build a model, etc. If you have a heavy-atom solution from a previous run or another approach, you can read it in directly as well.
Datasets and Solutions in AutoSol
AutoSol breaks down the data for a structure solution into datasets, where a dataset is a set of data that corresponds to a single set of heavy-atom sites. An entire MAD dataset is a single dataset. An MIR structure solution consists of several datasets (one for each native-derivative combination). A MAD + SIR structure has one dataset for the MAD data and a second dataset for the SIR data. The heavy-atom sites for each dataset are found separately (but using difference Fouriers from any previously-solved datasets to help). In the phasing step all the information from all datasets is merged into a single set of phases.
The AutoSol wizard uses a "Solution" object to keep track of heavy-atom solutions and the phased datasets that go with them. There are two types of Solutions: those which consist of a single dataset (Primary
Solutions) and those that are combinations of datasets (Composite Solutions). "Primary" Solutions have information on the datafiles that were part of the dataset and on the heavy-atom sites for this dataset.
Composite Solutions are simply sets of Primary Solutions, with associated origin shifts. The hand of the heavy-atom or anomalously-scattering atom substructure is part of a Solution, so if you have two datasets, each with two Solutions related by inversion, then AutoSol would normally construct four different
Composite Solutions from these and score each one as described below.
Analyzing and scaling the data
The AutoSol Wizard analyzes input datasets with phenix.xtriage to identify twinning and other conditions that may require special care. The data is scaled with SOLVE. For MAD data, FA values are calculated as well.
Note on anisotropy corrections:
The AutoSol wizard will apply an anisotropy correction to all the raw experimental data if any of the files in the first dataset read in have a very strong anisotropy. You can tell the Wizard how much anisotropy there must be before the correction is applied by default, using the keywords:

correct_aniso=True                   # (if True or False then always or never apply correction)
delta_b_for_auto_correct_aniso=20    # correct if range of anisotropic B is greater than 20
ratio_b_for_auto_correct_aniso=1.5   # correct if the ratio of the largest to smallest anisotropic B is greater than 1.5
If an anisotropy correction is applied then a separate refinement file must be specified if refinement is to be carried out. This is because it is best to refine against data that have not been corrected for anisotropy
(instead applying the correction as part of refinement).
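For example, to force the correction on for a run (a sketch with hypothetical file names, using the keywords listed above):

phenix.autosol data=w1.sca seq_file=seq.dat sites=2 atom_type=Se \
  correct_aniso=True   # always apply the anisotropy correction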
Finding heavy-atom (anomalously-scattering atom) sites
The AutoSol Wizard uses HYSS to find heavy-atom sites. The result of this step is a list of possible heavy-atom solutions for a dataset. For SIR or SAD data, the isomorphous or anomalous differences, respectively, are used as input to HYSS. For MAD data, the anomalous differences at each wavelength, and the FA estimates of complete heavy-atom structure factors from SOLVE, are each used as separate inputs to HYSS.
Each heavy-atom substructure obtained from HYSS corresponds to a potential solution. In space groups where the heavy-atom structure can be either hand, a pair of enantiomorphic solutions is saved for each run of HYSS.
Running AutoSol separately in related space groups
AutoSol will check for the opposite hand of the heavy-atom solution, and at the same time it will check for the opposite hand of your space group (it will invert the heavy-atom solution from HYSS and invert the hand of the space group at the same time). Therefore you do not need to run AutoSol twice for space groups that are chiral (for example P41); the corresponding inverse space group (P43) will be checked automatically. If there are possibilities for your space group other than the inverse hand of the space group, then you should test them all, one at a time. For example, if you were not able to measure 00l reflections in a hexagonal space group, your space group might be P6, P61, P62, P63, P64 or P65. In this case you would have to run AutoSol in P6, P61, P62 and P63 (P65 and P64 will then be done automatically as the inverses of P61 and P62). Normally only one of these will give a plausible solution.
Scoring of heavy-atom solutions
Potential heavy-atom solutions are scored based on a set of criteria (CC, RFACTOR, SKEW, FOM,
NCS_OVERLAP, TRUNCATION, REGIONS, SD; described below), using either a Bayesian estimate, a linear regression, or a Z-score system to put all the scores on a common scale and to combine them into a single overall score. The overall scoring method chosen (BAYES-CC or Z-SCORE) is determined by the value of the keyword overall_score_method. The default is BAYES-CC. Note that for all scoring methods, the map that is being evaluated, and the estimates of map-perfect-model correlation, refer to the experimental electron density map, not the density-modified map.
Bayesian CC scores (BAYES-CC). Bayesian estimates of the quality of experimental electron density maps are obtained using data from a set of previously-solved datasets. The standard scoring criteria were evaluated for 1905 potential solutions in a set of 246 MAD, SAD, and MIR datasets. As each dataset had previously been solved, the correlation between the refined model and each experimental map
(CC_PERFECT) could be calculated for each solution (after offsetting the maps to account for origin differences). Histograms were tabulated of the number of instances that a scoring criterion (e.g., SKEW) had various possible values, as a function of the CC_PERFECT of the corresponding experimental map to the refined model. These histograms yield the relative probability of measuring a particular value of that scoring criterion (SKEW), given the value of CC_PERFECT. Using Bayes' rule, these probabilities can be used to estimate the relative probabilities of values of CC_PERFECT given the value of each scoring criterion for a particular electron density map. The mean estimate (BAYES-CC) is reported (multiplied x 100), with a +/-
2SD estimate of the uncertainty in this estimate of CC_PERFECT. The BAYES-CC values are estimated independently for each scoring criterion used, and also from all those selected with the keyword score_type_list and not selected with the keyword skip_score_list.
Z-scores (Z-SCORE). The Z-score for one criterion for a particular solution is given by,
Z = (Score - mean_random_solution_score) / (SD_of_random_solution_scores)

where Score is the score for this solution, mean_random_solution_score is the mean score for a solution with randomized phases, and SD_of_random_solution_scores is the standard deviation of the scores of solutions with randomized phases.
To create a total score based on Z-scores, the Z-scores for each criterion are simply summed.
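As a worked illustration (with invented numbers): if a solution has SKEW = 0.30 while solutions with randomized phases have a mean SKEW of 0.00 with SD 0.10, then Z(SKEW) = (0.30 - 0.00)/0.10 = 3.0. If the CC criterion similarly gives Z(CC) = 2.0, the total Z-SCORE for that solution is 3.0 + 2.0 = 5.0.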
The principal scoring criteria are:
(1) Correlation of map-phased electron density map with experimentally-phased map (CC). The statistical density modification in RESOLVE allows the calculation of map-based phases that are (mostly) independent of the experimental phases. The phase information in statistical density modification comes from two sources: your experimental phases and maximization of the agreement of the map with expectations (such as a flat solvent region). Normally the phase probabilities from these two sources are merged together, yielding your density-modified phases. This score is calculated based on the correlation of the phase information from these two sources before combining them, and is a good indication of the quality of the experimental phases. This criterion is used in scoring by default.
(2) The R-factor for density modification (R-Factor). Statistical density modification provides an estimate of structure factors that is (mostly) independent of the measured structure factors, so the R-factor between FC and Fobs is a good measure of the quality of experimental phases. This criterion is used in scoring by default.
(3) The skew (third moment or normalized <rho**3>) of the density in an electron density map is a good measure of its quality, because a random map has a skew of zero (density histograms look like a Gaussian), while a good map has a very positive skew (density histograms very strong near zero, but many points with very high density). This criterion is used in scoring by default.
(4) Non-crystallographic symmetry (NCS overlap). The presence of NCS in a map is a nearly-positive indication that the map is good, or has some correct features. The AutoSol Wizard uses symmetry in heavy-atom sites to suggest NCS, and RESOLVE identifies the actual correlation of NCS-related density for the NCS overlap score. This score is used by default if NCS is present in the Z-score method of scoring.
(5) Figure of merit (FOM). The figure of merit of phasing is a good indicator of the internal consistency of a solution. This score is not normalized by the SD of randomized phase sets (as that has no meaning; rather a standard SD=0.05 is used). This score is used by default if NCS is present in the Z-score method of scoring and in the Bayesian CC estimate method.
(6) Map correlation after truncation (TRUNCATION). Dummy atoms (the same number as estimated non-hydrogen atoms in the structure) are placed in positions of high density of the map, and a new map is calculated based on these atomic positions. The correlation of these maps is calculated after adjusting an overall B-value for the dummy atoms to maximize the correlation. A good map will show a high correlation of these maps. This score is by default not used.
(7) Number of contiguous regions per 100 A**3 comprising top 5% of density in map (REGIONS). The top 5% of points in the map are marked, the number of contiguous regions that result is counted, divided by the volume of the asymmetric unit, and then multiplied by 100. A good map will have just a few contiguous regions at a high contour level; a poor map will have many isolated peaks. This score is by default not used.

(8) Standard deviation of local rms density (SD). The local rms density in the map is calculated using a smoothing radius of 3 times the high-resolution cutoff (or 6 A, if less than 6 A). Then the standard deviation of the local rms, normalized to the mean value of the local rms, is reported. This criterion will be high if there are regions of high local rms (the macromolecule) and separate regions of low local rms (the solvent), and low if the map is random. This score is by default not used.
Phasing
The AutoSol Wizard uses Phaser to calculate experimental phases from SAD data, and SOLVE to calculate phases for MIR, MAD, and multiple-dataset cases.
Density modification (including NCS averaging)
The AutoSol Wizard uses RESOLVE to carry out density modification. It identifies NCS from symmetries in heavy-atom sites with RESOLVE and applies this NCS if it is present in the electron density map.
Preliminary model-building and refinement
The AutoSol Wizard carries out one cycle of model-building and refinement after obtaining density-modified phases. The model-building can be done with RESOLVE or with TEXTAL. The refinement is carried out with phenix.refine.
Resolution limits in AutoSol
There are several resolution limits used in AutoSol. You can leave them all to default, or you can set any of them individually. Here is a list of these limits and how their default values are set:
Name              Description                               How default value is set
resolution        Overall resolution for a dataset          Highest resolution for any datafile in this
                                                            dataset. For multiple datasets, the highest
                                                            resolution for any dataset.
resolution_build  Resolution for model-building             Value of "resolution".
res_phase         Resolution for phasing for a dataset      If phase_full_resolution=True then use value
                                                            of "resolution". Otherwise, use value of
                                                            "recommended_resolution" based on analysis of
                                                            signal-to-noise in dataset.
res_eval          Resolution for evaluation of              Value of "resolution" or 2.5 A, whichever is
                  solution quality                          lower resolution.
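For example, to override two of these limits on the command line (a sketch with hypothetical file names, assuming the names in the table are given directly as keywords):

phenix.autosol data=w1.sca seq_file=seq.dat sites=2 atom_type=Se \
  resolution=2.2 res_phase=2.8   # overall cutoff 2.2 A, phase to 2.8 A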
Output files from AutoSol
When you run AutoSol the output files will be in a subdirectory with your run number:
AutoSol_run_1_/
The key output files that are produced are:
●
A summary file listing the results of the run and the other files produced:
AutoSol_summary.dat # overall summary
●
A warnings file listing any warnings about the run
AutoSol_warnings.dat # any warnings
●
A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)
AutoSol_Facts.dat # all Facts about the run
●
NCS information (if any)
AutoSol_15.ncs_spec # NCS information. The number is the solution number
●
Experimental phases and HL coefficients solve_15.mtz # either solve or phaser depending on which was run phaser_15.mtz
●
Density-modified phases from RESOLVE current_cycle_map_coeffs.mtz # map coefficients (density modified phases) resolve_15.mtz # density-modified phases; same as above
For either of these, use FP PHIM FOMM for PHI F FOM.
●
An mtz file for use in refinement exptl_fobs_phases_freeR_flags_15.mtz # F Sigma HL coeffs, freeR-flags for refinement
●
Heavy atom sites in PDB format ha_15.pdb_formatted.pdb
●
Current preliminary model and evaluation of model current_cycle.pdb
current_cycle_eval.log
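As one possible next step (a hedged sketch; the solution number 15 matches the example file names above, and you would normally add further refinement options):

phenix.refine exptl_fobs_phases_freeR_flags_15.mtz current_cycle.pdb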
How to run the AutoSol Wizard
Running the AutoSol Wizard is easy. From the command-line you can type: phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5
The AutoSol Wizard will assume that w1.sca is a datafile (because it ends in .sca and is a file) and that seq.dat is a sequence file, that there are 2 heavy-atom sites, and that the heavy-atom is Se. The f_prime and f_double_prime values are set explicitly.
You can also specify each of these things directly: phenix.autosol data=w1.sca seq_file=seq.dat sites=2 \
atom_type=Se f_prime=-8 f_double_prime=4.5
You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page, and also the general information about running Wizards at Running a Wizard from a GUI, the command-line, or a script. Some commonly-used parameters are:

sites=3              # 3 sites
sites_file=sites.pdb # ha sites in PDB or fractional xyz format
atom_type=Se         # Se is the heavy-atom
seq_file=seq.dat     # sequence file (1-aa code, separate chains with >>>>)
quick=True           # try to find sites quickly
data=w1.sca          # input datafile
f_prime=-5           # f-prime value for SAD
f_double_prime=4.5   # f-double-prime value for SAD
Model viewing during model-building with the Coot-PHENIX interface
The AutoSol Wizard allows you to view the current best model produced by the automated model-building in the Wizard. Normally you would use it just to view the model in AutoSol, and to view and edit a model in AutoBuild. The PHENIX-Coot interface is accessible through the GUI and via the command-line. Using the GUI, when a model has been produced by the AutoSol Wizard, you can double-click the button on the GUI labelled View/edit files with coot to start Coot with your current map and model. If you are running from the command-line, you can open a new window and type: phenix.autobuild coot which will do the same (provided the necessary map and model are ready). When Coot has been loaded, your map and model will be displayed along with a PHENIX-Coot Interface window. If you want, you can edit your model and then save it, giving it back to PHENIX with the button labelled something like Save model as COMM/overall_best_coot_7.pdb. This button creates the indicated file and also tells PHENIX to look for this file and to try to include the contents of the model in the building process. In AutoSol, only the main-chain atoms of the model you save are considered, and the side-chains are ignored. Ligands and solvent in the model are ignored as well. As the AutoSol Wizard continues to build new models and create new maps, you can update the PHENIX-Coot Interface to the current best model and map with the button Update with current files from PHENIX.
Examples
SAD dataset
phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5
The sequence file is used to estimate the solvent content of the crystal and for model-building. Note that for a SAD dataset the values of f_prime and f_double_prime are not critical. If you are off by a factor of 2 on f_double_prime, the refined occupancies of heavy-atom sites might be 1/2 their correct values.
SAD dataset specifying solvent fraction
phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
solvent_fraction=0.45
This will force the solvent fraction to be 0.45. This illustrates a general feature of the Wizards: they will try to estimate values of parameters, but if you input them directly, they will use your input values.
SAD dataset without model-building
phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
build=False
This will carry out the usual structure solution, but will skip model-building.
SAD dataset, building RNA instead of protein
phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
chain_type=RNA
This will carry out the usual structure solution, but will build an RNA chain. For DNA, specify chain_type=DNA. You can only build one type of chain at a time in the AutoSol Wizard. To build protein and DNA, use the AutoBuild Wizard and run it first with chain_type=PROTEIN, then run it again specifying the protein model as input_lig_file_list=proteinmodel.pdb and with chain_type=DNA.
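A sketch of that two-step AutoBuild procedure (file names hypothetical):

phenix.autobuild data=data.mtz seq_file=protein_seq.dat chain_type=PROTEIN
phenix.autobuild data=data.mtz seq_file=dna_seq.dat chain_type=DNA \
  input_lig_file_list=proteinmodel.pdb   # carry the prebuilt protein along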
SAD dataset, selecting a particular dataset from an MTZ file
If you have an input MTZ file with more than one anomalous dataset, you can type something like:

phenix.autosol w1.mtz seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
  labels='F SIGF DANO SIGDANO'

This will carry out the usual structure solution, but will choose the input data columns based on the labels 'F SIGF DANO SIGDANO'. If you run the AutoSol Wizard with SAD data and an MTZ file containing more than one anomalous dataset and don't tell it which one to use, all possible values of labels are printed out for you so that you can just paste in the one you want.
You can also find out all the possible label strings to use by typing: phenix.autosol display_labels=w1.mtz # display all labels for w1.mtz
MRSAD -- SAD dataset with an MR model; Phaser SAD phasing including the model
If you are carrying out SAD phasing with Phaser, you can carry out a combination of molecular replacement phasing and SAD phasing (MRSAD) by adding a single new keyword to your AutoSol run: input_partpdb_file=MR.pdb
In this case the MR.pdb file will be used as a partial model in a maximum-likelihood SAD phasing calculation with Phaser to calculate phases and identify sites, and the combined MR+SAD phases will be written out. NOTE: At the moment the AutoBuild Wizard is not equipped to use these combined phases optimally in iterative model-building, density modification and refinement, because they contain both experimental phase information and model information. It is therefore possible that the resulting phases are biased by your MR model, and that this bias will not go away during iterative model-building because it is continually fed back in.
Using an MR model to find sites and as a source of phase information (method #2 for MRSAD)
You can also combine MR information with SAD phases (see J.P. Schuermann and J.J. Tanner, Acta Cryst. (2003) D59, 1731-1736) in PHENIX by running the three Wizards AutoMR, AutoSol, and AutoBuild one after the other. This method does not use the partial model and the anomalous information in the SAD dataset simultaneously, as the above Phaser maximum-likelihood method does. On the other hand, the phases obtained in this method are independent of the model, so that combining them afterwards does not introduce model bias. (It is not yet clear which is the better approach, so you may wish to try both.)
Additionally, this approach can be used with any method for phasing. Here is a set of three simple commands to do this: First run AutoMR to find the molecular replacement solution, but don't rebuild it yet: phenix.automr gene-5.pdb infl.sca copies=1 \
RMS=1.5 mass=9800 rebuild_after_mr=False
Now your MR solution is in AutoMR_run_1_/MR.1.pdb and phases are in AutoMR_run_1_/MR.1.mtz.
Use these phases as input to AutoSol, along with some weak SAD data, still not building any new models:
phenix.autosol data=infl.sca \
  input_phase_file=AutoMR_run_1_/MR.1.mtz input_phase_labels="F PHIC FOM" \
  seq_file=sequence.dat build=False

Note that we have specified the data columns for F, PHI and FOM in the input_phase_file. For input_phase_file you must specify all three of these (if you leave out FOM it will be set to zero). AutoSol will write an MTZ file with experimental phases to phaser_xx.mtz, where xx depends on how many solutions are considered during the run. You will need to edit the next command for running AutoBuild depending on the value of xx:
phenix.autobuild data=AutoSol_run_1_/phaser_2.mtz \
model=AutoMR_run_1_/MR.1.pdb seq_file=sequence.dat rebuild_in_place=False
AutoBuild will now take the phases from your AutoSol run and combine them with model-based information from your AutoMR MR solution, and will carry out iterative density modification, model-building and refinement to rebuild your model. Note that you may wish to set rebuild_in_place=True, depending on how good your MR model is.
SAD dataset, reading heavy-atom sites from a PDB file written by phenix.hyss
phenix.autosol 11 Pb data=deriv.sca seq_file=seq.dat \
sites_file=deriv_hyss_consensus_model.pdb
This will carry out the usual structure solution process, but will read sites from deriv_hyss_consensus_model.pdb, try both hands, and carry on from there. If you know the hand of the substructure, you can fix it with have_hand=True.
MAD dataset
The inputs for a MAD dataset need to specify f_prime and f_double_prime for each wavelength. It also must be clear which datafile goes with which wavelength. If you input an MTZ file with multiple datasets, then the order of those datasets is assumed to be the same as the order of the wavelengths. You may want to either select particular datasets from your MTZ file (see below) or split such an MTZ file into separate files for each dataset if this does not work in the way you expect.

phenix.autosol seq_file=seq.dat sites=2 atom_type=Se \
  peak.data=w1.sca peak.f_prime=-8 peak.f_double_prime=4.5 \
  infl.data=w2.sca infl.f_prime=-9 infl.f_double_prime=1.9 \
  high.data=w3.sca high.f_prime=-5 high.f_double_prime=3.0
MAD dataset, selecting particular datasets from an MTZ file
This is similar to the case for SAD data. If you have an input MTZ file with more than one anomalous dataset, you can type something like:

phenix.autosol seq_file=seq.dat sites=2 atom_type=Se \
  peak.data=all_data.mtz peak.f_prime=-8 peak.f_double_prime=4.5 \
  high.data=all_data.mtz high.f_prime=-5 high.f_double_prime=3.0 \
  peak.labels='Fpeak SIGFpeak DANOpeak SIGDANOpeak' \
  high.labels='Fhigh SIGFhigh DANOhigh SIGDANOhigh'

This will carry out the usual structure solution, but will choose the input peak data columns based on the labels 'Fpeak SIGFpeak DANOpeak SIGDANOpeak', and the high data from the ones labelled 'Fhigh SIGFhigh DANOhigh SIGDANOhigh'.
As in the SAD case, you can find out all the possible label strings to use by typing: phenix.autosol display_labels=w1.mtz # display all labels for w1.mtz
SIR dataset
The standard inputs for an SIR dataset are the native and derivative, the sequence file, the heavy-atom type, and the number of sites, as well as whether to use anomalous differences (or just isomorphous differences): phenix.autosol native.data=native.sca deriv.data=deriv.sca \
deriv.atom_type=I deriv.sites=2 deriv.inano=inano
This will set the heavy-atom type to Iodine, look for 2 sites, and include anomalous differences.
SAD with more than one anomalously-scattering atom
You can tell the AutoSol wizard to look for more than one anomalously-scattering atom. Specify one atom type (Se) in the usual way. Then specify any additional ones like this if you are running AutoSol from the command line:

mad_ha_add_list="Br Pt"
mad_ha_add_f_prime_list="-7 -10"
mad_ha_add_f_double_prime_list="4.2 12"
There must be the same number of entries in each of these three keyword lists. During phasing Phaser will try to add whichever atom types best fit the scattering from each new site. This option is available for SAD phasing only.
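Putting this together, a sketch of a full command (data file and scattering values hypothetical):

phenix.autosol data=w1.sca seq_file=seq.dat sites=2 atom_type=Se \
  mad_ha_add_list="Br Pt" mad_ha_add_f_prime_list="-7 -10" \
  mad_ha_add_f_double_prime_list="4.2 12"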
MIR dataset
An MIR dataset is a set of more than one dataset. This cannot be readily expressed in the command-line inputs, but you can specify it easily with the PHENIX AutoSol GUI or with a script. In a script file you can say:

cell 93.796 79.849 43.108 90.000 90.000 90.00 # cell params
thoroughness thorough                         # best to use thorough for MIR
resolution 2.8                                # Resolution
expt_type sir                                 # MIR dataset is set of SIR datasets
input_seq_file sequence.dat
############## DATASET 1 ################
input_file_list rt_rd_1.sca auki_rd_1.sca     # Native and deriv 1
nat_der_list Native Au                        # identify files by ha type
inano_list noinano inano                      # say if ano diffs to be used
n_ha_list 0 5                                 # number of heavy-atoms
run_list start                                # read in datafiles for dataset
run_list read_another_dataset                 # about to start a new dataset here
############## DATASET 2 ################
input_file_list rt_rd_1.sca hgki_rd_1.sca     # Native and deriv 2
nat_der_list Native Hg
inano_list noinano inano
n_ha_list 0 5
#########################################
The script file carries out steps in the order that they are input. This allows us to read in one entire dataset, save it, then read in another one. The AutoSol Wizard will solve each dataset and then combine them and phase the combined dataset with SOLVE Bayesian correlated phasing, taking into account any correlations among the non-isomorphism and heavy-atom sites for the various derivatives.
SIR + SAD datasets
A combination of SIR and SAD datasets is almost the same as an MIR dataset in the AutoSol Wizard. You specify each dataset separately, and put "start" and "read_another_dataset" between the datasets:

cell 93.796 79.849 43.108 90.000 90.000 90.00 # cell params
resolution 2.8                                # Resolution
input_seq_file sequence.dat
############## DATASET 1 ################
expt_type sir                                 # MIR dataset is set of SIR datasets
input_file_list rt_rd_1.sca auki_rd_1.sca     # Native and deriv 1
nat_der_list Native Au                        # identify files by ha type
inano_list noinano inano                      # say if ano diffs to be used
n_ha_list 0 5                                 # number of heavy-atoms
run_list start                                # read in datafiles for dataset
run_list read_another_dataset                 # about to start a new dataset here
############## DATASET 2 ################
expt_type sad                                 # our second dataset is SAD
input_file_list hgki_rd_1.sca                 # anom diffs for SAD dataset
mad_ha_n 5                                    # 5 sites
#########################################
The SIR and SAD datasets will be solved separately (but whichever one is solved first will use difference Fouriers or anomalous difference Fouriers to locate sites for the other). Then phases will be combined by addition of Hendrickson-Lattman coefficients and the combined phases will be density modified.
Possible Problems
General limitations
Specific limitations and problems
●
The size of the asymmetric unit in the SOLVE/RESOLVE portion of the AutoSol wizard is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36) versions. Larger-size versions can be obtained on request.
●
The command-line version of AutoSol cannot be used for MIR or for combining multiple datasets. The script and GUI versions can be used instead for these cases.
●
The AutoSol Wizard can take a maximum of 6 derivatives for MIR.
●
The AutoSol Wizard can take most settings of most space groups; however, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting, due to difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.
Literature
Simple algorithm for a maximum-likelihood SAD function. A.J. McCoy, L.C. Storoni and R.J. Read. Acta Cryst. D60, 1220-1228 (2004) [pdf]

Substructure search procedures for macromolecular structures. R.W. Grosse-Kunstleve and P.D. Adams. Acta Cryst. D59, 1966-1973 (2003) [pdf]

MAD phasing: Bayesian estimates of FA. T.C. Terwilliger. Acta Cryst. D50, 11-16 (1994) [pdf]
Additional information
List of all AutoSol keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
black - parameter names red - parameter values blue - parameter help
blue bold
- scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
------------------------------------------------------------------------------- autosol
sites= None Number of heavy-atom sites. This is an alias for the keyword
mad_ha_n. (Command-line only)
sites_file= None PDB or plain-text file with ha sites. This is an alias for
the keyword ha_sites_file. (Command-line only)
atom_type= None Anomalously-scattering atom type. This is an alias for the
keyword mad_ha_type. (Command-line only)
seq_file= Auto Sequence file . This is an alias for the keyword
input_seq_file. (Command-line only)
quick= None Run everything quickly (thoroughness=quick) (Command-line only)
data= None Datafile. For command_line input it is easiest if each
wavelength of data is in a separate data file with obvious data
columns. File types that are easy to read include Scalepack sca files
, CNS hkl files, mtz files with just one wavelength of data, or just
native or just derivative. In this case the Wizard can read your data
without further information. If you have a datafile with many
columns, you can use the "labels" keyword to specify which data
columns to read. (It may be easier in some cases to use the GUI or to
split it with phenix.reflection_file_converter first, however.)
(Command-line only)
labels= None Specification string for data labels (Command_line only). To
find out what the appropriate strings are, type "phenix.autosol
display_labels=your-datafile-here.mtz"
f_prime= None F-prime value for any wavelength. (Command-line only)
f_double_prime= None F-doubleprime value for any wavelength. (Command_line
only)
special_keywords
write_run_directory_to_file= None Writes the full name of a run
directory to the specified file. This can
be used as a call-back to tell a script
where the output is going to go.
(Command-line only)
run_control
coot= None Set coot to True and optionally run=[run-number] to run Coot
with the current model and map for run run-number. In some wizards
(AutoBuild) you can edit the model and give it back to PHENIX to
use as part of the model-building process. If you just say coot
then the facts for the highest-numbered existing run will be
shown. (Command-line only)
ignore_blanks= None ignore_blanks allows you to have a command-line
keyword with a blank value like "input_lig_file_list="
stop= None You can stop the current wizard with "stopwizard" or "stop".
If you type "phenix.autobuild run=3 stop" then this will stop run
3 of autobuild. (Command-line only)
display_facts= None Set display_facts to True and optionally
run=[run-number] to display the facts for run run-number.
If you just say display_facts then the facts for the
highest-numbered existing run will be shown.
(Command-line only)
display_summary= None Set display_summary to True and optionally
run=[run-number] to show the summary for run
run-number. If you just say display_summary then the
summary for the highest-numbered existing run will be
shown. (Command-line only)
carry_on= None Set carry_on to True to carry on with highest-numbered
run from where you left off. (Command-line only)
run= None Set run to n to continue with run n where you left off.
(Command-line only)
copy_run= None Set copy_run to n to copy run n to a new run and continue
where you left off. (Command-line only)
display_runs= None List all runs for this wizard. (Command-line only)
delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
display_labels= None display_labels=test.mtz will list all the labels
that identify data in test.mtz. You can use the label
strings that are produced in AutoSol to identify which
data to use from a datafile like this: peak.data="F+
SIGF+ F- SIGF-" # the entire string in quotes counts
here You can use the individual labels from these
strings as identifiers for data columns in AutoSol and
AutoBuild like this: input_refinement_labels="FP SIGFP
FreeR_flags" # each individual label counts
dry_run= False Just read in and check parameter names
params_only= False Just read in and return parameter defaults
display_all= False Just read in and display parameter defaults
peak
data= None Datafile for peak wavelength. (Command_line only)
labels= None Specification string for data labels for peak wavelength.
(Command_line only). To find out what the appropriate strings
are, type "phenix.autosol display_labels=your-datafile-here.mtz" http://phenix-online.org/documentation/autosol.htm (13 of 29) [12/14/08 1:00:42 PM]
46
Automated structure solution with AutoSol
f_prime= None F-prime value for peak wavelength. (Command_line only)
f_double_prime= None F-doubleprime value for peak wavelength.
(Command_line only)
infl
data= None Datafile for infl wavelength. (Command-line only)
labels= None Specification string for data labels for infl wavelength.
(Command-line only). To find out what the appropriate strings
are, type "phenix.autosol display_labels=your-datafile-here.mtz"
f_prime= None F-prime value for infl wavelength. (Command-line only)
f_double_prime= None F-doubleprime value for infl wavelength.
(Command-line only)
high
data= None Datafile for high wavelength. (Command-line only)
labels= None Specification string for data labels for high wavelength.
(Command-line only). To find out what the appropriate strings
are, type "phenix.autosol display_labels=your-datafile-here.mtz"
f_prime= None F-prime value for high wavelength. (Command-line only)
f_double_prime= None F-doubleprime value for high wavelength.
(Command-line only)
low
data= None Datafile for low wavelength. (Command-line only)
labels= None Specification string for data labels for low wavelength.
(Command-line only). To find out what the appropriate strings
are, type "phenix.autosol display_labels=your-datafile-here.mtz"
f_prime= None F-prime value for low wavelength. (Command-line only)
f_double_prime= None F-doubleprime value for low wavelength.
(Command-line only)
remote
data= None Datafile for remote wavelength. (Command-line only)
labels= None Specification string for data labels for remote wavelength.
(Command-line only). To find out what the appropriate strings
are, type "phenix.autosol display_labels=your-datafile-here.mtz"
f_prime= None F-prime value for remote wavelength. (Command-line only)
f_double_prime= None F-doubleprime value for remote wavelength.
(Command-line only)
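As a sketch, a two-wavelength MAD dataset might be described with the wavelength keywords above (the file names and scattering-factor values are illustrative only, not recommendations):
phenix.autosol input_seq_file=seq.dat mad_ha_type=Se \
peak.data=peak.sca peak.f_prime=-7.0 peak.f_double_prime=4.5 \
infl.data=infl.sca infl.f_prime=-9.0 infl.f_double_prime=2.0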
native
data= None Datafile for native. (Command-line only)
labels= None Specification string for data labels for native.
(Command-line only). To find out what the appropriate strings
are, type "phenix.autosol display_labels=your-datafile-here.mtz"
atom_type= Native Heavy-atom type for native. (Command-line only)
sites= 0 Number of heavy-atom sites for native. (Command-line only)
inano= *noinano inano anoonly Use anomalous differences for native.
(Command-line only)
deriv
data= None Datafile for deriv. (Command-line only)
labels= None Specification string for data labels for deriv.
(Command-line only). To find out what the appropriate strings
are, type "phenix.autosol display_labels=your-datafile-here.mtz"
atom_type= I Heavy-atom type for deriv. (Command-line only)
sites= 2 Number of heavy-atom sites for deriv. (Command-line only)
inano= noinano *inano anoonly Use anomalous differences for deriv.
(Command-line only)
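Similarly, a minimal SIR setup using the native and deriv keywords might look like this (the file names and number of sites are hypothetical):
phenix.autosol input_seq_file=seq.dat native.data=native.sca \
deriv.data=pt_soak.sca deriv.atom_type=Pt deriv.sites=2 deriv.inano=inano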
crystal_info
cell= 0.0 0.0 0.0 0.0 0.0 0.0 Enter cell parameters a b c alpha beta gamma
chain_type= *Auto PROTEIN DNA RNA You can specify whether to build
protein, DNA, or RNA chains. At present you can only build
one of these in a single run. If you have both DNA and
protein, build one first, then run AutoBuild again,
supplying the prebuilt model in the "input_lig_file_list"
and build the other. NOTE: default for this keyword is Auto,
which means "carry out normal process to guess this
keyword". The process is to look at the sequence file and/or
input pdb file to see what the chain type is. If there is
more than one type, the type with the larger number of
residues is guessed. If you want to force the chain_type,
then set it to PROTEIN, RNA, or DNA.
change_sg= False You can change the space group. In AutoSol the Wizard
will use ImportRawData and let you specify the sg and cell.
In AutoMR the wizard will give you an entry form to specify
them. NOTE: This only applies when reading in new datasets.
It does nothing when changed after datasets are read in.
residues= None Number of amino acid residues in the au (or equivalent)
resolution= 0.0
High-resolution limit. Used as resolution limit for
density modification and as general default high-resolution
limit. If resolution_build or refinement_resolution are set
then they override this for model-building or refinement. If
overall_resolution is set then data beyond that resolution
is ignored completely.
sg= None Space Group symbol (e.g., C2221 or C 2 2 21)
solvent_fraction= None Solvent fraction (typically 0.4 - 0.6)
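For instance, to supply crystal information explicitly rather than taking it from the data file headers (a sketch; the SAD datafile, sequence file, and all values are placeholders):
phenix.autosol peak.data=w1.sca input_seq_file=seq.dat \
sg=C2221 cell="76 28 42 90 90 90" residues=120 solvent_fraction=0.45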
decision_making
acceptable_quality= 40.0
You can specify the minimum overall quality of
a model (as defined by overall_score_method) to be
considered acceptable
acceptable_secondary_structure_cc= 0.35
You can specify the minimum
correlation of density from a
secondary structure model to be
considered acceptable
create_scoring_table= False Choose whether you want a scoring table for
solutions. A scoring table is slower but better.
desired_coverage= 0.8
Choose what probability you want to have that the
correct solution is in your current list of top
solutions. A good value is 0.80. If you set a low
value (0.01) then only one solution will be kept at
any time; if you set a high value, then many solutions
will be kept (and it will take longer).
ha_iteration= False Choose whether you want to iterate the heavy-atom
search. With iteration, sites are found with HYSS, then
used to phase and carry out quick density-modification,
then difference Fourier is used to find sites again and
improve their accuracy.
hklperfect= None Enter an mtz file with idealized coefficients for a map.
This will be compared with all maps calculated during
structure solution.
max_cc_extra_unique_solutions= 0.5
Specify the maximum value of CC
between experimental maps for two
solutions to consider them substantially
different. Solutions that are within the
range for consideration based on
desired_coverage, but are outside of the
number of allowed max_choices, will be
considered, up to
max_extra_unique_solutions, if they have
a correlation of no more than
max_cc_extra_unique_solutions with all
other solutions to be tested.
max_choices= 3 Number of choices for solutions to put on screen
max_composite_choices= 8 Number of choices for composite solutions to
consider
max_extra_unique_solutions= 2 Specify the maximum number of solutions to
consider based on their uniqueness as well
as their high scores. Solutions that are
within the range for consideration based on
desired_coverage, but are outside of the
number of allowed max_choices, will be
considered, up to
max_extra_unique_solutions, if they have a
correlation of no more than
max_cc_extra_unique_solutions with all other
solutions to be tested.
max_ha_iterations= 2 Number of iterations of difference Fouriers in
searching for heavy-atom sites
max_range_to_keep= 4.0
The range of solutions to be kept is
range_to_keep * SD of the group of solutions. This
sets the maximum of range_to_keep
min_fom= 0.05
Minimum fom of a solution to keep it at all
min_fom_for_dm= 0.0
Minimum fom of a solution to density modify
(otherwise just copy over phases). This is useful in
cases where the phasing is so weak that density
modification does nothing or makes the phases worse.
min_phased_each_deriv= 1 You can require that the wizard phase at least
this number of solutions from each derivative,
even if they are poor solutions. Usually at least
1 is a good idea so that one derivative does not
dominate the solutions.
minimum_improvement= 0.0
Minimum improvement in score to continue ha
iteration
n_random= 6 Number of random solutions to generate when setting up
scoring table
overall_score_method= *BAYES-CC Z-SCORE You have 2 choices for an
overall scoring method: (1) Sum of individual
Z-scores (Z-SCORE) (2) Bayesian estimate of CC of
map to perfect model (BAYES-CC) You can specify
which scoring criteria to include with
score_type_list (default is SKEW CORR_RMS for
BAYES-CC and CC RFACTOR SKEW FOM for Z-SCORE.
Additionally, if NCS is present, NCS_OVERLAP is
used by default in the Z-SCORE method).
perfect_labels= None Labels for input data columns for hklperfect
Typical value: "FP PHIC FOM"
r_switch= 0.4
R-value criterion for deciding whether to use R-value or
residues built. A good value is 0.40.
random_scoring= False For testing purposes you can generate random
scores
res_eval= 0.0
Resolution for running resolve evaluation (usually 2.5 A)
score_individual_offset_list= None Offsets for individual scores in
CC-scoring. Each score will be multiplied
by the score_individual_scale_list value,
then score_individual_offset_list value is
added, to estimate the CC**2 value using
this score by itself. The uncertainty in
the CC**2 value is given by
score_individual_sd_list. NOTE: These
scores are not used in calculation of the
overall score. They are for information
only
score_individual_scale_list= None Scale factors for individual scores in
CC-scoring. Each score will be multiplied
by the score_individual_scale_list value,
then score_individual_offset_list value is
added, to estimate the CC**2 value using
this score by itself. The uncertainty in
the CC**2 value is given by
score_individual_sd_list. NOTE: These
scores are not used in calculation of the
overall score. They are for information
only
score_individual_sd_list= None Uncertainties for individual scores in
CC-scoring. Each score will be multiplied by
the score_individual_scale_list value, then
score_individual_offset_list value is added,
to estimate the CC**2 value using this score
by itself. The uncertainty in the CC**2 value
is given by score_individual_sd_list. NOTE:
These scores are not used in calculation of
the overall score. They are for information
only
score_overall_offset= None Overall offset for scores in CC-scoring. The
weighted scores will be summed, then all
multiplied by score_overall_scale, then
score_overall_offset will be added.
score_overall_scale= None Overall scale factor for scores in CC-scoring.
The weighted scores will be summed, then all
multiplied by score_overall_scale, then
score_overall_offset will be added.
score_overall_sd= None Overall SD of CC**2 estimate for scores in
CC-scoring. The weighted scores will be summed, then
all multiplied by score_overall_scale, then
score_overall_offset will be added. This is an
estimate of CC**2, with uncertainty about
score_overall_sd. Then the square root is taken to
estimate CC and SD(CC), where SD(CC) now depends on CC
due to the square root.
score_type_list= SKEW CORR_RMS You can choose what scoring methods to
include in scoring of solutions in AutoSol. (The
choices available are: CC_DENMOD RFACTOR SKEW
NCS_COPIES NCS_IN_GROUP TRUNCATE FLATNESS CORR_RMS
REGIONS CONTRAST FOM ) NOTE: If you are using
Z-SCORE or BAYES-CC scoring, the default is CC_RMS
RFACTOR SKEW FOM (and NCS_OVERLAP if ncs_copies >1).
score_weight_list= None Weights on scores for CC-scoring. Enter the
weight on each score in score_type_list. The weighted
scores will be summed, then all multiplied by
score_overall_scale, then score_overall_offset will
be added.
skip_score_list= NCS_OVERLAP You can evaluate some scores but not use
them. Include the ones you do not want to use in the
final score in skip_score_list.
use_perfect= False You can use the CC between each solution and
hklperfect in scoring. This is only for methods development
purposes.
density_modification
fix_xyz= False You can choose to not refine coordinates, and instead to
fix them to the values found by the heavy-atom search.
fix_xyz_after_denmod= False When sites are found after density
modification you can choose whether you want to
fix the coordinates to the values found in that
map.
hl_in_resolve= False AutoSol normally does not write out HL coefficients
in the resolve.mtz file with density-modified phases. You
can turn them on with hl_in_resolve=True
mask_cycles= 5 Number of mask cycles in density modification (5 is usual
for thorough density modification)
mask_type= *histograms probability wang Choose method for obtaining
probability that a point is in the protein vs solvent region.
Default is "histograms". If you have a SAD dataset with a
heavy atom such as Pt or Au then you may wish to choose
"wang" because the histogram method is sensitive to very high
peaks. Options are: histograms: compare local rms of map and
local skew of map to values from a model map and estimate
probabilities. This one is usually the best. probability:
compare local rms of map to distribution for all points in
this map and estimate probabilities. In a few cases this one
is much better than histograms. wang: take points with
highest local rms and define as protein.
minor_cycles= 10 Number of minor cycles in density modification for each
mask cycle (10 is usual for thorough density modification)
test_mask_type= True You can choose to have AutoSol test histograms/wang
methods for identifying solvent region based on the
final density modification r-factor.
thorough_denmod= False Choose whether you want to go for quick density
modification (speeds it up and for a terrible map is
sometimes better)
truncate_ha_sites_in_resolve= Auto *Yes No True False You can choose to
truncate the density near heavy-atom sites
at a maximum of 2.5 sigma. This is useful
in cases where the heavy-atom sites are
very strong, and rarely hurts in cases
where they are not. The heavy-atom sites
are specified with "input_ha_file"
use_ncs_in_denmod= True This script normally uses available ncs
information in density modification. Say No to skip
this. See also find_ncs
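Following the mask_type guidance above, a SAD dataset with a strong heavy atom such as Pt might be run with (a sketch, not a recommendation; file names are placeholders):
phenix.autosol peak.data=peak.sca input_seq_file=seq.dat \
mask_type=wang truncate_ha_sites_in_resolve=Yes mask_cycles=5 minor_cycles=10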
display
number_of_solutions_to_display= 1 Number of solutions to put on screen
and to write out
solution_to_display= 0 Solution number of the solution to display and
write out ( use 0 to let the wizard display the top
solution)
general
background= True When you specify nproc=nn, you can run the jobs in
background (default if nproc is greater than 1) or
foreground (default if nproc=1). If you set
run_command=qsub (or otherwise submit to a batch queue),
then you should set background=False, so that the batch
queue can keep track of your runs. There is no need to use
background=True in this case because all the runs go as
controlled by your batch system. If you use run_command=csh
(or similar, csh is default) then normally you will use
background=True so that all the jobs run simultaneously.
base_path= None You can specify the base path for files (default is
current working directory)
clean_up= False At the end of the entire run the TEMP directories will
be removed if clean_up is True. The default is No, keep these
directories. If you want to remove them after your run is
finished use a command like "phenix.autobuild run=1
clean_up=True"
coot_name= coot If your version of coot is called something else, then
you can specify that here.
data_quality= *moderate strong weak The defaults are set for you
depending on the anticipated data quality. You can choose
"moderate" if you are unsure.
debug= False You can have the wizard stop with error messages about the
code if you use debug. NOTE: you cannot use Pause with debug.
expt_type= *Auto mad sir sad Experiment type (MAD SIR SAD) NOTE: Please
treat MIR experiments as a set of SIR experiments. NOTE: The
default for this keyword is Auto which means "carry out
normal process to guess this keyword". If you have a single
file, then it is assumed to be SAD. If you specify
native.data and deriv.data it is SIR, if you specify
peak.data and infl.data it is MAD. If the Wizard does not
guess correctly, you can set it with this keyword.
extra_verbose= False Facts and possible commands will be printed every
cycle if Yes
i_ran_seed= 588459 Random seed (positive integer) for model-building
and simulated annealing refinement
max_wait_time= 100.0
You can specify the length of time (seconds) to
wait when testing the run_command. If you have a cluster
where jobs do not start right away you may need a longer
time to wait.
nbatch= 1 You can specify the number of processors to use (nproc) and
the number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors
available and leave nbatch alone. If you leave nbatch as None it
will be set automatically, with a value depending on the Wizard.
This is recommended. The value of nbatch can affect the results
that you get, as the jobs are not split into exact replicates,
but are rather run with different random numbers. If you want to
get the same results, keep the same value of nbatch.
nproc= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be
set automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are
rather run with different random numbers. If you want to get the
same results, keep the same value of nbatch.
resolve_size= _giant _huge _extra_huge *None Size for solve/resolve
("","_giant","_huge","_extra_huge")
run_command= csh When you specify nproc=nn, you can run the subprocesses
as jobs in background with csh (default) or submit them to
a queue with the command of your choice (i.e., qsub ). If
you have a multi-processor machine, use csh. If you have a
cluster, use qsub or the equivalent command for your
system. NOTE: If you set run_command=qsub (or otherwise
submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this
case because all the runs go as controlled by your batch
system. If you use run_command=csh (or similar, csh is
default) then normally you will use background=True so that
all the jobs run simultaneously.
skip_xtriage= False You can bypass xtriage if you want. This will
prevent you from applying anisotropy corrections, however.
temp_dir= None Define a temporary directory (it must exist)
thoroughness= *quick thorough You can try to run quickly and see if you
can get a solution ("quick") or more thoroughly to get the
best possible solution ("thorough").
title= Run 1 AutoSol Sun Dec 7 17:46:23 2008 Enter any text you like to
help identify what you did in this run
top_output_dir= None This is used in subprocess calls of wizards and to
tell the Wizard where to look for the STOPWIZARD file.
verbose= False Command files and other verbose output will be printed
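For example, on a 4-processor workstation the jobs can be run in the background using the documented defaults (file names are placeholders):
phenix.autosol peak.data=w1.sca input_seq_file=seq.dat \
nproc=4 run_command=csh background=True
On a cluster you would instead submit to a queue (the queue command is site-specific):
phenix.autosol peak.data=w1.sca input_seq_file=seq.dat \
nproc=8 run_command=qsub background=False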
heavy_atom_search
acceptable_cc_hyss= 0.2
Hyss will be run at up to n_add_res_max+1
resolutions starting with res_hyss and adding
increments of add_res_max/n_add_res_max. If the best
CC value is greater than acceptable_cc_hyss then no
more resolutions are tried.
add_res_max= 2.0
Hyss will be run at up to n_add_res_max+1 resolutions
starting with res_hyss and adding increments of
add_res_max/n_add_res_max. If the best CC value is greater
than acceptable_cc_hyss then no more resolutions are tried.
best_of_n_hyss= 1 Hyss will be run up to best_of_n_hyss_always times at
a given resolution. If the best CC value is greater than
good_cc_hyss and the number of sites found is at least
min_fraction_of_sites_found times the number expected
and Hyss was tried at least best_of_n_hyss times, then
the search is ended.
best_of_n_hyss_always= 10 Hyss will be run up to best_of_n_hyss_always
times at a given resolution. If the best CC value
is greater than good_cc_hyss and the number of
sites found is at least
min_fraction_of_sites_found times the number
expected and Hyss was tried at least
best_of_n_hyss times, then the search is ended.
good_cc_hyss= 0.3
Hyss will be run up to best_of_n_hyss_always times at
a given resolution. If the best CC value is greater than
good_cc_hyss and the number of sites found is at least
min_fraction_of_sites_found times the number expected and
Hyss was tried at least best_of_n_hyss times, then the
search is ended.
hyss_enable_early_termination= True You can specify whether to stop HYSS
as soon as it finds a convincing solution
(Yes, default) or to keep trying...
hyss_general_positions_only= True Select Yes if you want HYSS only to
consider general positions and ignore sites
on special positions. This is appropriate
for SeMet or S-Met solutions, not so
appropriate for heavy-atom soaks
hyss_min_distance= 3.5
Enter the minimum distance between heavy-atom
sites to keep them in HYSS
hyss_n_fragments= 3 Enter the number of fragments in HYSS
hyss_n_patterson_vectors= 33 Enter the number of Patterson vectors to
consider in HYSS
hyss_random_seed= 792341 Enter an integer as random seed for HYSS
mad_ha_n= None Number of heavy atoms (anomalously-scattering atoms) in
the au
mad_ha_type= Se Enter the anomalously-scattering or heavy atom type. For
example, Se or Au. NOTE: if you want Phaser to add
additional heavy-atoms of other types, you can specify them
with mad_ha_add_list.
max_single_sites= 5 In sites_from_denmod a core set of sites that are
strong is identified. If the hand of the solution is
known then additional sites are added all at once up
to the expected number of sites. Otherwise sites are
added one at a time, up to a maximum number of tries
of max_single_sites
min_fraction_of_sites_found= 1.0
Hyss will be run up to
best_of_n_hyss_always times at a given
resolution. If the best CC value is greater
than good_cc_hyss and the number of sites
found is at least
min_fraction_of_sites_found times the
number expected and Hyss was tried at least
best_of_n_hyss times, then the search is
ended.
min_hyss_cc= 0.05
Minimum CC of a heavy-atom solution in HYSS to keep it
at all
n_add_res_max= 2 Hyss will be run at up to n_add_res_max+1 resolutions
starting with res_hyss and adding increments of
add_res_max/n_add_res_max. If the best CC value is
greater than acceptable_cc_hyss then no more resolutions
are tried.
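As an illustration, a difficult selenium search might combine the heavy-atom search keywords above (all values are hypothetical):
phenix.autosol peak.data=peak.sca input_seq_file=seq.dat \
mad_ha_type=Se mad_ha_n=8 hyss_enable_early_termination=False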
input_files
cif_def_file_list= None You can enter any number of CIF definition
files. These are normally used to tell phenix.refine
about the geometry of a ligand or unusual residue.
You usually will use these in combination with "PDB
file with metals/ligands" (keyword
"input_lig_file_list" ) which allows you to attach
the contents of any PDB file you like to your model
just before it gets refined. You can use
phenix.elbow to generate these if you do not have a
CIF file and one is requested by phenix.refine
group_labels_list= None For command-line and script running of AutoSol,
you may wish to use keywords to specify which set of
data columns to be used from an MTZ or other file
type with multiple datasets. (From the GUI, it is
easy because you are prompted with the column
labels). You can do this by specifying a string that
identifies which dataset to include. All allowed
values of this identification string will be written
out any time AutoSol is run on this dataset. NOTE: To
specify a particular set of data you can specify one
of the following (this example is for MAD data,
specifying data for the peak wavelength):
peak.labels='F SIGF DANO SIGDANO' or peak.labels='F(+)
SIGF(+) F(-) SIGF(-)'. You can then use one of the
above commands on the command-line to identify the
dataset of interest. If you want to use a script
instead, you can specify N files in your
input_data_file_list, and then specify N values for
group_labels_list like this: group_labels_list
'F,SIGF,DANO,SIGDANO' 'F(+),SIGF(+),F(-),SIGF(-)'
This will take 'F,SIGF,DANO,SIGDANO' as the data for
datafile 1 and 'F(+),SIGF(+),F(-),SIGF(-)' for
datafile 2. You can identify one dataset from each
input file in this way. If you want more than one,
then please use phenix.reflection_file_converter to
split your input file, or else use the GUI version of
AutoSol in which you can select any subset of the
data that you wish.
input_file_list= None Input data files: Any standard format is fine. If
all files are Scalepack premerged or all are Scalepack
unmerged original index then they will be used as is.
In all other cases all files are converted to
Scalepack premerged format.
input_ha_file= None If the flag "truncate_ha_sites_in_resolve" is set
then density at sites specified with input_ha_file is
truncated to improve the density modification procedure.
input_phase_file= None MTZ data file with FC PHIC or equivalent to use
for finding heavy-atom sites with difference Fourier
methods.
input_refinement_file= None Data file to use for refinement. The data in
this file should not be corrected for anisotropy.
It will be combined with experimental phase
information for refinement. If you leave this
blank, then the output of phasing will be used in
refinement (see below). If no anisotropy
correction is applied to the data you do not need
to specify a datafile for refinement. If an
anisotropy correction is applied to the data
files, then you must enter a datafile for
refinement if you want to refine your model. (See
"correct_aniso" for specifying whether an
anisotropy correction is applied. In most cases
it is not.) If an anisotropy correction is
applied and no refinement datafile is supplied,
then no refinement will be carried out in the
model-building step. You can choose any of your
datafiles to be the refinement file, or a native
that is not part of the datasets for structure
solution. If there is more than one dataset you
will be asked each time for a refinement file,
but only the last one will be used. Any
standard format is fine; normally only F and sigF
will be used. Bijvoet pairs and duplicates will
be averaged. If an mtz file is provided then a
free R flag can be read in as well. If you do
not provide a refinement file then the structure
factors from the phasing step will be used in
refinement. This is normally satisfactory for SAD
data and MIR data. For MAD data you may wish to
supply a refinement file because the structure
factors from phasing are a combination of data
from different wavelengths. It is better to
choose your best wavelength of data for
refinement.
input_refinement_labels= None Labels for input refinement file columns
(FP SIGFP FreeR_flag)
input_seq_file= Auto Enter name of file with 1-letter code of protein
sequence. NOTES: 1. Lines starting with > are ignored
and separate chains. 2. FASTA format is fine. 3. If
there are multiple copies of a chain, just enter one
copy. 4. If you enter a PDB file for rebuilding and it
has the sequence you want, then the sequence file is not
necessary. NOTE: You can also enter the name of a PDB
file that contains SEQRES records, and the sequence from
the SEQRES records will be read, written to
seq_from_seqres_records.dat, and used as your input
sequence. NOTE: for AutoBuild you can specify
start_chains_list on the first line of your sequence
file: >> start_chains_list 23 11 5 NOTE: default
for this keyword is Auto, which means "carry out normal
process to guess this keyword". This means if you
specify "after_autosol" in AutoBuild, AutoBuild will
automatically take the value from AutoSol. If you do not
want this to happen, you can specify None which means
"No file"
refine_eff_file_list= None You can enter any number of refinement
parameter files. These are normally used to tell
phenix.refine defaults to apply, as well as
creating specialized definitions such as unusual
amino acid residues and linkages. These
parameters override the normal phenix.refine
defaults. They themselves can be overridden by
parameters set by the Wizard and by you,
controlling the Wizard. NOTE: Any parameters set
by AutoBuild directly (such as
number_of_macro_cycles, high_resolution, etc...)
will not be taken from this parameters file. This
is useful only for adding extra parameters not
normally set by AutoBuild.
model_building
add_sidechains= True Add side chains onto the main chain in Textal
model-building. This requires a sequence file.
build= True Build model after density modification?
build_type= RESOLVE_AND_TEXTAL *RESOLVE TEXTAL You can choose to build
models with RESOLVE and TEXTAL or either one, and how many
different models to build with RESOLVE. The more you build,
the more likely to get a complete model. Note that
rebuild_in_place can only be carried out with RESOLVE
model-building
capra= True CAPRA is used to place CA atoms
cc_helix_min= None Minimum CC of helical density to map at low
resolution when using helices_strands_only
cc_strand_min= None Minimum CC of strand density to map when using
helices_strands_only
d_max_textal= 1000.0
This low-resolution limit is only used for Textal
model-building
d_min_textal= 2.8
Textal has an optimal high-resolution limit of 2.8 A.
This limit is only used for Textal model-building.
fit_loops= True You can fit loops automatically if sequence alignment
has been done.
group_ca_length= 4 In resolve building you can specify how short a
fragment to keep. Normally 4 or 5 residues should be
the minimum.
group_length= 2 In resolve building you can specify how many fragments
must be joined to make a connected group that is kept.
Normally 2 fragments should be the minimum.
helices_strands_only= False You can choose to use a quick model-building
method that only builds secondary structure. At
low resolution this may be both quicker and more
accurate than trying to build the entire structure.
If you are running the AutoSol Wizard, normally
you should choose 'Yes' and use the quick
model-building (see the example at the end of this
section). Then when your structure is solved
by AutoSol, go on to AutoBuild and build a more
complete model (this time normally using
helices_strands_only=False).
helices_strands_start= True You can choose to use a quick model-building
method that builds secondary structure as a way
to get started...then model completion is done as
usual. (Contrast with helices_strands_only which
only does secondary structure)
input_compare_file= None If you are rebuilding a model or already think
you know what the model should be, you can include a
comparison file in rebuilding. The model is not used
for anything except to write out information on
coordinate differences in the output log files.
NOTE: this feature does not always work correctly.
loop_cc_min= 0.4
You can specify the minimum correlation of density from
a loop with the map.
n_cycle_build= 3 Choose number of cycles (3). This does not apply if
TEXTAL is selected for build_type
n_random_frag= 0 In resolve building you can randomize each fragment
slightly so as to generate more possibilities for tracing
based on extending it.
n_random_loop= 3 Number of randomized tries from each end for building
loops. If 0, then one try. If N, then N additional tries
with randomization based on rms_random_loop.
ncycle_refine= 3 Choose number of refinement cycles (3)
number_of_builds= 2 Number of different solutions to build models for
number_of_models= 3 This parameter lets you choose how many initial
models to build with RESOLVE within a single build
cycle. This parameter is now superseded by
number_of_parallel_models, which sets the number of
models (but now entire build cycles) to carry out in
parallel. A zero means set it automatically. That is
what you normally should use. The number_of_models is
by default set to 1 and number_of_parallel_models is
set to the value of nbatch (typically 4).
offsets_list= 53 7 23 You can specify an offset for the orientation of
the helix and strand templates in building. This is used
in generating different starting models.
quick_build= False Choose whether you want to go for quick
model-building (speeds it up, and for poor maps, is
sometimes better)
rebuild_side_chains= False You can choose to replace side chains (with
extend_only) before rebuilding the model (not
normally used)
refine= False This script normally refines the model during building.
Say No to skip refinement
resolution_build= 0.0
Enter the high-resolution limit for
model-building. If 0.0, the value of resolution is
used as a default.
resolve_command_list= None Commands for resolve. One per line in the
form: keyword value (value can be optional).
Examples: coarse_grid; resolution 200 2.0; hklin
test.mtz NOTE: for command-line usage you need to
enclose the whole set of commands in double quotes
(") and each individual command in single quotes
(') like this: resolve_command_list="'no_build'
'b_overall 23' "
retrace_before_build= False You can choose to retrace your model n_mini
times and use a map based on these retraced models
to start off model-building. This is the default
for rebuilding models if you are not using
rebuild_in_place. You can also specify
n_iter_rebuild, the number of cycles of
retrace-density-modify-build before starting the
main build.
rms_random_frag= None Rms random position change added to residues on
ends of fragments when extending them. If you enter a
negative number, defaults will be used.
rms_random_loop= None Rms random position change added to residues on
ends of loops in tries for building loops. If you enter
a negative number, defaults will be used.
semet= False You can specify that the dataset that is used for
refinement is a selenomethionine dataset, and that the model
should be the SeMet version of the protein, with all SD of MET
replaced with Se of MSE.
solve_command_list= None Commands for solve. One per line in the form:
keyword value (value can be optional). Examples:
verbose; resolution 200 2.0
start_chains_list= None You can specify the starting residue number for
each of the unique chains in your structure. If you
use a sequence file then the unique chains are
extracted and the order must match the order of your
starting residue numbers. For example, if your
sequence file has chains A and B (identical) and
chains C and D (identical to each other, but
different than A and B) then you can enter 2 numbers,
the starting residues for chains A and C. NOTE: you
need to specify an input sequence file for
start_chains_list to be applied.
thorough_loop_fit= True Try many conformations and accept them even if
the fit is not perfect? If you say Yes the parameters
for thorough loop fitting are: n_random_loop=100
rms_random_loop=0.3 rho_min_main=0.5 while if you say
No those for quick loop fitting are: n_random_loop=20
rms_random_loop=0.3 rho_min_main=1.0
trace_as_lig= False You can specify that in building steps the ends of
chains are to be extended using the LigandFit algorithm.
This is default for nucleic acid model-building.
use_any_side= False You can choose to have resolve model-building place
the best-fitting side chain at each position, even if the
sequence is not matched to the map.
use_met_in_align= Auto *Yes No True False You can use the heavy-atom
positions in input_ha_file as markers for Met SD
positions.
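For example, a quick low-resolution AutoSol run building only secondary structure (see helices_strands_only above; the file names are placeholders):
phenix.autosol peak.data=w1.sca input_seq_file=seq.dat helices_strands_only=True
Once AutoSol has a solution, AutoBuild can then build a more complete model with helices_strands_only=False.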
ncs
find_ncs= Auto *Yes No True False This script normally deduces ncs
information from the NCS in chains of models that are built
during iterative model-building. The update is done each cycle
in which an improved model is obtained. Say No to skip this.
See also "input_ncs_file" which can be used to specify NCS at
the start of the process. If find_ncs="No" then only this
starting NCS will be used and it will not be updated. You can
use find_ncs "No" to specify exactly what residues will be
used in NCS refinement and exactly what NCS operators to use
in density modification. You can use the function
$PHENIX/phenix/phenix/command_line/simple_ncs_from_pdb.py to
help you set up an input_ncs_file that has your specifications
in it.
ncs_copies= None Number of copies of the molecule in the au (note: only
one type of molecule allowed at present)
ncs_refine_coord_sigma_from_rmsd= False You can choose to use the
current NCS rmsd as the value of the
sigma for NCS restraints. See also
ncs_refine_coord_sigma_from_rmsd_ratio
ncs_refine_coord_sigma_from_rmsd_ratio= 1.0
You can choose to multiply
the current NCS rmsd by this
value before using it as the
sigma for NCS restraints See
also
ncs_refine_coord_sigma_from_rmsd
optimize_ncs= True This script normally deduces ncs information from the
NCS in chains of models that are built during iterative
model-building. Optimize NCS adds a step to try and make
the molecule formed by NCS as compact as possible, without
losing any point-group symmetry.
refine_with_ncs= True This script can allow phenix.refine to
automatically identify NCS and use it in refinement.
NOTE: ncs refinement and placing waters automatically
are mutually exclusive at present.
phasing
do_madbst= True Choose whether you want to skip FA calculation (speeds
it up)
f_doubleprime_list= None Enter f" for the heavy-atom for this dataset
f_prime_list= None Enter f' for the heavy-atom for this dataset
fixscattfactors= True For SOLVE phasing and MAD data you can choose
whether scattering factors are to be fixed by choosing
'Yes' to fix them or 'No' to refine them. Normally
choose 'Yes' (fix) if the data are weak and 'No'
(refine) if the data are strong.
ha_sites_file= None Input sites file... with xyz in fractional
coordinates or a PDB file with coordinates. NOTE: This
file is optional if you specify a partial model file.
have_hand= False Normally you will not know the hand of the heavy-atom
substructure, so have_hand=False. However if you do know it
(you got the sites from a difference Fourier or you know the
answer another way) you can specify that the hand is known.
id_scale_ref= None By default the datafile with the highest resolution
is used for the first step in scaling of MAD data. You can
choose to use any of the datafiles in your MAD dataset.
ikeepflag= 1 You can choose to keep all reflections in merging steps.
This is separate from rejecting reflections with high iso or
ano diffs. Default=1 (keep them)
inano_list= None Choose 'inano' for including anomalous differences and
'noinano' not to include them and 'anoonly' for just
anomalous differences (no isomorphous differences)
input_partpdb_file= None You can enter a PDB file (usually from
molecular replacement) for use in identifying
heavy-atom sites and phasing. NOTE 1: This procedure
works best if the model is refined. NOTE 2: This
file is only used in SAD phasing with Phaser on a
single dataset. In all other cases it is ignored.
NOTE 3: The output phases in phaser_xx.mtz will
contain both SAD and model information. They are not
completely suitable for use with AutoBuild or other
iterative model-building procedures because the
phases are not entirely experimental (but they may
work).
input_phase_labels= None Labels for FC and PHIC for data file with FC
PHIC or equivalent to use for finding heavy-atom
sites with difference Fourier methods.
mad_ha_add_f_double_prime_list= None F-double_prime values of additional
heavy-atom types. You must specify the
same number of entries of
mad_ha_add_f_double_prime_list as you do
for mad_ha_add_f_prime_list and for
mad_ha_add_list.
mad_ha_add_f_prime_list= None F-prime values of additional heavy-atom
types. You must specify the same number of
entries of mad_ha_add_f_prime_list as you do
for mad_ha_add_f_double_prime_list and for
mad_ha_add_list.
mad_ha_add_list= None You can specify heavy atom types in addition to
the one you named in mad_ha_type. The heavy-atoms found
in initial HySS searches will be given the type of
mad_ha_type, and Phaser (if used for phasing) will try
to find additional heavy atoms of both the type
mad_ha_type and any listed in mad_ha_add_list. You must
also specify the same number of mad_ha_add_f_prime_list
entries and of mad_ha_add_f_double_prime_list entries.
n_ha_list= None Enter a guess of number of HA sites
nat_der_list= None Enter 'Native' or a heavy-atom symbol (Pt, Se)
overallscale= False You can choose to have only an overall scale factor
for this dataset (no local scaling applied). Use this if
your data is already fully scaled.
partpdb_rms= 1.0
phase_full_resolution= True You can choose to use the full resolution of
the data in phasing, instead of using the
recommended_resolution. This is always a good
idea with Phaser phases.
phaser_completion= True You can choose to use phaser log-likelihood
gradients to complete your heavy-atom sites. This can
be used with or without the ha_iteration option.
phasing_method= SOLVE *PHASER You can choose to phase with SOLVE or with
Phaser. (Only applies to SAD phasing at present)
ratio_out= 3.0
You can choose the ratio of del ano or del iso to the rms
in the shell for rejection of a reflection. Default = 4.
read_sites= False Choose if you want to enter ha sites from a file The
name of the file will be requested after scaling is
finished. The file can have sites in fractional coordinates
or be a PDB file.
require_nat= True Choose yes to skip any reflection with no native (for
SIR) or no data (MAD/SAD) or where anom difference is very
large. This keyword (default=Yes) allows the routines in
SOLVE to remove reflections with an implausibly large
anomalous difference (greater than ratio_out times the rms
anomalous difference).
res_hyss= None Resolution for running HYSS (usually 3.5 A is fine)
res_phase= 0.0
Enter the high-resolution limit for phasing
skip_extra_phasing= Auto Yes *No True False You can choose to skip an
extra phasing step to speed up the process
use_phaser_hklstart= True You can choose to start density modification
with FWT PHWT from Phaser (Only applies to SAD
phasing at present)
wavelength_list= None Enter wavelength of x-ray data (A)
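For example, to phase a SAD dataset with SOLVE rather than Phaser, starting from heavy-atom sites of known hand (file names are hypothetical):
phenix.autosol peak.data=peak.sca input_seq_file=seq.dat \
phasing_method=SOLVE ha_sites_file=sites.pdb have_hand=True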
refinement
link_distance_cutoff= 3.0
You can specify the maximum bond distance for
linking residues in phenix.refine called from the
wizards.
ordered_solvent_low_resolution= None You can choose what resolution
cutoff to use for placing ordered solvent
in phenix.refine. If the resolution of
refinement is greater than this cutoff,
then no ordered solvent will be placed,
even if
refinement.main.ordered_solvent=True.
place_waters= True You can choose whether phenix.refine automatically
places ordered solvent (waters) during the refinement
process.
r_free_flags_fraction= 0.1
Maximum fraction of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
set to the lower of the values defined by these two
parameters.
r_free_flags_lattice_symmetry_max_delta= 5.0
You can set the maximum
deviation of distances in the
lattice that are to be
considered the same for
purposes of generating a
lattice-symmetry-unique set of
free R flags.
r_free_flags_max_free= 2000 Maximum number of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
set to the lower of the values defined by these two
parameters.
r_free_flags_use_lattice_symmetry= True When generating r_free_flags you
can decide whether to include lattice
symmetry (good in general, necessary
if there is twinning).
refine_b= True You can choose whether phenix.refine is to refine
individual atomic displacement parameters (B values)
refine_se_occ= True You can choose to refine the occupancy of SE atoms
in a SEMET structure (default=Yes). This only applies if
semet=true
refinement_resolution= 0.0
Enter the high-resolution limit for
refinement only. This high-resolution limit can
be different than the high-resolution limit for
other steps. The default ("None" or 0.0) is to
use the overall high-resolution limit for this
run (as set by 'resolution')
use_mlhl= True This script normally uses information from the input file
(HLA HLB HLC HLD) in refinement. Say No to only refine on Fobs
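As a sketch, the refinement keywords above can be combined like this (all values are placeholders, not recommendations):
phenix.autosol peak.data=w1.sca input_seq_file=seq.dat \
refinement_resolution=2.2 place_waters=False \
r_free_flags_fraction=0.05 r_free_flags_max_free=1000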
scaling
b_overall= None If an anisotropy correction is applied, you can choose
to set the overall B of the data to a specific value with
b_overall. See also "correct_aniso"
correct_aniso= *Auto Yes No True False Choose if you want to apply a
correction for anisotropy to the data. Yes means always
apply correction, No means never apply it, Auto means
apply it if the data is severely anisotropic
(recommended=Auto). If you set correct_aniso=Auto then if
the range of anisotropic B-factors is greater than
delta_b_for_auto_correct_aniso and the ratio of the
largest to the smallest is greater than
ratio_b_for_auto_correct_aniso then the correction will
be applied. Anisotropy correction will be applied to all
input data before scaling. The default overall B factor
will be the minimum of the b-factors in any direction of
the original data. To set this to another value, use
"b_overall"
delta_b_for_auto_correct_aniso= 20.0
Choose what range of aniso B values
is so big that you want to correct for
anisotropy by default. Both ratio_b and
delta_b must be large to correct. see
also ratio_b_for_auto_correct_aniso. See
also "correct_aniso" which overrides
this default if set to "Yes"
ratio_b_for_auto_correct_aniso= 1.5
Choose what ratio aniso B values is
so big that you want to correct for
anisotropy by default. Both ratio_b and
delta_b must be large to correct. see
also delta_b_for_auto_correct_aniso. See
also "correct_aniso" which overrides
this default if set to "Yes"
test_correct_aniso= True Choose whether you want to try applying or not
applying an anisotropy correction if the run fails.
First your original selection for applying or not
will be tried, and then the opposite will be tried
if the run fails.
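For instance, to force an anisotropy correction and set the overall B factor, supplying an uncorrected datafile for refinement as described under input_refinement_file (file names and values are placeholders):
phenix.autosol peak.data=w1.sca input_seq_file=seq.dat \
correct_aniso=Yes b_overall=25 input_refinement_file=native.mtz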
Automated molecular replacement with AutoMR
Summary of inputs and outputs for AutoMR
Components, copies, search models, and ensembles
What the AutoMR wizard needs to run
Specifying which columns of data to use from input data files
Standard AutoMR run with coords.pdb native.sca
Specifying a refinement file for AutoBuild
Passing any commands to AutoBuild
AutoMR searching for 2 components
Specifying molecular masses of 2 components
AutoMR searching for 2 components, but specifying the orientation of one of them
Specific limitations and problems
Author(s)
● Phaser: Randy J. Read, Airlie J. McCoy and Laurent C. Storoni
● AutoMR Wizard: Tom Terwilliger, Laurent Storoni, Randy Read, and Airlie McCoy
● PHENIX GUI and PDS Server: Nigel W. Moriarty
● phenix.xtriage: Peter Zwart
Purpose
Purpose of the AutoMR Wizard
The AutoMR Wizard provides a convenient interface to Phaser molecular replacement and feeds the results of molecular replacement directly into the AutoBuild Wizard for automated model rebuilding.
The AutoMR Wizard begins with datafiles containing structure factor amplitudes and uncertainties and a search model or models, and identifies placements of the search models that are compatible with the data.
Usage
The AutoMR Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. The command-line version will be described here.
NOTE: You may find it easiest to run the GUI version of AutoMR when you are learning how to use it, and then to move to the command-line or script versions later, as the GUI version will take you through all the necessary steps of organizing your data.
Summary of inputs and outputs for AutoMR
Input data file. This file can be in almost any format, and must contain either amplitudes or intensities and sigmas. You can specify what resolution to use for molecular replacement and separately what resolution to use for model rebuilding. If you specify "0.0" for resolution (recommended) then defaults will be used for molecular replacement (i.e. use data to 2.5A if available to solve the structure, then carry out rigid body refinement of the final solution with all data) and all the data will be used for model rebuilding.
Composition of the asymmetric unit. PHASER needs to know what the total mass in the asymmetric unit is (i.e. not just the mass of the search models). You can define this either by specifying one or more protein or nucleic acid sequence files, or by specifying protein or nucleic acid molecular masses, and telling the Wizard how many copies of each are present.
Space groups to search. You can request that all space groups with the same point group as the one you start out with be searched, and the best one be chosen. If you select this option then the best space group will be used for model rebuilding in AutoBuild.
Ensembles to search for. AutoMR builds up a model by finding a set of good positions and orientations of one "ensemble", and then using each of those placements as starting points for finding the next ensemble, until all the contents of the asymmetric unit are found and a consistent solution is obtained. You can specify any number of different ensembles to search for, and you can search for any number of copies of each ensemble. The order of searching for ensembles does make a difference. If possible, you want to search for the biggest, best-ordered, most accurate ensemble first. You specify the order when you list the ensembles to search for on the last main window of the AutoMR wizard.
Each ensemble can be specified by a single PDB file or a set of PDB files. The contents of one set of PDB files for an ensemble must all be oriented in the same way, as they will be put together and used as a group always in the molecular replacement process.
You will need to specify how similar you think each input PDB file that is part of an ensemble is to the structure that is in your crystal. You can specify either sequence identity, or expected rmsd. Note that if you use a homology model, you should give the sequence identity of the template from which the model was constructed, not the 100% identity of the model!
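For example, if a search model was built by homology from a template with 35% sequence identity, you would give that 35% identity, not 100% (the model name is hypothetical):
ensemble_1.coords=homology_model.pdb ensemble_1.identity=35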
Output of AutoMR
Output files from AutoMR
When you run AutoMR the output files will be in a subdirectory with your run number:
AutoMR_run_1_/ # subdirectory with results
● A summary file listing the results of the run and the other files produced:
AutoMR_summary.dat # overall summary
● A warnings file listing any warnings about the run:
AutoMR_warnings.dat # any warnings
● A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed):
AutoMR_Facts.dat # all Facts about the run
● Molecular replacement model, structure factors, and map coefficients:
MR.1.pdb
MR.1.mtz
MR.MAP_COEFFS.1.mtz
The AutoMR wizard writes out MR.1.pdb, MR.1.mtz, and MR.MAP_COEFFS.1.mtz as well as output log files. The MR.1.pdb file will contain all the components of your MR solution. If there are multiple PDB files in an ensemble, the model with the lowest estimated rmsd is chosen to represent the whole ensemble and is written to MR.1.pdb. If there are multiple copies of a model, the chains are lettered sequentially A B C... The MR.1.mtz file contains the data from your input file to the full resolution available. The MR.MAP_COEFFS.1.mtz file contains sigmaA-weighted 2Fo-Fc map coefficients based on the rigid-body-refined model.
Model rebuilding. After PHASER molecular replacement the AutoMR Wizard loads the AutoBuild Wizard and sets the defaults based on the MR solution that has just been found. You can use the default values, or you may choose to use 2Fo-Fc maps instead of density-modified maps for rebuilding, or you may choose to start the model-rebuilding with the map coefficients from MR.MAP_COEFFS.1.mtz.
How to run the AutoMR Wizard
Running the AutoMR Wizard is easy. For example, from the command-line you can type:
phenix.automr native.sca search.pdb RMS=0.8 mass=23000 copies=1
The AutoMR Wizard will find the best location and orientation of the search model search.pdb in the unit cell based on the data in native.sca, assuming that the RMSD between the correct model and search.pdb is about 0.8 A, that the molecular mass of the true model is 23000, and that there is 1 copy of this model in the asymmetric unit. Once the AutoMR Wizard has found a solution, it will automatically call the AutoBuild Wizard and rebuild the model.
Components, copies, search models, and ensembles
● Your structure is composed of one or more components such as a 20Kd subunit with sequence seq-of-20Kd-subunit.
● There may be one or more copies of each component in your structure.
● You can search for the location(s) of a component with a search model that consists of a single structure or an ensemble of structures.
What the AutoMR wizard needs to run
In a simple case where you have one search model and are looking for N copies of this model in your structure, you need:
● (1) a datafile name (native.sca or data=native.sca)
● (2) a search model (search_model.pdb or coords=search_model.pdb)
● (3) how similar the search model is to your structure (RMS=0.8 or identity=75)
● (4) information about the contents of the asymmetric unit: (mass=23000 or seq_file=seq.dat) and (copies=1)
It may be advantageous to search using an ensemble of similar structures, rather than a single structure. If you have an ensemble of search models to search for, then specify it as:
coords='model_1.pdb model_2.pdb model_3.pdb'
In this case you need to give the RMS or identity for each model: identity='45 40 35'. Each of the models in the ensemble must be in the same orientation as the others, so that the ensemble of models can be placed as a group in the unit cell.
If you are searching for more than one ensemble, or if there is more than one component in the a.u., then use the full syntax and specify them as (NOTE: copies becomes copies_to_find or component_copies):
ensemble_1.coords=s1.pdb ensemble_1.RMS=0.8 ensemble_1.copies_to_find=1 \
component_1.mass=23000 component_1.component_copies=1
Specifying which columns of data to use from input data files
If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.
For example, if your data file data.mtz has columns F SIGF then you might specify:
data=data.mtz input_label_string="F SIGF"
You can find out all the possible label strings in a data file that you might use by typing:
phenix.autosol display_labels=data.mtz # display all labels for data.mtz
You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page, and also the general information about running Wizards from the GUI, the command-line, or a script. Some of the most common parameters are:
data=w1.sca # data file
model=coords.pdb # starting model
seq_file=seq.dat # sequence file
Examples
Standard AutoMR run with coords.pdb native.sca
Run AutoMR using coords.pdb as the search model and native.sca as the data, assuming the RMS between coords.pdb and the true model is about 0.85 A, that the sequence of the true model is in seq.dat, and that there is 1 copy in the unit cell:
phenix.automr coords.pdb native.sca RMS=0.85 seq.dat copies=1 \
n_cycle_rebuild_max=2 n_cycle_build_max=2
Specifying data columns
Run AutoMR as above, but specify the data columns explicitly:
phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \
data=data.mtz input_label_string="F SIGF" \
n_cycle_rebuild_max=2 n_cycle_build_max=2
Note that the data columns are specified by a string that includes both F and SIGF: "F SIGF". The string must match some set of data labels that can be extracted automatically from your data file. You can find the possible values of this string as described above with:
phenix.automr display_labels=data.mtz
Specifying a refinement file for AutoBuild
Run AutoMR as above, but specify a refinement file that is different from the file used for the MR search:
phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \
data=data.mtz input_label_string="F SIGF" \
input_refinement_file=refinement.mtz \
input_refinement_labels="FP SIGFP FreeR_flag" \
n_cycle_rebuild_max=2 n_cycle_build_max=2
Note that the commands input_refinement_file and input_refinement_labels are in the scope "autobuild_variables". These commands and others with this prefix are passed on to AutoBuild.
Passing any commands to AutoBuild
You can pass any AutoBuild commands on to AutoBuild, even if they are not already defined for you in AutoMR. Use the command autobuild_input_list_add to add a command, and then set that command by adding "autobuild_" to the beginning of the command name. For example, to add the commands semet=True and refine=False:
phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \
data=data.mtz input_label_string="F SIGF" \
autobuild_input_list_add='semet refine' \
autobuild_semet=True \
autobuild_refine=False
Notes: This applies only to command-line operation of AutoMR. Any keywords that are used in both AutoBuild and AutoMR will apply to both if you specify them in autobuild_input_list_add. For example, if you set the resolution in AutoBuild with autobuild_input_list_add=resolution and resolution=2.6, then this resolution will apply to both AutoMR and AutoBuild.
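As an illustrative sketch of that note (using the same placeholder file names as above), the following would apply resolution=2.6 to both AutoMR and AutoBuild:
phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \
data=data.mtz input_label_string="F SIGF" \
autobuild_input_list_add=resolution \
resolution=2.6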
AutoMR searching for 2 components
Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (beta.seq and blip.seq) and the number of copies of each component (1). Define the search models with PDB files and estimated RMS from the true structures:
phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma" \
resolution=0.0 resolution_build=3.0 \
component_1.component_type=protein component_1.seq_file=beta.seq \
component_1.component_copies=1 \
component_2.component_type=protein component_2.seq_file=blip.seq \
component_2.component_copies=1 \
ensemble_1.coords=beta.pdb ensemble_1.RMS=0.85 ensemble_1.copies_to_find=1 \
ensemble_2.coords=blip.pdb ensemble_2.RMS=0.90 ensemble_2.copies_to_find=1 \
n_cycle_rebuild_max=1
Specifying molecular masses of 2 components
Run AutoMR as in the previous example, except specify the components of the asymmetric unit with molecular masses (30000 and 20000 Da), and define the search models with PDB files and percent sequence identity with the true structures (50% and 60%):
phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma" \
resolution=0.0 resolution_build=3.0 \
component_1.component_type=protein component_1.mass=30000 \
component_1.component_copies=1 \
component_2.component_type=protein component_2.mass=20000 \
component_2.component_copies=1 \
ensemble_1.coords=beta.pdb ensemble_1.identity=50 ensemble_1.copies_to_find=1 \
ensemble_2.coords=blip.pdb ensemble_2.identity=60 ensemble_2.copies_to_find=1 \
n_cycle_rebuild_max=1
AutoMR searching for 2 components, but specifying the orientation of one of them
Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (beta.seq and blip.seq) and the number of copies of each component (1). Define the search models with PDB files and estimated RMS from the true structures. Define the orientation and position of one component, and define the number of copies to find for each component (0 for beta, which is fixed; 1 for blip):
phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma" \
resolution=0.0 resolution_build=3.0 \
component_1.component_type=protein component_1.seq_file=beta.seq \
component_1.component_copies=1 \
component_2.component_type=protein component_2.seq_file=blip.seq \
component_2.component_copies=1 \
ensemble_1.coords=beta.pdb ensemble_1.RMS=0.85 ensemble_1.copies_to_find=0 \
ensemble_1.ensembleID="beta" \
ensemble_2.coords=blip.pdb ensemble_2.RMS=0.90 ensemble_2.copies_to_find=1 \
ensemble_2.ensembleID="blip" \
n_cycle_rebuild_max=1 \
fixed_ensembleID_list="beta" \
fixed_euler_list="199.84,41.535,184.15" \
fixed_frac_list="-0.49736,-0.15895,-0.28067"
Note: you have to define an ensemble for the fixed molecule (beta in this example).
Possible Problems
Specific limitations and problems
● The AutoBuild Wizard can build PROTEIN, RNA, or DNA, but it can only build one at a time. If your MR model contains more than one type of chain, then you will need to run AutoBuild separately from AutoMR; when you run AutoBuild, supply the already-built part of the model with input_lig_file_list and specify the type of chain to build with chain_type (see the sketch after this list):
input_lig_file_list=ProteinPartofMRmodel.pdb chain_type=DNA
● If you use an ensemble as a search model, the output structure will contain just the first member of the ensemble, so you may wish to put the member that is likely to be the most similar to the true structure first in your ensemble.
● If you run AutoMR from the GUI and continue on to AutoBuild, and then select "Start run over (delete everything for this run)", it will delete both your AutoBuild and AutoMR runs and start your AutoMR run over from the beginning.
● The AutoMR Wizard can take most settings of most space groups; however, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting, due to difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.
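For the first limitation above, a minimal sketch of the separate AutoBuild run might look like this, assuming the protein part of the MR solution has been saved as ProteinPartofMRmodel.pdb and the DNA sequence is in dna_seq.dat (both names are hypothetical placeholders):
phenix.autobuild data=data.mtz seq_file=dna_seq.dat \
input_lig_file_list=ProteinPartofMRmodel.pdb chain_type=DNA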
Literature
Phaser crystallographic software. A.J. McCoy, R.W. Grosse-Kunstleve, P.D. Adams, M.D. Winn, L.C. Storoni and R.J. Read. J. Appl. Cryst. 40, 658-674 (2007)
Likelihood-enhanced fast translation functions. A.J. McCoy, R.W. Grosse-Kunstleve, L.C. Storoni and R.J. Read. Acta Cryst. D61, 458-464 (2005)
Likelihood-enhanced fast rotation functions. L.C. Storoni, A.J. McCoy and R.J. Read. Acta Cryst. D60, 432-438 (2004)
Additional information
List of all AutoMR keywords
-------------------------------------------------------------------------------
Legend:
black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
automr
build= True Run AutoBuild immediately after AutoMR (Command-line only)
data= None Datafile (any standard format) (Command-line only)
copies= None Set both copies_to_find and component_copies with copies. This
is the number of copies of this search model to find, and also the
number of copies of this sequence or mass in the asymmetric unit.
(Command-line only)
ensembleID= ensemble_1 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
special_keywords
write_run_directory_to_file= None Writes the full name of a run
directory to the specified file. This can
be used as a call-back to tell a script
where the output is going to go.
(Command-line only)
run_control
coot= None Set coot to True and optionally run=[run-number] to run Coot
with the current model and map for run run-number. In some wizards
(AutoBuild) you can edit the model and give it back to PHENIX to
use as part of the model-building process. If you just say coot
then the facts for the highest-numbered existing run will be
shown. (Command-line only)
ignore_blanks= None ignore_blanks allows you to have a command-line
keyword with a blank value like "input_lig_file_list="
stop= None You can stop the current wizard with "stopwizard" or "stop".
If you type "phenix.autobuild run=3 stop" then this will stop run
3 of autobuild. (Command-line only)
display_facts= None Set display_facts to True and optionally
run=[run-number] to display the facts for run run-number.
If you just say display_facts then the facts for the
highest-numbered existing run will be shown.
(Command-line only)
display_summary= None Set display_summary to True and optionally
run=[run-number] to show the summary for run
run-number. If you just say display_summary then the
summary for the highest-numbered existing run will be
shown. (Command-line only)
carry_on= None Set carry_on to True to carry on with highest-numbered
run from where you left off. (Command-line only)
run= None Set run to n to continue with run n where you left off.
(Command-line only)
copy_run= None Set copy_run to n to copy run n to a new run and continue
where you left off. (Command-line only)
display_runs= None List all runs for this wizard. (Command-line only)
delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
display_labels= None display_labels=test.mtz will list all the labels
that identify data in test.mtz. You can use the label
strings that are produced in AutoSol to identify which
data to use from a datafile like this: peak.data="F+
SIGF+ F- SIGF-" # the entire string in quotes counts
here You can use the individual labels from these
strings as identifiers for data columns in AutoSol and
AutoBuild like this: input_refinement_labels="FP SIGFP
FreeR_flags" # each individual label counts
dry_run= False Just read in and check parameter names
params_only= False Just read in and return parameter defaults
display_all= False Just read in and display parameter defaults
autobuild_variables
two_fofc_in_rebuild= None Actively sets two_fofc_in_rebuild in
AutoBuild. NOTE: value is not checked
include_input_model= None Actively sets include_input_model in
AutoBuild. NOTE: value is not checked
n_cycle_rebuild_min= None Actively sets n_cycle_rebuild_min in
AutoBuild. NOTE: value is not checked
n_cycle_rebuild_max= None Actively sets n_cycle_rebuild_max in
AutoBuild. NOTE: value is not checked
n_cycle_build_min= None Actively sets n_cycle_build_min in AutoBuild.
NOTE: value is not checked
n_cycle_build_max= None Actively sets n_cycle_build_max in AutoBuild.
NOTE: value is not checked
rebuild_in_place= None Actively sets rebuild_in_place in AutoBuild.
NOTE: value is not checked
thorough_denmod= None Actively sets thorough_denmod in AutoBuild. NOTE:
value is not checked
i_ran_seed= None Actively sets i_ran_seed in AutoBuild. NOTE: value is
not checked
start_chains_list= None Actively sets start_chains_list in AutoBuild.
NOTE: value is not checked
input_refinement_file= None Actively sets input_refinement_file in
AutoBuild. NOTE: value is not checked
input_refinement_labels= None Actively sets input_refinement_labels in
AutoBuild. NOTE: value is not checked
input_labels= None Actively sets input_labels in AutoBuild. NOTE: value
is not checked
resolve_command_list= None Actively sets resolve_command_list in
AutoBuild. NOTE: value is not checked
resolve_pattern_command_list= None Actively sets
resolve_pattern_command_list in AutoBuild.
NOTE: value is not checked
morph= None Actively sets morph in AutoBuild. NOTE: value is not checked
morph_rad= None Actively sets morph_rad in AutoBuild. NOTE: value is not
checked
ensemble_1
ensembleID= ensemble_1 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
ensemble_2
ensembleID= ensemble_2 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
ensemble_3
ensembleID= ensemble_3 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
ensemble_4
ensembleID= ensemble_4 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
ensemble_5
ensembleID= ensemble_5 ID for this ensemble. (Command-line only)
copies_to_find= None Number of copies of this ensemble to find in a.u.
(Command-line only)
coords= None model(s) for this ensemble. (Command-line only)
identity= None percent identity(ies) of model(s) in this ensemble to
structure (alternative is RMS). (Command-line only)
RMS= None RMSD(s) of model(s) to structure (alternative is identity).
(Command-line only)
component_1
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
component_2
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
component_3
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
component_4
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
component_5
seq_file= None protein seq_file for this component. (Command-line only)
component_type= *protein nucleic_acid protein or nucleic acid.
(Command-line only)
mass= None molecular mass (Da) of this component. (Command-line only)
component_copies= None Number of copies of this component in the a.u.
(required). (Command-line only)
crystal_info
cell= 0.0 0.0 0.0 0.0 0.0 0.0
Enter cell parameter a b c alpha beta
gamma
chain_type= *Auto PROTEIN DNA RNA You can specify whether to build
protein, DNA, or RNA chains. At present you can only build
one of these in a single run. If you have both DNA and
protein, build one first, then run AutoBuild again,
supplying the prebuilt model in the "input_lig_file_list"
and build the other. NOTE: default for this keyword is Auto,
which means "carry out normal process to guess this
keyword". The process is to look at the sequence file and/or
input pdb file to see what the chain type is. If there are
more than one type, the type with the larger number of
residues is guessed. If you want to force the chain_type,
then set it to PROTEIN RNA or DNA.
resolution= 0.0
Enter the high-resolution limit for MR search. All the
data input will be written out regardless of your choice. By
default, the final rigid-body refinement will use all data.
sg= None Space Group symbol (i.e., C2221 or C 2 2 21)
decision_making
min_seq_identity_percent= 50.0
The sequence in your input PDB file will
be adjusted to match the sequence in your
sequence file (if any). If there are
insertions/deletions in your model and the
wizard does not seem to identify them, you can
split up your PDB file by adding records like
this: BREAK You can specify the minimum
sequence identity between your sequence file
and a segment from your input PDB file to
consider the sequences to be matched. Default
is 50.0%. You might want a higher number to
make sure that deletions in the sequence are
noticed.
overlap_allowed= None Solutions with no C-alpha clashes will be
accepted. If the best packing has some clashes,
solutions with that number of clashes will be accepted,
as long as this does not exceed the maximum allowed.
You can choose to increase the maximum if the packing
is tight and your search molecule is not exactly the
same as the molecule in the cell. If you leave it blank
then Phaser will decide for you.
selection_criteria_rot= *Percent_of_best Number_of_solutions Z_score All
Choose a criterion for keeping rotation
solutions at each stage. The choices are:
Percent of Best Score: AutoMR looks down the list
of LLG scores and only keeps the ones that
differ from the mean by more than the chosen
percentage, compared to the top solution. Enter
your desired percentage into the entry field
(default=75%) Number of Solutions: Keep the N
top solutions (you can set N; default=1)
Z-score: Keep all the solutions with a Z-score
greater than X (you can set X; default=6). All:
Keep everything and go on holiday while Phaser
crunches through it all (definitely not
recommended!)
selection_criteria_rot_value= 75 Choose a value for your criterion for
keeping rotation solutions at each stage.
Percent of Best Score: AutoMR looks down
the list of LLG scores and only keeps the
ones that differ from the mean by more
than the chosen percentage, compared to
the top solution. Enter your desired
percentage into the entry field
(default=75%) Number of Solutions: Keep
the N top solutions (you can set N;
default=1) Z-score: Keep all the solutions
with a Z-score greater than X (you can set
X; default=6). All: Keep everything and go
on holiday while Phaser crunches through
it all (definitely not recommended!)
fixed_ensembles
fixed_ensembleID_list= None Enter the ID (set with ensemble_1.ensembleID
or equivalent) of the component that is to be
fixed. NOTE 1: Each ensemble in
fixed_ensembleID_list must be defined. NOTE 2:
you can enter more than one fixed component if
you want. If you do, then enter fixed_euler_list
in multiples of 3 numbers and also
fixed_frac_list in multiples of 3 numbers.
fixed_euler_list= 0.0 0.0 0.0
Enter Euler angles (from AutoMR or Phaser)
for fixed component defined with
fixed_ensembleID_list. NOTE 2: you can enter more than
one fixed component if you want. If you do, then enter
fixed_euler_list in multiples of 3 numbers and also
fixed_frac_list in multiples of 3 numbers.
fixed_frac_list= 0.0 0.0 0.0
Enter fractional offset (location) for
fixed component (from AutoMR or Phaser) for fixed
component defined with fixed_ensembleID_list. NOTE 2:
you can enter more than one fixed component if you
want. If you do, then enter fixed_euler_list in
multiples of 3 numbers and also fixed_frac_list in
multiples of 3 numbers.
general
all_plausible_sg_list= None Choose which space groups to search
autobuild_input_list_add= None You can add keywords to those that AutoMR
passes on to AutoBuild (command-line only) The
format for this command is:
autobuild_input_list_add='semet refine' Then
you can set any of the variables you specify
by adding the prefix "autobuild_" to the name
of your variable: autobuild_semet=False
autobuild_refine=True This will now set
'semet'=False and refine=True in AutoBuild
background= True When you specify nproc=nn, you can run the jobs in
background (default if nproc is greater than 1) or
foreground (default if nproc=1). If you set
run_command=qsub (or otherwise submit to a batch queue),
then you should set background=False, so that the batch
queue can keep track of your runs. There is no need to use
background=True in this case because all the runs go as
controlled by your batch system. If you use run_command=csh
(or similar, csh is default) then normally you will use
background=True so that all the jobs run simultaneously.
base_path= None You can specify the base path for files (default is
current working directory)
clean_up= False At the end of the entire run the TEMP directories will
be removed if clean_up is True. The default is No, keep these
directories. If you want to remove them after your run is
finished use a command like "phenix.autobuild run=1
clean_up=True"
coot_name= coot If your version of coot is called something else, then
you can specify that here.
debug= False You can have the wizard stop with error messages about the
code if you use debug. NOTE: you cannot use Pause with debug.
do_anisotropy_correction= True Choose whether you want to apply
anisotropy correction
extra_verbose= False Facts and possible commands will be printed every
cycle if Yes
max_wait_time= 100.0
You can specify the length of time (seconds) to
wait when testing the run_command. If you have a cluster
where jobs do not start right away you may need a longer
time to wait.
nbatch= 1 You can specify the number of processors to use (nproc) and
the number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors
available and leave nbatch alone. If you leave nbatch as None it
will be set automatically, with a value depending on the Wizard.
This is recommended. The value of nbatch can affect the results
that you get, as the jobs are not split into exact replicates,
but are rather run with different random numbers. If you want to
get the same results, keep the same value of nbatch.
nproc= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be
set automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are
rather run with different random numbers. If you want to get the
same results, keep the same value of nbatch.
run_command= csh When you specify nproc=nn, you can run the subprocesses
as jobs in background with csh (default) or submit them to
a queue with the command of your choice (i.e., qsub ). If
you have a multi-processor machine, use csh. If you have a
cluster, use qsub or the equivalent command for your
system. NOTE: If you set run_command=qsub (or otherwise
submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this
case because all the runs go as controlled by your batch
system. If you use run_command=csh (or similar, csh is
default) then normally you will use background=True so that
all the jobs run simultaneously.
skip_xtriage= False You can bypass xtriage if you want. This will
prevent you from applying anisotropy corrections, however.
temp_dir= None Define a temporary directory (it must exist)
title= Run 1 AutoMR Sun Dec 7 17:46:24 2008 Enter any text you like to
help identify what you did in this run
top_output_dir= None This is used in subprocess calls of wizards and to
tell the Wizard where to look for the STOPWIZARD file.
use_all_plausible_sg= False Normally you will want to search all space
groups with the same point group as you may not
know which is correct from your data. You can
select which of these to choose using 'Choose
variable to set' and selecting
'all_plausible_sg_list'
verbose= False Command files and other verbose output will be printed
input_files
input_data_file= None Enter a file with input structure factor data.
For structure factor data only (e.g., FP SIGFP) any
format is ok. If you have free R flags, phase
information or HL coefficients that you want to use
then an mtz file is required. If this file contains
phase information, this phase information should be
experimental (i.e., MAD/SAD/MIR etc), and should not be
density-modified phases (enter any files with
density-modified phases as input_map_file instead).
NOTE: If you supply HL coefficients they will be used
in phase recombination. If you supply PHIB or PHIB and
FOM and not HL coefficients, then HL coefficients will
be derived from your PHIB and FOM and used in phase
recombination. If you also specify a hires data file,
then FP and SIGFP will come from that data file (and
not this one) If an input_refinement_file is
specified, then F, Sigma, FreeR_flag (if present) from
that file will be used for refinement instead of this
one.
input_label_string= None Choose the set of labels that represent the
data and sigma columns for your data. NOTE: Applies
to input data file for AutoMR. See also
'input_labels', which applies to input data file for
AutoBuild.
input_pdb_file= None You can enter a PDB file containing a starting
model of your structure NOTE: If you enter a PDB file
then the AutoBuild wizard will start right in with
rebuild steps, skipping the build process. If the model
is very poor, then it may be better to leave it out, as
the build process (which includes pattern recognition
and recognition of helical and strand fragments) is
optimized for improving poor maps, while the rebuild
process is optimized for better maps that can be
produced by having a partial model.
input_seq_file= None Enter name of file with 1-letter code of protein
sequence NOTES: 1. lines starting with > are ignored
and separate chains 2. FASTA format is fine 3. If
there are multiple copies of a chain, just enter one
copy. 4. If you enter a PDB file for rebuilding and it
has the sequence you want, then the sequence file is not
necessary. NOTE: You can also enter the name of a PDB
file that contains SEQRES records, and the sequence from
the SEQRES records will be read, written to
seq_from_seqres_records.dat, and used as your input
sequence. NOTE: for AutoBuild you can specify
start_chains_list on the first line of your sequence
file: >> start_chains_list 23 11 5 NOTE: default
for this keyword is Auto, which means "carry out normal
process to guess this keyword". This means if you
specify "after_autosol" in AutoBuild, AutoBuild will
automatically take the value from AutoSol. If you do not
want this to happen, you can specify None which means
"No file"
input_seq_file_list= None The keyword input_seq_file_list is used in
AutoMR to specify the molecular masses of the
components of the unit cell using a set of sequence
files. Usually you should input the sequences of
the actual components of the unit cell here (one
sequence file for each component). NOTE: If no
input_seq_file is specified, then the sequences
from input_seq_file_list are used to create a new
file "composite_seq.dat" with all their sequences
and this is used as the input_seq_file. NOTE: the
format of each file in input_seq_file_list is the
1-letter code of the protein sequence (separate
chains with >>>>)
model_building
build_type= *RESOLVE_AND_TEXTAL RESOLVE TEXTAL You can choose to build
models with RESOLVE and TEXTAL or either one, and how many
different models to build with RESOLVE. The more you build,
the more likely to get a complete model. Note that
rebuild_in_place can only be carried out with RESOLVE
model-building
rebuild_after_mr= True You can choose to go right on to the AutoBuild
wizard with the rebuild-in-place option after running
molecular replacement.
resolution_build= 0.0
Enter the high-resolution limit for
model-building. If 0.0, the value of resolution is
used as a default.
semet= False You can specify that the dataset that is used for
refinement is a selenomethionine dataset, and that the model
should be the SeMet version of the protein, with all SD of MET
replaced with Se of MSE.
non_user_parameters
composition_num_list= 1 Enter number of copies of this component
weight_list= 0.0
Molecular weight of component (Da; e.g. 30000)
weight_seq_list= None Choose whether to define composition through
molecular weight or sequence
refinement
link_distance_cutoff= 3.0
You can specify the maximum bond distance for
linking residues in phenix.refine called from the
wizards.
r_free_flags_fraction= 0.1
Maximum fraction of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
up to the lower of the values defined by these two
parameters.
r_free_flags_lattice_symmetry_max_delta= 5.0
You can set the maximum
deviation of distances in the
lattice that are to be
considered the same for
purposes of generating a
lattice-symmetry-unique set of
free R flags.
r_free_flags_max_free= 2000 Maximum number of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
up to the lower of the values defined by these two
parameters.
r_free_flags_use_lattice_symmetry= True When generating r_free_flags you
can decide whether to include lattice
symmetry (good in general, necessary
if there is twinning).
Automated Model Building and Rebuilding using AutoBuild
Python-based Hierarchical ENvironment for Integrated Xtallography
Documentation Home
Automated Model Building and Rebuilding using AutoBuild
Purpose of the AutoBuild Wizard
How the AutoBuild Wizard works
Core modules in the AutoBuild Wizard
How to run the AutoBuild Wizard
What the AutoBuild wizard needs to run
Specifying which columns of data to use from input data files
Specifying other general parameters
Keeping waters from your input file in AutoBuild
Specifying phenix.refine parameters
Specifying resolve/resolve_pattern parameters
Including ligand coordinates in AutoBuild
Specifying arbitrary commands and cif files for phenix.refine
Standard building, rebuild_in_place, and multiple-models
Parallel jobs, nproc, nbatch, number_of_parallel_models and how AutoBuild works in parallel
Model editing during rebuilding with the Coot-PHENIX interface
Resolution limits in AutoBuild
Run AutoBuild automatically after AutoSol
Run AutoBuild beginning with experimental data
Make a SA-omit map around atoms in target.pdb
Make a simple composite omit map
Make an iterative-build omit map around atoms in target.pdb
Make a sa-omit map around residues 3 and 4 in chain A of coords.pdb
Create one very good rebuilt model
Create 20 very good rebuilt models that are as different as possible
Morph an MR model and rebuild it
Just make maps; don't do any building.
Just calculate a prime-and-switch map
Specific limitations and problems
List of all AutoBuild keywords
Author(s)
● AutoBuild Wizard: Tom Terwilliger
● PHENIX GUI and PDS Server: Nigel W. Moriarty
● phenix.refine: Ralf W. Grosse-Kunstleve, Peter Zwart and Paul D. Adams
● RESOLVE: Tom Terwilliger
● TEXTAL: Kreshna Gopal, Thomas Ioerger, Rita Pai, Tod Romo, James Sacchettini, Erik McKee, Lalji Kanbi
● phenix.xtriage: Peter Zwart
Purpose
Purpose of the AutoBuild Wizard
The purpose of the AutoBuild Wizard is to provide a highly automated system for model rebuilding and completion. The Wizard design allows the user to specify data files and parameters through an interactive GUI, or alternatively through keyworded scripts. The AutoBuild Wizard begins with datafiles containing structure factor amplitudes and uncertainties, along with either experimental phase information or a starting model; it carries out cycles of model-building and refinement alternating with model-based density modification, producing a relatively complete atomic model.
The AutoBuild Wizard uses RESOLVE (optionally also TEXTAL), xtriage and phenix.refine to build an atomic model, refine it, and improve it with iterative density modification, refinement, and model-building.
The Wizard begins with either experimental phases (i.e., from AutoSol) or with an atomic model that can be used to generate calculated phases. The AutoBuild Wizard produces a refined model that can be nearly complete if the data are strong and the resolution is about 2.5 A or better. At lower resolutions (2.5 - 3 A) the model may be less complete, and at resolutions > 3 A the model may be quite incomplete and not well refined.
The AutoBuild Wizard can be used to generate OMIT maps (simple omit, SA-omit, iterative-build omit) that can cover the entire unit cell or specific residues in a PDB file.
The AutoBuild Wizard can generate a set of models compatible with experimental data (multiple_models).
Usage
The AutoBuild Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded scripts. The command-line version will be described here.
How the AutoBuild Wizard works
The AutoBuild Wizard begins with experimental structure factor amplitudes, along with either experimental or model-based estimates of crystallographic phases. The phase information is improved by using statistical density modification to improve the correlation of NCS-related density in the map (if present) and to improve the match of the distribution of electron densities in the map with those expected from a model map. This improved map is then used to build and refine an atomic model.
In subsequent cycles, the models from previous cycles are used as a source of phase information in statistical density modification, iteratively improving the quality of the map used for model-building.
During the first few cycles, additional phase information is obtained by detecting and enhancing (1) the presence of commonly-found local patterns of density in the map, and (2) the presence of density in the shape of helices and strands. The final model obtained is analyzed for residue-based map correlation and for density at the coordinates of individual atoms, and an analysis is provided that includes a summary of atoms and residues that are in strong, moderate, or weak density, or out of density.
Automation and user control
The AutoBuild Wizard has been designed for ease of use combined with maximal user control, with as many parameters as possible set automatically by the Wizard, while remaining accessible to the user through a GUI and through keyword-based scripts. The Wizard uses the input/output routines of the cctbx library, allowing data files of many different formats, so the user does not have to convert their data to any particular format before using the Wizard. Use of the phenix.refine refinement package in the AutoBuild Wizard allows a high degree of automation of refinement, so that neither the user nor the Wizard is required to specify parameters for refinement. The phenix.refine package automatically includes a bulk solvent model and automatically places solvent molecules.
Core modules in the AutoBuild Wizard
The five core modules in the AutoBuild Wizard are:
● (1) building a new model into an electron density map
● (2) rebuilding an existing model
● (3) refinement
● (4) iterative model-building beginning from experimental phase information, and
● (5) iterative model-building beginning from a model.
The standard procedures available in the AutoBuild Wizard that are based on these modules include:
● (a) model-building and completion starting from experimental phases,
● (b) rebuilding a model from scratch, with or without experimental phase information, and
● (c) rebuilding a model in place, maintaining connectivity and sequence register.
Starting from a set of experimental phases and structure factor amplitudes, normally procedure (a) is carried out, and then the resulting model is rebuilt with procedure (b).
Starting from a model (e.g., from molecular replacement) and experimental structure factor amplitudes, procedure (c) is normally carried out if the starting model differs less than about 50% in sequence from the desired model, and otherwise procedure (b) is used.
How to run the AutoBuild Wizard
Running the AutoBuild Wizard is easy. For example, from the command-line you can type:
phenix.autobuild data=w1.sca seq.dat model=coords.pdb
The AutoBuild Wizard will carry out iterative model-building, density modification and refinement based on the data in w1.sca and the model in coords.pdb, editing the model as necessary to match the sequence in seq.dat.
What the AutoBuild wizard needs to run
● (1) a data file, optionally with phases, HL coefficients and a freeR flag (w1.sca or data=w1.sca)
● (2) a sequence file (seq.dat or seq_file=seq.dat) or a model (coords.pdb or model=coords.pdb)
...and optional files:
● (3) coefficients for a starting map (map_file=resolve.mtz)
● (4) a file for refinement (refinement_file=exptl_fobs_freeR_flags.mtz)
● (5) a high-resolution datafile (hires_file=high_res.sca)
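Putting these together, a run that supplies the optional files as well might look like the following sketch (the file names are the placeholders from the list above):
phenix.autobuild data=w1.sca seq_file=seq.dat \
map_file=resolve.mtz \
refinement_file=exptl_fobs_freeR_flags.mtz \
hires_file=high_res.sca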
Specifying which columns of data to use from input data files
If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.
For example, if your data file ref.mtz has columns FP SIGFP and FreeR, then you might specify:
refinement_file=ref.mtz
input_refinement_labels="FP SIGFP None None None None None None FreeR"
The keywords for labels and the anticipated input labels (program labels) are:
input_labels (for data file): FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flag
input_refinement_labels: FP SIGFP FreeR_flag
input_map_labels: FP PHIB FOM
input_hires_labels: FP SIGFP FreeR_flag
You can find out all the possible label strings in a data file that you might use by typing:
phenix.autobuild display_labels=w1.mtz # display all labels for w1.mtz
NOTES: If your data files contain a mixture of amplitude and intensity data, then only the amplitude data is available. If you have only intensity data in a data file and want to select specific columns, then you need to specify the column names as they are after importing the data and conversion to amplitudes (see below under General Limitations for details).
Specifying other general parameters
You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page for how to do this. Some of the most common parameters are:
data=w1.sca # data file
model=coords.pdb # starting model
seq_file=seq.dat # sequence file
map_file=map_coeffs.mtz # coefficients for a starting map for building
resolution=3 # dmin of 3 A
s_annealing=True # use simulated annealing refinement at start of each cycle
n_cycle_build_max=5 # max number of build cycles (starting from experimental phases)
n_cycle_rebuild_max=5 # max number of rebuild cycles (starting from a model)
Picking waters in AutoBuild
By default AutoBuild will instruct phenix.refine to pick waters using its standard procedure. This means that if the resolution of the data is high enough (typically 3 A or better), then waters are placed.
You can tell AutoBuild not to have phenix.refine pick waters with the command:
place_waters=False
If you want to place waters at a lower resolution, you will need to reset the low-resolution cutoff for placing waters in phenix.refine. You would do that in a "refinement_params.eff" file containing lines like these (see below for passing parameters to phenix.refine with an ".eff" file):
refinement {
  ordered_solvent {
    low_resolution = 2.8
  }
}
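A sketch of how this file would then be passed to AutoBuild, using the refine_eff_file keyword described in the sections below (file names are placeholders):
phenix.autobuild data=w1.sca model=coords.pdb seq_file=seq.dat \
refine_eff_file=refinement_params.eff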
Keeping waters from your input file in AutoBuild
You can tell AutoBuild to keep the waters in your input file when you are using rebuild_in_place (the default is to toss them and replace them with new ones). You can say:
keep_input_waters=True place_waters=No
NOTE: If you specify keep_input_waters=True you should also specify either "place_waters=No" or "keep_pdb_atoms=No". This is because if place_waters=Yes and keep_pdb_atoms=Yes, then phenix.refine will add waters, and the wizard will keep the new waters from the new PDB file created by phenix.refine preferentially over the ones in your input file.
Specifying phenix.refine parameters
You can control phenix.refine parameters that are not specified directly by AutoBuild using a refinement parameters (.eff) file:
refine_eff_file=refinement_params.eff # set any phenix.refine params not set by AutoBuild
This file might contain a twin-law for refinement:
refinement {
  twinning {
    twin_law = "-k, -h, -l"
  }
}
You can put any phenix.refine parameters in this file, but a few parameters that are set directly by AutoBuild override your inputs from the refine_eff_file. These parameters are listed below.
Refinement parameters that must be set using AutoBuild Wizard keywords (overwriting any values provided by the user in input_eff_file):

phenix.refine keyword: refinement.main.number_of_macro_cycles
Wizard keyword(s) and notes: ncycle_refine

phenix.refine keyword: refinement.main.simulated_annealing
Wizard keyword(s) and notes: s_annealing (only applies to the 1st refinement in a rebuild; SA in any other refinements is controlled by input_eff_file, if any)

phenix.refine keyword: refinement.ncs.find_automatically
Wizard keyword(s) and notes: refine_with_ncs=True turns on the automatic ncs search

phenix.refine keyword: refinement.main.ncs
Wizard keyword(s) and notes: refine_with_ncs=True turns on ncs

phenix.refine keyword: refinement.ncs.coordinate_sigma
Wizard keyword(s) and notes: Normally not set by the Wizard. However, if the Wizard keyword ncs_refine_coord_sigma_from_rmsd is True, then the ncs coordinate sigma is equal to ncs_refine_coord_sigma_from_rmsd_ratio times the rmsd among ncs copies.

phenix.refine keyword: refinement.main.random_seed
Wizard keyword(s) and notes: i_ran_seed sets the random seed at the beginning of a Wizard... this affects refinement.main.random_seed but does not set it to the value of i_ran_seed (because i_ran_seed gets updated by several different routines).

phenix.refine keyword: refinement.main.ordered_solvent
Wizard keyword(s) and notes: place_waters=True will set ordered_solvent to True. Note that this only has an effect if the value of the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) is higher than the resolution used for refinement.

phenix.refine keyword: refinement.main.ordered_solvent
Wizard keyword(s) and notes: place_waters_in_combine=True will set ordered_solvent to True, only applying this to the final combination step of multiple-model generation. Note that this only has an effect if the value of the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) is higher than the resolution used for refinement.

phenix.refine keyword: refinement.ordered_solvent.low_resolution
Wizard keyword(s) and notes: ordered_solvent_low_resolution=3.0 (default) will set the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) to 3 A. If the resolution used for refinement is larger than the value of ordered_solvent_low_resolution, then ordered solvent is not added.

phenix.refine keyword: refinement.main.use_experimental_phases
Wizard keyword(s) and notes: use_mlhl=True will set refinement.main.use_experimental_phases to True.

phenix.refine keyword: refinement.refine.strategy
Wizard keyword(s) and notes: The Wizard keywords refine, refine_b and refine_xyz all affect refinement.refine.strategy. If refine=True then refinement is carried out. If refine_b=True (default), isotropic displacement factors are refined. If refine_xyz=True (default), coordinates are refined.

phenix.refine keyword: refinement.main.occupancy_max
Wizard keyword(s) and notes: max_occ=1.0 sets the value of refinement.main.occupancy_max to 1.0. Default is to do nothing and use the default from phenix.refine (1.0).

phenix.refine keyword: refinement.refine.occupancies.individual
Wizard keyword(s) and notes: The combination of Wizard keywords semet=True and refine_se_occ=True will add "(name SE)" to the value of refinement.refine.occupancies.individual. You can add to your .eff file other names of atoms to have occupancies refined as well.

phenix.refine keyword: refinement.main.high_resolution
Wizard keyword(s) and notes: Either of the Wizard keywords refinement_resolution and resolution will set the value of refinement.main.high_resolution, with refinement_resolution being used if available.

phenix.refine keyword: refinement.pdb_interpretation.link_distance_cutoff
Wizard keyword(s) and notes: link_distance_cutoff
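For example, a hypothetical run exercising several of these Wizard keywords might look like this (file names are placeholders):
phenix.autobuild data=w1.sca model=coords.pdb seq_file=seq.dat \
ncycle_refine=3 s_annealing=True refine_with_ncs=True place_waters=True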
The following parameters controlling phenix.refine output are set directly in AutoBuild and cannot be set by the user:
● refinement.output.write_eff_file
● refinement.output.write_geo_file
● refinement.output.write_def_file
● refinement.output.write_maps
● refinement.output.write_map_coefficients
Specifying resolve/resolve_pattern parameters
Similarly, you can control resolve and resolve_pattern parameters. For these parameters, your inputs will not be overridden by AutoBuild. The format is a little tricky: you have to put two sets of quotes around the command, like this:
resolve_command="'resolution 200 3'" # NOTE ' and " quotes
This will put the text "resolution 200 3" at the end of every temporary command file created to run resolve. (This is why it is not overridden by AutoBuild commands; they will all come before your commands in the resolve command file.) Note that some commands in resolve may be incompatible with this usage.
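In context, a full command line using this syntax might look like the following sketch (file names are placeholders):
phenix.autobuild data=w1.sca seq_file=seq.dat \
resolve_command="'resolution 200 3'"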
Including ligand coordinates in AutoBuild
If your input PDB file contains ligands (anything other than solvent that is not protein, if your chain_type=PROTEIN, for example), then by default these ligands will be kept, used in refinement, and written out to your output PDB file. Any solvent molecules will by default be discarded. You can change this behavior by changing the keywords from these defaults:
keep_input_ligands=True keep_input_waters=False
The AutoBuild Wizard will use phenix.elbow to generate geometries for any ligands that are not recognized.
You can also tell AutoBuild to add the contents of any PDB files that you wish to supply to the current version of the structure just before refinement, so that all the refined models produced contain whatever AutoBuild has built, plus the contents of these PDB files. This can be done through the GUI, the command-line, or a script. In the command-line version you do this with:
input_lig_file_list=my_ligand.pdb
NOTE: The files in input_lig_file_list will be edited to make them all HETATM records to tell AutoBuild to ignore these residues in rebuilding.
NOTE: You may need to tell phenix.refine about the geometry of your ligands. You will get an error message if the ligand is not recognized and an automatic run of phenix.elbow does not succeed in generating your ligand. In that case you will want to run phenix.elbow to create a cif definition file for this ligand:
phenix.elbow my_ligand.pdb --id=LIG
where LIG is the 3-letter ID code that you use in my_ligand.pdb to identify your ligand. If the automatic run does not work, you may need to give phenix.elbow additional information to generate your ligand. Once phenix.elbow has generated your ligand, you can use the keyword "cif_def_file_list" to tell AutoBuild about this ligand:
cif_def_file_list=elbow.LIG.my_ligand.pdb.cif
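Putting the two steps together, the full ligand workflow might look like this sketch (my_ligand.pdb and LIG are the placeholders used above):
phenix.elbow my_ligand.pdb --id=LIG
phenix.autobuild data=w1.sca model=coords.pdb seq_file=seq.dat \
input_lig_file_list=my_ligand.pdb \
cif_def_file_list=elbow.LIG.my_ligand.pdb.cif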
Specifying arbitrary commands and cif files for phenix.refine
You can tell AutoBuild to apply any set of cif definitions to the model during refinement by using a combination of specification files and the commands cif_def_file_list and refine_eff_file_list:
refine_eff_file_list=link.eff cif_def_file_list=link.cif
This example comes from the phenix.refine manual page, in which a link is specified in a cif definition file link.cif:
data_mod_5pho
#
loop_
_chem_mod_atom.mod_id
_chem_mod_atom.function
_chem_mod_atom.atom_id
_chem_mod_atom.new_atom_id
_chem_mod_atom.new_type_symbol
_chem_mod_atom.new_type_energy
_chem_mod_atom.new_partial_charge
5pho add . O5T O OH .
loop_
_chem_mod_bond.mod_id
_chem_mod_bond.function
_chem_mod_bond.atom_id_1
_chem_mod_bond.atom_id_2
_chem_mod_bond.new_type
_chem_mod_bond.new_value_dist
_chem_mod_bond.new_value_dist_esd
5pho add O5T P coval 1.520 0.020
and this is applied with a parameters file link.eff:
refinement.pdb_interpretation.apply_cif_modification
{
data_mod = 5pho
residue_selection = resname GUA and name O5T
}
You can have any number of cif files and parameters files.
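For example, a hypothetical invocation with two of each might look like the following sketch; the quoting of the space-separated lists is assumed to follow the same convention as the label strings elsewhere in this manual, and the file names are placeholders:
phenix.autobuild data=w1.sca model=coords.pdb seq_file=seq.dat \
refine_eff_file_list="link.eff special.eff" \
cif_def_file_list="link.cif special.cif"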
Output files from AutoBuild
When you run AutoBuild the output files will be in a subdirectory with your run number:
AutoBuild_run_1_/ # subdirectory with results
● A summary file listing the results of the run and the other files produced:
AutoBuild_summary.dat # overall summary
● A warnings file listing any warnings about the run:
AutoBuild_warnings.dat # any warnings
● A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed):
AutoBuild_Facts.dat # all Facts about the run
● Final refined model:
overall_best.pdb
NOTE: The "overall_best.pdb" file is always the current best model. Similarly
"overall_best_denmod_map_coeffs.mtz" is always the best map_coefficients file. The
AutoBuild_summary.dat file lists the names of the current best set of files. The contents of "overall_best.
http://phenix-online.org/documentation/autobuild.htm (8 of 34) [12/14/08 1:01:09 PM]
86
Automated Model Building and Rebuilding using AutoBuild
● pdb" and of the best model listed in AutoBuild_summary.dat will be the same.
Final map coefficients used to build refined model. Use FP PHIM FOMM in maps. Normally this is a density-modified map from resolve. See also the map coefficients from phenix.refine below. overall_best_denmod_map_coeffs.mtz
● Final sigmaA-weighted 2mFo-DFc and Fo-Fc map coefficients from phenix.refine based on the final model overall_best.pdb. The map coefficients are 2FOFCWT PH2FOFCWT for the 2mFo-DFc map, and FOFC and PHFOFC for the Fo-Fc difference map. See also the map coefficients from density modification above:
overall_best_refine_map_coeffs.mtz
● MTZ file with FP, phases and HL coefficients if present, and freeR_flags used in refinement:
exptl_fobs_phases_freeR_flags.mtz
● Final log file for model-building:
overall_best.log
● Final log file for refinement:
overall_best.log_refine
● Evaluation of fit of model to map:
overall_best.log_eval
● Summary of NCS information:
ncs_info.ncs
Standard building, rebuild_in_place, and multiple-models
The AutoBuild Wizard has two overall methods for building a model. The first method (standard build) is to build a model from scratch. This involves identification of where helices (and strands, for proteins) are located, extension using fragment libraries, connection of segments, identification of side-chains, and sequence alignment. These methods are augmented in the standard building procedure by loop-fitting and by building model outside of the region that has already been built. The second method (rebuild_in_place) takes an existing model and rebuilds it without adding or deleting any residues and without changing the connectivity of the chain. The way this works is that a segment of the model is deleted and then filled in again by rebuilding from the remaining ends. This is repeated for overlapping segments covering the entire model. The multiple-models approach really has two levels of multiple models. At the first level, several (multiple_models_group_number, default is number_of_parallel_models) models are built (using rebuild_in_place) and are then recombined into a single good model. At the next level, this whole process may be done more than once (multiple_models_number times), yielding several very good models. By default, if you ask for rebuild_in_place, then you will get a single very good model, created by running rebuild_in_place several times and recombining the models.
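A minimal sketch of a multiple-models run, assuming the multiple_models keyword switches this mode on (the keyword is not shown in the excerpts above, and the file names are placeholders):
phenix.autobuild data=w1.sca model=coords.pdb seq_file=seq.dat \
multiple_models=True multiple_models_number=5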
Parallel jobs, nproc, nbatch, number_of_parallel_models and how AutoBuild works in parallel
The AutoBuild Wizard is set up to take advantage of multi-processor machines or batch queues by splitting the work into separate tasks. See Tutorial 6 (Automatically rebuilding a structure solved by Molecular Replacement) for a description of the method used by the AutoBuild Wizard to run build jobs as sub-processes and to combine the results into single models. Here are the key factors that determine how splitting model-building into batches and running them on one or more processors works:
● nbatch is the number of batches of work. As long as nbatch is fixed, the results of running the Wizard will be the same no matter how many processors are used. It is most efficient, however, to have nbatch be at least as large as nproc, the number of processors; otherwise some processors may end up doing nothing. The default is nbatch=3. The value of nbatch is used to set other defaults (such as number_of_parallel_models).
● nproc is the number of processors to split the work among.
● number_of_parallel_models is the number of models to build at once. The default is to set number_of_parallel_models=nbatch. This affects both standard building (number_of_parallel_models sets how many initial models to build) and rebuild_in_place (number_of_parallel_models determines whether a single model is built, or a set of models are built and recombined into a single model).
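For example, on a four-processor machine one might run the following (a sketch; as noted above, keeping nbatch fixed makes the results reproducible across different processor counts):
phenix.autobuild data=w1.sca seq_file=seq.dat \
nproc=4 nbatch=4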
Model editing during rebuilding with the Coot-PHENIX interface
The AutoBuild Wizard allows you to edit a model and give it back to the Wizard during the iterative model-building, density modification and refinement process. The Wizard will consider the model that you give it along with the models that it generates automatically, and will choose the parts of your model that fit the density better than other models. You can edit a model using the PHENIX-Coot interface. This interface is accessible through the GUI and via the command line. Using the GUI, when a model has been produced by the AutoBuild Wizard, you can double-click the button on the GUI labelled View/edit files with coot to start Coot with your current map and model. If you are running from the command line, you can open a new window and type:

phenix.autobuild coot

which will do the same (provided the necessary map and model are ready). When Coot has been loaded, your map and model will be displayed along with a PHENIX-Coot Interface window. You can edit your model and then save it, giving it back to PHENIX with the button labelled something like Save model as COMM/overall_best_coot_7.pdb. This button creates the indicated file and also tells PHENIX to look for this file and to try to include the contents of the model in the building process. The precise use of the model that you save depends on the type of model-building that is being carried out by the AutoBuild Wizard. If you are using rebuild_in_place then the main-chain and side-chains of the model are considered as replacements for the current working model. Any ligands or unrecognized residues are (by default) not rebuilt but are included in refinement. By default, solvent in the model is ignored. If you are not using rebuild_in_place, only the main-chain conformation is considered, and the side-chains are ignored. Ligands (but not solvent) in the model are (by default) kept and included in refinement. As the AutoBuild Wizard continues to build new models and create new maps, you can update the PHENIX-Coot Interface to the current best model and map with the button Update with current files from PHENIX.
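If you have several runs, the run_control keywords listed later in this document indicate that coot can be combined with a run number; a sketch (run 2 is just an illustration):

phenix.autobuild run=2 coot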
Resolution limits in AutoBuild
There are several resolution limits used in AutoBuild. You can leave them all at their default values, or you can set any of them individually. Here is a list of these limits and how their default values are set:
resolution
    Description: Overall resolution. Used as high-resolution limit for density modification. Used as default for refinement resolution and model-building resolution if they are not set.
    Default: Resolution of the input datafile. If a hires datafile is provided, the resolution of that data is used.

refinement_resolution
    Description: Resolution for refinement.
    Default: value of "resolution"

resolution_build
    Description: Resolution for model-building.
    Default: value of "resolution"

overall_resolution
    Description: Resolution to truncate all data. This should only be used if you need to truncate the data in order to get the Wizard to run. It causes the Wizard to ignore all data at higher resolution than overall_resolution. It is normally better to use the resolution keyword to define the resolution limits, as that will keep all the data in the output and working files.
    Default: None

multiple_models_starting_resolution
    Description: Resolution for the initial rebuilding of a model in the multiple-models procedure. Normally a low resolution to generate diversity.
    Default: 4 A
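For example, to set these limits explicitly rather than accepting the defaults, a command along these lines could be used (the resolution values are arbitrary illustrations):

phenix.autobuild data=data.mtz seq_file=seq.dat resolution=2.5 \
  resolution_build=3.0 refinement_resolution=2.5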
Examples
Run AutoBuild automatically after AutoSol
phenix.autobuild after_autosol
Run AutoBuild beginning with experimental data
phenix.autobuild data=solve_1.mtz seq_file=seq.dat
Merge in hires data
phenix.autobuild data=solve_2.mtz hires_file=w1.sca seq_file=seq.dat
Make an SA-omit map around atoms in target.pdb
phenix.autobuild data=data.mtz model=coords.pdb omit_box_pdb=target.pdb composite_omit_type=sa_omit
Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ .
An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).
Make a simple composite omit map
phenix.autobuild data=data.mtz model=coords.pdb composite_omit_type=simple_omit
Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ .
An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).
Make an iterative-build omit map around atoms in target.pdb
phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=target.pdb \
composite_omit_type=iterative_build_omit
Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ .
An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).
Make an SA-omit map around residues 3 and 4 in chain A of coords.pdb
phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=coords.pdb \
omit_res_start_list=3 omit_res_end_list=4 omit_chain_list=A \
composite_omit_type=sa_omit
Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ .
An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).
Create one very good rebuilt model
phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \
include_input_model=True \
multiple_models_number=1 n_cycle_rebuild_max=5
The final model will be in the file MULTIPLE_MODELS/all_models.pdb (this file will contain just one model).
Note that this procedure will keep the sequence that is present in coords.pdb. If you supply a sequence file it will edit the sequence of coords.pdb to match your sequence file and discard any residues that do not match. (If you want to input a sequence file but not edit the sequence in coords.pdb and not discard any non-matching residues, then also specify edit_pdb=False.) Note also that if include_input_model=True then no randomization cycle will be carried out and multiple_models_starting_resolution is ignored.
Touch up a model
phenix.autobuild data=data.mtz model=coords.pdb \
touch_up=True worst_percent_res_rebuild=2 min_cc_res_rebuild=0.8

You can rebuild just the worst parts of your model by setting touch_up=True. You can decide what parts to rebuild based on a minimum model-map correlation (by residue). You can decide how much to rebuild using worst_percent_res_rebuild or min_cc_res_rebuild, or both.
Create 20 very good rebuilt models that are as different as possible
phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \
multiple_models_number=20 n_cycle_rebuild_max=5
The 20 models will be in the file MULTIPLE_MODELS/all_models.pdb. This procedure is useful for generating an ensemble of models that are each individually consistent with the data, and yet are diverse. The variation among these models is an indication of the uncertainty in each of the models. Note that the ensemble of models is not a representation of the ensemble of structures that is truly present in the crystal.
Morph an MR model and rebuild it
phenix.autobuild data=data.mtz model=MR.pdb \
morph=True rebuild_in_place=False seq_file=seq.dat

You can have AutoBuild morph your input model, distorting it to match the density-modified map that is produced from your model and data. This can be used to make an improved starting model in cases where the MR model is very different from the structure that is to be solved. For the morphing to work, the two structures must be topologically similar and differ mostly by movements of domains or motifs such as a group of helices or a sheet. The morphing process consists of identifying a coordinate shift to apply to each N (or P for nucleic acids) atom that maximizes the local density correlation between the model and the map. This is smoothed and applied to the structure to generate a morphed structure.
Build an RNA chain
phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=RNA
Build a DNA chain
phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=DNA
Just make maps; don't do any building.
phenix.autobuild data=data.mtz model=coords.pdb maps_only=True
Just calculate a prime-and-switch map
phenix.autobuild data=data.mtz solvent_fraction=.6 \
ps_in_rebuild=True model=coords.pdb maps_only=True
The output prime-and-switch map will be in the file prime_and_switch.mtz.
Possible Problems
General limitations
● The AutoBuild wizard edits input PDB files to remove multiple conformations. It will also renumber residues if the file contains residues with insertion codes. All references to residue numbers (e.g., rebuild_res_start_list) refer to the edited, renumbered model. This model can be found in the AutoBuild_run_1_ (or appropriate) directory as "edited_pdb.pdb".
● The AutoBuild wizard expects residue numbers to not decrease along a chain. It will stop if residue 250 in chain B is found between residues 116 and 117 in the same chain, for example. To get around this, use insertion codes (make residue 250 residue 116A instead).
● The AutoBuild model-building can only build one type of chain at a time (default chain_type='PROTEIN'; other choices are RNA and DNA). If you supply a PDB file containing more than one type of chain for rebuilding, then all the residues that are not that type of chain are treated as ligands and are (by default, keep_input_ligands=True) included in refinement but not in rebuilding. Any input solvent molecules are (by default, keep_input_waters=False) ignored. You can include more than one type of chain in rebuilding by supplying one type of chain as ligands with input_lig_file_list and rebuilding another type:

chain_type=PROTEIN            # build only protein
input_lig_file_list=MyDNA.pdb # just read in DNA coordinates and include in refinement

In this case only protein chains will be built, but the DNA coordinates in MyDNA.pdb will be included in all refinements and will be written out to the final coordinate file. You may wish to add the keyword:

keep_pdb_atoms=False          # keep the ligand atoms if model (pdb) and ligand overlap

which will tell AutoBuild that the ligand (DNA) atoms are to be kept if the model that is being built (protein) overlaps with it. (The default is to keep the model that is being built and to discard any ligand atoms that overlap.) This whole process is likely to require substantial editing of the PDB files by hand, because when you build DNA, a lot of chains are going to be built into the protein region, and when you build protein, it is going to be accidentally built into the DNA. (A combined command line illustrating these keywords is sketched after this list.)
● Any file in input_lig_file_list containing ATOM records will have them replaced with HETATM records. This is so that the rebuild_in_place algorithm does not try to use them in rebuilding.
● The ligand generation routine in phenix.elbow will not generate heme groups at this point. Most other ligands can be automatically generated.
● If your input data file contains both intensity data and amplitude data, only the amplitude data is exposed in the AutoBuild Wizard. If you want to use the intensity data then you have to create a file that does not have amplitude data in it.
● If your input data file has only intensity data and you wish to specify which columns of data the AutoBuild Wizard is to use, then you have to specify the names that the columns will have AFTER importing the data and conversion to amplitudes, not the original column names. These column names may not be obvious. Here is how to find out what they will be. Do a quick dummy run like this with XXX as labels:

phenix.autobuild w2.sca coords.pdb input_labels="XXX XXX"

The Wizard will print out a list of available labels like this:

Sorry, the label XXX does not exist as an amplitude array in the input_data_file ImportRawData_run_8_/w2_PHX.mtz
...available labels are: ['w2', 'SIGw2', 'None']

Then you know that the correct command is:

phenix.autobuild w2.sca coords.pdb input_labels="w2 SIGw2"
● The AutoBuild Wizard cannot build modified residues. If you supply a model with modified residues, these will be taken out of the chain and treated as ligands, and the chain will be broken at that point. By default the modified residues will be added to your model just before refinement and a cif definitions file will be automatically generated for these residues. You can also add these residues with the input_lig_file_list procedure if you want.
● The AutoBuild Wizard will not build very short chains unless you set the variable group_ca_length (default=4 for building a model from scratch) to a smaller number. The shortest chain that will be built is group_ca_length. If you use rebuild_in_place, then the default shortest chain allowed is 1 residue, so any part of a model you supply is rebuilt.
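Putting the chain-type keywords from the list above together, a combined run might be sketched as follows (MyDNA.pdb and the other file names are the placeholders from that discussion):

phenix.autobuild data=data.mtz seq_file=seq.dat chain_type=PROTEIN \
  input_lig_file_list=MyDNA.pdb keep_pdb_atoms=False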
Specific limitations and problems
● The size of the asymmetric unit in the SOLVE/RESOLVE portion of the AutoBuild wizard is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36) binaries. Larger-size versions can be obtained on request.
● The AutoBuild Wizard can take most settings of most space groups; however, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting, due to difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.
Literature
Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard. T.C. Terwilliger, R.W. Grosse-Kunstleve, P.V. Afonine, N.W. Moriarty, P.H. Zwart, L.-W. Hung, R.J. Read and P.D. Adams. Acta Cryst. D64, 61-69 (2008)

Interpretation of ensembles created by multiple iterative rebuilding of macromolecular models. T.C. Terwilliger, R.W. Grosse-Kunstleve, P.V. Afonine, P.D. Adams, N.W. Moriarty, P.H. Zwart, R.J. Read, D. Turk and L.-W. Hung. Acta Cryst. D63, 597-610 (2007)

Using prime-and-switch phasing to reduce model bias in molecular replacement. T.C. Terwilliger. Acta Cryst. D60, 2144-2149 (2004)

Improving macromolecular atomic models at moderate resolution by automated iterative model building, statistical density modification and refinement. T.C. Terwilliger. Acta Cryst. D59, 1174-1182 (2003)

Statistical density modification using local pattern matching. T.C. Terwilliger. Acta Cryst. D59, 1688-1701 (2003)

Automated side-chain model building and sequence assignment by template matching. T.C. Terwilliger. Acta Cryst. D59, 45-49 (2003)

Automated main-chain model building by template matching and iterative fragment extension. T.C. Terwilliger. Acta Cryst. D59, 38-44 (2003)

Rapid automatic NCS identification using heavy-atom substructures. T.C. Terwilliger. Acta Cryst. D58, 2213-2215 (2002)

Statistical density modification with non-crystallographic symmetry. T.C. Terwilliger. Acta Cryst. D58, 2082-2086 (2002)

Maximum-likelihood density modification. T.C. Terwilliger. Acta Cryst. D56, 965-972 (2000)

Maximum-likelihood density modification with pattern recognition of structural motifs. T.C. Terwilliger. Acta Cryst. D57, 1755-1762 (2001)

Map-likelihood phasing. T.C. Terwilliger. Acta Cryst. D57, 1763-1775 (2001)
Additional information
List of all AutoBuild keywords
-------------------------------------------------------------------------------
Legend:
  black bold - scope names
  black      - parameter names
  red        - parameter values
  blue       - parameter help
  blue bold  - scope help

Parameter values:
  * means selected parameter (where multiple choices are available)
  False is No
  True is Yes
  None means not provided, not predefined, or left up to the program
  "%3d" is a Python-style formatting descriptor
-------------------------------------------------------------------------------
autobuild
data= None Datafile (alias for input_data_file) This file can be a .sca or
mtz or other standard file. The Wizard will guess the column
identification. You can specify the column labels to use with:
input_labels='FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flag'
Substitute any labels you do not have with None. If you only have
myFP and mysigFP you can just say input_labels='myFP mysigFP'.
(Command-line only)
model= None PDB file with starting model (alias for input_pdb_file) NOTE:
If your PDB file has been previously refined, then please make sure
that you provide the free R flags that were used in that refinement.
These can come from the data file or from the refinement_file.
(Command-line only).
seq_file= Auto Sequence file (alias for input_seq_file). The format is
plain text, with chains separated by a line starting with > ,
any blanks and unrecognized characters are ignored. You need only
input 1 copy of each unique chain. (Command-line only)
map_file= Auto MTZ file containing starting map (alias for input_map_file)
This file must be a mtz file. The Wizard will guess the column
identification. You can specify the column labels to use with:
input_map_labels='FP PHIB FOM' Substitute any labels you do not
have with None. If you only have myFP and myPHIB you can just say
input_map_labels='myFP myPHIB'. (Command-line only)
refinement_file= Auto File for refinement (alias for input_refinement_file)
This file can be a .sca or mtz or other standard file.
This file will be merged with your data file, with any
phase information coming from your data file. If this file
has free R flags, they will be used, otherwise if the data
file has them, those will be used, otherwise they will be
generated. The Wizard will guess the column
identification. You can specify the column labels to use
with: input_refinement_labels='FP SIGFP FreeR_flag'
Substitute any labels you do not have with None. If you
only have myFP and mysigFP you can just say
input_refinement_labels='myFP mysigFP'. (Command-line
only).
hires_file= Auto File with high-resolution data (alias for
input_hires_file) This file can be a .sca or mtz or other
standard file. The Wizard will guess the column identification.
You can specify the column labels to use with:
input_hires_labels='FP SIGFP'. (Command-line only)
special_keywords
write_run_directory_to_file= None Writes the full name of a run
directory to the specified file. This can
be used as a call-back to tell a script
where the output is going to go.
(Command-line only)
run_control
coot= None Set coot to True and optionally run=[run-number] to run Coot
with the current model and map for run run-number. In some wizards
(AutoBuild) you can edit the model and give it back to PHENIX to
use as part of the model-building process. If you just say coot
then the facts for the highest-numbered existing run will be
shown. (Command-line only)
ignore_blanks= None ignore_blanks allows you to have a command-line
keyword with a blank value like "input_lig_file_list="
stop= None You can stop the current wizard with "stopwizard" or "stop".
If you type "phenix.autobuild run=3 stop" then this will stop run
3 of autobuild. (Command-line only)
display_facts= None Set display_facts to True and optionally
run=[run-number] to display the facts for run run-number.
If you just say display_facts then the facts for the
highest-numbered existing run will be shown.
(Command-line only)
display_summary= None Set display_summary to True and optionally
run=[run-number] to show the summary for run
run-number. If you just say display_summary then the
summary for the highest-numbered existing run will be
shown. (Command-line only)
carry_on= None Set carry_on to True to carry on with highest-numbered
run from where you left off. (Command-line only)
run= None Set run to n to continue with run n where you left off.
(Command-line only)
copy_run= None Set copy_run to n to copy run n to a new run and continue
where you left off. (Command-line only)
display_runs= None List all runs for this wizard. (Command-line only)
delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
display_labels= None display_labels=test.mtz will list all the labels
that identify data in test.mtz. You can use the label
strings that are produced in AutoSol to identify which
data to use from a datafile like this: peak.data="F+
SIGF+ F- SIGF-" # the entire string in quotes counts
here You can use the individual labels from these
strings as identifiers for data columns in AutoSol and
AutoBuild like this: input_refinement_labels="FP SIGFP
FreeR_flags" # each individual label counts
dry_run= False Just read in and check parameter names
params_only= False Just read in and return parameter defaults
display_all= False Just read in and display parameter defaults
crystal_info
cell= 0.0 0.0 0.0 0.0 0.0 0.0
Enter cell parameter a b c alpha beta
gamma
chain_type= *Auto PROTEIN DNA RNA You can specify whether to build
protein, DNA, or RNA chains. At present you can only build
one of these in a single run. If you have both DNA and
protein, build one first, then run AutoBuild again,
supplying the prebuilt model in the "input_lig_file_list"
and build the other. NOTE: default for this keyword is Auto,
which means "carry out normal process to guess this
keyword". The process is to look at the sequence file and/or
input pdb file to see what the chain type is. If there are
more than one type, the type with the larger number of
residues is guessed. If you want to force the chain_type,
then set it to PROTEIN RNA or DNA.
dmax= 500.0
Low-resolution limit
overall_resolution= 0.0
If overall_resolution is set, then all data
beyond this is ignored. NOTE: this is only suggested
if you have a very big cell and need to truncate the
data to allow the wizard to run at all. Normally you
should use 'resolution' and 'resolution_build' and
'refinement_resolution' to set the high-resolution
limit
resolution= 0.0
High-resolution limit. Used as resolution limit for
density modification and as general default high-resolution
limit. If resolution_build or refinement_resolution are set
then they override this for model-building or refinement. If
overall_resolution is set then data beyond that resolution
is ignored completely.
sg= None Space Group symbol (e.g., C2221 or C 2 2 21)
solvent_fraction= None Solvent fraction in crystals (0 to 1).
decision_making
acceptable_r= 0.25
Used to decide whether the model is acceptable enough
to quit if it is not improving much. A good value is 0.25
dist_close= None If main-chain atom rmsd is less than dist_close then
crossover between chains in different models is allowed at
this point. If you input a negative number the defaults
will be used
dist_close_overlap= 1.5
Model or ligand coordinates but not both are
kept when model and ligand coordinates are within
dist_close_overlap and ligands in
input_lig_file_list are being added to the current
model. NOTE: you might want to decrease this if your
ligand atoms get removed by the wizard. Default=1.5 A
group_ca_length= 4 In resolve building you can specify how short a
fragment to keep. Normally 4 or 5 residues should be
the minimum.
group_length= 2 In resolve building you can specify how many fragments
must be joined to make a connected group that is kept.
Normally 2 fragments should be the minimum.
include_molprobity= False You can choose to include the clash score from
MolProbity as one of the scoring criteria in
comparing and merging models. The score is combined
with the model-map correlation CC by summing in a
weighted clashscore. If clashscore for a residue has
a value < ok_molp_score then its value is
(clashscore-ok_molp_score)*scale_molp_score,
otherwise its value is zero.
loop_cc_min= 0.4
You can specify the minimum correlation of density from
a loop with the map.
min_cc_res_rebuild= 0.5
You can rebuild just the worst parts of your
model by setting touch_up=True. You can decide what
parts to rebuild based on a minimum model-map
correlation (by residue). You can decide how much to
rebuild using worst_percent_res_rebuild or with
min_cc_res_rebuild, or both.
min_seq_identity_percent= 50.0
The sequence in your input PDB file will
be adjusted to match the sequence in your
sequence file (if any). If there are
insertions/deletions in your model and the
wizard does not seem to identify them, you can
split up your PDB file by adding records like
this: BREAK You can specify the minimum
sequence identity between your sequence file
and a segment from your input PDB file to
consider the sequences to be matched. Default
is 50.0%. You might want a higher number to
make sure that deletions in the sequence are
noticed.
ok_molp_score= None You can choose to include the clash score from
MolProbity as one of the scoring criteria in comparing
and merging models. The score is combined with the
model-map correlation CC by summing in a weighted
clashscore. If clashscore for a residue has a value <
ok_molp_score (the threshold defined by ok_molp_score)
then its value is
(clashscore-ok_molp_score)*scale_molp_score, otherwise
its value is zero.
r_switch= 0.4
R-value criteria for deciding whether to use R-value or
residues built A good value is 0.40
scale_molp_score= None You can choose to include the clash score from
MolProbity as one of the scoring criteria in comparing
and merging models. The score is combined with the
model-map correlation CC by summing in a weighted
clashscore. If clashscore for a residue has a value <
ok_molp_score then its value is
(clashscore-ok_molp_score)*scale_molp_score, otherwise
its value is zero.
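As a worked illustration of the formula as stated above (the numbers are hypothetical): with ok_molp_score=20 and scale_molp_score=0.05, a residue with clashscore 8 contributes (8-20)*0.05 = -0.6 to the combined score, while a residue with clashscore 25 contributes zero.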
semi_acceptable_r= 0.3
Used to decide whether the model is acceptable
enough to skip rebuilding the model from scratch and
focus on adding loops and extending it. A good value
is 0.35
density_modification
hl= False You can choose whether to calculate hl coeffs when doing
density modification ('Yes') or not to do so ('No'). Default is No.
mask_type= *histograms probability wang Choose method for obtaining
probability that a point is in the protein vs solvent region.
Default is "histograms". If you have a SAD dataset with a
heavy atom such as Pt or Au then you may wish to choose
"wang" because the histogram method is sensitive to very high
peaks. Options are: histograms: compare local rms of map and
local skew of map to values from a model map and estimate
probabilities. This one is usually the best. probability:
compare local rms of map to distribution for all points in
this map and estimate probabilities. In a few cases this one
is much better than histograms. wang: take points with
highest local rms and define as protein.
modify_outside_delta_solvent= 0.05
You can set the initial solvent
content to be a little lower than
calculated when you are running
modify_outside_model. Usually 0.05 is fine.
modify_outside_model= False You can choose whether to modify the density
in the "protein" region outside the region
specified in your current model by matching
histograms with the region that is specified by
that model. This can help by raising the density
in this protein region up to a value similar to
that where atoms are already placed.
thorough_denmod= *Auto Yes No True False Choose whether you want to go
for thorough density modification when no model is used
("No" speeds it up and for a terrible map is sometimes
better)
truncate_ha_sites_in_resolve= *Auto Yes No True False You can choose to
truncate the density near heavy-atom sites
at a maximum of 2.5 sigma. This is useful
in cases where the heavy-atom sites are
very strong, and rarely hurts in cases
where they are not. The heavy-atom sites
are specified with "input_ha_file"
use_resolve_fragments= True This script normally uses information from
fragment identification as part of density
modification for the first few cycles of
model-building. Fragments are identified during
model-building. The fragments are used, with
weighting according to the confidence in their
placement, in density modification as targets for
density values.
use_resolve_pattern= True Local pattern identification is normally used
as part of density modification during the first
few cycles of model building.
general
after_autosol= False You can specify that you want to continue on
starting with the highest-scoring run of AutoSol.
background= True When you specify nproc=nn, you can run the jobs in
background (default if nproc is greater than 1) or
foreground (default if nproc=1). If you set
run_command=qsub (or otherwise submit to a batch queue),
then you should set background=False, so that the batch
queue can keep track of your runs. There is no need to use
background=True in this case because all the runs go as
controlled by your batch system. If you use run_command=csh
(or similar, csh is default) then normally you will use
background=True so that all the jobs run simultaneously.
base_path= None You can specify the base path for files (default is
current working directory)
clean_up= False At the end of the entire run the TEMP directories will
be removed if clean_up is True. The default is No, keep these
directories. If you want to remove them after your run is
finished use a command like "phenix.autobuild run=1
clean_up=True"
coot_name= coot If your version of coot is called something else, then
you can specify that here.
debug= False You can have the wizard stop with error messages about the
code if you use debug. NOTE: you cannot use Pause with debug.
extra_verbose= False Facts and possible commands will be printed every
cycle if Yes
i_ran_seed= 289564 Random seed (positive integer) for model-building
and simulated annealing refinement
max_wait_time= 100.0
You can specify the length of time (seconds) to
wait when testing the run_command. If you have a cluster
where jobs do not start right away you may need a longer
time to wait.
nbatch= 3 You can specify the number of processors to use (nproc) and
the number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors
available and leave nbatch alone. If you leave nbatch as None it
will be set automatically, with a value depending on the Wizard.
This is recommended. The value of nbatch can affect the results
that you get, as the jobs are not split into exact replicates,
but are rather run with different random numbers. If you want to
get the same results, keep the same value of nbatch.
nproc= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be
set automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are
rather run with different random numbers. If you want to get the
same results, keep the same value of nbatch.
quick= False Run everything quickly (number_of_parallel_models=1
n_cycle_build_max=1 n_cycle_rebuild_max=1)
resolve_command_list= None Commands for resolve. One per line in the
form: keyword value value can be optional
Examples: coarse_grid resolution 200 2.0 hklin
test.mtz NOTE: for command-line usage you need to
enclose the whole set of commands in double quotes
(") and each individual command in single quotes
(') like this: resolve_command_list="'no_build'
'b_overall 23' "
resolve_pattern_command_list= None Commands for resolve_pattern. One
per line in the form: keyword value
value can be optional Examples:
resolution 200 2.0 hklin test.mtz NOTE:
for command-line usage you need to enclose
the whole set of commands in double quotes
(") and each individual command in single
quotes (') like this:
resolve_pattern_command_list="'resolution
200 20' 'hklin test.mtz' "
resolve_size= _giant _huge _extra_huge *None Size for solve/resolve
("","_giant","_huge","_extra_huge")
run_command= csh When you specify nproc=nn, you can run the subprocesses
as jobs in background with csh (default) or submit them to
a queue with the command of your choice (e.g., qsub). If
you have a multi-processor machine, use csh. If you have a
cluster, use qsub or the equivalent command for your
system. NOTE: If you set run_command=qsub (or otherwise
submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this
case because all the runs go as controlled by your batch
system. If you use run_command=csh (or similar, csh is
default) then normally you will use background=True so that
all the jobs run simultaneously.
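As a concrete sketch combining these keywords, a run submitted to a batch queue might look like this (qsub is one possible queue command; file names are placeholders):

phenix.autobuild data=data.mtz seq_file=seq.dat nproc=8 \
  run_command=qsub background=False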
skip_xtriage= False You can bypass xtriage if you want. This will
prevent you from applying anisotropy corrections, however.
temp_dir= None Define a temporary directory (it must exist)
title= Run 1 AutoBuild Sun Dec 7 17:46:23 2008 Enter any text you like
to help identify what you did in this run
top_output_dir= None This is used in subprocess calls of wizards and to
tell the Wizard where to look for the STOPWIZARD file.
verbose= False Command files and other verbose output will be printed
input_files
cif_def_file_list= None You can enter any number of CIF definition
files. These are normally used to tell phenix.refine
about the geometry of a ligand or unusual residue.
You usually will use these in combination with "PDB
file with metals/ligands" (keyword
"input_lig_file_list" ) which allows you to attach
the contents of any PDB file you like to your model
just before it gets refined. You can use
phenix.elbow to generate these if you do not have a
CIF file and one is requested by phenix.refine
input_data_file= None Enter a file with input structure factor data.
For structure factor data only (e.g., FP SIGFP) any
format is ok. If you have free R flags, phase
information or HL coefficients that you want to use
then an mtz file is required. If this file contains
phase information, this phase information should be
experimental (i.e., MAD/SAD/MIR etc), and should not be
density-modified phases (enter any files with
density-modified phases as input_map_file instead).
NOTE: If you supply HL coefficients they will be used
in phase recombination. If you supply PHIB or PHIB and
FOM and not HL coefficients, then HL coefficients will
be derived from your PHIB and FOM and used in phase
recombination. If you also specify a hires data file,
then FP and SIGFP will come from that data file (and
not this one) If an input_refinement_file is
specified, then F, Sigma, FreeR_flag (if present) from
that file will be used for refinement instead of this
one.
input_ha_file= None If the flag "truncate_ha_sites_in_resolve" is set
then density at sites specified with input_ha_file is
truncated to improve the density modification procedure.
input_hires_labels= None Labels for input hires file (FP SIGFP
FreeR_flag)
input_labels= None Labels for input data columns NOTE: Applies to input
data file for LigandFit and AutoBuild, but not to AutoMR.
For AutoMR use instead 'input_label_string'.
input_lig_file_list= None This script adds the contents of these PDB
files to each model just prior to refinement.
Normally you might use this to put in any
heavy-atoms that are in the refined structure (for
example the heavy atoms that were used in phasing),
or to add a ligand to your model. If the atoms in
this PDB file are not recognized by phenix.refine,
then you can specify their geometries with a cif
definitions file using the keyword
"cif_def_files_list". You can easily generate cif
definitions for many ligands using phenix.elbow in
PHENIX. You can put anything you like in the files
in input_lig_file_list, but any atoms that fall
within 1.5 A of any atom in the current model will
be tossed (not written to the model).
input_map_file= Auto Enter an mtz file with coefficients for map (if
different file or different coefficients than input
structure factor data ). This map will be used in the
first cycle of model-building. NOTE: default for this
keyword is Auto, which means "carry out normal process
to guess this keyword". This means if you specify
"after_autosol" in AutoBuild, AutoBuild will
automatically take the value from AutoSol. If you do not
want this to happen, you can specify None which means
"No file"
input_map_labels= None Labels for input map coefficient columns (FP PHIB
FOM) NOTE: FOM is optional (set to None if you wish)
input_pdb_file= None You can enter a PDB file containing a starting
model of your structure NOTE: If you enter a PDB file
then the AutoBuild wizard will start right in with
rebuild steps, skipping the build process. If the model
is very poor then it may be better to leave it out as
the build process (which includes pattern recognition
and recognition of helical and strand fragments) is
optimized for improving poor maps, while the rebuild
process is optimized for better maps that can be
produced by having a partial model.
input_refinement_file= Auto Data file to use for refinement. The data in
this file should not be corrected for anisotropy.
It will be combined with experimental phase
information (if any) from input_data_file for
refinement. If you leave this blank, then the
data in the input_data_file will be used in
refinement. If no anisotropy correction is
applied to the data you do not need to specify a
datafile for refinement. If an anisotropy
correction is applied to the data files, then you
should enter an uncorrected datafile for
refinement. Any standard format is fine;
normally only F and sigF will be used. Bijvoet
pairs and duplicates will be averaged. If an mtz
file is provided then a free R flag can be read
in as well. Any HL coeffs and phase information
in this file is ignored. NOTE: default for this
keyword is Auto, which means "carry out normal
process to guess this keyword". This means if you
specify "after_autosol" in AutoBuild, AutoBuild
will automatically take the value from AutoSol.
If you do not want this to happen, you can
specify None which means "No file"
input_refinement_labels= None Labels for input refinement file columns
(FP SIGFP FreeR_flag)
input_seq_file= Auto Enter name of file with 1-letter code of protein
sequence NOTES: 1. lines starting with > are ignored
and separate chains 2. FASTA format is fine 3. If
there are multiple copies of a chain, just enter one
copy. 4. If you enter a PDB file for rebuilding and it
has the sequence you want, then the sequence file is not
necessary. NOTE: You can also enter the name of a PDB
file that contains SEQRES records, and the sequence from
the SEQRES records will be read, written to
seq_from_seqres_records.dat, and used as your input
sequence. NOTE: for AutoBuild you can specify
start_chains_list on the first line of your sequence
file: >> start_chains_list 23 11 5 NOTE: default
for this keyword is Auto, which means "carry out normal
process to guess this keyword". This means if you
specify "after_autosol" in AutoBuild, AutoBuild will
automatically take the value from AutoSol. If you do not
want this to happen, you can specify None which means
"No file"
keep_input_ligands= True You can choose whether to (by default) let the
wizard keep ligands by separating them out from the
rest of your model and adding them back to your
rebuilt model, or alternatively to remove all
ligands from your input pdb file before
rebuild_in_place.
keep_input_waters= False You can choose whether to keep input waters
(solvent) when using rebuild_in_place. If you keep
them, then you should specify either
"place_waters=No" or "keep_pdb_atoms=No" because if http://phenix-online.org/documentation/autobuild.htm (22 of 34) [12/14/08 1:01:09 PM]
100
Automated Model Building and Rebuilding using AutoBuild
place_waters=Yes and keep_pdb_atoms=Yes then
phenix.refine will add waters and then the wizard
will keep the new waters from the new PDB file
created by phenix.refine preferentially over the ones
in your input file.
keep_pdb_atoms= True You can choose whether to keep the model
coordinates when model and ligand coordinates are within
dist_close_overlap and ligands in input_lig_file_list
are being added to the current model. Default=Yes
refine_eff_file_list= None You can enter any number of refinement
parameter files. These are normally used to tell
phenix.refine defaults to apply, as well as
creating specialized definitions such as unusual
amino acid residues and linkages. These
parameters override the normal phenix.refine
defaults. They themselves can be overridden by
parameters set by the Wizard and by you,
controlling the Wizard. NOTE: Any parameters set
by AutoBuild directly (such as
number_of_macro_cycles, high_resolution, etc...)
will not be taken from this parameters file. This
is useful only for adding extra parameters not
normally set by AutoBuild.
maps
maps_only= False You can choose whether to skip all model-building and
just calculate maps and write out the results. This also runs
just 1 cycle and turns on HL coefficients.
n_xyz_list= None You can specify the grid to use for map calculations.
model_building
allow_negative_residues= False Normally the wizard does not allow
negative residue numbers, and all residues with
negative numbers are rejected when they are
read in. You can allow them if you wish.
base_model= None You can enter a PDB file with coordinates to be used
as a starting point for model-building. These coordinates
will be included in the same way as fragments placed by
searching for helices and strand in initial model-building.
Note the difference from the use of models in
consider_main_chain_list, which are merged with models after
they are built. NOTE: Only use this if you want to keep the
input model and just add to it.
build_type= RESOLVE_AND_TEXTAL *RESOLVE TEXTAL You can choose to build
models with RESOLVE and TEXTAL or either one, and how many
different models to build with RESOLVE. The more you build,
the more likely to get a complete model. Note that
rebuild_in_place can only be carried out with RESOLVE
model-building
cc_helix_min= None Minimum CC of helical density to map at low
resolution when using helices_strands_only
cc_strand_min= None Minimum CC of strand density to map when using
helices_strands_only
consider_main_chain_list= None This keyword lets you name any number of
PDB files to consider as templates for
model-building. Every time models are built,
the contents of these files will be merged
with them and the best parts will be kept.
NOTE: this only uses the main-chain atoms of
your PDB files.
dist_connect_max_helices= None Set maximum distance between ends of
helices and other ends to try and connect them
in insert_helices.
edit_pdb= True You can choose to edit the input PDB file in
rebuild_in_place to match the input sequence (default=True).
NOTE: residues with residue numbers higher than
'highest_resno' are assumed to not have a known sequence and
will not be edited. By default the value of 'highest_resno' is
the highest residue number from the sequence file, after
adding it to the starting residue number from
start_chains_list. You can also set it directly
helices_strands_only= False You can choose to use a quick model-building
method that only builds secondary structure. At
low resolution this may be both quicker and more
accurate than trying to build the entire structure
If you are running the AutoSol Wizard, normally
you should choose 'Yes' and use the quick
model-building. Then when your structure is solved
by AutoSol, go on to AutoBuild and build a more
complete model (this time normally using
helices_strands_only=False).
helices_strands_start= False You can choose to use a quick
model-building method that builds secondary
structure as a way to get started...then model
completion is done as usual. (Contrast with
helices_strands_only which only does secondary
structure)
highest_resno= None Highest residue number to be considered "placed" in
sequence for rebuild_in_place
include_input_model= True The keyword include_input_model defines
whether the input model (if any) is to be crossed
with models that are derived from it, and the best
parts of each kept. Note that if
multiple_models=True and include_input_model=True
then no initial cycle of randomization will be
carried out and the keyword
multiple_models_starting_resolution is ignored. In
most cases you should use include_input_model=True
If you want to generate maximum diversity with
multiple-models then you may wish to use
include_input_model=False. Also if you want to
decrease the amount of bias from your starting
model you may wish to use
include_input_model=False.
input_compare_file= NONE If you are rebuilding a model or already think
you know what the model should be, you can include a
comparison file in rebuilding. The model is not used
for anything except to write out information on
coordinate differences in the output log files.
NOTE: this feature does not always work correctly.
merge_models= False You can choose to only merge any input models and
write out the resulting model. The best parts of each
model will be kept based on model-map correlation.
Normally used along with number_of_parallel_models=1
morph= False You can choose whether to distort your input model in order
to match the current working map. This may be useful for MR
models that are quite distant from the correct structure.
morph_cycles= 2 Number of iterations of morphing each time it is run.
morph_rad= 7.0
Smoothing radius for morphing. The density from your
model and from the map are calculated with the radius
rad_morph, then they are adjusted to overlap optimally
n_ca_enough_helices= None Set maximum number of CA to add to ends of
helices and other ends to try and connect them in
insert_helices.
offsets_list= 53 7 23 You can specify an offset for the orientation of
the helix and strand templates in building. This is used
in generating different starting models.
ps_in_rebuild= False You can choose to use a prime-and-switch resolve
map in all cycles of rebuilding instead of a
density-modified map. This is normally used in
combination with maps_only to generate a prime-and-switch
map.
refine= True This script normally refines the model during building. Say
No to skip refinement
resolution_build= 0.0
Enter the high-resolution limit for
model-building. If 0.0, the value of resolution is
used as a default.
restart_cycle_after_morph= 5 Morphing (if morph=True) will go only up to
this cycle, and then the morphed PDB file
will be used as a starting PDB file from then
on, removing all previous models.
retrace_before_build= False You can choose to retrace your model n_mini
times and use a map based on these retraced models
to start off model-building. This is the default
for rebuilding models if you are not using
rebuild_in_place. You can also specify
n_iter_rebuild, the number of cycles of
retrace-density-modify-build before starting the
main build.
reuse_chain_prev_cycle= True You can choose to allow model-building to
include atoms from each cycle in the model the
next cycle or not
richardson_rotamers= *Auto Yes No True False You can choose to use the
rotamer library from SC Lovell, JM Word, JS
Richardson and DC Richardson (2000) " The
Penultimate Rotamer Library" Proteins: Structure
Function and Genetics 40 389-408. if you wish.
Typically this works well in RESOLVE model-building
for nearly-final models but not as well earlier in
the process. Default (Auto) is to use these
rotamers for rebuild_in_place but not otherwise.
rms_random_frag= None Rms random position change added to residues on
ends of fragments when extending them If you enter a
negative number, defaults will be used.
rms_random_loop= None Rms random position change added to residues on
ends of loops in tries for building loops If you enter
a negative number, defaults will be used.
semet= False You can specify that the dataset that is used for
refinement is a selenomethionine dataset, and that the model
should be the SeMet version of the protein, with all SD of MET
replaced with Se of MSE.
start_chains_list= None You can specify the starting residue number for
each of the unique chains in your structure. If you
use a sequence file then the unique chains are
extracted and the order must match the order of your
starting residue numbers. For example, if your
sequence file has chains A and B (identical) and
chains C and D (identical to each other, but
different than A and B) then you can enter 2 numbers,
the starting residues for chains A and C. NOTE: you
need to specify an input sequence file for
start_chains_list to be applied.
trace_as_lig= False You can specify that in building steps the ends of
chains are to be extended using the LigandFit algorithm.
This is default for nucleic acid model-building.
track_libs= False You can keep track of what libraries each atom in a
built structure comes from.
two_fofc_in_rebuild= False You can choose to use a sigmaa-weighted
2Fo-Fc map in all cycles of rebuilding instead of a
density-modified map. If the model is poor this can
sometimes allow model-building in place to work
even when it will not for density-modified maps.
use_any_side= True You can choose to have resolve model-building place
the best-fitting side chain at each position, even if the
sequence is not matched to the map.
use_cc_in_combine_extend= False You can choose to use the correlation of
density rather than density at atomic
positions to score models in combine_extend
use_met_in_align= *Auto Yes No True False You can use the heavy-atom
positions in input_ha_file as markers for Met SD
positions.
multiple_models
combine_only= False Once you have created a set of initial models you
can merge them together into a final set. This option is
useful if you have split up the creation of multiple
models into different directories, and then you have
copied all the initial models to one directory for
combining.
multiple_models= False You can build a set of models, all compatible
with your data. You can specify how many models with
multiple_models_number. If you are using
rebuild_in_place you can specify whether to generate
starting models or not with multiple_models_starting.
multiple_models_first= 1 Specify which model to build first
multiple_models_group_number= 5 You can build several initial models and
merge them. Normally 5 initial models is
fine.
multiple_models_last= 20 Specify which model to end with
multiple_models_number= 20 Specify how many models to build.
multiple_models_starting= True You can specify how to generate starting
models for multiple models. If you are using
rebuild_in_place and you specify "Yes" then
the Wizard will rebuild your starting model at
the resolution specified in
multiple_models_starting_resolution. If you
are not using rebuild_in_place the Wizard will
always build a starting model at the current
resolution.
multiple_models_starting_resolution= 4.0
You can set the resolution for
rebuilding an initial model. A
value of 0.0 will use the
resolution of the dataset.
place_waters_in_combine= True You can choose whether phenix.refine
automatically places ordered solvent (waters)
during the last cycle of multiple-model
generation. This is separate from place_waters,
which applies to all other cycles.
ncs
find_ncs= *Auto Yes No True False This script normally deduces ncs
information from the NCS in chains of models that are built
during iterative model-building. The update is done each cycle
in which an improved model is obtained. Say No to skip this.
See also "input_ncs_file" which can be used to specify NCS at
the start of the process. If find_ncs="No" then only this
starting NCS will be used and it will not be updated. You can
use find_ncs "No" to specify exactly what residues will be
used in NCS refinement and exactly what NCS operators to use
in density modification. You can use the function
$PHENIX/phenix/phenix/command_line/simple_ncs_from_pdb.py to
help you set up an input_ncs_file that has your specifications
in it.
input_ncs_file= None You can enter NCS information in 3 ways: (1) an
ncs_spec file produced by AutoSol or AutoBuild with NCS
information (2) a heavy-atom PDB file that contains ncs
in the heavy-atom sites (3) a PDB file with a model
that contains chains with NCS The wizard will derive NCS
information from any of these if specified. See also
"find_ncs" which determines whether the wizard will
update NCS from models that are built during iterative
building.
ncs_copies= None Number of copies of the molecule in the au (note: only
one type of molecule allowed at present)
ncs_refine_coord_sigma_from_rmsd= False You can choose to use the
current NCS rmsd as the value of the
sigma for NCS restraints. See also
ncs_refine_coord_sigma_from_rmsd_ratio
ncs_refine_coord_sigma_from_rmsd_ratio= 1.0
You can choose to multiply
the current NCS rmsd by this
value before using it as the
sigma for NCS restraints See
also
ncs_refine_coord_sigma_from_rmsd
no_merge_ncs_copies= False Normally False (do merge NCS copies). If
True, then do not use each NCS copy to try to build
the others.
optimize_ncs= True This script normally deduces ncs information from the
NCS in chains of models that are built during iterative
model-building. Optimize NCS adds a step to try and make
the molecule formed by NCS as compact as possible, without
losing any point-group symmetry.
use_ncs_in_build= True Use NCS information in the model assembly stage
of model-building. Also if no_merge_ncs_copies is not
set, then use each NCS copy to try to build the
others.
non_user_parameters
background_map= None You can supply an mtz file (REQUIRED LABELS: FP
PHIM FOMM) to use as map coefficients to calculate the
electron density in all points in an omit map that are
not part of any omitted region. (Default="")
boundary_background_map= None You can supply an mtz file (REQUIRED
LABELS: FP PHIM FOMM) to use as map
coefficients to calculate the electron density
in all points in the boundary map that are not
part of any omitted region. (Default="")
extend_try_list= False You can fill out the list of parallel jobs to
match the number of jobs you want to run at one time,
as specified with nbatch.
force_combine_extend= False You can choose whether to force the
combine-extend step in model-building
model_list= None This keyword lets you name any number of PDB files to
consider as starting models for model-building. NOTE: This
differs from consider_main_chain_list which will try to add
your PDB files EVERY cycle of merging models. In contrast
model_list will only do it on the first cycle. NOTE: this
only uses the main-chain atoms of your PDB files.
oasis_cnos= None Enter number of C N O and S atoms here if you have
OASIS and want to run it before resolve density modification
like this: "C 250 N 121 O 85 S 3"
offset_boundary_background_map= None You can set the offset of the
boundary_background_map.
omit
composite_omit_type= *None simple_omit sa_omit iterative_build_omit
            Your choices of types of OMIT maps are:
            None - normal operation, no omit
            simple_omit - omit the atoms in OMIT region in calculating a
            sigmaA-weighted 2mFo-DFc map with no refinement
            sa_omit - omit the atoms in OMIT region, carry out
            simulated-annealing refinement, then calculate a
            sigmaA-weighted 2mFo-DFc map
            iterative_build_omit - set occupancy of atoms in OMIT region
            to 0 throughout an entire iterative model-building, density
            modification and refinement process (takes a long time).
            All these omit map types are available as composite omit
            maps (default) or as omit maps around a region defined by a
            PDB file (using omit_box_pdb_list). The resulting OMIT map
            will be in the directory OMIT with file name
            resolve_composite_map.mtz. This mtz file contains the map
            coefficients to create the OMIT map. The file
            "omit_region.mtz" contains the coefficients for a map
            showing the boundaries of the OMIT region.
n_box_target= None You can tell the Wizard how many omit boxes to try
and set up (but it will not necessarily choose your number
because it has to be nicely divisible into boxes that fit
your asymmetric unit). A suitable number is 24. The
larger the number of boxes, the better the map will be,
but the longer it will take to calculate the map.
n_cycle_image_min= 3 Pattern recognition (resolve_pattern) and fragment
identification ("image based density modification")
are used as part of the density modification process.
These are normally only useful in the first few
cycles of iterative model-building. This script
tries model-building both with and without including
image information, and proceeds with the most
complete model. Once at least n_cycle_image_min
cycles have been carried out with image information,
if the image-based map results in a less-complete
model than the one without image information, image
information is no longer included.
n_cycle_rebuild_omit= 10 Model-building is normally carried out using
the "best" available map. If omit_on_rebuild is
Yes, then every n_cycle_rebuild_omit cycle of
model rebuilding, a composite omit map is used
instead. If you specify 0 and omit_on_rebuild is
Yes, omit maps will be used every cycle. Normally
every 10th cycle is optimal.
offset_boundary= 1.0
Specify the boundary around omit_box_pdb for
definition of omit region.
omit_box_end= 0 To only carry out omit in some of the omit boxes, use
omit_box_start and omit_box_end
omit_box_pdb_list= None This keyword applies if you have set OMIT region
specification to "omit_around_pdb". To automatically
set an OMIT region specify a PDB file(s) with
omit_box_pdb_list. The omit region boundaries will be
the limits in x y z of the atoms in this file, plus a
border of offset_boundary. To use only some of the
atoms in the file, specify values for starting,
ending and chain to omit (omit_res_start_list and
omit_res_end_list and omit_chain_list) If you
specify more than one file (or if you specify more
than one segment of a file with omit_chain_list or
omit_res_start_list and omit_res_end_list) then a set
of omit runs will be carried out and combined into
one composite omit.
omit_box_start= 0 To only carry out omit in some of the omit boxes, use
omit_box_start and omit_box_end
omit_chain_list= None You can choose to omit just a portion of your
model keywords omit_res_start_list 3 omit_res_end_list
4 omit_chain_list chain1 (use "" to select all chains)
The residues from 3 to 4 of chain1 will be omitted. You
can specify more than one region by using the Parameter
Group Options button to add lines. If you specify more
than one region, a separate omit run will be carried
out for each one and then the maps will be put together
afterwards. If there is more than one chain in the
input PDB file then only the chain defined by
omit_chain will be omitted. NOTE: Zero for start and
end and "" for chain is the same as choosing everything
omit_offset_list= 0 0 0 0 0 0 To carry out one iterative build omit,
with a region defined in grid units, enter
nxs,nxe,nys,nye,nzs,nze in omit_offset_list.
omit_on_rebuild= False You can specify whether to use an omit map for
building the model on rebuild cycles. Default is Yes if
you start with a model, No if you are building a model
from scratch. The omit map is calculated every
n_cycle_rebuild_omit cycles
omit_region_specification= *composite_omit omit_around_pdb You can
specify what region an omit
(simple/sa-omit/iterative-build-omit) map is
to be calculated for. Composite omit will
create a map over the entire asymmetric unit
by dividing the asymmetric unit into
overlapping boxes, calculating omit maps for
each, and splicing all the results together
into a single composite omit map. You can
tell the Wizard how many omit boxes to try
and set up with the keyword "n_box_target"
(but it will not necessarily choose your
number because it has to be nicely divisible
into boxes that fit your asymmetric unit).
Omit around PDB will omit around the region
defined by the PDB file(s) you enter for
omit_box_pdb (or around the residues in that
PDB file that you specify). If you specify
omit_around_pdb then you must enter a pdb
file to omit around.
omit_res_end_list= None You can choose to omit just a portion of your
model keywords omit_res_start_list 3
omit_res_end_list 4 omit_chain_list chain1 (use " "
for blank) The residues from 3 to 4 of chain1 will be
omitted. You can specify more than one region by
using the Parameter Group Options button to add
lines. If you specify more than one region, a
separate omit run will be carried out for each one
and then the maps will be put together afterwards. If
there is more than one chain in the input PDB file
then only the chain defined by omit_chain will be
omitted. NOTE: Zero for start and end and "" for
chain is the same as choosing everything
omit_res_start_list= None You can choose to omit just a portion of your
model keywords omit_res_start_list 3
omit_res_end_list 4 omit_chain_list chain1 (use " "
for blank) The residues from 3 to 4 of chain1 will
be omitted. You can specify more than one region by
using the Parameter Group Options button to add
lines. If you specify more than one region, a
separate omit run will be carried out for each one
and then the maps will be put together afterwards.
If there is more than one chain in the input PDB
file then only the chain defined by omit_chain will
be omitted. NOTE: Zero for start and end and ""
for chain is the same as choosing everything
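As a sketch of how the omit keywords above combine (file names hypothetical), a composite simulated-annealing omit map over the whole asymmetric unit, or an omit map restricted to the region around a ligand, might be requested like this:
phenix.autobuild data=data.mtz model=coords.pdb composite_omit_type=sa_omit
phenix.autobuild data=data.mtz model=coords.pdb \
  composite_omit_type=simple_omit \
  omit_region_specification=omit_around_pdb omit_box_pdb_list=ligand.pdb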
rebuild_in_place
min_seq_identity_percent_rebuild_in_place= 50.0
                        The sequence in your input PDB file will be
                        adjusted to match the sequence in your sequence
                        file (if any). You can specify the minimum
                        sequence identity between your sequence file
                        and a segment from your input PDB file to
                        consider the sequences to be matched. Default
                        is 50.0%. You might want a higher number to
                        make sure that deletions in the sequence are
                        noticed. The value you specify applies to
                        rebuild_in_place only. Use
                        min_seq_identity_percent instead for non
                        rebuild_in_place runs.
n_cycle_rebuild_in_place= None Number of cycles for rebuild_in_place for
multiple models only
n_rebuild_in_place= 1 You can choose how many times to rebuild your
model in place with rebuild_in_place
rebuild_chain_list= None You can choose to rebuild just a portion of
your model keywords rebuild_res_start_list 3
rebuild_res_end_list 4 rebuild_chain_list chain1 (use
" " for blank) The residues from 3 to 4 of chain1
will be rebuilt. You can specify more than one
region by using the Parameter Group Options button
to add lines. If there is more than one chain in
the input PDB file then only the chain defined by
rebuild_chain will be rebuilt. The smallest region
that can be rebuilt is 4 residues.
rebuild_in_place= *Auto Yes No True False You can choose to rebuild
your model while fixing the sequence alignment by
iteratively rebuilding segments within the model. This
is done n_rebuild_in_place times, then the models are
recombined, taking the best-fitting parts of each.
Crossovers are allowed where main-chain atom rmsd is less
than dist_close. Note that the sequence of the input
model must match the supplied sequence closely enough
to allow a clear alignment. Also this method does not
build any new chain, it just moves the existing model
around. Normally this procedure is useful if the model
is greater than 95% identical with the target
sequence. You can include information directly from
the starting model if you want with the keyword
include_input_model. Then this model will be
recombined with the models that are built based on it.
Note that this requires that the input model have a
sequence that is identical to the model to be rebuilt.
You can also rebuild just a portion of the model with
the keywords rebuild_res_start_list 3
rebuild_res_end_list 4 rebuild_chain_list chain1 (use "
" for blank). The residues from 3 to 4 of chain1 will
be rebuilt. You can specify more than one region by
using the Parameter Group Options button to add lines
NOTE: if a region cannot be rebuilt the original
coordinates will be preserved for that region.
rebuild_near_chain= None You can specify where to rebuild either with
rebuild_res_start_list rebuild_res_end_list
rebuild_chain_list or with rebuild_near_res and
rebuild_near_chain and rebuild_near_dist.
rebuild_near_dist= 7.5
You can specify where to rebuild either with
rebuild_res_start_list rebuild_res_end_list
rebuild_chain_list or with rebuild_near_res and
rebuild_near_chain and rebuild_near_dist.
rebuild_near_res= None You can specify where to rebuild either with
rebuild_res_start_list rebuild_res_end_list
rebuild_chain_list or with rebuild_near_res and
rebuild_near_chain and rebuild_near_dist.
rebuild_res_end_list= None You can choose to rebuild just a portion of
your model keywords rebuild_res_start_list 3
rebuild_res_end_list 4 rebuild_chain_list chain1
(use " " for blank) The residues from 3 to 4 of
chain1 will be rebuilt. You can specify more than
one region by using the Parameter Group Options
button to add lines. If there is more than one
chain in the input PDB file then only the chain
defined by rebuild_chain will be rebuilt. The
smallest region that can be rebuilt is 4 residues.
rebuild_res_start_list= None You can choose to rebuild just a portion
of your model keywords rebuild_res_start_list 3
rebuild_res_end_list 4 rebuild_chain_list chain1
(use " " for blank) The residues from 3 to 4 of
chain1 will be rebuilt. You can specify more
than one region by using the Parameter Group
Options button to add lines. If there is more
than one chain in the input PDB file then only
the chain defined by rebuild_chain will be
rebuilt. The smallest region that can be rebuilt
is 4 residues.
rebuild_side_chains= False You can choose to replace side chains (with
extend_only) before rebuilding the model (not
normally used)
redo_side_chains= True You can choose to have AutoBuild decide whether
to replace all your side chains in rebuild_in_place,
taking new ones if they fit the density better. If
Yes, this is applied to all side chains, not only
those that are rebuilt.
replace_existing= False In rebuild_in_place the usual default is to
force the replacement of all residues, even if the
rebuilt ones are not as good a fit as the original.
You can override this by saying "No" (do not force
replacement of residues, keep whatever is better).
Additionally if you set the "touch_up" flag then the
default is to keep whatever is better.
touch_up= False You can rebuild just the worst parts of your model by
setting touch_up=True. You can decide what parts to rebuild
based on a minimum model-map correlation (by residue). This
is set with min_cc_residue_rebuild=0.82. Alternatively you can
rebuild the worst percentage of these:
worst_percent_res_rebuild=6. If a value is set for both of
these then residues qualifying in either way are rebuilt.
NOTE: touch_up is only available with rebuild_in_place.
touch_up_extra_residues= None Number of residues on each side of the
residues identified in touch_up that you want
to rebuild. Normally you will want to rebuild
one or more on each side.
worst_percent_res_rebuild= 2.0
You can rebuild just the worst parts of
your model by setting touch_up=True. You can
decide how much to rebuild using
worst_percent_res_rebuild or with
min_cc_res_rebuild, or both.
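Pulling together several of the rebuild_in_place keywords above (file names hypothetical), a run that only touches up the worst-fitting residues might be started with:
phenix.autobuild data=data.mtz model=coords.pdb seq_file=seq.dat \
  rebuild_in_place=True touch_up=True worst_percent_res_rebuild=5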
refinement
link_distance_cutoff= 3.0
You can specify the maximum bond distance for
linking residues in phenix.refine called from the
wizards.
max_occ= None You can choose to set the maximum value of occupancy for
atoms that have their occupancies refined. Default is None (use
default value of 1.0 from phenix.refine)
ordered_solvent_low_resolution= None You can choose what resolution
cutoff to use for placing ordered solvent
in phenix.refine. If the resolution of
refinement is greater than this cutoff,
then no ordered solvent will be placed,
even if
refinement.main.ordered_solvent=True.
place_waters= True You can choose whether phenix.refine automatically
places ordered solvent (waters) during the refinement
process.
r_free_flags_fraction= 0.1
Maximum fraction of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
the lower of the values defined by these two
parameters.
r_free_flags_lattice_symmetry_max_delta= 5.0
You can set the maximum
deviation of distances in the
lattice that are to be
considered the same for
purposes of generating a
lattice-symmetry-unique set of
free R flags.
r_free_flags_max_free= 2000 Maximum number of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
the lower of the values defined by these two
parameters.
r_free_flags_use_lattice_symmetry= True When generating r_free_flags you
can decide whether to include lattice
symmetry (good in general, necessary
if there is twinning).
refine_b= True You can choose whether phenix.refine is to refine
individual atomic displacement parameters (B values)
refine_before_rebuild= True You can choose to refine the input model
before rebuilding it
refine_se_occ= True You can choose to refine the occupancy of SE atoms
in a SEMET structure (default=Yes). This only applies if
semet=true
refine_with_ncs= True This script can allow phenix.refine to
automatically identify NCS and use it in refinement.
NOTE: ncs refinement and placing waters automatically
are mutually exclusive at present.
refine_xyz= True You can choose whether phenix.refine is to refine
coordinates
refinement_resolution= 0.0
Enter the high-resolution limit for
refinement only. This high-resolution limit can
be different than the high-resolution limit for
other steps. The default ("None" or 0.0) is to
use the overall high-resolution limit for this
run (as set by 'resolution')
s_annealing= False You can choose to carry out simulated annealing
during the first refinement after initial model-building
skip_hexdigest= False You may wish to ignore the hexdigest of the free R
flags in your input PDB file if (1) the dataset you
provide is not identical to the one that you refined
with (but has the same free R flags), or (2) you are
providing both an input_data_file and an
input_refinement_file or input_hires_file. In the
second case, the resulting composite file may not have
the same hexdigest even though the free R flags are
copied over. The default is to set skip_hexdigest=True
for case #2. For case #1 you have to tell the Wizard to
skip the hexdigest (because it cannot know about this).
use_mlhl= True This script normally uses information from the input file
(HLA HLB HLC HLD) in refinement. Say No to only refine on Fobs
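As an illustrative combination of the refinement keywords above (file names hypothetical), and keeping in mind that NCS refinement and automatic water placement are mutually exclusive at present, a run might set:
phenix.autobuild data=data.mtz model=coords.pdb \
  refine_with_ncs=True place_waters=False s_annealing=True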
textal
d_max_textal= 1000.0
This low-resolution limit is only used for Textal
model-building
d_min_textal= 2.8
Textal has an optimal high-resolution limit of 2.8 A.
This limit is only used for Textal model-building
thoroughness
build_outside= True Define whether to use the BuildOutside module in
build_model
connect= True Define whether to use the connect module in build_model.
This module tries to connect nearby chains with loops, without
using the sequence. This is different than fit_loops (which
uses the sequence to identify the exact number of residues in
the loop).
extensive_build= False You can choose whether to build a new model on
every cycle and carry out extra model-building steps
every cycle. Default is No (build a new model on first
cycle, after that carry out extra steps).
fit_loops= True You can fit loops automatically if sequence alignment
has been done.
insert_helices= True Define whether to use the insert_helices module in
build_model. This module tries to insert helices
identified with find_helices_strands into the current
working model. This can be useful as the standard build
sometimes builds strands into helical density at low
resolution.
n_cycle_build= -1 Choose number of cycles (3). This does not apply if
TEXTAL is selected for build_type
n_cycle_build_max= 6 Maximum number of cycles for iterative
model-building, starting from experimental phases
without a model. Even if a satisfactory model is not
found, a maximum of n_cycle_build_max cycles will be
carried out.
n_cycle_build_min= 1 Minimum number of cycles for iterative
model-building, starting from experimental phases
without a model. Even if a satisfactory model is
found, n_cycle_build_min cycles will be carried out.
n_cycle_rebuild_max= 15 Maximum number of cycles for iterative
model-rebuilding, starting from a model. Even if a
satisfactory model is not found, a maximum of
n_cycle_rebuild_max cycles will be carried out.
n_cycle_rebuild_min= 1 Minimum number of cycles for iterative
model-rebuilding, starting from a model. Even if a
satisfactory model is found, n_cycle_rebuild_min
cycles will be carried out.
n_mini= 10 You can choose how many times to retrace your model in
"retrace_before_build"
n_random_frag= 0 In resolve building you can randomize each fragment
slightly so as to generate more possibilities for tracing
based on extending it.
n_random_loop= 3 Number of randomized tries from each end for building
loops. If 0, then one try. If N, then N additional tries
with randomization based on rms_random_loop.
n_try_rebuild= 2 Number of attempts to build each segment of chain
ncycle_refine= 3 Choose number of refinement cycles (3)
number_of_models= -1 This parameter lets you choose how many initial
models to build with RESOLVE within a single build
cycle. This parameter is now superseded by
number_of_parallel_models, which sets the number of
models (now entire build cycles) to carry out in
parallel. A zero means set it automatically. That is
what you normally should use. The number_of_models is
by default set to 1 and number_of_parallel_models is
set to the value of nbatch (typically 4).
number_of_parallel_models= 0 This parameter lets you choose how many
models to build in parallel. A zero means set
it automatically. That is what you normally
should use. This parameter supersedes the old
parameter number_of_models. The value of
number_of_models is by default set to 1 and
number_of_parallel_models is set to the value
of nbatch (typically 4).
skip_combine_extend= False You can choose whether to skip the
combine-extend step in model-building
thorough_loop_fit= True Try many conformations and accept them even if
the fit is not perfect? If you say Yes the parameters
for thorough loop fitting are: n_random_loop=100
rms_random_loop=0.3 rho_min_main=0.5 while if you say
No those for quick loop fitting are: n_random_loop=20
rms_random_loop=0.3 rho_min_main=1.0
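As a sketch using the thoroughness keywords above (file names hypothetical), a quicker, less exhaustive rebuild could be requested with:
phenix.autobuild data=data.mtz model=coords.pdb seq_file=seq.dat \
  thorough_loop_fit=False n_cycle_rebuild_max=5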
Automated ligand fitting with LigandFit
Purpose of the LigandFit Wizard
How the LigandFit Wizard works
How to run the LigandFit Wizard
What the LigandFit wizard needs to run
Specifying which columns of data to use from input data files
Specific limitations and problems
List of all LigandFit keywords
Author(s)
● LigandFit Wizard: Tom Terwilliger
● PHENIX GUI and PDS Server: Nigel W. Moriarty
● RESOLVE: Tom Terwilliger
Purpose
Purpose of the LigandFit Wizard
The LigandFit Wizard carries out fitting of flexible ligands to electron density maps.
Usage
The LigandFit Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user.
See
Running a Wizard from a GUI, the command-line, or a script
for details of how to run a Wizard. The command-line version will be described here.
How the LigandFit Wizard works
The LigandFit wizard provides a command-line and graphical user interface allowing the user to identify a datafile containing crystallographic structure factor information, an optional PDB file with a partial model of the structure without the ligand, and a PDB file containing the ligand to be fit (in an allowed but arbitrary conformation).
The wizard checks the data files for consistency and then calls RESOLVE to carry out the fitting of the ligand into the electron-density map. The map used is normally a difference map, with F=FP-FC. It can also be an Fobs map (calculated from FP with phases PHIC from the input partial model), or an arbitrary map, calculated with FP PHI and FOM. If you supply an input partial model, then the region occupied by the partial model is flattened in the map used to fit the ligand, so that the ligand will normally not get placed in this region.
The ligand fitting is done by RESOLVE in a three-stage process. First, the largest contiguous region of density in the map not already occupied by the model is identified. The ligand will be placed in this density. (If desired, the location of the ligand can instead be defined by the user as near a certain residue or near specified coordinates.) Second, many possible placements of the largest rigid subfragments of the ligand are found within this region of high density. Third, each of these placements is taken as a starting point for fitting the remainder of the ligand. All these ligand fits are scored based on the fit to the density, and the best-fitting placement is written out.
The output of the wizard consists of a fitted ligand in PDB format and a summary of the quality of the fit.
Multiple copies of a ligand can be fit to a single map in an automated fashion using the LigandFit wizard as well.
How to run the LigandFit Wizard
Running the LigandFit Wizard is easy. For example, from the command-line you can type:
phenix.ligandfit data=datafile.mtz model=partial_model.pdb ligand=ligand.pdb
The LigandFit Wizard will carry out ligand fitting of the ligand in ligand.pdb based on the structure factor amplitudes in datafile.mtz, calculating phases based on partial_model.pdb. All rotatable bonds will be identified and allowed to take stereochemically reasonable orientations.
What the LigandFit wizard needs to run
The ligandfit wizard needs:
● (1) a datafile (w1.sca or data=w1.sca); this can be any format
● (2) a PDB file with your model without ligand (model=partial.pdb; optional if your datafile contains map coefficients)
● (3) a file with information about your ligand (ligand=side.pdb)
The ligand file can be a PDB file with 1 stereochemically acceptable conformation of your ligand. It can alternatively be a file containing a SMILES string, in which case the starting ligand conformation will be generated with the PHENIX elbow routine.
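For example (file name and its SMILES contents hypothetical), a ligand supplied as a SMILES string could be fit like this, with ELBOW generating the starting conformation:
phenix.ligandfit data=w1.sca model=partial.pdb ligand=atp.smiles \
  ligand_format=SMILES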
The command-line ligandfit interpreter will guess which file is your data file, but you have to tell it which file is the model and which is the ligand.
Specifying which columns of data to use from input data files
If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.
For example, if your data file data.mtz has columns FP SIGFP then you might specify:
data=data.mtz input_labels="FP SIGFP"
You can find out all the possible label strings in a data file that you might use by typing:
phenix.autosol display_labels=data.mtz # display all labels for data.mtz
You can specify many more parameters as well. See the list of keywords, defaults and descriptions for how to do this. Some of the most common parameters are:
data=w1.sca # data file
partial_model=coords.pdb # starting model without ligand
ligand=ligand.pdb # any stereochemically allowed conformation of your ligand
resolution=3 # dmin of 3 A
quick=False # specify if you want to look hard for a good conformation
ligand_cc_min=0.75 # quit if the CC of ligand to map is 0.75 or better
number_of_ligands=3 # find 3 copies of the ligand
n_group_search=3 # try 3 different fragments of the ligand in initial search
resolve_command="'ligand_start side.pdb'" # build ligand superimposing on side.pdb
Output files from LigandFit
When you run LigandFit the output files will be in a subdirectory with your run number:
LigandFit_run_1_/ # subdirectory with results
● A summary file listing the results of the run and the other files produced:
LigandFit_summary.dat # overall summary
● A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed):
LigandFit_Facts.dat # all Facts about the run
● A warnings file listing any warnings about the run:
LigandFit_warnings.dat # any warnings
● A PDB file with the fitted ligand (in this case the first copy of ligand number 1):
ligand_fit_1_1.pdb
● A log file with the fitting of the ligand:
ligand_1_1.log
● A log file with the fit of the ligand to the map:
ligand_cc_1_1.log
● Map coefficients for the map used for fitting:
resolve_map.mtz
Examples
Sample command-line inputs
● Standard run of ligandfit (generate map from model and data file):
phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb
● Build into a map from pre-determined coefficients:
phenix.ligandfit data=perfect.mtz \
lig_map_type=fo-fc_difference_map \
model=partial.pdb ligand=side.pdb
● Quick run of ligandfit:
phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb quick=True
● Run ligandfit on a series of ligands specified in ligand_list.dat:
phenix.ligandfit w1.sca model=partial.pdb \
ligand=ligand_list.dat file_or_file_list=file_with_list_of_files
Note that you have to specify file_or_file_list=file_with_list_of_files or else the Wizard will try to interpret the contents of ligand_list.dat as a SMILES string. Here the
"file_with_list_of_files" is a flag, not something you substitute with an actual file name. You use it just as listed above.
● Place ligand near residue 92 of chain "A" from partial.pdb:
phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb \
ligand_near_chain="A" ligand_near_res=92
● Use start.pdb as a template for some of the atoms in the ligand; build the remainder of the ligand, fixing the coordinates of the corresponding atoms:
phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb \
resolve_command="'ligand_start start.pdb'" # NOTE ' and " quotes necessary
Note that the formatting is slightly tricky and requires the two different quotation marks on either end of the command. This is an example of passing a specific keyword to RESOLVE.
Possible Problems
Specific limitations and problems
● The ligand to be searched for must have at least 3 atoms.
● The partial-model file must not have any atoms (other than waters, which are automatically removed) in the position where the ligand is to be built; if it does, you may wish to remove them before building the ligand.
● If a ring in the ligand can have more than one conformation (e.g., chair or boat conformation) then you need to do separate runs for each conformation of the ring (rings are taken as fixed units in LigandFit).
● LigandFit ignores insertion codes, so if you specify a residue with ligand_near_res, only the residue number is used.
● The size of the asymmetric unit in the SOLVE/RESOLVE portion of the LigandFit wizard is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36) versions. Larger-size versions can be obtained on request.
● The LigandFit Wizard can take most settings of most space groups; however, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting, due to difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.
Literature
Ligand identification using electron-density map correlations. T.C. Terwilliger, P.D. Adams, N.W. Moriarty and J.D. Cohn. Acta Cryst. D63, 101-107 (2007).
Automated ligand fitting by core-fragment fitting and extension into density. T.C. Terwilliger, H. Klei, P.D. Adams, N.W. Moriarty and J.D. Cohn. Acta Cryst. D62, 915-922 (2006).
Additional information
List of all LigandFit keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
ligandfit
data= None Datafile (alias for input_data_file). This can be any format if
only FP is to be read in. If phases are to be read in then MTZ format
is required. The Wizard will guess the column identification. If you
want to specify it you can say input_labels="FP" , or
input_labels="FP PHIB FOM". (Command-line only) http://phenix-online.org/documentation/ligandfit.htm (5 of 11) [12/14/08 1:01:15 PM]
117
Automated ligand fitting with LigandFit
ligand= None File containing information about the ligand (PDB or SMILES)
(alias for input_lig_file) (Command-line only)
model= None PDB file with model for everything but the ligand (alias for
input_partial_model_file). (Command-line only)
quick= False Run as quickly as possible. (Command-line only)
special_keywords
write_run_directory_to_file= None Writes the full name of a run
directory to the specified file. This can
be used as a call-back to tell a script
where the output is going to go.
(Command-line only)
run_control
coot= None Set coot to True and optionally run=[run-number] to run Coot
with the current model and map for run run-number. In some wizards
(AutoBuild) you can edit the model and give it back to PHENIX to
use as part of the model-building process. If you just say coot
then the facts for the highest-numbered existing run will be
shown. (Command-line only)
ignore_blanks= None ignore_blanks allows you to have a command-line
keyword with a blank value like "input_lig_file_list="
stop= None You can stop the current wizard with "stopwizard" or "stop".
If you type "phenix.autobuild run=3 stop" then this will stop run
3 of autobuild. (Command-line only)
display_facts= None Set display_facts to True and optionally
run=[run-number] to display the facts for run run-number.
If you just say display_facts then the facts for the
highest-numbered existing run will be shown.
(Command-line only)
display_summary= None Set display_summary to True and optionally
run=[run-number] to show the summary for run
run-number. If you just say display_summary then the
summary for the highest-numbered existing run will be
shown. (Command-line only)
carry_on= None Set carry_on to True to carry on with highest-numbered
run from where you left off. (Command-line only)
run= None Set run to n to continue with run n where you left off.
(Command-line only)
copy_run= None Set copy_run to n to copy run n to a new run and continue
where you left off. (Command-line only)
display_runs= None List all runs for this wizard. (Command-line only)
delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
display_labels= None display_labels=test.mtz will list all the labels
that identify data in test.mtz. You can use the label
strings that are produced in AutoSol to identify which
data to use from a datafile like this: peak.data="F+
SIGF+ F- SIGF-" # the entire string in quotes counts
here You can use the individual labels from these
strings as identifiers for data columns in AutoSol and
AutoBuild like this: input_refinement_labels="FP SIGFP
FreeR_flags" # each individual label counts
dry_run= False Just read in and check parameter names
params_only= False Just read in and return parameter defaults
display_all= False Just read in and display parameter defaults
crystal_info
cell= 0.0 0.0 0.0 0.0 0.0 0.0
Enter cell parameter a b c alpha beta
gamma
resolution= 0.0
        High-resolution limit. Used as resolution limit for
        density modification and as general default high-resolution
        limit. If resolution_build or refinement_resolution are set
then they override this for model-building or refinement. If
overall_resolution is set then data beyond that resolution
is ignored completely.
sg= None Space Group symbol (i.e., C2221 or C 2 2 21)
display
number_of_solutions_to_display= None Number of solutions to put on
screen and to write out
solution_to_display= 1 Solution number of the solution to display and
write out ( use 0 to let the wizard display the top
solution)
file_info
file_or_file_list= *single_file file_with_list_of_files Choose if you
want to input a single file with PDB or other
information about the ligand or if you want to input
a file containing a list of files with this
information for a list of ligands
input_labels= None Labels for input data columns NOTE: Applies to input
data file for LigandFit and AutoBuild, but not to AutoMR.
For AutoMR use instead 'input_label_string'.
lig_map_type= *fo-fc_difference_map fobs_map pre_calculated_map_coeffs
        Enter the type of map to use in ligand fitting.
        fo-fc_difference_map: Fo-Fc difference map phased on partial model
        fobs_map: Fo map phased on partial model
        pre_calculated_map_coeffs: map calculated from FP PHIB [FOM]
        coefficients in input data file
ligand_format= *PDB SMILES Enter whether the files contain SMILES
strings or PDB formatted information
general
background= True When you specify nproc=nn, you can run the jobs in
background (default if nproc is greater than 1) or
foreground (default if nproc=1). If you set
run_command=qsub (or otherwise submit to a batch queue),
then you should set background=False, so that the batch
queue can keep track of your runs. There is no need to use
background=True in this case because all the runs go as
controlled by your batch system. If you use run_command=csh
(or similar, csh is default) then normally you will use
background=True so that all the jobs run simultaneously.
base_path= None You can specify the base path for files (default is
current working directory)
clean_up= False At the end of the entire run the TEMP directories will
be removed if clean_up is True. The default is No, keep these
directories. If you want to remove them after your run is
finished use a command like "phenix.autobuild run=1
clean_up=True"
coot_name= coot If your version of coot is called something else, then
you can specify that here.
debug= False You can have the wizard stop with error messages about the
code if you use debug. NOTE: you cannot use Pause with debug.
extend_try_list= False You can fill out the list of parallel jobs to
match the number of jobs you want to run at one time,
as specified with nbatch.
extra_verbose= False Facts and possible commands will be printed every
cycle if Yes
i_ran_seed= 289564 Random seed (positive integer) for model-building
and simulated annealing refinement
ligand_id= None You can specify an integer value for the ID of a
ligand... This number will be added to whatever residue
number the ligand search model in input_lig_file has. The
keyword is only valid if a single copy of the ligand is to be
found.
max_wait_time= 100.0
You can specify the length of time (seconds) to
wait when testing the run_command. If you have a cluster
where jobs do not start right away you may need a longer
time to wait.
nbatch= 5 You can specify the number of processors to use (nproc) and
the number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors
available and leave nbatch alone. If you leave nbatch as None it
will be set automatically, with a value depending on the Wizard.
This is recommended. The value of nbatch can affect the results
that you get, as the jobs are not split into exact replicates,
but are rather run with different random numbers. If you want to
get the same results, keep the same value of nbatch.
nproc= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be
set automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are
rather run with different random numbers. If you want to get the
same results, keep the same value of nbatch.
resolve_command_list= None Commands for resolve. One per line in the
form: keyword value value can be optional
Examples: coarse_grid resolution 200 2.0 hklin
test.mtz NOTE: for command-line usage you need to
enclose the whole set of commands in double quotes
(") and each individual command in single quotes
(') like this: resolve_command_list="'no_build'
'b_overall 23' "
resolve_size= _giant _huge _extra_huge *None Size for solve/resolve
("","_giant","_huge","_extra_huge")
run_command= csh When you specify nproc=nn, you can run the subprocesses
as jobs in background with csh (default) or submit them to
a queue with the command of your choice (i.e., qsub ). If
you have a multi-processor machine, use csh. If you have a
cluster, use qsub or the equivalent command for your
system. NOTE: If you set run_command=qsub (or otherwise
submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this
case because all the runs go as controlled by your batch
system. If you use run_command=csh (or similar, csh is
default) then normally you will use background=True so that
all the jobs run simultaneously.
skip_xtriage= False You can bypass xtriage if you want. This will
prevent you from applying anisotropy corrections, however.
temp_dir= None Define a temporary directory (it must exist)
title= Run 1 LigandFit Sun Dec 7 17:46:24 2008 Enter any text you like
to help identify what you did in this run
top_output_dir= None This is used in subprocess calls of wizards and to
tell the Wizard where to look for the STOPWIZARD file.
verbose= False Command files and other verbose output will be printed
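As an illustration of the parallel-execution keywords above (values illustrative), a run using four processors as background jobs under csh might be:
phenix.ligandfit data=w1.sca model=partial.pdb ligand=side.pdb \
  nproc=4 run_command=csh background=True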
input_files
existing_ligand_file_list= None You can enter a list of files with
ligands you have already fit. These will be
used to exclude that region from consideration.
input_data_file= None Enter the file with input structure factor data
(files other than MTZ will be converted to mtz and
intensities to amplitudes)
input_lig_file= None Enter either a single file with PDB information or
a SMILES string or a file containing a list of files
with this information for a list of ligands. If you
enter a file containing a list of files you need also to
specify
"file_or_file_list=file_with_list_of_files".
If the format is not PDB, then ELBOW will generate a PDB
file.
input_ligand_compare_file= None If you enter a PDB file with a ligand in
it, the coordinates of the newly-built ligand
will be compared with the coordinates in this
file.
input_partial_model_file= None Enter a PDB file containing a model of
your structure without the ligand. This is
used to calculate phases. If you are providing
phases in your data file and have selected
"pre_calculated_map_coeffs" for map_type this
file may be left out.
non_user_parameters
get_lig_volume= False You can ask to get the volume of the ligand and
        then stop
offsets_list= 7 53 29 You can specify an offset for the orientation of
the helix and strand templates in building. This is used
in generating different starting models.
refinement
link_distance_cutoff= 3.0
You can specify the maximum bond distance for
linking residues in phenix.refine called from the
wizards.
r_free_flags_fraction= 0.1
Maximum fraction of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
the lower of the values defined by these two
parameters.
r_free_flags_lattice_symmetry_max_delta= 5.0
You can set the maximum
deviation of distances in the
lattice that are to be
considered the same for
purposes of generating a
lattice-symmetry-unique set of
free R flags.
r_free_flags_max_free= 2000 Maximum number of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
the lower of the values defined by these two
parameters.
r_free_flags_use_lattice_symmetry= True When generating r_free_flags you
can decide whether to include lattice
symmetry (good in general, necessary
if there is twinning).
search_parameters
conformers= 1 Enter how many conformers to create. If greater than 1,
then ELBOW will always be used to generate them. If 1 then
ELBOW will be used if a PDB file is not specified. These
conformers are used to identify allowed torsion angles for
your ligand. The alternative is to use the empirical rules
in RESOLVE. ELBOW takes longer but is more accurate.
delta_phi_ligand= 40.0
Specify the angle (degrees) between successive
tries in FFT search for fragments
fit_phi_inc= 20 Specify the angle (degrees) between rotations around
bonds
fit_phi_range= -180 180 Range of bond rotation angles to search
group_search= 0 Enter the ID number of the group from the ligand to use
to seed the search for conformations
ligand_cc_min= 0.75
Enter the minimum correlation coefficient of the
ligand to the map to quit searching for more
conformations
ligand_completeness_min= 1.0
Enter the minimum completeness of the
ligand to the map to quit searching for more
conformations
local_search= True If local_search is Yes, then only the region within
search_dist of the point in the map with the highest local
rmsd will be searched in the FFT search for fragments
n_group_search= 3 Enter the number of different fragments of the ligand
that will be looked for in FFT search of the map
n_indiv_tries_max= 10 If 0 is specified, all fragments are searched at
once otherwise all are first searched at once then
individually up to the number specified
n_indiv_tries_min= 5 If 0 is specified, all placements of a fragment are
tested at once otherwise all are first tested at once
then individually up to the number specified
number_of_ligands= 1 Number of copies of the ligand expected in the
asymmetric unit
search_dist= 10.0
If local_search is Yes, then only the region within
this distance of the point in the map with the highest
local rmsd will be searched in the FFT search for fragments
use_cc_local= False You can specify the use of a local correlation
coefficient for scoring ligand fits to the map. If you do
not do this, then the region over which the ligand is
scored are all points within 2.5 A of the atoms in the
ligand. If you do specify use_cc_local, then the region
over which the ligand is scored are all these points, plus
all the contiguous points that have density greater than
0.5 * sigma.
search_target
ligand_near_chain= None You can specify where to search for the ligand
either with search_center or with ligand_near_res and
ligand_near_chain. If you set
ligand_near_chain="None" or leave it blank or do not
set it, then all chains will be included. The
keywords ligand_near_res and ligand_near_chain refer
to residue/chain in the file defined by
input_partial_model_file (or model if running from
command line).
ligand_near_pdb= None You can specify where LigandFit should look for
your ligands by providing a PDB file containing one or
more copies of the ligand. If you want you can provide
a PDB file with ligand + macromolecule and specify the
ligand name with name_of_ligand_near_pdb.
ligand_near_res= None You can specify where to search for the ligand
either with search_center or with ligand_near_res and
ligand_near_chain. The keywords ligand_near_res and
ligand_near_chain refer to residue/chain in the file
defined by input_partial_model_file (or model if
running from command line).
name_of_ligand_near_pdb= None You can specify where LigandFit should
look for your ligands by providing a PDB file
containing one or more copies of the ligand. If
you want you can provide a PDB file with
ligand + macromolecule and specify the ligand
name with name_of_ligand_near_pdb.
search_center= 0.0 0.0 0.0
Enter coordinates for center of search region
(ignored if [0,0,0])
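Pulling together several of the search keywords above (values illustrative), a search for three copies of a ligand near a particular residue might look like:
phenix.ligandfit data=w1.sca model=partial.pdb ligand=side.pdb \
  number_of_ligands=3 ligand_cc_min=0.75 \
  ligand_near_res=92 ligand_near_chain="A"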
Data quality assessment with phenix.xtriage
Specific limitations and problems
Author(s)
● xtriage: Peter Zwart
● Phil command interpreter: Ralf W. Grosse-Kunstleve
Purpose
The xtriage method is a tool for analyzing structure factor data to identify outliers, the presence of twinning, and other conditions that the user should be aware of.
Usage
How xtriage works
Basic sanity checks performed by xtriage are:
● Wilson plot sanity
● Probabilistic Matthews analysis
● Data strength analysis
● Ice ring analysis
● Twinning analysis
● Reference analysis (determines possible re-indexing; optional)
● Detwinning and data massaging (optional)
See also: phenix.reflection_statistics
(comparison of multiple data sets)
Output files from xtriage
● (1) A log file that contains all the screen output plus some ccp4-style graphs
● (2) optional: an mtz file with massaged data
Xtriage keywords in detail
Scope: parameters.asu_contents
keys: * n_residues :: Number of residues per monomer/unit
      * n_bases :: Number of nucleotides per monomer/unit
      * n_copies_per_asu :: Number of copies in the ASU.
These keywords control the determination of the absolute scale. If the number of residues/bases is not specified, a solvent content of 50% is assumed.
Scope: parameters.misc_twin_parameters.missing_symmetry
keys: * tanh_location :: tanh decision rule parameter
      * tanh_slope :: tanh decision rule parameter
The tanh_location and tanh_slope parameters control what R-value is considered to be low enough for an operator to be considered a 'proper' symmetry operator. The tanh_location parameter corresponds to the inflection point of the approximate step function. Increasing tanh_location will result in larger R-value thresholds. tanh_slope is set to 50 and should be okay.
Scope: parameters.misc_twin_parameters.twinning_with_ncs
keys: * perform_test :: can be set to True or False
      * n_bins :: Number of bins in determination of D_ncs
perform_test is by default set to False. Setting it to True triggers the determination of the twin fraction while taking into account NCS parallel to the twin axis.
Scope: parameters.misc_twin_parameters.twin_test_cuts
keys: * high_resolution :: high resolution for twin tests
      * low_resolution :: low resolution for twin tests
      * isigi_cut :: I/sig(I) threshold in automatic determination of high resolution limit
      * completeness_cut :: completeness threshold in automatic determination of high resolution limit
The automatic determination of the resolution limit for the twinning test is based on the completeness after removing intensities for which I/sigI < isigi_cut. The lowest limit obtained in this way is 3.5A. The value determined by the automatic procedure can be overruled by specifying the high_resolution keyword. The low resolution is set to 10A by default.
Scope: parameters.reporting
keys: * verbose :: verbosity level
      * log :: log file name
      * ccp4_style_graphs :: Either True or False. Determines whether or not ccp4-style loggraph plots are written to the log file
Scope: xray_data
keys: * file_name :: file name with xray data
      * obs_labels :: labels for observed data if format is mtz or XPLOR/CNS
      * calc_labels :: optional; labels for calculated data
      * unit_cell :: overrides unit cell in reflection file (if present)
      * space_group :: overrides space group in reflection file (if present)
      * high_resolution :: High resolution limit of the data
      * low_resolution :: Low resolution limit of the data
Note that the matching of specified and present labels involves a sub-string matching algorithm.
Scope: optional
keys: * hklout :: output mtz file
      * twinning.action :: Whether to detwin the data
      * twinning.twin_law :: using this twin law (h,k,l or x,y,z notation)
      * twinning.fraction :: The detwinning fraction
      * b_value :: the resulting Wilson B value
The output mtz file contains anisotropy-corrected data, with suspected outliers removed. The data is scaled and has the specified Wilson B value. These options have an associated expert level of 10, and are not shown by default. Specifying the expert level on the command line as 'level=100' will show all available options.
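As a sketch of how these scopes can be set from the command line (file names hypothetical; the abbreviated forms used elsewhere in this documentation, such as residues=290, may also be accepted), fully qualified assignments would look like:
phenix.xtriage xray_data.file_name=data.sca \
  parameters.asu_contents.n_residues=290 \
  parameters.reporting.log=data.log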
Interpreting Xtriage output
Typing:
%phenix.xtriage some_data.sca residues=290 log=some_data.log
results in the following output (parts omitted).
Matthews analysis
First, a cell contents analysis is performed. Matthews coefficients, solvent content and solvent content probabilities are listed, and the most likely composition is guessed:
Matthews coefficient and Solvent content statistics
----------------------------------------------------------------
| Copies | Solvent content | Matthews Coef. | P(solvent cont.) |
|--------|-----------------|----------------|------------------|
| 1 | 0.705 | 4.171 | 0.241 |
| 2 | 0.411 | 2.085 | 0.750 |
| 3 | 0.116 | 1.390 | 0.009 |
----------------------------------------------------------------
| Best guess : 2 copies in the asu |
----------------------------------------------------------------
Data strength
Next, the strength of the data is gauged by determining the completeness of the data in resolution bins after applying several I/sigI cutoff values:
Completeness and data strength analysis
The following table lists the completeness in various resolution
ranges, after applying a I/sigI cut. Miller indices for which
individual I/sigI values are larger than the value specified in
the top row of the table, are retained, while other intensities
are discarded. The resulting completeness profiles are an indication
of the strength of the data.
----------------------------------------------------------------------------------------
| Res. Range | I/sigI>1 | I/sigI>2 | I/sigI>3 | I/sigI>5 | I/sigI>10 | I/sigI>15 |
----------------------------------------------------------------------------------------
| 19.87 - 7.98 | 96.4% | 95.3% | 94.5% | 93.6% | 91.7% | 89.3% |
| 7.98 - 6.40 | 99.2% | 98.2% | 97.1% | 95.5% | 90.9% | 84.7% |
| 6.40 - 5.61 | 97.8% | 95.4% | 93.3% | 87.1% | 76.6% | 66.8% |
| 5.61 - 5.11 | 98.2% | 95.9% | 94.0% | 87.9% | 74.1% | 58.0% |
| 5.11 - 4.75 | 97.9% | 96.2% | 94.5% | 91.1% | 79.2% | 62.5% |
| 4.75 - 4.47 | 97.4% | 95.4% | 93.1% | 88.9% | 76.6% | 56.9% |
| 4.47 - 4.25 | 96.5% | 94.5% | 92.1% | 88.0% | 75.3% | 56.5% |
| 4.25 - 4.07 | 96.6% | 94.0% | 91.2% | 85.4% | 69.3% | 44.9% |
| 4.07 - 3.91 | 95.6% | 92.1% | 87.8% | 80.1% | 61.9% | 34.8% |
| 3.91 - 3.78 | 94.3% | 89.6% | 83.7% | 71.1% | 48.7% | 20.5% |
| 3.78 - 3.66 | 95.7% | 90.9% | 85.6% | 71.5% | 42.4% | 14.8% |
| 3.66 - 3.56 | 91.6% | 85.0% | 78.0% | 63.3% | 34.1% | 9.5% |
| 3.56 - 3.46 | 89.8% | 80.4% | 70.2% | 52.8% | 22.2% | 3.8% |
| 3.46 - 3.38 | 87.4% | 76.3% | 64.6% | 46.7% | 15.5% | 1.7% |
----------------------------------------------------------------------------------------
This analysis is also used in the automatic determination of the high resolution limit used in the intensity statistics and twin analyses.
Absolute, likelihood-based Wilson scaling
The (anisotropic) B value of the data is determined using a likelihood-based approach. The resulting B value/tensor is reported:
Maximum likelihood isotropic Wilson scaling
ML estimate of overall B value of sec17.sca:i_obs,sigma:
75.85 A**(-2)
Estimated -log of scale factor of sec17.sca:i_obs,sigma:
-2.50
Maximum likelihood anisotropic Wilson scaling
ML estimate of overall B_cart value of sec17.sca:i_obs,sigma:
68.92, 0.00, 0.00
68.92, 0.00
91.87
Equivalent representation as U_cif:
0.87, -0.00, -0.00
0.87, 0.00
1.16
ML estimate of -log of scale factor of sec17.sca:i_obs,sigma:
-2.50
Correcting for anisotropy in the data
A large spread in (especially the diagonal) values indicates anisotropy. The anisotropy is corrected for; this clears up the intensity statistics.
Low resolution completeness analysis
Most data processing software does not provide a clear picture of the completeness of the data at low resolution. For this reason, xtriage lists the completeness of the data up to 5 Angstrom:
Low resolution completeness analysis
The following table shows the completeness
of the data to 5 Angstrom.
unused: - 19.8702 [ 0/68 ] 0.000
bin 1: 19.8702 - 10.3027 [425/455] 0.934
bin 2: 10.3027 - 8.3766 [443/446] 0.993
bin 3: 8.3766 - 7.3796 [446/447] 0.998
bin 4: 7.3796 - 6.7336 [447/449] 0.996
bin 5: 6.7336 - 6.2673 [450/454] 0.991
bin 6: 6.2673 - 5.9080 [428/429] 0.998
bin 7: 5.9080 - 5.6192 [459/466] 0.985
bin 8: 5.6192 - 5.3796 [446/450] 0.991
bin 9: 5.3796 - 5.1763 [437/440] 0.993
bin 10: 5.1763 - 5.0006 [460/462] 0.996
unused: 5.0006 - [ 0/0 ]
This analysis allows one to quickly see if there is any unusually low completeness at low resolution, for instance due to missing overloads.

Wilson plot analysis

A Wilson plot analysis a la ARP/wARP is carried out, albeit with a slightly different standard curve:
Mean intensity analysis
Analysis of the mean intensity.
Inspired by: Morris et al. (2004). J. Synch. Rad.11, 56-59.
The following resolution shells are worrisome:
------------------------------------------------
| d_spacing | z_score | compl. | <Iobs>/<Iexp> |
------------------------------------------------
| 5.773 | 7.95 | 0.99 | 0.658 |
| 5.423 | 8.62 | 0.99 | 0.654 |
| 5.130 | 6.31 | 0.99 | 0.744 |
| 4.879 | 5.36 | 0.99 | 0.775 |
| 4.662 | 4.52 | 0.99 | 0.803 |
| 3.676 | 5.45 | 0.99 | 1.248 |
------------------------------------------------
Possible reasons for the presence of the reported
unexpected low or elevated mean intensity in
a given resolution bin are :
- missing overloaded or weak reflections
- suboptimal data processing
- satellite (ice) crystals
- NCS
- translational pseudo symmetry (detected elsewhere)
- outliers (detected elsewhere)
- ice rings (detected elsewhere)
- other problems
Note that the presence of abnormalities
in a certain region of reciprocal space might
confuse the data validation algorithm throughout
a large region of reciprocal space, even though
the data is acceptable in those areas.
A very long list of warnings could indicate a serious problem with your data. Deciding whether the data are useful, should be cut, or should be thrown away altogether is not straightforward and falls beyond the scope of xtriage.

Outlier detection and rejection

Possible outliers are detected on the basis of Wilson statistics:
Possible outliers
Inspired by: Read, Acta Cryst. (1999). D55, 1759-1764
Acentric reflections:
-----------------------------------------------------------------
| d_space | H K L | |E| | p(wilson) | p(extreme) |
-----------------------------------------------------------------
| 3.716 | 8, 6, 31 | 3.52 | 4.06e-06 | 5.87e-02 |
-----------------------------------------------------------------
p(wilson)  : 1-(1-exp[-|E|^2])
p(extreme) : 1-(1-exp[-|E|^2])^(n_acentrics)
p(wilson) is the probability that an |E| value of the specified size would be observed when selected at random from the given data set.
p(extreme) is the probability that the largest |E| value is larger than or equal to the observed largest |E| value.
Both measures can be used for outlier detection. p(extreme) takes into account the size of the data set.
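A minimal sketch of these two probabilities, using the formulas quoted above (the data-set size below is back-calculated from the example table and purely illustrative):

import math

def p_wilson(e):
    # probability of drawing an acentric |E| this large at random
    return math.exp(-e * e)  # equals 1-(1-exp[-|E|^2])

def p_extreme(e, n_acentrics):
    # probability that the largest of n_acentrics |E| values is >= e
    return 1.0 - (1.0 - p_wilson(e)) ** n_acentrics

# the outlier in the table above: |E| = 3.52, ~15000 acentrics (assumed)
print("p(wilson)  = %.2e" % p_wilson(3.52))          # ~4e-06
print("p(extreme) = %.2e" % p_extreme(3.52, 15000))  # ~6e-02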
Outliers are removed from the data set in the further analysis. Note that if pseudo-translational symmetry is present, a large number of 'outliers' will be present.

Ice ring detection

Ice rings in the data are detected by analyzing the completeness and the mean intensity:
Ice ring related problems
The following statistics were obtained from ice-ring
insensitive resolution ranges
mean bin z_score : 3.47
( rms deviation : 2.83 )
mean bin completeness : 0.99
( rms deviation : 0.00 )
The following table shows the z-scores
and completeness in ice-ring sensitive areas.
Large z-scores and high completeness in these
resolution ranges might be a reason to re-assess
your data processing if ice rings were present.
------------------------------------------------
| d_spacing | z_score | compl. | Rel. Ice int. |
------------------------------------------------
| 3.897 | 0.12 | 0.97 | 1.000 |
| 3.669 | 0.96 | 0.95 | 0.750 |
| 3.441 | 2.14 | 0.94 | 0.530 |
------------------------------------------------
Abnormalities in mean intensity or completeness at
resolution ranges with a relative ice ring intensity
lower than 0.10 will be ignored.
At 3.67 A there is a lower occupancy
than expected from the rest of the data set.
Even though the completeness is lower than expected,
the mean intensity is still reasonable at this resolution.
At 3.44 A there is a lower occupancy
than expected from the rest of the data set.
Even though the completeness is lower than expected,
the mean intensity is still reasonable at this resolution.
There were 2 ice ring related warnings
This could indicate the presence of ice rings.
Anomalous signal

If the input reflection file contains separate intensities for each Friedel mate, a quality measure of the anomalous signal is reported:
Analysis of anomalous differences
Table of measurability as a function of resolution
The measurability is defined as the fraction of
Bijvoet related intensity differences for which
|delta_I|/sigma_delta_I > 3.0
min[I(+)/sigma_I(+), I(-)/sigma_I(-)] > 3.0
holds.
The measurability provides an intuitive feeling
of the quality of the data, as it is related to the
number of reliable Bijvoet differences.
When the data is processed properly and the standard
deviations have been estimated accurately, values larger
than 0.05 are encouraging.
unused: - 19.8704 [ 0/68 ]
bin 1: 19.8704 - 7.0211 [1551/1585] 0.1924
bin 2: 7.0211 - 5.6142 [1560/1575] 0.0814
bin 3: 5.6142 - 4.9168 [1546/1555] 0.0261
bin 4: 4.9168 - 4.4729 [1563/1582] 0.0081
bin 5: 4.4729 - 4.1554 [1557/1577] 0.0095
bin 6: 4.1554 - 3.9124 [1531/1570] 0.0083
bin 7: 3.9124 - 3.7178 [1541/1585] 0.0069
bin 8: 3.7178 - 3.5569 [1509/1552] 0.0028
bin 9: 3.5569 - 3.4207 [1522/1606] 0.0085
bin 10: 3.4207 - 3.3032 [1492/1574] 0.0044
unused: 3.3032 - [ 0/0 ]
The anomalous signal seems to extend to about 5.9 A
(or to 5.2 A, from a more optimistic point of view)
The quoted resolution limits can be used as a guideline
to decide where to cut the resolution for phenix.hyss
As the anomalous signal is not very strong in this data set
substructure solution via SAD might prove to be a challenge.
Especially if only low resolution reflections are used,
the resulting substructures could contain a significant amount
of false positives.
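A minimal sketch of the measurability as defined above; the error propagation for the Bijvoet difference is an assumption made for illustration, not necessarily what xtriage does internally:

def measurability(pairs, cut=3.0):
    # pairs: iterable of (i_plus, sig_plus, i_minus, sig_minus)
    n_meas, n_total = 0, 0
    for ip, sp, im, sm in pairs:
        n_total += 1
        delta_i = abs(ip - im)
        sig_delta = (sp * sp + sm * sm) ** 0.5  # assumed propagation
        if delta_i / sig_delta > cut and min(ip / sp, im / sm) > cut:
            n_meas += 1
    return n_meas / float(n_total)

# toy Bijvoet pairs: only the first one counts as measurable
pairs = [(120.0, 8.0, 80.0, 8.0), (50.0, 9.0, 48.0, 9.0)]
print("%.2f" % measurability(pairs))  # 0.50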
Determination of twin laws

Twin laws are found using a modified Le Page algorithm and classified as merohedral or pseudo-merohedral:
Determining possible twin laws.
The following twin laws have been found:
-------------------------------------------------------------------------------
| Type | Axis   | R metric (%) | delta (le Page) | delta (Lebedev) | Twin law |
-------------------------------------------------------------------------------
| M    | 2-fold | 0.000        | 0.000           | 0.000           | -h,k,-l  |
-------------------------------------------------------------------------------
M: Merohedral twin law
PM: Pseudomerohedral twin law
1 merohedral twin operators found
0 pseudo-merohedral twin operators found
In total, 1 twin operator were found
Non-merohedral (reticular) twinning is not considered. The R metric is equal to:

Sum (M_i - N_i)^2 / Sum M_i^2

where M_i are elements of the original metric tensor and N_i are elements of the metric tensor after 'idealizing' the unit cell, in compliance with the restrictions the twin law would pose on the lattice if it were a true symmetry operator.
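A minimal sketch of this R metric; the six-element representation of the symmetric metric tensor and the example cells are illustrative assumptions, not the xtriage implementation:

def r_metric(m, n):
    # m, n: the six independent elements (g11, g22, g33, g12, g13, g23)
    # of the original (m) and idealized (n) metric tensors
    num = sum((mi - ni) ** 2 for mi, ni in zip(m, n))
    den = sum(mi * mi for mi in m)
    return num / den

# an orthorhombic cell that is metrically almost tetragonal (a ~ b)
m_obs = (64.50 ** 2, 64.60 ** 2, 45.50 ** 2, 0.0, 0.0, 0.0)
m_ideal = (64.55 ** 2, 64.55 ** 2, 45.50 ** 2, 0.0, 0.0, 0.0)
print("%.2e" % r_metric(m_obs, m_ideal))  # small value: lattice fits the law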
The delta le Page is the familiar obliquity. The delta Lebedev is a twin-law quality measure developed by A. Lebedev (Lebedev, Vagin & Murshudov, Acta Cryst. (2006), D62, 83-95). Note that for merohedral twin laws all quality indicators are 0; for non-merohedral twin laws they are greater than or equal to zero. If a twin law is classified as non-merohedral but has a delta le Page equal to zero, the twin law is sometimes referred to as a metric merohedral twin law.

Locating translational pseudo symmetry (TPS)

TPS is located by inspecting a low resolution Patterson function. Peaks and their significance levels are reported:
Largest Patterson peak with length larger than 15 Angstrom
Frac. coord. : 0.027 0.057 0.345
Distance to origin : 17.444
Height (origin=100) : 3.886
p_value(height) : 9.982e-01
The reported p_value has the following meaning:
The probability that a peak of the specified height
or larger is found in a Patterson function of a
macro molecule that does not have any translational
pseudo symmetry is equal to 9.982e-01
p_values smaller than 0.05 might indicate
weak translational pseudo symmetry, or the self vector of
a large anomalous scatterer such as Hg, whereas values
smaller than 1e-3 are a very strong indication for
the presence of translational pseudo symmetry.
Moments of the observed intensities

The moments of the observed intensity/amplitude distribution are reported, together with their expected values:
Wilson ratio and moments
Acentric reflections
<I^2>/<I>^2 :1.955 (untwinned: 2.000; perfect twin 1.500)
<F>^2/<F^2> :0.796 (untwinned: 0.785; perfect twin 0.885)
<|E^2 - 1|> :0.725 (untwinned: 0.736; perfect twin 0.541)
Centric reflections
<I^2>/<I>^2 :2.554 (untwinned: 3.000; perfect twin 2.000)
<F>^2/<F^2> :0.700 (untwinned: 0.637; perfect twin 0.785)
<|E^2 - 1|> :0.896 (untwinned: 0.968; perfect twin 0.736)
Significant departure from the ideal values could indicate the presence of twinning or pseudo-translation. For instance, an <I^2>/<I>^2 value significantly lower than 2.0 might point to twinning, whereas a value significantly larger than 2.0 might point towards pseudo-translational symmetry.
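A minimal sketch of the first two acentric statistics quoted above; resolution-shell-wise normalization and expected-intensity (epsilon) factors are ignored here, which is a simplifying assumption:

def wilson_ratios(intensities):
    n = float(len(intensities))
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    # <I^2>/<I>^2: 2.0 untwinned, 1.5 perfect twin
    second_moment = mean_i2 / (mean_i * mean_i)
    # <|E^2 - 1|> with E^2 = I/<I>: 0.736 untwinned, 0.541 perfect twin
    mean_abs_e2m1 = sum(abs(i / mean_i - 1.0) for i in intensities) / n
    return second_moment, mean_abs_e2m1

# toy usage with exponential (Wilson-like) acentric intensities:
import random
random.seed(0)
toy = [random.expovariate(1.0) for _ in range(10000)]
print("%.3f %.3f" % wilson_ratios(toy))  # close to 2.000 and 0.736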
Cumulative intensity distribution

The cumulative intensity distribution is reported:
-----------------------------------------------
| Z | Nac_obs | Nac_theo | Nc_obs | Nc_theo |
-----------------------------------------------
| 0.0 | 0.000 | 0.000 | 0.000 | 0.000 |
| 0.1 | 0.081 | 0.095 | 0.168 | 0.248 |
| 0.2 | 0.167 | 0.181 | 0.292 | 0.345 |
| 0.3 | 0.247 | 0.259 | 0.354 | 0.419 |
| 0.4 | 0.321 | 0.330 | 0.420 | 0.474 |
| 0.5 | 0.392 | 0.394 | 0.473 | 0.520 |
| 0.6 | 0.452 | 0.451 | 0.521 | 0.561 |
| 0.7 | 0.506 | 0.503 | 0.570 | 0.597 |
| 0.8 | 0.552 | 0.551 | 0.603 | 0.629 |
| 0.9 | 0.593 | 0.593 | 0.636 | 0.657 |
| 1.0 | 0.635 | 0.632 | 0.673 | 0.683 |
-----------------------------------------------
| Maximum deviation acentric : 0.015 |
| Maximum deviation centric : 0.080 |
| |
| <NZ(obs)-NZ(twinned)>_acentric : -0.004 |
| <NZ(obs)-NZ(twinned)>_centric : -0.039 |
-----------------------------------------------
The N(Z) test is related to the moments-based test discussed above. Nac_obs is the observed cumulative distribution of normalized intensities of the acentric data, and uses the full distribution rather than just a moment. The effect of twinning shows itself in Nac_obs having a more sigmoidal character. In the case of pseudo-centering, Nac_obs will tend towards Nc_theo.

The L test

The L-test is an intensity statistic developed by Padilla and Yeates (Acta Cryst. (2003), D59, 1124-1130) and is reasonably robust in the presence of anisotropy and pseudo-centering, especially if the Miller indices are partitioned properly. Partitioning is carried out on the basis of a Patterson analysis. A significant deviation of both <|L|> and <L^2> from the expected values indicates twinning or other problems:
L test for acentric data
using difference vectors (dh,dk,dl) of the form:
(2hp,2kp,2lp)
where hp, kp, and lp are random signed integers such that
2 <= |dh| + |dk| + |dl| <= 8
Mean |L| :0.482 (untwinned: 0.500; perfect twin: 0.375)
Mean L^2 :0.314 (untwinned: 0.333; perfect twin: 0.200)
The distribution of |L| values indicates a twin fraction of
0.00. Note that this estimate is not as reliable as obtained
via a Britton plot or H-test if twin laws are available.
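The L statistic itself is simple to compute. A minimal sketch (an illustration, not the xtriage source; pairing of intensities via the difference vectors described above is left to the caller):

def l_statistics(pairs):
    # pairs: (I1, I2) intensity pairs of reflections related by the
    # difference vectors described above; L = (I1 - I2) / (I1 + I2)
    l_values = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    n = float(len(l_values))
    mean_abs_l = sum(l_values) / n                # 0.500 untwinned, 0.375 twin
    mean_l_sq = sum(l * l for l in l_values) / n  # 0.333 untwinned, 0.200 twin
    return mean_abs_l, mean_l_sq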
Whether or not <|L|> and <L^2> differ significantly from the expected values is shown in the final summary (see below).

Analysis of twin laws

Twin-law-specific tests (Britton, H-test and R vs R) are performed:
Results of the H-test on a-centric data:
(Only 50.0% of the strongest twin pairs were used)
mean |H| : 0.183 (0.50: untwinned; 0.0: 50% twinned)
mean H^2 : 0.055 (0.33: untwinned; 0.0: 50% twinned)
Estimation of twin fraction via mean |H|: 0.317
Estimation of twin fraction via cum. dist. of H: 0.308
Britton analysis
Extrapolation performed on 0.34 < alpha < 0.495
Estimated twin fraction: 0.283
Correlation: 0.9951
R vs R statistic:
R_abs_twin = <|I1-I2|>/<|I1+I2|>
Lebedev, Vagin, Murshudov. Acta Cryst. (2006). D62, 83-95
R_abs_twin observed data : 0.193
R_abs_twin calculated data : 0.328
R_sq_twin = <(I1-I2)^2>/<(I1+I2)^2>
R_sq_twin observed data : 0.044
R_sq_twin calculated data : 0.120
Maximum Likelihood twin fraction determination
Zwart, Read, Grosse-Kunstleve & Adams, to be published.
The estimated twin fraction is equal to 0.227
These tests allow one to estimate the twin fraction and (if calculated data is provided) determine whether rotational pseudo-symmetry is present. Another option (albeit more computationally expensive) is to estimate the correlation between error-free, untwinned, twin-related normalized intensities (use the key perform=True on the command line).
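For the H-test above, the twin fraction follows from the mean |H| via the standard acentric relation <|H|> = (1 - 2*alpha)/2. A minimal sketch (an illustration, not the xtriage source; pairing of twin-related intensities is assumed already done):

def twin_fraction_from_h(pairs):
    # pairs: (I1, I2) intensities of twin-law-related reflections;
    # H = |I1 - I2| / (I1 + I2), and <|H|> = (1 - 2*alpha) / 2
    h = [abs(i1 - i2) / (i1 + i2) for i1, i2 in pairs if i1 + i2 > 0]
    mean_h = sum(h) / float(len(h))
    return 0.5 - mean_h  # alpha

# e.g. <|H|> = 0.183, as in the log above, gives alpha = 0.317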
Estimation of twin fraction, while taking into account the effects of possible NCS parallel to the twin axis.
Zwart, Read, Grosse-Kunstleve & Adams, to be published.
A parameters D_ncs will be estimated as a function of resolution,
together with a global twin fraction.
D_ncs is an estimate of the correlation coefficient between
untwinned, error-free, twin related, normalized intensities.
Large values (0.95) could indicate an incorrect point group.
Value of D_ncs larger than say, 0.5, could indicate the presence
of NCS. The twin fraction should be smaller or similar to other
estimates given elsewhere.
The refinement can take some time.
For numerical stability issues, D_ncs is limited between 0 and 0.95.
The twin fraction is allowed to vary between 0 and 0.45.
Refinement cycle numbers are printed out to keep you entertained.
. . . . 5 . . . . 10 . . . . 15 . . . . 20 . . . . 25 . . . . 30
. . . . 35 . . . . 40 . . . . 45 . . . . 50 . . . . 55 . . . . 60
. . . . 65 . . . . 70 . . . . 75 . . .
Cycle : 78
-----------
Log[likelihood]: 22853.700
twin fraction: 0.201
D_ncs in resolution ranges:
9.8232 -- 4.5978 :: 0.830
4.5978 -- 3.7139 :: 0.775
3.7139 -- 3.2641 :: 0.745
3.2641 -- 2.9747 :: 0.746
2.9747 -- 2.7666 :: 0.705
2.7666 -- 2.6068 :: 0.754
2.6068 -- 2.4784 :: 0.735
The correlation of the calculated F^2 should be similar to
the estimated values.
Observed correlation between twin related, untwinned calculated F^2
in resolution ranges, as well as estimates D_ncs^2 values:
Bin d_max d_min CC_obs D_ncs^2
1) 9.8232 -- 4.5978 :: 0.661 0.689
2) 4.5978 -- 3.7139 :: 0.544 0.601
3) 3.7139 -- 3.2641 :: 0.650 0.556
4) 3.2641 -- 2.9747 :: 0.466 0.557
5) 2.9747 -- 2.7666 :: 0.426 0.497
6) 2.7666 -- 2.6068 :: 0.558 0.569
7) 2.6068 -- 2.4784 :: 0.531 0.540
The twin fraction obtained via this method is usually lower than what is obtained by refinement. The estimated correlation coefficient (D_ncs^2) between the twin-related F^2 values is, however, reasonably accurate.
Exploring higher metric symmetry

The fact that a twin law is present could indicate that the data was incorrectly processed as well. The example below shows a P41212 data set processed in P1:
Exploring higher metric symmetry
Point group of data as dictated by the space group is P 1
the point group in the Niggli setting is P 1
The point group of the lattice is P 4 2 2
A summary of R values for various possible point groups follow.
-----------------------------------------------------------------------------------------------
| Point group | mean R_used | max R_used | mean R_unused | min R_unused | choice |
-----------------------------------------------------------------------------------------------
| P 1 | None | None | 0.022 | 0.017 | |
| P 4 2 2 | 0.022 | 0.025 | None | None | <--- |
| P 1 2 1 | 0.017 | 0.017 | 0.026 | 0.024 | |
| Hall: C 2y (x-y,x+y,z) | 0.025 | 0.025 | 0.022 | 0.017 | |
| P 4 | 0.025 | 0.028 | 0.025 | 0.025 | |
| Hall: C 2 2 (x-y,x+y,z) | 0.024 | 0.025 | 0.017 | 0.017 | |
| Hall: C 2y (x+y,-x+y,z) | 0.024 | 0.024 | 0.023 | 0.017 | |
| P 1 1 2 | 0.028 | 0.028 | 0.021 | 0.017 | |
| P 2 1 1 | 0.027 | 0.027 | 0.022 | 0.017 | |
| P 2 2 2 | 0.023 | 0.028 | 0.025 | 0.025 | |
-----------------------------------------------------------------------------------------------
R_used: mean and maximum R value for symmetry operators *used* in this point group
R_unused: mean and minimum R value for symmetry operators *not used* in this point group
The likely point group of the data is: P 4 2 2
As in phenix.explore_metric_symmetry, the possible space groups are listed as well (not shown here).

Twin analysis summary

The results of the twin analysis are summarized. Typical outputs look as follows for cases of wrong symmetry, twin laws present but no suspected twinning, and twinned data, respectively. Wrong symmetry:
-------------------------------------------------------------------------------
Twinning and intensity statistics summary (acentric data):
Statistics independent of twin laws
- <I^2>/<I>^2 : 2.104
- <F>^2/<F^2> : 0.770
- <|E^2-1|> : 0.757
- <|L|>, <L^2>: 0.512, 0.349
Multivariate Z score L-test: 2.777
The multivariate Z score is a quality measure of the given
spread in intensities. Good to reasonable data is expected
to have a Z score lower than 3.5.
Large values can indicate twinning, but small values do not
necessarily exclude it.
Statistics depending on twin laws
------------------------------------------------------
| Operator | type | R obs. | Britton alpha | H alpha |
------------------------------------------------------
| k,h,-l | PM | 0.025 | 0.458 | 0.478 |
| -h,k,-l | PM | 0.017 | 0.459 | 0.487 |
| -k,h,l | PM | 0.024 | 0.458 | 0.478 |
| -k,-h,-l | PM | 0.024 | 0.458 | 0.478 |
| -h,-k,l | PM | 0.028 | 0.458 | 0.476 |
| h,-k,-l | PM | 0.027 | 0.458 | 0.477 |
| k,-h,l | PM | 0.024 | 0.457 | 0.478 |
------------------------------------------------------
Patterson analysis
- Largest peak height : 6.089
(corresponding p value : 6.921e-01)
The largest off-origin peak in the Patterson function is 6.09% of the height of the origin peak. No significant pseudo-translation is detected.
The results of the L-test indicate that the intensity statistics behave as expected. No twinning is suspected.
The symmetry of the lattice and intensity however suggests that the input space group is too low. See the relevant sections of the log file for more details on your choice of space groups.
As the symmetry is suspected to be incorrect, it is advisable to reconsider data processing.
-------------------------------------------------------------------------------
Twin laws present but no suspected twinning:
-------------------------------------------------------------------------------
Twinning and intensity statistics summary (acentric data):
Statistics independent of twin laws
- <I^2>/<I>^2 : 1.955
- <F>^2/<F^2> : 0.796
- <|E^2-1|> : 0.725
- <|L|>, <L^2>: 0.482, 0.314
Multivariate Z score L-test: 1.225
The multivariate Z score is a quality measure of the given
spread in intensities. Good to reasonable data is expected
to have a Z score lower than 3.5.
Large values can indicate twinning, but small values do not
necessarily exclude it.
Statistics depending on twin laws
------------------------------------------------------
| Operator | type | R obs. | Britton alpha | H alpha |
------------------------------------------------------
| -h,k,-l | M | 0.455 | 0.016 | 0.035 |
------------------------------------------------------
Patterson analysis
- Largest peak height : 3.886
(corresponding p value : 9.982e-01)
The largest off-origin peak in the Patterson function is 3.89% of the height of the origin peak. No significant pseudo-translation is detected.
The results of the L-test indicate that the intensity statistics behave as expected. No twinning is suspected.
Even though no twinning is suspected, it might be worthwhile carrying out a refinement using a dedicated twin target anyway, as twinned structures with low twin fractions are difficult to distinguish from non-twinned structures.
-------------------------------------------------------------------------------
Twinned data:
-------------------------------------------------------------------------------
Twinning and intensity statistics summary (acentric data):
Statistics independent of twin laws
- <I^2>/<I>^2 : 1.587
- <F>^2/<F^2> : 0.871
- <|E^2-1|> : 0.568
- <|L|>, <L^2>: 0.387, 0.212
Multivariate Z score L-test: 11.589
The multivariate Z score is a quality measure of the given
spread in intensities. Good to reasonable data is expected
to have a Z score lower than 3.5.
Large values can indicate twinning, but small values do not
necessarily exclude it.
Statistics depending on twin laws
------------------------------------------------------
| Operator | type | R obs. | Britton alpha | H alpha |
------------------------------------------------------
| -l,-k,-h | PM | 0.170 | 0.330 | 0.325 |
------------------------------------------------------
Patterson analysis
- Largest peak height : 7.300
(corresponding p value : 4.454e-01)
The largest off-origin peak in the Patterson function is 7.30% of the
height of the origin peak. No significant pseudo-translation is detected.
The results of the L-test indicate that the intensity statistics are significantly different than is expected from good to reasonable, untwinned data.
As there are twin laws possible given the crystal symmetry, twinning could be the reason for the departure of the intensity statistics from normality.
It might be worthwhile carrying out refinement with a twin-specific target function.
-------------------------------------------------------------------------------
In the summary, the significance of the departure of the L-test values from normality is stated. The multivariate Z-score (also known as the Mahalanobis distance) is used for this purpose.
Examples
Standard run of xtriage
Running xtriage is easy. From the command-line you can type: phenix.xtriage data.sca
When an MTZ or CNS file is used, labels have to be specified: phenix.xtriage file=my_brilliant_data.mtz obs_labels='F(+),SIGF(+),F(-),SIGF(-)'
In order to perform a Matthews analysis, it might be useful to specify the number of residues/nucleotides in the crystallized macro molecule: phenix.xtriage data.sca n_residues=230 n_bases=25
By default, the screen output plus additional CCP4-style graphs (viewable with the CCP4 program loggraph) are echoed to a file named logfile.log. The command line arguments and all other default settings are summarized in a PHIL parameter data block given at the beginning of the logfile / screen output:

scaling.input {
parameters {
asu_contents {
n_residues = None
n_bases = None
n_copies_per_asu = None
}
misc_twin_parameters {
missing_symmetry {
tanh_location = 0.08
tanh_slope = 50
}
twinning_with_ncs {
perform_analysis = False
n_bins = 7
}
twin_test_cuts {
low_resolution = 10
high_resolution = None
isigi_cut = 3
completeness_cut = 0.85
}
}
reporting {
verbose = 1
log = "logfile.log"
ccp4_style_graphs = True
}
}
xray_data {
file_name = "some_data.sca" http://phenix-online.org/documentation/xtriage.htm (13 of 15) [12/14/08 1:01:22 PM]
136
Data quality assessment with phenix.xtriage
obs_labels = None
calc_labels = None
unit_cell = 64.5 69.5 45.5 90 104.3 90
space_group = "P 1 21 1"
high_resolution = None
low_resolution = None
}
}
The defaults are good for most applications.
Possible Problems
Specific limitations and problems
●
Xtriage doesn't deal with data in centric space groups
Literature
●
CCP4 newsletter No. 42, Summer 2005: Characterization of X-ray data sets
●
CCP4 newsletter No. 43, Winter 2005: Xtriage and Fest: automatic assessment of X-ray data and substructure structure factor estimation
Additional information
List of all xtriage keywords
-------------------------------------------------------------------------------
Legend:
black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
scaling
input
expert_level= 1 Expert level
asu_contents
Defines the ASU contents
n_residues= None Number of residues in structural unit
n_bases= None Number of nucleotides in structural unit
n_copies_per_asu= None Number of copies per ASU. If not specified,
Matthews analyses is performed
xray_data
Defines xray data
file_name= None File name with data
obs_labels= None Labels for observed data
calc_labels= None Lables for calculated data
unit_cell= None Unit cell parameters
space_group= None space group
high_resolution= None High resolution limit
low_resolution= None Low resolution limit
reference
A reference data set. For the investigation of possible
reindexing options
data
Defines an x-ray dataset
file_name= None File name
labels= None Labels
unit_cell= None Unit cell parameters"
space_group= None Space group
structure
file_name= None Filename of reference PDB file
parameters
Basic settings
reporting
Some output issues
verbose= 1 Verbosity
log= logfile.log
Logfile
ccp4_style_graphs= True SHall we include ccp4 style graphs?
misc_twin_parameters
Various settings for twinning or symmetry tests
missing_symmetry
Settings for missing symmetry tests
sigma_inflation= 1.25
Standard deviations of intensities can be
increased to make point group determination
more reliable.
twinning_with_ncs
Analysing the possibility of an NCS operator
parallel to a twin law.
perform_analyses= False Determines whether or not this analyses
is carried out.
n_bins= 7 Number of bins used in NCS analyses.
twin_test_cuts
Various cuts used in determining resolution limit
for data used in intensity statistics
low_resolution= 10.0
Low resolution
high_resolution= None High resolution
isigi_cut= 3.0
I/sigI ratio used in completeness cut
completeness_cut= 0.85
Data is cut at resolution where
intensities with I/sigI greater than
isigi_cut are more than completeness_cut
complete
optional
Optional data massage possibilities
hklout= None HKL out
hklout_type= mtz sca *mtz_or_sca Output format
label_extension= "massaged" Label extension
aniso
Parameters dealing with anisotropy correction
action= *remove_aniso None Remove anisotropy?
final_b= *eigen_min eigen_mean user_b_iso Final b value
b_iso= None User specified B value
outlier
Outlier analyses
action= *extreme basic beamstop None Outlier protocol
parameters
Parameters for outlier detection
basic_wilson
level= 1E-6
extreme_wilson
level= 0.01
beamstop
level= 0.001
d_min= 10.0
symmetry
action= detwin twin *None
twinning_parameters
twin_law= None
fraction= None
Reflection Statistics
phenix.reflection_statistics

Comparisons between multiple datasets are available via the phenix.reflection_statistics command:
Usage: phenix.reflection_statistics [options] reflection_file [...]
Options:
-h, --help show this help message and exit
--unit-cell=10,10,20,90,90,120|FILENAME
External unit cell parameters
--space-group=P212121|FILENAME
External space group symbol
--symmetry=FILENAME External file with symmetry information
--weak-symmetry symmetry on command line is weaker than symmetry found
in files
--quick Do not compute statistics between pairs of data arrays
--resolution=FLOAT High resolution limit (minimum d-spacing, d_min)
--low-resolution=FLOAT
Low resolution limit (maximum d-spacing, d_max)
--bins=INT Number of bins
--bins-twinning-test=INT
Number of bins for twinning test
--bins-second-moments=INT
Number of bins for second moments of intensities
--lattice-symmetry-max-delta=LATTICE_SYMMETRY_MAX_DELTA
angular tolerance in degrees used in the determination
of the lattice symmetry
Example: phenix.reflection_statistics data1.mtz data2.sca
This utility reads one or more reflection files (many common formats incl. MTZ, Scalepack, CNS,
SHELX). For each of the datasets found in the reflection files the output shows a block like the following:
Miller array info: gere_MAD.mtz:FSEinfl,SIGFSEinfl,DSEinfl,SIGDSEinfl
Observation type: xray.reconstructed_amplitude
Type of data: double, size=20994
Type of sigmas: double, size=20994
Number of Miller indices: 20994
Anomalous flag: 1
Unit cell: (108.742, 61.679, 71.652, 90, 97.151, 90)
Space group: C 1 2 1 (No. 5)
Systematic absences: 0
Centric reflections: 0
Resolution range: 24.7492 2.74876
Completeness in resolution range: 0.873513
Completeness with d_max=infinity: 0.872315
Bijvoet pairs: 10497
Lone Bijvoet mates: 0
Anomalous signal: 0.1065
This is followed by a listing of the completeness and the anomalous signal in resolution bins. The number of bins and the resolution range may be adjusted with the options shown above. Unless the --quick option is specified, the output will also show the correlations between the datasets and, if applicable, between the anomalous differences, both as overall values and in bins. The correlation between anomalous differences is often a very powerful indicator of the resolution up to which the anomalous signal is useful for substructure determination. See also: phenix.xtriage
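A minimal sketch of such a correlation, assuming the anomalous differences of the two datasets have already been matched up by Miller index (plain Pearson correlation; the program's exact weighting may differ):

def correlation(x, y):
    n = float(len(x))
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / (vx * vy) ** 0.5

dano_1 = [5.2, -3.1, 0.4, 7.8]  # |F+|-|F-| of dataset 1 (toy values)
dano_2 = [4.9, -2.5, 1.0, 6.9]  # matching differences of dataset 2
print("%.3f" % correlation(dano_1, dano_2))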
reflection file tools
phenix.reflection_file_converter

Purpose

phenix.reflection_file_converter is a simple utility program that allows straightforward conversion of many reflection file formats to MTZ, CNS or Scalepack format. Currently, combining several datasets into a single output file is not supported.

Keywords

Typing: phenix.reflection_file_converter --help results in:
Usage: phenix.reflection_file_converter [options] reflection_file ...
Options:
-h, --help show this help message and exit
--unit-cell=10,10,20,90,90,120|FILENAME
External unit cell parameters
--space-group=P212121|FILENAME
External space group symbol
--symmetry=FILENAME External file with symmetry information
--weak-symmetry symmetry on command line is weaker than symmetry found
in files
--resolution=FLOAT High resolution limit (minimum d-spacing, d_min)
--low-resolution=FLOAT
Low resolution limit (maximum d-spacing, d_max)
--label=STRING Substring of reflection data label or number
--non-anomalous Averages Bijvoet mates to obtain a non-anomalous array
--r-free-label=STRING
Substring of reflection data label or number
--r-free-test-flag-value=FLOAT
Value in R-free array indicating assignment to free
set.
--generate-r-free-flags
Generates a new array of random R-free flags (MTZ and
CNS output only).
--use-lattice-symmetry-in-r-free-flag-generation
group twin/pseudo symmetry related reflections
together in r-free set.
--r-free-flags-fraction=FLOAT
Target fraction free/work reflections (default: 0.10).
--r-free-flags-max-free=INT
Maximum number of free reflections (default: 2000).
--change-of-basis=STRING
Change-of-basis operator: h,k,l or x,y,z or
to_reference_setting, to_primitive_setting,
to_niggli_cell, to_inverse_hand
--eliminate-invalid-indices
Remove indices which are invalid given the change of
basis desired
--expand-to-p1 Generates all symmetrically equivalent reflections.
The space group symmetry is reset to P1. May be used
in combination with --change_to_space_group to lower
the symmetry.
--change-to-space-group=SYMBOL|NUMBER
Changes the space group and merges equivalent
reflections if necessary
--write-mtz-amplitudes
Converts intensities to amplitudes before writing MTZ
format; requires --mtz_root_label
--write-mtz-intensities
Converts amplitudes to intensities before writing MTZ
format; requires --mtz_root_label
--remove-negatives Remove negative intensities or amplitudes from the
data set
--massage-intensities
'Treat' negative intensities to get a positive
amplitude. |Fnew| = sqrt((Io+sqrt(Io**2
+2sigma**2))/2.0). Requires intensities as input and
the flags --mtz, --write_mtz_amplitudes and
--mtz_root_label.
--scale-max=FLOAT Scales data such that the maximum is equal to the
given value
--scale-factor=FLOAT Multiplies data with the given factor
--sca=FILE write data to Scalepack FILE ('--sca .' copies name of
input file)
--mtz=FILE write data to MTZ FILE ('--mtz .' copies name of input
file)
--mtz-root-label=STRING
Root label for MTZ file (e.g. Fobs)
--cns=FILE write data to CNS FILE ('--cns .' copies name of input
file)
--shelx=FILE write data to SHELX FILE ('--shelx .' copies name of
input file)
Example: phenix.reflection_file_converter w1.sca --mtz .
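A minimal sketch of the --massage-intensities formula quoted in the help text above (the input values are toy numbers):

import math

def massaged_amplitude(i_obs, sigma):
    # 'treat' a (possibly negative) intensity to obtain a positive amplitude:
    # |Fnew| = sqrt((Io + sqrt(Io**2 + 2*sigma**2)) / 2.0)
    return math.sqrt((i_obs + math.sqrt(i_obs ** 2 + 2 * sigma ** 2)) / 2.0)

print("%.3f" % massaged_amplitude(-5.0, 3.0))  # small but positive |Fnew|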
Examples
●
Convert scalepack into an mtz format. Specify output filename (w1.mtz) and label for intensities (IP -> IP, SIGIP): phenix.reflection_file_converter w1.sca --mtz_root_label=IP --mtz=w1.mtz
●
Change basis to get data in primitive setting, merge to higher symmetry and bring to reference setting
(three steps): phenix.reflection_file_converter c2.sca --change-of-basis=to_niggli_cell --sca=niggli.sca
phenix.reflection_file_converter niggli.sca --change-to-space_group=R32:R --sca=r32r.sca
phenix.reflection_file_converter r32r.sca --change-of-basis=to_reference_setting --sca=r32_hexagonal_setting.sca
phenix.cns_as_mtz
Purpose

Converts all data in a CNS reflection file to MTZ format.

Keywords

Typing: phenix.cns_as_mtz --help results in:
Usage: phenix.cns_as_mtz [options] cns_file
Options:
-h, --help show this help message and exit
--unit-cell=10,10,20,90,90,120|FILENAME
External unit cell parameters
--space-group=P212121|FILENAME
External space group symbol
--symmetry=FILENAME External file with symmetry information
-q, --quiet suppress output
Example: phenix.cns_as_mtz scale.hkl
Example

Extract unit cell parameters and space group symbol from a PDB coordinate file and reflection data from a CNS reflection file, and write an MTZ file: phenix.cns_as_mtz mad_scale.hkl --symmetry minimize.pdb
phenix.mtz.dump
Purpose

Inspects an MTZ file. Optionally writes data in text format (human readable, machine readable, or spreadsheet).

Keywords

Typing: phenix.mtz.dump --help results in:
Usage: phenix.mtz.dump [options] file_name [...]
Options:
-h, --help show this help message and exit
-v, --verbose Enable CMTZ library messages.
-c, --show-column-data
-f KEYWORD, --column-data-format=KEYWORD
Valid keywords are: human_readable, machine_readable,
spreadsheet. Human readable is the default. The format
keywords can be abbreviated (e.g. -f s).
-b, --show-batches
--walk=ROOT_DIR Find and process all MTZ files under ROOT_DIR

Structure factor file manipulations with Xmanip
Author(s)
●
Xmanip: Peter Zwart
●
Phil command interpreter: Ralf W. Grosse-Kunstleve
Purpose
Manipulation of reflection data and models
Usage
Command line interface

xmanip can be invoked via the command line interface with instructions given in a specific definition file:
phenix.xmanip params.def
The full set of definitions can be obtained by typing:
phenix.xmanip
which results in::
xmanip {
input {
unit_cell = None
space_group = None
xray_data {
file_name = None
labels = None
label_appendix = None
name = None
write_out = None
}
model {
file_name = None
}
}
parameters {
action = reindex manipulate_pdb *manipulate_miller
reindex {
standard_laws = niggli *reference_setting invert user_supplied
user_supplied_law = "h,k,l"
}
manipulate_miller {
task = get_dano get_diso lsq_scale sfcalc *custom None
output_label_root = "FMODEL"
get_dano {
input_data = None
}
get_diso {
native = None
derivative = None
use_intensities = True
use_weights = True
scale_weight = True
}
lsq_scale {
input_data_1 = None
input_data_2 = None
use_intensities = True
use_weights = True
scale_weight = True
}
sfcalc {
fobs = None
output = *2mFo-DFc mFo-DFc complex_fcalc abs_fcalc intensities
use_bulk_and_scale = *as_estimated user_upplied
bulk_and_scale_parameters {
d_min = 2
overall {
b_cart {
b_11 = 0
b_22 = 0
b_33 = 0
b_12 = 0
b_13 = 0
b_23 = 0
}
k_overall = 0.1
}
solvent {
k_sol = 0.3
b_sol = 56
}
}
}
custom{
code = print >> out, "hello world"
}
}
manipulate_pdb{
task = apply_operator *set_b
apply_operator{
operator = "x,y,z"
invert=False
concatenate_model=False
chain_id_increment=1
}
set_b{
b_iso = 30
}
}
}
output {
logfile = "xmanip.log"
hklout = "xmanip.mtz"
xyzout = "xmanip.pdb"
}
}
Detailed explanations of the scopes follow below.
Parameters and definitions
The xmanip.input scope defines which files and which data xmanip reads in::
input {
unit_cell = None # unit cell. Specify when not in reflection or pdb files
space_group = None # space group. Specify when not in reflection or pdb files
xray_data {
file_name = None # File from which data will be read
labels = None # Labels to read in.
label_appendix = None # Label appendix: when writing out the new mtz file, this appendix will be added to the current label.
name = None # A data set name. Useful for manipulation
write_out = None # Determines if this data set will be written to the final mtz file
}
model {
file_name = None # An input pdb file
}
}
One can define as many sub-scopes of xray_data as desired (see examples). The specific tasks of xmanip
are controlled by the xmanip.parameters.action key. Possible options are:
●
reindex
●
manipulate_pdb
●
manipulate_miller
Reindexing: reindexing of a data set (and a model) is controlled by the xmanip.parameters.reindex scope. Standard laws are available:
●
niggli: Brings unit cell to the niggli setting.
●
reference_setting: Brings space group to the reference setting
●
invert: Inverts a data set
●
user_supplied: A user supplied reindexing law is used, specified by reindex.user_supplied_law
manipulate_pdb: A pdb file can be modified by applying a symmetry operator to the coordinates (select the apply_operator task from the manipulate_pdb.task list). The operator needs to be specified by apply_operator.operator. Setting apply_operator.invert to True will invert the supplied operator. One can choose to write out the newly generated chain together with the original chain (set concatenate_model = True). The new chain ID can be controlled with the chain_id_increment parameter.

manipulate_miller: Reflection data can be manipulated in various ways:
●
get_dano: Get anomalous differences from the data set with the name specified by manipulate_miller.get_dano.input_data (see the sketch after this list).
●
get_diso: Get isomorphous differences (derivative-native) from the data sets specified by the names manipulate_miller.get_diso.native and manipulate_miller.get_diso.derivative. Least-squares scaling of the derivative to the native can be done on intensities (use_intensities=True), with or without using sigmas (use_weights) and by scaling the weights if desired (recommended).
●
lsq_scale: As above, but no isomorphous differences are computed; only input_data_2 is scaled and returned.
●
sfcalc: Structure factor calculation. Requires a pdb file to be read in. Possible output coefficients are
●
2mFo-DFc (Fobs required; specify sfcalc.fobs)
●
mFo-DFc (Fobs required; specify sfcalc.fobs)
●
complex_fcalc (FC, PHIC)
●
abs_fcalc (FC)
●
intensities (FC^2)
Bulk solvent and scaling parameters will be either estimated from observed data if supplied, or set by the user (using keywords in the bulk_and_scale_parameters scope).
●
custom: If custom is selected, all data names for the xray data will become variable names accessible via the custom interface. The custom interface allows one to write a small piece of Python code that works directly with the Python objects themselves. Basic knowledge of the cctbx and Python is needed to bring this to a fruitful ending. Please contact the authors for detailed help if required. An example is given in the examples section.
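A minimal sketch of what the get_dano task computes (||F+| - |F-||), assuming the Bijvoet mates have already been paired; this is an illustration of the idea, not the xmanip source:

def get_dano(f_plus, f_minus):
    # matched Bijvoet amplitude pairs -> anomalous differences ||F+|-|F-||
    return [abs(fp - fm) for fp, fm in zip(f_plus, f_minus)]

print(get_dano([102.3, 55.0], [98.1, 57.2]))  # ~[4.2, 2.2] (toy values)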
Examples
Reindexing a data set and model ::
xmanip {
input {
xray_data {
file_name = mydata.mtz
labels = FOBS,SIGFOBS
write_out = True
}
xray_data {
file_name = mydata.mtz
labels = R_FREE_FLAG
write_out = True
}
model {
file_name = mymodel.pdb
}
}
parameters {
action = reindex
reindex {
standard_laws = *niggli
user_supplied_law = "h,k,l"
}
}
output {
logfile = "xmanip.log"
hklout = "reindex.mtz"
xyzout = "reindex.pdb"
}
}
Applying a symmetry operator to a pdb file ::
xmanip {
input {
model {
file_name = mymodel.pdb
}
}
parameters {
action = manipulate_pdb
manipulate_pdb {
task = apply_operator
apply_operator{
operator = "x+1/3,y-2/3,z+1/8"
}
}
}
output {
logfile = "xmanip.log"
xyzout = "shifted.pdb"
}
}
Printing out some useful information for an mtz file ::
xmanip {
input {
xray_data {
file_name = mydata.mtz
labels = FOBS,SIGFOBS
name = fobs
}
}
parameters {
action = custom
custom{
code = """
print >> out, "Printing d_spacings, epsilons and intensities"
#change amplitude to intensities
fobs = fobs.f_as_f_sq()
#get epsilons
epsilons = fobs.epsilons().data().as_double()
#get d spacings
d_hkl = fobs.d_spacings().data()
#print the lot to a file
output_file = open("jiffy_result.txt", 'w')
for ii, eps, dd in zip( fobs.data(), epsilons, d_hkl):
    print >> output_file, ii, eps, dd
print >> out, "Done"
"""
}
}
}
Possible Problems
None
Literature
None
Additional information
List of all xmanip keywords
-------------------------------------------------------------------------------
Legend:
black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
xmanip
input
unit_cell= None Unit cell parameters
space_group= None space group
xray_data
Scope defining xray data. Multiple scopes are allowed
file_name= None file name
labels= None A unique label or unique substring of a label
label_appendix= None Label appendix for output mtz file
name= None An identifier of this particular miller array
write_out= None Determines if this data is written to the output file
model
A model associated with the miller arrays. Only one model can be
defined.
file_name= None A model file
parameters
action= *reindex manipulate_pdb manipulate_miller Defines which action
will be carried out.
reindex
Reindexing parameters. Acts on coordinates and miller arrays.
standard_laws= niggli *reference_setting primitive_setting invert
user_supplied Choices of reindexing operators. Will be
applied on structure and miller arrays.
user_supplied_law= 'h,k,l' User supplied operator.
manipulate_miller
Acts on a single miller array or a set of miller
arrays.
task= *get_dano get_diso lsq_scale sfcalc custom None Possible tasks
output_label_root= None Output label root
get_dano
Get ||F+| - |F-|| from input data.
input_data= None
get_diso
Get |Fder|-|Fnat|
native= None Name of native data
derivative= None Name of derivative data
use_intensities= True Scale on intensities
use_weights= True Use experimental sigmas as weights in scaling
scale_weight= True Whether or not to scale the sigmas during
scaling
lsq_scale
input_data_1= None Reference data
input_data_2= None Data to be scaled
use_intensities= True Scale on intensities
use_weights= True Use experimental sigmas as weights in scaling
scale_weight= True Whether or not to scale the sigmas during
scaling
sfcalc
fobs= None Data name of observed data
output= 2mFo-DFc mFo-DFc *complex_fcalc abs_fcalc intensities
Output coefficients
use_bulk_and_scale= *as_estimated user_upplied estimate or use
parameters given by user
bulk_and_scale_parameters
Parameters used in the structure factor
calculation. Ignored if experimental
data is given
d_min= 2.0
resolution of the data to be calculated.
overall
Bulk solvent and scaling parameters
k_overall= 0.1
Overall scalar
b_cart
Anisotropic B values
b_11= 0
b_22= 0
b_33= 0
b_12= 0
b_13= 0
b_23= 0
solvent
Solvent parameters
k_sol= 0.3
Solvent scale
b_sol= 56.0
Solvent B
custom
A custom script that uses miller_array data names as variables.
code= None A piece of python code
show_instructions= True Some instructions
manipulate_pdb
Manipulate elements of a pdb file
task= set_b apply_operator *None How to manipulate a pdb file
set_b
b_iso= 30 new B value for all atoms
apply_operator
standard_operators= *user_supplied_operator
user_supplied_cartesian_rotation_matrix
Possible operators
user_supplied_operator= "x,y,z" Actualy operator in x,y,z notation
invert= False Invert operator given above before applying on
coordinates
concatenate_model= False Determines if new chain is concatenated
to old model
chain_id_increment= 1 Cain id increment
user_supplied_cartesian_rotation_matrix
Rotation,translation
matrix in cartesian frame
r= None Rotational part of operator
t= None Translational part of operator
output
Output files
logfile= xmanip.log
Logfile
hklout= xmanip.mtz
Ouptut miller indices and data
xyzout= xmanip.pdb
output PDB file

Explore Metric Symmetry
Purpose
iotbx.explore_metric_symmetry is a program that allows a user to quickly determine the symmetry of the lattice, given a unit cell, and to determine relations between various possible point groups. Another use of iotbx.explore_metric_symmetry is the comparison of unit cells that are related by a linear recombination of their basis vectors.
Keywords
A list of keywords and concise help can be obtained by typing: iotbx.explore_metric_symmetry

options:
-h, --help show this help message and exit
--unit_cell=10,10,20,90,90,120|FILENAME
External unit cell parameters
--space_group=P212121|FILENAME
External space group symbol
--symmetry=FILENAME External file with symmetry information
--max_delta=FLOAT Maximum delta/obliquity used in determining the
lattice symmetry, using a modified Le-Page algorithm.
Default is 5.0 degrees
--start_from_p1 Reduce to Niggli cell and forget the input space group
before higher metric symmetry is sought.
--graph=GRAPH A graphical representation of the graph will be
written out. Requires Graphviz to be installed and in
path.
--centring_type=CENTRING_TYPE
Centring type, choose from P,A,B,C,I,R,F
--other_unit_cell=10,20,30,90,103.7,90
Other unit cell, for unit cell comparison
--other_space_group=OTHER_SPACE_GROUP
space group for other_unit_cell, for unit cell
comparison
--other_centring_type=OTHER_CENTRING_TYPE
Centring type, choose from P,A,B,C,I,R,F
--no_point_group_graph
Do not carry out the construction of a point group
graph.
--relative_length_tolerance=FLOAT
Tolerance for unit cell lengths to be considered
equal-ish.
--absolute_angle_tolerance=FLOAT
Angular tolerance in unit cell comparison
--max_order=INT Maximum volume change for target cell
A list of possible unit cells and space groups is given for the specified unit cell and space group combination. The keywords unit_cell and space_group (or centring_type) define the crystal symmetry for which a point group graph is constructed. The keyword max_delta sets the tolerance used in the determination of the lattice symmetry. The keyword start_from_p1 in combination with the space group is equivalent to specifying the centring_type only. If Graphviz is installed, a png file with the point group graph can be constructed by specifying its filename with the keyword graph. If a second crystal is specified by the keywords other_unit_cell and other_space_group (or other_centring_type), the unit cells will be compared. Using linear combinations of the smallest unit cell, possible matches for the larger unit cell are sought. If desired, the larger unit cell can be expanded as well using the keyword max_order. The tolerances in the unit cell comparison can be changed from their defaults (10% on the lengths and 20 degrees on the angles) using the keywords relative_length_tolerance and absolute_angle_tolerance. Construction of a point group graph can be skipped using the keyword no_point_group_graph.
Examples
Constructing a point group graph given some basic information:

iotbx.explore_metric_symmetry --unit_cell="20,30,40,90,90,90" --centring_type=P

All point groups between P 1 and P 2 2 2 will be listed. Comparing two related unit cells can be done using:

iotbx.explore_metric_symmetry --unit_cell="20,30,40,90,90,90" --centring_type=P --other_unit_cell="40,80,60,90,90,90" --other_centring_type=F

Hybrid Substructure Search
Auxiliary programs: phenix.emma
HySS overview
The HySS (Hybrid Substructure Search) submodule of the Phenix package is a highly-automated procedure for the location of anomalous scatterers in macromolecular structures. HySS starts with the automatic detection of the reflection file format and analyses all available datasets in a given reflection file to decide which of these is best suited for solving the structure. The search parameters are automatically adjusted based on the available data and the number of expected sites given by the user. The search method is a systematic multitrial procedure employing
●
direct-space Patterson interpretation, followed by
●
reciprocal-space Patterson interpretation, followed by
●
dual-space direct methods, followed by
●
automatic comparison of the solutions, and
●
automatic termination detection.
The end result is a consensus model which is exported in a variety of file formats suitable for frequently used phasing and density modification packages.
The core search procedure is applicable to both anomalous diffraction and isomorphous replacement problems.
However, the command line interface is currently limited to working with anomalous diffraction data or externally preprocessed difference data. References:
●
Grosse-Kunstleve RW, Adams PD:
Substructure search procedures for macromolecular structures
Acta Cryst. 2003, D59, 1966-1973.
Electronic reprint
●
Adams PD, Grosse-Kunstleve RW, Hung L-W, Ioerger TR, McCoy AJ, Moriarty NW, Read RJ,
Sacchettini JC, Sauter NK, Terwilliger TC:
PHENIX: building new software for automated crystallographic structure
determination
Acta Cryst. 2002, D58, 1948-1954.
Electronic reprint
To contact us send email to [email protected].
HySS examples
The only input file required for running HySS is a file with the reflection data. HySS reads the following formats directly:
●
merged scalepack files
●
unmerged scalepack files (but merged files are preferred!)
●
CCP4 MTZ files with merged data
●
CCP4 MTZ files with unmerged data (but merged files are preferred!)
●
d*trek .ref files
●
XDS_ASCII files with merged data
●
CNS reflection files
●
SHELX reflection files with amplitudes
nsf_d2_peak.sca
The CCI Apps binary bundles include a scalepack file with anomalous peak data for the structure with the PDB access code 1NSF (courtesy of A.T. Brunger). To find the 8 selenium sites enter: phenix.hyss nsf_d2_peak.sca 8 se
This leads to:
Reading reflection file: nsf_d2_peak.sca
Space group found in file: P 6
Is this the correct space group? [Y/N]:
HySS prompts for a confirmation of the space group because space group P6 is often used as a placeholder during data reduction. If the space group symbol found in the reflection file is not correct it can be changed.
However, in this case the symbol is correct. At the prompt enter Y to continue. Alternatively, the interactive prompt can be avoided by using the --space_group option: phenix.hyss nsf_d2_peak.sca 8 se --space_group=p6
HySS will quickly print a few screen-pages with information about the data (e.g. the magnitude of the anomalous signal) and the many search parameters. The most interesting output is produced after this point:
Entering search loop:
p = peaklist index in Patterson map
f = peaklist index in two-site translation function
cc = correlation coefficient after extrapolation scan
r = number of dual-space recycling cycles
cc = final correlation coefficient
p=000 f=000 cc=0.364 r=015 cc=0.479 [ best cc: 0.479 ]
p=000 f=001 cc=0.310 r=015 cc=0.477 [ best cc: 0.479 0.477 ]
Number of matching sites of top 2 structures: 11 p=000 f=002 cc=0.166 r=015 cc=0.479 [ best cc: 0.479 0.479 0.477 ]
Number of matching sites of top 2 structures: 11
Number of matching sites of top 3 structures: 11
It will take a few seconds for each line starting with p= to appear. Each of these lines summarizes the result of one trial consisting of an evaluation of the Patterson function, two fast translation functions, and 15 cycles of dual-space recycling. The important number to watch is the final correlation. In the first three trials HySS finds three substructure models with promisingly high correlations. These models are compared, taking allowed origin shifts and the hand ambiguity into account. The three models have more than 2/3 of the expected number of sites in common. Therefore HySS decides that the search is complete and prints a summary of the matching sites:
Top 3 correlations:
p=000 f=000 cc=0.364 r=015 cc=0.479
p=000 f=002 cc=0.166 r=015 cc=0.479
p=000 f=001 cc=0.310 r=015 cc=0.477
Match summary:
Operator:
rotation: {{-1.0, 0.0, 0.0}, {0.0, -1.0, 0.0}, {0.0, 0.0, -1.0}}
translation: (-9.6289517721653785e-38, 0.0, 0.091526465343537006)
rms coordinate differences: 0.06
Pairs: 11
site001 site001 0.018
site002 site002 0.056
site003 site003 0.033
site004 site004 0.026
site005 site005 0.050
site006 site006 0.103
site007 site007 0.040
site008 site008 0.063
site009 site010 0.067
site010 site009 0.120
site011 site011 0.029
Singles model 1: 0
Singles model 2: 0
The matching sites are used to build a consensus model. The coordinates and occupancies are quickly refined using a quasi-Newton minimizer:
Minimizing consensus model (11 sites).
Truncating consensus model to expected number of sites.
Minimizing consensus model (8 sites).
Correlation coefficient for consensus model (8 sites): 0.483
The refined sites are sorted by occupancy in descending order. The model is truncated to the expected number of sites and refined again. After printing detailed timing information (not shown) the output ends with:
Storing all substructures found: nsf_d2_peak_hyss_models.pickle
Storing consensus model: nsf_d2_peak_hyss_consensus_model.pickle
Writing consensus model as PDB file: nsf_d2_peak_hyss_consensus_model.pdb
Writing consensus model as CNS SDB file: nsf_d2_peak_hyss_consensus_model.sdb
Writing consensus model as SOLVE xyz records: nsf_d2_peak_hyss_consensus_model.xyz
Total CPU time: 49.60 seconds
The resulting coordinate files can be used for phasing and density modification with other programs. The fractional coordinates may also be useful in other programs.
gere_MAD.mtz
The CCP4 distribution includes a four-wavelength MAD dataset in the tutorial directory. To find the 12 selenium sites with HySS enter:
phenix.hyss $CEXAM/tutorial2000/data/gere_MAD.mtz 12 se
HySS automatically picks the wavelength with the strongest anomalous signal and finishes after about 34 seconds (2.8GHz Pentium 4 Linux), writing out the 12 (or sometimes only 11) sites in the various file formats.
mbp.hkl
The CNS tutorial includes data from a MAD experiment with ytterbium as the anomalous scatterer. CNS reflection files do not contain information about the unit cell and space group. However, HySS is able to extract this information from other files, e.g. other reflection files, CNS files, SOLVE files, PDB files or SHELX files. For example:
phenix.hyss $CNS_SOLVE/doc/html/tutorial/data/mbp/mbp.hkl 4 yb --symmetry $CNS_SOLVE/doc/html/tutorial/data/mbp/def
HySS reads the reflection data from the mbp.hkl file. The --symmetry option instructs HySS to scan the def file for unit cell parameters and a space group symbol. HySS finishes after about 26 seconds (2.8GHz Pentium 4 Linux).
Command line options
Enter phenix.hyss without arguments to obtain a list of the available command line options:
Command line arguments:
usage: phenix.hyss [options] reflection_file n_sites element_symbol
options:
-h, --help show this help message and exit
--unit_cell=10,10,20,90,90,120|FILENAME
External unit cell parameters
--space_group=P212121|FILENAME
External space group symbol
--symmetry=FILENAME External file with symmetry information
--chunk=n,i Number of chunks for parallel execution and index for
one process
--search=fast|full Search mode
--resolution=FLOAT High resolution limit (minimum d-spacing, d_min)
--low_resolution=FLOAT
Low resolution limit (maximum d-spacing, d_max)
--site_min_distance=FLOAT
Minimum distance between substructure sites (default:
3.5)
--site_min_distance_sym_equiv=FLOAT
Minimum distance between symmetrically-equivalent
substructure sites (overrides --site_min_distance)
--site_min_cross_distance=FLOAT
Minimum distance between substructure sites not
related by symmetry (overrides --site_min_distance)
--molecular_weight=FLOAT
Molecular weight
--solvent_content=FLOAT
Solvent content (default: 0.55)
--random_seed=INT Seed for random number generator
--real_space_squaring
Use real space squaring (as opposed to the tangent
formula)
--data_label=STRING Substring of reflection data label
See also:
http://www.phenix-online.org/download/documentation/cci_apps/hyss/
Example: phenix.hyss w1.sca 66 Se
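The --chunk option can be used to split a search over several independent processes. A minimal sketch based on the help text above (assuming four chunks; each command is started as a separate process, e.g. on different CPUs, and computes one part of the trials):
phenix.hyss w1.sca 66 Se --chunk=4,0
phenix.hyss w1.sca 66 Se --chunk=4,1
phenix.hyss w1.sca 66 Se --chunk=4,2
phenix.hyss w1.sca 66 Se --chunk=4,3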
The --data_label, --resolution and --low_resolution options can be used to override the automatic selection of the reflection data and the resolution range. For example, one may enter the following command to instruct HySS to use the peak data in the gere_MAD.mtz file (instead of the inflection-point data) and to set the high resolution limit to 5 Angstrom:
phenix.hyss gere_MAD.mtz 12 se --data_label=peak --resolution=5
Output:
Command line arguments: gere_MAD.mtz 12 se --data_label=peak --resolution=5
Reading reflection file: gere_MAD.mtz
Ambiguous --data_label=peak
Possible choices:
5: gere_MAD.mtz:FSEpeak,SIGFSEpeak,DSEpeak,SIGDSEpeak,merged
6: gere_MAD.mtz:F(+)SEpeak,SIGF(+)SEpeak,F(-)SEpeak,SIGF(-)SEpeak
Please specify an unambiguous substring of the target label.
Sorry: Please try again.
That is a good first try, but since --data_label=peak turns out to be ambiguous, HySS asks for more information. Second try:
phenix.hyss gere_MAD.mtz 12 se --data_label="F(+)SEpeak" --resolution=5
Now HySS will actually perform the search. Typically the search finishes in less than 10 seconds, finding 8-12 sites depending on the random number generator (which is seeded with the current time unless the --random_seed option is used). The --site_min_distance, --site_min_distance_sym_equiv, and --site_min_cross_distance options are available to override the default minimum distance of 3.5 Angstroms between substructure sites. The --real_space_squaring option can be useful for large structures with high-resolution data. In this case the large number of triplets generated for the reciprocal-space direct methods procedure (i.e. the tangent formula) may lead to excessive memory allocation. By default HySS switches to real-space direct methods (i.e. E-map squaring) if it searches for more than 100 sites. If this limit is too high given the available memory, use the --real_space_squaring option. For substructures with a large number of sites it is, in our experience, not critical to employ reciprocal-space direct methods. If the --molecular_weight and --solvent_content options are used, HySS will help in determining the number of substructure sites in the unit cell, interpreting the number of sites specified on the command line as the number of sites per molecule.
For example: phenix.hyss gere_MAD.mtz 2 se --molecular_weight=8000 --solvent_content=0.70
This is telling HySS that we have a molecule with a molecular weight of 8 kD, a crystal with an estimated solvent content of 70%, and that we expect to find 2 Se sites per molecule. The HySS output will now show the following:
#---------------------------------------------------------------------------#
| Formula for calculating the number of molecules given a molecular weight. |
|---------------------------------------------------------------------------|
| n_mol = ((1.0-solvent_content)*v_cell)/(molecular_weight*n_sym*.783)      |
#---------------------------------------------------------------------------#
Number of molecules: 6
Number of sites: 12
Values used in calculation:
Solvent content: 0.70
Unit cell volume: 476839
Molecular weight: 8000.00
Number of symmetry operators: 4
HySS will go on searching for 12 sites.
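Similarly, the minimum-distance defaults discussed earlier can be overridden directly; an illustrative sketch (the 4.0 Angstrom value is arbitrary):
phenix.hyss gere_MAD.mtz 12 se --site_min_cross_distance=4.0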
If things go wrong
If the HySS consensus model does not lead to an interpretable electron density map please try the --search=full option:
phenix.hyss your_file.sca 100 se --search=full
This disables the automatic termination detection, and the run will in general take considerably longer. If the full search leads to a better consensus model please let us know, because we will want to improve the automatic termination detection. Another possibility is to override the automatic determination of the high-resolution limit with the --resolution option. In some cases the resolution limit is very critical. Truncating the high-resolution limit of the data can sometimes lead to a successful search, as more reflections with a weak anomalous signal are excluded. If there is no consensus model at the end of a HySS run please try alternative programs. For example, run SHELXD with the .ins and .hkl files that are automatically generated by HySS:
Writing anomalous differences as SHELX HKLF file: mbp_anom_diffs.hkl
Writing SHELXD ins file: mbp_anom_diffs.ins
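Assuming SHELXD is installed and takes the file name root as its argument (as SHELX programs conventionally do; this invocation is a sketch, not part of the HySS output), the search could then be started with:
shelxd mbp_anom_diffs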
If HySS does not produce a consensus model even though it is possible to solve the substructure with other programs we would like to investigate. Please send email to [email protected].
Auxiliary programs
phenix.emma
EMMA stands for Euclidean Model Matching, which allows two sets of coordinates to be superimposed as well as possible given symmetry and origin choices. See the phenix.emma documentation for more details.
phenix.xtriage
The phenix.xtriage program performs an extensive suite of tests to assess the quality of a data set. It is a good idea to always run this program before substructure location or any other step of structure solution. See the phenix.xtriage documentation for more details.
phenix.reflection_statistics
Comparison between multiple datasets is available using the phenix.reflection_statistics command. See the phenix.reflection_statistics documentation for more details.
Euclidean Model Matching
phenix.emma
EMMA stands for Euclidean Model Matching and is the algorithm used by HySS to superimpose two putative solutions and to derive the consensus model. The same algorithm is also available through the external phenix.emma command-line interface. Enter phenix.emma without arguments to obtain the help page:
usage: phenix.emma [options] reference_coordinates other_coordinates
options:
-h, --help show this help message and exit
--unit_cell=10,10,20,90,90,120|FILENAME
External unit cell parameters
--space_group=P212121|FILENAME
External space group symbol
--symmetry=FILENAME External file with symmetry information
--tolerance=FLOAT match tolerance
--diffraction_index_equivalent
Use only if models are diffraction-index equivalent.
Example: phenix.emma model1.pdb model2.sdb
The command takes two coordinate files in various formats (.pdb, CNS .sdb, SOLVE output, SHELX .ins) and compares the structures taking the space group symmetry, the allowed origin shifts and the hand ambiguity into account. The output is similar to the Match summary shown above in the example HySS output. The match tolerance defaults to 3 Angstrom. For structures obtained with very low resolution data it may be necessary to specify a different tolerance, e.g. --tolerance=5.
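For instance, reusing the file names from the help page above (an illustrative sketch; substitute your own coordinate files):
phenix.emma --tolerance=5 model1.pdb model2.sdb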
The --symmetry option works just like it does for phenix.hyss. It can be used to extract symmetry information from external files such as input files for other programs (CNS, SHELX, SOLVE, ...) or reflection files. However, the --symmetry option is only required if the information about the unit cell and the space group is missing from both coordinate files given to phenix.emma. phenix.emma conducts an exhaustive search and, in contrast to HySS, displays all possible matches. The match with the largest number of matching sites is shown first; the match with the smallest number of matching sites is shown last (often just one site). Therefore you have to look at the beginning of the output to see the best match: if the output goes to the screen, don't let yourself get distracted by a large number of Singles near the end of the output. Scroll back to see the best match. Emma is also available via a web interface.
Structure refinement in PHENIX
Current limitations
phenix.refine organization
Giving parameters on the command line or in files
Refinement with all default parameters
Refinement of atomic displacement parameters (commonly known as ADP or B-factors)
Using NCS restraints in refinement
Neutron and joint X-ray and neutron refinement
Refinement at high resolution (higher than approx. 1.0 Angstrom)
Examples of frequently used refinement protocols, common problems
Changing the number of refinement cycles and minimizer iterations
Creating R-free flags (if not present in the input reflection files)
Specify the name for output files
Setting the resolution range for the refinement
Bulk solvent correction and anisotropic scaling
Default refinement with user specified X-ray target function
Modifying the initial model before refinement starts
Refinement using FFT or direct structure factor calculation algorithm
Ignoring test (free) flags in refinement
Using phenix.refine to calculate structure factors
Suppressing the output of certain files
Refining with anomalous data (or what phenix.refine does with Fobs+ and Fobs-).
Rejecting reflections by sigma
Definition of custom bonds and angles
Depositing refined structure with PDB
List of all refinement keywords
phenix.refine is the general purpose crystallographic structure refinement program
Available features
● Coordinate refinement:
1. Restrained / unrestrained individual
2. Grouped (rigid body)
3. LBFGS minimization, Simulated Annealing
4. Selective removal of stereochemistry restraints
5. Adding custom bonds and angles
● Atomic Displacement Parameters (ADP) refinement:
1. Restrained individual isotropic, anisotropic, mixed
2. Group isotropic (one isotropic B per selected model part)
3. TLS
4. Comprehensive mode: combined TLS + individual or group ADP
● Occupancy refinement (any: individual, group, constrained for alternative conformations)
● Anomalous f' and f'' refinement
● Bulk solvent correction (flat model using a mask) and anisotropic scaling
● Multiple refinement and scale target functions: least-squares (ls), maximum-likelihood (ml), phased maximum-likelihood (mlhl)
● FFT and direct summation based refinement
● Various electron density map calculations (including likelihood-weighted)
● Simple structure factor calculation (with or without bulk solvent and scaling)
● Combined automatic ordered solvent building, update and refinement
● Complete model and data statistics (including twinning analysis, Wilson B calculation, stereochemistry statistics and much more)
● Automatic detection of NCS-related copies and building of NCS restraints
● Refinement using X-ray, neutron or both experimental data
● Complex refinement strategies in one run
● Refinement at subatomic resolution (approx. < 1.0 A) with the IAS model
● Refinement with twinned data
Current limitations
● No omit map calculation (use the PHENIX wizards for this)
● TLS and individual anisotropic ADP cannot be refined at once for the same group
● Certain refinement strategies are not available for joint X-ray/neutron refinement
● No NCS constraints (restraints only)
● No atoms with anisotropic ADP in NCS groups
● No Simulated Annealing for selected fragments
Remark on using amplitudes (Fobs) vs intensities (Iobs)
Although phenix.refine can read both data types, intensities or amplitudes, internally it uses amplitudes in nearly all calculations. Both ways of doing refinement, with Iobs or with Fobs, have their own slight advantages and disadvantages. To our knowledge there are no strong arguments for preferring one data type over the other.
phenix.refine organization
A refinement run in phenix.refine always consists of three main steps: reading in and processing of the data (model in PDB format, reflections in most known formats, parameters and optionally cif files with stereochemistry definitions), performing the requested refinement protocols (bulk solvent and scaling, refinement of coordinates and B-factors, water picking, etc.), and finally writing out the refined model, complete refinement statistics and electron density maps in various formats. The figure below illustrates these steps:
The central second step, encompassing everything from bulk solvent correction and scaling to refinement of the selected model parameters, is called a macro-cycle and is repeated several times (3 by default). Multiple refinement scenarios can be realized at this step and applied to any selected part of a model, as illustrated in the figure below:
Running phenix.refine
phenix.refine is run from the command line:
% phenix.refine <pdb-file(s)> <reflection-file(s)> <monomer-library-file(s)>
When you do this a number of things happen:
● The program automatically generates a ".eff" file which contains all of the parameters for the job (for example, if you provided lysozyme.pdb the file lysozyme_refine_001.eff will be generated). This is the set of input parameters for this run.
● The program automatically interprets the reflection file(s). If there is an unambiguous choice of data arrays these will be used for the refinement. If there is a choice, you're given a message telling you how to select the arrays. Several reflection files can be provided, for example: one containing Fobs and another one with R-free flags.
● Once the data arrays are chosen, the program writes all of the data it will be using in the refinement to a new MTZ file, for example, lysozyme_refine_data.mtz. This makes it very easy to keep track of what you actually used in the refinement (instead of having the arrays spread across multiple files).
● At the end of refinement the program generates:
1. a new PDB file with the refined model, called for example lysozyme_refine_001.pdb;
2. two maps: likelihood-weighted mFo-DFc and 2mFo-DFc. These are in ASCII X-PLOR format. A reflection file with map coefficients is also generated for use in Coot or XtalView (e.g. lysozyme_refine_001_map_coeffs.mtz);
3. a new defaults file to run the next cycle of refinement, e.g. lysozyme_refine_002.def. This means you can run the next cycle of refinement by typing:
% phenix.refine lysozyme_refine_002.def
To get information about command line options type:
% phenix.refine --help
To have the program generate the default input parameters without running the refinement job (e.g. if you want to modify the parameters prior to running the job):
% phenix.refine --dry_run <pdb-file> <reflection-file(s)>
If you know the parameter that you want to change you can override it from the command line:
% phenix.refine data.hkl model.pdb xray_data.low_resolution=8.0 \
simulated_annealing.start_temperature=5000
Note that you don't have to specify the full parameter name. What you specify on the command line is matched against all known parameter names and the best substring match is used if it is unique. To rerun a job that was previously run:
% phenix.refine --overwrite lysozyme_refine_001.def
The --overwrite option allows the program to overwrite existing files. By default the program will not overwrite existing files - just in case this would remove the results of a refinement job that took a long time to finish. To see all default parameters:
% phenix.refine --show-defaults=all
Giving parameters on the command line or in files
In phenix.refine parameters to control refinement can be given by the user on the command line:
% phenix.refine data.hkl model.pdb simulated_annealing=true
However, sometimes the number of parameters is large enough to make it difficult to type them all on the command line, for example:
% phenix.refine data.hkl model.pdb refine.adp.tls="chain A" \
refine.adp.tls="chain B" main.number_of_macro_cycles=4 \
xray_data.high_resolution=2.5 wxc_scale=3 wxu_scale=5 \
output.prefix=my_best_model strategy=tls+individual_sites+individual_adp \
simulated_annealing.start_temperature=5000
The same result can be achieved by using:
% phenix.refine data.hkl model.pdb custom_par_1.params
where the custom_par_1.params file contains the following lines:
refinement.refine.strategy=tls+individual_sites+individual_adp
refinement.refine.adp.tls="chain A"
refinement.refine.adp.tls="chain B"
refinement.main.number_of_macro_cycles=4
refinement.input.xray_data.high_resolution=2.5
refinement.target_weights.wxc_scale=3
refinement.target_weights.wxu_scale=5
refinement.output.prefix=my_best_model
refinement.simulated_annealing.start_temperature=5000
which can also be formatted by grouping the parameters under the relevant scopes (custom_par_2.params):
refinement.main {
  number_of_macro_cycles=4
}
refinement.input.xray_data.high_resolution=2.5
refinement.refine {
strategy = *individual_sites \
rigid_body \
*individual_adp \
group_adp \
*tls \
occupancies \
group_anomalous \
none
adp {
tls = "chain A"
tls = "chain B"
}
}
refinement.target_weights {
  wxc_scale=3
  wxu_scale=5
}
refinement.output.prefix=my_best_model
refinement.simulated_annealing.start_temperature=5000
and the refinement run will be:
% phenix.refine data.hkl model.pdb custom_par_2.params
The easiest way to create a file like custom_par_2.params is to generate a template file containing all parameters using the command phenix.refine --show-defaults=all and then keep the parameters that you want to use (and remove the rest).
Comments in parameter files
Use # for comments:
% phenix.refine data.hkl model.pdb comments_in_params_file.params
where the comments_in_params_file.params file contains the lines:
refinement {
refine {
#strategy = individual_sites rigid_body individual_adp group_adp tls \
# occupancies group_anomalous *none
}
#main {
# number_of_macro_cycles = 1
#}
}
refinement.target_weights.wxc_scale = 1.5
#refinement.input.xray_data.low_resolution=5.0
In this example the only parameter used to override the defaults is target_weights.wxc_scale; the rest are commented out.
Refinement scenarios
The refinement of atomic parameters is controlled by the strategy keyword. Those include:
- individual_sites (refinement of individual atomic coordinates)
- individual_adp (refinement of individual atomic B-factors)
- group_adp (group B-factors refinement)
- group_anomalous (refinement of f' and f" values)
- tls (TLS refinement = refinement of ADP through TLS parameters)
- rigid_body (rigid body refinement)
- occupancies (occupancy refinement: individual, group, group constrained)
- none (bulk solvent and anisotropic scaling only)
Below are examples to illustrate the use of the strategy keyword as well as a few others.
Refinement with all default parameters
% phenix.refine data.hkl model.pdb
This will perform coordinate refinement and restrained ADP refinement. Three macrocycles will be executed, each consisting of bulk solvent correction, anisotropic scaling of the data, coordinate refinement
(25 iterations of the LBFGS minimizer) and ADP refinement (25 iterations of the LBFGS minimizer). At the end the updated coordinates, maps, map coefficients, and statistics are written to files.
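To change the default number of macro-cycles, the main.number_of_macro_cycles parameter (which also appears in the parameter-file examples above) can be given directly on the command line, e.g.:
% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=5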
Refinement of coordinates
phenix.refine offers three ways of coordinate refinement:
● individual coordinate refinement using gradient-driven (LBFGS) minimization;
● individual coordinate refinement using simulated annealing (SA refinement);
● grouped coordinate refinement (rigid body refinement).
All types of coordinate refinement listed above can be used separately or combined together in any combination and can be applied to any selected part of a model. For example, if a model contains three chains A, B and C, then it would require only a single refinement run to perform SA refinement and minimization for the atoms in chain A, rigid body refinement with two rigid groups A and B, and no refinement for chain C. Below we will illustrate this with several examples. The default refinement includes a standard set of stereochemical restraints (covalent bonds, angles, dihedrals, planarities, chiralities, non-bonded). NCS restraints can be added as well. Completely unrestrained refinement is possible. The total refinement target is defined as:
Etotal = wxc_scale * wxc * Exray + wc * Egeom
where: Exray is the crystallographic refinement target (least-squares, maximum-likelihood, or any other), Egeom is the sum of restraints (including NCS if requested), wc is 1.0 by default and is used to turn the restraints off, wxc is approximately the ratio of gradient norms for the geometry and X-ray targets as defined in Adams et al. (1997, PNAS 94, 5018), and wxc_scale is an 'ad hoc' scale found empirically to be adequate for most cases. Important to note:
When a refinement of coordinates (individual or rigid body) is run without using selections, the coordinates of all atoms will be refined. Otherwise, if selections are used, only the coordinates of the selected atoms will be refined and the rest will be fixed. Using strategy=rigid_body or strategy=individual_sites will ask phenix.refine to refine only coordinates while the other parameters (ADP, occupancies) are kept fixed. phenix.refine will stop if an atom at a special position is included in a rigid body group. The solution is to make a new rigid body group selection containing no atoms at special positions.
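As a concrete illustration of the chain A/B/C scenario described above, such a combined run could be set up along these lines (a sketch assembled from the strategy and selection keywords used in the examples below; check the exact parameter paths with phenix.refine --show-defaults=all):
% phenix.refine data.hkl model.pdb simulated_annealing=true \
  strategy=rigid_body+individual_sites \
  sites.individual="chain A" \
  sites.rigid_body="chain A" sites.rigid_body="chain B"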
● Rigid body refinement
The phenix.refine implementation of rigid body refinement is very sophisticated and efficient (large convergence radius, one run, no need to cut off high-resolution data). We call this the MZ protocol (multiple zones). The essence of the MZ protocol is that the refinement starts with a few reflections selected in the lowest resolution zone and proceeds by gradually adding higher resolution reflections. It also almost constantly updates the mask and the bulk solvent model parameters; this is crucial since the bulk solvent affects the low resolution reflections, exactly those most important for the success of rigid body refinement. The default set of rigid body parameters is good for most cases and is normally not supposed to be changed.
1. One rigid body group (whatever is in the PDB file is refined as a single rigid body):
% phenix.refine data.hkl model.pdb strategy=rigid_body
2. Multiple groups (requires a basic knowledge of the PHENIX atom selection language, see below):
% phenix.refine data.hkl model.pdb strategy=rigid_body \
sites.rigid_body="chain A" sites.rigid_body="chain B"
This will refine the chain A and chain B as two rigid bodies. The rest of the model will be kept fixed.
3. If one has many rigid groups, a lot of typing on the command line may be inconvenient, so it may be a good idea to create a parameter file rigid_body_selections.params containing the following lines:
refinement.refine.sites {
rigid_body = chain A
rigid_body = chain B
}
The command line will then be:
% phenix.refine data.hkl model.pdb strategy=rigid_body rigid_body_selections.params
Files like this can be created, for example, by copy-and-paste from the complete list of parameters (phenix.refine --show-defaults=all).
4. To switch from MZ protocol to traditional way of doing rigid body refinement (not recommended!):
% phenix.refine data.hkl model.pdb strategy=rigid_body rigid_body.number_of_zones=1 \
rigid_body.high_resolution=4.0
Note that when doing one-zone refinement one needs to cut off the high-resolution data at some arbitrary point around 3-5 A (depending on model size and data quality).
5. By default the rigid body refinement is run only in the first macro-cycle. To switch from running rigid body refinement only once at the first macro-cycle to running it every macro-cycle:
% phenix.refine data.hkl model.pdb strategy=rigid_body rigid_body.mode=every_macro_cycle
6. To change the default number of lowest resolution reflections used to define the first resolution zone for rigid body refinement (MZ protocol only):
% phenix.refine data.hkl model.pdb strategy=rigid_body \
rigid_body.min_number_of_reflections=250
Decreasing this number may increase the convergence radius of rigid body refinement but small numbers may lead to refinement instability.
7. To change the number of zones for MZ protocol:
% phenix.refine data.hkl model.pdb strategy=rigid_body \
rigid_body.number_of_zones=7
Increasing this number may increase the convergence radius of rigid body refinement at the cost of much longer run time.
8. Rigid body refinement can be combined with individual coordinates refinement in a smart way:
% phenix.refine data.hkl model.pdb strategy=rigid_body+individual_sites
This will perform 3 macro-cycles of individual coordinate refinement, with the rigid body refinement performed only once, at the first macro-cycle. A more powerful combination for coordinate refinement is:
% phenix.refine data.hkl model.pdb strategy=rigid_body+individual_sites \
simulated_annealing=true
This will do the same refinement as above plus Simulated Annealing at the second macro-cycle (see more options/examples for running SA in this document).
● Refinement of individual coordinates
1. Refinement with Simulated Annealing:
% phenix.refine data.hkl model.pdb simulated_annealing=true \
strategy=individual_sites
This will perform the Simulated Annealing refinement and LBFGS minimization for the whole model. To change the start SA temperature:
% phenix.refine data.hkl model.pdb simulated_annealing=true \
strategy=individual_sites simulated_annealing.start_temperature=10000
Since an SA run may take some time, there are several options defining how many times SA will be performed per refinement run. To run it only at the first macro-cycle:
% phenix.refine data.hkl model.pdb simulated_annealing=true \
strategy=individual_sites simulated_annealing.mode=first or every macro-cycle:
% phenix.refine data.hkl model.pdb simulated_annealing=true \
strategy=individual_sites simulated_annealing.mode=every_macro_cycle or second and before the last macro-cycle:
% phenix.refine data.hkl model.pdb simulated_annealing=true \
strategy=individual_sites simulated_annealing.mode=second_and_before_last
2. Refinement with minimization (whole model):
% phenix.refine data.hkl model.pdb strategy=individual_sites
3. Refinement with minimization (selected part of model):
% phenix.refine data.hkl model.pdb strategy=individual_sites \
sites.individual="chain A"
This will refine the coordinates of atoms in chain A while keeping fixed the atomic coordinates in chain B.
4. To perform unrestrained refinement of coordinates (usually at ultra-high resolutions):
% phenix.refine data.hkl model.pdb strategy=individual_sites wc=0
This sets the contribution of the geometry restraints target to zero. However, it is still calculated for the statistics output.
5. Removing selected geometry restraints. In the example below:
% phenix.refine data.hkl model.pdb remove_restraints_selections.params
where remove_restraints_selections.params contains: refinement {
geometry_restraints.remove {
angles = chain B
dihedrals = name CA
chiralities = all
planarities = None
}
}
the following restraints will be removed: angles for all atoms in chain B, dihedrals involving CA atoms, and all chirality restraints. All planarity restraints will be preserved.
Refinement of atomic displacement parameters (commonly known as ADP or B-factors)
An ADP in phenix.refine is defined as a sum of three contributions:
Utotal = Ulocal + Utls + Ucryst
where Utotal is the total ADP, Ulocal reflects the local atomic vibration (also known as the residual B), Utls is the contribution modeled by TLS groups, and Ucryst reflects global lattice vibrations. Ucryst is determined and refined at the anisotropic scaling stage. phenix.refine offers multiple choices for ADP refinement:
● individual isotropic, anisotropic or mixed ADP;
● grouped, with one isotropic ADP per selected group;
● TLS.
All types of ADP refinement listed above can be used separately or combined together in any combination (except TLS + individual anisotropic) and can be applied to any selected part of a model.
For example, if a model contains six chains A, B, C, D, E and F, then it would require only a single refinement run to perform refinement of:
- individual isotropic ADP for atoms in chain A,
- individual anisotropic ADP for atoms in chain B,
- grouped B with one B per all atoms in chain C,
- TLS refinement for chain D,
- TLS and individual isotropic refinement for chain E,
- TLS and grouped B refinement for chain F.
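A sketch of how such a mixed setup might be expressed in a parameter file, using the adp selection keywords from the examples below (treat this as illustrative; verify the exact scopes with phenix.refine --show-defaults=all, and note that group selections also require group_adp_refinement_mode=group_selection, as shown later):
refinement.refine {
  strategy = individual_adp+group_adp+tls
  adp {
    individual {
      isotropic = chain A or chain E
      anisotropic = chain B
    }
    group = chain C
    group = chain F
    tls = chain D
    tls = chain E
    tls = chain F
  }
}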
Below we will illustrate this with several examples. Restraints are used for default ADP refinement of isotropic and anisotropic atoms. Completely unrestrained refinement is possible. The total refinement target is defined as:
Etotal = wxu_scale * wxu * Exray + wu * Eadp
where: Exray is the crystallographic refinement target (least-squares, maximum-likelihood, ...), Eadp is the ADP restraints term, wu is 1.0 by default and is used to turn the restraints off, and wxu and wxu_scale are defined similarly to the corresponding coordinate refinement weights (see the Refinement of Coordinates section). It is important to keep in mind:
If a model was previously refined using TLS, all atoms participating in TLS groups are reported in the output PDB file as anisotropic (they have ANISOU records). If a PDB file like this is submitted for default refinement, all atoms with ANISOU records will be refined as individual anisotropic, which is most likely not desired. When performing TLS refinement along with individual isotropic refinement of Ulocal, the restraints are applied to Ulocal and not to the total ADP (Ulocal+Utls). When performing group B or TLS refinement only, no ADP restraints are used. When ADP refinement is run without using selections, the ADP of all atoms will be refined. Otherwise, if selections are used, only the ADP of the selected atoms will be refined and the ADP of the rest will be unchanged. If a TLS parametrization is used for a model previously refined with individual anisotropic ADP, an increase of R-factors is normally expected. phenix.refine will stop if an atom at a special position is included in a TLS group. The solution is to make a new TLS group selection containing no atoms at special positions. When refining TLS, the output PDB file always has ANISOU records for the atoms involved in TLS groups. The anisotropic B-factor in the ANISOU records is the total B-factor (B_tls + B_individual). The isotropic equivalent B-factor in the ATOM records is the mean of the trace of the ANISOU matrix divided by 10000 and multiplied by 8*pi^2, and represents the isotropic equivalent of the total B-factor (B_tls + B_individual). To obtain the individual B-factors, one needs to compute the TLS component (B_tls) using the TLS records in the PDB file header and then subtract it from the total B-factors (in the ANISOU records).
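For example, following the recipe above, the isotropic equivalent recorded on an ATOM line can be recomputed from the ANISOU integers (which store the U values multiplied by 10^4):
B_eq = 8*pi^2 * ((U11 + U22 + U33)/3) / 10^4
and the individual component is then B_eq minus the B_tls contribution computed from the TLS records.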
● Refining group isotropic B-factors
1. One B-factor per residue:
% phenix.refine data.hkl model.pdb strategy=group_adp
Two B-factors per residue:
% phenix.refine data.hkl model.pdb strategy=group_adp \
group_adp_refinement_mode=two_adp_groups_per_residue
2. One isotropic B per selected group of atoms:
% phenix.refine data.hkl model.pdb strategy=group_adp \
group_adp_refinement_mode=group_selection \
adp.group="chain A" adp.group="chain B"
This will refine one isotropic B for chain A and one B for chain B.
The refinement of group isotropic B-factors in phenix.refine does not change the original distribution of B-factors within the group; that is, the differences between B-factors for atoms within the group remain constant, while only a total component added to all atoms of the given group is varied. Atoms with anisotropic ADP are allowed to be within the group.
● Refinement of individual ADP (isotropic, anisotropic)
By default, atoms in a PDB file with ANISOU records are refined as anisotropic and atoms without ANISOU records are refined as isotropic. This behavior can be changed with the appropriate keywords.
1. Default refinement of individual ADP:
% phenix.refine data.hkl model.pdb strategy=individual_adp
Note, atoms in input PDB file with ANISOU records will be refined as anisotropic and those without ANISOU - as isotropic.
2. Refinement of individual isotropic ADP for a model previously refined as anisotropic or TLS:
% phenix.refine data.hkl model.pdb strategy=individual_adp \
adp.individual.isotropic=all or equivalently:
% phenix.refine data.hkl model.pdb strategy=individual_adp \
convert_to_isotropic=true
All anisotropic atoms in input PDB file will be converted to isotropic before the refinement starts. Obviously, this may raise the R-factors.
3. Refinement of individual anisotropic ADP for a model previously refined as isotropic:
% phenix.refine data.hkl model.pdb strategy=individual_adp \
adp.individual.anisotropic="not element H"
This will refine all atoms as anisotropic except hydrogens.
4. Refinement of mixed model (some atoms are isotropic, some are anisotropic):
% phenix.refine data.hkl model.pdb strategy=individual_adp \
adp.individual.anisotropic="chain A and not element H" \
adp.individual.isotropic="chain B or element H"
In this example the atoms (except hydrogens, if any) in chain A will be refined as anisotropic and the atoms in chain B (and hydrogens, if any) will be refined as isotropic. Often it is desirable to refine the ADP of waters and hydrogens as isotropic and the other atoms as anisotropic:
% phenix.refine data.hkl model.pdb strategy=individual_adp \
adp.individual.anisotropic="not water and not element H" \
adp.individual.isotropic="water or element H"
Exactly the same command using slightly shorter selection syntax:
% phenix.refine data.hkl model.pdb strategy=individual_adp \
adp.individual.anisotropic="not (water or element H)" \
adp.individual.isotropic="water or element H"
5. To perform unrestrained individual ADP refinement (usually at ultra-high resolutions):
% phenix.refine data.hkl model.pdb strategy=individual_adp wu=0
This sets the contribution of the ADP restraints target to zero. However, it is still calculated for the statistics output.
● TLS refinement
1. Refinement of TLS parameters only (whole model as one TLS group):
% phenix.refine data.hkl model.pdb strategy=tls
2. Refinement of TLS parameters only (multiple TLS group):
% phenix.refine data.hkl model.pdb strategy=tls tls_group_selections.params
where, similar to rigid body or group B-factor refinement, the selection for TLS groups has been made in a user-created parameter file (tls_group_selections.params) as follows:
refinement.refine.adp {
tls = chain A
tls = chain B
}
Alternatively, the selection for the TLS groups can be made from the command line (see rigid body refinement for an example). Note: TLS parameters will be refined only for the selected fragments. This makes it possible, for example, to exclude the solvent molecules from the TLS groups.
3. A more complete approach is to perform combined TLS and individual or grouped isotropic ADP refinement:
% phenix.refine data.hkl model.pdb strategy=tls+individual_adp
or:
% phenix.refine data.hkl model.pdb strategy=tls+group_adp
This makes it possible to model the global (TLS) and local (individual) components of the total ADP and also to compensate for the model parts where the TLS parametrization does not fit well.
Occupancy refinement
Here is the list of facts that are important to know about occupancy refinement in phenix.refine:
● phenix.refine can perform the following types of occupancy refinement: individual (refinement of one occupancy factor per atom), group (refinement of one occupancy factor per group of selected atoms) and group constrained occupancy refinement. In individual and group occupancy refinement the refined occupancy values are constrained between main.occupancy_min and main.occupancy_max, which are 0 and 1 by default. In group constrained occupancy refinement there are (N-1) refinable occupancies per constrained group. An example of a constrained group could be a residue that has N alternative conformations (where N typically ranges between 2 and 4). In such a case all atoms within an alternative conformer will have equal occupancy values (0<=occupancy<=1) and the occupancies of the N conformers will sum to 1.
● The occupancy refinement is ON by default. This does not mean that the occupancies of all atoms will be refined. Based on the input PDB file, phenix.refine automatically finds which occupancies it will refine. If no user-defined selections are provided, phenix.refine will refine individual occupancies for all atoms that have partial occupancy values in the input PDB file (0<occupancy<1; atoms with zero occupancy values are not included). Atoms in alternative conformations will be automatically determined based on altLoc identifiers in the input PDB file and group constrained occupancy refinement for these atoms will be performed as well.
● Turning OFF the occupancy refinement can be done by removing the star (*) from the corresponding keyword in strategy = ... *occupancies ....
● If selections are provided (see examples below) then the occupancy refinement for the selected atoms will be performed in addition to those selected automatically (as described above).
● User-defined selections will override those defined by phenix.refine automatically. For example, if an atom is automatically selected for individual occupancy refinement, but the user defined a group of atoms for which one occupancy factor will be refined (group occupancy refinement), and this particular atom is within the group, then the individual occupancy will not be refined for this atom.
● The user can withhold the occupancy refinement for any atoms that were originally selected for occupancy refinement by default (automatically).
● The presence of user-defined selections for occupancies to be refined is not enough to engage the occupancy refinement. It is important that the occupancy refinement is selected in the strategy = keyword.
Examples:
1. Running with all default parameters:
% phenix.refine data.hkl model.pdb
This will refine individual coordinates, individual B-factors (isotropic or anisotropic) and occupancies for atoms in alternative conformations or atoms having partial occupancies. If there are no such atoms in the input PDB file, then no occupancies will be refined.
2. Refinement of occupancies only:
% phenix.refine data.hkl model.pdb strategy=occupancies
This will only refine occupancies for atoms in alternative conformations or atoms having partial occupancies. If there are no such atoms in the input PDB file, then no occupancies will be refined. Other model parameters, such as B-factors or coordinates, will not be refined (this is the only difference between this and the above refinement runs).
3. Refine individual occupancies of water molecules (in addition to atoms with partial occupancies and those in alternative conformations, if any):
% phenix.refine data.hkl model.pdb refine.occupancies.individual="water"
Similar refinement where in addition all Zn atoms in chain X will be refined:
% phenix.refine data.hkl model.pdb occupancies.individual="water" \
occupancies.individual="chain X and element Zn"
4. Complex occupancy refinement strategy (combination of various available occupancy refinement types):
% phenix.refine data.hkl model.pdb strategy=occupancies occ.params
The number of atom selections makes it inconvenient to type them all on the command line. This is why the parameter file occ.params is used; it contains the following lines:
refinement {
refine {
occupancies {
individual = element BR or water
individual = element Zn
constrained_group {
selection = chain A and resseq 1
}
constrained_group {
selection = chain A and resseq 2
selection = chain A and resseq 3
}
constrained_group {
selection = chain X and resname MAN
selection = chain X and resseq 42
selection = chain X and resseq 121
}
remove_selection = chain B and resseq 1 and name O
remove_selection = chain B and resseq 3 and name O
}
}
}
which defines:
● group occupancy refinement: one occupancy for all atoms in chain A and resseq 1 will be refined, constrained between main.occupancy_min and main.occupancy_max, which are by default 0 and 1, correspondingly.
● individual occupancies for all Zn and Br atoms, and for waters.
● group constrained occupancy refinement: in one group the occupancies of atoms in chain A and resseq 2 and chain A and resseq 3 will be coupled. All occupancies within chain A and resseq 2 will have the exact same value lying between 0 and 1, and the same for chain A and resseq 3. The sum of the occupancies of chain A and resseq 2 and chain A and resseq 3 will be 1.0, making this one constrained group.
● another constrained group contains three residues (numbers 42 and 121, and MAN), and their occupancies will be refined similarly, as described above.
● occupancies of atoms O in residues 1 and 3 of chain B will not be refined (even though these atoms have partial occupancies in the input PDB file and so they would normally be refined by default).
f' and f'' refinement
If the structure contains anomalous scatterers (e.g. Se in a SAD or MAD experiment), and if anomalous data are available, it is possible to refine the dispersive (f') and anomalous (f") scattering contributions (see e.g. Ethan Merritt's tutorial for more information). In phenix.refine, each group of scatterers with common f' and f" values is defined via an anomalous_scatterers scope, e.g.:
refinement.refine.anomalous_scatterers {
group {
selection = name BR
f_prime = 0
f_double_prime = 0
refine = *f_prime *f_double_prime
}
}
NOTE: The refinement of the f' and f" values is carried out only if group_anomalous is included under refine.strategy! Otherwise the values are simply used as specified but not refined. So the refinement run with the parameters above saved in group_anomalous_1.params is:
% phenix.refine model.pdb data_anom.hkl group_anomalous_1.params \
strategy=individual_sites+individual_adp+group_anomalous
If required, multiple scopes can be specified, one for each unique pair of f' and f" values. These values are assigned to all selected atoms (see below for atom selection details). Often it is possible to start the refinement from zero. If the refinement is not stable, it may be necessary to start from better estimates, or even to fix some values. For example (file group_anomalous_2.params):
refinement.refine.anomalous_scatterers {
group {
selection = name BR
f_prime = -5
f_double_prime = 2
refine = f_prime *f_double_prime
}
}
% phenix.refine model.pdb data_anom.hkl group_anomalous_2.params \
strategy=individual_sites+individual_adp+group_anomalous
Here f' is fixed at -5 (note the missing * in front of f_prime in the refine definition), and the refinement of f" is initialized at 2. The phenix.form_factor_query command is available for obtaining estimates of f' and f" given an element type and a wavelength, e.g.:
% phenix.form_factor_query element=Br wavelength=0.8
Information from Sasaki table about Br (Z = 35) at 0.8 A
fp: -1.0333
fdp: 2.9928
Run without arguments for usage information:
% phenix.form_factor_query
Using NCS restraints in refinement
phenix.refine can find NCS automatically or use NCS selections defined by the user. Gaps in the selected sequences are allowed; a sequence alignment is performed to detect insertions or deletions. We recommend checking the automatically detected or adjusted NCS groups.
1. Refinement with user-provided NCS selections. Create an ncs_groups.params file with the NCS selections:
refinement.ncs.restraint_group {
reference = chain A resid 1:4
selection = chain B and resid 1:3
selection = chain C
}
refinement.ncs.restraint_group {
reference = chain E
selection = chain F
}
Specify ncs_groups.params as an additional input when running phenix.refine:
% phenix.refine data.hkl model.pdb ncs_groups.params main.ncs=True
This will perform the default refinement round (individual coordinates and B-factors) using NCS restraints on coordinates and B-factors. Note: user-specified NCS restraints in ncs_groups.params can be modified automatically if a better selection is found. To disable this potential automatic adjustment:
% phenix.refine data.hkl model.pdb ncs_groups.params main.ncs=True \
ncs.find_automatically=False
2. Automatic detection of NCS groups:
% phenix.refine data.hkl model.pdb main.ncs=True
This will perform the default refinement round (individual coordinates and B-factors) using NCS restraints automatically created based on input PDB file.
Water picking
phenix.refine has a very efficient and fully automated protocol for water picking and refinement. One run of phenix.refine is normally enough to locate waters, refine them, select the good ones, add new ones and refine again, repeating the whole process multiple times. Normally, the default parameter settings are good for most cases:
% phenix.refine data.hkl model.pdb ordered_solvent=true
This will perform new water picking, analysis of existing waters and refinement of individual coordinates and B-factors for both the macromolecule and the waters. Several cycles will be performed, allowing spurious waters to be sorted out and well placed ones to be refined. Water picking can be combined with all other protocols, such as simulated annealing, TLS refinement, etc. Some useful commands are:
1. Perform water picking every macro-cycle. By default, water picking starts after half of the macro-cycles are done:
% phenix.refine data.hkl model.pdb ordered_solvent=true \
ordered_solvent.mode=every_macro_cycle
2. Remove water only (based on specified criteria):
% phenix.refine data.hkl model.pdb ordered_solvent=true \
ordered_solvent.mode=filter_only
3. The following run illustrates the use of some important parameters:
% phenix.refine data.hkl model.pdb ordered_solvent=true solvent.params
where the parameter file solvent.params contains:
refinement {
ordered_solvent {
low_resolution = 2.8
b_iso_min = 1.0
b_iso_max = 50.0
b_iso = 25.0
primary_map_type = mFobs-DFmodel
primary_map_cutoff = 3.0
secondary_map_type = 2mFobs-DFmodel
}
peak_search {
map_next_to_model {
min_model_peak_dist = 1.8
max_model_peak_dist = 6.0
min_peak_peak_dist = 1.8
}
}
}
This will skip water picking if the resolution of the data is lower than 2.8 A. It will remove waters with B < 1.0 or B > 50.0 A**2, with occupancy different from 1, or with a peak height in the mFo-DFc map lower than 3 sigma. It will not select, or will remove, an existing water if the water-water or water-macromolecule distance is less than 1.8 A or the water-macromolecule distance is greater than 6.0 A. The initial occupancies and B-factors of newly placed waters will be 1.0 and 25.0, correspondingly. If b_iso = None, then the mean atomic B-factor will be used instead.
Hydrogens in refinement
phenix.refine offers two possibilities for handling of hydrogen atoms:
● riding model;
● complete refinement of H (H atoms will be refined as other atoms in the model)
Although the contribution of hydrogen atoms to X-ray scattering is weak (at high resolution) or negligible (at lower resolutions), the H atoms are still present in real structures irrespective of the data quality. Including them as a riding model makes the other model atoms aware of their positions, preventing nonphysical (bad) contacts at no cost in terms of refinable parameters (= no risk of overfitting). In X-ray refinement at subatomic resolution (approx. < 1.0 A), or in refinement using neutron data, the parameters of H atoms may be refined as for the other, heavier atoms. Below are some useful commands:
1. To add hydrogens to a model one needs to run the Reduce program:
% phenix.reduce model.pdb > model_h_added.pdb
2. Once hydrogens are added to a model, by default they will be refined as a riding model:
% phenix.refine model.pdb data.hkl
It is possible to refine individual parameters for H atoms (if neutron data is used or at ultra-high resolution):
% phenix.refine model.pdb data.hkl hydrogens.refine=individual
3. To refine individual coordinates and ADP of H atoms:
% phenix.refine model.pdb data.hkl hydrogens.refine=individual
4. To remove hydrogens from a model:
% phenix.pdbtools model.pdb remove="element H"
We strongly recommend not removing hydrogen atoms after refinement, since doing so will make the refinement statistics (R-factors, etc.) unreproducible without repeating exactly the same refinement protocol.
5. Normally, phenix.reduce is used to add hydrogens. However, it may happen that phenix.reduce fails to add H to certain ligands. In this case phenix.elbow can be used to add hydrogens:
% phenix.elbow --final-geometry=model.pdb --residue=MAN --output=model_h
An output PDB file called model_h.pdb will contain the original ligand MAN with all hydrogen atoms added.
Refinement using twinned data
phenix.refine can handle the refinement of hemihedrally twinned data (two twin domains). Least-squares twin refinement can be carried out using the following command line instructions:
% phenix.refine data.hkl model.pdb twin_law="-k,-h,-l"
The twin law (in this case -k,-h,-l) can be obtained from phenix.xtriage. If more than a single twin law is possible for the given unit cell and space group, phenix.twin_map_utils might give clues as to which twin law is the most likely candidate to be used in refinement. Correcting maps for anisotropy might be useful:
% phenix.refine data.hkl model.pdb twin_law="-k,-h,-l" \
detwin.map_types.aniso_correct=true
The detwinning mode is auto by default: it will perform algebraic detwinning for twin fraction below 40%, and detwinning using proportionality rules (SHELXL style) for fractions above 40%. An important point to stress is that phenix.refine will only deal properly with twinning that involves two twin domains.
Neutron and joint X-ray and neutron refinement
Refinement using neutron data requires having H and/or D atoms added to the model. Use the Reduce program to add all potential H atoms:
% phenix.reduce model.pdb > model_h.pdb
Currently, adding D atoms will require editing of model_h.pdb file to replace H with D where necessary.
1. Running refinement with neutron data only:
% phenix.refine data.hkl model.pdb main.scattering_table=neutron
This tells phenix.refine that the data in the data.hkl file come from a neutron scattering experiment, and the appropriate scattering factors will be used in all calculations. All the examples and all the phenix.refine functionality presented in this document are valid and compatible with the use of neutron data.
2. Using X-ray and neutron data simultaneously (joint X/N refinement). phenix.refine allows simultaneous use of both data sets, X-ray and neutron. The data sets are allowed to have different numbers of reflections and to be collected at different resolutions. The only requirement (not enforced by the program, but the user's responsibility) is that both data sets have to be collected at the same temperature from the same crystals (or crystals grown in identical conditions, having identical space groups and unit cell parameters).
% phenix.refine model.pdb data_xray.hkl neutron_data.file_name=data_neutron.hkl \
input.xray_data.labels=FOBSx input.neutron_data.labels=FOBSn
Optimizing target weights
phenix.refine uses an automatic procedure to determine the weights between the X-ray target and the stereochemistry or ADP restraints. To optimize these weights (that is, to find those resulting in the lowest Rfree factors):
% phenix.refine data.hkl model.pdb optimize_wxc=true optimize_wxu=true
where optimize_wxc turns on the optimization of the X-ray/stereochemistry weight and optimize_wxu turns on the optimization of the X-ray/ADP weight. Note that this can be very slow, since the procedure involves a grid search over an array of candidate weights. It can be a good idea to run this overnight for a final model tune-up.
Refinement at high resolution (higher than approx. 1.0 Angstrom)
Guidelines for structure refinement at high resolution:
● make sure the model contains hydrogen atoms. If not, phenix.reduce can be used to add them:
% phenix.reduce model.pdb > model_h.pdb
By default, phenix.refine refines the positions of H atoms with a riding model (each H atom exactly follows the atom it is attached to). Note that phenix.refine can also refine individual coordinates of H atoms (useful for small molecules at ultra-high resolution or for refinement against neutron data). This is governed by the hydrogens.refine = individual *riding keyword, and the default is the riding model. hydrogens.refine_adp defines how the B-factors of hydrogens are refined (the default is to refine one group B for all H atoms). At high resolution one should definitely try the one_b_per_residue or even the individual choice (resolution permitting). A similar strategy should be used for refinement of H occupancies (the hydrogens.refine_occupancies keyword).
● most of the atoms should be refined with anisotropic ADP. Exceptions could be parts of the model with high B-factors, atoms in alternative conformations, hydrogens and solvent molecules. However, at resolutions higher than 1.0 A it is worth trying to refine the solvent with anisotropic ADP as well.
● it is a good idea to constantly monitor the existing solvent molecules and to check for new ones by using the ordered_solvent=true keyword. If waters are refined with anisotropic ADP, make sure that newly added ones are anisotropic as well: use ordered_solvent.new_solvent=anisotropic (the default is isotropic). One can also ask phenix.refine to refine the occupancies of waters: ordered_solvent.refine_occupancies=true (the default is False).
● at high resolution, alternative conformations can be visible for more than 20% of residues. phenix.refine automatically recognizes atoms in alternative conformations (based on PDB records) and by default performs constrained refinement of the occupancies of these atoms. Please note that phenix.refine does not build or create the fragments in alternative conformations; atoms in alternative conformations must be properly defined in the input PDB file (using conformer identifiers), if actually found in the structure.
● the default weights for stereochemical and ADP restraints are most likely too tight at this resolution, so the corresponding values will probably need to be relaxed. Use wxc_scale and wxu_scale for this; values of 1/2, 1/3, 1/4, etc. of the defaults should be tried. phenix.refine can optimize these values automatically (optimize_wxc=True and optimize_wxu=True); however, this is very slow, so it may be best saved for an overnight run or longer. At ultra-high resolution (approx. 0.8 A or higher), completely unrestrained refinement should definitely be tried for well-ordered parts of the model (single conformations, low B-factors).
● at ultra-high resolution the residual maps show the electron density redistribution due to bond formation as density peaks on interatomic bonds. phenix.refine has specific tools to model this density, called IAS models (Afonine et al., Acta Cryst. (2007). D63, 1194-1197).
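For the hydrogen and IAS options mentioned in the bullets above, a minimal command might look as follows (a sketch assembled from the hydrogens.* and main.ias keywords documented in the keyword list at the end of this document; suitable only at ultra-high resolution):
% phenix.refine data.hkl model_h.pdb hydrogens.refine_adp=one_b_per_residue \
  hydrogens.refine_occupancies=one_q_per_residue main.ias=true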
This example illustrates most of the above points:
% phenix.refine model_h.pdb data.hkl high_res.params
where the file high_res.params contains the following lines (for more parameters under each scope, see the complete list of parameters):
refinement.main {
  number_of_macro_cycles = 5
  ordered_solvent = true
}
refinement.refine {
  adp {
    individual {
      isotropic = element H
      anisotropic = not element H
    }
  }
}
refinement.target_weights {
  wxc_scale = 0.25
  wxu_scale = 0.3
}
refinement {
  ordered_solvent {
    mode = auto filter_only *every_macro_cycle
    new_solvent = isotropic *anisotropic
    refine_occupancies = True
  }
}
In the example above, phenix.refine will perform 5 macro-cycles with an ordered-solvent update (add/remove) every macro-cycle; all atoms, including newly added waters, will be refined with anisotropic B-factors (except hydrogens); the riding model will be used for positional refinement of H atoms; one occupancy and one isotropic B-factor will be refined for all hydrogens within a residue; occupancies of waters will be refined as well; and the default stereochemistry and ADP restraint weights are scaled down by factors of 0.25 and 0.3, respectively. If the starting model is far from the "final" one, more macro-cycles may be required (than the 5 used in this example).
Examples of frequently used refinement protocols, common problems
1. Starting refinement from high R-factors:
% phenix.refine data.hkl model.pdb ordered_solvent=true main.number_of_macro_cycles=10 \
  simulated_annealing=true strategy=rigid_body+individual_sites+individual_adp
Depending on the data resolution, refinement of individual ADPs may be replaced with grouped B refinement:
% phenix.refine data.hkl model.pdb ordered_solvent=true simulated_annealing=true \
strategy=rigid_body+individual_sites+group_adp main.number_of_macro_cycles=10
Adding TLS refinement may be a good idea. Note that, unlike other programs, phenix.refine does not require a "good model" for TLS refinement; TLS refinement is always stable in phenix.refine (please report if you notice otherwise):
% phenix.refine data.hkl model.pdb ordered_solvent=true simulated_annealing=true \
strategy=rigid_body+individual_sites+individual_adp+tls main.number_of_macro_cycles=10
If NCS is present, one can use it:
% phenix.refine data.hkl model.pdb ordered_solvent=true simulated_annealing=true \
strategy=rigid_body+individual_sites+individual_adp+tls main.ncs=true \
main.number_of_macro_cycles=10 tls_group_selections.params \
rigid_body_selections.params
where tls_group_selections.params and rigid_body_selections.params are files containing the TLS and rigid-body group selections; the NCS will be determined automatically from the input PDB file. See this document for details on how to specify these selections. Note: in the four examples above we re-defined the default number of refinement macro-cycles from 3 to 10, since a starting model with high R-factors most likely requires more cycles to become a good one. Also in these examples, the rigid-body refinement will be run only once, at the first macro-cycle; the water picking will start after half of the macro-cycles are done (after the 5th); and the SA will be done only twice, at the first and before the last macro-cycle. Even though it is requested, the water picking may not be performed if the resolution is too low. All these default behaviors can be changed: see the parameters' help for more details. The last command is too long to conveniently type on the command line. See this document for an example of how to reduce it to:
% phenix.refine data.hkl model.pdb custom_par_1.params
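As an illustration, custom_par_1.params (a hypothetical file name from the example above) could collect the main-scope settings of the long command, leaving the strategy and selection files on the command line; the keyword spellings below are the ones used elsewhere in this manual:
refinement.main {
  number_of_macro_cycles = 10
  simulated_annealing = True
  ordered_solvent = True
  ncs = True
}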
2. Refinement at "higher than medium" resolution: going anisotropic.
When refining at higher resolution one may consider:
● At resolutions around 1.8-1.7 A or higher, it is a good idea to try refinement of anisotropic ADPs for atoms in well-ordered parts of the model. Well-ordered parts can be identified by relatively small isotropic B-factors, ~5-20 A**2 or so.
● The riding model for H atoms should be used.
● Loosening the stereochemistry and ADP restraints.
● Re-thinking the use of NCS (if present): there may turn out to be enough data to drop the NCS restraints. Try both, with and without NCS, and decide the strategy based on the R-free values.
Supposing H atoms were added to the model, below is an example of what one may want to do at higher resolution:
% phenix.refine data.hkl model.pdb adp.individual.anisotropic="resid 1-2 and not element H" \
adp.individual.isotropic="not (resid 1-2 and not element H)" wxc_scale=2 wxu_scale=2
In the command above, phenix.refine will refine the ADPs of atoms in residues 1 to 2 as anisotropic; the rest (including all H atoms) will be isotropic; and the X-ray target contribution is increased for both coordinate and ADP refinement. IMPORTANT: please note the selection used in the above command: when selecting the atoms in residues 1 and 2 to be refined as anisotropic, one needs to exclude the hydrogens, which should be refined as isotropic.
3. Stereochemistry looks too tightly or too loosely restrained, or the gap between R-free and R-work seems too big: playing with the restraint contributions. Although the automatic calculation of the weight between the X-ray and stereochemistry or ADP restraint targets is good for most cases, it may happen that the rms deviations from ideal bond lengths or angles look too tight or too loose (depending on resolution), or that the difference between R-work and R-free is too big (significantly bigger than approx. 5%). In such cases one should definitely try loosening or tightening the restraints. Here is how, for coordinate refinement:
% phenix.refine data.hkl model.pdb wxc_scale=5
The default value for wxc_scale is 0.5. Increasing wxc_scale will make the X-ray target contribution greater and the restraints looser. Note: wxc_scale=0 will completely exclude the experimental data from the refinement, resulting in idealization of the stereochemistry. For stereochemistry idealization, use the separate command:
% phenix.geometry_minimization model.pdb
To see the options type:
% phenix.geometry_minimization --help
To play with ADP restraints contribution:
% phenix.refine data.hkl model.pdb wxu_scale=3
The default value for wxu_scale is 1.0. Increasing wxu_scale will make the X-ray target contribution greater and therefore the B-factor restraints weaker. Also, one can completely ignore the automatically determined weights (for both coordinate and ADP refinement) and use specific values instead:
% phenix.refine data.hkl model.pdb fix_wxc=15.0
The refinement target will be: Etotal = 15.0 * Exray + Egeom
Similarly for ADP refinement:
% phenix.refine data.hkl model.pdb fix_wxu=25.0
The refinement target will be: Etotal = 25.0 * Exray + Eadp
4. Having an item in the PDB file that is unknown to phenix.refine (a novel ligand, etc.). phenix.refine uses the CCP4 Monomer Library as the source of stereochemical information for building geometry restraints and reporting statistics. If phenix.refine is unable to match an item in the input PDB file against the Monomer Library, it will stop with a "Sorry" message explaining what to do and listing the problem atoms. If this happens, it is necessary to obtain a CIF file (a parameter file describing the unknown molecule), either by making it manually or by having the eLBOW program generate it:
% phenix.elbow model.pdb --do-all --output=all_ligands
This asks eLBOW to inspect the model.pdb file, find all unknown items in it, and create one CIF file, all_ligands.cif, covering them all. Alternatively, one can specify the three-letter name of the unknown residue:
% phenix.elbow model.pdb --residue=MAN --output=man
Once the CIF file is created, the new run of phenix.refine will be:
% phenix.refine model.pdb data.hkl man.cif
Consult the eLBOW documentation for more details.
Useful options
Changing the number of refinement cycles and minimizer iterations
% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=5 \
main.max_number_of_iterations=20
Creating R-free flags (if not present in the input reflection files)
% phenix.refine data.hkl model.pdb xray_data.r_free_flags.generate=True
It is important to understand that reflections selected for the test set must never be used in any refinement of any parameters. If the newly selected test reflections were previously used in refinement, the corresponding R-free statistics will be wrong; in that case a "refinement memory" removal procedure must be applied to recover proper statistics. To change the default maximum number of test flags to be generated and the fraction:
% phenix.refine data.hkl model.pdb xray_data.r_free_flags.generate=True \
xray_data.r_free_flags.fraction=0.05 xray_data.r_free_flags.max_free=500
Specify the name for output files
% phenix.refine data.hkl model.pdb output.prefix=lysozyme
Reflection output
At the end of refinement, a file with Fobs, Fmodel, Fcalc, Fmask, FOM and R-free flags can be written out (in MTZ format):
% phenix.refine data.hkl model.pdb export_final_f_model=mtz
To output the reflections in CNS reflection file format:
% phenix.refine data.hkl model.pdb export_final_f_model=cns
Note: Fmodel is the total model structure factor including all scales:
Fmodel = scale_k1 * exp(-h*U_overall*ht) * (Fcalc + k_sol * exp(-B_sol*s^2) * Fmask)
Setting the resolution range for the refinement
% phenix.refine data.hkl model.pdb xray_data.low_resolution=15.0 xray_data.high_resolution=2.0
Bulk solvent correction and anisotropic scaling
By default, phenix.refine always starts with bulk-solvent modeling and anisotropic scaling. Here is a list of commands that may be of use in some cases:
1. Perform bulk-solvent modeling and anisotropic scaling only:
% phenix.refine data.hkl model.pdb strategy=none
2. Bulk-solvent modeling only (no anisotropic scaling):
% phenix.refine data.hkl model.pdb strategy=none bulk_solvent_and_scale.anisotropic_scaling=false
3. Anisotropic scaling only (no bulk-solvent modeling):
% phenix.refine data.hkl model.pdb strategy=none bulk_solvent_and_scale.bulk_solvent=false
4. Turn off bulk-solvent modeling and anisotropic scaling:
% phenix.refine data.hkl model.pdb main.bulk_solvent_and_scale=false
5. Fixing bulk-solvent and anisotropic scale parameters to user defined values:
% phenix.refine data.hkl model.pdb bulk_solvent_and_scale.params
where bulk_solvent_and_scale.params is a file containing these lines:
refinement {
  bulk_solvent_and_scale {
    k_sol_b_sol_grid_search = False
    minimization_k_sol_b_sol = False
    minimization_b_cart = False
    fix_k_sol = 0.45
    fix_b_sol = 56.0
    fix_b_cart {
      b11 = 1.2
      b22 = 2.3
      b33 = 3.6
      b12 = 0.0
      b13 = 0.0
      b23 = 0.0
    }
  }
}
6. Mask parameters: bulk-solvent modeling involves a mask calculation. There are three principal parameters controlling it: solvent_radius, shrink_truncation_radius and grid_step_factor. Normally these parameters do not need to be changed, but they can be:
% phenix.refine data.hkl model.pdb refinement.mask.solvent_radius=1.0 \
refinement.mask.shrink_truncation_radius=1.0 refinement.mask.grid_step_factor=3
If one wants to gain some additional drop in the R-factors (somewhere between 0.0 and 1.0%), it is possible to run the fairly time-consuming (depending on structure size and resolution) procedure of mask parameter optimization:
% phenix.refine data.hkl model.pdb optimize_mask=true
This will perform the grid search for solvent_radius and shrink_truncation_radius and select the values giving the best R-factor.
By default, phenix.refine adds the isotropic component of the overall anisotropic scale matrix to the atomic B-factors, leaving the trace of the overall anisotropic scale matrix equal to zero. This is why the ADPs can be observed to change even though only anisotropic scaling was done and no ADP refinement was performed.
Default refinement with user specified X-ray target function
1. Refinement with least-squares target:
% phenix.refine data.hkl model.pdb main.target=ls
2. Refinement with maximum-likelihood target (default):
% phenix.refine data.hkl model.pdb main.target=ml
3. Refinement with phased maximum-likelihood target:
% phenix.refine data.hkl model.pdb main.target=mlhl
If phenix.refine finds Hendrickson-Lattman coefficients in the input reflection file, it will automatically switch to the mlhl target. To disable this:
% phenix.refine data.hkl model.pdb main.use_experimental_phases=false
Modifying the initial model before refinement starts
phenix.refine offers several options to modify input model before refinement starts:
1. shaking of coordinates (adding a random shift to coordinates):
% phenix.refine data.hkl model.pdb sites.shake=0.3
2. rotation-translation shift of coordinates:
% phenix.refine data.hkl model.pdb sites.rotate="1 2 3" sites.translate="4 5 6"
3. shaking of occupancies:
% phenix.refine data.hkl model.pdb occupancies.randomize=true
4. shaking of ADP:
% phenix.refine data.hkl model.pdb adp.randomize=true
5. shifting of ADP (adding a constant value):
% phenix.refine data.hkl model.pdb adp.shift_b_iso=10.0
6. scaling of ADP (multiplying by a constant value):
% phenix.refine data.hkl model.pdb adp.scale_adp=0.5
7. setting a value to ADP:
% phenix.refine data.hkl model.pdb adp.set_b_iso=25
8. converting to isotropic:
% phenix.refine data.hkl model.pdb adp.convert_to_isotropic=true
9. converting to anisotropic:
% phenix.refine data.hkl model.pdb adp.convert_to_anisotropic=true \
modify_start_model.selection="not element H"
When converting atoms to anisotropic, it is important to make sure that hydrogens (if present in the model) are not converted to anisotropic.
By default, the specified manipulations will be applied to all atoms. However, it is possible to apply them to only selected atoms:
% phenix.refine data.hkl model.pdb adp.set_b_iso=25 modify_start_model.selection="chain A"
To write out the modified model (without any refinement), add: main.number_of_macro_cycles=0, e.g.:
% phenix.refine data.hkl model.pdb adp.set_b_iso=25 \
main.number_of_macro_cycles=0
All the commands listed above, plus some more, are available from the phenix.pdbtools utility, which is in fact used internally by phenix.refine to perform these manipulations. For more information on phenix.pdbtools, type:
% phenix.pdbtools --help
Documentation on phenix.pdbtools is also available.
Refinement using FFT or direct structure factor calculation algorithm
% phenix.refine data.hkl model.pdb \
  structure_factors_and_gradients_accuracy.algorithm=fft
or:
% phenix.refine data.hkl model.pdb \
  structure_factors_and_gradients_accuracy.algorithm=direct
Ignoring test (free) flags in refinement
Sometimes one needs to use all reflections ("work" and "test") in the refinement; for example, at very low resolution where every single reflection counts, or at subatomic resolution where the risk of overfitting is very low. In the example below, all reflections are used in the refinement:
% phenix.refine data.hkl model.pdb xray_data.r_free_flags.ignore_r_free_flags=true
Note: 1) the corresponding statistics (R-factors, ...) will be identical for the "work" and "test" sets; 2) it is still necessary to have test flags present in the input reflection file (or generated automatically by phenix.refine).
Using phenix.refine to calculate structure factors
The total structure factor used in nearly all calculations in phenix.refine is defined as:
Fmodel = scale_k1 * exp(-h*U_overall*ht) * (Fcalc + k_sol * exp(-B_sol*s^2) * Fmask)
1. Calculate Fcalc from atomic model and output in MTZ file (no solvent modeling or scaling):
% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=0 \
main.bulk_solvent_and_scale=false export_final_f_model=mtz
2. Calculate Fcalc from atomic model including bulk solvent and all scales:
% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=1 \
strategy=none export_final_f_model=mtz
3. To output CNS/Xplor formatted reflection file:
% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=1 \
strategy=none export_final_f_model=cns
4. Resolution limits can be applied:
% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=1 \
strategy=none xray_data.low_resolution=15.0 xray_data.high_resolution=2.0
Note:
● The number of calculated structure factors will be the same as the number of observed data (Fobs) provided in the input reflection files, or less, since resolution and sigma cutoffs may be applied to the Fobs, or some Fobs may be removed automatically by the outlier detection procedure.
● The set of calculated structure factors has the same completeness as the set of provided Fobs.
Scattering factors
There are four choices for the scattering table to be used in phenix.refine:
● wk1995: Waasmaier & Kirfel table;
● it1992: International Tables for Crystallography (1992)
● n_gaussian: dynamic n-gaussian approximation
● neutron: table for neutron scattering
The default is n_gaussian. To switch to a different table:
% phenix.refine data.hkl model.pdb main.scattering_table=neutron
Suppressing the output of certain files
The following command will tell phenix.refine not to write .eff, .geo, .def, map and map-coefficient files:
% phenix.refine data.hkl model.pdb write_eff_file=false write_geo_file=false \
write_def_file=false write_maps=false write_map_coefficients=false
The only output will be the .log and .pdb files.
Random seed
To change random seed:
% phenix.refine data.hkl model.pdb main.random_seed=7112384
The results of certain refinement protocols, such as restrained refinement of coordinates (with SA or LBFGS minimization), are sensitive to the random seed. This is because: 1) for SA, the refinement starts with a random assignment of velocities to atoms; 2) the X-ray/geometry target weight calculation involves shaking the model with some Cartesian dynamics. As a result, running such refinement jobs with exactly the same parameters but different random seeds will produce different refinement statistics. The author's experience includes a case where the difference in R-factors between two SA runs was about 2.0%. This also opens the possibility of performing multi-start SA refinement to create an ensemble of models that on average differ only slightly, but that sometimes contain significant variations in certain parts.
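As an illustration of the multi-start idea, several SA runs can be driven from a simple shell loop; this is a sketch (sh syntax; the seeds and output prefixes are arbitrary) built from the simulated_annealing, main.random_seed and output.prefix keywords documented in this manual:
# run three SA refinements that differ only in the random seed
for seed in 101 102 103; do
  phenix.refine data.hkl model.pdb simulated_annealing=true \
    main.random_seed=$seed output.prefix=sa_run_$seed
done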
Electron density maps
By default, phenix.refine outputs two likelihood-weighted maps: 2mFo-DFc and mFo-DFc. The user can also choose between likelihood-weighted and regular maps with any specified coefficients, for example: 2mFo-DFc, 2.7mFo-1.3DFc, Fo-Fc, 3Fo-2Fc. The result is output in ASCII X-PLOR format. A reflection file with the map coefficients is also generated for use in Coot or XtalView. The example below illustrates the main options:
% phenix.refine data.hkl model.pdb map.params
where map.params contains:
refinement {
  electron_density_maps {
    map {
      mtz_label_amplitudes = 2FOFCWT
      mtz_label_phases = PH2FOFCWT
      likelihood_weighted = True
      obs_factor = 2
      calc_factor = 1
    }
    map {
      mtz_label_amplitudes = FOFCWT
      mtz_label_phases = PHFOFCWT
      likelihood_weighted = True
      obs_factor = 1
      calc_factor = 1
    }
    map {
      mtz_label_amplitudes = 3FO2FCWT
      mtz_label_phases = PH3FO2FCWT
      likelihood_weighted = False
      obs_factor = 3
      calc_factor = 2
    }
    grid_resolution_factor = 1/4.
    region = *selection cell
    atom_selection = name CA or name N or name C
    apply_sigma_scaling = False
    apply_volume_scaling = True
  }
}
This will output three map files containing the mFo-DFc, 2mFo-DFc and 3Fo-2Fc maps. All maps will be on an absolute scale (in e/A**3). The map grid step will be (data resolution) * grid_resolution_factor, and the maps will be output around the main-chain atoms. If atom_selection is set to None or all, the maps will be computed for all atoms. The corresponding MTZ file will also contain the map coefficients for these three maps.
Refining with anomalous data (or what phenix.refine does with Fobs+ and Fobs-)
The way phenix.refine uses Fobs+ and Fobs- is controlled by the xray_data.force_anomalous_flag_to_be_equal_to parameter. There are three possibilities:
1. Default behavior: phenix.refine will use all Fobs: Fobs+ and Fobs- as independent reflections:
% phenix.refine model.pdb data_anom.hkl
2. phenix.refine will generate missing Bijvoet mates and use all Fobs+ and Fobs- as independent reflections if:
% phenix.refine model.pdb data_anom.hkl xray_data.force_anomalous_flag_to_be_equal_to=true
3. phenix.refine will merge Fobs+ and Fobs-, that is instead of two separate Fobs+ and Fobs- it will use one value F_mean = (Fobs+ + Fobs-)/2 if:
% phenix.refine model.pdb data_anom.hkl xray_data.force_anomalous_flag_to_be_equal_to=false
See this documentation to learn how to use and refine f' and f''.
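As a sketch of what such input can look like, the anomalous_scatterers scope from the keyword list at the end of this document can be placed in a parameter file (the group_anomalous strategy option must also be selected); the selection and the starting f'/f'' values below are placeholders to be replaced with values appropriate for the experiment:
refinement.refine {
  anomalous_scatterers {
    group {
      selection = element Se  # placeholder selection
      f_prime = -8.0          # placeholder starting value
      f_double_prime = 4.5    # placeholder starting value
      refine = f_prime f_double_prime
    }
  }
}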
Rejecting reflections by sigma
Reflections can be rejected by a sigma cutoff criterion applied to the amplitudes (Fobs <= sigma_fobs_rejection_criterion * sigma(Fobs)):
% phenix.refine model.pdb data_anom.hkl xray_data.sigma_fobs_rejection_criterion=2
and/or to the intensities (Iobs <= sigma_iobs_rejection_criterion * sigma(Iobs)):
% phenix.refine model.pdb data_anom.hkl xray_data.sigma_iobs_rejection_criterion=2
Internally, phenix.refine uses amplitudes. If both sigma_fobs_rejection_criterion and sigma_iobs_rejection_criterion are given non-zero values, both criteria will be applied: first to the Iobs, then to the Fobs (after the truncated Iobs have been converted to Fobs):
% phenix.refine model.pdb data_anom.hkl xray_data.sigma_fobs_rejection_criterion=2 \
xray_data.sigma_iobs_rejection_criterion=2
By default, both sigma_fobs_rejection_criterion and sigma_iobs_rejection_criterion are set to zero (no reflections rejected) and, unless strongly motivated, we encourage you not to change these values. If amplitudes are provided as input, sigma_iobs_rejection_criterion is ignored.
Developer's tools
phenix.refine offers broad functionality for experimenting that may not be useful in everyday practice but is handy for testing ideas.
Substitute input Fobs with calculated Fcalc, shake the model and refine it
Instead of using the Fobs from the input data file, one can ask phenix.refine to use structure factors Fcalc calculated from the input model. Obviously, the R-factors will then be zero throughout the refinement. One can also shake various model parameters (see this document for details); the refinement will then start with some bad statistics (big R-factors at least) and will hopefully converge back to the unmodified start model (if it was not shaken too much). It is also possible to simulate a flat bulk-solvent model contribution and anisotropic scaling:
% phenix.refine model.pdb data.hkl experiment.params
where experiment.params contains the following:
refinement {
  main {
    fake_f_obs = True
  }
  modify_start_model {
    selection = "chain A"
    sites {
      shake = 0.5
    }
  }
  fake_f_obs {
    k_sol = 0.35
    b_sol = 45.0
    b_cart = 1.25 3.78 1.25 0.0 0.0 0.0
    scale = 358.0
  }
}
In this example, the input Fobs will be substituted with the same number of Fcalc values (the absolute values of Fcalc), then the coordinates of the structure will be shaken to achieve an rmsd of 0.5, and finally a default refinement run will be performed. The bulk-solvent contribution, the anisotropic scale and the overall scalar scale are also applied to the Fcalc thus obtained, in accordance with the Fmodel definition (see this document for the definition of the total structure factor, Fmodel). Expected refinement behavior: the R-factors will drop from something big to zero.
CIF modifications and links
phenix.refine uses the CCP4 monomer library to build geometry restraints (bond, angle, dihedral, chirality and planarity restraints). The CCP4 monomer library comes with a set of "modifications" and "links", which are defined in the file mon_lib_list.cif. Some of these are used automatically when phenix.refine builds the geometry restraints (e.g. the peptide and RNA/DNA chain links). Other links and modifications have to be applied manually, e.g. (cif_modification.params file):
refinement.pdb_interpretation.apply_cif_modification {
  data_mod = 5pho
  residue_selection = resname GUA and name O5T
}
Here a custom 5pho modification is applied to all GUA residues with an O5T atom, i.e. the modification can be applied to multiple residues with a single apply_cif_modification block. The CIF modification itself is supplied as a separate file on the phenix.refine command line, e.g. (data_mod_5pho.cif file):
data_mod_5pho
#
loop_
_chem_mod_atom.mod_id
_chem_mod_atom.function
_chem_mod_atom.atom_id
_chem_mod_atom.new_atom_id
_chem_mod_atom.new_type_symbol
_chem_mod_atom.new_type_energy
_chem_mod_atom.new_partial_charge
5pho add . O5T O OH .
loop_
_chem_mod_bond.mod_id
_chem_mod_bond.function
_chem_mod_bond.atom_id_1
_chem_mod_bond.atom_id_2
_chem_mod_bond.new_type
_chem_mod_bond.new_value_dist
_chem_mod_bond.new_value_dist_esd
5pho add O5T P coval 1.520 0.020
The whole command will be:
% phenix.refine model_o5t.pdb data.hkl data_mod_5pho.cif cif_modification.params
Similarly, a link can be applied like this (cif_link.params file):
refinement.pdb_interpretation.apply_cif_link {
  data_link = MAN-THR
  residue_selection_1 = chain X and resname MAN and resid 900
  residue_selection_2 = chain X and resname THR and resid 42
}
% phenix.refine model.pdb data.hkl cif_link.params
The residue selections for links must each select exactly one residue. The MAN-THR link is pre-defined in mon_lib_list.cif. Custom links can be supplied as additional files on the phenix.refine command line; see mon_lib_list.cif for examples. The full path to this file can be obtained with the command:
% phenix.where_mon_lib_list_cif
All apply_cif_modification and apply_cif_link definitions will be included in the .def files, i.e. it is not necessary to specify the definitions again if further refinement runs are started from .def files. Note that all LINK, SSBOND, HYDBND, SLTBRG and CISPEP records in the input PDB files are ignored.
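For example, if a first run was started with output.prefix=lysozyme as shown earlier, a follow-up run can be started directly from the resulting .def file (the exact file name is controlled by the output.prefix, serial and serial_format keywords; lysozyme_001.def is an assumed name here):
% phenix.refine lysozyme_001.def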
Definition of custom bonds and angles
Most geometry restraints (bonds, angles, etc.) are generated automatically based on the CCP4 monomer library. Additional custom bond and angle restraints, e.g. between protein and a ligand or ion, can be specified in this way:
refinement.geometry_restraints.edits {
  zn_selection = chain X and resname ZN and resid 200 and name ZN
  his117_selection = chain X and resname HIS and resid 117 and name NE2
  asp130_selection = chain X and resname ASP and resid 130 and name OD1
  bond {
    action = *add
    atom_selection_1 = $zn_selection
    atom_selection_2 = $his117_selection
    distance_ideal = 2.1
    sigma = 0.02
    slack = None
  }
  bond {
    action = *add
    atom_selection_1 = $zn_selection
    atom_selection_2 = $asp130_selection
    distance_ideal = 2.1
    sigma = 0.02
    slack = None
  }
  angle {
    action = *add
    atom_selection_1 = $his117_selection
    atom_selection_2 = $zn_selection
    atom_selection_3 = $asp130_selection
    angle_ideal = 109.47
    sigma = 5
  }
}
The atom selections must uniquely select a single atom. Save the geometry_restraints.edits to a file and specify the file name as an additional argument when running phenix.refine for the first time. For example:
% phenix.refine model.pdb data.hkl restraints_edits.params
The edits will be included in the .def files, i.e. it is not necessary to specify them again manually if further refinement runs are started from .def files. The bond.slack parameter above can be used to disable a bond restraint within the slack tolerance around distance_ideal. This is useful for hydrogen bond restraints, or when refining with very high-resolution data (e.g. better than 1 A). The bond restraint is activated only if the discrepancy between the model bond distance and distance_ideal is greater than the slack value. The slack is subtracted from the discrepancy. The resulting potential is called a "square-well potential" by some authors. The formula for the contribution to the refinement target function is:
  weight * delta_slack**2
with:
  delta_slack = sign(delta) * max(0, (abs(delta) - slack))
  delta = distance_ideal - distance_model
  weight = 1 / sigma**2
The slack value must be greater than or equal to zero (it can also be None, which is equivalent to zero in this case).
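A short worked example: with distance_ideal = 2.9, slack = 0.3 and sigma = 0.1 (so weight = 1/0.1**2 = 100), a model distance of 3.1 gives delta = -0.2; since abs(delta) < slack, delta_slack = 0 and the restraint contributes nothing. A model distance of 3.4 gives delta = -0.5, delta_slack = -0.2, and a contribution of 100 * 0.2**2 = 4.0 to the target.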
Atom selection examples
All atoms:
  all
All C-alpha atoms (not case sensitive):
  name ca
All atoms with H in the name (* is a wildcard character):
  name *H*
Atom names containing * (a backslash disables the wildcard function):
  name o2\*
Atom names with spaces:
  name 'O 1'
Atom names with primes do not necessarily have to be quoted:
  name o2'
Boolean and, or and not:
  resname ALA and (name ca or name c or name n or name o)
  chain a and not altid b
  resid 120 and icode c and model 2
  segid a and element c and charge 2+ and anisou
Residue 188:
  resseq 188
resid is a synonym for resseq:
  resid 188
Note that if there are several chains containing residue number 188, all of them will be selected. To select residue 188 in one particular chain only:
  chain A and resid 188
Residues 2 through 10 (including 2 and 10):
  resseq 2:10
"Smart" selections:
  resname ALA and backbone
  resname ALA and sidechain
  peptide backbone
  rna backbone or dna backbone
  water or nucleotide
  dna and not (phosphate or ribose)
  within(5, (nucleotide or peptide) backbone)
Depositing refined structure with PDB
phenix.refine reports comprehensive statistics in the PDB file header of the refined model. These statistics consist of two parts: the first (upper) part, formatted with REMARK records, is relevant to the current refinement run and contains information about the input data and model files, a time stamp, start and final R-factors, refinement statistics from macro-cycle to macro-cycle, etc. The second (lower) part, formatted with REMARK 3 records, is abstracted from the particular refinement run (no intermediate statistics, time, or file names). This second part is what should go to the PDB; the first part should be removed manually before deposition.
Referencing phenix.refine
Afonine, P.V., Grosse-Kunstleve, R.W. & Adams, P.D. (2005). CCP4 Newsl. 42, contribution 8.
Relevant reading
Below is the list of papers either published in connection with phenix.refine or used to implement specific features in phenix.refine:
1. Maximum-likelihood in structure refinement:
❍ V.Yu. Lunin & T.P. Skovoroda. Acta Cryst. (1995). A51, 880-887. "R-free likelihood-based estimates of errors for phases calculated from atomic models"
❍ Pannu, N.S., Murshudov, G.N., Dodson, E.J. & Read, R.J. (1998). Acta Cryst. D54, 1285-1294. "Incorporation of Prior Phase Information Strengthens Maximum-Likelihood Structure Refinement"
❍ V.Y. Lunin, P.V. Afonine & A.G. Urzhumtsev. Acta Cryst. (2002). A58, 270-282. "Likelihood-based refinement. I. Irremovable model errors"
❍ P. Afonine, V.Y. Lunin & A. Urzhumtsev. J. Appl. Cryst. (2003). 36, 158-159. "MLMF: least-squares approximation of likelihood-based refinement criteria"
2. ADP:
❍ V. Schomaker & K.N. Trueblood. Acta Cryst. (1968). B24, 63-76. "On the rigid-body motion of molecules in crystals"
❍ F.L. Hirshfeld. Acta Cryst. (1976). A32, 239-244. "Can X-ray data distinguish bonding effects from vibrational smearing?"
❍ T.R. Schneider. Proceedings of the CCP4 Study Weekend (E. Dodson, M. Moore, A. Ralph & S. Bailey, eds.), SERC Daresbury Laboratory, Daresbury, U.K., pp. 133-144 (1996). "What can we Learn from Anisotropic Temperature Factors?"
❍ M.D. Winn, M.N. Isupov & G.N. Murshudov. Acta Cryst. (2001). D57, 122-133. "Use of TLS parameters to model anisotropic displacements in macromolecular refinement"
❍ R.W. Grosse-Kunstleve & P.D. Adams. J. Appl. Cryst. (2002). 35, 477-480. "On the handling of atomic anisotropic displacement parameters"
❍ P. Afonine & A. Urzhumtsev. (2007). CCP4 Newsletter on Protein Crystallography, 45, contribution 6. "On determination of T matrix in TLS modeling"
3. Rigid body refinement:
❍ Afonine, P.V., Grosse-Kunstleve, R.W., Adams, P.D. & Urzhumtsev, A.G. "Methods for optimal rigid body refinement of models with large displacements" (in preparation for Acta Cryst. D).
4. Bulk-solvent modeling and anisotropic scaling:
❍ S. Sheriff & W.A. Hendrickson. Acta Cryst. (1987). A43, 118-121. "Description of overall anisotropy in diffraction from macromolecular crystals"
❍ Jiang, J.-S. & Brünger, A.T. (1994). J. Mol. Biol. 243, 100-115. "Protein hydration observed by X-ray diffraction. Solvation properties of penicillopepsin and neuraminidase crystal structures."
❍ A. Fokine & A. Urzhumtsev. Acta Cryst. (2002). D58, 1387-1392. "Flat bulk-solvent model: obtaining optimal parameters"
❍ P.V. Afonine, R.W. Grosse-Kunstleve & P.D. Adams. Acta Cryst. (2005). D61, 850-855. "A robust bulk-solvent correction and anisotropic scaling procedure"
5. Refinement at subatomic resolution:
❍ Afonine, P.V., Pichon-Pesme, V., Muzet, N., Jelsch, C., Lecomte, C. & Urzhumtsev, A. (2002). CCP4 Newsletter on Protein Crystallography, 41. "Modeling of bond electron density"
❍ Afonine, P.V., Lunin, V., Muzet, N. & Urzhumtsev, A. (2004). Acta Cryst. D60, 260-274. "On the possibility of observation of valence electron density for individual bonds in proteins in conventional difference maps"
❍ P.V. Afonine, R.W. Grosse-Kunstleve, P.D. Adams, V.Y. Lunin & A. Urzhumtsev. "On macromolecular refinement at subatomic resolution with interatomic scatterers" (submitted to Acta Cryst. D).
6. LBFGS minimization:
❍ Liu, D.C. & Nocedal, J. (1989). Mathematical Programming, 45, 503-528. "On the limited memory BFGS method for large scale optimization"
7. Dynamics, simulated annealing:
❍ Brünger, A.T., Kuriyan, J. & Karplus, M. (1987). Science 235, 458-460. "Crystallographic R factor refinement by molecular dynamics"
❍ Adams, P.D., Pannu, N.S., Read, R.J. & Brünger, A.T. (1997). Proc. Natl. Acad. Sci. 94, 5018-5023. "Cross-validated maximum likelihood enhances crystallographic simulated annealing refinement"
❍ L.M. Rice, Y. Shamoo & A.T. Brünger. J. Appl. Cryst. (1998). 31, 798-805. "Phase Improvement by Multi-Start Simulated Annealing Refinement and Structure-Factor Averaging"
❍ Brünger, A.T. & Adams, P.D. (2002). Acc. Chem. Res. 35, 404-412. "Molecular dynamics applied to X-ray structure refinement"
8. Target weights calculation:
❍ Brünger, A.T., Karplus, M. & Petsko, G.A. (1989). Acta Cryst. A45, 50-61. "Crystallographic refinement by simulated annealing: application to crambin"
❍ Brünger, A.T. (1992). Nature (London), 355, 472-474. "The free R value: a novel statistical quantity for assessing the accuracy of crystal structures"
❍ Adams, P.D., Pannu, N.S., Read, R.J. & Brünger, A.T. (1997). Proc. Natl. Acad. Sci. 94, 5018-5023. "Cross-validated maximum likelihood enhances crystallographic simulated annealing refinement"
9. Electron density maps (Fourier syntheses) calculation:
❍ A.G. Urzhumtsev, T.P. Skovoroda & V.Y. Lunin. J. Appl. Cryst. (1996). 29, 741-744. "A procedure compatible with X-PLOR for the calculation of electron-density maps weighted using an R-free-likelihood approach"
10. Monomer Library:
❍ Vagin, A.A., Steiner, R.A., Lebedev, A.A., Potterton, L., McNicholas, S., Long, F. & Murshudov, G.N. (2004). Acta Cryst. D60, 2184-2195. "REFMAC5 dictionary: organization of prior chemical knowledge and guidelines for its use"
11. Scattering factors:
❍ D. Waasmaier & A. Kirfel. Acta Cryst. (1995). A51, 416-431. "New analytical scattering-factor functions for free atoms and ions"
❍ International Tables for Crystallography (1992)
❍ Neutron News, Vol. 3, No. 3, 1992, pp. 29-37. http://www.ncnr.nist.gov/resources/n-lengths/list.html
❍ Grosse-Kunstleve, R.W., Sauter, N.K. & Adams, P.D. Newsletter of the IUCr Commission on Crystallographic Computing 2004, 3, 22-31. "cctbx news"
12. Neutron and joint X-ray/neutron refinement:
❍ A. Wlodawer & W.A. Hendrickson. Acta Cryst. (1982). A38, 239-247. "A procedure for joint refinement of macromolecular structures with X-ray and neutron diffraction data from single crystals"
❍ A. Wlodawer, H. Savage & G. Dodson. Acta Cryst. (1989). B45, 99-107. "Structure of insulin: results of joint neutron and X-ray refinement"
13. Stereochemical restraints:
❍ Grosse-Kunstleve, R.W., Afonine, P.V. & Adams, P.D. (2004). Newsletter of the IUCr Commission on Crystallographic Computing, 4, 19-36. "cctbx news: Geometry restraints and other new features"
14. Parameters parsing and interpretation:
❍ Grosse-Kunstleve, R.W., Afonine, P.V., Sauter, N.K. & Adams, P.D. Newsletter of the IUCr Commission on Crystallographic Computing 2005, 5, 69-91. "cctbx news: Phil and friends"
Feedback, more information
● Send bug reports to: [email protected]
● For help, write to: [email protected]
● Questions: [email protected]
● More information: www.phenix-online.org, or type:
% phenix.about
List of all refinement keywords
-------------------------------------------------------------------------------
Legend:
black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
refinement
Scope of parameters for structure refinement with phenix.refine
crystal_symmetry
Scope of space group and unit cell parameters
unit_cell= None
space_group= None
input
Scope of input file names, labels, processing directions
symmetry_safety_check= *error warning Check for consistency of crystal
symmetry from model and data files
pdb
file_name= None Model file(s) name (PDB)
neutron_data
Scope of neutron data and neutron free-R flags
ignore_xn_free_r_mismatch= False
file_name= None
labels= None
high_resolution= None
low_resolution= None
outliers_rejection= True
sigma_fobs_rejection_criterion= 0.0
sigma_iobs_rejection_criterion= 0.0
ignore_all_zeros= True
force_anomalous_flag_to_be_equal_to= None
r_free_flags
file_name= None This is normally the same as the file containing
Fobs and is usually selected automatically.
label= None
test_flag_value= None This value is usually selected automatically
- do not change unless you really know what
you're doing!
disable_suitability_test= False
ignore_pdb_hexdigest= False If True, disables safety check based
on MD5 hexdigests stored in PDB files
produced by previous runs.
ignore_r_free_flags= False Use all reflections in refinement (work
and test)
generate= False Generate R-free flags (if not available in input
files)
fraction= 0.1
max_free= 2000
lattice_symmetry_max_delta= 5
use_lattice_symmetry= True
xray_data
Scope of X-ray data and free-R flags
file_name= None
labels= None
high_resolution= None
low_resolution= None
outliers_rejection= True
sigma_fobs_rejection_criterion= 0.0
sigma_iobs_rejection_criterion= 0.0
ignore_all_zeros= True
force_anomalous_flag_to_be_equal_to= None
r_free_flags
file_name= None This is normally the same as the file containing
Fobs and is usually selected automatically.
label= None
test_flag_value= None This value is usually selected automatically
- do not change unless you really know what
you're doing!
disable_suitability_test= False
ignore_pdb_hexdigest= False If True, disables safety check based
on MD5 hexdigests stored in PDB files
produced by previous runs.
ignore_r_free_flags= False Use all reflections in refinement (work
and test)
generate= False Generate R-free flags (if not available in input
files)
fraction= 0.1
max_free= 2000
lattice_symmetry_max_delta= 5
use_lattice_symmetry= True
experimental_phases
Scope of experimental phase information (HL
coefficients)
file_name= None
labels= None
monomers
Scope of monomers information (CIF files)
file_name= None Monomer file(s) name (CIF)
output
Scope for output files
prefix= None Prefix for all output files
serial= None Serial number for consecutive refinement runs
serial_format= "%03d" Format serial number in output file name
write_eff_file= True
write_geo_file= True
write_def_file= True
export_final_f_model= mtz cns Write Fobs, Fmodel, various scales and
more to MTZ or CNS file
write_maps= False
write_map_coefficients= True
electron_density_maps
Electron density maps calculation parameters
map_format= *xplor
map_coefficients_format= *mtz phs
suppress= None List of mtz_label_amplitudes of maps to be suppressed.
Intended to selectively suppress computation and writing of
the standard maps.
grid_resolution_factor= 1/4
region= *selection cell
atom_selection= None
atom_selection_buffer= 3
apply_sigma_scaling= True
apply_volume_scaling= False
map
mtz_label_amplitudes= None
mtz_label_phases= None
likelihood_weighted= None
obs_factor= None
calc_factor= None
kicked= False
fill_missing_f_obs_with_weighted_f_model= True
map
mtz_label_amplitudes= 2FOFCWT
mtz_label_phases= PH2FOFCWT
likelihood_weighted= True
obs_factor= 2
calc_factor= 1
kicked= False
fill_missing_f_obs_with_weighted_f_model= True
map
mtz_label_amplitudes= FOFCWT
mtz_label_phases= PHFOFCWT
likelihood_weighted= True
obs_factor= 1
calc_factor= 1
kicked= False
fill_missing_f_obs_with_weighted_f_model= True
map
mtz_label_amplitudes= 2FOFCWT_no_fill
mtz_label_phases= PH2FOFCWT_no_fill
likelihood_weighted= True
obs_factor= 2
calc_factor= 1
kicked= False
fill_missing_f_obs_with_weighted_f_model= False
map
mtz_label_amplitudes= FOFCWT_no_fill
mtz_label_phases= PHFOFCWT_no_fill
likelihood_weighted= True
obs_factor= 1
calc_factor= 1
kicked= False
fill_missing_f_obs_with_weighted_f_model= False
anomalous_difference_map
mtz_label_amplitudes= ANOM
mtz_label_phases= PHANOM
refine
Scope of refinement flags (=flags defining what to refine) and atom
selections (=atoms to be refined)
strategy= *individual_sites rigid_body *individual_adp group_adp tls
*occupancies group_anomalous Atomic parameters to be refined
sites
Scope of atom selections for coordinates refinement
individual= None Atom selections for individual atoms
rigid_body= None Atom selections for rigid groups
adp
Scope of atom selections for ADP (Atomic Displacement Parameters)
refinement
group_adp_refinement_mode= *one_adp_group_per_residue
two_adp_groups_per_residue group_selection
Select one of three available modes for
group B-factors refinement. For two groups
per residue, the groups will be main-chain
and side-chain atoms. Provide selections
for groups if group_selection is chosen.
group= None One isotropic ADP will be refined for the group of atoms
selected here
one_adp_group_per_residue= True Refine one isotropic ADP per residue
two_adp_groups_per_residue= False Refine two isotropic ADPs per residue
(main-chain and side-chain)
tls= None Selection(s) for TLS group(s)
individual
Scope of atom selections for refinement of individual ADP
isotropic= None Selections for atoms to be refined with
isotropic ADP
anisotropic= None Selections for atoms to be refined with
anisotropic ADP
occupancies
Scope of atom selections for occupancy refinement
individual= None Selection(s) for individual atoms. None is default
which is to refine the individual occupancies for atoms
in alternative conformations or for atoms with partial
occupancies only.
remove_selection= None Occupancies of selected atoms will not be
refined (even though they might satisfy the default
criteria for occupancy refinement).
constrained_group
Selections to define constrained occupancies. If
only one selection is provided, one occupancy
factor for the selected atoms will be refined, and it
will be constrained between predefined max and min
values.
selection= None Atom selection string.
anomalous_scatterers
group
selection= None
f_prime= 0
f_double_prime= 0
refine= *f_prime *f_double_prime
main
Scope for most common and frequently used parameters
bulk_solvent_and_scale= True Do bulk solvent correction and anisotropic
scaling
simulated_annealing= False Do simulated annealing
ordered_solvent= False Add (or/and remove) and refine ordered solvent
molecules (water)
ncs= False Use NCS restraints in refinement (can be determined
automatically)
ias= False Build and use IAS (interatomic scatterers) model (at
resolutions higher than approx. 0.9 A)
number_of_macro_cycles= 3 Number of macro-cycles to be performed
max_number_of_iterations= 25
use_form_factor_weights= False
tan_u_iso= False Use tan() reparameterization in ADP refinement
(currently disabled)
use_convergence_test= False Determine if refinement converged and stop
then
target= *ml mlhl ml_sad ls Choices for refinement target
min_number_of_test_set_reflections_for_max_likelihood_target= 50 Minimum
number of test reflections required for use of the ML target
max_number_of_resolution_bins= 30
reference_xray_structure= None
use_experimental_phases= None Use experimental phases if available. If
true, the target function must be set to mlhl .
compute_optimal_errors= False
random_seed= 2679941 Random seed
scattering_table= wk1995 it1992 *n_gaussian neutron Choices of
scattering table for structure factors calculations
use_normalized_geometry_target= True
target_weights_only= False Calculate target weights only and exit
refinement
use_f_model_scaled= False Use Fmodel structure factors multiplied by
overall scale factor scale_k1
max_d_min= 0.25
Highest allowable resolution limit for refinement
fake_f_obs= False Substitute real experimental Fobs with those
calculated from input model (scales and solvent can be http://phenix-online.org/documentation/refinement.htm (34 of 42) [12/14/08 1:02:19 PM]
193
Structure refinement in PHENIX
added)
optimize_mask= False Refine mask parameters (solvent_radius and
shrink_truncation_radius)
occupancy_max= 1.0
Maximum allowable occupancy of an atom
occupancy_min= 0.0
Minimum allowable occupancy of an atom
stir= None Stepwise increase of resolution: start refinement at lower
resolution and gradually proceed with higher resolution
rigid_bond_test= False Compute Hirshfeld's rigid bond test value (RBT)
show_residual_map_peaks_and_holes= True Show highest peaks and deepest
holes in residual_map.
fft_vs_direct= False Check accuracy of approximations used in Fcalc
calculations
outliers_rejection= True Remove basic wilson outliers , extreme wilson
outliers , and beamstop shadow outliers
switch_to_isotropic_high_res_limit= 1.7
If the resolution is lower than
this limit, all atoms selected for
individual ADP refinement and not
participating in TLS groups will be
automatically converted to
isotropic.
find_and_add_hydrogens= False Find H or D atoms using difference map and
add them to the model. This option should be
used if ultra-high resolution data is available
or when refining against neutron data.
modify_start_model
Scope of parameters to modify initial model before
refinement
selection= None Selection for atoms to be modified
random_seed= None Random seed
adp
Scope of options to modify ADP of selected atoms
atom_selection= None Selection for atoms to be modified. Overrides
parent-level selection.
randomize= None Randomize ADP within a certain range
set_b_iso= None Set ADP of atoms to set_b_iso
convert_to_isotropic= None Convert atoms to isotropic
convert_to_anisotropic= None Convert atoms to anisotropic
shift_b_iso= None Add shift_b_iso value to ADP
scale_adp= None Multiply ADP by scale_adp
sites
Scope of options to modify coordinates of selected atoms
atom_selection= None Selection for atoms to be modified. Overrides
parent-level selection.
shake= None Randomize coordinates with mean error value equal to shake
translate= 0 0 0 Translational shift
rotate= 0 0 0 Rotational shift
euler_angle_convention= *xyz zyz Euler angles convention to be used
for rotation
occupancies
Scope of options to modify occupancies of selected atoms
randomize= None Randomize occupancies within a certain range
set= None Set all or selected occupancies to given value
output
Write out PDB file with modified model (file name is defined in
write_modified)
file_name= None Default is the original file name with the file
extension replaced by _modified.pdb .
fake_f_obs
Scope of parameters to simulate Fobs
k_sol= 0.0
Bulk solvent k_sol values
b_sol= 0.0
Bulk solvent b_sol values
b_cart= 0 0 0 0 0 0 Anisotropic scale matrix
scale= 1.0
Overall scale factor
scattering_table= wk1995 it1992 *n_gaussian neutron Choices of
scattering table for structure factors calculations
r_free_flags_fraction= None
structure_factors_accuracy
algorithm= *fft direct
cos_sin_table= False
grid_resolution_factor= 1/3.
quality_factor= None
u_base= None
b_base= None
wing_cutoff= None
exp_table_one_over_step_size= None
mask
solvent_radius= 1.11
shrink_truncation_radius= 0.9
grid_step_factor= 4.0
The grid step for the mask calculation is
determined as highest_resolution divided by
grid_step_factor. This is considered as suggested
value and may be adjusted internally based on the
resolution.
verbose= 1
mean_shift_for_mask_update= 0.1
Value of overall model shift during
refinement that triggers a mask update.
ignore_zero_occupancy_atoms= True Include atoms with zero occupancy
into mask calculation
ignore_hydrogens= True Ignore H or D atoms in mask calculation
hydrogens
Scope of parameters for H atoms refinement
refine= individual *riding Choice for refinement: riding model or full
(H is refined as other atoms; useful at very high resolutions
only)
refine_adp= one_b_per_residue *one_b_per_molecule individual Strategy
for ADP refinement of H atoms (used only if mode=riding)
refine_occupancies= one_q_per_residue *one_q_per_molecule individual
Method to refine parameters of H or D atoms
contribute_to_f_calc= True Add H contribution to Xray (Fcalc)
calculations
high_resolution_limit_to_include_scattering_from_h= 1.6
xh_bond_distance_deviation_limit= 0.0
Idealize XH bond distances if
deviation from ideal is greater than
xh_bond_distance_deviation_limit
build
map_type= mFobs-DFmodel Map type to be used to find hydrogens
map_cutoff= 2.0
Map cutoff
angular_step= 3.0
Step in degrees for 6D rigid body search for best
fit
use_sigma_scaled_maps= True Default is sigma scaled map, map in
absolute scale is used otherwise.
resolution_factor= 1./4.
max_number_of_peaks= None
map_next_to_model
min_model_peak_dist= 0.7
max_model_peak_dist= 1.05
min_peak_peak_dist= 1.0
use_hydrogens= False
peak_search
peak_search_level= 1
max_peaks= 0
interpolate= True
min_distance_sym_equiv= None
general_positions_only= False
min_cross_distance= 1.0
group_b_iso
number_of_macro_cycles= 3
max_number_of_iterations= 25
convergence_test= False
run_finite_differences_test= False
adp
iso
max_number_of_iterations= 25
automatic_randomization_if_all_equal= True
scaling
scale_max= 3.0
scale_min= 10.0
tls
one_residue_one_group= None
refine_T= True
refine_L= True
refine_S= True
number_of_macro_cycles= 2
max_number_of_iterations= 25
start_tls_value= None
run_finite_differences_test= False
eps= 1.e-6
adp_restraints
iso
use_u_local_only= False
sphere_radius= 5.0
distance_power= 1.69
average_power= 1.03
wilson_b_weight_auto= False
wilson_b_weight= None
plain_pairs_radius= 5.0
refine_ap_and_dp= False
b_iso_max= None
group_occupancy
number_of_macro_cycles= 3
max_number_of_iterations= 25
convergence_test= False
run_finite_differences_test= False
group_anomalous
number_of_minimizer_cycles= 3
lbfgs_max_iterations= 20
number_of_finite_difference_tests= 0
rigid_body
Scope of parameters for rigid body refinement
mode= *first_macro_cycle_only every_macro_cycle Defines how many times
the rigid body refinement is performed during refinement run.
first_macro_cycle_only to run only once at first macrocycle,
every_macro_cycle to do rigid body refinement
main.number_of_macro_cycles times
target= ls_wunit_k1 ml *auto Rigid body refinement target function:
least-squares or maximum-likelihood
target_auto_switch_resolution= 6.0
Used if target=auto, use optimal
target for given working resolution.
refine_rotation= True Only rotation is refined (translation is fixed).
refine_translation= True Only translation is refined (rotation is fixed).
max_iterations= 25 Number of LBFGS minimization iterations
bulk_solvent_and_scale= True Bulk-solvent and scaling within rigid body
refinement (needed since large rigid body shifts
invalidate the mask).
euler_angle_convention= *xyz zyz Euler angles convention
lbfgs_line_search_max_function_evaluations= 10
min_number_of_reflections= 100 Number of reflections that defines the
first lowest resolution zone for
multiple_zones protocol
multi_body_factor= 1
zone_exponent= 4.0
high_resolution= 3.0
High resolution cutoff (used for rigid body
refinement only)
max_low_high_res_limit= None Maximum value for high resolution cutoff
for the first lowest resolution zone
number_of_zones= 5 Number of resolution zones for MZ protocol
ncs
find_automatically= True
coordinate_sigma= None
b_factor_weight= None
excessive_distance_limit= 1.5
special_position_warnings_only= False
simple_ncs_from_pdb
pdb_in= None Input PDB file to be used to identify ncs
temp_dir= "" temporary directory (ncs_domain_pdb will be written
there)
min_length= 10 minimum number of matching residues in a segment
njump= 1 Take every njumpth residue instead of each 1
njump_recursion= 10 Take every njump_recursion residue instead of
each 1 on recursive call
min_length_recursion= 50 minimum number of matching residues in a
segment for recursive call
min_percent= 95.
min percent identity of matching residues
max_rmsd= 2.
max rmsd of 2 chains. If 0, then only search for domains
quick= True If quick is set and all chains match, just look for 1 NCS
group
max_rmsd_user= 3.
max rmsd of chains suggested by user (i.e., if
called from phenix.refine with suggested ncs groups)
maximize_size_of_groups= False You can request that the scoring be
set up to maximize the number of members in
NCS groups
ncs_domain_pdb_stem= None NCS domains will be written to
ncs_domain_pdb_stem+"group_"+nn
write_ncs_domain_pdb= False You can write out PDB files representing
NCS domains for density modification if you
want
verbose= False Verbose output
debug= False Debugging output
dry_run= False Just read in and check parameter names
domain_finding_parameters
find_invariant_domains= True Find the parts of a set of chains
that follow NCS
initial_rms= 0.5
Guess of RMS among chains
match_radius= 2.0
Keep atoms that are within match_radius of
NCS-related atoms
similarity_threshold= 0.75
Threshold for similarity between
segments
smooth_length= 0 two segments separated by smooth_length or less
get connected
min_contig_length= 3 segments < min_contig_length rejected
min_fraction_domain= 0.2
domain must be this fraction of a chain
max_rmsd_domain= 2.
max rmsd of domains
restraint_group
reference= None
selection= None
coordinate_sigma= 0.05
b_factor_weight= 10
pdb_interpretation
link_distance_cutoff= 3
disulfide_distance_cutoff= 3
chir_volume_esd= 0.2
nonbonded_distance_cutoff= None
default_vdw_distance= 1
min_vdw_distance= 1
nonbonded_buffer= 1
vdw_1_4_factor= 0.8
translate_cns_dna_rna_residue_names= None
apply_cif_modification
data_mod= None
residue_selection= None
apply_cif_link
data_link= None
residue_selection_1= None
residue_selection_2= None
peptide_link
cis_threshold= 45
discard_psi_phi= True
omega_esd_override_value= None
rna_sugar_pucker_analysis
use= True
bond_min_distance= 1.2
bond_max_distance= 1.8
epsilon_range_not_2p_min= 155
epsilon_range_not_2p_max= 310
delta_range_2p_min= 115
delta_range_2p_max= 180
p_distance_c1_n_line_2p_max= 2.9
show_histogram_slots
bond_lengths= 5
nonbonded_interaction_distances= 5
dihedral_angle_deviations_from_ideal= 5
show_max_lines
bond_restraints_sorted_by_residual= 5
nonbonded_interactions_sorted_by_model_distance= 5
dihedral_angle_restraints_sorted_by_residual= 3
clash_guard
nonbonded_distance_threshold= 0.5
max_number_of_distances_below_threshold= 100
max_fraction_of_distances_below_threshold= 0.1
geometry_restraints
edits
excessive_bond_distance_limit= 10
bond
action= *add delete change
atom_selection_1= None
atom_selection_2= None
symmetry_operation= None The bond is between atom_1 and
symmetry_operation * atom_2, with atom_1 and
atom_2 given in fractional coordinates.
Example: symmetry_operation = -x-1,-y,z
distance_ideal= None
sigma= None
slack= None
angle
action= *add delete change
atom_selection_1= None
atom_selection_2= None
atom_selection_3= None
angle_ideal= None
sigma= None
geometry_restraints
remove
angles= None
dihedrals= None
chiralities= None
planarities= None
ordered_solvent
low_resolution= 2.8
Low resolution limit for water picking (at lower
resolution water will not be picked even if requested)
mode= *auto filter_only every_macro_cycle Choices for water picking
strategy: auto - start water picking after first few macro-cycles,
filter_only - remove water only, every_macro_cycle - do water
update every macro-cycle
output_residue_name= HOH
output_chain_id= S
output_atom_name= O
b_iso_min= 1.0
Minimum B-factor value, waters with smaller value will be
rejected
b_iso_max= 80.0
Maximum B-factor value, waters with bigger value will be
rejected
anisotropy_min= 0.1
For solvent refined as anisotropic: remove if the
anisotropy is less than this value
b_iso= None Initial B-factor value for newly added water
scattering_type= O Defines scattering factors for newly added waters
occupancy_min= 0.1
Minimum occupancy value, waters with smaller value
will be rejected
occupancy_max= 1.0
Maximum occupancy value, waters with bigger value
will be rejected
occupancy= 1.0
Initial occupancy value for newly added water
primary_map_type= mFobs-DFmodel
primary_map_cutoff= 3.0
secondary_map_type= 2mFobs-DFmodel
secondary_map_cutoff= 1.0
h_bond_min_mac= 1.8
h_bond_min_sol= 1.8
h_bond_max= 3.2
new_solvent= *isotropic anisotropic Based on the choice, added solvent
will have isotropic or anisotropic b-factors
refine_adp= True Refine ADP for newly placed solvent.
refine_occupancies= False Refine solvent occupancies.
filter_at_start= True
n_cycles= 1
ignore_final_filtering_step= False
correct_drifted_waters= True
use_kick_maps= False Use Dusan Turk's kick maps for peak picking
kick_map
parameters for kick maps
kick_size= 0.5
number_of_kicks= 100
peak_search
use_sigma_scaled_maps= True Default is sigma scaled map, map in absolute
scale is used otherwise.
resolution_factor= 1./4.
max_number_of_peaks= None
map_next_to_model
min_model_peak_dist= 1.8
max_model_peak_dist= 6.0
min_peak_peak_dist= 1.8
use_hydrogens= False
peak_search
peak_search_level= 1
max_peaks= 0
interpolate= True
min_distance_sym_equiv= None
general_positions_only= False
min_cross_distance= 1.8
bulk_solvent_and_scale
bulk_solvent= True
anisotropic_scaling= True
k_sol_b_sol_grid_search= True
minimization_k_sol_b_sol= True
minimization_b_cart= True
target= ls_wunit_k1 *ml
symmetry_constraints_on_b_cart= True
k_sol_max= 0.6
k_sol_min= 0.0
b_sol_max= 150.0
b_sol_min= 0.0
k_sol_grid_search_max= 0.6
k_sol_grid_search_min= 0.0
b_sol_grid_search_max= 80.0
b_sol_grid_search_min= 20.0
k_sol_step= 0.3
b_sol_step= 30.0
number_of_macro_cycles= 2
max_iterations= 25
min_iterations= 25
fix_k_sol= None
fix_b_sol= None
apply_back_trace_of_b_cart= False
verbose= -1
ignore_bulk_solvent_and_scale_failure= False
fix_b_cart
b11= None
b22= None
b33= None
b12= None
b13= None
b23= None
alpha_beta
free_reflections_per_bin= 140
number_of_macromolecule_atoms_absent= 225
n_atoms_included= 0
bf_atoms_absent= 15.0
final_error= 0.0
absent_atom_type= "O"
method= *est calc
estimation_algorithm= *analytical iterative
verbose= -1
interpolation= True
fix_scale_for_calc_option= None
number_of_waters_absent= 613
sigmaa_estimator
kernel_width_free_reflections= 100
kernel_on_chebyshev_nodes= True
number_of_sampling_points= 20
number_of_chebyshev_terms= 10
use_sampling_sum_weights= True
mask
solvent_radius= 1.11
shrink_truncation_radius= 0.9
grid_step_factor= 4.0
The grid step for the mask calculation is
determined as highest_resolution divided by
grid_step_factor. This is treated as a suggested
value and may be adjusted internally based on the
resolution.
verbose= 1
mean_shift_for_mask_update= 0.1
Value of overall model shift in
refinement required to trigger a mask update.
ignore_zero_occupancy_atoms= True Include atoms with zero occupancy into
mask calculation
ignore_hydrogens= True Ignore H or D atoms in mask calculation
cartesian_dynamics
temperature= 300
number_of_steps= 200
time_step= 0.0005
initial_velocities_zero_fraction= 0
n_print= 100
verbose= -1
simulated_annealing
start_temperature= 5000
final_temperature= 300
cool_rate= 100
number_of_steps= 25
time_step= 0.0005
initial_velocities_zero_fraction= 0
n_print= 100
update_grads_shift= 0.3
refine_sites= True
refine_adp= False
max_number_of_iterations= 25
mode= every_macro_cycle *second_and_before_last once first
verbose= -1
interleaved_minimization
number_of_iterations= 0
time_step_factor= 10
restraints= *bonds *angles
target_weights
mode= *automatic every_macro_cycle
wxc_scale= 0.5
wxu_scale= 1.0
wc= 1.0
wu= 1.0
fix_wxc= None
fix_wxu= None
optimize_wxc= False
bonds_rmsd_max= 0.05
angles_rmsd_max= 3.5
optimize_wxu= False
shake_sites= True
shake_adp= 10.0
regularize_ncycles= 50
verbose= 1
wnc_scale= 0.5
wnu_scale= 1.0
rmsd_cutoff_for_gradient_filtering= 3.0
ias
b_iso_max= 100.0
occupancy_min= -1.0
occupancy_max= 1.5
ias_b_iso_max= 100.0
ias_b_iso_min= 0.0
ias_occupancy_min= 0.01
ias_occupancy_max= 3.0
initial_ias_occupancy= 1.0
build_ias_types= L R B BH
use_map= True
build_only= False
file_prefix= None
peak_search_map
map_type= *Fobs-Fmodel mFobs-DFmodel
grid_step= 0.25
scaling= *volume sigma
ls_target_names
target_name= *ls_wunit_k1 ls_wunit_k2 ls_wunit_kunit ls_wunit_k1_fixed
ls_wunit_k1ask3_fixed ls_wexp_k1 ls_wexp_k2 ls_wexp_kunit
ls_wff_k1 ls_wff_k2 ls_wff_kunit ls_wff_k1_fixed
ls_wff_k1ask3_fixed lsm_kunit lsm_k1 lsm_k2 lsm_k1_fixed
lsm_k1ask3_fixed
twinning
twin_law= None
twin_target= *twin_lsq_f
detwin
mode= algebraic proportional *auto
map_types
twofofc= *two_m_dtfo_d_fc two_dtfo_fc
fofc= *m_dtfo_d_fc gradient m_gradient
aniso_correct= False
structure_factors_and_gradients_accuracy
algorithm= *fft direct
cos_sin_table= False
grid_resolution_factor= 1/3.
quality_factor= None
u_base= None
b_base= None
wing_cutoff= None
exp_table_one_over_step_size= None
r_free_flags
fraction= 0.1
max_free= 2000
lattice_symmetry_max_delta= 5.0
Tolerance used in the determination of
the highest lattice symmetry. Can be thought
of as angle between lattice vectors that
should line up perfectly if the symmetry is
ideal. A typical value is 3 degrees.
use_lattice_symmetry= True When generating Rfree flags, do so in the
asymmetric unit of the highest lattice symmetry.
The result is an Rfree set suitable for twin
refinement.
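These flag-generation parameters map directly onto the cctbx miller-array API available inside a PHENIX installation. The following is a minimal sketch, not a phenix.refine excerpt: the file name is hypothetical, and generate_r_free_flags is assumed to accept the parameters listed above.

# Minimal sketch (assumed cctbx API; hypothetical file name): generate
# R-free flags in the asymmetric unit of the highest lattice symmetry,
# mirroring the r_free_flags parameters listed above.
from iotbx import reflection_file_reader

f_obs = reflection_file_reader.any_reflection_file(
    "data.mtz").as_miller_arrays()[0]
flags = f_obs.generate_r_free_flags(
    fraction=0.1,
    max_free=2000,
    lattice_symmetry_max_delta=5.0,
    use_lattice_symmetry=True)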
Finding NCS in chains from a PDB file with simple_ncs_from_pdb
Python-based Hierarchical ENvironment for Integrated Xtallography
Documentation Home
Finding NCS in chains from a PDB file with simple_ncs_from_pdb
How simple_ncs_from_pdb works:
Additional notes on how simple_ncs_from_pdb works:
Output files from simple_ncs_from_pdb
Standard run of simple_ncs_from_pdb:
Specific limitations and problems:
List of all simple_ncs_from_pdb keywords
Author(s)
● simple_ncs_from_pdb : Tom Terwilliger
● Phil command interpreter : Ralf W. Grosse-Kunstleve
● find_domain : Peter Zwart
Purpose
The simple_ncs_from_pdb method identifies NCS in the chains in a PDB file and writes out the NCS operators in forms suitable for phenix.refine, resolve, and the AutoSol and AutoBuild Wizards.
Usage
How simple_ncs_from_pdb works:
The basic steps that simple_ncs_from_pdb carries out are:
● (1) Identify sets of chains in the PDB file that have the same sequences. These are potential NCS-related chains.
● (2) Determine which chains in a group actually are related by NCS within a given tolerance (max_rmsd, typically 2 Å).
● (3) Determine which residues in each chain are related by NCS, and break the chains into domains that do follow NCS if necessary.
● (4) Determine the NCS operators for all chains in each NCS group or domain.
Additional notes on how simple_ncs_from_pdb works:
The matching of chains is done in a first quick pass by calling simple_ncs_from_pdb recursively and only using every 10th residue in the analysis. This allows a check of whether chains that have the same sequence really have the same structure, or whether some such chains should be in separate NCS groups. The use of only every 10th residue allows time for an all-against-all matching of chains.
If residue numbers are not the same for corresponding chains, but they are simply offset by a constant for each chain, this will be recognized and the chains will be aligned.
An assumption in simple_ncs_from_pdb is that residue numbers are consistent among chains. They do not have to be the same: chain A can be residues 1-100 and chain B 211-300. However chain A cannot be residues 1-10 and 20-50, matching to chain B residues 1-10 and 21-51.
Residue numbers are used to align pairs of chains, maximizing identities of matching pairs of residues.
Pairs of chains that can match are identified.
Groupings of chains are chosen that maximize the number of matching residues between each member of a group and the first (reference) member of the group.
For a pair of chains, some segments may match and others not. Each pair of segments must have a length at least as long as min_length and a percent identity at least as high as min_percent. A pair of segments may not end in a mismatch. An overall pair of chains must have an rmsd of CA atoms of less than or equal to max_rmsd.
If find_invariant_domain is specified then once all chains that can be matched with the above algorithm are identified, all remaining chains are matched, allowing the break-up of chains into invariant domains. The invariant domains each get a separate NCS group.
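The segment-acceptance rules above are simple to state in code. The sketch below is purely illustrative (it is not PHENIX source, and the function names are hypothetical); the thresholds correspond to the min_length, min_percent and max_rmsd keywords listed later in this section.

# Illustrative sketch of the segment-matching rules (not PHENIX source).
# residues_a/residues_b are aligned residue identities for one segment.
def accept_segment_pair(residues_a, residues_b,
                        min_length=10, min_percent=95.0):
    n = min(len(residues_a), len(residues_b))
    if n < min_length:                       # segment long enough?
        return False
    matches = [a == b for a, b in zip(residues_a, residues_b)]
    if not matches[0] or not matches[-1]:    # may not end in a mismatch
        return False
    return 100.0 * sum(matches) / n >= min_percent

def accept_chain_pair(ca_rmsd, max_rmsd=2.0):
    # a whole pair of chains must also satisfy the overall CA rmsd test
    return ca_rmsd <= max_rmsd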
Output files from simple_ncs_from_pdb
The output files that are produced are:
● NCS operators written in format for phenix.refine: simple_ncs_from_pdb.ncs
● NCS operators written in format for the PHENIX Wizards: simple_ncs_from_pdb.ncs_spec
Examples
Standard run of simple_ncs_from_pdb:
Running simple_ncs_from_pdb is easy. For example, you can type:
phenix.simple_ncs_from_pdb anb.pdb
Simple_ncs_from_pdb will analyze the chains in anb.pdb and identify any NCS that exists. For this sample run the following output is produced:
Chains in this PDB file: ['A', 'N', 'B']
GROUPS BASED ON QUICK COMPARISON: [['A', 'B']]
Looking for invariant domains for ...: ['A', 'N', 'B'] [[[2, 525]],
[[2, 259], [290, 525]], [[20, 525]]]
There were 3 chains in the PDB file: A, N and B. Chains A and B were very similar and clearly related by NCS. This relationship was found in a quick comparison. Chain N had the same sequence as A and B, but did not match them closely enough in the quick comparison. Searching for domains that did have NCS among all three chains produced the domains represented below by 4 NCS groups:
GROUP 1
Summary of NCS group with 3 operators:
ID of chain/residue where these apply: [['A', 'N', 'B'], [[[2, 5], [20, 35],
[60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516],
[520, 525]], [[2, 5], [20, 35], [60, 76], [78, 107], [110, 137],
[401, 431], [433, 483], [485, 516], [520, 525]], [[2, 5], [20, 35],
[60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516],
[520, 525]]]]
RMSD (A) from chain A: 0.0 1.09 0.07
Number of residues matching chain A:[215, 215, 194]
Source of NCS info: anb.pdb
The residues in chains A, B, and N in this group are 2-5, 20-35, 60-76, 78-107, 110-137, 401-431, 433-483, 485-516 and 520-525. Note that these are not all contiguous. These are all the residues that have the same relationships among the 3 chains. The RMSD of CA atoms between chains A and N is 1.09 Å and between A and B is 0.07 Å.
The NCS operators relating these domains are given below.
OPERATOR 1
CENTER: 29.9208 -53.3304 -13.4779
ROTA 1: 1.0000 0.0000 0.0000
ROTA 2: 0.0000 1.0000 0.0000
ROTA 3: 0.0000 0.0000 1.0000
TRANS: 0.0000 0.0000 0.0000
OPERATOR 2
CENTER: 32.5410 -35.4227 20.2768
ROTA 1: 0.9370 -0.2825 0.2053
ROTA 2: -0.3285 -0.9125 0.2439
ROTA 3: 0.1184 -0.2960 -0.9478
TRANS: -14.7410 -79.9073 -8.5967
OPERATOR 3
CENTER: 50.0256 -91.8920 -13.6461
ROTA 1: 0.6257 0.7800 -0.0037
ROTA 2: -0.7800 0.6257 -0.0010
ROTA 3: 0.0015 0.0035 1.0000
TRANS: 70.3889 42.4760 0.3937
GROUP 2
Summary of NCS group with 3 operators:
ID of chain/residue where these apply: [['A', 'N', 'B'], [[[6, 9],
[56, 59], [517, 519]], [[6, 9], [56, 59], [517, 519]], [[6, 9],
[56, 59], [517, 519]]]]
RMSD (A) from chain A: 0.0 0.48 0.03
Number of residues matching chain A:[11, 11, 11]
Source of NCS info: anb.pdb
OPERATOR 1
CENTER: 47.5037 -61.5641 -11.2751
ROTA 1: 1.0000 0.0000 0.0000
ROTA 2: 0.0000 1.0000 0.0000
ROTA 3: 0.0000 0.0000 1.0000
TRANS: 0.0000 0.0000 0.0000
OPERATOR 2
CENTER: 51.8984 -33.6038 20.9877
ROTA 1: 0.9367 -0.2981 0.1836
ROTA 2: -0.3113 -0.9492 0.0469
ROTA 3: 0.1603 -0.1011 -0.9819
TRANS: -14.9810 -78.2888 -2.3823
OPERATOR 3
CENTER: 66.8308 -82.9508 -11.4633
ROTA 1: 0.6255 0.7802 -0.0016
ROTA 2: -0.7802 0.6255 -0.0025
ROTA 3: -0.0009 0.0028 1.0000
TRANS: 70.3999 42.4366 0.4815
GROUP 3
Summary of NCS group with 3 operators:
ID of chain/residue where these apply: [['A', 'N', 'B'], [[[193, 255],
[257, 259], [290, 355], [357, 374]], [[193, 255], [257, 259],
[290, 355], [357, 374]], [[193, 255], [257, 259], [290, 355], [357, 374]]]]
RMSD (A) from chain A: 0.0 0.61 0.01
Number of residues matching chain A:[150, 150, 150]
Source of NCS info: anb.pdb
OPERATOR 1
CENTER: 36.1219 -37.6124 -62.1437
ROTA 1: 1.0000 0.0000 0.0000
ROTA 2: 0.0000 1.0000 0.0000
ROTA 3: 0.0000 0.0000 1.0000
TRANS: 0.0000 0.0000 0.0000
OPERATOR 2
CENTER: 39.1403 -33.0801 60.7270
ROTA 1: 0.7650 0.3808 -0.5194
ROTA 2: 0.0664 -0.8488 -0.5245
ROTA 3: -0.6406 0.3668 -0.6746
TRANS: 50.3180 -36.4383 16.0299
OPERATOR 3
CENTER: 40.9347 -76.7723 -62.2004
ROTA 1: 0.5942 0.8043 -0.0007
ROTA 2: -0.8043 0.5942 -0.0064
ROTA 3: -0.0047 0.0043 1.0000
TRANS: 73.5084 40.5311 0.5807
GROUP 4
Summary of NCS group with 3 operators:
ID of chain/residue where these apply: [['A', 'N', 'B'], [[[36, 41]],
[[36, 41]], [[36, 41]]]]
RMSD (A) from chain A: 0.0 0.22 0.03
Number of residues matching chain A:[6, 6, 6]
Source of NCS info: anb.pdb
OPERATOR 1
CENTER: 45.4522 -37.4720 -14.4660
ROTA 1: 1.0000 0.0000 0.0000
ROTA 2: 0.0000 1.0000 0.0000
ROTA 3: 0.0000 0.0000 1.0000
TRANS: 0.0000 0.0000 0.0000
OPERATOR 2
CENTER: 42.1483 -55.6520 24.0535
ROTA 1: 0.9444 -0.3074 0.1171
ROTA 2: -0.2975 -0.9501 -0.0940
ROTA 3: 0.1402 0.0540 -0.9887
TRANS: -14.2728 -75.5420 6.4099
OPERATOR 3
CENTER: 46.7900 -69.5227 -14.6653
ROTA 1: 0.6247 0.7809 -0.0013
ROTA 2: -0.7809 0.6247 0.0028
ROTA 3: 0.0030 -0.0008 1.0000
TRANS: 70.4964 42.5349 0.0067
NCS operators written in format for resolve to: simple_ncs_from_pdb.resolve
NCS operators written in format for phenix.refine to: simple_ncs_from_pdb.ncs
NCS written as ncs object information to: simple_ncs_from_pdb.ncs_spec
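The operator blocks shown above can also be consumed programmatically. Below is a small, hypothetical helper (not part of PHENIX) that parses the ROTA/TRANS records of one operator and applies it as x' = R.x + t; the exact operator convention, including the role of the CENTER record, is defined by resolve, so treat this purely as an illustration of the file layout.

# Hypothetical helper: parse one operator block (ROTA 1..3, TRANS) as
# printed above and apply it as x' = R.x + t. The authoritative
# convention (including CENTER) is defined by resolve.
import numpy as np

def parse_operator(lines):
    rows, trans = [], None
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "ROTA":
            rows.append([float(v) for v in fields[-3:]])
        elif fields and fields[0] == "TRANS:":
            trans = np.array([float(v) for v in fields[-3:]])
    return np.array(rows), trans

block = """ROTA 1: 0.9370 -0.2825 0.2053
ROTA 2: -0.3285 -0.9125 0.2439
ROTA 3: 0.1184 -0.2960 -0.9478
TRANS: -14.7410 -79.9073 -8.5967""".splitlines()

R, t = parse_operator(block)
print(R.dot(np.array([29.9, -53.3, -13.5])) + t)  # NCS-mapped position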
Possible Problems
Specific limitations and problems:
● If the user specifies chains to be in a suggested NCS group, but they are too dissimilar as a whole (rmsd > max_rmsd_user), then the group is rejected even if some fragment of the chains could be similar.
● A chain specification from suggested_ncs_groups could in principle have more than one chain in one group, but simple_ncs_from_pdb can only use suggested groups that consist of N copies of single chains.
● If the NCS asymmetric unit of your crystal contains more than one chain, simple_ncs_from_pdb will consider it to have more than one domain, and it will assign one NCS group to each chain.
Literature
Additional information
List of all simple_ncs_from_pdb keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
simple_ncs_from_pdb
pdb_in= None Input PDB file to be used to identify ncs
temp_dir= "" temporary directory (ncs_domain_pdb will be written there)
min_length= 10 minimum number of matching residues in a segment
njump= 1 Take every njumpth residue instead of each 1
njump_recursion= 10 Take every njump_recursion residue instead of each 1 on
recursive call
min_length_recursion= 50 minimum number of matching residues in a segment
for recursive call
min_percent= 95.
min percent identity of matching residues
max_rmsd= 2.
max rmsd of 2 chains. If 0, then only search for domains
quick= True If quick is set and all chains match, just look for 1 NCS group
max_rmsd_user= 3.
max rmsd of chains suggested by user (i.e., if called
from phenix.refine with suggested ncs groups)
maximize_size_of_groups= False You can request that the scoring be set up
to maximize the number of members in NCS groups
ncs_domain_pdb_stem= None NCS domains will be written to
ncs_domain_pdb_stem+"group_"+nn
write_ncs_domain_pdb= False You can write out PDB files representing NCS
domains for density modification if you want
verbose= False Verbose output
debug= False Debugging output
dry_run= False Just read in and check parameter names
domain_finding_parameters
find_invariant_domains= True Find the parts of a set of chains that
follow NCS
initial_rms= 0.5
Guess of RMS among chains
match_radius= 2.0
Keep atoms that are within match_radius of NCS-related
atoms
similarity_threshold= 0.75
Threshold for similarity between segments
smooth_length= 0 two segments separated by smooth_length or less get
connected
min_contig_length= 3 segments < min_contig_length rejected
min_fraction_domain= 0.2
domain must be this fraction of a chain
max_rmsd_domain= 2.
max rmsd of domains
Finding and analyzing NCS from heavy-atom sites or a model with find_ncs
Python-based Hierarchical ENvironment for Integrated Xtallography
Documentation Home
Finding and analyzing NCS from heavy-atom sites or a model with find_ncs
Specific limitations and problems:
Author(s)
● find_ncs : Tom Terwilliger
● simple_ncs_from_pdb : Tom Terwilliger
● Phil command interpreter : Ralf W. Grosse-Kunstleve
● find_domain : Peter Zwart
Purpose
The find_ncs method identifies NCS in either (a) the chains in a PDB file or (b) a set of heavy-atom sites, and writes out the NCS operators in forms suitable for phenix.refine, resolve, and the AutoSol and AutoBuild
Wizards.
Usage
How find_ncs works:
The basic steps that find_ncs carries out are:
● (1) Decide whether to use simple_ncs_from_pdb (used if the input file contains chains from a PDB file) or RESOLVE NCS identification (used if the input file contains heavy-atom sites).
● (2) Call either simple_ncs_from_pdb or RESOLVE to identify NCS.
● (3) Evaluate the NCS by calculating the correlation of NCS-related electron density based on the input map coefficients mtz file.
● (4) Report the NCS operators and correlations.
Output files from find_ncs
The output files that are produced are:
● NCS operators written in format for phenix.refine: find_ncs.ncs
● NCS operators written in format for the PHENIX Wizards: find_ncs.ncs_spec
What find_ncs needs:
find_ncs needs a file containing NCS information and a file with map coefficients.
The file with NCS information can be...
● a PDB file with a model (find_ncs will call simple_ncs_from_pdb to extract NCS operators from the chains in your model)
● a PDB file with heavy-atom sites (find_ncs will call RESOLVE to find NCS operators from your heavy-atom sites)
● an NCS definitions file written by a PHENIX wizard (e.g., AutoSol_1.ncs_spec, produced by AutoSol)
● a RESOLVE log file containing formatted NCS operators
The file with map coefficients can be any MTZ file with coefficients for a map. If find_ncs does not choose the correct columns automatically, then you can specify them with a command like:
labin="labin FP=FP PHIB=PHIB FOM=FOM "
If you have no map coefficients yet (you just have some sites and want to get operators, for example), you can tell find_ncs to ignore the map with:
ncs_parameters.force_ncs=True
Examples
Standard run of find_ncs:
Running find_ncs is easy. From the command line you can type:
phenix.find_ncs anb.pdb mlt.mtz
This will produce the following output:
Getting column labels from mlt.mtz for input map file
FILE TYPE: ccp4_mtz
All labels: ['FP', 'SIGFP', 'PHIC', 'FOM']
Labin line will be: labin FP=FP PHIB=PHIC FOM=FOM
To change it modify this: params.ncs.labin="labin FP=FP PHIB=PHIC FOM=FOM "
This is the map that will be used to evaluate NCS
Reading NCS information from: anb.pdb
Copying mlt.mtz to temp_dir/mlt.mtz
This PDB file contains 2 chains and 636 total residues and 636 C-alpha or P atoms and 4740 total atoms
NCS will be found using the chains in this PDB file
Chains in this PDB file: ['M', 'Z']
Two chains were found in the file anb.pdb: chain M and chain Z.
GROUPS BASED ON QUICK COMPARISON: []
Looking for invariant domains for ...: ['M', 'Z'] [[[2, 138], [193, 373]], [[2, 138], [193,
373]]]
Residues 2-138, 193-373, matched between the two chains
Copying mlt.mtz to temp_dir/mlt.mtz
Copying temp_dir/NCS_correlation.log to NCS_correlation.log
Log file for NCS correlation is in NCS_correlation.log
List of refined NCS correlations: [1.0, 0.80000000000000004]
There were two separate groups of residues that had different NCS relationships. Residues 193-373 of each chain were in one group, and residues 2-138 in each chain were in the other group.
The electron density map had a correlation between the two NCS-related chains of 1.0 for the first group, and 0.8 for the second.
The NCS operators for each are listed.
GROUP 1
Summary of NCS group with 2 operators:
ID of chain/residue where these apply: [['M', 'Z'], [[[193, 373]], [[193, 373]]]]
RMSD (A) from chain M: 0.0 0.0
Number of residues matching chain M:[181, 181]
Source of NCS info: anb.pdb
Correlation of NCS: 1.0
OPERATOR 1
CENTER: 69.1058 -9.5443 59.4674
ROTA 1: 1.0000 0.0000 0.0000
ROTA 2: 0.0000 1.0000 0.0000
ROTA 3: 0.0000 0.0000 1.0000
TRANS: 0.0000 0.0000 0.0000
OPERATOR 2
CENTER: 37.5004 -37.0709 -62.5441
ROTA 1: 0.7751 -0.6211 -0.1162
ROTA 2: -0.3607 -0.5859 0.7256
ROTA 3: -0.5188 -0.5205 -0.6782
TRANS: 9.7485 27.6460 17.2076
GROUP 2
Summary of NCS group with 2 operators:
ID of chain/residue where these apply: [['M', 'Z'], [[[2, 138]], [[2, 138]]]]
RMSD (A) from chain M: 0.0 0.0
Number of residues matching chain M:[137, 137]
Source of NCS info: anb.pdb
Correlation of NCS: 0.8
OPERATOR 1
CENTER: 66.6943 -13.3128 21.6769
ROTA 1: 1.0000 0.0000 0.0000
ROTA 2: 0.0000 1.0000 0.0000
ROTA 3: 0.0000 0.0000 1.0000
TRANS: 0.0000 0.0000 0.0000
OPERATOR 2
CENTER: 39.0126 -53.7392 -13.4457
ROTA 1: 0.3702 -0.9275 -0.0516
ROTA 2: -0.8933 -0.3402 -0.2938
ROTA 3: 0.2549 0.1548 -0.9545
TRANS: 1.7147 -0.6936 7.2172
Possible Problems
Specific limitations and problems:
● None
Literature
Additional information
List of all find_ncs keywords
-------------------------------------------------------------------------------
Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
find_ncs
ncs_in= None File with NCS information (PDB file with heavy-atom sites or
with NCS-related chains)
ncs_in_type= *None chains sites ncs_file Type of ncs information. Choices
are: chains: a PDB file with two or more chains that have a
consistent residue-numbering system. sites: a PDB file or
fractional-coordinate file with atomic positions of
heavy-atoms that show NCS ncs_file: an ncs object file from
PHENIX.
mtz_in= None MTZ file with coefficients for a map that can be used to
assess NCS. Required for finding NCS from heavy-atom sites
labin= "" Labin line for MTZ file with map coefficients. This is optional
if find_ncs can guess the correct coefficients for FP PHI and FOM.
Otherwise specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM where myFP is
your column label for FP
resolution= 0.
high-resolution limit for map calculation
temp_dir= "temp_dir" Temporary work directory
output_dir= "" Output directory where files are to be written
verbose= False Verbose output
debug= False Debugging output
dry_run= False Just read in and check parameter names
ncs_parameters
ncs_restrict= 0 You can specify the number of NCS operators to look for
force_ncs= False You can tell find_ncs to ignore the map. This is useful
if you only have FP but no phases yet...
optimize_ncs= False You can tell find_ncs to optimize the NCS by making
as compact a molecule as possible.
n_try_ncs= 3 Number of tries to find ncs from heavy-atom sites
ncs_thorough= 8 Thoroughness for looking for heavy-atom sites (high=more
thorough)
eLBOW - electronic Ligand Builder and Optimisation Workbench
Python-based Hierarchical ENvironment for Integrated Xtallography
Documentation Home
eLBOW - electronic Ligand Builder and Optimisation Workbench
Using SMILES string from internal database
electronic Ligand Builder and Optimisation Workbench (eLBOW)
More detailed website
Author
Nigel W. Moriarty
Purpose
Automate the generation of geometry restraint information for the refinement of novel ligands, and improve the geometry restraint information for standard ligands. A protein crystal can contain more than just the protein and the other simple molecules that most refinement programs can interpret. An unusual molecule can be included in the refinement via eLBOW from a number of chemical inputs. The geometry can be optimised using various levels of chemical knowledge, including a semi-empirical quantum mechanical method known as AM1.
Input formats include
● SMILES string
● PDB (Protein Data Bank)
● MolFiles (V2000, V3000 and SDFiles)
● TRIPOS MOL2
● XYZ
● certain CIF files
● GAMESS input and output files
Output formats include
● PDB (Protein Data Bank)
● CIF restraint file

eLBOW contains a number of programs. All programs have been written to allow command-line control and script access to the objects and algorithms. The main program is run thus:

phenix.elbow [options] input_file.ext

or in a Python script:

from elbow.command_line import builder
molecule = builder.run("input_file.ext", **kwds)

where the options are passed as a dictionary. The return object can be interrogated for information via the class methods. Output files from both techniques include a PDB file of the final geometry and a CIF file that contains the geometry restraint information for refinement. Other files are output as appropriate, such as edits and CIF files for linking the ligand to the protein. A final file contains the serialised data of the molecule in the Python pickle format.
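Building on the scripting interface just described, the fragment below sketches a slightly fuller session. It assumes the documented builder.run entry point; the keyword names mirror the command-line flags (opt for --opt, output for --output), and any method names on the returned molecule are assumptions to be checked against the eLBOW API.

# Minimal scripted-eLBOW sketch based on the documented builder.run
# entry point; keyword names mirror the command-line flags and are
# assumptions, as are any methods on the returned object.
from elbow.command_line import builder

kwds = {"opt": True,         # AM1 geometry optimisation, as with --opt
        "output": "ligand"}  # base name for the output files
molecule = builder.run("input_file.ext", **kwds)
# interrogate the returned molecule via its class methods as needed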
Examples
Using SMILES string from internal database
To run eLBOW on an internal SMILES string:
phenix.elbow --key=ATP [options]
PDB input
To run eLBOW on a PDB file (containing one molecule):
phenix.elbow input_file.pdb
To run eLBOW on a PDB file containing protein and ligands (this will only process the ligands that are unknown to phenix.refine):
phenix.elbow input_file.pdb --do-all
To run eLBOW on a PDB file specifying a residue:
phenix.elbow input_file.pdb --residue LIG
To use the atom names from a PDB file:
phenix.elbow --smiles O --template input_file.pdb
SMILES input
To run eLBOW on a SMILES string:
phenix.elbow --smiles="CCO"
or
phenix.elbow --smiles=input_file.smi
Other input
To run eLBOW on other supported input formats:
phenix.elbow input_file.ext
Geometry optimisation
eLBOW performs a simple force-field geometry optimisation by default; however, an AM1 geometry optimisation can be performed as follows:
phenix.elbow input_file.pdb --opt
To start from a specific geometry for the optimisation:
phenix.elbow --initial_geometry input_file.pdb --opt
To use a separately installed GAMESS and do a HF/3-21G geometry optimisation:
phenix.elbow input_file.pdb --gamess --basis="3-21G"
To not optimise, but use the input geometry as the final geometry:
phenix.elbow --final_geometry input_file.pdb
Hydrogen addition
eLBOW automatically adds hydrogens to the input molecules if fewer than a quarter of the possible hydrogens are present. This can be controlled using:
phenix.elbow input_file.pdb --add-hydrogens=True
A common requirement is to add hydrogens to a ligand but retain the geometry and position relative to a protein. To do so use:
phenix.elbow --final-geometry=input_file.pdb
Output
To choose the base name of the output files:
phenix.elbow input_file.pdb --output="output"
To change the three-letter ID:
phenix.elbow input_file.pdb --id=NEW
To change other attributes:
phenix.elbow input_file.pdb --pdb-assign "resSeq=3 b=100"
Some of the attributes:
● Residue name : resname
● Chain ID : chain, chainid
● Residue sequence ID : resseq, resid
● Alternative location ID : altid, altloc
● Insert code : icode
● Occupancy : occ, occupancy
● Temperature factor : b, tempfactor
● Segment ID : segid, segID
To output MOL2 format:
phenix.elbow input_file.pdb --tripos
To output PDB ligand format:
phenix.elbow input_file.pdb --pdb-ligand
Additional programs
● phenix.get_smiles
● phenix.get_pdb
● phenix.metal_coordination : Generate edits for metal coordination
● phenix.link_edits : Generate edits from PDB LINK records
● phenix.print_sequence
● elbow.become_expert
● elbow.become_novice
● elbow.compare_two_molecules
● elbow.join_cif_files
● elbow.join_pdb_files
● elbow.join_mol2_files
● elbow.check_residues_against_monomer_lib
● elbow.defaults : Generate an eLBOW defaults file
Literature
Additional information
Novice options

Option (default & choices) : description of inputs and uses

--version (None) : show program's version number and exit
--help (None) : show this help message and exit
--long-help (None) : show even more help and exit
--smiles ("") : use the passed SMILES
--file ("") : use file for chemical input
--msd (False) : get SMILES using MSDChem code
--key ("") : use SMILES from smilesDB for chemical input
--keys (False) : display smiles DB
--chemicalcomponent (None) : build ligand from chemical components (PDB)
--pipe (False) : read input from standard in
--residue ("") : use only this residue from the PDB file
--chain ("") : use only this chain from the PDB file
--all-residues (None) : retain all residues in a PDB file
--name ("") : name of ligand to be used in various output files
--sequence ("") : use sequence (limited to 20 residues and no semi-empirical optimisation)
--read-only (None) : read the input but don't do any processing
--opt (False) : use the best optimisation method available (currently AM1)
--template ("") : use file for naming of atoms, e.g. a PDB file
--mopac (False) : use MOPAC for quantum chemistry calculations (requires MOPAC be installed)
--gamess (False) : use GAMESS for quantum chemistry calculations (requires GAMESS be installed)
--qchem (False) : use QChem for quantum chemistry calculations (requires QChem be installed)
--gaussian (False) : use Gaussian for quantum chemistry calculations (requires Gaussian be installed)
--final-geometry (None) : use this file to obtain the final geometry
--initial-geometry (None) : use this file to obtain the initial geometry for QM
--energy-validation : calculate the difference between starting and final energies
--restart : restart the optimisation with lowest previous geometry
--opt-steps (60, "positive integer") : optimisation steps (currently for ELBOW opt only)
--opt-tol (default, loose, tight) : optimisation tolerance = loose, default or tight
--chiral (retain) : treatment of chiral centres = retain (default), both, enumerate
--ignore-chiral : ignore the chirality in the SMILES string
--skip-cif-molecule : ignore ligands in supplied CIF file(s)
--memory (1Gb, "positive integer", "n Gb", "n Mb") : maximum memory, mostly for quantum method
--method ("AM1") : run QM optimisation with this method, if possible
--basis ("AM1") : run QM with this basis, if possible
--aux-basis : run QM with this auxiliary basis, if possible
--random-seed : random number seed
--quiet : less print out
--silent : almost complete silence
--view : viewing software command
--reel : fire up restraints editor
--pymol : use PyMOL from the PHENIX install to view geometries
--overwrite : clobber any existing output files
--bonding : file that specifies the bonding of the input molecule
--id ("LIG") : three letter code used in the CIF output
--xyz (False) : output is also written in XYZ format
--tripos (False) : output is also written in TRIPOS format
--sdf (False) : output is also written in SDF format
--pdb-ligand (None) : output is also written in PDB ligand format
--output ("algorithm determination") : name for output files
--pickle (False) : use a pickle file to reload the topological information
--do-all (None) : process all molecules in a PDB, TRIPOS or SDF file
--clean (False) : DELETES "unnecessary" output files (dangerous)
--pdb-assign ("") : set the atom attributes in the PDB file
--heme (None) : attempt to match HEME groups (experimental)
--add-hydrogens ("algorithm determination", True, False) : override the automatic hydrogen addition
Expert options

--newton-raphson (None) : use Newton-Raphson optimisation
--gdiis (False) : use GDIIS optimisation
--quicca (False) : use QUICCA optimisation
--user-opt (None) : use a user-defined program for quantum chemistry calculations
--user-opt-input-filename ("") : input filename
--user-opt-xyz2input ("") : converts xyz file to QM program input
--user-opt-xyz-filename ("") : xyz filename
--user-opt-script-filename ("") : run script filename
--user-opt-program ("") : QM optimisation program run script or program invocation command
--user-opt-output-filename ("") : output filename
--user-opt-output2xyz ("") : converts QM program output to xyz file
--write-hydrogens (True, False) : override the automatic writing of hydrogens to PDB and CIF files
--auto-bond-cutoff (2.0, "float between 0.5 and 3") : set the max bond length for auto bond detection
--write-redundant-dihedrals (None) : control the writing of redundant dihedrals
Restraints Editor Exclusively Ligands (REEL)
Python-based Hierarchical ENvironment for Integrated Xtallography
Documentation Home
Restraints Editor Exclusively Ligands (REEL)
Restraints Editor Exclusively Ligands (REEL)
Author
Nigel W. Moriarty
Purpose
Edit the geometry restraints of a ligand using a Graphical User Interface (GUI) including a 3D view of the ligand and a tabular view of the restraints.
Screen Shots
(Screenshots of the REEL interface are not reproduced here.)
General Procedure
The general procedure is to load a restraints file (CIF) and manipulate the restraints via the table or molecule view. The geometry of the revised restraints can be tested using the File->Guesstimate option.
The final restraints can be saved to a CIF file for use with phenix.refine. The corresponding PDB file can also be saved.
Input
Restraints can be loaded into REEL using the command line, the pull-down menu to open a file, or a pull-down menu to run eLBOW. Restraints files from eLBOW contain both the restraints and Cartesian coordinates. For the purposes of REEL, the coordinates are generated from the restraints and cannot be edited directly. The background colour is light steel blue to show that they are not used in the geometry editing actions. REEL can load a molecule geometry from any format that eLBOW can read, including PDB, MOL2 and SDF. If the file does not contain the bonding information, the bonding is automatically determined using proximity. The limit on the size of molecule for which the bonding is automatically determined is set at 200 atoms. Molecules of up to 2000 atoms can be loaded using the --view option, but only the bond connectivity is determined; the bond order is set to one but can be changed interactively. Molecules can be loaded into REEL using the --reel option for eLBOW or using the eLBOW GUI dialog available in REEL.
Editing
The geometry restraints (bonds, angles, dihedrals, planes and chirals) are the driving coordinates in this editor. The Cartesian coordinates are displayed in the atoms table only because they are generated for the viewer display. They are the driven coordinates and are therefore ignored if changed in the editor.
Many cells in the table view of the restraints can be changed, but care must be taken because some changes are local and others global. For example, changing an atom name in the bonds table view will only change it in that row. If you wish to change the name of the atom in all the restraints, you should use the right mouse menu in the viewer window or change it in the atoms table view. The colour of a cell indicates whether changes are propagated elsewhere. The cells, in some cases, have not been made read-only, to allow the user to make changes as desired. Clicking an atom in the molecule view will highlight the various related topological elements in the table view, and vice versa. Use the checkbox in the table view to remove a restraint from optimisation and the output file. Chiral centres can be changed in the table view.
Examples
To load a previously created restraints file:
phenix.reel atp.cif
To load a restraints file into REEL from eLBOW:
phenix.elbow --smiles O --reel
To load all the unknown ligands from a PDB file:
phenix.reel model.pdb --do-all
or a single residue:
phenix.reel model.pdb --residue ATP
ReadySet!
Python-based Hierarchical ENvironment for Integrated Xtallography
Documentation Home
ReadySet!
ReadySet!
Author
Nigel W. Moriarty
Purpose
ReadySet! is a program designed to prepare a PDB file for refinement, as in "ReadySet! Refine!!!". It will add hydrogens to the protein model using phenix.reduce and to the ligands using eLBOW. The appropriate restraints are also written to disk. Hydrogens can also be added to water molecules. Deuterium atoms can be added to facilitate dual X-ray/neutron refinement. Metal coordination files are also generated.
General Procedure
Ligand hydrogen addition
Including hydrogens in a refinement leads to better models. ReadySet! will add hydrogens to the ligands using eLBOW and the PDB Chemical Components database. The input PDB file is divided into 'standard' residues, including the standard amino acids and RNA/DNA bases. The other residues (usually ligands) are tested, using the three-letter code and atom names, against the PHENIX monomer library and the PDB Chemical Components database.
If the ligand is determined to be in the PHENIX monomer library, then the hydrogens are added with the atom naming from the library. This is done using a SMILES string taken from the PDB Chemical Components database and the atom names from the monomer library. In this case, the hydrogens are added to the output PDB file but no restraints are written, because phenix.refine will use the library restraints.
If the ligand is determined to be in the PDB Chemical Components database, the SMILES string and the atom names are used to generate a molecule that represents the ligand. The atomic naming is determined using either the version 2 or version 3 PDB names. The restraints are written to disk.
If no match is found in the PHENIX monomer library or the PDB Chemical Components database, the residue atoms are used to generate the ligand. The restraints are written to disk.
Once there is a ligand representation including hydrogens, the ligand must be included in the output.
For each copy of the ligand in the model, the representation is pruned to match the number of non-hydrogen atoms and overlaid onto the ligand orientation. Hydrogens are added in an optimised geometry for each copy of the ligand.
Covalently bound ligands are handled, and two files, the CIF link restraints file and the atom selection file, are output.
Metal coordination
Any metals in the model are analysed for coordination, and the results are output as "edits" for phenix.refine. The distances and angles found in the PDB file are used in the output.
Neutron exchange addition
Deuteriums are added to amino acids that exhibit exchangeable sites. The hydrogens are placed in alternative location "A" and the corresponding deuteriums are placed in "B".
phenix.reduce: tool for adding hydrogens to a PDB model
Python-based Hierarchical ENvironment for Integrated Xtallography
Documentation Home
phenix.reduce: tool for adding hydrogens to a PDB model
Purpose
phenix.reduce is a command-line tool for adding hydrogens to a PDB structure file. Hydrogens are added in standardized geometry with optimization of the orientations of OH, SH, NH3+, Met methyls, Asn and Gln sidechain amides, and His rings. Both proteins and nucleic acids can be processed. HET groups can also be processed as long as the atom connectivity is provided. The program is described in Word, et al. (1999) J. Mol. Biol. 285, 1733-1745. For more information visit: http://kinemage.biochem.duke.edu/software/reduce.php
How to run
phenix.reduce is run from the command line:
% phenix.reduce [pdb_file] [options]
To get information about command line options type:
% phenix.reduce
or for a longer list:
% phenix.reduce -h
Hydrogens in refinement
Please refer to the phenix.refine documentation to see how hydrogen atoms are used in structure refinement.
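For scripted pipelines, phenix.reduce can also be driven from Python. A minimal sketch, assuming phenix.reduce is on the PATH and, like the traditional reduce program, writes the hydrogenated model to standard output (check your installation's behaviour before relying on this):

# Minimal sketch: run phenix.reduce on a model and capture the result.
# Assumes phenix.reduce is on PATH and writes the modified PDB to stdout.
import subprocess

result = subprocess.run(["phenix.reduce", "model.pdb"],
                        capture_output=True, text=True)
with open("model_h.pdb", "w") as out:
    out.write(result.stdout)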
Phaser-2.1
Python-based Hierarchical ENvironment for Integrated Xtallography
Documentation Home
Phaser-2.1
General Strategy for Automated Molecular Replacement
Building an Ensemble from Coordinates
Composition by Molecular Weight
High Degree of Non-crystallographic Symmetry
Pseudo-translational Non-crystallographic Symmetry
Reference
● McCoy, A.J., Grosse-Kunstleve, R.W., Adams, P.D., Winn, M.D., Storoni, L.C. & Read, R.J. (2007). Phaser crystallographic software. J. Appl. Cryst. 40, 658-674.
Tutorials and Example Files
We thank Mike James and Natalie Strynadka for the BETA-BLIP test case diffraction data. Reference: Strynadka, N.C.J., Jensen, S.E., Alzari, P.M. & James, M.N.G. (1996) Nat. Struct. Biol. 3, 290-297. We thank Paul Adams for the Insulin test case diffraction data. Reference: Adams, P.D. (2001) Acta Cryst. D57, 990-995.
Bug Reports
We apologize for the bugs. Please send bug reports to [email protected]
General Strategy for Automated Molecular Replacement
Automated Molecular Replacement in Phaser combines the anisotropy correction, likelihood-enhanced fast rotation function, likelihood-enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups. The phenix AUTO_MR wizard runs Phaser in default mode and allows some key changes to the default mode which may give structure solution in more difficult cases. Experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies. However, if the AUTO_MR wizard doesn't give a solution even with non-default input, you need to run Phaser outside the wizard to access the full range of Phaser control options. Details of how to run Phaser using keyword input or from Python scripts are found at the Phaser home page.
How to Define Models
Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned homologous structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit.

Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes. If fp is the fraction scattering and RMS is the rms coordinate error, then

Sigma(A) = sqrt{ fp * [1 - fsol*exp(-Bsol*(sin(theta)/lambda)^2)] } * exp{ -(8*pi^2/3) * RMS^2 * (sin(theta)/lambda)^2 }

where fsol (default 0.95) and Bsol (default 300 Å^2) account for the effects of disordered solvent on the completeness of the model at low resolution.
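The curve is straightforward to evaluate numerically. The function below is a direct transcription of the formula above (it is not Phaser code); stol is sin(theta)/lambda, and fp, RMS, fsol and Bsol are as defined in the text.

# Direct transcription of the Sigma(A) formula above (not Phaser code).
import math

def sigma_a(stol, fp, rms, fsol=0.95, bsol=300.0):
    s2 = stol * stol
    completeness = fp * (1.0 - fsol * math.exp(-bsol * s2))
    falloff = math.exp(-(8.0 * math.pi ** 2 / 3.0) * rms ** 2 * s2)
    return math.sqrt(completeness) * falloff

# e.g. half the scattering, 1.0 A coordinate error, at 2.5 A resolution
print(sigma_a(stol=1.0 / (2.0 * 2.5), fp=0.5, rms=1.0))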
Building an Ensemble from Coordinates
If you have an NMR ensemble as a model, there is no need to split the coordinates in the pdb file, provided that the models are separated by MODEL and ENDMDL cards. In this case the homology is not a good indication of the similarity of the structural coordinates to the target structure. You should use the RMS option; several test cases have succeeded where the ID was close to 100% with an RMS value of about 1.5 Å (see table below). The RMS deviation is entered directly, or indirectly via the sequence identity (ID), using the formula

RMS = max(0.8, 0.4*exp(1.87*(1.0-ID)))

where ID is the fraction identity. The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID by, say, 0.5 Ångstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide to the default RMS value corresponding to ID.
Sequence ID    RMS deviation
100%           0.80 Å
64%            0.80 Å
63%            0.799 Å
50%            1.02 Å
40%            1.23 Å
30%            1.48 Å
20%            1.78 Å
0% (limit)     2.60 Å
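The table is generated by the RMS formula given above; a short transcription (not Phaser code) reproduces it:

# Transcription of the default RMS-vs-identity formula (not Phaser code).
import math

def rms_from_id(id_frac):
    return max(0.8, 0.4 * math.exp(1.87 * (1.0 - id_frac)))

for pct in (100, 63, 50, 40, 30, 20, 0):
    print("%3d%%  %.2f A" % (pct, rms_from_id(pct / 100.0)))
# 0.80, 0.80, 1.02, 1.23, 1.48, 1.78, 2.60 -- matching the table (to rounding)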
If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.
How to Define Composition
The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for. You can mix compositions entered by molecular weight with those entered by sequence.
Composition by Molecular Weight
The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.
Composition by Sequence
The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format.
How to Select Peaks
If the AUTO_MR wizard fails to find a solution with default input, a solution may be found by changing the default selection criteria for peaks from the rotation function that are carried through to the translation function. The selection criterion can be changed by choosing the "edit rarely used inputs" option in the wizard. Selection can be done in four different ways.
Select by Percent
Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%. Default cutoff is 75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function).
Select by Z-Score
Number of standard deviations (sigmas) over the mean (the Z-score). This is an absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma).
Select by Number
Number of top peaks to select. If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.
Select All
All peaks are selected. Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.
Has Phaser Solved It?
Ideally, only the number of solutions you are expecting should be found. However, if the signal-to-noise of your search is low, there will also be noise peaks in the final selection. A highly compact summary of the history of a solution is given in the annotation of a solution in the .sol file.
This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG. You should see that the TFZ (the translation function Z-score) is high, at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features:

SOLU SET RFZ=11.0 TFZ=22.6 PAK=0 LLG=434 RFZ=6.2 TFZ=28.9 PAK=0 LLG=986 LLG=986
SOLU 6DIM ENSE beta EULER 200.920 41.240 183.776 FRAC -0.49641 -0.15752 -0.28125
SOLU 6DIM ENSE blip EULER 43.873 80.949 117.141 FRAC -0.12290 0.29306 -0.09193
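Because the annotation is a simple sequence of KEY=value tokens, it is easy to check programmatically. A minimal Python sketch (not part of PHENIX; it assumes only the token layout shown above):

import re

line = ("SOLU SET RFZ=11.0 TFZ=22.6 PAK=0 LLG=434 "
        "RFZ=6.2 TFZ=28.9 PAK=0 LLG=986 LLG=986")

history = re.findall(r"(RFZ|TFZ|PAK|LLG)=([-\d.]+)", line)
tfz = [float(v) for key, v in history if key == "TFZ"]
llg = [float(v) for key, v in history if key == "LLG"]

print("final TFZ:", tfz[-1])   # want this to be high (see the table below)
print("LLG trace:", llg)       # should increase as components are added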
TF Z-score      Have I solved it?
less than 5     no
5 - 6           unlikely
6 - 7           possibly
7 - 8           probably
more than 8     definitely*
For a rotation function, the correct solution may be in the list with a Z-score under 4, and will not be found until a translation function is performed and picks out the correct solution. For a translation function the correct solution will generally have a Z-score (number of standard deviations above the mean value) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! *Note, in particular, that in the presence of translational NCS, pairs of similarly-oriented molecules separated by the correct translation vector will give large Z-scores, even if they are incorrect, because they explain the systematic variation in intensities caused by the translational NCS.
You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 10 clashes are allowed. Such a solution may be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes (but see "What not to do" below before raising the clash limit).
What to do in difficult cases
Not every structure can be solved by molecular replacement, but the right strategy can push the limits.
What to do when the default jobs fail depends on why your structure is difficult.
Flexible Structure
The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. Alternatively, you could try generating a series of models perturbed by normal modes. One of these may duplicate the hinge motion and provide a good single model.
Poor or Incomplete Model
Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected.
For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. Try increasing the number of clustered orientations. If that fails, try turning off the clustering feature in the save step, because the correct orientation may sit on the shoulder of a peak in the rotation function. As shown convincingly by Schwarzenbacher et al. (Schwarzenbacher, Godzik, Grzechnik & Jaroszewski, Acta Cryst. D60, 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol.
High Degree of Non-crystallographic Symmetry
If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal.
First search with the oligomeric model; if this fails, search with a monomer.
Pseudo-translational Non-crystallographic Symmetry
It is frequently the case that crystallographic and non-crystallographic rotational symmetry axes are parallel. The combination generates translational NCS, in which more than one unique copy of the molecule is found in the same orientation in the crystal. This can be recognized by the presence of large non-origin peaks in the native Patterson map. If one copy of the search model can be found, then the translational NCS tells you where to place another copy. Unfortunately, the presence of translational NCS can make it difficult to solve a structure using Phaser, because the current likelihood targets do not account for the statistical effects of NCS. If there is a small difference in the orientation of the two molecules (which will show up as a reduction in the height of the non-origin Patterson peak as the resolution is increased), it may help to use data to higher resolution than the default, because the translational NCS is partially broken.
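The Patterson test mentioned above is straightforward to reproduce: the native Patterson is a Fourier summation over the observed intensities, and translational NCS shows up as a non-origin peak that is a substantial fraction of the origin peak. A toy direct-summation sketch in Python (illustrative only; hkl and f2 stand in for your measured Miller indices and |F|^2 values, and real programs use an FFT instead):

import numpy as np

def patterson(u, hkl, f2):
    # P(u) = sum_h |F(h)|^2 cos(2 pi h.u), with u a fractional coordinate
    return float((f2 * np.cos(2.0 * np.pi * hkl @ np.asarray(u))).sum())

hkl = np.random.randint(-5, 6, size=(500, 3))   # stand-in Miller indices
f2 = np.random.rand(500)                        # stand-in intensities
origin = patterson((0.0, 0.0, 0.0), hkl, f2)
trial = patterson((0.5, 0.0, 0.5), hkl, f2)
# A trial peak at roughly 20% or more of the origin peak height suggests
# translational NCS with that fractional translation vector.
print(trial / origin)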
What not to do
The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem.
When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs. It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TF-function Z-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default (10) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.
Other suggestions
Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information at the Phaser home page to take advantage of these facilities!
Superimposing two PDB files with superpose_pdbs
Author(s)
● superpose_pdbs: Peter Zwart, Pavel Afonine, Ralf W. Grosse-Kunstleve
Purpose
superpose_pdbs is a command line tool for superimposing one PDB model on another and writing out the superimposed model.
Usage
How superpose_pdbs works:
superpose_pdbs performs a least-squares superposition of two selected parts from two PDB files. If no selection is provided for the fixed and moving models, the whole content of both input PDB files is used for superposition. If the number of atoms in the fixed and moving models is different and the models contain amino-acid residues, then sequence alignment is performed and the matching residues (CA atoms by default; this can be changed by the user) are used for superposition. Note that the selected (and/or matching) atoms are the atoms used to find the superposition operators, while these operators are applied to the whole moving structure.
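The underlying least-squares fit is the classic Kabsch superposition. A generic numpy sketch of that step (not the actual PHENIX implementation; fixed and moving are matched coordinate arrays, such as paired CA atoms):

import numpy as np

def superpose(fixed, moving):
    # Kabsch least-squares fit: returns (r, t) with fixed ~ moving @ r.T + t
    cf, cm = fixed.mean(axis=0), moving.mean(axis=0)
    h = (moving - cm).T @ (fixed - cf)            # 3x3 covariance matrix
    u, s, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))        # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = cf - r @ cm
    return r, t

# The operators found from the matched atoms are then applied to the whole
# moving structure, as described above.
fixed = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
r, t = superpose(fixed, fixed + 2.0)              # a pure translation
print(np.round(r, 3), np.round(t, 3))             # identity rotation, t = -2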
Output files from superpose_pdbs
A PDB file with the fitted model.
Examples
Standard run of superpose_pdbs:
Running superpose_pdbs is easy. From the command-line you can type:

phenix.superpose_pdbs fixed.pdb moving.pdb
Parameters can be changed from the command line:

phenix.superpose_pdbs fixed.pdb moving.pdb selection_fixed="chain A and name CA" selection_moving="chain B and name CA"
Possible Problems
Specific limitations and problems:
● Different number of atoms in selection_fixed and selection_moving when no sequence alignment can be performed (the molecules contain no amino-acid residues) or sequence alignment failed to find matching residues.
● More than one model in one PDB file (separated with MODEL-ENDMDL).
Literature
Additional information
List of all superpose_pdbs keywords
-------------------------------------------------------------------------------
Legend:
  black bold - scope names
  black - parameter names
  red - parameter values
  blue - parameter help
  blue bold - scope help
Parameter values:
  * means selected parameter (where multiple choices are available)
  False is No
  True is Yes
  None means not provided, not predefined, or left up to the program
  "%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
selection_fixed= None   Selection of the target atoms to fit to (optional)
selection_moving= None  Selection of the atoms that will be fit to
                        selection_fixed (optional)
input
pdb_file_name_fixed= None Name of PDB file with model to fit to
pdb_file_name_moving= None Name of PDB file with model that will be fit to
pdb_file_name_fixed
crystal_symmetry
Unit cell and space group parameters
unit_cell= None
space_group= None
output
file_name= None Name of PDB file with model that best fits to
pdb_file_name_fixed
alignment
Set of parameters for sequence alignment. Defaults are good for most
of cases
alignment_style= local *global
gap_opening_penalty= 1
gap_extension_penalty= 1
similarity_matrix= blosum50 dayhoff *identity
selection= peptide and name ca Select protein atoms that will be used in
superposition after sequence alignment
Density modification with multi-crystal averaging with phenix.multi_crystal_average
Author(s)
● phenix.multi_crystal_average: Tom Terwilliger
Purpose
phenix.multi_crystal_average is a command line tool for carrying out density modification that makes use of NCS symmetry within a crystal and of electron density from multiple crystals.
Usage
How phenix.multi_crystal_average works:
The inputs to phenix.multi_crystal_average are a set of PDB files that define the NCS within each crystal and the relationships of density between crystals, structure factor amplitudes (and optional phases, FOM and HL coefficients) for each crystal, and starting electron density maps for one or more crystals.

The PDB files should be composed of exactly the same set of chains, placed in a different position and orientation for each NCS asymmetric unit of each crystal. You might create these PDB files by molecular replacement starting with the same search model for each crystal. You should not refine these MR solutions; they are only used to get the NCS relationships, and these will be more reliably found if the models for all NCS asymmetric units are identical. You can break the NCS asymmetric unit into domains and place them independently. You can specify the domains by giving them unique chain IDs (or you can use the routine edit_chains.py to do this for you; see below). A separate NCS group will be created for each domain. Additionally, if your NCS asymmetric unit consists of more than one chain (A+B for example) then each chain will always be treated as a separate NCS group.

phenix.multi_crystal_average first uses the supplied PDB files to calculate NCS operators relating the NCS asymmetric unit in each crystal to all other NCS asymmetric units in that crystal and in other crystals. This is done by adding the unique chains in one crystal to each PDB file in turn, finding all the NCS relationships from all chains in that composite PDB file, and removing duplicate identity transformations. For example, suppose the NCS asymmetric unit is one chain (A, B, C, ...). Then to relate all NCS asymmetric units to the NCS asymmetric unit of crystal 0, phenix.multi_crystal_average will compare all chains in the PDB file for each crystal to the unique chain in the PDB file for crystal 0, generating one NCS operator for each chain in each crystal. In this process the unique chain (in this case the NCS asymmetric unit of crystal 0) is renamed to a unique name (usually "**"), a composite PDB file is created with this chain along with all the chains in the PDB file for the crystal being considered, and phenix.simple_ncs_from_pdb is used to find the NCS operators. The centroids of the chains defining NCS are used as centers of the regions where the NCS operator is to be applied. If the supplied PDB files have more than one domain or chain in each NCS asymmetric unit, then the domains or chains are grouped into separate NCS groups.

Once NCS operators have been identified, density modification is carried out sequentially on data from each crystal. During density modification for one crystal, the current electron density maps from all other crystals are used in generating target density for density modification, in exactly the same way as NCS-related density is normally used when only a single crystal is available. First the asymmetric unit of NCS is defined, in this case including the density in all NCS copies within the crystal being density modified as well as the density in all NCS copies in all other crystals. The asymmetric unit of NCS is the region over which the NCS operators apply. It is assumed to be identical for all NCS copies for all crystals, with orientation and position identified by the NCS operators. It is identified as the region over which all NCS copies have correlated density. If a mask for the protein/solvent boundary is supplied (by specifying use_model_mask), then the asymmetric unit of NCS is constrained to be within the non-solvent region of the map. Alternatively, if you request that the domains provided in your PDB files be used to define the NCS asymmetric unit (by specifying write_ncs_domain_pdb), then the NCS asymmetric unit (for each NCS group) is limited to the region occupied by the corresponding chains in your PDB files.

Then a target density map is created for the crystal being density modified. For each NCS copy in this crystal, the average density for all other NCS copies in this and other crystals is used as a target. Finally, statistical density modification is carried out using histograms of expected density, solvent flattening, and the NCS-based target density for this crystal. The process is then repeated for all other crystals. For those crystals for which no starting phases were available, one additional step is carried out in which the target density map is used by itself to calculate a starting electron density map (using RESOLVE map-based phasing). This entire process is carried out several times, leading to electron density maps for all crystals that typically have a high level of correlation of density within all NCS copies in each crystal and between the corresponding NCS regions in different crystals.
Output files from phenix.multi_crystal_average
denmod_cycle_1_xl_0.mtz: Density-modified map coefficients for crystal 0, cycle 1. Crystal 0 is the first crystal specified in your pdb_list, map_coeff_list, etc.

denmod_cycle_5_xl_1.mtz: Density-modified map coefficients for crystal 1, cycle 5. These map coefficients are suitable for model-building. They also contain HL coefficients that can optionally be used in refinement. As the HL coefficients contain information from all crystals, they may in some cases be useful in refinement (normally you would only use experimental HL phase information in refinement, as the NCS-based information would come from your NCS restraints in refinement).
Examples
Standard run of phenix.multi_crystal_average:
Running phenix.multi_crystal_average is easy. Usually you will want to edit a small parameter file (run_multi.eff) to contain your commands, like this:

# run_multi.eff  commands for running phenix.multi_crystal_average
# use: "phenix.multi_crystal_average run_multi.eff"
multi {
  pdb_list = "crystal_1.pdb" "crystal_2.pdb"
  map_coeff_list = "crystal_1_map_coeffs.mtz" None
  datafile_list = "crystal_1_data.mtz" "crystal_2_data.mtz"
  datafile_labin_list = "FP=FP" "FP=F SIGFP=SIGF PHIB=PHI FOM=FOM HLA=HLA HLB=HLB HLC=HLC HLD=HLD"
  solvent_content_list = "0.43" "0.50"
  cycles = 5
}
Then you can run this with the command:

phenix.multi_crystal_average run_multi.eff
In this example we have 2 crystals. Crystal 1 has starting map coefficients in crystal_1_map_coeffs.mtz and data for FP in crystal_1_data.mtz. The contents of this crystal are represented by crystal_1.pdb. The second crystal has no starting map, has data for FP as well as PHI and HL coefficients in crystal_2_data.mtz, and the contents of this crystal are represented by crystal_2.pdb. The solvent contents of the 2 crystals are 0.43 and 0.50, and 5 overall cycles are to be done. The column label strings like "FP=FP" are optional; if you say instead "None", then phenix.multi_crystal_average will guess them for you.
Run of phenix.multi_crystal_average with multiple domains:
If your PDB files have more than one NCS domain within a chain, then you may want to split the chains up into sub-chains representing the individual NCS domains. This will provide a better definition of the NCS operators when the PDB files are analyzed. You can use the jiffy "edit_chains.py" to do this. This jiffy splits your chains up into sub-chains based on the domains that you specify in "edit_chains.dat". NOTE: edit_chains.py only works if your chains have single-letter IDs. (It simply adds another character to your chain IDs to make new ones.) If you have two-letter chain IDs, then you'll have to do this another way. To use it, type:

phenix.python $PHENIX/phenix/phenix/autosol/edit_chains.py file.pdb edited_file.pdb
The file edit_chains.dat is required and should look like:
A 1 321
A 322 597
A 598 750
A 751 902
A 903 1082
B 1 58
B 424 425
B 59 101
B 343 423
B 102 342

where each line gives the chain ID and residue range for a particular domain. You should specify these for ALL chains in your PDB files (not just the unique ones).
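If you cannot use edit_chains.py (for example, because your chains already have two-letter IDs), the same splitting is simple to script. A rough stand-in in Python (an illustration only, not the PHENIX jiffy; unlike edit_chains.py it assigns each domain a fresh single-letter chain ID, so in real use you must make sure the new IDs do not collide with existing ones):

import string

def split_chains(pdb_in, pdb_out, domains):
    # domains: list of (chain_id, first_res, last_res) as in edit_chains.dat
    new_id = dict(enumerate(string.ascii_uppercase))
    with open(pdb_in) as fin, open(pdb_out, "w") as fout:
        for line in fin:
            if line.startswith(("ATOM  ", "HETATM")):
                chain, resseq = line[21], int(line[22:26])
                for i, (dch, first, last) in enumerate(domains):
                    if chain == dch and first <= resseq <= last:
                        line = line[:21] + new_id[i] + line[22:]
                        break
            fout.write(line)

# the first two domains from the edit_chains.dat example above
split_chains("file.pdb", "edited_file.pdb", [("A", 1, 321), ("A", 322, 597)])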
Run of phenix.multi_crystal_average using PDB files to define the NCS asymmetric unit:
If you specify the parameter write_ncs_domain_pdb=True, then phenix.multi_crystal_average will write out domain-specific PDB files for each domain in your model (based on its analysis of NCS, one for each NCS group). Then it will use those domain-specific PDB files to define the region over which the corresponding set of NCS operators apply. This is generally a good idea if you have multiple domains in your structure.
Possible Problems
Specific limitations and problems:
● If the NCS asymmetric unit of your crystal contains more than one chain, phenix.multi_crystal_average will consider it to have more than one domain. This limitation comes from phenix.simple_ncs_from_pdb, which assigns one NCS group to each unique chain in the NCS asymmetric unit. If you would like phenix.multi_crystal_average to consider several chains as a single NCS group, then you would need to rename your chains and residues so that all the residues in a single NCS group have the same chain name and so that residue numbers are not duplicated. Normally you do not need to do this, but if you want to use phenix.multi_crystal_average to generate phases for one crystal from another and you have more than one chain in the NCS asymmetric unit, you would have to do this.

● If your NCS asymmetric unit has more than one domain (more than one chain, or else multiple domains within a chain that have different arrangements in different NCS asymmetric units), then phenix.multi_crystal_average requires that you provide map coefficients for all crystals. This is because phenix.multi_crystal_average cannot use the PDB files you provide to generate the NCS asymmetric unit directly at this point (i.e., it cannot use pdb_domain in RESOLVE). Therefore if you don't provide map coefficients for one crystal then it does not have a way to individually identify the region occupied by each domain in the NCS asymmetric unit for that crystal. This isn't a problem if there are not multiple domains or chains in the NCS asymmetric unit, because the automatic method for generation of the NCS asymmetric unit can be used.

● Normally you should supply PDB files defining the NCS in your crystals in which all the chains have identical sequences and conformations within each NCS copy. This is not absolutely required, however. If your PDB file contains chains that are not identical, then NCS will be estimated from the chains you provide. It may be necessary to set the parameter simple_ncs_from_pdb.maximize_size_of_groups=True to get this to work if the chains have insertions, deletions, or sequence differences.

● The size of the asymmetric unit in the SOLVE/RESOLVE portion of phenix.multi_crystal_average is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36) binaries. Larger-size versions can be obtained on request.
Literature
Rapid automatic NCS identification using heavy-atom substructures. T.C. Terwilliger. Acta Cryst. D58, 2213-2215 (2002)

Statistical density modification with non-crystallographic symmetry. T.C. Terwilliger. Acta Cryst. D58, 2082-2086 (2002)

Maximum likelihood density modification. T.C. Terwilliger. Acta Cryst. D56, 965-972 (2000)

Map-likelihood phasing. T.C. Terwilliger. Acta Cryst. D57, 1763-1775 (2001)
Additional information
List of all multi_crystal_average keywords
include= scope phenix.command_line.simple_ncs_from_pdb.ncs_master_params
multi
verbose= True verbose output
debug= False debugging output
pdb_list= None List of PDB files, one for each crystal. These should be in
the same order as datafiles and map files. They are used to
identify the NCS within each crystal and between crystals. You
should create these by placing the unique set of atoms (the NCS
asymmetric unit) in each NCS asymmetric unit of each unit cell.
Normally you would do this by carrying out molecular replacement
on each crystal with the same search model.
output_file= None You can name the output file (your own path) if you like
map_coeff_list= None List of mtz files with map coefficients. At least one
crystal must have map coefficients. Use "None" for any
crystals that do not have starting maps. NOTE: If you have
multiple NCS groups then you need map coefficients for all
crystals.
map_coeff_labin_list= None list of labin lines for mtz files with map
coefficients. They look like map_coeff_labin_list="
'FP=FP PHIB=PHIM FOM=FOMM'" Put each set of labin
values inside single quotes, and the whole list
inside double quotes. You can leave out a labin
statement for a file by putting in None and the
routine will guess the column labels
datafile_list= None list of mtz files with structure factors and optional
phases and FOM and optional HL coefficients. One datafile
for each crystal to be included
datafile_labin_list= None list of labin lines for mtz files . Each one can
contain FP SIGFP [PHIB FOM] [HLA HLB HLC HLD]. They
look like this: datafile_labin_list=" 'FP=FP
SIGFP=SIGFP PHIB=PHIM FOM=FOMM'" Put each set of labin
values inside single quotes, and the whole list inside
double quotes. You can leave out a labin statement for
a file by putting in None and the routine will guess
the column labels NOTE: If you supply HL coefficients
they will be used in phase recombination. If you
supply PHIB or PHIB and FOM and not HL coefficients,
then HL coefficients will be derived from your PHIB
and FOM and used in phase recombination.
solvent_content_list= None Solvent content (0 to 1, typically 0.5) for each
crystal
cycles= 5 Number of cycles of density modification
resolution= None high-resolution limit for map calculation
temp_dir= "temp_dir" Optional temporary work directory
output_dir= "" Output directory where files are to be written
perfect_map_coeff_list= None Optional list of mtz files with perfect map
coefficients for comparison
perfect_map_coeff_labin_list= None list of labin lines for mtz files with
perfect map coefficients.
use_model_mask= False You can use the PDB files you input to define the
solvent boundary if you wish. These will partially define
the NCS asymmetric unit (by limiting it to the non-solvent
region) but the exact NCS asymmetric unit will always be
defined automatically (by the overlap of NCS-related
density). Note that this is different than the command
write_ncs_domain_pdb which defines individual regions where
NCS applies for each domain.
coarse_grid= False You can set coarse_grid in resolve
sharpen= False You can sharpen the maps or not in the density-modification
process. (They are unsharpened at the end of the process if so).
equal_ncs_weight= False You can fix the NCS weighting to equally weight all
copies.
weight_ncs= None You can set the weighting on NCS symmetry (and
cross-crystal averaging)
write_ncs_domain_pdb= None You can use the input PDB files to define NCS
boundaries. The atoms in the PDB files will be
grouped into domains during the analysis of NCS and
written out to domain-specific PDB files. (If there
is only one domain or NCS group then there will be
only one domain-specific PDB file and it will be the
same as the starting PDB file.) Then the
domain-specific PDB files will be used to define the
regions over which the corresponding NCS operators
apply. Note that this is different than the command
use_model_mask which only defines the overall solvent
boundary with your model.
mask_cycles= 1 Number of mask cycles in each cycle of density modification
dry_run= False Just read in and check parameter names
Correlation of map and model after adjusting model for origin shifts with get_cc_mtz_pdb
Author(s)
● get_cc_mtz_pdb: Tom Terwilliger
Purpose
get_cc_mtz_pdb is a command line tool for adjusting the origin of a PDB file using space-group symmetry so that the PDB file superimposes on a map, obtaining the correlation of model and map, and analyzing the correlation for each residue.
Usage
How get_cc_mtz_pdb works:
get_cc_mtz_pdb calculates a model map based on the supplied PDB file, then uses RESOLVE to find the origin shift (using space-group symmetry) that maximizes the correlation of this model map with a map calculated with the supplied map coefficients in an mtz file. This shift is applied to the atoms in the PDB file to create offset.pdb, and then the correlation of offset.pdb with the map is analyzed residue-by-residue. Atoms and residues that are out of density or are in weak density are flagged.

You can set several parameters to define how the correlations are calculated. By default, model density is calculated using the atom types, occupancies and isotropic thermal factors (B-values) supplied in the PDB file. If you specify scale=True then an overall B, as well as an increment in B-values for each atom beyond CB (for proteins), will be added to the values in the PDB file, after adjusting these parameters to maximize the map correlation. If you specify use_only_refl_present_in_mtz=True then the model-based map will be calculated using the same set of reflections as the map calculated from your input mtz file. This reduces the contribution of missing reflections to the calculation (but the correlation is no longer the actual map-model correlation).

In the calculation of the map correlation in the region of the model, the region where the model is located is defined as all points within a distance rad_max of an atom in the model. The value of rad_max is adjusted in each case to maximize this correlation. Its value is typically similar to the high-resolution limit of the map.
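The correlation itself is an ordinary Pearson correlation over map grid points; for the local statistic, only points within rad_max of a model atom are included. A generic numpy sketch (not the RESOLVE implementation; the random maps and mask here are placeholders):

import numpy as np

def map_cc(map1, map2, mask=None):
    # Pearson correlation of two maps sampled on the same grid; an optional
    # boolean mask restricts the comparison, e.g. to points near the model.
    a, b = np.asarray(map1, float), np.asarray(map2, float)
    if mask is not None:
        a, b = a[mask], b[mask]
    a, b = a - a.mean(), b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

rng = np.random.default_rng(0)
model_map = rng.standard_normal((24, 24, 24))
obs_map = model_map + 0.5 * rng.standard_normal((24, 24, 24))
near_model = rng.random((24, 24, 24)) < 0.2   # stand-in for the rad_max region
print(map_cc(obs_map, model_map))             # overall correlation
print(map_cc(obs_map, model_map, near_model)) # correlation in the model region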
Output files from get_cc_mtz_pdb
offset.pdb: A PDB file offset to match the origin in the mtz file.
Examples
Standard run of get_cc_mtz_pdb:
Running get_cc_mtz_pdb is easy. From the command-line you can type:

phenix.get_cc_mtz_pdb map_coeffs.mtz coords.pdb

If you want (or need) to specify the column names from your mtz file, you will need to tell get_cc_mtz_pdb what FP and PHIB (and optionally FOM) are, in this format:

phenix.get_cc_mtz_pdb map_coeffs.mtz coords.pdb \
  labin="FP=2FOFCWT PHIB=PH2FOFCWT"
Possible Problems
Specific limitations and problems:
In versions of PHENIX up to 1.3-final, defaults were set to maximize the correlation coefficient rather than to give the correlation using the existing thermal parameters and including only the reflections present in the mtz file. These previous defaults were equivalent to using the values:

scale=True use_only_refl_present_in_mtz=True

These defaults were changed so that the correlation values obtained by default in a case where no origin shifts are needed would correspond to those obtained by simply calculating (1) a map using the input map coefficients and (2) a map from the PDB file, and then determining the correlation between these maps.
Literature
Additional information
List of all get_cc_mtz_pdb keywords
get_cc_mtz_pdb
pdb_in= None PDB file with coordinates to evaluate
mtz_in= None MTZ file with coefficients for a map
labin= "" Labin line for MTZ file with map coefficients. This is optional
if get_cc_mtz_pdb can guess the correct coefficients for FP PHI and
FOM. Otherwise specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM where
myFP is your column label for FP
resolution= 0.
high-resolution limit for map calculation
use_only_refl_present_in_mtz= False You can specify that only reflections
present in your mtz file are used in the
comparison.
scale= False If you set scale=True then get_cc_mtz_pdb applies an overall B
factor and a delta_b for each atom beyond CB.
chain_type= *PROTEIN DNA RNA Chain type (for identifying main-chain and
side-chain atoms)
temp_dir= "temp_dir" Optional temporary work directory
output_dir= "" Output directory where files are to be written
verbose= True Verbose output
quick= False Skip the residue-by-residue correlations for a quick run
debug= False Debugging output
dry_run= False Just read in and check parameter names
Correlation of two maps after accounting for origin shifts with get_cc_mtz_mtz
Author(s)
● get_cc_mtz_mtz: Tom Terwilliger
Purpose
get_cc_mtz_mtz is a command line tool for adjusting the origin of a map so that the map superimposes on another map, and obtaining the correlation of the two maps. The maps are calculated from map coefficients supplied by the user in two mtz files.
Usage
How get_cc_mtz_mtz works:
get_cc_mtz_mtz calculates maps based on the supplied mtz files, then uses RESOLVE to find the origin shift compatible with space-group symmetry that maximizes the correlation of the two maps. This shift is applied to the second map and the correlation of the maps is calculated.

Several parameters can be set by the user to define how the correlations are calculated. By default, maps are calculated using all the reflections present (to the specified high-resolution limit, if any) in each mtz file. If you specify use_only_refl_present_in_mtz_1=True then the map calculated using your second mtz file will only include reflections that were present in your first mtz file. This removes the effects of missing reflections on the correlation. If you specify scale=True then get_cc_mtz_mtz scales the amplitudes from the second input mtz file to those in the first input mtz, including an overall B factor and a scale factor. This reduces the effects of differences in overall B factors between the two mtz files on the correlation. If you specify keep_f_mag=False then get_cc_mtz_mtz uses amplitudes from the first input mtz file and phases and figures of merit from both to do the correlation. This has the effect of removing effects due to differences in amplitudes on the correlation, focusing instead on differences in phases and figures of merit.
Output files from get_cc_mtz_mtz
offset.log: Log file for correlation calculation.
Examples
Standard run of get_cc_mtz_mtz:
Running get_cc_mtz_mtz is easy. From the command-line you can type:

phenix.get_cc_mtz_mtz map_coeffs_1.mtz map_coeffs_2.mtz

If you want (or need) to specify the column names from your mtz files, you will need to tell get_cc_mtz_mtz what FP and PHIB (and optionally FOM) are, in this format:

phenix.get_cc_mtz_mtz map_coeffs_1.mtz map_coeffs_2.mtz \
  labin_1="FP=2FOFCWT PHIB=PH2FOFCWT" labin_2="FP=2FOFCWT PHIB=PH2FOFCWT"
Possible Problems
Specific limitations and problems:
● Versions of phenix.get_cc_mtz_mtz up to 1.3-final used a different set of defaults, with the values:

scale=True use_f_mag=False use_only_refl_present_in_mtz_1=True

These defaults were changed after version 1.3-final in order to make the results independent of the order of the mtz files and to make the default be to get the correlation of maps without manipulation.
Literature
Additional information
List of all get_cc_mtz_mtz keywords
get_cc_mtz_mtz
mtz_1= None MTZ file 1 with coefficients for a map
mtz_2= None MTZ file 2 with coefficients for a map
labin_1= "" Labin line for MTZ file 1 with map coefficients. This is
optional if get_cc_mtz_mtz can guess the correct coefficients for
FP PHI and FOM. Otherwise specify: LABIN FP=myFP PHIB=myPHI
FOM=myFOM where myFP is your column label for FP
labin_2= "" Labin line for MTZ file 2 with map coefficients. This is
optional if get_cc_mtz_mtz can guess the correct coefficients for
FP PHI and FOM. Otherwise specify: LABIN FP=myFP PHIB=myPHI
FOM=myFOM where myFP is your column label for FP
resolution= 0.
high-resolution limit for map calculation
low_resolution= 1000.
low-resolution limit for map calculation
temp_dir= "temp_dir" Optional temporary work directory
output_dir= "" Output directory where files are to be written
keep_f_mag= True If you set keep_f_mag=False then get_cc_mtz_mtz uses
amplitudes from the first input mtz file and phases and fom
from both to do the correlation. If you specify keep_f_mag=True
then the amplitudes from both files are included.
scale= False If you set scale=True then get_cc_mtz_mtz scales the
amplitudes from the second input mtz file to those in the first
input mtz, including an overall B factor and a scale factor.
use_only_refl_present_in_mtz_1= False You can specify that only reflections
present in your first mtz file are used in
the comparison. Note that this means that
the order of the files will have an effect
on the correlation coefficient
verbose= True Verbose output
debug= False Debugging output
dry_run= False Just read in and check parameter names
Rapid helix fitting to a map with find_helices_strands
Author(s)
● find_helices_strands: Tom Terwilliger
Purpose
find_helices_strands is a command line tool for finding helices and strands in a map and building a model of the parts of a structure that have regular secondary structure. It can be used for protein, RNA, and DNA.
Usage
How find_helices_strands finds helices and strands in maps:
find_helices_strands first identifies helical segments as rods of density at 5-8 A resolution. Then it identifies helices at higher resolution, keeping the overall locations of the helices fixed. Next it identifies the directions and CA positions of helices by noting the helical pattern of high-density points offset slightly along the helix axis from the main helical density (as used in "O" to identify helix direction). Finally, model helices are fitted to the density using the positions and orientations identified in the earlier steps. A similar procedure is used to identify strands. Then the helices and strands are combined into a single model.
How find_helices_strands finds RNA and DNA helices in maps:
find_helices_strands finds RNA and DNA helices differently than it finds helices in proteins. It uses a convolution search to find places in the asymmetric unit where an A-form RNA or B-form DNA helix can be placed. These are assembled into contiguous helical segments if possible. The resolution of this search is 4.5 A if you have resolution beyond 4.5 A, and the resolution of your data otherwise.
Output files from find_helices_strands
If you run find_helices_strands with my_map.mtz then you will get my_map.mtz_helices_strands.pdb, which is a PDB file containing helices from your structure.
Examples
Standard run of find_helices_strands:
Running find_helices_strands is easy. From the command-line you can type:

phenix.find_helices_strands map_coeffs.mtz quick=True

If you want a more thorough run, then skip the "quick=True" flag. If you want (or need) to specify the column names from your mtz file, you will need to tell find_helices_strands what FP and PHIB are, in this format:

phenix.find_helices_strands map_coeffs.mtz \
  labin="LABIN FP=2FOFCWT PHIB=PH2FOFCWT"

If you want to specify a sequence file, then in the last step find_helices_strands will try to align your sequence with the map and model:

phenix.find_helices_strands map_coeffs.mtz seq_file=seq.dat
Using find_helices_strands to bootstrap phenix.autobuild:
If you run phenix.autobuild at low resolution (3.5 A or lower) then your model may have strands built instead of helices. You can use find_helices_strands to help bootstrap autobuild model-building by providing the helical model from find_helices_strands to phenix.autobuild. Just run phenix.find_helices_strands with your best map map_coeffs.mtz. Then take the helical model map_coeffs.mtz_helices.pdb and pass it to phenix.autobuild with the keyword (in addition to your usual keywords for autobuild):

consider_main_chain_list=map_coeffs.mtz_helices.pdb
Then the AutoBuild wizard will treat your helical model just like one of the models that it builds, and merge it into the model as it is being assembled.
Possible Problems
Specific limitations and problems:
Literature
Additional information
List of all find_helices_strands keywords
find_helices_strands
mtz_in= None MTZ file with coefficients for a map
output_model= None Output PDB file
output_log= None Output log file name. If you want to specify a directory
to put this file in then please use "output_dir=myoutput_dir"
output_dir= None Output directory
seq_file= None Sequence file for sequence alignment
compare_file= None PDB file for comparison only
labin= "" Labin line for MTZ file with map coefficients. This is optional
if find_helices_strands can guess the correct coefficients for FP
PHI and FOM. Otherwise specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM
where myFP is your column label for FP
resolution= 0.
high-resolution limit for map calculation
res_convolution= 4.5
high-resolution limit for convolution calculation.
(Applies to nucleic acids only)
chain_type= *PROTEIN DNA RNA Chain type (for identifying main-chain and
side-chain atoms)
temp_dir= "temp_dir" Optional temporary work directory
helices_only= False Find only helices
strands_only= False Find only strands
use_any_side= False Use any side chain that fits density in assembly
cc_helix_min= None Minimum CC of low-res helical density to map to keep.
cc_strand_min= None Minimum CC of strand density to map to keep.
quick= False Try to find helices quickly
verbose= True Verbose output
debug= False Debugging output
dry_run= False Just read in and check parameter names
phenix.pdbtools: PDB model manipulations and statistics
Manipulations on a model in a PDB file

The operations below can be applied to the whole model or to selected parts (e.g. "selection=chain A and backbone"). See examples below.
● shaking of coordinates (random coordinate shifts)
● rotation-translation shift of coordinates
● shaking of occupancies
● set occupancies to a value
● shaking of ADP
● shifting of ADP (addition of a constant value)
● scaling of ADP (multiplication by a constant value)
● setting ADP to a given value
● conversion to isotropic ADP
● conversion to anisotropic ADP
● removal of selected parts of a model
Comprehensive model statistics
● Atomic Displacement Parameters (ADP) statistics:
  % phenix.pdbtools model.pdb --show-adp-statistics
● Geometry (stereochemistry) statistics:
  % phenix.pdbtools model.pdb --show-geometry-statistics
In the absence of a CRYST1 record in the PDB file, functionality that doesn't require knowledge of the crystal symmetry is still available. To enable the full functionality, the crystal symmetry can be specified externally (e.g. via the --symmetry option).

Structure factors calculation

The total model structure factor is defined as:

Fmodel = scale * exp(-h*b_cart*ht) * (Fcalc + k_sol * exp(-b_sol*s^2) * Fmask)

where scale is the overall scale factor, h is the Miller index, b_cart is the overall anisotropic scale matrix in the Cartesian basis, Fcalc are structure factors computed from the atomic model, k_sol is the bulk solvent density, b_sol is the smearing factor for the bulk solvent contribution, and Fmask are structure factors computed from the bulk-solvent mask.
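Given arrays of the individual contributions, the expression above is direct to evaluate per reflection. A numpy sketch (an illustration only, not the phenix.pdbtools implementation; s2 follows the s^2 convention of the formula, and h_cart are Miller indices in the Cartesian basis used for b_cart):

import numpy as np

def f_model(f_calc, f_mask, s2, h_cart,
            k_sol=0.35, b_sol=60.0, scale=1.0, b_cart=np.zeros((3, 3))):
    # Fmodel = scale * exp(-h*b_cart*ht) * (Fcalc + k_sol*exp(-b_sol*s^2)*Fmask)
    aniso = np.exp(-np.einsum("ni,ij,nj->n", h_cart, b_cart, h_cart))
    return scale * aniso * (f_calc + k_sol * np.exp(-b_sol * s2) * f_mask)

# toy numbers for three reflections
f_calc = np.array([10 + 0j, 5 + 2j, 3 - 1j])
f_mask = np.array([1 + 0j, 0.5 + 0j, 0.2 + 0j])
s2 = np.array([0.01, 0.04, 0.09])
h_cart = np.array([[1.0, 0, 0], [0, 1, 0], [1, 1, 0]])
print(f_model(f_calc, f_mask, s2, h_cart))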
Add hydrogen atoms

Add H atoms to a model using phenix.reduce. All default parameters of phenix.reduce are used.

Perform model geometry regularization

Minimize a geometry target to idealize bond lengths, bond angles, planarities, chiralities, dihedrals, and non-bonded interactions.

Examples
1) Type phenix.pdbtools from the command line for instructions:
% phenix.pdbtools
2) To see all default parameters:
% phenix.pdbtools --show-defaults=all
3) Suppose a PDB model consists of three chains A, B and C and some water molecules. Remove all atoms in chain C and all waters:

% phenix.pdbtools model.pdb remove="chain C or water"

or one can achieve exactly the same result with the equivalent command:

% phenix.pdbtools model.pdb keep="chain A or chain B"

or:

% phenix.pdbtools model.pdb keep="not(chain C or water)"

or finally:

% phenix.pdbtools model.pdb remove="not(chain A or chain B)"

The result of all four equivalent commands above will be a new PDB file containing chains A and B only. Important: the commands keep and remove cannot be used simultaneously.

4) Remove all but backbone atoms and set all b-factors to 25:
% phenix.pdbtools model.pdb keep=backbone set_b_iso=25
5) Suppose a PDB model consists of three chains A, B and C and some water molecules. Remove all but backbone atoms and set b-factors to 25 for chain C atoms:
% phenix.pdbtools model.pdb keep=backbone set_b_iso=25 selection="chain C"
6) Simple Fcalc from atomic model (Fmodel = Fcalc):
% phenix.pdbtools model.pdb --f_model high_resolution=2.0
This will result in an MTZ file with a complete set of Fcalc up to 2 A resolution.

7) Compute Fmodel including bulk solvent and all other scales, request the output in CNS format, specify a label for the output Fmodel (by default it is FMODEL), set the low_resolution limit, and use the direct method of calculation (rather than FFT):
% phenix.pdbtools model.pdb high_resolution=2.0 format=cns label=FM \
low_resolution=6.0 algorithm=direct k_sol=0.35 b_sol=60 scale=3 \
b_cart='1 2 -3 0 0 0' --f_model
8) Compute Fcalc using neutron scattering dictionary:
% phenix.pdbtools model.pdb --f_model high_resolution=2.0 scattering_table=neutron
9) The input model can be manipulated before the structure factor calculation:
% phenix.pdbtools model.pdb --f_model high_resolution=2.0 sites.shake=1.0
10) Add H atoms to a model:
% phenix.pdbtools model.pdb --add_h output.file_name=model_h.pdb
11) Model geometry regularization:
% phenix.pdbtools model.pdb --geometry_regularization
List of all pdbtools keywords
modify
remove= None Selection for the atoms to be removed
keep= None Select atoms to keep
put_into_box_with_buffer= None Move molecule into center of box.
selection= None Selection for atoms to be modified
random_seed= None Random seed
adp
Scope of options to modify ADP of selected atoms
atom_selection= None Selection for atoms to be modified. Overrides
parent-level selection.
randomize= None Randomize ADP within a certain range
set_b_iso= None Set ADP of atoms to set_b_iso
convert_to_isotropic= None Convert atoms to isotropic
convert_to_anisotropic= None Convert atoms to anisotropic
shift_b_iso= None Add shift_b_iso value to ADP
scale_adp= None Multiply ADP by scale_adp
sites
Scope of options to modify coordinates of selected atoms
atom_selection= None Selection for atoms to be modified. Overrides
parent-level selection.
shake= None Randomize coordinates with mean error value equal to shake
translate= 0 0 0 Translational shift
rotate= 0 0 0 Rotational shift
euler_angle_convention= *xyz zyz Euler angles convention to be used for
rotation
occupancies
Scope of options to modify occupancies of selected atoms
randomize= None Randomize occupancies within a certain range
set= None Set all or selected occupancies to given value
output
Write out PDB file with modified model (file name is defined in
write_modified)
file_name= None Default is the original file name with the file
extension replaced by _modified.pdb .
input
pdb
file_name= None Model file(s) name (PDB)
crystal_symmetry
Unit cell and space group parameters
unit_cell= None
space_group= None
f_model
high_resolution= None
low_resolution= None
r_free_flags_fraction= None
k_sol= 0.0
Bulk solvent k_sol values
b_sol= 0.0
Bulk solvent b_sol values
b_cart= 0 0 0 0 0 0 Anisotropic scale matrix
scale= 1.0
Overall scale factor
scattering_table= wk1995 it1992 *n_gaussian neutron Choices of scattering
table for structure factors calculations
structure_factors_accuracy
algorithm= *fft direct
cos_sin_table= False
grid_resolution_factor= 1/3.
quality_factor= None
u_base= None
b_base= None
wing_cutoff= None
exp_table_one_over_step_size= None
mask
solvent_radius= 1.11
shrink_truncation_radius= 0.9
grid_step_factor= 4.0
The grid step for the mask calculation is
determined as highest_resolution divided by
grid_step_factor. This is considered as suggested
value and may be adjusted internally based on the
resolution.
verbose= 1
mean_shift_for_mask_update= 0.1
Value of overall model shift in
refinement to updates the mask.
ignore_zero_occupancy_atoms= True Include atoms with zero occupancy into
mask calculation
ignore_hydrogens= True Ignore H or D atoms in mask calculation
hkl_output
format= *mtz cns
label= FMODEL
type= real *complex
file_name= None Default is the original PDB file name with the file
extension replaced by .pdbtools.mtz or .pdbtools.cns
pdb_interpretation
link_distance_cutoff= 3
disulfide_distance_cutoff= 3
chir_volume_esd= 0.2
nonbonded_distance_cutoff= None
default_vdw_distance= 1
min_vdw_distance= 1
nonbonded_buffer= 1
vdw_1_4_factor= 0.8
translate_cns_dna_rna_residue_names= None
apply_cif_modification
data_mod= None
residue_selection= None
apply_cif_link
data_link= None
residue_selection_1= None
residue_selection_2= None
peptide_link
cis_threshold= 45
discard_psi_phi= True
omega_esd_override_value= None
rna_sugar_pucker_analysis
use= True
bond_min_distance= 1.2
bond_max_distance= 1.8
epsilon_range_not_2p_min= 155
epsilon_range_not_2p_max= 310
delta_range_2p_min= 115
delta_range_2p_max= 180
p_distance_c1_n_line_2p_max= 2.9
show_histogram_slots
bond_lengths= 5
nonbonded_interaction_distances= 5
dihedral_angle_deviations_from_ideal= 5
show_max_lines
bond_restraints_sorted_by_residual= 5
nonbonded_interactions_sorted_by_model_distance= 5
dihedral_angle_restraints_sorted_by_residual= 3
clash_guard
nonbonded_distance_threshold= 0.5
max_number_of_distances_below_threshold= 100
max_fraction_of_distances_below_threshold= 0.1
geometry_minimization
alternate_nonbonded_off_on= False
max_iterations= 500
macro_cycles= 1
show_geometry_restraints= False
Running SOLVE/RESOLVE in PHENIX
Author(s)
● SOLVE/RESOLVE: Tom Terwilliger
Purpose
SOLVE and RESOLVE can be run directly in the PHENIX environment. This feature is normally only for advanced SOLVE/RESOLVE users who want to access the keywords in SOLVE/RESOLVE directly.
Usage
Running SOLVE/RESOLVE from the command-line or in a script.
● You can run solve with the command:
phenix.solve
This command will set the environment variables CCP4_OPEN, SYMOP, SYMLIB, and SOLVEDIR and will run solve. If you want to run a different size of solve, then you can specify:
phenix.solve --giant
For a still bigger version, choose --huge; for the biggest, --extra_huge.
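For example, a minimal csh session (the input and log file names are hypothetical; classic SOLVE reads its keywords from standard input):
# standard-size solve, reading keywords from a command file
phenix.solve < solve.inp > solve.log
# the same run with larger array dimensions if the standard size is too small
phenix.solve --giant < solve.inp > solve.log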
● Running resolve or resolve_pattern is similar:
phenix.resolve
phenix.resolve_pattern
● Running solve/resolve from a command file is simple. Here is a command file to run resolve:
phenix.resolve <<EOD
hklin solve.mtz
labin FP=FP SIGFP=SIGFP PHIB=PHIB FOM=FOM HLA=HLA HLB=HLB HLC=HLC HLD=HLD
solvent_content 0.43
database 5
EOD
Literature
Additional information
All the solve/resolve keywords are available in the PHENIX versions of solve and resolve. See the full documentation for solve/resolve at http://solve.lanl.gov/ .
Automated ligand identification
Purpose of the Resolve_ligand_identification task
How the Resolve_ligand_identification task works:
How to run the Resolve_ligand_identification task
What the Resolve_ligand_identification task needs to run:
Output files from Resolve_ligand_identification task
Specific limitations and problems:
List of ligands in the PHENIX ligand identification library
Author(s)
● Resolve_ligand_identification task: Li-Wei Hung
● PHENIX GUI and PDS Server: Nigel W. Moriarty
● RESOLVE: Tom Terwilliger
Purpose
Purpose of the Resolve_ligand_identification task
The Resolve_ligand_identification task fits candidate ligands from a library into electron density and carries out ranking of the ligand fitting results. The current Resolve_ligand_identification task works with the ligand library provided with the Phenix program by default. It is also capable of fitting and ranking ligands in a custom PDB library provided by the user.
Usage
The Resolve_ligand_identification task can be run from the PHENIX GUI as a stand-alone strategy, or as a task in a multi-task strategy.
How the Resolve_ligand_identification task works:
The Resolve_ligand_identification task provides a graphical user interface allowing the user to select either (1) a datafile containing crystallographic structure factor information and a PDB file with a partial model of the structure without the ligand, or (2) an mtz file containing the information of an electron density map of the potential ligand to be identified.
The ligand fitting routine is done by RESOLVE as described in the
LigandFit wizard documentation.
The output of the task consists of a list of the best-fitted ligands from the library. The task display provides options to view the top-ranked ligand in PyMOL with or without the electron density.
How to run the Resolve_ligand_identification task
An example 'ligand identification' strategy is located in the 'ligands' section in the Phenix strategy menu. Follow the directions and help in the GUI.
What the Resolve_ligand_identification task needs to run:
The Resolve_ligand_identification task needs:
● (1) an mtz file containing structure factors
● (2) (optional) a PDB file with your protein model without the ligand
Output files from Resolve_ligand_identification task
When you run the Resolve_ligand_identification task, the output files will be in the directory where you started Phenix:
● A summary file of the fitting results of all ligands: overall_ligand_scores.log
● A summary table listing the results of the top-ranked ligands: topligand.txt
The last column 'Sequence in library' contains numbers '###' indicating the sequence number of the corresponding ligands. The final fitted ligand coordinates and all the log files are in the corresponding '###' files described below.
● PDB files with the fitted ligands: resolve_ligand_###.pdb
● A log file with the fitting of the ligand: resolve_fit_id_###.log
● A log file with the fit of the ligand to the map: resolve_cc_id_###.log
● Map coefficients for the map used for fitting: resolve_map.mtz
Examples
An example 'ligand identification' strategy is located in the 'ligands' section of the Phenix strategy menu.
Possible Problems
Specific limitations and problems:
● The current Resolve_ligand_identification task works with the ligand library provided with the Phenix program by default. It is also capable of fitting and ranking ligands in a custom PDB library provided by the user. The ligand atoms in the user-provided PDB files should be under 'HETATM' records.
● For other RESOLVE-related limitations, please refer to the documentation of the LigandFit wizard.
Literature
Additional information
List of ligands in the PHENIX ligand identification library
---------------------------------------------------------------
PDB #ATOM LIG_ID
103m 6 NBN
1a99 6 PUT
1bio 6 GOL
1dc1 6 DIO
1dwk 6 OXL
1g29 6 DOX
1g8t 6 MO5
1h16 6 PYR
1k26 6 CRY
1fc5 7 MO6
1gaj 7 PEG
1l5j 7 F3S
1ad2 8 MPD
1b6i 8 HED
1cpf 8 TRS
1e42 8 DTT
1gth 8 URA
1jll 8 COA
1knp 8 SIN
1m6z 8 TMN
1nhz 8 HEZ
1o94 8 SF4
1s8l 8 LI1
1a0j 9 BEN
1amk 9 PGA
1bzy 9 POP
1d0v 9 NIO
1djr 9 BEZ
1f4l 9 MET
1gck 9 ASP
1bf3 10 PHB
1bjq 10 ADE
1dan 10 FUC
1e1d 10 FSO
1e1o 10 LYS
1e7f 10 DAO
1e7h 10 PLM
1ewk 10 GLU
1fwn 10 PEP
1i0i 10 7HP
1kjp 10 PHQ
1kwn 10 TAR
1lrj 10 PGE
1os7 10 AKG
1akd 11 CAM
1d3g 11 ORO
1f98 11 HC4
10gs 12 MES
1amu 12 PHE
1e7e 12 DKA
1f7u 12 ARG
1bj5 13 MYR
1bxh 13 AMG
1f07 13 MPO
1gcz 13 CIT
1gni 13 OLA
1h9x 13 NHE
1j4u 13 MMA
1p0z 13 FLC
1e6r 14 NAA
1gkl 14 FER
1o7v 14 NDG
1rff 14 SPM
1a5a 15 PLP
1afb 15 NGA
1ajk 15 EPE
1c9s 15 TRP
1avd 16 BTN
1bg3 16 G6P
1cnq 16 F6P
1f7s 16 LDA
1fi1 16 FTT
1jsl 16 1PE
1d1v 17 BH4
1d7c 17 1PG
1e2j 17 THM
1n2n 17 H4B
1ho5 19 ADN
1o57 19 P6G
1b4w 20 BOG
1brr 20 RET
1dnc 20 GTT
1dug 20 GSH
1ere 20 EST
1fkp 20 NVP
1hvy 20 UMP
1ldn 20 FBP
1ldn 20 OXM
1bh3 21 C8E
1d2s 21 DHT
1e2d 21 TMP
1h7f 21 C5P
1o28 21 UFP
1c3m 22 MAN-MAN
1cx4 22 CMP
1fsg 22 PRP
1gz1 22 BGC-BGC
1l4f 22 NCN
1ocj 22 BGC
1a0f 23 GTS
1aer 23 AMP
1cdg 23 MAL
1ex2 23 SUC
1gim 23 IMP
1gwv 23 LAT
1a97 24 5GP
1bir 24 2GP
1cq1 24 PQQ
1goy 24 3GP
1hk3 24 T44
1jcq 24 FPP
1ay2 25 GAL-NAG
1c3j 25 UDP
1h7l 25 TYD
1af7 26 SAH
1bfd 26 TPP
1k3l 26 GTX
1mcz 26 TDP
1ao0 27 ADP
1ao0 27 FS4
1cg1 27 IMO
1efh 27 A3P
1fpx 27 SAM
1a4r 28 GDP
1ao5 28 NAG-NAG
1b30 28 XYS-XYS-XYS
1b3v 28 XYS
1lv5 28 DCP
1opx 28 2PE
1cjk 29 FOK
1cjv 29 DAD
1g2v 29 TTP
1i52 29 CTP
1cr2 30 DTP
1ag9 31 FMN
1aq2 31 ATP
1aux 31 SAP
1b63 31 ANP
1f9h 31 APC
1gll 31 ACP
1r3k 31 DGA
1a2b 32 GSP
1b23 32 CYS
1b23 32 GNP
1ckm 32 GTP
1pj6 32 FOL
1d1g 33 MTX
1bos 34 GAL
1bos 34 GAL-GAL-GLC
1bwu 34 MAN
1byh 34 GLC-GLC
1bzw 34 GAL-GLC
1cvn 34 MAN-MAN-MAN
1e40 34 GLC-GLC-GLC
1kzj 35 CB3
1n9b 35 MA4
1ek6 36 UPG
1b0f 38 NAG-FUC-NAG
1fuj 38 NAG-FUC
1g82 38 NAG-NAG-FUC
1d7d 39 NAG-NAG-MAN
1foa 39 UD1
1nb3 39 NAG-NAG-BMA
1kby 42 SPO
106m 43 HEM
1at5 43 NAG-NAG-NAG
1e85 43 HEC
1ek6 44 NAI
1esw 44 ACR
1p9l 44 NAD
1ece 45 GLC-GLC-GLC-GLC
1c3v 48 NDP
1c3v 48 PG4
1r2c 48 7MQ
1ti7 48 NAP
1aof 49 DHE
1gsl 49 NAG-NAG-MAN-FUC
1jnd 50 NAG-NAG-MAN-MAN
1dv3 51 BCL
1dv3 51 BPH
1dv3 51 U10
1p0h 51 ACO
1fnd 53 FAD
1lsh 53 PLD
1lsh 53 UPL
1f0y 54 CAA
1okc 54 PC2
1a65 56 NAG
1bdg 56 GLC
1en2 56 NAG-NAG-NAG-NAG
1f9d 56 GLC-GLC-GLC-GLC-GLC
1aky 57 AP5
1myr 58 NAG-FUC-NAG-MAN-XYS
1fq8 61 NAG-NAG-MAN-MAN-MAN
1prc 65 BPB
1cxp 71 NAG-NAG-MAN-MAN-MAN-FUC
1deo 72 NAG-NAG-MAN-MAN-MAN-MAN
1ax0 80 NAG-FUC-NAG-MAN-XYS-MAN-MAN
1kby 81 CDL
1dio 91 B12
Finding all ligands in a map with phenix.find_all_ligands
How phenix.find_all_ligands works:
Output files from phenix.find_all_ligands
Standard run of phenix.find_all_ligands:
Specific limitations and problems:
List of all find_all_ligands keywords
Author(s)
● phenix.find_all_ligands: Tom Terwilliger
Purpose
phenix.find_all_ligands is a command line tool for finding all the ligands in a map by repetitively running phenix.ligandfit with a series of ligands and choosing the best-fitting one at each cycle.
Usage
How phenix.find_all_ligands works:
The basic procedure for phenix.find_all_ligands has three steps. The first is to identify the largest contiguous region of density in your map that is not already occupied by your model or previously-fitted ligands. The second is to fit each ligand (you identify the candidate ligands in advance) into this density. The third is to choose the one that fits the density best. Then the best-fitting ligand is added to the structure and the process is repeated until the number of ligands you request is found or the correlation of ligand to map drops below the value you specify (default=0.5).
Output files from phenix.find_all_ligands
The output ligand files from phenix.find_all_ligands are normally in the temporary directory
(default='temp_dir'). They will be files with names such as "SITE_1_ATP.pdb" for the placement of ATP in the first site fitted.
Examples
Standard run of phenix.find_all_ligands:
Running phenix.find_all_ligands is easy. Usually you will want to edit a small parameter file (find_all_ligands.eff) to contain your commands like this, where the ligandfit commands are sent to phenix.ligandfit for the actual fitting and the find_all_ligands commands determine what searches are done:
# commands for running phenix.find_all_ligands
find_all_ligands {
  number_of_ligands = 5
  cc_min = 0.5
  ligand_list = ATP.pdb NAD.pdb
  nproc = 2
}
ligandfit {
  data = "nsf-d2.mtz"
  model = "nsf-d2_noligand.pdb"
  lig_map_type = fo-fc_difference_map
}
You might also want to add to this some additional commands for phenix.ligandfit. Any commands for ligandfit are allowed, except that the commands "ligand" and "input_lig_file" are ignored, as the input ligand comes from the find_all_ligands command "ligand_list":
# find_all_ligands.eff -- more commands for ligandfit
ligandfit {
  data = "nsf-d2.mtz"
  model = "nsf-d2_noligand.pdb"
  lig_map_type = fo-fc_difference_map
  ligand_cc_min = 0.75
  verbose = Yes
}
where you can put any phenix.ligandfit commands in the braces. Then you can run this with the command:
phenix.find_all_ligands find_all_ligands.eff
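The same parameters can normally also be supplied directly as keyword=value arguments instead of an .eff file; a sketch using the file names from the examples above (treat this as an assumption about the usual PHENIX command-line convention rather than a documented invocation):
phenix.find_all_ligands number_of_ligands=5 cc_min=0.5 \
  ligand_list="ATP.pdb NAD.pdb" data=nsf-d2.mtz model=nsf-d2_noligand.pdb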
Possible Problems
Specific limitations and problems:
● This method uses phenix.ligandfit to do the ligand fitting, so all the commands, features, and limitations of phenix.ligandfit apply to phenix.find_all_ligands.
Literature
Additional information
NOTE: in addition to the find_all_ligands keywords shown here, all phenix.ligandfit commands are also allowed, except that the commands "ligand" and "input_lig_file" are ignored, as the input ligand comes from the find_all_ligands keyword "ligand_list".
List of all find_all_ligands keywords
-------------------------------------------------------------------------------
Legend:
black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
find_all_ligands
number_of_ligands= None Total number of ligand sites. Ignored if "None".
find_all_ligands will keep looking until the correlation
coefficient for the fit of the best ligand is less than
cc_min or the number of ligands placed is
number_of_ligands, whichever comes first
cc_min= 0.50
Ignored if "None". find_all_ligands will keep looking until
the correlation coefficient for the fit of the best ligand is less
than cc_min or the number of ligands placed is number_of_ligands,
whichever comes first
ligand_list= None List of files with ligands to find
nproc= 1 number of processors to use
background= *Yes No run jobs in background or not
run_command= csh Command for running jobs (e.g., csh or qsub )
verbose= True *False verbose output
debug= True *False debugging output
temp_dir= Auto Optional temporary work directory
output_dir= "" Output directory where files are to be written
dry_run= False Just read in and check parameter names
ligandfit
data= None Datafile (alias for input_data_file). This can be any format if
only FP is to be read in. If phases are to be read in then MTZ format
is required. The Wizard will guess the column identification. If you
want to specify it you can say input_labels="FP" , or
input_labels="FP PHIB FOM". (Command-line only)
ligand= None File containing information about the ligand (PDB or SMILES)
(alias for input_lig_file) (Command-line only)
model= None PDB file with model for everything but the ligand (alias for
input_partial_model_file). (Command-line only)
quick= False Run as quickly as possible. (Command-line only)
special_keywords
write_run_directory_to_file= None Writes the full name of a run
directory to the specified file. This can
be used as a call-back to tell a script
where the output is going to go.
(Command-line only)
run_control
coot= None Set coot to True and optionally run=[run-number] to run Coot
with the current model and map for run run-number. In some wizards
(AutoBuild) you can edit the model and give it back to PHENIX to
use as part of the model-building process. If you just say coot
then the facts for the highest-numbered existing run will be
shown. (Command-line only)
ignore_blanks= None ignore_blanks allows you to have a command-line
keyword with a blank value like "input_lig_file_list="
stop= None You can stop the current wizard with "stopwizard" or "stop".
If you type "phenix.autobuild run=3 stop" then this will stop run
3 of autobuild. (Command-line only)
display_facts= None Set display_facts to True and optionally
run=[run-number] to display the facts for run run-number.
If you just say display_facts then the facts for the
highest-numbered existing run will be shown.
(Command-line only)
display_summary= None Set display_summary to True and optionally
run=[run-number] to show the summary for run
run-number. If you just say display_summary then the
summary for the highest-numbered existing run will be
shown. (Command-line only)
carry_on= None Set carry_on to True to carry on with highest-numbered
run from where you left off. (Command-line only)
run= None Set run to n to continue with run n where you left off.
(Command-line only)
copy_run= None Set copy_run to n to copy run n to a new run and continue
where you left off. (Command-line only)
display_runs= None List all runs for this wizard. (Command-line only)
delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)
display_labels= None display_labels=test.mtz will list all the labels
that identify data in test.mtz. You can use the label
strings that are produced in AutoSol to identify which
data to use from a datafile like this: peak.data="F+
SIGF+ F- SIGF-" # the entire string in quotes counts
here You can use the individual labels from these
strings as identifiers for data columns in AutoSol and
AutoBuild like this: input_refinement_labels="FP SIGFP
FreeR_flags" # each individual label counts
dry_run= False Just read in and check parameter names
params_only= False Just read in and return parameter defaults
display_all= False Just read in and display parameter defaults
crystal_info
cell= 0.0 0.0 0.0 0.0 0.0 0.0
Enter cell parameter a b c alpha beta
gamma
resolution= 0.0
High-resolution limit. Used as resolution limit for
density modification and as general default high-resolution
limit. If resolution_build or refinement_resolution are set
then they override this for model-building or refinement. If
overall_resolution is set then data beyond that resolution
is ignored completely.
sg= None Space Group symbol (i.e., C2221 or C 2 2 21)
display
number_of_solutions_to_display= None Number of solutions to put on
screen and to write out
solution_to_display= 1 Solution number of the solution to display and
write out ( use 0 to let the wizard display the top
solution)
file_info
file_or_file_list= *single_file file_with_list_of_files Choose if you
want to input a single file with PDB or other
information about the ligand or if you want to input
a file containing a list of files with this
information for a list of ligands
input_labels= None Labels for input data columns NOTE: Applies to input
data file for LigandFit and AutoBuild, but not to AutoMR.
For AutoMR use instead 'input_label_string'.
lig_map_type= *fo-fc_difference_map fobs_map pre_calculated_map_coeffs
Enter the type of map to use in ligand fitting
fo-fc_difference_map: Fo-Fc difference map phased on
partial model fobs_map: Fo map phased on partial model
pre_calculated_map_coeffs: map calculated from FP PHIB
[FOM] coefficients in input data file
ligand_format= *PDB SMILES Enter whether the files contain SMILES
strings or PDB formatted information
general
background= True When you specify nproc=nn, you can run the jobs in
background (default if nproc is greater than 1) or
foreground (default if nproc=1). If you set
run_command=qsub (or otherwise submit to a batch queue),
then you should set background=False, so that the batch
queue can keep track of your runs. There is no need to use
background=True in this case because all the runs go as
controlled by your batch system. If you use run_command=csh
(or similar, csh is default) then normally you will use
background=True so that all the jobs run simultaneously.
base_path= None You can specify the base path for files (default is
current working directory)
clean_up= False At the end of the entire run the TEMP directories will
be removed if clean_up is True. The default is No, keep these
directories. If you want to remove them after your run is
finished use a command like "phenix.autobuild run=1
clean_up=True"
coot_name= coot If your version of coot is called something else, then
you can specify that here.
debug= False You can have the wizard stop with error messages about the
code if you use debug. NOTE: you cannot use Pause with debug.
extend_try_list= False You can fill out the list of parallel jobs to
match the number of jobs you want to run at one time,
as specified with nbatch.
extra_verbose= False Facts and possible commands will be printed every
cycle if Yes
i_ran_seed= 289564 Random seed (positive integer) for model-building
and simulated annealing refinement
ligand_id= None You can specify an integer value for the ID of a
ligand... This number will be added to whatever residue
number the ligand search model in input_lig_file has. The
keyword is only valid if a single copy of the ligand is to be
found.
max_wait_time= 100.0
You can specify the length of time (seconds) to
wait when testing the run_command. If you have a cluster
where jobs do not start right away you may need a longer
time to wait.
nbatch= 5 You can specify the number of processors to use (nproc) and
the number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors
available and leave nbatch alone. If you leave nbatch as None it
will be set automatically, with a value depending on the Wizard.
This is recommended. The value of nbatch can affect the results
that you get, as the jobs are not split into exact replicates,
but are rather run with different random numbers. If you want to
get the same results, keep the same value of nbatch.
nproc= 1 You can specify the number of processors to use (nproc) and the
number of batches to divide the data into for parallel jobs.
Normally you will set nproc to the number of processors available
and leave nbatch alone. If you leave nbatch as None it will be
set automatically, with a value depending on the Wizard. This is
recommended. The value of nbatch can affect the results that you
get, as the jobs are not split into exact replicates, but are
rather run with different random numbers. If you want to get the
same results, keep the same value of nbatch.
resolve_command_list= None Commands for resolve. One per line in the
form: keyword value value can be optional
Examples: coarse_grid resolution 200 2.0 hklin
test.mtz NOTE: for command-line usage you need to
enclose the whole set of commands in double quotes
(") and each individual command in single quotes
(') like this: resolve_command_list="'no_build'
'b_overall 23' "
resolve_size= _giant _huge _extra_huge *None Size for solve/resolve
("","_giant","_huge","_extra_huge")
run_command= csh When you specify nproc=nn, you can run the subprocesses
as jobs in background with csh (default) or submit them to
a queue with the command of your choice (i.e., qsub ). If
you have a multi-processor machine, use csh. If you have a
cluster, use qsub or the equivalent command for your
system. NOTE: If you set run_command=qsub (or otherwise
submit to a batch queue), then you should set
background=False, so that the batch queue can keep track of
your runs. There is no need to use background=True in this
case because all the runs go as controlled by your batch
system. If you use run_command=csh (or similar, csh is
default) then normally you will use background=True so that
all the jobs run simultaneously.
skip_xtriage= False You can bypass xtriage if you want. This will
prevent you from applying anisotropy corrections, however.
temp_dir= None Define a temporary directory (it must exist)
title= Run 1 LigandFit Sun Dec 7 17:46:25 2008 Enter any text you like
to help identify what you did in this run
top_output_dir= None This is used in subprocess calls of wizards and to
tell the Wizard where to look for the STOPWIZARD file.
verbose= False Command files and other verbose output will be printed
input_files
existing_ligand_file_list= None You can enter a list of files with
ligands you have already fit. These will be
used to exclude that region from
consideration.
input_data_file= None Enter the file with input structure factor data
(files other than MTZ will be converted to mtz and
intensities to amplitudes)
input_lig_file= None Enter either a single file with PDB information or
a SMILES string or a file containing a list of files
with this information for a list of ligands. If you
enter a file containing a list of files you need also to
specify
"file_or_file_list=file_with_list_of_files".
If the format is not PDB, then ELBOW will generate a PDB
file.
input_ligand_compare_file= None If you enter a PDB file with a ligand in
it, the coordinates of the newly-built ligand
will be compared with the coordinates in this
file.
input_partial_model_file= None Enter a PDB file containing a model of
your structure without the ligand. This is
used to calculate phases. If you are providing
phases in your data file and have selected
"pre_calculated_map_coeffs" for map_type this
file may be left out.
non_user_parameters
get_lig_volume= False You can ask to get the volume of the ligand and
to then stop
offsets_list= 7 53 29 You can specify an offset for the orientation of
the helix and strand templates in building. This is used
in generating different starting models.
refinement
link_distance_cutoff= 3.0
You can specify the maximum bond distance for
linking residues in phenix.refine called from the
wizards.
r_free_flags_fraction= 0.1
Maximum fraction of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
the lower of the values defined by these two
parameters.
r_free_flags_lattice_symmetry_max_delta= 5.0
You can set the maximum
deviation of distances in the
lattice that are to be
considered the same for
purposes of generating a
lattice-symmetry-unique set of
free R flags.
r_free_flags_max_free= 2000 Maximum number of reflections in the free R
set. You can choose the maximum fraction of
reflections in the free R set and the maximum
number of reflections in the free R set. The
number of reflections in the free R set will be
the lower of the values defined by these two
parameters.
r_free_flags_use_lattice_symmetry= True When generating r_free_flags you
can decide whether to include lattice
symmetry (good in general, necessary
if there is twinning).
search_parameters
conformers= 1 Enter how many conformers to create. If greater than 1,
then ELBOW will always be used to generate them. If 1 then
ELBOW will be used if a PDB file is not specified. These
conformers are used to identify allowed torsion angles for
your ligand. The alternative is to use the empirical rules
in RESOLVE. ELBOW takes longer but is more accurate.
delta_phi_ligand= 40.0
Specify the angle (degrees) between successive
tries in FFT search for fragments
fit_phi_inc= 20 Specify the angle (degrees) between rotations around
bonds
fit_phi_range= -180 180 Range of bond rotation angles to search
group_search= 0 Enter the ID number of the group from the ligand to use
to seed the search for conformations
ligand_cc_min= 0.75
Enter the minimum correlation coefficient of the
ligand to the map to quit searching for more
conformations
ligand_completeness_min= 1.0
Enter the minimum completeness of the
ligand to the map to quit searching for more
conformations
local_search= True If local_search is Yes then, only the region within
search_dist of the point in the map with the highest local
rmsd will be searched in the FFT search for fragments
n_group_search= 3 Enter the number of different fragments of the ligand
that will be looked for in FFT search of the map
n_indiv_tries_max= 10 If 0 is specified, all fragments are searched at
once otherwise all are first searched at once then
individually up to the number specified
n_indiv_tries_min= 5 If 0 is specified, all placements of a fragment are
tested at once otherwise all are first tested at once
then individually up to the number specified
number_of_ligands= 1 Number of copies of the ligand expected in the
asymmetric unit
search_dist= 10.0
If local_search is Yes then, only the region within
this distance of the point in the map with the highest
local rmsd will be searched in the FFT search for fragments
use_cc_local= False You can specify the use of a local correlation
coefficient for scoring ligand fits to the map. If you do
not do this, then the region over which the ligand is
scored are all points within 2.5 A of the atoms in the
ligand. If you do specify use_cc_local, then the region
over which the ligand is scored are all these points, plus
all the contiguous points that have density greater than 0.5 * sigma.
search_target
ligand_near_chain= None You can specify where to search for the ligand
either with search_center or with ligand_near_res and
ligand_near_chain. If you set
ligand_near_chain="None" or leave it blank or do not
set it, then all chains will be included. The
keywords ligand_near_res and ligand_near_chain refer
to residue/chain in the file defined by
input_partial_model_file (or model if running from
command line).
ligand_near_pdb= None You can specify where LigandFit should look for
your ligands by providing a PDB file containing one or
more copies of the ligand. If you want you can provide
a PDB file with ligand+ macromolecule and specify the
ligand name with name_of_ligand_near_pdb.
ligand_near_res= None You can specify where to search for the ligand
either with search_center or with ligand_near_res and
ligand_near_chain The keywords ligand_near_res and
ligand_near_chain refer to residue/chain in the file
defined by input_partial_model_file (or model if
running from command line).
name_of_ligand_near_pdb= None You can specify where LigandFit should
look for your ligands by providing a PDB file
containing one or more copies of the ligand. If
you want you can provide a PDB file with
ligand+ macromolecule and specify the ligand
name with name_of_ligand_near_pdb.
search_center= 0.0 0.0 0.0
Enter coordinates for center of search region
(ignored if [0,0,0])
Mapping one PDB file onto another using space-group symmetry with phenix.map_to_object
How phenix.map_to_object works:
Standard run of phenix.map_to_object:
Run of phenix.map_to_object searching over additional unit cells
Specific limitations and problems:
List of all map_to_object keywords
Author(s)
● phenix.map_to_object: Tom Terwilliger
Purpose
phenix.map_to_object is a command line tool for applying a rotation and translation consistent with space-group symmetry to a PDB file in order to bring its atoms close to those in a second PDB file.
Usage
How phenix.map_to_object works:
phenix.map_to_object searches over each equivalent position in the unit cell and neighboring unit cells to find the one that places the moving_pdb atoms closest to those in fixed_pdb. You can choose to minimize the distance between the center of mass of the PDB files, or you can minimize the distance between the closest atoms, or you can maximize the number of close contacts.
Examples
Standard run of phenix.map_to_object:
Running phenix.map_to_object is easy. You can just type:
phenix.map_to_object fixed_pdb=my_target.pdb moving_pdb=my_ligand.pdb
and phenix.map_to_object will move my_ligand.pdb as close as it can to my_target.pdb.
Run of phenix.map_to_object specifying center of mass of moving PDB is to be close to any atom of fixed PDB:
By default phenix.map_to_object will move the center of mass of moving_pdb as close as possible to any atom in fixed_pdb. You could specify this explicitly with:
phenix.map_to_object fixed_pdb=my_target.pdb moving_pdb=my_ligand.pdb \
  use_moving_center_of_mass=True use_fixed_center_of_mass=False
Run of phenix.map_to_object specifying center of mass of moving PDB is to have maximum number of contacts with atoms of fixed PDB:
If you wanted instead to maximize the number of close contacts under 5 A between the center of mass of my_ligand.pdb and any atom in my_target.pdb, you could type:
phenix.map_to_object fixed_pdb=my_target.pdb moving_pdb=my_ligand.pdb \
  use_moving_center_of_mass=True use_fixed_center_of_mass=False \
  use_contact_order=True contact_dist=5.
Run of phenix.map_to_object searching over additional unit cells
phenix.map_to_object fixed_pdb=my_target.pdb moving_pdb=my_ligand.pdb \
  use_moving_center_of_mass=True use_fixed_center_of_mass=False \
  use_contact_order=True contact_dist=5. \
  extra_cells_to_search=2
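If you want to control the name of the moved file, you can add the output_pdb keyword from the list below (the output file name my_ligand_moved.pdb is hypothetical):
phenix.map_to_object fixed_pdb=my_target.pdb moving_pdb=my_ligand.pdb \
  output_pdb=my_ligand_moved.pdb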
Possible Problems
Specific limitations and problems:
Literature
Additional information
List of all map_to_object keywords
-------------------------------------------------------------------------------
Legend:
black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help
Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor
-------------------------------------------------------------------------------
map_to_object
moving_pdb= None PDB file with coordinates to move near fixed_pdb using SG
symmetry
fixed_pdb= None PDB file to move moving_pdb close to using SG symmetry
output_pdb= None Name of output (moved) PDB file
use_moving_center_of_mass= True You can choose to just move the center of
mass of the moving PDB close to the fixed PDB
(as opposed to finding the operator that puts an
atom of the moving PDB closest to an atom in the
fixed PDB)
use_fixed_center_of_mass= False You can choose to just move the moving PDB
close to the center of mass of the fixed PDB (as
opposed to finding the operator that puts the
moving PDB closest to any atom in the fixed PDB)
use_contact_order= True You can choose to maximize the number of atoms that
are within contact_dist (default=6 ) A of an atom in the
other structure.
contact_dist= 6.
Atoms separated by contact_dist or less are considered to
be in contact
extra_cells_to_search= 1 You can specify how many unit cells beyond the
central one to search in each direction (default=1,
search -1 0 and 1 in each direction)
verbose= False Verbose output
debug= False Debugging output
dry_run= False Just read in and check parameter names
PyMOL in PHENIX
Author(s)
● PyMOL: PyMOL executables are kindly supplied by Warren DeLano for distribution in PHENIX.
Starting PyMOL
Normally you will start PyMOL in PHENIX after one of the PHENIX Wizards has finished. In this case if you click on the magnifying glass on the Wizard screen, and select one of the choices that displays a structure with PyMOL, then PyMOL will automatically be launched with the appropriate PDB and map files.
You can also start PyMOL by clicking on the PyMOL button on the PHENIX GUI. In this case you'll need to read in maps and models yourself. You can read in a model to PyMOL by typing
load overall_best.pdb
in the PyMOL Tcl/Tk GUI window or at the PyMOL prompt in the PyMOL display window.
Setting up your view in PyMOL
Here are some simple controls that let you choose what you see in PyMOL, assuming that you have a model (pdb_1) and a map (map_1) or maps loaded.
● Click a few times on "pdb_1" and you will see the model turn off and on. The same goes for "contour_1.5". Similarly, "all" turns everything on and off, and "map_1" turns the unit cell box (which may not be visible in your viewer) on and off.
● To the right of "pdb_1" you will see buttons labelled "A" "S" "H" "L" and "C". Click on each one and you'll see what they do:
❍ "A" : Actions. Lets you recenter, delete the object, and more.
❍ "S" : Show. For a model you may want to show sticks for clarity.
❍ "H" : Hide. Undoes Show.
❍ "L" : Label. Choose what labels to display.
❍ "C" : Color. Choose colors.
● The little table in the lower right of the PyMOL display window shows what each mouse button does. If you have a 3-button (2 buttons and a roller) mouse, then hold the left button down and move the mouse to rotate; right button down and move the mouse to change the size; both buttons down and move the mouse to move the center.
● If you accidentally click the wrong buttons and some new object appears on the screen that you do not want, click on the "A" button for the new object and select "delete" to get rid of it.
Useful PyMOL commands
Here are a few useful PyMOL commands that you can type in the PyMOL window or in the PyMOL Tcl/Tk GUI window.
● Read in a pdb file named "overall_best.pdb":
load overall_best.pdb
● Read in an xplor-style map named "map_1.xplor" and contour it at a level of 1.5:
load map_1.xplor
isomesh contour_1, map_1, 1.5
● Create a new set of contours at a level of 2.5 for a map called "map_1" that has already been loaded:
isomesh contour_1, map_1, 2.5
● Get PyMOL help:
help
● Get PyMOL help on the command "isomesh":
help isomesh
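Putting these together, here is a short sequence you might type at the PyMOL prompt to load a model and a map and zoom in on the model (file names as in the examples above; PyMOL names each loaded object after its file, so the model object here is called overall_best):
load overall_best.pdb
load map_1.xplor
isomesh contour_1, map_1, 1.5
zoom overall_best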
Additional information
You can get the documentation for PyMOL at pymol.sourceforge.net/html/
Coot - a Model Building Tool
Coot (Crystallographic Object-Oriented Toolkit) is a program for crystallographic model building, model completion, and validation written by Paul Emsley. There is documentation about the program and how to use it here.
MolProbity - An Active Validation Tool
Authors
MolProbity is a web application that integrates validation programs from the Richardson lab at Duke
University.
● Ian Davis, principal author: PHP/Java web service; KiNG; Ramachandran & Rotamer; Dangle
● Vincent Chen: extensions to KiNG & MolProbity
● Mike Word: Reduce; Probe; Clashlist
● Dave Richardson: kinemages; Mage; Prekin; Suitename
● Xueyi Wang: RNABC
● Jack Snoeyink & Andrew Leaver-Fay: Reduce update
● Bryan Arendall: webmaster; databases
Purpose
MolProbity provides the user with an expert-system consultation about the accuracy of a macromolecular structure model, diagnosing local problems and enabling their correction. It combines all atom contact analysis with updated versions of more traditional tools for validating geometry and dihedral-angle combinations. MolProbity is most complete for crystal structures of proteins and RNA, but also handles DNA, ligands, and NMR ensembles. It works best as an active validation tool - used as soon as a model is available and during each rebuild/refine loop, not just at the end to provide global statistics before deposition. It produces coordinates, graphics, and numerical evaluations that integrate with either manual or automated use in systems such as PHENIX, KiNG, or Coot.
Usage
The integrated MolProbity web application is at http://molprobity.biochem.duke.edu/ . The user is guided through a work-flow that typically consists of:
1. Fetch or upload model(s)
2. Add & optimize H atoms, with correction of Asn/Gln/His flips
3. Calculate per-residue & global quality analyses:
1. all-atom steric clashes
2. geometry (e.g., Cbeta or ribose pucker ideality)
3. Ramachandran, sidechain rotamer, or RNA backbone outliers
4. global MolProbity score
4. View multi-criterion chart and/or on-line 3D KiNG graphics summaries
5. [Optional features, e.g. interface analysis; load maps for on-line viewing; Coot to-do list]
6. Download coordinate & graphics files for further work on local corrections
An increasingly broad subset of MolProbity functionalities are integrated directly into PHENIX for use in refinement, Resolve, and wizard decisions. Phenix.reduce provides optimized hydrogen addition, phenix.probe and quick_clashlist.py provide all-atom clash analysis, and python versions of the Ramachandran and rotamer scores are available in the mmtbx. Interactive all-atom contact dots are also available in Coot.
Possible Problems
Web usage requires Java, Javascript, and a modern web browser.
MolProbity provides reasonable session protection, but if security or large-scale usage are at issue, you can install MP to run on your own Linux or Mac computer provided that the computer has a web server (Apache), the PHP scripting language, JAVA, and a few common Unix utility programs. For more information, follow the "Download MolProbity" link on the MP main page.
Literature
● MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. I.W. Davis, A. Leaver-Fay, V.B. Chen, J.N. Block, G.J. Kapral, X. Wang, L.W. Murray, W.B. Arendall III, J. Snoeyink, J.S. Richardson, and D.C. Richardson. Nucl. Acids Res. 35, W375-W383 (2007)
● Visualizing and Quantifying Molecular Goodness-of-Fit: Small-probe Contact Dots with Explicit Hydrogen Atoms. J.M. Word, S.C. Lovell, T.H. LaBean, H.C. Taylor, M.E. Zalis, B.K. Presley, J.S. Richardson, and D.C. Richardson. JMB 285, 1711-1733 (1999)
● Structure Validation by Cα Geometry: φ,ψ and Cβ Deviation. S.C. Lovell, I.W. Davis, W.B. Arendall III, P.I.W. de Bakker, J.M. Word, M.G. Prisant, J.S. Richardson, and D.C. Richardson. Proteins: Structure, Function and Genetics 50, 437-450 (2003)
● A test of enhancing model accuracy in high-throughput crystallography. W.B. Arendall III, W. Tempel, J.S. Richardson, W. Zhou, S. Wang, I.W. Davis, Z.-J. Liu, J.P. Rose, W.M. Carson, M. Luo, D.C. Richardson, and B-C. Wang. Journal of Structural and Functional Genomics 6, 1-11 (2005)
PHENIX Examples
Can I easily run a Wizard with some sample data?
What sample data are available to run automatically?
Are any of the sample datasets annotated?
Where can I find sample data?
You can find sample data in the directories located in: $PHENIX/examples. Additionally there is sample MR data in $PHENIX/phaser/tutorial.
Can I easily run a Wizard with some sample data?
You can run sample data with a Wizard with a simple command. To run p9-sad sample data with the AutoSol wizard, you type:
phenix.run_example p9-sad
This command copies the $PHENIX/examples/p9-sad directory to your working directory and executes the commands in the file run.csh.
What sample data are available to run automatically?
You can see which sample data are set up to run automatically by typing:
phenix.run_example --help
This command lists all the directories in $PHENIX/examples/ that have a command file run.csh ready to use. For example:
phenix.run_example --help
PHENIX run_example script. Fri Jul 6 12:07:08 MDT 2007
Use: phenix.run_example example_name [--all] [--overwrite]
Data will be copied from PHENIX examples into subdirectories of this working directory
If --all is set then all examples will be run (takes a long time!)
If --overwrite is set then the script will overwrite subdirectories
List of available examples: 1J4R-ligand a2u-globulin-mr gene-5-mad p9-build p9-sad
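For example, to run the p9-sad example again when a p9-sad subdirectory already exists from a previous run, you can add the --overwrite flag shown in the usage message above:
phenix.run_example p9-sad --overwrite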
Are any of the sample datasets annotated?
The PHENIX tutorials listed on the main PHENIX web page will walk you through sample datasets, telling you what to look for in the output files. For example, the Tutorial 1: Solving a structure using SAD data tutorial uses the p9-sad dataset as an example. It tells you how to run this example data in AutoSol and how to interpret the results.
Tutorial 1: Solving a structure with SAD data
Running the demo p9 data with AutoSol
Reading the log files for your AutoSol run file
Summary of the command-line arguments
Using the datafiles converted to premerged format.
Testing for anisotropy in the data
Choosing datafiles with high signal-to-noise
Running HYSS to find the heavy-atom substructure
Finding the hand and scoring heavy-atom solutions
Statistical density modification with RESOLVE
The AutoSol_summary.dat summary file
How do I know if I have a good solution?
Introduction
This tutorial will use some very good SAD data (peak wavelength from an IF5A dataset diffracting to
1.7 A) as an example of how to solve a SAD dataset with AutoSol. It is designed to be read all the way through, giving pointers for you along the way. Once you have read it all and run the example data and looked at the output files, you will be in a good position to run your own data through AutoSol.
Setting up to run PHENIX
If PHENIX is already installed and your environment is all set, then if you type:
echo $PHENIX
then you should get back something like this:
/xtal//phenix-1.3
If instead you get:
PHENIX: undefined variable
then you need to set up your PHENIX environment. See the PHENIX environment setup page for details of how to do this. If you are using the C-shell environment (csh) then all you will need to do is add one line to your .cshrc (or equivalent) file that looks like this:
source /xtal/phenix-1.3/phenix_env
(except that the path in this statement will be where your PHENIX is installed). Then the next time you log in $PHENIX will be defined.
Running the demo p9 data with AutoSol
To run AutoSol on the demo p9 data, make yourself a tutorials directory and cd into that directory:
mkdir tutorials
cd tutorials
Now type the phenix command:
phenix.run_example --help
to list the available examples. Choosing p9-sad for this tutorial, you can now use the phenix command:
phenix.run_example p9-sad
to solve the p9 structure with AutoSol. This command will copy the directory $PHENIX/examples/p9-sad to your current directory (tutorials) and call it tutorials/p9-sad/. Then it will run AutoSol using the command file run.csh that is present in this tutorials/p9-sad/ directory. This command file run.csh is simple. It says:
#!/bin/csh
phenix.autosol seq_file=seq.dat sites=4 atom_type=Se data=p9_se_w2.sca \
  sg="I4" cell="113.949 113.949 32.474 90.000 90.000 90.00" \
  resolution=2.4 thoroughness=quick
The first line (#!/bin/csh) tells the system to interpret the remainder of the text in the file using the C-shell (csh). The command phenix.autosol runs the command-line version of AutoSol (see keywords). The arguments on the command line tell AutoSol about the sequence file (seq_file=seq.dat), the number of sites to look for (sites=4), and the atom type (atom_type=Se). (Note that each of these is specified with an = sign, and that there are no spaces around the = sign.) The Phaser heavy-atom refinement and model completion algorithm used in the AutoSol SAD phasing will add additional sites if warranted. Note the backslash "\" at the end of some of the lines in the phenix.autosol command. This tells the C-shell (which interprets everything in this file) that the next line is a continuation of the current line. There must be no characters (not even a space) after the backslash for this to work. The SAD data to be used to solve the structure is in the datafile p9_se_w2.sca. This datafile is in Scalepack unmerged format, which means that there may be multiple instances of each reflection and the cell parameters are not in the file, so we need to provide the cell parameters with the command, cell="113.949 113.949 32.474 90.000 90.000 90.00". (Note that the cell parameters are surrounded by quotation marks. That tells the parser that these are all together.) In this example, the space group in the p9_se_w2.sca file is I41, but the correct space group is I4, so we need to tell AutoSol the correct space group with sg="I4". The resolution of the data in
p9_se_w2.sca is to 1.74 A, but in this example we would like to solve the structure quickly, so we have cut the resolution back with the commands resolution=2.4 and thoroughness=quick. The
quick command sets several defaults to give a less comprehensive search for heavy-atom sites and a less thorough model-building than if you use the default of thoroughness=thorough. Although the
phenix.run_example p9-sad command has just run AutoSol from a script (run.csh), you can run
AutoSol yourself from the command line with the same phenix.autosol seq_file= ... command. You can also run AutoSol from a GUI, or by putting commands in another type of script file. All these possibilities are described in
Running a Wizard from a GUI, the command-line, or a script .
Where are my files?
Once you have started AutoSol or another Wizard, an output directory will be created in your current
(working) directory. The first time you run AutoSol in this directory, this output directory will be called AutoSol_run_1_ (or AutoSol_run_1_/, where the slash at the end just indicates that this is a directory). All of the output from run 1 of AutoSol will be in this directory. If you run AutoSol again, a new subdirectory called AutoSol_run_2_ will be created. Inside the directory AutoSol_run_1_ there will be one or more temporary directories such as TEMP0 created while the Wizard is running. The files in this temporary directory may be useful sometimes in figuring out what the Wizard is doing (or not doing!). By default these directories are emptied when the Wizard finishes (but you can keep their contents with the command clean_up=False if you want.)
What parameters did I use?
Once the AutoSol wizard has started (when run from the command line), a parameters file called
autosol.eff will be created in your output directory (e.g., AutoSol_run_1_/autosol.eff). This
parameters file has a header that says what command you used to run AutoSol, and it contains all the starting values of all parameters for this run (including the defaults for all the parameters that you did not set). The autosol.eff file is good for more than just looking at the values of parameters, though. If you copy this file to a new one (for example autosol_hires.eff) and edit it to change the values of some of the parameters (resolution=1.74) then you can re-run AutoSol with the new values of your parameters like this:
phenix.autosol autosol_hires.eff
This command will do everything just the same as in your first run but use all the data to 1.74 A.
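Schematically, the edited portion of autosol_hires.eff would look something like this (a sketch only: the scope name and layout should be taken from your own autosol.eff, and only the changed parameter is shown):
autosol {
  resolution = 1.74  # changed from resolution=2.4 used in the first run
}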
Reading the log files for your AutoSol run file
While the AutoSol wizard is running, there are several places you can look to see what is going on. The most important one is the overall log file for the AutoSol run. This log file is located in:
AutoSol_run_1_/AutoSol_run_1_1.log
for run 1 of AutoSol. (The second 1 in this log file name will be incremented if you stop this run in the middle and restart it with a command like phenix.autosol run=1). The AutoSol_run_1_1.log file is a running summary of what the AutoSol Wizard is doing.
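While the Wizard is running, you can follow this log from a shell with a standard Unix command, for example:
tail -f AutoSol_run_1_/AutoSol_run_1_1.log
Here are a few of the key sections of the log files produced for the p9 SAD dataset.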
Summary of the command-line arguments
Near the top of the log file you will find:
------------------------------------------------------------
Starting AutoSol with the command:
phenix.autosol seq_file=seq.dat sites=4 atom_type=Se data=p9_se_w2.sca sg=I4 \
  cell='113.949 113.949 32.474 90.000 90.000 90.00' resolution=2.4 \
  thoroughness=quick
This is just a repeat of how you ran AutoSol; you can copy it and paste it into the command line to repeat this run.
ImportRawData.
The input data file p9_se_w2.sca is in unmerged Scalepack format. The AutoSol wizard converts everything to premerged Scalepack format before proceeding. Here is where the AutoSol Wizard identifies the format and then calls the ImportRawData Wizard:
HKLIN ENTRY: p9_se_w2.sca
GUESS FILE TYPE MERGE TYPE sca unmerged
LABELS['I', 'SIGI']
CONTENTS: ['p9_se_w2.sca', 'sca', 'unmerged', 'I 41', None, None, ['I', 'SIGI']]
Converting the files ['p9_se_w2.sca'] to sca format before proceeding
Running import directly...
WIZARD: ImportRawData
Using the datafiles converted to premerged format.
After completing the ImportRawData step, the AutoSol Wizard goes back to the beginning, but uses the newly-converted file p9_se_w2_PHX.sca:
HKLIN ENTRY: AutoSol_run_1_/p9_se_w2_PHX.sca
FILE TYPE scalepack_merge
GUESS FILE TYPE MERGE TYPE sca premerged
LABELS['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']
Unit cell: (113.949, 113.949, 32.474, 90, 90, 90)
Space group: I 4 (No. 79)
CONTENTS: ['AutoSol_run_1_/p9_se_w2_PHX.sca', 'sca', 'premerged', 'I 4',
[113.949, 113.949, 32.473999999999997, 90.0, 90.0, 90.0],
1.7443432606877809, ['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']]
Total of 1 input data files
Guessing cell contents
The AutoSol Wizard uses the sequence information in your sequence file (seq.dat) and the cell parameters and space group to guess the number of NCS copies and the solvent fraction, and the number of total methionines (approximately equal to the number of heavy-atom sites for SeMet proteins):
AutoSol_guess_setup_for_scaling AutoSol Run 1 Fri Mar 7 00:53:48 2008
Solvent fraction and resolution and ha types/scatt fact
This is the last dataset to scale
Guessing setup for scaling dataset 1
SG I 4 cell [113.949, 113.949, 32.473999999999997, 90.0, 90.0, 90.0]
Number of residues in unique chains in seq file: 139
Unit cell: (113.949, 113.949, 32.474, 90, 90, 90)
Space group: I 4 (No. 79)
CELL VOLUME :421654.580793
N_EQUIV:8
GUESS OF NCS COPIES: 1
SOLVENT FRACTION ESTIMATE: 0.64
Total residues:139
Total Met:4 resolution estimate: 2.4
Running phenix.xtriage
The AutoSol Wizard automatically runs phenix.xtriage on each of your input datafiles to analyze them for twinning, outliers, translational symmetry, and other special conditions that you should be aware
of. You can read more about xtriage in Data quality assessment with phenix.xtriage. The summary output from xtriage for this dataset looks like this:
The largest off-origin peak in the Patterson function is 6.49% of the height of the origin peak. No significant pseudotranslation is detected.
The results of the L-test indicate that the intensity statistics behave as expected. No twinning is suspected.
Testing for anisotropy in the data
The AutoSol Wizard tests for anisotropy by determining the range of effective anisotropic B values along the principal lattice directions. If this range is large and the ratio of the largest to the smallest value is also large then the data are by default corrected to make the anisotropy small (see
Analyzing and scaling the data
in the AutoSol web page for more discussion of the anisotropy correction). In the
p9 case, the range of anisotropic B values is small and no correction is made:
Range of aniso B: 15.67 26.14
Not using aniso-corrected data files as the range of aniso b is only
10.47 and 'correct_aniso' is not set
Choosing datafiles with high signal-to-noise
During scaling, the AutoSol Wizard estimates the signal-to-noise in each datafile and the resolution where there is significant signal-to-noise (above 0.3:1 signal-to-noise). You can see this analysis in the log file dataset_scale_1.log for dataset 1. In this case, the signal-to-noise is 1.4 to a resolution of
2.4 A:
FILE DATA:AutoSol_run_1_/p9_se_w2_PHX.sca sn: 1.420786
Running HYSS to find the heavy-atom substructure
The HYSS (hybrid substructure search) procedure for heavy-atom searching uses a combination of a
Patterson search for 2-site solutions with direct methods recycling. The search ends when the same solution is found beginning with several different starting points. The HYSS log files are named after the datafile that they are based on and the type of differences (ano, iso) that are being used. In this
p9 SAD dataset, the HYSS logfile is p9_se_w2_PHX.sca_ano_1.sca_hyss.log. The key part of this
HYSS log file is:
Entering search loop:
p = peaklist index in Patterson map
f = peaklist index in two-site translation function
cc = correlation coefficient after extrapolation scan
r = number of dual-space recycling cycles
cc = final correlation coefficient
p=000 f=000 cc=0.392 r=015 cc=0.532 [ best cc: 0.532 ]
p=000 f=001 cc=0.381 r=015 cc=0.532 [ best cc: 0.532 0.532 ]
Number of matching sites of top 2 structures: 6
Here a correlation coefficient of 0.5 is very good (0.1 is hopeless, 0.2 is possible, 0.3 is good) and 6 sites were found that matched in the first two tries. The program continues until 5 structures all have matching sites, then ends and prints out the final correlations, after taking the top 4 sites.
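You can also run HySS by hand to experiment with the search. A sketch, assuming the usual phenix.hyss argument order of datafile, number of sites, and scatterer type; the sites it finds can then be read back into AutoSol with the sites_file keyword described under "What to do next" below:

phenix.hyss AutoSol_run_1_/p9_se_w2_PHX.sca 4 Se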
Finding the hand and scoring heavy-atom solutions
Normally either hand of the heavy-atom substructure is a possible solution, and both must be tested by calculating phases, examining the electron density maps, and carrying out density modification, as the two hands give the same statistics for all heavy-atom analysis and phasing steps. Note that in chiral space groups (those that have a handedness, such as P61), both hands of the space group must be tested. The AutoSol Wizard will do this for you, inverting the hand of the heavy-atom substructure and the space group at the same time. For example, in space group P61 the hand of the substructure is inverted and then it is placed in space group P65.
Scoring heavy-atom solutions
The AutoSol Wizard scores heavy-atom solutions based on two criteria by default. The first criterion is the skew of the electron density in the map (SKEW). Good values for the skew are anything greater than 0.1. In a SAD structure determination, the heavy-atom solution with the correct hand may have a much more positive skew than the one with the inverse hand. The second criterion is the correlation of local RMS density (CORR_RMS). This is a measure of how contiguous the solvent and non-solvent regions are in the map. (If the local rms is low at one point and also low at neighboring points, then the solvent region must be relatively contiguous, and not split up into small regions.) For SAD datasets, Phaser is used for calculating phases. For a SAD dataset, a figure of merit of 0.3 is acceptable, 0.4 is fine and anything above 0.5 is very good. The scores for solution #1 are listed in the
AutoSol log file:
Scoring for this solution now...
AutoSol_run_1_/TEMP0/resolve.scores SKEW -0.047612928
AutoSol_run_1_/TEMP0/resolve.scores CORR_RMS 0.8755398
CC-EST (BAYES-CC) SKEW : 10.0 +/- 26.1
CC-EST (BAYES-CC) CORR_RMS : 55.7 +/- 36.1
Resetting sigma of quality estimate due to wide range of estimated values:
Overall quality: 14.7
Highest lower bound of quality for individual estimates: 37.6
Current 2*sigma: 37.5 New 2*sigma: 45.7
ESTIMATED MAP CC x 100: 14.7 +/- 45.7
The ESTIMATED MAP CC x 100 is an estimate of the quality of the experimental electron density map (not the density-modified one). A set of real structures was used to calibrate the range of values of each score that were obtained for phases of varying quality. The resulting probability distributions are used above to estimate the correlation between the experimental map and an ideal map for this structure. Then all the estimates are combined to yield an overall Bayesian estimate of the map quality. These are reported as CC x 100 +/- 2SD. These estimated map CC values are usually fairly accurate, so with an estimate of 14.7 +/- 45.7 you can be quite confident that this solution is not the right one. The wizard then tries the inverse solution...
Scoring for this solution now...
AutoSol_run_1_/TEMP0/resolve.scores SKEW 0.2644597
AutoSol_run_1_/TEMP0/resolve.scores CORR_RMS 0.9274329
CC-EST (BAYES-CC) SKEW : 56.5 +/- 18.1
CC-EST (BAYES-CC) CORR_RMS : 63.1 +/- 28.5
ESTIMATED MAP CC x 100: 60.0 +/- 13.6
Reading NCS information from: AutoSol_run_1_/TEMP0/resolve.log
based on [ha_2.pdb,phaser_2.mtz]
Reformatting ha_2.pdb and putting it in ha_2.pdb_formatted.pdb
RANGE to KEEP :1.28
Confident of the hand (Quality diff from opp hand is 1.9 sigma)
This solution looks a lot better. The overall estimated map CC value is 60.0 +/- 13.6. This means that your structure is not only solved but that you will have a good map when it is density modified.
Final phasing with Phaser
Once the best heavy-atom solution or solutions are chosen based on ESTIMATED MAP CC x 100, these are used in a final round of phasing with Phaser (for SAD phasing). The log file from phasing for solution 2 is in phaser_2.log. Here is the final part of the output from this log file, showing the refined coordinates, occupancies, thermal (B) factors for the 4 sites, along with the refined scattering factors (in this case only f" is refined), and the final figure of merit of phasing (0.544):
Atom Parameters: 4 atoms in list
X Y Z O B (AnisoB) M Atomtype
#1 0.180 -0.113 -0.681 1.135 22.8 ( ---- ) 1 SE
#2 0.686 -0.238 -0.710 0.980 23.0 (+22.40) 1 SE
#3 0.665 -0.206 -0.774 1.020 28.2 (+26.14) 1 SE
#5 0.027 0.758 0.905 0.176 23.9 ( ---- ) 1 SE
Scattering Parameters:
Atom f" (f')
SE 5.5196 -8.0000
Figures of Merit
----------------
Bin Resolution Acentric Centric Single Total
Number FOM Number FOM Number FOM Number FOM
ALL 28.49- 2.40 7502 0.594 874 0.140 51 0.057 8427 0.544
log-likelihood gain -90088
Statistical density modification with RESOLVE
After SAD phases are calculated with Phaser, the AutoSol Wizard uses RESOLVE density modification to improve the quality of the electron density map. The statistical density modification in RESOLVE takes advantage of the flatness of the solvent region and the expected distribution of electron density in the region containing the macromolecule, as well as any NCS that can be found from the heavy-atom substructure. The weighted structure factors and phases (FWT, PHWT) from Phaser are used to calculate the starting map for RESOLVE, and the experimental structure factor amplitudes (FP) and SAD Hendrickson-Lattman coefficients from Phaser are used in the density modification process. The output from RESOLVE for solution 2 can be found in resolve_2.log. Here are key sections of this output. First, the plot of how many points in the "protein" region of the map have each possible value of electron density. The plot below is normalized so that a density of zero is the mean of the solvent region, and the standard deviation of the density in the map is 1.0. A perfect map has a lot of points with density slightly less than zero on this scale (the points between atoms) and a few points with very high density (the points near atoms), and no points with very negative density. Such a map has a very high skew (think "skewed off to the right"). This map is good, with a positive skew, though it is not perfect.
Plot of Observed (o) and model (x) electron density distributions for protein
region, where the model distribution is given by,
p_model(beta*(rho+offset)) = p_ideal(rho)
and then convoluted with a gaussian with width of sigma
where sigma, offset and beta are given below under "Error estimate."
0.03..................................................
. . .
. . .
. xxxxxxx .
. ooxoooooooxxo .
. xxo . xoo .
. xo . xxo .
p(rho) . xx . xxoo .
. ox . xxooo .
. xo . xxo .
. xx . xxxx .
. xo . xxxx .
. xx . xxxx .
xxx . xxxxx .
x . ooxxxx
0.0 x................................................x
-2 -1 0 1 2 3
normalized rho (0 = mean of solvent region)
After density modification is complete, this plot becomes much more like one from a perfect structure:
0.03..................................................
. . .
. xxxxx . .
. xooooxxo . .
. oxo oxo . .
. xx xo. .
. ox xxoo .
p(rho) . ox .xoo .
. x . xo .
. x . xxxx .
. xx . xxxxo .
. xx . xxxxxxxx .
. xx . ooxxxxxx .
xx . o oxxxxxoo
x . xo
0.0 o................................................x
-2 -1 0 1 2 3
normalized rho (0 = mean of solvent region)
The key statistic from this RESOLVE density modification is the R-factor for comparison of observed structure factor amplitudes (FP) with those calculated from the density modification procedure (FC). In this p9 SAD phasing the R-factor is very low:
Overall R-factor for FC vs FP: 0.239 for 8422 reflections
An acceptable value is anything below 0.35; below 0.30 is good.
Generation of FreeR flags
The AutoSol Wizard will create a set of free R flags indicating which reflections are not to be used in refinement. By default 5% of reflections (up to a maximum of 2000) are reserved for this test set. If you want to supply a reflection file hires.mtz that has higher resolution than the data used to solve the structure, or has a test set already marked, then you can do this with the keyword
input_refinement_file=hires.mtz. The files to be used for model-building and refinement are listed in the AutoSol log file:
FreeR_flag added to phaser_2.mtz
...
Saving exptl_fobs_phases_freeR_flags_2.mtz for refinement
THE FILE AutoSol_run_1_/resolve_2.mtz will be used for model-building
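A sketch of starting a run that supplies such a file (hires.mtz is a placeholder name, and the single-dataset data= keyword is assumed as above):

phenix.autosol data=p9_se_w2_PHX.sca seq_file=seq.dat atom_type=Se \
  input_refinement_file=hires.mtz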
Model-building with RESOLVE
The AutoSol Wizard by default uses a very quick method to build just the secondary structure of your macromolecule. This is controlled by the keyword helices_strands_only=True. The Wizard will guess from your sequence file whether the structure is protein or RNA or DNA (but you can tell it if you want with chain_type=PROTEIN). If the quick model-building does not build a satisfactory model (if the correlation of map and model is less than acceptable_secondary_structure_cc=0.35), then model-building is tried again with the standard build procedure, essentially the same as one cycle of model-building with the AutoBuild Wizard (see the web page Automated Model Building and Rebuilding with AutoBuild), except that if you specify thoroughness=quick as we have in this example, the model-building is done less comprehensively to speed things up. In this case the secondary-structure-only model-building produces an initial model with 61 residues built and 0 side chains assigned to sequence, and a model-map correlation of 0.33:
Model with helices and strands is in Build_1.pdb
Log for helices and strands is in Build_1.log
Final file: AutoSol_run_1_/TEMP0/Build_1.pdb
Log file: Build_1.log copied to Build_1.log
Model 1: Residues built=61 placed=0 Chains=9 Model-map CC=0.33
This is new best model with cc = 0.33
Getting R for model: Build_1.pdb
Model: AutoSol_run_1_/TEMP0/refine_1.pdb R/Rfree=0.55/0.58
As the model-map correlation is only 0.33, the Wizard decides that this is not good enough and tries again with regular model-building, yielding a better model with 86 residues built and a map correlation of 0.55:
Model 2: Residues built=86 placed=7 Chains=15 Model-map CC=0.55
This is new best model with cc = 0.55
Refining model: Build_2.pdb
Model: AutoSol_run_1_/TEMP0/refine_2.pdb R/Rfree=0.46/0.49
After one model completion cycle (including extending ends of chains, fitting loops, and building outside the region already built), the best model has 77 residues built, 22 side chains assigned to sequence, and a map correlation of 0.61:
Model completion cycle 1
Models to combine and extend: ['Build_2.pdb', 'refine_2.pdb']
Model 3: Residues built=77 placed=22 Chains=10 Model-map CC=0.61
This is new best model with cc = 0.61
Refining model: Build_combine_extend_3.pdb
Model: AutoSol_run_1_/TEMP0/refine_3.pdb R/Rfree=0.45/0.47
This initial model is written out to refine_3.pdb in the output directory. It is still just a preliminary model, but it is good enough to tell that the structure is solved. For full model-building you will want to go on and use the AutoBuild Wizard (see the web page Automated Model Building and Rebuilding with AutoBuild).
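If you would rather have AutoSol skip the quick secondary-structure-only step and go straight to the standard build, helices_strands_only can be turned off when you start the run. A sketch (again assuming the single-dataset data= keyword):

phenix.autosol data=p9_se_w2_PHX.sca seq_file=seq.dat atom_type=Se \
  helices_strands_only=False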
The AutoSol_summary.dat summary file
A quick summary of the results of your AutoSol run is in the AutoSol_summary.dat file in your output directory. This file lists the key files that were produced in your run of AutoSol (all these are in the output directory) and some of the key statistics for the run, including the scores for the heavy-atom substructure and the model-building and refinement statistics. These statistics are listed for all the solutions obtained, with the highest-scoring solutions first. Here is part of the summary for this p9
SAD dataset:
-----------CURRENT SOLUTIONS FOR RUN 1 : -------------------
*** FILES ARE IN THE DIRECTORY: AutoSol_run_1_ ****
Solution # 2 BAYES-CC: 60.0 +/- 13.6 Dataset #1 FOM: 0.54 ----------------
Solution 2 using HYSS on
/net/firebird/scratch1/terwill/run_072908a/p9-sad/AutoSol_run_1_/p9_se_w2_PHX.sca_ano_1.sca
and taking inverse. Dataset #1
Dataset number: 1
Dataset type: sad
Datafiles used: [
'/net/firebird/scratch1/terwill/run_072908a/p9-sad/AutoSol_run_1_/p9_se_w2_PHX.sca']
Sites: 4 (Already used for Phasing at resol of 2.4) Refined Sites: 4
NCS information in: AutoSol_2.ncs_spec
Experimental phases in: phaser_2.mtz
Experimental phases plus FreeR_flags for refinement in: exptl_fobs_phases_freeR_flags_2.mtz
Density-modified phases in: resolve_2.mtz
HA sites (PDB format) in: ha_2.pdb_formatted.pdb
Sequence file in: seq.dat
Model in: refine_3.pdb
Residues built: 77
Side-chains built: 22
Chains: 10
Overall model-map correlation: 0.61
R/R-free: 0.45/0.47
Scaling logfile in: dataset_1_scale.log
HYSS logfile in: p9_se_w2_PHX.sca_ano_1.sca_hyss.log
Phasing logfile in: phaser_2.log
Density modification logfile in: resolve_2.log (R=0.24)
Build logfile in: Build_combine_extend_3.log
Score type: SKEW CORR_RMS
Raw scores: 0.26 0.93
BAYES-CC: 56.50 63.07
Refined heavy atom sites (fractional): xyz 0.180 -0.113 -0.681
xyz 0.686 -0.238 -0.710
xyz 0.665 -0.206 -0.774
xyz 0.027 0.758 0.905
How do I know if I have a good solution?
Here are some of the things to look for to tell if you have obtained a correct solution:
● How much of the model was built? More than 50% is good, particularly if you are using the default of helices_strands_only=True. If less than 25% of the model is built, then it may be entirely incorrect. Have a look at the model. If you see clear sets of parallel or antiparallel strands, or if you see helices and strands with the expected relationships, your model is going to be correct. If you see a lot of short fragments everywhere, your model and solution is going to be incorrect.
● How many side-chains were fitted to density? More than 25% is ok, more than 50% is very good.
● What is the R-factor of the model? This only applies if you are building a full model (not for helices_strands_only=True). For a solution at moderate to high resolution (2.5 A or better) the R-factor should be in the low 30's to be very good. For lower-resolution data, an R-factor in the low 40's is probably largely correct but the model is not very good.
● What was the overall signal-to-noise in the data? Above 1 is good, below 0.5 is very low.
● What are the individual CC-BAYES estimates of map correlation for your top solution? For a good solution they are all around 50 or more, with 2SD uncertainties that are about 10-20.
● What is the overall "ESTIMATED MAP CC x 100" of your top solution? This should also be 50 or more for a good solution. This is an estimate of the map correlation before density modification, so if you have a lot of solvent or several NCS-related copies in the asymmetric unit, then lower values may still give you a good map.
● What is the difference in "ESTIMATED MAP CC x 100" between the top solution and its inverse? If this is large (more than the 2SD values for each) that is a good sign.
What to do next
Once you have run AutoSol and have obtained a good solution and model, the next thing to do is to run the AutoBuild Wizard. If you run it in the same directory where you ran AutoSol, the AutoBuild
Wizard will pick up where the AutoSol Wizard left off and carry out iterative model-building, density
modification and refinement to improve your model and map. See the web page Automated Model Building and Rebuilding with AutoBuild.
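A sketch of the usual way to continue, run from the same working directory; the after_autosol shortcut is taken from the AutoBuild documentation, so check the keyword list for your version:

phenix.autobuild after_autosol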
If you have not obtained a good solution, then it's not time to give up yet. There are a number of standard things to try that may improve the structure determination. Here are a few that you should always try:
● Have a careful look at all the output files. Work your way through the main log file (e.g., AutoSol_run_1_1.log) and all the other principal log files in order, beginning with scaling (dataset_1_scale.log), then heavy-atom searching (p9_se_w2_PHX.sca_ano_1.sca_hyss.log), phasing (e.g., phaser_1.log or phaser_xx.log, depending on which solution xx was the top solution) and density modification (e.g., resolve_xx.log). Is there anything strange or unusual in any of them that may give you a clue as to what to try next? For example, did the phasing work well (high figure of merit) yet the density modification failed? (Perhaps the hand is incorrect.) Was the solvent content estimated correctly? (You can specify it yourself if you want.) What does the xtriage output say? Is there twinning or strong translational symmetry? Are there problems with reflections near ice rings? Are there many outlier reflections?
● Try a different resolution cutoff, for example 0.5 A lower resolution than you tried before. Often the highest-resolution shells have little useful information for structure solution (though the data may be useful in refinement and density modification). A combined sketch of this and the next suggestion follows this list.
● Try a different rejection criterion for outliers. The default is ratio_out=3.0 (toss reflections with delta F more than 3 times the rms delta F of all reflections in the shell). Try instead ratio_out=5.0 to keep almost everything.
● If the heavy-atom substructure search did not yield plausible solutions, try searching with HYSS using the command-line interface, and vary the resolution and number of sites you look for. Can you find a solution that has a higher CC than the one found in AutoSol? If so, you can read your solution in to AutoSol with sites_file=my_sites.pdb.
● Was an anisotropy correction applied in AutoSol? If there is some anisotropy but no correction was applied, you can force AutoSol to apply the correction with correct_aniso=True.
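As promised above, a sketch combining the resolution and outlier suggestions; resolution=2.9 is an arbitrary example cutoff, and the single-dataset data= keyword is assumed as earlier in this tutorial:

phenix.autosol data=p9_se_w2_PHX.sca seq_file=seq.dat atom_type=Se \
  resolution=2.9 ratio_out=5.0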
Additional information
For details about the AutoSol Wizard, see Automated structure solution with AutoSol. For help on running Wizards, see Running a Wizard from a GUI, the command-line, or a script.
Tutorial 2: Solving a structure with MAD data
Running the demo gene-5 data with AutoSol
Reading the log files for your AutoSol run file
Summary of the command-line arguments
Testing for anisotropy in the data
Scaling MAD data and estimating FA values
Choosing datafiles with high signal-to-noise
Running HYSS to find the heavy-atom substructure
Finding the hand and scoring heavy-atom solutions
Finding additional sites by density modification and FA heavy-atom Fouriers
Statistical density modification with RESOLVE
The AutoSol_summary.dat summary file
How do I know if I have a good solution?
Introduction
This tutorial will use some moderately good MAD data (3 wavelengths from a gene-5 protein SeMet dataset diffracting to 2.6 A) as an example of how to solve a MAD dataset with AutoSol. It is designed to be read all the way through, giving pointers for you along the way. Once you have read it all and run the example data and looked at the output files, you will be in a good position to run your own data through AutoSol.
Setting up to run PHENIX
If PHENIX is already installed and your environment is all set, then if you type:

echo $PHENIX

then you should get back something like this:

/xtal/phenix-1.3

If instead you get:

PHENIX: undefined variable

then you need to set up your PHENIX environment. See the PHENIX documentation page on setting up your environment for details of how to do this. If you are using the C-shell environment (csh) then all you will need to do is add one line to your .cshrc (or equivalent) file that looks like this:

source /xtal/phenix-1.3/phenix_env

(except that the path in this statement will be where your PHENIX is installed). Then the next time you log in $PHENIX will be defined.
Running the demo gene-5 data with AutoSol
To run AutoSol on the demo gene-5 data, make yourself a tutorials directory and cd into that directory:

mkdir tutorials
cd tutorials

Now type the phenix command:

phenix.run_example --help

to list the available examples. Choosing gene-5-mad for this tutorial, you can now use the phenix command:

phenix.run_example gene-5-mad

to solve the gene-5 structure with AutoSol. This command will copy the directory $PHENIX/examples/gene-5-mad to your current directory (tutorials) and call it tutorials/gene-5-mad/. Then it will run AutoSol using the command file run.csh that is present in this tutorials/gene-5-mad/ directory. This command file run.csh is simple. It says:
#!/bin/csh
echo "Running AutoSol on gene-5 protein data..."
phenix.autosol seq_file=sequence.dat sites=2 atom_type=Se \
  peak.data=peak.sca peak.f_prime=-3 peak.f_double_prime=4. \
  infl.data=infl.sca infl.f_prime=-5 infl.f_double_prime=2. \
  high.data=high.sca high.f_prime=-1.5 high.f_double_prime=3.
The first line (#!/bin/csh) tells the system to interpret the remainder of the text in the file using the C-shell (csh). The command phenix.autosol runs the command-line version of AutoSol (see the AutoSol documentation for the full list of keywords). The arguments on the command line tell AutoSol about the sequence file (seq_file=sequence.dat), the number of sites to look for (sites=2), and the atom type (atom_type=Se). (Note that each of these is specified with an = sign, and that there are no spaces around the = sign.) For a MAD dataset, we need to tell AutoSol something about the scattering factors at each wavelength. Lines like:

peak.data=peak.sca peak.f_prime=-3 peak.f_double_prime=4.

do this. This line specifies that the datafile for the peak data is peak.sca, that the f' value is -3, and that the f" value is 4. These values will (by default) be refined by SOLVE prior to calculating phases. Note the backslash "\" at the end of some of the lines in the phenix.autosol command. This tells the C-shell (which interprets everything in this file) that the next line is a continuation of the current line. There must be no characters (not even a space) after the backslash for this to work. The MAD data to be used to solve the structure is in the datafiles peak.sca, infl.sca and high.sca. These datafiles are in Scalepack premerged format, which means that there is just one instance of each reflection and the cell parameters are in the file, so we do not need to provide the cell parameters or the space group (unless the ones in the .sca files are incorrect!). The resolution of the data is to about 2.6 A, and we are going to let AutoSol decide on the best resolution to use for structure solution. Although the phenix.run_example gene-5-mad command has just run AutoSol from a script (run.csh), you can run AutoSol yourself from the command line with the same phenix.autosol seq_file= ... command. You can also run AutoSol from a GUI, or by putting commands in another type of script file. All these possibilities are described in Running a Wizard from a GUI, the command-line, or a script.
Where are my files?
Once you have started AutoSol or another Wizard, an output directory will be created in your current
(working) directory. The first time you run AutoSol in this directory, this output directory will be called AutoSol_run_1_ (or AutoSol_run_1_/, where the slash at the end just indicates that this is a directory). All of the output from run 1 of AutoSol will be in this directory. If you run AutoSol again, a new subdirectory called AutoSol_run_2_ will be created. Inside the directory AutoSol_run_1_ there will be one or more temporary directories such as TEMP0 created while the Wizard is running. The files in this temporary directory may be useful sometimes in figuring out what the Wizard is doing (or not doing!). By default these directories are emptied when the Wizard finishes (but you can keep their contents with the command clean_up=False if you want.)
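For example, to keep the contents of the temporary directories for this tutorial run, you could add clean_up=False to the phenix.autosol command shown in run.csh above:

phenix.autosol seq_file=sequence.dat sites=2 atom_type=Se \
  peak.data=peak.sca peak.f_prime=-3 peak.f_double_prime=4. \
  infl.data=infl.sca infl.f_prime=-5 infl.f_double_prime=2. \
  high.data=high.sca high.f_prime=-1.5 high.f_double_prime=3. \
  clean_up=False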
What parameters did I use?
Once the AutoSol wizard has started (when run from the command line), a parameters file called
autosol.eff will be created in your output directory (e.g., AutoSol_run_1_/autosol.eff). This
parameters file has a header that says what command you used to run AutoSol, and it contains all the starting values of all parameters for this run (including the defaults for all the parameters that you did not set). The autosol.eff file is good for more than just looking at the values of parameters, though. If you copy this file to a new one (for example autosol_lores.eff) and edit it to change the values of some of the parameters (resolution=3.0) then you can re-run AutoSol with the new values of your parameters like this:

phenix.autosol autosol_lores.eff
This command will do everything just the same as in your first run but use only the data to 3.0 A.
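For example, after copying autosol.eff to autosol_lores.eff, the edited file would contain a line like this (a sketch; the exact scope nesting in your autosol.eff may differ, so follow the layout of the file you copied):

autosol {
  resolution = 3.0
}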
Reading the log files for your AutoSol run file
While the AutoSol wizard is running, there are several places you can look to see what is going on. The most important one is the overall log file for the AutoSol run. This log file is located in:
AutoSol_run_1_/AutoSol_run_1_1.log
for run 1 of AutoSol. (The second 1 in this log file name will be incremented if you stop this run in the middle and restart it with a command like phenix.autosol run=1). The AutoSol_run_1_1.log file is a running summary of what the AutoSol Wizard is doing. Here are a few of the key sections of the log files produced for the gene-5 MAD dataset.
Summary of the command-line arguments
Near the top of the log file you will find:

------------------------------------------------------------
Starting AutoSol with the command:

phenix.autosol seq_file=sequence.dat sites=2 atom_type=Se peak.data=peak.sca \
  peak.f_prime=-3 peak.f_double_prime=4. infl.data=infl.sca infl.f_prime=-5 \
  infl.f_double_prime=2. high.data=high.sca high.f_prime=-1.5 \
  high.f_double_prime=3.
This is just a repeat of how you ran AutoSol; you can copy it and paste it into the command line to repeat this run.
Reading the datafiles
The AutoSol Wizard will read in your datafiles and check their contents, printing out a summary for each one:
HKLIN ENTRY: high.sca
FILE TYPE scalepack_merge
GUESS FILE TYPE MERGE TYPE sca premerged
LABELS['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']
Unit cell: (76.08, 27.97, 42.36, 90, 103.2, 90)
Space group: C 1 2 1 (No. 5)
CONTENTS: ['high.sca', 'sca', 'premerged', 'C 1 2 1',
[76.079999999999998, 27.969999999999999, 42.359999999999999, 90.0, 103.2, 90.0],
2.5940784397029653, ['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']]
Total of 3 input data files
['peak.sca', 'infl.sca', 'high.sca']
Guessing cell contents
The AutoSol Wizard uses the sequence information in your sequence file (sequence.dat) and the cell parameters and space group to guess the number of NCS copies, the solvent fraction, and the total number of methionines (approximately equal to the number of heavy-atom sites for SeMet proteins):
AutoSol_guess_setup_for_scaling AutoSol Run 1 Thu Mar 6 21:43:20 2008
Solvent fraction and resolution and ha types/scatt fact
This is the last dataset to scale
Guessing setup for scaling dataset 1
SG C 1 2 1 cell [76.079999999999998, 27.969999999999999, 42.359999999999999, 90.0, 103.2, 90.0]
Number of residues in unique chains in seq file: 87
Unit cell: (76.08, 27.97, 42.36, 90, 103.2, 90)
Space group: C 1 2 1 (No. 5)
CELL VOLUME :87758.6787391
N_EQUIV:4
GUESS OF NCS COPIES: 1
SOLVENT FRACTION ESTIMATE: 0.46
Total residues:87
Total Met:2 resolution estimate: 2.59
Running phenix.xtriage
The AutoSol Wizard automatically runs phenix.xtriage on each of your input datafiles to analyze them for twinning, outliers, translational symmetry, and other special conditions that you should be aware of. You can read more about xtriage in Data quality assessment with phenix.xtriage. The summary output from xtriage for this dataset looks like this:
The largest off-origin peak in the Patterson function is 12.60% of the height of the origin peak. No significant pseudotranslation is detected.
The results of the L-test indicate that the intensity statistics behave as expected. No twinning is suspected.
Testing for anisotropy in the data
The AutoSol Wizard tests for anisotropy by determining the range of effective anisotropic B values along the principal lattice directions. If this range is large, and the ratio of the largest to the smallest value is also large, then by default the data are corrected to reduce the anisotropy (see Analyzing and scaling the data in the AutoSol web page for more discussion of the anisotropy correction). In the gene-5 case, the range of anisotropic B values is small and no correction is made:
Range of aniso B: 24.58 27.92
Not using aniso-corrected data files as the range of aniso b is only 3.43 and 'correct_aniso' is not set
Note that if any one of the datafiles in a MAD dataset has a high anisotropy, then by default all of them will be corrected for anisotropy.
Scaling MAD data and estimating FA values
The AutoSol Wizard uses SOLVE localscaling to scale MAD data. The procedure is basically to scale all the data to the most complete dataset, ignoring anomalous differences, to create a reference dataset.
Then all F+ and F- observations at all wavelengths are scaled to this reference dataset, and then the data are merged to the asymmetric unit, averaging duplicate observations. During this process outliers that deviate from the reference values by more than ratio_out (default=3) standard deviations (using all data in the appropriate resolution shell to estimate the SD) are rejected. After scaling, the values of f' and f" are refined based on the relative values of anomalous differences at the various wavelengths and the relative values of dispersive differences among the data at different wavelengths. Then FA values (estimates of the heavy-atom structure factor) are estimated. These FA values can often be more useful than the anomalous differences at any of the individual wavelengths because they combine the anomalous and dispersive information. At the same time as FA values are calculated, an estimate of the phase difference between the structure factor of the anomalously-scattering atoms and the structure factor corresponding to all other atoms can be estimated. This phase difference is useful later in calculating Fourier maps showing the positions of the anomalously-scattering atoms.
Choosing datafiles with high signal-to-noise
For MAD data the AutoSol Wizard analyzes the correlation of anomalous differences at the various wavelengths. The anomalous difference for a particular reflection is related to the f" value at each wavelength. Consequently if the data are good then the anomalous differences at different wavelengths (but for the same reflections) are highly correlated. A shell of resolution in which the anomalous differences have a correlation of about 0.3 or greater has some useful information. A strong
SeMet dataset will have an overall correlation of 0.6-0.7 for the peak and high energy remote wavelengths. You can see this analysis in the log file dataset_scale_1.log for this MAD dataset:
Correlation of anomalous differences at different wavelengths.
(You should probably cut your data off at the resolution where
this drops below about 0.3. A good dataset has correlation
between peak and remote of at least 0.7 overall. Data with
correlations below about 0.5 probably are not contributing much.)
CORRELATION FOR
WAVELENGTH PAIRS
DMIN 1 VS 2 1 VS 3 2 VS 3
5.18 0.79 0.89 0.73
3.88 0.68 0.75 0.55
3.63 0.68 0.72 0.46
3.43 0.53 0.61 0.41
3.24 0.51 0.58 0.26
3.11 0.51 0.59 0.36
2.98 0.36 0.54 0.13
2.85 0.50 0.45 0.35
2.72 0.28 0.30 0.10
2.59 0.32 0.23 0.14
ALL 0.55 0.66 0.40
During scaling, the AutoSol Wizard estimates the signal-to-noise in each datafile and the resolution where there is significant signal-to-noise (above 0.3:1 signal-to-noise). In this case, the FA's appear to have the highest signal-to-noise (3.1) and the inflection data the lowest (0.5):
FILE DATA:FA.sca sn: 3.136704
FILE DATA:peak.sca sn: 2.527422
FILE DATA:high.sca sn: 1.35499
FILE DATA:infl.sca sn: 0.5154387
order of datasets for trying phasing:['FA.sca', 'peak.sca', 'high.sca', 'infl.sca']
Running HYSS to find the heavy-atom substructure
The HYSS (hybrid substructure search) procedure for heavy-atom searching uses a combination of a
Patterson search for 2-site solutions with direct methods recycling. The search ends when the same solution is found beginning with several different starting points. The HYSS log files are named after the datafile that they are based on and the type of differences (ano, iso) that are being used. In this
gene-5 MAD dataset, the HYSS logfile is peak.sca_ano_1.sca_hyss.log. The key part of this HYSS log file is:
Entering search loop:
p = peaklist index in Patterson map
f = peaklist index in two-site translation function
cc = correlation coefficient after extrapolation scan
r = number of dual-space recycling cycles
cc = final correlation coefficient
p=000 f=000 cc=0.181 r=015 cc=0.292 [ best cc: 0.292 ]
p=000 f=001 cc=0.151 r=015 cc=0.285 [ best cc: 0.292 0.285 ]
Number of matching sites of top 2 structures: 2
p=000 f=002 cc=0.144 r=015 cc=0.280 [ best cc: 0.292 0.285 0.280 ]
Number of matching sites of top 2 structures: 2
Number of matching sites of top 3 structures: 2
p=001 f=000 cc=0.152 r=015 cc=0.278 [ best cc: 0.292 0.285 0.280 0.278 ]
Number of matching sites of top 2 structures: 2
Number of matching sites of top 3 structures: 2
Number of matching sites of top 4 structures: 2
p=001 f=001 cc=0.101 r=015 cc=0.291 [ best cc: 0.292 0.291 0.285 0.280 0.278 ]
Number of matching sites of top 2 structures: 3
Number of matching sites of top 3 structures: 2
Number of matching sites of top 4 structures: 2
Number of matching sites of top 5 structures: 2
Here a correlation coefficient of 0.5 is very good (0.1 is hopeless, 0.2 is possible, 0.3 is good) and 2 sites were found that matched in the first two tries. The program continues until 5 structures all have matching sites, then ends and prints out the final correlations, after taking the top 2 sites.
Finding the hand and scoring heavy-atom solutions
Normally either hand of the heavy-atom substructure is a possible solution, and both must be tested by calculating phases, examining the electron density maps, and carrying out density modification, as the two hands give the same statistics for all heavy-atom analysis and phasing steps. Note that in chiral space groups (those that have a handedness, such as P61), both hands of the space group must be tested. The AutoSol Wizard will do this for you, inverting the hand of the heavy-atom substructure and the space group at the same time. For example, in space group P61 the hand of the substructure is inverted and then it is placed in space group P65. The AutoSol Wizard scores heavy-atom solutions based on two criteria by default. The first criterion is the skew of the electron density in the map (SKEW). Good values for the skew are anything greater than 0.1. In a MAD structure determination, the heavy-atom solution with the correct hand may have a more positive skew than the one with the inverse hand. The second criterion is the correlation of local RMS density (CORR_RMS). This is a measure of how contiguous the solvent and non-solvent regions are in the map. (If the local rms is low at one point and also low at neighboring points, then the solvent region must be relatively contiguous, and not split up into small regions.) For MAD datasets, SOLVE is used for calculating phases. For a MAD dataset, a figure of merit of 0.5 is acceptable, 0.6 is fine and anything above 0.7 is very good. The first three solutions scored are all quite good. Here is the third and best one:
SCORING SOLUTION 3: Solution 3 using HYSS on FA.sca. Dataset #1, with 2 sites
Evaluating solution 3
FOM found: 0.6
Number of scoring criteria: 2
Using BAYES-CC (Bayesian estimate of CC of map to perfect) as scores
Scoring for this solution now...
AutoSol_run_1_/TEMP0/resolve.scores SKEW 0.2547302
AutoSol_run_1_/TEMP0/resolve.scores CORR_RMS 0.8763324
CC-EST (BAYES-CC) SKEW : 55.7 +/- 18.5
CC-EST (BAYES-CC) CORR_RMS : 55.8 +/- 36.0
ESTIMATED MAP CC x 100: 57.6 +/- 14.1
The ESTIMATED MAP CC x 100 is an estimate of the quality of the experimental electron density map (not the density-modified one). A set of real structures was used to calibrate the range of values of each score that were obtained for phases of varying quality. The resulting probability distributions are used above to estimate the correlation between the experimental map and an ideal map for this structure. Then all the estimates are combined to yield an overall Bayesian estimate of the map quality. These are reported as CC x 100 +/- 2SD. These estimated map CC values are usually fairly accurate, so if the estimate is 57.6 +/- 14.1 then you can be confident that your structure is not only solved but that you will have a good map when it is density modified. In this case the datasets used to find heavy-atom substructures were the FA values in FA.sca and the peak data in peak.sca_ano_1.sca. For each dataset one solution was found, and that solution and its inverse were scored. The scores were (skipping extra text below):
SCORING SOLUTION 1: Solution 1 using HYSS on
/net/idle/scratch1/terwill/run_072908a/gene-5-mad/peak.sca_ano_1.sca.
Dataset #1, with 2 sites
ESTIMATED MAP CC x 100: 55.0 +/- 15.5
SCORING SOLUTION 2: Solution 2 using HYSS on
/net/idle/scratch1/terwill/run_072908a/gene-5-mad/peak.sca_ano_1.sca and taking inverse. Dataset #1, with 2 sites
ESTIMATED MAP CC x 100: 55.0 +/- 15.5
SCORING SOLUTION 3: Solution 3 using HYSS on FA.sca. Dataset #1, with 2 sites
ESTIMATED MAP CC x 100: 57.6 +/- 14.1
SCORING SOLUTION 4: Solution 4 using HYSS on FA.sca and taking inverse.
Dataset #1, with 2 sites
ESTIMATED MAP CC x 100: 39.7 +/- 26.9
SCORING SOLUTION 5: Solution 5 using HYSS on
/net/idle/scratch1/terwill/run_072908a/gene-5-mad/high.sca_ano_1.sca.
Dataset #1, with 2 sites
ESTIMATED MAP CC x 100: 54.9 +/- 15.6
SCORING SOLUTION 6: Solution 6 using HYSS on
/net/idle/scratch1/terwill/run_072908a/gene-5-mad/high.sca_ano_1.sca and taking inverse. Dataset #1, with 2 sites
ESTIMATED MAP CC x 100: 55.0 +/- 15.5
In this case the best score was obtained using the FA values and taking the original hand (ESTIMATED MAP CC x 100: 57.6 +/- 14.1), and the score for the inverted hand of the heavy-atom substructure was worse (ESTIMATED MAP CC x 100: 39.7 +/- 26.9), so the hand was clear.
Finding additional sites by density modification and FA heavy-atom Fouriers
When AutoSol is used with the default keyword of thoroughness=thorough, as in this example, additional heavy-atom sites are found by phasing using the current model, carrying out density modification to improve the phases, and using the improved phases along with the FA values and the phase difference between the heavy atoms and the non-heavy atoms to calculate Fourier maps showing the positions of the anomalously-scattering atoms. The top peaks in these maps are used as trial heavy-atom sites (if they are not already part of the heavy-atom model). In this example solutions 1, 3, and 6 are all used for this phasing/density modification/Fourier procedure. Six new solutions are found, the best of which are solution 16, based on a difference Fourier using density-modified phases from solution 6, and solution 8, based on density-modified phases from solution 3. Here is solution 16, not substantially different from solution 6 in this case:
SCORING SOLUTION 16: Solution 16 based on diff Fourier using
denmod solution 6. Dataset #1, with 2 sites
CC-EST (BAYES-CC) SKEW : 55.8 +/- 18.5
CC-EST (BAYES-CC) CORR_RMS : 55.8 +/- 36.0
ESTIMATED MAP CC x 100: 57.7 +/- 14.1
This process is repeated several additional times, leading to the final best solution of Solution 21:
SCORING SOLUTION 21: Solution 21 based on diff Fourier using denmod solution 16.
Dataset #1, with 2 sites
ESTIMATED MAP CC x 100: 57.7 +/- 14.1
which is used for final phasing and density modification.
Final phasing with SOLVE
Once the best heavy-atom solution or solutions are chosen based on ESTIMATED MAP CC, these are used in a final round of phasing with SOLVE (for MAD phasing). The log file from phasing for solution
21 is in solve_21.prt. This SOLVE log file repeats the correlation analysis of anomalous differences between data at each wavelength. Then it carries out a detailed refinement of the scattering factors at each wavelength. Finally the heavy-atom model is refined and phases are calculated with Bayesian correlated MAD phasing. The final occupancies and coordinates are listed at the end:
SITE ATOM OCCUP X Y Z B
CURRENT VALUES: 1 Se 0.9665 0.0175 0.2269 0.4069 50.6892
CURRENT VALUES: 2 Se 0.5979 0.9714 0.0088 0.4460 60.0000
In this case the occupancy of one site is quite near 1 and the other is lower. The second site is a selenomethionine that is not well ordered (it is the N-terminal residue in the protein).
Statistical density modification with RESOLVE
After MAD phases are calculated with SOLVE, the AutoSol Wizard uses RESOLVE density modification to improve the quality of the electron density map. The statistical density modification in RESOLVE takes advantage of the flatness of the solvent region and the expected distribution of electron density in the region containing the macromolecule, as well as any NCS that can be found from the heavy-atom substructure. The weighted structure factors and phases (FWT, PHWT) from SOLVE are used to calculate the starting map for RESOLVE, and the experimental structure factor amplitudes (FP) and
MAD Hendrickson-Lattman coefficients from SOLVE are used in the density modification process. The output from RESOLVE for solution 21 can be found in resolve_21.log. Here are key sections of this output. First, the plot of how many points in the "protein" region of the map have each possible value of electron density. The plot below is normalized so that a density of zero is the mean of the solvent region, and the standard deviation of the density in the map is 1.0. A perfect map has a lot of points with density slightly less than zero on this scale (the points between atoms) and a few points with very high density (the points near atoms), and no points with very negative density. Such a map has a very high skew (think "skewed off to the right"). This map is good, with a positive skew, though it is not perfect.
Plot of Observed (o) and model (x) electron density distributions for protein
region, where the model distribution is given by,
p_model(beta*(rho+offset)) = p_ideal(rho)
and then convoluted with a gaussian with width of sigma
where sigma, offset and beta are given below under "Error estimate."
0.04..................................................
. . .
. . .
. xxxx o .
. xx ooxo .
. xo . xx .
. x . xxx .
p(rho) . x . xx .
. xxo . xx .
. o . xo .
. xx . xxx .
. xx . xxx .
. oxx . xxx .
. xxx . oxxxx .
. oxx . oxxxxx .
0.0 xxxx......................................oxxxxxxx
-2 -1 0 1 2 3
normalized rho (0 = mean of solvent region)
After density modification is complete, this plot becomes much more like one from a perfect structure:
0.03..................................................
. . .
. . .
. xxxxoo. .
. xo o xxo .
. xxo o xo .
. x .xxo .
p(rho) . xo . xo .
. xo . oxxx .
. ox . xxx .
. ox . oxxxxx .
. xxx . ooxxxxxxx .
. ox . oooxxxxxx .
. oxx . o oxxxxxx.
xxxx . xo
0.0 x................................................x
-2 -1 0 1 2 3
normalized rho (0 = mean of solvent region)
The key statistic from this RESOLVE density modification is the R-factor for comparison of observed structure factor amplitudes (FP) with those calculated from the density modification procedure (FC). In this gene-5 MAD phasing the R-factor is very low:
Overall R-factor for FC vs FP: 0.293 for 2602 reflections
An acceptable value is anything below 0.35; below 0.30 is good.
Generation of FreeR flags
The AutoSol Wizard will create a set of free R flags indicating which reflections are not to be used in refinement. By default 5% of reflections (up to a maximum of 2000) are reserved for this test set. If you want to supply a reflection file hires.mtz that has higher resolution than the data used to solve the structure, or has a test set already marked, then you can do this with the keyword
input_refinement_file=hires.mtz. The files to be used for model-building and refinement are listed in the AutoSol log file:
Copying AutoSol_run_1_/solve_21.mtz and adding free R flags for refinement
input_data_file_use: AutoSol_run_1_/solve_21.mtz
labin_use: labin FP=FP SIGFP=SIGFP PHIB=PHIB FOM=FOM HLA=HLA HLB=HLB HLC=HLC HLD=HLD
Adding FreeR_flag to AutoSol_run_1_/TEMP0/solve_21.mtz
...
Saving exptl_fobs_phases_freeR_flags_21.mtz for refinement
THE FILE AutoSol_run_1_/resolve_21.mtz will be used for model-building
Model-building with RESOLVE
The AutoSol Wizard by default uses a very quick method to build just the secondary structure of your macromolecule. This is controlled by the keyword helices_strands_only=True. The Wizard will guess from your sequence file whether the structure is protein or RNA or DNA (but you can tell it if you want with chain_type=PROTEIN). If the quick model-building does not build a satisfactory model (if the correlation of map and model is less than acceptable_secondary_structure_cc=0.35), then model-building is tried again with the standard build procedure, essentially the same as one cycle of model-building with the AutoBuild Wizard (see the web page Automated Model Building and Rebuilding with AutoBuild), except that if you specify thoroughness=quick as we have in this example, the model-building is done less comprehensively to speed things up. In this case the secondary-structure-only model-building produces an initial model with 32 residues built and 0 side chains assigned to sequence, and a model-map correlation of 0.32.
Model with helices and strands is in Build_1.pdb
Log for helices and strands is in Build_1.log
Final file: AutoSol_run_1_/TEMP0/Build_1.pdb
Log file: Build_1.log copied to Build_1.log
Model 1: Residues built=32 placed=0 Chains=6 Model-map CC=0.32
This is new best model with cc = 0.32
Getting R for model: Build_1.pdb
Model: AutoSol_run_1_/TEMP0/refine_1.pdb R/Rfree=0.54/0.57
As the secondary-structure-only model-building does not give a very high model-map correlation, the Wizard tries other density-modified maps as well. None of these give a better correlation, so the Wizard tries regular model-building:
Secondary-structure-only model-building with RESOLVE was not successful enough...
Trying again with standard build (helices_strands_only=False)
Also turning on refine this try
...
Building 3 RESOLVE models...
Model 6: Residues built=48 placed=6 Chains=8 Model-map CC=0.50
This is new best model with cc = 0.5
Refining model: Build_6.pdb
Model: AutoSol_run_1_/TEMP0/refine_6.pdb R/Rfree=0.48/0.51
Model 7: Residues built=56 placed=0 Chains=11 Model-map CC=0.51
This is new best model with cc = 0.51
http://phenix-online.org/documentation/tutorial_mad.htm (11 of 14) [12/14/08 1:04:07 PM]
304
Tutorial 2: Solving a structure with MAD data
Refining model: Build_7.pdb
Model: AutoSol_run_1_/TEMP0/refine_7.pdb R/Rfree=0.45/0.48
Model 8: Residues built=52 placed=0 Chains=11 Model-map CC=0.51
Model completion cycle 1
Models to combine and extend: ['Build_6.pdb', 'Build_7.pdb',
'Build_8.pdb', 'refine_7.pdb']
Model 9: Residues built=64 placed=0 Chains=12 Model-map CC=0.59
This is new best model with cc = 0.59
Refining model: Build_combine_extend_9.pdb
Model: AutoSol_run_1_/TEMP0/refine_9.pdb R/Rfree=0.42/0.45
As the model-map correlation is now reasonably good (0.59), the model-building is considered successful and the refined initial model is written out to refine_9.pdb in the output directory. It is still just a preliminary model, but it is good enough to tell that the structure is solved. For full model-building you will want to go on and use the AutoBuild Wizard (see the web page Automated Model Building and Rebuilding with AutoBuild).
The AutoSol_summary.dat summary file
A quick summary of the results of your AutoSol run is in the AutoSol_summary.dat file in your output directory. This file lists the key files that were produced in your run of AutoSol (all these are in the output directory) and some of the key statistics for the run, including the scores for the heavy-atom substructure and the model-building and refinement statistics. These statistics are listed for all the solutions obtained, with the highest-scoring solutions first. Here is part of the summary for this
gene-5 MAD dataset:
-----------CURRENT SOLUTIONS FOR RUN 1 : -------------------
*** FILES ARE IN THE DIRECTORY: AutoSol_run_1_ ****
Solution # 21 BAYES-CC: 57.7 +/- 14.1 Dataset #1 FOM: 0.51 ----------------
Solution 21 based on diff Fourier using denmod solution 16. Dataset #1
Dataset number: 1
Dataset type: mad
Datafiles used: ['/net/idle/scratch1/terwill/run_072908a/gene-5-mad/peak.sca',
'/net/idle/scratch1/terwill/run_072908a/gene-5-mad/infl.sca',
'/net/idle/scratch1/terwill/run_072908a/gene-5-mad/high.sca']
Sites: 2 (Already used for Phasing at resol of 2.5)
NCS information in: AutoSol_21.ncs_spec
Experimental phases in: solve_21.mtz
Experimental phases plus FreeR_flags for refinement in: exptl_fobs_phases_freeR_flags_21.mtz
Density-modified phases in: resolve_21.mtz
HA sites (PDB format) in: ha_21.pdb_formatted.pdb
Sequence file in: sequence.dat
Model in: refine_9.pdb
Residues built: 64
Side-chains built: 0
Chains: 12
Overall model-map correlation: 0.59
R/R-free: 0.42/0.45
Scaling logfile in: dataset_1_scale.log
HYSS logfile in: high.sca_ano_1.sca_hyss.log
Phasing logfile in: solve_21.prt
Density modification logfile in: resolve_21.log (R=0.29)
Build logfile in: Build_combine_extend_9.log
Score type: SKEW CORR_RMS
Raw scores: 0.26 0.88
BAYES-CC: 55.84 55.78
Heavy atom sites (fractional): xyz 0.018 0.227 0.406
xyz 0.973 0.011 0.448
How do I know if I have a good solution?
Here are some of the things to look for to tell if you have obtained a correct solution:
● How much of the model was built? More than 50% is good, particularly if you are using the default of helices_strands_only=True. If less than 25% of the model is built, then it may be entirely incorrect. Have a look at the model. If you see clear sets of parallel or antiparallel strands, or if you see helices and strands with the expected relationships, your model is going to be correct. If you see a lot of short fragments everywhere, your model and solution is going to be incorrect.
● How many side-chains were fitted to density? More than 25% is ok, more than 50% is very good.
● What is the R-factor of the model? This only applies if you are building a full model (not for helices_strands_only=True). For a solution at moderate to high resolution (2.5 A or better) the R-factor should be in the low 30's to be very good. For lower-resolution data, an R-factor in the low 40's is probably largely correct but the model is not very good.
● What was the overall signal-to-noise in the data? Above 1 is good, below 0.5 is very low.
● What are the individual CC-BAYES estimates of map correlation for your top solution? For a good solution they are all around 50 or more, with 2SD uncertainties that are about 10-20.
● What is the overall "ESTIMATED MAP CC x 100" of your top solution? This should also be 50 or more for a good solution. This is an estimate of the map correlation before density modification, so if you have a lot of solvent or several NCS-related copies in the asymmetric unit, then lower values may still give you a good map.
● What is the difference in "ESTIMATED MAP CC x 100" between the top solution and its inverse? If this is large (more than the 2SD values for each) that is a good sign.
What to do next
Once you have run AutoSol and have obtained a good solution and model, the next thing to do is to run the AutoBuild Wizard. If you run it in the same directory where you ran AutoSol, the AutoBuild
Wizard will pick up where the AutoSol Wizard left off and carry out iterative model-building, density
modification and refinement to improve your model and map. See the web page Automated Model Building and Rebuilding with AutoBuild. If you have not obtained a good solution, then it's not time to give up yet. There are a number of standard things to try that may improve the structure determination. Here are a few that you should always try:
● Have a careful look at all the output files. Work your way through the main log file (e.g., AutoSol_run_1_1.log) and all the other principal log files in order, beginning with scaling (dataset_1_scale.log), then heavy-atom searching (FA.sca_hyss.log), phasing (e.g., solve_10.log or solve_xx.log, depending on which solution xx was the top solution) and density modification (e.g., resolve_xx.log). Is there anything strange or unusual in any of them that may give you a clue as to what to try next? For example, did the phasing work well (high figure of merit) yet the density modification fail? (Perhaps the hand is incorrect.) Was the solvent content estimated correctly? (You can specify it yourself if you want.) What does the xtriage output say? Is there twinning or strong translational symmetry? Are there problems with reflections near ice rings? Are there many outlier reflections?
● Try a different resolution cutoff. For example, 0.5 A lower resolution than you tried before. Often the highest-resolution shells have little useful information for structure solution (though the data may be useful in refinement and density modification).
● Try a different rejection criterion for outliers. The default is ratio_out=3.0 (toss reflections with delta F more than 3 times the rms delta F of all reflections in the shell). Try instead ratio_out=5.0 to keep almost everything.
● If the heavy-atom substructure search did not yield plausible solutions, try searching with HySS using the command-line interface, and vary the resolution and number of sites you look for. Can you find a solution that has a higher CC than the one found in AutoSol? If so, you can read your solution into AutoSol with sites_file=my_sites.pdb.
● Was an anisotropy correction applied in AutoSol? If there is some anisotropy but no correction was applied, you can force AutoSol to apply the correction with correct_aniso=True.
● Try related space groups. If you are not positive that your space group is P212121, then try other possibilities with different or no screw axes.
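Here are the example commands promised above. They apply to the experiment types that support command-line control (SAD/MAD/SIR); the data and sequence file names are illustrative only, and you should check the AutoSol keyword list for your version:
phenix.autosol data=w1.sca seq_file=sequence.dat atom_type=Se resolution=3.0 ratio_out=5.0
phenix.autosol data=w1.sca seq_file=sequence.dat atom_type=Se sites_file=my_sites.pdb correct_aniso=True
The first command lowers the resolution cutoff and loosens outlier rejection; the second reads in a substructure you found yourself and forces the anisotropy correction.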
Additional information
For details about the AutoSol Wizard, see Automated structure solution with AutoSol. For help on running Wizards, see Running a Wizard from a GUI, the command-line, or a script.
Tutorial 3: Solving a structure with MIR data
Running the demo rh-dehalogenase data with AutoSol
Reading the log files for your AutoSol run file
Summary of the command-line arguments
Testing for anisotropy in the data
Running HYSS to find the heavy-atom substructure
Finding the hand and scoring heavy-atom solutions
Finding origin shifts between heavy-atom solutions for different derivatives and combining phases
Finding additional sites by density modification and heavy-atom difference Fouriers
Statistical density modification with RESOLVE
The AutoSol_summary.dat summary file
How do I know if I have a good solution?
Introduction
This tutorial will use some very good MIR data (Native and 5 derivatives from a rh-dehalogenase protein MIR dataset analyzed at 2.8 A) as an example of how to solve a MIR dataset with AutoSol. It is designed to be read all the way through, giving pointers for you along the way. Once you have read it all and run the example data and looked at the output files, you will be in a good position to run your own data through
AutoSol.
Setting up to run PHENIX
If PHENIX is already installed and your environment is all set, then if you type:
echo $PHENIX
you should get back something like this:
/xtal/phenix-1.3
If instead you get:
PHENIX: undefined variable
then you need to set up your PHENIX environment before going on. If you are using the C-shell environment (csh) then all you will need to do is add one line to your .cshrc (or equivalent) file that looks like this:
source /xtal/phenix-1.3/phenix_env
(except that the path in this statement will be where your PHENIX is installed). Then the next time you log in $PHENIX will be defined.
Running the demo rh-dehalogenase data with AutoSol
To run AutoSol on the demo rh-dehalogenase data, make yourself a tutorials directory and cd into that directory:
mkdir tutorials
cd tutorials
Now type the phenix command:
phenix.run_example --help
to list the available examples. Choosing rh-dehalogenase-mir for this tutorial, you can now use the phenix command:
phenix.run_example rh-dehalogenase-mir
to solve the rh-dehalogenase structure with AutoSol. This command will copy the directory $PHENIX/examples/rh-dehalogenase-mir to your current directory (tutorials) and call it tutorials/rh-dehalogenase-mir/. Then it will run AutoSol using the command file run.csh that is present in this tutorials/rh-dehalogenase-mir/ directory. Running an MIR dataset is a little different from running a MAD, SAD or SIR dataset because you cannot use the standard command-line control for MIR. Instead you have to run a script. It is not hard, just different. (You can run all of those other experiment types from a script too; it's just even easier to do them from the command-line.) This command file run.csh is simple. It says:
#!/bin/csh
echo "Running AutoSol on rhodococcus dehalogenase data..."
echo "NOTE: command-line not available for MIR..using script instead"
phenix.runWizard AutoSol Facts.list
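(If you want to run this script by hand rather than through phenix.run_example, you can simply type csh run.csh in the tutorials/rh-dehalogenase-mir/ directory.)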
The first line (#!/bin/csh) tells the system to interpret the remainder of the text in the file using the C-shell (csh). The echo lines just print out information, and the final line runs the AutoSol Wizard, using the contents of the file Facts.list as parameters (see Automated Structure Solution using AutoSol for all the details about AutoSol, including a full list of keywords). Now let's look at the Facts.list file.
Here is the first relevant part of the file:
sequence_file sequence.dat
thoroughness thorough
cell 93.796 79.849 43.108 90.000 90.000 90.00  # cell params
resolution 2.8                                 # Resolution
expt_type sir                                  # MIR dataset is set of SIR datasets
input_file_list rt_rd_1.sca auki_rd_1.sca      # list of input .sca files
                                               # Native, deriv 1
nat_der_list Native Au                         # identify files in input_file_list
                                               # as Native or the heavy-atom name, such as se
inano_list noinano inano                       # inano/noinano/anoonly: identify
                                               # if ano diffs to be used for derivs
n_ha_list 0 5                                  # number of heavy-atoms for each
                                               # file for mir/sir (0 for native)
This part of the script tells AutoSol about the resolution, the data files for the first native-derivative combination, the heavy atoms for these files (Native and Au), whether anomalous differences are to be included for each (noinano for the Native means do not include them; inano for the Au derivative means do include them), and the number of heavy-atoms in each file (0 for the Native, 5 for the derivative). Note that this first native-derivative combination in this MIR dataset is being treated as an SIRAS dataset. This is the way the AutoSol Wizard works for MIR: the individual derivatives are all solved separately (except that difference Fouriers are used to phase one derivative using a solution from another). Then, when all are finished, all the SIR or SIRAS datasets are phased together with SOLVE Bayesian correlated phasing. This approach works well because a substructure determination is done separately for each derivative, so if any one of them works well, then all the derivatives can be solved. This part of the script also tells AutoSol (thoroughness thorough) to use defaults for a thorough analysis. Usually for MIR this is the best idea, while for SAD and MAD experiments a quick analysis is fine. The MIR script then continues with data for the second, third... derivatives. These parts of the script all look like this:
############## NEW DATASET ################
run_list start                                 # run "start" method:
                                               # read in datafiles for this dataset
run_list read_another_dataset                  # starting a new dataset here
input_file_list rt_rd_1.sca hgki_rd_1.sca      # list of input .sca files
                                               # Native, deriv 1
nat_der_list Native Hg                         # identify files in input_file_list
                                               # as Native or the heavy-atom name, such as se
inano_list noinano inano                       # inano/noinano/anoonly: identify
                                               # if ano diffs to be used for derivs
n_ha_list 0 5                                  # number of heavy-atoms for each
                                               # file for mir/sir (0 for native)
Here the run_list start line is a command to AutoSol. It means "run the following list of AutoSol methods: start". So the AutoSol Wizard runs the "start" method and stops; this basically reads in the datafiles from the previous dataset. The next line says to read another dataset. Now we are ready to provide the data for the second native-derivative combination, again as an SIR dataset. We provide the same native as before (although we don't have to) and a new derivative, this time an Hg derivative, again with anomalous data. This procedure is repeated for each derivative. The AutoSol Wizard will then scale all the datasets and find heavy-atom solutions for some of them by direct methods, then use difference Fouriers to find the solutions for the others. Although the phenix.run_example rh-dehalogenase-mir command has just run AutoSol from a script (run.csh), you can run AutoSol yourself with the same phenix.runWizard AutoSol Facts.list command. You can also run AutoSol from a GUI. All these possibilities are described in Running a Wizard from a GUI, the command-line, or a script.
Where are my files?
Once you have started AutoSol or another Wizard, an output directory will be created in your current (working) directory. The first time you run AutoSol in this directory, this output directory will be called
AutoSol_run_1_ (or AutoSol_run_1_/, where the slash at the end just indicates that this is a directory).
All of the output from run 1 of AutoSol will be in this directory. If you run AutoSol again, a new subdirectory called AutoSol_run_2_ will be created. Inside the directory AutoSol_run_1_ there will be one or more temporary directories such as TEMP0 created while the Wizard is running. The files in this temporary directory may sometimes be useful in figuring out what the Wizard is doing (or not doing!). By default these directories are emptied when the Wizard finishes (but you can keep their contents with the command clean_up=False if you want).
What parameters did I use?
When the AutoSol Wizard runs from a script it does not write out a parameters file. The parameters from your Facts.list are echoed in the AutoSol log file, but otherwise the Facts.list file itself is your record of the parameters that were used.
Reading the log files for your AutoSol run file
While the AutoSol wizard is running, there are several places you can look to see what is going on. The most important one is the overall log file for the AutoSol run. This log file is located in:
AutoSol_run_1_/AutoSol_run_1_1.log
for run 1 of AutoSol. (The second 1 in this log file name will be incremented if you stop this run in the middle and restart it with a command like phenix.autosol run=1). The AutoSol_run_1_1.log file is a running summary of what the AutoSol Wizard is doing. Here are a few of the key sections of the log files produced for the rh-dehalogenase MIR dataset.
Summary of the command-line arguments
Near the top of the log file you will find:
READING FACTS FROM Facts.list
NEW FACT from Facts.list : cell [93.796000000000006, 79.849000000000004, 43.107999999999997, 90.0, 90.0, 90.0]
NEW FACT from Facts.list :resolution 2.8
NEW FACT from Facts.list :expt_type sir
NEW FACT from Facts.list :input_file_list ['rt_rd_1.sca', 'auki_rd_1.sca']
NEW FACT from Facts.list :nat_der_list ['Native', 'Au']
NEW FACT from Facts.list :inano_list ['noinano', 'inano']
NEW FACT from Facts.list :n_ha_list [0, 5]
NEW FACT from Facts.list :run_list ['start']
This is just a repeat of the parameters in your Facts.list script. The last fact is the "run_list start" command, which tells the AutoSol Wizard to read in the data (recall that we put in this command after each native-derivative combination so the Wizard could read it in as an SIR dataset).
Reading the datafiles
The AutoSol Wizard will read in your datafiles and check their contents, printing out a summary for each one.
This is done one dataset at a time (each native-derivative pair) until all have been read in. Here is the summary for the first derivative:
HKLIN ENTRY: rt_rd_1.sca
FILE TYPE scalepack_no_merge_original_index
GUESS FILE TYPE MERGE TYPE sca unmerged
LABELS['I', 'SIGI']
CONTENTS: ['rt_rd_1.sca', 'sca', 'unmerged', 'P 21 21 2', None, None,
['I', 'SIGI']]
Not checking SG as cell or sg not yet defined
SG from rt_rd_1.sca is: P 21 21 2
HKLIN ENTRY: auki_rd_1.sca
FILE TYPE scalepack_no_merge_original_index
GUESS FILE TYPE MERGE TYPE sca unmerged
LABELS['I', 'SIGI']
CONTENTS: ['auki_rd_1.sca', 'sca', 'unmerged', 'P 21 21 21', None, None,
['I', 'SIGI']]
Converting the files ['rt_rd_1.sca', 'auki_rd_1.sca'] to sca format before proceeding
ImportRawData
The input data files rt_rd_1.sca and auki_rd_1.sca are in unmerged Scalepack format. The AutoSol wizard converts everything to premerged Scalepack format before proceeding. Here is where the AutoSol Wizard identifies the format and then calls the ImportRawData Wizard:
Running import directly...
WIZARD: ImportRawData followed eventually by...
List of output files :
File 1: rt_rd_1_PHX.sca
File 2: auki_rd_1_PHX.sca
These output files are in premerged Scalepack format. After completing the ImportRawData step, the
AutoSol Wizard goes back to the beginning, but uses the newly-converted files rt_rd_1_PHX.sca and
auki_rd_1_PHX.sca:
HKLIN ENTRY: AutoSol_run_1_/rt_rd_1_PHX.sca
FILE TYPE scalepack_merge
GUESS FILE TYPE MERGE TYPE sca premerged
LABELS['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']
Unit cell: (93.796, 79.849, 43.108, 90, 90, 90)
Space group: P 21 21 2 (No. 18)
CONTENTS: ['AutoSol_run_1_/rt_rd_1_PHX.sca', 'sca', 'premerged', 'P 21 21 2',
[93.796000000000006, 79.849000000000004, 43.107999999999997, 90.0, 90.0, 90.0],
2.4307589843043771, ['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']]
HKLIN ENTRY: AutoSol_run_1_/auki_rd_1_PHX.sca
FILE TYPE scalepack_merge
GUESS FILE TYPE MERGE TYPE sca premerged
LABELS['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']
Unit cell: (93.796, 79.849, 43.108, 90, 90, 90)
Space group: P 21 21 2 (No. 18)
CONTENTS: ['AutoSol_run_1_/auki_rd_1_PHX.sca', 'sca', 'premerged', 'P 21 21 2',
[93.796000000000006, 79.849000000000004, 43.107999999999997, 90.0, 90.0, 90.0],
2.430806639777233, ['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']]
Total of 2 input data files
['AutoSol_run_1_/rt_rd_1_PHX.sca', 'AutoSol_run_1_/auki_rd_1_PHX.sca']
Guessing cell contents
The AutoSol Wizard uses the sequence information in your sequence file (sequence.dat) and the cell parameters and space group to guess the number of NCS copies and the solvent fraction.
AutoSol_guess_setup_for_scaling AutoSol Run 1 Fri Mar 7 01:24:08 2008
Solvent fraction and resolution and ha types/scatt fact
Guessing setup for scaling dataset 1
SG P 21 21 2 cell [93.796000000000006, 79.849000000000004, 43.107999999999997, 90.0, 90.0, 90.0]
Number of residues in unique chains in seq file: 294
Unit cell: (93.796, 79.849, 43.108, 90, 90, 90)
Space group: P 21 21 2 (No. 18)
CELL VOLUME :322858.090387
N_EQUIV:4
GUESS OF NCS COPIES: 1
SOLVENT FRACTION ESTIMATE: 0.51
Total residues:294
Total Met:6 resolution estimate: 2.8
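The arithmetic behind this estimate is essentially a Matthews-coefficient calculation. Here is a minimal Python sketch of the idea; the average residue mass of 110 Da and the protein specific volume of 1.23 A**3/Da are common textbook approximations, not necessarily the exact values AutoSol uses internally:

def solvent_fraction(cell_volume, n_residues, n_sym_equiv, ncs_copies=1,
                     dalton_per_residue=110.0):
    # Matthews coefficient Vm in A**3/Da: cell volume per Dalton of protein
    vm = cell_volume / (n_sym_equiv * ncs_copies * n_residues * dalton_per_residue)
    # about 1.23 A**3/Da of the cell is occupied by protein; the rest is solvent
    return 1.0 - 1.23 / vm

# values from the log above: cell volume 322858 A**3, 294 residues,
# 4 symmetry equivalents in P 21 21 2, 1 NCS copy
print(round(solvent_fraction(322858.0, 294, 4), 2))  # prints 0.51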
Running phenix.xtriage
The AutoSol Wizard automatically runs phenix.xtriage on each of your input datafiles to analyze them for twinning, outliers, translational symmetry, and other special conditions that you should be aware of. You can read more about xtriage in Data quality assessment with phenix.xtriage. Part of the summary output from xtriage for this dataset looks like this:
No (pseudo)merohedral twin laws were found.
Patterson analyses
- Largest peak height : 6.680
(corresponding p value : 0.56306)
The largest off-origin peak in the Patterson function is 6.68% of the height of the origin peak. No significant pseudotranslation is detected.
The results of the L-test indicate that the intensity statistics behave as expected. No twinning is suspected.
In this space group (P21 21 2) with the cell dimensions in this structure, there are no ways to create a twinned crystal, so you do not have to worry about twinning. There is also no large off-origin peak in the native Patterson, so there does not appear to be any translational pseudo-symmetry.
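You can also run xtriage yourself on any of the converted data files at any time; the basic usage is just the program name followed by a data file, for example:
phenix.xtriage AutoSol_run_1_/auki_rd_1_PHX.sca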
Testing for anisotropy in the data
After all the SIR datasets are read in, the AutoSol Wizard tests for anisotropy by determining the range of effective anisotropic B values along the principal lattice directions. If this range is large and the ratio of the largest to the smallest value is also large, then by default the data are corrected to make the anisotropy small (see Analyzing and scaling the data in the AutoSol web page for more discussion of the anisotropy correction). In the rh-dehalogenase case, the range of anisotropic B values is small and no correction is made:
Range of aniso B: 13.06 19.68
Not using aniso-corrected data files as the range of aniso b is only 6.62 and 'correct_aniso' is not set
Note that if any one of the datafiles in a MIR dataset has a high anisotropy, then by default all of them will be corrected for anisotropy.
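Since the MIR script controls AutoSol through keyword-value lines in Facts.list, forcing the correction here should just mean adding a line like the following to Facts.list (this is an assumption based on the keyword style shown earlier; the documented command-line form is correct_aniso=True):
correct_aniso True    # force the anisotropy correction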
Scaling MIR data
The AutoSol Wizard uses SOLVE localscaling to scale MIR data. The procedure is basically to scale all the data to the native. During this process, outliers that deviate from the reference values by more than ratio_out (default=3) standard deviations (using all data in the appropriate resolution shell to estimate the SD) are rejected.
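The rejection rule itself is simple. Here is a minimal Python sketch of the idea, using a hypothetical helper and numpy, applied to a single resolution shell; the real SOLVE implementation works shell by shell on the scaled data:

import numpy as np

def keep_reflections(delta_f, ratio_out=3.0):
    # keep reflections whose |delta F| is no more than ratio_out times
    # the rms delta F of all reflections in the shell
    rms = np.sqrt(np.mean(np.square(delta_f)))
    return np.abs(delta_f) <= ratio_out * rms

# one shell of native-minus-derivative differences (illustrative numbers)
shell = np.array([120.0, -80.0, 95.0, 60.0, -110.0, 70.0,
                  -90.0, 100.0, 85.0, -75.0, 1000.0])
print(keep_reflections(shell))  # only the 1000.0 reflection is rejected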
Running HYSS to find the heavy-atom substructure
The HYSS (hybrid substructure search) procedure for heavy-atom searching uses a combination of a
Patterson search for 2-site solutions with direct methods recycling. The search ends when the same solution is found beginning with several different starting points. The HYSS log files are named after the datafile that they are based on and the type of differences (ano, iso) that are being used. In this rh-dehalogenase MIR dataset, the HYSS logfile for the HgKI derivative is hgki_rd_1_PHX.sca_iso_2.sca_hyss.log. The key part of this HYSS log file is:
Entering search loop:
  p = peaklist index in Patterson map
  f = peaklist index in two-site translation function
  cc = correlation coefficient after extrapolation scan
  r = number of dual-space recycling cycles
  cc = final correlation coefficient
p=000 f=000 cc=0.190 r=015 cc=0.250 [ best cc: 0.250 ]
p=000 f=001 cc=0.191 r=015 cc=0.242 [ best cc: 0.250 0.242 ]
Number of matching sites of top 2 structures: 3
p=000 f=002 cc=0.174 r=015 cc=0.200 [ best cc: 0.250 0.242 ]
p=001 f=000 cc=0.167 r=015 cc=0.230 [ best cc: 0.250 0.242 0.230 ]
Number of matching sites of top 2 structures: 3
Number of matching sites of top 3 structures: 2
...
p=011 f=002 cc=0.165 r=015 cc=0.229 [ best cc: 0.293 0.279 0.277 0.276 ]
p=012 f=000 cc=0.184 r=015 cc=0.250 [ best cc: 0.293 0.279 0.277 0.276 ]
p=012 f=001 cc=0.148 r=015 cc=0.292 [ best cc: 0.293 0.292 0.279 0.277 ]
Number of matching sites of top 2 structures: 7
Number of matching sites of top 3 structures: 7
Number of matching sites of top 4 structures: 6
Here a correlation coefficient of 0.5 is very good (0.1 is hopeless, 0.2 is possible, 0.3 is good) and 8 sites were found that matched in the first two tries. The program continues until 4 structures all have 6 matching sites, then ends and prints out the final correlations, after taking the top 5 sites.
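If you want to reproduce or vary this search by hand, HySS can be run stand-alone from the command line. The basic form is the data file, the expected number of sites and the scatterer type (the file name here is illustrative; see Substructure determination with phenix.hyss for the full set of options):
phenix.hyss hgki_rd_1_PHX.sca 5 Hg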
Finding the hand and scoring heavy-atom solutions
Normally either hand of the heavy-atom substructure is a possible solution, and both must be tested by calculating phases, examining the electron density map, and carrying out density modification, as the two hands give the same statistics for all heavy-atom analysis and phasing steps. Note that in chiral space groups (those that have a handedness, such as P61), both hands of the space group must be tested. The AutoSol Wizard will do this for you, inverting the hand of the heavy-atom substructure and the space group at the same time. For example, in space group P61 the hand of the substructure is inverted and then it is placed in space group P65. The AutoSol Wizard scores heavy-atom solutions based on two criteria by default. The first criterion is the skew of the electron density in the map (SKEW). Good values for the skew are anything greater than 0.1. In a MIR structure determination, the heavy-atom solution with the correct hand may have a more positive skew than the one with the inverse hand. The second criterion is the correlation of local RMS density (CORR_RMS). This is a measure of how contiguous the solvent and non-solvent regions are in the map. (If the local rms is low at one point and also low at neighboring points, then the solvent region must be relatively contiguous, and not split up into small regions.) For MIR datasets, SOLVE is used for calculating phases. For a MIR dataset, a figure of merit of 0.5 is acceptable, 0.6 is fine and anything above 0.7 is very good. The scores are listed in the AutoSol log file. Here is the scoring for solution 4 (the best initial map):
AutoSol_run_1_/TEMP0/resolve.scores SKEW 0.2797302
AutoSol_run_1_/TEMP0/resolve.scores CORR_RMS 0.9306123
CC-EST (BAYES-CC) SKEW : 57.8 +/- 17.0
CC-EST (BAYES-CC) CORR_RMS : 63.3 +/- 28.2
ESTIMATED MAP CC x 100: 60.8 +/- 13.3
This is a good solution, with a high (and positive) skew (0.28) and a high correlation of local rms density (0.93). The ESTIMATED MAP CC x 100 is an estimate of the quality of the experimental electron density map (not the density-modified one). A set of real structures was used to calibrate the range of values of each score obtained for phases of varying quality. The resulting probability distributions are used to estimate the correlation between the experimental map and an ideal map for this structure. Then all the estimates are combined to yield an overall Bayesian estimate of the map quality, reported as CC x 100 +/- 2SD.
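As a rough illustration of how independent estimates with stated uncertainties can be pooled, the Python sketch below combines the two CC-EST values above by inverse-variance weighting. This is an illustrative assumption only: AutoSol's actual Bayesian combination is calibrated against real structures, so its reported value (60.8 +/- 13.3) differs slightly from what this simple formula gives:

import math

def combine(estimates):
    # pool independent Gaussian estimates given as (mean, 2*SD) pairs
    # using inverse-variance weighting; returns (mean, 2*SD)
    weights = [1.0 / (two_sd / 2.0) ** 2 for _, two_sd in estimates]
    mean = sum(w * m for w, (m, _) in zip(weights, estimates)) / sum(weights)
    return mean, 2.0 / math.sqrt(sum(weights))

# CC-EST values from the log above: SKEW 57.8 +/- 17.0, CORR_RMS 63.3 +/- 28.2
print(combine([(57.8, 17.0), (63.3, 28.2)]))  # about (59.3, 14.6)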
These estimated map CC values are usually fairly close to the true values, so if the estimate is 60.8 +/- 13.3 then you can be confident that your structure is solved and that the density-modified map will be quite good. In this case the datasets used to find heavy-atom substructures were the isomorphous differences for each derivative. For each dataset one solution was found, and that solution and its inverse were scored. The scores were (skipping extra text below):
SCORING SOLUTION 1: Solution 1 using H