PHENIX Software User Manual

PHENIX Documentation Home

Python-based Hierarchical ENvironment for Integrated Xtallography

PHENIX Documentation - version 1.4

1. Introduction to PHENIX
   a. What is PHENIX
   b. How to install PHENIX
   c. How to set up your environment to use PHENIX
   d. Running PHENIX
   e. The PHENIX Graphical User Interface
   f. The main modules in PHENIX
   g. FAQS: Frequently asked questions
2. The PHENIX Wizards for Automation
   a. Using the PHENIX Wizards
   b. Automated Structure Solution using AutoSol
   c. Automated Molecular Replacement using AutoMR
   d. Automated Model Building and Rebuilding using AutoBuild
   e. Automated Ligand Fitting using LigandFit
3. Tools for analysing and manipulating experimental data in PHENIX
   a. Data quality assessment with phenix.xtriage
   b. Data quality assessment with phenix.reflection_statistics
   c. Structure factor file inspection and conversions
   d. Manipulating reflection data with phenix.xmanip
   e. Exploring the symmetry of your crystal with phenix.explore_metric_symmetry
4. Tools for substructure determination in PHENIX
   a. Substructure determination with phenix.hyss
   b. Comparison of substructure sites with phenix.emma
5. Tools for structure refinement and restraint generation in PHENIX
   a. Structure refinement with phenix.refine
   b. Determining non-crystallographic symmetry (NCS) from a PDB file with phenix.simple_ncs_from_pdb
   c. Finding and analyzing NCS from heavy-atom sites or a model with phenix.find_ncs
   d. Generating ligand coordinates and restraints using eLBOW
   e. Editing ligand restraints from eLBOW using REEL
   f. Adding hydrogens to protein, ligands and water molecules, generating metal coordination files, and introducing neutron exchange sites using ReadySet! (phenix.ready_set)
   g. Generating hydrogen atoms for refinement using phenix.reduce
6. Other tools in PHENIX
   a. Documentation for the Phaser program
   b. Superimposing PDB files with phenix.superpose_pdbs
   c. Density modification with multi-crystal averaging with phenix.multi_crystal_average
   d. Correlation of map and model with get_cc_mtz_pdb
   e. Correlation of two maps with origin shifts with get_cc_mtz_mtz
   f. Rapid secondary structure fitting to a map with find_helices_strands
   g. PDB model: statistics, manipulations, Fcalc and more with phenix.pdbtools
   h. Running SOLVE/RESOLVE in PHENIX
   i. Automated ligand identification in PHENIX
   j. Finding all the ligands in a map with phenix.find_all_ligands
   k. Map one PDB file close to another using SG symmetry with phenix.map_to_object
   l. PyMOL in PHENIX
7. Useful tools outside of PHENIX
   a. Manual model inspection and building with Coot
   b. MolProbity - An Active Validation Tool
8. PHENIX Examples and Tutorials
   a. PHENIX examples
   b. Tutorial 1: Solving a structure using SAD data
   c. Tutorial 2: Solving a structure using MAD data
   d. Tutorial 3: Solving a structure using MIR data
   e. Tutorial 4: Iterative model-building, density modification and refinement starting from experimental phases
   f. Tutorial 5: Solving a structure using Molecular Replacement
   g. Tutorial 6: Automatically rebuilding a structure solved by Molecular Replacement
   h. Tutorial 7: Fitting a flexible ligand into a difference electron density map
   i. Tutorial 8: Structure refinement
   j. Tutorial 9: Refining a structure in the presence of merohedral twinning
   k. Tutorial 10: Generating ligand coordinates and restraints for structure refinement
   l. Tutorial 11: Structure validation using MolProbity
9. Appendix
   a. PHENIX html documentation generation procedures
10. Index

http://phenix-online.org/documentation/ (1 of 2) [12/14/08 1:00:06 PM]

What is PHENIX

Python-based Hierarchical ENvironment for Integrated Xtallography


The PHENIX software suite is a highly automated system for macromolecular structure determination that can rapidly arrive at an initial partial model of a structure without significant human intervention, given moderate resolution and good quality data. This achievement has been made possible by the development of new algorithms for structure determination: maximum-likelihood molecular replacement (PHASER), heavy-atom search (HySS), template- and pattern-based automated model building (RESOLVE, TEXTAL), automated macromolecular refinement (phenix.refine), and iterative model building, density modification and refinement that can operate at moderate resolution (RESOLVE, AutoBuild). These algorithms are based on a highly integrated and comprehensive set of crystallographic libraries that have been built and made available to the community. The algorithms are tightly linked and made easily accessible to users through the PHENIX Wizards and the command line.

PHENIX also provides a number of tools for handling ligands. Automated fitting of ligands into the electron density is facilitated via the LigandFit wizard. Besides fitting a known ligand into a difference map, the LigandFit wizard is capable of identifying ligands on the basis of the difference density alone. For ligands whose chemical description is not available in the supplied monomer library, stereochemical dictionaries for use in restrained macromolecular refinement can be generated with the electronic ligand builder and optimization workbench (eLBOW).

PHENIX builds upon Python, the Boost.Python Library, and C++ to provide an environment for automation and scientific computing. Many of the fundamental crystallographic building blocks, such as data objects and tools for their manipulation, are provided by the Computational Crystallography Toolbox (cctbx). The computational tasks which perform complex crystallographic calculations are then built on top of this. Finally, there are a number of different user interfaces available in PHENIX. In order to facilitate automated operation there is the Project Data Storage (PDS) that is used to store and track the results of calculations.

The PHENIX development team consists of members from Lawrence Berkeley Laboratory (Paul Adams's group), Los Alamos National Laboratory (Tom Terwilliger's group), Cambridge University (Randy Read's group) and Duke University (the Richardsons' group). Researchers from Texas A&M University (Tom Ioerger's and Jim Sacchettini's groups) participated in the first five years of PHENIX development.

The development of PHENIX is funded by the National Institutes of Health (General Medicine) under grant P01GM063210, and the PHENIX Industrial Consortium.

Citing PHENIX

If you use PHENIX to solve a structure please cite this publication:

PHENIX: building new software for automated crystallographic structure determination. P.D. Adams, R.W. Grosse-Kunstleve, L.-W. Hung, T.R. Ioerger, A.J. McCoy, N.W. Moriarty, R.J. Read, J.C. Sacchettini, N.K. Sauter and T.C. Terwilliger. Acta Cryst. D58, 1948-1954 (2002).

Publications

A number of publications describing PHENIX can be found at: http://www.phenix-online.org/papers/

http://phenix-online.org/documentation/what-is-phenix.htm [12/14/08 1:00:11 PM]


Installation


You should obtain the latest distribution of PHENIX, including the binary bundle for your machine architecture. Unpack the tar file:

% tar xvf phenix-installer-<version>-<platform>.tar

Change to the installer directory:

% cd phenix-installer-<version>

To install:

% ./install [will install in /usr/local/phenix-<version> by default, requires root permissions]

% ./install --prefix=<directory> [will make <directory>/phenix-<version> and install there]

Note: <directory> must be an absolute path (i.e. one that starts with a /). A relative path starting with ../ will not work correctly. Note: on Mac OS-X systems the binary installation must be installed in /usr/local.

Installation of the binary version of PHENIX requires no compilation, only the generation of some data files, so you will probably have to wait about 30 minutes for the installation to complete (depending on the performance of your installation platform). PHENIX is supported on most common Linux platforms and Mac OS-X. Currently, the following Redhat Linux platforms are tested, and therefore supported for the distribution:

Redhat 8.0

Redhat 9.0

Redhat Enterprise Workstation 3 [+/- x86_64]

Redhat Enterprise Server 4.2 [+/- x86_64]

Fedora Core 3 [+/- x86_64]

Fedora Core 5 [+/- x86_64]

Fedora Core 6 [+/- x86_64]

Redhat versions prior to 8.0 are not supported. PHENIX should install on other Linux platforms such as Mandrake, or SuSe. There are 4 different Linux installations available, based on the version of the kernel (2.4 or 2.6) and the CPU type (ix86 or x86_64). If it isn't clear which you need then type this command on the machine in question:

% uname -rm

The first item is the kernel version, the second is the machine hardware. Please download the appropriate installer based on this table:

Hardware   Kernel 2.4               Kernel 2.6
ix86       intel-linux-2.4          intel-linux-2.6
x86_64     intel-linux-2.4-x86_64   intel-linux-2.6-x86_64
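The table can be expressed as a small shell helper that takes the two fields printed by `uname -rm`. This is our own illustrative sketch (the function name pick_installer is hypothetical), not a tool shipped with PHENIX:

```shell
# Hypothetical helper (not shipped with PHENIX): map the two fields of
# `uname -rm` (kernel release, machine hardware) to the installer name
# from the table above.
pick_installer() {
  kernel="$1"; hardware="$2"
  case "$kernel" in
    2.4*) base="intel-linux-2.4" ;;
    2.6*) base="intel-linux-2.6" ;;
    *)    echo "unsupported kernel: $kernel" >&2; return 1 ;;
  esac
  case "$hardware" in
    x86_64) echo "${base}-x86_64" ;;
    i?86)   echo "$base" ;;
    *)      echo "unsupported hardware: $hardware" >&2; return 1 ;;
  esac
}

pick_installer 2.6.9-89.EL x86_64   # prints intel-linux-2.6-x86_64
pick_installer 2.4.21-4.EL i686     # prints intel-linux-2.4
```

In practice you would call it as `pick_installer $(uname -rm)`.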

Currently, the following other platforms are supported for the distribution:

http://phenix-online.org/documentation/install.htm (1 of 2) [12/14/08 1:00:12 PM]


Mac OS-X (Intel and PPC) 10.4.10 or later (Tiger or Leopard)

For license information please see the LICENSE file. For the source of the components see SOURCES.

Space requirements

For the complete PHENIX installation you will need approximately 1.5 GB of disk space.


The PHENIX environment


Setting up your environment

Once you have successfully installed PHENIX, set up your environment by sourcing the phenix_env file in the phenix installation directory (for example):

% source /usr/local/phenix-<version>/phenix_env [csh/tcsh users] or

% . /usr/local/phenix-<version>/phenix_env.sh [sh/bash users]

To run jobs remotely, you need to source the phenix_env in your .cshrc (or equivalent) file. The following environment variables should now be defined (here with example values):

PHENIX=/usr/local/phenix

PHENIX_INSTALLER_DATE=080920070957

PHENIX_VERSION=1.3

PHENIX_RELEASE_TAG=final

PHENIX_ENVIRONMENT=1

PHENIX_MTYPE=intel-linux-2.6-x86_64

PHENIX_MVERSION=linux

PHENIX_USE_MTYPE=intel-linux-2.6-x86_64

It is not necessary (or useful) to define environmental variables for SOLVE/RESOLVE for PHENIX. If you have them set in your environment they are ignored by PHENIX.
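A job script can verify that the PHENIX environment has actually been sourced before it tries to run any phenix.* command. The guard below is our own suggestion (not part of the distribution); it only inspects the environment variables listed above:

```shell
# Illustrative guard (our suggestion, not part of PHENIX): stop a job
# script early if phenix_env has not been sourced in this shell.
require_phenix() {
  if [ "${PHENIX_ENVIRONMENT:-}" != "1" ] || [ -z "${PHENIX:-}" ]; then
    echo "PHENIX environment not set; source phenix_env first" >&2
    return 1
  fi
  echo "PHENIX $PHENIX_VERSION found in $PHENIX"
}

# With the example values shown above:
PHENIX=/usr/local/phenix PHENIX_VERSION=1.3 PHENIX_ENVIRONMENT=1
require_phenix   # prints: PHENIX 1.3 found in /usr/local/phenix
```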

Documentation

You can find documentation in the PHENIX GUI (under the Help menu). Alternatively, you can use a web browser to view the documentation supplied with PHENIX, by typing:

% phenix.doc

If this doesn't work because of browser installation issues then you can point a web browser to the correct location in your PHENIX installation (for example):

% firefox /usr/local/phenix-<version>/doc/index.html

or:

% mozilla $PHENIX/doc/index.html

http://phenix-online.org/documentation/phenix-environment.htm (1 of 2) [12/14/08 1:00:12 PM]


Help

You can join the PHENIX bulletin board and/or view the archives: http://www.phenix-online.org/mailman/listinfo/phenixbb

Alternatively you can send email to:

[email protected] (if you think you've found a bug)
[email protected] (if you'd like to ask us questions)


Running PHENIX


Different user interfaces are required depending on the needs of a diverse user community. There are currently three different user interfaces, each described below.

Command Line Interface

For a number of applications a command-line interface is most effective. This is particularly the case when rapid results are required, such as data quality assessment and twinning analysis, or substructure solution at the synchrotron beam line. Tools that facilitate ease of use at the early stages of structure solution, such as data analysis (phenix.xtriage), substructure solution (phenix.hyss) and reflection file manipulations such as the generation of a test set, reindexing and merging of data (phenix.reflection_file_converter), are available via simple command line interfaces. Another major application that is controlled via the command line interface is phenix.refine. To illustrate the command line interface, the command used to run the program that carries out a data quality and twinning analysis is:

% phenix.xtriage my_data.sca [options]

Further options can be given on the command line, or can be specified via a parameter file:

% phenix.xtriage my_parameters.def

A similar interface is used for macromolecular refinement:

% phenix.refine my_model.pdb my_data.mtz

Although SCALEPACK and MTZ formats are indicated in the above example, reflection file formats such as D*TREK, CNS/XPLOR or SHELX can be used, as the format is detected automatically.
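PHENIX detects the format from the file contents; purely as an illustration of the dispatch idea (our own toy sketch, not PHENIX's actual detection logic, which does not rely on extensions), a naive version keyed on file extensions might look like:

```shell
# Our sketch only: a toy dispatcher keyed on file extensions, to
# illustrate format dispatch. PHENIX itself inspects file contents.
guess_format() {
  case "$1" in
    *.sca) echo "SCALEPACK" ;;
    *.mtz) echo "MTZ" ;;
    *.hkl) echo "CNS/XPLOR or SHELX" ;;
    *)     echo "unknown" ;;
  esac
}

guess_format my_data.sca   # prints SCALEPACK
guess_format my_data.mtz   # prints MTZ
```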

Help for all command line applications can be obtained by use of the --help flag:

% phenix.refine --help

There are also many other command line tools (described in detail elsewhere in this documentation). If you use a shell with command completion, you can type the first part of a command, hit the command list key (<ctrl>-D in tcsh) and see a list of the available commands. For example, this is the start of the list of commands that begin with phenix.auto:

phenix.autobuild      phenix.automr      phenix.autosol
phenix.autobuild_1.3  phenix.automr_1.3  phenix.autosol_1.3

http://phenix-online.org/documentation/running-phenix.htm (1 of 2) [12/14/08 1:00:14 PM]


Note: all commands have their regular name and name qualified with the version. You can always use the version-qualified name to ensure which version of a command you are using (in case you have multiple versions of PHENIX or related applications installed).

The PHENIX GUI

To run the PHENIX Graphical Interface:

% phenix &

Please see the other documentation files to get more details about the PHENIX GUI.

Tasks and Strategies

The PHENIX strategy interface (in the GUI) provides a way to construct complex networks of tasks to perform a higher-level function. For example, the steps required to go from initial data to a first electron density map in a SAD experiment can be broken down into well-defined tasks (available from the task window in the GUI) which can be reused in other procedures. Instead of requiring the user to run these tasks in the correct order they are connected together by the software developer, and can thus be run in an automated way. However, because the connection between tasks is dynamic they can be reconfigured or modified, and new tasks introduced as necessary if problems occur. This provides the flexibility of user input and control, while still permitting complete automation when decision-making algorithms are incorporated into the environment. The tasks and their connection into strategies rely on the use of plain text task files written using the Python scripting language. This enables the computational algorithms to be used easily in a non-graphical environment. The PHENIX GUI permits strategies to be visualized and manipulated. These manipulations include loading a strategy distributed with PHENIX, and customizing and saving it for future recall. Current tasks and strategies available include:

Density modification: carries out a single run of RESOLVE
Substructure solution: runs phenix.hyss
Molecular replacement: computes rotation and translation functions with PHASER
Model building: using TEXTAL or RESOLVE
Ligand identification: using RESOLVE

Wizards

The decision-making in strategies is local, with decisions being made at the end of each task to determine the next path in the network. Crystallographers typically make decisions in a very similar way during structure solution; a program is run, the outputs manually inspected and a decision made about the next step in the process. By contrast, a wizard provides a user interface that can make more global decisions, by considering all of the available information at each step in the process. Wizards can be run from both the command line and the PHENIX GUI. Details on wizards can be found in:

Using the PHENIX Wizards

Automated Structure Solution using AutoSol

Automated Molecular Replacement using AutoMR

Automated Model Building and Rebuilding using AutoBuild

Automated Ligand Fitting using LigandFit


PHENIX Graphical User Interface


Author

Nigel W. Moriarty

Purpose

To provide a simple and easy graphical interface to the features of the PHENIX package. In particular, the concept of a wizard that guides the user through the complex process of solving a protein structure is a powerful tool.

Screen Shots

(Screenshots of the PHENIX GUI appear at this point in the online documentation.)

http://phenix-online.org/documentation/phenix_gui.htm (1 of 3) [12/14/08 1:00:18 PM]

Wizards

Wizards can be loaded by double clicking on the Wizard menu to the left of the GUI. The wizard loads and provides an interface to request information from the user. Details on wizards can be found in:

Using the PHENIX Wizards

Automated Structure Solution using AutoSol

Automated Molecular Replacement using AutoMR


Automated Model Building and Rebuilding using AutoBuild

Automated Ligand Fitting using LigandFit

Strategies

The main window of the PHENIX GUI is the strategy canvas, which allows the user to construct a strategy from the tasks in the menu in the left window. Choosing a task from the menu attaches that task to the mouse. The mouse pointer changes to a hand icon while it is in the canvas window with a task attached. Clicking inside the canvas places the task on the closest grid point. Help on tasks can be obtained by right-clicking on the task menu item to reveal a pop-up menu.

A similar situation exists for the strategy menu. Choosing a strategy loads it into a new strategy canvas. Strategies are loaded with the default task inputs, providing a "clean slate" strategy for user customization. Right-clicking reveals a pop-up menu of strategy loading options, including overwriting the current strategy or adding to it.

Tasks can be moved in one of two ways. The first involves using the left mouse button to click-drag-drop the task; this must be done in the title panel of the task. The second option is useful in situations where there is a delay between the mouse action and the GUI update, as happens when using a remote machine to run the GUI. A task can be moved by right-clicking on its title panel and then right-clicking where the task should be relocated. The mouse cursor changes to indicate the attachment of the task.

Each task has up to five buttons along the top of the title panel. The rightmost button deletes the task from the canvas. The leftmost button is a toggle that marks the task at which the calculation commences. The remaining three buttons are present if the appropriate function is available for that task. The upper-most task in the figure above displays all five buttons. The second button is the task parameter button; it displays a dialog that allows the user to edit the input and output of that task. The number and type of information is dependent on the task.

The third button is the display button. This launches the appropriate windows for displaying the results of a task. For example, the 'import pdb' task will display the PDB header information in a text control, or display the molecule in the molecular graphics program PyMOL.

The fourth button is the help button. Help for the task is displayed via this option.

Two tasks can be linked by moving the lower panels of one task over the title panel of another. There can be any number of connection panels associated with a task. Logical operations can be provided to choose the appropriate linkage to follow.

The connection between tasks is not data flow but time flow. Each task obtains its data from the PHENIX Data Storage Server and sends its output data to the PDS server. Subsequent tasks can get data sent to the PDS server by previous tasks.

The colors of the tasks indicate the activity of the strategy. A purple task has finished running; green indicates the currently running task; blue indicates a task that hasn't been run. Red indicates a task that failed during calculation, and yellow is used when a strategy run is stopped by the user.

The task menu on the left side of the strategy canvas is divided into sub-menus. The ``development`` sub-menu contains experimental tasks. The ``examples`` sub-menu has some demonstrative tasks.

The remaining sub-menus are self-explanatory.

The overview window in the bottom left corner allows navigation of the canvas when large strategies are used.


Main PHENIX Modules


Data Analysis

Detection of twinning and other pathologies is facilitated via the program phenix.xtriage. This command line driven program analyses an experimental data set and provides diagnostics that aid in the detection of common idiosyncrasies such as the presence of pseudo translational symmetry, certain data processing problems and twinning. Other sanity checks, such as a Wilson plot sanity check and an algorithm that tries to detect the presence of ice rings from the merged data, are performed as well. If twin laws are present for the given unit cell and space group, a Britton plot is computed, an H-test is performed and a likelihood-based method is used to provide an estimate of the twin fraction. Twin laws are deduced from first principles for each data set, avoiding the danger of overlooking twin laws through incomplete lookup tables. If a model is available, more efficient twin detection tools are available. The RvsR statistic is particularly useful in the detection of twinning in combination with pseudo rotational symmetry. This statistic is computed by phenix.xtriage if calculated data is supplied together with the observed data. A more direct test for the presence of twinning is refinement of the twin fraction given an atomic model (which can be performed in phenix.refine). The command line utility phenix.twin_map_utils provides a quick way to refine a twin fraction given an atomic model and an X-ray data set.

Automated Structure Solution Using Experimental Phasing Techniques

Structure solution via SAD, MAD or SIR(AS) can be carried out with the AutoSol wizard. The AutoSol wizard performs heavy atom location, phasing, density modification and initial model building in an automated manner. The heavy atoms are located with the substructure solution engine also used in phenix.hyss, a dual space method similar to SHELXD and Shake-and-Bake. Phasing is carried out with PHASER for SAD cases and with SOLVE for MAD and SIR(AS) cases. Subsequent density modification is carried out with RESOLVE. The hand of the substructure is determined automatically on the basis of the quality of the resulting electron density map. It is noteworthy that the whole process is not necessarily linear: the wizard can decide to step back and (for instance) try another set of heavy atoms if appropriate. In the resulting electron density map, a model is built (currently limited to proteins). Further model completion can be carried out via the AutoBuild wizard. The AutoBuild wizard iterates model building and density modification with refinement of the model in a scheme similar to other iterative model building methods, for example ARP/wARP.

Automated Structure Solution Via Molecular Replacement

Structure solution via molecular replacement is facilitated via the AutoMR wizard. The AutoMR wizard guides the user through setting up all necessary parameters to run a molecular replacement job with PHASER. The molecular replacement carried out by PHASER uses a likelihood-based scoring function, improving the sensitivity of the procedure and the ability to obtain reasonable solutions with search models that have a relatively low sequence similarity to the crystal structure being determined. Besides the use of likelihood-based scoring functions, structure solution is enhanced by detailed bookkeeping of all search possibilities when searching for more than a single copy in the asymmetric unit or when the choice of space group is ambiguous. When a suitable molecular replacement solution is found, the AutoBuild wizard is invoked and rebuilds the molecular replacement model given the sequence of the model under investigation.

http://phenix-online.org/documentation/phenix-modules.htm (1 of 4) [12/14/08 1:00:20 PM]

Automated Model Building

Automated model building given a starting model or a set of reasonable phases can be carried out by the AutoBuild wizard. A typical AutoBuild job combines density modification, model building, macromolecular refinement and solvent model updates ('water picking') in an iterative manner. Various modes of building a model are available. Depending on the availability of a molecular model, model building can be carried out by locally rebuilding an existing model (rebuild in place) or by building into the density without any information from an available model. Rebuilding in place is a powerful building scheme that is used by default for molecular replacement models that have a high sequence similarity to the sequence of the structure that is to be built. A fundamental feature of the AutoBuild wizard is that it builds various models, all from slightly different starting points. The dependency of the outcome of the model building algorithm on initial starting conditions provides a straightforward mechanism to obtain a variety of plausible molecular models. It is not uncommon that certain sections of a map are built in one model, but not in another. Combining these models allows the AutoBuild wizard to converge faster to a more complete model than a single model building pass for a given set of phases would. Dedicated loop fitting algorithms are used to close gaps between chain segments. This feature, together with the water picking and side chain placement, typically results in highly complete models of high quality that need minimal manual intervention before they are ready for deposition.

Structure Refinement

The refinement engine used in the AutoBuild and AutoSol wizards can also be run from the command line with the phenix.refine command. The phenix.refine program carries out likelihood-based refinement and can refine positional parameters, individual or grouped atomic displacement parameters, and individual or grouped occupancies. The refinement of anisotropic displacement parameters (individual or via a TLS parameterization) is also available. Positional parameters can be optimized using either traditional gradient-only based optimization methods, or via simulated annealing protocols. The command line interface allows the user to specify which part of the model should be refined in what manner. It is in principle possible to refine half of the molecule as a rigid group with grouped B values, while the other half of the molecule has a TLS parameterization. The flexibility of specifying the level of parameterization of the model is especially important for the refinement of low resolution data or when starting with severely incomplete atomic models. Another advantage of this flexibility in refinement strategy is that a user can perform a complex refinement protocol that carries out simulated annealing, isotropic B refinement and water picking in 'one go'. Another main feature of phenix.refine is the way in which the relative weights for the geometric and ADP restraints with respect to the X-ray target are determined. Considerable effort has been put into devising a good set of defaults and weight determination schemes that result in a good choice of parameters for the data set under investigation. Defaults can of course be overridden if the user chooses to. Besides being able to handle refinement against X-ray data, phenix.refine can refine against neutron data or against X-ray and neutron data simultaneously.
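Non-default choices such as these are typically collected in a parameter file, as in the my_parameters.def example in the Running PHENIX section. The fragment below is an illustrative sketch only: the parameter shown (refinement.main.number_of_macro_cycles) is our assumption about the parameter syntax and may differ between PHENIX versions, so consult phenix.refine --help on your installation:

```shell
# Illustrative only: collect phenix.refine settings in a parameter file.
# The parameter name used here is an example of the nested-scope syntax;
# check `phenix.refine --help` for the options in your installed version.
cat > my_parameters.def <<'EOF'
refinement {
  main {
    number_of_macro_cycles = 5
  }
}
EOF

# The refinement would then be started with (not run here):
#   % phenix.refine my_model.pdb my_data.mtz my_parameters.def
grep -c "number_of_macro_cycles" my_parameters.def   # prints 1
```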

Automated ligand density analysis

Automated fitting of ligands into the electron density is facilitated via the LigandFit wizard. The ligand building is performed by finding an initial fit for the largest rigid domain of the ligand and extending the remaining part of the ligand from this initial 'seed'. Besides being able to fit a known ligand into a difference map, the LigandFit wizard is capable of identifying ligands on the basis of the difference density only. In the latter scheme, density characteristics for ligands occurring frequently in the PDB are used to provide the user with a range of plausible ligands.

Calculating ligand geometries and defining chemical restraints

Stereo chemical dictionaries of ligands whose chemical description is not available in the supplied monomer library for the use in restrained macromolecular refinement can be generated with the electronic ligand builder and optimization workbench (eLBOW). eLBOW generates a 3D geometry from a number of chemical input formats including MOL2 or PDB files and SMILES strings. SMILES is a compact, chemically dense description of a molecule that contains all element and bonding information and optionally other stereo information such as chirality. To generate a 3D geometry from an input format that contains no 3D geometry information,

eLBOW uses a Z-Matrix formalism in conjunction

with a table of bond lengths calculated using the Hartree-Fock method with a 6-31G(d,p) basis set to obtain a Cartesian coordinate set. The geometry is then optionally optimized using the semi-empirical quantum chemistry method AM1. The AM1 optimization provides chemically meaningful and accurate

geometries for the class of molecule typically complexed with proteins. eLBOW outputs the optimized

geometry and a standard CIF restraint file that can be read in by phenix.refine

and can also be used

for real space refinement during manual model building sessions in the program COOT. An interface is

also available to use eLBOW within COOT.
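To see why a SMILES string is such a dense description, consider that every character carries element or bond information. The toy extractor below is a stand-alone illustration of this point only; it is not part of eLBOW, handles only the organic subset of element symbols, and ignores aromatic (lowercase) atoms.

```python
import re

# Toy illustration of SMILES density: "CCO" (ethanol) encodes two
# carbons and an oxygen, with their bonds, in three characters.
def heavy_atoms(smiles):
    # Match two-letter organic-subset symbols before one-letter ones.
    return re.findall(r"Cl|Br|[BCNOPSFI]", smiles)

print(heavy_atoms("CCO"))      # ethanol: ['C', 'C', 'O']
print(heavy_atoms("CC(=O)O"))  # acetic acid: ['C', 'C', 'O', 'O']
```

A real SMILES interpreter such as the one inside eLBOW also recovers the bond orders, branches, rings, and chirality encoded in the remaining punctuation.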

Literature

1. Adams PD, Grosse-Kunstleve RW, Brunger AT: Computational aspects of high-throughput crystallographic macromolecular structure determination. Methods Biochem Anal 2003, 44:75-87.

2. Terwilliger TC, Berendzen J: Automated MAD and MIR structure solution. Acta Crystallogr D Biol Crystallogr 1999, 55(Pt 4):849-861.

3. Schneider TR, Sheldrick GM: Substructure solution with SHELXD. Acta Crystallogr D Biol Crystallogr 2002, 58(Pt 10 Pt 2):1772-1779.

4. McCoy AJ, Grosse-Kunstleve RW, Storoni LC, Read RJ: Likelihood-enhanced fast translation functions. Acta Crystallogr D Biol Crystallogr 2005, 61(4):458-464.

5. Terwilliger TC: Automated main-chain model building by template matching and iterative fragment extension. Acta Crystallogr D Biol Crystallogr 2003, 59(Pt 1):38-44.

6. Terwilliger TC: Automated side-chain model building and sequence assignment by template matching. Acta Crystallogr D Biol Crystallogr 2003, 59(Pt 1):45-49.

7. Emsley P, Cowtan K: Coot: model-building tools for molecular graphics. Acta Crystallogr D Biol Crystallogr 2004, 60(Pt 12 Pt 1):2126-2132.

8. Adams PD, Grosse-Kunstleve RW, Hung L-W, Ioerger TR, McCoy AJ, Moriarty NW, Read RJ, Sacchettini JC, Sauter NK, Terwilliger TC: PHENIX: building new software for automated crystallographic structure determination. Acta Crystallogr D Biol Crystallogr 2002, 58(11):1948-1954.

9. Grosse-Kunstleve RW, Sauter NK, Moriarty NW, Adams PD: The Computational Crystallography Toolbox: crystallographic algorithms in a reusable software framework. J Appl Crystallogr 2002, 35:126-136.

10. Grosse-Kunstleve RW, Adams PD: Substructure search procedures for macromolecular structures. Acta Crystallogr D Biol Crystallogr 2003, 59(Pt 11):1966-1973.

11. Weeks CM, Miller R: Optimizing Shake-and-Bake for proteins. Acta Crystallogr D Biol Crystallogr 1999, 55(Pt 2):492-500.

12. Read RJ: Pushing the boundaries of molecular replacement with maximum likelihood. Acta Crystallogr D Biol Crystallogr 2001, 57(10):1373-1382.

13. Schomaker V, Trueblood K: On the rigid-body motion of molecules in crystals. Acta Crystallogr B 1968, 24:63-76.

14. Winn MD, Isupov MN, Murshudov GN: Use of TLS parameters to model anisotropic displacements in macromolecular refinement. Acta Crystallogr D Biol Crystallogr 2001, 57(Pt 1):122-133.

15. Brunger AT, Adams PD, Rice LM: Annealing in crystallography: a powerful optimization tool. Prog Biophys Mol Biol 1999, 72(2):135-155.

16. Vagin AA, Steiner RA, Lebedev AA, Potterton L, McNicholas S, Long F, Murshudov GN: REFMAC5 dictionary: organization of prior chemical knowledge and guidelines for its use. Acta Crystallogr D Biol Crystallogr 2004, 60(Pt 12 Pt 1):2184-2195.

17. Weininger D: SMILES 1. Introduction and Encoding Rules. J Chem Inf Comput Sci 1988, 28:31.

18. Fisher RG, Sweet RM: Treatment of diffraction data from crystals twinned by merohedry. Acta Crystallogr A 1980, 36(5):755-760.

19. Yeates TO: Simple statistics for intensity data from twinned specimens. Acta Crystallogr A 1988, 44(Pt 2):142-144.

20. Yeates TO: Detecting and overcoming crystal twinning. Methods Enzymol 1997, 276:344-358.

21. Lebedev AA, Vagin AA, Murshudov GN: Intensity statistics in twinned crystals with examples from the PDB. Acta Crystallogr D Biol Crystallogr 2006, 62(Pt 1):83-95.

22. Zwart P: Anomalous signal indicators in protein crystallography. Acta Crystallogr D Biol Crystallogr 2005, 61(11):1437-1448.

Additional information

http://phenix-online.org/documentation/phenix-modules.htm (4 of 4) [12/14/08 1:00:20 PM]


Python-based Hierarchical ENvironment for Integrated Xtallography

Documentation Home

PHENIX FAQS

How should I cite PHENIX?

Where can I find sample data?

Can I easily run a Wizard with some sample data?

What sample data are available to run automatically?

Are any of the sample datasets annotated?

Why does the AutoBuild Wizard say it is doing 2 rebuild cycles but I specified one?

What is the difference between overall_best.pdb and cycle_best_1.pdb in the AutoBuild Wizard?

Can PHENIX do MRSAD?

How can I tell the AutoSol Wizard which columns to use from my mtz file?

How do I know what my choices of labels are for my data file?

What can I do if a Wizard says this version does not seem big enough?

Why does the AutoBuild Wizard say Sorry, you need to define FP in labin but AutoMR was able to read my data file just fine?

Why does the AutoBuild Wizard just stop after a few seconds?

What do I do if the PHENIX GUI hangs?

Why does the GUI Parameters window say Invalid input parameters...do you want to continue?

Why is my TEMP0 directory empty after running a Wizard?

How do I stop a Wizard?

What is an R-free flags mismatch?

Can I use the AutoBuild wizard at low resolution?

Why doesn't COOT recognize my MTZ file from AutoBuild?

How should I cite PHENIX?

If you use PHENIX please cite: Adams, P.D., Grosse-Kunstleve, R.W., Hung, L.-W., Ioerger, T.R.,

McCoy, A.J., Moriarty, N.W., Read, R.J., Sacchettini, J.C., Sauter, N.K., Terwilliger, T.C. (2002).

PHENIX: building new software for automated crystallographic structure determination. Acta Cryst.

D58, 1948-1954.

Where can I find sample data?

You can find sample data in the directories located in: $PHENIX/examples. Additionally there is sample MR data in $PHENIX/phaser/tutorial.

Can I easily run a Wizard with some sample data?

You can run sample data with a Wizard with a simple command. To run p9-sad sample data with the

AutoSol wizard, you type: phenix.run_example p9-sad

This command copies the $PHENIX/examples/p9-sad directory to your working directory and executes the commands in the file run.csh.

http://phenix-online.org/documentation/faqs.htm (1 of 6) [12/14/08 1:00:22 PM]


What sample data are available to run automatically?

You can see which sample data are set up to run automatically by typing: phenix.run_example --help

This command lists all the directories in $PHENIX/examples/ that have a command file run.csh ready to use. For example: phenix.run_example --help

PHENIX run_example script. Fri Jul 6 12:07:08 MDT 2007

Use: phenix.run_example example_name [--all] [--overwrite]

Data will be copied from PHENIX examples into subdirectories of this working directory

If --all is set then all examples will be run (takes a long time!)

If --overwrite is set then the script will overwrite subdirectories

List of available examples: 1J4R-ligand a2u-globulin-mr gene-5-mad p9-build p9-sad

Are any of the sample datasets annotated?

The PHENIX tutorials listed on the main PHENIX web page will walk you through sample datasets, telling you what to look for in the output files. For example, the

Tutorial 1: Solving a structure using

SAD data tutorial uses the p9-sad dataset as an example. It tells you how to run this example data in

AutoSol and how to interpret the results.

Why does the AutoBuild Wizard say it is doing 2 rebuild cycles but I specified one?

The AutoBuild wizard adds a cycle just before the rebuild cycles in which nothing happens except refinement and grouping of models from any previous build cycles.

What is the difference between overall_best.pdb and cycle_best_1.pdb in the AutoBuild Wizard?

The AutoBuild Wizard saves the best model (and map coefficient file, etc) for each build cycle nn as cycle_best_nn.pdb. Also the Wizard copies the current overall best model to overall_best.pdb. In this way you can always pull the overall_best.pdb file and you will have the current best model. If you wait until the end of the run you will get a summary that lists the files corresponding to the best model.

These will have the same contents as the overall_best files.

Can PHENIX do MRSAD?

Yes, PHENIX can run MRSAD (molecular replacement combined with SAD phases) by determining the anomalous scatterer substructure from a model-phased anomalous difference Fourier. There are two simple ways to do this; both are described in the

AutoSol

documentation.

How can I tell the AutoSol Wizard which columns to use from my mtz file?

The AutoSol Wizard will normally try to guess the appropriate columns of data from an input data file.

If there are several choices, then you can tell the Wizard which one to use with the script command group_labels_list or the command-line keywords labels, peak.labels, infl.labels, etc. For example if you

have two input datafiles w1 and w2 for a two-wavelength MAD dataset and you want to select the w1(+) and w1(-) data from the first file and w2(+) and w2(-) from the second, you could put in a script file the following lines (see "How do I know what my choices of labels are for my data file" to know what to put in these lines): input_file_list w1.mtz w2.mtz

group_labels_list 'w1(+) SIGw1(+) w1(-) SIGw1(-)' 'w2(+) SIGw2(+) w2(-) SIGw2(-)'

Note that all the labels for one set of anomalous data from one file are grouped together in each set of quotes. You could accomplish the same thing from the command line by specifying something like: peak.data=w1.mtz infl.data=w2.mtz \ peak.labels='w1(+) SIGw1(+) w1(-) SIGw1(-)' \ infl.labels='w2(+) SIGw2(+) w2(-) SIGw2(-)'

How do I know what my choices of labels are for my data file?

You can find out what your choices of labels are by running the command: phenix.autosol show_labels=w1.mtz

This will provide a listing of the labels in w1.mtz and suggestions for their use in the PHENIX Wizards.

For example the labels for w1.mtz yields:

List of all anomalous datasets in w1.mtz

'w1(+) SIGw1(+) w1(-) SIGw1(-)'

List of all datasets in w1.mtz

'w1(+) SIGw1(+) w1(-) SIGw1(-)'

List of all individual labels in w1.mtz

'w1(+)'

'SIGw1(+)'

'w1(-)'

'SIGw1(-)'

Suggested uses: labels='w1(+) SIGw1(+) w1(-) SIGw1(-)' input_labels='w1(+) SIGw1(+) None None None None None None None' input_refinement_labels='w1(+) SIGw1(+) None' input_map_labels='w1(+) None None'

What can I do if a Wizard says this version does not seem big enough?

The Wizards try to automatically determine the required size of solve or resolve, but if your data extend to very high resolution or you have a very large unit cell, you can get the message:

***************************************************

Sorry, this version does not seem big enough...

(Current value of isizeit is 30)

Unfortunately your computer will only accept a size of 30

with your current settings.

You might try cutting back the resolution

You might try "coarse_grid" to reduce memory

You might try "unlimit" to allow full use of memory

***************************************************

You cannot get rid of this problem by specifying the resolution with resolution=4.0

because the Wizards use the resolution cutoff you specify in all calculations, but the high-resolution data are still carried along. The easiest solution is to edit your data file so that it contains only lower-resolution data. You can do it like this: phenix.reflection_file_converter huge.sca --sca=big.sca --resolution=4.0

A second solution is to tell the Wizard to ignore the high-res data explicitly with: resolution=4.0 \ resolve_command="'resolution 200 4.0'" \ solve_command="'resolution 200 4.0'" \ resolve_pattern_command="'resolution 200 4.0'"

Note the two sets of quotes; both are required for this command-line input. These commands are applied after all other inputs in resolve/solve/resolve_pattern and therefore all data outside these limits will be ignored.
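The need for two sets of quotes can be seen by following the value through two parsing passes: the shell strips the outer quotes, and the Wizard's own parsing strips the inner ones, so the multi-word value survives both. The sketch below mimics this with Python's shlex; it only illustrates the idea and is not the actual PHENIX parser.

```python
import shlex

# Pass 1: the shell strips the outer double quotes.
stage1 = shlex.split("resolve_command=\"'resolution 200 4.0'\"")
print(stage1)   # ["resolve_command='resolution 200 4.0'"]

# Pass 2: the Wizard strips the inner single quotes from the value.
key, value = stage1[0].split("=", 1)
stage2 = shlex.split(value)
print(stage2)   # ['resolution 200 4.0']
```

With only one set of quotes, the second pass would see three separate tokens instead of one "resolution 200 4.0" command.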

Why does the AutoBuild Wizard say Sorry, you need to define FP in labin but AutoMR was able to read my data file just fine?

When you run AutoMR and let it continue on to the AutoBuild Wizard automatically, the AutoBuild Wizard guesses the input file contents separately from AutoMR. Usually it guesses correctly, but if it cannot, then you can tell it what the labels for FP, SIGFP, and FreeR_flag are like this: autobuild_input_labels="myFP mySIGFP myFreeR_flag" where you can say None for anything that you do not want to define. This has an effect identical to specifying input_labels directly when you run AutoBuild.

Why does the AutoBuild Wizard just stop after a few seconds?

When you run AutoBuild from the command line it writes the output to a file and says something like:

Sending output to AutoBuild_run_3_/AutoBuild_run_3_1.log

Usually if something goes wrong with the inputs then it will give you an error message right on the screen. However a few types of errors are only written to the log file, so if AutoBuild just stops after a few seconds, have a look at this log file and it should have an error message at the end of the file.

What do I do if the PHENIX GUI hangs?

If the GUI hangs (windows do not respond or windows display partially) then you may want to try to kill it by clicking on the upper right corner, or by right-clicking on the top bar of the GUI and closing it. If those fail, you can type control-C in the window where you started up the GUI. In either case, you can restart the GUI by typing phenix again. You may find it necessary to start phenix up, then close it down nicely with Project/Exit, and restart it (this gets rid of some files that are deleted when the GUI closes normally). You may also occasionally find it necessary to kill any jobs that are still running, by running top, noticing whether there are python or resolve or solve jobs running that were part of your PHENIX job, and then using k to kill those jobs while running top.

Why does the GUI Parameters window say Invalid input parameters...do you want to continue?

This happens if something in the window isn't correct. If no colored entry fields come up, have a look at the bottom where it says NAVIGATE SET VARIABLE AUTO MANUAL. The entry forms under these words should read: "Choose method to run" "Choose variable to set" and "Manual" (or "Automatic") unless you have intentionally set them. If that isn't it, look carefully at all the entries in the entire parameters window and make sure that they are of the type that is expected (file name, number, etc).

If that doesn't work, just click YES and carry on.

Why is my TEMP0 directory empty after running a Wizard?

By default all the working files in the TEMP subdirectories are deleted at the end of a Wizard run. If you want to keep these files, then you can specify clean_up=False.

How do I stop a Wizard?

You can stop a Wizard in two ways. For a "soft" stop, press the "Pause" button if you are running from the GUI, or create a file with the name STOPWIZARD in the directory where the Wizard is running (i.e., create AutoBuild_run_4_/STOPWIZARD to stop run 4 of the AutoBuild Wizard). For a hard stop from the GUI, you can select "Strategy" on the top line of the GUI and then select "Stop Strategy" at the bottom of the choices. That kills the Wizard and all associated jobs. You can still go on from there; select the Parameters window (the lines at the upper left of the now-yellow GUI window) and choose what to do next.
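For the STOPWIZARD route, the file just has to exist; its contents are ignored. A minimal sketch (the run directory name is only an example, and in real use the Wizard will already have created it):

```python
import os

# Ask run 4 of the AutoBuild Wizard to stop cleanly at its next step.
run_dir = "AutoBuild_run_4_"
os.makedirs(run_dir, exist_ok=True)  # only so this sketch is self-contained
open(os.path.join(run_dir, "STOPWIZARD"), "w").close()  # empty file suffices
```

From a shell, touch AutoBuild_run_4_/STOPWIZARD does the same thing.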

What is an R-free flags mismatch?

When you run AutoBuild or phenix.refine you may get this error message or a similar one:

************************************************************

Failed to carry out AutoBuild_build_cycle:

Please resolve the R-free flags mismatch.

************************************************************

Phenix.refine keeps track of which reflections are used as the test set (i.e., not used in refinement but only in estimation of overall parameters). The test set identity is saved as a hex-digest and written to the output PDB file produced by phenix.refine as a REMARK record:

REMARK r_free_flags.md5.hexdigest 41aea2bced48fbb0fde5c04c7b6fb64

Then when phenix.refine reads a PDB file and a set of data, it checks that the same test set is about to be used in refinement as in the previous refinement of this model. If it is not, you get the error message about an R-free flags mismatch. Sometimes the R-free flags mismatch error is telling you something important: you need to make sure that the same test set is used throughout refinement. In this case, you might need to change the data file you are using to match the one previously used with this PDB file. Alternatively you might need to start your refinement over with the desired data and test set. Other times the warning is not applicable. If you have two datasets with the same test set, but one dataset has one extra reflection that contains no data, only indices, then the two datasets will have different hex digests even though they are for all practical purposes equivalent.

In this case you would want to ignore the hex-digest warning. If you get an R-free flags mismatch error, you can tell the AutoBuild Wizard to ignore the warning with: skip_hexdigest=True and you can tell phenix.refine to ignore it with: refinement.input.r_free_flags.ignore_pdb_hexdigest=True

You can also simply delete the REMARK record from your PDB file if you wish to ignore the hex-digest warnings.
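To make the hex-digest idea concrete, here is a sketch of the concept only (not phenix.refine's exact recipe): hash the reflection indices together with their R-free flags, and compare digests between runs. Note how a single extra reflection with indices only changes the digest even though the test set itself is unchanged.

```python
import hashlib

def free_flags_digest(reflections):
    """md5 digest over (h, k, l) indices and their R-free flags."""
    data = ";".join("%d,%d,%d:%d" % (h, k, l, int(flag))
                    for (h, k, l), flag in sorted(reflections))
    return hashlib.md5(data.encode()).hexdigest()

run1 = [((1, 0, 0), True), ((2, 0, 0), False), ((3, 1, 0), True)]
# Same test set, plus one reflection carrying indices but no data:
run2 = run1 + [((9, 9, 9), False)]

print(free_flags_digest(run1) == free_flags_digest(run2))  # False
```

This is exactly the situation described above where the two datasets are for all practical purposes equivalent yet their digests differ, and ignoring the warning is appropriate.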

Can I use the AutoBuild wizard at low resolution?

The standard building with AutoBuild does not work very well at resolutions below about 3-3.2 A. In particular, the wizard tends to build strands into helical regions at low resolution. However, you can specify "helices_strands_only=True" and the wizard will build only the regions that are helical or beta-sheet, using a completely different algorithm. This is much quicker than standard building, but much less complete as well.

Why doesn't COOT recognize my MTZ file from AutoBuild?

This happens if you use "auto-open MTZ" in COOT. COOT will say: FAILED TO FIND COLUMNS FWT

AND PHWT IN THAT MTZ FILE FAILED TO FIND COLUMNS DELFWT AND PHDELFWT IN THAT MTZ FILE.

The solution is to use "Open MTZ" and then to select the columns (usually FP PHIM FOMM, and yes, do use weights).


Documentation Home

Using the PHENIX Wizards

Purpose

Overview of Structure Determination with the PHENIX Wizards

Usage

Wizard data directories, sub-directories, Facts, and the PDS (Project Data Storage)

Running a Wizard using a multiprocessor machine or on a cluster

Running a Wizard from a GUI

Basic operation of a Wizard from the GUI

Keeping track of multiple runs of a Wizard from the GUI

Setting parameters of a Wizard from the GUI

Navigating steps in a Wizard from the GUI

Running a Wizard from the command-line

Basic operation of a Wizard from the command-line

Keeping track of multiple runs of a Wizard from the command-line

Setting parameters of a Wizard from the command-line

Running a Wizard from a script

Differences between running from the command line and running a script

Basic operation of a Wizard from a script

Keeping track of multiple runs of a Wizard from a script

Setting parameters of a Wizard from a script

Useful script commands

Specific limitations and problems:

Literature

Additional information

Purpose

Any Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files.

All three versions are identical except in the way that they take commands and keywords from the user.

This page describes how to run a Wizard and what a Wizard does in general. The specific Wizard help pages describe the details of each PHENIX Wizard.

Overview of Structure Determination with the PHENIX Wizards

You can use the AutoSol Wizard to solve structures by SAD, MAD, SIR/SIRAS, and MIR/MIRAS. The

AutoMR Wizard can solve a structure by MR. The AutoMR and AutoSol Wizards together can carry out

MRSAD. The AutoSol Wizard can also combine SAD, MAD, SIR, and MIR datasets and solve the structure using all available data.

Once you have experimental or MR phases, you can carry out iterative model-building, density modification, and refinement with the AutoBuild Wizard to improve your model. Finally you can use the rebuild_in_place feature of the AutoBuild Wizard to make one very good final model.

http://phenix-online.org/documentation/running-wizards.htm (1 of 11) [12/14/08 1:00:26 PM]


If your structure contains ligands, you can place them using the LigandFit Wizard.

This help page describes how to run the Wizards from a GUI, the command-line, or a script. The individual Wizard documentation pages describe the strategies and commands for each Wizard:

Automated Structure Solution using AutoSol

Automated Molecular Replacement using AutoMR

Automated Model Building and Rebuilding using AutoBuild

Automated Ligand Fitting using LigandFit

Usage

Wizard data directories, sub-directories, Facts, and the PDS (Project Data Storage)

The directory that you are in when you start up PHENIX is your working directory.

Each run of a Wizard will have all output data in a subdirectory of your working directory named like this (for AutoSol run 3):

AutoSol_run_3_/

This subdirectory will have one or more temporary directories:

AutoSol_run_3_/TEMP0/ which contain intermediate files. These temporary directories will be deleted when the Wizard is finished (unless you set the parameter clean_up to False).

For OMIT and MULTIPLE-MODEL runs, the final OMIT maps and multiple models will be in a subdirectory of your run directory:

AutoSol_run_3_/OMIT/

AutoSol_run_3_/MULTIPLE_MODELS/

All the parameter values, as well as any other information that a Wizard generates during its run, are stored in the PDS (Project Data Storage) and/or the Wizard Facts. The Facts are values of parameters and pointers to files in the PDS. The Facts keep track of the current knowledge available to the Wizard. Each time a step is completed by a Wizard, the new Facts are saved

(overwriting old ones for that run). As the Facts define the state of the Wizard, the Wizard can be restarted any time by loading the appropriate set of Facts.

The PDS (Project data storage) will be in your working directory:

./PDS/

The PDS contains the output of each of your runs for all Wizards and a record of all the Facts

(parameters and data) for each run. If you delete a run using the PHENIX Wizard GUI or with a command like "phenix.autosol delete_runs=2", the corresponding entries in the PDS are also deleted. You can copy the PDS from one place to another. Note that if you delete directories such as "AutoSol_run_1_" by hand then the corresponding information remains in the PDS. For this reason it is best to use the GUI or specific commands to delete runs.

Running a Wizard using a multiprocessor machine or on a cluster


You can take advantage of a multiprocessor machine or a cluster when running the Wizards (currently this applies to the LigandFit and AutoBuild Wizards). For example, adding nproc=4 to a command-line command for a Wizard will use 4 processors to run the Wizard (if possible).

Normally you will run the parallel processes in the background with the default of background=True

If you have a cluster with a batch queue, you can send subprocesses to the batch queue with run_command=qsub

(or whatever your batch command is). In this case you will use background=False so that the batch queue can keep track of your jobs.

The Wizards divide the data into nbatch batches during processing. The value of nbatch is set to between 3 and 5 by default (depending on the Wizard) and is appropriate if you have up to nbatch processors. If you have more, then you may wish to increase nbatch to match the number of processors. The reason it is done this way is that the value of nbatch can affect the results that you get, as the jobs are not split into exact replicates but are instead run with different random numbers. If you want to get the same results, keep the same value of nbatch.
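The dependence on nbatch can be pictured with a toy model (hypothetical, not the Wizards' actual code): each batch draws its own random numbers, so the combined result is reproducible for a fixed nbatch but changes when nbatch changes.

```python
import random

def combined_result(nbatch):
    # Toy stand-in for a Wizard run split into nbatch batches.
    total = 0.0
    for batch in range(nbatch):
        rng = random.Random(batch)  # each batch gets its own seed
        total += rng.random()       # stand-in for that batch's output
    return total

print(combined_result(3) == combined_result(3))  # True: same nbatch is reproducible
print(combined_result(3) == combined_result(5))  # False: changing nbatch changes the result
```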

Running a Wizard from a GUI

Basic operation of a Wizard from the GUI

Start up the PHENIX GUI in your working directory by typing "phenix"

Answer "yes" to the question "Do you want to make it a project directory?".

Launch a Wizard from the PHENIX GUI by double-clicking on the name of the Wizard ("AutoSol") under "Wizards" in the Strategy Interface of the main GUI.

The Wizard will come up in a blue window and will open a grey Parameters window asking you for information on what files to use and what to do.

Enter the file names and make choices as necessary (NOTE: to select a file click on the yellow box to the right of the file entry field. To add a new file entry field click on the "Parameter group options" tab if present).

Proceed to the next window by clicking "Continue" in the upper left corner of the grey

Parameters window.


The Wizard will guide you through the necessary inputs, then it will continue on its own until it is finished.

When the Wizard is done, you can double-click on the Display icon (the little magnifying glass on the upper left of the blue Wizard window) to show a list of files and maps that can be displayed.

(NOTE: The Display Options window is updated when you open it. Once this window is open you cannot open it again until you close it. Sometimes this window may be behind other windows and this will prevent you from opening it again.)

You can open the Parameters window any time the Wizard is stopped by clicking on the

Parameters icon (4 little lines in the upper left corner of the blue Wizard window). This allows you to carry out some of the more advanced options below.

Your output log file will be in a file called "AutoSol.1.output" for an AutoSol run. You can also see the same file by clicking on the "LOG" button at the lower right of the blue or green window.

Keeping track of multiple runs of a Wizard from the GUI

You can run more than one Wizard job at a time if you want. Each run of a Wizard is put in a separate sub-directory (e.g., "AutoSol_run_1_").

When you start a Wizard, it will start a new run of that Wizard.

If you want to continue on with the highest-numbered run of a Wizard, you can start the Wizard with the continue button for that Wizard (for example the continue_AutoSol button).

If you want to go back to a previous run, you can use the Run Control and Run Number selections near the bottom of any Parameters window (NOTE: to open the parameters window click on the lines at the upper left of the blue Wizard window). Select goto_run and choose a run number to go to.

If you want to copy a previous run and go on, use the Run Control and Run Number selections and select copy_run and choose a run number to copy. The Wizard will create a new run (with number equal to the highest previous number plus one) and carry on with it.

To see what runs are available, select View or Delete Runs in the Navigate tab at the lower left of any Parameters window.

If you want to stop the Wizard, hit the PAUSE button on the green Wizard window (the Wizard is green when running, blue or purple when stopped). NOTE: this may take a little time, particularly if Phaser or HYSS or phenix.refine are running. In those cases, if you really want to stop the Wizard right away, go to "Strategy" and then select "Stop Strategy" and it will be stopped.

Setting parameters of a Wizard from the GUI

You can set any parameter in a Wizard by selecting the variable in the Choose Variable to Set tab. The next time you click Continue, the Wizard will save all the current inputs as usual, and then instead of going on to the next step, it will open a window asking you for the new value of that variable. When you enter it and press Continue, the Wizard will continue on with what it was doing, but with this new value.

NOTE that some parameters (e.g., resolution) may affect many steps. If a prior step is affected by a parameter that is changed, the Wizard does not go back and change it. If you want the parameter change to affect something that has already been done, you need to re-run the corresponding step.

NOTE that you can set any SOLVE, RESOLVE or RESOLVE_PATTERN keyword when you are running a Wizard using the "resolve_command", "solve_command" or

"resolve_pattern_command" keywords. These can be set in the GUI from the Choose Variable pull-down menu. You just type in the command to the entry form like this: (for resolve_command): res_start 4.0

telling resolve in this case to start out density modification at a resolution of 4 A. This allows you to control what solve, resolve and resolve_pattern do more finely than you otherwise can in the

Wizards.

Navigating steps in a Wizard from the GUI

When the Wizard is done or Paused, you can select any available step in the Navigate tab at the middle bottom of any Parameter window. This tells the Wizard to get any necessary inputs for that step and to then carry it out.

The Wizards normally start out in Manual mode (one step at a time, asking user for inputs).

Once the necessary inputs are entered, the Wizard enters Automatic mode (no more asking for inputs until something required is missing). You can control this by specifying Manual or

Automatic in the Auto/Manual tab at the bottom right of any Wizard.

Running a Wizard from the command-line

Basic operation of a Wizard from the command-line

You can run a wizard from the command line like this (autosol is the AutoSol wizard): phenix.autosol data=w1.sca seq_file=seq.dat 2 Se

The command-line interpreter will try to interpret obvious information (2 means sites=2, Se means atom_type=Se) and will run the wizard.

To see all the information about this wizard and the keywords that you can set for this wizard, type: phenix.autosol --help all

Any wizard keyword can be entered at the command line (not just the ones labelled "command-line only"). The documentation for each wizard lists all the keywords that apply to that wizard.

If you want to stop a Wizard, you can create a file "STOPWIZARD" and put it in the subdirectory

(e.g., AutoSol_run_2_/) where the Wizard is running. This is like hitting the PAUSE button on the GUI and stops the Wizard cleanly.

Keeping track of multiple runs of a Wizard from the command-line

When you start a Wizard from the command line, the default is to start a new run of that

Wizard.


To see all the available runs of this Wizard, type: phenix.autosol show_runs

To delete runs 1,2 and 4-7 of this Wizard, type something like this: phenix.autosol delete_runs="1 2 4-7"

Note that the group of numbers is enclosed in quotes ("). This tells the input parser (iotbx.phil) that all these numbers go with the one keyword of delete_runs. Note also that there are no spaces around the "=" sign!

To go back to run 2 and carry on (remembering all previous inputs and possibly adding new ones, in this case setting the resolution), type something like:

phenix.autosol run=2 resolution=3.0

To carry on with the current highest-numbered run (remembering all previous inputs and possibly adding new ones, in this case setting the resolution), type something like:

phenix.autosol carry_on resolution=3.0

To copy run 2 to a new run and carry on from there (remembering all previous inputs and possibly adding new ones, in this case setting the resolution), type something like:

phenix.autosol copy_run=2 resolution=3.0

Setting parameters of a Wizard from the command-line

When you run a Wizard from the command-line, two files are produced and put in the subdirectory of the Wizard (e.g., AutoBuild_run_3_/).

A parameters (".eff") file will be produced that you can edit to rerun the Wizard:

phenix.autosol autosol.eff

This autosol.eff file (for AutoSol) contains the values of all the AutoSol parameters at the time of starting the Wizard.

Note that the syntax in the autosol.eff file is very slightly different from the syntax on the command line. On the command line, if a value has several parts, you enclose them in quotes and leave no spaces around the "=" sign:

phenix.autosol ... input_phase_labels="FP PHIM FOMM"

In the .eff file, you MUST leave off the quotes (or the three values will be treated as one), and you should leave blanks around the "=" sign:

input_phase_labels = FP PHIM FOMM

The reason these are different is that in the .eff file, the structure of the file and the brackets tell the PHIL parser what is grouped together, while on the command line the quotes tell the parser what is to be grouped together.
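The command-line quoting behaviour can be illustrated with Python's shlex tokenizer, which applies shell-style quoting rules. This is a sketch of the idea only; PHENIX's actual parameter handling is done by iotbx.phil:

```python
# Illustrative sketch: shell-style tokenization groups a quoted value
# into a single token, so the whole quoted string can be assigned to
# one keyword. (Not the real PHIL parser.)
import shlex

def split_assignments(cmdline):
    params = {}
    for tok in shlex.split(cmdline):   # quotes are consumed, grouping preserved
        if "=" in tok:
            key, value = tok.split("=", 1)
            params[key] = value
    return params
```

Without the quotes, PHIM and FOMM would arrive as separate tokens with no "=" in them, and only FP would be assigned to the keyword.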

A script file (".inp") with inputs in the format for running from a script is produced that you can edit and use like this:

phenix.runWizard AutoSol AutoSol.inp

To get keyword help on a specific keyword you can type:

phenix.autosol --help data # get help on the keyword data for autosol

To show current Facts (values of all parameters) for the highest-numbered run:

phenix.autosol show_facts

To show current Facts (values of all parameters) for run 3:

phenix.autosol run=3 show_facts

To show the current summary:

phenix.autosol show_summary

When you use a keyword like data= you need to give enough information to specify this keyword uniquely. You can see all the keywords for each PHENIX Wizard or tool at the end of the documentation for that Wizard or tool. This will have entries like this (for AutoSol):

autosol

sites= None Number of heavy-atom sites. (Command-line only)

which describes the keyword sites in the scope defined by autosol. You can explicitly specify this on the command line with:

autosol.sites=3

which in this case is entirely the same as sites=3.
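The equivalence of sites=3 and autosol.sites=3 can be sketched as scoped-name matching: a bare keyword is accepted when it matches exactly one fully scoped name. This is an illustration only, not the iotbx.phil implementation:

```python
# Illustrative sketch: match a bare keyword against the known fully
# scoped parameter names; it must match exactly one of them.
def resolve_keyword(name, full_paths):
    matches = [p for p in full_paths if p == name or p.endswith("." + name)]
    if len(matches) != 1:
        raise ValueError("keyword %r is ambiguous or unknown" % name)
    return matches[0]
```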

NOTE that you can set any SOLVE, RESOLVE or RESOLVE_PATTERN keyword in PHENIX using the "resolve_command", "solve_command" or "resolve_pattern_command" keywords from the command line. The format is a little tricky: you have to put two sets of quotes around the command, like this:

resolve_command="'ligand_start start.pdb'" # NOTE ' and " quotes

This will put the text ligand_start start.pdb

at the end of every temporary command file created to run resolve.

Running a Wizard from a script

Differences between running from the command line and running a script

Command-line

The command-line is an easy way to run a Wizard and is recommended for most users. The command starts with phenix. plus the name of the Wizard in lower-case letters (phenix.autosol). Following this, all of the keywords are on the same line (or on continuation lines) and values are assigned with an "=" sign. The order of keywords makes no difference when running from the command line. A simple command is:

phenix.autosol data=w1.sca seq_file=seq.dat sites=2 atom_type=Se

Scripts

Normally scripts are for advanced users only (however, for running MIR or multiple datasets you have to use the GUI or a script). A script can contain both commands and keywords. Keywords are read in until a command is found, then the command is executed, then additional keywords are read in until another command is found, and so on. If the script file contains only keywords and no commands, then the keywords are read in and used as input to the Wizard, in just the same way as running from the command line. In a script file, each line can contain a command or keyword and optional values for the command or keyword, separated by spaces. The keywords for scripts are a subset of the keywords for the command-line, because the command-line interpreter has a number of special keywords (essentially shortcuts) to make typing at the command-line easier. A script file assigns values to keywords by placing them on the same line as the keyword, without any "=" signs. A sample script file "autosol.inp" that contains the same information as the command-line command shown above (but with the full keyword names, not the command-line shortcuts) is:

# autosol.inp

# script file with inputs for AutoSol Wizard.

# run with: phenix.runWizard AutoSol autosol.inp

#
input_file_list w1.sca # script keyword is input_file_list not data
input_seq_file seq.dat # script keyword is input_seq_file not seq_file
mad_ha_n 2             # script keyword is mad_ha_n not sites
mad_ha_type Se         # script keyword is mad_ha_type not atom_type

#

# end of autosol.inp

which you can run with:

phenix.runWizard AutoSol autosol.inp

NOTE: The script interpreter will accept any keywords and values. If the keyword is not recognized, then it will write a warning to the log file, but it will not stop. This means that if you use the wrong name for a keyword, you will only find this out by looking at the beginning of the log file. The utility of this feature is that keywords set the value of the corresponding variable in the Wizard. If you know what you are doing, you can set any variable in the Wizard in this way, whether or not it is a keyword.
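The script-file behaviour described above (keyword plus values on one line, unrecognized keywords producing only a warning but still being set) can be sketched as follows. The KNOWN_KEYWORDS set and the function are illustrative assumptions, not the real script interpreter:

```python
# Illustrative sketch of the script-file format: each non-comment line
# holds a keyword followed by optional space-separated values, with no
# "=" signs. Unrecognized keywords are warned about but still stored,
# mirroring the behaviour described in the manual.
KNOWN_KEYWORDS = {"input_file_list", "input_seq_file", "mad_ha_n",
                  "mad_ha_type", "resolution"}

def read_script(lines, log):
    facts = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()   # strip comments and blanks
        if not line:
            continue
        keyword, *values = line.split()
        if keyword not in KNOWN_KEYWORDS:
            log.append("WARNING: unrecognized keyword %s" % keyword)
        facts[keyword] = values[0] if len(values) == 1 else values
    return facts
```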

Basic operation of a Wizard from a script

You can run a wizard from a script like this (AutoSol wizard):

phenix.runWizard AutoSol autosol.inp


The script file (autosol.inp) should contain keyword entries telling the Wizard what to do. The output will be written to the log file (e.g., AutoSol_run_1_/AutoSol_run_1_1.log).

The keywords that can be set in a script file include most of the keywords for command-line running, plus a set of control commands for running from a script. To see all the basic keywords for a wizard, make a script (e.g., keywords.inp) that says:

list_keywords

and then type:

phenix.runWizard AutoSol keywords.inp

The keywords will be written to the log file (e.g., AutoSol_run_1_/AutoSol_run_1_1.log).

For help on a Wizard, your script file should say:

help

Unlike running from the command-line, the order of entries in a script file can make a difference.

For example you can specify a group of inputs for one dataset and then start a new dataset.

If you want to stop a Wizard, you can create a file "STOPWIZARD" and put it in the subdirectory (e.g., AutoSol_run_2_/) where the Wizard is running. This is like hitting the PAUSE button on the GUI and stops the wizard cleanly.

Keeping track of multiple runs of a Wizard from a script

When you start a Wizard from the command line, the default is to start a new run of that Wizard.

To see all the available runs of this Wizard, delete some runs, carry on with run 3, or copy run 4 into a new run, your script should say one of the following:

show_runs
delete_run_list 1 2 3-5
run 3
copy_run 4

Setting parameters of a Wizard from a script

You can set nearly any parameter using keywords from a script. For example:

resolution 2.5

will set the overall high-resolution cutoff to 2.5 A.

Useful script commands

With the exception of show_runs and delete_runs, the output for each of these commands is written to the log file (e.g., AutoSol_run_1_/AutoSol_run_1_1.log).

help                       # print out this help message
show_runs                  # list all the runs that are saved
delete_runs 1 2 3-5 9:12   # delete runs 1 2 3-5 9-12
carry_on                   # continue on with the highest-numbered run
run 5                      # continue with run 5
copy_run 5                 # make a new copy of run 5 (with number equal
                           # to highest existing run number +1) and continue
                           # with this new copy
run 2 run_only DumpFacts   # list current values of all parameters in run 2 and stop
run_only nothing           # do nothing and stop
list_keywords              # list all the keywords and their possible values
run_list method_1 method_2 # run these methods and anything
                           # that follows automatically
run_only method_1 method_2 # run just these methods and stop
user_command method_1
list_methods               # list all methods that can be run with run_list

These are a good way to run Wizards initially, and also a good way to change some parameters after stopping a run.

Note: these all have the form:

keyword parameter

where the parameter must be enclosed in quotes if it is a string containing blanks. If the keyword contains the text "list" or the words "dataset_", "cell" or "input_labels", then the parameter can be a list of items, separated by blanks:

cell 40 50 40 90 90 90

An empty list is indicated by "[]".

NOTE that you can set any SOLVE, RESOLVE or RESOLVE_PATTERN keyword in PHENIX using the "resolve_command", "solve_command" or "resolve_pattern_command" keywords from a script. The format is different from the command-line version: you don't have to put quotes around the command:

resolve_command ligand_start start.pdb # NOTE: quotes not necessary for script

This will put the text

ligand_start start.pdb

at the end of every temporary command file created to run resolve.

Specific limitations and problems:

In the GUI version of Wizards, the Display Options window is updated only when you open it. Further, once this window is open you cannot open it again until you close it. Sometimes this window may be behind other windows, which will prevent you from opening it again until you close the open window.

The Wizards use file names based on the names of your input files, but they do not differentiate between files with the same name coming from different directories. Consequently you should not use two files with different contents but with the same file name as inputs to a Wizard, even if they come from separate starting directories.

The command-line version of AutoSol cannot be used for MIR or for combining multiple datasets. The script and GUI versions can be used instead for these cases.

If you stop a Wizard and continue on with a command such as phenix.autobuild run=2 then you can change most parameters with keywords just as if you were starting from scratch, but if you had previously changed a keyword away from the default, you cannot set it back to the default in this way (the Wizard ignores keywords that are the same as the default).

You should not work on the same run in two ways at the same time. This can lead to unpredictable results because the two runs will really be the same run, and the data and databases for the two runs will be overwriting each other. This means you need to be careful that if you goto_run 1 of a Wizard in one window, you do not also goto_run 1 of the same Wizard in another window. On the other hand, it is perfectly fine to work on run 1 of a Wizard in one window and run 2 of the same Wizard in another window.

The PHENIX Wizards can take most settings of most space groups; however, they can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting, due to difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.

Literature

Additional information


Automated structure solution with AutoSol

Python-based Hierarchical ENvironment for Integrated Xtallography

Documentation Home

Automated structure solution with AutoSol

Author(s)

Purpose

Usage

How the AutoSol Wizard works

Setting up inputs

Datasets and Solutions in AutoSol

Analyzing and scaling the data

Finding heavy-atom (anomalously-scattering atom) sites

Running AutoSol separately in related space groups

Scoring of heavy-atom solutions

Phasing

Density modification (including NCS averaging)

Preliminary model-building and refinement

Resolution limits in AutoSol

Output files from AutoSol

How to run the AutoSol Wizard

Model viewing during model-building with the Coot-PHENIX interface

Examples

SAD dataset

SAD dataset specifying solvent fraction

SAD dataset without model-building

SAD dataset, building RNA instead of protein

SAD dataset, selecting a particular dataset from an MTZ file

MRSAD -- SAD dataset with an MR model; Phaser SAD phasing including the model

Using an MR model to find sites and as a source of phase information (method #2 for MRSAD)

SAD dataset, reading heavy-atom sites from a PDB file written by phenix.hyss

MAD dataset

MAD dataset, selecting particular datasets from an MTZ file

SIR dataset

SAD with more than one anomalously-scattering atom

MIR dataset

SIR + SAD datasets

Possible Problems

General limitations

Specific limitations and problems

Literature

Additional information

List of all AutoSol keywords

Author(s)

AutoSol Wizard: Tom Terwilliger

PHENIX GUI and PDS Server: Nigel W. Moriarty

HYSS: Ralf W. Grosse-Kunstleve and Paul D. Adams

Phaser: Randy J. Read, Airlie J. McCoy and Laurent C. Storoni

SOLVE: Tom Terwilliger

http://phenix-online.org/documentation/autosol.htm (1 of 29) [12/14/08 1:00:42 PM]

RESOLVE: Tom Terwilliger

TEXTAL: K. Gopal, T.R. Ioerger, R.K. Pai, T.D. Romo, J.C. Sacchettini

phenix.refine: Ralf W. Grosse-Kunstleve, Peter Zwart and Paul D. Adams

phenix.xtriage: Peter Zwart

Purpose

The AutoSol Wizard uses HYSS, SOLVE, Phaser, RESOLVE, TEXTAL, xtriage and phenix.refine to solve a structure and generate experimental phases with the MAD, MIR, SIR, or SAD methods. The Wizard begins with datafiles (.sca, .hkl, etc) containing amplitudes of structure factors, identifies heavy-atom sites, calculates phases, carries out density modification and NCS identification, and builds and refines a preliminary model.

Usage

The AutoSol Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user. See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line version will be described here, except for MIR and multiple datasets, which can only be run with the GUI or with a script.

How the AutoSol Wizard works

The basic steps that the AutoSol Wizard carries out are described below. They are: Setting up inputs, Analyzing and scaling the data, Finding heavy-atom (anomalously-scattering atom) sites, Scoring of heavy-atom solutions, Phasing, Density modification (including NCS averaging), and Preliminary model-building and refinement. The data for structure solution are grouped into Datasets, and solutions are stored in Solution objects.

Setting up inputs

The AutoSol Wizard expects the following basic information:

(1) a datafile name (w1.sca or data=w1.sca)

(2) a sequence file (seq.dat or seq_file=seq.dat)

(3) how many sites to look for (2 or sites=2)

(4) what the anomalously-scattering atom is (Se or atom_type=Se)

(5) If you have SAD or MAD data, then it is helpful to add f_prime and f_double_prime for each wavelength.

You can also specify many other parameters, including resolution, number of sites, whether to search in a thorough or quick fashion, how thoroughly to build a model, etc. If you have a heavy-atom solution from a previous run or another approach, you can read it in directly as well.

Datasets and Solutions in AutoSol

AutoSol breaks down the data for a structure solution into datasets, where a dataset is a set of data that corresponds to a single set of heavy-atom sites. An entire MAD dataset is a single dataset. An MIR structure solution consists of several datasets (one for each native-derivative combination). A MAD + SIR structure has one dataset for the MAD data and a second dataset for the SIR data. The heavy-atom sites for each dataset are found separately (but using difference Fouriers from any previously-solved datasets to help). In the phasing step all the information from all datasets is merged into a single set of phases.

The AutoSol wizard uses a "Solution" object to keep track of heavy-atom solutions and the phased datasets that go with them. There are two types of Solutions: those which consist of a single dataset (Primary Solutions) and those that are combinations of datasets (Composite Solutions). Primary Solutions have information on the datafiles that were part of the dataset and on the heavy-atom sites for this dataset. Composite Solutions are simply sets of Primary Solutions, with associated origin shifts. The hand of the heavy-atom or anomalously-scattering atom substructure is part of a Solution, so if you have two datasets, each with two Solutions related by inversion, then AutoSol would normally construct four different Composite Solutions from these and score each one as described below.

Analyzing and scaling the data

The AutoSol Wizard analyzes input datasets with phenix.xtriage to identify twinning and other conditions that may require special care. The data is scaled with SOLVE. For MAD data, FA values are calculated as well.

Note on anisotropy corrections:

The AutoSol wizard will apply an anisotropy correction to all the raw experimental data if any of the files in the first dataset read in have a very strong anisotropy. You can tell the Wizard how much anisotropy there must be before applying this correction by default using the keywords:

correct_aniso=True                  # if True or False then always or never apply correction
delta_b_for_auto_correct_aniso=20   # correct if range of anisotropic B is greater than 20
ratio_b_for_auto_correct_aniso=1.5  # correct if the ratio of the largest to smallest
                                    # anisotropic B is greater than 1.5

If an anisotropy correction is applied then a separate refinement file must be specified if refinement is to be carried out. This is because it is best to refine against data that have not been corrected for anisotropy (instead applying the correction as part of refinement).

Finding heavy-atom (anomalously-scattering atom) sites

The AutoSol Wizard uses HYSS to find heavy-atom sites. The result of this step is a list of possible heavy-atom solutions for a dataset. For SIR or SAD data, the isomorphous or anomalous differences, respectively, are used as input to HYSS. For MAD data, the anomalous differences at each wavelength, and the FA estimates of complete heavy-atom structure factors from SOLVE, are each used as separate inputs to HYSS.

Each heavy-atom substructure obtained from HYSS corresponds to a potential solution. In space groups where the heavy-atom structure can be either hand, a pair of enantiomorphic solutions is saved for each run of HYSS.

Running AutoSol separately in related space groups

AutoSol will check for the opposite hand of the heavy-atom solution, and at the same time it will check for the opposite hand of your space group (it will invert the heavy-atom solution from HYSS and invert the hand of the space group at the same time). Therefore you do not need to run AutoSol twice for space groups that are chiral (for example P41). The corresponding inverse space groups will be checked automatically (P43). If there are possibilities for your space group other than the inverse hand of the space group, then you should test them all, one at a time. For example, if you were not able to measure 00l reflections in a hexagonal space group, your space group might be P6, P61, P62, P63, P64 or P65. In this case you would have to run it in P6, P61, P62 and P63 (and then P65 and P64 will be done automatically as the inverses of P61 and P62). Normally only one of these will give a plausible solution.

Scoring of heavy-atom solutions


Potential heavy-atom solutions are scored based on a set of criteria (CC, RFACTOR, SKEW, FOM, NCS_OVERLAP, TRUNCATION, REGIONS, SD; described below), using either a Bayesian estimate, a linear regression, or a Z-score system to put all the scores on a common scale and to combine them into a single overall score. The overall scoring method chosen (BAYES-CC or Z-SCORE) is determined by the value of the keyword overall_score_method. The default is BAYES-CC. Note that for all scoring methods, the map that is being evaluated, and the estimates of map-perfect-model correlation, refer to the experimental electron density map, not the density-modified map.

Bayesian CC scores (BAYES-CC). Bayesian estimates of the quality of experimental electron density maps are obtained using data from a set of previously-solved datasets. The standard scoring criteria were evaluated for 1905 potential solutions in a set of 246 MAD, SAD, and MIR datasets. As each dataset had previously been solved, the correlation between the refined model and each experimental map (CC_PERFECT) could be calculated for each solution (after offsetting the maps to account for origin differences). Histograms were tabulated of the number of instances that a scoring criterion (e.g., SKEW) had various possible values, as a function of the CC_PERFECT of the corresponding experimental map to the refined model. These histograms yield the relative probability of measuring a particular value of that scoring criterion (SKEW), given the value of CC_PERFECT. Using Bayes' rule, these probabilities can be used to estimate the relative probabilities of values of CC_PERFECT given the value of each scoring criterion for a particular electron density map. The mean estimate (BAYES-CC) is reported (multiplied by 100), with a +/- 2SD estimate of the uncertainty in this estimate of CC_PERFECT. The BAYES-CC values are estimated independently for each scoring criterion used, and also from all those selected with the keyword score_type_list and not selected with the keyword skip_score_list.

Z-scores (Z-SCORE). The Z-score for one criterion for a particular solution is given by

Z = (Score - mean_random_solution_score) / (SD_of_random_solution_scores)

where Score is the score for this solution, mean_random_solution_score is the mean score for a solution with randomized phases, and SD_of_random_solution_scores is the standard deviation of the scores of solutions with randomized phases.

To create a total score based on Z-scores, the Z-scores for each criterion are simply summed.
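The Z-score combination above can be sketched in a few lines of Python. This is an illustration of the formula, not the PHENIX scoring code:

```python
# Illustrative sketch: standardize each criterion against scores from
# randomized-phase solutions, then sum the per-criterion Z-scores to
# obtain the total score.
def z_score(score, random_scores):
    n = len(random_scores)
    mean = sum(random_scores) / n
    sd = (sum((s - mean) ** 2 for s in random_scores) / n) ** 0.5
    return (score - mean) / sd

def total_z(scores, random_scores_by_criterion):
    # scores: {criterion: value}; random_scores_by_criterion: {criterion: [values]}
    return sum(z_score(scores[c], random_scores_by_criterion[c]) for c in scores)
```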

The principal scoring criteria are:

(1) Correlation of map-phased electron density map with experimentally-phased map (CC). The statistical density modification in RESOLVE allows the calculation of map-based phases that are (mostly) independent of the experimental phases. The phase information in statistical density modification comes from two sources: your experimental phases and maximization of the agreement of the map with expectations (such as a flat solvent region). Normally the phase probabilities from these two sources are merged together, yielding your density-modified phases. This score is calculated based on the correlation of the phase information from these two sources before combining them, and is a good indication of the quality of the experimental phases. This criterion is used in scoring by default.

(2) The R-factor for density modification (R-Factor). Statistical density modification provides an estimate of structure factors that is (mostly) independent of the measured structure factors, so the R-factor between FC and Fobs is a good measure of the quality of experimental phases. This criterion is used in scoring by default.

(3) The skew (third moment or normalized <rho**3>) of the density in an electron density map is a good measure of its quality, because a random map has a skew of zero (density histograms look like a Gaussian), while a good map has a very positive skew (density histograms very strong near zero, but many points with very high density). This criterion is used in scoring by default.
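The skew criterion (normalized third moment of the density histogram) can be sketched as follows; this is a plain illustration of the statistic, not the RESOLVE implementation:

```python
# Illustrative sketch: the skew of a map's density values, i.e. the
# third central moment normalized by the cube of the standard deviation.
# A random map gives ~0; a good map gives a clearly positive value.
def skew(rho):
    n = len(rho)
    mean = sum(rho) / n
    var = sum((x - mean) ** 2 for x in rho) / n
    return sum((x - mean) ** 3 for x in rho) / (n * var ** 1.5)
```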

(4) Non-crystallographic symmetry (NCS overlap). The presence of NCS in a map is a strong indication that the map is good, or at least has some correct features. The AutoSol Wizard uses symmetry in heavy-atom sites to suggest NCS, and RESOLVE identifies the actual correlation of NCS-related density for the NCS overlap score. This score is used by default if NCS is present in the Z-score method of scoring.

(5) Figure of merit (FOM). The figure of merit of phasing is a good indicator of the internal consistency of a solution. This score is not normalized by the SD of randomized phase sets (as that has no meaning; rather a standard SD=0.05 is used). This score is used by default if NCS is present in the Z-score method of scoring and in the Bayesian CC estimate method.

(6) Map correlation after truncation (TRUNCATION). Dummy atoms (the same number as estimated non-hydrogen atoms in the structure) are placed in positions of high density of the map, and a new map is calculated based on these atomic positions. The correlation of these maps is calculated after adjusting an overall B-value for the dummy atoms to maximize the correlation. A good map will show a high correlation of these maps. This score is by default not used.

(7) Number of contiguous regions per 100 A**3 comprising top 5% of density in map (REGIONS). The top 5% of points in the map are marked, the number of contiguous regions that result is counted, divided by the volume of the asymmetric unit, and then multiplied by 100. A good map will have just a few contiguous regions at a high contour level; a poor map will have many isolated peaks. This score is by default not used.

(8) Standard deviation of local rms density (SD). The local rms density in the map is calculated using a smoothing radius of 3 times the high-resolution cutoff (or 6 A, if less than 6 A). Then the standard deviation of the local rms, normalized to the mean value of the local rms, is reported. This criterion will be high if there are regions of high local rms (the macromolecule) and separate regions of low local rms (the solvent), and low if the map is random. This score is by default not used.

Phasing

The AutoSol Wizard uses Phaser to calculate experimental phases from SAD data, and SOLVE to calculate phases from MIR, MAD, and multiple-dataset cases.

Density modification (including NCS averaging)

The AutoSol Wizard uses RESOLVE to carry out density modification. It identifies NCS from symmetries in heavy-atom sites with RESOLVE and applies this NCS if it is present in the electron density map.

Preliminary model-building and refinement

The AutoSol Wizard carries out one cycle of model-building and refinement after obtaining density-modified phases. The model-building can be with RESOLVE or with TEXTAL. The refinement is carried out with phenix.refine.

Resolution limits in AutoSol

There are several resolution limits used in AutoSol. You can leave them all to default, or you can set any of them individually. Here is a list of these limits and how their default values are set:

resolution       -- Overall resolution for a dataset. Default: highest resolution for any datafile in this dataset (for multiple datasets, the highest resolution for any dataset).

resolution_build -- Resolution for model-building. Default: value of "resolution".

res_phase        -- Resolution for phasing for a dataset. Default: if phase_full_resolution=True, use the value of "resolution"; otherwise use the value of "recommended_resolution", based on analysis of signal-to-noise in the dataset.

res_eval         -- Resolution for evaluation of solution quality. Default: value of "resolution" or 2.5 A, whichever is lower resolution.
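The default rules above can be sketched as a small Python function. The function name and signature are hypothetical; only the rules themselves come from the table:

```python
# Illustrative sketch of the default resolution limits: "highest
# resolution" means the smallest d-spacing, and "lower resolution"
# means the larger d-spacing.
def default_limits(datafile_resolutions, phase_full_resolution,
                   recommended_resolution):
    resolution = min(datafile_resolutions)   # highest resolution among datafiles
    return {
        "resolution": resolution,
        "resolution_build": resolution,
        "res_phase": resolution if phase_full_resolution else recommended_resolution,
        "res_eval": max(resolution, 2.5),    # "resolution" or 2.5 A, whichever is lower resolution
    }
```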

Output files from AutoSol

When you run AutoSol the output files will be in a subdirectory with your run number:

AutoSol_run_1_/

The key output files that are produced are:

A summary file listing the results of the run and the other files produced:

AutoSol_summary.dat # overall summary

A warnings file listing any warnings about the run

AutoSol_warnings.dat # any warnings

A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)

AutoSol_Facts.dat # all Facts about the run

NCS information (if any)

AutoSol_15.ncs_spec # NCS information. The number is the solution number

Experimental phases and HL coefficients:

solve_15.mtz  # either solve_15.mtz or phaser_15.mtz,
phaser_15.mtz # depending on which was run

Density-modified phases from RESOLVE:

current_cycle_map_coeffs.mtz # map coefficients (density-modified phases)
resolve_15.mtz               # density-modified phases; same as above

For either of these, use FP PHIM FOMM for PHI F FOM.

An mtz file for use in refinement:

exptl_fobs_phases_freeR_flags_15.mtz # F Sigma HL coeffs, freeR flags for refinement

Heavy-atom sites in PDB format:

ha_15.pdb_formatted.pdb

Current preliminary model and evaluation of model current_cycle.pdb

current_cycle_eval.log


How to run the AutoSol Wizard

Running the AutoSol Wizard is easy. From the command-line you can type: phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5

The AutoSol Wizard will assume that w1.sca is a datafile (because it ends in .sca and is a file), that seq.dat is a sequence file, that there are 2 heavy-atom sites, and that the heavy-atom is Se. The f_prime and f_double_prime values are set explicitly.

You can also specify each of these things directly: phenix.autosol data=w1.sca seq_file=seq.dat sites=2 \

atom_type=Se f_prime=-8 f_double_prime=4.5

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page, and also general information about running Wizards at Running a Wizard from a GUI, the command-line, or a script. Some of the most common parameters are:

sites=3              # 3 sites
sites_file=sites.pdb # heavy-atom sites in PDB or fractional xyz format
atom_type=Se         # Se is the heavy-atom
seq_file=seq.dat     # sequence file (1-aa code, separate chains with >>>>)
quick=True           # try to find sites quickly
data=w1.sca          # input datafile
f_prime=-5           # f-prime value for SAD
f_double_prime=4.5   # f-double-prime value for SAD

Model viewing during model-building with the Coot-PHENIX interface

The AutoSol Wizard allows you to view the current best model that is produced by the automated model-building process. This capability is identical to the view/edit model procedure available in the AutoBuild Wizard. Normally you would use it just to view the model in AutoSol, and to view and edit a model in AutoBuild. The PHENIX-Coot interface is accessible through the GUI and via the command-line. Using the GUI, when a model has been produced by the AutoSol Wizard, you can double-click the button on the GUI labelled View/edit files with coot to start Coot with your current map and model. If you are running from the command-line, you can open a new window and type:

phenix.autobuild coot

which will do the same (provided the necessary map and model are ready). When Coot has been loaded, your map and model will be displayed along with a PHENIX-Coot Interface window. If you want, you can edit your model and then save it, giving it back to PHENIX with the button labelled something like Save model as COMM/overall_best_coot_7.pdb. This button creates the indicated file and also tells PHENIX to look for this file and to try to include the contents of the model in the building process. In AutoSol, only the main-chain atoms of the model you save are considered; the side-chains are ignored. Ligands and solvent in the model are ignored as well. As the AutoSol Wizard continues to build new models and create new maps, you can update the PHENIX-Coot Interface to the current best model and map with the button Update with current files from PHENIX.

Examples

SAD dataset

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5


The sequence file is used to estimate the solvent content of the crystal and for model-building. Note that for a SAD dataset the values of f_prime and f_double_prime are not critical. If you are off by a factor of 2 on f_double_prime, the refined occupancies of the heavy-atom sites might be 1/2 their correct values.
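The trade-off between the assumed f_double_prime and the refined occupancy follows from simple arithmetic: the anomalous signal scales roughly as occupancy times f''. A standalone illustration of this point (not a PHENIX routine; the function name is invented for this sketch):

```python
# Purely illustrative arithmetic (not a PHENIX routine): the anomalous
# signal scales roughly as occupancy * f_double_prime, so an error in the
# assumed f'' is absorbed by the refined occupancy.
def compensating_occupancy(true_occupancy, true_fdp, assumed_fdp):
    return true_occupancy * true_fdp / assumed_fdp

# Assuming f'' = 9.0 when the true value is 4.5 halves the refined occupancy.
halved = compensating_occupancy(1.0, 4.5, 9.0)
```

This is why AutoSol still succeeds with approximate f'' values: only the refined site occupancies shift, not the phasing itself.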

SAD dataset specifying solvent fraction

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
  solvent_fraction=0.45

This will force the solvent fraction to be 0.45. This illustrates a general feature of the Wizards: they will try to estimate values of parameters, but if you input them directly, they will use your input values.

SAD dataset without model-building

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
  build=False

This will carry out the usual structure solution, but will skip model-building.

SAD dataset, building RNA instead of protein

phenix.autosol w1.sca seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
  chain_type=RNA

This will carry out the usual structure solution, but will build an RNA chain. For DNA, specify chain_type=DNA. You can only build one type of chain at a time in the AutoSol Wizard. To build both protein and DNA, use the AutoBuild Wizard and run it first with chain_type=PROTEIN, then run it again specifying the protein model as input_lig_file_list=proteinmodel.pdb and with chain_type=DNA.

SAD dataset, selecting a particular dataset from an MTZ file

If you have an input MTZ file with more than one anomalous dataset, you can type something like:

phenix.autosol w1.mtz seq.dat 2 Se f_prime=-8 f_double_prime=4.5 \
  labels='F SIGF DANO SIGDANO'

This will carry out the usual structure solution, but will choose the input data columns based on the labels 'F SIGF DANO SIGDANO'. If you run the AutoSol Wizard with SAD data and an MTZ file containing more than one anomalous dataset and don't tell it which one to use, all possible values of labels are printed out for you so that you can just paste in the one you want.

You can also find out all the possible label strings to use by typing:

phenix.autosol display_labels=w1.mtz # display all labels for w1.mtz

MRSAD -- SAD dataset with an MR model; Phaser SAD phasing including the model

If you are carrying out SAD phasing with Phaser, you can carry out a combination of molecular replacement phasing and SAD phasing (MRSAD) by adding a single new keyword to your AutoSol run:

input_partpdb_file=MR.pdb

In this case the MR.pdb file will be used as a partial model in a maximum-likelihood SAD phasing calculation with Phaser to calculate phases and identify sites, and the combined MR+SAD phases will be written out. NOTE: At the moment the AutoBuild Wizard is not equipped to use these combined phases optimally in iterative model-building, density modification and refinement, because they contain both experimental phase information and model information. It is therefore possible that the resulting phases are biased by your MR model, and that this bias will not go away during iterative model-building because it is continually fed back in.

Using an MR model to find sites and as a source of phase information (method #2 for MRSAD)

You can also combine MR information with SAD phases (see J.P. Schuermann and J.J. Tanner, Acta Cryst. D59, 1731-1736 (2003)) in PHENIX by running the three Wizards AutoMR, AutoSol, and AutoBuild one after the other. This method does not use the partial model and the anomalous information in the SAD dataset simultaneously, as the Phaser maximum-likelihood method above does. On the other hand, the phases obtained in this method are independent of the model, so that combining them afterwards does not introduce model bias. (It is not yet clear which is the better approach, so you may wish to try both.)

Additionally, this approach can be used with any method of phasing. Here is a set of three simple commands to do this. First run AutoMR to find the molecular replacement solution, but don't rebuild it yet:

phenix.automr gene-5.pdb infl.sca copies=1 \
  RMS=1.5 mass=9800 rebuild_after_mr=False

Now your MR solution is in AutoMR_run_1_/MR.1.pdb and phases are in AutoMR_run_1_/MR.1.mtz.

Use these phases as input to AutoSol, along with some weak SAD data, still not building any new models:

phenix.autosol data=infl.sca \
  input_phase_file=AutoMR_run_1_/MR.1.mtz input_phase_labels="F PHIC FOM" \
  seq_file=sequence.dat build=False

Note that we have specified the data columns for F, PHI and FOM in the input_phase_file. For input_phase_file you must specify all three of these (if you leave out FOM it will be set to zero). AutoSol will write an MTZ file with experimental phases to phaser_xx.mtz, where xx depends on how many solutions are considered during the run. You will need to edit the next command, for running AutoBuild, depending on the value of xx:

phenix.autobuild data=AutoSol_run_1_/phaser_2.mtz \
  model=AutoMR_run_1_/MR.1.pdb seq_file=sequence.dat rebuild_in_place=False

AutoBuild will now take the phases from your AutoSol run and combine them with model-based information from your AutoMR MR solution, and will carry out iterative density modification, model-building and refinement to rebuild your model. Note that you may wish to set rebuild_in_place=True, depending on how good your MR model is.

SAD dataset, reading heavy-atom sites from a PDB file written by phenix.hyss

phenix.autosol 11 Pb data=deriv.sca seq_file=seq.dat \
  sites_file=deriv_hyss_consensus_model.pdb

This will carry out the usual structure solution process, but will read sites from deriv_hyss_consensus_model.pdb, try both hands, and carry on from there. If you know the hand of the substructure, you can fix it with have_hand=True.

MAD dataset

The inputs for a MAD dataset need to specify f_prime and f_double_prime for each wavelength. It also must be clear which datafile goes with which wavelength. If you input an MTZ file with multiple datasets, then the order of those datasets is assumed to be the same as the order of the wavelengths. You may want to either select particular datasets from your MTZ file (see below) or split such an MTZ file into separate files for each dataset if this does not work in the way you expect.

phenix.autosol seq_file=seq.dat sites=2 atom_type=Se \
  peak.data=w1.sca peak.f_prime=-8 peak.f_double_prime=4.5 \
  infl.data=w2.sca infl.f_prime=-9 infl.f_double_prime=1.9 \
  high.data=w3.sca high.f_prime=-5 high.f_double_prime=3.0

MAD dataset, selecting particular datasets from an MTZ file

This is similar to the SAD case. If you have an input MTZ file with more than one anomalous dataset, you can type something like:

phenix.autosol seq_file=seq.dat sites=2 atom_type=Se \
  peak.data=all_data.mtz peak.f_prime=-8 peak.f_double_prime=4.5 \
  high.data=all_data.mtz high.f_prime=-5 high.f_double_prime=3.0 \
  peak.labels='Fpeak SIGFpeak DANOpeak SIGDANOpeak' \
  high.labels='Fhigh SIGFhigh DANOhigh SIGDANOhigh'

This will carry out the usual structure solution, but will choose the input peak data columns based on the labels 'Fpeak SIGFpeak DANOpeak SIGDANOpeak', and the high data from the columns labelled 'Fhigh SIGFhigh DANOhigh SIGDANOhigh'.

As in the SAD case, you can find out all the possible label strings to use by typing:

phenix.autosol display_labels=w1.mtz # display all labels for w1.mtz

SIR dataset

The standard inputs for an SIR dataset are the native and derivative datafiles, the sequence file, the heavy-atom type, and the number of sites, as well as whether to use anomalous differences (or just isomorphous differences):

phenix.autosol native.data=native.sca deriv.data=deriv.sca \
  deriv.atom_type=I deriv.sites=2 deriv.inano=inano

This will set the heavy-atom type to iodine, look for 2 sites, and include anomalous differences.

SAD with more than one anomalously-scattering atom

You can tell the AutoSol Wizard to look for more than one anomalously-scattering atom type. Specify one atom type (Se) in the usual way. Then specify any additional ones like this if you are running AutoSol from the command line:

mad_ha_add_list="Br Pt"
mad_ha_add_f_prime_list=" -7 -10"
mad_ha_add_f_double_prime_list=" 4.2 12"

There must be the same number of entries in each of these three keyword lists. During phasing, Phaser will try to add whichever atom type best fits the scattering at each new site. This option is available for SAD phasing only.
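The parallel-list convention used by these keywords can be checked with a few lines of script before launching a long run. This is a standalone, hypothetical helper for illustration only, not part of PHENIX:

```python
# Hypothetical helper (not part of PHENIX): split the three space-separated
# keyword values and check that they line up one-to-one, since AutoSol
# requires the same number of entries in each list.
def parse_ha_lists(atoms, f_primes, f_double_primes):
    cols = [atoms.split(), f_primes.split(), f_double_primes.split()]
    if len({len(c) for c in cols}) != 1:
        raise ValueError("the three lists must have the same number of entries")
    return list(zip(cols[0], map(float, cols[1]), map(float, cols[2])))

# Matches the example above: Br with f'=-7, f''=4.2 and Pt with f'=-10, f''=12.
sites = parse_ha_lists("Br Pt", " -7 -10", " 4.2 12")
```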

MIR dataset

An MIR dataset is a set of more than one dataset. This cannot be readily expressed in the command-line inputs, but you can specify it easily with the PHENIX AutoSol GUI or with a script. In a script file you can say:

cell 93.796 79.849 43.108 90.000 90.000 90.00 # cell params
thoroughness thorough       # best to use thorough for MIR
resolution 2.8              # Resolution
expt_type sir               # MIR dataset is set of SIR datasets
input_seq_file sequence.dat

############## DATASET 1 ################
input_file_list rt_rd_1.sca auki_rd_1.sca # Native and deriv 1
nat_der_list Native Au                    # identify files by ha type
inano_list noinano inano                  # say if ano diffs to be used
n_ha_list 0 5                             # number of heavy-atoms
run_list start                            # read in datafiles for dataset
run_list read_another_dataset             # about to start a new dataset here

############## DATASET 2 ################
input_file_list rt_rd_1.sca hgki_rd_1.sca # Native and deriv 2
nat_der_list Native Hg
inano_list noinano inano
n_ha_list 0 5

#########################################

The script file carries out steps in the order that they are input. This allows us to read in one entire dataset, save it, then read in another one. The AutoSol Wizard will solve each dataset and then combine them and phase the combined dataset with SOLVE Bayesian correlated phasing, taking into account any correlations among the non-isomorphism and heavy-atom sites for the various derivatives.
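The order-dependence of the script can be pictured as a small interpreter that accumulates keyword lines into the current dataset and commits it whenever it sees run_list read_another_dataset. This is a schematic standalone sketch of the idea only, not the actual Wizard parser:

```python
# Schematic sketch of an order-dependent script reader: keyword lines update
# the current dataset, and run_list commands delimit datasets.  Illustrative
# only -- the real AutoSol parser is more elaborate.
def read_datasets(script_lines):
    datasets, current = [], {}
    for line in script_lines:
        line = line.split("#")[0].strip()      # drop comments
        if not line:
            continue
        key, _, value = line.partition(" ")
        if key == "run_list":
            if value.strip() == "read_another_dataset":
                datasets.append(current)       # commit and start a new dataset
                current = {}
            # "run_list start" just triggers reading of the current files
        else:
            current[key] = value.strip()
    datasets.append(current)                   # commit the final dataset
    return datasets

script = [
    "input_file_list rt_rd_1.sca auki_rd_1.sca  # Native and deriv 1",
    "run_list start",
    "run_list read_another_dataset",
    "input_file_list rt_rd_1.sca hgki_rd_1.sca  # Native and deriv 2",
]
```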

SIR + SAD datasets

A combination of SIR and SAD datasets is almost the same as an MIR dataset in the AutoSol Wizard. You specify each dataset separately, and put "start" and "read_another_dataset" between the datasets:

cell 93.796 79.849 43.108 90.000 90.000 90.00 # cell params
resolution 2.8              # Resolution
input_seq_file sequence.dat

############## DATASET 1 ################
expt_type sir                             # MIR dataset is set of SIR datasets
input_file_list rt_rd_1.sca auki_rd_1.sca # Native and deriv 1
nat_der_list Native Au                    # identify files by ha type
inano_list noinano inano                  # say if ano diffs to be used
n_ha_list 0 5                             # number of heavy-atoms
run_list start                            # read in datafiles for dataset
run_list read_another_dataset             # about to start a new dataset here

############## DATASET 2 ################
expt_type sad                    # our second dataset is SAD
input_file_list hgki_rd_1.sca    # anom diffs for SAD dataset
mad_ha_n 5                       # 5 sites

#########################################

The SIR and SAD datasets will be solved separately (but whichever one is solved first will use difference Fouriers or anomalous difference Fouriers to locate sites for the other). Then phases will be combined by addition of Hendrickson-Lattman coefficients and the combined phases will be density modified.

Possible Problems

General limitations

Specific limitations and problems

The size of the asymmetric unit in the SOLVE/RESOLVE portion of the AutoSol Wizard is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36) binaries. Larger-size versions can be obtained on request.

The command-line version of AutoSol cannot be used for MIR or for combining multiple datasets. The script and GUI versions can be used instead for these cases.

The AutoSol Wizard can take a maximum of 6 derivatives for MIR.

The AutoSol Wizard can take most settings of most space groups; however, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting, due to difficulties with the use of asuset in the version of the CCP4 libraries used in PHENIX for these settings and space groups.

Literature

Simple algorithm for a maximum-likelihood SAD function. A.J. McCoy, L.C. Storoni and R.J. Read. Acta Cryst. D60, 1220-1228 (2004)

[pdf]

Substructure search procedures for macromolecular structures. R.W. Grosse-Kunstleve and P.D. Adams. Acta Cryst. D59, 1966-1973 (2003)

[pdf]

MAD phasing: Bayesian estimates of FA. T.C. Terwilliger. Acta Cryst. D50, 11-16 (1994) [pdf]

Additional information

List of all AutoSol keywords

-------------------------------------------------------------------------------

Legend:
black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

-------------------------------------------------------------------------------
autosol

sites= None Number of heavy-atom sites. This is an alias for the keyword

mad_ha_n. (Command-line only)

sites_file= None PDB or plain-text file with ha sites. This is an alias for

the keyword ha_sites_file. (Command-line only)

atom_type= None Anomalously-scattering atom type. This is an alias for the

keyword mad_ha_type. (Command-line only)

seq_file= Auto Sequence file . This is an alias for the keyword

input_seq_file. (Command-line only)

quick= None Run everything quickly (thoroughness=quick) (Command-line only)

data= None Datafile. For command_line input it is easiest if each

wavelength of data is in a separate data file with obvious data

columns. File types that are easy to read include Scalepack sca files

, CNS hkl files, mtz files with just one wavelength of data, or just

native or just derivative. In this case the Wizard can read your data

without further information. If you have a datafile with many

columns, you can use the "labels" keyword to specify which data

columns to read. (It may be easier in some cases to use the GUI or to
split it with phenix.reflection_file_converter first, however.)

(Command-line only)

labels= None Specification string for data labels (Command_line only). To

find out what the appropriate strings are, type "phenix.autosol

display_labels=your-datafile-here.mtz"

f_prime= None F-prime value for any wavelength. (Command-line only)

f_double_prime= None F-doubleprime value for any wavelength. (Command_line

only)

special_keywords

write_run_directory_to_file= None Writes the full name of a run

directory to the specified file. This can

be used as a call-back to tell a script

where the output is going to go.

(Command-line only)

run_control

coot= None Set coot to True and optionally run=[run-number] to run Coot

with the current model and map for run run-number. In some wizards

(AutoBuild) you can edit the model and give it back to PHENIX to

use as part of the model-building process. If you just say coot

then the facts for the highest-numbered existing run will be

shown. (Command-line only)

ignore_blanks= None ignore_blanks allows you to have a command-line

keyword with a blank value like "input_lig_file_list="

stop= None You can stop the current wizard with "stopwizard" or "stop".

If you type "phenix.autobuild run=3 stop" then this will stop run

3 of autobuild. (Command-line only)

display_facts= None Set display_facts to True and optionally

run=[run-number] to display the facts for run run-number.

If you just say display_facts then the facts for the

highest-numbered existing run will be shown.

(Command-line only)

display_summary= None Set display_summary to True and optionally

run=[run-number] to show the summary for run

run-number. If you just say display_summary then the

summary for the highest-numbered existing run will be

shown. (Command-line only)

carry_on= None Set carry_on to True to carry on with highest-numbered

run from where you left off. (Command-line only)

run= None Set run to n to continue with run n where you left off.

(Command-line only)

copy_run= None Set copy_run to n to copy run n to a new run and continue

where you left off. (Command-line only)

display_runs= None List all runs for this wizard. (Command-line only)

delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)

display_labels= None display_labels=test.mtz will list all the labels

that identify data in test.mtz. You can use the label

strings that are produced in AutoSol to identify which

data to use from a datafile like this: peak.data="F+

SIGF+ F- SIGF-" # the entire string in quotes counts

here You can use the individual labels from these

strings as identifiers for data columns in AutoSol and

AutoBuild like this: input_refinement_labels="FP SIGFP

FreeR_flags" # each individual label counts

dry_run= False Just read in and check parameter names

params_only= False Just read in and return parameter defaults

display_all= False Just read in and display parameter defaults

peak

data= None Datafile for peak wavelength. (Command_line only)

labels= None Specification string for data labels for peak wavelength.

(Command_line only). To find out what the appropriate strings

are, type "phenix.autosol display_labels=your-datafile-here.mtz"

f_prime= None F-prime value for peak wavelength. (Command_line only)

f_double_prime= None F-doubleprime value for peak wavelength.

(Command_line only)

infl

data= None Datafile for infl wavelength. (Command_line only)

labels= None Specification string for data labels for infl wavelength.

(Command_line only). To find out what the appropriate strings

are, type "phenix.autosol display_labels=your-datafile-here.mtz"

f_prime= None F-prime value for infl wavelength. (Command_line only)

f_double_prime= None F-doubleprime value for infl wavelength.

(Command_line only)

high

data= None Datafile for high wavelength. (Command_line only)

labels= None Specification string for data labels for high wavelength.

(Command_line only). To find out what the appropriate strings

are, type "phenix.autosol display_labels=your-datafile-here.mtz"

f_prime= None F-prime value for high wavelength. (Command_line only)

f_double_prime= None F-doubleprime value for high wavelength.

(Command_line only)

low

data= None Datafile for low wavelength. (Command_line only)

labels= None Specification string for data labels for low wavelength.

(Command_line only). To find out what the appropriate strings

are, type "phenix.autosol display_labels=your-datafile-here.mtz"

f_prime= None F-prime value for low wavelength. (Command_line only)

f_double_prime= None F-doubleprime value for low wavelength.

(Command_line only)

remote

data= None Datafile for remote wavelength. (Command_line only)

labels= None Specification string for data labels for remote wavelength.

(Command_line only). To find out what the appropriate strings

are, type "phenix.autosol display_labels=your-datafile-here.mtz"

f_prime= None F-prime value for remote wavelength. (Command_line only)

f_double_prime= None F-doubleprime value for remote wavelength.

(Command_line only)

native

data= None Datafile for native . (Command_line only)

labels= None Specification string for data labels for native .

(Command_line only). To find out what the appropriate strings

are, type "phenix.autosol display_labels=your-datafile-here.mtz"

atom_type= Native Heavy-atom type for native . (Command_line only)

sites= 0 Number of heavy-atom sites for native . (Command_line only)

inano= *noinano inano anoonly Use anomalous differences for native .

(Command_line only)

deriv

data= None Datafile for deriv . (Command_line only)

labels= None Specification string for data labels for deriv .

(Command_line only). To find out what the appropriate strings

are, type "phenix.autosol display_labels=your-datafile-here.mtz"

atom_type= I Heavy-atom type for deriv . (Command_line only)

sites= 2 Number of heavy-atom sites for deriv . (Command_line only)

inano= noinano *inano anoonly Use anomalous differences for deriv .

(Command_line only)

crystal_info

cell= 0.0 0.0 0.0 0.0 0.0 0.0

Enter cell parameter a b c alpha beta

gamma

chain_type= *Auto PROTEIN DNA RNA You can specify whether to build

protein, DNA, or RNA chains. At present you can only build

one of these in a single run. If you have both DNA and

protein, build one first, then run AutoBuild again,

supplying the prebuilt model in the "input_lig_file_list"

and build the other. NOTE: default for this keyword is Auto,

which means "carry out normal process to guess this

keyword". The process is to look at the sequence file and/or

input pdb file to see what the chain type is. If there are

more than one type, the type with the larger number of

residues is guessed. If you want to force the chain_type,

then set it to PROTEIN RNA or DNA.

change_sg= False You can change the space group. In AutoSol the Wizard

will use ImportRawData and let you specify the sg and cell.

In AutoMR the wizard will give you an entry form to specify

them. NOTE: This only applies when reading in new datasets.

It does nothing when changed after datasets are read in.

residues= None Number of amino acid residues in the au (or equivalent)

resolution= 0.0

High-resolution limit. Used as resolution limit for

density modification and as general default high-resolution

limit. If resolution_build or refinement_resolution are set

then they override this for model-building or refinement. If

overall_resolution is set then data beyond that resolution

is ignored completely.

sg= None Space Group symbol (i.e., C2221 or C 2 2 21)

solvent_fraction= None Solvent fraction (typically 0.4 - 0.6)

decision_making

acceptable_quality= 40.0

You can specify the minimum overall quality of

a model (as defined by overall_score_method) to be

considered acceptable

acceptable_secondary_structure_cc= 0.35

You can specify the minimum

correlation of density from a

secondary structure model to be

considered acceptable

create_scoring_table= False Choose whether you want a scoring table for

solutions. A scoring table is slower but better.

desired_coverage= 0.8

Choose what probability you want to have that the

correct solution is in your current list of top

solutions. A good value is 0.80. If you set a low

value (0.01) then only one solution will be kept at

any time; if you set a high value, then many solutions

will be kept (and it will take longer).

ha_iteration= False Choose whether you want to iterate the heavy-atom

search. With iteration, sites are found with HYSS, then

used to phase and carry out quick density-modification,

then difference Fourier is used to find sites again and

improve their accuracy.

hklperfect= None Enter an mtz file with idealized coefficients for map

This will be compared with all maps calculated during

structure solution

max_cc_extra_unique_solutions= 0.5

Specify the maximum value of CC

between experimental maps for two

solutions to consider them substantially

different. Solutions that are within the

range for consideration based on

desired_coverage, but are outside of the

number of allowed max_choices, will be

considered, up to

max_extra_unique_solutions, if they have

a correlation of no more than

max_cc_extra_unique_solutions with all

other solutions to be tested.

max_choices= 3 Number of choices for solutions to put on screen

max_composite_choices= 8 Number of choices for composite solutions to

consider

max_extra_unique_solutions= 2 Specify the maximum number of solutions to

consider based on their uniqueness as well

as their high scores. Solutions that are

within the range for consideration based on

desired_coverage, but are outside of the

number of allowed max_choices, will be

considered, up to

max_extra_unique_solutions, if they have a

correlation of no more than

max_cc_extra_unique_solutions with all other

solutions to be tested.

max_ha_iterations= 2 Number of iterations of difference Fouriers in

searching for heavy-atom sites

max_range_to_keep= 4.0

The range of solutions to be kept is

range_to_keep * SD of the group of solutions. This

sets the maximum of range_to_keep

min_fom= 0.05

Minimum fom of a solution to keep it at all

min_fom_for_dm= 0.0

Minimum fom of a solution to density modify

(otherwise just copy over phases). This is useful in

cases where the phasing is so weak that density

modification does nothing or makes the phases worse.

min_phased_each_deriv= 1 You can require that the wizard phase at least

this number of solutions from each derivative,

even if they are poor solutions. Usually at least

1 is a good idea so that one derivative does not

dominate the solutions.

minimum_improvement= 0.0

Minimum improvement in score to continue ha

iteration

n_random= 6 Number of random solutions to generate when setting up

scoring table

overall_score_method= *BAYES-CC Z-SCORE You have 2 choices for an

overall scoring method: (1) Sum of individual

Z-scores (Z-SCORE) (2) Bayesian estimate of CC of

map to perfect model (BAYES-CC) You can specify

which scoring criteria to include with

score_type_list (default is SKEW CORR_RMS for

BAYES-CC and CC RFACTOR SKEW FOM for Z-SCORE.

Additionally, if NCS is present, NCS_OVERLAP is

used by default in the Z-SCORE method).

perfect_labels= None Labels for input data columns for hklperfect

Typical value: "FP PHIC FOM"

r_switch= 0.4

R-value criteria for deciding whether to use R-value or

residues built A good value is 0.40

random_scoring= False For testing purposes you can generate random

scores

res_eval= 0.0

Resolution for running resolve evaluation (usually 2.5 A)

score_individual_offset_list= None Offsets for individual scores in

CC-scoring. Each score will be multiplied

by the score_individual_scale_list value,

then score_individual_offset_list value is

added, to estimate the CC**2 value using

this score by itself. The uncertainty in

the CC**2 value is given by

score_individual_sd_list. NOTE: These

scores are not used in calculation of the

overall score. They are for information

only

score_individual_scale_list= None Scale factors for individual scores in

CC-scoring. Each score will be multiplied

by the score_individual_scale_list value,

then score_individual_offset_list value is

added, to estimate the CC**2 value using

this score by itself. The uncertainty in

the CC**2 value is given by

score_individual_sd_list. NOTE: These

scores are not used in calculation of the

overall score. They are for information

only

score_individual_sd_list= None Uncertainties for individual scores in

CC-scoring. Each score will be multiplied by

the score_individual_scale_list value, then

score_individual_offset_list value is added,

to estimate the CC**2 value using this score

by itself. The uncertainty in the CC**2 value

is given by score_individual_sd_list. NOTE:

These scores are not used in calculation of

the overall score. They are for information

only

score_overall_offset= None Overall offset for scores in CC-scoring. The

weighted scores will be summed, then all

multiplied by score_overall_scale, then

score_overall_offset will be added.

score_overall_scale= None Overall scale factor for scores in CC-scoring.

The weighted scores will be summed, then all

multiplied by score_overall_scale, then

score_overall_offset will be added.

score_overall_sd= None Overall SD of CC**2 estimate for scores in

CC-scoring. The weighted scores will be summed, then

all multiplied by score_overall_scale, then

score_overall_offset will be added. This is an

estimate of CC**2, with uncertainty about

score_overall_sd. Then the square root is taken to

estimate CC and SD(CC), where SD(CC) now depends on CC

due to the square root.

score_type_list= SKEW CORR_RMS You can choose what scoring methods to

include in scoring of solutions in AutoSol. (The

choices available are: CC_DENMOD RFACTOR SKEW

NCS_COPIES NCS_IN_GROUP TRUNCATE FLATNESS CORR_RMS

REGIONS CONTRAST FOM ) NOTE: If you are using

Z-SCORE or BAYES-CC scoring, the default is CC_RMS

RFACTOR SKEW FOM (and NCS_OVERLAP if ncs_copies >1).

score_weight_list= None Weights on scores for CC-scoring. Enter the

weight on each score in score_type_list. The weighted

scores will be summed, then all multiplied by

score_overall_scale, then score_overall_offset will

be added.

skip_score_list= NCS_OVERLAP You can evaluate some scores but not use

them. Include the ones you do not want to use in the

final score in skip_score_list.

use_perfect= False You can use the CC between each solution and

hklperfect in scoring. This is only for methods development

purposes.

density_modification

fix_xyz= False You can choose to not refine coordinates, and instead to

fix them to the values found by the heavy-atom search.

fix_xyz_after_denmod= False When sites are found after density

modification you can choose whether you want to

fix the coordinates to the values found in that

map.

hl_in_resolve= False AutoSol normally does not write out HL coefficients

in the resolve.mtz file with density-modified phases. You

can turn them on with hl_in_resolve=True

mask_cycles= 5 Number of mask cycles in density modification (5 is usual for thorough density modification)

mask_type= *histograms probability wang Choose method for obtaining

probability that a point is in the protein vs solvent region.

Default is "histograms". If you have a SAD dataset with a

heavy atom such as Pt or Au then you may wish to choose

"wang" because the histogram method is sensitive to very high

peaks. Options are: histograms: compare local rms of map and

local skew of map to values from a model map and estimate

probabilities. This one is usually the best. probability:

compare local rms of map to distribution for all points in

this map and estimate probabilities. In a few cases this one

is much better than histograms. wang: take points with

highest local rms and define as protein.

minor_cycles= 10 Number of minor cycles in density modification for each mask cycle (10 is usual for thorough density modification)

test_mask_type= True You can choose to have AutoSol test histograms/wang

methods for identifying solvent region based on the

final density modification r-factor.

thorough_denmod= False Choose whether you want quick density modification (this speeds it up and, for a terrible map, is sometimes better)

truncate_ha_sites_in_resolve= Auto *Yes No True False You can choose to

truncate the density near heavy-atom sites

at a maximum of 2.5 sigma. This is useful

in cases where the heavy-atom sites are

very strong, and rarely hurts in cases

where they are not. The heavy-atom sites

are specified with "input_ha_file"

use_ncs_in_denmod= True This script normally uses available ncs

information in density modification. Say No to skip

this. See also find_ncs

display

number_of_solutions_to_display= 1 Number of solutions to put on screen

and to write out

solution_to_display= 0 Solution number of the solution to display and

write out ( use 0 to let the wizard display the top

solution)

general

background= True When you specify nproc=nn, you can run the jobs in

background (default if nproc is greater than 1) or

foreground (default if nproc=1). If you set

run_command=qsub (or otherwise submit to a batch queue),

then you should set background=False, so that the batch

queue can keep track of your runs. There is no need to use

background=True in this case because all the runs go as

controlled by your batch system. If you use run_command=csh

(or similar, csh is default) then normally you will use

background=True so that all the jobs run simultaneously.

base_path= None You can specify the base path for files (default is

current working directory)

clean_up= False At the end of the entire run the TEMP directories will

be removed if clean_up is True. The default is No, keep these

directories. If you want to remove them after your run is

finished use a command like "phenix.autobuild run=1

clean_up=True"

coot_name= coot If your version of coot is called something else, then

you can specify that here.


data_quality= *moderate strong weak The defaults are set for you

depending on the anticipated data quality. You can choose

"moderate" if you are unsure.

debug= False You can have the wizard stop with error messages about the

code if you use debug. NOTE: you cannot use Pause with debug.

expt_type= *Auto mad sir sad Experiment type (MAD SIR SAD) NOTE: Please

treat MIR experiments as a set of SIR experiments. NOTE: The

default for this keyword is Auto which means "carry out

normal process to guess this keyword". If you have a single

file, then it is assumed to be SAD. If you specify

native.data and deriv.data it is SIR, if you specify

peak.data and infl.data it is MAD. If the Wizard does not

guess correctly, you can set it with this keyword.

extra_verbose= False Facts and possible commands will be printed every

cycle if Yes

i_ran_seed= 588459 Random seed (positive integer) for model-building

and simulated annealing refinement

max_wait_time= 100.0

You can specify the length of time (seconds) to

wait when testing the run_command. If you have a cluster

where jobs do not start right away you may need a longer

time to wait.

nbatch= 1 You can specify the number of processors to use (nproc) and

the number of batches to divide the data into for parallel jobs.

Normally you will set nproc to the number of processors

available and leave nbatch alone. If you leave nbatch as None it

will be set automatically, with a value depending on the Wizard.

This is recommended. The value of nbatch can affect the results

that you get, as the jobs are not split into exact replicates,

but are rather run with different random numbers. If you want to

get the same results, keep the same value of nbatch.

nproc= 1 You can specify the number of processors to use (nproc) and the

number of batches to divide the data into for parallel jobs.

Normally you will set nproc to the number of processors available

and leave nbatch alone. If you leave nbatch as None it will be

set automatically, with a value depending on the Wizard. This is

recommended. The value of nbatch can affect the results that you

get, as the jobs are not split into exact replicates, but are

rather run with different random numbers. If you want to get the

same results, keep the same value of nbatch.

resolve_size= _giant _huge _extra_huge *None Size for solve/resolve

("","_giant","_huge","_extra_huge")

run_command= csh When you specify nproc=nn, you can run the subprocesses

as jobs in background with csh (default) or submit them to

a queue with the command of your choice (i.e., qsub ). If

you have a multi-processor machine, use csh. If you have a

cluster, use qsub or the equivalent command for your

system. NOTE: If you set run_command=qsub (or otherwise

submit to a batch queue), then you should set

background=False, so that the batch queue can keep track of

your runs. There is no need to use background=True in this

case because all the runs go as controlled by your batch

system. If you use run_command=csh (or similar, csh is

default) then normally you will use background=True so that

all the jobs run simultaneously.
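Putting nproc, run_command and background together, typical command lines look like the following sketch (the "..." stands for your data and sequence keywords; these are illustrative fragments, not complete commands):

```shell
# Multi-processor workstation: run subprocesses with csh, in background
phenix.autosol ... nproc=4 run_command=csh background=True

# Cluster with a batch queue: submit with qsub and let the queue
# keep track of the jobs
phenix.autosol ... nproc=4 run_command=qsub background=False
```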

skip_xtriage= False You can bypass xtriage if you want. This will

prevent you from applying anisotropy corrections, however.

temp_dir= None Define a temporary directory (it must exist)

thoroughness= *quick thorough You can try to run quickly and see if you

can get a solution ("quick") or more thoroughly to get the

best possible solution ("thorough").

title= Run 1 AutoSol Sun Dec 7 17:46:23 2008 Enter any text you like to help identify what you did in this run

top_output_dir= None This is used in subprocess calls of wizards and to

tell the Wizard where to look for the STOPWIZARD file.

verbose= False Command files and other verbose output will be printed

heavy_atom_search

acceptable_cc_hyss= 0.2

Hyss will be run at up to n_add_res_max+1

resolutions starting with res_hyss and adding

increments of add_res_max/n_add_res_max. If the best

CC value is greater than acceptable_cc_hyss then no

more resolutions are tried.

add_res_max= 2.0

Hyss will be run at up to n_add_res_max+1 resolutions

starting with res_hyss and adding increments of

add_res_max/n_add_res_max. If the best CC value is greater

than acceptable_cc_hyss then no more resolutions are tried.

best_of_n_hyss= 1 Hyss will be run up to best_of_n_hyss_always times at

a given resolution. If the best CC value is greater than

good_cc_hyss and the number of sites found is at least

min_fraction_of_sites_found times the number expected

and Hyss was tried at least best_of_n_hyss times, then

the search is ended.

best_of_n_hyss_always= 10 Hyss will be run up to best_of_n_hyss_always

times at a given resolution. If the best CC value

is greater than good_cc_hyss and the number of

sites found is at least

min_fraction_of_sites_found times the number

expected and Hyss was tried at least

best_of_n_hyss times, then the search is ended.

good_cc_hyss= 0.3

Hyss will be run up to best_of_n_hyss_always times at

a given resolution. If the best CC value is greater than

good_cc_hyss and the number of sites found is at least

min_fraction_of_sites_found times the number expected and

Hyss was tried at least best_of_n_hyss times, then the

search is ended.
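The stopping rule shared by best_of_n_hyss, best_of_n_hyss_always, good_cc_hyss and min_fraction_of_sites_found can be paraphrased in a short sketch (illustrative only, not the actual HySS driver code):

```python
def hyss_search_ended(best_cc, n_found, n_expected, n_tries,
                      good_cc_hyss=0.3,
                      min_fraction_of_sites_found=1.0,
                      best_of_n_hyss=1):
    # The search ends once all three conditions hold: the best CC is
    # above good_cc_hyss, enough of the expected sites were found, and
    # HySS was tried at least best_of_n_hyss times.
    return (best_cc > good_cc_hyss
            and n_found >= min_fraction_of_sites_found * n_expected
            and n_tries >= best_of_n_hyss)
```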

hyss_enable_early_termination= True You can specify whether to stop HYSS

as soon as it finds a convincing solution

(Yes, default) or to keep trying...

hyss_general_positions_only= True Select Yes if you want HYSS only to

consider general positions and ignore sites

on special positions. This is appropriate

for SeMet or S-Met solutions, not so

appropriate for heavy-atom soaks

hyss_min_distance= 3.5

Enter the minimum distance between heavy-atom

sites to keep them in HYSS

hyss_n_fragments= 3 Enter the number of fragments in HYSS

hyss_n_patterson_vectors= 33 Enter the number of Patterson vectors to

consider in HYSS

hyss_random_seed= 792341 Enter an integer as random seed for HYSS

mad_ha_n= None Number of heavy atoms (anomalously-scattering atoms) in

the au

mad_ha_type= Se Enter the anomalously-scattering or heavy atom type. For

example, Se or Au. NOTE: if you want Phaser to add

additional heavy-atoms of other types, you can specify them

with mad_ha_add_list.

max_single_sites= 5 In sites_from_denmod a core set of sites that are

strong is identified. If the hand of the solution is

known then additional sites are added all at once up

to the expected number of sites. Otherwise sites are

added one at a time, up to a maximum number of tries

of max_single_sites

min_fraction_of_sites_found= 1.0

Hyss will be run up to

best_of_n_hyss_always times at a given

resolution. If the best CC value is greater

than good_cc_hyss and the number of sites

found is at least

min_fraction_of_sites_found times the

number expected and Hyss was tried at least

best_of_n_hyss times, then the search is

ended.

min_hyss_cc= 0.05

Minimum CC of a heavy-atom solution in HYSS to keep it

at all

n_add_res_max= 2 Hyss will be run at up to n_add_res_max+1 resolutions

starting with res_hyss and adding increments of

add_res_max/n_add_res_max. If the best CC value is

greater than acceptable_cc_hyss then no more resolutions

are tried.
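The resolution schedule described for res_hyss, add_res_max and n_add_res_max can be sketched as follows (illustrative, not the actual code; defaults taken from the values shown above):

```python
def hyss_resolution_schedule(res_hyss=3.5, add_res_max=2.0, n_add_res_max=2):
    # Up to n_add_res_max + 1 resolutions are tried, starting at
    # res_hyss and adding increments of add_res_max / n_add_res_max.
    # (The search stops early if the best CC exceeds acceptable_cc_hyss.)
    step = add_res_max / n_add_res_max
    return [res_hyss + i * step for i in range(n_add_res_max + 1)]
```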

input_files

cif_def_file_list= None You can enter any number of CIF definition

files. These are normally used to tell phenix.refine

about the geometry of a ligand or unusual residue.

You usually will use these in combination with "PDB

file with metals/ligands" (keyword

"input_lig_file_list" ) which allows you to attach

the contents of any PDB file you like to your model

just before it gets refined. You can use

phenix.elbow to generate these if you do not have a

CIF file and one is requested by phenix.refine

group_labels_list= None For command-line and script running of AutoSol,

you may wish to use keywords to specify which set of

data columns to be used from an MTZ or other file

type with multiple datasets. (From the GUI, it is

easy because you are prompted with the column

labels). You can do this by specifying a string that

identifies which dataset to include. All allowed

values of this identification string will be written

out any time AutoSol is run on this dataset like

this. NOTE: To specify a particular set of data you

can specify one of the following (this example is for

MAD data, specifying data for peak wavelength): ...:

peak.labels='F SIGF DANO SIGDANO' peak.labels='F(+)

SIGF(+) F(-) SIGF(-)' You can then use one of the

above commands on the command-line to identify the

dataset of interest. If you want to use a script

instead, you can specify N files in your

input_data_file_list, and then specify N values for

group_labels_list like this: group_labels_list

'F,SIGF,DANO,SIGDANO' 'F(+),SIGF(+),F(-),SIGF(-)'

This will take 'F,SIGF,DANO,SIGDANO' as the data for

datafile 1 and 'F(+),SIGF(+),F(-),SIGF(-)' for

datafile 2 You can identify one dataset from each

input file in this way. If you want more than one,

then please use phenix.reflection_file_converter to

split your input file, or else use the GUI version of

AutoSol in which you can select any subset of the

data that you wish.

input_file_list= None Input data files: Any standard format is fine. If

all files are Scalepack premerged or all are Scalepack

unmerged original index then they will be used as is.

In all other cases all files are converted next to

Scalepack premerged.

input_ha_file= None If the flag "truncate_ha_sites_in_resolve" is set

then density at sites specified with input_ha_file is

truncated to improve the density modification procedure.


input_phase_file= None MTZ data file with FC PHIC or equivalent to use

for finding heavy-atom sites with difference Fourier

methods.

input_refinement_file= None Data file to use for refinement. The data in

this file should not be corrected for anisotropy.

It will be combined with experimental phase

information for refinement. If you leave this

blank, then the output of phasing will be used in

refinement (see below). If no anisotropy

correction is applied to the data you do not need

to specify a datafile for refinement. If an

anisotropy correction is applied to the data

files, then you must enter a datafile for

refinement if you want to refine your model. (See

"correct_aniso" for specifying whether an

anisotropy correction is applied. In most cases

it is not.) If an anisotropy correction is

applied and no refinement datafile is supplied,

then no refinement will be carried out in the

model-building step. You can choose any of your

datafiles to be the refinement file, or a native

that is not part of the datasets for structure

solution. If there are more than one dataset you

will be asked each time for a refinement file,

but only the last one will be used. Any

standard format is fine; normally only F and sigF

will be used. Bijvoet pairs and duplicates will

be averaged. If an mtz file is provided then a

free R flag can be read in as well. If you do

not provide a refinement file then the structure

factors from the phasing step will be used in

refinement. This is normally satisfactory for SAD

data and MIR data. For MAD data you may wish to

supply a refinement file because the structure

factors from phasing are a combination of data

from different wavelengths of data. It is better

if you choose your best wavelength of data for

refinement.

input_refinement_labels= None Labels for input refinement file columns

(FP SIGFP FreeR_flag)

input_seq_file= Auto Enter name of file with 1-letter code of protein

sequence NOTES: 1. lines starting with > are ignored

and separate chains 2. FASTA format is fine 3. If

there are multiple copies of a chain, just enter one

copy. 4. If you enter a PDB file for rebuilding and it

has the sequence you want, then the sequence file is not

necessary. NOTE: You can also enter the name of a PDB

file that contains SEQRES records, and the sequence from

the SEQRES records will be read, written to

seq_from_seqres_records.dat, and used as your input

sequence. NOTE: for AutoBuild you can specify

start_chains_list on the first line of your sequence

file: >> start_chains_list 23 11 5 NOTE: default

for this keyword is Auto, which means "carry out normal

process to guess this keyword". This means if you

specify "after_autosol" in AutoBuild, AutoBuild will

automatically take the value from AutoSol. If you do not

want this to happen, you can specify None which means

"No file"

refine_eff_file_list= None You can enter any number of refinement

parameter files. These are normally used to tell

phenix.refine defaults to apply, as well as

creating specialized definitions such as unusual

amino acid residues and linkages. These

parameters override the normal phenix.refine

defaults. They themselves can be overridden by

parameters set by the Wizard and by you,

controlling the Wizard. NOTE: Any parameters set

by AutoBuild directly (such as

number_of_macro_cycles, high_resolution, etc...)

will not be taken from this parameters file. This

is useful only for adding extra parameters not

normally set by AutoBuild.

model_building

add_sidechains= True Add side chains on to main-chain in Textal

model-building. This requires a sequence file

build= True Build model after density modification?

build_type= RESOLVE_AND_TEXTAL *RESOLVE TEXTAL You can choose to build

models with RESOLVE and TEXTAL or either one, and how many

different models to build with RESOLVE. The more you build,

the more likely to get a complete model. Note that

rebuild_in_place can only be carried out with RESOLVE

model-building

capra= True CAPRA is used to place CA atoms

cc_helix_min= None Minimum CC of helical density to map at low

resolution when using helices_strands_only

cc_strand_min= None Minimum CC of strand density to map when using

helices_strands_only

d_max_textal= 1000.0

This low-resolution limit is only used for Textal

model-building

d_min_textal= 2.8

Textal has an optimal high-resolution limit of 2.8 A.

This limit is only used for Textal model-building

fit_loops= True You can fit loops automatically if sequence alignment

has been done.

group_ca_length= 4 In resolve building you can specify how short a

fragment to keep. Normally 4 or 5 residues should be

the minimum.

group_length= 2 In resolve building you can specify how many fragments

must be joined to make a connected group that is kept.

Normally 2 fragments should be the minimum.

helices_strands_only= False You can choose to use a quick model-building

method that only builds secondary structure. At

low resolution this may be both quicker and more

accurate than trying to build the entire structure.

If you are running the AutoSol Wizard, normally

you should choose 'Yes' and use the quick

model-building. Then when your structure is solved

by AutoSol, go on to AutoBuild and build a more

complete model (this time normally using

helices_strands_only=False).

helices_strands_start= True You can choose to use a quick model-building

method that builds secondary structure as a way

to get started...then model completion is done as

usual. (Contrast with helices_strands_only which

only does secondary structure)

input_compare_file= None If you are rebuilding a model or already think

you know what the model should be, you can include a

comparison file in rebuilding. The model is not used

for anything except to write out information on

coordinate differences in the output log files.

NOTE: this feature does not always work correctly.


loop_cc_min= 0.4

You can specify the minimum correlation of density from

a loop with the map.

n_cycle_build= 3 Choose number of cycles (3). This does not apply if

TEXTAL is selected for build_type

n_random_frag= 0 In resolve building you can randomize each fragment

slightly so as to generate more possibilities for tracing

based on extending it.

n_random_loop= 3 Number of randomized tries from each end for building

loops If 0, then one try. If N, then N additional tries

with randomization based on rms_random_loop.

ncycle_refine= 3 Choose number of refinement cycles (3)

number_of_builds= 2 Number of different solutions to build models for

number_of_models= 3 This parameter lets you choose how many initial

models to build with RESOLVE within a single build

cycle. This parameter is now superseded by

number_of_parallel_models, which sets the number of

models (now entire build cycles) to carry out in

parallel. A zero means set it automatically. That is

what you normally should use. The number_of_models is

by default set to 1 and number_of_parallel_models is

set to the value of nbatch (typically 4).

offsets_list= 53 7 23 You can specify an offset for the orientation of

the helix and strand templates in building. This is used

in generating different starting models.

quick_build= False Choose whether you want quick model-building (this speeds it up and, for poor maps, is sometimes better)

rebuild_side_chains= False You can choose to replace side chains (with

extend_only) before rebuilding the model (not

normally used)

refine= False This script normally refines the model during building.

Say No to skip refinement

resolution_build= 0.0

Enter the high-resolution limit for

model-building. If 0.0, the value of resolution is

used as a default.

resolve_command_list= None Commands for resolve. One per line in the

form: keyword value value can be optional

Examples: coarse_grid resolution 200 2.0 hklin

test.mtz NOTE: for command-line usage you need to

enclose the whole set of commands in double quotes

(") and each individual command in single quotes

(') like this: resolve_command_list="'no_build'

'b_overall 23' "

retrace_before_build= False You can choose to retrace your model n_mini

times and use a map based on these retraced models

to start off model-building. This is the default

for rebuilding models if you are not using

rebuild_in_place. You can also specify

n_iter_rebuild, the number of cycles of

retrace-density-modify-build before starting the

main build.

rms_random_frag= None Rms random position change added to residues on

ends of fragments when extending them If you enter a

negative number, defaults will be used.

rms_random_loop= None Rms random position change added to residues on

ends of loops in tries for building loops If you enter

a negative number, defaults will be used.

semet= False You can specify that the dataset that is used for

refinement is a selenomethionine dataset, and that the model

should be the SeMet version of the protein, with all SD of MET

replaced with Se of MSE.


solve_command_list= None Commands for solve. One per line in the form:

keyword value value can be optional Examples:

verbose resolution 200 2.0

start_chains_list= None You can specify the starting residue number for

each of the unique chains in your structure. If you

use a sequence file then the unique chains are

extracted and the order must match the order of your

starting residue numbers. For example, if your

sequence file has chains A and B (identical) and

chains C and D (identical to each other, but

different than A and B) then you can enter 2 numbers,

the starting residues for chains A and C. NOTE: you

need to specify an input sequence file for

start_chains_list to be applied.

thorough_loop_fit= True Try many conformations and accept them even if

the fit is not perfect? If you say Yes the parameters

for thorough loop fitting are: n_random_loop=100

rms_random_loop=0.3 rho_min_main=0.5 while if you say

No those for quick loop fitting are: n_random_loop=20

rms_random_loop=0.3 rho_min_main=1.0
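The two loop-fitting parameter sets quoted above, side by side (values copied from the description; for reference only):

```python
# thorough_loop_fit=True vs. thorough_loop_fit=False (quick), per the
# AutoSol description of this keyword.
loop_fit_presets = {
    "thorough": {"n_random_loop": 100, "rms_random_loop": 0.3,
                 "rho_min_main": 0.5},
    "quick":    {"n_random_loop": 20,  "rms_random_loop": 0.3,
                 "rho_min_main": 1.0},
}
```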

trace_as_lig= False You can specify that in building steps the ends of

chains are to be extended using the LigandFit algorithm.

This is default for nucleic acid model-building.

use_any_side= False You can choose to have resolve model-building place

the best-fitting side chain at each position, even if the

sequence is not matched to the map.

use_met_in_align= Auto *Yes No True False You can use the heavy-atom

positions in input_ha_file as markers for Met SD

positions.

ncs

find_ncs= Auto *Yes No True False This script normally deduces ncs

information from the NCS in chains of models that are built

during iterative model-building. The update is done each cycle

in which an improved model is obtained. Say No to skip this.

See also "input_ncs_file" which can be used to specify NCS at

the start of the process. If find_ncs="No" then only this

starting NCS will be used and it will not be updated. You can

use find_ncs "No" to specify exactly what residues will be

used in NCS refinement and exactly what NCS operators to use

in density modification. You can use the function

$PHENIX/phenix/phenix/command_line/simple_ncs_from_pdb.py to

help you set up an input_ncs_file that has your specifications

in it.

ncs_copies= None Number of copies of the molecule in the au (note: only

one type of molecule allowed at present)

ncs_refine_coord_sigma_from_rmsd= False You can choose to use the

current NCS rmsd as the value of the

sigma for NCS restraints. See also

ncs_refine_coord_sigma_from_rmsd_ratio

ncs_refine_coord_sigma_from_rmsd_ratio= 1.0

You can choose to multiply

the current NCS rmsd by this

value before using it as the

sigma for NCS restraints. See

also

ncs_refine_coord_sigma_from_rmsd

optimize_ncs= True This script normally deduces ncs information from the

NCS in chains of models that are built during iterative

model-building. Optimize NCS adds a step to try and make

the molecule formed by NCS as compact as possible, without

losing any point-group symmetry.

refine_with_ncs= True This script can allow phenix.refine to

automatically identify NCS and use it in refinement.

NOTE: ncs refinement and placing waters automatically

are mutually exclusive at present.

phasing

do_madbst= True Choose whether you want to skip FA calculation (speeds

it up)

f_doubleprime_list= None Enter f" for the heavy-atom for this dataset

f_prime_list= None Enter f' for the heavy-atom for this dataset

fixscattfactors= True For SOLVE phasing and MAD data you can choose

whether scattering factors are to be fixed by choosing

'Yes' to fix them or 'No' to refine them. Normally

choose 'Yes' (fix) if the data are weak and 'No'

(refine) if the data are strong.

ha_sites_file= None Input sites file... with xyz in fractional

coordinates or a PDB file with coordinates NOTE: This

file is optional if you specify a partial model file

have_hand= False Normally you will not know the hand of the heavy-atom

substructure, so have_hand=False. However if you do know it

(you got the sites from a difference Fourier or you know the

answer another way) you can specify that the hand is known.

id_scale_ref= None By default the datafile with the highest resolution

is used for the first step in scaling of MAD data. You can

choose to use any of the datafiles in your MAD dataset.

ikeepflag= 1 You can choose to keep all reflections in merging steps.

This is separate from rejecting reflections with high iso or

ano diffs. Default=1 (keep them)

inano_list= None Choose 'inano' for including anomalous differences and

'noinano' not to include them and 'anoonly' for just

anomalous differences (no isomorphous differences)

input_partpdb_file= None You can enter a PDB file (usually from

molecular replacement) for use in identifying

heavy-atom sites and phasing. NOTE 1: This procedure

works best if the model is refined. NOTE 2: This

file is only used in SAD phasing with Phaser on a

single dataset. In all other cases it is ignored.

NOTE 3: The output phases in phaser_xx.mtz will

contain both SAD and model information. They are not

completely suitable for use with AutoBuild or other

iterative model-building procedures because the

phases are not entirely experimental (but they may

work).

input_phase_labels= None Labels for FC and PHIC for data file with FC

PHIC or equivalent to use for finding heavy-atom

sites with difference Fourier methods.

mad_ha_add_f_double_prime_list= None F-double_prime values of additional

heavy-atom types. You must specify the

same number of entries of

mad_ha_add_f_double_prime_list as you do

for mad_ha_add_f_prime_list and for

mad_ha_add_list.

mad_ha_add_f_prime_list= None F-prime values of additional heavy-atom

types. You must specify the same number of

entries of mad_ha_add_f_prime_list as you do

for mad_ha_add_f_double_prime_list and for

mad_ha_add_list.

mad_ha_add_list= None You can specify heavy atom types in addition to

the one you named in mad_ha_type. The heavy-atoms found

in initial HySS searches will be given the type of

mad_ha_type, and Phaser (if used for phasing) will try

to find additional heavy atoms of both the type

mad_ha_type and any listed in mad_ha_add_list. You must

also specify the same number of mad_ha_add_f_prime_list

entries and of mad_ha_add_f_double_prime_list entries.

n_ha_list= None Enter a guess of number of HA sites

nat_der_list= None Enter 'Native' or a heavy-atom symbol (Pt, Se)

overallscale= False You can choose to have only an overall scale factor

for this dataset (no local scaling applied). Use this if

your data is already fully scaled.

partpdb_rms= 1.0

phase_full_resolution= True You can choose to use the full resolution of

the data in phasing, instead of using the

recommended_resolution. This is always a good

idea with Phaser phases.

phaser_completion= True You can choose to use phaser log-likelihood

gradients to complete your heavy-atom sites. This can

be used with or without the ha_iteration option.

phasing_method= SOLVE *PHASER You can choose to phase with SOLVE or with

Phaser. (Only applies to SAD phasing at present)

ratio_out= 3.0

You can choose the ratio of del ano or del iso to the rms

in the shell for rejection of a reflection. Default = 4.

read_sites= False Choose if you want to enter ha sites from a file The

name of the file will be requested after scaling is

finished. The file can have sites in fractional coordinates

or be a PDB file.

require_nat= True Choose yes to skip any reflection with no native (for

SIR) or no data (MAD/SAD) or where anom difference is very

large. This keyword (default=Yes) allows the routines in

SOLVE to remove reflections with an implausibly large

anomalous difference (greater than ratio_out times the rms

anomalous difference).

res_hyss= None Resolution for running HYSS (usually 3.5 A is fine)

res_phase= 0.0

Enter the high-resolution limit for phasing

skip_extra_phasing= Auto Yes *No True False You can choose to skip an

extra phasing step to speed up the process

use_phaser_hklstart= True You can choose to start density modification

with FWT PHWT from Phaser (Only applies to SAD

phasing at present)

wavelength_list= None Enter wavelength of x-ray data (A)

refinement

link_distance_cutoff= 3.0

You can specify the maximum bond distance for

linking residues in phenix.refine called from the

wizards.

ordered_solvent_low_resolution= None You can choose what resolution

cutoff to use for placing ordered solvent

in phenix.refine. If the resolution of

refinement is greater than this cutoff,

then no ordered solvent will be placed,

even if

refinement.main.ordered_solvent=True.

place_waters= True You can choose whether phenix.refine automatically

places ordered solvent (waters) during the refinement

process.

r_free_flags_fraction= 0.1

Maximum fraction of reflections in the free R

set. You can choose the maximum fraction of

reflections in the free R set and the maximum

number of reflections in the free R set. The

number of reflections in the free R set will be

the lower of the values defined by these two

parameters.

http://phenix-online.org/documentation/autosol.htm (27 of 29) [12/14/08 1:00:42 PM]


Automated structure solution with AutoSol

r_free_flags_lattice_symmetry_max_delta= 5.0

You can set the maximum

deviation of distances in the

lattice that are to be

considered the same for

purposes of generating a

lattice-symmetry-unique set of

free R flags.

r_free_flags_max_free= 2000 Maximum number of reflections in the free R

set. You can choose the maximum fraction of

reflections in the free R set and the maximum

number of reflections in the free R set. The

number of reflections in the free R set will be

the lower of the values defined by these two

parameters.

r_free_flags_use_lattice_symmetry= True When generating r_free_flags you

can decide whether to include lattice

symmetry (good in general, necessary

if there is twinning).

refine_b= True You can choose whether phenix.refine is to refine

individual atomic displacement parameters (B values)

refine_se_occ= True You can choose to refine the occupancy of SE atoms

in a SEMET structure (default=Yes). This only applies if

semet=true

refinement_resolution= 0.0

Enter the high-resolution limit for

refinement only. This high-resolution limit can

be different from the high-resolution limit for

other steps. The default ("None" or 0.0) is to

use the overall high-resolution limit for this

run (as set by 'resolution')

use_mlhl= True This script normally uses information from the input file

(HLA HLB HLC HLD) in refinement. Say No to only refine on Fobs
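A sketch combining several of the refinement-scope keywords above on one command line (the data file, sequence file, and numeric values are hypothetical):

```shell
# Hypothetical sketch: refinement-scope keywords for phenix.autosol
phenix.autosol sad.sca seq.dat refinement_resolution=2.2 refine_b=True \
  place_waters=False r_free_flags_fraction=0.05 r_free_flags_max_free=1000
```

With these settings the free R set will contain the lower of 5% of the reflections and 1000 reflections.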

scaling

b_overall= None If an anisotropy correction is applied, you can choose

to set the overall B of the data to a specific value with

b_overall. See also "correct_aniso"

correct_aniso= *Auto Yes No True False Choose if you want to apply a

correction for anisotropy to the data. Yes means always

apply correction, No means never apply it, Auto means

apply it if the data is severely anisotropic

(recommended=Auto). If you set correct_aniso=Auto then if

the range of anisotropic B-factors is greater than

delta_b_for_auto_correct_aniso and the ratio of the

largest to the smallest less than

ratio_b_for_auto_correct_aniso then the correction will

be applied. Anisotropy correction will be applied to all

input data before scaling. The default overall B factor

will be the minimum of the b-factors in any direction of

the original data. To set this to another value, use

"b_overall"

delta_b_for_auto_correct_aniso= 20.0

Choose what range of aniso B values

is so big that you want to correct for

anisotropy by default. Both ratio_b and

delta_b must be large to correct. See

also ratio_b_for_auto_correct_aniso. See

also "correct_aniso" which overrides

this default if set to "Yes"

ratio_b_for_auto_correct_aniso= 1.5

Choose what ratio aniso B values is

so big that you want to correct for

anisotropy by default. Both ratio_b and

delta_b must be large to correct. See

also delta_b_for_auto_correct_aniso. See

also "correct_aniso" which overrides

this default if set to "Yes"

test_correct_aniso= True Choose whether you want to try applying or not

applying an anisotropy correction if the run fails.

First your original selection for applying or not

will be tried, and then the opposite will be tried

if the run fails.
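The scaling-scope keywords above might be combined as follows. The data file, sequence file, and numeric values are hypothetical; with correct_aniso=Auto the correction is applied only when both thresholds are exceeded:

```shell
# Hypothetical sketch: anisotropy-correction keywords for phenix.autosol
phenix.autosol w1.sca seq.dat correct_aniso=Auto b_overall=25 \
  delta_b_for_auto_correct_aniso=20 ratio_b_for_auto_correct_aniso=1.5
```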


Automated molecular replacement with AutoMR

Author(s)

Purpose

Purpose of the AutoMR Wizard

Usage

Summary of inputs and outputs for AutoMR

Output files from AutoMR

How to run the AutoMR Wizard

Components, copies, search models, and ensembles

What the AutoMR wizard needs to run

Specifying which columns of data to use from input data files

Examples

Standard AutoMR run with coords.pdb native.sca

Specifying data columns

Specifying a refinement file for AutoBuild

Passing any commands to AutoBuild

AutoMR searching for 2 components

Specifying molecular masses of 2 components

AutoMR searching for 2 components, but specifying the orientation of one of them

Possible Problems

Specific limitations and problems

Literature

Additional information

List of all AutoMR keywords

Author(s)

Phaser: Randy J. Read, Airlie J. McCoy and Laurent C. Storoni

AutoMR Wizard: Tom Terwilliger, Laurent Storoni, Randy Read, and Airlie McCoy

PHENIX GUI and PDS Server: Nigel W. Moriarty

● phenix.xtriage: Peter Zwart

Purpose

Purpose of the AutoMR Wizard

The AutoMR Wizard provides a convenient interface to Phaser molecular replacement and feeds the results of molecular replacement directly into the AutoBuild Wizard for automated model rebuilding.

The AutoMR Wizard begins with datafiles with structure factor amplitudes and uncertainties, a search model or models, and identifies placements of the search models that are compatible with the data.

Usage

The AutoMR Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user.

http://phenix-online.org/documentation/automr.htm (1 of 16) [12/14/08 1:00:50 PM]

See Running a Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The

command-line version will be described here.

NOTE: You may find it easiest to run the GUI version of AutoMR when you are learning how to use it, and then to move to the command-line or script versions later, as the GUI version will take you through all the necessary steps of organizing your data.

Summary of inputs and outputs for AutoMR

Input data file. This file can be in almost any format, and must contain either amplitudes or intensities and sigmas. You can specify what resolution to use for molecular replacement and separately what resolution to use for model rebuilding. If you specify "0.0" for resolution (recommended) then defaults will be used for molecular replacement (i.e. use data to 2.5A if available to solve the structure, then carry out rigid-body refinement of the final solution with all data) and all the data will be used for model rebuilding.

Composition of the asymmetric unit. PHASER needs to know what the total mass in the asymmetric unit is (i.e. not just the mass of the search models). You can define this either by specifying one or more protein or nucleic acid sequence files, or by specifying protein or nucleic acid molecular masses, and telling the Wizard how many copies of each are present.

Space groups to search. You can request that all space groups with the same point group as the one you start out with be searched, and the best one be chosen. If you select this option then the best space group will be used for model rebuilding in AutoBuild.

Ensembles to search for. AutoMR builds up a model by finding a set of good positions and orientations of one "ensemble", and then using each of those placements as starting points for finding the next ensemble, until all the contents of the asymmetric unit are found and a consistent solution is obtained. You can specify any number of different ensembles to search for, and you can search for any number of copies of each ensemble. The order of searching for ensembles does make a difference. If possible, you want to search for the biggest, best-ordered, most accurate ensemble first. You specify the order when you list the ensembles to search for on the last main window of the AutoMR wizard.

Each ensemble can be specified by a single PDB file or a set of PDB files. The contents of one set of PDB files for an ensemble must all be oriented in the same way, as they will be put together and used as a group always in the molecular replacement process.

You will need to specify how similar you think each input PDB file that is part of an ensemble is to the structure that is in your crystal. You can specify either sequence identity, or expected rmsd. Note that if you use a homology model, you should give the sequence identity of the template from which the model was constructed, not the 100% identity of the model!

Output of AutoMR

Output files from AutoMR

When you run AutoMR the output files will be in a subdirectory with your run number:

AutoMR_run_1_/ # subdirectory with results

A summary file listing the results of the run and the other files produced:

AutoMR_summary.dat # overall summary

A warnings file listing any warnings about the run

AutoMR_warnings.dat # any warnings

A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)

AutoMR_Facts.dat # all Facts about the run

Molecular replacement model, structure factors, and map coefficients:

MR.1.pdb

MR.1.mtz

MR.MAP_COEFFS.1.mtz

The AutoMR wizard writes out MR.1.pdb, MR.1.mtz, and MR.MAP_COEFFS.1.mtz, as well as output log files. The MR.1.pdb file will contain all the components of your MR solution. If there are multiple PDB files in an ensemble, the model with the lowest estimated rmsd is chosen to represent the whole ensemble and is written to MR.1.pdb. If there are multiple copies of a model, the chains are lettered sequentially A B C... The MR.1.mtz file contains the data from your input file to the full resolution available. The MR.MAP_COEFFS.1.mtz file contains sigmaA-weighted 2Fo-Fc map coefficients based on the rigid-body-refined model.

Model rebuilding. After PHASER molecular replacement the AutoMR Wizard loads the AutoBuild Wizard and sets the defaults based on the MR solution that has just been found. You can use the default values, or you may choose to use 2Fo-Fc maps instead of density-modified maps for rebuilding, or you may choose to start the model-rebuilding with the map coefficients from MR.MAP_COEFFS.1.mtz.

How to run the AutoMR Wizard

Running the AutoMR Wizard is easy. For example, from the command-line you can type: phenix.automr native.sca search.pdb RMS=0.8 mass=23000 copies=1

The AutoMR Wizard will find the best location and orientation of the search model search.pdb in the unit cell based on the data in native.sca, assuming that the RMSD between the correct model and search.pdb is about 0.8 A, that the molecular mass of the true model is 23000, and that there is 1 copy of this model in the asymmetric unit. Once the AutoMR Wizard has found a solution, it will automatically call the AutoBuild Wizard and rebuild the model.

Components, copies, search models, and ensembles

Your structure is composed of one or more components, such as a 20 kDa subunit with sequence seq-of-20Kd-subunit.

There may be one or more copies of each component in your structure.

You can search for the location(s) of a component with a search model that consists of a single structure or an ensemble of structures.

What the AutoMR wizard needs to run

In a simple case where you have one search model and are looking for N copies of this model in your structure, you need:

(1) a datafile name (native.sca or data=native.sca)

(2) a search model (search_model.pdb or coords=search_model.pdb)

(3) how similar the search model is to your structure ( RMS=0.8 or identity=75)

(4) information about the contents of the asymmetric unit: (mass=23000 or seq_file=seq.dat) and (copies=1)

It may be advantageous to search using an ensemble of similar structures, rather than a single structure. If you have an ensemble of search models to search for, then specify it as coords='model_1.pdb model_2.pdb model_3.pdb'

In this case you need to give the RMS or identity for each model: identity='45 40 35'. Each of the models in the ensemble must be in the same orientation as the others, so that the ensemble of models can be placed as a group in the unit cell.

If you are searching for more than one ensemble, or if there is more than one component in the a.u., then use the full syntax and specify them as (NOTE copies becomes copies_to_find or component_copies): ensemble_1.coords=s1.pdb ensemble_1.RMS=0.8 ensemble_1.copies_to_find=1 \

component_1.mass=23000 component_1.component_copies=1

Specifying which columns of data to use from input data files

If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.

For example, if your data file data.mtz has columns F SIGF then you might specify data=data.mtz

input_label_string="F SIGF"

You can find out all the possible label strings in a data file that you might use by typing: phenix.autosol display_labels=data.mtz # display all labels for data.mtz

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at

Running a Wizard from a

GUI, the command-line, or a script for how to do this. Some of the most common parameters are:

data=w1.sca # data file model=coords.pdb # starting model seq_file=seq.dat # sequence file

Examples

Standard AutoMR run with coords.pdb native.sca


Run AutoMR using coords.pdb as search model, native.sca as data, assume the RMS between coords.pdb and the true model is about 0.85 A, the sequence of the true model is seq.dat and there is 1 copy in the asymmetric unit: phenix.automr coords.pdb native.sca RMS=0.85 seq.dat copies=1 \

n_cycle_rebuild_max=2 n_cycle_build_max=2

Specifying data columns

Run AutoMR as above, but specify the data columns explicitly: phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \

data=data.mtz input_label_string="F SIGF" \

n_cycle_rebuild_max=2 n_cycle_build_max=2

Note that the data columns are specified by a string that includes both F and SIGF : "F SIGF". The string must match some set of data labels that can be extracted automatically from your data file. You can find the possible values of this string as described above with phenix.automr display_labels=data.mtz

Specifying a refinement file for AutoBuild

Run AutoMR as above, but specify a refinement file that is different from the file used for the MR search: phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \

data=data.mtz input_label_string="F SIGF" \

input_refinement_file=refinement.mtz \

input_refinement_labels="FP SIGFP FreeR_flag" \

n_cycle_rebuild_max=2 n_cycle_build_max=2

Note that the commands input_refinement_file and input_refinement_labels are in the scope "autobuild_variables". These commands and others with this prefix are passed on to AutoBuild.

Passing any commands to AutoBuild

You can pass any AutoBuild commands on to AutoBuild, even if they are not already defined for you in

AutoMR. Use the command autobuild_input_list_add to add a command, and then apply that command by adding "autobuild_" to the beginning of the command name. For example, to add the commands semet=True and refine=False: phenix.automr coords.pdb RMS=0.85 seq.dat copies=1 \

data=data.mtz input_label_string="F SIGF" \

autobuild_input_list_add='semet refine' \

semet=True \

refine=False

Notes. This applies only to command-line operation of AutoMR. Note that any keywords that are used in both AutoBuild and AutoMR will apply to both if you specify them in autobuild_input_list_add. For example if you set the resolution in AutoBuild with autobuild_input_list_add=resolution and resolution=2.6 then this resolution will apply to both AutoMR and AutoBuild.

AutoMR searching for 2 components


Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (beta.seq and blip.seq) and number of copies of each component (1). Define the search models with PDB files and estimated RMS from true structures. phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma" \

resolution=0.0 resolution_build=3.0 \

component_1.component_type=protein component_1.seq_file=beta.seq \

component_1.component_copies=1 \

component_2.component_type=protein component_2.seq_file=blip.seq \

component_2.component_copies=1 \

ensemble_1.coords=beta.pdb ensemble_1.RMS=0.85 ensemble_1.copies_to_find=1 \

ensemble_2.coords=blip.pdb ensemble_2.RMS=0.90 ensemble_2.copies_to_find=1 \

n_cycle_rebuild_max=1

Specifying molecular masses of 2 components

Run AutoMR as in the previous example, except specify the components of the asymmetric unit with molecular masses (30000 and 20000), and define the search models with PDB files and percent sequence identity with the true structures (50% and 60%). phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma" \

resolution=0.0 resolution_build=3.0 \

component_1.component_type=protein component_1.mass=30000 \

component_1.component_copies=1 \

component_2.component_type=protein component_2.mass=20000 \

component_2.component_copies=1 \

ensemble_1.coords=beta.pdb ensemble_1.identity=50 ensemble_1.copies_to_find=1 \

ensemble_2.coords=blip.pdb ensemble_2.identity=60 ensemble_2.copies_to_find=1 \

n_cycle_rebuild_max=1

AutoMR searching for 2 components, but specifying the orientation of one of them

Run AutoMR on a structure with 2 components. Define the components of the asymmetric unit with sequence files (beta.seq and blip.seq) and number of copies of each component (1). Define the search models with PDB files and estimated RMS from true structures. Define the orientation and position of one component. Define the number of copies to find for each component (0 for beta, which is fixed, 1 for blip). phenix.automr data=beta_blip_P3221.mtz input_label_string="Fobs Sigma" \

resolution=0.0 resolution_build=3.0 \

component_1.component_type=protein component_1.seq_file=beta.seq \

component_1.component_copies=1 \

component_2.component_type=protein component_2.seq_file=blip.seq \

component_2.component_copies=1 \

ensemble_1.coords=beta.pdb ensemble_1.RMS=0.85 ensemble_1.copies_to_find=0 \

ensemble_1.ensembleID="beta" \

ensemble_2.coords=blip.pdb ensemble_2.RMS=0.90 ensemble_2.copies_to_find=1 \

ensemble_2.ensembleID="blip" \

n_cycle_rebuild_max=1 \

fixed_ensembleID_list="beta" \

fixed_euler_list="199.84,41.535,184.15" \

fixed_frac_list="-0.49736,-0.15895,-0.28067"

Note: you have to define an ensemble for the fixed molecule (beta in this example).

Possible Problems


Specific limitations and problems

The AutoBuild Wizard can build PROTEIN, RNA, or DNA, but it can only build one at a time. If your

MR model contains more than one type of chain, then you will need to run AutoBuild separately from AutoMR and when you run AutoBuild, specify one of them with input_lig_file_list and the type of chain to build with chain_type: input_lig_file_list=ProteinPartofMRmodel.pdb

chain_type=DNA
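For example, to rebuild the DNA part of a mixed protein/DNA MR solution, a separate AutoBuild run might look like the following sketch. The data and sequence file names are hypothetical; input_lig_file_list and chain_type are the keywords described above:

```shell
# Hypothetical sketch: rebuild only the DNA, keeping the protein fixed
phenix.autobuild data=mr_data.mtz seq_file=dna.seq \
  input_lig_file_list=ProteinPartofMRmodel.pdb chain_type=DNA
```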

If you use an ensemble as a search model, the output structure will contain just the first member of the ensemble, so you may wish to put the member that is likely to be the most similar to the true structure as the first one in your ensemble.

If you run AutoMR from the GUI and continue on to AutoBuild, and then select "Start run over (delete everything for this run)", it will delete both your AutoBuild run and your AutoMR run and start your AutoMR run all over.

The AutoMR Wizard can take most settings of most space groups; however, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography), even in the standard setting, due to difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.

Literature

Phaser crystallographic software. A.J. McCoy, R.W. Grosse-Kunstleve, P.D. Adams, M.D. Winn, L.C. Storoni and R.J. Read. J. Appl. Cryst. 40, 658-674 (2007)

Likelihood-enhanced fast translation functions. A.J. McCoy, R.W. Grosse-Kunstleve, L.C. Storoni and R.J. Read. Acta Cryst. D61, 458-464 (2005)

Likelihood-enhanced fast rotation functions. L.C. Storoni, A.J. McCoy and R.J. Read. Acta Cryst. D60, 432-438 (2004)

Additional information

List of all AutoMR keywords

-------------------------------------------------------------------------------

Legend:

black bold - scope names

black - parameter names

red - parameter values

blue - parameter help

blue bold - scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

-------------------------------------------------------------------------------

automr

build= True Run AutoBuild immediately after AutoMR (Command-line only)

data= None Datafile (any standard format) (Command-line only)

copies= None Set both copies_to_find and component_copies with copies. This

is the number of copies of this search model to find, and also the

number of copies of this sequence or mass in the asymmetric unit.

(Command-line only)

ensembleID= ensemble_1 ID for this ensemble. (Command-line only)

copies_to_find= None Number of copies of this ensemble to find in a.u.

(Command-line only)

coords= None model(s) for this ensemble. (Command-line only)

identity= None percent identity(ies) of model(s) in this ensemble to

structure (alternative is RMS). (Command-line only)

RMS= None RMSD(s) of model(s) to structure (alternative is identity).

(Command-line only)

seq_file= None protein seq_file for this component. (Command-line only)

component_type= *protein nucleic_acid protein or nucleic acid.

(Command-line only)

mass= None molecular mass (kDa) of this component. (Command-line only)

component_copies= None Number of copies of this component in the a.u.

(required). (Command-line only)

special_keywords

write_run_directory_to_file= None Writes the full name of a run

directory to the specified file. This can

be used as a call-back to tell a script

where the output is going to go.

(Command-line only)

run_control

coot= None Set coot to True and optionally run=[run-number] to run Coot

with the current model and map for run run-number. In some wizards

(AutoBuild) you can edit the model and give it back to PHENIX to

use as part of the model-building process. If you just say coot

then the facts for the highest-numbered existing run will be

shown. (Command-line only)

ignore_blanks= None ignore_blanks allows you to have a command-line

keyword with a blank value like "input_lig_file_list="

stop= None You can stop the current wizard with "stopwizard" or "stop".

If you type "phenix.autobuild run=3 stop" then this will stop run

3 of autobuild. (Command-line only)

display_facts= None Set display_facts to True and optionally

run=[run-number] to display the facts for run run-number.

If you just say display_facts then the facts for the

highest-numbered existing run will be shown.

(Command-line only)

display_summary= None Set display_summary to True and optionally

run=[run-number] to show the summary for run

run-number. If you just say display_summary then the

summary for the highest-numbered existing run will be

shown. (Command-line only)

carry_on= None Set carry_on to True to carry on with highest-numbered

run from where you left off. (Command-line only)

run= None Set run to n to continue with run n where you left off.

(Command-line only)

copy_run= None Set copy_run to n to copy run n to a new run and continue

where you left off. (Command-line only)

display_runs= None List all runs for this wizard. (Command-line only)

delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)

display_labels= None display_labels=test.mtz will list all the labels

that identify data in test.mtz. You can use the label

strings that are produced in AutoSol to identify which

data to use from a datafile like this: peak.data="F+

SIGF+ F- SIGF-" # the entire string in quotes counts

here You can use the individual labels from these

strings as identifiers for data columns in AutoSol and

AutoBuild like this: input_refinement_labels="FP SIGFP

FreeR_flags" # each individual label counts

dry_run= False Just read in and check parameter names

params_only= False Just read in and return parameter defaults

display_all= False Just read in and display parameter defaults
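Typical run-control invocations, based on the keywords above (the run numbers, and the quoting of the delete_runs value, are assumptions for illustration):

```shell
# Hypothetical sketch of run-control keywords for phenix.automr
phenix.automr display_runs=True   # list all runs for this wizard
phenix.automr carry_on=True       # resume the highest-numbered run
phenix.automr run=3 stop          # stop run 3
phenix.automr delete_runs="1 2"   # delete runs 1 and 2
```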

autobuild_variables

two_fofc_in_rebuild= None Actively sets two_fofc_in_rebuild in

AutoBuild. NOTE: value is not checked

include_input_model= None Actively sets include_input_model in

AutoBuild. NOTE: value is not checked

n_cycle_rebuild_min= None Actively sets n_cycle_rebuild_min in

AutoBuild. NOTE: value is not checked

n_cycle_rebuild_max= None Actively sets n_cycle_rebuild_max in

AutoBuild. NOTE: value is not checked

n_cycle_build_min= None Actively sets n_cycle_build_min in AutoBuild.

NOTE: value is not checked

n_cycle_build_max= None Actively sets n_cycle_build_max in AutoBuild.

NOTE: value is not checked

rebuild_in_place= None Actively sets rebuild_in_place in AutoBuild.

NOTE: value is not checked

thorough_denmod= None Actively sets thorough_denmod in AutoBuild. NOTE:

value is not checked

i_ran_seed= None Actively sets i_ran_seed in AutoBuild. NOTE: value is

not checked

start_chains_list= None Actively sets start_chains_list in AutoBuild.

NOTE: value is not checked

input_refinement_file= None Actively sets input_refinement_file in

AutoBuild. NOTE: value is not checked

input_refinement_labels= None Actively sets input_refinement_labels in

AutoBuild. NOTE: value is not checked

input_labels= None Actively sets input_labels in AutoBuild. NOTE: value

is not checked

resolve_command_list= None Actively sets resolve_command_list in

AutoBuild. NOTE: value is not checked

resolve_pattern_command_list= None Actively sets

resolve_pattern_command_list in AutoBuild.

NOTE: value is not checked

morph= None Actively sets morph in AutoBuild. NOTE: value is not checked

morph_rad= None Actively sets morph_rad in AutoBuild. NOTE: value is not

checked

ensemble_1

ensembleID= ensemble_1 ID for this ensemble. (Command-line only)

copies_to_find= None Number of copies of this ensemble to find in a.u.

(Command-line only)

coords= None model(s) for this ensemble. (Command-line only)

identity= None percent identity(ies) of model(s) in this ensemble to

structure (alternative is RMS). (Command-line only)

RMS= None RMSD(s) of model(s) to structure (alternative is identity).

(Command-line only)

ensemble_2

ensembleID= ensemble_2 ID for this ensemble. (Command-line only)

copies_to_find= None Number of copies of this ensemble to find in a.u.

(Command-line only)

coords= None model(s) for this ensemble. (Command-line only)

identity= None percent identity(ies) of model(s) in this ensemble to

structure (alternative is RMS). (Command-line only)

RMS= None RMSD(s) of model(s) to structure (alternative is identity).

(Command-line only)

ensemble_3

ensembleID= ensemble_3 ID for this ensemble. (Command-line only)

copies_to_find= None Number of copies of this ensemble to find in a.u.

(Command-line only)

coords= None model(s) for this ensemble. (Command-line only)

identity= None percent identity(ies) of model(s) in this ensemble to

structure (alternative is RMS). (Command-line only)

RMS= None RMSD(s) of model(s) to structure (alternative is identity).

(Command-line only)

ensemble_4

ensembleID= ensemble_4 ID for this ensemble. (Command-line only)

copies_to_find= None Number of copies of this ensemble to find in a.u.

(Command-line only)

coords= None model(s) for this ensemble. (Command-line only)

identity= None percent identity(ies) of model(s) in this ensemble to

structure (alternative is RMS). (Command-line only)

RMS= None RMSD(s) of model(s) to structure (alternative is identity).

(Command-line only)

ensemble_5

ensembleID= ensemble_5 ID for this ensemble. (Command-line only)

copies_to_find= None Number of copies of this ensemble to find in a.u.

(Command-line only)

coords= None model(s) for this ensemble. (Command-line only)

identity= None percent identity(ies) of model(s) in this ensemble to

structure (alternative is RMS). (Command-line only)

RMS= None RMSD(s) of model(s) to structure (alternative is identity).

(Command-line only)

component_1

seq_file= None protein seq_file for this component. (Command-line only)

component_type= *protein nucleic_acid protein or nucleic acid.

(Command-line only)

mass= None molecular mass (kDa) of this component. (Command-line only)

component_copies= None Number of copies of this component in the a.u.

(required). (Command-line only)

component_2

seq_file= None protein seq_file for this component. (Command-line only)

component_type= *protein nucleic_acid protein or nucleic acid.

(Command-line only)

mass= None molecular mass (kDa) of this component. (Command-line only)

component_copies= None Number of copies of this component in the a.u.

(required). (Command-line only)

component_3

seq_file= None protein seq_file for this component. (Command-line only)

component_type= *protein nucleic_acid protein or nucleic acid.

(Command-line only)

mass= None molecular mass (kDa) of this component. (Command-line only)

component_copies= None Number of copies of this component in the a.u.

(required). (Command-line only)

component_4

seq_file= None protein seq_file for this component. (Command-line only)

component_type= *protein nucleic_acid protein or nucleic acid.

(Command-line only)

mass= None molecular mass (kDa) of this component. (Command-line only)

component_copies= None Number of copies of this component in the a.u.

(required). (Command-line only)

component_5

seq_file= None protein seq_file for this component. (Command-line only)

component_type= *protein nucleic_acid protein or nucleic acid.

(Command-line only)

mass= None molecular mass (kDa) of this component. (Command-line only)

component_copies= None Number of copies of this component in the a.u.

(required). (Command-line only)

crystal_info

cell= 0.0 0.0 0.0 0.0 0.0 0.0

Enter cell parameters a b c alpha beta

gamma

chain_type= *Auto PROTEIN DNA RNA You can specify whether to build

protein, DNA, or RNA chains. At present you can only build

one of these in a single run. If you have both DNA and

protein, build one first, then run AutoBuild again,

supplying the prebuilt model in the "input_lig_file_list"

and build the other. NOTE: default for this keyword is Auto,

which means "carry out normal process to guess this

keyword". The process is to look at the sequence file and/or

input pdb file to see what the chain type is. If there is

more than one type, the type with the larger number of

residues is guessed. If you want to force the chain_type,

then set it to PROTEIN RNA or DNA.
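To make the two-pass protein/DNA procedure above concrete, a run might look like the following sketch (the file names, including the prebuilt model name, are illustrative, not from this manual):

```
# first pass: build the protein chains
phenix.autobuild data=w1.sca seq_file=protein.seq chain_type=PROTEIN

# second pass: carry the built protein model along via input_lig_file_list
# and build the DNA chains
phenix.autobuild data=w1.sca seq_file=dna.seq chain_type=DNA \
    input_lig_file_list=protein_model.pdb
```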

resolution= 0.0

Enter the high-resolution limit for MR search. All the

data input will be written out regardless of your choice. By

default, the final rigid-body refinement will use all data.

sg= None Space Group symbol (i.e., C2221 or C 2 2 21)

decision_making

min_seq_identity_percent= 50.0

The sequence in your input PDB file will

be adjusted to match the sequence in your

sequence file (if any). If there are

insertions/deletions in your model and the

wizard does not seem to identify them, you can

split up your PDB file by adding records like

this: BREAK. You can specify the minimum

sequence identity between your sequence file

and a segment from your input PDB file to

consider the sequences to be matched. Default

is 50.0%. You might want a higher number to

make sure that deletions in the sequence are

noticed.

overlap_allowed= None Solutions with no C-alpha clashes will be

accepted. If the best packing has some clashes,

solutions with that number of clashes will be accepted,

as long as this does not exceed the maximum allowed.

You can choose to increase the maximum if the packing

is tight and your search molecule is not exactly the

same as the molecule in the cell. If you leave it blank

then Phaser will decide for you.

selection_criteria_rot= *Percent_of_best Number_of_solutions Z_score All

Choose a criterion for keeping rotation

solutions at each stage. The choices are:

Percent of Best Score: AutoMR looks down the list

of LLG scores and only keeps the ones that

differ from the mean by more than the chosen

percentage, compared to the top solution. Enter

your desired percentage into the entry field


(default=75%) Number of Solutions: Keep the N

top solutions (you can set N; default=1)

Z-score: Keep all the solutions with a Z-score

greater than X (you can set X; default=6). All:

Keep everything and go on holiday while Phaser

crunches through it all (definitely not

recommended!)

selection_criteria_rot_value= 75 Choose a value for your criterion for

keeping rotation solutions at each stage.

Percent of Best Score: AutoMR looks down

the list of LLG scores and only keeps the

ones that differ from the mean by more

than the chosen percentage, compared to

the top solution. Enter your desired

percentage into the entry field

(default=75%) Number of Solutions: Keep

the N top solutions (you can set N;

default=1) Z-score: Keep all the solutions

with a Z-score greater than X (you can set

X; default=6). All: Keep everything and go

on holiday while Phaser crunches through

it all (definitely not recommended!)
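The four criteria can be read as simple filters over the list of LLG scores. The sketch below is one plausible interpretation of the description above; it is not Phaser's actual code, and in particular the exact "percent of best" formula is an assumption:

```python
def keep_rotation_solutions(scores, criterion="Percent_of_best", value=75):
    """Filter rotation-solution scores (LLG) by one of the AutoMR criteria.

    Illustrative reading of the documentation, not Phaser's implementation.
    """
    ranked = sorted(scores, reverse=True)
    if criterion == "Percent_of_best":
        mean = sum(ranked) / len(ranked)
        top = ranked[0]
        # assumption: keep solutions whose margin over the mean is at least
        # `value` percent of the top solution's margin over the mean
        cutoff = mean + (value / 100.0) * (top - mean)
        return [s for s in ranked if s >= cutoff]
    if criterion == "Number_of_solutions":
        return ranked[:value]          # keep the N top solutions
    if criterion == "Z_score":
        mean = sum(ranked) / len(ranked)
        sd = (sum((s - mean) ** 2 for s in ranked) / len(ranked)) ** 0.5
        return [s for s in ranked if sd and (s - mean) / sd > value]
    return ranked                      # "All": keep everything
```

With the defaults (Percent_of_best, 75) a score list of [100, 80, 60, 40, 20] keeps only the top solution, since the cutoff is 60 + 0.75 * (100 - 60) = 90.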

fixed_ensembles

fixed_ensembleID_list= None Enter the ID (set with ensemble_1.ensembleID

or equivalent) of the component that is to be

fixed. NOTE 1: Each ensemble in

fixed_ensembleID_list must be defined. NOTE 2:

you can enter more than one fixed component if

you want. If you do, then enter fixed_euler_list

in multiples of 3 numbers and also

fixed_frac_list in multiples of 3 numbers.

fixed_euler_list= 0.0 0.0 0.0

Enter Euler angles (from AutoMR or Phaser)

for fixed component defined with

fixed_ensembleID_list. NOTE 2: you can enter more than

one fixed component if you want. If you do, then enter

fixed_euler_list in multiples of 3 numbers and also

fixed_frac_list in multiples of 3 numbers.

fixed_frac_list= 0.0 0.0 0.0

Enter fractional offset (location) for

fixed component (from AutoMR or Phaser) for fixed

component defined with fixed_ensembleID_list. NOTE 2:

you can enter more than one fixed component if you

want. If you do, then enter fixed_euler_list in

multiples of 3 numbers and also fixed_frac_list in

multiples of 3 numbers.
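For example, fixing two previously placed components might be written as follows (the ensemble IDs, angles and offsets here are purely illustrative):

```
fixed_ensembleID_list="ens_1 ens_2"
fixed_euler_list="14.2 88.1 201.5 33.0 71.9 164.2"   # 3 Euler angles per fixed component
fixed_frac_list="0.12 0.33 0.45 0.62 0.08 0.77"      # 3 fractional offsets per fixed component
```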

general

all_plausible_sg_list= None Choose which space groups to search

autobuild_input_list_add= None You can add keywords to those that AutoMR

passes on to AutoBuild (command-line only) The

format for this command is:

autobuild_input_list_add='semet refine' Then

you can set any of the variables you specify

by adding the prefix "autobuild_" to the name

of your variable: autobuild_semet=False

autobuild_refine=True This will now set

'semet'=False and refine=True in AutoBuild

background= True When you specify nproc=nn, you can run the jobs in

background (default if nproc is greater than 1) or

foreground (default if nproc=1). If you set


run_command=qsub (or otherwise submit to a batch queue),

then you should set background=False, so that the batch

queue can keep track of your runs. There is no need to use

background=True in this case because all the runs go as

controlled by your batch system. If you use run_command=csh

(or similar, csh is default) then normally you will use

background=True so that all the jobs run simultaneously.

base_path= None You can specify the base path for files (default is

current working directory)

clean_up= False At the end of the entire run the TEMP directories will

be removed if clean_up is True. The default is No, keep these

directories. If you want to remove them after your run is

finished use a command like "phenix.autobuild run=1

clean_up=True"

coot_name= coot If your version of coot is called something else, then

you can specify that here.

debug= False You can have the wizard stop with error messages about the

code if you use debug. NOTE: you cannot use Pause with debug.

do_anisotropy_correction= True Choose whether you want to apply

anisotropy correction

extra_verbose= False Facts and possible commands will be printed every

cycle if Yes

max_wait_time= 100.0

You can specify the length of time (seconds) to

wait when testing the run_command. If you have a cluster

where jobs do not start right away you may need a longer

time to wait.

nbatch= 1 You can specify the number of processors to use (nproc) and

the number of batches to divide the data into for parallel jobs.

Normally you will set nproc to the number of processors

available and leave nbatch alone. If you leave nbatch as None it

will be set automatically, with a value depending on the Wizard.

This is recommended. The value of nbatch can affect the results

that you get, as the jobs are not split into exact replicates,

but are rather run with different random numbers. If you want to

get the same results, keep the same value of nbatch.

nproc= 1 You can specify the number of processors to use (nproc) and the

number of batches to divide the data into for parallel jobs.

Normally you will set nproc to the number of processors available

and leave nbatch alone. If you leave nbatch as None it will be

set automatically, with a value depending on the Wizard. This is

recommended. The value of nbatch can affect the results that you

get, as the jobs are not split into exact replicates, but are

rather run with different random numbers. If you want to get the

same results, keep the same value of nbatch.

run_command= csh When you specify nproc=nn, you can run the subprocesses

as jobs in background with csh (default) or submit them to

a queue with the command of your choice (i.e., qsub ). If

you have a multi-processor machine, use csh. If you have a

cluster, use qsub or the equivalent command for your

system. NOTE: If you set run_command=qsub (or otherwise

submit to a batch queue), then you should set

background=False, so that the batch queue can keep track of

your runs. There is no need to use background=True in this

case because all the runs go as controlled by your batch

system. If you use run_command=csh (or similar, csh is

default) then normally you will use background=True so that

all the jobs run simultaneously.
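Putting nproc, run_command and background together, two typical setups might look like these sketches (the "..." stands for your data and model keywords):

```
# multi-processor workstation: run 4 subprocesses at once in background
phenix.automr ... nproc=4 run_command=csh background=True

# cluster with a batch queue: let the queue keep track of the jobs
phenix.automr ... nproc=4 run_command=qsub background=False
```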

skip_xtriage= False You can bypass xtriage if you want. This will


prevent you from applying anisotropy corrections, however.

temp_dir= None Define a temporary directory (it must exist)

title= Run 1 AutoMR Sun Dec 7 17:46:24 2008 Enter any text you like to

help identify what you did in this run

top_output_dir= None This is used in subprocess calls of wizards and to

tell the Wizard where to look for the STOPWIZARD file.

use_all_plausible_sg= False Normally you will want to search all space

groups with the same point group as you may not

know which is correct from your data. You can

select which of these to choose using 'Choose

variable to set' and selecting

'all_plausible_sg_list'

verbose= False Command files and other verbose output will be printed

input_files

input_data_file= None Enter a file with input structure factor data.

For structure factor data only (e.g., FP SIGFP) any

format is ok. If you have free R flags, phase

information or HL coefficients that you want to use

then an mtz file is required. If this file contains

phase information, this phase information should be

experimental (i.e., MAD/SAD/MIR etc), and should not be

density-modified phases (enter any files with

density-modified phases as input_map_file instead).

NOTE: If you supply HL coefficients they will be used

in phase recombination. If you supply PHIB or PHIB and

FOM and not HL coefficients, then HL coefficients will

be derived from your PHIB and FOM and used in phase

recombination. If you also specify a hires data file,

then FP and SIGFP will come from that data file (and

not this one). If an input_refinement_file is

specified, then F, Sigma, FreeR_flag (if present) from

that file will be used for refinement instead of this

one.

input_label_string= None Choose the set of labels that represent the

data and sigma columns for your data. NOTE: Applies

to input data file for AutoMR. See also

'input_labels', which applies to input data file for

AutoBuild.

input_pdb_file= None You can enter a PDB file containing a starting

model of your structure NOTE: If you enter a PDB file

then the AutoBuild wizard will start right in with

rebuild steps, skipping the build process. If the model

is very poor then it may be better to leave it out, as

the build process (which includes pattern recognition

and recognition of helical and strand fragments) is

optimized for improving poor maps, while the rebuild

process is optimized for better maps that can be

produced by having a partial model.

input_seq_file= None Enter name of file with 1-letter code of protein

sequence NOTES: 1. lines starting with > are ignored

and separate chains 2. FASTA format is fine 3. If

there are multiple copies of a chain, just enter one

copy. 4. If you enter a PDB file for rebuilding and it

has the sequence you want, then the sequence file is not

necessary. NOTE: You can also enter the name of a PDB

file that contains SEQRES records, and the sequence from

the SEQRES records will be read, written to

seq_from_seqres_records.dat, and used as your input


sequence. NOTE: for AutoBuild you can specify

start_chains_list on the first line of your sequence

file: >> start_chains_list 23 11 5 NOTE: default

for this keyword is Auto, which means "carry out normal

process to guess this keyword". This means if you

specify "after_autosol" in AutoBuild, AutoBuild will

automatically take the value from AutoSol. If you do not

want this to happen, you can specify None which means

"No file"

input_seq_file_list= None The keyword input_seq_file_list is used in

AutoMR to specify the molecular masses of the

components of the unit cell using a set of sequence

files. Usually you should input the sequences of

the actual components of the unit cell here (one

sequence file for each component). NOTE: If no

input_seq_file is specified, then the sequences

from input_seq_file_list are used to create a new

file "composite_seq.dat" with all their sequences

and this is used as the input_seq_file. NOTE: the

format of each file in input_seq_file_list is the

1-letter code of the protein sequence (separate

chains with >>>>)
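A minimal illustration (file names and sequences are invented): a run defining two components by sequence, where one component file contains two chains separated by >>>>:

```
phenix.automr ... input_seq_file_list="compA.seq compB.seq"

# compA.seq (two chains, separated by >>>>):
MKLVINGKTLKGEITVEA
>>>>
GDAATAEKVFKQYANDNG
```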

model_building

build_type= *RESOLVE_AND_TEXTAL RESOLVE TEXTAL You can choose to build

models with RESOLVE and TEXTAL or either one, and how many

different models to build with RESOLVE. The more you build,

the more likely to get a complete model. Note that

rebuild_in_place can only be carried out with RESOLVE

model-building

rebuild_after_mr= True You can choose to go right on to the AutoBuild

wizard with the rebuild-in-place option after running

molecular replacement.

resolution_build= 0.0

Enter the high-resolution limit for

model-building. If 0.0, the value of resolution is

used as a default.

semet= False You can specify that the dataset that is used for

refinement is a selenomethionine dataset, and that the model

should be the SeMet version of the protein, with all SD of MET

replaced with Se of MSE.

non_user_parameters

composition_num_list= 1 Enter number of copies of this component

weight_list= 0.0

Molecular weight of component (Da; e.g. 30000)

weight_seq_list= None Choose whether to define composition through

molecular weight or sequence

refinement

link_distance_cutoff= 3.0

You can specify the maximum bond distance for

linking residues in phenix.refine called from the

wizards.

r_free_flags_fraction= 0.1

Maximum fraction of reflections in the free R

set. You can choose the maximum fraction of

reflections in the free R set and the maximum

number of reflections in the free R set. The

number of reflections in the free R set will be

the lower of the values defined by these two

parameters.

r_free_flags_lattice_symmetry_max_delta= 5.0

You can set the maximum

deviation of distances in the

lattice that are to be


considered the same for

purposes of generating a

lattice-symmetry-unique set of

free R flags.

r_free_flags_max_free= 2000 Maximum number of reflections in the free R

set. You can choose the maximum fraction of

reflections in the free R set and the maximum

number of reflections in the free R set. The

number of reflections in the free R set will be

the lower of the values defined by these two

parameters.
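The combined effect of r_free_flags_fraction and r_free_flags_max_free is simply to take the smaller of the two limits. A minimal sketch of that rule (illustrative only; the actual flag generation happens inside PHENIX):

```python
def free_r_set_size(n_reflections, fraction=0.1, max_free=2000):
    """Number of reflections in the free R set: the lower of
    fraction * n_reflections and max_free (AutoMR defaults shown)."""
    return min(int(n_reflections * fraction), max_free)
```

So a 10,000-reflection dataset gets 1,000 free reflections, while a 50,000-reflection dataset is capped at 2,000.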

r_free_flags_use_lattice_symmetry= True When generating r_free_flags you

can decide whether to include lattice

symmetry (good in general, necessary

if there is twinning).


Automated Model Building and Rebuilding using AutoBuild

Python-based Hierarchical ENvironment for Integrated Xtallography

Documentation Home

Automated Model Building and Rebuilding using AutoBuild

Author(s)

Purpose

Purpose of the AutoBuild Wizard

Usage

How the AutoBuild Wizard works

Automation and user control

Core modules in the AutoBuild Wizard

How to run the AutoBuild Wizard

What the AutoBuild wizard needs to run

...and optional files

Specifying which columns of data to use from input data files

Specifying other general parameters

Picking waters in AutoBuild

Keeping waters from your input file in AutoBuild

Specifying phenix.refine parameters

Specifying resolve/resolve_pattern parameters

Including ligand coordinates in AutoBuild

Specifying arbitrary commands and cif files for phenix.refine

Output files from AutoBuild

Standard building, rebuild_in_place, and multiple-models

Parallel jobs, nproc, nbatch, number_of_parallel_models and how AutoBuild works in parallel

Model editing during rebuilding with the Coot-PHENIX interface

Resolution limits in AutoBuild

Examples

Run AutoBuild automatically after AutoSol

Run AutoBuild beginning with experimental data

Merge in hires data

Make a SA-omit map around atoms in target.pdb

Make a simple composite omit map

Make an iterative-build omit map around atoms in target.pdb

Make a sa-omit map around residues 3 and 4 in chain A of coords.pdb

Create one very good rebuilt model

Touch up a model

Create 20 very good rebuilt models that are as different as possible

Morph an MR model and rebuild it

Build an RNA chain

Build a DNA chain

Just make maps; don't do any building.

Just calculate a prime-and-switch map

Possible Problems

General limitations

Specific limitations and problems

Literature

Additional information

List of all AutoBuild keywords

Author(s)

http://phenix-online.org/documentation/autobuild.htm (1 of 34) [12/14/08 1:01:09 PM]


● AutoBuild Wizard: Tom Terwilliger

● PHENIX GUI and PDS Server: Nigel W. Moriarty

● phenix.refine: Ralf W. Grosse-Kunstleve, Peter Zwart and Paul D. Adams

● RESOLVE: Tom Terwilliger

● TEXTAL: Kreshna Gopal, Thomas Ioerger, Rita Pai, Tod Romo, James Sacchettini, Erik McKee, Lalji Kanbi

● phenix.xtriage: Peter Zwart

Purpose

Purpose of the AutoBuild Wizard

The purpose of the AutoBuild Wizard is to provide a highly automated system for model rebuilding and completion. The Wizard design allows the user to specify data files and parameters through an interactive GUI, or alternatively through keyworded scripts. The AutoBuild Wizard begins with datafiles containing structure factor amplitudes and uncertainties, along with either experimental phase information or a starting model, carries out cycles of model-building and refinement alternating with model-based density modification, and produces a relatively complete atomic model.

The AutoBuild Wizard uses RESOLVE (optionally also TEXTAL), xtriage and phenix.refine to build an atomic model, refine it, and improve it with iterative density modification, refinement, and model-building.

The Wizard begins with either experimental phases (i.e., from AutoSol) or with an atomic model that can be used to generate calculated phases. The AutoBuild Wizard produces a refined model that can be nearly complete if the data are strong and the resolution is about 2.5 A or better. At lower resolutions (2.5 - 3 A) the model may be less complete, and at resolutions > 3 A the model may be quite incomplete and not well refined.

The AutoBuild Wizard can be used to generate OMIT maps (simple omit, SA-omit, iterative-build omit) that can cover the entire unit cell or specific residues in a PDB file.

The AutoBuild Wizard can generate a set of models compatible with experimental data (multiple_models).

Usage

The AutoBuild Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script

files. All three versions are identical except in the way that they take commands from the user. See Running a

Wizard from a GUI, the command-line, or a script for details of how to run a Wizard. The command-line

version will be described here.

How the AutoBuild Wizard works

The AutoBuild Wizard begins with experimental structure factor amplitudes, along with either experimental or model-based estimates of crystallographic phases. The phase information is improved by using statistical density modification to improve the correlation of NCS-related density in the map (if present) and to improve the match of the distribution of electron densities in the map with those expected from a model map. This improved map is then used to build and refine an atomic model.

In subsequent cycles, the models from previous cycles are used as a source of phase information in statistical density modification, iteratively improving the quality of the map used for model-building.

Additionally, during the first few cycles additional phase information is obtained by detecting and enhancing

(1) the presence of commonly-found local patterns of density in the map, and (2) the presence of density in the shape of helices and strands. The final model obtained is analyzed for residue-based map correlation and density at the coordinates of individual atoms, and an analysis including a summary of atoms and residues that are in strong, moderate, or weak density and out of density is provided.

Automation and user control


The AutoBuild Wizard has been designed for ease of use combined with maximal user control, with as many parameters set automatically by the Wizard as possible, but maintaining parameters accessible to the user through a GUI and through keyword-based scripts. The Wizard uses the input/output routines of the cctbx library, allowing data files of many different formats so that the user does not have to convert their data to any particular format before using the Wizard. Use of the phenix.refine refinement package in the AutoBuild

Wizard allows a high degree of automation of refinement so that neither the user nor the Wizard is required to specify parameters for refinement. The phenix.refine package automatically includes a bulk solvent model and automatically places solvent molecules.

Core modules in the AutoBuild Wizard

The five core modules in the AutoBuild Wizard are

(1) building a new model into an electron density map

(2) rebuilding an existing model

(3) refinement

(4) iterative model-building beginning from experimental phase information, and

(5) iterative model-building beginning from a model.

The standard procedures available in the AutoBuild Wizard that are based on these modules include:

(a) model-building and completion starting from experimental phases,

(b) rebuilding a model from scratch, with or without experimental phase information, and

(c) rebuilding a model in place, maintaining connectivity and sequence register.

Starting from a set of experimental phases and structure factor amplitudes, normally procedure (a) is carried out, and then the resulting model is rebuilt with procedure (b).

Starting from a model (e.g., from molecular replacement) and experimental structure factor amplitudes, procedure (c) is normally carried out if the starting model differs less than about 50% in sequence from the desired model, and otherwise procedure (b) is used.

How to run the AutoBuild Wizard

Running the AutoBuild Wizard is easy. For example, from the command-line you can type:

phenix.autobuild data=w1.sca seq.dat model=coords.pdb

The AutoBuild Wizard will carry out iterative model-building, density modification and refinement based on the data in w1.sca and the model in coords.pdb, editing the model as necessary to match the sequence in seq.dat.

What the AutoBuild wizard needs to run

(1) a data file, optionally with phases and HL coeffs and freeR flag (w1.sca or data=w1.sca)

(2) a sequence file (seq.dat or seq_file=seq.dat) or a model (coords.pdb or model=coords.pdb)

...and optional files

(3) coefficients for a starting map (map_file=resolve.mtz)

(4) a file for refinement (refinement_file=exptl_fobs_freeR_flags.mtz)

(5) a high-resolution datafile (hires_file=high_res.sca)
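A command line combining the required inputs with all three optional files might look like this sketch (file names taken from the examples above):

```
phenix.autobuild data=w1.sca seq_file=seq.dat model=coords.pdb \
    map_file=resolve.mtz refinement_file=exptl_fobs_freeR_flags.mtz \
    hires_file=high_res.sca
```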

Specifying which columns of data to use from input data files

If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with

"None" for anything that is missing.


For example, if your data file ref.mtz has columns FP SIGFP and FreeR then you might specify:

refinement_file=ref.mtz
input_refinement_labels="FP SIGFP None None None None None None FreeR"

The keywords for labels and the anticipated input labels (program labels) are:

input_labels (for data file): FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flag
input_refinement_labels: FP SIGFP FreeR_flag
input_map_labels: FP PHIB FOM
input_hires_labels: FP SIGFP FreeR_flag

You can find out all the possible label strings in a data file that you might use by typing:

phenix.autosol display_labels=w1.mtz # display all labels for w1.mtz

NOTES: if your data files contain a mixture of amplitude and intensity data then only the amplitude data is available. If you have only intensity data in a data file and want to select specific columns, then you need to specify the column names as they are after importing the data and conversion to amplitudes (see below under

General Limitations for details).

Specifying other general parameters

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end

of this page and also general information about running Wizards at Running a Wizard from a GUI, the command-line, or a script

for how to do this. Some of the most common parameters are:

data=w1.sca # data file
model=coords.pdb # starting model
seq_file=seq.dat # sequence file
map_file=map_coeffs.mtz # coefficients for a starting map for building
resolution=3 # dmin of 3 A
s_annealing=True # use simulated annealing refinement at start of each cycle
n_cycle_build_max=5 # max number of build cycles (starting from experimental phases)
n_cycle_rebuild_max=5 # max number of rebuild cycles (starting from a model)

Picking waters in AutoBuild

By default AutoBuild will instruct phenix.refine to pick waters using its standard procedure. This means that if the resolution of the data is high enough (typically 3 A) then waters are placed.

You can tell AutoBuild not to have phenix.refine pick waters with the command:

place_waters=False

If you want to place waters at a lower resolution, you will need to reset the low-resolution cutoff for placing waters in phenix.refine. You would do that in a "refinement_params.eff" file containing lines like these (see below for passing parameters to phenix.refine with an ".eff" file):

refinement {
  ordered_solvent {
    low_resolution = 2.8
  }
}

Keeping waters from your input file in AutoBuild


You can tell AutoBuild to keep the waters in your input file when you are using rebuild_in_place (the default is to toss them and replace them with new ones). You can say:

keep_input_waters=True place_waters=No

NOTE: If you specify keep_input_waters=True you should also specify either "place_waters=No" or

"keep_pdb_atoms=No" . This is because if place_waters=Yes and keep_pdb_atoms=Yes then phenix.refine will add waters and then the wizard will keep the new waters from the new PDB file created by phenix.refine preferentially over the ones in your input file.

Specifying phenix.refine parameters

You can control phenix.refine parameters that are not specified directly by AutoBuild using a refinement parameters (.eff) file:

refine_eff_file=refinement_params.eff # set any phenix.refine params not set by AutoBuild

This file might contain a twin-law for refinement:

refinement {
  twinning {
    twin_law = "-k, -h, -l"
  }
}

You can put any phenix.refine parameters in this file, but a few parameters that are set directly by AutoBuild override your inputs from the refine_eff_file. These parameters are listed below.

Refinement parameters that must be set using AutoBuild Wizard keywords (overwriting any values provided by the user in input_eff_file). Each phenix.refine keyword is followed by the Wizard keyword(s) that control it, with notes:

refinement.main.number_of_macro_cycles
    ncycle_refine

refinement.main.simulated_annealing
    s_annealing (only applies to the 1st refinement in rebuild; SA in any other refinements is controlled by input_eff_file, if any)

refinement.ncs.find_automatically
    refine_with_ncs=True turns on automatic ncs search

refinement.main.ncs
    refine_with_ncs=True turns on ncs

refinement.ncs.coordinate_sigma
    Normally not set by the Wizard. However, if the Wizard keyword ncs_refine_coord_sigma_from_rmsd is True then the ncs coordinate sigma is set to ncs_refine_coord_sigma_from_rmsd_ratio times the rmsd among ncs copies.

refinement.main.random_seed
    i_ran_seed sets the random seed at the beginning of a Wizard run; this affects refinement.main.random_seed but does not set it to the value of i_ran_seed (because i_ran_seed gets updated by several different routines).

refinement.main.ordered_solvent
    place_waters=True will set ordered_solvent to True. Note that this only has an effect if the resolution cutoff for adding waters (refinement.ordered_solvent.low_resolution) is higher than the resolution used for refinement.

refinement.main.ordered_solvent
    place_waters_in_combine=True will set ordered_solvent to True, applying this only to the final combination step of multiple-model generation. Note that this only has an effect if refinement.ordered_solvent.low_resolution is higher than the resolution used for refinement.

refinement.ordered_solvent.low_resolution
    ordered_solvent_low_resolution=3.0 (default) will set the resolution cutoff for adding waters to 3 A. If the resolution used for refinement is larger than the value of ordered_solvent_low_resolution then ordered solvent is not added.

refinement.main.use_experimental_phases
    use_mlhl=True will set refinement.main.use_experimental_phases to True.

refinement.refine.strategy
    The Wizard keywords refine, refine_b and refine_xyz all affect refinement.refine.strategy. If refine=True then refinement is carried out. If refine_b=True (default) isotropic displacement factors are refined. If refine_xyz=True (default) coordinates are refined.

refinement.main.occupancy_max
    max_occ=1.0 sets the value of refinement.main.occupancy_max to 1.0. The default is to do nothing and use the default from phenix.refine (1.0).

refinement.refine.occupancies.individual
    The combination of the Wizard keywords semet=True and refine_se_occ=True will add "(name SE)" to the value of refinement.refine.occupancies.individual. You can add the names of other atoms whose occupancies should be refined to your .eff file.

refinement.main.high_resolution
    Either of the Wizard keywords refinement_resolution and resolution will set the value of refinement.main.high_resolution, with refinement_resolution being used if available.

refinement.pdb_interpretation.link_distance_cutoff
    link_distance_cutoff

The following parameters controlling phenix.refine output are set directly in AutoBuild and cannot be set by the user:

● refinement.output.write_eff_file

● refinement.output.write_geo_file

● refinement.output.write_def_file

● refinement.output.write_maps

● refinement.output.write_map_coefficients

Specifying resolve/resolve_pattern parameters

Similarly, you can control resolve and resolve_pattern parameters. For these parameters, your inputs will not be overridden by AutoBuild. The format is a little tricky: you have to put two sets of quotes around the http://phenix-online.org/documentation/autobuild.htm (6 of 34) [12/14/08 1:01:09 PM]

84

Automated Model Building and Rebuilding using AutoBuild command like this: resolve_command="'resolution 200 3'" # NOTE ' and " quotes

This will put the text resolution 200 3 at the end of every temporary command file created to run resolve. (This is why it is not overridden by

AutoBuild commands; they will all come before your commands in the resolve command file.) Note that some commands in resolve may be incompatible with this usage.

Including ligand coordinates in AutoBuild

If your input PDB file contains ligands (anything that is not protein and not solvent, if your chain_type=PROTEIN, for example), then by default these ligands will be kept, used in refinement, and written out to your output PDB file. Any solvent molecules will by default be discarded. You can change this behavior by changing these keywords from their defaults:

keep_input_ligands=True
keep_input_waters=False

The AutoBuild Wizard will use phenix.elbow to generate geometries for any ligands that are not recognized.

You can also tell AutoBuild to add the contents of any PDB files that you wish to supply to the current version of the structure just before refinement, so all the refined models produced contain whatever AutoBuild has built, plus the contents of these PDB files. This can be done through the GUI, the command-line, or a script. In the command-line version you do this with: input_lig_file_list=my_ligand.pdb

NOTE: The files in input_lig_file_list will be edited to make them all HETATM records to tell AutoBuild to ignore these residues in rebuilding.
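The ATOM-to-HETATM edit mentioned above is a simple record-type rewrite on fixed-width PDB lines. The sketch below shows the transformation on a single record; it illustrates the file edit described in the note, not the Wizard's own code.

```python
def atom_to_hetatm(line: str) -> str:
    """Rewrite an ATOM record as HETATM, preserving the fixed-width columns."""
    if line.startswith("ATOM  "):
        return "HETATM" + line[6:]
    return line

record = "ATOM      1  C1  LIG A   1      11.104  13.207   9.341  1.00 20.00           C"
print(atom_to_hetatm(record)[:6])  # HETATM
```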

NOTE: You may need to tell phenix.refine about the geometry of your ligands. You will get an error message if the ligand is not recognized and an automatic run of phenix.elbow does not succeed in generating your ligand.

In that case you will want to run phenix.elbow to create a cif definition file for this ligand:

phenix.elbow my_ligand.pdb --id=LIG

where LIG is the 3-letter ID code that you use in my_ligand.pdb to identify your ligand. If the automatic run does not work, you may need to give phenix.elbow additional information to generate your ligand.

Once phenix.elbow has generated your ligand, you can use the keyword cif_def_file_list to tell AutoBuild about this ligand:

cif_def_file_list=elbow.LIG.my_ligand.pdb.cif

Specifying arbitrary commands and cif files for phenix.refine

You can tell AutoBuild to apply any set of cif definitions to the model during refinement by using a combination of specification files and the commands cif_def_file_list and refine_eff_file_list:

refine_eff_file_list=link.eff
cif_def_file_list=link.cif

This example comes from the phenix.refine manual page, in which a link is specified in a cif definition file link.cif:

data_mod_5pho
#
loop_
_chem_mod_atom.mod_id
_chem_mod_atom.function
_chem_mod_atom.atom_id
_chem_mod_atom.new_atom_id
_chem_mod_atom.new_type_symbol
_chem_mod_atom.new_type_energy
_chem_mod_atom.new_partial_charge
5pho add . O5T O OH .
loop_
_chem_mod_bond.mod_id
_chem_mod_bond.function
_chem_mod_bond.atom_id_1
_chem_mod_bond.atom_id_2
_chem_mod_bond.new_type
_chem_mod_bond.new_value_dist
_chem_mod_bond.new_value_dist_esd
5pho add O5T P coval 1.520 0.020

and this is applied with a parameters file link.eff:

refinement.pdb_interpretation.apply_cif_modification
{
  data_mod = 5pho
  residue_selection = resname GUA and name O5T
}

You can have any number of cif files and parameters files.

Output files from AutoBuild

When you run AutoBuild the output files will be in a subdirectory with your run number:

AutoBuild_run_1_/ # subdirectory with results

A summary file listing the results of the run and the other files produced:

AutoBuild_summary.dat # overall summary

A warnings file listing any warnings about the run

AutoBuild_warnings.dat # any warnings

A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed)

AutoBuild_Facts.dat # all Facts about the run

Final refined model: overall_best.pdb

NOTE: The "overall_best.pdb" file is always the current best model. Similarly, "overall_best_denmod_map_coeffs.mtz" is always the best map-coefficients file. The AutoBuild_summary.dat file lists the names of the current best set of files. The contents of "overall_best.pdb" and of the best model listed in AutoBuild_summary.dat will be the same.

Final map coefficients used to build the refined model. Use FP PHIM FOMM in maps. Normally this is a density-modified map from resolve. See also the map coefficients from phenix.refine below.

overall_best_denmod_map_coeffs.mtz

Final sigmaA-weighted 2mFo-DFc and Fo-Fc map coefficients from phenix.refine based on the overall_best.pdb final model. The map coefficients are 2FOFCWT PH2FOFCWT for the 2mFo-DFc map and FOFCWT PHFOFCWT for the Fo-Fc difference map. See also the map coefficients from density modification above.

overall_best_refine_map_coeffs.mtz

MTZ file with FP, phases and HL coefficients if present, and the freeR_flags used in refinement: exptl_fobs_phases_freeR_flags.mtz

Final log file for model-building: overall_best.log

Final log file for refinement: overall_best.log_refine

Evaluation of fit of model to map: overall_best.log_eval

Summary of NCS information: ncs_info.ncs

Standard building, rebuild_in_place, and multiple-models

The AutoBuild Wizard has two overall methods for building a model. The first method (standard build) is to build a model from scratch. This involves identification of where helices (and strands, for proteins) are located, extension using fragment libraries, connection of segments, identification of side-chains, and sequence alignment. These methods are augmented in the standard building procedure by loop-fitting and by building model outside of the region that has already been built. The second method (rebuild_in_place) takes an existing model and rebuilds it without adding or deleting any residues and without changing the connectivity of the chain. The way this works is that a segment of the model is deleted and then filled in again by rebuilding from the remaining ends. This is repeated for overlapping segments covering the entire model. The multiple-models approach really has two levels of multiple models. At the first level, several (multiple_models_group_number, default is number_of_parallel_models) models are built (using rebuild_in_place) and are then recombined into a single good model. At the next level, this whole process may be done more than once (multiple_models_number times), yielding several very good models. By default, if you ask for rebuild_in_place, then you will get a single very good model, created by running rebuild_in_place several times and recombining the models.
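The two levels of the multiple-models procedure can be sketched as a nested loop. The build_one_model and recombine functions below are stand-in stubs for the Wizard's rebuild_in_place and model-recombination steps; only the loop structure illustrates the text above, not real PHENIX internals.

```python
def build_one_model(seed):
    """Stub for one rebuild_in_place pass (hypothetical stand-in)."""
    return f"model_{seed}"

def recombine(models):
    """Stub for recombining a group of models into one good model."""
    return "+".join(models)

def multiple_models(multiple_models_number=1, multiple_models_group_number=3):
    final_models = []
    for outer in range(multiple_models_number):           # second level
        group = [build_one_model(f"{outer}_{inner}")      # first level
                 for inner in range(multiple_models_group_number)]
        final_models.append(recombine(group))
    return final_models

# Default rebuild_in_place behaviour: one very good recombined model.
print(multiple_models(1, 3))  # ['model_0_0+model_0_1+model_0_2']
```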

Parallel jobs, nproc, nbatch, number_of_parallel_models and how AutoBuild works in parallel

The AutoBuild Wizard is set up to take advantage of multi-processor machines or batch queues by splitting the work into separate tasks. See Tutorial 4: Iterative model-building, density modification and refinement starting from experimental phases, and Tutorial 6: Automatically rebuilding a structure solved by Molecular Replacement, for a description of the method used by the AutoBuild Wizard to run build jobs as sub-processes and to combine the results into single models. Here are the key factors that determine how splitting model-building into batches and running them on one or more processors works:

nbatch is the number of batches of work. As long as nbatch is fixed, the results of running the Wizard will be the same no matter how many processors are used. It is most efficient, however, to have nbatch be at least as large as nproc, the number of processors; otherwise some processors may end up doing nothing. The default is nbatch=3. The value of nbatch is used to set other defaults (such as number_of_parallel_models).

nproc is the number of processors to split the work among.

number_of_parallel_models is the number of models to build at once. The default is to set number_of_parallel_models=nbatch. This affects both standard building (number_of_parallel_models sets how many initial models to build) and rebuild_in_place (number_of_parallel_models determines whether a single model is built, or a set of models is built and recombined into a single model).
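The relationship between nbatch and nproc can be sketched as follows: the work is split into nbatch batches, and the batches are distributed over nproc workers. The round-robin scheduling shown is a hypothetical illustration of why results depend only on nbatch (the batch contents are fixed) while nbatch >= nproc is needed to keep every processor busy.

```python
def split_into_batches(work_items, nbatch):
    """Deterministic split: depends only on nbatch, not on nproc."""
    return [work_items[i::nbatch] for i in range(nbatch)]

def assign_to_processors(batches, nproc):
    """Round-robin assignment of batches to processors (illustrative)."""
    workers = [[] for _ in range(nproc)]
    for i, batch in enumerate(batches):
        workers[i % nproc].append(batch)
    return workers

work = list(range(12))
batches = split_into_batches(work, nbatch=3)

# Same batches regardless of processor count, so results are reproducible:
assert batches == split_into_batches(work, nbatch=3)

# With nproc > nbatch, some processors get no work at all:
idle = [w for w in assign_to_processors(batches, nproc=4) if not w]
print(len(idle))  # 1
```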

Model editing during rebuilding with the Coot-PHENIX interface

The AutoBuild Wizard allows you to edit a model and give it back to the Wizard during the iterative model-building, density modification and refinement process. The Wizard will consider the model that you give it along with the models that it generates automatically, and will choose the parts of your model that fit the density better than other models. You can edit a model using the PHENIX-Coot interface. This interface is accessible through the GUI and via the command line. Using the GUI, when a model has been produced by the AutoBuild Wizard, you can double-click the button on the GUI labelled View/edit files with coot to start Coot with your current map and model. If you are running from the command line, you can open a new window and type:

phenix.autobuild coot

which will do the same (provided the necessary map and model are ready). When Coot has been loaded, your map and model will be displayed along with a PHENIX-Coot Interface window. You can edit your model and then save it, giving it back to PHENIX with the button labelled something like Save model as COMM/overall_best_coot_7.pdb. This button creates the indicated file and also tells PHENIX to look for this file and to try to include the contents of the model in the building process. The precise use of the model that you save depends on the type of model-building that is being carried out by the AutoBuild Wizard. If you are using rebuild_in_place, then the main-chain and side-chains of the model are considered as replacements for the current working model. Any ligands or unrecognized residues are (by default) not rebuilt but are included in refinement. By default, solvent in the model is ignored. If you are not using rebuild_in_place, only the main-chain conformation is considered, and the side-chains are ignored. Ligands (but not solvent) in the model are (by default) kept and included in refinement. As the AutoBuild Wizard continues to build new models and create new maps, you can update the PHENIX-Coot Interface to the current best model and map with the button Update with current files from PHENIX.

Resolution limits in AutoBuild

There are several resolution limits used in AutoBuild. You can leave them all at their defaults, or you can set any of them individually. Here is a list of these limits and how their default values are set:

resolution
  Overall resolution. Used as the high-resolution limit for density modification, and as the default for refinement resolution and model-building resolution if they are not set.
  Default: resolution of the input datafile. If a hires datafile is provided, the resolution of that data is used.

refinement_resolution
  Resolution for refinement.
  Default: value of "resolution".

resolution_build
  Resolution for model-building.
  Default: value of "resolution".

overall_resolution
  Resolution to truncate all data. This should only be used if you need to truncate the data in order to get the Wizard to run. It causes the Wizard to ignore all data at higher resolution than overall_resolution. It is normally better to use the resolution keyword to define the resolution limits, as that will keep all the data in the output and working files.
  Default: None.

multiple_models_starting_resolution
  Resolution for the initial rebuilding of a model in the multiple-models procedure. Normally a low resolution, to generate diversity.
  Default: 4 A.
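The default cascade for the resolution limits can be sketched as a small resolver. The function below is a hypothetical illustration of the precedence rules stated above (datafile resolution feeds "resolution", which in turn feeds the building and refinement limits), not PHENIX code; the 2.8 A example datafile resolution is made up.

```python
def resolve_limits(resolution=None, resolution_build=None,
                   refinement_resolution=None, datafile_resolution=2.8,
                   hires_resolution=None):
    """Apply the default cascade from the AutoBuild resolution-limits list."""
    # "resolution" defaults to the input datafile, or the hires file if given.
    if resolution is None:
        resolution = hires_resolution if hires_resolution is not None else datafile_resolution
    # Building and refinement limits fall back to "resolution".
    if resolution_build is None:
        resolution_build = resolution
    if refinement_resolution is None:
        refinement_resolution = resolution
    return resolution, resolution_build, refinement_resolution

print(resolve_limits())                           # (2.8, 2.8, 2.8)
print(resolve_limits(hires_resolution=2.0))       # (2.0, 2.0, 2.0)
print(resolve_limits(refinement_resolution=2.5))  # (2.8, 2.8, 2.5)
```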

Examples

Run AutoBuild automatically after AutoSol

phenix.autobuild after_autosol

Run AutoBuild beginning with experimental data

phenix.autobuild data=solve_1.mtz seq_file=seq.dat

Merge in hires data

phenix.autobuild data=solve_2.mtz hires_file=w1.sca seq_file=seq.dat

Make a SA-omit map around atoms in target.pdb

phenix.autobuild data=data.mtz model=coords.pdb omit_box_pdb=target.pdb composite_omit_type=sa_omit

Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ .

An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Make a simple composite omit map

phenix.autobuild data=data.mtz model=coords.pdb composite_omit_type=simple_omit

Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ .

An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Make an iterative-build omit map around atoms in target.pdb

phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=target.pdb \

composite_omit_type=iterative_build_omit

Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ .

An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Make a sa-omit map around residues 3 and 4 in chain A of coords.pdb

phenix.autobuild data=w1.sca model=coords.pdb omit_box_pdb=coords.pdb \

omit_res_start_list=3 omit_res_end_list=4 omit_chain_list=A \

composite_omit_type=sa_omit

Coefficients for the output omit map will be in the file resolve_composite_map.mtz in the subdirectory OMIT/ .

An additional map coefficients file omit_region.mtz will show you the region that has been omitted. (Note: be sure to use the weights in both resolve_composite_map.mtz and omit_region.mtz).

Create one very good rebuilt model

phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \

include_input_model=True \

multiple_models_number=1 n_cycle_rebuild_max=5

The final model will be in the file MULTIPLE_MODELS/all_models.pdb (this file will contain just one model).

Note that this procedure will keep the sequence that is present in coords.pdb. If you supply a sequence file, it will edit the sequence of coords.pdb to match your sequence file and discard any residues that do not match. (If you want to input a sequence file but not edit the sequence in coords.pdb and not discard any non-matching residues, then specify also edit_pdb=False.) Note also that if include_input_model=True, then no randomization cycle will be carried out and multiple_models_starting_resolution is ignored.

Touch up a model

phenix.autobuild data=data.mtz model=coords.pdb \
  touch_up=True worst_percent_res_rebuild=2 min_cc_res_rebuild=0.8

You can rebuild just the worst parts of your model by setting touch_up=True. You can decide what parts to rebuild based on a minimum model-map correlation (by residue). You can decide how much to rebuild using worst_percent_res_rebuild or with min_cc_res_rebuild, or both.
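The residue-selection logic implied by worst_percent_res_rebuild and min_cc_res_rebuild can be sketched as below. The per-residue correlation values and the exact combination rule (union of the two criteria) are assumptions for illustration; only the two keyword names come from the text above.

```python
def residues_to_rebuild(cc_by_residue, worst_percent=2.0, min_cc=0.8):
    """Pick residues that fail the CC cutoff OR fall in the worst percentile."""
    n_worst = max(1, round(len(cc_by_residue) * worst_percent / 100.0))
    ranked = sorted(cc_by_residue, key=cc_by_residue.get)  # worst CC first
    worst_set = set(ranked[:n_worst])
    low_cc = {res for res, cc in cc_by_residue.items() if cc < min_cc}
    return sorted(worst_set | low_cc)

# Hypothetical per-residue model-map correlations:
cc = {1: 0.95, 2: 0.91, 3: 0.55, 4: 0.88, 5: 0.72}
print(residues_to_rebuild(cc))  # [3, 5]
```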

Create 20 very good rebuilt models that are as different as possible

phenix.autobuild data=data.mtz model=coords.pdb multiple_models=True \

multiple_models_number=20 n_cycle_rebuild_max=5

The 20 models will be in the file MULTIPLE_MODELS/all_models.pdb. This procedure is useful for generating an ensemble of models that are each individually consistent with the data, and yet are diverse. The variation among these models is an indication of the uncertainty in each of the models. Note that the ensemble of models is not a representation of the ensemble of structures that is truly present in the crystal.

Morph an MR model and rebuild it

phenix.autobuild data=data.mtz model=MR.pdb \
  morph=True rebuild_in_place=False seq_file=seq.dat

You can have AutoBuild morph your input model, distorting it to match the density-modified map that is produced from your model and data. This can be used to make an improved starting model in cases where the MR model is very different from the structure that is to be solved. For the morphing to work, the two structures must be topologically similar and differ mostly by movements of domains or motifs, such as a group of helices or a sheet. The morphing process consists of identifying a coordinate shift to apply to each N (or P for nucleic acids) atom that maximizes the local density correlation between the model and the map. This is smoothed and applied to the structure to generate a morphed structure.
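The smoothing step of the morphing procedure can be illustrated with a simple moving average over per-residue shifts along the chain. The window size and averaging scheme here are assumptions for illustration; the text above states only that the per-atom shifts are smoothed before being applied.

```python
def smooth_shifts(shifts, window=5):
    """Moving average of per-residue coordinate shifts (1-D for simplicity)."""
    half = window // 2
    smoothed = []
    for i in range(len(shifts)):
        lo, hi = max(0, i - half), min(len(shifts), i + half + 1)
        neighborhood = shifts[lo:hi]
        smoothed.append(sum(neighborhood) / len(neighborhood))
    return smoothed

# A single outlier shift is damped toward its neighbours:
raw = [0.0, 0.0, 3.0, 0.0, 0.0]
print(smooth_shifts(raw, window=3))  # [0.0, 1.0, 1.0, 1.0, 0.0]
```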

Build an RNA chain

phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=RNA

Build a DNA chain

phenix.autobuild data=solve_1.mtz seq_file=seq.dat chain_type=DNA

Just make maps; don't do any building.

phenix.autobuild data=data.mtz model=coords.pdb maps_only=True

Just calculate a prime-and-switch map

phenix.autobuild data=data.mtz solvent_fraction=.6 \

ps_in_rebuild=True model=coords.pdb maps_only=True

The output prime-and-switch map will be in the file prime_and_switch.mtz.

Possible Problems

General limitations

The AutoBuild wizard edits input PDB files to remove multiple conformations. It will also renumber residues if the file contains residues with insertion codes. All references to residue numbers (e.g., rebuild_res_start_list) refer to the edited, renumbered model. This model can be found in the AutoBuild_run_1_ (or appropriate) directory as "edited_pdb.pdb".

The AutoBuild wizard expects residue numbers to not decrease along a chain. It will stop if residue 250 in chain B is found between residues 116 and 117 in the same chain, for example. To get around this, use insertion codes (make residue 250 residue 116A instead).
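The ordering requirement above can be checked before a run with a few lines of Python. The record parsing (chain ID in column 22, residue number in columns 23-26) follows the standard fixed-width PDB format; the checker itself is a hypothetical convenience, not part of AutoBuild.

```python
def decreasing_residues(pdb_lines):
    """Report (chain, resseq) pairs where the residue number decreases."""
    last = {}  # chain id -> last residue number seen
    problems = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chain, resseq = line[21], int(line[22:26])
            if chain in last and resseq < last[chain]:
                problems.append((chain, resseq))
            last[chain] = resseq
    return problems

# Residue 117 follows residue 250 in chain B -- AutoBuild would stop here:
lines = [
    "ATOM      1  CA  ALA B 116       0.000   0.000   0.000  1.00 20.00           C",
    "ATOM      2  CA  ALA B 250       0.000   0.000   0.000  1.00 20.00           C",
    "ATOM      3  CA  ALA B 117       0.000   0.000   0.000  1.00 20.00           C",
]
print(decreasing_residues(lines))  # [('B', 117)]
```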

The AutoBuild model-building can only build one type of chain at a time (default chain_type='PROTEIN'; other choices are RNA and DNA). If you supply a PDB file containing more than one type of chain for rebuilding, then all the residues that are not that type of chain are treated as ligands and are (by default, keep_input_ligands=True) included in refinement but not in rebuilding. Any input solvent molecules are (by default, keep_input_waters=False) ignored.

You can include more than one type of chain in rebuilding by supplying one type of chain as ligands with input_lig_file_list and rebuilding another type:

chain_type=PROTEIN # build only protein
input_lig_file_list=MyDNA.pdb # just read in DNA coordinates and include in refinement

In this case only protein chains will be built, but the DNA coordinates in MyDNA.pdb will be included in all refinements and will be written out to the final coordinate file. You may wish to add the keyword:

keep_pdb_atoms=False # keep the ligand atoms if model (pdb) and ligand overlap

which will tell AutoBuild that the ligand (DNA) atoms are to be kept if the model that is being built (protein) overlaps with it. (The default is to keep the model that is being built and to discard any ligand atoms that overlap.) This whole process is likely to require substantial editing of the PDB files by hand, because when you build DNA, a lot of chains are going to be built into the protein region, and when you build protein, it is going to be accidentally built into the DNA.

● Any file in input_lig_file_list containing ATOM records will have them replaced with HETATM records. This is so that the rebuild_in_place algorithm does not try to use them in rebuilding.

● The ligand generation routine in phenix.elbow will not generate heme groups at this point. Most other ligands can be automatically generated.

● If your input data file contains both intensity data and amplitude data, only the amplitude data is exposed in the AutoBuild Wizard. If you want to use the intensity data then you have to create a file that does not have amplitude data in it.

If your input data file has only intensity data and you wish to specify which columns of data the AutoBuild Wizard is to use, then you have to specify the names that the columns will have AFTER importing the data and conversion to amplitudes, not the original column names. These column names may not be obvious. Here is how to find out what they will be. Do a quick dummy run like this, with XXX as labels:

phenix.autobuild w2.sca coords.pdb input_labels="XXX XXX"

The Wizard will print out a list of available labels like this:

Sorry, the label XXX does not exist as an amplitude array in the input_data_file ImportRawData_run_8_/w2_PHX.mtz
...available labels are: ['w2', 'SIGw2', 'None']

Then you know that the correct command is:

phenix.autobuild w2.sca coords.pdb input_labels="w2 SIGw2"

The AutoBuild Wizard cannot build modified residues. If you supply a model with modified residues, these will be taken out of the chain and treated as ligands, and the chain will be broken at that point. By default the modified residues will be added to your model just before refinement and a cif definitions file will be automatically generated for these residues. You can also add these residues with the input_lig_file_list procedure if you want.

The AutoBuild Wizard will not build very short chains unless you set the variable group_ca_length (default=4 for building a model from scratch) to a smaller number. The shortest chain that will be built is group_ca_length. If you use rebuild_in_place, then the default shortest chain allowed is 1 residue, so any part of a model you supply is rebuilt.

Specific limitations and problems

The size of the asymmetric unit in the SOLVE/RESOLVE portion of the AutoBuild wizard is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36) versions. Larger-size versions can be obtained on request.

The AutoBuild Wizard can take most settings of most space groups; however, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting, due to difficulties with the use of asuset in the version of the ccp4 libraries used in PHENIX for these settings and space groups.

Literature

Iterative model building, structure refinement and density modification with the PHENIX AutoBuild wizard. T.C. Terwilliger, R.W. Grosse-Kunstleve, P.V. Afonine, N.W. Moriarty, P.H. Zwart, L.-W. Hung, R.J. Read, and P.D. Adams. Acta Cryst. D64, 61-69 (2008)

Interpretation of ensembles created by multiple iterative rebuilding of macromolecular models. T.C. Terwilliger, R.W. Grosse-Kunstleve, P.V. Afonine, P.D. Adams, N.W. Moriarty, P.H. Zwart, R.J. Read, D. Turk and L.-W. Hung. Acta Cryst. D63, 597-610 (2007)

Using prime-and-switch phasing to reduce model bias in molecular replacement. T.C. Terwilliger. Acta Cryst. D60, 2144-2149 (2004)

Improving macromolecular atomic models at moderate resolution by automated iterative model building, statistical density modification and refinement. T.C. Terwilliger. Acta Cryst. D59, 1174-1182 (2003)

Statistical density modification using local pattern matching. T.C. Terwilliger. Acta Cryst. D59, 1688-1701 (2003)

Automated side-chain model building and sequence assignment by template matching. T.C. Terwilliger. Acta Cryst. D59, 45-49 (2003)

Automated main-chain model building by template matching and iterative fragment extension. T.C. Terwilliger. Acta Cryst. D59, 38-44 (2003)

Rapid automatic NCS identification using heavy-atom substructures. T.C. Terwilliger. Acta Cryst. D58, 2213-2215 (2002)

Statistical density modification with non-crystallographic symmetry. T.C. Terwilliger. Acta Cryst. D58, 2082-2086 (2002)

Maximum likelihood density modification. T.C. Terwilliger. Acta Cryst. D56, 965-972 (2000)

Maximum-likelihood density modification with pattern recognition of structural motifs. T.C. Terwilliger. Acta Cryst. D57, 1755-1762 (2001)

Map-likelihood phasing. T.C. Terwilliger. Acta Cryst. D57, 1763-1775 (2001)

Additional information

List of all AutoBuild keywords

-------------------------------------------------------------------------------

Legend:
  black bold - scope names
  black - parameter names
  red - parameter values
  blue - parameter help
  blue bold - scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

-------------------------------------------------------------------------------

autobuild

data= None Datafile (alias for input_data_file) This file can be a .sca or

mtz or other standard file. The Wizard will guess the column

identification. You can specify the column labels to use with:

input_labels='FP SIGFP PHIB FOM HLA HLB HLC HLD FreeR_flag'

Substitute any labels you do not have with None. If you only have

myFP and mysigFP you can just say input_labels='myFP mysigFP'.

(Command-line only)

model= None PDB file with starting model (alias for input_pdb_file) NOTE:

If your PDB file has been previously refined, then please make sure

that you provide the free R flags that were used in that refinement.

These can come from the data file or from the refinement_file.

(Command-line only).

seq_file= Auto Sequence file (alias for input_seq_file). The format is

plain text, with chains separated by a line starting with > ,

any blanks and unrecognized characters are ignored. You need only

input 1 copy of each unique chain. (Command-line only)

map_file= Auto MTZ file containing starting map (alias for input_map_file)

This file must be a mtz file. The Wizard will guess the column

identification. You can specify the column labels to use with:

input_map_labels='FP PHIB FOM' Substitute any labels you do not

have with None. If you only have myFP and myPHIB you can just say

input_map_labels='myFP myPHIB'. (Command-line only)

refinement_file= Auto File for refinement (alias for input_refinement_file)

This file can be a .sca or mtz or other standard file.

This file will be merged with your data file, with any

phase information coming from your data file. If this file

has free R flags, they will be used, otherwise if the data

file has them, those will be used, otherwise they will be

generated. The Wizard will guess the column

identification. You can specify the column labels to use

with: input_refinement_labels='FP SIGFP FreeR_flag'

Substitute any labels you do not have with None. If you

only have myFP and mysigFP you can just say

input_refinement_labels='myFP mysigFP'. (Command-line

only).

hires_file= Auto File with high-resolution data (alias for

input_hires_file) This file can be a .sca or mtz or other

standard file. The Wizard will guess the column identification.

You can specify the column labels to use with:

input_hires_labels='FP SIGFP'. (Command-line only)

special_keywords

write_run_directory_to_file= None Writes the full name of a run

directory to the specified file. This can

be used as a call-back to tell a script

where the output is going to go.

(Command-line only)

run_control

coot= None Set coot to True and optionally run=[run-number] to run Coot

with the current model and map for run run-number. In some wizards

(AutoBuild) you can edit the model and give it back to PHENIX to

use as part of the model-building process. If you just say coot

then the facts for the highest-numbered existing run will be

shown. (Command-line only)

ignore_blanks= None ignore_blanks allows you to have a command-line

keyword with a blank value like "input_lig_file_list="

stop= None You can stop the current wizard with "stopwizard" or "stop".

If you type "phenix.autobuild run=3 stop" then this will stop run

3 of autobuild. (Command-line only)

display_facts= None Set display_facts to True and optionally

run=[run-number] to display the facts for run run-number.

If you just say display_facts then the facts for the

highest-numbered existing run will be shown.

(Command-line only)

display_summary= None Set display_summary to True and optionally

run=[run-number] to show the summary for run

run-number. If you just say display_summary then the

summary for the highest-numbered existing run will be

shown. (Command-line only)

carry_on= None Set carry_on to True to carry on with highest-numbered

run from where you left off. (Command-line only)

run= None Set run to n to continue with run n where you left off.

(Command-line only)

copy_run= None Set copy_run to n to copy run n to a new run and continue

where you left off. (Command-line only)

display_runs= None List all runs for this wizard. (Command-line only)

delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)

display_labels= None display_labels=test.mtz will list all the labels

that identify data in test.mtz. You can use the label

strings that are produced in AutoSol to identify which

data to use from a datafile like this: peak.data="F+

SIGF+ F- SIGF-" # the entire string in quotes counts

here You can use the individual labels from these

strings as identifiers for data columns in AutoSol and

AutoBuild like this: input_refinement_labels="FP SIGFP

FreeR_flags" # each individual label counts

dry_run= False Just read in and check parameter names

params_only= False Just read in and return parameter defaults

display_all= False Just read in and display parameter defaults

crystal_info

cell= 0.0 0.0 0.0 0.0 0.0 0.0

Enter cell parameter a b c alpha beta

gamma

chain_type= *Auto PROTEIN DNA RNA You can specify whether to build

protein, DNA, or RNA chains. At present you can only build

one of these in a single run. If you have both DNA and

protein, build one first, then run AutoBuild again,

supplying the prebuilt model in the "input_lig_file_list"

and build the other. NOTE: default for this keyword is Auto,

which means "carry out normal process to guess this

keyword". The process is to look at the sequence file and/or

input pdb file to see what the chain type is. If there are

more than one type, the type with the larger number of

residues is guessed. If you want to force the chain_type,

then set it to PROTEIN RNA or DNA.

dmax= 500.0

Low-resolution limit

overall_resolution= 0.0

If overall_resolution is set, then all data

beyond this is ignored. NOTE: this is only suggested

if you have a very big cell and need to truncate the

data to allow the wizard to run at all. Normally you

should use 'resolution' and 'resolution_build' and

'refinement_resolution' to set the high-resolution

limit

resolution= 0.0

High-resolution limit. Used as resolution limit for

density modification and as general default high-resolution

limit. If resolution_build or refinement_resolution are set

then they override this for model-building or refinement. If

overall_resolution is set then data beyond that resolution

is ignored completely.
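A sketch of how these resolution keywords combine (file names and cutoff values are hypothetical): `resolution` sets the general default, while `resolution_build` overrides it for model-building only.

```shell
# Density modification at 2.5 A, model-building at a slightly lower 2.8 A
phenix.autobuild data=data.mtz seq_file=seq.dat \
  resolution=2.5 resolution_build=2.8
```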

sg= None Space Group symbol (e.g., C2221 or C 2 2 21)

solvent_fraction= None Solvent fraction in crystals (0 to 1).

decision_making

acceptable_r= 0.25

Used to decide whether the model is acceptable enough

to quit if it is not improving much. A good value is 0.25

dist_close= None If main-chain atom rmsd is less than dist_close then

crossover between chains in different models is allowed at

this point. If you input a negative number the defaults

will be used

dist_close_overlap= 1.5

Model or ligand coordinates but not both are

kept when model and ligand coordinates are within

dist_close_overlap and ligands in

input_lig_file_list are being added to the current

model. NOTE: you might want to decrease this if your

ligand atoms get removed by the wizard. Default=1.5 A

group_ca_length= 4 In resolve building you can specify how short a

fragment to keep. Normally 4 or 5 residues should be

the minimum.

group_length= 2 In resolve building you can specify how many fragments

must be joined to make a connected group that is kept.

Normally 2 fragments should be the minimum.

include_molprobity= False You can choose to include the clash score from

MolProbity as one of the scoring criteria in

comparing and merging models. The score is combined

with the model-map correlation CC by summing in a

weighted clashscore. If clashscore for a residue has

a value < ok_molp_score then its value is

(clashscore-ok_molp_score)*scale_molp_score,

otherwise its value is zero.

loop_cc_min= 0.4

You can specify the minimum correlation of density from

a loop with the map.

min_cc_res_rebuild= 0.5

You can rebuild just the worst parts of your

model by setting touch_up=True. You can decide what


parts to rebuild based on a minimum model-map

correlation (by residue). You can decide how much to

rebuild using worst_percent_res_rebuild or with

min_cc_res_rebuild, or both.

min_seq_identity_percent= 50.0

The sequence in your input PDB file will

be adjusted to match the sequence in your

sequence file (if any). If there are

insertions/deletions in your model and the

wizard does not seem to identify them, you can

split up your PDB file by adding records like

this: BREAK. You can specify the minimum

sequence identity between your sequence file

and a segment from your input PDB file to

consider the sequences to be matched. Default

is 50.0%. You might want a higher number to

make sure that deletions in the sequence are

noticed.

ok_molp_score= None You can choose to include the clash score from

MolProbity as one of the scoring criteria in comparing

and merging models. The score is combined with the

model-map correlation CC by summing in a weighted

clashscore. If clashscore for a residue has a value <

ok_molp_score (the threshold defined by ok_molp_score)

then its value is

(clashscore-ok_molp_score)*scale_molp_score, otherwise

its value is zero.

r_switch= 0.4

R-value criteria for deciding whether to use R-value or

residues built. A good value is 0.40

scale_molp_score= None You can choose to include the clash score from

MolProbity as one of the scoring criteria in comparing

and merging models. The score is combined with the

model-map correlation CC by summing in a weighted

clashscore. If clashscore for a residue has a value <

ok_molp_score then its value is

(clashscore-ok_molp_score)*scale_molp_score, otherwise

its value is zero.

semi_acceptable_r= 0.3

Used to decide whether the model is acceptable

enough to skip rebuilding the model from scratch and

focus on adding loops and extending it. A good value

is 0.35
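A minimal sketch of overriding the three decision thresholds together (data file name hypothetical; the values shown are the "good values" suggested above):

```shell
phenix.autobuild data=data.mtz seq_file=seq.dat \
  acceptable_r=0.25 semi_acceptable_r=0.35 r_switch=0.40
```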

density_modification

hl= False You can choose whether to calculate hl coeffs when doing

density modification ('Yes') or not to do so ('No'). Default is No.

mask_type= *histograms probability wang Choose method for obtaining

probability that a point is in the protein vs solvent region.

Default is "histograms". If you have a SAD dataset with a

heavy atom such as Pt or Au then you may wish to choose

"wang" because the histogram method is sensitive to very high

peaks. Options are: histograms: compare local rms of map and

local skew of map to values from a model map and estimate

probabilities. This one is usually the best. probability:

compare local rms of map to distribution for all points in

this map and estimate probabilities. In a few cases this one

is much better than histograms. wang: take points with

highest local rms and define as protein.

modify_outside_delta_solvent= 0.05

You can set the initial solvent

content to be a little lower than

calculated when you are running

modify_outside_model. Usually 0.05 is fine.

modify_outside_model= False You can choose whether to modify the density

in the "protein" region outside the region

specified in your current model by matching


histograms with the region that is specified by

that model. This can help by raising the density

in this protein region up to a value similar to

that where atoms are already placed.

thorough_denmod= *Auto Yes No True False Choose whether you want to go

for thorough density modification when no model is used

("No" speeds it up and for a terrible map is sometimes

better)

truncate_ha_sites_in_resolve= *Auto Yes No True False You can choose to

truncate the density near heavy-atom sites

at a maximum of 2.5 sigma. This is useful

in cases where the heavy-atom sites are

very strong, and rarely hurts in cases

where they are not. The heavy-atom sites

are specified with "input_ha_file"

use_resolve_fragments= True This script normally uses information from

fragment identification as part of density

modification for the first few cycles of

model-building. Fragments are identified during

model-building. The fragments are used, with

weighting according to the confidence in their

placement, in density modification as targets for

density values.

use_resolve_pattern= True Local pattern identification is normally used

as part of density modification during the first

few cycles of model building.

general

after_autosol= False You can specify that you want to continue on

starting with the highest-scoring run of AutoSol.

background= True When you specify nproc=nn, you can run the jobs in

background (default if nproc is greater than 1) or

foreground (default if nproc=1). If you set

run_command=qsub (or otherwise submit to a batch queue),

then you should set background=False, so that the batch

queue can keep track of your runs. There is no need to use

background=True in this case because all the runs go as

controlled by your batch system. If you use run_command=csh

(or similar, csh is default) then normally you will use

background=True so that all the jobs run simultaneously.

base_path= None You can specify the base path for files (default is

current working directory)

clean_up= False At the end of the entire run the TEMP directories will

be removed if clean_up is True. The default is No, keep these

directories. If you want to remove them after your run is

finished use a command like "phenix.autobuild run=1

clean_up=True"

coot_name= coot If your version of coot is called something else, then

you can specify that here.

debug= False You can have the wizard stop with error messages about the

code if you use debug. NOTE: you cannot use Pause with debug.

extra_verbose= False Facts and possible commands will be printed every

cycle if Yes

i_ran_seed= 289564 Random seed (positive integer) for model-building

and simulated annealing refinement

max_wait_time= 100.0

You can specify the length of time (seconds) to

wait when testing the run_command. If you have a cluster

where jobs do not start right away you may need a longer

time to wait.

nbatch= 3 You can specify the number of processors to use (nproc) and

the number of batches to divide the data into for parallel jobs.

Normally you will set nproc to the number of processors

available and leave nbatch alone. If you leave nbatch as None it


will be set automatically, with a value depending on the Wizard.

This is recommended. The value of nbatch can affect the results

that you get, as the jobs are not split into exact replicates,

but are rather run with different random numbers. If you want to

get the same results, keep the same value of nbatch.

nproc= 1 You can specify the number of processors to use (nproc) and the

number of batches to divide the data into for parallel jobs.

Normally you will set nproc to the number of processors available

and leave nbatch alone. If you leave nbatch as None it will be

set automatically, with a value depending on the Wizard. This is

recommended. The value of nbatch can affect the results that you

get, as the jobs are not split into exact replicates, but are

rather run with different random numbers. If you want to get the

same results, keep the same value of nbatch.
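The interplay of nproc and nbatch might be sketched like this (file names hypothetical):

```shell
# Typical use: set nproc to the processors available, let nbatch default
phenix.autobuild data=data.mtz seq_file=seq.dat nproc=4

# To reproduce an earlier run exactly, pin nbatch to the same value,
# since the batch split affects the random numbers and thus the results
phenix.autobuild data=data.mtz seq_file=seq.dat nproc=4 nbatch=3
```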

quick= False Run everything quickly (number_of_parallel_models=1

n_cycle_build_max=1 n_cycle_rebuild_max=1)

resolve_command_list= None Commands for resolve. One per line in the

form: keyword value value can be optional

Examples: coarse_grid resolution 200 2.0 hklin

test.mtz NOTE: for command-line usage you need to

enclose the whole set of commands in double quotes

(") and each individual command in single quotes

(') like this: resolve_command_list="'no_build'

'b_overall 23' "
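Putting the quoting rule above into a full (hypothetical) command line, with the whole list in double quotes and each resolve command in single quotes:

```shell
phenix.autobuild data=data.mtz seq_file=seq.dat \
  resolve_command_list="'no_build' 'b_overall 23'"
```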

resolve_pattern_command_list= None Commands for resolve_pattern. One

per line in the form: keyword value

value can be optional Examples:

resolution 200 2.0 hklin test.mtz NOTE:

for command-line usage you need to enclose

the whole set of commands in double quotes

(") and each individual command in single

quotes (') like this:

resolve_pattern_command_list="'resolution

200 20' 'hklin test.mtz' "

resolve_size= _giant _huge _extra_huge *None Size for solve/resolve

("","_giant","_huge","_extra_huge")

run_command= csh When you specify nproc=nn, you can run the subprocesses

as jobs in background with csh (default) or submit them to

a queue with the command of your choice (e.g., qsub). If

you have a multi-processor machine, use csh. If you have a

cluster, use qsub or the equivalent command for your

system. NOTE: If you set run_command=qsub (or otherwise

submit to a batch queue), then you should set

background=False, so that the batch queue can keep track of

your runs. There is no need to use background=True in this

case because all the runs go as controlled by your batch

system. If you use run_command=csh (or similar, csh is

default) then normally you will use background=True so that

all the jobs run simultaneously.
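The two usual run_command/background pairings described above can be sketched as (file names hypothetical):

```shell
# Multi-processor workstation: background jobs with csh (the default)
phenix.autobuild data=data.mtz seq_file=seq.dat nproc=4 \
  run_command=csh background=True

# Cluster with a batch queue: let the queue track the jobs
phenix.autobuild data=data.mtz seq_file=seq.dat nproc=4 \
  run_command=qsub background=False
```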

skip_xtriage= False You can bypass xtriage if you want. This will

prevent you from applying anisotropy corrections, however.

temp_dir= None Define a temporary directory (it must exist)

title= Run 1 AutoBuild Sun Dec 7 17:46:23 2008 Enter any text you like

to help identify what you did in this run

top_output_dir= None This is used in subprocess calls of wizards and to

tell the Wizard where to look for the STOPWIZARD file.

verbose= False Command files and other verbose output will be printed

input_files

cif_def_file_list= None You can enter any number of CIF definition

files. These are normally used to tell phenix.refine

about the geometry of a ligand or unusual residue.

You usually will use these in combination with "PDB


file with metals/ligands" (keyword

"input_lig_file_list" ) which allows you to attach

the contents of any PDB file you like to your model

just before it gets refined. You can use

phenix.elbow to generate these if you do not have a

CIF file and one is requested by phenix.refine

input_data_file= None Enter a file with input structure factor data.

For structure factor data only (e.g., FP SIGFP) any

format is ok. If you have free R flags, phase

information or HL coefficients that you want to use

then an mtz file is required. If this file contains

phase information, this phase information should be

experimental (i.e., MAD/SAD/MIR etc), and should not be

density-modified phases (enter any files with

density-modified phases as input_map_file instead).

NOTE: If you supply HL coefficients they will be used

in phase recombination. If you supply PHIB or PHIB and

FOM and not HL coefficients, then HL coefficients will

be derived from your PHIB and FOM and used in phase

recombination. If you also specify a hires data file,

then FP and SIGFP will come from that data file (and

not this one). If an input_refinement_file is

specified, then F, Sigma, FreeR_flag (if present) from

that file will be used for refinement instead of this

one.

input_ha_file= None If the flag "truncate_ha_sites_in_resolve" is set

then density at sites specified with input_ha_file is

truncated to improve the density modification procedure.

input_hires_labels= None Labels for input hires file (FP SIGFP

FreeR_flag)

input_labels= None Labels for input data columns NOTE: Applies to input

data file for LigandFit and AutoBuild, but not to AutoMR.

For AutoMR use instead 'input_label_string'.

input_lig_file_list= None This script adds the contents of these PDB

files to each model just prior to refinement.

Normally you might use this to put in any

heavy-atoms that are in the refined structure (for

example the heavy atoms that were used in phasing),

or to add a ligand to your model. If the atoms in

this PDB file are not recognized by phenix.refine,

then you can specify their geometries with a cif

definitions file using the keyword

"cif_def_file_list". You can easily generate cif

definitions for many ligands using phenix.elbow in

PHENIX. You can put anything you like in the files

in input_lig_file_list, but any atoms that fall

within 1.5 A of any atom in the current model will

be tossed (not written to the model).

input_map_file= Auto Enter an mtz file with coefficients for map (if

different file or different coefficients than input

structure factor data ). This map will be used in the

first cycle of model-building. NOTE: default for this

keyword is Auto, which means "carry out normal process

to guess this keyword". This means if you specify

"after_autosol" in AutoBuild, AutoBuild will

automatically take the value from AutoSol. If you do not

want this to happen, you can specify None which means

"No file"

input_map_labels= None Labels for input map coefficient columns (FP PHIB

FOM) NOTE: FOM is optional (set to None if you wish)

input_pdb_file= None You can enter a PDB file containing a starting

model of your structure. NOTE: If you enter a PDB file


then the AutoBuild wizard will start right in with

rebuild steps, skipping the build process. If the model

is very poor then it may be better to leave it out, as

the build process (which includes pattern recognition

and recognition of helical and strand fragments) is

optimized for improving poor maps, while the rebuild

process is optimized for better maps that can be

produced by having a partial model.

input_refinement_file= Auto Data file to use for refinement. The data in

this file should not be corrected for anisotropy.

It will be combined with experimental phase

information (if any) from input_data_file for

refinement. If you leave this blank, then the

data in the input_data_file will be used in

refinement. If no anisotropy correction is

applied to the data you do not need to specify a

datafile for refinement. If an anisotropy

correction is applied to the data files, then you

should enter an uncorrected datafile for

refinement. Any standard format is fine;

normally only F and sigF will be used. Bijvoet

pairs and duplicates will be averaged. If an mtz

file is provided then a free R flag can be read

in as well. Any HL coeffs and phase information

in this file are ignored. NOTE: default for this

keyword is Auto, which means "carry out normal

process to guess this keyword". This means if you

specify "after_autosol" in AutoBuild, AutoBuild

will automatically take the value from AutoSol.

If you do not want this to happen, you can

specify None which means "No file"

input_refinement_labels= None Labels for input refinement file columns

(FP SIGFP FreeR_flag)

input_seq_file= Auto Enter name of file with 1-letter code of protein

sequence NOTES: 1. lines starting with > are ignored

and separate chains 2. FASTA format is fine 3. If

there are multiple copies of a chain, just enter one

copy. 4. If you enter a PDB file for rebuilding and it

has the sequence you want, then the sequence file is not

necessary. NOTE: You can also enter the name of a PDB

file that contains SEQRES records, and the sequence from

the SEQRES records will be read, written to

seq_from_seqres_records.dat, and used as your input

sequence. NOTE: for AutoBuild you can specify

start_chains_list on the first line of your sequence

file: >> start_chains_list 23 11 5 NOTE: default

for this keyword is Auto, which means "carry out normal

process to guess this keyword". This means if you

specify "after_autosol" in AutoBuild, AutoBuild will

automatically take the value from AutoSol. If you do not

want this to happen, you can specify None which means

"No file"
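A minimal sequence file following the rules above might look like this (the chain sequences and starting residue numbers are made-up placeholders; the first line sets start_chains_list for the two unique chains):

```shell
# Write a hypothetical two-chain sequence file in the format described above
cat > seq.dat << 'EOF'
>> start_chains_list 23 11
> chain A
MKLVINGKTLKGEITVEA
> chain C
GSHMLEDPVDAFK
EOF
```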

keep_input_ligands= True You can choose whether to (by default) let the

wizard keep ligands by separating them out from the

rest of your model and adding them back to your

rebuilt model, or alternatively to remove all

ligands from your input pdb file before

rebuild_in_place.

keep_input_waters= False You can choose whether to keep input waters

(solvent) when using rebuild_in_place. If you keep

them, then you should specify either

"place_waters=No" or "keep_pdb_atoms=No" because if


place_waters=Yes and keep_pdb_atoms=Yes then

phenix.refine will add waters and then the wizard

will keep the new waters from the new PDB file

created by phenix.refine preferentially over the ones

in your input file.

keep_pdb_atoms= True You can choose whether to keep the model

coordinates when model and ligand coordinates are within

dist_close_overlap and ligands in input_lig_file_list

are being added to the current model. Default=Yes

refine_eff_file_list= None You can enter any number of refinement

parameter files. These are normally used to tell

phenix.refine defaults to apply, as well as

creating specialized definitions such as unusual

amino acid residues and linkages. These

parameters override the normal phenix.refine

defaults. They themselves can be overridden by

parameters set by the Wizard and by you,

controlling the Wizard. NOTE: Any parameters set

by AutoBuild directly (such as

number_of_macro_cycles, high_resolution, etc...)

will not be taken from this parameters file. This

is useful only for adding extra parameters not

normally set by AutoBuild.

maps

maps_only= False You can choose whether to skip all model-building and

just calculate maps and write out the results. This also runs

just 1 cycle and turns on HL coefficients.

n_xyz_list= None You can specify the grid to use for map calculations.

model_building

allow_negative_residues= False Normally the wizard does not allow

negative residue numbers, and all residues with

negative numbers are rejected when they are

read in. You can allow them if you wish.

base_model= None You can enter a PDB file with coordinates to be used

as a starting point for model-building. These coordinates

will be included in the same way as fragments placed by

searching for helices and strands in initial model-building.

Note the difference from the use of models in

consider_main_chain_list, which are merged with models after

they are built. NOTE: Only use this if you want to keep the

input model and just add to it.

build_type= RESOLVE_AND_TEXTAL *RESOLVE TEXTAL You can choose to build

models with RESOLVE and TEXTAL or either one, and how many

different models to build with RESOLVE. The more you build,

the more likely to get a complete model. Note that

rebuild_in_place can only be carried out with RESOLVE

model-building

cc_helix_min= None Minimum CC of helical density to map at low

resolution when using helices_strands_only

cc_strand_min= None Minimum CC of strand density to map when using

helices_strands_only

consider_main_chain_list= None This keyword lets you name any number of

PDB files to consider as templates for

model-building. Every time models are built,

the contents of these files will be merged

with them and the best parts will be kept.

NOTE: this only uses the main-chain atoms of

your PDB files.

dist_connect_max_helices= None Set maximum distance between ends of

helices and other ends to try and connect them

in insert_helices.

edit_pdb= True You can choose to edit the input PDB file in


rebuild_in_place to match the input sequence (default=True).

NOTE: residues with residue numbers higher than

'highest_resno' are assumed to not have a known sequence and

will not be edited. By default the value of 'highest_resno' is

the highest residue number from the sequence file, after

adding it to the starting residue number from

start_chains_list. You can also set it directly

helices_strands_only= False You can choose to use a quick model-building

method that only builds secondary structure. At

low resolution this may be both quicker and more

accurate than trying to build the entire structure.

If you are running the AutoSol Wizard, normally

you should choose 'Yes' and use the quick

model-building. Then when your structure is solved

by AutoSol, go on to AutoBuild and build a more

complete model (this time normally using

helices_strands_only=False).

helices_strands_start= False You can choose to use a quick

model-building method that builds secondary

structure as a way to get started...then model

completion is done as usual. (Contrast with

helices_strands_only which only does secondary

structure)

highest_resno= None Highest residue number to be considered "placed" in

sequence for rebuild_in_place

include_input_model= True The keyword include_input_model defines

whether the input model (if any) is to be crossed

with models that are derived from it, and the best

parts of each kept. Note that if

multiple_models=True and include_input_model=True

then no initial cycle of randomization will be

carried out and the keyword

multiple_models_starting_resolution is ignored. In

most cases you should use include_input_model=True

If you want to generate maximum diversity with

multiple-models then you may wish to use

include_input_model=False. Also if you want to

decrease the amount of bias from your starting

model you may wish to use

include_input_model=False.

input_compare_file= None If you are rebuilding a model or already think

you know what the model should be, you can include a

comparison file in rebuilding. The model is not used

for anything except to write out information on

coordinate differences in the output log files.

NOTE: this feature does not always work correctly.

merge_models= False You can choose to only merge any input models and

write out the resulting model. The best parts of each

model will be kept based on model-map correlation.

Normally used along with number_of_parallel_models=1

morph= False You can choose whether to distort your input model in order

to match the current working map. This may be useful for MR

models that are quite distant from the correct structure.

morph_cycles= 2 Number of iterations of morphing each time it is run.

morph_rad= 7.0

Smoothing radius for morphing. The density from your

model and from the map are calculated with the radius

morph_rad, then they are adjusted to overlap optimally

n_ca_enough_helices= None Set maximum number of CA to add to ends of

helices and other ends to try and connect them in

insert_helices.

offsets_list= 53 7 23 You can specify an offset for the orientation of

the helix and strand templates in building. This is used


in generating different starting models.

ps_in_rebuild= False You can choose to use a prime-and-switch resolve

map in all cycles of rebuilding instead of a

density-modified map. This is normally used in

combination with maps_only to generate a prime-and-switch

map.

refine= True This script normally refines the model during building. Say

No to skip refinement

resolution_build= 0.0

Enter the high-resolution limit for

model-building. If 0.0, the value of resolution is

used as a default.

restart_cycle_after_morph= 5 Morphing (if morph=True) will go only up to

this cycle, and then the morphed PDB file

will be used as a starting PDB file from then

on, removing all previous models.

retrace_before_build= False You can choose to retrace your model n_mini

times and use a map based on these retraced models

to start off model-building. This is the default

for rebuilding models if you are not using

rebuild_in_place. You can also specify

n_iter_rebuild, the number of cycles of

retrace-density-modify-build before starting the

main build.

reuse_chain_prev_cycle= True You can choose to allow model-building to

include atoms from each cycle in the model the

next cycle or not

richardson_rotamers= *Auto Yes No True False You can choose to use the

rotamer library from SC Lovell, JM Word, JS

Richardson and DC Richardson (2000) "The
Penultimate Rotamer Library" Proteins: Structure,
Function and Genetics 40:389-408, if you wish.

Typically this works well in RESOLVE model-building

for nearly-final models but not as well earlier in

the process. Default (Auto) is to use these

rotamers for rebuild_in_place but not otherwise.

rms_random_frag= None Rms random position change added to residues on

ends of fragments when extending them If you enter a

negative number, defaults will be used.

rms_random_loop= None Rms random position change added to residues on

ends of loops in tries for building loops If you enter

a negative number, defaults will be used.

semet= False You can specify that the dataset that is used for

refinement is a selenomethionine dataset, and that the model

should be the SeMet version of the protein, with all SD of MET

replaced with Se of MSE.

start_chains_list= None You can specify the starting residue number for

each of the unique chains in your structure. If you

use a sequence file then the unique chains are

extracted and the order must match the order of your

starting residue numbers. For example, if your

sequence file has chains A and B (identical) and

chains C and D (identical to each other, but

different than A and B) then you can enter 2 numbers,

the starting residues for chains A and C. NOTE: you

need to specify an input sequence file for

start_chains_list to be applied.
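For the A/B-plus-C/D example above, a command line might be sketched as follows (file names and residue numbers hypothetical; note that a sequence file must also be given for start_chains_list to take effect):

```shell
# Two starting residue numbers: one for the A/B chain, one for the C/D chain
phenix.autobuild data=data.mtz seq_file=seq.dat start_chains_list="23 11"
```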

trace_as_lig= False You can specify that in building steps the ends of

chains are to be extended using the LigandFit algorithm.

This is default for nucleic acid model-building.

track_libs= False You can keep track of what libraries each atom in a

built structure comes from.

two_fofc_in_rebuild= False You can choose to use a sigmaa-weighted


2Fo-Fc map in all cycles of rebuilding instead of a

density-modified map. If the model is poor this can

sometimes allow model-building in place to work

even when it will not for density-modified maps.

use_any_side= True You can choose to have resolve model-building place

the best-fitting side chain at each position, even if the

sequence is not matched to the map.

use_cc_in_combine_extend= False You can choose to use the correlation of

density rather than density at atomic

positions to score models in combine_extend

use_met_in_align= *Auto Yes No True False You can use the heavy-atom

positions in input_ha_file as markers for Met SD

positions.

multiple_models

combine_only= False Once you have created a set of initial models you

can merge them together into a final set. This option is

useful if you have split up the creation of multiple

models into different directories, and then you have

copied all the initial models to one directory for

combining.

multiple_models= False You can build a set of models, all compatible

with your data. You can specify how many models with

multiple_models_number. If you are using

rebuild_in_place you can specify whether to generate

starting models or not with multiple_models_starting.

multiple_models_first= 1 Specify which model to build first

multiple_models_group_number= 5 You can build several initial models and

merge them. Normally 5 initial models is

fine.

multiple_models_last= 20 Specify which model to end with

multiple_models_number= 20 Specify how many models to build.

multiple_models_starting= True You can specify how to generate starting

models for multiple models. If you are using

rebuild_in_place and you specify "Yes" then

the Wizard will rebuild your starting model at

the resolution specified in

multiple_models_starting_resolution. If you

are not using rebuild_in_place the Wizard will

always build a starting model at the current

resolution.

multiple_models_starting_resolution= 4.0

You can set the resolution for

rebuilding an initial model. A

value of 0.0 will use the

resolution of the dataset.
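A sketch of generating an ensemble with these keywords (file names hypothetical), using rebuild_in_place on a starting model:

```shell
# Build 20 models consistent with the data, regenerating the starting
# model at 4.0 A resolution first
phenix.autobuild data=data.mtz seq_file=seq.dat input_pdb_file=model.pdb \
  multiple_models=True multiple_models_number=20 \
  multiple_models_starting=True multiple_models_starting_resolution=4.0
```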

place_waters_in_combine= True You can choose whether phenix.refine

automatically places ordered solvent (waters)

during the last cycle of multiple-model

generation. This is separate from place_waters,

which applies to all other cycles.

ncs

find_ncs= *Auto Yes No True False This script normally deduces ncs

information from the NCS in chains of models that are built

during iterative model-building. The update is done each cycle

in which an improved model is obtained. Say No to skip this.

See also "input_ncs_file" which can be used to specify NCS at

the start of the process. If find_ncs="No" then only this

starting NCS will be used and it will not be updated. You can

use find_ncs "No" to specify exactly what residues will be

used in NCS refinement and exactly what NCS operators to use

in density modification. You can use the function

$PHENIX/phenix/phenix/command_line/simple_ncs_from_pdb.py to

help you set up an input_ncs_file that has your specifications


in it.

input_ncs_file= None  You can enter NCS information in 3 ways: (1) an ncs_spec file produced by AutoSol or AutoBuild with NCS information; (2) a heavy-atom PDB file that contains ncs in the heavy-atom sites; (3) a PDB file with a model that contains chains with NCS. The wizard will derive NCS information from any of these if specified. See also "find_ncs", which determines whether the wizard will update NCS from models that are built during iterative building.

ncs_copies= None  Number of copies of the molecule in the au (note: only one type of molecule allowed at present)

ncs_refine_coord_sigma_from_rmsd= False  You can choose to use the current NCS rmsd as the value of the sigma for NCS restraints. See also ncs_refine_coord_sigma_from_rmsd_ratio

ncs_refine_coord_sigma_from_rmsd_ratio= 1.0  You can choose to multiply the current NCS rmsd by this value before using it as the sigma for NCS restraints. See also ncs_refine_coord_sigma_from_rmsd

no_merge_ncs_copies= False  Normally False (do merge NCS copies). If True, then do not use each NCS copy to try to build the others.

optimize_ncs= True  This script normally deduces ncs information from the NCS in chains of models that are built during iterative model-building. Optimize NCS adds a step to try to make the molecule formed by NCS as compact as possible, without losing any point-group symmetry.

use_ncs_in_build= True  Use NCS information in the model assembly stage of model-building. Also, if no_merge_ncs_copies is not set, use each NCS copy to try to build the others.

non_user_parameters

background_map= None  You can supply an mtz file (REQUIRED LABELS: FP PHIM FOMM) to use as map coefficients to calculate the electron density in all points in an omit map that are not part of any omitted region. (Default="")

boundary_background_map= None  You can supply an mtz file (REQUIRED LABELS: FP PHIM FOMM) to use as map coefficients to calculate the electron density in all points in the boundary map that are not part of any omitted region. (Default="")

extend_try_list= False  You can fill out the list of parallel jobs to match the number of jobs you want to run at one time, as specified with nbatch.

force_combine_extend= False  You can choose whether to force the combine-extend step in model-building

model_list= None  This keyword lets you name any number of PDB files to consider as starting models for model-building. NOTE: This differs from consider_main_chain_list, which will try to add your PDB files EVERY cycle of merging models. In contrast, model_list will only do it on the first cycle. NOTE: this only uses the main-chain atoms of your PDB files.

oasis_cnos= None  Enter the number of C, N, O and S atoms here if you have OASIS and want to run it before resolve density modification, like this: "C 250 N 121 O 85 S 3"

offset_boundary_background_map= None  You can set the offset of the boundary_background_map.

omit

composite_omit_type= *None simple_omit sa_omit iterative_build_omit  Your choices of types of OMIT maps are: None - normal operation, no omit; simple_omit - omit the atoms in the OMIT region in calculating a sigmaA-weighted 2mFo-DFc map, with no refinement; sa_omit - omit the atoms in the OMIT region, carry out simulated-annealing refinement, then calculate a sigmaA-weighted 2mFo-DFc map; iterative_build_omit - set the occupancy of atoms in the OMIT region to 0 throughout an entire iterative model-building, density modification and refinement process (takes a long time). All these omit map types are available as composite omit maps (default) or as omit maps around a region defined by a PDB file (using omit_box_pdb_list). The resulting OMIT map will be in the directory OMIT with file name resolve_composite_map.mtz. This mtz file contains the map coefficients to create the OMIT map. The file "omit_region.mtz" contains the coefficients for a map showing the boundaries of the OMIT region.

n_box_target= None  You can tell the Wizard how many omit boxes to try to set up (but it will not necessarily choose your number because it has to be nicely divisible into boxes that fit your asymmetric unit). A suitable number is 24. The larger the number of boxes, the better the map will be, but the longer it will take to calculate the map.
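The "nicely divisible" constraint above amounts to choosing integer box counts along each direction whose product comes close to the requested target. A toy sketch of that idea, not the Wizard's actual algorithm:

```python
from itertools import product

def choose_box_grid(n_box_target, max_per_axis=6):
    """Pick integer box counts (nx, ny, nz) whose product is as close
    as possible to the requested target number of omit boxes."""
    return min(product(range(1, max_per_axis + 1), repeat=3),
               key=lambda g: abs(g[0] * g[1] * g[2] - n_box_target))

nx, ny, nz = choose_box_grid(24)
print(nx * ny * nz)  # -> 24 (a grid whose product hits the target exactly)
```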

n_cycle_image_min= 3  Pattern recognition (resolve_pattern) and fragment identification ("image based density modification") are used as part of the density modification process. These are normally only useful in the first few cycles of iterative model-building. This script tries model-building both with and without including image information, and proceeds with the most complete model. Once at least n_cycle_image_min cycles have been carried out with image information, if the image-based map results in a less-complete model than the one without image information, image information is no longer included.

n_cycle_rebuild_omit= 10  Model-building is normally carried out using the "best" available map. If omit_on_rebuild is Yes, then every n_cycle_rebuild_omit cycles of model rebuilding, a composite omit map is used instead. If you specify 0 and omit_on_rebuild is Yes, omit maps will be used every cycle. Normally every 10th cycle is optimal.

offset_boundary= 1.0  Specify the boundary around omit_box_pdb for definition of the omit region.

omit_box_end= 0  To only carry out omit in some of the omit boxes, use omit_box_start and omit_box_end

omit_box_pdb_list= None  This keyword applies if you have set OMIT region specification to "omit_around_pdb". To automatically set an OMIT region, specify one or more PDB files with omit_box_pdb_list. The omit region boundaries will be the limits in x y z of the atoms in this file, plus a border of offset_boundary. To use only some of the atoms in the file, specify values for the starting and ending residues and the chain to omit (omit_res_start_list, omit_res_end_list and omit_chain_list). If you specify more than one file (or if you specify more than one segment of a file with omit_chain_list or omit_res_start_list and omit_res_end_list) then a set of omit runs will be carried out and combined into one composite omit.
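The interaction of omit_on_rebuild and n_cycle_rebuild_omit described above (omit map on every n-th rebuild cycle, or every cycle when 0) can be sketched as follows; this helper is illustrative, not Wizard code:

```python
def use_omit_map_this_cycle(cycle, omit_on_rebuild, n_cycle_rebuild_omit=10):
    """True when a composite omit map should replace the "best" map:
    every n_cycle_rebuild_omit-th rebuild cycle, or every cycle if 0."""
    if not omit_on_rebuild:
        return False
    if n_cycle_rebuild_omit == 0:
        return True
    return cycle % n_cycle_rebuild_omit == 0

print([c for c in range(1, 21)
       if use_omit_map_this_cycle(c, True)])  # -> [10, 20]
```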

omit_box_start= 0  To only carry out omit in some of the omit boxes, use omit_box_start and omit_box_end

omit_chain_list= None  You can choose to omit just a portion of your model with the keywords omit_res_start_list 3, omit_res_end_list 4, omit_chain_list chain1 (use "" to select all chains). The residues from 3 to 4 of chain1 will be omitted. You can specify more than one region by using the Parameter Group Options button to add lines. If you specify more than one region, a separate omit run will be carried out for each one and then the maps will be put together afterwards. If there is more than one chain in the input PDB file then only the chain defined by omit_chain will be omitted. NOTE: Zero for start and end and "" for chain is the same as choosing everything.

omit_offset_list= 0 0 0 0 0 0  To carry out one iterative build omit with a region defined in grid units, enter nxs,nxe,nys,nye,nzs,nze in omit_offset_list.

omit_on_rebuild= False  You can specify whether to use an omit map for building the model on rebuild cycles. Default is Yes if you start with a model, No if you are building a model from scratch. The omit map is calculated every n_cycle_rebuild_omit cycles.

omit_region_specification= *composite_omit omit_around_pdb  You can specify what region an omit (simple/sa-omit/iterative-build-omit) map is to be calculated for. Composite omit will create a map over the entire asymmetric unit by dividing the asymmetric unit into overlapping boxes, calculating omit maps for each, and splicing all the results together into a single composite omit map. You can tell the Wizard how many omit boxes to try to set up with the keyword "n_box_target" (but it will not necessarily choose your number because it has to be nicely divisible into boxes that fit your asymmetric unit). Omit around PDB will omit around the region defined by the PDB file(s) you enter for omit_box_pdb (or around the residues in that PDB file that you specify). If you specify omit_around_pdb then you must enter a pdb file to omit around.

omit_res_end_list= None  You can choose to omit just a portion of your model with the keywords omit_res_start_list 3, omit_res_end_list 4, omit_chain_list chain1 (use " " for blank). The residues from 3 to 4 of chain1 will be omitted. You can specify more than one region by using the Parameter Group Options button to add lines. If you specify more than one region, a separate omit run will be carried out for each one and then the maps will be put together afterwards. If there is more than one chain in the input PDB file then only the chain defined by omit_chain will be omitted. NOTE: Zero for start and end and "" for chain is the same as choosing everything.

omit_res_start_list= None  You can choose to omit just a portion of your model with the keywords omit_res_start_list 3, omit_res_end_list 4, omit_chain_list chain1 (use " " for blank). The residues from 3 to 4 of chain1 will be omitted. You can specify more than one region by using the Parameter Group Options button to add lines. If you specify more than one region, a separate omit run will be carried out for each one and then the maps will be put together afterwards. If there is more than one chain in the input PDB file then only the chain defined by omit_chain will be omitted. NOTE: Zero for start and end and "" for chain is the same as choosing everything.
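The NOTE above (zero for start/end and "" for chain select everything) describes a wildcard convention. A sketch of how such a selection might be interpreted, purely for illustration:

```python
def in_omit_selection(resno, chain, res_start=0, res_end=0, chain_id=""):
    """Zero for start/end and "" for chain act as wildcards, matching
    the convention described for omit_res_*_list / omit_chain_list."""
    if chain_id and chain != chain_id:
        return False
    if res_start and resno < res_start:
        return False
    if res_end and resno > res_end:
        return False
    return True

print(in_omit_selection(3, "chain1", 3, 4, "chain1"))  # -> True
print(in_omit_selection(5, "chain1", 3, 4, "chain1"))  # -> False
print(in_omit_selection(99, "B"))                      # -> True (all wildcards)
```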

rebuild_in_place

min_seq_identity_percent_rebuild_in_place= 50.0  The sequence in your input PDB file will be adjusted to match the sequence in your sequence file (if any). You can specify the minimum sequence identity between your sequence file and a segment from your input PDB file to consider the sequences to be matched. Default is 50.0%. You might want a higher number to make sure that deletions in the sequence are noticed. The value you specify applies to rebuild_in_place only. Use min_seq_identity_percent instead for non-rebuild_in_place runs.

n_cycle_rebuild_in_place= None  Number of cycles for rebuild_in_place for multiple models only

n_rebuild_in_place= 1  You can choose how many times to rebuild your model in place with rebuild_in_place

rebuild_chain_list= None  You can choose to rebuild just a portion of your model with the keywords rebuild_res_start_list 3, rebuild_res_end_list 4, rebuild_chain_list chain1 (use " " for blank). The residues from 3 to 4 of chain1 will be rebuilt. You can specify more than one region by using the Parameter Group Options button to add lines. If there is more than one chain in the input PDB file then only the chain defined by rebuild_chain will be rebuilt. The smallest region that can be rebuilt is 4 residues.

rebuild_in_place= *Auto Yes No True False  You can choose to rebuild your model while fixing the sequence alignment by iteratively rebuilding segments within the model. This is done n_rebuild_in_place times, then the models are recombined, taking the best-fitting parts of each. Crossovers are allowed where the main-chain atom rmsd is less than dist_close. Note that the sequence of the input model must match the supplied sequence closely enough to allow a clear alignment. Also, this method does not build any new chain; it just moves the existing model around. Normally this procedure is useful if the model is greater than 95% identical with the target sequence. You can include information directly from the starting model if you want with the keyword include_input_model. Then this model will be recombined with the models that are built based on it. Note that this requires that the input model have a sequence that is identical to the model to be rebuilt. You can also rebuild just a portion of the model with the keywords rebuild_res_start_list 3, rebuild_res_end_list 4, rebuild_chain_list chain1 (use " " for blank). The residues from 3 to 4 of chain1 will be rebuilt. You can specify more than one region by using the Parameter Group Options button to add lines. NOTE: if a region cannot be rebuilt, the original coordinates will be preserved for that region.
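The "greater than 95% identical" rule of thumb above is a simple per-position comparison. This helper is illustrative only and assumes the two sequences are already aligned and of equal length:

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions that are identical.
    Assumes seq_a and seq_b are pre-aligned, equal-length strings."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# 19 of 20 residues identical -> 95%, borderline for rebuild_in_place
print(percent_identity("ACDEFGHIKLMNPQRSTVWY",
                       "ACDEFGHIKLMNPQRSTVWF"))  # -> 95.0
```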

rebuild_near_chain= None  You can specify where to rebuild either with rebuild_res_start_list, rebuild_res_end_list and rebuild_chain_list, or with rebuild_near_res, rebuild_near_chain and rebuild_near_dist.

rebuild_near_dist= 7.5  You can specify where to rebuild either with rebuild_res_start_list, rebuild_res_end_list and rebuild_chain_list, or with rebuild_near_res, rebuild_near_chain and rebuild_near_dist.

rebuild_near_res= None  You can specify where to rebuild either with rebuild_res_start_list, rebuild_res_end_list and rebuild_chain_list, or with rebuild_near_res, rebuild_near_chain and rebuild_near_dist.

rebuild_res_end_list= None  You can choose to rebuild just a portion of your model with the keywords rebuild_res_start_list 3, rebuild_res_end_list 4, rebuild_chain_list chain1 (use " " for blank). The residues from 3 to 4 of chain1 will be rebuilt. You can specify more than one region by using the Parameter Group Options button to add lines. If there is more than one chain in the input PDB file then only the chain defined by rebuild_chain will be rebuilt. The smallest region that can be rebuilt is 4 residues.

rebuild_res_start_list= None  You can choose to rebuild just a portion of your model with the keywords rebuild_res_start_list 3, rebuild_res_end_list 4, rebuild_chain_list chain1 (use " " for blank). The residues from 3 to 4 of chain1 will be rebuilt. You can specify more than one region by using the Parameter Group Options button to add lines. If there is more than one chain in the input PDB file then only the chain defined by rebuild_chain will be rebuilt. The smallest region that can be rebuilt is 4 residues.

rebuild_side_chains= False  You can choose to replace side chains (with extend_only) before rebuilding the model (not normally used)

redo_side_chains= True  You can choose to have AutoBuild decide whether to replace all your side chains in rebuild_in_place, taking new ones if they fit the density better. If Yes, this is applied to all side chains, not only those that are rebuilt.

replace_existing= False  In rebuild_in_place the usual default is to force the replacement of all residues, even if the rebuilt ones are not as good a fit as the original. You can override this by saying "No" (do not force replacement of residues; keep whatever is better). Additionally, if you set the "touch_up" flag then the default is to keep whatever is better.

touch_up= False  You can rebuild just the worst parts of your model by setting touch_up=True. You can decide what parts to rebuild based on a minimum model-map correlation (by residue). This is set with min_cc_residue_rebuild=0.82. Alternatively you can rebuild the worst percentage of these: worst_percent_res_rebuild=6. If a value is set for both of these then residues qualifying in either way are rebuilt. NOTE: touch_up is only available with rebuild_in_place.

touch_up_extra_residues= None  Number of residues on each side of the residues identified in touch_up that you want to rebuild. Normally you will want to rebuild one or more on each side.

worst_percent_res_rebuild= 2.0  You can rebuild just the worst parts of your model by setting touch_up=True. You can decide how much to rebuild using worst_percent_res_rebuild or with min_cc_res_rebuild, or both.
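The touch_up selection described above (rebuild residues that fail the CC cutoff or fall in the worst percentile, taking the union when both are set) might be sketched like this; the function and its inputs are illustrative, not the Wizard's actual code:

```python
def residues_to_rebuild(cc_by_residue, min_cc=0.82, worst_percent=6.0):
    """Union of residues below the CC cutoff and residues in the
    worst `worst_percent` percent by model-map correlation."""
    ranked = sorted(cc_by_residue, key=cc_by_residue.get)
    n_worst = int(round(len(ranked) * worst_percent / 100.0))
    worst = set(ranked[:n_worst])
    below_cutoff = {res for res, cc in cc_by_residue.items() if cc < min_cc}
    return worst | below_cutoff

ccs = {1: 0.95, 2: 0.60, 3: 0.90, 4: 0.75, 5: 0.93}
print(sorted(residues_to_rebuild(ccs, min_cc=0.82, worst_percent=20.0)))  # -> [2, 4]
```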

refinement

link_distance_cutoff= 3.0  You can specify the maximum bond distance for linking residues in phenix.refine called from the wizards.

max_occ= None  You can choose to set the maximum value of occupancy for atoms that have their occupancies refined. Default is None (use the default value of 1.0 from phenix.refine)

ordered_solvent_low_resolution= None  You can choose what resolution cutoff to use for placing ordered solvent in phenix.refine. If the resolution of refinement is greater than this cutoff, then no ordered solvent will be placed, even if refinement.main.ordered_solvent=True.

place_waters= True  You can choose whether phenix.refine automatically places ordered solvent (waters) during the refinement process.

r_free_flags_fraction= 0.1  Maximum fraction of reflections in the free R set. You can choose the maximum fraction of reflections in the free R set and the maximum number of reflections in the free R set. The number of reflections in the free R set will be the lower of the values defined by these two parameters.

r_free_flags_lattice_symmetry_max_delta= 5.0  You can set the maximum deviation of distances in the lattice that are to be considered the same for purposes of generating a lattice-symmetry-unique set of free R flags.

r_free_flags_max_free= 2000  Maximum number of reflections in the free R set. You can choose the maximum fraction of reflections in the free R set and the maximum number of reflections in the free R set. The number of reflections in the free R set will be the lower of the values defined by these two parameters.

r_free_flags_use_lattice_symmetry= True  When generating r_free_flags you can decide whether to include lattice symmetry (good in general, necessary if there is twinning).

refine_b= True  You can choose whether phenix.refine is to refine individual atomic displacement parameters (B values)

refine_before_rebuild= True  You can choose to refine the input model before rebuilding it

refine_se_occ= True  You can choose to refine the occupancy of SE atoms in a SEMET structure (default=Yes). This only applies if semet=true

refine_with_ncs= True  This script can allow phenix.refine to automatically identify NCS and use it in refinement. NOTE: ncs refinement and placing waters automatically are mutually exclusive at present.
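The interaction of r_free_flags_fraction and r_free_flags_max_free (the free set is capped by whichever limit yields fewer reflections) amounts to a simple minimum; this helper is illustrative, not PHENIX code:

```python
def free_set_size(n_reflections, fraction=0.1, max_free=2000):
    """Number of free-R reflections: the given fraction of the data,
    capped at max_free, whichever gives fewer reflections."""
    return min(int(n_reflections * fraction), max_free)

print(free_set_size(8000))   # -> 800  (10% is below the 2000 cap)
print(free_set_size(50000))  # -> 2000 (capped by max_free)
```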

refine_xyz= True  You can choose whether phenix.refine is to refine coordinates

refinement_resolution= 0.0  Enter the high-resolution limit for refinement only. This high-resolution limit can be different than the high-resolution limit for other steps. The default ("None" or 0.0) is to use the overall high-resolution limit for this run (as set by 'resolution')

s_annealing= False  You can choose to carry out simulated annealing during the first refinement after initial model-building

skip_hexdigest= False  You may wish to ignore the hexdigest of the free R flags in your input PDB file if (1) the dataset you provide is not identical to the one that you refined with (but has the same free R flags), or (2) you are providing both an input_data_file and an input_refinement_file or input_hires_file. In the second case, the resulting composite file may not have the same hexdigest even though the free R flags are copied over. The default is to set skip_hexdigest=True for case #2. For case #1 you have to tell the Wizard to skip the hexdigest (because it cannot know about this).

use_mlhl= True  This script normally uses information from the input file (HLA HLB HLC HLD) in refinement. Say No to only refine on Fobs

textal

d_max_textal= 1000.0  This low-resolution limit is only used for Textal model-building

d_min_textal= 2.8  Textal has an optimal high-resolution limit of 2.8 A. This limit is only used for Textal model-building

thoroughness

build_outside= True  Define whether to use the BuildOutside module in build_model

connect= True  Define whether to use the connect module in build_model. This module tries to connect nearby chains with loops, without using the sequence. This is different than fit_loops (which uses the sequence to identify the exact number of residues in the loop).

extensive_build= False  You can choose whether to build a new model on every cycle and carry out extra model-building steps every cycle. Default is No (build a new model on the first cycle; after that carry out extra steps).

fit_loops= True  You can fit loops automatically if sequence alignment has been done.

insert_helices= True  Define whether to use the insert_helices module in build_model. This module tries to insert helices identified with find_helices_strands into the current working model. This can be useful as the standard build sometimes builds strands into helical density at low resolution.

n_cycle_build= -1  Choose the number of cycles (3). This does not apply if TEXTAL is selected for build_type

n_cycle_build_max= 6  Maximum number of cycles for iterative model-building, starting from experimental phases without a model. Even if a satisfactory model is not found, a maximum of n_cycle_build_max cycles will be carried out.

n_cycle_build_min= 1  Minimum number of cycles for iterative model-building, starting from experimental phases without a model. Even if a satisfactory model is found, n_cycle_build_min cycles will be carried out.

n_cycle_rebuild_max= 15  Maximum number of cycles for iterative model-rebuilding, starting from a model. Even if a satisfactory model is not found, a maximum of n_cycle_rebuild_max cycles will be carried out.

n_cycle_rebuild_min= 1  Minimum number of cycles for iterative model-rebuilding, starting from a model. Even if a satisfactory model is found, n_cycle_rebuild_min cycles will be carried out.

n_mini= 10  You can choose how many times to retrace your model in "retrace_before_build"

n_random_frag= 0  In resolve building you can randomize each fragment slightly so as to generate more possibilities for tracing based on extending it.

n_random_loop= 3  Number of randomized tries from each end for building loops. If 0, then one try. If N, then N additional tries with randomization based on rms_random_loop.

n_try_rebuild= 2  Number of attempts to build each segment of chain

ncycle_refine= 3  Choose the number of refinement cycles (3)

number_of_models= -1  This parameter lets you choose how many initial models to build with RESOLVE within a single build cycle. This parameter is now superseded by number_of_parallel_models, which sets the number of models (but now entire build cycles) to carry out in parallel. A zero means set it automatically. That is what you normally should use. The number_of_models is by default set to 1 and number_of_parallel_models is set to the value of nbatch (typically 4).

number_of_parallel_models= 0  This parameter lets you choose how many models to build in parallel. A zero means set it automatically. That is what you normally should use. This parameter supersedes the old parameter number_of_models. The value of number_of_models is by default set to 1 and number_of_parallel_models is set to the value of nbatch (typically 4).

skip_combine_extend= False  You can choose whether to skip the combine-extend step in model-building

thorough_loop_fit= True  Try many conformations and accept them even if the fit is not perfect? If you say Yes, the parameters for thorough loop fitting are: n_random_loop=100 rms_random_loop=0.3 rho_min_main=0.5; if you say No, those for quick loop fitting are: n_random_loop=20 rms_random_loop=0.3 rho_min_main=1.0
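The min/max cycle parameters above (n_cycle_build_min/max, n_cycle_rebuild_min/max) together define a simple termination rule: run at least the minimum number of cycles, never exceed the maximum, and otherwise stop once a satisfactory model is found. A sketch of that rule, for illustration only:

```python
def should_continue(cycles_done, satisfactory, n_min=1, n_max=6):
    """Iterative building runs at least n_min cycles and at most n_max,
    stopping early only once a satisfactory model has been found."""
    if cycles_done >= n_max:
        return False
    if cycles_done < n_min:
        return True
    return not satisfactory

print(should_continue(0, True))   # -> True  (minimum cycles not yet done)
print(should_continue(1, True))   # -> False (satisfactory, minimum met)
print(should_continue(6, False))  # -> False (maximum reached)
```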


Automated ligand fitting with LigandFit

Python-based Hierarchical ENvironment for Integrated Xtallography

Documentation Home

Automated ligand fitting with LigandFit

Author(s)

Purpose

Purpose of the LigandFit Wizard

Usage

How the LigandFit Wizard works

How to run the LigandFit Wizard

What the LigandFit wizard needs to run

Specifying which columns of data to use from input data files

Output files from LigandFit

Examples

Sample command_line inputs

Possible Problems

Specific limitations and problems

Literature

Additional information

List of all LigandFit keywords

Author(s)

LigandFit Wizard: Tom Terwilliger

PHENIX GUI and PDS Server: Nigel W. Moriarty

RESOLVE: Tom Terwilliger

Purpose

Purpose of the LigandFit Wizard

The LigandFit Wizard carries out fitting of flexible ligands to electron density maps.

Usage

The LigandFit Wizard can be run from the PHENIX GUI, from the command-line, and from keyworded script files. All three versions are identical except in the way that they take commands from the user.

See

Running a Wizard from a GUI, the command-line, or a script

for details of how to run a Wizard. The command-line version will be described here.

How the LigandFit Wizard works

The LigandFit wizard provides a command-line and graphical user interface allowing the user to identify a datafile containing crystallographic structure factor information, an optional PDB file with a partial model of the structure without the ligand, and a PDB file containing the ligand to be fit (in an allowed but arbitrary conformation).

The wizard checks the data files for consistency and then calls RESOLVE to carry out the fitting of the ligand into the electron-density map. The map used is normally a difference map, with F=FP-FC. It can also be an Fobs map (calculated from FP with phases PHIC from the input partial model), or an arbitrary map, calculated with FP PHI and FOM. If you supply an input partial model, then the region occupied by the partial model is flattened in the map used to fit the ligand, so that the ligand will normally not get placed in this region.

http://phenix-online.org/documentation/ligandfit.htm (1 of 11) [12/14/08 1:01:15 PM]

The ligand fitting is done by RESOLVE in a three-stage process. First, the largest contiguous region of density in the map not already occupied by the model is identified. The ligand will be placed in this density. (If desired, the location of the ligand can instead be defined by the user as near a certain residue or near specified coordinates.) Next, many possible placements of the largest rigid subfragments of the ligand are found within this region of high density. Third, each of these placements is taken as a starting point for fitting the remainder of the ligand. All these ligand fits are scored based on the fit to the density, and the best-fitting placement is written out.
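The scoring step of the three-stage process above (score every candidate placement against the density, write out the best) reduces to an argmax over candidates. A toy sketch with illustrative names; the real scoring is done inside RESOLVE:

```python
def best_placement(placements, score_fn):
    """Score each candidate ligand placement against the map and
    return the best-fitting one (stage three of the process above)."""
    return max(placements, key=score_fn)

# toy example: placements carry a precomputed density-fit score
candidates = [{"id": "fit_1", "cc": 0.41},
              {"id": "fit_2", "cc": 0.78},
              {"id": "fit_3", "cc": 0.66}]
print(best_placement(candidates, lambda p: p["cc"])["id"])  # -> fit_2
```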

The output of the wizard consists of a fitted ligand in PDB format and a summary of the quality of the fit.

Multiple copies of a ligand can be fit to a single map in an automated fashion using the LigandFit wizard as well.

How to run the LigandFit Wizard

Running the LigandFit Wizard is easy. For example, from the command-line you can type:

phenix.ligandfit data=datafile.mtz model=partial_model.pdb ligand=ligand.pdb

The LigandFit Wizard will carry out ligand fitting of the ligand in ligand.pdb based on the structure factor amplitudes in datafile.mtz, calculating phases based on partial_model.pdb. All rotatable bonds will be identified and allowed to take stereochemically reasonable orientations.

What the LigandFit wizard needs to run

The ligandfit wizard needs:

(1) a datafile (w1.sca or data=w1.sca); this can be any format

(2) a PDB file with your model without ligand (model=partial.pdb; optional if your datafile contains map coefficients)

(3) a file with information about your ligand (ligand=side.pdb)

The ligand file can be a PDB file with 1 stereochemically acceptable conformation of your ligand. It can alternatively be a file containing a SMILES string, in which case the starting ligand conformation will be generated with the PHENIX elbow routine.

The command_line ligandfit interpreter will guess which file is your data file but you have to tell it which file is the model and which is the ligand.

Specifying which columns of data to use from input data files

If one or more of your data files has column names that the Wizard cannot identify automatically, you can specify them yourself. You will need to provide one column "name" for each expected column of data, with "None" for anything that is missing.

For example, if your data file data.mtz has columns FP SIGFP then you might specify:

data=data.mtz
input_labels="FP SIGFP"

You can find out all the possible label strings in a data file that you might use by typing:

phenix.autosol display_labels=data.mtz # display all labels for data.mtz

You can specify many more parameters as well. See the list of keywords, defaults and descriptions at the end of this page and also general information about running Wizards at Running a Wizard from a GUI, the command-line, or a script for how to do this. Some of the most common parameters are:

data=w1.sca # data file
partial_model=coords.pdb # starting model without ligand
ligand=ligand.pdb # any stereochemically allowed conformation of your ligand
resolution=3 # dmin of 3 A
quick=False # specify if you want to look hard for a good conformation
ligand_cc_min=0.75 # quit if the CC of ligand to map is 0.75 or better
number_of_ligands=3 # find 3 copies of the ligand
n_group_search=3 # try 3 different fragments of the ligand in initial search
resolve_command="'ligand_start side.pdb'" # build ligand superimposing on side.pdb

Output files from LigandFit

When you run LigandFit the output files will be in a subdirectory with your run number:

LigandFit_run_1_/ # subdirectory with results

A summary file listing the results of the run and the other files produced:

LigandFit_summary.dat # overall summary

A file that lists all parameters and knowledge accumulated by the Wizard during the run (some parts are binary and are not printed):

LigandFit_Facts.dat # all Facts about the run

A warnings file listing any warnings about the run:

LigandFit_warnings.dat # any warnings

A PDB file with the fitted ligand (in this case the first copy of ligand number 1):

ligand_fit_1_1.pdb

A log file with the fitting of the ligand:

ligand_1_1.log

A log file with the fit of the ligand to the map:

ligand_cc_1_1.log

Map coefficients for the map used for fitting:

resolve_map.mtz

Examples

Sample command_line inputs

Standard run of ligandfit (generate map from model and data file) phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb

Build into a map from pre-determined coefficients phenix.ligandfit data=perfect.mtz \

lig_map_type=fo-fc_difference_map \

model=partial.pdb ligand=side.pdb

Quick run of ligandfit phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb quick=True

Run ligandfit on a series of ligands specified in ligand_list.dat phenix.ligandfit w1.sca model=partial.pdb \

ligand=ligand_list.dat file_or_file_list=file_with_list_of_files

Note that you have to specify file_or_file_list=file_with_list_of_files or else the Wizard will try to interpret the contents of ligand_list.dat as a SMILES string. Here the

"file_with_list_of_files" is a flag, not something you substitute with an actual file name. You use it just as listed above.

Place ligand near residue 94 of chain "A" from partial.pdb phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb \

ligand_near_chain="A" ligand_near_res=92

Use start.pdb as a template for some of the atoms in the ligand, building the remainder of the ligand while fixing the coordinates of the corresponding atoms:

phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb \
  resolve_command="'ligand_start start.pdb'"  # NOTE: both ' and " quotes are necessary

Note that the formatting is slightly tricky: the value requires two different kinds of quotation marks, double quotes on the outside and single quotes inside. This is an example of passing a specific keyword to RESOLVE.
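The nested quoting can be checked with Python's shlex module, which follows the same tokenization rules as the shell that launches PHENIX. This is only a sketch of how the quotes are consumed; the second parsing step is an illustration, not actual PHENIX code.

```python
import shlex

# How the launching shell tokenizes the example above: the outer double
# quotes are consumed, the inner single quotes survive in the value.
argv = shlex.split(
    """phenix.ligandfit w1.sca model=partial.pdb ligand=side.pdb """
    """resolve_command="'ligand_start start.pdb'" """
)
print(argv[-1])  # resolve_command='ligand_start start.pdb'

# A second pass over the value shows how a single RESOLVE command can be
# recovered from the remaining single quotes (illustrative only).
value = argv[-1].split("=", 1)[1]
print(shlex.split(value))  # ['ligand_start start.pdb']
```

With only one level of quoting, the shell would split the value at the space and RESOLVE would see two fragments instead of one command.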

Possible Problems

Specific limitations and problems

The ligand to be searched for must have at least 3 atoms.


The partial-model file must not contain any atoms in the position where the ligand is to be built (waters are removed automatically). If it contains non-water atoms in that position, remove them before building the ligand.

If a ring in the ligand can have more than one conformation (e.g., chair or boat conformation) then you need to do separate runs for each conformation of the ring (rings are taken as fixed units in LigandFit).

LigandFit ignores insertion codes, so if you specify a residue with ligand_near_res, only the residue number is used.

The size of the asymmetric unit in the SOLVE/RESOLVE portion of the LigandFit wizard is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36) binaries. Larger sizes can be obtained on request.

The LigandFit Wizard can handle most settings of most space groups; however, it can only use the hexagonal setting of rhombohedral space groups (e.g., #146 R3:H or #155 R32:H), and it cannot use space groups 114-119 (not found in macromolecular crystallography) even in the standard setting, owing to difficulties with the use of asuset in the version of the CCP4 libraries used in PHENIX for these settings and space groups.

Literature

Ligand identification using electron-density map correlations. T.C. Terwilliger, P.D. Adams, N.W. Moriarty and J.D. Cohn. Acta Cryst. D63, 101-107 (2007).

Automated ligand fitting by core-fragment fitting and extension into density. T.C. Terwilliger, H. Klei, P.D. Adams, N.W. Moriarty and J.D. Cohn. Acta Cryst. D62, 915-922 (2006).


Additional information

List of all LigandFit keywords

-------------------------------------------------------------------------------

Legend:
black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python-style formatting descriptor

-------------------------------------------------------------------------------

ligandfit

data= None Datafile (alias for input_data_file). This can be any format if

only FP is to be read in. If phases are to be read in then MTZ format

is required. The Wizard will guess the column identification. If you

want to specify it you can say input_labels="FP" , or

input_labels="FP PHIB FOM". (Command-line only)

ligand= None File containing information about the ligand (PDB or SMILES)

(alias for input_lig_file) (Command-line only)

model= None PDB file with model for everything but the ligand (alias for

input_partial_model_file). (Command-line only)

quick= False Run as quickly as possible. (Command-line only)

special_keywords

write_run_directory_to_file= None Writes the full name of a run

directory to the specified file. This can

be used as a call-back to tell a script

where the output is going to go.

(Command-line only)

run_control

coot= None Set coot to True and optionally run=[run-number] to run Coot

with the current model and map for run run-number. In some wizards

(AutoBuild) you can edit the model and give it back to PHENIX to

use as part of the model-building process. If you just say coot

then the facts for the highest-numbered existing run will be

shown. (Command-line only)

ignore_blanks= None ignore_blanks allows you to have a command-line

keyword with a blank value like "input_lig_file_list="

stop= None You can stop the current wizard with "stopwizard" or "stop".

If you type "phenix.autobuild run=3 stop" then this will stop run

3 of autobuild. (Command-line only)

display_facts= None Set display_facts to True and optionally

run=[run-number] to display the facts for run run-number.

If you just say display_facts then the facts for the

highest-numbered existing run will be shown.

(Command-line only)

display_summary= None Set display_summary to True and optionally

run=[run-number] to show the summary for run

run-number. If you just say display_summary then the

summary for the highest-numbered existing run will be

shown. (Command-line only)

carry_on= None Set carry_on to True to carry on with highest-numbered

run from where you left off. (Command-line only)

run= None Set run to n to continue with run n where you left off.

(Command-line only)

copy_run= None Set copy_run to n to copy run n to a new run and continue

where you left off. (Command-line only)

display_runs= None List all runs for this wizard. (Command-line only)

delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)

display_labels= None display_labels=test.mtz will list all the labels

that identify data in test.mtz. You can use the label

strings that are produced in AutoSol to identify which

data to use from a datafile like this: peak.data="F+

SIGF+ F- SIGF-" # the entire string in quotes counts

here You can use the individual labels from these

strings as identifiers for data columns in AutoSol and

AutoBuild like this: input_refinement_labels="FP SIGFP

FreeR_flags" # each individual label counts

dry_run= False Just read in and check parameter names

params_only= False Just read in and return parameter defaults

display_all= False Just read in and display parameter defaults

crystal_info

cell= 0.0 0.0 0.0 0.0 0.0 0.0

Enter cell parameter a b c alpha beta

gamma

resolution= 0.0

High-resolution limit. Used as resolution limit for

density modification and as general default high-resolution

limit. If resolution_build or refinement_resolution are set

then they override this for model-building or refinement. If

overall_resolution is set then data beyond that resolution

is ignored completely.

sg= None Space Group symbol (e.g., C2221 or C 2 2 21)

display

number_of_solutions_to_display= None Number of solutions to put on

screen and to write out

solution_to_display= 1 Solution number of the solution to display and

write out ( use 0 to let the wizard display the top

solution)

file_info

file_or_file_list= *single_file file_with_list_of_files Choose if you

want to input a single file with PDB or other

information about the ligand or if you want to input

a file containing a list of files with this

information for a list of ligands

input_labels= None Labels for input data columns NOTE: Applies to input

data file for LigandFit and AutoBuild, but not to AutoMR.

For AutoMR use instead 'input_label_string'.

lig_map_type= *fo-fc_difference_map fobs_map pre_calculated_map_coeffs
Enter the type of map to use in ligand fitting.
fo-fc_difference_map: Fo-Fc difference map phased on partial model
fobs_map: Fo map phased on partial model
pre_calculated_map_coeffs: map calculated from FP PHIB [FOM] coefficients in input data file

ligand_format= *PDB SMILES Enter whether the files contain SMILES

strings or PDB formatted information

general

background= True When you specify nproc=nn, you can run the jobs in

background (default if nproc is greater than 1) or

foreground (default if nproc=1). If you set

run_command=qsub (or otherwise submit to a batch queue),

then you should set background=False, so that the batch

queue can keep track of your runs. There is no need to use

background=True in this case because all the runs go as

controlled by your batch system. If you use run_command=csh

(or similar, csh is default) then normally you will use

background=True so that all the jobs run simultaneously.

base_path= None You can specify the base path for files (default is

current working directory)

clean_up= False At the end of the entire run the TEMP directories will

be removed if clean_up is True. The default is No, keep these

directories. If you want to remove them after your run is

finished use a command like "phenix.autobuild run=1

clean_up=True"

coot_name= coot If your version of coot is called something else, then

you can specify that here.

debug= False You can have the wizard stop with error messages about the

code if you use debug. NOTE: you cannot use Pause with debug.

extend_try_list= False You can fill out the list of parallel jobs to

match the number of jobs you want to run at one time,

as specified with nbatch.

extra_verbose= False Facts and possible commands will be printed every

cycle if Yes

i_ran_seed= 289564 Random seed (positive integer) for model-building

and simulated annealing refinement

ligand_id= None You can specify an integer value for the ID of a

ligand... This number will be added to whatever residue

number the ligand search model in input_lig_file has. The

keyword is only valid if a single copy of the ligand is to be

found.

max_wait_time= 100.0

You can specify the length of time (seconds) to

wait when testing the run_command. If you have a cluster

where jobs do not start right away you may need a longer

time to wait.

nbatch= 5 You can specify the number of processors to use (nproc) and

the number of batches to divide the data into for parallel jobs.

Normally you will set nproc to the number of processors

available and leave nbatch alone. If you leave nbatch as None it

will be set automatically, with a value depending on the Wizard.

This is recommended. The value of nbatch can affect the results

that you get, as the jobs are not split into exact replicates,

but are rather run with different random numbers. If you want to

get the same results, keep the same value of nbatch.

nproc= 1 You can specify the number of processors to use (nproc) and the

number of batches to divide the data into for parallel jobs.

Normally you will set nproc to the number of processors available

and leave nbatch alone. If you leave nbatch as None it will be

set automatically, with a value depending on the Wizard. This is

recommended. The value of nbatch can affect the results that you

get, as the jobs are not split into exact replicates, but are

rather run with different random numbers. If you want to get the

same results, keep the same value of nbatch.

resolve_command_list= None Commands for resolve. One per line in the

form: keyword value value can be optional

Examples: coarse_grid resolution 200 2.0 hklin

test.mtz NOTE: for command-line usage you need to

enclose the whole set of commands in double quotes

(") and each individual command in single quotes

(') like this: resolve_command_list="'no_build'

'b_overall 23' "

resolve_size= _giant _huge _extra_huge *None Size for solve/resolve

("","_giant","_huge","_extra_huge")

run_command= csh When you specify nproc=nn, you can run the subprocesses

as jobs in background with csh (default) or submit them to

a queue with the command of your choice (i.e., qsub ). If

you have a multi-processor machine, use csh. If you have a

cluster, use qsub or the equivalent command for your

system. NOTE: If you set run_command=qsub (or otherwise

submit to a batch queue), then you should set

background=False, so that the batch queue can keep track of

your runs. There is no need to use background=True in this

case because all the runs go as controlled by your batch

system. If you use run_command=csh (or similar, csh is

default) then normally you will use background=True so that

all the jobs run simultaneously.

skip_xtriage= False You can bypass xtriage if you want. This will

prevent you from applying anisotropy corrections, however.

temp_dir= None Define a temporary directory (it must exist)

title= Run 1 LigandFit Sun Dec 7 17:46:24 2008 Enter any text you like

to help identify what you did in this run

top_output_dir= None This is used in subprocess calls of wizards and to

tell the Wizard where to look for the STOPWIZARD file.

verbose= False Command files and other verbose output will be printed

input_files

existing_ligand_file_list= None You can enter a list of files with

ligands you have already fit. These will be

used to exclude that region from

consideration.

input_data_file= None Enter the file with input structure factor data

(files other than MTZ will be converted to mtz and

intensities to amplitudes)

input_lig_file= None Enter either a single file with PDB information or

a SMILES string or a file containing a list of files

with this information for a list of ligands. If you

enter a file containing a list of files you need also to

specify

"file_or_file_list=file_with_list_of_files".

If the format is not PDB, then ELBOW will generate a PDB

file.

input_ligand_compare_file= None If you enter a PDB file with a ligand in

it, the coordinates of the newly-built ligand

will be compared with the coordinates in this

file.

input_partial_model_file= None Enter a PDB file containing a model of

your structure without the ligand. This is

used to calculate phases. If you are providing

phases in your data file and have selected

"pre_calculated_map_coeffs" for map_type this

file may be left out.

non_user_parameters

get_lig_volume= False You can ask to get the volume of the ligand and

to then stop

offsets_list= 7 53 29 You can specify an offset for the orientation of

the helix and strand templates in building. This is used

in generating different starting models.

refinement

link_distance_cutoff= 3.0

You can specify the maximum bond distance for

linking residues in phenix.refine called from the

wizards.

r_free_flags_fraction= 0.1

Maximum fraction of reflections in the free R

set. You can choose the maximum fraction of

reflections in the free R set and the maximum

number of reflections in the free R set. The

number of reflections in the free R set will be

the lower of the values defined by these two

parameters.

r_free_flags_lattice_symmetry_max_delta= 5.0

You can set the maximum

deviation of distances in the

lattice that are to be

considered the same for

purposes of generating a

lattice-symmetry-unique set of

free R flags.

r_free_flags_max_free= 2000 Maximum number of reflections in the free R

set. You can choose the maximum fraction of

reflections in the free R set and the maximum

number of reflections in the free R set. The

number of reflections in the free R set will be

the lower of the values defined by these two

parameters.

r_free_flags_use_lattice_symmetry= True When generating r_free_flags you

can decide whether to include lattice

symmetry (good in general, necessary

if there is twinning).

search_parameters

conformers= 1 Enter how many conformers to create. If greater than 1,

then ELBOW will always be used to generate them. If 1 then

ELBOW will be used if a PDB file is not specified. These

conformers are used to identify allowed torsion angles for

your ligand. The alternative is to use the empirical rules

in RESOLVE. ELBOW takes longer but is more accurate.

delta_phi_ligand= 40.0

Specify the angle (degrees) between successive

tries in FFT search for fragments

fit_phi_inc= 20 Specify the angle (degrees) between rotations around

bonds

fit_phi_range= -180 180 Range of bond rotation angles to search

group_search= 0 Enter the ID number of the group from the ligand to use

to seed the search for conformations

ligand_cc_min= 0.75

Enter the minimum correlation coefficient of the

ligand to the map to quit searching for more

conformations

ligand_completeness_min= 1.0

Enter the minimum completeness of the

ligand to the map to quit searching for more

conformations

local_search= True If local_search is Yes, then only the region within

search_dist of the point in the map with the highest local

rmsd will be searched in the FFT search for fragments

n_group_search= 3 Enter the number of different fragments of the ligand

that will be looked for in FFT search of the map

n_indiv_tries_max= 10 If 0 is specified, all fragments are searched at

once otherwise all are first searched at once then

individually up to the number specified

n_indiv_tries_min= 5 If 0 is specified, all placements of a fragment are

tested at once otherwise all are first tested at once

then individually up to the number specified

number_of_ligands= 1 Number of copies of the ligand expected in the

asymmetric unit

search_dist= 10.0

If local_search is Yes, then only the region within

this distance of the point in the map with the highest

local rmsd will be searched in the FFT search for fragments

use_cc_local= False You can specify the use of a local correlation

coefficient for scoring ligand fits to the map. If you do

not do this, then the region over which the ligand is

scored are all points within 2.5 A of the atoms in the

ligand. If you do specify use_cc_local, then the region

over which the ligand is scored are all these points, plus

all the contiguous points that have density greater than 0.5 * sigma.

search_target

ligand_near_chain= None You can specify where to search for the ligand

either with search_center or with ligand_near_res and

ligand_near_chain. If you set

ligand_near_chain="None" or leave it blank or do not

set it, then all chains will be included. The

keywords ligand_near_res and ligand_near_chain refer

to residue/chain in the file defined by

input_partial_model_file (or model if running from

command line).

ligand_near_pdb= None You can specify where LigandFit should look for

your ligands by providing a PDB file containing one or

more copies of the ligand. If you want you can provide

a PDB file with ligand+ macromolecule and specify the

ligand name with name_of_ligand_near_pdb.

ligand_near_res= None You can specify where to search for the ligand

either with search_center or with ligand_near_res and

ligand_near_chain. The keywords ligand_near_res and

ligand_near_chain refer to residue/chain in the file

defined by input_partial_model_file (or model if

running from command line).

name_of_ligand_near_pdb= None You can specify where LigandFit should

look for your ligands by providing a PDB file

containing one or more copies of the ligand. If

you want you can provide a PDB file with

ligand+ macromolecule and specify the ligand

name with name_of_ligand_near_pdb.

search_center= 0.0 0.0 0.0

Enter coordinates for center of search region

(ignored if [0,0,0])

Data quality assessment with phenix.xtriage


Author(s)

Purpose

Usage

How xtriage works

Output files from xtriage

Xtriage keywords in detail

Interpreting Xtriage output

Examples

Standard run of xtriage

Possible Problems

Specific limitations and problems

Literature

Additional information

List of all xtriage keywords

Author(s)

● xtriage: Peter Zwart

● Phil command interpreter: Ralf W. Grosse-Kunstleve

Purpose

phenix.xtriage is a tool for analyzing structure-factor data to identify outliers, the presence of twinning, and other conditions the user should be aware of.

Usage

How xtriage works

Basic sanity checks performed by xtriage are:

Wilson plot sanity

Probabilistic Matthews analysis

Data strength analysis

Ice ring analysis

Twinning analysis

Reference analysis (determines possible re-indexing; optional)

Detwinning and data massaging (optional)

See also: phenix.reflection_statistics

(comparison of multiple data sets)

Output files from xtriage

(1) A log file that contains all the screen output plus some CCP4-style graphs

(2) Optional: an MTZ file with massaged data

Xtriage keywords in detail


Scope: parameters.asu_contents
  * n_residues :: Number of residues per monomer/unit
  * n_bases :: Number of nucleotides per monomer/unit
  * n_copies_per_asu :: Number of copies in the ASU

These keywords control the determination of the absolute scale. If the number of residues/bases is not specified, a solvent content of 50% is assumed.

Scope: parameters.misc_twin_parameters.missing_symmetry
  * tanh_location :: tanh decision rule parameter
  * tanh_slope :: tanh decision rule parameter

The tanh_location and tanh_slope parameters control what R-value is considered low enough for an operator to be counted as a 'proper' symmetry operator. The tanh_location parameter corresponds to the inflection point of the approximate step function; increasing tanh_location results in larger R-value thresholds. tanh_slope is set to 50 and should be okay.

Scope: parameters.misc_twin_parameters.twinning_with_ncs
  * perform_test :: can be set to True or False
  * n_bins :: Number of bins in determination of D_ncs

perform_test is set to False by default. Setting it to True triggers determination of the twin fraction while taking into account NCS parallel to the twin axis.

Scope: parameters.misc_twin_parameters.twin_test_cuts
  * high_resolution :: high resolution for twin tests
  * low_resolution :: low resolution for twin tests
  * isigi_cut :: I/sig(I) threshold in automatic determination of high resolution limit
  * completeness_cut :: completeness threshold in automatic determination of high resolution limit

The resolution limit for the twinning test is determined automatically on the basis of the completeness after removing intensities for which I/sigI < isigi_cut. The lowest limit obtained in this way is 3.5 A. The value determined by the automatic procedure can be overruled by specifying the high_resolution keyword. The low resolution is set to 10 A by default.

Scope: parameters.reporting
  * verbose :: verbosity level
  * log :: log file name
  * ccp4_style_graphs :: Either True or False. Determines whether or not CCP4-style loggraph plots are written to the log file

Scope: xray_data
  * file_name :: file name with X-ray data
  * obs_labels :: labels for observed data if the format is MTZ or XPLOR/CNS
  * calc_labels :: optional; labels for calculated data
  * unit_cell :: overrides unit cell in reflection file (if present)
  * space_group :: overrides space group in reflection file (if present)
  * high_resolution :: high resolution limit of the data
  * low_resolution :: low resolution limit of the data

Note that the matching of specified and present labels involves a sub-string matching algorithm.

Scope: optional
  * hklout :: output MTZ file
  * twinning.action :: whether to detwin the data
  * twinning.twin_law :: using this twin law (h,k,l or x,y,z notation)
  * twinning.fraction :: the detwinning fraction
  * b_value :: the resulting Wilson B value

The output MTZ file contains anisotropy-corrected data with suspected outliers removed; the data is scaled and given the specified Wilson B value. These options have an associated expert level of 10 and are not shown by default. Specifying the expert level on the command line as 'level=100' will show all available options.

Interpreting Xtriage output

Typing:

%phenix.xtriage some_data.sca residues=290 log=some_data.log

results in the following output (parts omitted).

Matthews analysis

First, a cell contents analysis is performed. Matthews coefficients, solvent content and solvent content probabilities are listed, and the most likely composition is guessed:

Matthews coefficient and Solvent content statistics

----------------------------------------------------------------

| Copies | Solvent content | Matthews Coef. | P(solvent cont.) |

|--------|-----------------|----------------|------------------|

| 1 | 0.705 | 4.171 | 0.241 |

| 2 | 0.411 | 2.085 | 0.750 |

| 3 | 0.116 | 1.390 | 0.009 |

----------------------------------------------------------------

| Best guess : 2 copies in the asu |

----------------------------------------------------------------
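The solvent content column follows from the Matthews coefficient via the usual relation Vs = 1 - 1.23/Vm. This is a plain-Python sketch; the constant 1.23 A^3/Da (inverse protein density) is an assumption consistent with the table, not a value quoted from xtriage.

```python
def solvent_content(vm):
    """Estimate solvent fraction from a Matthews coefficient (A^3/Da),
    assuming a protein partial specific volume of ~1.23 A^3/Da."""
    return 1.0 - 1.23 / vm

# The (copies, Vm) pairs from the table above.
for copies, vm in [(1, 4.171), (2, 2.085), (3, 1.390)]:
    print(copies, round(solvent_content(vm), 3))
```

The printed values agree with the table's solvent content column to within the rounding of Vm.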

Data strength

Next, the strength of the data is gauged by determining the completeness of the data in resolution bins after application of several I/sigI cutoff values.

Completeness and data strength analysis

The following table lists the completeness in various resolution

ranges, after applying a I/sigI cut. Miller indices for which

individual I/sigI values are larger than the value specified in

the top row of the table, are retained, while other intensities

are discarded. The resulting completeness profiles are an indication

of the strength of the data.

----------------------------------------------------------------------------------------

| Res. Range | I/sigI>1 | I/sigI>2 | I/sigI>3 | I/sigI>5 | I/sigI>10 | I/sigI>15 |

----------------------------------------------------------------------------------------

| 19.87 - 7.98 | 96.4% | 95.3% | 94.5% | 93.6% | 91.7% | 89.3% |

| 7.98 - 6.40 | 99.2% | 98.2% | 97.1% | 95.5% | 90.9% | 84.7% |

| 6.40 - 5.61 | 97.8% | 95.4% | 93.3% | 87.1% | 76.6% | 66.8% |

| 5.61 - 5.11 | 98.2% | 95.9% | 94.0% | 87.9% | 74.1% | 58.0% |

| 5.11 - 4.75 | 97.9% | 96.2% | 94.5% | 91.1% | 79.2% | 62.5% |

| 4.75 - 4.47 | 97.4% | 95.4% | 93.1% | 88.9% | 76.6% | 56.9% |

| 4.47 - 4.25 | 96.5% | 94.5% | 92.1% | 88.0% | 75.3% | 56.5% |

| 4.25 - 4.07 | 96.6% | 94.0% | 91.2% | 85.4% | 69.3% | 44.9% |

| 4.07 - 3.91 | 95.6% | 92.1% | 87.8% | 80.1% | 61.9% | 34.8% |

| 3.91 - 3.78 | 94.3% | 89.6% | 83.7% | 71.1% | 48.7% | 20.5% |

| 3.78 - 3.66 | 95.7% | 90.9% | 85.6% | 71.5% | 42.4% | 14.8% |

| 3.66 - 3.56 | 91.6% | 85.0% | 78.0% | 63.3% | 34.1% | 9.5% |

| 3.56 - 3.46 | 89.8% | 80.4% | 70.2% | 52.8% | 22.2% | 3.8% |

| 3.46 - 3.38 | 87.4% | 76.3% | 64.6% | 46.7% | 15.5% | 1.7% |

----------------------------------------------------------------------------------------
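The statistic tabulated above can be sketched in a few lines, here on hypothetical (I, sigI) observations for a single resolution shell; the numbers are invented for illustration, not taken from the run above.

```python
def completeness_after_cut(obs, n_theory, cut):
    """Fraction of the n_theory theoretically possible reflections in a
    shell whose observed I/sigI exceeds `cut`; unobserved reflections
    simply count against completeness."""
    kept = sum(1 for i, sig in obs if i / sig > cut)
    return kept / n_theory

# Four observed reflections out of five possible in this toy shell.
shell = [(50.0, 2.0), (12.0, 3.0), (4.0, 2.0), (1.0, 1.0)]
for cut in (1, 2, 3):
    print(cut, completeness_after_cut(shell, 5, cut))
```

As the cutoff rises, weak reflections drop out and the completeness profile falls, which is exactly what the columns of the table show for real data.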

This analysis is also used in the automatic determination of the high resolution limit used in the intensity statistics and twin analyses.

Absolute, likelihood-based Wilson scaling

The (anisotropic) B value of the data is determined using a likelihood-based approach. The resulting B value/tensor is reported:

Maximum likelihood isotropic Wilson scaling

ML estimate of overall B value of sec17.sca:i_obs,sigma:

75.85 A**(-2)

Estimated -log of scale factor of sec17.sca:i_obs,sigma:

-2.50

Maximum likelihood anisotropic Wilson scaling

ML estimate of overall B_cart value of sec17.sca:i_obs,sigma:

68.92, 0.00, 0.00

68.92, 0.00

91.87

Equivalent representation as U_cif:

0.87, -0.00, -0.00

0.87, 0.00

1.16

ML estimate of -log of scale factor of sec17.sca:i_obs,sigma:

-2.50
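The U_cif representation above is related to the B tensor by the standard conversion B = 8 * pi^2 * U. A quick check in plain Python (not part of xtriage) reproduces the diagonal:

```python
import math

def b_to_u(b):
    """Convert a Debye-Waller B factor (A^2) to a mean-square
    displacement U (A^2) via B = 8 * pi^2 * U."""
    return b / (8.0 * math.pi ** 2)

# Diagonal of the B_cart tensor reported above.
for b in (68.92, 68.92, 91.87):
    print(round(b_to_u(b), 2))  # 0.87, 0.87, 1.16 -- the U_cif diagonal
```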

Correcting for anisotropy in the data

A large spread in the (especially diagonal) values indicates anisotropy. The anisotropy is corrected for; this cleans up the intensity statistics.

Low resolution completeness analysis

Most data processing software does not provide a clear picture of the completeness of the data at low resolution. For this reason, xtriage lists the completeness of the data up to 5 Angstrom:

Low resolution completeness analysis

The following table shows the completeness

of the data to 5 Angstrom.

unused: - 19.8702 [ 0/68 ] 0.000

bin 1: 19.8702 - 10.3027 [425/455] 0.934

bin 2: 10.3027 - 8.3766 [443/446] 0.993

bin 3: 8.3766 - 7.3796 [446/447] 0.998

bin 4: 7.3796 - 6.7336 [447/449] 0.996

bin 5: 6.7336 - 6.2673 [450/454] 0.991

bin 6: 6.2673 - 5.9080 [428/429] 0.998

bin 7: 5.9080 - 5.6192 [459/466] 0.985

bin 8: 5.6192 - 5.3796 [446/450] 0.991

bin 9: 5.3796 - 5.1763 [437/440] 0.993

bin 10: 5.1763 - 5.0006 [460/462] 0.996

unused: 5.0006 - [ 0/0 ]

This analysis allows one to see quickly whether there is any unusually low completeness at low resolution, for instance due to missing overloads.

Wilson plot analysis

A Wilson plot analysis a la ARP/wARP is carried out, albeit with a slightly different standard curve:

Mean intensity analysis

Analysis of the mean intensity.

Inspired by: Morris et al. (2004). J. Synch. Rad.11, 56-59.

The following resolution shells are worrisome:

------------------------------------------------

| d_spacing | z_score | compl. | <Iobs>/<Iexp> |

------------------------------------------------

| 5.773 | 7.95 | 0.99 | 0.658 |

| 5.423 | 8.62 | 0.99 | 0.654 |

| 5.130 | 6.31 | 0.99 | 0.744 |

| 4.879 | 5.36 | 0.99 | 0.775 |

| 4.662 | 4.52 | 0.99 | 0.803 |

| 3.676 | 5.45 | 0.99 | 1.248 |

------------------------------------------------

Possible reasons for the presence of the reported

unexpected low or elevated mean intensity in

a given resolution bin are :

- missing overloaded or weak reflections

- suboptimal data processing

- satellite (ice) crystals

- NCS

- translational pseudo symmetry (detected elsewhere)

- outliers (detected elsewhere)

- ice rings (detected elsewhere)

- other problems

Note that the presence of abnormalities

in a certain region of reciprocal space might

confuse the data validation algorithm throughout

a large region of reciprocal space, even though

the data is acceptable in those areas.

A very long list of warnings could indicate a serious problem with your data. Deciding whether the data is useful, should be cut, or should be thrown away altogether is not straightforward and falls beyond the scope of xtriage.

Outlier detection and rejection

Possible outliers are detected on the basis of Wilson statistics:

Possible outliers

Inspired by: Read, Acta Cryst. (1999). D55, 1759-1764

Acentric reflections:

-----------------------------------------------------------------

| d_space | H K L | |E| | p(wilson) | p(extreme) |

-----------------------------------------------------------------

| 3.716 | 8, 6, 31 | 3.52 | 4.06e-06 | 5.87e-02 |

-----------------------------------------------------------------

p(wilson)  : 1-(1-exp[-|E|^2])
p(extreme) : 1-(1-exp[-|E|^2])^(n_acentrics)

p(wilson) is the probability that an E value of the specified size would be observed if it were selected at random from the given data set.

p(extreme) is the probability that the largest |E| value in a data set of this size would be greater than or equal to the observed largest |E| value.

Both measures can be used for outlier detection. p(extreme) takes into account the size of the data set.
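The two probabilities can be sketched in a few lines of Python. The data-set size n_acentrics below is a hypothetical value chosen for illustration; it is not the count from the run above. Note that 1-(1-exp[-|E|^2]) simplifies to exp(-|E|^2).

```python
import math

def p_wilson(e):
    """Probability of an acentric |E| at least this large under Wilson
    statistics (Read, 1999); 1-(1-exp(-e^2)) == exp(-e^2)."""
    return math.exp(-e * e)

def p_extreme(e, n_acentrics):
    """Probability that the largest of n_acentrics random |E| values is
    greater than or equal to e."""
    return 1.0 - (1.0 - p_wilson(e)) ** n_acentrics

# |E| = 3.52 as in the flagged reflection above.
print("%.2e" % p_wilson(3.52))         # close to the 4.06e-06 in the table
print("%.2e" % p_extreme(3.52, 15000)) # grows with the size of the data set
```

The dependence on n_acentrics is why p(extreme) is the more forgiving criterion for large data sets: a moderately improbable |E| is expected somewhere in a big enough sample.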

Outliers are removed from the data set in the further analysis. Note that if pseudo-translational symmetry is present, a large number of 'outliers' will be reported.

Ice ring detection

Ice rings in the data are detected by analyzing the completeness and the mean intensity:

Ice ring related problems

The following statistics were obtained from ice-ring insensitive resolution ranges

mean bin z_score : 3.47

( rms deviation : 2.83 )

mean bin completeness : 0.99

( rms deviation : 0.00 )

The following table shows the z-scores and completeness in ice-ring sensitive areas.

http://phenix-online.org/documentation/xtriage.htm (5 of 15) [12/14/08 1:01:22 PM]


Data quality assessment with phenix.xtriage

Large z-scores and high completeness in these resolution ranges might be a reason to re-assess your data processing if ice rings were present.

------------------------------------------------

| d_spacing | z_score | compl. | Rel. Ice int. |

------------------------------------------------

| 3.897 | 0.12 | 0.97 | 1.000 |

| 3.669 | 0.96 | 0.95 | 0.750 |

| 3.441 | 2.14 | 0.94 | 0.530 |

------------------------------------------------

Abnormalities in mean intensity or completeness at resolution ranges with a relative ice ring intensity lower than 0.10 will be ignored.

At 3.67 A there is a lower completeness than expected from the rest of the data set. Even though the completeness is lower than expected, the mean intensity is still reasonable at this resolution.

At 3.44 A there is a lower completeness than expected from the rest of the data set. Even though the completeness is lower than expected, the mean intensity is still reasonable at this resolution.

There were 2 ice ring related warnings

This could indicate the presence of ice rings.

Anomalous signal

If the input reflection file contains separate intensities for each Friedel mate, a quality measure of the anomalous signal is reported:

Analysis of anomalous differences

Table of measurability as a function of resolution

The measurability is defined as the fraction of Bijvoet related intensity differences for which

|delta_I|/sigma_delta_I > 3.0
min[I(+)/sigma_I(+), I(-)/sigma_I(-)] > 3.0

both hold.

The measurability provides an intuitive feeling for the quality of the data, as it is related to the number of reliable Bijvoet differences. When the data is processed properly and the standard deviations have been estimated accurately, values larger than 0.05 are encouraging.
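A minimal sketch of this definition in plain Python (the quadrature estimate of sigma_delta_I is an assumption made for illustration, not necessarily the exact xtriage convention):

```python
def measurability(pairs, cutoff=3.0):
    """Fraction of Bijvoet pairs with a reliable intensity difference.

    pairs: iterable of (i_plus, sig_plus, i_minus, sig_minus) tuples.
    """
    n_ok = 0
    n_tot = 0
    for i_p, s_p, i_m, s_m in pairs:
        n_tot += 1
        # sigma of the difference, propagated in quadrature (assumption).
        sig_delta = (s_p * s_p + s_m * s_m) ** 0.5
        if (abs(i_p - i_m) / sig_delta > cutoff
                and min(i_p / s_p, i_m / s_m) > cutoff):
            n_ok += 1
    return n_ok / n_tot

# Two pairs: one strong, reliable difference; one too weak to count.
demo = [(100.0, 5.0, 60.0, 5.0),   # delta/sigma ~ 5.7, both I/sigma > 3
        (10.0, 5.0, 9.0, 5.0)]     # I/sigma too low -> not counted
print(measurability(demo))  # -> 0.5
```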

unused:         - 19.8704 [   0/68  ]
bin  1: 19.8704 -  7.0211 [1551/1585] 0.1924

bin 2: 7.0211 - 5.6142 [1560/1575] 0.0814

bin 3: 5.6142 - 4.9168 [1546/1555] 0.0261

bin 4: 4.9168 - 4.4729 [1563/1582] 0.0081

bin 5: 4.4729 - 4.1554 [1557/1577] 0.0095

bin 6: 4.1554 - 3.9124 [1531/1570] 0.0083

bin 7: 3.9124 - 3.7178 [1541/1585] 0.0069

bin 8: 3.7178 - 3.5569 [1509/1552] 0.0028

bin 9: 3.5569 - 3.4207 [1522/1606] 0.0085

bin 10: 3.4207 - 3.3032 [1492/1574] 0.0044

unused: 3.3032 - [ 0/0 ]

The anomalous signal seems to extend to about 5.9 A (or to 5.2 A, from a more optimistic point of view).

The quoted resolution limits can be used as a guideline to decide where to cut the resolution for phenix.hyss. As the anomalous signal is not very strong in this data set, substructure solution via SAD might prove to be a challenge. Especially if only low resolution reflections are used, the resulting substructures could contain a significant amount of false positives.

Determination of twin laws

Twin laws are found using a modified Le Page algorithm and classified as merohedral or pseudo-merohedral:

Determining possible twin laws.

The following twin laws have been found:

-------------------------------------------------------------------------------

| Type | Axis | R metric (%) | delta (le Page) | delta (Lebedev) | Twin law

|

-------------------------------------------------------------------------------

| M | 2-fold | 0.000 | 0.000 | 0.000 | -h,k,-l

|

-------------------------------------------------------------------------------

M: Merohedral twin law

PM: Pseudomerohedral twin law

1 merohedral twin operators found

0 pseudo-merohedral twin operators found

In total, 1 twin operator was found

Non-merohedral (reticular) twinning is not considered. The R metric is equal to:

Sum (M_i-N_i)^2 / Sum M_i^2
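A minimal plain-Python sketch of this formula, treating each metric tensor as a flat sequence of elements (an illustrative representation, not xtriage code):

```python
def r_metric(m, n):
    """R metric between an original metric tensor m and its 'idealized'
    counterpart n, each given as a flat sequence of tensor elements."""
    num = sum((mi - ni) ** 2 for mi, ni in zip(m, n))
    den = sum(mi * mi for mi in m)
    return num / den

# A cell that already obeys the twin-law restrictions gives R = 0.
g = [100.0, 100.0, 50.0, 0.0, 0.0, 0.0]
print(r_metric(g, g))  # -> 0.0
```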

M_i are elements of the original metric tensor and N_i are elements of the metric tensor after 'idealizing' the unit cell, in compliance with the restrictions the twin law poses on the lattice if it were a 'true' symmetry operator. The delta Le Page is the familiar obliquity. The delta Lebedev is a twin law quality measure developed by A. Lebedev (Lebedev, Vagin & Murshudov, Acta Cryst. (2006). D62, 83-95). Note that for merohedral twin laws, all quality indicators are 0. For non-merohedral twin laws, this value is greater than or equal to zero. If a twin law is classified as non-merohedral but has a delta Le Page equal to zero, the twin law is sometimes referred to as a metric merohedral twin law.

Locating translational pseudo symmetry (TPS)

TPS is located by inspecting a low resolution Patterson function. Peaks and their significance levels are reported:

Largest Patterson peak with length larger than 15 Angstrom

Frac. coord. : 0.027 0.057 0.345

Distance to origin : 17.444

Height (origin=100) : 3.886

p_value(height) : 9.982e-01

The reported p_value has the following meaning: the probability that a peak of the specified height or larger is found in a Patterson function of a macromolecule that does not have any translational pseudo symmetry is equal to 9.982e-01. p_values smaller than 0.05 might indicate weak translational pseudo symmetry, or the self vector of a large anomalous scatterer such as Hg, whereas values smaller than 1e-3 are a very strong indication of the presence of translational pseudo symmetry.


Moments of the observed intensities

The moments of the observed intensity/amplitude distribution are reported, as well as their expected values:

Wilson ratio and moments

Acentric reflections

<I^2>/<I>^2 :1.955 (untwinned: 2.000; perfect twin 1.500)

<F>^2/<F^2> :0.796 (untwinned: 0.785; perfect twin 0.885)

<|E^2 - 1|> :0.725 (untwinned: 0.736; perfect twin 0.541)

Centric reflections

<I^2>/<I>^2 :2.554 (untwinned: 3.000; perfect twin 2.000)

<F>^2/<F^2> :0.700 (untwinned: 0.637; perfect twin 0.785)

<|E^2 - 1|> :0.896 (untwinned: 0.968; perfect twin 0.736)
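The acentric expectations quoted above can be checked with a short simulation: untwinned acentric intensities follow an exponential (Wilson) distribution, and a perfect twin averages two independent intensities. This sketch is for intuition only, not xtriage code:

```python
import random

random.seed(0)
n = 200000
# Untwinned acentric intensities: exponential (Wilson) distribution.
i_single = [random.expovariate(1.0) for _ in range(n)]
# Perfect twin: average of two independent intensities.
i_twin = [0.5 * (random.expovariate(1.0) + random.expovariate(1.0))
          for _ in range(n)]

def second_moment_ratio(intensities):
    """<I^2>/<I>^2 for a list of intensities."""
    m1 = sum(intensities) / len(intensities)
    m2 = sum(i * i for i in intensities) / len(intensities)
    return m2 / (m1 * m1)

print(second_moment_ratio(i_single))  # close to 2.0 (untwinned acentric)
print(second_moment_ratio(i_twin))    # close to 1.5 (perfect twin)
```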

Significant departure from the ideal values could indicate the presence of twinning or pseudo translation. For instance, an <I^2>/<I>^2 value significantly lower than 2.0 might point to twinning, whereas a value significantly larger than 2.0 might point towards pseudo translational symmetry.

Cumulative intensity distribution

The cumulative intensity distribution is reported:

-----------------------------------------------

| Z | Nac_obs | Nac_theo | Nc_obs | Nc_theo |

-----------------------------------------------

| 0.0 | 0.000 | 0.000 | 0.000 | 0.000 |

| 0.1 | 0.081 | 0.095 | 0.168 | 0.248 |

| 0.2 | 0.167 | 0.181 | 0.292 | 0.345 |

| 0.3 | 0.247 | 0.259 | 0.354 | 0.419 |

| 0.4 | 0.321 | 0.330 | 0.420 | 0.474 |

| 0.5 | 0.392 | 0.394 | 0.473 | 0.520 |

| 0.6 | 0.452 | 0.451 | 0.521 | 0.561 |

| 0.7 | 0.506 | 0.503 | 0.570 | 0.597 |

| 0.8 | 0.552 | 0.551 | 0.603 | 0.629 |

| 0.9 | 0.593 | 0.593 | 0.636 | 0.657 |

| 1.0 | 0.635 | 0.632 | 0.673 | 0.683 |

-----------------------------------------------

| Maximum deviation acentric : 0.015 |

| Maximum deviation centric : 0.080 |

| |

| <NZ(obs)-NZ(twinned)>_acentric : -0.004 |

| <NZ(obs)-NZ(twinned)>_centric : -0.039 |

-----------------------------------------------
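The theoretical columns in this table follow from the standard cumulative Wilson distributions; a plain-Python sketch of those formulas (the perfect-twin acentric curve is included to show the sigmoidal behaviour that twinning produces):

```python
import math

def nz_acentric(z):
    """Untwinned acentric cumulative distribution: N(Z) = 1 - exp(-Z)."""
    return 1.0 - math.exp(-z)

def nz_acentric_perfect_twin(z):
    """Acentric N(Z) for a perfect twin: 1 - (1 + 2Z) exp(-2Z)."""
    return 1.0 - (1.0 + 2.0 * z) * math.exp(-2.0 * z)

def nz_centric(z):
    """Untwinned centric cumulative distribution: erf(sqrt(Z/2))."""
    return math.erf(math.sqrt(z / 2.0))

# Close to the Z = 0.5 theoretical entries in the table (0.394, 0.520).
print(nz_acentric(0.5), nz_centric(0.5))
```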

The N(Z) test is related to the moments based test discussed above. Nac_obs is the observed cumulative distribution of normalized intensities of the acentric data, and uses the full distribution rather than just a moment. The effect of twinning shows itself as Nac_obs having a more sigmoidal character. In the case of pseudo centering, Nac_obs will tend towards Nc_theo.

The L test

The L-test is an intensity statistic developed by Padilla and Yeates (Acta Cryst. (2003). D59, 1124-1130) and is reasonably robust in the presence of anisotropy and pseudo centering, especially if the Miller indices are partitioned properly. Partitioning is carried out on the basis of a Patterson analysis. A significant deviation of both <|L|> and <L^2> from the expected values indicates twinning or other problems:

L test for acentric data

using difference vectors (dh,dk,dl) of the form:

(2hp,2kp,2lp)

where hp, kp, and lp are random signed integers such that

2 <= |dh| + |dk| + |dl| <= 8

Mean |L| :0.482 (untwinned: 0.500; perfect twin: 0.375)

Mean L^2 :0.314 (untwinned: 0.333; perfect twin: 0.200)
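For untwinned acentric data, L = (I1 - I2)/(I1 + I2) computed over unrelated pairs is uniformly distributed on [-1, 1], which yields the expected <|L|> = 1/2 and <L^2> = 1/3. A quick simulation sketch (assuming independent exponentially distributed intensities; not xtriage's Patterson-based pairing):

```python
import random

random.seed(1)
l_values = []
for _ in range(200000):
    i1 = random.expovariate(1.0)  # Wilson (exponential) intensity
    i2 = random.expovariate(1.0)  # an unrelated local neighbour
    l_values.append((i1 - i2) / (i1 + i2))

mean_abs_l = sum(abs(l) for l in l_values) / len(l_values)
mean_l_sq = sum(l * l for l in l_values) / len(l_values)
print(mean_abs_l, mean_l_sq)  # close to 0.500 and 0.333
```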


The distribution of |L| values indicates a twin fraction of 0.00. Note that this estimate is not as reliable as one obtained via a Britton plot or H-test if twin laws are available.

Whether or not <|L|> and <L^2> differ significantly from the expected values is shown in the final summary (see below).

Analysis of twin laws

Twin law specific tests (Britton, H and R vs R) are performed:

Results of the H-test on acentric data:

(Only 50.0% of the strongest twin pairs were used)
mean |H|  : 0.183 (0.50: untwinned; 0.0: 50% twinned)
mean H^2  : 0.055 (0.33: untwinned; 0.0: 50% twinned)

Estimation of twin fraction via mean |H|: 0.317

Estimation of twin fraction via cum. dist. of H: 0.308
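For acentric data <|H|> decreases linearly from 0.5 (untwinned) to 0.0 (50% twinned), i.e. <|H|> = (1 - 2*alpha)/2, so the twin fraction follows as alpha = 0.5 - <|H|>. A one-line sketch (illustrative, not xtriage code) that reproduces the mean |H| based estimate above:

```python
def twin_fraction_from_mean_h(mean_abs_h):
    """Twin fraction alpha from <|H|>, using <|H|> = (1 - 2*alpha)/2."""
    return 0.5 - mean_abs_h

# Matches the "via mean |H|: 0.317" estimate in the log above.
print(twin_fraction_from_mean_h(0.183))
```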

Britton analysis

Extrapolation performed on 0.34 < alpha < 0.495

Estimated twin fraction: 0.283

Correlation: 0.9951

R vs R statistic:

R_abs_twin = <|I1-I2|>/<|I1+I2|>

Lebedev, Vagin, Murshudov. Acta Cryst. (2006). D62, 83-95

R_abs_twin observed data : 0.193

R_abs_twin calculated data : 0.328

R_sq_twin = <(I1-I2)^2>/<(I1+I2)^2>

R_sq_twin observed data : 0.044

R_sq_twin calculated data : 0.120
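The two R statistics are straightforward to evaluate; the helpers below are an illustrative sketch applying the printed definitions to lists of twin-related intensity pairs:

```python
def r_abs_twin(i1, i2):
    """R_abs = <|I1 - I2|> / <|I1 + I2|> over twin-related pairs."""
    return (sum(abs(a - b) for a, b in zip(i1, i2))
            / sum(abs(a + b) for a, b in zip(i1, i2)))

def r_sq_twin(i1, i2):
    """R_sq = <(I1 - I2)^2> / <(I1 + I2)^2> over twin-related pairs."""
    return (sum((a - b) ** 2 for a, b in zip(i1, i2))
            / sum((a + b) ** 2 for a, b in zip(i1, i2)))

# Perfectly twinned data (I1 == I2) gives 0; a low observed R relative
# to the calculated data therefore hints at twinning.
print(r_abs_twin([3.0, 1.0], [3.0, 1.0]))  # -> 0.0
```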

Maximum Likelihood twin fraction determination

Zwart, Read, Grosse-Kunstleve & Adams, to be published.

The estimated twin fraction is equal to 0.227

These tests allow one to estimate the twin fraction and (if calculated data is provided) determine whether rotational pseudo symmetry is present. Another option (albeit more computationally expensive) is to estimate the correlation between error free, untwinned, twin related normalized intensities (use the keyword perform=True on the command line):

Estimation of twin fraction, while taking into account the effects of possible NCS parallel to the twin axis.

Zwart, Read, Grosse-Kunstleve & Adams, to be published.

A parameter D_ncs will be estimated as a function of resolution, together with a global twin fraction. D_ncs is an estimate of the correlation coefficient between untwinned, error-free, twin related, normalized intensities. Large values (0.95) could indicate an incorrect point group. Values of D_ncs larger than, say, 0.5 could indicate the presence of NCS. The twin fraction should be smaller than or similar to other estimates given elsewhere. The refinement can take some time. For numerical stability issues, D_ncs is limited between 0 and 0.95. The twin fraction is allowed to vary between 0 and 0.45. Refinement cycle numbers are printed out to keep you entertained.


. . . . 5 . . . . 10 . . . . 15 . . . . 20 . . . . 25 . . . . 30

. . . . 35 . . . . 40 . . . . 45 . . . . 50 . . . . 55 . . . . 60

. . . . 65 . . . . 70 . . . . 75 . . .

Cycle : 78

-----------

Log[likelihood]: 22853.700

twin fraction: 0.201

D_ncs in resolution ranges:

9.8232 -- 4.5978 :: 0.830

4.5978 -- 3.7139 :: 0.775

3.7139 -- 3.2641 :: 0.745

3.2641 -- 2.9747 :: 0.746

2.9747 -- 2.7666 :: 0.705

2.7666 -- 2.6068 :: 0.754

2.6068 -- 2.4784 :: 0.735

The correlation of the calculated F^2 should be similar to the estimated values.

Observed correlation between twin related, untwinned calculated F^2 in resolution ranges, as well as estimated D_ncs^2 values:

Bin d_max d_min CC_obs D_ncs^2

1) 9.8232 -- 4.5978 :: 0.661 0.689

2) 4.5978 -- 3.7139 :: 0.544 0.601

3) 3.7139 -- 3.2641 :: 0.650 0.556

4) 3.2641 -- 2.9747 :: 0.466 0.557

5) 2.9747 -- 2.7666 :: 0.426 0.497

6) 2.7666 -- 2.6068 :: 0.558 0.569

7) 2.6068 -- 2.4784 :: 0.531 0.540

The twin fraction obtained via this method is usually lower than what is obtained by refinement. The estimated correlation coefficient (D_ncs^2) between the twin related F^2 values is, however, reasonably accurate.

Exploring higher metric symmetry

The fact that a twin law is present could also indicate that the data was incorrectly processed. The example below shows a P41212 data set processed in P1:

Exploring higher metric symmetry

Point group of data as dictated by the space group is P 1

the point group in the Niggli setting is P 1

The point group of the lattice is P 4 2 2

A summary of R values for various possible point groups follow.

-----------------------------------------------------------------------------------------------

| Point group | mean R_used | max R_used | mean R_unused | min R_unused | choice |

-----------------------------------------------------------------------------------------------

| P 1 | None | None | 0.022 | 0.017 | |

| P 4 2 2 | 0.022 | 0.025 | None | None | <--- |

| P 1 2 1 | 0.017 | 0.017 | 0.026 | 0.024 | |

| Hall: C 2y (x-y,x+y,z) | 0.025 | 0.025 | 0.022 | 0.017 | |

| P 4 | 0.025 | 0.028 | 0.025 | 0.025 | |

| Hall: C 2 2 (x-y,x+y,z) | 0.024 | 0.025 | 0.017 | 0.017 | |

| Hall: C 2y (x+y,-x+y,z) | 0.024 | 0.024 | 0.023 | 0.017 | |

| P 1 1 2 | 0.028 | 0.028 | 0.021 | 0.017 | |

| P 2 1 1 | 0.027 | 0.027 | 0.022 | 0.017 | |

| P 2 2 2 | 0.023 | 0.028 | 0.025 | 0.025 | |

-----------------------------------------------------------------------------------------------

R_used: mean and maximum R value for symmetry operators *used* in this point group

R_unused: mean and minimum R value for symmetry operators *not used* in this point group

The likely point group of the data is: P 4 2 2


As in phenix.explore_metric_symmetry, the possible space groups are listed as well (not shown here).

Twin analysis summary

The results of the twin analysis are summarized. Typical outputs look as follows for cases of wrong symmetry, twin laws present but no suspected twinning, and twinned data, respectively. Wrong symmetry:

-------------------------------------------------------------------------------

Twinning and intensity statistics summary (acentric data):

Statistics independent of twin laws

- <I^2>/<I>^2 : 2.104

- <F>^2/<F^2> : 0.770

- <|E^2-1|> : 0.757

- <|L|>, <L^2>: 0.512, 0.349

Multivariate Z score L-test: 2.777

The multivariate Z score is a quality measure of the given spread in intensities. Good to reasonable data is expected to have a Z score lower than 3.5. Large values can indicate twinning, but small values do not necessarily exclude it.

Statistics depending on twin laws

------------------------------------------------------

| Operator | type | R obs. | Britton alpha | H alpha |

------------------------------------------------------

| k,h,-l | PM | 0.025 | 0.458 | 0.478 |

| -h,k,-l | PM | 0.017 | 0.459 | 0.487 |

| -k,h,l | PM | 0.024 | 0.458 | 0.478 |

| -k,-h,-l | PM | 0.024 | 0.458 | 0.478 |

| -h,-k,l | PM | 0.028 | 0.458 | 0.476 |

| h,-k,-l | PM | 0.027 | 0.458 | 0.477 |

| k,-h,l | PM | 0.024 | 0.457 | 0.478 |

------------------------------------------------------

Patterson analysis

- Largest peak height : 6.089

(corresponding p value : 6.921e-01)

The largest off-origin peak in the Patterson function is 6.09% of the height of the origin peak. No significant pseudo-translation is detected.

The results of the L-test indicate that the intensity statistics behave as expected. No twinning is suspected.

The symmetry of the lattice and intensities, however, suggests that the input space group is too low. See the relevant sections of the log file for more details on your choice of space groups.

As the symmetry is suspected to be incorrect, it is advisable to reconsider data processing.

-------------------------------------------------------------------------------

Twin laws present but no suspected twinning:

-------------------------------------------------------------------------------

Twinning and intensity statistics summary (acentric data):

Statistics independent of twin laws

- <I^2>/<I>^2 : 1.955

- <F>^2/<F^2> : 0.796

- <|E^2-1|> : 0.725

- <|L|>, <L^2>: 0.482, 0.314

Multivariate Z score L-test: 1.225

The multivariate Z score is a quality measure of the given spread in intensities. Good to reasonable data is expected to have a Z score lower than 3.5. Large values can indicate twinning, but small values do not necessarily exclude it.

Statistics depending on twin laws

------------------------------------------------------

| Operator | type | R obs. | Britton alpha | H alpha |

------------------------------------------------------

| -h,k,-l | M | 0.455 | 0.016 | 0.035 |

------------------------------------------------------

Patterson analysis

- Largest peak height : 3.886

(corresponding p value : 9.982e-01)

The largest off-origin peak in the Patterson function is 3.89% of the height of the origin peak. No significant pseudo-translation is detected.

The results of the L-test indicate that the intensity statistics behave as expected. No twinning is suspected.

Even though no twinning is suspected, it might be worthwhile carrying out a refinement using a dedicated twin target anyway, as twinned structures with low twin fractions are difficult to distinguish from non-twinned structures.

-------------------------------------------------------------------------------

Twinned data:

-------------------------------------------------------------------------------

Twinning and intensity statistics summary (acentric data):

Statistics independent of twin laws

- <I^2>/<I>^2 : 1.587

- <F>^2/<F^2> : 0.871

- <|E^2-1|> : 0.568

- <|L|>, <L^2>: 0.387, 0.212

Multivariate Z score L-test: 11.589

The multivariate Z score is a quality measure of the given spread in intensities. Good to reasonable data is expected to have a Z score lower than 3.5. Large values can indicate twinning, but small values do not necessarily exclude it.

Statistics depending on twin laws

------------------------------------------------------

| Operator | type | R obs. | Britton alpha | H alpha |

------------------------------------------------------

| -l,-k,-h | PM | 0.170 | 0.330 | 0.325 |

------------------------------------------------------

Patterson analysis

- Largest peak height : 7.300

(corresponding p value : 4.454e-01)

The largest off-origin peak in the Patterson function is 7.30% of the height of the origin peak. No significant pseudo-translation is detected.

The results of the L-test indicate that the intensity statistics are significantly different than expected from good to reasonable, untwinned data.

As there are twin laws possible given the crystal symmetry, twinning could be the reason for the departure of the intensity statistics from normality.


It might be worthwhile carrying out refinement with a twin-specific target function.

-------------------------------------------------------------------------------

In the summary, the significance of the departure of the L-test values from normality is stated. The multivariate Z-score (also known as the Mahalanobis distance) is used for this purpose.
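As an illustration of the idea, the Mahalanobis distance generalizes a z-score to correlated quantities such as (<|L|>, <L^2>). The 2x2 covariance matrix in the example below is purely hypothetical; the covariance xtriage actually uses is not reproduced here:

```python
import math

def mahalanobis_2d(x, mu, cov):
    """Mahalanobis distance sqrt((x-mu)^T cov^-1 (x-mu)) in two dimensions."""
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    a, b, c, d = cov[0][0], cov[0][1], cov[1][0], cov[1][1]
    det = a * d - b * c
    # Multiply the inverse of the 2x2 covariance by the difference vector.
    y0 = (d * dx0 - b * dx1) / det
    y1 = (-c * dx0 + a * dx1) / det
    return math.sqrt(dx0 * y0 + dx1 * y1)

# Observed L statistics vs the untwinned expectations (0.5, 1/3),
# with a made-up diagonal covariance (hypothetical numbers).
dist = mahalanobis_2d((0.482, 0.314), (0.5, 1.0 / 3.0),
                      [[1e-4, 0.0], [0.0, 1e-4]])
```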

Examples

Standard run of xtriage

Running xtriage is easy. From the command line you can type:

phenix.xtriage data.sca

When an MTZ or CNS file is used, labels have to be specified:

phenix.xtriage file=my_brilliant_data.mtz obs_labels='F(+),SIGF(+),F(-),SIGF(-)'

In order to perform a Matthews analysis, it might be useful to specify the number of residues/nucleotides in the crystallized macromolecule:

phenix.xtriage data.sca n_residues=230 n_bases=25

By default, the screen output plus additional CCP4 style graphs (viewable with the CCP4 program loggraph) are echoed to a file named logfile.log. The command line arguments and all other default settings are summarized in a PHIL parameter data block given at the beginning of the logfile / screen output:

scaling.input {

parameters {

asu_contents {

n_residues = None

n_bases = None

n_copies_per_asu = None

}

misc_twin_parameters {

missing_symmetry {

tanh_location = 0.08

tanh_slope = 50

}

twinning_with_ncs {

perform_analysis = False

n_bins = 7

}

twin_test_cuts {

low_resolution = 10

high_resolution = None

isigi_cut = 3

completeness_cut = 0.85

}

}

reporting {

verbose = 1

log = "logfile.log"

ccp4_style_graphs = True

}

}

xray_data {

file_name = "some_data.sca"


obs_labels = None

calc_labels = None

unit_cell = 64.5 69.5 45.5 90 104.3 90

space_group = "P 1 21 1"

high_resolution = None

low_resolution = None

}

}

The defaults are good for most applications.

Possible Problems

Specific limitations and problems

Xtriage doesn't deal with data in centric space groups

Literature

CCP4 newsletter No. 42, Summer 2005: Characterization of X-ray data sets

CCP4 newsletter No. 43, Winter 2005: Xtriage and Fest: automatic assessment of X-ray data and substructure structure factor estimation

Additional information

List of all xtriage keywords

-------------------------------------------------------------------------------

Legend:
black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

-------------------------------------------------------------------------------

scaling

input

expert_level= 1 Expert level

asu_contents

Defines the ASU contents

n_residues= None Number of residues in structural unit

n_bases= None Number of nucleotides in structural unit

n_copies_per_asu= None Number of copies per ASU. If not specified,
a Matthews analysis is performed

xray_data

Defines xray data

file_name= None File name with data

obs_labels= None Labels for observed data

calc_labels= None Labels for calculated data

unit_cell= None Unit cell parameters

space_group= None space group

high_resolution= None High resolution limit

low_resolution= None Low resolution limit

reference

A reference data set. For the investigation of possible

reindexing options

data

Defines an x-ray dataset


file_name= None File name

labels= None Labels

unit_cell= None Unit cell parameters

space_group= None Space group

structure

file_name= None Filename of reference PDB file

parameters

Basic settings

reporting

Some output issues

verbose= 1 Verbosity

log= logfile.log

Logfile

ccp4_style_graphs= True Shall we include CCP4 style graphs?

misc_twin_parameters

Various settings for twinning or symmetry tests

missing_symmetry

Settings for missing symmetry tests

sigma_inflation= 1.25

Standard deviations of intensities can be

increased to make point group determination

more reliable.

twinning_with_ncs

Analysing the possibility of an NCS operator

parallel to a twin law.

perform_analyses= False Determines whether or not this analysis
is carried out.
n_bins= 7 Number of bins used in the NCS analysis.

twin_test_cuts

Various cuts used in determining resolution limit

for data used in intensity statistics

low_resolution= 10.0

Low resolution

high_resolution= None High resolution

isigi_cut= 3.0

I/sigI ratio used in completeness cut

completeness_cut= 0.85

Data is cut at resolution where

intensities with I/sigI greater than

isigi_cut are more than completeness_cut

complete

optional

Optional data massage possibilities

hklout= None HKL out

hklout_type= mtz sca *mtz_or_sca Output format

label_extension= "massaged" Label extension

aniso

Parameters dealing with anisotropy correction

action= *remove_aniso None Remove anisotropy?

final_b= *eigen_min eigen_mean user_b_iso Final b value

b_iso= None User specified B value

outlier

Outlier analyses

action= *extreme basic beamstop None Outlier protocol

parameters

Parameters for outlier detection

basic_wilson

level= 1E-6

extreme_wilson

level= 0.01

beamstop

level= 0.001

d_min= 10.0

symmetry

action= detwin twin *None

twinning_parameters

twin_law= None

fraction= None http://phenix-online.org/documentation/xtriage.htm (15 of 15) [12/14/08 1:01:22 PM]


Reflection Statistics


phenix.reflection_statistics

Comparisons between multiple datasets are available via the phenix.reflection_statistics command:

Usage: phenix.reflection_statistics [options] reflection_file [...]

Options:

-h, --help show this help message and exit

--unit-cell=10,10,20,90,90,120|FILENAME

External unit cell parameters

--space-group=P212121|FILENAME

External space group symbol

--symmetry=FILENAME External file with symmetry information

--weak-symmetry symmetry on command line is weaker than symmetry found

in files

--quick Do not compute statistics between pairs of data arrays

--resolution=FLOAT High resolution limit (minimum d-spacing, d_min)

--low-resolution=FLOAT

Low resolution limit (maximum d-spacing, d_max)

--bins=INT Number of bins

--bins-twinning-test=INT

Number of bins for twinning test

--bins-second-moments=INT

Number of bins for second moments of intensities

--lattice-symmetry-max-delta=LATTICE_SYMMETRY_MAX_DELTA

angular tolerance in degrees used in the determination

of the lattice symmetry

Example: phenix.reflection_statistics data1.mtz data2.sca

This utility reads one or more reflection files (many common formats incl. MTZ, Scalepack, CNS, SHELX). For each of the datasets found in the reflection files the output shows a block like the following:

Miller array info: gere_MAD.mtz:FSEinfl,SIGFSEinfl,DSEinfl,SIGDSEinfl

Observation type: xray.reconstructed_amplitude

Type of data: double, size=20994

Type of sigmas: double, size=20994

Number of Miller indices: 20994

Anomalous flag: 1

Unit cell: (108.742, 61.679, 71.652, 90, 97.151, 90)

Space group: C 1 2 1 (No. 5)

Systematic absences: 0

Centric reflections: 0

Resolution range: 24.7492 2.74876

Completeness in resolution range: 0.873513

Completeness with d_max=infinity: 0.872315

Bijvoet pairs: 10497

Lone Bijvoet mates: 0

http://phenix-online.org/documentation/reflection_statistics.htm (1 of 2) [12/14/08 1:01:25 PM]


Anomalous signal: 0.1065

This is followed by a listing of the completeness and the anomalous signal in resolution bins. The number of bins and the resolution range may be adjusted with the options shown above. Unless the --quick option is specified, the output will also show the correlations between the datasets and, if applicable, between the anomalous differences, both as overall values and in bins. The correlation between anomalous differences is often a very powerful indicator of the resolution up to which the anomalous signal is useful for substructure determination. See also: phenix.xtriage


reflection file tools

phenix.reflection_file_converter

phenix.cns_as_mtz

phenix.mtz.dump

phenix.reflection_file_converter

Purpose

phenix.reflection_file_converter is a simple utility program that allows straightforward conversion of many reflection file formats to MTZ, CNS or Scalepack format. Currently, combining several datasets into a single output file is not supported.

Keywords

Typing phenix.reflection_file_converter --help results in:

Usage: phenix.reflection_file_converter [options] reflection_file ...

Options:

-h, --help show this help message and exit

--unit-cell=10,10,20,90,90,120|FILENAME

External unit cell parameters

--space-group=P212121|FILENAME

External space group symbol

--symmetry=FILENAME External file with symmetry information

--weak-symmetry symmetry on command line is weaker than symmetry found

in files

--resolution=FLOAT High resolution limit (minimum d-spacing, d_min)

--low-resolution=FLOAT

Low resolution limit (maximum d-spacing, d_max)

--label=STRING Substring of reflection data label or number

--non-anomalous Averages Bijvoet mates to obtain a non-anomalous array

--r-free-label=STRING

Substring of reflection data label or number

--r-free-test-flag-value=FLOAT

Value in R-free array indicating assignment to free

set.

--generate-r-free-flags

Generates a new array of random R-free flags (MTZ and

CNS output only).

--use-lattice-symmetry-in-r-free-flag-generation

group twin/pseudo symmetry related reflections

together in r-free set.

--r-free-flags-fraction=FLOAT

Target fraction free/work reflections (default: 0.10).

--r-free-flags-max-free=INT

Maximum number of free reflections (default: 2000).

--change-of-basis=STRING

Change-of-basis operator: h,k,l or x,y,z or

to_reference_setting, to_primitive_setting,

to_niggli_cell, to_inverse_hand

--eliminate-invalid-indices

Remove indices which are invalid given the change of

basis desired

http://phenix-online.org/documentation/reflection_file_tools.htm (1 of 3) [12/14/08 1:01:30 PM]


--expand-to-p1 Generates all symmetrically equivalent reflections.

The space group symmetry is reset to P1. May be used

in combination with --change_to_space_group to lower

the symmetry.

--change-to-space-group=SYMBOL|NUMBER

Changes the space group and merges equivalent

reflections if necessary

--write-mtz-amplitudes

Converts intensities to amplitudes before writing MTZ

format; requires --mtz_root_label

--write-mtz-intensities

Converts amplitudes to intensities before writing MTZ

format; requires --mtz_root_label

--remove-negatives Remove negative intensities or amplitudes from the

data set

--massage-intensities

'Treat' negative intensities to get a positive

amplitude. |Fnew| = sqrt((Io+sqrt(Io**2

+2sigma**2))/2.0). Requires intensities as input and

the flags --mtz, --write_mtz_amplitudes and

--mtz_root_label.

--scale-max=FLOAT Scales data such that the maximum is equal to the

given value

--scale-factor=FLOAT Multiplies data with the given factor

--sca=FILE write data to Scalepack FILE ('--sca .' copies name of

input file)

--mtz=FILE write data to MTZ FILE ('--mtz .' copies name of input

file)

--mtz-root-label=STRING

Root label for MTZ file (e.g. Fobs)

--cns=FILE write data to CNS FILE ('--cns .' copies name of input

file)

--shelx=FILE write data to SHELX FILE ('--shelx .' copies name of

input file)

Example: phenix.reflection_file_converter w1.sca --mtz .

Examples

Convert scalepack into an mtz format. Specify ouput filename (w1.mtz) and label for intensities (IP -> IP,

SIGIP): phenix.reflection_file_converter w1.sca --mtz_root_label=IP --mtz=w1.mtz

Change basis to get data in primitive setting, merge to higher symmetry and bring to reference setting

(three steps): phenix.reflection_file_converter c2.sca --change-of-basis=to_niggli_cell --sca=niggli.sca

phenix.reflection_file_converter niggli.sca --change-to-space-group=R32:R --sca=r32r.sca

phenix.reflection_file_converter r32r.sca --change-of-basis=to_reference_setting --sca=r32_hexagonal_setting.sca
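Several of the options above are simple per-reflection arithmetic. The following Python sketch (illustrative only; these function names are not part of PHENIX) reproduces the conversions for plain lists of numbers:

```python
import math

def intensity_to_amplitude(i):
    # --write-mtz-amplitudes: F = sqrt(I); valid only for non-negative I
    return math.sqrt(i)

def massage_intensity(i_obs, sigma):
    # --massage-intensities: |Fnew| = sqrt((Io + sqrt(Io**2 + 2*sigma**2)) / 2.0)
    # yields a positive amplitude even for a slightly negative intensity
    return math.sqrt((i_obs + math.sqrt(i_obs ** 2 + 2.0 * sigma ** 2)) / 2.0)

def scale_to_max(data, target_max):
    # --scale-max: multiply all values so that the maximum equals target_max
    factor = target_max / max(data)
    return [d * factor for d in data]
```

Note that a plain square root is undefined for negative intensities, which is why the converter offers --remove-negatives and --massage-intensities for such data.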

phenix.cns_as_mtz

Purpose

Converts all data in a CNS reflection file to MTZ format.

Keywords

Typing: phenix.cns_as_mtz --help results in:

Usage: phenix.cns_as_mtz [options] cns_file


Options:

-h, --help show this help message and exit

--unit-cell=10,10,20,90,90,120|FILENAME

External unit cell parameters

--space-group=P212121|FILENAME

External space group symbol

--symmetry=FILENAME External file with symmetry information

-q, --quiet suppress output

Example: phenix.cns_as_mtz scale.hkl

Example

Extract unit cell parameters and space group symbol from a PDB coordinate file and reflection data from a CNS reflection file. Write an MTZ file: phenix.cns_as_mtz mad_scale.hkl --symmetry minimize.pdb

phenix.mtz.dump

Purpose

Inspects an MTZ file. Optionally writes data in text format (human readable, machine readable, or spreadsheet).

Keywords

Typing: phenix.mtz.dump --help results in:

Usage: phenix.mtz.dump [options] file_name [...]

Options:

-h, --help show this help message and exit

-v, --verbose Enable CMTZ library messages.

-c, --show-column-data

-f KEYWORD, --column-data-format=KEYWORD

Valid keywords are: human_readable, machine_readable,

spreadsheet. Human readable is the default. The format

keywords can be abbreviated (e.g. -f s).

-b, --show-batches

--walk=ROOT_DIR Find and process all MTZ files under ROOT_DIR


Structure factor file manipulations with Xmanip

Python-based Hierarchical ENvironment for Integrated Xtallography


Author(s)

Purpose

Usage

Command line interface

Parameters and definitions

Examples

Possible Problems

Literature

Additional information

List of all xmanip keywords

Author(s)

Xmanip: Peter Zwart

Phil command interpreter: Ralf W. Grosse-Kunstleve

Purpose

Manipulation of reflection data and models

Usage

Command line interface

xmanip

can be invoked via the command line interface with instructions given in a specific definition file:

phenix.xmanip params.def

The full set of definitions can be obtained by typing:

phenix.xmanip

which results in::

xmanip {

input {

unit_cell = None

space_group = None

xray_data {

file_name = None

labels = None

label_appendix = None

name = None

write_out = None

}

model {

file_name = None

}


}

parameters {

action = reindex manipulate_pdb *manipulate_miller

reindex {

standard_laws = niggli *reference_setting invert user_supplied

user_supplied_law = "h,k,l"

}

manipulate_miller {

task = get_dano get_diso lsq_scale sfcalc *custom None

output_label_root = "FMODEL"

get_dano {

input_data = None

}

get_diso {

native = None

derivative = None

use_intensities = True

use_weights = True

scale_weight = True

}

lsq_scale {

input_data_1 = None

input_data_2 = None

use_intensities = True

use_weights = True

scale_weight = True

}

sfcalc {

fobs = None

output = *2mFo-DFc mFo-DFc complex_fcalc abs_fcalc intensities

use_bulk_and_scale = *as_estimated user_supplied

bulk_and_scale_parameters {

d_min = 2

overall {

b_cart {

b_11 = 0

b_22 = 0

b_33 = 0

b_12 = 0

b_13 = 0

b_23 = 0

}

k_overall = 0.1

}

solvent {

k_sol = 0.3

b_sol = 56

}

}

}

custom{

code = print >> out, "hello world"

}

}

manipulate_pdb{

task = apply_operator *set_b

apply_operator{

operator = "x,y,z"

invert=False

concatenate_model=False


chain_id_increment=1

}

set_b{

b_iso = 30

}

}

}

output {

logfile = "xmanip.log"

hklout = "xmanip.mtz"

xyzout = "xmanip.pdb"

}

}

A detailed explanation of each scope follows below.

Parameters and definitions

The xmanip.input scope defines which files and which data xmanip

reads in::

input {

unit_cell = None # unit cell. Specify when not in reflection or pdb files

space_group = None # space group. Specify when not in reflection or pdb files

xray_data {

file_name = None # File from which data will be read

labels = None # Labels to read in.

label_appendix = None # Label appendix: when writing out the new mtz file, this appendix will be added to the current label.

name = None # A data set name. Useful for manipulation

write_out = None # Determines if this data set will be written to the final mtz file

}

model {

file_name = None # An input pdb file

}

}

One can define as many sub-scopes of xray_data as desired (see examples). The specific tasks of xmanip

are controlled by the xmanip.parameters.action key. Possible options are:

reindex

manipulate_pdb

manipulate_miller

Reindexing: reindexing of a data set (and a model) is controlled by the xmanip.parameters.reindex scope. Standard laws are available:

niggli: Brings the unit cell to the Niggli setting.

reference_setting: Brings the space group to the reference setting.

invert: Inverts a data set.

user_supplied: A user-supplied reindexing law is used, specified by reindex.user_supplied_law.
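For illustration, a user-supplied law in h,k,l notation can be understood as a per-index substitution. This minimal Python sketch (hypothetical helper, not part of xmanip, which handles the general change of basis via cctbx) applies such a law to a single Miller index:

```python
def reindex_index(miller_index, law="h,k,l"):
    # Apply a reindexing law written in h,k,l notation to one Miller index.
    # Each comma-separated expression is evaluated with h, k, l bound to the
    # components of the input index.
    h, k, l = miller_index
    return tuple(eval(expr, {"h": h, "k": k, "l": l})
                 for expr in law.split(","))
```

For example, the law "-h,-k,-l" (the invert case) maps (1, 2, 3) to (-1, -2, -3).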

manipulate_pdb: A pdb file can be modified either by applying a symmetry operator to the coordinates (select the apply_operator task from the manipulate_pdb.task list) or by setting a single isotropic B value for all atoms (the set_b task). The operator needs to be specified by apply_operator.operator. Setting apply_operator.invert to True will invert the supplied operator. One can choose to write out the newly generated chain together with the original chain (set concatenate_model = True). The new chain ID can be controlled with the chain_id_increment parameter.

manipulate_miller: Reflection data can be manipulated in various ways:


get_dano: Gets anomalous differences from the data set with the name specified by manipulate_miller.get_dano.input_data.

get_diso: Gets isomorphous differences (derivative-native) from the data sets specified by the names manipulate_miller.get_diso.native and manipulate_miller.get_diso.derivative. Least-squares scaling of the derivative to the native can be done on intensities (use_intensities=True), with or without using sigmas (use_weights), and with scaling of the sigmas (scale_weight) if desired (recommended).

lsq_scale: As above, but no isomorphous differences are computed; only input_data_2 is scaled and returned.

sfcalc: Structure factor calculation. Requires a pdb file to be read in. Possible output coefficients are 2mFo-DFc (Fobs required; specify sfcalc.fobs), mFo-DFc (Fobs required; specify sfcalc.fobs), complex_fcalc (FC, PHIC), abs_fcalc (FC), and intensities (FC^2). Bulk solvent and scaling parameters will either be estimated from the observed data, if supplied, or set by the user (using keywords in the bulk_and_scale_parameters scope).

custom: If custom is selected, all data names of the xray data become variable names accessible via the custom interface. The custom interface allows one to write a small piece of Python code that works directly with the underlying Python objects. Basic knowledge of the cctbx and Python is needed to use this productively. Please contact the authors for detailed help if required. An example is given in the Examples section.
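As a rough illustration of what the get_dano, get_diso and bulk-solvent keywords compute, here is a minimal Python sketch (function names are illustrative only; xmanip itself operates on cctbx miller arrays, not plain lists):

```python
import math

def get_dano(f_plus, f_minus):
    # anomalous difference ||F+| - |F-|| for matched Friedel pairs
    return [abs(abs(fp) - abs(fm)) for fp, fm in zip(f_plus, f_minus)]

def get_diso(f_der, f_nat):
    # isomorphous difference |Fder| - |Fnat| (derivative minus native,
    # after the derivative has been scaled to the native)
    return [abs(fd) - abs(fn) for fd, fn in zip(f_der, f_nat)]

def bulk_solvent_scale(s_sq, k_sol=0.3, b_sol=56.0):
    # per-reflection scale factor k_sol * exp(-b_sol * s^2 / 4) of the standard
    # flat bulk-solvent model; defaults match the sfcalc solvent scope above
    return k_sol * math.exp(-b_sol * s_sq / 4.0)
```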

Examples

Reindexing a data set and model ::

xmanip {

input {

xray_data {

file_name = mydata.mtz

labels = FOBS,SIGFOBS

write_out = True

}

xray_data {

file_name = mydata.mtz

labels = R_FREE_FLAG

write_out = True

}

model {

file_name = mymodel.pdb

}

}

parameters {

action = reindex

reindex {

standard_laws = *niggli

user_supplied_law = "h,k,l"

}

}

output {

logfile = "xmanip.log"

hklout = "reindex.mtz"

xyzout = "reindex.pdb"

}

}

Applying a symmetry operator to a pdb file ::


xmanip {

input {

model {

file_name = mymodel.pdb

}

}

parameters {

action = manipulate_pdb

manipulate_pdb {

task = apply_operator

apply_operator{

operator = "x+1/3,y-2/3,z+1/8"

}

}

}

output {

logfile = "xmanip.log"

xyzout = "shifted.pdb"

}

}
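The effect of the operator keyword in the example above can be illustrated with a small Python sketch (hypothetical helper; xmanip applies the operator to the model via cctbx internally):

```python
def apply_symop(site_frac, operator="x,y,z"):
    # Apply a symmetry operator written in x,y,z notation to one fractional
    # coordinate. Each comma-separated expression is evaluated with x, y, z
    # bound to the components of the input site.
    x, y, z = site_frac
    return tuple(eval(expr, {"x": x, "y": y, "z": z})
                 for expr in operator.split(","))
```

With the operator "x+1/3,y-2/3,z+1/8" from the example, the site (0.5, 0.5, 0.5) is shifted by (1/3, -2/3, 1/8) in fractional coordinates.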

Printing out some useful information for an mtz file ::

xmanip {

input {

xray_data {

file_name = mydata.mtz

labels = FOBS,SIGFOBS

name = fobs

}

}

parameters {

action = custom

custom{

code = """

print >> out, "Printing d_spacings, epsilons and intensities"

#change amplitude to intensities

fobs = fobs.f_as_f_sq()

#get epsilons

epsilons = fobs.epsilons().data().as_double()

#get d spacings

d_hkl = fobs.d_spacings().data()

#print the lot to a file

output_file = open("jiffy_result.txt", 'w')

for ii, eps, dd in zip( fobs.data(), epsilons, d_hkl):

print >> output_file, ii, eps, dd

print >> out, "Done"

"""

}

}

}

Possible Problems

None

Literature

None


Additional information

List of all xmanip keywords

-------------------------------------------------------------------------------

Legend:

black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

-------------------------------------------------------------------------------

xmanip

input

unit_cell= None Unit cell parameters

space_group= None space group

xray_data

Scope defining xray data. Multiple scopes are allowed

file_name= None file name

labels= None A unique label or unique substring of a label

label_appendix= None Label appendix for output mtz file

name= None An identifier of this particular miller array

write_out= None Determines if this data is written to the output file

model

A model associated with the miller arrays. Only one model can be

defined.

file_name= None A model file

parameters

action= *reindex manipulate_pdb manipulate_miller Defines which action

will be carried out.

reindex

Reindexing parameters. Acts on coordinates and miller arrays.

standard_laws= niggli *reference_setting primitive_setting invert

user_supplied Choices of reindexing operators. Will be

applied on structure and miller arrays.

user_supplied_law= 'h,k,l' User supplied operator.

manipulate_miller

Acts on a single miller array or a set of miller

arrays.

task= *get_dano get_diso lsq_scale sfcalc custom None Possible tasks

output_label_root= None Output label root

get_dano

Get ||F+| - |F-|| from input data.

input_data= None

get_diso

Get |Fder|-|Fnat|

native= None Name of native data

derivative= None Name of derivative data

use_intensities= True Scale on intensities

use_weights= True Use experimental sigmas as weights in scaling

scale_weight= True Whether or not to scale the sigmas during

scaling

lsq_scale

input_data_1= None Reference data

input_data_2= None Data to be scaled

use_intensities= True Scale on intensities

use_weights= True Use experimental sigmas as weights in scaling

scale_weight= True Whether or not to scale the sigmas during

scaling

sfcalc

fobs= None Data name of observed data


output= 2mFo-DFc mFo-DFc *complex_fcalc abs_fcalc intensities

Output coefficients

use_bulk_and_scale= *as_estimated user_supplied Estimate or use

parameters given by user

bulk_and_scale_parameters

Parameters used in the structure factor

calculation. Ignored if experimental

data is given

d_min= 2.0

resolution of the data to be calculated.

overall

Bulk solvent and scaling parameters

k_overall= 0.1

Overall scalar

b_cart

Anisotropic B values

b_11= 0

b_22= 0

b_33= 0

b_12= 0

b_13= 0

b_23= 0

solvent

Solvent parameters

k_sol= 0.3

Solvent scale

b_sol= 56.0

Solvent B

custom

A custom script that uses miller_array data names as variables.

code= None A piece of python code

show_instructions= True Some instructions

manipulate_pdb

Manipulate elements of a pdb file

task= set_b apply_operator *None How to manipulate a pdb file

set_b

b_iso= 30 new B value for all atoms

apply_operator

standard_operators= *user_supplied_operator

user_supplied_cartesian_rotation_matrix

Possible operators

user_supplied_operator= "x,y,z" Actual operator in x,y,z notation

invert= False Invert operator given above before applying on

coordinates

concatenate_model= False Determines if new chain is concatenated

to old model

chain_id_increment= 1 Chain id increment

user_supplied_cartesian_rotation_matrix

Rotation,translation

matrix in cartesian frame

r= None Rotational part of operator

t= None Translational part of operator

output

Output files

logfile= xmanip.log

Logfile

hklout= xmanip.mtz

Output miller indices and data

xyzout= xmanip.pdb

output PDB file


Explore Metric Symmetry


Python-based Hierarchical ENvironment for Integrated Xtallography

Purpose

Keywords

Examples

Purpose

iotbx.explore_metric_symmetry

is a program that allows a user to quickly determine the symmetry of the lattice, given a unit cell, and to determine the relations between the various possible point groups. Another use of iotbx.explore_metric_symmetry is the comparison of unit cells that are related by a linear recombination of their basis vectors.

Keywords

A list of keywords and concise help can be obtained by typing: iotbx.explore_metric_symmetry

options:

-h, --help show this help message and exit

--unit_cell=10,10,20,90,90,120|FILENAME

External unit cell parameters

--space_group=P212121|FILENAME

External space group symbol

--symmetry=FILENAME External file with symmetry information

--max_delta=FLOAT Maximum delta/obliquity used in determining the

lattice symmetry, using a modified Le-Page algorithm.

Default is 5.0 degrees

--start_from_p1 Reduce to Niggli cell and forget the input space group

before higher metric symmetry is sought.

--graph=GRAPH A graphical representation of the graph will be

written out. Requires Graphviz to be installed and in

path.

--centring_type=CENTRING_TYPE

Centring type, choose from P,A,B,C,I,R,F

--other_unit_cell=10,20,30,90,103.7,90

Other unit cell, for unit cell comparison

--other_space_group=OTHER_SPACE_GROUP

space group for other_unit_cell, for unit cell

comparison

--other_centring_type=OTHER_CENTRING_TYPE

Centring type, choose from P,A,B,C,I,R,F

--no_point_group_graph

Do not carry out the construction of a point group

graph.

--relative_length_tolerance=FLOAT

Tolerance for unit cell lengths to be considered

equal-ish.

--absolute_angle_tolerance=FLOAT

Angular tolerance in unit cell comparison


--max_order=INT Maximum volume change for target cell

A list of possible unit cells and space groups is given for the specified unit cell and space group combination.

The keywords unit_cell and space_group (or centring_type) define the crystal symmetry for which a point group graph is constructed. The keyword max_delta sets the tolerance used in the determination of the lattice symmetry. The keyword start_from_p1 in combination with the space group is equivalent to specifying the centring_type only. If Graphviz is installed, a png file with the point group graph can be constructed by specifying the filename of the png graph with the keyword graph. If a second crystal is specified by the keywords other_unit_cell and other_space_group (or other_centring_type), the unit cells will be compared. Using linear combinations of the smallest unit cell, possible matches for the larger unit cell are sought. If desired, the larger unit cell can be expanded as well using the keyword max_order. The tolerances in the unit cell comparison can be changed from their defaults (10% on the lengths and 20 degrees on the angles) using the keywords relative_length_tolerance and absolute_angle_tolerance. Construction of a point group graph can be skipped using the keyword no_point_group_graph.
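The default tolerances can be illustrated with a small Python sketch (hypothetical helper, not part of iotbx) that checks whether two unit cells agree within the stated limits:

```python
def cells_similar(cell_a, cell_b,
                  relative_length_tolerance=0.10,
                  absolute_angle_tolerance=20.0):
    # Rough check whether two unit cells (a, b, c, alpha, beta, gamma) agree
    # within the default tolerances quoted above: 10% on lengths, 20 degrees
    # on angles. The real program also considers basis transformations.
    for la, lb in zip(cell_a[:3], cell_b[:3]):
        if abs(la - lb) / max(la, lb) > relative_length_tolerance:
            return False
    for aa, ab in zip(cell_a[3:], cell_b[3:]):
        if abs(aa - ab) > absolute_angle_tolerance:
            return False
    return True
```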

Examples

Constructing a point group graph given some basic information: iotbx.explore_metric_symmetry --unit_cell="20,30,40,90,90,90" --centring_type=P

All point groups between P 1 and P 2 2 2 will be listed.

Comparing two related unit cells can be done using: iotbx.explore_metric_symmetry --unit_cell="20,30,40,90,90,90" --centring_type=P --other_unit_cell="40,80,60,90,90,90" --other_centring_type=F


Hybrid Substructure Search

Python-based Hierarchical ENvironment for Integrated Xtallography


HySS overview

HySS examples

nsf_d2_peak.sca

gere_MAD.mtz

mbp.hkl

Command line options

If things go wrong

Auxiliary programs

phenix.emma

phenix.xtriage

phenix.reflection_statistics

HySS overview

The HySS (Hybrid Substructure Search) submodule of the Phenix package is a highly automated procedure for the location of anomalous scatterers in macromolecular structures. HySS starts with the automatic detection of the reflection file format and analyses all available datasets in a given reflection file to decide which of these is best suited for solving the structure. The search parameters are automatically adjusted based on the available data and the number of expected sites given by the user. The search method is a systematic multi-trial procedure employing:

direct-space Patterson interpretation, followed by

reciprocal-space Patterson interpretation, followed by

dual-space direct methods, followed by

automatic comparison of the solutions and automatic termination detection.

The end result is a consensus model which is exported in a variety of file formats suitable for frequently used phasing and density modification packages.

The core search procedure is applicable to both anomalous diffraction and isomorphous replacement problems.

However, currently the command line interface is limited to working with anomalous diffraction data or externally preprocessed difference data.

References:

Grosse-Kunstleve RW, Adams PD:

Substructure search procedures for macromolecular structures

Acta Cryst. 2003, D59, 1966-1973.

Electronic reprint

Adams PD, Grosse-Kunstleve RW, Hung L-W, Ioerger TR, McCoy AJ, Moriarty NW, Read RJ,

Sacchettini JC, Sauter NK, Terwilliger, TC:

PHENIX: building new software for automated crystallographic structure

determination

Acta Cryst. 2002, D58, 1948-1954.

Electronic reprint


To contact us send email to [email protected] or [email protected].

HySS examples

The only input file required for running HySS is a file with the reflection data. HySS reads the following formats directly:

merged scalepack files

unmerged scalepack files (but merged files are preferred!)

CCP4 MTZ files with merged data

CCP4 MTZ files with unmerged data (but merged files are preferred!)

d*trek .ref files

XDS_ASCII files with merged data

CNS reflection files

SHELX reflection files with amplitudes

nsf_d2_peak.sca

The CCI Apps binary bundles include a scalepack file with anomalous peak data for the structure with the PDB access code 1NSF (courtesy of A.T. Brunger). To find the 8 selenium sites enter: phenix.hyss nsf_d2_peak.sca 8 se

This leads to:

Reading reflection file: nsf_d2_peak.sca

Space group found in file: P 6

Is this the correct space group? [Y/N]:

HySS prompts for a confirmation of the space group because space group P6 is often used as a placeholder during data reduction. If the space group symbol found in the reflection file is not correct it can be changed.

However, in this case the symbol is correct. At the prompt enter Y to continue. Alternatively, the interactive prompt can be avoided by using the --space_group option: phenix.hyss nsf_d2_peak.sca 8 se --space_group=p6

HySS will quickly print a few screen-pages with information about the data (e.g. the magnitude of the anomalous signal) and the many search parameters. The most interesting output is produced after this point:

Entering search loop:

p = peaklist index in Patterson map
f = peaklist index in two-site translation function
cc = correlation coefficient after extrapolation scan
r = number of dual-space recycling cycles
cc = final correlation coefficient

p=000 f=000 cc=0.364 r=015 cc=0.479 [ best cc: 0.479 ]
p=000 f=001 cc=0.310 r=015 cc=0.477 [ best cc: 0.479 0.477 ]
Number of matching sites of top 2 structures: 11
p=000 f=002 cc=0.166 r=015 cc=0.479 [ best cc: 0.479 0.479 0.477 ]
Number of matching sites of top 2 structures: 11
Number of matching sites of top 3 structures: 11

It will take a few seconds for each line starting with p= to appear. Each of these lines summarizes the result of one trial consisting of an evaluation of the Patterson function, two fast translation functions, and 15 cycles of dual-space recycling. The important number to watch is the final correlation. In the first three trials HySS


finds three substructure models with promisingly high correlations. These models are compared, taking allowed origin shifts and the hand ambiguity into account. The three models have more than 2/3 of the expected number of sites in common. Therefore HySS decides that the search is complete and prints a summary of the matching sites:

Top 3 correlations:

p=000 f=000 cc=0.364 r=015 cc=0.479

p=000 f=002 cc=0.166 r=015 cc=0.479

p=000 f=001 cc=0.310 r=015 cc=0.477

Match summary:

Operator:

rotation: {{-1.0, 0.0, 0.0}, {0.0, -1.0, 0.0}, {0.0, 0.0, -1.0}}

translation: (-9.6289517721653785e-38, 0.0, 0.091526465343537006)

rms coordinate differences: 0.06

Pairs: 11

site001 site001 0.018

site002 site002 0.056

site003 site003 0.033

site004 site004 0.026

site005 site005 0.050

site006 site006 0.103

site007 site007 0.040

site008 site008 0.063

site009 site010 0.067

site010 site009 0.120

site011 site011 0.029

Singles model 1: 0

Singles model 2: 0

The matching sites are used to build a consensus model. The coordinates and occupancies are quickly refined using a quasi-Newton minimizer:

Minimizing consensus model (11 sites).

Truncating consensus model to expected number of sites.

Minimizing consensus model (8 sites).

Correlation coefficient for consensus model (8 sites): 0.483

The refined sites are sorted by occupancy in descending order. The model is truncated to the expected number of sites and refined again. After printing detailed timing information (not shown) the output ends with:

Storing all substructures found: nsf_d2_peak_hyss_models.pickle

Storing consensus model: nsf_d2_peak_hyss_consensus_model.pickle

Writing consensus model as PDB file: nsf_d2_peak_hyss_consensus_model.pdb

Writing consensus model as CNS SDB file: nsf_d2_peak_hyss_consensus_model.sdb

Writing consensus model as SOLVE xyz records: nsf_d2_peak_hyss_consensus_model.xyz

The fractional coordinates may also be useful in other programs.

Total CPU time: 49.60 seconds

The resulting coordinate files can be used for phasing and density modification with other programs.

gere_MAD.mtz

The CCP4 distribution includes a four-wavelength MAD dataset in the tutorial directory. To find the 12


selenium sites with HySS enter: phenix.hyss $CEXAM/tutorial2000/data/gere_MAD.mtz 12 se

HySS automatically picks the wavelength with the strongest anomalous signal and finishes after about 34 seconds (2.8GHz Pentium 4 Linux), writing out the 12 (or sometimes only 11) sites in the various file formats.

mbp.hkl

The CNS tutorial includes data from a MAD experiment with ytterbium as the anomalous scatterer. CNS reflection files do not contain information about the unit cell and space group. However, HySS is able to extract this information from other files, e.g. other reflection files, CNS files, SOLVE files, PDB files or SHELX files. For example: phenix.hyss $CNS_SOLVE/doc/html/tutorial/data/mbp/mbp.hkl 4 yb --symmetry $CNS_SOLVE/doc/html/tutorial/data/mbp/def

HySS reads the reflection data from the mbp.hkl file. The --symmetry option instructs HySS to scan the def file for unit cell parameters and a space group symbol. HySS finishes after about 26 seconds (2.8GHz Pentium 4 Linux).

Command line options

Enter phenix.hyss without arguments to obtain a list of the available command line options:

Command line arguments:

usage: phenix.hyss [options] reflection_file n_sites element_symbol

options:

-h, --help show this help message and exit

--unit_cell=10,10,20,90,90,120|FILENAME

External unit cell parameters

--space_group=P212121|FILENAME

External space group symbol

--symmetry=FILENAME External file with symmetry information

--chunk=n,i Number of chunks for parallel execution and index for

one process

--search=fast|full Search mode

--resolution=FLOAT High resolution limit (minimum d-spacing, d_min)

--low_resolution=FLOAT

Low resolution limit (maximum d-spacing, d_max)

--site_min_distance=FLOAT

Minimum distance between substructure sites (default:

3.5)

--site_min_distance_sym_equiv=FLOAT

Minimum distance between symmetrically-equivalent

substructure sites (overrides --site_min_distance)

--site_min_cross_distance=FLOAT

Minimum distance between substructure sites not

related by symmetry (overrides --site_min_distance)

--molecular_weight=FLOAT

Molecular weight

--solvent_content=FLOAT

Solvent content (default: 0.55)

--random_seed=INT Seed for random number generator

--real_space_squaring

Use real space squaring (as opposed to the tangent

formula)


--data_label=STRING Substring of reflection data label

See also:

http://www.phenix-online.org/download/documentation/cci_apps/hyss/

Example: phenix.hyss w1.sca 66 Se

The --data_label, --resolution and --low_resolution options can be used to override the automatic selection of the reflection data and the resolution range. For example, one may enter the following command with the goal of instructing HySS to use the peak data in the gere_MAD.mtz file (instead of the inflection point data), and to set the high resolution limit to 5 Angstrom: phenix.hyss gere_MAD.mtz 12 se --data_label=peak --resolution=5

Output:

Command line arguments: gere_MAD.mtz 12 se --data_label=peak --resolution=5

Reading reflection file: gere_MAD.mtz

Ambiguous --data_label=peak

Possible choices:

5: gere_MAD.mtz:FSEpeak,SIGFSEpeak,DSEpeak,SIGDSEpeak,merged

6: gere_MAD.mtz:F(+)SEpeak,SIGF(+)SEpeak,F(-)SEpeak,SIGF(-)SEpeak

Please specify an unambiguous substring of the target label.

Sorry: Please try again.

That's a good first try but if --data_label=peak turns out to be ambiguous HySS will ask for more information. Second try: phenix.hyss gere_MAD.mtz 12 se --data_label="F(+)SEpeak" --resolution=5

Now HySS will actually perform the search. Typically the search finishes in less than 10 seconds finding 8-12 sites, depending on the random number generator (which is seeded with the current time unless the --random_seed

option is used). The --site_min_distance, --site_min_distance_sym_equiv, and --site_min_cross_distance

options are available to override the default minimum distance of 3.5 Angstroms between substructure sites. The --real_space_squaring option can be useful for large structures with high-resolution data. In this case the large number of triplets generated for the reciprocal-space direct methods procedure (i.e. the tangent formula) may lead to excessive memory allocation. By default HySS switches to real-space direct methods (i.e. E-map squaring) if it searches for more than 100 sites. If this limit is too high given the available memory use the --real_space_squaring option. For substructures with a large number of sites it is in our experience not critical to employ reciprocal-space direct methods. If the --molecular_weight and --solvent_content options are used HySS will help in determining the number of substructure sites in the unit cell, interpreting the number of sites specified on the command line as the number of sites per molecule.

For example: phenix.hyss gere_MAD.mtz 2 se --molecular_weight=8000 --solvent_content=0.70

This is telling HySS that we have a molecule with a molecular weight of 8 kD, a crystal with an estimated solvent content of 70%, and that we expect to find 2 Se sites per molecule. The HySS output will now show the following:

#---------------------------------------------------------------------------#

| Formula for calculating the number of molecules given a molecular weight. |

|---------------------------------------------------------------------------|


| n_mol = ((1.0-solvent_content)*v_cell)/(molecular_weight*n_sym*.783) |

#---------------------------------------------------------------------------#

Number of molecules: 6

Number of sites: 12

Values used in calculation:

Solvent content: 0.70

Unit cell volume: 476839

Molecular weight: 8000.00

Number of symmetry operators: 4

HySS will go on searching for 12 sites.
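The calculation shown above can be reproduced with a few lines of Python (illustrative only; the function name is not part of HySS, and the formula is the one HySS prints):

```python
def n_sites_in_cell(sites_per_molecule, molecular_weight,
                    solvent_content, v_cell, n_sym):
    # n_mol = ((1.0 - solvent_content) * v_cell) / (molecular_weight * n_sym * 0.783)
    # v_cell is the unit cell volume; n_sym the number of symmetry operators.
    n_mol = ((1.0 - solvent_content) * v_cell) / (molecular_weight * n_sym * 0.783)
    n_mol = int(round(n_mol))
    return n_mol, n_mol * sites_per_molecule
```

Plugging in the values from the output above (2 sites per molecule, 8000 Da, 70% solvent, cell volume 476839, 4 symmetry operators) gives 6 molecules and 12 sites, matching what HySS reports.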

If things go wrong

If the HySS consensus model does not lead to an interpretable electron density map please try the --search full

option: phenix.hyss your_file.sca 100 se --search full

This disables the automatic termination detection and the run will in general take considerably longer. If the full search leads to a better consensus model please let us know because we will want to improve the automatic termination detection. Another possibility is to override the automatic determination of the high-resolution limit with the --resolution option. In some cases the resolution limit is very critical. Truncating the high-resolution limit of the data can sometimes lead to a successful search, as more reflections with a weak anomalous signal are excluded. If there is no consensus model at the end of a HySS run please try alternative programs. For example, run SHELXD with the .ins and .hkl files that are automatically generated by HySS:

Writing anomalous differences as SHELX HKLF file: mbp_anom_diffs.hkl

Writing SHELXD ins file: mbp_anom_diffs.ins

If HySS does not produce a consensus model even though it is possible to solve the substructure with other programs we would like to investigate. Please send email to [email protected]

.

Auxiliary programs

phenix.emma

EMMA stands for Euclidean Model Matching, which allows two sets of coordinates to be superimposed as well as possible given symmetry and origin choices. See the phenix.emma

documentation for more details.

phenix.xtriage

The phenix.xtriage program performs an extensive suite of tests to assess the quality of a data set. It is a good idea to always run this program before substructure location or any other steps of structure solution. See the phenix.xtriage documentation for more details.

phenix.reflection_statistics

Comparison between multiple datasets is available using the phenix.reflection_statistics command. See the phenix.reflection_statistics documentation for more details.

http://phenix-online.org/documentation/hyss.htm (6 of 6) [12/14/08 1:01:45 PM]


Euclidean Model Matching

Documentation Home

Python-based Hierarchical ENvironment for Integrated Xtallography

phenix.emma

EMMA stands for Euclidean Model Matching and is the algorithm used by HySS to superimpose two putative solutions and to derive the consensus model. The same algorithm is also available through the external phenix.emma command-line interface. Enter phenix.emma without arguments to obtain the help page:

usage: phenix.emma [options] reference_coordinates other_coordinates

options:

-h, --help show this help message and exit

--unit_cell=10,10,20,90,90,120|FILENAME

External unit cell parameters

--space_group=P212121|FILENAME

External space group symbol

--symmetry=FILENAME External file with symmetry information

--tolerance=FLOAT match tolerance

--diffraction_index_equivalent

Use only if models are diffraction-index equivalent.

Example: phenix.emma model1.pdb model2.sdb

The command takes two coordinate files in various formats (.pdb, CNS .sdb, SOLVE output, SHELX .ins) and compares the structures, taking the space group symmetry, the allowed origin shifts and the hand ambiguity into account. The output is similar to the Match summary shown above in the example HySS output. The match tolerance defaults to 3 Angstrom. For structures obtained with very low resolution data it may be necessary to specify a different tolerance, e.g. --tolerance=5. The --symmetry option works just like it does for phenix.hyss. It can be used to extract symmetry information from external files such as input files for other programs (CNS, SHELX, SOLVE, ...) or reflection files. However, the --symmetry option is only required if the information about the unit cell and the space group is missing from both coordinate files given to phenix.emma. phenix.emma conducts an exhaustive search and, in contrast to HySS, displays all possible matches. The match with the largest number of matching sites is shown first; the match with the smallest number of matching sites (often just one site) is shown last. Therefore you have to look at the beginning of the output to see the best match: if the output goes to the screen, don't be distracted by the large number of Singles near the end of the output.

Scroll back to see the best match. EMMA is also available via a web interface. http://phenix-online.org/documentation/emma.htm [12/14/08 1:01:47 PM]
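The site-by-site comparison behind a Match summary can be sketched as tolerance-based pairing. This is a deliberately simplified illustration (match_sites is a hypothetical helper, not a PHENIX function): the real EMMA also searches symmetry operations, allowed origin shifts and the hand ambiguity before pairing sites.

```python
import math

def match_sites(ref, other, tolerance=3.0):
    """Greedily pair sites from two coordinate lists whose Cartesian
    distance is within `tolerance` (Angstrom). Returns the matched pairs
    and the unmatched 'Singles' from each list."""
    pairs = []
    used = set()
    for i, a in enumerate(ref):
        best, best_d = None, tolerance
        for j, b in enumerate(other):
            if j in used:
                continue
            d = math.dist(a, b)
            if d <= best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            pairs.append((i, best, best_d))
    matched_ref = {p[0] for p in pairs}
    singles_ref = [i for i in range(len(ref)) if i not in matched_ref]
    singles_other = [j for j in range(len(other)) if j not in used]
    return pairs, singles_ref, singles_other
```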


Structure refinement in PHENIX

Python-based Hierarchical ENvironment for Integrated Xtallography

Documentation Home


Available features

Current limitations

phenix.refine organization

Running phenix.refine

Giving parameters on the command line or in files

Refinement scenarios

Refinement with all default parameters

Refinement of coordinates

Refinement of atomic displacement parameters (commonly known as ADP or B-factors)

Occupancy refinement

f' and f'' refinement

Using NCS restraints in refinement

Water picking

Hydrogens in refinement

Refinement using twinned data

Neutron and joint X-ray and neutron refinement

Optimizing target weights

Refinement at high resolution (higher than approx. 1.0 Angstrom)

Examples of frequently used refinement protocols, common problems

Useful options

Changing the number of refinement cycles and minimizer iterations

Creating R-free flags (if not present in the input reflection files)

Specify the name for output files

Reflection output

Setting the resolution range for the refinement

Bulk solvent correction and anisotropic scaling

Default refinement with user specified X-ray target function

Modifying the initial model before refinement starts

Refinement using FFT or direct structure factor calculation algorithm

Ignoring test (free) flags in refinement

Using phenix.refine to calculate structure factors

Scattering factors

Suppressing the output of certain files

Random seed

Electron density maps

Refining with anomalous data (or what phenix.refine does with Fobs+ and Fobs-).

Rejecting reflections by sigma

Developer's tools

CIF modifications and links

Definition of custom bonds and angles

Atom selection examples

Depositing refined structure with PDB

Referencing phenix.refine

Relevant reading

Feedback, more information

List of all refinement keywords

phenix.refine is the general purpose crystallographic structure refinement program

Available features

Coordinate refinement:

http://phenix-online.org/documentation/refinement.htm (1 of 42) [12/14/08 1:02:19 PM]


1. Restrained / unrestrained individual

2. Grouped (rigid body)

3. LBFGS minimization, Simulated Annealing

4. Selective removal of stereochemistry restraints

5. Adding custom bonds and angles

Atomic Displacement Parameters (ADP) refinement:

1. Restrained individual isotropic, anisotropic, mixed

2. Group isotropic (one isotropic B per selected model part)

3. TLS

4. comprehensive mode: combined TLS + individual or group ADP

Occupancy refinement (any: individual, group, constrained for alternative conformations)

Anomalous f' and f'' refinement

Bulk solvent correction (flat model using a mask) and anisotropic scaling

Multiple refinement and scale target functions: least-squares (ls), maximum-likelihood (ml), phased maximum-likelihood (mlhl)

FFT and direct summation based refinement

Various electron density map calculations (including likelihood-weighted)

Simple structure factor calculation (with or without bulk solvent and scaling)

Combined automatic ordered solvent building, update and refinement

Complete model and data statistics (including twinning analysis, Wilson B calculation, stereochemistry statistics and much more)

Automatic detection of NCS related copies and building NCS restraints

Refinement using X-ray, neutron or both experimental data

Complex refinement strategies in one run

Refinement at subatomic resolution (approx. < 1.0 A) with IAS model

Refinement with twinned data

Current limitations

No omit maps calculation (use PHENIX wizards for this)

TLS and individual anisotropic ADP cannot be refined at once for the same group

Certain refinement strategies are not available for joint X-ray/neutron refinement

No NCS constraints (restraints only)

Atoms with anisotropic ADP cannot be included in NCS groups

No Simulated Annealing for selected fragments.

Remark on using amplitudes (Fobs) vs intensities (Iobs)

Although phenix.refine can read in both data types, intensities or amplitudes, internally it uses amplitudes in nearly all calculations. Both ways of doing refinement, with Iobs or Fobs, have their own slight advantages and disadvantages. To our knowledge there are no strong arguments for using one data type over the other.
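For intuition, the simplest amplitude-from-intensity conversion is sketched below. Note this is a deliberately naive illustration: production data-reduction programs use the statistically sound French & Wilson (1978) treatment for weak and negative intensities, not this truncation, and the function name is hypothetical.

```python
import math

def intensities_to_amplitudes(intensities):
    """Naive I -> F conversion: F = sqrt(I) for positive I, 0 otherwise.
    Illustrative only; real pipelines use the French & Wilson treatment,
    which handles weak and negative intensities statistically."""
    return [math.sqrt(i) if i > 0 else 0.0 for i in intensities]
```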

phenix.refine organization

A refinement run in phenix.refine always consists of three main steps: reading in and processing of the data (model in PDB format, reflections in most known formats, parameters and optionally CIF files with stereochemistry definitions), performing the requested refinement protocols (bulk solvent and scaling, refinement of coordinates and B-factors, water picking, etc.) and finally writing out the refined model, complete refinement statistics and electron density maps in various formats. The figure below illustrates these steps:


The second, central step, which spans everything from bulk solvent correction and scaling to refinement of particular model parameters, is called a macro-cycle and is repeated several times (3 by default).

Multiple refinement scenarios can be realized at this step and applied to any selected part of a model, as illustrated in the figure below:

Running phenix.refine

phenix.refine is run from the command line:

% phenix.refine <pdb-file(s)> <reflection-file(s)> <monomer-library-file(s)>

When you do this a number of things happen:

The program automatically generates a ".eff" file which contains all of the parameters for the job (for example if you provided lysozyme.pdb the file lysozyme_refine_001.eff will be generated). This is the set of input parameters for this run.

The program automatically interprets the reflection file(s). If there is an unambiguous choice of data arrays these will be used for the refinement. If there is a choice, you're given a message telling you how to select the arrays. Several reflection files can be provided, for example: one containing Fobs and another one with R-free flags.

Once the data arrays are chosen, the program writes all of the data it will be using in the refinement to a new MTZ file, for example, lysozyme_refine_data.mtz. This makes it very easy to keep track of what you actually used in the refinement (instead of having the arrays spread across multiple files).

At the end of refinement the program generates:

1. a new PDB file, with the refined model, called for example lysozyme_refine_001.pdb;

2. two maps: likelihood weighted mFo-DFc and 2mFo-DFc. These are in ASCII X-PLOR format. A reflection file with map coefficients is also generated for use in Coot or XtalView (e.g. lysozyme_refine_001_map_coeffs.mtz);

3. a new defaults file to run the next cycle of refinement, e.g. lysozyme_refine_002.def. This means you can run the next cycle of refinement by typing:

% phenix.refine lysozyme_refine_002.def

To get information about command line options type:

% phenix.refine --help

To have the program generate the default input parameters without running the refinement job (e.g. if you want to modify the parameters prior to running the job):

% phenix.refine --dry_run <pdb-file> <reflection-file(s)>

If you know the parameter that you want to change you can override it from the command line:

% phenix.refine data.hkl model.pdb xray_data.low_resolution=8.0 \

simulated_annealing.start_temperature=5000

Note that you don't have to specify the full parameter name. What you specify on the command line is matched against all known parameter names, and the substring match is used if it is unique. To rerun a job that was previously run:

% phenix.refine --overwrite lysozyme_refine_001.def

The --overwrite option allows the program to overwrite existing files. By default the program will not overwrite existing files - just in case this would remove the results of a refinement job that took a long time to finish. To see all default parameters:

% phenix.refine --show-defaults=all
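The unique-substring matching of parameter names mentioned above can be sketched as follows. This is a hypothetical illustration (resolve_parameter is not a PHENIX function); the actual phil-based matcher in PHENIX is more elaborate, e.g. it understands scope paths like refinement.main.

```python
def resolve_parameter(name, known):
    """Accept a (possibly partial) parameter name only if it identifies
    exactly one entry of `known`; otherwise raise. A simplified sketch of
    the behaviour described in the text, not the PHENIX implementation."""
    hits = [k for k in known if name in k]
    if len(hits) == 1:
        return hits[0]
    if not hits:
        raise KeyError("unknown parameter: %s" % name)
    raise KeyError("ambiguous parameter %r matches: %s" % (name, ", ".join(hits)))
```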

Giving parameters on the command line or in files

In phenix.refine parameters to control refinement can be given by the user on the command line:

% phenix.refine data.hkl model.pdb simulated_annealing=true

However, sometimes the number of parameters is large enough to make it difficult to type them all on the command line, for example:

% phenix.refine data.hkl model.pdb refine.adp.tls="chain A" \

refine.adp.tls="chain B" main.number_of_macro_cycles=4 \

xray_data.high_resolution=2.5 wxc_scale=3 wxu_scale=5 \

output.prefix=my_best_model strategy=tls+individual_sites+individual_adp \

simulated_annealing.start_temperature=5000

The same result can be achieved by using:

% phenix.refine data.hkl model.pdb custom_par_1.params

where the custom_par_1.params file contains the following lines:

refinement.refine.strategy=tls+individual_sites+individual_adp
refinement.refine.adp.tls="chain A"
refinement.refine.adp.tls="chain B"
refinement.main.number_of_macro_cycles=4
refinement.input.xray_data.high_resolution=2.5
refinement.target_weights.wxc_scale=3
refinement.target_weights.wxu_scale=5
refinement.output.prefix=my_best_model
refinement.simulated_annealing.start_temperature=5000

which can also be formatted by grouping the parameters under the relevant scopes (custom_par_2.params):

refinement.main {
  number_of_macro_cycles=4
}
refinement.input.xray_data.high_resolution=2.5
refinement.refine {
  strategy = *individual_sites \
             rigid_body \
             *individual_adp \
             group_adp \
             *tls \
             occupancies \
             group_anomalous \
             none
  adp {
    tls = "chain A"
    tls = "chain B"
  }
}
refinement.target_weights {
  wxc_scale=3
  wxu_scale=5
}
refinement.output.prefix=my_best_model
refinement.simulated_annealing.start_temperature=5000

and the refinement run will be:

% phenix.refine data.hkl model.pdb custom_par_2.params

The easiest way to create a file like custom_par_2.params is to generate a template file containing all parameters with the command phenix.refine --show-defaults=all and then keep the parameters that you want to use (and remove the rest).

Comments in parameter files

Use # for comments:

% phenix.refine data.hkl model.pdb comments_in_params_file.params

where the comments_in_params_file.params file contains the lines:

refinement {
  refine {
    #strategy = individual_sites rigid_body individual_adp group_adp tls \
    #           occupancies group_anomalous *none
  }
  #main {
  #  number_of_macro_cycles = 1
  #}
}
refinement.target_weights.wxc_scale = 1.5
#refinement.input.xray_data.low_resolution=5.0

In this example the only parameter used to override the defaults is target_weights.wxc_scale; the rest is commented out.

Refinement scenarios


The refinement of atomic parameters is controlled by the strategy keyword. The available options are:

- individual_sites (refinement of individual atomic coordinates)

- individual_adp (refinement of individual atomic B-factors)

- group_adp (group B-factors refinement)

- group_anomalous (refinement of f' and f" values)

- tls (TLS refinement = refinement of ADP through TLS parameters)

- rigid_body (rigid body refinement)

- occupancies (occupancy refinement: individual, group, group constrained)

- none (bulk solvent and anisotropic scaling only)

Below are examples to illustrate the use of the strategy keyword as well as a few others.

Refinement with all default parameters

% phenix.refine data.hkl model.pdb

This will perform coordinate refinement and restrained ADP refinement. Three macro-cycles will be executed, each consisting of bulk solvent correction, anisotropic scaling of the data, coordinate refinement (25 iterations of the LBFGS minimizer) and ADP refinement (25 iterations of the LBFGS minimizer). At the end the updated coordinates, maps, map coefficients, and statistics are written to files.

Refinement of coordinates

phenix.refine offers three ways of coordinate refinement:

● individual coordinate refinement using gradient-driven (LBFGS) minimization;

● individual coordinate refinement using simulated annealing (SA refinement);

● grouped coordinate refinement (rigid body refinement).

All types of coordinate refinement listed above can be used separately or combined in any way, and can be applied to any selected part of a model. For example, if a model contains three chains A, B and C, then a single refinement run is enough to perform SA refinement and minimization for atoms in chain A, rigid body refinement with two rigid groups A and B, and to refine nothing for chain C. Below we will illustrate this with several examples. The default refinement includes a standard set of stereochemical restraints (covalent bonds, angles, dihedrals, planarities, chiralities, non-bonded). NCS restraints can be added as well. Completely unrestrained refinement is possible. The total refinement target is defined as:

Etotal = wxc_scale * wxc * Exray + wc * Egeom

where: Exray is the crystallographic refinement target (least-squares, maximum-likelihood, or any other), Egeom is the sum of the restraint terms (including NCS if requested), wc is 1.0 by default and can be used to turn the restraints off, wxc is approximately the ratio of the gradient norms of the geometry and X-ray targets as defined in (Adams et al., 1997, PNAS, Vol. 94, p. 5018), and wxc_scale is an ad hoc scale found empirically to work well in most cases. Important to note:

When refinement of coordinates (individual or rigid body) is run without selections, the coordinates of all atoms will be refined. Otherwise, if selections are used, only the coordinates of the selected atoms will be refined and the rest will be fixed. Using strategy=rigid_body or strategy=individual_sites asks phenix.refine to refine only coordinates, while the other parameters (ADP, occupancies) are kept fixed. phenix.refine will stop if an atom at a special position is included in a rigid body group. The solution is to make a new rigid body group selection containing no atoms at special positions.
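As a worked illustration of the coordinate refinement target Etotal = wxc_scale * wxc * Exray + wc * Egeom (a hypothetical helper for intuition, not a PHENIX API):

```python
def total_coordinate_target(e_xray, e_geom, wxc, wxc_scale=1.0, wc=1.0):
    """Etotal = wxc_scale * wxc * Exray + wc * Egeom.
    Setting wc=0 reproduces unrestrained refinement: the X-ray term
    alone then drives the coordinates."""
    return wxc_scale * wxc * e_xray + wc * e_geom
```

For example, with wxc_scale=3 (as in the custom-parameter example earlier) the X-ray term is simply weighted three times more strongly relative to the geometry term.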

Rigid body refinement

The phenix.refine implementation of rigid body refinement is sophisticated and efficient (large convergence radius, a single run, no need to cut off high-resolution data). We call this the MZ protocol (multiple zones). The essence of the MZ protocol is that the refinement starts with a few reflections selected in the lowest resolution zone and proceeds by gradually adding higher resolution reflections. It also updates the mask and bulk solvent model parameters almost constantly, which is crucial since the bulk solvent affects the low resolution reflections - exactly those most important for the success of rigid body refinement. The default set of rigid body parameters is good for most cases and normally should not be changed.
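The idea of growing resolution zones can be sketched as a simple schedule. This is a hypothetical illustration of the concept only (resolution_zones is not a PHENIX function, and the actual phenix.refine protocol also re-updates the bulk solvent model between zones):

```python
def resolution_zones(n_reflections, n_zones=7, min_reflections=200):
    """Number of reflections included at each zone of a multiple-zone (MZ)
    schedule: start from `min_reflections` lowest-resolution reflections
    and grow linearly to the full data set over `n_zones` zones."""
    counts = []
    for zone in range(1, n_zones + 1):
        grown = min_reflections + (n_reflections - min_reflections) * zone // n_zones
        counts.append(min(n_reflections, grown))
    return counts
```

This mirrors the roles of rigid_body.min_number_of_reflections (size of the first zone) and rigid_body.number_of_zones (number of growth steps) discussed below.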

1. One rigid body group (whatever is in the PDB file is refined as a single rigid body):


% phenix.refine data.hkl model.pdb strategy=rigid_body

2. Multiple groups (requires a basic knowledge of the PHENIX atom selection language, see below):

% phenix.refine data.hkl model.pdb strategy=rigid_body \

sites.rigid_body="chain A" sites.rigid_body="chain B"

This will refine chains A and B as two rigid bodies. The rest of the model will be kept fixed.

3. If there are many rigid groups, a lot of typing on the command line may be inconvenient, so creating a parameter file rigid_body_selections.params containing the following lines may be a good idea:

refinement.refine.sites {

rigid_body = chain A

rigid_body = chain B

}

The command line will then be:

% phenix.refine data.hkl model.pdb strategy=rigid_body rigid_body_selections.params

Files like this can be created, for example, by copy-and-paste from the complete list of parameters (phenix.refine --show-defaults=all).

4. To switch from the MZ protocol to the traditional way of doing rigid body refinement (not recommended!):

% phenix.refine data.hkl model.pdb strategy=rigid_body rigid_body.number_of_zones=1 \

rigid_body.high_resolution=4.0

Note that when doing one-zone refinement one needs to cut off the high-resolution data at some arbitrary point around 3-5 A (depending on model size and data quality).

5. By default rigid body refinement is run only in the first macro-cycle. To run it in every macro-cycle instead:

% phenix.refine data.hkl model.pdb strategy=rigid_body rigid_body.mode=every_macro_cycle

6. To change the default number of lowest resolution reflections used to determine the first resolution zone for rigid body refinement (MZ protocol only):

% phenix.refine data.hkl model.pdb strategy=rigid_body \

rigid_body.min_number_of_reflections=250

Decreasing this number may increase the convergence radius of rigid body refinement but small numbers may lead to refinement instability.

7. To change the number of zones for MZ protocol:

% phenix.refine data.hkl model.pdb strategy=rigid_body \

rigid_body.number_of_zones=7

Increasing this number may increase the convergence radius of rigid body refinement at the cost of much longer run time.

8. Rigid body refinement can be combined with individual coordinates refinement in a smart way:

% phenix.refine data.hkl model.pdb strategy=rigid_body+individual_sites

This will perform 3 macro-cycles of individual coordinate refinement; the rigid body refinement will be performed only once, at the first macro-cycle. A more powerful combination for coordinate refinement is:

% phenix.refine data.hkl model.pdb strategy=rigid_body+individual_sites \

simulated_annealing=true

This will do the same refinement as above plus simulated annealing at the second macro-cycle (see more options/examples for running SA in this document).

Refinement of individual coordinates


1. Refinement with Simulated Annealing:

% phenix.refine data.hkl model.pdb simulated_annealing=true \

strategy=individual_sites

This will perform simulated annealing refinement and LBFGS minimization for the whole model. To change the SA start temperature:

% phenix.refine data.hkl model.pdb simulated_annealing=true \

strategy=individual_sites simulated_annealing.start_temperature=10000

Since an SA run may take some time, there are several options defining how many times SA will be performed per refinement run. To run it only in the first macro-cycle:

% phenix.refine data.hkl model.pdb simulated_annealing=true \

strategy=individual_sites simulated_annealing.mode=first

or in every macro-cycle:

% phenix.refine data.hkl model.pdb simulated_annealing=true \

strategy=individual_sites simulated_annealing.mode=every_macro_cycle

or in the second macro-cycle and the one before the last:

% phenix.refine data.hkl model.pdb simulated_annealing=true \
strategy=individual_sites simulated_annealing.mode=second_and_before_last

2. Refinement with minimization (whole model):

% phenix.refine data.hkl model.pdb strategy=individual_sites

3. Refinement with minimization (selected part of model):

% phenix.refine data.hkl model.pdb strategy=individual_sites \
sites.individual="chain A"

This will refine the coordinates of atoms in chain A while keeping the atomic coordinates in chain B fixed.

4. To perform unrestrained refinement of coordinates (usually at ultra-high resolutions):

% phenix.refine data.hkl model.pdb strategy=individual_sites wc=0

This sets the contribution of the geometry restraints target to zero. However, it is still calculated for the statistics output.

5. Removing selected geometry restraints

In the example below:

% phenix.refine data.hkl model.pdb remove_restraints_selections.params

where remove_restraints_selections.params contains:

refinement {
  geometry_restraints.remove {
    angles = chain B
    dihedrals = name CA
    chiralities = all
    planarities = None
  }
}

The following restraints will be removed: angles for all atoms in chain B, dihedrals involving CA atoms, and all chiralities. All planarity restraints will be preserved.


Refinement of atomic displacement parameters (commonly known as ADP or B-factors)

An ADP in phenix.refine is defined as a sum of three contributions:

Utotal = Ulocal + Utls + Ucryst

where Utotal is the total ADP, Ulocal reflects the local atomic vibration (also known as the residual B), Utls is the contribution modelled by the TLS groups, and Ucryst reflects global lattice vibrations. Ucryst is determined and refined at the anisotropic scaling stage. phenix.refine offers multiple choices for ADP refinement:

● individual isotropic, anisotropic or mixed ADP;

● grouped with one isotropic ADP per selected group;

TLS.

All types of ADP refinement listed above can be used separately or combined all together in any combination (except TLS+individual anisotropic) and can be applied to any selected part of a model.

For example, if a model contains six chains A, B, C, D, E and F, then a single refinement run is enough to perform refinement of:

- individual isotropic ADP for atoms in chain A,

- individual anisotropic ADP for atoms in chain B,

- grouped B with one B per all atoms in chain C,

- TLS refinement for chain D,

- TLS and individual isotropic refinement for chain E,

- TLS and grouped B refinement for chain F.

Below we will illustrate this with several examples. Restraints are used by default for ADP refinement of both isotropic and anisotropic atoms. Completely unrestrained refinement is possible. The total refinement target is defined as:

Etotal = wxu_scale * wxu * Exray + wu * Eadp

where: Exray is the crystallographic refinement target (least-squares, maximum-likelihood, ...), Eadp is the ADP restraints term, wu is 1.0 by default and can be used to turn the restraints off, and wxu and wxu_scale are defined similarly to wxc and wxc_scale in coordinate refinement (see the Refinement of Coordinates section). It is important to keep in mind:

If a model was previously refined using TLS, all atoms participating in TLS groups are reported in the output PDB file as anisotropic (they have ANISOU records). If such a PDB file is then submitted for default refinement, all atoms with ANISOU records will be refined as individual anisotropic, which is most likely not desired. When performing TLS refinement along with individual isotropic refinement of Ulocal, the restraints are applied to Ulocal and not to the total ADP (Ulocal+Utls). When performing group B or TLS refinement only, no ADP restraints are used. When ADP refinement is run without selections, the ADP of all atoms will be refined. Otherwise, if selections are used, only the ADP of the selected atoms will be refined and the ADP of the rest will be unchanged. If a TLS parametrization is used for a model previously refined with individual anisotropic ADP, an increase of R-factors is normally expected. phenix.refine will stop if an atom at a special position is included in a TLS group. The solution is to make a new TLS group selection containing no atoms at special positions. When refining TLS, the output PDB file always has ANISOU records for the atoms involved in TLS groups. The anisotropic B-factor in the ANISOU records is the total B-factor (B_tls + B_individual). The isotropic equivalent B-factor in the ATOM records is the mean of the trace of the ANISOU matrix divided by 10000 and multiplied by 8*pi^2, and represents the isotropic equivalent of the total B-factor (B_tls + B_individual). To obtain the individual B-factors, one needs to compute the TLS component (B_tls) using the TLS records in the PDB file header and then subtract it from the total B-factors (in the ANISOU records).
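The ANISOU-to-isotropic-equivalent conversion described above can be written out explicitly (a small illustrative helper following the stated formula, not part of PHENIX):

```python
import math

def b_eq_from_anisou(anisou):
    """Isotropic equivalent B-factor from the six ANISOU integers
    (U components * 1e4, in the PDB order U11 U22 U33 U12 U13 U23):
    B_eq = 8 * pi**2 * (U11 + U22 + U33) / 3,
    i.e. the mean of the trace of U times 8*pi^2."""
    u11, u22, u33 = (x / 10000.0 for x in anisou[:3])
    return 8.0 * math.pi ** 2 * (u11 + u22 + u33) / 3.0
```

For example, ANISOU values of 500 on the diagonal correspond to U = 0.05 A^2 and an equivalent B of about 3.95 A^2.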

Refining group isotropic B-factors

1. One B-factor per residue:

% phenix.refine data.hkl model.pdb strategy=group_adp

Two B-factors per residue:

% phenix.refine data.hkl model.pdb strategy=group_adp \


group_adp_refinement_mode=two_adp_groups_per_residue

2. One isotropic B per selected group of atoms:

% phenix.refine data.hkl model.pdb strategy=group_adp \

group_adp_refinement_mode=group_selection \

adp.group="chain A" adp.group="chain B"

This will refine one isotropic B for chain A and one B for chain B.

The refinement of group isotropic B-factors in phenix.refine does not change the original distribution of B-factors within the group: the differences between the B-factors of atoms within the group remain constant, and only a single overall component added to all atoms of a given group is varied. Atoms with anisotropic ADP are allowed to be within the group.

Refinement of individual ADP (isotropic, anisotropic)

By default, atoms in a PDB file with ANISOU records are refined as anisotropic and atoms without ANISOU records are refined as isotropic. This behavior can be changed with the appropriate keywords.

1. Default refinement of individual ADP:

% phenix.refine data.hkl model.pdb strategy=individual_adp

Note: atoms in the input PDB file with ANISOU records will be refined as anisotropic, and those without as isotropic.

2. Refinement of individual isotropic ADP for a model previously refined as anisotropic or TLS:

% phenix.refine data.hkl model.pdb strategy=individual_adp \

adp.individual.isotropic=all

or equivalently:

% phenix.refine data.hkl model.pdb strategy=individual_adp \

convert_to_isotropic=true

All anisotropic atoms in the input PDB file will be converted to isotropic before the refinement starts. Obviously, this may raise the R-factors.

3. Refinement of individual anisotropic ADP for a model previously refined as isotropic:

% phenix.refine data.hkl model.pdb strategy=individual_adp \

adp.individual.anisotropic="not element H"

This will refine all atoms as anisotropic except hydrogens.

4. Refinement of mixed model (some atoms are isotropic, some are anisotropic):

% phenix.refine data.hkl model.pdb strategy=individual_adp \

adp.individual.anisotropic="chain A and not element H" \

adp.individual.isotropic="chain B or element H"

In this example the atoms in chain A (except hydrogens, if any) will be refined as anisotropic and the atoms in chain B (and hydrogens, if any) as isotropic. Often it is desirable to refine the ADP of waters and hydrogens as isotropic and all other atoms as anisotropic:

% phenix.refine data.hkl model.pdb strategy=individual_adp \

adp.individual.anisotropic="not water and not element H" \

adp.individual.isotropic="water or element H"

Exactly the same command using slightly shorter selection syntax:

% phenix.refine data.hkl model.pdb strategy=individual_adp \

adp.individual.anisotropic="not (water or element H)" \

adp.individual.isotropic="water or element H"

5. To perform unrestrained individual ADP refinement (usually at ultra-high resolutions):


% phenix.refine data.hkl model.pdb strategy=individual_adp wu=0

This sets the contribution of the ADP restraints target to zero. However, it is still calculated for the statistics output.

TLS refinement

1. Refinement of TLS parameters only (whole model as one TLS group):

% phenix.refine data.hkl model.pdb strategy=tls

2. Refinement of TLS parameters only (multiple TLS groups):

% phenix.refine data.hkl model.pdb strategy=tls tls_group_selections.params

where, similar to rigid body or group B-factor refinement, the selection of TLS groups has been made in a user-created parameter file (tls_group_selections.params) as follows:

refinement.refine.adp {

tls = chain A

tls = chain B

}

Alternatively, the selection of TLS groups can be made on the command line (see rigid body refinement for an example). Note: TLS parameters will be refined only for the selected fragments. This makes it possible, for example, to exclude solvent molecules from the TLS groups.

3. A more complete approach is to perform combined TLS and individual or grouped isotropic ADP refinement:

% phenix.refine data.hkl model.pdb strategy=tls+individual_adp or:

% phenix.refine data.hkl model.pdb strategy=tls+group_adp

This makes it possible to model the global (TLS) and local (individual) components of the total ADP and also to compensate for model parts where the TLS parametrization does not fit well.

Occupancy refinement

Here is the list of facts that are important to know about occupancy refinement in phenix.refine:

phenix.refine can perform the following types of occupancy refinement: individual (refinement of one occupancy factor per atom), group (refinement of one occupancy factor per group of selected atoms) and group constrained occupancy refinement. In individual and group occupancy refinement the refined occupancy values are constrained between main.occupancy_min and main.occupancy_max, which are 0 and 1 by default. In group constrained occupancy refinement there are (N-1) refinable occupancies per constrained group. An example of a constrained group is a residue that has N alternative conformations (where N typically ranges between 2 and 4). In such a case all atoms within an alternative conformer have equal occupancy values (0 <= occupancy <= 1) and the occupancies of the N conformers sum to 1.

● The occupancy refinement is ON by default. This does not mean that the occupancies of all atoms will be refined: based on the input PDB file, phenix.refine automatically finds which occupancies to refine. If no user-defined selections are provided, phenix.refine refines individual occupancies for all atoms that have partial occupancy values in the input PDB file (0<occupancy<1; atoms with zero occupancy are not included). Atoms in alternative conformations are determined automatically from the altLoc identifiers in the input PDB file, and group constrained occupancy refinement is performed for these atoms as well.

● Turning OFF the occupancy refinement can be done by removing the star (*) from the corresponding keyword in strategy = ... *occupancies ....

● If selections are provided (see examples below), occupancy refinement will be performed for the selected atoms as well as for those selected automatically (as described above).

● User-defined selections override those defined by phenix.refine automatically. For example, if an atom is automatically selected for individual occupancy refinement, but the user defined a group of atoms for which one occupancy factor will be refined (group occupancy refinement), and this particular atom is within the group, then the individual occupancy will not be refined for this atom.

http://phenix-online.org/documentation/refinement.htm (11 of 42) [12/14/08 1:02:19 PM]

Structure refinement in PHENIX

● The user can withhold occupancy refinement for any atoms that were originally selected for occupancy refinement by default (automatically).

● The presence of user-defined selections for occupancies to be refined is not enough to engage the occupancy refinement: it is important that occupancy refinement is also selected in the strategy keyword.
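The (N-1)-parameter constraint described above can be sketched as follows. This is an illustrative Python sketch, not phenix.refine code, and the function name is hypothetical:

```python
def conformer_occupancies(free_occs):
    """Given the N-1 refinable occupancies of a constrained group (each in
    [0, 1]), return all N conformer occupancies, which sum to exactly 1."""
    assert all(0.0 <= q <= 1.0 for q in free_occs)
    # The last conformer's occupancy is fully determined by the others.
    last = 1.0 - sum(free_occs)
    assert 0.0 <= last <= 1.0, "free occupancies must leave a valid remainder"
    return list(free_occs) + [last]
```

For a residue with three conformers, refining the first two occupancies (say 0.6 and 0.3) fixes the third at 0.1, so only N-1 = 2 parameters are actually refined.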

Examples:

1. Running with all default parameters:

% phenix.refine data.hkl model.pdb

This will refine individual coordinates, individual B-factors (isotropic or anisotropic) and occupancies for atoms in alternative conformations or atoms having partial occupancies. If there are no such atoms in the input PDB file, no occupancies will be refined.

2. Refinement of occupancies only:

% phenix.refine data.hkl model.pdb strategy=occupancies

This will only refine occupancies for atoms in alternative conformations or atoms having partial occupancies. If there are no such atoms in the input PDB file, no occupancies will be refined. Other model parameters, such as B-factors or coordinates, will not be refined (this is the only difference between this run and the one above).

3. Refine individual occupancies of water molecules (in addition to atoms with partial occupancies and those in alternative conformations, if any):

% phenix.refine data.hkl model.pdb refine.occupancies.individual="water"

A similar refinement where, in addition, all Zn atoms in chain X will be refined:

% phenix.refine data.hkl model.pdb occupancies.individual="water" \

occupancies.individual="chain X and element Zn"

4. Complex occupancy refinement strategy (combination of various available occupancy refinement types):

% phenix.refine data.hkl model.pdb strategy=occupancies occ.params

The number of atom selections makes it inconvenient to type them all on the command line. This is why the parameter file occ.params is used; it contains the following lines:

refinement {

refine {

occupancies {

individual = element BR or water

individual = element Zn

constrained_group {

selection = chain A and resseq 1

}

constrained_group {

selection = chain A and resseq 2

selection = chain A and resseq 3

}

constrained_group {

selection = chain X and resname MAN

selection = chain X and resseq 42

selection = chain X and resseq 121

}

remove_selection = chain B and resseq 1 and name O

remove_selection = chain B and resseq 3 and name O

}

}

}

which defines:


● group occupancy refinement: one occupancy for all atoms in chain A and resseq 1 will be refined, constrained between main.occupancy_min and main.occupancy_max, which are 0 and 1 by default.

● individual occupancies for all Zn and Br atoms, and waters.

● group constrained occupancy refinement: in one group the occupancies of atoms in chain A and resseq 2 and in chain A and resseq 3 will be coupled. All occupancies within chain A and resseq 2 will have the exact same value, lying between 0 and 1, and likewise for chain A and resseq 3. The sum of the occupancies of chain A and resseq 2 and chain A and resseq 3 will be 1.0, making it one constrained group.

● another constrained group contains three residues (numbers 42 and 121, and MAN); their occupancies will be refined similarly, as described above.

● the occupancies of the O atoms in residues 1 and 3 of chain B will not be refined (even though these atoms have partial occupancies in the input PDB file and so would normally be refined by default).

f' and f'' refinement

If the structure contains anomalous scatterers (e.g. Se in a SAD or MAD experiment), and if anomalous data are available, it is possible to refine the dispersive (f') and anomalous (f") scattering contributions (see e.g. Ethan Merritt's tutorial for more information). In phenix.refine, each group of scatterers with common f' and f" values is defined via an anomalous_scatterers scope, e.g.:

refinement.refine.anomalous_scatterers {

group {

selection = name BR

f_prime = 0

f_double_prime = 0

refine = *f_prime *f_double_prime

}

}

NOTE: The refinement of the f' and f" values is carried out only if group_anomalous is included under refine.strategy! Otherwise the values are simply used as specified but not refined. A refinement run with the parameters above, included in group_anomalous_1.params:

% phenix.refine model.pdb data_anom.hkl group_anomalous_1.params \

strategy=individual_sites+individual_adp+group_anomalous

If required, multiple scopes can be specified, one for each unique pair of f' and f" values. These values are assigned to all selected atoms (see below for atom selection details). Often it is possible to start the refinement from zero. If the refinement is not stable, it may be necessary to start from better estimates, or even to fix some values. For example (file group_anomalous_2.params):

refinement.refine.anomalous_scatterers {

group {

selection = name BR

f_prime = -5

f_double_prime = 2

refine = f_prime *f_double_prime

}

}

% phenix.refine model.pdb data_anom.hkl group_anomalous_2.params \

strategy=individual_sites+individual_adp+group_anomalous

Here f' is fixed at -5 (note the missing * in front of f_prime in the refine definition), and the refinement of f" is initialized at 2. The phenix.form_factor_query command is available for obtaining estimates of f' and f" given an element type and a wavelength, e.g.:

% phenix.form_factor_query element=Br wavelength=0.8

Information from Sasaki table about Br (Z = 35) at 0.8 A
fp: -1.0333
fdp: 2.9928


Run without arguments for usage information:

% phenix.form_factor_query

Using NCS restraints in refinement

phenix.refine can find NCS automatically or use NCS selections defined by the user. Gaps in the selected sequences are allowed: a sequence alignment is performed to detect insertions or deletions. We recommend checking the automatically detected or adjusted NCS groups.

1. Refinement with user-provided NCS selections. Create an ncs_groups.params file with the NCS selections:

refinement.ncs.restraint_group {

reference = chain A and resid 1:4

selection = chain B and resid 1:3

selection = chain C

}

refinement.ncs.restraint_group {

reference = chain E

selection = chain F

}

Specify ncs_groups.params as an additional input when running phenix.refine:

% phenix.refine data.hkl model.pdb ncs_groups.params main.ncs=True

This will perform the default refinement round (individual coordinates and B-factors) using NCS restraints on coordinates and B-factors. Note: user-specified NCS restraints in ncs_groups.params can be modified automatically if a better selection is found. To disable this potential automatic adjustment:

% phenix.refine data.hkl model.pdb ncs_groups.params main.ncs=True \

ncs.find_automatically=False

2. Automatic detection of NCS groups:

% phenix.refine data.hkl model.pdb main.ncs=True

This will perform the default refinement round (individual coordinates and B-factors) using NCS restraints created automatically based on the input PDB file.

Water picking

phenix.refine has a very efficient and fully automated protocol for water picking and refinement. One run of phenix.refine is normally enough to locate waters, refine them, select the good ones, add new ones and refine again, repeating the whole process multiple times. Normally, the default parameter settings are good for most cases:

% phenix.refine data.hkl model.pdb ordered_solvent=true

This will perform new water picking, analysis of existing waters and refinement of individual coordinates and B-factors for both macromolecule and waters. Several cycles will be performed, allowing spurious waters to be sorted out and well-placed ones to be refined. Water picking can be combined with all other protocols, such as simulated annealing, TLS refinement, etc. Some useful commands are:

1. Perform water picking every macro-cycle. By default, water picking starts after half of the macro-cycles are done:

% phenix.refine data.hkl model.pdb ordered_solvent=true \

ordered_solvent.mode=every_macro_cycle

2. Remove water only (based on specified criteria):

% phenix.refine data.hkl model.pdb ordered_solvent=true \

ordered_solvent.mode=filter_only

3. The following run illustrates the use of some important parameters:

% phenix.refine data.hkl model.pdb ordered_solvent=true solvent.params

where the parameter file solvent.params contains:

refinement {

ordered_solvent {

low_resolution = 2.8

b_iso_min = 1.0

b_iso_max = 50.0

b_iso = 25.0

primary_map_type = mFobs-DFmodel

primary_map_cutoff = 3.0

secondary_map_type = 2mFobs-DFmodel

}

peak_search {

map_next_to_model {

min_model_peak_dist = 1.8

max_model_peak_dist = 6.0

min_peak_peak_dist = 1.8

}

}

}

This will skip water picking if the resolution of the data is lower than 2.8 A. It will remove waters with B < 1.0 or B > 50.0 A**2, occupancy different from 1, or peak height in the mFobs-DFmodel map lower than 3 sigma. It will not select (or will remove existing) waters if the water-water or water-macromolecule distance is less than 1.8 A, or if the water-macromolecule distance is greater than 6.0 A. The initial occupancies and B-factors of newly placed waters will be 1.0 and 25.0, respectively. If b_iso = None, the mean atomic B-factor will be used instead.
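The acceptance criteria above can be summarized in a small sketch. This is illustrative Python, not phenix.refine code; the function name is hypothetical, and the defaults mirror the values in solvent.params:

```python
def keep_water(b_iso, occ, peak_height, model_dist, nearest_peak_dist,
               b_iso_min=1.0, b_iso_max=50.0, peak_cutoff=3.0,
               min_dist=1.8, max_dist=6.0):
    """Return True if a candidate water passes the filtering criteria."""
    if not (b_iso_min <= b_iso <= b_iso_max):
        return False   # B-factor outside the allowed range
    if occ != 1.0:
        return False   # occupancy different from 1
    if peak_height < peak_cutoff:
        return False   # mFobs-DFmodel peak too weak (in sigma)
    if model_dist < min_dist or model_dist > max_dist:
        return False   # too close to, or too far from, the macromolecule
    if nearest_peak_dist < min_dist:
        return False   # too close to another water/peak
    return True
```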

Hydrogens in refinement

phenix.refine offers two possibilities for handling hydrogen atoms:

● riding model;

● complete refinement of H (H atoms will be refined as other atoms in the model)

Although the contribution of hydrogen atoms to X-ray scattering is weak (at high resolution) or negligible (at lower resolutions), H atoms are still present in real structures irrespective of data quality. Including them as a riding model makes the other model atoms aware of their positions, preventing nonphysical (bad) contacts at no cost in refinable parameters (= no risk of overfitting). In X-ray refinement at subatomic resolution (approx. < 1.0 A) or in refinement using neutron data, the parameters of H atoms may be refined as for other, heavier atoms. Below are some useful commands:

1. To add hydrogens to a model one needs to run the Reduce program:

% phenix.reduce model.pdb > model_h_added.pdb

2. Once hydrogens are added to a model, by default they will be refined as a riding model:

% phenix.refine model.pdb data.hkl

It is possible to refine individual parameters for H atoms (if neutron data is used or at ultra-high resolution):

% phenix.refine model.pdb data.hkl hydrogens.refine=individual

3. To refine individual coordinates and ADP of H atoms:

% phenix.refine model.pdb data.hkl hydrogens.refine=individual

4. To remove hydrogens from a model:

% phenix.pdbtools model.pdb remove="element H"

We strongly recommend not removing hydrogen atoms after refinement, since doing so makes the refinement statistics (R-factors, etc.) unreproducible without repeating exactly the same refinement protocol.

5. Normally, phenix.reduce is used to add hydrogens. However, it may happen that phenix.reduce fails to add H to certain ligands. In this case phenix.elbow can be used to add hydrogens:

% phenix.elbow --final-geometry=model.pdb --residue=MAN --output=model_h

An output PDB file called model_h.pdb will contain the original ligand MAN with all hydrogen atoms added.

Refinement using twinned data

phenix.refine can handle refinement against hemihedrally twinned data (two twin domains). Least-squares twin refinement can be carried out using the following command line:

% phenix.refine data.hkl model.pdb twin_law="-k,-h,-l"

The twin law (in this case -k,-h,-l) can be obtained from phenix.xtriage. If more than one twin law is possible for the given unit cell and space group, phenix.twin_map_utils might give clues as to which twin law is the most likely candidate for refinement. Correcting maps for anisotropy might be useful:

% phenix.refine data.hkl model.pdb twin_law="-k,-h,-l" \

detwin.map_types.aniso_correct=true

The detwinning mode is auto by default: it performs algebraic detwinning for twin fractions below 40%, and detwinning using proportionality rules (SHELXL style) for fractions above 40%. An important point to stress is that phenix.refine will only deal properly with twinning that involves two twin domains.
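For reference, algebraic detwinning of a twin-related pair of observed intensities with twin fraction alpha can be sketched as follows. This is an illustrative Python sketch, not phenix.refine code, and it makes clear why the algebraic approach breaks down as alpha approaches 0.5 (hence the switch to proportionality rules near 40% mentioned above):

```python
def detwin_pair(i_obs1, i_obs2, alpha):
    """Recover true intensities (J1, J2) from a twin-related pair of
    observed intensities, assuming two twin domains:
        I_obs1 = (1 - alpha) * J1 + alpha * J2
        I_obs2 = alpha * J1 + (1 - alpha) * J2
    The 1 / (1 - 2*alpha) factor diverges as alpha -> 0.5, so the
    algebraic solution is only stable for twin fractions well below 50%."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("algebraic detwinning requires 0 <= alpha < 0.5")
    d = 1.0 - 2.0 * alpha
    j1 = ((1.0 - alpha) * i_obs1 - alpha * i_obs2) / d
    j2 = ((1.0 - alpha) * i_obs2 - alpha * i_obs1) / d
    return j1, j2
```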

Neutron and joint X-ray and neutron refinement

Refinement using neutron data requires H and/or D atoms to be added to the model. Use the Reduce program to add all potential H atoms:

% phenix.reduce model.pdb > model_h.pdb

Currently, adding D atoms requires editing the model_h.pdb file to replace H with D where necessary.

1. Running refinement with neutron data only:

% phenix.refine data.hkl model.pdb main.scattering_table=neutron

This tells phenix.refine that the data in the data.hkl file come from a neutron scattering experiment, and the appropriate scattering factors will be used in all calculations. All the examples and phenix.refine functionality presented in this document are valid and compatible with neutron data.

2. Using X-ray and neutron data simultaneously (joint X/N refinement). phenix.refine allows simultaneous use of both data sets, X-ray and neutron. The data sets are allowed to have different numbers of reflections and to be collected at different resolutions. The only requirement (not enforced by the program, but the user's responsibility) is that both data sets must be collected at the same temperature from the same crystals (or crystals grown in identical conditions, having identical space groups and unit cell parameters).

% phenix.refine model.pdb data_xray.hkl neutron_data.file_name=data_neutron.hkl \

input.xray_data.labels=FOBSx input.neutron_data.labels=FOBSn

Optimizing target weights

phenix.refine uses an automatic procedure to determine the weights between the X-ray target and the stereochemistry or ADP restraints. To optimize these weights (that is, to find those resulting in the lowest R-free factor):

% phenix.refine data.hkl model.pdb optimize_wxc=true optimize_wxu=true

where optimize_wxc turns on optimization of the X-ray/stereochemistry weight and optimize_wxu turns on optimization of the X-ray/ADP weight. Note that this can be very slow, since the procedure involves a grid search over an array of candidate weights. It can be a good idea to run this overnight for a final model tune-up.

Refinement at high resolution (higher than approx. 1.0 Angstrom)

Guidelines for structure refinement at high resolution:

● make sure the model contains hydrogen atoms. If not, phenix.reduce can be used to add them:

% phenix.reduce model.pdb > model_h.pdb

By default, phenix.refine refines the positions of H atoms as a riding model (each H atom exactly follows the atom it is attached to). Note that phenix.refine can also refine individual coordinates of H atoms (useful for small molecules at ultra-high resolution or for refinement against neutron data). This is governed by the hydrogens.refine = individual *riding keyword, and the default is the riding model. Hydrogens' B-factor refinement is controlled analogously (the default is to refine one group B for all H atoms); at high resolution one should definitely try the one_b_per_molecule or even the individual choice (resolution permitting). A similar strategy applies to refinement of H occupancies (hydrogens.refine_occupancies keyword).

● most of the atoms should be refined with anisotropic ADP. Exceptions could be parts of the model with high B-factors, atoms in alternative conformations, hydrogens and solvent molecules. However, at resolutions higher than 1.0 A it is worth trying to refine the solvent with anisotropic ADP.

● it is a good idea to constantly monitor the existing solvent molecules and check for new ones by using the ordered_solvent=true keyword. If it is decided to refine waters with anisotropic ADP, make sure that the newly added ones are also anisotropic; use ordered_solvent.new_solvent=anisotropic (default is isotropic). One can also ask phenix.refine to refine water occupancies: ordered_solvent.refine_occupancies=true (default is False).

● at high resolution alternative conformations can be visible for more than 20% of residues. phenix.refine automatically recognizes atoms in alternative conformations (based on PDB records) and by default performs constrained refinement of occupancies for these atoms. Please note that phenix.refine does not build or create fragments in alternative conformations; atoms in alternative conformations must be properly defined in the input PDB file (using conformer identifiers), if actually found in the structure.

● the default weights for stereochemical and ADP restraints are most likely too tight at this resolution, so the corresponding values probably need to be relaxed. Use wxc_scale and wxu_scale for this; lower values, like 1/2, 1/3, 1/4, etc. of the defaults should be tried. phenix.refine can optimize these values automatically (optimize_wxc=True and optimize_wxu=True), but this is a very slow task, so consider it for an overnight run or even longer. At ultra-high resolution (approx. 0.8 A or higher) a complete unrestrained refinement should definitely be tried for well-ordered parts of the model (single conformations, low B-factors).

● at ultra-high resolution the residual maps show the electron density redistribution due to bond formation as density peaks on interatomic bonds. phenix.refine has specific tools to model this density, called IAS models (Afonine et al., Acta Cryst. (2007). D63, 1194-1197).

This example illustrates most of the above points:

% phenix.refine model_h.pdb data.hkl high_res.params

where the file high_res.params contains the following lines (for more parameters under each scope, see the complete list of parameters):

refinement.main {

number_of_macro_cycles = 5

ordered_solvent=true

}

refinement.refine {

adp {

individual {

isotropic = element H

anisotropic = not element H

}

}

}

refinement.target_weights {

wxc_scale = 0.25

wxu_scale = 0.3

}

refinement {

ordered_solvent {

mode = auto filter_only *every_macro_cycle

new_solvent = isotropic *anisotropic

refine_occupancies = True

}

}

In the example above phenix.refine will perform 5 macro-cycles with ordered solvent update (add/remove) every macro-cycle; all atoms including newly added waters will be refined with anisotropic B-factors (except hydrogens); the riding model will be used for positional refinement of H atoms; one occupancy and one isotropic B-factor will be refined for all hydrogens within a residue; occupancies of waters will be refined as well; and the default stereochemistry and ADP restraint weights are scaled down by factors of 0.25 and 0.3, respectively. If the starting model is far from the "final" one, more macro-cycles may be required (than the 5 used in this example).

Examples of frequently used refinement protocols, common problems

1. Starting refinement from high R-factors:

% phenix.refine data.hkl model.pdb ordered_solvent=true main.number_of_macro_cycles=10 \

simulated_annealing=true strategy=rigid_body+individual_sites+individual_adp

Depending on data resolution, refinement of individual ADP may be replaced with grouped B refinement:

% phenix.refine data.hkl model.pdb ordered_solvent=true simulated_annealing=true \

strategy=rigid_body+individual_sites+group_adp main.number_of_macro_cycles=10

Adding TLS refinement may be a good idea. Note that, unlike other programs, phenix.refine does not require a "good model" for TLS refinement; TLS refinement is always stable in phenix.refine (please report if you notice otherwise):

% phenix.refine data.hkl model.pdb ordered_solvent=true simulated_annealing=true \

strategy=rigid_body+individual_sites+individual_adp+tls main.number_of_macro_cycles=10

If NCS is present, one can use it:

% phenix.refine data.hkl model.pdb ordered_solvent=true simulated_annealing=true \

strategy=rigid_body+individual_sites+individual_adp+tls main.ncs=true \

main.number_of_macro_cycles=10 tls_group_selections.params \

rigid_body_selections.params

where tls_group_selections.params and rigid_body_selections.params are files with the TLS and rigid body group selections; NCS will be determined automatically from the input PDB file. See this document for details on how to specify these selections. Note: in the four examples above we re-defined the default number of refinement macro-cycles from 3 to 10, since a starting model with high R-factors most likely requires more cycles to become a good one. Also in these examples, rigid body refinement will be run only once, at the first macro-cycle; water picking will start after half of the macro-cycles are done (after the 5th); and SA will be done only twice, at the first and before the last macro-cycle. Even though it is requested, water picking may not be performed if the resolution is too low. All these default behaviors can be changed: see the parameter help for more details. The last command is too long to conveniently type on the command line. See this document for an example of how to shorten it to:

% phenix.refine data.hkl model.pdb custom_par_1.params

2. Refinement at "higher than medium" resolution: going anisotropic.


When refining at higher resolution one may consider the following:

● At resolutions around 1.8-1.7 A or higher it is a good idea to try refinement of anisotropic ADP for atoms in well-ordered parts of the model. Well-ordered parts can be identified by relatively small isotropic B-factors, ~5-20 A**2 or so.

● The riding model for H atoms should be used.

● Loosening the stereochemistry and ADP restraints.

● Re-think using NCS (if present): there may turn out to be enough data to drop the NCS restraints. Try both, with and without NCS, and decide the strategy based on the R-free values.

Assuming H atoms have been added to the model, below is an example of what one may want to do at higher resolution:

% phenix.refine data.hkl model.pdb adp.individual.anisotropic="resid 1-2 and not element H" \

adp.individual.isotropic="not (resid 1-2 and not element H)" wxc_scale=2 wxu_scale=2

In the command above phenix.refine will refine the ADPs of atoms in residues 1 to 2 as anisotropic; the rest (including all H atoms) will be isotropic; and the X-ray target contribution is increased for both coordinate and ADP refinement. IMPORTANT: note the selection used in the command above: when selecting atoms in residues 1 and 2 to be refined as anisotropic, one needs to exclude hydrogens, which should be refined as isotropic.

3. Stereochemistry looks too tightly or loosely restrained, or the gap between R-free and R-work seems too big: adjusting the restraint contributions. Although the automatic calculation of the weight between the X-ray and stereochemistry or ADP restraint targets is good for most cases, it may happen that rmsd deviations from ideal bond lengths or angles look too tight or too loose (depending on resolution), or that the difference between R-work and R-free is too big (significantly bigger than approx. 5%). In such cases one should definitely try loosening or tightening the restraints. Here is how, for coordinate refinement:

% phenix.refine data.hkl model.pdb wxc_scale=5

The default value for wxc_scale is 0.5. Increasing wxc_scale will make the X-ray target contribution greater and the restraints looser. Note: wxc_scale=0 will completely exclude the experimental data from the refinement, resulting in idealization of the stereochemistry. For stereochemistry idealization use the separate command:

% phenix.geometry_minimization model.pdb

To see the options type:

% phenix.geometry_minimization --help

To play with ADP restraints contribution:

% phenix.refine data.hkl model.pdb wxu_scale=3

The default value for wxu_scale is 1.0. Increasing wxu_scale will make the X-ray target contribution greater and therefore the B-factor restraints weaker. Also, one can completely ignore the automatically determined weights (for both coordinate and ADP refinement) and use specific values instead:

% phenix.refine data.hkl model.pdb fix_wxc=15.0

The refinement target will be: Etotal = 15.0 * Exray + Egeom

Similarly for ADP refinement:

% phenix.refine data.hkl model.pdb fix_wxu=25.0

The refinement target will be: Etotal = 25.0 * Exray + Eadp

4. An item in the PDB file unknown to phenix.refine (novel ligand, etc.). phenix.refine uses the CCP4 Monomer Library as the source of stereochemical information for building geometry restraints and reporting statistics. If phenix.refine is unable to match an item in the input PDB file against the Monomer Library, it stops with a "Sorry" message explaining what to do and listing the problem atoms. If this happens, it is necessary to obtain a cif file (a parameter file describing the unknown molecule), either by making it manually or by having the eLBOW program generate it:

% phenix.elbow model.pdb --do-all --output=all_ligands

This asks eLBOW to inspect the model.pdb file, find all unknown items in it and create one cif file for them, all_ligands.cif. Alternatively, one can specify a three-letter name for the unknown residue:

% phenix.elbow model.pdb --residue=MAN --output=man

Once the cif file is created, the new run of phenix.refine will be:

% phenix.refine model.pdb data.hkl man.cif

Consult eLBOW documentation for more details.

Useful options

Changing the number of refinement cycles and minimizer iterations

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=5 \

main.max_number_of_iterations=20

Creating R-free flags (if not present in the input reflection files)

% phenix.refine data.hkl model.pdb xray_data.r_free_flags.generate=True

It is important to understand that reflections selected for the test set must never be used in any refinement of any parameters. If the newly selected test reflections were previously used in refinement, the corresponding R-free statistics will be wrong; in that case a "refinement memory" removal procedure must be applied to recover proper statistics. To change the default maximum number of test flags to be generated and the fraction:

% phenix.refine data.hkl model.pdb xray_data.r_free_flags.generate=True \

xray_data.r_free_flags.fraction=0.05 xray_data.r_free_flags.max_free=500
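The flag-generation logic (a random fraction of reflections, capped at a maximum count) can be sketched as follows. This is an illustrative Python sketch, not phenix.refine code; the function name is hypothetical, and the seed argument exists only to make the sketch reproducible:

```python
import random

def generate_r_free_flags(n_reflections, fraction=0.05, max_free=500, seed=0):
    """Randomly flag a test set: `fraction` of the reflections,
    but never more than `max_free` of them."""
    n_free = min(int(n_reflections * fraction), max_free)
    rng = random.Random(seed)
    free = set(rng.sample(range(n_reflections), n_free))
    # True = test set (excluded from refinement), False = working set.
    return [i in free for i in range(n_reflections)]
```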

Specify the name for output files

% phenix.refine data.hkl model.pdb output.prefix=lysozyme

Reflection output

At the end of refinement a file with Fobs, Fmodel, Fcalc, Fmask, FOM and R-free flags can be written out (in MTZ format):

% phenix.refine data.hkl model.pdb export_final_f_model=mtz

To output the reflections in CNS reflection file format:

% phenix.refine data.hkl model.pdb export_final_f_model=cns

Note: Fmodel is the total model structure factor including all scales:

Fmodel = scale_k1 * exp(-h*U_overall*ht) * (Fcalc + k_sol * exp(-B_sol*s^2) * Fmask)
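Taking the formula above at face value, a per-reflection scalar sketch looks as follows. This is illustrative Python, not phenix.refine code; the anisotropic factor exp(-h*U_overall*ht) is set to 1 here, and the function name is hypothetical:

```python
import math

def f_model(f_calc, f_mask, k_sol, b_sol, s_sq, scale_k1=1.0):
    """Total model structure factor for one reflection, following
    Fmodel = scale_k1 * (Fcalc + k_sol * exp(-B_sol * s^2) * Fmask),
    i.e. the printed formula with the anisotropic term omitted.
    f_calc and f_mask are complex structure factors; s_sq is s^2."""
    bulk = k_sol * math.exp(-b_sol * s_sq) * f_mask
    return scale_k1 * (f_calc + bulk)
```

The bulk-solvent term damps the mask contribution with resolution: at s^2 = 0 the full k_sol * Fmask is added, while at higher resolution the exponential suppresses it.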

Setting the resolution range for the refinement

% phenix.refine data.hkl model.pdb xray_data.low_resolution=15.0 xray_data.high_resolution=2.0

Bulk solvent correction and anisotropic scaling

By default phenix.refine always starts with bulk solvent modeling and anisotropic scaling. Here is a list of commands that may be of use in some cases:

1. Perform bulk-solvent modeling and anisotropic scaling only:

% phenix.refine data.hkl model.pdb strategy=none

2. Bulk-solvent modeling only (no anisotropic scaling):

% phenix.refine data.hkl model.pdb strategy=none bulk_solvent_and_scale.anisotropic_scaling=false

3. Anisotropic scaling only (no bulk-solvent modeling):

% phenix.refine data.hkl model.pdb strategy=none bulk_solvent_and_scale.bulk_solvent=false

4. Turn off bulk-solvent modeling and anisotropic scaling:

% phenix.refine data.hkl model.pdb main.bulk_solvent_and_scale=false

5. Fixing bulk-solvent and anisotropic scale parameters to user defined values:

% phenix.refine data.hkl model.pdb bulk_solvent_and_scale.params

where bulk_solvent_and_scale.params is the file containing these lines:

refinement {

bulk_solvent_and_scale {

k_sol_b_sol_grid_search = False

minimization_k_sol_b_sol = False

minimization_b_cart = False

fix_k_sol = 0.45

fix_b_sol = 56.0

fix_b_cart {

b11 = 1.2

b22 = 2.3

b33 = 3.6

b12 = 0.0

b13 = 0.0

b23 = 0.0

}

}

}

6. Mask parameters: bulk solvent modeling involves a mask calculation. There are three principal parameters controlling it: solvent_radius, shrink_truncation_radius and grid_step_factor. Normally these parameters do not need to be changed, but they can be:

% phenix.refine data.hkl model.pdb refinement.mask.solvent_radius=1.0 \

refinement.mask.shrink_truncation_radius=1.0 refinement.mask.grid_step_factor=3

To gain a further drop in R-factors (somewhere between 0.0 and 1.0%), it is possible to run the fairly time-consuming (depending on structure size and resolution) procedure of mask parameter optimization:

% phenix.refine data.hkl model.pdb optimize_mask=true

This will perform the grid search for solvent_radius and shrink_truncation_radius and select the values giving the best R-factor.

By default phenix.refine adds the isotropic component of the overall anisotropic scale matrix to the atomic B-factors, leaving the trace of the overall anisotropic scale matrix equal to zero. This is why the ADPs can change even though only anisotropic scaling was done and no ADP refinement was performed.

Default refinement with user specified X-ray target function


1. Refinement with least-squares target:

% phenix.refine data.hkl model.pdb main.target=ls

2. Refinement with maximum-likelihood target (default):

% phenix.refine data.hkl model.pdb main.target=ml

3. Refinement with phased maximum-likelihood target:

% phenix.refine data.hkl model.pdb main.target=mlhl

If phenix.refine finds Hendrickson-Lattman coefficients in the input reflection file, it will automatically switch to the mlhl target. To disable this:

% phenix.refine data.hkl model.pdb main.use_experimental_phases=false

Modifying the initial model before refinement starts

phenix.refine offers several options to modify the input model before refinement starts:

1. shaking of coordinates (adding a random shift to coordinates):

% phenix.refine data.hkl model.pdb sites.shake=0.3

2. rotation-translation shift of coordinates:

% phenix.refine data.hkl model.pdb sites.rotate="1 2 3" sites.translate="4 5 6"

3. shaking of occupancies:

% phenix.refine data.hkl model.pdb occupancies.randomize=true

4. shaking of ADP:

% phenix.refine data.hkl model.pdb adp.randomize=true

5. shifting of ADP (adding a constant value):

% phenix.refine data.hkl model.pdb adp.shift_b_iso=10.0

6. scaling of ADP (multiplying by a constant value):

% phenix.refine data.hkl model.pdb adp.scale_adp=0.5

7. setting a value to ADP:

% phenix.refine data.hkl model.pdb adp.set_b_iso=25

8. converting to isotropic:

% phenix.refine data.hkl model.pdb adp.convert_to_isotropic=true

9. converting to anisotropic:

% phenix.refine data.hkl model.pdb adp.convert_to_anisotropic=true \

modify_start_model.selection="not element H"

When converting atoms to anisotropic, it is important to make sure that hydrogens (if present in the model) are not converted.


By default, the specified manipulations will be applied to all atoms. However, it is possible to apply them only to selected atoms:

% phenix.refine data.hkl model.pdb adp.set_b_iso=25 modify_start_model.selection="chain A"

To write out the modified model (without any refinement), add main.number_of_macro_cycles=0, e.g.:

% phenix.refine data.hkl model.pdb adp.set_b_iso=25 \

main.number_of_macro_cycles=0
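The ADP manipulations above amount to simple arithmetic on the isotropic B-factors of the selected atoms. The sketch below (a hypothetical helper, not the phenix.pdbtools API) illustrates shift, scale, set and selection:

```python
def modify_b_factors(b_isos, selection=None, shift=0.0, scale=1.0, set_value=None):
    """Apply adp.shift_b_iso / adp.scale_adp / adp.set_b_iso style edits
    to the atoms flagged in 'selection' (all atoms when None)."""
    if selection is None:
        selection = [True] * len(b_isos)
    out = []
    for b, selected in zip(b_isos, selection):
        if selected:
            if set_value is not None:
                b = set_value        # adp.set_b_iso=25
            b = b * scale + shift    # adp.scale_adp=0.5, adp.shift_b_iso=10.0
        out.append(b)
    return out

print(modify_b_factors([20.0, 30.0], shift=10.0))              # → [30.0, 40.0]
print(modify_b_factors([20.0, 30.0], selection=[True, False],
                       set_value=25.0))                        # → [25.0, 30.0]
```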

All the commands listed above, plus some more, are available from the phenix.pdbtools utility, which is in fact used internally by phenix.refine to perform these manipulations. For more information on phenix.pdbtools type:

% phenix.pdbtools --help

Documentation on phenix.pdbtools is also available.

Refinement using FFT or direct structure factor calculation algorithm

% phenix.refine data.hkl model.pdb \

structure_factors_and_gradients_accuracy.algorithm=fft

or:

% phenix.refine data.hkl model.pdb \

structure_factors_and_gradients_accuracy.algorithm=direct

Ignoring test (free) flags in refinement

Sometimes one needs to use all reflections ("work" and "test") in the refinement; for example, at very low resolution, where every single reflection counts, or at subatomic resolution, where the risk of overfitting is very low. In the example below all reflections are used in the refinement:

% phenix.refine data.hkl model.pdb xray_data.r_free_flags.ignore_r_free_flags=true

Note: 1) the corresponding statistics (R-factors, ...) will be identical for the "work" and "test" sets; 2) it is still necessary to have test flags present in the input reflection file (or generated automatically by phenix.refine).

Using phenix.refine to calculate structure factors

The total structure factor used by phenix.refine in nearly all calculations is defined as:

Fmodel = scale_k1 * exp(-h*U_overall*ht) * (Fcalc + k_sol * exp(-B_sol*s^2) * Fmask)

1. Calculate Fcalc from atomic model and output in MTZ file (no solvent modeling or scaling):

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=0 \

main.bulk_solvent_and_scale=false export_final_f_model=mtz

2. Calculate Fcalc from atomic model including bulk solvent and all scales:

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=1 \

strategy=none export_final_f_model=mtz

3. To output CNS/Xplor formatted reflection file:

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=1 \

strategy=none export_final_f_model=cns

4. Resolution limits can be applied:

% phenix.refine data.hkl model.pdb main.number_of_macro_cycles=1 \

strategy=none xray_data.low_resolution=15.0 xray_data.high_resolution=2.0

Note:

The number of calculated structure factors will be the same as the number of observed data (Fobs) provided in the input reflection files, or smaller, since resolution and sigma cutoffs may be applied to Fobs, or some Fobs may be automatically removed by the outlier detection procedure.

The set of calculated structure factors has the same completeness as the set of provided Fobs.
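For a single reflection, the Fmodel expression above can be evaluated directly. In this sketch (not PHENIX internals) the anisotropic factor exp(-h*U_overall*ht) is passed in as a precomputed scalar, and s stands for the reciprocal-space coordinate used in the bulk-solvent exponent:

```python
import math

def f_model(f_calc, f_mask, s, scale_k1=1.0, aniso=1.0, k_sol=0.35, b_sol=45.0):
    """Fmodel = scale_k1 * aniso * (Fcalc + k_sol * exp(-B_sol*s^2) * Fmask),
    following the definition in the text; f_calc and f_mask are complex."""
    return scale_k1 * aniso * (f_calc + k_sol * math.exp(-b_sol * s * s) * f_mask)

# with no bulk solvent (k_sol = 0) and unit scales, Fmodel reduces to Fcalc:
print(f_model(complex(10.0, 5.0), complex(2.0, -1.0), s=0.1, k_sol=0.0))  # → (10+5j)
# with the default bulk-solvent parameters the mask term contributes:
print(round(abs(f_model(complex(10.0, 5.0), complex(2.0, -1.0), s=0.1)), 3))  # → 11.487
```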

Scattering factors

There are four choices for the scattering table to be used in phenix.refine:

● wk1995: Waasmaier & Kirfel table;

● it1992: International Crystallographic Tables (1992)

● n_gaussian: dynamic n-gaussian approximation

● neutron: table for neutron scattering

The default is n_gaussian. To switch to a different table:

% phenix.refine data.hkl model.pdb main.scattering_table=neutron

Suppressing the output of certain files

The following command tells phenix.refine not to write the .eff, .geo and .def files, maps, or map-coefficient files:

% phenix.refine data.hkl model.pdb write_eff_file=false write_geo_file=false \

write_def_file=false write_maps=false write_map_coefficients=false

The only output will be the .log and .pdb files.

Random seed

To change random seed:

% phenix.refine data.hkl model.pdb main.random_seed=7112384

The results of certain refinement protocols, such as restrained refinement of coordinates (with SA or LBFGS minimization), are sensitive to the random seed. This is because: 1) for SA the refinement starts with a random assignment of velocities to atoms; 2) the X-ray/geometry target weight calculation involves shaking the model with some Cartesian dynamics. As a result, running such refinement jobs with exactly the same parameters but different random seeds will produce different refinement statistics. The author's experience includes a case where the difference in R-factors between two SA runs was about 2.0%. This also opens the possibility of performing multi-start SA refinement to create an ensemble of models that on average differ only slightly but sometimes contain significant variations in certain parts.

Electron density maps

By default phenix.refine outputs two likelihood-weighted maps: 2mFo-DFc and mFo-DFc. The user can also choose between likelihood-weighted and regular maps with any specified coefficients, for example: 2mFo-DFc, 2.7mFo-1.3DFc, Fo-Fc, 3Fo-2Fc. The maps are output in ASCII X-PLOR format. A reflection file with map coefficients is also generated for use in Coot or XtalView. The example below illustrates the main options:

% phenix.refine data.hkl model.pdb map.params

where map.params contains:

refinement {

electron_density_maps {

map {

mtz_label_amplitudes = 2FOFCWT

mtz_label_phases = PH2FOFCWT

likelihood_weighted = True

obs_factor = 2

calc_factor = 1

}

map {

mtz_label_amplitudes = FOFCWT

mtz_label_phases = PHFOFCWT

likelihood_weighted = True

obs_factor = 1

calc_factor = 1

}

map {

mtz_label_amplitudes = 3FO2FCWT

mtz_label_phases = PH3FO2FCWT

likelihood_weighted = False

obs_factor = 3

calc_factor = 2

}

grid_resolution_factor = 1/4.

region = *selection cell

atom_selection = name CA or name N or name C

apply_sigma_scaling = False

apply_volume_scaling = True

}

}

This will output three map files containing the mFo-DFc, 2mFo-DFc and 3Fo-2Fc maps. All maps are on an absolute scale (in e/A**3). The map grid spacing will be (data resolution)*grid_resolution_factor, and the map will be output around the main-chain atoms. If atom_selection is set to None or all, the map will be computed for all atoms. The corresponding MTZ file will also contain the map coefficients for these three maps.
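The obs_factor/calc_factor parameters define the linear combination used for the map amplitudes. Below is a hypothetical sketch (not PHENIX internals) for one reflection: for a likelihood-weighted map the observed and calculated terms carry figure-of-merit style weights m and D (values here are invented for illustration), and the phase is taken from the model:

```python
import cmath

def map_coefficient(f_obs, f_model, obs_factor, calc_factor,
                    likelihood_weighted=False, m=1.0, d=1.0):
    if not likelihood_weighted:
        m = d = 1.0                                # regular map: no weighting
    amp = obs_factor * m * f_obs - calc_factor * d * abs(f_model)
    return cmath.rect(amp, cmath.phase(f_model))   # amplitude + model phase

# a 2mFo-DFc style coefficient (m and d assumed for illustration):
c = map_coefficient(100.0, cmath.rect(90.0, 0.5), obs_factor=2, calc_factor=1,
                    likelihood_weighted=True, m=0.8, d=0.9)
print(round(abs(c), 1))  # → 79.0
```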

Refining with anomalous data (or what phenix.refine does with Fobs+ and Fobs-).

The way phenix.refine uses Fobs+ and Fobs- is controlled by the xray_data.force_anomalous_flag_to_be_equal_to parameter. There are three possibilities:

1. Default behavior: phenix.refine will use all Fobs: Fobs+ and Fobs- as independent reflections:

% phenix.refine model.pdb data_anom.hkl

2. phenix.refine will generate missing Bijvoet mates and use all Fobs+ and Fobs- as independent reflections if:

% phenix.refine model.pdb data_anom.hkl xray_data.force_anomalous_flag_to_be_equal_to=true

3. phenix.refine will merge Fobs+ and Fobs-; that is, instead of two separate values it will use one value F_mean = (Fobs+ + Fobs-)/2 if:

% phenix.refine model.pdb data_anom.hkl xray_data.force_anomalous_flag_to_be_equal_to=false
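The behaviors described above can be summarized with a small sketch (hypothetical helpers, not PHENIX code) operating on one Bijvoet pair, where None marks a missing mate:

```python
def generate_missing_mate(f_plus, f_minus):
    """force_anomalous_flag_to_be_equal_to=true: fill a missing Bijvoet
    mate from its partner, then keep F+ and F- as independent data."""
    if f_plus is None:
        f_plus = f_minus
    if f_minus is None:
        f_minus = f_plus
    return f_plus, f_minus

def merge_mates(f_plus, f_minus):
    """force_anomalous_flag_to_be_equal_to=false: one mean amplitude."""
    return (f_plus + f_minus) / 2.0

print(generate_missing_mate(102.0, None))  # → (102.0, 102.0)
print(merge_mates(102.0, 98.0))            # → 100.0
```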

See this documentation for how to use and refine f' and f''.

Rejecting reflections by sigma

Reflections can be rejected by a sigma cutoff criterion applied to amplitudes, Fobs <= sigma_fobs_rejection_criterion * sigma(Fobs):

% phenix.refine model.pdb data_anom.hkl xray_data.sigma_fobs_rejection_criterion=2

and/or to intensities, Iobs <= sigma_iobs_rejection_criterion * sigma(Iobs):

% phenix.refine model.pdb data_anom.hkl xray_data.sigma_iobs_rejection_criterion=2

Internally, phenix.refine uses amplitudes. If both sigma_fobs_rejection_criterion and sigma_iobs_rejection_criterion are given non-zero values, both criteria are applied: first to Iobs, then to Fobs (after the truncated Iobs have been converted to Fobs):

% phenix.refine model.pdb data_anom.hkl xray_data.sigma_fobs_rejection_criterion=2 \

xray_data.sigma_iobs_rejection_criterion=2

By default, both sigma_fobs_rejection_criterion and sigma_iobs_rejection_criterion are set to zero (no reflections rejected) and, unless strongly motivated, we encourage you not to change these values. If amplitudes are provided as input then sigma_fobs_rejection_criterion is ignored.
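The two-stage cutoff can be sketched as follows (illustrative only; the intensity-to-amplitude conversion here is a naive square root with first-order error propagation, not necessarily what a real program uses):

```python
import math

def sigma_filter(iobs, sig_iobs, n_i=2.0, n_f=2.0):
    """Reject Iobs <= n_i*sigma(Iobs); convert survivors to amplitudes
    (naively, F = sqrt(I), sigma(F) = sigma(I)/(2F)); then reject
    Fobs <= n_f*sigma(Fobs)."""
    kept = [(i, s) for i, s in zip(iobs, sig_iobs) if i > n_i * s]
    fobs = [(math.sqrt(i), s / (2.0 * math.sqrt(i))) for i, s in kept]
    return [(f, sf) for f, sf in fobs if f > n_f * sf]

# the weak intensity (4.0 with sigma 3.0) fails the Iobs test:
survivors = sigma_filter([100.0, 4.0, 25.0], [5.0, 3.0, 1.0])
print([round(f, 1) for f, _ in survivors])  # → [10.0, 5.0]
```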

Developer's tools

phenix.refine offers broad functionality for experimentation that may not be useful in everyday practice but is handy for testing ideas.

Substitute input Fobs with calculated Fcalc, shake the model and refine it

Instead of using the Fobs from the input data file, one can ask phenix.refine to use structure factors Fcalc calculated from the input model. Obviously, the R-factors will then be zero throughout the refinement. One can also shake various model parameters (see this document for details); the refinement will then start with poor statistics (large R-factors at least) and will hopefully converge back to the unmodified start model (if not shaken too strongly). It is also possible to simulate the flat bulk-solvent model contribution and anisotropic scaling:

% phenix.refine model.pdb data.hkl experiment.params

where experiment.params contains the following:

refinement {

main {

fake_f_obs = True

}

modify_start_model {

selection = "chain A"

sites {

shake = 0.5

}

}

fake_f_obs {

k_sol = 0.35

b_sol = 45.0

b_cart = 1.25 3.78 1.25 0.0 0.0 0.0

scale = 358.0

}

}

In this example, the input Fobs will be replaced by the corresponding calculated amplitudes |Fcalc|, the coordinates of the structure will then be shaken to achieve an rmsd of 0.5 and finally a default refinement run will be performed. The bulk-solvent, anisotropic scale and overall scalar scale are also applied to the Fcalc thus obtained, in accordance with the Fmodel definition (see this document for the definition of the total structure factor, Fmodel). Expected refinement behavior: the R-factors will drop from something large to zero.

CIF modifications and links

phenix.refine uses the CCP4 monomer library to build geometry restraints (bond, angle, dihedral, chirality and planarity restraints). The CCP4 monomer library comes with a set of "modifications" and "links" which are defined in the file mon_lib_list.cif. Some of these are used automatically when phenix.refine builds the geometry restraints (e.g. the peptide and RNA/DNA chain links). Other links and modifications have to be applied manually, e.g. (cif_modification.params file):

refinement.pdb_interpretation.apply_cif_modification {

data_mod = 5pho

residue_selection = resname GUA and name O5T

}

Here a custom 5pho modification is applied to all GUA residues with an O5T atom, i.e. the modification can be applied to multiple residues with a single apply_cif_modification block. The CIF modification itself is supplied as a separate file on the phenix.refine command line, e.g. (data_mod_5pho.cif file):

data_mod_5pho

# loop_

_chem_mod_atom.mod_id

_chem_mod_atom.function

_chem_mod_atom.atom_id

_chem_mod_atom.new_atom_id

_chem_mod_atom.new_type_symbol

_chem_mod_atom.new_type_energy

_chem_mod_atom.new_partial_charge

5pho add . O5T O OH .

loop_

_chem_mod_bond.mod_id

_chem_mod_bond.function

_chem_mod_bond.atom_id_1

_chem_mod_bond.atom_id_2

_chem_mod_bond.new_type

_chem_mod_bond.new_value_dist

_chem_mod_bond.new_value_dist_esd

5pho add O5T P coval 1.520 0.020

The whole command will be:

% phenix.refine model_o5t.pdb data.hkl data_mod_5pho.cif cif_modification.params

Similarly, a link can be applied like this (cif_link.params file):

refinement.pdb_interpretation.apply_cif_link {

data_link = MAN-THR

residue_selection_1 = chain X and resname MAN and resid 900

residue_selection_2 = chain X and resname THR and resid 42

}

% phenix.refine model.pdb data.hkl cif_link.params

The residue selections for links must select exactly one residue each. The MAN-THR link is pre-defined in mon_lib_list.cif. Custom links can be supplied as additional files on the phenix.refine command line.

See mon_lib_list.cif for examples. The full path to this file can be obtained with the command:

% phenix.where_mon_lib_list_cif

All apply_cif_modification and apply_cif_link definitions will be included in the .def files, i.e. it is not necessary to specify the definitions again if further refinement runs are started from .def files. Note that all LINK, SSBOND, HYDBND, SLTBRG and CISPEP records in the input PDB files are ignored.

Definition of custom bonds and angles

Most geometry restraints (bonds, angles, etc.) are generated automatically based on the CCP4 monomer library. Additional custom bond and angle restraints, e.g. between protein and a ligand or ion, can be specified in this way:

refinement.geometry_restraints.edits {

zn_selection = chain X and resname ZN and resid 200 and name ZN

his117_selection = chain X and resname HIS and resid 117 and name NE2

asp130_selection = chain X and resname ASP and resid 130 and name OD1

bond {

action = *add

atom_selection_1 = $zn_selection

atom_selection_2 = $his117_selection

distance_ideal = 2.1

sigma = 0.02

slack = None

}

bond {

action = *add

atom_selection_1 = $zn_selection

atom_selection_2 = $asp130_selection

distance_ideal = 2.1

sigma = 0.02

slack = None

}

angle {

action = *add

atom_selection_1 = $his117_selection

atom_selection_2 = $zn_selection

atom_selection_3 = $asp130_selection

angle_ideal = 109.47

sigma = 5

}

}

The atom selections must uniquely select a single atom. Save the geometry_restraints.edits to a file and specify the file name as an additional argument when running phenix.refine for the first time. For example:

% phenix.refine model.pdb data.hkl restraints_edits.params

The edits will be included in the .def files, i.e. it is not necessary to specify them again manually if further refinement runs are started from .def files. The bond.slack parameter above can be used to disable a bond restraint within the slack tolerance around distance_ideal. This is useful for hydrogen-bond restraints, or when refining with very high-resolution data (e.g. better than 1 A). The bond restraint is activated only if the discrepancy between the model bond distance and distance_ideal is greater than the slack value; the slack is subtracted from the discrepancy. The resulting potential is called a "square-well potential" by some authors. The formula for the contribution to the refinement target function is:

weight * delta_slack**2

with:

delta_slack = sign(delta) * max(0, (abs(delta) - slack))
delta = distance_ideal - distance_model
weight = 1 / sigma**2

The slack value must be greater than or equal to zero (it can also be None, which is equivalent to zero in this case).
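The square-well formula above translates directly into code; this sketch reproduces the target contribution for one bond:

```python
def bond_residual(distance_model, distance_ideal, sigma, slack=None):
    """weight * delta_slack**2, with the slack subtracted from the
    discrepancy; slack=None is treated as zero, as described above."""
    slack = 0.0 if slack is None else slack
    delta = distance_ideal - distance_model
    sign = 1.0 if delta >= 0 else -1.0
    delta_slack = sign * max(0.0, abs(delta) - slack)
    weight = 1.0 / sigma ** 2
    return weight * delta_slack ** 2

print(bond_residual(2.95, 3.0, sigma=0.1, slack=0.2))           # inside the well → 0.0
print(round(bond_residual(2.7, 3.0, sigma=0.1, slack=0.2), 6))  # → 1.0
```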

Atom selection examples

All atoms:
  all

All C-alpha atoms (not case sensitive):
  name ca

All atoms with ``H`` in the name (``*`` is a wildcard character):
  name *H*

Atom names with ``*`` (backslash disables wildcard function):
  name o2\*

Atom names with spaces:
  name 'O 1'

Atom names with primes don't necessarily have to be quoted:
  name o2'

Boolean ``and``, ``or`` and ``not``:
  resname ALA and (name ca or name c or name n or name o)
  chain a and not altid b
  resid 120 and icode c and model 2
  segid a and element c and charge 2+ and anisou

Residue 188:
  resseq 188

resid is a synonym for resseq:
  resid 188

Note that if there are several chains containing residue number 188, all of them will be selected. To select residue 188 in one particular chain only:
  chain A and resid 188

Residues 2 through 10 (including 2 and 10):
  resseq 2:10

"Smart" selections:
  resname ALA and backbone
  resname ALA and sidechain
  peptide backbone
  rna backbone or dna backbone
  water or nucleotide
  dna and not (phosphate or ribose)
  within(5, (nucleotide or peptide) backbone)

Depositing refined structure with PDB

phenix.refine reports comprehensive statistics in the PDB file header of the refined model. These statistics consist of two parts: the first (upper) part, formatted with REMARK records, is specific to the current refinement run and contains information about the input data and model files, a time stamp, start and final R-factors, refinement statistics from macro-cycle to macro-cycle, etc. The second (lower) part, formatted with REMARK 3 records, is abstracted from the particular refinement run (no intermediate statistics, time stamps or file names). This second part is what should be deposited with the PDB; the first part should be removed manually.

Referencing phenix.refine

Afonine, P.V., Grosse-Kunstleve, R.W. & Adams, P.D. (2005). CCP4 Newsl. 42, contribution 8.

Relevant reading

Below is the list of papers either published in connection with phenix.refine or used to implement specific features in phenix.refine:

1. Maximum-likelihood in structure refinement:

Lunin, V.Yu. & Skovoroda, T.P. Acta Cryst. (1995). A51, 880-887. "R-free likelihood-based estimates of errors for phases calculated from atomic models"

Pannu, N.S., Murshudov, G.N., Dodson, E.J. & Read, R.J. (1998). Acta Cryst. D54, 1285-1294. "Incorporation of Prior Phase Information Strengthens Maximum-Likelihood Structure Refinement"

Lunin, V.Y., Afonine, P.V. & Urzhumtsev, A.G. Acta Cryst. (2002). A58, 270-282. "Likelihood-based refinement. I. Irremovable model errors"

Afonine, P., Lunin, V.Y. & Urzhumtsev, A. J. Appl. Cryst. (2003). 36, 158-159. "MLMF: least-squares approximation of likelihood-based refinement criteria"

2. ADP:

V. Schomaker & K.N. Trueblood. Acta Cryst. (1968). B24, 63-76. "On the rigid-body motion of molecules in crystals"

F.L. Hirshfeld. Acta Cryst. (1976). A32, 239-244. "Can X-ray data distinguish bonding effects from vibrational smearing?"

T.R. Schneider. Proceedings of the CCP4 Study Weekend (E. Dodson, M. Moore, A. Ralph, and S. Bailey, eds.), SERC Daresbury Laboratory, Daresbury, U.K., pp. 133-144 (1996). "What can we Learn from Anisotropic Temperature Factors?"

M.D. Winn, M.N. Isupov & G.N. Murshudov. Acta Cryst. (2001). D57, 122-133. "Use of TLS parameters to model anisotropic displacements in macromolecular refinement"

R.W. Grosse-Kunstleve & P.D. Adams. J. Appl. Cryst. (2002). 35, 477-480. "On the handling of atomic anisotropic displacement parameters"

P. Afonine & A. Urzhumtsev. (2007). CCP4 Newsletter on Protein Crystallography. 45. Contribution 6. "On determination of T matrix in TLS modeling"

3. Rigid body refinement:

Afonine PV, Grosse-Kunstleve RW, Adams PD & Urzhumtsev AG. "Methods for optimal rigid body refinement of models with large displacements". (in preparation for Acta Cryst. D).

4. Bulk-solvent modeling and anisotropic scaling:

S. Sheriff & W.A. Hendrickson. Acta Cryst. (1987). A43, 118-121. "Description of overall anisotropy in diffraction from macromolecular crystals"

Jiang, J.-S. & Brünger, A.T. (1994). J. Mol. Biol. 243, 100-115. "Protein hydration observed by X-ray diffraction. Solvation properties of penicillopepsin and neuraminidase crystal structures."

A. Fokine & A. Urzhumtsev. Acta Cryst. (2002). D58, 1387-1392. "Flat bulk-solvent model: obtaining optimal parameters"

P.V. Afonine, R.W. Grosse-Kunstleve & P.D. Adams. Acta Cryst. (2005). D61, 850-855. "A robust bulk-solvent correction and anisotropic scaling procedure"

5. Refinement at subatomic resolution:

Afonine, P.V., Pichon-Pesme, V., Muzet, N., Jelsch, C., Lecomte, C. & Urzhumtsev, A. (2002). CCP4 Newsletter on Protein Crystallography. 41. "Modeling of bond electron density"

Afonine, P.V., Lunin, V., Muzet, N. & Urzhumtsev, A. (2004). Acta Cryst. D60, 260-274. "On the possibility of observation of valence electron density for individual bonds in proteins in conventional difference maps"

P.V. Afonine, R.W. Grosse-Kunstleve, P.D. Adams, V.Y. Lunin, A. Urzhumtsev. "On macromolecular refinement at subatomic resolution with interatomic scatterers" (submitted to Acta Cryst. D).

6. LBFGS minimization:

Liu, D.C. & Nocedal, J. (1989). Mathematical Programming, 45, 503-528. "On the limited memory BFGS method for large scale optimization"

7. Dynamics, simulated annealing:

Brünger, A.T., Kuriyan, J. & Karplus, M. (1987). Science. 235, 458-460. "Crystallographic R factor refinement by molecular dynamics"

Adams, P.D., Pannu, N.S., Read, R.J. & Brünger, A.T. (1997). Proc. Natl. Acad. Sci. 94, 5018-5023. "Cross-validated maximum likelihood enhances crystallographic simulated annealing refinement"

L.M. Rice, Y. Shamoo & A.T. Brünger. J. Appl. Cryst. (1998). 31, 798-805. "Phase Improvement by Multi-Start Simulated Annealing Refinement and Structure-Factor Averaging"

Brünger, A.T & Adams, P.D. (2002). Acc. Chem. Res. 35, 404-412. "Molecular dynamics applied to X-ray structure refinement"

8. Target weights calculation:

Brünger, A.T., Karplus, M. & Petsko, G.A. (1989). Acta Cryst. A45, 50-61. "Crystallographic refinement by simulated annealing: application to crambin"

Brünger, A.T. (1992). Nature (London), 355, 472-474. "The free R value: a novel statistical quantity for assessing the accuracy of crystal structures"

Adams, P.D., Pannu, N.S., Read, R.J. & Brünger, A.T. (1997). Proc. Natl. Acad. Sci. 94, 5018-5023. "Cross-validated maximum likelihood enhances crystallographic simulated annealing refinement"

9. Electron density maps (Fourier syntheses) calculation:

A.G. Urzhumtsev, T.P. Skovoroda & V.Y. Lunin. J. Appl. Cryst. (1996). 29, 741-744. "A procedure compatible with X-PLOR for the calculation of electron-density maps weighted using an R-free-likelihood approach"

10. Monomer Library:

Vagin, A.A., Steiner, R.A., Lebedev, A.A., Potterton, L., McNicholas, S., Long, F. & Murshudov, G.N. (2004). Acta Cryst. D60, 2184-2195. "REFMAC5 dictionary: organization of prior chemical knowledge and guidelines for its use"

11. Scattering factors:

D. Waasmaier & A. Kirfel. Acta Cryst. (1995). A51, 416-431. "New analytical scattering-factor functions for free atoms and ions"

International Tables for Crystallography (1992)

Neutron News, Vol. 3, No. 3, 1992, pp. 29-37. http://www.ncnr.nist.gov/resources/n-lengths/list.html

Grosse-Kunstleve, R.W., Sauter, N.K. & Adams, P.D. Newsletter of the IUCr Commission on Crystallographic Computing 2004, 3:22-31. "cctbx news"

12. Neutron and joint X-ray/neutron refinement:

A. Wlodawer & W.A. Hendrickson. Acta Cryst. (1982). A38, 239-247. "A procedure for joint refinement of macromolecular structures with X-ray and neutron diffraction data from single crystals"

A. Wlodawer, H. Savage & G. Dodson. Acta Cryst. (1989). B45, 99-107. "Structure of insulin: results of joint neutron and X-ray refinement"

13. Stereochemical restraints:

Grosse-Kunstleve, R.W., Afonine, P.V. & Adams, P.D. (2004). Newsletter of the IUCr Commission on Crystallographic Computing, 4, 19-36. "cctbx news: Geometry restraints and other new features"

14. Parameters parsing and interpretation:

Grosse-Kunstleve, R.W., Afonine, P.V., Sauter, N.K. & Adams, P.D. Newsletter of the IUCr Commission on Crystallographic Computing 2005, 5:69-91. "cctbx news: Phil and friends"

Feedback, more information

Send bug reports to: [email protected]

For help write to: [email protected]

Questions: [email protected]

More information: www.phenix-online.org or type:

% phenix.about

List of all refinement keywords

-------------------------------------------------------------------------------

Legend:
black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

-------------------------------------------------------------------------------

refinement

Scope of parameters for structure refinement with phenix.refine

crystal_symmetry

Scope of space group and unit cell parameters

unit_cell= None

space_group= None

input

Scope of input file names, labels, processing directions

symmetry_safety_check= *error warning Check for consistency of crystal

symmetry from model and data files

pdb

file_name= None Model file(s) name (PDB)

neutron_data

Scope of neutron data and neutron free-R flags

ignore_xn_free_r_mismatch= False

file_name= None

labels= None

high_resolution= None

low_resolution= None

outliers_rejection= True

sigma_fobs_rejection_criterion= 0.0

sigma_iobs_rejection_criterion= 0.0

ignore_all_zeros= True

force_anomalous_flag_to_be_equal_to= None

r_free_flags

file_name= None This is normally the same as the file containing

Fobs and is usually selected automatically.

label= None

test_flag_value= None This value is usually selected automatically

- do not change unless you really know what

you're doing!

disable_suitability_test= False

ignore_pdb_hexdigest= False If True, disables safety check based

on MD5 hexdigests stored in PDB files

produced by previous runs.

ignore_r_free_flags= False Use all reflections in refinement (work

and test)

generate= False Generate R-free flags (if not available in input

files)

fraction= 0.1

max_free= 2000

lattice_symmetry_max_delta= 5

use_lattice_symmetry= True

xray_data

Scope of X-ray data and free-R flags

file_name= None

labels= None

high_resolution= None

low_resolution= None

outliers_rejection= True

sigma_fobs_rejection_criterion= 0.0

sigma_iobs_rejection_criterion= 0.0

ignore_all_zeros= True

force_anomalous_flag_to_be_equal_to= None

r_free_flags

file_name= None This is normally the same as the file containing

Fobs and is usually selected automatically.

label= None

test_flag_value= None This value is usually selected automatically

- do not change unless you really know what

you're doing!

disable_suitability_test= False

ignore_pdb_hexdigest= False If True, disables safety check based

on MD5 hexdigests stored in PDB files

produced by previous runs.

ignore_r_free_flags= False Use all reflections in refinement (work

and test)

generate= False Generate R-free flags (if not available in input

files)

fraction= 0.1

max_free= 2000

lattice_symmetry_max_delta= 5

use_lattice_symmetry= True

experimental_phases

Scope of experimental phase information (HL

coefficients)

file_name= None

labels= None

monomers

Scope of monomers information (CIF files)

file_name= None Monomer file(s) name (CIF)

output

Scope for output files

prefix= None Prefix for all output files

serial= None Serial number for consecutive refinement runs

serial_format= "%03d" Format serial number in output file name

write_eff_file= True

write_geo_file= True

write_def_file= True

export_final_f_model= mtz cns Write Fobs, Fmodel, various scales and

more to MTZ or CNS file

write_maps= False

write_map_coefficients= True

electron_density_maps

Electron density maps calculation parameters

map_format= *xplor

map_coefficients_format= *mtz phs

suppress= None List of mtz_label_amplitudes of maps to be suppressed.

Intended to selectively suppress computation and writing of

the standard maps.

grid_resolution_factor= 1/4

region= *selection cell

atom_selection= None

atom_selection_buffer= 3

apply_sigma_scaling= True

apply_volume_scaling= False

map

mtz_label_amplitudes= None

mtz_label_phases= None

likelihood_weighted= None

obs_factor= None

calc_factor= None

kicked= False

fill_missing_f_obs_with_weighted_f_model= True

map

mtz_label_amplitudes= 2FOFCWT

mtz_label_phases= PH2FOFCWT

likelihood_weighted= True

obs_factor= 2

calc_factor= 1

kicked= False

fill_missing_f_obs_with_weighted_f_model= True

map

mtz_label_amplitudes= FOFCWT

mtz_label_phases= PHFOFCWT

likelihood_weighted= True

obs_factor= 1

calc_factor= 1

kicked= False

fill_missing_f_obs_with_weighted_f_model= True

map

mtz_label_amplitudes= 2FOFCWT_no_fill

mtz_label_phases= PH2FOFCWT_no_fill

likelihood_weighted= True

obs_factor= 2

calc_factor= 1

kicked= False

fill_missing_f_obs_with_weighted_f_model= False

map

mtz_label_amplitudes= FOFCWT_no_fill

mtz_label_phases= PHFOFCWT_no_fill

likelihood_weighted= True

obs_factor= 1

calc_factor= 1

kicked= False

fill_missing_f_obs_with_weighted_f_model= False

anomalous_difference_map

mtz_label_amplitudes= ANOM

mtz_label_phases= PHANOM

refine

Scope of refinement flags (=flags defining what to refine) and atom

selections (=atoms to be refined)

strategy= *individual_sites rigid_body *individual_adp group_adp tls

*occupancies group_anomalous Atomic parameters to be refined

sites

Scope of atom selections for coordinates refinement

individual= None Atom selections for individual atoms

rigid_body= None Atom selections for rigid groups

adp

Scope of atom selections for ADP (Atomic Displacement Parameters)

refinement

group_adp_refinement_mode= *one_adp_group_per_residue

two_adp_groups_per_residue group_selection

Select one of three available modes for

group B-factors refinement. For two groups

per residue, the groups will be main-chain

and side-chain atoms. Provide selections

for groups if group_selection is chosen.

group= None One isotropic ADP will be refined for the group of atoms

selected here

one_adp_group_per_residue= True Refine one isotropic ADP per residue

http://phenix-online.org/documentation/refinement.htm (33 of 42) [12/14/08 1:02:19 PM]


Structure refinement in PHENIX

two_adp_groups_per_residue= False Refine two isotropic ADPs per residue

tls= None Selection(s) for TLS group(s)

individual

Scope of atom selections for refinement of individual ADP

isotropic= None Selections for atoms to be refined with

isotropic ADP

anisotropic= None Selections for atoms to be refined with

anisotropic ADP

occupancies

Scope of atom selections for occupancy refinement

individual= None Selection(s) for individual atoms. None is default

which is to refine the individual occupancies for atoms

in alternative conformations or for atoms with partial

occupancies only.

remove_selection= None Occupancies of selected atoms will not be

refined (even though they might satisfy the default

criteria for occupancy refinement).

constrained_group

Selections to define constrained occupancies. If

only one selection is provided then one occupancy

factor per selected atoms will be refined and it

will be constrained between predefined max and min

values.

selection= None Atom selection string.
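Per the help text above, a constrained_group with a single selection refines one occupancy factor for those atoms, bounded between the predefined minimum and maximum. A hedged sketch tying together the two conformers of a residue (selection strings are illustrative, and whether several selection lines may appear in one group follows the help text above):

```
refinement {
  refine {
    occupancies {
      constrained_group {
        selection = "chain A and resseq 50 and altloc A"
        selection = "chain A and resseq 50 and altloc B"
      }
    }
  }
}
```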

anomalous_scatterers

group

selection= None

f_prime= 0

f_double_prime= 0

refine= *f_prime *f_double_prime

main

Scope for most common and frequently used parameters

bulk_solvent_and_scale= True Do bulk solvent correction and anisotropic

scaling

simulated_annealing= False Do simulated annealing

ordered_solvent= False Add (or/and remove) and refine ordered solvent

molecules (water)

ncs= False Use NCS restraints in refinement (can be determined

automatically)

ias= False Build and use IAS (interatomic scatterers) model (at

resolutions higher than approx. 0.9 A)

number_of_macro_cycles= 3 Number of macro-cycles to be performed

max_number_of_iterations= 25

use_form_factor_weights= False

tan_u_iso= False Use tan() reparameterization in ADP refinement

(currently disabled)

use_convergence_test= False Determine whether refinement has converged

and stop when it has

target= *ml mlhl ml_sad ls Choices for refinement target

min_number_of_test_set_reflections_for_max_likelihood_target= 50 minimum

number of

test

reflections

required

for use of

ML target

max_number_of_resolution_bins= 30

reference_xray_structure= None

use_experimental_phases= None Use experimental phases if available. If

true, the target function must be set to mlhl.

compute_optimal_errors= False

random_seed= 2679941 Random seed

scattering_table= wk1995 it1992 *n_gaussian neutron Choices of

scattering table for structure factors calculations

use_normalized_geometry_target= True

target_weights_only= False Calculate target weights only and exit

refinement

use_f_model_scaled= False Use Fmodel structure factors multiplied by

overall scale factor scale_k1

max_d_min= 0.25

Highest allowable resolution limit for refinement

fake_f_obs= False Substitute real experimental Fobs with those

calculated from input model (scales and solvent can be

added)

optimize_mask= False Refine mask parameters (solvent_radius and

shrink_truncation_radius)

occupancy_max= 1.0

Maximum allowable occupancy of an atom

occupancy_min= 0.0

Minimum allowable occupancy of an atom

stir= None Stepwise increase of resolution: start refinement at lower

resolution and gradually proceed with higher resolution

rigid_bond_test= False Compute Hirshfeld's rigid bond test value (RBT)

show_residual_map_peaks_and_holes= True Show highest peaks and deepest

holes in residual_map.

fft_vs_direct= False Check accuracy of approximations used in Fcalc

calculations

outliers_rejection= True Remove basic Wilson outliers, extreme Wilson

outliers, and beamstop shadow outliers

switch_to_isotropic_high_res_limit= 1.7

If the resolution is lower than

this limit, all atoms selected for

individual ADP refinement and not

participating in TLS groups will be

automatically converted to

isotropic.

find_and_add_hydrogens= False Find H or D atoms using difference map and

add them to the model. This option should be

used if ultra-high resolution data is available

or when refining against neutron data.
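The main scope collects the parameters changed most often in practice. A hedged sketch of a fragment enabling automatic solvent updating and simulated annealing over a longer run (the values are illustrative, not recommendations):

```
refinement {
  main {
    number_of_macro_cycles = 5
    ordered_solvent = True
    simulated_annealing = True
    target = ml
  }
}
```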

modify_start_model

Scope of parameters to modify initial model before

refinement

selection= None Selection for atoms to be modified

random_seed= None Random seed

adp

Scope of options to modify ADP of selected atoms

atom_selection= None Selection for atoms to be modified. Overrides

parent-level selection.

randomize= None Randomize ADP within a certain range

set_b_iso= None Set ADP of atoms to set_b_iso

convert_to_isotropic= None Convert atoms to isotropic

convert_to_anisotropic= None Convert atoms to anisotropic

shift_b_iso= None Add shift_b_iso value to ADP

scale_adp= None Multiply ADP by scale_adp

sites

Scope of options to modify coordinates of selected atoms

atom_selection= None Selection for atoms to be modified. Overrides

parent-level selection.

shake= None Randomize coordinates with mean error value equal to shake

translate= 0 0 0 Translational shift

rotate= 0 0 0 Rotational shift

euler_angle_convention= *xyz zyz Euler angles convention to be used

for rotation

occupancies

Scope of options to modify occupancies of selected atoms

randomize= None Randomize occupancies within a certain range

set= None Set all or selected occupancies to given value

output

Write out PDB file with modified model (file name is defined in

write_modified)

file_name= None Default is the original file name with the file

extension replaced by _modified.pdb.
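For example, to perturb a model before refinement (useful when testing refinement protocols), the modify_start_model scope can shake coordinates and reset B-factors. A hedged sketch; the selection, values, and output file name are illustrative:

```
refinement {
  modify_start_model {
    selection = "chain B"
    sites.shake = 0.3
    adp.set_b_iso = 25
    output.file_name = shaken.pdb
  }
}
```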

fake_f_obs

Scope of parameters to simulate Fobs

k_sol= 0.0

Bulk solvent k_sol values

b_sol= 0.0

Bulk solvent b_sol values

b_cart= 0 0 0 0 0 0 Anisotropic scale matrix

scale= 1.0

Overall scale factor

scattering_table= wk1995 it1992 *n_gaussian neutron Choices of

scattering table for structure factors calculations

r_free_flags_fraction= None

structure_factors_accuracy

algorithm= *fft direct

cos_sin_table= False

grid_resolution_factor= 1/3.

quality_factor= None

u_base= None

b_base= None

wing_cutoff= None

exp_table_one_over_step_size= None

mask

solvent_radius= 1.11

shrink_truncation_radius= 0.9

grid_step_factor= 4.0

The grid step for the mask calculation is

determined as highest_resolution divided by

grid_step_factor. This is considered as suggested

value and may be adjusted internally based on the

resolution.

verbose= 1

mean_shift_for_mask_update= 0.1

Value of overall model shift in

refinement that triggers a mask update.

ignore_zero_occupancy_atoms= True Include atoms with zero occupancy

into mask calculation

ignore_hydrogens= True Ignore H or D atoms in mask calculation

hydrogens

Scope of parameters for H atoms refinement

refine= individual *riding Choice for refinement: riding model or full

(H is refined as other atoms; useful at very high resolutions

only)

refine_adp= one_b_per_residue *one_b_per_molecule individual Strategy

for ADP refinement of H atoms (used only if mode=riding)

refine_occupancies= one_q_per_residue *one_q_per_molecule individual

Method to refine parameters of H or D atoms

contribute_to_f_calc= True Add H contribution to Xray (Fcalc)

calculations

high_resolution_limit_to_include_scattering_from_h= 1.6

xh_bond_distance_deviation_limit= 0.0

Idealize XH bond distances if

deviation from ideal is greater than

xh_bond_distance_deviation_limit

build

map_type= mFobs-DFmodel Map type to be used to find hydrogens

map_cutoff= 2.0

Map cutoff

angular_step= 3.0

Step in degrees for 6D rigid body search for best

fit

use_sigma_scaled_maps= True Default is sigma scaled map, map in

absolute scale is used otherwise.

resolution_factor= 1./4.

max_number_of_peaks= None

map_next_to_model

min_model_peak_dist= 0.7

max_model_peak_dist= 1.05

min_peak_peak_dist= 1.0

use_hydrogens= False

peak_search

peak_search_level= 1

max_peaks= 0

interpolate= True

min_distance_sym_equiv= None

general_positions_only= False

min_cross_distance= 1.0
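The riding-model treatment of hydrogens described above is the starred default; making it explicit in a parameter file might look like the following hedged sketch:

```
refinement {
  hydrogens {
    refine = riding
    refine_adp = one_b_per_molecule
    contribute_to_f_calc = True
  }
}
```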

group_b_iso

number_of_macro_cycles= 3

max_number_of_iterations= 25

convergence_test= False

run_finite_differences_test= False

adp

iso

max_number_of_iterations= 25

automatic_randomization_if_all_equal= True

scaling

scale_max= 3.0

scale_min= 10.0

tls

one_residue_one_group= None

refine_T= True

refine_L= True

refine_S= True

number_of_macro_cycles= 2

max_number_of_iterations= 25

start_tls_value= None

run_finite_differences_test= False

eps= 1.e-6

adp_restraints

iso

use_u_local_only= False

sphere_radius= 5.0

distance_power= 1.69

average_power= 1.03

wilson_b_weight_auto= False

wilson_b_weight= None

plain_pairs_radius= 5.0

refine_ap_and_dp= False

b_iso_max= None

group_occupancy

number_of_macro_cycles= 3

max_number_of_iterations= 25

convergence_test= False

run_finite_differences_test= False

group_anomalous

number_of_minimizer_cycles= 3

lbfgs_max_iterations= 20

number_of_finite_difference_tests= 0

rigid_body

Scope of parameters for rigid body refinement

mode= *first_macro_cycle_only every_macro_cycle Defines how many times

the rigid body refinement is performed during refinement run.

first_macro_cycle_only to run only once at first macrocycle,

every_macro_cycle to do rigid body refinement

main.number_of_macro_cycles times

target= ls_wunit_k1 ml *auto Rigid body refinement target function:

least-squares or maximum-likelihood

target_auto_switch_resolution= 6.0

Used if target=auto, use optimal

target for given working resolution.

refine_rotation= True Only rotation is refined (translation is fixed).

refine_translation= True Only translation is refined (rotation is fixed).

max_iterations= 25 Number of LBFGS minimization iterations

bulk_solvent_and_scale= True Bulk-solvent and scaling within rigid body

refinement (needed since large rigid body shifts

invalidate the mask).

euler_angle_convention= *xyz zyz Euler angles convention

lbfgs_line_search_max_function_evaluations= 10

min_number_of_reflections= 100 Number of reflections that defines the

first lowest resolution zone for

multiple_zones protocol

multi_body_factor= 1

zone_exponent= 4.0

high_resolution= 3.0

High resolution cutoff (used for rigid body

refinement only)

max_low_high_res_limit= None Maximum value for high resolution cutoff

for the first lowest resolution zone

number_of_zones= 5 Number of resolution zones for MZ protocol

ncs

find_automatically= True

coordinate_sigma= None

b_factor_weight= None

excessive_distance_limit= 1.5

special_position_warnings_only= False

simple_ncs_from_pdb

pdb_in= None Input PDB file to be used to identify ncs

temp_dir= "" temporary directory (ncs_domain_pdb will be written

there)

min_length= 10 minimum number of matching residues in a segment

njump= 1 Take every njumpth residue instead of each 1

njump_recursion= 10 Take every njump_recursion residue instead of

each 1 on recursive call

min_length_recursion= 50 minimum number of matching residues in a

segment for recursive call

min_percent= 95.

min percent identity of matching residues

max_rmsd= 2.

max rmsd of 2 chains. If 0, then only search for domains

quick= True If quick is set and all chains match, just look for 1 NCS

group

max_rmsd_user= 3.

max rmsd of chains suggested by user (i.e., if

called from phenix.refine with suggested ncs groups)

maximize_size_of_groups= False You can request that the scoring be

set up to maximize the number of members in

NCS groups

ncs_domain_pdb_stem= None NCS domains will be written to

ncs_domain_pdb_stem+"group_"+nn

write_ncs_domain_pdb= False You can write out PDB files representing

NCS domains for density modification if you

want

verbose= False Verbose output

debug= False Debugging output

dry_run= False Just read in and check parameter names

domain_finding_parameters

find_invariant_domains= True Find the parts of a set of chains

that follow NCS

initial_rms= 0.5

Guess of RMS among chains

match_radius= 2.0

Keep atoms that are within match_radius of

NCS-related atoms

similarity_threshold= 0.75

Threshold for similarity between

segments

smooth_length= 0 two segments separated by smooth_length or less

get connected

min_contig_length= 3 segments < min_contig_length rejected

min_fraction_domain= 0.2

domain must be this fraction of a chain

max_rmsd_domain= 2.

max rmsd of domains

restraint_group

reference= None

selection= None

coordinate_sigma= 0.05

b_factor_weight= 10
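As an alternative to automatic NCS detection, restraint groups can be declared explicitly with the restraint_group scope above. A hedged sketch (chain IDs and the sigma value are illustrative):

```
refinement {
  ncs {
    find_automatically = False
    restraint_group {
      reference = "chain A"
      selection = "chain B"
      coordinate_sigma = 0.05
    }
  }
}
```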

pdb_interpretation

link_distance_cutoff= 3

disulfide_distance_cutoff= 3

chir_volume_esd= 0.2

nonbonded_distance_cutoff= None

default_vdw_distance= 1

min_vdw_distance= 1

nonbonded_buffer= 1

vdw_1_4_factor= 0.8

translate_cns_dna_rna_residue_names= None

apply_cif_modification

data_mod= None

residue_selection= None

apply_cif_link

data_link= None

residue_selection_1= None

residue_selection_2= None

peptide_link

cis_threshold= 45

discard_psi_phi= True

omega_esd_override_value= None

rna_sugar_pucker_analysis

use= True

bond_min_distance= 1.2

bond_max_distance= 1.8

epsilon_range_not_2p_min= 155

epsilon_range_not_2p_max= 310

delta_range_2p_min= 115

delta_range_2p_max= 180

p_distance_c1_n_line_2p_max= 2.9

show_histogram_slots

bond_lengths= 5

nonbonded_interaction_distances= 5

dihedral_angle_deviations_from_ideal= 5

show_max_lines

bond_restraints_sorted_by_residual= 5

nonbonded_interactions_sorted_by_model_distance= 5

dihedral_angle_restraints_sorted_by_residual= 3

clash_guard

nonbonded_distance_threshold= 0.5

max_number_of_distances_below_threshold= 100

max_fraction_of_distances_below_threshold= 0.1

geometry_restraints

edits

excessive_bond_distance_limit= 10

bond

action= *add delete change

atom_selection_1= None

atom_selection_2= None

symmetry_operation= None The bond is between atom_1 and

symmetry_operation * atom_2, with atom_1 and

atom_2 given in fractional coordinates.

Example: symmetry_operation = -x-1,-y,z

distance_ideal= None

sigma= None

slack= None

angle

action= *add delete change

atom_selection_1= None

atom_selection_2= None

atom_selection_3= None

angle_ideal= None

sigma= None
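Custom restraints are introduced through the geometry_restraints.edits scope above. A hedged sketch adding a disulfide-like bond between two cysteine sulfurs (the atom selections and target values are illustrative):

```
refinement {
  geometry_restraints {
    edits {
      bond {
        action = add
        atom_selection_1 = "chain A and resseq 45 and name SG"
        atom_selection_2 = "chain A and resseq 98 and name SG"
        distance_ideal = 2.03
        sigma = 0.02
      }
    }
  }
}
```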

geometry_restraints

remove

angles= None

dihedrals= None

chiralities= None

planarities= None

ordered_solvent

low_resolution= 2.8

Low resolution limit for water picking (at lower

resolution water will not be picked even if requested)

mode= *auto filter_only every_macro_cycle Choices for water picking

strategy: auto - start water picking after first few macro-cycles,

filter_only - remove water only, every_macro_cycle - do water

update every macro-cycle

output_residue_name= HOH

output_chain_id= S

output_atom_name= O

b_iso_min= 1.0

Minimum B-factor value, waters with smaller value will be

rejected

b_iso_max= 80.0

Maximum B-factor value, waters with bigger value will be

rejected

anisotropy_min= 0.1

For solvent refined as anisotropic: remove if the anisotropy is less

than this value

b_iso= None Initial B-factor value for newly added water

scattering_type= O Defines scattering factors for newly added waters

occupancy_min= 0.1

Minimum occupancy value, waters with smaller value

will be rejected

occupancy_max= 1.0

Maximum occupancy value, waters with bigger value

will be rejected

occupancy= 1.0

Initial occupancy value for newly added water

primary_map_type= mFobs-DFmodel

primary_map_cutoff= 3.0

secondary_map_type= 2mFobs-DFmodel

secondary_map_cutoff= 1.0

h_bond_min_mac= 1.8

h_bond_min_sol= 1.8


h_bond_max= 3.2

new_solvent= *isotropic anisotropic Based on the choice, added solvent

will have isotropic or anisotropic b-factors

refine_adp= True Refine ADP for newly placed solvent.

refine_occupancies= False Refine solvent occupancies.

filter_at_start= True

n_cycles= 1

ignore_final_filtering_step= False

correct_drifted_waters= True

use_kick_maps= False Use Dusan Turk's kick maps for peak picking

kick_map

parameters for kick maps

kick_size= 0.5

number_of_kicks= 100

peak_search

use_sigma_scaled_maps= True Default is sigma scaled map, map in absolute

scale is used otherwise.

resolution_factor= 1./4.

max_number_of_peaks= None

map_next_to_model

min_model_peak_dist= 1.8

max_model_peak_dist= 6.0

min_peak_peak_dist= 1.8

use_hydrogens= False

peak_search

peak_search_level= 1

max_peaks= 0

interpolate= True

min_distance_sym_equiv= None

general_positions_only= False

min_cross_distance= 1.8
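The acceptance criteria above (B-factor and occupancy bounds, picking mode) can be tightened in a parameter file. A hedged sketch (values are illustrative, not recommendations):

```
refinement {
  ordered_solvent {
    mode = every_macro_cycle
    b_iso_max = 60.0
    occupancy_min = 0.3
  }
}
```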

bulk_solvent_and_scale

bulk_solvent= True

anisotropic_scaling= True

k_sol_b_sol_grid_search= True

minimization_k_sol_b_sol= True

minimization_b_cart= True

target= ls_wunit_k1 *ml

symmetry_constraints_on_b_cart= True

k_sol_max= 0.6

k_sol_min= 0.0

b_sol_max= 150.0

b_sol_min= 0.0

k_sol_grid_search_max= 0.6

k_sol_grid_search_min= 0.0

b_sol_grid_search_max= 80.0

b_sol_grid_search_min= 20.0

k_sol_step= 0.3

b_sol_step= 30.0

number_of_macro_cycles= 2

max_iterations= 25

min_iterations= 25

fix_k_sol= None

fix_b_sol= None

apply_back_trace_of_b_cart= False

verbose= -1

ignore_bulk_solvent_and_scale_failure= False

fix_b_cart

b11= None

b22= None

b33= None

b12= None

b13= None

b23= None
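When the bulk-solvent grid search is unstable (for example at low resolution), k_sol and b_sol can be held fixed with the parameters above. A hedged sketch (the values are illustrative, not recommendations):

```
refinement {
  bulk_solvent_and_scale {
    fix_k_sol = 0.35
    fix_b_sol = 46.0
  }
}
```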

alpha_beta

free_reflections_per_bin= 140

number_of_macromolecule_atoms_absent= 225

n_atoms_included= 0

bf_atoms_absent= 15.0


final_error= 0.0

absent_atom_type= "O"

method= *est calc

estimation_algorithm= *analytical iterative

verbose= -1

interpolation= True

fix_scale_for_calc_option= None

number_of_waters_absent= 613

sigmaa_estimator

kernel_width_free_reflections= 100

kernel_on_chebyshev_nodes= True

number_of_sampling_points= 20

number_of_chebyshev_terms= 10

use_sampling_sum_weights= True

mask

solvent_radius= 1.11

shrink_truncation_radius= 0.9

grid_step_factor= 4.0

The grid step for the mask calculation is

determined as highest_resolution divided by

grid_step_factor. This is considered as suggested

value and may be adjusted internally based on the

resolution.

verbose= 1

mean_shift_for_mask_update= 0.1

Value of overall model shift in

refinement that triggers a mask update.

ignore_zero_occupancy_atoms= True Include atoms with zero occupancy into

mask calculation

ignore_hydrogens= True Ignore H or D atoms in mask calculation

cartesian_dynamics

temperature= 300

number_of_steps= 200

time_step= 0.0005

initial_velocities_zero_fraction= 0

n_print= 100

verbose= -1

simulated_annealing

start_temperature= 5000

final_temperature= 300

cool_rate= 100

number_of_steps= 25

time_step= 0.0005

initial_velocities_zero_fraction= 0

n_print= 100

update_grads_shift= 0.3

refine_sites= True

refine_adp= False

max_number_of_iterations= 25

mode= every_macro_cycle *second_and_before_last once first

verbose= -1

interleaved_minimization

number_of_iterations= 0

time_step_factor= 10

restraints= *bonds *angles

target_weights

mode= *automatic every_macro_cycle

wxc_scale= 0.5

wxu_scale= 1.0

wc= 1.0

wu= 1.0

fix_wxc= None

fix_wxu= None

optimize_wxc= False

bonds_rmsd_max= 0.05

angles_rmsd_max= 3.5

optimize_wxu= False

shake_sites= True

shake_adp= 10.0


regularize_ncycles= 50

verbose= 1

wnc_scale= 0.5

wnu_scale= 1.0

rmsd_cutoff_for_gradient_filtering= 3.0

ias

b_iso_max= 100.0

occupancy_min= -1.0

occupancy_max= 1.5

ias_b_iso_max= 100.0

ias_b_iso_min= 0.0

ias_occupancy_min= 0.01

ias_occupancy_max= 3.0

initial_ias_occupancy= 1.0

build_ias_types= L R B BH

use_map= True

build_only= False

file_prefix= None

peak_search_map

map_type= *Fobs-Fmodel mFobs-DFmodel

grid_step= 0.25

scaling= *volume sigma

ls_target_names

target_name= *ls_wunit_k1 ls_wunit_k2 ls_wunit_kunit ls_wunit_k1_fixed

ls_wunit_k1ask3_fixed ls_wexp_k1 ls_wexp_k2 ls_wexp_kunit

ls_wff_k1 ls_wff_k2 ls_wff_kunit ls_wff_k1_fixed

ls_wff_k1ask3_fixed lsm_kunit lsm_k1 lsm_k2 lsm_k1_fixed

lsm_k1ask3_fixed

twinning

twin_law= None

twin_target= *twin_lsq_f

detwin

mode= algebraic proportional *auto

map_types

twofofc= *two_m_dtfo_d_fc two_dtfo_fc

fofc= *m_dtfo_d_fc gradient m_gradient

aniso_correct= False

structure_factors_and_gradients_accuracy

algorithm= *fft direct

cos_sin_table= False

grid_resolution_factor= 1/3.

quality_factor= None

u_base= None

b_base= None

wing_cutoff= None

exp_table_one_over_step_size= None

r_free_flags

fraction= 0.1

max_free= 2000

lattice_symmetry_max_delta= 5.0

Tolerance used in the determination of

the highest lattice symmetry. Can be thought

of as angle between lattice vectors that

should line up perfectly if the symmetry is

ideal. A typical value is 3 degrees.

use_lattice_symmetry= True When generating Rfree flags, do so in the

asymmetric unit of the highest lattice symmetry.

The result is an Rfree set suitable for twin

refinement.
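Twin refinement is activated by supplying a twin_law in the twinning scope above. A hedged sketch (the twin law shown is hypothetical; use the operator reported by phenix.xtriage for your data):

```
refinement {
  twinning {
    twin_law = "h,-k,-l"
  }
}
```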


Finding NCS in chains from a PDB file with simple_ncs_from_pdb

Python-based Hierarchical ENvironment for Integrated Xtallography

Documentation Home

Finding NCS in chains from a PDB file with simple_ncs_from_pdb

Author(s)

Purpose

Usage

How simple_ncs_from_pdb works:

Additional notes on how simple_ncs_from_pdb works:

Output files from simple_ncs_from_pdb

Examples

Standard run of simple_ncs_from_pdb:

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of all simple_ncs_from_pdb keywords

Author(s)

● simple_ncs_from_pdb : Tom Terwilliger

Phil command interpreter: Ralf W. Grosse-Kunstleve

● find_domain: Peter Zwart

Purpose

The simple_ncs_from_pdb method identifies NCS in the chains in a PDB file and writes out the NCS operators in forms suitable for phenix.refine, resolve, and the AutoSol and AutoBuild Wizards.

Usage

How simple_ncs_from_pdb works:

The basic steps that simple_ncs_from_pdb carries out are:

(1) Identify sets of chains in the PDB file that have the same sequences. These are potential

NCS-related chains.

(2) Determine which chains in a group actually are related by NCS within a given tolerance

(max_rmsd, typically 2 A)

(3) Determine which residues in each chain are related by NCS, and break the chains into domains that do follow NCS if necessary.

(4) Determine the NCS operators for all chains in each NCS group or domain

Additional notes on how simple_ncs_from_pdb works:

http://phenix-online.org/documentation/simple_ncs_from_pdb.htm (1 of 6) [12/14/08 1:02:26 PM]


The matching of chains is done in a first quick pass by calling simple_ncs_from_pdb recursively, using only every 10th residue in the analysis. This allows a check of whether chains that have the same sequence really have the same structure, or whether some such chains should be placed in separate NCS groups. Using only every 10th residue leaves enough time for an all-against-all matching of chains.

If residue numbers are not the same for corresponding chains, but they are simply offset by a constant for each chain, this will be recognized and the chains will be aligned.

An assumption in simple_ncs_from_pdb is that residue numbers are consistent among chains. They do not have to be the same: chain A can be residues 1-100 and chain B 211-300. However chain A cannot be residues 1-10 and 20-50, matching to chain B residues 1-10 and 21-51.

Residue numbers are used to align pairs of chains, maximizing identities of matching pairs of residues.

Pairs of chains that can match are identified.

Groupings of chains are chosen that maximize the number of matching residues between each member of a group and the first (reference) member of the group.

For a pair of chains, some segments may match and others may not. Each pair of segments must be at least min_length residues long and have a percent identity of at least min_percent. A pair of segments may not end in a mismatch. An overall pair of chains must have a CA-atom rmsd of no more than max_rmsd.

If find_invariant_domain is specified then once all chains that can be matched with the above algorithm are identified, all remaining chains are matched, allowing the break-up of chains into invariant domains. The invariant domains each get a separate NCS group.
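The thresholds described above map onto the keywords listed at the end of this section. A hedged parameter-file sketch loosening the matching criteria and requesting invariant-domain finding (the values are illustrative):

```
simple_ncs_from_pdb {
  pdb_in = anb.pdb
  min_percent = 90.
  max_rmsd = 3.
  domain_finding_parameters {
    find_invariant_domains = True
  }
}
```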

Output files from simple_ncs_from_pdb

The output files that are produced are:

NCS operators in phenix.refine format: simple_ncs_from_pdb.ncs

NCS operators in PHENIX Wizards format: simple_ncs_from_pdb.ncs_spec

Examples

Standard run of simple_ncs_from_pdb:

Running simple_ncs_from_pdb is easy. For example, you can type: phenix.simple_ncs_from_pdb anb.pdb

Simple_ncs_from_pdb will analyze the chains in anb.pdb and identify any NCS that exists. For this sample run the following output is produced:

Chains in this PDB file: ['A', 'N', 'B']

GROUPS BASED ON QUICK COMPARISON: [['A', 'B']]

Looking for invariant domains for ...: ['A', 'N', 'B'] [[[2, 525]],

[[2, 259], [290, 525]], [[20, 525]]]

There were 3 chains in the PDB file: A, N and B. Chains A and B were very similar and clearly related by

NCS; this relationship was found in the quick comparison. Chain N had the same sequence as A and B but did not match them in the quick comparison. Searching for domains that do obey NCS among all three chains produced three domains, represented below by 4 NCS groups:

GROUP 1

Summary of NCS group with 3 operators:

ID of chain/residue where these apply: [['A', 'N', 'B'], [[[2, 5], [20, 35],

[60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516],

[520, 525]], [[2, 5], [20, 35], [60, 76], [78, 107], [110, 137],

[401, 431], [433, 483], [485, 516], [520, 525]], [[2, 5], [20, 35],

[60, 76], [78, 107], [110, 137], [401, 431], [433, 483], [485, 516],

[520, 525]]]]

RMSD (A) from chain A: 0.0 1.09 0.07

Number of residues matching chain A:[215, 215, 194]

Source of NCS info: anb.pdb

The residues in chains A, B, and N in this group are 2-5, 20-35, 60-76, 78-107, 110-137, 401-431,

433-483, 485-516 and 520-525. Note that these are not all contiguous. These are all the residues that all have the same relationships among the 3 chains. The RMSD of CA atoms between chains A and N is

1.09 A and between A and B is 0.07 A.

The NCS operators relating these domains are given below.

OPERATOR 1

CENTER: 29.9208 -53.3304 -13.4779

ROTA 1: 1.0000 0.0000 0.0000

ROTA 2: 0.0000 1.0000 0.0000

ROTA 3: 0.0000 0.0000 1.0000

TRANS: 0.0000 0.0000 0.0000

OPERATOR 2

CENTER: 32.5410 -35.4227 20.2768

ROTA 1: 0.9370 -0.2825 0.2053

ROTA 2: -0.3285 -0.9125 0.2439

ROTA 3: 0.1184 -0.2960 -0.9478

TRANS: -14.7410 -79.9073 -8.5967

OPERATOR 3

CENTER: 50.0256 -91.8920 -13.6461

ROTA 1: 0.6257 0.7800 -0.0037

ROTA 2: -0.7800 0.6257 -0.0010

ROTA 3: 0.0015 0.0035 1.0000

TRANS: 70.3889 42.4760 0.3937

GROUP 2

Summary of NCS group with 3 operators:

ID of chain/residue where these apply: [['A', 'N', 'B'], [[[6, 9],

[56, 59], [517, 519]], [[6, 9], [56, 59], [517, 519]], [[6, 9],

[56, 59], [517, 519]]]]

RMSD (A) from chain A: 0.0 0.48 0.03


Number of residues matching chain A:[11, 11, 11]

Source of NCS info: anb.pdb

OPERATOR 1

CENTER: 47.5037 -61.5641 -11.2751

ROTA 1: 1.0000 0.0000 0.0000

ROTA 2: 0.0000 1.0000 0.0000

ROTA 3: 0.0000 0.0000 1.0000

TRANS: 0.0000 0.0000 0.0000

OPERATOR 2

CENTER: 51.8984 -33.6038 20.9877

ROTA 1: 0.9367 -0.2981 0.1836

ROTA 2: -0.3113 -0.9492 0.0469

ROTA 3: 0.1603 -0.1011 -0.9819

TRANS: -14.9810 -78.2888 -2.3823

OPERATOR 3

CENTER: 66.8308 -82.9508 -11.4633

ROTA 1: 0.6255 0.7802 -0.0016

ROTA 2: -0.7802 0.6255 -0.0025

ROTA 3: -0.0009 0.0028 1.0000

TRANS: 70.3999 42.4366 0.4815

GROUP 3

Summary of NCS group with 3 operators:

ID of chain/residue where these apply: [['A', 'N', 'B'], [[[193, 255],

[257, 259], [290, 355], [357, 374]], [[193, 255], [257, 259],

[290, 355], [357, 374]], [[193, 255], [257, 259], [290, 355], [357, 374]]]]

RMSD (A) from chain A: 0.0 0.61 0.01

Number of residues matching chain A:[150, 150, 150]

Source of NCS info: anb.pdb

OPERATOR 1

CENTER: 36.1219 -37.6124 -62.1437

ROTA 1: 1.0000 0.0000 0.0000

ROTA 2: 0.0000 1.0000 0.0000

ROTA 3: 0.0000 0.0000 1.0000

TRANS: 0.0000 0.0000 0.0000

OPERATOR 2

CENTER: 39.1403 -33.0801 60.7270

ROTA 1: 0.7650 0.3808 -0.5194

ROTA 2: 0.0664 -0.8488 -0.5245

ROTA 3: -0.6406 0.3668 -0.6746

TRANS: 50.3180 -36.4383 16.0299

OPERATOR 3

CENTER: 40.9347 -76.7723 -62.2004

ROTA 1: 0.5942 0.8043 -0.0007

ROTA 2: -0.8043 0.5942 -0.0064

ROTA 3: -0.0047 0.0043 1.0000


TRANS: 73.5084 40.5311 0.5807

GROUP 4

Summary of NCS group with 3 operators:

ID of chain/residue where these apply: [['A', 'N', 'B'], [[[36, 41]],

[[36, 41]], [[36, 41]]]]

RMSD (A) from chain A: 0.0 0.22 0.03

Number of residues matching chain A:[6, 6, 6]

Source of NCS info: anb.pdb

OPERATOR 1

CENTER: 45.4522 -37.4720 -14.4660

ROTA 1: 1.0000 0.0000 0.0000

ROTA 2: 0.0000 1.0000 0.0000

ROTA 3: 0.0000 0.0000 1.0000

TRANS: 0.0000 0.0000 0.0000

OPERATOR 2

CENTER: 42.1483 -55.6520 24.0535

ROTA 1: 0.9444 -0.3074 0.1171

ROTA 2: -0.2975 -0.9501 -0.0940

ROTA 3: 0.1402 0.0540 -0.9887

TRANS: -14.2728 -75.5420 6.4099

OPERATOR 3

CENTER: 46.7900 -69.5227 -14.6653

ROTA 1: 0.6247 0.7809 -0.0013

ROTA 2: -0.7809 0.6247 0.0028

ROTA 3: 0.0030 -0.0008 1.0000

TRANS: 70.4964 42.5349 0.0067

NCS operators written in format for resolve to: simple_ncs_from_pdb.resolve

NCS operators written in format for phenix.refine to: simple_ncs_from_pdb.ncs

NCS written as ncs object information to: simple_ncs_from_pdb.ncs_spec

Possible Problems

Specific limitations and problems:

If the user specifies chains to be in a suggested NCS group, but they are too dissimilar as a whole

(rmsd > max_rmsd_user), then the group is rejected even if some fragment of the chains could be similar.

A chain specification from suggested_ncs_groups could in principle have more than one chain in one group, but simple_ncs_from_pdb can only use suggested groups that consist of N copies of single chains.

If the NCS asymmetric unit of your crystal contains more than one chain, simple_ncs_from_pdb will consider it to have more than one domain, and it will assign one NCS group to each chain.

Literature

Additional information


List of all simple_ncs_from_pdb keywords

-------------------------------------------------------------------------------

Legend: black bold - scope names; black - parameter names; red - parameter values;

blue - parameter help; blue bold - scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

------------------------------------------------------------------------------- simple_ncs_from_pdb

pdb_in= None Input PDB file to be used to identify NCS
temp_dir= "" Temporary directory (ncs_domain_pdb will be written there)
min_length= 10 Minimum number of matching residues in a segment
njump= 1 Take every njump'th residue instead of each 1
njump_recursion= 10 Take every njump_recursion'th residue instead of each 1 on recursive call
min_length_recursion= 50 Minimum number of matching residues in a segment for recursive call
min_percent= 95. Minimum percent identity of matching residues
max_rmsd= 2. Maximum rmsd of 2 chains. If 0, then only search for domains
quick= True If quick is set and all chains match, just look for 1 NCS group
max_rmsd_user= 3. Maximum rmsd of chains suggested by user (i.e., if called from phenix.refine with suggested NCS groups)
maximize_size_of_groups= False You can request that the scoring be set up to maximize the number of members in NCS groups
ncs_domain_pdb_stem= None NCS domains will be written to ncs_domain_pdb_stem+"group_"+nn
write_ncs_domain_pdb= False You can write out PDB files representing NCS domains for density modification if you want
verbose= False Verbose output
debug= False Debugging output
dry_run= False Just read in and check parameter names
domain_finding_parameters
  find_invariant_domains= True Find the parts of a set of chains that follow NCS
  initial_rms= 0.5 Guess of RMS among chains
  match_radius= 2.0 Keep atoms that are within match_radius of NCS-related atoms
  similarity_threshold= 0.75 Threshold for similarity between segments
  smooth_length= 0 Two segments separated by smooth_length or less get connected
  min_contig_length= 3 Segments < min_contig_length rejected
  min_fraction_domain= 0.2 Domain must be this fraction of a chain
  max_rmsd_domain= 2. Maximum rmsd of domains


Finding and analyzing NCS from heavy-atom sites or a model with find_ncs

Author(s)

Purpose

Usage

How find_ncs works:

Output files from find_ncs

What find_ncs needs:

Examples

Standard run of find_ncs:

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of all find_ncs keywords

Author(s)

● find_ncs: Tom Terwilliger
● simple_ncs_from_pdb: Tom Terwilliger
● Phil command interpreter: Ralf W. Grosse-Kunstleve
● find_domain: Peter Zwart

Purpose

The find_ncs method identifies NCS in either (a) the chains in a PDB file or (b) a set of heavy-atom sites, and writes out the NCS operators in forms suitable for phenix.refine, resolve, and the AutoSol and AutoBuild Wizards.

Usage

How find_ncs works:

The basic steps that find_ncs carries out are:

(1) Decide whether to use simple_ncs_from_pdb (used if the input file contains chains from a PDB file) or RESOLVE NCS identification (used if the input file contains heavy-atom sites)

(2) call either simple_ncs_from_pdb or RESOLVE to identify NCS

(3) Evaluate the NCS by calculating the correlation of NCS-related electron density based on the input map coefficients mtz file.

(4) Report the NCS operators and correlations
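Step (1) can be sketched as below; the test shown (looking for C-alpha or P backbone atoms to recognise model chains) is a hypothetical heuristic for illustration only, not the exact check that find_ncs performs:

```python
# Hypothetical sketch of step (1): choose the NCS-finding engine based on
# what the input PDB file contains. The actual test used by find_ncs may differ.
def choose_ncs_engine(pdb_lines):
    """Return 'simple_ncs_from_pdb' for files with model chains,
    'resolve' for files that only hold heavy-atom sites."""
    atom_names = [line[12:16].strip()
                  for line in pdb_lines
                  if line.startswith(("ATOM", "HETATM"))]
    # Model chains contain backbone atoms (CA for protein, P for RNA/DNA);
    # a bare heavy-atom site file typically does not.
    if any(name in ("CA", "P") for name in atom_names):
        return "simple_ncs_from_pdb"
    return "resolve"

sites = ["HETATM    1 SE   MSE A   1      11.000  12.000  13.000  1.00 20.00"]
print(choose_ncs_engine(sites))
```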

Output files from find_ncs

The output files that are produced are:


NCS operators written in format for phenix.refine: find_ncs.ncs
NCS operators written in format for the PHENIX Wizards: find_ncs.ncs_spec

What find_ncs needs:

find_ncs needs a file containing NCS information and a file with map coefficients.

The file with NCS information can be...

● a PDB file with a model (find_ncs will call simple_ncs_from_pdb to extract NCS operators from the chains in your model)

● a PDB file with heavy-atom sites (find_ncs will call RESOLVE to find NCS operators from your heavy-atom sites)

● an NCS definitions file written by a PHENIX wizard (e.g., AutoSol_1.ncs_spec, produced by AutoSol)

● a RESOLVE log file containing formatted NCS operators

The file with map coefficients can be any MTZ file with coefficients for a map. If find_ncs does not choose the correct columns automatically, then you can specify them with a command like:

labin="labin FP=FP PHIB=PHIB FOM=FOM "

If you have no map coefficients yet (you just have some sites and want to get operators, for example), you can tell find_ncs to ignore the map with:

ncs_parameters.force_ncs=True

Examples

Standard run of find_ncs:

Running find_ncs is easy. From the command line you can type:

phenix.find_ncs anb.pdb mlt.mtz

This will produce the following output:

Getting column labels from mlt.mtz for input map file

FILE TYPE: ccp4_mtz

All labels: ['FP', 'SIGFP', 'PHIC', 'FOM']

Labin line will be: labin FP=FP PHIB=PHIC FOM=FOM

To change it modify this: params.ncs.labin="labin FP=FP PHIB=PHIC FOM=FOM "

This is the map that will be used to evaluate NCS

Reading NCS information from: anb.pdb

Copying mlt.mtz to temp_dir/mlt.mtz


This PDB file contains 2 chains and 636 total residues and 636 C-alpha or P atoms and 4740 total atoms

NCS will be found using the chains in this PDB file

Chains in this PDB file: ['M', 'Z']

Two chains were found in the file anb.pdb, chain M and chain Z

GROUPS BASED ON QUICK COMPARISON: []

Looking for invariant domains for ...: ['M', 'Z'] [[[2, 138], [193, 373]], [[2, 138], [193, 373]]]

Residues 2-138, 193-373, matched between the two chains

Copying mlt.mtz to temp_dir/mlt.mtz

Copying temp_dir/NCS_correlation.log to NCS_correlation.log

Log file for NCS correlation is in NCS_correlation.log

List of refined NCS correlations: [1.0, 0.80000000000000004]

There were two separate groups of residues that had different NCS relationships. Residues 193-373 of each chain were in one group, and residues 2-138 in each chain were in the other group.

The electron density map had a correlation between the two NCS-related chains of 1.0 for the first group, and 0.8 for the second

The NCS operators for each are listed.

GROUP 1

Summary of NCS group with 2 operators:

ID of chain/residue where these apply: [['M', 'Z'], [[[193, 373]], [[193, 373]]]]

RMSD (A) from chain M: 0.0 0.0

Number of residues matching chain M:[181, 181]

Source of NCS info: anb.pdb

Correlation of NCS: 1.0

OPERATOR 1

CENTER: 69.1058 -9.5443 59.4674

ROTA 1: 1.0000 0.0000 0.0000

ROTA 2: 0.0000 1.0000 0.0000

ROTA 3: 0.0000 0.0000 1.0000

TRANS: 0.0000 0.0000 0.0000

OPERATOR 2

CENTER: 37.5004 -37.0709 -62.5441

ROTA 1: 0.7751 -0.6211 -0.1162

ROTA 2: -0.3607 -0.5859 0.7256

ROTA 3: -0.5188 -0.5205 -0.6782

TRANS: 9.7485 27.6460 17.2076

GROUP 2

Summary of NCS group with 2 operators:

ID of chain/residue where these apply: [['M', 'Z'], [[[2, 138]], [[2, 138]]]]

RMSD (A) from chain M: 0.0 0.0

Number of residues matching chain M:[137, 137]

Source of NCS info: anb.pdb

Correlation of NCS: 0.8


OPERATOR 1

CENTER: 66.6943 -13.3128 21.6769

ROTA 1: 1.0000 0.0000 0.0000

ROTA 2: 0.0000 1.0000 0.0000

ROTA 3: 0.0000 0.0000 1.0000

TRANS: 0.0000 0.0000 0.0000

OPERATOR 2

CENTER: 39.0126 -53.7392 -13.4457

ROTA 1: 0.3702 -0.9275 -0.0516

ROTA 2: -0.8933 -0.3402 -0.2938

ROTA 3: 0.2549 0.1548 -0.9545

TRANS: 1.7147 -0.6936 7.2172

Possible Problems

Specific limitations and problems:

None

Literature

Additional information

List of all find_ncs keywords

-------------------------------------------------------------------------------

Legend:
  black bold - scope names
  black - parameter names
  red - parameter values
  blue - parameter help
  blue bold - scope help

Parameter values:
  * means selected parameter (where multiple choices are available)
  False is No
  True is Yes
  None means not provided, not predefined, or left up to the program
  "%3d" is a Python-style formatting descriptor

-------------------------------------------------------------------------------
find_ncs

ncs_in= None File with NCS information (PDB file with heavy-atom sites or with NCS-related chains)
ncs_in_type= *None chains sites ncs_file Type of NCS information. Choices are: chains: a PDB file with two or more chains that have a consistent residue-numbering system; sites: a PDB file or fractional-coordinate file with atomic positions of heavy atoms that show NCS; ncs_file: an ncs object file from PHENIX.
mtz_in= None MTZ file with coefficients for a map that can be used to assess NCS. Required for finding NCS from heavy-atom sites
labin= "" Labin line for MTZ file with map coefficients. This is optional if find_ncs can guess the correct coefficients for FP, PHI and FOM. Otherwise specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM where myFP is your column label for FP
resolution= 0. High-resolution limit for map calculation
temp_dir= "temp_dir" Temporary work directory
output_dir= "" Output directory where files are to be written
verbose= False Verbose output
debug= False Debugging output
dry_run= False Just read in and check parameter names
ncs_parameters
  ncs_restrict= 0 You can specify the number of NCS operators to look for
  force_ncs= False You can tell find_ncs to ignore the map. This is useful if you only have FP but no phases yet
  optimize_ncs= False You can tell find_ncs to optimize the NCS by making as compact a molecule as possible
  n_try_ncs= 3 Number of tries to find NCS from heavy-atom sites
  ncs_thorough= 8 Thoroughness for looking for heavy-atom sites (high = more thorough)


eLBOW - electronic Ligand Builder and Optimisation Workbench

Author

Purpose

Examples

Using SMILES string from internal database

PDB input

SMILES input

Other input

Geometry optimisation

Hydrogen addition

Output

Additional programs

Literature

Additional information

Novice options

Expert options

electronic Ligand Builder and Optimisation Workbench (eLBOW)

More detailed website

Author

Nigel W. Moriarty

Purpose

Automate the generation of geometry restraint information for the refinement of novel ligands, and of improved geometry restraint information for standard ligands. A protein crystal can contain more than just the protein and the other simple molecules that most refinement programs can interpret. An unusual molecule can be included in the refinement via eLBOW from a number of chemical inputs. The geometry can be optimised using various levels of chemical knowledge, including a semi-empirical quantum mechanical method known as AM1.

Input formats include

● SMILES string

● PDB (Protein Data Bank)

● MolFiles (V2000, V3000 and SDFiles)

● TRIPOS MOL2

● XYZ

● certain CIF files

● GAMESS input and output files

Output formats include

● PDB (Protein Data Bank)

● CIF restraint file

eLBOW contains a number of programs. All programs have been written to allow command-line control and script access to the objects and algorithms. The main program is run thus:

phenix.elbow [options] input_file.ext

or in a Python script:

from elbow.command_line import builder
molecule = builder.run("input_file.ext", **kwds)

where the options are passed as a dictionary. The return object can be interrogated for information via the class methods. Output files from both techniques include a PDB file of the final geometry and a CIF file that contains the geometry restraint information for refinement. Other files are output as appropriate, such as edits and CIF files for linking the ligand to the protein. A final file contains the serialised data of the molecule in the Python pickle format.

Examples

Using SMILES string from internal database

To run eLBOW on an internal SMILES string:

phenix.elbow --key=ATP [options]

PDB input

To run eLBOW on a PDB file (containing one molecule):

phenix.elbow input_file.pdb

To run eLBOW on a PDB file containing protein and ligands (this will only process the ligands that are unknown to phenix.refine):

phenix.elbow input_file.pdb --do-all

To run eLBOW on a PDB file specifying a residue:

phenix.elbow input_file.pdb --residue LIG

To use the atom names from a PDB file:

phenix.elbow --smiles O --template input_file.pdb

SMILES input

To run eLBOW on a SMILES string:

phenix.elbow --smiles="CCO"

or

phenix.elbow --smiles=input_file.smi

Other input

To run eLBOW on other supported input formats:

phenix.elbow input_file.ext

Geometry optimisation

eLBOW performs a simple force-field geometry optimisation by default; however, an AM1 geometry optimisation can be performed as follows:

phenix.elbow input_file.pdb --opt

To start from a specific geometry for the optimisation:

phenix.elbow --initial-geometry input_file.pdb --opt

To use a separately installed GAMESS and do a HF/3-21G geometry optimisation:

phenix.elbow input_file.pdb --gamess --basis="3-21G"

To not optimise, but use the input geometry as the final geometry:

phenix.elbow --final-geometry input_file.pdb

Hydrogen addition

eLBOW automatically adds hydrogens to the input molecule if fewer than a quarter of the possible hydrogens are present. This can be controlled using:

phenix.elbow input_file.pdb --add-hydrogens=True

A common requirement is to add hydrogens to a ligand but retain its geometry and position relative to the protein. To do so use:

phenix.elbow --final-geometry=input_file.pdb
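The quarter-of-possible-hydrogens rule above amounts to a simple threshold test; the sketch below is illustrative only (the helper name and exact comparison are assumptions, not eLBOW internals):

```python
# Toy sketch of the automatic hydrogen-addition heuristic described above
# (not eLBOW source): hydrogens are added when fewer than a quarter of the
# chemically possible hydrogens are already present in the input.
def should_add_hydrogens(n_hydrogens_present, n_hydrogens_possible):
    return n_hydrogens_present < 0.25 * n_hydrogens_possible

# A ligand read in with 1 of its 12 hydrogens triggers addition;
# one read in with 10 of 12 is left alone.
print(should_add_hydrogens(1, 12), should_add_hydrogens(10, 12))
```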

Output

To choose the base name of the output files:

phenix.elbow input_file.pdb --output="output"

To change the three-letter ID:

phenix.elbow input_file.pdb --id=NEW

To change other attributes:

phenix.elbow input_file.pdb --pdb-assign "resSeq=3 b=100"

Some of the attributes:

Residue name : resname

Chain ID : chain, chainid

Residue sequence ID : resseq, resid

Alternative location ID : altid, altloc

Insert code : icode

Occupancy : occ, occupancy

Temperature factor : b, tempfactor

Segment ID : segid, segID

To output MOL2 format:

phenix.elbow input_file.pdb --tripos

To output PDB ligand format:

phenix.elbow input_file.pdb --pdb-ligand

Additional programs

● phenix.get_smiles

● phenix.get_pdb

● phenix.metal_coordination : Generate edits for metal coordination

● phenix.link_edits : Generate edits from PDB LINK records

● phenix.print_sequence

● elbow.become_expert

● elbow.become_novice

● elbow.compare_two_molecules

● elbow.join_cif_files

● elbow.join_pdb_files

● elbow.join_mol2_files

● elbow.check_residues_against_monomer_lib

● elbow.defaults : Generate an eLBOW defaults file

Literature

Additional information

Novice options

Option                  Default & choices                          Description of inputs and uses
--version               None                                       show program's version number and exit
--help                  None                                       show this help message and exit
--long-help             None                                       show even more help and exit
--smiles                ""                                         use the passed SMILES
--file                  ""                                         use file for chemical input
--msd                   False                                      get SMILES using MSDChem code
--key                   ""                                         use SMILES from smilesDB for chemical input
--keys                  False                                      display smiles DB
--chemical-component    None                                       build ligand from chemical components (PDB)
--pipe                  False                                      read input from standard in
--residue               ""                                         use only this residue from the PDB file
--chain                 ""                                         use only this chain from the PDB file
--all-residues          None                                       retain all residues in a PDB file
--name                  ""                                         name of ligand to be used in various output files
--sequence              ""                                         use sequence (limited to 20 residues and no semi-empirical optimisation)
--read-only             None                                       read the input but don't do any processing
--opt                   False                                      use the best optimisation method available (currently AM1)
--template              ""                                         use file for naming of atoms, e.g. PDB file
--mopac                 False                                      use MOPAC for quantum chemistry calculations (requires MOPAC be installed)
--gamess                False                                      use GAMESS for quantum chemistry calculations (requires GAMESS be installed)
--qchem                 False                                      use QChem for quantum chemistry calculations (requires QChem be installed)
--gaussian              False                                      use Gaussian for quantum chemistry calculations (requires Gaussian be installed)
--final-geometry        None                                       use this file to obtain the final geometry
--initial-geometry      None                                       use this file to obtain the initial geometry for QM
--energy-validation     None                                       calculate the difference between starting and final energies
--restart               False                                      restart the optimisation with the lowest previous geometry
--opt-steps             60, "positive integer"                     optimisation steps (currently for ELBOW opt only)
--opt-tol               default, loose, tight                      optimisation tolerance = loose, default or tight
--chiral                retain, both, enumerate                    treatment of chiral centres = retain (default), both, enumerate
--ignore-chiral         False                                      ignore the chirality in the SMILES string
--skip-cif-molecule     False                                      ignore ligands in supplied CIF file(s)
--memory                1Gb, "positive integer", "n Gb", "n Mb"    maximum memory, mostly for quantum methods
--method                "AM1"                                      run QM optimisation with this method, if possible
--basis                 "AM1"                                      run QM with this basis, if possible
--aux-basis             None                                       run QM with this auxiliary basis, if possible
--random-seed           None                                       random number seed
--quiet                 False                                      less print out
--silent                False                                      almost complete silence
--view                  None                                       viewing software command
--reel                  False                                      fire up the restraints editor
--pymol                 False                                      use PyMOL from the PHENIX install to view geometries
--overwrite             False                                      clobber any existing output files
--bonding               None                                       file that specifies the bonding of the input molecule
--id                    "LIG"                                      three-letter code used in the CIF output
--xyz                   False                                      output is also written in XYZ format
--tripos                False                                      output is also written in TRIPOS format
--sdf                   False                                      output is also written in SDF format
--pdb-ligand            None                                       output is also written in PDB ligand format
--output                "algorithm determination"                  name for output files
--pickle                False                                      use a pickle file to reload the topological information
--do-all                None                                       process all molecules in a PDB, TRIPOS or SDF file
--clean                 False                                      DELETES "unnecessary" output files (dangerous)
--pdb-assign            ""                                         set the atom attributes in the PDB file
--heme                  None                                       attempt to match HEME groups (experimental)
--add-hydrogens         "algorithm determination", True, False     override the automatic hydrogen addition

Expert options

Option                       Default & choices                 Description of inputs and uses
--newton-raphson             None                              use Newton-Raphson optimisation
--gdiis                      False                             use GDIIS optimisation
--quicca                     False                             use QUICCA optimisation
--user-opt                   None                              use a user-defined program for quantum chemistry calculations
--user-opt-input-filename    ""                                input filename
--user-opt-xyz2input         ""                                converts xyz file to QM program input
--user-opt-xyz-filename      ""                                xyz filename
--user-opt-script-filename   ""                                run script filename
--user-opt-program           ""                                QM optimisation program run script or program invocation command
--user-opt-output-filename   ""                                output filename
--user-opt-output2xyz        ""                                converts QM program output to xyz file
--write-hydrogens            True, False                       override the automatic writing of hydrogens to PDB and CIF files
--auto-bond-cutoff           2.0, "float between 0.5 and 3"    set the max bond length for auto bond detection
--write-redundant-dihedrals  None                              control the writing of redundant dihedrals


Restraints Editor Exclusively Ligands (REEL)

Author

Purpose

Screen Shots

General Procedure

Input

Editing

Examples

Restraints Editor Exclusively Ligands (REEL)

Author

Nigel W. Moriarty

Purpose

Edit the geometry restraints of a ligand using a Graphical User Interface (GUI) including a 3D view of the ligand and a tabular view of the restraints.

Screen Shots


General Procedure

The general procedure is to load a restraints file (CIF) and manipulate the restraints via the table or molecule view. The geometry of the revised restraints can be tested using the File->Guesstimate option.

The final restraints can be saved to a CIF file for use with phenix.refine. The corresponding PDB file can also be saved.

Input

Restraints can be loaded into REEL using the command line, the pull-down menu to open a file, or a pull-down menu to run eLBOW. Restraints files from eLBOW contain both the restraints and the cartesian coordinates. For the purposes of REEL, the coordinates are generated from the restraints and cannot be edited directly. The background colour is light steel blue to show that they are not used in the geometry editing actions. REEL can load a molecule geometry from any format that eLBOW can read, including PDB, MOL2 and SDF. If the file does not contain bonding information, the bonding is automatically determined using proximity. The limit on the size of molecule for which bonding is automatically determined is set at 200 atoms. Molecules of up to 2000 atoms can be loaded using the --view option, but only the bond connectivity is determined; the bond order is set to one but can be changed interactively. Molecules can be loaded into REEL using the --reel option for eLBOW or using the eLBOW GUI dialog available in REEL.
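The proximity rule mentioned above can be sketched as a pairwise distance check. This is an illustrative toy, not REEL's actual routine; the 2.0 Å cutoff mirrors eLBOW's --auto-bond-cutoff default:

```python
import itertools
import math

# Toy sketch of proximity-based bond detection (not REEL's implementation):
# any pair of atoms closer than the cutoff is taken to be bonded; the bond
# order would then start at one, as described above.
def detect_bonds(coords, cutoff=2.0):
    """coords: {atom_name: (x, y, z)} -> list of bonded name pairs."""
    return [(a, b)
            for (a, pa), (b, pb) in itertools.combinations(coords.items(), 2)
            if math.dist(pa, pb) < cutoff]

# Three carbons in a chain: adjacent pairs bond, the 1-3 pair does not.
chain = {"C1": (0.0, 0.0, 0.0), "C2": (1.5, 0.0, 0.0), "C3": (3.0, 0.0, 0.0)}
print(detect_bonds(chain))  # [('C1', 'C2'), ('C2', 'C3')]
```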


Editing

The geometry restraints (bonds, angles, dihedrals, planes and chirals) are the driving coordinates in this editor. The cartesian coordinates are displayed in the atoms table only because they are generated for the viewer display. They are the driven coordinates and are therefore ignored if changed in the editor.

Many cells in the table view of the restraints can be changed, but care must be taken to know whether the changes are local or global. For example, changing an atom name in the bonds table view will only change it in that row. If you wish to change the name of the atom in all the restraints, you should use the right mouse menu in the viewer window or change it in the atoms table view. The colour of a cell indicates whether the changes are propagated elsewhere. The cells, in some cases, have not been made read-only, to allow the user to make changes as desired. Clicking an atom in the molecule view will highlight the various related topological elements in the table view, and vice versa. Use the checkbox in the table view to remove a restraint from the optimisation and the output file. Chiral centres can be changed in the table view.

Examples

To load a previously created restraints file:

phenix.reel atp.cif

To load a restraints file into REEL from eLBOW:

phenix.elbow --smiles O --reel

To load all the unknown ligands from a PDB file:

phenix.reel model.pdb --do-all

or a single residue:

phenix.reel model.pdb --residue ATP


ReadySet!

Author

Purpose

General Procedure

Ligand hydrogen addition

Metal coordination

Neutron exchange addition

ReadySet!

Author

Nigel W. Moriarty

Purpose

ReadySet! is a program designed to prepare a PDB file for refinement, as in "ReadySet! Refine!!!". It will add hydrogens to the protein model using phenix.reduce and to the ligands using eLBOW. The appropriate restraints are also written to disk. Hydrogens can also be added to water molecules. Deuterium atoms can be added to facilitate dual X-ray/neutron refinement. Metal coordination files are also generated.

General Procedure

Ligand hydrogen addition

Including hydrogens in a refinement leads to better models. ReadySet! will add hydrogens to the ligands using eLBOW and the PDB Chemical Components database. The input PDB file is divided into 'standard' residues, including the standard amino acids and RNA/DNA bases. The other residues (usually ligands) are tested, using the three-letter codes and atomic names, against the PHENIX monomer library and the PDB Chemical Components database.

If the ligand is determined to be in the PHENIX monomer library, then the hydrogens are added with the atom naming from the library. This is done using a SMILES string taken from the PDB Chemical Components database and the atom names from the monomer library. In this case, the hydrogens are added to the output PDB file but no restraints are written, because phenix.refine will use the library restraints.

If the ligand is determined to be in the PDB Chemical Components database, the SMILES string and the atom names are used to generate a molecule that represents the ligand. The atomic naming is determined using either the version 2 or version 3 PDB names. The restraints are written to disk.

If no match is found in the PHENIX monomer library or the PDB Chemical Components database, the residue atoms are used to generate the ligand. The restraints are written to disk.

Once there is a ligand representation including hydrogens, the ligand must be included in the output.


For each copy of the ligand in the model, the representation is pruned to match the number of non-hydrogen atoms and overlaid onto the ligand orientation. Hydrogens are added in an optimised geometry for each copy of the ligand.

Covalently bound ligands are handled and two files, the CIF link restraints file and the atom selection file, are output.

Metal coordination

Any metals in the model are coordinated, and the results are output as "edits" for phenix.refine. The distances and angles found in the PDB file are used in the output.

Neutron exchange addition

Deuteriums are added to amino acids that have exchangeable sites. The hydrogens are placed in alternative location "A" and the corresponding deuteriums are placed in "B".
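The alternate-location scheme just described can be sketched as follows; this is an illustration with simplified record tuples and an assumed 50/50 H/D occupancy split, not ReadySet! code:

```python
# Hypothetical sketch of the exchangeable-site scheme described above
# (not ReadySet! code): the hydrogen goes in alternate location "A" and a
# deuterium with the complementary occupancy goes in "B" at the same site.
def exchangeable_site(atom_name, xyz, h_occupancy=0.5):
    return [(atom_name, "A", "H", xyz, h_occupancy),
            (atom_name, "B", "D", xyz, 1.0 - h_occupancy)]

for record in exchangeable_site("H", (1.0, 2.0, 3.0)):
    print(record)
```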


phenix.reduce: tool for adding hydrogens to a PDB model

Purpose

phenix.reduce is a command-line tool for adding hydrogens to a PDB structure file. Hydrogens are added in standardized geometry with optimization of the orientations of OH, SH, NH3+, Met methyls, Asn and Gln sidechain amides, and His rings. Both proteins and nucleic acids can be processed. HET groups can also be processed as long as the atom connectivity is provided. The program is described in Word, et al. (1999). J. Mol. Biol. 285, 1733-1745. For more information visit: http://kinemage.biochem.duke.edu/software/reduce.php

How to run

phenix.reduce is run from the command line:

% phenix.reduce [pdb_file] [options]

To get information about command-line options type:

% phenix.reduce

or for a longer list:

% phenix.reduce -h

Hydrogens in refinement

Please refer to the phenix.refine documentation to see how hydrogen atoms are used in structure refinement.


Phaser-2.1

Reference

Tutorials and Example Files

Bug Reports

General Strategy for Automated Molecular Replacement

How to Define Models

Building an Ensemble from Coordinates

How to Define Composition

Composition by Molecular Weight

Composition by Sequence

How to Select Peaks

Select by Percent

Select by Z-Score

Select by Number

Select All

Has Phaser Solved It?

What to do in difficult cases

Flexible Structure

Poor or Incomplete Model

High Degree of Non-crystallographic Symmetry

Pseudo-translational Non-crystallographic Symmetry

What not to do

Other suggestions

Reference

A.J. McCoy, R.W. Grosse-Kunstleve, P.D. Adams, M.D. Winn, L.C. Storoni and R.J. Read. Phaser crystallographic software. J. Appl. Cryst. (2007). 40, 658-674.

Tutorials and Example Files

We thank Mike James and Natalie Strynadka for the BETA-BLIP test case diffraction data. Reference: Strynadka, N.C.J., Jensen, S.E., Alzari, P.M. & James, M.N.G. (1996) Nat. Struct. Biol. 3, 290-297. We thank Paul Adams for the Insulin test case diffraction data. Reference: Adams (2001) Acta Cryst. D57, 990-995.

Bug Reports

We apologize for the bugs. Please send bug reports to [email protected]

General Strategy for Automated Molecular Replacement

Automated Molecular Replacement in Phaser combines the anisotropy correction, likelihood-enhanced fast rotation function, likelihood-enhanced fast translation function, packing and refinement modes for multiple search models and a set of possible spacegroups. The phenix AUTO_MR wizard runs Phaser in default mode and allows some key changes to the default mode which may give structure solution in more difficult cases. Experience has shown that most structures that can be solved by Phaser can be solved by relatively simple strategies. However, if the AUTO_MR wizard doesn't give a solution even with non-default input, you need to run Phaser outside the wizard to access the full range of Phaser control options. Details of how to run Phaser using keyword input or from Python scripts are found at the Phaser home page.

How to Define Models

Phaser must be given the models that it will use for molecular replacement. A model in Phaser is referred to as an "ensemble", even when it is described by a single file. This is because it is possible to provide a set of aligned homologous structures as an ensemble, from which a statistically-weighted averaged model is calculated. A molecular replacement model is provided either as one or more aligned pdb files, or as an electron density map, entered as structure factors in an mtz file. Each ensemble is treated as a separate type of rigid body to be placed in the molecular replacement solution. An ensemble should only be defined once, even if there are several copies of the molecule in the asymmetric unit. Fundamental to the way in which Phaser uses MR models (either from coordinates or maps) is to estimate how the accuracy of the model falls off as a function of resolution, represented by the Sigma(A) curve. To generate the Sigma(A) curve, Phaser needs to know the RMS coordinate error expected for the model and the fraction of the scattering power in the asymmetric unit that this model contributes. If fp is the fraction scattering and RMS is the rms coordinate error, then

Sigma(A) = SQRT{fp*[1 - fsol*exp(-Bsol*(sin(theta)/lambda)^2)]} * exp{-(8*Pi^2/3)*RMS^2*(sin(theta)/lambda)^2}

where fsol (default = 0.95) and Bsol (default = 300 Å^2) account for the effects of disordered solvent on the completeness of the model at low resolution.
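The behaviour of this curve is easy to explore numerically. The sketch below evaluates the Sigma(A) formula above in Python; the function name and the example fp/RMS values are illustrative only, not part of Phaser's API:

```python
import math

def sigma_a(stol, fp, rms, fsol=0.95, bsol=300.0):
    """Sigma(A) at a given sin(theta)/lambda (stol, in 1/Angstrom), per the
    formula above. fp is the fraction of the scattering contributed by the
    model, rms its expected coordinate error (Angstroms); fsol and bsol are
    the stated defaults for the disordered-solvent correction."""
    s2 = stol * stol
    completeness = fp * (1.0 - fsol * math.exp(-bsol * s2))
    return math.sqrt(completeness) * math.exp(-(8.0 * math.pi ** 2 / 3.0) * rms ** 2 * s2)

# Sigma(A) falls off with resolution; evaluate at 10 A, 4 A and 2 A
# (stol = 1/(2d)) for a model with fp = 0.9 and RMS = 1.0 A.
for d in (10.0, 4.0, 2.0):
    print("d = %4.1f A  Sigma(A) = %.3f" % (d, sigma_a(1.0 / (2.0 * d), fp=0.9, rms=1.0)))
```

Note how a 1 Å coordinate error already suppresses Sigma(A) strongly at 2 Å resolution, which is why an overestimated RMS can hide the signal.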

Building an Ensemble from Coordinates

If you have an NMR ensemble as a model, there is no need to split the coordinates in the pdb file, provided that the models are separated by MODEL and ENDMDL cards. In this case the homology is not a good indication of the similarity of the structural coordinates to the target structure. You should use the RMS option; several test cases have succeeded where the ID was close to 100% with an RMS value of about 1.5Å (see table below). The RMS deviation is entered directly, or indirectly via the sequence identity (ID) using the formula RMS = max(0.8, 0.4*exp(1.87*(1.0-ID))), where ID is the fraction identity. The RMS deviation estimated from ID may be an underestimate of the true value if there is a slight conformational change between the model and target structures. To find a solution in these cases it may be necessary to increase the RMS from the default value generated from the ID, by say 0.5 Ångstroms. On the other hand, when Phaser succeeds in solving a structure from a model with sequence identity much below 30%, it is often found that the fold is preserved better than the average for that level of sequence identity. So it may be worth submitting a run in which the RMS error is set at, say, 1.5, even if the sequence identity is low. The table below can be used as a guide to the default RMS value corresponding to ID.
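The default-RMS formula can be checked directly against the table that follows; a minimal Python rendering (the function name is illustrative, not Phaser's API):

```python
import math

def default_rms(seq_identity):
    """Phaser's default RMS coordinate error estimated from fractional
    sequence identity: RMS = max(0.8, 0.4*exp(1.87*(1.0 - ID)))."""
    return max(0.8, 0.4 * math.exp(1.87 * (1.0 - seq_identity)))

# Reproduce a few rows of the guide table
for pct in (100, 63, 50, 30, 0):
    print("ID %3d%%  RMS %.2f A" % (pct, default_rms(pct / 100.0)))
```

The max() clamp is why the default RMS is flat at 0.80 Å for identities of 64% and above.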

Sequence ID     RMS deviation
100%            0.80Å
64%             0.80Å
63%             0.799Å
50%             1.02Å
40%             1.23Å
30%             1.48Å
20%             1.78Å
0% (limit)      2.60Å

If you construct a model by homology modelling, remember that the RMS error you expect is essentially the error you expect from the template structure (if not worse!). So specify the sequence identity of the template, not of the homology model.

How to Define Composition

The composition defines the total amount of protein and nucleic acid that you have in the asymmetric unit, not the fraction of the asymmetric unit that you are searching for.

Composition by Molecular Weight

The composition is calculated from the molecular weight of the protein and nucleic acid assuming the protein and nucleic acid have the average distribution of amino acids and bases. If your protein or nucleic acid has an unusual amino acid or base distribution the composition should be entered by sequence. You can mix compositions entered by molecular weight with those entered by sequence.

Composition by Sequence

The composition is calculated from the amino acid sequence of the protein and the base sequence of the nucleic acid in fasta format.
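The composition is what lets Phaser work out the fraction of the asymmetric-unit scattering (the fp in the Sigma(A) formula earlier) that a given model contributes. A crude sketch of that relationship from molecular weights alone (fraction_scattering is an illustrative helper; it ignores the slightly different scattering power per dalton of protein versus nucleic acid):

```python
def fraction_scattering(model_mw, total_protein_mw, total_nucleic_mw=0.0):
    """Rough fraction of the asymmetric-unit scattering contributed by a
    model, estimated from molecular weights (an approximation: protein and
    nucleic acid are treated as scattering equally per dalton)."""
    total = total_protein_mw + total_nucleic_mw
    if total <= 0:
        raise ValueError("composition must be positive")
    return model_mw / total

# e.g. a 25 kDa search model in an asymmetric unit holding 100 kDa of protein
print(fraction_scattering(25000.0, 100000.0))  # -> 0.25
```

This is why entering too small a composition inflates the apparent fraction scattering of the model.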

How to Select Peaks

If the AUTO_MR wizard fails to find a solution with default input, a solution may be found by changing the default selection criteria for peaks from the rotation function that are carried through to the translation function. The selection criterion can be changed by choosing the "edit rarely used inputs" option in the wizard. Selection can be done in four different ways.

Select by Percent

Percentage of the top peak, where the value of the top peak is defined as 100% and the value of the mean is defined as 0%. Default cutoff is 75%. This criterion has the advantage that at least one peak (the top peak) always survives the selection. If the top solution is clear, then only the one solution will be output, but if the distribution of peaks is rather flat, then many peaks will be output for testing in the next part of the MR procedure (e.g. many peaks selected from the rotation function for testing with a translation function).

Select by Z-Score

Number of standard deviations (sigmas) over the mean (the Z-score). This is an absolute significance test. Not all searches will produce output if the cutoff value is too high (e.g. 5 sigma).

Select by Number

Number of top peaks to select. If the distribution is very flat then it might be better to select a fixed large number (e.g. 1000) of top rotation peaks for testing in the translation function.

Select All

All peaks are selected. Enables full 6 dimensional searches, where all the solutions from the rotation function are output for testing in the translation function. This should never be necessary; it would be much faster and probably just as likely to work if the top 1000 peaks were used in this way.
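The four selection criteria above can be sketched in a few lines of Python. This is a simplified illustration of the selection logic, not Phaser's code; select_peaks and its arguments are invented for this example:

```python
def select_peaks(heights, mode="percent", cutoff=75.0):
    """Select rotation-function peaks for the translation step, sketching
    the four criteria described above. heights: peak values, any order.
    In "percent" mode cutoff is a percentage (top=100%, mean=0%); in
    "sigma" mode it is a Z-score; in "number" mode it is a peak count."""
    peaks = sorted(heights, reverse=True)
    if mode == "all":
        return peaks
    if mode == "number":
        return peaks[: int(cutoff)]
    top = peaks[0]
    mean = sum(peaks) / len(peaks)
    if mode == "percent":
        # top peak = 100%, mean = 0%; the top peak always survives
        threshold = mean + (cutoff / 100.0) * (top - mean)
    elif mode == "sigma":
        sd = (sum((h - mean) ** 2 for h in peaks) / len(peaks)) ** 0.5
        threshold = mean + cutoff * sd
    else:
        raise ValueError("unknown mode: %s" % mode)
    return [h for h in peaks if h >= threshold]

peaks = [120.0, 80.0, 60.0, 55.0, 50.0, 45.0]
print(select_peaks(peaks))               # percent mode, default 75% cutoff
print(select_peaks(peaks, "sigma", 2))   # absolute significance test
print(select_peaks(peaks, "number", 3))  # fixed number of top peaks
```

Note that in "sigma" mode the returned list can be empty if the cutoff is high, matching the warning above, whereas "percent" always keeps the top peak.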

Has Phaser Solved It?


Ideally, only the number of solutions you are expecting should be found. However, if the signal-to-noise of your search is low, there will also be noise peaks in the final selection. A highly compact summary of the history of a solution is given in the annotation of a solution in the .sol file. This is a good place to start your analysis of the output. The annotation gives the Z-score of the solution at each rotation and translation function, the number of clashes in the packing, and the refined LLG. You should see that the TFZ (the translation function Z-score) is high, at least for the final components of the solution, and that the LLG (log-likelihood gain) increases as each component of the solution is added. For example, in the case of beta-blip the annotation for the single solution output in the .sol file shows these features:

SOLU SET RFZ=11.0 TFZ=22.6 PAK=0 LLG=434 RFZ=6.2 TFZ=28.9 PAK=0 LLG=986 LLG=986
SOLU 6DIM ENSE beta EULER 200.920 41.240 183.776 FRAC -0.49641 -0.15752 -0.28125
SOLU 6DIM ENSE blip EULER 43.873 80.949 117.141 FRAC -0.12290 0.29306 -0.09193
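When scripting around many Phaser runs, it can be handy to pull the per-component statistics out of such an annotation line. A quick sketch (parse_sol_annotation is a hypothetical helper based on the beta-blip line above, not part of Phaser):

```python
import re

def parse_sol_annotation(line):
    """Extract the TFZ, LLG and PAK values from a SOLU SET annotation line,
    in the order the components were placed."""
    return {
        "TFZ": [float(x) for x in re.findall(r"TFZ=([\d.]+)", line)],
        "LLG": [float(x) for x in re.findall(r"LLG=([\d.]+)", line)],
        "PAK": [int(x) for x in re.findall(r"PAK=(\d+)", line)],
    }

ann = "SOLU SET RFZ=11.0 TFZ=22.6 PAK=0 LLG=434 RFZ=6.2 TFZ=28.9 PAK=0 LLG=986 LLG=986"
stats = parse_sol_annotation(ann)
print(stats["TFZ"])  # TFZ for each placed component
print(stats["LLG"])  # LLG should increase as components are added
```

For the beta-blip example this shows exactly the pattern described: high TFZ for both components and an LLG that rises from 434 to 986.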

TF Z-score      Have I solved it?
less than 5     no
5 - 6           unlikely
6 - 7           possibly
7 - 8           probably
more than 8     definitely*

For a rotation function, the correct solution may be in the list with a Z-score under 4, and will not be found until a translation function is performed and picks out the correct solution. For a translation function the correct solution will generally have a Z-score (number of standard deviations above the mean value) over 5 and be well separated from the rest of the solutions. Of course, there will always be exceptions! *Note, in particular, that in the presence of translational NCS, pairs of similarly-oriented molecules separated by the correct translation vector will give large Z-scores, even if they are incorrect, because they explain the systematic variation in intensities caused by the translational NCS.
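The guideline table above can be expressed as a small helper for scripting (tfz_verdict is hypothetical; the thresholds come from the table and carry the same translational-NCS caveat):

```python
def tfz_verdict(tfz):
    """Rule-of-thumb reading of a translation-function Z-score, per the
    table above. Caveat: translational NCS can inflate TFZ even for
    incorrect solutions."""
    if tfz < 5:
        return "no"
    if tfz < 6:
        return "unlikely"
    if tfz < 7:
        return "possibly"
    if tfz < 8:
        return "probably"
    return "definitely"

print(tfz_verdict(22.6))  # the beta-blip example above -> "definitely"
```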

You should always at least glance through the summary of the logfile. One thing to look for, in particular, is whether any translation solutions with a high Z-score have been rejected by the packing step. By default up to 10 clashes are allowed. Such a solution may be correct, and the clashes may arise only because of differences in small surface loops. If this happens, repeat the run allowing a suitable number of clashes. Note that, unless there is specific evidence in the logfile that a high TF Z-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.

What to do in difficult cases

Not every structure can be solved by molecular replacement, but the right strategy can push the limits.

What to do when the default jobs fail depends on why your structure is difficult.

Flexible Structure

The relative orientations of the domains may be different in your crystal than in the model. If that may be the case, break the model into separate PDB files containing rigid-body units, enter these as separate ensembles, and search for them separately. Alternatively, you could try generating a series of models perturbed by normal modes. One of these may duplicate the hinge motion and provide a good single model.

Poor or Incomplete Model

Signal-to-noise is reduced by coordinate errors or incompleteness of the model. Since the rotation search has lower signal to begin with than the translation search, it is usually more severely affected. For this reason, it can be very useful to use the subsequent translation search as a way to choose among many (say 1000) orientations. Try increasing the number of clustered orientations. If that fails, try turning off the clustering feature in the save step, because the correct orientation may sit on the shoulder of a peak in the rotation function. As shown convincingly by Schwarzenbacher et al. (Schwarzenbacher, Godzik, Grzechnik & Jaroszewski, Acta Cryst. D60, 1229-1236, 2004), judicious editing can make a significant difference in the quality of a distant model. In a number of tests with their data on models below 30% sequence identity, we have found that Phaser works best with a "mixed model" (non-identical sidechains longer than Ser replaced by Ser). In agreement with their results, the best models are generally derived using more sophisticated alignment protocols, such as their FFAS protocol.

High Degree of Non-crystallographic Symmetry

If there are clear peaks in the self-rotation function, you can expect orientations to be related by this known NCS. Alternatively, you may have an oligomeric model and expect similar NCS in the crystal. First search with the oligomeric model; if this fails, search with a monomer.

Pseudo-translational Non-crystallographic Symmetry

It is frequently the case that crystallographic and non-crystallographic rotational symmetry axes are parallel. The combination generates translational NCS, in which more than one unique copy of the molecule is found in the same orientation in the crystal. This can be recognized by the presence of large non-origin peaks in the native Patterson map. If one copy of the search model can be found, then the translational NCS tells you where to place another copy. Unfortunately, the presence of translational NCS can make it difficult to solve a structure using Phaser, because the current likelihood targets do not account for the statistical effects of NCS. If there is a small difference in the orientation of the two molecules (which will show up as a reduction in the height of the non-origin Patterson peak as the resolution is increased), it may help to use data to higher resolution than the default, because the translational NCS is partially broken.

What not to do

The automated mode of Phaser is fast when Phaser finds a high Z-score solution to your problem. When Phaser cannot find a solution with a significant Z-score, it "thrashes", meaning it maintains a list of 100-1000's of low Z-score potential solutions and tries to improve them. This can lead to exceptionally long Phaser runs (over a week of CPU time). Such runs are possible because the highly automated script allows many consecutive MR jobs to be run without you having to manually set 100-1000's of jobs running and keep track of the results. "Thrashing" generally does not produce a solution: solutions generally appear relatively quickly or not at all. It is more useful to go back and analyse your models and your data to see where improvements can be made. Your system manager will appreciate you terminating these jobs. It is also not a good idea to effectively remove the packing test. Unless there is specific evidence in the logfile that a high TF Z-score solution is being rejected with a few clashes, it is much better to edit the model to remove the loops than to increase the number of allowed clashes. Packing criteria are a very powerful constraint on the translation function, and increasing the number of allowed clashes beyond the default (10) will increase the search time enormously without the possibility of generating any correct solutions that would not have otherwise been found.

Other suggestions

Phaser has powerful input, output and scripting facilities that allow a large number of possibilities for altering default behaviour and forcing Phaser to do what you think it should. However, you will need to read the information at the Phaser home page to take advantage of these facilities!


Superimposing two PDB files with superpose_pdbs

Python-based Hierarchical ENvironment for Integrated Xtallography

Documentation Home

Superimposing two PDB files with superpose_pdbs

Author(s)

Purpose

Usage

How superpose_pdbs works:

Output files from superpose_pdbs

Examples

Standard run of superpose_pdbs:

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of all superpose_pdbs keywords

Author(s)

● superpose_pdbs: Peter Zwart, Pavel Afonine, Ralf W. Grosse-Kunstleve

Purpose

superpose_pdbs is a command line tool for superimposing one PDB model on another and writing out the superimposed model.

Usage

How superpose_pdbs works:

superpose_pdbs performs a least-squares superposition of two selected parts from two pdb files. If no selection is provided for the fixed and moving models, the whole content of both input PDB files is used for superposition. If the number of atoms in the fixed and moving models is different and the models contain amino-acid residues, then a sequence alignment is performed and the matching residues (CA atoms by default; can be changed by the user) are used for superposition. Note that the selected (and/or matching) atoms are only used to find the superposition operators, while these operators are applied to the whole moving structure.
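The core least-squares fit behind such a superposition can be illustrated with the standard Kabsch algorithm. This is a numpy sketch of the general technique, not the actual PHENIX implementation:

```python
import numpy as np

def superpose(fixed, moving):
    """Least-squares superposition (Kabsch algorithm): return rotation R
    and translation t such that moving @ R.T + t best fits fixed.
    fixed, moving: (N, 3) arrays of matched atom coordinates."""
    cf, cm = fixed.mean(axis=0), moving.mean(axis=0)
    H = (moving - cm).T @ (fixed - cf)       # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against an improper rotation
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cf - cm @ R.T
    return R, t

# A rotated and shifted copy should superpose back with ~zero RMSD
rng = np.random.default_rng(0)
fixed = rng.random((10, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moving = fixed @ Rz.T + np.array([1.0, -2.0, 3.0])
R, t = superpose(fixed, moving)
fitted = moving @ R.T + t
print("RMSD after fit: %.2e" % np.sqrt(((fitted - fixed) ** 2).sum(axis=1).mean()))
```

As in superpose_pdbs, the operators (R, t) derived from the matched atoms can then be applied to every atom of the moving structure, not just the atoms used in the fit.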

Output files from superpose_pdbs

A PDB file with fitted model.

Examples

Standard run of superpose_pdbs:

Running superpose_pdbs is easy. From the command line you can type:

phenix.superpose_pdbs fixed.pdb moving.pdb

Parameters can be changed from the command line:

phenix.superpose_pdbs fixed.pdb moving.pdb selection_fixed="chain A and name CA" selection_moving="chain B and name CA"

http://phenix-online.org/documentation/superpose_pdbs.htm (1 of 3) [12/14/08 1:03:12 PM]

Possible Problems

Specific limitations and problems:

Different numbers of atoms in selection_fixed and selection_moving when no sequence alignment can be performed (the molecules contain no amino-acid residues) or the sequence alignment failed to find matching residues.

More than one model in one PDB file (separated with MODEL-ENDMDL)

Literature

Additional information

List of all superpose_pdbs keywords

-------------------------------------------------------------------------------

Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

-------------------------------------------------------------------------------

selection_fixed= None   Selection of the target atoms to fit to (optional)
selection_moving= None   Selection of the atoms that will be fit to selection_fixed (optional)
input
   pdb_file_name_fixed= None   Name of PDB file with model to fit to
   pdb_file_name_moving= None   Name of PDB file with model that will be fit to pdb_file_name_fixed
crystal_symmetry   Unit cell and space group parameters
   unit_cell= None
   space_group= None
output
   file_name= None   Name of PDB file with model that best fits to pdb_file_name_fixed
alignment   Set of parameters for sequence alignment. Defaults are good for most cases
   alignment_style= local *global
   gap_opening_penalty= 1
   gap_extension_penalty= 1
   similarity_matrix= blosum50 dayhoff *identity
   selection= peptide and name ca   Select protein atoms that will be used in superposition after sequence alignment


Density modification with multi-crystal averaging with phenix.multi_crystal_average

Python-based Hierarchical ENvironment for Integrated Xtallography

Documentation Home

Density modification with multi-crystal averaging with phenix.multi_crystal_average

Author(s)

Purpose

Usage

How phenix.multi_crystal_average works:

Output files from phenix.multi_crystal_average

Examples

Standard run of phenix.multi_crystal_average:

Run of phenix.multi_crystal_average with multiple domains:

Run of phenix.multi_crystal_average using PDB files to define the NCS asymmetric unit:

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of all multi_crystal_average keywords

Author(s)

● phenix.multi_crystal_average: Tom Terwilliger

Purpose

phenix.multi_crystal_average is a command line tool for carrying out density modification, including NCS symmetry within a crystal and electron density from multiple crystals.

Usage

How phenix.multi_crystal_average works:

The inputs to phenix.multi_crystal_average are a set of PDB files that define the NCS within each crystal and the relationships of density between crystals, structure factor amplitudes (and optional phases, FOM and HL coefficients) for each crystal, and starting electron density maps for one or more crystals. The PDB files should be composed of exactly the same set of chains, placed in a different position and orientation for each NCS asymmetric unit of each crystal. You might create these PDB files by molecular replacement starting with the same search model for each crystal. You should not refine these MR solutions; they are only used to get the NCS relationships, and these will be more reliably found if the models for all NCS asymmetric units are identical. You can break the NCS asymmetric unit into domains and place them independently. You can specify the domains by giving them unique chain IDs (or you can use the routine edit_chains.py to do this for you; see below). A separate NCS group will be created for each domain. Additionally, if your NCS asymmetric unit consists of more than one chain (A+B for example) then each chain will always be treated as a separate NCS group.

phenix.multi_crystal_average first uses the supplied PDB files to calculate NCS operators relating the NCS asymmetric unit in each crystal to all other NCS asymmetric units in that crystal and in other crystals. This is done by adding the unique chains in one crystal to each PDB file in turn, finding all the NCS relationships from all chains in that composite PDB file, and removing duplicate identity transformations. For example, suppose the NCS asymmetric unit is one chain (A, B, C, ...). Then to relate all NCS asymmetric units to the NCS asymmetric unit of crystal 0, phenix.multi_crystal_average will compare all chains in the PDB file for each crystal to the unique chain in the PDB file for crystal 0, generating one NCS operator for each chain in each crystal. In this process the unique chain (in this case the NCS asymmetric unit of crystal 0) is renamed to a unique name (usually "**") and a composite PDB file is created with this chain along with all the chains in the PDB file for the crystal being considered, and phenix.simple_ncs_from_pdb is used to find the NCS operators. The centroids of the chains defining NCS are used as centers of the regions where the NCS operator is to be applied. If the supplied PDB files have more than one domain or chain in each NCS asymmetric unit, then the domains or chains are grouped into separate NCS groups.

Once NCS operators have been identified, density modification is carried out sequentially on data from each crystal. During density modification for one crystal, the current electron density maps from all other crystals are used in generating target density for density modification, in exactly the same way as NCS-related density is normally used when only a single crystal is available. First the asymmetric unit of NCS is defined, in this case including the density in all NCS copies within the crystal being density modified as well as the density in all NCS copies in all other crystals. The asymmetric unit of NCS is the region over which the NCS operators apply. It is assumed to be identical for all NCS copies for all crystals, with orientation and position identified by the NCS operators. It is identified as the region over which all NCS copies have correlated density. If a mask for the protein/solvent boundary is supplied (by specifying "use_model_mask"), then the asymmetric unit of NCS is constrained to be within the non-solvent region of the map. Alternatively, if you request that the domains provided in your PDB files be used to define the NCS asymmetric unit (by specifying "write_ncs_domain_pdb"), then the NCS asymmetric unit (for each NCS group) is limited to the region occupied by the corresponding chains in your PDB files. Then a target density map is created for the crystal being density modified. For each NCS copy in this crystal, the average density for all other NCS copies in this and other crystals is used as a target. Finally, statistical density modification is carried out using histograms of expected density, solvent flattening, and the NCS-based target density for this crystal. The process is then repeated for all other crystals. For those crystals for which no starting phases were available, one additional step is carried out in which the target density map is used by itself to calculate a starting electron density map (using RESOLVE map-based phasing). This entire process is carried out several times, leading to electron density maps for all crystals that typically have a high level of correlation of density within all NCS copies in each crystal and between the corresponding NCS regions in different crystals.

http://phenix-online.org/documentation/multi_crystal_average.htm (1 of 6) [12/14/08 1:03:16 PM]

Output files from phenix.multi_crystal_average

denmod_cycle_1_xl_0.mtz: Density-modified map coefficients for crystal 0, cycle 1. Crystal 0 is the first crystal specified in your pdb_list, map_coeff_list, etc.

denmod_cycle_5_xl_1.mtz: Density-modified map coefficients for crystal 1, cycle 5. These map coefficients are suitable for model-building. They also contain HL coefficients that can optionally be used in refinement. As the HL coefficients contain information from all crystals they may in some cases be useful in refinement (normally you would only use experimental HL phase information in refinement, as the NCS-based information would come from your NCS restraints in refinement).

Examples

Standard run of phenix.multi_crystal_average:

Running phenix.multi_crystal_average is easy. Usually you will want to edit a small parameter file (run_multi.eff) to contain your commands like this:

# run_multi.eff commands for running phenix.multi_crystal_average
# use: "phenix.multi_crystal_average run_multi.eff"
multi {
  pdb_list = "crystal_1.pdb" "crystal_2.pdb"
  map_coeff_list = "crystal_1_map_coeffs.mtz" None
  datafile_list = "crystal_1_data.mtz" "crystal_2_data.mtz"
  datafile_labin_list = "FP=FP" "FP=F SIGFP=SIGF PHIB=PHI FOM=FOM HLA=HLA HLB=HLB HLC=HLC HLD=HLD"
  solvent_content_list = "0.43" "0.50"
  cycles = 5
}

Then you can run this with the command:

phenix.multi_crystal_average run_multi.eff

In this example we have 2 crystals. Crystal 1 has starting map coefficients in crystal_1_map_coeffs.mtz and data for FP in crystal_1_data.mtz. The contents of this crystal are represented by crystal_1.pdb. The second crystal has no starting map, has data for FP as well as PHI and HL coefficients in crystal_2_data.mtz, and its contents are represented by crystal_2.pdb. The solvent contents of the 2 crystals are 0.43 and 0.50, and 5 overall cycles are to be done. The column label strings like "FP=FP" are optional; if you say "None" instead, phenix.multi_crystal_average will guess them for you.

Run of phenix.multi_crystal_average with multiple domains:

If your PDB files have more than one NCS domain within a chain, then you may want to split the chains up into sub-chains representing the individual NCS domains. This will provide a better definition of the NCS operators when the PDB files are analyzed. You can use the jiffy "edit_chains.py" to do this. This jiffy splits your chains up into sub-chains based on the domains that you specify in "edit_chains.dat". NOTE: edit_chains.py only works if your chains have single-letter IDs. (It simply adds another character to your chain IDs to make new ones.) If you have two-letter chain IDs, then you'll have to do this another way. To use it, type:

phenix.python $PHENIX/phenix/phenix/autosol/edit_chains.py file.pdb edited_file.pdb

The file edit_chains.dat is required and should look like:

A 1 321

A 322 597

A 598 750

A 751 902

A 903 1082

B 1 58

B 424 425

B 59 101

B 343 423

B 102 342

where the letter and residue range are the chain ID and residue range for a particular domain. You should specify these for ALL chains in your PDB files (not just the unique ones).
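Based on the description above, the chain-splitting step can be sketched as follows. This is a hypothetical re-implementation for illustration; the real edit_chains.py may differ in its naming scheme and details:

```python
def read_edit_chains(lines):
    """Parse edit_chains.dat-style lines ("A 1 321") into, per chain,
    an ordered list of (start, end) residue ranges."""
    domains = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        chain, start, end = parts[0], int(parts[1]), int(parts[2])
        domains.setdefault(chain, []).append((start, end))
    return domains

def new_chain_id(chain, domain_index):
    """Append one character per domain to a single-letter chain ID
    (an assumed scheme: first domain of chain A -> "AA", second -> "AB")."""
    return chain + chr(ord("A") + domain_index)

dat = ["A 1 321", "A 322 597", "B 1 58", "B 424 425"]
domains = read_edit_chains(dat)
print(domains["A"])          # [(1, 321), (322, 597)]
print(new_chain_id("A", 1))  # AB
```

This also makes clear why two-letter input chain IDs are a problem: appending a character would produce a three-character ID, which does not fit the classic single/double-character chain field.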

Run of phenix.multi_crystal_average using PDB files to define the NCS asymmetric unit:

If you specify the parameter write_ncs_domain_pdb=True, then phenix.multi_crystal_average will write out domain-specific PDB files for each domain in your model (based on its analysis of NCS, one for each NCS group). Then it will use those domain-specific PDB files to define the region over which the corresponding set of NCS operators apply. This is generally a good idea if you have multiple domains in your structure.

Possible Problems

Specific limitations and problems:

If the NCS asymmetric unit of your crystal contains more than one chain, phenix.multi_crystal_average will consider it to have more than one domain. This limitation comes from phenix.simple_ncs_from_pdb, which assigns one NCS group to each unique chain in the NCS asymmetric unit. If you would like phenix.multi_crystal_average to consider several chains as a single NCS group, then you would need to rename your chains and residues so that all the residues in a single NCS group have the same chain name and so that residue numbers are not duplicated. Normally you do not need to do this, but if you want to use phenix.multi_crystal_average to generate phases for one crystal from another and you have more than one chain in the NCS asymmetric unit, you would have to do this.

If your NCS asymmetric unit has more than one domain (more than one chain, or else multiple domains within a chain that have different arrangements in different NCS asymmetric units) then phenix.multi_crystal_average requires that you provide map coefficients for all crystals. This is because phenix.multi_crystal_average cannot use the PDB files you provide to generate the NCS asymmetric unit directly at this point (i.e., it cannot use pdb_domain in RESOLVE). Therefore if you don't provide map coefficients for one crystal it does not have a way to individually identify the region occupied by each domain in the NCS asymmetric unit for that crystal. This isn't a problem if there are not multiple domains or chains in the NCS asymmetric unit, because the automatic method for generation of the NCS asymmetric unit can be used.

Normally you should supply PDB files defining the NCS in your crystals in which all the chains have identical sequences and conformations within each NCS copy. This is not absolutely required, however. If your PDB file contains chains that are not identical, then NCS will be estimated from the chains you provide. It may be necessary to set the parameter simple_ncs_from_pdb.maximize_size_of_groups=True to get this to work if the chains have insertions, deletions, or sequence differences.

The size of the asymmetric unit in the SOLVE/RESOLVE portion of phenix.multi_crystal_average is limited by the memory in your computer and the binaries used. The Wizard is supplied with regular-size ("", size=6), giant ("_giant", size=12), huge ("_huge", size=18) and extra_huge ("_extra_huge", size=36) versions. Larger-size versions can be obtained on request.

Literature

Rapid automatic NCS identification using heavy-atom substructures. T.C. Terwilliger. Acta Cryst. D58, 2213-2215 (2002).

Statistical density modification with non-crystallographic symmetry. T.C. Terwilliger. Acta Cryst. D58, 2082-2086 (2002).

Maximum likelihood density modification. T.C. Terwilliger. Acta Cryst. D56, 965-972 (2000).

Map-likelihood phasing. T.C. Terwilliger. Acta Cryst. D57, 1763-1775 (2001).

Additional information

List of all multi_crystal_average keywords

-------------------------------------------------------------------------------

Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

-------------------------------------------------------------------------------

include= scope phenix.command_line.simple_ncs_from_pdb.ncs_master_params

multi

verbose= True verbose output

debug= False debugging output

pdb_list= None List of PDB files, one for each crystal. These should be in

the same order as datafiles and map files. They are used to http://phenix-online.org/documentation/multi_crystal_average.htm (4 of 6) [12/14/08 1:03:16 PM]

239

Density modification with multi-crystal averaging with phenix.multi_crystal_average

identify the NCS within each crystal and between crystals. You

should create these by placing the unique set of atoms (the NCS

asymmetric unit) in each NCS asymmetric unit of each unit cell.

Normally you would do this by carrying out molecular replacement

on each crystal with the same search model.

output_file= None You can name the output file (your own path) if you like

map_coeff_list= None List of mtz files with map coefficients. At least one

crystal must have map coefficients. Use "None" for any

crystals that do not have starting maps. NOTE: If you have

multiple NCS groups then you need map coefficients for all

crystals.

map_coeff_labin_list= None list of labin lines for mtz files with map

coefficients. They look like map_coeff_labin_list="

'FP=FP PHIB=PHIM FOM=FOMM'" Put each set of labin

values inside single quotes, and the whole list

inside double quotes. You can leave out a labin

statement for a file by putting in None and the

routine will guess the column labels

datafile_list= None list of mtz files with structure factors and optional

phases and FOM and optional HL coefficients. One datafile

for each crystal to be included

datafile_labin_list= None list of labin lines for mtz files . Each one can

contain FP SIGFP [PHIB FOM] [HLA HLB HLC HLD]. They

look like this: datafile_labin_list=" 'FP=FP

SIGFP=SIGFP PHIB=PHIM FOM=FOMM'" Put each set of labin

values inside single quotes, and the whole list inside

double quotes. You can leave out a labin statement for

a file by putting in None and the routine will guess

the column labels NOTE: If you supply HL coefficients

they will be used in phase recombination. If you

supply PHIB or PHIB and FOM and not HL coefficients,

then HL coefficients will be derived from your PHIB

and FOM and used in phase recombination.

solvent_content_list= None Solvent content (0 to 1, typically 0.5) for each

crystal

cycles= 5 Number of cycles of density modification

resolution= None high-resolution limit for map calculation

temp_dir= "temp_dir" Optional temporary work directory

output_dir= "" Output directory where files are to be written

perfect_map_coeff_list= None Optional list of mtz files with perfect map

coefficients for comparison

perfect_map_coeff_labin_list= None list of labin lines for mtz files with

perfect map coefficients.

use_model_mask= False You can use the PDB files you input to define the

solvent boundary if you wish. These will partially define

the NCS asymmetric unit (by limiting it to the non-solvent

region) but the exact NCS asymmetric unit will always be

defined automatically (by the overlap of NCS-related

density). Note that this is different than the command

write_ncs_domain_pdb which defines individual regions where

NCS applies for each domain.

coarse_grid= False You can set coarse_grid in resolve

sharpen= False You can sharpen the maps or not in the density-modification

process. (They are unsharpened at the end of the process if so).

equal_ncs_weight= False You can fix the NCS weighting to equally weight all

copies.

weight_ncs= None You can set the weighting on NCS symmetry (and

cross-crystal averaging)

write_ncs_domain_pdb= None You can use the input PDB files to define NCS

boundaries. The atoms in the PDB files will be

grouped into domains during the analysis of NCS and

written out to domain-specific PDB files. (If there

is only one domain or NCS group then there will be

only one domain-specific PDB file and it will be the

same as the starting PDB file.) Then the

domain-specific PDB files will be used to define the

regions over which the corresponding NCS operators

apply. Note that this is different than the command

use_model_mask which only defines the overall solvent

boundary with your model.

mask_cycles= 1 Number of mask cycles in each cycle of density modification

dry_run= False Just read in and check parameter names
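The labin strings used by parameters such as map_coeff_labin_list and datafile_labin_list are simple KEY=COLUMN pairs separated by spaces. As an illustration only (this is not PHENIX code), a minimal parser for one such string might look like:

```python
def parse_labin(labin):
    """Parse a labin line such as 'FP=FP PHIB=PHIM FOM=FOMM'
    into a dict mapping data type -> MTZ column label."""
    pairs = {}
    for token in labin.split():
        key, _, label = token.partition("=")
        if not label:
            raise ValueError("expected KEY=COLUMN, got %r" % token)
        pairs[key] = label
    return pairs

# Example: a datafile_labin_list entry like the one documented above.
print(parse_labin("FP=FP SIGFP=SIGFP PHIB=PHIM FOM=FOMM"))
# {'FP': 'FP', 'SIGFP': 'SIGFP', 'PHIB': 'PHIM', 'FOM': 'FOMM'}
```

Note how each labin entry in the list is wrapped in single quotes precisely so that the spaces inside it survive until a parser like this sees the whole string.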


Correlation of map and model after adjusting model for origin shifts with get_cc_mtz_pdb

Author(s)

Purpose

Usage

How get_cc_mtz_pdb works:

Output files from get_cc_mtz_pdb

Examples

Standard run of get_cc_mtz_pdb:

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of all get_cc_mtz_pdb keywords

Author(s)

● get_cc_mtz_pdb: Tom Terwilliger

Purpose

get_cc_mtz_pdb is a command line tool for adjusting the origin of a PDB file using space-group symmetry so that the PDB file superimposes on a map, obtaining the correlation of model and map, and analyzing the correlation for each residue.

Usage

How get_cc_mtz_pdb works:

get_cc_mtz_pdb calculates a model map based on the supplied PDB file, then uses RESOLVE to find the origin shift (using space-group symmetry) that maximizes the correlation of this model map with a map calculated from the supplied map coefficients in an mtz file. This shift is applied to the atoms in the PDB file to create offset.pdb, and then the residue-by-residue correlation of offset.pdb with the map is analyzed. Atoms and residues that are out of density or in weak density are flagged. You can set several parameters to define how the correlations are calculated. By default, model density is calculated using the atom types, occupancies and isotropic thermal factors (B-values) supplied in the PDB file. If you specify scale=True then an overall B, as well as an increment in B-values for each atom beyond CB (for proteins), will be added to the values in the PDB file, after adjusting these parameters to maximize the map correlation.

If you specify use_only_refl_present_in_mtz=True then the model-based map will be calculated using the same set of reflections as the map calculated from your input mtz file. This reduces the contribution of missing reflections to the calculation (but the correlation is no longer the actual map-model correlation). In the calculation of the map correlation in the region of the model, the region where the model is located is defined as all points within a distance rad_max of an atom in the model. The value of rad_max is adjusted in each case to maximize this correlation. Its value is typically similar to the high-resolution limit of the map.
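The correlation reported by get_cc_mtz_pdb is an ordinary linear correlation coefficient over map grid points. The following stdlib-only Python sketch illustrates that calculation on two toy density arrays; it is not PHENIX code, and the rad_max masking and the RESOLVE origin search are omitted:

```python
import math

def map_correlation(map1, map2):
    """Linear correlation coefficient between two maps sampled
    on the same grid (here flattened to lists of density values)."""
    n = len(map1)
    m1 = sum(map1) / n
    m2 = sum(map2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(map1, map2))
    den = math.sqrt(sum((a - m1) ** 2 for a in map1) *
                    sum((b - m2) ** 2 for b in map2))
    return num / den

# Invented densities: an "observed" map and a similar "model" map.
rho_obs   = [0.1, 0.9, 0.4, 0.2, 0.8, 0.3]
rho_model = [0.0, 1.0, 0.5, 0.1, 0.9, 0.2]
print(round(map_correlation(rho_obs, rho_model), 3))
```

A well-fitting model gives a value close to 1; out-of-density residues pull their local correlation down, which is what the residue-by-residue analysis flags.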

Output files from get_cc_mtz_pdb

offset.pdb: A PDB file offset to match the origin in the mtz file.

Examples

Standard run of get_cc_mtz_pdb:

Running get_cc_mtz_pdb is easy. From the command-line you can type: phenix.get_cc_mtz_pdb map_coeffs.mtz coords.pdb

If you want (or need) to specify the column names from your mtz file, you will need to tell get_cc_mtz_pdb what FP and PHIB (and optionally FOM) are, in this format: phenix.get_cc_mtz_pdb map_coeffs.mtz coords.pdb \ labin="FP=2FOFCWT PHIB=PH2FOFCWT"

Possible Problems

Specific limitations and problems:

In versions of PHENIX up to 1.3-final, defaults were set to maximize the correlation coefficient rather than to give the correlation using the existing thermal parameters and including only the reflections present in the mtz file. These previous defaults were equivalent to using the values: scale=True use_only_refl_present_in_mtz=True

These defaults were changed so that the correlation values obtained by default in a case where no origin shifts are needed would correspond to those obtained by simply calculating (1) a map using the input map coefficients and (2) a map from the PDB file, and then determining the correlation between these maps.

Literature

Additional information

List of all get_cc_mtz_pdb keywords

-------------------------------------------------------------------------------

Legend: black bold - scope names

black - parameter names red - parameter values blue - parameter help

blue bold

- scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

------------------------------------------------------------------------------- get_cc_mtz_pdb

pdb_in= None PDB file with coordinates to evaluate

mtz_in= None MTZ file with coefficients for a map

labin= "" Labin line for MTZ file with map coefficients. This is optional

if get_cc_mtz_pdb can guess the correct coefficients for FP PHI and

FOM. Otherwise specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM where

myFP is your column label for FP

resolution= 0. high-resolution limit for map calculation

use_only_refl_present_in_mtz= False You can specify that only reflections

present in your mtz file are used in the

comparison.

scale= False If you set scale=True then get_cc_mtz_pdb applies an overall B

factor and a delta_b for each atom beyond CB.

chain_type= *PROTEIN DNA RNA Chain type (for identifying main-chain and

side-chain atoms)

temp_dir= "temp_dir" Optional temporary work directory

output_dir= "" Output directory where files are to be written

verbose= True Verbose output

quick= False Skip the residue-by-residue correlations for a quick run

debug= False Debugging output

dry_run= False Just read in and check parameter names

Correlation of two maps after accounting for origin shifts with get_cc_mtz_mtz

Author(s)

Purpose

Usage

How get_cc_mtz_mtz works:

Output files from get_cc_mtz_mtz

Examples

Standard run of get_cc_mtz_mtz:

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of all get_cc_mtz_mtz keywords

Author(s)

● get_cc_mtz_mtz: Tom Terwilliger

Purpose

get_cc_mtz_mtz is a command line tool for adjusting the origin of a map so that the map superimposes on another map, and obtaining the correlation of the two maps. The maps are calculated from map coefficients supplied by the user in two mtz files.

Usage

How get_cc_mtz_mtz works:

get_cc_mtz_mtz calculates maps based on the supplied mtz files, then uses RESOLVE to find the origin shift compatible with space-group symmetry that maximizes the correlation of the two maps. This shift is applied to the second map and the correlation of the maps is calculated. Several parameters can be set by the user to define how the correlations are calculated. By default, maps are calculated using all the reflections present (to the specified high-resolution limit, if any) in each mtz file. If you specify use_only_refl_present_in_mtz_1=True then the map calculated from your second mtz file will only include reflections that were present in your first mtz file. This removes the effects of missing reflections on the correlation. If you specify scale=True then get_cc_mtz_mtz scales the amplitudes from the second input mtz file to those in the first input mtz file, including an overall B factor and a scale factor. This reduces the effect on the correlation of differences in overall B factors between the two mtz files. If you specify keep_f_mag=False then get_cc_mtz_mtz uses amplitudes from the first input mtz file and phases and figures of merit from both to do the correlation. This removes the effect of amplitude differences on the correlation, focusing on differences in phases and figures of merit.
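Conceptually, the allowed-origin search tries each origin shift permitted by the space group and keeps the one giving the highest map correlation. A toy one-dimensional illustration in plain Python (not RESOLVE; real maps are three-dimensional and the allowed shifts depend on the space group):

```python
def correlation(a, b):
    """Linear correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) *
           sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def best_origin_shift(map1, map2):
    """Try every cyclic shift of map2 and keep the one that
    maximizes the correlation with map1."""
    n = len(map2)
    scored = [(correlation(map1, map2[s:] + map2[:s]), s) for s in range(n)]
    return max(scored)  # (best_cc, shift)

# Invented densities: the second map is the first with a shifted origin.
m1 = [0.0, 0.2, 1.0, 0.3, 0.1, 0.0]
m2 = [0.3, 0.1, 0.0, 0.0, 0.2, 1.0]
cc, shift = best_origin_shift(m1, m2)
print(shift, round(cc, 3))
```

Once the best shift is found, it is applied to the second map before reporting the final correlation, which is what get_cc_mtz_mtz does with the real space-group-allowed shifts.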

Output files from get_cc_mtz_mtz

offset.log: Log file for correlation calculation.

Examples

Standard run of get_cc_mtz_mtz:

Running get_cc_mtz_mtz is easy. From the command-line you can type: phenix.get_cc_mtz_mtz map_coeffs_1.mtz map_coeffs_2.mtz

If you want (or need) to specify the column names from your mtz file, you will need to tell get_cc_mtz_mtz what FP and PHIB (and optionally FOM) are, in this format: phenix.get_cc_mtz_mtz map_coeffs_1.mtz map_coeffs_2.mtz \ labin_1="FP=2FOFCWT PHIB=PH2FOFCWT" labin_2="FP=2FOFCWT PHIB=PH2FOFCWT"

Possible Problems

Specific limitations and problems:

Versions of phenix.get_cc_mtz_mtz up to 1.3-final used a different set of defaults, with the values: scale=True use_f_mag=False use_only_refl_present_in_mtz_1=True

These defaults were changed after version 1.3-final in order to make the results independent of the order of the mtz files and to make the default be to get the correlation of maps without manipulation.

Literature

Additional information

List of all get_cc_mtz_mtz keywords

-------------------------------------------------------------------------------

Legend: black bold - scope names

black - parameter names red - parameter values blue - parameter help

blue bold

- scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

------------------------------------------------------------------------------- get_cc_mtz_mtz

mtz_1= None MTZ file 1 with coefficients for a map

mtz_2= None MTZ file 2 with coefficients for a map

labin_1= "" Labin line for MTZ file 1 with map coefficients. This is

optional if get_cc_mtz_mtz can guess the correct coefficients for

FP PHI and FOM. Otherwise specify: LABIN FP=myFP PHIB=myPHI

FOM=myFOM where myFP is your column label for FP

labin_2= "" Labin line for MTZ file 2 with map coefficients. This is

optional if get_cc_mtz_mtz can guess the correct coefficients for

FP PHI and FOM. Otherwise specify: LABIN FP=myFP PHIB=myPHI

FOM=myFOM where myFP is your column label for FP

resolution= 0. high-resolution limit for map calculation

low_resolution= 1000. low-resolution limit for map calculation

temp_dir= "temp_dir" Optional temporary work directory

output_dir= "" Output directory where files are to be written

keep_f_mag= True If you set keep_f_mag=False then get_cc_mtz_mtz uses

amplitudes from the first input mtz file and phases and fom

from both to do the correlation. If you specify keep_f_mag=True

then the amplitudes from both files are included.

scale= False If you set scale=True then get_cc_mtz_mtz scales the

amplitudes from the second input mtz file to those in the first

input mtz, including an overall B factor and a scale factor.

use_only_refl_present_in_mtz_1= False You can specify that only reflections

present in your first mtz file are used in

the comparison. Note that this means that

the order of the files will have an effect

on the correlation coefficient

verbose= True Verbose output

debug= False Debugging output

dry_run= False Just read in and check parameter names

Rapid helix fitting to a map with find_helices_strands

Author(s)

Purpose

Usage

How find_helices_strands finds helices and strands in maps:

How find_helices_strands finds RNA and DNA helices in maps:

Output files from find_helices_strands

Examples

Standard run of find_helices_strands:

Using find_helices_strands to bootstrap phenix.autobuild:

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of all find_helices_strands keywords

Author(s)

● find_helices_strands: Tom Terwilliger

Purpose

find_helices_strands is a command line tool for finding helices and strands in a map and building a model of the parts of a structure that have regular secondary structure. It can be used for protein, RNA, and DNA.

Usage

How find_helices_strands finds helices and strands in maps:

find_helices_strands first identifies helical segments as rods of density at 5-8 A. Then it identifies helices at higher resolution keeping the overall locations of the helices fixed. Then it identifies the directions and CA positions of helices by noting the helical pattern of high-density points offset slightly along the helix axis from the main helical density (as used in "O" to identify helix direction). Finally model helices are fit to the density using the positions and orientations identified in the earlier steps. A similar procedure is used to identify strands. Then the helices and strands are combined into a single model.

How find_helices_strands finds RNA and DNA helices in maps:

find_helices_strands finds RNA and DNA helices differently than it finds helices in proteins. It uses a convolution search to find places in the asymmetric unit where an A-form RNA or B-form DNA helix can be placed. These are assembled into contiguous helical segments if possible. The resolution of this search is 4.5 A if you have resolution beyond 4.5 A, and the resolution of your data otherwise.
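A convolution search of this kind amounts to sliding an idealized template density through the map and scoring the overlap at each position. A much-simplified one-dimensional sketch in plain Python (the real search is three-dimensional and also samples helix orientations):

```python
def convolution_search(density, template):
    """Score template placement at each offset by the dot product
    of the template with the underlying density window; return the
    best offset and its score."""
    scores = []
    for offset in range(len(density) - len(template) + 1):
        window = density[offset:offset + len(template)]
        scores.append(sum(t * d for t, d in zip(template, window)))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Invented numbers: a density trace with one strong feature, and a
# toy "idealized helix" template.
density  = [0.1, 0.0, 0.9, 1.0, 0.8, 0.1, 0.0]
template = [0.9, 1.0, 0.8]
pos, score = convolution_search(density, template)
print(pos, round(score, 2))
```

High-scoring placements that overlap are then merged, which corresponds to the assembly of contiguous helical segments described above.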


Output files from find_helices_strands

If you run find_helices_strands with my_map.mtz then you will get my_map.mtz_helices_strands.pdb, which is a PDB file containing helices from your structure.

Examples

Standard run of find_helices_strands:

Running find_helices_strands is easy. From the command-line you can type: phenix.find_helices_strands map_coeffs.mtz quick=True

If you want a more thorough run, then skip the "quick=True" flag. If you want (or need) to specify the column names from your mtz file, you will need to tell find_helices_strands what FP and PHIB are, in this format: phenix.find_helices_strands map_coeffs.mtz \ labin="LABIN FP=2FOFCWT PHIB=PH2FOFCWT"

If you want to specify a sequence file, then in the last step find_helices_strands will try to align your sequence with the map and model: phenix.find_helices_strands map_coeffs.mtz seq_file=seq.dat

Using find_helices_strands to bootstrap phenix.autobuild:

If you run phenix.autobuild at low resolution (3.5 A or lower) then your model may have strands built instead of helices. You can use find_helices_strands to help bootstrap autobuild model-building by providing the helical model from find_helices_strands to phenix.autobuild. Just run phenix.find_helices_strands with your best map map_coeffs.mtz. Then take the helical model map_coeffs.mtz_helices.pdb and pass it to phenix.autobuild with the keyword (in addition to your usual keywords for autobuild): consider_main_chain_list=map_coeffs.mtz_helices.pdb

Then the AutoBuild wizard will treat your helical model just like one of the models that it builds, and merge it into the model as it is being assembled.

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of all find_helices_strands keywords

-------------------------------------------------------------------------------

Legend: black bold - scope names

black - parameter names red - parameter values blue - parameter help

blue bold

- scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

------------------------------------------------------------------------------- find_helices_strands

mtz_in= None MTZ file with coefficients for a map

output_model= None Output PDB file

output_log= None Output log file name. If you want to specify a directory

to put this file in then please use "output_dir=myoutput_dir"

output_dir= None Output directory

seq_file= None Sequence file for sequence alignment

compare_file= None PDB file for comparison only

labin= "" Labin line for MTZ file with map coefficients. This is optional

if find_helices_strands can guess the correct coefficients for FP

PHI and FOM. Otherwise specify: LABIN FP=myFP PHIB=myPHI FOM=myFOM

where myFP is your column label for FP

resolution= 0. high-resolution limit for map calculation

res_convolution= 4.5 high-resolution limit for convolution calculation.

(Applies to nucleic acids only)

chain_type= *PROTEIN DNA RNA Chain type (for identifying main-chain and

side-chain atoms)

temp_dir= "temp_dir" Optional temporary work directory

helices_only= False Find only helices

strands_only= False Find only strands

use_any_side= False Use any side chain that fits density in assembly

cc_helix_min= None Minimum CC of low-res helical density to map to keep.

cc_strand_min= None Minimum CC of strand density to map to keep.

quick= False Try to find helices quickly

verbose= True Verbose output

debug= False Debugging output

dry_run= False Just read in and check parameter names

phenix.pdbtools: PDB model manipulations and statistics

List of all pdbtools keywords

Manipulations on a model in a PDB file. The operations below can be applied to the whole model or to selected parts (e.g. "selection=chain A and backbone"). See examples below.

● shaking of coordinates (random coordinate shifts)

● rotation-translation shift of coordinates

● shaking of occupancies

● set occupancies to a value

● shaking of ADP

● shifting of ADP (addition of a constant value)

● scaling of ADP (multiplication by a constant value)

● setting ADP to a given value

● conversion to isotropic ADP

● conversion to anisotropic ADP

● removal of selected parts of a model

Comprehensive model statistics

Atomic Displacement parameters (ADP) statistics:

% phenix.pdbtools model.pdb --show-adp-statistics

Geometry (stereochemistry) statistics:

% phenix.pdbtools model.pdb --show-geometry-statistics

In the absence of a CRYST1 record in the PDB file, functionality that doesn't require knowledge of the crystal symmetry is still available. To enable the full functionality, the crystal symmetry can be specified externally (e.g. via the --symmetry option).

Structure factors calculation

The total model structure factor is defined as:

Fmodel = scale * exp(-h*b_cart*ht) * (Fcalc + k_sol * exp(-b_sol*s^2) * Fmask)

where scale is the overall scale factor, h is the Miller index, b_cart is the overall anisotropic scale matrix in a Cartesian basis, Fcalc are the structure factors computed from the atomic model, k_sol is the bulk solvent density, b_sol is the smearing factor for the bulk solvent contribution, and Fmask is the solvent mask.

Add hydrogen atoms

Add H atoms to a model using phenix.reduce. All default parameters of phenix.reduce are used.

Perform model geometry regularization

Minimize a geometry target to idealize bond lengths, bond angles, planarities, chiralities, dihedrals, and non-bonded interactions.

Examples

1) Type phenix.pdbtools from the command line for instructions:

% phenix.pdbtools

2) To see all default parameters:

% phenix.pdbtools --show-defaults=all

3) Suppose a PDB model consists of three chains A, B and C and some water molecules. Remove all atoms in chain C and all waters:

% phenix.pdbtools model.pdb remove="chain C or water"

or one can achieve exactly the same result with an equivalent command:

% phenix.pdbtools model.pdb keep="chain A or chain B"

or:

% phenix.pdbtools model.pdb keep="not(chain C or water)"

or finally:

% phenix.pdbtools model.pdb remove="not(chain A or chain B)"

The result of all four equivalent commands above will be a new PDB file containing chains A and B only.
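The equivalence of these keep and remove selections is just set complementation over the atoms of the model. A toy illustration in plain Python (the chain tags here are illustrative, not real pdbtools selection syntax):

```python
# Each "atom" is tagged with the group it belongs to.
atoms = ["A"] * 3 + ["B"] * 3 + ["C"] * 2 + ["water"] * 4

def keep(atoms, wanted):
    """Keep only atoms whose tag is in the wanted set."""
    return [a for a in atoms if a in wanted]

def remove(atoms, unwanted):
    """Remove atoms whose tag is in the unwanted set."""
    return [a for a in atoms if a not in unwanted]

# remove="chain C or water" gives the same atoms as keep="chain A or chain B",
# because {C, water} is the complement of {A, B} in this model.
assert remove(atoms, {"C", "water"}) == keep(atoms, {"A", "B"})
print(keep(atoms, {"A", "B"}))
# ['A', 'A', 'A', 'B', 'B', 'B']
```

This is why keep and remove cannot be combined in one command: each already determines the other.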

Important: the commands keep and remove cannot be used simultaneously.

4) Remove all but backbone atoms and set all b-factors to 25:

% phenix.pdbtools model.pdb keep=backbone set_b_iso=25

5) Suppose a PDB model consists of three chains A, B and C and some water molecules. Remove all but backbone atoms and set b-factors to 25 for chain C atoms:

% phenix.pdbtools model.pdb keep=backbone set_b_iso=25 selection="chain C"

6) Simple Fcalc from atomic model (Fmodel = Fcalc):

% phenix.pdbtools model.pdb --f_model high_resolution=2.0

this will result in an MTZ file with a complete set of Fcalc up to 2 A resolution.

7) Compute Fmodel including bulk solvent and all other scales, request the output in CNS format, specify a label for the output Fmodel (by default it is FMODEL), set the low_resolution limit, and use the direct method of calculation (rather than FFT):

% phenix.pdbtools model.pdb high_resolution=2.0 format=cns label=FM \

low_resolution=6.0 algorithm=direct k_sol=0.35 b_sol=60 scale=3 \

b_cart='1 2 -3 0 0 0' --f_model
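The k_sol and b_sol values in example 7 enter the Fmodel expression given earlier. The following Python sketch evaluates that expression for a single reflection, treating structure factors as complex numbers and omitting the anisotropic b_cart term for simplicity; all numerical values are invented for illustration:

```python
import cmath
import math

def f_model(f_calc, f_mask, k_sol, b_sol, s_sq, scale=1.0):
    """Fmodel = scale * (Fcalc + k_sol * exp(-b_sol * s^2) * Fmask)
    for one reflection (anisotropic b_cart term omitted).
    s_sq is the squared scattering-vector term s^2."""
    return scale * (f_calc + k_sol * math.exp(-b_sol * s_sq) * f_mask)

# Toy reflection: |Fcalc| = 100 at phase 30 deg; the solvent mask
# contribution is roughly out of phase with it, as is typical.
f_calc = cmath.rect(100.0, math.radians(30.0))
f_mask = cmath.rect(20.0, math.radians(210.0))
fm = f_model(f_calc, f_mask, k_sol=0.35, b_sol=60.0, s_sq=0.01)
print(round(abs(fm), 1))
```

Because the solvent term opposes Fcalc in phase here, the bulk-solvent correction lowers the model amplitude, most strongly at low resolution where exp(-b_sol*s^2) is near 1.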

8) Compute Fcalc using neutron scattering dictionary:

% phenix.pdbtools model.pdb --f_model high_resolution=2.0 scattering_table=neutron

9) Input model can be manipulated first before structure factors calculation:

% phenix.pdbtools model.pdb --f_model high_resolution=2.0 sites.shake=1.0

10) Add H atoms to a model:

% phenix.pdbtools model.pdb --add_h output.file_name=model_h.pdb

11) Model geometry regularization:

% phenix.pdbtools model.pdb --geometry_regularization

List of all pdbtools keywords

-------------------------------------------------------------------------------

Legend: black bold - scope names

black - parameter names red - parameter values blue - parameter help

blue bold

- scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

------------------------------------------------------------------------------- modify

remove= None Selection for the atoms to be removed

keep= None Select atoms to keep

put_into_box_with_buffer= None Move molecule into center of box.

selection= None Selection for atoms to be modified

random_seed= None Random seed

adp

Scope of options to modify ADP of selected atoms

atom_selection= None Selection for atoms to be modified. Overrides

parent-level selection.

randomize= None Randomize ADP within a certain range

set_b_iso= None Set ADP of atoms to set_b_iso

convert_to_isotropic= None Convert atoms to isotropic

convert_to_anisotropic= None Convert atoms to anisotropic

shift_b_iso= None Add shift_b_iso value to ADP

scale_adp= None Multiply ADP by scale_adp

sites

Scope of options to modify coordinates of selected atoms

atom_selection= None Selection for atoms to be modified. Overrides

parent-level selection.

shake= None Randomize coordinates with mean error value equal to shake

translate= 0 0 0 Translational shift

rotate= 0 0 0 Rotational shift

euler_angle_convention= *xyz zyz Euler angles convention to be used for

rotation

occupancies

Scope of options to modify occupancies of selected atoms

randomize= None Randomize occupancies within a certain range

set= None Set all or selected occupancies to given value

output

Write out PDB file with modified model (file name is defined in

write_modified)

file_name= None Default is the original file name with the file

extension replaced by _modified.pdb .

input

pdb

file_name= None Model file(s) name (PDB)

crystal_symmetry

Unit cell and space group parameters

unit_cell= None

space_group= None

f_model

high_resolution= None

low_resolution= None

r_free_flags_fraction= None

k_sol= 0.0 Bulk solvent k_sol value

b_sol= 0.0 Bulk solvent b_sol value

b_cart= 0 0 0 0 0 0 Anisotropic scale matrix

scale= 1.0 Overall scale factor

scattering_table= wk1995 it1992 *n_gaussian neutron Choices of scattering

table for structure factors calculations

structure_factors_accuracy

algorithm= *fft direct

cos_sin_table= False

grid_resolution_factor= 1/3.

quality_factor= None

u_base= None

b_base= None

wing_cutoff= None

exp_table_one_over_step_size= None

mask

solvent_radius= 1.11

shrink_truncation_radius= 0.9

grid_step_factor= 4.0 The grid step for the mask calculation is determined as highest_resolution divided by grid_step_factor. This is considered a suggested value and may be adjusted internally based on the resolution.

verbose= 1

mean_shift_for_mask_update= 0.1 Value of overall model shift in refinement needed to update the mask.

ignore_zero_occupancy_atoms= True Include atoms with zero occupancy into

mask calculation

ignore_hydrogens= True Ignore H or D atoms in mask calculation

hkl_output

format= *mtz cns

label= FMODEL

type= real *complex

file_name= None Default is the original PDB file name with the file

extension replaced by .pdbtools.mtz or .pdbtools.cns

pdb_interpretation

link_distance_cutoff= 3

disulfide_distance_cutoff= 3

chir_volume_esd= 0.2

nonbonded_distance_cutoff= None

default_vdw_distance= 1

min_vdw_distance= 1

nonbonded_buffer= 1

vdw_1_4_factor= 0.8

translate_cns_dna_rna_residue_names= None

apply_cif_modification

data_mod= None

residue_selection= None

apply_cif_link

data_link= None

residue_selection_1= None

residue_selection_2= None

peptide_link

cis_threshold= 45

discard_psi_phi= True

omega_esd_override_value= None

rna_sugar_pucker_analysis

use= True

bond_min_distance= 1.2

bond_max_distance= 1.8

epsilon_range_not_2p_min= 155

epsilon_range_not_2p_max= 310

delta_range_2p_min= 115

delta_range_2p_max= 180

p_distance_c1_n_line_2p_max= 2.9

show_histogram_slots

bond_lengths= 5

nonbonded_interaction_distances= 5

dihedral_angle_deviations_from_ideal= 5

show_max_lines

bond_restraints_sorted_by_residual= 5

nonbonded_interactions_sorted_by_model_distance= 5

dihedral_angle_restraints_sorted_by_residual= 3

clash_guard

nonbonded_distance_threshold= 0.5

max_number_of_distances_below_threshold= 100

max_fraction_of_distances_below_threshold= 0.1

geometry_minimization

alternate_nonbonded_off_on= False

max_iterations= 500

macro_cycles= 1

show_geometry_restraints= False

Running SOLVE/RESOLVE in PHENIX

Author(s)

Purpose

Usage

Running SOLVE/RESOLVE from the command-line or in a script.

Literature

Additional information

Author(s)

SOLVE/RESOLVE: Tom Terwilliger

Purpose

SOLVE and RESOLVE can be run directly in the PHENIX environment. This feature is normally only for advanced SOLVE/RESOLVE users who want to access the keywords in SOLVE/RESOLVE directly.

Usage

Running SOLVE/RESOLVE from the command-line or in a script.

You can run solve with the command: phenix.solve

This command will set the environmental variables CCP4_OPEN, SYMOP, SYMLIB, and SOLVEDIR and will run solve. If you want to run a different size of solve, then you can specify: phenix.solve --giant

For a bigger version still, choose --huge; for the biggest, --extra_huge.

Running resolve or resolve_pattern is similar: phenix.resolve

phenix.resolve_pattern

Running solve/resolve from a command file is simple. Here is a command file to run resolve:

phenix.resolve <<EOD
hklin solve.mtz
labin FP=FP SIGFP=SIGFP PHIB=PHIB FOM=FOM HLA=HLA HLB=HLB HLC=HLC HLD=HLD
solvent_content 0.43
database 5
EOD
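The same keyword stream can be fed to resolve from a script instead of a shell here-document. A Python sketch (it assumes a working PHENIX installation with phenix.resolve on your PATH; the keyword text is exactly what the command file above contains):

```python
import subprocess

# The keyword lines, one per line, exactly as in the here-document.
keywords = "\n".join([
    "hklin solve.mtz",
    "labin FP=FP SIGFP=SIGFP PHIB=PHIB FOM=FOM HLA=HLA HLB=HLB HLC=HLC HLD=HLD",
    "solvent_content 0.43",
    "database 5",
])

def run_resolve(keyword_text, executable="phenix.resolve"):
    """Feed keyword lines to resolve on stdin, as <<EOD does in
    the shell example."""
    return subprocess.run([executable], input=keyword_text,
                          text=True, capture_output=True)

# run_resolve(keywords)  # uncomment on a machine with PHENIX installed
print(keywords.splitlines()[2])
```

Driving resolve this way makes it straightforward to loop over, say, a range of solvent_content values from one script.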


Literature

Additional information

All the solve/resolve keywords are available in the PHENIX versions of solve and resolve. See the full documentation for solve/resolve at http://solve.lanl.gov/ .

Automated ligand identification

Author(s)

Purpose

Purpose of the Resolve_ligand_identification task

Usage

How the Resolve_ligand_identification task works:

How to run the Resolve_ligand_identification task

What the Resolve_ligand_identification task needs to run:

Output files from Resolve_ligand_identification task

Examples

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of ligands in the PHENIX ligand identification library

Author(s)

Resolve_ligand_identification task: Li-Wei Hung

PHENIX GUI and PDS Server: Nigel W. Moriarty

RESOLVE: Tom Terwilliger

Purpose

Purpose of the Resolve_ligand_identification task

The Resolve_ligand_identification task carries out fitting of a library of the 200 most frequently observed ligands in the PDB to a given electron density map. The program also conducts the analysis and ranking of the ligand fitting results.

The current Resolve_ligand_identification task works with the ligand library provided with the Phenix program by default. It is also capable of fitting and ranking ligands in a custom PDB library provided by the user.

Usage

The Resolve_ligand_identification task can be run from the PHENIX GUI as a stand-alone strategy, or as a task in a multi-task strategy.

How the Resolve_ligand_identification task works:

The Resolve_ligand_identification task provides a graphical user interface allowing the user to select either (1) a data file containing crystallographic structure factor information and a PDB file with a partial model of the structure without the ligand, or (2) an MTZ file containing an electron density map of the potential ligand to be identified.

http://phenix-online.org/documentation/ligand_identification.htm (1 of 6) [12/14/08 1:03:28 PM]


The ligand fitting routine is carried out by RESOLVE as described in the LigandFit wizard documentation. The Resolve_ligand_identification task applies this fitting process to a library of the 200 most frequently observed ligands in the Protein Data Bank, then ranks and analyzes the overall fitting results. The output of the task consists of a list of the best-fitting ligands from the library. The task display provides options to view the top-ranked ligand in PyMOL with or without the electron density.

How to run the Resolve_ligand_identification task

An example 'ligand identification' strategy is located in the 'ligands' section of the Phenix strategy menu. Follow the directions and help text in the GUI.

What the Resolve_ligand_identification task needs to run:

The Resolve_ligand_identification task needs:

(1) an MTZ file containing structure factors

(2) (optional) a PDB file with your protein model without the ligand

Output files from Resolve_ligand_identification task

When you run the Resolve_ligand_identification task, the output files will be in the directory in which you started Phenix:

A summary file of the fitting results of all ligands: overall_ligand_scores.log

A summary table listing the results of the top-ranked ligands: topligand.txt

The last column "Sequence in library" contains numbers '###' indicating the sequence number of the corresponding ligands. The final fitted ligand coordinates and all the log files are in the corresponding '###' files described below.

PDB files with the fitted ligands: resolve_ligand_###.pdb

A log file with the fitting of the ligand: resolve_fit_id_###.log

A log file with the fit of the ligand to the map: resolve_cc_id_###.log

Map coefficients for the map used for fitting: resolve_map.mtz
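Because the per-ligand output files share the resolve_ligand_###.pdb naming pattern, they can be matched back to entries in topligand.txt with a short script. This is an illustrative sketch, not part of PHENIX itself:

```python
import re

def ligand_sequence_numbers(filenames):
    """Extract the '###' sequence numbers from resolve_ligand_###.pdb
    file names, so each fitted ligand can be matched to its entry in
    topligand.txt. Other files (logs, maps) are ignored."""
    pattern = re.compile(r"resolve_ligand_(\d+)\.pdb$")
    numbers = []
    for name in filenames:
        match = pattern.search(name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)
```

For example, given a directory listing containing resolve_ligand_007.pdb, resolve_fit_id_007.log, and resolve_ligand_012.pdb, the function returns [7, 12].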


Examples

An example 'ligand identification' strategy is located in the 'ligands' section of the Phenix strategy menu.

Possible Problems

Specific limitations and problems:

The current Resolve_ligand_identification task works with the ligand library provided with the Phenix program by default. It is also capable of fitting and ranking ligands in a custom PDB library provided by the user. The ligand atoms in the user-provided PDB files should be under 'HETATM' records.

For other RESOLVE-related limitations, please refer to the documentation of the LigandFit wizard.

Literature

Additional information

List of ligands in the PHENIX ligand identification library

---------------------------------------------------------------

PDB #ATOM LIG_ID

103m 6 NBN

1a99 6 PUT

1bio 6 GOL

1dc1 6 DIO

1dwk 6 OXL

1g29 6 DOX

1g8t 6 MO5

1h16 6 PYR

1k26 6 CRY

1fc5 7 MO6

1gaj 7 PEG

1l5j 7 F3S

1ad2 8 MPD

1b6i 8 HED

1cpf 8 TRS

1e42 8 DTT

1gth 8 URA

1jll 8 COA

1knp 8 SIN

1m6z 8 TMN

1nhz 8 HEZ

1o94 8 SF4

1s8l 8 LI1

1a0j 9 BEN

1amk 9 PGA

1bzy 9 POP

1d0v 9 NIO

1djr 9 BEZ

1f4l 9 MET


1gck 9 ASP

1bf3 10 PHB

1bjq 10 ADE

1dan 10 FUC

1e1d 10 FSO

1e1o 10 LYS

1e7f 10 DAO

1e7h 10 PLM

1ewk 10 GLU

1fwn 10 PEP

1i0i 10 7HP

1kjp 10 PHQ

1kwn 10 TAR

1lrj 10 PGE

1os7 10 AKG

1akd 11 CAM

1d3g 11 ORO

1f98 11 HC4

10gs 12 MES

1amu 12 PHE

1e7e 12 DKA

1f7u 12 ARG

1bj5 13 MYR

1bxh 13 AMG

1f07 13 MPO

1gcz 13 CIT

1gni 13 OLA

1h9x 13 NHE

1j4u 13 MMA

1p0z 13 FLC

1e6r 14 NAA

1gkl 14 FER

1o7v 14 NDG

1rff 14 SPM

1a5a 15 PLP

1afb 15 NGA

1ajk 15 EPE

1c9s 15 TRP

1avd 16 BTN

1bg3 16 G6P

1cnq 16 F6P

1f7s 16 LDA

1fi1 16 FTT

1jsl 16 1PE

1d1v 17 BH4

1d7c 17 1PG

1e2j 17 THM

1n2n 17 H4B

1ho5 19 ADN

1o57 19 P6G

1b4w 20 BOG

1brr 20 RET

1dnc 20 GTT

1dug 20 GSH

1ere 20 EST

1fkp 20 NVP

1hvy 20 UMP

1ldn 20 FBP


1ldn 20 OXM

1bh3 21 C8E

1d2s 21 DHT

1e2d 21 TMP

1h7f 21 C5P

1o28 21 UFP

1c3m 22 MAN-MAN

1cx4 22 CMP

1fsg 22 PRP

1gz1 22 BGC-BGC

1l4f 22 NCN

1ocj 22 BGC

1a0f 23 GTS

1aer 23 AMP

1cdg 23 MAL

1ex2 23 SUC

1gim 23 IMP

1gwv 23 LAT

1a97 24 5GP

1bir 24 2GP

1cq1 24 PQQ

1goy 24 3GP

1hk3 24 T44

1jcq 24 FPP

1ay2 25 GAL-NAG

1c3j 25 UDP

1h7l 25 TYD

1af7 26 SAH

1bfd 26 TPP

1k3l 26 GTX

1mcz 26 TDP

1ao0 27 ADP

1ao0 27 FS4

1cg1 27 IMO

1efh 27 A3P

1fpx 27 SAM

1a4r 28 GDP

1ao5 28 NAG-NAG

1b30 28 XYS-XYS-XYS

1b3v 28 XYS

1lv5 28 DCP

1opx 28 2PE

1cjk 29 FOK

1cjv 29 DAD

1g2v 29 TTP

1i52 29 CTP

1cr2 30 DTP

1ag9 31 FMN

1aq2 31 ATP

1aux 31 SAP

1b63 31 ANP

1f9h 31 APC

1gll 31 ACP

1r3k 31 DGA

1a2b 32 GSP

1b23 32 CYS

1b23 32 GNP

1ckm 32 GTP


1pj6 32 FOL

1d1g 33 MTX

1bos 34 GAL

1bos 34 GAL-GAL-GLC

1bwu 34 MAN

1byh 34 GLC-GLC

1bzw 34 GAL-GLC

1cvn 34 MAN-MAN-MAN

1e40 34 GLC-GLC-GLC

1kzj 35 CB3

1n9b 35 MA4

1ek6 36 UPG

1b0f 38 NAG-FUC-NAG

1fuj 38 NAG-FUC

1g82 38 NAG-NAG-FUC

1d7d 39 NAG-NAG-MAN

1foa 39 UD1

1nb3 39 NAG-NAG-BMA

1kby 42 SPO

106m 43 HEM

1at5 43 NAG-NAG-NAG

1e85 43 HEC

1ek6 44 NAI

1esw 44 ACR

1p9l 44 NAD

1ece 45 GLC-GLC-GLC-GLC

1c3v 48 NDP

1c3v 48 PG4

1r2c 48 7MQ

1ti7 48 NAP

1aof 49 DHE

1gsl 49 NAG-NAG-MAN-FUC

1jnd 50 NAG-NAG-MAN-MAN

1dv3 51 BCL

1dv3 51 BPH

1dv3 51 U10

1p0h 51 ACO

1fnd 53 FAD

1lsh 53 PLD

1lsh 53 UPL

1f0y 54 CAA

1okc 54 PC2

1a65 56 NAG

1bdg 56 GLC

1en2 56 NAG-NAG-NAG-NAG

1f9d 56 GLC-GLC-GLC-GLC-GLC

1aky 57 AP5

1myr 58 NAG-FUC-NAG-MAN-XYS

1fq8 61 NAG-NAG-MAN-MAN-MAN

1prc 65 BPB

1cxp 71 NAG-NAG-MAN-MAN-MAN-FUC

1deo 72 NAG-NAG-MAN-MAN-MAN-MAN

1ax0 80 NAG-FUC-NAG-MAN-XYS-MAN-MAN

1kby 81 CDL

1dio 91 B12


Finding all ligands in a map with phenix.find_all_ligands

Python-based Hierarchical ENvironment for Integrated Xtallography

Documentation Home

Finding all ligands in a map with phenix.find_all_ligands

Author(s)

Purpose

Usage

How phenix.find_all_ligands works:

Output files from phenix.find_all_ligands

Examples

Standard run of phenix.find_all_ligands:

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of all find_all_ligands keywords

Author(s)

● phenix.find_all_ligands: Tom Terwilliger

Purpose

phenix.find_all_ligands is a command-line tool for finding all the ligands in a map by repeatedly running phenix.ligandfit with a series of ligands and choosing the best-fitting one at each cycle.

Usage

How phenix.find_all_ligands works:

The basic procedure for phenix.find_all_ligands has three steps. The first is to identify the largest contiguous region of density in your map that is not already occupied by your model or by previously fitted ligands. The second is to fit each ligand (you identify the candidate ligands in advance) into this density. The third is to choose the ligand that fits the density best. The best-fitting ligand is then added to the structure, and the process is repeated until the requested number of ligands is found or the correlation of the ligand to the map drops below the value you specify (default=0.5).
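The iterative loop above can be sketched in Python. Here fit_ligand is a hypothetical stand-in for a phenix.ligandfit run, so this illustrates only the control flow, not the actual fitting:

```python
def find_all_ligands(fit_ligand, candidate_ligands,
                     number_of_ligands=5, cc_min=0.5):
    """Sketch of the find_all_ligands control flow: at each cycle fit every
    candidate ligand into the largest unoccupied density region, keep the
    best fit, and stop once enough ligands are placed or the best
    correlation coefficient drops below cc_min."""
    placed = []
    while len(placed) < number_of_ligands:
        # fit_ligand(ligand, placed) -> correlation coefficient of the fit,
        # with regions occupied by previously fitted ligands excluded
        scores = {lig: fit_ligand(lig, placed) for lig in candidate_ligands}
        best = max(scores, key=scores.get)
        if scores[best] < cc_min:
            break  # the best remaining fit is too poor; stop searching
        placed.append(best)
    return placed
```

A ligand may be placed more than once, matching the real program's behavior when several copies occupy different sites.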

Output files from phenix.find_all_ligands

The output ligand files from phenix.find_all_ligands are normally placed in the temporary directory (default='temp_dir'). They will have names such as "SITE_1_ATP.pdb" for the placement of ATP in the first site fitted.

Examples

Standard run of phenix.find_all_ligands:

Running phenix.find_all_ligands is easy. Usually you will want to edit a small parameter file (find_all_ligands.eff) to contain your commands like this, where the ligandfit commands are sent to phenix.ligandfit for the actual fitting and the find_all_ligands commands determine what searches are done:

http://phenix-online.org/documentation/find_all_ligands.htm (1 of 8) [12/14/08 1:03:33 PM]

# commands for running phenix.find_all_ligands
find_all_ligands {
  number_of_ligands = 5
  cc_min = 0.5
  ligand_list = ATP.pdb NAD.pdb
  nproc = 2
}
ligandfit {
  data = "nsf-d2.mtz"
  model = "nsf-d2_noligand.pdb"
  lig_map_type = fo-fc_difference_map
}

You might also want to add some additional commands for phenix.ligandfit. Any commands for ligandfit are allowed, except that the commands "ligand" and "input_lig_file" are ignored, as the input ligand comes from the find_all_ligands keyword "ligand_list":

# find_all_ligands.eff with more commands for ligandfit
ligandfit {
  data = "nsf-d2.mtz"
  model = "nsf-d2_noligand.pdb"
  lig_map_type = fo-fc_difference_map
  ligand_cc_min = 0.75
  verbose = Yes
}

where you can put any phenix.ligandfit commands in the braces. Then you can run this with the command:

phenix.find_all_ligands find_all_ligands.eff

Possible Problems

Specific limitations and problems:

This method uses phenix.ligandfit to do the ligand fitting, so all the commands, features, and limitations of phenix.ligandfit apply to phenix.find_all_ligands.

Literature

Additional information

NOTE: in addition to the find_all_ligands keywords shown here, all phenix.ligandfit commands are also allowed, except that the commands "ligand" and "input_lig_file" are ignored, as the input ligand comes from the find_all_ligands keyword "ligand_list".

List of all find_all_ligands keywords

-------------------------------------------------------------------------------


Legend: black bold - scope names
black - parameter names
red - parameter values
blue - parameter help
blue bold - scope help

Parameter values:
* means selected parameter (where multiple choices are available)
False is No
True is Yes
None means not provided, not predefined, or left up to the program
"%3d" is a Python style formatting descriptor

-------------------------------------------------------------------------------

find_all_ligands

number_of_ligands= None Total number of ligand sites. Ignored if "None".

find_all_ligands will keep looking until the correlation

coefficient for the fit of the best ligand is less than

cc_min or the number of ligands placed is

number_of_ligands, whichever comes first

cc_min= 0.50

Ignored if "None". find_all_ligands will keep looking until

the correlation coefficient for the fit of the best ligand is less

than cc_min or the number of ligands placed is number_of_ligands,

whichever comes first

ligand_list= None List of files with ligands to find

nproc= 1 number of processors to use

background= *Yes No run jobs in background or not

run_command= csh Command for running jobs (e.g., csh or qsub )

verbose= True *False verbose output

debug= True *False debugging output

temp_dir= Auto Optional temporary work directory

output_dir= "" Output directory where files are to be written

dry_run= False Just read in and check parameter names

ligandfit

data= None Datafile (alias for input_data_file). This can be any format if

only FP is to be read in. If phases are to be read in then MTZ format

is required. The Wizard will guess the column identification. If you

want to specify it you can say input_labels="FP" , or

input_labels="FP PHIB FOM". (Command-line only)

ligand= None File containing information about the ligand (PDB or SMILES)

(alias for input_lig_file) (Command-line only)

model= None PDB file with model for everything but the ligand (alias for

input_partial_model_file). (Command-line only)

quick= False Run as quickly as possible. (Command-line only)

special_keywords

write_run_directory_to_file= None Writes the full name of a run

directory to the specified file. This can

be used as a call-back to tell a script

where the output is going to go.

(Command-line only)

run_control

coot= None Set coot to True and optionally run=[run-number] to run Coot

with the current model and map for run run-number. In some wizards

(AutoBuild) you can edit the model and give it back to PHENIX to

use as part of the model-building process. If you just say coot

then the facts for the highest-numbered existing run will be

shown. (Command-line only)

ignore_blanks= None ignore_blanks allows you to have a command-line

keyword with a blank value like "input_lig_file_list="

stop= None You can stop the current wizard with "stopwizard" or "stop".


If you type "phenix.autobuild run=3 stop" then this will stop run

3 of autobuild. (Command-line only)

display_facts= None Set display_facts to True and optionally

run=[run-number] to display the facts for run run-number.

If you just say display_facts then the facts for the

highest-numbered existing run will be shown.

(Command-line only)

display_summary= None Set display_summary to True and optionally

run=[run-number] to show the summary for run

run-number. If you just say display_summary then the

summary for the highest-numbered existing run will be

shown. (Command-line only)

carry_on= None Set carry_on to True to carry on with highest-numbered

run from where you left off. (Command-line only)

run= None Set run to n to continue with run n where you left off.

(Command-line only)

copy_run= None Set copy_run to n to copy run n to a new run and continue

where you left off. (Command-line only)

display_runs= None List all runs for this wizard. (Command-line only)

delete_runs= None List runs to delete: 1 2 3-5 9:12 (Command-line only)

display_labels= None display_labels=test.mtz will list all the labels

that identify data in test.mtz. You can use the label

strings that are produced in AutoSol to identify which

data to use from a datafile like this: peak.data="F+

SIGF+ F- SIGF-" # the entire string in quotes counts

here You can use the individual labels from these

strings as identifiers for data columns in AutoSol and

AutoBuild like this: input_refinement_labels="FP SIGFP

FreeR_flags" # each individual label counts

dry_run= False Just read in and check parameter names

params_only= False Just read in and return parameter defaults

display_all= False Just read in and display parameter defaults

crystal_info

cell= 0.0 0.0 0.0 0.0 0.0 0.0

Enter cell parameter a b c alpha beta

gamma

resolution= 0.0

High-resolution limit. Used as resolution limit for

density modification and as general default high-resolution

limit. If resolution_build or refinement_resolution are set

then they override this for model-building or refinement. If

overall_resolution is set then data beyond that resolution

is ignored completely.

sg= None Space Group symbol (i.e., C2221 or C 2 2 21)

display

number_of_solutions_to_display= None Number of solutions to put on

screen and to write out

solution_to_display= 1 Solution number of the solution to display and

write out ( use 0 to let the wizard display the top

solution)

file_info

file_or_file_list= *single_file file_with_list_of_files Choose if you

want to input a single file with PDB or other

information about the ligand or if you want to input

a file containing a list of files with this

information for a list of ligands

input_labels= None Labels for input data columns NOTE: Applies to input

data file for LigandFit and AutoBuild, but not to AutoMR.

For AutoMR use instead 'input_label_string'.

lig_map_type= *fo-fc_difference_map fobs_map pre_calculated_map_coeffs


Enter the type of map to use in ligand fitting

fo-fc_difference_map: Fo-Fc difference map phased on

partial model fobs_map: Fo map phased on partial model

pre_calculated_map_coeffs: map calculated from FP PHIB

[FOM] coefficients in input data file

ligand_format= *PDB SMILES Enter whether the files contain SMILES

strings or PDB formatted information

general

background= True When you specify nproc=nn, you can run the jobs in

background (default if nproc is greater than 1) or

foreground (default if nproc=1). If you set

run_command=qsub (or otherwise submit to a batch queue),

then you should set background=False, so that the batch

queue can keep track of your runs. There is no need to use

background=True in this case because all the runs go as

controlled by your batch system. If you use run_command=csh

(or similar, csh is default) then normally you will use

background=True so that all the jobs run simultaneously.

base_path= None You can specify the base path for files (default is

current working directory)

clean_up= False At the end of the entire run the TEMP directories will

be removed if clean_up is True. The default is No, keep these

directories. If you want to remove them after your run is

finished use a command like "phenix.autobuild run=1

clean_up=True"

coot_name= coot If your version of coot is called something else, then

you can specify that here.

debug= False You can have the wizard stop with error messages about the

code if you use debug. NOTE: you cannot use Pause with debug.

extend_try_list= False You can fill out the list of parallel jobs to

match the number of jobs you want to run at one time,

as specified with nbatch.

extra_verbose= False Facts and possible commands will be printed every

cycle if Yes

i_ran_seed= 289564 Random seed (positive integer) for model-building

and simulated annealing refinement

ligand_id= None You can specify an integer value for the ID of a

ligand... This number will be added to whatever residue

number the ligand search model in input_lig_file has. The

keyword is only valid if a single copy of the ligand is to be

found.

max_wait_time= 100.0

You can specify the length of time (seconds) to

wait when testing the run_command. If you have a cluster

where jobs do not start right away you may need a longer

time to wait.

nbatch= 5 You can specify the number of processors to use (nproc) and

the number of batches to divide the data into for parallel jobs.

Normally you will set nproc to the number of processors

available and leave nbatch alone. If you leave nbatch as None it

will be set automatically, with a value depending on the Wizard.

This is recommended. The value of nbatch can affect the results

that you get, as the jobs are not split into exact replicates,

but are rather run with different random numbers. If you want to

get the same results, keep the same value of nbatch.

nproc= 1 You can specify the number of processors to use (nproc) and the

number of batches to divide the data into for parallel jobs.

Normally you will set nproc to the number of processors available

and leave nbatch alone. If you leave nbatch as None it will be


set automatically, with a value depending on the Wizard. This is

recommended. The value of nbatch can affect the results that you

get, as the jobs are not split into exact replicates, but are

rather run with different random numbers. If you want to get the

same results, keep the same value of nbatch.

resolve_command_list= None Commands for resolve. One per line in the

form: keyword value value can be optional

Examples: coarse_grid resolution 200 2.0 hklin

test.mtz NOTE: for command-line usage you need to

enclose the whole set of commands in double quotes

(") and each individual command in single quotes

(') like this: resolve_command_list="'no_build'

'b_overall 23' "

resolve_size= _giant _huge _extra_huge *None Size for solve/resolve

("","_giant","_huge","_extra_huge")

run_command= csh When you specify nproc=nn, you can run the subprocesses

as jobs in background with csh (default) or submit them to

a queue with the command of your choice (i.e., qsub ). If

you have a multi-processor machine, use csh. If you have a

cluster, use qsub or the equivalent command for your

system. NOTE: If you set run_command=qsub (or otherwise

submit to a batch queue), then you should set

background=False, so that the batch queue can keep track of

your runs. There is no need to use background=True in this

case because all the runs go as controlled by your batch

system. If you use run_command=csh (or similar, csh is

default) then normally you will use background=True so that

all the jobs run simultaneously.

skip_xtriage= False You can bypass xtriage if you want. This will

prevent you from applying anisotropy corrections, however.

temp_dir= None Define a temporary directory (it must exist)

title= Run 1 LigandFit Sun Dec 7 17:46:25 2008 Enter any text you like

to help identify what you did in this run

top_output_dir= None This is used in subprocess calls of wizards and to

tell the Wizard where to look for the STOPWIZARD file.

verbose= False Command files and other verbose output will be printed

input_files

existing_ligand_file_list= None You can enter a list of files with

ligands you have already fit. These will be

used to exclude that region from

consideration.

input_data_file= None Enter the file with input structure factor data

(files other than MTZ will be converted to mtz and

intensities to amplitudes)

input_lig_file= None Enter either a single file with PDB information or

a SMILES string or a file containing a list of files

with this information for a list of ligands. If you

enter a file containing a list of files you need also to

specify

"file_or_file_list=file_with_list_of_files".

If the format is not PDB, then ELBOW will generate a PDB

file.

input_ligand_compare_file= None If you enter a PDB file with a ligand in

it, the coordinates of the newly-built ligand

will be compared with the coordinates in this

file.

input_partial_model_file= None Enter a PDB file containing a model of

your structure without the ligand. This is


used to calculate phases. If you are providing

phases in your data file and have selected

"pre_calculated_map_coeffs" for map_type this

file may be left out.

non_user_parameters

get_lig_volume= False You can ask to get the volume of the ligand and

to then stop

offsets_list= 7 53 29 You can specify an offset for the orientation of

the helix and strand templates in building. This is used

in generating different starting models.

refinement

link_distance_cutoff= 3.0

You can specify the maximum bond distance for

linking residues in phenix.refine called from the

wizards.

r_free_flags_fraction= 0.1

Maximum fraction of reflections in the free R

set. You can choose the maximum fraction of

reflections in the free R set and the maximum

number of reflections in the free R set. The

number of reflections in the free R set will be

the lower of the values defined by these two

parameters.

r_free_flags_lattice_symmetry_max_delta= 5.0

You can set the maximum

deviation of distances in the

lattice that are to be

considered the same for

purposes of generating a

lattice-symmetry-unique set of

free R flags.

r_free_flags_max_free= 2000 Maximum number of reflections in the free R

set. You can choose the maximum fraction of

reflections in the free R set and the maximum

number of reflections in the free R set. The

number of reflections in the free R set will be

the lower of the values defined by these two

parameters.

r_free_flags_use_lattice_symmetry= True When generating r_free_flags you

can decide whether to include lattice

symmetry (good in general, necessary

if there is twinning).

search_parameters

conformers= 1 Enter how many conformers to create. If greater than 1,

then ELBOW will always be used to generate them. If 1 then

ELBOW will be used if a PDB file is not specified. These

conformers are used to identify allowed torsion angles for

your ligand. The alternative is to use the empirical rules

in RESOLVE. ELBOW takes longer but is more accurate.

delta_phi_ligand= 40.0

Specify the angle (degrees) between successive

tries in FFT search for fragments

fit_phi_inc= 20 Specify the angle (degrees) between rotations around

bonds

fit_phi_range= -180 180 Range of bond rotation angles to search

group_search= 0 Enter the ID number of the group from the ligand to use

to seed the search for conformations

ligand_cc_min= 0.75

Enter the minimum correlation coefficient of the

ligand to the map to quit searching for more

conformations

ligand_completeness_min= 1.0

Enter the minimum completeness of the

ligand to the map to quit searching for more


conformations

local_search= True If local_search is Yes, then only the region within

search_dist of the point in the map with the highest local

rmsd will be searched in the FFT search for fragments

n_group_search= 3 Enter the number of different fragments of the ligand

that will be looked for in FFT search of the map

n_indiv_tries_max= 10 If 0 is specified, all fragments are searched at

once otherwise all are first searched at once then

individually up to the number specified

n_indiv_tries_min= 5 If 0 is specified, all placements of a fragment are

tested at once otherwise all are first tested at once

then individually up to the number specified

number_of_ligands= 1 Number of copies of the ligand expected in the

asymmetric unit

search_dist= 10.0

If local_search is Yes, then only the region within

this distance of the point in the map with the highest

local rmsd will be searched in the FFT search for fragments

use_cc_local= False You can specify the use of a local correlation

coefficient for scoring ligand fits to the map. If you do

not do this, then the region over which the ligand is

scored are all points within 2.5 A of the atoms in the

ligand. If you do specify use_cc_local, then the region

over which the ligand is scored are all these points, plus

all the contiguous points that have density greater than

0.5 * sigma .

search_target

ligand_near_chain= None You can specify where to search for the ligand

either with search_center or with ligand_near_res and

ligand_near_chain. If you set

ligand_near_chain="None" or leave it blank or do not

set it, then all chains will be included. The

keywords ligand_near_res and ligand_near_chain refer

to residue/chain in the file defined by

input_partial_model_file (or model if running from

command line).

ligand_near_pdb= None You can specify where LigandFit should look for

your ligands by providing a PDB file containing one or

more copies of the ligand. If you want you can provide

a PDB file with ligand+ macromolecule and specify the

ligand name with name_of_ligand_near_pdb.

ligand_near_res= None You can specify where to search for the ligand

either with search_center or with ligand_near_res and

ligand_near_chain The keywords ligand_near_res and

ligand_near_chain refer to residue/chain in the file

defined by input_partial_model_file (or model if

running from command line).

name_of_ligand_near_pdb= None You can specify where LigandFit should

look for your ligands by providing a PDB file

containing one or more copies of the ligand. If

you want you can provide a PDB file with

ligand+ macromolecule and specify the ligand

name with name_of_ligand_near_pdb.

search_center= 0.0 0.0 0.0

Enter coordinates for center of search region

(ignored if [0,0,0])


Mapping one PDB file onto another using space-group symmetry with phenix.map_to_object

Python-based Hierarchical ENvironment for Integrated Xtallography

Documentation Home

Mapping one PDB file onto another using space-group symmetry with phenix.map_to_object

Author(s)

Purpose

Usage

How phenix.map_to_object works:

Examples

Standard run of phenix.map_to_object:

Run of phenix.map_to_object specifying center of mass of moving PDB is to be close to any atom of fixed PDB:

Run of phenix.map_to_object specifying center of mass of moving PDB is to have maximum number of contacts with atoms of fixed PDB:

Run of phenix.map_to_object searching over additional unit cells

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of all map_to_object keywords

Author(s)

● phenix.map_to_object: Tom Terwilliger

Purpose

phenix.map_to_object is a command line tool for applying a rotation and translation consistent with space-group symmetry to a PDB file in order to bring its atoms close to those in a second PDB file.

Usage

How phenix.map_to_object works:

phenix.map_to_object searches over each equivalent position in the unit cell and neighboring unit cells to find the one that places the moving_pdb atoms closest to those in fixed_pdb. You can choose to minimize the distance between the center of mass of the PDB files, or you can minimize the distance between the closest atoms, or you can maximize the number of close contacts.
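The search over equivalent positions can be illustrated with a small standalone sketch (not PHENIX code). The sym_ops here are hand-written Cartesian (rotation, translation) pairs for an orthogonal cell, a deliberate simplification of real space-group handling, and only the center-of-mass criterion is shown:

```python
import itertools
import math

def best_mapping(moving_com, fixed_com, sym_ops, cell, extra_cells=1):
    """Toy version of the map_to_object search: try every symmetry operator
    combined with every unit-cell translation within extra_cells neighbors,
    and keep the one that places the moving center of mass closest to the
    fixed center of mass. Returns (distance, rotation, net_translation)."""
    def apply(rot, x):
        # multiply a 3x3 rotation matrix by a 3-vector
        return tuple(sum(rot[i][j] * x[j] for j in range(3)) for i in range(3))
    best = None
    shifts = range(-extra_cells, extra_cells + 1)
    for rot, trans in sym_ops:
        rotated = apply(rot, moving_com)
        for ijk in itertools.product(shifts, repeat=3):
            cell_shift = tuple(ijk[d] * cell[d] for d in range(3))
            pos = tuple(rotated[d] + trans[d] + cell_shift[d] for d in range(3))
            dist = math.dist(pos, fixed_com)
            if best is None or dist < best[0]:
                best = (dist, rot,
                        tuple(trans[d] + cell_shift[d] for d in range(3)))
    return best
```

With an identity operator and an inversion operator in a 10 A cubic cell, a moving center of mass at (1, 1, 1) maps exactly onto a fixed one at (9, 9, 9) via the inversion plus a (10, 10, 10) shift; maximizing contacts instead of minimizing this distance would follow the same loop with a different score.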

Examples

Standard run of phenix.map_to_object:

Running phenix.map_to_object is easy. You can just type:

phenix.map_to_object fixed_pdb=my_target.pdb moving_pdb=my_ligand.pdb

and phenix.map_to_object will move my_ligand.pdb as close as it can to my_target.pdb.

http://phenix-online.org/documentation/map_to_object.htm (1 of 3) [12/14/08 1:03:35 PM]

Run of phenix.map_to_object specifying center of mass of moving PDB is to be close to any atom of fixed PDB:

By default phenix.map_to_object will move the center of mass of moving_pdb as close as possible to any atom in fixed_pdb. You could specify this explicitly with:

phenix.map_to_object fixed_pdb=my_target.pdb moving_pdb=my_ligand.pdb \
  use_moving_center_of_mass=True use_fixed_center_of_mass=False

Run of phenix.map_to_object specifying center of mass of moving PDB is to have maximum number of contacts with atoms of fixed PDB:

If you wanted instead to maximize the number of close contacts under 5 A between the center of mass of my_ligand.pdb and any atom in my_target.pdb, you could type:

phenix.map_to_object fixed_pdb=my_target.pdb moving_pdb=my_ligand.pdb \
  use_moving_center_of_mass=True use_fixed_center_of_mass=False \
  use_contact_order=True contact_dist=5.

Run of phenix.map_to_object searching over additional unit cells

phenix.map_to_object fixed_pdb=my_target.pdb moving_pdb=my_ligand.pdb \
  use_moving_center_of_mass=True use_fixed_center_of_mass=False \
  use_contact_order=True contact_dist=5. \
  extra_cells_to_search=2

Possible Problems

Specific limitations and problems:

Literature

Additional information

List of all map_to_object keywords

-------------------------------------------------------------------------------

Legend:
  black bold - scope names
  black - parameter names
  red - parameter values
  blue - parameter help
  blue bold - scope help

Parameter values:

* means selected parameter (where multiple choices are available)

False is No

True is Yes

None means not provided, not predefined, or left up to the program

"%3d" is a Python style formatting descriptor

-------------------------------------------------------------------------------
map_to_object
  moving_pdb= None  PDB file with coordinates to move near fixed_pdb using SG symmetry
  fixed_pdb= None  PDB file to move moving_pdb close to using SG symmetry
  output_pdb= None  Name of output (moved) PDB file
  use_moving_center_of_mass= True  You can choose to just move the center of mass of the moving PDB close to the fixed PDB (as opposed to finding the operator that puts an atom of the moving PDB closest to an atom in the fixed PDB)
  use_fixed_center_of_mass= False  You can choose to just move the moving PDB close to the center of mass of the fixed PDB (as opposed to finding the operator that puts the moving PDB closest to any atom in the fixed PDB)
  use_contact_order= True  You can choose to maximize the number of atoms that are within contact_dist (default=6) A of an atom in the other structure
  contact_dist= 6.  Atoms separated by contact_dist or less are considered to be in contact
  extra_cells_to_search= 1  You can specify how many unit cells beyond the central one to search in each direction (default=1, i.e. search -1, 0 and 1 in each direction)
  verbose= False  Verbose output
  debug= False  Debugging output
  dry_run= False  Just read in and check parameter names


PyMOL in PHENIX

Author(s)

Starting PyMOL

Setting up your view in PyMOL

Useful PyMOL commands

Additional information

Author(s)

PyMOL: PyMOL executables are kindly supplied by Warren DeLano for distribution in PHENIX.

Starting PyMOL

Normally you will start PyMOL in PHENIX after one of the PHENIX Wizards has finished. In this case, if you click on the magnifying glass on the Wizard screen and select one of the choices that displays a structure with PyMOL, then PyMOL will be launched automatically with the appropriate PDB and map files.

You can also start PyMOL by clicking on the PyMOL button on the PHENIX GUI. In this case you'll need to read in maps and models yourself. You can read in a model to PyMOL by typing

load overall_best.pdb

in the PyMOL Tcl/Tk GUI window or at the PyMOL prompt in the PyMOL display window.

Setting up your view in PyMOL

Here are some simple controls that let you choose what you see in PyMOL, assuming that you have a model (pdb_1) and a map (map_1) or maps loaded.

Click a few times on "pdb_1" and you will see the model turn off and on. The same goes for "contour_1.5".

Similarly "all" turns everything on and off, and "map_1" turns the unit cell box (which may not be visible in your viewer) on and off.

To the right of "pdb_1" you will see buttons labelled "A" "S" "H" "L" and "C". Click on each one and you'll see what they do:

"A" : Actions. lets you recenter, delete the object, and more

"S" : Show. For a model you may want to show sticks for clarity.

"H" : Hide. Undoes show.

"L" : Label. Choose what labels to display

"C" : Color. Choose colors.

The little table in the lower right of the PyMOL display window shows what each mouse button does. If you have a 3-button (2 buttons and a roller) mouse, then hold the left button down and move the mouse to rotate; hold the right button down and move the mouse to change the size; and hold both buttons down and move the mouse to move the center.

If you accidentally click the wrong buttons and some new object appears on the screen that you do not want, click on the "A" button for the new object and select "delete" to get rid of it.


Useful PyMOL commands

Here are a few useful PyMOL commands that you can type in the PyMOL window or in the PyMOL Tcl/Tk GUI window.

Read in a PDB file named "overall_best.pdb":

load overall_best.pdb

Read in an xplor-style map named "map_1.xplor" and contour it at a level of 1.5:

load map_1.xplor
isomesh contour_1, map_1, 1.5

Create a new set of contours at a level of 2.5 for a map called "map_1" that has already been loaded:

isomesh contour_1, map_1, 2.5

Get PyMOL help:

help

Get PyMOL help on the command "isomesh":

help isomesh

Additional information

You can get the documentation for PyMOL at pymol.sourceforge.net/html/


Coot - a Model Building Tool

Coot (Crystallographic Object-Oriented Toolkit) is a program for crystallographic model building, model completion, and validation written by Paul Emsley. There is documentation about the program and how to use it here.


MolProbity - An Active Validation Tool

Authors

Purpose

Usage

Possible Problems

Literature

Authors

MolProbity is a web application that integrates validation programs from the Richardson lab at Duke University.

Ian Davis, principal author: PHP/Java web service; KiNG; Ramachandran & Rotamer; Dangle

Vincent Chen: extensions to KiNG & MolProbity

Mike Word: Reduce; Probe; Clashlist

Dave Richardson: kinemages; Mage; Prekin; Suitename

Xueyi Wang: RNABC

Jack Snoeyink & Andrew Leaver-Fay: Reduce update

Bryan Arendall: webmaster; databases

Purpose

MolProbity provides the user with an expert-system consultation about the accuracy of a macromolecular structure model, diagnosing local problems and enabling their correction. It combines all-atom contact analysis with updated versions of more traditional tools for validating geometry and dihedral-angle combinations. MolProbity is most complete for crystal structures of proteins and RNA, but it also handles DNA, ligands, and NMR ensembles. It works best as an active validation tool - used as soon as a model is available and during each rebuild/refine loop, not just at the end to provide global statistics before deposition. It produces coordinates, graphics, and numerical evaluations that integrate with either manual or automated use in systems such as PHENIX, KiNG, or Coot.

Usage

The integrated MolProbity web application is at http://molprobity.biochem.duke.edu/. The user is guided through a workflow that typically consists of:

1. Fetch or upload model(s)

2. Add & optimize H atoms, with correction of Asn/Gln/His flips

3. Calculate per-residue & global quality analyses:

1. all-atom steric clashes

2. geometry (e.g., Cbeta or ribose pucker ideality)

3. Ramachandran, sidechain rotamer, or RNA backbone outliers

4. global MolProbity score

4. View multi-criterion chart and/or on-line 3D KiNG graphics summaries

5. [Optional features, e.g. interface analysis; load maps for on-line viewing; Coot to-do list]

6. Download coordinate & graphics files for further work on local corrections

An increasingly broad subset of MolProbity functionality is integrated directly into PHENIX for use in refinement, Resolve, and wizard decisions. Phenix.reduce provides optimized hydrogen addition, phenix.probe and quick_clashlist.py provide all-atom clash analysis, and Python versions of the Ramachandran and rotamer scores are available in mmtbx. Interactive all-atom contact dots are also available in Coot.

Possible Problems

Web usage requires Java, Javascript, and a modern web browser.

MolProbity provides reasonable session protection, but if security or large-scale usage is at issue, you can install MolProbity to run on your own Linux or Mac computer, provided that the computer has a web server (Apache), the PHP scripting language, Java, and a few common Unix utility programs. For more information, follow the "Download MolProbity" link on the MP main page.

Literature

MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. I.W. Davis, A. Leaver-Fay, V.B. Chen, J.N. Block, G.J. Kapral, X. Wang, L.W. Murray, W.B. Arendall III, J. Snoeyink, J.S. Richardson, and D.C. Richardson. Nucl. Acids Res. 35, W375-W383 (2007)

Visualizing and Quantifying Molecular Goodness-of-Fit: Small-probe Contact Dots with Explicit Hydrogen Atoms. J.M. Word, S.C. Lovell, T.H. LaBean, H.C. Taylor, M.E. Zalis, B.K. Presley, J.S. Richardson, and D.C. Richardson. J. Mol. Biol. 285, 1711-1733 (1999)

Structure Validation by Cα Geometry: φ,ψ and Cβ Deviation. S.C. Lovell, I.W. Davis, W.B. Arendall III, P.I.W. de Bakker, J.M. Word, M.G. Prisant, J.S. Richardson, and D.C. Richardson. Proteins: Structure, Function and Genetics 50, 437-450 (2003)

A test of enhancing model accuracy in high-throughput crystallography. W.B. Arendall III, W. Tempel, J.S. Richardson, W. Zhou, S. Wang, I.W. Davis, Z.-J. Liu, J.P. Rose, W.M. Carson, M. Luo, D.C. Richardson, and B.-C. Wang. Journal of Structural and Functional Genomics 6, 1-11 (2005)


PHENIX Examples

Where can I find sample data?

Can I easily run a Wizard with some sample data?

What sample data are available to run automatically?

Are any of the sample datasets annotated?

Where can I find sample data?

You can find sample data in the directories located in: $PHENIX/examples. Additionally there is sample MR data in $PHENIX/phaser/tutorial.

Can I easily run a Wizard with some sample data?

You can run sample data with a Wizard with a simple command. To run p9-sad sample data with the AutoSol wizard, you type:

phenix.run_example p9-sad

This command copies the $PHENIX/examples/p9-sad directory to your working directory and executes the commands in the file run.csh.

What sample data are available to run automatically?

You can see which sample data are set up to run automatically by typing:

phenix.run_example --help

This command lists all the directories in $PHENIX/examples/ that have a command file run.csh ready to use. For example:

phenix.run_example --help

PHENIX run_example script. Fri Jul 6 12:07:08 MDT 2007

Use: phenix.run_example example_name [--all] [--overwrite]

Data will be copied from PHENIX examples into subdirectories of this working directory

If --all is set then all examples will be run (takes a long time!)

If --overwrite is set then the script will overwrite subdirectories

List of available examples: 1J4R-ligand a2u-globulin-mr gene-5-mad p9-build p9-sad

Are any of the sample datasets annotated?


The PHENIX tutorials listed on the main PHENIX web page will walk you through sample datasets, telling you what to look for in the output files. For example, the Tutorial 1: Solving a structure using SAD data tutorial uses the p9-sad dataset as an example. It tells you how to run this example data in AutoSol and how to interpret the results.


Tutorial 1: Solving a structure with SAD data

Introduction

Setting up to run PHENIX

Running the demo p9 data with AutoSol

Where are my files?

What parameters did I use?

Reading the log files for your AutoSol run

Summary of the command-line arguments

ImportRawData.

Using the datafiles converted to premerged format.

Guessing cell contents

Running phenix.xtriage

Testing for anisotropy in the data

Choosing datafiles with high signal-to-noise

Running HYSS to find the heavy-atom substructure

Finding the hand and scoring heavy-atom solutions

Scoring heavy-atom solutions

Final phasing with Phaser

Statistical density modification with RESOLVE

Generation of FreeR flags

Model-building with RESOLVE

The AutoSol_summary.dat summary file

How do I know if I have a good solution?

What to do next

Additional information

Introduction

This tutorial will use some very good SAD data (the peak wavelength from an IF5A dataset diffracting to 1.7 A) as an example of how to solve a SAD dataset with AutoSol. It is designed to be read all the way through, giving pointers for you along the way. Once you have read it all, run the example data, and looked at the output files, you will be in a good position to run your own data through AutoSol.

Setting up to run PHENIX

If PHENIX is already installed and your environment is all set, then if you type:

echo $PHENIX

you should get back something like this:

/xtal/phenix-1.3

If instead you get:

PHENIX: undefined variable

then you need to set up your PHENIX environment. See the PHENIX installation page for details of how to do this. If you are using the C-shell environment (csh), then all you will need to do is add one line to your .cshrc (or equivalent) file that looks like this:

source /xtal/phenix-1.3/phenix_env

(except that the path in this statement will be where your PHENIX is installed). Then the next time you log in, $PHENIX will be defined.

Running the demo p9 data with AutoSol

To run AutoSol on the demo p9 data, make yourself a tutorials directory and cd into that directory:

mkdir tutorials
cd tutorials

Now type the phenix command:

phenix.run_example --help

to list the available examples. Choosing p9-sad for this tutorial, you can now use the phenix command:

phenix.run_example p9-sad

to solve the p9 structure with AutoSol. This command will copy the directory $PHENIX/examples/p9-sad to your current directory (tutorials) and call it tutorials/p9-sad/. Then it will run AutoSol using the command file run.csh that is present in this tutorials/p9-sad/ directory. This command file run.csh is simple. It says:

#!/bin/csh
phenix.autosol seq_file=seq.dat sites=4 atom_type=Se data=p9_se_w2.sca \
  sg="I4" cell="113.949 113.949 32.474 90.000 90.000 90.00" \
  resolution=2.4 thoroughness=quick

The first line (#!/bin/csh) tells the system to interpret the remainder of the text in the file using the C-shell (csh). The command phenix.autosol runs the command-line version of AutoSol (see Automated Structure Solution using AutoSol for all the details about AutoSol, including a full list of keywords). The arguments on the command line tell AutoSol about the sequence file (seq_file=seq.dat), the number of sites to look for (sites=4), and the atom type (atom_type=Se). (Note that each of these is specified with an = sign, and that there are no spaces around the = sign.) The Phaser heavy-atom refinement and model completion algorithm used in the AutoSol SAD phasing will add additional sites if warranted. Note the backslash "\" at the end of some of the lines in the phenix.autosol command. This tells the C-shell (which interprets everything in this file) that the next line is a continuation of the current line. There must be no characters (not even a space) after the backslash for this to work. The SAD data to be used to solve the structure are in the datafile p9_se_w2.sca. This datafile is in Scalepack unmerged format, which means that there may be multiple instances of each reflection and the cell parameters are not in the file, so we need to provide the cell parameters with the command cell="113.949 113.949 32.474 90.000 90.000 90.00". (Note that the cell parameters are surrounded by quotation marks. That tells the parser that these all belong together.) In this example, the space group in the p9_se_w2.sca file is I41, but the correct space group is I4, so we need to tell AutoSol the correct space group with sg="I4". The resolution of the data in


p9_se_w2.sca extends to 1.74 A, but in this example we would like to solve the structure quickly, so we have cut the resolution back with the commands resolution=2.4 and thoroughness=quick. The quick command sets several defaults to give a less comprehensive search for heavy-atom sites and a less thorough model-building than the default of thoroughness=thorough. Although the phenix.run_example p9-sad command has just run AutoSol from a script (run.csh), you can run AutoSol yourself from the command line with the same phenix.autosol seq_file= ... command. You can also run AutoSol from a GUI, or by putting commands in another type of script file. All these possibilities are described in Running a Wizard from a GUI, the command-line, or a script.

Where are my files?

Once you have started AutoSol or another Wizard, an output directory will be created in your current (working) directory. The first time you run AutoSol in this directory, this output directory will be called AutoSol_run_1_ (or AutoSol_run_1_/, where the slash at the end just indicates that this is a directory). All of the output from run 1 of AutoSol will be in this directory. If you run AutoSol again, a new subdirectory called AutoSol_run_2_ will be created. Inside the directory AutoSol_run_1_ there will be one or more temporary directories such as TEMP0 created while the Wizard is running. The files in these temporary directories may sometimes be useful in figuring out what the Wizard is doing (or not doing!). By default these directories are emptied when the Wizard finishes (but you can keep their contents with the command clean_up=False if you want).

What parameters did I use?

Once the AutoSol wizard has started (when run from the command line), a parameters file called autosol.eff will be created in your output directory (e.g., AutoSol_run_1_/autosol.eff). This parameters file has a header that says what command you used to run AutoSol, and it contains the starting values of all parameters for this run (including the defaults for all the parameters that you did not set). The autosol.eff file is good for more than just looking at the values of parameters, though. If you copy this file to a new one (for example autosol_hires.eff) and edit it to change the values of some of the parameters (resolution=1.74), then you can re-run AutoSol with the new values of your parameters like this:

phenix.autosol autosol_hires.eff

This command will do everything just the same as in your first run but use all the data to 1.74 A.

Reading the log files for your AutoSol run

While the AutoSol wizard is running, there are several places you can look to see what is going on. The most important one is the overall log file for the AutoSol run. For run 1 of AutoSol, this log file is located in:

AutoSol_run_1_/AutoSol_run_1_1.log

(The second 1 in this log file name will be incremented if you stop this run in the middle and restart it with a command like phenix.autosol run=1.) The AutoSol_run_1_1.log file is a running summary of what the AutoSol Wizard is doing. Here are a few of the key sections of the log files produced for the p9 SAD dataset.

Summary of the command-line arguments

Near the top of the log file you will find:

------------------------------------------------------------
Starting AutoSol with the command:

phenix.autosol seq_file=seq.dat sites=4 atom_type=Se data=p9_se_w2.sca sg=I4 \
  cell='113.949 113.949 32.474 90.000 90.000 90.00' resolution=2.4 \
  thoroughness=quick

This is just a repeat of how you ran AutoSol; you can copy it and paste it into the command line to repeat this run.

ImportRawData.

The input data file p9_se_w2.sca is in unmerged Scalepack format. The AutoSol wizard converts everything to premerged Scalepack format before proceeding. Here is where the AutoSol Wizard identifies the format and then calls the ImportRawData Wizard:

HKLIN ENTRY: p9_se_w2.sca

GUESS FILE TYPE MERGE TYPE sca unmerged

LABELS['I', 'SIGI']

CONTENTS: ['p9_se_w2.sca', 'sca', 'unmerged', 'I 41', None, None, ['I', 'SIGI']]

Converting the files ['p9_se_w2.sca'] to sca format before proceeding

Running import directly...

WIZARD: ImportRawData

Using the datafiles converted to premerged format.

After completing the ImportRawData step, the AutoSol Wizard goes back to the beginning, but uses the newly-converted file p9_se_w2_PHX.sca:

HKLIN ENTRY: AutoSol_run_1_/p9_se_w2_PHX.sca

FILE TYPE scalepack_merge

GUESS FILE TYPE MERGE TYPE sca premerged

LABELS['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']

Unit cell: (113.949, 113.949, 32.474, 90, 90, 90)

Space group: I 4 (No. 79)

CONTENTS: ['AutoSol_run_1_/p9_se_w2_PHX.sca', 'sca', 'premerged', 'I 4',

[113.949, 113.949, 32.473999999999997, 90.0, 90.0, 90.0],

1.7443432606877809, ['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']]

Total of 1 input data files

Guessing cell contents

The AutoSol Wizard uses the sequence information in your sequence file (seq.dat) and the cell parameters and space group to guess the number of NCS copies and the solvent fraction, and the number of total methionines (approximately equal to the number of heavy-atom sites for SeMet proteins):

AutoSol_guess_setup_for_scaling AutoSol Run 1 Fri Mar 7 00:53:48 2008

Solvent fraction and resolution and ha types/scatt fact

This is the last dataset to scale

Guessing setup for scaling dataset 1

SG I 4 cell [113.949, 113.949, 32.473999999999997, 90.0, 90.0, 90.0]

Number of residues in unique chains in seq file: 139

Unit cell: (113.949, 113.949, 32.474, 90, 90, 90)


Space group: I 4 (No. 79)

CELL VOLUME :421654.580793

N_EQUIV:8

GUESS OF NCS COPIES: 1

SOLVENT FRACTION ESTIMATE: 0.64

Total residues:139

Total Met:4 resolution estimate: 2.4
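The solvent-fraction guess above can be reproduced with a standard Matthews-coefficient calculation. This sketch uses common rule-of-thumb constants (110 Da mean residue mass, 1.23 A^3/Da protein partial volume); the exact values AutoSol uses may differ slightly:

```python
def estimate_solvent_fraction(cell_volume, n_sym, n_residues, ncs_copies=1,
                              residue_mass=110.0, protein_volume=1.23):
    """Solvent fraction from the Matthews coefficient Vm = V / (Z * M)."""
    mass = n_residues * ncs_copies * residue_mass  # Da per asymmetric unit
    vm = cell_volume / (n_sym * mass)              # A^3 / Da
    return 1.0 - protein_volume / vm

# Values from the log above: I4 cell volume, 8 equivalent positions,
# 139 residues, 1 NCS copy.
frac = estimate_solvent_fraction(421654.58, 8, 139)
print(round(frac, 2))  # -> 0.64, matching the wizard's estimate
```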

Running phenix.xtriage

The AutoSol Wizard automatically runs phenix.xtriage on each of your input datafiles to analyze them for twinning, outliers, translational symmetry, and other special conditions that you should be aware of. You can read more about xtriage in Data quality assessment with phenix.xtriage. Part of the summary output from xtriage for this dataset looks like this:

The largest off-origin peak in the Patterson function is 6.49% of the height of the origin peak. No significant pseudotranslation is detected.

The results of the L-test indicate that the intensity statistics behave as expected. No twinning is suspected.

Testing for anisotropy in the data

The AutoSol Wizard tests for anisotropy by determining the range of effective anisotropic B values along the principal lattice directions. If this range is large and the ratio of the largest to the smallest value is also large, then the data are by default corrected to make the anisotropy small (see Analyzing and scaling the data in the AutoSol web page for more discussion of the anisotropy correction). In the p9 case, the range of anisotropic B values is small and no correction is made:

Range of aniso B: 15.67 26.14

Not using aniso-corrected data files as the range of aniso b is only

10.47 and 'correct_aniso' is not set
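The decision logic can be sketched as below. The thresholds here are placeholders chosen only so the sketch reproduces the decision in this log; they are not the actual AutoSol defaults:

```python
def needs_aniso_correction(b_values, max_range=15.0, min_ratio=2.0):
    """Correct only if the spread of anisotropic B values is large and the
    largest/smallest ratio is also large (placeholder thresholds)."""
    b_range = max(b_values) - min(b_values)
    ratio = max(b_values) / min(b_values)
    return b_range > max_range and ratio > min_ratio

# For p9 the range 26.14 - 15.67 = 10.47 is small, so no correction is applied.
print(needs_aniso_correction([15.67, 26.14]))  # -> False
```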

Choosing datafiles with high signal-to-noise

During scaling, the AutoSol Wizard estimates the signal-to-noise in each datafile and the resolution where there is significant signal-to-noise (above 0.3:1). You can see this analysis in the log file dataset_scale_1.log for dataset 1. In this case, the signal-to-noise is 1.4 to a resolution of 2.4 A:

FILE DATA:AutoSol_run_1_/p9_se_w2_PHX.sca sn: 1.420786
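One simple way to picture an anomalous signal-to-noise ratio is to compare the mean absolute Friedel difference with its propagated uncertainty. This is an illustrative sketch, not necessarily the exact statistic AutoSol computes:

```python
import statistics

def anomalous_signal_to_noise(i_plus, sig_plus, i_minus, sig_minus):
    """Mean |I+ - I-| over the mean propagated sigma of the difference."""
    dano = [abs(p - m) for p, m in zip(i_plus, i_minus)]
    sig = [(sp ** 2 + sm ** 2) ** 0.5 for sp, sm in zip(sig_plus, sig_minus)]
    return statistics.fmean(dano) / statistics.fmean(sig)

# Toy data: Friedel differences of 2-3 on sigmas near 1.4 give a ratio
# comfortably above 1, i.e. usable anomalous signal.
print(anomalous_signal_to_noise([10, 12], [1, 1], [8, 9], [1, 1]))
```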

Running HYSS to find the heavy-atom substructure

The HYSS (hybrid substructure search) procedure for heavy-atom searching uses a combination of a Patterson search for 2-site solutions with direct-methods recycling. The search ends when the same solution is found beginning with several different starting points. The HYSS log files are named after the datafile that they are based on and the type of differences (ano, iso) that are being used. In this p9 SAD dataset, the HYSS logfile is p9_se_w2_PHX.sca_ano_1.sca_hyss.log. The key part of this HYSS log file is:

Entering search loop:


p = peaklist index in Patterson map
f = peaklist index in two-site translation function
cc = correlation coefficient after extrapolation scan
r = number of dual-space recycling cycles
cc = final correlation coefficient

p=000 f=000 cc=0.392 r=015 cc=0.532 [ best cc: 0.532 ]
p=000 f=001 cc=0.381 r=015 cc=0.532 [ best cc: 0.532 0.532 ]

Number of matching sites of top 2 structures: 6

Here a correlation coefficient of 0.5 is very good (0.1 is hopeless, 0.2 is possible, 0.3 is good), and 6 sites were found that matched in the first two tries. The program continues until 5 structures all have matching sites, then ends and prints out the final correlations after taking the top 4 sites.
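The correlation coefficient reported here is a standard Pearson correlation (computed in the actual program between observed and calculated quantities for the trial substructure). A generic implementation of the statistic, for reference:

```python
import statistics

def pearson_cc(a, b):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) *
           sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

# Perfectly proportional data gives CC = 1; on the rough scale above,
# values near 0.5 from a substructure search indicate a strong solution.
print(pearson_cc([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 8.0]))
```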

Finding the hand and scoring heavy-atom solutions

Normally either hand of the heavy-atom substructure is a possible solution, and both must be tested by calculating phases, examining the electron density maps, and carrying out density modification, as both hands give the same statistics for all heavy-atom analysis and phasing steps. Note that in chiral space groups (those that have a handedness, such as P61), both hands of the space group must be tested. The AutoSol Wizard will do this for you, inverting the hand of the heavy-atom substructure and the space group at the same time. For example, in space group P61 the hand of the substructure is inverted and then it is placed in space group P65.

Scoring heavy-atom solutions

The AutoSol Wizard scores heavy-atom solutions based on two criteria by default. The first criterion is the skew of the electron density in the map (SKEW). Good values for the skew are anything greater than 0.1. In a SAD structure determination, the heavy-atom solution with the correct hand may have a much more positive skew than the one with the inverse hand. The second criterion is the correlation of local RMS density (CORR_RMS). This is a measure of how contiguous the solvent and non-solvent regions are in the map. (If the local rms is low at one point and also low at neighboring points, then the solvent region must be relatively contiguous, and not split up into small regions.) For SAD datasets, Phaser is used for calculating phases. For a SAD dataset, a figure of merit of 0.3 is acceptable, 0.4 is fine, and anything above 0.5 is very good. The scores for solution #1 are listed in the AutoSol log file:

Scoring for this solution now...

AutoSol_run_1_/TEMP0/resolve.scores SKEW -0.047612928

AutoSol_run_1_/TEMP0/resolve.scores CORR_RMS 0.8755398

CC-EST (BAYES-CC) SKEW : 10.0 +/- 26.1

CC-EST (BAYES-CC) CORR_RMS : 55.7 +/- 36.1

Resetting sigma of quality estimate due to wide range of estimated values:

Overall quality: 14.7

Highest lower bound of quality for individual estimates: 37.6

Current 2*sigma: 37.5 New 2*sigma: 45.7

ESTIMATED MAP CC x 100: 14.7 +/- 45.7

The ESTIMATED MAP CC x 100 is an estimate of the quality of the experimental electron density map (not the density-modified one). A set of real structures was used to calibrate the range of values of each score obtained for phases of varying quality. The resulting probability distributions are used to estimate the correlation between the experimental map and an ideal map for this structure. Then all the estimates are combined to yield an overall Bayesian estimate of the map quality. These are reported as CC x 100 +/- 2SD. These estimated map CC values are usually fairly close, so as the estimate is 14.7 +/- 45.7, you can be quite confident that this solution is not the right one. The wizard then tries the inverse solution...

Scoring for this solution now...

AutoSol_run_1_/TEMP0/resolve.scores SKEW 0.2644597

AutoSol_run_1_/TEMP0/resolve.scores CORR_RMS 0.9274329

CC-EST (BAYES-CC) SKEW : 56.5 +/- 18.1

CC-EST (BAYES-CC) CORR_RMS : 63.1 +/- 28.5

ESTIMATED MAP CC x 100: 60.0 +/- 13.6

Reading NCS information from: AutoSol_run_1_/TEMP0/resolve.log

based on [ha_2.pdb,phaser_2.mtz]

Reformatting ha_2.pdb and putting it in ha_2.pdb_formatted.pdb

RANGE to KEEP :1.28

Confident of the hand (Quality diff from opp hand is 1.9 sigma)

This solution looks a lot better. The overall estimated map CC value is 60.0 +/- 13.6. This means that your structure is not only solved, but also that you will have a good map once it is density modified.
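The SKEW score that separates the two hands above is just the third standardized moment of the map's density values. A small self-contained illustration (synthetic data, not real map densities):

```python
import random
import statistics

def map_skew(rho):
    """Skew of a density distribution: E[(rho - mean)^3] / sd^3."""
    mean = statistics.fmean(rho)
    sd = statistics.pstdev(rho)
    return sum((x - mean) ** 3 for x in rho) / (len(rho) * sd ** 3)

random.seed(0)
# A featureless "noise map" has skew near zero, like the wrong-hand solution.
noise = [random.gauss(0.0, 1.0) for _ in range(50000)]
# Adding a small tail of high-density points ("atoms") makes the skew
# clearly positive, like the correct-hand solution above.
with_peaks = noise + [random.gauss(5.0, 1.0) for _ in range(1000)]
print(map_skew(noise), map_skew(with_peaks))
```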

Final phasing with Phaser

Once the best heavy-atom solution or solutions are chosen based on ESTIMATED MAP CC x 100, these are used in a final round of phasing with Phaser (for SAD phasing). The log file from phasing for solution 2 is in phaser_2.log. Here is the final part of the output from this log file, showing the refined coordinates, occupancies, and thermal (B) factors for the 4 sites, along with the refined scattering factors (in this case only f" is refined) and the final figure of merit of phasing (0.544):

Atom Parameters: 4 atoms in list
        X      Y      Z      O     B   (AnisoB)  M  Atomtype
#1   0.180 -0.113 -0.681  1.135  22.8  ( ---- )  1  SE
#2   0.686 -0.238 -0.710  0.980  23.0  (+22.40)  1  SE
#3   0.665 -0.206 -0.774  1.020  28.2  (+26.14)  1  SE
#5   0.027  0.758  0.905  0.176  23.9  ( ---- )  1  SE

Scattering Parameters:
Atom    f"      (f')
SE    5.5196  -8.0000

Figures of Merit
----------------
Bin  Resolution    Acentric      Centric       Single        Total
                  Number  FOM   Number  FOM   Number  FOM   Number  FOM
ALL  28.49-2.40    7502  0.594    874  0.140     51  0.057   8427  0.544

log-likelihood gain -90088
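The figure of merit for a reflection is the expected cosine of its phase error, so the overall FOM can be pictured as a mean cosine. A sketch over hypothetical phase errors (the errors below are made up for illustration):

```python
import math
import statistics

def figure_of_merit(phase_errors_deg):
    """Mean cosine of the phase errors (degrees): 1 = perfect phases,
    0 = random phases."""
    return statistics.fmean(math.cos(math.radians(e))
                            for e in phase_errors_deg)

# Phase errors clustered around 60 degrees give a FOM near 0.5,
# comparable to the overall 0.544 in the Phaser log above.
print(figure_of_merit([40.0, 55.0, 60.0, 70.0]))
```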

Statistical density modification with RESOLVE

After SAD phases are calculated with Phaser, the AutoSol Wizard uses RESOLVE density modification to improve the quality of the electron density map. The statistical density modification in RESOLVE takes advantage of the flatness of the solvent region and the expected distribution of electron density in the region containing the macromolecule, as well as any NCS that can be found from the heavy-atom substructure. The weighted structure factors and phases (FWT, PHWT) from Phaser are used to calculate the starting map for RESOLVE, and the experimental structure factor amplitudes (FP) and SAD Hendrickson-Lattman coefficients from Phaser are used in the density modification process. The output from RESOLVE for solution 2 can be found in resolve_2.log. Here are key sections of this output. First, the plot of how many points in the "protein" region of the map have each possible value of electron density. The plot below is normalized so that a density of zero is the mean of the solvent region, and the standard deviation of the density in the map is 1.0. A perfect map has many points with density slightly less than zero on this scale (the points between atoms), a few points with very high density (the points near atoms), and no points with very negative density. Such a map has a very high skew (think "skewed off to the right"). This map is good, with a positive skew, though it is not perfect.

http://phenix-online.org/documentation/tutorial_sad.htm (7 of 12) [12/14/08 1:04:01 PM]

Plot of Observed (o) and model (x) electron density distributions for protein region, where the model distribution is given by

p_model(beta*(rho+offset)) = p_ideal(rho)

and then convoluted with a gaussian with width of sigma, where sigma, offset and beta are given below under "Error estimate."

[ASCII plot from resolve_2.log, garbled in extraction: p(rho) versus normalized rho from -2 to 3 (0 = mean of solvent region). The observed (o) and model (x) distributions peak just below zero, with a tail extending toward high positive density.]

After density modification is complete, this plot becomes much more like one from a perfect structure:

[ASCII plot from resolve_2.log after density modification, garbled in extraction: p(rho) versus normalized rho from -2 to 3 (0 = mean of solvent region). The distribution now shows a sharper peak just below zero and a longer, lower tail at positive density, much closer to that of a perfect structure.]

The key statistic from this RESOLVE density modification is the R-factor for comparison of observed structure factor amplitudes (FP) with those calculated from the density modification procedure (FC). In this p9 SAD phasing the R-factor is very low:

Overall R-factor for FC vs FP: 0.239 for 8422 reflections

An acceptable value is anything below 0.35; below 0.30 is good.
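As a quick sanity check, these thresholds can be applied to the value reported in the log. The helper below is an illustrative sketch, not a PHENIX command; only the cutoffs (0.30 and 0.35) come from the text above.

```shell
# Illustrative helper (not part of PHENIX): classify the RESOLVE
# density-modification R-factor using the rules of thumb above.
classify_rfactor() {
  awk -v r="$1" 'BEGIN {
    if (r < 0.30)      print "good"
    else if (r < 0.35) print "acceptable"
    else               print "poor"
  }'
}
classify_rfactor 0.239   # the p9 value reported above -> good
```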

Generation of FreeR flags

The AutoSol Wizard will create a set of free R flags indicating which reflections are not to be used in refinement. By default 5% of reflections (up to a maximum of 2000) are reserved for this test set. If you want to supply a reflection file hires.mtz that has higher resolution than the data used to solve the structure, or that has a test set already marked, you can do so with the keyword input_refinement_file=hires.mtz. The files to be used for model-building and refinement are listed in the AutoSol log file:

FreeR_flag added to phaser_2.mtz

...

Saving exptl_fobs_phases_freeR_flags_2.mtz for refinement

THE FILE AutoSol_run_1_/resolve_2.mtz will be used for model-building
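The size of the default test set (5% of reflections, capped at 2000) can be sketched as follows. The function name and wiring are illustrative, not part of PHENIX.

```shell
# Illustrative sketch of the default free-R set size: 5% of the
# reflections, up to a maximum of 2000.
free_r_count() {
  n=$(( $1 * 5 / 100 ))          # 5% (integer arithmetic)
  [ "$n" -gt 2000 ] && n=2000    # cap at 2000
  echo "$n"
}
free_r_count 8427     # a dataset of the size used in this tutorial -> 421
free_r_count 100000   # a very large dataset hits the cap -> 2000
```

For the 8427 reflections phased above this gives 421 test-set reflections; only a dataset with more than 40000 reflections would hit the 2000 cap.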

Model-building with RESOLVE

The AutoSol Wizard by default uses a very quick method to build just the secondary structure of your macromolecule. This is controlled by the keyword helices_strands_only=True. The Wizard will guess from your sequence file whether the structure is protein or RNA or DNA (but you can tell it explicitly with chain_type=PROTEIN). If the quick model-building does not build a satisfactory model (if the correlation of map and model is less than acceptable_secondary_structure_cc=0.35), then model-building is tried again with the standard build procedure, essentially the same as one cycle of model-building with the AutoBuild Wizard (see the web page Automated Model Building and Rebuilding with AutoBuild), except that if you specify thoroughness=quick, as we have in this example, the model-building is done less comprehensively to speed things up. In this case the secondary-structure-only model-building produces an initial model with 61 residues built and no side chains assigned, with a model-map correlation of 0.33:

Model with helices and strands is in Build_1.pdb

Log for helices and strands is in Build_1.log

Final file: AutoSol_run_1_/TEMP0/Build_1.pdb

Log file: Build_1.log copied to Build_1.log

Model 1: Residues built=61 placed=0 Chains=9 Model-map CC=0.33

This is new best model with cc = 0.33

Getting R for model: Build_1.pdb


Model: AutoSol_run_1_/TEMP0/refine_1.pdb R/Rfree=0.55/0.58

As the model-map correlation is only 0.33, the Wizard decides that this is not good enough and tries again with regular model-building, yielding a better model with 86 residues built and a map correlation of 0.55:

Model 2: Residues built=86 placed=7 Chains=15 Model-map CC=0.55

This is new best model with cc = 0.55

Refining model: Build_2.pdb

Model: AutoSol_run_1_/TEMP0/refine_2.pdb R/Rfree=0.46/0.49

After one model completion cycle (including extending ends of chains, fitting loops, and building outside the region already built), the best model has 77 residues built, 22 side chains assigned to sequence, and a map correlation of 0.61:

Model completion cycle 1

Models to combine and extend: ['Build_2.pdb', 'refine_2.pdb']

Model 3: Residues built=77 placed=22 Chains=10 Model-map CC=0.61

This is new best model with cc = 0.61

Refining model: Build_combine_extend_3.pdb

Model: AutoSol_run_1_/TEMP0/refine_3.pdb R/Rfree=0.45/0.47

This initial model is written out to refine_3.pdb in the output directory. It is still just a preliminary model, but it is good enough to tell that the structure is solved. For full model-building you will want to go on and use the AutoBuild Wizard (see the web page Automated Model Building and Rebuilding with AutoBuild).

The AutoSol_summary.dat summary file

A quick summary of the results of your AutoSol run is in the AutoSol_summary.dat file in your output directory. This file lists the key files that were produced in your run of AutoSol (all these are in the output directory) and some of the key statistics for the run, including the scores for the heavy-atom substructure and the model-building and refinement statistics. These statistics are listed for all the solutions obtained, with the highest-scoring solutions first. Here is part of the summary for this p9 SAD dataset:

-----------CURRENT SOLUTIONS FOR RUN 1 : -------------------

*** FILES ARE IN THE DIRECTORY: AutoSol_run_1_ ****

Solution # 2 BAYES-CC: 60.0 +/- 13.6 Dataset #1 FOM: 0.54 ----------------

Solution 2 using HYSS on /net/firebird/scratch1/terwill/run_072908a/p9-sad/AutoSol_run_1_/p9_se_w2_PHX.sca_ano_1.sca and taking inverse. Dataset #1

Dataset number: 1

Dataset type: sad

Datafiles used: [

'/net/firebird/scratch1/terwill/run_072908a/p9-sad/AutoSol_run_1_/p9_se_w2_PHX.sca']

Sites: 4 (Already used for Phasing at resol of 2.4) Refined Sites: 4

NCS information in: AutoSol_2.ncs_spec

Experimental phases in: phaser_2.mtz

Experimental phases plus FreeR_flags for refinement in: exptl_fobs_phases_freeR_flags_2.mtz

Density-modified phases in: resolve_2.mtz


HA sites (PDB format) in: ha_2.pdb_formatted.pdb

Sequence file in: seq.dat

Model in: refine_3.pdb

Residues built: 77

Side-chains built: 22

Chains: 10

Overall model-map correlation: 0.61

R/R-free: 0.45/0.47

Scaling logfile in: dataset_1_scale.log

HYSS logfile in: p9_se_w2_PHX.sca_ano_1.sca_hyss.log

Phasing logfile in: phaser_2.log

Density modification logfile in: resolve_2.log (R=0.24)

Build logfile in: Build_combine_extend_3.log

Score type: SKEW CORR_RMS

Raw scores: 0.26 0.93

BAYES-CC: 56.50 63.07

Refined heavy atom sites (fractional): xyz 0.180 -0.113 -0.681

xyz 0.686 -0.238 -0.710

xyz 0.665 -0.206 -0.774

xyz 0.027 0.758 0.905

How do I know if I have a good solution?

Here are some of the things to look for to tell if you have obtained a correct solution:

● How much of the model was built? More than 50% is good, particularly if you are using the default of helices_strands_only=True. If less than 25% of the model is built, it may be entirely incorrect. Have a look at the model. If you see clear sets of parallel or antiparallel strands, or helices and strands with the expected relationships, your model is likely to be correct. If you see a lot of short fragments everywhere, your model and solution are likely to be incorrect.

● How many side-chains were fitted to density? More than 25% is ok; more than 50% is very good.

● What is the R-factor of the model? This only applies if you are building a full model (not for helices_strands_only=True). For a solution at moderate to high resolution (2.5 A or better) the R-factor should be in the low 30's to be very good. For lower-resolution data, an R-factor in the low 40's probably indicates a largely correct but not very good model.

● What was the overall signal-to-noise in the data? Above 1 is good; below 0.5 is very low.

● What are the individual CC-BAYES estimates of map correlation for your top solution? For a good solution they are all around 50 or more, with 2SD uncertainties that are about 10-20.

● What is the overall "ESTIMATED MAP CC x 100" of your top solution? This should also be 50 or more for a good solution. This is an estimate of the map correlation before density modification, so if you have a lot of solvent or several NCS-related copies in the asymmetric unit, lower values may still give you a good map.

● What is the difference in "ESTIMATED MAP CC x 100" between the top solution and its inverse? If this is large (more than the 2SD values for each), that is a good sign.
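The first two numeric checks can be combined into a rough pass/fail test. The thresholds are the rules of thumb quoted above; the script itself is an illustrative sketch, not a PHENIX tool, applied to this tutorial's values.

```shell
# Illustrative sketch: apply the >50%-built and CC>=50 rules of thumb
# to this tutorial's solution (77 of 87 residues, ESTIMATED MAP CC 60).
awk 'BEGIN {
  built_frac = 77 / 87   # residues built / residues in sequence
  map_cc100  = 60.0      # ESTIMATED MAP CC x 100 for the top solution
  verdict = (built_frac > 0.50 && map_cc100 >= 50) ? "looks solved" : "inconclusive"
  printf "built %.0f%%, estimated CC %.0f: %s\n", 100 * built_frac, map_cc100, verdict
}'
```

With this tutorial's numbers (89% of residues built, estimated CC of 60) the verdict is "looks solved".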

What to do next

Once you have run AutoSol and have obtained a good solution and model, the next thing to do is to run the AutoBuild Wizard. If you run it in the same directory where you ran AutoSol, the AutoBuild Wizard will pick up where the AutoSol Wizard left off and carry out iterative model-building, density modification and refinement to improve your model and map. See the web page Automated Model Building and Rebuilding with AutoBuild for details on how to run AutoBuild. If you do not obtain a good solution, then it's not time to give up yet. There are a number of standard things to try that may improve the structure determination. Here are a few that you should always try:

● Have a careful look at all the output files. Work your way through the main log file (e.g., AutoSol_run_1_1.log) and all the other principal log files in order, beginning with scaling (dataset_1_scale.log), then heavy-atom searching (p9_se_w2_PHX.sca_ano_1.sca_hyss.log), phasing (e.g., phaser_1.log or phaser_xx.log, depending on which solution xx was the top solution) and density modification (e.g., resolve_xx.log). Is there anything strange or unusual in any of them that may give you a clue as to what to try next? For example, did the phasing work well (high figure of merit) yet the density modification fail? (Perhaps the hand is incorrect.) Was the solvent content estimated correctly? (You can specify it yourself if you want.) What does the xtriage output say? Is there twinning or strong translational symmetry? Are there problems with reflections near ice rings? Are there many outlier reflections?

● Try a different resolution cutoff. For example, 0.5 A lower resolution than you tried before. Often the highest-resolution shells have little useful information for structure solution (though the data may be useful in refinement and density modification).

● Try a different rejection criterion for outliers. The default is ratio_out=3.0 (toss reflections with delta F more than 3 times the rms delta F of all reflections in the shell). Try instead ratio_out=5.0 to keep almost everything.

● If the heavy-atom substructure search did not yield plausible solutions, try searching with HYSS using the command-line interface, and vary the resolution and number of sites you look for. Can you find a solution that has a higher CC than the one found in AutoSol? If so, you can read your solution into AutoSol with sites_file=my_sites.pdb.

● Was an anisotropy correction applied in AutoSol? If there is some anisotropy but no correction was applied, you can force AutoSol to apply the correction with correct_aniso=True.

Additional information

For details about the AutoSol Wizard, see Automated structure solution with AutoSol. For help on running Wizards, see Running a Wizard from a GUI, the command-line, or a script.


Tutorial 2: Solving a structure with MAD data


Introduction

Setting up to run PHENIX

Running the demo gene-5 data with AutoSol

Where are my files?

What parameters did I use?

Reading the log files for your AutoSol run file

Summary of the command-line arguments

Reading the datafiles.

Guessing cell contents

Running phenix.xtriage

Testing for anisotropy in the data

Scaling MAD data and estimating FA values

Choosing datafiles with high signal-to-noise

Running HYSS to find the heavy-atom substructure

Finding the hand and scoring heavy-atom solutions

Finding additional sites by density modification and FA heavy-atom Fouriers

Final phasing with SOLVE

Statistical density modification with RESOLVE

Generation of FreeR flags

Model-building with RESOLVE

The AutoSol_summary.dat summary file

How do I know if I have a good solution?

What to do next

Additional information

Introduction

This tutorial will use some moderately good MAD data (3 wavelengths from a gene-5 protein SeMet dataset diffracting to 2.6 A) as an example of how to solve a MAD dataset with AutoSol. It is designed to be read all the way through, giving pointers for you along the way. Once you have read it all and run the example data and looked at the output files, you will be in a good position to run your own data through AutoSol.

Setting up to run PHENIX

If PHENIX is already installed and your environment is all set, then if you type:

echo $PHENIX

you should get back something like this:

/xtal//phenix-1.3

If instead you get:

PHENIX: undefined variable

then you need to set up your PHENIX environment. See the PHENIX installation page for details of how to do this. If you are using the C-shell environment (csh), then all you need to do is add one line to your .cshrc (or equivalent) file that looks like this:

source /xtal/phenix-1.3/phenix_env

(except that the path in this statement will be where your PHENIX is installed). Then the next time you log in, $PHENIX will be defined.

http://phenix-online.org/documentation/tutorial_mad.htm (1 of 14) [12/14/08 1:04:07 PM]

Running the demo gene-5 data with AutoSol

To run AutoSol on the demo gene-5 data, make yourself a tutorials directory and cd into that directory:

mkdir tutorials
cd tutorials

Now type the phenix command:

phenix.run_example --help

to list the available examples. Choosing gene-5-mad for this tutorial, you can now use the phenix command:

phenix.run_example gene-5-mad

to solve the gene-5 structure with AutoSol. This command will copy the directory $PHENIX/examples/gene-5-mad to your current directory (tutorials) and call it tutorials/gene-5-mad/. Then it will run AutoSol using the command file run.csh that is present in this tutorials/gene-5-mad/ directory. This command file run.csh is simple. It says:

#!/bin/csh
echo "Running AutoSol on gene-5 protein data..."
phenix.autosol seq_file=sequence.dat sites=2 atom_type=Se \
peak.data=peak.sca peak.f_prime=-3 peak.f_double_prime=4. \
infl.data=infl.sca infl.f_prime=-5 infl.f_double_prime=2. \
high.data=high.sca high.f_prime=-1.5 high.f_double_prime=3.

The first line (#!/bin/csh) tells the system to interpret the remainder of the text in the file using the C-shell (csh). The command phenix.autosol runs the command-line version of AutoSol (see Automated Structure Solution using AutoSol for all the details about AutoSol, including a full list of keywords). The arguments on the command line tell AutoSol about the sequence file (seq_file=sequence.dat), the number of sites to look for (sites=2), and the atom type (atom_type=Se). (Note that each of these is specified with an = sign, and that there are no spaces around the = sign.) For a MAD dataset, we also need to tell AutoSol something about the scattering factors at each wavelength. Lines like:

peak.data=peak.sca peak.f_prime=-3 peak.f_double_prime=4.

do this. This line specifies that the datafile for the peak data is peak.sca, that the f' value is -3, and that the f" value is 4. These values will (by default) be refined by SOLVE prior to calculating phases. Note the backslash "\" at the end of some of the lines in the phenix.autosol command. This tells the C-shell (which interprets everything in this file) that the next line is a continuation of the current line. There must be no characters (not even a space) after the backslash for this to work. The MAD data used to solve the structure are in the datafiles peak.sca, infl.sca and high.sca. These datafiles are in Scalepack premerged format, which means that there is just one instance of each reflection and the cell parameters are in the file, so we do not need to provide the cell parameters or the space group (unless the ones in the .sca files are incorrect!). The resolution of the data is about 2.6 A, and we are going to let AutoSol decide on the best resolution to use for structure solution. Although the phenix.run_example gene-5-mad command has just run AutoSol from a script (run.csh), you can run AutoSol yourself from the command line with the same phenix.autosol seq_file= ... command. You can also run AutoSol from a GUI, or by putting commands in another type of script file. All these possibilities are described in Running a Wizard from a GUI, the command-line, or a script.

Where are my files?

Once you have started AutoSol or another Wizard, an output directory will be created in your current

(working) directory. The first time you run AutoSol in this directory, this output directory will be called AutoSol_run_1_ (or AutoSol_run_1_/, where the slash at the end just indicates that this is a directory). All of the output from run 1 of AutoSol will be in this directory. If you run AutoSol again, a new subdirectory called AutoSol_run_2_ will be created. Inside the directory AutoSol_run_1_ there will be one or more temporary directories such as TEMP0 created while the Wizard is running. The files in this temporary directory may be useful sometimes in figuring out what the Wizard is doing (or not doing!). By default these directories are emptied when the Wizard finishes (but you can keep their contents with the command clean_up=False if you want.)

What parameters did I use?

Once the AutoSol wizard has started (when run from the command line), a parameters file called autosol.eff will be created in your output directory (e.g., AutoSol_run_1_/autosol.eff). This parameters file has a header that says what command you used to run AutoSol, and it contains the starting values of all parameters for this run (including the defaults for all the parameters that you did not set). The autosol.eff file is good for more than just looking at the values of parameters, though. If you copy this file to a new one (for example autosol_lores.eff) and edit it to change the values of some of the parameters (for example resolution=3.0), then you can re-run AutoSol with the new values of your parameters like this:

phenix.autosol autosol_lores.eff

This command will do everything just the same as in your first run but use only the data to 3.0 A.

Reading the log files for your AutoSol run file

While the AutoSol wizard is running, there are several places you can look to see what is going on. The most important one is the overall log file for the AutoSol run. This log file is located in:

AutoSol_run_1_/AutoSol_run_1_1.log

for run 1 of AutoSol. (The second 1 in this log file name will be incremented if you stop this run in the middle and restart it with a command like phenix.autosol run=1). The AutoSol_run_1_1.log file is a running summary of what the AutoSol Wizard is doing. Here are a few of the key sections of the log files produced for the gene-5 MAD dataset.

Summary of the command-line arguments

Near the top of the log file you will find:

------------------------------------------------------------
Starting AutoSol with the command: phenix.autosol seq_file=sequence.dat sites=2 atom_type=Se peak.data=peak.sca \
peak.f_prime=-3 peak.f_double_prime=4. infl.data=infl.sca infl.f_prime=-5 \
infl.f_double_prime=2. high.data=high.sca high.f_prime=-1.5 \
high.f_double_prime=3.

This is just a repeat of how you ran AutoSol; you can copy it and paste it into the command line to repeat this run.

Reading the datafiles.

The AutoSol Wizard will read in your datafiles and check their contents, printing out a summary for each one:

HKLIN ENTRY: high.sca

FILE TYPE scalepack_merge

GUESS FILE TYPE MERGE TYPE sca premerged

LABELS['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']

Unit cell: (76.08, 27.97, 42.36, 90, 103.2, 90)

Space group: C 1 2 1 (No. 5)

CONTENTS: ['high.sca', 'sca', 'premerged', 'C 1 2 1',

[76.079999999999998, 27.969999999999999, 42.359999999999999, 90.0, 103.2, 90.0],

2.5940784397029653, ['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']]

Total of 3 input data files

['peak.sca', 'infl.sca', 'high.sca']

Guessing cell contents

The AutoSol Wizard uses the sequence information in your sequence file (sequence.dat) and the cell parameters and space group to guess the number of NCS copies and the solvent fraction, and the number of total methionines (approximately equal to the number of heavy-atom sites for SeMet proteins):

AutoSol_guess_setup_for_scaling AutoSol Run 1 Thu Mar 6 21:43:20 2008

Solvent fraction and resolution and ha types/scatt fact

This is the last dataset to scale

Guessing setup for scaling dataset 1

SG C 1 2 1 cell [76.079999999999998, 27.969999999999999, 42.359999999999999, 90.0, 103.2, 90.0]

Number of residues in unique chains in seq file: 87

Unit cell: (76.08, 27.97, 42.36, 90, 103.2, 90)

Space group: C 1 2 1 (No. 5)

CELL VOLUME :87758.6787391

N_EQUIV:4

GUESS OF NCS COPIES: 1

SOLVENT FRACTION ESTIMATE: 0.46

Total residues:87

Total Met:2 resolution estimate: 2.59
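The solvent-fraction guess above can be reproduced with a standard Matthews-coefficient calculation. The constants below (110 Da per residue, 1.23 A^3/Da of protein) are common crystallographic rules of thumb assumed here for illustration; they are not necessarily the exact values AutoSol uses internally.

```shell
# Illustrative Matthews-coefficient estimate for the gene-5 cell,
# using the numbers from the log above (C2: 4 equivalent positions,
# 87 residues, 1 NCS copy, cell volume 87758.68 A^3).
awk 'BEGIN {
  cell_volume = 87758.68
  n_equiv     = 4
  n_res       = 87
  ncs_copies  = 1
  vm = cell_volume / (n_equiv * ncs_copies * n_res * 110.0)  # A^3/Da
  solvent = 1.0 - 1.23 / vm
  printf "VM = %.2f A^3/Da, solvent fraction = %.2f\n", vm, solvent
}'
```

For this cell the sketch gives VM = 2.29 A^3/Da and a solvent fraction of 0.46, matching the SOLVENT FRACTION ESTIMATE reported in the log above.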


Running phenix.xtriage

The AutoSol Wizard automatically runs phenix.xtriage on each of your input datafiles to analyze them for twinning, outliers, translational symmetry, and other special conditions that you should be aware of. You can read more about xtriage in Data quality assessment with phenix.xtriage. Part of the summary output from xtriage for this dataset looks like this:

The largest off-origin peak in the Patterson function is 12.60% of the height of the origin peak. No significant pseudotranslation is detected.

The results of the L-test indicate that the intensity statistics behave as expected. No twinning is suspected.

Testing for anisotropy in the data

The AutoSol Wizard tests for anisotropy by determining the range of effective anisotropic B values along the principal lattice directions. If this range is large and the ratio of the largest to the smallest value is also large, then by default the data are corrected to make the anisotropy small (see Analyzing and scaling the data in the AutoSol web page for more discussion of the anisotropy correction). In the gene-5 case, the range of anisotropic B values is small and no correction is made:

Range of aniso B: 24.58 27.92

Not using aniso-corrected data files as the range of aniso b is only 3.43 and 'correct_aniso' is not set

Note that if any one of the datafiles in a MAD dataset has a high anisotropy, then by default all of them will be corrected for anisotropy.

Scaling MAD data and estimating FA values

The AutoSol Wizard uses SOLVE localscaling to scale MAD data. The procedure is basically to scale all the data to the most complete dataset, ignoring anomalous differences, to create a reference dataset. Then all F+ and F- observations at all wavelengths are scaled to this reference dataset, and the data are merged to the asymmetric unit, averaging duplicate observations. During this process outliers that deviate from the reference values by more than ratio_out (default=3) standard deviations (using all data in the appropriate resolution shell to estimate the SD) are rejected. After scaling, the values of f' and f" are refined based on the relative values of anomalous differences at the various wavelengths and the relative values of dispersive differences among the data at different wavelengths. Then FA values (estimates of the heavy-atom structure factor) are estimated. These FA values can often be more useful than the anomalous differences at any of the individual wavelengths because they combine the anomalous and dispersive information. At the same time as the FA values are calculated, the phase difference between the structure factor of the anomalously-scattering atoms and the structure factor corresponding to all other atoms can be estimated. This phase difference is useful later in calculating Fourier maps showing the positions of the anomalously-scattering atoms.

Choosing datafiles with high signal-to-noise

For MAD data the AutoSol Wizard analyzes the correlation of anomalous differences at the various wavelengths. The anomalous difference for a particular reflection is related to the f" value at each wavelength. Consequently, if the data are good, the anomalous differences at different wavelengths (but for the same reflections) are highly correlated. A shell of resolution in which the anomalous differences have a correlation of about 0.3 or greater has some useful information. A strong SeMet dataset will have an overall correlation of 0.6-0.7 for the peak and high-energy-remote wavelengths. You can see this analysis in the log file dataset_scale_1.log for this MAD dataset:


Correlation of anomalous differences at different wavelengths. (You should probably cut your data off at the resolution where this drops below about 0.3. A good dataset has correlation between peak and remote of at least 0.7 overall. Data with correlations below about 0.5 probably are not contributing much.)

CORRELATION FOR WAVELENGTH PAIRS

DMIN 1 VS 2 1 VS 3 2 VS 3

5.18 0.79 0.89 0.73

3.88 0.68 0.75 0.55

3.63 0.68 0.72 0.46

3.43 0.53 0.61 0.41

3.24 0.51 0.58 0.26

3.11 0.51 0.59 0.36

2.98 0.36 0.54 0.13

2.85 0.50 0.45 0.35

2.72 0.28 0.30 0.10

2.59 0.32 0.23 0.14

ALL 0.55 0.66 0.40
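Applying the log's own 0.3 rule of thumb to the peak/remote column (1 VS 3) of this table can be sketched as follows; the rows are copied from the table above and the parsing itself is illustrative, not a PHENIX feature.

```shell
# Illustrative sketch: find the first resolution shell where the
# peak/remote (1 VS 3) anomalous-difference correlation drops below 0.3.
printf '%s\n' \
  "5.18 0.79 0.89 0.73" "3.88 0.68 0.75 0.55" "3.63 0.68 0.72 0.46" \
  "3.43 0.53 0.61 0.41" "3.24 0.51 0.58 0.26" "3.11 0.51 0.59 0.36" \
  "2.98 0.36 0.54 0.13" "2.85 0.50 0.45 0.35" "2.72 0.28 0.30 0.10" \
  "2.59 0.32 0.23 0.14" |
awk '$3 < 0.3 { print "suggested resolution cutoff near", $1, "A"; exit }'
```

With this table the sketch reports 2.59 A, the shell where the 1 VS 3 correlation first falls below 0.3 (it is 0.23 there).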

During scaling, the AutoSol Wizard estimates the signal-to-noise in each datafile and the resolution where there is significant signal-to-noise (above 0.3:1 signal-to-noise). In this case, the FA's appear to have the highest signal-to-noise (3.1) and the inflection data the lowest (0.5):

FILE DATA:FA.sca sn: 3.136704

FILE DATA:peak.sca sn: 2.527422

FILE DATA:high.sca sn: 1.35499

FILE DATA:infl.sca sn: 0.5154387

order of datasets for trying phasing:['FA.sca', 'peak.sca', 'high.sca', 'infl.sca']

Running HYSS to find the heavy-atom substructure

The HYSS (hybrid substructure search) procedure for heavy-atom searching uses a combination of a Patterson search for 2-site solutions with direct-methods recycling. The search ends when the same solution is found beginning with several different starting points. The HYSS log files are named after the datafile that they are based on and the type of differences (ano, iso) that are being used. In this gene-5 MAD dataset, the HYSS logfile is peak.sca_ano_1.sca_hyss.log. The key part of this HYSS log file is:

Entering search loop:
p = peaklist index in Patterson map
f = peaklist index in two-site translation function
cc = correlation coefficient after extrapolation scan
r = number of dual-space recycling cycles
cc = final correlation coefficient

p=000 f=000 cc=0.181 r=015 cc=0.292 [ best cc: 0.292 ]
p=000 f=001 cc=0.151 r=015 cc=0.285 [ best cc: 0.292 0.285 ]

Number of matching sites of top 2 structures: 2

p=000 f=002 cc=0.144 r=015 cc=0.280 [ best cc: 0.292 0.285 0.280 ]

Number of matching sites of top 2 structures: 2
Number of matching sites of top 3 structures: 2

p=001 f=000 cc=0.152 r=015 cc=0.278 [ best cc: 0.292 0.285 0.280 0.278 ]

Number of matching sites of top 2 structures: 2
Number of matching sites of top 3 structures: 2
Number of matching sites of top 4 structures: 2

p=001 f=001 cc=0.101 r=015 cc=0.291 [ best cc: 0.292 0.291 0.285 0.280 0.278 ]

Number of matching sites of top 2 structures: 3
Number of matching sites of top 3 structures: 2
Number of matching sites of top 4 structures: 2
Number of matching sites of top 5 structures: 2

Here a correlation coefficient of 0.5 is very good (0.1 is hopeless, 0.2 is possible, 0.3 is good) and 2 sites were found that matched in the first two tries. The program continues until 5 structures all have matching sites, then ends and prints out the final correlations, after taking the top 2 sites.

Finding the hand and scoring heavy-atom solutions

Normally either hand of the heavy-atom substructure is a possible solution, and both must be tested by calculating phases, examining the electron density maps, and carrying out density modification, as the two hands give the same statistics for all heavy-atom analysis and phasing steps. Note that in chiral space groups (those that have a handedness, such as P61), both hands of the space group must be tested. The AutoSol Wizard will do this for you, inverting the hand of the heavy-atom substructure and the space group at the same time. For example, in space group P61 the hand of the substructure is inverted and then it is placed in space group P65. The AutoSol Wizard scores heavy-atom solutions based on two criteria by default. The first criterion is the skew of the electron density in the map (SKEW). Good values for the skew are anything greater than 0.1. In a MAD structure determination, the heavy-atom solution with the correct hand may have a more positive skew than the one with the inverse hand. The second criterion is the correlation of local RMS density (CORR_RMS). This is a measure of how contiguous the solvent and non-solvent regions are in the map. (If the local rms is low at one point and also low at neighboring points, then the solvent region must be relatively contiguous, and not split up into small regions.) For MAD datasets, SOLVE is used for calculating phases. For a MAD dataset, a figure of merit of 0.5 is acceptable, 0.6 is fine and anything above 0.7 is very good. The first three solutions scored are all quite good. Here is the third and best one:

SCORING SOLUTION 3: Solution 3 using HYSS on FA.sca. Dataset #1, with 2 sites

Evaluating solution 3

FOM found: 0.6

Number of scoring criteria: 2

Using BAYES-CC (Bayesian estimate of CC of map to perfect) as scores

Scoring for this solution now...

AutoSol_run_1_/TEMP0/resolve.scores SKEW 0.2547302

AutoSol_run_1_/TEMP0/resolve.scores CORR_RMS 0.8763324

CC-EST (BAYES-CC) SKEW : 55.7 +/- 18.5

CC-EST (BAYES-CC) CORR_RMS : 55.8 +/- 36.0

ESTIMATED MAP CC x 100: 57.6 +/- 14.1

The ESTIMATED MAP CC x 100 is an estimate of the quality of the experimental electron density map (not the density-modified one). A set of real structures was used to calibrate the range of values of each score obtained for phases of varying quality. The resulting probability distributions are used above to estimate the correlation between the experimental map and an ideal map for this structure. Then all the estimates are combined to yield an overall Bayesian estimate of the map quality. These are reported as CC x 100 +/- 2SD. These estimated map CC values are usually fairly close, so if the estimate is 57.6 +/- 14.1 then you can be confident that your structure is not only solved but that you will have a good map when it is density modified.

http://phenix-online.org/documentation/tutorial_mad.htm (7 of 14) [12/14/08 1:04:07 PM]

In this case the datasets used to find heavy-atom substructures were the FA values in FA.sca and the peak data in peak.sca_ano_1.sca. For each dataset one solution was found, and that solution and its inverse were scored. The scores were (skipping extra text below):

SCORING SOLUTION 1: Solution 1 using HYSS on /net/idle/scratch1/terwill/run_072908a/gene-5-mad/peak.sca_ano_1.sca. Dataset #1, with 2 sites

ESTIMATED MAP CC x 100: 55.0 +/- 15.5

SCORING SOLUTION 2: Solution 2 using HYSS on /net/idle/scratch1/terwill/run_072908a/gene-5-mad/peak.sca_ano_1.sca and taking inverse. Dataset #1, with 2 sites

ESTIMATED MAP CC x 100: 55.0 +/- 15.5

SCORING SOLUTION 3: Solution 3 using HYSS on FA.sca. Dataset #1, with 2 sites

ESTIMATED MAP CC x 100: 57.6 +/- 14.1

SCORING SOLUTION 4: Solution 4 using HYSS on FA.sca and taking inverse. Dataset #1, with 2 sites

ESTIMATED MAP CC x 100: 39.7 +/- 26.9

SCORING SOLUTION 5: Solution 5 using HYSS on /net/idle/scratch1/terwill/run_072908a/gene-5-mad/high.sca_ano_1.sca. Dataset #1, with 2 sites

ESTIMATED MAP CC x 100: 54.9 +/- 15.6

SCORING SOLUTION 6: Solution 6 using HYSS on /net/idle/scratch1/terwill/run_072908a/gene-5-mad/high.sca_ano_1.sca and taking inverse. Dataset #1, with 2 sites

ESTIMATED MAP CC x 100: 55.0 +/- 15.5

In this case the best score was obtained using the FA values and taking the original hand (ESTIMATED MAP CC x 100: 57.6 +/- 14.1), while the score for the inverted hand of the heavy-atom substructure was worse (ESTIMATED MAP CC x 100: 39.7 +/- 26.9), so the hand was clear.
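The text above says the individual score estimates are combined into an overall Bayesian estimate. PHENIX's actual combination is calibrated against real structures; purely as a rough illustration, here is the textbook inverse-variance rule for combining independent (value +/- 2SD) estimates. Note that it does not reproduce the Wizard's exact numbers.

```python
import math

def combine_estimates(estimates):
    """Inverse-variance weighted mean of (value, two_sd) pairs. A generic
    statistics sketch, NOT PHENIX's calibrated Bayesian combination."""
    weights = [(2.0 / two_sd) ** 2 for _, two_sd in estimates]
    mean = sum(w * v for w, (v, _) in zip(weights, estimates)) / sum(weights)
    return mean, 2.0 * math.sqrt(1.0 / sum(weights))

# The SKEW and CORR_RMS estimates for the best solution above:
combined = combine_estimates([(55.7, 18.5), (55.8, 36.0)])
print("%.1f +/- %.1f" % combined)
```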

Finding additional sites by density modification and FA heavy-atom Fouriers

When AutoSol is used with the default keyword of thoroughness=thorough, as in this example, additional heavy-atom sites are found by phasing using the current model, carrying out density modification to improve the phases, and using the improved phases along with the FA values and the phase difference between the heavy atoms and the non-heavy atoms to calculate Fourier maps showing the positions of the anomalously-scattering atoms. The top peaks in these maps are used as trial heavy-atom sites (if they are not already part of the heavy-atom model). In this example solutions 1, 3, and 6 are all used for this phasing/density modification/Fourier procedure. Six new solutions are found, the best of which are solution 16, based on a difference Fourier using density-modified phases from solution 6, and solution 8, based on density-modified phases from solution 3. Here is solution 16, not substantially different from solution 6 in this case:

SCORING SOLUTION 16: Solution 16 based on diff Fourier using denmod solution 6. Dataset #1, with 2 sites

CC-EST (BAYES-CC) SKEW : 55.8 +/- 18.5

CC-EST (BAYES-CC) CORR_RMS : 55.8 +/- 36.0

ESTIMATED MAP CC x 100: 57.7 +/- 14.1

This process is repeated several additional times, leading to the final best solution of Solution 21:

SCORING SOLUTION 21: Solution 21 based on diff Fourier using denmod solution 16. Dataset #1, with 2 sites

ESTIMATED MAP CC x 100: 57.7 +/- 14.1

which is used for final phasing and density modification.

Final phasing with SOLVE

Once the best heavy-atom solution or solutions are chosen based on ESTIMATED MAP CC, these are used in a final round of phasing with SOLVE (for MAD phasing). The log file from phasing for solution 21 is in solve_21.prt. This SOLVE log file repeats the correlation analysis of anomalous differences between data at each wavelength. Then it carries out a detailed refinement of the scattering factors at each wavelength. Finally the heavy-atom model is refined and phases are calculated with Bayesian correlated MAD phasing. The final occupancies and coordinates are listed at the end:

SITE ATOM OCCUP X Y Z B

CURRENT VALUES: 1 Se 0.9665 0.0175 0.2269 0.4069 50.6892

CURRENT VALUES: 2 Se 0.5979 0.9714 0.0088 0.4460 60.0000

In this case the occupancy of one site is quite near 1 and the other is lower. The second site is a selenomethionine that is not well ordered (it is the N-terminal residue in the protein).

Statistical density modification with RESOLVE

After MAD phases are calculated with SOLVE, the AutoSol Wizard uses RESOLVE density modification to improve the quality of the electron density map. The statistical density modification in RESOLVE takes advantage of the flatness of the solvent region and the expected distribution of electron density in the region containing the macromolecule, as well as any NCS that can be found from the heavy-atom substructure. The weighted structure factors and phases (FWT, PHWT) from SOLVE are used to calculate the starting map for RESOLVE, and the experimental structure factor amplitudes (FP) and MAD Hendrickson-Lattman coefficients from SOLVE are used in the density modification process. The output from RESOLVE for solution 1 can be found in resolve_10.log. Here are key sections of this output. First, the plot of how many points in the "protein" region of the map have each possible value of electron density. The plot below is normalized so that a density of zero is the mean of the solvent region, and the standard deviation of the density in the map is 1.0. A perfect map has many points with density slightly less than zero on this scale (the points between atoms), a few points with very high density (the points near atoms), and no points with very negative density. Such a map has a very high skew (think "skewed off to the right"). This map is good, with a positive skew, though it is not perfect.

Plot of Observed (o) and model (x) electron density distributions for protein

region, where the model distribution is given by,

p_model(beta*(rho+offset)) = p_ideal(rho)

and then convoluted with a gaussian with width of sigma

where sigma, offset and beta are given below under "Error estimate."

0.04..................................................

. . .

. . .

. xxxx o .

. xx ooxo .

. xo . xx .

. x . xxx .

p(rho) . x . xx .

. xxo . xx .

. o . xo .

. xx . xxx .

. xx . xxx .

. oxx . xxx .

. xxx . oxxxx .

. oxx . oxxxxx .

0.0 xxxx......................................oxxxxxxx

-2 -1 0 1 2 3

normalized rho (0 = mean of solvent region)

After density modification is complete, this plot becomes much more like one from a perfect structure:

0.03..................................................

. . .

. . .

. xxxxoo. .

. xo o xxo .

. xxo o xo .

. x .xxo .

p(rho) . xo . xo .

. xo . oxxx .

. ox . xxx .

. ox . oxxxxx .

. xxx . ooxxxxxxx .

. ox . oooxxxxxx .

. oxx . o oxxxxxx.

xxxx . xo

0.0 x................................................x

-2 -1 0 1 2 3

normalized rho (0 = mean of solvent region)
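The skew score used to rank these maps is just the normalized third moment of the density histogram plotted above. A toy illustration (not RESOLVE's code) of why an atom-like distribution, mostly slightly-negative bulk plus a few strong peaks, scores a positive skew:

```python
# Skew of a density histogram: the normalized third moment. A map that is
# mostly slightly-negative "between atoms" density plus a few strong peaks
# has a clearly positive skew, like the SKEW score of 0.25 reported above.
# Toy illustration only, not RESOLVE's scoring code.
import random

def skew(values):
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return third / var ** 1.5

random.seed(0)
density = ([random.gauss(-0.1, 0.5) for _ in range(9000)] +   # bulk/solvent
           [random.gauss(3.0, 0.5) for _ in range(1000)])     # atom peaks
print(round(skew(density), 2))  # clearly positive
```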

The key statistic from this RESOLVE density modification is the R-factor for comparison of observed structure factor amplitudes (FP) with those calculated from the density modification procedure (FC). In this gene-5 MAD phasing the R-factor is very low:

Overall R-factor for FC vs FP: 0.293 for 2602 reflections

An acceptable value is anything below 0.35; below 0.30 is good.

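The R-factor quoted above is the standard amplitude residual, R = sum|FP - FC| / sum FP, taken over all reflections. A minimal sketch (any relative scaling of FC to FP, which a real program applies first, is omitted):

```python
# Crystallographic R-factor between observed (FP) and calculated (FC)
# structure factor amplitudes. Scaling of FC to FP is omitted from
# this sketch.
def r_factor(fp, fc):
    return sum(abs(o - c) for o, c in zip(fp, fc)) / sum(fp)

fp = [100.0, 80.0, 60.0, 40.0]  # toy observed amplitudes
fc = [90.0, 85.0, 55.0, 42.0]   # toy calculated amplitudes
print(round(r_factor(fp, fc), 3))  # → 0.079
```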

Generation of FreeR flags

The AutoSol Wizard will create a set of free R flags indicating which reflections are not to be used in refinement. By default 5% of reflections (up to a maximum of 2000) are reserved for this test set. If you want to supply a reflection file hires.mtz that has higher resolution than the data used to solve the structure, or has a test set already marked, then you can do this with the keyword input_refinement_file=hires.mtz. The files to be used for model-building and refinement are listed in the AutoSol log file:

Copying AutoSol_run_1_/solve_21.mtz and adding free R flags for refinement
input_data_file_use: AutoSol_run_1_/solve_21.mtz

labin_use: labin FP=FP SIGFP=SIGFP PHIB=PHIB FOM=FOM HLA=HLA HLB=HLB HLC=HLC HLD=HLD

Adding FreeR_flag to AutoSol_run_1_/TEMP0/solve_21.mtz

...

Saving exptl_fobs_phases_freeR_flags_21.mtz for refinement

THE FILE AutoSol_run_1_/resolve_21.mtz will be used for model-building
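The selection rule described above (5% of reflections, capped at 2000) can be sketched as follows. The function name and the 0/1 flag convention are illustrative only; flag conventions differ between programs, and this is not the Wizard's own code.

```python
# Free-R selection rule sketched: reserve 5% of reflections for the test
# set, capped at 2000. Illustrative only; here 1 marks the free (test)
# set and 0 the working set.
import random

def make_free_r_flags(n_reflections, fraction=0.05, max_free=2000, seed=0):
    n_free = min(int(round(n_reflections * fraction)), max_free)
    free = set(random.Random(seed).sample(range(n_reflections), n_free))
    return [1 if i in free else 0 for i in range(n_reflections)]

flags = make_free_r_flags(2602)   # reflection count from the run above
print(sum(flags), len(flags))     # → 130 2602
```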

Model-building with RESOLVE

The AutoSol Wizard by default uses a very quick method to build just the secondary structure of your macromolecule. This is controlled by the keyword helices_strands_only=True. The Wizard will guess from your sequence file whether the structure is protein or RNA or DNA (but you can tell it if you want with chain_type=PROTEIN). If the quick model-building does not build a satisfactory model (if the correlation of map and model is less than acceptable_secondary_structure_cc=0.35), then model-building is tried again with the standard build procedure, essentially the same as one cycle of model-building with the AutoBuild Wizard (see the web page Automated Model Building and Rebuilding with AutoBuild), except that if you specify thoroughness=quick, as we have in this example, the model-building is done less comprehensively to speed things up. In this case the secondary-structure-only model-building produces an initial model with 32 residues built and side chains assigned to 0, which has a model-map correlation of 0.32.
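The fallback logic in this paragraph can be sketched as follows. choose_model and the two build callables are hypothetical stand-ins for the Wizard's internal steps; only the acceptable_secondary_structure_cc=0.35 threshold comes from the text.

```python
# Fallback logic sketched: try the quick secondary-structure-only build
# first, and fall back to a standard build if the model-map correlation
# is below acceptable_secondary_structure_cc. The build callables are
# hypothetical stand-ins for the Wizard's steps.

ACCEPTABLE_SECONDARY_STRUCTURE_CC = 0.35  # AutoSol default

def choose_model(quick_build, standard_build):
    model, cc = quick_build()
    if cc >= ACCEPTABLE_SECONDARY_STRUCTURE_CC:
        return model, cc
    return standard_build()  # slower but more thorough

# Mimic the run described here: quick build reaches CC=0.32, standard 0.59.
model, cc = choose_model(lambda: ("Build_1.pdb", 0.32),
                         lambda: ("refine_9.pdb", 0.59))
print(model, cc)  # → refine_9.pdb 0.59
```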

Model with helices and strands is in Build_1.pdb

Log for helices and strands is in Build_1.log

Final file: AutoSol_run_1_/TEMP0/Build_1.pdb

Log file: Build_1.log copied to Build_1.log

Model 1: Residues built=32 placed=0 Chains=6 Model-map CC=0.32

This is new best model with cc = 0.32

Getting R for model: Build_1.pdb

Model: AutoSol_run_1_/TEMP0/refine_1.pdb R/Rfree=0.54/0.57

As the secondary-structure-only model-building does not give a very high model-map correlation, the Wizard tries other density-modified maps as well. None of these give a better correlation, so the Wizard tries regular model-building:

Secondary-structure-only model-building with RESOLVE was not successful enough...

Trying again with standard build (helices_strands_only=False)

Also turning on refine this try

...

Building 3 RESOLVE models...

Model 6: Residues built=48 placed=6 Chains=8 Model-map CC=0.50

This is new best model with cc = 0.5

Refining model: Build_6.pdb
Model: AutoSol_run_1_/TEMP0/refine_6.pdb R/Rfree=0.48/0.51

Model 7: Residues built=56 placed=0 Chains=11 Model-map CC=0.51

This is new best model with cc = 0.51


Refining model: Build_7.pdb

Model: AutoSol_run_1_/TEMP0/refine_7.pdb R/Rfree=0.45/0.48

Model 8: Residues built=52 placed=0 Chains=11 Model-map CC=0.51

Model completion cycle 1

Models to combine and extend: ['Build_6.pdb', 'Build_7.pdb', 'Build_8.pdb', 'refine_7.pdb']

Model 9: Residues built=64 placed=0 Chains=12 Model-map CC=0.59

This is new best model with cc = 0.59

Refining model: Build_combine_extend_9.pdb

Model: AutoSol_run_1_/TEMP0/refine_9.pdb R/Rfree=0.42/0.45

As the model-map correlation is now reasonably good (0.59), the model-building is considered successful and the refined initial model is written out to refine_9.pdb in the output directory. It is still just a preliminary model, but it is good enough to tell that the structure is solved. For full model-building you will want to go on and use the AutoBuild Wizard (see the web page Automated Model Building and Rebuilding with AutoBuild).

The AutoSol_summary.dat summary file

A quick summary of the results of your AutoSol run is in the AutoSol_summary.dat file in your output directory. This file lists the key files that were produced in your run of AutoSol (all these are in the output directory) and some of the key statistics for the run, including the scores for the heavy-atom substructure and the model-building and refinement statistics. These statistics are listed for all the solutions obtained, with the highest-scoring solutions first. Here is part of the summary for this gene-5 MAD dataset:

-----------CURRENT SOLUTIONS FOR RUN 1 : -------------------

*** FILES ARE IN THE DIRECTORY: AutoSol_run_1_ ****

Solution # 21 BAYES-CC: 57.7 +/- 14.1 Dataset #1 FOM: 0.51 ----------------

Solution 21 based on diff Fourier using denmod solution 16. Dataset #1

Dataset number: 1

Dataset type: mad

Datafiles used: ['/net/idle/scratch1/terwill/run_072908a/gene-5-mad/peak.sca',

'/net/idle/scratch1/terwill/run_072908a/gene-5-mad/infl.sca',

'/net/idle/scratch1/terwill/run_072908a/gene-5-mad/high.sca']

Sites: 2 (Already used for Phasing at resol of 2.5)

NCS information in: AutoSol_21.ncs_spec

Experimental phases in: solve_21.mtz

Experimental phases plus FreeR_flags for refinement in: exptl_fobs_phases_freeR_flags_21.mtz

Density-modified phases in: resolve_21.mtz

HA sites (PDB format) in: ha_21.pdb_formatted.pdb

Sequence file in: sequence.dat

Model in: refine_9.pdb

Residues built: 64

Side-chains built: 0

Chains: 12

Overall model-map correlation: 0.59

R/R-free: 0.42/0.45


Scaling logfile in: dataset_1_scale.log

HYSS logfile in: high.sca_ano_1.sca_hyss.log

Phasing logfile in: solve_21.prt

Density modification logfile in: resolve_21.log (R=0.29)

Build logfile in: Build_combine_extend_9.log

Score type: SKEW CORR_RMS

Raw scores: 0.26 0.88

BAYES-CC: 55.84 55.78

Heavy atom sites (fractional): xyz 0.018 0.227 0.406

xyz 0.973 0.011 0.448

How do I know if I have a good solution?

Here are some of the things to look for to tell if you have obtained a correct solution:

● How much of the model was built? More than 50% is good, particularly if you are using the default of helices_strands_only=True. If less than 25% of the model is built, then it may be entirely incorrect. Have a look at the model. If you see clear sets of parallel or antiparallel strands, or if you see helices and strands with the expected relationships, your model is going to be correct. If you see a lot of short fragments everywhere, your model and solution are going to be incorrect.

● How many side-chains were fitted to density? More than 25% is ok, more than 50% is very good.

● What is the R-factor of the model? This only applies if you are building a full model (not for helices_strands_only=True). For a solution at moderate to high resolution (2.5 A or better) the R-factor should be in the low 30's to be very good. For lower-resolution data, an R-factor in the low 40's is probably largely correct but the model is not very good.

● What was the overall signal-to-noise in the data? Above 1 is good, below 0.5 is very low.

● What are the individual CC-BAYES estimates of map correlation for your top solution? For a good solution they are all around 50 or more, with 2SD uncertainties that are about 10-20.

● What is the overall "ESTIMATED MAP CC x 100" of your top solution? This should also be 50 or more for a good solution. This is an estimate of the map correlation before density modification, so if you have a lot of solvent or several NCS-related copies in the asymmetric unit, then lower values may still give you a good map.

● What is the difference in "ESTIMATED MAP CC x 100" between the top solution and its inverse? If this is large (more than the 2SD values for each) that is a good sign.
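As a rough summary, the checklist above can be condensed into a screening function. The thresholds come from the text; the function itself, and the fraction_built value in the example, are only an illustrative aid, not part of PHENIX.

```python
# Rough screening function condensing the checklist above. Thresholds are
# from the text; the function is an illustrative aid, not a PHENIX tool.
def looks_solved(fraction_built, map_cc_x100, cc_top, cc_inverse, two_sd):
    checks = {
        "enough model built (>25%)": fraction_built > 0.25,
        "estimated map CC x 100 >= 50": map_cc_x100 >= 50,
        "hand discriminated (gap > 2SD)": (cc_top - cc_inverse) > two_sd,
    }
    return all(checks.values()), checks

# CC numbers from the gene-5 run above; fraction_built is assumed here.
ok, detail = looks_solved(fraction_built=0.64, map_cc_x100=57.7,
                          cc_top=57.6, cc_inverse=39.7, two_sd=14.1)
print(ok)  # → True
```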

What to do next

Once you have run AutoSol and have obtained a good solution and model, the next thing to do is to run the AutoBuild Wizard. If you run it in the same directory where you ran AutoSol, the AutoBuild Wizard will pick up where the AutoSol Wizard left off and carry out iterative model-building, density modification and refinement to improve your model and map. See the web page Automated Model Building and Rebuilding with AutoBuild for details on how to run AutoBuild. If you do not obtain a good solution, then it's not time to give up yet. There are a number of standard things to try that may improve the structure determination. Here are a few that you should always try:

Have a careful look at all the output files. Work your way through the main log file (e.g., AutoSol_run_1_1.log) and all the other principal log files in order, beginning with scaling (dataset_1_scale.log), then heavy-atom searching (FA.sca_hyss.log), phasing (e.g., solve_10.log or solve_xx.log, depending on which solution xx was the top solution) and density modification (e.g., resolve_xx.log). Is there anything strange or unusual in any of them that may give you a clue as to what to try next? For example, did the phasing work well (high figure of merit) yet the density modification failed? (Perhaps the hand is incorrect.) Was the solvent content estimated correctly? (You can specify it yourself if you want.) What does the xtriage output say? Is there twinning or strong translational symmetry? Are there problems with reflections near ice rings? Are there many outlier reflections?

Try a different resolution cutoff. For example 0.5 A lower resolution than you tried before. Often the highest-resolution shells have little useful information for structure solution (though the data may be useful in refinement and density modification).

Try a different rejection criterion for outliers. The default is ratio_out=3.0 (toss reflections with delta F more than 3 times the rms delta F of all reflections in the shell). Try instead ratio_out=5.0 to keep almost everything.
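The ratio_out rule quoted above can be sketched as follows. This is a generic illustration of the rejection criterion, not the actual scaling code; real programs apply it per resolution shell to the delta F values.

```python
# ratio_out rejection sketched: within a shell, toss reflections whose
# |delta F| exceeds ratio_out times the rms delta F of the shell.
# Generic illustration only.
import math

def reject_outliers(delta_f, ratio_out=3.0):
    rms = math.sqrt(sum(d * d for d in delta_f) / len(delta_f))
    return [d for d in delta_f if abs(d) <= ratio_out * rms]

shell = [1.0, -1.0] * 10 + [15.0]  # twenty ordinary deltas and one wild one
print(len(reject_outliers(shell)))                 # → 20 (the 15.0 is tossed)
print(len(reject_outliers(shell, ratio_out=5.0)))  # → 21 (kept with 5.0)
```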

If the heavy-atom substructure search did not yield plausible solutions, try searching with HYSS using the command-line interface, and vary the resolution and number of sites you look for. Can you find a solution that has a higher CC than the one found in AutoSol? If so, you can read your solution in to AutoSol with sites_file=my_sites.pdb.

Was an anisotropy correction applied in AutoSol? If there is some anisotropy but no correction was applied, you can force AutoSol to apply the correction with correct_aniso=True.

Try related space groups. If you are not positive that your space group is P212121, then try other possibilities with different or no screw axes.

Additional information

For details about the AutoSol Wizard, see Automated structure solution with AutoSol. For help on running Wizards, see Running a Wizard from a GUI, the command-line, or a script.

Tutorial 3: Solving a structure with MIR data

Python-based Hierarchical ENvironment for Integrated Xtallography

Documentation Home

Tutorial 3: Solving a structure with MIR data

Introduction

Setting up to run PHENIX

Running the demo rh-dehalogenase data with AutoSol

Where are my files?

What parameters did I use?

Reading the log files for your AutoSol run file

Summary of the command-line arguments

Reading the datafiles.

ImportRawData.

Guessing cell contents

Running phenix.xtriage

Testing for anisotropy in the data

Scaling MIR data

Running HYSS to find the heavy-atom substructure

Finding the hand and scoring heavy-atom solutions

Finding origin shifts between heavy-atom solutions for different derivatives and combining phases

Finding additional sites by density modification and heavy-atom difference Fouriers

Final phasing with SOLVE

Statistical density modification with RESOLVE

Generation of FreeR flags

Model-building with RESOLVE

The AutoSol_summary.dat summary file

How do I know if I have a good solution?

What to do next

Additional information

Introduction

This tutorial will use some very good MIR data (Native and 5 derivatives from a rh-dehalogenase protein MIR dataset analyzed at 2.8 A) as an example of how to solve a MIR dataset with AutoSol. It is designed to be read all the way through, giving pointers for you along the way. Once you have read it all, run the example data, and looked at the output files, you will be in a good position to run your own data through AutoSol.

Setting up to run PHENIX

If PHENIX is already installed and your environment is all set, then if you type:

echo $PHENIX

you should get back something like this:

/xtal//phenix-1.3

If instead you get:

PHENIX: undefined variable

http://phenix-online.org/documentation/tutorial_mir.htm (1 of 15) [12/14/08 1:04:14 PM]

then you need to set up your PHENIX environment. See the PHENIX installation page for details of how to do this. If you are using the C-shell environment (csh) then all you will need to do is add one line to your .cshrc (or equivalent) file that looks like this:

source /xtal/phenix-1.3/phenix_env

(except that the path in this statement will be where your PHENIX is installed). Then the next time you log in $PHENIX will be defined.

Running the demo rh-dehalogenase data with AutoSol

To run AutoSol on the demo rh-dehalogenase data, make yourself a tutorials directory and cd into that directory:

mkdir tutorials
cd tutorials

Now type the phenix command:

phenix.run_example --help

to list the available examples. Choosing rh-dehalogenase-mir for this tutorial, you can now use the phenix command:

phenix.run_example rh-dehalogenase-mir

to solve the rh-dehalogenase structure with AutoSol. This command will copy the directory $PHENIX/examples/rh-dehalogenase-mir to your current directory (tutorials) and call it tutorials/rh-dehalogenase-mir/. Then it will run AutoSol using the command file run.csh that is present in this tutorials/rh-dehalogenase-mir/ directory. Running an MIR dataset is a little different from running a MAD or SAD or SIR dataset because you cannot use the standard command-line control for MIR. Instead you have to run a script. It is not hard, just different. (You can do all of those other things from a script too; it's just even easier to do them from the command-line.) This command file run.csh is simple. It says:

#!/bin/csh
echo "Running AutoSol on rhodococcus dehalogenase data..."
echo "NOTE: command-line not available for MIR..using script instead"
phenix.runWizard AutoSol Facts.list

The first line (#!/bin/csh) tells the system to interpret the remainder of the text in the file using the C-shell (csh). (The command phenix.autosol runs the command-line version of AutoSol; see Automated Structure Solution using AutoSol for all the details about AutoSol, including a full list of keywords.) The last line says to run the AutoSol Wizard, using the contents of the file Facts.list as parameters. Now let's look at the Facts.list file. Here is the first relevant part of the file:

sequence_file sequence.dat
thoroughness thorough
cell 93.796 79.849 43.108 90.000 90.000 90.00  # cell params
resolution 2.8                                 # Resolution
expt_type sir                # MIR dataset is set of SIR datasets
input_file_list rt_rd_1.sca auki_rd_1.sca  # list of input .sca files
                                           # Native, deriv 1
nat_der_list Native Au       # identify files in input_file_list
                             # as Native or the heavy-atom name such as se.
inano_list noinano inano     # inano/noinano/anoonly: identify
                             # if ano diffs to be used for derivs
n_ha_list 0 5                # number of heavy-atoms for each
                             # file for mir/sir (0 for native)

This part of the script tells AutoSol about the resolution, the data files for the first native-derivative combination, the heavy atoms for these files (Native and Au), whether anomalous differences are to be included for each (noinano for Native means do not include them; inano for the Au derivative means do include them for this derivative), and the number of heavy-atoms in each file (0 for the Native, 5 for the derivative). Note that this first native-derivative combination in this MIR dataset is being treated as an SIRAS dataset. This is the way the AutoSol Wizard works for MIR. The individual derivatives are all solved separately (except that difference Fouriers are used to phase one derivative using a solution from another). Then, when all are finished, all the SIR or SIRAS datasets are phased together with SOLVE Bayesian correlated phasing. This approach works well because a substructure determination is done separately for each derivative, and if any one of them works well, then all the derivatives can be solved. This part of the script also tells AutoSol to use defaults for a thorough analysis. Usually for MIR this is the best idea, while for SAD and MAD experiments a quick analysis is fine. The MIR script then continues with data for the second, third... derivatives. These parts of the script all look like this:

############## NEW DATASET ################
run_list start                 # run "start" method:
                               # read in datafiles for this dataset
run_list read_another_dataset  # starting a new dataset here
input_file_list rt_rd_1.sca hgki_rd_1.sca  # list of input .sca files
                                           # Native, deriv 1
nat_der_list Native Hg       # identify files in input_file_list
                             # as Native or the heavy-atom name such as se.
inano_list noinano inano     # inano/noinano/anoonly: identify
                             # if ano diffs to be used for derivs
n_ha_list 0 5                # number of heavy-atoms for each
                             # file for mir/sir (0 for native)

Here the run_list start line is a command to AutoSol. It means "run the following list of AutoSol methods: start". So the AutoSol Wizard runs the "start" method and stops. This basically reads in the datafiles from the previous dataset. The next line says to read another dataset. Now we are ready to provide the data for the second native-derivative combination, again as an SIR dataset. We provide the same native as before (although we don't have to) and a new derivative, this time an Hg derivative, again with anomalous data. This procedure is repeated for each derivative. The AutoSol Wizard will then scale all the datasets and find heavy-atom solutions for some of them by direct methods, then use difference Fouriers to find the solutions for the others. Although the phenix.run_example rh-dehalogenase-mir command has just run AutoSol from a script (run.csh), you can run AutoSol yourself from this script with the same phenix.runWizard AutoSol Facts.list command. You can also run AutoSol from a GUI. All these possibilities are described in Running a Wizard from a GUI, the command-line, or a script.

Where are my files?

Once you have started AutoSol or another Wizard, an output directory will be created in your current (working) directory. The first time you run AutoSol in this directory, this output directory will be called AutoSol_run_1_ (or AutoSol_run_1_/, where the slash at the end just indicates that this is a directory). All of the output from run 1 of AutoSol will be in this directory. If you run AutoSol again, a new subdirectory called AutoSol_run_2_ will be created. Inside the directory AutoSol_run_1_ there will be one or more temporary directories such as TEMP0 created while the Wizard is running. The files in this temporary directory may be useful sometimes in figuring out what the Wizard is doing (or not doing!). By default these directories are emptied when the Wizard finishes (but you can keep their contents with the command clean_up=False if you want).
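The run-directory numbering described above (AutoSol_run_1_, AutoSol_run_2_, ...) behaves like this hypothetical helper; next_run_dir is not a PHENIX function, just a sketch of the naming rule.

```python
# Sketch of the run-directory naming: each new run gets AutoSol_run_N_
# with N incremented past any existing runs. Hypothetical helper, not
# the Wizard's own code.
import os
import re
import tempfile

def next_run_dir(workdir, prefix="AutoSol_run_"):
    pattern = re.compile(re.escape(prefix) + r"(\d+)_$")
    runs = [int(m.group(1)) for name in os.listdir(workdir)
            if (m := pattern.match(name))]
    return "%s%d_" % (prefix, max(runs, default=0) + 1)

with tempfile.TemporaryDirectory() as d:
    first = next_run_dir(d)                       # no runs yet
    os.mkdir(os.path.join(d, "AutoSol_run_1_"))
    second = next_run_dir(d)                      # after run 1 exists
print(first, second)  # → AutoSol_run_1_ AutoSol_run_2_
```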

What parameters did I use?

When the AutoSol Wizard runs from a script it does not write out a parameters file. The parameters from your Facts.list are echoed in the AutoSol log file, but otherwise Facts.list itself is your record of the parameters used.

Reading the log files for your AutoSol run file

While the AutoSol Wizard is running, there are several places you can look to see what is going on. The most important one is the overall log file for the AutoSol run. For run 1 of AutoSol, this log file is located in:

AutoSol_run_1_/AutoSol_run_1_1.log

(The second 1 in this log file name will be incremented if you stop this run in the middle and restart it with a command like phenix.autosol run=1.) The AutoSol_run_1_1.log file is a running summary of what the AutoSol Wizard is doing. Here are a few of the key sections of the log files produced for the rh-dehalogenase MIR dataset.

Summary of the command-line arguments

Near the top of the log file you will find:

READING FACTS FROM Facts.list

NEW FACT from Facts.list : cell [93.796000000000006, 79.849000000000004, 43.107999999999997, 90.0, 90.0, 90.0]

NEW FACT from Facts.list :resolution 2.8

NEW FACT from Facts.list :expt_type sir

NEW FACT from Facts.list :input_file_list ['rt_rd_1.sca', 'auki_rd_1.sca']

NEW FACT from Facts.list :nat_der_list ['Native', 'Au']

NEW FACT from Facts.list :inano_list ['noinano', 'inano']

NEW FACT from Facts.list :n_ha_list [0, 5]

NEW FACT from Facts.list :run_list ['start']

This is just a repeat of the parameters in your Facts.list script. The last fact is the "run_list start" command, which tells the AutoSol Wizard to read in the data (recall that we put in this command after each native-derivative combination so the Wizard could read it in as an SIR dataset).

Reading the datafiles.

The AutoSol Wizard will read in your datafiles and check their contents, printing out a summary for each one.

This is done one dataset at a time (each native-derivative pair) until all have been read in. Here is the summary for the first derivative:

HKLIN ENTRY: rt_rd_1.sca

FILE TYPE scalepack_no_merge_original_index

GUESS FILE TYPE MERGE TYPE sca unmerged

LABELS['I', 'SIGI']

CONTENTS: ['rt_rd_1.sca', 'sca', 'unmerged', 'P 21 21 2', None, None, ['I', 'SIGI']]

Not checking SG as cell or sg not yet defined

SG from rt_rd_1.sca is: P 21 21 2

HKLIN ENTRY: auki_rd_1.sca

FILE TYPE scalepack_no_merge_original_index

GUESS FILE TYPE MERGE TYPE sca unmerged

LABELS['I', 'SIGI']

CONTENTS: ['auki_rd_1.sca', 'sca', 'unmerged', 'P 21 21 21', None, None,
['I', 'SIGI']]

Converting the files ['rt_rd_1.sca', 'auki_rd_1.sca'] to sca format before proceeding

ImportRawData

The input data files rt_rd_1.sca and auki_rd_1.sca are in unmerged Scalepack format. The AutoSol Wizard converts everything to premerged Scalepack format before proceeding. Here is where the AutoSol Wizard identifies the format and then calls the ImportRawData Wizard:

Running import directly...

WIZARD: ImportRawData followed eventually by...

List of output files :

File 1: rt_rd_1_PHX.sca

File 2: auki_rd_1_PHX.sca

These output files are in premerged Scalepack format. After completing the ImportRawData step, the AutoSol Wizard goes back to the beginning, but uses the newly-converted files rt_rd_1_PHX.sca and auki_rd_1_PHX.sca:

HKLIN ENTRY: AutoSol_run_1_/rt_rd_1_PHX.sca

FILE TYPE scalepack_merge

GUESS FILE TYPE MERGE TYPE sca premerged

LABELS['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']

Unit cell: (93.796, 79.849, 43.108, 90, 90, 90)

Space group: P 21 21 2 (No. 18)

CONTENTS: ['AutoSol_run_1_/rt_rd_1_PHX.sca', 'sca', 'premerged', 'P 21 21 2',
[93.796000000000006, 79.849000000000004, 43.107999999999997, 90.0, 90.0, 90.0],
2.4307589843043771, ['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']]

HKLIN ENTRY: AutoSol_run_1_/auki_rd_1_PHX.sca

FILE TYPE scalepack_merge

GUESS FILE TYPE MERGE TYPE sca premerged

LABELS['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']

Unit cell: (93.796, 79.849, 43.108, 90, 90, 90)

Space group: P 21 21 2 (No. 18)

CONTENTS: ['AutoSol_run_1_/auki_rd_1_PHX.sca', 'sca', 'premerged', 'P 21 21 2',
[93.796000000000006, 79.849000000000004, 43.107999999999997, 90.0, 90.0, 90.0],
2.430806639777233, ['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']]

Total of 2 input data files

['AutoSol_run_1_/rt_rd_1_PHX.sca', 'AutoSol_run_1_/auki_rd_1_PHX.sca']
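The CONTENTS lines above are Python list literals, so they can be read back into structured form. The helper and field names below are assumptions, chosen only to match the order seen in the log (file, format, merge type, space group, cell, resolution estimate, labels):

```python
import ast

def parse_contents(line):
    # The text after 'CONTENTS:' is a Python list literal; map its seven
    # positional fields onto descriptive (assumed) names.
    fields = ast.literal_eval(line.split("CONTENTS:", 1)[1].strip())
    keys = ("file", "format", "merge_type", "space_group",
            "unit_cell", "resolution", "labels")
    return dict(zip(keys, fields))

info = parse_contents(
    "CONTENTS: ['AutoSol_run_1_/rt_rd_1_PHX.sca', 'sca', 'premerged', "
    "'P 21 21 2', [93.796, 79.849, 43.108, 90.0, 90.0, 90.0], "
    "2.4307589843043771, ['IPLUS', 'SIGIPLUS', 'IMINU', 'SIGIMINU']]")
print(info["merge_type"], info["space_group"])  # premerged P 21 21 2
```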

Guessing cell contents

The AutoSol Wizard uses the sequence information in your sequence file (sequence.dat) and the cell parameters and space group to guess the number of NCS copies and the solvent fraction.

AutoSol_guess_setup_for_scaling AutoSol Run 1 Fri Mar 7 01:24:08 2008

Solvent fraction and resolution and ha types/scatt fact

Guessing setup for scaling dataset 1

SG P 21 21 2 cell [93.796000000000006, 79.849000000000004, 43.107999999999997, 90.0, 90.0, 90.0]

Number of residues in unique chains in seq file: 294

Unit cell: (93.796, 79.849, 43.108, 90, 90, 90)

Space group: P 21 21 2 (No. 18)

CELL VOLUME :322858.090387

N_EQUIV:4

GUESS OF NCS COPIES: 1

SOLVENT FRACTION ESTIMATE: 0.51

Total residues:294

Total Met:6 resolution estimate: 2.8
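The solvent-fraction guess above follows the standard Matthews-coefficient argument: divide the cell volume by the protein mass per cell, then subtract the volume the protein itself occupies. A simplified sketch (the constants here are typical textbook values, not necessarily AutoSol's own):

```python
def estimate_solvent_fraction(cell_volume, n_equiv, n_residues, ncs_copies=1,
                              mean_residue_mass=110.0, vm_protein=1.23):
    # Matthews-style estimate (a sketch; AutoSol's guess may use different
    # constants).  cell_volume in A^3, masses in Da.
    protein_mass = n_equiv * ncs_copies * n_residues * mean_residue_mass
    vm = cell_volume / protein_mass    # Matthews coefficient, A^3/Da
    return 1.0 - vm_protein / vm       # fraction of the cell that is solvent

# For the rh-dehalogenase cell above (V = 322858 A^3, 4 equivalent positions,
# 294 residues, 1 NCS copy) this reproduces the Wizard's estimate of 0.51:
print(round(estimate_solvent_fraction(322858.0, 4, 294), 2))  # 0.51
```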

Running phenix.xtriage

The AutoSol Wizard automatically runs phenix.xtriage on each of your input datafiles to analyze them for twinning, outliers, translational symmetry, and other special conditions that you should be aware of. You can read more about xtriage in Data quality assessment with phenix.xtriage. Part of the summary output from xtriage for this dataset looks like this:

No (pseudo)merohedral twin laws were found.

Patterson analyses

- Largest peak height : 6.680

(corresponding p value : 0.56306)

The largest off-origin peak in the Patterson function is 6.68% of the height of the origin peak. No significant pseudotranslation is detected.

The results of the L-test indicate that the intensity statistics behave as expected. No twinning is suspected.

In this space group (P 21 21 2) with the cell dimensions in this structure, there is no way to create a twinned crystal, so you do not have to worry about twinning. There is also no large off-origin peak in the native Patterson, so there does not appear to be any translational pseudo-symmetry.
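The L-test quoted above compares intensities of pairs of reflections; for untwinned acentric data the expected value of |L| = |I1 - I2| / (I1 + I2) is 0.5, falling toward 0.375 for a perfect twin. A toy simulation (not xtriage's implementation; pair selection and normalization are greatly simplified) illustrates the untwinned expectation:

```python
import random

def mean_abs_l(intensities, pairs=100000, seed=0):
    # Crude L-test sketch: average |L| over randomly drawn intensity pairs.
    # Real implementations pair symmetry-unrelated, resolution-matched
    # reflections; here we just sample from one intensity list.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        i1, i2 = rng.choice(intensities), rng.choice(intensities)
        if i1 + i2 > 0:
            total += abs(i1 - i2) / (i1 + i2)
    return total / pairs

# Untwinned acentric intensities follow an exponential (Wilson) distribution:
rng = random.Random(1)
untwinned = [rng.expovariate(1.0) for _ in range(20000)]
print(round(mean_abs_l(untwinned), 2))  # close to the untwinned value 0.5
```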

Testing for anisotropy in the data

After all the SIR datasets are read in, the AutoSol Wizard tests for anisotropy by determining the range of effective anisotropic B values along the principal lattice directions. If this range is large and the ratio of the largest to the smallest value is also large, then the data are by default corrected to make the anisotropy small (see Analyzing and scaling the data in the AutoSol web page for more discussion of the anisotropy correction). In the rh-dehalogenase case, the range of anisotropic B values is small and no correction is made:

Range of aniso B: 13.06 19.68

Not using aniso-corrected data files as the range of aniso b is only 6.62 and 'correct_aniso' is not set

Note that if any one of the datafiles in a MIR dataset has high anisotropy, then by default all of them will be corrected for anisotropy.
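The decision described above can be sketched as follows. The threshold values here are illustrative assumptions, not AutoSol's actual defaults (the log only shows that a range of 6.62 was judged too small):

```python
def use_aniso_correction(aniso_b_values, min_b_range=10.0, min_ratio=1.5,
                         correct_aniso=False):
    # Sketch of the rule described above: correct the data only when the
    # spread of anisotropic B values is large (both absolute range and
    # largest/smallest ratio), or when the user forces it via correct_aniso.
    # min_b_range and min_ratio are hypothetical thresholds.
    b_min, b_max = min(aniso_b_values), max(aniso_b_values)
    b_range = b_max - b_min
    return correct_aniso or (b_range >= min_b_range and b_max / b_min >= min_ratio)

# For the values in the log (13.06 ... 19.68) the range is only 6.62,
# so no correction is applied unless correct_aniso is set:
print(use_aniso_correction([13.06, 19.68]))  # False
```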

Scaling MIR data

The AutoSol Wizard uses SOLVE localscaling to scale MIR data. The procedure is basically to scale all the data to the native. During this process, outliers that deviate from the reference values by more than ratio_out (default=3) standard deviations (using all data in the appropriate resolution shell to estimate the SD) are rejected.
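The rejection rule can be sketched as follows (an illustration of the rule as described, not SOLVE's actual code, which works on scaled intensities shell by shell):

```python
def reject_outliers(deltas, ratio_out=3.0):
    # deltas: deviations of each observation from its reference value, for
    # all data in one resolution shell.  Estimate the SD from the whole
    # shell, then discard anything beyond ratio_out standard deviations.
    sd = (sum(d * d for d in deltas) / len(deltas)) ** 0.5
    if sd == 0.0:
        return list(deltas)
    return [d for d in deltas if abs(d) <= ratio_out * sd]

# A shell of small deviations plus one gross outlier: the outlier is dropped.
shell = [0.2, -0.2] * 10 + [5.0]
print(len(reject_outliers(shell)))  # 20
```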

Running HYSS to find the heavy-atom substructure

The HYSS (hybrid substructure search) procedure for heavy-atom searching uses a combination of a Patterson search for 2-site solutions with direct-methods recycling. The search ends when the same solution is found beginning with several different starting points. The HYSS log files are named after the datafile that they are based on and the type of differences (ano, iso) that are being used. In this rh-dehalogenase MIR dataset, the HYSS logfile for the HgKI derivative is hgki_rd_1_PHX.sca_iso_2.sca_hyss.log. The key part of this HYSS log file is:

Entering search loop:
  p = peaklist index in Patterson map
  f = peaklist index in two-site translation function
  cc = correlation coefficient after extrapolation scan
  r = number of dual-space recycling cycles
  cc = final correlation coefficient

p=000 f=000 cc=0.190 r=015 cc=0.250 [ best cc: 0.250 ]
p=000 f=001 cc=0.191 r=015 cc=0.242 [ best cc: 0.250 0.242 ]
Number of matching sites of top 2 structures: 3
p=000 f=002 cc=0.174 r=015 cc=0.200 [ best cc: 0.250 0.242 ]
p=001 f=000 cc=0.167 r=015 cc=0.230 [ best cc: 0.250 0.242 0.230 ]
Number of matching sites of top 2 structures: 3
Number of matching sites of top 3 structures: 2
...
p=011 f=002 cc=0.165 r=015 cc=0.229 [ best cc: 0.293 0.279 0.277 0.276 ]
p=012 f=000 cc=0.184 r=015 cc=0.250 [ best cc: 0.293 0.279 0.277 0.276 ]
p=012 f=001 cc=0.148 r=015 cc=0.292 [ best cc: 0.293 0.292 0.279 0.277 ]

Number of matching sites of top 2 structures: 7

Number of matching sites of top 3 structures: 7

Number of matching sites of top 4 structures: 6

Here a correlation coefficient of 0.5 is very good (0.1 is hopeless, 0.2 is possible, 0.3 is good) and 8 sites were found that matched in the first two tries. The program continues until 4 structures all have 6 matching sites, then ends and prints out the final correlations, after taking the top 5 sites.
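The "matching sites" convergence check can be sketched as follows (a simplified illustration; the real comparison must also account for space-group symmetry, allowed origin shifts, and hand):

```python
def count_matching_sites(sites_a, sites_b, tol=1.0):
    # Count sites in one trial substructure that lie within tol of some site
    # in another trial (same coordinate frame assumed; symmetry and origin
    # ambiguities are ignored for simplicity).
    matched = 0
    for xa, ya, za in sites_a:
        for xb, yb, zb in sites_b:
            if ((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2) ** 0.5 <= tol:
                matched += 1
                break
    return matched

# Two of the three sites in trial A have a close partner in trial B:
a = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0), (9.0, 1.0, 2.0)]
b = [(0.1, 0.0, 0.0), (5.0, 5.0, 5.4), (20.0, 20.0, 20.0)]
print(count_matching_sites(a, b))  # 2
```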

Finding the hand and scoring heavy-atom solutions

Normally either hand of the heavy-atom substructure is a possible solution, and both must be tested by calculating phases, examining the electron density maps, and carrying out density modification, because both hands give the same statistics for all heavy-atom analysis and phasing steps. Note that in chiral space groups (those that have a handedness, such as P61), both hands of the space group must also be tested. The AutoSol Wizard will do this for you, inverting the hand of the heavy-atom substructure and the space group at the same time. For example, in space group P61 the hand of the substructure is inverted and it is then placed in space group P65. The AutoSol Wizard scores heavy-atom solutions based on two criteria by default. The first criterion is the skew of the electron density in the map (SKEW); any skew value greater than 0.1 is good. In a MIR structure determination, the heavy-atom solution with the correct hand may have a more positive skew than the one with the inverse hand. The second criterion is the correlation of local RMS density (CORR_RMS), a measure of how contiguous the solvent and non-solvent regions are in the map. (If the local RMS is low at one point and also low at neighboring points, then the solvent region must be relatively contiguous, and not split up into small regions.) For MIR datasets, SOLVE is used for calculating phases. For a MIR dataset, a figure of merit of 0.5 is acceptable, 0.6 is fine, and anything above 0.7 is very good. The scores are listed in the AutoSol log file. Here is the scoring for solution 4 (the best initial map):

AutoSol_run_1_/TEMP0/resolve.scores SKEW 0.2797302

AutoSol_run_1_/TEMP0/resolve.scores CORR_RMS 0.9306123

CC-EST (BAYES-CC) SKEW : 57.8 +/- 17.0

CC-EST (BAYES-CC) CORR_RMS : 63.3 +/- 28.2

ESTIMATED MAP CC x 100: 60.8 +/- 13.3
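The SKEW criterion is essentially the third moment of the map's density distribution, normalized by sigma cubed; a minimal sketch over a list of density values (ignoring map sampling and masking details):

```python
def map_skew(rho):
    # Skewness of electron density values: third moment about the mean
    # divided by sigma cubed.  A positive skew indicates a few strong peaks
    # rising above a relatively flat background, as expected for a good map.
    n = len(rho)
    mean = sum(rho) / n
    var = sum((r - mean) ** 2 for r in rho) / n
    third = sum((r - mean) ** 3 for r in rho) / n
    return third / var ** 1.5

# A symmetric distribution has zero skew; a few high peaks give positive skew:
print(map_skew([1.0, 2.0, 3.0, 4.0, 5.0]))  # 0.0
print(map_skew([0.0, 0.0, 0.0, 0.0, 10.0]) > 0)  # True
```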

This is a good solution, with a high (and positive) skew (0.28) and a high correlation of local RMS density (0.93). The ESTIMATED MAP CC x 100 is an estimate of the quality of the experimental electron density map (not the density-modified one). A set of real structures was used to calibrate the range of values of each score obtained for phases of varying quality. The resulting probability distributions are used here to estimate the correlation between the experimental map and an ideal map for this structure. All the estimates are then combined to yield an overall Bayesian estimate of the map quality, reported as CC x 100 +/- 2SD. These estimated map CC values are usually fairly close, so if the estimate is 60.8 +/- 13.3 you can be confident that your structure is solved and that the density-modified map will be quite good. In this case the datasets used to find heavy-atom substructures were the isomorphous differences for each derivative. For each dataset one solution was found, and that solution and its inverse were scored. The scores were (skipping extra text below):

SCORING SOLUTION 1: Solution 1 using H