PBS Professional® 13.0
Administrator’s Guide
PBS Works is a division of Altair Engineering, Inc.
You are reading the Altair PBS Professional 13.0 Administrator’s Guide (AG).
Updated 6/7/15
Copyright © 2003-2015 Altair Engineering, Inc. All rights reserved. PBS™, PBS Works™,
PBS GridWorks®, PBS Professional®, PBS Analytics™, PBS Catalyst™, e-Compute™, and
e-Render™ are trademarks of Altair Engineering, Inc. and are protected under U.S. and international laws and treaties. All other marks are the property of their respective owners.
ALTAIR ENGINEERING INC. Proprietary and Confidential. Contains Trade Secret Information. Not for use or disclosure outside ALTAIR and its licensed clients. Information contained
herein shall not be decompiled, disassembled, duplicated or disclosed in whole or in part for
any purpose. Usage of the software is only as explicitly permitted in the end user software
license agreement. Copyright notice does not imply publication.
For information on the End User License Agreement terms and conditions and the terms and
conditions governing third party codes included in the Altair Software, please see the Release
Notes.
Documentation and Contact Information
Contact Altair at:
www.pbsworks.com
pbssales@altair.com
Technical Support

Location         Telephone              e-mail
North America    +1 248 614 2425        pbssupport@altair.com
China            +86 (0)21 6117 1666    es@altair.com.cn
France           +33 (0)1 4133 0992     francesupport@altair.com
Germany          +49 (0)7031 6208 22    hwsupport@altair.de
India            +91 80 66 29 4500      pbs-support@india.altair.com
Italy            +39 800 905595         support@altairengineering.it
Japan            +81 3 5396 2881        pbs@altairjp.co.jp
Korea            +82 70 4050 9200       support@altair.co.kr
Scandinavia      +46 (0) 46 460 2828    support@altair.se
UK               +44 (0)1926 468 600    pbssupport@uk.altair.com
This document is proprietary information of Altair Engineering, Inc.
Contents

About PBS Documentation . . . ix

1  New Features . . . 1
   1.1   New Features in PBS 13.0 . . . 1
   1.2   Changes in Previous Releases . . . 5
   1.3   Deprecations and Removals . . . 10
   1.4   Backward Compatibility . . . 14

2  Configuring the Server and Queues . . . 15
   2.1   The Server . . . 15
   2.2   Queues . . . 18

3  Configuring MoMs and Vnodes . . . 37
   3.1   Vnodes: Virtual Nodes . . . 37
   3.2   MoMs . . . 43
   3.3   Files and Directories Used by MoM . . . 44
   3.4   Configuring MoMs and Vnodes . . . 46
   3.5   How to Configure MoMs and Vnodes . . . 50
   3.6   Configuring MoM and Vnode Features . . . 57

4  Scheduling . . . 63
   4.1   Chapter Contents . . . 63
   4.2   Scheduling Policy Basics . . . 65
   4.3   Choosing a Policy . . . 89
   4.4   The Scheduler . . . 104
   4.5   Using Queues in Scheduling . . . 118
   4.6   Scheduling Restrictions and Caveats . . . 119
   4.7   Errors and Logging . . . 120
   4.8   Scheduling Tools . . . 121

5  PBS Resources . . . 305
   5.1   Introduction . . . 305
   5.2   Chapter Contents . . . 305
   5.3   Glossary . . . 307
   5.4   Categories of Resources . . . 311
   5.5   Resource Types . . . 316
   5.6   Behavior of Resources . . . 316
   5.7   How to Set Resource Values . . . 318
   5.8   Overview of Ways Resources Are Used . . . 321
   5.9   Resources Allocated to Jobs and Reservations . . . 322
   5.10  Using Resources to Track and Control Allocation . . . 332
   5.11  Using Resources for Topology and Job Placement . . . 335
   5.12  Using Resources to Prioritize Jobs . . . 335
   5.13  Using Resources to Restrict Server, Queue Access . . . 336
   5.14  Custom Resources . . . 337
   5.15  Managing Resource Usage . . . 388
   5.16  Where Resource Information Is Kept . . . 423
   5.17  Viewing Resource Information . . . 429
   5.18  Resource Recommendations and Caveats . . . 432

6  Hooks . . . 437
   6.1   Chapter Contents . . . 437
   6.2   Introduction to Hooks . . . 440
   6.3   Glossary . . . 441
   6.4   Prerequisites and Requirements for Hooks . . . 442
   6.5   Simple How-to for Writing Hooks . . . 443
   6.6   Uses for Hooks . . . 448
   6.7   Hook Basics . . . 453
   6.8   Creating and Configuring Hooks . . . 461
   6.9   Viewing Hook Information . . . 480
   6.10  Writing Hook Scripts . . . 482
   6.11  Advice and Caveats for Writing Hooks . . . 516
   6.12  Interface to Hooks . . . 525
   6.13  Hook Examples . . . 619
   6.14  Managing Built-in Hooks . . . 634
   6.15  Python Modules and PBS . . . 637
   6.16  Debugging Hooks . . . 639
   6.17  Error Reporting and Logging . . . 726
   6.18  Attributes and Parameters Affecting Hooks . . . 737
   6.19  See Also . . . 737

7  Provisioning . . . 739
   7.1   Introduction . . . 739
   7.2   Definitions . . . 739
   7.3   How Provisioning Can Be Used . . . 740
   7.4   How Provisioning Works . . . 741
   7.5   Configuring Provisioning . . . 751
   7.6   Viewing Provisioning Information . . . 759
   7.7   Requirements and Restrictions . . . 762
   7.8   Defaults and Backward Compatibility . . . 765
   7.9   Example Scripts . . . 765
   7.10  Advice and Caveats . . . 780
   7.11  Errors and Logging . . . 782

8  Security . . . 787
   8.1   Configurable Features . . . 787
   8.2   Setting User Roles . . . 788
   8.3   Using Access Control . . . 791
   8.4   Restricting Execution Host Access . . . 811
   8.5   Logging Security Events . . . 813
   8.6   Changing the PBS Service Account Password . . . 816
   8.7   Paths and Environment Variables . . . 817
   8.8   File and Directory Permissions . . . 818
   8.9   Authentication & Authorization . . . 818
   8.10  Root-owned Jobs . . . 822
   8.11  User Passwords . . . 823
   8.12  Windows Caveats . . . 826
   8.13  Windows Firewall . . . 827
   8.14  Windows Requirement for cmd Prompt . . . 827

9  Making Your Site More Robust . . . 829
   9.1   Robustness . . . 829
   9.2   Failover . . . 830
   9.3   Checkpoint and Restart . . . 857
   9.4   Preventing Communication and Timing Problems . . . 878
   9.5   Reservation Fault Tolerance . . . 887
   9.6   Preventing File System Problems . . . 891
   9.7   Preventing Communication Problems . . . 891
   9.8   Built-in Robustness . . . 891

10 Integrations . . . 893
   10.1  Integration with MPI . . . 893
   10.2  Support for IBM AIX . . . 917
   10.3  Support for Cray Systems . . . 923
   10.4  Support for SGI . . . 954
   10.5  Support for Globus . . . 964
   10.6  Support for Hyper-Threading . . . 964

11 Managing Jobs . . . 967
   11.1  Routing Jobs . . . 967
   11.2  Limiting Number of Jobs Considered in Scheduling Cycle . . . 967
   11.3  Allocating Resources to Jobs . . . 967
   11.4  Grouping Jobs By Project . . . 969
   11.5  Job Prologue and Epilogue . . . 971
   11.6  UNIX Shell Invocation . . . 977
   11.7  When Job Attributes are Set . . . 977
   11.8  Job Termination . . . 980
   11.9  Job Exit Codes . . . 984
   11.10 Rerunning or Requeueing a Job . . . 987
   11.11 Job IDs . . . 988
   11.12 Where to Find Job Information . . . 989
   11.13 Job Directories . . . 989
   11.14 The Job Lifecycle . . . 994
   11.15 Managing Job History . . . 999
   11.16 Environment Variables . . . 1003
   11.17 Adjusting Job Running Time . . . 1003
   11.18 Managing Number of Run Attempts . . . 1004
   11.19 Allowing Interactive Jobs on Windows . . . 1004

12 Administration . . . 1005
   12.1  The PBS Configuration File . . . 1005
   12.2  Environment Variables . . . 1010
   12.3  The Accounting Log . . . 1012
   12.4  Event Logging . . . 1015
   12.5  Using the UNIX syslog Facility . . . 1022
   12.6  Managing Machines . . . 1023
   12.7  Managing the Data Service . . . 1025
   12.8  Enabling Passwordless Authentication . . . 1027
   12.9  Setting File Transfer Mechanism . . . 1028
   12.10 Temporary File Location for PBS Components . . . 1038
   12.11 Administration Caveats . . . 1040

13 Problem Solving . . . 1041
   13.1  Debugging PBS . . . 1041
   13.2  Server Host Bogs Down After Startup . . . 1041
   13.3  Finding PBS Version Information . . . 1042
   13.4  Troubleshooting and Hooks . . . 1042
   13.5  Directory Permission Problems . . . 1043
   13.6  Common Errors . . . 1043
   13.7  Errors on Windows . . . 1047
   13.8  Troubleshooting PBS Licenses . . . 1053
   13.9  Security-related Problems . . . 1055
   13.10 Time Zone Problems . . . 1058
   13.11 Job Comments for Problem Jobs . . . 1058
   13.12 Getting Help . . . 1059

Index . . . 1061
About PBS Documentation
The PBS Professional Documentation
The documentation for PBS Professional includes the following:
PBS Professional Administrator’s Guide:
How to configure and manage PBS Professional. For the PBS administrator.
PBS Professional Quick Start Guide:
Quick overview of PBS Professional installation and license file generation.
PBS Professional Installation & Upgrade Guide:
How to install and upgrade PBS Professional. For the administrator.
PBS Professional User’s Guide:
How to submit, monitor, track, delete, and manipulate jobs. For the job submitter.
PBS Professional Programmer’s Guide:
Discusses the PBS application programming interface (API). For integrators.
PBS Professional Reference Guide:
Covers PBS reference material.
PBS Manual Pages:
PBS commands, resources, attributes, APIs.
Where to Keep the Documentation
To make cross-references work, put all of the PBS guides in the same directory.
Ordering Software and Publications
To order additional copies of this manual and other PBS publications, or to purchase additional software licenses, contact your Altair sales representative at pbssales@altair.com.
Document Conventions
PBS documentation uses the following typographic conventions:
abbreviation
The shortest acceptable abbreviation of a command or subcommand is underlined.
command
Commands such as qmgr and scp
input
Command-line instructions
manpage(x)
File and path names. Manual page references include the section number in parentheses
appended to the manual page name.
format
Syntax, template, synopsis
Attributes
Attributes, parameters, objects, variable names, resources, types
Values
Keywords, instances, states, values, labels
Definitions
Terms being defined
Output
Output, example code, or file contents
Examples
Examples
Filename
Name of file
Utility
Name of utility, such as a program
1 New Features
This chapter briefly lists new features by release, with the most recent listed first. This chapter
also lists deprecated elements, such as options, keywords, etc.
The Release Notes included with this release of PBS Professional list all new features in this
version of PBS Professional, and any warnings or caveats. Be sure to review the Release
Notes, as they may contain information that was not available when this book was written.
The PBS Professional manual pages that were reproduced in this guide are available in the
PBS Professional Reference Guide or as UNIX man pages. They have been removed from
this book to save space.
1.1 New Features in PBS 13.0
New Hook Events
PBS provides three new hook events:
•  An execjob_launch hook runs just before MoM runs the user’s program
•  An execjob_attach hook runs when pbs_attach is called
•  An exechost_startup hook runs when MoM starts up or is HUPed
See section 6.7.2, “When Hooks Run”, on page 454, section 6.12.4.9, “execjob_launch: Event
when Execution Host Receives Job”, on page 555, section 6.12.4.10, “execjob_attach: Event
when pbs_attach() runs”, on page 557, and section 6.12.4.14, “exechost_startup: Event When
Execution Host Starts Up”, on page 564.
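As a rough sketch of how one of these hooks might be attached, using the qmgr hook commands (the hook name and script path here are illustrative, not prescribed):

qmgr -c "create hook launch_check"
qmgr -c "set hook launch_check event = execjob_launch"
qmgr -c "import hook launch_check application/x-python default /tmp/launch_check.py"

The script itself is ordinary Python using the pbs module; see the sections cited above for what each event makes available.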
Configuration Files for Hooks
You can use configuration files with hooks. See section 6.8.6, “Using Hook Configuration
Files”, on page 465.
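A hook configuration file is imported in much the same way as the hook script itself; a sketch, reusing the illustrative launch_check hook from above (the file name and format are illustrative):

qmgr -c "import hook launch_check application/x-config default /tmp/launch_check.json"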
Configuring Vnodes in Hooks
You can use hooks to configure vnode attributes and resources. See section 6.10.4.4.iv, “Setting and Unsetting Vnode Resources and Attributes Using vnode_list[]”, on page 494.
Adding Custom Resources in Hooks
You can use hooks to add custom non-consumable host-level resources. See section 6.10.8,
“Adding Custom Non-consumable Host-level Resources”, on page 512.
Node Health Hook Features
PBS has node health checking features for hooks. You can offline and clear vnodes, and
restart the scheduling cycle. See section 6.10.6, “Offlining and Clearing Vnodes Using the
fail_action Hook Attribute”, on page 511 and section 6.10.7, “Restarting Scheduler Cycle
After Hook Failure”, on page 512.
Hook Debugging Enhancements
You can get hooks to produce debugging information, and then read that information in while
debugging hooks. See section 6.16, “Debugging Hooks”, on page 639.
Managing Built-in Hooks
You can enable and disable built-in hooks. See section 6.14, “Managing Built-in Hooks”, on
page 634.
Scheduler Does not Trigger modifyjob Hooks
The scheduler does not trigger modifyjob hooks. See Chapter 6, "Hooks", on page 437.
Faster, Asynchronous Communication Between Daemons
PBS has a communication daemon that provides faster, asynchronous communication
between the server, scheduler, and MoM daemons. See “Communication” on page 87 in the
PBS Professional Installation & Upgrade Guide.
Enhanced Throughput of Jobs
By default, the scheduler runs asynchronously to speed up job start, and jobs that have been
altered via qalter, server_dyn_res, or peering can run in the same scheduler cycle in
which they were altered. See section 4.4.7.1, “Improving Throughput of Jobs”, on page 117.
Creating Custom Resources via qmgr
You can create any custom resource using nothing but the qmgr command. See section 5.14.2.1, “Defining Custom Resources via qmgr”, on page 341.
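As a hedged sketch, a server-level consumable integer resource might be created as follows (the resource name and flags are illustrative; see the referenced section for the flags appropriate to your resource):

qmgr -c "create resource sitelicenses type=long,flag=q"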
Job Sorting Formula: Python Math Functions and Threshold
You can use standard Python math functions in the job sorting formula. You can also set a
threshold for job priority, below which jobs cannot run. See section 4.8.20, “Using a Formula
for Computing Job Execution Priority”, on page 194.
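For example, a hedged sketch (the weighting is purely illustrative; any resources and math functions you use must be among those the formula accepts):

qmgr -c 'set server job_sort_formula = "5*ncpus + sqrt(walltime)"'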
Fairshare: Formula and Decay Factor
You can use a mathematical formula for fairshare, and you can set a custom decay factor. See
section 4.8.18, “Using Fairshare”, on page 179.
Preempted Jobs can be Top Jobs
You can specify that preempted jobs should be classified as top jobs. See section 4.8.16,
“Calculating Job Execution Priority”, on page 174. You can use a new scheduler attribute
called sched_preempt_enforce_resumption for this; see section 4.8.3, “Using Backfilling”, on page 129.
Limiting Preemption Targets
You can specify which jobs can be preempted by a given job. See section 4.8.33.3.i, “How
Preemption Targets Work”, on page 244.
Limiting Number of Jobs in Execution Queues
You can speed up the scheduling cycle by limiting the number of jobs in execution queues.
See section 4.4.7.2, “Limiting Number of Jobs Queued in Execution Queues”, on page 117.
Improved Round-robin Behavior
The round_robin scheduler parameter produces improved behavior. See section 4.8.38,
“Round Robin Queue Selection”, on page 270.
Limiting Resources Allocated to Queued Jobs
You can set limits on the amounts of resources allocated to queued jobs specifically. See section 5.15.1, “Managing Resource Usage By Users, Groups, and Projects, at Server &
Queues”, on page 389.
Running qsub in the Foreground
By default, the qsub command runs in the background. You can run it in the foreground
using the -f option. See “qsub” on page 225 of the PBS Professional Reference Guide.
Windows Users can Use UNC Paths
Windows users can use UNC paths for job submission and file staging. See "Set up Paths", on
page 16 of the PBS Professional User’s Guide and "Using UNC Paths", on page 54 of the PBS
Professional User’s Guide.
Automatic Installation and Upgrade of Database
PBS automatically installs or upgrades its database. See “Automatic Database Upgrade” on
page 139 in the PBS Professional Installation & Upgrade Guide.
Longer Job and Reservation Names
You can use job and reservation names up to 236 characters in length. See “Formats” on page
421 of the PBS Professional Reference Guide.
Address Disambiguation for Multihomed Systems
You can disambiguate addresses for contacting the server, sending mail, sending outgoing
traffic, and delivering output and error files. See “PBS with Multihomed Systems” on page
105 in the PBS Professional Installation & Upgrade Guide.
Support for Hydra Process Manager in Intel MPI
Intel MPI is integrated with PBS. See "Integrating Intel MPI 4.0.3 On Linux/UNIX Using
Environment Variables” on page 897.
Enhancements to pbsnodes Command
You can now use the pbsnodes command to edit the comment attribute of a host, to write
out host information, and to operate on specific vnodes. See "pbsnodes” on page 108.
Primary Group of Job Owner or Reservation Creator Automatically Added to
Job group_list
The job submitter’s and reservation creator’s primary group is automatically added to the job
or reservation group_list attribute. See "qsub” on page 225 and "pbs_rsub” on page 83.
Intel MPI Integrated under Windows
MPI is integrated with PBS under Windows (as well as Linux/UNIX). See "Integrating Intel
MPI 4.0.3 on Windows Using Wrapper Script” on page 897.
MPICH2 Integrated under Windows
MPICH2 is integrated with PBS under Windows (as well as Linux/UNIX). See "Integrating
MPICH2 1.4.1p1 on Windows Using Wrapper Script” on page 897.
PBS pbsdsh Command Available under Windows
The pbsdsh command is available under Windows. See "pbsdsh” on page 104.
PBS TM APIs Available under Windows
The PBS TM APIs are available under Windows. See "TM Library” on page 91 of the PBS
Professional Programmer’s Guide.
PBS pbs_attach Command Available under Windows
The pbs_attach command is available under Windows. See "pbs_attach” on page 44.
Xeon Phi Reported on Cray
PBS automatically detects and reports a Xeon Phi in the ALPS inventory. See "Support for
Xeon Phi Coprocessor” on page 286.
1.2 Changes in Previous Releases
Command Line Editing in qmgr (12.2)
The qmgr command provides a history and allows you to edit command lines. See “Reusing
and Editing the qmgr Command Line” on page 159 of the PBS Professional Reference Guide.
Interactive Jobs Available under Windows (12.2)
Job submitters can run interactive jobs under Windows. See "Running Your Job Interactively", on page 183 of the PBS Professional User’s Guide.
Job Run Count is Writable (12.2)
Job submitters and administrators can set the value of a job’s run count. See section 11.18,
“Managing Number of Run Attempts”, on page 1004 and "Controlling Number of Times Job
is Re-run", on page 180 of the PBS Professional User’s Guide.
runjob Hook can Modify Job Attributes (12.2)
The runjob hook can modify a job’s attributes and resources. See section 6.10.4, “Using
Attributes and Resources in Hooks”, on page 488.
Jobs can be Suspended under Windows (12.2)
You can suspend and resume a job under Windows.
Configuration of Directory for PBS Component Temporary Files (12.2)
You can configure the root directory where you want PBS components to put their temporary
files. See section 12.10, “Temporary File Location for PBS Components”, on page 1038.
Execution Event and Periodic Hooks (12.0)
You can write hooks that run at the execution host when the job reaches the execution host,
when the job starts, ends, is killed, and is cleaned up. You can also write hooks that run periodically on all execution hosts. See Chapter 6, "Hooks", on page 437.
Shrink-to-fit Jobs (12.0)
PBS allows users to specify a variable running time for jobs. Job submitters can specify a
walltime range for jobs where attempting to run the job in a tight time slot can be useful.
Administrators can convert non-shrink-to-fit jobs into shrink-to-fit jobs in order to maximize
machine use. See "Adjusting Job Running Time", on page 167 of the PBS Professional
User’s Guide and section 4.8.41, “Using Shrink-to-fit Jobs”, on page 279.
PBS Supports Socket Licensing (11.3)
PBS lets you use socket licenses to license hosts. See “Overview of Licensing for PBS Jobs”
on page 115 in the PBS Professional Installation & Upgrade Guide.
Deleting Job History (11.3)
You can delete job histories. See section 11.15.8, “Deleting Moved Jobs and Job Histories”,
on page 1003.
Managing Resource Usage by Project (11.2)
You can set resource usage limits for projects, at the server and queue. You can set limits for
the amount of each resource being used, or for the number of jobs. Jobs have a new attribute
called project. See section 5.15.1, “Managing Resource Usage By Users, Groups, and
Projects, at Server & Queues”, on page 389.
Support for Accelerators on Cray (11.2)
PBS provides tight integration for accelerators on Cray. See section 10.3, “Support for Cray
Systems”, on page 923.
PBS Daemons Protected from OOM Killer (11.2)
PBS daemons are protected from being terminated by an OOM killer. See section 9.8.1,
“OOM Killer Protection”, on page 891.
PBS Supports X Forwarding for Interactive Jobs (11.2)
PBS allows users to receive X output from interactive jobs. See "Receiving X Output from
Interactive Jobs", on page 186 of the PBS Professional User’s Guide, and section 12.2.1.1,
“Contents of Environment File”, on page 1011.
Support for Interlagos on Cray (11.1)
You can allow users to request vnodes that have Interlagos hardware. See section 10.3.7.14,
“Allowing Users to Request Interlagos Hardware”, on page 941.
Improved Cray Integration (11.0)
PBS is more tightly integrated with Cray systems. You can use the PBS select and place language when submitting Cray jobs. See section 10.3, “Support for Cray Systems”, on page
923.
Vnode Access for Hooks (11.0)
Hooks have access to vnode attributes and resources. See Chapter 6, "Hooks", on page 437.
Enhanced Job Placement (11.0)
PBS allows job submitters to scatter chunks by vnode in addition to scattering by host. PBS
also allows job submitters to reserve entire hosts via a job’s placement request. See "Specifying Job Placement", on page 92 of the PBS Professional User’s Guide.
Choice in PBS service account Name (11.0)
Under Windows, the PBS service account used to run PBS daemons can have any name. See
“The PBS Service Account” on page 20 in the PBS Professional Installation & Upgrade
Guide and “The PBS service account for Standalone Environments” on page 23 in the PBS
Professional Installation & Upgrade Guide.
Change of Licensing Method (11.0)
As of 11.0, PBS is licensed using a new Altair license server. See “Licensing” on page 115 in
the PBS Professional Installation & Upgrade Guide.
Change in Data Management (11.0)
PBS uses a new data service. See section 12.7, “Managing the Data Service”, on page 1025.
Choice in Job Requeue Timeout (11.0)
You can choose how long the job requeue process should be allowed to run. See section 9.4.3,
“Setting Job Requeue Timeout”, on page 883.
Backfilling Around Top N Jobs (10.4)
PBS can backfill around the most deserving jobs. You can configure the number of jobs PBS
backfills around. See section 4.8.3, “Using Backfilling”, on page 129.
Estimating Job Start Times (10.4)
PBS can estimate when jobs will run, and which vnodes each job will use. See section 4.8.15,
“Estimating Job Start Time”, on page 169.
Unified Job Submission (10.4)
PBS allows users to submit jobs using the same scripts, whether the job is submitted on a
Windows or UNIX/Linux system. See "Python Job Scripts", on page 25 of the PBS Professional User’s Guide.
Provisioning (10.2)
PBS provides automatic provisioning of an OS or application on vnodes that are configured to
be provisioned. When a job requires an OS that is available but not running, or an application
that is not installed, PBS provisions the vnode with that OS or application. See Chapter 7,
"Provisioning", on page 739.
New Hook Type (10.2)
PBS has a new hook type which can be triggered when a job is to be run. See "Hooks” on
page 437.
New Scheduler Attribute (10.2)
PBS allows the administrator to set the scheduler’s cycle time using the new
sched_cycle_length scheduler attribute. See the pbs_sched_attributes(7B) manual
page.
Walltime as Checkpoint Interval Measure (10.2)
PBS allows a job to be checkpointed according to its walltime usage. See the
pbs_job_attributes(7B) manual page.
Employing User Space Mode on IBM InfiniBand Switches (10.2)
PBS allows users submitting POE jobs to use InfiniBand switches in User Space mode. See
section 10.2, “Support for IBM AIX”, on page 917.
Managing Resource Usage (10.1)
You can set separate limits for resource usage by individual users, individual groups, generic
users, generic groups, and the total used. You can limit the amount of resources used, and the
number of queued and running jobs. These limits can be defined separately for each queue
and for the server. See section 5.15.1, “Managing Resource Usage By Users, Groups, and
Projects, at Server & Queues”, on page 389. These new limits are incompatible with the limit
attributes existing before Version 10.1.
Managing Job History (10.1)
PBS Professional can provide job history information, including what the submission parameters were, whether the job started execution, whether execution succeeded, whether staging
out of results succeeded, and which resources were used. PBS can keep job history for jobs
which have finished execution, were deleted, or were moved to another server. See section
11.15, “Managing Job History”, on page 999.
Reservation Fault Tolerance (10.1)
PBS attempts to reconfirm reservations for which associated vnodes have become unavailable. See section 9.5, “Reservation Fault Tolerance”, on page 887.
Checkpoint Support via Epilogue (10.1)
Checkpointed jobs can be requeued if the epilogue exits with a special value. See section
9.3.7.3, “Requeueing via Epilogue”, on page 875.
Hooks (10.0)
Hooks are custom executables that can be run at specific points in the execution of PBS.
They accept, reject, or modify the upcoming action. This provides job filtering, patches or
workarounds, and extends the capabilities of PBS, without the need to modify source code.
See Chapter 6, "Hooks", on page 437.
Versioned Installation (10.0)
PBS is now automatically installed in versioned directories. For most platforms, different
versions of PBS can coexist, and upgrading is simplified. See Chapter 3, "Installation", on
page 31 and Chapter 7, "Upgrading", on page 137 in the PBS Professional Installation and
Upgrade Guide.
Resource Permissions for Custom Resources (9.2)
You can set permissions on custom resources so that they are either invisible to users or cannot be requested by users. This also means that users cannot modify a resource request for
those resources via qalter. See section 5.14.2.10, “Resource Permission Flags”, on page
351.
Extension to Job Sorting Formula (9.2)
The job sorting formula has been extended to include parentheses, exponentiation, division,
and unary plus and minus. See section 4.8.3, “Using Backfilling”, on page 129.
Eligible Wait Time for Jobs (9.2)
A job that is waiting to run can be accruing “eligible time”. Jobs can accrue eligible time
when they are blocked due to a lack of resources. This eligible time can be used in the job
sorting formula. Jobs have two new attributes, eligible_time and accrue_type, which indicates what kind of wait time the job is accruing. See section 4.8.13, “Eligible Wait Time for
Jobs”, on page 163.
Job Staging and Execution Directories (9.2)
PBS now provides per-job staging and execution directories. Jobs have new attributes sandbox and jobdir, the MoM has a new option $jobdir_root, and there is a new environment
variable called PBS_JOBDIR. If the job’s sandbox attribute is set to PRIVATE, PBS creates a job-specific staging and execution directory. If the job’s sandbox attribute is unset or
is set to HOME, PBS uses the user’s home directory for staging and execution, which is how
previous versions of PBS behaved. If MoM’s $jobdir_root is set to a specific directory,
that is where PBS will create job-specific staging and execution directories. If MoM’s
$jobdir_root is unset, PBS will create the job-specific staging and execution directory
under the user’s home directory. See section 11.13.1, “Staging and Execution Directories for
Job”, on page 990.
Standing Reservations (9.2)
PBS now provides both advance and standing reservation of resources. A standing reservation is a reservation of resources for specific recurring periods of time. See section 4.8.37,
“Advance and Standing Reservations”, on page 264.
New Server Attribute for Job Sorting Formula (9.1)
The new server attribute “job_sort_formula” is used for sorting jobs according to a sitedefined formula. See section 4.8.20, “Using a Formula for Computing Job Execution Priority”, on page 194.
Change to sched_config (9.1)
The default for job_sort_key of “cput” is commented out in the default sched_config
file. It is left in as a usage example.
Change to Licensing (9.0)
PBS now depends on an Altair license server that will hand out licenses to be assigned to PBS
jobs. See “Licensing” on page 115 in the PBS Professional Installation & Upgrade Guide.
PBS Professional versions 8.0 and below will continue to be licensed using the proprietary
licensing scheme.
Installing With Altair Licensing (9.0)
If you will use floating licenses, we recommend that you install and configure the Altair
license server before installing and configuring PBS. PBS starts up faster. See “Overview of
Installation” on page 31 in the PBS Professional Installation & Upgrade Guide.
Unset Host-level Resources Have Zero Value (9.0)
An unset numerical resource at the host level behaves as if its value is zero, but at the server
or queue level it behaves as if it were infinite. An unset string or string array resource cannot
be matched by a job’s resource request. An unset boolean resource behaves as if it is set to
“False”. See section 4.8.28.7, “Matching Unset Resources”, on page 212.
Better Management of Resources Allocated to Jobs (9.0)
The resources allocated to a job from vnodes will not be released until certain allocated
resources have been freed by all MoMs running the job. The end of job accounting record
will not be written until all of the resources have been freed. The “end” entry in the job end
(‘E’) record will include the time to stage out files, delete files, and free the resources. This
will not change the recorded “walltime” for the job.
Support for Large Page Mode on AIX (9.0)
PBS Professional supports Large Page Mode on AIX. No additional steps are required from
the PBS administrator.
1.3 Deprecations and Removals
The -a alarm option to pbs_sched is deprecated, and is replaced with the
sched_cycle_length scheduler attribute.
The sort_priority option to job_sort_key is deprecated and is replaced with the
job_priority option.
The -lnodes=nodespec form is replaced by the -l select= and -l place= statements.
The nodes resource is no longer used.
The -l resource=rescspec form is replaced by the -l select= statement.
The time-shared node type is no longer used, and
the :ts suffix is obsolete.
The cluster node type is no longer used.
The resource arch is only used inside of a select statement.
The resource host is only used inside of a select statement.
The nodect resource is obsolete. The ncpus resource should be used instead. Sites which
currently have default values or limits based on nodect should change them to be based on
ncpus.
The neednodes resource is obsolete.
The ssinodes resource is obsolete.
Properties are replaced by boolean resources.
The -a option to the qselect command is deprecated.
The -Wdelay=nnnn option to the qdel command is deprecated.
The -c and -d options to the pbsnodes command were deprecated and have been removed.
The memreserved MoM configuration option is deprecated.
The pbs_tclapi pbsrescquery command is deprecated.
The pbs_rescquery command is deprecated.
The sync_time scheduler configuration option is deprecated.
The Cray mpp* syntax is deprecated with PBS version 11. Requesting the mpp* resources
in any command is deprecated.
•  The following resources are deprecated:
mppwidth
mppdepth
mppnppn
mppmem
mpparch
mpphost
mpplabels
mppnodes
•  PBS does not support server or queue level mpp* defaults. The following are deprecated:
resources_default.mppwidth
resources_default.mppdepth
resources_default.mppnppn
resources_default.mppmem
resources_default.mpparch
resources_default.mpphost
resources_default.mpplabels
resources_default.mppnodes
•  PBS does not support mpp* minima or maxima for server and queues. The following are deprecated:
resources_min.mppwidth
resources_min.mppdepth
resources_min.mppnppn
resources_min.mppmem
resources_min.mpparch
resources_min.mpphost
resources_min.mpplabels
resources_min.mppnodes
resources_max.mppwidth
resources_max.mppdepth
resources_max.mppnppn
resources_max.mppmem
resources_max.mpparch
resources_max.mpphost
resources_max.mpplabels
resources_max.mppnodes
The pbs_license_file_location server attribute is deprecated and replaced by
pbs_license_info.
The configrm() resource monitor API call is deprecated.
Support in PBS for CSA on SGI systems is removed.
Globus can still send jobs to PBS, but PBS no longer supports sending jobs to Globus (11.3).
Support for LAM MPI 6.5.9 is deprecated (12.0).
In version 12.0, PBS uses Python 2.5. PBS will use a newer version of Python in some subsequent release, so support for Python 2.5 is deprecated. (12.0).
The pbs-report command is deprecated, and will be moved to the unsupported directory in the next release.
The sort_queues scheduler parameter is deprecated. (12.2).
The smp_cluster_dist scheduler parameter is deprecated. (12.2).
Support for HPCBP jobs is removed (12.2).
The sort_queues scheduler parameter has no effect. (13.0).
Using pbsrun_wrap and pbsrun_unwrap for Intel MPI is deprecated (13.0).
The half_life scheduler parameter is deprecated (13.0).
The preempt_priority argument to the job_sort_key scheduler parameter is deprecated
(13.0).
The xpbs and xpbsmon interfaces to PBS are deprecated (13.0).
The TMPDIR environment variable is deprecated and replaced with PBS_TMPDIR (13.0).
1.4 Backward Compatibility

1.4.1 New and Old Resource Usage Limits Incompatible
The new resource usage limits are incompatible with the old resource usage limits. See section 5.15.1.15, “Old Limit Attributes: Server and Queue Resource Usage Limit Attributes
Existing Before Version 10.1”, on page 411, section 5.15.1.13.v, “Do Not Mix Old And New
Limits”, on page 410, and section 5.15.1.14.i, “Error When Setting Limit Attributes”, on page
410.
1.4.2 Job Dependencies Affected By Job History
Enabling job history changes the behavior of dependent jobs. If a job j1 depends on a finished job j2 for which PBS is maintaining history, then j1 goes into the held state. If job j1 depends on a finished job j3 that has been purged from the historical records, then j1 is rejected, just as in previous versions of PBS where the job was no longer in the system.
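For example, a dependent job is submitted in the usual way (the job ID and script name are illustrative):

qsub -W depend=afterok:1234.server1 job.sh

With job history enabled, whether 1234.server1 is a finished job still in the history or one that has been purged determines whether the new job is held or rejected.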
1.4.3 PBS Path Information No Longer Saved in AUTOEXEC.BAT
Any value for PATH saved in AUTOEXEC.BAT may be lost after installation of PBS. If
there is any path information that needs to be saved, AUTOEXEC.BAT must be edited by
hand after the installation of PBS. PBS path information is no longer saved in
AUTOEXEC.BAT.
1.4.4 OS-level Checkpointing Not Supported
PBS does not directly support OS-level checkpointing. PBS supports checkpointing using
site-supplied methods. See section 9.3, “Checkpoint and Restart”, on page 857.
2 Configuring the Server and Queues
This chapter describes how to configure the server and any queues.
2.1 The Server

2.1.1 Configuring the Server
You configure the server by setting server attributes via the qmgr command:
Qmgr: set server <attribute> = <value>
For a description of the server attributes, see “Server Attributes” on page 332 of the PBS Professional Reference Guide.
For a description of the qmgr command, see “qmgr” on page 158 of the PBS Professional
Reference Guide.
2.1.2 Default Server Configuration
The default configuration from the binary installation sets the default server settings. An
example server configuration is shown below:
qmgr
Qmgr: print server
#
# Create queues and set their attributes.
# Create and define queue workq
#
create queue workq
set queue workq queue_type = Execution
set queue workq enabled = True
set queue workq started = True
#
# Set server attributes.
#
set server scheduling = True
set server default_queue = workq
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.ncpus = 1
set server scheduler_iteration = 600
set server resv_enable = True
set server node_fail_requeue = 310
set server max_array_size = 10000
set server default_chunk.ncpus=1
2.1.3 The PBS Node File
The server creates a file of the nodes managed by PBS. This node file is written only by the
server. On startup each MoM sends a time-stamped list of her known vnodes to the server.
The server updates its information based on that message. If the time stamp on the vnode list
is newer than what the server recorded before in the node file, the server will create any
vnodes which were not already defined. If the time stamp in the MoM’s message is not
newer, then the server will not create any missing vnodes and will log an error for any vnodes
reported by MoM but not already known.
Whenever new vnodes are created, the server sends a message to each MoM with the list of
MoMs and each vnode managed by the MoMs. The server will only delete vnodes when they
are explicitly deleted via qmgr.
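Because the server is the authority for vnode definitions, a vnode is removed through qmgr rather than by editing the node file; for example (the vnode name is illustrative):

qmgr -c "delete node node01"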
This is different from the node file created for each job. See "The Job Node File", on page
109 of the PBS Professional User’s Guide.
2.1.4 Server Configuration Attributes
See “Server Attributes” on page 332 of the PBS Professional Reference Guide for a table of
server attributes.
2.1.5 Recording Server Configuration
If you wish to record the configuration of a PBS server for re-use later, you may use the
print subcommand of qmgr(8B). For example,
qmgr -c "print server" > /tmp/server.out
qmgr -c "print node @default" > /tmp/nodes.out
will record in the file /tmp/server.out the qmgr subcommands required to recreate the
current configuration including the queues. The second file generated above will contain the
vnodes and all the vnode properties. The commands could be read back into qmgr via standard input:
qmgr < /tmp/server.out
qmgr < /tmp/nodes.out
2.1.6 Support for Globus
Globus can still send jobs to PBS, but PBS no longer supports sending jobs to Globus. The
Globus MoM is no longer available.
2.1.7 Configuring the Server for Licensing
The PBS server must be configured for licensing. You must set the location where PBS will
look for the license file and/or license server(s), by setting the server attribute
pbs_license_info. The other server licensing attributes have defaults, but you may wish to
set them as well. See “Configuring PBS for Licensing” on page 119 in the PBS Professional
Installation & Upgrade Guide.
You may also wish to have redundant license servers. See the Altair License Management
System Installation and Operations Guide, available at www.pbsworks.com.
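For example, to point PBS at a single Altair license server (the port and host name are illustrative):

Qmgr: set server pbs_license_info = 6200@licserver1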
2.1.8 Configuring Mail
You can configure the account used as the address for administrative mail; the same account both sends and receives it. For example, when failover occurs, an email saying that failover has occurred is sent to and from the account defined in the server’s mail_from attribute.
Use the qmgr command to set the mail_from server attribute to an address that is monitored
regularly:
Qmgr: s server mail_from=<address>
You cannot configure which mail server PBS uses. PBS uses the default mail server. On
UNIX/Linux, this is /usr/lib/sendmail.
On Windows, PBS uses sendmail on the host specified in the server’s mail_from attribute.
For example, if you set mail_from to admin_acct@host1.example.com, PBS uses
sendmail on host1.
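For example, using a regularly monitored account (the address is illustrative):

Qmgr: set server mail_from = pbsadmin@host1.example.com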
2.1.8.1 Mail Caveats
If you do not set the mail_from attribute on Windows, PBS will not be able to send mail.
2.2 Queues
When a job is submitted to PBS and accepted, it is placed in a queue. Although the name implies first-in, first-out ordering of jobs, this is not the case: job submission order does not determine job execution order. See Chapter 4, "Scheduling", on page 63.
You can create different queues for different purposes: queues for certain kinds of jobs,
queues for specific groups, queues for specific vnodes, etc. You can tell PBS how to automatically route jobs into each queue. PBS has a default execution queue named workq, where
jobs are placed when no queue is requested. You can specify which queue should be the
default. See section 2.2.14, “Specifying Default Queue”, on page 34.
2.2.1 Kinds of Queues

2.2.1.1 Execution and Routing Queues
There are two main types of PBS queues: routing and execution.

•  A routing queue is used only to move jobs to other queues. These destination queues can be routing or execution queues, and can be located at different PBS servers. For more information on creating and using routing queues, see section 2.2.6, “Routing Queues”, on page 24.

•  An execution queue is used as the home for a waiting or running job. A job must reside in an execution queue to be eligible to run. The job remains in the execution queue during the time it is running. See section 2.2.5, “Execution Queues”, on page 21.
2.2.1.2 Available Kinds of Queues
PBS supplies the following kinds of execution and routing queues:
Table 2-1: Kinds of Queues

Kind of Queue             Description                          Link
Routing queues            Used for moving jobs to another      See section 2.2.6, “Routing Queues”,
                          queue                                on page 24
Execution queues:
  Reservation queues      Created for reservation              See section 2.2.5.2.iv, “Reservation
                                                               Queues”, on page 23
  Dedicated time queues   Holds jobs that run only during      See section 2.2.5.2.i, “Dedicated Time
                          dedicated time                       Queues”, on page 22
  Primetime queues        Holds jobs that run only during      See section 2.2.5.2.ii, “Primetime and
                          primetime                            Non-Primetime Queues”, on page 23
  Non-primetime queues    Holds jobs that run only during      See section 2.2.5.2.ii, “Primetime and
                          non-primetime                        Non-Primetime Queues”, on page 23
  Anytime queues          Queue with no dedicated time or      See section 2.2.5.2.iii, “Anytime
                          primetime restrictions               Queues”, on page 23
  Express queues          High-priority queue; priority is     See section 2.2.5.3.i, “Express
                          set to the level signifying that     Queues”, on page 23
                          it is an express queue
  Anti-express queue      Low-priority queue designed for      See section 4.8.1, “Anti-Express
                          work that should run only when no    Queues”, on page 125
                          other jobs need the resources

2.2.2 Basic Queue Use
The simplest form of PBS uses just one queue. The queue is an execution queue named
workq. This queue is always created, enabled, and started for you during installation. After a
basic installation, this queue is ready to hold jobs submitted by users.
2.2.3 Creating Queues
To create a queue, use the qmgr command to create it and set its queue_type attribute:
Qmgr: create queue <queue name>
Qmgr: set queue <queue name> queue_type = <execution or route>
For example, to create an execution queue named exec_queue, set its type, start it, and
enable it:
Qmgr: create queue exec_queue
Qmgr: set queue exec_queue queue_type = execution
Qmgr: set queue exec_queue enabled = True
Qmgr: set queue exec_queue started = True
Now we will create a routing queue, which will send jobs to our execution queue:
Qmgr: create queue routing_queue
Qmgr: set queue routing_queue queue_type = route
Qmgr: set queue routing_queue route_destinations = exec_queue
2.2.4 Enabling, Disabling, Starting, and Stopping Queues
When you enable a queue, you allow it to accept jobs, meaning that jobs can be enqueued in
the queue. When you disable a queue, you disallow it from accepting jobs. Queues are disabled by default. You enable a queue by setting its enabled attribute to True:
Qmgr: set queue <queue name> enabled = True
When you start a queue, you allow the jobs in the queue to be executed. Jobs are selected to
be run according to the scheduling policy. When you stop a queue, you disallow jobs in that
queue from running, regardless of scheduling policy. Queues are stopped by default. You
start a queue by setting its started attribute to True:
Qmgr: set queue <queue name> started = True
2.2.5 Execution Queues
Execution queues are used to run jobs; jobs must be in an execution queue in order to run.
PBS does not route from execution queues.
2.2.5.1 Where Execution Queues Get Their Jobs
By default, PBS allows jobs to be moved into execution queues via the qmove command, by
hooks, from routing queues, and by being submitted to execution queues. You can specify
that an execution queue should accept only those jobs that are routed from a routing queue by
PBS, by setting the queue’s from_route_only attribute to True:
Qmgr: set queue <queue name> from_route_only = True
2.2.5.2 Execution Queues for Specific Time Periods
PBS provides a mechanism that allows you to specify that the jobs in an execution queue can
run only during specific time periods. PBS provides a different kind of execution queue for
each kind of time period. The time periods you can specify are the following:
Advance or Standing Reservations
You can create an advance or standing reservation. An advance reservation is a reservation for specified resources for a specified time period with a defined beginning
and end. A standing reservation is a series of recurring advance reservations.
Dedicated time
Dedicated time is a period of time with a defined beginning and end. You can define
multiple dedicated times.
Primetime
Primetime is a recurring time period with a defined beginning and end. You can
define primetime to be different for each day of the week.
Non-primetime
Non-primetime is a recurring time period with a defined beginning and end. Nonprimetime begins when primetime ends, and vice versa.
Holidays
Holidays are dates defined in the PBS_HOME/sched_priv/holidays file. PBS has a default set of holidays, and you can define your own holidays. Holiday time is treated like non-primetime, meaning jobs in non-primetime queues run during holiday time.
Anytime queue
The term “anytime queue” means a queue that is not a primetime or a non-primetime
queue.
2.2.5.2.i Dedicated Time Queues
The jobs in a dedicated time execution queue can run only during dedicated time. Dedicated
time is defined in PBS_HOME/sched_priv/dedicated_time. See section 4.8.10,
“Dedicated Time”, on page 161.
To specify that a queue is a dedicated time queue, you prefix the queue name with the dedicated time keyword. This keyword defaults to “ded”, but can be defined in the
dedicated_prefix scheduler parameter in PBS_HOME/sched_priv/sched_config. See
“dedicated_prefix” on page 299 of the PBS Professional Reference Guide.
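As a hedged sketch, a dedicated time queue using the default “ded” prefix might be set up as follows (the queue name and dates are illustrative; verify the dedicated_time entry format against the referenced section):

Qmgr: create queue ded_maint queue_type = execution
Qmgr: set queue ded_maint enabled = True
Qmgr: set queue ded_maint started = True

with a matching window in PBS_HOME/sched_priv/dedicated_time, for example:

06/15/2015 20:00 06/16/2015 06:00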
2.2.5.2.ii Primetime and Non-Primetime Queues
The jobs in a primetime queue run only during primetime, and the jobs in a non-primetime
queue run only during non-primetime. Primetime and non-primetime are defined in
PBS_HOME/sched_priv/holidays. See section 4.8.34, “Using Primetime and Holidays”, on page 256.
To specify that a queue is a primetime or non-primetime queue, you prefix the queue name
with the primetime or non-primetime keyword. For primetime, this keyword defaults to
“p_”, and for non-primetime, the keyword defaults to “np_”, but these can be defined in the
primetime_prefix and nonprimetime_prefix scheduler parameters in PBS_HOME/
sched_priv/sched_config. See “Scheduler Parameters” on page 297 of the PBS Professional Reference Guide.
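For example, assuming the default prefixes, the following illustrative queues are treated as a primetime queue and a non-primetime queue, respectively:
Qmgr: create queue p_dayQ queue_type = execution
Qmgr: create queue np_nightQ queue_type = execution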
2.2.5.2.iii Anytime Queues
An anytime queue is a queue whose jobs can run at any time. An anytime queue is simply a
queue that is not a dedicated time, primetime, or non-primetime queue.
2.2.5.2.iv Reservation Queues
When the pbs_rsub command is used to create a reservation or to convert a job into a reservation job, PBS creates a reservation queue. Jobs in the queue run only during the reservation. See section 4.8.37, “Advance and Standing Reservations”, on page 264.
2.2.5.3 Prioritizing Execution Queues
You can set the priority of each execution queue as compared to the other queues in this complex by specifying a value for the priority queue attribute:
Qmgr: set queue <queue name> priority = <value>
A higher value for priority means the queue has greater priority. There is no limit on the priority that you can assign to a queue; however, the value must fit within the size of an integer. See "Queue Attributes" on page 371 of the PBS Professional Reference Guide.
For how queue priority is used in scheduling, see section 4.8.36, “Queue Priority”, on page
262.
2.2.5.3.i Express Queues
A queue is an express queue if its priority is greater than or equal to the value that defines an
express queue. This value is set in the preempt_queue_prio parameter in PBS_HOME/
sched_priv/sched_config. The default value for preempt_queue_prio is 150.
You do not need to set by_queue to True in order to use express queues.
For how express queues can be used, see section 4.8.17, “Express Queues”, on page 179.
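For example, assuming the default preempt_queue_prio of 150, the following illustrative queue qualifies as an express queue:
Qmgr: create queue FastQ queue_type = execution
Qmgr: set queue FastQ priority = 200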
2.2.6 Routing Queues
A routing queue is used only to route jobs; jobs cannot run from a routing queue. A routing
queue has the following properties:
• Can route to multiple destinations
• Tries destinations in round-robin fashion, in the order listed
• Can route to execution queues
• Can route to other routing queues
• Can route to queues in other complexes (at other servers)
Destinations can be specified in the following ways:
route_destinations = Q1
route_destinations = Q1@Server1
route_destinations = "Q1, Q2@Server1, Q3@Server2"
route_destinations += Q1
route_destinations += "Q4, Q5@Server3"
2.2.6.1 How Routing Works
Whenever a job enters a routing queue, PBS immediately attempts to route the job to a destination queue. The result is one of the following:
• The job is routed to one of the destination queues.
• The attempt to route is permanently rejected by each destination, and the job is deleted.
• Every destination rejects the job, but at least one rejection is temporary. In this case, the destination is tried again later.
If there are multiple routing queues containing jobs to be routed, the routing queues are processed in the order in which they are displayed in the output of a qstat -Q command.
When PBS routes a job, it tries each destination in the order listed. The job’s destination is the
first queue that accepts it.
Queue priority does not play a role in routing jobs.
2.2.6.2 Requirements for Routing Queues
• A routing queue's destination queues must be created before being specified in the routing queue's route_destinations attribute.
• A routing queue's route_destinations attribute must be specified before enabling and starting the routing queue.
2.2.6.3 Caveats and Advice for Routing Queues
• Routing loops should be avoided. If a job makes more than 20 routing hops, it is discarded, and mail may be sent. Avoid setting a routing queue's destination to be the routing queue itself.
• When routing to a complex that is using failover, it's a good idea to include the names of both primary and secondary servers in a routing destination:
route_destinations = "destQ@primary_server, destQ@secondary_server"
• When routing a job between complexes, the job's owner must be able to submit a job to the destination complex.
• When routing to a destination in another complex, the source and destination complexes should use the same version of PBS. If not, you may need a submission hook to modify incoming jobs.
• It is recommended to list the destination queues in order of the most restrictive first, because the first queue that meets the job's requirements and is enabled will be its destination.
2.2.6.4 Using Resources to Route Jobs Between Queues
You can use resources to direct jobs to the desired queues. The server will automatically route
jobs that are in routing queues, based on job resource requests. The destination queue can be
at the local server or at another server. If you have more than one PBS complex, you may
want to route jobs between the complexes, depending on the resources available at each complex.
You can set up queues for specific kinds of jobs, for example jobs requesting very little memory, a lot of memory, or a particular application. You can then route jobs to the appropriate
queues.
A routing queue tests destination queues in the order listed in the queue’s route_destinations
attribute. The job is placed in the first queue that meets the job’s request and is enabled.
Please read all of the subsections for this section.
2.2.6.4.i How Queue and Server Limits Are Applied, Except Running Time
The following applies to all resources except for min_walltime and max_walltime.
You can set a minimum and a maximum for each resource at each queue using the
resources_min.<resource> and resources_max.<resource> queue attributes. Any time a
job is considered for entry into a queue, the job’s resource request is tested against
resources_min.<resource> and resources_max.<resource> for that queue. The job’s
resource request must be greater than or equal to the value specified in
resources_min.<resource>, and less than or equal to the value specified in
resources_max.<resource>.
The job is tested only against existing resources_min.<resource> and
resources_max.<resource> for the queue.
Only those resources that are specified in the job’s resource request are tested, so if a job does
not request a particular resource, and did not inherit a default for that resource, the minimum
and maximum tests for that resource are not applied to the job.
If you want jobs requesting only a specific value for a resource to be allowed into a queue, set
the queue’s resources_min.<resource> and resources_max.<resource> to the same value.
This resource can be numeric, string, string array, or Boolean.
If you limit queue access using a string array, a job must request one of the values in the string array to be allowed into the queue. For example, if you set resources_min.strarr and resources_max.strarr to "blue,red,black", jobs can request -l strarr=blue, -l strarr=red, or -l strarr=black to be allowed into the queue.
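A sketch of such a setup, assuming strarr is an already-defined custom string array resource and QueueB is an illustrative queue name:
Qmgr: set queue QueueB resources_min.strarr = "blue,red,black"
Qmgr: set queue QueueB resources_max.strarr = "blue,red,black"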
2.2.6.4.ii How Queue and Server Running Time Limits are Applied
For shrink-to-fit jobs, running time limits are applied to max_walltime and min_walltime, not
walltime. To set a running time limit for shrink-to-fit jobs, you cannot use resources_max or
resources_min for max_walltime or min_walltime. Instead, use resources_max.walltime
and resources_min.walltime. See section 4.8.41.6, “Shrink-to-fit Jobs and Resource Limits”,
on page 283.
2.2.6.4.iii Resources Used for Routing and Admittance
You can route jobs using the following kinds of resources:
• Any server-level or queue-level (job-wide) built-in or custom resource, whether it is numeric, string, or Boolean, for example ncpus and software.
When routing jobs with min_walltime and/or max_walltime, PBS examines the values for resources_min.walltime and resources_max.walltime at the server or queue. See section 2.2.6.4.ii, "How Queue and Server Running Time Limits are Applied", on page 26.
• The following built-in chunk-level resources:
accelerator_memory
mem
mpiprocs
naccelerators
ncpus
netwins
nodect
vmem
• Custom vnode-level (chunk-level) resources that are global and have the n, q, or f flags set
• Any resource in the job's Resource_List attribute; see section 5.9.2, "Resources Requested by Job", on page 323. For string or string array resources, see section 2.2.6.4.iv, "Using String, String Array, and Boolean Values for Routing and Admittance", on page 28.
When jobs are routed using a chunk-level resource, routing is based on the sum of that
resource across all chunks.
2.2.6.4.iv Using String, String Array, and Boolean Values for Routing and Admittance
When using strings or string arrays for routing or admittance, you can use only job-wide
(server-level or queue-level) string or string array resources. String or string array resources
in chunks are ignored. The resources_min and resources_max attributes work as expected
with numeric values. In addition, they can be used with string and Boolean values to force an
exact match; this is done by setting both to the same value. For example, to limit jobs entering queue App1Queue to those that request software=App1, or that do not request a value for software:
Qmgr: set q App1Queue resources_max.software=App1
Qmgr: set q App1Queue resources_min.software=App1
2.2.6.4.v Examples of Routing Jobs
You can force all jobs into a routing queue, or you can allow users to request some queues but
not others. If you set up the default queue to be a routing queue, and make all execution queues
accept jobs only from routing queues, all jobs are initially forced into a routing queue.
Alternatively, you can set up one routing queue and a couple of execution queues which
accept jobs only from routing queues, but add other queues which can be requested. Or you
could allow jobs to request the execution queues, by making the execution queues also accept
jobs that aren’t from routing queues.
Example 2-1: Jobs can request one execution queue named WorkQ. All jobs that do not
request a specific queue are routed according to their walltime:
• Create a routing queue RouteQ and make it the default queue:
Qmgr: create queue RouteQ queue_type = route
Qmgr: set server default_queue = RouteQ
• Create two execution queues, LongQ and ShortQ. One is for long-running jobs, and one is for short-running jobs:
Qmgr: create queue LongQ queue_type = execution
Qmgr: create queue ShortQ queue_type = execution
• Set resources_min.walltime and resources_max.walltime on these queues:
Qmgr: set queue LongQ resources_min.walltime = 5:00:00
Qmgr: set queue ShortQ resources_max.walltime = 4:59:00
• For LongQ and ShortQ, disallow jobs that are not from a route queue:
Qmgr: set queue LongQ from_route_only = True
Qmgr: set queue ShortQ from_route_only = True
• Set the destinations for RouteQ to be LongQ and ShortQ:
Qmgr: set queue RouteQ route_destinations = "ShortQ, LongQ"
• Create a work queue that can be requested:
Qmgr: create queue WorkQ queue_type = execution
• Enable and start all queues:
Qmgr: active queue RouteQ,LongQ,ShortQ,WorkQ
Qmgr: set queue enabled = True
Qmgr: set queue started = True
• Set a default for walltime at the server so that jobs that don't request it inherit the default, and land in ShortQ:
Qmgr: set server resources_default.walltime = 4:00:00
Example 2-2: Jobs are not allowed to request any queues. All jobs are routed to one of three
queues based on the job’s walltime request:
• Create a routing queue RouteQ and make it the default queue:
Qmgr: create queue RouteQ queue_type = route
Qmgr: set server default_queue = RouteQ
• Create three execution queues, LongQ, MedQ, and ShortQ. One is for long-running jobs, one is for medium jobs, and one is for short-running jobs:
Qmgr: create queue LongQ queue_type = execution
Qmgr: create queue MedQ queue_type = execution
Qmgr: create queue ShortQ queue_type = execution
• Set resources_min.walltime and resources_max.walltime on these queues:
Qmgr: set queue LongQ resources_min.walltime = 10:00:00
Qmgr: set queue MedQ resources_max.walltime = 9:59:00
Qmgr: set queue MedQ resources_min.walltime = 5:00:00
Qmgr: set queue ShortQ resources_max.walltime = 4:59:00
• For LongQ, MedQ, and ShortQ, disallow jobs that are not from a route queue:
Qmgr: set queue LongQ from_route_only = True
Qmgr: set queue MedQ from_route_only = True
Qmgr: set queue ShortQ from_route_only = True
• Set the destinations for RouteQ to be LongQ, MedQ, and ShortQ:
Qmgr: set queue RouteQ route_destinations = "ShortQ, MedQ, LongQ"
• Enable and start all queues:
Qmgr: active queue RouteQ,LongQ,ShortQ,MedQ
Qmgr: set queue enabled = True
Qmgr: set queue started = True
2.2.6.4.vi Caveats for Queue Resource Limits
If a job is submitted without a request for a particular resource, and no defaults for that
resource are set at the server or queue, and either the server or queue has
resources_max.<resource> set, the job inherits that maximum value. If the queue has
resources_max.<resource> set, the job inherits the queue value, and if not, the job inherits
the server value.
2.2.6.5 Using Access Control to Route Jobs
You can route jobs based on job ownership by setting access control limits at destination
queues. A queue’s access control limits specify which users or groups are allowed to have
jobs in that queue. Default behavior is to disallow an entity that is not listed, so you need
only list allowed entities.
To set the list of allowed users at a queue:
Qmgr: set queue <queue name> acl_users = "User1@*.example.com, User2@*.example.com"
To enable user access control at a queue:
Qmgr: set queue <queue name> acl_user_enable = True
To set the list of allowed groups at a queue:
Qmgr: set queue <queue name> acl_groups = "Group1@*.example.com, Group2@*.example.com"
To enable group access control at a queue:
Qmgr: set queue <queue name> acl_group_enable = True
For a complete explanation of access control, see section 8.3, “Using Access Control”, on
page 791.
2.2.6.6 Allowing Routing of Held or Waiting Jobs
By default, PBS will not route jobs that are held. You can allow a routing queue to route held
jobs by setting the queue’s route_held_jobs attribute to True:
Qmgr: set queue <queue name> route_held_jobs = True
By default, PBS will not route jobs whose execution_time attribute has a value in the future.
You can allow a routing queue to route jobs whose start time is in the future by setting the
queue’s route_waiting_jobs attribute to True:
Qmgr: set queue <queue name> route_waiting_jobs = True
2.2.6.7 Setting Routing Retry Time
The default time between routing retries is 30 seconds. To set the time between routing
retries, set the value of the queue’s route_retry_time attribute:
Qmgr: set queue <queue name> route_retry_time = <value>
2.2.6.8 Specifying Job Lifetime in Routing Queue
By default, PBS allows a job to exist in a routing queue for an infinite amount of time. To
change this, set the queue’s route_lifetime attribute:
Qmgr: set queue <queue name> route_lifetime = <value>
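For example, the following illustrative settings cause PBS to retry routing every 60 seconds, and to delete jobs that remain in the routing queue RouteQ for more than one hour; both attributes take a number of seconds:
Qmgr: set queue RouteQ route_retry_time = 60
Qmgr: set queue RouteQ route_lifetime = 3600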
2.2.7 Queue Requirements
• Each queue must have a unique name. The name must be alphanumeric, and must begin with an alphabetic character.
• A server may have multiple queues of either or both types, but there must be at least one execution queue defined.
2.2.8 Queue Configuration Attributes
Queue configuration attributes fall into three groups:
• Those which apply to both types of queues
• Those which apply only to execution queues
• Those which apply only to routing queues
If an “execution queue only” attribute is set for a routing queue, or vice versa, it is ignored.
However, as this situation might indicate the administrator made a mistake, the server will
write a warning message on stderr about the conflict. The same message is written when
the queue type is changed and there are attributes that do not apply to the new type.
See “Queue Attributes” on page 371 of the PBS Professional Reference Guide for a table of
queue attributes.
2.2.9 Viewing Queue Status
To see the status of a queue, including values for attributes, use the qstat command:
qstat -Qf <queue name>
To see the status of all queues:
qstat -Qf
The status of the queue is reported in the State field. The field shows two letters. One is either E (enabled) or D (disabled). The other is R (running, same as started) or S (stopped). Attributes with non-default values are displayed. See "qstat" on page 210 of the PBS Professional Reference Guide.
The following queue attributes contain queue status information:
total_jobs
state_count
resources_assigned
hasnodes
enabled
started
2.2.10 Deleting Queues
Use the qmgr command to delete queues.
Qmgr: delete queue <queue name>
2.2.10.1 Caveats for Deleting Queues
• A queue that has queued or running jobs cannot be deleted.
• A queue that is associated with a vnode via that vnode's queue attribute cannot be deleted. To remove the association, save the output of pbsnodes -a to a file and search for the queue. Unset the queue attribute for each associated vnode, as shown below.
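For example, if the saved pbsnodes -a output shows that an illustrative vnode Vnode1 has its queue attribute set to the queue you want to delete, unset it with:
Qmgr: unset node Vnode1 queue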
2.2.11 Defining Queue Resources
For each queue, you can define the resources you want to have available at that queue. To set
the value for an existing resource, use the qmgr command:
Qmgr: set queue <queue name> resources_available.<resource name> = <value>
For example, to set the value of the Boolean resource RunsMyApp to True at QueueA:
Qmgr: set queue QueueA resources_available.RunsMyApp = True
For information on how to define a new resource at a queue, see section 5.14, “Custom
Resources”, on page 337.
For information on defining default resources at a queue, see section 5.9.3.3, “Specifying Jobwide Default Resources at Queue”, on page 325 and section 5.9.3.4, “Specifying Chunk
Default Resources at Queue”, on page 325.
2.2.12 Setting Queue Resource Defaults
The jobs that are placed in a queue inherit the queue’s defaults for any resources not specified
by the job’s resource request. You can specify each default resource for each queue. This is
described in section 5.9.3, “Specifying Job Default Resources”, on page 323. Jobs inherit
default resources according to the rules described in section 5.9.4, “Allocating Default
Resources to Jobs”, on page 327.
2.2.13 How Default Server and Queue Resources Are Applied When Jobs Move
When a job is moved from one server to another, the following changes happen:
• Any default resources that were applied by the first server are removed
• Default resources from the new server are applied to the job
When a job is moved from one queue to another, the following changes happen:
• Any default resources that were applied by the first queue are removed
• Default resources from the new queue are applied to the job
For more details on how default resources are inherited when a job is moved, see section
5.9.4.3, “Moving Jobs Between Queues or Servers Changes Defaults”, on page 328.
2.2.14 Specifying Default Queue
PBS has a default execution queue named workq, where jobs are placed when no queue is
requested. You can specify which queue should be the default. To specify the queue which is
to accept jobs when no queue is requested, set the server’s default_queue attribute to the
name of the queue:
Qmgr: set server default_queue = <queue name>
2.2.15 Associating Queues and Vnodes
You can set up vnodes so that they accept jobs only from specific queues. See section 4.8.2,
“Associating Vnodes with Queues”, on page 126.
2.2.16 Configuring Access to Queues
You can configure each queue so that only specific users or groups can submit jobs to the
queue. See section 8.3, “Using Access Control”, on page 791.
2.2.17 Setting Limits on Usage at Queues
You can set limits on different kinds of usage at each queue:
• You can limit the size of a job array using the max_array_size queue attribute
• You can limit the number of jobs or the usage of each resource by each user or group, or overall. See section 5.15.1, "Managing Resource Usage By Users, Groups, and Projects, at Server & Queues", on page 389. An illustrative example follows this list.
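As an illustrative sketch (the queue name and limit values are assumptions, not defaults): the first command caps job arrays at 1000 subjobs, and the second uses the generic-user limit syntax to cap each user at 5 running jobs in the queue:
Qmgr: set queue workq max_array_size = 1000
Qmgr: set queue workq max_run = "[u:PBS_GENERIC=5]"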
2.2.18 Queues and Failover
For information on configuring routing queues and failover, see section 9.2.6.1, “Configuring
Failover to Work with Routing Queues”, on page 853.
2.2.19 Additional Queue Information
For a description of each queue attribute, see “Queue Attributes” on page 371 of the PBS Professional Reference Guide.
For information on using queues for scheduling, see section 4.5, “Using Queues in Scheduling”, on page 118.
3 Configuring MoMs and Vnodes
The installation process creates a basic MoM and vnode configuration which contains the
minimum necessary in order to run PBS jobs. This chapter describes how to customize your
MoM and vnode configuration.
3.1 Vnodes: Virtual Nodes
A virtual node, or vnode, is an abstract object representing a set of resources which form a
usable part of a machine. This could be an entire host, or a nodeboard or a blade. A single
host can be made up of multiple vnodes. Each vnode can be managed and scheduled independently. PBS views hosts as being composed of one or more vnodes.
Each vnode has an associated set of attributes and resources. Vnode attributes are listed and
described in “Vnode Attributes” on page 384 of the PBS Professional Reference Guide.
Vnode resources can be built-in or custom (defined by you). See Chapter 5, "PBS
Resources", on page 305. Rules for setting values for attributes and resources are given in
section 3.5.2, “Choosing Configuration Method”, on page 52.
3.1.1 Vnode State
The state of each vnode is controlled by its state attribute. The state of the vnode publishes
whether the vnode can accept new jobs, what it is doing, and whether it is usable. The state
attribute can take zero or more of the values listed in “Vnode States” on page 434 of the PBS
Professional Reference Guide. The state of a vnode can be set by PBS or in a hook. A
vnode’s state can be set to offline using the qmgr command; no other values can be set using
qmgr.
3.1.2 Relationship Between Hosts, Nodes, and Vnodes
A host is any computer. Execution hosts used to be called nodes. However, some machines
such as the Altix can be treated as if they are made up of separate pieces containing CPUs,
memory, or both. Each piece is called a vnode. See "Vnodes: Virtual Nodes” on page 37.
Some hosts have a single vnode and some have multiple vnodes. PBS treats all vnodes alike
in most respects.
3.1.3 Natural Vnodes
For machines that have more than one vnode, there is a vnode called the natural vnode. A
natural vnode does not correspond to any actual hardware. The natural vnode is used to
define any placement set information that is invariant for a given host. See section 4.8.32,
“Placement Sets”, on page 224. The natural vnode is also used to define dynamic host-level
resources, and can be used to define shared resources. On a multi-vnoded machine which has
a natural vnode, anything set in the mom_resources line in PBS_HOME/sched_priv/
sched_config is shared by all of that machine’s vnodes. See section 5.14.5.1, “Dynamic
Host-level Resources”, on page 361 and section 5.4.7, “Shared and Non-shared Vnode
Resources”, on page 314.
3.1.4 Breaking Chunks Across Vnodes
Chunks can be broken up across vnodes that are on the same host. This is generally used for
jobs requesting a single chunk. On the Altix, the scheduler will share memory from a chunk
even if all the CPUs are used by other jobs. It will first try to put a chunk entirely on one
vnode. If it can, it will run it there. If not, it will break the chunk up across any vnode it can
get resources from, even for small amounts of unused memory.
3.1.4.1 Restrictions on Natural Vnode on cpuset Machines
• On a machine that has cpusets, the natural vnode should not have its schedulable resources (ncpus, mem, vmem) set. Leave these resources unset. If these are set by the administrator, their values are retained across restarts until they are changed again or until the vnode is re-created. Setting the values via qmgr will lead the server and the MoM to disagree on the values.
• On the natural vnode, all values of resources_available.<resource> should be zero (0), unless the resource is being shared among other vnodes via indirection.
3.1.5 Creating Vnodes
You can create vnodes using qmgr.
3.1.5.1 Creating Vnodes on Single-vnode Machines using qmgr
For a machine which will have a single vnode:
1. Start MoM on the host where you will create the vnode.
2. Get the short name returned by the gethostname command where you will run the MoM.
3. Use the qmgr command to create the vnode. Use the name returned by gethostname:
Qmgr: create node <vnode name> [<attribute>=<value>]
Attributes and their possible values are listed in “Vnode Attributes” on page 384 of the PBS
Professional Reference Guide.
All comma-separated attribute-value strings must be enclosed in quotes.
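For example, if gethostname on the execution host returns host1 (an illustrative name), you would create its vnode with:
Qmgr: create node host1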
3.1.5.2 Creating Vnodes on Multi-vnode Machines using qmgr
For a machine which will have more than one vnode, you create the natural vnode, but PBS
handles creation of the other vnodes:
1. For machines such as an Altix, you must start PBS on the multi-vnode host using the PBS start/stop script. See "The PBS Start/Stop Script" on page 211 in the PBS Professional Installation & Upgrade Guide.
2. Get the short name returned by the gethostname command where you will run the MoM.
3. Use the qmgr command to create the natural vnode. Use the name returned by gethostname:
Qmgr: create node <natural vnode name> [<attribute>=<value>]
Attributes and their possible values are listed in “Vnode Attributes” on page 384 of the PBS
Professional Reference Guide.
All comma-separated attribute-value strings must be enclosed in quotes.
After you create the natural vnode, the other vnodes become available for use. Follow the
rules for configuring these machines in section 3.5.2, “Choosing Configuration Method”, on
page 52. See section 10.4, “Support for SGI”, on page 954.
Here is an example of the vnode definition for a natural vnode on an Altix:
altix03: pnames = cbrick, router
altix03: sharing = ignore_excl
altix03: resources_available.ncpus = 0
altix03: resources_available.mem = 0
altix03: resources_available.vmem = 0
For machines such as a Cray, creation of vnodes other than the natural vnode is handled by
MoM. You create the natural vnode using qmgr:
Qmgr: create node <natural vnode name>
See section 10.3, “Support for Cray Systems”, on page 923.
3.1.5.2.i Caveats for Creating Vnodes
• On the Cray, when creating a vnode to represent a login node, use the short name returned by the gethostname command on the login node. For example, if gethostname returns HostA, do the following:
Qmgr: create node HostA
If you create a vnode with a different name from the short name returned by gethostname,
the following happens:
• MoM creates a vnode whose name is the short name returned by gethostname
• The vnode you created is not recognized by MoM, and is marked stale
• It is not a good idea to try to use qmgr to create the vnodes for an Altix, UV, or ICE, other than the natural vnode. You do need to create the natural vnode via qmgr. It is possible to use qmgr to create a vnode with any name. The "[x]" naming does not imply any special significance; it is just an internal convention for naming vnodes on an Altix, UV, or ICE. The fact that you can create a vnode with an arbitrary name does not mean, however, that the MoM on the host knows about that vnode. If the MoM does not know about the vnode, the vnode will be considered "stale" and not usable. By default, MoM only knows about the natural vnode, the one whose name is the same as the host.
• Vnode attributes cannot be used as vnode names.
3.1.6 Deleting Vnodes
3.1.6.1 Deleting the Vnode on a Single-vnode Machine
Use the qmgr command to delete the vnode:
Qmgr: delete node <vnode name>
Optionally, you can stop PBS on the execution host whose vnode was deleted.
3.1.6.2 Deleting Vnodes on a Multi-vnode Machine
As long as there is a configuration file describing vnodes, PBS will believe they exist. Therefore, you must first remove the configuration file. To delete one or more vnodes on a multi-vnode machine, follow these steps:
1. Use the -s remove option to the pbs_mom command to remove the Version 2 configuration file that describes the vnodes to be removed:
On UNIX/Linux:
pbs_mom -s remove <configuration file target>
On Windows:
pbs_mom -N -s remove <configuration file target>
2. Use the -s insert option to the pbs_mom command to insert a new Version 2 configuration file describing the vnodes to be retained:
On UNIX/Linux:
pbs_mom -s insert <configuration file target> <input file source>
On Windows:
pbs_mom -N -s insert <configuration file target> <input file source>
3. Restart the MoM:
<path to start/stop script>/pbs restart
4. Use the qmgr command to remove the vnodes:
Qmgr: delete node <vnode name>
3.1.6.3 Deleting Vnodes on a Cray
For information on deleting vnodes on a Cray, see section 10.3.11.6, “Deleting Vnodes on
Cray”, on page 948.
3.1.7 Allocating Vnodes to Jobs
PBS can run jobs only on execution hosts that are managed by the PBS server and are running a MoM.
By default, when the scheduler looks for the vnodes on which to run a job, it goes down the
list of hosts in the order in which they appear in the server’s list of hosts, and places the job on
the first available vnode or vnodes meeting the job’s requirements. This means that the order
of the list of hosts affects default job placement. You can specify more sophisticated choices;
see Chapter 4, "Scheduling", on page 63.
The scheduler follows the specified rules for selecting vnodes that match each job’s request.
Once the scheduler finds the resources that match a job’s request, it allocates vnodes to the
job, according to the value of the vnode’s sharing attribute and the job’s resource request.
3.1.7.1 Sharing Vnodes Among Jobs
Each vnode can be allocated exclusively to one job, or its resources can be shared among jobs.
Hosts can also be allocated exclusively to one job, or shared among jobs.
How vnodes are allocated to jobs is determined by a combination of the vnode’s sharing
attribute and the job’s resource request. The possible values for the vnode sharing attribute,
and how they interact with a job’s placement request, are described in “sharing” on page 389
of the PBS Professional Reference Guide. A description of how resources are allocated is in
section 4.8.40, “Shared vs. Exclusive Use of Resources by Jobs”, on page 277.
If a vnode is allocated exclusively to a job, all of its resources are assigned to the job. The
state of the vnode becomes job-exclusive. No other job can use the vnode.
If a host is to be allocated exclusively to one job, all of the host must be used: if any vnode
from a host has its sharing attribute set to either default_exclhost or force_exclhost, all
vnodes on that host must have the same value for the sharing attribute. When the MoM starts
or restarts, if any vnode on a host is set to either default_exclhost or force_exclhost, and
another vnode is set to a different value, the MoM will exit and log the following error message at event class 0x0001:
It is erroneous to mix sharing= <sharing val> for vnode <name> with
sharing= <force_exclhost|default_exclhost> which is set for other
vnodes on host <host>
3.1.7.2 Placing Jobs on Vnodes
Jobs can be placed on vnodes according to the job’s placement request. Each chunk from a
job can be placed on a different host, or a different vnode. Alternatively, all chunks can be
taken from a single host, or from chunks sharing the same value for a specified resource. The
job can request exclusive use of each vnode, or shared use with other jobs. The job can
request exclusive use of its hosts. For details, see "Specifying Job Placement", on page 92 of
the PBS Professional User’s Guide.
3.2 MoMs
A MoM daemon runs on each execution host and manages the jobs on that execution host.
The pbs_mom command starts the PBS job monitoring and execution daemon, called MoM.
The pbs_mom daemon starts jobs on the execution host, monitors and reports resource usage,
enforces resource usage limits, and notifies the server when the job is finished. The MoM
also runs any prologue scripts before the job runs, and runs any epilogue scripts after the job
runs.
When the MoM starts a job, she creates a new session that is as identical to the user’s login
session as is possible. For example, under UNIX, if the user’s login shell is csh, then MoM
creates a session in which .login is run as well as .cshrc. MoM returns the job’s output to
the user.
The MoM performs any communication with job tasks and with other MoMs. The MoM on
the first vnode on which a job is running manages communication with the MoMs on the
remaining vnodes on which the job runs. The MoM on the first vnode is called Mother
Superior.
The MoM log file is in PBS_HOME/mom_logs. The MoM writes an error message in its
log file when it encounters any error. The MoM also writes other miscellaneous information
to its log file. If it cannot write to its log file, it writes to standard error.
The executable for pbs_mom is in PBS_EXEC/sbin, and can be run only by root.
See “Manually Starting MoM” on page 213 in the PBS Professional Installation & Upgrade
Guide for information on starting and stopping MoM.
3.2.1 Single-vnode, Multi-vnode, and Cpusetted Systems
For systems that can be subdivided into more than one virtual node, or vnode, PBS manages
each vnode much as if it were a host. On each machine, the MoM manages the vnodes. PBS
may treat a host such as an Altix as a set of virtual nodes, in which case one MoM manages all
of the host's vnodes. For details about vnodes, see section 3.1, “Vnodes: Virtual Nodes”, on
page 37.
The pbs_mom you select to run a machine depends on the type of machine and the way you
want it managed. The MoM that manages a system without cpusets is pbs_mom.standard. This MoM can manage a single-vnoded or a multi-vnoded, non-cpusetted system.
The MoM that has extensions to manage a cpusetted machine such as the Altix is
pbs_mom.cpuset. The appropriate MoM is copied to pbs_mom. See the PBS Professional Installation and Upgrade Guide.
The following sections describe configuration files and methods for all MoMs and vnodes.
See section 10.4, “Support for SGI”, on page 954 for information that is specific to systems
with cpusets.
3.3 Files and Directories Used by MoM
If PBS_MOM_HOME is present in the pbs.conf file, pbs_mom will use that directory for its
“home” instead of PBS_HOME. Under UNIX/Linux, all files and directories that MoM uses
must be owned by root. Under Windows, these directories must have at least Full Control
permission for the local Administrators group. MoM uses the following files and directories:
UNIX:
Table 3-1: MoM Files and Directories Under UNIX

File/Directory                      Description    Permissions
aux                                 Directory      0755
checkpoint                          Directory      0700
checkpoint script                   File           0755
mom_logs                            Directory      0755
mom_priv                            Directory      0751
mom_priv/jobs                       Directory      0751
mom_priv/config                     File           0644
mom_priv/prologue                   File           0755
mom_priv/epilogue                   File           0755
pbs_environment                     File           0644
spool                               Directory      1777 (drwxrwxrwt)
undelivered                         Directory      1777 (drwxrwxrwt)
Version 2 configuration files       Files          0755
PBS reserved configuration files    Files          ----
Job temporary directory             Directory      1777
Windows:
Table 3-2: MoM Files and Directories Under Windows

File/Directory             Description    Ownership/Permission
auxiliary                  Directory      At least Full Control permission for the local
                                          Administrators group and read-only access to Everyone
checkpoint                 Directory      At least Full Control permission for the local
                                          Administrators group
checkpoint script          File           At least Full Control permission for the local
                                          Administrators group
mom_logs                   Directory      At least Full Control permission for the local
                                          Administrators group and read-only access to Everyone
mom_priv                   Directory      At least Full Control permission for the local
                                          Administrators group and read-only access to Everyone
mom_priv/jobs              Directory      At least Full Control permission for the local
                                          Administrators group and read-only access to Everyone
mom_priv/config            File           At least Full Control permission for the local
                                          Administrators group
pbs_environment            File           At least Full Control permission for the local
                                          Administrators group and read-only access to Everyone
spool                      Directory      Full access to Everyone
undelivered                Directory      Full access to Everyone
Job's temporary directory  Directory      Writable by Everyone
3.4 Configuring MoMs and Vnodes
The behavior of each MoM is controlled through its configuration files. You configure MoMs
by specifying values for parameters in configuration files.
Vnodes are controlled through the values of their attributes. You configure vnodes by specifying values for vnode attributes, using any of the following:
• Using hooks to set vnode attributes and resources; see section 6.10.4.4.iv, "Setting and Unsetting Vnode Resources and Attributes Using vnode_list[]", on page 494
• Setting attribute values using the qmgr command
• Creating configuration files using the pbs_mom -s insert command (pbs_mom -N -s insert on Windows)
The method to use to configure MoMs and vnodes depends on the machine being configured.
The methods used are described in section 3.5, “How to Configure MoMs and Vnodes”, on
page 50.
3.4.1 Editing Configuration Files Under Windows
When you edit any PBS configuration file, make sure that you put a newline at the end of the
file. The Notepad application does not automatically add a newline at the end of a file; you
must explicitly add the newline.
3.4.2 Types of MoM and Vnode Configuration Files
MoM and vnode configuration information can be contained in configuration files of three
types:
• Version 1
• PBS reserved
• Version 2
3.4.2.1 Version 1 Configuration Files
You edit the Version 1 configuration file directly. The Version 1 configuration file is usually
PBS_HOME/mom_priv/config. This file contains the parameters that control MoM’s
behavior.
The Version 1 configuration file must be secure. It must be owned by a user ID and group ID
both less than 10 and must not be world-writable.
For a complete description of the syntax and contents of the Version 1 configuration file, see
“MoM Parameters” on page 283 of the PBS Professional Reference Guide.
3.4.2.2 PBS Reserved Configuration Files
PBS reserved configuration files are created by PBS and are prefixed with "PBS". These files are not configurable; do not attempt to edit them. An attempt to create or remove a file with the "PBS" prefix will result in an error.
3.4.2.3 Version 2 Configuration Files
Version 2 configuration files are those created by the site administrator. These files can contain vnode attribute settings. Do not edit these files directly. Instead, create a local file and give it as an argument to the pbs_mom -s insert option (pbs_mom -N -s insert on Windows), and PBS creates a new configuration file for you. See section 3.5.3, "Creating Version 2 MoM Configuration Files", on page 53. Their syntax is called "Version 2" in order to differentiate it from the syntax of the Version 1 configuration files.
You can list, add, delete and display Version 2 configuration files using the pbs_mom -s
option (pbs_mom -N -s on Windows). See “pbs_mom” on page 61 of the PBS Professional Reference Guide for information about pbs_mom options.
3.4.2.3.i Removing Version 2 Configuration Files
You can remove a Version 2 configuration file using the pbs_mom -s remove option
(pbs_mom -N -s remove on Windows). See “pbs_mom” on page 61 of the PBS Professional Reference Guide.
3.4.3 Location of MoM Configuration Files
The Version 1 configuration file is usually PBS_HOME/mom_priv/config. It can be in a
different location; in that case, MoM must be started with the -c option. See “pbs_mom” on
page 61 of the PBS Professional Reference Guide.
PBS places PBS reserved and Version 2 configuration files in an area that is private to each
installed instance of PBS.
3.4.4 Listing and Viewing PBS Reserved and Version 2 Configuration Files
You can list and view the PBS reserved configuration files and the Version 2 configuration
files using the pbs_mom -s list and pbs_mom -s show options (pbs_mom -N -s
list and show on Windows). See “pbs_mom” on page 61 of the PBS Professional Reference Guide.
3.4.5 Caveats and Restrictions for Configuration Files
• Do not attempt to directly create PBS reserved or Version 2 configuration files; instead, use the pbs_mom -s option (pbs_mom -N -s on Windows).
• Note that the -d option to pbs_mom changes where MoM looks for PBS_HOME, and using this option will change where MoM looks for all configuration files. If you use the -d option, MoM will look in the new location for any PBS reserved and Version 2 files.
• The -c option will change which Version 1 configuration file MoM reads.
• Do not move PBS reserved configuration files.
• If you set a value using qmgr, this value overrides the value specified in a configuration file.
• Do not mix the configuration file contents or syntax. Each type must use its own syntax, and contain its own type of information.
• When you create a Version 2 configuration file for a pre-existing vnode, make sure it specifies all of the information about the vnode, such as resources and attribute settings. The creation of the configuration file overrides previous settings, and if the new file contains no specification for a resource or attribute, that resource or attribute becomes unset.
• Version 2 configuration files can be moved from one installed instance of PBS to another. To move a set of Version 2 configuration files from one installed instance of PBS to another:
1. Use the -s list directive with the "source" instance of PBS to enumerate the Version 2 files.
2. Use the -s show directive with each Version 2 file of the "source" instance of PBS to save a copy of that file.
3. Use the -s insert directive with each file at the "target" instance of PBS to create a copy of each Version 2 configuration file.
3.4.5.1 When MoM Reads Configuration Files
MoM reads the configuration files at startup and reinitialization. On UNIX, this is when
pbs_mom receives a SIGHUP signal or is started or restarted, and on Windows, when MoM is
started or restarted. In order for any configuration changes to take effect, MoM must be
HUPed.
If you make changes to the hardware or a change occurs in the number of CPUs or amount of
memory that is available to PBS, such as a non-PBS process releasing a cpuset, you should
restart PBS by typing the following:
<path-to-script>/pbs restart
The MoM daemon is normally started by the PBS start/stop script.
When MoM is started, it will open its Version 1 configuration file, mom_priv/config, in
the path specified in pbs.conf, if the file exists. If it does not, MoM will continue anyway.
The config file may be placed elsewhere or given a different name, by starting pbs_mom
using the -c option with the new file and path specified. See “Manually Starting MoM” on
page 213 in the PBS Professional Installation & Upgrade Guide.
The files are processed in this order:
1. Version 1 configuration file
2. PBS reserved configuration files
3. Version 2 configuration files
Within each category, the files are processed in lexicographic order.
The contents of a file that is read later will override the contents of a file that is read earlier.
3.5 How to Configure MoMs and Vnodes
3.5.1 Configuration Methods
The method you use to configure MoMs and vnodes depends upon the machine being configured. The methods are the following:
Table 3-3: MoM and Vnode Configuration Methods

Method                                                     When Method Changes MoM Behavior
Using the qmgr command to set attribute values             Immediately
Editing the Version 1 configuration file                   When MoM is restarted
PBS_HOME/mom_priv/config
Using the pbs_mom -s insert command to create a            When MoM is restarted
configuration file (pbs_mom -N -s insert on Windows)
Using the pbsnodes command to change the state of a        Immediately
vnode

3.5.1.1 The qmgr Command
You use the qmgr command to set attribute values. You can use the qmgr command to set
attribute values for individual vnodes where those vnodes are part of a multi-vnode machine.
To set a vnode’s attribute, the format is the following:
qmgr -c 'set node <vnode name> <attribute> = <value>'
or start qmgr, and use the following:
set node <vnode name> <attribute> = <value>
The qmgr command is described in “qmgr” on page 158 of the PBS Professional Reference
Guide.
If you set a value using qmgr, this value overrides the value specified in a configuration file.
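For example, to set the priority attribute of one vnode on a multi-vnode host (the vnode name altix03[2] is illustrative, following the HOST[n] naming convention described in section 3.5.4):
qmgr -c 'set node altix03[2] priority = 2'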
3.5.1.2 Editing Version 1 Files
Use your favorite text editor to edit Version 1 configuration files.
When you edit any PBS configuration file, make sure that you put a newline at the end of the
file. The Notepad application does not automatically add a newline at the end of a file; you
must explicitly add the newline.
3.5.1.3 Using the pbs_mom -s insert Command
You use the pbs_mom -s insert command (pbs_mom -N -s insert on Windows) to create all Version 2 configuration files. First, you create a script which is to be the contents of the configuration file. Then, you insert the script using the pbs_mom -s insert command:
UNIX/Linux:
pbs_mom -s insert <configuration file name> <script>
Windows:
pbs_mom -N -s insert <configuration file name> <script>
For a description of the Version 2 syntax, see section 3.5.3, “Creating Version 2 MoM Configuration Files”, on page 53.
3.5.1.4 Using the pbsnodes Command
The pbsnodes command is used to set the state of a host to be offline or not offline. To set
the state attribute of one or more hosts to offline:
pbsnodes -o <hostname [hostname ...]>
To remove the offline setting from the state attribute of one or more hosts:
pbsnodes -r <hostname [hostname ...]>
Note that the pbsnodes command operates on hosts, not individual vnodes where those
vnodes are on multi-vnode machines. To operate on individual vnodes, use the qmgr command.
See “pbsnodes” on page 108 of the PBS Professional Reference Guide.
3.5.2 Choosing Configuration Method
3.5.2.1 Configuring Single-vnode Machines without cpusets
To configure the MoM and vnode on a single-vnode machine without cpusets, do the following:
• To configure MoM, including local resources, edit the Version 1 MoM parameter file
• To configure vnodes, use the qmgr command to set vnode attributes and global resources
3.5.2.1.i Exceptions
• Use pbs_mom -s insert (pbs_mom -N -s insert on Windows) to set the sharing vnode attribute
3.5.2.2 Configuring Multi-vnode Machines without cpusets
To configure the MoM and vnodes on a multi-vnode machine without cpusets, do the following:
• To configure MoM, including local resources, edit the Version 1 MoM parameter file
• To configure vnodes, use the qmgr command to set vnode attributes and global resources
3.5.2.2.i Exceptions
• Use the pbs_mom -s insert command (pbs_mom -N -s insert on Windows) to set the sharing vnode attribute (vnode definition files are not recommended on Cray)
• You can use pbsnodes to set the state vnode attribute
3.5.2.2.ii Restrictions
• Set the Mom vnode attribute for the natural vnode only.
3.5.2.3 Configuring Machines with Cpusets
To configure the MoM and vnodes on a machine that has cpusets, do the following:
• To configure MoM, including local resources, edit the Version 1 MoM parameter file
• To configure vnodes, use the pbs_mom -s insert command (pbs_mom -N -s insert on Windows) to set vnode attributes and global resources
3.5.2.3.i Exceptions
• You can use qmgr or pbsnodes to set the state vnode attribute
• Use qmgr to set the priority vnode attribute
3.5.2.3.ii Restrictions
• Do not use qmgr to configure vnodes, especially for sharing, resources_available.ncpus, resources_available.vmem, and resources_available.mem.
• Do not attempt to set values for resources_available.ncpus, resources_available.vmem, or resources_available.mem. These are set by PBS when the topology file is read.
• Set the Mom vnode attribute for the natural vnode only. Do not attempt to set it for any other vnodes.
3.5.3 Creating Version 2 MoM Configuration Files
3.5.3.1 Operating on Version 2 Configuration Files
You can list, add, delete and display Version 2 configuration files using the pbs_mom -s
option (pbs_mom -N -s on Windows). See “pbs_mom” on page 61 of the PBS Professional Reference Guide for information about pbs_mom options.
3.5.3.2 Format of Version 2 Configuration Files
Any Version 2 configuration file must begin with this line:
$configversion 2
The format of the remaining contents of the file is the following:
<vnode ID> : <attribute name> = <attribute value>
where
Table 3-4: Elements in Version 2 Configuration Files

Element              Description
<vnode ID>           Sequence of characters not including a colon (":")
<attribute name>     Sequence of characters beginning with alphabetics or numerics, which can
                     contain _ (underscore), - (dash), @ (at sign), [ (left bracket), ] (right
                     bracket), # (hash), ^ (caret), / (slash), \ (backslash), and . (period)
<attribute value>    Sequence of characters not including an equal sign ("=")
The colon and equal sign may be surrounded by white space.
A vnode's ID is an identifier that will be unique across all vnodes known to a given
pbs_server and will be stable across reinitializations or invocations of pbs_mom. ID stability is important when a vnode's CPUs or memory might change over time and PBS is
expected to adapt to such changes by resuming suspended jobs on the same vnodes to which
they were originally assigned. Vnodes for which this is not a consideration may simply use
IDs of the form "0", "1", etc. concatenated with some identifier that ensures uniqueness across
the vnodes served by the pbs_server. Vnode attributes cannot be used as vnode names.
See “Vnode Attributes” on page 384 of the PBS Professional Reference Guide, where vnode
attributes are listed.
3.5.3.3 Using the pbs_mom -s insert Command
To create a Version 2 configuration file:
1. Create the script that is to be the contents of the configuration file.
2. Make this script into a configuration file using the pbs_mom -s insert command.
Example 3-1: If your machine has 4 vnodes, named BigNode0, BigNode1, SmallNode0,
and SmallNode1, and you want big jobs to have exclusive use of their vnodes, and small
jobs to share their vnodes, then set sharing for big and small vnodes by creating a file
"set_sharing" containing the following:
$configversion 2
BigNode0: sharing = default_excl
BigNode1: sharing = default_excl
SmallNode0: sharing = default_shared
SmallNode1: sharing = default_shared
Then use the pbs_mom -s insert <filename> <script> option to create the
configuration file:
UNIX/Linux:
pbs_mom -s insert sharing_config set_sharing
Windows:
pbs_mom -N -s insert sharing_config set_sharing
The file sharing_config is the new Version 2 configuration file. Its contents will
override previously-read sharing settings. You must restart the MoM after changing the
configuration file.
Example 3-2: To change the sharing attribute on the host named host3:
1. Check that pbsnodes shows host3 has "sharing = default_shared":
pbsnodes host3
2. Change the setting to be "sharing = force_excl". As root, create a script file /tmp/excl_file which contains the following:
$configversion 2
<host3>: sharing=force_excl
3. With the pbs_mom daemon running, execute the following on host3:
UNIX/Linux:
# $PBS_EXEC/sbin/pbs_mom -s insert excl /tmp/excl_file
4. Check that this took effect. The following should show "excl":
UNIX/Linux:
# $PBS_EXEC/sbin/pbs_mom -s list
5. Restart pbs_mom:
# kill -HUP <PID of pbs_mom>
6. Check that "pbsnodes host3" now shows "sharing = force_excl".
3.5.3.4 Caveats and Restrictions for pbs_mom -s insert
On Windows, the pbs_mom -s option must be used with the -N option so that MoM will
start in standalone mode.
3.5.4 Using qmgr to Set Vnode Resources and Attributes
One of the PBS reserved configuration files is PBSvnodedefs, which is created by a placement set generation script. You can use the output of the placement set generation script to
produce input to qmgr. The placement set generation script normally emits data for the
PBSvnodedefs file. If the script is given an additional "-v type=q" argument, it emits
data in a form suitable for input to qmgr:
set node <ID> resources_available.<ATTRNAME> = <ATTRVALUE>
where <ID> is a vnode identifier unique within the set of hosts served by a pbs_server.
Conventionally, although by no means required, the <ID> above will look like
HOST[<localID>] where HOST is the host's FQDN stripped of domain suffixes and
<localID> is an identifier whose meaning is unique to the execution host on which the
referred to vnode resides. For invariant information, it will look like this:
set node <ID> priority = 2
3.5.5 Caveats and Advice on Configuring MoMs and Vnodes
3.5.5.1 Changing Resource Settings
In general, it is not advisable to set resources_available.ncpus or
resources_available.mem to a value greater than PBS has detected on the machine. This is
because you do not want MoM to try to allocate more resources than are available.
In general, it is safe to set resources_available.ncpus or resources_available.mem to a
value less than PBS has detected.
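For example, on a non-cpusetted host where PBS detected 8 CPUs, the following illustrative command advertises only 4 of them to the scheduler (the host name is an assumption):
Qmgr: set node host1 resources_available.ncpus = 4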
3.5.5.2 Resource Values for Natural Vnode
On the natural vnode, all values for resources_available.<resource> should be zero (0),
unless the resource is being shared among other vnodes via indirection.
3.5.5.3 Editing Configuration Files Under Windows
When you edit any PBS configuration file, make sure that you put a newline at the end of the
file. The Notepad application does not automatically add a newline at the end of a file; you
must explicitly add the newline.
3.6 Configuring MoM and Vnode Features
3.6.1 Configuring MoM Polling Cycle
3.6.1.1 Polling on UNIX/Linux
In this section, we describe how to configure MoM’s polling cycle. Please note that polling
intervals cannot be considered to be exact:
• The calculation below simply provides a minimum amount of time between one poll and the next.
• The actual time between polls can vary. The actual time taken by MoM also depends on the other tasks MoM is performing, such as starting jobs, running a prologue or epilogue, etc.
• The timing of MoM’s activities is not completely under her control, because she is a user process.
• The finest granularity for calculating polling is in seconds.
MoM’s polling cycle is determined by the values of $min_check_poll and
$max_check_poll. The interval between each poll starts at $min_check_poll and increases
with each cycle until it reaches $max_check_poll, after which it remains the same. The
amount by which the cycle increases is the following:
( max_check_poll - min_check_poll + 19 ) / 20
The default value for $max_check_poll is 120 seconds. The minimum is 1 second. It is not
recommended to set $max_check_poll to less than 30 seconds.
The default value for $min_check_poll is 10 seconds. The minimum is 1 second. It is not
recommended to set $min_check_poll to less than 10 seconds.
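For example, with the default values, the increment is (120 - 10 + 19) / 20 = 6 seconds (assuming integer arithmetic, which the +19 rounding term suggests), so successive polls come at least 10, 16, 22, 28, ... seconds apart, until the interval reaches the 120-second ceiling.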
The start of a new job resets the polling for all of the jobs being managed by this MoM.
MoM polls for resource usage for cput, walltime, mem and ncpus. See section 5.15.3,
“Placing Resource Limits on Jobs”, on page 414.
3.6.1.2 Polling on Windows
On Windows, MoM updates job usage at fixed intervals of 10 seconds. The $min_check_poll
and $max_check_poll parameters are not used by MoM on Windows. MoM looks for any
job that has exceeded a limit for walltime, mem, or cput, and terminates jobs that have
exceeded the limit.
3.6.1.2.i Windows Polling Caveats
The ncpus resource cannot be tracked in Windows.
3.6.1.3 How Polling is Used
• Job-wide limits are enforced by MoM using polling. See section 5.15.3.4.i, “Job Memory Limit Enforcement on UNIX”, on page 418. MoM can enforce cpuaverage and cpuburst resource usage. See section 5.15.3.5.i, “Average CPU Usage Enforcement”, on page 420 and section 5.15.3.5.ii, “CPU Burst Usage Enforcement”, on page 421.
• MoM enforces the $restrict_user access restrictions on the polling cycle controlled by $min_check_poll and $max_check_poll. See section 3.6.6, “Restricting User Access to Execution Hosts”, on page 60.
• Cycle harvesting has its own polling interval. See “$kbd_idle <idle_wait> <min_use> <poll_interval>” on page 289 of the PBS Professional Reference Guide for information on $kbd_idle.
3.6.1.4 Recommendations for Polling Interval
Do not set $max_check_poll to less than 30 seconds.
Do not set $min_check_poll to less than 10 seconds.
If you have many small jobs, frequent polling can take up a lot of MoM’s cycles. You may
want to set $min_check_poll and $max_check_poll to somewhat higher values.
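For example, to lengthen the polling cycle, you might put lines such as these (the values are illustrative) in MoM's configuration file, PBS_HOME/mom_priv/config, and then restart or HUP MoM:
$min_check_poll 30
$max_check_poll 300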
3.6.2 Configuring Host-level Resources
Before configuring host-level resources, consider how you will use them. When configuring
static resources, it is best to configure global static resources. Even though they are global,
they can be configured at the host level. Global resources can be operated on via the qmgr
command and viewed via the qstat command. When configuring dynamic resources, if
you need the script to run at the execution host, configure local dynamic resources. These
resources cannot be operated on via the qmgr command or viewed via the qstat command.
3.6.2.1 Configuring Global Static Vnode Resources
You can create global custom static host-level resources that can be reported by MoM and
used for jobs. Follow the instructions in section 5.14.5.2, “Static Host-level Resources”, on
page 363.
You can set values for built-in and custom global static vnode resources according to the rules
in section 3.5.2, “Choosing Configuration Method”, on page 52.
3.6.2.1.i Configuring Local Dynamic Vnode Resources
You can create local custom dynamic host-level resources. The primary use of this feature is
to add site-specific resources, such as software application licenses or scratch space. Follow
the instructions in section 5.14.5.1, “Dynamic Host-level Resources”, on page 361.
3.6.3 Manual Creation of cpusets Not Managed by PBS
You may wish to create cpusets not managed by PBS on an Altix running supported versions
of ProPack or SGI Performance Suite. If you have not started PBS, create these cpusets
before starting PBS. If you have started PBS, requeue any jobs, stop PBS, create your
cpuset(s), then restart PBS.
3.6.4 Configuring Site-Specific Job Termination
For information on site-specific job termination, see section 11.8.5, “Configuring Site-specific Job Termination”, on page 982.
3.6.5 Job Checkpoint and Restart
If you want support for job checkpoint and restart, you can configure MoM to run checkpoint
and restart scripts. See section 9.3, “Checkpoint and Restart”, on page 857.
3.6.6 Restricting User Access to Execution Hosts
PBS provides a facility to prevent users who are not running PBS jobs from using machines
controlled by PBS. You can turn this feature on by using the $restrict_user MoM directive.
This directive can be fine-tuned by using the $restrict_user_exceptions and
$restrict_user_maxsysid MoM directives. This feature can be set up host by host.
• A user requesting exclusive access to a set of hosts (via place=excl) can be guaranteed that no other user will be able to use the hosts assigned to his job, and PBS will not assign any unallocated resources on the vnode to another job.
• A user requesting non-exclusive access to a set of hosts can be guaranteed that no non-PBS users are allowed access to the hosts.
• A privileged user can be allowed access to the complex such that they can log into a host without having a job active.
• An abusive user can be denied access to the complex hosts.
The administrator can find out when users try to access hosts without going through PBS.
The administrator can ensure that application performance is consistent on a complex controlled by PBS. PBS will also be able to clean up any job processes remaining after a job finishes running. The log event class for messages concerning restricting users is 0x0002.
For a vnode with access restriction turned on:
• Any user not running a job who logs in or otherwise starts a process on that vnode will have his processes terminated.
• A user who has logged into a vnode where he owns a job will have his login terminated when the job is finished.
• When MoM detects that a user who is not exempt from access restriction is using the system, that user's processes are killed and a log message is output:
01/16/2006 22:50:16;0002;pbs_mom;Svr;restrict_user;
killed uid 1001 pid 13397(bash) with log event class PBSE_SYSTEM.
You can set up a list of users who are exempted from the restriction via the
$restrict_user_exceptions directive. This list can contain up to 10 usernames.
Example 3-3: Turn access restriction on for a given node:
$restrict_user on
Example 3-4: Limit the users affected to those with a user ID greater than 500:
$restrict_user_maxsysid 500
Example 3-5: Exempt specific users from the restriction:
$restrict_user_exceptions userA, userB, userC
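Taken together, Examples 3-3 through 3-5 would appear in MoM's configuration file as follows (the user names and ID threshold are illustrative):
$restrict_user on
$restrict_user_maxsysid 500
$restrict_user_exceptions userA, userB, userC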
Note that a user who has a job running on a particular host will be able to log into that host.
3.6.6.1 Windows Restriction
The user access restriction feature is not supported on Windows.
3.6.7 Vnode Resources Set by MoM
If the following vnode resources are not explicitly set, they will take the value provided by
MoM. But if they are explicitly set, that setting will be carried forth across server restarts.
They are:
resources_available.ncpus
resources_available.arch
resources_available.mem
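For example, to set the CPU count explicitly on a hypothetical vnode named host1, so that the value persists across server restarts:
Qmgr: set node host1 resources_available.ncpus = 8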
3.6.8 Vnode Comments
Vnodes have a comment attribute which can be used to display information about that vnode.
If the comment attribute has not been explicitly set by the PBS Manager and the vnode is
down, it will be used by the PBS server to display the reason the vnode was marked down. If
the Manager has explicitly set the attribute, the server will not overwrite the comment. The
comment attribute may be set via the qmgr command:
Qmgr: set node pluto comment="node will be up at 5pm"
Once set, vnode comments can be viewed via pbsnodes, xpbsmon (vnode detail page), and
qmgr. See “pbsnodes” on page 108 of the PBS Professional Reference Guide, and “xpbsmon” on page 267 of the PBS Professional Reference Guide. The xpbs and xpbsmon interfaces are deprecated.
4 Scheduling
The "Scheduling Policy Basics" section of this chapter describes what PBS can do, so that you
can consider these capabilities when choosing how to schedule jobs. The "Choosing a Policy"
section describes how PBS can meet the scheduling needs of various workloads. The "Scheduling Tools" section describes each scheduling tool offered by PBS.
4.1 Chapter Contents

4.1     Chapter Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.2     Scheduling Policy Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2.1   How Scheduling can be Used . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2.2   What is Scheduling Policy? . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2.3   Basic PBS Scheduling Behavior . . . . . . . . . . . . . . . . . . . . . . . 66
4.2.4   Sub-goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.2.5   Job Prioritization and Preemption . . . . . . . . . . . . . . . . . . . . . . 67
4.2.6   Resource Allocation to Users, Projects & Groups . . . . . . . . . . 75
4.2.7   Time Slot Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.2.8   Job Placement Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.2.9   Resource Efficiency Optimizations . . . . . . . . . . . . . . . . . . . . . 84
4.2.10  Overrides . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.3     Choosing a Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3.1   Overview of Kinds of Policies . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3.2   FIFO: Submission Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3.3   Prioritizing Jobs by User, Project or Group . . . . . . . . . . . . . . . 90
4.3.4   Allocating Resources by User, Project or Group . . . . . . . . . . . 91
4.3.5   Scheduling Jobs According to Size Etc. . . . . . . . . . . . . . . . . . . 93
4.3.6   Scheduling Jobs into Time Slots . . . . . . . . . . . . . . . . . . . . . . . 96
4.3.7   Default Scheduling Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.3.8   Examples of Workload and Policy . . . . . . . . . . . . . . . . . . . . . 102
4.4     The Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.4.1   Configuring the Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.4.2   Making the Scheduler Read its Configuration . . . . . . . . . . . . 113
4.4.3   Scheduling on Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
4.4.4   Starting, Stopping, and Restarting the Scheduler . . . . . . . . . . 114
4.4.5   The Scheduling Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.4.6   How Available Consumable Resources are Counted . . . . . . . 116
4.4.7   Improving Scheduler Performance . . . . . . . . . . . . . . . . . . . . . 117
4.5     Using Queues in Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.6     Scheduling Restrictions and Caveats . . . . . . . . . . . . . . . . . . . . 119
4.6.1   One Policy Per PBS Complex . . . . . . . . . . . . . . . . . . . . . . . . . 119
4.6.2   Jobs that Cannot Run on Current Resources . . . . . . . . . . . . . 120
4.6.3   Resources Not Controlled by PBS . . . . . . . . . . . . . . . . . . . . . 120
4.6.4   No Pinning of Processes to Cores . . . . . . . . . . . . . . . . . . . . . . 120
4.7     Errors and Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.7.1   Logfile for scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.8     Scheduling Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.8.1   Anti-Express Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.8.2   Associating Vnodes with Queues . . . . . . . . . . . . . . . . . . . . . . 126
4.8.3   Using Backfilling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.8.4   Examining Jobs Queue by Queue . . . . . . . . . . . . . . . . . . . . . . 135
4.8.5   Checkpointing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.8.6   Organizing Job Chunks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.8.7   cron Jobs, or the Windows Task Scheduler . . . . . . . . . . . . . . 138
4.8.8   Using Custom and Default Resources . . . . . . . . . . . . . . . . . . 139
4.8.9   Using Idle Workstation Cycle Harvesting . . . . . . . . . . . . . . . 142
4.8.10  Dedicated Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
4.8.11  Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
4.8.12  Dynamic Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
4.8.13  Eligible Wait Time for Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . 162
4.8.14  Sorting Jobs by Entity Shares (Was Strict Priority) . . . . . . . . 167
4.8.15  Estimating Job Start Time . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
4.8.16  Calculating Job Execution Priority . . . . . . . . . . . . . . . . . . . . 173
4.8.17  Express Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.8.18  Using Fairshare . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
4.8.19  FIFO Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
4.8.20  Using a Formula for Computing Job Execution Priority . . . . 193
4.8.21  Gating Jobs at Server or Queue . . . . . . . . . . . . . . . . . . . . . . . 202
4.8.22  Managing Application Licenses . . . . . . . . . . . . . . . . . . . . . . 203
4.8.23  Limits on Per-job Resource Usage . . . . . . . . . . . . . . . . . . . . 203
4.8.24  Limits on Project, User, and Group Jobs . . . . . . . . . . . . . . . . 204
4.8.25  Limits on Project, User, and Group Resource Usage . . . . . . 204
4.8.26  Limits on Jobs at Vnodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
4.8.27  Using Load Balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
4.8.28  Matching Jobs to Resources . . . . . . . . . . . . . . . . . . . . . . . . . 209
4.8.29  Node Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
4.8.30  Overrides . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
4.8.31  Peer Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
4.8.32  Placement Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
4.8.33  Using Preemption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
4.8.34  Using Primetime and Holidays . . . . . . . . . . . . . . . . . . . . . . . 255
4.8.35  Provisioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
4.8.36  Queue Priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
4.8.37  Advance and Standing Reservations . . . . . . . . . . . . . . . . . . . 263
4.8.38  Round Robin Queue Selection . . . . . . . . . . . . . . . . . . . . . . . 269
4.8.39  Routing Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
4.8.40  Shared vs. Exclusive Use of Resources by Jobs . . . . . . . . . . 276
4.8.41  Using Shrink-to-fit Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
4.8.42  SMP Cluster Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
4.8.43  Sorting Jobs on a Key . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
4.8.44  Sorting Jobs by Requested Priority . . . . . . . . . . . . . . . . . . . . 294
4.8.45  Sorting Queues into Priority Order . . . . . . . . . . . . . . . . . . . . 294
4.8.46  Starving Jobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
4.8.47  Using Strict Ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
4.8.48  Sorting Vnodes on a Key . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

4.2 Scheduling Policy Basics
4.2.1 How Scheduling can be Used
You can use the scheduling tools provided by PBS to implement your chosen scheduling policy, so that your jobs run in the way you want.
Your policy can do the following:
• Prioritize jobs according to your specification
• Run jobs according to their relative importance
• Award specific amounts of resources such as CPU time or licenses to projects, users, and groups according to rules that you set
• Make sure that resources are not misused
• Optimize how jobs are placed on vnodes, so that jobs run as efficiently as possible
• Use special time slots for particular tasks
• Optimize throughput or turnaround time for jobs
4.2.2 What is Scheduling Policy?
Scheduling policy determines when each job is run and on which resources. In other words, a
scheduling policy describes a goal, or intended behavior. For convenience, we describe a
scheduling policy as being a combination of sub-goals, for example a combination of how
resources should be allocated and how efficiency should be maximized.
You implement a scheduling policy using the tools PBS provides. A scheduling tool is a feature that allows you control over some aspect of scheduling. For example, the job sorting formula is a tool that allows you to define how you want job execution priority to be computed.
Some scheduling tools are supplied by the PBS scheduler, and some are supplied by other elements of PBS, such as the hooks, server, queues or resources.
4.2.3 Basic PBS Scheduling Behavior
The basic behavior of PBS is that it always places jobs where it finds the resources requested
by the job. PBS will not place a job where that job would use more resources than PBS thinks
are available. For example, if you have two jobs, each requesting 1 CPU, and you have one
vnode with 1 CPU, PBS will run only one job at a time on the vnode. You do not have to configure PBS for this basic behavior.
PBS determines what hardware resources are available and configures them for you. However, you do have to inform PBS which custom resources and non-hardware resources are
available and where, how much, and whether they are consumable or not. In addition, in
order to ensure that jobs are sent to the appropriate vnodes for execution, you also need to
make sure that they request the correct resources. You can do this either by having users submit their jobs with the right resource requests, using hooks that set job resources, or by configuring default resources for jobs to inherit.
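As a sketch of the last approach, default resources can be set at a queue or at the server with qmgr, and jobs that do not request those resources inherit them; the queue name and values here are illustrative:
Qmgr: set queue workq resources_default.ncpus = 1
Qmgr: set server resources_default.mem = 1gb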
4.2.4 Sub-goals
Your scheduling policy is the combination that you choose of one or more sub-goals. For
example, you might need to meet two particular sub-goals: you might need to prioritize jobs a
certain way, and you might need to use resources efficiently. You can choose among various
outcomes for each sub-goal. For example, you can choose to prioritize jobs according to size,
owner, owner’s usage, time of submission, etc.
In the following sections, we describe the tools PBS offers for meeting each of the following
sub-goals.
• Job prioritization and preemption; see section 4.2.5, “Job Prioritization and Preemption”, on page 67.
• Resource allocation & limits; see section 4.2.6, “Resource Allocation to Users, Projects & Groups”, on page 75.
• Time slot allocation; see section 4.2.7, “Time Slot Allocation”, on page 79.
• Job placement optimizations; see section 4.2.8, “Job Placement Optimization”, on page 80.
• Resource efficiency optimizations; see section 4.2.9, “Resource Efficiency Optimizations”, on page 84.
• Overrides; see section 4.2.10, “Overrides”, on page 87.
4.2.5 Job Prioritization and Preemption
Job prioritization is any technique you use to come up with a ranking of each job’s relative
importance. You can specify separate priority schemes for both execution and preemption.
4.2.5.1 Where PBS Uses Job Priority
PBS calculates job priority for two separate tasks: job execution and job preemption. Job execution priority is used with other factors to determine when to run each job. Job preemption
priority is used to determine which queued jobs are allowed to preempt which running jobs in
order to use their resources and run. These two tasks are independent, and it is important to
make sure that you do not make them work at cross-purposes. For example, you do not want
to have a class of jobs having high execution priority and low preemption priority; these jobs
would run first, and then be preempted first.
Preemption comes into play when the scheduler examines the top job and determines that it
cannot run now. If preemption is enabled, the scheduler checks to see whether the top job has
sufficient preemption priority to be able to preempt any running jobs, and then if it does,
whether preempting jobs would yield enough resources to run the top job. If both are true, the
scheduler preempts running jobs and runs the top job.
If you take no action to configure how jobs should be prioritized, they are considered in submission order, one queue at a time. If you don’t prioritize queues, the queues are examined in
an undefined order.
4.2.5.2 Overview of Prioritizing Jobs
PBS provides several tools for setting job execution priority. There are queue-based tools for organizing jobs, moving them around, and specifying the order in which groups of jobs should be examined. There are tools for sorting jobs into the order you want. There is a meta-tool (strict ordering) that allows you to specify that the top job must go next, regardless of whether the resources it requires are available now.
The scheduler can use multiple sorting tools, in succession. You can combine your chosen sorting tools with queue-based tools to give a wide variety of behaviors. Most of the queue-based tools can be used together. The scheduler can treat all jobs as if they are in a single queue, considering them all with respect to each other, or it can examine all queues that have the same priority as a group, or it can examine jobs queue by queue, comparing each job only to other jobs in the same queue.
You can change how execution priority is calculated, depending on which time slot is occurring. You can divide time up into primetime, non-primetime, and dedicated time.
When the scheduler calculates job execution priority, it uses a built-in system of job classes.
PBS runs special classes of jobs before it considers queue membership. These classes are for
reservation, express, preempted, and starving jobs. Please see section 4.8.16, “Calculating
Job Execution Priority”, on page 174. After these jobs are run, the scheduler follows the rules
you specify for queue behavior. Within each queue, jobs are sorted according to the sorting
tools you choose.
4.2.5.3 Using Queue-based Tools to Prioritize Jobs
4.2.5.3.i Using Queue Order to Affect Order of Consideration
When the scheduler examines queued jobs, it can consider all of the jobs in the complex as a
whole, it can round-robin through groups of queues where the queues are grouped by priority,
or it can examine jobs in only one queue at a time. These three systems are incompatible.
Queues are always sorted by priority.
The by_queue scheduler parameter controls whether the scheduler runs all the jobs it can
from the highest-priority queue before moving to the next, or treats all jobs as if they are in a
single queue. By default, this parameter is set to True. When examining jobs one queue at a
time, the scheduler runs all of the jobs it can from the highest-priority queue first, then moves
to the next highest-priority queue and runs all the jobs it can from that queue, and so on. See
section 4.8.4, “Examining Jobs Queue by Queue”, on page 136.
The round_robin scheduler parameter controls whether or not the scheduler round-robins
through queues. When the scheduler round-robins through queues, it groups the queues by
priority, and round-robins first through the highest-priority group, then the next highest-priority group, and so on, running all of the jobs that it can from that group. So within each group,
if there are multiple queues, the scheduler runs the top job from one queue, then the top job
from the next queue, and so on, then goes back to the first queue, runs its new top job, goes to
the next queue, runs its new top job, and so on until it has run all of the jobs it can from that
group. All queues in a group must have exactly the same priority. The order in which queues
within a group are examined is undefined. If all queues have different priorities, the scheduler
starts with the highest-priority queue, runs all its jobs, moves to the next, runs its jobs, and so
on until it has run all jobs from each queue. This parameter overrides by_queue. See section
4.8.38, “Round Robin Queue Selection”, on page 270.
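Both parameters live in the scheduler's configuration file, PBS_HOME/sched_priv/sched_config. A minimal sketch showing the default settings:
by_queue: True all
round_robin: False all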
If you want queues to be considered in a specific order, you must assign a different priority to
each queue. Queues are always sorted by priority. See section 4.8.45, “Sorting Queues into
Priority Order”, on page 295. Give the queue you want considered first the highest priority,
then the next queue the next highest priority, and so on. If you want groups of queues to be
considered together for round-robining, give all queues in each group one priority, and all
queues in the next group a different priority. If the queues don’t have priority assigned to
them, the order in which they are considered is undefined. To set a queue’s priority, use the
qmgr command to assign a value to the priority queue attribute. See section 2.2.5.3, “Prioritizing Execution Queues”, on page 23.
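For example, to have jobs in an urgent queue considered before jobs in a normal queue (the queue names and values are illustrative):
Qmgr: set queue urgent priority = 200
Qmgr: set queue workq priority = 100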
4.2.5.3.ii Using Express Queues in Job Priority Calculation
You can create express queues, and route jobs into them, if you want to give those jobs special
priority.
An express queue is a queue whose priority is high enough to qualify as an express queue; the
default for qualification is 150, but this can be set using the preempt_queue_prio scheduler
parameter. For information on configuring express queues, see section 2.2.5.3.i, “Express
Queues”, on page 23.
When calculating execution priority, the PBS scheduler uses a built-in job class called
“Express” which contains all jobs that have a preemption level greater than that of the
normal_jobs level. By default, those jobs are jobs in express queues. See section 4.8.16,
“Calculating Job Execution Priority”, on page 174.
You can create preemption levels that include jobs in express queues. Jobs in higher preemption levels are allowed to preempt jobs in lower levels. See section 4.8.33, “Using Preemption”, on page 241.
4.2.5.3.iii Routing Jobs into Queues
You can configure PBS to automatically put each job in the most appropriate queue. There
are several approaches to this. See section 4.8.39, “Routing Jobs”, on page 272.
4.2.5.3.iv Using Queue Priority when Computing Job Priority
You can configure the scheduler so that job priority is partly determined by the priority of the
queue in which the job resides. See section 4.8.36, “Queue Priority”, on page 262.
4.2.5.4 Using Job Sorting Tools to Prioritize Jobs
The scheduler can use multiple job sorting tools in succession to determine job execution priority. The scheduler groups all jobs waiting to run into classes, and then applies the sorting
tools you choose to all jobs in each class.
• You can create a formula that the scheduler uses to sort jobs. The scheduler applies this formula to all jobs in the complex, using it to calculate a priority for each job. For example, you can specify in the formula that jobs requesting more CPUs have higher priority. If the formula is defined, it overrides fairshare and sorting jobs on keys. See section 4.8.20, “Using a Formula for Computing Job Execution Priority”, on page 194. (A short sketch appears after this list.)
• You can use the fairshare algorithm to sort jobs. This algorithm allows you to set a resource usage goal for users or groups. Jobs are prioritized according to each entity’s usage; jobs whose owners have used the smallest percentage of their allotment go first. For example, you can track how much CPU time is being used, and allot each group a percentage of the total. See section 4.8.18, “Using Fairshare”, on page 179.
• You can sort jobs according to the same usage allotments you set up for fairshare. In this case, jobs whose owners are given the highest allotment go first. See section 4.8.14, “Sorting Jobs by Entity Shares (Was Strict Priority)”, on page 168.
• You can sort jobs on one or more keys; for example, you can sort jobs first by the number of CPUs they request, then by the amount of memory they request. You can specify that either the high or the low end of the resulting sort has higher priority. You can create a custom resource, use a hook to set a value for that resource for each job, and then sort on the resource. See section 4.8.43, “Sorting Jobs on a Key”, on page 292.
• You can run jobs in the order in which they were submitted. See section 4.8.19, “FIFO Scheduling”, on page 192.
• You can run jobs according to the priority requested for each job at submission time. This priority can be set via a hook. See section 4.8.44, “Sorting Jobs by Requested Priority”, on page 295 and Chapter 6, "Hooks", on page 437.
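As a brief sketch of two of these tools: the job sorting formula is set in the server's job_sort_formula attribute via qmgr, while key-based sorting uses the job_sort_key parameter in the scheduler's configuration file. The values here are illustrative only:
Qmgr: set server job_sort_formula = "ncpus"
and, in PBS_HOME/sched_priv/sched_config:
job_sort_key: "ncpus HIGH all"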
4.2.5.5 Prioritizing Jobs by Wait Time
You can use the amount of time a job has been waiting to run in the priority calculation.
There are two ways to measure wait time:
• Eligible waiting time: how long a job has been waiting to run due to a shortage of resources, rather than because its owner isn’t allowed to run jobs now. See section 4.8.13, “Eligible Wait Time for Jobs”, on page 163.
• Amount of time waiting in the queue
Both of these ways can be used when computing whether or not a job is starving. You can
specify how long a job must be waiting to be considered starving. See section 4.8.46, “Starving Jobs”, on page 296.
You can use a job’s eligible waiting time in the job sorting formula. See section 4.8.20,
“Using a Formula for Computing Job Execution Priority”, on page 194.
When a job is considered to be starving, it is automatically assigned special execution priority,
and placed in the Starving execution priority class; see section 4.8.16, “Calculating Job Execution Priority”, on page 174. You can configure preemption levels that include starving jobs;
see section 4.8.33, “Using Preemption”, on page 241.
4.2.5.6 Calculating Preemption Priority
Execution priority and preemption priority are two separate systems of priority.
By default, if the top job cannot run now, and it has high preemption priority, the scheduler
will use preemption to run the top job. The scheduler will preempt jobs with lower preemption priority so that it can use the resources to run the top job. The default definition of jobs
with high preemption priority is jobs in express queues. You can configure many levels of
preemption priority, specifying which levels can preempt which other levels. See section
4.8.33, “Using Preemption”, on page 241.
4.2.5.7 Making Preempted Jobs Top Jobs
You can specify that the scheduler should make preempted jobs be top jobs. See section
4.8.3.6, “Configuring Backfilling”, on page 131.
4.2.5.8 Preventing Jobs from Being Preempted
You may have jobs that should not be preempted, regardless of their priority. These can be
jobs which cannot be effectively preempted, so that preempting them would waste resources.
To prevent these jobs from being preempted, do one or both of the following:
• Set a value for the preempt_targets resource at all jobs that specifies a value for a custom resource. For example, define a Boolean resource named Preemptable, and add “Resource_List.Preemptable=true” to preempt_targets for all jobs. Then set the value of Resource_List.Preemptable to False for the jobs you don’t want preempted. (A sketch appears after this list.)
• Route jobs you don’t want preempted to one or more specific queues, and then use a hook to make sure that no jobs specify these queues in their preempt_targets.
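A minimal sketch of the first approach, assuming the custom Boolean resource Preemptable has already been defined (the resource name and script name are illustrative):
Qmgr: set server resources_default.preempt_targets = "Resource_List.Preemptable=True"
qsub -l Preemptable=False protected_job.sh
With this default in place, every job may preempt only jobs whose Preemptable value is True, so jobs submitted with Preemptable=False are never preempted.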
4.2.5.9 Meta-priority: Running Jobs Exactly in Priority Order
By default, when scheduling jobs, PBS orders jobs according to execution priority, then considers each job, highest-priority first, and runs the next job that can run now. If a job cannot
run now because the resources required are unavailable, the default behavior is to skip the job
and move to the next in order of priority.
You can tell PBS to use a different behavior called strict ordering. This means that you tell
PBS that it must not skip a job when choosing which job to run. If the top job cannot run, no
job runs.
You can see that using strict ordering could lead to decreased throughput and idle resources.
In order to prevent idle resources, you can tell PBS to run small filler jobs while it waits for
the resources for the top job to become available. These small filler jobs do not change the
start time of the top job. See section 4.8.47, “Using Strict Ordering”, on page 299 and section
4.8.3, “Using Backfilling”, on page 129.
4.2.5.10 Using Different Calculations for Different Time Periods
PBS allows you to divide time into two kinds, called primetime and non-primetime. All time
is covered by one or the other of these two kinds of time. The times are arbitrary; you can set
them up however you like. You can also choose not to define them, and instead to treat all
time the same.
You can configure two separate, independent ways of calculating job priority for primetime
and non-primetime. The same calculations are used during dedicated time; dedicated time is a
time slot made up of primetime and/or non-primetime. Many scheduler parameters are prime
options, meaning that they can be configured separately for primetime and non-primetime.
For example, you can configure fairshare as your sorting tool during primetime, but sort jobs
on a key during non-primetime.
If you use the formula, it is in force all of the time.
See section 4.8.34, “Using Primetime and Holidays”, on page 256.
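For example, to use fairshare as the sorting tool during primetime but sort on requested CPUs during non-primetime, the scheduler's configuration file might contain lines like these (a sketch; the values are illustrative, and the full syntax is described in the fairshare and job_sort_key sections):
fair_share: True prime
job_sort_key: "ncpus HIGH non_prime"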
4.2.5.11 When Priority Is Not Enough: Overrides
Sometimes, the tools available for setting job priority don’t do everything you need. For
example, it may be necessary to run a job right away, regardless of what else is running. Or
you may need to put a job on hold. Or you might need to tweak the way the formula works
for the next N jobs. See section 4.8.30, “Overrides”, on page 214.
4.2.5.12 Elements to Consider when Prioritizing Jobs
• Whether users, groups, or projects affect job priority: for techniques to use user, group, or project to affect job priority, see section 4.3.3, “Prioritizing Jobs by User, Project or Group”, on page 90.
• Reservation jobs: jobs in reservations cannot be preempted.
• Starving jobs: PBS has a built-in execution priority for starving jobs, but you can give starving jobs the highest execution priority by giving them the highest preemption priority and enabling preemption. See section 4.8.16, “Calculating Job Execution Priority”, on page 174 and section 4.8.33, “Using Preemption”, on page 241.
• Express jobs: PBS has a built-in execution priority for express jobs. You can set the preemption priority for express jobs; see section 4.8.33, “Using Preemption”, on page 241.
• Preempted jobs: PBS has a built-in execution priority for preempted jobs. See section 4.8.16, “Calculating Job Execution Priority”, on page 174.
• Large or small jobs: you may want to give large and/or small jobs special treatment. See section 4.3.5, “Scheduling Jobs According to Size Etc.”, on page 93.
• User’s priority request for job: the job submitter can specify a priority for the job at submission. You can sort jobs according to each job’s specified priority. See section 4.8.44, “Sorting Jobs by Requested Priority”, on page 295.
• Whether the top job must be the next to run, regardless of whether it can run now; see section 4.8.47, “Using Strict Ordering”, on page 299.
4.2.5.13 List of Job Sorting Tools
4.2.5.13.i Queue-based Tools for Organizing Jobs
• Queue-by-queue: PBS runs all the jobs it can from the first queue before moving to the next queue. Queue order is determined by queue priority. See section 4.8.4, “Examining Jobs Queue by Queue”, on page 136.
• Round-robin job selection: PBS can select jobs from queues with the same priority in a round-robin fashion. See section 4.8.38, “Round Robin Queue Selection”, on page 270.
• Queue priority: Queues are always ordered according to their priority; jobs in higher-priority queues are examined before those in lower-priority queues. See section 2.2.5.3, “Prioritizing Execution Queues”, on page 23.
• Sorting queues: PBS always sorts queues into priority order. See section 4.8.45, “Sorting Queues into Priority Order”, on page 295.
• Express queues: Jobs in express queues are assigned increased priority. See section 2.2.5.3.i, “Express Queues”, on page 23, and section 4.2.5.3.ii, “Using Express Queues in Job Priority Calculation”, on page 69.
• Routing: You can set up a queue system so that jobs with certain characteristics are routed to specific queues. See section 4.8.39, “Routing Jobs”, on page 272.
4.2.5.13.ii Job Sorting Tools
You can use multiple job sorting tools, one at a time in succession. You can use different sorting tools for primetime and non-primetime.
• Job sorting formula: You create a formula that PBS uses to calculate each job’s priority. See section 4.8.20, “Using a Formula for Computing Job Execution Priority”, on page 194.
• Fairshare: PBS tracks past usage of specified resources, and starts jobs based on specified usage ratios. See section 4.8.18, “Using Fairshare”, on page 179.
• Sorting jobs on keys: PBS can sort jobs according to one or more keys, such as requested CPUs or memory; see section 4.8.43, “Sorting Jobs on a Key”, on page 292.
• Entity shares (strict priority): Jobs are prioritized according to the owner’s fairshare allocation. See section 4.8.14, “Sorting Jobs by Entity Shares (Was Strict Priority)”, on page 168.
• FIFO: Jobs can be run in submission order. See section 4.8.19, “FIFO Scheduling”, on page 192.
• Job’s requested priority: you can sort jobs on the priority requested for the job; see section 4.8.44, “Sorting Jobs by Requested Priority”, on page 295.
4.2.5.13.iii Other Job Prioritization Tools
• Strict ordering: you can specify that jobs must be run in priority order, so that a job that cannot run because resources are unavailable is not skipped. See section 4.8.47, “Using Strict Ordering”, on page 299.
• Waiting time: PBS can assign increased priority to jobs that have been waiting to run. See section 4.8.13, “Eligible Wait Time for Jobs”, on page 163, and section 4.8.46, “Starving Jobs”, on page 296.
• Setting job execution priority: PBS can set job execution priority according to a set of rules. See section 4.8.16, “Calculating Job Execution Priority”, on page 174.
• Preemption: PBS preempts lower-priority jobs in order to run higher-priority jobs. See section 4.8.33, “Using Preemption”, on page 241.
• Starving jobs: Jobs that have been waiting for a specified amount of time can be given increased priority. See section 4.8.46, “Starving Jobs”, on page 296.
• Preventing preemption: You can prevent certain jobs from being preempted. See section 4.2.5.8, “Preventing Jobs from Being Preempted”, on page 72.
• Making preempted jobs top jobs: PBS can backfill around preempted jobs. See section 4.8.3.4, “Backfilling Around Preempted Jobs”, on page 130.
• Behavior overrides: you can intervene manually in how jobs are run. See section 4.8.30, “Overrides”, on page 214.
4.2.6 Resource Allocation to Users, Projects & Groups
If you need to ensure fairness, you may need to make sure that resources are allocated fairly.
If different users, groups, or projects own or pay for different amounts of hardware or
machine time, you may need to allocate resources according to these amounts or proportions.
You can allocate hardware-based resources such as CPUs or memory, and/or time-based
resources such as walltime or CPU time, according to the agreed amounts or proportions.
You can also control who starts jobs.
4.2.6.1 Limiting Amount of Resources Used
4.2.6.1.i Allocation Using Resource Limits
You can use resource limits as a way to enforce agreed allocation amounts. This is probably
the most straightforward way, and the easiest to explain to your users. PBS provides a system
for limiting the total amount of each resource used by projects, users, and groups at the server
and at each queue. For example, you can set a limit on the number of CPUs that any generic
user can use at one time at QueueA, but set three different individual limits for each of three
users that have special requirements, at the same queue. See section 5.15.1, “Managing
Resource Usage By Users, Groups, and Projects, at Server & Queues”, on page 389.
4.2.6.1.ii Allocation Using Fairshare
The PBS fairshare tool allows you to start jobs according to a formula based on resource
usage by job owners. You can designate who the valid job owners are, which resources are
being tracked, and how much of the resources each owner is allowed to be using. Fairshare
uses a moving average of resource usage, so that a user who in the recent past has not used
their share can use more now. For example, you can track usage of the cput resource, and
give one group 40 percent of usage, one 50 percent, and one group, 10 percent. See section
4.8.18, “Using Fairshare”, on page 179.
4.2.6.1.iii Allocation Using Routing
If you do not want to place usage limits directly on projects, users, or groups, you can instead
route their jobs to specific queues, where those queues have their own resource usage limits.
To route jobs this way, force users to submit jobs to a routing queue, and set access control
limits at each execution queue. See section 8.3, “Using Access Control”, on page 791. Make
the routing queue be the default queue:
Qmgr: set server default_queue = <routing queue name>
Using this method, you place a limit for total resource usage at each queue, for each resource
you care about. See section 5.15.1, “Managing Resource Usage By Users, Groups, and
Projects, at Server & Queues”, on page 389.
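For example, to cap total CPU usage across all jobs at one such execution queue (the queue name and value are illustrative):
Qmgr: set queue workq max_run_res.ncpus = "[o:PBS_ALL=128]"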
You can also route jobs to specific queues, where those queues can send jobs only to specific
vnodes. See section 4.8.2, “Associating Vnodes with Queues”, on page 126.
4.2.6.2 Limiting Jobs
4.2.6.2.i Limiting Number of Jobs per Project, User, or Group
You can set limits on the numbers of jobs that can be run by projects, users, and groups. You
can set these limits for each project, user, and group, and you can set them at the server and at
each queue. You can set a generic limit for all projects, users, or groups, and individual limits
that override the generic limit. For example, you can set a limit that says that no user at the
complex can run more than 8 jobs. Then you can set a more specific limit for QueueA, so that
users at QueueA can run 4 jobs. Then you can set a limit for User1 and User2 at QueueA, so
that they can run 6 jobs. See section 5.15.1, “Managing Resource Usage By Users, Groups,
and Projects, at Server & Queues”, on page 389.
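A sketch of the example above, expressed with qmgr (the full limit syntax is described in the referenced section; queue and user names are illustrative):
Qmgr: set server max_run = "[u:PBS_GENERIC=8]"
Qmgr: set queue QueueA max_run = "[u:PBS_GENERIC=4], [u:User1=6], [u:User2=6]"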
4.2.6.2.ii Allocation Using Round-robin Queue Selection
PBS can select jobs from queues by examining groups of queues in round-robin fashion,
where all queues in each group have the same priority. When using the round-robin method,
the scheduler considers the first queue in a group, tries to run the top job from that queue, then
considers the next queue, tries to run the top job from that queue, then considers the next
queue, and so on, in a circular fashion. The scheduler runs all the jobs it can from the highest-priority group first, then moves to the group with the next highest priority.
If you want a simple way to control how jobs are started, you can use round-robin where each
queue in a group belongs to a different user or entity. See section 4.8.38, “Round Robin
Queue Selection”, on page 270.
4.2.6.2.iii Limiting Resource Usage per Job
If you are having trouble with large jobs taking up too much of a resource, you can limit the
amount of the resource being used by individual jobs. You can set these limits at each queue,
and at the server. See section 5.15.3, “Placing Resource Limits on Jobs”, on page 414.
4.2.6.3 Resource Allocation Tools
The following is a list of scheduling tools that you can use for allocating resources or limiting
resources or jobs:
• Matching: PBS places jobs where the available resources match the job’s resource requirements; see section 4.8.28, “Matching Jobs to Resources”, on page 210.
• Reservations: Users can create advance and standing reservations for specific resources for specific time periods. See section 4.8.37, “Advance and Standing Reservations”, on page 264.
• Fairshare: PBS tracks past usage of specified resources, and starts jobs based on specified usage ratios. See section 4.8.18, “Using Fairshare”, on page 179.
• Routing: You can set up a queue system so that jobs with certain characteristics are routed to specific queues. See section 2.2.6, “Routing Queues”, on page 24 and section 4.8.39, “Routing Jobs”, on page 272.
• Limits on resource usage by projects, users, and groups: You can set limits on user and group resource usage. See section 4.8.25, “Limits on Project, User, and Group Resource Usage”, on page 205.
• Round-robin job selection: PBS can select jobs from queues that have the same priority in a round-robin fashion. See section 4.8.38, “Round Robin Queue Selection”, on page 270.
• Sorting queues: PBS always sorts queues into priority order. See section 4.8.45, “Sorting Queues into Priority Order”, on page 295.
• Limits on number of jobs for projects, users, and groups: You can set limits on the numbers of jobs that can be run by projects, users, and groups. See section 5.15.1, “Managing Resource Usage By Users, Groups, and Projects, at Server & Queues”, on page 389.
• Limits on resources used by each job: You can set limits on the amount of each resource that any job can use. See section 4.8.23, “Limits on Per-job Resource Usage”, on page 204.
• Limits on the number of jobs at each vnode: You can set limits on the number of jobs that can run at each vnode. See section 4.8.26, “Limits on Jobs at Vnodes”, on page 205.
• Using custom resources to limit resource usage: You use custom resources to manage usage. See section 4.8.8, “Using Custom and Default Resources”, on page 140.
• Gating and admission requirements: You can specify admission requirements for jobs. See section 4.8.21, “Gating Jobs at Server or Queue”, on page 203.
• Making jobs inherit default resources: You can use default resources to manage jobs. See section 4.8.8, “Using Custom and Default Resources”, on page 140.
4.2.7 Time Slot Allocation
Time slot allocation is the process of creating time slots within which only specified jobs are
allowed to run.
4.2.7.1 Why Allocate Time Slots
You may want to set up blocks of time during which only certain jobs are allowed to run. For
example, you might need to ensure that specific high-priority jobs have their own time slot, so
that they are guaranteed to be able to run and finish before their results are required.
You may want to divide jobs into those that run at night, when no one is around, and those that
run during the day, because their owners need the results then.
You might want to run jobs on desktop clusters only at night, when the primary users of the
desktops are away.
When you upgrade PBS, a chunk of dedicated time can come in very handy. You set up dedicated time for a time period that is long enough for you to perform the upgrade, and you make
sure the time slot starts far enough out that no jobs will be running.
You may want to run different scheduling policies at different times or on different days.
4.2.7.2 How to Allocate Time Slots
Time slots are controlled by queues: primetime queues, non-primetime queues, dedicated time
queues, and reservation queues. For this, you use your favorite routing method to move jobs
into the desired queues. See section 4.8.39, “Routing Jobs”, on page 272.
4.2.7.2.i Allocation Using Primetime and Holidays
You can specify how to divide up days or weeks, and designate each time period to be either
primetime or non-primetime. You can use this division in the following ways:
• You can run a different policy during primetime from that during non-primetime
• You can run specific jobs during primetime, and others during non-primetime
See section 4.8.34, “Using Primetime and Holidays”, on page 256.
4.2.7.2.ii Allocation Using Dedicated Time
Dedicated time is a time period where the only jobs that are allowed to run are the ones in
dedicated time queues. The policy you use during dedicated time is controlled by the normal
primetime and non-primetime policies; those times overlap dedicated time.
If you don’t allow any jobs into a dedicated time queue, you can use it to perform maintenance, such as an upgrade.
See section 4.8.10, “Dedicated Time”, on page 161.
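As a sketch of how the pieces fit together: dedicated time slots are defined in the file PBS_HOME/sched_priv/dedicated_time, one slot per line, and dedicated time queues are execution queues whose names begin with “ded”. The dates and queue name here are illustrative:
04/15/2015 12:00 04/15/2015 18:00
Qmgr: create queue ded_maint queue_type = execution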
4.2.7.2.iii Allocation Using Reservations
You and any other PBS user can create advance and standing reservations. These are time
periods with a defined start and end, for a specific, defined set of resources. Reservations are
used to make sure that specific jobs can run on time. See section 4.8.37, “Advance and Standing Reservations”, on page 264.
4.2.7.2.iv Allocation Using cron Jobs or the Windows Task Scheduler
You can use cron or the Windows Task Scheduler to run jobs at specific times. See section
4.8.7, “cron Jobs, or the Windows Task Scheduler”, on page 139.
4.2.7.3 Time Slot Allocation Tools
The following is a list of scheduling tools that you can use to create time slots:
• Primetime and holidays: You can specify days and times that are to be treated as prime execution time. See section 4.8.34, “Using Primetime and Holidays”, on page 256.
• Dedicated time: You can set aside blocks of time reserved for certain system operations. See section 4.8.30.6, “Using Dedicated Time”, on page 217.
• cron jobs and the Windows Task Scheduler: You can use cron or the Windows Task Scheduler to run jobs. See section 4.8.30.7, “Using cron Jobs or the Windows Task Scheduler”, on page 218.
• Reservations: Users can create advance and standing reservations for specific resources for specific time periods. See section 4.8.37, “Advance and Standing Reservations”, on page 264.
4.2.8 Job Placement Optimization
PBS automatically places jobs where they can run, but you can refine how jobs are placed.
Optimizations are the techniques you use to increase throughput, turnaround, or efficiency, by
taking advantage of where jobs can be run.
PBS places jobs according to your placement optimization settings, which specify how vnodes should be organized, how jobs should be distributed, and how resources should be used.
4.2.8.1 Why Optimize Placement
PBS automatically places jobs where they can run, matching jobs to resources, so why optimize placement?
• You can help PBS refine its understanding of hardware topology, so that PBS can place jobs where they will run most efficiently.
• If you have some vnodes that are faster than others, you can preferentially place jobs on those vnodes.
• You may need to place jobs according to machine ownership, so that for example only jobs owned by a specific group run on a particular machine.
• You can take advantage of unused workstation computing capacity.
• You can balance the workload between two or more PBS complexes, trading jobs around depending on the workload on each complex.
• You can specify whether or not certain vnodes should be used for more than one job at a time.
• You can tell PBS to avoid placing jobs on highly-loaded vnodes.
4.2.8.2 Matching Jobs to Resources
By default, PBS places jobs where the available resources match the job’s resource requirements. See section 4.8.28, “Matching Jobs to Resources”, on page 210.
4.2.8.3 Organizing and Selecting Vnodes
By default, the order in which PBS examines vnodes is undefined. The default setting for
vnode sorting is the following:
node_sort_key: “sort_priority HIGH all”
However, sort_priority means sort on each vnode’s priority attribute, but by default, that
attribute is unset.
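For example, you could either sort on a resource that is always set, or set vnode priorities and keep the default sort. Both sketches below use illustrative values; node_sort_key goes in PBS_HOME/sched_priv/sched_config:
node_sort_key: "ncpus HIGH all"
Qmgr: set node host1 priority = 10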
PBS can organize vnodes into groups. By default, PBS does not organize vnodes into groups.
By default, when PBS chooses vnodes for a job, it runs down its list of vnodes, searching until
it finds vnodes that can supply the job with the requested resources. You can improve this in
two ways:
• PBS provides a way to organize your vnodes so that jobs can run on groups of vnodes, where the selected group of vnodes provides the job with good connectivity. This can improve memory access and interprocess communication timing. PBS then searches through these groups of vnodes, called placement sets, looking for the smallest group that satisfies the job’s requirements. Each placement set is a group of vnodes that share a value for a resource. An illustrative example is a group of vnodes that are all connected to the same high speed switch, so that all of the vnodes have the same value for the switch resource. For detailed information on how placement sets work and how to configure them, see section 4.8.32, “Placement Sets”, on page 224.
• By default, the order in which PBS examines vnodes, whether in or outside of placement sets, is undefined. PBS can sort vnodes on one or more keys. Using this tool, you can specify which vnodes should be selected first. For information on sorting vnodes on keys, see section 4.8.48, “Sorting Vnodes on a Key”, on page 300.
You can sort vnodes in conjunction with placement sets.
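For example, to have PBS prefer faster vnodes, you can give them a higher priority attribute and let the default node_sort_key order them. A minimal sketch, assuming two hypothetical vnodes named fast01 and slow01:

Qmgr: set node fast01 priority = 100
Qmgr: set node slow01 priority = 10

With the default node_sort_key of “sort_priority HIGH all”, the scheduler examines fast01 before slow01.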
4.2.8.4 Distributing Jobs
All of the following methods for distributing jobs can be used together.
4.2.8.4.i Filtering Jobs to Specific Vnodes
If you want to run certain kinds of jobs on specific vnodes, you can route those jobs to specific execution queues, and tie those queues to the vnodes you want. For example, if you
want to route jobs requesting large amounts of memory to your large-memory machines, you
can set up an execution queue called LMemQ, and associate that queue with the large-memory vnodes. You can route any kind of job to its own special execution queue. For example,
you can route jobs owned by the group that owns a cluster to a special queue which is associated with the cluster. For details on routing jobs, see section 4.8.39, “Routing Jobs”, on page
272. For details on associating vnodes and queues, see section 4.8.2, “Associating Vnodes
with Queues”, on page 126.
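For example, a minimal sketch of the LMemQ setup described above, assuming a hypothetical large-memory vnode named bigmem01 and a 64gb threshold:

Qmgr: create queue LMemQ queue_type = execution
Qmgr: set queue LMemQ resources_min.mem = 64gb
Qmgr: set queue LMemQ enabled = true, started = true
Qmgr: set node bigmem01 queue = LMemQ

Jobs in LMemQ then run only on bigmem01, and bigmem01 accepts jobs only from LMemQ.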
4.2.8.4.ii Running Jobs at Least-loaded Complex
You can set up cooperating PBS complexes that automatically run jobs from each other’s
queues. This allows you to dynamically balance the workload across multiple, separate PBS
complexes. See section 4.8.31, “Peer Scheduling”, on page 218.
4.2.8.4.iii Using Idle Workstations
You can run jobs on workstations whenever they are not being used by their owners. PBS can
monitor workstations for user activity or load, and run jobs when those jobs won’t interfere
with the user’s operation. See section 4.8.9, “Using Idle Workstation Cycle Harvesting”, on
page 143.
4.2.8.4.iv Avoiding Highly-loaded Vnodes
You can tell PBS not to run jobs on vnodes that are above a specified load. This is in addition
to the default behavior, where PBS does not run jobs that request more of a resource than it
thinks each vnode can supply. See section 4.8.27, “Using Load Balancing”, on page 205.
4.2.8.4.v Placing Job Chunks on Desired Hosts
You can tell PBS to place each job on as few hosts as possible, to place each chunk of a job on
a separate host, a separate vnode, or on any vnode. You can specify this behavior for the jobs
at a queue and at the server.
You can do the following:
• Set default behavior for the queue or server: jobs inherit behavior if they do not request it; see section 5.9.3.6, “Specifying Default Job Placement”, on page 325
• Use a hook to set each job’s placement request (Resource_List.place). See Chapter 6, "Hooks", on page 437
For more on placing chunks, see section 4.8.6, “Organizing Job Chunks”, on page 138.
For information on how jobs request placement, see section 2.60.2.5, “Requesting Resources
and Placing Jobs”, on page 228.
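For example, a sketch of setting default placement so that jobs which do not request place are spread one chunk per host at the server, while a hypothetical queue named workq packs its jobs onto as few vnodes as possible:

Qmgr: set server resources_default.place = scatter
Qmgr: set queue workq resources_default.place = pack

A job that does not request -l place inherits the queue default if one is set, otherwise the server default.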
4.2.8.5 Shared or Exclusive Resources and Vnodes
PBS can give jobs their own vnodes, or fill vnodes with as many jobs as possible. The scheduler uses a set of rules to determine whether a job can share resources or a host with another
job. These rules specify how the vnode sharing attribute should be combined with a job’s
placement directive. The vnode’s sharing attribute supersedes the job’s placement request.
You can set each vnode’s sharing attribute so that the vnode or host is always shared, always
exclusive, or so that it honors the job’s placement request. See section 4.8.40, “Shared vs.
Exclusive Use of Resources by Jobs”, on page 277.
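For example, to force a host to run only one job at a time, you can set its sharing attribute in a Version 2 vnode configuration file; a minimal sketch, assuming a hypothetical host named hostA (the sharing attribute cannot be set via qmgr):

$configversion 2
hostA: sharing = force_exclhost

Load the file with pbs_mom's -s insert option on hostA, then restart the MoM.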
4.2.8.6 Tools for Organizing Vnodes
• Placement sets: PBS creates sets of vnodes organized by the values of multiple resources. See section 4.8.32, “Placement Sets”, on page 224.
• Sorting vnodes on keys: PBS can sort vnodes according to specified keys. See section 4.8.48, “Sorting Vnodes on a Key”, on page 300.
4.2.8.7 Tools for Distributing Jobs
• Routing: You can set up a queue system so that jobs with certain characteristics are routed to specific queues. See section 2.2.6, “Routing Queues”, on page 24 and section 4.8.39, “Routing Jobs”, on page 272.
• Associating vnodes with queues: You can specify that jobs in a given queue can run only on specific vnodes, and vice versa. See section 4.8.2, “Associating Vnodes with Queues”, on page 126.
• Idle workstation cycle harvesting: PBS can take advantage of unused workstation CPU time. See section 4.8.9, “Using Idle Workstation Cycle Harvesting”, on page 143.
• Peer scheduling: PBS complexes can exchange jobs. See section 4.8.31, “Peer Scheduling”, on page 218.
• Load balancing: PBS can place jobs so that machines have balanced loads. See section 4.8.27, “Using Load Balancing”, on page 205.
• SMP cluster distribution (deprecated): PBS can place jobs in a cluster as you specify. See section 4.8.42, “SMP Cluster Distribution”, on page 290.
4.2.9 Resource Efficiency Optimizations
PBS automatically runs each job where the resources required for the job are available. You
can refine the choices PBS makes.
Resource optimizations are the techniques you use to increase throughput, turnaround, or efficiency, by taking advantage of how resources are used.
Before reading this section, please make sure you understand how resources are used by reading section 4.8.28, “Matching Jobs to Resources”, on page 210.
4.2.9.1 Why Optimize Use of Resources
You may want to take advantage of the following:
• If you are using strict ordering, you can prevent resources from standing idle while the top job waits for its resources to become available.
• PBS can estimate the start times of jobs, so that users can stay informed.
• PBS can provision vnodes with the environments that jobs require.
• PBS can track resources that are outside of the control of PBS, such as scratch space.
• You can take advantage of unused workstation computing capacity.
• You can balance the workload between two or more PBS complexes, trading jobs around depending on the workload on each complex.
• You can specify whether or not certain vnodes should be used for more than one job at a time.
• Users can specify that jobs that are dependent on the output of other jobs run only after the other jobs complete.
• You can tell PBS to avoid placing jobs on highly-loaded vnodes.
4.2.9.2 How to Optimize Resource Use
4.2.9.2.i Backfilling Around Top Jobs
PBS creates a list of jobs ordered by priority, and tries to run the jobs in order of priority. You
can force all jobs to be run in exact order of their priority, using strict ordering. See section
4.8.47, “Using Strict Ordering”, on page 299. However, this can reduce resource utilization
when the top job cannot run now and must wait for resources to become available, idling the
entire complex. You can offset this problem by using backfilling, where PBS tries to fit
smaller jobs in around the top job that cannot run. The start time of the top job is not delayed.
Job walltimes are required in order to use backfilling. You can specify the number of jobs
around which to backfill. You can also disable this feature. See section 4.8.3, “Using Backfilling”, on page 129.
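For example, a sketch of enabling strict ordering while backfilling around the top five jobs. In PBS_HOME/sched_priv/sched_config (HUP the scheduler afterward):

strict_ordering: True ALL
backfill: True ALL

Then set the number of top jobs to backfill around:

Qmgr: set server backfill_depth = 5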
PBS can shrink the walltime of shrink-to-fit jobs into available time slots. These jobs can be
used to backfill around top jobs and time boundaries such as dedicated time or reservations.
See section 4.8.41, “Using Shrink-to-fit Jobs”, on page 279.
If you do not use strict ordering, PBS won’t necessarily run jobs in exact priority order. PBS
will instead run jobs so that utilization is maximized, while trying to preserve priority order.
4.2.9.2.ii Using Dependencies
Job submitters can specify dependencies between jobs. For example, if you have a data analysis job that must run after data collection and cleanup jobs, you can specify that. See section
4.8.11, “Dependencies”, on page 162.
4.2.9.2.iii Estimating Start Time for Jobs
You can tell PBS to estimate start times and execution vnodes for either the number of jobs
being backfilled around, or all jobs. Users can then see when their jobs are estimated to start,
and the vnodes on which they are predicted to run. See section 4.8.15, “Estimating Job Start
Time”, on page 169.
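For example, a sketch of having PBS estimate start times and vnodes for all jobs every 10 minutes:

Qmgr: set server est_start_time_freq = 00:10:00

Users can then see the estimates, for example in the estimated.start_time field shown by qstat -f.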
4.2.9.2.iv Provisioning Vnodes with Required Environments
PBS can provision vnodes with environments (applications or operating systems) that jobs
require. This means that a job can request a particular environment that is not yet on a vnode,
but is available to be instantiated there. See section 4.8.35, “Provisioning”, on page 262.
4.2.9.2.v Tracking Dynamic Resources
You can use dynamic PBS resources to represent elements that are outside of the control of
PBS, typically for licenses and scratch space. You can represent elements that are available to
the entire PBS complex as server-level resources, or elements that are available at a specific
host or hosts as host-level resources. For an example of configuring a server-level dynamic
resource, see section 5.14.4.1.i, “Example of Configuring Dynamic Server-level Resource”,
on page 359. For an example of configuring a dynamic host-level resource, see section
5.14.5.1.i, “Example of Configuring Dynamic Host-level Resource”, on page 362.
For a complete description of how to create and use dynamic resources, see section 5.14,
“Custom Resources”, on page 337.
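As a sketch, here is a server-level dynamic resource for shared scratch space, assuming a hypothetical script /usr/local/bin/scratch_avail that prints the currently available amount (e.g. 500gb). Define the resource in PBS_HOME/server_priv/resourcedef:

globalscratch type=size

Then list it and point the scheduler at the script in PBS_HOME/sched_priv/sched_config:

resources: "ncpus, mem, arch, host, vnode, aoe, globalscratch"
server_dyn_res: "globalscratch !/usr/local/bin/scratch_avail"

After restarting or HUPing the scheduler, jobs can request -l globalscratch=10gb, and the scheduler runs the script each cycle to check availability.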
4.2.9.3 Optimizing Resource Use by Job Placement
4.2.9.3.i Sending Jobs to Complex Having Lightest Workload
You can set up cooperating PBS complexes that automatically run jobs from each other’s
queues. This allows you to dynamically balance the workload across multiple, separate PBS
complexes. See section 4.8.31, “Peer Scheduling”, on page 218.
4.2.9.3.ii Using Idle Workstations
You can run jobs on workstations whenever they are not being used by their owners. PBS can
monitor workstations for user activity or load, and run jobs when those jobs won’t interfere
with the user’s operation. See section 4.8.9, “Using Idle Workstation Cycle Harvesting”, on
page 143.
4.2.9.3.iii Avoiding Highly-loaded Vnodes
You can tell PBS not to run jobs on vnodes that are above a specified load. This is in addition
to the default behavior, where PBS does not run jobs that request more of a resource than it
thinks each vnode can supply. See section 4.8.27, “Using Load Balancing”, on page 205.
4.2.9.4 Resource Efficiency Optimization Tools
The following is a list of scheduling tools that you can use to optimize how resources are
used:
• Backfilling around most important job(s): PBS can place small jobs in otherwise-unused blocks of resources. See section 4.8.3, “Using Backfilling”, on page 129.
• Dependencies: Users can specify requirements that must be met by previous jobs in order for a given job to run. See section 4.8.11, “Dependencies”, on page 162.
• Estimating start time of jobs: PBS can estimate when jobs will start, so that users can be informed. See section 4.8.15, “Estimating Job Start Time”, on page 169.
• Provisioning vnodes with required environments: PBS can provision vnodes with the environments that jobs require. See section 4.8.35, “Provisioning”, on page 262.
• Using dynamic resources: PBS can track resources such as scratch space and licenses. See section 4.8.12, “Dynamic Resources”, on page 163.
• Idle workstation cycle harvesting: PBS can take advantage of unused workstation CPU time. See section 4.8.9, “Using Idle Workstation Cycle Harvesting”, on page 143.
• Peer scheduling: PBS complexes can exchange jobs. See section 4.8.31, “Peer Scheduling”, on page 218.
• Load balancing: PBS can place jobs so that machines have balanced loads. See section 4.8.27, “Using Load Balancing”, on page 205.
4.2.10 Overrides
Overrides are the techniques you use to override the specified scheduling behavior of PBS.
4.2.10.1 Why and How to Override Scheduling
• If you need to run a job immediately, you can tell PBS to run a job now, optionally specifying the vnodes and resources on which to run it. See section 4.8.30.1, “Run a Job Manually”, on page 214.
• If you need to prevent a job from running, you can tell PBS to place a hold on a job. See section 4.8.30.2, “Hold a Job Manually”, on page 215.
• If you need to change how the formula computes job priority, you can make on-the-fly changes to how the formula is computed. See section 4.8.30.5, “Change Formula On the Fly”, on page 217.
• If you need a block of time where you can control what’s running, for example for upgrading PBS, you can create dedicated time. See section 4.8.30.6, “Using Dedicated Time”, on page 217.
• If you need to submit jobs at a certain time, you can use cron or the Windows Task Scheduler to run jobs. See section 4.8.30.7, “Using cron Jobs or the Windows Task Scheduler”, on page 218.
• If you need to change job resource requests, programs, environment, or attributes, you can use hooks to examine jobs and alter their characteristics. See Chapter 6, "Hooks", on page 437.
4.3 Choosing a Policy
4.3.1 Overview of Kinds of Policies
You can tune PBS to produce any of a wide selection of scheduling behaviors. You can choose from a wide variety of behaviors for each sub-goal, resulting in many possible scheduling policies. However, policies can be grouped into the following kinds:
• FIFO, where you essentially run jobs in the order in which they were submitted; see section 4.3.2, “FIFO: Submission Order”, on page 89
• According to user or group priority, where the job’s priority is determined by the owner’s priority; see section 4.3.3, “Prioritizing Jobs by User, Project or Group”, on page 90
• According to resource allocation rules, where jobs are run so that they use resources following a set of rules for how resources should be awarded to users or groups; see section 4.3.4, “Allocating Resources by User, Project or Group”, on page 91
• According to the size of the job, for example measured by CPU or memory request; see section 4.3.5, “Scheduling Jobs According to Size Etc.”, on page 93
• By setting up time slots for specific uses; see section 4.3.6, “Scheduling Jobs into Time Slots”, on page 96
4.3.2 FIFO: Submission Order
If you want jobs to run in the order in which they are submitted, use FIFO. You can use FIFO
across the entire complex, or within each queue.
If it’s important that jobs run exactly in submission order, use FIFO with strict ordering.
However, if you don’t want resources to be idle while a top job is stuck, you can use FIFO
with strict ordering and backfilling.
To run jobs in submission order, see section 4.8.19.1, “Configuring Basic FIFO Scheduling”, on page 192.
To run jobs in submission order across the entire complex, see section 4.8.19.2, “FIFO for
Entire Complex”, on page 193.
To run jobs in submission order, examining queues in order of queue priority, see section
4.8.19.3, “Queue by Queue FIFO”, on page 193.
To run jobs in submission order, with strict ordering, see section 4.8.19.4, “FIFO with Strict
Ordering”, on page 193.
To run jobs in submission order, with strict ordering and backfilling, see section 4.8.19.5,
“FIFO with Strict Ordering and Backfilling”, on page 194.
4.3.3 Prioritizing Jobs by User, Project or Group
If you need to run jobs from some users, groups, or projects before others, you can prioritize jobs using the following techniques:
• Routing each entity’s jobs to its own execution queue, assigning the queue the desired priority, and examining jobs queue by queue. See the following:
  • For routing: section 2.2.6, “Routing Queues”, on page 24
  • For setting queue priority: section 2.2.5.3, “Prioritizing Execution Queues”, on page 23
  • For examining jobs queue by queue: section 4.8.4, “Examining Jobs Queue by Queue”, on page 136
• Routing each entity’s jobs to its own execution queue, where the jobs inherit a custom resource that you use in the job sorting formula. See the following:
  • For routing: section 2.2.6, “Routing Queues”, on page 24
  • For inherited resources: section 11.3, “Allocating Resources to Jobs”, on page 967
  • For the job sorting formula: section 4.8.20, “Using a Formula for Computing Job Execution Priority”, on page 194
• Using a hook to allocate a custom resource to each job, where the hook sets the value according to the priority of the job’s owner, group, or project, then using the resource in the job sorting formula. See the following:
  • For hooks: Chapter 6, "Hooks", on page 437
  • For custom resources: section 5.14, “Custom Resources”, on page 337
  • For the job sorting formula: section 4.8.20, “Using a Formula for Computing Job Execution Priority”, on page 194
• Assigning a greater fairshare allocation in the fairshare tree to the users or groups whose jobs must run first, and running jobs according to entity shares. See the following:
  • For fairshare: section 4.8.18, “Using Fairshare”, on page 179
  • For entity shares: section 4.8.14, “Sorting Jobs by Entity Shares (Was Strict Priority)”, on page 168
4.3.4 Allocating Resources by User, Project or Group
When you want to divide up hardware usage among users, groups, or projects, you can make sure you allocate resources along those lines. You can do this in the following ways:
• Allocate portions of the entire complex to each entity; see section 4.3.4.1, “Allocating Portions of Complex”, on page 91
• Allocate portions of all machines or clusters to each entity, or use controlled allocation for some hardware, with a free-for-all elsewhere; see section 4.3.4.2, “Allocating Portions of Machines or Clusters”, on page 92
• Lock entities into using specific hardware; see section 4.3.4.3, “Locking Entities into Specific Hardware”, on page 93
4.3.4.1 Allocating Portions of Complex
4.3.4.1.i Allocating Specific Amounts
To allocate specific amounts of resources across the entire complex, you can use resource limits at the server. These limits set the maximum amount that can be used, ensuring that projects, users, or groups stay within their bounds. You can set a different limit for each project, user, and group, for each resource.
For example, you can set a limit of 48 CPUs in use at once by most groups, but give groupA a
limit of 96 CPUs. You can give each individual user a limit of 8 CPUs, but give UserA a limit
of 10 CPUs, and UserB a limit of 4 CPUs.
To set limits for usage across the entire complex, set the limits at the server.
See section 5.15.1, “Managing Resource Usage By Users, Groups, and Projects, at Server &
Queues”, on page 389.
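For example, a sketch of the limits just described, set with qmgr (the group and user names are hypothetical):

Qmgr: set server max_run_res.ncpus = "[g:PBS_GENERIC=48], [g:groupA=96]"
Qmgr: set server max_run_res.ncpus += "[u:PBS_GENERIC=8], [u:UserA=10], [u:UserB=4]"

The PBS_GENERIC entries set the generic group and user limits; the named entries override them for groupA, UserA, and UserB.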
4.3.4.1.ii Allocating Percentages
To allocate a percentage of the resources being used at the complex, you can use fairshare.
Fairshare tracks a moving average of resource usage, so it takes past use into account. You
choose which resources to track. You can tune the influence of past usage.
To use fairshare across the entire complex, make sure that both by_queue and round_robin
are False.
Fairshare is described in section 4.8.18, “Using Fairshare”, on page 179.
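For example, a sketch of a fairshare setup giving two hypothetical groups 60% and 40% of tracked usage. In PBS_HOME/sched_priv/sched_config:

fair_share: True ALL
fairshare_usage_res: cput

In PBS_HOME/sched_priv/resource_group (fields: name, unique ID, parent group, shares):

groupA 10 root 60
groupB 11 root 40

HUP the scheduler after editing these files.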
4.3.4.2 Allocating Portions of Machines or Clusters
You can allocate fixed amounts of a machine or groups of machines. You can do this for as
many machines as you want. For example, on HostA, you can give GroupA 100 CPUs,
GroupB 150 CPUs, and GroupC 50 CPUs, while at HostB, GroupA gets 10, GroupB gets 8,
and GroupC gets 25.
To allocate fixed portions of a specific machine or group of machines, you use these tools in combination:
• Create an execution queue for this machine; see section 2.2.3, “Creating Queues”, on page 20.
• Route jobs belonging to the users or groups who share this machine into the queue. Each machine or cluster that requires controls gets its own queue. See section 4.8.39, “Routing Jobs”, on page 272.
• Associate the queue with the vnodes in question; see section 4.8.2, “Associating Vnodes with Queues”, on page 126.
• Set a limit at the queue for each resource that you care about, for each project, user, or group. These limits control use of the vnodes associated with the queue only. See section 5.15.1, “Managing Resource Usage By Users, Groups, and Projects, at Server & Queues”, on page 389.
You can prevent unauthorized usage by setting generic project, user, and group limits for the machine’s queue to zero. However, you probably don’t want users to submit their jobs to a queue where they are not allowed to run, only to have those jobs languish. You can avoid this by doing the following:
• Setting up a routing queue; see section 2.2.6, “Routing Queues”, on page 24.
• Making the routing queue be the default queue:
Qmgr: set server default_queue = <routing queue name>
• Making the routing queue the only queue that accepts job submission: set from_route_only to True on execution queues tied to hardware. See section 2.2.5.1, “Where Execution Queues Get Their Jobs”, on page 21.
• Using queue access control to limit which jobs are routed into the execution queue; see section 2.2.6.5, “Using Access Control to Route Jobs”, on page 30.
You can either set up allocations for every machine, or you can set up allocations for only
some machines, leaving a free-for-all for the others. If you want access to be unrestricted for
some machines, do not set limits at the server.
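A minimal sketch of the HostA allocations above, assuming a hypothetical queue hostA_q already associated with HostA’s vnodes:

Qmgr: set queue hostA_q max_run_res.ncpus = "[g:PBS_GENERIC=0], [g:GroupA=100], [g:GroupB=150], [g:GroupC=50]"

The generic group limit of zero shuts all other groups out of this queue’s vnodes.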
4.3.4.3 Locking Entities into Specific Hardware
You can send all jobs from some projects, users, or groups to designated hardware, essentially limiting them to a sandbox. To do this, do the following:
• Create an execution queue for the sandbox hardware; see section 2.2.3, “Creating Queues”, on page 20.
• Create at least one other execution queue; see section 2.2.3, “Creating Queues”, on page 20.
• Create a routing queue; see section 2.2.3, “Creating Queues”, on page 20.
• Make the routing queue be the default queue:
Qmgr: set server default_queue = <routing queue name>
• Force all users to submit jobs to the routing queue: set from_route_only to True on all other queues. See section 2.2.5.1, “Where Execution Queues Get Their Jobs”, on page 21.
• Use queue access control to route according to user or group: route jobs from the controlled users or groups into the sandbox queue only. See section 2.2.6.5, “Using Access Control to Route Jobs”, on page 30.
• Use a job submission hook to route according to project: route the jobs from the desired project(s) to the sandbox queue. See Chapter 6, "Hooks", on page 437.
• Associate the sandbox queue with the sandbox vnodes. See section 4.8.2, “Associating Vnodes with Queues”, on page 126.
Note that you can either allow all projects, users, or groups into the sandbox queue, or allow only the controlled projects, users, or groups into the sandbox queue.
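For example, a sketch of the access-control step, assuming a hypothetical sandbox queue sandboxq and a controlled group renderers:

Qmgr: set queue sandboxq acl_group_enable = true
Qmgr: set queue sandboxq acl_groups = renderers

This admits only jobs from group renderers into sandboxq; with from_route_only set on the other execution queues, their jobs can land only there.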
4.3.5 Scheduling Jobs According to Size Etc.
You may need to treat jobs differently depending on their size or other characteristics. For example, you might want to run jobs differently depending on the number of CPUs or amount of memory requested by the job, or whether the job requests GPUs. For such jobs, you may want to do the following:
• Give special priority to a group of jobs
• Run a group of jobs on designated hardware
• Run a group of jobs in designated time slots: reservations, dedicated time, and primetime or non-primetime
There are two main approaches to doing this. You can route jobs into queues, or you can use hooks to set values. Here is an outline:
• Route certain kinds of jobs into their own queues, in order to treat each kind differently. This works for priority, hardware, and time slots. See section 4.3.5.1, “Special Treatment via Routing”, on page 94.
  • Route each kind to its own queue, using queue-based routing or a submission hook
  • Use queue-based methods to set job priority or to run the jobs on certain hardware or in certain time slots
• Use hooks to set priority for jobs or to set a custom resource that will send jobs to certain hardware. This does not work for time slots. See section 4.3.5.2, “Special Treatment via Hooks”, on page 96.
  • Use a submission hook to set each job’s Priority attribute, or set a value for a custom resource used in the job sorting formula
  • Use a submission hook to set a custom host-level resource value for each job, where the value matches the value at the desired hardware
4.3.5.1 Special Treatment via Routing
Use a routing queue or a hook to route jobs into a special queue, where the jobs are given special priority, or are run on special hardware, or are run in special time slots.
4.3.5.1.i Routing via Queues
• Create your destination queues. See section 2.2.3, “Creating Queues”, on page 20.
• Set limits at the destination queues, so that each queue receives the correct jobs. See section 2.2.6.4, “Using Resources to Route Jobs Between Queues”, on page 25.
• Create a routing queue, and set its destination queues. See section 2.2.6, “Routing Queues”, on page 24.
• Make the routing queue be the default queue:
Qmgr: set server default_queue = <routing queue name>
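A minimal sketch of these steps, assuming hypothetical destination queues smallq and bigq split at 16 CPUs:

Qmgr: set queue smallq resources_max.ncpus = 16
Qmgr: set queue bigq resources_min.ncpus = 17
Qmgr: create queue routeq queue_type = route
Qmgr: set queue routeq route_destinations = "smallq, bigq"
Qmgr: set queue routeq enabled = true, started = true
Qmgr: set server default_queue = routeq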
4.3.5.1.ii Using Hooks to Route Jobs
You can use a submission hook to move jobs into the queues you want. See section
4.8.39.2.ii, “Hooks as Mechanism to Move Jobs”, on page 275.
4.3.5.1.iii Giving Routed Jobs Special Priority
You can give routed jobs special priority in the following ways:
• Have the jobs inherit a custom resource from the special queue, and use this resource in the job sorting formula.
  • For how to have jobs inherit custom resources, see section 11.3, “Allocating Resources to Jobs”, on page 967.
  • For how to use the job sorting formula, see section 4.8.20, “Using a Formula for Computing Job Execution Priority”, on page 194.
• Give the queue itself special priority, and use queue priority in the job sorting formula.
  • For how to assign priority to queues, see section 2.2.5.3, “Prioritizing Execution Queues”, on page 23.
  • For how to use the job sorting formula, see section 4.8.20, “Using a Formula for Computing Job Execution Priority”, on page 194.
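For example, a sketch of the second approach, assuming a hypothetical queue named fastq:

Qmgr: set queue fastq priority = 200
Qmgr: set server job_sort_formula = queue_priority

Jobs in fastq then sort ahead of jobs in lower-priority queues.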
4.3.5.1.iv Running Jobs on Special Vnodes
Now that the special jobs are routed to a special queue, associate that queue with the special
vnodes. See section 4.8.2, “Associating Vnodes with Queues”, on page 126.
4.3.5.1.v Running Jobs in Special Time Slots
If you want to run jobs during dedicated time, route the jobs into one or more dedicated time queues. In the same way, for primetime or non-primetime, route jobs into primetime or non-primetime queues. You can also route jobs into reservation queues for reservations that you have created for this purpose.
For using dedicated time, see section 4.8.10, “Dedicated Time”, on page 161.
For using primetime and non-primetime, see section 4.8.34, “Using Primetime and Holidays”, on page 256.
For using reservations, see section 4.8.37, “Advance and Standing Reservations”, on page 264.
4.3.5.2 Special Treatment via Hooks
4.3.5.2.i Setting Job Priority Via Hook
You can set a job’s Priority attribute using a hook. Note that users can qalter the job’s Priority attribute. Use a job submission hook to set the job priority, by doing one of the following:
• Set a custom numeric resource for the job, and use the resource in the job sorting formula.
  • For how to use hooks, see Chapter 6, "Hooks", on page 437.
  • For how to use the job sorting formula, see section 4.8.20, “Using a Formula for Computing Job Execution Priority”, on page 194.
• Set the job’s Priority attribute, and sort jobs on a key, where the key is the job’s Priority attribute.
  • For how to set job attributes, see Chapter 6, "Hooks", on page 437.
  • For how to sort jobs on a key, see section 4.8.43, “Sorting Jobs on a Key”, on page 292.
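As a sketch of the second approach, here is a queuejob hook that sets the Priority attribute of jobs requesting many CPUs (the hook name and the 32-CPU threshold are hypothetical):

import pbs
e = pbs.event()
j = e.job
# Give jobs requesting more than 32 CPUs a higher Priority
if j.Resource_List["ncpus"] and int(j.Resource_List["ncpus"]) > 32:
    j.Priority = 500
e.accept()

Install it with something like:

Qmgr: create hook bigjob_prio
Qmgr: set hook bigjob_prio event = queuejob
Qmgr: import hook bigjob_prio application/x-python default bigjob_prio.py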
4.3.5.2.ii Routing Jobs to Hardware via Hooks
You can send jobs to particular hardware without using a particular queue, by using a hook.
See section 4.8.39.4.i, “Using Hooks to Tag Jobs”, on page 277.
4.3.6 Scheduling Jobs into Time Slots
You can schedule jobs in time slots in the following ways:
• Set aside time slots for specific entities; see section 4.3.6.1, “Setting Aside Time Slots for Entities”, on page 96
• Lock entities into specific time slots; see section 4.3.6.2, “Locking Entities into Time Slots”, on page 98
4.3.6.1 Setting Aside Time Slots for Entities
You can set aside time slots that are reserved exclusively for certain users or groups. You can
use reservations, dedicated time, primetime, or non-primetime.
4.3.6.1.i Reservations
Reservations set aside one or more blocks of time on the requested resources. Users can create their own reservations, or you can create them and set their access control to allow only
specified users to submit jobs to them. See section 4.8.37, “Advance and Standing Reservations”, on page 264.
4.3.6.1.ii Dedicated Time
During dedicated time, the only jobs allowed to run are those in dedicated queues. The drawback to dedicated time is that it applies to the entire complex. If you want to set aside one or more dedicated time slots for a user or group, do the following:
• Create a dedicated queue. See section 2.2.5.2.i, “Dedicated Time Queues”, on page 22.
• Define dedicated time. See section 4.8.10, “Dedicated Time”, on page 161.
• Set access control on the dedicated queue so that only the particular users or groups you want can submit jobs to the queue. See section 2.2.6.5, “Using Access Control to Route Jobs”, on page 30.
• If you want to limit access on a dedicated queue to a specific project, set the generic limit for queued jobs for projects at that queue to zero, and then set the individual limit for the specific project higher.
4.3.6.1.iii Non-primetime
You can set up primetime and non-primetime so that one of them, for example non-primetime, is used as a special time slot allocated to particular users or groups. The advantage of using non-primetime is that you can set up a separate scheduling policy for it, for example, using fairshare during non-primetime and sorting jobs on a key during primetime. Note that the formula, if defined, is in force all of the time. To use non-primetime, do the following:
• Create a non-primetime queue; see section 2.2.3, “Creating Queues”, on page 20 and section 2.2.5.2.ii, “Primetime and Non-Primetime Queues”, on page 23.
• Define primetime and non-primetime; see section 4.8.34, “Using Primetime and Holidays”, on page 256.
• Set access control on the non-primetime queue so that only the particular users or groups you want can submit jobs to the queue. See section 2.2.6.5, “Using Access Control to Route Jobs”, on page 30.
• Make sure that the scheduling policy you want is in force during non-primetime. See section 4.8.34.1, “How Primetime and Holidays Work”, on page 256.
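For example, a sketch of defining primetime in PBS_HOME/sched_priv/holidays using the default hours (see the Reference Guide for the exact file format):

YEAR 2015
weekday   0600    1730
saturday  none    all
sunday    none    all

Here primetime runs from 6:00 AM to 5:30 PM on weekdays, and weekends are entirely non-primetime.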
4.3.6.2 Locking Entities into Time Slots
You can make all jobs from some users or groups run during designated time slots. You can
run them during a reservation, dedicated time, or non-primetime.
4.3.6.2.i Locking Entities into Reservations
To allow a user to submit jobs only into a reservation, do the following:
• Create a reservation for the resources and time(s) you want the controlled user(s) to use. When creating the reservation, set access control to allow the controlled user(s). See section 4.8.37, “Advance and Standing Reservations”, on page 264 and section 8.3.8.1, “Setting Reservation Access”, on page 804.
• Set access control on all queues except the reservation’s queue to deny the controlled user(s); see section 2.2.6.5, “Using Access Control to Route Jobs”, on page 30.
4.3.6.2.ii Locking Entities into Dedicated Time
You can create a dedicated time queue, and send all jobs from controlled projects, users, or groups to that queue. You can route their jobs to it, and you can allow them to submit directly to it. To lock one or more projects, users, or groups into one or more dedicated time slots, do the following:
• Create a dedicated time queue; see section 2.2.3, “Creating Queues”, on page 20 and section 2.2.5.2.i, “Dedicated Time Queues”, on page 22.
• Create at least one other execution queue; see section 2.2.3, “Creating Queues”, on page 20.
• Create a routing queue; see section 2.2.3, “Creating Queues”, on page 20.
• Prevent controlled users from submitting to non-dedicated time execution queues: set from_route_only to True on the non-dedicated time execution queues. See section 2.2.5.1, “Where Execution Queues Get Their Jobs”, on page 21.
• Use queue access control to allow jobs from the controlled users or groups into the dedicated time queue only. See section 2.2.6.5, “Using Access Control to Route Jobs”, on page 30.
• Use a job submission hook to route jobs from controlled projects into the dedicated time queue. See Chapter 6, "Hooks", on page 437.
• Make the routing queue be the default queue:
Qmgr: set server default_queue = <routing queue name>
Note that you can either allow all users into the dedicated time queue, or allow only the controlled users into the dedicated time queue.
4.3.6.2.iii Locking Entities into Non-primetime
You can create a non-primetime queue, and send all jobs from controlled users, groups, or projects to that queue. You can route their jobs to it, and you can allow them to submit directly to it. To lock one or more users, groups, or projects into one or more non-primetime slots, do the following:
• Create a non-primetime queue; see section 2.2.3, “Creating Queues”, on page 20 and section 2.2.5.2.ii, “Primetime and Non-Primetime Queues”, on page 23.
• Create at least one other execution queue; see section 2.2.3, “Creating Queues”, on page 20.
• Create a routing queue; see section 2.2.3, “Creating Queues”, on page 20.
• Prevent controlled users from submitting to primetime execution queues: set from_route_only to True on the primetime execution queues. See section 2.2.5.1, “Where Execution Queues Get Their Jobs”, on page 21.
• Make the routing queue be the default queue:
Qmgr: set server default_queue = <routing queue name>
• Use queue access control to allow jobs from the controlled users or groups into the non-primetime queue only. See section 2.2.6.5, “Using Access Control to Route Jobs”, on page 30.
• Use a job submission hook to route jobs from controlled projects into the non-primetime queue. See Chapter 6, "Hooks", on page 437.
• Define primetime and non-primetime; see section 4.8.34, “Using Primetime and Holidays”, on page 256.
• Make sure that the scheduling policy you want is in force during non-primetime. See section 4.8.34.1, “How Primetime and Holidays Work”, on page 256.
Note that you can either allow all users into the non-primetime queue, or allow only the controlled users into the non-primetime queue.
4.3.7 Default Scheduling Policy
The default scheduling policy is determined by the default settings for all of the attributes,
parameters, etc. that determine the scheduler’s behavior. For a list of all of these elements,
see section 4.4.1, “Configuring the Scheduler”, on page 104.
The default behavior of the scheduler is the following:
• The scheduler matches jobs with available resources. This means that the scheduler places each job only where that job has enough resources to run. See section 4.8.28, “Matching Jobs to Resources”, on page 210.
• The scheduler will not over-allocate the resources that are listed in the scheduler’s resources parameter. The defaults for these are ncpus, mem, arch, host, vnode, aoe, netwins. See section 4.8.28.1, “Scheduling on Consumable Resources”, on page 210.
• The scheduler sorts vnodes according to its node_sort_key parameter, whose default setting is the following:
node_sort_key: “sort_priority HIGH all”
This means that vnodes are sorted by the value of their priority attribute, with high-priority vnodes used first. The scheduler places jobs first on vnodes that are first in the sorted list. Note that all vnodes have the same default priority upon creation, so the default sorted order for vnodes is undefined. See section 4.8.48, “Sorting Vnodes on a Key”, on page 300.
• Queues are sorted according to the value of their priority attribute, so that queues with a higher priority are considered before those with a lower priority. See section 2.2.5.3, “Prioritizing Execution Queues”, on page 23.
• Jobs are considered according to the priority of their queues. The scheduler runs all of the jobs that it can from the highest-priority queue before moving to the next queue, and so on. See section 4.8.4, “Examining Jobs Queue by Queue”, on page 136.
• Within each queue, jobs are considered in submission order.
• Starving jobs are given a special priority called starving. The default time required to become a starving job is 24 hours. See section 4.8.46, “Starving Jobs”, on page 296.
• Jobs in an express queue are placed in the express_queue preemption priority level. They are also placed in the Express execution priority class. The default priority for a queue to be an express queue is 150. See section 2.2.5.3.i, “Express Queues”, on page 23.
• Queued jobs are sorted according to their priority. Special jobs are all prioritized ahead of normal jobs, without regard to the queue in which they reside. The order for job priority for special jobs, highest first, is reservation jobs, jobs in express queues, preempted jobs, starving jobs. After this, the scheduler looks at normal jobs, queue by queue. All jobs in express queues, all preempted jobs, and all starving jobs are considered before the scheduler looks at the individual queues. See section 4.8.16, “Calculating Job Execution Priority”, on page 174.
• The scheduler will preempt lower-priority jobs in order to run higher-priority jobs (preemptive_sched is True by default). By default, it has two levels of job priority, express_queue and normal_jobs, where express_queue jobs can preempt normal_jobs. This is set in the scheduler’s preempt_prio parameter.
When the scheduler chooses among jobs of the same priority for a job to preempt, it uses the default setting for preempt_sort, which is min_time_since_start, choosing jobs that have been running for the shortest time.
When the scheduler chooses how to preempt a job, it uses the default setting for its preempt_order parameter, which is SCR, meaning that first it will attempt suspension, then checkpointing, then if necessary requeueing.
See section 4.8.33, “Using Preemption”, on page 241.
• The scheduler will do its best to backfill smaller jobs around the job it has decided is the most important job. See section 4.8.3, “Using Backfilling”, on page 129.
• Primetime is 6:00 AM to 5:30 PM. Any holiday is considered non-primetime. Standard U.S. Federal holidays for the year are provided in the file PBS_HOME/sched_priv/holidays. These dates should be adjusted yearly to reflect your local holidays. See section 4.8.34, “Using Primetime and Holidays”, on page 256.
• The scheduler runs every 10 minutes unless a new job is submitted or a job finishes execution. See section 4.4.5, “The Scheduling Cycle”, on page 115.
• In TPP mode, the scheduler runs with the throughput_mode scheduler attribute set to True by default, so the scheduler runs asynchronously, and doesn’t wait for each job to be accepted by MoM, which means it also doesn’t wait for an execjob_begin hook to finish. Especially for short jobs, this can give better scheduling performance.
When throughput_mode is True, jobs that have been changed can run in the same scheduling cycle in which they were changed, for the following changes:
  • Jobs that are qaltered
  • Jobs that are changed via server_dyn_res scripts
  • Jobs that are peered to a new queue
See “Scheduler Attributes” on page 358 of the PBS Professional Reference Guide.
4.3.8 Examples of Workload and Policy
• If you want to have all users start about the same number of jobs:
  • Use round robin, give each user their own queue, and give each queue the same priority; see section 4.8.38, “Round Robin Queue Selection”, on page 270
• If your site has more than one funding source:
  • See section 4.3.4, “Allocating Resources by User, Project or Group”, on page 91
• If you want to always give each user access to a certain amount of a resource, but allow more if no one else is using it:
  • Use soft limits for the amount each user can use; see section 5.15.1, “Managing Resource Usage By Users, Groups, and Projects, at Server & Queues”, on page 389 and section 4.8.33, “Using Preemption”, on page 241
• If you have a mix of jobs, and want to give big jobs high priority, but avoid having idle resources:
  • Sort jobs on a key, using ncpus as the key, to run big jobs first; see section 4.3.5, “Scheduling Jobs According to Size Etc.”, on page 93
  • Use backfilling; see section 4.8.3, “Using Backfilling”, on page 129
• If you have a mix of jobs, and want to run big jobs first:
  • Sort jobs on a key, using ncpus as the key, to run big jobs first; see section 4.3.5, “Scheduling Jobs According to Size Etc.”, on page 93
• If you have low-priority jobs that should run only when other jobs don’t need the resources:
  • Set up an anti-express queue; see section 4.8.1, “Anti-Express Queues”, on page 125
• If you want to run jobs in submission order:
  • FIFO; see section 4.8.19, “FIFO Scheduling”, on page 192
• If you need to have high-priority jobs run soon, and nothing distinguishes the high-priority jobs from the rest:
  • Create advance reservations for the high-priority jobs, and have users submit those jobs to the reservations; see section 4.8.37, “Advance and Standing Reservations”, on page 264
• If you have lots of users in a complex:
  • Use resource limits; see section 5.15.1, “Managing Resource Usage By Users, Groups, and Projects, at Server & Queues”, on page 389, or
  • Use fairshare; see section 4.8.18, “Using Fairshare”, on page 179
• If you have jobs that must run at the end of the day:
  • Use dependencies for end-of-day accounting; see section 4.8.11, “Dependencies”, on page 162
• If you want to be sure a job will run:
  • Create an advance reservation; see section 4.8.37, “Advance and Standing Reservations”, on page 264
• If you need to apportion a single vnode or cluster according to ownership:
  • See section 4.3.4, “Allocating Resources by User, Project or Group”, on page 91
• If you have two (or more) sets of vnodes, and jobs should run on one set or the other, but not both, and jobs should not have to request where they run (for example, one set of vnodes is new, and one is old):
  • Use peer scheduling. Set up two complexes, give the pulling queues low priority, and use queue priority in the job sorting formula. See section 4.8.31, “Peer Scheduling”, on page 218, section 2.2.5.3, “Prioritizing Execution Queues”, on page 23, and section 4.8.20, “Using a Formula for Computing Job Execution Priority”, on page 194. You can use a routing queue to initially send jobs to the correct complex. See section 2.2.6, “Routing Queues”, on page 24
• If you have some jobs that should prefer to run on one set of vnodes, and other jobs that should prefer to run on another set of vnodes, but if the preferred vnodes are busy, a job can run on the non-preferred vnodes:
  • Use a routing queue and two execution queues. Associate each execution queue with one set of vnodes. Put the execution queue for the preferred set of vnodes first in the routing list, but put a limit on the number of queued jobs in the execution queues, so that both queues will fill up. Otherwise the routing queue will preferentially fill the first in its routing list. See section 2.2.6, “Routing Queues”, on page 24, and section 4.8.2, “Associating Vnodes with Queues”, on page 126
• If you have more than one complex, and you want to balance the workload across the complexes:
  • Use peer scheduling; see section 4.8.31, “Peer Scheduling”, on page 218
• If you need to ensure that jobs run in certain hours on desktops:
  • Use cycle harvesting; see section 4.8.9, “Using Idle Workstation Cycle Harvesting”, on page 143, or
  • Use primetime & non-primetime for nighttime; see section 4.8.34, “Using Primetime and Holidays”, on page 256
• If you have more than one high-priority queue, and at least one low-priority queue, and you want all jobs in high-priority queues to be considered as one group, and run in submission order:
  • Use the job sorting formula to sort jobs on queue priority:
set server job_sort_formula = queue_priority
  • Give all queues whose jobs should be considered together the same priority
  • Set the by_queue scheduler attribute to False
• If you want to place jobs on the vnodes with the fewest CPUs first, saving bigger vnodes for larger jobs:
  • Sort vnodes so that those with fewer CPUs come first:
node_sort_key: “ncpus LOW”
4.4 The Scheduler
The scheduler, pbs_sched, implements scheduling policy. The scheduler communicates
with the MoMs to query the state of host-level resources and with the server to learn about the
availability of jobs to execute and the state of server-level resources.
4.4.1 Configuring the Scheduler
4.4.1.1 Where the Scheduler Gets Its Information
The behavior of the scheduler is controlled by command-line options, attributes, and files of parameters and settings, from the following sources:
PBS_HOME/sched_priv/resource_group
Contains the description of the fairshare tree. Created by you. Can be edited. Read
on startup and HUP of scheduler.
PBS_HOME/sched_priv/usage
Contains the usage database. Do not edit. Instead, use the pbsfs command while
the scheduler is stopped; see “pbsfs” on page 106 of the PBS Professional Reference
Guide. Written every cycle and HUP. Read on startup. Cannot be altered while
scheduler is running.
PBS_HOME/sched_priv/sched_config
Contains scheduler configuration options, also called scheduler parameters, e.g. backfill, job_sort_key. Read on startup and HUP. Can be edited. Each entry must be a single, unbroken line. Entries must be double-quoted if they contain whitespace.
See “Scheduler Parameters” on page 297 of the PBS Professional Reference Guide.
PBS_HOME/sched_priv/dedicated_time
Contains definitions of dedicated time. Can be edited. Read on startup and HUP.
PBS_HOME/sched_priv/holidays
Contains definitions of holidays. Can be edited. Read on startup and HUP.
Options to pbs_sched command
Control some scheduler behavior. Set on invocation.
Scheduler attributes
Control some scheduler behavior. Can be set using qmgr. Read every scheduling
cycle. See “Scheduler Attributes” on page 358 of the PBS Professional Reference
Guide.
Server attributes
Several server attributes control scheduler behavior. Can be set using qmgr. The
following table lists the server attributes that affect scheduling, along with a brief
description. Read every scheduling cycle.
Some limit attributes are marked as “old”. These are incompatible with, and are
replaced by, the new limit attributes described in section 5.15.1, “Managing
Resource Usage By Users, Groups, and Projects, at Server & Queues”, on page 389.
For a complete description of each attribute, see “Server Attributes” on page 332 of
the PBS Professional Reference Guide.
Table 4-1: Server Attributes Involved in Scheduling

backfill_depth
    Modifies backfilling behavior. Sets the number of jobs that are to be backfilled around.
default_queue
    Specifies the queue for jobs that don’t request a queue.
eligible_time_enable
    Controls starving behavior.
est_start_time_freq
    Interval at which PBS calculates estimated start times and vnodes for all jobs.
job_sort_formula
    Formula for computing job priorities.
max_group_res
    Old. The maximum amount of the specified resource that any single group may consume in this PBS complex.
max_group_res_soft
    Old. The soft limit for the specified resource that any single group may consume in this complex.
max_group_run
    Old. The maximum number of jobs owned by the users in one group allowed to be running within this complex at one time.
max_group_run_soft
    Old. The soft limit on the number of jobs owned by the users in one group allowed to be running in this complex at one time.
max_queued
    The maximum number of jobs allowed to be queued or running in the complex. Can be specified for users, groups, or all.
max_queued_res.<resource>
    The maximum amount of the specified resource allowed to be allocated to jobs queued or running in the complex. Can be specified for users, groups, or all.
max_run
    The maximum number of jobs allowed to be running in the complex. Can be specified for users, groups, or all.
max_run_res.<resource>
    The maximum amount of the specified resource allowed to be allocated to jobs running in the complex. Can be specified for users, groups, or all.
max_run_res_soft.<resource>
    Soft limit on the amount of the specified resource allowed to be allocated to jobs running in the complex. Can be specified for users, groups, or all.
max_run_soft
    Soft limit on the number of jobs allowed to be running in the complex. Can be specified for users, groups, or all.
max_running
    Old. The maximum number of jobs allowed to be selected for execution at any given time, from all possible jobs.
max_user_res
    Old. The maximum amount within this complex that any single user may consume of the specified resource.
max_user_res_soft
    Old. The soft limit on the amount of the specified resource that any single user may consume within a complex.
max_user_run
    Old. The maximum number of jobs owned by a single user allowed to be running within the complex at one time.
max_user_run_soft
    Old. The soft limit on the number of jobs owned by a single user that are allowed to be running within this complex at one time.
node_fail_requeue
    Controls whether running jobs are automatically requeued or are deleted when the primary execution vnode fails. Number of seconds to wait after losing contact with Mother Superior before requeueing or deleting jobs.
node_group_enable
    Specifies whether node grouping is enabled.
node_group_key
    Specifies the resource to use for node grouping.
resources_available
    The list of available resources and their values defined on the server.
resources_max
    The maximum amount of each resource that can be requested by any single job in this complex, if there is not a resources_max value defined for the queue at which the job is targeted.
scheduler_iteration
    The time between scheduling iterations.
scheduling
    Enables scheduling of jobs.
resources_assigned
    The total of each type of resource allocated to jobs running in this complex, plus the total of each type of resource allocated to any started reservations.
Vnode attributes
Several vnode attributes control scheduler behavior. Can be set using qmgr. The
following table lists the vnode attributes that affect scheduling, along with a brief
description. Read every scheduling cycle. For a complete description of each
attribute, see “Vnode Attributes” on page 384 of the PBS Professional Reference
Guide.
Table 4-2: Vnode Attributes Involved in Scheduling

current_aoe
    This attribute identifies the AOE currently instantiated on this vnode.
max_group_run
    The maximum number of jobs owned by any users in a single group allowed to run on this vnode at one time.
max_running
    The maximum number of jobs allowed to be run on this vnode at any given time.
max_user_run
    The maximum number of jobs owned by a single user allowed to run on this vnode at one time.
no_multinode_jobs
    Controls whether jobs which request more than one chunk are allowed to execute on this vnode.
priority
    The priority of this vnode compared with other vnodes.
provision_enable
    Controls whether this vnode can be provisioned.
queue
    The queue with which this vnode is associated.
resources_available
    The list of resources and the amounts available on this vnode.
sharing
    Specifies whether more than one job at a time can use the resources of the vnode or the vnode’s host.
state
    Shows or sets the state of the vnode.
pcpus
    The number of physical CPUs on the vnode.
resources_assigned
    The total amount of each resource allocated to jobs and started reservations running on this vnode.
Queue attributes
Several queue attributes control scheduler behavior. Can be set using qmgr. The
following table lists the queue attributes that affect scheduling, along with a brief
description. Read every scheduling cycle. For a complete description of each
attribute, see “Queue Attributes” on page 371 of the PBS Professional Reference
Guide.
Table 4-3: Queue Attributes Involved in Scheduling

enabled
    Specifies whether this queue accepts new jobs.
from_route_only
    Specifies whether this queue accepts jobs only from routing queues.
max_array_size
    The maximum number of subjobs that are allowed in an array job.
max_group_res
    Old. The maximum amount of the specified resource that any single group may consume in this queue.
max_group_res_soft
    Old. The soft limit for the specified resource that any single group may consume in this queue.
max_group_run
    Old. The maximum number of jobs owned by the users in one group allowed to be running within this queue at one time.
max_group_run_soft
    Old. The soft limit on the number of jobs owned by the users in one group allowed to be running in this queue at one time.
max_queuable
    Old. The maximum number of jobs allowed to reside in the queue at any given time.
max_queued
    The maximum number of jobs allowed to be queued in or running from the queue. Can be specified for users, groups, or all.
max_queued_res.<resource>
    The maximum amount of the specified resource allowed to be allocated to jobs queued in or running from the queue. Can be specified for users, groups, or all.
max_run
    The maximum number of jobs allowed to be running from the queue. Can be specified for users, groups, or all.
max_run_res.<resource>
    The maximum amount of the specified resource allowed to be allocated to jobs running from the queue. Can be specified for users, groups, or all.
max_run_res_soft.<resource>
    Soft limit on the amount of the specified resource allowed to be allocated to jobs running from the queue. Can be specified for users, groups, or all.
max_run_soft
    Soft limit on the number of jobs allowed to be running from the queue. Can be specified for users, groups, or all.
max_running
    Old. The maximum number of jobs allowed to be selected for execution at any given time, from all possible jobs.
max_user_res
    Old. The maximum amount of the specified resource that the jobs of any single user may consume.
max_user_res_soft
    Old. The soft limit on the amount of the specified resource that any single user may consume in this queue.
max_user_run
    Old. The maximum number of jobs owned by a single user allowed to be running from the queue at one time.
max_user_run_soft
    Old. The soft limit on the number of jobs owned by a single user that are allowed to be running from this queue at one time.
node_group_key
    Specifies the resource to use for node grouping.
Priority
    The priority of this queue compared to other queues of the same type in this PBS complex.
resources_assigned
    The total of each type of resource allocated to jobs running in this queue.
resources_available
    The list of available resources and their values defined on the queue.
resources_max
    The maximum amount of each resource that can be requested by any single job in this queue.
resources_min
    The minimum amount of each resource that can be requested by a single job in this queue.
route_destinations
    The list of destinations to which jobs may be routed.
route_held_jobs
    Specifies whether jobs in the held state can be routed from this queue.
route_lifetime
    The maximum time a job is allowed to reside in a routing queue. If a job cannot be routed in this amount of time, the job is aborted.
route_retry_time
    Time delay between routing retries. Typically used when the network between servers is down.
route_waiting_jobs
    Specifies whether jobs whose execution_time attribute value is in the future can be routed from this queue.
started
    Specifies whether jobs in this queue can be scheduled for execution.
state_count
    The number of jobs in each state currently residing in this queue.
List of jobs and server-level resources queried from the server
    Read every scheduling cycle.

Resources in the Resource_List job attribute
    Read every scheduling cycle.

List of host-level resources queried from MoMs
    Read every scheduling cycle.
4.4.1.2 Editing Configuration Files Under Windows
When you edit any PBS configuration file, make sure that you put a newline at the end of the
file. The Notepad application does not automatically add a newline at the end of a file; you
must explicitly add the newline.
4.4.1.3 Reference Copies of Files

PBS is installed with a reference copy of the current year's holidays file, in PBS_EXEC/etc/pbs_holidays.
4.4.2 Making the Scheduler Read its Configuration
If you change the scheduler’s configuration file, the scheduler must re-read it for the changes
to take effect. To get the scheduler to re-read its configuration information, without stopping
the scheduler, you can HUP the scheduler:
kill -HUP <scheduler PID>
If you set a scheduler attribute using qmgr, the change takes effect immediately and you do
not need to HUP the scheduler.
4.4.3 Scheduling on Resources
The scheduler honors all resources listed in the resources: line in PBS_HOME/sched_priv/sched_config. If this line is not present, the scheduler honors all resources, built-in and custom. It is more efficient to list just the resources that you want the scheduler to schedule on.
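For example, a typical resources line listing only the resources the scheduler should consider (the names shown are the usual built-in ones):

resources: "ncpus, mem, arch, host, vnode"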
4.4.4 Starting, Stopping, and Restarting the Scheduler

4.4.4.1 When and How to Start the Scheduler
During normal operation, startup of the scheduler is handled automatically. The PBS daemons are started automatically at bootup by the PBS start/stop script. During failover, the
secondary server automatically tries to use the primary scheduler, and if it cannot, it starts its
own scheduler.
To start the scheduler by hand:
PBS_EXEC/sbin/pbs_sched [options]
See “pbs_sched” on page 91 of the PBS Professional Reference Guide.
4.4.4.2 When and How to Stop the Scheduler
You must stop the scheduler for the following operations:

•  Using the pbsfs command; see "pbsfs" on page 106 of the PBS Professional Reference Guide.

•  Upgrading PBS Professional; see "Upgrading" on page 137 in the PBS Professional Installation & Upgrade Guide.
The scheduler traps signals during the scheduling cycle. You can kill the scheduler at the end
of the cycle, or if necessary, immediately. The scheduler does not write the fairshare usage
file when it is killed with -9, but it does write the file when it is killed without -9.
You must be root on the scheduler’s host.
To stop the scheduler at the end of a cycle:
kill <scheduler PID>
To stop the scheduler immediately:
kill -9 <scheduler PID>
4.4.4.3 When and How to Restart the Scheduler
Under most circumstances, when you restart the scheduler, you do not need to specify any
options to the pbs_sched command. See “pbs_sched” on page 91 of the PBS Professional
Reference Guide. Start the scheduler this way:
PBS_EXEC/sbin/pbs_sched [options]
4.4.5 The Scheduling Cycle
The scheduler runs in a loop: in each pass, it starts up, performs all of its work, and then stops. A scheduling cycle is triggered by a timer and by several possible events.
When there are no events to trigger the scheduling cycle, it is started by a timer. The time
between starts is set in the server’s scheduler_iteration server attribute. The default value is
10 minutes.
The maximum duration of the cycle is set in the scheduler’s sched_cycle_length attribute.
The scheduler will terminate its cycle if the duration of the cycle exceeds the value of the
attribute. The default value for the length of the scheduling cycle is 20 minutes. The scheduler does not include the time it takes to query dynamic resources in its cycle measurement.
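For example, a sketch tightening both settings via qmgr (the values shown are illustrative):

Qmgr: set server scheduler_iteration = 300
Qmgr: set sched sched_cycle_length = 600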
4.4.5.1 Triggers for Scheduling Cycle
The scheduler starts a cycle when any of the following happens:

•  The specified amount of time has passed since the previous start
•  A job is submitted
•  A job finishes execution
•  A new reservation is created
•  A reservation starts
•  Scheduling is enabled
•  The server comes up
•  A job is run via qrun
•  A queue is started
•  A job is moved to a local queue
•  Eligible wait time for jobs is enabled
•  A reservation is re-confirmed after being degraded
•  A hook restarts the scheduling cycle
4.4.5.1.i Logging Scheduling Triggers
The server triggers scheduler cycles. The reason for triggering a scheduling cycle is logged
by the server. See section 12.4.5.2, “Scheduler Commands”, on page 1019.
4.4.5.2 Actions During Scheduling Cycle
The following is a list of the scheduler's actions during a scheduling cycle. The list is not in any special order.

•  The scheduler gets the state of the world:
   •  The scheduler queries the server for the following:
      •  Status of jobs in queues
      •  All global server, queue, and host-level resources
      •  Server, queue, vnode, and scheduler attribute settings
      •  Reservations
   •  The scheduler runs dynamic server resource queries for resources listed in the "server_dyn_res" line in sched_config
   •  The scheduler runs dynamic host-level resource queries for resources listed in the "mom_resources" line in sched_config

•  The scheduler logs a message at the beginning of each scheduling cycle saying whether it is primetime or not, and when this period of primetime or non-primetime will end. The message is at log event class 0x0100. The message is of this form:
   "It is primetime and it will end in NN seconds at MM/DD/YYYY HH:MM:SS"
   or
   "It is non-primetime and it will end in NN seconds at MM/DD/YYYY HH:MM:SS"

•  Given scheduling policy, available jobs and resources, and scheduling cycle length, the scheduler examines as many jobs as it can, and runs as many jobs as it can.
4.4.6 How Available Consumable Resources are Counted
When the scheduler checks for available consumable resources, it uses the following calculation:

resources_available.<resource> - total resources assigned for this resource

Total resources assigned is the total amount of resources_assigned.<resource> for all other running jobs and, at the server and vnodes, for started reservations.

For example, if the scheduler is calculating available memory, and two other jobs are running, each with 2GB of memory assigned, and resources_available.mem is 8GB, the scheduler figures that it has 4GB to work with.
4.4.7 Improving Scheduler Performance
4.4.7.1 Improving Throughput of Jobs
You can tell the scheduler to run asynchronously, so it doesn't wait for each job to be accepted by MoM, which means it also doesn't wait for an execjob_begin hook to finish. For short jobs, this can give you better scheduling performance. To run the scheduler asynchronously, set the scheduler's throughput_mode attribute to True (this attribute is True by default).

When throughput_mode is True, jobs that have been changed can run in the same scheduling cycle in which they were changed, for the following changes:

•  Jobs that are qaltered (for example, in cron jobs)
•  Jobs that are changed via server_dyn_res scripts
•  Jobs that are peered to a new queue
throughput_mode
Scheduler attribute. When set to True, the scheduler runs asynchronously and can
start jobs faster. Only available when complex is in TPP mode.
Format: Boolean
Default: True
Example:
qmgr -c “set sched throughput_mode=<Boolean value>”
You can run the scheduler asynchronously only when the complex is using TPP mode. For
details about TPP mode, see “Communication” on page 87 in the PBS Professional Installation & Upgrade Guide. Trying to set the value to a non-Boolean value generates the following error message:
qmgr obj= svr=default: Illegal attribute or resource value
qmgr: Error (15014) returned from server
4.4.7.2 Limiting Number of Jobs Queued in Execution Queues
If you limit the number of jobs queued in execution queues, you can speed up the scheduling
cycle. You can set an individual limit on the number of jobs in each queue, or a limit at the
server, and you can apply these limits to generic and individual users, groups, and projects,
and to overall usage. You specify this limit by setting the queued_jobs_threshold queue or
server attribute. See section 5.15.1.9, “How to Set Limits at Server and Queues”, on page
401.
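For example, a sketch limiting each generic user to 200 queued jobs at a hypothetical queue named workq:

Qmgr: set queue workq queued_jobs_threshold = "[u:PBS_GENERIC=200]"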
If you set a limit on the number of jobs that can be queued in execution queues, we recommend that you have users submit jobs to a routing queue only, and route jobs to the execution
queue as space becomes available. See section 4.8.39, “Routing Jobs”, on page 272.
4.5 Using Queues in Scheduling
A queue is a PBS mechanism for holding jobs. PBS has queue-based tools for handling jobs;
for example, you can set queue-based limits on resource usage by jobs. PBS uses queues for
a variety of purposes. Before reading this section, please familiarize yourself with the
mechanics of creating and configuring queues, by reading section 2.2, “Queues”, on page 18.
Queues are used in the following ways:

•  Holding submitted jobs

•  Prioritizing jobs and ordering job selection:
   •  PBS provides tools for selecting jobs according to the queue they are in; see section 4.2.5.3, "Using Queue-based Tools to Prioritize Jobs", on page 68
   •  Queue priority can be used in calculating job priority; see section 4.8.36, "Queue Priority", on page 262

•  Providing tools for managing time slots:
   •  Reservations: you can reserve specific resources for defined time slots. Queues are used for advance and standing reservations; see section 4.8.37, "Advance and Standing Reservations", on page 264, and "Reserving Resources Ahead of Time", on page 191 of the PBS Professional User's Guide
   •  Dedicated time; see section 4.8.10, "Dedicated Time", on page 161
   •  Primetime and holidays; see section 4.8.34, "Using Primetime and Holidays", on page 256

•  Routing jobs: many ways to route jobs are listed in section 4.8.39, "Routing Jobs", on page 272

•  Providing tools for managing resources:
   •  Managing resource usage by users; see section 5.15.1, "Managing Resource Usage By Users, Groups, and Projects, at Server & Queues", on page 389
   •  Managing resource usage by jobs; see section 5.15.3, "Placing Resource Limits on Jobs", on page 414
   •  Setting resource and job limits used for preemption: you can specify how much of a resource or how many jobs a user or group can use before their jobs are eligible to be preempted. See section 5.15.1.4, "Hard and Soft Limits", on page 393 and section 4.8.33, "Using Preemption", on page 241.
   •  Assigning default resources to jobs; see section 5.9.4, "Allocating Default Resources to Jobs", on page 327
4.6 Scheduling Restrictions and Caveats

4.6.1 One Policy Per PBS Complex
The scheduler runs a single scheduling policy, and applies it to the entire PBS complex. You
cannot have two different scheduling policies on two different queues or partitions.
4.6.2 Jobs that Cannot Run on Current Resources
The scheduler checks to see whether each job could possibly run now, counting resources as if
there were no other jobs, and all current resources could be used by this job. The scheduler
counts resources only from those vnodes that are on line. If a vnode is marked offline, its
resources are not counted.
The scheduler determines whether a job cannot run on current resources only when backfilling is used. If backfilling is turned off, then the scheduler won't determine whether or not a
job has requested more than can be supplied by current resources. It decides only that it can't
run now. If the job cannot run now because vnodes are unavailable, there is no log message.
If the job requests more than is available in the complex, there is a log message. In both
cases, the job stays queued.
4.6.3 Resources Not Controlled by PBS
When the scheduler runs each cycle, it gets the state of its world, including dynamic resources
outside of the control of PBS. If non-PBS processes are running on the vnodes PBS uses, it is
possible that another process will use enough of a dynamic resource such as scratch space to
prevent a PBS job that requested that resource from running.
4.6.4 No Pinning of Processes to Cores
PBS does not pin processes to cores. This can be accomplished in the job launch script using,
for example, taskset or dplace.
4.7 Errors and Logging
4.7.1 Logfile for the Scheduler
You can set the scheduler’s logging to record different kinds of events. See section
12.4.4.1.iii, “Specifying Scheduler Log Events”, on page 1018.
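For example, a minimal sketch of a line in PBS_HOME/sched_priv/sched_config, assuming your version supports the log_filter parameter (a bitmask of event classes to omit from the log; the value shown is illustrative):

log_filter: 256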
The server triggers scheduler cycles. The reason for triggering a scheduling cycle is logged
by the server. See section 12.4.5.2, “Scheduler Commands”, on page 1019.
4.8 Scheduling Tools
In this section (all of section 4.8, “Scheduling Tools”, on page 121, and its subsections), we
describe each scheduling tool, including how to configure it.
The following table lists PBS scheduling tools, with links to descriptions:
Table 4-4: List of Scheduling Tools

Where a tool is incompatible with other tools, the incompatible tools are noted.

Anti-express queue
    Incompatible tools: soft queue limits
    See section 4.8.1, "Anti-Express Queues", on page 125

Associating vnodes with queues
    See section 4.8.2, "Associating Vnodes with Queues", on page 126

Backfilling
    Incompatible tools: fairshare, or preemption with backfilling plus strict ordering
    See section 4.8.3, "Using Backfilling", on page 129

Examining jobs queue-by-queue
    Incompatible tools: round robin, queues as fairshare entities
    See section 4.8.4, "Examining Jobs Queue by Queue", on page 136

Checkpointing
    See section 4.8.5, "Checkpointing", on page 137

Organizing job chunks
    See section 4.8.6, "Organizing Job Chunks", on page 138

cron jobs, Windows Task Scheduler
    See section 4.8.7, "cron Jobs, or the Windows Task Scheduler", on page 139

Custom resources
    See section 4.8.8, "Using Custom and Default Resources", on page 140

Cycle harvesting
    Incompatible tools: reservations
    See section 4.8.9, "Using Idle Workstation Cycle Harvesting", on page 143

Dedicated time
    See section 4.8.10, "Dedicated Time", on page 161

Default resources
    See section 4.8.8, "Using Custom and Default Resources", on page 140

Dependencies
    See section 4.8.11, "Dependencies", on page 162

Dynamic resources (server & host)
    See section 4.8.12, "Dynamic Resources", on page 163

Eligible wait time for jobs
    See section 4.8.13, "Eligible Wait Time for Jobs", on page 163

Entity shares (was strict priority)
    Incompatible tools: formula, fairshare, FIFO
    See section 4.8.14, "Sorting Jobs by Entity Shares (Was Strict Priority)", on page 168

Estimating job start time
    See section 4.8.15, "Estimating Job Start Time", on page 169

Calculating job execution priority
    See section 4.8.16, "Calculating Job Execution Priority", on page 174

Express queues
    See section 4.8.17, "Express Queues", on page 179

Fairshare
    Incompatible tools: formula, starving, strict ordering, using the fair_share_perc option to job_sort_key
    See section 4.8.18, "Using Fairshare", on page 179

FIFO
    See section 4.8.19, "FIFO Scheduling", on page 192

Formula
    Incompatible tools: job_sort_key, fairshare
    See section 4.8.20, "Using a Formula for Computing Job Execution Priority", on page 194

Gating jobs at server or queue
    See section 4.8.21, "Gating Jobs at Server or Queue", on page 203

Managing application licenses
    See section 4.8.22, "Managing Application Licenses", on page 204

Limits on per-job resource usage
    See section 4.8.23, "Limits on Per-job Resource Usage", on page 204

Limits on project, user, and group jobs
    See section 4.8.24, "Limits on Project, User, and Group Jobs", on page 205

Limits on project, user, and group resource usage
    See section 4.8.25, "Limits on Project, User, and Group Resource Usage", on page 205

Limits on jobs at vnodes
    See section 4.8.26, "Limits on Jobs at Vnodes", on page 205

Load balancing
    Incompatible tools: node_sort_key using unused or assigned options
    See section 4.8.27, "Using Load Balancing", on page 205

Matching jobs to resources
    See section 4.8.28, "Matching Jobs to Resources", on page 210

Node grouping
    See section 4.8.29, "Node Grouping", on page 213

Overrides
    See section 4.8.30, "Overrides", on page 214

Peer scheduling
    See section 4.8.31, "Peer Scheduling", on page 218

Placement sets
    See section 4.8.32, "Placement Sets", on page 224

Preemption
    See section 4.8.33, "Using Preemption", on page 241

Preemption targets
    See section 4.8.33.3.i, "How Preemption Targets Work", on page 244

Primetime and holidays
    See section 4.8.34, "Using Primetime and Holidays", on page 256

Provisioning
    See section 4.8.35, "Provisioning", on page 262

Queue priority
    See section 4.8.36, "Queue Priority", on page 262

Advance and standing reservations
    Incompatible tools: cycle harvesting
    See section 4.8.37, "Advance and Standing Reservations", on page 264

Round robin queue examination
    Incompatible tools: by_queue
    See section 4.8.38, "Round Robin Queue Selection", on page 270

Routing jobs
    See section 4.8.39, "Routing Jobs", on page 272

Shared or exclusive vnodes and hosts
    See section 4.8.40, "Shared vs. Exclusive Use of Resources by Jobs", on page 277

Shrinking jobs to fit
    See section 4.8.41, "Using Shrink-to-fit Jobs", on page 279

SMP cluster distribution
    Incompatible tools: avoid_provision
    See section 4.8.42, "SMP Cluster Distribution", on page 290

Sorting jobs using job_sort_key
    See section 4.8.43, "Sorting Jobs on a Key", on page 292

Sorting jobs on job's requested priority
    See section 4.8.44, "Sorting Jobs by Requested Priority", on page 295

Sorting queues (deprecated in 13.0)
    See section 4.8.45, "Sorting Queues into Priority Order", on page 295

Starving jobs
    Incompatible tools: fairshare
    See section 4.8.46, "Starving Jobs", on page 296

Strict ordering
    Incompatible tools: backfilling combined with fairshare
    See section 4.8.47, "Using Strict Ordering", on page 299

Sorting vnodes on a key
    Incompatible tools: smp_cluster_dist set to other than pack, or load balancing, with unused or assigned options to node_sort_key
    See section 4.8.48, "Sorting Vnodes on a Key", on page 300
4.8.1 Anti-Express Queues
An anti-express queue is a preemptable low-priority queue, designed for jobs that should run
only when no other jobs need the resources. These jobs are preempted if any other job needs
the resources. An anti-express queue has the lowest priority of all queues in the complex.
Jobs in this queue have a soft limit of zero, so that any job running from this queue is over its
queue soft limit.
See section 4.8.33, “Using Preemption”, on page 241.
4.8.1.1 Configuring Anti-express Queues via Priority
To configure an anti-express queue by using queue priority, do the following:

•  Create an execution queue called lowprio:
   Qmgr: create queue lowprio
   Qmgr: set queue lowprio queue_type=e
   Qmgr: set queue lowprio started=true
   Qmgr: set queue lowprio enabled=true

•  By default, all new queues have a priority of zero. Make sure all queues have a value set for priority, and that lowprio has the lowest priority:
   Qmgr: set queue workq priority=10

•  Set the soft limit on the number of jobs that can run from that queue to zero for all users:
   Qmgr: set queue lowprio max_run_soft = "[u:PBS_GENERIC=0]"

•  Make sure that jobs over their queue soft limits have lower preemption priority than normal jobs. Edit PBS_HOME/sched_priv/sched_config, and do the following:
   •  Put "normal_jobs" before "queue_softlimits". For example:
      preempt_prio: "express_queue, normal_jobs, queue_softlimits"
   •  Use preemption:
      preemptive_sched: True ALL
4.8.1.2 Configuring Anti-express Queues via Preemption Targets
To use preemption targets, include this queue in Resource_List.preempt_targets for all
jobs.
4.8.1.3 Anti-express Queue Caveats
If you use soft limits on the number of jobs that users can run at other queues, jobs that are
over their soft limits at other queues will also have the lowest preemption priority.
4.8.2 Associating Vnodes with Queues
You can associate each vnode with one or more queues. When a vnode is associated with a queue, it accepts jobs from that queue only. There are two arrangements:

•  One or more vnodes associated with one queue
•  One or more vnodes associated with multiple queues

These two arrangements require different methods of configuration.

You do not need to associate vnodes with queues in order to have jobs run on the vnodes that have the right application, as long as the application is a resource that can be requested by jobs.
4.8.2.1 Associating Vnodes With One Queue
You can associate one or more vnodes with a queue, using the vnode’s queue attribute. Using
this method, each vnode can be associated with at most one queue. Each queue can be associated with more than one vnode. If you associate a queue and one or more vnodes using this
method, any jobs in the queue can run only on the associated vnodes, and the only jobs that
can run on the vnodes are the ones in the queue.
To associate a vnode with a queue, set the vnode’s queue attribute to the name of the queue
you want. For example, to associate the vnode named Vnode1 with the queue named Queue1:
Qmgr: set node Vnode1 queue=Queue1
4.8.2.2 Associating Vnodes With Multiple Queues
You can use custom host-level resources to associate one or more vnodes with more than one
queue. The scheduler will use the resources for scheduling just as it does with any resource.
In order to map a vnode to more than one queue, you must define a new host-level string array
custom resource. This string array holds a string that has the same value for the queue and
vnode you wish to associate. The mechanism of association is that a job that lands in the
queue inherits that value for the resource, and then the job can run only on vnodes having a
matching value for the resource. You can associate more than one queue with a vnode by setting the resource to the same value at each queue.
In some cases, you can use the same resource to route jobs and to associate vnodes with
queues. For the method described here, you use host-level resources to associate vnodes with
queues. The rules for which resources can be used for routing are given in section 2.2.6.4.iii,
“Resources Used for Routing and Admittance”, on page 27. How jobs inherit resources is
described in section 5.9.4, “Allocating Default Resources to Jobs”, on page 327.
4.8.2.2.i Procedure to Associate Vnodes with Multiple Queues
To associate one or more vnodes with one or more queues, do the following:

1. Define the new host-level resource:
   qmgr -c 'create resource <new resource> type=string_array, flag=h'

2. Instruct the scheduler to honor the resource. Add the new resource to $PBS_HOME/sched_priv/sched_config:
   resources: "ncpus, mem, arch, host, vnode, <new resource>"

3. HUP the scheduler:
   kill -HUP <scheduler PID>

4. Set each queue's default_chunk for the new resource to the value you are using to associate it with vnodes:
   Qmgr: set queue <queue name> default_chunk.<new resource> = <value>

   For example, if one queue is "MathQ" and one queue is "SpareQ", and the new resource is "Qlist", and you want to associate a set of vnodes and queues based on ownership by the math department, you can make the queue resource value be "math":
   Qmgr: set queue MathQ default_chunk.Qlist = math
   Qmgr: set queue SpareQ default_chunk.Qlist = math

5. Set the value for the new resource at each vnode:
   Qmgr: set node <vnode name> resources_available.<new resource> = <associating value>

   For example, to have the vnode named "Vnode1" associated with the queues owned by the math department:
   Qmgr: set node Vnode1 resources_available.Qlist = math
4.8.2.2.ii Example of Associating Multiple Vnodes with Multiple Queues
Now, as an example, assume you have 2 queues: “PhysicsQ” and “ChemQ”, and you have 3
vnodes: vn[1], vn[2], and vn[3]. You want Physics jobs to run on vn[1] and vn[2], and you
want Chem jobs to run on vn[2] and vn[3]. Each department gets exclusive use of one vnode,
but both must share a vnode.
To achieve the following mapping:
PhysicsQ -->vn[1], vn[2]
ChemQ --> vn[2], vn[3]
Which is the same as:
vn[1] <-- PhysicsQ
vn[2] <-- PhysicsQ, ChemQ
vn[3] <-- ChemQ
1. Define the new host-level resource:
   Qmgr: create resource Qlist type=string_array, flag=h

2. Instruct the scheduler to honor the resource. Add the new resource to $PBS_HOME/sched_priv/sched_config:
   resources: "ncpus, mem, arch, host, vnode, Qlist"

3. HUP the scheduler:
   kill -HUP <scheduler PID>

4. Add queue-to-vnode mappings:
   Qmgr: s n vn[1] resources_available.Qlist="PhysicsQ"
   Qmgr: s n vn[2] resources_available.Qlist="PhysicsQ,ChemQ"
   Qmgr: s n vn[3] resources_available.Qlist="ChemQ"

5. Force jobs to request the correct Qlist values:
   Qmgr: s q PhysicsQ default_chunk.Qlist=PhysicsQ
   Qmgr: s q ChemQ default_chunk.Qlist=ChemQ
4.8.3 Using Backfilling
Backfilling means fitting smaller jobs around the higher-priority jobs that the scheduler is going to run next, in such a way that the higher-priority jobs are not delayed. When the scheduler is using backfilling, it considers the highest-priority jobs to be top jobs. Backfilling changes the algorithm that the scheduler uses to run jobs:

•  When backfilling is off, the scheduler looks at each job in priority order, tries to run the job now, and if it cannot, it moves on to the next-highest-priority job.

•  When backfilling is on, the scheduler tries to run the top job now, and if it cannot, it makes sure that no other job that it runs in this cycle will delay the top job. It also fits smaller jobs in around the top job.

Backfilling allows you to keep resources from becoming idle when the top job cannot run.

Backfilling is a primetime option, meaning that you can configure it separately for primetime and non-primetime, or you can specify it for all of the time.
4.8.3.1 Glossary
Top job
    A top job has the highest execution priority according to scheduling policy, and the scheduler plans resources and a start time for this job first. Top jobs exist only when the scheduler is using backfilling.

Filler job
    A smaller job that fits around top jobs. Running a filler job does not change the start time or resources for a top job. A filler job runs next only when backfilling is being used (meaning that a top job cannot start next because insufficient resources are available for the top job, but whatever is available is enough for the filler job).
4.8.3.2 How Backfilling Works
The scheduler makes a list of jobs to run in order of priority. This list is composed according to the execution priority described in section 4.8.16, "Calculating Job Execution Priority", on page 174. The highest-priority jobs in this list are the top jobs.
If you use backfilling, the scheduler looks for smaller jobs that can fit into the usage gaps
around the top jobs. The scheduler looks in the prioritized list of jobs and chooses the highest-priority smaller jobs that fit. Filler jobs are run only if they will not delay the start time of
top jobs.
The scheduler creates a fresh list of top jobs at every scheduling cycle, so if a new higher-priority job has been submitted, it will be considered.
You can use shrink-to-fit jobs to backfill into otherwise unusable time slots. PBS checks whether a shrink-to-fit job could shrink into the available slot, and if it can, runs it. See section 4.8.41, "Using Shrink-to-fit Jobs", on page 279.

Backfilling is useful in the following circumstances:

•  When the strict_ordering scheduler parameter is turned on and backfilling is turned off, no job runs if the top job cannot run. See section 4.8.47, "Using Strict Ordering", on page 299

•  When the help_starving_jobs scheduler parameter is turned on, filler jobs are fitted around starving jobs. See section 4.8.46, "Starving Jobs", on page 296
4.8.3.3 Backfilling Around N Jobs
You can configure the number of top jobs that PBS backfills around by setting the value of the backfill_depth server attribute. For example, if you set backfill_depth to 3, PBS backfills around the top 3 jobs. See "Server Attributes" on page 332 of the PBS Professional Reference Guide. Setting the backfill_depth attribute is effective only when the backfill scheduler parameter is set to True.
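For example, to backfill around the top 3 jobs:

Qmgr: set server backfill_depth = 3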
4.8.3.4 Backfilling Around Preempted Jobs
When you set both the sched_preempt_enforce_resumption scheduler attribute and the
backfill parameter to True, the scheduler adds preempted jobs to the set of jobs around which
it backfills. The scheduler ignores backfill_depth when backfilling around jobs in the Preempted execution class. By default the sched_preempt_enforce_resumption scheduler
attribute is False.
4.8.3.5 Backfilling Around Starving Jobs
When you take starving jobs into consideration, by setting the help_starving_jobs scheduler
parameter to True, starving jobs can be added to the top jobs. They can continue to wait for
resources once they are the top job, blocking other jobs from running. See section 4.8.46,
“Starving Jobs”, on page 296.
4.8.3.6 Configuring Backfilling
To configure backfilling, do the following:

1. Choose how many jobs to backfill around. If you want to backfill around more than 1 job, set the backfill_depth server attribute to the desired number. The default is 1. Set this attribute to a value less than 100.

2. Choose whether to use backfilling during primetime, non-primetime, or all of the time. If you want separate primetime and non-primetime behavior, specify the backfill parameter twice, once for each. The default is all.

3. Make sure that the backfill scheduler parameter is True for the time you want it. The default is True. For example:
   backfill: True prime
   backfill: False non_prime

4. Make sure that jobs request walltime, by making them inherit a walltime resource if they don't explicitly request it. For options, see section 4.8.3.9.i, "Ensure Jobs Are Eligible for Backfilling", on page 134.

5. Choose whether you want to backfill around preempted jobs. To do this, set the sched_preempt_enforce_resumption scheduler attribute to True.

6. Make sure that the strict_ordering scheduler parameter is set to True for the same time as backfilling.

7. Choose whether you want to backfill around starving jobs. If you do, make sure that the help_starving_jobs scheduler parameter is set to True.
Most jobs that become top jobs are counted toward the limit set in backfill_depth, but some top jobs are not. The following table shows how backfilling can be configured and which top jobs affect backfill_depth. Unless explicitly stated, top jobs are counted towards backfill_depth. The scheduler stops considering jobs as top jobs when it has reached backfill_depth, except for preempted jobs, which do not count toward that limit. When backfill is off, the scheduler does not have a notion of "top jobs". When help_starving_jobs is off, the scheduler has no notion of starving jobs.
Table 4-5: Configuring Backfilling

Each entry below gives the settings of strict_ordering, backfill, sched_preempt_enforce_resumption, and help_starving_jobs, followed by whether jobs in the Express, Preempted, Starving, and Normal execution classes are treated as top jobs.

strict_ordering=T, backfill=T, sched_preempt_enforce_resumption=T, help_starving_jobs=T
    Express: Top jobs
    Preempted: Top jobs, not counted in backfill_depth
    Starving: Top jobs
    Normal: Top jobs

strict_ordering=T, backfill=T, sched_preempt_enforce_resumption=T, help_starving_jobs=F
    Express: Top jobs
    Preempted: Top jobs, not counted in backfill_depth
    Starving: Starving class does not exist
    Normal: Top jobs

strict_ordering=T, backfill=T, sched_preempt_enforce_resumption=F, help_starving_jobs=T
    Express: Top jobs
    Preempted: Top jobs
    Starving: Top jobs
    Normal: Top jobs

strict_ordering=T, backfill=T, sched_preempt_enforce_resumption=F, help_starving_jobs=F
    Express: Top jobs
    Preempted: Top jobs
    Starving: Starving class does not exist
    Normal: Top jobs

strict_ordering=T, backfill=F, sched_preempt_enforce_resumption=T, help_starving_jobs=T
    Express: No
    Preempted: Top jobs, not counted in backfill_depth
    Starving: Top jobs
    Normal: No

strict_ordering=T, backfill=F, sched_preempt_enforce_resumption=T, help_starving_jobs=F
    Express: No
    Preempted: Top jobs, not counted in backfill_depth
    Starving: Starving class does not exist
    Normal: No

strict_ordering=T, backfill=F, sched_preempt_enforce_resumption=F, help_starving_jobs=T
    Express: No
    Preempted: No
    Starving: Top jobs
    Normal: No

strict_ordering=T, backfill=F, sched_preempt_enforce_resumption=F, help_starving_jobs=F
    Express: No
    Preempted: No
    Starving: Starving class does not exist
    Normal: No

4.8.3.7 Backfilling and Strict Ordering
When you use strict ordering, the scheduler runs jobs in exactly the order of their priority. If
backfilling is turned off and the top job cannot run, no job is able to run. Backfilling can prevent resources from standing idle while the top job waits for its resources to become available. See section 4.8.47, “Using Strict Ordering”, on page 299.
4.8.3.8 Attributes and Parameters Affecting Backfilling
backfill
Scheduler parameter. Controls whether or not PBS uses backfilling. Scheduler will
backfill when either strict_ordering is True or help_starving_jobs is True. See
“backfill” on page 298 of the PBS Professional Reference Guide.
backfill_depth
Server attribute. Modifies backfilling behavior. Sets the number of jobs that are to
be backfilled around. See “Server Attributes” on page 332 of the PBS Professional
Reference Guide.
sched_preempt_enforce_resumption
Scheduler attribute. When both this attribute and the backfill scheduler parameter
are True, the scheduler treats preempted jobs like top jobs and backfills around
them. This effectively increases the value of backfill_depth by the number of preempted jobs.
The configuration parameters backfill_prime and prime_exempt_anytime_queues do not
relate to backfilling. They control the time boundaries of regular jobs with respect to primetime and non-primetime. See section 4.8.34, “Using Primetime and Holidays”, on page 256.
4.8.3.9 Backfilling Recommendations and Caveats

4.8.3.9.i Ensure Jobs Are Eligible for Backfilling
When calculating backfilling, PBS treats a job that has no walltime specified as if its walltime is eternity. The scheduler will never use one of these jobs as a filler job. You can avoid this by ensuring that each job has a realistic walltime, using the following methods:

•  At qsub time via a hook
•  By setting the queue's resources_default.walltime attribute
•  By setting the server's resources_default.walltime attribute
•  At qsub time via the server's default_qsub_arguments
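For example, a sketch that gives jobs a default walltime of one hour at a hypothetical queue named workq:

Qmgr: set queue workq resources_default.walltime = 01:00:00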
4.8.3.9.ii Number of Jobs to Backfill Around
The more jobs being backfilled around, the longer the scheduling cycle takes.
4.8.3.9.iii Dynamic Resources and Backfilling
Using dynamic resources and backfilling may result in some jobs not being run because a
dynamic resource is temporarily unavailable. This may happen when a job requesting a
dynamic resource is selected as the top job. The scheduler must estimate when resources will
become available, but it can only query for resources available at the time of the query, not
resources already in use, so it will not be able to predict when resources in use become available. Therefore the scheduler won’t be able to schedule the job. In addition, since dynamic
resources are outside of the control of PBS, they may be consumed between the time the
scheduler queries for the resource and the time it starts a job.
4.8.3.9.iv Avoid Using Strict Ordering, Backfilling, and Fairshare
It is inadvisable to use strict ordering and backfilling with fairshare.
The results may be non-intuitive. Fairshare will cause relative job priorities to change with
each scheduling cycle. It is possible that while a large job waits for a slot, jobs from the same
entity or group will be chosen as the filler jobs, and the usage from these small jobs will lower
the priority of the large job.
For example, if a user has a large job that is the most deserving but cannot run, smaller jobs
owned by that user will chew up the user's usage, and prevent the large job from ever being
likely to run. Also, if the small jobs are owned by a user in one area of the fairshare tree, no
large jobs owned by anyone else in that section of the fairshare tree are likely to be able to run.
4.8.3.9.v Using Preemption, Strict Ordering, and Backfilling
Using preemption with strict ordering and backfilling may reshuffle the top job(s) if high-priority jobs are preempted.
4.8.3.9.vi Warning About Backfilling and Provisioning
The scheduler will not run a job requesting an AOE on a vnode that has a top job scheduled on
it in the future.
The scheduler will not use a job requesting an AOE as a top job.
4.8.3.9.vii Backfilling and Estimating Job Start Time
When the scheduler is backfilling around jobs, it estimates the start times and execution
vnodes for the top jobs being backfilled around. See section 4.8.15, “Estimating Job Start
Time”, on page 169.
4.8.3.9.viii Using Strict Ordering and Backfilling with Only One of Primetime or Non-primetime
When PBS is using strict ordering and backfilling, the scheduler saves a spot for each high-priority job around which it is backfilling. If you configure PBS to use strict ordering and backfilling for only one of primetime or non-primetime, and you have large jobs that must wait a long time before enough resources are available, the saved spots can be lost in the transition.
4.8.4 Examining Jobs Queue by Queue
When the scheduler examines waiting jobs, it can either consider all of the jobs in the complex as a whole, or it can consider jobs queue by queue. When considering jobs queue by
queue, the scheduler runs all the jobs it can from the first queue before examining the jobs in
the next queue, and so on. This behavior is controlled by the by_queue scheduler parameter.
When the by_queue scheduler parameter is set to True, jobs in the highest-priority queue are
evaluated as a group, then jobs in the next-highest priority queue are evaluated. In this case,
PBS runs all the jobs it can from each queue before moving to the next queue, with the following exception: if there are jobs in the Reservation, Express, Preempted, or Starving
job execution classes, those are considered before any queue. These classes are described in
section 4.8.16, “Calculating Job Execution Priority”, on page 174.
The by_queue parameter applies to all of the queues in the complex. This means that either
all jobs are scheduled as if they are in one large queue, or jobs are scheduled queue by queue.
All queues are always sorted by queue priority. To set queue priority, set each queue’s priority
attribute to the desired value. A queue with a higher value is examined before a queue with a
lower value. If you do not assign priorities to queues, their ordering is undefined. See section
4.8.36, “Queue Priority”, on page 262.
The by_queue parameter is a primetime option, meaning that you can configure it separately
for primetime and non-primetime, or you can specify it for all of the time.
See “by_queue” on page 298 of the PBS Professional Reference Guide.
4.8.4.1 Configuring PBS to Consider Jobs Queue by Queue
•  Set the by_queue scheduler parameter to True

•  Assign a priority to each queue

•  Choose whether you want queue-by-queue behavior during primetime, non-primetime, or both. If you want separate behavior for primetime and non-primetime, list by_queue twice. For example:
   by_queue: True prime
   by_queue: False non_prime
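For example, a sketch assigning priorities to two hypothetical queues so that workq is examined before lowprio:

Qmgr: set queue workq priority = 100
Qmgr: set queue lowprio priority = 10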
4.8.4.2 Parameters and Attributes Affecting Queue by Queue
•  The by_queue scheduler parameter; see "by_queue" on page 298 of the PBS Professional Reference Guide.

•  The priority queue attribute; see "Queue Attributes" on page 371 of the PBS Professional Reference Guide.
4.8.4.3 Caveats and Advice for Queue by Queue
•  The by_queue scheduler parameter is overridden by the round_robin scheduler parameter when round_robin is set to True.

•  When by_queue is True, queues cannot be designated as fairshare entities, and fairshare will work queue by queue instead of on all jobs at once.

•  When by_queue is True, job execution priority may be affected. See section 4.8.16, "Calculating Job Execution Priority", on page 174.

•  The by_queue parameter is not required when using express queues.

•  You can have FIFO scheduling for all your jobs across the complex if you are using a single execution queue or have by_queue set to False. However, you can have FIFO scheduling for the jobs within each queue if you set by_queue to True and specify a different priority for each queue. See section 4.8.19, "FIFO Scheduling", on page 192.
4.8.5 Checkpointing
You can use checkpointing as a scheduling tool: as a preemption method, as an aid in recovery, as a way to capture progress from a shrink-to-fit job, and when using the qhold command.
For a complete description of how to use and configure checkpointing, see section 9.3,
“Checkpoint and Restart”, on page 857.
4.8.5.1 Checkpointing as a Preemption Method
When a job is preempted via checkpointing, MoM runs the checkpoint_abort script, and PBS
kills and requeues the job. When the scheduler elects to run the job again, the MoM runs the
restart script to restart the job from where it was checkpointed. See section 4.8.33, “Using
Preemption”, on page 241.
4.8.5.2 Checkpointing as a Way to Capture Progress and Help Recover Work
When you use checkpointing to capture a job’s progress before the job is terminated, for
example when a shrink-to-fit job’s wall time is exceeded, MoM runs the snapshot checkpoint
script, and the job continues to run. See section 9.3, “Checkpoint and Restart”, on page 857.
4.8.5.3 Checkpointing When Using the qhold Command
When the qhold command is used to hold a checkpointable job, MoM runs the
checkpoint_abort script, and PBS kills, requeues, and holds the job. A job with a hold on it
must have the hold released via the qrls command in order to be eligible to run. For a discussion of the use of checkpointing for the qhold command, see section 9.3.7.6, “Holding a
Job”, on page 876. For a description of the qhold command, see “qhold” on page 155 of the
PBS Professional Reference Guide.
4.8.6 Organizing Job Chunks
You can specify how job chunks should be organized onto hosts or vnodes. Jobs can request their placement arrangement, and you can set defaults at queues and at the server to be inherited by jobs that do not request a placement. You can tell PBS to do the following:

•  Put all chunks from a job onto a single host using the place=pack statement.
•  Put each chunk on a separate host using the place=scatter statement. The number of chunks must be fewer than or equal to the number of hosts.
•  Put each chunk on a separate vnode using the place=vscatter statement. The number of chunks must be fewer than or equal to the number of vnodes.
•  Put each chunk anywhere using the place=free statement.
To specify a placement default, set resources_default.place=<arrangement>, where
arrangement is pack, scatter, vscatter, or free. For example, to have the default at QueueA
be pack:
Qmgr: set queue QueueA resources_default.place=pack
You can specify that job chunks must be grouped in a certain way. For example, to require
that chunks all end up on a shared router, use this:
place=group=router
For more about jobs requesting placement, see “Requesting Resources and Placing Jobs” on
page 228 of the PBS Professional Reference Guide.
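For example, a job could request two chunks of four CPUs each, with each chunk placed on a separate host (the script name job.sh is illustrative):

qsub -l select=2:ncpus=4 -l place=scatter job.sh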
4.8.6.1 Caveats for Organizing Job Chunks
A placement specification for arrangement, sharing, and grouping is treated as one package
by PBS. This means that if a job requests only one, any defaults set for the others are not
inherited. For example, if you set a default of place=pack:excl:group=router, and a
job requests only place=pack, the job does not inherit excl or group=router. See
“Requesting Resources and Placing Jobs” on page 228 of the PBS Professional Reference
Guide.
4.8.7 cron Jobs, or the Windows Task Scheduler
You can use cron jobs or the Windows Task Scheduler to make time-dependent modifications to settings, where you are scheduling according to time slots. For example, you can change settings for primetime and non-primetime configurations, making the following changes:

•  Set nodes offline or not offline
•  Change the number of ncpus on workstations
•  Change the priority of queues, for example to change preemption behavior
•  Start or stop queues
•  Set primetime & non-primetime options
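For example, a minimal sketch of crontab entries that take a hypothetical workstation named ws1 offline during working hours and bring it back in the evening; the path to pbsnodes is illustrative and should match your PBS_EXEC:

# Mark ws1 offline at 08:00 on weekdays
0 8 * * 1-5 /opt/pbs/bin/pbsnodes -o ws1
# Clear the offline state at 18:00 on weekdays
0 18 * * 1-5 /opt/pbs/bin/pbsnodes -r ws1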
4.8.7.1 Caveats for cron Jobs and the Windows Task Scheduler
•  Make sure that your cron jobs or Windows Task Scheduler tasks behave correctly when PBS is not running.

•  Be careful when changing available resources, such as when offlining vnodes. You might prevent jobs from running that would otherwise run. For details, see section 4.6.2, "Jobs that Cannot Run on Current Resources", on page 120.

   If PBS is down when your cron job runs, the change specified in the cron job won't happen. For example, if you use cron to offline a vnode and then bring it online later, it won't come online if PBS is down during the second operation.
4.8.8 Using Custom and Default Resources
The information in this section relies on understanding how jobs are allocated resources via inheriting defaults or via hooks. Before reading this section, please read section 11.3, "Allocating Resources to Jobs", on page 967.

For complete details of how to configure and use custom resources, please see section 5.14, "Custom Resources", on page 337.

You can use custom and default resources for several purposes:

•  Routing jobs to the desired vnodes; see section 4.8.8.2, "Using Custom Resources to Route Jobs", on page 141
•  Assigning execution priority to jobs; see section 4.8.8.3, "Using Custom Resources to Assign Job Execution Priority", on page 142
•  Tracking and controlling the allocation of resources; see section 4.8.8.4, "Using Custom Resources to Track and Control Resource Allocation", on page 142
•  Representing elements such as GPUs, FPGAs, and switches; see section 4.8.8.5, "Using Custom Resources to Represent GPUs, FPGAs, Switches, Etc.", on page 142
•  Allowing users to request platform-specific resources, for example Cray-specific resources; see section 4.8.8.6, "Using Custom Resources to Allow Platform-specific Resource Requests", on page 142
•  Allowing users to submit jobs that run on a Cray as they would if using the aprun command; see section 4.8.8.7, "Using Custom Resources to Allow Platform-specific Behavior", on page 143
•  Shrinking job walltimes so that they can run in time slots that are less than the expected maximum; see section 4.8.41, "Using Shrink-to-fit Jobs", on page 279.
4.8.8.1 Techniques for Allocating Custom Resources to Jobs
In addition to using custom resources to represent physical elements such as GPUs, you can
use custom resources as tags that you attach to jobs in order to help schedule the jobs. You
can make these custom resources into tools that can be used only for managing jobs, by making them unalterable and unrequestable, and if desired, invisible to users.
For how to assign custom and default resources to jobs, see section 11.3, “Allocating
Resources to Jobs”, on page 967.
4.8.8.2 Using Custom Resources to Route Jobs
You can use several techniques to route jobs to the desired queues and/or vnodes. Depending on your site's configuration, you may find it helpful to use custom resources with one or more of these techniques.

•  You can force users to submit jobs to the desired queues by setting resource limits at queues. You can use custom resources to represent arbitrary elements, for example, department. In this case you could limit which department uses each queue. You can set a default value for the department at the server, or create a hook that assigns a value for the department.
   For how queue resource limits are applied to jobs, see section 2.2.6.4.i, "How Queue and Server Limits Are Applied, Except Running Time", on page 25.

•  Use default resources or a hook to assign custom resources to jobs when the jobs are submitted. Send the jobs to routing queues, then route them, using the resources, to other queues inside or outside the PBS complex. Again, custom resources can represent arbitrary elements.
   For how routing queues work, see section 2.2.6, "Routing Queues", on page 24.

•  Use peer scheduling to send jobs between PBS complexes. You can set resource limits on the furnishing queue in order to limit the kinds of jobs that are peer scheduled. You can assign custom resources to jobs to represent arbitrary elements, for example peer queueing only those jobs from a specific project. You can assign the custom resource by having the job inherit it or via a hook.
   For how to set up peer scheduling, see section 4.8.31, "Peer Scheduling", on page 218.

•  You can route jobs from specific execution queues to the desired vnodes, by associating the vnodes with the queues. See section 4.8.2, "Associating Vnodes with Queues", on page 126.

•  You can create placement sets so that jobs are placed according to resource values. Placement sets are created where vnodes share a value for a resource; you can use custom resources to create the placement sets you want. See section 4.8.32, "Placement Sets", on page 224.
4.8.8.3 Using Custom Resources to Assign Job Execution Priority
You can use custom resources as coefficients in the job sorting formula. You can assign custom resources to jobs using the techniques listed in section 11.3, “Allocating Resources to
Jobs”, on page 967. The value of each custom resource can be based on a project, an application, etc.
For example, you can create a custom resource called “ProjPrio”, and the jobs that request the
“Bio” project can be given a value of 5 for ProjPrio, and the jobs that request the “Gravel”
project can be given a value of 2 for ProjPrio. You can assign this value in a hook or by routing the jobs into special queues from which the jobs inherit the value for ProjPrio.
For information on using the job sorting formula, see section 4.8.20, “Using a Formula for
Computing Job Execution Priority”, on page 194.
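For example, a minimal sketch, assuming a hypothetical custom resource named ProjPrio that jobs inherit from their queues or a hook, used as the job sorting formula:

Qmgr: create resource ProjPrio type=long
Qmgr: set server job_sort_formula = "ProjPrio"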
4.8.8.4 Using Custom Resources to Track and Control Resource Allocation
You can use resources to track and control usage of things like hardware and licenses. For
example, you might want to limit the number of jobs using floating licenses or a particular
vnode. See section 5.10, “Using Resources to Track and Control Allocation”, on page 332.
4.8.8.5 Using Custom Resources to Represent GPUs, FPGAs, Switches, Etc.
You can use custom resources to represent GPUs, FPGAs, high performance switches, etc.
For examples, see section 5.14.8, “Using GPUs”, on page 383, section 5.14.9, “Using
FPGAs”, on page 387, and section 10.2.7, “Allowing Users to Request HPS”, on page 922.
4.8.8.6 Using Custom Resources to Allow Platform-specific Resource Requests
PBS is integrated with Cray, and provides special custom resources to represent Cray
resources. You can create other custom resources to represent other platform-specific elements. For an example, see section 10.3.7.13, “Allowing Users to Request Login Node
Groups”, on page 941.
4.8.8.7 Using Custom Resources to Allow Platform-specific Behavior
You can create custom resources that allow Cray users to run jobs that behave the same way
they would if the user had used the aprun command. For examples, see section 10.3.7.11,
“Allowing Users To Reserve N NUMA Nodes Per Compute Node”, on page 937 and section
10.3.7.12, “Allowing Users To Reserve Specific NUMA Nodes”, on page 939.
4.8.9 Using Idle Workstation Cycle Harvesting
You can configure workstations at your site so that PBS can run jobs on them when their
“owners” are away and they are idle. This is called idle workstation cycle harvesting. This
can give your site additional resources to run jobs during nights and weekends, or even during
lunch.
You can configure PBS to use the following methods to decide when a workstation is not being used by its owner:

•  Keyboard/mouse activity
•  X-Window monitoring
•  Load average (not recommended)
On some systems cycle harvesting is simple to implement, because the console, keyboard,
and mouse device access times are periodically updated by the operating system. The PBS
MoM process can track this information, and mark the vnode busy if any of the input devices
is in use. On other systems, however, this data is not available: on some machines, PBS can
monitor the X-Window system in order to obtain interactive idle time, and on others, PBS
itself monitors keyboard and mouse activity.
Jobs on workstations that become busy are not migrated; they remain on the workstation until
they complete execution, are rerun, or are deleted.
4.8.9.1 Platforms Supporting Cycle Harvesting
Due to different operating system support for tracking mouse and keyboard activity, the availability and method of support for cycle harvesting varies based on the computer platform in question. The following table lists the method and support for each platform.

Table 4-6: Cycle Harvesting Support Methods

AIX
    Supported, via pbs_idled; see "Cycle Harvesting by Monitoring X-Windows" on page 154.

HP-UX 11
    Supported, via keyboard/mouse; see section 4.8.9.3, "Cycle Harvesting Based on Keyboard/Mouse Activity", on page 145.

Linux
    Supported, via keyboard/mouse; see section 4.8.9.3, "Cycle Harvesting Based on Keyboard/Mouse Activity", on page 145.

Solaris
    Supported, via keyboard/mouse; see section 4.8.9.3, "Cycle Harvesting Based on Keyboard/Mouse Activity", on page 145.

Windows
    Supported, via keyboard/mouse; see section 4.8.9.4, "Cycle Harvesting on Windows", on page 146.
4.8.9.2 The $kbd_idle MoM Configuration Parameter
Cycle harvesting based on keyboard/mouse activity and X-Windows monitoring is controlled
by the $kbd_idle MoM configuration parameter in PBS_HOME/mom_priv/config on the
workstation in question. This parameter has the following format:
$kbd_idle <idle_wait> <min_use> <poll_interval>
Declares that the vnode will be used for batch jobs during periods when the keyboard
and mouse are not in use.
idle_wait
Time, in seconds, that the workstation keyboard and mouse must be idle before
being considered available for batch jobs.
Must be set to a value greater than 0 for cycle harvesting to be enabled.
Format: Integer
No default
min_use
Time, in seconds, during which the workstation keyboard or mouse must continue to be in use before the workstation is determined to be unavailable for
batch jobs.
Format: Integer
Default: 10
poll_interval
Interval, in seconds, at which MoM checks for keyboard and mouse activity.
Format: Integer
Default: 1
4.8.9.3 Cycle Harvesting Based on Keyboard/Mouse Activity
PBS can monitor a workstation for keyboard and mouse activity, and run batch jobs on the
workstation when the keyboard and mouse are not being used. PBS sets the state of the vnode
to either free or busy, depending on whether or not there is keyboard or mouse activity, and
runs jobs only when the state of the vnode is free. PBS sets the state of the vnode to free
when the vnode’s mouse and keyboard have shown no activity for the specified amount of
time. If PBS determines that the vnode is being used, it sets the state of the vnode to busy and
suspends any running jobs, setting their state to U (user busy).
This method is used on the Linux, Solaris, and HP-UX operating systems.
4.8.9.3.i Configuring Cycle Harvesting Using Keyboard/Mouse Activity
To configure cycle harvesting using keyboard and mouse activity, do the following:

1. Set the $kbd_idle MoM configuration parameter in PBS_HOME/mom_priv/config on the workstation.

2. HUP the MoM on the workstation:

   kill -HUP <pbs_mom PID>
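For example, on a Linux workstation, the two steps might look like the following, using the parameter values from the example in the next section (a sketch; the PBS_HOME path /var/spool/PBS and the use of pgrep to find MoM's PID are assumptions that vary by site):

   echo '$kbd_idle 1800 10 5' >> /var/spool/PBS/mom_priv/config
   kill -HUP $(pgrep -x pbs_mom)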
4.8.9.3.ii Example of Cycle Harvesting Using Keyboard/Mouse Activity
The following is an example setting for the parameter:
$kbd_idle 1800 10 5
This setting for the parameter in MoM's config file specifies the following:

• PBS marks the workstation as free if the keyboard and mouse are idle for 30 minutes (1800 seconds)

• PBS marks the workstation as busy if the keyboard or mouse is used for 10 consecutive seconds

• The states of the keyboard and mouse are checked for activity every 5 seconds
Here, we walk through how this example would play out, to show the roles of the arguments
to the $kbd_idle parameter:
Let’s start with a workstation that has been in use for some time by its owner. The workstation is in state busy.
Now the owner goes to lunch. After 1800 seconds (30 minutes), PBS changes the workstation’s state to free and starts a job on the workstation.
Some time later, someone walks by and moves the mouse or enters a command. Within
the next 5 seconds (idle poll period), pbs_mom notes the activity. The job is suspended
and placed in state U, and the workstation is marked busy.
If 10 seconds pass and there is no additional keyboard/mouse activity, the job is resumed and the workstation again is either free (if any CPUs are available) or job-busy (if all CPUs are in use).
However, if keyboard/mouse activity continues during that 10 seconds, the workstation
remains busy and the job remains suspended for at least the next 1800 seconds.
4.8.9.3.iii Caveats for Cycle Harvesting Using Keyboard/Mouse Activity

• There is no default for idle_wait; you must set it to a value greater than 0 in order to enable cycle harvesting using keyboard/mouse activity.
4.8.9.4 Cycle Harvesting on Windows
A process called pbs_idled monitors keyboard and mouse activity and keeps MoM
informed of user activity. The user being monitored can be sitting at the machine, or using a
remote desktop.
The pbs_idled process is managed in one of two ways. PBS can use a service called
PBS_INTERACTIVE to monitor the user’s session. If the PBS_INTERACTIVE service is
registered, MoM starts the service, and the service starts and stops pbs_idled. The
PBS_INTERACTIVE service runs under a local system account. PBS uses the
PBS_INTERACTIVE service only where site policy allows a local system account to be a
service account. If this is not allowed (so the service is not registered), pbs_idled is
started and stopped using the log on/log off script. Do not use both the PBS_INTERACTIVE
service and a log on/log off script.
A pbs_idled process monitors the keyboard and mouse activity while a user is logged in.
This process starts when the user logs on, and stops when the user logs off. Only a user with
administrator privileges, or the user being monitored, can stop pbs_idled.
MoM uses two files to communicate with pbs_idled:

• MoM creates PBS_HOME/spool/idle_poll_time and writes the value of her $kbd_idle polling interval parameter to it. The pbs_idled process reads the value of the polling interval from idle_poll_time.

• MoM creates PBS_HOME/spool/idle_touch. The pbs_idled process updates the time stamp of the idle_touch file when a user is active, and MoM reads the time stamp.
4.8.9.4.i Configuring Cycle Harvesting on Windows
To configure cycle harvesting, do the following:

1. Make sure that you are a user with administrator privileges.

2. Set the $kbd_idle MoM configuration parameter in PBS_HOME/mom_priv/config on the workstation.

3. Configure how pbs_idled starts:

   a. If your policy allows a local system account to be a service account, register the PBS_INTERACTIVE service:

      pbs_interactive -R

   b. If your policy does not allow a local system account to be a service account, and you are in a domained environment:

      1. Configure the log on script as described in section 4.8.9.4.ii, "Configuring pbs_idled in Log On Script in Domain Environment", on page 149.

      2. Configure the log off script as described in section 4.8.9.4.iii, "Configuring pbs_idled in Log Off Script in Domain Environment", on page 150.

   c. If your policy does not allow a local system account to be a service account, and you are in a standalone environment:

      1. Configure the log on script as described in section 4.8.9.4.iv, "Configuring pbs_idled in Log On Script in Standalone Environment", on page 151.

      2. Configure the log off script as described in section 4.8.9.4.v, "Configuring pbs_idled in Log Off Script in Standalone Environment", on page 152.

4. Restart the MoM.
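On Windows, MoM runs as a service, so restarting her typically follows the same net stop/net start pattern shown for the scheduler later in this chapter (this assumes the service is named pbs_mom, which may vary by installation):

   net stop pbs_mom
   net start pbs_mom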
4.8.9.4.ii Configuring pbs_idled in Log On Script in Domain Environment
1. You must be a user with administrator privileges.

2. On the domain controller host, open Administrator Tools.

3. In Administrator Tools, open Active Directory Users and Computers.

4. Right-click on the Organizational Unit where you want to apply the group policy for logging on and logging off.

5. Click on Properties.

6. Go to the Group Policy tab under the Properties window.

7. Click on New.

8. Type "LOG-IN-OUT-SCRIPT" as the name of the policy.

9. Select the Group Policy Object you have just created; click Edit. The Group Policy Object editing window will open.

10. Open Window Settings in User Configuration.

11. Open Scripts (Logon/Logoff).

12. Open Logon. A Logon Properties window will open.

13. Open Notepad in another window. In Notepad, you create the command that starts the pbs_idled process:

    pbs_idled start

14. Save that document as "pbs_idled_logon.bat".

15. In the Logon Properties window, click on Show Files. A logon script folder will open in a new window.

16. Copy pbs_idled_logon.bat into the logon script folder and close the logon script folder window.

17. In the Logon Properties window, click on Add, and then click on Browse. Select pbs_idled_logon.bat and then click on Open.

18. Click on OK, then Apply, then again OK.

19. Close the Group Policy Object editor and the Properties window.

20. Close the Active Directory Users and Computers window.

21. Close the Administrator Tools window.
4.8.9.4.iii Configuring pbs_idled in Log Off Script in Domain Environment
1. You must be a user with administrator privileges.

2. On the domain controller host, open Administrator Tools.

3. In Administrator Tools, open Active Directory Users and Computers.

4. Right-click on the Organizational Unit where you want to apply the group policy for logging on and logging off.

5. Click on Properties.

6. Go to the Group Policy tab under the Properties window.

7. Click on New.

8. Type "LOG-IN-OUT-SCRIPT" as the name of the policy.

9. Select the Group Policy Object you have just created; click Edit. The Group Policy Object editing window will open.

10. Open Window Settings in User Configuration.

11. Open Scripts (Logon/Logoff).

12. Open Logoff. A Logoff Properties window will open.

13. Open Notepad in another window. In Notepad, you create the command that stops the pbs_idled process:

    pbs_idled stop

14. Save that document as "pbs_idled_logoff.bat".

15. In the Logoff Properties window, click on Show Files. A logoff script folder will open in a new window.

16. Copy pbs_idled_logoff.bat into the logoff script folder and close the logoff script folder window.

17. In the Logoff Properties window, click on Add, and then click on Browse. Select pbs_idled_logoff.bat and then click on Open.

18. Click on OK, then Apply, then again OK.

19. Close the Group Policy Object editor and the Properties window.

20. Close the Active Directory Users and Computers window.

21. Close the Administrator Tools window.
4.8.9.4.iv Configuring pbs_idled in Log On Script in Standalone Environment
1. You must be a user with administrator privileges.

2. As administrator, open a command prompt, and type the following command:

   gpedit.msc

3. Press Enter. A Local Group Policy editing window will open.

4. Open Window Settings in User Configuration.

5. Open Scripts (Logon/Logoff).

6. Open Logon. A Logon Properties window will open.

7. Open Notepad in another window. In Notepad, you create the command that starts the pbs_idled process:

   pbs_idled start

8. Save that document as "pbs_idled_logon.bat".

9. In the Logon Properties window, click on Show Files. A logon script folder will open in a new window.

10. Copy pbs_idled_logon.bat into the logon script folder and close the logon script folder window.

11. In the Logon Properties window, click on Add, and then click on Browse. Select pbs_idled_logon.bat and then click on Open.

12. Click on OK, then Apply, then again OK.

13. Close the Local Group Policy editing window.
4.8.9.4.v Configuring pbs_idled in Log Off Script in Standalone Environment
1. You must be a user with administrator privileges.

2. As administrator, open a command prompt, and type the following command:

   gpedit.msc

3. Press Enter. A Local Group Policy editing window will open.

4. Open Window Settings in User Configuration.

5. Open Scripts (Logon/Logoff).

6. Open Logoff. A Logoff Properties window will open.

7. Open Notepad in another window. In Notepad, you create the command that stops the pbs_idled process:

   pbs_idled stop

8. Save that document as "pbs_idled_logoff.bat".

9. In the Logoff Properties window, click on Show Files. A logoff script folder will open in a new window.

10. Copy pbs_idled_logoff.bat into the logoff script folder and close the logoff script folder window.

11. In the Logoff Properties window, click on Add, and then click on Browse. Select pbs_idled_logoff.bat and then click on Open.

12. Click on OK, then Apply, then again OK.

13. Close the Local Group Policy editing window.
4.8.9.4.vi The PBS_INTERACTIVE Service
The PBS_INTERACTIVE service starts the pbs_idled process, as the current user, in the
current active user’s session. Each time a user logs on, the service starts a pbs_idled for
that user, and when that user logs off, the service stops that user’s pbs_idled process.
The service runs under a local system account. If your policy allows a local system account to
be a service account, you can use PBS_INTERACTIVE. Otherwise you must configure
pbs_idled in log on/log off scripts.
If you have configured the $kbd_idle MoM parameter, and you have registered the service,
MoM starts the service. The service cannot be started manually.
If you will use PBS_INTERACTIVE, you must register the service. The installer cannot register the service.
• To register the PBS_INTERACTIVE service:

  pbs_interactive -R

  Upon successful execution of this command, the following message is displayed:
  "Service PBS_INTERACTIVE installed successfully"

• To unregister the PBS_INTERACTIVE service:

  pbs_interactive -U

  Upon successful execution of this command, the following message is displayed:
  "Service PBS_INTERACTIVE uninstalled successfully"

• To see the version number of the PBS_INTERACTIVE service:

  pbs_interactive --version
4.8.9.4.vii Errors and Logging
If the $kbd_idle MoM parameter is configured, MoM attempts to use cycle harvesting. MoM
looks for the PBS_INTERACTIVE service in the Service Control Manager. If she finds the
service, she starts it.
1. If she cannot find the service, MoM logs the following message at event class 0x0002:

   "Can not find PBS_INTERACTIVE service, Continuing Cycle Harvesting with Logon/Logoff Script"

2. MoM looks for PBS_HOME/spool/idle_touch. If she finds it, she uses cycle harvesting.

3. If she cannot find the file, MoM disables cycle harvesting and logs the following message at event class 0x0002:

   "Cycle Harvesting Failed, Please contact Admin"
MoM logs the following messages at event class 0x0001.
• If MoM fails to open the Service Control Manager:
  "OpenSCManager failed for PBS_INTERACTIVE"

• If MoM fails to open the PBS_INTERACTIVE service:
  "OpenService failed for PBS_INTERACTIVE"

• If MoM fails to start the PBS_INTERACTIVE service:
  "Could not start PBS_INTERACTIVE service"

• If MoM fails to get status information about the PBS_INTERACTIVE service:
  "Can not get information about PBS_INTERACTIVE service"

• If MoM fails to send a stop control message to the PBS_INTERACTIVE service:
  "Could not stop PBS_INTERACTIVE service"

• If the PBS_INTERACTIVE service does not respond in a timely fashion:
  "PBS_INTERACTIVE service did not respond in timely fashion"

• If MoM fails to create idle_touch and idle_poll_time in the PBS_HOME/spool directory:
  "Can not create file <full path of idle file>"

• If MoM fails to write the idle polling interval into PBS_HOME/spool/idle_poll_time:
  "Can not write idle_poll time into <full path of idle_poll_time file> file"
4.8.9.4.viii Caveats for Cycle Harvesting on Windows

• Under Windows, if the pbs_idled process is killed, cycle harvesting will not work.

• Under Windows, cycle harvesting may not work correctly on machines where more than one user is logged in, and users are not employing Switch User.

• Do not use both the PBS_INTERACTIVE service and a log on/log off script.
4.8.9.5 Cycle Harvesting by Monitoring X-Windows

On UNIX/Linux machines where the OS does not periodically update console, keyboard, and mouse device access times, PBS can monitor X-Window activity instead. PBS uses an X-Window monitoring process called pbs_idled. This process runs in the background, monitors X, and reports to pbs_mom whether or not the vnode is idle. pbs_idled is located in $PBS_EXEC/sbin.

This method is used on machines running the AIX operating system.
To configure PBS for cycle harvesting by monitoring X-Windows, perform the following steps:

1. Create a directory for pbs_idled. This directory must have the same permissions as /tmp (i.e. mode 1777). This allows the pbs_idled program to create and update files as the user, which is necessary because the program runs as the user. For example:

   mkdir PBS_HOME/spool/idledir
   chmod 1777 PBS_HOME/spool/idledir

2. Turn on keyboard idle detection in the MoM config file:

   $kbd_idle <idle wait value>

3. Include pbs_idled as part of the X-Windows startup sequence.

   The best and most secure method of starting pbs_idled is via the system-wide Xsession file. This is the script which is run by xdm (the X login program) and sets up each user's X-Windows environment.

   You must place the startup line for pbs_idled before that of the window manager, and you must make sure that pbs_idled runs in the background.

   On systems that use Xsession to start desktop sessions, insert a line invoking pbs_idled near the top of the file. For example, insert the following line in a Linux Xsession file:

   /usr/pbs/sbin/pbs_idled &

   If access to the system-wide Xsession file is not available, you can add pbs_idled to every user's personal .xsession or .xinitrc file, depending on the local OS requirements for starting X-Windows programs upon login.
4.8.9.6 Cycle Harvesting Based on Load Average
Cycle harvesting based on load average means that PBS monitors each workstation’s load
average, runs jobs where workstations have loads below a specified level, and suspends any
batch jobs on workstations whose load has risen above the limit you set. When a workstation’s owner uses the machine, the workstation’s load rises.
When you configure cycle harvesting based on load average, you are performing the same
configuration as for load balancing using load average. For a complete description of load
balancing, see section 4.8.27, “Using Load Balancing”, on page 205.
4.8.9.6.i Attributes and Parameters Affecting Cycle Harvesting Based on Load Average
load_balancing
Scheduler parameter. When set to True, the scheduler places jobs only where the
load average is below the specified limit.
Format: Boolean
Default: False all
$ideal_load <load>
MoM parameter. Defines the load below which the vnode is not considered to be
busy. Used with the $max_load directive.
Example:
$ideal_load 1.8
Format: Float
No default
$max_load <load> [suspend]
MoM parameter. Defines the load above which the vnode is considered to be busy.
Used with the $ideal_load directive. No new jobs are started on a busy vnode.
The optional suspend directive tells PBS to suspend jobs running on the node if the
load average exceeds the $max_load number, regardless of the source of the load
(PBS and/or logged-in users). Without this directive, PBS will not suspend jobs due
to load.
We recommend setting this to a slightly higher value than your target load (which is
typically the number of CPUs), for example .25 + ncpus.
Example:
$max_load 3.25
Format: Float
Default: number of CPUs
resv_enable
Vnode attribute. Controls whether the vnode can be used for advance and standing
reservations. When set to True, this vnode can be used for reservations.
Format: Boolean
Default: True
no_multinode_jobs
Vnode attribute. Controls whether jobs which request more than one chunk are
allowed to execute on this vnode. When set to True, jobs requesting more than one
chunk are not allowed to execute on this vnode.
Format: Boolean
Default: False
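For example, a mom_priv/config for a 2-CPU workstation used for cycle harvesting might contain the following (values are illustrative, following the recommendation above of roughly .25 + ncpus for $max_load):

   $ideal_load 1.8
   $max_load 2.25 suspend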
4.8.9.6.ii How Cycle Harvesting Based on Load Average Works
Cycle harvesting based on load average means that PBS monitors the load average on each
machine. When the load on a workstation is below what is specified in the $ideal_load MoM
parameter, PBS sets the state of the workstation to free. The scheduler will run jobs on
vnodes whose state is free. When the load on a workstation exceeds the setting for
$max_load, PBS sets the state of the workstation to busy, and suspends jobs running on the
workstation. PBS does not start jobs on a vnode whose state is busy. When the load drops
below the setting for $ideal_load, PBS sets the state to free, and resumes the jobs that were
running on the workstation.
PBS thinks that a 1-CPU job raises a vnode’s load by 1. On machines being used for cycle
harvesting, values for $max_load and $ideal_load are set to reasonable limits. On other
machines, these are set to values that will never be exceeded, so that load is effectively
ignored.
On machines where these parameters are unset, the vnode’s state is not set according to its
load, so jobs are not suspended because a vnode is busy. However, if $max_load and
$ideal_load are unset, they are treated as if they have the same value as
resources_available.ncpus, and because there is usually a small background load, PBS will
lose the use of a CPU’s worth of load. The scheduler won’t place a job where the anticipated
load would exceed $max_load, so if a machine has a load of 1.25, is running a 1-CPU job,
and has 2 CPUs, PBS won’t place another 1-CPU job there.
4.8.9.6.iii Configuring Cycle Harvesting Based on Load Average
To set up cycle harvesting for idle workstations based on load average, perform the following steps:

1. If PBS is not already installed on the target execution workstations, do so now, selecting the execution-only install option. See the PBS Professional Installation & Upgrade Guide.

2. Edit the PBS_HOME/mom_priv/config configuration file on each target execution workstation, adding the $max_load and $ideal_load configuration parameters. Make sure they have values that will not interfere with proper operation. See section 4.8.9.6.v, "Caveats for Cycle Harvesting Based on Load Average", on page 159.

   $max_load <load limit that allows jobs to run>
   $ideal_load <load at which to start jobs>

3. Edit the PBS_HOME/mom_priv/config configuration file on each machine where you are not using cycle harvesting, adding the $max_load and $ideal_load configuration parameters. Make sure they have values that will never be exceeded.

   $max_load <load limit that will never be exceeded>
   $ideal_load <load limit that will never be exceeded>

4. HUP the MoM:

   kill -HUP <pbs_mom PID>

5. Edit the PBS_HOME/sched_priv/sched_config configuration file to direct the scheduler to perform scheduling based on load balancing:

   load_balancing: True ALL

6. If you wish to oversubscribe the vnode's CPU(s), set its resources_available.ncpus to a higher number. Do this only on single-vnode machines. You must be cautious about matching ncpus and $max_load. See section 4.8.9.6.v, "Caveats for Cycle Harvesting Based on Load Average", on page 159.

7. HUP the scheduler:

   kill -HUP <pbs_sched PID>

8. Set the vnode's resv_enable attribute to False, to prevent the workstation from being used for reservations:

   Qmgr: set node <vnode name> resv_enable = False

9. Set the vnode's no_multinode_jobs attribute to True, to prevent the workstation from stalling multichunk jobs:

   Qmgr: set node <vnode name> no_multinode_jobs = True
4.8.9.6.iv Viewing Load Average Information
You can see the state of a vnode using the pbsnodes -a command.
4.8.9.6.v Caveats for Cycle Harvesting Based on Load Average
• Be careful with the settings for $ideal_load and $max_load. You want to make sure that when the workstation owner is using the machine, the load on the machine triggers MoM to report being busy, and that PBS does not start any new jobs while the user is working.

• For information about keeping your site running smoothly using $max_load and $ideal_load, see section 9.4.4, "Managing Load Levels on Vnodes", on page 883.

• If you set ncpus higher than the number of actual CPUs, and set $max_load higher to match, keep in mind that the workstation user could end up with an annoyingly slow workstation. This can happen when PBS runs jobs on the machine, but the combined load from the jobs and the user is insufficient for MoM to report being busy.
4.8.9.7 Cycle Harvesting and File Transfers
The cycle harvesting feature interacts with file transfers in one of two different ways, depending on the method of file transfer:

• If the user's job includes file transfer commands (such as rcp or scp) within the job script, and such a command is running when PBS decides to suspend the job on the vnode, then the file transfer is suspended as well.

• If the job has PBS file staging parameters (i.e. stagein=, stageout=file1...), and the load goes above $max_load, the file transfer is not suspended. This is because the file staging is not part of the job script execution, and is not subject to suspension. See "Detailed Description of Job Lifecycle", on page 58 of the PBS Professional User's Guide.
4.8.9.8 Parallel Jobs With Cycle Harvesting

Cycle harvesting is not recommended for hosts that will run multi-host jobs. However, you may find that your site benefits from using cycle harvesting on these machines. Below, we provide advice on how to keep multi-host jobs off cycle-harvesting machines, and advice on how to combine the two if you choose to.
4.8.9.8.i General Advice: Parallel Jobs Not Recommended

Cycle harvesting is somewhat incompatible with multi-host jobs. If one of the hosts being used for a parallel job running on several hosts is being used for cycle harvesting, and the user types at the keyboard, execution of the entire job will be delayed, because the tasks running on that host will be suspended.

To prevent a machine which is being used for cycle harvesting from being assigned a multi-host job, set the vnode's no_multinode_jobs attribute to True. This attribute prevents a host from being used by jobs that span multiple hosts.
4.8.9.8.ii How to Use Cycle Harvesting with Multi-host Jobs
When a single-host job is running on a workstation configured for cycle harvesting, and that host becomes busy, the job is suspended. However, suspending a multi-host parallel job may have undesirable side effects because of inter-process communications. Therefore, for a job which uses multiple hosts, when one or more of the hosts becomes busy, the default action is to leave the job running.
However, you can specify that the job should be requeued and subsequently re-scheduled to
run elsewhere when any of the hosts on which the job is running becomes busy. To enable
this action, add the following parameter to MoM’s configuration file:
$action multinodebusy 0 requeue
where multinodebusy is the action to modify; “0” (zero) is the action timeout value (it is
ignored for this action); and requeue is the new action to perform. The only action that can be
performed is requeueing.
Multi-host jobs which are not rerunnable (i.e. those submitted with the qsub -rn option)
will be killed if the requeue argument is configured for the multinodebusy action and a
vnode becomes busy.
4.8.9.9 Cycle Harvesting Caveats and Restrictions

4.8.9.9.i Cycle Harvesting and Multi-host Jobs
Cycle harvesting is not recommended for hosts that will run multi-host jobs. See section
4.8.9.8.i, “General Advice: Parallel Jobs Not Recommended”, on page 159.
4.8.9.9.ii Cycle Harvesting and Reservations
Cycle harvesting is incompatible with jobs in reservations. Reservations should not be made
on a machine used for cycle harvesting, because the user may appear during the reservation
period and use the machine’s keyboard. This will suspend the jobs in the reservation, defeating the purpose of making a reservation.
To prevent a vnode which is being used for cycle harvesting from being used for reservations,
set the vnode’s resv_enable attribute to False. This attribute controls whether the vnode can
be used for reservations.
4.8.9.9.iii File Transfers with Cycle Harvesting
File transfers behave differently depending on job details. See section 4.8.9.7, “Cycle Harvesting and File Transfers”, on page 159.
4.8.9.9.iv Cycle Harvesting on Windows
• Under Windows, if the pbs_idled process is killed, cycle harvesting will not work.

• Under Windows, cycle harvesting may not work correctly on machines where more than one user is logged in.
4.8.10 Dedicated Time
PBS provides a feature called dedicated time which allows you to define times during which
the only jobs that can run are the ones in dedicated queues. You can use dedicated time for
things like upgrades.
You can define multiple dedicated times. Any job in a dedicated time queue must have a walltime in order to run. Jobs without walltimes will never run. PBS won’t let a reservation conflict with dedicated time. Hooks should not access or modify the dedicated time file.
For information on configuring dedicated time queues, see section 2.2.5.2.i, “Dedicated Time
Queues”, on page 22.
4.8.10.1 Dedicated Time File
You define dedicated time by adding one or more time slots in the file PBS_HOME/sched_priv/dedicated_time. A time slot is a start date and start time, and an end date and end time. Format:

<start date> <start time> <end date> <end time>

expressed as

MM/DD/YYYY HH:MM MM/DD/YYYY HH:MM
Any line whose first non-whitespace character is a pound sign (“#”) is a comment.
Example:
#Dedicated time for maintenance
04/15/2007 12:00 04/15/2007 15:30
A sample dedicated time file (PBS_EXEC/etc/pbs_dedicated) is included in the installation.
The dedicated time file is read on startup and HUP.
4.8.10.2 Steps in Defining Dedicated Time
You define dedicated time by performing the following steps:

1. Edit the file PBS_HOME/sched_priv/dedicated_time and add one or more time slots.

2. HUP or restart the scheduler:

   UNIX/Linux:
   kill -HUP <pbs_sched PID>

   Windows:
   net stop pbs_sched
   net start pbs_sched
4.8.10.3 Recommendations for Dedicated Time
If you need to set up dedicated time for something like system maintenance, you may want to
avoid having the machines become idle for a significant period before dedicated time starts.
You can allow jobs to shrink their walltimes to fit into those shorter-than-normal slots before
dedicated time. See section 4.8.41, “Using Shrink-to-fit Jobs”, on page 279.
4.8.11 Dependencies
PBS allows job submitters to specify dependencies between jobs, for example specifying that
job J2 can only run if job J1 finishes successfully. You can add dependencies to jobs via a
hook, default arguments to qsub, or via the qalter command.
For a description of how job dependencies work, see "Using Job Dependencies", on page 164
of the PBS Professional User’s Guide.
For how to use hooks, see "Hooks", on page 437.
For how to add default qsub arguments, see “Server Attributes” on page 332 of the PBS Professional Reference Guide.
For how to use the qalter command, see “qalter” on page 135 of the PBS Professional Reference Guide.
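For example, to make one job run only after another finishes successfully, you could use qalter to set the dependency (job IDs are illustrative):

   qalter -W depend=afterok:101.host1 102.host1

Here job 102.host1 is held until job 101.host1 exits with success.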
4.8.12 Dynamic Resources
You can use dynamic PBS resources to represent elements that are outside of the control of
PBS, typically for licenses and scratch space. You can represent elements that are available to
the entire PBS complex as server-level resources, or elements that are available at a specific
host or hosts as host-level resources. For an example of configuring a server-level dynamic
resource, see section 5.14.4.1.i, “Example of Configuring Dynamic Server-level Resource”,
on page 359. For an example of configuring a dynamic host-level resource, see section
5.14.5.1.i, “Example of Configuring Dynamic Host-level Resource”, on page 362.
For a complete description of how to create and use dynamic resources, see section 5.14,
“Custom Resources”, on page 337.
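As a sketch of the general pattern (the resource name and script path are illustrative; the resource must also be created and added to the scheduler's resources line as described in section 5.14), a server-level dynamic resource is declared in PBS_HOME/sched_priv/sched_config:

   server_dyn_res: "scratch !/usr/local/bin/scratch_avail"

and a host-level dynamic resource is declared similarly in the MoM's PBS_HOME/mom_priv/config:

   scratch !/usr/local/bin/scratch_avail

In each case, the script reports the currently available amount of the resource.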
4.8.13 Eligible Wait Time for Jobs
PBS provides a method for tracking how long a job that is eligible to run has been waiting to
run. By “eligible to run”, we mean that the job could run if the required resources were available. The time that a job waits while it is not running can be classified as “eligible” or “ineligible”. Roughly speaking, a job accrues eligible wait time when it is blocked due to a
resource shortage, and accrues ineligible wait time when it is blocked due to project, user, or
group limits. A job can be accruing any of the following kinds of time. A job can only accrue
one kind of wait time at a time, and cannot accrue wait time while it is running.
4.8.13.1 Types of Time Accrued
eligible_time
Job attribute. The amount of wall clock wait time a job has accrued because the job is
blocked waiting for resources, or any other reason not covered by ineligible_time.
For a job currently accruing eligible_time, if we were to add enough of the right type
of resources, the job would start immediately. Viewable via qstat -f by job
owner, Manager and Operator. Settable by Operator or Manager.
ineligible_time
The amount of wall clock time a job has accrued because the job is blocked by limits
on the job’s project, owner, or group, or because the job is blocked because of its
state.
run_time
The amount of wall clock time a job has spent running.
exiting
The amount of wall clock time a job has spent exiting.
initial_time
The amount of wall clock wait time a job has accrued before the type of wait time
has been determined.
4.8.13.2 How Eligible Wait Time Works
A job accrues ineligible_time while it is blocked by project, user, or group limits, such as:

max_run
max_run_soft
max_run_res.<resource>
max_run_res_soft.<resource>
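For example, a job blocked by a server run limit such as the following accrues ineligible_time (the limit value is illustrative):

   qmgr -c 'set server max_run = "[u:PBS_GENERIC=5]"'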
A job also accrues ineligible_time while it is blocked due to a user hold or while it is waiting
for its start time, such as when submitted via
qsub -a <run-after> …
A job accrues eligible_time when it is blocked by a lack of resources, or by anything not qualifying as ineligible_time or run_time. A job’s eligible_time will only increase during the life
of the job, so if the job is requeued, its eligible_time is preserved, not set to zero. The job’s
eligible_time is not recalculated when a job is qmoved or moved due to peer scheduling.
For information on project, user, and group limits, see section 5.15.1, “Managing Resource
Usage By Users, Groups, and Projects, at Server & Queues”, on page 389.
The kind of time a job is accruing is sampled periodically, with a granularity of seconds.
A job’s eligible_time attribute can be viewed via qstat -f.
4.8.13.3 Configuring Eligible Wait Time
To enable using eligible time as the job’s wait time, set the eligible_time_enable server
attribute to True.
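For example, using qmgr:

   qmgr -c 'set server eligible_time_enable = True'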
4.8.13.4 How Eligible Wait Time Is Used
• If eligible time is enabled, it is used as each job's starving time.

• You can choose to use each job's eligible wait time as the amount of time it is starving. See section 4.8.46, "Starving Jobs", on page 296.

• When a job is requeued, for example being checkpointed and aborted, or preempted, its accumulated queue waiting time depends on how that time is calculated:

  • If you are using eligible time, the accumulated waiting time is preserved

  • If you are not using eligible time, the accumulated waiting time is lost

  See section 9.3, "Checkpoint and Restart", on page 857 and section 4.8.33, "Using Preemption", on page 241.
4.8.13.5 Altering Eligible Time
A Manager or Operator can set the value for a job’s eligible_time attribute using the qalter
command, for example:
qalter -Weligible_time=<time> <job ID>
4.8.13.6 Attributes Affecting Eligible Time
eligible_time_enable
Server attribute. Enables accumulation of eligible time for jobs. Controls whether a
job’s eligible_time attribute is used as its starving time. See section 4.8.46, “Starving Jobs”, on page 296.
On an upgrade from versions of PBS prior to 9.1 or on a fresh install,
eligible_time_enable is set to False by default.
When eligible_time_enable is set to False, PBS does not track eligible_time.
Whether eligible_time continues to accrue for a job or not is undefined. The output
of qstat -f does not include eligible_time for any job. Accounting logs do not
show eligible_time for any job submitted before or after turning
eligible_time_enable off. Log messages do not include accrual messages for any
job submitted before or after turning eligible_time_enable off. If the scheduling
formula includes eligible_time, eligible_time evaluates to 0 for all jobs.
When eligible_time_enable is changed from False to True, jobs accrue
eligible_time or ineligible_time or run_time as appropriate. A job’s eligible_time is
used for starving calculation starting with the next scheduling cycle; changing the
value of eligible_time_enable does not change the behavior of an active scheduling
cycle.
accrue_type
Job attribute. Indicates what kind of time the job is accruing.
Table 4-7: The accrue_type Job Attribute

Type              Numeric Representation   Kind of Time
----------------  -----------------------  ---------------
JOB_INITIAL       0                        initial_time
JOB_INELIGIBLE    1                        ineligible_time
JOB_ELIGIBLE      2                        eligible_time
JOB_RUNNING       3                        run_time
JOB_EXIT          4                        exit_time
The job’s accrue_type attribute is visible via qstat only by Manager, and is set
only by the server.
eligible_time
Job attribute. The amount of wall clock wait time a job has accrued because the job
is blocked waiting for resources, or any other reason not covered by ineligible_time.
For a job currently accruing eligible_time, if we were to add enough of the right type
of resources, the job would start immediately. Viewable via qstat -f by job
owner, Manager and Operator. Settable by Operator or Manager.
4.8.13.7 Logging
The server prints a log message every time a job changes its accrue_type, with both the new
accrue_type and the old accrue_type. These are logged at the 0x0400 event class.
Server logs for this feature display the following information:

• Time accrued between samples

• The type of time in the previous sample, which is one of initial time, run time, eligible time, or ineligible time

• The next type of time to be accrued, which is one of run time, eligible time, or ineligible time

• The eligible time accrued by the job, if any, until the current sample
Example:
08/07/2007 13:xx:yy;0040;Server@host1;Job;163.host1;job accrued 0 secs of
initial_time, new accrue_type=eligible_time, eligible_time=00:00:00
08/07/2007 13:xx:yy;0040;Server@host1;Job;163.host1;job accrued 1821 secs
of eligible_time, new accrue_type=ineligible_time,
eligible_time=01:20:22
08/07/2007 13:xx:yy;0040;Server@host1;Job;163.host1;job accrued 2003 secs
of ineligible_time, new accrue_type=eligible_time,
eligible_time=01:20:22
08/07/2007 13:xx:yy;0040;Server@host1;Job;163.host1;job accrued 61 secs of
eligible_time, new accrue_type=run_time, eligible_time=01:21:23
08/07/2007 13:xx:yy;0040;Server@host1;Job;163.host1;job accrued 100 secs
of run_time, new accrue_type=ineligible_time, eligible_time=01:21:23
08/07/2007 13:xx:yy;0040;Server@host1;Job;163.host1;job accrued 33 secs of
ineligible_time, new accrue_type=eligible_time, eligible_time=01:21:23
08/07/2007 13:xx:yy;0040;Server@host1;Job;163.host1;job accrued 122 secs
of eligible_time, new accrue_type=run_time, eligible_time=01:23:25
08/07/2007 13:xx:yy;0040;Server@host1;Job;163.host1;job accrued 1210 secs
of run_time, new accrue_type=exiting, eligible_time=01:23:25
The example shows the following changes in time accrual:

• initial to eligible

• eligible to ineligible

• ineligible to eligible

• eligible to running

• running to ineligible

• ineligible to eligible

• eligible to running

• running to exiting
The server also logs the change in accrual when the job’s eligible_time attribute is altered
using qalter. For example, if the job’s previous eligible time was 123 seconds, and it has
been altered to be 1 hour and 1 minute:
Accrue type is eligible_time, previous accrue type was eligible_time for
123 secs, due to qalter total eligible_time=01:01:00
4.8.13.8 Accounting
Each job's eligible_time attribute is included in the “E” and “R” records in the PBS accounting logs. See “Record Types” on page 440 of the PBS Professional Reference Guide.
Example:
08/07/2007 19:34:06;E;182.Host1;user=user1 group=user1 jobname=STDIN
queue=workq ctime=1186494765 qtime=1186494765 etime=1186494765
start=1186494767 exec_host=Host1/0 exec_vnode=(Host1:ncpus=1)
Resource_List.ncpus=1 Resource_List.nodect=1 Resource_List.place=pack
Resource_List.select=1:ncpus=1 session=4656 end=1186495446
Exit_status=-12 resources_used.cpupercent=0
resources_used.cput=00:00:00 resources_used.mem=3072kb
resources_used.ncpus=1 resources_used.vmem=13356kb
resources_used.walltime=00:11:21 eligible_time=00:10:00
4.8.13.9 Caveats for Eligible Time
A job that is dependent on another job can accrue eligible time only after the job on which it
depends has finished.
4.8.14 Sorting Jobs by Entity Shares (Was Strict Priority)
You can sort jobs according to how much of the fairshare tree is allocated to the entity that
owns the job. The fairshare percentages in the fairshare tree describe each entity’s share.
Using entity shares means sorting jobs on a key, using the fair_share_perc option to the job_sort_key scheduler parameter.
Using entity shares, the jobs from an entity with greater allocation in the fairshare tree run
before the jobs with a smaller allocation.
4.8.14.1 Configuring Entity Shares
To configure entity shares, do the following:

• Define fairshare tree entity allocation in PBS_HOME/sched_priv/resource_group. See section 4.8.18, "Using Fairshare", on page 179. You can use a simple fairshare tree, where every entity's parent_group is root.

  • Give each entity shares according to desired priority, with higher-priority entities getting larger allocations.

  • Set the unknown_shares scheduler parameter to 1. This causes any entity not in your list of approved entities to have a tiny allocation, and the lowest priority.

  For example:

  usr1    60    root    5
  usr2    61    root    15
  usr3    62    root    15
  usr4    63    root    10
  usr5    64    root    25
  usr6    65    root    30

• Set fair_share_perc as the option to job_sort_key, for example:

  job_sort_key: "fair_share_perc HIGH all"
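Putting these together, the relevant lines in PBS_HOME/sched_priv/sched_config would be (a sketch; the entity allocations above live in resource_group, not in sched_config):

   unknown_shares: 1
   job_sort_key: "fair_share_perc HIGH all"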
4.8.14.2 Viewing Entity Shares
When you are root, you can use the pbsfs command to view the fairshare tree allocations.
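For example, run as root on the server host (the entity name usr5 refers to the example resource_group above; the -g option prints information for a single entity):

   pbsfs
   pbsfs -g usr5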
4.8.15 Estimating Job Start Time
PBS can estimate when jobs will run, and which vnodes each job will use. PBS estimates job
start times and vnodes for all jobs using an asynchronous process, not the PBS server, scheduler, or MoM daemons.
Jobs have an attribute called estimated for reporting estimated start time and estimated
vnodes. This attribute reports the values of two read-only built-in resources, start_time and
exec_vnode. Each job’s estimated start time is reported in estimated.start_time, and its estimated vnodes are reported in estimated.exec_vnode.
PBS automatically sets the value of each job’s estimated.start_time value to the estimated
start time for each job.
4.8.15.1 Configuring Start Time Estimation
PBS estimates values for start_time and exec_vnode for jobs in the following ways:

• When est_start_time_freq is set to a value greater than zero, PBS estimates values for all jobs at the specified interval

• When est_start_time_freq is set to zero, PBS estimates values for all jobs after each scheduling cycle

• When the scheduler is backfilling around top jobs, it estimates the start times and exec_vnode for those jobs being backfilled around
If you want PBS to estimate the start time for all jobs, either set the est_start_time_freq
server attribute to the interval at which you want PBS to make the calculation, or set it to
zero, and the calculation will be made every scheduling cycle.
You can choose to have estimated start times for just the jobs being backfilled around. You set
the number of jobs to be backfilled around by setting the server’s backfill_depth attribute to
the desired number. See section 4.8.3, “Using Backfilling”, on page 129.
Example 4-1: To estimate start times for the top 5 jobs every scheduling cycle, and for all
jobs every 3 hours:
qmgr -c 'set server backfill_depth=5'
qmgr -c 'set server est_start_time_freq = 3:00:00'
4.8.15.2 Controlling User Access to Start Times and Vnode List

4.8.15.2.i Making Start Time or Vnodes Invisible
You can make job estimated start times and vnodes invisible to unprivileged users by adding
resource permission flags to the start_time or exec_vnode resources. To do this, use qmgr
to add the resource, and include the i flag, in the same way you would for a custom resource
being made invisible.
Example of making start_time and exec_vnode invisible to users:
qmgr -c 'set resource start_time flag=i'
qmgr -c 'set resource exec_vnode flag=i'
You can always make the start time and vnodes visible again to unprivileged users by removing the flags via qmgr.
See section 5.14.2.10, “Resource Permission Flags”, on page 351.
AG-170
PBS Professional 13.0 Administrator’s Guide
Scheduling
4.8.15.2.ii
Chapter 4
Allowing Users to See Only Their Own Job Start Times
If you want users to be able to see the start times for their own jobs, but not those of other
users, set the server’s query_other_jobs attribute to False, and do not set the i or r permission
flags. Setting the server’s query_other_jobs attribute to False prevents a user from seeing
anything about other users’ jobs.
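For example:

   qmgr -c 'set server query_other_jobs = False'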
4.8.15.3 Attributes and Parameters Affecting Job Start Time Estimation
backfill
Scheduler parameter.
Toggle that controls whether PBS uses backfilling. If this is set to True, the scheduler attempts to schedule smaller jobs around higher-priority jobs when using
strict_ordering, as long as running the smaller jobs won't change the start time of the
jobs they were scheduled around. The scheduler chooses jobs in the standard order,
so other high-priority jobs will be considered first in the set to fit around the highestpriority job.
If this parameter is True, the scheduler backfills around starving jobs when
help_starving_jobs is True.
PBS calculates estimated.start_time and estimated.exec_vnode for these jobs at
each scheduler iteration.
Can be used with strict_ordering and help_starving_jobs.
Format: Boolean
Default: True all
See “backfill” on page 298 of the PBS Professional Reference Guide.
backfill_depth
Server attribute.
Modifies backfilling behavior. Only used when the scheduler's backfill parameter is True.
Sets the number of jobs that are to be backfilled around.
Recommendation: set this to less than 100.
Format: Integer
Default: 1
See “Server Attributes” on page 332 of the PBS Professional Reference Guide.
estimated
Job attribute. List of values associated with the job's estimated start time. Used to report the job's start_time and exec_vnode information. Can be set in a hook or via qalter, but PBS will overwrite the values. Allowable values: start_time, exec_vnode
Format: estimated.start_time, estimated.exec_vnode
Default: Unset
Python attribute value type: Dictionary: estimated.<resource name>=<value>, where <resource name> is any resource
est_start_time_freq
Server attribute.
Interval at which PBS calculates estimated start times and vnodes for all jobs.
Best value is workload-dependent. Recommendation: set this to two hours.
When set to 0, PBS estimates start times for all jobs after each scheduling cycle.
Format: Duration
Default: Unset
See “Server Attributes” on page 332 of the PBS Professional Reference Guide.
help_starving_jobs
Scheduler parameter.
Setting this option enables starving job support. Once jobs have waited for the
amount of time given by max_starve they are considered starving. If a job is considered starving, then no lower-priority jobs will run until the starving job can be run,
unless backfilling is also specified. To use this option, the max_starve configuration
parameter needs to be set as well. See also backfill, max_starve, and the server’s
eligible_time_enable attribute.
Format: Boolean
Default: True all
strict_ordering
Specifies that jobs must be run in the order determined by whatever sorting parameters are being used. This means that a job cannot be skipped due to resources
required not being available. If a job due to run next cannot run, no job will run,
unless backfilling is used, in which case jobs can be backfilled around the job that is
due to run next.
Example line in PBS_HOME/sched_priv/sched_config:
strict_ordering: True ALL
Format: Boolean.
Default: False all
4.8.15.4 Viewing Estimated Start Times
You can view the estimated start times and vnodes of jobs using the qstat command. If you
use the -T option to qstat when viewing job information, the Est Start field is displayed.
Running jobs are shown above queued jobs.
See “qstat” on page 210 of the PBS Professional Reference Guide.
If the estimated start time or vnode information is invisible to unprivileged users, no estimated start time or vnode information is available via qstat.
Example output (column alignment approximate):

qstat -T
                                                     Req'd  Req'd   Est
Job ID    Username Queue  Jobname  SessID NDS TSK Memory Time   S Start
--------- -------- ------ -------- ------ --- --- ------ -----  - -----
5.host1   user1    workq  foojob    12345   1   1  128mb 00:10  R
9.host1   user1    workq  foojob       --   1   1  128mb 00:10  Q 11:30
10.host1  user1    workq  foojob       --   1   1  128mb 00:10  Q Tu 15
7.host1   user1    workq  foojob       --   1   1  128mb 00:10  Q Jul
8.host1   user1    workq  foojob       --   1   1  128mb 00:10  Q 2010
11.host1  user1    workq  foojob       --   1   1  128mb 00:10  Q >5yrs
13.host1  user1    workq  foojob       --   1   1  128mb 00:10  Q --

4.8.15.5 Selecting Jobs By Estimated Start Time
You can use the qselect command to select jobs according to their start times by using the
-t suboption to the -t option. This selects jobs according to the value of the estimated.start_time attribute. See “qselect” on page 198 of the PBS Professional Reference
Guide.
4.8.15.6 Logging
Whenever the scheduler estimates the start time of a job, it logs the start time. The scheduler
does not log the estimated exec_vnode of a job.
4.8.15.7 Caveats and Advice
• The estimated.start_time of a job array is the time calculated for the first queued subjob only.

• Cached estimated start times are only as fresh as the last time PBS calculated them. This should be taken into account when setting the values of est_start_time_freq and backfill_depth.

• The frequency of calculating start times is a trade-off between having more current start time information and using fewer computing cycles for non-job work. The background task of calculating start times can be computationally intensive. This should be taken into account when setting the value of est_start_time_freq. Depending on the size of your site, it is probably a good idea not to set it to less than 10 minutes.

• The best value for est_start_time_freq is workload-dependent, but we recommend setting it to two hours as a starting point.

• If your site has short scheduling cycles of a few minutes, and can use backfilling (and at least one of strict ordering or starving jobs), you can have the start times for all jobs calculated at each scheduling cycle. To do this, set backfill_depth to a value greater than the number of jobs the site will ever have, and do not set est_start_time_freq.

• We recommend setting backfill_depth to a value that is less than 100.

• The process of computing the estimated start time for jobs is not instantaneous.

• Note that setting backfill_depth changes your scheduling policy. See section 4.8.3, "Using Backfilling", on page 129.
4.8.16 Calculating Job Execution Priority
When the scheduler examines jobs, either at the whole complex or within a queue, it gives
each job an execution priority, and then uses this job execution priority to select which job(s)
to run. Job execution priority is mostly independent of job preemption priority. We discuss
only job execution priority in this section.
Some of the scheduler’s policy for determining job execution priority is built into PBS, but
you can specify how execution priority is determined for most of the policy.
First, the scheduler divides queued jobs into classes. Then it sorts the jobs within each class.
4.8.16.1 Dividing Jobs Into Classes
PBS groups all jobs into classes, and handles one class at a time. There are special classes
that supersede queue order, meaning that whether or not queues are being examined separately, the jobs in each of those classes are handled before the scheduler takes queues into
account. Those jobs are not ordered according to which queue they reside in. For example,
all starving jobs are handled as a group. PBS has one non-special class called Normal for all
non-special jobs. This class typically contains most PBS jobs. Queue order is imposed on
this class, meaning that queue priority affects job execution order if queues are being handled
separately.
Job execution classes have a built-in order of precedence. All jobs in the highest class are
considered before any jobs in the next class, and so on. Classes are listed in the following
table, highest first:
Table 4-8: Job Execution Classes

Class        Description                                    Sort Applied Within Class
-----------  ---------------------------------------------  -----------------------------------------
Reservation  Jobs submitted to an advance or standing       Formula, job sort key, submission time
             reservation

Express      All jobs with preemption priority higher than  First by preemption priority, then by
             normal jobs. Preemption priority is defined    preemption time, then starving time, then
             in the scheduler's preempt_prio parameter.     by formula or fairshare or job sort key,
             Jobs are sorted into this class only when      followed by job submission time
             preemption is enabled. See section 4.8.33,
             "Using Preemption", on page 241.

Preempted    All jobs that have been preempted. See         First by preemption time, then starving
             section 4.8.33, "Using Preemption", on         time, then by formula or fairshare or job
             page 241.                                      sort key, followed by job submission time

Starving     Starving jobs. Jobs are sorted into this       Amount of time counted toward starving,
             class only when starving is enabled by         then by formula or fairshare or job sort
             setting help_starving_jobs to True. See        key, followed by job submission time
             section 4.8.46, "Starving Jobs", on page 296.

Normal       Jobs that do not belong in any of the          Queue order, if it exists, then formula,
             special classes                                fairshare, or job sort key, followed by
                                                            job submission time

4.8.16.2 Selecting Job Execution Class
The scheduler places each job in the highest-priority class into which the job can fit. So, for
example, if a job is both in a reservation and is starving, the job is placed in the Reservation
class.
4.8.16.3 Sorting Jobs Within Classes
Jobs within each class are sorted according to rules specific to each class. The sorting applied
to each class is listed in Table 4-8, “Job Execution Classes,” on page 175.
•
•
The Reservation class is made up of all jobs in reservations.
•
The Reservation class is sorted within each reservation.
•
The first sort is according to the formula or job_sort_key, depending on which is
defined.
•
The second sort key is submission time.
The Express class is made up of all the jobs that have a higher priority than
“normal_jobs” in the preempt_prio scheduler parameter.
•
The Express class is sorted first by applying the rules for preemption priority you set
in the scheduler’s preempt_prio parameter, making preemption priority the first sort
key.
•
The second sort key is the time the job was preempted (if that happened), with the
earliest-preempted job having the highest priority (in this sort).
•
The third sort key is the job’s starving time, if any.
•
The fourth sort key is the formula, fairshare, or job_sort_key, depending on which is
defined.
•
The fifth sort key is job submission time.
Jobs are sorted into this class only when preemption is enabled. See section 4.8.33,
“Using Preemption”, on page 241. Please note that execution priority classes are distinct
from preemption levels, and are used for different purposes.
For example, if preempt_prio is the following:
preempt_prio: “express_queue, starving_jobs, normal_jobs”
The Express class contains all jobs that have preemption priority that is greater than that
of normal jobs. In this example, the Express class is prioritized with top priority for
express queue jobs, followed by starving jobs.
Since preemption levels are applied so that a job is put into the highest preemption level
possible, in this example, all starving jobs end up in the Express class.
•  The Preempted class is made up of all preempted jobs.
   •  The first sort key is the time the job was preempted, with the earliest-preempted job having the highest priority (in this sort).
   •  The second sort key is the job's starving time, if any.
   •  The third sort key is the formula, fairshare, or job_sort_key, depending on which is defined.
   •  The fourth sort key is job submission time.
   When you set the sched_preempt_enforce_resumption scheduler attribute and the backfill and strict_ordering parameters to True, the scheduler tries harder to run preempted jobs. By default the attribute is False, and in each scheduling cycle, if a top job cannot run now, the scheduler moves on to the next top job and tries to run it. When the attribute and the parameters are True, the scheduler treats each preempted job like a top job: it makes sure that no lower-priority job will delay the job, and it backfills around the job.
•  The Starving class is made up of all jobs whose wait time qualifies them as starving. Jobs are sorted into this class only when starving is enabled. See section 4.8.46, "Starving Jobs", on page 296.
   •  The Starving class is sorted first according to the amount of time that counts toward starving for each job. You can use queue wait time or eligible time as starving time.
   •  The second sort key is the time the job was preempted (if that happened), with the earliest-preempted job having the highest priority (in this sort).
   •  The third sort key is the formula, fairshare, or job_sort_key, depending on which is defined.
   •  The fourth sort key is job submission time.
•  The Normal class is for any jobs that don't fall into any of the other classes. Most jobs are in this class.
   •  If queue ordering exists (there are multiple queues, the queues have different priorities set, and round_robin or by_queue is True), jobs are sorted first by queue order.
   •  If defined, the formula, fairshare, or job sort key is the second sort key.
   •  The third sort key is job submission time.
4.8.16.3.i Precedence of Sort Method Used Within Class
If the formula is defined, it overrides both fairshare and the job sort key. If fairshare is defined, it overrides the job sort key. If none of these are defined, jobs are ordered by their arrival time in the queue.
For the job sorting formula, see section 4.8.20, “Using a Formula for Computing Job Execution Priority”, on page 194.
For fairshare, see section 4.8.18, “Using Fairshare”, on page 179.
For sorting jobs on a key, see section 4.8.43, “Sorting Jobs on a Key”, on page 292.
4.8.16.4 Execution Priority Caveats
•  Limits are not taken into account when prioritizing jobs for execution. Limits are checked only after setting priority, when selecting a job to run. The only exception is in the Express class, where soft limits may be taken into account, because execution priority for Express class jobs is calculated using preemption priority. For details, see section 4.8.33, "Using Preemption", on page 241.
•  When you issue "qrun <job ID>" without the -H option, the selected job has execution priority between Reservation and Express.
•  Jobs are sorted into the Express class only when preemption is enabled. Similarly, jobs are sorted into the Starving class only when starving is enabled.
4.8.17 Express Queues
An express queue is a queue whose priority is high enough to qualify it as an express queue; the default cutoff for qualification is a priority of 150, but the cutoff can be set using the preempt_queue_prio scheduler parameter. For information on configuring express queues, see section 2.2.5.3.i, "Express Queues", on page 23.
You can use express queues as tools to manage job execution and preemption priority.
•  You can set up execution priority levels that include jobs in express queues. For information on configuring job priorities in the scheduler, see section 4.8.16, "Calculating Job Execution Priority", on page 174.
•  You can set up preemption levels that include jobs in express queues. For information on preemption, see section 4.8.33, "Using Preemption", on page 241.
The term "express" is also used in calculating execution priority to mean all jobs that have a preemption level greater than that of the normal_jobs level.
4.8.18 Using Fairshare
Fairshare provides a way to enforce a site's resource usage policy. It is a method for ordering
the start times of jobs based on two things: how a site's resources are apportioned, and the
resource usage history of site members. Fairshare ensures that jobs are run in the order of
how deserving they are. The scheduler performs the fairshare calculations each scheduling
cycle. If fairshare is enabled, all jobs have fairshare applied to them and there is no exemption from fairshare.
The administrator can employ basic fairshare behavior, or can apply a policy of the desired
complexity.
The fair_share parameter is a primetime option, meaning that you can enable it for primetime only, for non-primetime only, or for all of the time. However, you cannot configure different fairshare behaviors for primetime and non-primetime; fairshare works the same way whenever it is enabled.
4.8.18.1 Outline of How Fairshare Works
The owner of a PBS job can be defined for fairshare purposes to be a user, a group, the job’s
accounting string, etc. For example, you can define owners to be groups, and can explicitly
set each group’s relationship to all the other groups by using the tree structure. If you don’t
explicitly list an owner, it will fall into the “unknown” catchall. All owners in “unknown” get
the same resource allotment. You can define one group to be part of a larger department.
You specify which resources to track and how you want usage to be calculated. So if you
defined job owners to be groups, then only the usage of groups is considered. PBS tries to
ensure that each owner gets the amount of resources that you have set for it.
4.8.18.2 The Fairshare Tree
Fairshare uses a tree structure, where each vertex in the tree represents some set of job owners
and is assigned usage shares. Shares are used to apportion the site’s resources. The default
tree always has a root vertex and an unknown vertex. The default behavior of fairshare is to
give all users the same amount of the resource being tracked. In order to apportion a site's
resources according to a policy other than equal shares for each user, the administrator creates
a fairshare tree to reflect that policy. To do this, the administrator edits the file PBS_HOME/sched_priv/resource_group, which describes the fairshare tree.
4.8.18.3 Enabling Basic Fairshare
If the default fairshare behavior is enabled, PBS enforces basic fairshare rules where all users
with queued jobs will get an equal share of CPU time. The root vertex of the tree will have
one child, the unknown vertex. All users will be put under the unknown vertex, and appear
as children of the unknown vertex.
Enable basic fairshare by doing the following:
•  In PBS_HOME/sched_priv/sched_config, set the scheduler configuration parameter fair_share to True
•  Uncomment the unknown_shares setting so that it is set to unknown_shares: 10
•  Specify how you want fairshare to work with primetime and non-primetime. If you want fairshare enabled during only one of primetime and non-primetime, list the fair_share parameter twice, once for each time slot. The default is both. For example:
   fair_share: True prime
   fair_share: False non_prime
Note that a variant of basic fairshare has all users listed in the tree as children of root, with each user assigned its own number of shares. You must create this tree yourself, by editing PBS_HOME/sched_priv/resource_group.
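For example (hypothetical usernames and IDs, using the file format described in section 4.8.18.4.iii), such a tree might list each user directly under root with its own share count:
   bob    60    root    30
   tom    61    root    20
   mary   62    root    10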
4.8.18.4 Using Fairshare to Enforce Policy
The administrator sets up a hierarchical tree structure made up of interior vertices and leaves.
Interior vertices are departments, which can contain both departments and leaves. Leaves are
for fairshare entities, defined by setting fairshare_entity to one of the following: euser,
egroup, egroup:euser, Account_Name, or queue. Apportioning of resources for the site
is among these entities. These entities' usage of the designated resource is used in determining the start times of the jobs associated with them. All fairshare entities must be the same
type. If you wish to have a user appear in more than one department, you can use
egroup:euser to distinguish between that user's different resource allotments.
Table 4-9: Using Fairshare Entities

Keyword: euser
Fairshare entities: Usernames.
Purpose: Individual users are allotted shares of the resource being tracked. Each username may only appear once, regardless of group.

Keyword: egroup
Fairshare entities: OS group names.
Purpose: Groups as a whole are allotted shares of the resource being tracked.

Keyword: egroup:euser
Fairshare entities: Combinations of username and group name.
Purpose: Useful when a user is a member of more than one group, and needs to use a different allotment in each group.

Keyword: Account_Name
Fairshare entities: Account IDs.
Purpose: Shares are allotted by account string (the Account_Name job attribute).

Keyword: queue
Fairshare entities: Queues.
Purpose: Shares are allotted between queues.
4.8.18.4.i Shares in the Tree
The administrator assigns shares to each vertex in the tree. The actual number of shares given
to a vertex or assigned in the tree is not important. What is important is the ratio of shares
among each set of sibling vertices. Competition for resources is between siblings only. The
sibling with the most shares gets the most resources.
4.8.18.4.ii Shares Among Unknown Entities
The root vertex always has a child called unknown. Any entity not listed in PBS_HOME/sched_priv/resource_group will be made a child of unknown, designating the entity
as unknown. The shares used by unknown entities are controlled by two parameters in
PBS_HOME/sched_priv/sched_config: unknown_shares and
fairshare_enforce_no_shares.
The parameter unknown_shares controls how many shares are assigned to the unknown
vertex. The default sched_config file contains this line:
#unknown_shares 10
If you leave unknown_shares commented out, the unknown vertex will have 0 shares. If
you simply remove the “#”, the unknown vertex's shares default to 10. The children of the
unknown vertex have equal amounts of the shares assigned to the unknown vertex.
The parameter fairshare_enforce_no_shares controls whether an entity without any shares
can run jobs. If fairshare_enforce_no_shares is True, then entities without shares cannot
run jobs. If it is set to False, entities without any shares can run jobs, but only when no other
entities’ jobs are available to run.
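For example, to give the unknown vertex 10 shares and prevent entities without shares from running jobs at all, PBS_HOME/sched_priv/sched_config would contain lines like these (illustrative values):
   unknown_shares: 10
   fairshare_enforce_no_shares: True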
4.8.18.4.iii Format for Describing the Tree
The file describing the fairshare tree contains four columns to describe the vertices in the tree.
Here is the format for the columns:
<Vertex name> <vertex fairshare ID> <parent of vertex> <#shares>
The columns are for a vertex's name, its fairshare ID, the name of its parent vertex, and the
number of shares assigned to this (not the parent) vertex. Vertex names and IDs must be
unique. Vertex IDs are integers. The top row in resource_group contains information for
the first vertex, rather than column labels.
Neither the root vertex nor the unknown vertex is described in PBS_HOME/sched_priv/resource_group. They are always added automatically. Parent vertices must be listed
before their children.
For example, we have a tree with two top-level departments, Math and Phys. Under Math are the users Bob and Tom as well as the department Applied. Under Applied are the users Mary and Sally. Under Phys are the users John and Joe. Our PBS_HOME/sched_priv/resource_group looks like this:
   Math     100  root     30
   Phys     200  root     20
   Applied  110  Math     25
   Bob      101  Math     15
   Tom      102  Math     10
   Mary     111  Applied  1
   Sally    112  Applied  2
   John     201  Phys     2
   Joe      202  Phys     2
If you wish to use egroup:euser as your entity, with Bob in the two UNIX/Windows groups pbsgroup1 and pbsgroup2, and Tom in the two groups pbsgroup2 and pbsgroup3:
   Math           100  root     30
   Phys           200  root     20
   Applied        110  Math     20
   pbsgroup1:Bob  101  Phys     20
   pbsgroup2:Bob  102  Math     20
   pbsgroup2:Tom  103  Math     10
   pbsgroup3:Tom  104  Applied  10
A user's egroup, unless otherwise specified, defaults to the user's primary UNIX/Windows group. When a user submits a job using -Wgroup_list=<group>, the job's egroup will be <group>. For example, user Bob is in pbsgroup1 and pbsgroup2. Bob uses "qsub -Wgroup_list=pbsgroup1" to submit a job that will be charged to pbsgroup1, and "qsub -Wgroup_list=pbsgroup2" to submit a job that will be charged to pbsgroup2.
The first and third fields are alphanumeric. The second and fourth fields are numeric. Fields
can be separated by spaces and tabs.
4.8.18.4.iv Computing How Much Each Vertex Deserves
How much resource usage each entity deserves is its portion of all the shares in the tree,
divided by its past and current resource usage.
A vertex's portion of all the shares in the tree is called tree percentage. It is computed for all
of the vertices in the tree. Since the leaves of the tree represent the entities among which
resources are to be shared, their tree percentage sums to 100 percent.
The scheduler computes the tree percentage for the vertices this way:
First, it gives the root of the tree a tree percentage of 100 percent. It proceeds down the tree,
finding the tree percentage first for immediate children of root, then their children, ending
with leaves.
1. For each internal vertex A, sum the shares of A's children.
2. For each child J of vertex A, divide J's shares by that sum to normalize them, then multiply J's normalized shares by vertex A's tree percentage to find J's tree percentage, as sketched below.
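To make the arithmetic concrete, here is a minimal Python sketch of this computation (illustrative only, not PBS source code):

   # Sketch of tree-percentage computation (not PBS source).
   # shares: share count per vertex, taken from resource_group.
   # children: lists of child vertex names per internal vertex.
   def tree_percentage(children, shares):
       pct = {"root": 100.0}
       stack = ["root"]
       while stack:
           a = stack.pop()
           kids = children.get(a, [])
           total = sum(shares[j] for j in kids)  # step 1: sum the children's shares
           for j in kids:
               # step 2: normalize J's shares, then scale by A's tree percentage
               pct[j] = shares[j] / total * pct[a]
               stack.append(j)
       return pct

For the resource_group example above, root's children Math (30 shares) and Phys (20 shares) receive tree percentages of 60 and 40 percent; Bob, holding 15 of the 50 shares under Math, receives 60 x 15/50 = 18 percent.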
4.8.18.5 Tracking Resource Usage
You choose which resources to track and how to compute the usage by setting the fairshare_usage_res scheduler configuration parameter in PBS_HOME/sched_priv/sched_config to the formula you want. This parameter can contain the following:
•  Built-in and custom job resources. When you use a resource in the formula, if a value exists for resources_used.<resource>, this value is used in the formula. Otherwise, the value is taken from Resource_List.<resource>.
•  Mathematical operators. You can use standard Python operators and operators in the Python math module.
The default for the tracked resource is cput, CPU time.
An entity's usage always starts at 1. Resource usage tracking begins when the scheduler is
started. Each scheduler cycle, the scheduler adds the usage increment between this cycle and
the previous cycle to its sum for the entity. Each entity's usage is decayed, or reduced periodically, at the interval set in the fairshare_decay_time parameter in PBS_HOME/sched_priv/sched_config. This interval defaults to 24 hours.
This means that an entity with a lot of current or recent usage will have low priority for starting jobs, but if the entity cuts resource usage, its priority will go back up after a few decay
cycles.
A static resource will not change its usage from one cycle to the next. If you use a static
resource such as ncpus, the amount being tracked will not change during the lifetime of the
job; it will only be added once when the job starts.
Note that if a job ends between two scheduling cycles, its resource usage for the time between
previous scheduling cycle and the end of the job will not be recorded. The scheduler's default
cycle interval is 10 minutes. The scheduling cycle can be adjusted via the qmgr command.
Use
Qmgr: set server scheduler_iteration=<new value>
If the formula in fairshare_usage_res evaluates to a negative number, we use zero instead.
So there is no way to accumulate negative usage.
4.8.18.6 Setting Decay Interval and Factor
You set the interval at which usage is decayed by setting the fairshare_decay_time scheduler parameter to the desired time interval. The default value for this interval is 24 hours. For example, to set this interval to 14 hours and 23 minutes, put this line in PBS_HOME/sched_priv/sched_config:
fairshare_decay_time: 14:23:00
You set the decay factor by setting the fairshare_decay_factor scheduler parameter to the desired multiplier for usage. At each decay interval, the usage is multiplied by the decay factor. This parameter is a float whose value must be strictly between 0 and 1. The default value for this multiplier is 0.5. For example, to set this multiplier to 70 percent, put this line in PBS_HOME/sched_priv/sched_config:
fairshare_decay_factor: .7
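As a rough illustration of the decay (a sketch in Python, not PBS source), usage shrinks geometrically at each interval boundary:

   usage = 10000.0      # accumulated usage, e.g. cput seconds
   factor = 0.7         # fairshare_decay_factor
   for interval in range(3):
       usage *= factor  # applied once per fairshare_decay_time interval
   # usage is now 10000 * 0.7**3 = 3430.0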
4.8.18.7 Examples of Setting Fairshare Usage
To use CPU time as the resource to be tracked, put this line in PBS_HOME/sched_priv/sched_config:
fairshare_usage_res: cput
To use ncpus multiplied by walltime as the resource to be tracked, put this line in PBS_HOME/sched_priv/sched_config:
fairshare_usage_res: ncpus*walltime
An example of a more complex formula:
fairshare_usage_res: "ncpus*pow(walltime,.25)*fs_factor"
4.8.18.8 Fairshare Formula Advice
We recommend including a time-based resource in the fairshare formula so that usage will
grow over time.
4.8.18.9 Finding the Most Deserving Entity
The most deserving entity is found by starting at the root of the tree, comparing its immediate
children, finding the most deserving, then looking among that vertex's children for the most
deserving child. This continues until a leaf is found. In a set of siblings, the most deserving
vertex will be the vertex with the lowest ratio of resource usage divided by tree percentage.
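A compact Python sketch of this descent (illustrative only; the data layout is hypothetical, not PBS source):

   class Vertex:
       def __init__(self, usage, tree_pct, children=()):
           self.usage = usage        # decayed resource usage
           self.tree_pct = tree_pct  # vertex's tree percentage
           self.children = list(children)

   def most_deserving(root):
       v = root
       while v.children:
           # among siblings, the lowest usage / tree percentage ratio wins
           v = min(v.children, key=lambda c: c.usage / c.tree_pct)
       return v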
4.8.18.10 Choosing Which Job to Run
The job to be run next will be selected from the set of jobs belonging to the most deserving entity. The jobs belonging to the most deserving entity are sorted according to the methods the scheduler normally uses. This means that fairshare effectively becomes the primary sort key. If the most deserving job cannot run, then the next most deserving job is selected to run, and so forth. All of the most deserving entity's jobs are examined first, then those of the next most deserving entity, et cetera.
At each scheduling cycle, the scheduler attempts to run as many jobs as possible. It selects
the most deserving job, runs it if it can, then recalculates to find the next most deserving job,
runs it if it can, and so on.
When the scheduler starts a job, all of the job's requested usage is added to the sum for the
owner of the job for one scheduling cycle. The following cycle, the job’s usage is set to the
actual usage used between the first and second cycles. This prevents one entity from having
all its jobs started and using up all of the resource in one scheduling cycle.
4.8.18.11 Files and Parameters Used in Fairshare
PBS_HOME/sched_priv/sched_config
The following parameters from PBS_HOME/sched_priv/sched_config are used in fairshare:

Table 4-10: PBS_HOME/sched_priv/sched_config Parameters Used in Fairshare

fair_share: [True/False] Enable or disable fairshare.
fairshare_usage_res: Resource whose usage is to be tracked; default is cput.
fairshare_decay_factor: Amount by which to decay usage at each decay interval.
fairshare_decay_time: Decay time period; default is 24 hours.
unknown_shares: Number of shares for the unknown vertex; defaults to 10 when the line is uncommented, 0 if it is commented out.
fairshare_entity: The kind of entity that has fairshare applied to it. Leaves in the tree are this kind of entity. Default: euser.
fairshare_enforce_no_shares: Controls whether an entity without any shares can run jobs. True: an entity with no shares cannot run jobs. False: an entity with no shares can run jobs, but only when no other entities' jobs are available to run.
by_queue: If True, queues cannot be designated as fairshare entities, and fairshare will work queue by queue instead of on all jobs at once.

PBS_HOME/sched_priv/resource_group
Contains the description of the fairshare tree.

PBS_HOME/sched_priv/usage
Contains the usage database.
qmgr
Used to set scheduler cycle frequency; default is 10 minutes:
Qmgr: set server scheduler_iteration=<new value>

Job attributes
Used to track resource usage via resources_used.<resource>. Default is cput.
4.8.18.12 Fairshare for Complex or Within Queues
You can use fairshare to compare all jobs in the complex, or within each queue. Fairshare
within a queue means that the scheduler examines the jobs in a queue, and compares them to
each other, to determine which job to start next.
To use fairshare for the entire complex, set the by_queue and round_robin scheduler configuration parameters to False.
To use fairshare within queues, set the by_queue scheduler parameter to True, and
round_robin to False. If you want to examine queues in a particular order, prioritize the
queues by setting each queue’s priority attribute.
The scheduler configuration parameter by_queue in the file PBS_HOME/sched_priv/sched_config is set to True by default.
If by_queue is True, queues cannot be designated as fairshare entities.
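For example, to apply fairshare across the whole complex, PBS_HOME/sched_priv/sched_config would contain lines like these (a sketch):
   fair_share: True ALL
   by_queue: False ALL
   round_robin: False ALL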
4.8.18.13 Using Queues to Manage Fairshare
You can introduce a fairshare factor that is different at each queue. To do this, create a custom floating point resource, and set each queue's resources_default.<resource> to the desired value. Use this resource in the fairshare_usage_res computation. If you do not set this value at a queue, PBS uses zero for the value. To avoid having to set a value at multiple queues, you can set the server's resources_default.<resource> to the default value for all queues where the value is unset. The server value takes effect only where the queue value is unset; where the queue value is set, the queue value takes precedence.
For example, to reduce the priority for jobs in the "expensive" queue by assigning them twice the usage of the jobs in workq:
•  Define the resource:
   Qmgr: create resource fs_factor type = float, flag = i
•  Set the resource values:
   Qmgr: set server resources_default.fs_factor = 1
   Qmgr: set queue workq resources_default.fs_factor = 0.3
   Qmgr: set queue expensive resources_default.fs_factor = 0.6
•  Edit sched_config:
   fairshare_usage_res: "fs_factor*ncpus*walltime"
4.8.18.14 Fairshare and Strict Ordering
Fairshare dynamically reorders the jobs with every scheduling cycle. Strict ordering is a rule
that says we always run the next-most-deserving job. If there were no new jobs submitted,
strict ordering could give you a snapshot of how the jobs would run for the next n days.
Hence fairshare appears to break that. However, looked at from a dynamic standpoint, fairshare is another element in the strict order.
4.8.18.15 Fairshare and Entity Shares (Strict Priority)
If you enable entity shares (strict priority), you use the same fairshare tree that you would use
for fairshare. Fairshare and entity shares (strict priority) are incompatible, so in order to use
entity shares, you disable fairshare by setting fair_share to False. For how to configure
entity shares, see section 4.8.14, “Sorting Jobs by Entity Shares (Was Strict Priority)”, on
page 168.
4.8.18.16 Viewing and Managing Fairshare Data
The pbsfs command provides a command-line tool for viewing and managing some fairshare data. You can display the tree in tree form or in list form. You can print all information
about an entity, or set an entity's usage to a new value. You can force an immediate decay of
all the usage values in the tree. You can compare two fairshare entities. You can also remove
all entities from the unknown department. This makes the tree easier to read. The tree can
become unwieldy because entities not listed in the file PBS_HOME/sched_priv/resource_group all land in the unknown group.
The fairshare usage data is written to the file PBS_HOME/sched_priv/usage whenever the scheduler has new usage data. The usage data is always up to date.
For more information on using the pbsfs command, see “pbsfs” on page 106 of the PBS
Professional Reference Guide.
4.8.18.17 Using Fairshare in Job Execution Priority
Jobs are sorted as specified by the formula in job_sort_formula, if it exists, or by fairshare, if
it is enabled and there is no formula, or if neither of those is used, by job_sort_key. The job
sorting formula can use the value of fair_share_perc, the percentage of the fairshare tree for
this job’s entity. See section 4.8.16, “Calculating Job Execution Priority”, on page 174.
4.8.18.18 Using Fairshare in Job Preemption Priority
You can use the fairshare preemption level in determining job preemption priority. This level
applies to jobs whose owners are over their fairshare allotment. See section 4.8.33, “Using
Preemption”, on page 241.
4.8.18.19 Fairshare and Requeued Jobs
When a job is requeued, it normally retains its original place in its execution queue with its
former priority. The job is usually the next job to be considered during scheduling, unless the
relative priorities of the jobs in the queue have changed. This can happen when the job sorting
formula assigns higher priority to another job, another higher-priority job is submitted after
the requeued job started, this job’s owner has gone over their fairshare limit, etc.
4.8.18.20 Moving Entities within Fairshare Tree
You may want to move an entity within the fairshare tree. To move an entity within the fairshare tree, change its parent:
1. Edit PBS_HOME/sched_priv/resource_group, changing the entity's parent (column 3) to the desired parent, as in the example below
2. HUP or restart the scheduler
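For example, using the tree from section 4.8.18.4.iii, moving Tom from Math into Applied changes only the parent column:
   Before:  Tom  102  Math     10
   After:   Tom  102  Applied  10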
4.8.18.21 Removing Entities from Fairshare Tree
You may want to remove an entity from the fairshare tree, either because it no longer runs jobs, or because you don't want it to have its own place in the tree. When an entity that is not listed in the fairshare tree runs jobs, its past and future usage, including usage from jobs that were running when you removed the entity, shows up in the Unknown group. To remove an entity from the fairshare tree:
1. Edit PBS_HOME/sched_priv/resource_group to remove the entity's line from the file
2. HUP or restart the scheduler
If you do not want an entity's usage to show up in the Unknown group, use pbsfs -e to remove the usage:
1. Stop the scheduler
2. Run pbsfs -e
3. Start the scheduler
If you have removed a user from the PBS complex and don't want their usage to show up any more:
1. Stop the scheduler
2. Edit PBS_HOME/sched_priv/resource_group
3. Run pbsfs -e
4. Start the scheduler
4.8.18.22 Fairshare Caveats
•  If the job sorting formula is defined, it overrides fairshare.
•  Do not use fairshare with help_starving_jobs.
•  We do not recommend using fairshare with strict_ordering, or with strict_ordering and backfill. The results may be non-intuitive. Fairshare will cause relative job priorities to change with each scheduling cycle. It is possible that a job from the same entity or group as the top job will be chosen as the filler job. The usage from the filler job will lower the priority of the most deserving, i.e. top, job. This could delay the execution of the top job.
•  Do not use fairshare when using the fair_share_perc option to job_sort_key.
•  Do not use static resources such as ncpus as the resource to track. The scheduler adds the incremental change in the tracked resource at each scheduling cycle, and a static resource will not change.
•  The most deserving entity can change with every scheduling cycle, if each time a job is run, it changes usage sufficiently.
•  Fairshare dynamically reorders the jobs with every scheduling cycle. Strict ordering is a rule that says we always run the next-most-deserving job. If there were no new jobs submitted, strict ordering could give you a snapshot of how the jobs would run for the next n days. Hence fairshare appears to break that. However, looked at from a dynamic standpoint, fairshare is another element in the strict order.
•  The half_life parameter is deprecated and has been replaced by the fairshare_decay_time parameter.
4.8.19 FIFO Scheduling
With FIFO scheduling, PBS runs jobs in the order in which they are submitted. You can use
FIFO order for all of the jobs in your complex, or you can go queue by queue, so that the jobs
within each queue are considered in FIFO order.
4.8.19.1 Configuring Basic FIFO Scheduling
To configure basic FIFO scheduling, whether across the complex or queue by queue, set the following scheduler parameters to these values:
   round_robin: False ALL
   job_sort_key: (commented out)
   fair_share: False ALL
   help_starving_jobs: False ALL
   backfill: False ALL
   job_sort_formula: (unset)
4.8.19.2 FIFO for Entire Complex
To configure FIFO across your entire complex, follow the steps above and do one of the following:
•  Use only one execution queue
•  Set the by_queue scheduler parameter to False
4.8.19.3 Queue by Queue FIFO
To configure FIFO for each queue separately, first decide how you want queues to be selected. You can set the order in which PBS chooses queues from which to run jobs, or you can allow the queues to be selected in an undefined way. First configure the scheduler as in section 4.8.19.1, "Configuring Basic FIFO Scheduling". Then:
•  To allow queues to be selected in an undefined way, set the by_queue scheduler parameter to True.
•  To set the order in which queues are selected, do the following:
   •  Specify a priority for each queue
   •  Set the by_queue scheduler parameter to True
4.8.19.4 FIFO with Strict Ordering
If your jobs must run exactly in submission order, you can use strict ordering with FIFO
scheduling. If you use strict ordering with FIFO scheduling, this means that when the job that
is supposed to run next cannot run, no jobs can run. This can result in less throughput than
you could otherwise achieve. To avoid that problem, you can use backfilling. See the following section.
To use strict ordering with FIFO scheduling, use the following scheduler parameter settings in PBS_HOME/sched_priv/sched_config:
   strict_ordering: True ALL
   round_robin: False ALL
   job_sort_key: (commented out)
   fair_share: False ALL
   help_starving_jobs: False ALL
   backfill: False ALL
   job_sort_formula: (unset)
4.8.19.5 FIFO with Strict Ordering and Backfilling
If you want to run your jobs in submission order, except for backfilling around top jobs that are stuck, use the following:
   strict_ordering: True ALL
   round_robin: False ALL
   job_sort_key: (commented out)
   fair_share: False ALL
   help_starving_jobs: False ALL
   backfill: True ALL
   job_sort_formula: (unset)

4.8.20 Using a Formula for Computing Job Execution Priority
You can choose to use a formula by which to sort jobs at the finest-granularity level. The formula can only direct how jobs are sorted at the finest level of granularity. However, that is
where most of the sorting work is done.
When the scheduler sorts jobs according to the formula, it computes a priority for each job.
The priority computed for each job is the value produced by the formula. Jobs with a higher
value get higher priority. See section 4.8.16.3, “Sorting Jobs Within Classes”, on page 177 for
how the formula is used in setting job execution priority.
This formula will override both job_sort_key and fair_share for sorting jobs. If the
job_sort_formula server attribute contains a formula, the scheduler will use it. If not, and
fairshare is enabled, the scheduler computes job priorities according to fairshare. If neither
the formula nor fairshare is defined, the scheduler uses job_sort_key.
Only one formula is used to prioritize all jobs. At each scheduling cycle, the formula is
applied to all jobs, regardless of when they were submitted. If you change the formula, the
new formula is applied to all jobs.
For example, if you submit some jobs, change the formula, then submit more jobs, the new
formula is used for all of the jobs, during the next scheduling cycle.
You can set a job priority threshold so that jobs with priority at or below the specified value
do not run. See section 4.8.20.8, “Setting Minimum Job Priority Value for Job Execution”, on
page 198.
You may find that the formula is most useful when you use it with custom resources inherited
by or allocated to jobs. For example, you may want to route all jobs from a particular project
to a queue where they inherit a specific value for a custom resource. Other jobs may end up at
a different queue, where they inherit a different value, or they may inherit no value. You can
then use this custom resource in the formula as a way to manage job priority. See section
11.3, “Allocating Resources to Jobs”, on page 967, and section 4.8.8, “Using Custom and
Default Resources”, on page 140.
It may be helpful if these custom resources are invisible and unrequestable by users. See section 4.8.20.10, “Examples of Using Resource Permissions in Job Sorting Formula”, on page
201.
4.8.20.1 Using the Formula
Once you set job_sort_formula via qmgr, it takes effect with the following scheduling cycle.
Variables are evaluated at the start of the scheduling cycle.
If an error is encountered while evaluating the formula, the formula evaluates to zero for that
job, and the following message is logged at event class 0x0100:
“1234.mars;Formula evaluation for job had an error. Zero value will be
used”
4.8.20.2 Configuring the Job Sorting Formula
•  Define the formula: You specify the formula in the server's job_sort_formula attribute. To set the job_sort_formula attribute, use the qmgr command. When specifying the formula, be sure to follow the requirements for entering an attribute value via qmgr: strings containing whitespace, commas, or other special characters must be enclosed in single or double quotes. See "Attribute Values" on page 166 of the PBS Professional Reference Guide.
   Format:
   Qmgr: s s job_sort_formula = "<formula>"
•  Optional: set a priority threshold. See section 4.8.20.8, "Setting Minimum Job Priority Value for Job Execution", on page 198.
4.8.20.3 Requirements for Creating Formula
The job sorting formula must be created at the server host.
Under UNIX/Linux, root privilege is required in order to operate on the job_sort_formula server attribute.
Under Windows, this must be done from the installation account. For both domained and standalone environments, the installation account must be a local account that is a member of the local Administrators group on the local computer.
4.8.20.4 Format of Formula
The formula can be made up of any number of expressions, where expressions contain terms
which are added, subtracted, multiplied, or divided. You can use parentheses, exponents,
unary + and - operators, and the ternary operator (“?:”). All operators use standard mathematical precedence. The formula can use standard Python mathematical operators and those in
the Python math module.
The formula can be any length.
The range for the formula is defined by the IEEE floating point standard for a double.
4.8.20.5 Units in Formula
The variables you can use in the formula have different units. Make sure that some terms do
not overpower others, by normalizing them where necessary. Resources like ncpus are integers, size resources like mem are in kb, so 1gb is 1048576kb, and time-based resources are in
seconds (e.g. walltime). Therefore, if you want a formula that combines memory and ncpus,
you’ll have to account for the factor of 1024 difference in the units.
The following are the units for the supported built-in resources:

Table 4-11: Job Sorting Formula Units

   Resource          Units                        Example
   Time resources    Integer number of seconds    300
   Memory            kb                           1gb => 1048576kb
   ncpus             Integer                      8

Example 4-2: If you use '1 * ncpus + 1 * mem', where mem=2mb, ncpus will have almost no effect on the formula result. However, if you use '1024 * ncpus + 1 * mem', the scaled mem won't overpower ncpus.
Example 4-3: You are using gb of mem:
Qmgr: s s job_sort_formula='1048576 * ncpus + 2 * mem'
Example 4-4: If you want to add days of walltime to queue priority, you might want to multiply the time by 0.0000115, equivalent to dividing by the number of seconds in a day:
Qmgr: s s job_sort_formula = '.0000115*walltime + queue_priority'
4.8.20.6 Formula Coefficients
The formula operates only on resources in the job’s Resource_List attribute. These are the
numeric job-level resources, and may have been explicitly requested, inherited, or summed
from consumable host-level resources. See section 5.9.2, “Resources Requested by Job”, on
page 323.
This means that all variables and coefficients in the formula must be resources that were
either requested by the job or were inherited from defaults at the server or queue. These variables and coefficients can be custom numeric resources inherited by the job from the server or
queue, or they are long integers or floats.
You may need to create custom resources at the server or queue level to be used for formula
coefficients. See section 4.8.8, “Using Custom and Default Resources”, on page 140.
The following table lists the terms that can be used in the formula:

Table 4-12: Terms in Job Sorting Formula

Constants: NUM or NUM.NUM

Attribute values:
   queue_priority: Value of the priority attribute for the queue in which the job resides
   job_priority: Value of the job's priority attribute
   fair_share_perc: Percentage of the fairshare tree for this job's entity
   eligible_time: Amount of wait time the job has accrued while waiting for resources

Resources:
   ncpus, mem, walltime, cput: Uses the amount requested, not the amount used
   Custom numeric job-wide resources: Must be of type long, float, or size. See section 5.14.2.12, "Custom Resource Values", on page 355.

4.8.20.7 Modifying Coefficients For a Specific Job
Formula coefficients can be altered for each job by using the qalter command to change
the value of that resource for that job. If a formula coefficient is a constant, it cannot be
altered per-job.
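For example (hypothetical job ID and custom resource A), to raise the A coefficient for a single job:
   qalter -l A=10 1234.server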
4.8.20.8 Setting Minimum Job Priority Value for Job Execution
You can specify a minimum job priority value for jobs to run by setting the
job_sort_formula_threshold scheduler attribute. If the value calculated for a job by the job
sorting formula is at or below this value, the job cannot run during this scheduling cycle.
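For example, to keep jobs whose formula value is zero or negative from running, you might set the threshold to 0 (a sketch; see the PBS Professional Reference Guide for the attribute's exact scope and syntax):
   Qmgr: set sched job_sort_formula_threshold = 0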
4.8.20.9 Examples of Using the Job Sorting Formula
Examples of formulas:
Example 4-5: 10 * ncpus + 0.01*walltime + A*mem
Here, "A" is a custom resource.
Example 4-6: ncpus + 0.0001*mem
Example 4-7: To change the formula on a job-by-job basis, alter the value of a resource in the job's Resource_List.<resource>. Say the formula is A*queue_priority + B*job_priority + C*ncpus + D*walltime, where A through D are custom numeric resources. These resources can have a default value via resources_default.A ... resources_default.D. You can change the value of a job's resource through qalter.
Example 4-8: ncpus*mem
Example 4-9: Set via qmgr:
qmgr -c 'set server job_sort_formula=5*ncpus+0.05*walltime'
Following this, the output from qmgr -c 'print server' will look like:
set server job_sort_formula="5*ncpus+0.05*walltime"
Example 4-10:
Qmgr: s s job_sort_formula=ncpus
Example 4-11:
Qmgr: s s job_sort_formula='queue_priority + ncpus'
Example 4-12:
Qmgr: s s job_sort_formula='5*job_priority + 10*queue_priority'
Example 4-13: Sort jobs using the value of ncpus x walltime. (Time resources are counted in seconds, so 01:00:00 is 3600 seconds.)
Formula expression: "ncpus * walltime"
Submit these jobs:
Job 1: ncpus=2 walltime=01:00:00 -> 2*3600 = 7200
Job 2: ncpus=1 walltime=03:00:00 -> 1*10800 = 10800
Job 3: ncpus=5 walltime=01:00:00 -> 5*3600 = 18000
The scheduler logs the following:
Job;1.host1;Formula Evaluation = 7200
Job;2.host1;Formula Evaluation = 10800
Job;3.host1;Formula Evaluation = 18000
The jobs are sorted in the following order:
Job 3
Job 2
Job 1
4.8.20.10 Examples of Using Resource Permissions in Job Sorting Formula
See section 5.14.2.10, “Resource Permission Flags”, on page 351 for information on using
resource permissions.
Example 4-14: You may want to create per-job coefficients in your job sorting formula which
are set by system defaults and which cannot be viewed, requested or modified by the
user. To do this, you create custom resources for the formula coefficients, and make
them invisible to users. In this example, A, B, C and D are the coefficients. You then use
them in your formula:
A*(Queue Priority) + B*(Job Class Priority) + C*(CPUs) + D*(Queue Wait Time)
Example 4-15: You may need to change the priority of a specific job, for example, have one
job or a set of jobs run next. In this case, you can define a custom resource for a special
job priority. If you do not want users to be able to change this priority, set the resource
permission flag for the resource to r. If you do not want users to be able to see the priority, set its resource permission flag to i. For the job or jobs that you wish to give top priority, use qalter to set the special resource to a value much larger than any formula
outcome.
Example 4-16: To use a special priority:
sched_priority = W_prio * wait_secs + P_prio * priority + ... +
special_priority
Here, special_priority is very large.
4.8.20.11 Caveats and Error Messages
•  It is invalid to set both job_sort_formula and job_sort_key at the same time. If they are both set, job_sort_key is ignored and the following error message is logged:
   "Job sorting formula and job_sort_key are incompatible. The job sorting formula will be used."
•  If the formula overflows or underflows, the sorting behavior is undefined.
•  If you set the formula to an invalid formula, qmgr will reject it, with one of the following error messages:
   "Invalid Formula Format"
   "Formula contains invalid keyword"
   "Formula contains a resource of an invalid type"
•  If an error is encountered while evaluating the formula, the formula evaluates to zero for that job, and the following message is logged at event class 0x0100:
   "1234.mars;Formula evaluation for job had an error. Zero value will be used"
•  The job sorting formula must be set via qmgr at the server host.
•  When a job is moved to a new server or queue, it will inherit new default resources from that server or queue. If it is moved to a new server, it will be prioritized according to the formula on that server, if one exists.
•  If the job is moved to another server through peer scheduling and the pulling server uses queue priority in its job sorting formula, the queue priority used in the formula will be that of the queue to which the job is moved.
•  If you are using FIFO scheduling, the job_sort_formula server attribute must be unset.
•  If you are using eligible time in the formula, and eligible_time_enable is False, each job's eligible time evaluates to zero in the formula.
•  If a job is requeued, and you are using the formula, the job may lose its place, because various factors may affect the job's priority. For example, a higher-priority job may be submitted between the time the job is requeued and the time it would have run, or another job's priority may be increased due to changes in which jobs are running or waiting.
•  If the formula is configured, it is in force during both primetime and non-primetime.
•  If the job sorting formula is defined, it overrides fairshare.
4.8.20.12 Logging
For each job, the evaluated formula answer is logged at the highest log event class (0x0400):
“Formula Evaluation = <answer>”
4.8.21 Gating Jobs at Server or Queue
You can set resource limits at the server and queues so that jobs must conform to the limits in
order to be admitted. This way, you can reject jobs that request more of a resource than the
complex or a queue can supply.
You can also force jobs into specific queues where they will inherit the desired values for
unrequested or custom resources. You can then use these resources to manage jobs, for example by using the resources in the job sorting formula or to route jobs to particular vnodes.
You can either force users to submit their jobs to specific queues, or you can have users submit jobs to routing queues, and then route the jobs to the desired queues.
For information on using resources for gating, see section 5.13, “Using Resources to Restrict
Server, Queue Access”, on page 336.
For a description of which resources can be used for gating, see section 2.2.6.4.iii, “Resources
Used for Routing and Admittance”, on page 27.
For how queue resource limits are applied to jobs, see section 2.2.6.4.i, “How Queue and
Server Limits Are Applied, Except Running Time”, on page 25.
For how routing queues work, see section 2.2.6, “Routing Queues”, on page 24.
For how to route jobs to particular vnodes, see section 4.8.2, “Associating Vnodes with
Queues”, on page 126.
For how to use resources in the job sorting formula, see section 4.8.20, “Using a Formula for
Computing Job Execution Priority”, on page 194.
4.8.21.1 Gating Caveats
•  For most resources, if the job does not request the resource, and no server or queue defaults are set, the job inherits the maximum gating value for the resource. See section 5.9.3.7, "Using Gating Values As Defaults", on page 326.
•  For shrink-to-fit jobs, if a walltime limit is specified:
   •  Both min_walltime and max_walltime must be greater than or equal to resources_min.walltime.
   •  Both min_walltime and max_walltime must be less than or equal to resources_max.walltime.
4.8.22 Managing Application Licenses
PBS does not check application licenses out from the license server. PBS has no direct control over application licenses. However, you can have the scheduler use a dynamic resource
to track application license use. This way, the scheduler knows how many application
licenses are available, and how many have been checked out. For how to configure dynamic
resources to represent application licenses, see section 5.14.7, “Supplying Application
Licenses”, on page 369.
Unfortunately, some jobs or applications don’t check out all of the application licenses they
use until they have been running for some time. For example, job J1, which requests licenses,
starts running, but doesn’t check them out for a few minutes. Next, the scheduler considers
job J2, which also requests licenses. The scheduler runs its query for the number of available
licenses, and the query returns with a sufficient number of licenses to run J2, so the scheduler
starts J2. Shortly afterward, J1 checks out licenses, leaving too few to run J2.
It might appear that you could track the number of application licenses being used with a
static integer PBS resource, and force jobs requesting application licenses to request this
resource as well, but there is a drawback: if a job that has requested this resource is suspended, its static resources are released, but its application licenses are not. In this case you
could end up with a deceptively high number for available licenses.
You can limit the number of jobs that request application licenses, if you know how many
jobs can run at one time:
•  Create a custom server-level consumable integer resource to represent these jobs, as in the sketch after this list. See section 5.14.4, "Configuring Server-level Resources", on page 358.
•  Use qmgr to set resources_available.<job limit> at the server to the number of jobs that can run at one time.
•  Force all jobs requesting the application to request one of these. See section 11.3, "Allocating Resources to Jobs", on page 967.
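A minimal sketch of the first two steps (the resource name app_jobs is hypothetical; check section 5.14.4 for the exact resource flags appropriate to your complex):
   Qmgr: create resource app_jobs type=long, flag=q
   Qmgr: set server resources_available.app_jobs = 4
Jobs that use the application would then each request one, for example via qsub -l app_jobs=1.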
4.8.23 Limits on Per-job Resource Usage
You can specify how much of each resource any job is allowed to request, at the server and
queue level. The server and queues each have per-job limit attributes. The
resources_min.<resource> and resources_max.<resource> server and queue attributes are
limits on what each individual job may request.
You cannot set resources_min or resources_max limits on min_walltime or max_walltime.
See section 5.15.3, “Placing Resource Limits on Jobs”, on page 414, and section 5.13, “Using
Resources to Restrict Server, Queue Access”, on page 336.
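For example (queue name workq assumed), to reject jobs that request fewer than 1 or more than 16 CPUs, or more than 100gb of memory:
   Qmgr: set queue workq resources_min.ncpus = 1
   Qmgr: set queue workq resources_max.ncpus = 16
   Qmgr: set queue workq resources_max.mem = 100gb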
4.8.24 Limits on Project, User, and Group Jobs
You can manage the number of jobs being run by users or groups, and the number of jobs
being run in projects, at the server or queue level. For example, you can limit the number of
jobs enqueued in queue QueueA by any one group to 30, and by any single user to 5.
See section 5.15.1, “Managing Resource Usage By Users, Groups, and Projects, at Server &
Queues”, on page 389.
4.8.25 Limits on Project, User, and Group Resource Usage
You can manage the total amount of each resource that is used by projects, users, or groups, at
the server or queue level. For example, you can manage how much memory is being used by
jobs in queue QueueA.
See section 5.15.1, “Managing Resource Usage By Users, Groups, and Projects, at Server &
Queues”, on page 389.
4.8.26 Limits on Jobs at Vnodes
You can set limits on the number of jobs that can be run at each vnode by users, by groups, or
overall. See section 5.15.2, “Limiting Number of Jobs at Vnode”, on page 413.
4.8.27 Using Load Balancing
PBS can track the load on each execution host, running new jobs on the host according to the
load on the host. You can specify that PBS does this for all machines in the complex. This is
somewhat different behavior from that used for managing the load on vnodes; when managing load levels on vnodes, the scheduler only pays attention to the state of the vnode, and does
not calculate whether a job would put the vnode over its load limit. Managing load levels on
vnodes does not require load balancing to be turned on. See section 9.4.4, “Managing Load
Levels on Vnodes”, on page 883.
You use the load_balancing scheduler parameter to control whether PBS tracks the load on
each host.
The load_balancing parameter is a primetime option, meaning that you can configure it separately for primetime and non-primetime, or you can specify it for all of the time.
4.8.27.1 How Load Average is Computed
When load balancing is on, the scheduler queries each MoM once each scheduling cycle for
the MoM’s load. MoM checks the load average on her host every 10 seconds.
The load used by MoM is the following:
•  On UNIX/Linux, it is the raw one-minute averaged "loadave" returned by the operating system
•  On Windows, it is based on the processor queue length
When a new load is added to a vnode, the load average increases slowly over time, so that
more jobs than you want may be started at first. Eventually, the load average matches the
actual load. If this is above the limit, PBS won’t start any more jobs on that vnode. As jobs
terminate, the load average slowly moves down, and it takes time before the vnode is chosen
for new jobs.
Consult your OS documentation to determine load values that make sense.
MoM sets the load only on the natural vnode, so it is the same for all vnodes on a multi-vnode
machine.
4.8.27.2 How PBS Uses Load Information
When choosing vnodes for a job, the scheduler considers the load on the vnode in addition to
whether the vnode can supply the resources specified in the job’s Resource_List attribute.
PBS estimates that a 1-CPU job will produce one CPU’s worth of load. This means that if
you have a 2-CPU machine whose load is zero, PBS will put two 1-CPU jobs, or one 2-CPU
job, on that machine.
When using load balancing, if a vnode has gone above $max_load, PBS does not run new
jobs on the vnode until the load drops below $ideal_load.
MoM sets the vnode’s state according to its load. When a vnode’s load goes above
$max_load, MoM marks the vnode busy. When the load drops below $ideal_load, MoM
marks the vnode free. When a vnode’s state changes, for example from free to busy, MoM
informs the server.
When using load balancing, PBS does not run new jobs on vnodes under the following conditions:
•  Vnodes that are marked busy
•  Vnodes whose resources, such as ncpus, are already fully allocated
•  Vnodes that are above $max_load
•  Vnodes where running the job would cause the load to go above $max_load
4.8.27.3 When to Use Load Balancing
When using load balancing (meaning the load_balancing scheduler parameter is True), the only changes to behavior are the following:
•  The scheduler won't place a job on a vnode whose load is above $max_load
•  The scheduler won't place a job on a vnode where that job would put the load above $max_load
Load balancing is useful when you want to oversubscribe CPUs, managing job placement by
load instead. This can help when you want to run lots of jobs where each job will need only
some CPU time, and the average load on the machine will be reasonable.
4.8.27.4 Suspending Jobs on Overloaded Vnodes
You can specify that MoM should suspend jobs when the load goes above $max_load, by adding the suspend argument to the $max_load parameter. See "$max_load <load> [suspend]" on page 209. In this case, MoM suspends all jobs on the vnode until the load drops below $ideal_load, then resumes them. This option is useful only when the source of the load is not strictly PBS jobs. This option is not recommended when the load is due solely to PBS jobs, because it can lead to the vnode cycling back and forth: becoming overloaded, being marked busy, suspending all jobs, being marked free, then starting all jobs, becoming overloaded, and so on.
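For example, in MoM's configuration file (illustrative values):
   $ideal_load 2.0
   $max_load 3.5 suspend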
4.8.27.5 Configuring Load Balancing
If you want to oversubscribe CPUs, set the value of ncpus on the vnode to the desired higher
value.
We recommend setting the value of $max_load to a slightly higher value than the desired
load, for example .25 + ncpus. Otherwise, the scheduler will not schedule jobs onto the last
CPU, because it thinks a 1-CPU job will raise the load by 1, and the machine probably registers a load above zero.
To configure load balancing, perform the following steps:
1. Turn on load balancing by setting the load_balancing scheduler parameter to True:
   load_balancing: True ALL
2. Choose whether you want load balancing during primetime, non-primetime, or all of the time. If you want separate behavior for primetime and non-primetime, specify each separately. The default is both. Example of separate behavior:
   load_balancing: True prime
   load_balancing: False non_prime
3. Set the ideal and maximum desired load for each execution host, by specifying values for $ideal_load and $max_load in each execution host's MoM configuration file:
   $ideal_load <value at which to start new jobs>
   $max_load <value at which to cease starting jobs>
4. Set each host's resources_available.ncpus to the maximum number of CPUs you wish to allocate on that host. Follow the recommendations in section 3.5.2, "Choosing Configuration Method", on page 52.
4.8.27.6 Load Balancing Caveats and Recommendations
•
When setting ncpus and $max_load, consider the ratio between the two. PBS won’t
allocate more than the value of resources_available.ncpus, so you can use this value to
keep the load average from getting too high.
•
Make sure that load balancing does not interfere with communications. Please read section 9.4.4, “Managing Load Levels on Vnodes”, on page 883.
•
Load balancing is incompatible with sorting vnodes on a key (node_sort_key) when
sorting on a resource using the “unused” or “assigned” parameters. Load balancing
will be disabled. See section 4.8.48, “Sorting Vnodes on a Key”, on page 300.
• You can use load balancing with SMP cluster distribution, but smp_cluster_dist will behave as if it is set to pack. See section 4.8.42, "SMP Cluster Distribution", on page 290.
• We recommend setting $max_load slightly higher than the desired load, for example ncpus + 0.25. Otherwise, the scheduler will not schedule jobs onto the last CPU, because it assumes a 1-CPU job will raise the load by 1, and the machine probably already registers a load above zero.
• If you are using cycle harvesting via load balancing, make sure your load balancing settings do not interfere with cycle harvesting. Be careful with the settings for $ideal_load and $max_load. You want to make sure that when the workstation owner is using the machine, the load on the machine triggers MoM to report being busy, and that PBS does not start any new jobs while the user is working. Please read section 4.8.9.6, "Cycle Harvesting Based on Load Average", on page 155.
• Using load balancing with multi-vnoded machines is not supported. MoM sets the load average only on the natural vnode, so all vnodes on a multi-vnoded machine are given the same value regardless of their actual load.
• It is not recommended to specify that MoM should suspend jobs when the load goes above $max_load. See section 4.8.27.4, "Suspending Jobs on Overloaded Vnodes", on page 207.
• If you configure both placement sets and load balancing, the net effect is that vnodes that are over their load limit will be removed from consideration.
4.8.27.7 Parameters Affecting Load Balancing
$ideal_load <load>
MoM parameter. Defines the load below which the vnode is not considered to be
busy. Used with the $max_load parameter.
Example:
$ideal_load 1.8
Format: Float
No default
$max_load <load> [suspend]
MoM parameter. Defines the load above which the vnode is considered to be busy.
Used with the $ideal_load parameter.
If the optional suspend argument is specified, PBS suspends jobs running on the
vnode when the load average exceeds $max_load, regardless of the source of the
load (PBS and/or logged-in users).
Example:
$max_load 3.5
Format: Float
Default: number of CPUs
load_balancing <T|F> [time slot specification]
Scheduler parameter. When set to True, the scheduler takes into account the load average on vnodes as well as the resources listed in the resources: line in sched_config. See "load_balancing" on page 303 of the PBS Professional Reference Guide.
Format: Boolean
Default: False all
4.8.28 Matching Jobs to Resources
The scheduler places each job where the resources requested by the job are available. The
scheduler handles built-in and custom resources the same way. For a complete description of
PBS resources, see Chapter 5, "PBS Resources", on page 305.
4.8.28.1 Scheduling on Consumable Resources
The scheduler constrains the use of a resource to the value that is set for that resource in
resources_available.<resource>. For a consumable resource, the scheduler won’t place
more demand on the resource than is available. For example, if a vnode has
resources_available.ncpus set to 4, the scheduler will place jobs requesting up to a total of 4
CPUs on that vnode, but no more.
The scheduler computes how much of a resource is available by subtracting the total of
resources_assigned.<resource> for all running jobs and started reservations from
resources_available.<resource>.
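For example (illustrative numbers): if a vnode has resources_available.ncpus set to 8, and two running jobs were assigned 3 and 2 CPUs there, the scheduler computes 8 - (3 + 2) = 3 CPUs still available, and considers that vnode only for jobs requesting at most 3 more CPUs.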
4.8.28.2 Scheduling on Non-Consumable Resources
For non-consumable resources such as arch or host, the scheduler matches the value
requested by a job with the value at one or more vnodes. Matching a job this way does not
change whether or not other jobs can be matched as well; non-consumable resources are not
used up by jobs, and therefore have no limits.
4.8.28.3 Scheduling on Dynamic Resources
At each scheduling cycle, the scheduler queries each dynamic resource. If a dynamic
resource is not under the control of PBS, jobs requesting it may run in an unpredictable fashion.
4.8.28.4 Scheduling on the walltime Resource
The scheduler looks at each job in priority order, and tries to run the job. The scheduler
checks whether there is an open time slot on the requested resources that is at least as long as
the job’s walltime. If there is, the scheduler runs the job.
PBS examines each shrink-to-fit job when it gets to it, and looks for a time slot whose length
is between the job’s min_walltime and max_walltime. If the job can fit somewhere, PBS sets
the job’s walltime to a duration that fits the time slot, and runs the job. For more information
about shrink-to-fit jobs, see section 4.8.41, “Using Shrink-to-fit Jobs”, on page 279.
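For example, a shrink-to-fit job might be submitted with duration bounds like these (a hypothetical submission; the durations are illustrative):
qsub -l min_walltime=01:00:00,max_walltime=06:00:00 myjob.sh
If the largest open slot on the requested resources is, say, four hours, PBS sets the job's walltime to fit that slot and runs the job.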
4.8.28.4.i Caveats for Scheduling on walltime
Do not set values for resources such as walltime at the server or a queue, because the scheduler will not allocate more than the specified value. This means that if you set
resources_available.walltime at the server to 10:00:00, and one job requests 5 hours and
one job requests 6 hours, only one job will be allowed to run at a time, regardless of other idle
resources.
4.8.28.5 Unrequestable or Invisible Resources
You can define custom resources that are invisible to and unrequestable by users, or simply
unrequestable by users. The scheduler treats these resources the same as visible, requestable
resources. See section 5.14.2.10, “Resource Permission Flags”, on page 351.
4.8.28.6 Enforcing Scheduling on Resources
The scheduler chooses which resources to schedule on according to the following rules:
• The scheduler always schedules jobs based on the availability of the following vnode-level resources:
vnode
host
Any Boolean resource
• The scheduler will schedule jobs based on the availability of other resources only if those resources are listed in the "resources:" line in PBS_HOME/sched_priv/sched_config. You can add resources to this line. The following resources are added to this line automatically (an example resources: line follows this list):
aoe
arch
host
mem
ncpus
netwins
vnode
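For example, a resources: line listing the automatically-added resources plus one custom resource might look like the following sketch, where "switch" stands in for a hypothetical custom resource that your jobs request:
resources: "ncpus, mem, arch, host, vnode, aoe, netwins, switch"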
4.8.28.7 Matching Unset Resources
When job resource requests are being matched with available resources, unset resources are
treated as follows:
• A numerical resource that is unset on a host is treated as if it were zero
• An unset resource on the server or queue is treated as if it were infinite
• An unset string cannot be matched
• An unset Boolean resource is treated as if it were set to False
• The resources ompthreads, mpiprocs, and nodes are ignored for unset resource matching
The following table shows how a resource request will or won’t match an unset resource at
the host level.
Table 4-13: Matching Requests to Unset Host-level Resources

Resource Type    Unset Resource Value      Matching Request Value
Boolean          False                     False
float            0.0                       0.0
long             0                         0
size             0                         0
string           -                         Never matches
string array     -                         Never matches
time             0, 0:0, 0:0.0, 0:0:0      0, 0:0, 0:0.0, 0:0:0

4.8.28.7.i When Dynamic Resource Script Fails
If a server dynamic resource script fails, the scheduler uses the value of
resources_available.<resource>. If this was never set, it is treated as an unset resource,
described above.
If a host-level dynamic resource script fails, the scheduler treats the resource as if its value is
zero.
4.8.28.7.ii Backward Compatibility of Unset Resources
To preserve backward compatibility, you can set the server’s resource_unset_infinite
attribute with a list of host-level resources that will behave as if they are infinite when they are
unset. See “resource_unset_infinite” on page 309 of the PBS Professional Reference Guide
for information on resource_unset_infinite.
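For example (a hypothetical setting; check the resource_unset_infinite entry in the Reference Guide for the exact format), to make unset host-level mem behave as infinite:
Qmgr: set server resource_unset_infinite = "mem"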
4.8.28.8 Resource Scheduling Caveats
• Do not set values for resources such as walltime at the server or a queue, because the scheduler will not allocate more than the specified value. This means that if you set resources_available.walltime at the server to 10:00:00, and one job requests 5 hours and one job requests 6 hours, only one job will be allowed to run at a time, regardless of other idle resources.
• Jobs may be placed on different vnodes from those where they would have run in earlier versions of PBS. This is because a job's resource request will no longer match the same resources on the server, queues, and vnodes.
• Beware of application license race conditions. If two jobs require the same application license, the first job may be started, but may not get around to using the license before the second job is started and uses the license. The first job must then wait until the license is available, taking up resources. The scheduler cannot avoid this problem.
4.8.29 Node Grouping
The term "node grouping" has been superseded by the term "placement sets". Vnodes were originally grouped according to the value of a single resource, so, for example, all vnodes with a value of linux for arch were grouped together, and all vnodes with a value of solaris for arch were in a separate group. We now use placement sets, which group vnodes according to the values of one or more resources. See section 4.8.32, "Placement Sets", on page 224.
4.8.29.1 Configuring Old-style Node Grouping
Configuring old-style node grouping means that you configure the simplest possible placement sets. In order to have the same behavior as in the old node grouping, group on a single
resource. If this resource is a string array, it should only have one value on each vnode. This
way, each vnode will be in only one node group.
You enable node grouping by setting the server’s node_group_enable attribute to True.
You can configure one set of vnode groups for the entire complex by setting the server’s
node_group_key attribute to a resource name.
You can configure node grouping separately for each queue by setting that queue’s
node_group_key attribute to a resource name.
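For example, to reproduce old-style node grouping on the arch resource described above (a sketch; substitute your own grouping resource):
Qmgr: set server node_group_enable = True
Qmgr: set server node_group_key = arch
To group per-queue instead, set the queue's attribute:
Qmgr: set queue workq node_group_key = arch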
4.8.30 Overrides
You can use various overrides to change how one or more jobs run.
4.8.30.1 Run a Job Manually
You can tell PBS to run a job now, and you can optionally specify where to run it. You run a
job manually using the qrun command.
The -H option to the qrun command makes an important difference:
qrun
When preemption is enabled, the scheduler preempts other jobs in order to run this
job. Running a job via qrun gives the job higher preemption priority than any other
class of job, except for reservation jobs.
When preemption is not enabled, the scheduler runs the job only if enough resources
are available.
qrun -H
PBS runs the job regardless of scheduling policy and available resources.
The qrun command alone overrides the following:
• Limits on resource usage by users, groups, and projects
• Limits on the number of jobs that can be run at a vnode
• Boundaries between primetime and non-primetime, specified in backfill_prime
• Whether the job is in a primetime queue: you can run a job in a primetime queue even when it's not primetime, or vice versa. Primetime boundaries are not honored.
• Dedicated time: you can run a job in a dedicated time queue even when it's not dedicated time, and vice versa. However, dedicated time boundaries are still honored.
The qrun command alone does not override the following:
• Server and queue resource usage limits
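For example (hypothetical job ID):
qrun 123.ServerA
qrun -H 123.ServerA
The first form lets the scheduler preempt other jobs to run job 123; the second runs it regardless of policy and available resources.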
4.8.30.1.i Using qrun Without -H Option on Shrink-to-fit Jobs
When a shrink-to-fit job is run via qrun, and there is a hard deadline, e.g. reservation or dedicated time, that conflicts with the shrink-to-fit job’s max_walltime but not its min_walltime,
the following happens:
• If preemption is enabled and there is a preemptable job before the hard deadline that must be preempted in order to run the shrink-to-fit job, preemption behavior means that the shrink-to-fit job does not shrink to fit; instead, it conflicts with the deadline and does not run.
• If there is no preemptable job before the hard deadline, the shrink-to-fit job shrinks into the available time and runs.
4.8.30.1.ii Using qrun With -H Option on Shrink-to-fit Jobs
When a shrink-to-fit job is run via qrun -H, the shrink-to-fit job runs, regardless of reservations, dedicated time, other jobs, etc. When run via qrun -H, shrink-to-fit jobs do not
shrink. If the shrink-to-fit job has a requested or inherited value for walltime, that value is
used, instead of one set by PBS when the job runs. If no walltime is specified, the job runs
without a walltime.
See “qrun” on page 195 of the PBS Professional Reference Guide, and section 4.8.33, “Using
Preemption”, on page 241.
4.8.30.1.iii qrun Caveats
• A job that has just been run via qrun has top priority only during the scheduling cycle where it was qrun. At the next scheduling cycle, that job is available for preemption just like any other job.
• Be careful when using qrun -H on jobs or vnodes involved in reservations.
4.8.30.2 Hold a Job Manually
You can use the qhold command to place a hold on a job. The effect of placing a hold
depends on whether the job is running and whether you have checkpointing configured:
• If the job is queued, the job will not run.
• If the job is running and checkpoint-abort is configured, the job is checkpointed, requeued, and held.
• If the job is running and checkpoint-abort is not configured, the only change is that the job's Hold_Types attribute is set to User Hold. If the job is subsequently requeued, it will not run until the hold is released.
You can release the hold using the qrls command.
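For example, to hold and then release a job (hypothetical job ID):
qhold 1234
qrls 1234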
For information on checkpointing jobs, see section 9.3, “Checkpoint and Restart”, on page
857.
See “qhold” on page 155 of the PBS Professional Reference Guide and “qrls” on page 193 of
the PBS Professional Reference Guide.
4.8.30.3 Suspend a Job Manually
You can use the qsig -s suspend command to suspend a job so that it won’t run. If you
suspend a job, and then release it using the qsig -s resume command, the job remains in
the suspended state until the required resources are available.
You can resume the job immediately by doing the following:
1. Resume the job:
qsig -s resume <job ID>
2. Run the job manually:
qrun <job ID>
See “qsig” on page 207 of the PBS Professional Reference Guide.
4.8.30.4 Set Special Resource Value Used in Formula
You can change the value of a resource used in the job sorting formula. For example, to give
a particular job a higher priority by changing the value of a custom resource called “higher”:
• Create a custom resource that is invisible to job submitters:
Qmgr: create resource higher type=float, flag=i
• Restart the server. See section 5.14.3.1, "Restarting the Server", on page 356.
• The formula expression includes "higher":
Qmgr: s s job_sort_formula = "higher"
• Set the default for this resource at the server:
Qmgr: set server resources_default.higher = 1
• These jobs are submitted:
Job 1
Job 2
Job 3
• Change Job 2 so that its value for "higher" is 5:
qalter -l higher=5 job2
• The scheduler logs the following:
Job;1.host1;Formula Evaluation = 1
Job;2.host1;Formula Evaluation = 5
Job;3.host1;Formula Evaluation = 1
• Jobs are sorted in this order:
Job 2
Job 1
Job 3
4.8.30.5 Change Formula On the Fly
You can change the job sorting formula on the fly, so that the next scheduler iteration uses
your new formula. This will change how job priorities are computed, and can rearrange the
order in which jobs are run. See section 4.8.20, “Using a Formula for Computing Job Execution Priority”, on page 194.
4.8.30.6 Using Dedicated Time
You can set up blocks of dedicated time, where the only jobs eligible to be started or running
are the ones in dedicated time queues. You can use dedicated time for upgrades. See section
4.8.10, “Dedicated Time”, on page 161, and section 2.2.5.2.i, “Dedicated Time Queues”, on
page 22.
4.8.30.7 Using cron Jobs or the Windows Task Scheduler
You can use cron jobs or the Windows Task Scheduler to change PBS settings according to
the needs of your time slots. See section 4.8.7, “cron Jobs, or the Windows Task Scheduler”,
on page 139.
4.8.30.8 Using Hooks
You can use hooks to examine jobs and alter their characteristics. See Chapter 6, "Hooks", on
page 437.
4.8.31 Peer Scheduling
Peer scheduling allows separate PBS complexes to automatically run jobs from each other’s
queues. This means that you can dynamically balance the workload across multiple, separate
PBS complexes. These cooperating PBS complexes are referred to as “Peers”.
4.8.31.1 How Peer Scheduling Works
In peer scheduling, a PBS server pulls jobs from one or more peer servers and runs them locally. When Complex A pulls a job from Complex B, Complex A is the "pulling" complex and Complex B is the "furnishing" complex. When the pulling scheduler determines that another complex's job can run immediately on local resources, it moves the job to the specified queue on the pulling server and immediately runs the job. The job is run as if it had been submitted to the pulling complex.
You can set up peer scheduling so that A pulls from B and C, and so that B also pulls from A
and C.
A job is pulled only when it can run immediately.
The pulling complex must have all of the resources required by the job, including custom
resources.
When a job is pulled from one complex to another, the pulling complex applies its policy to
the job. The job’s execution priority is determined by the policy of the pulling complex. You
can set special priority for pulled jobs; see section 4.8.31.4.ii, “Setting Priority for Pulled
Jobs”, on page 222.
4.8.31.2 Prerequisites for Peer Scheduling
• You must create the pulling and furnishing queues before peer scheduling can be configured. See section 2.2.3, "Creating Queues", on page 20, for how to create queues.
• When configuring peer scheduling, it is strongly recommended to use the same version of PBS Professional at all peer locations.
• Make sure that custom resources are consistent across peer locations. Jobs requesting custom resources at one location will not be able to run at another unless the same resources are available.
• Under Windows, if single_signon_password_enable is set to True among all peer servers, users must have their password cached on each server. See section 8.11.1.1, "Per-user/per-server Passwords", on page 824.
4.8.31.3 Configuring Peer Scheduling
The following sections give details on how to configure peer scheduling. Here is a brief outline:
• Define a flat user namespace on all complexes
• Map pulling queues to furnishing queues
• If necessary, specify ports
• Grant manager access to each pulling server
• If possible, make user-to-group mappings consistent across complexes
• If any of the peering sites is using failover, configure peering to work with failover
4.8.31.3.i Defining a Flat User Namespace
Peer scheduling requires a flat user namespace in all complexes involved. This means that user "joe" on the remote peer system(s) must be the same as user "joe" on the local system. Your site must have the same mapping of user to UID across all hosts, and a one-to-one mapping of UIDs to usernames. This means that PBS does not need to check whether X@hostA is the same as X@hostB; it can just assume that this is true. Set flatuid to True:
Qmgr: set server flatuid = True
For more on flatuid, see section 8.3.13, “Flatuid and Access”, on page 810.
4.8.31.3.ii Mapping Pulling Queues to Furnishing Queues
You configure peer scheduling by mapping a furnishing peer’s queue to a pulling peer’s
queue. You can map each pulling queue to more than one furnishing queue, or more than one
pulling queue to each furnishing queue.
The pulling and furnishing queues must be execution queues, not route queues. However, the
queues can be either ordinary queues that the complex uses for normal work, or special
queues set up just for peer scheduling.
You map pulling queues to furnishing queues by setting the peer_queue scheduler configuration option in PBS_HOME/sched_priv/sched_config. The format is:
peer_queue: "<pulling queue> <furnishing queue>@<furnishing server>.domain"
For example, Complex A's queue "workq" is to pull from two queues: Complex B's queue "workq" and Complex C's queue "slowq". Complex B's server is ServerB and Complex C's server is ServerC. You would add this to Complex A's PBS_HOME/sched_priv/sched_config:
peer_queue: “workq workq@ServerB.domain.com”
peer_queue: “workq slowq@ServerC.domain.com”
Or if you wish to direct Complex B’s jobs to queue Q1 on Complex A, and Complex C’s jobs
to Q2 on Complex A:
peer_queue: “Q1 workq@ServerB.domain.com”
peer_queue: “Q2 fastq@ServerC.domain.com”
In one complex, you can create up to 50 mappings between queues. This means that you can
have up to 50 lines in PBS_HOME/sched_priv/sched_config beginning with
“peer_queue”.
4.8.31.3.iii Specifying Ports
The default port for the server to listen on is 15001, and the scheduler uses any privileged port (1023 and lower). If the furnishing server is not using the default port, you must specify the port when you specify the queue. For example, if ServerB is using port 16001, and you wish to pull jobs from workq at ServerB to workq at ServerA, add this to PBS_HOME/sched_priv/sched_config at ServerA:
peer_queue: "workq workq@ServerB.domain.com:16001"
The scheduler and server communicate via TCP.
4.8.31.3.iv Granting Manager Access to Pulling Servers
Each furnishing server must grant manager access to each pulling server. If you wish jobs to
move in both directions, where Complex A will both pull from and furnish jobs to Complex
B, ServerA and ServerB must grant manager access to each other.
On the furnishing complex:
For UNIX:
Qmgr: set server managers += root@pullingServer.domain.com
For Windows:
Qmgr: set server managers += <name of PBS service account>@*
4.8.31.3.v Making User-to-group Mappings Consistent Across Complexes
If possible, ensure that for each user in a peer complex, that user is in the same group in all
participating complexes. So if user “joe” is in groupX on Complex A, user “joe” should be in
groupX on Complex B. This means that a job’s egroup attribute will be the same on both
complexes, and any group limit enforcement can be properly applied.
There is a condition when using peer scheduling in which group hard limits may not be applied correctly. This can occur when a job's effective group (its egroup attribute, i.e. the job's owner's group) is different on the furnishing and pulling systems. When the job is moved over to the pulling complex, it can evade group limit enforcement if the group under which it will run on the pulling system has not reached its hard limit. The reverse is also true: if the group under which it will run on the pulling system has already reached its hard limit, the job won't be pulled to run, although it should be.
This situation can also occur if the user explicitly specifies a group via qsub -W
group_list.
We recommend advising users not to use the qsub options "-u user_list" or "-W group_list=groups" in conjunction with peer scheduling.
4.8.31.3.vi Configuring Peer Scheduling with Failover
If you are configuring peer scheduling so that Complex A will pull from Complex B where
Complex B is configured for failover, you must configure Complex A to pull from both of
Complex B’s servers. For these instructions, see section 9.2.6.2, “Configuring Failover to
Work With Peer Scheduling”, on page 853.
4.8.31.4 Peer Scheduling Advice
4.8.31.4.i Selective Peer Scheduling
You can choose the kinds of jobs that can be selected for peer scheduling to a different complex. You can do the following:
• Set resource limits at the furnishing queue via the resources_min and resources_max queue attributes. See section 2.2.6.4, "Using Resources to Route Jobs Between Queues", on page 25.
• Route jobs into the furnishing queue via a hook. See section 6.6.1, "Routing Jobs", on page 448.
• Route jobs into the furnishing queue via a routing queue. See section 2.2.6, "Routing Queues", on page 24.
4.8.31.4.ii Setting Priority for Pulled Jobs
You can set a special priority for pulled jobs by creating a queue that is used only as a pulling
queue, and setting the pulling queue’s priority to the desired level. You can then use the
queue’s priority when setting job execution priority. See section 4.2.5.3.iv, “Using Queue Priority when Computing Job Priority”, on page 70.
For example, if you give the pulling queue the lowest priority, the pulling complex will pull a
job only when there are no higher-priority jobs that can run.
You can also have pulled jobs land in a special queue where they inherit a custom resource
that is used in the job sorting formula.
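As a sketch (the queue name and priority value are hypothetical), you could create a dedicated pulling queue with the lowest priority at the pulling complex:
Qmgr: create queue pullq queue_type = execution
Qmgr: set queue pullq priority = 1
Qmgr: set queue pullq enabled = True, started = True
Then map pullq to the furnishing queue via peer_queue, and use queue priority when computing job execution priority.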
4.8.31.5 How Peer Scheduling Affects Jobs
4.8.31.5.i How Peer Scheduling Affects Inherited Resources
If the job is moved from one server to another via peer scheduling, any default resources in
the job’s resource list inherited from the furnishing queue or server are removed. This
includes any select specification and place directive that may have been generated by the rules
for conversion from the old syntax. If a job's resource is unset (undefined) and there exists a
default value at the new queue or server, that default value is applied to the job's resource list.
If either select or place is missing from the job's new resource list, it will be automatically
generated, using any newly inherited default values.
When the pulling scheduler runs the job the first time, the job is run as if the job still had all of
the resources it had at the furnishing complex. If the job is requeued and restarted at the pulling complex, the job picks up new default resources from the pulling complex, and is scheduled according to the newly-inherited resources from the pulling complex.
4.8.31.5.ii How Peer Scheduling Affects Policy Applied to Job
After a job is pulled from one complex to another, the scheduling policy of the pulling complex is applied to the job.
For example, if you use queue priority in the formula and the job is moved to another server
through peer scheduling, the queue priority used in the formula will be that of the queue to
which the job is moved.
When a job is pulled from one complex to another, hooks are applied at the new complex as if
the job had been submitted locally. For example, if the pulling complex has a queuejob
hook, that hook runs when a job is pulled.
4.8.31.5.iii How Peer Scheduling Affects Job Eligible Time
The job’s eligible_time is preserved when a job is moved due to peer scheduling.
4.8.31.5.iv Viewing Jobs That Have Been Moved to Another Server
If you are connected to ServerA and a job submitted to ServerA has been moved from ServerA to ServerB through peer scheduling, in order to display it via qstat, give the job ID as
an argument to qstat. If you only give the qstat command, the job will not appear to
exist. For example, the job 123.ServerA is moved to ServerB. In this case, use
qstat 123
or
qstat 123.ServerA
To list all jobs at ServerB, you can use:
qstat @ServerB
4.8.31.5.v Peer Scheduling and Hooks
When a job is pulled from one complex to another, the following happens:
• Hooks are applied at the new complex as if the job had been submitted locally
• Any movejob hooks at the furnishing server are run
4.8.31.6 Peer Scheduling Caveats
• Each complex can peer with at most 50 other complexes.
• When using peer scheduling, group hard limits may not be applied correctly. This can occur when the job owner's group is different on the furnishing and pulling systems. For help in avoiding this problem, see section 4.8.31.3.v, "Making User-to-group Mappings Consistent Across Complexes", on page 221.
• You cannot peer schedule between a Windows complex and a UNIX/Linux complex.
• When the pulling scheduler runs the job the first time, the job is run as if the job still had all of the resources it had at the furnishing complex. If the job is requeued and restarted at the pulling complex, the job picks up new default resources from the pulling complex, and is scheduled according to the newly-inherited resources from the pulling complex.
• Peer scheduling is not supported for job arrays.
4.8.32 Placement Sets
Placement sets are the sets of vnodes within which PBS will try to place a job. PBS tries to
group vnodes into the most useful sets, according to how well connected the vnodes are, or
the values of resources available at the vnodes. Placement sets are used to improve task
placement (optimizing to provide a “good fit”) by exposing information on system configuration and topology. The scheduler tries to put a job in the smallest appropriate placement set.
4.8.32.1 Definitions
Task placement
The process of choosing a set of vnodes to allocate to a job that will satisfy both the
job's resource request (select and place specifications) and the configured scheduling
policy.
Placement Set
A set of vnodes. Placement sets are defined by the values of vnode-level string array
resources. A placement set is all of the vnodes that have the same value for a specified defining resource substring. For example, if the defining resource is a vnode-level string array named "switch", which can have values "S1", "S2", or "S3", the set of vnodes which have a substring matching "switch=S2" is a placement set.
Placement sets can be specified at the server or queue level.
Placement Set Series
A set of sets of vnodes.
A placement set series is all of the placement sets that are defined by specifying one
string array resource. Each placement set in the series is the set of vnodes that share
one value for the resource. There is one placement set for each value of the resource.
If the resource takes on N values at the vnodes, then there are N sets in the series.
For example, if the defining resource is a string array named “switch”, which can
have values “S1”, “S2”, or “S3”, there are three sets in the series. The first is defined
by the value “S1”, where all the vnodes in that set have the value “S1” for the
resource switch. The second set is defined by “S2”, and the third by “S3”.
Each of the resources named in node_group_key specifies a placement series. For
example, if the server’s node_group_key attribute contains “router,switch”, then
the server has two placement set series.
Placement Pool
All of the placement sets that are defined; the server can have a placement pool, and
each queue can have its own placement pool. If a queue has no placement pool, the
scheduler uses the server’s placement pool.
A placement pool is the set of placement set series that are defined by one or more
string array resources named in node_group_key.
For example, if the server’s node_group_key attribute contains “router,switch”, and
router can take the values “R1” and “R2” and switch can take the values “S1”, “S2”,
and “S3”, then there are five placement sets, in two placement series, in the server’s
placement pool.
Static Fit
A job statically fits into a placement set if the job could fit into the placement set if
the set were empty. It might not fit right now with the currently available resources.
Dynamic Fit
A job dynamically fits into a placement set if it will fit with the currently available
resources (i.e. the job can fit right now).
4.8.32.2 Requirements for Placement Sets
• Placement sets are enabled by setting the server's node_group_enable attribute to True.
• Server-level placement sets are defined by setting the server's node_group_key attribute to a list of vnode-level string array resources.
• Queue-level placement sets are defined by setting a queue's node_group_key attribute to a list of vnode-level string array resources.
• At least one vnode-level string array resource must exist on vnodes and be set to values that can be used to partition the vnodes.
4.8.32.3 Description of Placement Sets
4.8.32.3.i What Defines a Placement Set, Series, or Pool
Placement sets are defined by the values of vnode-level string array resources. You define
placement sets by specifying the names of these resources in the node_group_key attribute
for the server and/or queues. Each value of each resource defines a different placement set. A
placement set is all of the vnodes that have the same value for a specified defining resource.
For example, if the defining resource is a vnode-level string array named “switch”, which has
the values “S1”, “S2”, and “S3”, the set of vnodes where switch has the value “S2” is a placement set. If some vnodes have more than one substring, and one of those substrings is the
same in each vnode, those vnodes make up a placement set. For example, if the resource is
“router”, and vnode V0 has resources_available.router set to “r1i0,r1”, and vnode V1 has
resources_available.router set to “r1i1,r1”, V0 and V1 are in the placement set defined by
resources_available.router = “r1”. If the resource has N distinct values across the vnodes,
including the value zero and being unset, there are N placement sets defined by that resource.
Each placement set can have a different number of vnodes; the number of vnodes is determined only by how many vnodes share that resource value.
Each placement set series is defined by the values of a single resource across all the vnodes.
For example, if there are three switches, S1, S2 and S3, and there are vnodes with
resources_available.switch that take on one or more of these three values, then there will
be three placement sets in the series.
Whenever you define any placement sets, you are defining a placement pool. Placement
pools can be defined for the server and for each queue. You define a server-level placement
pool by setting the server’s node_group_key to a list of one or more vnode-level string array
resources. You define a queue-level placement pool by similarly setting the queue’s
node_group_key.
4.8.32.3.ii Vnode Participation in Placement Sets, Series, and Pools
Each vnode can be in multiple placement sets, placement set series, and placement pools.
A vnode can be in multiple placement sets in the same placement set series. For example, if
the resource is called “router”, and a vnode’s router resource is set to “R1, R2”, then the
vnode will be in the placement set defined by router = R1 and the set defined by router = R2.
A vnode is in a placement series whenever the resource that defines the series is defined on
the vnode. For example, if placement sets are defined by the values of the router and the
switch resources, and a vnode has value R1 for router, and S1 for switch, then the vnode is in
both placement series, because it is in the set that shares the R1 value for router, and the set
that shares the S1 value for switch. Each of those sets is one of a different series.
The server has its own placement pool if the server’s node_group_key attribute is set to at
least one vnode-level string array resource. Similarly, each queue can have its own placement
pool. A vnode can be in any placement pool that specifies a resource that is defined on the
vnode.
4.8.32.3.iii Multihost Placement Sets
Placement sets, series, and pools can span hosts. Placement sets can be made up of vnodes
from anywhere, regardless of whether the vnode is from a multi-vnode host.
To set up a multihost placement set, choose a string array resource for the purpose, and list it
in the desired node_group_key attribute. For example, create a string_array resource called
“span”:
Qmgr: create resource span type=string_array, flag=h
Add the resource “span” to node_group_key on the server or queue. Use qmgr to give it
the same value on all the desired vnodes. You can write a script that sets the same value on
each vnode that you want in your placement set.
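For example (a sketch with hypothetical vnode names and value), after creating the resource:
Qmgr: set server node_group_key += span
Qmgr: active node vnode1,vnode2,vnode3
Qmgr: set node resources_available.span = groupA
All three vnodes now share the value groupA for span, so they form one placement set even if they are on different hosts.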
4.8.32.3.iv Machines with Multiple Vnodes
Machines with multiple vnodes such as the SGI Altix are represented as a generic set of
vnodes. Placement sets are used to allocate resources on a single machine to improve performance and satisfy scheduling policy and other constraints. Jobs are placed on vnodes using
placement set information. For placement set generation information for SGI machines, see
section 10.4.8.1, “Generation of Placement Set Information”, on page 958.
4.8.32.3.v Placement Sets Defined by Unset Resources
For each defining resource, vnodes where that resource is unset are grouped into their own
placement set. For example, if you have ten vnodes, on which there is a string resource
COLOR, where two have COLOR set to “red”, two are set to “blue”, two are set to “green”
and the rest are unset, there will be four placement sets defined by the resource COLOR. This
is because the fourth placement set consists of the four vnodes where COLOR is unset. This
placement set will also be the largest.
Every resource listed in node_group_key could potentially define such a placement set.
4.8.32.3.vi Placement Sets and Node Grouping
Node grouping is the same as one placement set series, where the placement sets are defined
by a single resource. Node grouping has been superseded by placement sets.
In order to have the same behavior as in the old node grouping, group on a single resource. If
this resource is a string array, it should only have one value on each vnode. This way, each
vnode will only be in one node group.
4.8.32.4 How Placement Sets Are Used
You use placement sets to partition vnodes according to the value of one or more resources.
Placement sets allow you to group vnodes into useful sets.
You can run multi-vnode jobs in one placement set. For example, it makes the most sense to
run a multi-vnode job on vnodes that are all connected to the same high-speed switch.
PBS will attempt to place each job in the smallest possible set that is appropriate for the job.
4.8.32.4.i Order of Placement Pool Selection
The scheduler chooses one placement pool from which to select a placement set. If the job
cannot run in that placement pool, the scheduler ignores placement sets for the job.
Queue placement pools override the server’s placement pool. If a queue has a placement
pool, jobs from that queue are placed using the queue’s placement pool. If a queue has no
placement pool (the queue’s node_group_key is not defined), jobs are placed using the
server’s placement pool, if it exists.
A per-job placement set is defined by the -l place statement in the job’s resource request.
Since the job can only request one value for the resource, it can only request one specific
placement set. A job’s place=group resource request overrides the sets defined by the
queue’s or server’s node_group_key.
The scheduler chooses the most specific placement pool available, following this order of precedence:
1. A per-job placement set (job's place=group= request)
2. A placement set from the placement pool for the job's queue
3. A placement set from the complex-wide placement pool
4. All vnodes in the complex
4.8.32.4.ii Order of Placement Set Consideration Within Pool
The scheduler looks in the selected placement pool and chooses the smallest possible placement set that is appropriate for the job. The scheduler examines the placement sets in the pool
and orders them, from smallest to largest, according to the following rules:
1. Static total ncpus of all vnodes in set
2. Static total mem of all vnodes in set
3. Dynamic free ncpus of all vnodes in set
4. Dynamic free mem of all vnodes in set
If a job can fit statically within any of the placement sets in the placement pool, then the
scheduler places a job in the first placement set in which it fits dynamically. This ordering
ensures the scheduler will use the smallest possible placement set in which the job will
dynamically fit. If there are multiple placement sets where the job fits statically, but some are
being used, the scheduler uses the first placement set where the job can run now. If the job fits
statically into at least one placement set, but these placement sets are all busy, the scheduler
waits until a placement set can fit the job dynamically.
If a job cannot statically fit into any placement set in the selected placement pool, the scheduler ignores defined placement sets and uses all available vnodes as its placement set, unless
the do_not_span_psets scheduler attribute is True, in which case the job will not run.
For example, we have the following placement sets, and a job that requests 8 CPUs:
Set1 ncpus = 4
Set2 ncpus = 12; this placement set is full
Set3 ncpus = 16; this placement set is not being used
The scheduler looks at Set1; Set1 is statically too small, and the scheduler moves to the next
placement set. Set2 is statically large enough, but the job does not fit dynamically. The
scheduler looks at Set3; Set3 is large enough, and the job fits dynamically. The scheduler runs
the job in Set3.
If the job requests 24 CPUs, the scheduler attempts to run the job in the set consisting of all
vnodes.
4.8.32.4.iii Order of Vnode Selection Within Set
The scheduler orders the vnodes within the selected placement set using the following rules:
• If node_sort_key is set, vnodes are sorted by node_sort_key. See section 4.8.48, "Sorting Vnodes on a Key", on page 300.
• If node_sort_key is not set, vnodes are in the order in which they are returned by pbs_statnode(). This is the default order in which the vnodes appear in the output of the pbsnodes -a command.
The scheduler places the job on the vnodes according to their ordering above.
4.8.32.5 Summary of Placement Set Requirements
The steps to configure placement sets are given in the next section. The requirements are
summarized here for convenience:
• Definitions of the resources of interest
• Vnodes defining a value for each resource to be used for placement sets (e.g., rack)
  • If defined via vnode definition, you must HUP the MoMs involved
• The server's or queue's node_group_key attribute must be set to the resources to be used for placement sets. For example, if we have custom resources named "rack", "socket", "board", and "boardpair", which are to be used for placement sets:
Qmgr: set server node_group_key = "rack,socket,board,boardpair"
  • No signals needed, takes effect immediately
• Placement sets must be enabled at the server by setting the server's node_group_enable attribute to True. For example:
Qmgr: set server node_group_enable=True
  • No signals needed, takes effect immediately
• Adding a resource to the scheduler's resources: line is required only if the resource is to be specifically requested by jobs. It is not required for -lplace=group=<resource>.
4.8.32.6 How to Configure Placement Sets
The following steps show how to satisfy the requirements for placement sets:
1. If the vnodes that you will use in placement sets are not defined, define them. See section 3.1.5, "Creating Vnodes", on page 39.
2. If the vnode-level string array resources that you will use to define placement sets do not exist, create them. See section 5.14.5, "Configuring Host-level Custom Resources", on page 360.
3. Restart the server; see section 5.14.3.1, "Restarting the Server", on page 356.
4. If values for the vnode-level string array resources that you will use to define placement sets are not set at the vnodes you wish to use, set the values. See section 3.5, "How to Configure MoMs and Vnodes", on page 50.
5. If you use vnode definition files to set values for vnode-level string array resources, HUP the MoMs involved.
6. To create queue placement pools, set the node_group_key attribute to the name(s) of one or more vnode-level string array resources. Do this for each queue for which you want a separate pool. For example:
Qmgr: set queue workq node_group_key = <router,switch>
7. To create a server placement pool, set the node_group_key server attribute to the name(s) of one or more vnode-level string array resources. For example:
Qmgr: set server node_group_key = <router,switch>
For example, to create a server-level placement pool for the resources host, L2 and L3:
Qmgr: set server node_group_key = "host,L2,L3"
8. Set the server's node_group_enable attribute to True:
Qmgr: set server node_group_enable = True
9. For ease of reviewing placement set information, you can add the name of each resource used to each vnode's pnames attribute:
Qmgr: active node <vnode name>,<vnode name>,...
Qmgr: set node pnames += <resource name>
or
Qmgr: set node pnames = <resource list>
For example:
Qmgr: set node pnames = "board,boardpair,iruquadrant,iruhalf,iru,rack"
We recommend using the natural vnode for any placement set information that is invariant for
a given host.
Resources used only for defining placement sets, and not for allocation to jobs, do not need to be listed in the resources: line in PBS_HOME/sched_priv/sched_config. For example, if you create a resource just for defining placement sets, and jobs will not be requesting this resource, you do not need to list it in the resources: line.
4.8.32.7 Examples of Creating Placement Sets
4.8.32.7.i Cluster with Four Switches
This cluster is arranged as shown with vnodes 1-4 on Switch1, vnodes 5-10 on Switch2, and
vnodes 11-24 on Switch3. Switch1 and Switch2 are on Switch4.
[Figure: cluster topology. Switch1 (Vnode1-Vnode4, 4 vnodes) and Switch2 (Vnode5-Vnode10, 6 vnodes) both connect to Switch4; Switch3 (Vnode11-Vnode24, 14 vnodes) stands alone.]
To make the placement sets group the vnodes as they are grouped on the switches:
Create a custom resource called switch; the h flag makes it a host-level resource:
switch type=string_array, flag=h
On vnodes[1-4] set:
resources_available.switch="switch1,switch4"
On vnodes[5-10] set:
resources_available.switch="switch2,switch4"
On vnodes[11-24] set:
resources_available.switch="switch3"
On the server set:
node_group_enable=True
node_group_key=switch
So you have 4 placement sets:
The placement set "switch1" has 4 vnodes
The placement set "switch2" has 6 vnodes
The placement set "switch3" has 14 vnodes
The placement set "switch4" has 10 vnodes
PBS will try to place a job in the smallest available placement set. Does the job fit into the
smallest set (switch1)? If not, does it fit into the next smallest set (switch2)? This continues
until it finds one where the job will fit.
4.8.32.7.ii Example of Configuring Placement Sets on an SGI Altix
For information on how to configure vnodes on a cpusetted machine in order to define new
placement sets on an Altix, use the instructions in section 3.5.2.3, “Configuring Machines
with Cpusets”, on page 53.
In this example, we define a new placement set using the new resource “NewRes”. We create
a file called SetDefs that contains the changes we want.
1. Create the new resource:
Qmgr: create resource NewRes type=string_array, flag=h
2. Add NewRes to the server's node_group_key:
Qmgr: set server node_group_key+="NewRes"
3. Add NewRes to the value of the pnames attribute for the natural vnode. This makes the name of the resource you used easily available. Add a line like this to SetDefs:
altix3: resources_available.pnames ="...,NewRes"
4. For each vnode, V, that's a member of a new placement set you're defining, add a line of the form:
V: resources_available.NewRes = "<value1[,...]>"
All the vnodes in the new set should have lines of that form, with the same resource value, in the new configuration file.
Here the value of the resource is "P" and/or "Q". We'll put vnodes A, B and C into one placement set, and vnodes B, C and D into another.
A: resources_available.NewRes = P
B: resources_available.NewRes = "P,Q"
C: resources_available.NewRes = "P,Q"
D: resources_available.NewRes = Q
For each new placement set you define, use a different value for the resource.
5. Insert SetDefs as a Version 2 MoM configuration file called NewConfig, so that MoM reads it:
pbs_mom -s insert NewConfig SetDefs
6. Stop and restart the MoM. See "Starting and Stopping PBS: UNIX and Linux" on page 211 in the PBS Professional Installation & Upgrade Guide.
4.8.32.7.iii Example of Altix Placement Pool
In this example, we have vnodes connected to four cbricks and two L2 connectors.
Enable placement sets:
Qmgr: s s node_group_enable=True
Define the pool you want:
Qmgr: s s node_group_key=”cbrick, L2”
When you use the following:
pbsnodes -av | egrep '(^[^ ])|cbrick'
or
pbsnodes -av | egrep '(^[^ ])|L2'
and the vnodes look like this:
vnode1
resources_available.cbrick=cbrick1
resources_available.L2=A
vnode2
resources_available.cbrick=cbrick1
resources_available.L2=B
vnode3
resources_available.cbrick=cbrick2
resources_available.L2=A
vnode4
resources_available.cbrick=cbrick2
resources_available.L2=B
vnode5
resources_available.cbrick=cbrick3
resources_available.L2=A
vnode6
resources_available.cbrick=cbrick3
resources_available.L2=B
vnode7
resources_available.cbrick=cbrick4
resources_available.L2=A
vnode8
resources_available.cbrick=cbrick4
resources_available.L2=B
There are six resulting placement sets:
cbrick=cbrick1: {vnode1, vnode2}
cbrick=cbrick2: {vnode3, vnode4}
cbrick=cbrick3: {vnode5, vnode6}
cbrick=cbrick4: {vnode7, vnode8}
L2=A: {vnode1, vnode3, vnode5, vnode7}
L2=B: {vnode2, vnode4, vnode6, vnode8}
4.8.32.7.iv Example of Placement Sets Using Colors
A placement pool is defined by two resources: colorset1 and colorset2, by using
“node_group_key=colorset1,colorset2”.
If a vnode has the following values set:
resources_available.colorset1=blue, red
resources_available.colorset2=green
The placement pool contains at least three placement sets. These are:
{resources_available.colorset1=blue}
{resources_available.colorset1=red}
{resources_available.colorset2=green}
This means the vnode is in all three placement sets. The same result would be given by using
one resource and setting it to all three values, e.g. colorset=blue,red,green.
Example: We have five vnodes v1 - v5:
v1 color=red host=mars
v2 color=red host=mars
v3 color=red host=venus
v4 color=blue host=mars
v5 color=blue host=mars
The placement sets are defined by
node_group_key=color
The resulting node groups would be: {v1, v2, v3}, {v4, v5}
4.8.32.7.v Simple Switch Placement Set Example
Say you have a cluster with two high-performance switches each with half the vnodes connected to it. Now you want to set up placement sets so that jobs will be scheduled only onto
the same switch.
First, create a new resource called “switch”. See section 5.14.2, “Defining New Custom
Resources”, on page 341.
Next, we need to enable placement sets and specify the resource to use:
Qmgr: set server node_group_enable=True
Qmgr: set server node_group_key=switch
Now, set the value for switch on each vnode:
Qmgr: active node vnode1,vnode2,vnode3
Qmgr: set node resources_available.switch=A
Qmgr: active node vnode4,vnode5,vnode6
Qmgr: set node resources_available.switch=B
Now there are two placement sets:
switch=A: {vnode1, vnode2, vnode3}
switch=B: {vnode4, vnode5, vnode6}
4.8.32.8 Placement Sets and Reservations
When PBS chooses a placement set for a reservation, it makes the same choices as it would
for a regular job. It fits the reservation into the smallest possible placement set. See section
4.8.32.4.ii, “Order of Placement Set Consideration Within Pool”, on page 228.
When a reservation is created, it is created within a placement set, if possible. If no placement
set will satisfy the reservation, placement sets are ignored. The vnodes allocated to a reservation are used as one single placement set for jobs in the reservation; they are not subdivided
into smaller placement sets. A job within a reservation runs within the single placement set
made up of the vnodes allocated to the reservation.
4.8.32.9 Placement Sets and Load Balancing
If you configure both placement sets and load balancing, the net effect is that vnodes that are
over their load limit will be removed from consideration.
4.8.32.10 Viewing Placement Set Information
You can find information about placement sets in the following places:
• The server's node_group_enable attribute shows whether placement sets are enabled
• The server's node_group_key attribute contains the names of resources used for the server's placement pool
• Each queue's node_group_key attribute contains the names of resources used for that queue's placement pool
• Each vnode's pnames attribute can contain the names of resources used for placement sets, if properly set
• PBS-generated MoM configuration files contain names and values of resources
4.8.32.11 Placement Set Caveats and Advice
• When you create a Version 2 configuration file for a pre-existing vnode, make sure it specifies all of the information about the vnode, such as resources and attribute settings. The creation of the configuration file overrides previous settings, and if the new file contains no specification for a resource or attribute, that resource or attribute becomes unset.
• If there is a vnode-level platform-specific resource set on the vnodes on the Altix, then node_group_key should probably include this resource, because this will enable PBS to run jobs in more logical sets of vnodes.
• If the user specifies a job-specific placement set, for example -lplace=group=switch, but the job cannot statically fit into any switch placement set, then the job will still run, but not in a switch placement set.
• The pnames vnode attribute is for displaying to the administrator the resources used for placement sets. This attribute is not used by PBS.
4.8.32.11.i Non-backward-compatible Change in Node Grouping
Given the following example configuration:
vnode1: switch=A
vnode2: switch=A
vnode3: switch=B
vnode4: switch=B
vnode5: switch unset
Qmgr: s s node_group_key=switch
There is no change in the behavior of jobs submitted with qsub -l ncpus=1:
version 7.1: The job can run on any vnode: vnode1, ..., vnode5
version 8.0: The job can run on any vnode: vnode1, ..., vnode5
Example of 8.0 and later behavior, for jobs submitted with qsub -lnodes=1:
version 7.1: The job can only run on vnode1, vnode2, vnode3, or vnode4. It will never use vnode5
version 8.0: The job can run on any vnode: vnode1, ..., vnode5
Overall, the change for version 8.0 was to include every vnode in placement sets (when
enabled). In particular, if a resource is used in node_group_key, PBS will treat every vnode
as having a value for that resource, hence every vnode will appear in at least one placement
set for every resource. For vnodes where a string resource is "unset", PBS will behave as if
the value is “”.
4.8.32.12 Attributes and Parameters Affecting Placement Sets
do_not_span_psets
Scheduler attribute. Specifies whether or not the scheduler requires the job to fit
within one of the existing placement sets. When do_not_span_psets is set to True,
the scheduler will require the job to fit within a single existing placement set. The
scheduler checks all placement sets, whether or not they are currently in use. If the
job fits in a currently-used placement set, the job must wait for the placement set to
be available. If the job cannot fit within a single placement set, it will not run.
When this attribute is set to False, the scheduler will first attempt to place the job in
a single placement set, but if it cannot, it will allow the job to span placement sets,
running on whichever vnodes can satisfy the job’s resource request.
Format: Boolean
Default value: False (This matches behavior of PBS 10.4 and earlier)
Example: To require jobs to fit within one placement set:
Qmgr: set sched do_not_span_psets=True
node_group_enable
Server attribute. Specifies whether placement sets are enabled.
Format: Boolean
Default: False
node_group_key
Both the server and queues have this attribute. Specifies the resources to use for placement set definition. A queue's attribute overrides the server's attribute.
Format: string_array
Default: Unset
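For example, to enable placement sets and define placement pools at the server and at a queue (a minimal sketch; the resource names switch and rack and the queue name workq are hypothetical, and the resources must already exist on your vnodes):
Qmgr: set server node_group_enable = True
Qmgr: set server node_group_key = switch
Qmgr: set queue workq node_group_key = rack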
4.8.32.13  Errors and Logging
If do_not_span_psets is set to True, and a job requests more resources than are available in one placement set, the following happens:
•  The job's comment is set to the following: "Not Running: can't fit in the largest placement set, and can't span psets"
•  The following message is printed to the scheduler's log: "Can't fit in the largest placement set, and can't span placement sets"
4.8.33  Using Preemption
PBS provides the ability to preempt currently running jobs in order to run higher-priority
work. This is called preemption or preemptive scheduling. PBS has two different approaches
to specifying preemption:
•  You can define a set of preemption priorities for all jobs. Jobs that have high preemption priority preempt those with low preemption priority. Preemption priority is mostly independent of execution priority. See section 4.8.33.6, "Preemption Levels", on page 244.
•  You can specify a set of preemption targets for each job. You can also set defaults for these targets at the server and queues. Preemption targets are jobs in specific queues or jobs that have requested specific resources. See section 4.8.33.3.i, "How Preemption Targets Work", on page 244.
Preemption is a primetime option, meaning that you can configure it separately for primetime
and non-primetime, or you can specify it for all of the time.
4.8.33.1  Glossary
Preempt
Stop one or more running jobs in order to start a higher-priority job
Preemption level
Job characteristic that determines preemption priority. Levels can be things like
being in an express queue, starving, having an owner who is over a soft limit, being a
normal job, or having an owner who is over a fairshare allotment
Preemption method
The method by which a job is preempted. This can be checkpointing, suspension, or
requeueing
Preemption priority
How important this job is compared to other jobs, when considering whether to preempt
Preemption Target
A preemption target is a job in a specified queue or a job that has requested a specified resource. The queue and/or resource is specified in another job’s
Resource_List.preempt_targets.
4.8.33.2  Preemption Parameters and Attributes
The scheduler parameters that control preemption are defined in PBS_HOME/sched_priv/sched_config. The scheduler also has attributes that control preemption; they can be set via qmgr. Parameters and attributes that control preemption are listed here:
preemptive_sched
Parameter. Enables job preemption.
Format: String
Default: True all
preempt_order
Parameter. Defines the order of preemption methods which the scheduler will use on
jobs.
Format: String, as quoted list
Default: “SCR”
preempt_prio
Parameter. Specifies the ordering of priority of different preemption levels.
Format: String, as quoted list
Default: “express_queue, normal_jobs”
preempt_queue_prio
Parameter. Specifies the minimum queue priority required for a queue to be classified as an express queue.
Format: Integer
Default: 150
preempt_sort
Parameter. Specifies whether jobs most eligible for preemption are sorted according to their start times. The only allowable value is "min_time_since_start". If set, the first job preempted is the one with the most recent start time. If unset, the first job preempted is the one that has been running longest. This parameter must be commented out in order to be unset; the default scheduler configuration file has it set to min_time_since_start.
Format: String
Default: min_time_since_start
preempt_targets
Resource that a job can request or inherit from the server or a queue. The
preempt_targets resource lists one or more queues and/or one or more resources.
Jobs in those queues, and jobs that request those resources, are the jobs that can be
preempted.
sched_preempt_enforce_resumption
Scheduler attribute. Specifies whether the scheduler creates a special execution priority class for preempted jobs. If so, the scheduler runs these jobs just after any
higher-priority jobs. See section 4.8.16, “Calculating Job Execution Priority”, on
page 174.
Format: Boolean
Default: False
4.8.33.3  How Preemption Works
If preemption is enabled, the scheduler uses preemption as part of its normal pattern of examining each job and figuring out whether or not it can run now. If a job with high preemption
priority cannot run immediately, the scheduler looks for jobs with lower preemption priority.
The scheduler finds jobs in the lowest preemption level that have been started the most
recently. The scheduler preempts these jobs and uses their resources for the higher-priority
job. The scheduler tracks resources used by lower-priority jobs, looking for enough resources
to run the higher-priority job. If the scheduler cannot find enough work to preempt in order to
run a given job, it will not preempt any work.
A job running in a reservation cannot be preempted.
A job’s preemption priority is determined by its preemption level.
4.8.33.3.i  How Preemption Targets Work
Preemption targets work as a restriction on which jobs can be preempted by a particular job.
If a job has requested preempt_targets, the scheduler searches for lower-priority jobs among
only the jobs specified in preempt_targets. If a job has not requested preempt_targets, the
scheduler searches among all jobs. For example, if the scheduler is trying to run JobA, and
JobA requests preempt_targets=Queue1,Resource_List.arch=linux, JobA is eligible to
preempt only those jobs in Queue1 and/or that request arch=linux. In addition, JobA can
only preempt jobs with lower preemption priority than JobA.
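Following the example above, the submission might look like this (a sketch mirroring the syntax shown in this section; job.sh is a hypothetical script, the quotes keep the list together on the command line, and the exact value syntax for preempt_targets is given in the PBS Professional Reference Guide):
qsub -l preempt_targets="Queue1,Resource_List.arch=linux" job.sh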
4.8.33.4  Preemption and Job Execution Priority
PBS has an execution class we call Preempted for jobs that have been preempted. The
scheduler restarts preempted jobs as soon as the preemptor finishes and any other higher-priority jobs finish. See section 4.8.16, “Calculating Job Execution Priority”, on page 174.
4.8.33.5  Triggers for Preemption
If preemption is enabled, preemption is used during the following:
•  The normal scheduling cycle
•  When you run a job via qrun
4.8.33.6  Preemption Levels
A preemption level is a class of jobs, where all the jobs in the class share a characteristic.
PBS provides built-in preemption levels, and you can combine them or ignore them as you
need, except for the normal_jobs class, which is required. The built-in preemption levels are
listed in the table below.
Table 4-14: Built-in Preemption Levels

Preemption Level    Description
express_queue       Jobs in express queues. See section 4.8.33.6.ii, "The Express Queues Preemption Level", on page 248
starving_jobs       A job that is starving. See section 4.8.33.6.iv, "The Starving Job Preemption Level", on page 249
normal_jobs         The preemption level into which a job falls if it does not fit into any other specified level. See section 4.8.33.6.v, "The Normal Jobs Preemption Level", on page 249
fairshare           When the entity owning a job exceeds its fairshare limit. See section 4.8.33.6.iii, "The Fairshare Preemption Level", on page 249
queue_softlimits    Jobs which are over their queue soft limits. See section 4.8.33.6.i, "The Soft Limits Preemption Level", on page 246
server_softlimits   Jobs which are over their server soft limits. See section 4.8.33.6.i, "The Soft Limits Preemption Level", on page 246
You can specify the relative priority of each preemption level by listing the levels in the desired order in the preempt_prio scheduler parameter. Placing a level earlier in the list, meaning to the left, gives it higher priority. For example, if your list is "express_queue, normal_jobs, server_softlimits", you are giving the highest priority to jobs in express queues, and the lowest priority to jobs that are over their server soft limits. You can list levels in any order, but be careful not to work at cross-purposes with your execution priority. See section 4.8.16, "Calculating Job Execution Priority", on page 174.
The default value for preempt_prio is the following:
preempt_prio: “express_queue, normal_jobs”
If you do not list a preemption level in the preempt_prio scheduler parameter, the jobs in that
level are treated like normal jobs. For example, if you do not list server_softlimits, then jobs
that are over their server soft limits are treated like jobs in the normal_jobs level.
You can create new levels that are combinations of the built-in levels. For example, you can define a level which is express_queue+server_softlimits. This level contains jobs that are in express queues and are over their server soft limits. You would probably want to place this level just to the right of the express_queue level, meaning that these jobs could be preempted by jobs that are in express queues but are not over their server soft limits.
You can give two or more levels the same priority. To do this, put a plus sign (“+”) between
them, and do not list either level separately in preempt_prio. You are creating a new level
that includes all the built-in levels that should have the same priority. For example, to list
express queue jobs as highest in priority, then fairshare and starving jobs at the next highest
priority, then normal jobs last, create a new level that contains the fairshare and
starving_jobs levels:
preempt_prio: “express_queue, fairshare+starving_jobs, normal_jobs”
You can be specific about dividing up jobs: if you want jobs in the express queue to preempt jobs that are also in the express queue but are over their server soft limits, list each level separately:
preempt_prio: "express_queue, express_queue+server_softlimits, normal_jobs"
However, be careful not to create a runaway effect by placing levels that are over limits before
those that are not, for example, express_queue+server_softlimits to the left of
express_queue.
You must list normal_jobs in the preempt_prio scheduler parameter.
4.8.33.6.i  The Soft Limits Preemption Level
You can set a limit, called a hard limit, on the number of jobs that can be run or the amount of
a resource that can be consumed by a person, a group, or by everyone, and this limit can be
applied at the server and at each queue. If you set such a limit, that is the greatest number of
jobs that will be run, or the largest amount of the resource that will be consumed.
You can also set a soft limit on the number of jobs that can be run or the amount of a resource
that can be consumed. This soft limit should be lower than the hard limit, and should mark
the point where usage changes from being normal to being “extra, but acceptable”. Usage in
this “extra, but acceptable” range can be treated by PBS as being lower priority than the normal usage. PBS can preempt jobs that are over their soft limits. The difference between the
soft limit and the hard limit provides a way for users or groups to use resources as long as no
higher-priority work is waiting.
Example 4-17: Using group soft limits
One group of users, group A, has submitted enough jobs that the group is over its soft limit. A second group, group B, submits a job and is under its soft limit. If preemption is enabled, jobs from group A are preempted until the job from group B can run.
Example 4-18: Using soft limits on number of running jobs
Given the following:
•  You have three users, UserA, UserB, and UserC
•  Each has a soft limit of 3 running jobs
•  UserA runs 3 jobs
•  UserB runs 4 jobs
•  UserC submits a job to an express queue
This means:
•  UserB has 1 job over the soft limit, so UserB's jobs are eligible for preemption by UserC's job
Example 4-19: Using soft limits on amount of resource being used
Given the following:
•  Queue soft limit for ncpus is 8
•  UserA's jobs use 6 CPUs
•  UserB's jobs use 10 CPUs
This means:
•  UserB is over their soft limit for CPU usage
•  UserB's jobs are eligible for preemption
To use soft limits in preemption levels, you must define soft limits. Soft limits are specified
by setting server and queue limit attributes. The attributes that control soft limits are:
max_run_soft
Sets the soft limit on the number of jobs that can be running
max_run_res_soft.<resource>
Sets the soft limit on the amount of a resource that can be consumed by running jobs
Soft limits are enforced only when they are used as a preemption level.
To use soft limits as preemption levels, add their keywords to the preempt_prio parameter in the scheduler's configuration file:
•  To create a preemption level for those over their soft limits at the server level, add "server_softlimits" to the preempt_prio parameter.
•  To create a preemption level for those over their soft limits at the queue level, add "queue_softlimits" to the preempt_prio parameter.
•  To create a preemption level for those over their soft limits at both the queue and server, add "server_softlimits+queue_softlimits" to the preempt_prio parameter.
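Putting these pieces together, a minimal sketch might look like this (the limit values and the queue name workq are illustrative; the bracket syntax for limit attributes is described in the section on hard and soft limits referenced at the end of this section):
Qmgr: set server max_run_soft = "[u:PBS_GENERIC=3]"
Qmgr: set queue workq max_run_res_soft.ncpus = "[o:PBS_ALL=8]"
and in PBS_HOME/sched_priv/sched_config:
preempt_prio: "express_queue, normal_jobs, server_softlimits, queue_softlimits"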
The jobs of a user or group are over their soft limit only as long as the number of running jobs
or the amount of resources used by running jobs is over the soft limit. If some of these jobs
are preempted or finish running, and the soft limit is no longer exceeded, the jobs of that user
or group are no longer over their soft limit, and no longer in that preemption level. For example, if the soft limit is 3 running jobs, and UserA runs 4 jobs, as soon as one job is preempted
and only 3 of UserA’s jobs are running, UserA’s jobs are no longer over their soft limit.
For a complete description of the use of these attributes, see section 5.15.1.4, “Hard and Soft
Limits”, on page 393.
4.8.33.6.ii  The Express Queues Preemption Level
The express_queue preemption level applies to jobs residing in express queues. An express
queue is an execution queue with priority at or above the value set in the
preempt_queue_prio scheduler parameter. The default value for this parameter is 150.
Express queues do not require the by_queue scheduler parameter to be True.
If you will use the express_queue preemption level, you probably want to configure at least
one express queue, along with some method of moving jobs into it. See section 2.2,
“Queues”, on page 18.
If you have more than one express queue, and they have different priorities, you are effectively creating separate sub-levels for express queues. Jobs in a higher-priority express queue
have greater preemption priority than jobs in lower-priority express queues.
See “preempt_queue_prio” on page 307 of the PBS Professional Reference Guide.
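For example, to create a queue that qualifies as an express queue under the default preempt_queue_prio of 150 (a sketch; the queue name express is hypothetical):
Qmgr: create queue express queue_type=execution
Qmgr: set queue express priority = 200
Qmgr: set queue express enabled = True, started = True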
4.8.33.6.iii  The Fairshare Preemption Level
The fairshare preemption level applies to jobs owned by entities who are over their fairshare
allotment. For example, if each of five users has 20 percent of the fairshare tree, and UserA is
using 25 percent of the resources being tracked for fairshare, UserA’s jobs become eligible for
preemption at the fairshare preemption level.
To use the fairshare preemption level, you must enable fairshare. See section 4.8.18, “Using
Fairshare”, on page 179.
4.8.33.6.iv  The Starving Job Preemption Level
The starving_jobs preemption level applies to jobs that are starving. Starving jobs are jobs that have been waiting at least a specified amount of time to run.
To use the starving_jobs preemption level, you must enable starving:
•  Set the help_starving_jobs parameter in PBS_HOME/sched_priv/sched_config to True
•  Set the amount of time that a job must wait before it is starving in the max_starve scheduler parameter
•  Optionally, use eligible time for waiting time. See section 4.8.13, "Eligible Wait Time for Jobs", on page 163.
See section 4.8.46, "Starving Jobs", on page 296.
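A minimal sketch of the relevant sched_config lines (the 24-hour threshold is illustrative):
help_starving_jobs: True ALL
max_starve: 24:00:00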
4.8.33.6.v  The Normal Jobs Preemption Level
One special class, normal_jobs, is the default class for any job not otherwise specified. If a
job does not fall into any of the specified levels, it is placed in normal_jobs.
Example 4-20: Starving jobs have the highest priority, then normal jobs, then jobs whose
entities are over their fairshare limit:
preempt_prio: “starving_jobs, normal_jobs, fairshare”
Example 4-21: Starving jobs whose entities are also over their fairshare limit are lower priority than normal jobs:
preempt_prio: “normal_jobs, starving_jobs+fairshare”
4.8.33.7  Selecting Preemption Level
PBS places each job in the most exact preemption level, or the highest preemption level that
fits the job.
Example 4-22: We have a job that is starving and over its server soft limits. The job is placed
in the “starving_jobs” level:
preempt_prio: “starving_jobs, normal_jobs, server_softlimits”
Example 4-23: We have a job that is starving and over its server soft limits. The job is placed
in the “starving_jobs+server_softlimits” level:
preempt_prio: “starving_jobs, starving_jobs+server_softlimits,
normal_jobs, server_softlimits”
4.8.33.8  Sorting Within Preemption Level
If there is more than one job within the preemption level chosen for preemption, PBS chooses
jobs within that level according to their start time. By default, PBS preempts the job which
started running most recently. This behavior can be changed using the scheduler parameter
preempt_sort. To direct PBS to preempt the longest-running jobs, comment out the line containing the preempt_sort parameter in PBS_HOME/sched_priv/sched_config.
For example, if we have two jobs where job A started running at 10:00 a.m. and job B started
running at 10:30 a.m:
•  The default behavior preempts job B
•  Job A is preempted when preempt_sort is commented out
The only allowable value for the preempt_sort parameter is "min_time_since_start", which is also its default; the default scheduler configuration file has this parameter set. To unset it, comment out the line.
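For example, to preempt the longest-running jobs first, the line in PBS_HOME/sched_priv/sched_config would be commented out like this:
#preempt_sort: min_time_since_start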
4.8.33.9  Preemption Methods
The scheduler can preempt a job in one of the following ways:
•  Suspend the job
•  Checkpoint the job
•  Requeue the job
The scheduler tries to preempt a job using the methods listed in the order you specify. This
means that if you specify that the order is “checkpoint, suspend, requeue”, the scheduler first
tries to checkpoint the job, and if it cannot, it tries to suspend the job, and if it cannot do that,
it tries to requeue the job.
You can specify the order of these attempts in the preempt_order scheduler parameter in
PBS_HOME/sched_priv/sched_config.
The preempt_order parameter defines the order of preemption methods which the scheduler
uses on jobs. This order can change depending on the percentage of time remaining on the
job. The ordering can be any combination of S, C and R (for suspend, checkpoint, and
requeue).
The value is an ordering, for example "SCR", optionally followed by a percentage of time remaining and another ordering. The format is a quoted list.
Example 4-24: PBS should first attempt to use suspension to preempt a job, and if that is
unsuccessful, then requeue the job:
preempt_order: “SR”
Example 4-25: If the job has between 100-81% of requested time remaining, first try to suspend the job, then try checkpoint, then requeue. If the job has between 80-51% of
requested time remaining, then attempt suspend then checkpoint; and between 50% and
0% time remaining just attempt to suspend the job:
preempt_order: “SCR 80 SC 50 S”
The default value for preempt_order is “SCR”.
4.8.33.9.i  Preemption Via Checkpoint
When a job is preempted via checkpointing, MoM runs the checkpoint_abort script, and PBS
kills and requeues the job. When the scheduler elects to run the job again, the scheduler runs
the restart script to restart the job from where it was checkpointed.
To preempt via checkpointing, you must define both of the following:
•  The checkpointing action, in the MoM's checkpoint_abort $action parameter, that is to take place when the job is preempted
•  The restarting action, in the MoM's restart $action parameter, that is to take place when the job is restarted
To do this, you must supply checkpointing and restarting scripts or equivalents, and then configure the MoM’s checkpoint_abort and restart $action parameters. Do not use the $action
checkpoint MoM parameter; it is used when the job should keep running.
See section 9.3, “Checkpoint and Restart”, on page 857.
4.8.33.9.ii  Preemption Via Suspension
Jobs are normally suspended via the SIGSTOP signal and resumed via the SIGCONT signal.
An alternate suspend or resume signal can be configured in MoM’s $suspendsig configuration parameter. See “pbs_mom” on page 61 of the PBS Professional Reference Guide.
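For example, to have MoM use SIGTSTP instead of SIGSTOP when suspending jobs, you would add a line like the following to PBS_HOME/mom_priv/config (a sketch; consult the pbs_mom entry in the Reference Guide for the exact accepted signal values on your platform):
$suspendsig SIGTSTP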
4.8.33.9.iii  Suspended Jobs and PBS Licenses
When a job is suspended, its PBS usage licenses are returned to the license pool, subject to the
constraints of the server’s pbs_license_min and pbs_license_linger_time attributes. The
scheduler checks to make sure that licenses are available before resuming any job. If the
required licenses are not available, the scheduler will log a message and add a comment to the
job. See “Floating Licenses and Job States” on page 132 in the PBS Professional Installation
& Upgrade Guide.
4.8.33.9.iv  Suspended Jobs and Resources
Suspended jobs will hold onto some memory and disk space. Suspended jobs may hold application licenses if the application releases them only when it exits. See section 5.9.6.2.i, “Suspension/resumption Resource Caveats”, on page 332.
4.8.33.9.v  Preemption Via Requeue
When a job is preempted and requeued, the job stops execution and is requeued. A requeued
job’s eligible time is preserved. The amount of time allowed to requeue a job is controlled by
the job_requeue_timeout server attribute. See “Server Attributes” on page 332 of the PBS
Professional Reference Guide.
A job that is not eligible to be requeued, meaning a job that was submitted with “-r n”, will
not be selected to be preempted via requeue.
4.8.33.10  Enabling Preemption
Preemptive scheduling is enabled by setting parameters in the scheduler’s configuration file
PBS_HOME/sched_priv/sched_config.
To enable preemption, you must do the following:
1. Specify the preemption levels to be used by setting preempt_prio to the desired preemption levels (the default is "express_queue, normal_jobs"). The preempt_prio parameter must contain an entry for normal_jobs.
2. Optional: specify preemption order by setting preempt_order
3. Optional: specify whether longest- or shortest-running jobs should be preempted first by setting preempt_sort
4. If you will use the fairshare preemption level, configure fairshare. See section 4.8.18, "Using Fairshare", on page 179.
5. If you will use the starving_jobs preemption level, configure starving. See section 4.8.33.6.iv, "The Starving Job Preemption Level", on page 249.
6. If you will use the server_softlimits and/or queue_softlimits preemption levels, configure server and/or queue soft limits. See section 4.8.33.6.i, "The Soft Limits Preemption Level", on page 246.
7. Enable preemption by setting preemptive_sched to True. It is True by default.
8. Choose whether to use preemption during primetime, non-primetime, or all of the time. The default is ALL. If you want separate behavior for primetime and non-primetime, specify each separately. For example:
   preemptive_sched True prime
   preemptive_sched False non_prime
4.8.33.11  Preemption Example
Below is an example of (part of) the scheduler's configuration file, showing an example configuration for preemptive scheduling.
# turn on preemptive scheduling
#
preemptive_sched: TRUE ALL
#
# set the queue priority level for express queues
#
preempt_queue_prio: 150
#
# specify the priority of jobs as: express queue
# (highest) then starving jobs, then normal jobs,
# followed by jobs who are starving but the user/group
# is over a soft limit, followed by users/groups over
# their soft limit but not starving
#
preempt_prio: "express_queue, starving_jobs, normal_jobs, starving_jobs+server_softlimits, server_softlimits"
#
# specify when to use each preemption method.
# If the first method fails, try the next
# method. If a job has between 100-81% time
# remaining, try to suspend, then checkpoint
# then requeue. From 80-51% suspend and then
# checkpoint, but don't requeue.
# If between 50-0% time remaining, then just
# suspend it.
#
preempt_order: “SCR 80 SC 50 S”
4.8.33.12  Preemption Caveats and Recommendations
•  When using any of the fairshare, soft limits, express queue, or starving jobs preemption levels, be sure to enable the corresponding PBS feature. For example, when using preemption with the fairshare preemption level, be sure to turn fairshare on. Otherwise, you will be using stale fairshare data to preempt jobs.
•  It's important to be careful about the order of the preemption levels and the sizes of the limits at queue and server. For example, if you make users who are over their server soft limits have higher priority than users who are over their queue soft limits, and you set the soft limit higher at the server than at the queue, you can end up with users who have more jobs running preempting users who have fewer jobs running.
In this example, a user with more jobs preempts a user with fewer jobs.
Given the following:
•  The preempt_prio line contains "server_softlimits, queue_softlimits"
•  Server soft limit is 5
•  Queue soft limit is 3
•  User1 has 6 jobs running
•  User2 has 4 jobs running
This means:
•  User1 has higher priority, because User1 is over the server soft limit
•  User1's jobs can preempt User2's jobs
To avoid this scenario, you could set the preempt_prio line to contain "server_softlimits, queue_softlimits, server_softlimits+queue_softlimits". In this case User1 would have lower priority, because User1 is over both soft limits.
•  Preemption priority is mostly independent of execution priority. You can list preemption levels in any order in preempt_prio, but be careful not to work at cross-purposes with your execution priority. Be sure that you are not preempting jobs that have higher execution priority. See section 4.8.16, "Calculating Job Execution Priority", on page 174.
•  Using preemption with strict ordering and backfilling may change which job is being backfilled around.
•  When a job is preempted via checkpointing or requeueing, it loses its queue wait time. This does not apply to preemption via suspension.
•  If a high-priority job has been selected to preempt lower-priority jobs, but is rejected by a runjob hook, the scheduler undoes the preemption of the low-priority jobs. Suspended jobs are resumed, and checkpointed jobs are restarted.
•  A job that has requested an AOE will not preempt another job, regardless of whether the job's requested AOE matches an instantiated AOE. Running jobs are not preempted by jobs requesting AOEs.
•  If a job is checkpointed by the scheduler because it was preempted, the scheduler briefly applies a hold, but releases the hold immediately after checkpointing the job, and runs the restart script when the job is scheduled to run.
•  When jobs are preempted via requeueing, the requeue can fail if the job being preempted takes longer than the allowed timeout. See section 9.4.3, "Setting Job Requeue Timeout", on page 883.
•  When you issue "qrun <job ID>" without the -H option, the selected job has preemption priority between Reservation and Express for that scheduling cycle. However, at the following scheduling cycle, the preemption priority of the selected job returns to whatever it would be without qrun.
4.8.34  Using Primetime and Holidays
It is often useful to run different scheduling policies for specific intervals during the day or work week. PBS provides a way to specify two types of interval, called primetime and non-primetime.
Between them, primetime and non-primetime cover all time. There is no time slot that is neither primetime nor non-primetime. This includes dedicated time: primetime and/or non-primetime overlap dedicated time.
You can use non-primetime for such tasks as running jobs on desktop clusters at night.
4.8.34.1  How Primetime and Holidays Work
The scheduler looks in the PBS_HOME/sched_priv/holidays file for definitions of
primetime, non-primetime, and holidays.
Many PBS scheduling parameters can be specified separately for primetime, non-primetime,
or all of the time. This means that you can use, for example, fairshare during primetime and
no fairshare during non-primetime. These parameters have a time slot default of all, meaning
that if enabled, they are in force all of the time.
The scheduler applies the parameters defined for primetime during the primetime time slots,
and applies parameters defined for non-primetime during the non-primetime time slots. Any
scheduler parameters defined for all time are run whether it is primetime or not.
Any holidays listed in the holidays file are treated as non-primetime. To have a holiday
treated like a normal workday or weekend, do not list it in the holidays file.
There are default behaviors for primetime and non-primetime, but you can set up the behavior
you want for each type. The names “primetime” and “non-primetime” are meant to be informative, but they are arbitrary. The default for primetime is from 6:00 AM to 5:30 PM on
weekdays, meaning that weekends and nights are non-primetime by default. U.S. Federal
holidays are provided in the holidays file.
You can define primetime and non-primetime queues. Jobs in these queues can run only during the designated time. Queues that are not defined specifically as primetime or non-primetime queues are called “anytime queues”.
4.8.34.2  Configuring Primetime and Non-primetime
In order to use primetime and non-primetime, you must have a holidays file with the current year in it.
You can specify primetime and non-primetime time slots by specifying them in the
PBS_HOME/sched_priv/holidays file.
The format of the primetime and non-primetime section of the holidays file is the following:
YEAR YYYY
<day> <prime> <nonprime>
<day> <prime> <nonprime>
If there is no YEAR line in the holidays file, primetime is in force at all times. If there is more
than one YEAR line, the last one is used.
In YEAR YYYY, YYYY is the current year.
Day can be weekday, monday, tuesday, wednesday, thursday, friday, saturday, or sunday.
Each line must have all three fields.
Any line that begins with a “*” or a “#” is a comment.
Weekday names must be lowercase.
The ordering of elements in this file is important. The ordering of <day> lines in the holidays
file controls how primetime is determined. A later line takes precedence over an earlier line.
For example:
weekday    0630   1730
friday     0715   1600
means the same as
monday     0630   1730
tuesday    0630   1730
wednesday  0630   1730
thursday   0630   1730
friday     0715   1600
However, if a specific day is followed by "weekday",
friday     0700   1600
weekday    0630   1730
the "weekday" line takes precedence, so Friday will have the same primetime as the other weekdays.
Times can be expressed as one of the following:
•  HHMM, with no colons (:)
•  The word "all"
•  The word "none"
4.8.34.3  Configuring Holidays
You can specify primetime and non-primetime time slots by specifying them in the
PBS_HOME/sched_priv/holidays file.
You must specify the year, otherwise primetime is in force at all times, and PBS will not recognize any holidays. Specify the year here, where YYYY is the current year:
YEAR YYYY
Holidays are specified in lines of this form:
<day of year> <month day-of-month> <holiday name>
PBS uses the <day of year> field and ignores the <month day-of-month> field.
Day of year is the julian day of the year, between 1 and 365 (e.g. "1").
Month day-of-month is the calendar date, for example "Jan 1".
Holiday name is the name of the holiday, for example “New Year’s Day”.
4.8.34.4  Example of holidays File
YEAR       2007
*            Prime     Non-Prime
* Day        Start     Start
*
weekday      0600      1730
saturday     none      all
sunday       none      all
*
* Day of   Calendar   Company
* Year     Date       Holiday
1          Jan 1      New Year's Day
15         Jan 15     Dr. M.L. King Day
50         Feb 19     President's Day
148        May 28     Memorial Day
185        Jul 4      Independence Day
246        Sep 3      Labor Day
281        Oct 8      Columbus Day
316        Nov 12     Veteran's Day
326        Nov 22     Thanksgiving
359        Dec 25     Christmas Day
4.8.34.5  Reference Copies of holidays File
Reference copies of the holidays file are provided in PBS_EXEC/etc/holiday.<year>. The current year's holidays file has a reference copy in PBS_EXEC/etc/pbs_holidays, and a copy used by PBS in PBS_HOME/sched_priv/holidays.
To use a particular year's file as the holidays file, copy it to PBS_HOME/sched_priv/holidays -- note the "s" on the end of the filename.
4.8.34.6  Defining Primetime and Non-primetime Queues
Jobs in a primetime queue can start only during primetime. Jobs in a non-primetime queue
can start only during non-primetime. Jobs in an anytime queue can start at any time.
You define a primetime queue by naming it using the primetime prefix. The prefix is defined
in the primetime_prefix scheduler parameter. The default is “p_”. For example, you could
name a primetime queue “p_queueA”, using the default.
Similarly, you define a non-primetime queue by prefixing the name. The prefix is defined in
the nonprimetime_prefix scheduler parameter, and defaults to “np_”.
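For example, using the default prefixes (a sketch; the queue names are hypothetical):
Qmgr: create queue p_day queue_type=execution
Qmgr: set queue p_day enabled = True, started = True
Qmgr: create queue np_night queue_type=execution
Qmgr: set queue np_night enabled = True, started = True
Jobs in p_day can start only during primetime, and jobs in np_night can start only during non-primetime.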
4.8.34.7  Controlling Whether Jobs Cross Primetime Boundaries
You can control whether jobs are allowed to start running in one time slot and finish in
another, for example when job A starts during primetime and finishes a few minutes into nonprimetime. When a job runs past the boundary, it delays the start of a job that is constrained to
run only in the later time slot. For example, if job B can run only during non-primetime, it
may have to wait while job A uses up non-primetime before it can start. You can control this
behavior for all queues, or you can exempt anytime queues, controlling only primetime and
non-primetime queues. You can also specify how much time past the boundary a job is
allowed to run.
To prevent the scheduler from starting any jobs which would run past a primetime/non-primetime boundary, set the backfill_prime scheduler parameter to True. You can specify this separately for primetime and nonprimetime. If you specify it for one type of time slot, it prevents
those jobs from crossing the next boundary. For example, if you set the following:
backfill_prime True prime
jobs in primetime slots are not allowed to cross into non-primetime slots.
If you set the following:
backfill_prime True non_prime
jobs in non-primetime slots are not allowed to cross into primetime slots.
To exempt jobs in anytime queues from the control of backfill_prime, set the
prime_exempt_anytime_queues scheduler parameter to True. This means that jobs in an
anytime queue are not prevented from running across a primetime/nonprimetime or nonprimetime/primetime boundary.
To allow jobs to spill over a certain amount of time past primetime/non-primetime boundaries, but no more, specify this amount of time in the prime_spill scheduler parameter. You can specify separate behavior for primetime and non-primetime jobs. For example, to allow primetime jobs to spill by 20 minutes, but allow non-primetime jobs to spill by only 1 minute:
prime_spill 00:20:00 prime
prime_spill 00:01:00 non_prime
The prime_spill scheduler parameter applies only when backfill_prime is True.
4.8.34.8  Logging
The scheduler logs a message at the beginning of each scheduling cycle indicating whether it
is primetime or not, and when this period of primetime or non-primetime will end. The message is at log event class 0x0100. The message is of this form:
“It is primetime and it will end in NN seconds at MM/DD/YYYY HH:MM:SS”
or
“It is non-primetime and it will end in NN seconds at MM/DD/YYYY HH:MM:SS”
4.8.34.9  Scheduling Parameters Affecting Primetime
backfill_prime
The scheduler will not run jobs which would overlap the boundary between primetime and non-primetime.
Format: Boolean
Default: False all
nonprimetime_prefix
Queue names which start with this prefix will be treated as non-primetime queues.
Jobs within these queues will only run during non-primetime.
Format: String
Default: np_
primetime_prefix
Queue names starting with this prefix are treated as primetime queues. Jobs will only
run in these queues during primetime.
Format: String
Default: p_
prime_exempt_anytime_queues
Determines whether anytime queues are controlled by backfill_prime.
If set to True, jobs in an anytime queue will not be prevented from running across a
primetime/non-primetime or non-primetime/primetime boundary.
If set to False, the jobs in an anytime queue may not cross this boundary, except for
the amount specified by their prime_spill setting.
Format: Boolean
Default: False
prime_spill
Specifies the amount of time a job can spill over from non-primetime into primetime
or from primetime into non-primetime. This option can be separately specified for
prime- and non-primetime. This option is only meaningful if backfill_prime is
True.
Format: Duration
Default: 00:00:00
4.8.34.10  Primetime and Holiday Caveats
•  In order to use primetime and non-primetime, you must have a holidays file with the current year in it. If there is no holidays file with a year in it, primetime is in force all of the time.
•  You cannot combine holidays files.
•  If you use the job sorting formula, it is in force all of the time.
4.8.35  Provisioning
PBS provides automatic provisioning of an OS or application, on vnodes that are configured
to be provisioned. When a job requires an OS that is available but not running, or an application that is not installed, PBS provisions the vnode with that OS or application.
You can configure vnodes so that PBS will automatically install the OS or application that
jobs need in order to run on those vnodes. For example, you can configure a vnode that is
usually running RHEL to run SLES instead whenever the Physics group runs a job requiring
SLES. If a job requires an application that is not usually installed, PBS can install the application in order for the job to run.
You can use provisioning for booting multi-boot systems into the desired OS, downloading an
OS to and rebooting a diskless system, downloading an OS to and rebooting from disk, instantiating a virtual machine, etc. You can also use provisioning to run a configuration script or
install an application.
For a complete description of how provisioning works and how to configure it, see Chapter 7,
"Provisioning", on page 739.
4.8.36  Queue Priority
Queues and queue priority play several different roles in scheduling, so this section contains
pointers to other sections.
Each queue can have a different priority. A higher value for priority means the queue has greater priority. By default, queues are sorted from highest to lowest priority. Jobs in the highest-priority queue are considered for execution before jobs from the next highest-priority queue. If queues don't have different priorities, queue order is undefined.
Each queue's priority is specified in its priority attribute. By default, the queue priority attribute is unset. There is no limit to the priority that you can assign to a queue; however, it must fit within integer size. See "Queue Attributes" on page 371 of the PBS Professional Reference Guide.
4.8.36.1  Configuring Queue Priority
You can specify the priority of each queue by setting a value for its priority attribute:
Qmgr: set queue <queue name> priority = <value>
4.8.36.2  Using Queue Priority
You can configure the scheduler so that job execution or preemption priority is partly or entirely determined by the priority of the queue in which the job resides. Queue priority can be used for the following purposes:
•  Queue priority can be used as a term in the job sorting formula. See section 4.8.20, "Using a Formula for Computing Job Execution Priority", on page 194
•  Queue priority can be used to specify the order in which queues are examined when scheduling jobs. If you want jobs to be examined queue by queue, in order of queue priority, you must specify a different priority for each queue. A queue with a higher value is examined before a queue with a lower value. See section 4.2.5.3.i, "Using Queue Order to Affect Order of Consideration", on page 68
•  You can set up execution priority levels that include jobs in express queues. For information on configuring job priorities in the scheduler, see section 4.8.16, "Calculating Job Execution Priority", on page 174.
•  You can set up preemption levels that include jobs in express queues. For information on preemption, see section 4.8.33, "Using Preemption", on page 241.
A queue is an express queue if its priority is greater than or equal to the value that defines an express queue. For more about using express queues, see section 4.8.17, "Express Queues", on page 179.
4.8.36.3  Queue Priority Caveats
•  If you use queue priority in the formula and the job is moved to another server through peer scheduling, the queue priority used in the formula will be that of the new queue to which the job is moved.
4.8.37  Advance and Standing Reservations
PBS provides a way to reserve specific resources for a defined time period. You can make a one-time reservation, or you can make a series of reservations, where each one is for the same resources, but for a different time period.
Reservations are useful for accomplishing the following:
•  To get a time slot on a specific host
•  To run a job in a specific time slot, meaning at or by a specific time
•  To be sure a job will run
•  To have a high-priority job run soon
4.8.37.1  Definitions
Advance reservation
A reservation for a set of resources for a specified time. The reservation is available
only to the creator of the reservation and any users or groups specified by the creator.
Standing reservation
An advance reservation which recurs at specified times. For example, the user can
reserve 8 CPUs and 10GB every Wednesday and Thursday from 5pm to 8pm, for the
next three months.
Occurrence of a standing reservation
An occurrence of a standing reservation behaves like an advance reservation, with the following exceptions:
•  While a job can be submitted to a specific advance reservation, it can only be submitted to the standing reservation as a whole, not to a specific occurrence. You can only specify when the job is eligible to run. See "qsub" on page 225 of the PBS Professional Reference Guide.
•  When an advance reservation ends, it and all of its jobs, running or queued, are deleted, but when an occurrence ends, only its running jobs are deleted.
Each occurrence of a standing reservation has reserved resources which satisfy the resource request, but each occurrence may have its resources drawn from a different source. A query for the resources assigned to a standing reservation returns the resources assigned to the soonest occurrence, shown in the resv_nodes attribute reported by pbs_rstat.
Also called an instance of a standing reservation.
Soonest occurrence of a standing reservation
The occurrence which is currently active, or if none is active, then it is the next
occurrence.
Degraded reservation
An advance reservation for which one or more associated vnodes are unavailable, or a standing reservation for which one or more vnodes associated with the soonest occurrence are unavailable.
4.8.37.2  How Reservations Work
4.8.37.2.i  Creating Reservations
Any PBS user can create both advance and standing reservations using the pbs_rsub command. PBS either confirms that the reservation can be made, or rejects the request. Once the
reservation is confirmed, PBS creates a queue for the reservation’s jobs. Jobs are then submitted to this queue.
When a reservation is confirmed, it means that the reservation will not conflict with currently
running jobs, other confirmed reservations, or dedicated time, and that the requested resources
are available for the reservation. A reservation request that fails these tests is rejected. All
occurrences of a standing reservation must be acceptable in order for the standing reservation
to be confirmed.
The pbs_rsub command returns a reservation ID, which is the reservation name. For an
advance reservation, this reservation ID has the format:
R<unique integer>.<server name>
For a standing reservation, this reservation ID refers to the entire series, and has the format:
S<unique integer>.<server name>
The user specifies the resources for a reservation using the same syntax as for a job.
See "Reserving Resources Ahead of Time", on page 191 of the PBS Professional User’s
Guide, for detailed information on creation and use of reservations.
The time for which a reservation is requested is in the time zone at the submission host.
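For example, a user might reserve two vnodes with four CPUs each from 9:30 to 11:45 today (a sketch with illustrative times and resources; pbs_rsub prints the reservation ID along with its confirmation status):
pbs_rsub -R 0930 -E 1145 -l select=2:ncpus=4
A standing reservation is created the same way, with the addition of a recurrence rule; see the pbs_rsub entry in the PBS Professional Reference Guide.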
4.8.37.2.ii  Reservations and Placement Sets
When PBS chooses a placement set for a reservation, it makes the same choices as it would
for a regular job. It fits the reservation into the smallest possible placement set. See section
4.8.32.4.ii, “Order of Placement Set Consideration Within Pool”, on page 228.
When a reservation is created, it is created within a placement set, if possible. If no placement
set will satisfy the reservation, placement sets are ignored. The vnodes allocated to a reservation are used as one single placement set for jobs in the reservation; they are not subdivided
into smaller placement sets. A job within a reservation runs within the single placement set
made up of the vnodes allocated to the reservation.
4.8.37.2.iii  Requesting Resources for Reservations
Reservations request resources using the same mechanism that jobs use. If a resource is unrequestable, users cannot request it for a reservation. If a resource is invisible, users cannot
view it or request it for a reservation.
4.8.37.2.iv  Reservations and Provisioning
Users can create reservations that request AOEs. Each reservation can have at most one AOE
specified for it. Any jobs that run in that reservation must not request a different AOE. See
section 7.4.3, “Provisioning And Reservations”, on page 744.
The vnodes allocated to a reservation that requests an AOE are put in the resv-exclusive state
when the reservation runs. These vnodes are not shared with other reservations or with jobs
outside the reservation.
For information on restrictions applying to reservations used with provisioning, see section
7.7.2.3, “Vnode Reservation Restrictions”, on page 764.
For how to avoid problems with provisioning and reservations, see section 7.10.1, “Using
Provisioning Wisely”, on page 780.
4.8.37.2.v  Reservation Priority
A job running in a reservation cannot be preempted.
A job running in a reservation has the highest execution priority.
4.8.37.3  Querying Reservations
To query a reservation, use the pbs_rstat command. See "Viewing the Status of a Reservation", on page 198 of the PBS Professional User’s Guide. To delete an advance reservation,
use the pbs_rdel command, not the qmgr command.
4.8.37.4  Controlling Access to Reservations
You can specify which projects, users, and groups can and cannot submit jobs to reservations.
Use the qmgr command to set the reservation queue’s acl_users and/or acl_groups
attributes. See section 8.3, “Using Access Control”, on page 791.
4.8.37.5  Reservation Fault Tolerance
PBS automatically keeps track of the vnodes assigned to reservations, and tries to find
replacement vnodes for those that become unavailable. See section 9.5, “Reservation Fault
Tolerance”, on page 887.
4.8.37.6  Advance and Standing Reservations and Licensing
Reservation jobs won’t run if PBS runs out of licenses. Set the server’s pbs_license_min
attribute to the total number of CPUs, including virtual CPUs, in the PBS complex. See
“Floating Licenses and Reservations” on page 132 in the PBS Professional Installation &
Upgrade Guide and “Setting Server Licensing Attributes” on page 122 in the PBS Professional Installation & Upgrade Guide.
4.8.37.7  Logging Reservation Information
The start and end of each occurrence of a standing reservation is logged as if each occurrence
were a single advance reservation.
Reservation-related messages are logged at level 0x0200 (512).
4.8.37.8  Accounting
Resources requested for a reservation are recorded in the reservation’s Resource_List
attribute, and reported in the accounting log B record for the reservation. The accounting log
B record is written at the beginning of a reservation.
4.8.37.9  Attributes Affecting Reservations
Most of the attributes controlling a reservation are set when the reservation is created by the
user. However, some server and vnode attributes also control the behavior of reservations.
The server attributes that affect reservations are listed here, and described in “Server
Attributes” on page 332 of the PBS Professional Reference Guide.
Table 4-15: Server Attributes Affecting Reservations

Attribute                Effect
acl_resv_host_enable     Controls whether or not the server uses the acl_resv_hosts access control list.
acl_resv_hosts           List of hosts from which reservations may and may not be created at this server.
acl_resv_group_enable    Controls whether or not the server uses the acl_resv_groups access control list.
acl_resv_groups          List of groups who may and may not create reservations at this server.
acl_resv_user_enable     Controls whether or not the server uses the acl_resv_users access control list.
acl_resv_users           List of users who may and may not create reservations at this server.
resv_enable              Controls whether or not reservations can be created at this server.
The vnode attributes that affect reservations are listed here. See “Vnode Attributes” on page
384 of the PBS Professional Reference Guide for more information.
Table 4-16: Vnode Attributes Affecting Reservations

Attribute              Effect
queue                  Associates the vnode with an execution queue. If this attribute is set, this vnode cannot be used for reservations.
reserve_retry_cutoff   Cutoff time for reconfirmation retries before a degraded occurrence or advance reservation. After this cutoff, PBS will not try to reconfirm the occurrence or reservation.
reserve_retry_init     Length of time to wait between when a reservation becomes degraded and when PBS tries to reconfirm the reservation. Default: 2 hours
resv_enable            Controls whether the vnode can be used for reservations. Default is True, but set to False for a vnode used for cycle harvesting.
4.8.37.10  Reservation Advice and Caveats
•  Do not delete a reservation's queue.
•  Do not start a reservation's queue (do not set the reservation's started attribute to True). Jobs will run prematurely.
•  Do not use qmgr to set attribute values for a reservation queue.
•  Reservations are incompatible with cycle harvesting. Do not allow reservations on machines used for cycle harvesting. The user may begin using the machine, which will suspend any PBS jobs, possibly preventing them from finishing before the reservation runs out. Set each cycle harvesting vnode's resv_enable attribute to False, to prevent the vnode from being used for reservations.
•  You can write hooks that execute, modifying a reservation's attributes, when a reservation is created. See Chapter 6, "Hooks", on page 437.
•  Allow enough time in reservations. If a job is submitted to a reservation with a duration close to the walltime of the job, provisioning could cause the job to be terminated before it finishes running, or to be prevented from starting. If a reservation is designed to take jobs requesting an AOE, leave enough extra time in the reservation for provisioning.
•  The xpbs GUI cannot be used for creation, querying, or deletion of reservations.
•  Hosts or vnodes that have been configured to accept jobs only from a specific queue (vnode-queue restrictions) cannot be used for advance reservations. Hosts or vnodes that are being used for cycle harvesting should not be used for reservations.
•  Hosts with $max_load and $ideal_load configured should not be used for reservations. Set the resv_enable vnode attribute on these hosts to False, as shown in the sketch after this list.
•  For troubleshooting problems with reservations, see section 13.8.4, "Job in Reservation Fails to Run", on page 1053.
•  Be careful when using qrun -H on jobs or vnodes involved in reservations. Make sure that you don't oversubscribe reserved resources.
•  In order to create reservations, the submission host must have its timezone set to a value that is understood by the PBS server. See section 13.6.14, "Unrecognized Timezone Variable", on page 1047.
•  Avoid making reservations for resources that are out of the control of PBS. Resources that are managed through a server_dyn_res script may not be available when jobs need them.
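The sketch referenced above, marking a hypothetical vnode named host1 as unusable for reservations:
Qmgr: set node host1 resv_enable = False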
4.8.38  Round Robin Queue Selection
PBS can select jobs from queues by examining the queues in round-robin fashion. The behavior is round-robin only when you have groups of queues where all queues in each group have the same priority.
The order in which queues are selected is determined by each queue's priority. You can set each queue's priority; see section 2.2.5.3, "Prioritizing Execution Queues", on page 23. If you do not prioritize the queues, their order is undefined.
When you have multiple queues with the same priority, the scheduler round-robins through all
of the queues with the same priority as a group. So if you have Q1, Q2, and Q3 at a priority of
100, Q4 and Q5 at a priority of 50, and Q6 at a priority of 10, the scheduler will round-robin
through Q1, Q2, and Q3 until all of those jobs are out of the way, then the scheduler will
round-robin through Q4 and Q5 until there are no more jobs in them, and finally the scheduler
will go through Q6.
When using the round-robin method with queues that have unique priorities, the scheduler
runs all jobs from the first queue, then runs all the jobs in the next queue, and so on.
To specify that PBS should use the round-robin method to select jobs, set the value of the round_robin scheduler parameter to True.
The round_robin parameter is a primetime option, meaning that you can configure it separately for primetime and non-primetime, or you can specify it for all of the time.
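For example, to round-robin through queues only during primetime, the sched_config lines might look like this (a sketch):
round_robin: True prime
round_robin: False non_prime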
You can use the round-robin method as a resource allocation tool. For example, if you need to
run the same number of jobs from each group, you can put each group’s jobs in a different
queue, and then use round-robin to run jobs, one from each queue.
The round-robin method is also used in PBS for some features that are not controlled by the round_robin scheduler parameter. They are the following:
•  Routing queues try destinations in round-robin fashion, in the order listed
•  The SMP cluster distribution parameter, smp_cluster_dist, can use a round-robin method to place jobs
See "round_robin" on page 310 of the PBS Professional Reference Guide.
4.8.38.1	Round-robin Caveats
•	Each scheduling cycle starts with the highest-priority queue. Therefore, when using round-robin, this queue gets preferential treatment.
•	When set to True, the round_robin parameter overrides the by_queue parameter.
•	If round robin and strict ordering are True, and backfilling is False, and the top job cannot run, whether because of resources or rejection by MoM, no job runs. However, if round robin is True and strict ordering is False, and the top job in the current queue cannot run, the next top job is considered instead. For example, we have 3 queues, each with 3 jobs, and with the same priority:
Q1: J1 J2 J3
Q2: J4 J5 J6
Q3: J7 J8 J9
If round_robin and strict_ordering are True, and J1 cannot run, no job runs.
If round_robin is True and strict_ordering is False, and J1 cannot run, job order is J4, J7, J2, J5, J8, J3, etc.
•	With round_robin and strict_ordering set to True, a job continually rejected by a runjob hook may prevent other jobs from being run. A well-written hook would put the job on hold or requeue the job with a start time at some later time to allow other jobs in the same queue to be run. A sketch of such a hook follows.
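As an illustration only, a runjob hook along these lines might defer the rejected job instead of letting it block the queue. The one-hour delay is an invented value, and this sketch assumes the runjob event permits setting the job’s Execution_Time attribute:
import pbs
import time
e = pbs.event()
j = e.job
# Push the job's eligible start time an hour into the future, then
# reject this run attempt so that other jobs can be considered.
j.Execution_Time = int(time.time()) + 3600
e.reject("Deferred by runjob hook; will be retried later")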
4.8.39	Routing Jobs
Before reading this section, please read about the mechanics of configuring and using routing
queues, in section 2.2.6, “Routing Queues”, on page 24.
In this section, we use the term “routing” to mean the general process of moving a job somewhere, whether it is from one queue to another, from one complex to another, or from a queue
to particular vnodes.
Routing jobs can involve collecting jobs so they don’t stray into the wrong queues, moving
those jobs to the correct queues, and filtering which jobs are allowed into queues.
You may need to collect jobs into a routing queue, before moving them to the correct destination queue. If you use a routing queue, you can force users to submit jobs to the routing queue
only, you can grab jobs as they are submitted and put them in the routing queue, and you can
set a routing queue as the default. The mechanisms to collect jobs are described below, and
listed here:
•	Setting default queue; see section 4.8.39.1.i, “Default Queue as Mechanism to Collect Jobs”, on page 273
•	Grabbing jobs upon submission; see section 4.8.39.1.ii, “Grabbing Jobs Upon Submission”, on page 273
•	Disallowing direct submission to execution queues; see section 4.8.39.1.iii, “Disallowing Direct Submission as Mechanism to Collect Jobs”, on page 274
•	Disallowing submission using access controls; see section 4.8.39.3.ii, “Access Controls as Filtering Mechanism”, on page 276
There is also a one-step process: you can simply examine jobs upon submission and send them where you want. Depending on the number of jobs being submitted, however, this may be too slow. The method is listed here:
•	Examining jobs upon submission and routing them using a hook; see section 4.8.39.1.iv, “Examining Jobs Upon Submission”, on page 274.
You can use any of several mechanisms for moving jobs. Each is described in subsections
below. The mechanisms for moving jobs are the following:
•	Routing queues; see section 4.8.39.2.i, “Routing Queues as Mechanism to Move Jobs”, on page 274
•	Hooks; see section 4.8.39.2.ii, “Hooks as Mechanism to Move Jobs”, on page 275
•	Peer scheduling; see section 4.8.39.2.iii, “Peer Scheduling as Mechanism to Move Jobs”, on page 275
•	The qmove command; see section 4.8.39.2.iv, “The qmove Command as Mechanism to Move Jobs”, on page 275
You can use filtering methods to control which jobs are allowed into destination queues. We
describe filtering methods in subsections below. The filtering mechanisms are the following:
•	Resource limits; jobs are filtered by resource request. See section 4.8.39.3.i, “Resource Limits as Filtering Mechanism”, on page 276
•	Access control limits; jobs are filtered by owner. See section 4.8.39.3.ii, “Access Controls as Filtering Mechanism”, on page 276
You can use a combination of moving a job and “tagging” it, that is, including a special custom resource in the job’s resource request, to route the job. If you set the resource using a
hook, you can route the job either to a queue or to vnodes. If you make the job inherit the
resource from a queue, you can route it only to vnodes. You can set resource limits for the
special custom resource at the receiving queue, allowing in only jobs with the special
resource. You can set the special custom resource at vnodes, so that the job must run there.
Mechanisms for tagging jobs are listed here:
•	Using a hook to assign a resource; see section 4.8.39.4.i, “Using Hooks to Tag Jobs”, on page 277
•	Associating vnodes with queues; see section 4.8.2.2, “Associating Vnodes With Multiple Queues”, on page 126
•	Changing the job’s resource request using the qalter command; see section 4.8.39.4.ii, “Using the qalter Command to Tag Jobs”, on page 277
4.8.39.1	Mechanisms for Collecting Jobs
4.8.39.1.i	Default Queue as Mechanism to Collect Jobs
To make it easy on your users, have their jobs land in your routing queue by default. You
probably don’t want frustrated users trying to submit jobs without specifying a queue, only to
have the jobs be rejected if you have set access controls on, or only allowed routing to, the
default queue. The server’s default_queue attribute specifies the name of the default queue.
To make things easy, make the default queue be the routing queue:
Qmgr: set server default_queue = <queue name>
4.8.39.1.ii	Grabbing Jobs Upon Submission
You can allow users to submit jobs to any queue, and then scoop up the newly-submitted jobs
and put them in the desired queue. To do this, you write a hook. There is a hook of this kind
in the example "Redirecting newly-submitted jobs:” on page 446.
4.8.39.1.iii	Disallowing Direct Submission as Mechanism to Collect Jobs
If you are using a routing queue, you can disallow job submission to all other queues. This
forces users to submit jobs to the routing queue. You should probably make the routing queue
be the default queue in this case, to avoid irritating users. Whether or not a queue allows
direct job submission is controlled by its from_route_only attribute. To disallow job submission to a queue:
Qmgr: set queue <queue name> from_route_only = True
4.8.39.1.iv	Examining Jobs Upon Submission
You can use a job submission hook to examine each job as it is submitted, and then route it to
the desired queue. For example, you can route jobs directly according to resource request,
project, owner, etc. See Chapter 6, "Hooks", on page 437.
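For illustration, a minimal queuejob hook might route by requested memory. The queue name "bigmem" and the 32gb threshold are placeholders for site-specific choices, not PBS defaults:
import pbs
e = pbs.event()
j = e.job
# Send large-memory jobs to a hypothetical "bigmem" queue; all other
# jobs keep the queue they were submitted to.
mem = j.Resource_List["mem"]
if mem and mem > pbs.size("32gb"):
    j.queue = pbs.server().queue("bigmem")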
4.8.39.2	Mechanisms for Moving Jobs
4.8.39.2.i	Routing Queues as Mechanism to Move Jobs
Routing queues are a mechanism supplied by PBS that automatically move jobs from a routing queue to another queue. You can direct which destination queues accept a job using these
filters at each destination queue:
•	Resource limits: you can set up execution queues designed for specific kinds of jobs, and then route each kind of job separately. For example, you can create two execution queues, and one routing queue, and route all jobs requesting large amounts of memory to one of the execution queues, and the rest of the jobs to the other queue. See section 2.2.6.4, “Using Resources to Route Jobs Between Queues”, on page 25.
•	Access control limits: you can set up destination queues that are designed for specific groups of users. Each queue accepts jobs only from a designated set of users or groups. For example, if you have three departments, Math, Physics, and Chemistry, the queue belonging to Math accepts only users from the Math department. See section 2.2.6.5, “Using Access Control to Route Jobs”, on page 30.
When routing a job between complexes, the job’s owner must be able to submit a job to the
destination complex.
For how to configure and use routing queues, see section 2.2.6, “Routing Queues”, on page
24.
4.8.39.2.ii	Hooks as Mechanism to Move Jobs
You can use a submission hook to move jobs into queues such as dedicated time queues,
queues with special priority, or reservation queues. You write the hook so that it identifies the
jobs that should go into a particular queue, and then moves them there. For example, your
hook can move all jobs from ProjectA to a specific queue. This is a snippet, where you would
replace <destination queue> with the queue name.
import pbs
e = pbs.event()
e.job.queue = pbs.server().queue("<destination queue>")
For complete information on hooks, see Chapter 6, "Hooks", on page 437.
4.8.39.2.iii	Peer Scheduling as Mechanism to Move Jobs
To send jobs from one complex to another, you use peer scheduling. In peer scheduling, the
complex that supplies the jobs (the “furnishing” complex) contains at least one special queue
(the “furnishing queue”), whose jobs can be pulled over to another complex, to be run at the
other complex. The complex that pulls jobs contains a special queue (the “pulling queue”),
where those pulled jobs land.
You can use any of the job routing methods, such as routing queues, tagging, or hooks, to control which jobs land in the furnishing queue.
You can use any of the job filtering methods, such as resource limits or access controls, to
control which jobs land in the furnishing queue.
You can use job submission hooks on the jobs that land in the pulling queue.
See section 4.8.31, “Peer Scheduling”, on page 218.
4.8.39.2.iv	The qmove Command as Mechanism to Move Jobs
You can use the qmove command, either manually or via a cron job or the Windows Task
Scheduler, to move jobs into the desired queues. See “qmove” on page 186 of the PBS Professional Reference Guide.
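For example, to move job 1234 into a queue named workq2 (both names are placeholders):
qmove workq2 1234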
4.8.39.3	Mechanisms for Filtering Jobs
4.8.39.3.i	Resource Limits as Filtering Mechanism
You can filter whether each job is accepted at the server or a queue based on the job’s resource
request. For example, you can control which jobs are allowed to be submitted to the server,
by limiting the amount of memory a job is allowed to request. You can do the same at execution queues. These limits apply regardless of the routing mechanism being used, and apply to
jobs being submitted directly to the queue. See section 5.13, “Using Resources to Restrict
Server, Queue Access”, on page 336.
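For example, to cap the memory that any job entering a queue may request (the queue name and the limit are placeholders):
Qmgr: set queue workq resources_max.mem = 16gb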
4.8.39.3.ii	Access Controls as Filtering Mechanism
You can filter whether each job is accepted at the server or a queue based on the job’s owner, or the job owner’s group. At each queue and at the server, you can create a different list of the users who can submit jobs and the users who cannot submit jobs. You can do the same for groups.
For example, you can set up a routing queue and several execution queues, where each execution queue has access controls allowing only certain users and groups. When PBS routes the
jobs from the routing queue, it will route them into the execution queues that accept owners of
the jobs. See section 2.2.6.5, “Using Access Control to Route Jobs”, on page 30.
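For example, to admit only designated users to the Math department’s queue (the queue and user names are placeholders):
Qmgr: set queue MathQ acl_user_enable = True
Qmgr: set queue MathQ acl_users = "ualice,ubob"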
4.8.39.3.iii	Hooks as Filtering Mechanism
You can filter which jobs are accepted at the server or queues according to any criterion, using
a hook. For example, you can write a hook that disallows jobs that request certain combinations of resources. See Chapter 6, "Hooks", on page 437.
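As an invented example of such a hook, where ngpus is assumed to be a site-defined custom resource and the 64gb limit is arbitrary:
import pbs
e = pbs.event()
j = e.job
mem = j.Resource_List["mem"]
ngpus = j.Resource_List["ngpus"]
# Disallow one particular combination: GPUs together with large memory.
if mem and ngpus and mem > pbs.size("64gb"):
    e.reject("Jobs requesting GPUs may not also request more than 64gb of memory")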
4.8.39.4	Mechanisms for Tagging Jobs
4.8.39.4.i	Using Hooks to Tag Jobs
You can use a hook to force certain jobs to run on particular hardware, by having the hook set
the value of a host-level custom resource in a job’s resource request. The hook sets this
resource to match the value at the selected vnodes, so that the job must run on one or more of
those vnodes. You can use the job’s project to determine how the job is tagged. Note that the
value at other vnodes should be different, otherwise the job could end up on vnodes you don’t
want.
•	Define a host-level custom resource; see section 5.14.5, “Configuring Host-level Custom Resources”, on page 360.
•	Set this resource to a special value on the special vnodes only. See section 5.7.2, “Setting Values for Global Static Resources”, on page 319.
•	Create a hook that filters jobs by size, project, or other characteristic, and sets the value of the custom resource to the special value, in the job’s resource request. See Chapter 6, "Hooks", on page 437
If you must use a routing queue, and you need to route on host-level resources (resources in
the job’s select specification), you can use a hook to tag jobs so that they are routed correctly.
The hook reads the job’s host-level resource request, and sets the job’s server-level resource
request accordingly. This server-level resource is used for routing:
•	Create a custom server-level resource that you use exclusively for routing; set it to appropriate values on the destination queues; see section 5.14.4, “Configuring Server-level Resources”, on page 358
•	Create a submit hook to extract the host-level resource value and use it to populate the custom resource that you use exclusively for routing; see Chapter 6, "Hooks", on page 437
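A sketch of such a submit hook follows; route_tag is a hypothetical server-level custom resource created only for routing, and the substring test stands in for real parsing of the select specification:
import pbs
e = pbs.event()
j = e.job
# Copy a host-level request into a server-level resource that the
# destination queues can test with resources_min/resources_max.
sel = j.Resource_List["select"]
if sel and "bigmem=true" in str(sel):
    j.Resource_List["route_tag"] = "bigmem"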
4.8.39.4.ii	Using the qalter Command to Tag Jobs
You can change a job’s resource request using the qalter command. This way you can override normal behavior. See “qalter” on page 135 of the PBS Professional Reference Guide.
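For example, assuming the hypothetical route_tag custom resource sketched above, you could tag a queued job by hand:
qalter -l route_tag=bigmem <job ID>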
4.8.40	Shared vs. Exclusive Use of Resources by Jobs
When PBS places a job, it can do so on hardware that is either already in use or has no jobs
running on it. PBS can make the choice at the vnode level or at the host level. How this
choice is made is controlled by a combination of the value of each vnode’s sharing attribute
and the placement requested by a job.
You can set each vnode’s sharing attribute so that the vnode or host is always shared, always
exclusive, or so that it honors the job’s placement request. The value of a vnode’s sharing
attribute takes precedence over a job’s placement request.
Each vnode can be allocated exclusively to one job (each job gets its own vnodes), or its
resources can be shared among jobs (PBS puts as many jobs as possible on a vnode). If a
vnode is allocated exclusively to a job, all of its resources are assigned to the job. The state of
the vnode becomes job-exclusive. No other job can use the vnode.
Hosts can also be allocated exclusively to one job, or shared among jobs.
For a complete description of the sharing attribute, and a table showing the interaction
between the value of the sharing attribute and the job’s placement request, see “sharing” on
page 389 of the PBS Professional Reference Guide.
4.8.40.0.i	Sharing on a Shared-memory Altix
On a shared-memory Altix, the scheduler will share memory from a chunk even if all the
CPUs are used by other jobs. It will first try to put a chunk entirely on one vnode. If it can, it
will run it there. If not, it will break the chunk up across any vnode it can get resources from,
even for small amounts of unused memory.
4.8.40.1	Setting the sharing Vnode Attribute
When setting the sharing vnode attribute, follow the rules in section 3.5.2, “Choosing Configuration Method”, on page 52.
4.8.40.2	Viewing Sharing Information
You can use the qmgr or pbsnodes commands to view sharing information. See “qmgr” on
page 158 of the PBS Professional Reference Guide and “pbsnodes” on page 108 of the PBS
Professional Reference Guide.
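For example, to display the sharing attribute for one vnode (the vnode name is a placeholder):
Qmgr: list node <vnode name> sharing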
4.8.40.3	Sharing Caveats
•	On the Cray, the sharing attribute is set to force_exclhost by default. Do not change this setting.
•	The term “sharing” is also used to describe the case where MoM manages a resource that is shared among her vnodes, for example a license shared by the vnodes of a multi-vnode machine.
•	The term “sharing” is also used to mean oversubscribing CPUs, where more than one job is run on one CPU; the jobs are “sharing” a CPU. See section 9.4.4, “Managing Load Levels on Vnodes”, on page 883
•	If a host is to be allocated exclusively to one job, all of the host must be used: if any vnode from a host has its sharing attribute set to either default_exclhost or force_exclhost, all vnodes on that host must have the same value for the sharing attribute. When the MoM starts or restarts, if any vnode on a host is set to either default_exclhost or force_exclhost, and another vnode is set to a different value, the MoM will exit and log the following error message at event class 0x0001:
It is erroneous to mix sharing= <sharing val> for vnode <name> with sharing= <force_exclhost|default_exclhost> which is set for other vnodes on host <host>
•	For vnodes with sharing=default_shared, jobs can share a vnode, so that unused memory on partially-allocated vnodes is allocated to a job. The exec_vnode attribute will show this allocation.
4.8.41	Using Shrink-to-fit Jobs
4.8.41.1	Shrink-to-fit Jobs
PBS allows you or the job submitter to adjust the running time of a job to fit into an available
scheduling slot. The job’s minimum and maximum running time are specified in the
min_walltime and max_walltime resources. PBS chooses the actual walltime. Any job that
requests min_walltime is a shrink-to-fit job.
4.8.41.1.i	Requirements for a Shrink-to-fit Job
A job must have a value for min_walltime to be a shrink-to-fit job. Shrink-to-fit jobs are not
required to request max_walltime, but it is an error to request max_walltime and not
min_walltime.
Jobs that do not have values for min_walltime are not shrink-to-fit jobs, and their walltime
can be specified by the user, inherited through defaults, or set in a hook.
4.8.41.1.ii	Comparison Between Shrink-to-fit and Non-shrink-to-fit Jobs
Shrink-to-fit jobs are treated the same as non-shrink-to-fit jobs unless explicitly stated. For
example, job priority is not affected by being shrink-to-fit. The only difference between a
shrink-to-fit and a non-shrink-to-fit job is how the job’s walltime is treated. PBS sets the walltime at the time the job is run; any walltime settings not computed by PBS are ignored.
4.8.41.2	Where to Use Shrink-to-fit Jobs
If you have jobs that can run for less than the expected time to completion and still make useful progress, you can use them as shrink-to-fit jobs in order to maximize utilization.
You can use shrink-to-fit jobs for the following:
•	Jobs that are internally checkpointed. This includes jobs which are part of a larger effort, where a job does as much work as it can before it is killed, and the next job in that effort takes up where the previous job left off.
•	Jobs using periodic PBS checkpointing
•	Jobs whose real running time might be much less than the expected time
•	When you have set up dedicated time for system maintenance, and you want to keep machines well-utilized right up until shutdown, submitters who want to risk having a job killed before it finishes can run speculative shrink-to-fit jobs. Similarly, speculative jobs can take advantage of the time just before a reservation starts
•	Any job where the submitter does not mind running the job as a speculative attempt to finish some work
4.8.41.3	Running Time of a Shrink-to-fit Job
4.8.41.3.i	Setting Running Time Range for Shrink-to-fit Jobs
The only requirement for a job to be shrink-to-fit is that it request min_walltime. If a job requests min_walltime but does not request max_walltime, you may want to use a hook or defaults to set a reasonable value for max_walltime. If you use defaults, you may want to route shrink-to-fit jobs to a special queue where they inherit a value for max_walltime if they haven’t got one already. See section 4.8.39, “Routing Jobs”, on page 272.
Requesting max_walltime without requesting min_walltime is an error.
A job can end up with a value for min_walltime and max_walltime when the user specifies
them, when it inherits them from server or queue defaults, or when they are set in a hook.
Job submitters can set the job’s running time range by requesting min_walltime and max_walltime, for example:
qsub -l min_walltime=<min walltime>,max_walltime=<max walltime> <job script>
You can set min_walltime or max_walltime using a hook, whether or not the job requests it.
You can set up defaults so that the job inherits these resources if they are not explicitly
requested or set in a hook.
4.8.41.3.ii	Inheriting Values for min_walltime and max_walltime
The min_walltime and max_walltime resources inherit values differently. A job can inherit a
value for max_walltime from resources_max.walltime; the same is not true for
min_walltime. This is because once a job is shrink-to-fit, PBS can use a walltime limit for
max_walltime.
If a job is submitted without a value for min_walltime, the value for min_walltime for the job
becomes the first of the following that exists:
•	Server’s default qsub arguments
•	Queue’s resources_default.min_walltime
•	Server’s resources_default.min_walltime
If a shrink-to-fit job is submitted without a value for max_walltime, the value for
max_walltime for the job becomes the first of the following that exists:
•	Server’s default qsub arguments
•	Queue’s resources_default.max_walltime
•	Server’s resources_default.max_walltime
•	Queue’s resources_max.walltime
•	Server’s resources_max.walltime
4.8.41.3.iii	Setting walltime for Shrink-to-fit Jobs
For a shrink-to-fit job, PBS sets the walltime resource based on the values of min_walltime
and max_walltime, regardless of whether walltime is specified for the job. You cannot use a
hook to set the job’s walltime, and any queue or server defaults for walltime are ignored,
except for the case where the job is run via qrun -H; see section 4.8.41.8.ii, “Using qrun
With -H Option”, on page 284.
PBS examines each shrink-to-fit job when it gets to it, and looks for a time slot whose length
is between the job’s min_walltime and max_walltime. If the job can fit somewhere, PBS sets
the job’s walltime to a duration that fits the time slot, and runs the job. The chosen value for
walltime is visible in the job’s Resource_List.walltime attribute. Any existing walltime
value, regardless of where it comes from (user, queue default, hook, previous execution), is
reset to the new calculated running time.
If a shrink-to-fit job is run more than once, PBS recalculates the job’s running time to fit an
available time slot that is between min_walltime and max_walltime, and resets the job’s walltime, each time the job is run.
4.8.41.4	How PBS Places Shrink-to-fit Jobs
The PBS scheduler treats shrink-to-fit jobs the same way as it treats non-shrink-to-fit jobs
when it schedules them to run. The scheduler looks at each job in order of priority, and tries
to run it on available resources. If a shrink-to-fit job can be shrunk to fit in an available slot,
the scheduler runs it in its turn. The scheduler chooses a time slot that is at least as long as the
job’s min_walltime value. A shrink-to-fit job may be placed in a time slot that is shorter than
its max_walltime value, even if a longer time slot is available.
For a multi-vnode job, PBS chooses a walltime that works for all of the chunks required by
the job, and places job chunks according to the placement specification.
4.8.41.5	Shrink-to-fit Jobs and Time Boundaries
The time boundaries that constrain job running time are the following:
•	Reservations
•	Dedicated time
•	Primetime
•	Start time for a top job
Time boundaries are not affected by shrink-to-fit jobs.
A shrink-to-fit job can shrink to avoid time boundaries, as long as the available time slot
before the time boundary is greater than min_walltime.
If any job is already running, whether or not it is shrink-to-fit, and you introduce a new period
of dedicated time that would impinge on the job’s running time, PBS does not kill or otherwise take any action to prevent the job from hitting the new boundary.
4.8.41.5.i	Shrink-to-fit Jobs and Prime Time
If you have enabled prime time by setting backfill_prime to True, shrink-to-fit jobs will honor the boundary between primetime and non-primetime. If prime_spill is set, shrink-to-fit jobs are scheduled so that they cross the prime-nonprime boundary by up to the prime_spill duration only. If prime_exempt_anytime_queues is set to True, a job submitted in an anytime queue is not affected by primetime boundaries.
4.8.41.6	Shrink-to-fit Jobs and Resource Limits
4.8.41.6.i	Shrink-to-fit Jobs and Gating at Server or Queue
Shrink-to-fit jobs must honor any resource limits at the server or queues. If a walltime limit is
specified:
•	Both min_walltime and max_walltime must be greater than or equal to resources_min.walltime.
•	Both min_walltime and max_walltime must be less than or equal to resources_max.walltime.
If resource limits are not met, a job submission or modification request will fail with the following error:
“Job exceeds queue and/or server resource limits”
4.8.41.6.ii	Gating Restrictions
You cannot set resources_min or resources_max for min_walltime or max_walltime. If you try, you will see the following error message, for example for min_walltime:
“Resource limits can not be set for min_walltime”
4.8.41.7	Shrink-to-fit Jobs and Preemption
When preempting other jobs, shrink-to-fit jobs do not shrink. Their walltime is set to their
max_walltime.
4.8.41.8	Using qrun on Shrink-to-fit Jobs
If you use qrun on a shrink-to-fit job, its behavior depends on whether you use the -H option
to qrun.
4.8.41.8.i	Using qrun Without -H Option
When a shrink-to-fit job is run via qrun, it can shrink into available space to run. However,
if preemption is enabled and there is a preemptable job that must be preempted in order to run
the shrink-to-fit job, the preemptable job is preempted and the shrink-to-fit job shrinks and
runs.
When a shrink-to-fit job is run via qrun, and there is a hard deadline, e.g. reservation or dedicated time, that conflicts with the shrink-to-fit job’s max_walltime but not its min_walltime,
the following happens:
•	If preemption is enabled and there is a preemptable job before the hard deadline that must be preempted in order to run the shrink-to-fit job, preemption behavior means that the shrink-to-fit job does not shrink to fit; instead, it conflicts with the deadline and does not run.
•	If preemption is enabled and there is no preemptable job before the hard deadline, the shrink-to-fit job shrinks into the available time and runs.
4.8.41.8.ii	Using qrun With -H Option
When a shrink-to-fit job is run via qrun -H, the shrink-to-fit job runs, regardless of reservations, dedicated time, other jobs, etc. When run via qrun -H, shrink-to-fit jobs do not
shrink. If the shrink-to-fit job has a requested or inherited value for walltime, that value is
used, instead of one set by PBS when the job runs. If no walltime is specified, the job runs
without a walltime.
4.8.41.9	Modifying Shrink-to-fit and Non-shrink-to-fit Jobs
4.8.41.9.i	Modifying min_walltime and max_walltime
You can change min_walltime and/or max_walltime for a shrink-to-fit job using modifyjob or
queuejob hooks, or by using the qalter command. Any changes take effect after the current scheduling cycle. Changes affect only queued jobs; running jobs are unaffected unless
they are rerun.
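For example, to widen a queued shrink-to-fit job’s running time range (the values are illustrative):
qalter -l min_walltime=00:30:00,max_walltime=04:00:00 <job ID>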
4.8.41.9.ii	Making Non-shrink-to-fit Jobs into Shrink-to-fit Jobs
You can convert a normal non-shrink-to-fit job into a shrink-to-fit job using the following
methods:
•	Use a hook that does the following (a sketch follows after this list):
	•	Sets max_walltime to the job’s walltime
	•	Sets min_walltime to a useful value
•	Use resources_default at the server or a queue. For a queue, you might want to set that queue’s from_route_only attribute to True.
•	Route to a queue that has resources_default.min_walltime set.
•	Use the qalter command to set values for min_walltime and max_walltime.
Any changes take effect after the current scheduling cycle. Changes affect only queued jobs;
running jobs are unaffected unless they are rerun.
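As a sketch of the hook method only, where the 30-minute minimum is an invented value:
import pbs
e = pbs.event()
j = e.job
wt = j.Resource_List["walltime"]
# Convert a normal job into a shrink-to-fit job: copy its walltime into
# max_walltime and choose a minimum running time.
if wt and not j.Resource_List["min_walltime"]:
    j.Resource_List["max_walltime"] = wt
    j.Resource_List["min_walltime"] = pbs.duration("00:30:00")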
4.8.41.9.iii	Making Shrink-to-fit Jobs into Non-shrink-to-fit Jobs
To make a shrink-to-fit job into a normal, non-shrink-to-fit job, use either a hook or the
qalter command to do the following:
•	Set the job’s walltime to the value for max_walltime (beware of allowing the job to run into existing reservations etc.)
•	Unset min_walltime
•	Unset max_walltime
4.8.41.9.iv	Hooks for Running Time Limits
If you want to set a new running time limit for shrink-to-fit jobs, you can use a hook. However, this hook must set the value of max_walltime, rather than walltime, since hook settings
for walltime for a shrink-to-fit job are ignored.
4.8.41.10	Viewing Running Time for a Shrink-to-fit Job
4.8.41.10.i	Viewing min_walltime and max_walltime
You can use qstat -f to view the values of the min_walltime and max_walltime. For
example:
% qsub -l min_walltime=01:00:15,max_walltime=03:30:00 job.sh
<job-id>
% qstat -f <job-id>
...
Resource_List.min_walltime=01:00:15
Resource_List.max_walltime=03:30:00
You can use tracejob to display max_walltime and min_walltime as part of the job's resource
list. For example:
12/16/2011 14:28:55 A
user=pbsadmin group=Users
project=_pbs_project_default
…
Resource_List.max_walltime=10:00:00
Resource_List.min_walltime=00:00:10
4.8.41.10.ii	Viewing walltime for a Shrink-to-fit Job
PBS sets a job’s walltime only when the job runs. While the job is running, you can see its
walltime via qstat -f. While the job is not running, you cannot see its real walltime; it
may have a value set for walltime, but this value is ignored.
You can see the walltime value for a finished shrink-to-fit job if you are preserving job history. See section 11.15, “Managing Job History”, on page 999.
You can see the walltime value for a finished shrink-to-fit job in the scheduler log.
4.8.41.11	Lifecycle of a Shrink-to-fit Job
4.8.41.11.i	Execution of Shrink-to-fit Jobs
Shrink-to-fit jobs are started just like non-shrink-to-fit jobs.
4.8.41.11.ii	Termination of Shrink-to-fit Jobs
When a shrink-to-fit job exceeds the walltime PBS has set for it, it is killed by PBS exactly as
a non-shrink-to-fit job is killed when it exceeds its walltime.
4.8.41.12	The min_walltime and max_walltime Resources
max_walltime
Maximum walltime allowed for a shrink-to-fit job. Job’s actual walltime is between
max_walltime and min_walltime. PBS sets walltime for a shrink-to-fit job. If this
resource is specified, min_walltime must also be specified. Must be greater than or
equal to min_walltime. Cannot be used for resources_min or resources_max.
Cannot be set on job arrays or reservations. If not specified, PBS uses an eternal
time slot. Can be requested only outside of a select statement. Non-consumable.
Default: None. Type: duration. Python type: pbs.duration
min_walltime
Minimum walltime allowed for a shrink-to-fit job. When this resource is specified,
job is a shrink-to-fit job. If this attribute is set, PBS sets the job’s walltime. Job’s
actual walltime is between max_walltime and min_walltime. Must be less than or
equal to max_walltime. Cannot be used for resources_min or resources_max.
Cannot be set on job arrays or reservations. Can be requested only outside of a select
statement. Non-consumable. Default: None. Type: duration. Python type:
pbs.duration
4.8.41.13	Accounting and Logging for Shrink-to-fit Jobs
4.8.41.13.i	Accounting Log Entries for min_walltime and max_walltime
The accounting log will contain values for min_walltime and max_walltime, as part of the
job’s Resource_List attribute. This attribute is recorded in the S, E, and R records in the
accounting log. For example, if the following job is submitted:
qsub -l min_walltime="00:01:00",max_walltime="05:00:00" -l select=2:ncpus=1 job.sh
This is the resulting accounting record:
…S…….. Resource_List.max_walltime=05:00:00
Resource_List.min_walltime=00:01:00 Resource_List.ncpus=2
Resource_List.nodect=2 Resource_List.place=pack
Resource_List.select=2:ncpus=1 Resource_List.walltime=00:06:18
resources_assigned.ncpus=2
…R…….. Resource_List.max_walltime=05:00:00
Resource_List.min_walltime=00:01:00 Resource_List.ncpus=2
Resource_List.nodect=2 Resource_List.place=pack
Resource_List.select=2:ncpus=1 Resource_List.walltime=00:06:18
…E……. Resource_List.max_walltime=05:00:00
Resource_List.min_walltime=00:01:00 Resource_List.ncpus=2
Resource_List.nodect=2 Resource_List.place=pack
Resource_List.select=2:ncpus=1 Resource_List.walltime=00:06:18…….
4.8.41.13.ii	Logging
•	When the scheduler finds a primetime/dedicated time conflict with a shrink-to-fit job, and the job can be shrunk, the following message is logged in the scheduler logs, with log level PBSEVENT_DEBUG2:
“Considering shrinking job to duration=<duration>, due to prime/dedicated
time conflict”
Sample message from the scheduler log:
“03/26/2012 11:53:55;0040;pbs_sched;Job;98.blrlap203;Considering shrinking
job to duration=1:06:05, due to a prime/dedicated time conflict”
This message doesn't indicate or guarantee that the job will eventually be shrunk and run.
This message shows that the job's maximum running time conflicted with primetime and
the job can still be run by shrinking its running time.
•	When the scheduler finds a reservation/top job conflict with a shrink-to-fit job, and the job can be shrunk, the following message is logged in the scheduler logs, with log level PBSEVENT_DEBUG2:
“Considering shrinking job to duration=<duration>, due to reservation/top job conflict”
Sample log message from the scheduler log:
“03/26/2012 11:53:55;0040;pbs_sched;Job;98.blrlap203; Considering
shrinking job to duration=1:06:05, due to reservation/top job
conflict”
This message doesn't indicate or guarantee that the job will eventually be shrunk and run.
This message shows that the job's maximum running time conflicted with a reservation or
top job and the job can still be run by shrinking its running time.
•	When the scheduler runs the shrink-to-fit job, the following message is logged in the scheduler logs with log level PBSEVENT_DEBUG2:
“Job will run for duration=<duration>”
Sample scheduler log message:
“03/26/2012 11:53:55;0040;pbs_sched;Job;98.blrlap203;Job will run for
duration=1:06:05”
4.8.41.14	Caveats and Restrictions for Shrink-to-fit Jobs
•	It is erroneous to specify max_walltime for a job without specifying min_walltime. If a queuejob or modifyjob hook attempts this, the following error appears in the server logs. If attempted via qsub or qalter, the following error appears in the server log and is printed as well:
'Can not have “max_walltime” without “min_walltime”'
•	It is erroneous to specify a min_walltime that is greater than max_walltime. If a queuejob or modifyjob hook attempts this, the following error appears in the server logs. If attempted via qsub or qalter, the following error appears in the server log and is printed as well:
'“min_walltime” can not be greater than “max_walltime”'
•	Job arrays cannot be shrink-to-fit. You cannot have a shrink-to-fit job array. It is erroneous to specify a min_walltime or max_walltime for a job array. If a queuejob or modifyjob hook attempts this, the following error appears in the server logs. If attempted via qsub or qalter, the following error appears in the server log and is printed as well:
'”min_walltime” and “max_walltime” are not valid resources for a job array'
•	Reservations cannot be shrink-to-fit. You cannot have a shrink-to-fit reservation. It is erroneous to set min_walltime or max_walltime for a reservation. If attempted via pbs_rsub, the following error is printed:
'”min_walltime” and “max_walltime” are not valid resources for reservation.'
•	It is erroneous to set resources_max or resources_min for min_walltime and max_walltime. If attempted, the following error message is displayed, whichever is appropriate:
“Resource limits can not be set for min_walltime”
“Resource limits can not be set for max_walltime”
4.8.42	SMP Cluster Distribution
This tool is deprecated. PBS provides a method for distributing single-chunk jobs to a cluster of single-vnode machines according to a simple set of rules. The method is called SMP
cluster distribution. It takes into account the resources specified on the resources: line in
PBS_HOME/sched_priv/sched_config. The SMP cluster distribution method allows
you to choose one of three job distribution systems:
Table 4-17: SMP Cluster Distribution Options

Option        Meaning
pack          Pack all jobs onto one vnode, until that vnode is full, then move to the next vnode
round_robin   Place one job on each vnode in turn, before cycling back to the first vnode
lowest_load   Place the job on the host with the lowest load average
4.8.42.1	How to Use SMP Cluster Distribution
To use SMP cluster distribution, do the following:
•	Set the smp_cluster_dist scheduler parameter to the desired value. For example, to enable SMP cluster distribution using the round robin algorithm during primetime, and the pack algorithm during non-primetime, set the following in the scheduler’s configuration file:
smp_cluster_dist: round_robin prime
smp_cluster_dist: pack non_prime
•	Set resources_available.<resource> to the desired limit on each vnode. You do not need to set any of the resources that are automatically set by PBS. For a list of these, see section 5.6.1.1, “Default Behavior of Vnode Resources”, on page 316.
•	Specify the resources to use during scheduling, in PBS_HOME/sched_priv/sched_config:
resources: “ncpus, mem, arch, host, ...”
The smp_cluster_dist parameter is a primetime option, meaning that you can configure it
separately for primetime and non-primetime, or you can specify it for all of the time.
4.8.42.2	How To Disable SMP Cluster Distribution
To ensure that SMP cluster distribution does not interfere with your scheduling policy, leave
the smp_cluster_dist parameter set to its default value:
smp_cluster_dist: pack all
4.8.42.3	SMP Cluster Distribution Caveats and Advice
•	This feature was intended for early implementations of complexes, and probably is not useful for you.
•	If you use this feature, you are committed to using it for the entire complex; you cannot designate some machines where it will be used and others where it will not be used.
•	If smp_cluster_dist with either round_robin or lowest_load is used with node_sort_key set to unused or assigned, smp_cluster_dist is set to pack.
•	The avoid_provision provisioning policy is incompatible with the smp_cluster_dist scheduler configuration parameter. If a job requests an AOE, the avoid_provision policy overrides the behavior of smp_cluster_dist.
•	This feature is applied only to single-chunk jobs that specify an arrangement of pack. Multi-chunk jobs are ignored.
•	This feature is useful only for single-vnode machines. On a multi-vnoded machine, this feature distributes jobs across vnodes, but those jobs can end up all stuck on a single host.
•	The choice of smp_cluster_dist with round_robin can be replaced by sorting vnodes according to unused CPUs, which does a better job:
node_sort_key: “ncpus HIGH unused”
4.8.43	Sorting Jobs on a Key
PBS allows you to sort jobs on a key that you specify. This can be used when setting both
execution and preemption priority. Sorting jobs comes into play after jobs have been divided
into classes, because each class may contain more than one job. You can sort on one or more
of several different keys, and for each key, you can sort either from low to high or from high
to low.
You configure sorting jobs on a key by setting values for the job_sort_key scheduler parameter. When preemption is enabled, jobs are automatically sorted by preemption priority.
Table 4-8, “Job Execution Classes,” on page 175 shows where this step takes place.
You can create an invisible, unrequestable custom resource, and use a hook to set the value of
this resource for each job. The hook modifies the job’s resource request to include the new
resource, and sets the value to whatever the hook computes. Then you can sort jobs according
to the value of this resource.
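As an illustration only, a queuejob hook along these lines might compute such a value; JobOrder is assumed to be a site-defined invisible custom resource (as in Example 4-29 below), and the scoring rule is invented:
import pbs
e = pbs.event()
j = e.job
# Invented rule: jobs requesting more CPUs get a smaller JobOrder value,
# so sorting with "JobOrder LOW" considers them first.
ncpus = j.Resource_List["ncpus"] or 1
j.Resource_List["JobOrder"] = 1000 - ncpus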
The job_sort_key parameter is a primetime option, meaning that you can configure it separately for primetime and non-primetime, or you can specify it for all of the time.
4.8.43.1	job_sort_key Syntax
job_sort_key: “<sort key> HIGH | LOW <primetime option>”
You can use the following keys for sorting jobs:
Table 4-18: Keys for Sorting Jobs

Sort Key          Allowed Order   Description
<PBS resource>    HIGH | LOW      Sorts jobs according to how much of the specified resource they request.
fair_share_perc   HIGH | LOW      Sorts according to fairshare percentage allotted to entity that owns job. This percentage is defined in the resource_group file. If user A has more priority than user B, all of user A’s jobs are always run first. Past history is not used.
job_priority      HIGH | LOW      Sorts jobs by the value of each job’s Priority attribute.
sort_priority     HIGH | LOW      Deprecated. Replaced by job_priority option.
You can sort on up to 20 keys.
The argument to the job_sort_key parameter is a quoted string. The default for job_sort_key
is that it is not in force.
See “job_sort_key” on page 301 of the PBS Professional Reference Guide.
4.8.43.2	Configuring Sorting Jobs on a Key
You can specify more than one sort key, where you want a primary sort key, a secondary sort
key, etc.
If you specify more than one entry for job_sort_key, the first entry is the primary sort key, the
second entry is the secondary sort key, which is used to sort equal-valued entries from the first
sort, and so on.
Each entry is specified one to a line.
To sort jobs on a key, set the job_sort_key scheduler parameter:
•	Set the desired key
•	Specify whether high or low results should come first
•	Specify the primetime behavior
The scheduler’s configuration file is read on startup and HUP.
4.8.43.3	Examples of Sorting Jobs on Key
Example 4-26: Sort jobs so that those with long walltime come first:
job_sort_key: “walltime HIGH”
Example 4-27: If you want big jobs to run first, where “big” means more CPUs, and, when CPU counts are equal, more memory, sort on the number of CPUs requested, then on the amount of memory requested:
job_sort_key: “ncpus HIGH” all
job_sort_key: “mem HIGH” all
Example 4-28: Sort jobs so that those with lower memory come first:
job_sort_key: “mem LOW” prime
Example 4-29: Sort jobs according to the value of an invisible custom resource called JobOrder:
job_sort_key: “JobOrder LOW” all
4.8.43.4	Caveats and Advice for Sorting Jobs on Key
•	Do not use fair_share_perc as the sort key when using fairshare, meaning the fair_share scheduler parameter is enabled. If you do this, the scheduler will attempt to sort a set of jobs where each job has the same sort key value. This will not sort the jobs.
•	Use the fair_share_perc option only when ordering jobs by entity shares. See section 4.8.14, “Sorting Jobs by Entity Shares (Was Strict Priority)”, on page 168.
•	To run big jobs first, use ncpus as the primary sort key for job_sort_key:
job_sort_key: “ncpus HIGH”
•	The job_sort_key parameter is overridden by the job sorting formula and by fairshare. It is invalid to set both job_sort_formula and job_sort_key at the same time. If they are both set, job_sort_key is ignored and the following error message is logged:
“Job sorting formula and job_sort_key are incompatible. The job sorting formula will be used.”
•	The scheduler’s configuration file contains an example line for job_sort_key. This line is commented out, but shows an example of job_sort_key with “cput” as the sorting key.
•	The preempt_priority argument to the job_sort_key parameter is deprecated. Jobs are now automatically sorted by preemption priority when preemption is enabled.
4.8.44	Sorting Jobs by Requested Priority
You can sort jobs according to the priority that was requested for the job. This value is found
in the job’s Priority attribute. You can use this value in the following ways:
•	The term job_priority represents the value of the job’s Priority attribute in the job sorting formula. See section 4.8.20, “Using a Formula for Computing Job Execution Priority”, on page 194.
•	The job_sort_key scheduler parameter can take the term job_priority as an argument. The term job_priority represents the value of the job’s Priority attribute. See section 4.8.43, “Sorting Jobs on a Key”, on page 292.
You can use a hook to set or change the value of a job’s Priority attribute. See Chapter 6, "Hooks", on page 437.
4.8.45	Sorting Queues into Priority Order
PBS always sorts all the execution queues in your complex according to their priority, and uses that ordering when examining queues individually. Queues are ordered with the highest-priority queue first.
If you want queues to be considered in a specific order, you must assign a different priority to
each queue. Give the queue you want considered first the highest priority, then the next queue
the next highest priority, and so on. To set a queue’s priority, use the qmgr command to
assign a value to the priority queue attribute.
Qmgr: set queue <queue name> priority = <value>
Sorting queues into priority order is useful for the following:
•	Examining queues one at a time. See section 4.8.4, “Examining Jobs Queue by Queue”, on page 136.
•	Selecting jobs from queues in a round-robin fashion. See section 4.8.38, “Round Robin Queue Selection”, on page 270.
4.8.45.1	Caveats and Advice when Sorting Queues
•	If you do not set queue priorities, queue ordering is undefined.
•	The sort_queues parameter is deprecated (12.2).
•	The sort_queues parameter has no effect; queues are always sorted (13.0).
4.8.46	Starving Jobs
PBS can keep track of the amount of time a job has been waiting to run, and then mark the job
as starving if this time has passed a specified limit. You can use this starving status in calculating both execution and preemption priority.
4.8.46.1	Enabling Starving
You enable tracking whether jobs are starving by setting the help_starving_jobs scheduler
parameter to True.
You specify the amount of time required for a job to be considered starving in the max_starve
scheduler parameter. The default for this parameter is 24 hours.
The help_starving_jobs parameter is a primetime option, meaning that you can configure it
separately for primetime and non-primetime, or you can specify it for all of the time. See
“help_starving_jobs” on page 300 of the PBS Professional Reference Guide.
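For example, to mark jobs as starving after 48 hours of waiting (the 48-hour value is illustrative), set the following in PBS_HOME/sched_priv/sched_config:
help_starving_jobs: True all
max_starve: 48:00:00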
4.8.46.2	Time Used for Starving
PBS can use one of the following kinds of time to determine whether a job is starving:
•	The job’s eligible wait time, described in section 4.8.13, “Eligible Wait Time for Jobs”, on page 163
•	The amount of time the job has been queued
You specify which to use in the server’s eligible_time_enable attribute. When
eligible_time_enable is set to True, each job’s eligible_time value is used as its wait time
for starving. If eligible_time_enable is set to False, the amount of time the job has been
queued is used as its wait time for starving. The default for eligible_time_enable is False.
If the server’s eligible_time_enable attribute is set to False, the following rules apply:
•	The amount of time the job has been queued is used as its wait time for starving.
•	Jobs lose their queue wait time whenever they are requeued, as with the qrerun command. This includes when they are checkpointed or requeued (but not suspended) during preemption.
•	Suspended jobs do not lose their queue wait time. However, when they become suspended, the amount of time since they were submitted is counted towards their queue wait time. For example, if a job was submitted, then remained queued for 1 hour, then ran for 26 hours, then was suspended, if max_starve is 24 hours, then the job will become starving.
If the server’s eligible_time_enable attribute is set to True, the following rules apply:
•	The job’s eligible_time value is used as its wait time for starving.
•	Jobs do not lose their eligible_time when they are requeued.
•	Jobs do not lose their eligible_time when they are suspended.
4.8.46.3	Starving and Job Priority
Starving is one of the job classes used by PBS to calculate job execution priority. If you
enable starving jobs, PBS will classify starving jobs in the Starving class, which gives them
greater than ordinary priority. See section 4.8.16, “Calculating Job Execution Priority”, on
page 174. Each job’s eligible wait time can also be used in the job sorting formula used to
calculate job execution priority. See section 4.8.20, “Using a Formula for Computing Job
Execution Priority”, on page 194.
Starving is one of the job classes that you can use when specifying how preemption should
work. You can choose how much preemption priority is given to starving jobs when you set
preemption levels. See section 4.8.33, “Using Preemption”, on page 241.
4.8.46.4	Parameters and Attributes Affecting Starving
The following table lists the parameters and attributes that affect starving:
Table 4-19: Parameters and Attributes Affecting Starving

Parameter or Attribute   Location                            Effect
help_starving_jobs       PBS_HOME/sched_priv/sched_config    Controls whether long-waiting jobs are considered starving. When set to True, jobs can be starving. Default: True all
max_starve               PBS_HOME/sched_priv/sched_config    Amount of wait time for job to be considered starving. Default: 24 hours.
eligible_time_enable     Server attribute                    Controls whether a job’s wait time is taken from its eligible_time or from its queued time. When set to True, a job’s eligible_time is used as its wait time. Default: False.
eligible_time            Job attribute                       The amount of time a job has been blocked from running due to lack of resources.
4.8.46.5	Starving and Queued or Running Jobs
A job can only accumulate starving time while it waits to run, not while it runs. When a job is
running, it keeps the starving status it had when it was started. While a job is running, if it
wasn’t starving before, it can’t become starving. However, it keeps its starving status if it
became starving while queued.
4.8.46.6	Starving and Subjobs
Subjobs that are queued can become starving. Starving status is applied to individual subjobs
in the same way it is applied to jobs. The queued subjobs of a job array can become starving
while others are running. If a job array has starving subjobs, then the job array is starving.
4.8.46.7	Starving and Backfilling
Because a starving job can become a top job, but can continue to be unable to run due to a
lack of resources, you may find it useful to use backfilling around starving jobs. See section
4.8.3, “Using Backfilling”, on page 129.
4.8.46.8	Starving Caveats
Do not enable starving with fairshare, meaning do not set both the fair_share and
help_starving_jobs scheduler parameters to True.
4.8.47	Using Strict Ordering
By default, when scheduling jobs, PBS orders jobs according to execution priority, then considers each job, highest-priority first, and runs the next job that can run now. Using strict
ordering means that you tell PBS that it must not skip a job when choosing which job to run.
If the top job cannot run, no job runs.
Strict ordering does not change how execution priority is calculated.
4.8.47.1	Configuring Strict Ordering
To configure strict ordering, set the strict_ordering scheduler parameter to True.
The strict_ordering parameter is a primetime option, meaning that you can configure it separately for primetime and non-primetime, or you can specify it for all of the time. See
“strict_ordering” on page 311 of the PBS Professional Reference Guide.
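For example, to enable strict ordering at all times, set the following in PBS_HOME/sched_priv/sched_config:
strict_ordering: True all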
4.8.47.2	How Strict Ordering Works
When strict_ordering is True, the scheduler runs jobs in exactly the order of their priority.
Strict ordering does not affect how job priority is calculated, but it does change which execution priority classes the scheduler uses; see section 4.8.16, “Calculating Job Execution Priority”, on page 174.
4.8.47.3	Combining Strict Ordering and Backfilling
Strict ordering alone may cause some resources to stand idle while the top job waits for
resources to become available. If you want to prevent this, you can use backfilling with strict
ordering. Using backfilling, if the top job cannot run, filler jobs can be squeezed in around
the job that cannot run. See section 4.8.3, “Using Backfilling”, on page 129.
4.8.47.4	Strict Ordering Caveats
•	It is inadvisable to use strict ordering and backfilling with fairshare. The results may be non-intuitive. Fairshare will cause relative job priorities to change with each scheduling cycle. It is possible that a job from the same entity or group as the desired large job will be chosen as the filler job. The usage from these filler jobs will lower the priority of the top job.
For example, if a user has a large job that is the top job, and that job cannot run, smaller jobs owned by that user will chew up the user's usage, and prevent the large job from being likely to ever run. Also, if the small jobs are owned by a user in one area of the fairshare tree, no large jobs owned by anyone else in that section of the fairshare tree are likely to be able to run.
•	Using dynamic resources with strict ordering and backfilling may result in unpredictable scheduling. See section 4.8.3.9, “Backfilling Recommendations and Caveats”, on page 134.
•	Using preemption with strict ordering and backfilling may change which job is the top job.
•	With both round robin and strict ordering, a job continually rejected by a runjob hook may prevent other jobs from being run. A well-written hook would put the job on hold or requeue the job at some later time to allow other jobs in the same queue to be run.
4.8.48	Sorting Vnodes on a Key
PBS can sort vnodes according to a key that you specify. This can be used when deciding
which vnodes to use for jobs. Sorting vnodes comes into play after a placement set has been
selected, or when a job will run on vnodes associated with a queue, or when placement sets
are not used, because in those cases there may be more vnodes available than are needed. You
can sort vnodes on one or more different keys, and for each key, you can sort from high to
low, or the reverse. The default way to sort vnodes is according to the value of the vnode priority attribute, from higher to lower.
When you sort vnodes according to the assigned or unused amount of a resource, the vnode
list is re-sorted after every job is run. This is because each job may change the usage for that
resource.
You configure sorting vnodes on a key by setting values for the node_sort_key scheduler
parameter.
The node_sort_key parameter is a primetime option, meaning that you can configure it separately for primetime and non-primetime, or you can specify it for all of the time.
When vnodes are not sorted on a key, their order is undefined.
4.8.48.1	node_sort_key Syntax
node_sort_key: “sort_priority HIGH | LOW” <prime option>
node_sort_key: “<resource> HIGH | LOW” <prime option>
node_sort_key: “<resource> HIGH | LOW total | assigned | unused” <prime option>
where
total
Use the resources_available value
assigned
Use the resources_assigned value
unused
Use the value given by resources_available - resources_assigned
Specifying a resource such as mem or ncpus sorts vnodes by the resource specified.
Specifying the sort_priority keyword sorts vnodes on the vnode priority attribute.
If the third argument (total | assigned | unused) is not specified with a resource, total is used; this provides backward compatibility with previous releases.
The values used for sorting must be numerical.
4.8.48.2 Configuring Sorting Vnodes on a Key
You can specify up to 20 sort keys: a primary sort key, a secondary sort key, and so on.
If you specify more than one entry for node_sort_key, the first entry is the primary sort key,
the second entry is the secondary sort key, which is used to sort equal-valued entries from the
first sort, and so on.
Each entry is specified one to a line.
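For example, to sort primarily by unused CPUs and break ties by unused memory, you could use this pair of entries (an illustrative combination of keys):
node_sort_key: “ncpus HIGH unused” all
node_sort_key: “mem HIGH unused” all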
To sort vnodes on a key, set the node_sort_key scheduler parameter:
• Set the desired key
• Specify whether high or low results should come first
• For sorting on a resource, optionally specify total, assigned, or unused
• Specify the primetime behavior
The scheduler’s configuration file is read on startup and HUP.
The argument to the node_sort_key parameter is a quoted string. The default for
node_sort_key is the following:
node_sort_key: “sort_priority HIGH” all
See “node_sort_key” on page 304 of the PBS Professional Reference Guide.
4.8.48.2.i Examples of Sorting Vnodes
Example 4-30: This sorts vnodes by the highest number of unused CPUs:
node_sort_key: “ncpus HIGH unused” all
Example 4-31: This sorts vnodes by the highest amount of memory assigned to vnodes, but
only during primetime:
node_sort_key: “mem HIGH assigned” prime
Example 4-32: This sorts vnodes according to speed. You want to run jobs on the fastest host
available. You have 3 machines, where HostA is fast, HostB is medium speed, and
HostC is slow.
Set node priorities so that faster machines have higher priority:
Qmgr: set node HostA priority = 200
Qmgr: set node HostB priority = 150
Qmgr: set node HostC priority = 100
Specify that vnodes are sorted according to priority, with highest priority first:
node_sort_key: "sort_priority HIGH" ALL
Example 4-33: The old “nodepack” behavior can be achieved by this:
node_sort_key: “ncpus low unused”
Example 4-34: In this example of the interactions between placement sets and
node_sort_key, we have 8 vnodes numbered 1-8. The vnode priorities are the same as
their numbers. However, in this example, when unsorted, the vnodes are selected in the
order 4, 1, 3, 2, 8, 7, 5, 6. This is to illustrate the change in behavior due to
node_sort_key.
We use:
node_sort_key: “sort_priority LOW”
Using node_sort_key, the vnodes are sorted in order, 1 to 8. We have three placement
sets:
A: 1, 2, 3, 4 when sorted by node_sort_key; 4, 1, 3, 2 when no node_sort_key is used
B: 5, 6, 7, 8 when sorted by node_sort_key; 8, 7, 5, 6 when no node_sort_key is used
C: 1-8 when sorted, 4, 1, 3, 2, 8, 7, 5, 6 when not sorted.
A 6-vnode job will not fit in either A or B, but will fit in C. Without the use of
node_sort_key, it would get vnodes 4, 1, 3, 2, 8, 7. With node_sort_key, it would get
vnodes 1 - 6, still in placement set C.
4.8.48.2.ii Caveats for Sorting Vnodes
• Sorting on a resource with node_sort_key and using “unused” or “assigned” cannot be used with load_balancing. If both are used, load balancing will be disabled.
• Sorting on a resource and using “unused” or “assigned” cannot be used with smp_cluster_dist when it is set to anything but “pack”. If both are used, smp_cluster_dist will be set to “pack”.
5 PBS Resources
This chapter covers PBS resources, including providing resources for user jobs, setting up
resources such as application licenses and scratch space, and how resources are used, defined,
inherited, and viewed.
The PBS Professional Reference Guide contains resource reference material. For a list of
built-in and custom Cray resources, as well as information on using resources, see
“Resources” on page 313 of the PBS Professional Reference Guide. For a description of the
format of each type of resource, see “Formats” on page 421 of the PBS Professional Reference Guide.
5.1 Introduction
PBS resources represent things such as CPUs, memory, application licenses, switches, scratch
space, and time. They can also represent whether or not something is true, for example,
whether a machine is dedicated to a particular project. PBS provides a set of built-in
resources, and allows you to define additional custom resources. For some systems, PBS creates specific custom resources; see “Custom Cray Resources” on page 323 of the PBS Professional Reference Guide. The scheduler matches requested resources with available resources,
according to rules defined by the administrator. PBS can enforce limits on resource usage by
jobs. The administrator can specify which resources are available at the server, each queue,
and each vnode.
5.2 Chapter Contents
5.1 Introduction . . . 303
5.2 Chapter Contents . . . 303
5.3 Glossary . . . 305
5.4 Categories of Resources . . . 309
5.4.1 Built-in and Custom Resources . . . 309
5.4.2 Server, Queue, and Vnode Resources . . . 309
5.4.3 Consumable and Non-consumable Resources . . . 310
5.4.4 Static and Dynamic Resources . . . 310
5.4.5 Global and Local Resources . . . 311
5.4.6 Requested and Default Resources . . . 312
5.4.7 Shared and Non-shared Vnode Resources . . . 312
5.4.8 Platform-specific and Generally Available Resources . . . 312
5.4.9 Job-wide and Chunk Resources . . . 313
5.5 Resource Types . . . 314
5.6 Behavior of Resources . . . 314
5.6.1 Default Behavior . . . 314
5.6.2 How the Scheduler Uses Resources . . . 316
5.6.3 Resource Names . . . 316
5.6.4 Resource Values . . . 316
5.7 How to Set Resource Values . . . 316
5.7.1 Editing Configuration Files Under Windows . . . 316
5.7.2 Setting Values for Global Static Resources . . . 317
5.7.3 Setting Values for Local Static Resources . . . 317
5.7.4 Setting Values for String Arrays . . . 317
5.7.5 Resource Value Caveats . . . 318
5.8 Overview of Ways Resources Are Used . . . 319
5.8.1 Advice on Using Resources . . . 320
5.9 Resources Allocated to Jobs and Reservations . . . 320
5.9.1 Allocating Chunks . . . 321
5.9.2 Resources Requested by Job . . . 321
5.9.3 Specifying Job Default Resources . . . 321
5.9.4 Allocating Default Resources to Jobs . . . 325
5.9.5 Dynamic Resource Allocation Caveats . . . 329
5.9.6 Period When Resource is Used by Job . . . 329
5.10 Using Resources to Track and Control Allocation . . . 330
5.11 Using Resources for Topology and Job Placement . . . 333
5.11.1 Restrictions on Using Resources for Job Placement . . . 333
5.12 Using Resources to Prioritize Jobs . . . 333
5.13 Using Resources to Restrict Server, Queue Access . . . 334
5.13.1 Admittance Limits for walltime, min_walltime, and max_walltime . . . 334
5.13.2 Restrictions on Resources Used for Admittance . . . 335
5.14 Custom Resources . . . 335
5.14.1 How to Use Custom Resources . . . 335
5.14.2 Defining New Custom Resources . . . 339
5.14.3 Restart Steps for Custom Resources . . . 354
5.14.4 Configuring Server-level Resources . . . 356
5.14.5 Configuring Host-level Custom Resources . . . 358
5.14.6 Using Scratch Space . . . 366
5.14.7 Supplying Application Licenses . . . 367
5.14.8 Using GPUs . . . 381
5.14.9 Using FPGAs . . . 385
5.14.10 Custom Resource Caveats . . . 385
5.15 Managing Resource Usage . . . 386
5.15.1 Managing Resource Usage By Users, Groups, and Projects, at Server & Queues . . . 387
5.15.2 Limiting Number of Jobs at Vnode . . . 411
5.15.3 Placing Resource Limits on Jobs . . . 412
5.15.4 Limiting the Number of Jobs in Queues . . . 421
5.16 Where Resource Information Is Kept . . . 421
5.16.1 Files . . . 421
5.16.2 MoM Configuration Parameters . . . 423
5.16.3 Server Attributes . . . 423
5.16.4 Reservation Attributes . . . 424
5.16.5 Queue Attributes . . . 424
5.16.6 Vnode Attributes . . . 426
5.16.7 Job Attributes . . . 426
5.17 Viewing Resource Information . . . 427
5.17.1 Resource Information in Accounting Logs . . . 428
5.17.2 Resource Information in Daemon Logs . . . 429
5.17.3 Finding Current Value . . . 429
5.17.4 Restrictions on Viewing Resources . . . 430
5.18 Resource Recommendations and Caveats . . . 430
5.3 Glossary
Advance reservation
A reservation for a specific set of resources for a specified start time and duration in
the future. Advance reservations are created by users to reserve resources for jobs.
The reservation is available only to the creator of the reservation and any users or
groups specified by the creator.
Borrowing vnode
A shared vnode resource is available for use by jobs at more than one vnode, but is
managed at just one vnode. A borrowing vnode is a vnode where a shared vnode
resource is available, but not managed.
Built-in resource
A resource that is defined in PBS Professional as shipped. Examples of built-in
resources are ncpus, which tracks the number of CPUs, and mem, which tracks
memory. See section 5.4.1, “Built-in and Custom Resources”, on page 311.
Chunk
A set of resources allocated as a unit to a job. Specified inside a selection directive.
All parts of a chunk come from the same host. In a typical MPI (Message-Passing
Interface) job, there is one chunk per MPI process.
Consumable resource
A consumable resource is a resource that is reduced or taken up by being used.
Examples of consumable resources are memory or CPUs. See section 5.4.3, “Consumable and Non-consumable Resources”, on page 312.
CPU
Has two meanings, one from a hardware viewpoint, and one from a software viewpoint:
1. A core. The part of a processor that carries out computational tasks. Some systems present virtual cores, for example in hyperthreading.
2. Resource required to execute a program thread. PBS schedules jobs according, in part, to the number of threads, giving each thread a core on which to execute. The resource used by PBS to track CPUs is called “ncpus”. The number of CPUs available for use defaults to the number of cores reported by the OS. When a job requests one CPU, it is requesting one core on which to run.
Custom resource
A resource that is not defined in PBS as shipped. Custom resources are created by
the PBS administrator or by PBS for some systems. See section 5.4.1, “Built-in and
Custom Resources”, on page 311.
Floating license
A unit of license dynamically allocated (checked out) when a user begins using an
application on some host (when the job starts), and deallocated (checked in) when a
user finishes using the application (when the job ends).
Generic group limit
A limit that applies separately to groups at the server or a queue. This is the limit for
groups which have no individual limit specified. A limit for generic groups is
applied to the usage across the entire group. A separate limit can be specified at the
server and each queue.
Generic user limit
A limit that applies separately to users at the server or a queue. This is the limit for
users who have no individual limit specified. A separate limit for generic users can
be specified at the server and at each queue.
Global resource
A global resource is defined in a resources_available attribute, at the server, a
queue, or a host. Global resources can be operated on via the qmgr command and
are visible via the qstat and pbsnodes commands. See section 5.4.5, “Global
and Local Resources”, on page 313.
Group limit
Refers to configurable limits on resources and jobs. This is a limit applied to the
total used by a group, whether the limit is a generic group limit or an individual
group limit.
Indirect resource
A shared vnode resource at vnode(s) where the resource is not defined, but which
share the resource.
Individual group limit
Applies separately to groups at the server or a queue. This is the limit for a group
which has its own individual limit specified. An individual group limit overrides the
generic group limit, but only in the same context, for example, at a particular queue.
The limit is applied to the usage across the entire group. A separate limit can be
specified at the server and each queue.
Individual user limit
Applies separately to users at the server or a queue. This is the limit for users who
have their own individual limit specified. A limit for an individual user overrides the
generic user limit, but only in the same context, for example, at a particular queue. A
separate limit can be specified at the server and each queue.
Limit
A maximum that can be applied in various situations:
• The maximum number of jobs that can be queued
• The maximum number of jobs that can be running
• The maximum number of jobs that can be queued and running
• The maximum amount of a resource that can be allocated to queued jobs
• The maximum amount of a resource that can be consumed at any time by running jobs
• The maximum amount of a resource that can be allocated to queued and running jobs
Local resource
A local resource is defined in a Version 1 MoM configuration file. Local resources
cannot be operated on via the qmgr command and are not visible via the qstat and
pbsnodes commands. Local resources can be used by the scheduler. See section
5.4.5, “Global and Local Resources”, on page 313.
Managing vnode
The vnode where a shared vnode resource is defined, and which manages the
resource.
Memory-only vnode
Represents a node board that has only memory resources (no CPUs), for example, an
Altix memory-only blade.
Non-consumable resource
A non-consumable resource is a resource that is not reduced or taken up by being
used. Examples of non-consumable resources are Boolean resources and walltime.
See section 5.4.3, “Consumable and Non-consumable Resources”, on page 312.
Overall limit
Limit on the total usage. In the context of server limits, this is the limit for usage at
the PBS complex. In the context of queue limits, this is the limit for usage at the
queue. An overall limit is applied to the total usage at the specified location. Separate overall limits can be specified at the server and each queue.
Resource
A resource can be something used by a job, such as CPUs, memory, high-speed
switches, scratch space, licenses, or time, or it can be an arbitrary item defined for
another purpose. PBS has built-in resources, and allows custom-defined resources.
Shared resource
A vnode resource defined and managed at one vnode, but available for use at others.
User limit
Refers to configurable limits on resources and jobs. A user’s limit, whether generic
or individual.
5.4 Categories of Resources
5.4.1 Built-in and Custom Resources
Built-in resources are the resources that are already defined for you in PBS. PBS supplies
built-in resources including number of CPUs, CPU time, and memory. For a list of built-in
resources, see “Built-in Resources” on page 315 of the PBS Professional Reference Guide.
Custom resources are those that you define, or that PBS creates for some systems. For example, if you wanted a resource to represent scratch space, you could define a resource called
Scratch, and specify a script which queries for the amount of available scratch space. See
section 5.14, “Custom Resources”, on page 337.
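As a sketch of the dynamic host-level variant, assuming an illustrative resource name and script path: define the resource in PBS_HOME/server_priv/resourcedef:
scratch type=size
then point MoM at a query script in PBS_HOME/mom_priv/config on each execution host:
scratch !/usr/local/bin/scratch_query
and add scratch to the resources: line (and, for host-level dynamic resources, the mom_resources: line) in PBS_HOME/sched_priv/sched_config so the scheduler queries and matches it. See section 5.14.5.1, “Dynamic Host-level Resources”, on page 361 for the full procedure.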
5.4.2 Server, Queue, and Vnode Resources
PBS resources can be available at the server, queues, both the server and queues, or at vnodes.
Any of these resources can be static or dynamic, built-in or custom, and consumable or non-consumable. Vnode resources can additionally be global or local.
5.4.2.1 Server Resources
A server resource, also called a server-level resource, is a resource that is available at the
server. A server resource is available to be consumed or matched at the server if you set the
server’s resources_available.<resource name> attribute to the available or matching value.
For example, you can define a custom resource called FloatingLicenses and set the server’s
resources_available.FloatingLicenses attribute to the number of available floating licenses.
A server resource is a job-wide resource. This means that a job can request this resource for
the entire job, but not for individual chunks.
An example of a job-wide resource is shared scratch space, or any custom resource that is
defined at the server and queue level.
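For example, after defining the custom resource, you might make 16 floating licenses available and have jobs request them job-wide (the count and job script name are illustrative):
Qmgr: set server resources_available.FloatingLicenses = 16
qsub -l FloatingLicenses=1 myjob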
5.4.2.2 Queue Resources
A queue resource, also called a queue-level resource, is available to be consumed or matched
by jobs in the queue if you set the queue’s resources_available.<resource name> attribute
to the available or matching value.
A queue resource is a job-wide resource. A job can request a queue resource for the entire
job, but not for individual chunks.
An example of a job-wide resource is floating licenses, or any custom resource that is defined
at both server and queue level.
5.4.2.3 Resources Defined at Both Server and Queue
Custom resources can be defined to be available either at vnodes or at both the server and
queues. Consumable custom resources that are defined at the server and queue level have
their consumption monitored at the server and queue level. In our example, if a job requests one unit of FloatingLicenses, the value of the resources_assigned.FloatingLicenses attribute is incremented by one at both the server and the queue in which the job resides.
5.4.2.4 Vnode Resources
A vnode resource, also called a vnode-level or host-level resource, is available only at vnodes.
A vnode resource is a chunk-level resource, meaning that it can be requested for a job only
inside of a chunk.
5.4.3 Consumable and Non-consumable Resources
A consumable resource is one that is reduced by being used. Consumable resources include
ncpus, mem and vmem by default, and any custom resource defined with the -n or -f flags.
A non-consumable resource is not reduced through use, meaning that allocation to one job
does not affect allocation to other jobs. The scheduler matches jobs to non-consumable
resources. Examples of non-consumable resources are walltime, file, cput, pcput, pmem,
pvmem, nice, or Boolean resources.
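For example, a host-level consumable custom resource could be defined in PBS_HOME/server_priv/resourcedef along these lines (an illustrative name; the n flag marks the resource consumable, and h makes it host-level):
gpu_count type=long, flag=nh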
5.4.4 Static and Dynamic Resources
Static resources are managed by PBS and have values that are fixed until you change them or
until you change the hardware and MoM reports a new value for memory or number of CPUs.
Dynamic resources are not under the control of PBS, meaning that they can change independently of PBS. Dynamic resources are reported via a script; PBS runs a query to discover the available amount. Server dynamic resources use a script that runs at the server’s host. Host-level (MoM) dynamic resources use a script that runs at the execution host.
Static and dynamic resources can be available at the server or host level.
5.4.4.1 Dynamic Resource Caveats
Dynamic resource values are displayed in qstat; however, the value displayed is the last value retrieved, not the current value. Dynamic resources have no resources_available.<resource> representation anywhere in PBS.
5.4.5 Global and Local Resources
5.4.5.1 Global Static Resources
Global static resources are defined in resources_available attributes at the server, queue, or
vnode, and are available at the server, queue, or vnode level. Global static resources can be
operated on via the qmgr command and viewed via the qstat command. Values for built-in
global static resources are set via the qmgr command. The walltime and aoe resources are
examples of global static resources. For custom global static resources, see section 5.14.2.11,
“Example of Defining Each Type of Custom Resource”, on page 354.
5.4.5.2 Global Dynamic Resources
Global dynamic resources are defined in the server’s resourcedef file, and can be used at
the server, queue, or vnode level. Global host-level dynamic resources can be viewed via the
qstat command. Server dynamic resource values have no
resources_available.<resource> representation anywhere in PBS. See section 5.14.4.1,
“Dynamic Server-level Resources”, on page 358.
The value displayed via qstat for a dynamic resource is the most recently retrieved, not the
current value.
5.4.5.3 Local Static Resources
It is not recommended to use local static resources. Local static resources are defined in the
MoM Version 1 configuration file. These resources cannot be operated on via the qmgr command or viewed via the qstat command. They can be used by the scheduler.
5.4.5.4 Local Dynamic Resources
Dynamic local resources are defined in the MoM Version 1 configuration file. These are
scripts that run on the execution host where they are defined and return a value. These
resources can be used by the scheduler. Host dynamic resource values have no
resources_available.<resource> representation anywhere in PBS. See section 5.14.5.1,
“Dynamic Host-level Resources”, on page 361.
The value displayed via qstat for a dynamic resource is the most recently retrieved, not the
current value.
5.4.6 Requested and Default Resources
A job’s requested resources are the resources explicitly requested by the job. Default resources are resources that you specify each job should have if it does not request them. For example, you can specify that any job that does not request walltime gets 12 hours of walltime. For jobs that do request walltime, the default of 12 hours is not applied.
For information on default resources, see section 5.9.3, “Specifying Job Default Resources”,
on page 323 and section 5.9.4, “Allocating Default Resources to Jobs”, on page 327.
5.4.7 Shared and Non-shared Vnode Resources
5.4.7.1 Non-shared Vnode Resources
Most vnode resources are not shared. When a resource is defined at one vnode for use by jobs
only at that vnode, the resource is not shared. For example, when
resources_available.ncpus is set to 4 on a single-vnode machine, and no other vnodes have
resources_available.ncpus defined as a pointer to this resource, this resource is not shared.
5.4.7.2 Shared Vnode Resources
When more than one vnode needs access to the same actual resource, that resource can be
shared among those vnodes. The resource is defined at one vnode, and the other vnodes that
supply the resource contain a pointer to that vnode. Any of the vnodes can supply that
resource to a job, but only up to the amount where the total being used by jobs is less than or
equal to the total available at the vnode where the resource is defined. For example, if you
had a 4-vnode machine which had 8GB of memory, and wanted any single vnode to be able to
supply up to 8GB to jobs, you would make the memory a shared resource. See section
5.14.5.3, “Shared Host-level Resources”, on page 364.
5.4.8 Platform-specific and Generally Available Resources
Most PBS built-in resources are available on, and apply to, all supported platforms. However,
PBS provides some resources specifically designed for a given platform. These platform-specific resources are not applicable to any other platform, and cannot be used on platforms other
than the one(s) for which they are designed. For example, PBS creates custom resources that
represent Cray elements, such as the Cray nid and the Cray label. PBS has several built-in
resources whose names begin with mpp; these apply only to the Cray.
5.4.9 Job-wide and Chunk Resources
5.4.9.1 Job-wide Resources
A job-wide resource applies to the entire job, and is available at the server or queue, but not at
the host level. Job-wide resources are requested outside of a select statement, using this form:
-l <resource name>=<value>
For example, to request one hour of walltime for a job:
-l walltime=1:00:00
Examples of job-wide resources are walltime, scratch space, and licenses.
5.4.9.2 Chunk Resources
A chunk resource applies to the part of the job running on that chunk, and is available at the
host level. Chunk resources are requested inside a select statement. A single chunk is
requested using this form:
-l select=<resource name>=<value>:<resource name>=<value>
For example, one chunk might have 2 CPUs and 4GB of memory:
-l select=ncpus=2:mem=4gb
To request multiples of a chunk, prefix the chunk specification by the number of chunks:
-l select=[number of chunks]<chunk specification>
For example, to request six of the previous chunk:
-l select=6:ncpus=2:mem=4gb
To request different chunks, concatenate the chunks using the plus sign (“+”):
-l select=[number of chunks]<chunk specification>+[number of chunks]<chunk
specification>
For example, to request two kinds of chunks, one with 2 CPUs per chunk, and one with 8
CPUs per chunk, both kinds with 4GB of memory:
-l select=6:ncpus=2:mem=4gb+3:ncpus=8:mem=4gb
5.5 Resource Types
PBS supplies the following types of resources:
Boolean
duration
float
long
size
string
string_array
See “List of Formats” on page 421 of the PBS Professional Reference Guide for a description
of each resource type.
5.6 Behavior of Resources
5.6.1 Default Behavior
PBS automatically collects information about some resources and sets their initial values
accordingly. If you explicitly set the value for a resource, that value is carried forth across
server restarts.
5.6.1.1 Default Behavior of Vnode Resources
PBS sets the value for certain resources at each vnode. This means that the value for the
vnode’s resources_available.<resource name> attribute is set by PBS. The following table
lists the vnode resources that are set automatically by PBS.
Table 5-1: Resources Set by PBS

arch
    Initial value: Value reported by OS
    Notes: Settable. If you unset the value, it remains unset until MoM is restarted.
host
    Initial value: Short form of hostname in Mom vnode attribute
    Notes: Settable. If you unset the value, it remains unset until MoM is restarted.
mem
    Initial value: Amount reported by OS
    Notes: Settable. If you unset the value, it remains unset until MoM is restarted.
ncpus
    Initial value: Number of CPUs reported by OS
    Notes: Settable. If you unset this value, the MoM will reset it to the value reported by the OS.
PBScrayhost
    Initial value: On CLE 2.2, set to “default”. On CLE 3.0 and higher, set to value of mpp_host for this system.
    Notes: Do not set.
PBScraylabel_<label name>
    Initial value: Concatenation of PBScraylabel_ and label name. Set to True on all of the node’s vnodes.
    Notes: Do not set.
PBScraynid
    Initial value: Value of node_id for this compute node
    Notes: Do not set.
PBScrayorder
    Initial value: Value starts at 1 and increments by 1 for each node in inventory
    Notes: Do not set.
PBScrayseg
    Initial value: Segment ordinal of associated NUMA node
    Notes: Do not set.
router
    Initial value: Name of router, from topology file
    Notes: Applies to vnodes on certain Altix machines only, such as the 4700.
vnode
    Initial value: Name of the vnode
    Notes: Vnode name must be specified via the qmgr create node command.
For example, PBS automatically sets the value of resources_available.ncpus at each vnode.
5.6.1.2 Default Behavior of Server and Queue Resources
PBS automatically sets the value for default_chunk.ncpus to 1 at the server and queues.
5.6.1.3 Default Behavior of Job Resources
PBS automatically sets the value of the estimated.start_time job resource to the estimated
start time for each job.
5.6.2 How the Scheduler Uses Resources
How the scheduler uses resources is described in section 4.8.28, “Matching Jobs to
Resources”, on page 210.
5.6.3 Resource Names
Resource names are case-insensitive. See “Resource Name” on page 426 of the PBS Professional Reference Guide for the format of resource names.
5.6.4 Resource Values
String resource values are case-sensitive. For format information, see “Resource Value” on
page 426 of the PBS Professional Reference Guide.
5.7 How to Set Resource Values
Since the value for each dynamic resource is set by PBS to the value returned by a script or
command, you will set values for static resources only.
You set values for custom and built-in resources using the same methods.
5.7.1 Editing Configuration Files Under Windows
When you edit any PBS configuration file, make sure that you put a newline at the end of the
file. The Notepad application does not automatically add a newline at the end of a file; you
must explicitly add the newline.
5.7.2 Setting Values for Global Static Resources
To set the value for a global vnode, queue, or server resource, use the qmgr command to set
the value of the appropriate resources_available.<resource> attribute.
Example 5-1: Set the value of floatlicenses at the server to 10:
Qmgr: set server resources_available.floatlicenses = 10
Example 5-2: Set the value of RunsMyApp to True at the vnode named Vnode1:
Qmgr: set node Vnode1 resources_available.RunsMyApp = True
5.7.2.1 Restrictions on Setting Values for Global Static Resources
When setting global static vnode resources on multi-vnode machines, follow the rules in section 3.5.2, “Choosing Configuration Method”, on page 52.
5.7.3 Setting Values for Local Static Resources
It is not recommended to use local static resources, because these resources cannot be
requested, and cannot be viewed using qstat or managed using qmgr. To set the value of a
local vnode resource, edit PBS_HOME/mom_priv/config and change the value section of
the resource’s line.
5.7.4 Setting Values for String Arrays
A string array that is defined on vnodes can be set to a different set of strings on each vnode.
Example of defining and setting a string array:
• Define a new resource:
foo_arr type=string_array, flag=h
• Set it via qmgr:
Qmgr: set node n4 resources_available.foo_arr=“f1, f3, f5”
Vnode n4 now has 3 values of foo_arr: f1, f3, and f5.
• Add f7:
Qmgr: set node n4 resources_available.foo_arr+=f7
Vnode n4 now has 4 values of foo_arr: f1, f3, f5, and f7.
• Remove f1:
Qmgr: set node n4 resources_available.foo_arr-=f1
Vnode n4 now has 3 values of foo_arr: f3, f5, and f7.
• Submission:
qsub -l select=1:ncpus=1:foo_arr=f3
5.7.5 Resource Value Caveats
• It is not recommended to set the value for resources_available.ncpus. The exception is when you want to oversubscribe CPUs. See section 9.4.4.1.iii, “How To Share CPUs”, on page 885.
• Do not attempt to set values for resources_available.<resource> for dynamic resources.
• At the server or a queue, do not set values for any resources except those such as shared scratch space or floating licenses, because the scheduler will not allocate more than the specified value. For example, if you set resources_available.walltime at the server to 10:00:00, and one job requests 5 hours and one job requests 6 hours, only one job will be allowed to run at a time, regardless of other idle resources.
5.7.5.1 Resource Value Caveats for Multi-vnode Machines
• When setting global static vnode resources on multi-vnode machines, follow the rules in section 3.5.2, “Choosing Configuration Method”, on page 52.
• It is not recommended to change the value of ncpus at vnodes on a multi-vnode machine.
• On multi-vnode machines, do not set the values for mem, vmem, or ncpus on the natural vnode. If any of these resources has been explicitly set to a non-zero value on the natural vnode, set resources_available.ncpus, resources_available.mem, and resources_available.vmem to zero on each natural vnode.
• On the natural vnode, all values for resources_available.<resource> should be zero (0), unless the resource is being shared among other vnodes via indirection.
5.8 Overview of Ways Resources Are Used
Resources are used in several ways in PBS. The following table lists the ways resources are
used, and gives links to the section describing each one:
Table 5-2: How Resources Are Used

Allocation to and use by jobs
    See section 5.9, “Resources Allocated to Jobs and Reservations”, on page 322
Limiting job resource usage
    See section 5.15.3, “Placing Resource Limits on Jobs”, on page 414
Restricting access to server and queues
    See section 5.13, “Using Resources to Restrict Server, Queue Access”, on page 336
Routing jobs
    See section 2.2.6.4, “Using Resources to Route Jobs Between Queues”, on page 25
Describing topology and placing jobs
    See section 5.11, “Using Resources for Topology and Job Placement”, on page 335
Setting job execution priority
    See section 5.12, “Using Resources to Prioritize Jobs”, on page 335
Reserving resources ahead of time
    See section 4.8.37, “Advance and Standing Reservations”, on page 264
Tracking and controlling allocation
    See section 5.10, “Using Resources to Track and Control Allocation”, on page 332
Determining job preemption priority
    See section 4.8.33, “Using Preemption”, on page 241
5.8.1 Advice on Using Resources
See “Advice on Using Resources” on page 313 of the PBS Professional Reference Guide for
tips on using resources.
5.9 Resources Allocated to Jobs and Reservations
Resources allocated to jobs provide the job with items such as CPUs and memory to be consumed by the job’s processes, as well as qualities such as architecture and host. The resources
allocated to a job are those that the job requests and those that are assigned to it through
resource defaults that you define.
Jobs use resources at the job-wide and chunk level. Job-wide resources such as walltime or
vmem are applied to and requested by the job as a whole. Chunk-level resources, such as
ncpus, are applied and requested in individual chunks.
Jobs explicitly request resources either at the vnode level in chunks defined in a selection
statement, or in job-wide resource requests. See “Resources” on page 313 of the PBS Professional Reference Guide and "Requesting Resources", on page 74 of the PBS Professional
User’s Guide.
Jobs inherit resource defaults for resources not explicitly requested. See section 5.9.4, “Allocating Default Resources to Jobs”, on page 327.
Chunk-level resources are made available at the host (vnode) level by defining them via
resources_available.<resource> at the vnode, and are requested using -l
select=<resource>=<value>.
Job-wide resources are made available by defining them via
resources_available.<resource> at the queue or server. These resources are requested using
-l <resource> =<value>.
The scheduler matches requested resources with available resources, according to rules
defined by the administrator.
When a job is requesting a string array resource, it can request only one of the values set in the
string array resource. The job will only be placed on a vnode where the job’s requested string
matches one of the values of the string array resource. For example, if the resource named
Colors is set to “red, blue, green” on vnode V1, and “red, blue” on V2:
• A job can request only one of “red”, “blue”, or “green”
• A job requesting Colors=green will only be placed on V1
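For example, a job could request the string array resource above as part of a chunk (the job script name is illustrative):
qsub -l select=1:ncpus=1:Colors=green myjob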
5.9.1 Allocating Chunks
Chunks cannot be split across hosts. Chunks can, however, be made up of vchunks: if a chunk is broken up over multiple vnodes, all participating vnodes must belong to the same execution host, and each participating vnode supplies a vchunk of the chunk. A chunk defines a logical set of resources, for example those needed for an MPI task. The resources must come from a single host, but if the requested resources exceed those of any one vnode, the physical resources can be taken from multiple vnodes on the same host.
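For example, on an execution host made up of four 4-CPU vnodes (a hypothetical layout), the single chunk below does not fit on any one vnode, so PBS may assemble it from vchunks on several of that host's vnodes:
qsub -l select=1:ncpus=12 myjob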
5.9.2 Resources Requested by Job
The job’s Resource_List attribute lists the following resources requested by the job:
• Job-wide resources either explicitly requested by the job or inherited from defaults
• The following built-in chunk-level resources either explicitly requested by the job or inherited from defaults:
mpiprocs
ncpus
netwins
mem
vmem
• Custom vnode-level (chunk-level) resources that are global and have the n, q, or f flags set, either explicitly requested by the job or inherited from defaults
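You can inspect the result for a given job by displaying its full status and reading the Resource_List entries, for example (the job ID is illustrative):
qstat -f 1234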
5.9.3 Specifying Job Default Resources
You can specify which resources are automatically added to job resource requests. When a
job does not request a specific resource, the default value for that resource is automatically
added to the job’s resource request.
The amount of each resource a job is allowed to use is the amount in its resource request. See
section 5.15.3, “Placing Resource Limits on Jobs”, on page 414. Therefore you may wish to
add default limits on resource usage. This is done by adding default resources to the job’s
resource request. For example, if a job does not request walltime, but you do not want jobs
not specifying walltime to run for more than 12 hours, you can specify a default of 12 hours
for walltime. Jobs that do specify walltime do not inherit this default; they keep their
requested amount.
You can use default resources to manage jobs. For example, if you want to keep track of and
limit the number of jobs using something such as a disk arm, you can have each job using the
disk arm automatically request one counting resource. Then you can place a limit on the
amount of this resource that can be in use at one time. This technique is described in section
5.10, “Using Resources to Track and Control Allocation”, on page 332.
Default resources can be defined for the server and for each queue. Default resources defined
at the server are applied to all jobs. Default resources at a queue are applied only to the jobs
that are in that queue.
Default resources on the server and queue can be job-wide, which is the same as adding -l
<resource name> to the job’s resource request, or they can be chunk resources, which is
the same as adding :<resource name>=<value> to a chunk.
Job-wide resources are specified via resources_default on the server or queue, and chunk
resources are specified via default_chunk on the server or queue. You can also specify
default resources to be added to any qsub arguments. In addition, you can specify default
placement of jobs.
5.9.3.1 Specifying Job-wide Default Resources at Server
To specify a server-level job-wide default resource, use the qmgr command to set the server’s
resources_default attribute:
Qmgr: set server resources_default.<resource>=<value>
For example, to set the default architecture on the server:
Qmgr: set server resources_default.arch=linux
5.9.3.2 Specifying Chunk Default Resources at Server
To specify a server-level chunk default resource, use the qmgr command to set the server’s
default_chunk attribute:
Qmgr: set server default_chunk.<resource>=<value>
For example, if you want all chunks that don’t specify ncpus or mem to inherit the values
you specify:
Qmgr: set server default_chunk.ncpus=1
Qmgr: set server default_chunk.mem=1gb
5.9.3.3 Specifying Job-wide Default Resources at Queue
To specify a default for a job-wide resource at a queue, use the qmgr command to set the
queue’s resources_default attribute:
Qmgr: set queue <queue name> resources_default.<resource> = <value>
5.9.3.4 Specifying Chunk Default Resources at Queue
To specify a queue-level chunk default resource, use the qmgr command to set the queue’s default_chunk attribute:
Qmgr: set queue <queue name> default_chunk.<resource>=<value>
For example, if you want all chunks that don’t specify ncpus or mem to inherit the values
you specify:
Qmgr: set queue small default_chunk.ncpus=1
Qmgr: set queue small default_chunk.mem=512mb
5.9.3.5 Specifying Default qsub Arguments
You can set defaults for any qsub arguments not explicitly requested by each job. You do
this at the server by using the qmgr command to set the server’s default_qsub_arguments
attribute:
Qmgr: set server default_qsub_arguments=<string containing arguments>
For example, to set the default for the Rerunable job attribute in each job’s resource request,
and the name of the job:
Qmgr: set server default_qsub_arguments=”-r y -N MyJob”
Or to set a default Boolean in each job’s resource request so that jobs don’t run on Red unless
they explicitly ask to do so:
Qmgr: set server default_qsub_arguments=”-l Red=False”
5.9.3.6 Specifying Default Job Placement
You can specify job placement defaults at both the server and queue level. You use the qmgr
command to set the resources_default.place attribute at the server or queue:
Qmgr: set queue <queue name> resources_default.place=<value>
For example, to set the default job placement for a queue:
Qmgr: set queue Q1 resources_default.place=free
When setting default placement involving a colon, enclose the value in double quotes:
Qmgr: set server resources_default.place=”<value>”
For example, to set default placement at the server to pack:shared, do the following:
Qmgr: set server resources_default.place= "pack:shared"
See "Specifying Job Placement", on page 92 of the PBS Professional User’s Guide for
detailed information about how -l place is used.
5.9.3.7 Using Gating Values As Defaults
For most resources, if the job does not request the resource, and no server or queue defaults
are set, the job inherits the maximum gating value for the resource. If this is set at the queue,
the queue value of resources_max.<resource> is used. If this is set only at the server, the
job inherits the value set at the server. However, for mpp* resources, the job does not inherit
the gating value. For example, if the job does not request mppnppn, and no defaults are set at
the server and queue, but resources_max.mppnppn is set at the queue, the job does not
inherit the queue’s value.
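For example, with no defaults set and this gating value at a queue (the queue name and limit are illustrative):
Qmgr: set queue workq resources_max.walltime = 24:00:00
a job submitted to workq without requesting walltime inherits walltime=24:00:00.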
5.9.3.8 Default Resource Caveats
• While users cannot request custom resources that are created with the r flag, jobs can inherit these as defaults from the server or queue resources_default.<resource> attribute.
• A qsub or pbs_rsub hook does not have resources inherited from the server or queue resources_default or default_chunk as an input argument.
• For mpp* resources, the job does not inherit the gating value. For example, if the job does not request mppnppn, and no defaults are set at the server and queue, but resources_max.mppnppn is set at the queue, the job does not inherit the queue’s value.
• Default qsub arguments and server and queue defaults are applied to jobs at a coarse level. Each job is examined to see whether it requests a select and a place. This means that if you specify a default placement, such as excl, with -lplace=excl, and the user specifies an arrangement, such as pack, with -lplace=pack, the result is that the job ends up with -lplace=pack, NOT -lplace=pack:excl. The same is true for select; if you specify a default of -lselect=2:ncpus=1, and the user specifies -lselect=mem=2GB, the job ends up with -lselect=mem=2GB.
5.9.4 Allocating Default Resources to Jobs
Jobs inherit default resources, job-wide or per-chunk, with the following order of precedence.
Table 5-3: Order In Which Default Resources Are Assigned to Jobs

Order of assignment    Default value                  Affects chunks?    Job-wide?
1                      Default qsub arguments         If specified       If specified
2                      Queue’s default_chunk          Yes                No
3                      Server’s default_chunk         Yes                No
4                      Queue’s resources_default      No                 Yes
5                      Server’s resources_default     No                 Yes
6                      Queue’s resources_max          No                 Yes
7                      Server’s resources_max         No                 Yes
See section 5.9.3, “Specifying Job Default Resources”, on page 323 for how to set these
defaults.
For each chunk in the job's selection statement, first default qsub arguments are applied, then
queue chunk defaults are applied, then server chunk defaults are applied. If the chunk does
not contain a resource defined in the defaults, the default is added. The chunk defaults are
specified in the default_chunk.<resource name> server or queue attribute.
For example, if the queue in which the job is enqueued has the following defaults defined,
default_chunk.ncpus=1
default_chunk.mem=2gb
then a job submitted with this selection statement:
select=2:ncpus=4+1:mem=9gb
will have this specification after the default_chunk elements are applied:
select=2:ncpus=4:mem=2gb+1:ncpus=1:mem=9gb
In the above, mem=2gb and ncpus=1 are inherited from default_chunk.
The job-wide resource request is checked against queue resource defaults, then against server
resource defaults, then against the queue’s resources_max.<resource>, then against the
server’s resources_max.<resource>. If a default or maximum resource is defined which is
not specified in the resource request, it is added to the resource request.
5.9.4.1 Default Resource Allocation for min_walltime and max_walltime
The min_walltime and max_walltime resources inherit values differently. A job can inherit a
value for max_walltime from resources_max.walltime; the same is not true for
min_walltime. This is because once a job is shrink-to-fit, PBS can use a walltime limit for
max_walltime. See section 4.8.41.3.ii, “Inheriting Values for min_walltime and
max_walltime”, on page 281.
5.9.4.2 Default Resource Allocation Caveats
• Resources assigned from the default_qsub_arguments server attribute are treated as if the user requested them. A job will be rejected if it requests a resource that has a resource permission flag, whether that resource was requested by the user or came from default_qsub_arguments. Be aware that creating custom resources with permission flags and then using these in the default_qsub_arguments server attribute can cause jobs to be rejected. See section 5.14.2.10, “Resource Permission Flags”, on page 351.
• Default qsub arguments and server and queue defaults are applied to jobs at a coarse level. Each job is examined to see whether it requests a select and a place. This means that if you specify a default placement, such as excl, with -lplace=excl, and the user specifies an arrangement, such as pack, with -lplace=pack, the result is that the job ends up with -lplace=pack, NOT -lplace=pack:excl. The same is true for select; if you specify a default of -lselect=2:ncpus=1, and the user specifies -lselect=mem=2GB, the job ends up with -lselect=mem=2GB.
5.9.4.3 Moving Jobs Between Queues or Servers Changes Defaults
If the job is moved from the current queue to a new queue or server, any default resources in
the job’s Resource_List inherited from the current queue or server are removed. The job then
inherits any new default resources. This includes a select specification and place directive
generated by the rules for conversion from the old syntax. If a job's resource is unset (undefined) and there exists a default value at the new queue or server, that default value is applied
to the job's resource list. If either select or place is missing from the job's new resource list, it
will be automatically generated, using any newly inherited default values.
Jobs may be moved between servers when peer scheduling is in operation. Given the following set of queue and server default values:
• Server
resources_default.ncpus=1
• Queue QA
resources_default.ncpus=2
default_chunk.mem=2GB
• Queue QB
default_chunk.mem=1GB
no default for ncpus
PBS Professional 13.0 Administrator’s Guide
AG-329
Chapter 5
PBS Resources
The following examples illustrate the equivalent select specification for jobs submitted into queue QA and then moved to (or submitted directly to) queue QB:
Example 5-3: Submission:
qsub -l ncpus=1 -lmem=4gb
• In QA:
select=1:ncpus=1:mem=4gb
- No defaults need be applied
• In QB:
select=1:ncpus=1:mem=4gb
- No defaults need be applied
Example 5-4: Submission:
qsub -l ncpus=1
• In QA:
select=1:ncpus=1:mem=2gb
- Picks up 2GB from queue default_chunk and 1 ncpus from qsub
• In QB:
select=1:ncpus=1:mem=1gb
- Picks up 1GB from queue default_chunk and 1 ncpus from qsub
Example 5-5: Submission:
qsub -lmem=4gb
• In QA:
select=1:ncpus=2:mem=4gb
- Picks up 2 ncpus from queue-level job-wide resource default and 4GB mem from qsub
• In QB:
select=1:ncpus=1:mem=4gb
- Picks up 1 ncpus from server-level job-wide default and 4GB mem from qsub
Example 5-6: Submission:
qsub -lnodes=4
• In QA:
select=4:ncpus=1:mem=2gb
- Picks up a queue-level default memory chunk of 2GB. (This is not 4:ncpus=2 because in prior versions, "nodes=x" implied 1 CPU per node unless otherwise explicitly stated.)
• In QB:
select=4:ncpus=1:mem=1gb
- (In prior versions, "nodes=x" implied 1 CPU per node unless otherwise explicitly stated, so the ncpus=1 is not inherited from the server default.)
Example 5-7: Submission:
qsub -l mem=16gb -lnodes=4
• In QA:
select=4:ncpus=1:mem=4gb
- (This is not 4:ncpus=2 because in prior versions, "nodes=x" implied 1 CPU per node unless otherwise explicitly stated.)
• In QB:
select=4:ncpus=1:mem=4gb
- (In prior versions, "nodes=x" implied 1 CPU per node unless otherwise explicitly stated, so the ncpus=1 is not inherited from the server default.)
5.9.5 Dynamic Resource Allocation Caveats
When a job requests a dynamic resource, PBS checks to see how much of the resource is
available, but cannot know how much will be used by another job while this job executes.
This can lead to a resource shortage. For example, there is 20GB of scratch on a disk, no jobs
are running, and a job requests 15GB. This job writes to 5GB during the first part of its execution, then another job requests 10GB. The second job is started by PBS, because there is
15GB available. Now there is a shortage of scratch space.
You can avoid this problem by configuring a static consumable resource to represent scratch space. Set it to the amount of available scratch space. See section 5.14.6.3, “Static Server-level Scratch Space”, on page 369 and section 5.14.6.4, “Static Host-level Scratch Space”, on page 369.
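A minimal sketch, assuming a 20GB scratch filesystem and an illustrative resource name: in PBS_HOME/server_priv/resourcedef, add
scratchspace type=size, flag=q
add scratchspace to the resources: line in PBS_HOME/sched_priv/sched_config, restart the server, HUP the scheduler, and set the available amount:
Qmgr: set server resources_available.scratchspace = 20gb
Jobs then request it with -l scratchspace=<amount>, and the scheduler never allocates more than the 20GB total.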
5.9.6 Period When Resource is Used by Job
5.9.6.1 Exiting Job Keeps Resource
A job that is exiting is still consuming resources assigned to it. Those resources are available
for other jobs only when the job is finished.
5.9.6.2 Job Suspension and Resource Usage
When a job is suspended, PBS releases all of the job’s resources, including the licenses used
by PBS for the job. This does not include the licenses used by the application, if any.
Jobs are suspended, and release their licenses, for preemption, and via qsig -s suspend.
A job is resumed only when sufficient resources are available. When a person resumes a job,
the job is not run until resources are available.
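For example, to suspend a job by hand and resume it later (the job ID is illustrative):
qsig -s suspend 1234
qsig -s resume 1234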
5.9.6.2.i Suspension/resumption Resource Caveats
Dynamic resources can cause problems with suspension and resumption of jobs.
When a job is suspended, its resources are freed, but the scratch space written to by the job is
not available.
A job that uses scratch space may not suspend and resume correctly. This is because if the job
writes to scratch, and is then suspended, when PBS queries for available scratch to resume the
job, the script may return a value too small for the job’s request. PBS cannot determine
whether the job itself is the user of the scratch space; PBS can only determine how much is
still unused. If a single suspended job has left less scratch space available than it requests,
that job cannot be resumed.
The above is true for any dynamic resource, such as application licenses.
5.9.6.3 Shrink-to-fit Jobs Get walltime When Executed
PBS computes the walltime value for each shrink-to-fit job when the scheduler runs the job,
not before. See section 4.8.41.3.iii, “Setting walltime for Shrink-to-fit Jobs”, on page 281.
5.10 Using Resources to Track and Control Allocation
You can use resources to track and control usage of things like hardware and licenses. For
example, you might want to limit the number of jobs using floating licenses or a particular
vnode. There is more than one way to accomplish this.
Example 5-8: You can set a complex-wide limit on the number of jobs using a given type of floating license. This example uses a single queue for the entire complex.
This method requires job submitters to request one unit of the floatlicensecount resource in order to be able to use the license. To set a complex-wide limit, take the following steps:
1. Create a custom static integer license resource that will be tracked at the server and queue:
   a. In PBS_HOME/server_priv/resourcedef, add the line:
      floatlicensecount type=long, flag=q
   b. Add the resource to the resources: line in PBS_HOME/sched_priv/sched_config:
      resources: "[...], floatlicensecount"
2. Restart the server. See section 5.14.3.1, "Restarting the Server", on page 356.
3. HUP the scheduler:
   kill -HUP <scheduler PID>
4. Set the available resource at the server using qmgr. If you have enough floating licenses for 4 jobs:
   Qmgr: set server resources_available.floatlicensecount = 4
5. Inform job submitters that their jobs must request one job-wide floatlicensecount resource via the following:
   qsub -l floatlicensecount=1
The scheduler will schedule up to 4 jobs at a time using the licenses. You do not need to set the resource at any queue.
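A submitter can confirm that a queued job carries the license request by checking the job's Resource_List via qstat; for example:
qstat -f <job ID> | grep floatlicensecount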
Example 5-9: Here, your job submitters don't need to request a counting resource. Jobs are routed based on the size of their memory request, and the counting resource is inherited from a default. In this example, we limit the number of jobs from each group that can use a particular vnode that has a lot of memory. This vnode is called MemNode. Jobs that request 8GB or more of memory are routed into queue BigMem, and inherit a default counting resource called memcount. All other jobs are routed into queue SmallMem. The routing queue is called RouteQueue.
1. Create a custom static integer memcount resource that will be tracked at the server and queue:
   a. In PBS_HOME/server_priv/resourcedef, add the line:
      memcount type=long, flag=q
   b. Add the resource to the resources: line in PBS_HOME/sched_priv/sched_config:
      resources: "[...], memcount"
2. Restart the server. See section 5.14.3.1, "Restarting the Server", on page 356.
3. HUP the scheduler:
   kill -HUP <scheduler PID>
4. Set limits at BigMem and SmallMem so that they accept the correct jobs:
   Qmgr: set queue BigMem resources_min.mem = 8gb
   Qmgr: set queue SmallMem resources_max.mem = 8gb
5. Set the order of the destinations in the routing queue so that BigMem is tested first, so that jobs requesting exactly 8GB go into BigMem:
   Qmgr: set queue RouteQueue route_destinations = "BigMem, SmallMem"
6. Set the available resource at BigMem using qmgr. If you want a maximum of 6 jobs from BigMem to use MemNode:
   Qmgr: set queue BigMem resources_available.memcount = 6
7. Set the default value for the counting resource at BigMem, so that jobs inherit the value:
   Qmgr: set queue BigMem resources_default.memcount = 1
8. Associate the vnode with large memory with the BigMem queue:
   Qmgr: set node MemNode queue = BigMem
The scheduler will only schedule up to 6 jobs from BigMem at a time on the vnode with large memory.
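With this configuration, a job submitted as follows (the script name is hypothetical) is routed to BigMem, inherits memcount=1, and counts against the limit of 6:
qsub -q RouteQueue -l mem=10gb job.sh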
5.11 Using Resources for Topology and Job Placement
Using the topology information in the server's node_group_key attribute, PBS examines the values of the named resources at each vnode, and uses those values to create placement sets. Jobs are assigned to placement sets according to their resource requests. Users can request a particular placement set by requesting the resources that define it. For example, if the switch named A25 connects the desired set of vnodes, a user can request the following:
-l switch=A25
See section 4.8.32, "Placement Sets", on page 224.
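As a minimal sketch, a switch-based placement configuration might look like the following (the resource name switch, the vnode names, and the values are illustrative; switch must be a vnode-level resource set on each vnode):
Qmgr: set server node_group_enable = True
Qmgr: set server node_group_key = switch
Qmgr: set node vnode1 resources_available.switch = A25
Qmgr: set node vnode2 resources_available.switch = A25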
5.11.1 Restrictions on Using Resources for Job Placement
Only vnode-level resources can be used to direct jobs to particular vnodes.
5.12 Using Resources to Prioritize Jobs
You can define the formula the scheduler uses to compute job execution priorities. Elements
in this formula can be inherited default custom resources. These resources must be job-wide
numeric resources, or consumable host-level resources. See section 5.9.3, “Specifying Job
Default Resources”, on page 323 and section 4.8.20, “Using a Formula for Computing Job
Execution Priority”, on page 194.
You can make jobs inherit numeric resources according to non-numeric qualities, such as the
job owner’s group or whether the job requests a Boolean or string resource. You can do this
by either of the following methods:
• Use a hook to identify the jobs you want and alter their resource requests to include the custom resources for the formula. See Chapter 6, "Hooks", on page 437
• Use a routing queue and minimum and maximum resource limits to route jobs to queues where they inherit the default custom resources for the formula. See section 2.2.6.4, "Using Resources to Route Jobs Between Queues", on page 25
For details on how job execution priority is calculated, see section 4.8.16, “Calculating Job
Execution Priority”, on page 174.
For a complete description of how PBS prioritizes jobs, see section 4.2.5, “Job Prioritization
and Preemption”, on page 67.
5.13 Using Resources to Restrict Server, Queue Access
You can set resource limits at the server and queues so that jobs must conform to the limits in
order to be admitted. This way, you can reject jobs that request more of a resource than the
complex or a queue can supply. You can also force jobs into specific queues where they will
inherit the desired values for unrequested or custom resources. You can then use these
resources to manage jobs, for example by using them in the job sorting formula or to route
jobs to particular vnodes.
You set a maximum for each resource at the server using the resources_max.<resource>
server attribute; there is no resources_min.<resource> at the server.
You can set a minimum and a maximum for each resource at each queue using the
resources_min.<resource> and resources_max.<resource> queue attributes.
Job resource requests are compared to resource limits the same way, whether at the server or a
queue. For a complete description of how jobs are tested against limits, see section 2.2.6.4.i,
“How Queue and Server Limits Are Applied, Except Running Time”, on page 25.
Job resource requests are compared first to queue admittance limits. If there is no queue
admittance limit for a particular resource, the job’s resource request is compared to the
server’s admittance limit.
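For example, to reject jobs that request more CPUs than the complex or a small-jobs queue can supply (the queue name and values are hypothetical):
Qmgr: set server resources_max.ncpus = 128
Qmgr: set queue smalljobs resources_max.ncpus = 8
Qmgr: set queue smalljobs resources_min.ncpus = 1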
5.13.1 Admittance Limits for walltime, min_walltime, and max_walltime
Because min_walltime and max_walltime are themselves limits, they behave differently from
other time-based resources. When a shrink-to-fit job (a job with a value for min_walltime) is
compared to server or queue limits, the following must be true in order for the job to be
accepted:
• Both min_walltime and max_walltime must be greater than or equal to resources_min.walltime.
• Both min_walltime and max_walltime must be less than or equal to resources_max.walltime.
You cannot set resources_min or resources_max for min_walltime or max_walltime.
5.13.2 Restrictions on Resources Used for Admittance
For a list of resources that are compared to admittance limits, see section 2.2.6.4.iii,
“Resources Used for Routing and Admittance”, on page 27. For information on using strings,
string arrays, and Booleans for admittance controls, see section 2.2.6.4.iv, “Using String,
String Array, and Boolean Values for Routing and Admittance”, on page 28.
5.14 Custom Resources
You can define, that is, create, new resources within PBS. The primary use of this feature is to add site-specific resources, such as resources for managing software application licenses. This section describes how to define and use custom resources.
Once new resources are defined, jobs may request these new resources and the scheduler can schedule on them.
Using this feature, it is possible to schedule resources where the number or amount available is outside of PBS's control.
Custom resources can be made invisible to users or unalterable by users via resource permission flags. See section 5.14.2.10, "Resource Permission Flags", on page 351. A user will not be able to print or list custom resources which have been made either invisible or unalterable.
PBS provides certain custom resources that are designed to reflect resources or properties found on specific systems. Do not create custom resources with the names that PBS uses for these resources. See "Custom Cray Resources" on page 323 of the PBS Professional Reference Guide.
5.14.1 How to Use Custom Resources
Custom resources can be static or dynamic, server-level or host-level, and local or global.
They can also be shared or not.
5.14.1.1 Choosing the Resource Category
Use dynamic resources for quantities that PBS does not control, such as externally-managed
licenses or scratch space. PBS runs a script or program that queries an external source for the
amount of the resource available and returns the value via stdout. Use static resources for
things PBS does control, such as licenses managed by PBS. PBS tracks these resources internally.
Use server-level resources for things that are not tied to specific hosts, that is, things that can be available to any of a set of hosts. An example of this is a floating license. Use host-level resources for things that are tied to specific hosts, like the scratch space on a machine or node-locked licenses.
5.14.1.1.i Quick Guide to Configuring a Custom Resource
The following table gives a quick guide to configuring each kind of custom resource:

Table 5-4: Examples of Configuring Custom Resources

License: Floating, externally-managed
    See section 5.14.7.3.i, "Example of Floating, Externally-managed License", on page 371
License: Floating, externally-managed with features
    See section 5.14.7.3.ii, "Example of Floating, Externally-managed License with Features", on page 373
License: Floating, PBS-managed
    See section 5.14.7.3.iii, "Example of Floating License Managed by PBS", on page 374
License: Node-locked, per-host
    See section 5.14.7.4.iv, "Example of Per-host Node-locked Licensing", on page 377
License: Node-locked, per-CPU
    See section 5.14.7.4.vi, "Example of Per-CPU Node-locked Licensing", on page 381
License: Node-locked, per-use
    See section 5.14.7.4.v, "Example of Per-use Node-locked Licensing", on page 379
GPU: any GPU
    See section 5.14.8.3, "Configuring PBS for Basic GPU Scheduling", on page 384
GPU: specific GPU
    See "Configuring PBS for Advanced GPU Scheduling" on page 385
Scratch space: shared
    See section 5.14.6.1, "Dynamic Server-level (Shared) Scratch Space", on page 368 and section 5.14.6.3, "Static Server-level Scratch Space", on page 369
Scratch space: local to a host
    See section 5.14.6.2, "Dynamic Host-level Scratch Space", on page 368 and section 5.14.6.4, "Static Host-level Scratch Space", on page 369
Generic dynamic server-level
    See section 5.14.4.1.i, "Example of Configuring Dynamic Server-level Resource", on page 359
Generic static server-level
    See section 5.14.4.2.i, "Example of Configuring Static Server-level Resource", on page 360
Generic dynamic host-level
    See section 5.14.5.1.i, "Example of Configuring Dynamic Host-level Resource", on page 362
Generic static host-level
    See section 5.14.5.2.i, "Example of Configuring Static Host-level Resource", on page 363
Generic shared static host-level
    See section 5.14.5.3.v, "Configuring Shared Static Resources", on page 365
5.14.1.2 Dynamic Custom Resources
A dynamic resource is one which is not under the control of PBS, meaning it can change independently of PBS. In order to use a dynamic resource, PBS must run a query to discover the
available amount of that resource. Dynamic custom resources can be defined at the server or
vnodes.
5.14.1.2.i Dynamic Server-level Custom Resources
A dynamic server-level custom resource is used to track a resource that is available at the
server. You use a dynamic server-level resource to track something that is not under the control of PBS, and changes outside of PBS, for example, floating licenses. At each scheduler
cycle, the scheduler runs a script at the server host to determine the available amount of that
resource. Server-level custom resources are used as job-wide resources.
5.14.1.2.ii Dynamic Host-level Custom Resources
A dynamic host-level custom resource is used to track a resource that is available at the execution host or hosts. You use a dynamic host-level resource for a resource that is not under the
control of PBS, and changes outside of PBS, for example, scratch space. At each scheduler
cycle, the scheduler queries the MoM for the available amount of the resource. The MoM
runs a script which returns the current value of the resource. Host-level dynamic resources
are used inside chunks.
5.14.1.3 Static Custom Resources
A static resource is one which is under the control of PBS. Any changes to the value are performed by PBS or by the administrator. Static custom resources are defined ahead of time, at
the server, queues or vnodes. Static custom resources can be local or global.
5.14.1.3.i Global Static Custom Resources
Global static custom resources are defined in PBS_HOME/server_priv/resourcedef. Global static custom resource values at vnode, queue and server are set via qmgr, by
setting resources_available.<custom resource name> = <value>. These resources are
available at the server, queues, or vnodes.
5.14.1.3.ii Local Static Custom Resources
Local static custom resources are defined in PBS_HOME/mom_priv/config, and are available only at the host where they are defined. Note that these resources cannot be set via qmgr
or viewed via qstat. It is not recommended to use local static custom resources.
5.14.1.4 Shared Vnode Resources
A shared vnode resource is managed at one vnode, but available to be used by jobs at others.
This allows flexible allocation of the resource. See section 5.14.5.3, “Shared Host-level
Resources”, on page 364 for information on resources shared across vnodes.
5.14.1.5 Using Custom Resources for Application Licenses
The following table lists application licenses and what kind of custom resource to define for
them. See section 5.14.7, “Supplying Application Licenses”, on page 369 for specific instructions on configuring each type of license and examples of configuring custom resources for
application licenses.
Table 5-5: Custom Resources for Application Licenses

Floating or Node-locked   Unit Being Licensed       How License is Managed    Level    Resource Type
Floating (site-wide)      Token                     External license manager  Server   Dynamic
Floating (site-wide)      Token                     PBS                       Server   Static
Node-locked               Host                      PBS                       Host     Static
Node-locked               CPU                       PBS                       Host     Static
Node-locked               Instance of Application   PBS                       Host     Static
5.14.1.6 Using Custom Resources for Scratch Space
You can configure a custom resource to report how much scratch space is available on
machines. Jobs requiring scratch space can then be scheduled onto machines which have
enough. This requires dynamic host-level resources. See section 5.14.6, “Using Scratch
Space”, on page 368 and section 5.14.5.1, “Dynamic Host-level Resources”, on page 361.
5.14.2 Defining New Custom Resources
You can define new custom resources as follows:
• To define any custom resource, you can use qmgr.
• To define custom, non-consumable, host-level resources at vnodes, you can use hooks; see section 6.10.8, "Adding Custom Non-consumable Host-level Resources", on page 512
• To define any custom resource (including vnode resources), you can use a combination of file edits and qmgr. Deprecated (13.0)
5.14.2.1 Defining Custom Resources via qmgr
You can use qmgr to create and delete custom resources, and to set their type and flags.
You must have PBS Manager privilege to operate on resources via qmgr.
5.14.2.1.i Creating Custom Resources via qmgr
When you define or change a custom resource via qmgr, the changes take place immediately,
and you do not have to restart the server.
To create a resource:
qmgr -c "create resource <resource name>[,<resource name>] [type=<type>], [flag=<flags>]"
For example:
Qmgr: create resource foo type=long,flag=q
To create multiple resources of the same type and flag, separate each resource name with a comma:
qmgr -c "create resource r1,r2 type=long,flag=nh"
You can abbreviate "resource" to "r":
qmgr -c "create r foo type=long,flag=nh"
You cannot create a resource with the same name as an existing resource.
After you have defined your new custom resource, tell the scheduler how to use it. See section 5.14.2.4, “Allowing Jobs to Use a Resource”, on page 347.
5.14.2.1.ii Deleting Custom Resources via qmgr
You cannot delete a custom resource that is requested by a job or reservation. If you want to
make sure that you can delete a resource, it must not be requested by any jobs or reservations.
Either let those jobs finish, or qalter them. Delete and re-create any reservations. Before you
delete a custom resource, you must remove all references to that resource, including where it
is used in hooks or the scheduling formula. When you delete a resource that is set on the
server, a queue, or a vnode, PBS unsets the resource for you.
You cannot delete a built-in resource.
To delete a resource:
qmgr -c "delete resource <resource name>"
For example:
Qmgr: delete resource foo
To remove custom resources:
1. Remove all references to the resource:
   • Remove it from the formula
   • Remove it from hooks
   • Let jobs finish or requeue and then qalter them while they are queued
   • Delete and re-create any reservations
2. Edit the resources: line in PBS_HOME/sched_priv/sched_config to remove the unwanted resource name:
   • If the resource is a server dynamic resource, remove the resource name from the server_dyn_res: line
   • If the resource is a MoM dynamic resource, remove the resource from the mom_resources: line
3. For each MoM whose Version 2 configuration file contains references to the resource, use the pbs_mom -s insert command to update the Version 2 configuration file. See section 3.5.3, "Creating Version 2 MoM Configuration Files", on page 53.
4. If the resource is a local dynamic resource, defined in the MoM Version 1 configuration file: for each host where the unwanted resource is defined, edit PBS_HOME/mom_priv/config and remove the resource entry line.
5. HUP each MoM; see section 5.14.3.2, "Restarting or Reinitializing MoM", on page 356
6. Delete the resource using qmgr:
   qmgr -c "delete resource <resource name>"
5.14.2.1.iii Setting Types and Flags for Custom Resources via qmgr
To set the type for a resource:
set resource <resource name> type = <type>
For example:
qmgr -c "set resource foo type=string_array"
To set the flags for a resource:
set resource <resource name> flag = <flag(s)>
For example:
qmgr -c "set resource foo flag=nh"
To set the type and flags for a resource:
set resource <resource name> type=<type>, flag = <flag(s)>
For example:
qmgr -c "set resource foo type=long,flag=nh"
You can set multiple resources by separating the names with commas. For example:
qmgr -c "set resource r1,r2 type=long"
You cannot set the n, f, or q flag for a resource of type string, string_array, or Boolean.
You cannot set both the n and the f flags on one resource.
You cannot have the n or f flags without the h flag.
You cannot set both the i and r flags on one resource.
You cannot unset the type for a resource.
You cannot set the type for a resource that is requested by a job or reservation, or set on a
server, queue, or vnode.
You cannot set the flag(s) to n, h, f, nh, fh, or q for a resource that is requested by a job or reservation.
You cannot unset the flag(s) for a resource that is requested by a job or reservation, or set on
any server, queue, or vnode.
You cannot alter a built-in resource.
You can unset custom resource flags, but not their type.
5.14.2.2 Defining Custom Resources via Hooks
You can use hooks to add new custom host-level non-consumable resources, and set their values. See section 6.10.8, “Adding Custom Non-consumable Host-level Resources”, on page
512.
You must make the resource usable by the scheduler: see section 5.14.2.4, “Allowing Jobs to
Use a Resource”, on page 347.
To delete a custom resource created in a hook, use qmgr. See section 5.14.2.1.ii, “Deleting
Custom Resources via qmgr”, on page 342.
5.14.2.3 Defining Custom Resources via File Edits (Deprecated in 13.0)
5.14.2.3.i Creating Custom Resources via File Edits
When you use file edits to create a new custom resource to be used by jobs, you must do the following:
1. Define the resource in the server's resourcedef file. See section 5.14.2.8, "The resourcedef File", on page 348, section 5.14.2.9, "Resource Accumulation Flags", on page 349, and section 5.14.2.10, "Resource Permission Flags", on page 351.
2. Make the resource usable by the scheduler: see section 5.14.2.4, "Allowing Jobs to Use a Resource", on page 347.
3. Depending on the type of resource, the server, scheduler and MoMs must be restarted. See section 5.14.3, "Restart Steps for Custom Resources", on page 356.
5.14.2.3.ii Deleting Custom Resources via File Edits
Removing any custom resource definition should be done with care. It is important to delete
a custom resource completely and in the correct order. These steps are described below.
If you delete a resource definition from PBS_HOME/server_priv/resourcedef and
restart the server, all jobs requesting that resource will be purged from the server when it is
restarted. To avoid losing jobs requesting a deleted custom resource, use the qalter command on those jobs before restarting the server.
Before you delete a custom resource, you must remove all references to that resource, including where it is used in hooks, the scheduling formula, queue and server settings such as
resources_available, etc. Any attributes containing the custom resource must be unset for
that resource.
To remove custom resources:
1. Remove all references to the resource:
   • Remove it from the formula
   • Remove it from hooks
   • Let jobs finish or qalter them
   • Delete and re-create any reservations
2. Make sure that the pbs_server daemon is running
3. Set scheduling to False
4. For each custom resource to be removed, use qmgr to unset that resource at the server, queue, or node level:
   Qmgr: unset server <resource name>
   Qmgr: unset queue <resource name>
   Qmgr: unset node <resource name>
5. Quit qmgr
6. Edit the PBS_HOME/server_priv/resourcedef file to remove the unwanted resources
7. Edit the resources: line in PBS_HOME/sched_priv/sched_config to remove the unwanted resource name:
   • If the resource is a server dynamic resource, remove the resource name from the server_dyn_res: line
   • If the resource is a MoM dynamic resource, remove the resource from the mom_resources: line
8. For each MoM whose Version 2 configuration file contains references to the resource, use the pbs_mom -s insert command to update the Version 2 configuration file. See section 3.5.3, "Creating Version 2 MoM Configuration Files", on page 53.
9. If the resource is a local dynamic resource, defined in the MoM Version 1 configuration file: for each host where the unwanted resource is defined, edit PBS_HOME/mom_priv/config and remove the resource entry line.
10. Restart the pbs_server daemon; see section 5.14.3.1, "Restarting the Server", on page 356
11. HUP each MoM; see section 5.14.3.2, "Restarting or Reinitializing MoM", on page 356
12. Set scheduling to True
5.14.2.4 Allowing Jobs to Use a Resource
After you define your resource, you need to make it usable by jobs:
1. Put the resource in the "resources:" line in PBS_HOME/sched_priv/sched_config. If the resource is a host-level Boolean, you do not need to add it here.
2. If the resource is static, set the value via qmgr.
3. If the resource is dynamic, add it to the correct line in the scheduler's configuration file:
   • If it's a host-level dynamic resource, it must be added to the mom_resources line
   • If it's a server-level resource, it must be added to the server_dyn_res line
5.14.2.5 Editing Configuration Files Under Windows
When you edit any PBS configuration file, make sure that you put a newline at the end of the
file. The Notepad application does not automatically add a newline at the end of a file; you
must explicitly add the newline.
5.14.2.6 Dynamic Resource Scripts/Programs
You create the script or program that PBS uses to query the external source. The external
source can be a license manager or a command, as when you use the df command to find the
amount of available disk space. If the script is for a server-level dynamic resource, it is placed
on the server. If it is for a host-level resource, it is placed on the host(s) where it will be used.
5.14.2.6.i Requirements for Scripts/Programs
• The script must be available to the scheduler, which runs the script
• If you have set up peer scheduling, make sure that the script is available to any scheduler that must run it
• The script must return its output via stdout, and the output must be in a single line ending with a newline
• In Windows, if you use Notepad to create the script, be sure to explicitly put a newline at the end of the last line; otherwise none will appear, causing PBS to be unable to properly parse the file
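As an illustration of these requirements, here is a minimal script that satisfies the stdout contract (the fixed value is a placeholder; a real script would query a license manager, df, or similar):

#!/bin/sh
# Placeholder dynamic resource query script. PBS reads exactly one
# line from stdout, ending with a newline. Replace the fixed value
# below with a real query of the external source.
echo 4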
5.14.2.7 Defining and Setting Static and Dynamic Custom Resources
The following table lists the differences in defining and setting static and dynamic custom resources at the server, queue, and host level.

Table 5-6: Defining and Setting New Custom Resources

Resource Type   Server-level                        Queue-level      Host-level
static          Set via qmgr                        Set via qmgr     Set via qmgr
dynamic         Add to server_dyn_res line in       Cannot be used   Add to MoM config file PBS_HOME/mom_priv/config
                PBS_HOME/sched_priv/sched_config                     and mom_resources line in PBS_HOME/sched_priv/sched_config

5.14.2.8 The resourcedef File
Global custom resources are defined in PBS_HOME/server_priv/resourcedef. The
format of each line in PBS_HOME/server_priv/resourcedef is:
<resource name> [type=<resource type>] [flag=<resource flag>]
<resource name> is any string that starts with an alphabetic character and contains only alphanumeric, underscore ("_"), and dash ("-") characters: [a-zA-Z][a-zA-Z0-9_-]*.
Each line in the PBS_HOME/server_priv/resourcedef file should be no more than 254 characters long. There is no limit to the number of custom resources that can be defined.
<resource type> is the type of the resource value, which can be one of the following keywords:
Boolean
long
string
string_array
size
float
You cannot create a custom resource of type “time” or “duration”. For these resources, use
“long”.
The default for <resource type> is “long”.
The format of custom Boolean, size, string or string_array resources must be the same as
built-in resources.
<resource flag> is zero or more resource accumulation or resource permission flags. See the
following sections.
See “Resource Data Types” on page 313 of the PBS Professional Reference Guide for a
description of each resource type.
You must restart the server after defining resources in the resourcedef file. See section
5.14.3.1, “Restarting the Server”, on page 356.
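For example, a resourcedef file might contain entries like the following (the names are illustrative):
scratchspace type=size, flag=h
floatlic type=long
projectname type=string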
5.14.2.9 Resource Accumulation Flags
When you define a custom resource, you can specify whether it is server-level or host-level,
and whether it is consumable or not. This is done by setting resource accumulation flags in the
resource definition in PBS_HOME/server_priv/resourcedef. A consumable resource
is tracked, or accumulated, in the server, queue or vnode resources_assigned attribute. The
resource accumulation flags determine where the value of resources_assigned.<resource>
is incremented.
5.14.2.9.i Allowable Values for Resource Accumulation Flags
The value of <resource flags> is a concatenation of one or more of the following letters:
(none of h, n, f, or q)
Indicates a queue-level or server-level resource that is not consumable.
h
Indicates a host-level resource. Used alone, means that the resource is not consumable. Required for any resource that will be used inside a select statement. This flag
selects hardware. This flag indicates that the resource must be requested inside of a
select statement.
Example: for a Boolean resource named "green":
green type=boolean, flag=h
n
The amount is consumable at the host level, for all vnodes assigned to the job. Must
be consumable or time-based. Cannot be used with Boolean or string resources. The
“h” flag must also be used.
This flag specifies that the resource is accumulated at the vnode level, meaning that
the value of resources_assigned.<resource> is incremented at relevant vnodes
when a job is allocated this resource or when a reservation requesting this resource
on this vnode starts.
This flag is not used with dynamic consumable resources. The scheduler will not
oversubscribe dynamic consumable resources.
f
The amount is consumable at the host level for only the first vnode allocated to the job (the vnode with the first task). Must be consumable or time-based. Cannot be used with Boolean or string resources. The "h" flag must also be used.
This flag specifies that the resource is accumulated at the first vnode, meaning that
the value of resources_assigned.<resource> is incremented only at the first vnode
when a job is allocated this resource or when a reservation requesting this resource
on this vnode starts.
q
The amount is consumable at the queue and server level. When a job is assigned one unit of a resource with this flag, the resources_assigned.<resource> attribute at the server and any queue is incremented by one. Must be consumable or time-based.
This flag specifies that the resource is accumulated at the queue and server level,
meaning that the value of resources_assigned.<resource> is incremented at each
queue and at the server when a job is allocated this resource. When a reservation
starts, allocated resources are added to the server’s resources_assigned attribute.
This flag is not used with dynamic consumable resources. The scheduler will not
oversubscribe dynamic consumable resources.
5.14.2.9.ii When to Use Accumulation Flags
The following table shows when to use accumulation flags.

Table 5-7: When to Use Accumulation Flags

Resource Category        Server                           Queue                           Host
Static, consumable       flag = q                         flag = q                        flag = nh or fh
Static, not consumable   flag = (none of h, n, q or f)    flag = (none of h, n, q or f)   flag = h
Dynamic                  server_dyn_res line in           (cannot be used)                MoM config and mom_resources line
                         sched_config,                                                    in sched_config,
                         flag = (none of h, n, q or f)                                    flag = h

5.14.2.9.iii Example of Resource Accumulation Flags
When defining a static consumable host-level resource, such as a node-locked license, you
would use the “n” and “h” flags.
When defining a dynamic resource such as a floating license, you would use no flags.
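For instance, the corresponding resourcedef entries might look like this (the names are hypothetical):
applocked type=long, flag=nh
appfloat type=long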
5.14.2.9.iv Resource Accumulation Flag Restrictions and Caveats
• Numeric dynamic resources cannot have the q or n flags set; this would cause these resources to be underused. These resources are tracked automatically by the scheduler.
5.14.2.10 Resource Permission Flags
When you define a custom resource, you can specify whether unprivileged users have permission to view or request the resource, and whether users can qalter a request for that resource. This is done by setting two resource permission flags in the resource definition in PBS_HOME/server_priv/resourcedef.
5.14.2.10.i Allowable Values for Resource Permission Flags
i
"Invisible". Users cannot view or request the resource. Users cannot qalter a resource request for this resource.
r
“Read only”. Users can view the resource, but cannot request it or qalter a
resource request for this resource.
(neither i nor r)
Users can view and request the resource, and qalter a resource request for this
resource.
5.14.2.10.ii Effect of Resource Permission Flags
• PBS Operators and Managers can view and request a resource, and qalter a resource request for that resource, regardless of the i and r flags.
• Users, operators and managers cannot submit a job which requests a restricted resource. Any job requesting a restricted resource will be rejected. If a manager needs to run a job which has a restricted resource with a different value from the default value, the manager must submit the job without requesting the resource, then qalter the resource value.
• While users cannot request these resources, their jobs can inherit default resources from resources_default.<resource> and default_chunk.<resource>.
If a user tries to request a resource or modify a resource request which has a resource permission flag, they will get an error message from the command and the request will be rejected. For example, if they try to qalter a job's resource request, they will see an error message similar to the following:
qalter: Cannot set attribute, read only or insufficient permission Resource_List.hps 173.mars
Example resourcedef file:
W_prio type=long flag=i
B_prio type=long flag=r
P_prio type=long flag=i
5.14.2.10.iii Resource Permission Flag Restrictions and Caveats
• You can specify only one of the i or r flags per resource. If both are specified, the resource is treated as if only the i flag were specified, and an error message is logged at the default log event class and printed to standard error.
• Resources assigned from the default_qsub_arguments server attribute are treated as if the user requested them. A job will be rejected if it requests a resource that has a resource permission flag, whether that resource was requested by the user or came from default_qsub_arguments.
• The behavior of several command-line interfaces depends on resource permission flags. These interfaces are those which view or request resources or modify resource requests:
pbsnodes
Users cannot view restricted host-level custom resources.
pbs_rstat
Users cannot view restricted reservation resources.
pbs_rsub
Users cannot request restricted custom resources for reservations.
qalter
Users cannot alter a restricted resource.
qmgr
Users cannot print or list a restricted resource.
qselect
Users cannot specify restricted resources via -l Resource_List.
qsub
Users cannot request a restricted resource.
qstat
Users cannot view a restricted resource.
5.14.2.11 Example of Defining Each Type of Custom Resource
In this example, we add five custom resources: a static and a dynamic host-level resource, a
static and a dynamic server-level resource, and a static queue-level resource.
1. The resource must be defined to the server, with appropriate flags set. Add the resources to PBS_HOME/server_priv/resourcedef:
   staticserverresource   type=long, flag=q
   statichostresource     type=long, flag=nh
   dynamicserverresource  type=long
   dynamichostresource    type=long, flag=h
   staticqueueresource    type=long, flag=q
2. Restart the server. See section 5.14.3.1, "Restarting the Server", on page 356.
3. The resource must be added to the scheduler's list of resources. Add the resources to the "resources:" line in PBS_HOME/sched_priv/sched_config:
   resources: "[...], staticserverresource, statichostresource, dynamicserverresource, dynamichostresource, staticqueueresource"
   Host-level Boolean resources do not need to be added to the "resources:" line.
4. HUP the scheduler:
   kill -HUP <scheduler PID>
5. If the resource is static, use qmgr to set it at the host, queue or server level:
   Qmgr: set node Host1 resources_available.statichostresource=1
   Qmgr: set queue Queue1 resources_available.staticqueueresource=1
   Qmgr: set server resources_available.staticserverresource=1
   See "qmgr" on page 158 of the PBS Professional Reference Guide.
6. If the resource is dynamic:
   a. If it's a host-level resource, add it to the "mom_resources" line in PBS_HOME/sched_priv/sched_config:
      mom_resources: "dynamichostresource"
   b. Add it to the MoM config file PBS_HOME/mom_priv/config:
      UNIX or Windows:
      dynamichostresource !path-to-command
      Windows, spaces in path:
      dynamichostresource !"path-to-command"
   c. If it's a server-level resource, add it to the "server_dyn_res" line in PBS_HOME/sched_priv/sched_config:
      UNIX:
      server_dyn_res: "dynamicserverresource !path-to-command"
      Windows, no spaces in path:
      server_dyn_res: 'dynamicserverresource !path-to-command'
      or:
      server_dyn_res: "dynamicserverresource !path-to-command"
      Windows, spaces in path:
      server_dyn_res: 'dynamicserverresource !"path-to-command including spaces"'
5.14.2.12 Custom Resource Values
Allowable values for float and long resources are the same as for built-in resources.
If a string resource value contains spaces or shell metacharacters, enclose the string in quotes,
or otherwise escape the space and metacharacters. Be sure to use the correct quotes for your
shell and the behavior you want. If the string resource value contains commas, the string
must be enclosed in an additional set of quotes so that the command (e.g. qsub, qalter)
will parse it correctly. If the string resource value contains quotes, plus signs, equal signs,
colons or parentheses, the string resource value must be enclosed in yet another set of additional quotes.
5.14.3 Restart Steps for Custom Resources
If you create custom resources by defining them in the resourcedef file, you must restart or
reinitialize any PBS daemon whose files were changed in order to have the new resources recognized by PBS.
5.14.3.1 Restarting the Server
In order for the server to recognize a new custom resource that was created by defining it in
the resourcedef file, the server must be restarted.
5.14.3.1.i Restarting the Server on UNIX/Linux
qterm -t quick
PBS_EXEC/sbin/pbs_server
5.14.3.1.ii Restarting the Server on Windows
Admin> qterm -t quick
Admin> net start pbs_server
5.14.3.1.iii Restarting the Server with Failover Configured
Using qterm -t quick leaves the secondary server running; it will become active. If you
have configured failover, see section 9.2.7.1, “Stopping Servers”, on page 854 and section
9.2.7.2, “Starting Servers”, on page 854.
5.14.3.2 Restarting or Reinitializing MoM
In order for the MoM to recognize a new custom resource that was created by defining it in
the resourcedef file, the MoM must be restarted or reinitialized. On UNIX/Linux, whether
the MoM must be restarted or reinitialized depends on which MoM configuration file has
been changed.
• If only the Version 1 MoM configuration file was changed, you only need to HUP the MoM.
• If you used the pbs_mom -s insert command to add to or change anything in the Version 2 MoM config file, you can HUP the MoM.
• If you used the pbs_mom -s insert command to remove anything from the Version 2 MoM config file, you must restart the MoM.
On Windows, you must restart MoM when any MoM configuration file has been changed.
5.14.3.2.i Reinitializing MoM on UNIX/Linux
1. Use the ps command to determine MoM's process ID. Note that ps arguments vary among UNIX systems, so "-ef" may need to be replaced by "-aux".
   ps -ef | grep pbs_mom
2. HUP MoM using the kill command, with MoM's PID as an argument:
   kill -HUP <MoM PID>
See “pbs_mom” on page 61 of the PBS Professional Reference Guide.
5.14.3.2.ii Restarting MoM on UNIX/Linux
1. Use the ps command to determine MoM's process ID. Note that ps arguments vary among UNIX systems, so "-ef" may need to be replaced by "-aux".
   ps -ef | grep pbs_mom
2. Terminate MoM using the kill command, with MoM's PID as an argument. The syntax varies depending on your system:
   kill -INT <MoM PID>
   or
   kill -s INT <MoM PID>
3. Restart MoM, allowing running jobs to continue running through the restart. If your custom resource query script/program takes longer than the default ten seconds, you can change the alarm timeout via the -a alarm command line start option:
   PBS_EXEC/sbin/pbs_mom -p [-a timeout]
See “pbs_mom” on page 61 of the PBS Professional Reference Guide.
5.14.3.2.iii Restarting MoM on Windows
If your custom resource query script/program takes longer than the default ten seconds, you
can change the alarm timeout via the -a alarm command line start option.
Admin> net stop pbs_mom
Admin> net start pbs_mom
See "Startup Options to PBS Services" on page 223 in the PBS Professional Installation & Upgrade Guide.
5.14.3.3 Restarting the Scheduler
You must restart the scheduler if you added the new custom resource to the resources: line
in PBS_HOME/sched_priv/sched_config.
5.14.3.3.i Reinitializing the Scheduler on UNIX/Linux
ps -ef | grep pbs_sched
kill -HUP <Scheduler PID>
5.14.3.3.ii Restarting the Scheduler on Windows
Admin> net stop pbs_sched
Admin> net start pbs_sched
5.14.4 Configuring Server-level Resources
5.14.4.1 Dynamic Server-level Resources
The availability of a dynamic server-level resource is determined by running a script or program specified in the server_dyn_res line of PBS_HOME/sched_priv/
sched_config. The value for resources_available.<resource> is updated at each scheduling cycle with the value returned by the script. This script is run at the host where the
scheduler runs, once per scheduling cycle. The script must return the value via stdout in a
single line ending with a newline.
The scheduler tracks how much of each numeric dynamic server-level custom resource has
been assigned to jobs, and will not overcommit these resources.
The format of a dynamic server-level resource query is a shell escape:
server_dyn_res: "<resource name> !<path to command>"
In this query,
<resource name> is identical to the name in the resourcedef file.
<path to command> is typically the full path to the script or program that performs the query
in order to determine the status and/or availability of the new resource you have added.
The scheduler runs the query and waits for it to finish.
Dynamic server-level resources are usually used for site-wide externally-managed floating
licenses.
Server dynamic resource values are never visible in qstat, and have no
resources_available.<resource> representation anywhere in PBS. If a job has requested a
server dynamic resource, then the requested value shows up in the output of qstat.
5.14.4.1.i Example of Configuring Dynamic Server-level Resource
For a site-wide externally-managed floating license you need two resources: one to represent the licenses themselves, and one to mark the vnodes on which the application can run. The first is a server-level dynamic resource and the second is a host-level Boolean, set on the vnodes to send jobs requiring that license to those vnodes.
These are the steps for configuring a dynamic server-level resource for a site-wide externally-managed floating license. If the license could be used on all vnodes, the Boolean resource would not be necessary.
1. Define the resources, for example floatlicense and CanRun, in the server resource definition file PBS_HOME/server_priv/resourcedef:
   floatlicense type=long
   CanRun type=boolean, flag=h
2. Write a script, for example serverdyn.pl, that returns the available amount of the resource via stdout, and place it on the server's host. For example, it could be placed in /usr/local/bin/serverdyn.pl
3. Restart the server. See section 5.14.3, "Restart Steps for Custom Resources", on page 356.
4. Configure the scheduler to use the script by adding the resource and the path to the script in the server_dyn_res line of PBS_HOME/sched_priv/sched_config:
   UNIX:
   server_dyn_res: "floatlicense !/usr/local/bin/serverdyn.pl"
   Windows:
   server_dyn_res: 'floatlicense !"C:\Program Files\PBS Pro\serverdyn.pl"'
5. Add the new dynamic resource to the resources: line in PBS_HOME/sched_priv/sched_config:
   resources: "ncpus, mem, arch, [...], floatlicense"
6. Restart the scheduler. See section 5.14.3, "Restart Steps for Custom Resources", on page 356.
7. Set the Boolean resource on the vnodes where the floating licenses can be run. Here we designate vnode1 and vnode2 as the vnodes that can run the application:
   Qmgr: active node vnode1,vnode2
   Qmgr: set node resources_available.CanRun=True
To request this resource, the job’s resource request would include:
-l floatlicense=<number of licenses or tokens required>
-l select=1:ncpus=N:CanRun=1
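The guide does not show the contents of serverdyn.pl; the only contract is that it prints the number of available licenses as a single line on stdout. A minimal shell sketch of that contract (the license manager query command is a hypothetical stand-in you would replace):

#!/bin/sh
# Hypothetical query of an external license manager. Replace the
# command below with your license manager's real query and parsing.
# PBS reads a single line from stdout, ending with a newline.
free=`/usr/local/bin/query_licenses --free`
echo "$free"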
5.14.4.2 Static Server-level Resources
Static server-level resources are used for resources like floating licenses that PBS will manage. PBS keeps track of the number of available licenses instead of querying an external
license manager.
5.14.4.2.i Example of Configuring Static Server-level Resource
These are the steps for configuring a static server-level resource:
1. Define the resource, for example sitelicense, in the server resource definition file PBS_HOME/server_priv/resourcedef:
   sitelicense type=long, flag=q
2. Restart the server. See section 5.14.3, "Restart Steps for Custom Resources", on page 356.
3. Use the qmgr command to set the value of the resource on the server:
   Qmgr: set server resources_available.sitelicense=<number of licenses>
4. Add the new resource to the resources: line in PBS_HOME/sched_priv/sched_config:
   resources: "ncpus, mem, arch, [...], sitelicense"
5. Restart the scheduler. See section 5.14.3, "Restart Steps for Custom Resources", on page 356.
5.14.5 Configuring Host-level Custom Resources
Host-level custom resources can be static and consumable, static and not consumable, or
dynamic. Dynamic host-level resources are used for things like scratch space.
5.14.5.1 Dynamic Host-level Resources
For dynamic host-level custom resources, the scheduler sends a resource query to each MoM
to get the current availability for the resource, and uses that value for scheduling. If the MoM
returns a value, this value replaces the resources_available value reported by the server. If
the MoM returns no value, the value from the server is kept. If neither specifies a value, the
Scheduler sets the resource value to 0.
The available amount of the resource is determined by running a script or program which
returns the amount via stdout. This script or program is specified in the
mom_resources line in PBS_HOME/sched_priv/sched_config.
The script is run once per scheduling cycle. For a multi-vnode machine, the script is run for
the natural vnode. The resource is shared among the MoM’s vnodes.
The scheduler tracks how much of each numeric dynamic host-level custom resource has been assigned to jobs, and will not overcommit these resources.
The format of a dynamic host-level resource query is a shell escape:
<resource name> !<path to command>
In this query,
<resource name> is identical to the name in the resourcedef file.
<path to command> is typically the full path to the script or program that performs the query
in order to determine the status and/or availability of the new resource you have added.
MoM starts the query and waits for output. The default amount of time that MoM waits is 10
seconds; this period can be set via the -a alarm_timeout command line option to
pbs_mom. See section 5.14.3.2, “Restarting or Reinitializing MoM”, on page 356 and “Startup Options to PBS Services” on page 223 in the PBS Professional Installation & Upgrade
Guide. If the timeout is exceeded and the shell escape process has not finished, a log message, “resource read alarm” is written to the MoM’s log file. The process is given another
alarm period to finish and if it does not, another log message is written. The user’s job may
not run.
An example of a dynamic host-level resource is scratch space on the execution host.
Host dynamic resource values are never visible in qstat, and have no
resources_available.<resource> representation anywhere in PBS.
5.14.5.1.i Example of Configuring Dynamic Host-level Resource
In this example, we configure a custom resource to track host-level scratch space. The resource is called dynscratch. These are the steps for configuring a dynamic host-level resource:
1. Write a script, for example hostdyn.pl, that returns the available amount of the resource via stdout. The script must return the value in a single line, ending with a newline. Place the script on each host where it will be used. For example, it could be placed in /usr/local/bin/hostdyn.pl.
2. Configure each MoM to use the script by adding the resource and the path to the script in PBS_HOME/mom_priv/config:
   UNIX:
   dynscratch !/usr/local/bin/hostdyn.pl
   Windows:
   dynscratch !"C:\Program Files\PBS Pro\hostdyn.pl"
3. Reinitialize the MoMs. See section 5.14.3.2, "Restarting or Reinitializing MoM", on page 356.
4. Define the resource, for example dynscratch, in the server resource definition file PBS_HOME/server_priv/resourcedef:
   dynscratch type=size, flag=h
5. Restart the server. See section 5.14.3.1, "Restarting the Server", on page 356.
6. You may optionally specify limits on that resource via qmgr, such as the maximum amount available, or the maximum that a single user can request. For example:
   Qmgr: set server resources_max.dynscratch=1gb
7. Add the new resource to the resources: line in PBS_HOME/sched_priv/sched_config:
   resources: "ncpus, mem, arch, [...], dynscratch"
8. Add the new resource to the mom_resources: line in PBS_HOME/sched_priv/sched_config. Create the line if necessary:
   mom_resources: "dynscratch"
9. Restart the scheduler. See section 5.14.3.3, "Restarting the Scheduler", on page 357.
To request this resource, the resource request would include
-l select=1:ncpus=N:dynscratch=10MB
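The guide leaves hostdyn.pl's contents to the site. A minimal shell sketch (assuming scratch lives on a /scratch mount point) that reports the free space as a PBS size value:

#!/bin/sh
# Report the free space on /scratch (assumed mount point) as a PBS
# size value, e.g. "15728640kb", in a single line ending with a newline.
free_kb=`df -Pk /scratch | awk 'NR==2 {print $4}'`
echo "${free_kb}kb"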
5.14.5.2 Static Host-level Resources
Use static host-level resources for things that are managed by PBS and available at the host
level, such as GPUs.
5.14.5.2.i Example of Configuring Static Host-level Resource
In this example, we configure a consumable host-level resource to track GPUs. These are the steps for configuring a static host-level resource:
1. Define the resource, for example ngpus, in the server resource definition file PBS_HOME/server_priv/resourcedef:
   ngpus type=long, flag=nh
2. Restart the server. See section 5.14.3.1, "Restarting the Server", on page 356.
3. Use the qmgr command to set the value of the resource on the host:
   Qmgr: set node Host1 resources_available.ngpus=<number of GPUs>
4. Add the new resource to the resources: line in PBS_HOME/sched_priv/sched_config:
   resources: "ncpus, mem, arch, [...], ngpus"
5. Restart the scheduler. See section 5.14.3.3, "Restarting the Scheduler", on page 357.
6. If the GPU host is a multi-vnode machine, you may want to define which GPUs belong in which vnodes. In this case, do the following:
   a. Create a vnode definition file. See section 3.5.3, "Creating Version 2 MoM Configuration Files", on page 53.
   b. Restart the MoM. See section 5.14.3.2, "Restarting or Reinitializing MoM", on page 356.
See section 5.14.7.4.iv, "Example of Per-host Node-locked Licensing", on page 377, section 5.14.7.4.v, "Example of Per-use Node-locked Licensing", on page 379, and section 5.14.7.4.vi, "Example of Per-CPU Node-locked Licensing", on page 381. These sections give examples of configuring each kind of node-locked license.
5.14.5.3 Shared Host-level Resources
Two or more vnodes can share the use of a resource. The resource is managed at one vnode,
but available for use at other vnodes. The MoM manages the sharing of the resource, allocating only the available amount to jobs. For example, if you want jobs at two separate vnodes
to be able to use the same 4GB of memory, you can make the memory be a shared resource.
This way, if a job at one vnode uses all 4GB, no other jobs can use it, but if one job at one
vnode uses 2GB, other jobs at either vnode can use up to 2GB.
5.14.5.3.i Shared Resource Glossary
Borrowing vnode
The vnode where a shared vnode resource is available, but not managed.
Indirect resource
A shared vnode resource at vnode(s) where the resource is not defined, but which
share the resource.
Managing vnode
The vnode where a shared vnode resource is defined, and which manages the
resource.
Shared resource
A vnode resource defined and managed at one vnode, but available for use at others.
5.14.5.3.ii Configuring Shared Host-level Resources
The resource to be shared is defined as usual at one vnode. This is the managing vnode for
that resource. For example, to make memory be managed at Vnode1:
Qmgr: set node Vnode1 mem = 4gb
At vnodes which will use the same resource, the resource is defined to be indirect. For example, to make memory be shared and borrowed at Vnode2:
Qmgr: set node Vnode2 mem = @Vnode1
5.14.5.3.iii Shared Dynamic Host-level Resources
Vnode-level dynamic resources, meaning those listed in the mom_resources: line in
PBS_HOME/sched_priv/sched_config, are shared resources.
5.14.5.3.iv Shared Static Host-level Resources
You can define a static host-level resource to be shared between vnodes. The resource is not
shared if you set it to a value at each vnode.
5.14.5.3.v Configuring Shared Static Resources
1. If the resource to be shared is a custom resource, you must define the resource in PBS_HOME/server_priv/resourcedef before setting its value:
   <resource name> type=<resource type> [flag=<flags>]
2. Restart the server. See section 5.14.3.1, "Restarting the Server", on page 356.
3. Set the resource on the managing vnode.
   To set a static value via qmgr:
   Qmgr: s n managing_vnode resources_available.<resource>=<value>
   To set a static value in a MoM Version 2 configuration file:
   managing_vnode: <resource>=<value>
4. Set the resource on the borrowing vnode.
   To set a shared resource on a borrowing vnode via qmgr:
   Qmgr: s n borrowing_vnode resources_available.<resource>=@managing_vnode
   To set a shared resource in a MoM Version 2 configuration file:
   borrowing_vnode: <resource>=@managing_vnode
5. HUP the MoMs involved; see section 5.14.3.2, "Restarting or Reinitializing MoM", on page 356.
Example 5-10: To make a static host-level license dyna-license on hostA be managed by the natural vnode at hostA and indirect at vnodes hostA0 and hostA1:
Qmgr: set node hostA resources_available.dyna-license=4
Qmgr: set node hostA0 resources_available.dyna-license=@hostA
Qmgr: set node hostA1 resources_available.dyna-license=@hostA
5.14.5.3.vi Restrictions on Shared Host-level Resources
• If your vnodes represent physical units such as blades, sharing resources like ncpus across vnodes may not make sense.
• If you want to make a resource shared across vnodes, remember that you do not want to schedule jobs on the natural vnode. To avoid this, the following resources should not be explicitly set on the natural vnode:
  ncpus
  mem
  vmem
5.14.5.3.vii Defining Shared and Non-shared Resources for the Altix
On an Altix where you are running pbs_mom.cpuset, you can manage the resources at
each vnode. For dynamic host-level resources, the resource is shared across all the vnodes on
the machine, and MoM manages the sharing. For static host-level resources, you can either
define the resource as shared or not. Shared resources are usually set on the natural vnode and
then made indirect at any other vnodes on which you want the resource available. For
resources that are not shared, you can set the value at each vnode.
Example 5-11: To set the resource string_res to round on the natural vnode of altix03
and make it indirect at altix03[0] and altix03[1]:
Qmgr: set node altix03 resources_available.string_res=round
Qmgr: s n altix03[0] resources_available.string_res=@altix03
Qmgr: s n altix03[1] resources_available.string_res=@altix03
pbsnodes -va
altix03
...
string_res=round
...
altix03[0]
...
string_res=@altix03
...
altix03[1]
...
string_res=@altix03
...
If you had set the resource string_res individually on altix03[0] and altix03[1]:
Qmgr: s n altix03[0] resources_available.string_res=round
Qmgr: s n altix03[1] resources_available.string_res=square
pbsnodes -va
altix03
...
<--------string_res not set on natural vnode
...
altix03[0]
...
string_res=round
...
altix03[1]
...
string_res=square
...
5.14.5.3.viii Shared Resource Restrictions for Multi-vnode Machines
• Do not set the values for mem, vmem, or ncpus on the natural vnode. If any of these resources has been explicitly set to a non-zero value on the natural vnode, set resources_available.ncpus, resources_available.mem, and resources_available.vmem to zero on each natural vnode. See section 3.5.2.3, "Configuring Machines with Cpusets", on page 53.
• On the natural vnode, all values for resources_available.<resource> should be zero (0), unless the resource is being shared among other vnodes via indirection.
5.14.6 Using Scratch Space
5.14.6.1 Dynamic Server-level (Shared) Scratch Space
If you have scratch space set up so that it’s available to all execution hosts, you can use a
server-level custom dynamic resource to track it. The following are the steps for configuring
a dynamic server-level resource called globalscratch to track globally available scratch
space:
1. Define the resource in the server resource definition file PBS_HOME/server_priv/resourcedef:
   globalscratch type=long
2. Write a script, for example serverdynscratch.pl, that returns the available amount of the resource via stdout, and place it on the server's host, for example in /usr/local/bin/serverdynscratch.pl. (A sketch of such a script appears after these steps.)
3. Restart the server. See section 5.14.3.1, "Restarting the Server", on page 356.
4. Configure the scheduler to use the script by adding the resource and the path to the script to the server_dyn_res line of PBS_HOME/sched_priv/sched_config:
   UNIX:
   server_dyn_res: "globalscratch !/usr/local/bin/serverdynscratch.pl"
   Windows:
   server_dyn_res: 'globalscratch !"C:\Program Files\PBS Pro\serverdynscratch.pl"'
5. Add the new dynamic resource to the resources: line in PBS_HOME/sched_priv/sched_config:
   resources: "ncpus, mem, arch, [...], globalscratch"
6. Restart the scheduler. See section 5.14.3.3, "Restarting the Scheduler", on page 357.
To request this resource, the job's resource request would include:
-l globalscratch=<space required>
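Here is a minimal sketch of such a query script, written as a shell script; the mount point /scratch and reporting free space in kilobytes are illustrative assumptions, not from this guide:

#!/bin/sh
# Hypothetical example: report the available amount of globalscratch.
# Prints a single bare number (free kilobytes in /scratch) on one line
# ending in a newline, as PBS expects from a dynamic resource script.
df -Pk /scratch | awk 'NR==2 {print $4}'

Since globalscratch is type=long, the script prints a bare number; jobs would request the resource in the same units.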
5.14.6.2 Dynamic Host-level Scratch Space
Say you have jobs that require a large amount of scratch disk space during their execution. To
ensure that sufficient space is available during job startup, create a custom dynamic resource
so that jobs can request scratch space. To create this resource, take the steps outlined in section 5.14.5.1.i, “Example of Configuring Dynamic Host-level Resource”, on page 362.
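As a hedged illustration of where the pieces referenced above go (the resource name localscratch, the paths, and the mount point are assumptions, not from this guide), the configuration amounts to a scheduler entry, a MoM shell escape, and a query script:

In PBS_HOME/sched_priv/sched_config:
resources: "ncpus, mem, arch, [...], localscratch"
mom_resources: "localscratch"
In PBS_HOME/mom_priv/config on each execution host:
localscratch !/usr/local/bin/localscratch.sh

The query script itself might be as simple as:

#!/bin/sh
# Hypothetical example: print free kilobytes on the local scratch disk.
df -Pk /localscratch | awk 'NR==2 {print $4}'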
5.14.6.3 Static Server-level Scratch Space
If you want to prevent jobs from stepping on each other's scratch space, you can define additional vnodes that are used only to allocate scratch devices, with one vnode per scratch device. Set the sharing attribute on each scratch vnode to force_excl, so that only one job can request each scratch device. To set the sharing attribute, follow the rules in section 3.5.2, "Choosing Configuration Method", on page 52. For example, say the scratch devices are /scratch1, /scratch2, /scratch3, etc. On each scratch vnode, set resources as follows:
resources_available.ncpus = 0
resources_available.mem = 0
resources_available.scratch = 1
sharing = force_excl
Jobs then request one additional chunk to represent the scratch device, for example:
-l select=16:ncpus=1+1:scratch=1
If a job needs to request a specific scratch device, for example /scratch2, that can be done
by additionally asking for the vnode explicitly:
:vnode=scratch2
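As a sketch (vnode names are illustrative, and the custom resource scratch is assumed to be already defined in resourcedef), a Version 2 vnode configuration file for two such scratch vnodes might contain:

$configversion 2
scratch1: resources_available.ncpus = 0
scratch1: resources_available.mem = 0
scratch1: resources_available.scratch = 1
scratch1: sharing = force_excl
scratch2: resources_available.ncpus = 0
scratch2: resources_available.mem = 0
scratch2: resources_available.scratch = 1
scratch2: sharing = force_excl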
5.14.6.4 Static Host-level Scratch Space
If the scratch areas are not mounted on all execution hosts, you can specify which scratch
areas are shared among which subsets of vnodes using indirect resources. See section
5.14.5.3, “Shared Host-level Resources”, on page 364.
5.14.6.5 Caveats for Scratch Space and Jobs
When more than one job uses scratch space, or when a job is suspended, scratch space usage
may not be handled correctly. See section 5.9.5, “Dynamic Resource Allocation Caveats”, on
page 331 and section 5.9.6, “Period When Resource is Used by Job”, on page 331.
5.14.7 Supplying Application Licenses
5.14.7.1 Types of Licenses
Application licenses may be managed by PBS or by an external license manager. Application licenses may be floating or node-locked, and they may be per-host, where any number of instances can run on that host; per-CPU, where one license allows one CPU to be used for that application; or per-run, where one license allows one instance of the application to be running. Each kind of license needs a different form of custom resource.
5.14.7.1.i Externally-managed Licenses
Whenever an application license is managed by an external license manager, you must create
a custom dynamic resource for it. This is because PBS has no control over whether these
licenses are checked out, and must query the external license manager for the availability of
those licenses. PBS does this by executing the script or program that you specify in the
dynamic resource. This script returns the amount via stdout, in a single line ending with a
newline.
5.14.7.1.ii Preventing Oversubscription of Externally-managed Licenses
Some applications delay the actual license checkout until some time after the application
begins execution. Licenses could be oversubscribed when the scheduler queries for available
licenses, and gets a result including licenses that essentially belong to a job that is already running but has not yet checked them out. To prevent this, you can create a consumable custom
static integer resource, assign it the total number of licenses, and make each job that requests
licenses request this resource as well. You can use a hook to accomplish this. Alternatively, if
you know the maximum number of jobs that can run using these licenses, you can create a
consumable custom static integer resource to track the number of jobs using licenses, and
make each job request this resource.
If licenses are also checked out by applications outside of the control of PBS, this technique
will not work.
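As a minimal sketch of the job-count variant (the resource name applic_jobs and the cap of 8 are illustrative): define a consumable server-level resource, cap it at the maximum number of concurrently licensed jobs, add it to the scheduler's resources: line, and have each licensed job request one unit.

In PBS_HOME/server_priv/resourcedef (restart the server afterward):
applic_jobs type=long, flag=q
Set the cap via qmgr:
Qmgr: set server resources_available.applic_jobs = 8
Each job that uses the licensed application then also requests:
-l applic_jobs=1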
5.14.7.1.iii PBS-managed Licenses
When an application license is managed by PBS, you can create a custom static resource for
it. You set the total number of licenses using qmgr, and PBS will internally keep track of the
number of licenses available.
Use static host-level resources for node-locked application licenses managed by PBS, where
PBS is in full control of the licenses. These resources are static because PBS tracks them
internally, and host-level because they are tracked at the host.
5.14.7.2 License Units and Features
Different licenses use different license units to track whether an application is allowed to run.
Some licenses track the number of CPUs an application is allowed to run on. Some licenses
use tokens, requiring that a certain number of tokens be available in order to run. Some
licenses require a certain number of features to run the application.
When using units, after you have defined the license resource called license_name to the
server, be sure to set resources_available.license_name to the correct number of units.
Before starting, you should have answers to the following questions:
• How many units of a feature does the application require?
• How many features are required to execute the application?
• How do I query the license manager to obtain the available licenses of particular features?
With these questions answered, you can begin configuring PBS Professional to query the license manager servers for the availability of application licenses. Think of a license manager feature as a resource; therefore, you should associate a resource with each feature.
5.14.7.3 Server-level (Floating) Licenses
5.14.7.3.i Example of Floating, Externally-managed License
Here is an example of setting up floating licenses that are managed by an external license
server.
For this example, we have a 6-host complex, with one CPU per host. The hosts are numbered
1 through 6. On this complex we have one licensed application which uses floating licenses
from an external license manager. Furthermore we want to limit use of the application only to
specific hosts. The table below shows the application, the number of licenses, the hosts on
which the licenses should be used, and a description of the type of license used by the application.
Application    Licenses    Hosts    Description
AppF           4           3-6      Uses licenses from an externally managed pool
For the floating licenses, we will use three resources. One is a dynamic server resource for
the licenses themselves. One is a global server-level integer to prevent oversubscription. The
last is a Boolean resource used to indicate that the floating license can be used on a given host.
Server Configuration
1. Define the new resources in the server's resourcedef file. Create a new file if one does not already exist, and add the resource names, types, and flags:
   cd $PBS_HOME/server_priv/
   [edit] resourcedef
   Example resourcedef file with the new resources added:
   AppF type=long
   AppFcount type=long, flag=q
   runsAppF type=boolean, flag=h
2. Restart the server. See section 5.14.3.1, "Restarting the Server", on page 356.
Host Configuration
3. Set the Boolean resource on the hosts where the floating licenses can be used:
Qmgr: active node host3,host4,host5,host6
Qmgr: set node resources_available.runsAppF = True
Scheduler Configuration
4. Edit the Scheduler configuration file:
   cd $PBS_HOME/sched_priv/
   [edit] sched_config
5. Append the new resource names to the resources: line:
   resources: "ncpus, mem, arch, host, [...], AppF, AppFcount, runsAppF"
6. Edit the server_dyn_res: line:
   UNIX:
   server_dyn_res: "AppF !/local/flex_AppF"
   Windows:
   server_dyn_res: 'AppF !"C:\Program Files\PBS Pro\flex_AppF"'
7. Restart the Scheduler. See section 5.14.3.3, "Restarting the Scheduler", on page 357.
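The dynamic resource script, /local/flex_AppF in this example, must print the number of available AppF licenses on a single line ending in a newline. Here is a minimal sketch, assuming a FLEXlm-style license server reachable with lmutil lmstat; the port, host name, and output parsing are illustrative assumptions:

#!/bin/sh
# Hypothetical query: subtract licenses in use from licenses issued.
# lmstat prints a line such as:
#   Users of AppF: (Total of 4 licenses issued; Total of 1 license in use)
lmutil lmstat -c 27000@licserver -f AppF 2>/dev/null | \
    awk '/Users of AppF:/ {print $6 - $11; exit}'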
You can write a hook that examines the number of AppF licenses requested by each job, and
assigns that many AppFcount to the job, or you can ask your users to request AppFcount.
To request a floating license for AppF and a host on which AppF can run:
qsub -l AppF=1 -l AppFcount=1 -l select=1:runsAppF=True
The example below shows what the host configuration would look like. What is shown is
actually truncated output from the pbsnodes -a command. Similar information could be
printed via the qmgr -c “print node @default” command as well.
host1
host2
host3
     resources_available.runsAppF = True
host4
     resources_available.runsAppF = True
host5
     resources_available.runsAppF = True
host6
     resources_available.runsAppF = True
5.14.7.3.ii Example of Floating, Externally-managed License with Features
This is an example of a floating license, managed by an external license manager, where the
application requires a certain number of features to run. Floating licenses are treated as
server-level dynamic resources. The license server is queried by an administrator-created
script. This script returns the value via stdout in a single line ending with a newline.
The license script runs on the server’s host once per scheduling cycle and queries the number
of available licenses/tokens for each configured application.
When submitting a job, the user's script, in addition to requesting CPUs, memory, etc., also
requests licenses.
When the scheduler looks at all the enqueued jobs, it evaluates the license request alongside
the request for physical resources, and if all the resource requirements can be met the job is
run. If the job's token requirements cannot be met, then it remains queued.
PBS doesn't actually check out the licenses; the application being run inside the job's session
does that. Note that a small number of applications request varying amounts of tokens during
a job run.
Our example needs four features to run an application, so we need four custom resources.
1. Write four scripts, one to query the license server for each of your four features. The complexity of each script is entirely site-specific, due to the nature of how applications are licensed.
2. Define four non-consumable server-level resources, one per feature, in PBS_HOME/server_priv/resourcedef. These are defined with no flags:
   feature1 type=long
   feature3 type=long
   feature6 type=long
   feature8 type=long
3. Restart the PBS server. See section 5.14.3.1, "Restarting the Server", on page 356.
4. Add the feature resources to the resources: line in PBS_HOME/sched_priv/sched_config:
   resources: "ncpus, mem, arch, [...], feature1, feature3, feature6, feature8"
5. Add each feature's script path to a server_dyn_res: line in PBS_HOME/sched_priv/sched_config:
   UNIX:
   server_dyn_res: "feature1 !/path/to/script [args]"
   server_dyn_res: "feature3 !/path/to/script [args]"
   server_dyn_res: "feature6 !/path/to/script [args]"
   server_dyn_res: "feature8 !/path/to/script [args]"
   Windows:
   server_dyn_res: 'feature1 !"C:\Program Files\PBS Pro\script [args]"'
   server_dyn_res: 'feature3 !"C:\Program Files\PBS Pro\script [args]"'
   server_dyn_res: 'feature6 !"C:\Program Files\PBS Pro\script [args]"'
   server_dyn_res: 'feature8 !"C:\Program Files\PBS Pro\script [args]"'
6. Restart the scheduler. See section 5.14.3.3, "Restarting the Scheduler", on page 357.
5.14.7.3.iii Example of Floating License Managed by PBS
Here is an example of configuring custom resources for a floating license that PBS manages.
For this you need a server-level static resource to keep track of the number of available
licenses. If the application can run only on certain hosts, then you will need a host-level Boolean resource to direct jobs running the application to the correct hosts.
In this example, we have six hosts numbered 1-6, and the application can run on hosts 3, 4, 5, and 6. The resource that will track the licenses is called AppM. The Boolean resource is called runsAppM.
Server Configuration
1. Define the new resources in the server's resourcedef file. Create a new file if one does not already exist, and add the resource names, types, and flags:
   cd $PBS_HOME/server_priv/
   [edit] resourcedef
   Example resourcedef file with the new resources added:
   AppM type=long, flag=q
   runsAppM type=boolean, flag=h
2. Restart the server. See section 5.14.3.1, "Restarting the Server", on page 356.
3. Set a value for AppM at the server. Here, we allow 8 copies of the application to run at once:
   Qmgr: set server resources_available.AppM=8
Host Configuration
4. Set the value of runsAppM on the hosts. Each qmgr directive is typed on a single line:
Qmgr: active node host3,host4,host5,host6
Qmgr: set node resources_available.runsAppM = True
Scheduler Configuration
5. Edit the Scheduler configuration file:
   cd $PBS_HOME/sched_priv/
   [edit] sched_config
6. Append the new resource name to the resources: line. Note that it is not necessary to add a host-level Boolean resource to this line.
   resources: "ncpus, mem, arch, host, [...], AppM, runsAppM"
7. Restart the Scheduler. See section 5.14.3.3, "Restarting the Scheduler", on page 357.
To request both the application and a host that can run AppM:
qsub -l AppM=1 -l select=1:runsAppM=1 <jobscript>
The example below shows what the host configuration would look like. What is shown is
actually truncated output from the pbsnodes -a command. Similar information could be
printed via the qmgr -c “print node @default” command as well. Since unset
Boolean resources are the equivalent of False, you do not need to explicitly set them to False
on the other hosts. Unset Boolean resources will not be printed.
host1
host2
host3
     resources_available.runsAppM = True
host4
     resources_available.runsAppM = True
host5
     resources_available.runsAppM = True
host6
     resources_available.runsAppM = True
5.14.7.4 Host-level (Node-locked) Licenses
5.14.7.4.i Per-host Node-locked Licenses
If you are configuring a custom resource for a per-host node-locked license, where the number of jobs using the license does not matter, use a host-level Boolean resource on the appropriate host. This resource is set to True. When users request the license, they can use the
following requests:
For a two-CPU job on a single vnode:
-l select=1:ncpus=2:license=1
For a multi-vnode job:
-l select=2:ncpus=2:license=1 -l place=scatter
Users can also use "license=True", but requesting "license=1" means they do not have to change their scripts.
5.14.7.4.ii Per-CPU Node-locked Licenses
If you are configuring a custom resource for a per-CPU node-locked license, use a host-level
consumable resource on the appropriate vnode. This resource is set to the maximum number
of CPUs you want used on that vnode. Then when users request the license, they will use the
following request:
For a two-CPU, two-license job:
-l select=1:ncpus=2:license=2
5.14.7.4.iii Per-use Node-locked License
If you are configuring a custom resource for a per-use node-locked license, use a host-level
consumable resource on the appropriate host. This resource is set to the maximum number of
instances of the application allowed on that host. Then when users request the license, they
will use:
For a two-CPU job on a single host:
-l select=1:ncpus=2:license=1
For a multi-vnode job where each chunk needs two CPUs:
-l select=2:ncpus=2:license=1 -l place=scatter
5.14.7.4.iv Example of Per-host Node-locked Licensing
Here is an example of setting up node-locked licenses where one license is required per host,
regardless of the number of jobs on that host.
For this example, we have a 6-host complex, with one CPU per host. The hosts are numbered
1 through 6. On this complex we have a licensed application that uses per-host node-locked
licenses. We want to limit use of the application only to specific hosts. The table below shows
the application, the number of licenses for it, the hosts on which the licenses should be used,
and a description of the type of license used by the application.
Application    Licenses    Hosts    Description
AppA           1           1-4      Uses a local node-locked application license
For the per-host node-locked license, we will use a Boolean host-level resource called
resources_available.runsAppA. This will be set to True on any hosts that should have the
license, and will default to False on all others. The resource is not consumable so that more
than one job can request the license at a time.
Server Configuration
1. Define the new resources in the server's resourcedef file. Create a new file if one does not already exist, and add the resource names, types, and flags:
   cd $PBS_HOME/server_priv/
   [edit] resourcedef
   Example resourcedef file with the new resources added:
   runsAppA type=boolean, flag=h
   AppA type=long, flag=h
2. Restart the server. See section 5.14.3.1, "Restarting the Server", on page 356.
Host Configuration
3. Set the value of runsAppA on the hosts. Each qmgr directive is typed on a single line:
Qmgr: active node host1,host2,host3,host4
Qmgr: set node resources_available.runsAppA = True
Scheduler Configuration
4. Edit the Scheduler configuration file:
   cd $PBS_HOME/sched_priv/
   [edit] sched_config
5. Append the new resource name to the resources: line. Note that it is not necessary to add the host-level Boolean resource to this line.
   resources: "ncpus, mem, arch, [...], AppA, runsAppA"
6. Restart the Scheduler. See section 5.14.3.3, "Restarting the Scheduler", on page 357.
To request a host with a per-host node-locked license for AppA:
qsub -l select=1:runsAppA=1 <jobscript>
The example below shows what the host configuration would look like. What is shown is
actually truncated output from the pbsnodes -a command. Similar information could be
printed via the qmgr -c “print node @default” command as well. Since unset
Boolean resources are the equivalent of False, you do not need to explicitly set them to False
on the other hosts. Unset Boolean resources will not be printed.
host1
     resources_available.runsAppA = True
host2
     resources_available.runsAppA = True
host3
     resources_available.runsAppA = True
host4
     resources_available.runsAppA = True
host5
host6
5.14.7.4.v Example of Per-use Node-locked Licensing
Here is an example of setting up per-use node-locked licenses. Here, while a job is using one
of the licenses, it is not available to any other job.
For this example, we have a 6-host complex, with 4 CPUs per host. The hosts are numbered 1
through 6. On this complex we have a licensed application that uses per-use node-locked
licenses. We want to limit use of the application only to specific hosts. The licensed hosts can
run two instances each of the application. The table below shows the application, the number
of licenses for it, the hosts on which the licenses should be used, and a description of the type
of license used by the application.
Application    Licenses    Hosts    Description
AppB           2           1-2      Uses a local node-locked application license
For the node-locked license, we will use one static host-level resource called
resources_available.AppB. This will be set to 2 on any hosts that should have the license,
and to 0 on all others. The “nh” flag combination means that it is host-level and it is consumable, so that if a host has 2 licenses, only two jobs can use those licenses on that host at a time.
Server Configuration
1. Define the new resource in the server's resourcedef file. Create a new file if one does not already exist, and add the resource name, type, and flags:
   cd $PBS_HOME/server_priv/
   [edit] resourcedef
   Example resourcedef file with the new resource added:
   AppB type=long, flag=nh
2. Restart the server. See "Restarting the Server" on page 356.
Host Configuration
3. Set the value of AppB on the hosts to the maximum number of instances allowed. Each qmgr directive is typed on a single line:
   Qmgr: active node host1,host2
   Qmgr: set node resources_available.AppB = 2
   Qmgr: active node host3,host4,host5,host6
   Qmgr: set node resources_available.AppB = 0
Scheduler Configuration
4. Edit the Scheduler configuration file:
   cd $PBS_HOME/sched_priv/
   [edit] sched_config
5. Append the new resource name to the resources: line:
   resources: "ncpus, mem, arch, host, [...], AppB"
6. Restart the Scheduler. See "Restarting the Scheduler" on page 357.
To request a host with a node-locked license for AppB, where you’ll run one instance of AppB
on two CPUs:
qsub -l select=1:ncpus=2:AppB=1
The example below shows what the host configuration would look like. What is shown is
actually truncated output from the pbsnodes -a command. Similar information could be
printed via the qmgr -c “print node @default” command as well.
host1
     resources_available.AppB = 2
host2
     resources_available.AppB = 2
host3
     resources_available.AppB = 0
host4
     resources_available.AppB = 0
host5
     resources_available.AppB = 0
host6
     resources_available.AppB = 0
5.14.7.4.vi Example of Per-CPU Node-locked Licensing
Here is an example of setting up per-CPU node-locked licenses. Each license is for one CPU,
so a job that runs this application and needs two CPUs must request two licenses. While that
job is using those two licenses, they are unavailable to other jobs.
For this example, we have a 6-host complex, with 4 CPUs per host. The hosts are numbered 1
through 6. On this complex we have a licensed application that uses per-CPU node-locked
licenses. We want to limit use of the application to specific hosts only. The table below shows
the application, the number of licenses for it, the hosts on which the licenses should be used,
and a description of the type of license used by the application.
Application    Licenses    Hosts    Description
AppC           4           3-4      Uses a local node-locked application license
For the node-locked license, we will use one static host-level resource called
resources_available.AppC. We will provide a license for each CPU on hosts 3 and 4, so
this will be set to 4 on any hosts that should have the license, and to 0 on all others. The “nh”
flag combination means that it is host-level and it is consumable, so that if a host has 4
licenses, only four CPUs can be used for that application at a time.
Server Configuration
1. Define the new resource in the server's resourcedef file. Create a new file if one does not already exist, and add the resource name, type, and flags:
   cd $PBS_HOME/server_priv/
   [edit] resourcedef
   Example resourcedef file with the new resource added:
   AppC type=long, flag=nh
2. Restart the server. See "Restarting the Server" on page 356.
Host Configuration
3. Set the value of AppC on the hosts. Each qmgr directive is typed on a single line:
   Qmgr: active node host3,host4
   Qmgr: set node resources_available.AppC = 4
   Qmgr: active node host1,host2,host5,host6
   Qmgr: set node resources_available.AppC = 0
Scheduler Configuration
4. Edit the Scheduler configuration file:
   cd $PBS_HOME/sched_priv/
   [edit] sched_config
5. Append the new resource name to the resources: line:
   resources: "ncpus, mem, arch, host, [...], AppC"
6. Restart the Scheduler. See "Restarting the Scheduler" on page 357.
To request a host with a node-locked license for AppC, where you’ll run a job using two
CPUs:
qsub -l select=1:ncpus=2:AppC=2
The example below shows what the host configuration would look like. What is shown is
actually truncated output from the pbsnodes -a command. Similar information could be
printed via the qmgr -c “print node @default” command as well.
host1
     resources_available.AppC = 0
host2
     resources_available.AppC = 0
host3
     resources_available.AppC = 4
host4
     resources_available.AppC = 4
host5
     resources_available.AppC = 0
host6
     resources_available.AppC = 0
5.14.8 Using GPUs
You can configure PBS to support GPU scheduling. We describe how to configure both basic
and advanced GPU scheduling. Basic GPU scheduling will meet the needs of most job submitters; it allows a job to request the number of GPUs it needs, as long as the job requests
exclusive use of each node containing the GPUs. Advanced GPU scheduling allows jobs to
request specific GPUs.
PBS Professional allocates GPUs, but does not bind jobs to any particular GPU; the application itself, or the CUDA library, is responsible for the actual binding.
5.14.8.1 Basic GPU Scheduling
Basic scheduling consists of prioritizing jobs based on site policies, controlling access to
nodes with GPUs, ensuring that GPUs are not over-subscribed, and tracking use of GPUs in
accounting logs.
Configuring PBS to perform basic scheduling of GPUs is relatively simple, and only requires
defining and configuring a single custom resource to represent the number of GPUs on each
node.
This method allows jobs to request unspecified GPUs. Jobs should request exclusive use of
the node to prevent other jobs being scheduled on their GPUs.
5.14.8.2 Advanced GPU Scheduling
Advanced scheduling allows a job to separately allocate (request and/or identify) each individual GPU on a node.
In this case, both PBS and the applications themselves must support individually allocating
the GPUs on a node. Advanced scheduling requires defining a PBS vnode for each GPU.
This capability is useful for sharing a single multi-GPU node among multiple jobs, where
each job requires exclusive use of its GPUs.
5.14.8.3 Configuring PBS for Basic GPU Scheduling
You configure a single custom consumable resource to represent all GPU devices on an execution host: create a host-level global consumable custom resource to represent GPUs. We recommend that the custom GPU resource be named ngpus. Set the value for this resource at each vnode to the number of GPUs on the vnode.
The ngpus resource is used exactly the way you use the ncpus resource.
5.14.8.3.i Example of Configuring PBS for Basic GPU Scheduling
In this example, there are two execution hosts, HostA and HostB, and each execution host has
4 GPU devices.
1. Stop the server and scheduler. On the server's host, type:
   /etc/init.d/pbs stop
2. Edit PBS_HOME/server_priv/resourcedef, and add the following line:
   ngpus type=long, flag=nh
3. Edit PBS_HOME/sched_priv/sched_config to add ngpus to the list of scheduling resources:
   resources: "ncpus, mem, arch, host, vnode, ngpus"
4. Restart the server and scheduler. On the server's host, type:
   /etc/init.d/pbs start
5. Add the number of GPU devices available to each execution host in the cluster via qmgr:
   Qmgr: set node HostA resources_available.ngpus=4
   Qmgr: set node HostB resources_available.ngpus=4
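For instance (an illustrative request, not from this guide), a job that needs all 4 GPUs on one of these hosts, and therefore exclusive use of the host, might use:

qsub -l select=1:ngpus=4 -l place=excl <jobscript>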
5.14.8.4 Configuring PBS for Advanced GPU Scheduling
You configure each GPU device in its own vnode, and each GPU vnode has a resource to contain the device number of its GPU.
Create and set two custom resources:
• Create a host-level global consumable resource to represent the GPUs on a vnode. We recommend that this resource be called ngpus. Set ngpus on each vnode to the number of GPUs on that vnode.
• Create a host-level global non-consumable resource containing the GPU device number, which serves to tie the individual GPU to the vnode. We recommend that this resource be called gpu_id. Set gpu_id for each GPU to the device number of that GPU.
5.14.8.4.i Example of Configuring PBS for Advanced GPU Scheduling
In this example, there is one execution host, HostA, that has two vnodes, HostA[0] and
HostA[1], as well as the natural vnode. HostA has 4 CPUs, 2 GPUs, and 16 GB of memory.
1. Stop the server and scheduler. On the server's host, type:
   /etc/init.d/pbs stop
2. Edit PBS_HOME/server_priv/resourcedef to add the new custom resources:
   ngpus type=long, flag=nh
   gpu_id type=string, flag=h
3. Edit PBS_HOME/sched_priv/sched_config to add ngpus and gpu_id to the list of scheduling resources:
   resources: "ncpus, mem, arch, host, vnode, ngpus, gpu_id"
4. Restart the server and scheduler. On the server's host, type:
   /etc/init.d/pbs start
5. Create a vnode configuration file for each execution host where GPUs are present. The file for HostA is named hostA_vnodes, and is shown here:
   $configversion 2
   hostA: resources_available.ncpus = 0
   hostA: resources_available.mem = 0
   hostA[0]: resources_available.ncpus = 2
   hostA[0]: resources_available.mem = 8gb
   hostA[0]: resources_available.ngpus = 1
   hostA[0]: resources_available.gpu_id = gpu0
   hostA[0]: sharing = default_excl
   hostA[1]: resources_available.ncpus = 2
   hostA[1]: resources_available.mem = 8gb
   hostA[1]: resources_available.ngpus = 1
   hostA[1]: resources_available.gpu_id = gpu1
   hostA[1]: sharing = default_excl
6. Add the vnode configuration information in the following manner, for each host with GPUs:
   PBS_EXEC/sbin/pbs_mom -s insert hostA_vnodes hostA_vnodes
7. Signal each MoM to re-read its configuration files:
kill -HUP <pbs_mom PID>
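As an illustration (these requests are assumptions consistent with the vnode definitions above, not from this guide), a job that needs one particular GPU can request it via gpu_id, while a job that simply needs both GPUs on the host requests them by count:

qsub -l select=1:ncpus=2:ngpus=1:gpu_id=gpu0 <jobscript>
qsub -l select=2:ncpus=2:ngpus=1 <jobscript>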
5.14.9 Using FPGAs
You can configure a custom resource that allows PBS to track the usage of FPGAs. The FPGAs are detected outside of PBS at boot time. There are two basic methods for automatic configuration of the FPGA resource:
• Create a global static host-level resource called nfpgas. Create a boot-up script in init.d that detects the presence of the FPGAs and sets the value of the nfpgas resource (a sketch follows this list).
• Create a global dynamic host-level resource called nfpgas. This resource calls a script to detect the presence of FPGAs.
We recommend the static resource, because FPGAs are static, and there is a performance penalty for a dynamic resource.
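Here is a minimal sketch of the static approach; the detection command, the device paths, and the PBS installation path are assumptions to adapt for your site:

#!/bin/sh
# Hypothetical init.d fragment: count this host's FPGAs at boot and
# record the count in the nfpgas resource on the host's vnode.
NFPGAS=`ls /dev/fpga* 2>/dev/null | wc -l`
/opt/pbs/default/bin/qmgr -c "set node `hostname -s` resources_available.nfpgas = $NFPGAS"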
5.14.10 Custom Resource Caveats
• Because some custom resources are external to PBS, they are not completely under the control of PBS. Therefore it is possible for PBS to query and find a resource available, schedule a job to run and use that resource, only to have an outside entity take that resource before the job is able to use it. For example, say you had an external resource of "scratch space" and your local query script simply checked to see how much disk space was free. It would be possible for a job to be started on a host with the requested space, but for another application to use the free space before the job did.
• If a resource is not put in the scheduler's resources: line, requests for that resource are ignored, so the resource cannot be used to accept or reject jobs at submission time. For example, if you create a string resource String1 on the server and set it to foo, a job requesting "-l String1=bar" will still be accepted. The only exception is host-level Boolean resources, which are considered when scheduling whether or not they are in the scheduler's resources: line.
• Do not create resources with the same names or prefixes that PBS uses when creating custom resources for specific systems. See "Custom Cray Resources" on page 323 of the PBS Professional Reference Guide.
• Using dynamic host-level resources can slow the scheduler down, because the scheduler must wait for each resource-query script to run.
5.15 Managing Resource Usage
You can manage resource usage from different directions:
• You can manage resource usage by users, groups, and projects, and the number of jobs, at the server and queue level. See section 5.15.1, "Managing Resource Usage By Users, Groups, and Projects, at Server & Queues", on page 389.
• You can manage the total amount of each resource that is used by projects, users, or groups, at the server or queue level. For example, you can manage how much memory is being used by jobs in queue QueueA.
• You can manage the number of jobs being run by projects, users, or groups, at the server or queue level. For example, you can limit the number of jobs enqueued in queue QueueA by any one group to 30, and by any single user to 5.
• You can specify how much of each resource any job is allowed to use, at the server and queue level. See section 5.15.3, "Placing Resource Limits on Jobs", on page 414 and section 5.13, "Using Resources to Restrict Server, Queue Access", on page 336.
• You can set default limits for usage of each resource, at the server or queue level, so that jobs that do not request a given resource inherit that default, and are limited to the inherited amount. For example, you can specify that any job entering queue QueueA not specifying mem is limited to using 4MB of memory. See section 5.9.3, "Specifying Job Default Resources", on page 323.
• You can set limits on the number of jobs that can be run at each vnode by users, by groups, or overall. See section 5.15.2, "Limiting Number of Jobs at Vnode", on page 413.
• You can set limits on the number of jobs that can be in the queued state at the server and/or queue level. You can apply these limits to users, groups, projects, or everyone. This allows users to submit as many jobs as they want, while allowing the scheduler to consider only the jobs in the execution queues, thereby speeding up the scheduling cycle. See section 5.15.4, "Limiting the Number of Jobs in Queues", on page 423.
5.15.1 Managing Resource Usage By Users, Groups, and Projects, at Server & Queues
You can set separate limits for resource usage by individual users, individual groups, individual projects, generic users, generic groups, generic projects, and the total used overall, for queued jobs, running jobs, and queued and running jobs. You can limit the amount of resources used, the number of queued jobs, the number of running jobs, and the number of queued and running jobs. These limits can be defined separately for each queue and for the server. You define the limits by setting server and queue limit attributes. For information about projects, see section 11.4, "Grouping Jobs By Project", on page 969.
There are two incompatible sets of server and queue limit attributes used in limiting
resource usage. The first set existed in PBS Professional before Version 10.1, and we call
them the old limit attributes. The old limit attributes are discussed in section 5.15.1.15, “Old
Limit Attributes: Server and Queue Resource Usage Limit Attributes Existing Before Version
10.1”, on page 411. The set introduced in Version 10.1 is called simply the limit attributes,
and they are discussed here.
You can use either the limit attributes or the old limit attributes for the server and queues, but
not both. See section 5.15.1.13.v, “Do Not Mix Old And New Limits”, on page 410.
There is a set of limit attributes for vnodes which existed before Version 10.1 and can be used
with either the limit attributes or the old limit attributes. These are discussed in section
5.15.2, “Limiting Number of Jobs at Vnode”, on page 413.
The server and queues each have per-job limit attributes which operate independently of the
limits discussed in this section. The resources_min.<resource> and
resources_max.<resource> server and queue attributes are limits on what each individual
job may use. See section 5.13, “Using Resources to Restrict Server, Queue Access”, on page
336 and section 5.15.3, “Placing Resource Limits on Jobs”, on page 414.
5.15.1.1 Examples of Managing Resource Usage at Server and Queues
You can limit resource usage and job count for specific projects, users, and groups:
• UserA can use no more than 6 CPUs, and UserB can use no more than 4 CPUs, at one time anywhere in the PBS complex.
• The crashtest group can use no more than 16 CPUs at one time anywhere in the PBS complex.
• UserC accidentally submitted 200,000 jobs last week. UserC can now have no more than 25 jobs enqueued at one time.
• All jobs request the server-level custom resource nodehours, which is used for allocation. UserA cannot use more than 40 nodehours in the PBS complex. Once UserA reaches the nodehours limit, all queued jobs owned by UserA are not eligible for execution.
• You wish to allow UserD to use 12 CPUs but limit all other users to 4 CPUs.
• Jobs belonging to Project A can use no more than 8 CPUs at Queue1.
You can limit the number of jobs a particular project, user, or group runs in a particular queue:
• UserE can use no more than 2 CPUs at one time at Queue1, and 6 CPUs at one time at Queue2.
• You wish to limit UserF to 10 running jobs in queue Queue3, but allow all other users unlimited jobs running in the same queue.
• UserG is a member of Group1. You have a complex-wide limit of 5 running jobs for UserG. You have a limit at Queue1 of 10 running jobs for Group1. This way, up to 10 of the running jobs in Queue1 can belong to Group1, and 5 of these can belong to UserG.
• UserH is a member of Group1. You have a complex-wide limit of 5 running jobs for UserH. You have a limit at Queue1 of 10 running jobs for any group in Queue1. This way, no group in Queue1 can run more than 10 jobs total at one time, and 5 of these can belong to UserH.
• UserJ is a member of Group1. You have a complex-wide limit of 10 running jobs for UserJ. You also have a limit at Queue1 of 5 running jobs for Group1. This means that there may be up to 5 running jobs owned by users belonging to Group1 in Queue1, and up to 5 of these can be owned by UserJ. UserJ can also have another 5 running jobs owned by Group1 in any other queue, or owned by a different group in Queue1.
• No more than 12 jobs belonging to Project A can run at Queue1, and all other projects are limited to 8 jobs at Queue1.
You can ensure fairness in the use of resources:
• You have multiple departments which have shared the purchase of a large Altix. Each department would like to ensure fairness in the use of the Altix by setting limits on individual users and groups.
• You have multiple departments, each of which purchases its own machines. Each department would like to limit the use of its machines so that all departmental users have specific limits. In addition, each department would like to allow non-departmental users to use its machines when they are under-utilized, while giving its own users priority on its machines. A non-departmental user can run jobs on a departmental machine, as long as no departmental users' jobs are waiting to run.
5.15.1.2 Glossary
Limit
The maximum amount of a resource that can be consumed at any time by running
jobs or allocated to queued jobs, or the maximum number of jobs that can be running, or the maximum number of jobs that can be queued.
Overall limit
Limit on the total usage. In the context of server limits, this is the limit for usage at
the PBS complex. In the context of queue limits, this is the limit for usage at the
queue. An overall limit is applied to the total usage at the specified location. Separate overall limits can be specified at the server and each queue.
Generic user limit
Applies separately to users at the server or a queue. The limit for users who have no
individual limit specified. A separate limit for generic users can be specified at the
server and at each queue.
Generic group limit
Applies separately to groups at the server or a queue. The limit for groups which
have no individual limit specified. A limit for generic groups is applied to the usage
across the entire group. A separate limit can be specified at the server and each
queue.
Generic project limit
Applies separately to projects at the server or a queue. The limit for projects which
have no individual limit specified. A limit for generic projects is applied to the usage
across the entire project. A separate limit can be specified at the server and each
queue.
Individual user limit
Applies separately to users at the server or a queue. Limit for users who have their
own individual limit specified. A limit for an individual user overrides the generic
user limit, but only in the same context, for example, at a particular queue. A separate limit can be specified at the server and each queue.
Individual group limit
Applies separately to groups at the server or a queue. Limit for a group which has its
own individual limit specified. An individual group limit overrides the generic
group limit, but only in the same context, for example, at a particular queue. The
limit is applied to the usage across the entire group. A separate limit can be specified
at the server and each queue.
Individual project limit
Applies separately to projects at the server or a queue. Limit for a project which has
its own individual limit specified. An individual project limit overrides the generic
project limit, but only in the same context, for example, at a particular queue. The
limit is applied to the usage across the entire project. A separate limit can be specified at the server and each queue.
User limit
A limit placed on one or more users, whether generic or individual.
Group limit
This is a limit applied to the total used by a group, whether the limit is a generic
group limit or an individual group limit.
Project
In PBS, a project is a way to group jobs independently of users and groups. A
project is a tag that identifies a set of jobs. Each job’s project attribute specifies the
job’s project.
Project limit
This is a limit applied to the total used by a project, whether the limit is a generic
project limit or an individual project limit.
Queued jobs
In a queue, queued jobs are the jobs that are waiting in that queue.
5.15.1.3 Difference Between PBS_ALL and PBS_GENERIC
Note the very important difference between the overall limit and a generic limit. We will describe how this works for users, but it applies to other entities as well. You set PBS_ALL for an overall limit on the total usage of a resource by all entities, whereas you set PBS_GENERIC for a limit on any single generic user.
Example 5-12: Difference between overall limit and generic user limit
Given the following:
• The overall server limit for running jobs is 100
• The server limit for generic users is 10
• The individual limit for User1 is 12 jobs
This means:
• Generic users (any single user except User1) can run no more than 10 jobs at this server
• User1 can run 12 jobs at this server
• At this server, no more than 100 jobs can be running at any time
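Expressed with the limit attributes described in section 5.15.1.8 (user1 as an illustrative account name), this example corresponds to:

Qmgr: set server max_run = "[o:PBS_ALL=100], [u:PBS_GENERIC=10], [u:user1=12]"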
5.15.1.4 Hard and Soft Limits
Hard limits are limits which cannot be exceeded. Soft limits are limits which mark the point
where a project, user or group is using “extra, but acceptable” amounts of a resource. When
this happens, the jobs belonging to that project, user or group are eligible for preemption. See
section 4.8.33, “Using Preemption”, on page 241. Soft limits are discussed in section
4.8.33.6.i, “The Soft Limits Preemption Level”, on page 246.
5.15.1.5 Scope of Limits at Server and Queues
Each of the limits described above can be set separately at the server and at each queue. Each
limit’s scope is the PBS object where it is set. The individual and generic project, user and
group limits that are set within one scope interact with each other only within that scope. For
example, a limit set at one queue has no effect at another queue.
The scope of limits set at the server encompasses queues, so that the minimum, more restrictive limit of the two is applied. For precedence within a server or queue, see section 5.15.1.7,
“Precedence of Limits at Server and Queues”, on page 397.
5.15.1.6 Ways To Limit Resource Usage at Server and Queues
You can create a complete set of limits at the server, and you can create another complete set of limits at each queue. You can set hard and soft limits. See section 4.8.33.6.i, "The Soft Limits Preemption Level", on page 246. You can limit resource usage at the server and the queue level for the following:
• Running jobs
  • Number of running jobs
  • Number of running jobs (soft limit)
  • Amount of each resource allocated for running jobs
  • Amount of each resource allocated for running jobs (soft limit)
• Queued jobs (this means jobs that are waiting to run from that queue)
  • Number of queued jobs
  • Amount of each resource allocated for queued jobs
• Queued and running jobs (this means both jobs that are waiting to run and jobs that are running from that queue)
  • Number of queued and running jobs
  • Amount of each resource allocated for queued and running jobs
These limits can be applied to the following:
• The total usage at the server
• The total usage at each queue
• Amount used by a single user
  • Generic users
  • Individual users
• Amount used by a single group
  • Generic groups
  • Individual groups
• Amount used by a single project
  • Generic projects
  • Individual projects
5.15.1.6.i Limits at Queues
You can limit the number of jobs that are queued at a queue, and running at a queue, and that
are both queued and running at a queue.
You can limit the resources allocated to jobs that are queued at a queue, and running at a
queue, and that are both queued and running at a queue.
Jobs queued at a queue are counted the same whether they were submitted to that queue via
the qsub command or its equivalent API, moved to that queue via the qmove command or
its equivalent API, or routed to that queue from another queue.
When PBS requeues a job, it does not take limits into account.
Routing queues do not run jobs, so you cannot set a limit for the number of running jobs, or
the amount of resources being used by running jobs, at a routing queue.
5.15.1.6.ii Generic and Individual Limits
You can set a generic limit for groups, so that each group must obey the same limit. You can likewise set a generic limit for users and projects. Each generic limit can be set separately at the server and at each queue. For example, if you have two queues, the generic limit for the number of jobs a user can run might be 4 at QueueA and 6 at QueueB.
You can set a different individual limit for each user, and you can set individual limits for groups and for projects. Each user, group, and project can have a different individual limit at the server and at each queue.
You can use a combination of generic and individual project, user or group limits, at the server
and at each queue. Within the scope of the server or a queue, all projects, users or groups
except the ones with the individual limits must obey the generic limit, and the individual limits override the generic limits.
Example 5-13: Generic and individual user limits on running jobs at QueueA and QueueB
At QueueA:
• At QueueA, the generic user limit is 5
• At QueueA, Bob's individual limit is 8
• Tom has no individual limit set at QueueA; the generic limit applies
At QueueB:
• At QueueB, the generic user limit is 2
• At QueueB, Tom's individual limit is 1
• Bob has no individual limit at QueueB; the generic limit applies
This means:
• Bob can run 8 jobs at QueueA
• Bob can run 2 jobs at QueueB
• Tom can run 5 jobs at QueueA
• Tom can run 1 job at QueueB
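Using the limit attributes described in section 5.15.1.8 (bob and tom as illustrative account names), these limits could be set with:

Qmgr: set queue QueueA max_run = "[u:PBS_GENERIC=5], [u:bob=8]"
Qmgr: set queue QueueB max_run = "[u:PBS_GENERIC=2], [u:tom=1]"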
5.15.1.6.iii Overall Limits
The overall limit places a cap on the total amount of the resource that can be used within the
scope in question (server or queue), regardless of whether project, user, or group limits have
been reached. A project, user, or group at the server or a queue cannot use any more of a
resource for which the overall limit has been reached, even if that project, user, or group limit
has not been reached.
Example 5-14: Overall limit at server
Given the following:
• Overall server limit on running jobs is 100
• Bob's user limit is 10 running jobs
• 98 jobs are already running
• Bob is running zero jobs
This means:
• Bob can start only 2 jobs
5.15.1.7 Precedence of Limits at Server and Queues
5.15.1.7.i Interactions Between Limits Within One Scope
Within the scope of a PBS object (server or queue), there is an order of precedence for limits
when more than one applies to a job. The order of precedence for the limits at a queue is the
same as the order at the server. The following table shows how limits interact within one
scope:
Table 5-8: Limit Interaction Within One Scope

                     Indiv. User       Generic User      Indiv. Group      Generic Group     Indiv. Project      Generic Project
Individual user      Individual user   Individual user   More restrictive  More restrictive  More restrictive    More restrictive
Generic user         Individual user   Generic user      More restrictive  More restrictive  More restrictive    More restrictive
Individual group     More restrictive  More restrictive  Individual group  Individual group  More restrictive    More restrictive
Generic group        More restrictive  More restrictive  Individual group  Generic group     More restrictive    More restrictive
Individual project   More restrictive  More restrictive  More restrictive  More restrictive  Individual project  Individual project
Generic project      More restrictive  More restrictive  More restrictive  More restrictive  Individual project  Generic project

Each cell names the limit that applies when the limits for its row and column are both set: an individual limit overrides the generic limit for the same entity type, and otherwise the more restrictive limit applies.
An individual user limit overrides a generic user limit.
Example 5-15: Individual user limit overrides generic user limit
Given the following:
• Bob has a limit of 10 running jobs
• The generic limit is 5
This means:
• Bob can run 10 jobs
An individual group limit overrides a generic group limit in the same manner as for users.
If the limits for a user and the user’s group are different, the more restrictive limit applies.
Example 5-16: More restrictive user or group limit applies
Given the following:
• Tom's user limit for running jobs is 8
• Tom's group limit is 7
This means:
• Tom can run only 7 jobs in that group
If a user belongs to more than one group, that user can run jobs up to the lesser of his user
limit or the sum of the group limits.
Example 5-17: User can run jobs in more than one group
Given the following:
• Tom's user limit is 10 running jobs
• GroupA has a limit of 2 and GroupB has a limit of 4
• Tom belongs to GroupA and GroupB
This means:
• Tom can run 6 jobs, 2 in GroupA and 4 in GroupB
An individual project limit overrides a generic project limit, similar to the way user and group
limits work.
Chapter 5
Project limits are applied independently of user and group limits.
Example 5-18: Project limits are applied without regard to user and group limits
Given the following:
• Project A has a limit of 2 jobs
• Bob has an individual limit of 4 jobs
• Bob's group has a limit of 6 jobs
• Bob is running 2 jobs, both in Project A
This means:
• Bob cannot run any more jobs in Project A
5.15.1.7.ii Interactions Between Queue and Server Limits
If the limits for a queue and the server are different, the more restrictive limit applies.
Example 5-19: More restrictive queue or server limit applies
Given the following:
• Server limit on running jobs for generic users is 10
• Queue limit for running jobs from QueueA for generic users is 15
• Queue limit for running jobs from QueueB for generic users is 5
This means:
• Generic users at QueueA can run 10 jobs
• Generic users at QueueB can run 5 jobs
Example 5-20: More restrictive queue or server limit applies
Given the following:
• Bob's user limit on running jobs, set on the server, is 7
• Bob's user limit on running jobs, set on QueueA, is 6
This means:
• Bob can run 6 jobs from QueueA
5.15.1.8 Resource Usage Limit Attributes for Server and Queues
Each of the following attributes can be set at the server and each queue:
max_run
The maximum number of jobs that can be running.
PBS Professional 13.0 Administrator’s Guide
AG-399
Chapter 5
PBS Resources
max_run_soft
The soft limit on the maximum number of jobs that can be running.
max_run_res.<resource>
The maximum amount of the specified resource that can be allocated to running jobs.
max_run_res_soft.<resource>
The soft limit on the amount of the specified resource that can be allocated to running jobs.
max_queued
The maximum number of jobs that can be queued and running. At the server level,
this includes all jobs in the complex. Queueing a job includes the qsub and qmove
commands and the equivalent APIs.
max_queued_res.<resource>
The maximum amount of the specified resource that can be allocated to queued and
running jobs. At the server level, this includes all jobs in the complex. Queueing a
job includes the qsub and qmove commands and the equivalent APIs.
queued_jobs_threshold
The maximum number of jobs that can be queued. At the server level, this includes
all jobs in the complex. Queueing a job includes the qsub and qmove commands
and the equivalent APIs.
queued_jobs_threshold_res.<resource>
The maximum amount of the specified resource that can be allocated to queued jobs.
At the server level, this includes all jobs in the complex. Queueing a job includes the
qsub and qmove commands and the equivalent APIs.
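For example, a sketch like the following (the values are illustrative) caps the number of queued-plus-running jobs and the number of purely queued jobs across the complex; see section 5.15.1.9 for the full syntax:

Qmgr: set server max_queued = "[o:PBS_ALL=10000]"
Qmgr: set server queued_jobs_threshold = "[o:PBS_ALL=5000]"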
Each attribute above can be used to specify all of the following:
• An overall limit (at the queue or server)
• A limit for generic users
• Individual limits for specific users
• A limit for generic projects
• Individual limits for specific projects
• A limit for generic groups
• Individual limits for specific groups
For example, you can specify the limits for the number of running jobs (a qmgr sketch implementing these limits follows the list):
• In the complex:
   • The overall server limit (all usage in the entire complex) is 10,000
   • The limit for generic users is 5
   • The limit for Bob is 10
   • The limit for generic groups is 50
   • The limit for group GroupA is 75
   • The limit for generic projects is 25
   • The limit for Project A is 35
• At QueueA:
   • The overall queue limit (all usage in QueueA) is 200
   • The limit for generic users is 2
   • The limit for Bob is 1
   • The limit for generic groups is 3
   • The limit for group GroupA is 7
   • The limit for generic projects is 10
   • The limit for Project A is 15
• At QueueB:
   • The overall queue limit (all usage in QueueB) is 500
   • The limit for generic users is 6
   • The limit for Bob is 8
   • The limit for generic groups is 15
   • The limit for group GroupA is 11
   • The limit for generic projects is 20
   • The limit for Project A is 30
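As a sketch, these limits could be set with the max_run attribute as follows (the user name bob is illustrative; the syntax is described in the next section):

Qmgr: set server max_run = "[o:PBS_ALL=10000], [u:PBS_GENERIC=5], [u:bob=10], [g:PBS_GENERIC=50], [g:GroupA=75], [p:PBS_GENERIC=25], [p:ProjectA=35]"
Qmgr: set queue QueueA max_run = "[o:PBS_ALL=200], [u:PBS_GENERIC=2], [u:bob=1], [g:PBS_GENERIC=3], [g:GroupA=7], [p:PBS_GENERIC=10], [p:ProjectA=15]"
Qmgr: set queue QueueB max_run = "[o:PBS_ALL=500], [u:PBS_GENERIC=6], [u:bob=8], [g:PBS_GENERIC=15], [g:GroupA=11], [p:PBS_GENERIC=20], [p:ProjectA=30]"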
5.15.1.9  How to Set Limits at Server and Queues
You can set, add, and remove limits by using the qmgr command to set limit attributes.
5.15.1.9.i  Syntax
Format for setting a limit attribute:
set server <limit attribute> = "[limit-spec=<limit>], [limit-spec=<limit>], ..."
set queue <queue name> <limit attribute> = "[limit-spec=<limit>], [limit-spec=<limit>], ..."
Format for adding a limit to an attribute:
set server <limit attribute> += "[limit-spec=<limit>], [limit-spec=<limit>], ..."
set queue <queue name> <limit attribute> += "[limit-spec=<limit>], [limit-spec=<limit>], ..."
Format for removing a limit from an attribute; note that the value for <limit> need not be specified when removing a limit:
set server <limit attribute> -= "[limit-spec], [limit-spec], ..."
set queue <queue name> <limit attribute> -= "[limit-spec], [limit-spec], ..."
Alternate format for removing a limit from an attribute; note that the value of <limit> used when removing a limit must match the value of the limit:
set server <limit attribute> -= "[limit-spec=<limit>], [limit-spec=<limit>], ..."
set queue <queue name> <limit attribute> -= "[limit-spec=<limit>], [limit-spec=<limit>], ..."
where limit-spec specifies an overall limit, or a user, group, or project limit:
Table 5-9: Specifying Limits

Limit                    limit-spec
Overall limit            o:PBS_ALL
Generic users            u:PBS_GENERIC
An individual user       u:<username>
Generic groups           g:PBS_GENERIC
An individual group      g:<group name>
Generic projects         p:PBS_GENERIC
An individual project    p:<project name>
The limit-spec can contain spaces anywhere except after the colon (“:”).
If there are comma-separated limit-specs, the entire string must be enclosed in double quotes.
A username, group name, or project name containing spaces must be enclosed in quotes.
If a username, group name, or project name is quoted using double quotes, and the entire
string requires quotes, the outer enclosing quotes must be single quotes. Similarly, if the inner
quotes are single quotes, the outer quotes must be double quotes.
PBS_ALL is a keyword which indicates that this limit applies to the usage total.
PBS_GENERIC is a keyword which indicates that this limit applies to generic users, generic groups, or generic projects.
When removing a limit, the limit value does not need to be specified.
PBS_ALL and PBS_GENERIC are case-sensitive.
5.15.1.9.ii  Examples of Setting Server and Queue Limits
Example 5-21: To set the max_queued limit on QueueA to 5 for total usage, and to limit
user bill to 3:
Qmgr: s q QueueA max_queued = "[o:PBS_ALL=5], [u:bill =3]"
Example 5-22: On QueueA, set the maximum number of CPUs and the maximum amount of
memory that user bill can request in his queued jobs:
Qmgr: s q QueueA max_queued_res.ncpus ="[u:bill=5]",
max_queued_res.mem = "[u:bill=100mb]"
Example 5-23: To set a limit for a username with a space in it, and to set a limit for generic
groups:
Qmgr: s q QueueA max_queued = '[u:"\PROG\Named User" = 1], [g:PBS_GENERIC=4]'
Example 5-24: To set a generic server limit for projects, and an individual server limit for
Project A:
Qmgr: set server max_queued = '[p:PBS_GENERIC=6], [p:ProjectA=8]'
5.15.1.9.iii  Examples of Adding Server and Queue Limits
Example 5-25: To add an overall limit for the maximum number of jobs that can be queued at
QueueA to 10:
Qmgr: s q QueueA max_queued += [o:PBS_ALL=10]
Example 5-26: To add an individual user limit, an individual group limit, and a generic group
limit on queued jobs at QueueA:
Qmgr: s q QueueA max_queued += "[u:user1= 5],
[g:GroupMath=5],[g:PBS_GENERIC=2]"
Example 5-27: To add a limit at QueueA on the number of CPUs allocated to queued jobs for
an individual user, and a limit at QueueA on the amount of memory allocated to queued
jobs for an individual user:
Qmgr: s q QueueA max_queued_res.ncpus += [u:tom=5],
max_queued_res.mem += [u:tom=100mb]
Example 5-28: To add an individual server limit for Project B:
Qmgr: set server max_queued += [p:ProjectB=4]
5.15.1.9.iv
Examples of Removing Server and Queue Limits
It is not necessary to specify the value of the limit when removing a limit, but you can specify
the value of the limit.
Example 5-29: To remove the generic user limit at QueueA for queued jobs, use either of the
following:
Qmgr: set queue QueueA max_queued -= [u:PBS_GENERIC]
Qmgr: set queue QueueA max_queued -= [u:PBS_GENERIC=2]
Example 5-30: To remove the limit on queued jobs at QueueA for Named User, use either of
the following:
Qmgr: set queue QueueA max_queued -= [u:"\PROG\Named User"]
Qmgr: set queue QueueA max_queued -= [u:"\PROG\Named User"=1]
Example 5-31: To remove the limit at QueueA on the amount of memory allocated to an individual user, use either of the following:
Qmgr: set queue QueueA max_queued_res.mem -= [u:tom]
Qmgr: set queue QueueA max_queued_res.mem -= [u:tom=100mb]
To remove the limit on the number of CPUs allocated to queued jobs for user bill, use either
of the following:
Qmgr: set queue QueueA max_queued_res.ncpus -= [u:bill]
Qmgr: set queue QueueA max_queued_res.ncpus -= [u:bill=5]
Example 5-32: To remove a generic user limit and an individual user limit, use either of the
following:
Qmgr: set queue QueueA max_queued -= "[u:user1], [u:PBS_GENERIC]"
Qmgr: set queue QueueA max_queued -= "[u:user1=2], [u:PBS_GENERIC=4]"
Example 5-33: To remove the individual server limit for Project B, use either of the following:
Qmgr: set server max_queued -= [p:ProjectB]
Qmgr: set server max_queued -= [p:ProjectB=4]
5.15.1.10  Who Can Set Limits at Server and Queues
As with other server and queue attributes, only PBS Managers and Operators can set limit
attributes.
5.15.1.11  Viewing Server and Queue Limit Attributes
5.15.1.11.i  Printing Server and Queue Limit Attributes
You can use the qmgr command to print the commands used to set the limit attributes at the
server or queue.
Example 5-34: To print all the limit attributes for queue QueueA:
Qmgr: p q QueueA max_queued, max_queued_res
#
# Create queues and set their attributes.
#
# Create and define queue QueueA
#
create queue QueueA
set queue QueueA max_queued = "[o:PBS_ALL=10]"
set queue QueueA max_queued += "[u:PBS_GENERIC=2]"
set queue QueueA max_queued += "[u:bill=3]"
set queue QueueA max_queued += "[u:tom=15]"
set queue QueueA max_queued += "[u:user1=3]"
set queue QueueA max_queued += '[u:"\PROG\Named User"=1]'
set queue QueueA max_queued += "[g:PBS_GENERIC=2] "
set queue QueueA max_queued += "[g:GroupMath=5]"
set queue QueueA max_queued_res.ncpus = "[u:bill=5]"
set queue QueueA max_queued_res.ncpus += "[u:tom=5]"
set queue QueueA max_queued_res.mem = "[u:bill=100mb]"
set queue QueueA max_queued_res.mem += "[u:tom=100mb]"
5.15.1.11.ii  Listing Server and Queue Limit Attributes
You can use the qmgr command to list the limit attributes for the queue or server.
Example 5-35: To list the max_queued and max_queued_res attributes for QueueA:
Qmgr: l q QueueA max_queued, max_queued_res
Queue: QueueA
max_queued = [o:PBS_ALL=10]
max_queued = [g:PBS_GENERIC=2]
max_queued = [g:GroupMath=5]
max_queued = [u:PBS_GENERIC=2]
max_queued = [u:bill=3]
max_queued = [u:tom=15]
max_queued = [u:user1=3]
max_queued = [u:"\PROG\Named User"=1]
max_queued_res.ncpus = [u:bill=5]
max_queued_res.ncpus = [u:tom=5]
max_queued_res.mem = [u:bill=100mb]
max_queued_res.mem = [u:tom=100mb]
5.15.1.11.iii  Using the qstat Command to View Queue Limit Attributes
You can use the qstat command to see the limit attribute settings for the queue or server.
Example 5-36: To see the settings for the max_queued and max_queued_res limit
attributes for QueueA using the qstat command:
qstat -Qf QueueA
Queue: QueueA
...
max_queued = [o:PBS_ALL=10]
max_queued = [g:PBS_GENERIC=2]
max_queued = [g:GroupMath=5]
max_queued = [u:PBS_GENERIC=2]
max_queued = [u:bill=3]
max_queued = [u:tom=3]
max_queued = [u:cs=3]
max_queued = [u:"\PROG\Named User"=1]
max_queued_res.ncpus = [u:bill=5]
max_queued_res.ncpus = [u:tom=5]
max_queued_res.mem = [u:bill=100mb]
max_queued_res.mem = [u:tom=100mb]
5.15.1.12  How Server and Queue Limits Work
Affected jobs are jobs submitted by the user or group, or jobs belonging to a project, whose
limit has been reached. The following table shows what happens when a given limit is
reached:
Table 5-10: Actions Performed When Limits Are Reached

Running jobs
    No more affected jobs are run at this server or queue until the number of affected running jobs drops below the limit.
Queued jobs
    The queue does not accept any more affected jobs until the number of affected queued jobs drops below the limit. Affected jobs submitted directly to the queue are rejected. Affected jobs in a routing queue whose destination is this queue remain in the routing queue. If a job is requeued, the limit is ignored.
Resources for running jobs
    The queue does not run any more affected jobs until the limit would not be exceeded if the next affected job were to start.
Resources for queued jobs
    The queue does not accept any more affected jobs until the limit would not be exceeded if the next affected job were to start. Affected jobs submitted directly to the queue are rejected. Affected jobs in a routing queue whose destination is this queue remain in the routing queue.
5.15.1.13  Caveats and Advice for Server and Queue Limits
5.15.1.13.i  Avoiding Overflow
On PBS server platforms for which the native size of a long is less than 64 bits, you should refrain from defining a limit on a resource of type long whose cumulative sum over all queued jobs would exceed the storage capacity of the resource variable. For example, if each submitted job were to request 100 hours of the cput resource, overflow would occur on a 32-bit platform when 5965 jobs (which is (2^31 - 1) seconds / 360000 seconds per job) were queued.
5.15.1.13.ii  Ensuring That Limits Are Effective
In order for limits to be effective, each job must specify each limited resource. This can be accomplished using defaults; see section 5.9.3, “Specifying Job Default Resources”, on page 323. You can also use hooks; see Chapter 6, “Hooks”, on page 437.
5.15.1.13.iii  Array Jobs
An array job with N subjobs is considered to consume N times the amount of resources
requested when it was submitted. For example, if there is a server limit of 100 queued jobs,
no user would be allowed to submit an array job with more than 100 subjobs.
5.15.1.13.iv  Avoiding Job Rejection
Jobs are rejected when users, groups, or projects who have reached their limit submit a job in
the following circumstances:
• The job is submitted to the execution queue where the limit has been reached
• The job is submitted to the complex, and the server limit has been reached
If you wish to avoid having jobs be rejected, you can set up a routing queue as the default
queue. Set the server’s default_queue attribute to the name of the routing queue. See section
2.2.6, “Routing Queues”, on page 24.
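For example, a minimal sketch (the queue names are illustrative, and ExecQ is assumed to be an existing execution queue):

Qmgr: create queue RouteQ queue_type = route
Qmgr: set queue RouteQ route_destinations = ExecQ
Qmgr: set queue RouteQ enabled = True, started = True
Qmgr: set server default_queue = RouteQ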
5.15.1.13.v  Do Not Mix Old And New Limits
The new limit attributes are incompatible with the old limit attributes. See section 5.15.1.15,
“Old Limit Attributes: Server and Queue Resource Usage Limit Attributes Existing Before
Version 10.1”, on page 411. You cannot mix the use of old and new resource usage limit
attributes. This means that:
• If any old limit attribute is set, and you try to set a new limit attribute, you will get error 15141.
• If any new limit attribute is set, and you try to set an old limit attribute, you will get error 15141.
You must unset all of one kind in order to set any of the other kind.
5.15.1.13.vi  Do Not Limit Running Time
Beware creating limits such as max_run_res.walltime or max_run_res.max_walltime. The
results probably will not be useful. You will be limiting the amount of walltime that can be
requested by running jobs for a user, group, or project. For example, if you set a walltime
limit of 10 hours for group A, then group A cannot run one job requesting 5 hours and another
job requesting 6 hours.
5.15.1.14  Errors and Logging for Server and Queue Limits
5.15.1.14.i  Error When Setting Limit Attributes
Attempting to set a new limit attribute while an old limit attribute is set:
"use new/old qmgr syntax, not both"
"Attribute name <new> not allowed. Older name <old> already set"
Attempting to set an old limit attribute while a new limit attribute is set:
"use new/old qmgr syntax, not both"
"Attribute name <old> not allowed: Newer name <new> already set"
5.15.1.14.ii  Logging Events
Whenever a limit attribute is set or modified, the server logs the event, listing which attribute
was modified and who modified it.
Whenever a limit is reached, and would be exceeded by a job, the scheduler logs the event,
listing the limit attribute and the reason.
5.15.1.14.iii  Queued Limit Error Messages
When a limit for queued jobs or resources allocated to queued jobs is reached, the command
involved presents a message. This command can be qsub, qmove or qalter.
5.15.1.14.iv  Run Limit Error Messages
See “Run Limit Error Messages” on page 461 of the PBS Professional Reference Guide for a
list of run limit error messages.
5.15.1.15  Old Limit Attributes: Server and Queue Resource Usage Limit Attributes Existing Before Version 10.1
The old server and queue limit attributes discussed here existed in PBS Professional before
Version 10.1. The old limit attributes continue to function as they did in PBS Professional
10.0. These attributes are incompatible with the limit attributes introduced in Version 10.1.
See section 5.15.1.13.v, “Do Not Mix Old And New Limits”, on page 410 and section
5.15.1.14.i, “Error When Setting Limit Attributes”, on page 410. These limits are compatible
with the limits discussed in section 5.15.2, “Limiting Number of Jobs at Vnode”, on page 413.
The following table shows how the old limit attributes are used:
Table 5-11: Resource Usage Limits Existing Before Version 10.1

Limit                                        Overall Limit  Generic Users      Generic Groups      Individual Users  Individual Group
Maximum number of running jobs               max_running    max_user_run       max_group_run       N/A               N/A
Maximum number of running jobs (soft limit)  N/A            max_user_run_soft  max_group_run_soft  N/A               N/A
Maximum amount of specified resource
allocated to running jobs                    N/A            max_user_res       max_group_res       N/A               N/A
Maximum amount of specified resource
allocated to running jobs (soft limit)       N/A            max_user_res_soft  max_group_res_soft  N/A               N/A
Maximum number of queued jobs                max_queuable   N/A                N/A                 N/A               N/A
Maximum amount of specified resource
allocated to queued jobs                     N/A            N/A                N/A                 N/A               N/A
5.15.1.15.i  Precedence of Old Limits
If an old limit is defined at both the server and queue, the more restrictive limit applies.
5.15.1.15.ii  Old Server Limits
For details of these limits, see “Server Attributes” on page 332 of the PBS Professional Reference Guide.
max_running
The maximum number of jobs allowed to be selected for execution at any given
time.
max_group_res,
max_group_res_soft
The maximum amount of the specified resource that all members of the same UNIX
group may consume simultaneously.
max_group_run,
max_group_run_soft
The maximum number of jobs owned by a UNIX group that are allowed to be running from this server at one time.
max_user_res,
max_user_res_soft
The maximum amount of the specified resource that any single user may consume.
max_user_run,
max_user_run_soft
The maximum number of jobs owned by a single user that are allowed to be running
at one time.
5.15.1.15.iii  Old Queue Limits
For details of these limits, see “Queue Attributes” on page 371 of the PBS Professional Reference Guide.
max_group_res,
max_group_res_soft
The maximum amount of the specified resource that all members of the same UNIX
group may consume simultaneously, in the specified queue.
max_group_run,
max_group_run_soft
The maximum number of jobs owned by a UNIX group that are allowed to be running from this queue at one time.
max_queuable
The maximum number of jobs allowed to reside in the queue at any given time. Once
this limit is reached, no new jobs will be accepted into the queue.
max_user_res,
max_user_res_soft
The maximum amount of the specified resource that any single user may consume in
submitting to this queue.
max_user_run,
max_user_run_soft
The maximum number of jobs owned by a single user that are allowed to be running
at one time from this queue.
5.15.2  Limiting Number of Jobs at Vnode
You can set limits at each vnode on the number of jobs that can be run by any user, by any
group, or by everyone taken together. You set these limits by specifying values for vnode
attributes. They are listed here:
max_group_run
The maximum number of jobs owned by any users in a single group that are allowed
to be run on this vnode at one time.
Format: integer
Qmgr: set node MyNode max_group_run=8
max_running
The maximum number of jobs allowed to be run on this vnode at any given time.
Format: integer
Qmgr: set node MyNode max_running=22
max_user_run
The maximum number of jobs owned by a single user that are allowed to be run on
this vnode at one time.
Format: integer
Qmgr: set node MyNode max_user_run=4
5.15.3  Placing Resource Limits on Jobs
Jobs are assigned limits on the amount of resources they can use. Each limit is set at the
amount requested or allocated by default. These limits apply to how much the job can use on
each vnode (per-chunk limit) and to how much the whole job can use (job-wide limit). Limits
are derived from both requested resources and applied default resources. For information on
default resources, see section 5.9.3, “Specifying Job Default Resources”, on page 323.
Each chunk's per-chunk limits determine how much of any resource can be used in that chunk.
Per-chunk resource usage limits are the amount of per-chunk resources requested, both from
explicit requests and from defaults.
The consumable resources requested for chunks in the select specification are summed, and
this sum makes a job-wide limit. Job resource limits from sums of all chunks override those
from job-wide defaults and resource requests.
Job resource limits set a limit for per-job resource usage. Various limit checks are applied to
jobs. If a job's job resource limit exceeds queue or server restrictions, it will not be put in the
queue or accepted by the server. If, while running, a job exceeds its limit for a consumable or
time-based resource, it will be terminated.
5.15.3.1  How Limits Are Derived
Job resource limits are derived in this order from the following:
1. Explicitly requested job-wide resources (e.g. -l resource=value)
2. The following built-in chunk-level resources in the job’s select specification (e.g. -l select=...): accelerator_memory, mem, mpiprocs, naccelerators, ncpus, netwins, nodect, vmem
3. The server’s default_qsub_arguments attribute
4. The queue’s resources_default.<resource>
5. The server’s resources_default.<resource>
6. The queue’s resources_max.<resource>
7. The server’s resources_max.<resource>
The server’s default_chunk.<resource> does not affect job-wide limits.
You can use a hook to set a per-chunk limit, using any hook that operates on jobs, such as a
job submission hook, a modify job hook, etc.
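As a worked example (the request is illustrative): a job submitted with qsub -l select=2:ncpus=2:mem=1gb and no job-wide resource request gets job-wide limits of ncpus=4 and mem=2gb, the sums over its two chunks, while each chunk’s per-chunk limits are 2 CPUs and 1GB.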
5.15.3.2  Configuring Per-job Limits at Server and Queue
You can set per-job limits on the amount of each resource that any one job can use. You can
set these limits at the server and at each queue. For example, you can specify the following
limits:
• Jobs at the server can use no more than 48 hours of CPU time
• Jobs at QueueA can use no more than 12 hours of CPU time
• Jobs at QueueA must request more than 2 hours of CPU time
To set these limits, specify values for the server’s resources_max.<resource> attribute and
each queue’s resources_max.<resource> and resources_min.<resource> attributes. The
server does not have a resources_min.<resource> attribute. To set the maximum at the
server, the format is:
Qmgr: set server resources_max.<resource> = value
To set the maximum and minimum at the queue, the format is:
Qmgr: set queue <queue name> resources_max.<resource> = value
Qmgr: set queue <queue name> resources_min.<resource> = value
For example, to set the 48 hour CPU time limit:
Qmgr: set server resources_max.cput = 48:00:00
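Similarly, a sketch for the QueueA limits above (expressing “more than 2 hours” as a 2-hour minimum):

Qmgr: set queue QueueA resources_max.cput = 12:00:00
Qmgr: set queue QueueA resources_min.cput = 02:00:00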
5.15.3.2.i  Running Time Limits at Server and Queues
For non-shrink-to-fit jobs, you can set limits on walltime at the server or queue. To set a walltime limit for non-shrink-to-fit jobs at the server or a queue, use resources_max.walltime and resources_min.walltime.
For shrink-to-fit jobs, running time limits are applied to max_walltime and min_walltime, not
walltime. To set a running time limit for shrink-to-fit jobs, you cannot use resources_max or
resources_min for max_walltime or min_walltime. Instead, use resources_max.walltime
and resources_min.walltime. See section 4.8.41.6, “Shrink-to-fit Jobs and Resource Limits”,
on page 283.
5.15.3.3  Configuring Per-job Resource Limit Enforcement at Vnodes
For a job, enforcement of resource limits is per-MoM, not per-vnode. So if a job requests 3
chunks, each of which has 1MB of memory, and all chunks are placed on one host, the limit
for that job for memory for that MoM is 3MB. Therefore one chunk can be using 2MB and the other two 0.5MB each, and the job can continue to run.
Job resource limits can be enforced for single-vnode jobs, or for multi-vnode jobs that are
using LAM or a PBS-aware MPI. See the following table for an overview. Memory limits
are handled differently depending on the operating system. See "Job Memory Limit Enforcement on UNIX” on page 418. The ncpus limit can be adjusted in several ways. See "Job
ncpus Limit Enforcement” on page 420 for a discussion. The following table summarizes
how resource limits are enforced at vnodes:
Table 5-12: Resource Limit Enforcement at Vnodes
Limit      What determines when limit is enforced                  Scope of limit  Enforcement method
file size  automatically                                           per-process     setrlimit()
vmem       If job requests or inherits vmem                        job-wide        MoM poll
pvmem      If job requests or inherits pvmem                       per-process     setrlimit()
pmem       If job requests or inherits pmem                        per-process     setrlimit()
pcput      If job requests or inherits pcput                       per-process     setrlimit()
cput       If job requests or inherits cput                        job-wide        MoM poll
walltime   If job requests or inherits walltime                    job-wide        MoM poll
mem        If $enforce mem in MoM’s config                         job-wide        MoM poll
ncpus      If $enforce cpuaverage, $enforce cpuburst, or both,     job-wide        MoM poll
           in MoM’s config; see "Job ncpus Limit Enforcement"
           on page 420
5.15.3.4  Job Memory Limit Enforcement
You may wish to prevent jobs from swapping memory. To prevent this, you can set limits on
the amount of memory a job can use. Then the job must request an amount of memory equal
to or smaller than the amount of physical memory available.
PBS measures and enforces memory limits in two ways:
• On each host, by setting OS-level limits, using the limit system calls
• By periodically summing the usage recorded in the /proc entries
Enforcement of mem is dependent on the following:
• Adding $enforce mem to the MoM's config file
• The job requesting or inheriting a default value for mem
You can configure default qsub parameters in the default_qsub_arguments server attribute,
or set memory defaults at the server or queue. See section 5.9.3, “Specifying Job Default
Resources”, on page 323.
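A minimal sketch of enabling mem enforcement (the 2gb default is illustrative): add the following line to PBS_HOME/mom_priv/config and HUP or restart the MoM:

$enforce mem

Then set a default so that jobs which do not request mem inherit a value:

Qmgr: set server resources_default.mem = 2gb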
5.15.3.4.i  Job Memory Limit Enforcement on UNIX
By default, memory limits are not enforced. To enforce mem resource usage, put $enforce
mem into MoM’s config file, and set defaults for mem so that each job inherits a value if it
does not request it.
The mem resource can be enforced at both the job level and the vnode level. The job-wide
limit is the smaller of a job-wide resource request and the sum of that for all chunks. The
vnode-level limit is the sum for all chunks on that host.
Job-wide limits are enforced by MoM polling the working set size of all processes in the job’s
session. Jobs that exceed their specified amount of physical memory are killed. A job may
exceed its limit for the period between two polling cycles. See section 3.6.1, “Configuring
MoM Polling Cycle”, on page 57.
Per-process limits are enforced by the operating system kernel. PBS calls the kernel call
setrlimit() to set the limit for the top process (the shell), and any process started by the
shell inherits those limits. PBS does not know whether the kernel kills a process for exceeding the limit.
If a user submits a job with a job limit, but not per-process limits (qsub -l cput=10:00)
then PBS sets the per-process limit to the same value. If a user submits a job with both job
and per-process limits, then the per-process limit is set to the lesser of the two values.
Example: a job is submitted with qsub -l cput=10:00
• There are two CPU-intensive processes which use 5:01 each. The job will be killed by PBS for exceeding the cput limit: 5:01 + 5:01 is greater than 10:00.
• There is one CPU-intensive process which uses 10:01. It is very likely that the kernel will detect it first.
• There is one process that uses 0:02 and another that uses 10:00. PBS may or may not catch it before the kernel does, depending on exactly when the polling takes place.
If a job is submitted with a pmem limit, or without pmem but with a mem limit, PBS uses the setrlimit(2) call to set the limit. For most operating systems, setrlimit() is called with RLIMIT_RSS, which limits the Resident Set (working set) size. This is not a hard limit, but advice to the kernel; a process that exceeds it becomes a prime candidate to have memory pages reclaimed.
If vmem is specified and no single process exceeds that limit, but the total usage by all the
processes in the job does, then PBS enforces the vmem limit, but not the pvmem limit, and
logs a message. PBS uses MoM polling to enforce vmem.
The limit for pmem is enforced if the job specifies, or inherits a default value for, pmem.
When pmem is enforced, the limit is set to the smaller of mem and pmem. Enforcement is
done by the kernel, and applies to any single process in the job.
The limit for pvmem is enforced if the job specifies, or inherits a default value for, pvmem.
When pvmem is enforced, the limit is set to the smaller of vmem and pvmem. Enforcement
is done by the kernel, and applies to any single process in the job.
The following table shows which OS resource limits can be used by each operating system.
Table 5-13: RLIMIT Usage in PBS Professional

OS        file          mem/pmem     vmem/pvmem    cput/pcput
AIX       RLIMIT_FSIZE  RLIMIT_RSS   RLIMIT_DATA,  RLIMIT_CPU
                                     RLIMIT_STACK
HP-UX     RLIMIT_FSIZE  RLIMIT_RSS   RLIMIT_AS     RLIMIT_CPU
Linux     RLIMIT_FSIZE  RLIMIT_RSS   RLIMIT_AS     RLIMIT_CPU
SunOS     RLIMIT_FSIZE  ignored      RLIMIT_DATA,  RLIMIT_CPU
                                     RLIMIT_VMEM,
                                     RLIMIT_STACK
Super-UX  RLIMIT_FSIZE  RLIMIT_UMEM  RLIMIT_DATA,  RLIMIT_CPU
                                     RLIMIT_STACK
Note that RLIMIT_RSS, RLIMIT_UMEM, and RLIMIT_VMEM are not standardized (i.e. do
not appear in the Open Group Base Specifications Issue 6).
5.15.3.4.ii  Sun Solaris-specific Memory Enforcement
Solaris does not support RLIMIT_RSS, but instead has RLIMIT_DATA and
RLIMIT_STACK, which are hard limits. On Solaris or another Open Group standards-compliant OS, a malloc() call that exceeds the limit will return NULL. This behavior is different
from other operating systems and may result in the program (such as a user’s application)
receiving a SIGSEGV signal.
5.15.3.4.iii  Memory Enforcement on cpusets
On machines using cpusets, there should be no need to configure additional memory enforcement: either the vnode containing the memory in question has been allocated exclusively (in which case no other job will be allocated this vnode, hence this memory) or the vnode is shareable (in which case using mem_exclusive would prevent two CPU sets from sharing the memory). Essentially, PBS enforces the equivalent of mem_exclusive by itself.
5.15.3.5  Job ncpus Limit Enforcement
Enforcement of the ncpus limit (number of CPUs used) is available on all platforms. The
ncpus limit can be enforced using average CPU usage, burst CPU usage, or both. By default,
enforcement of the ncpus limit is off. See “$enforce <limit>” on page 287 of the PBS Professional Reference Guide.
5.15.3.5.i  Average CPU Usage Enforcement
Each MoM enforces cpuaverage independently, per MoM, not per vnode. To enforce average CPU usage, put $enforce cpuaverage in MoM’s config file. You can set the values of three variables to control how the average is enforced. These are shown in the
following table.
Table 5-14: Variables Used in Average CPU Usage

Variable              Type     Description                                           Default
cpuaverage            Boolean  If present (=True), MoM enforces ncpus when the       False
                               average CPU usage over the job’s lifetime is
                               greater than the specified limit.
average_trialperiod   integer  Modifies cpuaverage. Minimum job walltime before      120
                               enforcement begins. Seconds.
average_percent_over  integer  Modifies cpuaverage. Percentage by which the job      50
                               may exceed the ncpus limit.
average_cpufactor     float    Modifies cpuaverage. The ncpus limit is multiplied    1.025
                               by this factor to produce the actual limit.
Enforcement of cpuaverage is based on the polled sum of CPU time for all processes in the
job. The limit is checked each poll period. Enforcement begins after the job has had
average_trialperiod seconds of walltime. Then, the job is killed if the following is true:
(cput / walltime) > (ncpus * average_cpufactor + average_percent_over / 100)
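As a worked example with the default values above: for a job with ncpus=4, the threshold is 4 * 1.025 + 50/100 = 4.6, so after the 120-second trial period the job is killed once its lifetime average cput/walltime exceeds 4.6 CPUs.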
5.15.3.5.ii  CPU Burst Usage Enforcement
To enforce burst CPU usage, put $enforce cpuburst in MoM’s config file. You can
set the values of four variables to control how the burst usage is enforced. These are shown in
the following table.
Table 5-15: Variables Used in CPU Burst

Variable            Type     Description                                           Default
cpuburst            Boolean  If present (=True), MoM enforces ncpus when CPU       False
                             burst usage exceeds the specified limit.
delta_percent_over  integer  Modifies cpuburst. Percentage over the limit to       50
                             be allowed.
delta_cpufactor     float    Modifies cpuburst. The ncpus limit is multiplied      1.5
                             by this factor to produce the actual limit.
delta_weightup      float    Modifies cpuburst. Weighting factor for smoothing     0.4
                             burst usage when the average is increasing.
delta_weightdown    float    Modifies cpuburst. Weighting factor for smoothing     0.1
                             burst usage when the average is decreasing.
MoM calculates an integer value called cpupercent each polling cycle. This is a moving
weighted average of CPU usage for the cycle, given as the average percentage usage of one
CPU. For example, a value of 50 means that during a certain period, the job used 50 percent
of one CPU. A value of 300 means that during the period, the job used an average of three
CPUs.
new_percent = change_in_cpu_time*100 / change_in_walltime
weight = delta_weight[up|down] * walltime/max_poll_period
new_cpupercent = (new_percent * weight) + (old_cpupercent * (1-weight))
delta_weight_up is used if new_percent is higher than the old cpupercent value.
delta_weight_down is used if new_percent is lower than the old cpupercent value.
delta_weight_[up|down] controls the speed with which cpupercent changes. If
delta_weight_[up|down] is 0.0, the value for cpupercent does not change over time. If it is
1.0, cpupercent will take the value of new_percent for the poll period. In this case cpupercent changes quickly.
However, cpupercent is controlled so that it stays at the greater of the average over the entire
run or ncpus*100.
max_poll_period is the maximum time between samples, set in MoM’s config file by
$max_check_poll, with a default of 120 seconds.
The job is killed if the following is true:
new_cpupercent > ((ncpus * 100 * delta_cpufactor) + delta_percent_over)
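As a worked example: with ncpus=4, delta_cpufactor=1.5, and delta_percent_over=50, the threshold is (4 * 100 * 1.5) + 50 = 650, so the job is killed once its smoothed burst usage exceeds an average of 6.5 CPUs.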
The following entries in MoM’s config file turn on enforcement of both average and burst
with the default values:
$enforce cpuaverage
$enforce cpuburst
$enforce delta_percent_over 50
$enforce delta_cpufactor 1.05
$enforce delta_weightup 0.4
$enforce delta_weightdown 0.1
$enforce average_percent_over 50
$enforce average_cpufactor 1.025
$enforce average_trialperiod 120
The cpuburst and cpuaverage information show up in MoM's log file, whether or not they
have been configured in mom_priv/config. This is so a site can test different parameters
for cpuburst/cpuaverage before enabling enforcement. You can see the effect of any
change to the parameters on your job mix before "going live".
Note that if the job creates a child process whose usage is not tracked by MoM during its lifetime, CPU usage can appear to jump dramatically when the child process exits. This is
because the CPU time for the child process is assigned to its parent when the child process
exits. MoM may see a big jump in cpupercent, and kill the job.
5.15.3.5.iii  Job Memory Limit Restrictions
Enforcement of mem resource usage is available on all UNIX platforms, but not Windows.
5.15.3.6  Changing Job Limits
The qalter command is used to change job limits, with these restrictions:
• A non-privileged user may only lower the limits for job resources
• A Manager or Operator may lower or raise requested resource limits, except for per-process limits such as pcput and pmem, because these are set when the process starts, and enforced by the kernel
• When you lengthen the walltime of a running job, make sure that the new walltime will not interfere with any existing reservations, etc.
See “qalter” on page 135 of the PBS Professional Reference Guide.
5.15.4  Limiting the Number of Jobs in Queues
If you limit the number of jobs in execution queues, you can speed up the scheduling cycle.
You can set an individual limit on the number of jobs in each queue, or a limit at the server,
and you can apply these limits to generic and individual users, groups, and projects, and to
overall usage. You specify this limit by setting the queued_jobs_threshold queue or server
attribute. See section 5.15.1.9, “How to Set Limits at Server and Queues”, on page 401.
If you set a limit on the number of jobs that can be queued in execution queues, we recommend that you have users submit jobs to a routing queue only, and route jobs to the execution
queue as space becomes available. See section 4.8.39, “Routing Jobs”, on page 272.
5.16  Where Resource Information Is Kept
Definitions and values for PBS resources are kept in the following files, attributes, and parameters. Attributes specifying resource limits are not listed here. They are listed in section
5.15.1.8, “Resource Usage Limit Attributes for Server and Queues”, on page 399 and section
5.15.1.15, “Old Limit Attributes: Server and Queue Resource Usage Limit Attributes Existing
Before Version 10.1”, on page 411.
5.16.1  Files
PBS_HOME/server_priv/resourcedef
Contains definitions of custom resources. Format:
<resource name> [type=<type>] [flag=<flags>]
Example:
LocalScratch type=long, flag=h
FloatLicense type=long
SharedScratch type=long
See section 5.14.2.8, “The resourcedef File”, on page 348.
PBS_HOME/sched_priv/sched_config
resources: line
In order for the scheduler to be able to schedule using a resource, the resource must be listed in the resources: line. Format:
resources: "<resource name>, [<resource name>, ...]"
Example:
resources: "ncpus, mem, arch, [...], LocalScratch, FloatLicense, SharedScratch"
The only exception is host-level Boolean resources, which do not need to appear in the resources: line.
server_dyn_res: line
Each dynamic server resource must be listed in its own server_dyn_res: line. Format:
server_dyn_res: "<resource name> !<path to script/command>"
Example:
server_dyn_res: "SharedScratch !/usr/local/bin/serverdynscratch.pl"
mom_resources: line
Dynamic host resources must be listed in the mom_resources: line. Format:
mom_resources: "<resource name>"
Example:
mom_resources: "LocalScratch"
PBS_HOME/mom_priv/config
Contains MoM configuration parameters and any local resources. Format:
<resource name> !<path to script/command>
Example:
LocalScratch !/usr/local/bin/localscratch.pl
See “MoM Parameters” on page 283 of the PBS Professional Reference Guide.
5.16.2  MoM Configuration Parameters
$cputmult <factor>
This sets a factor used to adjust CPU time used by each job. This allows adjustment
of time charged and limits enforced where jobs run on a system with different CPU
performance. If MoM’s system is faster than the reference system, set factor to a
decimal value greater than 1.0. For example:
$cputmult 1.5
If MoM’s system is slower, set factor to a value between 1.0 and 0.0. For example:
$cputmult 0.75
$wallmult <factor>
Each job’s walltime usage is multiplied by this factor. For example:
$wallmult 1.5
5.16.3  Server Attributes
default_chunk
The list of resources which will be inserted into each chunk of a job’s select specification if the corresponding resource is not specified by the user. This provides a
means for a site to be sure a given resource is properly accounted for even if not
specified by the user.
Format: String.
Usage:
default_chunk.<resource>=<value>,default_chunk.<resource>=<value>,...
Default: None
default_qsub_arguments
Arguments that are automatically added to the qsub command: any valid arguments to the qsub command, such as job attributes. Setting a job attribute via default_qsub_arguments sets that attribute for each job which does not explicitly override it. See qsub(1B). Settable by the administrator via the qmgr command. Overrides standard defaults. Overridden by arguments given on the command line and in script directives.
Example:
Qmgr: set server default_qsub_arguments="-r y -N MyJob"
Format: String
Default: None
resources_available.<resource name>
The list of available resources and their values defined on the server. Each resource
is listed on a separate line.
Format: String.
Form: resources_available.<resource>=<value>
Default: None
resources_default.<resource name>
The list of default resource values that are set as limits for jobs in this complex when
a) the job does not specify a limit, and b) there is no queue default.
Format: String.
Form: resources_default.resource_name=value[,...]
Default: None
resources_assigned.<resource name>
The total of each type of consumable resource allocated to running jobs and started
reservations in this complex. Read-only.
Format: String.
Form: resources_assigned.<res>=<val>[,resources_assigned.<res>=<val>,...]
Default: None
5.16.4  Reservation Attributes
Resource_List.<resource name>
The list of resources allocated to the reservation. Jobs running in the reservation cannot use in aggregate more than the specified amount of a resource.
Format: String
Form: Resource_List.<res>=<val>, Resource_List.<res>=<val>, ...
Default: None
5.16.5  Queue Attributes
default_chunk.<resource name>
The list of resources which will be inserted into each chunk of a job’s select specification if the corresponding resource is not specified by the user. This provides a
means for a site to be sure a given resource is properly accounted for even if not
specified by the user. Applies only to execution queues.
Format: String.
Form: default_chunk.<resource>=<value>, default_chunk.<resource>=<value>,
...
Default: None
resources_default.<resource name>
The list of default resource values which are set as limits for a job residing in this
queue and for which the job did not specify a limit. If not set, the default limit for a
job is determined by the first of the following attributes which is set: server’s
resources_default, queue’s resources_max, server’s resources_max. If none of these
is set, the job gets unlimited resource usage.
Format: String.
Form: resources_default.<resource name>=<value>,
resources_default.<resource_name>=<value>, ...
Default: None
resources_assigned.<resource name>
The total for each kind of consumable resource allocated to jobs running from this
queue. Read-only.
Format: String.
Form: resources_assigned.<res>=<val><newline>resources_assigned.<res>=<val><newline>...
Default: None
resources_available.<resource name>
The list of resources and amounts available to jobs running in this queue. The sum of
the resource of each type used by all jobs running from this queue cannot exceed the
total amount listed here. See “qmgr” on page 158 of the PBS Professional Reference
Guide.
Format: String.
Form: resources_available.<resource_name>=<value><newline>resources_available.<resource_name>=<value><newline>...
Default: None
5.16.6  Vnode Attributes
resources_available.<resource name>
The list of resources and the amounts available on this vnode. If not explicitly set,
the amount shown is that reported by the pbs_mom running on the vnode. If a
resource value is explicitly set, that value is retained across restarts.
Format: String.
Form: resources_available.<resource name>=<value>,
resources_available.<resource name> = <value>, ...
Default: None
sharing
Specifies whether more than one job at a time can use the resources of the vnode or
the vnode’s host. Either (1) the vnode or host is allocated exclusively to one job, or
(2) the vnode’s or host’s unused resources are available to other jobs. Can be set
using pbs_mom -s insert only. Behavior is determined by a combination of
the sharing attribute and a job’s placement directive. See “sharing” on page 389 of
the PBS Professional Reference Guide.
pcpus
The number of physical CPUs on the vnode. This is set to the number of CPUs
available when MoM starts. For a multiple-vnode MoM, only the natural vnode has
pcpus.
Format: Integer
Default: Number of CPUs on startup
resources_assigned.<resource name>
The total amount of each consumable resource allocated to jobs and started reservations running on this vnode. Applies only to execution queues. Read-only.
Format: String.
Form: resources_assigned.<resource>=<value>[,resources_assigned.<resource>=<value>, ...]
Default: None
5.16.7  Job Attributes
Resource_List.<resource name>
The list of resources required by the job. List is a set of <name>=<value> strings.
The meaning of name and value is dependent upon defined resources. Each value
establishes the limit of usage of that resource. If not set, the value for a resource may
be determined by a queue or server default established by the administrator. See section 5.9.2, “Resources Requested by Job”, on page 323.
Format: String.
Form: Resource_List.<res>=<value>, Resource_List.<res>=<value>, ...
Default: None
resources_used.<resource name>
The amount of each resource actually used by the job. Read-only.
Format: String
Form: List of <name>=<value> pairs: resources_used.<res>=<val>,
resources_used.<res>=<val>
5.17  Viewing Resource Information
You can see attribute values of resources for the server, queues, and vnodes using the qmgr or
pbsnodes commands. The value in the server, queue, or vnode resources_assigned
attribute is the amount explicitly requested by jobs and, at the server and vnodes, started reservations.
You can see job attribute values using the qstat command. The value in the job’s
Resource_List attribute is the amount explicitly requested by the job. See section 5.9.2,
“Resources Requested by Job”, on page 323.
The following table summarizes how to find resource information:
Table 5-16: How to View Resource Information

Location         Item to View                                    Command
server           default_chunk, default_qsub_arguments,          qmgr, qstat, pbsnodes
                 resources_available, resources_assigned,
                 resources_default
                 resourcedef file                                Favorite editor or viewer
scheduler        sched_config file                               Favorite editor or viewer
queues           default_chunk, resources_available,             qmgr, qstat
                 resources_assigned, resources_default
MoM and vnodes   resources_available, sharing, pcpus,            qmgr, pbsnodes
                 resources_assigned
                 mom_config file                                 Favorite editor or viewer
job              Resource_List                                   qstat
reservation      Resource_List                                   pbs_rstat -f
accounting       resources_assigned entry in accounting log      Favorite editor or viewer
Every consumable resource, for example mem, can appear in four PBS attributes. These
attributes are used in the following elements of PBS:
Table 5-17: Values Associated with Consumable Resources

Attribute            Server  Queue  Vnode  Scheduler  Job  Accounting Log
resources_available  X       X      X      X
resources_assigned   X       X      X                      X
resources_used                                        X    X
Resource_List                              X          X    X

5.17.1  Resource Information in Accounting Logs
You can see accounting values in the accounting log file. The accounting log S record is written at the start of a job, and the E record is written at the end of a job. The accounting log B
record is written at the beginning of a reservation.
Each consumable resource allocated to or taken up by a job is reported separately in a
resources_assigned accounting entry in the job’s E and S records. The
resources_assigned entry is not a job attribute; it is simply an entry in the accounting
log.
Consumable job resources actually used by the job are recorded in the job’s resources_used
attribute, and are reported in the accounting log.
The value reported in the resources_assigned accounting entry is the amount assigned
to a job or that a job prevents other jobs from using, which is different from the amount the
job requested and used. For example, if a job requests one CPU on an Altix that has four
CPUs per blade/vnode and that vnode is allocated exclusively to the job, even though the job
requested one CPU, it is assigned all 4 CPUs. In this example, resources_assigned
reports 4 CPUs, and resources_used reports 1 CPU.
Resources requested for a job are recorded in the job’s Resource_List attribute, and reported
in the accounting log E and S records for the job.
Resources requested for a reservation are recorded in the reservation’s Resource_List
attribute, and reported in the accounting log B record for the reservation.
5.17.2  Resource Information in Daemon Logs
At the end of each job, the server logs the values in the job’s resources_used attribute, at
event class 0x0010.
Upon startup, MoM logs the number of CPUs reported by the OS, at event class 0x0002.
At the end of each job, the MoM logs cput and mem used by each job, and cput used by each
job task, at event class 0x0100.
5.17.3  Finding Current Value
You can find the current value of a resource by subtracting the amount being used from the
amount that is defined.
Use the qstat -Bf command, and grep for resources_available.<resource> and
resources_used.<resource>. To find the current amount not being used, subtract
resources_used.<resource> from resources_available.<resource>.
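For example, a sketch for ncpus (assuming ncpus is the resource of interest):

qstat -Bf | grep -E 'resources_available\.ncpus|resources_used\.ncpus'

If resources_available.ncpus shows 64 and resources_used.ncpus shows 40, then 24 CPUs are currently unused.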
5.17.4  Restrictions on Viewing Resources
• Dynamic resources shown in qstat do not display the current value; they display the most recent retrieval. Dynamic resources have no resources_available.<resource> representation anywhere in PBS.
• Local static host-level resources cannot be viewed via qstat or managed via qmgr.
5.18  Resource Recommendations and Caveats
• It is not recommended to set the value for resources_available.ncpus. The exception is when you want to oversubscribe CPUs. See section 9.4.4.1.iii, “How To Share CPUs”, on page 885.
• It is not recommended to change the value of ncpus at vnodes on a multi-vnoded machine.
• If you want to limit how many jobs are run, or how much of each resource is used, use the new limits. See section 5.15, “Managing Resource Usage”, on page 388.
• It is not recommended to create local host-level resources by defining them in the MoM configuration file.
• On the Altix, do not set the values for mem, vmem or ncpus on the natural vnode. If any of these resources has been explicitly set to a non-zero value on the natural vnode, set resources_available.ncpus, resources_available.mem and resources_available.vmem to zero on each natural vnode.
• Do not attempt to set values for resources_available.<resource> for dynamic resources.
• Externally-managed licenses may not be available when PBS thinks they are. PBS doesn't actually check out externally-managed licenses; the application being run inside the job's session does that. Between the time that the scheduler queries for licenses, and the time the application checks them out, another application may take the licenses. In addition, some applications request varying amounts of tokens during a job run.
• Jobs may be placed on different vnodes from those where they would have run in earlier versions of PBS. This is because a job’s resource request will no longer match the same resources on the server, queues and vnodes.
• While users cannot request custom resources that are created with the r flag, jobs can inherit these as defaults from the server or queue resources_default.<resource> attribute.
• A qsub or pbs_rsub hook does not have resources inherited from the server or queue resources_default or default_chunk as an input argument.
• Resources assigned from the default_qsub_arguments server attribute are treated as if the user requested them. A job will be rejected if it requests a resource that has a resource permission flag, whether that resource was requested by the user or came from default_qsub_arguments. Be aware that creating custom resources with permission flags and then using these in the default_qsub_arguments server attribute can cause jobs to be rejected. See section 5.14.2.10, “Resource Permission Flags”, on page 351.
• Numeric dynamic resources cannot have the q or n flags set. This would cause these resources to be underused. These resources are tracked automatically by the scheduler.
• The behavior of several command-line interfaces is dependent on resource permission flags. These interfaces are those which view or request resources or modify resource requests:
pbsnodes
Users cannot view restricted host-level custom resources.
pbs_rstat
Users cannot view restricted reservation resources.
pbs_rsub
Users cannot request restricted custom resources for reservations.
qalter
Users cannot alter a restricted resource.
qmgr
Users cannot print or list a restricted resource.
qselect
Users cannot specify restricted resources via -l Resource_List.
qsub
Users cannot request a restricted resource.
qstat
Users cannot view a restricted resource.
• Do not set values for any resources, except those such as shared scratch space or floating licenses, at the server or a queue, because the scheduler will not allocate more than the specified value. For example, if you set resources_available.walltime at the server to 10:00:00, and one job requests 5 hours and one job requests 6 hours, only one job will be allowed to run at a time, regardless of other idle resources.
• If a job is submitted without a request for a particular resource, and no defaults for that resource are set at the server or queue, and either the server or queue has resources_max.<resource> set, the job inherits that maximum value. If the queue has resources_max.<resource> set, the job inherits the queue value, and if not, the job inherits the server value.
• When setting global static vnode resources on multi-vnode machines, follow the rules in section 3.5.2, “Choosing Configuration Method”, on page 52.
• Do not create custom resources with the same names or prefixes that PBS uses when creating custom resources for specific systems. See “Custom Cray Resources” on page 323 of the PBS Professional Reference Guide.
• Do not set resources_available.place for a vnode.
• Using dynamic host-level resources can slow the scheduler down, because the scheduler must wait for each resource-query script to run.
• On the natural vnode, all values for resources_available.<resource> should be zero (0), unless the resource is being shared among other vnodes via indirection.
• Default qsub arguments and server and queue defaults are applied to jobs at a coarse level. Each job is examined to see whether it requests a select and a place. This means that if you specify a default placement, such as excl, with -lplace=excl, and the user specifies an arrangement, such as pack, with -lplace=pack, the result is that the job ends up with -lplace=pack, NOT -lplace=pack:excl. The same is true for select; if you specify a default of -lselect=2:ncpus=1, and the user specifies -lselect=mem=2GB, the job ends up with -lselect=mem=2GB.
6  Hooks
Hooks are custom executables that can be run at specific points in the execution of PBS.
They accept, reject, or modify the upcoming action. This provides job filtering, patches,
MoM startup checks, workarounds, etc., and extends the capabilities of PBS, without the need
to modify source code.
This chapter describes how hooks can be used, how they work, the interface to hooks provided by the pbs module, how to create and deploy hooks, and how to get information about
hooks.
Please read the entire chapter, and the “Special Notes (Hooks)” section of the release notes,
before writing any hooks.
6.1 Chapter Contents
6.1	Chapter Contents . . . 435
6.2	Introduction to Hooks . . . 438
	6.2.1	Built-in Hooks . . . 438
6.3	Glossary . . . 439
6.4	Prerequisites and Requirements for Hooks . . . 440
6.5	Simple How-to for Writing Hooks . . . 441
	6.5.1	Writing Hooks: Basic Hook Structure . . . 442
	6.5.2	Example of Simple Hook . . . 444
	6.5.3	Importing Hook Configuration File . . . 444
	6.5.4	Creating and Importing Your Hook . . . 445
	6.5.5	Setting Attributes for Your Hook . . . 445
6.6	Uses for Hooks . . . 446
	6.6.1	Routing Jobs . . . 446
	6.6.2	Managing Resource Requests and Usage . . . 447
	6.6.3	Ensuring that Jobs Run Properly . . . 447
	6.6.4	Managing Job Output . . . 448
	6.6.5	Controlling Interactive Jobs . . . 448
	6.6.6	Helping Schedule Jobs . . . 448
	6.6.7	Communicating Information to Users . . . 448
	6.6.8	Managing User Activity . . . 448
	6.6.9	Enabling Accounting and Validation . . . 449
	6.6.10	Allocation Management . . . 449
	6.6.11	Managing Job Execution . . . 450
	6.6.12	Configuring Vnodes . . . 450
	6.6.13	Provisioning Vnodes . . . 450
	6.6.14	Enforcing Security . . . 450
	6.6.15	Accepting or Rejecting Job Task Attachment . . . 451
6.7	Hook Basics . . . 451
	6.7.1	Accepting or Rejecting Actions . . . 451
	6.7.2	When Hooks Run . . . 452
	6.7.3	Account Under Which Hooks Run . . . 457
	6.7.4	Where Hooks Run . . . 457
	6.7.5	Permissions and Location for Hooks . . . 457
	6.7.6	Failover . . . 458
	6.7.7	What Hooks Cannot Access or Do . . . 458
	6.7.8	What Hooks Should Not Do . . . 459
6.8	Creating and Configuring Hooks . . . 459
	6.8.1	Introduction to Creating and Configuring Hooks . . . 459
	6.8.2	Overview of Creating and Configuring a Hook . . . 460
	6.8.3	Creating Empty Hooks . . . 461
	6.8.4	Deleting Hooks . . . 461
	6.8.5	Setting Hook Trigger Events . . . 461
	6.8.6	Using Hook Configuration Files . . . 463
	6.8.7	Importing Hooks . . . 466
	6.8.8	Exporting Hooks . . . 468
	6.8.9	Setting and Unsetting Hook Attributes . . . 470
	6.8.10	Enabling and Disabling Hooks . . . 475
	6.8.11	Setting the Relative Order of Hook Execution . . . 476
	6.8.12	Setting Hook Timeout . . . 477
	6.8.13	Setting Hook Frequency . . . 477
	6.8.14	Setting Hook User Account . . . 478
6.9	Viewing Hook Information . . . 478
	6.9.1	Listing Hooks . . . 478
	6.9.2	Viewing Hook Contents . . . 478
	6.9.3	Printing Hook Creation Commands . . . 479
	6.9.4	Re-creating Hooks . . . 479
6.10	Writing Hook Scripts . . . 480
	6.10.1	How We Define and Refer to Objects and Methods . . . 480
	6.10.2	Recommended Hook Script Structure . . . 482
	6.10.3	Hook Alarm Calls and Unhandled Exceptions . . . 485
	6.10.4	Using Attributes and Resources in Hooks . . . 486
	6.10.5	Using select and place in Hooks . . . 508
	6.10.6	Offlining and Clearing Vnodes Using the fail_action Hook Attribute . . . 509
	6.10.7	Restarting Scheduler Cycle After Hook Failure . . . 510
	6.10.8	Adding Custom Non-consumable Host-level Resources . . . 510
	6.10.9	Printing And Logging Messages . . . 511
	6.10.10	Capturing Return Code . . . 512
	6.10.11	When You Need Persistent Data . . . 512
	6.10.12	Setting Up Job Environment on Sisters . . . 512
6.11	Advice and Caveats for Writing Hooks . . . 514
	6.11.1	Rules for Hook Access and Behavior . . . 514
	6.11.2	Check for Parameter Validity . . . 515
	6.11.3	Make Changes Only On Acceptance . . . 515
	6.11.4	Offline Vnodes when exechost_startup Hook Rejects . . . 516
	6.11.5	Minimize Unnecessary Steps . . . 516
	6.11.6	Use Fast Operations . . . 516
	6.11.7	Avoiding Interference with Normal Operation . . . 516
	6.11.8	Avoiding Problems . . . 518
	6.11.9	Local Server Only . . . 519
	6.11.10	Scheduling Impact of Hooks . . . 520
	6.11.11	Windows Caveats . . . 522
6.12	Interface to Hooks . . . 523
	6.12.1	The pbs Module . . . 523
	6.12.2	PBS Interface Objects . . . 523
	6.12.3	PBS Interface Object Types . . . 524
	6.12.4	Event Objects . . . 537
	6.12.5	Server Objects . . . 577
	6.12.6	Queue Objects . . . 581
	6.12.7	Job Objects . . . 583
	6.12.8	exec_vnode Object . . . 590
	6.12.9	Chunk Objects . . . 593
	6.12.10	Reservation Objects . . . 594
	6.12.11	Vnode Objects . . . 596
	6.12.12	Configuration File Objects . . . 599
	6.12.13	Constant Objects . . . 601
	6.12.14	Object Members and Methods . . . 602
6.13	Hook Examples . . . 617
6.14	Managing Built-in Hooks . . . 632
	6.14.1	Prerequisites . . . 632
	6.14.2	Allowed Operations . . . 633
	6.14.3	Editing and Importing Configuration Files for Built-in Hooks . . . 634
	6.14.4	Restrictions . . . 634
	6.14.5	Replacing a Built-in Hook with Your Own Hook . . . 634
	6.14.6	Errors and Logging when Operating on Built-in Hooks . . . 635
6.15	Python Modules and PBS . . . 635
	6.15.1	Modifying Python Modules . . . 635
	6.15.2	List of Modules in pbs_python . . . 636
	6.15.3	Python Version . . . 637
6.16	Debugging Hooks . . . 637
	6.16.1	The pbs_python Hook Debugging Tool . . . 637
	6.16.2	Files for Debugging . . . 638
	6.16.3	Steps to Debug a Hook Using pbs_python . . . 646
	6.16.4	Caveats and Restrictions for pbs_python . . . 646
	6.16.5	Examples of Using pbs_python to Debug Hooks . . . 648
	6.16.6	Using Log Messages to Debug Hook Scripts . . . 658
	6.16.7	Checking Hook Syntax using Python . . . 658
	6.16.8	Examples of Debugging Files . . . 659
	6.16.9	Interactive Debugging using pbs_python . . . 723
6.17	Error Reporting and Logging . . . 724
	6.17.1	Errors During Creation and Deployment . . . 725
	6.17.2	Errors And Messages During Hook Execution . . . 728
	6.17.3	Errors During Startup . . . 732
	6.17.4	Errors in Hook Updates . . . 733
	6.17.5	Hook-related Error Codes . . . 734
	6.17.6	Troubleshooting . . . 734
6.18	Attributes and Parameters Affecting Hooks . . . 735
6.19	See Also . . . 735
6.2 Introduction to Hooks
A hook is a block of Python code that PBS executes at certain events, for example, when a job
is queued. As long as the Python code conforms to the rules we describe, you can have it do
whatever you want. Each hook can accept (allow) or reject (prevent) the action that triggers
it. The hook can modify the input parameters given for the action. The hook can also make
calls to functions external to PBS. The hook can use a configuration file that you provide.
PBS provides an interface for use in hooks. This interface allows hooks to read and/or modify
things such as job, server, vnode, and queue attributes, and the event that triggered the hook.
6.2.1 Built-in Hooks
Some functions of standard PBS are accomplished through built-in hooks. We use the keyword pbshook with these hooks. These hooks are not designed to be altered, so they have
some restrictions placed on them. See section 6.14, “Managing Built-in Hooks”, on page 634.
6.3 Glossary
Accept an action
The hook allows the action to take place.
Action
A PBS operation or state transition. Also called an event. For a list of events, see
section 6.12.4.1, “Event Types”, on page 540.
Built-in hook
A hook that is supplied as part of PBS. These hooks cannot be created or deleted by
administrators.
Creating a hook
When you “create a hook” using qmgr, you’re telling PBS that you want it to make
you an empty hook object that has no characteristics other than a name.
Event
A PBS operation or state transition. Also called an action. For a list of events, see section 6.12.4.1, “Event Types”, on page 540.
Execution event hook
A hook that runs at an execution host. These hooks run after a job is received by
MoM. Execution event hooks have names prefixed with “execjob_”.
Failure action
The action taken when a hook fails to execute. Specified in the fail_action hook
attribute. See section 6.8.9.2, “Using the fail_action Hook Attribute”, on page 473.
Importing a hook
When you “import a hook” using qmgr, you’re telling PBS which Python script to
run when the hook is triggered.
Importing a hook configuration file
When you “import a hook configuration file” using qmgr, you’re telling PBS which
file should be stored as the configuration file for the specified hook.
Non-job event hook
A hook that is not directly related to a specific job. Non-job event hooks are periodic
hooks, startup hooks, provisioning hooks, and reservation creation hooks.
pbshook
PBS keyword for a built-in hook.
pbs module
The pbs module provides an interface to PBS and the hook environment. The interface is made up of Python objects, object members, and methods. You can operate
on these objects using Python code.
Pre-execution event hook
A hook that runs before the job is accepted by MoM. These hooks do not run on execution hosts. Pre-execution event hooks are for job submission, moving a job, altering a job, or just before sending a job to an execution host.
Reject an action
The hook prevents the action from taking place. For example, if a runjob hook
rejects a job, the job is requeued.
6.4 Prerequisites and Requirements for Hooks
•	To create a hook under UNIX/Linux, you must be logged into the primary or secondary server host as root. You must create any hooks at the primary or secondary server host.
•	To create a hook under Windows, you must use the installation account. For both domained and standalone environments, the installation account must be a local account that is a member of the local Administrators group on the local computer.
•	On Windows 7 and later with UAC enabled, if you will use the cmd prompt to operate on hooks, or for any privileged command such as qmgr, you must run the cmd prompt with the option Run as Administrator.
•	When creating hooks, make sure that each execution host where execution or periodic hooks should run has the $reject_root_scripts MoM parameter set to False. The default for this parameter is False.
•	In order for execution event hooks to function, either the query_other_jobs server attribute must be set to True, or root at every execution host must be added to the managers list (root@hostname must be added to the managers server attribute). If you have any hooks running with user set to pbsuser, you will have to set query_other_jobs to True (you probably don't want to add pbsuser to managers). See the qmgr example after this list.
A normal, non-privileged, user cannot circumvent, disable, add, delete, or modify hooks or
the environment in which the hooks are run.
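For example, a minimal qmgr session satisfying the execution event hook prerequisite above might look like the following; node1 and node2 are hypothetical execution hostnames.
Allow execution event hooks to query jobs regardless of ownership:
Qmgr: set server query_other_jobs = True
Or add root at each execution host to the managers list:
Qmgr: set server managers += root@node1
Qmgr: set server managers += root@node2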
6.5 Simple How-to for Writing Hooks
We will go into the details of what goes into a hook later in the chapter, but here we show the basics of how to create a hook. Steps for creating a hook:
1.	Log into the server host as root
2.	Write the hook script
3.	Create an empty hook via qmgr
4.	Set the attributes of the hook so that it triggers when you want, etc.
5.	If the hook will use a configuration file:
	a.	Write the hook configuration file
	b.	Import the hook configuration file
6.	Import the hook script into the empty hook. You do not need to restart the MoM, unless it's an exechost_startup hook. Since exechost_startup hooks run only when MoM starts up or is HUPed, if you want the hook to run now, restart or kill -HUP the MoM.
6.5.1 Writing Hooks: Basic Hook Structure
•	Import the pbs and sys modules:

	import pbs
	import sys

•	Use the try... except construction, where you test for conditions in the try block, and accept or reject the event. Consider either rerunning the job or deleting the job inside the except: block:

	try:
	    …
	except:
	    …

•	Treat the SystemExit exception as a normal occurrence, and pass if it occurs:

	except SystemExit:
	    pass

•	Reject the event, or rerun or delete the job, if any other exception occurs:

	except:
	    pbs.event().reject("%s hook failed with %s" %
	                       (pbs.event().hook_name, sys.exc_info()[:2]))

•	If the requestor is the scheduler, and where appropriate, the server or MoM, allow the action to take place:

	if pbs.event().requestor in ["PBS_Server", "Scheduler", "pbs_mom"]:
	    pbs.event().accept()
The following code fragment is a basic hook skeleton:

import pbs
import sys

e = pbs.event()
j = e.job
try:
    if e.requestor in ["Scheduler"]:
        e.accept()
    …
except SystemExit:
    pass
except:
    j.rerun()
    e.reject("%s hook failed with %s. Please contact Admin" %
             (e.hook_name, sys.exc_info()[:2]))
6.5.2 Example of Simple Hook
Example 6-1: Redirecting newly-submitted jobs: if a job is submitted to a queue other than workq, move it to workq.

import pbs
import sys

try:
    # Get the hook event information and parameters
    # This will be for the 'queuejob' event type.
    e = pbs.event()

    # Ignore requests from scheduler or server
    if e.requestor in ["PBS_Server", "Scheduler"]:
        e.accept()

    # Get the information for the job being queued
    j = e.job
    if j.queue in ["long", "short"]:
        j.queue = pbs.server().queue("workq")

    # Accept the event
    e.accept()
except SystemExit:
    pass
except:
    e.reject("Failed to route job to queue workq")
6.5.3 Importing Hook Configuration File
If you want your hook to use a configuration file, you can import the configuration file. A
configuration file is not required.
Syntax for importing a configuration file:
Qmgr: import hook <hook_name> application/x-config <content-encoding> <input_config_file>
Here, <content-encoding> can be “default” (7-bit) or “base64”.
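For example, to import a JSON configuration file for a hook named hook1 (the path and filename are illustrative):
Qmgr: import hook hook1 application/x-config default /hooks/hook1_config.json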
See section 6.8.6, “Using Hook Configuration Files”, on page 465.
6.5.4 Creating and Importing Your Hook
When you “create a hook” using qmgr, you’re telling PBS that you want it to make you an
empty hook object that has no characteristics other than a name. When you “import a hook”
using qmgr, you’re telling PBS which Python script to run when the hook is triggered.
Syntax for creating a hook:
Qmgr: create hook <hook name>
Simple syntax for importing a hook:
Qmgr: import hook <hook name> application/x-python <content-encoding> <input_file>
This uses the script named <input_file> as the contents of your hook.
•	The <input_file> must be encoded with <content-encoding>.
•	The allowed values for <content-encoding> are “default” (7-bit) and “base64”.
•	<input_file> must be locally accessible to both qmgr and the batch server.
•	A relative path in <input_file> is relative to the directory where qmgr was executed.
•	If your hook already has a content script, then that is overwritten by this import call.
•	If the name of <input_file> contains spaces, as are used in Windows filenames, then <input_file> must be quoted.
6.5.5 Setting Attributes for Your Hook
Hooks have attributes that control their behavior, such as which events trigger the hook, the
time to allow the hook to execute, etc. The only attribute you must set for a simple hook is the
event(s) that will trigger the hook. Choose your hook type according to the event you want,
by looking in Table 6-2, “Hook Trigger Events,” on page 464.
Syntax for setting the hook event(s):
Qmgr: set hook <hook name> event = <event name>
Qmgr: set hook <hook name> event = “<event name>, <event name>”
For more details on setting hook trigger events, see section 6.8.5, “Setting Hook Trigger
Events”, on page 463.
You can set the rest of the hook’s attributes if you wish. To set a hook attribute:
Qmgr: set hook <hook name> <attribute> = <value>
For a list of all the hook attributes, see section 6.8.9.3, “List of Hook Attributes”, on page 474.
6.6 Uses for Hooks
6.6.1 Routing Jobs
•	Route jobs into specific queues or between queues:
	•	Automatically route interactive jobs into a particular execution queue
	•	Move a job to another queue; for example, if project allocation is used up, move the job to a “background” queue
	•	Reject job submissions that do not specify a valid queue, printing an error message explaining the problem
•	Enable project-based ACLs for queues to make sure the appropriate job runs in the correct queue
6.6.2 Managing Resource Requests and Usage
•	Reject improperly specified jobs (a sketch of the first item appears after this list):
	•	Reject jobs which do not specify walltime
	•	Reject jobs that request a number of processors that is not a multiple of 8
	•	Reject jobs requesting a specific queue, but not requesting memory
	•	Reject jobs whose processors per node is not specified or is not numeric
•	Modify job resource requests:
	•	Apply a default memory limit to jobs that request a specific queue
	•	Check on requested CPU and memory and modify these or supply them if missing
	•	Adjust for the fact that users ask for 2GB on an Altix that has 2GB physical memory, but only 1.8GB available memory, by changing the memory request to 1.8GB
•	Reject parallel jobs for some queues.
•	Set default properties; for example, if “myri” is not set, set it to “False” to ensure Myrinet is used only for Myrinet jobs.
•	Convert from ALPS-specific resource request strings into PBS-specific job requirements.
•	Automatically translate old syntax to new syntax.
•	Compensate for dissimilar system capabilities; for example, allow users to use more CPUs only if they use old, slow machines.
•	Limit reservations submitted by users to a maximum amount of resources and walltime, but do not limit reservations submitted by PBS administrators.
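The following is a minimal sketch of a queuejob hook implementing the first item above, rejecting jobs that do not request walltime; the message text is illustrative:

import pbs
import sys

try:
    e = pbs.event()
    j = e.job
    # Requested resources appear in the job's Resource_List member;
    # an unrequested resource is None
    if j.Resource_List["walltime"] is None:
        e.reject("No walltime requested; resubmit with -l walltime=<time>")
    e.accept()
except SystemExit:
    pass
except:
    e.reject("%s hook failed with %s" % (e.hook_name, sys.exc_info()[:2]))

Note that e.accept() and e.reject() raise SystemExit, so the hook stops at the first one called; this is why the skeleton passes on SystemExit.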
6.6.3 Ensuring that Jobs Run Properly
•	Make sure that jobs, or all jobs in a queue, request exclusive access (-l place=excl).
•	Reject multi-host jobs, restricting each job to a single Altix.
•	Put a hold on the job if there isn't enough scratch space when the job is submitted.
•	Reject jobs that could cause problems, based on the user and type of job that have caused previous problems. For example, if Bill's Abaqus jobs crash the system, reject new Abaqus jobs from Bill.
•	Validate an input deck before the job is submitted.
•	Modify a job's dependency list when the job is rejected.
•	Modify a job's list of environment variables before it gets to the execution host(s).
6.6.4 Managing Job Output
•	Manage where output goes by modifying a job's output path with the job's ID.
6.6.5 Controlling Interactive Jobs
•	Control interactive job submission; for example, enable or disable interactive jobs at the server or queue level.
6.6.6 Helping Schedule Jobs
•	Increase the priority of an array job once the first subjob runs, by modifying the value of a job resource used in the job sorting formula
•	Change scheduling according to user and job:
	•	Set initial user-dependent coefficients for the scheduling formula; for example, set values of custom resources based on job attributes and user
	•	Set whether or not the job is rerunnable, based on user
	•	Calculate CPH (CPH == total ncpus * walltime in hours) and set a custom CPH job resource to the value
•	Set initial priorities for jobs
6.6.7 Communicating Information to Users
•	For a job that is rejected because of a license shortage, set the job's comment to inform about the shortage
•	Report useful error messages back to the user, e.g., "You do not have sufficient walltime left to run your job for 1:00:00. Your walltime balance is 00:30:00."
6.6.8 Managing User Activity
•	Reject jobs from blacklisted users.
•	Prevent users from using qalter to change their jobs in any way, allowing only administrators to qalter jobs (see the sketch after this list).
•	Prevent users from bypassing controls: disallow a job being submitted to queueA in a held state and then being moved to queueB, where the job would not have passed hook checks for queueB initially. For example, if a queuejob hook disallows interactive jobs for queueB, the administrator also needs to ensure that an interactive job is not initially submitted to queueA and later moved to queueB.
•	Prevent users from overriding node_group_key with qsub -l place=group=X, or with qalter.
•	Restrict the ability to submit a reservation to PBS administrators only.
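A sketch of a modifyjob hook implementing the second item, allowing only administrators to qalter jobs; “admin1” is a hypothetical administrator account that you would replace with your own list:

import pbs

e = pbs.event()
# Allow PBS daemons and named administrators; reject everyone else
if e.requestor not in ["PBS_Server", "Scheduler", "pbs_mom", "root", "admin1"]:
    e.reject("Only administrators may alter jobs")
e.accept()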
6.6.9 Enabling Accounting and Validation
•	Make sure correct project designation is used: if no project or account string is found, look up the username in a database to find the appropriate project to use, and add it as the project or account string before submission.
•	Submit job to correct queue based on project: check for project number and submit job to queues based on project type, e.g. project number 1234 jobs get submitted into a “challenge” queue; similarly for a “standard” queue, etc.
•	Validate project before the job executes; if validation fails, do not start the job, and print an error message. Validation can be based on project name, or for example requested resources, such as CPU hours.
6.6.10 Allocation Management
•	You can use a job submission (queuejob) hook to check whether an entity has enough resources allocated to accept the job.
•	You can use a hook that runs just before the job is sent to the execution host (runjob) to perform allocation management tasks such as deducting requested amounts of resources from an entity's allocation.
•	You can use a hook that runs after a job finishes (execjob_epilogue) to perform final allocation management tasks such as allocation reconciliation.
6.6.11 Managing Job Execution
Hooks that run periodically at execution hosts can do the following:
•	Modify job environment variables
•	Check vnode health
•	Report I/O wait time
•	Report memory usage integral (MB * time used)
•	Report energy usage to run a given job, if you have power sensors on vnodes
•	Report actual usage of accelerator hardware (FPGAs, GPUs, etc.)
•	Interrogate hardware performance counters so that you can flag codes that are not running efficiently (e.g. FLOPS < 5% of peak FLOPS)
•	Record how much disk space a job has accumulated in PBS_JOBDIR
•	Record power usage, energy usage, and disk space usage
Hooks that run just before the user's program executes can do the following:
•	Change the job shell or executable
•	Change the job shell or executable arguments
•	Change the job's environment variables
6.6.12 Configuring Vnodes
Hooks that run when an execution host starts can do the following (see the sketch after this list):
•	Create custom resources for vnodes
•	Offline vnodes that are not ready for use
•	Return vnodes to use that have been offlined
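A sketch of an exechost_startup hook implementing the first item above. It assumes the event's vnode_list member and the pbs.get_local_nodename() helper; “scratch” is a hypothetical custom resource that would need to be defined at the server:

import pbs

e = pbs.event()
# Advertise local scratch space as a static resource on this host's natural vnode
vnode = e.vnode_list[pbs.get_local_nodename()]
vnode.resources_available["scratch"] = pbs.size("500gb")
e.accept()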
6.6.13 Provisioning Vnodes
•	Provision a vnode with a new AOE. See Chapter 7, "Provisioning", on page 739.
6.6.14 Enforcing Security
•	Reject jobs with invalid Kerberos tickets
6.6.15 Accepting or Rejecting Job Task Attachment
•	Allow or disallow the action when MoM is about to attach a process for a job
6.7 Hook Basics
6.7.1 Accepting or Rejecting Actions
Hooks accept (allow) or reject (prevent) actions. They can also modify input parameters, job attributes, environment variables, programs, and program arguments, and change internal or external values.
Each action can have zero or more hooks. Each hook must either accept or reject its action.
All of an action’s hooks are run when that action is to be performed. For PBS to perform an
action, all hooks enabled for that action must accept the action. If any hook rejects the action,
the action is not performed by PBS. If a hook script doesn’t call accept() or reject(), and it
doesn’t encounter an exception, PBS behaves as if the hook accepts the action. An action is
always accepted, unless:
•	pbs.event().reject() is called
•	An unhandled exception is encountered
•	The hook alarm has been triggered because the hook timeout was reached
When PBS executes the hooks for an action, it stops processing hooks at the first hook that
rejects the action.
6.7.1.1 Examples of Accepting and Rejecting Actions
Example 6-2: Accepting an action: In this example, userA submits a job to queue Queue1,
and the job submission action has two hooks: hook1 disallows jobs submitted by UserB,
and hook2 disallows jobs being submitted directly to Queue2. Both hook1 and hook2
accept userA’s job submission to Queue1, so the submission goes ahead.
Example 6-3: Rejecting an action: In this example, userA uses the qmove command to try to
move jobA from Queue1 to Queue2. The job move action has two hooks: hook3 disallows jobs being moved into Queue2, and hook4 disallows userB moving jobs out of
Queue1. In this example, hook3 rejects the action, so the move operation is disallowed,
even though hook4 would have accepted the action.
6.7.2 When Hooks Run
Each type of event has a corresponding type of hook. The following are the events where you
can run hooks, with the hook type:
•	Hooks that run before a job is received by an execution host (pre-execution event hooks):
	queuejob: Queueing a job
	modifyjob: Modifying a job, except when the scheduler makes the modification (can also run after the job is received by an execution host)
	movejob: Moving a job
	runjob: Just before a job is sent to an execution host
•	Hooks that run after a job is received by an execution host (execution event hooks):
	execjob_begin: When a job is received by an execution host, after stagein
	execjob_prologue: Just before starting a job's shell
	execjob_launch: Just before starting the user's program
	execjob_attach: When running pbs_attach()
	execjob_preterm: Just before killing a job
	execjob_epilogue: Just after executing or killing a job, but before the job is cleaned up
	execjob_end: Just after cleaning a job up
•	Hooks that are not directly related to a specific job (non-job event hooks):
	resvsub: Submitting a PBS reservation
	provision: Provisioning a vnode
	exechost_periodic: Periodically on all execution hosts
	exechost_startup: When an execution host is started or receives a HUP
Figure 6-1: Simplified view of hook trigger timing
Each time an event triggers a hook, the hook runs for that instance of the event. If you have
written a hook that runs at job submission, this hook will run for each job that is submitted to
this server. Each MoM runs one copy of each of her execution hooks per job. Execution
hooks run one per job at the MoM, not one per vnode. For a job that runs on four vnodes of a
multi-vnoded machine where all the vnodes are managed by one MoM, where you have written one execution hook, only one instance of the hook runs for that job.
Each time a job goes through a triggering event, PBS runs any relevant hooks. This means
that if you run a job, that triggers a runjob hook. If the job is killed and requeued and runs
again, the runjob hook runs again.
If the scheduler modifies a job, any modifyjob hooks are not triggered.
When you are using peer scheduling, and a job is pulled from one complex to another, the
pulling complex applies its hooks as if the job had been submitted locally, and the furnishing
complex applies its movejob hooks. Figure 6-2 shows an example of the hooks that are triggered when a job is moved from a complex containing a movejob hook to a complex containing a queuejob hook.
Figure 6-2: Hooks that run when job is moved
6.7.2.1 Execution Event Hook Triggers in Lifecycle of Job
The hooks triggered for an MPI job depend on whether MPI processes are spawned using the
PBS TM interface via tm_spawn(), or are spawned using pbs_attach(). When a process is spawned using tm_spawn(), MoM starts the process. When a process uses
pbs_attach(), pbs_attach() starts the process and informs MoM of the process ID.
The following shows where execution event hooks are triggered in the lifecycle of a normal,
successful job. We show the timing for hooks on the Mother Superior, on a sister vnode
where a process is spawned using tm_spawn(), and on a sister vnode where a process is
spawned using pbs_attach().
Table 6-1: Execution Event Hook Timing

Job Lifecycle                                                          | Primary Execution Host | Sister (tm_spawn) | Sister (pbs_attach)
-----------------------------------------------------------------------|------------------------|-------------------|---------------------
Licenses are obtained                                                  | execjob_begin          | execjob_begin     | execjob_begin
Any required job-specific staging and execution directories are created |                       |                   |
PBS_JOBDIR and the job's jobdir attribute are set to the pathname of the staging and execution directory | | |
Files are staged in                                                    |                        |                   |
Job is sent to MoM                                                     | execjob_prologue (if there is no execjob_prologue hook, the prologue script runs) | |
Server writes accounting log "S" record                                |                        |                   |
Primary execution host tells sister MoMs they will run job task(s)     |                        |                   |
If necessary, MoM creates work directory                               |                        |                   |
MoM creates temporary directory for job                                |                        |                   |
MoM sets PBS_TMPDIR, JOBDIR, and other environment variables in job's environment |            |                   |
MoM performs hardware-dependent setup: the job's cpusets are created, ALPS reservations are created | | |
The job script starts                                                  | execjob_launch         |                   |
Job starts an MPI process on a sister vnode                            |                        | execjob_prologue  | execjob_prologue, execjob_attach
The job script finishes                                                | execjob_epilogue (if there is no execjob_epilogue hook, the epilogue script runs) | execjob_epilogue | execjob_epilogue
The obit is sent to the server                                         |                        |                   |
Server writes accounting log "E" record                                |                        |                   |
Any specified file staging out takes place, including stdout and stderr |                       |                   |
Files staged in or out are deleted                                     |                        |                   |
Any job-specific staging and execution directories are removed         |                        |                   |
The job's cpusets are destroyed                                        |                        |                   |
Job files are deleted                                                  | execjob_end            | execjob_end       | execjob_end
Application licenses are returned to pool                              |                        |                   |
6.7.3 Account Under Which Hooks Run
A hook runs as the Administrator or as the job owner, depending on the value of the hook’s
user attribute. If this is set to pbsadmin, the hook runs as the Administrator. If this is set to
pbsuser, the hook runs as the job owner.
6.7.4 Where Hooks Run
Pre-execution event, provision, and reservation hooks run on the primary or secondary
server’s host. Execution event, startup and periodic hooks run on the execution host(s).
6.7.5 Permissions and Location for Hooks
Hooks work with both the primary and secondary servers during failover. Hooks can only be
created, run, or modified by the Administrator, and only on the hosts on which the servers run.
6.7.6 Failover
The secondary server uses the same filesystem as the primary server. Any hooks created are
stored in the same place and are accessible by both servers, whether the primary or the secondary server is running.
When the secondary server takes over for the primary server after the primary's host has gone
down or becomes inaccessible, any hooks created at the primary server continue to function
under the secondary server.
If you create a new hook while the secondary server has control, that hook persists after the primary server takes over: if the primary server comes back up and takes over, hooks created while the secondary server had control continue to function.
6.7.7 What Hooks Cannot Access or Do
•	Hooks cannot read or modify anything not presented in the PBS hook interface
•	Hooks cannot modify the server or any queues
•	Pre-execution event hooks cannot read or set vnode attributes or resources, except that the runjob hook can set the state attribute for any vnode to be used by the job
•	Hooks do not have access to other servers besides the default server:
	•	Hooks cannot change the destination server to a non-default server
	•	Hooks can allow a job submission or a qmove to a non-default server, and can change the destination server from a remote server to the default server
•	Hooks cannot directly print to stdout or stderr or read from stdin.
•	movejob hooks do not run on pbs_rsub -Wqmove=<job ID>
6.7.8 What Hooks Should Not Do
•	Hooks should not edit configuration files directly, meaning hooks should not edit the following:
PBS_HOME/sched_priv/sched_config
PBS_HOME/sched_priv/fairshare
PBS_HOME/sched_priv/dedicated
PBS_HOME/sched_priv/holidays
/etc/pbs.conf
PBS_HOME/server_priv/resourcedef
PBS_HOME/mom_priv/config
•	Hooks should not execute PBS commands
6.8 Creating and Configuring Hooks
6.8.1 Introduction to Creating and Configuring Hooks
Hooks can only be created, run, or modified by the Administrator, and only on the host(s) on
which the primary or secondary server runs.
You create hooks using the qmgr command to create, delete, import, or export the hook. The
qmgr command operates on the hook object.
Syntax of qmgr hooks directive:
Qmgr: command hook [hook_name] [attr OP value[,attr OP value,...]]
where
command is create, delete, set, unset, list, print, import, export
OP is one of =, -=, +=
import loads the contents of a hook from an input file.
export dumps the hook contents to a file.
6.8.1.1 Hook Name Restrictions
•	Each hook must have a unique name.
•	The name must be alphanumeric, and start with an alphabetic character.
•	The name must not begin with “PBS”.
•	The name of a hook can be a legal PBS object name, such as the name of a queue.
•	Hook names are case-sensitive.
6.8.2 Overview of Creating and Configuring a Hook
The following is an overview of the steps to create a hook. Each step is described in the following sections. You must be logged into the primary or secondary server host as root.
1.	Use the create hook qmgr command to create an empty hook with the name you specify
2.	Set the hook's trigger event
3.	If the hook will use a configuration file, write and import the configuration file
4.	Import the contents of a hook script into the hook
5.	Set the hook's order of execution, if there is another hook for the same event
6.	Optionally, set the hook's timeout
7.	Make sure that the $reject_root_scripts MoM configuration parameter is set to False on all execution hosts where you want hooks to run. The default for this parameter is False. You do not need to restart the MoM.
6.8.2.1 Example of Creating and Configuring a Hook
Create the hook:
Qmgr: create hook hook1
Import the hook script named hook1_script.py into the hook:
Qmgr: import hook hook1 application/x-python default /hooks/hook1_script.py
Make hook1 a queuejob hook:
Qmgr: set hook hook1 event = queuejob
Make this the second queuejob hook:
Qmgr: set hook hook1 order = 2
Set the hook to time out after 60 seconds:
Qmgr: set hook hook1 alarm = 60
Look at the $reject_root_scripts MoM configuration parameter where you want the hook to
run, and make sure it is set to False.
6.8.3 Creating Empty Hooks
To create a hook, use the create hook command in qmgr to create an empty hook with the name you specify.
Syntax for creating a hook:
Qmgr: create hook <hook name>
6.8.3.1 Example of Creating an Empty Hook
To create the hook named “hook1”:
Qmgr: create hook hook1
6.8.4 Deleting Hooks
To delete a hook, you use the delete hook command in qmgr.
Syntax for deleting a hook:
Qmgr: delete hook <hook name>
6.8.4.1 Example of Deleting a Hook
To delete hook hook1:
Qmgr: delete hook hook1
6.8.5 Setting Hook Trigger Events
To set the events that will cause a hook to be triggered, use the set hook <hook name>
event command in qmgr. You can add triggering events to a hook.
To set one event:
Qmgr: set hook <hook name> event = <event name>
Designate triggers for a hook by setting <event name> to one of the following events:
Table 6-2: Hook Trigger Events

Action (Event)                                                       | Event Name
----------------------------------------------------------------------|-------------------
Accepting job into queue                                              | queuejob
Modifying job, except when scheduler makes modification               | modifyjob
Moving job                                                            | movejob
Before a job is sent to an execution host                             | runjob
When a job is received by an execution host, after stagein            | execjob_begin
When pbs_attach() is called                                           | execjob_attach
Just before executing a job's top shell                               | execjob_prologue
Just before executing the user's program                              | execjob_launch
Just after executing or killing a job, but before job is cleaned up   | execjob_epilogue
Just before killing a job                                             | execjob_preterm
Just after cleaning up a job that has finished or been killed         | execjob_end
When an execution host starts up or receives a HUP                    | exechost_startup
Periodically on all execution hosts                                   | exechost_periodic
Provisioning a vnode                                                  | provision
Submitting reservation                                                | resvsub
To add an event:
Qmgr: set hook <hook name> event += <event name>
For a detailed description of each event, see section 6.12.4.1, “Event Types”, on page 540.
6.8.5.1 Example of Setting Hook Trigger Events
To set an event that will cause hook “UserFilter” to be triggered:
Qmgr: set hook UserFilter event = queuejob
Add another event:
Qmgr: set hook UserFilter event += modifyjob
Set two events at once:
Qmgr: set hook UserFilter event = “queuejob, modifyjob”
You must enclose the value in double quotes if it contains a comma.
6.8.6 Using Hook Configuration Files
You can customize the behavior of a hook by providing a configuration file for the hook; hooks are not required to use configuration files. A configuration file can contain whatever information is useful to the hook, and the way the hook reads and uses its contents is up to you: you write the hook so that it reads and acts on its configuration file, and the hook itself processes the file.
6.8.6.1 Format of Configuration File
PBS supports several file formats for configuration files. The format of the file is specified in its suffix. Formats can be specified in any of the following ways:
•	.ini
•	.json
•	.py (Python)
•	.txt (generic, no special format)
•	.xml
•	No suffix: treat the input file as if it is a .txt file
•	The dash (-) symbol: configuration file content will be taken from STDIN. The content is treated as if it is a .txt file.
For example, to import a configuration file in .json format:
# qmgr -c "import hook <hook_name> application/x-config default input_file.json"
6.8.6.2 Importing Configuration File
To provide a configuration file for a hook, you import the configuration file into the hook. The import command is the same as for a hook script, except that you set <content-type> to “application/x-config”. Syntax for importing a configuration file:
Qmgr: import hook <hook_name> application/x-config <content-encoding> <input_config_file>
or
# qmgr -c "import hook <hook_name> application/x-config <content-encoding> <input_config_file>"
where <content-encoding> is “default” (7-bit) or “base64”.
This uses the contents of <input_config_file> or stdin (-) as the contents of the configuration file for hook <hook_name>.
•	The <input_config_file> or stdin (-) data must have format <content-type> and must be encoded with <content-encoding>.
•	The allowed values for <content-encoding> are “default” (7-bit) and “base64”.
•	If the source of input is stdin (-) and <content-encoding> is “default”, then qmgr expects the input data to be terminated by EOF.
•	If the source of input is stdin (-) and <content-encoding> is “base64”, then qmgr expects input data to be terminated by a blank line.
•	<input_config_file> must be locally accessible to both qmgr and the requested batch server.
•	A relative path in <input_config_file> is relative to the directory where qmgr was executed.
•	If a hook already has a configuration file, then that is overwritten by this import call.
•	If the <input_config_file> name contains spaces, as are used in Windows filenames, then <input_config_file> must be quoted.
•	There is no restriction on the size of the hook configuration file.
6.8.6.2.i Examples of Importing Configuration Files
Importing a Python configuration file:
# qmgr -c 'import pbshook hook1 application/x-config default hello.py'
Importing a JSON configuration file:
# qmgr -c 'import pbshook hook1 application/x-config default hello.json'
6.8.6.3 How Hooks Find Configuration Files
There are two ways to retrieve a configuration file in a hook.
•	PBS puts the configuration file in a location that can be read by the hook, and sets the PBS_HOOK_CONFIG_FILE environment variable to that path. Your hook script can use this path:

	import os
	import ConfigParser

	if "PBS_HOOK_CONFIG_FILE" in os.environ:
	    config_file = os.environ["PBS_HOOK_CONFIG_FILE"]
	    config = ConfigParser.RawConfigParser()
	    config.read(config_file)

•	Your hook can use the pbs.hook_config_filename variable, which contains the path to the configuration file; a sketch follows. See "pbs.hook_config_filename" on page 601. If there is no configuration file, this variable returns None.
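A minimal sketch of the second approach; the ConfigParser usage assumes an .ini-style configuration file:

import pbs
import ConfigParser

config_path = pbs.hook_config_filename  # None when no configuration file was imported
if config_path is not None:
    config = ConfigParser.RawConfigParser()
    config.read(config_path)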
6.8.6.4 Changing a Hook Configuration File
To replace the content of a hook configuration file, issue another “import” hook command
with updated <input_config_file> content.
6.8.6.5 Viewing Configuration Files
To display the content of a hook configuration file associated with the hook named
<hook_name>, export the configuration file. Use the export command:
Qmgr: export hook <hook_name> application/x-config default
6.8.6.6 Validation and Errors
•	PBS pre-validates <input_config_file> according to its file format, and returns an error in qmgr's STDERR if validation fails. For example:
	# qmgr -c "import hook submit application/x-config default file.json"
	"Failed to validate config file, hook 'submit' config file not overwritten"
•	If the input configuration file given has an unrecognized suffix, the following message is returned in qmgr's STDERR:
	"<input-file> contains an invalid suffix, should be one of: .json .py .txt .xml .ini"
•	If you import a configuration file and PBS cannot open the file because it is non-existent, has permission problems (seen in Windows), or has another system-related error, the following error message is printed in STDERR:
	"qmgr: hook error: failed to open <filename> - <error message>"
•	If you attempt to export a hook configuration file, but the file is unwriteable due to ownership or permission problems, the following error message is printed to STDERR:
	"qmgr: hook error: <output_file> permission denied"
6.8.7 Importing Hooks
To import a hook, you import the contents of a hook script into the hook. You must specify a
filename that is locally accessible to qmgr and the PBS server.
Syntax for importing a hook:
Qmgr: import hook <hook_name> <content-type> <content-encoding> {<input_file>|-}
This uses the contents of <input_file> or stdin (-) as the contents of hook <hook_name>.
•	The <input_file> or stdin (-) data must have a format <content-type> and must be encoded with <content-encoding>.
•	For script files, the only <content-type> currently supported is “application/x-python”.
•	The allowed values for <content-encoding> are “default” (7-bit) and “base64”.
•	If the source of input is stdin (-) and <content-encoding> is “default”, then qmgr expects the input data to be terminated by EOF.
•	If the source of input is stdin (-) and <content-encoding> is “base64”, then qmgr expects input data to be terminated by a blank line.
•	<input_file> must be locally accessible to both qmgr and the requested batch server.
•	A relative path in <input_file> is relative to the directory where qmgr was executed.
•	If a hook already has a content script, then that is overwritten by this import call.
•	If the name of <input_file> contains spaces, as are used in Windows filenames, then <input_file> must be quoted.
•	There is no restriction on the size of the hook script.
6.8.7.1 Examples of Importing Hooks
Example 6-4: Given a Python script in ASCII text file "hello.py", this makes its contents into the script contents of hook1:
# cat hello.py
import pbs
pbs.event().job.comment = "Hello, world"
# qmgr -c 'import hook hook1 application/x-python default hello.py'

Example 6-5: Given a base64-encoded file "hello.py.b64", qmgr unencodes the file's contents, and then makes this script the contents of hook1:
# cat hello.py.b64
cHJpbnQgImhlbGxvLCB3b3JsZCIK
# qmgr -c 'import hook hook1 application/x-python base64 hello.py.b64'

Example 6-6: Read stdin for text containing data until EOF, and make this into the script contents of hook1:
# qmgr -c 'import hook hook1 application/x-python default -'
import pbs
pbs.event().job.comment = "Hello from stdin"
Ctrl-D (UNIX/Linux)
Ctrl-Z (Windows)

Example 6-7: Read stdin for a base64-encoded string of data terminated by a blank line. PBS unencodes the data and makes this script the contents of hook1:
# qmgr -c 'import hook hook1 application/x-python base64 -'
cHJpbnQgImhlbGxvLCB3b3JsZCIK
Ctrl-D (UNIX/Linux)
Ctrl-Z (Windows)
6.8.8 Exporting Hooks
Syntax for exporting a hook:
Qmgr: export hook <hook_name> <content-type> <content-encoding> [<output_file>]
This dumps the script contents of hook <hook_name> into <output_file>, or stdout if <output_file> is not specified.
•	The resulting <output_file> or stdout data is of <content-type> and <content-encoding>.
•	The only <content-type> currently supported for scripts is “application/x-python”.
•	The allowed values for <content-encoding> are “default” (7-bit) and “base64”.
•	<output_file> must be a path that can be created by qmgr.
•	Any relative path in <output_file> is relative to the directory where qmgr was executed.
•	If <output_file> already exists it is overwritten. If PBS is unable to overwrite the file due to ownership or permission problems, then an error message is displayed in stderr.
•	If the <output_file> name contains spaces, like the ones used in Windows file names, then <output_file> must be enclosed in quotes.
6.8.8.1 Examples of Exporting Hooks
Example 6-8: Dump hook1's script contents directly into the file "hello.py":
# qmgr -c 'export hook hook1 application/x-python default hello.py'
# cat hello.py
import pbs
pbs.event().job.comment = "Hello, world"

Example 6-9: To dump the script contents of a hook 'hook1' into a file in "\My Hooks\hook1.py":
Qmgr: export hook hook1 application/x-python default "\My Hooks\hook1.py"

Example 6-10: Dump hook1's script contents base64-encoded into a file called "hello.py.b64":
# qmgr -c 'export hook hook1 application/x-python base64 hello.py.b64'
# cat hello.py.b64
cHJpbnQgImhlbGxvLCB3b3JsZCIK

Example 6-11: Dump hook1's script contents directly to stdout:
# qmgr -c 'export hook hook1 application/x-python default'
import pbs
pbs.event().job.comment = "Hello, world"

Example 6-12: Dump hook1's script contents base64-encoded to stdout:
# qmgr -c 'export hook hook1 application/x-python base64'
cHJpbnQgImhlbGxvLCB3b3JsZCIK
6.8.9 Setting and Unsetting Hook Attributes
You configure a hook using the qmgr command to set or unset its attributes. An unset hook
attribute takes the default value for that attribute.
Hook attributes can be viewed via qmgr:
Qmgr: list hook <hook name>
To set a hook attribute:
Qmgr: set hook <hook name> <attribute> = <value>
To unset a hook attribute:
Qmgr: unset hook <hook name> <attribute>
For example, to unset hook1’s alarm attribute, causing its value to revert to its default value:
Qmgr: unset hook hook1 alarm
This causes hook1's alarm to revert to the default of 30 seconds.
6.8.9.1 Caveats for Setting Hook Attributes
You cannot set the type attribute for a built-in hook.
6.8.9.2 Using the fail_action Hook Attribute
The fail_action hook attribute is a string_array and can take on multiple values:
None
No action is taken.
offline_vnodes
After unsuccessful hook execution, offlines the vnodes managed by the MoM executing the hook. Can be set for execjob_begin and exechost_startup hooks only.
clear_vnodes_upon_recovery
After successful hook execution, clears vnodes previously offlined via
offline_vnodes fail action. Can be set for exechost_startup hooks only.
scheduler_restart_cycle
After unsuccessful hook execution, restarts scheduling cycle. Can be set for
execjob_begin hooks only.
Default value: "None"
If you specify offlining or clearing vnodes in addition to restarting the scheduler, the scheduler restart happens last. The order of the values is not important.
To set the attribute:
# qmgr -c "set hook <hook_name> fail_action = <fail_action value>"
# qmgr -c "set hook <hook_name> fail_action = '<fail_action value>,<fail_action value>'"
To add a value to the list of values:
# qmgr -c "set hook <hook_name> fail_action += <fail_action value>"
To remove a value from the list of values:
# qmgr -c "set hook <hook_name> fail_action -= <fail_action value>"
To find out what the values are:
# qmgr -c "list hook <hook_name> fail_action"
<hook_name>
fail_action = <fail_action value>
To unset the attribute:
# qmgr -c "unset hook <hook_name> fail_action"
See section 6.10.6, “Offlining and Clearing Vnodes Using the fail_action Hook Attribute”, on
page 511 and section 6.10.7, “Restarting Scheduler Cycle After Hook Failure”, on page 512.
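For example, to give an execjob_begin hook both failure actions at once (the hook name begin_hook is illustrative):
# qmgr -c "set hook begin_hook fail_action = 'offline_vnodes,scheduler_restart_cycle'"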
6.8.9.3
List of Hook Attributes
Hook attributes are listed in the following table, and in “Hook Attributes” on page 417 of the PBS Professional Reference Guide.
Table 6-3: Hook Attributes

alarm=<n>
    <n> is the number of seconds to wait before an executing hook script times out.
    Valid values are > 0.
    Default value: 30

debug=<Boolean>
    Specifies whether or not the hook produces debugging files. Files are placed under PBS_HOME/server_priv/hooks/tmp, PBS_HOME/mom_priv/hooks/tmp, or PBS_HOME/spool. Files are named hook_<hook event>_<hook name>_<unique ID>.in, .data, and .out. See section 6.16.2, “Files for Debugging”, on page 640.
    Default value: False

enabled=<Boolean>
    Determines whether or not a hook is run when its triggering event occurs. If a hook's enabled attribute is True, the hook is run.
    Default value: True

event=<event string_array>
    List of events that trigger the hook. Can be operated on with the “=”, “+=”, or “-=” operators. Valid values are:
    resvsub: create a reservation
    queuejob: submit a job
    modifyjob: alter a job, except when the scheduler alters a job
    movejob: move a job
    runjob: before sending job to execution host
    execjob_begin: when execution host receives job
    execjob_prologue: just before top job process starts
    execjob_launch: just before MoM runs the user's program
    execjob_attach: when pbs_attach() runs
    execjob_preterm: before killing job
    execjob_epilogue: after job finishes or is killed
    execjob_end: after cleaning up job
    exechost_startup: when MoM starts or is HUPed
    exechost_periodic: periodically on all execution hosts
    provision: provision a vnode
    The provision event cannot be combined with any other events. See section 6.12.4.1, “Event Types”, on page 540.
    Default value: “” (none), meaning the hook is not triggered.

fail_action=<fail_action string_array>
    Specifies the action to be taken when the hook fails due to an alarm call, an unhandled exception, or an internal problem such as not enough disk space or memory. Can also specify a subsequent action to be taken when the hook runs successfully. Value can be either “none” or one or more of “offline_vnodes”, “clear_vnodes_upon_recovery”, and “scheduler_restart_cycle”.
    “offline_vnodes”: after unsuccessful hook execution, offlines the vnodes managed by the MoM executing the hook. Only available for exechost_startup and execjob_begin hooks.
    “clear_vnodes_upon_recovery”: after successful hook execution, clears vnodes previously offlined via the offline_vnodes fail action. Only available for exechost_startup hooks.
    “scheduler_restart_cycle”: after unsuccessful hook execution, restarts the scheduling cycle. Only available for execjob_begin hooks.
    See section 6.10.6, “Offlining and Clearing Vnodes Using the fail_action Hook Attribute”, on page 511, section 6.10.7, “Restarting Scheduler Cycle After Hook Failure”, on page 512, and section 6.8.9.2, “Using the fail_action Hook Attribute”, on page 473.

freq=<number of seconds>
    Specifies how often an exechost_periodic hook script runs, in seconds.
    Value must be > 0.
    Default value: 120

order=<n>
    Integer indicating relative ordering of hook execution. Hooks with lower values for order execute before those with higher values. Not applied to exechost_periodic hooks.
    Valid values: 1 to 1000, inclusive.
    Default value: 1

type={“site”}
    Hook type. “site” is the only value allowed in a create or set command, and the only value listed in a list or print command. Cannot be set for a built-in hook.
    Default value: “site”

user=<user>
    Specifies who executes the hook. Valid values:
    pbsadmin: on UNIX, this is root. On Windows, this is simply a substitute for the PBS service account; it is not the name of the PBS service account.
    pbsuser: the hook runs under the account of the job owner, which is the value of the euser job attribute. Can be used for execjob_prologue, execjob_epilogue, and execjob_preterm events only.
    Default value: pbsadmin
6.8.10
Enabling and Disabling Hooks
A hook is either enabled, meaning it runs when its triggering event occurs, or disabled, meaning it does not run. Hooks are enabled by default.
Syntax to enable a hook:
Qmgr: set hook <hook name> enabled=True
Syntax to disable a hook:
Qmgr: set hook <hook name> enabled=False
6.8.10.1
Example of Enabling and Disabling Hooks
To enable hook1:
Qmgr: set hook hook1 enabled=True
To disable hook1:
Qmgr: set hook hook1 enabled=False
6.8.11
Setting the Relative Order of Hook Execution
When there are multiple hooks for one action, you may wish to specify the order in which the hooks run. This order is determined by each hook's order attribute: hooks with a lower value for order run before hooks with a higher value.
Syntax:
Qmgr: set hook <hook name> order=<ordering>
<ordering> is an integer. Hooks with lower values for <ordering> run before those with
higher values; a hook with order=1 runs before a hook with order=2.
Valid values for hook ordering are between 1 and 1000.
The order in which hooks for unrelated actions execute is undefined. For example, suppose there are two queuejob hooks, Hook1 and Hook2, and userA submits jobA while userB submits jobB. While Hook1 always runs before Hook2 for the same job, the order of execution is undefined across different jobs. So the order could be:
Hook1 (jobB)
Hook1 (jobA)
Hook2 (jobA)
Hook2 (jobB)
6.8.11.1
Example of Setting Relative Order of Hook
Execution
To set hookA to run first and hookB to run second:
Qmgr: set hook hookA order=2
Qmgr: set hook hookB order=5
6.8.11.2
Caveats for Setting Relative Order of Hooks
The order attribute is ignored for exechost_periodic hooks.
6.8.12
Setting Hook Timeout
You may wish to specify how long PBS should wait for a hook to run. Execution for each
hook times out after the number of seconds specified in the hook’s alarm attribute. If the
hook does not run in the specified time, PBS aborts the hook and rejects the hook’s action.
Syntax:
Qmgr: set hook <hook name> alarm=<timeout>
<timeout> is the number of seconds PBS will allow the hook to run.
When a hook timeout is triggered, the hook script gets a Python KeyboardInterrupt from the
PBS server. The server logs show:
06/17/2008 17:57:16;0001;Server@host2;Svr;Server@host2;PBS server internal
error (15011) in Python script received a KeyboardInterrupt, <type
'exceptions.KeyboardInterrupt'>
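If a hook should report the timeout itself before the action is rejected, one approach (a sketch, not prescribed by the PBS documentation; it assumes your handling finishes quickly) is to catch the interrupt explicitly:
import pbs
try:
    pass          # long-running hook work would go here
except KeyboardInterrupt:
    # the PBS server raises this when the alarm fires
    pbs.event().reject("hook timed out; action rejected")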
6.8.12.1
Example of Setting Hook Timeout
To set the number of seconds that PBS waits for hook hook1 to execute before aborting the hook and rejecting the action:
Qmgr: set hook hook1 alarm=20
6.8.13
Setting Hook Frequency
You can specify the frequency with which a periodic hook runs. You can do this only for
hooks whose event type is exechost_periodic.
Syntax:
Qmgr: set hook <hook name> freq=<frequency>
<frequency> is the number of seconds elapsed between calls to this hook.
6.8.13.1
Example of Setting Hook Frequency
To set the number of seconds between calls to an exechost_periodic hook:
Qmgr: set hook hook1 freq=200
6.8.14
Setting Hook User Account
You can specify the account under which a hook runs.
Syntax:
Qmgr: set hook <hook name> user=<pbsadmin | pbsuser>
pbsadmin specifies that the hook runs as root or as administrator.
pbsuser specifies that the hook runs as the job owner.
You can specify that a hook runs as the job owner only for execjob_prologue,
execjob_epilogue, and execjob_preterm hooks.
If you do not set the account, it defaults to pbsadmin.
6.8.14.1
Example of Setting Hook User Account
To set the account under which a hook runs:
Qmgr: set hook hook1 user=pbsuser
6.9
Viewing Hook Information
6.9.1
Listing Hooks
To list one hook and its attributes on the current server:
Qmgr: list hook <hook name>
To list all hooks and their attributes on the current server:
Qmgr: list hook
6.9.2
Viewing Hook Contents
To view the contents of a hook, export the hook’s contents:
Qmgr: export hook <hook_name> <content-type> <content-encoding>
[<output_file>]
You cannot export the contents of a built-in hook.
6.9.3
Printing Hook Creation Commands
To view the commands to create one hook:
Qmgr: print hook <hook name>
To view the commands to create all the hooks on the default server:
Qmgr: print hook
or
qmgr -c "print hook"
For example, to see the commands used to create hook1 and hook2:
# qmgr -c "print hook"
create hook hook1
import hook hook1 application/x-python base64 cHJpbnQgImhlbGxvLCB3b3JsZCIK<blank line>
set hook hook1 event=movejob
set hook hook1 alarm=10
set hook hook1 order=5
create hook hook2
import hook hook2 application/x-python base64 - servaJLSDFSESF<newline>
set hook hook2 event=queuejob
set hook hook2 alarm=15
set hook hook2 order=60
…
6.9.4
Re-creating Hooks
To re-create a hook, you feed qmgr hook descriptions back into qmgr. These hook descriptions are the same information that qmgr prints out. To print the statements needed to re-create a hook, use the print hook or print hook <hook name> qmgr commands.
For example, to save information for hook1 and hook2:
# qmgr -c "print hook" > hookInfo
To re-create hook1 and hook2:
# qmgr < hookInfo
6.10
Writing Hook Scripts
6.10.1
How We Define and Refer to Objects and Methods
6.10.1.1
Scope of Object or Method
When we define an object or method, we show the scope of the object or method. For example, the scope of a job is the pbs module, so we call it a pbs.job, and a server has the same
scope, so it is a pbs.server. Similarly, the logjobmsg() method has module-wide scope, and is
defined as pbs.logjobmsg().
However, the scope of a job ID object is the job, not the module, so it is defined as a
pbs.job.id, and the scope of the job’s is_checkpointed() method is the job, so it is defined as
pbs.job.is_checkpointed().
6.10.1.2
Referring to Objects
In a hook, you refer to the triggering event using pbs.event(). In a hook that is triggered by a
job-related event, such as a movejob or execjob_begin hook, the event has an associated
pbs.job object representing the job that triggered the event, and you refer to it using
pbs.event().job. You can refer to members of that job object using pbs.event().job.<member>.
For example, to refer to the ID of the job associated with the event, you use pbs.event().job.id.
To use the is_checkpointed() method on the job associated with the event, you use
pbs.event().job.is_checkpointed(). You can use shortcuts:
e = pbs.event()
j = e.job
c = j.is_checkpointed()
6.10.1.3
How to Retrieve Objects: Event vs. Server
Each event has access to specific objects, listed in Table 6-17, “Using Event Object Members
in Events,” on page 568. You can manipulate many of these objects through the event. To
retrieve the job that triggered an event, you refer to it this way: pbs.event().job.
The server has read access to all objects in the pbs module. You refer to these objects through
the server. For example, to retrieve a job whose ID is “1234” through the server, you use
pbs.server().job(“1234”). You cannot manipulate an object that is retrieved through the
server.
6.10.1.3.i
Retrieving Jobs
The way you retrieve a job determines how much access you have to that job. You can
retrieve a job either through the event, via pbs.event().job, or through the server, via
pbs.server().job().
If you retrieve a job through an event, the event gives you the job itself, represented as an
object. You can see and alter some job attributes for an event-retrieved job object. To get the
job object representing the job associated with the current event, on which you can operate,
use pbs.event().job. We show which hooks can see and set each job attribute in Table 6-8,
“Job Attributes Readable & Settable via Events,” on page 499.
However, if you retrieve a job through the server, the server gives you an instantiated job
object that contains a copy of the job. You cannot set any job attributes for a server-retrieved
job object, and trying to operate on a server-retrieved copy of the job causes an exception. To get read-only information about a particular job, use pbs.server().job('<job ID>'), which returns a read-only copy of the job.
You can see all of the attributes for a server-retrieved job object, except in a queuejob hook.
In a queuejob hook, the event gives you the job as it exists before the server sees it, but the
server cannot retrieve it, because the job has not yet made it to the server.
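As an illustration (not from the PBS documentation; the job ID and Priority value are arbitrary), the following fragment contrasts the two retrieval paths:
import pbs
e = pbs.event()
j = e.job                      # event-retrieved: settable per Table 6-8
j.Priority = 50                # allowed in, for example, a queuejob hook
ro = pbs.server().job("1234")  # server-retrieved: read-only copy
pbs.logmsg(pbs.LOG_DEBUG, "job 1234 state: %s" % ro.job_state)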
6.10.1.3.ii
Retrieving Vnodes
Vnode objects behave like job objects. If you retrieve a vnode object through an event, via
pbs.event().vnode_list[], except for the execjob_launch event, you can see some of the
vnode’s attributes, and set any vnode attribute that you could set via qmgr. We show which
hooks can see and set each vnode attribute in Table 6-9, “Vnode Attributes Readable & Settable via Events,” on page 502.
If you retrieve a vnode object through the server, via pbs.server().vnode(), you have a copy
of the vnode, and you can see all of the vnode’s attributes, but you cannot set any of them.
6.10.1.3.iii
Retrieving Queues
You can retrieve queues through the server only, using pbs.server().queue(“<queue name>”),
or using pbs.server().queues(). You cannot make any changes to queue objects in hooks.
These are read-only.
You can change a job’s destination queue, but only to a queue at the local server. Hooks have
access only to the local server. Hooks can allow a job submission to a remote server, but they
cannot specify a remote server. See section 6.11.9, “Local Server Only”, on page 521.
Hooks can specify the destination queue at a local server for a queuejob or movejob event,
whether the original destination queue was at the local server or a remote server.
To specify a destination queue at the local server:
pbs.event().job.queue = pbs.server().queue("<local_queue>")
Do not specify a queue at a remote server in a hook script.
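For example, a queuejob hook might route large-memory jobs to a local queue. A minimal sketch; the queue name "bigmem" and the 64gb threshold are hypothetical, and the sketch assumes pbs.size values can be compared:
import pbs
e = pbs.event()
mem = e.job.Resource_List["mem"]
if mem and mem > pbs.size("64gb"):
    e.job.queue = pbs.server().queue("bigmem")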
6.10.1.3.iv
Retrieving Reservations
In order to get information about a reservation being created in a resvsub event, use
pbs.event().resv. pbs.server() cannot return information about the reservation, because the
reservation has not yet been created.
6.10.2
Recommended Hook Script Structure
6.10.2.1
Catch Exceptions
Your hook script should catch all exceptions except SystemExit. We recommend that you catch exceptions via try... except and accompany them with a call to pbs.event().reject(). It is helpful if the hook displays a useful error message in the stderr of the command that triggered it; the message should show the type of the error and describe it.
Here is the recommended script structure:
import pbs
import sys
try:
    …
except SystemExit:
    pass
except:
    e = pbs.event()
    e.job.rerun()
    e.reject("%s hook failed with %s. Please contact Admin" %
             (e.hook_name, sys.exc_info()[:2]))
6.10.2.1.i
Example of Catching Exceptions
This example shows how a coding error in the hook is caught with the except statement, and an appropriate error message is generated. The hook script is designed to reject interactive jobs that are submitted to queue “nointer”; in line 7, the statement k=5/0 generates a divide-by-zero error.
import pbs
import sys
try:
    batchq = "nointer"
    e = pbs.event()
    j = e.job
    k = 5/0
    if j.queue and j.queue.name == batchq and j.interactive:
        e.reject("Can't submit an interactive job in '%s' queue" % (batchq))
except SystemExit:
    pass
except:
    e.reject("%s hook failed with %s. Please contact Admin" % (e.hook_name,
             sys.exc_info()[:2]))
The hook is triggered:
% qsub job.scr
qsub: c1 hook failed with (<type 'exceptions.ZeroDivisionError'>,
ZeroDivisionError('integer division or modulo by zero',)). Please
contact Admin
6.10.2.1.ii
Table of Exceptions
The following exceptions may be raised when using the pbs.* objects:
Table 6-4: Exceptions Raised When Using pbs.* Objects

pbs.BadAttributeValueError
    Raised when setting the member value of a pbs.* object to an invalid value.
pbs.BadAttributeValueTypeError
    Raised when setting the member value of a pbs.* object to an invalid type.
pbs.BadResourceValueError
    Raised when setting the resource value of a pbs.* object to an invalid value.
pbs.BadResourceValueTypeError
    Raised when setting the resource value of a pbs.* object to an invalid type.
pbs.EventIncompatibleError
    Raised when referencing a nonexistent member in pbs.event. Example: calling pbs.event().resv when pbs.event().type is pbs.QUEUEJOB.
pbs.UnsetAttributeNameError
    Raised when referencing a non-existent member name of a pbs.* object.
pbs.UnsetResourceNameError
    Raised when referencing a non-existent resource name of a pbs.* object.
SystemExit
    Raised when pbs.event().reject() or pbs.event().accept() terminates hook execution.
6.10.3
Hook Alarm Calls and Unhandled Exceptions
An execjob_begin or exechost_startup hook can cause a failure action to take place when the hook script fails due to an alarm call or an unhandled exception. Otherwise, the following happens:
•   If a pre-execution event or execution event hook encounters an unhandled exception:
    •   PBS rejects the corresponding action. The command that initiates the action results in the following message in stderr:
        “<command_name>: request rejected as filter hook <hook_name> encountered an exception. Please inform Admin”
    •   The following message appears in the appropriate PBS daemon log, logged under the PBSEVENT_DEBUG2 event class:
        “<request type> hook <hook_name> encountered an exception, request rejected”
    •   The job is left unmodified.
•   If an exechost_startup hook script encounters an unexpected error causing an unhandled exception, vnode changes do not take effect, but MoM continues to run, and the following message appears at level PBSEVENT_DEBUG2 in mom_logs:
    “exechost_startup hook <hook_name> encountered an exception, request rejected”
•   The following statements cause an unhandled exception if they appear in a hook script as is:
    •   ZeroDivisionError exception raised:
        val = 5/0
    •   BadAttributeValueError exception raised; pbs.hold_types and strings don't mix:
        pbs.event().job.Hold_Types = "z"
    •   EventIncompatibleError exception raised for the following runjob event; a runjob event has a job attribute, not a resv attribute:
        r = pbs.event().resv
•   You can use execjob_begin and exechost_startup hooks to offline vnodes when those hooks encounter alarm calls or unhandled exceptions. See “Offlining and Clearing Vnodes Using the fail_action Hook Attribute” on page 511 of the PBS Professional Reference Guide. You can then clear the offline state from those vnodes later when an exechost_startup hook runs successfully.
•   You can use an execjob_begin hook to restart the scheduler cycle when the hook encounters an alarm call or unhandled exception. See “Restarting Scheduler Cycle After Hook Failure” on page 512 of the PBS Professional Reference Guide.
For a list of exceptions, see Table 6-4 in section 6.10.2.1.ii, “Table of Exceptions,” on page 486.
6.10.4
Using Attributes and Resources in Hooks
6.10.4.1
Determining Whether to Use Creation Method to Set Attribute or Resource
The way you set an attribute or resource depends on its type:
•   If the attribute or resource is a string (str), an integer (int), a Boolean (bool), a long (long), or a floating point number (float), you can set it directly:
    pbs.event().job.<attribute name> = <attribute value>
    pbs.event().job.Resource_List["<resource name>"] = <resource value>
    For example:
    jobA = pbs.event().job
    jobA.Account_Name = "AccountA"
    jobA.Priority = 100
•   However, if the attribute or resource is any other type, you must use the corresponding creation method to instantiate an object of the correct type, with the desired value given as a formatted input string, then assign the object to the job. For example:
    pbs.event().job.Hold_Types = pbs.hold_types("uo")
For creation methods, see section 6.12.14.3, “Global Methods”, on page 607.
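As further illustration, a sketch using two other creation methods (assuming the pbs.duration() and pbs.size() methods listed under “Global Methods”; the values are arbitrary):
pbs.event().job.Resource_List["walltime"] = pbs.duration("02:00:00")
pbs.event().job.Resource_List["mem"] = pbs.size("4gb")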
6.10.4.1.i
Caveat for Objects Requiring Creation Method
You can operate on these objects only as if they were strings. Use repr() on the object to get its full string representation; you can then manipulate this representation using the built-in methods of the Python str type.
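For example, a sketch (not from the PBS documentation) of examining a Hold_Types value as a string:
import pbs
ht = repr(pbs.event().job.Hold_Types)   # e.g. "u" or "uo"
if "u" in ht:
    pbs.logmsg(pbs.LOG_DEBUG, "job has a user hold")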
6.10.4.1.ii
Python Types not Requiring Creation Method
The following Python types do not require you to use an explicit creation method:
str
int
bool
long
float
6.10.4.2
How to Unset an Attribute or Resource
To unset an attribute or resource, set its value to None:
pbs.event().job.<attribute name> = None
When you unset an attribute or resource, it takes its default value.
6.10.4.2.i
How to Unset an Attribute or Resource Requiring
Creation Method
You can unset a job attribute or resource that has a creation method by setting it to None.
Example:
pbs.event().job.Hold_Types = None
6.10.4.3
Reading and Setting Attributes in Hooks
All hooks can read, but not set, all job, vnode, server, queue, and reservation attributes via
pbs.server().job(), pbs.server().vnode(), pbs.server().queue(), etc.
We list which job attributes can be read or set when the job is retrieved through an event in
Table 6-8, “Job Attributes Readable & Settable via Events,” on page 499.
We list which vnode attributes can be read or set when the vnode is retrieved through an event
in Table 6-9, “Vnode Attributes Readable & Settable via Events,” on page 502.
We list which reservation attributes can be read or set when the reservation is retrieved
through an event in Table 6-10, “Reservation Attributes Readable & Settable via Events,” on
page 503.
No hooks can see or set any scheduler attributes.
The job, vnode, or reservation object’s attributes appear to the hook as they would be after the
event, not before it, for all hooks except runjob hooks.
6.10.4.3.i
Setting Time Attributes
For the job attributes Execution_Time, ctime, etime, mtime, qtime, and stime, the pbs.job
object expects or shows the number of seconds since Epoch. The only one of these that can
be set is Execution_Time.
For the reservation attributes reserve_start, reserve_end, and ctime, the pbs.resv object
expects and shows the number of seconds since Epoch. The ctime attribute cannot be set.
If you wish to set the value for Execution_Time, reserve_start, or reserve_end using the [[CCYY]MMDDhhmm[.ss] format, or to see the value of any of the time attributes in the ASCII time format, load the Python time module and use the functions time.mktime((CCYY, MM, DD, hh, mm, ss, -1, -1, -1)) and time.ctime().
Example:
import time
pbs.job.Execution_Time = time.mktime((2007, 11, 28, 14, 10, 15, -1, -1, -1))
time.ctime(pbs.job.Execution_Time)
'Wed Nov 28 14:10:15 2007'
If reserve_duration is unset or set to None, the reservation's duration is taken from the walltime resource associated with the reservation request. If both reserve_duration and walltime are specified, meaning neither is set to None, reserve_duration takes precedence.
6.10.4.3.ii
Special Characters in Variable_List Job Attribute
When special characters are used in the Variable_List job attribute, they must be escaped. For this attribute, the special characters are comma (,), single quote ('), double quote ("), and backslash (\). PBS requires each of these to be escaped with a backslash. However, Python requires that double quotes and backslashes also be escaped with a backslash. If the special character inside a string is a single quote, you must enclose the string in double quotes; if it is a double quote, you must enclose the string in single quotes. The following rules show how to use special characters in a Variable_List attribute when writing a Python script:
Table 6-5: How to Use Special Characters in Python Scripts

Character           Example Value       How Value is Represented in Python Script
, (comma)           a,b                 "a\\,b" or 'a\\,b'
' (single quote)    c'd                 "c\\'d"
" (double quote)    f"g"h               'f\\\"g\\\"h'
\ (backslash)       \home\dir\files     "\\home\\dir\\files" or '\\home\\dir\\files'
For example, if the path is:
"\Documents and Settings\pbstest\bin:\windows\system32"
This is how the path shows up in a script:
pbs.job.Variable_List["PATH"] = "\\Documents and Settings\\pbstest\\bin:\\windows\\system32"
6.10.4.3.iii
Special Characters in string_array Attributes
For an attribute whose type is string_array and whose value contains one or more commas (“,”), the whole string must be enclosed in single quotes, outside of its double quotes. For example, given this line in PBS_HOME/server_priv/resourcedef:
test_string_array type=string_array
If our string array has a single element consisting of “glad, elated”:
pbs.job.Resource_List["test_string_array"] = '"glad, elated"'
If our string array has two elements, where one is “glad, elated” and the other is “happy”:
pbs.job.Resource_List["test_string_array"] = '"glad, elated"', "happy"
6.10.4.4
Reading and Setting Resources in Hooks
All hooks can read, but not set, all job, vnode, server, queue, and reservation resources via
pbs.server().job(), pbs.server().vnode(), pbs.server().queue(), etc. The resources that can
be read or set via pbs.event() vary by hook.
We list the job resources that can be read and set via an event in each kind of hook in Table 6-11, “Job Resources Readable & Settable by Hooks via Events,” on page 505.
We list the vnode resources that can be read and set via an event in each kind of hook in
Table 6-12, “Vnode Resources Readable & Settable by Hooks via Events,” on page 507.
We give an overview of the resources that can be read and set by each hook in Table 6-6,
“Overview of Resources Readable & Settable by Hooks via Pre-execution and Provision
Events,” on page 493 and Table 6-7, “Overview of Resources Readable & Settable by Hooks
via execjob_ and exechost_ Events,” on page 494. In these tables, if we say that a hook can
read or set a group of resources, for example the server’s resources_available attribute, that
means that the hook can read or set all of the resources for that group.
Custom resources are treated the same way as built-in resources.
6.10.4.4.i
Reading Resources
PBS resources are represented as Python dictionaries, where the resource names are the dictionary keys. These resources are listed in “Resources” on page 313 of the PBS Professional
Reference Guide.
You can read a resource through objects such as the server, the event that triggered the hook, or the vnode to which a resource belongs. For example:
pbs.server().resources_available["<resource name>"]
pbs.event().job.Resource_List["<resource name>"]
pbs.event().vnode_list["<vnode name>"].resources_available["ncpus"]
The resource name must be in quotes.
Example: Get the number of CPUs in a job's Resource_List attribute:
ncpus = pbs.event().job.Resource_List["ncpus"]
6.10.4.4.ii
Setting Resources
A resource can be set as follows:
pbs.event().job.Resource_List["<resource name>"] = <resource value>
pbs.event().vnode_list["<vnode name>"].resources_available["<resource name>"] = <resource value>
For example:
pbs.event().job.Resource_List["mem"] = pbs.size("8gb")
pbs.event().vnode_list["V2"].resources_available["ncpus"] = 2
6.10.4.4.iii
Overview of Readable & Settable Resources
Here we give an overview of which resources can be read or set in hooks. An “r” indicates read, an “s” indicates set, and an “o” indicates that the resource can be set but the action has no effect. See Table 6-1, “Execution Event Hook Timing,” on page 457 for more information about why some operations have no effect. The following table shows which resource categories are readable or settable in pre-execution and provision hooks:
[Table 6-6: Overview of Resources Readable & Settable by Hooks via Pre-execution and Provision Events. Rows: Job Resource_List (varies; see Table 6-11), Job resources_used, Vnode resources_available, Vnode resources_assigned, Server resources_available, Server resources_assigned, Server resources_default, Server resources_max, Queue resources_available, Queue resources_assigned, Queue resources_default, Queue resources_max, Queue resources_min, and Reservation Resource_List. Columns: queuejob, modifyjob (before run), movejob, runjob, resvsub, and provision events. Each cell is r, s, o, or “---” (not available).]
The following table gives an overview of which resources can be read or set in execjob_ and exechost_ hooks.

[Table 6-7: Overview of Resources Readable & Settable by Hooks via execjob_ and exechost_ Events. Rows: the same resource categories as in Table 6-6. Columns: execjob_begin, execjob_attach, execjob_prologue, execjob_launch, execjob_end, execjob_epilogue, execjob_preterm, exechost_startup, and exechost_periodic events. Each cell is r, s, o, or “---” (not available).]

6.10.4.4.iv
Setting and Unsetting Vnode Resources and Attributes Using vnode_list[]
You can set and unset vnode resources and attributes using the vnode_list[] object in an
exechost_startup or exechost_periodic hook. Any changes made this way are merged
with those defined in a Version 2 MoM configuration file.
To set the attributes and resources for a particular vnode:
pbs.event().vnode_list[<vnode name>].<attribute> = <value>
pbs.event().vnode_list[<vnode name>].resources_available["<resource name>"] = <value>
Resource names and string values must be quoted.
Some examples:
pbs.event().vnode_list[<vnode name>].pcpus = 5
pbs.event().vnode_list[<vnode name>].resources_available["ncpus"] = 3
pbs.event().vnode_list[<vnode name>].resources_available["mem"] = pbs.size("100gb")
pbs.event().vnode_list[<vnode name>].arch = "linux"
pbs.event().vnode_list[<vnode name>].state = pbs.ND_OFFLINE
pbs.event().vnode_list[<vnode name>].sharing = pbs.ND_FORCE_EXCL
To unset a resource or attribute value, specify None as its value:
pbs.event().vnode_list[<vnode_name>].resources_available[<res>] = None
pbs.event().vnode_list[<vnode_name>].<attribute> = None
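Putting this together, here is a minimal exechost_startup sketch (illustrative only, not from the PBS documentation; “scratch” is a hypothetical custom resource):
import pbs
e = pbs.event()
for name in e.vnode_list.keys():
    # advertise a hypothetical per-vnode scratch capacity
    e.vnode_list[name].resources_available["scratch"] = pbs.size("500gb")
e.accept()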
6.10.4.4.v
When MoM Modifies Job resources_used Set in Hooks
If an execution hook modifies specific resources used by a job, MoM refrains from updating those values.
Under Linux/UNIX, the job's resources_used values that MoM does not modify once they have been set in a hook are cput, walltime, mem, vmem, ncpus, and cpupercent.
Under Windows, they are cput, walltime, mem, and ncpus.
The qmgr command cannot be used to set resources_used for a job.
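For example, a sketch of an execjob_epilogue hook that supplies its own walltime value, which MoM will then leave alone (illustrative only; the value shown is arbitrary):
import pbs
e = pbs.event()
# once a hook sets walltime, MoM will not overwrite it
e.job.resources_used["walltime"] = pbs.duration("01:00:00")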
6.10.4.5
Converting walltime to Seconds
If you want to see a job’s walltime in seconds:
int(pbs.event().job.Resource_List["walltime"])
For example:
pbs.logmsg(pbs.LOG_DEBUG, "walltime=%d" %
(int(pbs.event().job.Resource_List["walltime"])))
If walltime is "00:30:15", this results in the following:
walltime=1815
6.10.4.6
Caveats for Setting and Unsetting Attributes and Resources
6.10.4.6.i
When to Change Reservation Attributes
The only time that a reservation’s attributes can be altered is during the creation of that reservation in a resvsub hook.
6.10.4.6.ii
Caution About Unsetting Reservation walltime Resource
The walltime resource is used to determine the reservation's duration when the reservation's reserve_duration attribute is not set or is set to None. If a resvsub hook attempts to unset walltime, for example:
pbs.event().resv.Resource_List["walltime"] = None
this results in the following error:
% pbs_rsub -R 1800 -l ncpus=1
pbs_rsub: Bad time specification(s)
6.10.4.6.iii
Changing Job Attributes for a Running Job
When a job is running, only its cput and walltime values can be modified. Attempting to change anything else for a running job causes the corresponding qalter action to be rejected. For example, if the job is running, this line in a hook will cause qalter to be rejected:
pbs.event().job.Resource_List["mem"] = pbs.size("10mb")
To avoid having the qalter action rejected, check whether the job is running, and follow up accordingly. For example:
e = pbs.event()
if e.job.job_state in [pbs.JOB_STATE_RUNNING, pbs.JOB_STATE_EXITING,
                       pbs.JOB_STATE_TRANSIT]:
    e.accept()
6.10.4.6.iv
Do Not Unset Array Job Indices
Do not unset pbs.event().job.array_indices_submitted for an array job in a modifyjob hook.
For example:
pbs.event().job.array_indices_submitted = None
If the hook script is executed for a job array, the qalter request will fail with the message:
Cannot modify attribute while job running <job array ID>
6.10.4.6.v
Do Not Create Job or Reservation Variable List
Hooks are not allowed to create job or reservation Variable_List attributes. Hooks can modify the existing Variable_List job attribute, which is supplied by PBS, by modifying values in the list. The following are disallowed in a hook:
pbs.event().job.Variable_List = dict()
pbs.event().resv.Variable_List = dict()
These calls will cause the following exception:
04/07/2008 11:22:14;0001;Server@host2;Svr;Server@host2;PBS server internal
error (15011) in Error evaluating Python script, attribute
'Variable_List' cannot be directly set.
To modify the Variable_List attribute:
pbs.event().job.Variable_List["SIMULATE"] = "HOOK1"
6.10.4.6.vi
Changing Vnode state Attribute
A vnode's state can be set within a runjob hook only if the hook's execution concludes with a pbs.event().reject() call. This means that if a statement setting a vnode's state appears in a runjob hook script, it takes effect only if the following is the last line to be executed:
pbs.event().reject()
To set a vnode's state, the syntax is one of the following:
pbs.vnode.state = <vnode state constant>
pbs.vnode.state += <vnode state constant>
pbs.vnode.state -= <vnode state constant>
where <vnode state constant> is one of the constant objects listed in section 6.12.11.4,
“Vnode State Constant Objects”, on page 600.
Examples of changing a vnode's state attribute:
•   To offline a vnode:
    pbs.vnode.state = pbs.ND_OFFLINE
•   To add another value to the list of vnode states:
    pbs.vnode.state += pbs.ND_DOWN
•   To remove a value from the list of vnode states:
    pbs.vnode.state -= pbs.ND_OFFLINE
When a vnode’s state attribute has no states set, the vnode’s state is equivalent to free. This
means that you can remove all values, and the vnode will become free.
When a vnode’s state is successfully set, the following message is displayed and logged at
event class 0x0004:
Node;<vnode-name>;attributes set: state - <vnode state constant> by
<hook_name>
You can set a vnode’s state attribute in any execution hook and in a periodic hook, and
changes to vnode attributes take effect whether the execution hook or periodic hook calls
accept() or reject().
6.10.4.6.vii
Attribute Change Failure is Silent
If you attempt to change the value for an attribute in an unsupported way, PBS does not warn
you that your attempt failed.
6.10.4.6.viii
Lengthened walltime Can Interfere with Reservations
If a hook lengthens the walltime of a running job, you run the risk that the new walltime will interfere with existing reservations or other scheduled work.
6.10.4.6.ix
Setting Vnode Resources in Hooks Overwrites Previous
Value
When you set resources_available for a vnode, inside or outside of a hook, you are overwriting the previous value. There is no way in a hook to know whether a value was set inside
or outside a hook (for example, using qmgr or a vnode definition file). There is no way to
prevent a value set inside a hook from being modified outside of the hook.
6.10.4.6.x
Changing Resources in Accounting Logs
If you use a non-execjob_end execution hook to set a value for resources_used, the new
value for resources_used appears in the accounting logs.
6.10.4.6.xi
When Setting Resources Has No Effect
If you use an execjob_end execution hook to set a value for resources_used, it has no
effect, because MoM has already sent the final values for resources_used to the server.
6.10.4.7
Table: Reading & Setting Job Attributes in
Hooks
The following table lists the job attributes that can be read or set when the job is retrieved via
an event. An “r” indicates read, an “s” indicates set, and an “o” indicates that this attribute
can be set but the action has no effect. See Table 6-1, “Execution Event Hook Timing,” on
page 457 for more information about why some operations have no effect.
[Table 6-8: Job Attributes Readable & Settable via Events. Rows: the job attributes accounting_id, Account_Name, accrue_type, alt_id, argument_list, array, array_id, array_index, array_indices_remaining, array_indices_submitted, array_state_count, block, Checkpoint, comment, ctime, depend, egroup, eligible_time, Error_Path, estimated, etime, euser, Executable, Execution_Time, exec_host, exec_vnode, Exit_status, group_list, hashname, Hold_Types, interactive, jobdir, Job_Name, Job_Owner, job_state, Join_Path, Keep_Files, Mail_Points, Mail_Users, mtime, no_stdio_sockets, Output_Path, Priority, project, pset, qtime, Queue, queue_rank, queue_type, Rerunable, resources_used, Resource_List (with restrictions; see Table 6-11), run_count, run_version, sandbox, schedselect, sched_hint, server, session_id, Shell_Path_List, stagein, stageout, Stageout_status, stime, Submit_arguments, substate, sw_index, umask, User_List, and Variable_List. Columns: queuejob, modifyjob (before run), movejob, runjob (on reject), runjob (on accept), execjob_begin, execjob_attach, execjob_prologue, execjob_launch, execjob_end, execjob_epilogue, execjob_preterm, exechost_startup, and exechost_periodic events. Each cell is r, s, o, or “---” (not available).]
6.10.4.8
Table: Reading & Setting Vnode Attributes in
Hooks
The following table shows the vnode attributes that can be read or set when the vnode object
is retrieved via an event. An “r” indicates read, an “s” indicates set, and an “o” indicates that
this attribute can be set but the action has no effect. See Table 6-1, “Execution Event Hook
Timing,” on page 457 for more information about why some operations have no effect.
[Table 6-9: Vnode Attributes Readable & Settable via Events. Rows: the vnode attributes comment, current_aoe, hpcbp_enable, hpcbp_stage_protocol, hpcbp_user_name, hpcbp_webservice_address, in_multivnode_host, jobs, license, license_info, max_group_run, max_running, max_user_run, Mom, name, no_multinode_jobs, ntype, pbs_version, pcpus, pnames, Port, Priority, provision_enable, queue, resources_assigned, resources_available, resv, resv_enable, sharing, state, and topology_info. Columns: queuejob, modifyjob (before run), movejob, runjob, execjob_begin, execjob_attach, execjob_prologue, execjob_launch, execjob_end, execjob_epilogue, execjob_preterm, exechost_startup, exechost_periodic, and provision events. Each cell is r, s, o, or “---” (not available).]

6.10.4.9
Table: Reading & Setting Reservation Attributes
in resvsub Hook
Reservation attributes can be read and set through an event only in resvsub hooks. No other
hooks can read or set reservation attributes through an event. All hooks can read, but not set,
all reservation attributes by retrieving the reservation object through the server, using
pbs.server().resv(). The following table shows the reservation attributes that can be read or
set when the reservation object is retrieved via an event, in a resvsub hook:
Table 6-10: Reservation Attributes Readable & Settable via Events

Reservation Attribute        Readable, Settable
Account_Name                 r
Authorized_Groups            r, s
Authorized_Hosts             r, s
Authorized_Users             r, s
ctime                        r
group_list                   r, s
hashname                     r
interactive                  r, s
Mail_Points                  r, s
Mail_Users                   r, s
mtime                        r
Priority                     r
Queue                        r
reserve_count                r
reserve_duration             r, s
reserve_end                  r, s
reserve_ID                   r
reserve_index                r
Reserve_Name                 r, s
Reserve_Owner                r
reserve_retry                r
reserve_rrule                r, s
reserve_start                r, s
reserve_state                r
reserve_substate             r
reserve_type                 r
Resource_List                r, s
resv_nodes                   r
server                       r, s
User_List                    r
Variable_List                r, s
6.10.4.10
Table: Reading & Setting Job Resources in
Hooks
The following table shows the built-in members of the job’s Resource_List attribute that can
be read or set in each type of hook, when retrieving the object through an event. An “r” indicates read, an “s” indicates set, and an “o” indicates that this resource can be set but the action
has no effect. See Table 6-1, “Execution Event Hook Timing,” on page 457 for more information about why some operations have no effect.
[Table 6-11: Job Resources Readable & Settable by Hooks via Events. Rows: the built-in members of the job's Resource_List attribute: accelerator, accelerator_memory, accelerator_model, aoe, arch, cput, exec_vnode, file, host, max_walltime, mem, min_walltime, mpiprocs, mpparch, mppdepth, mpphost, mpplabels, mppmem, mppnodes, mppnppn, mppwidth, naccelerators, nchunk, ncpus, netwins, nice, nodect, nodes, ompthreads, pcput, pmem, pvmem, site, software, start_time, vmem, vnode, vntype, walltime, PBScrayhost, PBScraylabel_<label name>, PBScraynid, PBScrayorder, and PBScrayseg. Columns: queuejob, modifyjob (before run), movejob, runjob (on reject), runjob (on accept), resvsub, execjob_begin, execjob_attach, execjob_prologue, execjob_launch, execjob_end, execjob_epilogue, execjob_preterm, exechost_startup, exechost_periodic, and provision events. Each cell is r, s, o, or “---” (not available).]

6.10.4.11
Table: Reading & Setting Vnode Resources in
Hooks
The following table shows the built-in members of the vnode’s resources_available
attribute that can be read or set in each type of hook, when retrieving the object through an
event. An “r” indicates read, an “s” indicates set, and an “o” indicates that this resource can
be set but the action has no effect. See Table 6-1, “Execution Event Hook Timing,” on
page 457 for more information about why some operations have no effect.
[Table 6-12: Vnode Resources Readable & Settable by Hooks via Events. Rows: the built-in members of the vnode's resources_available attribute (the same resource names as in Table 6-11). Columns: queuejob, modifyjob (before run), movejob, runjob (on reject), runjob (on accept), resvsub, execjob_begin, execjob_attach, execjob_prologue, execjob_launch, execjob_end, execjob_epilogue, execjob_preterm, exechost_startup, exechost_periodic, and provision events. Each cell is r, s, o, or “---” (not available).]
6.10.5
Using select and place in Hooks
All hooks can read, but not set, a job’s select and place statements via pbs.server().job(),
pbs.server().vnode(), pbs.server().queue(), etc. The following table shows the type of hook
that can read or set a job’s select and place statements, when retrieving the object through an
event. An “r” indicates read, an “s” indicates set, and an “o” indicates that this value can be
set but the action has no effect. See Table 6-1, “Execution Event Hook Timing,” on page 457
for more information about why some operations have no effect.
Table 6-13: Hooks that Can Read & Set Job select and place Statements via Events
[The cells of this table were scrambled during extraction. The table maps the job's select statement and the job's place statement against each event type (queuejob, modifyjob (before run), movejob, runjob (on reject), runjob (on accept), resvsub, execjob_begin, execjob_attach, execjob_prologue, execjob_launch, execjob_end, execjob_epilogue, execjob_preterm, exechost_startup, exechost_periodic, provision). The surviving fragments indicate that the queuejob and modifyjob hooks can read and set ("r, s") both statements, that most of the remaining job-related hooks can read ("r") them, and that some events have no access ("---").]
6.10.5.1 How to Set select and place in Hooks
You must use the associated creation method to instantiate an object of the correct type with
the desired value, then assign the object to the job. Syntax:
job.place = pbs.place("[arrangement]:[sharing]:[group]")
job.select = pbs.select("[N:]res=val[:res=val][+[N:]res=val[:res=val] ... ]")
Example 6-13: Set a job’s select and place directives:
jobB = pbs.event().job
jobB.place = pbs.place("pack:exclhost")
jobB.select = pbs.select("2:mem=2gb:ncpus=1+6:mem=8gb:ncpus=16")
See "pbs.select()” on page 616 and "pbs.place()” on page 614.
6.10.6 Offlining and Clearing Vnodes Using the fail_action Hook Attribute
6.10.6.1 Offlining Vnodes
You can offline vnodes when an execjob_begin or exechost_startup hook fails due to an alarm call, an unhandled exception, or an internal error such as a full disk or insufficient memory on the host (for example, a malloc() failure).
To offline vnodes upon failure, set the value of the hook’s fail_action attribute to include
“offline_vnodes”. This marks the vnodes managed by the hook’s MoM as offline.
# qmgr -c "set hook <hook_name> fail_action += offline_vnodes"
When a vnode is offlined using the fail_action attribute, the vnode’s comment attribute is set
to an explanation:
“offlined by hook <hook_name> due to hook error”
See section 6.8.9.2, “Using the fail_action Hook Attribute”, on page 473.
6.10.6.2 Clearing Vnodes
When an exechost_startup hook runs successfully and does not encounter any uncaught
exception or alarm timeout, you can clear the offline state from vnodes that were previously
marked offline via fail_action.
To clear the offline state from vnodes that were previously offlined via the offline_vnodes
fail_action attribute, set the value of the exechost_startup hook’s fail_action attribute to
include “clear_vnodes_upon_recovery”. This clears the offline state from the vnodes
managed by the hook’s MoM.
# qmgr -c "set hook <hook_name> fail_action += clear_vnodes_upon_recovery”
If you have fixed your execjob_begin script, and want to send jobs again to the vnodes managed by the MoM where the script runs, clear the offline states and comments from the vnodes managed by that MoM:
•  Clear the offline state:
   # pbsnodes -c <MoM host>
•  Clear the comment:
   # qmgr -c "unset node <vn1>,<vn2>,... comment"
   Or for long lists of vnodes:
   # qmgr -c "unset node `pbsnodes -vl | awk '{if( NR == 1 ) {printf "%s", $1}
   else {printf ",%s", $1}}'` comment"
You can write an exechost_periodic hook that monitors the states of the vnodes, so that when it finds offlined vnodes whose comment matches "offlined by hook…", it clears the comment and the offline state.
See section 6.8.9.2, “Using the fail_action Hook Attribute”, on page 473.
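Here is a minimal sketch of such an exechost_periodic hook. It assumes that the comments written by fail_action begin with "offlined by hook", and that setting the state to free and emptying the comment is an acceptable way to clear them at your site:

import pbs
e = pbs.event()
vnlist = e.vnode_list
for name in vnlist.keys():
    v = vnlist[name]
    # assumption: fail_action comments start with "offlined by hook"
    if v.comment is not None and str(v.comment).startswith("offlined by hook"):
        v.state = pbs.ND_FREE   # clear the offline state
        v.comment = ""          # assumption: an empty string clears the comment
e.accept()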
6.10.7 Restarting Scheduler Cycle After Hook Failure
You can restart the scheduler cycle after an execjob_begin hook fails due to an alarm call, an unhandled exception, or an internal error such as a full disk or insufficient memory on the host (for example, a malloc() failure). To restart the scheduler cycle after failure of an execjob_begin hook, set the value of the hook's fail_action attribute to include "scheduler_restart_cycle".
# qmgr -c "set hook <hook_name> fail_action += scheduler_restart_cycle"
See section 6.8.9.2, "Using the fail_action Hook Attribute", on page 473.
6.10.8 Adding Custom Non-consumable Host-level Resources
You can add new custom host-level, non-consumable resources and set their values in resources_available for a vnode by using vnode_list[] in an exechost_startup hook. Any changes made this way are merged with those defined in a Version 2 MoM configuration file. Upon startup, MoM reads configuration files before executing the exechost_startup hook. These resources are automatically added to the PBS_HOME/server_priv/resourcedef file.
To add a new custom host-level resource and set its value:
v = pbs.event().vnode_list[<vnode name>]
v.resources_available[<new_resource>] = <value>
The type of the resource is inferred from the value assigned to the resource. Python types
map to PBS types as shown in the following table:
Table 6-14: Resource Types when Adding via vnode_list

Python Type                                   Type
int                                           Long
str                                           String
bool                                          Boolean
pbs.size                                      Size
pbs.duration                                  Long
float                                         Float
Any Python type without an explicit match    String
You must also make the resource usable by the scheduler: see section 5.14.2.4, “Allowing
Jobs to Use a Resource”, on page 347.
To delete a custom resource created in a hook, use qmgr. See section 5.14.2.1.ii, “Deleting
Custom Resources via qmgr”, on page 342.
Example 6-14: Adding custom resources:
Given the hook settings:
vn.resources_available["fab_int"] = 9
vn.resources_available["fab_str"] = "happy"
vn.resources_available["fab_bool"] = False
vn.resources_available["fab_size"] = pbs.size("7mb")
vn.resources_available["fab_time"] = pbs.duration("00:30:00")
vn.resources_available["fab_float"] = 7.0
The following resourcedef file entries are added:
# cat resourcedef
fab_int type=long, flag=h
fab_str type=string, flag=h
fab_bool type=boolean, flag=h
fab_size type=size, flag=h
fab_time type=long, flag=h
fab_float type=float, flag=h
6.10.9 Printing And Logging Messages
Hooks can log a custom string in the server’s log, at message log event class
pbs.LOG_DEBUG (0x0004). This is done using the pbs.logjobmsg(job ID, message)
facility. See "pbs.logjobmsg()” on page 611.
Hooks can specify a message for use when the corresponding action is rejected. This message
is printed to stderr by the command that triggered the event, and is printed in the daemon’s
log. This is done using the pbs.event().reject(<message>) function. See
"pbs.event().reject()” on page 576 for information on how to specify a rejection message.
Hooks cannot directly print to stdout or stderr, or read from stdin. See section 6.11.8.1,
“Avoid Hook File I/O”, on page 520, and section 6.17.2.8, “Hooks Attempting I/O”, on page
732.
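Here is a minimal sketch for an event that carries a job ID (for example, runjob); the message strings are illustrative:

import pbs
e = pbs.event()
if e.job is not None:
    # recorded in the server's log at event class pbs.LOG_DEBUG
    pbs.logjobmsg(e.job.id, "site policy hook examined this job")
# printed to the triggering command's stderr and written to the daemon's log
e.reject("request denied by site policy hook")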
6.10.10 Capturing Return Code
To capture an application's return code, capture the return code in Python and then return it from the hook. You can use the Python subprocess module. Here is an example snippet:

import sys
if "<path to subprocess module>" not in sys.path:
    sys.path.append("<path to subprocess module>")
import subprocess
try:
    retcode = subprocess.call("mycommand myarg", shell=True)
except OSError:
    retcode = -1
return retcode
6.10.11 When You Need Persistent Data
If you need your data to be persistent, your hook(s) must be able to save and retrieve the information.
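For example, a hook can keep a small counter in a file on the local host. This is only a sketch; the file location is a hypothetical choice, and you should apply the file I/O cautions in section 6.11.8.1:

state_file = "/var/spool/pbs/spool/myhook_count"   # hypothetical location
try:
    f = open(state_file)
    count = int(f.read().strip())
    f.close()
except (IOError, ValueError):
    count = 0                                      # first run, or unreadable file
count = count + 1
f = open(state_file, "w")
f.write("%d\n" % count)
f.close()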
6.10.12 Setting Up Job Environment on Sisters
If you need to set up the job’s environment on sister MoMs, use an execjob_begin hook.
This hook can set up the desired environment on sister MoMs so that the job can use the new
environment.
If job tasks are spawned on sister MoMs via a tightly-integrated MPI that uses
tm_spawn(), any execjob_prologue and execjob_launch hooks run on the sister MoMs.
However, if job tasks are started using pbs_attach(), execjob_attach and
execjob_prologue (on the first task attached) hooks run on sister MoMs instead. For a
detailed description of the order in which hooks run on the primary and secondary execution
hosts, see Table 6-1, “Execution Event Hook Timing,” on page 457.
The old-style prologue runs only on the primary execution host; you cannot use it to set up the
environment on sister MoMs.
All job tasks running on vnodes managed by the same MoM get the same environment.
6.11 Advice and Caveats for Writing Hooks
6.11.1 Rules for Hook Access and Behavior
The following are rules and recommendations for writing hooks:
•  Use only the documented interfaces. Hooks which access PBS information or modify PBS in any way except through these interfaces are erroneous and unsupported.
•  Do not attempt to manipulate the hook stored by PBS, except as specified in section 6.14, "Managing Built-in Hooks", on page 634.
•  Don't delete attributes.
•  Don't change environment variables set by PBS. See "Environment Variables" on page 244 of the PBS Professional Reference Guide for a list of these environment variables.
•  Do not try to access the following (a well-written, portable hook will not depend on any of this information):
   •  Server configuration information: qmgr, resourcedef, and pbs.conf
   •  Scheduling information: qmgr, sched_config, fairshare, dedicated, holidays
   •  Vnode information: qmgr, pbsnodes
•  Do not write hooks that depend on the behavior of other hooks.
•  Do not make assumptions about the value of PATH; instead, import sys and modify sys.path.
•  Do not make assumptions about the value of the current working directory.
•  For information about umask, see "qalter" on page 135, "qsub" on page 225, and "Job Attributes" on page 393 of the PBS Professional Reference Guide.
•  Do not depend on the order of execution of unrelated hooks. For example, do not depend on one job submission's queuejob hooks running entirely before another job submission's queuejob hooks. It is not guaranteed that all of one job's hooks will finish before another job's hooks start.
•  The Resource_List attribute, like others, is a Python dictionary. These dictionaries support a restricted set of operations: they can reference values by index. Other features, such as has_key(), are not available.
•  Hooks which execute PBS commands are erroneous and unsupported. The behavior of executing PBS commands inside a hook is undefined (and is likely to cause the hook to hang).
6.11.2 Check for Parameter Validity
To make hook scripts more robust, check first for the validity of the event parameters before using them, by comparing against None:

if pbs.event().job != None:
if pbs.event().job_o != None:
if pbs.event().src_queue != None:
if pbs.event().resv != None:
if pbs.event().vnode != None:
if pbs.event().aoe != None:
6.11.2.1 Example of Checking Validity
% cat t2245.py
import pbs
e = pbs.event()
if e.type == pbs.QUEUEJOB and (e.job == None):
    e.reject("Event Job parameter is unset!")
elif e.type == pbs.MODIFYJOB and ((e.job == None) or (e.job_o == None)):
    e.reject("Event Job or Job_o parameter is unset!")
elif e.type == pbs.RESVSUB and (e.resv == None):
    e.reject("Event Resv parameter is unset!")
elif e.type == pbs.RUNJOB and (e.job == None):
    e.reject("Event Job parameter is unset!")
6.11.3 Make Changes Only On Acceptance
We recommend that your hook not make changes unless it accepts its event; otherwise you have to back the changes out upon a reject().
6.11.4 Offline Vnodes when exechost_startup Hook Rejects
We recommend that before calling pbs.event().reject() in an exechost_startup hook, you set the vnodes managed by the local MoM offline, with an accompanying comment. This stops jobs from being sent to the affected vnodes. For example:

vnlist = pbs.event().vnode_list
for v in vnlist.keys():
    vnlist[v].state = pbs.ND_OFFLINE
    vnlist[v].comment = "bad configuration"
pbs.event().reject("not accepting jobs")
6.11.5 Minimize Unnecessary Steps
To speed up your hooks, arrange steps so that they execute as few times as possible. For example, if you retrieve several pieces of information about a job, but use it only when one piece meets a certain criterion, put the bulk of the information-retrieval steps inside the section where you do the work on the job.
6.11.6 Use Fast Operations
Some of the examples we provide could be faster. Instead of testing values with "==", you can test individual bits with the bitwise ampersand operator ("&").
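For example, vnode state constants can be combined in a single state value, so testing one state bit with "&" is both faster and more robust than an exact "==" comparison (a sketch; v is a pbs.vnode object):

# matches only when offline is the vnode's sole state
if v.state == pbs.ND_OFFLINE:
    pass
# true whenever the offline bit is set, even with other states present
if v.state & pbs.ND_OFFLINE:
    pass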
6.11.7 Avoiding Interference with Normal Operation
6.11.7.1 Treating SystemExit as a Normal Occurrence
Both pbs.event().accept() and pbs.event().reject() terminate hook execution by throwing a SystemExit exception. A try...except clause without arguments will catch all exceptions. If hook content appears in such a try...except clause, add the following to treat SystemExit as a normal occurrence:

except SystemExit:
    pass
Here is an example of an except clause that will catch SystemExit:
try:
    ...
except:
    ...
In the above case, we need to add the except SystemExit clause, so that it will look like this:
try:
    ...
except SystemExit:
    pass
except:
    ...
If the existing code catches a specific exception, we don't need to add "except SystemExit:", since the hook script is only catching one particular exception, which will not match SystemExit. For example:
try:
    ...
except pbs.BadAttributeValueError:
    ...
6.11.7.2 Allowing the Server to Modify Jobs
The server uses the qalter command during normal operation to modify jobs. Therefore, if
you have a modifyjob hook script, make sure you do not interfere with qalter commands
issued by the server. Catch these cases by starting the hook with an if clause that accepts
modification of jobs by PBS:
e = pbs.event()
if e.requestor in ["PBS_Server"]:
    e.accept()
While the scheduler also uses the qalter command to modify jobs, this does not trigger any
modifyjob hooks.
6.11.7.3 Staying Within the Scheduler Alarm Time
Consider setting hook alarm values in runjob hooks so that they do not unduly delay the scheduler; the scheduler waits for a hook to finish executing. The scheduler's cycle time has a default value of 20 minutes, and is specified in the scheduler's sched_cycle_length attribute.
6.11.8 Avoiding Problems
6.11.8.1 Avoid Hook File I/O
When the PBS server is running, stdout, stderr, and stdin are closed. A hook script
attempting I/O will get an exception. To avoid this, redirect input and output to a file. See
section 6.17.2.8, “Hooks Attempting I/O”, on page 732.
6.11.8.2 Avoid Contacting Bad Host
Be careful not to specify a bad host in <job ID> in pbs.event().job.depend. If it references a
non-existent or heavily loaded PBS server, the current PBS server could hang for a few minutes as it tries to contact the bad host. For example:
pbs.event().job.depend = pbs.depend("after:23.bad_host")
The PBS server could hang while trying to contact "bad_host".
6.11.8.3 Avoid os._exit() Python Function
Do not use the os._exit() Python function. It will cause the PBS server to exit.
6.11.8.4 Avoid Attempting to Log Message Using Bad Job ID
If the pbs.logjobmsg() method is passed a bad job ID, it raises a Python ValueError.
6.11.8.5 Avoid Taking Up Lots of Memory
Certain function calls in PBS Python hooks are expensive to use in terms of memory. If they
are called repeatedly in loops, they can use up a lot of memory, potentially causing the server
to hang or crash. For example, the following is expensive since each iterative call to
pbs.server().vnodes() causes internal allocation of memory, which won't be freed until after
the hook executes.
In order to avoid this, produce the output only once, save it to memory, and iterate using the
copy. For example:
vnl = []
vni = pbs.server().vnodes()
for vn in vni:
    pbs.logmsg(pbs.LOG_DEBUG, "found vn.name=%s" % (vn.name))
    vnl.append(vn)
The following functions in PBS Python hooks return iterators, and should be used carefully:
•  Iterate over a list of jobs: pbs.server().jobs(), pbs.queue.jobs()
•  Iterate over a list of queues: pbs.server().queues()
•  Iterate over a list of vnodes: pbs.server().vnodes()
•  Iterate over a list of reservations: pbs.server().resvs()
6.11.8.6 Testing Vnode State
To see whether a vnode has a particular state set:
If v.state == pbs.ND_OFFLINE:
pbs.logmsg(pbs.LOG_DEBUG, “vnode %s is offline!” % (v.name))
6.11.9 Local Server Only
Hooks cannot access a server other than the local server. Hooks also cannot specify a non-default server. So, for example, if a job submission specifies a queue at a server other than the default, the hook can allow that submission, or can change it to the default server, but cannot change it to another non-default server.
PBS Professional 13.0 Administrator’s Guide
AG-521
6.11.10 Scheduling Impact of Hooks
6.11.10.1 Effect of runjob Hooks on Preemption
With preemption turned on, the scheduler preempts low-priority jobs to run a high-priority
job. If the high-priority job is rejected by a runjob hook, then the scheduler undoes the preemption of the low-priority jobs. Suspended jobs are resumed, and checkpointed jobs are
restarted.
6.11.10.2 Effect of runjob Hooks with Strict Ordering
When strict_ordering is set to True and backfill is set to False, a most-deserving job that is repeatedly rejected by a runjob hook will prevent other jobs from being able to run. A well-written hook would put the job on hold, or requeue the job with a later execution time, to prevent idling the system.
6.11.10.3 Effect of runjob Hooks with round_robin and by_queue
With round_robin and by_queue set to True, a job continually rejected by a runjob hook
may prevent other jobs from the same queue from being run. A well-written hook would put
the job on hold or requeue the job with a later execution time to allow other jobs in the same
queue to be run.
A runjob hook's performance directly affects the responsiveness of the PBS scheduler. Consider carefully the trade-off between the work such a hook needs to do and your scheduler's
required performance.
6.11.10.4 Peer Scheduling and Hooks
When a job is pulled from one complex to another, the following happens:
•  Hooks are applied at the new complex as if the job had been submitted locally
•  Any movejob hooks at the furnishing server are run
6.11.10.5 Performance Considerations
6.11.10.5.i Cost of Accessing Data
•  Using pbs.server() to get data about the server, queues, jobs, vnodes, or reservations can be slow if run in an execution hook, because of the overhead involved when the function has to connect directly to the server and pass requests (via TCP).
•  Making queries to pbs.server().resources_available[] can be slow.
6.11.10.5.ii Cost of Different Hooks
•  Any queuejob hooks execute once per job submission
•  Any runjob hooks execute once per attempt to run a job, after the scheduler has found a place for it

What this means to the hook writer:
•  Your queuejob hooks can generally get away with longer run times
•  Any hook that listens to queuejob events needs to decide quickly whether it is needed or not

For a fast hook, avoid these:
•  Running external commands
•  Network connections
•  File I/O and logging
•  Storing information in server or vnode settings
•  Using pbs.server().resources_available
•  Iterating over the entire set of vnodes or jobs using pbs.server().vnodes() or pbs.server().jobs()

In addition, see section 6.11.5, "Minimize Unnecessary Steps", on page 518 and section 6.11.6, "Use Fast Operations", on page 518.
6.11.11 Windows Caveats
6.11.11.1 Special Characters in Pathnames
On Windows, where backslashes may appear in pathnames, escape each backslash with another backslash, or use the raw ('r') string prefix to form the string. Both of the following work:
e = pbs.event()
e.progname = "C:\\Program Files\\PBS Pro\\exec\\bin\\pbsnodes.exe"
e.progname = r"C:\Program Files\PBS Pro\exec\bin\pbsnodes.exe"
See section 6.12.4.17, “Event Object Member Caveats”, on page 575.
6.11.11.2 Creating Hooks Under Windows
To create a hook under Windows, you must use the installation account. In both domained and standalone environments, the installation account must be a local account that is a member of the local Administrators group on the local computer.
6.11.11.3 Using cmd Prompt
On Windows 7 and later with UAC enabled, if you will use the cmd prompt to operate on
hooks, or for any privileged command such as qmgr, you must run the cmd prompt with
option Run as Administrator.
6.11.11.4 Importing and Exporting Hooks
If the name of <input file> contains spaces, as Windows filenames may, then <input file> must be quoted.
6.11.11.5 Modifying Events
On Windows, in a multi-vnoded job, be careful modifying pbs.event().progname and
pbs.event().argv[] parameters; some values are tacked on by pbs_mom and are required.
See section 6.12.4.17.i, “Modifying progname or argv[] Under Windows”, on page 575.
6.11.11.6 Using Sleep in a Hook Script
Under Windows, the PBS server or MoM cannot interrupt a hook script executing the Python time.sleep(). The server needs to be able to interrupt the script if the script reaches its timeout. To make the script interruptible, write a sleep that sleeps incrementally, 1 second at a time; the server can then interrupt the hook script between the sleeps. For example:

import time
def mysleep(sec):
    # pseudo-sleep: sleep 1 second at a time so the server can interrupt
    for i in range(sec):
        time.sleep(1)
mysleep(30)   # pseudo-sleep for 30 seconds
6.12 Interface to Hooks
6.12.1 The pbs Module
The pbs module provides an interface to PBS and the hook environment. The interface is
made up of Python objects, members, and methods. You can operate on the objects and use
the methods in your Python code. In order to use the pbs module, you must begin your
Python code by importing the pbs module. For example, in a script that modifies a job:
import pbs
pbs.event().job.comment = "Modified this job"
For the contents of the pbs module, see section 6.15, “Python Modules and PBS”, on page
637.
6.12.2 PBS Interface Objects
The PBS interface contains different kinds of objects:
•  Objects to represent PBS entities, e.g. jobs, server, queues, vnodes, reservations, events, log messages, etc.
•  Objects to represent job, server, vnode, queue, and reservation attributes
•  Objects to represent arguments to PBS commands, PBS version information, etc.
•  Constant objects to represent event types, states, log event classes, queue types, and exceptions
PBS Professional 13.0 Administrator’s Guide
AG-525
Hooks
Chapter 6
6.12.3 PBS Interface Object Types
Several PBS objects have types that differ from what you'll see if you use the type() function on the object. The type() function returns the internal representation, which is a derivative of the Python type.
6.12.3.1 Table of PBS Interface Objects
PBS provides a set of interface objects for use in hooks. The following table lists all of the
PBS objects in alphabetical order. Each of these objects is described in detail later in the
chapter.
Table 6-15: PBS Interface Objects
PBS Interface Object
Description
pbs.acl
Represents a PBS ACL. See section 6.12.14.3.i, "Method to Create ACL", on page 607.
pbs.args
Represents a space-separated list of PBS arguments to commands such as qsub, qdel. See
"Method to Create Command Argument List” on page 607.
pbs.argv[]
Argument strings to be passed to the program executed for the job. See section 6.12.4.16.i, “Job
Program Arguments Event Member”, on page 568.
pbs.BadAttributeValueError
Raised when setting the member value of a pbs.* object and the value given is invalid. See
"Table of Exceptions” on page 486
pbs.BadAttributeValueTypeError
Raised when setting the member value of a pbs.* object and the value type is invalid. See
"Table of Exceptions” on page 486
pbs.BadResourceValueError
Raised when setting the resource value of a pbs.* object and the value given is invalid. See
"Table of Exceptions” on page 486
pbs.BadResourceValueTypeError
Raised when setting the resource value of a pbs.* object and the value type is invalid. See
"Table of Exceptions” on page 486
pbs.checkpoint
Represents a job's checkpoint attribute. See "Job Checkpoint Attribute Member” on page 586
pbs.depend
Represents a job's dependency attribute. See "Job depend Attribute Member” on page 587.
pbs.duration
Represents a time interval. See "Method to Create Duration from Time String or Integer” on
page 608.
pbs.email_list
Represents the set of users to whom mail may be sent. Example: Job's Mail_Users attribute.
See "Method to Create Email List” on page 609
pbs.env[]
Dictionary of environment variables. See section 6.12.4.16.ii, "Job Environment Event Member", on page 569.
pbs.event
Represents a PBS event. See "Event Objects” on page 539
pbs.EventIncompatibleError
Raised when referencing a nonexistent member in pbs.event(). See "Table of Exceptions” on
page 486.
pbs.EXECHOST_PERIODIC
Type for an exechost_periodic hook event. See section 6.12.4.15, “exechost_periodic: Periodic Events on All Execution Hosts”, on page 566.
pbs.EXECHOST_STARTUP
Type for an exechost_startup hook event. See section 6.12.4.14, “exechost_startup: Event
When Execution Host Starts Up”, on page 564.
pbs.EXECJOB_ATTACH
Type for an execjob_attach hook event. See section 6.12.4.10, “execjob_attach: Event when
pbs_attach() runs”, on page 557.
pbs.EXECJOB_BEGIN
Type for an execjob_begin hook event. See section 6.12.4.7, “execjob_begin: Event when
Execution Host Receives Job”, on page 551.
pbs.EXECJOB_END
Type for an execjob_end hook event. See section 6.12.4.13, “execjob_end: Event After Job
Cleanup”, on page 563.
pbs.EXECJOB_EPILOGUE
Type for an execjob_epilogue hook event. See section 6.12.4.12, “execjob_epilogue: Event
Just After Killing Job Tasks”, on page 561.
pbs.EXECJOB_LAUNCH
Type for an execjob_launch hook event. See section 6.12.4.9, “execjob_launch: Event when
Execution Host Receives Job”, on page 555.
pbs.EXECJOB_PRETERM
Type for an execjob_preterm hook event. See section 6.12.4.11, “execjob_preterm: Event Just
Before Killing Job Tasks”, on page 559.
pbs.EXECJOB_PROLOGUE
Type for an execjob_prologue hook event. See section 6.12.4.8, “execjob_prologue: Event
Just Before Execution of Top-level Job Process”, on page 553.
pbs.exec_host
Represents a job’s exec_host attribute. See "pbs.job.exec_host” on page 587 .
pbs.exec_vnode
Represents a job’s exec_vnode attribute. See "pbs.job.exec_vnode” on page 587 .
pbs.group_list
Represents a list of group names. See "pbs.job.group_list” on page 587.
pbs.hold_types
Represents the Hold_Types attribute of a job. See "pbs.job.Hold_Types” on page 587.
pbs.job
Represents a PBS job. See "Job Objects” on page 585
pbs.job_list[]
List of pbs.job objects. See "pbs.event().job_list” on page 571.
pbs.job_sort_formula
Represents the job_sort_formula server attribute. See "pbs.job_sort_formula()” on page 610
pbs.JOB_STATE_BEGUN
Job arrays only. Job array has started. See "Job job_state Attribute Member” on page 588
pbs.JOB_STATE_EXITING
Job is exiting after having run. See "Job job_state Attribute Member” on page 588
pbs.JOB_STATE_EXPIRED
Subjobs only. Subjob is finished (expired). See "Job job_state Attribute Member” on page 588
pbs.JOB_STATE_FINISHED
Job is finished: job executed successfully, job was terminated while running, job execution
failed, or job was deleted before execution. See "Job job_state Attribute Member” on page 588.
pbs.JOB_STATE_HELD
Job is held. See "Job job_state Attribute Member” on page 588
pbs.JOB_STATE_MOVED
Job has been moved to another server. See "Job job_state Attribute Member” on page 588.
pbs.JOB_STATE_QUEUED
Job is queued, eligible to run or be routed. See "Job job_state Attribute Member” on page 588
pbs.JOB_STATE_RUNNING
Job is running. See "Job job_state Attribute Member” on page 588
pbs.JOB_STATE_SUSPEND
Job is suspended by server. See "Job job_state Attribute Member” on page 588
pbs.JOB_STATE_SUSPEND_USERACTIVE
Job is suspended due to workstation becoming busy. See "Job job_state Attribute Member” on
page 588
pbs.JOB_STATE_TRANSIT
Job is in transit. See "Job job_state Attribute Member” on page 588
pbs.JOB_STATE_WAITING
Job is waiting for its requested execution time to be reached, or the job’s stagein request has
failed. See "Job job_state Attribute Member” on page 588
pbs.join_path
Represents the job’s Join_Path attribute. See "Job Join_Path Attribute Member” on page 589.
pbs.keep_files
Represents the Keep_Files job attribute. See "Job Keep_Files Attribute Member” on page 589
pbs.license_count
Represents a set of licensing-related counters. Server attribute. See section 6.12.14.3.xv,
“Method to Create license_count Object”, on page 611.
pbs.LOG_DEBUG
Log event class. See "Message Log Event Class Objects” on page 612
pbs.LOG_ERROR
Log event class. See "Message Log Event Class Objects” on page 612
pbs.LOG_WARNING
Log event class. See "Message Log Event Class Objects” on page 612
pbs.mail_points
Represents the Mail_Points attribute of a job. See "Job Mail_Points Attribute Member” on
page 589.
pbs.MODIFYJOB
The modifyjob hook event type. Triggered by qalter or pbs_alterjob() API call. Not
triggered by scheduler job modification. See "Event Types” on page 540.
pbs.MOVEJOB
The movejob hook event type. Triggered by qmove or pbs_movejob() API call. See
"Event Types” on page 540
pbs.ND_BUSY
Represents busy vnode state. See section 6.12.11.4, “Vnode State Constant Objects”, on page
600.
pbs.ND_DEFAULT_EXCL
Represents default_excl sharing vnode attribute value. See section 6.12.11.3, “Vnode Sharing
Constant Objects”, on page 600.
pbs.ND_DEFAULT_SHARED
Represents default_shared sharing vnode attribute value. See section 6.12.11.3, “Vnode
Sharing Constant Objects”, on page 600.
pbs.ND_DOWN
Represents down vnode state. See section 6.12.11.4, “Vnode State Constant Objects”, on page
600.
pbs.ND_FORCE_EXCL
Represents force_excl sharing vnode attribute value. See section 6.12.11.3, “Vnode Sharing
Constant Objects”, on page 600.
pbs.ND_FREE
Represents free vnode state. See section 6.12.11.4, “Vnode State Constant Objects”, on page
600.
pbs.ND_GLOBUS
No longer used. Represents the globus value for the vnode ntype attribute. Globus can still send jobs to PBS, but PBS no longer supports sending jobs to Globus. See section 6.12.11.2, "Vnode Type Constant Objects", on page 599.
pbs.ND_IGNORE_EXCL
Represents ignore_excl sharing vnode attribute value. See section 6.12.11.3, “Vnode Sharing
Constant Objects”, on page 600.
pbs.ND_JOBBUSY
Represents job-busy vnode state. See section 6.12.11.4, “Vnode State Constant Objects”, on
page 600.
pbs.ND_JOB_EXCLUSIVE
Represents job-exclusive vnode state. See section 6.12.11.4, “Vnode State Constant Objects”,
on page 600.
pbs.ND_OFFLINE
Represents offline vnode state. See section 6.12.11.4, “Vnode State Constant Objects”, on page
600.
pbs.ND_PBS
Represents pbs value for vnode ntype attribute. See section 6.12.11.2, “Vnode Type Constant
Objects”, on page 599
pbs.ND_PROV
Represents provisioning vnode state. See section 6.12.11.4, “Vnode State Constant Objects”,
on page 600.
pbs.ND_RESV_EXCLUSIVE
Represents resv-exclusive vnode state. See section 6.12.11.4, “Vnode State Constant Objects”,
on page 600.
pbs.ND_STALE
Represents stale vnode state. See section 6.12.11.4, “Vnode State Constant Objects”, on page
600.
pbs.ND_STATE_UNKNOWN
Represents state-unknown, down vnode state. See section 6.12.11.4, “Vnode State Constant
Objects”, on page 600.
pbs.ND_UNRESOLVABLE
Represents unresolvable vnode state. See section 6.12.11.4, “Vnode State Constant Objects”,
on page 600.
pbs.ND_WAIT_PROV
Represents wait-provisioning vnode state. See section 6.12.11.4, “Vnode State Constant
Objects”, on page 600.
pbs.node_group_key
Represents the node_group_key attribute. See "Method to Create node_group_key Object”
on page 614.
pbs.path_list
Represents a list of pathnames. See "Method to Create path_list Object” on page 614.
pbs.pbs_conf[]
Dictionary of entries in pbs.conf. See "pbs.pbs_conf[]” on page 601.
pbs.pid
Represents the process ID of a process belonging to a job.
pbs.place
Represents the place specification when submitting a job. See section 6.12.14.3.xxiii, “Method
to Create place Object”, on page 614.
pbs.progname
Path of job shell or executable. See section 6.12.4.16.ix, “Job Executable Event Member”, on
page 572.
pbs.QTYPE_EXECUTION
Represents execution value for queue_type queue attribute. See "Queue Type Constant
Objects” on page 585
pbs.QTYPE_ROUTE
Represents route value for queue_type queue attribute. See "Queue Type Constant Objects”
on page 585
pbs.queue
Represents a PBS queue. See "Queue Objects” on page 583
pbs.QUEUEJOB
The queuejob hook event type. Triggered by qsub or pbs_submit() API call. See section
6.12.4.3, “queuejob: Event when Job is Queued”, on page 545.
pbs.range
Represents a range of numbers referring to job array indices. See section 6.12.14.3.xxiv,
“Method to Create range Object”, on page 615.
pbs.resv
Represents a PBS reservation. See "Reservation Objects” on page 596
pbs.RESVSUB
The resvsub hook event type. Triggered by pbs_rsub or pbs_submitresv() API call.
See section 6.12.4.2, “resvsub: Event when Reservation is Created”, on page 544.
pbs.RESV_STATE_BEING_DELETED
The reservation state RESV_BEING_DELETED. See "Reservation State Constant Objects”
on page 597
pbs.RESV_STATE_CONFIRMED
The reservation state RESV_CONFIRMED. See "Reservation State Constant Objects” on
page 597
pbs.RESV_STATE_DEGRADED
The reservation state RESV_DEGRADED. See "Reservation State Constant Objects” on
page 597
pbs.RESV_STATE_DELETED
The reservation state RESV_DELETED. See "Reservation State Constant Objects” on
page 597
pbs.RESV_STATE_DELETING_JOBS
The reservation state RESV_DELETING_JOBS. See "Reservation State Constant Objects”
on page 597
pbs.RESV_STATE_FINISHED
The reservation state RESV_FINISHED. See "Reservation State Constant Objects” on
page 597
pbs.RESV_STATE_NONE
The reservation state RESV_NONE. See "Reservation State Constant Objects” on page 597
pbs.RESV_STATE_RUNNING
The reservation state RESV_RUNNING. See "Reservation State Constant Objects” on
page 597
pbs.RESV_STATE_TIME_TO_RUN
The reservation state RESV_TIME_TO_RUN. See "Reservation State Constant Objects” on
page 597
pbs.RESV_STATE_UNCONFIRMED
The reservation state RESV_UNCONFIRMED. See "Reservation State Constant Objects” on
page 597
pbs.RESV_STATE_WAIT
The reservation state RESV_WAIT. See "Reservation State Constant Objects” on page 597
pbs.route_destinations
Represents route_destinations queue attribute. See "Method to Create route_destinations
Object” on page 616.
pbs.RUNJOB
The runjob hook event type. Triggered by qrun or pbs_runjob() API call. See section
6.12.4.6, “runjob: Event Before Job is Received by MoM”, on page 549.
pbs.select
Represents the select specification when submitting a job. See "Method to Create select
Object” on page 616.
pbs.server
Represents the local PBS server. See"Server Objects” on page 579
pbs.size
Represents a PBS size type. See "Method to Create size Object” on page 616.
pbs.software
Represents a site-dependent software specification resource. See "Method to Create Software
Resource Object” on page 617.
pbs.staging_list
Represents a list of file stagein or stageout parameters. See "Job stagein and stageout Attribute
Members” on page 590.
pbs.state_count
Represents a set of job-related state counters. See "Method to Create state_count Object” on
page 618.
pbs.SV_STATE_ACTIVE
Server state is Scheduling. See "Server State Member” on page 580
pbs.SV_STATE_HOT
Server state is Hot_Start. See "Server State Member” on page 580
pbs.SV_STATE_IDLE
Server state is Idle. See "Server State Member” on page 580
pbs.SV_STATE_SHUTDEL
Server state is Terminating, Delayed. See "Server State Member” on page 580
pbs.SV_STATE_SHUTIMM
Server state is Terminating. See "Server State Member” on page 580
pbs.SV_STATE_SHUTSIG
Server state is Terminating. See "Server State Member” on page 580
pbs.UnsetAttributeNameError
Raised when referencing a non-existent member name of a pbs.* object. See "Table of Exceptions” on page 486
pbs.UnsetResourceNameError
Raised when referencing a non-existent resource name of a pbs.* object. See "Table of Exceptions” on page 486
pbs.user_list
Represents a list of user names. See section 6.12.14.3.xxxii, “Method to Create user_list
Object”, on page 618.
pbs.vchunk
Represents a job chunk. See section 6.12.9, “Chunk Objects”, on page 595.
pbs.version
Represents version information for PBS. See section 6.12.14.3.xxxiii, “Method to Create PBS
Version Object”, on page 618.
pbs.vnode
Represents a PBS vnode. See section 6.12.11, “Vnode Objects”, on page 598.
pbs.vnode_list[]
Represents a list of pbs.vnode objects. See section 6.12.4.16.xv, "The Vnode List Event Member", on page 573.
SystemExit
Raised when accepting or rejecting an action. See "Table of Exceptions” on page 486
6.12.3.2 Maps of Object Members and Methods
Figure 6-3 shows a map of the PBS Python objects. All hook event objects have the methods
listed in “global methods”. Each object also has its own members and methods, as shown.
We expand hook event objects in Figure 6-4.
[Figure 6-3: Map of members and methods for major PBS objects. The figure shows:
  global methods: attribute creation methods acl(), checkpoint(), depend(), duration(), email_list(), exec_host(), exec_vnode(), group_list(), hold_types(), job_sort_formula(), join_path(), keep_files(), license_count(), mail_points(), node_group_key(), path_list(), place(), range(), route_destinations(), select(), size(), staging_list(), state_count(), user_list(); other methods args(), get_hook_config_file(), get_local_nodename(), get_pbs_conf(), logjobmsg(), logmsg(), pbs_env(), reboot(), software(), version()
  server: name; server attributes, e.g. default_qsub_arguments, job_sort_formula, resources_available, resources_assigned; job("<job ID>"), jobs(), resv("<reservation ID>"), resvs(), queue("<queue name>"), queues(), vnode("<vnode name>"), vnodes(), scheduler_restart_cycle()
  queue: name; queue attributes, e.g. from_route_only, queue_type; job("<job ID>"), jobs()
  job: id; job attributes, e.g. Hold_Types, Execution_Time, Variable_List[<variable>], exec_vnode; job resources; delete(), in_ms_mom(), is_checkpointed(), rerun()
  resv: resvid; reservation attributes, e.g. authorized_users, reserve_start
  vnode: vnode attributes, e.g. ntype, sharing, state, resources_available
  vchunk: vnode_name, chunk_resources[], chunk_resources.keys(); an event's chunks[] holds vchunk objects
  Also shown: exec_vnode, pbs_conf, hook_config_filename, and the "all events" object expanded in Figure 6-4.]
PBS Professional 13.0 Administrator’s Guide
AG-537
Hooks
Chapter 6
Figure 6-4 shows an expanded view of hook event objects. All hook events have the members
and methods listed in Figure 6-3, which shows events inheriting global methods. Each type of
event also has its own members and/or methods. For example, movejob events have a job
member and a src_queue member, in addition to the type, hook_name, requestor,
requestor_host, and hook_type members, and the accept(), get_local_nodename(), logjobmsg(), logmsg(), and reject() methods shared by all events. For a description of event
objects, see section 6.12.4, “Event Objects”, on page 539.
[Figure 6-4: Expanded view of event object members and methods. All events have the members alarm, hook_name, hook_type, requestor, requestor_host, and type, and the methods accept() and reject(). Additional members by event type:
  queuejob, runjob: job
  modifyjob: job, job_o
  movejob: job, src_queue
  resvsub: resv
  execjob_begin, execjob_prologue, execjob_preterm: job, vnode_list[]
  execjob_attach: pid, job, vnode_list[]
  execjob_launch: argv[], env[], job, progname, vnode_list[]
  execjob_end, execjob_epilogue: job, vnode_list[]
  exechost_startup: vnode_list[]
  exechost_periodic: job_list[], vnode_list[]]
6.12.4 Event Objects
pbs.event
The event object represents the event that has triggered the hook. You can pass the object to
the hook script, and use it in the script. To retrieve objects associated with the event, use this:
pbs.event().<object>
For example, to retrieve the job that triggered an event:
pbs.event().job
There are several types of events. Each type of event is triggered by a different occurrence,
and each type has a corresponding hook type. Each type of event has access to different data,
and can perform different operations. Some data and operations are common to all events.
Each type of event hook can read and set different job, vnode, and reservation attributes and
resources. Each type of event can read different server and queue attributes and resources.
We list which attributes and resources can be set for each event in section 6.10.4, “Using
Attributes and Resources in Hooks”, on page 488.
6.12.4.1 Event Types
pbs.event().type
The type of the event. Represents the type attribute of the hook. This object can take one or
more of the values shown here. The following table summarizes the event types, their constant objects, their triggers, and when and where they run, and gives a pointer to a complete
description of the associated hook:
Table 6-16: Event Types and Objects

resvsub (pbs.RESVSUB)
  Trigger: pbs_rsub and the pbs_submitresv() API call. A resvsub hook is executed after all processing of pbs_rsub input, and just before a reservation is created.
  Where run: At server
  Description: See section 6.12.4.2, "resvsub: Event when Reservation is Created", on page 544.

queuejob (pbs.QUEUEJOB)
  Trigger: qsub and the pbs_submit() API call. Not triggered by requeueing a job (qrerun) or on node_fail_requeue, when a job is discarded by the MoM because the execution host went down. A queuejob hook is executed after all processing of qsub input, and just before the job is queued.
  Where run: At server
  Description: See section 6.12.4.3, "queuejob: Event when Job is Queued", on page 545.

modifyjob (pbs.MODIFYJOB)
  Trigger: qalter, the pbs_alterjob() API call, calculating eligible time, and setting the job's comment. Not triggered when the scheduler modifies a job. A modifyjob hook is executed after all processing of qalter input, and just before the job's attributes are modified.
  Where run: At server
  Description: See section 6.12.4.4, "modifyjob: Event when Job is Altered", on page 546.

movejob (pbs.MOVEJOB)
  Trigger: qmove and the pbs_movejob() API call. Not triggered by pbs_rsub -Wqmove=<job ID>. A movejob hook is executed after qmove arguments are processed, but before a job is moved from one queue to another.
  Where run: At server
  Description: See section 6.12.4.5, "movejob: Event when Job is Moved", on page 547.

runjob (pbs.RUNJOB)
  Trigger: qrun and the pbs_runjob() API call. A runjob hook is executed just before a job is sent to an execution host.
  Where run: At server
  Description: See section 6.12.4.6, "runjob: Event Before Job is Received by MoM", on page 549.

execjob_begin (pbs.EXECJOB_BEGIN)
  Trigger: An execjob_begin hook is executed when MoM receives the job, after any files or directories are staged in.
  Where run: On the primary execution host, and if successful, on all sister hosts allocated to the job
  Description: See section 6.12.4.7, "execjob_begin: Event when Execution Host Receives Job", on page 551.

execjob_prologue (pbs.EXECJOB_PROLOGUE)
  Trigger: An execjob_prologue hook is executed just before the first job process is started.
  Where run: On the primary execution host, and on all sister hosts where any job task is spawned or attached
  Description: See section 6.12.4.8, "execjob_prologue: Event Just Before Execution of Top-level Job Process", on page 553.

execjob_launch (pbs.EXECJOB_LAUNCH)
  Trigger: An execjob_launch hook is executed just before the user's program is run.
  Where run: On the primary host, and on all sister hosts where MPI tasks are started with tm_spawn()
  Description: See section 6.12.4.9, "execjob_launch: Event when Execution Host Receives Job", on page 555.

execjob_attach (pbs.EXECJOB_ATTACH)
  Trigger: An execjob_attach hook is executed before any execjob_prologue hooks run.
  Where run: On each vnode where pbs_attach() runs
  Description: See section 6.12.4.10, "execjob_attach: Event when pbs_attach() runs", on page 557.

execjob_preterm (pbs.EXECJOB_PRETERM)
  Trigger: An execjob_preterm hook is executed when the job receives a termination signal.
  Where run: On all hosts allocated to the job
  Description: See section 6.12.4.11, "execjob_preterm: Event Just Before Killing Job Tasks", on page 559.

execjob_epilogue (pbs.EXECJOB_EPILOGUE)
  Trigger: An execjob_epilogue hook is executed after all of the job processes have terminated, after executing or killing a job, but before the job is cleaned up.
  Where run: On all hosts allocated to the job
  Description: See section 6.12.4.12, "execjob_epilogue: Event Just After Killing Job Tasks", on page 561.

execjob_end (pbs.EXECJOB_END)
  Trigger: An execjob_end hook is executed at the end of all job processing.
  Where run: On all hosts allocated to the job
  Description: See section 6.12.4.13, "execjob_end: Event After Job Cleanup", on page 563.

exechost_startup (pbs.EXECHOST_STARTUP)
  Trigger: An exechost_startup hook is executed when a MoM starts up or receives a HUP (UNIX/Linux).
  Where run: On all execution hosts in the complex
  Description: See section 6.12.4.14, "exechost_startup: Event When Execution Host Starts Up", on page 564.

exechost_periodic (pbs.EXECHOST_PERIODIC)
  Trigger: An exechost_periodic hook is executed at specified intervals.
  Where run: On all execution hosts in the complex
  Description: See section 6.12.4.15, "exechost_periodic: Periodic Events on All Execution Hosts", on page 566.
6.12.4.2 resvsub: Event when Reservation is Created
6.12.4.2.i Modifying Reservation Creation (pbs_rsub)
•  When an advance or standing reservation is created via pbs_rsub, resvsub hooks can modify the reservation's attributes that can be set via pbs_rsub
•  When an advance or standing reservation is created, resvsub hooks can specify additional attributes that can be specified via pbs_rsub
•  The input reservation attributes on which resvsub hooks operate are those that exist after all pbs_rsub processing of command line arguments is completed
•  For resvsub hooks, the input reservation attributes do not include:
   •  Server or queue resources_default or default_chunk
   •  Conversions from old syntax (-lnodes and -lncpus) to new select and place syntax

The only time that a reservation can be modified is during its creation. A resvsub event hook can set any settable reservation attribute and any resource that can be specified via pbs_rsub. See Table 6-10, "Reservation Attributes Readable & Settable via Events," on page 503 for a complete list of the reservation attributes that this hook can read and set.
6.12.4.2.ii The resvsub Hook Interface
The type for this event is pbs.RESVSUB.
A resvsub hook is executed after all processing of pbs_rsub input, and just before a reservation is created. The hook is triggered by pbs_rsub and the pbs_submitresv() API
call.
A reservation object’s attributes appear to a resvsub hook as they would be after the event,
not before it.
A pbs.RESVSUB event has the following member, in addition to those listed in Table 6-17,
“Using Event Object Members in Events,” on page 568 and Table 6-28, “Methods Available
in Events,” on page 605:
pbs.event().resv
A pbs.resv object containing the attributes and resources specified for the reservation being requested. See section 6.12.10, “Reservation Objects”, on page 596.
A pbs.event().accept() terminates hook execution and allows creation of the reservation, and
any changes to reservation resources take effect.
A pbs.event().reject() terminates hook execution and causes the reservation not to be created.
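Here is a minimal sketch of a resvsub hook; the 24-hour cap and the use of the reservation's reserve_duration attribute (in seconds) are illustrative assumptions:

import pbs
e = pbs.event()
r = e.resv
# assumption: reserve_duration is readable here and expressed in seconds
if r.reserve_duration is not None and int(r.reserve_duration) > 24 * 3600:
    e.reject("reservations longer than 24 hours are not accepted")
e.accept()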
6.12.4.3 queuejob: Event when Job is Queued
6.12.4.3.i Modifying Job Submission (qsub)
•  When a job is submitted via qsub, queuejob hooks can modify the following things explicitly specified in the job submission:
   •  Job attributes that can be set via qsub
   •  Job comment
   •  Resources requested by the job
•  When a job is submitted via qsub, queuejob hooks can add resource requests to those specified in the job submission
•  The input job attributes on which queuejob hooks operate are those that exist after all qsub processing is completed. These input attributes include:
   •  Command line arguments
   •  Script directives
   •  Server default_qsub_arguments
•  When a queuejob hook runs at job submission, the hook can affect only that job.
•  For queuejob hooks, the input job attributes do not include:
   •  Server or queue resources_default or default_chunk
   •  Conversions from old syntax (-lnodes or -lncpus) to new select and place syntax

See section 6.10.4, "Using Attributes and Resources in Hooks", on page 488, for a complete listing of attributes and resources that this hook can modify.
6.12.4.3.ii The queuejob Hook Interface
The event type for this event is pbs.QUEUEJOB.
A queuejob hook runs after all processing of qsub input, just before the job reaches the
server, and before the job is queued, including when a job is peer queued to a server with a
queuejob hook. The hook is triggered by qsub or the pbs_submit() API call. A queuejob hook is not triggered by requeueing a job (qrerun) or on node_fail_requeue, when a
job is discarded by the MoM because the execution host went down. A queuejob hook runs
once per job array.
In a queuejob event, the event’s job object members are as they would be if the job were to be
successfully submitted.
A pbs.QUEUEJOB event has the following member, in addition to those listed in Table 6-17,
“Using Event Object Members in Events,” on page 568 and Table 6-28, “Methods Available
in Events,” on page 605:
pbs.event().job
A pbs.job object with the attributes and resources specified at submission for the job
being queued. See section 6.12.7, “Job Objects”, on page 585.
A pbs.event().accept() terminates hook execution and allows the job to be queued, and any
changes to job attributes or resources take effect.
A pbs.event().reject() terminates hook execution and causes the job not to be queued. The
job is not accepted by the server, and is not assigned a job ID.
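Here is a minimal sketch of a queuejob hook that rejects jobs submitted without a walltime request; the policy and the message text are illustrative:

import pbs
e = pbs.event()
j = e.job
# assumption: an unrequested resource reads back as None
if j.Resource_List["walltime"] is None:
    e.reject("please request walltime, e.g. qsub -l walltime=01:00:00")
e.accept()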
6.12.4.4 modifyjob: Event when Job is Altered
6.12.4.4.i Modifying Job Change (qalter)
•  When a job is changed via qalter, modifyjob hooks can modify the arguments passed to qalter
•  When a modifyjob hook runs, it can change the attributes of the job that can be changed via qalter
Before the job runs, this hook can set any job attribute that can be changed via qalter, can
set the job’s comment, and can set any resource requested by the job.
While the job is running, the only job attributes and resources that the hook can set are those
that can be changed via the qalter command: the job’s cput and walltime. See section
6.10.4, “Using Attributes and Resources in Hooks”, on page 488, for a complete listing of
attributes and resources that this hook can modify.
See “qalter” on page 135 of the PBS Professional Reference Guide and “Job Attributes” on
page 393 of the PBS Professional Reference Guide.
6.12.4.4.ii The modifyjob Hook Interface
The type for this event is pbs.MODIFYJOB.
A modifyjob hook is executed after all processing of qalter input, and just before the job's attributes are modified. The hook is triggered by the following:
•  A qalter command, except when the scheduler calls the command
•  The pbs_alterjob() API call
•  Calculating eligible time
•  Setting the job's comment

A modifyjob hook runs once per job array.
A job object's attributes appear to a modifyjob hook as they would be after the job is modified, not before.
A modifyjob event hook shows the original job with all its attributes in pbs.event().job_o.
A pbs.MODIFYJOB event has the following members, in addition to those listed in Table 6-17, "Using Event Object Members in Events," on page 568 and Table 6-28, "Methods Available in Events," on page 605:
pbs.event().job
A pbs.job object representing the job being modified. See section 6.12.7, “Job
Objects”, on page 585. This job object contains only those attributes and resources
specified for modification. This job object does not contain any attributes or
resources that are not to be modified.
pbs.event().job_o
A pbs.job object representing the original job, before the job was modified via
qalter. See section 6.12.4.16.vii, “Original Job Event Member”, on page 571.
A pbs.event().accept() terminates hook execution and allows the job to be altered, and any
changes to job attributes or resources take effect.
A pbs.event().reject() terminates hook execution and causes the job not to be altered.
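Here is a minimal sketch of a modifyjob hook. Note the check that lets the server's own qalter calls through (see section 6.11.7.2), and that pbs.event().job carries only the attributes being changed, while pbs.event().job_o is the original job:

import pbs
e = pbs.event()
# let PBS itself modify jobs without interference
if e.requestor in ["PBS_Server"]:
    e.accept()
# assumption: an attribute not being modified reads back as None in e.job
if e.job.Resource_List["walltime"] is not None:
    pbs.logjobmsg(e.job_o.id, "walltime modification requested")
e.accept()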
6.12.4.5 movejob: Event when Job is Moved
6.12.4.5.i Modifying Job Move (qmove)
•  When a job is moved via qmove, movejob hooks can modify the arguments passed to qmove
•  When a movejob hook runs, it can change the job's destination queue to any queue on the default server
A movejob hook can specify only local queues as the destination queue. Whether a job is
submitted with a local queue or a remote queue as its destination, a movejob hook can
change the destination to a local queue.
The only job attribute that a movejob event hook can set is the job’s destination queue.
6.12.4.5.ii The movejob Hook Interface
The type for this event is pbs.MOVEJOB.
The server runs its movejob hooks when any of the following happens:
•  This server is the furnishing server when peer scheduling a job
•  A job is moved from this server to another server via the qmove command
•  A job is moved between two queues on this server
A movejob hook is executed after qmove arguments are processed, but before a job is moved
from one queue to another. This hook is triggered by qmove and the pbs_movejob() API
call. movejob hooks are not triggered by pbs_rsub -Wqmove=<job ID>. A movejob
hook runs once per job array.
A job object’s attributes appear to a movejob hook as they would be after the event, not
before it.
The hook shows the job’s originating queue in the pbs.event().src_queue object member.
A pbs.MOVEJOB event has the following members, in addition to those listed in Table 6-17,
“Using Event Object Members in Events,” on page 568 and Table 6-28, “Methods Available
in Events,” on page 605:
pbs.event().job
A pbs.job object representing the job being moved. See section 6.12.7, “Job
Objects”, on page 585.
Note that pbs.event().job.queue refers to the destination queue, not the current
queue.
pbs.event().src_queue
The pbs.queue object representing the originating queue where pbs.event().job
came from.
A pbs.event().accept() terminates hook execution and allows the job to be moved, and any
changes to job attributes or resources take effect.
A pbs.event().reject() terminates hook execution and causes the job not to be moved.
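As an illustrative sketch only, a movejob hook can inspect pbs.event().src_queue and redirect the job to a local queue. The queue names "overflow" and "workq" here are hypothetical; remember that the destination queue is the only job attribute a movejob hook may set.

    import pbs

    e = pbs.event()
    j = e.job

    pbs.logmsg(pbs.LOG_DEBUG,
               "movejob: %s moving from %s" % (j.id, e.src_queue.name))

    # Redirect moves aimed at the hypothetical queue "overflow" to the
    # local queue "workq".  j.queue is the destination queue.
    if j.queue.name == "overflow":
        j.queue = pbs.server().queue("workq")

    e.accept()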
6.12.4.6	runjob: Event Before Job is Received by MoM
6.12.4.6.i	Changes Before Job is Sent to MoM (qrun)
When the scheduler runs a job or the administrator runs a job using the qrun command, any
runjob hooks are executed.
•	On accepting a job, a runjob hook can modify the following:
	•	The job’s Error_Path attribute
	•	The job’s Output_Path attribute
	•	All of the job’s Variable_List attribute members
	•	The following Resource_List attribute members:
		cput
		exec_vnode
		file
		max_walltime
		min_walltime
		nice
		pcput
		pmem
		pvmem
		site
		software
		start_time
		walltime
•	When a runjob hook rejects a job, it can do the following:
	•	Set the job’s depend attribute
	•	Set any members of the job’s Variable_List attribute
	•	Place a hold on the job
	•	Release a hold on the job
	•	Set the job’s project attribute
	•	Change the time the job is allowed to begin execution
	•	Set any of the job’s Resource_List attribute members except nodect
	•	Change the state of a vnode where the job would have run
See Table 6-8, “Job Attributes Readable & Settable via Events,” on page 499 and Table 6-11,
“Job Resources Readable & Settable by Hooks via Events,” on page 505.
A runjob hook can modify a vnode only if the hook rejects the event, and the vnode is in the
job’s exec_vnode attribute. For a vnode, the hook can modify only the state attribute. The
only pre-execution event hook that can change this attribute is a runjob hook.
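A hedged sketch of that pattern follows. The health test is a placeholder, and the sketch assumes the server method pbs.server().vnode() and the vnode state constant pbs.ND_OFFLINE provided by the pbs module:

    import pbs

    e = pbs.event()
    j = e.job

    # Placeholder test; a real hook would check something site-specific.
    node_is_healthy = False

    if not node_is_healthy:
        # exec_vnode looks like "(vnA:ncpus=2)+(vnB:ncpus=2)";
        # extract the first vnode's name.
        first = str(j.exec_vnode).split("+")[0].strip("()").split(":")[0]
        vn = pbs.server().vnode(first)
        vn.state = pbs.ND_OFFLINE   # allowed only because we reject below
        e.reject("run rejected; vnode %s marked offline" % first)

    e.accept()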
6.12.4.6.ii	The runjob Hook Interface
The event type is pbs.RUNJOB.
A runjob event occurs when one of the following happens:
•	The administrator uses the qrun command
•	The scheduler chooses to run a job and calls pbs_runjob()
A runjob hook is executed just before a job is sent to the execution host. It is triggered by
qrun and the pbs_runjob() API call. A runjob hook runs once per subjob.
For a runjob hook only, job object attributes appear as they would be before the event takes
place.
A pbs.RUNJOB event has the following member, in addition to those listed in Table 6-17,
“Using Event Object Members in Events,” on page 568 and Table 6-28, “Methods Available
in Events,” on page 605:
pbs.event().job
A pbs.job object representing the job being run. See section 6.12.7, “Job Objects”,
on page 585.
A pbs.event().accept() terminates hook execution and allows the job to be sent to the execution host, and any changes to job attributes or resources take effect.
A pbs.event().reject() terminates hook execution and causes the job to be requeued instead of
being sent to the execution host. When a job is requeued by this hook, the scheduler considers
it for execution in the next scheduling cycle.
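For illustration, a runjob hook along these lines fills in a missing walltime just before the job is sent to the execution host. The one-hour default is an arbitrary assumption for this sketch:

    import pbs

    e = pbs.event()
    j = e.job   # in a runjob hook, attributes show pre-event values

    # If the job was queued without a walltime, give it an arbitrary
    # one-hour default before it is sent to MoM.
    if j.Resource_List["walltime"] is None:
        j.Resource_List["walltime"] = pbs.duration("01:00:00")

    e.accept()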
6.12.4.7	execjob_begin: Event when Execution Host Receives Job
6.12.4.7.i	Changes When Job is Received by MoM
When MoM receives a job, an execjob_begin hook can:
•	Modify the job’s Execution_Time, Hold_Types, Variable_List, and resources_used attributes
•	Flag the job to be rerun
•	Kill the job
•	Set attributes and resources on the vnode(s) managed by the MoM where this job executes
6.12.4.7.ii	The execjob_begin Hook Interface
This event type is pbs.EXECJOB_BEGIN.
An execjob_begin hook executes on the primary execution host and then, if successful, executes on all the sister hosts allocated to the job. The hook executes when the host first
receives the job, after any files or directories are staged in.
A pbs.EXECJOB_BEGIN event has the following members and methods, in addition to
those listed in Table 6-17, “Using Event Object Members in Events,” on page 568 and
Table 6-28, “Methods Available in Events,” on page 605:
pbs.event().job
This is a pbs.job object representing the job that is about to run. See section 6.12.7,
“Job Objects”, on page 585.
pbs.event().vnode_list[]
This is a dictionary of pbs.vnode objects, keyed by vnode name, listing the vnodes
that are assigned to this job. See section 6.12.4.16.xv, “The Vnode List Event Member”, on page 573 for information about using pbs.event().vnode_list[].
pbs.reboot()
Reboots host. See section 6.12.14.3.xxv, “Method to Reboot Host”, on page 615.
A call to pbs.event().accept() means the job can proceed with execution, and any changes to
job attributes, resources, or the vnode list take effect.
A call to pbs.event().reject(<message>) automatically causes the job to be killed and tells
the server to requeue the job. In addition, any changes to job attributes, resources, or vnode
list take effect. When a job is requeued by this hook, the scheduler considers it for execution
in the next scheduling cycle.
•	If the pbs.event().reject(<message>) call is made on a primary execution host, the following message appears in the MoM log at log event class PBSEVENT_DEBUG2:
	“execjob_begin request rejected by <hook_name>”
	<message>
	The rejection message <message> also appears in the STDERR of the program invoking the pbs_runjob() API, such as qrun.
•	If the pbs.event().reject(<message>) call is made on a sister host, the following message appears in the MoM log at log event class PBSEVENT_DEBUG2:
	“execjob_begin request rejected by <hook_name>”
	<message>
	In addition, this message appears in mom_logs on the primary execution host:
	“job_start_error: <hook errno> from node <hostname> could not JOIN_JOB successfully.”
•	If pbs_runjob() was invoked by the scheduler, the following job comment appears:
	“Not running: PBS Error: <message>”
If the execjob_begin hook script encounters an unexpected error causing an unhandled
exception, or if the script terminates due to a hook alarm, the job is automatically killed and
the server requeues the job. No job changes, vnode changes, or requests for host reboot or
scheduler cycle restart take effect. In this case, one of the following messages appears in the
MoM log at event class PBSEVENT_DEBUG2:
“execjob_begin hook <hook_name> encountered an exception, request
rejected”
“alarm call while running execjob_begin hook '<hook_name>', request
rejected”
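The following sketch shows the shape of an execjob_begin hook. The environment variable, the scratch path, and the custom vnode resource "lastjob" are illustrative assumptions; the resource would have to be defined at your site before this would work.

    import pbs

    e = pbs.event()
    j = e.job

    # Export a job-wide environment variable (the path is hypothetical).
    j.Variable_List["JOB_SCRATCH"] = "/scratch/%s" % j.id

    # Record this job's ID in an assumed custom string resource on each
    # vnode in this job's vnode list; only vnodes managed by this MoM
    # are actually changed.
    for vname in e.vnode_list.keys():
        e.vnode_list[vname].resources_available["lastjob"] = str(j.id)

    e.accept()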
6.12.4.8	execjob_prologue: Event Just Before Execution of Top-level Job Process
6.12.4.8.i	Changes Before Job Shell is Executed
Just before a job’s top shell is executed, an execjob_prologue hook can:
•	Modify the job’s Execution_Time, Hold_Types, and resources_used attributes
•	Flag the job to be rerun
•	Kill the job
•	Set attributes and resources on the vnode(s) managed by the MoM where this job executes
6.12.4.8.ii	The execjob_prologue Hook Interface
This event type is pbs.EXECJOB_PROLOGUE.
An execjob_prologue hook always runs on the primary execution host. It also runs on each
sister MoM host allocated to the job, if and when at least one of the job’s tasks is spawned on
that host via tm_spawn() through a tightly integrated MPI, or when a job process uses
pbs_attach() on that host. On the primary execution host, the hook executes just before
the job’s top-level shell or cmd process is executed; this is where the prologue runs. On a
sister host running a task spawned with tm_spawn(), the hook executes just before the first
task of the job on that host is spawned, and before any execjob_launch hooks. It is not run
for any additional spawned task on that host. On a sister host running a task attached with
pbs_attach(), the hook executes just before the first task of the job on that host is
attached, and after any execjob_attach hooks. See section 6-1, “Execution Event Hook
Timing”, on page 457.
An execjob_prologue hook overrides a prologue. If an execjob_prologue hook exists and is
enabled, MoM executes the hook. Otherwise, she executes the prologue.
A pbs.EXECJOB_PROLOGUE event has the following members and methods, in addition
to those listed in Table 6-17, “Using Event Object Members in Events,” on page 568 and
Table 6-28, “Methods Available in Events,” on page 605:
pbs.event().job
This is a pbs.job object representing the job that is about to run. See section 6.12.7,
“Job Objects”, on page 585.
pbs.event().vnode_list[]
This is a dictionary of pbs.vnode objects, keyed by vnode name, listing the vnodes
that are assigned to this job. See section 6.12.4.16.xv, “The Vnode List Event Member”, on page 573 for information about using pbs.event().vnode_list[].
pbs.reboot()
Reboots host. See section 6.12.14.3.xxv, “Method to Reboot Host”, on page 615.
A pbs.event().accept() allows the job to continue its normal execution, and any changes to
job attributes, resources, or vnode list take effect.
A pbs.event().reject(<message>) causes the job to be killed, and the owning server to
requeue the job. Any changes to job attributes, resources, or vnode list take effect. When a
job is requeued by this hook, the scheduler considers it for execution in the next scheduling
cycle.
•	On the primary execution host, the following job-level mom_logs entries appear:
	“execjob_prologue request rejected by <hook_name>”
	<message>
•	On a sister vnode, the following job-level mom_logs entries appear:
	“execjob_prologue request rejected by <hook_name>”
	<message>
•	In addition, the following message appears in the STDERR of the program invoking the tm_attach() API, such as the pbs_attach() command:
	“a hook has rejected the task manager request”
If the following setting is specified in the hook script, just before issuing a
pbs.event().reject(), the job is deleted instead of being requeued:
pbs.event().job.delete()
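A hedged sketch of that delete-instead-of-requeue pattern follows; the failure test is a placeholder for a site-specific check.

    import pbs

    e = pbs.event()
    j = e.job

    # Placeholder for a site-specific, unrecoverable failure check,
    # e.g. a missing filesystem or bad credentials.
    unrecoverable = True

    if unrecoverable:
        j.delete()   # delete the job rather than letting it be requeued
        e.reject("job %s deleted by execjob_prologue hook" % j.id)

    e.accept()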
If the user attribute of the execjob_prologue hook is set to pbsuser, the hook script executes
under the context of the job owner (the value of the euser job attribute).
If the execjob_prologue hook script encounters an unexpected error causing an unhandled
exception, or if the script terminates due to a hook alarm, the job is killed and the server
requeues the job.