ARC SEARCH METHODS FOR LINEARLY CONSTRAINED OPTIMIZATION
A DISSERTATION
SUBMITTED TO THE INSTITUTE FOR
COMPUTATIONAL AND MATHEMATICAL ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Nicholas Wayne Henderson
June 2012
Abstract
We present a general arc search algorithm for linearly constrained optimization. The method constructs and searches along smooth arcs that satisfy a small and practical set of properties. An
active-set strategy is used to manage linear inequality constraints. When second derivatives are
used, the method is shown to converge to a second-order critical point and have a quadratic rate
of convergence under standard conditions. The theory is applied to the methods of line search,
curvilinear search, and modified gradient flow that have previously been proposed for unconstrained
problems. A key issue when generalizing unconstrained methods to linearly constrained problems
using an active-set strategy is the complexity of how the arc intersects hyperplanes. We introduce a
new arc that is derived from the regularized Newton equation. Computing the intersection between
this arc and a linear constraint reduces to finding the roots of a quadratic polynomial. The new arc
scales to large problems, does not require modification to the Hessian, and is rarely dependent on
the scaling of directions of negative curvature. Numerical experiments show the effectiveness of this
arc search method on problems from the CUTEr test set and on a specific class of problems for which
identifying negative curvature is critical. A second set of experiments demonstrates that when using
SR1 quasi-Newton updates, this arc search method is competitive with a line search method using
BFGS updates.
Acknowledgments
I’ve had the great privilege of knowing and working with many wonderful people during my time at
Stanford University.
My principal advisor, Walter Murray, has been a great mentor and friend. He helped me land my
first internship the summer before I started at Stanford. He put me forward for various fellowships,
which supported this research. His class on numerical optimization initiated my interest in the field.
I will always value his guidance and advice.
Michael Saunders has also been a great supporter and friend. Any reader of this dissertation
who is unable to find a dangling participle should thank him. Michael is kind of heart and extremely generous with his time. His excellent class on large-scale linear algebra and optimization
was instrumental in the development of my solver.
My other committee members were Yinyu Ye, Margot Gerritsen, and Robert Tibshirani. They
are exemplary teachers, researchers, and people.
ICME has been a wonderful home during my time at Stanford. I would like to thank the past
and present directors Peter Glynn, Walter Murray, and Margot Gerritsen for their dedication to the
program. Indira Choudhury deserves special credit for her tireless support of students.
Finally, I would like to express my gratitude to my family and friends. I enjoy life because they
are a part of it.
This research was supported by the William R. Hewlett Stanford Graduate Fellowship Fund and
grants from the Office of Naval Research.
Contents

Abstract
Acknowledgments

1 Introduction
  1.1 Preliminaries
  1.2 Unconstrained optimization
  1.3 Linearly constrained optimization
  1.4 Second derivative methods
    1.4.1 Newton
    1.4.2 Line search and extensions
    1.4.3 Gradient flow
    1.4.4 Trust-region
  1.5 Thesis outline

2 Convergence
  2.1 Preliminaries
  2.2 Statement of assumptions
  2.3 Definition of the algorithm
    2.3.1 Properties of Γk
    2.3.2 Properties of πk
    2.3.3 Properties of Γk related to constraints
    2.3.4 Properties of αk
    2.3.5 Properties of Ak
  2.4 Convergence results

3 Arcs
  3.1 Preliminaries
  3.2 Line search
  3.3 Curvilinear search
    3.3.1 Moré & Sorensen
    3.3.2 Goldfarb
  3.4 NEM arcs
    3.4.1 Derivation
    3.4.2 Properties
    3.4.3 Convergence
    3.4.4 Linear constraints
    3.4.5 Constraint intersection
    3.4.6 Advantages
  3.5 Modified gradient flow
    3.5.1 Derivation
    3.5.2 Constraint intersection
    3.5.3 Comparison to NEM arcs

4 ARCOPT
  4.1 Preliminaries
  4.2 Initialization
    4.2.1 Input
    4.2.2 Initial processing
  4.3 Phase 1
  4.4 Phase 2
  4.5 Basis maintenance
  4.6 Factorization
  4.7 Products with Z and ZT
  4.8 Expand
  4.9 Arc-constraint intersection

5 Experiments
  5.1 Preliminaries
    5.1.1 SNOPT
    5.1.2 IPOPT
    5.1.3 Other solvers
    5.1.4 Performance profiles
  5.2 Hamiltonian cycle problem (HCP)
    5.2.1 10, 12, and 14 node cubic graphs
    5.2.2 24, 30, and 38 node cubic graphs
  5.3 The CUTEr test set
  5.4 Quasi-Newton methods

6 Conclusions
  6.1 Contributions
  6.2 Further work

A Results tables
  A.1 CUTEr results
  A.2 Quasi-Newton results

Bibliography

List of Tables

4.1 Symbols for basis index sets
4.2 ARCOPT parameters and default values
5.1 Summary of solvers on NEOS
5.2 Results on 10, 12, and 14 node cubic graphs
5.3 Results on 24, 30, and 38 node cubic graphs
5.4 Solver settings for CUTEr experiments
A.1 Results on CUTEr unconstrained problems
A.2 Results on CUTEr bound constrained problems
A.3 Results on CUTEr linearly constrained problems
A.4 Quasi-Newton results on CUTEr unconstrained problems
A.5 Quasi-Newton results on CUTEr bound constrained problems

List of Figures

5.1 Performance profile on 10, 12, and 14 node cubic graphs
5.2 Average performance on cubic graphs
5.3 Performance profile on unconstrained problems
5.4 Performance profile on bound constrained problems
5.5 Performance profile on linearly constrained problems
5.6 Performance profile for quasi-Newton experiments on unconstrained problems
5.7 Performance profile for quasi-Newton experiments on bound constrained problems
Chapter 1
Introduction
1.1 Preliminaries
This thesis is concerned with algorithms to find local solutions to the linearly constrained optimization problem
minimize F (x) over x ∈ Rn subject to Ax ≥ b.    (LC)
More specifically, we seek points that satisfy the first- and second-order necessary conditions for a
minimizer. Here, F ∈ C 2 : Rn 7→ R, A ∈ Rm×n , and b ∈ Rm . In general, it is assumed the gradient
∇F (x) and the Hessian ∇2 F (x) are available. It will be seen that ∇2 F (x) is only needed in the
form of an operator for matrix-vector products.
Algorithms for this problem usually generate a sequence of iterates {xk }∞k=0 , which converge to a
solution x∗ . At each iterate, the gradient is denoted gk = ∇F (xk ) and the Hessian is Hk = ∇2 F (xk ).
Arc search methods produce new iterates with the update
xk+1 = xk + Γk (αk ),
where Γk ∈ C 2 : R 7→ Rn and αk ≥ 0. Γk (α) is a smooth arc in n-dimensional Euclidean space
and α is the step size. For α = 0 arcs have no displacement (Γk (0) = 0) and are initially tangent
to a descent direction (Γ′k (0)T gk < 0). The second condition may be relaxed to Γ′k (0)T gk ≤ 0 if a
direction of negative curvature is used.
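The update can be stated compactly in code. The following MATLAB sketch is illustrative only (the function handles F and Gamma, and the halving rule, are assumptions for this example, not ARCOPT's search); it performs one arc-search iteration with simple backtracking on the step size until the objective decreases.

% One iteration of a generic arc search (illustrative sketch only).
% F     : function handle for the objective F(x)
% Gamma : function handle for the search arc; Gamma(alpha) is n-by-1,
%         Gamma(0) = 0, and Gamma'(0) is a descent direction.
% xk    : current iterate;  alpha0 : initial trial step size
function [xk1, alpha] = arc_search_step(F, Gamma, xk, alpha0)
  alpha = alpha0;
  Fk = F(xk);
  % Backtrack along the arc until the objective decreases.
  while F(xk + Gamma(alpha)) >= Fk && alpha > 1e-16
    alpha = alpha / 2;
  end
  xk1 = xk + Gamma(alpha);
end

A practical method replaces the simple decrease test with the sufficient-descent and curvature conditions developed in Chapter 2.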
In the thesis we prove global convergence for a generic arc search algorithm to solve LC. An
active-set strategy is used to manage the linear inequality constraints. When second derivatives
are used we show convergence to second-order critical points and a quadratic rate of convergence
under standard conditions. We also develop a specific arc that scales to large problems, does not
require Hessian modification, and avoids arbitrary scaling choices. This chapter discusses arc search
methods in the context of the well known line search and trust-region methods.
1.2 Unconstrained optimization
Algorithms for LC typically are extensions of methods to solve the unconstrained optimization
problem,
minimize F (x) over x ∈ Rn.    (UC)
The best possible solution to UC is called a global minimizer, a point in Rn that attains the least
value of F .
Definition 1.1. If F (x∗ ) ≤ F (x) for all x, then x∗ is called a global minimizer.
Unfortunately, global minimizers can be very difficult to find. Thus, we restrict our interest to
methods that find local minimizers. Note that many methods to find global minimizers are based
on methods to find local minimizers.
Definition 1.2. If F (x∗ ) ≤ F (x) for all x in some neighborhood of x∗ , then x∗ is called a local
minimizer.
Definition 1.3. If F (x∗ ) < F (x) for all x in some neighborhood of x∗ with x∗ ≠ x, then x∗ is called
a strict local minimizer.
Definitions 1.2 and 1.3 are not possible to check computationally. When F is smooth, derivatives
can be used to derive optimality conditions.
Theorem 1.1. If x∗ is a local minimizer then
∇F (x∗ ) = 0 and ∇2 F (x∗ ) ⪰ 0.
(1.1)
Theorem 1.2. x∗ is a strict local minimizer if
∇F (x∗ ) = 0 and ∇2 F (x∗ ) ≻ 0.
(1.2)
Equations (1.1) are known as the second-order necessary optimality conditions. If F is bounded
below on a compact set, then a point satisfying (1.1) exists and is called a second-order critical point.
Equations (1.2) are known as the second-order sufficient optimality conditions. Points satisfying
(1.2) need not exist. For example, x∗ = 0 is a strict (global) minimizer for F (x) = x4 , yet F ′′ (0) = 0.
Thus algorithms can usually only guarantee the ability to find points satisfying (1.1) and in practice
often find points satisfying (1.2).
1.3 Linearly constrained optimization
Consider now problems that are constrained by a set of linear inequalities, Ax ≥ b. Here, A is an
m × n matrix and b is a vector of length m. An individual constraint is written aTi x ≥ bi , where aTi
is the ith row of A and bi is the ith element of b. For a point x, a constraint is said to be active if
aTi x = bi , inactive if aTi x > bi , and violated if aTi x < bi .
Solutions and optimality conditions for LC are defined in an analogous fashion to those for UC.
Definition 1.4. If Ax∗ ≥ b and there is a neighborhood N such that F (x∗ ) ≤ F (x) for all x ∈ N
and Ax ≥ b then x∗ is called a local minimizer.
Definition 1.5. If Ax∗ ≥ b and there is a neighborhood N such that F (x∗ ) < F (x) for all x ∈ N
and Ax ≥ b with x∗ ≠ x then x∗ is called a strict local minimizer.
Theorem 1.3. Given a point x∗ ∈ Rn , let A be the matrix of constraints active at x∗ and let Z
denote a basis for null(A). Then x∗ is a local solution to LC only if
Ax∗ ≥ b with Ax∗ = b,  ∇F (x∗ ) = AT λ∗ ⇔ Z T ∇F (x∗ ) = 0,  λ∗ ≥ 0    (first-order)  (1.3)
Z T ∇2 F (x∗ )Z ⪰ 0.    (second-order)  (1.4)
Theorem 1.4. Given a point x∗ ∈ Rn , let A be the matrix of constraints active at x∗ and let Z
denote a basis for null(A). Then x∗ is a strict local solution to LC if
Ax∗ ≥ b with Ax∗ = b,  ∇F (x∗ ) = AT λ∗ ⇔ Z T ∇F (x∗ ) = 0,  λ∗ > 0    (first-order)  (1.5)
Z T ∇2 F (x∗ )Z ≻ 0.    (second-order)  (1.6)
In Theorems 1.3 and 1.4 Z T ∇F (x∗ ) is the reduced gradient, Z T ∇2 F (x∗ )Z is the reduced Hessian,
and λ∗ is the vector of Lagrange multipliers. Combined, equations (1.3) and (1.4) are known as the
second-order necessary optimality conditions for LC. A second-order critical point satisfying (1.3)
and (1.4) exists if F is bounded below on a compact set that has a nontrivial intersection with the
feasible region defined by Ax ≥ b. Equations (1.5) and (1.6) are known as the second-order sufficient
optimality conditions. Points satisfying (1.5) and (1.6) need not exist, just like the case for UC.
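Conditions (1.3)–(1.4) are easy to test numerically at a candidate point. The following MATLAB sketch is a minimal illustration, not the test used by any particular solver; the active-set tolerance tol and the least-squares multiplier estimate are assumptions made for this example.

% Check the second-order necessary conditions (1.3)-(1.4) at a point x,
% given the gradient g, Hessian H, constraint data (A, b), and tolerance tol.
function ok = check_lc_conditions(x, g, H, A, b, tol)
  act  = abs(A*x - b) <= tol;          % indices of (nearly) active constraints
  Aact = A(act, :);
  Z = null(Aact);                      % basis for null(Aact)
  lambda = Aact' \ g;                  % least-squares multipliers: g ~ Aact'*lambda
  firstOrder  = norm(Z'*g) <= tol && all(lambda >= -tol);
  secondOrder = isempty(Z) || min(eig(Z'*H*Z)) >= -tol;
  ok = all(A*x - b >= -tol) && firstOrder && secondOrder;
end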
1.4 Second derivative methods
1.4.1 Newton
Newton’s method is the gold standard in optimization. In the context of unconstrained optimization
the algorithm is simply
xk+1 = xk − Hk−1 gk .
(1.7)
In the case where Hk ≻ 0, xk+1 minimizes mk (x) = ½ xT Hk x + gkT x, a local quadratic model of
F around xk . Newton’s method has the desirable property that if it converges to a point x∗ with
∇2 F (x∗ ) ≻ 0, then it does so at a quadratic rate [4,53]. The problem with Newton’s method is that
it may cycle, diverge, or converge to a maximizer. Also, if Hk is singular, xk+1 is not defined.
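A minimal MATLAB sketch of the pure iteration (1.7) makes this behavior easy to reproduce; it is illustrative only, with no safeguards, so it can diverge or stall exactly as described above when the Hessian is indefinite or singular. The handles gradF and hessF are assumptions for this example.

% Pure Newton iteration (1.7) for unconstrained minimization (sketch only).
function x = newton_iterate(gradF, hessF, x, maxit, tol)
  for k = 1:maxit
    g = gradF(x);
    if norm(g) <= tol, break; end
    H = hessF(x);
    x = x - H \ g;        % xk+1 = xk - Hk^{-1} gk
  end
end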
Many successful algorithms use Newton’s iteration when it works and do something different
when it does not. One key feature of these algorithms is that they enforce a descent property. That
is each new iterate must produce a lower objective function value, i.e. F (xk+1 ) < F (xk ) for k > 0.
The two common algorithm classes are line search methods and trust-region methods.
1.4.2 Line search and extensions
Line search methods first compute a descent direction pk and then invoke a univariate search procedure to compute a step size αk , which ensures F (xk + αk pk ) < F (xk ). The overall update is
xk+1 = xk + αk pk .
Newton’s method is attained when pk solves Hk pk = −gk and αk = 1. The key issue with line
search based Newton methods is dealing with Hk when it is nearly singular or indefinite. It is possible
to compute an approximation H̄k , which is sufficiently positive definite. The search direction is then
selected by solving H̄k sk = −gk . The process to determine H̄k may also give a direction of negative
curvature dk such that dTk Hk dk < 0.
Methods to adapt line search to obtain convergence to second-order critical points started with
McCormick’s modification of the Armijo step size rule [47]. McCormick’s backtracking rule is
xk+1 = xk + sk 2−i + dk 2−i/2 ,
where sk is a descent direction (gkT sk < 0) and dk is a direction of negative curvature. The backtracking index i starts at 0 and increments by 1 until an acceptable point is found. Moré and
Sorensen [48] developed this into a search based update with
xk+1 = xk + αk2 sk + αk dk ,
(1.8)
which allows for a more sophisticated univariate search procedure to select an acceptable αk . Goldfarb proved convergence with the update
xk+1 = xk + αk sk + αk2 dk .
(1.9)
The basic idea here is that when αk is small, the update is dominated by sk , a direction that
provides a greater decrease in F when close to xk . When αk is large, dk dominates, which may
provide greater decrease in F far from xk . A downside of (1.9) is that it requires specialized search
conditions. Updates (1.8) and (1.9) are members of a class of curvilinear search methods, which use
low-order polynomial combinations of vectors.
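As a concrete illustration of the Moré and Sorensen update (1.8), the MATLAB sketch below takes one curvilinear step. It is not the method of this thesis: the modified Hessian used for sk, the eigenvector used for dk, and the plain backtracking test are assumptions chosen only to keep the example short; practical methods use the search conditions of Chapter 2 and avoid a full eigendecomposition.

% One step of a curvilinear search, update (1.8): x+ = x + a^2*s + a*d.
% s: descent direction from a positively modified Hessian;
% d: direction of negative curvature (zero if none).  Sketch only.
function x = curvilinear_step(F, g, H, x)
  [V, L] = eig((H + H')/2);  lam = diag(L);
  Hbar = V * diag(max(abs(lam), 1e-8)) * V';   % simple positive modification
  s = -Hbar \ g;
  [lmin, i] = min(lam);
  d = zeros(size(g));
  if lmin < 0
    d = V(:, i);
    if g' * d > 0, d = -d; end                 % make d a non-ascent direction
  end
  a = 1;  Fx = F(x);
  while F(x + a^2*s + a*d) >= Fx && a > 1e-12  % backtrack until decrease
    a = a / 2;
  end
  x = x + a^2*s + a*d;
end

The lack of a natural relative scale between s and d, discussed below, is visible here: any rescaling of d changes the iterates but no condition in this sketch detects it.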
Forsgren and Murray proved second-order convergence with the line search update
xk+1 = xk + αk (sk + dk ),
(1.10)
first in the context of problems with linear equality constraints [27] and then with linear inequality
constraints [28]. Gould proved convergence using an “adaptive” line search approach [39], which uses
a condition to choose between sk and dk . Line search is performed along the selected vector for an
acceptable step size. Ferris, Lucidi, and Roma [24] used (1.8) to develop a method for unconstrained
problems in the nonmonotone framework of Grippo, Lampariello, and Lucidi [41].
All of the methods discussed need to choose a descent direction sk and a direction of negative
curvature dk . When Hk is indefinite, the best choice for sk is not clear. Some methods choose to
solve
(Hk + γk I)sk = −gk
where γk is selected such that Hk + γk I is sufficiently positive definite. Modified Cholesky methods
[33] solve
(Hk + Ek )sk = −gk ,
where Ek is diagonal and constructed to make Hk + Ek sufficiently positive definite in a single
factorization. Fang and O’Leary have cataloged many different modified Cholesky algorithms in [23].
Auslender [2] presents some more ways to compute sk for use in curvilinear search.
Directions of negative curvature provide another difficulty. First, we’ve already seen four different
ways to use dk in a search-based algorithm. Many authors use (1.8). However, the best choice is
not clear. Second, a direction of negative curvature dk does not have a natural scale with respect to
a descent direction sk . It is possible to come up with many ways to scale dk , but we don’t have a
theoretical measure of quality. Gould’s adaptive line search method [39] is able to avoid the relative
scaling issue by only using one vector at a time.
1.4.3 Gradient flow
Another interesting approach to optimization is based on systems of ordinary differential equations
and has been explored by many authors [1, 3, 5–7, 11, 31]. See Behrman’s PhD thesis [3, p. 6] for a
summary of the history. The most practical methods, as explored by Botsaris, Behrman, and Del
Gatto [3, 6, 31], are based on the system of differential equations
d/dt x(t) = −∇F (x(t)),   x(0) = x0 .    (1.11)
If ∇F (x(t)) is linearized about xk and x(t) is shifted such that x(t) = xk + wk (t), then (1.11) becomes
d/dt wk (t) = −Hk wk (t) − gk ,   wk (0) = 0.    (1.12)
The linear ODE (1.12) can be solved analytically with the spectral decomposition of Hk . The
resulting algorithm for unconstrained problems is
xk+1 = xk + wk (tk ),
where tk is selected with a univariate search procedure to satisfy a descent property. This method
is able to handle second derivatives naturally. When Hk ≻ 0, wk (t) terminates at the Newton step
pk = −Hk−1 gk . When Hk is indefinite the arc produced by the ODE is unbounded and will diverge
away from a saddle point (in most cases). A key benefit of this method is that it does not require
Hessian modification in the indefinite case.
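With the spectral decomposition Hk = V diag(λ) V T, the solution of (1.12) is wk (t) = V diag((e−λt − 1)/λ) V T gk, which tends to the Newton step −Hk−1 gk when all λ > 0. The MATLAB sketch below evaluates this formula; it is a small-scale illustration only (a full eigensolve), whereas the large-scale variants discussed next project the ODE onto a Lanczos or two-dimensional subspace. The zero-eigenvalue guard is an assumption for numerical safety.

% Evaluate the gradient-flow arc w_k(t) solving (1.12) via the spectral
% decomposition of H.  Sketch only.
function w = gradient_flow_arc(H, g, t)
  [V, D] = eig((H + H')/2);
  lam = diag(D);
  phi = zeros(size(lam));
  for i = 1:numel(lam)
    if abs(lam(i)) > 1e-12
      phi(i) = (exp(-lam(i)*t) - 1) / lam(i);  % -> -1/lam(i) as t -> inf if lam > 0
    else
      phi(i) = -t;                             % limiting value when lam(i) = 0
    end
  end
  w = V * (phi .* (V' * g));
end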
Botsaris and Jacobson introduced the basic idea, but used a modification so that wk is bounded
if Hk is nonsingular [6]. Their proof of first-order convergence requires tk to be a minimizer of
F (xk + wk (t)). Behrman provided a convergence result with the unmodified solution to (1.12) and
practical search conditions to select tk [3]. Behrman also showed that the method could scale to
large problems by projecting the linear ODE onto the space spanned by a small number of Lanczos
vectors of Hk . Del Gatto further improved the practicality by proving convergence in the case where
the ODE is projected onto a two-dimensional subspace [31]. We discuss the details and address
second-order convergence of this method in Chapter 3.
Botsaris also considered an active-set method for problems with linear inequality constraints [9]
and a generalized reduced-gradient method for nonlinear equality constraints [8]. However, the
methods presented are not applicable to large-scale problems and second-order convergence is not
considered.
1.4.4 Trust-region
Trust-region algorithms differ from search-based methods by first choosing a maximum allowable
step size, then computing the direction. The trust-region radius is denoted ∆k and may be modified
at different parts of the algorithm. A step sk is then determined to satisfy ‖sk ‖ ≤ ∆k , where ‖ · ‖
is usually the 2-norm or ∞-norm. If xk + sk is deemed acceptable by a descent property, then the
update xk+1 = xk + sk is performed. If xk + sk is not acceptable, i.e. F (xk + sk ) ≥ F (xk ), then ∆k
is decreased and sk is recomputed. Trust-region algorithms must also specify a rule for increasing
∆k under certain conditions on F (xk + sk ).
One common way to compute sk is to minimize a quadratic model of F about xk subject to the
2-norm trust-region constraint. The optimization problem is
minimize mk (s) = ½ sT Hk s + gkT s over s ∈ Rn , subject to sT s ≤ ∆2k .    (1.13)
A key benefit of the trust-region approach is that the subproblem (1.13) is well defined when Hk
is singular or indefinite. It is possible to compute global minimizers of (1.13) at the expense of
multiple linear system solves. Such expense is impractical for large problems, so many methods
choose to approximately solve (1.13). Steihaug presented a practical algorithm based on applying
the conjugate gradient method to Hk s = −gk [56]. Byrd, Schnabel, and Shultz showed that the
trust-region subproblem can be solved on a two-dimensional subspace and still maintain the overall
convergence properties [14].
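A minimal MATLAB sketch of a Steihaug-type truncated conjugate-gradient solve for (1.13) is given below. It is an illustrative implementation of the general idea, not the code of [56]; the tolerance handling and the helper boundary_step are assumptions made for this example.

% Truncated conjugate-gradient (Steihaug-type) sketch for subproblem (1.13).
% Returns an approximate minimizer s with ||s|| <= Delta.
function s = steihaug_cg(H, g, Delta, tol, maxit)
  n = numel(g);  s = zeros(n,1);  r = -g;  p = r;
  for k = 1:maxit
    Hp = H * p;
    curv = p' * Hp;
    if curv <= 0                       % negative curvature: go to the boundary
      s = s + boundary_step(s, p, Delta) * p;  return
    end
    a = (r' * r) / curv;
    if norm(s + a*p) >= Delta          % step leaves the region: stop on boundary
      s = s + boundary_step(s, p, Delta) * p;  return
    end
    s = s + a * p;
    rnew = r - a * Hp;
    if norm(rnew) <= tol, return; end
    p = rnew + ((rnew'*rnew)/(r'*r)) * p;
    r = rnew;
  end
end

function tau = boundary_step(s, p, Delta)
  % Positive root of ||s + tau*p|| = Delta.
  a = p'*p;  b = 2*s'*p;  c = s'*s - Delta^2;
  tau = (-b + sqrt(b^2 - 4*a*c)) / (2*a);
end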
In trust-region terminology a Cauchy point is a minimizer of the model function along −gk ,
subject to the trust-region constraint. Global convergence to stationary points is attained if the
reduction in the model function by sk is at least as good as the reduction provided by a Cauchy
point. Second-order convergence for trust-region algorithms was proved by Sorensen using the global
minimizer of the quadratic model [55]. Shultz provides a proof that relaxes the condition to the
computation of what Conn et al. call an eigenpoint [16,54]. An eigenpoint is defined as the minimizer
of the model function along a vector that has a sufficient component in space spanned by eigenvectors
corresponding to negative eigenvalues of Hk . As long as sk reduces the model function as much as
an eigenpoint, then second-order convergence can be obtained. Later, Zhang and Xu demonstrated
second-order convergence with an indefinite dogleg path [61].
Trust-region methods can be extended to constrained problems in many ways. Gay presented an
approach for linearly constrained optimization [32]. Branch, Coleman, and Li developed a secondorder subspace method for problems with simple bounds on the variables [10]. For problems with
nonlinear constraints we refer to work by Conn, Gould, Orban, and Toint [15] and also Tseng [57].
The book Trust-Region Methods by Conn, Gould, and Toint [16] provides exhaustive coverage of the
field.
1.5 Thesis outline
Chapter 2 presents a second-order convergence proof for a general arc search method using an activeset strategy for linearly constrained optimization. The proof generalizes the work by Forsgren and
Murray on a line search method in [28]. For convergence, arcs must satisfy a set of properties
similar to the properties of sufficient descent and negative curvature for certain vectors in line search
methods. A key difference is that an arc may intersect a linear constraint an arbitrary number of
times. This means that an arc can move from and be restricted by the same linear constraint in a
single iteration, which is not possible along a line. We show that on any subsequence of iterations
where this occurs, the multiplier associated with the constraint must converge to a positive value.
This observation and a modification to the rules for constraint deletion allow the proof of [28] to be
generalized for arcs.
Chapter 3 shows the application of the convergence theory to several different arcs. Forsgren and
Murray line search (1.10) as well as Moré and Sorensen curvilinear search (1.8) directly satisfy the
sufficient arc conditions with appropriate choices for sk and dk . Goldfarb’s method (1.9) requires
a simple modification to guarantee second-order convergence. We introduce an arc based on the
regularized Newton equation and designate it with NEM. The theory is also applied to the modified
gradient flow algorithm of Behrman and Del Gatto. All methods may be used on linearly constrained
problems. We discuss the derivation and application of NEM arcs in some detail.
Chapter 4 details ARCOPT, a Matlab implementation using NEM arcs for linearly constrained
optimization. Chapter 5 covers several numerical experiments with ARCOPT:
• a comparison of ARCOPT and IPOPT on a continuous formulation of the Hamiltonian cycle
problem,
• a comparison of ARCOPT, IPOPT, and SNOPT on problems in the CUTEr test set,
• a comparison of quasi-Newton updates with an arc search method on problems in the CUTEr
test set.
Chapter 2
Convergence
2.1 Preliminaries
This chapter presents a general arc search algorithm and associated convergence theory for the
problem
minimize F (x) over x ∈ Rn subject to Ax ≥ b.
Here A is a real m × n matrix and F ∈ C 2 : Rn 7→ R. We are interested in algorithms that converge
to points satisfying the second-order necessary optimality conditions.
The notation and arguments in this chapter are inspired by and adapted from the work by
Forsgren and Murray [27, 28].
2.2 Statement of assumptions
Assumption 2.1. The objective function F is twice continuously differentiable.
Assumption 2.2. The initial feasible point x0 is known and the level set {x : F (x) ≤ F (x0 ), Ax ≥ b}
is compact.
Assumption 2.3. The matrix of active constraints has full row rank at any point satisfying the
second-order necessary optimality conditions. Let x̂ be a feasible point. Say AA is the matrix of
active constraints with nullspace matrix ZA . If
ZAT ∇F (x̂) = 0 and λmin (ZAT ∇2 F (x̂)ZA ) ≥ 0
then AA has full row rank.
2.3 Definition of the algorithm
The general arc search algorithm generates a sequence of iterates {xk }∞k=0 with
xk+1 = xk + Γk (αk ),
where Γk ∈ C 2 : R → Rn is the search arc and αk is the step size. Iteration k starts at point xk
and ends at xk+1 . We denote aTi as the ith row of A and bi as the ith element of b. A constraint
aTi xk ≥ bi is said to be inactive if aTi xk > bi , active if aTi xk = bi , and violated if aTi xk < bi . At the
start of iteration k the algorithm has access to the following objects:
xk : current variable or iterate
Fk , gk , Hk : values for the objective function, gradient, and Hessian
Ak : matrix of active constraints
Zk : nullspace matrix associated with Ak , i.e. Ak Zk = 0
Wk ⊆ {1, . . . , m} : index set of active constraints, i ∈ Wk if and only if aTi xk = bi
πk : vector of multiplier estimates associated with the active set
Γ̄k : hypothetical arc constructed to remain on constraints active at xk , Ak Γ̄k (α) = 0 for all α ≥ 0
The algorithm will inspect πk and other data to determine if a constraint can be deleted. After this
process, the following objects are defined:
Āk : matrix of constraints that will remain active during iteration k
Z̄k : nullspace matrix associated with Āk , i.e. Āk Z̄k = 0
W̄k : index set of constraints that will remain active during iteration k
Γk : search arc for iteration k, Ak Γk (α) ≥ 0 and Āk Γk (α) = 0 for α ≥ 0
If no constraints are deleted then Āk = Ak , Z̄k = Zk , W̄k = Wk , and Γk = Γ̄k .
An arc search will then be made on the arc xk + Γk (α) to select the step size αk . The next
iterate is then xk+1 = xk + Γk (αk ). If the step is limited by a constraint, then Ak+1 , Zk+1 , Wk+1 ,
and πk+1 are updated to account for the change.
Set notation will be used to compare index sets at the start of different iterations. The intersection
Wk1 ∩Wk2 = {i : i ∈ Wk1 , i ∈ Wk2 } contains indices that are in both Wk1 and Wk2 . The set difference
notation Wk1 \Wk2 = {i : i ∈ Wk1 , i ∉ Wk2 } indicates indices that are in Wk1 but not in Wk2 .
2.3.1 Properties of Γk
The search arc Γk : R 7→ Rn is twice continuously differentiable with Γk (0) = 0. The arc must
provide initial descent, gkT Γ′k (0) ≤ 0, and remain on constraints in W̄k , i.e. Āk Γk (α) = 0 for α ≥ 0.
If step size α is bounded, the arc must be bounded. Formally,
α ≤ τ1 =⇒ ∃ finite τ2 such that ‖Γ′k (α)‖ ≤ τ2 ∀ α ∈ [0, τ1 ].
(2.1)
If α is unbounded it is acceptable for the arc and its derivative to diverge. On any subsequence of
iterates I, the arc must satisfy the following properties:
First-order convergence
limk∈I gkT Γ′k (0) = 0 =⇒ limk∈I ZkT gk = 0    (2.2)
Second-order convergence
lim inf k∈I Γ′k (0)T Hk Γ′k (0) + gkT Γ′′k (0) ≥ 0 =⇒ lim inf k∈I λmin (ZkT Hk Zk ) ≥ 0    (2.3)
Arc convergence
limk∈I gkT Γ′k (0) = 0 and lim inf k∈I Γ′k (0)T Hk Γ′k (0) + gkT Γ′′k (0) ≥ 0 =⇒ limk∈I Γ′k (0) = 0    (2.4)
In (2.3), λmin (ZkT Hk Zk ) denotes the minimal eigenvalue of ZkT Hk Zk .
2.3.2 Properties of πk
At each iteration a vector of Lagrange multiplier estimates πk is computed and must satisfy
limk∈I ‖ZkT gk ‖ = 0 =⇒ limk∈I ‖gk − ATk πk ‖ = 0.    (2.5)
The minimum multiplier is denoted πmin,k = mini (πk )i .
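One common way to form πk satisfying (2.5) is a least-squares estimate from the active-constraint matrix. The MATLAB sketch below shows this choice; it is an illustration under the assumption that Ak has full row rank, not necessarily the formula used later in ARCOPT.

% Least-squares multiplier estimate: pi_k = argmin || g_k - A_k' * pi ||_2.
% Sketch only; assumes Ak has full row rank.
function [pik, pimin] = multiplier_estimate(Ak, gk)
  pik = Ak' \ gk;          % solves the least-squares problem
  pimin = min(pik);        % minimum multiplier pi_min,k
end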
2.3.3 Properties of Γk related to constraints
The hypothetical arc Γ̄k is constructed to remain on the set of constraints active at the beginning
of iteration k:
Ak Γ̄k (α) = 0 and Ak Γ̄′k (α) = 0 for all α ≥ 0.
The true search arc Γk satisfies the following rules and properties:
Deletion Constraints are only considered for deletion if πmin,k < 0. If no constraint is to be deleted
or a constraint was added in the last iteration, then set Γk (α) = Γ̄k (α). If the most recent
change to the working set was a constraint addition, then only one constraint may be deleted.
If the most recent change to the working set was a constraint deletion, then more than one
constraint may be deleted. If constraints are to be deleted,
Ak Γ′k (0) ≥ 0 and Ak Γ′k (0) ≠ 0.
Descent If constraints are to be deleted the arc must provide initial descent such that
gkT Γ′k (0) ≤ gkT Γ̄′k (0) ≤ 0.
Convergence It is also required that
limk∈I gkT (Γ′k (0) − Γ̄′k (0)) = 0 =⇒ lim inf k∈I πmin,k ≥ 0 and limk∈I ‖Γk (α) − Γ̄k (α)‖ = 0,    (2.6)
aTi Γ′k (0) > 0 =⇒ (πk )i ≤ νπmin,k for k ∈ I and i ∈ Wk \W̄k ,    (2.7)
where I is any subsequence such that W̄k ⊆ Wk for all k ∈ I and ν is a tolerance in the interval (0, 1].
2.3.4 Properties of αk
At each iteration the univariate search function is defined as
φk (α) = F (xk + Γk (α)).
The first and second derivatives are
φ′k (α) = ∇F (xk + Γk (α))T Γ′k (α)
φ′′k (α) = Γ′k (α)T ∇2 F (xk + Γk (α))Γ′k (α) + ∇F (xk + Γk (α))T Γ′′k (α).
When evaluated at α = 0 these reduce to
φ′k (0) = gkT Γ′k (0)
φ′′k (0) = Γ′k (0)T Hk Γ′k (0) + gkT Γ′′k (0).
The step size αk is computed to satisfy certain conditions that enforce convergence of the algorithm:
Boundedness An upper bound on the step size is computed with
ᾱk = max {α : A(xk + Γk (α)) ≥ b and α ≤ αmax }
where αmax is a fixed upper limit. If ᾱk = 0, then αk = 0.
Descent condition The selected step size must provide a sufficient reduction in the objective
function according to
φk (αk ) ≤ φk (0) + µ (φ′k (0)αk + ½ min{φ′′k (0), 0}αk2 )    (2.8)
with 0 < µ ≤ ½.
Curvature condition The selected step size must provide a sufficient reduction in curvature unless
it encounters a constraint. Thus, αk must satisfy at least one of
|φ′k (αk )| ≤ η|φ′k (0) + min{φ′′k (0), 0}αk |    (2.9)
or
φ′k (αk ) < η(φ′k (0) + min{φ′′k (0), 0}αk ) and αk = ᾱk    (2.10)
with µ ≤ η < 1. In the case where αk is limited by ᾱk , (2.10) indicates that the derivative of the
search function must be negative. If, in the other case, φ′k (αk ) > η|φ′k (0) + min{φ′′k (0), 0}αk |
then a robust search routine should reduce α to find a step size satisfying (2.9).
The set of points that satisfy both (2.8) and (2.9) is denoted Φk . The step size is called restricted
if αk = ᾱk and αk < αmax , which indicates that a constraint has been encountered. Otherwise, the
step is called unrestricted and satisfies (2.9) or (2.10) with ᾱk = αmax . Note that if αk ∉ Φk , it must
satisfy (2.8) and (2.10).
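The acceptance test implied by (2.8)–(2.10) is straightforward to code. The MATLAB sketch below is a minimal illustration, assuming the reconstruction of (2.8) with µ multiplying the whole bracket; the handles phi and dphi for φk and φ′k are assumptions for this example, and a real search routine would also manage the trial values of α.

% Test whether a trial step size alpha is acceptable in the sense of
% (2.8) together with (2.9) or (2.10).  p0 = phi_k'(0), pp0 = phi_k''(0).
function ok = step_acceptable(phi, dphi, alpha, alphabar, p0, pp0, mu, eta)
  curv = min(pp0, 0);
  descent    = phi(alpha) <= phi(0) + mu * (p0*alpha + 0.5*curv*alpha^2);    % (2.8)
  curvature  = abs(dphi(alpha)) <= eta * abs(p0 + curv*alpha);               % (2.9)
  restricted = (alpha == alphabar) && (dphi(alpha) < eta*(p0 + curv*alpha)); % (2.10)
  ok = descent && (curvature || restricted);
end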
Moré and Sorensen [48, Lemma 5.2] provide a proof for the existence of a step size satisfying
(2.8) and (2.9). For completeness, it is reproduced here.
Lemma 2.1. Let φ : R → R be twice continuously differentiable in an open interval Ω that contains
the origin, and suppose that {α ∈ Ω : φ(α) ≤ φ(0)} is compact. Let µ ∈ (0, 1) and η ∈ [µ, 1). If
φ′ (0) < 0, or if φ′ (0) ≤ 0 and φ′′ (0) < 0, then there is an α > 0 in Ω such that
φ(α) ≤ φ(0) + µ (φ′ (0)α + ½ min{φ′′ (0), 0}α2 )    (2.11)
and
|φ′ (α)| ≤ η|φ′ (0) + min{φ′′ (0), 0}α|.    (2.12)
Proof. Let
β = sup {α ∈ I : φ(α) ≤ φ(0)} .
Then β > 0 since either φ′ (0) < 0, or φ′ (0) ≤ 0 and φ′′ (0) < 0. Moreover, the compactness
assumption and the continuity of φ imply that β is finite and that φ(0) = φ(β). Thus
φ(β) ≥ φ(0) + µ (φ′ (0)β + ½ min{φ′′ (0), 0}β 2 ).
Define ψ : Ω → R by
ψ(α) = φ(α) − φ(0) − η (φ′ (0)α + ½ min{φ′′ (0), 0}α2 ).
Since µ ≤ η we have ψ(β) ≥ 0. Note also that ψ(0) = 0 and either ψ ′ (0) < 0 , or ψ ′ (0) ≤ 0 and
ψ ′′ (0) < 0. This, together with the continuity of ψ, implies the existence of β1 ∈ (0, β] such that
ψ(β1 ) = 0, and ψ(α) < 0 for all α ∈ (0, β1 ). Now Rolle’s theorem shows that there is an α ∈ (0, β1 )
with ψ ′ (α) = 0, and thus (2.12) follows. Moreover, ψ(α) < 0 and µ ≤ η imply (2.11).
2.3.5 Properties of Ak
The matrix of active constraints Ak is required to have full row rank for all k. Recall that W̄k is
the index set of constraints that will remain active during iteration k. Let Pka be the index set of
constraints that become active at the end of iteration k,
Pka = {i ∉ W̄k : aTi xk+1 = bi }.
The working set at the start of iteration k + 1 is Wk+1 = W̄k ∪ Wka , where Wka ⊆ Pka and Ak+1 are
required to satisfy
Pka ≠ ∅ =⇒ Wka ≠ ∅ and    (2.13)
Ak+1 has full row rank.    (2.14)
Thus if constraints are encountered then at least one must be added to the working set. Note that
a step is restricted if and only if Wk+1 \W̄k ≠ ∅.
The nullspace matrix Zk is required to have a bounded condition number for all k.
2.4 Convergence results
The convergence results for this arc search method generalize the work by Forsgren and Murray
on a line search method [28]. Lemmas 2.2, 2.3, and 2.4 establish convergence on subsequences of
iterations with unrestricted steps and are based on Lemmas 4.1, 4.2, and 4.3 in [28]. The original
result for unconstrained optimization with a curvilinear search comes from Moré and Sorensen [48].
Lemma 2.2. Given assumptions 2.1–2.3, assume that a sequence {xk }∞k=0 is generated as outlined
in section 2.3. Then
(i) limk→∞ αk φ′k (0) = 0,
(ii) limk→∞ αk2 min{φ′′k (0), 0} = 0,
(iii) limk→∞ kxk+1 − xk k = 0.
Proof. Rearrangement of (2.8) gives
φk (0) − φk (αk ) ≥ −µ (φ′k (0)αk + ½ min{φ′′k (0), 0}αk2 ).
Since µ > 0, φ′k (0) ≤ 0, and the objective function is bounded from below on the feasible region, (i)
and (ii) follow.
To show (iii), we write xk+1 − xk = Γk (αk ) and show that limk→∞ kΓk (αk )k = 0. Γk (α) is
continuous with Γk (0) = 0. Also αk and kΓ′k (0)k are bounded. Therefore, if limk→∞ kΓk (αk )k 6= 0,
there must exist a subsequence I with ǫ1 > 0 and ǫ2 > 0 such that αk ≥ ǫ1 and kΓ′k (0)k ≥ ǫ2 for
all k ∈ I. From the existence of ǫ1 , (i) implies limk∈I φ′k (0) = 0 and (ii) implies limk∈I φ′′k (0) ≥ 0.
Since φ′k (0) = gkT Γ′k (0) and φ′′k (0) = Γ′k (0)T Hk Γ′k (0) + gkT Γ′′k (0), the termination property of Γk (2.4)
implies that limk∈I kΓ′k (0)k = 0. This contradicts the existence of ǫ2 , thus establishing (iii).
Lemma 2.3. Given assumptions 2.1–2.3, assume that a sequence {xk }∞k=0 is generated as outlined
in section 2.3. If, at iteration k, an unrestricted step is taken, then either αk = αmax or there exists
a θk ∈ (0, αk ) such that
αk (φ′′k (θk ) + η max{−φ′′k (0), 0}) ≥ −(1 − η)φ′k (0).
(2.15)
Proof. Since φ′k (0) ≤ 0, it follows from (2.9) if αk is unrestricted and αk < αmax , it satisfies
− φ′k (αk ) ≤ −ηφ′k (0) + η max{−φ′′k (0), 0}αk .
(2.16)
Further, since φ′k is a continuously differentiable univariate function, the mean-value theorem ensures
the existence of a θk ∈ (0, αk ) such that
φ′k (αk ) = φ′k (0) + αk φ′′k (θk ).
(2.17)
A combination of (2.16) and (2.17) gives (2.15).
Lemma 2.4. Given assumptions 2.1–2.3, assume that a sequence {xk }∞k=0 is generated as outlined
in section 2.3. Let I denote a subsequence of iterations where unrestricted steps are taken; then
(i) limk∈I φ′k (0) = 0,
(ii) limk∈I φ′′k (0) ≥ 0,
(iii) limk∈I ZkT gk = 0 and lim inf k∈I λmin (ZkT Hk Zk ) ≥ 0.
Proof. To show (i), assume by contradiction there is a subsequence I ′ ⊆ I such that φ′k (0) ≤ ǫ1 < 0
for k ∈ I ′ . Lemma 2.3 in conjunction with assumptions A1 and A2 then implies that lim supk∈I ′ αk ≠
0, contradicting Lemma 2.2. Hence, the assumed existence of I ′ is false, and we conclude that (i)
holds.
Similarly, to show (ii), assume by contradiction that there is a subsequence I ′′ ⊆ I such that
φ′′k (0) ≤ ǫ2 < 0 for k ∈ I ′′ . Since αk > 0 and φ′k (0) ≤ 0, Lemma 2.3 implies that for k ∈ I ′′ there
exists θk ∈ (0, αk ) such that
φ′′k (θk ) − ηφ′′k (0) ≥ 0.
(2.18)
Lemma 2.2 gives limk∈I ′′ αk = 0, and thus (2.18) cannot hold for k sufficiently large. Consequently,
the assumed existence of I ′′ is false, and (ii) holds.
Finally, we show that (i) and (ii) imply (iii). (i) and the sufficient descent property of the arc (2.2)
imply limk∈I ZkT gk = 0. (i) and the sufficient negative curvature property of the arc (2.3) imply
lim inf k∈I λmin (ZkT Hk Zk ) ≥ 0.
We now consider linear constraints. The following lemma states that multipliers πk and πk+1
must converge on a subsequence of iterations with unrestricted steps and the same set of active
constraints.
Lemma 2.5. Given assumptions 2.1–2.3, assume that a sequence {xk }∞k=0 is generated as outlined
in section 2.3. Let I denote a subsequence of iterations. If αk ∈ Φk (αk is unrestricted) and
Wk = Wk+1 = W I (working set does not change) for k ∈ I, then limk∈I kπk+1 − πk k = 0.
Proof. Denote AI as the matrix of active constraints for k ∈ I. Assume for contradiction there
is a subsequence I ′ ⊆ I such that ‖πk+1 − πk ‖ ≥ ǫ for k ∈ I ′ . Part (iii) of Lemma 2.4 and part
(iii) of Lemma 2.2 imply that limk∈I ′ ZkT gk = 0 and limk∈I ′ ZTk+1 gk+1 = 0. (2.5) implies that
limk∈I ′ ‖gk − ATI πk ‖ = 0 and limk∈I ′ ‖gk+1 − ATI πk+1 ‖ = 0. Combining the previous two limits we
see
limk∈I ′ ‖(gk+1 − gk ) − ATI (πk+1 − πk )‖ = 0.    (2.19)
However, ‖πk+1 − πk ‖ ≥ ǫ for k ∈ I ′ and (2.19) imply the existence of a K such that πk+1 −
πk ∈ null(ATI ) and null(ATI ) ≠ {0} for all k ≥ K and k ∈ I ′ . This contradicts assumption A3 that
the matrix AI has full row rank. Thus, the assumed existence of subsequence I ′ is false, and we must have
limk∈I ‖πk+1 − πk ‖ = 0.
The following lemma states that constraints will eventually be encountered on a subsequence of
iterations where constraints are deleted, the minimal multiplier is negative and bounded away from
zero, and for which constraints are not deleted in the previous iteration. This result is derived from
Lemma 4.4 in [28]. In this case, arcs may move from and encounter the same constraint in a single
iteration. Therefore the consequence of the lemma presented here allows Wk+1 = Wk for k ∈ I and
k ≥ K, which is not possible in the line search case.
Lemma 2.6. Given assumptions 2.1–2.3, assume that a sequence {xk }∞k=0 is generated as outlined
in section 2.3. If there is a subsequence I and an ǫ > 0 such that Γk−1 = Γ̄k−1 , Γk ≠ Γ̄k , and
πmin,k < −ǫ for k ∈ I, then there is an integer K such that Wk+1 \W̄k ≠ ∅ for k ∈ I and k ≥ K.
Proof. In parts for clarity:
1. Suppose there is a subsequence I and an ǫ > 0 such that Γk−1 = Γ̄k−1 , Γk ≠ Γ̄k , and
πmin,k < −ǫ for k ∈ I. No constraints are deleted in iteration k − 1 while at least one
constraint is deleted in iteration k.
2. Assume there is a subsequence I ′ ⊆ I such that an unrestricted step is taken for k ∈ I ′ .
3. Lemma 2.4 implies that limk∈I ′ φ′k (0) = 0.
4. On the other hand, (2.6) ensures the existence of a subsequence I ′′ ⊆ I ′ and a positive constant
ǫ2 such that gkT (Γ′k (0) − Γ̄′k (0)) ≤ −ǫ2 for all k ∈ I ′′ .
5. However, gkT Γ̄′k (0) ≤ 0 implies that φ′k (0) ≤ −ǫ2 for all k ∈ I ′′ , which is a contradiction.
6. Hence, the assumed existence of subsequence I ′ is false, and there must exist a K such that
for all k ∈ I and k ≥ K a restricted step is taken. (Wk+1 \W̄k ≠ ∅ for all k ∈ I and k ≥ K.)
The following lemma shows that if there exists a subsequence of iterations where the same
constraint is both deleted and added, then the multiplier corresponding to the constraint converges
to a positive value.
Lemma 2.7. Given assumptions 2.1–2.3, assume that a sequence {xk }∞k=0 is generated as outlined
in section 2.3. If there exists a subsequence I such that for k ∈ I
• Γk−1 = Γ̄k−1 (no constraint is deleted in iteration k − 1)
• πmin,k ≤ −ǫ1 and the constraint corresponding to (πk )i is deleted (Γk ≠ Γ̄k )
• in iteration k the arc moves off, but is restricted by, the constraint corresponding to (πk )i , with
αk ∉ Φk
• in iteration k + 1, no constraint is deleted (Γk+1 = Γ̄k+1 ) and the step is unrestricted (αk+1 ∈
Φk+1 )
then lim inf k∈I (πk+1 )i > 0.
Proof. In parts for clarity:
1. Assume for contradiction the existence of a subsequence I ′ ⊆ I such that (πk+1 )i ≤ 0.
2. At the beginning of iteration k, constraint i was deleted. In order for the same constraint to
restrict the step size we must have aTi Γ′k (αk ) ≤ 0.
3. (2.6) implies the existence of a subsequence I ′′ ⊆ I ′ and a positive constant ǫ2 such that
gkT Γ′k (0) ≤ −ǫ2 for k ∈ I ′′ .
4. Constraint i limits the step size and enforces αk ∉ Φk . With (2.10) we have φ′k (αk ) < ηφ′k (0)
or gTk+1 Γ′k (αk ) < ηgkT Γ′k (0) ≤ −ηǫ2 for k ∈ I ′′ .
5. The gradient at the start of iteration k + 1 can be represented as gk+1 = ATk+1 πk+1 + rk+1
where rk+1 is the residual.
6. We have
gTk+1 Γ′k (αk ) = πTk+1 Ak+1 Γ′k (αk ) + rTk+1 Γ′k (αk )
= (πk+1 )i aTi Γ′k (αk ) + rTk+1 Γ′k (αk )    (2.20)
< −ηǫ2    (2.21)
for k ∈ I ′′ .
7. Combining (πk+1 )i ≤ 0 and aTi Γ′k (αk ) ≤ 0 we see that rTk+1 Γ′k (αk ) < −ηǫ2 for k ∈ I ′′ .
8. Apply the Cauchy-Schwarz inequality and we see
‖rk+1 ‖ ‖Γ′k (αk )‖ > ηǫ2    (2.22)
or
‖Γ′k (αk )‖ > ηǫ2 / ‖rk+1 ‖    (2.23)
for k ∈ I ′′ .
9. By assumption, iteration k + 1 is an unrestricted step. Thus limk∈I ‖ZTk+1 gk+1 ‖ = 0, which
implies limk∈I ‖gk+1 − ATk+1 πk+1 ‖ = 0. We have
limk∈I ′′ ‖rk+1 ‖ = 0 =⇒ limk∈I ′′ ‖Γ′k (αk )‖ = ∞ =⇒ limk∈I ′′ αk = ∞,
where the first step comes from (2.23) and the second step comes from (2.1). Divergence of
αk contradicts assumption A1. Thus the assumed subsequence I ′ does not exist and we must
have
lim inf k∈I (πk+1 )i > 0.
The following lemma establishes convergence of the minimal multiplier to a nonnegative value on
a subsequence of iterations where constraints are deleted, but not deleted in the previous iteration.
It is derived from Lemma 4.5 from [28].
Lemma 2.8. Given assumptions 2.1–2.3, assume that a sequence {xk }∞k=0 is generated as outlined
in section 2.3. If there is a subsequence I such that Γk−1 = Γ̄k−1 and Γk ≠ Γ̄k for k ∈ I, then
lim inf k∈I πmin,k ≥ 0.
Proof. In parts for clarity:
1. Assume that there exists a subsequence I and an ǫ > 0 such that Γk−1 = Γ̄k−1 , Γk ≠ Γ̄k , and
πmin,k < −ǫ for k ∈ I.
2. For each k ∈ I, let lk denote the following iteration with least index such that Wlk = W̄lk −1 =
Wlk −1 ; i.e., an unrestricted step is taken at iteration lk − 1 and Γlk −1 = Γ̄lk −1 (no constraint
is removed in iteration lk − 1).
3. Lemma 2.6 implies that there is an integer K such that Wk+1 \W̄k ≠ ∅ for all k ∈ I and k ≥ K.
4. The properties of Γk from section 2.3.3 imply that Γk+1 = Γ̄k+1 for k ∈ I, k ≥ K.
5. Consequently, for k ≥ K, lk −1 is the iteration with least index following k where no constraint
is added in the arc search.
6. Since there can be at most min{m, n} consecutive iterations where a constraint is added, it
follows from (iii) of Lemma 2.2 that limk∈I kxk − xlk k = 0.
7. Consequently, there must exist a point x̄ that is a common limit point for {xk }k∈I and {xlk }k∈I .
8. Thus, there exists a subsequence I ′ ⊆ I such that limk∈I ′ xk = x̄ and limk∈I ′ xlk = x̄.
9. Thus, there must exist a subsequence I ′′ ⊆ I ′ such that Wk is identical for every k ∈ I ′′ and
Wlk is identical for every lk ∈ J, where J denotes the subsequence {lk }k∈I ′′ . Define W I ≡ Wk
for any k ∈ I ′′ and W J ≡ Wlk for any lk ∈ J.
10. Since all constraints corresponding to W I are active at x̄ and an infinite number of unrestricted steps are taken where the working set is constant, it follows from assumptions A1 and
A2 in conjunction with (iii) of Lemma 2.2 and (iii) of Lemma 2.4 that limk∈I ′′ ZIT gk = 0 and
lim inf k∈I ′′ λmin (ZIT Hk ZI ) ≥ 0, where ZI denotes a matrix whose columns form an orthonormal basis for the null space of AI , the constraint matrix associated with W I .
11. Consequently, (2.5) and the full row rank of ATI imply that limk∈I ′′ πk = π I , where π I satisfies
∇f (x̄) = ATI π I = Σi∈W I ai πiI .    (2.24)
12. By a similar reasoning and notation for ZJ and AJ we have limk∈I ′′ ZJT glk = 0,
lim inf k∈I ′′ λmin (ZJT Hlk ZJ ) ≥ 0, and limk∈I ′′ πlk = π J , where π J satisfies
∇f (x̄) = ATJ π J = Σi∈W J ai πiJ .    (2.25)
13. Combining (2.24) and (2.25) we obtain
Σi∈W I \W J ai πiI + Σi∈W I ∩W J ai (πiI − πiJ ) + Σi∈W J \W I ai πiJ = 0.    (2.26)
14. By assumption A3, the vectors ai , i ∈ W I ∪ W J are linearly independent. Hence, it follows
from (2.26) that
πiI = 0 for i ∈ W I \W J    (2.27)
πiI = πiJ for i ∈ W I ∩ W J    (2.28)
πiJ = 0 for i ∈ W J \W I .    (2.29)
15. Since no constraints have been deleted between iterations k and lk for k ∈ I ′′ , any constraint
whose index is in the set W I \W J must have been deleted in an iteration k ∈ I ′′ . Since I ′′ ⊆ I,
it follows that πmin,k ≤ −ǫ for k ∈ I ′′ . From the rule for moving off a constraint, (2.7), we can
deduce that (πk )i ≤ −νǫ for k ∈ I ′′ and i ∈ W I \W J , where ν ∈ (0, 1). Since limk∈I ′′ πk = π I ,
we conclude that πiI ≤ −νǫ for i ∈ W I \W J . Hence, (2.27) implies that W I \W J = ∅.
16. If constraint i is deleted in iteration k ∈ I ′′ , then it must be added before iteration lk − 1
because W I \W J = ∅. Again, we can deduce that (πk )i ≤ −νǫ and πiI ≤ −νǫ for i ∈ W I ∩ W J .
If constraint i is also added in iteration k ∈ I ′′ (i ∈ Wk+1 ) then Lemma 2.7 implies that
lim inf k∈I ′′ (πk+1 )i > 0. Lemma 2.5 and (2.28) imply that k + 1 < lk − 1. Consequently, it
must hold that |W J | ≥ |W I | + 1 and, by (2.29), π J has at least one zero element.
17. We can conclude from (2.28) that πmin,lk < −0.5ǫ for k ∈ I ′′ and k sufficiently large. The rules
for computing Γk , (2.6), ensure that there is a subsequence I ′′′ ⊆ I ′′ such that Γ′lk (0) ≠ 0 for
all k ∈ I ′′′ . From the definition of lk , it holds that Γlk −1 = Γ̄lk −1 for all k ∈ I ′′′ . Therefore, if
J ′ = {lk : k ∈ I ′′′ }, we may replace I by J ′ and repeat the argument. Since |W J | ≥ |W I | + 1
and |Wk | ≤ min{m, n} for any k, after having repeated the argument at most min{m, n}
times we have a contradiction to assumption A3, implying that the assumed existence of a
subsequence I such that Γk−1 = Γ̄k−1 and Γk ≠ Γ̄k and πmin,k < −ǫ for k ∈ I is false.
The following theorem gives the main convergence result. It is derived from Theorem 4.6 in [28].
Theorem 2.1. Given assumptions 2.1–2.3, assume that a sequence {xk }∞k=0 is generated as outlined
in section 2.3. Then, any limit point x∗ satisfies the second-order necessary optimality conditions;
i.e., if the constraint matrix associated with the active constraints at x∗ is denoted by AA , there is
a vector πA such that
∇f (x∗ ) = ATA πA , πA ≥ 0,
and it holds that
λmin (ZAT ∇2 f (x∗ )ZA ) ≥ 0,
where ZA denotes a matrix whose columns form a basis for the null space of AA .
If in addition λmin (ZAT ∇2 f (x∗ )ZA ) > 0 and πA > 0, then limk→∞ xk = x∗ . Further, for k
sufficiently large, it follows that if Γk (αk ) = −ZA (ZAT Hk ZA )−1 ZAT gk then Γk and αk satisfy (2.8)
and (2.9). Moreover, for this choice of Γk and αk , the rate of convergence is at least q-quadratic,
provided the second-derivative matrix is Lipschitz continuous in a neighborhood of x∗ .
Proof. In parts for clarity:
1. Let x∗ denote a limit point of a generated sequence of iterates.
2. By assumption A2, there is a subsequence I such that limk∈I xk = x∗ .
3. Claim: this implies existence of a subsequence I ′ such that limk∈I ′ xk = x∗ , Γk−1 = Γ̄k−1 and
Ak−1 = Ak = A∗ for each k ∈ I ′ , where A∗ denotes a matrix that is identical for each k ∈ I ′
and defines the active set at x∗ . (There exists a subsequence where no constraint is added or
deleted and the working set is the same.)
4. For k ∈ I, an iterate lk is defined as follows:
(a) If Γk ≠ Γ̄k , let lk be the iteration with largest index that does not exceed k for which
Γlk −1 = Γ̄lk −1 . Since no constraints are deleted immediately upon adding constraints, we
obtain Γlk −1 = Γ̄lk −1 , Γlk ≠ Γ̄lk , Wlk −1 = Wlk , and k − m ≤ lk ≤ k. Here, constraints
are deleted in iterations lk to k. No constraints are added in iterations lk − 1 to k − 1.
(b) If Γk = Γ̄k , let lk denote the iteration with least index following k such that Γlk −1 = Γ̄lk −1
and Wlk −1 = Wlk . Since no constraints are deleted immediately upon adding constraints,
it follows that lk − 1 is the iteration with least index when no constraint is added. For
this case, we obtain Γlk −1 = Γ̄lk −1 , Wlk −1 = Wlk , and k + 1 ≤ lk ≤ k + m.
(c) It follows from (iii) of Lemma 2.2 that limk∈I kxk − xlk k = 0, and hence limk∈I xlk = x∗ .
With {lk }k∈I defined this way, since there is only a finite number of different active-set
matrices, the required subsequence I ′ can be obtained as a subsequence of {lk }k∈I .
5. Since, for each k ∈ I ′ , an unrestricted step is taken at iteration k − 1, assumptions A1 and A2
in conjunction with property (iii) of Lemma 2.4 give
Ẑ T ∇f (x∗ ) = 0 and λmin (Ẑ T ∇2 f (x∗ )Ẑ) ≥ 0,    (2.30)
where Ẑ denotes a matrix whose columns form a basis for the null space of Â. Since
limk∈I ′ Ẑ T gk = 0 and  has full row rank, it follows from (2.5) and (2.30) that
∇f (x∗ ) = ÂT π̂ for π̂ = limk∈I ′ πk .    (2.31)
6. It remains to show that mini π̂i ≥ 0. Assume that there is a subsequence I ′′ ⊆ I ′ and an ǫ > 0
such that πmin,k < −ǫ for k ∈ I ′′ . Lemma 2.8 shows that there exists a K such that Γk = Γ̄k
for k ∈ I ′′ and k ≥ K. But this contradicts (2.6), and since π̂ = limk∈I ′ πk , we conclude that
mini π̂i ≥ 0.    (2.32)
7. A combination of (2.30), (2.31), (2.32) now ensures that x∗ satisfies the second-order necessary optimality conditions. If there are constraints in AA that are not in Â, the associated
multipliers are zero, i.e. πA equals π̂ possibly extended by zeros. Also, in this situation, the
range space of ZA is contained in the range space of Ẑ. Hence, λmin (Ẑ T ∇2 f (x∗ )Ẑ) ≥ 0 implies
λmin (ZAT ∇2 f (x∗ )ZA ) ≥ 0.
8. To show the second part of the theorem, note that if πA > 0, then we must have π̂ = πA , and
it follows from (2.31) that there cannot exist a subsequence Ĩ ′ ⊆ I ′ such that πmin,k < 0 for
k ∈ Ĩ ′ . This implies that there is an iteration K̃ such that Ak = Â and Γk = Γ̄k . Then the
problem may be written as an equality-constrained problem in the null space of Â, namely
minimize F (x) over x ∈ Rn , subject to Âx = b̂,    (2.33)
where b̂ denotes the corresponding subvector of b.
9. If Ẑ T ∇2 F (x∗ )Ẑ is positive definite, then (iii) of Lemma 2.2 and (2.8) ensure that the limit
point is unique, i.e., limk→∞ xk = x∗ . From the continuity of F , it follows that Ẑ T Hk Ẑ
is positive definite for k sufficiently large. If Γk is constructed such that there exists some
αN with Γk (αN ) = −ZA (ZAT Hk ZA )−1 ZAT gk , then it follows from Bertsekas [4, p. 78] that
αk = αN eventually satisfies (2.8) and (2.9). This choice of Γk (αN ) is the Newton step
for (2.33). Bertsekas [4, p. 90] also shows that under these conditions limk→∞ xk = x∗ and the
rate of convergence is q-quadratic provided ∇2 F (x) is Lipschitz continuous in a neighborhood
of x∗ .
Chapter 3
Arcs
3.1 Preliminaries
This chapter discusses the application of the convergence theory to different arcs. We start by defining several vectors used throughout the treatment. Line and curvilinear search methods compute
descent directions sk sufficient for first-order convergence. All methods use directions of negative curvature dk for second-order convergence. Forsgren and Murray define a vector qk to handle constraint
deletion [28, Section 3.4], which may be used in both line and curvilinear search.
Definition 3.1. Vector sk is said to be a direction of sufficient descent if gkT sk ≤ 0 and
limk∈I gkT sk = 0 =⇒ limk∈I ZkT gk = 0 and limk∈I sk = 0    (3.1)
for any subsequence I.
Definition 3.2. Vector dk is said to be a direction of sufficient negative curvature if dTk Hk dk ≤ 0,
gkT dk ≤ 0, and
limk∈I dTk Hk dk = 0 =⇒ lim inf k∈I λmin (ZkT Hk Zk ) ≥ 0 and limk∈I dk = 0    (3.2)
for any subsequence I.
Definition 3.3. Vector qk is said to be a direction of constraint deletion if gkT qk ≤ 0 and Ak qk ≥ 0.
Also, qk must have a bounded norm and satisfy
limk∈I gkT qk = 0 =⇒ lim inf k∈I πmin,k ≥ 0 and limk∈I qk = 0,     (3.3)

aTi qk > 0 =⇒ (πk )i ≤ νπmin,k for k ∈ I, i ∈ Wk \ W̄k ,     (3.4)

where I is any subsequence and ν is a fixed tolerance in the interval (0, 1].
Here, both sk and dk are constructed to remain on the constraints active at the beginning of
iteration k, i.e., Ak sk = 0 and Ak dk = 0. To satisfy the requirements of Section 2.3.3, qk = 0 if a
constraint was encountered in the previous iteration. If the most recent change to the working set
was a constraint addition, then only one constraint may be deleted.
For efficiency, modified gradient flow and NEM arcs are computed on low-dimensional subspaces.
The following lemmas are used to establish convergence properties when defining an arc on a subspace.
Lemma 3.1. Let Q ∈ Rn×m with n ≥ m and QT Q = I. If u ∈ range(Q) then QQT u = u.
Proof.
u ∈ range(Q) =⇒ ∃ y : Qy = u =⇒ QT Qy = QT u
=⇒ y = QT u =⇒ Qy = QQT u =⇒ u = QQT u.
Lemma 3.2. Say A ∈ Rn×n is nonsingular and B ∈ Rn×m has full rank with n ≥ m. If Ax = y
and x ∈ range(B) then there exists a unique z ∈ Rm that solves
B T ABz = B T y     (3.5)
and
Bz = x.     (3.6)
Proof. Together, x ∈ range(B) and rank(B) = m imply that there exists a unique z ∈ Rm that
solves (3.6). By substitution we see that this z solves (3.5):
B T ABz = B T y =⇒ B T Ax = B T y =⇒ B T (Ax − y) = 0.
B T AB is nonsingular, which implies that z is also a unique solution to (3.5).
3.2
Line search
Forsgren and Murray proved convergence to second-order critical points with a line search. The arc
is simply
Γk (α) = α(sk + dk ).
(3.7)
The derivatives of the search function at α = 0 are
φ′k (0) = gkT (sk + dk )     (3.8)
φ′′k (0) = (sk + dk )T Hk (sk + dk ).     (3.9)
Parts (i) and (ii) of Lemma 2.4 from Chapter 2 state that limk∈I φ′k (0) = 0 and lim inf k∈I φ′′k (0) ≥ 0,
where I is a subsequence of iterations with unrestricted steps. These combined with (3.8) and (3.9)
result in limk∈I gkT sk = 0 and lim inf k∈I dTk Hk dk ≥ 0, which establishes second-order convergence.
For problems with linear inequality constraints, Forsgren and Murray [28] construct qk , a direction of constraint deletion satisfying Definition 3.3. If no constraints are deleted in iteration k
then qk = 0. If constraints are deleted, qk must be a descent direction that moves from at least one
constraint. To fit with the requirements of Section 2.3.3, the hypothetical arc is constructed as

Γ̄k (α) = α(sk + dk ),     (3.10)

while the search arc is

Γk (α) = α(sk + dk + qk ).     (3.11)

We see that Γk (α) − Γ̄k (α) = αqk and Γ′k (α) − Γ̄′k (α) = qk . Properties (3.3) and (3.4) of qk satisfy the general arc search requirements (2.6) and (2.7).
It is a simple matter to compute the point of intersection between a line and linear constraint.
Let pk = sk + dk + qk . The intersection between aTi x ≥ bi and xk + αpk occurs at

α = (bi − aTi xk ) / (aTi pk ).
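When every constraint of Ax ≥ b is considered at once, the step to the nearest blocking constraint is the minimum of these values over the constraints that pk moves toward. The following sketch (NumPy, with hypothetical names) illustrates the computation; it ignores the feasibility tolerances used later in Chapter 4.

import numpy as np

def max_step_to_constraints(x, p, A, b):
    """Largest alpha with A(x + alpha*p) >= b for a strictly feasible x.

    Sketch only: rows of A and entries of b define constraints a_i'x >= b_i.
    Only rows with a_i'p < 0 (the step moves toward the constraint) can block
    the step; for those, alpha_i = (b_i - a_i'x)/(a_i'p).
    """
    Ap = A @ p
    slack = A @ x - b                 # nonnegative at a feasible point
    blocking = Ap < 0
    if not np.any(blocking):
        return np.inf                 # no constraint limits the step
    return np.min(-slack[blocking] / Ap[blocking])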
3.3 Curvilinear search

3.3.1 Moré & Sorensen
Moré and Sorensen [48] define the curvilinear arc
Γk (α) = α2 sk + αdk .
(3.12)
The derivatives of the search function at α = 0 are
φ′k (0) = gkT dk
(3.13)
φ′′k (0) = dTk Hk dk + 2gkT sk .
(3.14)
Part (ii) of Lemma 2.4 combined with gkT sk ≤ 0 and (3.14) result in limk∈I gkT sk = 0 and
lim inf k∈I dTk Hk dk ≥ 0. The same arc can be applied to linearly constrained problems by using a
vector of constraint deletion qk . The hypothetical and true search arcs,

Γ̄k (α) = α2 sk + αdk     (3.15)
Γk (α) = α2 sk + α(dk + qk ),     (3.16)
satisfy the convergence requirements in the same manner as the line search method presented in
Section 3.2. Points of arc-constraint intersection are computed by finding non-negative real roots of
α2 aTi sk + αaTi dk + aTi xk − bi = 0.
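This amounts to finding the smallest non-negative real root of a scalar quadratic, one constraint at a time. A minimal helper (a sketch; the coefficient names mirror the equation above):

import numpy as np

def first_nonnegative_root(c2, c1, c0):
    """Smallest non-negative real root of c2*alpha^2 + c1*alpha + c0 = 0.

    For the curvilinear arc above, c2 = a_i's_k, c1 = a_i'd_k and
    c0 = a_i'x_k - b_i.  Returns None if no such root exists.
    """
    roots = np.roots([c2, c1, c0])            # np.roots drops leading zero coefficients
    real = roots[np.abs(roots.imag) < 1e-12].real
    nonneg = real[real >= 0.0]
    return nonneg.min() if nonneg.size else None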
3.3.2
Goldfarb
Goldfarb [38] defines the search arc
Γk (α) = αsk + α2 dk .
(3.17)
The derivatives of the search function at α = 0 are
φ′k (0) = gkT sk     (3.18)
φ′′k (0) = sTk Hk sk + 2gkT dk .     (3.19)
The convergence theory of Chapter 2 cannot be directly applied because the sequences {φ′k (0)}k∈I
and {φ′′k (0)}k∈I do not ensure lim inf k∈I dTk Hk dk ≥ 0. Goldfarb proves convergence by defining
two algorithms with modified step size conditions. The first method uses a variant of the descent
condition,

φ(α) ≤ φ(0) + µ αgkT sk + ½ α4 dTk Hk dk ,     (3.20)

with Armijo-style backtracking. The second method uses the search function

ψk (α) = (φk (α) − φk (0)) / (gkT Γk (α) + min{ ½ Γk (α)T Hk Γk (α), 0})     (3.21)
and requires αk be selected such that σ1 ≤ ψk (αk ) ≤ σ2 with 0 < σ1 ≤ σ2 < 1.
It is possible to modify Goldfarb’s method to use the arc search conditions of Chapter 2. Note
that (3.19), lim inf k∈I φ′′k (0) ≥ 0 (Lemma 2.4, part (ii)) and gkT dk ≤ 0 (Definition 3.2) imply that
limk∈I gkT dk = 0. Then for any subsequence I and fixed tolerance ǫ > 0,
|gkT dk | ≥ ǫ|dTk Hk dk |
(3.22)
implies limk∈I |dTk Hk dk | = 0. Thus subsequence I would converge to a second-order critical point
based on the definition of dk . In the case where (3.22) does not hold, a Goldfarb arc (3.17) may
be replaced by a Forsgren and Murray line (3.7) or a Moré and Sorensen curvilinear arc (3.12) to
obtain second-order convergence on all possible subsequences.
Finally, computing the arc-constraint intersection for (3.17) requires finding non-negative real
roots of
α2 aTi dk + αaTi sk + aTi xk − bi = 0.
3.4
NEM arcs
This section discusses an arc inspired by the Levenberg-Marquardt algorithm for nonlinear equations.
For clarity, the arc is constructed in the context of unconstrained optimization. Linear constraints
are handled in Section 3.4.4.
We define an NEM arc as

Γk (α) = Qk wk (α)
wk (α) = −Uk ρ(Vk , α)UkT QTk gk     (3.23)
ρ(v, α) = α / (1 + α(v − vmin,k )),
where Qk ∈ Rn×ns with ns ≤ n, Uk Vk UkT is the spectral decomposition of QTk Hk Qk , and vmin,k =
λmin (QTk Hk Qk ). The function ρ(v, α) is called the kernel and is applied to the diagonal elements
of Vk . Unless noted otherwise, Qk is assumed or constructed to be orthonormal (QTk Qk = I). An
NEM arc starts with Γk (0) = 0 as required by the convergence theory. When Hk ≻ 0, an NEM arc
contains the Newton step pk = −Hk−1 gk if pk ∈ range(Qk ). With αN = 1/vmin,k , ρ(v, αN ) = 1/v.
Thus, Lemma 3.2 and pk ∈ range(Qk ) imply
Γk (αN ) = −Qk Uk Vk−1 UkT QTk gk = −Qk (QTk Hk Qk )−1 QTk gk = pk .
Following the work of Del Gatto [31] on the modified gradient flow algorithm (Section 3.5), the
NEM arc may be constructed on a 2-dimensional subspace. The subspace is chosen as

Sk = [gk pk ] if Hk ⪰ 0, and Sk = [gk dk ] otherwise,     (3.24)

where pk solves minp ‖Hk p + gk ‖2 and dk is a direction of negative curvature satisfying (3.2). The subspace is orthogonalized with Qk Rk = qr(Sk ).
3.4.1
Derivation
The regularized equation for the Newton step is
(Hk + πI)s = −gk .
(3.25)
An arc can be constructed by treating the solution to (3.25) as a function of the parameter π:
(Hk + πI)wk (π) = −gk .
(3.26)
Given the spectral decomposition Hk = Uk Vk UkT , (3.26) can be solved with
wk (π) = −Uk ρ(Vk , π)UkT gk ,  ρ(v, π) = 1/(v + π).     (3.27)
Arc search requires that Γk (0) = 0. However, (3.27) has the property limπ→∞ wk (π) = 0. This issue
is resolved with a reparameterization given by
π(α) = 1/α − vmin,k ,     (3.28)

where vmin,k is the minimal eigenvalue of Hk . Note that (3.28) is the solution to the equation α = ρ(vmin,k , π(α)). Combining (3.28) and (3.27) results in

ρ(v, α) = 1/(v + π(α)) = 1/(v + 1/α − vmin,k ) = α / (1 + α(v − vmin,k )),
the kernel function used in (3.23).
A subspace NEM arc is simply obtained by constructing wk with QTk Hk Qk and QTk gk then
“projecting” back to the full space with Γk = Qk wk .
3.4.2
Properties
The derivatives and initial value of ρ(v, α) from (3.23) are
(d/dα) ρ(v, α) = 1 / (1 + α(v − vmin ))2
(d2/dα2 ) ρ(v, α) = −2(v − vmin ) / (1 + α(v − vmin ))3
(d/dα) ρ(v, α) |α=0 = 1
(d2/dα2 ) ρ(v, α) |α=0 = −2(v − vmin ).
The initial values for the derivatives of Γk (α) from (3.23) are
Γ′k (0) = Qk wk′ (0) = −Qk QTk gk
Γ′′k (0) = Qk wk′′ (0) = 2Qk (QTk Hk Qk − vmin,k I)QTk gk .
Lemma 3.3. If QTk Qk = I, gk ≠ 0, and gk ∈ range(Qk ) then an NEM arc defined by (3.23) has the properties
1. Γk (0) = 0,
2. if Hk ≻ 0 and pk = −Hk−1 gk ∈ range(Qk ) then Γk (1/vmin,k ) = pk ,
3. (d/dα) ‖Γk (α)‖22 > 0 for all α > 0 (the norm of the arc is strictly increasing),
4. gkT Γk (α) < 0 for all α > 0 (the arc is always a descent direction),
5. gkT Γ′k (α) < 0 for all α ≥ 0 (the derivative of the arc is always a descent direction).
Proof. We denote vi,k as the ith diagonal of Vk and (UkT QTk gk )i as element i of UkT QTk gk .
1. ρ(v, 0) = 0 for all v implies Γk (0) = 0.
2. Lemma 3.2 and the definition of ρ from (3.23) imply
Γk (1/vmin,k ) = −Qk Uk Vk−1 UkT QTk gk = −Qk (QTk Hk Qk )−1 QTk gk = pk .
3. Check (d/dα) ‖Γk (α)‖22 :

(d/dα) ‖Γk (α)‖22 = (d/dα) wk (α)T QTk Qk wk (α)
= (d/dα) wk (α)T wk (α)
= (d/dα) gkT Qk Uk ρ(Vk , α)2 UkT QTk gk
= 2 Σi=1,...,ns [(d/dα) ρ(vi,k , α)] ρ(vi,k , α)(UkT QTk gk )2i > 0,

because ρ(v, α) > 0 and (d/dα) ρ(v, α) > 0 for all α > 0 and v ≥ vmin,k .
4. Check gkT Γk (α):

gkT Γk (α) = −gkT Qk Uk ρ(Vk , α)UkT QTk gk = − Σi=1,...,ns ρ(vi,k , α)(UkT QTk gk )2i < 0,

because ρ(v, α) > 0 for α > 0.
5. Check gkT Γ′k (α):

gkT Γ′k (α) = −gkT Qk Uk [(d/dα) ρ(Vk , α)] UkT QTk gk = − Σi=1,...,ns [(d/dα) ρ(vi,k , α)] (UkT QTk gk )2i < 0,

because (d/dα) ρ(v, α) > 0 for α ≥ 0.
3.4.3 Convergence
The initial values for the derivatives of the search function for an NEM arc are
φ′k (0) = gkT Γ′k (0) = −gkT Qk QTk gk
φ′′k (0) = Γ′k (0)T Hk Γ′k (0) + gkT Γ′′k (0)
= gkT Qk QTk Hk Qk QTk gk + 2gkT Qk (QTk Hk Qk − vmin,k I)QTk gk .
If QTk Qk = I and gk ∈ range(Qk ) then by Lemma 3.1 the derivatives simplify to
φ′k (0) = −gkT gk     (3.29)
φ′′k (0) = 3gkT Hk gk − 2vmin,k gkT gk .     (3.30)
Convergence to first-order points is obtained because limk∈I φ′k (0) = 0 (Lemma 2.4, part (i)) and
(3.29) imply limk∈I gk = 0 where I is any subsequence of unrestricted steps.
The sequences {φ′k (0)}k∈I and {φ′′k (0)}k∈I from (3.29) and (3.30) do not directly imply
limk∈I dTk Hk dk = 0. Therefore, we present two perturbation methods that may be applied to an
NEM arc in order to guarantee second-order convergence.
The first method requires that the subspace matrix Qk be constructed such that QTk Qk = I and
dk ∈ range(Qk ). The subspace arc function wk (α) from (3.23) is then redefined with
wk (α) = −Uk ρ(Vk , α)UkT QTk (gk + dk ),
(3.31)
where dk satisfies (3.2). After simplification, the initial values for the derivatives of the search function become

φ′k (0) = −gkT (gk + dk )     (3.32)
φ′′k (0) = (gk + dk )T Hk (gk + dk ) + 2gkT Hk (gk + dk ) − 2vmin,k gkT (gk + dk ).     (3.33)
Convergence to a second-order critical point follows, because limk∈I gk = 0 and lim inf k∈I φ′′k (0) ≥ 0
(Lemma 2.4, part (ii)) imply limk∈I dTk Hk dk = 0, where I is any subsequence with unrestricted
steps.
The second method redefines Γk (α) from (3.23) with
Γk (α) = Qk wk (α) + αdk ,
(3.34)
where dk satisfies (3.2). The initial values for the derivatives of the search function become

φ′k (0) = −gkT (gk + dk )     (3.35)
φ′′k (0) = (gk + dk )T Hk (gk + dk ) + 2gkT Hk gk − 2vmin,k gkT gk .     (3.36)
Convergence to a second-order critical point follows, because limk∈I gk = 0 and lim inf k∈I φ′′k (0) ≥ 0
(Lemma 2.4, part (ii)) imply limk∈I dTk Hk dk = 0, where I is any subsequence with unrestricted
steps.
If ǫ > 0 is a fixed tolerance and |gkT dk | ≥ ǫ|dTk Hk dk |, then limk∈I gk = 0 implies limk∈I dTk Hk dk =
0, where I is any subsequence of unrestricted steps. This indicates that second-order convergence
occurs “naturally” if the gradient contains a large enough component in the direction of negative
curvature. Therefore, a perturbation only needs to be applied if |gkT dk | < ǫ|dTk Hk dk |. We note that
the relative scaling of gk and dk becomes an issue if either perturbation (3.31) or (3.34) is used.
However, in practice a perturbation is only applied on a small number of iterations.
3.4.4
Linear constraints
We present two methods using an NEM arc for linearly constrained optimization that differ in the
application order of key matrices. First, we review some notation and concepts. Chapter 2 denotes
Ak as the matrix of active constraints at the beginning of iteration k and Āk as the matrix of
constraints that remain active during iteration k. Likewise, Zk and Z̄k are the nullspace matrices
associated with Ak and Āk respectively. A hypothetical arc Γ̄k is constructed to remain on the constraints active at the start of iteration k, i.e., Γ̄k (α) ∈ range(Zk ) for all α ≥ 0. A true search arc Γk is constructed to move from constraints deleted in iteration k and remain on Āk . If constraints are not deleted in iteration k, then Āk = Ak , Z̄k = Zk , and Γk = Γ̄k . Thus, Γk may be constructed in the same manner as Γ̄k without constraint deletion. Note that hypothetical search arcs are an
artifact of the convergence theory and do not need to be implemented in practice.
The QZ method (Procedure 3.1) constructs an NEM arc with a subspace matrix Qk ∈ Rn×ns
such that Z̄k Z̄kT gk ∈ range(Qk ) and span(Qk ) ⊆ span(Z̄k ). The definition of wk differs from (3.23),
because gk is replaced with Z̄k Z̄kT gk . The ZQ method (Procedure 3.2) is equivalent to constructing
an unconstrained NEM arc on the reduced variables then “projecting” back to the full space with a
product by Z̄k . The methods are named after the order of products in the final definition of Γk .
For both methods, the initial derivative of the arc is

Γ′k (0) = Z̄k Z̄kT gk if |gkT dk | ≥ ǫ|dTk Hk dk |, and Γ′k (0) = Z̄k Z̄kT gk + dk otherwise.     (3.37)
Note that dk remains on the constraints active at the start of iteration k (i.e. Ak dk = 0). Therefore,
the arc is initially feasible if
aTi Z̄k Z̄kT gk > 0,
(3.38)
where i is the index of any deleted constraint. The arc satisfies the conditions of Section 2.3.3 if
gkT Z̄k Z̄kT gk > gkT Zk ZkT gk ,
(3.39)
which establishes overall convergence of the algorithm. Note that second-order convergence follows
from the term dk in (3.37) and the results from Section 3.4.3.
3.4.5
Constraint intersection
Computing the intersection between an NEM arc and a linear constraint reduces to finding the
roots of an order ns polynomial, where ns is the dimension of the subspace. For simplicity, we drop
indices and use the π-parameterization of (3.27). We assume aT x > b and search for solutions to
Procedure 3.1 QZ method to construct a subspace NEM arc for linear constraints
compute dk to satisfy (3.2)
consider constraint deletion according to rules of Section 2.3.3 to form Z̄k
if dk 6= 0 then
Sk ← [Z̄k Z̄kT gk dk ]
else
pk ← arg minp ‖Z̄kT Hk Z̄k p + Z̄kT gk ‖2
Sk ← Z̄k [Z̄kT gk pk ]
end if
Qk Rk ← qr(Sk )
Uk Vk UkT ← QTk Hk Qk (spectral decomposition)
vmin,k ← λmin (QTk Hk Qk ); ρ(v, α) ← α / (1 + α(v − vmin,k ))
if dk = 0 or |gkT dk | ≥ ǫ|dTk Hk dk | then
wk (α) ← −Uk ρ(Vk , α)UkT QTk Z̄k Z̄kT gk
Γk (α) ← Qk wk (α)
else
wk (α) ← −Uk ρ(Vk , α)UkT QTk (Z̄k Z̄kT gk + dk )
Γk (α) ← Qk wk (α)
end if
Procedure 3.2 ZQ method to construct a subspace NEM arc for linear constraints
compute dk to satisfy (3.2)
consider constraint deletion according to rules of Section 2.3.3 to form Z̄k
if dk 6= 0 then
Sk ← [Z̄kT gk Z̄kT dk ]
else
pk ← arg minp ‖Z̄kT Hk Z̄k p + Z̄kT gk ‖2
Sk ← [Z̄kT gk pk ]
end if
Qk Rk ← qr(Sk )
Uk Vk UkT ← QTk Z̄kT Hk Z̄k Qk (spectral decomposition)
vmin,k ← λmin (QTk Z̄kT Hk Z̄k Qk ); ρ(v, α) ← α / (1 + α(v − vmin,k ))
wk (α) ← −Uk ρ(Vk , α)UkT QTk Z̄kT gk
if dk = 0 or |gkT dk | ≥ ǫ|dTk Hk dk | then
Γk (α) ← Z̄k Qk wk (α)
else
Γk (α) ← Z̄k Qk wk (α) + αdk
end if
aT (x + w(π)) = b:
aT (x + w(π)) = b
aT w(π) = b − aT x
aT U ρ(V, π)U T g = b − aT x.
Let γ = b − aT x and βi = (uTi a)(uTi g), where ui is column i of U . Now we have

Σi=1,...,ns βi /(vi + π) = γ,
Σi=1,...,ns βi Πj≠i (vj + π) = γ Πj=1,...,ns (vj + π).
The final equation is a polynomial of order ns in π. In the method inspired by Del Gatto [31], ns = 2
and the quadratic formula may be used.
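For the two-dimensional case the whole computation fits in a few lines. The sketch below (hypothetical names, NumPy) forms the quadratic in π from the quantities defined above, then maps each admissible root back to the arc parameter via α = 1/(π + vmin,k ) from (3.28).

import numpy as np

def nem_constraint_intersection(beta, v, gamma, vmin):
    """First intersection of a 2-D NEM arc with a single linear constraint.

    Sketch: beta[i] = (u_i'a)(u_i'g), v holds the two subspace eigenvalues,
    gamma = b - a'x, and vmin = min(v).  Solves
        beta0*(v1 + pi) + beta1*(v0 + pi) = gamma*(v0 + pi)*(v1 + pi)
    for pi and returns the smallest positive alpha = 1/(pi + vmin),
    or None if the arc never reaches the constraint.
    """
    c2 = gamma
    c1 = gamma * (v[0] + v[1]) - (beta[0] + beta[1])
    c0 = gamma * v[0] * v[1] - (beta[0] * v[1] + beta[1] * v[0])
    alphas = []
    for pi in np.roots([c2, c1, c0]):          # quadratic (or linear) in pi
        if abs(pi.imag) < 1e-12 and pi.real + vmin > 0:
            alphas.append(1.0 / (pi.real + vmin))
    return min(alphas) if alphas else None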
3.4.6
Advantages
The NEM method has several advantages when compared to line and curvilinear search methods.
We summarize them here:
• The NEM method does not require a special method to compute sk when Hk ⊁ 0. The Hessian
is handled “naturally” in all cases.
• On most iterations, the scaling of dk is irrelevant because NEM arcs are constructed on orthogonalized subspaces. Perturbations are required to guarantee second-order convergence and are
dependent on the scale. However, in practice perturbations are rarely required.
• The vector qk defined by Forsgren and Murray in [28, Section 5.4] is essentially steepest descent
for deleted constraints. NEM arcs defined by Procedures 3.1 and 3.2 first move from the
constraints along the steepest descent direction then turn toward the Newton step. Thus, only
first-order estimates for the Lagrange multipliers are required, while the algorithm is able to
take full advantage of second-order information in selecting the next iterate.
Relative to a line or curvilinear search method, these features come with an O(nns ) added computational cost, where ns is the dimension of the subspace. We have shown that an NEM arc may be
constructed on a 2-dimensional subspace, and thus the extra work may be considered O(n).
3.5
Modified gradient flow
In the context of unconstrained optimization, Del Gatto [31] defines a modified Gradient Flow (MGF)
arc as
Γk (α) = Qk wk (α)     (3.40)
wk (α) = −Uk ρ(Vk , α)UkT QTk gk     (3.41)
ρ(v, α) = −(1/v)(e−vt(α) − 1) for v ≠ 0, and ρ(v, α) = α for v = 0     (3.42)
t(α) = −(1/vmin,k ) log(1 − αvmin,k ),     (3.43)
where Qk ∈ Rn×ns with ns ≤ n, Uk Vk UkT is the spectral decomposition of QTk Hk Qk , and vmin,k =
λmin (QTk Hk Qk ). The MGF and NEM (3.23) arcs share the same definitions of Γk and wk and
differ only in the definition of the kernel function ρ. The convergence theory of the two methods is
nearly identical, so we omit the details here. In summary, the MGF kernel (3.42) has the properties ρ(v, 0) = 0, (d/dα) ρ(v, α)|α=0 = 1, and ρ(v, 1/vmin,k ) = 1/v for v ≥ vmin,k . Thus, the properties of Section 3.4.2 and convergence results of Section 3.4.3 hold. MGF arcs may also be used for problems
with linear constraints in the same manner as Section 3.4.4. Here, we review the derivation of the
MGF method and discuss differences with the NEM method.
3.5.1
Derivation
The method of modified gradient flow [3,6,59] defines the search arc as the solution to a linear ODE.
For clarity, we discuss the method for unconstrained optimization. The modified gradient flow arc
is Γk (t) = wk (t), where
wk′ (t) = −Hk wk (t) − gk ,  wk (0) = 0.     (3.44)
Given the spectral decomposition Hk = Uk Vk UkT , the solution to (3.44) is
wk (t) = −Uk ρ(Vk , t)UkT gk ,  ρ(v, t) = −(1/v)(e−vt − 1) for v ≠ 0, and ρ(v, t) = t for v = 0.     (3.45)
Here, ρ is called a kernel function. The notation ρ(Vk , t) indicates that ρ(v, t) is applied to only the
diagonal elements of the eigenvalue matrix Vk . Note that

(d/dt) ρ(v, t) = ρt (v, t) = e−vt for v ≠ 0, and ρt (v, t) = 1 for v = 0.
Behrman and Del Gatto [3, 31] reparameterize (3.45) with
t(α) = −(1/vmin,k ) log(1 − αvmin,k ) for vmin,k ≠ 0, and t(α) = α for vmin,k = 0,     (3.46)
where vmin,k is the smallest eigenvalue of Hk . This comes from solving for t(α) in the equation
α = ρ(vmin,k , t(α)). When vmin,k > 0 the arc is bounded and the search is over α ∈ [0, 1/vmin,k ].
When vmin,k ≤ 0 the arc is unbounded and the search is over α ∈ [0, ∞). If Hk ≻ 0 and αN = 1/vmin,k
then wk (tk (αN )) = −Hk−1 gk .
Behrman [3] presents a method that constructs an MGF arc on a subspace spanned by a small
number of vectors from the Lanczos process. Del Gatto [31] computes the MGF search arc on a
two-dimensional subspace. If Hk ⪰ 0 then the subspace is chosen as Sk = [gk pk ], where pk solves minp ‖Hk p + gk ‖2 . If Hk is indefinite, then the subspace is chosen as Sk = [gk dk ]. The “projection”
onto the subspace requires the tall-skinny QR factorization, Qk Rk = qr(Sk ). The matrix Qk has
dimensions n × 2. A search arc is computed by solving the linear ODE
wk′ (t) = −QTk Hk Qk wk (t) − QTk gk
wk (0) = 0,
(3.47)
then “projecting” back into the full space with Γk (t) = Qk wk (t). The solution to (3.47) requires
a spectral decomposition of a 2 × 2 matrix. A complete MGF arc (3.40) is the solution to (3.47)
parameterized by (3.46).
3.5.2
Constraint intersection
Computing the first intersection between an MGF arc and a linear constraint requires finding the
smallest real and positive solution to
r(t) = Σi=1,...,n1 βi eνi t + t Σi=1,...,n2 γi + ξ = 0,     (3.48)
where n1 is the number of nonzero eigenvalues and n2 is the number of zero eigenvalues of the
subspace Hessian (QTk Hk Qk ). The total size of the subspace is n1 + n2 . When n1 + n2 = 2, it is
possible to solve (3.48) with a carefully constructed numerical procedure. Despite some effort we
have not found a satisfactory method for the case when n1 + n2 > 2. A direct search or interpolation
scheme would be impractical for large problems.
3.5.3 Comparison to NEM arcs
The only difference between the MGF and NEM methods is the definition of the kernel functions in
(3.23) and (3.42). We summarize a few advantages that NEM has over MGF:
• The resulting constraint intersection equation for an NEM arc is a polynomial. For an MGF
arc the resulting equation is a sum of exponentials (3.48), which does not appear to have
an analytic solution or practical computational routine for subspaces with more than two
dimensions.
• The MGF kernel function (3.42) has different expressions for v 6= 0 and v = 0. Thus the
implementation requires a tolerance check on v and procedures to handle both cases. The
NEM kernel function is the same for all v.
• The NEM kernel function is qualitatively simpler, because it involves neither an exponential nor a logarithm.
Chapter 4
ARCOPT
4.1
Preliminaries
ARCOPT is a reduced-gradient method using NEM arcs designed to solve linearly constrained optimization problems of the form
minimize_{x∈Rn}  F (x)   subject to   l ≤ [ x ; Ax ] ≤ u,
where F (x) is smooth and A is an m × n sparse matrix. ARCOPT is influenced by and indebted to
MINOS [51]. As in MINOS, the problem is reformulated so that x and A include a full set of slack
variables and columns:
minimize_{x∈Rn}  F (x)   subject to   Ax = 0,  l ≤ x ≤ u.
The primary goal in developing ARCOPT was to adhere to the theoretically defined algorithm
from Chapter 2. A few critical deviations were made to account for limited resources and finite
precision arithmetic. Because the implementation cannot take an infinite number of iterations, it terminates when approximate optimality conditions are met. Chapter 2 defines an algorithm where
all the iterates remain feasible. ARCOPT uses the EXPAND procedure [36], which increases the
feasibility tolerance by a small amount each iteration. Variables are allowed to move outside the
original bounds (l and u), but must remain feasible with respect to the expanded bounds. This
technique is a practical way to reduce the chance of cycling, handle degeneracy, keep key matrices well
conditioned, and remove undesirable roots when computing arc-constraint intersections. Even with
EXPAND it is still possible for certain matrices to become poorly conditioned. ARCOPT implements
a repair procedure from [34, Section 5] to attempt a recovery from these situations.
A fundamental aspect of ARCOPT is a partitioning of the variables and corresponding columns
Table 4.1: Symbols for basis index sets. B, S, Nl , Nu , Nf , and Nb are disjoint; their union includes all variables {1, . . . , n}.

symbol   description
B        basic variables
S        superbasic variables
Nl       nonbasic at lower bound
Nu       nonbasic at upper bound
Nf       nonbasic and fixed, i ∈ Nf ⇔ li = ui
Nb       nonbasic and between bounds, li < xi < ui
N        union of all nonbasic variables, N = Nl ∪ Nu ∪ Nf ∪ Nb
of A into several disjoint subsets. Nonbasic variables are held fixed, usually at an upper or lower
bound. Superbasic variables are free to move. Basic variables are determined by the linear equations.
Symbols for the basis index sets are given in Table 4.1. The linear constraints may be partitioned as

Ax = [ B S N ] [ xB ; xS ; xN ],     (4.1)
where B is m × m, S is m × |S|, and N is m × (n − m − |S|). B is called the basis matrix. The partition
is constructed and maintained so that B has full rank. Given xS and xN , the basic variables are
determined by solving
BxB = −SxS − N xN .
(4.2)
Solves with B and B T use LU factors from LUSOL [35]. Each iteration may cause a column of B to
be replaced with one from S. This is a rank-1 modification and is handled efficiently and stably with
LUSOL’s Bartels-Golub update of the LU factors. After a certain number of updates, a complete
factorization is carried out. Section 4.7 describes products with Z and Z T , where Z is a matrix
whose columns are in null(A).
ARCOPT is composed of several major components. We describe them here in the order invoked
by the algorithm:
1. The initialization phase (Section 4.2) processes input and performs all actions required before
the main loop.
2. Each iteration of the main loop starts with a call to expand main (Procedure 4.15) to increase
the dynamic feasibility tolerance. After a certain number of iterations the dynamic feasibility
tolerance must be reset to its initial value and nonbasic variables must be moved back to their
bounds.
3. Each iteration makes a call to phase 1 (Section 4.3) if basic variables are found to be infeasible or
phase 2 (Section 4.4) otherwise. Phase 1 will take a step to minimize the sum of infeasibilities.
Phase 2 constructs an NEM arc and takes a step towards optimality. Phase 1 will terminate
if the constraints are determined to be infeasible. Phase 2 will terminate if the approximate
optimality conditions are met.
4. After each iteration the basis maintenance routine (Section 4.5) is called to handle a change to
the basis if a bound was encountered. This may require an update to the basis matrix, which
is handled by the factorization routines (Section 4.6).
4.2
Initialization
The initialization phase is responsible for processing solver options, dealing with input data, selecting
an initial basis, and performing the initial factorization. The solver options and associated symbols
used in this chapter are listed in Table 4.2.
symbol   default   description
δD       1e-6      dual feasibility tolerance
δP       1e-6      primal feasibility tolerance
δC       1e-4      curvature optimality tolerance
δM       0.2       arc perturbation tolerance
δF       1e-4      arc search descent parameter
δG       .9        arc search curvature parameter
δV       1e-7      initial step size tolerance
δR       2e-4      regularization parameter
expfrq   10000     expand reset frequency
δA       .5        initial EXPAND tolerance
δB       .99       final EXPAND tolerance
δS       1e-11     tolerance for near zero numbers

Table 4.2: ARCOPT parameters and default values.
4.2.1
Input
ARCOPT requires:
• Routines to evaluate the objective function F (x) and gradient g(x).
• A routine to evaluate the Hessian H(x), or an operator H(x, v), to evaluate matrix-vector
products with the Hessian at x.
• x0 , an initial guess, vector of length n0 .
• A0 , constraint matrix, size m × n0 .
• l, u, lower and upper bounds on variables and constraints with length n = n0 + m.
4.2.2
Initial processing
The input data is processed in the following manner:
1. Compute initial slack variables: s0 ← A0 x0
2. Form augmented constraint matrix: A ← [A0 − I]
3. Initialize variable vector x by projecting into bounds:
x ← (x0 ; s0 );  x ← max(x, l);  x ← min(x, u)   (see the sketch after this list)
4. Call basis initialize to set the initial basis.
5. Call fac main to perform the initial factorization.
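A compact sketch of steps 1–3 (SciPy/NumPy; the sparse handling and the names are illustrative assumptions, not ARCOPT's actual data structures):

import numpy as np
import scipy.sparse as sp

def initial_processing(A0, x0, l, u):
    """Steps 1-3 of the initial processing: slacks, augmented matrix, projection.

    Sketch: A0 is the m-by-n0 sparse constraint matrix; l and u have
    length n0 + m (original variables followed by slacks).
    """
    m = A0.shape[0]
    s0 = A0 @ x0                              # 1. initial slack values
    A = sp.hstack([A0, -sp.identity(m)])      # 2. A <- [A0  -I]
    x = np.concatenate([x0, s0])              # 3. x <- (x0; s0) ...
    x = np.minimum(np.maximum(x, l), u)       #    ... projected into the bounds
    return A, x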
4.3
Phase 1
Each iteration of Phase 1 carries out the following steps:
1. Check the feasibility of the basic variables. If they are found to be feasible, phase 1 is complete
and phase 2 will start in the next iteration. If the basic variables are not feasible, phase 1
continues.
2. Construct a linear objective vector c with length n to minimize the sum of infeasibilities:
ci = 0 if li ≤ xi ≤ ui
ci = 1 if xi > ui
ci = −1 if xi < li
3. Compute the vector of multipliers y by solving B T y = cB .
4. Compute the residual gradient: z ← c − AT y
5. Check the optimality conditions. Phase 1 is optimal if
min(x − l, z) ≤ δD and min(u − x, −z) ≤ δD .
If this occurs, the constraints are not feasible. Here δD is the dual feasibility tolerance.
6. Select the nonbasic variable to move. Choose the element of z with largest magnitude and
appropriate sign. The index of this variable is denoted k. Set σ ← −1 if zk < 0 or σ ← 1 if
zk > 0.
7. Compute the phase 1 search direction ∆x:
∆xN ← 0; ∆xk ← σ; ∆xB solves B∆xB = −σak ,
where ak is column k of A.
8. Compute the maximum possible step size ᾱ along ∆x with the call
(ᾱ, j, β, γ) ← expand(x, ∆x, l, u, αmax , δE , δT , δS ),
where αmax is a user defined step size limit and δE is the dynamic feasibility tolerance. The
EXPAND parameters δT and δS are described in Section 4.8. If γ = 0 and β 6= 0, ᾱ is limited
by a bound on variable j. The lower bound is indicated by β = −1, while the upper bound is
indicated by β = 1. If γ = 1 the iteration is deemed degenerate, variable j is made nonbasic,
a small step is taken with x ← x + ᾱ∆x, and the algorithm moves to the next iteration.
9. Compute the step size α ≤ ᾱ which removes as many infeasibilities as possible, but does not
go any further.
10. Take the step: x ← x + α∆x
4.4
Phase 2
Each phase 2 iteration carries out the following steps:
1. Compute the direction of negative curvature. ARCOPT uses Matlab’s eigs function to compute
the eigenvector d corresponding to vmin , the minimum eigenvalue of Z T HZ. eigs uses ARPACK
[45] and only requires matrix-vector products with Z T HZ. If all eigenvalues are non-negative,
then d ← 0.
2. Compute the vector of multipliers y by solving B T y = gB .
3. Compute the residual gradient: z ← g − AT y
4. Check the termination conditions. Phase 2 is optimal if
min(x − l, z) ≤ δD and min(u − x, −z) ≤ δD and vmin ≥ −δC .
5. Consider constraint deletion. This is accomplished by choosing one or more nonbasic variables
to make superbasic. Do nothing if a bound limited the step size in the previous iteration. If
the most recent change to the basis was constraint deletion, then more than one constraint
may be deleted.
6. Compute the reduced gradient: gz ← Z T g
7. Set sign for the direction of negative curvature: d ← −d if gzT d > 0
8. Compute the steepest descent direction in the full space: ∆x ← −Zgz . If
|gzT d| < δM |dT Z T HZd|,
the steepest descent direction is perturbed with the direction of negative curvature: ∆x ←
−Z(gz + d).
9. Compute the maximum possible step size ᾱ along ∆x with the call
(ᾱ, j, β, γ) ← expand(x, ∆x, l, u, αmax , δE , δT , δS ),
where αmax is a user defined step size limit and δE is the dynamic feasibility tolerance. The
EXPAND parameters δT and δS are described in Section 4.8. If γ = 0 and β 6= 0, ᾱ is limited
by a bound on variable j. The lower bound is indicated by β = −1, while the upper bound
is indicated by β = 1. If γ = 1, then the iteration is deemed degenerate, variable j is made
nonbasic, a small step is taken with x ← x + ᾱ∆x, and the algorithm moves to the next
iteration.
10. If vmin ≥ −δC , compute the regularized Newton direction p with
(Z T HZ + δR I)p = −gz ,
where δR is the regularization parameter. This is accomplished with Matlab’s pcg or minres.
11. Construct the arc Γ(α). If no direction of negative curvature exists (vmin ≥ −δC ), the subspace
is chosen to be [gz p]. If a direction of negative curvature exists (vmin < −δC ), then the arc
subspace is chosen to be [gz d]. If |gzT d| ≤ δM |dT Z T HZd| the arc must be perturbed so that
Γ′ (0) = −Z(gz + d).
12. Compute the maximum step size along the arc:
ᾱ ← max α such that l − δE ≤ x + Γ(α) ≤ u + δE .
This is done with the call
(ᾱ, j, β) ← arctest(x, Γ, l, u, αmax , δE ),
where αmax is a user defined step size limit and δE is the dynamic feasibility tolerance. If
β 6= 0 then j is the index of the limiting bound. The lower bound is indicated by β = −1,
while the upper bound is indicated by β = 1.
13. Compute the initial step size. If the reduced Hessian is positive definite (vmin ≥ δV ), the initial
step size is α0 = 1/vmin . If the reduced Hessian is positive semi-definite (|vmin | < δV ), the
initial step size is α0 = 1. If the reduced Hessian is indefinite (vmin ≤ −δV ), the initial step size
is α0 = −1/vmin . These choices were used in both [3] and [31].
14. Search along arc for an acceptable step size. The search function is φ(α) = F (x + Γ(α)). First,
compute values for φ′ (0) and φ′′ (0). Second, find α that satisfies
φ(α) ≤ φ(0) + δF φ′ (0)α + ½ min{φ′′ (0), 0}α2
|φ′ (α)| ≤ δG |φ′ (0) + min{φ′′ (0), 0}α|.
(A simple search satisfying these conditions is sketched after this list.)
15. Take the step: x ← x + Γ(α)
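The following sketch shows one way to satisfy the step 14 conditions. The names (phi, dphi) and the plain halving strategy are our assumptions; a production search would typically also expand the step when only the curvature condition fails. The default tolerances match Table 4.2.

def arc_search(phi, dphi, phi0, dphi0, ddphi0, alpha0,
               delta_F=1e-4, delta_G=0.9, max_iter=50):
    """Find alpha satisfying the phase 2 descent and curvature conditions.

    Sketch: phi(a) evaluates F(x + Gamma(a)), dphi(a) its derivative,
    phi0, dphi0, ddphi0 are phi(0), phi'(0), phi''(0), and alpha0 is the
    initial step size from step 13.
    """
    curv = min(ddphi0, 0.0)
    alpha = alpha0
    for _ in range(max_iter):
        descent_ok = phi(alpha) <= phi0 + delta_F * dphi0 * alpha + 0.5 * curv * alpha**2
        curvature_ok = abs(dphi(alpha)) <= delta_G * abs(dphi0 + curv * alpha)
        if descent_ok and curvature_ok:
            return alpha
        alpha *= 0.5        # back off; a fuller implementation would also expand when needed
    return alpha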
4.5
Basis maintenance
Basis maintenance is performed at the end of each phase 1 or phase 2 iteration. If the step was
limited by a bound, the corresponding variable must be moved to the appropriate nonbasic set. If
the limiting variable was basic, its position in the basis must be replaced with a superbasic variable.
Changes to the basis may call for an update or refactorization of B. If the new basis matrix is found
to be ill-conditioned or rank deficient, then factorization repair is invoked. These steps are detailed
in the following routines:
• basis initialize (Procedure 4.1): perform initial partitioning of variables.
• basis main (Procedure 4.2): main call for basis maintenance. Responsible for moving a limiting
variable to the appropriate nonbasic set and updating the factorization if needed.
• basis activate (Procedure 4.3): move limiting variable to appropriate nonbasic set.
• basis select (Procedure 4.4): select an appropriate superbasic variable to move to basic set.
• bmap (Procedure 4.5): map from basis matrix column index to variable index.
• vmap (Procedure 4.6): map from basis variable index to basis matrix column index.
Procedure 4.1 basis initialize: perform the initial partition of variables into basis sets.
for i = 1 to n do
if li = ui then
add i to Nf , xi is nonbasic and fixed
else if xi = li then
add i to Nl , xi is nonbasic at lower bound
else if xi = ui then
add i to Nu , xi is nonbasic at upper bound
else
add i to S, xi is superbasic
end if
end for
if crash = firstm then
B ← {1, . . . , m}, make first m variables basic
remove {1, . . . , m} from other basis sets
else
B ← {n − m + 1, . . . , n}, make last m (slack) variables basic
remove {n − m + 1, . . . , n} from other basis sets
end if
Procedure 4.2 basis main(j, β): main entry point to basis routines. Responsible for moving a
limiting variable to the appropriate nonbasic set and updating the factorization if needed.
Require: j ∈ {0} ∪ B ∪ S is index of limiting basic or superbasic variable
// j = 0 indicates no variable is limited by a bound
Require: β ∈ {−1, 0, 1} indicates upper or lower limit
// β = −1 indicates lower bound limit
// β = 0 indicates no variable is limited by a bound
// β = 1 indicates upper bound limit
Ensure: basis index sets and factorization are updated
if β 6= 0 and j ∈ S then
// superbasic variable j has hit bound
call basis activate(j, β) to make j nonbasic
else if β 6= 0 and j ∈ B then
// basic variable j has hit bound, need to find superbasic variable to replace
i ← basis select(j)
move variable i to B
call basis activate(j, β) to make j nonbasic
call fac update(i, j) to replace column vmap(j) of B with column i of A
// fac update may call fac main to refactorize or repair B
end if
Procedure 4.3 basis activate(j, β): move limiting variable j to appropriate nonbasic set.
Require: j ∈ B ∪ S is index of limiting variable
Require: β ∈ {−1, 1} indicates lower or upper limit, respectively
Ensure: j is made nonbasic
if β = −1 then
move j to Nl
else if β = 1 then
move j to Nu
end if
Procedure 4.4 i ← basis select(j): select a superbasic variable to become basic. The method is
from [51, p. 53].
Require: j ∈ B, index of basic variable to become nonbasic
Require: |S| ≥ 1, there must be at least one superbasic variable
Ensure: i is index of superbasic variable to become basic
k ← vmap(j) // index of column in B corresponding to j
// find largest available pivot, vmax
u solves B T u = ek
v ← |S T u|
vmax ← max(v)
// compute minimum distance to bound for each superbasic
dk ← min{|xk − lk |, |xk − uk |} for k ∈ S
i ← arg maxk {dk with vk ≥ 0.1vmax and k ∈ S}
// i is index of variable furthest from bound with vi ≥ 0.1vmax
Procedure 4.5 j ← bmap(i): map from basis matrix column index to variable index
Require: i ∈ {1, . . . , m} is the column index of B
return j ∈ B, the index of basic variable corresponding to column i of B
Procedure 4.6 i ← vmap(j): map from basis variable index to basis matrix column index
Require: j ∈ B is an index of a basic variable
return the index of basis matrix column corresponding to variable j
CHAPTER 4. ARCOPT
4.6
47
Factorization
ARCOPT uses LUSOL [35] to compute and update sparse LU factors of the basis matrix B. If B
is found to be ill-conditioned or rank deficient, ARCOPT uses a repair procedure presented by Gill,
Murray, Saunders, and Wright in their paper on SNOPT [34, Section 5]. The factorization and repair
routines are summarized here:
• fac main (Procedure 4.7) main controller for the basis factorization. It first attempts to compute LU factors for B. If the basis is found to be ill-conditioned or rank deficient, then basis
repair is invoked.
• fac update (Procedure 4.8): update LU factors to reflect a column replacement in B.
• fac BS (Procedure 4.10): attempt to find a better conditioned basis by swapping basic and
superbasic variables.
• fac BR (Procedure 4.9): construct a well conditioned basis by replacing dependent columns of
B with appropriate columns corresponding to slack variables.
• fac repair (Procedure 4.11): helper routine called by fac BR to select appropriate slack variables
for inclusion in the basis.
Procedure 4.7 fac main: factorize B with LUSOL using threshold partial pivoting (TPP). If B is
singular, call fac BS to attempt to fix the basis by swapping in superbasic variables. If basis is still
rank deficient, call fac BR to replace dependent columns with appropriate ones corresponding to
slack variables.
Require: function nsing(U ) counts number of apparent singularities in U
r ← false // flag to indicate a basis repair call
(L, U, p, q) ← LUSOL(B, TPP)
if nsing(U ) > 0 and |S| > 0 then
// basis is singular, attempt to replace dependent columns with superbasics
call fac BS
r ← true
end if
if nsing(U ) > 0 then
// basis is singular, replace dependent columns with slacks
call fac BR
r ← true
end if
if r is true then
// refactorize because a basis repair routine was called
(L, U, p, q) ← LUSOL(B, TPP)
end if
store L, U , p, and q for subsequent solves and updating
Procedure 4.8 fac update(i, j): update factorization to replace column in B corresponding to
variable j with column in A corresponding to variable i. Call fac main if the update routine reports
a singular result. Note that LUSOL maintains updates to L factors in product form; we ignore that
detail here.
j ← vmap(j) // get column index of B corresponding to variable j
(L, U, p, q, r) ← LUSOL REPCOL(L, U, p, q, j, ai ) // ai is column i of A
if r is true then
// singularity detected, initiate repair procedure
call fac main
end if
Procedure 4.9 fac BS: factorize [B S]T with LUSOL’s threshold rook pivoting (TRP) in order to
find a full rank or better conditioned set of basis columns without changing the state of non-basic
variables. The LU factors are not stored and a subsequent factorization is required for solves.
(L, U, p, q) ← LUSOL([B S]T , TRP)
// move first m pivot rows to B
for i = 1 to m do
move pi to B
end for
// move remaining |S| pivot rows to S
for i = m + 1 to m + |S| do
move pi to S
end for
Procedure 4.10 fac BR: factorize B with LUSOL’s threshold rook pivoting (TRP) to find dependent
columns. Dependent columns are replaced with identity columns corresponding to slack variables in
the fac repair method. There is no guarantee that one factorization and subsequent call to fac repair
will produce a nonsingular basis. Thus, the method will repeat until a full rank basis is produced.
This is always possible, because the entire basis matrix could be replaced with −I. For efficiency the
method throws the LU factors away. A subsequent factorization is needed for solves.
repeat
(L, U, p, q) ← LUSOL(B, TRP)
if nsing(U ) > 0 then
call fac repair to replace dependent columns of B with slacks
end if
until nsing(U ) = 0
Procedure 4.11 fac repair: repair B by making variables corresponding to dependent columns
nonbasic and appropriate slack variables basic.
Require: p and q are row and column pivot vectors from LUSOL
Require: depcol(j) returns true if column j of B is dependent
for i = 1 to m do
j ← qi // get appropriate column index
if depcol(j) then
k ← bmap(j)
// column in B corresponding to variable k is dependent
// make variable k nonbasic
if xk ≤ lk then
move k to Nl
else if xk ≥ uk then
move k to Nu
else if lk = uk then
move k to Nf
else
move k to Nb
end if
// move slack variable into basis to repair B
move variable with index n − m + pk to B
end if
end for
4.7
Products with Z and Z T
ARCOPT requires the ability to compute products with a nullspace matrix Z. If the variables are
partitioned such that A = [B S N ] (4.1), a nullspace matrix can be constructed as the n × |S| matrix

Z = [ −B−1 S ; I ; 0 ].     (4.3)
The nullspace matrix is not constructed but used as an operator to compute products with Z
(Procedure 4.12) and Z T (Procedure 4.13).
Procedure 4.12 v ← Zu
Require: u ∈ R|S| , B is full rank
Ensure: v ∈ null(A)
vS ← u
vN ← 0
vB solves BvB = −Su
Procedure 4.13 v ← Z T u
Require: u ∈ Rn , B is full rank
Ensure: v ∈ range(Z T )
u1 solves B T u1 = uB
v ← −S T u1 + uS
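For illustration, here is an operator-style sketch of Procedures 4.12 and 4.13 using SciPy's sparse LU in place of LUSOL. The factorization object, the dense vectors, and the assumption that the caller handles the (B, S, N ) index bookkeeping are all simplifications.

import numpy as np
from scipy.sparse.linalg import splu

def make_nullspace_ops(B, S):
    """Return callables for v = Z u and v = Z^T u with Z from (4.3).

    Sketch: B (m-by-m, nonsingular) and S (m-by-|S|) are scipy sparse
    matrices; one LU factorization of B is reused for every solve.
    """
    lu = splu(B.tocsc())

    def Z_mul(u):                       # Procedure 4.12: v = Z u
        vS = np.asarray(u)
        vB = lu.solve(-(S @ vS))        # solve B vB = -S u
        return vB, vS                   # vN = 0 is implicit

    def ZT_mul(uB, uS):                 # Procedure 4.13: v = Z^T u
        u1 = lu.solve(np.asarray(uB), trans='T')   # solve B^T u1 = uB
        return -(S.T @ u1) + np.asarray(uS)

    return Z_mul, ZT_mul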
4.8 Expand
Gill, Murray, Saunders, and Wright presented the EXPAND procedure [36] to prevent cycling in
active set methods for linearly constrained optimization. ARCOPT’s initialization procedure calls
expand init (Procedure 4.14) to set the dynamic feasibility tolerance δE and the growth parameter
δT . Before each phase 1 or phase 2 iteration, expand update (Procedure 4.15) is called to increase
the dynamic feasibility tolerance (δE ← δE + δT ). If δE has grown too large, then expand reset
(Procedure 4.16) is called to set δE to its initial value and move nonbasic variables back to their
bounds. Phase 1 is invoked if basic variables become infeasible.
The main EXPAND routines are used to determine the largest step size and select a “good”
limiting constraint. At the lowest level, step (Procedure 4.19) computes the step size to a bound
along a line for a single variable. Note that step uses parameter δS as a tolerance on near zero
values. Next, ratio test (Procedure 4.18) computes the maximum step size such that all variables
remain feasible. Finally, expand (Procedure 4.17) computes the maximum step size along a line to
a constraint with a large “pivot” value such that all variables remain feasible with respect to the
expanded bounds. Small “pivot” values typically lead to ill-conditioned basis matrices [36, p. 441].
An iteration is deemed degenerate if the step size to the nearest bound is too small. This occurs
when multiple constraints are encountered in the same iteration. In this situation, the expand routine
enforces a small positive step size such that all variables remain feasible with respect to the expanded
bounds. A degeneracy flag is set that signals ARCOPT to take the step, perform basis updates, then
go to the next iteration. In phase 2, ARCOPT checks for degeneracy by calling expand on x + αΓ′ (0),
which does not involve any work to compute arc-constraint intersections. In fact, the arc is only
constructed if the iteration is deemed non-degenerate.
Procedure 4.14 expand init: initialize dynamic feasibility tolerance and growth parameter.
δE ← δA · δP
δT ← (δB − δA )δP / expfrq
Procedure 4.15 expand update: increase dynamic feasibility tolerance, reset if needed.
δE ← δE + δT
if δE ≥ δP then
call expand reset to reset dynamic tolerance
end if
Procedure 4.16 expand reset: reset dynamic tolerance, bring infeasible variables back to bounds.
move infeasible nonbasic variables back to bounds
call fac main to refactorize matrix
call expand init to reset dynamic feasibility tolerance
recompute basic variables
if basic variables are found to be infeasible, return to phase 1
Procedure 4.17 (α, j, β, γ) ← expand(x, ∆x, l, u, αmax , δE , δT , δS ): find the largest step size to
a bound associated with a large pivot such that all variables remain feasible with respect to the
expanded bounds. If the computed step size is too small, then set the degeneracy flag and return a
small positive step size.
Require: x, ∆x, u, l ∈ Rn , αmax , δE , δT , δS ∈ R+
Require: l − δE < x < u + δE
Ensure: α ∈ (0, αmax ], j ∈ {0, . . . , n}, β ∈ {−1, 0, 1}, γ ∈ {0, 1}
α ← αmax ; j ← 0; β ← 0; γ ← 0; p ← 0
(α1 , j1 , β1 ) ← ratio test(x, ∆x, l − δE , u + δE , αmax , δS )
for i = 1 to n do
(α2 , β2 ) ← step(xi , ∆xi , li , ui , αmax , δS )
if α2 ≤ α1 and |∆xi | > p then
α ← α2 ; j ← i; β ← β2 ; p ← |∆xi |
end if
end for
αmin ← δT /p
if α ≤ αmin then
α ← αmin ; γ ← 1
end if
Procedure 4.18 (α, j, β) ← ratio test(x, ∆x, l, u, αmax , δS ): find the largest step size to a bound
such that all variables remain feasible.
Require: x, ∆x, u, l ∈ Rn , αmax , δS ∈ R+
Require: l < x < u
Ensure: α ∈ (0, αmax ], j ∈ {0, . . . , n}, β ∈ {−1, 0, 1}
α ← αmax ; j ← 0; β ← 0
for i = 1 to n do
(α1 , β1 ) ← step(xi , ∆xi , li , ui , αmax , δS )
if α1 < α then
α ← α1 ; j ← i; β ← β1
end if
end for
Procedure 4.19 (α, β) ← step(x, ∆x, l, u, αmax , δS ): find the largest step size to a bound for a
single variable.
Require: x, ∆x, u, l ∈ R, αmax , δS ∈ R+
Ensure: α ∈ (0, αmax ], β ∈ {−1, 0, 1}
α ← αmax ; β ← 0
if ∆x < −δS and l > −∞ then
α ← (l − x)/∆x; β ← −1
end if
if ∆x > δS and u < ∞ then
α ← (u − x)/∆x; β ← 1
end if
if α > αmax then
α ← αmax ; β ← 0
end if
4.9 Arc-constraint intersection
In phase 2, ARCOPT computes the maximum step size along an arc with arctest (Procedure 4.20),
which computes
ᾱ = max {α such that l − δE ≤ x + Γ(α) ≤ u + δE }
and also returns the index of the limiting variable. The routine arcbound (Procedure 4.22) finds
the smallest non-negative real root of the nonlinear equation arising from the intersection of an arc
and a bound for a single variable. The routine arcstep (Procedure 4.21) applies arcbound to both
upper and lower bounds for a single variable. Note that arcstep and arcbound assume that the input
variable is strictly feasible with respect to the input bounds. The expanding feasibility tolerance
ensures this property for all variables.
Procedure 4.20 (α, j, β) ← arctest(x, Γ, l, u, αmax , δ): find the largest step size along an arc such
that all variables remain feasible with respect to expanded bounds.
Require: x, l, u ∈ Rn , αmax , δ ∈ R+ , Γ ∈ C[0, αmax ] : R 7→ Rn
Require: l − δ < x + Γ(0) < u + δ
Ensure: α ∈ (0, αmax ], j ∈ [0, n], β ∈ {−1, 0, +1}
α ← αmax ; j ← 0; β ← 0
for i = 1 to n do
(αi , βi ) ← arcstep(xi , Γi , li − δ, ui + δ, αmax )
if αi ≤ α and βi 6= 0 then
α ← αi ; j ← i; β ← βi
end if
end for
Procedure 4.21 (α, β) ← arcstep(x, Γ, l, u, αmax ): find the largest step size along an arc for a single
variable.
Require: x, l, u ∈ R, αmax ∈ R+ , Γ ∈ C[0, αmax ] : R 7→ R
Require: l < x + Γ(0) < u
Ensure: α ∈ (0, αmax ], β ∈ {−1, 0, 1}
(αl , βl ) ← arcbound(x, Γ, l, αmax )
(αu , βu ) ← arcbound(−x, −Γ, −u, αmax )
if αl < αu then
α ← αl ; β ← −βl
else
α ← αu ; β ← βu
end if
Procedure 4.22 (α, β) ← arcbound(x, Γ, l, αmax ): compute the first point of intersection between
an arc and a bound for a single variable.
Require: x, l ∈ R, αmax ∈ R+ , Γ ∈ C[0, αmax ] : R 7→ R
Require: l < x + Γ(0)
Ensure: α ∈ (0, αmax ], β ∈ {0, 1}
α ← αmax ; β ← 0
if l > −∞ then
r ← roots(Γ + (x − l), [0, αmax ])
rmin ← min(r)
if rmin exists then
α ← rmin ; β ← 1
end if
end if
Chapter 5
Experiments
5.1
Preliminaries
This chapter documents the following numerical experiments:
• A comparison between ARCOPT and IPOPT on a continuous formulation of the Hamiltonian
cycle problem.
• A comparison between ARCOPT, IPOPT, and SNOPT on problems from the CUTEr test set.
• A comparison of the BFGS and SR1 quasi-Newton updates in an arc search code.
We begin by discussing existing solvers and a method of comparing performance.
5.1.1
SNOPT
SNOPT is an active-set SQP method for large-scale sparse nonlinear optimization by Gill, Murray,
and Saunders [34]. The SNOPT algorithm does not use second derivatives and thus cannot guarantee
convergence to second-order critical points. However, since it is a descent method, SNOPT finds a
minimizer most of the time. SNOPT maintains a limited-memory quasi-Newton approximation of
the Hessian. The software is known to be robust and efficient, which makes it an attractive candidate
for comparison.
5.1.2
IPOPT
IPOPT is an interior point code by Wächter and Biegler [60] using a filter-based line search for
nonlinearly constrained optimization. IPOPT is distributed as open source software and is able to
use second derivatives. When the Hessian is indefinite, IPOPT computes a search direction from
(H + λI)s = −g.
The method to select λ is fully described in [60, Section 3.1]. In summary, if H is found to be
indefinite, IPOPT sets λ ← λ0 , where λ0 is determined from a user parameter or a previous iteration.
If H + λI is found to be indefinite then λ is increased by a factor δ such that λ ← δλ and the process
is repeated. The default setting is δ = 8. Between iterations, the initial trial value λ0 is decreased.
IPOPT does not explicitly compute or use directions of negative curvature. Convergence to
second-order critical points is not guaranteed, but often observed in practice.
5.1.3
Other solvers
Table 5.1 lists related solvers that are available for use on NEOS [18]. For each solver we list the
problem type, basic method, ability to accept AMPL [29] models, and use of second derivatives. All
solvers require smooth objective functions.
None of the methods listed guarantees convergence to second-order critical points. We did a
simple test on the solvers in Table 5.1 that accept AMPL models. The problems were
minimize F1 (x, y) = x2 − 3y 2 + y 4     (5.1)

and

minimize F2 (x, y) = x2 − y 2 subject to −2 ≤ x, y ≤ 2,     (5.2)

which have saddle points at (0, 0). Problem (5.1) has (global) minimizers at (x, y) = (0, ±√(3/2)).
Problem (5.2) has (global) minimizers at (x, y) = (0, ±2). When started at (x0 , y0 ) = (1, 0) all
solvers terminate at the saddle point, with the exception of LOQO which found a minimizer for (5.1)
and the saddle point for (5.2). It should be noted that the initial point is very special, because it lies
in the positive definite subspace of the Hessian. If an algorithm does nothing to move off this space
it will likely converge to the saddle point. For these problems, all solvers converge to a minimizer
if (x0 , y0 ) is selected to contain a large enough component in the span of (0, 1). This exercise
demonstrates that convergence to second-order critical points is a feature missing from all solvers
available to the community despite many of those solvers taking advantage of second derivatives.
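The special nature of that starting point is easy to verify for (5.1): the Hessian is diagonal with entries 2 and −6 + 12y2 , so at (1, 0) the only negative-curvature direction is ±(0, 1), exactly the component absent from the starting point. A small check (sketch):

import numpy as np

def hessian_F1(x, y):
    """Hessian of F1(x, y) = x^2 - 3y^2 + y^4 from (5.1)."""
    return np.array([[2.0, 0.0],
                     [0.0, -6.0 + 12.0 * y**2]])

vals, vecs = np.linalg.eigh(hessian_F1(1.0, 0.0))
print(vals[0], vecs[:, 0])   # -6.0 and +/-(0, 1): negative curvature along y only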
5.1.4
Performance profiles
Dolan and Moré developed performance profiles to compare optimization software on a set of test
problems [19, 20]. We briefly describe the method here and use it throughout this chapter. Denote
the set of solvers S and the set of test problems P. The metric tp,s indicates the performance of
solver s ∈ S on problem p ∈ P. The metric could be solution time, number of function evaluations,
or solution quality. The best performance on any problem is given by tp,min = min{tp,s : s ∈ S}.
The ratio rp,s = tp,s /tp,min indicates the relative performance of solver s on p. We see rp,s = 1 if s
achieved the best observed performance on p and rp,s > 1 otherwise. If solver s failed on p, then
rp,s can be set to a sufficiently large number. Finally, performance profiles are plots of the function
fs (σ) = |{p : rp,s ≤ σ}| / |P|,
Table 5.1: List of solvers available on NEOS [18].

solver     type  method  AMPL?  ∇2 F ?  reference
L-BFGS-B   BC    LS      *              [13]
TRON       BC    TR             *       [46]
CONOPT     NC    LS      *              [21]
filter     NC    TR      *      *       [26]
KNITRO     NC    LS/TR   *      *       [12]
LANCELOT   NC    TR      *      *       [17]
LOQO       NC    LS      *      *       [58]
LRAMBO     NC    LS             *
MINOS      NC    LS      *              [51]
PATHNLP    NC    ?              *
SNOPT      NC    LS      *              [34]
NMTR       UC    TR             *       [49]

UC unconstrained; BC bound constrained; NC nonlinearly constrained; LS line search; TR trust region.
which is the fraction of problems that s was able to solve with performance ratio at most σ. The
profile functions fs (σ) are drawn for all s ∈ S on the same plot. Solvers with greater area under the
profile curve exhibit better relative performance on the test set.
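For reference, the profile curves are simple to compute. The sketch below assumes the experiment harness supplies a metric array with one row per problem and one column per solver, and that failures are marked with NaN; these conventions are ours, not from the text.

import numpy as np

def performance_profile(t, fail_ratio=1e6):
    """Compute Dolan-More performance profile curves f_s(sigma).

    Sketch: t is a (num_problems, num_solvers) array of performance metrics,
    with np.nan marking a failure; failures get a large ratio so they never
    count as solved.  Returns the sigma grid and one curve per solver.
    """
    t = np.asarray(t, dtype=float)
    best = np.nanmin(t, axis=1, keepdims=True)        # t_{p,min} per problem
    r = t / best                                      # ratios r_{p,s}
    r[np.isnan(r)] = fail_ratio                       # failed runs
    sigma = np.unique(r[r < fail_ratio])              # breakpoints of the profiles
    # f_s(sigma) = fraction of problems with r_{p,s} <= sigma
    profiles = (r[:, None, :] <= sigma[None, :, None]).mean(axis=0)
    return sigma, profiles                            # profiles[j, s] = f_s(sigma[j])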
A possible criticism of this performance profile is that it may give misleading results if applied
to a set of problems that vary widely in difficulty. Differences in the performance metric on easy
problems (small tp,min ) will have a much larger impact on relative performance when compared to
performance differences on hard problems (large tp,min ). Thus, the overall profile for a solver that
does well on hard problems may look poor if it performs marginally worse on easy problems. In our
situation this is not an issue. In the next section we compare solvers on different instances of the
same problem. In the following sections we use the CUTEr test set. In both cases, the problems do
not have a large variation in difficulty.
5.2
Hamiltonian cycle problem (HCP)
A Hamiltonian cycle (HC) is a path through an undirected graph that follows edges to visit each node
exactly once and returns to the start. Finding such cycles in a graph is known as the Hamiltonian
cycle problem and is one of Karp’s 21 problems shown to be NP-complete [44]. It is simple to check
whether a given cycle is Hamiltonian. However, there is no known algorithm guaranteed to find an
HC, or report that one does not exist, in time proportional to a polynomial function of the number
of nodes and edges.
Ejov, Filar, Murray, and Nguyen derived an interesting continuous formulation of HCP [22].
Consider a graph with n nodes and m edges. The variables in the problem are weights of an
adjacency matrix, denoted P (x). Each undirected edge is made of two directed edges connecting
the same nodes in opposite directions. There is one variable per directed edge. Therefore, x has 2m
elements. The optimization problem is
minimize_x  G(x) = det(I − P (x) + (1/N ) eeT )   subject to   P (x)e = e,  x ≥ 0,     (5.3)
where e is a vector of ones. If the graph has a Hamiltonian cycle, then it can be shown that x∗ is a
global minimizer for (5.3) if G(x∗ ) = −N . The elements of x∗ are 0 or 1, where the 1’s correspond to
edges that are included in the cycle. Global minimizers also result in a doubly stochastic adjacency
matrix P (x∗ ), which satisfies the additional constraint P (x∗ )T e = e. Filar, Haythorpe, and Murray
derived an efficient method to evaluate the first and second derivatives of G(x) [25].
It turns out that using second derivatives and directions of negative curvature is particularly
important in finding HCs using (5.3). Preliminary experiments with SNOPT all failed to find an
HC, with many of the runs terminating at saddle points. Haythorpe developed a specialized interior
point method for (5.3), which uses second derivatives and directions of negative curvature [43].
In this experiment, we compared ARCOPT with IPOPT on (5.3). Both solvers use second derivatives, but only ARCOPT makes explicit use of directions of negative curvature. During initial testing,
we found that adding the constraint P (x)T e = e to (5.3) improved the likelihood of finding an HC
with both solvers. We included the additional constraint for the results reported here.
5.2.1
10, 12, and 14 node cubic graphs
Cubic graphs are of interest because they represent difficult instances of HCP [42]. Variables corresponding to a node with only two edges may be fixed, because the cycle is forced to use each edge.
Nodes in a cubic graph all have three edges and there is no possibility for reduction. Adding more
edges only increases the likelihood of a Hamiltonian cycle. Cubic graphs are the most sparse and
thus are the least likely to contain HCs in general.
Figure 5.1 shows a performance profile comparing ARCOPT and IPOPT on all 10, 12, and 14
node cubic graphs with HCs provided by Haythorpe [42]. Graphs without HCs were excluded. The
test set includes a total of 571 graphs. In all cases both solvers were started at the analytic center
of the feasible region. From this point, ARCOPT was able to find an HC in 79% (452) of the graphs
while IPOPT solved 52% (298). This is indicated in Figure 5.1 by the maximum height of the plots
corresponding to each solver. In general, ARCOPT required significantly fewer function evaluations
as summarized in Table 5.2.
5.2.2 24, 30, and 38 node cubic graphs
ARCOPT and IPOPT were also tested on individual cubic graphs with 24, 30, and 38 nodes as well
as a 30 node graph of degree 4. The graphs contained HCs and were provided by Haythorpe [42].
HCP becomes more difficult as the number of nodes is increased. Starting from the analytic center
of the feasible region, ARCOPT found an HC in the 30 node graph of degree 4 and the 38 node cubic
graph. IPOPT was unable to find an HC in any of these graphs when started from the same points.
To assess the relative likelihood of finding HCs, we performed an experiment using random
starting points. For each graph, 20 random feasible starting points were generated in the following
manner. First, a vector v was produced by sampling each element from a uniform distribution over
[0, 1]. The starting points were then selected by solving
minimize
x
kv − xk22
subject to Ax = e
x ≥ 0,
where the linear constraints correspond to those for (5.3). Both solvers were started from the same
points.
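The sketch below illustrates one way to generate such a point. It uses SciPy's SLSQP solver for the small projection problem purely for illustration (the thesis does not state which QP solver was used), and A stands for whatever constraint matrix corresponds to (5.3).

```python
import numpy as np
from scipy.optimize import minimize

def random_feasible_start(A, rng):
    """Project a uniform random vector onto {x : A x = e, x >= 0}."""
    m, n = A.shape
    e = np.ones(m)
    v = rng.uniform(0.0, 1.0, size=n)

    obj = lambda x: 0.5 * np.sum((v - x) ** 2)   # 0.5 ||v - x||^2
    grad = lambda x: x - v
    cons = {"type": "eq", "fun": lambda x: A @ x - e, "jac": lambda x: A}
    res = minimize(obj, v, jac=grad, method="SLSQP",
                   bounds=[(0.0, None)] * n, constraints=[cons])
    return res.x

# Toy usage with a tiny constraint matrix (each row must sum to one).
rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
x0 = random_feasible_start(A, rng)
```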
The results are summarized in Table 5.3 and Figure 5.2. Compared to IPOPT, ARCOPT was
able to find HCs in larger graphs more reliably and efficiently.
[Figure 5.1: Performance profile on HCP comparing ARCOPT and IPOPT on all 10, 12, and 14 node cubic graphs with Hamiltonian cycles. Horizontal axis: relative performance; vertical axis: fraction solved; one curve each for arcopt and ipopt.]
nodes   #     HC
10      19    17    HCs found                        ARCOPT: 15 (88.2%)    IPOPT: 7 (41.2%)
                    total function evaluations       ARCOPT: 275           IPOPT: 1430
                    average function evaluations     ARCOPT: 16.2          IPOPT: 84.1
12      85    80    HCs found                        ARCOPT: 67 (83.8%)    IPOPT: 39 (48.8%)
                    total function evaluations       ARCOPT: 1532          IPOPT: 8120
                    average function evaluations     ARCOPT: 19.1          IPOPT: 101.5
14      509   474   HCs found                        ARCOPT: 370 (78.1%)   IPOPT: 252 (53.2%)
                    total function evaluations       ARCOPT: 10804         IPOPT: 53598
                    average function evaluations     ARCOPT: 22.8          IPOPT: 113.1
all     613   571   HCs found                        ARCOPT: 452 (79.2%)   IPOPT: 298 (52.2%)
                    total function evaluations       ARCOPT: 12611         IPOPT: 63148
                    average function evaluations     ARCOPT: 22.1          IPOPT: 110.6

Table 5.2: Summary of results comparing ARCOPT and IPOPT on all 10, 12, and 14 node cubic graphs with Hamiltonian cycles. The number of function evaluations reported includes the runs for which the solver failed to find a HC. The averages are reported over all runs. The # column reports the total number of cubic graphs. The HC column reports the total number of cubic graphs with HCs.
[Figure 5.2: Average number of function evaluations to find HCs in cubic graphs of different sizes. Horizontal axis: number of nodes; vertical axis: average function evaluations; one curve each for arcopt and ipopt.]
nodes   degree
24      3       number solved                   ARCOPT: 10 (50%)   IPOPT: 3 (15%)
                total function evaluations      ARCOPT: 457        IPOPT: 363
                average function evaluations    ARCOPT: 45.7       IPOPT: 121
30      3       number solved                   ARCOPT: 10 (50%)   IPOPT: 3 (15%)
                total function evaluations      ARCOPT: 533        IPOPT: 1040
                average function evaluations    ARCOPT: 53.3       IPOPT: 346.7
38      3       number solved                   ARCOPT: 10 (50%)   IPOPT: 4 (20%)
                total function evaluations      ARCOPT: 633        IPOPT: 2575
                average function evaluations    ARCOPT: 63.3       IPOPT: 643.8
30      4       number solved                   ARCOPT: 17 (85%)   IPOPT: 8 (40%)
                total function evaluations      ARCOPT: 1573       IPOPT: 1421
                average function evaluations    ARCOPT: 92.5       IPOPT: 177.6

Table 5.3: Performance of ARCOPT and IPOPT on specific 24, 30, and 38 node cubic graphs and a 30 node graph of degree 4. For each graph the solvers were started from 20 randomly generated, feasible starting points. Function evaluations were reported for runs in which an HC was found.
5.3 The CUTEr test set
CUTEr stands for “Constrained and Unconstrained Test Environment revisited”. It is a large set of
optimization problems and associated tools created and maintained by Gould, Orban, and Toint [40].
This experiment compares ARCOPT, SNOPT, and IPOPT on:
• 127 unconstrained problems.
• 98 problems with bounds on the variables.
• 37 problems with a nonlinear objective and linear constraints.
There are many publicly available problems with linear constraints. However, the majority have
linear or convex quadratic objective functions. This thesis is focused on problems with general
nonlinear objective functions; thus we only test using the relatively small number of LC problems
in CUTEr with nonlinear nonquadratic objective functions.
The experiments used function evaluations as the performance metric. Each solver was given a
quota of 1000 iterations. The selected optimality and feasibility tolerances are shown in Table 5.4.
Runs for which solution function values were not within 10^-4 of the best solution were counted as
failures. The number of function evaluations taken by each solver on each problem is shown in
Tables A.1, A.2, and A.3 for unconstrained, bound constrained, and linearly constrained problems,
respectively.
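The sketch below shows how such a failure rule can be combined with the evaluation counts to build a cost matrix for a performance profile. It is an illustration of the criterion above, with the 10^-4 threshold read as an absolute tolerance (our assumption).

```python
import numpy as np

def mark_failures(fvals, nfevs, tol=1e-4):
    """Flag runs whose final objective is not within tol of the best.

    fvals[p, s] and nfevs[p, s] are the final objective value and the
    function-evaluation count of solver s on problem p (NaN for runs
    that already failed).  Returns a cost matrix with np.inf for
    failed runs, ready for a performance profile.
    """
    fvals = np.asarray(fvals, dtype=float)
    best = np.nanmin(fvals, axis=1, keepdims=True)
    failed = ~(np.abs(fvals - best) <= tol)   # NaNs also count as failures
    return np.where(failed, np.inf, np.asarray(nfevs, dtype=float))
```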
The results are summarized by performance profiles in Figures 5.3, 5.4, and 5.5. Note that
SNOPT does not use second derivatives and its under-performance was expected. IPOPT performed
the best on unconstrained problems, followed by ARCOPT and then SNOPT. ARCOPT achieved the
best observed performance on 49% of the bound constrained problems, followed by IPOPT at 37%
and SNOPT at 14%. IPOPT solved 81% of the bound constrained problems, followed by SNOPT at
77% and ARCOPT at 76%. For linearly constrained problems, ARCOPT and IPOPT produced
nearly equivalent profiles and performed better than SNOPT. Both IPOPT and SNOPT failed on 5
linearly constrained problems, while ARCOPT failed on 4.
solver    parameter                      value
ARCOPT    itermax                        1000
          ptol (primal tolerance)        1e-6
          dtol (dual tolerance)          1e-6
SNOPT     major iterations limit         1000
          major optimality tolerance     1e-6
          major feasibility tolerance    1e-6
IPOPT     max iter                       1000
          tol                            1e-6
          constr viol tol                1e-6
          compl inf tol                  1e-6

Table 5.4: Solver settings for CUTEr experiments.
[Figure 5.3: Performance profile on unconstrained problems. Horizontal axis: relative performance; vertical axis: fraction solved; curves for arcopt, snopt, and ipopt.]
[Figure 5.4: Performance profile on bound constrained problems. Horizontal axis: relative performance; vertical axis: fraction solved; curves for arcopt, snopt, and ipopt.]
[Figure 5.5: Performance profile on linearly constrained problems. Horizontal axis: relative performance; vertical axis: fraction solved; curves for arcopt, snopt, and ipopt.]
5.4 Quasi-Newton methods
Quasi-Newton methods emulate Newton's method by maintaining an approximation to the second derivative:

   B_k ≈ H_k.
The approximation is updated each iteration with a formula. Let s_k = x_{k+1} − x_k and y_k = g_{k+1} − g_k.
Two well known updates are

   B_{k+1} = B_k + (y_k − B_k s_k)(y_k − B_k s_k)^T / ((y_k − B_k s_k)^T s_k)                  (SR1)

and

   B_{k+1} = B_k − (B_k s_k s_k^T B_k) / (s_k^T B_k s_k) + (y_k y_k^T) / (y_k^T s_k).          (BFGS)

The updates are constructed to satisfy the secant equation,

   B_{k+1} s_k = y_k,                                                                          (5.4)
which requires the updated approximation to match the most recent gradient evaluations.
The symmetric rank-1 update (SR1) is constructed by selecting a vector v and scalar σ such
that B_{k+1} = B_k + σ v v^T and (5.4) are satisfied. The problem in the context of line search is that
B_{k+1} may not be positive definite even if B_k is. This presents the same problems as using an
indefinite or singular Hessian when computing a descent direction. The SR1 update breaks down if
the denominator is too small. Nocedal and Wright suggest that the update only be applied if

   |s_k^T (y_k − B_k s_k)| ≥ r ‖s_k‖ ‖y_k − B_k s_k‖,                                          (5.5)

where r ∈ (0, 1) with a suggested value of r = 10^-8 [53, p. 145].
The BFGS update is constructed such that B_{k+1} maintains positive definiteness. Vectors s_k and
y_k must satisfy

   s_k^T y_k > 0                                                                               (5.6)

in order to keep B_{k+1} ≻ 0. In the context of line search, (5.6) is met if α_k is selected to satisfy the
curvature condition (2.9) when φ_k''(0) ≥ 0. One issue with BFGS is that (5.6) may not be satisfied
when a constraint is encountered. Second, the curvature condition (2.9) for general arcs may not
directly imply (5.6), leading to degraded performance if the update is used. There are methods to
handle the case where (5.6) is not met, e.g. the damped BFGS update [53, p. 537].
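A minimal dense sketch of the two updates with the skipping rules just described is given below. It is illustrative only and does not reproduce ARCOPT's reduced-Hessian implementation or the limited-memory variants discussed later.

```python
import numpy as np

def sr1_update(B, s, y, r=1e-8):
    """SR1 update of an approximate Hessian B, skipped per (5.5)."""
    w = y - B @ s
    if abs(s @ w) < r * np.linalg.norm(s) * np.linalg.norm(w):
        return B                      # denominator too small: skip the update
    return B + np.outer(w, w) / (s @ w)

def bfgs_update(B, s, y):
    """BFGS update of B, skipped when the curvature condition (5.6) fails."""
    if s @ y <= 0:
        return B                      # s'y <= 0 would destroy positive definiteness
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# Toy usage on a quadratic f(x) = 0.5 x'Hx with H = diag(1, 10).
H = np.diag([1.0, 10.0])
B = np.eye(2)
x0, x1 = np.array([1.0, 1.0]), np.array([0.5, 0.2])
s, y = x1 - x0, H @ (x1 - x0)         # gradient difference for a quadratic
B = sr1_update(B, s, y)
```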
Maintaining positive definiteness for BFGS updates is not particularly difficult for linearly constrained problems. However, there are no assurances with updates for problems with nonlinear
constraints. Although we do not address such problems here, it is of interest to investigate methods
that generalize easily to nonlinearly constrained problems.
This experiment compares the methods: line search with BFGS updates (BFGS-LINE), arc search
with BFGS updates (BFGS-ARC), and arc search with SR1 updates (SR1-ARC). NEM arcs as described in section 3.4 were used for arc search. All methods used the same code, which was adapted
from ARCOPT. The optimality and feasibility tolerances were set to 10^-6. The iteration limit was
set to 1000. SR1 updates were skipped if (5.5) was not satisfied. BFGS updates were skipped if arc
or line search returned a step size that did not satisfy the curvature condition (2.9).
The results are summarized with performance profiles in Figures 5.6 and 5.7 for unconstrained
and bound constrained problems respectively. Listings of function evaluation counts are shown in
Tables A.4 and A.5. On unconstrained problems, SR1-ARC outperformed the other methods. On
bound constrained problems, SR1-ARC and BFGS-LINE had similar performance initially, but
BFGS-LINE was able to solve a greater total number of problems. BFGS-ARC had the worst performance
on both test sets.
[Figure 5.6: Performance profile for quasi-Newton experiments on unconstrained problems. Horizontal axis: relative performance; vertical axis: fraction solved; curves for bfgs-line, bfgs-arc, and sr1-arc.]
[Figure 5.7: Performance profile for quasi-Newton experiments on bound constrained problems. Horizontal axis: relative performance; vertical axis: fraction solved; curves for bfgs-line, bfgs-arc, and sr1-arc.]
Chapter 6
Conclusions

6.1 Contributions
The first contribution of this thesis is the definition of a general arc search method for linearly
constrained optimization. We show conditions under which convergence to second-order critical
points is guaranteed. We also demonstrate the application of the convergence theory to a new arc
that we designate NEM, and to several known methods: line search, curvilinear search, and modified
gradient flow.
The second contribution is ARCOPT, an implementation of a reduced-gradient method using a
NEM arc search for linearly constrained optimization. ARCOPT takes advantage of sparsity in the
linear constraints and uses iterative methods for operations involving the reduced Hessian. These
features allow ARCOPT to scale to problems with many variables. We document several practical
considerations that arise in the construction of an arc search code. For example, the EXPAND
procedure is a convenient way to remove undesirable roots in arc-constraint intersection equations
when a constraint is deleted.
Numerical experiments with ARCOPT demonstrate good performance relative to SNOPT and
IPOPT. ARCOPT outperforms IPOPT in both solution quality and efficiency on a continuous formulation of the Hamiltonian cycle problem. ARCOPT is competitive on a wide variety of problems in
the CUTEr test set. We also show that the arc search framework allows for increased flexibility in
solver development by comparing the BFGS and SR1 quasi-Newton updates in the same code.
6.2 Further work
The extension to nonlinear constraints could be done in a number of ways. The most direct approach
would be to use an arc search method, such as ARCOPT, to solve the linearly constrained subproblem
in a MINOS-type algorithm [30, 52]. A more challenging direction is to apply the ideas to an SQP
method such as the one described by Murray and Prieto [50]. Note that Murray and Prieto’s method
uses a curvilinear search by necessity and treats linear and nonlinear constraints in the same manner.
Applying the ideas in this thesis would allow for the exploration of other arcs and separate treatment
for linear constraints.
This thesis presents results of numerical experiments using dense quasi-Newton updates in an arc
search method. The next step is to compare limited memory versions of the BFGS and SR1 updates,
which can be applied to problems with a large number of variables. In nonlinearly constrained
optimization, the Hessian of the Lagrangian need not be positive definite at a minimizer. Thus,
maintaining a positive definite quasi-Newton approximation is a serious challenge. To avoid skipping
updates, existing solvers employ various modifications, e.g. [34, Section 2.10] and [53, p. 536]. The
SR1 update may be applied more often and without modification, because it has a less restrictive
update requirement. Because the SR1 matrix may be indefinite, it is potentially a better approximation
to the Hessian of the Lagrangian.
Appendix A
Results tables

A.1 CUTEr results
Table A.1: Number of function evaluations taken to solve CUTEr unconstrained problems. The number of variables is indicated by column n. Solver failures are indicated by *. Failures due to suboptimal objective function value are indicated by @.

Problem    n     ARCOPT  SNOPT  IPOPT
AKIVA      2     7       26     7
ALLINITU   4     11      18     14
ARGLINA    200   2       6      2
ARGLINB    200   *       *      3
ARGLINC    200   *       *      3
ARWHEAD    1000  7       24     6
BARD       3     11      24     8
BDQRTIC    1000  11      91     10
BEALE      2     10      17     19
BIGGS6     6     *       @      120
BOX        1000  8       84     13
BOX3       3     9       24     14
BRKMCC     2     4       11     4
BROWNAL    10    10      18     8
BROWNDEN   4     9       41     8
BRYBND     1000  20      45     16
CHAINWOO   1000  *       *      249
CHNROSNB   50    83      233    92
CLIFF      2     390     30     @
COSINE     1000  10      @      13
CRAGGLVY   500   15      122    14
CUBE       2     48      43     57
CURLY10    1000  22      *      22
CURLY20    1000  30      *      27
CURLY30    1000  @       *      30
DENSCHNA   2     7       12     7
DENSCHNB   2     7       11     24
DENSCHNC   2     11      21     11
DENSCHND   3     33      98     @
DENSCHNE   3     26      43     25
DENSCHNF   2     7       12     7
DIXMAANA   1500  10      27     8
DIXMAANB   1500  9       31     12
DIXMAANC   1500  16      32     9
DIXMAAND   1500  10      40     10
DIXMAANE   1500  13      166    11
DIXMAANF   1500  25      129    27
DIXMAANG   1500  29      139    18
DIXMAANH   1500  29      144    17
DIXMAANI   1500  199     891    24
DIXMAANJ   1500  164     377    20
DIXMAANK   15    15      124    13
DIXMAANL   1500  193     471    25
DIXON3DQ   1000  87      *      2
DJTL       2     *       *      *
DQDRTIC    500   3       28     2
EDENSCH    36    13      119    13
EG2        1000  39      @      5
EIGENALS   110   141     222    31
EIGENBLS   110   87      *      196
EIGENCLS   462   229     *      288
ENGVAL1    1000  *       36     9
ENGVAL2    3     23      36     33
ERRINROS   50    814     296    @
EXPFIT     2     10      22     9
EXTROSNB   1000  *       *      2334
FLETCBV2   1000  25      3      2
FLETCBV3   1000  *       *      *
FLETCHBV   1000  *       *      *
FLETCHCR   1000  *       *      *
FMINSRF2   961   171     *      275
FMINSURF   961   127     582    311
FREUROTH   1000  @       92     @
GENROSE    500   491     *      1256
GROWTHLS   3     843     182    178
HAIRY      2     63      55     109
HATFLDD    3     22      27     26
HATFLDE    3     23      43     21
HATFLDFL   3     165     470    *
HEART6LS   6     *       *      1433
HEART8LS   8     367     *      188
HELIX      3     11      32     25
HIELOW     3     9       33     9
HILBERTA   10    6       39     3
HILBERTB   50    3       11     2
HIMMELBG   2     5       13     14
HIMMELBH   2     6       10     24
HUMPS      2     302     179    488
JENSMP     2     11      42     10
KOWOSB     4     21      36     23
LIARWHD    1000  13      42     12
LOGHAIRY   2     598     372    *
MANCINO    30    6       13     5
MARATOSB   2     *       *      1804
MEXHAT     2     66      53     40
MEYER3     3     *       *      459
MODBEALE   200   8       @      9
MOREBV     1000  2       307    2
MSQRTALS   529   206     *      73
MSQRTBLS   529   164     *      47
NCB20B     1000  22      *      20
NONCVXU2   100   37      *      @
NONCVXUN   100   @       758    @
NONDIA     1000  23      111    5
NONDQUAR   1000  184     683    17
NONMSQRT   529   *       *      1002
OSBORNEA   5     *       115    37
OSBORNEB   11    17      95     24
OSCIPATH   500   5       28     5
PFIT1LS    3     *       460    704
PFIT2LS    3     315     148    202
PFIT3LS    3     959     301    344
PFIT4LS    3     *       484    549
POWELLSG   1000  18      *      17
POWER      1000  30      137    @
ROSENBR    2     25      49     45
S308       2     10      14     15
SCHMVETT   1000  4       66     4
SENSORS    100   @       75     @
SINEVAL    2     63      90     110
SINQUAD    1000  *       120    21
SISSER     2     15      25     15
SNAIL      2     92      133    148
SPARSINE   1000  975     *      14
SPARSQUR   1000  20      45     17
SPMSRTLS   1000  19      180    20
SROSENBR   1000  10      50     13
TESTQUAD   1000  4       *      2
TOINTGOR   50    8       160    8
TOINTGSS   1000  3       @      2
TOINTPSP   50    19      61     83
TOINTQOR   50    3       47     2
TQUARTIC   1000  2       97     2
TRIDIA     1000  3       403    2
VARDIM     200   457     *      28
WOODS      1000  553     305    84
ZANGWIL2   2     2       7      2
iter fails (*)   18      29     6
fval fails (@)   4       5      8
total fails      22      34     14
Table A.2: Number of function evaluations taken to solve CUTEr bound constrained problems. The number of variables is indicated by column n. Solver failures are indicated by *. Failures due to suboptimal objective function value are indicated by @.

Problem    n      ARCOPT  SNOPT  IPOPT
3PK        30     580     447    12
ALLINIT    4      13      20     19
BDEXP      5000   @       75     @
BIGGSB1    100    200     *      14
BQP1VAR    1      2       4      6
BQPGABIM   50     12      23     18
BQPGASIM   50     12      25     18
CAMEL6     2      7       12     11
CHENHARK   100    38      *      13
CVXBQP1    100    100     @      @
DECONVB    61     30      @      @
EG1        3      @       @      8
EXPLIN     120    171     @      @
EXPLIN2    120    155     @      @
EXPQUAD    120    49      73     20
GRIDGENA   12482  5       *      8
HADAMALS   100    111     @      @
HART6      6      9       18     14
HATFLDC    25     5       26     6
HIMMELP1   2      12      19     12
HS1        2      32      50     53
HS110      200    @       *      *
HS2        2      7       16     17
HS25       3      @       @      43
HS3        2      4       9      5
HS38       4      57      119    77
HS3MOD     2      8       11     6
HS4        2      3       4      6
HS45       5      5       7      8
HS5        2      7       10     9
JNLBRNG1   529    100     100    13
JNLBRNG2   529    45      112    11
JNLBRNGA   529    496     82     11
JNLBRNGB   529    416     327    13
LINVERSE   1999   41      475    868
LOGROS     2      119     97     137
MCCORMCK   10000  7       116    8
MDHOLE     2      47      89     117
MINSURFO   5306   11      *      417
NCVXBQP1   100    104     @      @
NCVXBQP2   100    107     @      @
NCVXBQP3   100    @       20     @
NOBNDTOR   100    21      30     8
NONSCOMP   10000  10      131    @
OBSTCLAE   100    29      24     13
OBSTCLAL   100    38      21     14
OBSTCLBL   100    77      16     12
OBSTCLBM   100    50      14     10
OBSTCLBU   100    45      15     12
OSLBQP     8      2       6      12
PALMER1    4      31      32     1036
PALMER1A   6      *       99     92
PALMER1B   4      *       58     26
PALMER1E   8      @       245    @
PALMER2    4      14      43     2296
PALMER2A   6      *       117    205
PALMER2B   4      16      44     34
PALMER2E   8      *       247    22
PALMER3    4      30      @      412
PALMER3A   6      *       139    196
PALMER3B   4      20      47     15
PALMER3E   8      *       204    56
PALMER4    4      23      @      832
PALMER4A   6      *       101    119
PALMER4B   4      21      39     31
PALMER4E   8      *       176    46
PALMER5A   8      *       39     *
PALMER5B   9      *       *      5
PALMER5D   8      4       32     4
PALMER5E   8      *       *      *
PALMER6A   6      *       195    283
PALMER6E   8      *       139    60
PALMER7A   6      *       *      *
PALMER7E   8      *       *      *
PALMER8A   6      176     130    102
PALMER8E   8      *       89     30
PENTDI     5000   3       7      @
PROBPENL   500    31      6      6
PSPDOC     4      7       15     15
QR3DLS     155    231     *      114
QUDLIN     120    121     @      @
S368       100    @       28     @
SCOND1LS   502    *       *      2623
SIM2BQP    2      2       4      8
SIMBQP     2      4       8      8
SINEALI    1000   18      108    43
TORSION1   100    34      13     10
TORSION2   100    6       15     10
TORSION3   100    13      10     9
TORSION4   100    9       13     10
TORSION5   100    1       3      8
TORSION6   100    11      5      9
TORSIONA   100    42      14     10
TORSIONB   100    5       17     10
TORSIONC   100    17      11     8
TORSIOND   100    8       14     9
TORSIONE   100    1       3      8
TORSIONF   100    11      5      9
iter fails (*)    17      11     5
fval fails (@)    7       12     14
total fails       24      23     19
Table A.3: Number of function evaluations taken to solve CUTEr linearly constrained problems. The number of variables is indicated by column n. The number of linear constraints is indicated by column m. Solver failures are indicated by *. Failures due to suboptimal objective function value are indicated by @.

Problem    n     m    ARCOPT  SNOPT  IPOPT
DTOC1L     58    36   12      15     9
DTOC1L     598   396  12      24     9
EXPFITA    5     22   19      27     29
EXPFITB    5     102  32      32     33
EXPFITC    5     502  123     37     @
HAGER2     21    10   2       8      2
HAGER2     101   50   2       8      2
HAGER2     201   100  2       9      2
HAGER2     1001  500  2       11     2
HAGER4     21    10   8       15     11
HAGER4     101   50   28      12     10
HAGER4     201   100  53      11     10
HAGER4     1001  500  252     12     9
HIMMELBI   100   12   *       59     29
HIMMELBJ   45    14   *       195    *
HONG       4     1    14      13     9
HS105      8     1    @       @      25
HS112      10    3    39      33     18
HS119      16    8    42      23     @
HS24       2     3    4       7      14
HS36       3     1    4       @      13
HS37       3     2    6       @      12
HS41       4     1    6       10     12
HS49       5     2    16      37     17
HS50       5     3    10      23     9
HS54       6     1    @       @      16
HS55       6     6    1       3      @
HS62       3     1    8       17     9
HS86       5     10   14      13     11
HS9        2     1    6       10     6
HUBFIT     2     1    4       11     9
LOADBAL    31    31   64      57     16
ODFITS     10    6    15      34     11
PENTAGON   6     15   8       15     17
QC         9     4    11      @      @
STANCMIN   3     2    3       6      11
TFI3       3     101  13      6      16
iter fails (*)        2       0      1
fval fails (@)        2       5      4
total fails           4       5      5

A.2 Quasi-Newton results
Table A.4: Number of function evaluations taken to solve CUTEr unconstrained problems with quasi-Newton solvers. The number of variables is indicated by column n. Solver failures are indicated by *.

Problem    n     BFGS-LINE  BFGS-ARC  SR1-ARC
AKIVA      2     14         23        19
ALLINITU   4     16         15        15
ARGLINA    200   3          3         3
ARGLINB    200   *          *         *
ARGLINC    200   *          *         *
ARWHEAD    100   *          *         10
BARD       3     24         26        24
BDQRTIC    100   *          *         66
BEALE      2     19         19        20
BIGGS6     6     44         40        40
BOX        100   18         33        14
BOX3       3     19         19        15
BRKMCC     2     7          7         7
BROWNAL    200   12         7         26
BROWNDEN   4     32         *         *
BROYDN7D   100   94         99        118
BRYBND     100   81         88        55
CHAINWOO   100   569        569       232
CHNROSNB   10    100        75        97
CLIFF      2     71         74        61
COSINE     100   38         47        28
CRAGGLVY   100   *          *         186
CUBE       2     40         47        68
CURLY10    100   254        846       248
CURLY20    100   227        617       207
CURLY30    100   211        497       179
DECONVU    61    60         58        441
DENSCHNA   2     12         12        11
DENSCHNB   2     9          9         9
DENSCHNC   2     23         23        21
DENSCHND   3     97         108       67
DENSCHNE   3     46         41        33
DENSCHNF   2     13         16        15
DIXMAANA   300   23         23        13
DIXMAANB   300   36         36        17
DIXMAANC   300   47         47        21
DIXMAAND   300   61         60        24
DIXMAANE   300   518        518       123
DIXMAANF   300   468        468       131
DIXMAANG   300   517        517       135
DIXMAANH   300   591        591       162
DIXMAANI   300   *          *         510
DIXMAANJ   300   *          *         827
DIXMAANK   15    114        114       57
DIXMAANL   300   *          *         1376
DIXON3DQ   100   173        165       152
DJTL       2     *          *         *
DQDRTIC    100   27         25        8
DQRTIC     100   302        268       208
EDENSCH    36    142        142       73
EG2        1000  5          5         5
EIGENALS   110   108        117       101
EIGENBLS   110   584        596       944
EIGENCLS   462   1413       1259      2801
ENGVAL1    100   68         68        38
ENGVAL2    3     36         36        40
ERRINROS   10    128        611       589
EXPFIT     2     19         17        19
EXTROSNB   100   *          *         *
FLETCBV2   100   203        2020      456
FLETCBV3   100   13         13        13
FLETCHBV   100   *          *         *
FLETCHCR   100   748        1468      928
FMINSRF2   121   146        235       143
FMINSURF   121   124        184       108
FREUROTH   100   27         27        24
GENROSE    100   428        637       446
GROWTHLS   3     2          2         2
HAIRY      2     32         27        26
HATFLDD    3     27         26        27
HATFLDE    3     36         35        42
HATFLDFL   3     911        225       149
HEART6LS   6     977        9694      *
HEART8LS   8     2055       1758      *
HELIX      3     29         29        31
HIELOW     3     *          *         18
HILBERTA   10    39         39        9
HILBERTB   50    11         11        8
HIMMELBB   2     19         13        13
HIMMELBF   4     48         820       124
HIMMELBG   2     18         18        14
HIMMELBH   2     11         11        9
HUMPS      2     236        204       196
HYDC20LS   99    *          *         *
INDEF      100   21         9         9
JENSMP     2     *          *         61
KOWOSB     4     24         27        27
LIARWHD    100   20         20        18
LOGHAIRY   2     10         22        96
MANCINO    30    30         30        14
MARATOSB   2     1533       4455      5152
MEXHAT     2     51         52        49
MEYER3     3     *          *         *
MODBEALE   200   *          288       *
MOREBV     100   203        2146      347
MSQRTALS   100   187        175       170
MSQRTBLS   100   189        187       177
NCB20      110   118        148       121
NCB20B     100   327        565       282
NONCVXU2   100   870        866       388
NONCVXUN   100   673        716       376
NONDIA     100   18         18        12
NONDQUAR   100   1683       1594      1419
NONMSQRT   100   667        *         *
OSBORNEA   5     83         4623      3748
OSBORNEB   11    76         78        81
OSCIPATH   100   45         55        36
PALMER1C   8     157        *         109
PALMER1D   7     118        *         33
PALMER2C   8     151        *         295
PALMER3C   8     141        *         49
PALMER4C   8     142        *         55
PALMER5C   6     30         29        9
PALMER6C   8     137        *         44
PALMER7C   8     143        *         47
PALMER8C   8     134        *         47
PENALTY1   100   82         89        96
PENALTY2   100   *          *         *
PENALTY3   200   *          *         *
PFIT1LS    3     738        65        2356
PFIT2LS    3     1494       3704      2244
PFIT3LS    3     1328       3918      7888
PFIT4LS    3     1608       5264      5792
POWELLSG   36    49         46        36
POWER      100   916        936       464
QUARTC     100   302        268       208
ROSENBR    2     41         42        59
S308       2     14         14        14
SCHMVETT   100   89         156       76
SENSORS    100   73         48        82
SINEVAL    2     95         111       158
SINQUAD    100   21         *         16
SISSER     2     25         25        22
SNAIL      2     14         12        12
SPARSINE   100   303        303       134
SPARSQUR   100   108        109       71
SPMSRTLS   100   118        115       79
SROSENBR   100   15         19        18
TESTQUAD   1000  1357       2283      608
TOINTGOR   50    174        174       93
TOINTGSS   100   22         22        20
TOINTPSP   50    70         78        68
TOINTQOR   50    56         56        29
TQUARTIC   100   19         42        29
TRIDIA     100   103        99        105
VARDIM     100   33         33        33
WATSON     12    63         65        48
WATSON     31    75         220       1238
WOODS      100   42         39        32
ZANGWIL2   2     4          4         4
failures         18         28        14
Table A.5: Number of function evaluations taken to solve CUTEr bound constrained problems with quasi-Newton solvers. The number of variables is indicated by column n. Solver failures are indicated by *.

Problem    n     BFGS-LINE  BFGS-ARC  SR1-ARC
3PK        30    422        *         153
ALLINIT    4     19         18        20
ANTWERP    27    *          *         *
BDEXP      100   18         18        18
BIGGSB1    100   277        351       138
BLEACHNG   17    12         11        *
BQP1VAR    1     2          2         2
BQPGABIM   50    71         210       89
BQPGASIM   50    78         209       83
CAMEL6     2     13         13        12
CHARDIS0   200   3          3         3
CHEBYQAD   100   *          *         *
CHENHARK   100   237        378       125
CVXBQP1    100   108        107       101
DECONVB    61    197        *         243
EG1        3     12         11        11
EXPLIN     120   224        223       176
EXPLIN2    120   *          *         159
EXPQUAD    120   115        80        61
GRIDGENA   170   *          *         *
HARKERP2   100   334        377       311
HART6      6     22         40        22
HATFLDA    4     38         *         *
HATFLDB    4     36         *         *
HATFLDC    25    58         109       66
HIMMELP1   2     7          7         7
HS1        2     24         22        23
HS110      200   3          3         3
HS2        2     16         17        10
HS25       3     1          1         1
HS3        2     9          9         9
HS38       4     37         39        32
HS3MOD     2     9          7         5
HS4        2     3          3         3
HS45       5     4          4         5
HS5        2     12         17        13
JNLBRNG1   100   65         62        61
JNLBRNG2   100   41         39        35
JNLBRNGA   100   51         91        50
JNLBRNGB   100   51         86        41
KOEBHELB   3     211        *         2301
LINVERSE   199   *          *         *
LOGROS     2     114        168       225
MAXLIKA    8     100        284       118
MCCORMCK   100   13         13        14
MDHOLE     2     93         94        127
NCVXBQP1   100   103        103       101
NCVXBQP2   100   111        111       106
NCVXBQP3   100   117        115       106
NOBNDTOR   100   66         102       38
NONSCOMP   100   243        70        48
OBSTCLAE   100   62         69        48
OBSTCLAL   100   47         70        27
OBSTCLBL   100   53         51        43
OBSTCLBM   100   58         63        67
OBSTCLBM   100   58         63        67
OSLBQP     8     2          2         2
PALMER1    4     30         40        39
PALMER1A   6     150        2338      1287
PALMER1B   4     57         68        75
PALMER1E   8     238        *         *
PALMER2    4     22         38        27
PALMER2A   6     98         2268      9689
PALMER2B   4     39         41        41
PALMER2E   8     237        *         *
PALMER3    4     45         *         77
PALMER3A   6     126        3136      1276
PALMER3B   4     41         52        46
PALMER3E   8     196        *         *
PALMER4    4     39         60        79
PALMER4A   6     90         943       471
PALMER4B   4     *          47        47
PALMER4E   8     165        *         *
PALMER5A   8     *          *         *
PALMER5B   9     457        *         *
PALMER5D   8     26         24        8
PALMER5E   8     2170       *         4067
PALMER6A   6     190        1916      4636
PALMER6E   8     173        *         9223
PALMER7A   6     *          *         *
PALMER7E   8     2247       *         *
PALMER8A   6     166        446       221
PALMER8E   8     112        795       998
PENTDI     100   4          4         4
POWELLBC   200   *          *         *
PROBPENL   100   3          3         3
PSPDOC     4     14         16        14
QR3DLS     40    163        222       245
QRTQUAD    120   *          *         117
QUDLIN     120   188        188       147
S368       100   *          45        48
SCOND1LS   102   *          *         *
SIM2BQP    2     2          2         2
SIMBQP     2     4          4         4
SINEALI    100   61         62        *
SPECAN     9     50         61        33
TORSION1   100   39         84        14
TORSION2   100   54         48        56
TORSION3   100   6          8         6
TORSION4   100   23         33        32
TORSION5   100   1          1         1
TORSION6   100   9          5         5
TORSIONA   100   35         74        55
TORSIONB   100   46         122       61
TORSIONC   100   6          8         6
TORSIOND   100   26         37        48
TORSIONE   100   1          1         1
TORSIONF   100   12         5         5
WEEDS      3     51         189       251
YFIT       3     87         1567      1293
failures         12         24        18
Bibliography
[1] Filippo Aluffi-Pentini, Valerio Parisi, and Francesco Zirilli. A differential-equations algorithm
for nonlinear equations. ACM Trans. Math. Softw., 10(3):299–316, August 1984.
[2] Alfred Auslender. Computing points that satisfy second order necessary optimality conditions
for unconstrained minimization. SIAM Journal on Optimization, 20(4):1868–1884, 2010.
[3] William Behrman. An efficient gradient flow method for unconstrained optimization. PhD
thesis, Stanford University, June 1998.
[4] Dimitri P. Bertsekas. Nonlinear Programming. Athena Scientific, 2nd edition, 1999.
[5] Paul T. Boggs. An algorithm, based on singular perturbation theory, for ill-conditioned minimization problems. SIAM Journal on Numerical Analysis, 14(5):830–843, 1977.
[6] C. A. Botsaris and D. H. Jacobson. A Newton-type curvilinear search method for optimization.
Journal of Mathematical Analysis and Applications, 54(1):217–229, April 1976.
[7] Charalampos A. Botsaris. Differential gradient methods. Journal of Mathematical Analysis and
Applications, 63(1):177 – 198, 1978.
[8] Charalampos A. Botsaris. An efficient curvilinear method for the minimization of a nonlinear function subject to linear inequality constraints. Journal of Mathematical Analysis and
Applications, 71(2):482 – 515, 1979.
[9] Charalampos A. Botsaris. A Newton-type curvilinear search method for constrained optimization. Journal of Mathematical Analysis and Applications, 69(2):372 – 397, 1979.
[10] Mary A. Branch, Thomas F. Coleman, and Yuying Li. A subspace, interior, and conjugate
gradient method for large-scale bound-constrained minimization problems. SIAM Journal on
Scientific Computing, 21(1):1–23, 1999.
[11] A. A. Brown and M. C. Bartholomew-Biggs. Some effective methods for unconstrained optimization based on the solution of systems of ordinary differential equations. Journal of Optimization
Theory and Applications, 62:211–224, 1989.
[12] Richard Byrd, Jorge Nocedal, and Richard Waltz. KNITRO: An integrated package for nonlinear optimization. In G. Pillo, M. Roma, and Panos Pardalos, editors, Large-Scale Nonlinear
Optimization, volume 83 of Nonconvex Optimization and Its Applications, pages 35–59. Springer
US, 2006.
[13] Richard H. Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. A limited memory algorithm
for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190–1208,
1995.
86
BIBLIOGRAPHY
87
[14] Richard H. Byrd, Robert B. Schnabel, and Gerald A. Shultz. Approximate solution of the trust
region problem by minimization over two-dimensional subspaces. Mathematical Programming,
40:247–263, 1988.
[15] Andrew R. Conn, Nicholas I. M. Gould, Dominique Orban, and Philippe L. Toint. A primaldual trust-region algorithm for non-convex nonlinear programming. Mathematical Programming,
87:215–249, 2000.
[16] Andrew R. Conn, Nicholas I. M. Gould, and Philippe L. Toint. Trust Region Methods. Society
for Industrial and Applied Mathematics, Philadephia, PA, 2000.
[17] Andrew R. Conn, Nick Gould, and Philippe L. Toint. Numerical experiments with the
LANCELOT package (Release A) for large-scale nonlinear optimization. Mathematical Programming, 73:73–110, 1996.
[18] Joseph Czyzyk, Michael P. Mesnier, and Jorge J. Moré. The NEOS server. Computational
Science &amp; Engineering, IEEE, 5(3):68–75, July–September 1998.
[19] Elizabeth D. Dolan and Jorge J. Moré. Benchmarking optimization software with performance
profiles. Mathematical Programming, 91:201–213, 2002.
[20] Elizabeth D. Dolan, Jorge J. Moré, and Todd S. Munson. Optimality measures for performance
profiles. SIAM Journal on Optimization, 16(3):891–909, 2006.
[21] Arne S. Drud. CONOPT: A large-scale GRG code. ORSA Journal on Computing, 6(2):207–216,
1994.
[22] Vladimir Ejov, Jerzy A. Filar, Walter Murray, and Giang T. Nguyen. Determinants and longest
cycles of graphs. SIAM Journal on Discrete Mathematics, 22(3):1215–1225, 2008.
[23] Haw-ren Fang and Dianne O'Leary. Modified Cholesky algorithms: a catalog with new approaches. Mathematical Programming, 115:319–349, 2008.
[24] M. C. Ferris, S. Lucidi, and M. Roma. Nonmonotone curvilinear line search methods for unconstrained optimization. Computational Optimization and Applications, 6:117–136, 1996.
[25] Jerzy A. Filar, Michael Haythorpe, and Walter Murray. On the determinant and its derivatives of the rank-one corrected generator of a Markov chain on a graph. Journal of Global
Optimization, pages 1–16, 2012.
[26] Roger Fletcher and Sven Leyffer. Nonlinear programming without a penalty function. Mathematical Programming, 91:239–269, 2002.
[27] Anders Forsgren and Walter Murray. Newton methods for large-scale linear equality-constrained
minimization. SIAM Journal on Matrix Analysis and Applications, 14(2):560–587, 1993.
[28] Anders Forsgren and Walter Murray. Newton methods for large-scale linear inequalityconstrained minimization. SIAM Journal on Optimization, 7(1):162–176, 1997.
[29] Robert Fourer, David M. Gay, and Brian W. Kernighan. AMPL: A Modeling Language for
Mathematical Programming. Duxbury Press, 2nd edition, 2002.
[30] Michael P. Friedlander. A Globally Convergent Linearly Constrained Lagrangian Method for
Nonlinear Optimization. PhD thesis, Stanford University, August 2002.
BIBLIOGRAPHY
88
[31] Antonino Del Gatto. A Subspace Method Based on a Differential Equation Approach to Solve
Unconstrained Optimization Problems. PhD thesis, Stanford University, June 2000.
[32] David Gay. A trust-region approach to linearly constrained optimization. In David Griffiths, editor, Numerical Analysis, volume 1066 of Lecture Notes in Mathematics, pages 72–105. Springer
Berlin / Heidelberg, 1984.
[33] Philip E. Gill and Walter Murray. Newton-type methods for unconstrained and linearly constrained optimization. Mathematical Programming, 7:311–350, 1974.
[34] Philip E. Gill, Walter Murray, and Michael A. Saunders. SNOPT: An SQP algorithm for
large-scale constrained optimization. SIAM Review, 47(1):99–131, 2005.
[35] Philip E. Gill, Walter Murray, Michael A. Saunders, and Margaret H. Wright. Maintaining LU
factors of a general sparse matrix. Linear Algebra and its Applications, 88/89:239–270, 1987.
[36] Philip E. Gill, Walter Murray, Michael A. Saunders, and Margaret H. Wright. A practical anticycling procedure for linearly constrained optimization. Mathematical Programming, 45:437–
474, 1989.
[37] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Academic
Press, 1982.
[38] Donald Goldfarb. Curvilinear path steplength algorithms for minimization which use directions
of negative curvature. Mathematical Programming, 18:31–40, 1980.
[39] Nicholas I. M. Gould, Stefano Lucidi, Massimo Roma, and Philippe L. Toint. Exploiting negative curvature directions in linesearch methods for unconstrained optimization. Optimization
Methods & Software, 14(1-2):75–98, 2000.
[40] Nicholas I. M. Gould, Dominique Orban, and Philippe L. Toint. CUTEr and SifDec: A constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw., 29:373–
394, December 2003.
[41] L. Grippo, F. Lampariello, and S. Lucidi. A nonmonotone line search technique for Newton’s
method. SIAM Journal on Numerical Analysis, 23(4):707–716, 1986.
[42] Michael Haythorpe. Cubic graph data, 2010. Personal communication.
[43] Michael Haythorpe. Markov Chain Based Algorithms for the Hamiltonian Cycle Problem. PhD
thesis, University of South Australia, July 2010.
[44] Richard M. Karp. Reducibility among combinatorial problems. Complexity of Computer Computations, pages 85–103, 1972.
[45] R. B. Lehoucq, D. C. Sorensen, and C. Yang. ARPACK Users’ Guide: Solution of Large-Scale
Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. SIAM, 1998.
[46] Chih-Jen Lin and Jorge J. Moré. Newton’s method for large bound-constrained optimization
problems. SIAM Journal on Optimization, 9(4):1100–1127, 1999.
[47] Garth P. McCormick. A modification of Armijo’s step-size rule for negative curvature. Mathematical Programming, 13:111–115, 1977.
BIBLIOGRAPHY
89
[48] Jorge J. Moré and Danny C. Sorensen. On the use of directions of negative curvature in a
modified Newton method. Mathematical Programming, 16:1–20, 1979.
[49] Jorge J. Moré and Danny C. Sorensen. Computing a trust region step. SIAM Journal on
Scientific and Statistical Computing, 4(3):553–572, 1983.
[50] Walter Murray and Francisco J. Prieto. A second derivative method for nonlinearly constrained optimization. Technical report, Stanford University, Systems Optimization Laboratory, 1995.
[51] B. A. Murtagh and M. A. Saunders. Large-scale linearly constrained optimization. Mathematical
Programming, 14:41–72, 1978.
[52] Bruce A. Murtagh and Michael A. Saunders. A projected Lagrangian algorithm and its implementation for sparse nonlinear constraints. In Algorithms for Constrained Minimization of
Smooth Nonlinear Functions, volume 16 of Mathematical Programming Studies, pages 84–117.
Springer Berlin Heidelberg, 1982.
[53] Jorge Nocedal and Stephen J. Wright. Numerical Optimization. Springer, 2nd edition, 2006.
[54] Gerald A. Shultz, Robert B. Schnabel, and Richard H. Byrd. A family of trust-region-based
algorithms for unconstrained minimization with strong global convergence properties. SIAM
Journal on Numerical Analysis, 22(1):47–67, 1985.
[55] Danny C. Sorensen. Newton’s method with a model trust region modification. SIAM Journal
on Numerical Analysis, 19(2):409–426, 1982.
[56] Trond Steihaug. The conjugate gradient method and trust regions in large scale optimization.
SIAM Journal on Numerical Analysis, 20(3):626–637, 1983.
[57] Paul Tseng. Convergent infeasible interior-point trust-region methods for constrained minimization. SIAM Journal on Optimization, 13(2):432–469, October 2002.
[58] Robert J. Vanderbei and David F. Shanno. An interior-point algorithm for nonconvex nonlinear
programming. Computational Optimization and Applications, 13:231–252, 1999.
[59] Jean-Philippe Vial and Israel Zang. Unconstrained optimization by approximation of the gradient path. Mathematics of Operations Research, 2(3):253–265, 1977.
[60] Andreas Wächter and Lorenz T. Biegler. On the implementation of an interior-point filter linesearch algorithm for large-scale nonlinear programming. Mathematical Programming, 106:25–57,
2006.
[61] Jianzhong Zhang and Chengxian Xu. A class of indefinite dogleg path methods for unconstrained
minimization. SIAM Journal on Optimization, 9(3):646–667, 1999.