Approaches to accommodate remeshing in shape optimization

by

Daniel Nicolas Wilke

A thesis submitted in partial fulfillment of the requirements for the degree Philosophiae Doctor (Mechanical Engineering) in the Department of Mechanical Engineering, Faculty of Engineering, the Built Environment and Information Technology, University of Pretoria

Pretoria
August 2010

© University of Pretoria

Synopsis

This study proposes novel optimization methodologies for the optimization of problems that reveal non-physical step discontinuities. More specifically, it is proposed to use gradient-only techniques, which use no zeroth order information at all, for step discontinuous problems. A step discontinuous problem of note is the shape optimization problem in the presence of remeshing strategies, since changes in mesh topology may, and normally do, introduce non-physical step discontinuities. These discontinuities may in turn manifest themselves as non-physical local minima in which optimization algorithms may become trapped. Conventional optimization approaches for step discontinuous problems include evolutionary strategies and design of experiment (DoE) techniques. These conventional approaches typically rely on the exclusive use of zeroth order information to overcome the discontinuities, but are characterized by two important shortcomings. Firstly, the computational demands of zeroth order methods may be very high, since many function values are in general required. Secondly, the use of zeroth order information only does not guarantee that the algorithms will not terminate in highly unfit local minima. In contrast, the methodologies proposed herein use only first order information, rather than only zeroth order information. The motivation for this approach is that the associated gradient information in the presence of remeshing remains accurately and uniquely computable, notwithstanding the presence of discontinuities.
From a computational effort point of view, a gradient-only approach is comparable to conventional gradient-based techniques. In addition, the step discontinuities do not manifest themselves as local minima.

KEYWORDS: shape optimization; gradient-only optimization; unstructured remeshing; truss analogy; analytical sensitivity analysis; consistent tangent; local minima; step discontinuity; partial differential equation; non-constant discretization; error indicator; r-refinement; radial basis function; variable discretization

Sinopsis

Hierdie studie stel 'n nuwe optimerings-metodologie vir die optimering van probleme met nie-fisiese trap diskontinuïteite voor. In besonder word voorgestel om slegs-gradiënt tegnieke, wat glad nie nulde orde inligting benut nie, te gebruik vir trap diskontinue probleme. 'n Trap diskontinue probleem van belang is die vormoptimerings-probleem waar hermaas strategieë gebruik word, omdat veranderinge in maastopologie nie-fisiese trap diskontinuïteite mag veroorsaak. Hierdie diskontinuïteite mag om die beurt weer as nie-fisiese lokale minima te voorskyn kom, waarin optimeringsalgoritmes vasgevang kan raak. Konvensionele optimeringstegnieke vir trap diskontinue probleme sluit evolusionêre strategieë asook ontwerp van eksperiment (OvE) tegnieke in. Hierdie konvensionele tegnieke maak tipies staat op die uitsluitlike gebruik van nulde orde inligting om diskontinuïteite te oorkom, maar word gekarakteriseer deur twee tekortkominge: Eerstens, die berekeningskoste van nulde orde metodes mag baie hoog wees, omdat baie funksie evaluerings benodig word. Tweedens verseker die gebruik van slegs nulde orde inligting nie dat die algoritmes nie in ongewensde lokale minima termineer nie. In teenstelling hiermee gebruik die metodologie wat hierin voorgestel word slegs eerste orde inligting, in plaas van nulde orde inligting.
Die motivering vir hierdie benadering is dat geassosieerde gradiënt inligting in die aanwesigheid van hermasing akkuraat en uniek berekenbaar is, nieteenstaande die teenwoordigheid van diskontinuïteite. Vanuit 'n berekeningsoogpunt is 'n slegs-gradiënt metode natuurlik vergelykbaar met konvensionele gradiënt gebaseerde tegnieke. Boonop manifesteer trap diskontinuïteite hulself nie as lokale minima nie.

SLEUTELWOORDE: vormoptimering; slegs-gradiënt optimering; ongestruktureerde hermasing; vakwerk-analogie; analitiese sensitiwiteitsanalise; konsekwente gradiënt; lokale minima; stap diskontinuïteit; parsiële differensiaalvergelyking; nie-konstante diskretisering; fout aanwyser; r-verfyning; radiale basis funksie; veranderlike diskretisering

Acknowledgements

I would like to express my sincere appreciation to my supervisors Dr. Schalk Kok and Prof. Albert A. Groenwold for their guidance, assistance and patience throughout this study. It has been a pleasure and privilege working with you.

To Prof. Jan Snyman I would like to express my sincere gratitude for his guidance, support and motivation regarding the theoretical section of this study. I will cherish the discussions we have had in your office.

To the support staff of the University of Pretoria, thank you for the professional and friendly support and assistance that you have provided throughout my studies.

This material is based on work supported by the National Research Foundation of South Africa. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Research Foundation. To the National Research Foundation of South Africa, thank you for making this study financially viable.

Thank you to my beloved father (Piet Wilke) and mother (Brigitte Wilke) for raising me with love and care. Thank you for teaching me the value of education; I have truly been blessed.
To my siblings Johannes Wilke, Dries Wilke and Nina Brazer, thank you for always being there with a supporting shoulder throughout my studies. I look forward to returning the favour at last.

To the teachers who have taught me throughout my life, from my primary and secondary education at Grey College in Bloemfontein to my tertiary education at the University of the Free State and the University of Pretoria, thank you.

A memorable thank you to all my friends from all walks of life, local and abroad; I think of you fondly.

Typeset using LaTeX 2ε. Compiled under GNU/Linux.

Contents

Synopsis
Sinopsis
Acknowledgements
1 Overview
2 Remeshing shape optimization strategy
   2.1 Introduction
   2.2 Mesh generation
      2.2.1 Mesh generator based on a truss structure analogy
      2.2.2 Quadratic convergent Newton solver for the mesh generator
      2.2.3 Evaluation of the mesh generators
   2.3 Problem formulation
      2.3.1 Analytical sensitivities
      2.3.2 Gradient sensitivity comparison
   2.4 Numerical examples
      2.4.1 Example problem 1: Cantilever beam
      2.4.2 Example problem 2: Full spanner
      2.4.3 Example problem 3: Michell-like structure
   2.5 Conclusion
3 Applications of gradient-only optimization
   3.1 Introduction
   3.2 Definitions
   3.3 Problem formulation
      3.3.1 Unconstrained minimization problem
      3.3.2 Equality constrained minimization problems
   3.4 Optimization algorithms
      3.4.1 BFGS second order line search descent method
      3.4.2 Sequential spherical approximations (SSA)
   3.5 Example problems
      3.5.1 Numerical settings
      3.5.2 Example problem: temporal and spatial partial differential equations
      3.5.3 Example problems: spatial partial differential equations
      3.5.4 Example problems: temporal partial differential equations
   3.6 Conclusions
4 Theory of gradient-only optimization
   4.1 Introduction
      4.1.1 Univariate example problem: Newton's cooling law
      4.1.2 Multivariate example problem: Shape optimization
      4.1.3 Introductory comments
   4.2 Definitions
   4.3 Gradient-only optimization problem
      4.3.1 Discontinuous gradient projection points (GPP)
      4.3.2 Derivative descent sequences
   4.4 Proofs of convergence for derivative descent sequences
      4.4.1 Univariate functions
      4.4.2 Multivariate functions
   4.5 Practical algorithmic considerations
      4.5.1 Line search descent methods
      4.5.2 Approximation methods
      4.5.3 Conservative approximations
      4.5.4 Termination criteria
   4.6 Mathematical programming vs. gradient-only optimization
   4.7 Numerical study
      4.7.1 Results
      4.7.2 Shape optimization
      4.7.3 Analytical set of test problems
   4.8 Conclusions
5 Adaptive remeshing in shape optimization
   5.1 Introduction
   5.2 Shape optimization problem
   5.3 Optimization algorithm
   5.4 Structural analysis
      5.4.1 Recovery-based global error indicator
      5.4.2 Refinement procedure
   5.5 Adaptive mesh generator
      5.5.1 Boundary nodes
      5.5.2 Mapping the error field between candidate shape designs
   5.6 Sensitivity analysis
   5.7 Numerical study
      5.7.1 Gradient sensitivity comparison
      5.7.2 Convergence rates
      5.7.3 Cantilever beam
      5.7.4 Michell structure
      5.7.5 Spanner design
   5.8 Conclusions
6 Conclusion and recommendations

List of Figures

2.1 Computational effort required to solve the truss equilibrium with the Newton and the forward Euler methods.
2.2 Force residual comparison for an ideal element length of h0 = 0.375 with 1492 nodes.
2.3 Comparison between the different solvers in terms of a) mean element quality and b) mesh uniformity for the quarter circular disc.
2.4 a) Bow-tie structure defined by the indicated control variables and b) the associated mesh for an ideal element length of h0 = 2.0.
2.5 Initial structure and definition for the cantilever beam.
2.6 Cantilever beam: optimal shapes for a) 4, b) 7 and c) 13 control points respectively obtained with the Dynamic-Q algorithm.
2.7 Initial structure and loads for the full spanner problem.
2.8 Full spanner problem: optimal shapes for a) 4, b) 10 and c) 22 control points respectively obtained by the Dynamic-Q algorithm.
2.9 Initial structure and definition of half the Michell-like structure. (Control points x8 and x9 are indicated for the 16 NCP problems.)
2.10 Michell-like structure: optimal shapes for a) 4, b) 8 and c) 16 control points respectively obtained with the Dynamic-Q algorithm.
2.11 Vertical displacement at the point of load application for the variations of the 2 rightmost upper control variables (x8, x9) for the mesh depicted in Figure 2.10(c).
2.12 Michell-like structure: optimal shapes for ideal element lengths of a) 0.75, b) 0.375 and c) 0.1875 respectively for the 16 NCP problem obtained by the Dynamic-Q algorithm.
2.13 Vertical displacement at the point of load application for the variation of the rightmost upper control variable (x9) for the meshes depicted in Figures 2.12(a), 2.12(b) and 2.12(c).
2.14 Michell-like structure: effect of different starting points on the optimal shape for a) the coarse, and b) the fine mesh, obtained with the Dynamic-Q algorithm using 16 control points.
3.1 Plot depicting a piece-wise smooth step discontinuous numerical objective function fN(x) together with the corresponding underlying (unknown) continuous analytical objective function fA(x) of an optimization problem. In addition we depict a projected piece-wise smooth continuous objective function fC(x) obtained by removing the step discontinuities from fN(x).
3.2 Fin model with a uniform time varying heat flux q(t) input at the base, top surface convection with constant convection coefficient h and ambient temperature Ta. The design variables for the sizing problem are the width tw and height th of the triangular part of the fin.
3.3 (a) Function values, and (b) associated derivatives for the univariate transient heat transfer sizing problem. Note the sign change in f'A(x) at x*g = x* = 79.5.
3.4 Initial geometry of half the Michell-like structure using 16 control points x.
3.5 Michell-like structure: converged designs obtained with (a) BFGS(f), (b) BFGS(g), (c) SSA(f), and (d) SSA(g).
3.6 Michell-like structure: BFGS(f) and BFGS(g) algorithms convergence history plot of the (a) function value f(x{k}) and (b) gradient norm ‖∇A f(x{k})‖ and (c) the norm of the solution update ‖Δx{k}‖.
3.7 Michell-like structure: SSA(f) and SSA(g) algorithms convergence history plot of the (a) function value f(x{k}) and (b) gradient norm ‖∇A f(x{k})‖ and (c) the norm of the solution update ‖Δx{k}‖.
3.8 Michell-like structure: converged designs obtained with (a) BFGS(f), (b) BFGS(g), (c) SSA(f), and (d) SSA(g).
3.9 Michell-like structure: BFGS(f) and BFGS(g) algorithms convergence history plot of (a) the Lagrangian L(x{k}, λ{k}), (b) the norm of the Lagrangian gradient ‖∇A L(x{k}, λ{k})‖ and (c) the norm of the solution update ‖Δ[x{k} λ{k}]‖.
3.10 Michell-like structure: convergence histories for SSA(f) and SSA(g) of (a) the Lagrangian L(x{k}, λ{k}), (b) the norm of the Lagrangian gradient ‖∇A L(x{k}, λ{k})‖ and (c) the norm of the solution update ‖Δ[x{k} λ{k}]‖.
3.11 Function value and associated derivative along the search direction around the optimal point obtained with BFGS(f) for the linearly interpolated experimental data.
3.12 Function value and associated derivative along the search direction around the solution obtained with BFGS(g) for the linearly interpolated experimental data.
3.13 Modified Voce model: BFGS(f) and BFGS(g) algorithms convergence history plot of (a) the function value f(x{k}), (b) the gradient norm ‖∇A f(x{k})‖ and (c) the distance from the optimum ‖x* − x{k}‖, for the linearly interpolated experimental data.
3.14 Modified Voce model: SSA(f) and SSA(g) algorithms convergence history plot of (a) the function value f(x{k}), (b) the gradient norm ‖∇A f(x{k})‖ and (c) the distance from the optimum ‖x* − x{k}‖, for the linearly interpolated experimental data.
4.1 Numerical and analytical solutions for Newton's cooling law. (a) Temperature T after 1 second for 0.5 ≤ κ ≤ 2, and (b) the corresponding associated derivative dT(1)/dκ.
4.2 (a) Structure, boundary conditions and control variables and (b) the vertical displacement uF for variations of the two rightmost upper control variables (x8, x9) for the Michell shape optimization problem.
4.3 Upper and lower semi-continuous univariate functions with (a) an inconsistent step discontinuity, and (b) a consistent step discontinuity.
4.4 An illustration of (a) the function value and (b) the corresponding associated derivative that is either upper or lower semi-continuous, with a step discontinuous strict non-negative associated gradient projection point (S-NN-GPP) in (d, e).
4.5 Plots depicting (a)-(d) the function values, and (e)-(h) the corresponding associated derivatives of four instances of step discontinuous univariate functions.
4.6 Plots depicting a step discontinuous objective function with a (a) distinct minimizer x* and strict non-negative gradient projection point (S-NN-GPP) x*g and (b) coinciding minimizer x* and S-NN-GPP x*g.
4.7 Michell-like structure: converged designs obtained with (a) BFGS(f), (b) BFGS(g), (c) SSA(f), and (d) SSA(g).
4.8 Michell-like structure: BFGS(f) and BFGS(g) algorithms convergence history plot of the (a) function value f(x{k}) and (b) associated gradient norm ‖∇A f(x{k})‖ and (c) convergence tolerance ‖Δx{k}‖.
4.9 Michell-like structure: SSA(f) and SSA(g) algorithms convergence history plot of the (a) function value f(x{k}) and (b) associated gradient norm ‖∇A f(x{k})‖ and (c) convergence tolerance ‖Δx{k}‖.
5.1 FE-error indicator integration into optimization.
5.2 Bow-tie structure used to validate the (semi) analytical sensitivities and study the convergence behavior of the remeshing strategies.
5.3 (a) System degrees of freedom (SDOF) and (b) global error η{k} for the mesh convergence study on the bow-tie structure for initial uniform element lengths h0 = {1.5, 1, 0.8}.
5.4 Convergence study showing (a)-(c) the initial mesh, (d)-(f) the final mesh and (g)-(i) the final ideal element length field of the bow-tie structure for various initial uniform element lengths h0 = {1.5, 1, 0.8}.
5.5 Approximated displacement convergence rate for the bow-tie structure problem using the uniform and adaptive mesh generators.
5.6 Initial geometry of the cantilever beam using 13 control points x.
5.7 The cantilever beam convergence histories of (a) the Lagrangian L(x{k}, λ{k}), (b) absolute value of the constraint function g(x{k}), (c) Lagrange multiplier λ{k}, and (d) system degrees of freedom (SDOF) for a uniform and adapted mesh using initial ideal element lengths h0 of respectively 1.05 and 1.
5.8 Initial (a)-(b) and final (c)-(d) designs of the cantilever beam with the associated final ideal element length field (e)-(f), for a uniform and adapted mesh using initial ideal element lengths h0 of respectively 1.05 and 1.
5.9 Initial geometry of half the Michell-like structure using 16 control points x.
5.10 The Michell structure convergence histories of (a) the Lagrangian L(x{k}, λ{k}), (b) absolute value of the constraint function g(x{k}), (c) Lagrange multiplier λ{k}, and (d) system degrees of freedom (SDOF) for a uniform and adaptive mesh using initial ideal element lengths h0 of respectively 0.7 and 0.8.
5.11 Initial (a)-(b) and final (c)-(d) designs of the Michell structure with the associated final ideal element length field (e)-(f), for a uniform and adapted mesh using initial ideal element lengths h0 of respectively 0.7 and 0.8.
5.12 Initial geometry and loads of the full spanner problem using 22 control points x.
5.13 The full spanner convergence histories of (a) the Lagrangian L(x{k}, λ{k}), (b) absolute value of the constraint function g(x{k}), (c) Lagrange multiplier λ{k}, and (d) system degrees of freedom (SDOF) for a uniform and adaptive mesh using initial ideal element lengths h0 of respectively 0.7 and 1.
5.14 Initial (a)-(b) and final (c)-(d) designs of the full spanner with the associated final ideal element length field (e)-(f), for a uniform and adapted mesh using initial ideal element lengths h0 of respectively 0.7 and 1.

List of Tables

2.1 Cartesian coordinates for the piece-wise linear boundary description of the circular part of the quarter circular disc.
2.2 Minimum element qualities for the forward Euler and Newton solvers.
2.3 Cartesian coordinates for control variables and applied load F of the unperturbed bow-tie structure.
2.4 Analytical and forward finite difference sensitivities calculated for the bow-tie structure depicted in Figure 2.4.
2.5 Best function value obtained for the cantilever beam problem.
2.6 Best function value obtained for the full spanner problem.
2.7 Best function value obtained for the Michell structure problem.
3.1 Results obtained with BFGS(f), BFGS(g), SSA(f) and SSA(g) for the univariate transient heat transfer problem.
3.2 Tabulated results obtained for the unconstrained Michell-like structure.
3.3 Tabulated results obtained for the constrained Michell-like structure.
3.4 Parameter values used to construct experimental data for the inverse problem using the modified Voce model with the adaptive time step algorithm.
3.5 Tabulated results for the least squares fit between the modified Voce model data points and the linearly interpolated experimental data points.
3.6 Final designs obtained for the least squares fit between the modified Voce model data points and the linearly interpolated experimental data points.
4.1 Algorithmic settings used in the numerical experiments.
4.2 Tabulated results obtained for the unconstrained Michell-like structure.
4.3 Results for the step discontinuous test problem set.
5.1 Analytical and forward finite difference sensitivities calculated for the bow-tie structure depicted in Figure 5.2.
CHAPTER 1

Overview

The following four chapters document the author's contribution as a postgraduate student in the Department of Mechanical and Aeronautical Engineering at the University of Pretoria. Each of the following four chapters is a self-contained advancement towards accommodating remeshing in shape optimization, and is based on published or submitted papers.

In Chapter 2 [66], a novel unstructured remeshing environment for gradient based shape optimization using triangular finite elements is presented. The remeshing algorithm is based on a truss structure analogy; in solving for the equilibrium position of the truss system, the quadratically convergent Newton's method is used. Analytical sensitivity information of the numerically approximated optimization problem is made available to the shape optimization algorithm, which results in highly efficient gradient based shape optimization. In solving the truss structure analogy in Chapter 2, we compare our quadratically convergent Newton solver with a previously proposed forward Euler solver; this includes notes regarding mesh uniformity, element quality, convergence rates and efficiency. We present three numerical examples; it is then shown that remeshing may introduce discontinuities and local minima. We demonstrate that the effects of these on gradient based algorithms are alleviated to some extent through mesh refinement, and may largely be overcome with a simple multi-start strategy.

In Chapter 3 [65], we study the minimization of objective functions containing non-physical jump discontinuities. These discontinuities arise when (partial) differential equations are discretized using non-constant methods and the resulting numerical solutions are used in computing the objective function, as observed in Chapter 2 using remeshing in shape optimization.
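To make the origin of such jump discontinuities concrete, consider the following minimal sketch (not taken from the thesis; the function name, coefficients and the step-count heuristic are all illustrative). Newton's cooling law is integrated with a forward Euler scheme whose number of time steps depends on the design variable κ, so the computed response jumps wherever the integer step count changes, even though the underlying physics is smooth:

```python
import math

T0, T_ENV = 100.0, 25.0  # illustrative initial and ambient temperatures

def temperature_after_1s(kappa):
    """Forward Euler solution of dT/dt = -kappa*(T - T_ENV) over 1 second.

    The number of time steps is derived from kappa (an illustrative
    stability-style heuristic), so the discretization changes with the
    design variable. This variable discretization is what introduces
    non-physical step discontinuities into the response.
    """
    n = math.ceil(10.0 * kappa)  # integer step count jumps at discrete kappa
    dt = 1.0 / n
    T = T0
    for _ in range(n):
        T -= dt * kappa * (T - T_ENV)
    return T

# Sweeping kappa finely gives a piece-wise smooth response: smooth while
# ceil(10*kappa) is constant, with a small jump where it changes, e.g.
# between kappa = 0.999 (10 steps) and kappa = 1.001 (11 steps).
left, right = temperature_after_1s(0.999), temperature_after_1s(1.001)
```

Within each piece the response, and hence its sensitivity to κ, is smooth and exactly computable for the active discretization; only the switch between discretizations produces the jump.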
Although the functions may become discontinuous and non-differentiable, we can compute analytical gradient information of the numerical solution where the function is differentiable, and approximate gradient information where it is discontinuous. At a non-differentiable point, each component of the gradient vector is constructed from a one-sided directional derivative where the function is non-differentiable along that direction, or is given by the partial derivative itself where the function is differentiable along that direction. Such a constructed gradient field follows from the computational scheme, since every point has an associated discretization for which sensitivities can be calculated. We refer to this as the associated gradient field. Hence, from a computational perspective, the associated gradient field of these discontinuous functions is everywhere defined, albeit approximated at the discontinuities. Rather than constructing global approximations using only function value information to overcome the discontinuities, as is often done, we propose to use only the associated gradient information. We elaborate on the modifications of classical gradient based optimization algorithms for use in gradient-only approaches, and we then present gradient-only optimization strategies using both BFGS and a new spherical quadratic approximation for sequential approximate optimization (SAO). We also use the BFGS and SAO algorithms to solve three problems of practical interest, both unconstrained and constrained. For the constrained problems we only consider smooth volume constraint functions.

In Chapter 4, we consider some theoretical aspects of gradient-only optimization for the unconstrained optimization of objective functions containing non-physical step or jump discontinuities. The (discontinuous) associated gradients are, however, assumed to be accurate and everywhere uniquely defined.
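The gradient-only principle can be illustrated with a toy line search that brackets and bisects on the sign of the associated directional derivative alone, never comparing function values. This is a hypothetical minimal sketch, not the BFGS or SSA implementations developed in the thesis; all names are illustrative:

```python
def directional_derivative(grad, x, d, alpha):
    """Associated derivative along direction d, evaluated at x + alpha*d."""
    g = grad([xi + alpha * di for xi, di in zip(x, d)])
    return sum(gi * di for gi, di in zip(g, d))

def gradient_only_line_search(grad, x, d, alpha_max=10.0, tol=1e-10):
    """Find a step length where the directional derivative changes sign
    from negative to non-negative, using derivative information only.

    Because no function values are ever compared, a step discontinuity
    that merely shifts the function value cannot trap the search in a
    non-physical local minimum.
    """
    lo, hi = 0.0, alpha_max
    while directional_derivative(grad, x, d, hi) < 0.0:
        hi *= 2.0  # grow the bracket until descent along d is exhausted
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if directional_derivative(grad, x, d, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The associated derivative of f(x) = (x - 2)^2 + floor(x) is 2(x - 2)
# everywhere, although f itself has unit step discontinuities; the search
# converges to the sign change of the derivative near x = 2.
step = gradient_only_line_search(lambda x: [2.0 * (x[0] - 2.0)], [0.0], [1.0])
```

The returned step length locates what the thesis terms a non-negative gradient projection point along the search direction, rather than a minimizer of the (discontinuous) function values.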
This kind of discontinuity indeed arises when the optimization problem is based on the solution of a system of partial differential equations and variable discretization techniques are used (remeshing in spatial domains, or variable time stepping in temporal domains). These discontinuities, which may cause local minima, are artifacts of the numerical strategies used and should not influence the solution to the optimization problem. We demonstrate that it is indeed possible to ignore these local minima due to discontinuities, if only associated gradient information is used. Various gradient-only algorithmic options are discussed. The implication is that variable discretization strategies, so important in the numerical solution of partial differential equations, can be combined with efficient local optimization algorithms.

In Chapter 5, we extend our uniform mesh generator presented in Chapter 2. Herein, we turn our quadratically convergent mesh generator into an adaptive generator, by allowing for a spatially varying ideal element length field, computed using the Zienkiewicz-Zhu error indicator. The remeshing strategy makes (semi) analytical sensitivities available for use in gradient based optimization algorithms. To circumvent difficulties associated with local minima due to remeshing, we again rely on gradient-only optimization algorithms, which do not use zeroth order function information. Numerical results are presented for an orthotropic cantilever beam, an orthotropic Michell-like structure and a spanner design problem.

This study is concluded in Chapter 6, which offers conclusions and recommendations.

CHAPTER 2

Remeshing shape optimization strategy

A novel unstructured remeshing environment for gradient based shape optimization using triangular finite elements is presented. The remeshing algorithm is based on a truss structure analogy; in solving for the equilibrium position of the truss system, the quadratically convergent Newton's method is used.
Analytical sensitivity information of the numerically approximated optimization problem is made available to the shape optimization algorithm, which results in highly efficient gradient based shape optimization. In solving the truss structure analogy, we compare our quadratically convergent Newton solver with a previously proposed forward Euler solver; this includes notes regarding mesh uniformity, element quality, convergence rates and efficiency. We present three numerical examples; it is then shown that remeshing may introduce discontinuities and local minima. We demonstrate that the effects of these on gradient based algorithms are alleviated to some extent through mesh refinement, and may largely be overcome with a simple multi-start strategy.

This chapter is constructed as follows: An outline of shape optimization strategies is given in Section 2.1, followed by the unstructured remeshing strategy in Section 2.2. In particular, the previously proposed linearly convergent mesh generator based on a truss structure analogy of Persson and Strang is outlined. The mesh generator is then modified to exhibit quadratic convergence in solving for the equilibrium positions of the nodal coordinates of the truss structure. We also present an analytical sensitivity analysis, i.e. the computation of the derivatives of the mesh node positions w.r.t. the design domain control variables. The formulation of the shape design problem is considered in Section 2.3. Finally, numerical results for three example problems are presented in Section 2.4, whereafter we offer conclusions and recommendations for future work.

2.1 Introduction

Shape optimization involves the constrained minimization of a cost function. The cost function in turn typically involves the solutions of a system of partial differential equations, which depend on parameters that define a geometrical domain [34].
The continuum description of the geometrical domain is normally discretized. This allows efficient solution of the system of partial differential equations, using for example the finite element method (FEM). Normally, the discretized geometric domain is defined by control variables with predefined freedom. The control variables in turn bound the geometrical domain through a predefined relationship, which may be piecewise linear, or based on B-splines, etc. In shape optimization, different meshing strategies can be used. These include fixed grid strategies [20, 35, 67], design element concepts [28], adaptive mesh strategies [6, 50], and remeshing strategies. The first three methods imply an a priori mesh discretization with obvious limitations, for example when dealing with large shape changes in the geometry during optimization. On the other hand, some of the drawbacks of remeshing strategies are the implementation expense, and the possible introduction of local minima, in which gradient based optimization methods may become trapped [1]. However, (unstructured) remeshing strategies allow for generality in structural models and objective functions, and large shape changes can be accommodated with minimal mesh distortion. In shape optimization, the cost function may be optimized using either a gradient free or a gradient based optimization method. While gradient free methods require only the relationship between the cost function and the discretized geometric domain to be specified, gradient based optimization methods require additional sensitivity information. The sensitivities needed for gradient based optimization techniques can be calculated numerically, semi-analytically or analytically. All these methods have merits and drawbacks. Numerical gradients using finite difference methods are computationally expensive, but are easy to implement.
The semi-analytical and analytical methods are more complex to implement, but are computationally cheaper. An advantage of gradient free evolutionary strategies is their global optimization capability. They have been used with success by Xie and Steven [35, 67] in a fixed grid strategy. Related works that reflect evolutionary strategies in shape optimization are the biological growth method of Mattheck and Burkhardt [38], and the genetic algorithm used by Garcia and Gonzalez [20]. They are in general, however, still very expensive, and the solutions are normally inferior to those obtained with gradient based methods; this is most certainly so for problems with many design variables. Hence we restrict ourselves to gradient based methods in this study.

Mesh generation plays an important role in shape optimization and in general contributes largely to the computational expense per iteration when unstructured remeshing strategies are used. This cost may however be offset many times over when exact analytical gradients can be made available to the algorithm used in solving the shape optimization problem. Remeshing strategies in shape optimization accentuate the importance of robustness, computational speed, flexibility and accuracy of the mesh generator in discretizing the geometrical domain. In this study a novel remeshing shape optimization environment is presented. The environment is based on an elegant truss structure analogy proposed by Persson and Strang [42]. It is however developed such that analytical sensitivities are available cost effectively. Two gradient based optimization algorithms are implemented, namely the Dynamic-Q algorithm [56] and sequential quadratic programming (SQP) [4]. These algorithms are then used to solve example problems in shape optimization.
In turn, this demonstrates that remeshing may introduce discontinuities, which may cause the gradient based algorithms to become trapped in local minima. We then investigate the ability of h-refinement to assist in escaping from these local minima.

2.2 Mesh generation

Computational meshes are used extensively in engineering and physics to discretize a continuous geometrical domain Ω with boundary ∂Ω. The computational mesh

    Λ ∈ { X = (X_i), i = 1, …, nn ; T = (T_jk), j = 1, …, ne, k = 1, …, nv },    (2.1)

defined on the domain Ω describes the positions X ∈ R³ of the nn nodes, and gives for each of the j = 1, …, ne computational elements the set T_jk, k = 1, …, nv of its nv vertices [34]. In addition, the set of nodes X is the union of the boundary nodes X^∂Ω and the interior nodes X^Ω. In this study we limit the nodal positions to two dimensions, X ∈ R².

In this section, we present triangulation based on the truss structure analogy proposed by Persson and Strang [42]. This incorporates a Delaunay strategy [18] to ensure good mesh quality, albeit at the cost of potentially introducing discontinuities between consecutive meshes due to the addition or removal of nodes.

2.2.1 Mesh generator based on a truss structure analogy

The mesh generator proposed by Persson and Strang is based on a truss structure analogy that solves for the equilibrium position of a truss structure. The geometrical domain is defined by a signed distance function that signs the nodes outside the domain as positive, inside as negative, and zero on the boundary. The distance function is a function of the control variables through the interpolation of the domain. The initial mesh is generated using the simple algorithm of Persson and Strang, which mostly creates equilateral triangles in the domain. The truss force function z is defined with a force discontinuity, as no tensile forces are permitted in the truss elements.
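The signed distance description above can be made concrete with a minimal sketch for the quarter circular disc used later in Section 2.2.3 (the function name and the max-combination of the disc with the two half-planes are our own assumptions, not the thesis implementation):

```python
import numpy as np

# Hypothetical signed distance function for a quarter circular disc of
# radius 15 (the geometry of Section 2.2.3): negative inside the domain,
# zero on the boundary, positive outside, following the sign convention
# described in the text. The max() of the disc distance and the two
# half-planes x >= 0 and y >= 0 is a common (approximate) construction.
def d_quarter_disc(p, r=15.0):
    x, y = float(p[0]), float(p[1])
    d_circle = np.hypot(x, y) - r   # signed distance to the full disc
    return max(d_circle, -x, -y)    # intersect with x >= 0 and y >= 0

print(d_quarter_disc((15.0, 0.0)))        # 0.0: on the circular boundary
print(d_quarter_disc((5.0, 5.0)) < 0.0)   # True: interior node
```

Nodes are then classified by the sign of d; the external forces that keep boundary nodes on ∂Ω act along the boundary normal, i.e. along the gradient of such a distance function.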
The absence of tensile forces allows the nodes X to propagate toward the boundary ∂Ω. The nodes are kept inside the geometrical domain by external forces acting on the boundary nodes X^∂Ω. The forces act perpendicularly to the boundary, keeping the nodes from moving outside the boundary while allowing movement along the boundary. The truss force function z is defined as

    z(l, h0) = k(h0 − l)   if l < h0,
             = 0           if l ≥ h0,    (2.2)

with k the spring (truss) stiffness, l the current spring length and h0 the undeformed spring length (also referred to as the ideal element length). The undeformed spring length h0 is a user specified parameter, whereas the current spring length l(X) is a function of the nodal positions X. The nodal positions X(x) in turn depend on the control variables x. There is also a dependency of l on the mesh topology T, which we omit, since the mesh topology T converges to a constant topology as the equilibrium of the truss structure converges. The implication is the introduction of discontinuities in the residual of the equilibrium of the truss structure whenever T or X changes due to Delaunay triangulation. In the implementation of Persson and Strang [42], all springs are precompressed by 20%, which provides the driving force necessary to propagate nodes to the boundary. The truss system F(X) = 0 is transformed into a system of ordinary differential equations through the introduction of artificial time-dependence in the equations. The system is then solved by a forward Euler method:

    X_{n+1} = X_n + Δt F(X_n).    (2.3)

The forward Euler method is essentially a matrix free method, ideally suited to create meshes with a very large number of elements. This method exhibits linear convergence rates. However, in general, the structural meshes (number of elements) in shape optimization tend to vary from small to moderate for practical optimization problems, since optimization is per se computationally expensive.
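To make the force function and update of Eqs. (2.2) and (2.3) concrete, consider the following one-dimensional reduction (an illustrative sketch of our own, not the thesis code): a chain of nodes on [0, 1] with fixed end nodes relaxes toward uniform spacing under the compressive-only spring forces.

```python
import numpy as np

# 1D illustration (assumed, not the thesis implementation) of Eqs. (2.2)
# and (2.3): compressive-only spring forces drive the interior nodes
# toward uniform spacing; the end (boundary) nodes are held fixed.
def z(l, h0, k=1.0):
    # Eq. (2.2): a bar only pushes when it is shorter than h0 (no tension).
    return k * (h0 - l) if l < h0 else 0.0

def assemble_forces(X, h0):
    F = np.zeros_like(X)
    for i in range(len(X) - 1):
        f = z(X[i + 1] - X[i], h0)  # bar between nodes i and i+1
        F[i] -= f                   # compression pushes the end nodes apart
        F[i + 1] += f
    F[0] = F[-1] = 0.0              # fixed boundary nodes
    return F

X = np.array([0.0, 0.1, 0.2, 0.8, 1.0])   # non-uniform initial seeding
h0 = 1.2 * 1.0 / (len(X) - 1)             # 20% precompression, as in [42]
for _ in range(500):
    X = X + 0.1 * assemble_forces(X, h0)  # Eq. (2.3) with Δt = 0.1
print(np.round(X, 3))                     # approximately [0, 0.25, 0.5, 0.75, 1]
```

In two dimensions the same update applies, with X the vector of nodal coordinates and the bar lengths computed from the current Delaunay triangulation.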
Emphasis is placed on mesh quality and accurate representation of the geometrical domain. It may therefore be beneficial from a computational cost perspective to replace the forward Euler method with a quadratically convergent scheme. We do so in the next subsection.

2.2.2 Quadratically convergent Newton solver for the mesh generator

The truss system equilibrium equations F(X) = 0 are partitioned along the internal nodes X^Ω and boundary nodes X^∂Ω, i.e.

    F(X) = { F^Ω(X) ; F^∂Ω(X) } = { F^Ω(X^Ω, X^∂Ω) ; F^∂Ω(X^Ω, X^∂Ω) } = { 0 ; 0 }.    (2.4)

For the sake of simplicity, the boundary nodes X^∂Ω are seeded along the geometrical boundary ∂Ω and are chosen to remain fixed during the mesh generation process. Nodes are explicitly placed on the control variable locations to ensure accurate representation of the defined geometrical domain Ω. Since the boundary nodes X^∂Ω are fixed, the system of unknowns reduces to X^Ω. Hence, we rewrite F^Ω(X^Ω, X^∂Ω) as F^Ω(X^Ω). Also, the reactions at the boundary nodes are not of immediate interest, so we only solve for

    F^Ω(X^Ω) = 0.    (2.5)

The reduced truss system in Eq. (2.5) is solved directly via the quadratically convergent Newton's method, i.e. we solve for ΔX^Ω from

    (∂F^Ω/∂X^Ω) ΔX^Ω = −F^Ω    (2.6)

to update the nodal coordinates

    X^Ω_{n+1} = X^Ω_n + ΔX^Ω.    (2.7)

The consistent tangent ∂F^Ω/∂X^Ω is computed analytically for every iteration, since the number of elements (and hence the number of unknowns) and element connectivity may change between consecutive iterations, due to Delaunay triangulation. Although unusual, the possible change in the number of system unknowns requires no special treatment. Since the force function z proposed by Persson and Strang in Eq.
(2.2) is discontinuous (which is undesirable in gradient based implementations), it is now changed to allow for both tensile and compressive forces in the truss elements in finding the equilibrium position, i.e.

    z(l(X(x)), h0) = k(h0 − l(X(x)))   for all l.    (2.8)

Furthermore, we do not require any precompression in the springs.

Table 2.1: Cartesian coordinates for the piece-wise linear boundary description of the circular part of the quarter circular disc.

    x:   0        2.9264   5.7403   8.3336  10.6066  12.472   13.8582  14.7118  15
    y:  15       14.7118  13.8582  12.472   10.6066   8.3336   5.7403   2.9264   0

2.2.3 Evaluation of the mesh generators

We now compare our novel quadratically convergent Newton solver to the forward Euler solver. The boundary nodes are treated as discussed in Section 2.2.2 for both the Newton and forward Euler implementations. A comparative study is done for the mesh generation of a quarter circular disc with a radius of 15 units. The x and y coordinates for the piece-wise linear boundary representation of the circular part of the quarter circular disc are given in Table 2.1. The comparison focuses on the computational expense and the convergence rate of both solvers. (In cases where the sensitivities are not needed, i.e. for gradient free optimization methods, implementations of quasi-Newton or modified Newton methods can be used to obtain a computational advantage. Additionally, the residual convergence tolerance may also be relaxed.) A disadvantage of the Newton solver is that matrix methods require extensive memory resources when the mesh size is increased, when compared to the matrix-free forward Euler solver. However, we utilise sparse matrix manipulation techniques whenever possible. For both the forward Euler and Newton methods, we express the stopping condition in terms of the maximum nodal displacement, i.e. we stop when |X_{n+1} − X_n|_∞ < ε, with ε > 0 small and prescribed.
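The Newton iteration of Eqs. (2.6) and (2.7), with the modified two-sided force function of Eq. (2.8), can be sketched in one dimension as follows (an illustrative reduction of our own; in 1D the residual happens to be linear in X, so a single Newton step reaches equilibrium, whereas in 2D the nonlinear dependence of the bar lengths on X requires a few iterations):

```python
import numpy as np

# 1D sketch (assumed) of Eqs. (2.6)-(2.8): assemble the residual F and the
# consistent tangent dF/dX for the two-sided force z = k*(h0 - l), then
# take a Newton step on the interior (free) nodes only.
def residual_and_tangent(X, h0, k=1.0):
    n = len(X)
    F = np.zeros(n)
    J = np.zeros((n, n))                 # consistent tangent dF/dX
    for i in range(n - 1):
        l = X[i + 1] - X[i]
        f = k * (h0 - l)                 # Eq. (2.8): tension now allowed
        F[i] -= f
        F[i + 1] += f
        # df/dl = -k, with dl/dX[i] = -1 and dl/dX[i+1] = +1:
        J[i, i] -= k;     J[i, i + 1] += k
        J[i + 1, i] += k; J[i + 1, i + 1] -= k
    return F, J

X = np.array([0.0, 0.1, 0.2, 0.8, 1.0])
free = slice(1, len(X) - 1)              # boundary nodes remain fixed
F, J = residual_and_tangent(X, h0=0.25)
dX = np.linalg.solve(J[free, free], -F[free])   # Eq. (2.6)
X[free] += dX                                   # Eq. (2.7)
print(np.round(X, 2))   # uniform spacing after a single step
```

In the 2D generator the tangent is sparse and may change size between iterations whenever Delaunay triangulation adds or removes nodes, as noted above.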
The same stopping criterion was used by Persson and Strang [42]. The study is conducted using a 3 GHz Pentium IV machine with 512 MB RAM running under the Linux operating system.

Figure 2.1 depicts the computational effort comparison for the Newton and forward Euler solvers for different mesh sizes. For a stopping tolerance of ε = 0.04h0, the computational expense of the forward Euler method is comparable to that of Newton's method, where we use a stopping tolerance of ε = 10⁻⁸h0. However, decreasing the stopping tolerance for the forward Euler solver by a factor 10 increases its computational effort on average by a factor of 16.

Figure 2.1: Computational effort required to solve the truss equilibrium with the Newton and the forward Euler methods. (Log-log plot of computational time (s) versus number of nodes, for Newton with ε = 10⁻⁸h0 and forward Euler with ε = 0.04h0 and ε = 0.004h0.)

Figure 2.2 depicts the force residual versus the number of iterations for both solvers, using an ideal element length of h0 = 0.375. After the mesh stabilises, the convergence rate of the Newton solver is quadratic. (We define mesh stability to imply that no elements are added to or removed from the mesh from one iteration to the next as a result of Delaunay triangulation.) In general, this requires between 1 and 2 iterations. The discontinuity in the force residual visible after the first iteration in Figure 2.2 is an example of such an event. (Even including these events, Newton's method requires only 6 iterations on average for all meshes we constructed.) Also illustrated is the linear convergence rate of the forward Euler solver.

Figure 2.2: Force residual comparison for an ideal element length of h0 = 0.375 with 1492 nodes. (Norm of the force residual versus number of iterations, for Newton with ε = 10⁻⁸h0 and forward Euler with ε = 0.004h0; a Delaunay triangulation event is marked after the first iteration.)
For this solver the average number of iterations increases from 10 to 155 as the tolerance is decreased from 0.04h0 to 0.004h0. For our implementation, however, the average computational cost per iteration of the forward Euler solver is some 30% less than the average computational cost of the Newton solver. The forward Euler implementation is different from our implementation, since it does not allow tensile forces in the trusses and a precompression is imposed. It is therefore necessary to verify that the changes we made are not detrimental to the element quality and mesh uniformity reported in [42].

Figure 2.3: Comparison between the different solvers in terms of a) mean element quality and b) mesh uniformity for the quarter circular disc. (Mean element quality and mesh uniformity (%) versus number of nodes.)

Table 2.2: Minimum element qualities for the forward Euler and Newton solvers.

    h0       forward Euler (ε = 0.04h0)   forward Euler (ε = 0.004h0)   Newton (ε = 10⁻⁸h0)
    3        0.7942                       0.8197                        0.7435
    1.5      0.7323                       0.7384                        0.6915
    0.75     0.6312                       0.7644                        0.6665
    0.375    0.5845                       0.7191                        0.6571
    0.1875   0.6245                       0.7625                        0.6538

Element quality is defined as twice the ratio of the radius of the largest inscribed circle to the radius of the smallest circumscribed circle. Hence the element quality of an equilateral triangle, for example, is 1.00, while the element quality of a 30-60-90 triangle is only 0.68. Mesh uniformity is defined as the standard deviation of the ratio of the circumradii of all the triangles in the mesh to the ideal element length h0. The mesh uniformity is normalized by the mean ratio, and then expressed as a percentage.
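The two mesh measures just defined can be sketched as follows (assuming the standard incircle and circumcircle formulas for a triangle with side lengths a, b, c; the function names are our own):

```python
import numpy as np

# Element quality: twice the inradius over the circumradius; 1.0 for an
# equilateral triangle, smaller for distorted triangles.
def element_quality(a, b, c):
    s = 0.5 * (a + b + c)                          # semi-perimeter
    A = np.sqrt(s * (s - a) * (s - b) * (s - c))   # Heron's formula
    r_in = A / s                                   # inradius
    r_circ = a * b * c / (4.0 * A)                 # circumradius
    return 2.0 * r_in / r_circ

# Mesh uniformity: standard deviation of the circumradius/h0 ratios,
# normalized by the mean ratio and expressed as a percentage.
def mesh_uniformity(circumradii, h0):
    ratios = np.asarray(circumradii, dtype=float) / h0
    return 100.0 * np.std(ratios) / np.mean(ratios)

print(round(element_quality(1.0, 1.0, 1.0), 6))   # 1.0 for equilateral
print(mesh_uniformity([0.5, 0.5, 0.5], h0=1.0))   # 0.0 for identical elements
```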
Hence a mesh uniformity of 0% is the ideal. We compare the mean element quality and mesh uniformity of the two solvers in Figures 2.3(a) and 2.3(b) respectively. For this comparison we again use the quarter circular disc. In essence, the mean element quality and the mesh uniformity are similar. In Table 2.2, the lowest element qualities in the meshes are compared. Again, the solvers perform comparably.

2.3 Problem formulation

In general the shape optimization problem is given in the following abstract form [34].

Problem 2.3.1. Find the optimal shape Ω* such that

    F_{Ω*} = F_Ω(Ω*, S_{Ω*}) = min_Ω { F_Ω(Ω, S_Ω) : G(Ω) ≤ 0 },    (2.9)

with S_Ω the state solution of the partial differential equations p_Ω(Ω, S_Ω) = 0, which characterizes the system [34]. Here, Ω denotes the unknown shape, which is a subset of R³. Ω is usually defined by a finite set of control variables x ∈ Rⁿ, and G(Ω) ≤ 0 is the set of nonlinear design constraints, with 0 given. Problem 2.3.1 is ill-posed, but can be reduced to an approximate problem by discretizing the domain Ω with a computational grid Λ (see Eq. (2.1)). The computational grid Λ can then be used to approximate the state equation p_Λ(Λ, S_Λ) = 0 and to define an approximate solution S_Λ to the state equation. The set of nonlinear design constraints G(Ω) ≤ 0 can then also be expressed as m functions of the computational grid Λ as follows:

    g_i(Λ(x), x) ≤ 0,   i = 1, …, m.    (2.10)

Consequently the shape optimization problem reduces to the following approximate problem.

Problem 2.3.2. Find the minimum F* such that

    F* = F(Λ(x*), S_Λ(x*)) = min_{x ∈ Rⁿ} { F(Λ(x), S_Λ(x)) : g(Λ(x), x) ≤ 0 },    (2.11)

with S_Λ the approximate state solution of the approximate partial differential equations p_Λ(Λ, S_Λ) = 0, which characterizes the system. For the sake of brevity, the objective function and the constraints will respectively be denoted by F(x) and g(x); this notation will however imply dependency on Λ(x).
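The nested dependencies in Problem 2.3.2 (the grid depends on x, the state on the grid, and F and g on both) can be sketched schematically as follows (all names and the toy stand-ins are our own assumptions, not the thesis implementation):

```python
# Schematic sketch (assumed) of evaluating the approximate Problem 2.3.2:
# every evaluation of F(x) and g(x) first rebuilds the grid Λ(x) and then
# solves the discretized state equation p_Λ(Λ, S_Λ) = 0.
def evaluate(x, remesh, solve_state, objective, constraint):
    mesh = remesh(x)                 # Λ(x)
    S = solve_state(mesh)            # approximate state S_Λ
    return objective(S), constraint(mesh, x)

# Toy stand-ins: a 1D 'mesh' of 5 points on [0, x] and a trivial 'state'.
F, g = evaluate(
    x=2.0,
    remesh=lambda x: [i * x / 4 for i in range(5)],
    solve_state=lambda mesh: sum(mesh),
    objective=lambda S: S,
    constraint=lambda mesh, x: x - 3.0,   # g(x) = x - 3 ≤ 0
)
print(F, g)   # 5.0 -1.0
```

The key point of the sketch is that the grid Λ(x) is rebuilt inside every function evaluation, which is precisely where remeshing introduces step discontinuities into F(x).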
The objective and constraint functions can be selected in many ways. In structural shape optimization the objective function is usually chosen as the weight or volume of the structure, subject to displacement and stress constraints [16]. In our case, the objective function F(x) = F(u(Λ(x))) is an explicit function of the nodal displacements u, which are obtained by solving the approximate finite element equilibrium equations for linear elasticity, formulated as

    K u = f,    (2.12)

where K represents the assembled structural stiffness matrix and f the consistent structural loads. Following the usual approach, the system in Eq. (2.12) is partitioned along the unknown displacements u_f and the prescribed displacements u_p, i.e.

    K u = [ K_ff  K_fp ; K_pf  K_pp ] { u_f ; u_p } = { f_f ; f_p },    (2.13)

where f_f represents the prescribed forces and f_p the reactions at the nodes with prescribed displacements. The unknown displacements u_f are obtained from

    K_ff u_f = f_f − K_fp u_p.    (2.14)

We choose to represent the geometrical domain boundary ∂Ω by a simple piecewise linear interpolation between the control variables. However, Bezier curves or B-splines, etc. may of course also be used. We subject the structure to a volume constraint g(x), which reduces to a linear function of the control variables x.

2.3.1 Analytical sensitivities

Recall that the cost function F(x) is an explicit function of the nodal displacements u. Using gradient based optimization algorithms, we therefore require the sensitivity of the structural response u w.r.t. the design variables (control variables) x. In general, the stiffness partition matrices K_ff and K_fp, the nodal displacement vector u_f and the load vector f_f in Eq. (2.14) depend on the design variables x, i.e. K_ff(x) u_f(x) = f_f(x) − K_fp(x) u_p(x). The analytical gradient du_f/dx is obtained by differentiating Eq. (2.14) w.r.t. the control variables x, i.e.
    K_ff du_f/dx = df_f/dx − (dK_fp/dx) u_p − K_fp du_p/dx − (dK_ff/dx) u_f.    (2.15)

In this study the load vector f_f is assumed to be independent of the control variables x, hence df_f/dx = 0. For Dirichlet boundary conditions, u_p = 0, and Eq. (2.15) reduces to

    K_ff du_f/dx = − (dK_ff/dx) u_f.    (2.16)

Eq. (2.16) is solved to obtain du_f/dx, using the factored stiffness matrix K_ff, available from the primary analysis when solving Eq. (2.14). The unknown dK_ff/dx is computed from

    dK_ff/dx = (dK_ff/dX)(dX/dx),    (2.17)

where dK_ff/dX is obtained by differentiating the stiffness matrix analytically with respect to the nodal coordinates X. This is done on the element level and then assembled into the global system. For simplicity's sake we choose to use the constant strain triangle (CST) element. The element stiffness matrix K_e of the CST element is given by

    K_e = t A Bᵀ D B,    (2.18)

where t and A denote the element thickness and element area. Hence dK_e/dX is given by

    dK_e/dX = t ( (dA/dX) Bᵀ D B + A (dBᵀ/dX) D B + A Bᵀ D (dB/dX) ).    (2.19)

The area of the CST element is given by

    A = 0.5 |det J| = 0.5 |x13 y23 − x23 y13|.    (2.20)

Here, J represents the Jacobian matrix and xij = xi − xj, etc. Subscripts i, j = 1, 2, 3 denote the element node numbers. The CST element strain-displacement matrix is

    B = (1/det J) [ y23   0     y31   0     y12   0
                    0     x32   0     x13   0     x21
                    x32   y23   x13   y31   x21   y12 ].    (2.21)

It follows from Eqs. (2.18), (2.20) and (2.21) that K_e is a nonlinear function of the nodal coordinates X. B and A are differentiated directly to obtain dK_e/dX; assembly yields dK_ff/dX. To complete the sensitivity analysis, we still need to evaluate dX/dx, present in Eq. (2.17). To emphasize the dependency on the control variables x, the reduced truss system F^Ω is now expressed as a function of both the interior nodes X^Ω(x) and the boundary nodes X^∂Ω(x), i.e.

    F^Ω(X^Ω(x), X^∂Ω(x)) = 0.    (2.22)
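The CST quantities of Eqs. (2.18), (2.20) and (2.21) can be sketched as follows (a minimal implementation under the stated plane stress assumptions; the forward difference at the end is a verification device of our own, not the analytical derivative of Eq. (2.19)):

```python
import numpy as np

# CST element stiffness K_e = t*A*B^T D B of Eq. (2.18), with A and B from
# Eqs. (2.20) and (2.21), for plane stress with E = 200e3 and nu = 0.3.
def cst_stiffness(xy, E=200e3, nu=0.3, t=1.0):
    (x1, y1), (x2, y2), (x3, y3) = xy
    detJ = (x1 - x3) * (y2 - y3) - (x2 - x3) * (y1 - y3)
    A = 0.5 * abs(detJ)                                 # Eq. (2.20)
    B = np.array([[y2 - y3, 0,       y3 - y1, 0,       y1 - y2, 0      ],
                  [0,       x3 - x2, 0,       x1 - x3, 0,       x2 - x1],
                  [x3 - x2, y2 - y3, x1 - x3, y3 - y1, x2 - x1, y1 - y2]]) / detJ
    D = E / (1 - nu ** 2) * np.array([[1, nu, 0],
                                      [nu, 1, 0],
                                      [0, 0, (1 - nu) / 2]])
    return t * A * B.T @ D @ B                          # Eq. (2.18)

xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
K = cst_stiffness(xy)

# K_e is nonlinear in the nodal coordinates: a forward difference w.r.t.
# the x-coordinate of node 1 approximates one block of dK_e/dX.
h = 1e-6
dK_dx1 = (cst_stiffness([(h, 0.0), (1.0, 0.0), (0.0, 1.0)]) - K) / h
print(K.shape, np.allclose(K, K.T))   # (6, 6) True
```

As a sanity check, K annihilates rigid-body translations (e.g. the displacement vector [1, 0, 1, 0, 1, 0]), since B sums to zero across the nodes.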
To determine the relationship between the nodal coordinates X and the control variables x, we take the derivative of Eq. (2.22) w.r.t. x, i.e.

    dF^Ω/dx = (∂F^Ω/∂X^Ω)(dX^Ω/dx) + (∂F^Ω/∂X^∂Ω)(dX^∂Ω/dx) = 0,    (2.23)

hence

    (∂F^Ω/∂X^Ω)(dX^Ω/dx) = − (∂F^Ω/∂X^∂Ω)(dX^∂Ω/dx).    (2.24)

dX^Ω/dx can be obtained if (∂F^Ω/∂X^∂Ω)(dX^∂Ω/dx) and ∂F^Ω/∂X^Ω are known, either analytically or numerically. Although the relationship between the control variables x and the boundary nodes X^∂Ω is known explicitly due to the linear relationship of our piece-wise linear boundary, we compute (∂F^Ω/∂X^∂Ω)(dX^∂Ω/dx) using a semi-analytical sensitivity analysis [40]. ∂F^Ω/∂X^Ω is available from Newton's method (cf. Eq. (2.6)) implemented in the mesh generation step. Therefore we obtain dX/dx as the union of dX^Ω/dx and dX^∂Ω/dx. The semi-analytical sensitivity calculation requires finite difference perturbations from an unperturbed geometry. For the unperturbed and perturbed geometries we ensure the mesh topology T remains the same by deactivating the Delaunay triangulation for the duration of the sensitivity calculation. This avoids remeshing between the unperturbed and perturbed geometries and any inconsistencies in mesh topology that may occur.

In summary, the primary and sensitivity analyses proceed as follows:

1. Solve for the nodal positions X^Ω by solving Eq. (2.6) repeatedly. ∂F^Ω/∂X^Ω and F^Ω are recomputed at each iteration.
2. Calculate (∂F^Ω/∂X^∂Ω)(dX^∂Ω/dx) semi-analytically and then solve for dX^Ω/dx from Eq. (2.24), using ∂F^Ω/∂X^Ω from step 1 above.
3. Assemble K_ff and f_f, then solve for u_f from Eq. (2.14).
4. Compute dK_e/dX using Eq. (2.19) and assemble over all elements to obtain dK_ff/dX.
5. Compute dK_ff/dx using Eq. (2.17), using dX/dx from step 2 above.
6. Solve for du_f/dx using Eq. (2.16).

2.3.2
Gradient sensitivity comparison

To verify that no errors were made during the analytical gradient derivations, we compare our analytical sensitivities to numerical sensitivities obtained with the forward finite difference method. We use the bow-tie structure and mesh depicted in Figure 2.4 to compute the sensitivity of the displacement at the point of load application (uF) w.r.t. the indicated control variables. The x and y coordinates for the control variables and applied load F of the unperturbed bow-tie structure are given in Table 2.3.

Table 2.3: Cartesian coordinates for control variables and applied load F of the unperturbed bow-tie structure.

    Control variable:   1    2    3    4    5    6    7    8    F
    x:                  5   10   15   20    5   10   15   20   20
    y:                 15    9   15   15    0    6    0    0    7.5

Figure 2.4: a) Bow-tie structure defined by the indicated control variables and b) the associated mesh for an ideal element length of h0 = 2.0.

Calculation of the numerical sensitivities is conducted without Delaunay triangulation steps, to avoid the introduction of any discontinuity (due to the addition or removal of nodes) in the numerical sensitivity analysis. In Table 2.4 the numerical gradients for a perturbation of 10⁻⁶ are compared to the analytical gradients. It follows from Table 2.4 that our computations are correct and accurate.

In comparison, let us consider the procedure required to compute the sensitivities in Eq. (2.17) using the forward Euler implementation. Two strategies exist: First, if dX/dx is available, du_f/dx is solved from the FE sensitivity analysis in Eqs. (2.16) and (2.17). Since dX/dx is not available analytically, one needs to compute dX/dx numerically, using a finite difference approach. This requires a complete mesh generation step for each perturbation. Computing dX/dx accurately is computationally intensive due to the slow convergence rate.
Alternatively, du_f/dx can be computed directly via finite differences. The drawback here is that a complete mesh generation step and FE analysis are needed for each perturbation. Using this strategy, the increased computational cost due to the additional FE analyses is offset by using a coarser tolerance during mesh generation. From a computational cost perspective, none of these strategies compares favourably with the analytical sensitivities available from Newton's method. However, if sensitivities are not required, as in gradient free optimization, the forward Euler method with a coarse tolerance is a feasible mesh generation option in shape optimization.

Table 2.4: Analytical and forward finite difference sensitivities calculated for the bow-tie structure depicted in Figure 2.4.

    Point   Analytical (×10⁻³)   Numerical (×10⁻³)
    1       -0.111215            -0.111215
    2       -1.788542            -1.788540
    3       -0.043842            -0.043841
    4       -0.002512            -0.002512
    5        0.094632             0.094632
    6        1.818842             1.818844
    7        0.034656             0.034656
    8        0.001624             0.001623

2.4 Numerical examples

We implement two gradient based optimization algorithms, namely the Dynamic-Q method developed by Snyman and Hay [56], and the well known sequential quadratic programming (SQP) method [4]. All problems are allowed a maximum of 100 iterations, although in most cases the best objective function values f_best are obtained within 20 iterations for both algorithms. All the linear elastic FE analyses are performed using E = 200 × 10³ for Young's modulus, ν = 0.3 for Poisson's ratio and a thickness of 1.0, under plane stress conditions. We investigate the effect of the number of control points (NCP).

2.4.1 Example problem 1: Cantilever beam

Figure 2.5: Initial structure and definition for the cantilever beam.

Consider the cantilever beam depicted in Figure 2.5. The domain has a predefined length of 30 and a maximum allowable height of 10.
The objective is to minimise uF, the vertical displacement at the point of load application, subject to a maximum volume constraint of 70%. The magnitude of F is 10 N. The problem is conducted for an ideal element length of h0 = 1.0. The control points are linearly spaced along the length of the top of the cantilever beam, as indicated in Figure 2.5. The results obtained with both algorithms are summarized in Table 2.5.

Table 2.5: Best function value obtained for the cantilever beam problem.

    NCP   Dynamic-Q      SQP
    4     1.0073×10⁻²    1.0068×10⁻²
    7     1.0011×10⁻²    1.0013×10⁻²
    13    0.9996×10⁻²    1.0005×10⁻²

Figure 2.6: Cantilever beam: optimal shapes for a) 4, b) 7 and c) 13 control points respectively, obtained with the Dynamic-Q algorithm.

The optimal shapes obtained with the Dynamic-Q algorithm are depicted in Figure 2.6; the SQP shapes are similar and therefore not shown. It is clear that as the NCP increases, the optimal designs converge.

2.4.2 Example problem 2: Full spanner

Figure 2.7: Initial structure and loads for the full spanner problem.

Consider the full spanner problem depicted in Figure 2.7. The structure has a predefined length of 24 and a maximum allowable height of 10. The control points are linearly spaced along the length of the spanner, with half the control points describing the bottom and half the top of the geometry, as indicated in Figure 2.7. Note that no control points are used to describe the left- and right-most extremities of the spanner, which are stationary as indicated. The objective is to minimise ½(u_FA − u_FB), with u_FA and u_FB the vertical displacements at the points of load application for the two load cases FA and FB respectively. The corresponding boundary conditions for each load case are indicated by A and B respectively in Figure 2.7. In addition, the problem is subject to a maximum volume constraint of 30% of the defined domain, together with a minimum handle thickness of 2. The loads FA and FB both have a magnitude of 10 N.
The meshes are generated for an ideal element length h0 of 0.5. This problem should result in a symmetric geometry. However, symmetry is not enforced; deviations from symmetry are used to qualitatively evaluate the obtained designs. The results obtained with both algorithms are summarized in Table 2.6.

Table 2.6: Best function value obtained for the full spanner problem.

    NCP   Dynamic-Q     SQP
    4     5.7917×10⁻¹   5.7934×10⁻¹
    10    3.5133×10⁻¹   3.5150×10⁻¹
    22    3.1911×10⁻¹   3.1905×10⁻¹

Figure 2.8: Full spanner problem: optimal shapes for a) 4, b) 10 and c) 22 control points respectively, obtained with the Dynamic-Q algorithm.

The optimal shapes obtained with the Dynamic-Q algorithm are depicted in Figure 2.8; again the SQP shapes are similar and therefore not shown. From Figure 2.8 it is evident that symmetric designs are obtained in all cases.

2.4.3 Example problem 3: Michell-like structure

Figure 2.9: Initial structure and definition of half the Michell-like structure. (Control points x8 and x9 are indicated for the 16 NCP problems.)

The geometry [20] for this problem is depicted in Figure 2.9. The structure has a predefined length of 15 and a maximum allowable height of 10. The control points are linearly spaced along the length of the Michell-like structure, with two additional control points describing the top as opposed to the bottom of the structure, as depicted in Figure 2.9. The objective is to minimise uF, the vertical displacement at the point of load application, subject to a maximum volume constraint of 50%. The magnitude of F is 10 N. The problem is conducted for an ideal element length of h0 = 0.75.

Table 2.7: Best function value obtained for the Michell structure problem.
    NCP   Dynamic-Q     SQP
    4     1.4847×10⁻³   1.4327×10⁻³
    8     1.2690×10⁻³   1.2722×10⁻³
    16    1.2038×10⁻³   1.2084×10⁻³

Figure 2.10: Michell-like structure: optimal shapes for a) 4, b) 8 and c) 16 control points respectively, obtained with the Dynamic-Q algorithm.

The results are summarized in Table 2.7 and Figure 2.10. As before, both algorithms obtain essentially the same solutions. One peculiarity, however, is the optimal shape depicted in Figure 2.10(c). The shape of the right tip of the structure is counter-intuitive; its origin is the topic of the next section.

Objective function characteristics

Our resulting objective function contains numerically induced discontinuities, since we use remeshing in our shape optimization strategy. Even in the presence of these discontinuities, we were able to obtain intuitive designs for the cantilever beam and spanner problems. However, the Michell structure resulted in a counter-intuitive design, which we expect to be a complication of these discontinuities in the objective function. To investigate the nature of the objective function of the Michell structure, the two rightmost upper control variables x8 and x9 (see Figure 2.9) are perturbed. These control variables are varied between −1 and 1 (with equal intervals of 0.05) about the best objective function value found by the Dynamic-Q algorithm for the 16 NCP problem. As shown in Figure 2.11, the objective function is discontinuous and local minima are present. These local minima and discontinuities are not physical phenomena, but are purely due to remeshing. In fact, the objective function discontinuity is due to the mesh discontinuity, i.e. a change in a control variable value leads to the introduction or removal of an additional element. This is depicted in Figure 2.11: a small increase in x9 results in 5 elements (top insert in Figure 2.11) instead of 4 elements (bottom insert in Figure 2.11) on the rightmost edge of the structure.
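The mechanism can be mimicked in one dimension (an analogy of our own, not the thesis problem): the discretized tip displacement of a tapered bar jumps whenever the element count changes with the design variable, while the slope of each smooth branch barely changes.

```python
import numpy as np

# 1D analogy (assumed) of a remeshing-induced step discontinuity: the
# discretized tip displacement of a linearly tapered bar jumps when the
# element count n changes with the design variable L.
def tip_displacement(L, h0=0.5, E=1.0, P=1.0):
    n = max(1, int(L / h0 + 0.5))           # 'remeshing': n follows L
    le = L / n                              # actual element length
    mids = (np.arange(n) + 0.5) * le        # element midpoints
    A = 2.0 - mids / L                      # tapered cross-sectional area
    return float(np.sum(P * le / (E * A)))  # summed element flexibilities

eps = 1e-4
jump = tip_displacement(1.25 + eps) - tip_displacement(1.25 - eps)
print(jump > 10 * eps)   # True: an O(1) step where n changes, not O(eps)

# The branch slopes (the 'associated gradients') on either side of the
# step agree closely, which is what gradient-only optimization exploits:
left = (tip_displacement(1.2) - tip_displacement(1.1)) / 0.1
right = (tip_displacement(1.4) - tip_displacement(1.3)) / 0.1
print(abs(left - right) < 0.01)   # True
```

In the thesis the same effect arises in two dimensions whenever Delaunay triangulation adds or removes an element, as in Figure 2.11.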
Figure 2.11: Vertical displacement at the point of load application for the variations of the 2 rightmost upper control variables (x8, x9) for the mesh depicted in Figure 2.10(c).

Figure 2.12: Michell-like structure: optimal shapes for ideal element lengths of a) 0.75, b) 0.375 and c) 0.1875 respectively for the 16 NCP problem obtained by the Dynamic-Q algorithm.

Figure 2.13: Vertical displacement at the point of load application for the variation of the rightmost upper control variable (x9) for the meshes depicted in Figures 2.12(a), 2.12(b) and 2.12(c).

Figure 2.14: Michell-like structure: effect of different starting points on the optimal shape for a) the coarse, and b) the fine mesh, obtained with the Dynamic-Q algorithm using 16 control points.

Since the mesh discontinuity is responsible for the discontinuities in Figure 2.11, it follows that a decrease in element size should decrease the magnitude of these discontinuities, although their number will increase. To investigate the effect of h-refinement, the 16 NCP problem is repeated here for ideal element lengths of h0 = 0.375 and h0 = 0.1875. The optimal shapes are depicted in Figure 2.12. It is clear that as the element size decreases, the right tip of the structure gradually flattens off. To quantify the number of discontinuities and their magnitude, only the rightmost upper control variable x9 is varied between −1 and 1 about the optimal designs depicted in Figure 2.12. Figure 2.13 confirms that as the element size decreases, the number of discontinuities increases, while the magnitudes decrease.
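This scaling behaviour can be mimicked with a small toy model (our own sketch, not the thesis code): a smooth response plus a discretization-error term of size proportional to h that jumps whenever the element count changes. The element-count rule and the h·sin(7n) error term below are invented purely for illustration.

```python
import math

def approx_objective(x, h):
    """Toy 1-D analogue of a remeshed objective: the number of elements
    n = ceil(x / h) changes stepwise with the design variable x, so the
    discretization error (and hence the objective) jumps whenever n changes."""
    n = math.ceil(x / h)            # element count changes stepwise with x
    smooth = (x - 1.5) ** 2         # underlying smooth physical response
    error = h * math.sin(7.0 * n)   # invented O(h) error term, jumps with n
    return smooth + error

def max_jump(h, num=2000):
    """Largest step observed when sweeping x over [0.5, 2.5]."""
    xs = [0.5 + 2.0 * i / num for i in range(num + 1)]
    vals = [approx_objective(x, h) for x in xs]
    return max(abs(vals[i + 1] - vals[i]) for i in range(num))

# Halving h roughly halves the jump magnitudes while increasing the number
# of discontinuities, mirroring the trend of Figure 2.13.
print(max_jump(0.75), max_jump(0.375), max_jump(0.1875))
```

The printed jump magnitudes decrease monotonically as h is halved, in line with the h-refinement observation above.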
These discontinuities are however still severe enough to adversely affect the performance of gradient based optimization algorithms. To demonstrate that the geometric anomalies are indeed associated with local minima, we now restart the Dynamic-Q algorithm at an arbitrary starting point. (The results in the foregoing were all obtained for an initial volume fraction of 1.0.) For the coarse (h0 = 0.75) and fine (h0 = 0.1875) meshes respectively, the optimal shapes obtained are depicted in Figures 2.14(a) and 2.14(b). For the coarse mesh, the objective function uF decreases from 1.204×10−3 when the geometric anomaly due to the local minimum is present, to 1.186×10−3, which is (presumably) the global optimum. For the fine mesh, the objective function uF decreases from 1.417×10−3 with the geometric anomaly due to the local minimum, to 1.395×10−3, again presumably the global optimum. While h-refinement does seem to reduce the severity of local minima, it seems unable to assist in escaping from them. This observation is of course problem specific, and may even be proved incorrect in the limit of mesh refinement. However, for practical meshes, it is clear that solutions may be local minima when convex solvers are used. A simple multi-start strategy is likely to be of great benefit in attempting to find the global optimum (or at least some ‘good’ local optimum).

2.5 Conclusion

We have applied gradient based optimization techniques to shape design problems. In doing so, we have created a novel unstructured remeshing shape optimization environment, based on a truss structure analogy. The remeshing environment is quadratically convergent in solving for the equilibrium positions of the truss structure. As expected, the objective function value decreases as the number of control points is increased. This is a direct result of the increased number of possible design configurations.
However, due to the unstructured remeshing, discontinuities (local minima) are introduced into the optimization problem. In two of the three problems we studied, namely the cantilever and spanner designs, these discontinuities did not seem to hamper the optimization. However, they hampered the optimization of the Michell structure, as the final design converged to a counter intuitive design which we showed to be a local minimum caused by a discontinuity. Although the magnitude of these discontinuities decreases with mesh refinement, their number increases. For the gradient based algorithms, the severity of the anomaly is alleviated as the mesh is refined. Polynomial refinement, e.g. using linear strain triangles, may further decrease the magnitude of the discontinuities. It is however suggested that local minima may efficiently and effectively be overcome with a simple multi-start strategy. Even with the most inaccurate 2-D element available, namely the CST triangular element, we have demonstrated that gradient based algorithms are able to solve shape optimization problems efficiently using an unstructured remeshing strategy which makes analytical gradients available to the optimization algorithms.

CHAPTER 3

Applications of gradient-only optimization

In this chapter we study the minimization of numerically approximated objective functions containing non-physical jump discontinuities. These discontinuities arise when (partial) differential equations are discretized using non-constant methods and the resulting numerical solutions are used in computing the objective function. Although these functions may become discontinuous and non-differentiable, we can compute exact gradient information where the function is differentiable and construct approximate gradient information where it is discontinuous.
At a non-differentiable point, a partial derivative of the gradient vector is constructed by a one-sided directional derivative, or given by the partial derivative itself, when the function is respectively non-differentiable or differentiable along the partial derivative direction. Such a constructed gradient field follows from the computational scheme, since every point has an associated discretization for which (semi) analytical sensitivities [40] of the numerically approximated optimization problem can be calculated. The only requirement is that we use a constant discretization topology when computing the sensitivities. Hence, from a computational perspective the gradient field of these discontinuous functions is defined everywhere, albeit constructed at the discontinuities. In this study we refer to this gradient field as the associated gradient field. Rather than constructing global approximations using only function value information to overcome the discontinuities, we propose to use only associated gradient information. We elaborate on the modifications of classical gradient based optimization algorithms for use in gradient-only approaches, and we then present gradient-only optimization strategies using both BFGS and a new spherical quadratic approximation for sequential approximate optimization (SAO). We then use the BFGS and SAO algorithms to solve three problems of practical interest, both unconstrained and constrained. This chapter develops as follows: An overview of discontinuous objective functions in optimization is given in Section 3.1, and definitions are presented in Section 3.2. We then present the classical mathematical programming problem and the corresponding gradient-only optimization problem in Section 3.3. The optimization algorithms used in this study are then outlined in Section 3.4. We then consider three example problems in Section 3.5.
These are a one dimensional transient heat transfer problem, an unconstrained and a constrained shape design problem, and lastly a material identification study. Finally, we offer conclusions in Section 3.6. The derivation of the required sensitivities for the test problems is outlined in an Appendix.

3.1 Introduction

Many problems in engineering and the applied sciences are described by ordinary or partial differential equations (ODEs/PDEs), e.g. the well-known elliptic PDEs of structural mechanics. Analytical solutions to these are seldom available and in many cases (approximate) numerical solutions need to be computed. Temporal (P)DEs may be solved using fixed or variable time steps; for spatial (P)DEs, the equivalents are fixed and mesh moving spatial updating strategies on the one hand, and remeshing on the other. Fixed time steps and mesh moving strategies may however imply serious difficulties, e.g. impaired convergence rates and highly distorted grids and meshes, which may even result in failure of the computational procedures used. The variable or ‘non-constant’ strategies are preferable by far. (P)DEs also often describe the physics of some problem that is to be optimized using numerical optimization techniques. Indeed, the numerical solution of (P)DE problems is regularly used to numerically approximate optimization problems in science and engineering. However, the variable methods now become problematic, since they may result in non-smooth or step-discontinuous objective functions of the design variables, whereas the ‘constant’ strategies result in smooth continuous objective functions. The step discontinuities resulting from the variable methods are non-physical, since they are mere artifacts of the numerical strategies used to approximate the inherently smooth objective function of the exact optimization problem described by the (P)DE under consideration.
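As a minimal illustration of this distinction (our own toy example, not drawn from the thesis), consider approximating J(x) = ∫₀ˣ sin t dt with the composite trapezoidal rule. A ‘variable’ rule whose interval count n depends stepwise on x produces a step discontinuous J, while a ‘constant’ rule with fixed n varies smoothly:

```python
import math

def trap(f, a, b, n):
    """Composite trapezoidal rule with n intervals."""
    w = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * w) for i in range(1, n))
    return s * w

def J_variable(x, h0=0.5):
    """'Variable' discretization: the interval count changes stepwise with
    x, so the discretization error (and hence J) jumps when n changes."""
    n = max(1, math.ceil(x / h0))
    return trap(math.sin, 0.0, x, n)

def J_constant(x, n=8):
    """'Constant' discretization: n is fixed, so J varies smoothly with x."""
    return trap(math.sin, 0.0, x, n)

eps = 1e-9
jump_var = abs(J_variable(1.0 + eps) - J_variable(1.0 - eps))  # n: 2 -> 3
jump_con = abs(J_constant(1.0 + eps) - J_constant(1.0 - eps))
print(jump_var, jump_con)  # variable: finite jump; constant: ~0
```

Crossing x = 1 changes the variable interval count from 2 to 3, producing a finite jump in J, whereas the fixed-n objective changes only infinitesimally.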
Although these functions may become discontinuous and non-differentiable, we can compute exact gradient information where the function is differentiable and construct approximate gradient information where it is discontinuous. At a non-differentiable point, a partial derivative of the gradient vector is constructed by a one-sided directional derivative, or given by the partial derivative itself, when the function is respectively non-differentiable or differentiable along the partial derivative direction. Such a constructed gradient field follows from the computational scheme, since every point has an associated discretization for which (semi) analytical sensitivities [40] of the numerically approximated optimization problem can be calculated. The only requirement is that we use a constant discretization topology when computing the sensitivities. Hence, from a computational perspective the gradient field of these discontinuous functions is defined everywhere, albeit constructed at the discontinuities. In this study we refer to this gradient approximation as the associated gradient, denoted ∇A f(x), which gives an associated directional derivative when ∇A f(x) is used to compute the directional derivative. Consider Figure 3.1, which depicts three functions that describe an optimization problem: fA(x) is the unknown analytical function of an exact optimization problem described by a system of partial differential equations, and fN(x) is the numerically computed piece-wise smooth step discontinuous objective function which is an approximation to fA(x).

Figure 3.1: Plot depicting a piece-wise smooth step discontinuous numerical objective function fN(x) together with the corresponding underlying (unknown) continuous analytical objective function fA(x) of an optimization problem.
In addition, Figure 3.1 depicts fC(x), a projected piece-wise smooth continuous objective function constructed from fN(x) by removing the step discontinuities. The optimum x∗ and positive associated gradient projection point x∗g of fN(x) are also indicated. We refer to x∗g as a positive associated gradient projection point since the associated directional derivatives in all directions around that point are positive. However, to avoid using long descriptive names we will merely refer to x∗g as a positive projection point. The discontinuities in fN(x), as well as the associated gradient field (and associated directional derivatives), are a direct consequence of a sudden change in the discretization used to compute an approximate solution to a system of partial differential equations. As long as the changes in the discretization vary smoothly, the underlying objective function is smooth. This is obtained when using constant discretization strategies, since the underlying discretization errors vary smoothly. The use of such smooth objective functions is prevalent in engineering optimization [34, 45]. When the objective function is smooth, as opposed to the objective function depicted in Figure 3.1, the optimum x∗ and positive projection point x∗g define the same point. This point is usually assumed to be close to the exact optimizer, in particular when good numerical strategies are used. If however the discretization changes abruptly, a step discontinuity results. The response could suddenly be underestimated (as depicted at xc) or overestimated (as depicted at xi), as demonstrated in Figure 3.1. We refer to these two points respectively as consistent (xc) and inconsistent (xi) step discontinuities. The function value and the slope of the function (associated derivative) around a consistent discontinuity both indicate descent.
Around an inconsistent discontinuity the function value indicates ascent, as opposed to the descent indicated by the slope; the two are hence inconsistent with each other. Therefore, inconsistent discontinuities are completely ignored when conducting optimization that relies only on the associated derivative, whereas they manifest as local minima when conducting optimization that also uses function values. The premise of this study is that the piece-wise continuous parts of the numerically approximated objective function fN(x), where the error is smoothly varying, describe the behaviour of the underlying unknown analytical function fA(x) better than the abrupt changes in the discretization error do. This premise holds in particular when rediscretization is done to increase the accuracy of the analysis by reducing the discretization error. The motivation for our premise is as follows: consider the optimum x∗ that occurs over an inconsistent discontinuity, as depicted in Figure 3.1. The associated derivative at x∗ indicates further possible improvement if x is increased. It is reasonable to believe that had the discretization been varied smoothly with no discontinuity present, the function value would have continued to decrease, as the error would change smoothly as before. However, when rediscretization is conducted the discretization error changes abruptly and hence introduces a discontinuity. The function value and associated gradient vector obtained after the rediscretization are more accurate, if we assume that the rediscretization increases the accuracy of the analysis and consequently reduces the discretization error. If the function continues to decrease along this direction (even if it starts at a higher value), then the optimum at the discontinuity can safely be discarded as an unwanted or inferior solution to the optimization problem.
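A minimal sketch of the two kinds of discontinuity (all slopes and jump sizes invented for illustration):

```python
def f_N(x):
    """Toy piece-wise linear fN with slope -0.5 everywhere (descent), a
    downward jump at x = 0 (consistent) and an upward jump at x = 1
    (inconsistent)."""
    jump = 0.0
    if x >= 0.0:
        jump -= 0.3   # consistent: the value drops in the descent direction
    if x >= 1.0:
        jump += 0.4   # inconsistent: the value rises against the slope
    return -0.5 * x + jump

def dA(x):
    """Associated derivative: the slope of the smooth piece containing x."""
    return -0.5

eps = 1e-9
# Consistent discontinuity at x = 0: value and slope both indicate descent.
assert f_N(0.0 + eps) < f_N(0.0 - eps) and dA(0.0) < 0.0
# Inconsistent discontinuity at x = 1: the value jumps up (a spurious local
# minimum for value-based methods), while the slope still indicates descent.
assert f_N(1.0 + eps) > f_N(1.0 - eps) and dA(1.0) < 0.0
print("consistent at x=0, inconsistent at x=1")
```

A value-based method sweeping x to the right would stop just left of x = 1, whereas the associated derivative is negative on both sides of the jump and indicates uninterrupted descent.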
A more suitable solution to the optimization problem would then be a local minimizer of the piece-wise smooth continuous function fC(x), as depicted in Figure 3.1. Such a point is, however, also characterized by a positive projection point x∗g of fN(x), as depicted in Figure 3.1. Consider the positive projection point x∗g that occurs over a discontinuity as depicted by fN(x) in Figure 3.1, with a piece-wise smooth part L of the function to the left of it and a piece-wise smooth part R of the function to the right of it. If rediscretization were omitted and the piece-wise smooth part L extended, an optimal solution would occur to the right of x∗g; similarly, the extended piece-wise smooth part R would place an optimum to the left of it. Therefore, x∗g would be bounded by the two optima of the two extended piece-wise continuous parts, and consequently lies within a domain of uncertainty of the numerical model. In order to distinguish between these three possible solutions, one would have to increase the accuracy of the numerical model or obtain information regarding the absolute error w.r.t. the exact solution, which is usually not available. Lastly, when the optimum x∗ and the positive projection point x∗g coincide, the use of only the associated derivative is an efficient strategy to ignore the local minima introduced by inconsistent step discontinuities. Hence, variable time step methods and variable remeshing techniques are normally avoided in optimization, due to the very fact that the discontinuities present may cause numerous difficulties during the optimization steps. An important spatial example is structural shape optimization, in which fixed or moving grid strategies are almost always used; the very motivation for this being that remeshing strategies cannot be used efficiently in optimization, due to the induced discontinuities, e.g. see References [1, 9, 32, 39].
The reasons for this are obvious: discontinuous cost functions are difficult to optimize in comparison to smooth convex cost functions, since the discontinuities may introduce spurious or false local minima, which may make the use of efficient gradient based optimization algorithms difficult, if not impossible. Accordingly, the optimization of discontinuous functions usually requires highly specialized optimization strategies and, possibly, heuristic approaches. For the continuous programming problem, many well known minimization algorithms are available, e.g. steepest descent, (preconditioned) conjugate gradient methods, and variable metric methods like BFGS. However, if f is discontinuous, i.e. f ∉ C0, the minimizer x∗ of the mathematical programming problem may not satisfy the standard optimality criteria. Indeed, the optimality criteria may not even be defined. Accordingly, the well-known efficient optimization algorithms mentioned above may be unable to minimize the resulting step discontinuous function. Conventional gradient based optimization approaches have been used in restart strategies that restart the optimization process after a discontinuity, to continue with the conventional optimization process [26, 30]. Such an approach would eventually converge to a positive projection point x∗g, and not necessarily the optimum x∗ as might be expected. Such an approach however requires two analyses per design over a discontinuity, in addition to the computations required to identify the discontinuity. As a solution strategy, some researchers in structural optimization have resorted to surrogate optimization, in which approximations are constructed using function values only, in combination with design-of-experiments (DoE) techniques. The resulting approximations are smooth, and are often assumed to be valid over large regions of the design domain.
Known as so-called global approximations, they may be optimized using gradient based techniques, if so desired. (Although the use of function values only is popular, gradient information is sometimes also used. In addition, ‘mid-range’ surrogates are sometimes also constructed, in combination with elaborate strategies for controlling the step sizes and the acceptance and rejection of points.) A drawback of surrogate optimization is that problems are limited to moderate dimensions, since the algorithms scale poorly with dimensionality. For an overview of these methods, see Barthelemy and Haftka [3], Haftka and Gurdal [24], Sacks et al. [47], Toropov [59], and many others. The interested reader is also referred to the recent review paper by Simpson et al. [51] as well as the paper by Forrester and Keane [19]. Accurate associated gradient information of the numerically approximated optimization problem is indeed available for the (P)DEs that occur so frequently in engineering and applied mathematics. This does of course not hold for ‘inherently noisy’ objective functions like crash simulations, etc. An important example of non-physical discontinuities introduced by variable discretization methods occurs in shape optimization, a problem we will elaborate on in some detail herein. Yet, few gradient-only optimization algorithms seem to be available, even though gradient-only optimization algorithms are by no means new to optimization. The first gradient-only optimization algorithm developed is the widely known Newton’s method, e.g. see Wallis [64]. However, Newton’s method has several drawbacks for practical optimization. Firstly, it requires that a function be twice differentiable, since Hessian information is needed to compute the update steps. Secondly, Newton’s method locates points with diminishing gradients, and may therefore converge to suboptimal designs.
In addition, Newton’s method may oscillate around an optimal solution, or even diverge. To the best of our knowledge, only a few other gradient-only optimization algorithms have been developed; see References [4, 43, 44, 49, 53, 54]. Most of these algorithms also address some of the major difficulties of Newton’s method. Gradient-only optimization has also been extended to non-smooth functions with the introduction of subdifferentials¹ by Shor et al. [49]; these are used to compute and define the gradient of a function where it is not differentiable. Subdifferentials are however defined as the set of subgradients for which a hyperplane (linearization of the function) defined by the subgradient at the discontinuity is less than or equal to the actual function value. At a step or jump discontinuity the subdifferential is the empty set on the side of the discontinuity where the hyperplane lies above the function. In addition, subgradient methods reduce to steepest descent methods with a priori selected step lengths, which often require tuning (in particular for larger problems [5]) as well as Lipschitz conditions on the function. Although the use of a priori selected step lengths increases the flexibility of the algorithm (e.g. when optimizing piece-wise linear functions), it increases the computational cost when the additional flexibility is not required. Lastly, all these methods are designed and employed to find the point at which the objective function is a minimum, with the exception of Newton’s method, which is often employed merely to find stationary points of a function. Our gradient-only approach, however, solely aims to find positive projection points x∗g. For smooth convex functions this is equivalent to finding the optimum x∗ of the function.
However, when piece-wise smooth step discontinuous functions are considered, the optimum x∗ may be distinct from the positive projection point x∗g, as they define two generalizations of solutions for piece-wise smooth step discontinuous functions. Our gradient-only approach merely requires that the gradient field be everywhere computable, as is for example the case with associated gradients. Gradient-only optimization approaches are merely conventional gradient based optimization algorithms, slightly modified to use only gradient information, and have computational efficiency comparable to conventional gradient based optimization algorithms. In addition, we require no assumptions regarding the Lipschitz conditions of a function. Consider again Figure 3.1, for which we note that by only considering the associated derivative of fN(x) during optimization, we effectively optimize fC(x) without having to go through the computational effort of distilling fC(x) from fN(x), since the associated derivative fields of fN(x) and fC(x) are the same. Gradient-only optimization effectively reduces the complexity of functions plagued with numerically induced step or jump discontinuities by acting as a filter that ignores these discontinuities, on the condition that a positive projection point (defined as the solution to the gradient-only optimization problem presented herein) is a suitable solution for the problem at hand. If it is known or assumed that x∗g coincides with a minimizer x∗ of f(x), then these problems can be approached from a classical mathematical programming perspective, but have to be solved using global optimization algorithms. This is due to the numerically induced step discontinuities that manifest as local minima in the function domain.

¹ A subdifferential is a set of subgradients at a point.
However, the resulting discontinuous problems may still be optimized efficiently using gradient-only optimization, as the numerically induced step discontinuities are ignored; this is an alternative to constructing global approximations. Lastly, both the function values and the analytically computed associated derivatives are consistently computed for a numerically computed objective function, and both are prone to discretization errors, in particular discretization errors that change abruptly. We do however note that while inconsistent step discontinuities in the function value may result in local minima and trap conventional optimization approaches, step discontinuities in the associated derivative, in our experience, mostly result in abrupt changes in the magnitude of the associated derivative but not in its sign, which indicates the direction of ascent of f(x). This affects only the convergence rate of gradient-only optimization approaches, and not their robustness. If however an error in the associated derivative changes the sign of the associated derivative, it would manifest as a positive projection point. Such a point would, however, also manifest as a local minimum in the function value. We show numerically that our premise and gradient-only approaches yield promising results for multidimensional problems. To ease the presentation of the remainder of this chapter, we will merely refer to f as the objective function for which we would like to find the positive projection point x∗g; this implies the numerically approximated objective function fN.

3.2 Definitions

The functions we consider in this study are step discontinuous and therefore not everywhere differentiable. Computationally, however, the derivatives and gradients are everywhere computable, since the analysis is per se restricted to the part of the objective function before or after a discontinuity.
We therefore define an associated derivative f′A(x) and associated gradient ∇A f(x), which follow computationally when the sensitivity analysis is consistent [48]. Firstly, we define the associated derivative:

Definition 3.2.1. Let f : X ⊂ R → R be a real univariate piece-wise smooth step discontinuous function that is everywhere defined. The associated derivative f′A(x) of f(x) at a point x is given by the derivative of f(x) at x when f(x) is differentiable at x. The associated derivative f′A(x) of f(x) non-differentiable at x is given by the left-sided derivative of f(x) when x is associated with the left piece-wise continuous section of the discontinuity; otherwise it is given by the right-sided derivative.

Secondly, the associated gradient is defined as follows:

Definition 3.2.2. Let f : X ⊂ Rn → R be a real multivariate piece-wise smooth step discontinuous function that is everywhere defined. The associated gradient ∇A f(x) of f(x) at a point x is given by the gradient of f(x) at x when f(x) is differentiable at x. The associated gradient ∇A f(x), if f(x) is non-differentiable at x, is defined as the vector of partial derivatives where each partial derivative is an associated derivative (see Definition 3.2.1).

It follows from Definitions 3.2.1 and 3.2.2 that the associated gradient reduces to the gradient of a function that is everywhere differentiable. Secondly, we present definitions for univariate and multivariate associated gradient unimodality, based solely on the associated gradient field of a real valued function [4].

Definition 3.2.3. A univariate function f : X ⊂ R → R with associated derivative f′A(λ) uniquely defined for every λ ∈ X, is (resp., strictly) associated derivative unimodal over X if there exists an x∗g ∈ X such that

f′A(x∗g + λu)u ≥ (resp., >) 0, ∀ λ ∈ {β : β > 0 and β ∈ R} and ∀ u ∈ {−1, 1} such that [x∗g + λu] ∈ X. (3.1)

We now consider (resp., strictly) associated derivative unimodality for multivariate functions [46].
Definition 3.2.4. A multivariate function f : X ⊂ Rn → R is (resp., strictly) associated derivative unimodal over X if for all x1 and x2 ∈ X with x1 ≠ x2, every corresponding univariate function F(λ) = f(x1 + λ(x2 − x1)), λ ∈ [0, 1] ⊂ R, is (resp., strictly) associated derivative unimodal according to Definition 3.2.3.

Note that our definition of associated derivative unimodality excludes functions that are unbounded in the associated derivative although such functions are bounded from below, e.g. f(x) = −x/2 + ⌊|x|⌋.

3.3 Problem formulation

We consider both unconstrained and constrained optimization problems; we depart with the former.

3.3.1 Unconstrained minimization problem

Consider the following general unconstrained minimization problem:

Formulation 3.3.1. Let f : X ⊆ Rn → R be a real-valued function that is bounded from below. Find the minimum value f(x∗), such that

f∗ = f(x∗) {<} ≤ f(x), ∀ x ∈ X, (3.2)

with X the convex set of all possible solutions.

If the function f is unimodal and at least twice continuously differentiable, i.e. f ∈ C2, the minimizer x∗ ∈ X is characterized by the optimality criterion

∇f(x∗) = 0, (3.3)

with H(x∗) positive semi-definite. Here, ∇ represents the gradient operator, and H the Hessian matrix. However, if f is discontinuous, i.e. f ∉ C0, the minimizer x∗ of the mathematical optimization problem (3.2) may not satisfy the optimality criterion given in (3.3). Indeed, the optimality criterion (3.3) may not even be defined. If f is an associated derivative unimodal discontinuous function with an associated gradient field that is everywhere defined, an alternative generalization of Formulation 3.3.1 may be written in derivative form as follows:
Formulation 3.3.2. Find the non-negative gradient projection point x∗g (hereafter referred to as a positive projection point) of a given {strictly} associated derivative unimodal real-valued function f : X ⊆ Rn → R, such that

∇ᵀA f(x∗g + δu)u {>} ≥ 0, ∀ u ∈ Rn and ∀ (δ > 0) ∈ R such that x∗g + δu ∈ X, (3.4)

with X the convex set of all possible solutions.

Formulation 3.3.2 implies, in the strict case, that departure from the positive projection point x∗g in any search direction u ∈ Rn, and for any step size δ > 0, results in positive associated directional derivatives. It follows that the sign of the gradient projected onto some descent direction u changes from negative to positive at the positive projection point x∗g. For f smooth, i.e. f ∈ C2, Formulations 3.3.1 and 3.3.2 are equivalent, since the condition ∇f(x∗) = 0 follows, and the positive projection point x∗g in Formulation 3.3.2 is identical to the minimizer x∗ in Formulation 3.3.1. The second order condition, i.e. H positive definite, is implied by the requirement that the associated directional derivatives w.r.t. all search directions u are larger than 0. For associated derivative unimodal step discontinuous objective functions, Formulations 3.3.1 and 3.3.2 may define different solutions.

3.3.2 Equality constrained minimization problems

Next, we consider the following general equality constrained minimization problem:

Formulation 3.3.3. Find the minimum value f(x∗) of a given real-valued function f : X ⊆ Rn → R, such that

f∗ = f(x∗) {<} ≤ f(x), ∀ x ∈ X, such that hj(x) = 0, j = 1, 2, . . . , r ≤ n, (3.5)

with X the convex set of all possible solutions and with hj : X ⊆ Rn → R, j = 1, 2, . . . , r ≤ n.

For smooth objective and constraint functions, we can transform problem (3.5) into an unconstrained optimization problem via the Lagrangian function

L(x, λ) = f(x) + Σ_{j=1}^{r} λj hj(x), (3.6)

which allows us to solve (3.5) using the dual formulation

max_λ {min_x L(x, λ)}. (3.7)
CHAPTER 3. APPLICATIONS OF GRADIENT-ONLY OPTIMIZATION

If the objective and constraint functions are convex, then the solution of Eq. (3.7) is also the optimum. Finally, we progress to discontinuous equality constrained optimization problems in which the objective and constraint functions are smooth around the positive projection point x*g:

Formulation 3.3.4. Find the positive projection point x*g of a given {strictly} associated derivative unimodal real-valued function f : X ⊆ Rⁿ → R, such that

∇ᵀA f(x*g + δu)u {>} ≥ 0, ∀ x*g + δu ∈ X,
such that hⱼ(x*g) = 0, ∀ j = 1, 2, . . . , r ≤ n,
and u ∈ {ū : ∇ᵀA hⱼ(x*g)ū = 0, ∀ j = 1, 2, . . . , r ≤ n},   (3.8)

with X the convex set of all possible solutions, δ a real positive value and x*g ∈ X.

The condition ∇ᵀA hⱼ(x*g)ū = 0 [46] reduces the set of projection directions u to a set of feasible directions. Firstly, we construct the Taylor expansion of the j-th equality constraint hⱼ around x*g to give

h̃ⱼ(x*g + ū) = hⱼ(x*g) + ∇ᵀA hⱼ(x*g)ū + O(ū²),   (3.9)

which reduces to

hⱼ(x*g + ū) ≈ hⱼ(x*g) + ∇ᵀA hⱼ(x*g)ū.   (3.10)

Since x*g is feasible by definition we have hⱼ(x*g) = 0; in addition we require Eq. (3.10) to be feasible, which gives

hⱼ(x*g + ū) ≈ hⱼ(x*g) + ∇ᵀA hⱼ(x*g)ū = ∇ᵀA hⱼ(x*g)ū = 0,   (3.11)

and Eq. (3.11) needs to hold for all j = 1, 2, . . . , r ≤ n constraints. Consequently, u is required to be in the set {ū : ∇ᵀA hⱼ(x*g)ū = 0, ∀ j = 1, 2, . . . , r ≤ n}.

Again using the Lagrangian function, we may transform problem (3.8) into an unconstrained optimization problem. This time, in solving (3.8), we use a gradient-only dual formulation

maxᵍ_λ { minᵍ_x L(x, λ) },   (3.12)

with maxᵍ defined as follows: find λ, such that

∇ᵀAλ L(x, λ + γᵥv)v ≤ 0, ∀ v ∈ Rʳ,   (3.13)
and similarly for minᵍ: find x, such that

∇ᵀAx L(x + δᵤu, λ)u ≥ 0, ∀ u ∈ Rⁿ such that x + δᵤu ∈ X,   (3.14)

with X the convex set of all possible solutions, ∇Ax the partial associated derivatives w.r.t. x, ∇Aλ the partial associated derivatives w.r.t. λ, and δᵤ and γᵥ real positive numbers. For step discontinuous functions, inconsistencies between Formulations 3.3.1 and 3.3.2 on the one hand, and Formulations 3.3.3 and 3.3.4 on the other, may arise.

3.4 Optimization algorithms

Most (if not all) classical optimization algorithms can be modified to use only gradient information instead of both function value and gradient information. To illustrate, we consider two classes of algorithms, namely line search descent methods and approximation methods. In classical line search descent methods, the search direction is obtained from gradient information. The step length is usually obtained using some line search strategy that uses function value information. However, gradient-only line search descent methods may be formulated by merely changing the line search strategies to use only associated gradient information. In this study, we will consider a bracketing-interval line search strategy, although many alternative and more efficient line search strategies are available. (As an alternative, Snyman [54] presents a gradient-only implementation of an interpolation line search strategy.) In particular, we will consider the Broyden-Fletcher-Goldfarb-Shanno (BFGS) second order line search descent algorithm [55], which is suitable for problems that range from small to large dimensionality. For large-scale problems [21], the limited memory BFGS algorithm [36] is indeed widely used.
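Before turning to concrete algorithms, the notion of a positive projection point (Formulation 3.3.2) can be illustrated numerically. The following sketch is not from the thesis: the quadratic trend, the floor-induced steps and the step height are hypothetical choices, made only to show a function whose step discontinuities create local minima while the associated derivative changes sign just once.

```python
import numpy as np

# Hypothetical univariate example: a smooth convex trend with artificial
# step discontinuities added via floor(), mimicking remeshing-induced jumps.
def f(x):
    return (x - 2.0)**2 + 0.3 * np.floor(4.0 * x)   # steps of height 0.3

def grad_A(x):
    # Associated derivative: the derivative of the underlying smooth trend;
    # the piecewise-constant steps contribute zero slope between jumps.
    return 2.0 * (x - 2.0)

# The positive projection point x*_g is where the associated derivative
# changes sign from negative to positive: here x*_g = 2, even though f
# has many step-induced local minima.
xs = np.linspace(1.9, 2.1, 5)
print([float(np.sign(grad_A(x))) for x in xs])  # sign flips at x = 2
```

A function-value-based method can terminate in any of the step-induced local minima; the sign change of the associated derivative identifies only x*g.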
Lastly, the BFGS algorithm can be used without a line search strategy: monotonically decreasing, superlinearly convergent BFGS behavior has been demonstrated using only fixed step length updates [68], when certain Lipschitz continuity assumptions regarding the objective function hold. In sequential approximate optimization (SAO), the approximation functions are normally constructed using both function and gradient information. Function values may for example be used to approximate the curvature used in the approximations, as is done in spherical quadratic approximations, e.g. see Reference [52]. Again, gradient-only optimization is however possible. We would like to emphasize that our choice of the BFGS and SSA algorithms is arbitrary, and merely illustrates the ease with which we can transform conventional gradient-based algorithms into gradient-only optimization algorithms. We deliberately chose one line search algorithm and one approximation algorithm, as they represent two widely used classes of optimization algorithms. By following the same methodology, other optimization algorithms which require a decision to be made between a current and a proposed solution using function value information can be extended using the proposed methodology. Hence, to the best of our knowledge, all conventional gradient-based algorithms could be modified using our methodology. Extending gradient-only optimization to alternative classes of optimization algorithms, for example population based methods, may prove more challenging, in particular when the features of these methods are to be preserved. We note that gradient-only algorithms based on classical optimization algorithms that scale well may also be expected to scale well (provided the associated gradient computation scales well).
Finite difference strategies may become prohibitively expensive for computing the full associated gradient vector of large-scale problems, but may be adequate for the computation of associated directional derivatives². Automatic differentiation or (semi-)analytical sensitivity analyses are computationally efficient alternatives for computing associated gradient information.

3.4.1 BFGS second order line search descent method

In the BFGS algorithm, the iterates are updated using

x{k} = x{k−1} + λ{k}u{k},   (3.15)

where the superscript k indicates the iteration number, x the design vector and u the descent direction. λ is a scalar value obtained from a line search. The descent direction u{k} is obtained from

u{k} = −G{k−1}∇A f(x{k−1}),   (3.16)

and G{k} is updated using

G{k} = G{k−1} + [1 + ((y{k})ᵀG{k−1}y{k}) / ((v{k})ᵀy{k})] (v{k}(v{k})ᵀ) / ((v{k})ᵀy{k}) − (v{k}(y{k})ᵀG{k−1} + G{k−1}y{k}(v{k})ᵀ) / ((v{k})ᵀy{k}),   (3.17)

with

v{k} = λ{k}u{k},

and

y{k} = ∇A f(x{k}) − ∇A f(x{k−1}).

G{0} is commonly initiated with the n × n identity matrix I, and reset to G{k} = I after every n iterations [55].

Let us return to the line search strategies needed to determine λ{k}. The line search conducted at the k-th iteration from the point x{k−1} along a descent direction u{k} is described by the univariate function

F(λ) = f(x{k−1} + λu{k}), λ ≥ 0, λ ∈ R.   (3.18)

Generally, line searches use function values to locate the minimum of F(λ). Various strategies exist to conduct function value based line searches, e.g. explicit searches using the golden section rule, or implicit searches using some interpolation strategy with Powell's method, to name one example.

² Computation of accurate and consistent gradients using finite differences requires a temporary deactivation of the non-constant discretization strategy, in order to avoid a finite difference step over a discontinuity [48].
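The rank-two update (3.17) can be sketched compactly. The quadratic test function and the two iterates below are illustrative assumptions, not from the thesis; the check exploits the fact that the BFGS inverse Hessian update satisfies the secant condition G{k}y{k} = v{k} by construction.

```python
import numpy as np

def bfgs_update(G, v, y):
    """One BFGS update of the approximate inverse Hessian G, following
    Eq. (3.17): v = lambda*u is the step, y the gradient difference."""
    vy = float(v @ y)
    term1 = (1.0 + (y @ G @ y) / vy) * np.outer(v, v) / vy
    term2 = (np.outer(v, y) @ G + G @ np.outer(y, v)) / vy
    return G + term1 - term2

# Sketch on a quadratic f(x) = 0.5 x^T A x, whose exact inverse Hessian is A^{-1}.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda x: A @ x
x0, x1 = np.array([1.0, 1.0]), np.array([0.2, 0.5])
G = bfgs_update(np.eye(2), x1 - x0, grad(x1) - grad(x0))
# Secant condition holds: G y = v
print(np.allclose(G @ (grad(x1) - grad(x0)), x1 - x0))  # True
```

Note that only gradient differences enter the update, which is what makes the method directly usable in a gradient-only setting.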
Alternatively, line searches can be conducted solely based on the associated derivative of F(λ), with F′A(λ) given by

F′A(λ) = ∇ᵀA f(x{k−1} + λu{k}) u{k}.   (3.19)

Line searches using function value information

We use a crude three point bracketing strategy to locate a minimum of F(λ), which is assumed to be unimodal. Three points from the sequence [w(l − 1), w(l), w(l + 1)], l = 1, 2, . . . , lmax are used, with w(l) = lγ. Here, the bracketing step size γ is a user selected positive real number. The line search iterations l are incremented until either a minimum is bracketed, or the maximum number of line search iterations lmax is exceeded. We then refine the location of the minimum using a four point (3 interval) golden section search, to within a specified user tolerance ξ, or until the maximum number of iterations lmax is reached. After each golden section iteration the total interval length is reduced by removal of a sub-interval, whereafter a new point is generated within the remaining interval. The aim therefore is to find λ{k} for ξ ∈ R, with ξ > 0, such that

F(λ{k} + ξ) = f(x{k−1} + (λ{k} + ξ)u{k}) > F(λ{k}),

and

F(λ{k} − ξ) = f(x{k−1} + (λ{k} − ξ)u{k}) > F(λ{k}).

We will refer to the BFGS algorithm using this function-value-based line search as BFGS(f).

Line searches using only gradient information

This time, we use a two point bracketing strategy to locate a minimum of F(λ). Two points from the sequence [w(l − 1), w(l)], l = 1, 2, . . . , lmax are used, with w(l) = lγ. The line search iterations l are incremented until either a sign change in the associated derivative F′A(λ) is located, or the maximum number of line search iterations lmax is reached. We then refine the location of the sign change in the associated derivative using a three point (2 interval) bisection method, to within a specified user tolerance ξ, or until the maximum number of iterations lmax is reached.
After each bisection iteration, the total interval width is halved, whereafter a new point is generated in the middle of the remaining interval. The aim is therefore to find λ{k} for ξ ∈ R with ξ > 0, such that

F′A(λ{k} + ξ) = ∇ᵀA f(x{k−1} + (λ{k} + ξ)u{k}) u{k} > 0,

and

F′A(λ{k} − ξ) = ∇ᵀA f(x{k−1} + (λ{k} − ξ)u{k}) u{k} < 0.

Note that the gradient-only interval shrinks faster than the classical (function value) interval, as the bisection interval is reduced by 50% per iteration, as opposed to the ≈ 38% of the golden section interval. We will refer to the BFGS algorithm with the derivative-based line search as BFGS(g).

Termination criteria

Termination criteria need some consideration: if the function values and gradients of an objective or cost function contain step discontinuities, these quantities may not provide robust termination information. Accordingly, we only advocate the robust termination criterion

‖∆x{k+1}‖ = ‖x{k+1} − x{k}‖ < ε,   (3.20)

with ε small, positive and prescribed. (A maximum number of iterations may of course also be prescribed.)

Algorithmic implementation

Given an initial point x{0}, the second order line search BFGS method for unconstrained minimization proceeds as follows:

1. Initialization: Select real constants ε > 0, ξ > 0 and γ > 0. Select integer constants kmax and lmax. Set G{0} = I. Set k := 1 and l := 0.
2. Gradient evaluation: Compute ∇A f(x{k}).
3. Update the search direction u{k+1} using (3.16).
4. Initiate an inner loop to conduct the line search: Find λ{k+1} using one of the line search strategies described in Section 3.4.1.
5. Test for reinitialization of G{k}: if k mod n = 0 then G{k} = I, else update G{k} using (3.17).
6. Move to the new iterate: Set x{k+1} := x{k} + λ{k+1}u{k+1}.
7. Convergence test: if ‖x{k+1} − x{k}‖ ≤ ε OR k = kmax, stop.
8. Initiate an additional outer loop: Set k := k + 1 and goto Step 2.
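The gradient-only line search described above — two-point bracketing of the derivative sign change, followed by bisection — can be sketched as follows. The function `dF`, the step sizes and the quadratic usage example are hypothetical; the sketch assumes dF(0) < 0, i.e. a descent direction.

```python
def gradient_only_line_search(dF, gamma=0.1, xi=1e-5, lmax=1000):
    """Sketch of a gradient-only line search: dF(lam) is the associated
    directional derivative F'_A(lam). Bracket a sign change with step
    gamma, then bisect the bracket down to tolerance xi."""
    a, b = 0.0, gamma
    l = 1
    while dF(b) < 0 and l < lmax:      # two-point bracketing of the sign change
        a, b = b, b + gamma
        l += 1
    while (b - a) > xi and l < lmax:   # bisection on the derivative sign
        m = 0.5 * (a + b)
        if dF(m) < 0:
            a = m
        else:
            b = m
        l += 1
    return 0.5 * (a + b)

# Usage: F(lam) = (lam - 0.73)^2 has F'(lam) = 2(lam - 0.73), so the
# sign change (and minimizer) lies at lam = 0.73.
lam = gradient_only_line_search(lambda t: 2.0 * (t - 0.73))
print(round(lam, 3))  # 0.73
```

No function value is ever evaluated, so step discontinuities in F(λ) that leave the associated derivative intact cannot trap the search.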
3.4.2 Sequential spherical approximations (SSA)

In sequential approximate optimization (SAO) methods, the approximation functions used can easily be formulated using truncated Taylor expansions, in which the curvature may be approximated using function values and gradient information, e.g. Snyman and Hay [52] and Groenwold et al. [22]. For illustrative purposes, we will construct two spherical quadratic approximations for use in SAO algorithms. For the first, we use both function value and gradient information to approximate the curvature; for the second, we use gradient information only. In both cases, we begin with the second order Taylor series expansion of a function f around some current iterate x{k}, given by

f̃{k}(x) = f(x{k}) + ∇ᵀA f(x{k})(x − x{k}) + ½(x − x{k})ᵀH{k}(x − x{k}),   (3.21)

where superscript k represents an iteration number, f̃ the second order Taylor series approximation to f, ∇A the associated gradient operator and H{k} the Hessian. f(x{k}) and ∇A f(x{k}) respectively represent the function value and associated gradient vector at the current iterate x{k}. For the sake of brevity and simplicity, we will now restrict ourselves to spherical approximations of the Hessian H{k}. Hence, the approximate Hessian or curvature is of the form H{k} = c{k}I, with c{k} a scalar and I the identity matrix. This gives

f̃{k}(x) = f(x{k}) + ∇ᵀA f(x{k})(x − x{k}) + (c{k}/2)(x − x{k})ᵀ(x − x{k}),   (3.22)

with the scalar curvature c{k} unknown. We will return to the computation of c{k} shortly. Since the sequential approximate subproblems are continuous, they may be solved analytically; the minimizer of subproblem k follows from setting the gradient of (3.22) equal to 0, to give

x{k∗} = x{k} − ∇A f(x{k}) / c{k}.   (3.23)

In SAO, termination and convergence may be effected through the notion of conservatism.
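The analytical subproblem solve (3.23) is a single vector operation. The usage example below is a hypothetical sanity check, not from the thesis: for f(x) = xᵀx the associated gradient is 2x, so with curvature c = 2 the spherical subproblem reproduces the true minimizer exactly.

```python
import numpy as np

def ssa_step(x, grad_x, c):
    """Analytical minimizer of the spherical quadratic subproblem,
    Eq. (3.23): x{k*} = x{k} - grad_A f(x{k}) / c{k}."""
    return x - grad_x / c

# Sanity check: f(x) = x^T x, gradient 2x, exact curvature c = 2.
x = np.array([1.0, -2.0])
print(ssa_step(x, 2.0 * x, 2.0))  # [0. 0.]
```

With an underestimated curvature the step overshoots and conservatism (discussed next) rejects it, which is what drives the inner-loop curvature increase in the SSA algorithm.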
Here, we start with a note on the structure of the spherical approximation functions: they are separable. If these approximation functions are also strictly convex, then convergence of a sequence of SAO iterates using conservatism is guaranteed [58]. Arguably, this results in the simplest algorithmic implementation for which termination is guaranteed. Therefore the minimizer of the subproblem x{k∗} is accepted, i.e. x{k+1} = x{k∗}, only if x{k∗} yields a conservative point. We will revisit implementations of conservatism shortly.

Using function value information

If historic function value information is exploited, it is possible to solve for c{k} by enforcing f̃{k}(x{k−1}) = f(x{k−1}), which results in

f(x{k−1}) = f(x{k}) + ∇ᵀA f(x{k})(x{k−1} − x{k}) + (c{k}/2)(x{k−1} − x{k})ᵀ(x{k−1} − x{k}),   (3.24)

e.g. see Snyman and Hay [56]. The scalar c{k} is then obtained as

c{k} = 2[f(x{k−1}) − f(x{k})] / [(x{k−1} − x{k})ᵀ(x{k−1} − x{k})] − 2∇ᵀA f(x{k})(x{k−1} − x{k}) / [(x{k−1} − x{k})ᵀ(x{k−1} − x{k})].   (3.25)

To ensure that approximation (3.22) is strictly convex, we will herein enforce c{k} = max(β, c{k}), with β > 0 small and prescribed. Classical conservatism is based solely on function values, for which Svanberg [58] demonstrated that the SAO sequence k = 1, 2, · · · will terminate at the minimizer x* ↔ f*, if each k-th approximation f̃(x{k∗}) is conservative, i.e. if

f(x{k∗}) ≤ f̃(x{k∗}).   (3.26)

We will refer to this SSA algorithm, using conservatism defined by (3.26) and the termination criteria discussed in Section 3.4.1, as SSA(f). We note that the principle behind (3.26) is that updates are only accepted for which the real quality measure (f(x{k∗}) in this instance) is guaranteed to be less than or equal to the approximate quality measure (f̃(x{k∗})).
Using only gradient information

Approximations to the gradient field may be constructed by simply taking the derivative of (3.21), which gives

∇f̃{k}(x) = ∇A f(x{k}) + H{k}(x − x{k}).   (3.27)

At x = x{k}, the gradients of the function f and the approximation function f̃ match exactly. The approximate Hessian H{k} of the approximation f̃ is chosen to match additional information. Here we again only consider the case of a spherical quadratic approximation, where H{k} = c{k}I. c{k} is obtained by matching the gradient vector at x{k−1}. Since only a single free parameter c{k} is available, the n components of the respective gradient vectors are matched in a least squares sense. The least squares error is given by

E{k} = (∇f̃{k}(x{k−1}) − ∇A f(x{k−1}))ᵀ(∇f̃{k}(x{k−1}) − ∇A f(x{k−1})),   (3.28)

which, after substitution of (3.27) into (3.28), gives

E{k} = (∇A f(x{k}) + c{k}(x{k−1} − x{k}) − ∇A f(x{k−1}))ᵀ(∇A f(x{k}) + c{k}(x{k−1} − x{k}) − ∇A f(x{k−1})).   (3.29)

Minimization of the least squares error E{k} w.r.t. c{k} then gives

dE{k}/dc{k} = (∇A f(x{k}) + c{k}(x{k−1} − x{k}) − ∇A f(x{k−1}))ᵀ(x{k−1} − x{k}) + (x{k−1} − x{k})ᵀ(∇A f(x{k}) + c{k}(x{k−1} − x{k}) − ∇A f(x{k−1})) = 0,   (3.30)

hence

c{k} = (x{k−1} − x{k})ᵀ(∇A f(x{k−1}) − ∇A f(x{k})) / [(x{k−1} − x{k})ᵀ(x{k−1} − x{k})].   (3.31)

Again, to ensure that approximation (3.22) is strictly convex, we enforce c{k} = max(β, c{k}), with β > 0 small and prescribed.

We now introduce conservatism, effected using only gradient information. At iterate k, the update is given by x{k∗} − x{k}. This update represents descent of f(x) if the projection of the actual function gradient ∇A f(x{k∗}) onto the update direction (x{k∗} − x{k}) is negative, i.e. if

∇ᵀA f(x{k∗})(x{k∗} − x{k}) ≤ ∇ᵀf̃{k}(x{k∗})(x{k∗} − x{k}) = 0.   (3.32)

Accordingly, any gradient-only approximation may be defined as conservative if (3.32)
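The gradient-only curvature (3.31) and conservatism test (3.32) translate into a few lines of code. The convex test function and the two iterates below are hypothetical choices for illustration only.

```python
import numpy as np

def curvature_gradient_only(x_prev, x, g_prev, g):
    """Least-squares spherical curvature from gradients only, Eq. (3.31)."""
    d = x_prev - x
    return float(d @ (g_prev - g)) / float(d @ d)

def is_conservative_gradient_only(g_trial, x_trial, x):
    """Gradient-only conservatism test, Eq. (3.32): the actual gradient at
    the trial point must project non-positively onto the update direction."""
    return float(g_trial @ (x_trial - x)) <= 0.0

# Sketch on the convex function f(x) = x^T diag(1,3) x, gradient 2*diag(1,3)x,
# with two hypothetical iterates.
grad = lambda x: 2.0 * np.array([1.0, 3.0]) * x
x_prev, x = np.array([1.0, 1.0]), np.array([0.5, 0.2])
c = curvature_gradient_only(x_prev, x, grad(x_prev), grad(x))
print(c > 0)  # True: positive curvature recovered for a convex function
```

No function value appears in either routine, so the acceptance decision is immune to non-physical jumps in f.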
holds. Our gradient-only definition of conservatism is similar in spirit to Svanberg's function value based definition given in (3.26). We only accept updates x{k∗} for which we can guarantee that the real quality measure (here ∇ᵀA f(x{k∗})(x{k∗} − x{k}) for gradient-only problems) is less than or equal to the approximate quality measure (∇ᵀf̃{k}(x{k∗})(x{k∗} − x{k})). Although no formal proofs are presented in this study, we can show that our definition of conservatism guarantees convergence for certain classes of functions, e.g. smooth convex functions. For discontinuous functions in general, however, our definition is not sufficient to guarantee convergence. The challenge lies in establishing whether a particular practical engineering problem belongs to an associated convergent class of discontinuous functions or not. Although we lack strong theoretical evidence for convergence in general, we note that in our experience this gradient-only definition of conservatism suffices to achieve convergence for practical engineering problems. We will refer to this SSA algorithm, using conservatism defined by (3.32) and the scalar curvature given by (3.31), as SSA(g).

Algorithmic implementation

Given an initial point x{0}, a {gradient-only}/classical conservative algorithm based on convex separable spherical quadratic approximations for unconstrained minimization proceeds as follows:

1. Initialization: Select real constants ε > 0, α > 1 and initial curvature c{0} > 0. Set k := 1, l := 0.
2. Gradient evaluation: Compute {∇A f(x{k})}/f(x{k}) and ∇A f(x{k}).
3. Approximate optimization: Construct the local approximate subproblem {(3.27)}/(3.22) at x{k}, using {(3.31)}/(3.25). (In an inner loop, use c{k} as calculated in Step 6(b).) Solve this subproblem analytically, to arrive at x{k∗}.
4. Evaluation: Compute {∇A f(x{k∗})}/f(x{k∗}).
5. Test if x{k∗} is acceptable: if {(3.32)}/(3.26) is satisfied, goto Step 7.
6.
Initiate an inner loop to effect conservatism:
(a) Set l := l + 1.
(b) Set c{k} := αc{k}.
(c) Goto Step 3.
7. Move to the new iterate: Set x{k+1} := x{k∗}.
8. Convergence test: if ‖x{k+1} − x{k}‖ ≤ ε OR k = kmax, stop.
9. Initiate an additional outer loop: Set k := k + 1 and goto Step 2.

3.5 Example problems

We now present three example problems to demonstrate the robustness of gradient-only optimization when compared with classical gradient-based optimization in the presence of step discontinuities. We deliberately choose a small bracketing step size in an attempt to mimic an exact line search. This choice allows us to verify numerically, for our examples, whether these numerical step discontinuous functions are robustly optimizable in the multidimensional search space. The high number of required inner loops for the BFGS algorithms should therefore be interpreted with this in mind, and not be misinterpreted as inherent to line search descent methods or the BFGS algorithm. The performance of the BFGS algorithms herein can readily be enhanced by considering, for example, an interpolation line search strategy, inexact line searches or simply appropriate step sizes. In addition, we choose to illustrate an expected difference between gradient-only optimization and classical gradient-based optimization using sequential approximation methods. These methods are well suited to optimizing objective functions with local minima that do not dominate an underlying global function trend, which is reminiscent of the functions we consider in this study. In all the examples presented here, we use remeshing whenever spatial discretizations are required, and adaptive time stepping whenever temporal discretizations are required.
3.5.1 Numerical settings

The settings used for the BFGS(f) and BFGS(g) algorithms discussed in Section 3.4.1 are as follows:

• the line search convergence tolerance ξ = 10⁻⁵,
• the maximum number of line search iterations lmax = 1000, and
• the bracketing step size γ = 10⁻¹, except for the material identification study, where we use γ = 10⁻².

For the SSA(f) and SSA(g) algorithms discussed in Section 3.4.2, the settings used are as follows:

• the curvature factor α = 2, and
• the initial curvature c{0} = 1.

Throughout, we use a convergence tolerance ε = 10⁻⁴, and a maximum number of outer iterations kmax = 500.

3.5.2 Example problem: temporal and spatial partial differential equations

We consider the sizing design of a fin subjected to a constant volume constraint. The base of the fin initially experiences steady state conditions, whereafter it is exposed to a sinusoidal surge in heat flux. The objective is to minimize some average measure of temperature T over the base of the fin. For the sizing problem we consider an array of fins, of which we only depict half a fin in Figure 3.2, due to symmetry. The two design variables for the sizing problem are the height th and width tw of the triangular part of the fin, denoted by xᵀ = [x1 x2]ᵀ = [th tw]ᵀ. The temperature field T of the transient problem is solved using the finite element method (FEM), which allows for the construction of the objective function f(x). The objective function is the average nodal temperature over the base of the fin after tf = 2 seconds, i.e. f(x) = T̄b(x). The sizing problem is subjected to an equality constraint of constant lateral surface area, bw bh + 0.5tw(th − bh) = A0. Hence, we can solve for x2 in terms of x1 from the equality constraint, which renders the problem unconstrained in the single variable x1. The fin has constant unit thickness and constant lateral surface area A0 = 300 mm².
The base of the fin has a fixed width bw = 25 mm, and a fixed height bh = 4 mm. We generate the meshes required for the FEM using a quadratically convergent remeshing strategy [66] that is based on a scheme proposed by Persson and Strang [42]. The resulting meshes have a varying number of nodes and varying nodal connectivity, which induces jump discontinuities into the objective function, since the objective function is formulated in terms of the mesh-dependent³ temperature field T.

Initially, the nodal temperatures T in the fin are the steady state temperatures T⁰ resulting from a steady state heat flux input q(0) = q⁰ = 30 × 10³ Wm⁻² at the base of the fin, solved using the approximate FE equations

KT⁰ = q⁰.   (3.33)

The matrix K is partitioned into contributions from thermal conductivity (we have used w = 100 Wm⁻¹K⁻¹) and surface convection (we have used h = 100 Wm⁻²K⁻¹), at a constant ambient temperature Ta = 300 K.

³ The mesh generator described in [66] uses linear strain triangle (LST) elements, and we have used an ideal element edge length of h0 = 3.

[Figure 3.2: Fin model with a uniform time varying heat flux q(t) input at the base, top surface convection with constant convection coefficient h and ambient temperature Ta. The design variables for the sizing problem are the width tw and height th of the triangular part of the fin.]

The time dependent heat flux q(t) is given by

q(t) = q⁰ + qp sin(πt/tf), 0 ≤ t ≤ tf;   q(t) = q⁰, t > tf,   (3.34)

with tf = 2 s and qp = 3 × 10⁵ Wm⁻². The semi-discrete transient heat equation is given by

C Ṫ + KT = q,   (3.35)

with Ṫ denoting the first time derivative of the nodal temperatures T, and the matrix C containing contributions from specific heat, with a specific heat capacity of 450 J kg⁻¹K⁻¹ and a material density of 2770 kg m⁻³.
We solve (3.35) using a fully implicit backward Euler finite difference scheme, given by

(C + ∆t{i+1}K)T{i+1} = CT{i} + ∆t{i+1}q{i+1},   i = 0, 1, 2, . . . ,   (3.36)

where ∆t{i+1} indicates the time step t{i+1} − t{i}. We start the adaptive time stepping sequence using an initial time step of ∆t{1} = 0.1. For iteration {i + 1}, we compute the absolute change in average temperature ∆T̄b{i+1} over the base of the fin using

∆T̄b{i+1} = |T̄b{i+1} − T̄b{i}|.   (3.37)

The updated temperature T{i+1} is accepted, and the step size ∆t{i+2} simultaneously increased by a factor of 1.5, if ∆T̄b{i+1} < 20 K. Else, ∆t{i+1} is halved, and iteration {i + 1} is repeated.

Although the algorithms outlined in Sections 3.4.1 and 3.4.2 are developed to optimize multidimensional functions, they can nonetheless be applied to 1D functions without requiring any modifications. Results are summarized in Table 3.1, obtained for a starting point x = 30. Figure 3.3 depicts f(x) and f′A(x) for x ∈ [30, 90]. Note that the inserts in Figure 3.3 highlight the behavior of the objective function f(x) and its associated derivative f′A(x) in the vicinity of the results in Table 3.1.

Table 3.1: Results obtained with BFGS(f), BFGS(g), SSA(f) and SSA(g) for the univariate transient heat transfer problem.

Algorithm   f(x{Nk})    f′A(x{Nk})    x{Nk}       Nk   Nl
BFGS(f)     4.885E+02   -2.804E+00    3.385E+01   3    79
BFGS(g)     4.459E+02   -6.366E-04    7.950E+01   3    522
SSA(f)      4.458E+02   -2.324E-02    7.720E+01   22   48
SSA(g)      4.459E+02   -5.827E-04    7.950E+01   19   16

The optimal solution and positive projection point may coincide, as shown by the inserts in Figures 3.3(a) and (b). Note that the optimal solution occurs over a jump discontinuity. Although SSA(f) is able to overcome many of the numerically induced local minima, it is clear from Figure 3.3 that both BFGS(f) and SSA(f) converged to such local minima.
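The backward Euler update (3.36) with the simple accept/grow, reject/halve rule can be sketched on a scalar surrogate. The single-DOF system, coefficients and tolerance below are hypothetical stand-ins for the FE matrices of the thesis, chosen only so the exact solution is known.

```python
import numpy as np

def backward_euler_adaptive(C, K, q, T0, tf, dt0=0.1, dT_max=20.0):
    """Sketch of Eq. (3.36) for one DOF: (C + dt K) T_new = C T + dt q,
    with the adaptive rule described above: accept and grow the step by
    1.5 if the temperature change is small enough, else halve and retry."""
    T, t, dt, history = T0, 0.0, dt0, [(0.0, T0)]
    while t < tf:
        dt = min(dt, tf - t)                           # do not overshoot tf
        T_new = (C * T + dt * q(t + dt)) / (C + dt * K)
        if abs(T_new - T) < dT_max:                    # accept, grow step
            t, T = t + dt, T_new
            history.append((t, T))
            dt *= 1.5
        else:                                          # reject, halve step
            dt *= 0.5
    return history

# Surrogate ODE T' + T = 1, T(0) = 0, exact solution T(2) = 1 - e^{-2}.
hist = backward_euler_adaptive(C=1.0, K=1.0, q=lambda t: 1.0, T0=0.0, tf=2.0)
print(abs(hist[-1][1] - (1.0 - np.exp(-2.0))) < 0.1)  # True
```

Backward Euler is unconditionally stable, so the growing steps degrade accuracy but not stability, which is why the simple change-based acceptance rule suffices.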
On the contrary, both BFGS(g) and SSA(g) are able to robustly overcome the numerically induced local minima and terminate at a positive projection point (indicated by a sign change in f′A(x) for univariate functions). At first glance, the reported results for both BFGS(g) and SSA(g) in Table 3.1 seem inferior to SSA(f). However, on closer inspection it is clear that the function value decreases from 445.9 to 445.7 over the discontinuity, by infinitesimally perturbing the gradient-only solution x*g to the right. In addition, the gradient-only solution x*g and the optimum x* describe the same design for all practical purposes. Clearly, when the gradient-only solution x*g occurs over a discontinuity for univariate functions, two function values could be reported as f(x*g): one obtained by the left-hand limit and the other by the right-hand limit. On closer inspection it is also clear that SSA(f) converged to a local minimum that is not a positive projection point, since the sign of f′A(x) remains negative around this point. However, a slight perturbation around this point results in an increase in function value.

3.5.3 Example problems: spatial partial differential equations

Next, we consider firstly the unconstrained design of an orthotropic Michell-like structure, and secondly the equality constrained design of an orthotropic Michell-like structure.

[Figure 3.3: (a) Function values, and (b) associated derivatives for the univariate transient heat transfer sizing problem. Note the sign change in f′A(x) at x*g = x* = 79.5.]
Again, both problems are analyzed using methods that are prone to step discontinuities. The Michell-like structure is depicted in Figure 3.4; the figure depicts the symmetry and support conditions. The structure has a predefined length of 30 mm and a thickness of 1 mm. A point load F of 10 N acts at the center bottom of the structure. The boundary of the structure is controlled by the 16 control points x that can only move vertically, which are the 16 design variables in the shape design problem. The control points are linearly spaced along the top and bottom of the Michell-like structure, with the first nine control points along the top and the remaining seven along the bottom, as depicted in Figure 3.4. The boundary is linearly interpolated between the control points.

[Figure 3.4: Initial geometry of half the Michell-like structure using 16 control points x.]

A linear elastic FEM is used to solve the approximate nodal displacement field u from the structural equilibrium PDEs. The spatial discretizations (meshes) for the FEM are again⁴ generated using our quadratically convergent remeshing strategy [66]. As in the previous example, the remeshing strategy results in meshes with a varying number of nodes and varying nodal connectivity, which induces numerical jump discontinuities into the objective function, since the cost function will be formulated in terms of the mesh-dependent displacement field u. The nodal displacements u are obtained from the linear elastic approximate finite element equilibrium equations

Ku = f,   (3.38)

where K represents the assembled structural stiffness matrix and f the consistent structural loads. Following the usual approach, the system in (3.38) is partitioned along the unknown displacements (uf) and the prescribed displacements (up), i.e.
Ku = [K_ff  K_fp; K_pf  K_pp] {u_f; u_p} = {f_f; f_p},   (3.39)

where f_f represents the prescribed forces and f_p the reactions at the nodes with prescribed displacements. The unknown displacements u_f are obtained from

K_ff u_f = f_f − K_fp u_p.   (3.40)

The sensitivity of the displacement at the center, uF, w.r.t. x is obtained by direct differentiation of (3.40), as shown in Chapter 2. The orthotropic stiffness matrix K is computed for Boron-Epoxy in a tape outlay, i.e. the fibers are all aligned in a single direction along the longitudinal axis, as indicated by x1 in Figure 3.4. We assume plane stress conditions. The orthotropic material properties for Boron-Epoxy used in this study are a Young's modulus along the longitudinal axis of E1 = 228 GPa, along the transverse axis of E2 = 145 GPa, and a shear modulus of G12 = 48 GPa. The last independent parameter in classical laminate theory (CLT) is the Poisson ratio ν12 = 0.23, since ν21 follows from the symmetry relation E1ν21 = E2ν12.

⁴ This time, we use an ideal element edge length h0 = 1.

Table 3.2: Tabulated results obtained for the unconstrained Michell-like structure.

Algorithm   f(x{Nk})    ‖∇A f(x{Nk})‖   ‖∆x{Nk}‖    Nk   Nl
BFGS(f)     7.740E-01   4.705E-02       0.000E+00   3    60
BFGS(g)     5.941E-01   1.938E-03       0.000E+00   36   717
SSA(f)      6.470E-01   1.602E-02       4.822E-05   15   59
SSA(g)      5.938E-01   1.052E-03       9.582E-05   41   29

Unconstrained shape design of a Michell structure

Consider the unconstrained shape design of the orthotropic Michell structure [20]; we minimize the weighted sum of the displacement and the normalized volume of the structure. The cost function is given by f(u, x) = βuF(x) + f2(x), where uF is the displacement at the point of load application of the structure, f2 is the volume of the structure V(x) divided by V0, and β a weight parameter. Both the displacement at the point of load application uF(x) and the volume of the structure V(x) depend on the design variables x.
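The partitioned solve (3.40) is a standard reduction. The 3-DOF stiffness matrix, load vector, prescribed value and DOF partition below are hypothetical, chosen only to show the mechanics of the reduction and the reaction recovery.

```python
import numpy as np

# Sketch of the partitioned solve, Eq. (3.40): recover the free displacements
# u_f from K_ff u_f = f_f - K_fp u_p (small hypothetical 3-DOF system).
K = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
f = np.array([1.0, 0.0, 0.0])
free, pres = [0, 1], [2]                 # DOF partition; DOF 2 is prescribed
u_p = np.array([0.1])

K_ff, K_fp = K[np.ix_(free, free)], K[np.ix_(free, pres)]
u_f = np.linalg.solve(K_ff, f[free] - K_fp @ u_p)
# Reactions at the prescribed DOFs: f_p = K_pf u_f + K_pp u_p
f_p = K[np.ix_(pres, free)] @ u_f + K[np.ix_(pres, pres)] @ u_p
print(u_f.shape, f_p.shape)  # (2,) (1,)
```

The same factorized K_ff is reused in the direct sensitivity analysis of Chapter 2, since differentiating (3.40) w.r.t. a design variable yields another linear system with the identical coefficient matrix.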
For this study we select V0 = 150 mm³ and β = 100.

Results for the unconstrained Michell-like structure

The results for the BFGS(f), BFGS(g), SSA(f) and SSA(g) algorithms are summarized in Table 3.2, with the respective final designs depicted in Figures 3.5 (a)-(d). Table 3.2 summarizes the function value f(x^{Nk}), the gradient norm ‖∇_A f(x^{Nk})‖, the norm of the solution update ‖Δx^{Nk}‖, the number of outer iterations Nk, as well as the number of inner iterations Nl.

Consider BFGS(f), which converged after only 3 outer iterations Nk and evidently became trapped in a local minimum caused by numerically induced step discontinuities. The premature final design is apparent from Figure 3.5 (a). Similarly, SSA(f) converged after 15 outer iterations Nk, also after becoming trapped in a step discontinuous minimum. Significant improvements are evident when comparing Figure 3.5 (c) to Figure 3.5 (a), although noticeable improvements could still be made. Clearly, conservative approximation methods are able to overcome some step discontinuities, and even more so when conservatism is relaxed. Conversely, BFGS(g) and SSA(g) were able to optimize the Michell structure without becoming trapped in numerically induced step discontinuities; consider the similar designs depicted in Figures 3.5 (b) and (d). It is clear that BFGS(g) and SSA(g) improved

Figure 3.5: Michell-like structure: converged designs obtained with (a) BFGS(f), (b) BFGS(g), (c) SSA(f), and (d) SSA(g).
notably on the designs obtained with BFGS(f) and SSA(f).

Figure 3.6: Michell-like structure: BFGS(f) and BFGS(g) algorithms convergence history plots of (a) the function value f(x^{k}), (b) the gradient norm ‖∇_A f(x^{k})‖ and (c) the norm of the solution update ‖Δx^{k}‖.

We further present for each algorithm its respective convergence history w.r.t. the function value f(x^{k}), the gradient norm ‖∇_A f(x^{k})‖ and the norm of the solution update ‖Δx^{k}‖. The respective histories for the BFGS algorithms are depicted in Figures 3.6 (a)-(c) and for the SSA algorithms in Figures 3.7 (a)-(c). A monotonic decrease in function value for both BFGS(f) and SSA(f) is clearly depicted in Figures 3.6(a) and 3.7(a), respectively. Conversely, a non-monotonic decrease in function value for both BFGS(g) and SSA(g) is evident in the same figures.

Constrained shape design of a Michell structure

We now consider the constrained shape design of the orthotropic Michell structure [20]. We minimize the displacement of the load application point uF subject to an equality volume constraint V(x) = V0, where V0 is the prescribed volume of the structure. As indicated, we do so using a Lagrangian formulation. To solve the dual of the Lagrangian using line search descent methods, we use the BFGS algorithms as described in this chapter, except for the following modifications: in the line search strategy, every design step x is followed by a multiplier step (V(x) − V0)
in the outer loop. In turn, the multiplier λ is kept constant during the inner loop, where the bracketing search and golden section refinement occur. On the other hand, we solve the dual of the Lagrangian using the SSA algorithms as described in this chapter with the following adjustment: every design update x in the outer loop is followed by a multiplier step (V(x) − V0). In turn, the multiplier λ is kept constant during the inner loops. Both the BFGS and SSA algorithms are terminated when ‖[Δx^{Nk} Δλ^{Nk}]‖ < ε, or when ‖Δx^{Nk}‖ < ε for five consecutive iterations.

Figure 3.7: Michell-like structure: SSA(f) and SSA(g) algorithms convergence history plots of (a) the function value f(x^{k}), (b) the gradient norm ‖∇_A f(x^{k})‖ and (c) the norm of the solution update ‖Δx^{k}‖.

Table 3.3: Tabulated results obtained for the constrained Michell-like structure.

Algorithm | L(x^{Nk}, λ^{Nk}) | ‖∇_A L(x^{Nk}, λ^{Nk})‖ | ‖Δx^{Nk}‖ | ‖Δλ^{Nk}‖ | Nk | Nl
BFGS(f)   | 8.239E-01 | 2.191E-01 | 0.000E+00 | 2.459E-01 | 11  | 348
BFGS(g)   | 3.635E-01 | 5.092E-03 | 0.000E+00 | 3.104E-04 | 53  | 1188
SSA(f)    | 3.712E-01 | 2.992E-02 | 1.778E-07 | 1.762E-02 | 41  | 129
SSA(g)    | 3.564E-01 | 4.292E-03 | 0.000E+00 | 2.645E-04 | 144 | 238

Results for the constrained Michell-like structure

The results for the BFGS(f), BFGS(g), SSA(f) and SSA(g) algorithms are summarized in Table 3.3, with the respective final designs depicted in Figures 3.8 (a)-(d). Table 3.3 summarizes the Lagrangian L(x^{Nk}, λ^{Nk}), the norm of the Lagrangian gradient ‖∇_A L(x^{Nk}, λ^{Nk})‖, the design variable update ‖Δx^{Nk}‖, the Lagrange multiplier update ‖Δλ^{Nk}‖, the number of outer iterations Nk, and the number of inner iterations Nl. BFGS(f) converged to a numerically induced local minimum after 11 outer iterations Nk, with the final design depicted in Figure 3.8 (a).
Similarly, SSA(f) converged to a numerically induced local minimum after 41 outer iterations Nk, with the final design depicted in Figure 3.8 (c). As shown in Table 3.3, the respective Lagrange multiplier updates ‖Δλ^{Nk}‖ are orders of magnitude larger than the design variable updates ‖Δx^{Nk}‖, since both algorithms got stuck in numerically induced local minima for five consecutive iterations, while significantly violating the equality constraint.

Figure 3.8: Michell-like structure: converged designs obtained with (a) BFGS(f), (b) BFGS(g), (c) SSA(f), and (d) SSA(g).

Conversely, BFGS(g) and SSA(g) were able to optimize the Michell structure, with the respective designs depicted in Figures 3.8 (b) and (d). It is clear that BFGS(g) and SSA(g) improved notably on the designs obtained with BFGS(f) and SSA(f). As shown in Table 3.3, the corresponding Lagrange multiplier updates ‖Δλ^{Nk}‖ are small. We further present for each algorithm its respective history w.r.t. the Lagrangian L(x^{k}, λ^{k}), the norm of the Lagrangian gradient ‖∇_A L(x^{k}, λ^{k})‖ and the norm of the solution update ‖Δ[x^{k} λ^{k}]‖. The respective histories for the BFGS algorithms are depicted in Figures 3.9 (a)-(c) and for the SSA algorithms in Figures 3.10 (a)-(c).

Figure 3.9: Michell-like structure: BFGS(f) and BFGS(g) algorithms convergence history plots of (a) the Lagrangian L(x^{k}, λ^{k}), (b) the norm of the Lagrangian gradient ‖∇_A L(x^{k}, λ^{k})‖ and (c) the norm of the solution update ‖Δ[x^{k} λ^{k}]‖.
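The alternating design step / multiplier step scheme used above for the dual of the Lagrangian can be illustrated on a toy equality constrained problem. This is a hedged sketch, not the thesis code: the quadratic objective, the constraint x1 + x2 = 1 and the multiplier step size alpha are all assumed here for illustration only.

```python
# Toy dual scheme: each exact design step x (minimizing the Lagrangian
# L = x1^2 + x2^2 + lam*(x1 + x2 - 1) for fixed lam) is followed by a
# multiplier step proportional to the constraint violation, the analogue
# of the (V(x) - V0) multiplier step in the text.
alpha = 0.5                    # multiplier step size (assumed value)
lam = 0.0
for k in range(60):
    x = (-lam / 2.0, -lam / 2.0)      # design step: grad_x L = 0
    violation = x[0] + x[1] - 1.0     # g(x), the V(x) - V0 analogue
    lam += alpha * violation          # multiplier step

print(round(lam, 6), round(x[0], 6))  # -> -1.0 0.5
```

The iteration converges to the known solution x = (0.5, 0.5) with multiplier λ = −1, and illustrates why the multiplier update shrinks as the constraint is satisfied.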
3.5.4 Example problems: temporal partial differential equations

To illustrate the advantages of gradient-only optimization for cost functions formulated from temporal PDEs discretized using non-constant strategies, we finally perform a material identification study. The aim of the material identification study is to find the parameter values of a modified Voce model (law) [33] which best describe experimentally measured yield stress data. We achieve this by merely minimizing the least squares error between the experimental data and the modified Voce model data points. Although inverse problems are usually ill-posed, we obtained promising results without regularizing the problem.

Figure 3.10: Michell-like structure: convergence histories for SSA(f) and SSA(g) of (a) the Lagrangian L(x^{k}, λ^{k}), (b) the norm of the Lagrangian gradient ‖∇_A L(x^{k}, λ^{k})‖ and (c) the norm of the solution update ‖Δ[x^{k} λ^{k}]‖.

The Voce model [31, 62] is given by

dσ_y/dp = θ_0 (1 − σ_y/σ_ys), (3.41)

where σ_y represents the evolving yield stress, p the plastic strain, σ_ys the saturation stress and θ_0 the extrapolated strain hardening rate for zero flow stress. Although the Voce model describes yield stress evolution to moderate strains adequately, it usually has poor validity at large strains. To overcome these large strain deficiencies, we use a modified Voce model that captures the linear hardening behavior observed at larger strains. The Voce model is modified to include a stage IV evolution equation [33], to obtain

dσ_4/dp = c, (3.42)

dσ_y/dp = θ_0 (1 − σ_y/σ_ys + σ_4/σ_y). (3.43)

We calculate σ_y for a specific value of p by numerically integrating (3.42) and (3.43) using a forward Euler method, in which we first solve for the stress contribution from the stage IV evolution equation, given by

σ_4^{i+1} = σ_4^{i} + c Δp^{i}. (3.44)

Here, the superscript {i} indicates the iteration number and Δp^{i} the plastic strain step size at iteration {i}, i.e. p^{i+1} − p^{i}. The updated evolving yield stress is then computed using

σ_y^{i+1} = σ_y^{i} + θ_0 (1 − σ_y^{i}/σ_ys + σ_4^{i+1}/σ_y^{i}) Δp^{i}. (3.45)

Each first order system of DEs requires an initial condition at {i} = 0. To allow for optimization flexibility, the initial conditions σ_4^{0} and σ_y^{0} are chosen to be design variables. Δp^{i} is adjusted using an adaptive time step scheme. We start with an initial Δp^{0} = 10^{-3} in the adaptive time step scheme. For each iteration {i} we compute Δσ_y^{i}, i.e. σ_y^{i+1} − σ_y^{i}. The step size Δp^{i} increases by a factor 1.5 from iteration {i+1} to {i+2}, unless Δσ_y^{i} is more than a defined maximum allowable evolved stress update Δσ_y^{max} = 10 MPa, in which case we halve Δp^{i} and redo the {i+1}th iteration. During the entire numerical integration procedure, we limit the step size Δp^{i} between a maximum allowable step Δp^{max} = 10^{-1} and a minimum allowable step Δp^{min} = 10^{-3}.

Let us consider the experimentally measured data taken at various plastic strain points. The experimental data may range from a few, to many hundreds of points at arbitrarily spaced intervals, depending on the experimental setup. The resulting discrete model points in turn have their own arbitrarily spaced intervals. Some interpolation strategy is required since the experimental data points and the modified Voce model data points may not always coincide.
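The adaptive forward Euler scheme just described can be sketched as follows. This is a hedged illustration rather than the thesis implementation: the parameter values are those used to construct the experimental data (Table 3.4), while the end strain p_end = 1.0 and the exact accept/reject bookkeeping are assumptions.

```python
# Adaptive forward Euler sketch of (3.42)-(3.45); parameters from
# Table 3.4, end strain p_end assumed for illustration.
c, theta0, sigys = 90.0, 1000.0, 180.0
sig4, sigy = 0.5, 1.0                      # initial conditions (design variables)
dp, dp_min, dp_max = 1e-3, 1e-3, 1e-1      # strain step and its bounds
dsigy_max = 10.0                           # max stress update per step [MPa]

p, p_end = 0.0, 1.0
while p < p_end - 1e-12:
    dp = min(dp, p_end - p)                # do not overshoot the end strain
    sig4_new = sig4 + c * dp                                   # (3.44)
    sigy_new = sigy + theta0 * (1.0 - sigy / sigys
                                + sig4_new / sigy) * dp        # (3.45)
    if sigy_new - sigy > dsigy_max and dp / 2.0 >= dp_min:
        dp /= 2.0                          # step rejected: halve and redo
        continue
    sig4, sigy, p = sig4_new, sigy_new, p + dp
    dp = min(max(1.5 * dp, dp_min), dp_max)  # grow by 1.5, keep in bounds
```

Because the accepted step sequence depends on the design variables, the number and spacing of the model data points change with the design, which is exactly the source of the non-constant discretization discussed in the text.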
Three obvious linear interpolation strategies come to mind: interpolating

• the experimental data points to the model data points,
• the model data points to the experimental data points, and
• both the model and experimental data points to intermediate points.

In this study we only consider the first interpolation strategy. The experimental data required for the inverse problem is obtained by solving the modified Voce model with the parameter values given in Table 3.4; this allows for a zero error solution. We identify the following five design variables: x = [x1, x2, ..., x5] = [c, θ_0, σ_ys, σ_4^{0}, σ_y^{0}], to minimize the least squares error between the experimental and model data points. In order to improve the variable scaling, we normalize the design vectors by the optimum x* given in Table 3.4, although the results are presented as non-normalized variables only.

Finally, we consider the interpolation of the experimental data points to the model data points. We therefore linearly interpolate between the experimentally measured data to obtain experimental data for the corresponding plastic strains in the numerical model. The disadvantage is that the number of experimental data points used depends on the time step sequence of the numerical model.

Table 3.4: Parameter values used to construct experimental data for the inverse problem using the modified Voce model with the adaptive time step algorithm.

c  | θ_0  | σ_ys | σ_4^{0} | σ_y^{0}
90 | 1000 | 180  | 0.5     | 1

Table 3.5: Tabulated results for the least squares fit between the modified Voce model data points and the linearly interpolated experimental data points.
Algorithm | f(x^{Nk}) | ‖∇_A f(x^{Nk})‖ | ‖Δx^{Nk}‖ | Nk | Nl
BFGS(f)   | 2.605E+03 | 9.284E+03 | 0.000E+00 | 3  | 35
BFGS(g)   | 1.981E-04 | 3.200E-01 | 0.000E+00 | 36 | 1695
SSA(f)    | 1.392E+01 | 1.220E+03 | 7.172E-05 | 16 | 39
SSA(g)    | 3.939E-01 | 7.206E+00 | 9.086E-05 | 22 | 8

The cost function of the design problem is given by

f(x) = \sum_{j=1}^{r} (d_{ei}^{j} − d_{m}^{j})^2, (3.46)

where r is the number of model data points, i.e. the length of the numerical integration sequence, d_{ei}^{j} is the linearly interpolated yield stress from the experimental data and d_{m}^{j} is the yield stress obtained from the numerical model. Details regarding the computation of the analytical sensitivities for this problem are given in Section 6 of the Appendix.

Results obtained for interpolated experimental data

The results obtained with the BFGS(f), BFGS(g), SSA(f) and SSA(g) algorithms are summarized in Table 3.5. The table presents the function value f(x^{Nk}), the gradient norm ‖∇_A f(x^{Nk})‖, the norm of the solution update ‖Δx^{Nk}‖, the number of outer iterations Nk and the number of inner iterations Nl. From Table 3.5 it is clear that both BFGS(f) and SSA(f) converged to suboptimal local minima, in particular when considering the large norm of the gradient at the solutions. In turn, BFGS(g) and SSA(g) obtained solutions with considerably lower gradient norms at the converged solutions. To investigate the nature of the local minimum of BFGS(f), we depict the function value and associated derivative (around the converged solution) along the final search direction in Figures 3.11 (a) and (b) respectively. Conversely, BFGS(g) and SSA(g) were able to effectively minimize the least squares error. We also depict for BFGS(g) the function value and associated derivative (around the gradient projection point) along the final search direction in Figures 3.12 (a) and (b) respectively.
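The construction of the least squares error (3.46) from linearly interpolated experimental data can be sketched as follows. This is a hedged illustration, not the thesis code: the helper `interp` is a hypothetical stand-in for a linear interpolation routine, and all data values below are synthetic.

```python
# First interpolation strategy: experimental (p, sigma_y) pairs are
# linearly interpolated onto the model's plastic strain grid before
# forming the least squares error of (3.46). All data is synthetic.
def interp(x, xs, ys):
    """Piecewise linear interpolation of (xs, ys) at x (xs ascending)."""
    if x <= xs[0]:
        return ys[0]
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return ys[-1]

p_exp = [0.0, 0.1, 0.25, 0.5, 1.0]           # experimental strain points
s_exp = [1.0, 120.0, 180.0, 205.0, 240.0]    # measured yield stresses

p_model = [0.0, 0.05, 0.2, 0.4, 0.8, 1.0]    # model's time step grid
s_model = [1.0, 60.0, 160.0, 200.0, 228.0, 240.0]

s_exp_on_model = [interp(p, p_exp, s_exp) for p in p_model]  # d_ei^j
lsq_error = sum((dei - dm) ** 2
                for dei, dm in zip(s_exp_on_model, s_model))
```

Note that the number of terms r in the sum equals the number of model points, so a change in the adaptive time step sequence changes the cost function itself, which is how the step discontinuities enter.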
Table 3.6: Final designs obtained for the least squares fit between the modified Voce model data points and the linearly interpolated experimental data points.

Parameter | x*   | BFGS(f) | BFGS(g) | SSA(f)  | SSA(g)
c         | 90   | 69.88   | 90.85   | 71.12   | 71.39
θ_0       | 1000 | 1453.97 | 999.49  | 1025.31 | 1004.91
σ_ys      | 180  | 148.52  | 179.74  | 181.32  | 186.17
σ_4^{0}   | 0.5  | 0.60    | 0.50    | 0.59    | 0.59
σ_y^{0}   | 1    | 1.20    | 1.00    | 1.20    | 1.20

Figure 3.11: Function value and associated derivative along the search direction around the optimal point obtained with BFGS(f) for the linearly interpolated experimental data.

Figure 3.12: Function value and associated derivative along the search direction around the solution obtained with BFGS(g) for the linearly interpolated experimental data.

Consider the history plots for BFGS(f) and BFGS(g) depicted in Figures 3.13 (a)-(c): BFGS(f) results in a monotonic decrease in function value f(x^{k}), as opposed to the non-monotonic decrease of BFGS(g). The associated gradient norm is depicted in Figure 3.13 (b). The distance from the optimum ‖x* − x^{k}‖, depicted in Figure 3.13 (c), shows how BFGS(g) approaches the optimum. The history plots for SSA(f) and SSA(g) are depicted in Figures 3.14 (a)-(c), which illustrate that a monotonic decrease in function value is obtained by SSA(f), as opposed to a non-monotonic decrease for SSA(g). The distance from the optimum ‖x* − x^{k}‖, depicted in Figure 3.14 (c), indicates how SSA(g) approaches the optimum. Lastly, we include in Table 3.6 the final design vectors x^{Nk} obtained.
Figure 3.13: Modified Voce model: BFGS(f) and BFGS(g) algorithms convergence history plots of (a) the function value f(x^{k}), (b) the gradient norm ‖∇_A f(x^{k})‖ and (c) the distance from the optimum ‖x* − x^{k}‖, for the linearly interpolated experimental data.

Figure 3.14: Modified Voce model: SSA(f) and SSA(g) algorithms convergence history plots of (a) the function value f(x^{k}), (b) the gradient norm ‖∇_A f(x^{k})‖ and (c) the distance from the optimum ‖x* − x^{k}‖, for the linearly interpolated experimental data.

3.6 Conclusions

We have studied the minimization of objective functions containing non-physical step or jump discontinuities. These discontinuities arise when (partial) differential equations are discretized using non-constant methods: the functions become discontinuous and non-differentiable at these discontinuities. We can however compute (semi) analytical sensitivities [40] at these discontinuous points, since every point has an associated discretization for which such a computation can be performed. To illustrate, we proposed gradient-only implementations of the BFGS algorithm and of an SAO algorithm for discontinuous problems, and applied these algorithms to a selection of problems of practical interest, both unconstrained and constrained. These are the design of a heat exchanger fin, the shape design of an orthotropic Michell-like structure, and a material identification study using a modified Voce law, all discretized using non-constant methods. In each instance, the gradient-only algorithms found solutions superior to those of the classical methods that use both function and gradient information.
As opposed to surrogate methods based on design of experiments techniques, which scale poorly, gradient-only algorithms that are based on well-scaling classical optimization algorithms may themselves be expected to scale well (provided the gradient computations scale well); this may well become an important application of gradient-only methods. Another envisaged application of gradient-only algorithms is any problem for which gradient computations are inexpensive.

CHAPTER 4

Theory of gradient-only optimization

In this chapter we consider some theoretical aspects of gradient-only optimization for the unconstrained optimization of objective functions containing non-physical step or jump discontinuities. The (discontinuous) gradients are, however, assumed to be accurate and everywhere uniquely defined. This kind of discontinuity indeed arises when the optimization problem is based on the solutions of systems of partial differential equations and variable discretization techniques are used (remeshing in spatial domains, or variable time stepping in temporal domains). These discontinuities, which may cause local minima, are artifacts of the numerical strategies used and should not influence the solution of the optimization problem. We demonstrate that it is indeed possible to ignore the local minima due to these discontinuities if only gradient information is used. Various gradient-only algorithmic options are discussed. The implication is that variable discretization strategies, so important in the numerical solution of partial differential equations, can be combined with efficient local optimization algorithms.

This chapter is organized as follows: We give an introduction to discontinuous objective functions in Section 4.1. We then define an optimization problem, and a solution to the problem that is solely based on the gradient of a function, in Section 4.2.
In Section 4.3 we introduce the gradient-only optimization problem, and in Section 4.4 we offer proofs of convergence for the descent sequences defined in the preceding section. We give practical considerations regarding gradient-only optimization algorithms in Section 4.5, and present a brief comparative discussion of classical mathematical programming and gradient-only optimization in Section 4.6. In Section 4.7 we present a shape optimization problem of practical importance, as well as a number of analytical test functions. Concluding remarks then follow.

4.1 Introduction

In this study we consider theoretical aspects regarding gradient-only approaches to avoid spurious (local) minima in unconstrained optimization. Here, gradient-only optimization algorithms refer to optimization strategies that consider solely the first order information of a scalar (cost) objective function in computing update directions and update step lengths. Many problems in engineering and the applied sciences are described by (partial) differential equations ((P)DEs), e.g. Newton's second law, Poisson's equation, Maxwell's electromagnetic equations, the Black-Scholes equation, the Lotka-Volterra equations and Einstein's field equations. Analytical solutions to these are seldom available, and in many cases (approximate) numerical solutions need to be computed. These numerical solutions are often obtained by employing discretization methods, e.g. finite difference, finite element and finite volume methods. (P)DEs also often describe the physics of some optimization problem. Such optimization problems are usually numerically approximated, which results in an approximate optimization problem that is then optimized using numerical optimization techniques.
During optimization, the domain over which the (P)DEs are solved may remain constant, but the discretization may be required to change to ensure convergence or efficiency of the solution, e.g. when integrating over a fixed time domain using variable time steps. Alternatively, the design variables may describe the domain over which the (P)DEs are solved. A change in the design variables then changes the solution domain, which in turn requires the discretization to change, e.g. in shape optimization. In order to effect these discretization changes, we distinguish between two classes of strategies. First, constant discretization strategies continuously adjust a reference discretization when the solution domain changes (and hence generate a fixed discretization if the solution domain remains fixed). Secondly, variable discretization strategies generate new independent discretizations, irrespective of whether or not the solution domain changes. For example, temporal (P)DEs may be solved using fixed or variable time steps. For spatial PDEs, the equivalents are fixed grids and mesh movement strategies on the one hand, and remeshing on the other. Fixed time steps and mesh movement strategies may however imply serious difficulties, e.g. impaired convergence rates and highly distorted grids and meshes, which may even result in failure of the computational procedures used. The variable discretization strategies are preferable by far.

One consequence of using variable discretization while solving an optimization problem is that the resulting objective functions contain discontinuities. Accordingly, the optimization of piece-wise smooth step discontinuous functions usually requires highly specialized optimization strategies and, possibly, heuristic approaches. In contrast, constant discretization strategies result in smooth, continuous objective functions that present no significant challenges to optimization algorithms.
It therefore appears that the analyst has two options, namely i) combining an efficient local optimization algorithm with non-ideal constant discretization, or ii) combining a less efficient global optimization algorithm with ideal non-constant discretization. Hence, variable time step methods and remeshing techniques are normally avoided in optimization, due to the very fact that the required global optimization algorithms are prohibitively expensive. An important spatial example is structural shape optimization, in which fixed or mesh movement strategies are almost always used; the very motivation for this being that remeshing strategies cannot be used efficiently, due to the non-physical local minima induced during optimization, e.g. see References [1, 9, 32, 39].

To avoid confusion, we emphasize the difference between physical discontinuities that occur in the solution domain of PDEs, such as shear banding in plasticity and shock waves in supersonic flow, and non-physical discontinuities. The non-physical discontinuities we refer to occur in the objective function of an optimization problem, as opposed to discontinuities in the solution of a PDE.

As we aim to develop a theoretical framework for gradient-only optimization in this chapter, we will invariably restrict ourselves to various classes of objective functions in our discussions and analyses through the course of this chapter. In addition, we restrict ourselves to unconstrained optimization, and note that some practical constrained optimization problems can be successfully reformulated as unconstrained optimization problems using a penalty formulation. Consider the following unconstrained minimization problem: find the minimizer x* of a real-valued function f : X ⊆ R^n → R, such that

f(x*) ≤ f(x), ∀ x ∈ X, (4.1)

with X the convex set of all possible solutions. If the function f is strictly convex, coercive and at least twice continuously differentiable, i.e.
f ∈ C², the minimizer x* ∈ X is characterized by the optimality criterion ∇f(x*) = 0, with the Hessian matrix H(x*) positive semi-definite at x*. Here, ∇ represents the gradient operator. For this programming problem, many well known minimization algorithms are available, e.g. steepest descent, (preconditioned) conjugate gradient methods and quasi-Newton methods like BFGS. However, if f is discontinuous, i.e. f ∉ C⁰, the minimizer x* of the mathematical programming problem (4.1) may not satisfy the optimality criterion given above. Indeed, the optimality criterion may not even be defined in the classical sense, although it may be defined using generalized gradients (subdifferentials) ∂f(x) [12, 13, 17], which require 0 ∈ ∂f(x*).

To the best of our knowledge, only a few gradient-only optimization algorithms have been developed, see References [4, 43, 44, 49, 53, 54, 64]. This includes the widely known gradient-only Newton's method [64], which locates diminishing gradients. A notable contribution on the optimization of functions that are not everywhere uniquely differentiable (non-differentiable functions) is the subgradient methods [49]. Subgradient methods reduce to steepest descent algorithms with a priori determined step lengths, i.e. the line searches do not depend on any information computed during optimization, when an objective function is continuously differentiable. All of these algorithms are used to minimize objective functions, and require the condition that the gradient ∇f(x*) = 0, or that the subdifferential¹ ∂f(x*) contains 0, depending on the differentiability of f(x) at x*. Accordingly, the well-known efficient optimization algorithms mentioned above may be unable to find x*_g, since they are concerned with obtaining a minimizer x* of an objective function [17].
In turn, if it is known or assumed that x*_g coincides with a minimizer x* of f(x), then these problems can be approached from a classical mathematical programming perspective, in which case they have to be solved using global optimization algorithms. This is due to the numerically induced step discontinuities that manifest as local minima in the function domain. We however show that if accurate associated gradient information that is everywhere defined is available, the resulting discontinuous problems may still be optimized efficiently, since gradient-only optimization ignores these numerically induced step discontinuities. Gradient-only optimization therefore transforms a problem plagued by numerically induced local minima into a problem free from them. Recall that a third option now becomes available to the analyst: combine an efficient local optimization algorithm with ideal non-constant discretization.

Let us first present two illustrative examples of non-physical step discontinuities, to set the tone for this chapter. The first is rather trivial, the second not quite.

4.1.1 Univariate example problem: Newton's cooling law

Consider Newton's law of cooling, which states that the rate of heat loss of a body is proportional to the difference in temperature between the body and its surroundings, given by the linear first order DE

dT/dt = −κ(T(t) − T_env), (4.2)

with the well known analytical solution

T(t) = T_env + (T_init − T_env) e^{−κt}. (4.3)

Here κ is a positive proportionality constant, T(t) the temperature of the body at time t and T_env the temperature of the surroundings of the body. We consider the temperature T(t) of a body after 1 s, for 0.5 ≤ κ ≤ 2, with T(0) = 100°C at t = 0, and T_env = 10°C for all t.

¹ A subdifferential is the set of subgradients at a point.

The analytical solution of the
body's temperature T(1) is depicted in Figure 4.1(a), and the first associated derivative of T(1) w.r.t. κ is depicted in Figure 4.1(b).

Figure 4.1: Numerical and analytical solutions for Newton's cooling law. (a) Temperature T after 1 second for 0.5 ≤ κ ≤ 2, and (b) the corresponding associated derivative dT(1)/dκ.

Solving Eq. (4.2) for 0.5 ≤ κ ≤ 2 with a forward Euler method using a variable time stepping strategy introduces step discontinuities in the temperature response; this is shown in Figure 4.1(a). For the variable time step strategy, we decrease the time step whenever an allowed temperature increment is exceeded; otherwise we gradually increase the time step. The corresponding discontinuous derivatives are plotted in Figure 4.1(b). Note that although discontinuous, the associated derivatives are everywhere uniquely defined for the numerically computed objective function.

4.1.2 Multivariate example problem: Shape optimization

Figure 4.2: (a) Structure, boundary conditions and control variables, and (b) the vertical displacement uF for variations of the two rightmost upper control variables (x8, x9) for the Michell shape optimization problem.

Next, we consider a non-trivial benchmark problem in structural shape optimization, namely the so-called Michell structure [20] depicted in Figure 4.2(a). The geometry is represented using 16 control variables that are linearly spaced horizontally, with only
The objective of this shape optimization problem is to minimize the sum of the weighted vertical displacement βuF at the point of load application and normalized volume VV0 for a unit thickness structure with F = 1N, V0 = 150mm3 and β = 1. The displacement uF is computed using a linear elastic finite element method with linear strain triangular elements (e.g. see [15]). For the material properties we use Young’s modulus E = 200GPa and Poisson’s ratio ν = 0.3. The meshes required for the finite element analyses are generated using a quadratic convergent remeshing strategy [66] with ideal element length h0 = 1mm. To illustrate the discontinuous nature of the objective function, the two control variables x8 and x9 are perturbed around the reference configuration depicted in Figure 4.2(a) over the range -1.0 through 1.0, using constant intervals of 0.05. The resulting objective function values are shown in Figure 4.2(b). The step discontinuities due to remeshing are clearly evident; they result since the number of nodes, and the nodal connectivity, changes. This is evident from Figure 4.2(b): a small decrease in x9 results in 3 elements (top insert in Figure 4.2(b)) as opposed to 4 elements (bottom insert Figure 4.2(b)) on the rightmost edge of the structure. 4.1.3 Introductory comments Clearly, the introduced non-physical discontinuities cannot be accommodated in optimization methods developed for C 1 continuous objective functions. However, again note that the associated gradients of the piece-wise smooth step discontinuous functions considered in this study are everywhere uniquely defined. Consider the positive projection point x∗g that occurs over a discontinuity as depicted by fN (x) in Figure 3.1, with a piece-wise smooth part L of the function to the left and a piece-wise smooth part R of the function to the right of it. 
Both the left and the right hand limits represent approximations to the analytical value of the objective function; the left and right hand limits differ only as a result of the discretization technique used, and these values approach each other in the limit of mesh refinement in any event. Hence, the value of the objective function reported at the discontinuity is not unique. In this study, we consider the unconstrained optimization of objective functions containing non-physical step or jump discontinuities, with accurate associated gradients that are everywhere defined. For the sake of brevity, we restrict our efforts to unconstrained optimization (but the implications for constrained optimization are clear).

4.2 Definitions

Not all step discontinuities are necessarily problematic for classical optimization, and we distinguish between two step discontinuity types, namely those that are inconsistent with the function trend, and those that are consistent with the function trend, as shown in Figure 4.3. (All other discontinuities may be taken to be representative of either a minimum or a maximum.) To represent semi-continuity of f we introduce a double empty/filled circle convention, as depicted in Figure 4.3(a), where a filled circle indicates F(λ0). Upper semi-continuity is represented by the filled/empty circle pairs indicated by 1's in Figure 4.3(a), i.e. the filled/empty circles lie above f. Lower semi-continuity in turn is represented by the empty/filled circle pairs indicated by 2's in Figure 4.3(a), i.e. the empty/filled circles lie below f.

[Figure 4.3: Upper and lower semi-continuous univariate functions with (a) an inconsistent step discontinuity, and (b) a consistent step discontinuity.]

Figure 4.3(a) depicts an inconsistent step discontinuity; the function decreases as λ increases, but the step discontinuity results in an increase of the function over the step discontinuity.
Similarly, Figure 4.3(b) depicts a consistent step discontinuity. The functions we consider in this study are step discontinuous and therefore not everywhere differentiable. However, computationally the derivatives and gradients are everywhere computable, since the analysis is per se restricted to the part of the objective function before, or after, a discontinuity. We therefore define an associated derivative f′A(x) and associated gradient ∇Af(x), which follow computationally when the sensitivity analysis is consistent [48]. Firstly, we define the associated derivative:

Definition 4.2.1. Let f : X ⊂ R → R be a real univariate piece-wise smooth step discontinuous function that is everywhere defined. The associated derivative f′A(x) of f(x) at a point x is given by the derivative of f(x) at x when f(x) is differentiable at x. The associated derivative f′A(x), for f(x) non-differentiable at x, is given by the left-sided derivative of f(x) when x is associated with the left piece-wise continuous section of the discontinuity; otherwise it is given by the right-sided derivative.

Secondly, the associated gradient is defined as follows:

Definition 4.2.2. Let f : X ⊂ Rn → R be a real multivariate piece-wise smooth step discontinuous function that is everywhere defined. The associated gradient ∇Af(x) of f(x) at a point x is given by the gradient of f(x) at x when f(x) is differentiable at x. The associated gradient ∇Af(x), if f(x) is non-differentiable at x, is defined as the vector of partial derivatives, where each partial derivative is an associated derivative (see Definition 4.2.1).

It follows from Definitions 4.2.1 and 4.2.2 that the associated gradient reduces to the gradient for a function that is everywhere differentiable. We now proceed to develop a self-contained theoretical framework for gradient-only optimization.
Although what follows consists of rather straightforward extensions of classical concepts, it is included for the sake of completeness.

Definition 4.2.3. Let f : (a, b) ⊂ R → R be a real univariate function that is not necessarily continuous in both function value f(λ) and associated derivative f′A(λ), but for which f(λ) and f′A(λ) are uniquely defined for every λ ∈ (a, b). Then, f(λ) is said to have a (resp. strictly) negative associated derivative on (a, b) if f′A(λ) ≤ 0 (resp. < 0), ∀ λ ∈ (a, b), e.g. see Figure 4.3. Conversely, f(λ) is said to have a (resp. strictly) positive associated derivative on (a, b) if f′A(λ) ≥ 0 (resp. > 0), ∀ λ ∈ (a, b).

Next, we define lower and upper semi-continuity of the associated gradient.

Definition 4.2.4. Let f : X ⊂ Rn → R be a real valued function with an associated gradient field ∇Af(x) that is uniquely defined for every x ∈ X.

• The associated directional derivative along a normalized direction u ∈ Rn is lower semi-continuous at y ∈ X if ∇ATf(y)u ≤ lim inf_{λ→0±} ∇ATf(y + λu)u, λ ∈ R.

• The associated directional derivative along a normalized direction u ∈ Rn is upper semi-continuous at y ∈ X if ∇ATf(y)u ≥ lim sup_{λ→0±} ∇ATf(y + λu)u, λ ∈ R.

• The associated directional derivative along a normalized direction u ∈ Rn is pseudo-continuous at y ∈ X if it is both upper and lower semi-continuous at y.

We note that a univariate function f(λ) may be step discontinuous at a point λ̄ ∈ (a, b), while the associated derivative is still pseudo-continuous at λ̄. For example, the function

f(λ) = λ², λ < −1,
f(λ) = λ² − 2, λ ≥ −1,

is step discontinuous at λ̄ = −1. However, the associated derivative

f′A(λ) = 2λ, λ < −1,
f′A(λ) = 2λ, λ ≥ −1,

is pseudo-continuous at λ̄ = −1, where we defined the associated derivative at λ̄ = −1 by the right-hand limit.
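The example above can be made concrete in a few lines (a minimal sketch):

```python
def f(lam):
    # Step discontinuous example: the function value jumps by -2 at lam = -1.
    return lam**2 if lam < -1.0 else lam**2 - 2.0

def f_assoc(lam):
    # Associated derivative: the derivative of whichever smooth piece lam
    # belongs to; here both pieces yield 2*lam, so the associated
    # derivative is pseudo-continuous at lam = -1 despite the jump in f.
    return 2.0 * lam

jump = f(-1.0) - f(-1.0 - 1e-12)             # ≈ -2: step discontinuity in f
print(jump)
print(f_assoc(-1.0 - 1e-12), f_assoc(-1.0))  # both ≈ -2.0: no jump in f'_A
```

Note that a finite difference of f across λ̄ = −1 would diverge as the step shrinks, whereas the associated derivative, computed from the smooth piece itself, remains well defined; this is exactly the property exploited by gradient-only optimization.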
4.3 Gradient-only optimization problem

We now present the general unconstrained gradient-only optimization problem, which is equivalent to the classical minimization problem presented in Section 4.1 for smooth convex cost functions.

Problem 4.3.1. Given a real-valued function f : X ⊂ Rn → R, find a non-negative associated gradient projection point x∗g ∈ X such that, for every u ∈ {y ∈ Rn : ‖y‖ = 1}, there exists a real number ru > 0 such that ∇ATf(x∗g + λu)u ≥ 0 ∀ λ ∈ (0, ru].

Accordingly, we define generalized non-negative (resp. non-positive) associated gradient projection points, which characterize a solution that implies a minimum (resp. maximum) according to the associated gradient field of a scalar function, be it local or global, as follows:

Definition 4.3.1. Suppose that f : X ⊂ Rn → R is a real-valued function for which the associated gradient field ∇Af(x) is uniquely defined for every x ∈ X. Then, a point x∗g ∈ X is a generalized non-negative associated gradient projection point (G-NN-GPP) if for every u ∈ {y ∈ Rn : ‖y‖ = 1} there exists a real number ru > 0 such that ∇ATf(x∗g + λu)u ≥ 0, ∀ λ ∈ (0, ru]. Similarly, a point x∗g ∈ X is a generalized non-positive associated gradient projection point (G-NP-GPP) if for every u ∈ {y ∈ Rn : ‖y‖ = 1} there exists a real number ru > 0 such that ∇ATf(x∗g + λu)u ≤ 0, ∀ λ ∈ (0, ru].

A special case of Problem 4.3.1, which we refer to as the strict unconstrained gradient-only optimization problem, is given below.

Problem 4.3.2. Given a real-valued function f : X ⊂ Rn → R, find an x∗g ∈ X such that, for every u ∈ {y ∈ Rn : ‖y‖ = 1}, there exists a real number ru > 0 such that ∇ATf(x∗g + λu)u > 0 ∀ λ ∈ (0, ru].

Accordingly, we define strict non-negative (resp. non-positive) associated gradient projection points, which imply a minimum (resp. maximum) according to the associated gradient field of a scalar function, be it local or global, as follows:
Definition 4.3.2. Suppose that f : X ⊂ Rn → R is a real-valued function for which the associated gradient field ∇Af(x) is uniquely defined for every x ∈ X. Then, a point x∗g ∈ X is a strict non-negative associated gradient projection point (S-NN-GPP) if for every u ∈ {y ∈ Rn : ‖y‖ = 1} there exists a real number ru > 0 such that ∇ATf(x∗g + λu)u > 0, ∀ λ ∈ (0, ru]. Similarly, a point x∗g ∈ X is a strict non-positive associated gradient projection point (S-NP-GPP) if for every u ∈ {y ∈ Rn : ‖y‖ = 1} there exists a real number ru > 0 such that ∇ATf(x∗g + λu)u < 0, ∀ λ ∈ (0, ru].

It follows that the strict unconstrained gradient-only optimization problem is included in the generalized unconstrained gradient-only optimization problem. We now show that our definition of a generalized non-negative associated gradient projection point (G-NN-GPP) is consistent with the classical mathematical programming (MP) definition of a minimizer. To do so, we consider the associated gradient at a G-NN-GPP for C¹ continuous functions.

Proposition 4.3.3. Let f : X ⊂ Rn → R be continuous with continuous first partial derivatives around a generalized non-negative associated gradient projection point (G-NN-GPP) x∗g ∈ X. Then, ∇Af(x∗g) = 0.

Proof. By Definition 4.3.1 of a G-NN-GPP, ∇ATf(x∗g + λu)u ≥ 0 for all u and λ > 0 sufficiently small. Also, for all corresponding −u, ∇ATf(x∗g + λ(−u))(−u) ≥ 0. Consequently, since f(x) ∈ C¹,

lim_{λ→0} ∇ATf(x∗g + λu)u = ∇ATf(x∗g)u ≥ 0

and

lim_{λ→0} ∇ATf(x∗g − λu)u = ∇ATf(x∗g)u ≤ 0.

Thus, since u ≠ 0 is arbitrarily chosen and ∇Af(x) is continuous, the above two statements can only be true simultaneously if ∇Af(x∗g) = 0, which completes the proof.
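Definitions 4.3.1 and 4.3.2 suggest a simple numerical check: sample unit directions u and small positive steps λ, and inspect the sign of ∇ATf(x∗g + λu)u. The sketch below does this for the step discontinuous example of Section 4.2; the sampling parameters are assumptions, and a finite sample can of course only falsify, never prove, the definition:

```python
import numpy as np

def is_snn_gpp(assoc_grad, x, n_dirs=64, lambdas=(1e-4, 1e-3, 1e-2), seed=0):
    """Numerically test Definition 4.3.2 at x: for sampled unit directions
    u and small positive steps lam, require assoc_grad(x + lam*u)^T u > 0.
    A sampling-based sketch, not a proof."""
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    for _ in range(n_dirs):
        u = rng.standard_normal(x.size)
        u /= np.linalg.norm(u)
        for lam in lambdas:
            if assoc_grad(x + lam * u) @ u <= 0.0:
                return False
    return True

# For f(lam) = lam^2 (lam < -1), lam^2 - 2 (lam >= -1), the associated
# gradient is 2*lam everywhere, so lam = 0 is an S-NN-GPP while the
# discontinuity at lam = -1 is not.
grad = lambda x: 2.0 * x
print(is_snn_gpp(grad, [0.0]))   # True
print(is_snn_gpp(grad, [-1.0]))  # False
```

The check deliberately never evaluates f itself; only the associated gradient field is consulted, in the spirit of the gradient-only problem statement.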
4.3.1 Discontinuous gradient projection points (GPP)

Our newly introduced definitions of a non-negative associated gradient projection point (NN-GPP) and a non-positive associated gradient projection point (NP-GPP) of a function only require that the associated gradient field be uniquely defined everywhere; no assumptions regarding the continuity of the function are required. Hereafter, associated gradient projection point (GPP) or associated gradient projection set (GPS) is used to imply either a non-negative or a non-positive associated gradient projection point or set. We therefore omit the conventional inclusion of a saddle point or set. In addition, the function may be discontinuous at a GPP.

[Figure 4.4: An illustration of (a) the function value and (b) the corresponding associated derivative that is either upper or lower semi-continuous, with a step discontinuous strict non-negative associated gradient projection point (S-NN-GPP) λ∗ ∈ (d, e).]

We first consider discontinuous GPPs for univariate functions, and then for multivariate functions. An example of a function with a discontinuous NN-GPP is the absolute value function, with the associated derivative at the minimum point λ∗ defined by either the left or the right limit, as depicted in Figure 4.4, as opposed to the conventional undefined derivative at λ∗. The associated derivative at λ∗ is therefore either upper or lower semi-continuous, as indicated by the double empty/filled circle notation.

Proposition 4.3.4. Let f : [d, e] ⊂ R → R be a real univariate function that is not necessarily continuous in both function value f(λ) and associated derivative f′A(λ), but for which f(λ) and f′A(λ) are uniquely defined for every λ ∈ [d, e]. In addition, let f′A(λ) be step discontinuous (upper or lower associated derivative semi-continuous) at a (resp. generalized / strict) associated gradient projection point ((resp.
G/S)-GPP) λ∗ ∈ (d, e) according to Definition (resp. 4.3.1 / 4.3.2). Let λ∗L denote the left limit and λ∗R the right limit of λ∗. Then, in addition to the (resp. G/S)-GPP λ∗, either λ∗L is a (resp. G/S)-GPP if

lim_{λ→λ∗−} f′A(λ) ≠ f′A(λ∗),

or λ∗R is a (resp. G/S)-GPP if

lim_{λ→λ∗+} f′A(λ) ≠ f′A(λ∗).

Proof. This is immediate from Definition (resp. 4.3.1 / 4.3.2).

For multivariate functions we can state a similar proposition.

Proposition 4.3.5. Let f : X ⊂ Rn → R be a real valued function with an associated gradient field ∇Af(x) that is uniquely defined for every x ∈ X. In addition, let ∇Af(x) be step discontinuous at a (resp. G/S)-GPP x∗g ∈ X according to Definition (resp. 4.3.1 / 4.3.2). Then the limit of every sequence to x∗g is also a (resp. G/S)-GPP if

lim_{x→x∗g} ∇Af(x) ≠ ∇Af(x∗g).

Proof. This is immediate from Proposition 4.3.4.

We now introduce a (resp. generalized / strict) associated gradient projection set ((resp. G/S)-GPS) to accommodate all the (resp. generalized / strict) associated gradient projection points ((resp. G/S)-GPPs) at a (resp. G/S)-GPP x∗g.

Definition 4.3.3. Let f : X ⊂ Rn → R be a real valued function with an associated gradient field ∇Af(x) that is uniquely defined for every x ∈ X. In addition, let x∗g ∈ X be a (resp. G/S)-GPP according to Definition (resp. 4.3.1 / 4.3.2). We define the set S as follows:

S = { x∗g , y : lim_{y→x∗g} ∇Af(y) ≠ ∇Af(x∗g), y ∈ Rn }.

The set S is then a (resp. generalized / strict) non-negative associated gradient projection set (resp. SG−NN / SS−NN) of x∗g if every x ∈ (resp. SG−NN / SS−NN) is a (resp. G/S)-NN-GPP according to Definition (resp. 4.3.1 / 4.3.2). The set S is then a (resp. generalized / strict) non-positive associated gradient projection set (resp. SG−NP / SS−NP) of x∗g if every x ∈ (resp. SG−NP / SS−NP) is a (resp. G/S)-NP-GPP according to Definition (resp. 4.3.1 / 4.3.2). We now show that our definition of a (resp.
generalized / strict) associated gradient projection set (resp. SG / SS) is consistent with the classical mathematical programming definition of a minimum or maximum point.

Proposition 4.3.6. Let f : X ⊂ Rn → R be a real valued function with a continuous associated gradient field ∇Af(x), x ∈ X. Then, any (resp. generalized / strict) associated gradient projection set (resp. SG / SS) of x∗g ∈ X is a singleton {x∗g} such that ∇Af(x∗g) = 0.

Proof. It follows from the definition of a (resp. SG / SS) given in Definition 4.3.3 that

lim_{y→x∗g} ∇Af(y) = ∇Af(x∗g), y ∈ Rn,

since f(x) is C¹ continuous at x∗g by premise, by which (resp. SG / SS) is reduced to a singleton. The second assertion, that ∇Af(x∗g) = 0, follows from Proposition 4.3.3.

4.3.2 Derivative descent sequences

Now that we have defined GPPs and GPSs solely based on the associated gradient field of a function, we proceed to define descent sequences that only consider the associated gradient field of a function.

Definition 4.3.4. For a given sequence {x{k} ∈ X ⊂ Rn : k ∈ P}, suppose ∇Af(x{k}) ≠ 0 for some k and x{k} ∉ SG−NN, with SG−NN defined in Definition 4.3.3. Then the sequence {x{k}} is an associated derivative descent sequence for f : X → R if an associated sequence {u{k} ∈ Rn : k ∈ P} may be generated such that if u{k} is a descent direction from the set of all possible descent directions at x{k}, i.e.

∇ATf(x{k})u{k} < 0,

then

∇ATf(x{k+1})u{k} < 0, for x{k} ≠ x{k+1}. (4.4)

We also include the definition of a stricter class of associated derivative descent sequences, which we require for the convergence proofs of multimodal functions of dimension two and higher, in order to exclude oscillating sequences. Oscillating sequences may occur when the sequence defined in Definition 4.3.4 is considered.

Definition 4.3.5.
For a given sequence {x{k} ∈ X ⊂ Rn : k ∈ P}, suppose ∇Af(x{k}) ≠ 0 for some k and x{k} ∉ SG−NN, with SG−NN defined in Definition 4.3.3. Then the sequence {x{k}} is a conservative associated derivative descent sequence for f : X → R if an associated sequence {u{k} ∈ Rn : k ∈ P} may be generated such that if u{k} is a descent direction from the set of all possible descent directions at x{k}, then

∇ATf(x{k} + λ(x{k+1} − x{k}))u{k} < 0, ∀ λ ∈ [0, 1], for x{k} ≠ x{k+1}. (4.5)

4.4 Proofs of convergence for derivative descent sequences

Before we present proofs of convergence of (conservative) associated derivative descent sequences, we include gradient-only definitions of two well-known concepts in classical mathematical programming to simplify our proofs of convergence. First, we present a definition of coercive functions based solely on the associated gradient of a function [41]. Although this definition does not bear a strict analogy with the conventional definition of coercivity, it suffices for our purposes.

Definition 4.4.1. Let x1, x2 ∈ Rn. Then a real valued function f : X ⊂ Rn → R with associated gradient field ∇Af(x) that is uniquely defined for every x ∈ X is associated derivative coercive if there exists a positive number RM such that ∇ATf(x2)(x2 − x1) > ε, with ε > 0 ∈ R, for non-perpendicular ∇Af(x2) and (x2 − x1), whenever ‖x2‖ ≥ RM and ‖x1‖ < RM.

Secondly, we present definitions for univariate and multivariate associated gradient unimodality based solely on the associated gradient field of a real valued function [4].

Definition 4.4.2. A univariate function f : X ⊂ R → R with associated derivative f′A(λ) uniquely defined for every λ ∈ X is (resp. strictly) associated derivative unimodal over X if there exists an x∗g ∈ X such that

f′A(x∗g + λu)u ≥ 0 (resp. > 0), ∀ λ ∈ {β : β > 0, β ∈ R} and ∀ u ∈ {−1, 1} such that [x∗g + λu] ∈ X.
(4.6)

We now consider (resp. strictly) associated derivative unimodality for multivariate functions [46].

Definition 4.4.3. A multivariate function f : X ⊂ Rn → R is (resp. strictly) associated derivative unimodal over X if, for all x1 and x2 ∈ X with x1 ≠ x2, every corresponding univariate function F(λ) = f(x1 + λ(x2 − x1)), λ ∈ [0, 1] ⊂ R, is (resp. strictly) associated derivative unimodal according to Definition 4.4.2.

4.4.1 Univariate functions

Now that we have an associated derivative based definition of unimodality for univariate functions, we present a proof of convergence for strictly associated derivative unimodal univariate functions when associated derivative descent sequences are considered.

Theorem 4.4.1. Let f : Λ ⊆ R → ]−∞, ∞] be a univariate function that is strictly associated derivative unimodal as defined in Definition 4.4.2, with first associated derivative f′A : Λ → ]−∞, ∞[ uniquely defined everywhere on Λ. If λ{0} ∈ Λ and {λ{k}} is an associated derivative descent sequence, as defined in Definition 4.3.4, for f with initial point λ{0}, then every subsequence of {λ{k}} converges. The limit of any convergent subsequence of {λ{k}} is a strict non-negative associated gradient projection point (S-NN-GPP), as defined in Definition 4.3.2, of f.

Proof. Our assertion that f is strictly associated derivative unimodal as defined in Definition 4.4.2 implies that f has only one S-NN-GPS SS−NN ⊂ Λ, as defined in Definition 4.3.3, at λ∗ ∈ Λ. Let λr ∈ SS−NN be such that |λ{k} − λr| is a maximum. Consider a sequence of 1-balls {B(bk, εk)} defined around bk = ½(λ{k} + λr) with radius εk = ½|λ{k} − λr|. Then every λ{k+1} ∈ B(bk, εk), since {λ{k}} is an associated derivative descent sequence as defined in Definition 4.3.4 and f is strictly associated derivative unimodal as defined in Definition 4.4.2. Therefore, k → ∞ implies |λ{k} − λr| → 0.
It follows from the Cauchy criterion for sequences that {λ{k}} is convergent, which completes the proof of our first assertion. Now let {λ{k}m} be a convergent subsequence of {λ{k}} and let λm∗ be its limit. Suppose, contrary to the second assertion of the theorem, that λm∗ is not an S-NN-GPP, as defined in Definition 4.3.2, of f. Since we assume that λm∗ is not an S-NN-GPP, and by Definition 4.3.4, there exists a λm∗ + δ for δ ≠ 0 ∈ R such that f′A(λm∗ + δ) < 0, which contradicts our assumption that λm∗ is the limit of the subsequence {λ{k}m}. Therefore, for λm∗ to be the limit of an associated derivative descent subsequence {λ{k}m}, λm∗ ∈ SS−NN, which completes the proof.

We now proceed with a proof of convergence for generalized associated derivative unimodal univariate functions when associated derivative descent sequences are considered.

Theorem 4.4.2. Let f : Λ ⊆ R → ]−∞, ∞] be a univariate function that is associated derivative unimodal, as defined in Definition 4.4.2, with first associated derivative f′A : Λ → ]−∞, ∞[ uniquely defined everywhere on Λ. If λ{0} ∈ Λ and {λ{k}} is an associated derivative descent sequence, as defined in Definition 4.3.4, for f with initial point λ{0}, then every subsequence of {λ{k}} converges. The limit of any convergent subsequence of {λ{k}} is a generalized non-negative associated gradient projection point (G-NN-GPP), as defined in Definition 4.3.1, of f.

Proof. Our assertion that f is associated derivative unimodal as defined in Definition 4.4.2 implies that f has at least one G-NN-GPS SG−NN ⊂ Λ as defined in Definition 4.3.3. Let S ⊂ Λ be the union of the G-NN-GPSs SG−NN. Consider the jth sequence of 1-balls {B(bk, εk)}j defined around bk = ½(λ{k} + (λ∗j ∈ S)) with radius εk = ½|λ{k} − (λ∗j ∈ S)|. Then λ{k+1} ∈ B(bk, εk)j for every sequence j, since {λ{k}} is an associated derivative descent sequence as defined in Definition 4.3.4 and f is associated derivative unimodal as defined in Definition 4.4.2.
Therefore, k → ∞ implies |λ{k} − (λ∗j ∈ S)| → aj, with aj a constant. Since |λ{k} − (λ∗j ∈ S)| − aj → 0 for every j, it follows from the Cauchy criterion for sequences that {λ{k}} is convergent, which completes the proof of our first assertion. Now let {λ{k}m} be a convergent subsequence of {λ{k}} and let λm∗ be its limit. Suppose, contrary to the second assertion of the theorem, that λm∗ is not a G-NN-GPP, as defined in Definition 4.3.1, of f. Since we assume that λm∗ is not a G-NN-GPP, and by Definition 4.3.4, there exists a λm∗ + δ for δ ≠ 0 ∈ R such that f′A(λm∗ + δ) < 0, which contradicts our assumption that λm∗ is the limit of the subsequence {λ{k}m}. Therefore, for λm∗ to be the limit of an associated derivative descent subsequence (see Definition 4.3.4) {λ{k}m}, λm∗ ∈ S, which completes the proof.

Now that we have concluded our proofs for (strictly) associated derivative unimodal univariate functions, we present a proof of convergence for univariate associated derivative coercive functions that have at least one S-NN-GPS.

Theorem 4.4.3. Let f : Λ ⊆ R → ]−∞, ∞] be a univariate associated derivative coercive function, as defined in Definition 4.4.1, with first associated derivative f′A : Λ → ]−∞, ∞[ uniquely defined everywhere on Λ. If λ{0} ∈ Λ and {λ{k}} is an associated derivative descent sequence, as defined in Definition 4.3.4, for f with initial point λ{0}, then there exists at least one convergent subsequence of {λ{k}}. The limit of any convergent subsequence of {λ{k}} is an S-NN-GPP of f.

Proof. Since we only consider associated derivative descent sequences {λ{k}}, our assertion that f is associated derivative coercive implies a closed interval [a, b] ⊂ Λ in which the sequence remains. That the sequence {λ{k}} is bounded follows from our premise on f. It follows from the Bolzano-Weierstrass theorem that, in a closed interval [a, b], every sequence has a subsequence that converges to a point in the interval [8].
Now let {λ{k}m} be a convergent subsequence of {λ{k}} and let λm∗ ∈ Λ be its limit. Suppose, contrary to the second assertion of the theorem, that λm∗ is not an S-NN-GPP of f. Since we assume that λm∗ is not an S-NN-GPP, and by Definition 4.3.4, there exists a λm∗ + δ for δ ≠ 0 ∈ R such that f′A(λm∗ + δ) < 0, which contradicts our assumption that λm∗ is the limit of the subsequence {λ{k}m}. Therefore, for λm∗ to be the limit of an associated derivative descent sequence (see Definition 4.3.4) {λ{k}m}, λm∗ ∈ SS−NN with SS−NN ⊂ Λ, which completes the proof.

4.4.2 Multivariate functions

We begin our proofs of convergence of associated derivative descent sequences for multivariate functions with C¹ continuous convex functions [41], whereupon we present proofs of convergence for broader classes of functions.

Theorem 4.4.4. Suppose f : X ⊆ Rn → R is a C¹ continuous convex function with x ∈ X. If x{0} ∈ X and {x{k}} is an associated derivative descent sequence, as defined in Definition 4.3.4, for f with initial point x{0}, then every subsequence of {x{k}} converges. The limit of any convergent subsequence of {x{k}} is an S-NN-GPP, as defined in Definition 4.3.2, of f.

Proof. Our assertion that f is convex and C¹ continuous ensures that f has a single global minimizer x∗g ∈ X. Also, by Definition 4.3.4 and the continuity of the first partial derivatives, we see that {f(x{k})} is a decreasing sequence that is bounded below by f(x∗g). It follows that {x{k}} is a bounded sequence, since f is convex. The Bolzano-Weierstrass theorem implies that {x{k}} has at least one convergent subsequence, which completes the proof of our first assertion [41]. Now let {x{k}m} be a convergent subsequence of {x{k}} and let xm∗ ∈ X be its limit.
Suppose, contrary to the second assertion of the theorem, that xm∗ is not an S-NN-GPP as defined in Definition 4.3.2 of f, which from our continuity assumption implies ∇Af(xm∗) ≠ 0, which in turn implies that there exists a descent direction um∗ at xm∗ such that um∗ ≠ 0. Since {x{k}m} is an associated derivative descent sequence as defined in Definition 4.3.4, of which the limit xm∗ is by assumption not an S-NN-GPP, we have −∇ATf(xm∗)∇Af(xm∗) < 0. It follows from the continuity assumptions that there exists a small λ > 0 ∈ R such that −∇ATf(xm∗ + λum∗)∇Af(xm∗) < 0, which contradicts our assumption that xm∗ is the limit of the sequence {x{k}m}. Therefore, for xm∗ to be the limit of an associated derivative descent sequence {x{k}m}, ∇Af(xm∗) = 0, which in turn implies um∗ = 0. The limit xm∗ of an associated derivative descent sequence as defined in Definition 4.3.4 is therefore an S-NN-GPP as defined in Definition 4.3.2, which completes the proof.

Before we proceed to present a proof of convergence for C¹ continuous associated derivative coercive functions, we show that if a function is associated derivative coercive and C¹ continuous, it has at least one global minimizer.

Proposition 4.4.5. Suppose f : X ⊆ Rn → R is a C¹ continuous associated derivative coercive function, as defined in Definition 4.4.1, with x ∈ X. Then f has at least one S-NN-GPP as defined in Definition 4.3.2.

Proof. Let x1, x2, x3 ∈ Rn. Since f is associated derivative coercive as defined in Definition 4.4.1, there exists by definition a number RM such that for every {x2 : ‖x2‖ > RM} and every {x1 : ‖x1‖ < RM} the following holds: ∇ATf(x2)(x2 − x1) > 0 for non-perpendicular ∇Af(x2) and (x2 − x1). In addition, there exists {x3 : ‖x3‖ < RM} such that ∇ATf(x3)(x3 − x1) > 0. Therefore, the set {x : ‖x‖ ≤ RM} is closed and bounded, which by the continuity assumption implies that f(x) assumes a minimum value on {x : ‖x‖ ≤ RM} at a point x∗g ∈ X.
From the continuity assumption on the first partial associated derivatives, it follows that ∇Af(x∗g) = 0 [41]. It therefore follows from the continuity assumptions that Definition 4.3.2 holds at x∗g.

Theorem 4.4.6. Suppose f : X ⊆ Rn → R is a C¹ continuous associated derivative coercive function, as defined in Definition 4.4.1, with x ∈ X. If x{0} ∈ X and {x{k}} is a conservative associated derivative descent sequence, as defined in Definition 4.3.5, for f with initial point x{0}, then some subsequence of {x{k}} converges. The limit of any convergent subsequence of {x{k}} is a G-NN-GPP, as defined in Definition 4.3.1, of f.

Proof. Our assertion that f is continuous and associated derivative coercive ensures that f has a global minimizer x∗g ∈ X. Also, by the definition of a conservative associated derivative descent sequence and the continuity of the first partial associated derivatives, we see that {f(x{k})} is a decreasing sequence that is bounded below by f(x∗g). Note that we require conservative associated derivative descent sequences, since an associated derivative descent sequence is not sufficient to guarantee convergence, as it may result in oscillatory behavior for n > 1. The remainder of the proof is similar to the proof of Theorem 4.4.4.

We now proceed to functions that are either C⁰ continuous or discontinuous, but for which the function values and associated gradient field are uniquely defined everywhere. We present classes of C⁰ continuous or discontinuous functions for which convergence is guaranteed, since associated derivative descent sequences may not converge to an NN-GPP when all C⁰ continuous or discontinuous functions are considered, as is evident from the following example. Consider the linear programming problem of finding the intersection between two intersecting planes.
Since the associated gradient on each plane is constant, a steepest descent sequence that terminates at the intersection of the two planes is an example of a sequence that converges to a point that is not an NN-GPP. Hence, we now present classes of well-posed discontinuous functions for which convergence is guaranteed.

Definition 4.4.4. We consider the (resp. generalized / strict) gradient-only optimization problem to be well-posed (resp. convex / unimodal) associated derivative when

1. the associated gradient field is everywhere uniquely defined,

2. the problem is associated derivative coercive as defined in Definition 4.4.1,

3. there exists one and only one (resp. G/S)-NN-GPS (resp. SG−NN / SS−NN) as defined in Definition 4.3.3, and

4. every associated derivative descent sequence as defined in Definition 4.3.4 has at least one subsequence converging to a point in (resp. SG−NN / SS−NN).

We now present a class of well-posed associated derivative coercive functions; this class includes multimodal functions.

Definition 4.4.5. We consider the gradient-only optimization problem to be (resp. proper / generalized) well-posed associated derivative coercive when

1. the associated gradient field is everywhere uniquely defined,

2. the problem is associated derivative coercive as defined in Definition 4.4.1,

3. there exists at least one (resp. G/S)-NN-GPS (resp. SG−NN / SS−NN) as defined in Definition 4.3.3, and

4. every conservative associated derivative descent sequence as defined in Definition 4.3.5 has at least one subsequence converging to a point in (resp. SG−NN / SS−NN).

We note that the classes of functions defined in Definitions 4.4.4 - 4.4.5 still exclude many problems of practical significance, e.g. linear programming problems. Many of these practically significant problems may be accommodated by altering Definitions 4.4.4 - 4.4.5 to hold only for specific associated derivative descent sequences.
4.5 Practical algorithmic considerations

We now consider some practical algorithmic implications of the foregoing, relying in particular on the new definitions of an associated derivative critical point presented in Definitions 4.3.1 and 4.3.2. We aim to give a fairly general outline for modifying classical gradient based optimization algorithms to become gradient-only optimization algorithms; often this merely requires subtle modifications to conventional gradient based algorithms. We consider two classes of optimization algorithms, namely line search descent methods and approximation methods; both are prevalent in practical optimization.

4.5.1 Line search descent methods

Line searches are generally present in first order methods (e.g. the steepest descent and conjugate gradient methods) and in second order methods (e.g. the modified Newton methods, such as the Davidon-Fletcher-Powell (DFP) and Broyden-Fletcher-Goldfarb-Shanno (BFGS) methods) [55]. In any event, for a given iteration k, k = 1, 2, 3, . . ., the current position is given by x{k−1} and the search direction by u{k} at x{k−1}. In general, line search methods use function values along the search direction u{k}. In formulating a rudimentary gradient-only algorithm, the line search simply needs to be modified to only consider the associated directional derivative along the search direction u{k}. Let us therefore consider line search bracketing strategies (of which the Fibonacci and golden ratio searches are examples) in the following. Function-value based bracketing strategies require a minimum of three points to bracket a minimum of F(λ) = f(x{k−1} + λu{k}) along u{k}. Consequently, three points from the sequence [w(l − 1), w(l), w(l + 1)], l = 1, 2, . . . , lmax, are used, with w(j) = γj. The line search iterations l are incremented until either a minimum is located or the maximum number of line search iterations lmax is reached.
Once an interval that contains the minimum is located, a minimum of four points, i.e. three intervals, is required to refine the minimum. The aim of function-value based bracketing strategies is to find λ{k} such that, for ξ ∈ R and ξ > 0,

F(λ{k} + ξ) = f(x{k−1} + (λ{k} + ξ)u{k}) > F(λ{k}), and
F(λ{k} − ξ) = f(x{k−1} + (λ{k} − ξ)u{k}) > F(λ{k}).   (4.7)

Line search bracketing strategies are easily modified to use only associated gradient information, e.g. see Bazaraa et al. [4], as we will now illustrate. Modifying bracketing strategies to require only associated gradient information requires a minimum of two points to bracket a sign change in the associated directional derivative along u{k} from negative to positive. Therefore, two points from the sequence [w(l − 1), w(l)], l = 1, 2, . . . , lmax, are used, with w(j) = γj. The line search iterations l are incremented until either a sign change in the associated directional derivative F′A(λ) is located or the maximum number of line search iterations lmax is reached. Once an interval that contains a sign change is located, a minimum of three points, i.e. two intervals, is required to refine the location of the sign change. The aim of gradient-only bracketing strategies is to locate λ{k} such that

F′A(λ{k} + ξ) = ∇A^T f(x{k−1} + (λ{k} + ξ)u{k}) u{k} > 0,   (4.8)

and

F′A(λ{k} − ξ) = ∇A^T f(x{k−1} + (λ{k} − ξ)u{k}) u{k} < 0.   (4.9)

Note that we merely locate the point at which the associated directional derivative changes from negative to positive; the requirement that the associated directional derivative equal zero is therefore relaxed. This is particularly important when considering discontinuous functions. For smooth functions, the sign change from negative to positive of course occurs at the point where the associated directional derivative is zero.
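A minimal sketch of such a gradient-only bracketing line search follows (Python with NumPy assumed; the routine name and the bisection refinement are our own choices, not the exact implementation used in this study):

```python
import numpy as np

def gradient_only_line_search(grad, x, u, gamma=0.1, l_max=3000, refine=60):
    """Locate lambda at which the associated directional derivative
    F'_A(lambda) = grad(x + lambda*u)^T u changes sign from negative
    to positive along the search direction u."""
    dF = lambda lam: grad(x + lam * u) @ u

    # Bracketing: march along w(l) = gamma*l until the sign change is bracketed.
    a, b = 0.0, gamma
    l = 1
    while dF(b) < 0.0 and l < l_max:
        a, b = b, b + gamma
        l += 1

    # Refinement: bisection on the sign of F'_A only. No function values
    # are used, so a step discontinuity inside the bracket cannot create
    # a spurious bracketed minimum.
    for _ in range(refine):
        m = 0.5 * (a + b)
        if dF(m) < 0.0:
            a = m
        else:
            b = m
    return 0.5 * (a + b)
```

For a smooth function the located point coincides with the zero of the directional derivative; for instance, with f(x) = ‖x‖², grad(x) = 2x, x = (1, 0) and u = (−1, 0), the search returns λ ≈ 1.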
In addition, inflection points are handled appropriately, as no sign change in the associated directional derivative occurs over an inflection point.

Algorithmic implementation

We now consider the algorithmic implementation of the second-order line search BFGS method for unconstrained minimization. Given an initial point x{0}, the BFGS implementation proceeds as follows:

1. Initialization: Select real constants ε > 0, ξ > 0 and γ > 0. Select integer constants kmax and lmax. Set G{0} = I. Set k := 0 and l := 0.
2. Gradient evaluation: Compute ∇Af(x{k}).
3. Update the search direction: u{k+1} = −G{k}∇Af(x{k}).
4. Initiate an inner loop to conduct the line search: Find λ{k+1} using the line search strategy described in Section 4.5.1, using either (4.7) for the classical or (4.8)-(4.9) for the gradient-only variant.
5. Test for re-initialization of G{k}: if k mod n = 0 then G{k} = I, else

G{k} = G{k−1} + [1 + ((y{k})^T G{k−1} y{k}) / ((v{k})^T y{k})] (v{k}(v{k})^T) / ((v{k})^T y{k})
       − (v{k}(y{k})^T G{k−1} + G{k−1} y{k}(v{k})^T) / ((v{k})^T y{k}),

with v{k} = λ{k}u{k} and y{k} = ∇Af(x{k}) − ∇Af(x{k−1}).
6. Move to the new iterate: Set x{k+1} := x{k} + λ{k+1}u{k+1}.
7. Convergence test: if ‖x{k+1} − x{k}‖ ≤ ε OR k = kmax, stop.
8. Initiate an additional outer loop: Set k := k + 1 and goto Step 2.

4.5.2 Approximation methods

Approximation methods can also be formulated using only associated gradient information, e.g. see Groenwold et al. [22]. Let us consider approximation functions f̃ that use the second order Taylor series expansion of a function f around some current iterate x{k}, given by

f̃{k}(x) = f(x{k}) + ∇A^T f(x{k})(x − x{k}) + (1/2)(x − x{k})^T H{k} (x − x{k}), k = 0, 1, 2, . . . ,   (4.10)

where superscript k represents an iteration number, f̃ the second order Taylor series approximation to f, ∇A the associated gradient operator and H{k} the Hessian.
f(x{k}) and ∇Af(x{k}) respectively represent the function value and associated gradient vector at the current iterate x{k}. Generally speaking, approximation methods use only function value information in constructing H{k} (due to the excessive computational effort associated with evaluating and storing H{k} in the first place). Consider for example a diagonal spherical quadratic approximation, with H{k} = c{k}I. The unknown c{k} can be obtained by enforcing f̃{k}(x{k−1}) = f(x{k−1}), which results in

f(x{k−1}) = f(x{k}) + ∇A^T f(x{k})(x{k−1} − x{k}) + (c{k}/2)(x{k−1} − x{k})^T (x{k−1} − x{k}),   (4.11)

e.g. see Snyman and Hay [56]. The scalar c{k} is then obtained as

c{k} = [2(f(x{k−1}) − f(x{k})) − 2∇A^T f(x{k})(x{k−1} − x{k})] / [(x{k−1} − x{k})^T (x{k−1} − x{k})].   (4.12)

Approximations based solely on gradient information may be constructed by taking the derivative of (4.10), which gives

∇f̃{k}(x) = ∇Af(x{k}) + H{k}(x − x{k}), k = 0, 1, 2, . . .   (4.13)

Note that at x = x{k}, the associated gradient of the function f(x) exactly matches the gradient of the approximation function f̃(x). Notationally, we write the gradient instead of the associated gradient of the approximation function to emphasize the differentiability of the approximation function. The Hessian H{k} of the approximation f̃ is chosen to match some additional condition. Let us again consider a spherical quadratic approximation, with H{k} = c{k}I. Then, c{k} may be obtained by matching the gradient vectors at x{k−1}. Since only a single free parameter c{k} is available, the n components of the respective gradient vectors can (for example) be matched in a least squares sense. The least squares error is given by

E{k} = (∇f̃{k}(x{k−1}) − ∇Af(x{k−1}))^T (∇f̃{k}(x{k−1}) − ∇Af(x{k−1})).
(4.14)

After substitution of ∇f̃{k}(x{k−1}) = ∇Af(x{k}) + c{k}(x{k−1} − x{k}), we have

E{k} = (∇Af(x{k}) + c{k}(x{k−1} − x{k}) − ∇Af(x{k−1}))^T (∇Af(x{k}) + c{k}(x{k−1} − x{k}) − ∇Af(x{k−1})).   (4.15)

Minimization of the least squares error E{k} w.r.t. c{k} then gives

dE{k}/dc{k} = (∇Af(x{k}) + c{k}(x{k−1} − x{k}) − ∇Af(x{k−1}))^T (x{k−1} − x{k}) + (x{k−1} − x{k})^T (∇Af(x{k}) + c{k}(x{k−1} − x{k}) − ∇Af(x{k−1})) = 0,   (4.16)

hence

c{k} = [(x{k−1} − x{k})^T (∇Af(x{k−1}) − ∇Af(x{k}))] / [(x{k−1} − x{k})^T (x{k−1} − x{k})].   (4.17)

If the approximation is required to be strictly convex, we can enforce c{k} = max(β, c{k}), with β > 0 small and prescribed. Since the sequential approximate subproblems are smooth, they may be solved analytically; the minimizer of subproblem k follows from setting (4.13) equal to 0 [52], to give

x{k∗} = x{k} − ∇Af(x{k}) / c{k}.   (4.18)

4.5.3 Conservative approximations

Global convergence of sequential approximation methods may, for example, be effected through the notion of conservatism. Classical conservatism is based solely on function values, for which Svanberg [58] demonstrated that an approximation sequence k = 1, 2, · · · will terminate at the global minimizer x∗ of f, if each kth approximation is conservative, i.e. if

f̃(x{k∗}) ≥ f(x{k∗}) ∀ k.   (4.19)

Conservatism may also be effected using only associated gradient information. At iterate x{k∗}, the update is given by x{k∗} − x{k}, and conservatism is effected if the projection of the associated gradient ∇Af(x{k∗}) of the actual function f(x) onto the update direction x{k∗} − x{k} is negative. For univariate functions, an update is conservative if it is an associated derivative descent update step (see Definition 4.3.4). For multivariate functions, an update is conservative if it is a conservative associated derivative descent update step (see Definition 4.3.5), i.e.
if

∇A^T f(x{k∗})(x{k∗} − x{k}) < 0.   (4.20)

Hence, enforcement of the conditions given by Definition 4.3.4 or 4.3.5 suffices to ensure an associated derivative descent sequence, for which proofs of convergence are offered in Section 4.4. To allow for update steps that are computable, we employ a trust region strategy in which we limit ‖x{k∗} − x{k}‖ ≤ γ.

Algorithmic implementation

Given an initial point x{0}, a {gradient-only}/classical conservative algorithm based on convex separable spherical quadratic approximations (SSA) for unconstrained minimization proceeds as follows:

1. Initialization: Select real constants ε > 0, α > 1 and initial curvature c{0} > 0. Set k := 0, l := 0.
2. Gradient evaluation: Compute {∇Af(x{k})} / {f(x{k}) and ∇Af(x{k})}.
3. Approximate optimization: Construct the local approximate subproblem {(4.13)}/(4.10) at x{k}, using {(4.17)}/(4.12), unless inside an inner loop, in which case use c{k} as calculated in Step 6(b). Solve this subproblem analytically, to arrive at x{k∗}.
4. Evaluation: Compute {∇Af(x{k∗})}/f(x{k∗}).
5. Test if x{k∗} is acceptable: if {(4.20)}/(4.19) is satisfied, goto Step 7.
6. Initiate an inner loop to effect conservatism: (a) Set l := l + 1. (b) Set c{k} := αc{k}. (c) Goto Step 3.
7. Move to the new iterate: Set x{k+1} := x{k∗}.
8. Convergence test: if ‖x{k+1} − x{k}‖ ≤ ε OR k = kmax, stop.
9. Initiate an additional outer loop: Set k := k + 1 and goto Step 2.

4.5.4 Termination criteria

Termination criteria also need some consideration: if the function values and associated gradients of an objective or cost function contain step discontinuities, these quantities may not provide robust termination information. Accordingly, we only advocate the robust termination criterion

‖Δx{k+1}‖ = ‖x{k+1} − x{k}‖ < ε,   (4.21)

with ε small, positive and prescribed. (A maximum number of iterations may of course also be prescribed, but this is not robust.)

4.6 Mathematical programming vs.
gradient-only optimization

We now briefly reflect on some differences between gradient-only optimization and classical 'mathematical programming'. Consider the step discontinuities depicted in Figure 4.5.

Figure 4.5: Plots depicting (a)-(d) the function values, and (e)-(h) the corresponding associated derivatives, of four instances of step discontinuous univariate functions.

In classical mathematical programming, the inconsistent step discontinuity depicted in Figure 4.5(a) results in a local minimum, whereas the function with the consistent step discontinuity depicted in Figure 4.5(b) is monotonically decreasing. The step discontinuities depicted in Figures 4.5(c)-(d) again result in local minima. In gradient-only optimization, the inconsistent step discontinuity in Figure 4.5(a) is associated derivative negative, as is the consistent step discontinuity depicted in Figure 4.5(b). The step discontinuities depicted in Figures 4.5(c)-(d) represent strict non-negative gradient projection points (S-NN-GPPs), as shown in the corresponding associated derivative plots. Consider the objective functions depicted in Figure 4.6. Clearly, classical optimization approaches may become trapped in local minima caused by inconsistent step discontinuities, whereas gradient-only optimization approaches will not. Hence, gradient-only optimization allows for a robust strategy to avoid inconsistent step discontinuities when the minimizer x∗ of an objective function coincides with a strict non-negative gradient projection point (S-NN-GPP) x∗g, as shown in Figure 4.6(b). However, gradient-only approaches will ignore a global minimizer x∗ of an objective function that occurs over an inconsistent step discontinuity, as depicted in Figure 4.6(a), and converge to an S-NN-GPP x∗g instead.
Hence, whether a function-value based or a gradient-only based criterion is to be used depends on which one best describes or approximates the solution of an optimization problem.

Figure 4.6: Plots depicting a step discontinuous objective function with (a) a distinct minimizer x∗ and strict non-negative gradient projection point (S-NN-GPP) x∗g, and (b) coinciding minimizer x∗ and S-NN-GPP x∗g.

4.7 Numerical study

We start our numerical study with a practical shape optimization problem using a remeshing strategy that results in a discontinuous objective function. We then proceed with a set of discontinuous test functions aimed at "mimicking" non-physical discontinuities in functions. The advantage of introducing a set of test problems is that they are easily implemented, which allows for focussed research on algorithm development and testing without requiring access to a variable discretization PDE solver. The disadvantage of test problems is that only part of the complexity of PDE based objective functions is captured. The algorithmic settings used in the numerical study are presented in Table 4.1 for the algorithms outlined in Sections 4.5.1 and 4.5.3. The choice of γ = 0.1 is deliberate, as we aim to "mimic" a locally exact line search at the cost of computational efficiency; the results should be interpreted in view of this. The aim is to highlight the differences between function-value and gradient-only based line search strategies. It is evident that the probability of getting trapped locally using function-value based line search strategies is reduced when γ is increased, or when some interpolation strategy is used in the line search, e.g. Powell's method [55].
The latter is evident from the approximation results. We note, however, that although many non-physical local minima may be avoided using either of the two strategies, neither is robust, and the algorithms may still get trapped in non-physical local minima.

Table 4.1: Algorithmic settings used in the numerical experiments.

  ε      γ     ξ      α   kmax   lmax
  10−5   0.1   10−6   2   3000   3000

Table 4.2: Tabulated results obtained for the unconstrained Michell-like structure.

  Algorithm   f(x{Nk})    ‖∇Af(x{Nk})‖   ‖Δx{Nk}‖    Nk    Nl
  BFGS(f)     7.293E-01   3.163E-02      0.000E+00   5     137
  BFGS(g)     5.213E-01   3.118E-03      0.000E+00   38    816
  SSA(f)      5.805E-01   1.719E-02      7.896E-06   23    10
  SSA(g)      5.285E-01   3.438E-03      3.249E-06   103   7

4.7.1 Results

4.7.2 Shape optimization

We now consider the isotropic shape optimization problem outlined in Section 4.1.2. In addition to the algorithm settings given in Table 4.1, we also limit the maximum step size of each algorithm to 2. The results for the BFGS(f), BFGS(g), SSA(f) and SSA(g) algorithms are summarized in Table 4.2, with the respective final designs depicted in Figures 4.7(a)-(d). Recall that the (f) postfix indicates classical function-value based algorithms, whereas the (g) postfix indicates gradient-only optimization algorithms. Table 4.2 presents the function value f(x{Nk}), associated gradient norm ‖∇Af(x{Nk})‖, convergence tolerance ‖Δx{Nk}‖, number of outer iterations Nk, as well as the number of inner iterations Nl. Consider BFGS(f), which converged after only Nk = 5 outer iterations and evidently got trapped in a local minimum due to a numerically induced step discontinuity. The premature final design is apparent from Figure 4.7(a). Similarly, SSA(f) converged after Nk = 23 outer iterations, also after getting trapped in a minimum caused by a step discontinuity. The behaviour of the cost function around the converged solution of SSA(f) is depicted in Figure 4.2 of Section 4.1.2.
Significant improvements are evident by comparing Figure 4.7(c) to Figure 4.7(a), although noticeable improvements could still be made. Clearly, conservative approximation methods are able to overcome some step discontinuities, and of course even more so when conservatism is relaxed. Conversely, BFGS(g) and SSA(g) were able to optimize the Michell structure without getting trapped in numerically induced step discontinuities; consider the similar designs depicted in Figures 4.7(b) and (d). It is clear that BFGS(g) and SSA(g) improved notably on the designs obtained with BFGS(f) and SSA(f). We further present for each algorithm their respective histories w.r.t. function value f(x{k}), associated gradient norm ‖∇Af(x{k})‖ and convergence tolerance ‖Δx{k}‖.

Figure 4.7: Michell-like structure: converged designs obtained with (a) BFGS(f), (b) BFGS(g), (c) SSA(f), and (d) SSA(g).

Figure 4.8: Michell-like structure: BFGS(f) and BFGS(g) convergence history plots of the (a) function value f(x{k}), (b) associated gradient norm ‖∇Af(x{k})‖, and (c) convergence tolerance ‖Δx{k}‖.

The respective histories for the BFGS algorithms are depicted in Figures 4.8(a)-(c), and for the SSA algorithms in Figures 4.9(a)-(c). Monotonic function value decrease for both BFGS(f) and SSA(f) is clearly depicted in Figure 4.8(a) and Figure 4.9(a) respectively, with the respective associated gradient norms depicted in Figure 4.8(b) and Figure 4.9(b). The convergence histories are depicted in Figure 4.8(c) and Figure 4.9(c).
Conversely, non-monotonic function value decrease for both BFGS(g) and SSA(g) is evident in Figure 4.8(a) and Figure 4.9(a), with the respective associated gradient norms depicted in Figure 4.8(b) and Figure 4.9(b). The convergence histories are depicted in Figure 4.8(c) and Figure 4.9(c).

Figure 4.9: Michell-like structure: SSA(f) and SSA(g) convergence history plots of the (a) function value f(x{k}), (b) associated gradient norm ‖∇Af(x{k})‖, and (c) convergence tolerance ‖Δx{k}‖.

4.7.3 Analytical set of test problems

We now present a set of five analytical step discontinuous test problems in order to further illustrate the advantages of gradient-only optimization.

Rosenbrock step discontinuous function f1 is piecewise defined as follows:

f1(x) =
  (1/1.1) Σ_{i=1}^{n/2} [ 100 (x(2i) − x²(2i−1))² + (1 − x(2i−1))² ],   if 0 ≤ sin(2‖x‖) < 2/3,
  1.1 Σ_{i=1}^{n/2} [ 100 (x(2i) − x²(2i−1))² + (1 − x(2i−1))² ],   if −2/3 ≤ sin(2‖x‖) < 0,
  Σ_{i=1}^{n/2} [ 100 (x(2i) − x²(2i−1))² + (1 − x(2i−1))² ],   if sin(2‖x‖) ≥ 2/3 or sin(2‖x‖) < −2/3.   (4.22)

Quadric step discontinuous function f2 is piecewise defined as follows:

f2(x) =
  Σ_{i=1}^{n} ( Σ_{j=1}^{i} x(j) )²,   if sin(8‖x‖) > 0.5,
  1.3 Σ_{i=1}^{n} ( Σ_{j=1}^{i} x(j) )²,   if sin(8‖x‖) < −0.5,
  (1/1.3) Σ_{i=1}^{n} ( Σ_{j=1}^{i} x(j) )²,   if −0.5 ≤ sin(8‖x‖) ≤ 0.5.   (4.23)

Sum squares step discontinuous function f3 is piecewise defined as follows:

f3(x) =
  (1/1.5) Σ_{i=1}^{n} i x²(i),   if sin( (1/10) Σ_{j=1}^{n} x(j) ) > 0.5,
  1.5 Σ_{i=1}^{n} i x²(i),   if sin( (1/10) Σ_{j=1}^{n} x(j) ) < −0.5,
  Σ_{i=1}^{n} ( i x²(i) + 1/n ),   if −0.5 ≤ sin( (1/10) Σ_{j=1}^{n} x(j) ) ≤ 0.5.   (4.24)
Zakharov step discontinuous function f4 is piecewise defined as follows:

f4(x) =
  (1/1.5) [ Σ_{i=1}^{n} x²(i) + ( Σ_{i=1}^{n} i x²(i)/2 )² + ( Σ_{i=1}^{n} i x²(i)/2 )⁴ ],   if sin(‖x‖) > 0.5,
  1.5 [ Σ_{i=1}^{n} x²(i) + ( Σ_{i=1}^{n} i x²(i)/2 )² + ( Σ_{i=1}^{n} i x²(i)/2 )⁴ ] + 0.5,   if sin(‖x‖) < −0.5,
  Σ_{i=1}^{n} x²(i) + ( Σ_{i=1}^{n} i x²(i)/2 )² + ( Σ_{i=1}^{n} i x²(i)/2 )⁴ + 1,   if −0.5 ≤ sin(‖x‖) ≤ 0.5.   (4.25)

Hyper ellipsoid step discontinuous function f5 is piecewise defined as follows:

f5(x) =
  Σ_{i=1}^{n} [ (1/1.1) 2^{i−1} x²(i) + 1/n ],   if sin( 2 Σ_{j=1}^{n} x(j) ) > 0.5,
  Σ_{i=1}^{n} [ 1.1 · 2^{i−1} x²(i) + 1/n ],   if sin( 2 Σ_{j=1}^{n} x(j) ) < 0,
  Σ_{i=1}^{n} 2^{i−1} x²(i),   if 0 ≤ sin( 2 Σ_{j=1}^{n} x(j) ) ≤ 0.5.   (4.26)

This set of step discontinuous test problems "mimics" functions that contain non-physical discontinuities. Our aim is to overcome the discontinuities to obtain x∗g as outlined in Definition 4.3.3. The region around the solution of f1, f2 and f4 is continuous, as opposed to the regions around the optima of f3 and f5, which are discontinuous. The solution of f1 is given by x∗(i) = 1, i = 1, 2, . . . , n, with f1∗ = 0, whereas the solutions of f2 and f4 are given by x∗(i) = 0, i = 1, 2, . . . , n, with f2∗ = 0 and f4∗ = 1 respectively. The solutions of f3 and f5 are at discontinuities and are therefore defined by derivative critical sets S. The derivative critical sets for both f3 and f5 are defined by x∗(i) = 0, i = 1, 2, . . . , n. The possible function values at the optima are f3∗ = {0, 1} and f5∗ = {0, 1}, depending on the direction from which the discontinuity is approached. The gradient field for each test function is given by the analytical gradient of each test function, whereas the gradient at a discontinuous point is defined by the analytical gradient of the active equation of the test function at that point. Results are presented for dimension n = 10 of the test problem set given in Section 4.7.3. The starting point of each algorithm for each problem is x{0}(i) = 4, i = 1, 2, . . . , n.
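To tie the test set to the algorithms of Section 4.5, the following sketch (Python with NumPy; a deliberately simplified variant of SSA(g) without the trust region and full bookkeeping, and assuming the branch scalings 1, 1.3 and 1/1.3 of (4.23)) implements f2, its associated gradient, and a gradient-only conservative spherical quadratic driver:

```python
import numpy as np

def f2(x):
    """Quadric step discontinuous test function (4.23): the smooth quadric
    sum_i (sum_{j<=i} x(j))^2 scaled by 1, 1.3 or 1/1.3 via sin(8||x||)."""
    base = np.sum(np.cumsum(x) ** 2)
    t = np.sin(8.0 * np.linalg.norm(x))
    return base if t > 0.5 else 1.3 * base if t < -0.5 else base / 1.3

def grad_f2(x):
    """Associated gradient: the analytical gradient of the branch active at x."""
    s = np.cumsum(x)
    g = 2.0 * np.cumsum(s[::-1])[::-1]   # d/dx_j sum_i s_i^2 = 2 sum_{i>=j} s_i
    t = np.sin(8.0 * np.linalg.norm(x))
    return g if t > 0.5 else 1.3 * g if t < -0.5 else g / 1.3

def ssa_gradient_only(grad, x0, eps=1e-6, alpha=2.0, c=1.0, k_max=3000):
    """Gradient-only conservative spherical quadratic driver: the subproblem
    minimizer x - grad/c (4.18) is accepted only if the conservatism
    condition (4.20) holds, otherwise c is inflated in an inner loop; c is
    then updated from the gradient-only curvature (4.17)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    for _ in range(k_max):
        for _ in range(100):             # inner loop: enforce conservatism
            x_t = x - g / c
            g_t = grad(x_t)
            if g_t @ (x_t - x) < 0.0:    # conservative update step (4.20)
                break
            c *= alpha
        if np.linalg.norm(x_t - x) <= eps:
            return x_t
        d = x - x_t                      # curvature for next subproblem (4.17)
        c = max((d @ (g - g_t)) / (d @ d), 1e-12)
        x, g = x_t, g_t
    return x

# Drive f2 for n = 10 from the remote starting point used in the study.
x_g = ssa_gradient_only(grad_f2, 4.0 * np.ones(10))
```

The sign-based acceptance test uses no function values at all, so the piecewise scaling of f2 cannot trap the iteration in one of its non-physical local minima.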
Numerical results are presented in Table 4.3. Nk and Nl respectively represent the number of function or gradient evaluations in the outer and inner loops. We have not limited the step size of the approximation algorithms; this is normally not done in algorithms based on conservatism (although it may sometimes be beneficial).

Table 4.3: Results for the step discontinuous test problem set.

  Function   Quantity          BFGS(f)     BFGS(g)     SSA(f)      SSA(g)
  f1         f∗                5.969E+04   1.520E-05   3.130E+00   5.440E-04
             ‖∇Af∗‖            3.718E+04   3.333E-03   9.599E-01   2.546E-02
             ‖x{Nk∗} − x∗‖     9.771E+00   9.158E-03   5.248E+00   5.497E-02
             Nk                4           269         99          757
             Nl                66          3310        133         739
  f2         f∗                4.612E+03   1.106E-08   1.044E+01   5.017E-07
             ‖∇Af∗‖            7.918E+02   2.083E-04   1.021E+01   9.291E-04
             ‖x{Nk∗} − x∗‖     1.250E+01   2.282E-04   2.683E+00   1.540E-03
             Nk                3           173         27          101
             Nl                42          2480        106         90
  f3         f∗                8.353E+00   3.443E-10   8.371E+00   2.573E-09
             ‖∇Af∗‖            7.999E+00   3.949E-05   6.344E+00   1.031E-04
             ‖x{Nk∗} − x∗‖     2.861E+00   1.843E-05   2.981E+00   5.043E-05
             Nk                5           59          25          30
             Nl                228         965         93          18
  f4         f∗                1.136E+08   1.000E+00   3.418E+01   1.000E+00
             ‖∇Af∗‖            4.319E+07   1.240E-03   1.157E+01   1.525E-03
             ‖x{Nk∗} − x∗‖     1.204E+01   7.679E-06   5.760E+00   7.624E-04
             Nk                3           5           30          130
             Nl                47          222         50          235
  f5         f∗                1.037E+04   7.162E-07   2.055E+02   1.051E-06
             ‖∇Af∗‖            3.371E+03   2.196E-03   1.238E+02   2.961E-03
             ‖x{Nk∗} − x∗‖     1.204E+01   8.454E-04   7.072E+00   1.024E-03
             Nk                3           763         39          232
             Nl                52          10597       159         222

The results presented in Table 4.3 show that gradient-only optimization algorithms are able to robustly minimize step discontinuous objective functions. In contrast, the classical function-value based optimization algorithms converged to local minima on each of the problems. It is clear from Table 4.3 that the function-value based approximation algorithm SSA(f) is able to overcome many of the non-physical local minima.
However, SSA(f) does not represent a robust strategy, as it still converged to a local minimum on each of the test problems.

4.8 Conclusions

We have studied the unconstrained minimization of functions containing step or jump discontinuities, for which associated gradients can be computed everywhere. Step or jump discontinuities arise during the solution of systems of (partial) differential equations, when variable spatial and temporal discretization techniques produce discontinuities that are artifacts of the approximate numerical strategies used. We demonstrated that, while discontinuous, these problems may effectively be minimized if only gradient information is used. Various algorithmic options were discussed, and numerical results were presented for a practical shape optimization problem as well as for a set of analytical test functions. We presented a mathematical framework for gradient-only optimization that includes convergence proofs for classes of piecewise smooth step discontinuous functions. The implication of our approach is that variable discretization strategies, which are so important in numerical discretization methods, may be used in combination with efficient local optimization algorithms, notwithstanding the fact that these strategies themselves introduce step discontinuities.

CHAPTER 5

Adaptive remeshing in shape optimization

As discussed in Chapter 2, Persson and Strang have previously proposed an unstructured remeshing strategy based on a truss structure analogy, which we in turn rendered quadratically convergent. Herein, we turn our quadratically convergent mesh generator into an adaptive generator, by allowing for a spatially varying ideal element length field, computed using the Zienkiewicz-Zhu error indicator. The remeshing strategy makes (semi) analytical sensitivities available for use in gradient based optimization algorithms.
To circumvent difficulties associated with local minima due to remeshing, we rely on the gradient-only optimization algorithms presented in Chapters 3 and 4. Numerical results are presented for an orthotropic cantilever beam, an orthotropic Michell-like structure and a spanner design problem. This chapter is arranged as follows. Firstly, we present an overview of adaptive mesh refinement in shape optimization in Section 5.1, followed by a description of the gradient-only shape optimization problem in Section 5.2. Thereafter, we briefly outline the gradient-only optimization algorithm used in this study in Section 5.3. We then discuss the structural analysis, including the a posteriori error analysis and mesh refinement strategy, in Section 5.4. Our adaptive mesh generation strategy is presented in Section 5.5, followed in Section 5.6 by a sensitivity analysis. Section 5.7 contains all the numerical results, which include a convergence study and three example problems. Some conclusions are offered in Section 5.8.

5.1 Introduction

In Chapter 2, we presented a remeshing strategy for finite element based shape optimization [66]. Recall that this remeshing strategy is based on a truss structure analogy [42], and that the equilibrium position of the truss system is solved for using Newton's method. As described, the method uses (semi) analytical sensitivity information, which is computed efficiently, and which makes the use of highly efficient gradient based optimization algorithms possible. However, the numerically computed objective function of the shape optimization problem is discontinuous, as shown in Chapter 3. These discontinuities are due to changes in the mesh topology¹ that result from the remeshing strategy. Many of these discontinuities manifest themselves as local minima in the objective function, causing difficulties for conventional gradient-based optimization algorithms [2].
These difficulties are often accommodated by “smoothing” of the objective function. Approaches for smoothing the objective function include the construction of inherently smooth objective functions by avoiding remeshing altogether, and by using mesh movement strategies [9]. Surrogate approaches may also be used to construct smooth representations of the discontinuous objective function, and to reduce the magnitudes of the discontinuities [48]. However, mesh movement strategies are susceptible to element distortion and inversion, while surrogate methods scale poorly with problem dimension. Controlling the discretization error is computationally expensive, since multiple finite element analyses (FEAs) are usually required for each candidate shape design [16, 23, 30, 48]. Alternatively, the discontinuous objective functions may be optimized directly. Selected approaches include conventional gradient-based optimization algorithms used in combination with restart strategies, and derivative free optimization methods [7, 11, 14]. These strategies are usually also computationally expensive due to the high number of required iterations, and/or poor scaling with problem dimensionality. As a further alternative, we have demonstrated in Chapter 3 [65] that gradient-only optimization is able to robustly and efficiently optimize the discontinuous objective functions that occur in shape optimization. As pointed out in Chapters 3 and 4, gradient-only optimization algorithms are conventional gradient based optimization algorithms - which invariably exploit not only first-order gradient information, but also zeroth order function value information - modified to no longer use function value information. Hence, the computational efficiency of gradient-only optimization algorithms is sometimes comparable to “conventional” gradient based optimization algorithms, provided that computationally efficient sensitivities are available. 
In this chapter, we aim to extend the remeshing shape optimization strategy proposed in Chapter 2, by adding error indicators, with the objective of improving the accuracy of the computed structural response for a fixed number of degrees of freedom. In addition, (semi) analytical sensitivities are made available for use in gradient-only optimization approaches.

¹ Mesh topology is understood to refer to the number of nodes and nodal connectivity of a mesh.

Figure 5.1: FE-error indicator integration into optimization.

We can indeed incorporate error indicators freely, since we are able to robustly and efficiently optimize the discontinuous objective functions resulting from changes in the mesh topology. Shape optimization may be a natural companion to a posteriori adaptive finite element mesh refinement, since both techniques share the computational burden of multiple analyses [16, 23, 30, 48]. A large portion of the computational burden associated with a posteriori adaptive finite element mesh refinement in once-off analyses is already accommodated during shape optimization.
A posteriori error indicators can be incorporated into finite element based shape optimization environments using two distinctly different approaches [48], depending on whether changes in mesh topology are allowed between optimization iterations or not. Approaches that require the mesh topology to remain fixed between optimization iterations only update the mesh topology after the optimization update, as depicted in Figure 5.1(a). Hence, a single FEA is required for each candidate shape design if the error remains sufficiently small after a shape design update. Otherwise, multiple analyses are required per candidate shape design in order to reduce the error. Since the function values computed with different mesh topologies differ for the same candidate shape design, the optimization update may have to be repeated with the updated mesh topology [48]. This strategy may be efficient when design changes are small and no error control is required between design updates. Alternatively, the mesh topology may be updated to control the discretization error during each optimization update, as depicted in Figure 5.1(b). Multiple finite element analyses may be required for each candidate shape design to control the discretization
Instead of obtaining a converged error for each candidate shape design, we allow the error to converge as the shape designs converge [61]. Our proposed strategy requires only a single FEA for each candidate shape design. This is achieved by mapping the computed error indicator of a given shape geometry to the geometry obtained after an iteration of our gradient-only optimization algorithm. The mapping of the error indicator field between two shape geometries is merely a relocation of the nodal positions of the error indicator mesh from one shape geometry to the next using radial basis functions, as opposed to a linearization of the error indicator field between two shape geometries [10]. The advantage of this mapping is that the number of computations required is smaller than that required to compute a linearized error indicator field, or the actual error indicator field, which would require a full FEA.

5.2 Shape optimization problem

The problem under consideration is the equality constrained shape optimization problem, for which the Lagrangian is given by

    L(x, λ) = F(Ω(x)) + Σ_{j=1}^{m} λ_j g_j(x),   x ∈ X ⊆ Rⁿ and λ ∈ Rᵐ,   (5.1)

where the objective function F(Ω(x)) is a scalar function that depends on the geometry Ω of the structure, which in turn depends on the control variables x that describe the geometrical boundary ∂Ω. The equality constraints g_j(x) = 0, j = 1, 2, · · · , m are scalar functions of the control variables x. For the sake of brevity, the cost function and the constraints will respectively be denoted by F(x) and g(x); this notation will however imply dependency on Ω(x). We choose to represent the geometrical boundary ∂Ω by a simple piecewise linear interpolation between the control variables. However, Bezier curves or B-splines, etc. may of course also be used. Normally, the saddle point of (5.1) is solved for using the dual formulation

    max_λ { min_x L(x, λ) }.   (5.2)
We however solve (5.1) using the gradient-only dual formulation [65]

    max^g_λ { min^g_x L(x, λ) },   (5.3)

with max^g defined as follows: find λ, such that

    ∇_λᵀ L(x, λ + γ_v v) v ≤ 0,   ∀ v ∈ Rᵐ.   (5.4)

Similarly, min^g is defined as follows: find x, such that

    ∇_xᵀ L(x + δ_u u, λ) u ≥ 0,   ∀ u ∈ Rⁿ such that x + δ_u u ∈ X,   (5.5)

with X the convex set of all possible solutions, ∇_x the partial derivatives w.r.t. x, ∇_λ the partial derivatives w.r.t. λ, and δ_u and γ_v real positive numbers. Note that we have only exploited gradient information of L(x, λ).

5.3 Optimization algorithm

We will use the gradient-only sequential spherical approximation (SSA) algorithm presented by Wilke et al. [65] to optimize the discontinuous shape optimization problem. For the sake of completeness and brevity, we merely outline the algorithm here (for details on the algorithm, and a motivation for using gradient-only optimization methods in the first place, the reader is referred to [65]):

1. Initialization: Select real constants ε > 0 and α > 1, an initial curvature c{0} > 0 and an initial point [x{0} λ{0}]. Set k := 1, s := 0.

2. Gradient evaluation: Compute ∇_xᵀ L([x{k} λ{k}]).

3. Approximate optimization: Construct the local gradient-only approximate subproblem

    ∇f̃{k}(x) = ∇_xᵀ L([x{k} λ{k}]) + H{k} (x − x{k})   (5.6)

at x{k}, using H{k} = c{k} I, where I is the identity matrix and

    c{k} = (x{k−1} − x{k})ᵀ (∇_x L([x{k−1} λ{k}]) − ∇_x L([x{k} λ{k}])) / ((x{k−1} − x{k})ᵀ (x{k−1} − x{k})).   (5.7)

(In an inner loop, use c{k} as calculated in Step 6(b).) Solve this subproblem analytically, to arrive at x{k∗}.

4. Evaluation: Compute ∇_xᵀ L([x{k∗} λ{k}]).

5. Test if x{k∗} is acceptable: if

    ∇_xᵀ L([x{k∗} λ{k}]) (x{k∗} − x{k}) ≤ ∇ᵀf̃(x{k∗}) (x{k∗} − x{k}) = 0,   (5.8)

goto Step 7.

6. Initiate an inner loop to effect conservatism:
(a) Set s := s + 1.
(b) Set c{k} := α c{k}.
(c) Goto Step 3.
7. Move to the new iterate: Set x{k+1} := x{k∗}.

8. Update multiplier: Set λ{k+1} := λ{k} + λ_s{k+1}, with λ_s{k+1} the multiplier update step.

9. Convergence test: if ‖[Δx{k} Δλ{k}]‖ < ε, OR ‖Δx{i}‖ < ε ∀ i = {k − 4, k − 3, . . . , k}, OR k = k_max, stop. (The notation used is Δφ{k} = φ{k+1} − φ{k}.)

10. Initiate an additional outer loop: Set k := k + 1 and goto Step 2.

5.4 Structural analysis

In shape optimization, the cost function F(x) = F(u(X(x))) is an explicit function of the nodal displacements u, which in turn are a function of the discretized geometrical domain X. The discretized geometrical domain X is described by the control variables x, which represent the geometrical boundary ∂Ω. The nodal displacements u are obtained by solving the approximate finite element equilibrium equations for linear elasticity,

    K u = f,   (5.9)

where K represents the assembled structural stiffness matrix and f the consistent structural nodal loads, from which the unknown displacements u can be computed. From u, we can then locally compute the elemental stress fields

    σ̂_e = C B_e u_e,   (5.10)

with constitutive relationship C, element kinematic relation B_e and element displacements u_e. By combining the local stress fields σ̂_e of adjacent elements, we obtain a global discontinuous stress field σ̂ over the entire structure, since inter-elemental stress continuity is not enforced. As the true stress field is continuous, error indicators may be recovered from the discontinuous stress field [70]. As said, shape optimization and a posteriori adaptive finite element refinement may naturally complement each other, since both imply multiple FEAs. Instead of only conducting a FEA for each candidate shape design, we also recover error indicators from these FEAs. The recovered error from a given shape design is then used to discretize an updated shape design, causing the refinement strategy to converge as the shape design converges.
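To make the gradient-only SSA algorithm of Section 5.3 concrete, the following is a minimal sketch of its unconstrained form. Only gradient calls are made, so function values of a step discontinuous cost function are never sampled. The function name, the test problem in the usage note and the positivity safeguard on the curvature are our own illustrative choices, not part of the reference implementation of [65].

```python
import numpy as np

def ssa_minimize(grad, x0, c0=1.0, alpha=2.0, eps=1e-8, k_max=100):
    """Gradient-only sequential spherical approximation (sketch).

    grad : callable returning the gradient of the cost function;
           no zeroth order (function value) information is used.
    """
    x = np.asarray(x0, dtype=float)
    g, c = grad(x), c0
    for _ in range(k_max):
        while True:
            # spherical subproblem (5.6) with H = cI, solved analytically
            x_star = x - g / c
            g_star = grad(x_star)
            # acceptance test (5.8): the true directional derivative along
            # the step must not exceed that of the approximation (zero)
            if g_star @ (x_star - x) <= 0.0:
                break
            c *= alpha  # inner loop, Step 6(b): enforce conservatism
        dx, dg = x - x_star, g - g_star
        x, g = x_star, g_star
        if np.linalg.norm(dx) < eps:
            break
        # curvature update (5.7), with a positivity floor (our safeguard)
        c = max((dx @ dg) / (dx @ dx), 1e-12)
    return x
```

For a smooth convex quadratic, e.g. `grad = lambda x: 2.0 * (x - np.array([3.0, -1.0]))`, the sketch recovers the minimizer [3, −1] in a few gradient evaluations.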
5.4.1 Recovery-based global error indicator

Although many recovery-based error indicators exist, ranging from so-called global to local indicators [69], we will herein opt for only the well-known Zienkiewicz-Zhu (ZZ) global error indicator [71]. We do so merely to avoid distraction - other indicators may equally well be used. The ZZ error indicator approximates the exact error in the energy norm by considering the difference in the energy norm of the smooth stress field σ̆ and the discrete stress field σ̂ as follows:

    ‖e‖² = ∫_Ω (σ̆ − σ̂)ᵀ C⁻¹ (σ̆ − σ̂) dΩ.   (5.11)

The smooth stress field σ̆ is obtained from a least squares fit through the discrete nodal stress values. This requires a system of the size of the number of nodes to be solved. For the iᵗʰ element, the square of the energy norm of the finite element solution ‖υ̂_i‖² is given by

    ‖υ̂_i‖² = u_iᵀ K_i u_i.   (5.12)

The corrected energy norm ‖υ‖, which is used to approximate the exact energy norm, is given by

    ‖υ‖² = ‖υ̂‖² + ‖e‖²,   (5.13)

where ‖e‖² and ‖υ̂‖² are computed by summing the elemental contributions, Σ_{i=1}^{r} ‖e_i‖² and Σ_{i=1}^{r} ‖υ̂_i‖², where r is the number of elements. Using the corrected energy norm ‖υ‖, the average element error ē is computed by taking a fraction of the root mean square of the corrected energy norm, defined as

    ē = ι ‖υ‖ / √r,   (5.14)

where ι represents the relative error tolerance. We choose to keep the number of nodes constant in our remeshing strategy, and as a result, we select ι = 1. For our numerical studies, we compute the global error η as

    η = √(‖e‖² / ‖υ‖²).   (5.15)

5.4.2 Refinement procedure

Using the computed error, we seek a refinement strategy to indicate spatial refinement (respectively de-refinement) of the mesh.
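Before detailing the refinement procedure, the error bookkeeping of (5.13)-(5.15), together with the per-element refinement update of Section 5.4.2, can be sketched as follows. The sketch assumes the per-element error norms ‖e_i‖ and energy norms ‖υ̂_i‖ have already been recovered; all function and variable names are our own illustrative choices, and the nodal-averaging smoothing step is omitted, so the elemental field stands in for the continuous one.

```python
import numpy as np

def zz_global_error(e_elem, v_hat_elem):
    """Global error bookkeeping of (5.13)-(5.15), with iota = 1 (sketch)."""
    e2 = np.sum(np.asarray(e_elem) ** 2)           # ||e||^2 by elemental sums
    v2 = np.sum(np.asarray(v_hat_elem) ** 2) + e2  # corrected norm (5.13)
    e_bar = np.sqrt(v2 / len(e_elem))              # average element error (5.14)
    eta = np.sqrt(e2 / v2)                         # global error (5.15)
    return eta, e_bar

def ideal_element_lengths(h_prev, e_elem, e_bar, p=5, h0=1.0,
                          kappa=1.0, h_min=0.1):
    """r-refinement update of the ideal element length field (sketch):
    refinement ratio, length update, normalization and minimum limit."""
    xi = np.asarray(e_elem) / e_bar                # refinement ratio
    h_hat = np.asarray(h_prev) / xi ** (1.0 / p)   # ideal element lengths
    h_breve = h_hat / h_hat.mean() * kappa * h0    # normalization
    return np.maximum(h_breve, h_min)              # minimum-length limit
```

Elements with above-average error (ratio greater than one) receive a shorter ideal length, i.e. refinement, and vice versa.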
For this, we modify the refinement procedure of Zienkiewicz and Zhu [71] to suit our r-refinement strategy: for the iᵗʰ element at the kᵗʰ iteration, we compute the refinement ratio ξ_i{k} as

    ξ_i{k} = ‖e_i{k}‖ / ē{k}.   (5.16)

In turn, we use the refinement ratio ξ_i{k} to compute the ideal element length

    ĥ_i{k} = h_i{k−1} / (ξ_i{k})^{1/p},   (5.17)

with h{0} chosen as the ideal element length of a uniform mesh for the first iteration. (Ideal element length refers to the "unloaded" truss lengths of the truss members in our truss structure analogous mesh generator [66] - also see Section 5.5.) This also defines the initial number of nodes for our r-refinement strategy. Here, p is usually selected as the polynomial order of the shape functions away from singularities, and adjusted near singularities [71]. Our mesh generator naturally generates linear strain triangle (LST) elements, for which we found experimentally that p < 4 may result in oscillatory behavior. Hence, we have somewhat arbitrarily selected p = 5 herein [61]. We then smooth the discrete elemental scalar field ĥ{k} using nodal averaging to obtain a piecewise continuous ideal element length field h̃{k} described by a finite element interpolation. Finally, we normalize the continuous ideal element length field h̃{k} to obtain

    h̆{k} = (h̃{k} / h̄{k}) κ h{0},   (5.18)

with h̄{k} the average of the continuous ideal element length field h̃{k} and κ a scaling factor. We select κ as constant, but allowing κ to vary (as some function of the initial area, current area, number of initial boundary nodes and current boundary) may well be beneficial. Finally, we limit the minimum ideal element length:

    h̆{k} = h_min   ∀ h̆{k} < h_min.   (5.19)

5.5 Adaptive mesh generator

Our mesh generator solves for the equilibrium of a truss structure [42], which doubles as the finite element mesh, using the ideal element length field h{k}, k = 1, 2, 3, . . .
as the unloaded truss lengths, as opposed to directly optimizing the mesh according to some optimality criterion [37, 60]. It has been demonstrated [42, 65, 66] that this approach generates "good" meshes. We start with an initial uniform mesh X{0} at k = 0, using a uniform ideal element length field h{0} [66]. After each analysis we compute the ideal element length field h̆{k} using the refinement strategy described in Section 5.4.2. Recall that we then merely relocate the nodal positions of the computed ideal element length field to the new candidate shape design obtained from the optimization step, to avoid multiple analyses per candidate shape design, as illustrated in Figure 5.1(c). (We will describe the details of the mapping of the error field in Section 5.5.2.) As with our previous mesh generator [66], we partition the mesh X{k} into the interior nodes X{k}Ω and boundary nodes X{k}∂Ω, which allows independent treatment of the boundary nodes ∂Ω{k} and interior nodes Ω{k}. Superscript k denotes the iteration counter, which we will omit for the sake of brevity, unless we explicitly want to highlight the dependency on k. The boundary nodes X∂Ω are seeded according to the ideal element length field h̆{k} along the geometrical boundary ∂Ω, with nodes explicitly placed on the control variable locations x. This ensures accurate representation of the defined geometrical domain Ω. Therefore x ⊂ X∂Ω, with ∂Ω described by a piecewise linear interpolation of x. The boundary nodes X∂Ω remain fixed during the current iteration of the mesh generation process. We therefore only solve for XΩ in finding the equilibrium of the truss structure,

    FΩ(z(XΩ)) = 0.   (5.20)

The equilibrium of the truss structure is related to the interior nodes XΩ via the force function

    z(XΩ) = z(l(XΩ), l₀(h̆{k}, XΩ)) = K(l₀ − l),   (5.21)

which depends on the constant spring stiffness K, the lengths of the truss members l(XΩ) and the undeformed truss lengths l₀(h̆{k}, XΩ).
The undeformed truss lengths l₀(h̆{k}, XΩ) depend on the ideal element length field h̆{k}, as well as on the interior nodes XΩ, since the ideal element length field h̆{k} is evaluated at the midpoint of each truss member. However, the ideal element length field h̆{k}(e) is taken as a constant background field [10, 27, 50, 61] during the mesh generation process, i.e. we do not recompute the error field or linearize the error field when the interior nodes XΩ of the mesh vary. Consequently, the dependency of the ideal element length field h̆{k}(e) on the spatially varying a posteriori error field e is constant and the associated sensitivity zero. Lastly, the displacement field u(XΩ) depends on the interior nodes XΩ. The reduced truss system in Eq. (5.20) is solved directly via the quadratically convergent Newton's method, which is given by

    (∂FΩ/∂XΩ) ΔXΩ = −FΩ.   (5.22)

The update of the nodal coordinates is given by

    XΩ_{n+1} = XΩ_n + ΔXΩ,   (5.23)

and for a constant ideal element length background field the consistent tangent ∂FΩ/∂XΩ is given by

    ∂FΩ/∂XΩ = (∂FΩ/∂l)(∂l/∂XΩ) + (∂FΩ/∂l₀{k})(∂l₀{k}/∂XΩ).   (5.24)

We obtain quadratic convergence using Newton's method.

5.5.1 Boundary nodes

A first approach to accommodate a spatially varying ideal element length field may be to merely change the ideal element length and boundary spacing, while the number of boundary nodes and interior nodes follow from our previous uniform remeshing strategy [66]. However, limited improvements are achieved using this naive strategy. In this study we let the number of boundary nodes follow from the spatially varying ideal element length field by first placing nodes along the boundary X∂Ω. Since we aim to keep the number of nodes constant, the remaining nodes are seeded in the interior domain XΩ.
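The Newton solution of the truss equilibrium, equations (5.20)-(5.24), can be illustrated by a one-dimensional analogue: interior nodes of a bar are relocated until every spring, whose unloaded length is the background field evaluated at the member midpoint, is in equilibrium, while the two boundary nodes stay fixed. This is a loose analogue for illustration only; for brevity a finite difference Jacobian stands in for the consistent tangent (5.24), and the linearly growing ideal element length field is an arbitrary choice.

```python
import numpy as np

def mesh_1d_truss(n_nodes, length=10.0, h_field=lambda s: 1.0 + 0.1 * s,
                  k=1.0, tol=1e-10, it_max=50):
    """1D analogue of the truss-structure mesh generator (sketch)."""
    def residual(xi):
        xs = np.concatenate(([0.0], xi, [length]))
        l = np.diff(xs)                         # deformed member lengths
        l0 = h_field(0.5 * (xs[:-1] + xs[1:]))  # unloaded lengths at midpoints
        t = k * (l - l0)                        # member tensions
        return t[1:] - t[:-1]                   # nodal force balance, cf. (5.20)

    xi = np.linspace(0.0, length, n_nodes)[1:-1].copy()
    for _ in range(it_max):
        F = residual(xi)
        if np.linalg.norm(F) < tol:
            break
        # finite difference Jacobian in place of the consistent tangent (5.24)
        J = np.empty((xi.size, xi.size))
        for j in range(xi.size):
            dxj = np.zeros_like(xi)
            dxj[j] = 1e-7
            J[:, j] = (residual(xi + dxj) - F) / 1e-7
        xi = xi + np.linalg.solve(J, -F)        # Newton update, (5.22)-(5.23)
    return np.concatenate(([0.0], xi, [length]))
```

At equilibrium all member tensions are equal, so the converged spacing follows the ideal element length field: an increasing field yields monotonically increasing element lengths.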
Although this strategy may seem simplistic, we found other strategies for determining the number of boundary nodes, such as using ratios of the circumference to the interior area together with the error along the circumference relative to the interior, to be susceptible to oscillations. As stated before, the boundary nodes remain fixed once they have been placed along the boundary, which reduces the size of the Newton system when solving for the truss equilibrium. Consequently, an increase in the number of boundary nodes X∂Ω results in a decrease in the number of interior nodes XΩ and vice versa, as we want to keep the total number of nodes fixed. To reduce the computational effort, we use the nodes that describe the ideal element length field h̆{k} as an initial guess for our interior nodes. To keep the total number of nodes constant, we need to either remove nodes from, or add nodes to, those nodes used in representing h̆{k}. We do so by ranking the nodes and elements according to their error densities. We remove nodes by starting with nodes with smaller error densities, and add nodes by introducing nodes at the centroids of elements with higher element error densities.

5.5.2 Mapping the error field between candidate shape designs

The ideal element length field h̆{k} for the kᵗʰ iteration is obtained by mapping the computed ideal element length field after the analysis of the (k − 1)ᵗʰ iteration to the candidate shape design for the kᵗʰ iteration. We achieve this by merely mapping the nodal positions, without requiring connectivity information, using radial basis functions (RBF). We therefore have total flexibility on whether we want to use or discard the nodal connectivity of the previous geometry. In this study we pay the computational penalty of re-triangulation at every iteration to keep our strategy unsophisticated, while allowing for large shape changes.
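The nodal relocation idea can be sketched with a simple RBF interpolant of the boundary displacements, evaluated at every node, so that a nodal field such as the ideal element length simply rides along with its nodes. The Gaussian basis and unit shape parameter below are our own assumptions for illustration, not necessarily the basis used in the Appendix.

```python
import numpy as np

def rbf_map(nodes, bnd_old, bnd_new, eps=1.0):
    """Relocate mesh nodes from an old to a new shape geometry (sketch).

    The known boundary displacements bnd_new - bnd_old are interpolated
    with Gaussian radial basis functions; no connectivity is required.
    """
    phi = lambda r: np.exp(-(eps * r) ** 2)  # Gaussian basis (an assumption)
    # interpolation system on the boundary nodes, one RHS per coordinate
    d = np.linalg.norm(bnd_old[:, None, :] - bnd_old[None, :, :], axis=2)
    w = np.linalg.solve(phi(d), bnd_new - bnd_old)
    # evaluate the interpolated displacement field at all nodes
    d_all = np.linalg.norm(nodes[:, None, :] - bnd_old[None, :, :], axis=2)
    return nodes + phi(d_all) @ w
```

By construction the boundary nodes land exactly on the new boundary, while interior nodes follow smoothly.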
The details of the mapping strategy using radial basis functions are outlined in the Appendix.

5.6 Sensitivity analysis

Recall that the cost function F(u(X(x))) is an explicit function of the nodal displacements. Specifically, in all the examples herein, the cost function F is the nodal displacement at the point where a point load F is applied, expressed as

    F(u(x)) = u_F(x).   (5.25)

The displacement field u(x) depends on the discretized geometrical domain X, which is obtained by solving for the nodal positions of a truss structure at equilibrium. The ideal element lengths h̆{k} of the truss structure are obtained from the error analysis discussed in Section 5.4. The sensitivity of the displacement u_F w.r.t. the control variables x is then obtained by computing

    du_F/dx = (du_F/dX)(dX/dx).   (5.26)

The term du_F/dX is obtained by direct differentiation of the finite element equilibrium equations Ku = f, given by

    (dK/dX) u + K (du/dX) = df/dX.   (5.27)

For the fixed applied external loads f we restrict ourselves to in this study, df/dX = 0, and (5.27) reduces to

    K (du/dX) = −(dK/dX) u.   (5.28)

We compute dK/dX by direct differentiation of the analytical stiffness matrices for the linear strain triangular elements [57, 66], which allows us to solve for du/dX, and to then obtain du_F/dX. The sensitivities of the nodal coordinates X w.r.t. the control variables x, dX/dx, are obtained by differentiating the truss structure equilibrium equations from the mesh generation in Section 5.5. Recall that we partitioned X into the interior nodes XΩ and boundary nodes X∂Ω. Also recollect that the boundary nodes X∂Ω are seeded according to the ideal element length field h̆{k} along the geometrical boundary ∂Ω, and that they remain fixed during the mesh generation process. Hence, the equilibrium of the truss structure FΩ only depends implicitly on the interior nodes XΩ.
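The direct differentiation of (5.27)-(5.28) can be illustrated on a hypothetical two-element axial bar whose interior node position acts as the design variable. For springs in series the tip compliance is f·L_total/EA, independent of where the interior node sits, so the computed tip sensitivity should vanish exactly, which makes this sketch self-checking. All names and the example geometry are our own.

```python
import numpy as np

def tip_disp_sensitivity(L1, EA=1.0, f=1.0, L_total=2.0):
    """Direct differentiation of Ku = f for a two-element bar (sketch).

    Element lengths are L1 and L_total - L1, so the design variable L1
    moves the interior node. Returns u_F and du_F/dL1 via (5.28):
    K du/dX = -(dK/dX) u.
    """
    L2 = L_total - L1
    k1, k2 = EA / L1, EA / L2
    # free DOFs: interior node and loaded tip (root clamped)
    K = np.array([[k1 + k2, -k2],
                  [-k2,      k2]])
    u = np.linalg.solve(K, np.array([0.0, f]))  # load f at the tip
    # dK/dL1: note dk1/dL1 = -EA/L1^2 and dk2/dL1 = +EA/L2^2 (dL2/dL1 = -1)
    dk1, dk2 = -EA / L1**2, EA / L2**2
    dK = np.array([[dk1 + dk2, -dk2],
                   [-dk2,       dk2]])
    du = np.linalg.solve(K, -dK @ u)            # direct differentiation (5.28)
    return u[1], du[1]
```

With the default data the tip displacement is f·L_total/EA = 2 and its sensitivity w.r.t. L1 is zero, as the series-spring argument predicts.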
Consider the dependency of the equilibrium equations FΩ on x:

    FΩ( l(XΩ(x), X∂Ω(h̆{k}(x))), l₀(h̆{k}(x), XΩ(x), X∂Ω(x)) ) = 0.   (5.29)

The equilibrium equations FΩ depend on the deformed lengths l(XΩ, X∂Ω) and undeformed lengths l₀(h̆{k}(x), XΩ(x), X∂Ω(x)) of the truss members. The truss member lengths l(XΩ, X∂Ω) in turn depend on the interior nodes XΩ and boundary nodes X∂Ω. The undeformed lengths l₀(h̆{k}, XΩ, X∂Ω) are evaluated at the midpoints of the truss members, which depend on the interior nodes XΩ and the boundary nodes X∂Ω. In addition, the ideal element length field h̆{k}(x) changes as a function of x due to the RBF mapping. By taking the derivative of (5.29) w.r.t. the control variables x, we obtain

    dFΩ/dx = (∂FΩ/∂l)(∂l/∂XΩ)(∂XΩ/∂x) + (∂FΩ/∂l)(∂l/∂X∂Ω)(∂X∂Ω/∂h̆{k})(∂h̆{k}/∂x)
             + (∂FΩ/∂l₀)(∂l₀/∂h̆{k})(∂h̆{k}/∂x) + (∂FΩ/∂l₀)(∂l₀/∂XΩ)(∂XΩ/∂x)
             + (∂FΩ/∂l₀)(∂l₀/∂X∂Ω)(∂X∂Ω/∂x) = 0,   (5.30)

to give

    [ (∂FΩ/∂l)(∂l/∂XΩ) + (∂FΩ/∂l₀)(∂l₀/∂XΩ) ] (∂XΩ/∂x)
        = − [ (∂FΩ/∂l)(∂l/∂X∂Ω)(∂X∂Ω/∂h̆{k}) + (∂FΩ/∂l₀)(∂l₀/∂h̆{k}) ] (∂h̆{k}/∂x)
          − (∂FΩ/∂l₀)(∂l₀/∂X∂Ω)(∂X∂Ω/∂x).   (5.31)

From (5.31) we may solve for ∂XΩ/∂x when (∂FΩ/∂l)(∂l/∂XΩ) + (∂FΩ/∂l₀)(∂l₀/∂XΩ) and the right-hand side of (5.31) are known. We compute the right-hand side with a finite difference perturbation, and recall that (∂FΩ/∂l)(∂l/∂XΩ) + (∂FΩ/∂l₀)(∂l₀/∂XΩ) is available from the Newton update, as noted in Section 5.5. Once we have solved for ∂XΩ/∂x, we obtain dX/dx as the union of ∂X∂Ω/∂x and ∂XΩ/∂x, where ∂X∂Ω/∂x is obtained numerically.

5.7 Numerical study

In our numerical studies, we will consider two remeshing strategies. Firstly, we use a remeshing strategy which we will refer to as uniform, which uses an ideal element length field that is spatially uniform, and that is also kept constant as the optimization iterations progress [66].
Secondly, we will use the newly developed remeshing strategy presented herein, which is characterized by a spatially varying ideal element length field that changes as the optimization iterations progress; we will refer to it as adaptive. The parameters used in the remeshing strategies, the finite element analyses and the optimization algorithm are as follows. For the adaptive remeshing strategy we set the mesh refinement parameter p in (5.17) to 5, while the minimum element length h_min is selected as 0.1 h{0}, unless otherwise stated. The material considered is an orthotropic Boron-Epoxy in a tape lay-out, i.e. the fibers are all aligned in a single direction, aligned with the global x-axis. In other words, we do not determine an optimal fiber orientation. We assume plane stress conditions and use classical laminate theory (CLT). The material properties used are a longitudinal Young's modulus of E₁ = 228 GPa, a transverse Young's modulus of E₂ = 145 GPa, and a shear modulus of G₁₂ = 48 GPa. The last independent parameter in CLT is the Poisson ratio ν₁₂ = 0.23, since ν₂₁ follows from the symmetry relation E₁ ν₂₁ = E₂ ν₁₂. The selected parameters for the gradient-only conservative sequential spherical approximation algorithm [65] are the curvature factor α = 2, initial curvature c{0} = 1, convergence tolerance ε = 10⁻⁴, and a maximum number of outer iterations k_max = 300. The Lagrange multiplier update step is selected as λ_s{k+1} = ∇_λᵀ L([x{k+1} λ{k}]). We also limit the maximum step size to 1, which was experimentally found to result in good convergence rates, but this value will in general of course strongly depend on the scaling of the problem. Before we proceed, we first validate the (semi) analytical sensitivities and study the convergence rates of the meshing strategies.

5.7.1 Gradient sensitivity comparison

We compare our analytical sensitivities to numerical sensitivities obtained with the forward finite difference method.
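The forward finite difference check used for this comparison can be sketched as follows; the function name and tolerances are illustrative, and in the thesis setting f would be the displacement u_F returned by a full FEA of the perturbed design.

```python
import numpy as np

def forward_diff_grad(f, x, h=1e-6):
    """Forward finite difference gradient (sketch) for cross-checking
    (semi) analytical sensitivities, one perturbation per variable."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    fx = f(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += h  # perturb a single control variable
        g[i] = (f(xp) - fx) / h
    return g
```

The truncation error of the forward difference is O(h), which is why the analytical and numerical columns of Table 5.1 agree only to about five significant digits.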
We compute the sensitivity of the displacement at the point of load application (u_F) w.r.t. the indicated control variables of the bow-tie structure [66] depicted in Figure 5.2. The control variables are linearly spaced along the top and bottom, with the relevant dimensions as indicated in Figure 5.2. Calculation of the finite difference values is done without Delaunay triangulation steps, to avoid the introduction of discontinuities (due to the addition or removal of nodes) in the finite difference sensitivity analysis. In Table 5.1, we tabulate the sensitivities w.r.t. the control variables for a spatially varying ideal element length field, which confirms that our computations are correct and accurate. The spatially varying ideal element length field was obtained after 10 remeshing update iterations using an initial undeformed truss length of h0 = 2. The numerical gradients are computed with a perturbation of 10⁻⁶.

[Figure 5.2: Bow-tie structure used to validate the (semi) analytical sensitivities and study the convergence behavior of the remeshing strategies. Control points 1-8, the relevant dimensions and the load F with displacement u_F are indicated.]

Table 5.1: Analytical and forward finite difference sensitivities calculated for the bow-tie structure depicted in Figure 5.2.

Point   Analytical (×10⁻³)   Numerical (×10⁻³)
1       -0.043653            -0.043653
2       -1.044313            -1.044312
3       -0.045178            -0.045178
4       -0.211877            -0.211875
5        0.044330             0.044330
6        1.073200             1.073201
7        0.028182             0.028182
8        0.136393             0.136392

[Figure 5.3: (a) System degrees of freedom (SDOF) and (b) global error η{k} for the mesh convergence study on the bow-tie structure for initial uniform element lengths h0 = {1.5, 1, 0.8}.]

5.7.2 Convergence rates

Again consider the bow-tie structure depicted in Figure 5.2.
The structure is meshed using three initial undeformed truss lengths h0 = {1.5, 1, 0.8}. The convergence criterion for the error indicator is given by

    (η{k} − η{k−1}) / η{k} < 10⁻⁶.   (5.32)

Results are presented in Figure 5.3, with the system degrees of freedom (SDOF) depicted in Figure 5.3(a) and the global error indicator depicted in Figure 5.3(b). From Figure 5.3(a) it is clear that the system degrees of freedom remain practically constant as the iterations progress. Figure 5.3(b) reveals that the final global error η{max(k)} is lower than the global error of the initial uniform mesh η{1}, for each of the three initial undeformed truss length choices. We also observe that the required number of refinement iterations reduces as the system degrees of freedom increase. Note that the global errors η{k} for the various undeformed truss lengths cannot be compared with each other, since each uses a different σ̆ field. For each of the initial uniform element lengths h0 = {1.5, 1, 0.8}, we depict the initial meshes in Figure 5.4(a)-(c), the final meshes in Figure 5.4(d)-(f) and the ideal element length fields in Figure 5.4(g)-(i). We depict the convergence of the displacements of the bow-tie structure in Figure 5.5. To do so, we approximate the analytical solution u*_F using Richardson's extrapolation method [29], since an analytical solution is not available for this problem. For Richardson's extrapolation method we have used the initial uniform element lengths h0 = {1.6, 0.8, 0.4} to compute uniform meshes, with h0 representative of the average element length.
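Since the element lengths {1.6, 0.8, 0.4} are halved at each refinement, Richardson's extrapolation reduces to fitting u(h) = u* + C·hᑫ through three solutions. The following is an illustrative sketch with our own names; it assumes exactly three solutions on successively halved meshes.

```python
import numpy as np

def richardson_extrapolate(u):
    """Estimate the mesh-converged value u* from three solutions (sketch).

    u = [u1, u2, u3] are computed on meshes whose element length h is
    halved each time; fits u(h) = u* + C * h**q.
    """
    u1, u2, u3 = u
    q = np.log((u1 - u2) / (u2 - u3)) / np.log(2.0)  # observed convergence order
    u_star = u3 + (u3 - u2) / (2.0 ** q - 1.0)       # extrapolated limit
    return u_star, q
```

For synthetic data with u* = 5, C = 2 and q = 2 on h = 1, 0.5, 0.25 (i.e. u = [7, 5.5, 5.125]), the sketch recovers both the limit and the order exactly.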
To approximate the asymptotic convergence rate, we fit a straight line in a least squares sense through the four data points with the highest system degrees of freedom. As shown, the error of the adaptive mesh is less than that of the uniform mesh for a given number of degrees of freedom, while the convergence rate is superior.

[Figure 5.4: Convergence study showing (a)-(c) the initial mesh, (d)-(f) the final mesh and (g)-(i) the final ideal element length field of the bow-tie structure for various initial uniform element lengths h0 = {1.5, 1, 0.8}.]

[Figure 5.5: Approximated displacement convergence rate |u_F − u*_F|/u*_F versus SDOF for the bow-tie structure problem using the uniform and adaptive mesh generators.]

5.7.3 Cantilever beam

Next, we progress to the equality constrained design of the orthotropic cantilever beam depicted in Figure 5.6. The structure has a predefined length of 30 mm and a thickness of 1 mm. A point load F of 10 N acts at the bottom right corner of the structure. The boundary of the structure is controlled by the 13 control points or design variables x, which can only move vertically. The boundary is linearly interpolated between the control points, and the control points are linearly spaced along the top of the cantilever beam.

[Figure 5.6: Initial geometry of the cantilever beam using 13 control points x.]
We minimize the displacement u_F at the point of load application, subject to an equality constraint on volume, expressed as V(x) = V₀, with V₀ = 150 mm³ the prescribed volume of the structure. Convergence histories for the value of the Lagrangian L(x{k}, λ{k}), the constraint function |g(x{k})|, the Lagrange multiplier λ{k} and the system degrees of freedom are depicted in Figure 5.7(a)-(d) for the uniform and adaptive mesh generators. (We have used an initial ideal element length of 1.05 for the uniform mesh generator and 1 for the adaptive mesh generator, to get a comparable number of system degrees of freedom for the converged shapes.) The required number of iterations and the final designs are comparable. The system degrees of freedom of the uniform remeshing strategy change as the geometry varies, since the defined geometrical domain changes while the uniform mesh generator maintains a constant element length. The number of system degrees of freedom of our adaptive remeshing strategy is roughly constant; the small variations present are the result of nodes being eliminated during convergence of the mesh generator. The interesting aspects of this example are depicted in Figure 5.8. The initial and final designs are respectively depicted in Figure 5.8(a)-(b) and Figure 5.8(c)-(d), with the final ideal element length fields depicted in Figure 5.8(e)-(f), for the uniform and adaptive mesh
generators respectively - note the superiority of the latter mesh. The converged designs depicted in Figure 5.8(c)-(d) reflect the parabolic shape known to be the analytical solution of the equivalent beam problem.

[Figure 5.7: The cantilever beam convergence histories of (a) the Lagrangian L(x{k}, λ{k}), (b) the absolute value of the constraint function g(x{k}), (c) the Lagrange multiplier λ{k}, and (d) the system degrees of freedom (SDOF) for a uniform and adapted mesh using initial ideal element lengths h0 of respectively 1.05 and 1.]

5.7.4 Michell structure

Next, we consider the equality constrained design of the orthotropic Michell-like structure depicted in Figure 5.9. The structure also has a predefined length of 30 mm and a thickness of 1 mm, and a point load F of 10 N acts at the center bottom of the structure. The boundary of the structure is controlled by the 16 control points x, which can only move vertically. Again, the boundary is linearly interpolated between the control points, and the control points are linearly spaced along the top and the bottom of the structure. We minimize the displacement u_F at the point of load application, subject to an equality constraint on volume, expressed as V(x) = V₀, with V₀ = 75 mm³ the prescribed volume of the structure. Convergence histories for the value of the Lagrangian L(x{k}, λ{k}), the constraint function |g(x{k})|, the Lagrange multiplier λ{k} and the system degrees of freedom are depicted in Figure 5.10(a)-(d) for the uniform and adaptive mesh generators. We have used an initial ideal element length of 0.7 for the uniform mesh generator and 0.8 for the
adaptive mesh generator.

[Figure 5.8: Initial (a)-(b) and final (c)-(d) designs of the cantilever beam with the associated final ideal element length fields (e)-(f), for a uniform and adapted mesh using initial ideal element lengths h0 of respectively 1.05 and 1.]

[Figure 5.9: Initial geometry of half the Michell-like structure using 16 control points x (upper control variables x2, lower control variables x1), with the load F and displacement u_F indicated.]

[Figure 5.10: The Michell structure convergence histories of (a) the Lagrangian L(x{k}, λ{k}), (b) the absolute value of the constraint function g(x{k}), (c) the Lagrange multiplier λ{k}, and (d) the system degrees of freedom (SDOF) for a uniform and adaptive mesh using initial ideal element lengths h0 of respectively 0.7 and 0.8.]

The required number of iterations for the uniform mesh generator is roughly half that required for the adaptive mesh generator, but the latter mesh is superior. Again, the system degrees of freedom of the uniform remeshing strategy change as the geometry varies, while the initial and final system degrees of freedom of the adaptive remeshing strategy remain almost constant. The initial and final designs are respectively depicted in Figure 5.11(a)-(b) and Figure 5.11(c)-(d), with the final ideal element length fields depicted in Figure 5.11(e)-(f), for the uniform and adaptive mesh generators.
While different, the converged designs depicted in Figure 5.11(c)-(d) are similar and compare well with results obtained during previous studies [20, 66].

[Figure 5.11: Initial (a)-(b) and final (c)-(d) designs of the Michell structure with the associated final ideal element length fields (e)-(f), for a uniform and adapted mesh using initial ideal element lengths h0 of respectively 0.7 and 0.8.]

5.7.5 Spanner design

Finally, we consider the shape design of the full spanner problem presented in Figure 5.12, which is subjected to multiple load cases. The objective is to minimize ½(u_FA − u_FB), with u_FA and u_FB the vertical displacements at the point of load application for the two independent load cases F_A and F_B respectively. The spanner is subjected to an equality constraint on volume, expressed as V(x) = V₀, with V₀ = 70 mm³ the prescribed volume of the structure. The upper and lower boundaries of the geometry are described using 11 control points each. In addition, the control points are linearly spaced along the length of the spanner. The structure has a predefined length of 24 mm and a thickness of 1 mm. The magnitude of the point loads F_A and F_B is 1 N each. Symmetry is not enforced; deviations from symmetry may be used to qualitatively evaluate the obtained designs, since this problem should result in a symmetric geometry.

[Figure 5.12: Initial geometry and loads of the full spanner problem using 22 control points x.]

The ideal element length field h̆{k} for the mesh is obtained by nodal averaging of the ideal element length fields obtained from the different load cases.
Convergence histories for the value of the Lagrangian L(x^{k}, λ^{k}), the constraint function |g(x^{k})|, the Lagrange multiplier λ^{k} and the system degrees of freedom are depicted in Figure 5.13(a)-(d) for the uniform and adaptive mesh generators. This time, the required number of iterations for the uniform mesh generator is slightly more than that required for the adaptive mesh generator. Again, the system degrees of freedom of the uniform remeshing strategy change as the geometry varies, while the system degrees of freedom of our adaptive remeshing strategy remain roughly constant after an unstable initial 80 iterations. The results depicted in Figure 5.14 compare well with results obtained in previous studies [25, 63]. Due to changes in the SDOF, as depicted in Figure 5.13(d), some oscillatory behavior is observed in the Lagrangian L(x^{k}, λ^{k}) within the first 70 iterations; see Figure 5.13(a).

5.8 Conclusions

In this study we successfully extended our uniform remeshing strategy [66] to incorporate the well-known Zienkiewicz and Zhu global error indicator and refinement strategy. As demonstrated on a bow-tie structure, we significantly improve on the quality of the results obtained with uniform meshes. In addition, we showed how gradient-only optimization allows us to efficiently incorporate error indicators and refinement strategies, since we require only a single finite element analysis followed by an a posteriori error computation for each candidate shape design, without sacrificing optimization robustness. We demonstrated our strategy on three equality constrained example problems.
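A recovery-based indicator of the Zienkiewicz and Zhu type can be sketched in one dimension. This is a minimal illustration, assuming linear elements on an interval, and is not the two-dimensional implementation used in this study: the finite element gradient is piecewise constant, a recovered gradient is built by nodal averaging, and the element indicator is the L2 norm of the difference:

```python
import numpy as np

# Hedged 1-D sketch of a recovery-based (Zienkiewicz-Zhu type) indicator.
# For linear elements the FE gradient is piecewise constant per element;
# the "recovered" nodal gradient is the average of adjacent element values.
def zz_indicator(x, u):
    h = np.diff(x)
    g = np.diff(u) / h                       # element gradients (constant)
    g_node = np.empty(len(x))                # recovered nodal gradients
    g_node[0], g_node[-1] = g[0], g[-1]
    g_node[1:-1] = 0.5 * (g[:-1] + g[1:])
    # Element L2 error of (recovered - FE) gradient: the difference is
    # linear on each element, so the integral of its square is exact:
    a = g_node[:-1] - g                      # difference at left node
    b = g_node[1:] - g                       # difference at right node
    e2 = h * (a * a + a * b + b * b) / 3.0
    return np.sqrt(e2)                       # one indicator per element

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
u = x**2                                     # nodal values of a smooth field
e = zz_indicator(x, u)
print(e)
```

The per-element indicators would then drive the ideal element length field, shrinking elements where the indicated error is large.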
Figure 5.13: The full spanner convergence histories of (a) the Lagrangian L(x^{k}, λ^{k}), (b) absolute value of the constraint function g(x^{k}), (c) Lagrange multiplier λ^{k}, and (d) system degrees of freedom (SDOF) for a uniform and adaptive mesh using initial ideal element lengths h0 of respectively 0.7 and 1.

Figure 5.14: Initial (a)-(b) and final (c)-(d) designs of the full spanner with the associated final ideal element length field (e)-(f), for a uniform and adapted mesh using initial ideal element lengths h0 of respectively 0.7 and 1.

CHAPTER 6

Conclusion and recommendations

This chapter summarises the findings of this study and makes recommendations for further research and investigation. We have applied gradient-based optimization techniques to shape design problems. In doing so, we have created a novel unstructured remeshing shape optimization environment, based on a truss structure analogy. The remeshing environment is quadratically convergent in solving for the equilibrium positions of the truss structure. As may be expected, the objective function value in general decreases as the number of control points is increased. This is a direct result of the number of possible design configurations increasing. However, due to unstructured remeshing, non-physical step or jump discontinuities may be introduced into the optimization problem.
These discontinuities arise when (partial) differential equations are discretized using non-constant methods: the resulting functions become step discontinuous, but gradient information remains computable everywhere, since every point has an associated discretization for which (semi-)analytical sensitivities can be calculated. Although the magnitude of these discontinuities decreases with mesh refinement, their number increases. For gradient-based algorithms, the severity of the anomaly is alleviated as the mesh is refined. Polynomial refinement, e.g. linear strain triangles, further decreases the magnitude of the discontinuities. Although the associated local minima may be overcome efficiently and effectively using a simple multi-start strategy, the computational efficiency and robustness of multi-start strategies may be improved upon using gradient-only optimization strategies. To illustrate, we proposed gradient-only implementations of the BFGS algorithm and an SAO algorithm for discontinuous problems, and applied these algorithms to a selection of problems of practical interest, both unconstrained and constrained. These are the design of a heat exchanger fin, the shape design of an orthotropic Michell-like structure, and a material identification study using a modified Voce law, all discretized using non-constant methods. In each instance, the gradient-only algorithms found solutions superior to those of the classical methods that use both function and gradient information. As opposed to surrogate methods based on design of experiment (DoE) techniques, which scale poorly, gradient-only algorithms based on classical optimization algorithms that scale well may also be expected to scale well (provided the gradient computations scale well); this may well become an important application of gradient-only methods. Another envisaged application of gradient-only algorithms is any problem for which gradient computations are inexpensive.
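The gradient-only idea can be illustrated with a one-dimensional sketch: a bisection search that consults only the sign of the directional derivative, never a function value. The test function, its per-branch derivative and the tolerance are illustrative assumptions, not the algorithms proposed in this study:

```python
# Hedged sketch of a gradient-only bisection line search. Only the SIGN of
# the derivative is used, so step discontinuities that leave the per-branch
# gradient intact (e.g. f(x) = x**2 + 0.3*floor(4x)) cannot trap the search
# in a non-physical local minimum created by the jumps.
def dfdx(x):
    # Per-branch analytical derivative of f(x) = x**2 + 0.3*floor(4x):
    # the floor term is piecewise constant, so each branch differentiates
    # to 2x, which is computable everywhere.
    return 2.0 * x

def gradient_only_line_search(dfdx, a, b, tol=1e-8):
    """Locate a descent-to-ascent sign change of the derivative in [a, b]."""
    assert dfdx(a) < 0 < dfdx(b), "derivative must change sign on [a, b]"
    while b - a > tol:
        m = 0.5 * (a + b)
        if dfdx(m) < 0:
            a = m          # still descending: move the left end up
        else:
            b = m          # ascending: move the right end down
    return 0.5 * (a + b)

x_star = gradient_only_line_search(dfdx, -1.0, 2.0)
print(x_star)  # near 0, the minimizer of the underlying smooth problem
```

A function-value-based line search on the same f would have to contend with the artificial minima the jumps create; the sign-based search never sees them.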
In addition, we successfully extended the uniform remeshing strategy presented in Chapter 2 to incorporate the well-known Zienkiewicz and Zhu global error indicator and refinement strategy. This significantly improves on the quality of the results obtained with uniform meshes. We showed how gradient-only optimization allows us to incorporate error indicators and refinement strategies efficiently, since we require only a single finite element analysis followed by an a posteriori error computation for each candidate shape design, without sacrificing optimization robustness. The implication of our approach is that variable discretization strategies, which are so important in numerical discretization methods, may be used in combination with efficient local optimization algorithms, notwithstanding the fact that these strategies themselves introduce step discontinuities. Among others, future endeavors should in our opinion concentrate on the inclusion of constraint functions, in particular step discontinuous constraint functions, e.g. stress constraints, as well as on the reduction of the required computational effort. An investigation into whether linearizing the error indicator field improves the convergence of the gradient-only optimization strategies should in our opinion also be conducted.

Bibliography

[1] Allaire, G, Jouve, F and Toader, A (February 2004), "Structural optimization using sensitivity analysis and a level-set method," Journal of Computational Physics, 194 (1), 363–393, ISSN 0021-9991.
[2] Avriel, M (2003), Nonlinear Programming: Analysis and Methods, Dover.
[3] Barthelemy, JF and Haftka, RT (1993), "Approximation concepts for optimum structural design - a review," Struct. Opt., 5, 129–144.
[4] Bazaraa, MS, Sherali, HD and Shetty, CM (1993), Nonlinear Programming, 2nd edition, Wiley, ISBN 0471557935.
[5] Bazaraa, MS, Sherali, HD and Shetty, CM (2006), Nonlinear Programming, John Wiley and Sons, ISBN 0471486000.
[6] Belegundu, AD and Rajan, SD (1988), "A shape optimization approach based on natural design variables and shape functions," Computer Methods in Applied Mechanics and Engineering, 66 (1), 87–106, ISSN 0045-7825.
[7] Belitz, P and Bewley, T (2007), "Efficient derivative-free optimization," in Proc. 46th IEEE Conference on Decision and Control, pages 5358–5363.
[8] Berberian, SK (1994), A first course in real analysis, Springer, ISBN 0387942173.
[9] Brandstätter, BR, Ring, W, Magele, C and Richter, KR (1998), "Shape design with great geometrical deformations using continuously moving finite element nodes," IEEE Transactions on Magnetics, 34 (5), 2877–2880.
[10] Bugeda, G and Oñate, E (February 1994), "A methodology for adaptive mesh refinement in optimum shape design problems," Computing Systems in Engineering, 5 (1), 91–102.
[11] Bürmen, A and Tuma, T (2009), "Unconstrained derivative-free optimization by successive approximation," Journal of Computational and Applied Mathematics, 223 (1), 62–74, ISSN 0377-0427.
[12] Clarke, FH (1989), Methods of dynamic and nonsmooth optimization, number 57 in CBMS-NSF Regional Conference Series in Applied Mathematics, Capital City Press, Montpelier, Vermont, USA, ISBN 0898712416.
[13] Clarke, FH (1990), Optimization and nonsmooth analysis, number 5 in Canadian Mathematical Society Series in Mathematics, Wiley-Interscience, New York, NY, USA, ISBN 0898712564.
[14] Conn, A, Scheinberg, K and Toint, P (October 1997), "Recent progress in unconstrained nonlinear optimization without derivatives," Mathematical Programming, Series B, 79 (1-3), 397–414.
[15] Cook, R, Malkus, D, Plesha, M and Witt, R (2002), Concepts and Applications of Finite Element Analysis, Wiley, New York.
[16] Ding, Y (1986), "Shape optimization of structures: a literature survey," Computers & Structures, 24 (6), 985–1004.
[17] Dutta, J (December 2005), "Generalized derivatives and nonsmooth optimization, a finite dimensional tour," TOP, 13 (2), 185–279.
[18] Edelsbrunner, H (2001), Geometry and topology for mesh generation, Cambridge University Press, ISBN 0521793092.
[19] Forrester, AI and Keane, AJ (2009), "Recent advances in surrogate-based optimization," Progress in Aerospace Sciences, 45 (1-3), 50–79, ISSN 0376-0421.
[20] Garcia, MJ and Gonzalez, CA (2004), "Shape optimisation of continuum structures via evolution strategies and fixed grid finite element analysis," Structural and Multidisciplinary Optimization, 26 (1), 92–98.
[21] Gould, N, Orban, D and Toint, P (2005), "Numerical methods for large-scale nonlinear optimization," Acta Numerica, 14, 299–361.
[22] Groenwold, AA, Etman, LFP, Snyman, JA and Rooda, JE (2007), "Incomplete series expansion for function approximation," Structural and Multidisciplinary Optimization.
[23] Haftka, RT and Grandhi, RV (August 1986), "Structural shape optimization - a survey," Computer Methods in Applied Mechanics and Engineering, 57 (1), 91–106.
[24] Haftka, RT and Gurdal, Z (1991), Elements of structural optimization, volume 11 of Solid Mechanics and its Applications, 3rd edition, Kluwer Academic Publishers, Dordrecht, the Netherlands.
[25] Herskovits, J, Dias, G, Santos, G and Soares, CM (October 2000), "Shape structural optimization with an interior point nonlinear programming algorithm," Structural and Multidisciplinary Optimization, 20 (2), 107–115.
[26] Hinton, E, Özakca, M and Rao, N (1991), "An integrated approach to structural shape optimization of linearly elastic structures. Part II: Shape definition and adaptivity," Computing Systems in Engineering, 2 (1), 41–56, ISSN 0956-0521, doi:10.1016/0956-0521(91)90038-7.
[27] Hinton, E, Rao, NVR and Özakca, M (1991), "An integrated approach to structural shape optimization of linearly elastic structures.
Part I: General methodology," Computing Systems in Engineering, 2 (1), 27–39.
[28] Imam, MH (May 1982), "Three-dimensional shape optimization," International Journal for Numerical Methods in Engineering, 18 (5), 661–673.
[29] Joyce, DC (1971), "Survey of extrapolation processes in numerical analysis," SIAM Review, 13 (4), 435–490, ISSN 0036-1445.
[30] Kikuchi, N (April 1986), "Adaptive grid-design methods for finite element analysis," Computer Methods in Applied Mechanics and Engineering, 55 (1-2), 129–160.
[31] Kocks, UF (1976), "Laws for work-hardening and low-temperature creep," J Eng Mater Technol Trans ASME, 98 Ser H (1), 76–85.
[32] Kodiyalam, S and Thanedar, PB (1993), "Some practical aspects of shape optimization and its influence on intermediate mesh refinement," Finite Elements in Analysis and Design, 15 (2), 125–133.
[33] Kok, S, Beaudoin, AJ and Tortorelli, DA (April 2002), "On the development of stage IV hardening using a model based on the mechanical threshold," Acta Materialia, 50 (7), 1653–1667.
[34] Laporte, E and Le Tallec, P (2002), Numerical Methods in Sensitivity Analysis and Shape Optimization, Modeling and Simulation in Science, Engineering and Technology, Birkhäuser.
[35] Li, Q, Steven, GP, Querin, OM and Xie, YM (November 1999), "Evolutionary shape optimization for stress minimization," Mechanics Research Communications, 26 (6), 657–664, ISSN 0093-6413.
[36] Liu, DC and Nocedal, J (August 1989), "On the limited memory BFGS method for large scale optimization," Mathematical Programming, 45 (1-3), 503–528.
[37] Martínez, R and Samartín, A (1991), "Two-dimensional mesh optimization in the finite element method," Computers & Structures, 40 (5), 1169–1175, ISSN 0045-7949.
[38] Mattheck, C and Burkhardt, S (May 1990), "A new method of structural shape optimization based on biological growth," International Journal of Fatigue, 12 (3), 185–190, ISSN 0142-1123.
[39] Van Miegroet, L, Moës, N, Fleury, C and Duysinx, P (May 2005), "Generalized shape optimization based on the level set method," in 6th World Congress of Structural and Multidisciplinary Optimization, pages 1–10.
[40] Olhoff, N, Rasmussen, J and Lund, E (1993), "A method of exact numerical differentiation for error elimination in finite element based semi-analytical shape sensitivity analysis," Mechanics of Structures and Machines, 21, 1–66.
[41] Peressini, AL, Sullivan, FE and Uhl, JJ (1988), The mathematics of nonlinear programming, Springer, ISBN 0387966145.
[42] Persson, PO and Strang, G (2004), "A simple mesh generator in MATLAB," SIAM Review, 46 (2), 329–345.
[43] Potra, FA and Shi, Y (June 1995), "Efficient line search algorithm for unconstrained optimization," Journal of Optimization Theory and Applications, 85 (3), 677–704.
[44] Quapp, W (May 1996), "A gradient-only algorithm for tracing a reaction path uphill to the saddle of a potential energy surface," Chemical Physics Letters, 253 (3-4), 286–292.
[45] Rao, SS (2009), Engineering Optimization: Theory and Practice, John Wiley and Sons.
[46] Rardin, RL (August 1997), Optimization in Operations Research, Prentice Hall, ISBN 0023984155.
[47] Sacks, J, Welch, WJ, Mitchell, TJ and Wynn, HP (November 1989), "Design and analysis of computer experiments," Statistical Science, 4 (4), 409–423, ISSN 0883-4237.
[48] Schleupen, A, Maute, K and Ramm, E (July 2000), "Adaptive FE-procedures in shape optimization," Structural and Multidisciplinary Optimization, 19 (4), 282–302.
[49] Shor, NZ, Kiwiel, KC and Ruszczyński, A (1985), Minimization methods for non-differentiable functions, Springer-Verlag New York, Inc., New York, NY, USA.
[50] Sienz, J and Hinton, E (July 1997), “Reliable structural optimization with error estimation, adaptivity and robust sensitivity analysis,” Computers & Structures, 64 (1-4), 31–63, ISSN 0045-7949. [51] Simpson, T, Toropov, V, Balabanov, V and Viana, F (September 2008), “Design and analysis of computer experiments in multidisciplinary design optimization: A review of how far we have come - or not,” in “Proc. 12th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference,” Victoria, British Columbia, Canada. [52] Snyman, J and Hay, AM (July 2001), “The spherical quadratic steepest descent (SQSD) method for unconstrained minimization with no explicit line searches,” Computers and Mathematics with Applications, 42 (1-2), 169–178. [53] Snyman, JA (December 1982), “A new and dynamic method for unconstrained minimization,” Applied Mathematical Modelling, 6 (6), 449–462. [54] Snyman, JA (2005), “A gradient-only line search method for the conjugate gradient method applied to constrained optimization problems with severe noise in the objective function,” International Journal for Numerical Methods in Engineering, 62 (1), 72–82. [55] Snyman, JA (2005), Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms, Applied Optimization, Vol. 97, 2nd edition, Springer-Verlag New York, Inc. [56] Snyman, JA and Hay, AM (December 2002), “The Dynamic-Q optimization method: An alternative to SQP?” Computers & Mathematics with Applications, 44 (12), 1589–1598. [57] Subramanian, G and Bose, JC (1982), “Convenient generation of stiffness matrices for the family of plane triangular elements,” Computers & Structures, 15 (1), 85–89. [58] Svanberg, K (2002), “A class of globally convergent optimization methods based on conservative convex separable approximations,” SIAM Journal on Optimization, 12 (2), 555–573. 
[59] Toropov, VV (1989), "Simulation approach to structural optimization," Structural Optimization, 1, 37–46.
[60] Turcke, D and McNeice, GM (May 1974), "Guidelines for selecting finite element grids based on an optimization study," Computers & Structures, 4 (3), 499–519, ISSN 0045-7949.
[61] Van Keulen, F, Polynkine, AA and Toropov, VV (1997), "Shape optimization with adaptive mesh refinement: Target error selection strategies," Eng Optim, 28 (1-2), 95–125.
[62] Voce, E (1955), "A practical strain-hardening function," Metallurgia, 51, 219–226.
[63] Wall, WA, Frenzel, MA and Cyron, C (June 2008), "Isogeometric structural shape optimization," Computer Methods in Applied Mechanics and Engineering, 197 (33-40), 2976–2988, ISSN 0045-7825.
[64] Wallis, J (1685), A treatise of algebra, both historical and practical, printed by J. Playford for R. Davis, London.
[65] Wilke, DN, Kok, S and Groenwold, AA (2010), "The application of gradient-only optimization methods for problems discretized using non-constant methods," Structural and Multidisciplinary Optimization, 40 (1), 433–451.
[66] Wilke, DN, Kok, S and Groenwold, AA (2006), "A quadratically convergent unstructured remeshing strategy for shape optimization," International Journal for Numerical Methods in Engineering, 65 (1), 1–17.
[67] Xie, Y and Steven, G (December 1993), "A simple evolutionary procedure for structural optimization," Computers & Structures, 49 (5), 885–896, ISSN 0045-7949.
[68] Zhang, L (2005), "A globally convergent BFGS method for nonconvex minimization without line searches," Optimization Methods and Software, 20, 737–747.
[69] Zienkiewicz, OC (2006), "The background of error estimation and adaptivity in finite element computations," Computer Methods in Applied Mechanics and Engineering, 195 (4-6), 207–213.
[70] Zienkiewicz, OC, Taylor, RL and Zhu, JZ (2005), The finite element method, Butterworth-Heinemann, ISBN 0750663200.
[71] Zienkiewicz, OC and Zhu, JZ (1987), "A simple error estimator and adaptive procedure for practical engineering analysis," International Journal for Numerical Methods in Engineering, 24 (2), 337–357.

Appendix

Gradient-only optimization requires the computation of accurate gradients. Various approaches are at our disposal to do this, e.g. direct differentiation and automatic differentiation. Finite difference schemes should be used with more caution; erroneous sign changes in particular are undesirable. The use of analytical or semi-analytical gradients renders gradient-only optimization computationally competitive. We do not explicitly give the sensitivities for the heat transfer problem; it is the easiest problem, and the approach is similar to the developments presented in Chapter 2.

Analytical sensitivities for the material identification study

In this study we compute the analytical sensitivities by direct differentiation of (3.44) and (3.45) with respect to the design variables, i.e. $\theta_0$, $c$, $\sigma_{ys}$, $\sigma_{40}$ and $\sigma_{y0}$. We start with $\theta_0$. Since only (3.45) depends on $\theta_0$, the sensitivity of $\sigma_y^{i+1}$ w.r.t. $\theta_0$ is given by

$$\frac{d\sigma_y^{i+1}}{d\theta_0} = \frac{d\sigma_y^{i}}{d\theta_0} + \left(1 - \frac{\sigma_y^{i}}{\sigma_{ys}} + \frac{\sigma_4^{i+1}}{\sigma_y^{i}}\right)\Delta_p^{i} + \theta_0\left(-\frac{1}{\sigma_{ys}}\frac{d\sigma_y^{i}}{d\theta_0} + \sigma_4^{i+1}\frac{d}{d\theta_0}\!\left(\frac{1}{\sigma_y^{i}}\right)\right)\Delta_p^{i}. \tag{6.1}$$

We note that (3.45) depends on (3.44); therefore both equations depend on $c$. In computing the sensitivity of $\sigma_y^{i+1}$ w.r.t. $c$, we obtain

$$\frac{d\sigma_y^{i+1}}{dc} = \frac{d\sigma_y^{i}}{dc} + \theta_0\left(-\frac{1}{\sigma_{ys}}\frac{d\sigma_y^{i}}{dc} + \frac{1}{\sigma_y^{i}}\frac{d\sigma_4^{i+1}}{dc} + \sigma_4^{i+1}\frac{d}{dc}\!\left(\frac{1}{\sigma_y^{i}}\right)\right)\Delta_p^{i}, \tag{6.2}$$

and

$$\frac{d\sigma_4^{i+1}}{dc} = \frac{d\sigma_4^{i}}{dc} + \left(\epsilon_p^{i+1} - \epsilon_p^{i}\right). \tag{6.3}$$

Since only (3.45) depends on $\sigma_{ys}$, the sensitivity of $\sigma_y^{i+1}$ w.r.t. $\sigma_{ys}$ is given by

$$\frac{d\sigma_y^{i+1}}{d\sigma_{ys}} = \frac{d\sigma_y^{i}}{d\sigma_{ys}} + \theta_0\left(-\frac{1}{\sigma_{ys}}\frac{d\sigma_y^{i}}{d\sigma_{ys}} - \sigma_y^{i}\frac{d}{d\sigma_{ys}}\!\left(\frac{1}{\sigma_{ys}}\right) + \sigma_4^{i+1}\frac{d}{d\sigma_{ys}}\!\left(\frac{1}{\sigma_y^{i}}\right)\right)\Delta_p^{i}. \tag{6.4}$$

Both (3.44) and (3.45) depend on $\sigma_{40}$; therefore the sensitivity of $\sigma_y^{i+1}$ w.r.t. $\sigma_{40}$ is given by

$$\frac{d\sigma_y^{i+1}}{d\sigma_{40}} = \frac{d\sigma_y^{i}}{d\sigma_{40}} + \theta_0\left(-\frac{1}{\sigma_{ys}}\frac{d\sigma_y^{i}}{d\sigma_{40}} + \frac{1}{\sigma_y^{i}}\frac{d\sigma_4^{i+1}}{d\sigma_{40}} + \sigma_4^{i+1}\frac{d}{d\sigma_{40}}\!\left(\frac{1}{\sigma_y^{i}}\right)\right)\Delta_p^{i}, \tag{6.5}$$

whereas the sensitivity of $\sigma_4^{i+1}$ w.r.t. $\sigma_{40}$ is given by

$$\frac{d\sigma_4^{i+1}}{d\sigma_{40}} = \frac{d\sigma_4^{i}}{d\sigma_{40}} = 1. \tag{6.6}$$

Since only (3.45) depends on $\sigma_{y0}$, the sensitivity of $\sigma_y^{i+1}$ w.r.t. $\sigma_{y0}$ is given by

$$\frac{d\sigma_y^{i+1}}{d\sigma_{y0}} = \frac{d\sigma_y^{i}}{d\sigma_{y0}} + \theta_0\left(-\frac{1}{\sigma_{ys}}\frac{d\sigma_y^{i}}{d\sigma_{y0}} + \sigma_4^{i+1}\frac{d}{d\sigma_{y0}}\!\left(\frac{1}{\sigma_y^{i}}\right)\right)\Delta_p^{i}. \tag{6.7}$$

Radial basis function mapping

The radial basis function $s(\mathbf{z})$, with $\mathbf{z} \in \mathbb{R}^2$ for the two-dimensional case, is given by

$$s(\mathbf{z}) = \sum_{j=1}^{n_b} \alpha_j \phi\!\left(\|\mathbf{z} - \mathbf{X}_j^{\partial\Omega}\|\right) + p(\mathbf{z}), \tag{6.8}$$

with $p$ a polynomial, $n_b$ the number of boundary nodes and $\phi$ a given basis function with respect to the norm $\|\mathbf{z}\|$. The coefficients $\alpha_j$ and the polynomial $p$ are determined by the interpolation conditions

$$s(\mathbf{X}_i^{\partial\Omega}) = d_i^{\partial\Omega}, \quad i = 1, 2, \ldots, n_b, \tag{6.9}$$

with $d_i^{\partial\Omega}$ the displacement of the $i$th boundary node. In addition it is required that

$$\sum_{j=1}^{n_b} \alpha_j q(\mathbf{X}_j^{\partial\Omega}) = 0, \tag{6.10}$$

for all polynomials $q$ with degree less than or equal to that of polynomial $p$. We rewrite (6.9) in matrix form as

$$\mathbf{d}^{\partial\Omega} = \mathbf{M}\boldsymbol{\alpha} + \mathbf{P}\boldsymbol{\beta}, \tag{6.11}$$

where $\mathbf{M}$ is an $n_b \times n_b$ matrix with the $i$th row and $j$th column containing the evaluation of the basis function $\phi(\|\mathbf{X}_i^{\partial\Omega} - \mathbf{X}_j^{\partial\Omega}\|)$. For two-dimensional interpolations $\mathbf{P}$ is an $n_b \times 3$ matrix with the $i$th row given by $[1 \; X_{ix}^{\partial\Omega} \; X_{iy}^{\partial\Omega}]$, where $X_{ix}^{\partial\Omega}$ and $X_{iy}^{\partial\Omega}$ are respectively the $x$ and $y$ coordinates of the $i$th boundary node. Similarly, (6.10) can be written in matrix form as

$$\mathbf{P}^T\boldsymbol{\alpha} = \mathbf{0}. \tag{6.12}$$

We therefore need to solve the two systems of linear equations (6.11) and (6.12). We start by rewriting (6.11) to obtain

$$\boldsymbol{\alpha} = \mathbf{M}^{-1}\mathbf{d}^{\partial\Omega} - \mathbf{M}^{-1}\mathbf{P}\boldsymbol{\beta}, \tag{6.13}$$

which we substitute into (6.12) to obtain $\mathbf{P}^T(\mathbf{M}^{-1}\mathbf{d}^{\partial\Omega} - \mathbf{M}^{-1}\mathbf{P}\boldsymbol{\beta}) = \mathbf{0}$. We then solve for $\boldsymbol{\beta}$ from

$$\mathbf{P}^T\mathbf{M}^{-1}\mathbf{P}\boldsymbol{\beta} = \mathbf{P}^T\mathbf{M}^{-1}\mathbf{d}^{\partial\Omega}, \tag{6.14}$$

and then for $\boldsymbol{\alpha}$ from (6.13). After solving for $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ the RBF $s(\mathbf{z})$ is defined and can be used to update the interior nodes $\mathbf{X}^{\Omega}$ as the geometry changes. The boundary displacements $\mathbf{d}^{\partial\Omega}$ are obtained from the control variables $\mathbf{x}$ and piece-wise linear boundary interpolation.
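The linear algebra of the solve in (6.11)-(6.14) can be sketched as follows. This is a minimal illustration: the Gaussian basis function, the node coordinates and the displacement values are assumptions made for the sketch, not choices prescribed here:

```python
import numpy as np

# Hedged sketch of the RBF boundary-displacement solve, equations
# (6.11)-(6.14). A Gaussian basis is chosen purely for illustration:
# for distinct nodes it keeps M symmetric positive definite, hence
# invertible, so the M^{-1} route of (6.13)-(6.14) is well defined.
def phi(r):
    return np.exp(-r**2)

def rbf_coefficients(X_b, d_b):
    """Solve for alpha and beta, one displacement component at a time."""
    nb = X_b.shape[0]
    r = np.linalg.norm(X_b[:, None, :] - X_b[None, :, :], axis=2)
    M = phi(r)                                 # n_b x n_b basis evaluations
    P = np.hstack([np.ones((nb, 1)), X_b])     # i-th row: [1, x_i, y_i]
    Minv_d = np.linalg.solve(M, d_b)
    Minv_P = np.linalg.solve(M, P)
    beta = np.linalg.solve(P.T @ Minv_P, P.T @ Minv_d)   # equation (6.14)
    alpha = Minv_d - Minv_P @ beta                       # equation (6.13)
    return alpha, beta

def s(z, X_b, alpha, beta):
    """Evaluate the interpolant (6.8) at an interior point z."""
    r = np.linalg.norm(z - X_b, axis=1)
    return phi(r) @ alpha + beta[0] + beta[1] * z[0] + beta[2] * z[1]

# Sanity check: a rigid translation of the whole boundary must be
# reproduced exactly at interior points, since the polynomial p captures
# it (alpha = 0, beta = [0.3, 0, 0] solves the system exactly).
X_b = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.0]])
d_b = np.full(5, 0.3)          # every boundary node displaces by 0.3
alpha, beta = rbf_coefficients(X_b, d_b)
disp = s(np.array([0.4, 0.6]), X_b, alpha, beta)
print(disp)
```

The same solve would be repeated per displacement component (x and y), and the resulting interpolant evaluated at every interior node to morph the mesh.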
