Scotch and libScotch 5.1 User’s Guide (version 5.1.1) François Pellegrini ScAlApplix project, INRIA Bordeaux Sud-Ouest ENSEIRB & LaBRI, UMR CNRS 5800 Université Bordeaux I 351 cours de la Libération, 33405 TALENCE, FRANCE [email protected] October 14, 2008 Abstract This document describes the capabilities and operations of Scotch and libScotch, a software package and a software library devoted to static mapping, partitioning, and sparse matrix block ordering of graphs and meshes/hypergraphs. It gives brief descriptions of the algorithms, details the input/output formats, instructions for use, installation procedures, and provides a number of examples. Scotch is distributed as free/libre software, and has been designed such that new partitioning or ordering methods can be added in a straightforward manner. It can therefore be used as a testbed for the easy and quick coding and testing of such new methods, and may also be redistributed, as a library, along with third-party software that makes use of it, either in its original or in updated forms. 1 Contents 1 Introduction 1.1 Static mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Sparse matrix ordering . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 Contents of this document . . . . . . . . . . . . . . . . . . . . . . . . 5 5 6 6 2 The Scotch project 2.1 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 7 7 3 Algorithms 3.1 Static mapping by Dual Recursive Bipartitioning . . . . . . . . 3.1.1 Static mapping . . . . . . . . . . . . . . . . . . . . . . . 3.1.2 Cost function and performance criteria . . . . . . . . . . 3.1.3 The Dual Recursive Bipartitioning algorithm . . . . . . 3.1.4 Partial cost function . . . . . . . . . . . . . . . . . . . . 3.1.5 Execution scheme . . . . . . . . . . . . . . . . . . . . . 3.1.6 Graph bipartitioning methods . . . . . . . . . . . . . . . 3.1.7 Mapping onto variable-sized architectures . . . . . . . . 3.2 Sparse matrix ordering by hybrid incomplete nested dissection 3.2.1 Minimum Degree . . . . . . . . . . . . . . . . . . . . . . 3.2.2 Nested dissection . . . . . . . . . . . . . . . . . . . . . . 3.2.3 Hybridization . . . . . . . . . . . . . . . . . . . . . . . . 3.2.4 Performance criteria . . . . . . . . . . . . . . . . . . . . 3.2.5 Ordering methods . . . . . . . . . . . . . . . . . . . . . 3.2.6 Graph separation methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 7 8 8 9 10 11 11 14 15 15 15 15 16 16 17 4 Updates 4.1 Changes from version 4.0 . . . . . . . . . . . . . . . . . . . . . . . . 18 18 5 Files and data structures 5.1 Graph files . . . . . . . . . . . . . . . . . . . . 5.2 Mesh files . . . . . . . . . . . . . . . . . . . . . 5.3 Geometry files . . . . . . . . . . . . . . . . . . 5.4 Target files . . . . . . . . . . . . . . . . . . . . 5.4.1 Decomposition-defined architecture files 5.4.2 Algorithmically-coded architecture files 5.4.3 Variable-sized architecture files . . . . . 5.5 Mapping files . . . . . . . . . . . . . . . . . . . 5.6 Ordering files . . . . . . . . . . . . . . . . . . . 5.7 Vertex list files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 18 20 21 22 22 23 24 25 25 26 6 Programs 6.1 Invocation . . . . 6.2 Using compressed 6.3 Description . . . 6.3.1 acpl . . . 6.3.2 amk * . . 6.3.3 amk grf . 6.3.4 atst . . . 6.3.5 gcv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 27 27 29 29 29 31 32 32 . . . files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3.6 6.3.7 6.3.8 6.3.9 6.3.10 6.3.11 6.3.12 6.3.13 6.3.14 6.3.15 6.3.16 6.3.17 gmap . . gmk * . gmk msh gmtst . gord . . gotst . gout . . gtst . . mcv . . mmk * . mord . . mtst . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 34 35 36 36 38 38 41 42 42 43 44 7 Library 7.1 Calling the routines of libScotch . . . . . 7.1.1 Calling from C . . . . . . . . . . . . 7.1.2 Calling from Fortran . . . . . . . . . 7.1.3 Compiling and linking . . . . . . . . 7.1.4 Machine word size issues . . . . . . . 7.2 Data formats . . . . . . . . . . . . . . . . . 7.2.1 Architecture format . . . . . . . . . 7.2.2 Graph format . . . . . . . . . . . . . 7.2.3 Mesh format . . . . . . . . . . . . . 7.2.4 Geometry format . . . . . . . . . . . 7.2.5 Block ordering format . . . . . . . . 7.3 Strategy strings . . . . . . . . . . . . . . . . 7.3.1 Mapping strategy strings . . . . . . 7.3.2 Graph bipartitioning strategy strings 7.3.3 Ordering strategy strings . . . . . . 7.3.4 Node separation strategy strings . . 7.4 Target architecture handling routines . . . . 7.4.1 SCOTCH archInit . . . . . . . . . . . 7.4.2 SCOTCH archExit . . . . . . . . . . . 7.4.3 SCOTCH archLoad . . . . . . . . . . . 7.4.4 SCOTCH archSave . . . . . . . . . . . 7.4.5 SCOTCH archBuild . . . . . . . . . . 7.4.6 SCOTCH archCmplt . . . . . . . . . . 7.4.7 SCOTCH archCmpltw . . . . . . . . . 7.4.8 SCOTCH archName . . . . . . . . . . . 7.4.9 SCOTCH archSize . . . . . . . . . . . 7.5 Graph handling routines . . . . . . . . . . . 7.5.1 SCOTCH graphInit . . . . . . . . . . 7.5.2 SCOTCH graphExit . . . . . . . . . . 7.5.3 SCOTCH graphFree . . . . . . . . . . 7.5.4 SCOTCH graphLoad . . . . . . . . . . 7.5.5 SCOTCH graphSave . . . . . . . . . . 7.5.6 SCOTCH graphBuild . . . . . . . . . 7.5.7 SCOTCH graphBase . . . . . . . . . . 7.5.8 SCOTCH graphCheck . . . . . . . . . 7.5.9 SCOTCH graphSize . . . . . . . . . . 7.5.10 SCOTCH graphData . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 45 45 46 47 47 48 48 48 50 51 53 54 54 55 58 61 65 65 66 66 67 67 68 68 69 69 70 70 70 70 71 71 72 73 74 74 75 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.5.11 SCOTCH graphStat . . . . . . . . . Graph mapping and partitioning routines 7.6.1 SCOTCH graphPart . . . . . . . . . 7.6.2 SCOTCH graphMap . . . . . . . . . . 7.6.3 SCOTCH graphMapInit . . . . . . . 7.6.4 SCOTCH graphMapExit . . . . . . . 7.6.5 SCOTCH graphMapLoad . . . . . . . 7.6.6 SCOTCH graphMapSave . . . . . . . 7.6.7 SCOTCH graphMapCompute . . . . . 7.6.8 SCOTCH graphMapView . . . . . . . 7.7 Graph ordering routines . . . . . . . . . . 7.7.1 SCOTCH graphOrder . . . . . . . . 7.7.2 SCOTCH graphOrderInit . . . . . . 7.7.3 SCOTCH graphOrderExit . . . . . . 7.7.4 SCOTCH graphOrderLoad . . . . . . 7.7.5 SCOTCH graphOrderSave . . . . . . 7.7.6 SCOTCH graphOrderSaveMap . . . 7.7.7 SCOTCH graphOrderSaveTree . . . 7.7.8 SCOTCH graphOrderCheck . . . . . 7.7.9 SCOTCH graphOrderCompute . . . 7.7.10 SCOTCH graphOrderComputeList . 7.8 Mesh handling routines . . . . . . . . . . 7.8.1 SCOTCH meshInit . . . . . . . . . . 7.8.2 SCOTCH meshExit . . . . . . . . . . 7.8.3 SCOTCH meshLoad . . . . . . . . . . 7.8.4 SCOTCH meshSave . . . . . . . . . . 7.8.5 SCOTCH meshBuild . . . . . . . . . 7.8.6 SCOTCH meshCheck . . . . . . . . . 7.8.7 SCOTCH meshSize . . . . . . . . . . 7.8.8 SCOTCH meshData . . . . . . . . . . 7.8.9 SCOTCH meshStat . . . . . . . . . . 7.8.10 SCOTCH meshGraph . . . . . . . . . 7.9 Mesh ordering routines . . . . . . . . . . . 7.9.1 SCOTCH meshOrder . . . . . . . . . 7.9.2 SCOTCH meshOrderInit . . . . . . 7.9.3 SCOTCH meshOrderExit . . . . . . 7.9.4 SCOTCH meshOrderSave . . . . . . 7.9.5 SCOTCH meshOrderSaveMap . . . . 7.9.6 SCOTCH meshOrderSaveTree . . . 7.9.7 SCOTCH meshOrderCheck . . . . . . 7.9.8 SCOTCH meshOrderCompute . . . . 7.10 Strategy handling routines . . . . . . . . . 7.10.1 SCOTCH stratInit . . . . . . . . . 7.10.2 SCOTCH stratExit . . . . . . . . . 7.10.3 SCOTCH stratSave . . . . . . . . . 7.10.4 SCOTCH stratGraphBipart . . . . 7.10.5 SCOTCH stratGraphMap . . . . . . 7.10.6 SCOTCH stratGraphOrder . . . . . 7.10.7 SCOTCH stratMeshOrder . . . . . . 7.11 Geometry handling routines . . . . . . . . 7.11.1 SCOTCH geomInit . . . . . . . . . . 7.6 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 77 77 78 79 79 80 80 81 81 82 82 83 84 84 85 85 86 86 87 87 88 88 89 89 90 90 92 92 93 94 95 96 96 97 98 98 99 99 100 100 101 101 101 102 102 103 103 104 104 104 7.11.2 SCOTCH geomExit . . . . . . . 7.11.3 SCOTCH geomData . . . . . . . 7.11.4 SCOTCH graphGeomLoadChac 7.11.5 SCOTCH graphGeomSaveChac 7.11.6 SCOTCH graphGeomLoadHabo 7.11.7 SCOTCH graphGeomLoadScot 7.11.8 SCOTCH graphGeomSaveScot 7.11.9 SCOTCH meshGeomLoadHabo . 7.11.10 SCOTCH meshGeomLoadScot . 7.11.11 SCOTCH meshGeomSaveScot . 7.12 Error handling routines . . . . . . . 7.12.1 SCOTCH errorPrint . . . . . 7.12.2 SCOTCH errorPrintW . . . . . 7.12.3 SCOTCH errorProg . . . . . . 7.13 Miscellaneous routines . . . . . . . . 7.13.1 SCOTCH randomReset . . . . . 7.14 MeTiS compatibility library . . . . . 7.14.1 METIS EdgeND . . . . . . . . . 7.14.2 METIS NodeND . . . . . . . . . 7.14.3 METIS NodeWND . . . . . . . . 7.14.4 METIS PartGraphKway . . . . 7.14.5 METIS PartGraphRecursive 7.14.6 METIS PartGraphVKway . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 105 106 107 107 108 108 109 110 110 111 111 112 112 112 112 113 113 113 114 115 116 116 8 Installation 117 9 Examples 118 10 Adding new features to Scotch 10.1 Graphs and meshes . . . . . . . . . . . . 10.2 Methods and partition data . . . . . . . 10.3 Adding a new method to Scotch . . . 10.4 Licensing of new methods and of derived 1 1.1 . . . . . . . . . . . . works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 120 121 121 123 Introduction Static mapping The efficient execution of a parallel program on a parallel machine requires that the communicating processes of the program be assigned to the processors of the machine so as to minimize its overall running time. When processes have a limited duration and their logical dependencies are accounted for, this optimization problem is referred to as scheduling. When processes are assumed to coexist simultaneously for the entire duration of the program, it is referred to as mapping. It amounts to balancing the computational weight of the processes among the processors of the machine, while reducing the cost of communication by keeping intensively inter-communicating processes on nearby processors. In most cases, the underlying computational structure of the parallel programs to map can be conveniently modeled as a graph in which vertices correspond to processes that handle distributed pieces of data, and edges reflect data dependencies. The mapping problem can then be addressed by assigning processor labels to the vertices of the graph, so that all 5 processes assigned to some processor are loaded and run on it. In a SPMD context, this is equivalent to the distribution across processors of the data structures of parallel programs; in this case, all pieces of data assigned to some processor are handled by a single process located on this processor. A mapping is called static if it is computed prior to the execution of the program. Static mapping is NP-complete in the general case [13]. Therefore, many studies have been carried out in order to find sub-optimal solutions in reasonable time, including the development of specific algorithms for common topologies such as the hypercube [11, 21]. When the target machine is assumed to have a communication network in the shape of a complete graph, the static mapping problem turns into the partitioning problem, which has also been intensely studied [4, 22, 31, 33, 51]. However, when mapping onto parallel machines the communication network of which is not a bus, not accounting for the topology of the target machine usually leads to worse running times, because simple cut minimization can induce more expensive long-distance communication [21, 58]. 1.2 Sparse matrix ordering Many scientific and engineering problems can be modeled by sparse linear systems, which are solved either by iterative or direct methods. To achieve efficiency with direct methods, one must minimize the fill-in induced by factorization. This fill-in is a direct consequence of the order in which the unknowns of the linear system are numbered, and its effects are critical both in terms of memory and computation costs. An efficient way to compute fill reducing orderings of symmetric sparse matrices is to use recursive nested dissection [17]. It amounts to computing a vertex set S that separates the graph into two parts A and B, ordering S with the highest indices that are still available, and proceeding recursively on parts A and B until their sizes become smaller than some threshold value. This ordering guarantees that, at each step, no non-zero term can appear in the factorization process between unknowns of A and unknowns of B. The main issue of the nested dissection ordering algorithm is thus to find small vertex separators that balance the remaining subgraphs as evenly as possible, in order to minimize fill-in and to increase concurrency in the factorization process. 1.3 Contents of this document This document describes the capabilities and operations of Scotch, a software package devoted to static mapping, graph and mesh partitioning, and sparse matrix block ordering. Scotch allows the user to map efficiently any kind of weighted process graph onto any kind of weighted architecture graph, and provides highquality block orderings of sparse matrices. The rest of this manual is organized as follows. Section 2 presents the goals of the Scotch project, and section 3 outlines the most important aspects of the mapping and ordering algorithms that it implements. Section 4 summarizes the most important changes between version 5.0 and previous versions. Section 5 defines the formats of the files used in Scotch, section 6 describes the programs of the Scotch distribution, and section 7 defines the interface and operations of the libScotch library. Section 8 explains how to obtain and install the Scotch distribution. Finally, some practical examples are given in section 9, and instructions on how to implement new methods in the libScotch library are provided in section 10. 6 2 2.1 The Scotch project Description Scotch is a project carried out at the Laboratoire Bordelais de Recherche en Informatique (LaBRI) of the Université Bordeaux I, and now within the ScAlApplix project of INRIA Bordeaux Sud-Ouest. Its goal is to study the applications of graph theory to scientific computing, using a “divide and conquer” approach. It focused first on static mapping, and has resulted in the development of the Dual Recursive Bipartitioning (or DRB) mapping algorithm and in the study of several graph bipartitioning heuristics [43], all of which have been implemented in the Scotch software package [47]. Then, it focused on the computation of highquality vertex separators for the ordering of sparse matrices by nested dissection, by extending the work that has been done on graph partitioning in the context of static mapping [48, 49]. More recently, the ordering capabilities of Scotch have been extended to native mesh structures, thanks to hypergraph partitioning algorithms. New graph partitioning methods have also been recently added [8, 44]. Version 5.0 of Scotch is the first one to comprise parallel graph ordering routines. The parallel features of Scotch are referred to as PT-Scotch (“Parallel Threaded Scotch”). While both packages share a significant amount of code, beacuse PT-Scotch transfers control to the sequential routines of the libScotch library when the subgraphs on which it operates are located on a single processor, the two sets of routines have a distinct user’s manual. Readers interested in the parallel features of Scotch should refer to the PT-Scotch 5.1 User’s Guide [45]. 2.2 Availability Starting from version 4.0, which has been developed at INRIA within the ScAlApplix project, Scotch is available under a dual licensing basis. On the one hand, it is downloadable from the Scotch web page as free/libre software, to all interested parties willing to use it as a library or to contribute to it as a testbed for new partitioning and ordering methods. On the other hand, it can also be distributed, under other types of licenses and conditions, to parties willing to embed it tightly into closed, proprietary software. The free/libre software license under which Scotch 5.1 is distributed is the CeCILL-C license [6], which has basically the same features as the GNU LGPL (“Lesser General Public License”): ability to link the code as a library to any free/libre or even proprietary software, ability to modify the code and to redistribute these modifications. Version 4.0 of Scotch was distributed under the LGPL itself. Please refer to section 8 to see how to obtain the free/libre distribution of Scotch. 3 3.1 Algorithms Static mapping by Dual Recursive Bipartitioning For a detailed description of the mapping algorithm and an extensive analysis of its performance, please refer to [43, 46]. In the next sections, we will only outline the most important aspects of the algorithm. 7 3.1.1 Static mapping The parallel program to be mapped onto the target architecture is modeled by a valuated unoriented graph S called source graph or process graph, the vertices of which represent the processes of the parallel program, and the edges of which the communication channels between communicating processes. Vertex- and edge- valuations associate with every vertex vS and every edge eS of S integer numbers wS (vS ) and wS (eS ) which estimate the computation weight of the corresponding process and the amount of communication to be transmitted on the channel, respectively. The target machine onto which is mapped the parallel program is also modeled by a valuated unoriented graph T called target graph or architecture graph. Vertices vT and edges eT of T are assigned integer weights wT (vT ) and wT (eT ), which estimate the computational power of the corresponding processor and the cost of traversal of the inter-processor link, respectively. A mapping from S to T consists of two applications τS,T : V (S) −→ V (T ) and ρS,T : E(S) −→ P(E(T )), where P(E(T )) denotes the set of all simple loopless paths which can be built from E(T ). τS,T (vS ) = vT if process vS of S is mapped onto processor vT of T , and ρS,T (eS ) = {e1T , e2T , . . . , enT } if communication channel eS of S is routed through communication links e1T , e2T , . . . , enT of T . |ρS,T (eS )| denotes the dilation of edge eS , that is, the number of edges of E(T ) used to route eS . 3.1.2 Cost function and performance criteria The computation of efficient static mappings requires an a priori knowledge of the dynamic behavior of the target machine with respect to the programs which are run on it. This knowledge is synthesized in a cost function, the nature of which determines the characteristics of the desired optimal mappings. The goal of our mapping algorithm is to minimize some communication cost function, while keeping the load balance within a specified tolerance. The communication cost function fC that we have chosen is the sum, for all edges, of their dilation multiplied by their weight: X def wS (eS ) |ρS,T (eS )| . fC (τS,T , ρS,T ) = eS ∈E(S) This function, which has already been considered by several authors for hypercube target topologies [11, 21, 25], has several interesting properties: it is easy to compute, allows incremental updates performed by iterative algorithms, and its minimization favors the mapping of intensively intercommunicating processes onto nearby processors; regardless of the type of routage implemented on the target machine (store-and-forward or cut-through), it models the traffic on the interconnection network and thus the risk of congestion. The strong positive correlation between values of this function and effective execution times has been experimentally verified by Hammond [21] on the CM-2, and by Hendrickson and Leland [26] on the nCUBE 2. The quality of mappings is evaluated with respect to the criteria for quality that we have chosen: the balance of the computation load across processors, and the minimization of the interprocessor communication cost modeled by function fC . These criteria lead to the definition of several parameters, which are described below. For load balance, one can define µmap , the average load per computational power unit (which does not depend on the mapping), and δmap , the load imbalance 8 ratio, as def µmap = P wS (vS ) P wT (vT ) vS ∈V (S) and vT ∈V (T ) def δmap = P 1 wT (vT ) vT ∈V (T ) P vS ∈ V (S) τS,T (vS ) = vT P wS (vS ) − µmap wS (vS ) . vS ∈V (S) However, since the maximum load imbalance ratio is provided by the user in input of the mapping, the information given by these parameters is of little interest, since what matters is the minimization of the communication cost function under this load balance constraint. For communication, the straightforward parameter to consider is fC . It can be normalized as µexp , the average edge expansion, which can be compared to µdil , the average edge dilation; these are defined as P |ρS,T (eS )| fC def def eS ∈E(S) P µexp = and µdil = . wS (eS ) |E(S)| eS ∈E(S) def µ is smaller than 1 when the mapper succeeds in putting heavily interδexp = µexp dil communicating processes closer to each other than it does for lightly communicating processes; they are equal if all edges have same weight. 3.1.3 The Dual Recursive Bipartitioning algorithm Our mapping algorithm uses a divide and conquer approach to recursively allocate subsets of processes to subsets of processors [43]. It starts by considering a set of processors, also called domain, containing all the processors of the target machine, and with which is associated the set of all the processes to map. At each step, the algorithm bipartitions a yet unprocessed domain into two disjoint subdomains, and calls a graph bipartitioning algorithm to split the subset of processes associated with the domain across the two subdomains, as sketched in the following. mapping (D, P) Set_Of_Processors D; Set_Of_Processes P; { Set_Of_Processors D0, D1; Set_Of_Processes P0, P1; if (|P| == 0) return; if (|D| == 1) { result (D, P); return; } /* If nothing to do. */ /* If one processor in D */ /* P is mapped onto it. */ (D0, D1) = processor_bipartition (D); (P0, P1) = process_bipartition (P, D0, D1); mapping (D0, P0); /* Perform recursion. */ mapping (D1, P1); } The association of a subdomain with every process defines a partial mapping of the process graph. As bipartitionings are performed, the subdomain sizes decrease, up 9 to give a complete mapping when all subdomains are of size one. The above algorithm lies on the ability to define five main objects: • a domain structure, which represents a set of processors in the target architecture; • a domain bipartitioning function, which, given a domain, bipartitions it in two disjoint subdomains; • a domain distance function, which gives, in the target graph, a measure of the distance between two disjoint domains. Since domains may not be convex nor connected, this distance may be estimated. However, it must respect certain homogeneity properties, such as giving more accurate results as domain sizes decrease. The domain distance function is used by the graph bipartitioning algorithms to compute the communication function to minimize, since it allows the mapper to estimate the dilation of the edges that link vertices which belong to different domains. Using such a distance function amounts to considering that all routings will use shortest paths on the target architecture, which is how most parallel machines actually do. We have thus chosen that our program would not provide routings for the communication channels, leaving their handling to the communication system of the target machine; • a process subgraph structure, which represents the subgraph induced by a subset of the vertex set of the original source graph; • a process subgraph bipartitioning function, which bipartitions subgraphs in two disjoint pieces to be mapped onto the two subdomains computed by the domain bipartitioning function. All these routines are seen as black boxes by the mapping program, which can thus accept any kind of target architecture and process bipartitioning functions. 3.1.4 Partial cost function The production of efficient complete mappings requires that all graph bipartitionings favor the criteria that we have chosen. Therefore, the bipartitioning of a subgraph S ′ of S should maintain load balance within the user-specified tolerance, and minimize the partial communication cost function fC′ , defined as X def wS ({v, v ′ }) |ρS,T ({v, v ′ })| , fC′ (τS,T , ρS,T ) = v ∈ V (S ′ ) {v, v ′ } ∈ E(S) which accounts for the dilation of edges internal to subgraph S ′ as well as for the one of edges which belong to the cocycle of S ′ , as shown in Figure 1. Taking into account the partial mapping results issued by previous bipartitionings makes it possible to avoid local choices that might prove globally bad, as explained below. This amounts to incorporating additional constraints to the standard graph bipartitioning problem, turning it into a more general optimization problem termed skewed graph partitioning by some authors [27]. 10 D D D0 D0 D1 a. Initial position. D1 b. After one vertex is moved. Figure 1: Edges accounted for in the partial communication cost function when bipartitioning the subgraph associated with domain D between the two subdomains D0 and D1 of D. Dotted edges are of dilation zero, their two ends being mapped onto the same subdomain. Thin edges are cocycle edges. 3.1.5 Execution scheme From an algorithmic point of view, our mapper behaves as a greedy algorithm, since the mapping of a process to a subdomain is never reconsidered, and at each step of which iterative algorithms can be applied. The double recursive call performed at each step induces a recursion scheme in the shape of a binary tree, each vertex of which corresponds to a bipartitioning job, that is, the bipartitioning of both a domain and its associated subgraph. In the case of depth-first sequencing, as written in the above sketch, bipartitioning jobs run in the left branches of the tree have no information on the distance between the vertices they handle and neighbor vertices to be processed in the right branches. On the contrary, sequencing the jobs according to a by-level (breadth-first) travel of the tree allows any bipartitioning job of a given level to have information on the subdomains to which all the processes have been assigned at the previous level. Thus, when deciding in which subdomain to put a given process, a bipartitioning job can account for the communication costs induced by its neighbor processes, whether they are handled by the job itself or not, since it can estimate in fC′ the dilation of the corresponding edges. This results in an interesting feedback effect: once an edge has been kept in a cut between two subdomains, the distance between its end vertices will be accounted for in the partial communication cost function to be minimized, and following jobs will be more likely to keep these vertices close to each other, as illustrated in Figure 2. The relative efficiency of depth-first and breadth-first sequencing schemes with respect to the structure of the source and target graphs is discussed in [46]. 3.1.6 Graph bipartitioning methods The core of our recursive mapping algorithm uses process graph bipartitioning methods as black boxes. It allows the mapper to run any type of graph bipartitioning method compatible with our criteria for quality. Bipartitioning jobs maintain an internal image of the current bipartition, indicating for every vertex of the job whether it is currently assigned to the first or to the second subdomain. It is therefore possible to apply several different methods in sequence, each one starting from the result of the previous one, and to select the methods with respect to the job characteristics, thus enabling us to define mapping strategies. The currently implemented 11 CL0 CL0 CL1 CL1 D D CL2 CL2 CL1 CL2 a. Depth-first sequencing. b. Breadth-first sequencing. Figure 2: Influence of depth-first and breadth-first sequencings on the bipartitioning of a domain D belonging to the leftmost branch of the bipartitioning tree. With breadth-first sequencing, the partial mapping data regarding vertices belonging to the right branches of the bipartitioning tree are more accurate (C.L. stands for “Cut Level”). graph bipartitioning methods are listed below. Band Like the multi-level method which will be described below, the band method is a meta-algorithm, in the sense that it does not itself compute partitions, but rather helps other partitioning algorithms perform better. It is a refinement algorithm which, from a given initial partition, extracts a band graph of given width (which only contains graph vertices that are at most at this distance from the separator), calls a partitioning strategy on this band graph, and projects back the refined partition on the original graph. This method was designed to be able to use expensive partitioning heuristics, such as genetic algorithms, on large graphs, as it dramatically reduces the problem space by several orders of magnitude. However, it was found that, in a multi-level context, it also improves partition quality, by coercing partitions in a problem space that derives from the one which was globally defined at the coarsest level, thus preventing local optimization refinement algorithms to be trapped in local optima of the finer graphs [8]. Diffusion This global optimization method, presented in [44], flows two kinds of antagonistic liquids, scotch and anti-scotch, from two source vertices, and sets the new frontier as the limit between vertices which contain scotch and the ones which contain anti-scotch. In order to add load-balancing constraints to the algorithm, a constant amount of liquid disappears from every vertex per unit of time, so that no domain can spread across more than half of the vertices. Because selecting the source vertices is essential to the obtainment of useful results, this method has been hard-coded so that the two source vertices are the two vertices of highest indices, since in the band method these are the anchor vertices which represent all of the removed vertices of each part. Therefore, this method must be used on band graphs only, or on specifically crafted graphs. Exactifier This greedy algorithm refines the current partition so as to reduce load imbal12 ance as much as possible, while keeping the value of the communication cost function as small as possible. The vertex set is scanned in order of decreasing vertex weights, and vertices are moved from one subdomain to the other if doing so reduces load imbalance. When several vertices have same weight, the vertex whose swap decreases most the communication cost function is selected first. This method is used in post-processing of other methods when load balance is mandatory. For weighted graphs, the strict enforcement of load balance may cause the swapping of isolated vertices of small weight, thus greatly increasing the cut. Therefore, great care should be taken when using this method if connectivity or cut minimization are mandatory. Fiduccia-Mattheyses The Fiduccia-Mattheyses heuristics [12] is an almost-linear improvement of the famous Kernighan-Lin algorithm [35]. It tries to improve the bipartition that is input to it by incrementally moving vertices between the subsets of the partition, as long as it can find sequences of moves that lower its communication cost. By considering sequences of moves instead of single swaps, the algorithm allows hill-climbing from local minima of the cost function. As an extension to the original Fiduccia-Mattheyses algorithm, we have developed new data structures, based on logarithmic indexings of arrays, that allow us to handle weighted graphs while preserving the almost-linearity in time of the algorithm [46]. As several authors quoted before [24, 32], the Fiduccia-Mattheyses algorithm gives better results when trying to optimize a good starting partition. Therefore, it should not be used on its own, but rather after greedy starting methods such as the Gibbs-Poole-Stockmeyer or the greedy graph growing methods. Gibbs-Poole-Stockmeyer This greedy bipartitioning method derives from an algorithm proposed by Gibbs, Poole, and Stockmeyer to minimize the dilation of graph orderings, that is, the maximum absolute value of the difference between the numbers of neighbor vertices [18]. The graph is sliced by using a breadth-first spanning tree rooted at a randomly chosen vertex, and this process is iterated by selecting a new root vertex within the last layer as long as the number of layers increases. Then, starting from the current root vertex, vertices are assigned layer after layer to the first subdomain, until half of the total weight has been processed. Remaining vertices are then allocated to the second subdomain. As for the original Gibbs, Poole, and Stockmeyer algorithm, it is assumed that the maximization of the number of layers results in the minimization of the sizes –and therefore of the cocycles– of the layers. This property has already been used by George and Liu to reorder sparse linear systems using the nested dissection method [17], and by Simon in [56]. Greedy graph growing This greedy algorithm, which has been proposed by Karypis and Kumar [31], belongs to the GRASP (“Greedy Randomized Adaptive Search Procedure”) class [36]. It consists in selecting an initial vertex at random, and repeatedly adding vertices to this growing subset, such that each added vertex results in the smallest increase in the communication cost function. This process, which stops when load balance is achieved, is repeated several times in order to explore (mostly in a gradient-like fashion) different areas of the solution space, and the best partition found is kept. 13 Multi-level This algorithm, which has been studied by several authors [4, 23, 31] and should be considered as a strategy rather than as a method since it uses other methods as parameters, repeatedly reduces the size of the graph to bipartition by finding matchings that collapse vertices and edges, computes a partition for the coarsest graph obtained, and projects the result back to the original graph, as shown in Figure 3. The multi-level method, when used in conjuncRefined partition Projected partition Coarsening phase Uncoarsening phase Initial partitioning Figure 3: The multi-level partitioning process. In the uncoarsening phase, the light and bold lines represent for each level the projected partition obtained from the coarser graph, and the partition obtained after refinement, respectively. tion with the Fiduccia-Mattheyses method to compute the initial partitions and refine the projected partitions at every level, usually leads to a significant improvement in quality with respect to the plain Fiduccia-Mattheyses method. By coarsening the graph used by the Fiduccia-Mattheyses method to compute and project back the initial partition, the multi-level algorithm broadens the scope of the Fiduccia-Mattheyses algorithm, and makes possible for it to account for topological structures of the graph that would else be of a too high level for it to encompass in its local optimization process. 3.1.7 Mapping onto variable-sized architectures Several constrained graph partitioning problems can be modeled as mapping the problem graph onto a target architecture, the number of vertices and topology of which depend dynamically on the structure of the subgraphs to bipartition at each step. Variable-sized architectures are supported by the DRB algorithm in the following way: at the end of each bipartitioning step, if any of the variable subdomains is empty (that is, all vertices of the subgraph are mapped only to one of the subdomains), then the DRB process stops for both subdomains, and all of the vertices are assigned to their parent subdomain; else, if a variable subdomain has only one vertex mapped onto it, the DRB process stops for this subdomain, and the vertex is assigned to it. The moment when to stop the DRB process for a specific subgraph can be controlled by defining a bipartitioning strategy that tests for the validity of a criterion at each bipartitioning step, and maps all of the subgraph vertices to one of the subdomains when it becomes false. 14 3.2 Sparse matrix ordering by hybrid incomplete nested dissection When solving large sparse linear systems of the form Ax = b, it is common to precede the numerical factorization by a symmetric reordering. This reordering is chosen in such a way that pivoting down the diagonal in order on the resulting permuted matrix P AP T produces much less fill-in and work than computing the factors of A by pivoting down the diagonal in the original order (the fill-in is the set of zero entries in A that become non-zero in the factored matrix). 3.2.1 Minimum Degree The minimum degree algorithm [57] is a local heuristic that performs its pivot selection by iteratively selecting from the graph a node of minimum degree. The minimum degree algorithm is known to be a very fast and general purpose algorithm, and has received much attention over the last three decades (see for example [1, 16, 41]). However, the algorithm is intrinsically sequential, and very little can be theoretically proved about its efficiency. 3.2.2 Nested dissection The nested dissection algorithm [17] is a global, heuristic, recursive algorithm which computes a vertex set S that separates the graph into two parts A and B, ordering S with the highest remaining indices. It then proceeds recursively on parts A and B until their sizes become smaller than some threshold value. This ordering guarantees that, at each step, no non zero term can appear in the factorization process between unknowns of A and unknowns of B. Many theoretical results have been carried out on nested dissection ordering [7, 40], and its divide and conquer nature makes it easily parallelizable. The main issue of the nested dissection ordering algorithm is thus to find small vertex separators that balance the remaining subgraphs as evenly as possible. Most often, vertex separators are computed by using direct heuristics [28, 38], or from edge separators [50, and included references] by minimum cover techniques [9, 30], but other techniques such as spectral vertex partitioning have also been used [51]. Provided that good vertex separators are found, the nested dissection algorithm produces orderings which, both in terms of fill-in and operation count, compare favorably [20, 31, 48] to the ones obtained with the minimum degree algorithm [41]. Moreover, the elimination trees induced by nested dissection are broader, shorter, and better balanced, and therefore exhibit much more concurrency in the context of parallel Cholesky factorization [3, 14, 15, 20, 48, 55, and included references]. 3.2.3 Hybridization Due to their complementary nature, several schemes have been proposed to hybridize the two methods [28, 34, 48]. However, to our knowledge, only loose couplings have been achieved: incomplete nested dissection is performed on the graph to order, and the resulting subgraphs are passed to some minimum degree algorithm. This results in the fact that the minimum degree algorithm does not have exact degree values for all of the boundary vertices of the subgraphs, leading to a misbehavior of the vertex selection process. 15 Our ordering program implements a tight coupling of the nested dissection and minimum degree algorithms, that allows each of them to take advantage of the information computed by the other. First, the nested dissection algorithm provides exact degree values for the boundary vertices of the subgraphs passed to the minimum degree algorithm (called halo minimum degree since it has a partial visibility of the neighborhood of the subgraph). Second, the minimum degree algorithm returns the assembly tree that it computes for each subgraph, thus allowing for supervariable amalgamation, in order to obtain column-blocks of a size suitable for BLAS3 block computations. As for our mapping program, it is possible to combine ordering methods into ordering strategies, which allow the user to select the proper methods with respect to the characteristics of the subgraphs. The ordering program is completely parametrized by its ordering strategy. The nested dissection method allows the user to take advantage of all of the graph partitioning routines that have been developed in the earlier stages of the Scotch project. Internal ordering strategies for the separators are relevant in the case of sequential or parallel [19, 52, 53, 54] block solving, to select ordering algorithms that minimize the number of extra-diagonal blocks [7], thus allowing for efficient use of BLAS3 primitives, and to reduce inter-processor communication. 3.2.4 Performance criteria The quality of orderings is evaluated with respect to several criteria. The first one, NNZ, is the number of non-zero terms in the factored reordered matrix. The second one, OPC, is the operation count, that is the number of arithmetic operations required to factor the matrix. The operation count that we have considered takes into consideration all operations (additions, subtractions, multiplications, divisions) P required by Cholesky factorization, except square roots; it is equal to c n2c , where nc is the number of non-zeros of column c of the factored matrix, diagonal included. A third criterion for quality is the shape of the elimination tree; concurrency in parallel solving is all the higher as the elimination tree is broad and short. To measure its quality, several parameters can be defined: hmin , hmax , and havg denote the minimum, maximum, and average heights of the tree1 , respectively, and hdlt is the variance, expressed as a percentage of havg . Since small separators result in small chains in the elimination tree, havg should also indirectly reflect the quality of separators. 3.2.5 Ordering methods The core of our ordering algorithm uses graph ordering methods as black boxes, which allows the orderer to run any type of ordering method. In addition to yielding orderings of the subgraphs that are passed to them, these methods may compute column block partitions of the subgraphs, that are recorded in a separate tree structure. The currently implemented graph ordering methods are listed below. Halo approximate minimum degree The halo approximate minimum degree method [49] is an improvement of the approximate minimum degree [1] algorithm, suited for use on subgraphs 1 We do not consider as leaves the disconnected vertices that are present in some meshes, since they do not participate in the solving process. 16 produced by nested dissection methods. Its interest compared to classical minimum degree algorithms is that boundary vertices are processed using their real degree in the global graph rather than their (much smaller) degree in the subgraph, resulting in smaller fill-in and operation count. This method also implements amalgamation techniques that result in efficient block computations in the factoring and the solving processes. Halo approximate minimum fill The halo approximate minimum fill method is a variant of the halo approximate minimum degree algorithm, where the criterion to select the next vertex to permute is not based on its current estimated degree but on the minimization of the induced fill. Graph compression The graph compression method [2] merges cliques of vertices into single nodes, so as to speed-up the ordering of the compressed graph. It also results in some improvement of the quality of separators, especially for stiffness matrices. Gibbs-Poole-Stockmeyer This method is mainly used on separators to reduce the number and extent of extra-diagonal blocks. Simple method Vertices are ordered consecutively, in the same order as they are stored in the graph. This is the fastest method to use on separators when the shape of extra-diagonal structures is not a concern. Nested dissection Incomplete nested dissection method. Separators are computed recursively on subgraphs, and specific ordering methods are applied to the separators and to the resulting subgraphs (see sections 3.2.2 and 3.2.3). 3.2.6 Graph separation methods The core of our incomplete nested dissection algorithm uses graph separation methods as black boxes. It allows the orderer to run any type of graph separation method compatible with our criteria for quality, that is, reducing the size of the vertex separator while maintaining the loads of the separated parts within some user-specified tolerance. Separation jobs maintain an internal image of the current vertex separator, indicating for every vertex of the job whether it is currently assigned to one of the two parts, or to the separator. It is therefore possible to apply several different methods in sequence, each one starting from the result of the previous one, and to select the methods with respect to the job characteristics, thus enabling the definition of separation strategies. The currently implemented graph separation methods are listed below. Fiduccia-Mattheyses This is a vertex-oriented version of the original, edge-oriented, FiducciaMattheyses heuristics described in page 13. Greedy graph growing This is a vertex-oriented version of the edge-oriented greedy graph growing algorithm described in page 13. 17 Multi-level This is a vertex-oriented version of the edge-oriented multi-level algorithm described in page 14. Thinner This greedy algorithm refines the current separator by removing all of the exceeding vertices, that is, vertices that do not have neighbors in both parts. It is provided as a simple gradient refinement algorithm for the multi-level method, and is clearly outperformed by the Fiduccia-Mattheyses algorithm. Vertex cover This algorithm computes a vertex separator by first computing an edge separator, that is, a bipartition of the graph, and then turning it into a vertex separator by using the method proposed by Pothen and Fang [50]. This method requires the computation of maximal matchings in the bipartite graphs associated with the edge cuts, which are built using Duff’s variant [9] of the Hopcroft and Karp algorithm [30]. Edge separators are computed by using a bipartitioning strategy, which can use any of the graph bipartitioning methods described in section 3.1.6, page 11. 4 4.1 Updates Changes from version 4.0 Scotch has gone parallel with the release of PT-Scotch, the Parallel Threaded Scotch. People interested in these parallel routines should refer to the PT-Scotch and libScotch 5.1 User’s Guide [45], which extends this manual. A compatibility library has been developed to allow users to try and use Scotch in programs that were designed to use MeTiS. Please refer to Section 7.14 for more information. Scotch can now handle compressed streams on the fly, in several widely used formats such as gzip, bzip2 or lzma. Please refer to Section 6.2 for more information. 5 Files and data structures For the sake of portability, readability, and reduction of storage space, all the data files shared by the different programs of the Scotch project are coded in plain ASCII text exclusively. Although we may speak of “lines” when describing file formats, text-formatting characters such as newlines or tabulations are not mandatory, and are not taken into account when files are read. They are only used to provide better readability and understanding. Whenever numbers are used to label objects, and unless explicitely stated, numberings always start from zero, not one. 5.1 Graph files Graph files, which usually end in “.grf” or “.src”, describe valuated graphs, which can be valuated process graphs to be mapped onto target architectures, or graphs representing the adjacency structures of matrices to order. Graphs are represented by means of adjacency lists: the definition of each vertex is accompanied by the list of all of its neighbors, i.e. all of its adjacent arcs. 18 Therefore, the overall number of edge data is twice the number of edges. Since version 3.3 has been introduced a new file format, referred to as the “newstyle” file format, which replaces the previous, “old-style”, file format. The two advantages of the new-style format over its predecessor are its greater compacity, which results in shorter I/O times, and its ability to handle easily graphs output by C or by Fortran programs. Starting from version 4.0, only the new format is supported. To convert remaining old-style graph files into new-style graph files, one should get version 3.4 of the Scotch distribution, which comprises the scv file converter, and use it to produce new-style Scotch graph files from the old-style Scotch graph files which it is able to read. See section 6.3.5 for a description of gcv, formerly called scv. The first line of a graph file holds the graph file version number, which is currently 0. The second line holds the number of vertices of the graph (referred to as vertnbr in libScotch; see for instance Figure 16, page 49, for a detailed example), followed by its number of arcs (unappropriately called edgenbr, as it is in fact equal to twice the actual number of edges). The third line holds two figures: the graph base index value (baseval), and a numeric flag. The graph base index value records the value of the starting index used to describe the graph; it is usually 0 when the graph has been output by C programs, and 1 for Fortran programs. Its purpose is to ease the manipulation of graphs within each of these two environments, while providing compatibility between them. The numeric flag, similar to the one used by the Chaco graph format [24], is made of three decimal digits. A non-zero value in the units indicates that vertex weights are provided. A non-zero value in the tenths indicates that edge weights are provided. A non-zero value in the hundredths indicates that vertex labels are provided; if it is the case, vertices can be stored in any order in the file; else, natural order is assumed, starting from the graph base index. This header data is then followed by as many lines as there are vertices in the graph, that is, vertnbr lines. Each of these lines begins with the vertex label, if necessary, the vertex load, if necessary, and the vertex degree, followed by the description of the arcs. An arc is defined by the load of the edge, if necessary, and by the label of its other end vertex. The arcs of a given vertex can be provided in any order in its neighbor list. If vertex labels are provided, vertices can also be stored in any order in the file. Figure 4 shows the contents of a graph file modeling a cube with unity vertex and edge weights and base 0. 0 8 0 3 3 3 3 3 3 3 3 24 000 4 5 6 7 0 1 2 3 2 3 0 1 6 7 4 5 1 0 3 2 5 4 7 6 Figure 4: Graph file representing a cube. 19 5.2 Mesh files Mesh files, which usually end in “.msh”, describe valuated meshes, made of elements and nodes, the elements of which can be mapped onto target architectures, and the nodes of which can be reordered. Meshes are bipartite graphs, in the sense that every element is connected to the nodes that it comprises, and every node is connected to the elements to which it belongs. No edge connects any two element vertices, nor any two node vertices. One can also think of meshes as hypergraphs, such that nodes are the vertices of the hypergraph and elements are hyper-edges which connect multiple nodes, or reciprocally such that elements are the vertices of the hypergraph and nodes are hyper-edges which connect multiple elements. Since meshes are graphs, the structure of mesh files resembles very much the one of graph files described above in section 5.1, and differs only by its header, which indicates which of the vertices are node vertices and element vertices. The first line of a mesh file holds the mesh file version number, which is currently 1. Graph and mesh version numbers will always differ, which enables application programs to accept both file formats and adapt their behavior according to the type of input data. The second line holds the number of elements of the mesh (velmnbr), followed by its number of nodes (vnodnbr), and by its overall number of arcs (edgenbr, that is, twice the number of edges which connect elements to nodes and vice-versa). The third line holds three figures: the base index of the first element vertex in memory (velmbas), the base index of the first node vertex in memory (vnodbas), and a numeric flag. The Scotch mesh file format requires that all nodes and all elements be assigned to contiguous ranges of indices. Therefore, either all element vertices are defined before all node vertices, or all node vertices are defined before all element vertices. The node and element base indices indicate at the same time whether elements or nodes are put in the first place, as well as the value of the starting index used to describe the graph. Indeed, if velmbas < vnodbas, then elements have the smallest indices, velmbas is the base value of the underlying graph (that is, baseval = velmbas), and velmbas + velmnbr = vnodbas holds. Conversely, if velmbas > vnodbas, then nodes have the smallest indices, vnodbas is the base value of the underlying graph, (that is, baseval = vnodbas), and vnodbas+vnodnbr = velmbas holds. The numeric flag, similar to the one used by the Chaco graph format [24], is made of three decimal digits. A non-zero value in the units indicates that vertex weights are provided. A non-zero value in the tenths indicates that edge weights are provided. A non-zero value in the hundredths indicates that vertex labels are provided; if it is the case, and if velmbas < vnodbas (resp. velmbas > vnodbas), the velmnbr (resp. vnodnbr) first vertex lines are assumed to be element (resp. node) vertices, irrespective of their vertex labels, and the vnodnbr (resp. velmnbr) remaining vertex lines are assumed to be node (resp. element) vertices; else, natural order is assumed, starting at the underlying graph base index (baseval). This header data is then followed by as many lines as there are node and element vertices in the graph. These lines are similar to the ones of the graph format, except that, in order to save disk space, the numberings of nodes and elements all start from the same base value, that is, min(velmbas, vnodbas) (also called baseval, like for regular graphs). 20 For example, Figure 5 shows the contents of the mesh file modeling three square elements, with unity vertex and edge weights, elements defined before nodes, and numbering of the underlying graph starting from 1. In memory, the three elements are labeled from 1 to 3, and the eight nodes are labeled from 4 to 11. In the file, the three elements are still labeled from 1 to 3, while the eight nodes are labeled from 1 to 8. When labels are used, elements and nodes may have similar labels, but not two elements, nor two nodes, should have the same labels. 4 10 2 5 11 1 6 7 3 8 9 1 3 1 4 4 4 1 2 2 2 1 1 1 2 8 4 2 7 5 2 2 1 1 3 3 2 2 (= 5) (= 10) (= 8) 24 000 8 (= 11) 2 (= 5) 6 (= 9) 4 8 3 (= 7) (= 11) (= 6) 3 1 4 (= 6) (= 4) (= 7) 1 3 3 1 Figure 5: Mesh file representing three square elements, with unity vertex and edge weights. Elements are defined before nodes, and numbering of the underlying graph starts from 1. The left part of the figure shows the mesh representation in memory, with consecutive element and node indices. The right part of the figure shows the contents of the file, with both element and node numberings starting from 1, the minimum of the element and node base values. Corresponding node indices in memory are shown in parentheses for the sake of comprehension. 5.3 Geometry files Geometry files, which usually end in “.xyz”, hold the coordinates of the vertices of their associated graph or mesh. These files are not used in the mapping process itself, since only topological properties are taken into account then (mappings are computed regardless of graph geometry). They are used by visualization programs to compute graphical representations of mapping results. The first string to appear in a geometry file codes for its type, or dimensionality. It is “1” if the file contains unidimensional coordinates, “2” for bidimensional coordinates, and “3” for tridimensional coordinates. It is followed by the number of coordinate data stored in the file, which should be at least equal to the number of vertices of the associated graph or mesh, and by that many coordinate lines. Each coordinate line holds the label of the vertex, plus one, two or three real numbers which are the (X), (X,Y), or (X,Y,Z), coordinates of the graph vertices, according to the graph dimensionality. Vertices can be stored in any order in the file. Moreover, a geometry file can have more coordinate data than there are vertices in the associated graph or mesh file; only coordinates the labels of which match labels of graph or mesh vertices will be 21 taken into account. This feature allows all subgraphs of a given graph or mesh to share the same geometry file, provided that graph vertex labels remain unchanged. For example, Figure 6 shows the contents of the 3D geometry file associated with the graph of Figure 4. 3 8 0 1 2 3 4 5 6 7 0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 Figure 6: Geometry file associated with the graph file of Figure 4. 5.4 Target files Target files describe the architectures onto which source graphs are mapped. Instead of containing the structure of the target graph itself, as source graph files do, target files define how target graphs are bipartitioned and give the distances between all pairs of vertices (that is, processors). Keeping the bipartitioning information within target files avoids recomputing it every time a target architecture is used. We are allowed to do so because, in our approach, the recursive bipartitioning of the target graph is fully independent with respect to that of the source graph (however, the opposite is false). For space and time saving issues, some classical homogeneous architectures (2D and 3D meshes and tori, hypercubes, complete graphs, etc.) have been algorithmically coded within the mapper itself by the means of built-in functions. Instead of containing the whole graph decomposition data, their target files hold only a few values, used as parameters by the built-in functions. 5.4.1 Decomposition-defined architecture files Decomposition-defined architecture files are the standard way to describe weighted and/or irregular target architectures. Several file formats exist, but we only present here the most humanly readable one, which begins in “deco 0” (“deco” stands for “decomposition-defined” architecture, and “0” is the format type). The “deco 0” header is followed by two integer numbers, which are the number of processors and the largest terminal number used in the decomposition, respectively. Two arrays follow. The first array has as many lines as there are processors. Each of these lines holds three numbers: the processor label, the processor weight (that is an estimation of its computational power), and its terminal number. The terminal number associated with every processor is obtained by giving the initial domain holding all the processors number 1, and by numbering the two subdomains of a given domain of number i with numbers 2i and 2i + 1. The second array is a lower triangular diagonal-less matrix that gives the distance between all pairs of processors. This distance matrix, combined with the decomposition tree coded by terminal numbers, allows the evaluation by averaging of the distance between all pairs of domains. In order for the mapper to behave properly, distances between 22 processors must be strictly positive numbers. Therefore, null distances are not accepted. For instance, Figure 7 shows the contents of the architecture decomposition file for UB(2, 3), the binary de Bruijn graph of dimension 3, as computed by the amk grf program. 1 3 2 7 6 15 14 12 13 4 9 11 5 8 10 deco 0 8 15 0 1 15 1 1 14 2 1 13 3 1 11 4 1 12 5 1 9 6 1 8 7 1 10 1 2 1 2 1 2 1 1 1 2 3 2 1 1 2 2 2 2 1 1 1 3 2 3 1 2 2 1 Figure 7: Target decomposition file for UB(2, 3). The terminal numbers associated with every processor define a unique recursive bipartitioning of the target graph. 5.4.2 Algorithmically-coded architecture files All algorithmically-coded architectures are defined with unity edge and vertex weights. They start with an abbreviation name of the architecture, followed by parameters specific to the architecture. The available built-in architecture definitions are listed below. cmplt size Defines a complete graph with size vertices. The vertex labels are numbers between 0 and size − 1. cmpltw size load0 load1 . . . loadsize−1 Defines a weighted complete graph with size vertices. The vertex labels are numbers between 0 and size − 1, and vertices are assigned integer weights in the order in which these are provided. hcub dim Defines a binary hypercube of dimension dim. The vertex labels are the decimal values of the binary representations of the vertex coordinates in the hypercube. leaf height cluster weight Defines a tree-leaf architecture with height levels and 2height vertices. The tree-leaf graph models a machine the topology of which is a complete binary tree, such that leaves are processors and all other nodes are communication routers, as shown in Figure 8. Only the leaves are used to map processes, but distances between them are computed by considering the whole tree. This graph is used to represent multi-stage machines with constant bandwidth, 23 Figure 8: The “tree-leaf” graph of height 3. Processors are drawn in black and routers in grey. such as the CM-5 [37] for which experiments have shown that bandwidth is constant between every pair of processors and hardly depends on network congestion [39], or the SP-2 with power-of-two number of nodes. The two additional parameters cluster and weight serve to model heterogeneous architectures for which multiprocessor nodes having several highly interconnected processors (typically by means of shared memory) are linked by means of networks of lower bandwidth. cluster represents the number of levels to traverse, starting from the root of the leaf, before reaching the multiprocessors, each multiprocessor having 2height−cluster nodes. weight is the relative cost of extra-cluster links, that is, links in the upper levels of the tree-leaf graph. Links within clusters are assumed to have weight 1. When there are no clusters at all, that is, in the case of purely homogeneous architectures, set cluster to be equal to height, and weight to 1. mesh2D dimX dimY Defines a bidimensional array of dimX columns by dimY rows. The vertex with coordinates (posX, posY) has label posY × dimX + posX. mesh3D dimX dimY dimZ Defines a tridimensional array of dimX columns by dimY rows by dimZ levels. The vertex with coordinates (posX,posY,posZ ) has label (posZ × dimY + posY) × dimX + posX. torus2D dimX dimY Defines a bidimensional array of dimX columns by dimY rows, with wraparound edges. The vertex with coordinates (posX, posY) has label posY × dimX + posX. torus3D dimX dimY dimZ Defines a tridimensional array of dimX columns by dimY rows by dimZ levels, with wraparound edges. The vertex with coordinates (posX,posY,posZ ) has label (posZ × dimY + posY) × dimX + posX. 5.4.3 Variable-sized architecture files Variable-sized architectures are a class of algorithmically-coded architectures the size of which is not defined a priori. As for fixed-size algorithmically-coded architectures, they start with an abbreviation name of the architecture, followed by parameters specific to the architecture. The available built-in variable-sized architecture definitions are listed below. varcmplt Defines a variable-sized complete graph. Domains are labeled such that the first domain is labeled 1, and the two subdomains of any domain i are labeled 24 2i and 2i + 1. The distance between any two subdomains i and j is 0 if i = j and 1 else. varhcub Defines a variable-sized hypercube. Domains are labeled such that the first domain is labeled 1, and the two subdomains of any domain i are labeled 2i and 2i + 1. The distance between any two domains is the Hamming distance between the common bits of the two domains, plus half of the absolute difference between the levels of the two domains, this latter term modeling the average distance on unknown bits. For instance, the distance between subdomain 9 = 1001B , of level 3 (since its leftmost 1 has been shifted left thrice), and subdomain 53 = 110101B , of level 5 (since its leftmost 1 has been shifted left five times), is 2: it is 1, which is the number of bits which differ between 1101B (that is, 53 = 110101B shifted rightwards twice) and 1001B , plus 1, which is half of the absolute difference between 5 and 3. 5.5 Mapping files Mapping files, which usually end in “.map”, contain the result of the mapping of source graphs onto target architectures. They associate a vertex of the target graph with every vertex of the source graph. Mapping files begin with the number of mapping lines which they contain, followed by that many mapping lines. Each mapping line holds a mapping pair, made of two integer numbers which are the label of a source graph vertex and the label of the target graph vertex onto which it is mapped. Mapping pairs can be stored in any order in the file; however, labels of source graph vertices must be all different. For example, Figure 9 shows the result obtained when mapping the source graph of Figure 4 onto the target architecture of Figure 7. This one-to-one embedding of H(3) into UB(2, 3) has dilation 1, except for one hypercube edge which has dilation 3. 8 0 1 2 3 4 5 6 7 1 3 2 5 0 7 4 6 Figure 9: Mapping file obtained when mapping the hypercube source graph of Figure 4 onto the binary de Bruijn architecture of Figure 7. Mapping files are also used on output of the block orderer to represent the allocation of the vertices of the original graph to the column blocks associated with the ordering. In this case, column blocks are labeled in ascending order, such that the number of a block is always greater than the ones of its predecessors in the elimination process, that is, its leaves in the elimination tree. 5.6 Ordering files Ordering files, which usually end in “.ord”, contain the result of the ordering of source graphs or meshes that represent sparse matrices. They associate a number 25 with every vertex of the source graph or mesh. The structure of ordering files is analogous to the one of mapping files; they differ only by the meaning of their data. Ordering files begin with the number of ordering lines which they contain, that is the number of vertices in the source graph or the number of nodes in the source mesh, followed by that many ordering lines. Each ordering line holds an ordering pair, made of two integer numbers which are the label of a source graph or mesh vertex and its rank in the ordering. Ranks range from the base value of the graph or mesh (baseval) to the base value plus the number of vertices (resp. nodes), minus one (baseval + vertnbr − 1 for graphs, and baseval + vnodnbr − 1 for meshes). Ordering pairs can be stored in any order in the file; however, indices of source vertices must be all different. For example, Figure 10 shows the result obtained when reordering the source graph of Figure 4. 8 0 1 2 3 4 5 6 7 6 3 2 7 1 5 4 0 Figure 10: Ordering file obtained when reordering the hypercube graph of Figure 4. The advantage of having both graph and mesh orderings start from baseval (and not vnodbas in the case of meshes) is that an ordering computed on the nodal graph of some mesh has the same structure as an ordering computed from the native mesh structure, allowing for greater modularity. However, in memory, permutation indices for meshes are numbered from vnodbas to vnodbas + vnodnbr − 1. 5.7 Vertex list files Vertex lists are used by programs that select vertices from graphs. Vertex lists are coded as lists of integer numbers. The first integer is the number of vertices in the list and the other integers are the labels of the selected vertices, given in any order. For example, Figure 11 shows the list made from three vertices of labels 2, 45, and 7. 3 2 45 7 Figure 11: Example of vertex list with three vertices of labels 2, 45, and 7. 6 Programs The programs of the Scotch project belong to five distinct classes. • Graph handling programs, the names of which begin in “g”, that serve to build and test source graphs. 26 • Mesh handling programs, the names of which begin in “m”, that serve to build and test source meshes. • Target architecture handling programs, the names of which begin in “a”, that allow the user to build and test decomposition-defined target files, and especially to turn a source graph file into a target file. • The mapping and ordering programs themselves. • Output handling programs, which are the mapping performance analyzer, the graph factorization program, and the graph, matrix, and mapping visualization program. The general architecture of the Scotch project is displayed in Figure 12. 6.1 Invocation The programs comprising the Scotch project have been designed to run in command-line mode without any interactive prompting, so that they can be called easily from other programs by means of “system ()” or “popen ()” system calls, or be piped together on a single shell command line. In order to facilitate this, whenever a stream name is asked for (either on input or output), the user may put a single “-” to indicate standard input or output. Moreover, programs read their input in the same order as stream names are given in the command line. It allows them to read all their data from a single stream (usually the standard input), provided that these data are ordered properly. A brief on-line help is provided with all the programs. To get this help, use the “-h” option after the program name. The case of option letters is not significant, except when both the lower and upper cases of a letter have different meanings. When passing parameters to the programs, only the order of file names is significant; options can be put anywhere in the command line, in any order. Examples of use of the different programs of the Scotch project are provided in section 9. Error messages are standardized, but may not be fully explanatory. However, most of the errors you may run into should be related to file formats, and located in “...Load” routines. In this case, compare your data formats with the definitions given in section 5, and use the gtst and mtst programs to check the consistency of source graphs and meshes. 6.2 Using compressed files Starting from version 5.0.6, Scotch allows users to provide and retrieve data in compressed form. Since this feature requires that the compression and decompression tasks run in the same time as data is read or written, it can only be done on systems which support multi-threading (Posix threads) or multi-processing (by means of fork system calls). To determine if a stream has to be handled in compressed form, Scotch checks its extension. If it is “.gz” (gzip format), “.bz2” (bzip2 format) or “.lzma” (lzma format), the stream is assumed to be compressed according to the corresponding format. A filter task will then be used to process it accordingly if the format is implemented in Scotch and enabled on your system. To date, data can be read and written in bzip2 and gzip formats, and can also be read in the lzma format. Since the compression ratio of lzma on Scotch graphs is 30% better than the one of gzip and bzip2 (which are almost equivalent 27 External mesh file External graph file mcv gcv mmk_* gmk_* Geometry file amk_* .xyz Source mesh file gmk_msh .msh mtst Source graph file Target file amk_grf .grf mord gord gtst gmap Ordering file Mapping file .ord .map gotst acpl .tgt gout atst gmtst File Graphics file Program Data flow Figure 12: General architecture of the Scotch project. All of the features offered by the stand-alone programs are also available in the libScotch library. 28 in this case), the lzma format is a very good choice for handling very large graphs. To see how to enable compressed data handling in Scotch, please refer to Section 8. When the compressed format allows it, several files can be provided on the same stream, and be uncompressed on the fly. For instance, the command “cat brol.grf.gz brol.xyz.gz | gout -.gz -.gz -Mn - brol.iv” concatenates the topology and geometry data of some graph brol and feed them as a single compressed stream to the standard input of program gout, hence the ”-.gz” to indicate a compressed standard stream. 6.3 6.3.1 Description acpl Synopsis acpl [input target file [output target file]] options Description The program acpl is the decomposition-defined architecture file compiler. It processes architecture files of type “deco 0” built by hand or by the amk * programs, to create a “deco 1” compiled architecture file of about four times the size of the original one; see section 5.4.1, page 22, for a detailed description of decomposition-defined target architecture file formats. The mapper can read both original and compiled architecture file formats. However, compiled architecture files are read much more efficiently, as they are directly loaded into memory without further processing. Since the compilation time of a target architecture graph evolves as the square of its number of vertices, precompiling with acpl can save some time when many mappings are to be performed onto the same large target architecture. Options 6.3.2 -h Display the program synopsis. -V Print the program version and copyright. amk * Synopsis amk ccc dim [output target file] options amk fft2 dim [output target file] options amk hy dim [output target file] options amk m2 dimX [dimY [output target file]] options amk p2 weight0 [weight1 [output target file]] options Description The amk * programs make target graphs. Each of them is devoted to a 29 specific topology, for which it builds target graphs of any dimension. These programs are an alternate way between algorithmically-coded built-in target architectures and decompositions computed by mapping with amk grf. Like built-in target architectures, their decompositions are algorithmically computed, and like amk grf, their output is a decomposition-defined target architecture file. These programs allow the definition and testing of new algorithmically-coded target architectures without coding them in the core of the mapper. Program amk ccc outputs the target architecture file of a Cube-ConnectedCycles graph of dimension dim. Vertex (l, m) of CCC(dim), with 0 ≤ l < dim and 0 ≤ m < 2dim , is linked to vertices ((l − 1) mod dim, m), ((l + 1) mod dim, m), and (l, m ⊕ 2l ), and is labeled l × 2dim + m. ⊕ denotes the bitwise exclusive-or binary operator, and a mod b the integer remainder of the euclidian division of a by b. Program amk fft2 outputs the target architecture file of a binary FastFourier-Transform graph of dimension dim. Vertex (l, m) of FFT(dim), with 0 ≤ l ≤ dim and 0 ≤ m < 2dim , is linked to vertices (l − 1, m), (l − 1, m mod 2l−1 ), (l + 1, m), and (l + 1, m ⊕ 2l ), if they exist, and is labeled l × 2dim + m. Program amk hy outputs the target architecture file of a hypercube graph of dimension dim. Vertices are labeled according to the decimal value of their binary representation. The decomposition-defined target architectures computed by amk hy do not exactly give the same results as the built-in hypercube targets because distances are not computed in the same manner, although the two recursive bipartitionings are identical. To achieve best performance and save space, use the built-in architecture. Program amk p2 outputs the target architecture file of a weighted path graph with two vertices, the weights of which are given as parameters. This simple target topology is used to bipartition a source graph into two weighted parts with as few cut edges as possible. In particular, it is used to compute independent partitions of the processors of a multi-user parallel machine. As a matter of fact, if the yet unallocated part of the machine is represented by a source graph with n vertices, and n′ processors are requested by a user in order to run a job (with n′ ≤ n), mapping the source graph onto the weighted path graph with two vertices of weights n′ and n − n′ leads to a partition of the machine in which the allocated n′ processors should be as densely connected as possible (see Figure 13). Options -h Display the program synopsis. -mmethod Select the bipartitioning method (for amk m2 only). n Nested dissection. o Dimension-per-dimension one-way dissection. This is less efficient than nested dissection, and this feature exists only for benchmarking purposes. 30 a. Construction of a partition with 13 vertices (in black) on a 8 × 8 bidimensional mesh architecture. b. Construction of a partition with 17 vertices (in black) on the remaining architecture. Figure 13: Construction of partitions on a bidimensional 8 × 8 mesh architecture by weighted bipartitioning. -V Print the program version and copyright. 6.3.3 amk grf Synopsis amk grf [input graph file [output target file]] options Description The program amk grf turns a source graph file into a decomposition-defined target file. It computes a recursive bipartitioning of the source graph, as well as the array of distances between all pairs of its vertices, both of which are combined to give a decomposition-defined target architecture of same topology as the input source graph. The -l option restricts the target architecture to the vertices indicated in the given vertex list file. It is therefore possible to build a target architecture made of several disconnected parts of a bigger architecture. Note that this is not equivalent to turning a disconnected source graph into a target architecture, since doing so would lead to an architecture made of several independent pieces at infinite distance one from another. Considering the selected vertices within their original architecture makes it possible to compute the distance between vertices belonging to distinct connected components, and therefore to evaluate the cost of the mapping of two neighbor processes onto disjoint areas of the architecture. The restriction feature is very useful in the context of multi-user parallel machines. On these machines, when users request processors in order to run their jobs, the partitions allocated by the operating system may not be regular nor connected, because of existing partitions already attributed to other people. By feeding amk grf with the source graph representing the whole parallel machine, and the vertex list containing the labels of the processors allocated by the operating system, it is possible to build a target architecture corresponding to this partition, and therefore to map processes on it, automatically, regardless of the partition shape. The -b option selects the recursive bipartitioning strategy used to build the 31 decomposition of the source graph. For regular, unweighted, topologies, the ’-b(g|h)fx’ recursive bipartitioning strategy should work best. For irregular or weighted graphs, use the default strategy, which is more flexible. See also the manual page of function SCOTCH archBuild, page 67, for further information. Options -bstrategy Use recursive bipartitioning strategy strategy to build the decomposition of the architecture graph. The format of bipartitioning strategies is defined within section 7.3.1, at page 55. -h Display the program synopsis. -linput vertex file Load vertex list from input vertex file. As for all other file names, “-” may be used to indicate standard input. -V 6.3.4 Print the program version and copyright. atst Synopsis atst [input target file [output data file]] options Description The program atst is the architecture tester. It gives some statistics on decomposition-defined target architectures, and in particular the minimum, maximum, and average communication costs (that is, weighted distance) between all pairs of processors. Options 6.3.5 -h Display the program synopsis. -V Print the program version and copyright. gcv Synopsis gcv [input graph file [output graph file [output geometry file]]] options Description The program gcv is the source graph converter. It takes on input a graph file of the format specified with the -i option, and outputs its equivalent in the format specified with the -o option, along with its associated geometry file whenever geometry data is available. At the time being, it accepts four input formats: the Matrix Market format [5], the Harwell-Boeing collection format [10], the Chaco/MeTiS graph format [24], and the Scotch format. Three output format are available: the Matrix Market format, the Chaco/MeTiS graph format and the Scotch source graph and geometry data format. 32 Options -h Display the program synopsis. -iformat Specify the type of input graph. The available input formats are listed below. b[number] Harwell-Boeing graph collection format. Only symmetric assembled matrices are currently supported. Since files in this format can contain several graphs one after another, the optional integer number, starting from 0, indicates which graph of the file is considered for conversion. c Chaco v1.0/MeTiS format. m The Matrix Market format. s Scotch source graph format. -oformat Specify the output graph format. The available output formats are listed below. c m s -V Chaco v1.0/MeTiS format. The Matrix Market format. Scotch source graph format. Print the program version and copyright. Default option set is “-Ib0 -Os”. 6.3.6 gmap Synopsis gmap [input graph file [input target file [output mapping file [output log file]]]] options Description The program gmap is the graph mapper. It uses a partitioning strategy to map a source graph onto a target graph, so that the weight of source graph vertices allocated to target vertices is balanced, and the communication cost function fC is minimized. The implemented mapping methods mainly derive from graph theory. In particular, graph geometry is never used, even if it is available; only topological properties are taken into account. Mapping methods are used to define mapping strategies by means of selection, combination, grouping, and condition operators. The only mapping method implemented in version 5.1 is the Dual Recursive Bipartitioning algorithm, which uses graph bipartitioning methods. Available bipartitioning methods include a multi-level algorithm that uses other bipartitioning methods to compute the initial and refined bipartitions, an improved implementation of the Fiduccia–Mattheyses heuristic designed to handle weighted graphs, a greedy method derived from the Gibbs, Poole, and 33 Stockmeyer algorithm, the greedy graph growing heuristic, and a greedy “exactifying” refinement algorithm designed to balance vertex loads as much as possible; random and backtracking methods are also provided. The -m option allows the user to define the mapping strategy. If mapping statistics are wanted rather than the mapping output itself, mapping output can be set to /dev/null, with option -vmt to get mapping statistics and timings. Options Since the program is devoted to experimental studies, it has many optional parameters, used to test various execution modes. Values set by default will give best results in most cases. -h Display the program synopsis. -mstrat Apply mapping strategy strat. The format of mapping strategies is defined in section 7.3.1. -sobj Mask source edge and vertex weights. This option allows the user to “unweight” weighted source graphs by removing weights from edges and vertices at loading time. obj may contain several of the following switches. e v -V Remove edge weights, if any. Remove vertex weights, if any. Print the program version and copyright. -vverb Set verbose mode to verb, which may contain several of the following switches. For a detailed description of the data displayed, please refer to the manual page of gmtst below. m s t -V 6.3.7 Mapping information. Strategy information. This parameter displays the default mapping strategy used by gmap. Timing information. Print the program version and copyright. gmk * Synopsis gmk hy dim [output graph file] options gmk m2 dimX [dimY [output graph file]] options gmk m3 dimX [dimY [dimZ [output graph file]]] options gmk ub2 dim [output graph file] options Description The gmk * programs make source graphs. Each of them is devoted to a specific topology, for which it builds target graphs of any dimension. 34 The gmk * programs are mainly used in conjunction with amk grf. Most gmk * programs build source graphs describing parallel machines, which are used by amk grf to generate corresponding target sub-architectures, by means of its -l option. Such a procedure is shown in section 9, which builds a target architecture from five vertices of a binary de Bruijn graph of dimension 3. Program gmk hy outputs the source file of a hypercube graph of dimension dim. Vertices are labeled according to the decimal value of their binary representation. Program gmk m2 outputs the source file of a bidimensional mesh with dimX columns and dimY rows. If the -t option is set, tori are built instead of meshes. The vertex of coordinates (posX, posY ) is labeled posY×dimX+posX. Program gmk m3 outputs the source file of a tridimensional mesh with dimZ layers of dimY rows by dimX columns. If the -t option is set, tori are built instead of meshes. The vertex of coordinates (posX, posY ) is labeled (posZ × dimY + posY) × dimX + posX. Program gmk ub2 outputs the source file of a binary unoriented de Bruijn graph of dimension dim. Vertices are labeled according to the decimal value of their binary representation. Options -goutput geometry file Output graph geometry to file output geometry file (for gmk m2 only). As for all other file names, “-” may be used to indicate standard output. 6.3.8 -h Display the program synopsis. -t Build a torus rather than a mesh (for gmk m2 only). -V Print the program version and copyright. gmk msh Synopsis gmk msh [input mesh file [output graph file]] options Description The gmk msh program builds a graph file from a mesh file. All of the nodes of the mesh are turned into graph vertices, and edges are created between all pairs of vertices that share an element (that is, elements are turned into cliques). Options -h Display the program synopsis. -V Print the program version and copyright. 35 6.3.9 gmtst Synopsis gmtst [input graph file put data file]]]] options [input target file [input mapping file [out- Description The program gmtst is the graph mapping tester. It outputs some statistics on the given mapping, regarding load balance and inter-processor communication. The two first statistics lines deal with process mapping statistics, while the following ones deal with communication statistics. The first mapping line gives the number of processors used by the mapping, followed by the number of processors available in the architecture, and the ratio of these two numbers, written between parentheses. The second mapping line gives the minimum, maximum, and average loads of the processors, followed by the variance of the load distribution, and an imbalance ratio equal to the maximum load over the average load. The first communication line gives the minimum and maximum number of neighbors over all blocks of the mapping, followed by the sum of the number of neighbors over all blocks of the mapping, that is the total number of messages that have to be sent to exchange data between all neighboring blocks. The second communication line gives the average dilation of the edges, followed by the sum of all edge dilations. The third communication line gives the average expansion of the edges, followed by the value of function fC . The fourth communication line gives the average cut of the edges, followed by the number of cut edges. The fifth communication line shows the ratio of the average expansion over the average dilation; it is smaller than 1 when the mapper succeeds in putting heavily intercommunicating processes closer to each other than it does for lightly communicating processes; it is equal to 1 if all edges have the same weight. The remaining lines form a distance histogram, which shows the amount of communication load that involves processors located at increasing distances. gmtst allows the testing of cross-architecture mappings. By inputing it a target architecture different from the one that has been used to compute the mapping, but with compatible vertex labels, one can see what the mapping would yield on this new target architecture. Options -h Display the program synopsis. -V Print the program version and copyright. 6.3.10 gord Synopsis gord [input graph file [output ordering file [output log file]]] options Description The gord program is the block sparse matrix graph orderer. It uses an ordering strategy to compute block orderings of sparse matrices represented as 36 source graphs, whose vertex weights indicate the number of DOFs per node (if this number is non homogeneous) and whose edges are unweighted, in order to minimize fill-in and operation count. Since its main purpose is to provide orderings that exhibit high concurrency for parallel block factorization, it comprises a nested dissection method [17], but classical [41] and state-of-the-art [1, 49] minimum degree algorithms are implemented as well. Ordering methods are used to define ordering strategies by means of selection, grouping, and condition operators. For the nested dissection method, vertex separation methods comprise algorithms that directly compute vertex separators, as well as methods that build vertex separators from edge separators, i.e. graph bipartitions (all of the graph bipartitioning methods available in the static mapper gmap can be used in this latter case). The -o option allows the user to define the ordering strategy. When the graphs to order are very large, the same results can be obtained by using the dgord parallel program of the PT-Scotch distribution, which can read centralized graph files too. Options Since the program is devoted to experimental studies, it has many optional parameters, used to test various execution modes. Values set by default will give best results in most cases. -h Display the program synopsis. -moutput mapping file Write to output mapping file the mapping of graph vertices to column blocks. All of the separators and leaves produced by the nested dissection method are considered as distinct column blocks, which may be in turn split by the ordering methods that are applied to them. Distinct integer numbers are associated with each of the column blocks, such that the number of a block is always greater than the ones of its predecessors in the elimination process, that is, its descendants in the elimination tree. The structure of mapping files is given in section 5.5. When the geometry of the graph is available, this mapping file may be processed by program gout to display the vertex separators and supervariable amalgamations that have been computed. -ostrat Apply ordering strategy strat. The format of ordering strategies is defined in section 7.3.3. -toutput tree file Write to output tree file the structure of the separator tree. The data that is written resembles much the one of a mapping file: after a first line that contains the number of lines to follow, there are that many lines of mapping pairs, which associate an integer number with every graph vertex index. This integer number is the number of the column block which is the parent of the column block to which the vertex belongs, or −1 if the column block to which the vertex belongs is a root of the separator tree (there can be several roots, if the graph is disconnected). Combined to the column block mapping data produced by option -m, the tree structure allows one to rebuild the separator tree. 37 Print the program version and copyright. -V -vverb Set verbose mode to verb, which may contain several of the following switches. s t 6.3.11 Strategy information. This parameter displays the default ordering strategy used by gord. Timing information. gotst Synopsis gotst [input graph file [input ordering file [output data file]]] options Description The program gotst is the ordering tester. It gives some statistics on orderings, including the number of non-zeros and the operation count of the factored matrix, as well as statistics regarding the elimination tree. Since it performs the factorization of the reordered matrix, it can take a very long time and consume a large amount of memory when applied to large graphs. The first two statistics lines deal with the elimination tree. The first one displays the number of leaves, while the second shows the minimum height of the tree (that is, the length of the shortest path from any leaf to the –or a– root node), its maximum height, its average height, and the variance of the heights with respect to the average. The third line displays the number of nonzero terms in the factored matrix, the amount of index data that is necessary to maintain the block structure of the factored matrix, and the number of operations required to factor the matrix by means of Cholesky factorization. Options -h Display the program synopsis. -V Print the program version and copyright. 6.3.12 gout Synopsis gout [input graph file [input geometry file visualization file]]]] options [input mapping file [output Description The gout program is the graph, matrix, and mapping viewer program. It takes on input a source graph, its geometry file, and optionally a mapping result file, and produces a file suitable for display. At the time being, gout can generate plain and encapsulated PostScript files for the display of adjacency matrix patterns and the display of planar graphs (although tridimensional objects can be displayed by means of isometric projection, the display of tridimensional mappings is not efficient), and Open Inventor files [42] for the interactive visualization of tridimensional graphs. In the case of mapping display, the number of mapping pairs contained in the 38 input mapping file may differ from the number of vertices of the input source graph; only mapping pairs the source labels of which match labels of source graph vertices will be taken into account for display. This feature allows the user to show the result of the mapping of a subgraph drawn on the whole graph, or else to outline the most important aspects of a mapping by restricting the display to a limited portion of the graph. For example, Figure 14.b shows how the result of the mapping of a subgraph of the bidimensional mesh M2 (4, 4) onto the complete graph K(2) can be displayed on the whole M2 (4, 4) graph, and Figure 14.c shows how the display of the same mapping can be restricted to a subgraph of the original graph. a. A subgraph of M2 (4, 4) to be mapped onto K(2). b. Mapping result displayed on the full M2 (4, 4) graph. c. Mapping result displayed on another subgraph of M2 (4, 4). Figure 14: PostScript diplay of a single mapping file with different subgraphs of the same source graph. Vertices covered with disks of the same color are mapped onto the same processor. Options -gparameters Geometry parameters. n Do not read geometry data. This option can be used in conjunction with option -om to avoid reading the geometry file when displaying the pattern of the adjacency matrix associated with the source graph, since geometry data are not needed in this case. If this option is set, the geometry file is not read. However, if an output visualization file name is given in the command line, dummy input geometry file and input mapping file names must be specified so that the file argument 39 Figure 15: Snapshot of an Open Inventor display of a sphere partitioned into 7 almost equal pieces by mapping onto the complete graph with 7 vertices. Vertices of same color are mapped onto the same processor. r -h count is correct. In this case, use the “-” parameter to take standard input as a dummy geometry input stream. In practice, the -om and -gn options always imply the -mn option. For bidimensional geometry only, rotate geometry data by 90 degrees, counter-clockwise. Display the program synopsis. -mn Do not read mapping data, and display the graph without any mapping information. If this option is set, the mapping file is not read. However, if an output visualization file name is given in the command line, a dummy input mapping file name must be specified so that the file argument count is correct. In this case, use the “-” parameter to take standard input as a dummy mapping input stream. -oformat[{parameters}] Specify the type of output, with optional parameters within curly braces and separated by commas. The output formats are listed below. i Output the graph in SGI’s Open Inventor format, in ASCII mode, suitable for display by the ivview program [42]. The optional parameters are given below. c Color output, using 16 different colors. Opposite of g. g Grey-level output, using 8 different levels. Opposite of c. r Remove cut edges. Edges the ends of which are mapped onto different processors are not displayed. Opposite of v. v View cut edges. All graph edges are displayed. Opposite of r. 40 Output the pattern of the adjacency matrix associated with the source graph, in Adobe’s PostScript format. The optional parameters are given below. e Encapsulated PostScript output, suitable for LATEX use with epsf. Opposite of f. f Full-page PostScript output, suitable for direct printing. Opposite of e. p Output the graph in Adobe’s PostScript format. The optional parameters are given below. a Avoid displaying the mapping disks. Opposite of d. c Color PostScript output, using 16 different colors. Opposite of g. d Display the mapping disks. Opposite of a. e Encapsulated PostScript output, suitable for LATEX use with epsf. Opposite of f. f Full-page PostScript output, suitable for direct printing. Opposite of e. g Grey-level PostScript output. Opposite of c. l Large clipping. Mapping disks are included in the clipping area computation. Opposite of s. r Remove cut edges. Edges the ends of which are mapped onto different processors are not displayed. Opposite of v. s Small clipping. Mapping disks are excluded from the clipping area computation. Opposite of l. v View cut edges. All graph edges are displayed. Opposite of r. x=val Minimum X relative clipping position (in [0.0;1.0]). X=val Maximum X relative clipping position (in [0.0;1.0]). y=val Minimum Y relative clipping position (in [0.0;1.0]). Y=val Maximum Y relative clipping position (in [0.0;1.0]). -V Print the program version and copyright. m Default option set is “-Oi{v}”. 6.3.13 gtst Synopsis gtst [input graph file [output data file]] options Description The program gtst is the source graph tester. It checks the consistency of the input source graph structure (matching of arcs, number of vertices and edges, etc.), and gives some statistics regarding edge weights, vertex weights, and vertex degrees. 41 When the graphs to test are very large, the same results can be obtained by using the dgtst parallel program of the PT-Scotch distribution, which can read centralized graph files too. Options -h Display the program synopsis. -V Print the program version and copyright. 6.3.14 mcv Synopsis mcv [input mesh file [output mesh file [output geometry file]]] options Description The program mcv is the source mesh converter. It takes on input a mesh file of the format specified with the -i option, and outputs its equivalent in the format specified with the -o option, along with its associated geometry file whenever geometrical data is available. At the time being, it only accepts one external input format: the Harwell-Boeing format [10], for square elemental matrices only. The only output format to date is the Scotch source mesh and geometry data format. Options -h Display the program synopsis. -iformat Specify the type of input mesh. The available input formats are listed below. b[number] Harwell-Boeing mesh collection format. Only symmetric elemental matrices are currently supported. Since files in this format can contain several meshes one after another, the optional integer number, starting from 0, indicates which mesh of the file is considered for conversion. s Scotch source mesh format. -oformat Specify the output graph format. The available output formats are listed below. s -V Scotch source graph format. Print the program version and copyright. Default option set is “-Ib0 -Os”. 6.3.15 mmk * Synopsis mmk m2 dimX [dimY [output mesh file]] options mmk m3 dimX [dimY [dimZ [output mesh file]]] options 42 Description The mmk * programs make source meshes. Program mmk m2 outputs the source file of a bidimensional mesh with dimX × dimY elements and (dimX + 1) × (dimY + 1) nodes. The element of coordinates (posX, posY ) is labeled posY × dimX + posX. Program mmk m3 outputs the source file of a tridimensional mesh with dimX × dimY × dimZ elements and (dimX + 1) × (dimY + 1) × (dimZ + 1) nodes. Options -goutput geometry file Output mesh geometry to file output geometry file (for mmk m2 only). As for all other file names, “-” may be used to indicate standard output. -h Display the program synopsis. -V Print the program version and copyright. 6.3.16 mord Synopsis mord [input mesh file [output ordering file [output log file]]] options Description The mord program is the block sparse matrix mesh orderer. It uses an ordering strategy to compute block orderings of sparse matrices represented as source meshes, whose node vertex weights indicate the number of DOFs per node (if this number is non homogeneous), in order to minimize fill-in and operation count. Since its main purpose is to provide orderings that exhibit high concurrency for parallel block factorization, it comprises a nested dissection method [17], but classical [41] and state-of-the-art [1, 49] minimum degree algorithms are implemented as well. Ordering methods are used to define ordering strategies by means of selection, grouping, and condition operators. The -o option allows the user to define the ordering strategy. Options Since the program is devoted to experimental studies, it has many optional parameters, used to test various execution modes. Values set by default will give best results in most cases. -h Display the program synopsis. -moutput mapping file Write to output mapping file the mapping of mesh node vertices to column blocks. All of the separators and leaves produced by the nested dissection method are considered as distinct column blocks, which may 43 be in turn split by the ordering methods that are applied to them. Distinct integer numbers are associated with each of the column blocks, such that the number of a block is always greater than the ones of its predecessors in the elimination process, that is, its leaves in the elimination tree. The structure of mapping files is given in section 5.5. When the coordinates of the node vertices are available, the mapping file may be processed by program gout, along with the graph structure that can be created from the source mesh file by means of the gmk msh program, to display the node vertex separators and supervariable amalgamations that have been computed. -ostrat Apply ordering strategy strat. The format of ordering strategies is defined in section 7.3.3. -toutput tree file Write to output tree file the structure of the separator tree. The data that is written resembles much the one of a mapping file: after a first line that contains the number of lines to follow, there are that many lines of mapping pairs, which associate an integer number with every node vertex index. This integer number is the number of the column block which is the parent of the column block to which the node vertex belongs, or −1 if the column block to which the node vertex belongs is a root of the separator tree (there can be several roots, if the mesh is disconnected). Combined to the column block mapping data produced by option -m, the tree structure allows one to rebuild the separator tree. Print the program version and copyright. -V -vverb Set verbose mode to verb, which may contain several of the following switches. s t 6.3.17 Strategy information. This parameter displays the default ordering strategy used by mord. Timing information. mtst Synopsis mtst [input mesh file [output data file]] options Description The program mtst is the source mesh tester. It checks the consistency of the input source mesh structure (matching of arcs that link elements to nodes and nodes to elements, number of elements, nodes, and edges, etc.), and gives some statistics regarding element and node weights, edge weights, and element and node degrees. Options -h Display the program synopsis. -V Print the program version and copyright. 44 7 Library All of the features provided by the programs of the Scotch distribution may be directly accessed by calling the appropriate functions of the libScotch library, archived in files libscotch.a and libscotcherr.a. These routines belong to six distinct classes: • source graph and source mesh handling routines, which serve to declare, build, load, save, and check the consistency of source graphs and meshes, along with their geometry data; • target architecture handling routines, which allow the user to declare, build, load, and save target architectures; • strategy handling routines, which allow the user to declare and build mapping and ordering strategies; • mapping routines, which serve to declare, compute, and save mappings of source graphs to target architectures by means of mapping strategies; • ordering routines, which allow the user to declare, compute, and save orderings of source graphs and meshes; • error handling routines, which allow the user either to provide his own error servicing routines, or to use the default routines provided in the libScotch distribution. A MeTiS compatibility library, called libscotchmetis.a, is also available. It allows users who were previously using MeTiS in their software to take advantage of the efficieny of Scotch without having to modify their code. The services provided by this library are described in Section 7.14. 7.1 7.1.1 Calling the routines of libScotch Calling from C All of the C routines of the libScotch library are prefixed with “SCOTCH ”. The remainder of the function names is made of the name of the type of object to which the functions apply (e.g. “graph”, “mesh”, “arch”, “map”, etc.), followed by the type of action performed on this object: “Init” for the initialization of the object, “Exit” for the freeing of its internal structures, “Load” for loading the object from a stream, and so on. Typically, functions that return an error code return zero if the function succeeds, and a non-zero value in case of error. For instance, the SCOTCH graphInit and SCOTCH graphLoad routines, described in sections 7.5.1 and 7.5.4, respectively, can be called from C by using the following code. #include <stdio.h> #include "scotch.h" ... SCOTCH_Graph grafdat; FILE * fileptr; if (SCOTCH_graphInit (&grafdat) != 0) { 45 ... /* Error handling */ } if ((fileptr = fopen ("brol.grf", "r")) == NULL) { ... /* Error handling */ } if (SCOTCH_graphLoad (&grafdat, fileptr, -1, 0) != 0) { ... /* Error handling */ } ... Since “scotch.h” uses several system objects which are declared in “stdio.h”, this latter file must be included beforehand in your application code. Although the “scotch.h” and “ptscotch.h” files may look very similar on your system, never mistake them, and always use the “scotch.h” file as the include file for compiling a program which uses only the sequential routines of the libScotch library. 7.1.2 Calling from Fortran The routines of the libScotch library can also be called from Fortran. For any C function named SCOTCH typeAction() which is documented in this manual, there exists a SCOTCHFTYPEACTION () Fortran counterpart, in which the separating underscore character is replaced by an “F”. In most cases, the Fortran routines have exactly the same parameters as the C functions, save for an added trailing INTEGER argument to store the return value yielded by the function when the return type of the C function is not void. Since all the data structures used in libScotch are opaque, equivalent declarations for these structures must be provided in Fortran. These structures must therefore be defined as arrays of DOUBLEPRECISIONs, of sizes given in file scotchf.h, which must be included whenever necessary. For routines which read or write data using a FILE * stream in C, the Fortran counterpart uses an INTEGER parameter which is the numer of the Unix file descriptor corresponding to the logical unit from which to read or write. In most Unix implementations of Fortran, standard descriptors 0 for standard input (logical unit 5), 1 for standard output (logical unit 6) and 2 for standard error are opened by default. However, for files that are opened using OPEN statements, an additional function must be used to obtain the number of the Unix file descriptor from the number of the logical unit. This function is called FNUM in most Unix implementations of Fortran. For instance, the SCOTCH graphInit and SCOTCH graphLoad routines, described in sections 7.5.1 and 7.5.4, respectively, can be called from Fortran by using the following code. INCLUDE "scotchf.h" DOUBLEPRECISION GRAFDAT(SCOTCH_GRAPHDIM) INTEGER RETVAL ... CALL SCOTCHFGRAPHINIT (GRAFDAT (1), RETVAL) IF (RETVAL .NE. 0) THEN ... OPEN (10, FILE=’brol.grf’) 46 CALL SCOTCHFGRAPHLOAD (GRAFDAT (1), FNUM (10), 1, 0, RETVAL) CLOSE (10) IF (RETVAL .NE. 0) THEN ... Although the “scotchf.h” and “ptscotchf.h” files may look very similar on your system, never mistake them, and always use the “scotchf.h” file as the include file for compiling a program which uses only the sequential routines of the libScotch library. 7.1.3 Compiling and linking The compilation of C or Fortran routines which use routines of the libScotch library requires that either scotch.h or scotchf.h be included, respectively. The routines of the libScotch library are grouped in a library file called libscotch.a. Default error routines that print an error message and exit are provided in library file libscotcherr.a. Therefore, the linking of applications that make use of the libScotch library with standard error handling is carried out by using the following options: “-lscotch -lscotcherr -lm”. If you want to handle errors by yourself, you should not link with library file libscotcherr.a, but rather provide a SCOTCH errorPrint() routine. Please refer to section 7.12 for more information. 7.1.4 Machine word size issues Graph indices are represented in Scotch as integer values of type SCOTCH Num. By default, this type is equivalent to the int C type, that is, an integer type of size equal to the one of the machine word. However, it can represent any other integer type. To coerce the length of the Scotch integer type to 32 or 64 bits, one can use the INTSIZE32 or INTSIZE64 flags, respectively, or else the “-DINT=” definition, at compile time. This feature can be used to allow Scotch to handle large graphs on 32-bit architectures. If the SCOTCH Num type is set to represent a 64-bit integer type, all graph indices will be 64-bit integers, while function error codes will still be traditional 32-bit integers. One must therefore be careful when using the Fortran interface of Scotch. In the manual pages of the libScotch routines, all Fortran prototypes are given with both graph indices and return values specified as plain INTEGERs. In practice, when Scotch is compiled to use 64-bit SCOTCH Nums and 32-bit ints, graph indices should be declared as INTEGER*8, while integer error codes should still be declared as INTEGER*4 values. These discrepancies are not a problem if Scotch is compiled such that all ints are 64-bit integers. In this case, there is no need to use any type coercing definition. Also, the MeTiS compatibility library provided by Scotch will not work when SCOTCH Nums are not ints, since the interface of MeTiS uses regular ints to represent graph indices. In addition to compile-time warnings, an error message will be issued when one of these routines is called. 47 7.2 Data formats All of the data used in the libScotch interface are of integer type SCOTCH Num. To hide the internals of Scotch to callers, all of the data structures are opaque, that is, declared within scotch.h as dummy arrays of double precision values, for the sake of data alignment. Accessor routines, the names of which end in “Size” and “Data”, allow callers to retrieve information from opaque structures. In all of the following, whenever arrays are defined, passed, and accessed, it is assumed that the first element of these arrays is always labeled as baseval, whether baseval is set to 0 (for C-style arrays) or 1 (for Fortran-style arrays). Scotch internally manages with base values and array pointers so as to process these arrays accordingly. 7.2.1 Architecture format Target architecture structures are completely opaque. The only way to describe an architecture is by means of a graph passed to the SCOTCH archBuild routine. 7.2.2 Graph format Source graphs are described by means of adjacency lists. The description of a graph requires several SCOTCH Num scalars and arrays, as shown in Figures 16 and 17. They have the following meaning: baseval Base value for all array indexings. vertnbr Number of vertices in graph. edgenbr Number of arcs in graph. Since edges are represented by both of their ends, the number of edge data in the graph is twice the number of graph edges. verttab Array of start indices in edgetab of vertex adjacency sub-arrays. vendtab Array of after-last indices in edgetab of vertex adjacency sub-arrays. For any vertex i, with baseval ≤ i < (baseval+ vertnbr), vendtab[i] − verttab[i] is the degree of vertex i, and the indices of the neighbors of i are stored in edgetab from edgetab[verttab[i]] to edgetab[vendtab[i]−1], inclusive. When all vertex adjacency lists are stored in order in edgetab, it is possible to save memory by not allocating the physical memory for vendtab. In this case, illustrated in Figure 16, verttab is of size vertnbr + 1 and vendtab points to verttab + 1. This case is referred to as the “compact edge array” case, such that verttab is sorted in ascending order, verttab[baseval] = baseval and verttab[baseval + vertnbr] = (baseval + edgenbr). velotab Optional array, of size vertnbr, holding the integer load associated with every vertex. 48 baseval 1 vertnbr 7 4 3 1 4 4 2 1 2 1 4 1 edgenbr 24 1 2 2 3 1 vlbltab velotab 2 3 6 4 1 4 4 4 4 4 3 4 7 4 1 5 4 vendtab verttab 1 4 10 13 16 19 22 25 edgetab 3 2 6 3 4 1 7 6 5 1 2 4 2 7 3 7 2 6 2 1 5 5 2 4 edlotab 1 1 1 2 2 1 2 3 3 1 2 2 2 1 2 1 3 3 3 1 3 1 2 1 Figure 16: Sample graph and its description by libScotch arrays using a compact edge array. Numbers within vertices are vertex indices, bold numbers close to vertices are vertex loads, and numbers close to edges are edge loads. Since the edge array is compact, verttab is of size vertnbr+1 and vendtab points to verttab+1. verttab 17 2 13 10 20 27 23 edgetab 3 4 1 7 6 5 2 7 3 1 2 4 3 2 6 7 2 6 5 2 4 2 1 5 2 1 2 1 2 2 1 1 1 1 3 3 1 2 1 3 1 3 vendtab 20 8 16 13 23 30 26 edlotab 2 2 1 2 3 3 Figure 17: Adjacency structure of the sample graph of Figure 16 with disjoint edge and edge load arrays. Both verttab and vendtab are of size vertnbr. This allows for the handling of dynamic graphs, the structure of which can evolve with time. edgetab Array, of a size equal at least to (maxi (vendtab[i]) − baseval), holding the adjacency array of every vertex. edlotab Optional array, of a size equal at least to (maxi (vendtab[i]) − baseval), holding the integer load associated with every arc. Matching arcs should always have identical loads. Dynamic graphs can be handled elegantly by using the vendtab array. In order to dynamically manage graphs, one just has to allocate verttab, vendtab and edgetab arrays that are large enough to contain all of the expected new vertex and edge data. Original vertices are labeled starting from baseval, leaving free space at the end of the arrays. To remove some vertex i, one just has to replace verttab[i] and vendtab[i] with the values of verttab[vertnbr-1] and vendtab[vertnbr-1], respectively, and browse the adjacencies of all neighbors of former vertex vertnbr-1 such that all (vertnbr-1) indices are turned into is. Then, vertnbr must be decremented, and SCOTCH graphBuild() must be called to account for the change of topology. If a graph building routine such as SCOTCH graphLoad() or SCOTCH 49 graphBuild() had already been called on the SCOTCH Graph structure, SCOTCH graphFree() has to be called first in order to free the internal structures associated with the older version of the graph, else these data would be lost, which would result in memory leakage. To add a new vertex, one has to fill verttab[vertnbr-1] and vendtab[vertnbr -1] with the starting and end indices of the adjacency sub-array of the new vertex. Then, the adjacencies of its neighbor vertices must also be updated to account for it. If free space had been reserved at the end of each of the neighbors, one just has to increment the vendtab[i] values of every neighbor i, and add the index of the new vertex at the end of the adjacency sub-array. If the sub-array cannot be extended, then it has to be copied elsewhere in the edge array, and both verttab[i] and vendtab[i] must be updated accordingly. With simple housekeeping of free areas of the edge array, dynamic arrays can be updated with as little data movement as possible. 7.2.3 Mesh format Since meshes are basically bipartite graphs, source meshes are also described by means of adjacency lists. The description of a mesh requires several SCOTCH Num scalars and arrays, as shown in Figure 18. They have the following meaning: velmbas Base value for element indexings. vnodbas Base value for node indexings. The base value of the underlying graph, baseval, is set as min(velmbas, vnodbas). velmnbr Number of element vertices in mesh. vnodnbr Number of node vertices in mesh. The overall number of vertices in the underlying graph, vertnbr, is set as velmnbr + vnodnbr. edgenbr Number of arcs in mesh. Since edges are represented by both of their ends, the number of edge data in the mesh is twice the number of edges. verttab Array of start indices in edgetab of vertex (that is, both elements and nodes) adjacency sub-arrays. vendtab Array of after-last indices in edgetab of vertex adjacency sub-arrays. For any element or node vertex i, with baseval ≤ i < (baseval + vertnbr), vendtab[i] − verttab[i] is the degree of vertex i, and the indices of the neighbors of i are stored in edgetab from edgetab[verttab[i]] to edgetab [vendtab[i]−1], inclusive. When all vertex adjacency lists are stored in order in edgetab, it is possible to save memory by not allocating the physical memory for vendtab. In this case, illustrated in Figure 18, verttab is of size vertnbr + 1 and vendtab points to verttab + 1. This case is referred to as the “compact edge array” case, such that verttab is sorted in ascending order, verttab[baseval] = baseval and verttab[baseval + vertnbr] = (baseval + edgenbr). 50 velmbas 1 vnodbas 4 velmnbr 3 vnodnbr 8 4 10 2 5 11 1 edgenbr 24 6 7 vlbltab 3 velotab 8 9 vendtab verttab 1 5 9 13 14 16 18 20 21 22 23 25 edgetab 5 11 7 6 10 5 11 4 8 9 6 7 2 2 1 1 3 1 3 3 3 2 2 1 Figure 18: Sample mesh and its description by libScotch arrays using a compact edge array. Numbers within vertices are vertex indices. Since the edge array is compact, verttab is of size vertnbr + 1 and vendtab points to verttab + 1. velotab Array, of size vertnbr, holding the integer load associated with each vertex. As for graphs, it is possible to handle elegantly dynamic meshes by means of the verttab and vendtab arrays. There is, however, an additional constraint, which is that mesh nodes and elements must be ordered consecutively. The solution to fulfill this constraint in the context of mesh ordering is to keep a set of empty elements (that is, elements which have no node adjacency attached to them) between the element and node arrays. For instance, Figure 19 represents a 4-element mesh with 6 nodes, and such that 4 element vertex slots have been reserved for new elements and nodes. These slots are empty elements for which verttab[i] equals vendtab[i], irrespective of these values, since they will not lead to any memory access in edgetab. Using this layout of vertices, new nodes and elements can be created by growing the element and node sub-arrays into the empty element sub-array, by both of its sides, without having to re-write the whole mesh structure, as illustrated in Figure 20. Empty elements are transparent to the mesh ordering routines, which base their work on node vertices only. Users who want to update the arrays of a mesh that has already been declared using the SCOTCH meshBuild routine must call SCOTCH meshExit prior to updating the mesh arrays, and then call SCOTCH meshBuild again after the arrays have been updated, so that the SCOTCH Mesh structure remains consistent with the new mesh data. 7.2.4 Geometry format Geometry data is always associated with a graph or a mesh. It is simply made of a single array of double-precision values which represent the coordinates of the vertices of a graph, or of the node vertices of a mesh, in vertex order. The fields of a geometry structure are the following: dimnnbr Number of dimensions of the graph or of the mesh, which can be 1, 2, or 3. 51 velmbas 11 vnodbas 1 velmnbr 4 vnodnbr 6 1 11 2 3 12 13 edgenbr 24 14 4 5 6 vlbltab velotab verttab 1 2 5 8 9 12 0 0 0 0 13 16 19 22 edgetab 11 11 12 13 11 12 14 13 13 14 12 14 1 2 3 5 2 3 4 5 2 3 6 5 vendtab 2 5 8 9 12 13 0 0 0 0 16 19 22 25 Figure 19: Sample mesh and its description by libScotch arrays, with nodes numbered first and elements numbered last. In order to allow for dynamic remeshing, empty elements (in grey) have been inserted between existing node and element vertices. velmbas 9 vnodbas 1 velmnbr 6 vnodnbr 7 1 9 11 2 7 3 12 10 13 edgenbr 36 4 14 5 6 vlbltab velotab verttab 25 2 5 8 27 12 31 0 9 35 13 16 19 22 edgetab 11 12 13 9 10 14 13 1 7 3 14 1 2 7 5 2 7 4 5 2 3 6 5 11 9 13 14 12 10 11 9 12 10 7 3 5 vendtab 27 5 8 9 31 13 35 0 12 38 16 19 22 25 Figure 20: Re-meshing of the mesh of Figure 19. New node vertices have been added at the end of the vertex sub-array, new elements have been added at the beginning of the element sub-array, and vertex base values have been updated accordingly. Node adjacency lists that could not fit in place have been added at the end of the edge array, and some of the freed space has been re-used for new adjacency lists. Element adjacency lists do not require moving in this case, as all of the elements have the name number of nodes. 52 geomtab Array of coordinates. This is an array of double precision values organized as an array of (x), or (x,y), or (x,y,z) tuples, according to dimnnbr. Coordinates that are not used (e.g. the “z” coordinates for a 2-dimentional object) are not allocated. Therefore, the “x” coordinate of some graph vertex i is located at geomtab[(i - baseval) * dimnnbr + baseval], its “y” coordinate is located at geomtab[(i - baseval) * dimnnbr + baseval + 1] if dimnnbr ≤ 2, and its “z” coordinate is located at geomtab[(i baseval) * dimnnbr + baseval + 2] if dimnnbr = 3. Whenever the geometry is associated with a mesh, only node vertices are considered, so the “x” coordinate of some mesh node vertex i, with vnodbas ≤ i, is located at geomtab[(i - vnodbas) * dimnnbr + baseval], its “y” coordinate is located at geomtab[(i - vnodbas) * dimnnbr + baseval + 1] if dimnnbr ≤ 2, and its “z” coordinate is located at geomtab[(i - vnodbas) * dimnnbr + baseval + 2] if dimnnbr = 3. 7.2.5 Block ordering format Block orderings associated with graphs and meshes are described by means of block and permutation arrays, made of SCOTCH Nums, as shown in Figure 21. In order for all orderings to have the same structure, irrespective of whether they are created from graphs or meshes, all ordering data indices start from baseval, even when they refer to a mesh the node vertices of which are labeled from a vnodbas index such that vnodbas > baseval. Consequently, row indices are related to vertex indices in memory in the following way: row i is associated with vertex i of the SCOTCH Graph structure if the ordering was computed from a graph, and with node vertex i + (vnodbas− baseval) of the SCOTCH Mesh structure if the ordering was computed from a mesh. Block orderings are made of the following data: permtab Array holding the permutation of the reordered matrix. Thus, if k = permtab[i], then row i of the original matrix is now row k of the reordered matrix, that is, row i is the k th pivot. peritab Inverse permutation of the reordered matrix. Thus, if i = peritab[k], then row k of the reordered matrix was row i of the original matrix. cblknbr Number of column blocks (that is, supervariables) in the block ordering. rangtab Array of ranges for the column blocks. Column block c, with baseval ≤ c < (cblknbr+baseval), contains columns with indices ranging from rangtab[i] to rangtab[i + 1], exclusive, in the reordered matrix. Indices in rangtab are based. Therefore, rangtab[baseval] is always equal to baseval, and rangtab[cblknbr + baseval] is always equal to vertnbr + baseval for graphs and to vnodnbr + baseval for meshes. In order to avoid memory errors when column blocks are all single columns, the size of rangtab must always be one more than the number of columns, that is, vertnbr + 1 for graphs and vnodnbr + 1 for meshes. treetab Array of ascendants of permuted column blocks in the separators tree. 53 permtab 2 3 10 6 4 11 8 7 1 12 5 9 peritab 9 1 2 5 11 4 8 7 12 3 6 10 cblknbr 7 1 2 3 4 2 2 5 6 7 8 4 10 3 6 5 7 11 rangtab 1 2 4 5 6 8 10 13 treetab 3 3 7 6 6 7 −1 9 10 11 12 1 1 8 7 5 9 6 3 12 4 Figure 21: Arrays resulting from the ordering by complete nested dissection of a 4 by 3 grid based from 1. Leftmost grid is the original grid, and righmost grid is the reordered grid, with separators shown and column block indices written in bold. treetab[i] is the index of the father of column block i in the separators tree, or −1 if column block i is the root of the separators tree. Whenever separators or leaves of the separators tree are split into subblocks, as the block splitting, minimum fill or minimum degree methods do, all subblocks of the same level are linked to the column block of higher index belonging to the closest separator ancestor. Indices in treetab are based, in the same way as for the other blocking structures. See Figure 21 for a complete example. 7.3 Strategy strings The behavior of the mapping and block ordering routines of the libScotch library is parametrized by means of strategy strings, which describe how and when given partitioning or ordering methods should be applied to graphs and subgraphs, or to meshes and submeshes. 7.3.1 Mapping strategy strings At the time being, mapping methods only apply to graphs, as there is not yet a mesh mapping tool in the Scotch package. Mapping strategies are made of methods, with optional parameters enclosed between curly braces, and separated by commas, in the form of method [{parameters}] . The currently available mapping methods are the following. b Dual Recursive Bipartitioning mapping algorithm, as defined in section 3.1.3. The parameters of the DRB mapping method are listed below. job=tie The tie flag defines how new jobs are stored in job pools. t u Tie job pools together. Subjobs are stored in same pool as their parent job. This is the default behavior, as it proves the most efficient in practice. Untie job pools. Subjobs are stored in the next job pool to be processed. map=tie The tie flag defines how results of bipartitioning jobs are propagated to jobs still in pools. t Tie both mapping tables together. Results are immediately available to jobs in the same job pool. This is the default behavior. 54 u Untie mapping tables. Results are only available to jobs of next pool to be processed. poli=policy Select jobs according to policy policy. Job selection policies define how bipartitioning jobs are ordered within the currently active job pool. Valid policy flags are L l r S s Most neighbors of higher level. Highest level. Random. Most neighbors of smaller size. This is the default behavior. Biggest size. strat=strat Apply bipartitioning strategy strat to each bipartitioning job. A bipartitioning strategy is made of one or several bipartitioning methods, which can be combined by means of strategy operators. Graph bipartitioning strategies are described below. 7.3.2 Graph bipartitioning strategy strings A graph bipartitioning strategy is made of one or several graph bipartitioning methods, which can be combined by means of strategy operators. Strategy operators are listed below, by increasing precedence. strat1 |strat2 Selection operator. The result of the selection is the best bipartition of the two that are obtained by the separate application of strat1 and strat2 to the current bipartition. strat1 strat2 Combination operator. Strategy strat2 is applied to the bipartition resulting from the application of strategy strat1 to the current bipartition. Typically, the first method used should compute an initial bipartition from scratch, and every following method should use the result of the previous one at its starting point. (strat ) Grouping operator. The strategy enclosed within the parentheses is treated as a single bipartitioning method. /cond ?strat1 [:strat2]; Condition operator. According to the result of the evaluation of condition cond, either strat1 or strat2 (if it is present) is applied. The condition applies to the characteristics of the current active graph, and can be built from logical and relational operators. Conditional operators are listed below, by increasing precedence. cond1 |cond2 Logical or operator. The result of the condition is true if cond1 or cond2 are true, or both. cond1 &cond2 Logical and operator. The result of the condition is true only if both cond1 and cond2 are true. 55 !cond Logical not operator. The result of the condition is true only if cond is false. var relop val Relational operator, where var is a graph variable, val is either a graph variable or a constant of the type of variable var , and relop is one of ’<’, ’=’, and ’>’. The graph variables are listed below, along with their types. deg The average degree of the current graph. Float. edge The number of arcs (which is twice the number of edges) of the current graph. Integer. load The overall vertex load (weight) of the current graph. Integer. load0 The vertex load of the first subset of the current bipartition of the current graph. Integer. vert The number of vertices of the current graph. Integer. method [{parameters}] Bipartitioning method. For bipartitioning methods that can be parametrized, parameter settings may be provided after the method name. Parameters must be separated by commas, and the whole list be enclosed between curly braces. The currently available graph bipartitioning methods are the following. b Band method. This method builds a band graph of given width around the current frontier of the graph to which it is applied, and calls a graph bipartitioning strategy to refine the equivalent bipartition of the band graph. Then, the refined frontier of the band graph is projected back to the current graph. This method, presented in [8], was created to reduce the cost of vertex separator refinement algorithms in a multi-level context, but it improves partition quality too. The same behavior is observed for graph bipartitioning. The parameters of the band bipartitioning method are listed below. bnd=strat Set the graph bipartitioning strategy to be used on the band graph. org=strat Set the fallback graph bipartitioning strategy to be used on the original graph if the band graph strategy could not be used. The three cases which require the use of this fallback strategy are the following. First, if the separator of the original graph is empty, which makes it impossible to compute a band graph. Second, if any part of the band graph to be built is of the same size as the one of the original graph. Third, if the application of the bnd bipartitioning method to the band graph leads to a situation where both anchor vertices are placed in the same part. width=val Set the width of the band graph. All graph vertices that are at a distance less than or equal to val from any frontier vertex are kept in the band graph. 56 d Diffusion method. This method, presented in [44], flows two kinds of antagonistic liquids, scotch and anti-scotch, from two source vertices, and sets the new frontier as the limit between vertices which contain scotch and the ones which contain anti-scotch. Because selecting the source vertices is essential to the obtainment of useful results, this method has been hard-coded so that the two source vertices are the two vertices of highest indices, since in the band method these are the anchor vertices which represent all of the removed vertices of each part. Therefore, this method must be used on band graphs only, or on specifically crafted graphs. Applying it to any other graphs is very likely to lead to extremely poor results. The parameters of the diffusion bipartitioning method are listed below. dif=rat Fraction of liquid which is diffused to neighbor vertices at each pass. To achieve convergence, the sum of the dif and rem parameters must be equal to 1, but in order to speed-up the diffusion process, other combinations of higher sum can be tried. In this case, the number of passes must be kept low, to avoid numerical overflows which would make the results useless. pass=nbr Set the number of diffusion sweeps performed by the algorithm. This number depends on the width of the band graph to which the diffusion method is applied. Useful values range from 30 to 500 according to chosen dif and rem coefficients. rem=rat Fraction of liquid which remains on vertices at each pass. See above. f Fiduccia-Mattheyses method. The parameters of the Fiduccia-Mattheyses method are listed below. bal=rat Set the maximum weight imbalance ratio to the given fraction of the subgraph vertex weight. Common values are around 0.01, that is, one percent. move=nbr Maximum number of hill-climbing moves that can be performed before a pass ends. During each of its passes, the Fiduccia-Mattheyses algorithm repeatedly swaps vertices between the two parts so as to minimize the cost function. A pass completes either when all of the vertices have been moved once, or if too many swaps that do not decrease the value of the cost function have been performed. Setting this value to zero turns the Fiduccia-Mattheyses algorithm into a gradient-like method, which may be used to quickly refine partitions during the uncoarsening phase of the multi-level method. pass=nbr Set the maximum number of optimization passes performed by the algorithm. The Fiduccia-Mattheyses algorithm stops as soon as a pass has not yielded any improvement of the cost function, or when the maximum number of passes has been reached. Value −1 stands for an infinite number of passes, that is, as many as needed by the algorithm to converge. g Gibbs-Poole-Stockmeyer method. This method has only one parameter. 57 pass=nbr Set the number of sweeps performed by the algorithm. h Greedy-graph-growing method. This method has only one parameter. pass=nbr Set the number of runs performed by the algorithm. m Multi-level method. The parameters of the multi-level method are listed below. asc=strat Set the strategy that is used to refine the partitions obtained at ascending levels of the uncoarsening phase by projection of the bipartitions computed for coarser graphs. This strategy is not applied to the coarsest graph, for which only the low strategy is used. low=strat Set the strategy that is used to compute the partition of the coarsest graph, at the lowest level of the coarsening process. rat=rat Set the threshold maximum coarsening ratio over which graphs are no longer coarsened. The ratio of any given coarsening cannot be less that 0.5 (case of a perfect matching), and cannot be greater than 1.0. Coarsening stops when either the coarsening ratio is above the maximum coarsening ratio, or the graph has fewer vertices than the minimum number of vertices allowed. type=type Set the type of matching that is used to coarsen the graphs. type is h for heavy-edge matching, or s for scan (first-fit) matching. vert=nbr Set the threshold minimum graph size under which graphs are no longer coarsened. Coarsening stops when either the coarsening ratio is above the maximum coarsening ratio, or the graph has fewer vertices than the minimum number of vertices allowed. x Exactifying method. z Zero method. This method moves all of the vertices to the first part. Its main use is to stop the bipartitioning process, if some condition is true, when mapping onto variable-sized architectures (see section 3.1.7). 7.3.3 Ordering strategy strings Ordering strategies are available both for graphs and for meshes. An ordering strategy is made of one or several ordering methods, which can be combined by means of strategy operators. The strategy operators that can be used in ordering strategies are listed below, by increasing precedence. (strat ) Grouping operator. The strategy enclosed within the parentheses is treated as a single ordering method. 58 /cond ?strat1 [:strat2]; Condition operator. According to the result of the evaluation of condition cond, either strat1 or strat2 (if it is present) is applied. The condition applies to the characteristics of the current node of the separators tree, and can be built from logical and relational operators. Conditional operators are listed below, by increasing precedence. cond1 |cond2 Logical or operator. The result of the condition is true if cond1 or cond2 are true, or both. cond1 &cond2 Logical and operator. The result of the condition is true only if both cond1 and cond2 are true. !cond Logical not operator. The result of the condition is true only if cond is false. var relop val Relational operator, where var is a node variable, val is either a node variable or a constant of the type of variable var, and relop is one of ’<’, ’=’, and ’>’. The node variables are listed below, along with their types. edge The number of vertices of the current subgraph. Integer. levl The level of the subgraph in the separators tree, starting from zero for the initial graph at the root of the tree. Integer. load The overall vertex load (weight) of the current subgraph. Integer. mdeg The maximum degree of the current subgraph. Integer. vert The number of vertices of the current subgraph. Integer. method [{parameters}] Graph or mesh ordering method. Available ordering methods are listed below. The currently available ordering methods are the following. b Blocking method. This method does not perform ordering by itself, but is used as post-processing to cut into blocks of smaller sizes the separators or large blocks produced by other ordering methods. This is not useful in the context of direct solving methods, because the off-diagonal blocks created by the splitting of large diagonal blocks are likely to be filled at factoring time. However, in the context of incomplete solving methods such as ILU(k) [29], it can lead to a significant reduction of the required memory space and time, because it helps carving large triangular blocks. The parameters of the blocking method are described below. cmin=size Set the minimum size of the resulting subblocks, in number of columns. Blocks larger than twice this minimum size are cut into sub-blocks of equal sizes (within one), having a number of columns comprised between 59 size and 2size. The definition of size depends on the size of the graph to order. Large graphs cannot afford very small values, because the number of blocks becomes much too large and limits the acceleration of BLAS 3 routines, while large values do not help reducing enough the complexity of ILU(k) solving. strat=strat Ordering strategy to be performed. After the ordering strategy is applied, the resulting separators tree is traversed and all of the column blocks that are larger than 2size are split into smaller column blocks, without changing the ordering that has been computed. c Compression method [2]. The parameters of the compression method are listed below. rat=rat Set the compression ratio over which graphs and meshes will not be compressed. Useful values range between 0.7 and 0.8. cpr=strat Ordering strategy to use on the compressed graph or mesh if its size is below the compression ratio times the size of the original graph or mesh. unc=strat Ordering strategy to use on the original graph or mesh if the size of the compressed graph or mesh were above the compression ratio times the size of the original graph or mesh. d Block Halo Approximate Minimum Degree method [49]. The parameters of the Halo Approximate Minimum Degree method are listed below. The Block Halo Approximate Minimum Fill method, described below, is more efficient and should be preferred. cmin=size Minimum number of columns per column block. All column blocks of width smaller than size are amalgamated to their parent column block in the elimination tree, provided that it does not violate the cmax constraint. cmax=size Maximum number of column blocks over which some column block will not amalgamate one of its descendents in the elimination tree. This parameter is mainly designed to provide an upper bound for block size in the context of BLAS3 computations ; else, a huge value should be provided. frat=rat Fill-in ratio over which some column block will not amalgamate one of its descendents in the elimination tree. Typical values range from 0.05 to 0.10. f Block Halo Approximate Minimum Fill method. The parameters of the Halo Approximate Minimum Fill method are listed below. cmin=size Minimum number of columns per column block. All column blocks of width smaller than size are amalgamated to their parent column block in the elimination tree, provided that it does not violate the cmax constraint. 60 cmax=size Maximum number of column blocks over which some column block will not amalgamate one of its descendents in the elimination tree. This parameter is mainly designed to provide an upper bound for block size in the context of BLAS3 computations ; else, a huge value should be provided. frat=rat Fill-in ratio over which some column block will not amalgamate one of its descendents in the elimination tree. Typical values range from 0.05 to 0.10. g Gibbs-Poole-Stockmeyer method. This method is used on separators to reduce the number and extent of extra-diagonal blocks. If the number of extradiagonal blocks is not relevant, the s method should be preferred. This method has only one parameter. pass=nbr Set the number of sweeps performed by the algorithm. n Nested dissection method. The parameters of the nested dissection method are given below. ole=strat Set the ordering strategy that is used on every leaf of the separators tree if the node separation strategy sep has failed to separate it further. ose=strat Set the ordering strategy that is used on every separator of the separators tree. sep=strat Set the node separation strategy that is used on every leaf of the separators tree to make it grow. Node separation strategies are described below, in section 7.3.4. s Simple method. Vertices are ordered in their natural order. This method is fast, and should be used to order separators if the number of extra-diagonal blocks is not relevant ; else, the g method should be preferred. v Mesh-to-graph method. Available only for mesh ordering strategies. From the mesh to which this method applies is derived a graph, such that a graph vertex is associated with every node of the mesh, and a clique is created between all vertices which represent nodes that belong to the same element. A graph ordering strategy is then applied to the derived graph, and this ordering is projected back to the nodes of the mesh. This method is here for evaluation purposes only, as mesh ordering methods are generally more efficient than their graph ordering counterpart. strat=strat Graph ordering strategy to apply to the associated graph. 7.3.4 Node separation strategy strings A node separation strategy is made of one or several node separation methods, which can be combined by means of strategy operators. Strategy operators are listed below, by increasing precedence. 61 strat1 |strat2 Selection operator. The result of the selection is the best vertex separator of the two that are obtained by the distinct application of strat1 and strat2 to the current separator. strat1 strat2 Combination operator. Strategy strat2 is applied to the vertex separator resulting from the application of strategy strat1 to the current separator. Typically, the first method used should compute an initial separation from scratch, and every following method should use the result of the previous one as a starting point. (strat ) Grouping operator. The strategy enclosed within the parentheses is treated as a single separation method. /cond ?strat1 [:strat2]; Condition operator. According to the result of the evaluation of condition cond, either strat1 or strat2 (if it is present) is applied. The condition applies to the characteristics of the current subgraph, and can be built from logical and relational operators. Conditional operators are listed below, by increasing precedence. cond1 |cond2 Logical or operator. The result of the condition is true if cond1 or cond2 are true, or both. cond1 &cond2 Logical and operator. The result of the condition is true only if both cond1 and cond2 are true. !cond Logical not operator. The result of the condition is true only if cond is false. var relop val Relational operator, where var is a graph or node variable, val is either a graph or node variable or a constant of the type of variable var , and relop is one of ’<’, ’=’, and ’>’. The graph and node variables are listed below, along with their types. levl The level of the subgraph in the separators tree, starting from zero at the root of the tree. Integer. proc The number of processors on which the current subgraph is distributed at this level of the separators tree. This variable is available only when calling from routines of the PT-Scotch parallel library. Integer. rank The rank of the current processor among the group of processors on which the current subgraph is distributed at this level of the separators tree. This variable is available only when calling from routines of the PT-Scotch parallel library, for instance to decide which node separation strategy should be used on which processor in a multi-sequential approach. Integer. 62 vert The number of vertices of the current subgraph. Integer. The currently available vertex separation methods are the following. b Band method. Available only for graph separation strategies. This method builds a band graph of given width around the current separator of the graph to which it is applied, and calls a graph separation strategy to refine the equivalent separator of the band graph. Then, the refined separator of the band graph is projected back to the current graph. This method, presented in [8], was created to reduce the cost of separator refinement algorithms in a multi-level context, but it improves partition quality too. The parameters of the band separation method are listed below. bnd=strat Set the vertex separation strategy to be used on the band graph. org=strat Set the fallback vertex separation strategy to be used on the original graph if the band graph strategy could not be used. The three cases which require the use of this fallback strategy are the following. First, if the separator of the original graph is empty, which makes it impossible to compute a band graph. Second, if any part of the band graph to be built is of the same size as the one of the original graph. Third, if the application of the bnd vertex separation method to the band graph leads to a situation where both anchor vertices are placed in the same part. width=val Set the width of the band graph. All graph vertices that are at a distance less than or equal to val from any separator vertex are kept in the band graph. e Edge-separation method. Available only for graph separation strategies. This method builds vertex separators from edge separators, by the method proposed by Pothen and Fang [50], which uses a variant of the Hopcroft and Karp algorithm due to Duff [9]. This method is expensive and most often yields poorer results than direct vertex-oriented methods such as the vertex vertex Greedy-graph-growing and the vertex Fiduccia-Mattheyses algorithms. The parameters of the edge-separation method are listed below. bal=val Set the load imbalance tolerance to val, which is a floating-point ratio expressed with respect to the ideal load of the partitions. strat=strat Set the graph bipartitioning strategy that is used to compute the edge bipartition. The syntax of bipartitioning strategy strings is defined within section 7.3.2, at page 55. width=type Select the width of the vertex separators built from edge separators. When type is set to f, fat vertex separators are built, that hold all of the ends of the edges of the edge cut. When it is set to t, a thin vertex separator is built by removing as many vertices as possible from the fat separator. 63 f Vertex Fiduccia-Mattheyses method. The parameters of the vertex FiducciaMattheyses method are listed below. bal=rat Set the maximum weight imbalance ratio to the given fraction of the weight of all node vertices. Common values are around 0.01, that is, one percent. move=nbr Maximum number of hill-climbing moves that can be performed before a pass ends. During each of its passes, the vertex Fiduccia-Mattheyses algorithm repeatedly moves vertices from the separator to any of the two parts, so as to minimize the size of the separator. A pass completes either when all of the vertices have been moved once, or if too many swaps that do not decrease the size of the separator have been performed. pass=nbr Set the maximum number of optimization passes performed by the algorithm. The vertex Fiduccia-Mattheyses algorithm stops as soon as a pass has not yielded any reduction of the size of the separator, or when the maximum number of passes has been reached. Value -1 stands for an infinite number of passes, that is, as many as needed by the algorithm to converge. g Gibbs-Poole-Stockmeyer method. Available only for graph separation strategies. This method has only one parameter. pass=nbr Set the number of sweeps performed by the algorithm. h Vertex greedy-graph-growing method. This method has only one parameter. pass=nbr Set the number of runs performed by the algorithm. m Vertex multi-level method. The parameters of the vertex multi-level method are listed below. asc=strat Set the strategy that is used to refine the vertex separators obtained at ascending levels of the uncoarsening phase by projection of the separators computed for coarser graphs or meshes. This strategy is not applied to the coarsest graph or mesh, for which only the low strategy is used. low=strat Set the strategy that is used to compute the vertex separator of the coarsest graph or mesh, at the lowest level of the coarsening process. rat=rat Set the threshold maximum coarsening ratio over which graphs or meshes are no longer coarsened. The ratio of any given coarsening cannot be less that 0.5 (case of a perfect matching), and cannot be greater than 1.0. Coarsening stops when either the coarsening ratio is above the maximum coarsening ratio, or the graph or mesh has fewer node vertices than the minimum number of vertices allowed. 64 vert=nbr Set the threshold minimum size under which graphs or meshes are no longer coarsened. Coarsening stops when either the coarsening ratio is above the maximum coarsening ratio, or the graph or mesh has fewer node vertices than the minimum number of vertices allowed. t Thinner method. Available only for graph separation strategies. This method quickly eliminates all useless vertices of the current separator. It searches the separator for vertices that have no neighbors in one of the two parts, and moves these vertices to the part they are connected to. This method may be used to refine separators during the uncoarsening phase of the multi-level method, and is faster than a vertex Fiduccia-Mattheyses algorithm with {move =0}. v Mesh-to-graph method. Available only for mesh separation strategies. From the mesh to which this method applies is derived a graph, such that a graph vertex is associated with every node of the mesh, and a clique is created between all vertices which represent nodes that belong to the same element. A graph separation strategy is then applied to the derived graph, and the separator is projected back to the nodes of the mesh. This method is here for evaluation purposes only, as mesh separation methods are generally more efficient than their graph separation counterpart. strat=strat Graph separation strategy to apply to the associated graph. w Graph separator viewer. Available only for graph separation strategies. Every call to this method results in the creation, in the current subdirectory, of partial mapping files called “vgraphseparatevw output nnnnnnnn.map”, where “nnnnnnnn” are increasing decimal numbers, which contain the current state of the two parts and the separator. These mapping files can be used as input by the gout program to produce displays of the evolving shape of the current separator and parts. This is mostly a debugging feature, but it can also have an illustrative interest. While it is only available for graph separation strategies, mesh separation strategies can indirectly use it through the mesh-to-graph separation method. z Zero method. This method moves all of the node vertices to the first part, resulting in an empty separator. Its main use is to stop the separation process whenever some condition is true. 7.4 7.4.1 Target architecture handling routines SCOTCH archInit Synopsis int SCOTCH archInit (SCOTCH Arch * archptr) scotchfarchinit (doubleprecision (*) integer Description 65 archdat, ierr) The SCOTCH archInit function initializes a SCOTCH Arch structure so as to make it suitable for future operations. It should be the first function to be called upon a SCOTCH Arch structure. When the target architecture data is no longer of use, call function SCOTCH archExit to free its internal structures. Return values SCOTCH archInit returns 0 if the graph structure has been successfully initialized, and 1 else. 7.4.2 SCOTCH archExit Synopsis void SCOTCH archExit (SCOTCH Arch * archptr) scotchfarchexit (doubleprecision (*) archdat) Description The SCOTCH archExit function frees the contents of a SCOTCH Arch structure previously initialized by SCOTCH archInit. All subsequent calls to SCOTCH arch routines other than SCOTCH archInit, using this structure as parameter, may yield unpredictable results. 7.4.3 SCOTCH archLoad Synopsis int SCOTCH archLoad (SCOTCH Arch * FILE * archptr, stream) scotchfarchload (doubleprecision (*) integer integer archdat, fildes, ierr) Description The SCOTCH archLoad routine fills the SCOTCH Arch structure pointed to by archptr with the source graph description available from stream stream in the Scotch target architecture format (see Section 5.4). Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the architecture file. Return values SCOTCH archLoad returns 0 if the target architecture structure has been successfully allocated and filled with the data read, and 1 else. 66 7.4.4 SCOTCH archSave Synopsis int SCOTCH archSave (const SCOTCH Arch * FILE * scotchfarchsave (doubleprecision (*) integer integer archptr, stream) archdat, fildes, ierr) Description The SCOTCH archSave routine saves the contents of the SCOTCH Arch structure pointed to by archptr to stream stream, in the Scotch target architecture format (see section 5.4). Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the architecture file. Return values SCOTCH archSave returns 0 if the graph structure has been successfully written to stream, and 1 else. 7.4.5 SCOTCH archBuild Synopsis int SCOTCH archBuild (SCOTCH Arch * const SCOTCH Graph * const SCOTCH Num const SCOTCH Num * const SCOTCH Strat * scotchfarchbuild (doubleprecision (*) doubleprecision (*) integer integer (*) doubleprecision (*) integer archptr, grafptr, listnbr, listtab, straptr) archdat, grafdat, listnbr, listtab, stradat, ierr) Description The SCOTCH archBuild routine fills the architecture structure pointed to by archptr with the decomposition-defined target architecture computed by applying the graph bipartitioning strategy pointed to by straptr to the architecture graph pointed to by grafptr. When listptr is not NULL and listnbr is greater than zero, the decomposition-defined architecture is restricted to the listnbr vertices whose indices are given in the array pointed to by listtab, from listtab[0] to listtab[listnbr - 1]. These indices should have the same base value as 67 the one of the graph pointed to by grafptr, that is, be in the range from 0 to vertnbr − 1 if the graph base is 0, and from 1 to vertnbr if the graph base is 1. Graph bipartitioning strategies are declared by means of the SCOTCH strat GraphBipart function, described in page 102. The syntax of bipartitioning strategy strings is defined in section 7.3.1, page 55. Additional information may be obtained from the manual page of amk grf, the stand-alone executable that uses function SCOTCH archBuild to build decomposition-defined target architecture from source graphs, available at page 31. Return values SCOTCH archBuild returns 0 if the decomposition-defined architecture has been successfully computed, and 1 else. 7.4.6 SCOTCH archCmplt Synopsis int SCOTCH archCmplt (SCOTCH Arch * const SCOTCH Num archptr, vertnbr) scotchfarchcmplt (doubleprecision (*) integer integer archdat, vertnbr, ierr) Description The SCOTCH archCmplt routine fills the SCOTCH Arch structure pointed to by archptr with the description of a complete graph architecture with vertnbr processors, which can be used as input to SCOTCH graphMap to perform graph partitioning. A shortcut to this is to use the SCOTCH graphPart routine. Return values SCOTCH archCmplt returns 0 if the complete graph target architecture has been successfully built, and 1 else. 7.4.7 SCOTCH archCmpltw Synopsis int SCOTCH archCmpltw (SCOTCH Arch * const SCOTCH Num const SCOTCH Num * const scotchfarchcmplt (doubleprecision (*) integer integer (*) integer Description 68 archdat, vertnbr, velotab, ierr) archptr, vertnbr, velotab) The SCOTCH archCmpltw routine fills the SCOTCH Arch structure pointed to by archptr with the description of a weighted complete graph architecture with vertnbr processors. The relative weights of the processors are given in the velotab array. Once the target architecture has been created, it can be used as input to SCOTCH graphMap to perform weighted graph partitioning. Return values SCOTCH archCmpltw returns 0 if the weighted complete graph target architecture has been successfully built, and 1 else. 7.4.8 SCOTCH archName Synopsis const char * SCOTCH archName (const SCOTCH Arch * scotchfarchname (doubleprecision (*) character (*) integer archptr) archdat, chartab, charnbr) Description The SCOTCH archName function returns a string containing the name of the architecture pointed to by archptr. Since Fortran routines cannot return string pointers, the scotchfarchname routine takes as second and third parameters a character() array to be filled with the name of the architecture, and the integer size of the array, respectively. If the array is of sufficient size, a trailing nul character is appended to the string to materialize the end of the string (this is the C style of handling character strings). Return values SCOTCH archName returns a non-null character pointer that points to a nullterminated string describing the type of the architecture. 7.4.9 SCOTCH archSize Synopsis SCOTCH Num SCOTCH archSize (const SCOTCH Arch * scotchfarchsize (doubleprecision (*) integer archptr) archdat, archnbr) Description The SCOTCH archSize function returns the number of nodes of the given target architecture. The Fortran routine has a second parameter, of integer type, which is set on return with the number of nodes of the target architecture. Return values SCOTCH archSize returns the number of nodes of the target architecture. 69 7.5 7.5.1 Graph handling routines SCOTCH graphInit Synopsis int SCOTCH graphInit (SCOTCH Graph * scotchfgraphinit (doubleprecision (*) integer grafptr) grafdat, ierr) Description The SCOTCH graphInit function initializes a SCOTCH Graph structure so as to make it suitable for future operations. It should be the first function to be called upon a SCOTCH Graph structure. When the graph data is no longer of use, call function SCOTCH graphExit to free its internal structures. Return values SCOTCH graphInit returns 0 if the graph structure has been successfully initialized, and 1 else. 7.5.2 SCOTCH graphExit Synopsis void SCOTCH graphExit (SCOTCH Graph * scotchfgraphexit (doubleprecision (*) grafptr) grafdat) Description The SCOTCH graphExit function frees the contents of a SCOTCH Graph structure previously initialized by SCOTCH graphInit. All subsequent calls to SCOTCH graph routines other than SCOTCH graphInit, using this structure as parameter, may yield unpredictable results. 7.5.3 SCOTCH graphFree Synopsis void SCOTCH graphFree (SCOTCH Graph * scotchfgraphfree (doubleprecision (*) grafptr) grafdat) Description The SCOTCH graphFree function frees the graph data of a SCOTCH Graph structure previously initialized by SCOTCH graphInit, but preserves its internal 70 data structures. This call is equivalent to a call to SCOTCH graphExit immediately followed by a call to SCOTCH graphInit. Consequently, the given SCOTCH Graph structure remains ready for subsequent calls to any routine of the libScotch library. 7.5.4 SCOTCH graphLoad Synopsis int SCOTCH graphLoad (SCOTCH Graph * FILE * SCOTCH Num SCOTCH Num scotchfgraphload (doubleprecision (*) integer integer integer integer grafptr, stream, baseval, flagval) grafdat, fildes, baseval, flagval, ierr) Description The SCOTCH graphLoad routine fills the SCOTCH Graph structure pointed to by grafptr with the source graph description available from stream stream in the Scotch graph format (see section 5.1). To ease the handling of source graph files by programs written in C as well as in Fortran, the base value of the graph to read can be set to 0 or 1, by setting the baseval parameter to the proper value. A value of -1 indicates that the graph base should be the same as the one provided in the graph description that is read from stream. The flagval value is a combination of the following integer values, that may be added or bitwise-ored: 0 Keep vertex and edge weights if they are present in the stream data. 1 Remove vertex weights. The graph read will have all of its vertex weights set to one, regardless of what is specified in the stream data. 2 Remove edge weights. The graph read will have all of its edge weights set to one, regardless of what is specified in the stream data. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the graph file. Return values SCOTCH graphLoad returns 0 if the graph structure has been successfully allocated and filled with the data read, and 1 else. 7.5.5 SCOTCH graphSave Synopsis 71 int SCOTCH graphSave (const SCOTCH Graph * FILE * scotchfgraphsave (doubleprecision (*) integer integer grafptr, stream) grafdat, fildes, ierr) Description The SCOTCH graphSave routine saves the contents of the SCOTCH Graph structure pointed to by grafptr to stream stream, in the Scotch graph format (see section 5.1). Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the graph file. Return values SCOTCH graphSave returns 0 if the graph structure has been successfully written to stream, and 1 else. 7.5.6 SCOTCH graphBuild Synopsis int SCOTCH graphBuild (SCOTCH Graph const SCOTCH const SCOTCH const SCOTCH const SCOTCH const SCOTCH const SCOTCH const SCOTCH const SCOTCH const SCOTCH * Num Num Num Num Num Num Num Num Num scotchfgraphbuild (doubleprecision (*) integer integer integer (*) integer (*) integer (*) integer (*) integer integer (*) integer (*) integer * * * * * * grafptr, baseval, vertnbr, verttab, vendtab, velotab, vlbltab, edgenbr, edgetab, edlotab) grafdat, baseval, vertnbr, verttab, vendtab, velotab, vlbltab, edgenbr, edgetab, edlotab, ierr) Description The SCOTCH graphBuild routine fills the source graph structure pointed to by grafptr with all of the data that are passed to it. 72 baseval is the graph base value for index arrays (typically 0 for structures built from C and 1 for structures built from Fortran). vertnbr is the number of vertices. verttab is the adjacency index array, of size (vertnbr + 1) if the edge array is compact (that is, if vendtab equals verttab + 1 or NULL), or of size vertnbr else. vendtab is the adjacency end index array, of size vertnbr if it is disjoint from verttab. velotab is the vertex load array, of size vertnbr if it exists. vlbltab is the vertex label array, of size vertnbr if it exists. edgenbr is the number of arcs (that is, twice the number of edges). edgetab is the adjacency array, of size at least edgenbr (it can be more if the edge array is not compact). edlotab is the arc load array, of size edgenbr if it exists. The vendtab, velotab, vlbltab and edlotab arrays are optional, and a NULL pointer can be passed as argument whenever they are not defined. Since, in Fortran, there is no null reference, passing the scotchfgraphbuild routine a reference equal to verttab in the velotab or vlbltab fields makes them be considered as missing arrays. The same holds for edlotab when it is passed a reference equal to edgetab. Setting vendtab to refer to one cell after verttab yields the same result, as it is the exact semantics of a compact vertex array. To limit memory consumption, SCOTCH graphBuild does not copy array data, but instead references them in the SCOTCH Graph structure. Therefore, great care should be taken not to modify the contents of the arrays passed to SCOTCH graphBuild as long as the graph structure is in use. Every update of the arrays should be preceded by a call to SCOTCH graphFree, to free internal graph structures, and eventually followed by a new call to SCOTCH graphBuild to re-build these internal structures so as to be able to use the new graph. To ensure that inconsistencies in user data do not result in an erroneous behavior of the libScotch routines, it is recommended, at least in the development stage, to call the SCOTCH graphCheck routine on the newly created SCOTCH Graph structure before calling any other libScotch routine. Return values SCOTCH graphBuild returns 0 if the graph structure has been successfully set with all of the input data, and 1 else. 7.5.7 SCOTCH graphBase Synopsis int SCOTCH graphBase (SCOTCH Graph * SCOTCH Num scotchfgraphbase (doubleprecision (*) integer integer Description 73 grafptr, baseval) grafdat, baseval, oldbaseval) The SCOTCH graphBase routine sets the base of all graph indices according to the given base value, and returns the old base value. This routine is a helper for applications that do not handle base values properly. In Fortan, the old base value is returned in the third parameter of the function call. Return values SCOTCH graphBase returns the old base value. 7.5.8 SCOTCH graphCheck Synopsis int SCOTCH graphCheck (const SCOTCH Graph * scotchfgraphcheck (doubleprecision (*) integer grafptr) grafdat, ierr) Description The SCOTCH graphCheck routine checks the consistency of the given SCOTCH Graph structure. It can be used in client applications to determine if a graph that has been created from used-generated data by means of the SCOTCH graphBuild routine is consistent, prior to calling any other routines of the libScotch library. Return values SCOTCH graphCheck returns 0 if graph data are consistent, and 1 else. 7.5.9 SCOTCH graphSize Synopsis void SCOTCH graphSize (const SCOTCH Graph * SCOTCH Num * SCOTCH Num * scotchfgraphsize (doubleprecision (*) integer integer grafptr, vertptr, edgeptr) grafdat, vertnbr, edgenbr) Description The SCOTCH graphSize routine fills the two areas of type SCOTCH Num pointed to by vertptr and edgeptr with the number of vertices and arcs (that is, twice the number of edges) of the given graph pointed to by grafptr, respectively. Any of these pointers can be set to NULL on input if the corresponding information is not needed. Else, the reference to a dummy area can be provided, where all unwanted data will be written. 74 This routine is useful to get the size of a graph read by means of the SCOTCH graphLoad routine, in order to allocate auxiliary arrays of proper sizes. If the whole structure of the graph is wanted, function SCOTCH graphData should be preferred. 7.5.10 SCOTCH graphData Synopsis void SCOTCH graphData (const SCOTCH Graph * SCOTCH Num * SCOTCH Num * SCOTCH Num ** SCOTCH Num ** SCOTCH Num ** SCOTCH Num ** SCOTCH Num * SCOTCH Num ** SCOTCH Num ** scotchfgraphdata (doubleprecision (*) integer (*) integer integer integer integer integer integer integer integer integer grafptr, baseptr, vertptr, verttab, vendtab, velotab, vlbltab, edgeptr, edgetab, edlotab) grafdat, indxtab, baseval, vertnbr, vertidx, vendidx, veloidx, vlblidx, edgenbr, edgeidx, edloidx) Description The SCOTCH graphData routine is the dual of the SCOTCH graphBuild routine. It is a multiple accessor that returns scalar values and array references. baseptr is the pointer to a location that will hold the graph base value for index arrays (typically 0 for structures built from C and 1 for structures built from Fortran). vertptr is the pointer to a location that will hold the number of vertices. verttab is the pointer to a location that will hold the reference to the adjacency index array, of size *vertptr + 1 if the adjacency array is compact, or of size *vertptr else. vendtab is the pointer to a location that will hold the reference to the adjacency end index array, and is equal to verttab + 1 if the adjacency array is compact. velotab is the pointer to a location that will hold the reference to the vertex load array, of size *vertptr. vlbltab is the pointer to a location that will hold the reference to the vertex label array, of size vertnbr. edgeptr is the pointer to a location that will hold the number of arcs (that is, twice the number of edges). edgetab is the pointer to a location that will hold the reference to the adjacency array, of size at least *edgeptr. edlotab is the pointer to a location that will hold the reference to the arc load array, of size *edgeptr. 75 Any of these pointers can be set to NULL on input if the corresponding information is not needed. Else, the reference to a dummy area can be provided, where all unwanted data will be written. Since there are no pointers in Fortran, a specific mechanism is used to allow users to access graph arrays. The scotchfgraphdata routine is passed an integer array, the first element of which is used as a base address from which all other array indices are computed. Therefore, instead of returning references, the routine returns integers, which represent the starting index of each of the relevant arrays with respect to the base input array, or vertidx, the index of verttab, if they do not exist. For instance, if some base array myarray (1) is passed as parameter indxtab, then the first cell of array verttab will be accessible as myarray(vertidx). In order for this feature to behave properly, the indxtab array must be word-aligned with the graph arrays. This is automatically enforced on most systems, but some care should be taken on systems that allow one to access data that is not word-aligned. On such systems, declaring the array after a dummy doubleprecision array can coerce the compiler into enforcing the proper alignment. 7.5.11 SCOTCH graphStat Synopsis void SCOTCH graphStat (const SCOTCH Graph * SCOTCH Num * SCOTCH Num * SCOTCH Num * double * double * SCOTCH Num * SCOTCH Num * double * double * SCOTCH Num * SCOTCH Num * SCOTCH Num * double * double * 76 grafptr, velominptr, velomaxptr, velosumptr, veloavgptr, velodltptr, degrminptr, degrmaxptr, degravgptr, degrdltptr, edlominptr, edlomaxptr, edlosumptr, edloavgptr, edlodltptr) scotchfgraphstat (doubleprecision (*) integer integer integer doubleprecision doubleprecision integer integer doubleprecision doubleprecision integer integer integer doubleprecision doubleprecision grafdat, velomin, velomax, velosum, veloavg, velodlt, degrmin, degrmax, degravg, degrdlt, edlomin, edlomax, edlosum, edloavg, edlodlt) Description The SCOTCH graphStat routine produces some statistics regarding the graph structure pointed to by grafptr. velomin, velomax, velosum, veloavg and velodlt are the minimum vertex load, the maximum vertex load, the sum of all vertex loads, the average vertex load, and the variance of the vertex loads, respectively. degrmin, degrmax, degravg and degrdlt are the minimum vertex degree, the maximum vertex degree, the average vertex degree, and the variance of the vertex degrees, respectively. edlomin, edlomax, edlosum, edloavg and edlodlt are the minimum edge load, the maximum edge load, the sum of all edge loads, the average edge load, and the variance of the edge loads, respectively. 7.6 Graph mapping and partitioning routines The first two routines provide high-level functionalities and free the user from the burden of calling in sequence several of the low-level routines described afterward. 7.6.1 SCOTCH graphPart Synopsis int SCOTCH graphPart (const SCOTCH Graph * const SCOTCH Num const SCOTCH Strat * SCOTCH Num * scotchfgraphpart (doubleprecision (*) integer doubleprecision (*) integer (*) integer Description 77 grafptr, partnbr, straptr, parttab) grafdat, partnbr, stradat, parttab, ierr) The SCOTCH graphPart routine computes a partition into partnbr parts of the source graph structure pointed to by grafptr, using the graph partitioning strategy pointed to by stratptr, and returns the partition data in the array pointed to by parttab. The parttab array should have been previously allocated, of a size sufficient to hold as many SCOTCH Num integers as there are vertices in the source graph. On return, every array cell holds the number of the part to which the corresponding vertex is mapped. Parts are numbered from 0 to partnbr − 1. Return values SCOTCH graphPart returns 0 if the partition of the graph has been successfully computed, and 1 else. In this latter case, the parttab array may however have been partially or completely filled, but its content is not significant. 7.6.2 SCOTCH graphMap Synopsis int SCOTCH graphMap (const SCOTCH Graph * const SCOTCH Arch * const SCOTCH Strat * SCOTCH Num * scotchfgraphmap (doubleprecision (*) doubleprecision (*) doubleprecision (*) integer (*) integer grafptr, archptr, straptr, parttab) grafdat, archdat, stradat, parttab, ierr) Description The SCOTCH graphMap routine computes a mapping of the source graph structure pointed to by grafptr onto the target architecture pointed to by archptr, using the mapping strategy pointed to by straptr, and returns the mapping data in the array pointed to by parttab. The parttab array should have been previously allocated, of a size sufficient to hold as many SCOTCH Num integers as there are vertices in the source graph. On return, every cell of the mapping array holds the number of the target vertex to which the corresponding source vertex is mapped. The numbering of target values is not based: target vertices are numbered from 0 to the number of target vertices minus 1. Return values SCOTCH graphMap returns 0 if the partition of the graph has been successfully computed, and 1 else. In this last case, the parttab array may however have been partially or completely filled, but its content is not significant. 78 7.6.3 SCOTCH graphMapInit Synopsis int SCOTCH graphMapInit (const SCOTCH Graph * SCOTCH Mapping * const SCOTCH Arch * SCOTCH Num * scotchfgraphmapinit (doubleprecision (*) doubleprecision (*) doubleprecision (*) integer (*) integer grafptr, mappptr, archptr, parttab) grafdat, mappdat, archdat, parttab, ierr) Description The SCOTCH graphMapInit routine fills the mapping structure pointed to by mappptr with all of the data that is passed to it. Thus, all subsequent calls to ordering routines such as SCOTCH graphMapCompute, using this mapping structure as parameter, will place mapping results in field parttab. parttab is the pointer to an array of as many SCOTCH Nums as there are vertices in the graph pointed to by grafptr, and which will receive the indices of the vertices of the target architecture pointed to by archptr. It should be the first function to be called upon a SCOTCH Mapping structure. When the mapping structure is no longer of use, call function SCOTCH graph MapExit to free its internal structures. Return values SCOTCH graphMapInit returns 0 if the mapping structure has been successfully initialized, and 1 else. 7.6.4 SCOTCH graphMapExit Synopsis void SCOTCH graphMapExit (const SCOTCH Graph * SCOTCH Mapping * scotchfgraphmapexit (doubleprecision (*) doubleprecision (*) grafptr, mappptr) grafdat, mappdat) Description The SCOTCH graphMapExit function frees the contents of a SCOTCH Mapping structure previously initialized by SCOTCH graphMapInit. All subsequent calls to SCOTCH graphMap* routines other than SCOTCH graphMapInit, using this structure as parameter, may yield unpredictable results. 79 7.6.5 SCOTCH graphMapLoad Synopsis int SCOTCH graphMapLoad (const SCOTCH Graph * SCOTCH Mapping * FILE * scotchfgraphmapload (doubleprecision (*) doubleprecision (*) integer integer grafptr, mappptr, stream) grafdat, mappdat, fildes, ierr) Description The SCOTCH graphMapLoad routine fills the SCOTCH Mapping structure pointed to by mappptr with the mapping data available in the Scotch mapping format (see section 5.5) from stream stream. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the mapping file. Return values SCOTCH graphMapLoad returns 0 if the mapping structure has been successfully loaded from stream, and 1 else. 7.6.6 SCOTCH graphMapSave Synopsis int SCOTCH graphMapSave (const SCOTCH Graph * const SCOTCH Mapping * FILE * scotchfgraphmapsave (doubleprecision (*) doubleprecision (*) integer integer grafptr, mappptr, stream) grafdat, mappdat, fildes, ierr) Description The SCOTCH graphMapSave routine saves the contents of the SCOTCH Mapping structure pointed to by mappptr to stream stream, in the Scotch mapping format (see section 5.5). Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the mapping file. Return values SCOTCH graphMapSave returns 0 if the mapping structure has been successfully written to stream, and 1 else. 80 7.6.7 SCOTCH graphMapCompute Synopsis int SCOTCH graphMapCompute (const SCOTCH Graph * SCOTCH Mapping * const SCOTCH Strat * scotchfgraphmapcompute (doubleprecision (*) doubleprecision (*) doubleprecision (*) integer grafptr, mappptr, straptr) grafdat, mappdat, stradat, ierr) Description The SCOTCH graphMapCompute routine computes a mapping on the given SCOTCH Mapping structure pointed to by mappptr using the mapping strategy pointed to by stratptr. On return, every cell of the mapping array (see section 7.6.3) holds the number of the target vertex to which the corresponding source vertex is mapped. The numbering of target values is not based: target vertices are numbered from 0 to the number of target vertices, minus 1. Return values SCOTCH graphMapCompute returns 0 if the mapping has been successfully computed, and 1 else. In this latter case, the mapping array may however have been partially or completely filled, but its content is not significant. 7.6.8 SCOTCH graphMapView Synopsis int SCOTCH graphMapView (const SCOTCH Graph * const SCOTCH Mapping * FILE * scotchfgraphmapview (doubleprecision (*) doubleprecision (*) integer integer grafptr, mappptr, stream) grafdat, mappdat, fildes, ierr) Description The SCOTCH mapView routine summarizes statistical information on the mapping pointed to by mappptr (load of target processors, number of neighboring domains, average dilation and expansion, edge cut size, distribution of edge dilations), and prints these results to stream stream. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the output data file. 81 Return values SCOTCH mapView returns 0 if the data has been successfully written to stream, and 1 else. 7.7 Graph ordering routines The first routine provides high-level functionality and frees the user from the burden of calling in sequence several of the low-level routines described afterward. 7.7.1 SCOTCH graphOrder Synopsis int SCOTCH graphOrder (const SCOTCH Graph * const SCOTCH Strat * SCOTCH Num * SCOTCH Num * SCOTCH Num * SCOTCH Num * SCOTCH Num * scotchfgraphorder (doubleprecision (*) doubleprecision (*) integer (*) integer (*) integer integer (*) integer (*) integer grafptr, straptr, permtab, peritab, cblkptr, rangtab, treetab) grafdat, stradat, permtab, peritab, cblknbr, rangtab, treetab, ierr) Description The SCOTCH graphOrder routine computes a block ordering of the unknowns of the symmetric sparse matrix the adjacency structure of which is represented by the source graph structure pointed to by grafptr, using the ordering strategy pointed to by stratptr, and returns ordering data in the scalar pointed to by cblkptr and the four arrays permtab, peritab, rangtab and treetab. The permtab, peritab, rangtab and treetab arrays should have been previously allocated, of a size sufficient to hold as many SCOTCH Num integers as there are vertices in the source graph, plus one in the case of rangtab. Any of the five output fields can be set to NULL if the corresponding information is not needed. Since, in Fortran, there is no null reference, passing a reference to grafptr in these fields will have the same effect. On return, permtab holds the direct permutation of the unknowns, that is, vertex i of the original graph has index permtab[i] in the reordered graph, while peritab holds the inverse permutation, that is, vertex i in the reordered graph had index peritab[i] in the original graph. All of these indices are numbered according to the base value of the source graph: permutation indices are numbered from baseval to vertnbr + baseval − 1, that is, from 0 to 82 vertnbr − 1 if the graph base is 0, and from 1 to vertnbr if the graph base is 1. The three other result fields, *cblkptr, rangtab and treetab, contain data related to the block structure. *cblkptr holds the number of column blocks of the produced ordering, and rangtab holds the starting indices of each of the permuted column blocks, in increasing order, so that column block i starts at index rangtab[i] and ends at index (rangtab[i+1]−1), inclusive, in the new ordering. treetab holds the separators tree structure, that is, treetab[i] is the index of the father of column block i in the separators tree, or −1 if column block i is the root of the separators tree. Please refer to Section 7.2.5 for more information. Return values SCOTCH graphOrder returns 0 if the ordering of the graph has been successfully computed, and 1 else. In this last case, the rangtab, permtab, and peritab arrays may however have been partially or completely filled, but their contents are not significant. 7.7.2 SCOTCH graphOrderInit Synopsis int SCOTCH graphOrderInit (const SCOTCH Graph * SCOTCH Ordering * SCOTCH Num * SCOTCH Num * SCOTCH Num * SCOTCH Num * SCOTCH Num * scotchfgraphorderinit (doubleprecision (*) doubleprecision (*) integer (*) integer (*) integer integer (*) integer (*) integer grafptr, ordeptr, permtab, peritab, cblkptr, rangtab, treetab) grafdat, ordedat, permtab, peritab, cblknbr, rangtab, treetab, ierr) Description The SCOTCH graphOrderInit routine fills the ordering structure pointed to by ordeptr with all of the data that are passed to it. Thus, all subsequent calls to ordering routines such as SCOTCH graphOrderCompute, using this ordering structure as parameter, will place ordering results in fields permtab, peritab, *cblkptr, rangtab or treetab, if they are not set to NULL. permtab is the ordering permutation array, of size vertnbr, peritab is the inverse ordering permutation array, of size vertnbr, cblkptr is the pointer to a SCOTCH Num that will receive the number of produced column blocks, rangtab is the array that holds the column block span information, of size 83 vertnbr + 1, and treetab is the array holding the structure of the separators tree, of size vertnbr. See the above manual page of SCOTCH graphOrder, as well as section 7.2.5, for an explanation of the semantics of all of these fields. The SCOTCH graphOrderInit routine should be the first function to be called upon a SCOTCH Ordering structure for ordering graphs. When the ordering structure is no longer of use, the SCOTCH graphOrderExit function must be called, in order to to free its internal structures. Return values SCOTCH graphOrderInit returns 0 if the ordering structure has been successfully initialized, and 1 else. 7.7.3 SCOTCH graphOrderExit Synopsis void SCOTCH graphOrderExit (const SCOTCH Graph * SCOTCH Ordering * scotchfgraphorderexit (doubleprecision (*) doubleprecision (*) grafptr, ordeptr) grafdat, ordedat) Description The SCOTCH graphOrderExit function frees the contents of a SCOTCH Ordering structure previously initialized by SCOTCH graphOrderInit. All subsequent calls to SCOTCH graphOrder* routines other than SCOTCH graph OrderInit, using this structure as parameter, may yield unpredictable results. 7.7.4 SCOTCH graphOrderLoad Synopsis int SCOTCH graphOrderLoad (const SCOTCH Graph * SCOTCH Ordering * FILE * scotchfgraphorderload (doubleprecision (*) doubleprecision (*) integer integer grafptr, ordeptr, stream) grafdat, ordedat, fildes, ierr) Description The SCOTCH graphOrderLoad routine fills the SCOTCH Ordering structure pointed to by ordeptr with the ordering data available in the Scotch ordering format (see section 5.6) from stream stream. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the ordering file. 84 Return values SCOTCH graphOrderLoad returns 0 if the ordering structure has been successfully loaded from stream, and 1 else. 7.7.5 SCOTCH graphOrderSave Synopsis int SCOTCH graphOrderSave (const SCOTCH Graph * const SCOTCH Ordering * FILE * scotchfgraphordersave (doubleprecision (*) doubleprecision (*) integer integer grafptr, ordeptr, stream) grafdat, ordedat, fildes, ierr) Description The SCOTCH graphOrderSave routine saves the contents of the SCOTCH Ordering structure pointed to by ordeptr to stream stream, in the Scotch ordering format (see section 5.6). Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the ordering file. Return values SCOTCH graphOrderSave returns 0 if the ordering structure has been successfully written to stream, and 1 else. 7.7.6 SCOTCH graphOrderSaveMap Synopsis int SCOTCH graphOrderSaveMap (const SCOTCH Graph * const SCOTCH Ordering * FILE * scotchfgraphordersavemap (doubleprecision (*) doubleprecision (*) integer integer grafptr, ordeptr, stream) grafdat, ordedat, fildes, ierr) Description The SCOTCH graphOrderSaveMap routine saves the block partitioning data associated with the SCOTCH Ordering structure pointed to by ordeptr to stream stream, in the Scotch mapping format (see section 5.5). A target domain number is associated with every block, such that all node vertices belonging to the same block are shown as belonging to the same target vertex. The 85 resulting mapping file can be used by the gout program (see Section 6.3.12) to produce pictures showing the different separators and blocks. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the mapping file. Return values SCOTCH graphOrderSaveMap returns 0 if the ordering structure has been successfully written to stream, and 1 else. 7.7.7 SCOTCH graphOrderSaveTree Synopsis int SCOTCH graphOrderSaveTree (const SCOTCH Graph * const SCOTCH Ordering * FILE * scotchfgraphordersavetree (doubleprecision (*) doubleprecision (*) integer integer grafptr, ordeptr, stream) grafdat, ordedat, fildes, ierr) Description The SCOTCH graphOrderSaveTree routine saves the tree hierarchy information associated with the SCOTCH Ordering structure pointed to by ordeptr to stream stream. The format of the tree output file resembles the one of a mapping or ordering file: it is made up of as many lines as there are vertices in the ordering. Each of these lines holds two integer numbers. The first one is the index or the label of the vertex, and the second one is the index of its parent node in the separators tree, or −1 if the vertex belongs to a root node. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the tree mapping file. Return values SCOTCH graphOrderSaveTree returns 0 if the separators tree structure has been successfully written to stream, and 1 else. 7.7.8 SCOTCH graphOrderCheck Synopsis int SCOTCH graphOrderCheck (const SCOTCH Graph * const SCOTCH Ordering * scotchfgraphordercheck (doubleprecision (*) doubleprecision (*) integer 86 grafptr, ordeptr) grafdat, ordedat, ierr) Description The SCOTCH graphOrderCheck routine checks the consistency of the given SCOTCH Ordering structure pointed to by ordeptr. Return values SCOTCH graphOrderCheck returns 0 if ordering data are consistent, and 1 else. 7.7.9 SCOTCH graphOrderCompute Synopsis int SCOTCH graphOrderCompute (const SCOTCH Graph * SCOTCH Ordering * const SCOTCH Strat * scotchfgraphordercompute (doubleprecision (*) doubleprecision (*) doubleprecision (*) integer grafptr, ordeptr, straptr) grafdat, ordedat, stradat, ierr) Description The SCOTCH graphOrderCompute routine computes a block ordering of the graph structure pointed to by grafptr, using the ordering strategy pointed to by stratptr, and stores its result in the ordering structure pointed to by ordeptr. On return, the ordering structure holds a block ordering of the given graph (see section 7.7.2 for a description of the ordering fields). Return values SCOTCH graphOrderCompute returns 0 if the ordering has been successfully computed, and 1 else. In this latter case, the ordering arrays may however have been partially or completely filled, but their contents are not significant. 7.7.10 SCOTCH graphOrderComputeList Synopsis int SCOTCH graphOrderComputeList (const SCOTCH Graph * SCOTCH Ordering * SCOTCH Num SCOTCH Num * const SCOTCH Strat * scotchfgraphordercompute (doubleprecision (*) doubleprecision (*) integer integer (*) doubleprecision (*) integer 87 grafdat, ordedat, listnbr, listtab, stradat, ierr) grafptr, ordeptr, listnbr, listtab, straptr) Description The SCOTCH graphOrderComputeList routine computes a block ordering of a subgraph of the graph structure pointed to by grafptr, using the ordering strategy pointed to by stratptr, and stores its result in the ordering structure pointed to by ordeptr. The induced subgraph is described by means of a vertex list: listnbr holds the number of vertices to keep in the induced subgraph, the indices of which are given, in any order, in the listtab array. On return, the ordering structure holds a block ordering of the induced subgraph (see section 7.2.5 for a description of the ordering fields). To compute this ordering, graph ordering methods such as the minimum degree and minimum fill methods will base on the original degree of the induced graph vertices, their non-induced neighbors being considered as halo vertices (see Section 3.2.3 for more information on halo vertices). Because an ordering always refers to the full graph, the ordering computed by SCOTCH graphOrderComputeList is divided into two distinct parts: the induced graph vertices are ordered by applying to the induced graph the strategy provided by the stratptr parameter, while non-induced vertex are ordered consecutively with the highest available indices. Consequently, the permuted indices of induced vertices range from baseval to (listnbr + baseval − 1), while the permuted indices of the remaining vertices range from (listnbr + baseval) to (vertnbr + baseval − 1), inclusive. The separation tree yielded by SCOTCH graphOrderComputeList reflects this property: it is made of two branches, the first one corresponding to the induced subgraph, and the second one to the remaining vertices. Since these two subgraphs are not considered to be connected, both will have their own root, represented by a −1 value in the treetab array of the ordering. Return values SCOTCH graphOrderComputeList returns 0 if the ordering has been successfully computed, and 1 else. In this latter case, the ordering arrays may however have been partially or completely filled, but their contents are not significant. 7.8 7.8.1 Mesh handling routines SCOTCH meshInit Synopsis int SCOTCH meshInit (SCOTCH Mesh * meshptr) scotchfmeshinit (doubleprecision (*) integer meshdat, ierr) Description The SCOTCH meshInit function initializes a SCOTCH Mesh structure so as to make it suitable for future operations. It should be the first function to be called upon a SCOTCH Mesh structure. When the mesh data is no longer of use, call function SCOTCH meshExit to free its internal structures. 88 Return values SCOTCH meshInit returns 0 if the mesh structure has been successfully initialized, and 1 else. 7.8.2 SCOTCH meshExit Synopsis void SCOTCH meshExit (SCOTCH Mesh * meshptr) scotchfmeshexit (doubleprecision (*) meshdat) Description The SCOTCH meshExit function frees the contents of a SCOTCH Mesh structure previously initialized by SCOTCH meshInit. All subsequent calls to SCOTCH mesh* routines other than SCOTCH meshInit, using this structure as parameter, may yield unpredictable results. 7.8.3 SCOTCH meshLoad Synopsis int SCOTCH meshLoad (SCOTCH Mesh * FILE * SCOTCH Num meshptr, stream, baseval) scotchfmeshload (doubleprecision (*) integer integer integer meshdat, fildes, baseval, ierr) Description The SCOTCH meshLoad routine fills the SCOTCH Mesh structure pointed to by meshptr with the source mesh description available from stream stream in the Scotch mesh format (see section 5.2). To ease the handling of source mesh files by programs written in C as well as in Fortran, The base value of the mesh to read can be set to 0 or 1, by setting the baseval parameter to the proper value. A value of -1 indicates that the mesh base should be the same as the one provided in the mesh description that is read from stream. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the mesh file. Return values SCOTCH meshLoad returns 0 if the mesh structure has been successfully allocated and filled with the data read, and 1 else. 89 7.8.4 SCOTCH meshSave Synopsis int SCOTCH meshSave (const SCOTCH Mesh * FILE * scotchfmeshsave (doubleprecision (*) integer integer meshptr, stream) meshdat, fildes, ierr) Description The SCOTCH meshSave routine saves the contents of the SCOTCH Mesh structure pointed to by meshptr to stream stream, in the Scotch mesh format (see section 5.2). Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the mesh file. Return values SCOTCH meshSave returns 0 if the mesh structure has been successfully written to stream, and 1 else. 7.8.5 SCOTCH meshBuild Synopsis int SCOTCH meshBuild (SCOTCH Mesh * const SCOTCH Num const SCOTCH Num const SCOTCH Num const SCOTCH Num const SCOTCH Num const SCOTCH Num const SCOTCH Num const SCOTCH Num const SCOTCH Num const SCOTCH Num const SCOTCH Num 90 * * * * * * meshptr, velmbas, vnodbas, velmnbr, vnodnbr, verttab, vendtab, velotab, vnlotab, vlbltab, edgenbr, edgetab) scotchfmeshbuild (doubleprecision (*) integer integer integer integer integer (*) integer (*) integer (*) integer (*) integer (*) integer integer (*) integer meshdat, velmbas, vnodbas, velmnbr, vnodnbr, verttab, vendtab, velotab, vnlotab, vlbltab, edgenbr, edgetab, ierr) Description The SCOTCH meshBuild routine fills the source mesh structure pointed to by meshptr with all of the data that is passed to it. velmbas and vnodbas are the base values for the element and node vertices, respectively. velmnbr and vnodnbr are the number of element and node vertices, respectively, such that either velmbas + velmnbr = vnodnbr or vnodbas+ vnodnbr = velmnbr holds, and typically min(velmbas, vnodbas) is 0 for structures built from C and 1 for structures built from Fortran. verttab is the adjacency index array, of size (velmnbr + vnodnbr + 1) if the edge array is compact (that is, if vendtab equals vendtab + 1 or NULL), or of size (velmnbr + vnodnbr1) else. vendtab is the adjacency end index array, of size (velmnbr + vnodnbr) if it is disjoint from verttab. velotab is the element vertex load array, of size velmnbr if it exists. vnlotab is the node vertex load array, of size vnodnbr if it exists. vlbltab is the vertex label array, of size (velmnbr+ vnodnbr) if it exists. edgenbr is the number of arcs (that is, twice the number of edges). edgetab is the adjacency array, of size at least edgenbr (it can be more if the edge array is not compact). The vendtab, velotab, vnlotab and vlbltab arrays are optional, and a NULL pointer can be passed as argument whenever they are not defined. Since, in Fortran, there is no null reference, passing the scotchfmeshbuild routine a reference equal to verttab in the velotab, vnlotab or vlbltab fields makes them be considered as missing arrays. Setting vendtab to refer to one cell after verttab yields the same result, as it is the exact semantics of a compact vertex array. To limit memory consumption, SCOTCH meshBuild does not copy array data, but instead references them in the SCOTCH Mesh structure. Therefore, great care should be taken not to modify the contents of the arrays passed to SCOTCH meshBuild as long as the mesh structure is in use. Every update of the arrays should be preceded by a call to SCOTCH meshExit, to free internal mesh structures, and eventually followed by a new call to SCOTCH meshBuild to re-build these internal structures so as to be able to use the new mesh. To ensure that inconsistencies in user data do not result in an erroneous behavior of the libScotch routines, it is recommended, at least in the development 91 stage, to call the SCOTCH meshCheck routine on the newly created SCOTCH Mesh structure, prior to any other calls to libScotch routines. Return values SCOTCH meshBuild returns 0 if the mesh structure has been successfully set with all of the input data, and 1 else. 7.8.6 SCOTCH meshCheck Synopsis int SCOTCH meshCheck (const SCOTCH Mesh * scotchfmeshcheck (doubleprecision (*) integer meshptr) meshdat, ierr) Description The SCOTCH meshCheck routine checks the consistency of the given SCOTCH Mesh structure. It can be used in client applications to determine if a mesh that has been created from used-generated data by means of the SCOTCH meshBuild routine is consistent, prior to calling any other routines of the libScotch library. Return values SCOTCH meshCheck returns 0 if mesh data are consistent, and 1 else. 7.8.7 SCOTCH meshSize Synopsis void SCOTCH meshSize (const SCOTCH Mesh * SCOTCH Num * SCOTCH Num * SCOTCH Num * scotchfmeshsize (doubleprecision (*) integer integer integer meshptr, velmptr, vnodptr, edgeptr) meshdat, velmnbr, vnodnbr, edgenbr) Description The SCOTCH meshSize routine fills the three areas of type SCOTCH Num pointed to by velmptr, vnodptr and edgeptr with the number of element vertices, node vertices and arcs (that is, twice the number of edges) of the given mesh pointed to by meshptr, respectively. Any of these pointers can be set to NULL on input if the corresponding information is not needed. Else, the reference to a dummy area can be provided, where all unwanted data will be written. 92 This routine is useful to get the size of a mesh read by means of the SCOTCH meshLoad routine, in order to allocate auxiliary arrays of proper sizes. If the whole structure of the mesh is wanted, function SCOTCH meshData should be preferred. 7.8.8 SCOTCH meshData Synopsis void SCOTCH meshData (const SCOTCH Mesh * SCOTCH Num * SCOTCH Num * SCOTCH Num * SCOTCH Num * SCOTCH Num ** SCOTCH Num ** SCOTCH Num ** SCOTCH Num ** SCOTCH Num ** SCOTCH Num * SCOTCH Num ** SCOTCH Num * scotchfmeshdata (doubleprecision (*) integer (*) integer integer integer integer integer integer integer integer integer integer integer integer meshptr, vebaptr, vnbaptr, velmptr, vnodptr, verttab, vendtab, velotab, vnlotab, vlbltab, edgeptr, edgetab, degrptr) meshdat, indxtab, velobas, vnlobas, velmnbr, vnodnbr, vertidx, vendidx, veloidx, vnloidx, vlblidx, edgenbr, edgeidx, degrmax) Description The SCOTCH meshData routine is the dual of the SCOTCH meshBuild routine. It is a multiple accessor that returns scalar values and array references. vebaptr and vnbaptr are pointers to locations that will hold the mesh base value for elements and nodes, respectively (the minimum of these two values is typically 0 for structures built from C and 1 for structures built from Fortran). velmptr and vnodptr are pointers to locations that will hold the number of element and node vertices, respectively. verttab is the pointer to a location that will hold the reference to the adjacency index array, of size (*velmptr + *vnodptr + 1) if the adjacency array is compact, or of size (*velmptr+*vnodptr) else. vendtab is the pointer to a location that will hold 93 the reference to the adjacency end index array, and is equal to verttab + 1 if the adjacency array is compact. velotab and vnlotab are pointers to locations that will hold the reference to the element and node vertex load arrays, of sizes *velmptr and *vnodptr, respectively. vlbltab is the pointer to a location that will hold the reference to the vertex label array, of size (*velmptr + *vnodptr). edgeptr is the pointer to a location that will hold the number of arcs (that is, twice the number of edges). edgetab is the pointer to a location that will hold the reference to the adjacency array, of size at least edgenbr. degrptr is the pointer to a location that will hold the maximum vertex degree computed across all element and node vertices. Any of these pointers can be set to NULL on input if the corresponding information is not needed. Else, the reference to a dummy area can be provided, where all unwanted data will be written. Since there are no pointers in Fortran, a specific mechanism is used to allow users to access mesh arrays. The scotchfmeshdata routine is passed an integer array, the first element of which is used as a base address from which all other array indices are computed. Therefore, instead of returning references, the routine returns integers, which represent the starting index of each of the relevant arrays with respect to the base input array, or vertidx, the index of verttab, if they do not exist. For instance, if some base array myarray (1) is passed as parameter indxtab, then the first cell of array verttab will be accessible as myarray(vertidx). In order for this feature to behave properly, the indxtab array must be word-aligned with the mesh arrays. This is automatically enforced on most systems, but some care should be taken on systems that allow one to access data that is not word-aligned. On such systems, declaring the array after a dummy doubleprecision array can coerce the compiler into enforcing the proper alignment. 7.8.9 SCOTCH meshStat Synopsis void SCOTCH meshStat (const SCOTCH Mesh * SCOTCH Num * SCOTCH Num * SCOTCH Num * double * double * SCOTCH Num * SCOTCH Num * double * double * SCOTCH Num * SCOTCH Num * double * double * 94 meshptr, vnlominptr, vnlomaxptr, vnlosumptr, vnloavgptr, vnlodltptr, edegminptr, edegmaxptr, edegavgptr, edegdltptr, ndegminptr, ndegmaxptr, ndegavgptr, ndegdltptr) scotchfmeshstat (doubleprecision (*) integer integer integer doubleprecision doubleprecision integer integer doubleprecision doubleprecision integer integer doubleprecision doubleprecision meshdat, vnlomin, vnlomax, vnlosum, vnloavg, vnlodlt, edegmin, edegmax, edegavg, edegdlt, ndegmin, ndegmax, ndegavg, ndegdlt) Description The SCOTCH meshStat routine produces some statistics regarding the mesh structure pointed to by meshptr. vnlomin, vnlomax, vnlosum, vnloavg and vnlodlt are the minimum node vertex load, the maximum node vertex load, the sum of all node vertex loads, the average node vertex load, and the variance of the node vertex loads, respectively. edegmin, edegmax, edegavg and edegdlt are the minimum element vertex degree, the maximum element vertex degree, the average element vertex degree, and the variance of the element vertex degrees, respectively. ndegmin, ndegmax, ndegavg and ndegdlt are the minimum element vertex degree, the maximum element vertex degree, the average element vertex degree, and the variance of the element vertex degrees, respectively. 7.8.10 SCOTCH meshGraph Synopsis int SCOTCH meshGraph (const SCOTCH Mesh * SCOTCH Graph * scotchfmeshgraph (doubleprecision (*) doubleprecision (*) integer meshptr, grafptr) meshdat, grafdat, ierr) Description The SCOTCH meshGraph routine builds a graph from a mesh. It creates in the SCOTCH Graph structure pointed to by grafptr a graph having as many vertices as there are nodes in the SCOTCH Mesh structure pointed to by meshptr, and where there is an edge between any two graph vertices if and only if there exists in the mesh an element containing both of the associated nodes. Consequently, all of the elements of the mesh are turned into cliques in the resulting graph. 95 In order to save memory space as well as computation time, in the current implementation of SCOTCH meshGraph, some mesh arrays are shared with the graph structure. Therefore, one should make sure that the graph must no longer be used after the mesh structure is freed. The graph structure can be freed before or after the mesh structure, but must not be used after the mesh structure is freed. Return values SCOTCH meshGraph returns 0 if the graph structure has been successfully allocated and filled, and 1 else. 7.9 Mesh ordering routines The first routine provides high-level functionality and frees the user from the burden of calling in sequence several of the low-level routines described afterward. 7.9.1 SCOTCH meshOrder Synopsis int SCOTCH meshOrder (const SCOTCH Mesh * const SCOTCH Strat * SCOTCH Num * SCOTCH Num * SCOTCH Num * SCOTCH Num * SCOTCH Num * scotchfmeshorder (doubleprecision (*) doubleprecision (*) integer (*) integer (*) integer integer (*) integer (*) integer meshptr, straptr, permtab, peritab, cblkptr, rangtab, treetab) meshdat, stradat, permtab, peritab, cblknbr, rangtab, treetab, ierr) Description The SCOTCH meshOrder routine computes a block ordering of the unknowns of the symmetric sparse matrix the adjacency structure of which is represented by the elements that connect the nodes of the source mesh structure pointed to by meshptr, using the ordering strategy pointed to by stratptr, and returns ordering data in the scalar pointed to by cblkptr and the four arrays permtab, peritab, rangtab and treetab. The permtab, peritab, rangtab and treetab arrays should have been previously allocated, of a size sufficient to hold as many SCOTCH Num integers as there are node vertices in the source mesh, plus one in the case of rangtab. Any of the five output fields can be set to NULL if the corresponding information is not needed. Since, in Fortran, there is no null reference, passing a reference to meshptr in these fields will have the same effect. 96 On return, permtab holds the direct permutation of the unknowns, that is, node vertex i of the original mesh has index permtab[i] in the reordered mesh, while peritab holds the inverse permutation, that is, node vertex i in the reordered mesh had index peritab[i] in the original mesh. All of these indices are numbered according to the base value of the source mesh: permutation indices are numbered from min(velmbas, vnodbas) to vnodnbr+ min(velmbas, vnodbas) − 1, that is, from 0 to vnodnbr − 1 if the mesh base is 0, and from 1 to vnodnbr if the mesh base is 1. The base value for mesh orderings is taken as min(velmbas, vnodbas), and not just as vnodbas, such that orderings that are computed on some mesh have exactly the same index range as orderings that would be computed on the graph obtained from the original mesh by means of the SCOTCH meshGraph routine. The three other result fields, *cblkptr, rangtab and treetab, contain data related to the block structure. *cblkptr holds the number of column blocks of the produced ordering, and rangtab holds the starting indices of each of the permuted column blocks, in increasing order, so that column block i starts at index rangtab[i] and ends at index (rangtab[i+1]−1), inclusive, in the new ordering. treetab holds the separators tree structure, that is, treetab[i] is the index of the father of column block i in the separators tree, or −1 if column block i is the root of the separators tree. Please refer to Section 7.2.5 for more information. Return values SCOTCH meshOrder returns 0 if the ordering of the mesh has been successfully computed, and 1 else. In this last case, the rangtab, permtab, and peritab arrays may however have been partially or completely filled, but their contents are not significant. 7.9.2 SCOTCH meshOrderInit Synopsis int SCOTCH meshOrderInit (const SCOTCH Mesh * SCOTCH Ordering * SCOTCH Num * SCOTCH Num * SCOTCH Num * SCOTCH Num * SCOTCH Num * scotchfmeshorderinit (doubleprecision (*) doubleprecision (*) integer (*) integer (*) integer integer (*) integer (*) integer Description 97 meshptr, ordeptr, permtab, peritab, cblkptr, rangtab, treetab) meshdat, ordedat, permtab, peritab, cblknbr, rangtab, treetab, ierr) The SCOTCH meshOrderInit routine fills the ordering structure pointed to by ordeptr with all of the data that are passed to it. Thus, all subsequent calls to ordering routines such as SCOTCH meshOrderCompute, using this ordering structure as parameter, will place ordering results in fields permtab, peritab, *cblkptr, rangtab or treetab, if they are not set to NULL. permtab is the ordering permutation array, of size vnodnbr, peritab is the inverse ordering permutation array, of size vnodnbr, cblkptr is the pointer to a SCOTCH Num that will receive the number of produced column blocks, rangtab is the array that holds the column block span information, of size vnodnbr + 1, and treetab is the array holding the structure of the separators tree, of size vnodnbr. See the above manual page of SCOTCH meshOrder, as well as section 7.2.5, for an explanation of the semantics of all of these fields. The SCOTCH meshOrderInit routine should be the first function to be called upon a SCOTCH Ordering structure for ordering meshes. When the ordering structure is no longer of use, the SCOTCH meshOrderExit function must be called, in order to to free its internal structures. Return values SCOTCH meshOrderInit returns 0 if the ordering structure has been successfully initialized, and 1 else. 7.9.3 SCOTCH meshOrderExit Synopsis void SCOTCH meshOrderExit (const SCOTCH Mesh * SCOTCH Ordering * scotchfmeshorderexit (doubleprecision (*) doubleprecision (*) meshptr, ordeptr) meshdat, ordedat) Description The SCOTCH meshOrderExit function frees the contents of a SCOTCH Ordering structure previously initialized by SCOTCH meshOrderInit. All subsequent calls to SCOTCH meshOrder* routines other than SCOTCH meshOrderInit, using this structure as parameter, may yield unpredictable results. 7.9.4 SCOTCH meshOrderSave Synopsis int SCOTCH meshOrderSave (const SCOTCH Mesh * const SCOTCH Ordering * FILE * scotchfmeshordersave (doubleprecision (*) doubleprecision (*) integer integer 98 meshptr, ordeptr, stream) meshdat, ordedat, fildes, ierr) Description The SCOTCH meshOrderSave routine saves the contents of the SCOTCH Ordering structure pointed to by ordeptr to stream stream, in the Scotch ordering format (see section 5.6). Return values SCOTCH meshOrderSave returns 0 if the ordering structure has been successfully written to stream, and 1 else. 7.9.5 SCOTCH meshOrderSaveMap Synopsis int SCOTCH meshOrderSaveMap (const SCOTCH Mesh * const SCOTCH Ordering * FILE * scotchfmeshordersavemap (doubleprecision (*) doubleprecision (*) integer integer meshptr, ordeptr, stream) meshdat, ordedat, fildes, ierr) Description The SCOTCH meshOrderSaveMap routine saves the block partitioning data associated with the SCOTCH Ordering structure pointed to by ordeptr to stream stream, in the Scotch mapping format (see section 5.5). A target domain number is associated with every block, such that all node vertices belonging to the same block are shown as belonging to the same target vertex. This mapping file can then be used by the gout program (see section 6.3.12) to produce pictures showing the different separators and blocks. Since gout only takes graphs as input, the mesh has to be converted into a graph by means of the gmk msh program (see section 6.3.8). Return values SCOTCH meshOrderSaveMap returns 0 if the ordering structure has been successfully written to stream, and 1 else. 7.9.6 SCOTCH meshOrderSaveTree Synopsis int SCOTCH meshOrderSaveTree (const SCOTCH Mesh * const SCOTCH Ordering * FILE * scotchfmeshordersavetree (doubleprecision (*) doubleprecision (*) integer integer 99 meshptr, ordeptr, stream) meshdat, ordedat, fildes, ierr) Description The SCOTCH meshOrderSaveTree routine saves the tree hierarchy information associated with the SCOTCH Ordering structure pointed to by ordeptr to stream stream. The format of the tree output file resembles the one of a mapping or ordering file: it is made up of as many lines as there are node vertices in the ordering. Each of these lines holds two integer numbers. The first one is the index or the label of the node vertex, starting from baseval, and the second one is the index of its parent node in the separators tree, or −1 if the vertex belongs to a root node. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the tree mapping file. Return values SCOTCH meshOrderSaveTree returns 0 if the separators tree structure has been successfully written to stream, and 1 else. 7.9.7 SCOTCH meshOrderCheck Synopsis int SCOTCH meshOrderCheck (const SCOTCH Mesh * const SCOTCH Ordering * scotchfmeshordercheck (doubleprecision (*) doubleprecision (*) integer meshptr, ordeptr) meshdat, ordedat, ierr) Description The SCOTCH meshOrderCheck routine checks the consistency of the given SCOTCH Ordering structure pointed to by ordeptr. Return values SCOTCH meshOrderCheck returns 0 if ordering data are consistent, and 1 else. 7.9.8 SCOTCH meshOrderCompute Synopsis int SCOTCH meshOrderCompute (const SCOTCH Mesh * SCOTCH Ordering * const SCOTCH Strat * scotchfmeshordercompute (doubleprecision (*) doubleprecision (*) doubleprecision (*) integer 100 meshptr, ordeptr, straptr) meshdat, ordedat, stradat, ierr) Description The SCOTCH meshOrderCompute routine computes a block ordering of the mesh structure pointed to by grafptr, using the mapping strategy pointed to by stratptr, and stores its result in the ordering structure pointed to by ordeptr. On return, the ordering structure holds a block ordering of the given mesh (see section 7.9.2 for a description of the ordering fields). Return values SCOTCH meshOrderCompute returns 0 if the ordering has been successfully computed, and 1 else. In this latter case, the ordering arrays may however have been partially or completely filled, but their contents are not significant. 7.10 Strategy handling routines 7.10.1 SCOTCH stratInit Synopsis int SCOTCH stratInit (SCOTCH Strat * scotchfstratinit (doubleprecision (*) integer straptr) stradat, ierr) Description The SCOTCH stratInit function initializes a SCOTCH Strat structure so as to make it suitable for future operations. It should be the first function to be called upon a SCOTCH Strat structure. When the strategy data is no longer of use, call function SCOTCH stratExit to free its internal structures. Return values SCOTCH stratInit returns 0 if the strategy structure has been successfully initialized, and 1 else. 7.10.2 SCOTCH stratExit Synopsis void SCOTCH stratExit (SCOTCH Strat * scotchfstratexit (doubleprecision (*) archptr) stradat) Description The SCOTCH stratExit function frees the contents of a SCOTCH Strat structure previously initialized by SCOTCH stratInit. All subsequent calls to SCOTCH strat routines other than SCOTCH stratInit, using this structure as parameter, may yield unpredictable results. 101 7.10.3 SCOTCH stratSave Synopsis int SCOTCH stratSave (const SCOTCH Strat * FILE * scotchfstratsave (doubleprecision (*) integer integer straptr, stream) stradat, fildes, ierr) Description The SCOTCH stratSave routine saves the contents of the SCOTCH Strat structure pointed to by straptr to stream stream, in the form of a text string. The methods and parameters of the strategy string depend on the type of the strategy, that is, whether it is a bipartitioning, mapping, or ordering strategy, and to which structure it applies, that is, graphs or meshes. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor fildes associated with the logical unit of the output file. Return values SCOTCH stratSave returns 0 if the strategy string has been successfully written to stream, and 1 else. 7.10.4 SCOTCH stratGraphBipart Synopsis int SCOTCH stratGraphBipart (SCOTCH Strat * const char * scotchfstratgraphbipart (doubleprecision (*) character (*) integer straptr, string) stradat, string, ierr) Description The SCOTCH stratGraphBipart routine fills the strategy structure pointed to by straptr with the graph bipartitioning strategy string pointed to by string. From this point, the strategy structure can only be used as a graph bipartitioning strategy, to be used by function SCOTCH archBuild, for instance. When using the C interface, the array of characters pointed to by string must be null-terminated. Return values SCOTCH stratGraphBipart returns 0 if the strategy string has been successfully set, and 1 else. 102 7.10.5 SCOTCH stratGraphMap Synopsis int SCOTCH stratGraphMap (SCOTCH Strat * const char * straptr, string) scotchfstratgraphmap (doubleprecision (*) character (*) integer stradat, string, ierr) Description The SCOTCH stratGraphMap routine fills the strategy structure pointed to by straptr with the graph mapping strategy string pointed to by string. From this point, the strategy structure can only be used as a mapping strategy, to be used by function SCOTCH graphMap, for instance. When using the C interface, the array of characters pointed to by string must be null-terminated. Return values SCOTCH stratGraphMap returns 0 if the strategy string has been successfully set, and 1 else. 7.10.6 SCOTCH stratGraphOrder Synopsis int SCOTCH stratGraphOrder (SCOTCH Strat * const char * scotchfstratgraphorder (doubleprecision (*) character (*) integer straptr, string) stradat, string, ierr) Description The SCOTCH stratGraphOrder routine fills the strategy structure pointed to by straptr with the graph ordering strategy string pointed to by string. From this point, the strategy structure can only be used as a graph ordering strategy, to be used by function SCOTCH graphOrder, for instance. When using the C interface, the array of characters pointed to by string must be null-terminated. Return values SCOTCH stratGraphOrder returns 0 if the strategy string has been successfully set, and 1 else. 103 7.10.7 SCOTCH stratMeshOrder Synopsis int SCOTCH stratMeshOrder (SCOTCH Strat * const char * straptr, string) scotchfstratmeshorder (doubleprecision (*) character (*) integer stradat, string, ierr) Description The SCOTCH stratMeshOrder routine fills the strategy structure pointed to by straptr with the mesh ordering strategy string pointed to by string. From this point, strategy strat can only be used as a mesh ordering strategy, to be used by function SCOTCH meshOrder, for instance. When using the C interface, the array of characters pointed to by string must be null-terminated. Return values SCOTCH stratMeshOrder returns 0 if the strategy string has been successfully set, and 1 else. 7.11 Geometry handling routines Since the Scotch project is based on algorithms that rely on topology data only, geometry data do not play an important role in the libScotch library. They are only relevant to programs that display graphs, such as the gout program. However, since all routines that are used by the programs of the Scotch distributions have an interface in the libScotch library, there exist geometry handling routines in it, which manipulate SCOTCH Geom structures. Apart from the routines that create, destroy or access SCOTCH Geom structures, all of the routines in this section are input/output routines, which read or write both SCOTCH Graph and SCOTCH Geom structures. We have chosen to define the interface of the geometry-handling routines such that they also handle graph or mesh topology because some external file formats mix these data, and that we wanted our routines to be able to read their data on the fly from streams that can only be read once, such as communication pipes. Having both aspects taken into account in a single call makes the writing of file conversion tools, such as gcv and mcv, very easy. When the file format from which to read or into which to write mixes both sorts of data, the geometry file pointer can be set to NULL, as it will not be used. 7.11.1 SCOTCH geomInit Synopsis int SCOTCH geomInit (SCOTCH Geom * 104 geomptr) scotchfgeominit (doubleprecision (*) integer geomdat, ierr) Description The SCOTCH geomInit function initializes a SCOTCH Geom structure so as to make it suitable for future operations. It should be the first function to be called upon a SCOTCH Geom structure. When the geometrical data is no longer of use, call function SCOTCH geomExit to free its internal structures. Return values SCOTCH geomInit returns 0 if the geometrical structure has been successfully initialized, and 1 else. 7.11.2 SCOTCH geomExit Synopsis void SCOTCH geomExit (SCOTCH Geom * scotchfgeomexit (doubleprecision (*) geomptr) geomdat) Description The SCOTCH geomExit function frees the contents of a SCOTCH Geom structure previously initialized by SCOTCH geomInit. All subsequent calls to SCOTCH *Geom* routines other than SCOTCH geomInit, using this structure as parameter, may yield unpredictable results. 7.11.3 SCOTCH geomData Synopsis void SCOTCH geomData (const SCOTCH Geom * SCOTCH Num * double ** scotchfgeomdata (doubleprecision (*) doubleprecision (*) integer integer geomptr, dimnptr, geomtab) geomdat, indxtab, dimnnbr, geomidx) Description The SCOTCH geomData routine is a multiple accessor to the contents of SCOTCH Geom structures. dimnptr is the pointer to a location that will hold the number of dimensions of the graph vertex or mesh node vertex coordinates, and will therefore be 105 equal to 1, 2 or 3. geomtab is the pointer to a location that will hold the reference to the geometry coordinates, as defined in section 7.2.4. Any of these pointers can be set to NULL on input if the corresponding information is not needed. Else, the reference to a dummy area can be provided, where all unwanted data will be written. Since there are no pointers in Fortran, a specific mechanism is used to allow users to access the coordinate array. The scotchfgeomdata routine is passed an integer array, the first element of which is used as a base address from which all other array indices are computed. Therefore, instead of returning a reference, the routine returns an integer, which represents the starting index of the coordinate array with respect to the base input array. For instance, if some base array myarray(1) is passed as parameter indxtab, then the first cell of array geomtab will be accessible as myarray(geomidx). In order for this feature to behave properly, the indxtab array must be double-precisionaligned with the geometry array. This is automatically enforced on most systems, but some care should be taken on systems that allow one to access data that is not double-aligned. On such systems, declaring the array after a dummy doubleprecision array can coerce the compiler into enforcing the proper alignment. 7.11.4 SCOTCH graphGeomLoadChac Synopsis int SCOTCH graphGeomLoadChac (SCOTCH Graph * SCOTCH Geom * FILE * FILE * const char * scotchfgraphgeomloadchac (doubleprecision (*) doubleprecision (*) integer integer character (*) grafptr, geomptr, grafstream, geomstream, string) grafdat, geomdat, graffildes, geomfildes, string) Description The SCOTCH graphGeomLoadChac routine fills the SCOTCH Graph structure pointed to by grafptr with the source graph description available from stream grafstream in the Chaco graph format [24]. Since this graph format does not handle geometry data, the geomptr and geomstream fields are not used, as well as the string field. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor graffildes associated with the logical unit of the graph file. Return values SCOTCH graphGeomLoadChac returns 0 if the graph structure has been successfully allocated and filled with the data read, and 1 else. 106 7.11.5 SCOTCH graphGeomSaveChac Synopsis int SCOTCH graphGeomSaveChac (const SCOTCH Graph * const SCOTCH Geom * FILE * FILE * const char * scotchfgraphgeomsavechac (doubleprecision (*) doubleprecision (*) integer integer character (*) grafptr, geomptr, grafstream, geomstream, string) grafdat, geomdat, graffildes, geomfildes, string) Description The SCOTCH graphGeomSaveChac routine saves the contents of the SCOTCH Graph structure pointed to by grafptr to stream grafstream, in the Chaco graph format [24]. Since this graph format does not handle geometry data, the geomptr and geomstream fields are not used, as well as the string field. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor graffildes associated with the logical unit of the graph file. Return values SCOTCH graphGeomSaveChac returns 0 if the graph structure has been successfully written to grafstream, and 1 else. 7.11.6 SCOTCH graphGeomLoadHabo Synopsis int SCOTCH graphGeomLoadHabo (SCOTCH Graph * SCOTCH Geom * FILE * FILE * const char * scotchfgraphgeomloadhabo (doubleprecision (*) doubleprecision (*) integer integer character (*) grafptr, geomptr, grafstream, geomstream, string) grafdat, geomdat, graffildes, geomfildes, string) Description The SCOTCH graphGeomLoadHabo routine fills the SCOTCH Graph structure pointed to by grafptr with the source graph description available from stream grafstream in the Harwell-Boeing square assembled matrix format [10]. Since 107 this graph format does not handle geometry data, the geomptr and geom stream fields are not used. Since multiple graph structures can be encoded sequentially within the same file, the string field contains the string representation of an integer number that codes the rank of the graph to read within the Harwell-Boeing file. It is equal to “0” in most cases. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor graffildes associated with the logical unit of the graph file. Return values SCOTCH graphGeomLoadHabo returns 0 if the graph structure has been successfully allocated and filled with the data read, and 1 else. 7.11.7 SCOTCH graphGeomLoadScot Synopsis int SCOTCH graphGeomLoadScot (SCOTCH Graph * SCOTCH Geom * FILE * FILE * const char * scotchfgraphgeomloadscot (doubleprecision (*) doubleprecision (*) integer integer character (*) grafptr, geomptr, grafstream, geomstream, string) grafdat, geomdat, graffildes, geomfildes, string) Description The SCOTCH graphGeomLoadScot routine fills the SCOTCH Graph and SCOTCH Geom structures pointed to by grafptr and geomptr with the source graph description and geometry data available from streams grafstream and geom stream in the Scotch graph and geometry formats (see sections 5.1 and 5.3, respectively). The string field is not used. Fortran users must use the FNUM function to obtain the numbers of the Unix file descriptors graffildes and geomfildes associated with the logical units of the graph and geometry files. Return values SCOTCH graphGeomLoadScot returns 0 if the graph topology and geometry have been successfully allocated and filled with the data read, and 1 else. 7.11.8 SCOTCH graphGeomSaveScot Synopsis 108 int SCOTCH graphGeomSaveScot (const SCOTCH Graph * const SCOTCH Geom * FILE * FILE * const char * scotchfgraphgeomsavescot (doubleprecision (*) doubleprecision (*) integer integer character (*) grafptr, geomptr, grafstream, geomstream, string) grafdat, geomdat, graffildes, geomfildes, string) Description The SCOTCH graphGeomSaveScot routine saves the contents of the SCOTCH Graph and SCOTCH Geom structures pointed to by grafptr and geomptr to streams grafstream and geomstream, in the Scotch graph and geometry formats (see sections 5.1 and 5.3, respectively). The string field is not used. Fortran users must use the FNUM function to obtain the numbers of the Unix file descriptors graffildes and geomfildes associated with the logical units of the graph and geometry files. Return values SCOTCH graphGeomSaveScot returns 0 if the graph topology and geometry have been successfully written to grafstream and geomstream, and 1 else. 7.11.9 SCOTCH meshGeomLoadHabo Synopsis int SCOTCH meshGeomLoadHabo (SCOTCH Mesh * SCOTCH Geom * FILE * FILE * const char * meshptr, geomptr, meshstream, geomstream, string) scotchfmeshgeomloadhabo (doubleprecision (*) doubleprecision (*) integer integer character (*) meshdat, geomdat, meshfildes, geomfildes, string) Description The SCOTCH meshGeomLoadHabo routine fills the SCOTCH Mesh structure pointed to by meshptr with the source mesh description available from stream meshstream in the Harwell-Boeing square elemental matrix format [10]. Since this mesh format does not handle geometry data, the geomptr and geom stream fields are not used. Since multiple mesh structures can be encoded 109 sequentially within the same file, the string field contains the string representation of an integer number that codes the rank of the mesh to read within the Harwell-Boeing file. It is equal to “0” in most cases. Fortran users must use the FNUM function to obtain the number of the Unix file descriptor meshfildes associated with the logical unit of the mesh file. Return values SCOTCH meshGeomLoadHabo returns 0 if the mesh structure has been successfully allocated and filled with the data read, and 1 else. 7.11.10 SCOTCH meshGeomLoadScot Synopsis int SCOTCH meshGeomLoadScot (SCOTCH Mesh * SCOTCH Geom * FILE * FILE * const char * meshptr, geomptr, meshstream, geomstream, string) scotchfmeshgeomloadscot (doubleprecision (*) doubleprecision (*) integer integer character (*) meshdat, geomdat, meshfildes, geomfildes, string) Description The SCOTCH meshGeomLoadScot routine fills the SCOTCH Mesh and SCOTCH Geom structures pointed to by meshptr and geomptr with the source mesh description and node geometry data available from streams meshstream and geomstream in the Scotch mesh and geometry formats (see sections 5.2 and 5.3, respectively). The string field is not used. Fortran users must use the FNUM function to obtain the numbers of the Unix file descriptors meshfildes and geomfildes associated with the logical units of the mesh and geometry files. Return values SCOTCH meshGeomLoadScot returns 0 if the mesh topology and node geometry have been successfully allocated and filled with the data read, and 1 else. 7.11.11 SCOTCH meshGeomSaveScot Synopsis int SCOTCH meshGeomSaveScot (const SCOTCH Mesh * const SCOTCH Geom * FILE * FILE * const char * 110 meshptr, geomptr, meshstream, geomstream, string) scotchfmeshgeomsavescot (doubleprecision (*) doubleprecision (*) integer integer character (*) meshdat, geomdat, meshfildes, geomfildes, string) Description The SCOTCH meshGeomSaveScot routine saves the contents of the SCOTCH Mesh and SCOTCH Geom structures pointed to by meshptr and geomptr to streams meshstream and geomstream, in the Scotch mesh and geometry formats (see sections 5.2 and 5.3, respectively). The string field is not used. Fortran users must use the FNUM function to obtain the numbers of the Unix file descriptors meshfildes and geomfildes associated with the logical units of the mesh and geometry files. Return values SCOTCH meshGeomSaveScot returns 0 if the mesh topology and node geometry have been successfully written to meshstream and geomstream, and 1 else. 7.12 Error handling routines The handling of errors that occur within library routines is often difficult, because library routines should be able to issue error messages that help the application programmer to find the error, while being compatible with the way the application handles its own errors. To match these two requirements, all the error and warning messages produced by the routines of the libScotch library are issued using the user-definable variable-length argument routines SCOTCH errorPrint and SCOTCH errorPrintW. Thus, one can redirect these error messages to his own error handling routines, and can choose if he wants his program to terminate on error or to resume execution after the erroneous function has returned. In order to free the user from the burden of writing a basic error handler from scratch, the libscotcherr.a library provides error routines that print error messages on the standard error stream stderr and return control to the application. Application programmers who want to take advantage of them have to add -lscotcherr to the list of arguments of the linker, after the -lscotch argument. 7.12.1 SCOTCH errorPrint Synopsis void SCOTCH errorPrint (const char * const errstr, ...) Description The SCOTCH errorPrint function is designed to output a variable-length argument error string to some stream. 111 7.12.2 SCOTCH errorPrintW Synopsis void SCOTCH errorPrintW (const char * const errstr, ...) Description The SCOTCH errorPrintW function is designed to output a variable-length argument warning string to some stream. 7.12.3 SCOTCH errorProg Synopsis void SCOTCH errorProg (const char * progstr) Description The SCOTCH errorProg function is designed to be called at the beginning of a program or of a portion of code to identify the place where subsequent errors take place. This routine is not reentrant, as it is only a minor help function. It is defined in libscotcherr.a and is used by the standalone programs of the Scotch distribution. 7.13 Miscellaneous routines 7.13.1 SCOTCH randomReset Synopsis void SCOTCH randomReset (void) scotchfrandomreset () Description The SCOTCH randomReset routine resets the seed of the pseudo-random generator used by the graph partitioning routines of the libScotch library. Two consecutive calls to the same libScotch partitioning routines, and separated by a call to SCOTCH randomReset, will always yield the same results, as if the equivalent standalone Scotch programs were used twice, independently, to compute the results. 112 MeTiS compatibility library 7.14 The MeTiS compatibility library provides stubs which redirect some calls to MeTiS routines to the corresponding Scotch counterparts. In order to use this feature, the only thing to do is to re-link the existing software with the libscotchmetis library, and eventually with the original MeTiS library if the software uses MeTiS routines which do not need to have Scotch equivalents, such as graph transformation routines. In that latter case, the “-lscotchmetis” argument must be placed before the “-lmetis” one (and of course before the “-lscotch” one too), so that routines that are redefined by Scotch are chosen instead of their MeTiS counterpart. When no other MeTiS routines than the ones redefined by Scotch are used, the “-lmetis” argument can be omitted. See Section 9 for an example. 7.14.1 METIS EdgeND Synopsis void METIS EdgeND (const const const const const int * int * metis edgend (integer integer integer integer integer integer integer int * int * int * int * int * const const (*) (*) (*) (*) (*) const const const const const n, xadj, adjncy, numflag, options, perm, iperm) n, xadj, adjncy, numflag, options, perm, iperm) Description The METIS EdgeND function performs a nested dissection ordering of the graph passed as arrays xadj and adjncy, using the default Scotch ordering strategy. The options array is not used. The perm and iperm arrays have the opposite meaning as in Scotch: the MeTiS perm array holds what is called “inverse permutation” in Scotch, while iperm holds what is called “direct permutation” in Scotch. While Scotch has also both node and edge separation capabilities, all of the three MeTiS stubs METIS EdgeND, METIS NodeND and METIS NodeWND call the same Scotch routine, which uses the Scotch default ordering strategy proved to be efficient in most cases. 7.14.2 METIS NodeND Synopsis 113 void METIS NodeND (const const const const const int * int * metis nodend (integer integer integer integer integer integer integer int * int * int * int * int * const const const const const const const n, xadj, adjncy, numflag, options, perm, iperm) n, xadj, adjncy, numflag, options, perm, iperm) (*) (*) (*) (*) (*) Description The METIS NodeND function performs a nested dissection ordering of the graph passed as arrays xadj and adjncy, using the default Scotch ordering strategy. The options array is not used. The perm and iperm arrays have the opposite meaning as in Scotch: the MeTiS perm array holds what is called “inverse permutation” in Scotch, while iperm holds what is called “direct permutation” in Scotch. While Scotch has also both node and edge separation capabilities, all of the three MeTiS stubs METIS EdgeND, METIS NodeND and METIS NodeWND call the same Scotch routine, which uses the Scotch default ordering strategy proved to be efficient in most cases. 7.14.3 METIS NodeWND Synopsis void METIS NodeWND (const const const const const const int * int * metis nodwend (integer integer integer integer integer integer integer integer int * int * int * int * int * int * const const (*) (*) (*) (*) (*) (*) const const const const const const n, xadj, adjncy, vwgt, numflag, options, perm, iperm) 114 n, xadj, adjncy, vwgt, numflag, options, perm, iperm) Description The METIS NodeWND function performs a nested dissection ordering of the graph passed as arrays xadj, adjncy and vwgt, using the default Scotch ordering strategy. The options array is not used. The perm and iperm arrays have the opposite meaning as in Scotch: the MeTiS perm array holds what is called “inverse permutation” in Scotch, while iperm holds what is called “direct permutation” in Scotch. While Scotch has also both node and edge separation capabilities, all of the three MeTiS stubs METIS EdgeND, METIS NodeND and METIS NodeWND call the same Scotch routine, which uses the Scotch default ordering strategy proved to be efficient in most cases. 7.14.4 METIS PartGraphKway Synopsis void METIS PartGraphKway (const const const const const const const const const int * int * metis partgraphkway (integer integer integer integer integer integer integer integer integer integer integer int * int * int * int * int * int * int * int * int * const const (*) (*) (*) (*) (*) (*) const const const const const const const const const n, xadj, adjncy, vwgt, adjwgt, wgtflag, numflag, nparts, options, edgecut, part) n, xadj, adjncy, vwgt, adjwgt, wgtflag, numflag, nparts, options, edgecut, part) Description The METIS PartGraphKway function performs a mapping onto the complete graph of the graph represented by arrays xadj, adjncy, vwgt and adjwgt, using the default Scotch mapping strategy. The options array is not used. The part array has the same meaning as the parttab array of Scotch. All of the three MeTiS stubs METIS PartGraphKway, METIS PartGraph Recursive and METIS PartGraphVKway call the same Scotch routine, which 115 uses the Scotch default mapping strategy proved to be efficient in most cases. 7.14.5 METIS PartGraphRecursive Synopsis void METIS PartGraphRecursive (const const const const const const const const const int * int * metis partgraphrecursive (integer integer integer integer integer integer integer integer integer integer integer int * int * int * int * int * int * int * int * int * const const (*) (*) (*) (*) (*) (*) const const const const const const const const const n, xadj, adjncy, vwgt, adjwgt, wgtflag, numflag, nparts, options, edgecut, part) n, xadj, adjncy, vwgt, adjwgt, wgtflag, numflag, nparts, options, edgecut, part) Description The METIS PartGraphRecursive function performs a mapping onto the complete graph of the graph represented by arrays xadj, adjncy, vwgt and adjwgt, using the default Scotch mapping strategy. The options array is not used. The part array has the same meaning as the parttab array of Scotch. To date, the computation of the edgecut field requires extra processing, which increases running time to a small extent. All of the three MeTiS stubs METIS PartGraphKway, METIS PartGraph Recursive and METIS PartGraphVKway call the same Scotch routine, which uses the Scotch default mapping strategy proved to be efficient in most cases. 7.14.6 METIS PartGraphVKway Synopsis 116 void METIS PartGraphVKway (const const const const const const const const const int * int * metis partgraphvkway (integer integer integer integer integer integer integer integer integer integer integer int * int * int * int * int * int * int * int * int * const const (*) (*) (*) (*) (*) (*) const const const const const const const const const n, xadj, adjncy, vwgt, vsize, wgtflag, numflag, nparts, options, volume, part) n, xadj, adjncy, vwgt, vsize, wgtflag, numflag, nparts, options, volume, part) Description The METIS PartGraphVKway function performs a mapping onto the complete graph of the graph represented by arrays xadj, adjncy, vwgt and vsize, using the default Scotch mapping strategy. The options array is not used. The part array has the same meaning as the parttab array of Scotch. Since Scotch does not have methods for explicitely reducing the communication volume according to the metric of METIS PartGraphVKway, this routine creates a temporary edge weight array such that each edge (u, v) receives a weight equal to mboxvsize(u) + mboxvsize(v). Consequently, edges which are incident to highly communicating vertices will be less likely to be cut. However, the communication volume value returned by this routine is exactly the one which would be returned by MeTiS with respect to the output partition. Users interested in minimizing the exact communication volume should consider using hypergraphs, implemented in Scotch as meshes (see Section 7.2.3). All of the three MeTiS stubs METIS PartGraphKway, METIS PartGraph Recursive and METIS PartGraphVKway call the same Scotch routine, which uses the Scotch default mapping strategy proved to be efficient in most cases. 8 Installation Version 5.1 of the Scotch software package is distributed as free/libre software under the CeCILL-C free/libre software license [6], which is very similar to the GNU LGPL license. Therefore, it is no longer distributed as a set of binaries, but instead in the form of a source distribution, which can be downloaded from the 117 Scotch web page at http://www.labri.fr/~pelegrin/scotch/ . The extraction process will create a scotch 5.1 directory, containing several subdirectories and files. Please refer to the files called LICENSE EN.txt or LICENCE FR.txt, as well as file INSTALL EN.txt, to see under which conditions your distribution of Scotch is licensed and how to install it. To enable the use of POSIX threads in some routines, the SCOTCH PTHREAD flag must be set. If your MPI implementation is not thread-safe, make sure this flag is not defined at compile time. To enable on-the-fly compression and decompression of various formats, the relevant flags must be defined. These flags are COMMON FILE COMPRESS BZ2 for bzip2 (de)compression, COMMON FILE COMPRESS GZ for gzip (de)compression, and COMMON FILE COMPRESS LZMA for lzma decompression. Note that the corresponding development libraries must be installed on your system before compile time, and that compressed file handling can take place only on systems which support multi-threading or multi-processing. In the first case, you must set the SCOTCH PTHREAD flag in order to take advantage of these features. On Linux systems, the development libraries to install are libbzip2 1devel for the bzip2 format, zlib1-devel for the gzip format, and liblzma0devel for the lzma format. The names of the libraries may vary according to operating systems and library versions. Ask your system engineer in case of trouble. The integer values handled by Scotch are based by default on the int C type, corresponding to the INTEGER Fortran type, both of which being of the size of a machine word. To coerce the length of the Scotch integer type to 32 or 64 bits, one can use the INTSIZE32 or INTSIZE64 flags, respectively, or else the “-DINT=” definition, at compile time. For instance, adding “-DINT=long” to the CFLAGS variable in the Makefile.inc file to be placed at the root of the source tree will make all SCOTCH Num integers become long C integers. Whenever doing so, make sure to use integer types of equivalent length to declare variables passed to Scotch routines from caller C and Fortran procedures. Also, because of API conflicts, the MeTiS compatibility library will not be usable. It is usually safer and cleaner to tune your C and Fortran compilers to make them interpret int and INTEGER types as 32 or 64 bit values, than to use the aforementioned flags and coerce type lengths in your own code. All Scotch users are welcome to send a mail to the author so that they can be added to the Scotch mailing list, and be automatically informed of new releases and publications. 9 Examples This section contains chosen examples destined to show how the programs of the Scotch project interoperate and can be combined. It is supposed that the current directory is directory “scotch 5.1” of the Scotch distribution. Character “%” represents the shell prompt. • Partition source graph brol.grf into 7 parts, and save the result to file /tmp/brol.map. 118 % echo cmplt 7 > /tmp/k7.tgt % gmap brol.grf /tmp/k7.tgt /tmp/brol.map This can also be done in a single piped command: % echo cmplt 7 | gmap brol.grf - /tmp/brol.map If compressed data handling is enabled, read the graph as a gzip compressed file, and output the mapping as a bzip2 file, on the fly: % echo cmplt 7 | gmap brol.grf.gz - /tmp/brol.map.bz2 • Partition source graph brol.grf into two uneven parts of respective weights 4 7 11 and 11 , and save the result to file /tmp/brol.map. % echo cmpltw 2 4 7 > /tmp/k2w.tgt % gmap brol.grf /tmp/k2w.tgt /tmp/brol.map This can also be done in a single piped command: % echo cmpltw 2 4 7 | gmap brol.grf - /tmp/brol.map If compressed data handling is enabled, use gzip compressed streams on the fly: % echo cmpltw 2 4 7 | gmap brol.grf.gz - /tmp/brol.map.gz • Map a 32 by 32 bidimensional grid source graph onto a 256-node hypercube, and save the result to file /tmp/brol.map. % gmk m2 32 32 | gmap - tgt/h8.tgt /tmp/brol.map • Build the Open Inventor file graph.iv that contains the display of a source graph the source and geometry files of which are named graph.grf and graph.xyz. % gout -Mn -Oi graph.grf graph.xyz - graph.iv Although no mapping data is required because of the “-Mn” option, note the presence of the dummy input mapping file name “-”, which is needed to specify the output visualization file name. • Given the source and geometry files graph.grf and graph.xyz of a source graph, map the graph on a 8 by 8 bidimensional mesh and display the mapping result on a color screen by means of the public-domain ghostview PostScript previewer. % gmap graph.grf tgt/m8x8.tgt | gout graph.grf graph.xyz ’-Op{c,f,l}’ | ghostview - 119 • Build a 24-node Cube-Connected-Cycles graph target architecture which will be frequently used. Then, map compressed source file graph.grf.gz onto it, and save the result to file /tmp/brol.map. % amk ccc 3 | acpl - /tmp/ccc3.tgt % gunzip -c graph.grf.gz | gmap - /tmp/ccc3.tgt /tmp/brol.map To speed up target architecture loading in the future, the decompositiondefined target architecture is compiled by means of acpl. • Build an architecture graph which is the subgraph of the 8-node de Bruijn graph restricted to vertices labeled 1, 2, 4, 5, 6, map graph graph.grf onto it, and save the result to file /tmp/brol.map. % (gmk ub2 3; echo 5 1 2 4 5 6) | amk grf -L | gmap graph.grf /tmp/brol.map Note how the two input streams of program amk grf (that is, the de Bruijn source graph and the five-elements vertex label list) are concatenated into a single stream to be read from the standard input. • Compile and link the user application brol.c with the libScotch library, using the default error handler. % cc brol.c -o brol -lscotch -lscotcherr -lm Note that the mathematical library should also be included, after all of the Scotch libraries. • Recompile a program that used MeTiS so that it uses Scotch instead. % cc brol.c -o brol -I${metisdir} -lscotchmetis -lscotch -lscotcherr -lmetis -lm Note that the “-lscotchmetis” option must be placed before the “-lmetis” one, so that routines that are redefined by Scotch are selected instead of their MeTiS counterpart. When no other MeTiS routines than the ones redefined by Scotch are used, the “-lmetis” option can be omitted. The “-I${metisdir}” option may be necessary to provide the path to the original metis.h include file, which contains the prototypes of all of the MeTiS routines. 10 Adding new features to Scotch Since Scotch is free/libre software, users have the ability to add new features to it. Moreover, as Scotch is intended to be a testbed for new partitioning and ordering algorithms, it has been developed in a very modular way, to ease the development and inclusion of new partitioning and ordering methods to be called within Scotch strategies. All of the source code for partitioning and ordering methods for graphs and meshes is located in the src/libscotch/ source subdirectory. Source file names have a very regular pattern, based on the internal data structures they handle. 120 10.1 Graphs and meshes The basic structures in Scotch are the Graph and Mesh structures, which model a simple symmetric graph the definition of which is given in file graph.h, and a simple mesh, in the form of a bipartite graph, the definition of which is given in file mesh.h, respectively. From this structure are derived enriched graph and mesh structures: • Bgraph, in file bgraph.h: graph with bipartition, that is, edge separation, information attached to it; • Kgraph, in file kgraph.h: graph with mapping information attached to it; • Hgraph, in file hgraph.h: graph with halo information attached to it, for computing graph orderings; • Vgraph, in file vgraph.h: graph with vertex bipartition information attached to it; • Hmesh, in file hmesh.h: mesh with halo information attached to it, for computing mesh orderings; • Vmesh, in file vmesh.h: graph with vertex bipartition information attached to it. As version 5.1 of the libScotch does not provide mesh mapping capabilities, neither Bmesh nor Kmesh structures have been defined to date, but this work is in progress, and these features should be available in the upcoming releases. All of the structures are in fact defined as typedefed types. 10.2 Methods and partition data Methods are routines which take one of the above structures as input, and update the fields of the given structure according to the implemented algorithm. Initial methods will behave irrespective of the former values of the structure (like graph growing methods, which compute partitions from scratch), while refinement methods must be provided an existing partition to improve. In addition to the topological description of the underlying graph, the working graph and mesh structures comprise variables describing the current state of the vertex or edge partition. In all cases is provided a partition array called parttax, of size equal to the number of graph vertices, which tells which part every vertex is assigned to. Other variables comprise the communication load and the load imbalance of the current cut, that is, all of the data necessary to measure the quality of a partition. Some other data are also often provided, such as the number of vertices in each part and the list of frontier vertices. They are not relevant to measure the quality of the partition, but to improve the speed of computations. They are used for instance in the multi-level algorithms to compute incremental updates of the current partition state, without having to recompute these values from scratch by considering all of the graph vertices. Implementers of new methods are highly encouraged to use these variables to speed-up their computations, taking examples on typical algorithms such as the multi-level or Fiduccia-Mattheyses ones. 121 10.3 Adding a new method to Scotch We will assume in this section that the new method to add is a graph separation method. The procedure explained below is exactly the same for graph bipartitioning, graph mapping, graph ordering, mesh separation, or mesh ordering methods. Please proceed as explained below. 1. Write the code of the method itself. First, choose a free two-letter code to describe your method, say “xy”. In the libscotch source directory, create files vgraph separate xy.c and vgraph separate xy.h, basing on existing files such as vgraph separate gg.c and vgraph separate gg.h, for instance. If the method is complex, it can be split across several other files, which will be named vgraph separate xy firstmodulename.c, vgraph separate xy secondmodulename.c, eventually with matching header files. If the method has parameters, create a structure called VgraphSeparateXy Param, which contains fields of types that can be handled by the strategy parser, such as the INT generic integer type (see below), or double, for instance. The execution of your method should result in the setting or in the updating of the Vgraph structure that is passed to it. See its definition in vgraph.h and read several simple graph separation methods, such as vgraph separate zr.c, to figure out what all of its parameters mean. At the end of your method, always call, when the SCOTCH DEBUG VGRAPH2 debug flag is set, the vgraphCheck routine, to avoid the spreading of eventual bugs to other parts of the libScotch library. 2. Add the method to the parser tables. The files to update are vgraph separate st.c and vgraph separate st.h, where “st” stands for “strategy”. First, edit vgraph separate st.h. In the VgraphSeparateStMethodType enumeration, add a line for your new method VGRAPHSEPASTMETHXY. Then, edit vgraph separate st.c, where all of the remaining actions take place. In the top of the file, add a #include directive to include vgraph separate xy.h. If the method has parameters, create a vgraphseparatedefaultxy C union, basing on an existing one, and fill it with the default values of your method parameters. In the vgraphseparatestmethtab method array, add a line for the new method. To do so, choose a free single-letter code that will be used to designate the new method in strategy strings. If the method has parameters, the last field should be a pointer to the default structure, else it should be set to NULL. If the method has parameters, update the vgraphseparatestparatab parameter array. Add one data block per parameter. The first field is the name of the method to which the parameter applies, that is, VGRAPHSEPASTMETH XY. The second field is the type of the parameter, which can be: • STRATPARAMCASE: the support type is an int. It receives the index in the case string, which is provided as the last field of the parameter line, of the given case character; 122 • STRATPARAMDOUBLE: the support type is a double; • STRATPARAMINT: the support type is an INT, which is the generic integer type handled internally by Scotch. This type has variable extent, depending on compilation flags, as described in Section 7.1.4; • STRATPARAMSTRING: a (small) character string; • STRATPARAMSTRAT: strategy. For instance, the graph ordering method by nested dissection takes a vertex partitioning strategy as one of its parameters, to compute the vertex separators. The fourth and fifth fields are the address of the location of the default structure and the address of the parameter within this default structure, respectively. From these two values can be computed at run time the offset of the parameter within any instance of the parameter structure, which is used to fill the actual structures in the parsed strategy evaluation tree. The value of the sixth parameter depends on the type of the parameter. It should be NULL for STRATPARAMDOUBLE and STRATPARAMINT parameters, points to the string of available case letters for STRATPARAMCASE parameters, points to the target string buffer for STRATPARAMSTRING parameters, and points to the relevant method parsing table for STRATPARAMSTRAT parameters. 3. Edit the makefile of the libScotch source directory to enable the compilation and linking of the method. Depending on libScotch versions, this makefile is either called Makefile or make gen. 4. Compile in debug mode and experiment with your routine, by creating strategies that contain its single-letter code. 5. To change the default strategy string used by the libScotch library, update file library graph order.c, since it is the graph ordering routine which makes use of graph vertex separation methods to compute separators for the nested dissection ordering method. 10.4 Licensing of new methods and of derived works According to the terms of the CeCILL-C license [6] under which the Scotch software package is distributed, the works that are carried out to improve and extend the libScotch library must be licensed under the same terms. Basically, it means that you will have to distribute the sources of your new methods, along with the sources of Scotch, to any recipient of your modified version of the libScotch, and that you grant these recipients the same rights of update and redistribution as the ones that are given to you under the terms of CeCILL-C. Please read it carefully to know what you can do and cannot do with the Scotch distribution. You should have received a copy of the CeCILL-C license along with the Scotch distribution; if not, please browse the CeCILL website at http://www.cecill. info/licenses.en.html. Credits I wish to thank all of the following people: 123 • Patrick Amestoy collaborated to the design of the Halo Approximate Minimum Degree algorithm [49] that had been embedded into Scotch 3.3, and gave me versions of his Approximate Minimum Degree algorithm, available since version 3.2, and of his Halo Approximate Minimum Fill algorithm, available since version 3.4. He designed the mesh versions of the approximate minimum degree and approximate minimum fill algorithms, which are available since version 4.0; • Alex Pothen kindly gave me a version of his Multiple Minimum Degree algorithm, which was embedded into Scotch from version 3.2 to version 3.4; • Luca Scarano, visiting Erasmus student from the Universitá degli Studi di Bologna, coded the multi-level graph algorithm in Scotch 3.1; • Yves Secretan contributed to the MinGW32 port; • David Sherman proofread version 3.2 of this manual. References [1] P. Amestoy, T. Davis, and I. Duff. An approximate minimum degree ordering algorithm. SIAM J. Matrix Anal. and Appl., 17:886–905, 1996. [2] C. Ashcraft. Compressed graphs and the minimum degree algorithm. SIAM J. Sci. Comput., 16(6):1404–1411, 1995. [3] C. Ashcraft, S. Eisenstat, J. W.-H. Liu, and A. Sherman. A comparison of three column based distributed sparse factorization schemes. In Proc. Fifth SIAM Conf. on Parallel Processing for Scientific Computing, 1991. [4] S. T. Barnard and H. D. Simon. A fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems. Concurrency: Practice and Experience, 6(2):101–117, 1994. [5] R. F. Boisvert, R. Pozo, and K. A. Remington. The Matrix Market exchange formats: initial design. NISTIR 5935, National Institute of Standards and Technology, December 1996. [6] CeCILL: “CEA-CNRS-INRIA Logiciel Libre” free/libre software license. Available from http://www.cecill.info/licenses.en.html. [7] P. Charrier and J. Roman. Algorithmique et calculs de complexité pour un solveur de type dissections emboı̂tées. Numerische Mathematik, 55:463–476, 1989. [8] C. Chevalier and F. Pellegrini. Improvement of the efficiency of genetic algorithms for scalable parallel graph partitioning in a multi-level framework. In Proc. EuroPar, Dresden, LNCS 4128, pages 243–252, September 2006. [9] I. Duff. On algorithms for obtaining a maximum transversal. ACM Trans. Math. Software, 7(3):315–330, September 1981. [10] I. S. Duff, R. G. Grimes, and J. G. Lewis. Users’ guide for the HarwellBoeing sparse matrix collection. Technical Report TR/PA/92/86, CERFACS, Toulouse, France, October 1992. 124 [11] F. Ercal, J. Ramanujam, and P. Sadayappan. Task allocation onto a hypercube by recursive mincut bipartitionning. Journal of Parallel and Distributed Computing, 10:35–44, 1990. [12] C. M. Fiduccia and R. M. Mattheyses. A linear-time heuristic for improving network partitions. In Proceedings of the 19th Design Automation Conference, pages 175–181. IEEE, 1982. [13] M. R. Garey and D. S. Johnson. Computers and Intractablility: A Guide to the Theory of NP-completeness. W. H. Freeman, San Francisco, 1979. [14] G. A. Geist and E. G.-Y. Ng. Task scheduling for parallel sparse Cholesky factorization. International Journal of Parallel Programming, 18(4):291–314, 1989. [15] A. George, M. T. Heath, J. W.-H. Liu, and E. G.-Y. Ng. Sparse Cholesky factorization on a local memory multiprocessor. SIAM Journal on Scientific and Statistical Computing, 9:327–340, 1988. [16] A. George and J. W.-H. Liu. The evolution of the minimum degree ordering algorithm. SIAM Review, 31:1–19, 1989. [17] J. A. George and J. W.-H. Liu. Computer solution of large sparse positive definite systems. Prentice Hall, 1981. [18] N. E. Gibbs, W. G. Poole, and P. K. Stockmeyer. A comparison of several bandwidth and profile reduction algorithms. ACM Trans. Math. Software, 2:322–330, 1976. [19] A. Gupta, G. Karypis, and V. Kumar. Highly scalable parallel algorithms for sparse matrix factorization. TR 94-063, University of Minnesota, 1994. [20] A. Gupta, G. Karypis, and V. Kumar. Scalable parallel algorithms for sparse linear systems. In Proc. Stratagem’96, Sophia-Antipolis, pages 97–110. INRIA, July 1996. [21] S. W. Hammond. Mapping unstructured grid computations to massively parallel computers. PhD thesis, Rensselaer Polytechnic Institute, Troy, New-York, February 1992. [22] B. Hendrickson and R. Leland. Multidimensional spectral load balancing. Technical Report SAND93–0074, Sandia National Laboratories, January 1993. [23] B. Hendrickson and R. Leland. A multilevel algorithm for partitioning graphs. Technical Report SAND93–1301, Sandia National Laboratories, June 1993. [24] B. Hendrickson and R. Leland. The Chaco user’s guide. Technical Report SAND93–2339, Sandia National Laboratories, November 1993. [25] B. Hendrickson and R. Leland. The chaco user’s guide – version 2.0. Technical Report SAND94–2692, Sandia National Laboratories, 1994. [26] B. Hendrickson and R. Leland. An empirical study of static load balancing algorithms. In Proc. SHPCC’94, Knoxville, pages 682–685. IEEE, May 1994. [27] B. Hendrickson, R. Leland, and R. Van Driessche. Skewed graph partitioning. In Proceedings of the 8th SIAM Conference on Parallel Processing for Scientific Computing. IEEE, March 1997. 125 [28] B. Hendrickson and E. Rothberg. Improving the runtime and quality of nested dissection ordering. SIAM J. Sci. Comput., 20(2):468–489, 1998. [29] P. Hénon, F. Pellegrini, P. Ramet, J. Roman, and Y. Saad. High performance complete and incomplete factorizations for very large sparse systems by using scotch and pastix softwares. In Proc. 11th SIAM Conference on Parallel Processing for Scientific Computing, San Francisco, USA, February 2004. [30] J. Hopcroft and R. Karp. An n5/2 algorithm for maximum matchings in bipartite graphs. SIAM Journal of Computing, 2(4):225–231, December 1973. [31] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. Technical Report 95-035, University of Minnesota, June 1995. [32] G. Karypis and V. Kumar. MeTiS – unstructured graph partitioning and sparse matrix ordering system – version 2.0. Technical report, University of Minnesota, June 1995. [33] G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. Technical Report 95-064, University of Minnesota, August 1995. [34] G. Karypis and V. Kumar. MeTiS – A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes, and Computing Fill-Reducing Orderings of Sparse Matrices – Version 4.0. University of Minnesota, September 1998. [35] B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitionning graphs. BELL System Technical Journal, pages 291–307, February 1970. [36] M. Laguna, T. A. Feo, and H. C. Elrod. A greedy randomized adaptative search procedure for the two-partition problem. Operations Research, pages 677–687, July 1994. [37] C. Leiserson, Z. Abuhamdeh, D. Douglas, C. Feynman, M. Ganmukhi, J. Hill, W. Hillis, B. Kuszmaul, M. Pierre, D. Wells, M. Wong, S. Yang, and R. Zak. The network architecture of the Connection Machine CM-5. Technical report, Thinking Machines, juillet 1992. [38] C. Leiserson and J. Lewis. Orderings for parallel sparse symmetric factorization. In Third SIAM Conference on Parallel Processing for Scientific Computing, Tromsø. SIAM, 1987. [39] M. Lin, R. Tsang, D. H. C. Du, A. E. Klietz, and S. Saroff. Performance evaluation of the CM-5 interconnection network. In Proceedings of CompCon Spring’93, 1993. [40] R. J. Lipton, D. J. Rose, and R. E. Tarjan. Generalized nested dissection. SIAM Journal of Numerical Analysis, 16(2):346–358, April 1979. [41] J. W.-H. Liu. Modification of the minimum-degree algorithm by multiple elimination. ACM Trans. Math. Software, 11(2):141–153, 1985. [42] SGI Open Inventor. inventor/. Available from http://oss.sgi.com/projects/ 126 [43] F. Pellegrini. Static mapping by dual recursive bipartitioning of process and architecture graphs. In Proc. SHPCC’94, Knoxville, pages 486–493. IEEE, May 1994. [44] F. Pellegrini. A parallelisable multi-level banded diffusion scheme for computing balanced partitions with smooth boundaries. In Proc. EuroPar, Rennes, LNCS 4641, pages 191–200, August 2007. [45] F. Pellegrini. PT-Scotch 5.1 User’s guide. Technical report, LaBRI, Université Bordeaux I, August 2008. Available from http://www.labri.fr/ ~pelegrin/scotch/. [46] F. Pellegrini and J. Roman. Experimental analysis of the dual recursive bipartitioning algorithm for static mapping. Research Report, LaBRI, Université Bordeaux I, August 1996. Available from http://www.labri.fr/~pelegrin/ papers/scotch_expanalysis.ps.gz. [47] F. Pellegrini and J. Roman. Scotch: A software package for static mapping by dual recursive bipartitioning of process and architecture graphs. In Proc. HPCN’96, Brussels, LNCS 1067, pages 493–498, April 1996. [48] F. Pellegrini and J. Roman. Sparse matrix ordering with scotch. In Proc. HPCN’97, Vienna, LNCS 1225, pages 370–378, April 1997. [49] F. Pellegrini, J. Roman, and P. Amestoy. Hybridizing nested dissection and halo approximate minimum degree for efficient sparse matrix ordering. In Proc. Irregular’99, San Juan, LNCS 1586, pages 986–995, April 1999. [50] A. Pothen and C.-J. Fan. Computing the block triangular form of a sparse matrix. ACM Trans. Math. Software, 16(4):303–324, December 1990. [51] A. Pothen, H. D. Simon, and K.-P. Liou. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal of Matrix Analysis, 11(3):430–452, July 1990. [52] E. Rothberg. Performance of panel and block approaches to sparse Cholesky factorization on the iPSC/860 and Paragon multicomputers. In Proc. SHPCC’94, Knoxville, pages 324–333. IEEE, May 1994. [53] E. Rothberg and A. Gupta. An efficient block-oriented approach to parallel sparse Cholesky factorization. In Supercomputing’93 Proceedings. IEEE, 1993. [54] E. Rothberg and R. Schreiber. Improved load distribution in parallel sparse Cholesky factorization. In Supercomputing’94 Proceedings. IEEE, 1994. [55] R. Schreiber. Scalability of sparse direct solvers. Technical Report TR 92.13, RIACS, NASA Ames Research Center, May 1992. [56] H. D. Simon. Partitioning of unstructured problems for parallel processing. Computing Systems in Engineering, 2:135–148, 1991. [57] W. F. Tinney and J. W. Walker. Direct solutions of sparse network equations by optimally ordered triangular factorization. J. Proc. IEEE, 55:1801–1809, 1967. [58] C. Walshaw, M. Cross, M. G. Everett, S. Johnson, and K. McManus. Partitioning & mapping of unstructured meshes to parallel machine topologies. In Proc. Irregular’95, LNCS 980, pages 121–126, 1995. 127

Download PDF

advertisement