Disclaimer: This document was part of the First European DSP Education and Research Conference. It may have been written by someone whose native language is not English. TI assumes no liability for the quality of writing and/or the accuracy of the information contained herein. A Global Visibility Classifier Based on a Multi-DSP-System Authors: R. Hranitzky, N. Thurner ESIEE, Paris September 1996 SPRA338 IMPORTANT NOTICE Texas Instruments (TI) reserves the right to make changes to its products or to discontinue any semiconductor product or service without notice, and advises its customers to obtain the latest version of relevant information to verify, before placing orders, that the information being relied on is current. TI warrants performance of its semiconductor products and related software to the specifications applicable at the time of sale in accordance with TI’s standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by government requirements. Certain application using semiconductor products may involve potential risks of death, personal injury, or severe property or environmental damage (“Critical Applications”). TI SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED, INTENDED, AUTHORIZED, OR WARRANTED TO BE SUITABLE FOR USE IN LIFE-SUPPORT APPLICATIONS, DEVICES OR SYSTEMS OR OTHER CRITICAL APPLICATIONS. Inclusion of TI products in such applications is understood to be fully at the risk of the customer. Use of TI products in such applications requires the written approval of an appropriate TI officer. Questions concerning potential risk applications should be directed to TI through a local SC sales office. In order to minimize risks associated with the customer’s applications, adequate design and operating safeguards should be provided by the customer to minimize inherent or procedural hazards. TI assumes no liability for applications assistance, customer product design, software performance, or infringement of patents or services described herein. Nor does TI warrant or represent that any license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property right of TI covering or relating to any combination, machine, or process in which such semiconductor products or services might be or are used. Copyright © 1997, Texas Instruments Incorporated TRADEMARKS TI is a trademark of Texas Instruments Incorporated. Other brands and names are the property of their respective owners. CONTACT INFORMATION US TMS320 HOTLINE (281) 274-2320 US TMS320 FAX (281) 274-2324 US TMS320 BBS (281) 274-2323 US TMS320 email dsph@ti.com Contents Abstract ........................................................................................................................... 7 Keywords .................................................................................................................... 7 Product Support ............................................................................................................. 8 World Wide Web......................................................................................................... 8 Introduction..................................................................................................................... 9 Classification of Global Visibility Relations ............................................................... 10 Visibility Space Model ............................................................................................... 10 Classification Algorithm ............................................................................................ 11 Algorithm Analysis ....................................................................................................... 15 Complexity ................................................................................................................ 15 Parallelism ................................................................................................................ 15 Implementation ............................................................................................................. 16 Partitioning................................................................................................................ 16 Communication......................................................................................................... 17 Dynamical Load Balancing ....................................................................................... 17 Results........................................................................................................................... 19 Experiments .............................................................................................................. 19 Time, Speedup and Efficiency .................................................................................. 20 Scalability.................................................................................................................. 21 Dynamical Load Balancing ....................................................................................... 23 Visibility Results ........................................................................................................ 25 Conclusion .................................................................................................................... 27 References .................................................................................................................... 28 Figures Figure 1. Figure 2. Figure 3. Figure 4. Figure 5. Figure 6. Figure 7. Figure 8. Visibility classes and attributes......................................................................... 11 Flow chart of parallel classifier ......................................................................... 13 Multi-DSP communication scheme................................................................... 16 Pseudo code of dynamical load balancer......................................................... 18 Example Input Data Set. Lines indicate partial visibilities................................. 20 Multi-DSP elapsed time (balanced load). ......................................................... 21 Multi-DSP elapsed time (unbalanced load). ..................................................... 23 Statistical output of parallel dynamical load balancer for the example in figure 5.......................................................................................................... 25 Figure 9. Statistical output of parallel classifier for example in figure 5. .......................... 26 Tables Table 1. Table 2. Multi-DSP performance .................................................................................... 22 Quality of parallel dynamical load balancer. ..................................................... 24 A Global Visibility Classifier Based on a Multi-DSP-System Abstract We present a classification algorithm for determination of global visibility relations. Results allow us to characterize visibility between any two points in 3D visibility space. Parallelism of computations is detected by data dependence analysis and leads to a parallel program. An experimental implementation of the classifier on a multi-DSP system (6 TMS320C40) is presented. Finally we discuss efficiency, scalability and dynamical load balancing of parallel computations. This document was an entry in the 199x DSP Solutions Challenge, an annual contest organized by TI to encourage students from around the world to find innovative ways to use DSPs. For more information on the TI DSP Solutions Challenge, see TI’s World Wide Web site at www.ti.com. Keywords Global Visibility Classification Parallel Processing Multi-DSP TMS320C40 A Global Visibility Classifier Based on a Multi-DSP-System 7 SPRA338 Product Support World Wide Web Our World Wide Web site at www.ti.com contains the most up to date product information, revisions, and additions. Users registering with TI&ME can build custom information pages and receive new product updates automatically via email. 8 A Global Visibility Classifier Based on a Multi-DSP-System SPRA338 Introduction In our work we describe parallel aspects of visibility computations between polygons in a discrete visibility space. Such classifications are necessary for various kinds of virtual reality applications, like illumination computation methods (e.g. Ray Tracing, Radiosity, etc.) in computer graphics [1]. Classical methods use output device dependent methods (e.g. hidden surface removal) and are restricted to special cases [12] [3]. More general methods work on mathematical representations of visibility in 3D-space (some definitions can be found in the section Classification of Global Visibility Relations). The algorithms are designed to be hierarchical and modular [7]. In operation they are conservative, but exact: all ’visible’ and ’invisible’ classifications can be proved to be true (see the section Classification Algorithm) [10]. Global visibility determination is the most expensive geometric operation in image synthesis. With sequential computing this process is restricted to scenes of low visual complexity. Results of analysis state that digital signal processors (DSPs) are well suited for visibility calculations (see the section Algorithm Analysis). Furthermore parallelism analysis encouraged us to design a parallel classifier, which runs on multiple DSPs. This greatly reduces execution time and allows to engage more complex scenes. A similar parallel implementation on a nCube2S-system can be found in (9]. Described experimental measurements show high speedups even for an increasing number of DSPs. Of course, workload has to be considerably large to compensate for low speed of inter-DSP communications (see the section Results). The presented parallel application is highly scaleable and is also convenient for massively parallel systems (see the section Scalability). A Global Visibility Classifier Based on a Multi-DSP-System 9 SPRA338 Classification of Global Visibility Relations We describe an algorithm for classification of global visibility relations, which operates on the entire visibility space. The hierarchical classification algorithm (see the section Classification Algorithm) is based on a set of polygons being a discrete representation of the visibility space. Parallel aspects are discussed in the section Parallelism. Visibility Space Model First of all we give some definitions and restrictions of our visibility-space model. We define our visibility space to be a finite set of convex polygons residing in a discrete 3D-coordinate system. Polygons The geometrical properties of polygons are defined by an arbitrary number of vertices, which span the polygon’s area A. We assign a ’face’ to each polygon and thus define a direction of visibility. Normal vectors indicate front faces. To simplify discussion we use rectangular polygons in all examples throughout this paper (see e.g. figure 5). Visibility Classes and Attributes A local visibility pair is a set of two polygons {S,R}. This pair describes the visibility from S to R and vice versa. All local visibility pairs are members of the global visibility set V. The goal of the classifier is to assign an ordered pair of visibility attributes to each member of V. We define 5 classes of different visibility attributes. R is ’visible’ to S, if every point of AR can be seen from every point of As. Similarly for ’invisible’ and ’partial’. That is depicted in figure 1. The classes ’visible-partial’ and ’invisible-partial’ are assigned, if the algorithm is not able to make clear decisions (see paragraph ’Conservative Triage’ in the section Classification Algorithm). 10 A Global Visibility Classifier Based on a Multi-DSP-System SPRA338 Figure 1. Visibility classes and attributes Classification Algorithm In accordance to [10] we base our algorithm on three simple ideas: Visibility preprocessing The exact determination of visibility relations is a very time-consuming process. It can be accelerated by preprocessing the database. We have assigned ’faces’ to all polygons and thus we can determine those pairs quickly, which consist of facing away polygons. They are ’invisible’. This procedure is called Back Face Culling [13]. A Global Visibility Classifier Based on a Multi-DSP-System 11 SPRA338 Incremental visibility maintenance Several more and less complex operations are necessary to classify remaining pairs. To further accelerate runtime, we sort computations by increasing complexity. That leads to a modular design, which is depicted in figure 2. We determine pseudoblocking and blocking polygons in the second and third modules using increasing numerical costs. All relations without blocking polygons B can be classified ’visible’ afterwards. Next we perform Ray Casting, trace rays between {S,R} and try to intersect them with the related set of blocking polygons {B} [5]. The number of rays can be set initially. If all rays are being blocked, the class ’invisible-partial’ is assigned, because from the algorithms point of view this relation is invisible, but could eventually become partial, if further rays are being traced. Similarly the class ’visible-partial’ is assigned if none of the rays is being blocked (see figure 1). 12 A Global Visibility Classifier Based on a Multi-DSP-System SPRA338 Figure 2. Flow chart of parallel classifier A Global Visibility Classifier Based on a Multi-DSP-System 13 SPRA338 Conservative Triage We use conservative triage to avoid the combinatorial complexity of exact visibility classification. The classification is conservative in that all interactions classified as visible or invisible are correct. It is acceptable for the classification to return ’partial’ instead of ’invisible’ or ’visible’ respectively. The user may reassign invisible-partial or visiblepartial relations to the ’invisible’ or ’visible’ classes if the number of rays was considerably high. 14 A Global Visibility Classifier Based on a Multi-DSP-System SPRA338 Algorithm Analysis We analyze the presented algorithm to determine memory requirements and worst case execution time (see the section Complexity). Parallelism is detected using data dependence analysis. The parallel algorithm is a SPMD type (see the section Parallelism). To increase parallel performance we add dynamical load balancing strategies, which are described in the section Dynamical Load Balancing. Complexity 2 An input data set of N polygons produces O(N ) visibility pairs. Each of these pairs has to be tested against O(N) polygons to detect a ’blocking’ situation. We stress that complexity of time and 3 memory of any sequential algorithm is O(N ). Parallelism By means of data dependence analysis we can find sets of computations, which can be computed in parallel [4]. So called data dependence graphs (DDGs) present sequential dependencies in a graphical manner and simplify analysis [8]. Results show that each of the modules depicted in figure 2 can be split into independent submodules (not shown). All submodules can be computed in parallel, but require probably access to the entire input data set N. The final parallel algorithm, an SPMD architecture, is a combination of four output dependent parallel modules (Back Face Culling, Pseudo Blocker List, Blocker List, Ray Casting). A Global Visibility Classifier Based on a Multi-DSP-System 15 SPRA338 Implementation We choose a distributed memory multi-DSP system with six nodes from Transtech Parallel Systems for experimental implementation. Each DSP resides on a TIMC40-module (four modules per motherboard) and has access to 8MBytes EDRAM of memory. The designed parallel algorithm has been implemented using the realtime-operating system Virtuoso [11]. We distingiush between the working processes called ’Slaves’ and the coordinating process called ’Master’, which performs also dynamical load balancing (see figure 3). Figure 3. Multi-DSP communication scheme Partitioning Initially the set of N polygons is equally distributed among the r DSPs. Thus each DSP holds N/r polygons in its local memory (optimal memory efficiency). We want to stress that memory efficiency can be traded off against overall execution time by introducing replicas. That reduces much communication between the DSPs. 16 A Global Visibility Classifier Based on a Multi-DSP-System SPRA338 Communication The described partitioning scheme requires possibly communication between all DSPs. It should be mentioned that these dependencies are introduced artificially due to the distribution of data structures (compare to the section Parallelism). Only 1/r of the required polygons are available locally. A hybrid network consisting of a ring-structure and an n-ary tree structure is used to solve nonlocal references (see figure 3, r = 6). Complexity of communication is O(N). Parallel Back Face Culling Each of the r DSPs is able to 2 2 calculate O(N /r ) visibility relations without having to 2 communicate. For the remaining. O(N /r) components each DSP requires to receive N - N/r polygons from the other DSPs. Using the directed ring-network, it requires (r - 1) shift-lefts of N/r polygons between any two DSPs respectively. Parallel Blocker List The generation of a blocker list requires the same amount of communication. In contrast, the amount of 3 3 computations with locally available polygons is O(N /r ). Parallel Ray Casting Once blocker lists are established, rays from source polygons S to receiver polygons R are casted and intersected with the blocking polygons B. Those polygons, which are not available locally (i.e. O(N - N/r) units) are transferred between the DSPs similarly to previous paragraphs. Dynamical Load Balancing Load balancing is required whenever waiting times between synchronizing DSPs tend to influence overall execution time. If individual DSPs have significantly higher workload, we may redistribute some work to ’underloaded’ DSPs. There are two major sources of imbalance: q q A schedule, which favours certain DSPs. Input data sets, which produce unpredictable workloads. A Global Visibility Classifier Based on a Multi-DSP-System 17 SPRA338 The parallel dynamical load balancer has a hierarchical structure (see figure 2). Although individual blocks of the algorithm are working on different workloads, the balancing strategy remains the same and is based on the well known NP-complete bin-packing problem. Instead of optimal strategies, we use simple and fast heuristics: The kernel of the balancer resides on the master-DSP and retrieves load information from all slave-DSPs. This load information is made up of visibility sets {S,R,B#, Owner}, where B# is the pair’s weight (i.e. the number of blocking polygons) and Owner is the loaded DSP. The kernel has to deal with the constraint of minimizing communication between all DSPs. A pseudo code is depicted in figure 4. Figure 4. Pseudo code of dynamical load balancer 18 A Global Visibility Classifier Based on a Multi-DSP-System SPRA338 Results The parallel performance of the presented classifier is determined by several experimental measurements. During all experiments we measure execution time, which is directly influenced by the algorithm, by the programming system, by the multi-DSP-system, etc. The first sections Experiments and Time, Speedup and Efficiency define constraints of executed experiments and present their results. Experiments For our experiments we use a well defined set of polygons. A computer program generates an artificial room by distributing N polygons randomly throughout a three dimensional coordinate system. The labyrinth rooms allow us to simulate very dense rooms as well as very sparse rooms. According to the section Visibility Space Model we can use sets of very high densities to produce large lists of blocking polygons. That allows us to measure worst case runtimes of the parallel program near O(N3). On the other side it has been shown that real rooms are normally of the sparse, inhomogeneous type [1]. These conditions can be simulated too. A simple test set with N = 20 is depicted in figure 5. Lines normal to surfaces are normal vectors and show orientations of polygons. Lines between polygons show partial visibility relations as they were determined by the parallel classifier (see the section Visibility Results). A Global Visibility Classifier Based on a Multi-DSP-System 19 SPRA338 Figure 5. Example Input Data Set. Lines indicate partial visibilities Time, Speedup and Efficiency We have measured elapsed (wall clock) time, which accounts not only for the computational work, but also for the communicational work and task switching due to waiting for resources. Elapsed time states how long one has to wait for a complete solution [2]. All presented data for different numbers of DSPs have been evaluated by averaging the elapsed time of several program runs to ensure that the measured data is not some statistical fluke. 20 A Global Visibility Classifier Based on a Multi-DSP-System SPRA338 The elapsed time of the multi-DSP solution can be brought into relation to the single-DSP solution and results in the speedups and efficiencies from table 1. It should be noticed that although the single-DSP solution plays the role of a sequential basis for speedup and efficiency measurements, it is made up of two tasks (master and slave0) sharing the CPU concurrently. Thus the single-DSP solution is equivalent to the parallel program running on one DSP. We assumed dynamical load balancing for all multi-DSP solutions presented in table 1. Figure 6 shows clearly a time complexity of 0(n3) (see the section Complexity). Distances between lines become larger as the number of polygons increases. We conclude that speedup can be increased by increasing r and N. That indicates good scalability. Scalability Figure 6. Multi-DSP elapsed time (balanced load). A Global Visibility Classifier Based on a Multi-DSP-System 21 SPRA338 To determine scalability of our parallel system, which is the ability to increase speedup as the number of DSPs increases, we evaluate the isoefficiency function [14]. The function relates problem size W (i.e. sequential time complexity O(N3)) to the number of processing elements r [6]. Evaluation can be done approximately by using measured data from table 1 or more exactly by mathematical analysis. We have determined an isoefficiency function of W2/3 = O(r). It states that r can increase linearly with N2 to result in constant efficiency. We conclude that this multi-DSP application is highly scaleable. Table 1. Multi-DSP performance 22 A Global Visibility Classifier Based on a Multi-DSP-System SPRA338 Dynamical Load Balancing As mentioned in the section Dynamical Load Balancing, there are two major sources of imbalances. Code Imbalance Increased elapsed time due to imbalance, which is introduced by the algorithm can be seen in figure 7 and table 2. We find that especially for a small number of DSPs the algorithm introduces much load imbalances. The initial calculation of ’Back Faces’ makes use only of r/2 DSPs in the last iteration. These DSPs are significantly higher loaded. The effect is less significant for an increasing number of DSPs. Furthermore imbalances increase with increasing load and we conclude that this multi-DSP application requires dynamical load balancing, especially if r is small or N is large. Figure 7. Multi-DSP elapsed time (unbalanced load). A Global Visibility Classifier Based on a Multi-DSP-System 23 SPRA338 Table 2. Quality of parallel dynamical load balancer. Data Imbalance We have to deal with imbalances introduced by the input data set. Obviously we could create almost any state of balance by designing some odd test sets. Several experiments showed that dependency of balance on input data is much more significant than dependency on algorithmical flow. From elapsed time’s point of view, timing requirements from table 2 can be seen as ’best cases’ if no load balancer is used. We state that due to an ’open’ definition of the input interface, this multi-DSP application requires a load balancer, which rebalances workload at runtime. The quality of any dynamical load balancing algorithm can be determined by comparing gain of execution speed to time of rebalancing. This is depicted in table 2. We see that the balancer requires 4% of elapsed time on average but shortens runtime widely. Figure 8 shows the balancer’s statistical output for the example from figure 5. 24 A Global Visibility Classifier Based on a Multi-DSP-System SPRA338 Figure 8. Statistical output of parallel dynamical load balancer for the example in figure 5. Visibility Results For small sets of polygons results can be presented graphically, while results for large sets are written to a text file. A statistical output for the example in figure 5 is depicted in figure 9. A Global Visibility Classifier Based on a Multi-DSP-System 25 SPRA338 Figure 9. Statistical output of parallel classifier for example in figure 5. 26 A Global Visibility Classifier Based on a Multi-DSP-System SPRA338 Conclusion We have presented a multi-DSP based global visibility classifier with complexity O(N3/r) for calculation of visibility relations in the entire visibility space. Experimental measurements and scalability analysis showed that the presented parallel application is well scaleable, so many further DSPs can be used efficiently and result in a great reduction of overall execution time, especially for large problem sizes. A parallel dynamic load balancer helps to increase speedup and allows better utilization of available memory. A Global Visibility Classifier Based on a Multi-DSP-System 27 SPRA338 References 1) Larry Aupperle. Hierarchical Algorithms for Illumination. Doctoral thesis, Princeton University, Department of Computer Science, 1993. 2) Lawrence A. Crowl. How to Measure, Present, and Compare Parallel Performance. IEEE Parallel & Distributed Technology, pages 9-25, Spring 1994. 3) J. Foley, A. van Dam, S. Feiner, and J. Hughes. Computer Graphics. Addison Wesley, Reading, USA, 1960. 4) Ian Foster. Designing and Building Parallel Programs. Addison Wesley, 1995. 5) Andrew S. Glassner, editor. An Introduction to Ray Tracing. Academic Press, San Diego, California, 1989. 6) Ananth Y. Grama, Gupta Anshul, and Vipin Kumar. Isoefficiency: Measuring the Scalability of Parallel Algorithms and Architectures. IEEE Parallel & Distributed Technology, pages 12-21, August 1993. 7) Ned Greene, Michael Kass, and Gavin Miller. Hierarchical ZBuffer Visibility. In Computer Graphics, pages 231-234, 1993. 8) Dan I. Moldovan. Parallel Processing. Morgan Kaufmann Publishers, Inc., 1993. 9) W. Sturzlinger and C. Wild. Parallel Visibility Computations for Parallel Radiosity. In Winter School of Computer Graphics and CAD, pages 405-413, University of West Bohemia, Plzen, Czech Republic, 1994. 10) Seth Teller and Pat Hanrahan. Global Visibility Algorithms for Illumination Computations. In Computer Graphics, pages 239246, 1993. 11) Eric Verhulst. Virtuoso: A virtual-single processor Programming System for distributed real-time applications. Microprocessing and Microprogramming, Euromicro Journal, pages 103-115, 1994. 12) J. Warnock. A Hidden Surface Algorithm for Computer Generated Half Tone Pictures. Technical Report, 1969. 13) Alan Watt and Mark Watt. Advanced Animation and Rendering Techniques. Addison Wesley, Reading, USA, 1992. 14) Albert Y. H. Zomaya, editor. Parallel & Distributed Computing Handbook. Computer Engineering. McGraw-Hill, 1996. 28 A Global Visibility Classifier Based on a Multi-DSP-System

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Download PDF

advertising