Linköping Studies in Science and Technology
Dissertations, No. 1490

Testing and Logic Optimization Techniques for Systems on Chip

by Tomas Bengtsson

Department of Computer and Information Science
Linköpings universitet
SE-581 83 Linköping, Sweden

Linköping 2012

Copyright © 2012 Tomas Bengtsson
ISBN 978-91-7519-742-5
ISSN 0345-7524
Printed by LiU-Tryck 2012
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-84806

Abstract

Today it is possible to integrate more than one billion transistors onto a single chip. This has enabled the implementation of complex functionality in handheld gadgets, but handling such complexity is far from trivial. The challenges of handling this complexity are mostly related to the design and testing of the digital components of these chips.

A number of well-researched disciplines must be employed in the efficient design of large and complex chips. These include the utilization of several abstraction levels, the design of appropriate architectures, several different classes of optimization methods, and the development of testing techniques. This thesis contributes mainly to the areas of design optimization and testing methods.

In the area of testing, this thesis contributes methods for testing on-chip links that connect different clock domains. This includes testing for defects that introduce unacceptable delay, lead to excessive crosstalk and cause glitches, all of which can produce errors. We show how purely digital components can be used to detect such defects and how the tests can be scheduled efficiently. To manage increasing test complexity, another contribution proposes raising the abstraction level of fault models from the logic level to the system level. A set of system level fault models for a NoC switch is proposed and evaluated to demonstrate their potential.

In the area of design optimization, this thesis focuses primarily on logic optimization. Two contributions for Boolean decomposition are presented.
The first contribution is a fast heuristic algorithm that finds non-disjoint decompositions of Boolean functions. This algorithm operates on a Binary Decision Diagram. The other contribution is a fast algorithm for detecting whether a function is likely to benefit from optimization for architectures with a gate depth of three, with an XOR gate as the third gate.

Popular science summary (Populärvetenskaplig sammanfattning)

Today it is possible to integrate more than one billion transistors on a single microchip. This development has made it possible to implement very complex and advanced functions in small handheld devices; so-called smartphones are a typical example. Handling the complexity of microchips of this size is far from trivial, particularly where the digital parts are concerned.

Results from several different research areas are combined to design large, complex microchips efficiently. These research areas address how to use several abstraction levels, how to devise good architectures, how to optimize designs and how to test the finished microchips. The contributions presented in this thesis focus partly on how to optimize designs and partly on how to test the finished microchips.

A microchip can have different clock domains in different parts so that one and the same clock signal need not be distributed across the entire chip. Concerning chip testing, this thesis contributes methods for testing communication links that run between parts of the chip with different clock signals. The contributions include tests for defects that can cause errors through unacceptable delay, through too much crosstalk or through glitches.

The logic level is the abstraction level at which a design is represented in terms of gates and flip-flops. It is usually from such a representation that the details of how a microchip is to be tested are decided, and extra gates and flip-flops are often added to the chip for test purposes.
To manage test complexity, this thesis contributes a proposal to raise the abstraction level of test development from the logic level to the system level. The system level is a representation that describes what the design should do without giving any details about the implementation. To demonstrate the potential of developing tests at the system level, this thesis proposes and evaluates system level fault models for a NoC switch, a specific type of component found in some microchips.

Concerning optimization methods, this thesis makes two contributions that focus on minimizing the number of gates in a design. The first contribution is an algorithm for extracting subfunctions from a Boolean expression. The algorithm operates on a so-called Binary Decision Diagram (BDD), a type of directed graph used to represent a Boolean function. The second contribution is a fast algorithm for predicting how much a function will benefit from an architecture with a gate depth of three, where the third gate is a two-input XOR gate.

Acknowledgments

There are many people who have supported and encouraged me during my Ph.D. studies and the writing of this thesis. I would like to give special thanks to Professor Shashi Kumar, my supervisor at Jönköping University, for always taking time to help, support and encourage me. I would also like to give special thanks to Professor Zebo Peng, my supervisor at Linköping University, for his great supervision and patient guidance throughout my Ph.D. studies. Special thanks also go to Professor Elena Dubrova at the Royal Institute of Technology, Stockholm, for good supervision, discussions and encouragement during the work on logic optimization, which formed the basis of my licentiate thesis.
I would like to thank Professor Shashi Kumar once more for his very useful support, encouragement and supervision during the work leading to my licentiate degree. I would also like to thank Professor Bengt Magnhagen, who accepted me as a doctoral student at Jönköping University. I am very thankful to Dr. Artur Jutman and Professor Raimund Ubar at Tallinn Technical University for very good research collaboration on electronic testing, as well as for their inspiration and their willingness to share their knowledge and experience. I am also thankful to Dr. Andrés Martinelli for good collaboration on logic optimization.

I am grateful to all other colleagues at Linköping University, the Royal Institute of Technology and Tallinn Technical University who have contributed in one way or another to making this work possible. I am also grateful to all colleagues at Jönköping University who have contributed by encouraging me, participating in technical discussions, helping me to handle obstacles, or contributing in other ways to making this work possible. Special thanks to Alf Johansson, Rickard Holsmark and Dr. Adam Lagerberg.

I would also like to thank Brittany Shahmehri for her great work correcting and improving the English of this thesis. Finally, I would like to give great thanks to my parents Ann-Louise and Klas, my sister Åsa and my girlfriend Louise for all their support, understanding and encouragement.

Tomas Bengtsson
November 2012

Contents

Part A. Introduction and background

1 Introduction
1.1 Chip design, SoC and test development .......... 3
1.2 Addressed problems and contributions .......... 5
1.3 Thesis outline .......... 11

2 Digital system design and testing .......... 13
2.1 Digital system design .......... 14
2.2 Core based design and systems on chips .......... 18
2.3 Logic optimization .......... 22
2.4 Defects and digital system testing .......... 31

Part B. Chip testing

3 Background and related work in SoC testing .......... 49
3.1 SoC testing and NoC testing .......... 49
3.2 On-chip crosstalk induced fault testing .......... 55
3.3 Test generation at high abstraction levels .......... 69

4 Testing of crosstalk induced faults in on-chip interconnects .......... 77
4.1 Method for testing of faults causing delay errors .......... 77
4.2 Method for scheduling wires as victims .......... 94
4.3 Method for test of crosstalk-faults causing glitches .......... 100
4.4 Conclusions .......... 112

5 System level fault models .......... 113
5.1 System level faults .......... 113
5.2 Evaluation of system level fault models .......... 117
5.3 Conclusions .......... 130

Part C. Logic optimization

6 Background and related work in Boolean decomposition .......... 133
6.1 Decomposition of Boolean functions .......... 134
6.2 Decision diagram based decomposition methods .......... 139
6.3 Decomposition for three-level logic synthesis .......... 151
6.4 Other applications of Boolean decomposition .......... 154

7 A fast algorithm for finding bound-sets .......... 159
7.1 Basic idea of Interval-cut algorithm .......... 159
7.2 Interval-cut algorithm and formal proof of its functionality .......... 161
7.3 Implementation aspects and complexity analysis .......... 165
7.4 Experimental results .......... 175
7.5 Discussion and conclusions .......... 178

8 Functional decomposition for three-level logic implementation .......... 179
8.1 Basic ideas in 3-level decomposition estimation method .......... 180
8.2 Theorem on which the estimation method is based .......... 182
8.3 Estimation algorithm .......... 185
8.4 Experimental results .......... 187
8.5 Conclusions .......... 189

Part D. Conclusions

9 Conclusions and future work .......... 193
9.1 Contributions in chip testing .......... 193
9.2 Contributions in Boolean decomposition .......... 194
9.3 Proposals for future work .......... 195

Part A
Introduction and background

Chapter 1
Introduction

Development of a System on Chip (SoC) is a complex process with many steps, each with special demands and challenges. In this thesis, we contribute analyses of certain aspects of the design and testing of complex SoCs, and also propose solutions to some associated problems.

This chapter briefly provides the background necessary for this thesis, discusses the problems addressed and outlines the contributions. Section 1.1 gives the background and Section 1.2 describes the problems addressed and the contributions, including a list of publications based on the contributions. Section 1.3 provides an outline of the thesis.

1.1 Chip design, SoC and test development

Since the integrated circuit was invented, the level of device integration on a single chip has grown rapidly; in fact, it has doubled about every two years for several decades [ITRS08]. This growth is commonly referred to as Moore's law, named for Gordon Moore, who first predicted this rate of increase in 1965 [Moo65]. Today it is possible to integrate more than one billion transistors on a single die.
As the level of integration increases, there are basically two design challenges that need to be considered. The first is related to the decreasing dimensions of on-chip components and the relative increase in the length of interconnections. As component dimensions decrease, physical aspects that could previously be neglected must now be considered. Crosstalk effects, for example, require more attention today than previously.

The other design challenge that becomes more intricate as component density increases is related to design complexity. The large number of transistors that can be integrated onto a single chip makes it possible to design very complex circuits. With increasing complexity the design process becomes more challenging, which means that more efficient design methods are needed. Two popular techniques are the utilization of Intellectual Property cores (IP-cores) and the creation of more sophisticated computer tools that allow design at a higher level of abstraction. The goal is for the finished product to be as close to optimal as possible in terms of cost, performance and/or power consumption. However, many synthesis and optimization problems are computationally expensive and therefore cannot practically be solved with an exact optimization algorithm. In many cases the choice of optimization strategy is a tradeoff between performance, production cost, flexibility and design time.

As the integration level increases, the development of efficient test techniques becomes more challenging as well. Development of tests for defects in chips consists of two major tasks. The first task is to determine how to detect the presence of defects inside the chip. The second task is to provide a means of activating the measurement of the defect by sending a signal into the chip and then propagating the results of the measurement back out of the chip. The second task is referred to as test access.
The increasing challenge of identifying the presence of a defect inside the chip is closely related to the increasing design challenges that arise with the miniaturization of components. For many decades the stuck-at fault model has been used to model defects in digital circuits. In this model, a defect makes a node in a digital circuit behave as if it were permanently stuck at logic value 0 or logic value 1. To check whether a node is stuck-at 0, logic 1 is applied to the node and the logic value of the node is measured. Shorts and breaks are typical defects that can be detected in this way. Defects of this type can be considered either to exist or not.

In the case of modern deep sub-micron chips, it is sometimes also necessary to test for other defects that are of a more continuous nature. This means that one or more parameters are outside of acceptable ranges, causing unwanted chip behavior. One example of such a defect is a wire that is too thin, causing too much resistance. Another example is closely-spaced wires causing more parasitic capacitance than accounted for, which can lead to excessive crosstalk and produce an unacceptable level of delay. Unlike defects modeled as stuck-at faults, measurement of crosstalk-faults and delay faults requires extra logic.

Increased chip complexity also makes designing test access a far more challenging task. As more logic exists between the defect being tested and the chip's interface to its environment, it becomes more complicated to activate the test and propagate the result data back out of the chip. Large chips are usually equipped with logic dedicated to test activation and test result propagation. In the literature such logic is usually referred to as Design for Testability (DfT) logic. Chips with a large number of components have a large number of potential defects, which means that testing for every potential defect becomes time consuming.
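The stuck-at fault model described above can be sketched in a few lines of Python. The circuit, its internal node and all names below are invented purely for illustration; the point is only that a test vector detects a stuck-at fault when the good and faulty circuits produce different outputs.

```python
# Toy illustration of the stuck-at fault model: a small circuit
# y = (a AND b) OR c with an internal node n = a AND b.
# Circuit structure and names are invented for illustration only.

def circuit(a, b, c, stuck_n=None):
    """Evaluate the circuit; stuck_n forces node n to 0 or 1 (a defect)."""
    n = a & b
    if stuck_n is not None:
        n = stuck_n          # defect: node n permanently stuck
    return n | c

def detects(vector, stuck_n):
    """A vector detects the fault if good and faulty outputs differ."""
    a, b, c = vector
    return circuit(a, b, c) != circuit(a, b, c, stuck_n)

# To test node n for stuck-at-0, drive n to 1 (a = b = 1) and make the
# output sensitive to n (c = 0):
assert detects((1, 1, 0), stuck_n=0)
# The same vector cannot detect stuck-at-1, since n is 1 anyway:
assert not detects((1, 1, 0), stuck_n=1)
```

The same vector-pairing idea underlies automatic test pattern generation at the logic level: a vector must both activate the fault and propagate its effect to an observable output.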
One solution for increasing test speed is to add special on-chip test logic, called Built-In Self-Test (BIST) circuitry, which is used for self-testing of the chip. We use the phrase test logic to refer to both DfT logic and BIST circuits.

1.2 Addressed problems and contributions

In this thesis we address several key issues in the design and test of complex SoCs. These issues are all related to the development of silicon technology and the rapid increase in chip complexity. The specific problems addressed and the technical contributions of the thesis are described in the following subsections.

1.2.1. Crosstalk test for on-chip links

Given the small dimensions and high frequencies of modern chips, it may be necessary to test for defects that cause excessive delay or too much crosstalk. This type of testing is usually essential for relatively long on-chip wires. Tests for crosstalk-faults should detect defects that cause more crosstalk than accounted for. For some kinds of crosstalk effects, explicit testing is not necessary, although they need consideration during design. Considering capacitive coupling is usually sufficient when the test fabric is designed. The capacitive coupling between wires affects their signal delay and can cause glitches.

Unacceptable signal delay caused by crosstalk occurs only under certain conditions, which means it will only manifest when the interfering wires are carrying certain signals. When a signal wire is tested for crosstalk related defects, the interfering wires can be put in a state representing the worst case scenario. If the signal works correctly in each worst case scenario, one can conclude that the tested signal does not suffer from too much crosstalk. This type of test is, however, not sufficient in cases where a signal travels between components with different clock signals, because there is non-determinism in the phase difference between the clock signals of the different clock domains.
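The worst-case-scenario idea above can be sketched with a toy timing model. The numbers below follow the common Miller-factor rule of thumb (an aggressor switching opposite to the victim roughly doubles the effective coupling capacitance, one switching the same way roughly cancels it); they are illustrative values, not figures from this thesis.

```python
# Toy model of worst-case crosstalk patterns for one victim wire with
# two neighbouring aggressors. Transitions: +1 rising, -1 falling,
# 0 quiet. All numbers are illustrative rules of thumb.

def coupling_factor(victim_tr, aggressor_tr):
    """Effective coupling multiplier contributed by one aggressor."""
    if aggressor_tr == 0:                    # quiet aggressor: nominal
        return 1.0
    return 2.0 if aggressor_tr != victim_tr else 0.0

def relative_delay(victim_tr, aggressor_trs):
    """Victim delay relative to the no-coupling case (toy units)."""
    return 1.0 + 0.5 * sum(coupling_factor(victim_tr, a)
                           for a in aggressor_trs)

# Worst case for a rising victim: both neighbours falling.
worst = relative_delay(+1, [-1, -1])
quiet = relative_delay(+1, [0, 0])
best = relative_delay(+1, [+1, +1])
assert worst > quiet > best
```

A delay test therefore drives the aggressors with the opposite transition to the victim's; if the victim still meets timing under that pattern, milder patterns cannot cause a delay failure either.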
In this thesis a test method is presented that tests for crosstalk-faults in bus lines between different clock domains on a chip. This method reads the signal wire one clock cycle earlier than under normal operation. If the signal can still be read correctly under this tightened timing, the interference affecting the signal being tested cannot be large enough to cause a failure. The measurement can be repeated several times, and if the signal is read correctly at least once, one can conclude that the crosstalk-fault under consideration is not present. An advantage of this method is that only digital test logic is needed.

Crosstalk can also cause glitches on signal wires. With digital glitch detectors, tests for glitches can be included as well. Together, the tests for crosstalk-faults causing unacceptable delay and for faults causing glitches form a complete test for crosstalk induced faults affecting signal wires. Contributions in this thesis show how such a complete test can be formed while requiring only digital test logic to be inserted into the chip.

Buses on chips have wires packed closely together. The height of wires in modern chips has become greater than their width [Aru05], which makes capacitive coupling between wires relatively significant. This, in turn, increases the risk that a defect could cause capacitive coupling effects to be greater than accounted for. Such defects are the main cause of crosstalk-faults, which means it is often sufficient to test only for this type of defect. When testing for interference on a signal wire in a bus, one strategy for creating worst case interference is to apply values to all other signal wires. However, it is usually sufficient to apply signals only to the wires closest to the wire being tested. In this way, several signals in a bus can be tested for crosstalk-faults simultaneously, improving test efficiency.
During the test procedure, the term victim wire is used for the wires currently being tested and the term aggressor wire for the wires that affect the victim wires through crosstalk. One contribution of this thesis is a method for scheduling wires as victims and aggressors during the test procedure. A shift register is used with one cell per wire, controlling whether that wire should be a victim or an aggressor. Given a minimum distance between wires that may simultaneously be victims, initial values can be determined for the shift register that make the test procedure efficient.

The contributions related to testing of crosstalk-faults and delay faults in asynchronous on-chip links have been published in [Ben05a, Ben05b, Ben06a, Ben06b, Ben06c, Ben06e, Ben08].

1.2.2. System level fault modeling and test generation

It has been recognized in the research and circuit manufacturing communities that the way to increase design productivity is to work at a higher level of design specification and to use CAD tools to synthesize the circuit for the target technology. Most test methodologies, however, still use a logic level representation for generating test vectors and test logic. It would be beneficial if test logic and test vectors could be prepared along with the rest of the design process. This requires accurate fault models at the higher abstraction levels.

At a specific level of abstraction, faults can be defined that correspond to possible defects in the actual physical implementation, or to faults at a lower abstraction level. Faults can also be based on fault models at the abstraction level of the design specification. Faults that correspond to possible defects have the advantage that they can be very accurate. A drawback is that they can be tricky to create, depending on the tools and methods used for synthesis and on how the system has been optimized.
Another drawback is that such faults cannot be found before synthesis has been completed. Faults based on fault models at a certain abstraction level have the advantage that they can be used before the design is synthesized into a lower abstraction level. This is needed for the development of test data and test logic along with the design process. At abstraction levels above the logic level, the most difficult challenge is to create fault models with a good correlation to physical defects in the implementation. The higher the abstraction level, the more difficult it is to find good fault models.

Above the behavior level of abstraction is the system level. The system level of abstraction describes what the system should do without providing information on how it should be implemented. Because it is difficult to define general system level faults, we propose the use of application-specific fault models at the system level. Application-specific faults are specific to a certain type of system. For a switch used in data communication networks, an example of such a fault model could be: a packet from one specific direction that is supposed to be transferred onwards in a certain direction is instead transferred in a wrong direction. For a display driver, an example of a system level fault model would be: pixels of a certain color turn a certain different color when the intensity is supposed to be greater than some level.

In this thesis we propose and evaluate a set of system level fault models for a Network on Chip switch (NoC switch). A simplified version of a NoC switch has been designed and synthesized to the logic level. Statistical analyses have been performed to compare how test vectors cover the stuck-at faults of this logic level implementation and how they cover the system level faults.

The contributions related to system level fault modeling and analysis have been published in [Ben06d].

1.2.3.
Logic optimization

Optimization is generally performed during the process of designing and synthesizing digital systems. The most important optimization targets are minimizing chip area, optimizing speed and minimizing power consumption. For a given design, one target may be prioritized over the others. Logic optimization is performed during synthesis from the RT-level to the logic level; it is the process of optimizing a system described at the logic level of abstraction. The number of flip-flops, the number of gates and the sizes of gates can be used at the logic level to predict the chip area and power consumption of the system. Gate depth can be used to predict speed. Many optimization problems at the logic level are NP-hard [Dem94], so heuristic methods are needed.

One of the main steps in the optimization of the combinational parts of a design is Boolean decomposition. Boolean decomposition is the process of finding subexpressions of a Boolean function. This thesis makes two contributions to Boolean decomposition.

The first contribution is a fast heuristic method that finds bound-sets of a Boolean function. The presented method operates on Reduced Ordered Binary Decision Diagrams (ROBDDs). For ROBDDs with a good variable order, the presented heuristic finds all bound-sets in most cases.

The second contribution is a fast method for estimating the likelihood that a Boolean function f(X) will benefit from a target implementation expressed as g1(X) ⊕ g2(X), where the functions f(X), g1(X) and g2(X) are implemented with two-level logic. Optimization algorithms for such an expression can be quite time-consuming, so it is advantageous to know in advance whether optimization is likely to be beneficial.

The contributions relating to Boolean decomposition have been published in [Ben01, Ben03a, Ben03b].

1.2.4. List of contributions

The contributions in this thesis have been published in the following articles.

[Ben01] T. Bengtsson and E. Dubrova, "A sufficient condition for detection of XOR-type logic", Proceedings of Norchip, Stockholm, Sweden, pp. 271-278, November 2001.

[Ben03a] T. Bengtsson, "Boolean decomposition in combinational logic synthesis", Licentiate thesis, Royal Institute of Technology, Stockholm, ISSN 1651-4076, 2003.

[Ben03b] T. Bengtsson, A. Martinelli, and E. Dubrova, "A BDD-based fast heuristic algorithm for disjoint decomposition", Proceedings of the Asia and South Pacific Design Automation Conference, Kitakyushu, Japan, pp. 191-196, January 2003.

[Ben05a] T. Bengtsson, A. Jutman, S. Kumar, and R. Ubar, "Delay testing of asynchronous NoC interconnects", Proceedings of the International Conference on Mixed Design of Integrated Circuits and Systems, June 2005.

[Ben05b] T. Bengtsson, A. Jutman, R. Ubar, and S. Kumar, "A method for crosstalk fault detection in on-chip buses", Proceedings of Norchip, Oulu, Finland, pp. 285-288, November 2005.

[Ben06a] T. Bengtsson, A. Jutman, S. Kumar, R. Ubar, and Z. Peng, "Analysis of a test method for delay faults in NoC interconnects", Proceedings of the East-West Design & Test International Workshop (EWDTW), pp. 42-46, September 2006.

[Ben06b] T. Bengtsson, A. Jutman, S. Kumar, R. Ubar, and Z. Peng, "Off-line testing of delay faults in NoC interconnects", Proceedings of the Euromicro Conference on Digital System Design: Architectures, Methods and Tools, pp. 677-680, 2006.

[Ben06c] T. Bengtsson, S. Kumar, A. Jutman, and R. Ubar, "An improved method for delay fault testing of NoC interconnections", Proceedings of the Special Workshop on Future Interconnects and Networks on Chip (held along with Design, Automation and Test in Europe), March 2006.

[Ben06d] T. Bengtsson, S. Kumar, and Z. Peng, "Application area specific system level fault models: a case study with a simple NoC switch", Proceedings of the International Design and Test Workshop (IDT), November 2006.

[Ben06e] T. Bengtsson, S. Kumar, R. Ubar, and A. Jutman, "Off-line testing of crosstalk induced glitch faults in NoC interconnects", Proceedings of Norchip, Linköping, Sweden, pp. 221-226, November 2006.

[Ben08] T. Bengtsson, S. Kumar, R. Ubar, A. Jutman, and Z. Peng, "Test methods for crosstalk-induced delay and glitch faults in network-on-chip interconnects implementing asynchronous communication protocols", IET Computers and Digital Techniques, vol. 2, no. 6, pp. 445-460, 2008.

1.3 Thesis outline

This thesis is divided into four parts, Part A through Part D. Part A gives an introduction and background for the entire thesis. It consists of this introductory chapter and Chapter 2, which provides a more detailed background to the contributions in this thesis.

Parts B and C present the contributions in testing and in logic optimization, respectively. Part B consists of Chapters 3-5. Chapter 3 presents background on SoC testing and describes work related to the contributions in electronic testing. Chapter 4 presents the contributions in the area of testing for crosstalk and delay faults. The contribution to system level fault modeling and testing is presented in Chapter 5.

Part C has a similar structure to Part B. It consists of Chapters 6-8. Chapter 6 provides more detailed background on Boolean decomposition and describes work related to the contributions in this area. Chapter 7 presents a fast heuristic method to find bound-sets of a Boolean function represented with a BDD. The contribution to the optimization of Boolean functions of the form f(X) = g1(X) ⊕ g2(X) is presented in Chapter 8.

The last part of the thesis, Part D, consists of Chapter 9, which presents conclusions and proposals for future work.

Chapter 2
Digital system design and testing

This chapter provides relevant background in more detail than was offered in Chapter 1, with the goal of providing context for the contributions described in later chapters.
Section 2.1 provides an overview of the development procedure for complex digital electronic systems. Section 2.2 describes core based design and testing, including an introduction to SoC. It also describes Network on Chip (NoC), a promising candidate for the interconnection infrastructure of future SoCs. Section 2.3 and Section 2.4 give overviews of design optimization issues and test optimization issues, respectively.

2.1 Digital system design

[Figure 2.1: A typical design flow for a complex digital system. The flow runs from System specification through System synthesis, Design at behavior level, RT-level design and Logic design (via logic synthesis and technology mapping) to Layout generation and Layout, drawing on a library of algorithms, an architecture template and libraries of soft and hard IP-cores, with a parallel software development branch producing embedded binary code.]

The design process of a complex digital system generally starts from a system level specification. A system specification is a description of what functions the system should perform, with little or no description of how they should be implemented. The design process then, step by step, creates a design that implements the desired functionality. Figure 2.1 shows a diagram of what the design process typically looks like. In many cases the design process is iterative, but this is omitted from the figure for the sake of simplicity and to maintain focus on the parts that are relevant to the work presented in this thesis. The steps in the upper part of Figure 2.1 deal with more abstract design specifications. Typically, the design steps at higher abstraction levels are performed manually while the steps at lower levels are performed with computer tools.

2.1.1. Abstraction levels for modeling and design

To handle the complexity, the design process is divided into several levels of abstraction.
Higher abstraction levels hide details and complexity that are visible at lower levels of abstraction. In Figure 2.1 the design flow starts with System specification and ends with Layout. The following sections describe the abstraction levels: system level, behavior level, register transfer level, logic level and layout level.

System level

As mentioned before, at the system level the system's desired functionalities are described without any explanation of how they should be implemented. For example, if a system or part of a system is supposed to sort a list of elements, a system level representation specifies that the system should sort and, if it is not obvious what is meant by sorting, it defines the properties of a sorted list. However, at this abstraction level no information is given about which algorithm should be used to perform the sorting. Another example is a design that includes filtering of a digital signal. At the system level the properties of the filter would be specified, but not the algorithm that implements the filter. In Figure 2.1 the box System specification represents the description at the system level of abstraction. SystemC, SystemVerilog and UML are examples of languages that can be used to model a system at this level.

Behavior level

At the behavior level the system is described as an algorithm. In the example of a system that is supposed to sort, this level defines the sorting algorithm that should be used. In the example of a system with a filter, the filtering algorithm is defined with all its parameters. For example, it could be described as a Finite Impulse Response (FIR) filter in which all multiplication factors are specified, where multiplication factor refers to the factors by which samples should be multiplied. The number representation to be used for sample values is usually also specified at the behavior level. VHDL and Verilog can be used for modeling at this level.
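A behavior-level description of the FIR-filter example above might, in algorithmic form, look like the following sketch. The coefficients (multiplication factors) are invented for illustration; the essential point is that the algorithm and its parameters are fixed, while nothing is said about registers, multiplexers or gates.

```python
# Behaviour-level sketch of an FIR filter: the algorithm and its
# multiplication factors are fully specified, the implementation is not.
# Coefficients are invented for illustration.

COEFFS = [0.25, 0.5, 0.25]   # multiplication factors h[k]

def fir(samples):
    """y[n] = sum over k of h[k] * x[n-k], a sliding weighted sum."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(COEFFS):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out

# A constant input settles to the coefficient sum (here 1.0):
assert fir([1.0] * 5)[-1] == 1.0
```

A refinement to the RT-level would then decide, for example, whether the three multiplications share one physical multiplier over three clock cycles or use three parallel multipliers.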
Register transfer level

At the Register Transfer level (RT-level) a system is described with a datapath and a controller. The datapath consists of functional units, vector multiplexers and registers. These elements are connected to each other by means of signals which are vectors of bits. The RT-level is the highest level of abstraction at which it is defined what should be performed in each clock cycle. In the datapath only registers contain memory elements, and they are clocked with a clock signal. Functionalities that should be purely combinational are represented as functional units. Examples of functional units are ALUs and multipliers. For functionalities that require several clock cycles, an RT-level representation describes how they are implemented with registers and purely combinational functional units. The controller is used to generate load signals for registers, control the multiplexers and control functional units in the datapath. Inputs to the controller can be signals from the datapath representing the status of previous computations, for example the output of a comparator that compares two bit vectors in the datapath. The controller can also have external inputs. The controller is usually described as a Finite State Machine (FSM). In the example of a system with a digital filter, the datapath contains sample values and intermediate results of the computation. The controller part controls what the datapath should do. For example, one multiplier can be utilized for several multiplication steps in the filter algorithm. The controller then controls vector multiplexers in the datapath to feed the multiplier with multiplicands from the correct sources. VHDL and Verilog are examples of hardware description languages that are used for description of a system at the RT-level. In Figure 2.1 the box RT-level design represents the description at the RT-level of abstraction.
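As a toy illustration of the datapath and controller split (a Python sketch written for this text, not an actual RT-level description), consider a unit that computes a greatest common divisor by repeated subtraction: two registers and a subtractor form the datapath, while a small FSM uses the comparator outputs as status signals to decide what the datapath should do in each clock cycle.

```python
# Hypothetical RT-level style sketch: registers A and B, a subtractor as the
# functional unit, and a controller FSM that reads comparator status signals.
def gcd_rt(a, b):
    A, B = a, b                  # datapath registers
    state = "COMPUTE"            # controller FSM state
    cycles = 0
    while state == "COMPUTE":
        cycles += 1              # one loop iteration models one clock cycle
        if A != B:               # comparator status signal into the controller
            if A > B:
                A = A - B        # controller steers the subtractor inputs
            else:
                B = B - A
        else:
            state = "DONE"
    return A, cycles
```

The key RT-level property that the sketch mimics is that the work performed in every clock cycle is explicit: one subtraction, one comparison, one register update per iteration.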
Logic level

At the logic level of abstraction the system is described as a network of gates and flip-flops. For example, at the RT-level an ALU is only defined by the operations it should perform for the respective combinations of input control signals. At the logic level it is described how a network of gates makes the ALU perform those operations. An expression that is formulated as a Boolean equation has an easy and direct mapping to a network of gates. Because of that, Boolean equations are often used to represent the gates in a logic network. In Figure 2.1 the box Logic design represents the description at the logic level of abstraction.

Layout level

In this thesis we use the term layout level to refer to a complete description of the various masks used in the various steps of IC fabrication. At this level it is defined where on the chip each transistor and all other components should be placed to make the chip perform the desired functionality. It is possible to define the transistor level as an abstraction level in between the logic level and the layout level. At that level the network of transistors is specified but not the physical positions of the respective transistors. EDIF is an example of a file format that can be used to describe a system at the transistor level. In Figure 2.1 the box Layout represents the description at the layout level of abstraction.

2.1.2. Design flow

System synthesis is the process of refining the system specification into a design at the behavior level. At this step the algorithms that will be used for implementing the system specification are identified. A decision can also be made to include an architecture template with some pre-designed components, such as processors, memories, buses and communication protocols, already included in the design. Parts of the functionality in digital systems are usually implemented in software. In Figure 2.1 this is represented by the dotted box to the right.
The software development could be described in more detail, but because the contributions of this thesis are related to hardware development, everything about software development is represented with a single box in the figure. The embedded software is mapped to processing elements in the system. In addition to predefined components, the architecture template also contains slots for new hardware. When the system synthesis has finished, the hardware design process continues in the synthesis steps that follow. The design at the behavior level is refined to an RT-level design through the design step behavior synthesis. Behavior synthesis schedules which operations should be executed in each clock cycle. Logic synthesis and technology mapping is the synthesis step in which the RT-level design is refined to a logic design. The output of this step is a network of gates and flip-flops that implements the functionality of the system. The last synthesis step is the layout generation, in which the masks for the various layers are generated for chip manufacturing. Each of the synthesis steps can be implemented in many different ways and it is desirable to find a method that gives the best possible final implementation. Optimization operations can also be applied to the design descriptions at the different abstraction levels before proceeding with the next synthesis steps. This is described further in Section 2.3, which is about optimization. There the focus is on optimization at the logic level. The contribution of this thesis presented in Part C concerns optimization at the logic level.

2.2 Core based design and systems on chips

As mentioned in Section 1.1, the integration level on a chip has doubled about every two years for several decades, such that more than one billion transistors can be integrated on a single chip today.
If this large capacity of chips is to be utilized, the methods used to design chips need to be increasingly efficient; otherwise the number of man-hours required to design a chip would grow with the integration level and become unrealistic in most cases. One important method to keep the number of man-hours for design acceptably low is the usage of IP-cores. IP-cores are ready-made designs that can be included in a SoC design. In this section IP-cores are first described, and then the way in which SoCs can be composed with IP-cores is discussed. After that NoC, an infrastructure that can be used in SoCs to connect IP-cores, is described.

2.2.1. IP-cores

IP-cores are designs that have already been made in-house or designs that are obtained from external suppliers. Based on the abstraction level at which an IP-core is described, it can be classified either as a hard IP-core or a soft IP-core. An IP-core provided at the layout level is called a hard IP-core. IP-cores provided at the logic level of abstraction and above are called soft IP-cores. A soft IP-core can be a network of gates and flip-flops. In this case it is at the logic level of abstraction. A soft IP-core provided at the RT-level can be a VHDL description. The architecture template may already include some IP-cores, and more IP-cores may be included during the synthesis process. This is shown in Figure 2.1. For example, some hard IP-cores can be included during layout generation while soft IP-cores are included earlier in the design process. One important advantage of IP-cores is that they are reusable. An IP-core can be used in several designs and can be reused from previously designed chips. There are companies which sell IP-cores [Alt12, Arm12] and some IP-cores are available for free [Ope12]. A widely used IP-core is the processor, for which a supplier can provide software development tools along with the IP-core itself.
Soft IP-cores rely on the users' synthesis tools. They are independent of the target chip technology, and it is an important advantage that they can be used for many different chip technologies. Another advantage is that the synthesis tool can, to some extent, make a soft IP-core fit layout constraints, for example a particular, desired shape. Hard IP-cores are provided as a layout for a certain chip technology. An advantage of hard IP-cores is that the performance in terms of speed and power consumption can be optimized, and this information can be provided by the IP-core provider. Knowledge of such details can help in selecting appropriate IP-cores to include in a design based on the design constraints.

2.2.2. Systems on chips

A SoC is composed of several cores on a single chip which collaborate to make the chip perform its desired functionality. In a SoC the different cores have to be connected to each other in an appropriate way to achieve the desired functionality. Early SoCs usually had dedicated wires connecting each pair of components that needed to communicate. As the level of integration grew, such interconnections became unwieldy and took up too much chip area. As a result, bus-based infrastructures became popular in SoCs. A bus is a single broadcast medium. It is widely realized that single-bus architectures can no longer deliver the required global bandwidth and latency to support current SoCs [Ver03]. Using multiple buses is a way to achieve better performance. For a system with a large number of cores, such a system of buses might become bulky, because all pairs of cores that communicate with each other must have at least one bus in common, and several buses are needed to gain any significant advantage over a single bus. In complex SoCs several advantages can be achieved if a packet based communication infrastructure is used instead of buses. In 2002 the NoC communication architectures were proposed [Ben02, Kum02].
This is a packet based communication infrastructure that can be used instead of buses in SoCs. One advantage of such a structure is that more parallelism can be achieved in the communication compared to a bus-based infrastructure. In this way the overall throughput can be improved. The NoC architecture is described further in the next subsection.

2.2.3. Networks on chips

The process of developing SoCs, particularly those with a NoC infrastructure, is a unifying factor for the contributions in this thesis. The NoC infrastructure is a packet based communication system connecting different cores in a SoC. A core is an IP-core or some other subcomponent. This packet based infrastructure consists of switches with links between them. In a switch, each packet that arrives at an input port is forwarded to an output port on its way to the final destination. Ports in a switch are used to connect to other switches via links and to connect to cores. A commonly used topology for the infrastructure and cores is the mesh topology [Ben02, Kum02], which is illustrated in Figure 2.2. In this type of topology the switches and the cores are arranged in a matrix. Communication links are connected between adjacent switches in the x-direction and in the y-direction. Each switch is connected to one core. All links between adjacent switches have the same physical length, which has the advantage that the links will have predictable and equal delays.

[Figure 2.2: Mesh topology NoC layout, showing a matrix of switches, each connected to a core and to its neighboring switches.]

A drawback of the mesh layout is that the available chip area for each core is required to be approximately equal. Usage of IP-cores that are much smaller will leave chip area unused. IP-cores that are larger than the allocated area cannot be included in the NoC-chip without modifying its structure.
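For illustration, the following Python sketch (our own, not from the thesis) computes the path of a packet under XY routing, a simple deterministic routing scheme commonly used in mesh NoCs: a packet first travels along the x-direction links until it reaches the destination column, and then along the y-direction links.

```python
# Hypothetical sketch of XY routing in a mesh NoC. Each switch is identified
# by its (x, y) position in the matrix; the returned path lists the switches
# that the packet traverses, including source and destination.
def xy_route(src, dst):
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                    # move along x-direction links first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                    # then along y-direction links
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

Because every packet for a given source and destination pair takes the same path, the hop count, and hence the link delay contribution, is predictable, which fits the equal-length links of the mesh layout.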
There is another proposed topology in which the NoC infrastructure is placed in a central part of the chip with the cores around it. The idea of this topology is to take a bus-based SoC and replace the bus with a NoC infrastructure. This topology is used in the Aethereal type of NoC [Ver03, Wie02].

2.3 Logic optimization

This section gives an overview of the optimization process during design of digital systems. It focuses in particular on the optimization step referred to as logic optimization.

2.3.1. Overview of optimization during system design

Many different possible implementations exist for the same functionality. Properties like chip area, speed performance and power consumption can differ between different implementations. The way the synthesis steps are implemented has a significant effect on the properties of the final implementation. For some applications the main objective is to make the chip as small and power efficient as possible. For other applications processing speed might be more important. Optimization for a specific objective can be performed in the synthesis steps from one abstraction level to another or at a given abstraction level. To be able to optimize, metrics are needed at the different abstraction levels to gauge which design is better. Such metrics should correlate strongly with the optimization objective. For example, at the logic level, the number of gate inputs and the number of flip-flops can be used to estimate how much chip area will be needed in the final layout. The gate depth can be used to estimate the maximum possible clock frequency. At the behavior level, the time complexity of the algorithms used for implementing the functionality is a metric with good correlation to the speed of the final implementation. Many of the optimization problems faced during refinement of a design from system specification to layout are NP-hard [Dem94].
The following paragraphs give a brief overview of some of the optimization problems and associated synthesis steps. The system level of abstraction describes what the system should do without any description of how. Synthesis from the system level to the behavior level is usually done manually. This synthesis step includes decisions about which algorithms should be used for the different subfunctions of the system. Good algorithm selection is very important to the performance of the final product. During synthesis from the behavior level of abstraction to the RT-level, a number of design decisions must be made. For example, it might be determined that several operations at the behavior level can use the same functional unit at the RT-level. It must also be decided at this synthesis step whether pipelining should be used in the datapath or not. At the RT-level, parts of the functionality are usually described as one or several FSMs with a datapath. During synthesis from the RT-level to the logic level, the number of states in the FSM describing the controller is minimized. These states are encoded and Boolean expressions for the combinational part of the state machine are generated. The chosen encoding has a large impact on the number of gates needed. At the logic level of abstraction the system is described with flip-flops and combinational logic. The combinational logic can either be described as a network of gates or as Boolean expressions. A Boolean expression has a direct mapping to a network of gates. During synthesis from the logic level to the layout level, the gates, flip-flops and interconnects are materialized as a layout. Layouts for specific types of gates and flip-flops are generally taken from a library. An optimization challenge at this step is to place the gates and flip-flops and route the interconnects.

2.3.2.
Logic optimization

The optimization that is performed during synthesis from the RT-level to the logic level, as well as optimization on the logic level description of a system, is referred to as logic level optimization or simply as logic optimization. Logic optimization consists of state minimization of FSMs, encoding of the states in FSMs and optimization of combinational logic. For fully specified FSMs there is an exact algorithm for state minimization with polynomial time complexity. On the other hand, the minimization problem is NP-hard for incompletely specified FSMs, in which outputs and/or state transitions are don't-cares for some combinations of inputs [Dem94]. The states in an FSM need to be encoded with a set of flip-flops. The number of flip-flops needed is at least ⌈log2 N⌉, where N is the number of states. In some cases, using more than the minimum number of flip-flops can reduce the combinational parts so much that the extra flip-flops are worthwhile. For example, one-hot encoding uses one flip-flop for each state such that exactly one flip-flop takes logic value 1 at a time. When optimizing the combinational parts of a design, the optimization procedure can choose between different strategies and tradeoffs. Combinational logic optimization has two main types of optimization strategies: two-level logic optimization and multi-level logic optimization. In two-level optimization the logic synthesis generates a logic circuit with a gate depth of two, not counting inverters on the inputs. Gate depth is the maximum number of gates a signal must traverse between an input of the combinational part and an output of the combinational part. In multi-level optimization the gate depth in the logic circuit can be anything. Logic synthesis for two-level logic circuits is especially useful when PLA-structures are used for implementation. Multi-level optimization strategies are preferable when the target implementation is an FPGA or a full custom chip.
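The flip-flop counts for the two state encodings mentioned above can be sketched as follows (a small Python illustration; the function names are our own):

```python
import math

# Minimal binary encoding needs ceil(log2(N)) flip-flops for N states.
def flip_flops_binary(n_states):
    return math.ceil(math.log2(n_states))

# One-hot encoding uses one flip-flop per state.
def flip_flops_one_hot(n_states):
    return n_states

# One-hot state codes: exactly one flip-flop is 1 at a time.
def one_hot_codes(n_states):
    return [1 << i for i in range(n_states)]
```

For example, a 5-state FSM needs 3 flip-flops with minimal binary encoding but 5 with one-hot encoding; the one-hot variant may still win overall if its next-state logic becomes simpler.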
Two-level optimization and multi-level optimization are described in Subsections 2.3.3 and 2.3.4. In the area of optimization this thesis includes contributions in logic optimization of combinational logic. Part C contains the contributions of this thesis in logic optimization, and it also describes the logic optimization step referred to as decomposition.

2.3.3. Two-level optimization

Two-level description

Two-level optimization is optimization of logic into logic circuits with a gate depth of two. Inputs and complements of inputs to the logic function are connected to AND-gates. Outputs of the AND-gates are connected to OR-gates. There is one OR-gate for each output of the logic function. The AND-gates are treated as the first level of logic and the OR-gates as the second level of logic. In fact, if the complements of the inputs are not available, one more level of logic is required to invert the input signals. However, such inverters on the inputs are not counted as an additional level of gates, so optimization for this kind of structure is called two-level optimization. There is a direct mapping between a two-level logic circuit and a sum-of-products (SOP) form representation of Boolean functions. An example of an expression in SOP form is f(x1, x2, x3, x4) = x1·x2·x3·x4 + x1·x2 + x3·x4. The terms in such an expression are called product terms. Each product term corresponds to an AND-gate and the sum in the expression corresponds to the OR-gate. An alternative to the SOP form is the product-of-sums (POS) form. In practice, a system normally contains combinational parts with more than one output. The number of gates can usually be reduced if some product terms are shared by more than one output. Figure 2.3 shows an example of a two-level implementation of the two Boolean functions f1 = x1·x2·x3·x4 + x1·x2 + x3·x4 and f2 = x1·x2 + x3·x4.
Both these functions include the product term x3·x4 and they share the AND-gate generating it.

[Figure 2.3: Two-level implementation of the two output functions f1 and f2 with inputs x1 to x4, sharing one AND-gate.]

Cube representation and Karnaugh maps

One way to model Boolean functions is to use cube representation. In this representation a Boolean space with dimension n is used, where n is the number of variables of the function. Two discrete values, 0 and 1, are used as coordinates for each dimension. Therefore there exist 2^n discrete points in this entire space. A point in this space is called a minterm and it represents an assignment of the variables of a Boolean function. If the function is fully specified, then for each specific minterm the function value is either logic 0 or logic 1. Figure 2.4 shows an example of a cube representation for a function with three inputs. In that figure, a filled minterm represents function value 1 while a non-filled minterm represents function value 0. The Boolean function shown in Figure 2.4 can then be written as the sum of the product terms corresponding to its filled minterms. A subspace of a Boolean space is the set of minterms for which a subset of the inputs is fixed to specific values. This type of subspace is called a cube. The two dotted ovals in Figure 2.4 are examples of cubes. In this example the dimension of the smaller one is one and the dimension of the larger one is two. A cube in which the function value is 1 for all minterms is called an implicant. Thus, the two dotted ovals in Figure 2.4 are implicants. A set of implicants that contains all minterms where the function value is one is called a cover of that function. A cover has a direct mapping to the two-level logic circuit because each implicant corresponds to a product term and then to an AND-gate. For each dimension in the cube representation space where the implicant is fixed to 1 or to 0, an input to the AND-gate is required.
Hence the number of inputs to the corresponding AND-gate is smaller for a larger implicant. More precisely, the number of inputs needed to the AND-gate is equal to the difference in dimension between the implicant and the Boolean space of the entire function. For example, in Figure 2.4 the large cube corresponds to an AND-gate with only one input (a one-input AND-gate reduces to a wire or to a buffer) fed by input x1. This cube represents all minterms where x1 = 1. The smaller cube in Figure 2.4 corresponds to a two-input AND-gate fed by x2 and x3.

[Figure 2.4: A three-dimensional Boolean space with axes x1, x2 and x3.]

In some cases an implicant can be expanded to cover more minterms. Releasing an input that is fixed is one way of doing so. For example, the implicant in Figure 2.5a can be expanded so it becomes like the implicant in Figure 2.5b. The implicant in Figure 2.5a corresponds to a two-input AND-gate with inputs x1 and x2 while the implicant in Figure 2.5b corresponds to a one-input AND-gate with input x1. This implicant cannot be expanded further, because if it were larger it would cover minterms for which the function value is zero. An implicant that cannot be expanded further is called a prime implicant. One way to treat the cube representation is using Karnaugh maps [Kar53]. A Karnaugh map is a Boolean space projected onto a two-dimensional surface.

[Figure 2.5: Example of expansion of an implicant, shown in two three-dimensional Boolean spaces (a) and (b).]

Algorithms for two-level optimization

In the 1950s Quine [Qui52] and McCluskey [Mcc56] developed an exact algorithm for two-level optimization. Quine proved a fundamental theorem stating that there exists a minimal cover consisting only of prime implicants. This result reduces the search space for optimization algorithms to prime implicants. McCluskey proposed a method using the set of prime implicants of a function to find its minimal cover.
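The prime implicant generation step of this procedure can be sketched compactly in Python (a simplified illustration without the subsequent covering step and without don't-cares; the encoding and helper names are our own). An implicant is a (value, mask) pair: bits set in the mask are free variables (dropped literals), and the remaining bits of the value give the fixed literals.

```python
from itertools import combinations

# Quine-McCluskey style prime implicant generation. Two implicants with the
# same free-variable mask that differ in exactly one fixed bit are merged
# into a larger implicant with that bit freed; anything that cannot be
# merged is prime.
def prime_implicants(minterms, ):
    current = {(m, 0) for m in minterms}     # start from minterms, no free bits
    primes = set()
    while current:
        merged = set()
        combined = set()
        for a, b in combinations(sorted(current), 2):
            if a[1] == b[1]:                 # same set of free variables
                diff = a[0] ^ b[0]
                if diff and diff & (diff - 1) == 0:   # differ in exactly one bit
                    combined.add((a[0] & ~diff, a[1] | diff))
                    merged.update((a, b))
        primes |= current - merged           # uncombinable implicants are prime
        current = combined
    return primes

# Render an implicant as a product term; x1 is the most significant bit.
def to_term(value, mask, n_vars):
    lits = []
    for i in range(n_vars):
        bit = 1 << (n_vars - 1 - i)
        if not mask & bit:
            lits.append(f"x{i+1}" if value & bit else f"~x{i+1}")
    return "*".join(lits) or "1"
```

For example, the minterms {1, 3} of a two-variable function merge into the single prime implicant x2, mirroring the implicant expansion of Figure 2.5.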
Due to the NP-hard nature of the problem, the exact algorithms are intractable for most large functions. Thus, heuristic methods are used in practice. A popular heuristic method is the minimizer Espresso [Bra84].

2.3.4. Multi-level optimization

Multi-level optimization of logic circuits targets implementations in which the gate depth is not restricted to two. Removing that restriction makes it possible to provide better options than two-level optimization for goals like minimization of area and minimization of power consumption. The consequence of this flexibility is that the optimization becomes more complicated. A larger gate depth, however, results in more delay than a smaller gate depth. An example of a multi-level logic circuit is shown in Figure 2.6.

[Figure 2.6: A multi-level logic circuit with inputs x1 to x4 and output f.]

Optimization programs commonly apply a set of transformation operations on a logic network targeting the optimization goals. The logic network can be represented as a network of gates and it can be expressed with Boolean equations. It can also be represented with a combination of Boolean equations and a network. The network is, in this case, a directed acyclic graph with edges representing signals and nodes containing Boolean expressions. De Micheli [Dem94] describes how logic optimization can be applied in this type of representation, in which each node has a Boolean equation expressed in SOP form. Decomposition is one optimization operation which is particularly important when the optimization target is minimization of area or minimization of power consumption. A decomposition operation on a logic network splits a node into multiple nodes in a way that makes further optimization operations efficient when applied separately on the different parts. To be useful, the result of a decomposition operation normally requires that the number of signals between the parts is small.
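The directed-acyclic-graph representation described above can be sketched as a small Python data structure (a toy encoding of our own, with made-up node names): each node holds an SOP expression over primary inputs and the outputs of other nodes.

```python
# Toy encoding of a multi-level logic network as a DAG of nodes with SOP
# expressions. An expression is a list of product terms; a term is a list
# of literals such as "x1" or "~x3". The node names "n1" and "f" are made up.
network = {
    "n1": [["x1", "x2"], ["~x3"]],   # n1 = x1*x2 + x3'
    "f":  [["n1", "x4"]],            # f  = n1*x4  (edge n1 -> f in the DAG)
}

# A simple cost metric on this representation: the total number of literal
# occurrences over all nodes.
def literal_count(net):
    return sum(len(term) for expr in net.values() for term in expr)
```

Transformation operations such as decomposition or node merging can then be evaluated by whether they reduce such a cost metric.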
A logic network might not already be partitioned into different parts when logic optimization starts. A decomposition operation can then be applied on the entire network as a first step. Examples of other types of transformation operations include those that merge nodes and those that minimize the number of product terms in the Boolean expressions inside nodes. Special searches can be conducted on the logic network to determine how common subexpressions can be extracted and how available signals can be utilized to transform the logic network in line with the optimization target.

2.3.5. Cost metrics for logic optimization

Common cost criteria used during optimization are the size of the layout, speed performance and power consumption. The logic circuit, which is the outcome of the logic synthesis, does not have an exact connection to the number of components or the chip area, so the cost has to be estimated in some way. The following describes how cost is usually estimated for two-level circuits and for multi-level circuits.

Two-level logic circuits

In two-level optimization, the number of implicants is normally used as the cost criterion. An implicant that can be shared by several outputs is only counted once. As mentioned in Section 2.3.2, two-level optimization is particularly suitable for PLA implementation. The method for estimating cost described above has a direct mapping to the required size of a PLA for the implementation. A common PLA-structure has a set of outputs and a set of inputs, where any Boolean function can be implemented as long as the total number of implicants is less than a specified value.

Multi-level logic circuits

When the logic network is represented as a network of gates, the total number of gate inputs is a good measure of the chip area for the final implementation.
The number of literals is a useful measure of the expected chip area when the logic network is represented as a directed acyclic graph in which the nodes have Boolean expressions in SOP form. The number of literals of a node is the number of occurrences of input variables in its Boolean expression. The number of literals of a logic network is the sum of the numbers of literals of all nodes in the design. In Section 2.3.2 gate depth was defined as the maximum number of gates a signal must traverse between an input and an output of the combinational part of a logic circuit. The gate depth is then a direct estimate of the maximal delay.

2.4 Defects and digital system testing

We use the term testing to refer to detection of manufacturing defects. We use the term validation to refer to methods for detection of logic design errors. This thesis only deals with methods and considerations related to testing for manufacturing defects. IC fabrication is not perfect and different types of defects can be introduced in this process. A defect in an electronic system is a physical deviation from the specification, which may possibly give different functionality than intended. Material defects, mask defects and dust particles are examples of things that can cause defects on manufactured chips. Manufactured chips need to be tested [Lar08] in order to find the chips with defects. Complex chips cannot be exhaustively tested to check whether they work in all cases. For example, one subcomponent in a chip could be a 32-bit multiplier. To exhaustively test it for correct functioning, all combinations of two 32-bit multiplicands need to be applied and the result needs to be checked for correctness. This requires 2^32 · 2^32 = 2^64 ≈ 10^19 different tests, which cannot be performed in a reasonable time.
Since a chip cannot be checked for correct behavior in all possible situations, another approach is needed to create a test for it. The approach used is instead to check for the presence of each possible or relevant defect. In an integrated circuit it is either not possible or very difficult and expensive to use a probe to measure for the presence of a defect directly at the spot. Instead, input signals are applied to the chip such that at least one of the outputs gets a different value if the defect is present than if the chip is free from defects. Common defects in faulty chips are short circuits between conductors and breaks in the conductors. More complex physical defects might result in the creation of unwanted extra components, for example an extra transistor. Short circuits, breaks and extra components are defects that can be considered as distinct, which means that either the defect is there or it is not. In another class of defects, some performance metrics of components are outside acceptable ranges. An example of such a defect is a wire that has become too thin, resulting in a resistance that is too high but still low enough that the effect is different from a break. Another example of such a defect is two wires that have come very close to each other, resulting in too much parasitic capacitance between them.

2.4.1. Faults and fault models

A fault is a description, at a certain abstraction level, of the effect of a defect. There are basically two ways to define faults. The first is to analyze possible defects in the implementation. Each relevant physical defect is analyzed to determine how its presence appears at a certain abstraction level. In this case, the physical implementation has to be known before faults can be defined. The other method is to use fault models.
A fault model is a conceptual representation of implementation defects in a description at an abstraction level above the physical implementation. A fault model denotes one particular way to define faults by only considering the design at the abstraction level for which the faults are going to be defined. A fault model does not rely on a specific implementation of the system. Faults created from a fault model are usually less complex than those derived from the physical implementation. They are therefore simpler to handle. On the other hand, faults defined from a fault model have a looser mapping to the physical defects than faults derived from possible defects in the implementation. They are therefore less accurate. One of the most well-known fault models is the stuck-at fault model [Eld59] at the logic level. A stuck-at fault in a node in the logic circuit means that this node is always 0 or always 1. We say that the node is stuck-at-0 or stuck-at-1 respectively. A node that is stuck-at-1 has constant logic value 1, independent of what the gate feeding that node tries to set it to. Stuck-at-0 faults behave correspondingly. The logic level stuck-at fault model is not sufficient for capturing all possible physical defects. The logic level bridging fault models wired-AND and wired-OR cover some of the defects not covered by stuck-at faults. To develop tests at a certain abstraction level before the design is synthesized to the next level of abstraction, faults defined using fault models are needed. The main difficulty of developing test methods at high abstraction levels is finding fault models and defining faults that adequately represent defects in the final implementation. The higher the abstraction level, the more difficult this is. However, there are several advantages if tests can be developed at a high abstraction level. One advantage is that it can facilitate identification of testability problems early in the design process.
Another advantage is that test logic can be included earlier, which means that optimization strategies can also include the test logic and target the overall optimum, rather than only the main design without its test logic. A third advantage of working at a higher level of abstraction is that test generation can be more efficient [Jer02]. The logic level is a relatively low abstraction level, and a test that detects all stuck-at faults also detects most physical defects, but not necessarily all relevant ones. With the help of the example in Figure 2.7, showing an implementation of a NAND gate, we compare a logic level stuck-at fault with a fault derived from the physical implementation. We demonstrate how the stuck-at fault is less accurate but also less complex than the fault derived from the physical implementation. Consider the defect in Figure 2.7, which is a break in a wire. This break causes the output Q to have high impedance for input pattern 01. In CMOS circuits, high impedance in a node usually means that the logic value of the node remains the same for some time due to capacitance. The logic level stuck-at fault that best maps to this defect is the fault where node A is stuck-at-1. A test generated to detect whether node A is stuck-at-1 applies 01 to the inputs. The output Q is then 0 if this stuck-at fault is present and 1 if the gate works correctly. However, the defect, which is the break, is only detected by a test for this stuck-at fault if it is preceded by an input that sets output Q to logic 0.

[Figure 2.7: NAND-gate with break in a wire. Truth table of the defective gate: A=0, B=0 → Q=1; A=0, B=1 → Q=high impedance; A=1, B=0 → Q=1; A=1, B=1 → Q=0.]

To derive a fault at the logic level from the break, we need to analyze what goes wrong at the logic level due to the defect. As argued above, the effect of the defect in the NAND-gate in Figure 2.7 appears at the logic level as the output Q having an erroneous value when the inputs are 01 preceded by the inputs 11.
This is a more accurate fault than the stuck-at fault, but it is also more complex. The faults described so far in this subsection model distinct defects. There are also fault models for defects where some physical property is outside its acceptable range. Failures that occur due to faults from such fault models are referred to as marginally related failures [Kun05]. An example of such a fault is the delay fault. Several defects result in delay faults, such as the presence of too much parasitic capacitance between a wire and ground. Crosstalk-faults are a type of marginally related fault; background about them is given in Section 3.2.

2.4.2. Fault modeling

Fault modeling is the process of obtaining a fault definition at a certain abstraction level based on a defect or a fault at a lower abstraction level. Fault modeling in several steps can be used to define faults based on physical defects in the implementation of a system. Fault modeling can also be used as a technique to demonstrate and justify the relevance of a certain fault model. In that case, faults are modeled from a hypothetical but typical implementation.

[Figure 2.8: Fault models at different abstraction levels. (a) Transistor-level NOR gate with a break; (b) logic level NOR gate with input IN1 stuck-at-0; (c) RT-level greatest-common-divisor datapath (multiplexers, registers, comparator, ALU and controller) with one bit of an input to the ALU stuck; (d) behavior level description with one bit of a variable stuck-at-0 at the corresponding occasions:
WHILE A ≠ B LOOP
  IF A > B THEN A = A - B
  ELSE B = B - A
  END IF
END LOOP
OUTPUT A]

With the help of the example in Figure 2.8, we show how a defect can be modeled as a stuck-at fault at the logic level and further at the RT-level and the behavior level. Figures 2.8a–b show an example of how a physical defect maps to a logic level fault. The circuit in Figure 2.8a is supposed to implement a NOR gate, but there is a break in the line connecting to the left transistor.
The result is that this transistor cannot sink the output to ground when it is supposed to. At the logic level this can be modeled as input IN1 being stuck-at-0. Figure 2.8b shows the logic level representation with this stuck-at fault. Figure 2.8c shows an RT-level implementation of a circuit that computes the greatest common divisor of two integers A and B. Let us assume that the gate in Figure 2.8b is used in the ALU in Figure 2.8c and that one of the inputs to the ALU goes to input IN1 of that NOR gate. We also assume that this input is not connected to any other gate in the ALU. The stuck-at-0 fault at the logic level then appears as one bit in a signal vector being stuck-at-0. Figure 2.8d shows a part of the behavior level description of the greatest common divisor design. It contains two minus operations. At the RT-level both of them are implemented with the ALU. The fault in the ALU then appears as a bit being stuck-at-0 in the right operand of both minus operators. In this example we have shown how a defect can be modeled as a logic level fault, then as an RT-level fault, and then further as a behavior level fault. Inspired by this fault modeling example, we can justify some fault models. At the RT-level we found that the defect appears as a bit in a signal being stuck-at-0. It is a reasonable assumption that other defects appear at the RT-level as some other bit in a signal being stuck at 0 (or 1). To represent all faults arising from this assumption we can use the fault model signal bit stuck-at fault. A fault defined from this model indicates that a certain bit in a certain signal is stuck at 0 (or 1). With the same reasoning we can assume that a relevant fault model at the behavior level is the variable bit stuck-at fault. A fault created from that fault model means that one bit in a variable is stuck at 0 (or 1).
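The variable bit stuck-at fault model lends itself to simple software fault injection. The following Python sketch (not from the thesis; the function names, the chosen bit position and the iteration cap are illustrative assumptions) injects a stuck-at-0 fault on one bit of the right operand of both subtractions in the GCD description of Figure 2.8d, and shows that one input pair activates and propagates the fault while another does not:

```python
def gcd_ref(a, b):
    """Fault-free behavior-level GCD, as in the pseudocode of Figure 2.8d."""
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
    return a

def gcd_faulty(a, b, bit, max_iters=1000):
    """Same algorithm, with a variable bit stuck-at-0 fault injected on
    the right operand of both subtractions (bit position is illustrative)."""
    mask = ~(1 << bit)            # clears the stuck bit of the operand
    for _ in range(max_iters):
        if a == b:
            return a
        if a > b:
            a -= b & mask
        else:
            b -= a & mask
    return None                   # the fault caused non-termination

# (3, 2) never sets bit 2 of a subtrahend, so the fault is not activated:
# gcd_faulty(3, 2, bit=2) == gcd_ref(3, 2) == 1.
# (12, 8) activates and propagates the fault: the faulty design never
# terminates, while the fault-free design returns 4.
```

Note that here the fault manifests as non-termination, which a real test would detect as a timeout; with other bit positions the fault can instead yield a wrong result (for instance, bit 1 on the inputs (5, 3) gives 3 instead of 1).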
It should be noted that the circuit in Figure 2.8a is not the way a gate is normally implemented; it is used in this example to avoid making the illustration unnecessarily complicated. Many faults that look simple at one abstraction level become quite complex when modeled at a higher abstraction level. Figure 2.7 illustrated how a simple break in a wire results in a relatively complex fault at the logic level. Modeling the fault in that gate further into the RT-level can result in an even more complex fault. For example, assuming that this gate is used in an ALU and the fault is modeled into the RT-level, we can end up with an RT-level fault whose presence causes the output of the ALU to become erroneous in a particular way for some specific operands preceded by operands from a certain set. More discussion about the derivation of faults from defects in the actual implementation is available in Subsection 3.3.2.

2.4.3. Test generation principles

When faults have been defined, the next step is to develop test vectors that detect whether any of the faults is present. To do so, each fault needs to be activated and propagated. An input assignment that activates the fault causes the internal node with the fault being tested to take on different values, depending on whether the fault is present or not. An input assignment that propagates the fault sets some output to a different value if the fault exists than if no fault existed. In this context an assignment of inputs can be a sequence of different values applied to the inputs. To check whether a fault is present, an input assignment must be used that both activates and propagates the fault.

[Figure 2.9: A logic circuit tested for a stuck-at fault. An AND gate with inputs x1 and x2 and a NAND gate with inputs x2 and x3 feed an OR gate with output y; the node under test, marked with an arrow, is the output of the AND gate.]

Figure 2.9 shows a small combinational logic circuit. To test if the node marked with an arrow is stuck-at-0, the output of the AND gate needs to generate logic 1.
Then that node gets logic value 1 if the circuit is correct and logic value 0 if the fault is present. This is the activation of the fault. To be able to detect whether the fault is present, its effect has to propagate to an observable output. In this example the NAND gate needs to output a value to the OR gate such that the output y of the OR gate depends on the node marked with an arrow. This means that the output of the NAND gate should be 0. Hence, inputs x1 and x2 need to be assigned logic 1 to activate the fault, and to propagate the fault to y, inputs x2 and x3 need to be assigned logic 1.

2.4.4. Design for testability

In complex circuits it is expensive in terms of testing time to activate a fault and propagate it to an observable output. To save time, dedicated test logic is integrated in the circuit to facilitate testing. Design for Testability (DfT) is a design technique that takes testability into account in the design process, including additional testing features and test logic [Abr90, Jha03, Mou00]. A commonly used DfT technique is the scan path. With this technique, flip-flops are equipped with some extra logic such that values can be scanned in and out through the flip-flops as a shift register. Another DfT technique is Built-In Self-Test (BIST). This basically means that special extra circuitry is added to the chip that helps it test itself. From outside the chip, all that is required is to send a signal that puts the chip into test mode. The chip then returns a signature that can be checked to determine whether the test found any faults. A common implementation of BIST uses a linear feedback shift register to generate pseudo-random input signals for the logic being tested. The output of the logic being tested is connected to a multiple-input linear feedback shift register that generates a signature.

2.4.5.
Typical test generation flow and design flow

Figure 2.10 shows the design flow in Figure 2.1 complemented with a typical flow for test generation. For clarity, the software development box shown in Figure 2.1 is omitted. Typically, most test generation is done at the logic level of abstraction. The main reason for not utilizing the higher abstraction levels is the difficulty of defining faults at these levels that accurately cover the relevant defects. The box Test data in the lower right corner of Figure 2.10 represents the final test data. This test data describes which voltage levels to apply to the inputs of the circuit during test, along with timing values for when to apply the signals. It also describes which voltage levels to expect at the outputs of a circuit without faults. This test data is transformed from logic level test data, in which input and output values are expressed as sequences of Boolean values. Logic level test data are usually referred to as test vectors. The box Test data transformation represents the transformation of test data from Boolean values to voltage levels. This transformation can be made in the test equipment; in such a case logic level test vectors are sent to the test equipment.

[Figure 2.10: Typical test generation and design flow. The design flow from system specification through system synthesis, behavior synthesis, logic synthesis and technology mapping, and layout generation is complemented with test generation at the logic level, producing logic level test data together with BIST & DfT logic. Libraries of algorithms, soft IP-cores at RT-level and logic level, and hard IP-cores contribute test vectors and BIST & DfT logic; test data transformation turns logic level test data into the final test data, and a dotted path indicates optional test generation from layout.]

In some cases additional test data may be generated from layout to get sufficient coverage of defects.
This is illustrated by the dotted box and the dotted arrows in Figure 2.10. Most test generation is done at the logic level. This is illustrated by the arrow from the box Logic design to the box Logic level test data via the box Test generation. This test generation relies on logic level fault models, independent of specific manufacturing defects. During this generation, test logic is also generated. This is represented by the arrow going to the box BIST & DfT, which is attached to the box Logic design. For hard IP-cores the supplier needs to implement BIST and DfT logic for the core and deliver test data. It is hard for the user to develop tests because hard IP-cores are provided at the layout level. Figure 2.10 illustrates how test data are provided along with hard IP-cores and how they are included with other test data. In practice, the test data for a hard IP-core can be given as logic level test vectors along with information about which voltage levels should represent logic 1 and logic 0, respectively. Soft IP-cores at the logic level can be delivered with test vectors for faults derived from a logic level fault model; this is usually sufficient. Figure 2.10 also illustrates how such test vectors are included in the logic level test data.

2.4.6. Test generation flow and design flow with test data generation at high abstraction levels

Test data generated at a certain abstraction level will be in a form that corresponds to that abstraction level. RT-level test data can, for example, include instructions for state transitions in an FSM. During synthesis of a system for which test data has been generated, the test data have to be transformed to comply with the synthesis of the system. Below follows a description of how the processes of test data transformation and test data generation link to the design flow.

RT-level

Figure 2.11 shows the design and test flow when test data is generated from faults derived from an RT-level fault model.
The box RT-level test data represents the test data generated from the RT-level design. The test generation can also generate DfT and BIST logic. This is shown with the arrow from the box Test generation to the box BIST & DfT. This test logic will be part of the system and is therefore synthesized further in the synthesis steps that follow the RT-level. The generated RT-level test data needs to be transformed such that it can be used in the test equipment. The right part of Figure 2.11 shows this transformation. The first step is to transform the RT-level test data into logic level test data. This must be done with consideration for how the logic synthesis and technology mapping are made. For example, the RT-level test data might include an instruction to make a state transition in an FSM. At the logic level the FSM is implemented with gates and flip-flops. The transformation of the test data then transforms this RT-level instruction into the logic values that should be applied to the inputs to achieve this state transition. Further transformation from the logic level can be made as described in Section 2.4.5.

[Figure 2.11: Design and test flow with test generation at RT-level. Test generation from the RT-level design produces RT-level test data and BIST & DfT logic; the test data is transformed to logic level test data after logic synthesis and technology mapping, and to the final test data after layout generation.]

Behavior level

The design and test flow when test data is generated at the behavior level of abstraction is shown in Figure 2.12. Behavior level test data is generated from the design at the behavior level. During this test generation, test logic can also be generated. This is shown with the box BIST & DfT. This logic will be part of the design at the behavior level and synthesized further together with the design.
The behavior level test data needs to be transformed to a form that can be used when the test is executed. The right part of Figure 2.12 shows the transformation steps. The first step is to transform the behavior level test data to RT-level test data. For example, part of the test data at the behavior level could be operands for an operator, along with the expected output value of the operator. At the RT-level a clock signal is introduced, and the operator could be implemented such that several clock cycles are needed to complete its execution. The transformation of the test data from the behavior level to the RT-level then needs to ensure that the operands are applied and the results are read at the correct clock cycles.

[Figure 2.12: Design and test flow with test generation at behavior level. Test generation from the behavior level design produces behavior level test data and BIST & DfT logic; the test data is transformed stepwise to RT-level test data, logic level test data, and finally the test data used by the test equipment.]

System level

Figure 2.13 shows the design and test flow when a test is generated at the system level. The box System level test data represents the test data made from the system specification. During test generation, test logic for BIST and DfT can be generated.
[Figure 2.13: Design and test flow with test generation at system level. Test generation from the system specification produces system level test data and BIST & DfT logic; the test data is transformed stepwise through behavior level, RT-level and logic level test data into the final test data.]

As is the case when a test is generated at the RT-level or the behavior level, the system level test data needs to be transformed to a form that can be used when the test is executed. The first step in this process, the transformation from system level to behavior level, is usually quite modest, because the main function of system synthesis is to choose algorithms. For example, test data defined at the system level for testing a sorting function will not change form merely by choosing which sorting algorithm to use. However, the way numbers are encoded can be decided during system synthesis, and in such a case the test data needs to be transformed accordingly.

Test generation at several abstraction levels

Figure 2.14 shows what a design and test flow may look like when different abstraction levels are used for generating different parts of the test data. At the system level some test generation is performed and possibly some test logic is also generated. The design is then synthesized into the behavior level and the test data is transformed accordingly. At the behavior level, more test data is generated and more test logic might be generated. The newly generated test data is merged with the test data that was transformed from the system level. The system is then synthesized into the RT-level and the test data is transformed to RT-level test data. At the RT-level more test data can be generated, as well as more test logic.
The new test data is then merged with the test data transformed from the behavior level. In the next step, the system is synthesized into the logic level of abstraction and the test data is transformed accordingly. More test data is also generated at the logic level, including generation of more test logic. The newly generated test data is then merged with the test data that was transformed from the RT-level. The logic level design and the test data, which are now both at the logic level, can be further processed as if all test generation had been done at the logic level. It is not always efficient to generate test data at all the abstraction levels shown in Figure 2.14. It can, for example, be more efficient to use the RT-level to generate some test data and then complement it with more test data generated at the logic level. Figure 2.14 also shows how the inclusion of IP-cores and their test data can be done. Subsection 2.4.5 described how test data provided along with hard IP-cores and with logic level soft IP-cores can be included in the test data generation. For soft IP-cores at the RT-level, the test data provided with the IP-core is included in a similar way: it is incorporated together with the other RT-level test data. At the behavior level, algorithms from a library can be included. The flow for including an algorithm at the behavior level is similar to the flow for including an IP-core at the lower abstraction levels. Just as test data can be provided along with IP-cores, test data can also be provided along with algorithms. An algorithm is a behavior level description, and therefore test data provided along with an algorithm must be generated from faults derived from a behavior level fault model. For example, the test data provided along with a sorting algorithm can be a set of lists that the algorithm should sort.
Those lists are generated such that sorting them covers a set of behavior level faults. That set of faults is generated from one or several behavior level fault models.

[Figure 2.14: Test generation at several abstraction levels. Test generation is performed at the system, behavior, RT and logic levels; at each level the newly generated test data is merged with the test data transformed from the level above, and libraries of algorithms, soft IP-cores at RT-level and logic level, and hard IP-cores contribute test vectors and BIST & DfT logic.]

Part B
Chip testing

Chapter 3
Background and related work in SoC testing

Part A of this thesis gave a general introduction and background to digital system design and testing. This chapter provides a more focused background and surveys related work in the area of testing. In Section 3.1 the main issues in SoC testing are presented, along with related work in SoC testing and NoC testing. Section 3.2 provides a deeper background on testing of on-chip interconnects, with a special focus on crosstalk-faults. Test principles that utilize abstraction levels above the logic level are described in more depth in Section 3.3, along with related work.

3.1 SoC testing and NoC testing

This section describes SoC testing with a focus on SoCs with a NoC infrastructure. It also presents related work in NoC testing.

3.1.1. Issues in SoC testing

Testing of SoC devices can be partitioned into testing of cores and testing of the interconnection infrastructure through which the cores communicate.
Testing of cores

There are two main issues to consider when developing a test for a core within a SoC. The first is the generation of test vectors and test logic for the core itself. The second is the transportation of test data to and from the core. The generation of test vectors for a core can, in principle, be done as if the core were a stand-alone chip. When the core is a stand-alone chip it can be accessed directly from outside, but if a core is part of a SoC, consideration must be given to the capacity of the mechanism that transports test data to and from the core. To reduce the amount of test data being transported, it is better to use more BIST for a core in a SoC than would be appropriate if the same core were used as a stand-alone chip. The reason is that transporting more data takes longer, resulting in a long test time. An alternative is to embed an on-chip transport mechanism for test data with larger capacity, but this costs chip area.

[Figure 3.1: Scan cells surrounding a core. Boxes marked S, organized as a shift register, sit on every signal connection to the core.]

Test access to a core can be performed with the help of some extra circuitry commonly referred to as a wrapper. A typical wrapper consists of a set of scan cells, one or several for each signal pin. Every signal connection to the core goes through a scan cell. The scan cells make it possible to disconnect the core from its environment and then directly apply values to its inputs and read values from its outputs. The scan cells are organized as a shift register, as illustrated in Figure 3.1, where the boxes marked with the letter S are the scan cells. The signals to apply to the inputs of the core during test are provided from the test equipment with the help of the shift register. The shift register is also used to transport output values back to the test equipment. There is a set of control signals, not shown in the figure, connected to all scan cells.
These control signals control the functionality of the scan cells. Wrappers can be designed in a variety of ways; several variants are described in chapter 16 of Jha's and Gupta's book [Jha03]. The IEEE standard 1500 [IEEE05] standardizes test wrapper design for SoC testing. That standard is principally a revision and adaptation of the IEEE boundary scan standard 1149.1 [IEEE01], which is a standard for testing systems on a Printed Circuit Board (PCB) where each component is a separate chip. In a SoC device, some previously designed SoCs can be included as IP-cores. Such previously designed SoCs can have a test access mechanism of their own, and special consideration is needed to make testing of them efficient. For example, the issue of test planning for such SoCs is addressed in [Cha05].

Test of interconnection infrastructure

To detect breaks and shorts in interconnection wires, test wrappers as described above can be used. The wrapper at one core can apply values on the interconnecting wires, and the wrapper at another core can then read the values of the wires. The shrinking dimensions of SoCs make it insufficient, however, to test only for breaks and shorts on the relatively long interconnections connecting cores. Detection of a wider range of faults is necessary, including those causing crosstalk and delay outside the acceptable range [Pan05].

3.1.2. Special issues in testing NoC-based SoCs

Testing of a NoC device can be partitioned into testing of cores and testing of the communication infrastructure. Generation of test schemes for the cores can be divided into two main tasks. One is the test generation for the core itself. The other is test access, which is basically the transportation of test data to and from the core. The NoC communication infrastructure itself can be used as a mechanism for test access [Cot03b].
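The shift-in, apply, capture and shift-out cycle of the scan-cell wrapper described above can be mimicked in software. The following Python sketch (an illustration only; the class and method names are hypothetical and the model omits the control signals and timing of a real wrapper) models the scan chain of Figure 3.1 around a small combinational core:

```python
class ScanWrapper:
    """Toy model of a wrapper: one scan cell per core pin, as a shift
    register. Cells [0 .. n_inputs-1] drive the core inputs; the
    remaining cells latch the core outputs."""

    def __init__(self, n_inputs, n_outputs, core_func):
        self.n_in = n_inputs
        self.core = core_func                  # combinational core under test
        self.chain = [0] * (n_inputs + n_outputs)

    def shift(self, bits_in):
        """Shift bits into the chain; each new bit enters cell 0 and
        pushes the others toward the end. Returns the bits that fall
        out, which is how captured responses are read back."""
        out = []
        for b in bits_in:
            out.append(self.chain[-1])
            self.chain = [b] + self.chain[:-1]
        return out

    def capture(self):
        """Apply the shifted-in values to the core inputs and latch the
        core outputs into the output scan cells."""
        self.chain[self.n_in:] = self.core(self.chain[:self.n_in])

# Example: a core that is just a 2-input AND gate with one output.
# Because bits enter at cell 0, the input pattern is shifted in in
# reverse, preceded by filler for the output cell.
w = ScanWrapper(2, 1, lambda ins: [ins[0] & ins[1]])
w.shift([0, 1, 1])        # loads inputs (1, 1)
w.capture()               # AND(1, 1) = 1 latched into the output cell
response = w.shift([0, 0, 0])[0]   # first bit shifted out is the response
```

The same loop, driven from the test equipment through the wrapper's control signals, is how test vectors reach an embedded core that has no direct external pins.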
Testing of the interconnection infrastructure in a NoC includes testing of the switches and testing of the interconnection links connecting different switches. The switches in a NoC can be treated like any other core in the SoC device, but exploiting the fact that there are many similar switches can give advantages in terms of test efficiency.

In large, synchronous SoCs, distributing the clock to the various parts of the chip is associated with several drawbacks. One drawback is that such distribution consumes a large amount of power. Another is that keeping the clock skew within an acceptable range has become harder as chips have increased in speed and in silicon area. One way to avoid the problems associated with clock distribution is to use the Globally Asynchronous Locally Synchronous (GALS) concept [Hem99]. A chip designed on this principle is partitioned into several regions, each with its own clock signal. The GALS concept is widely adopted in NoC designs [Nak11]. Testing of SoCs with a GALS clocking concept was addressed in [Eft05, Tra08]; Tran et al. [Tra08] specifically focus on a SoC with a NoC communication infrastructure. Unlike the work presented in this thesis, none of those articles presents test methods for crosstalk-induced faults. Testing for crosstalk-faults is trickier in communication links between different clock regions than in synchronous systems. The phase difference between the clock signals of two connected domains may vary non-deterministically, and crosstalk-faults might cause errors in data communication only for some adverse phase differences. The effect of this non-determinism is that errors due to such faults can occur intermittently.

3.1.3. Related work addressing NoC testing

This subsection addresses related work in NoC testing, including work on test access mechanisms and test scheduling as well as some work on test data compression. The goal is to minimize test costs and to maximize the coverage of faults.
Minimizing test costs is mainly a matter of minimizing test time and minimizing the chip area for test logic.

Test scheduling and test access mechanism

Cote et al. propose how an existing NoC infrastructure can be used efficiently for test access [Cot03b]. They also describe how packets can be statically scheduled offline to optimize the utilization of the NoC infrastructure as a test access mechanism. The input to the algorithm presented in their article includes a set of test ports to the network, which can send test patterns and receive test responses. The cost of each core to be tested is considered in terms of the amount of data transfer. The method works as follows. First, the core that is most expensive in terms of test cost is selected, where the cost is defined to reflect the amount of test data transferred to and from the core during test. The test port with the shortest path to that core in the NoC infrastructure is then selected for use when that core is tested. The cores are scheduled in decreasing order of cost, using as much parallelism as possible. Power-related considerations for this type of test scheduling were presented in [Cot03a]. Amory et al. [Amo04] present how the method in [Cot03b] can be extended to use internal processors in a NoC device as test ports. In [Lar04] optimization of the test access mechanism in core-based designs is addressed. There, a dedicated test access mechanism is the target, and the presented method optimizes the test time and the size of the test access mechanism simultaneously. The input to the method is the set of test data to be sent to and from the cores, along with the locations of the sources and sinks. In this context, sources and sinks are locations in the test access mechanism: test data is applied at the sources, and the sinks collect the results of the tests.
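The greedy ordering described for [Cot03b] can be sketched in a few lines. The Python below is a simplified illustration, not the published algorithm: the core names, hop counts and the assumption that test time equals test-data cost are all hypothetical, and network link contention (which the real packet scheduling accounts for) is ignored.

```python
from dataclasses import dataclass

@dataclass
class Core:
    name: str
    cost: int          # amount of test data to/from the core (used as time)
    dist: dict         # hypothetical hop count from each test port

def schedule(cores, ports):
    """Greedy schedule in the spirit of [Cot03b]: pick cores in
    decreasing order of test cost, give each one the port with the
    shortest path (ties broken by earliest availability), and run
    tests on different ports in parallel."""
    free_at = {p: 0 for p in ports}          # when each port becomes idle
    plan = []
    for core in sorted(cores, key=lambda c: c.cost, reverse=True):
        port = min(ports, key=lambda p: (core.dist[p], free_at[p]))
        start = free_at[port]
        free_at[port] = start + core.cost
        plan.append((core.name, port, start, free_at[port]))
    return plan

cores = [Core("cpu", 100, {"P0": 1, "P1": 3}),
         Core("dsp", 60, {"P0": 2, "P1": 1}),
         Core("mem", 40, {"P0": 1, "P1": 2})]
plan = schedule(cores, ["P0", "P1"])
```

Scheduling the most expensive core first lets the cheaper cores fill in the remaining port capacity, which is why the greedy order tends to shorten the overall test makespan.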
A network can be designed such that broadcasting the same data to several cores requires less network traffic than sending it to each core separately. For such a network it might be more efficient to send the same test data to all cores rather than sending specific test data to each core. With this approach, each core will need more test data than if the test data were designed specifically for it, but broadcasting the larger amount of data might still be less costly than sending smaller amounts to each core separately. Ubar et al. [Uba04] present a method to optimize the trade-off between broadcasting some test vectors and sending test vectors to each core separately. It has, however, the drawback that the cores need to be purely combinational. It is thus not directly applicable for testing the cores in a NoC device, but the claim is that the method can easily be extended to sequential circuits; if so, the method could be utilized for testing cores in a NoC device. In most NoC architecture proposals, all switches are identical, and there have been proposals for exploiting this fact in testing. In [Hos06, Hos07] test stimuli are broadcast to several or all switches; the outputs of the switches are compared, and any difference indicates a fault. Another way to utilize the topological regularity of a NoC infrastructure was presented in [Rai06], where data is sent in certain standard ways to detect defects. The method presented in [Gre07] tests the switches and the wires in a NoC architecture. It utilizes broadcasting of test data in an efficient manner, and it is demonstrated how test data can be transported using only switches and interconnections that have already been tested; as soon as another part of the network has been tested, it too can be used for test data transportation. Stewart and Tragoudas [Ste06] presented fault models based on the functionality of NoC-switches.
Faults based on their fault models are defined such that a data transmission of a certain type at a port of a NoC-switch results in errors. Different types of transmissions in this context have different quality of service policies and/or differ in the number of switches intended to receive the packets. Stewart and Tragoudas [Ste06] also presented a test method for covering the faults they defined. In contrast to our results on high level testing of NoC-switches, it is not shown in [Ste06] how the fault models correlate to faults in a logic level implementation. Fault tolerance techniques in a NoC communication infrastructure using retransmission introduce timing jitter on packet arrivals at cores. Huang et al have shown how a test wrapper can be designed efficiently to make scan chains work in the presence of this type of jitter [Hua08].

Test data compression techniques

It was identified in [Dal08] that the speed with which a NoC device can be tested is limited by the capacity for transfer of test data. That article describes how test vectors can be compressed to reduce the amount of test data that needs to be transported, thus increasing the test speed. The compression technique relies on the fact that test vectors usually contain don't-care bits, which provide possibilities for efficient test vector compression. Compression of the test response was examined further in [Mor01]. Gonciari et al [Gon02] identified three parameters that should be optimized simultaneously when compressed test vectors are used. Those parameters are compression ratio, area overhead and test application time. They claimed that previous articles had optimized one parameter at the expense of the others, while their method optimizes these parameters simultaneously.

3.2 On-chip crosstalk induced fault testing

This section describes background topics for the contributions in testing for crosstalk induced faults on asynchronous on-chip links.
Related work is also surveyed in this section.

3.2.1. Crosstalk-induced faults and their test aspects

The small dimensions of today's chips mean that testing for defects caused by breaks and short circuits is not enough. To ensure high quality in the manufactured chips, defects that cause unacceptable delay or too much crosstalk need to be detected as well. This is especially important when there are relatively long wires connecting different parts, like the cores of a SoC [Che00, Ism99, Krs01, Nae04, Nor98, Pan05, Sin02]. Newer chip technologies have thinner wires and transistors and run at higher speeds than before. Wires are also very closely packed. Effects of parasitic capacitance and inductance need to be considered during the design of a chip [Mic06, Nae04], and testing is needed on the manufactured chips to determine whether these parasitic effects go beyond the tolerance limits such that they risk affecting the functionality of the chip. The chip designer needs to ensure that crosstalk does not exceed expected levels and cause the chip to fail. For deep submicron chips it is important to consider both capacitive and inductive coupling, and in some cases also coupling through electromagnetic waves [Liu03, Mic06, Nur04]. Pamunuwa et al [Pam05], in 2005, asserted that the general consensus was that modeling inductance is necessary only for special nets such as clock and power lines. The majority of signal lines can be accurately modeled with just resistance and capacitance. The fact that a certain crosstalk effect needs to be considered during chip design does not imply that there is a need to test for faults affecting that crosstalk effect. From the test point of view, consideration of capacitive coupling is often sufficient [Bai04, Ism99]. In [Kun05] it is stated that failures due to capacitive crosstalk are the leading cause among marginally related failures at Intel.
In that article the notion of marginally related failure refers to failures that occur for chips with an unfavorable combination of layout design and manufacturing process parameters. Inductive crosstalk does not change much due to fabrication faults unless there are shorts and breaks. In any case, it is easier to test for shorts and breaks than to test for crosstalk-faults. However, there are other defects that, while they do not produce more coupling than is allowed for, exacerbate the effect of the coupling. Defects that make line drivers weaker or wire resistance higher may cause this type of effect. To address this, it might be necessary to consider not only capacitive coupling but also inductive coupling [Sin02]. Faults that increase the effect of crosstalk are caused by both capacitive and inductive coupling, though it is sometimes sufficient to consider only one type of crosstalk. Crosstalk between on-chip wires can cause both delays and glitches. The effect of crosstalk on a wire is highly dependent on whether signals on interfering wires are changing and how they are changing.

3.2.2. Models of crosstalk-faults

In [Cuv99] the concept of victims and aggressors was introduced along with the maximum aggressor fault model. The term victim or victim wire is used for a wire that is tested to determine how it is affected by crosstalk. Wires affecting the victim are referred to as aggressors or aggressor wires. When testing for a fault defined from the maximum aggressor fault model, one wire is a victim while the others are aggressors cooperating to affect the victim in the worst possible way. The behavior of the victim under this worst-case attack is measured. It is when the signals on the aggressor wires are changing that they may affect the victim wire. If the signal on the victim wire is not changing, the aggressors can affect it such that glitches appear.
When the victim wire is changing state, the delay of the change can be affected by crosstalk.

Figure 3.2: Capacitive crosstalk causing a glitch on a victim wire

As stated in Subsection 3.2.1 it is often sufficient to test for capacitive coupling. In such cases, the victim wire can manifest positive glitches when the aggressor wires are changing in a positive direction and it can manifest negative glitches when the aggressors are changing in a negative direction. Figure 3.2 illustrates the case with a positive glitch. Regarding glitches caused by crosstalk on digital wires, it is usually sufficient to consider positive glitches on victims that are at a low level and negative glitches on victims that are at a high level. For a victim wire that is changing value, capacitive crosstalk may affect the delay. For a victim wire that changes in the same direction as the aggressors, the delay is decreased. If the aggressors change in the opposite direction the delay becomes larger. The worst case situations of interference through capacitive crosstalk on digital signals are summarized in Table 3.1. For each worst case, the table shows how the aggressor wires change and which signal is applied to the victim wire.

Table 3.1: Worst case corners of capacitive crosstalk

Applied signal on victim wire | Applied signal on aggressor wires | Effect on victim wire
rising                        | falling                           | Increased delay
falling                       | rising                            | Increased delay
rising                        | rising                            | Decreased delay
falling                       | falling                           | Decreased delay
constant high                 | falling                           | Negative glitch
constant low                  | rising                            | Positive glitch

3.2.3. Asynchronous communication protocols

A NoC infrastructure consists of a number of switches connected through links in a given topology. As argued in Subsection 3.2.1, there are several disadvantages to using a global clock for the whole NoC device. Instead, a GALS scheme can be used.
One way to implement the GALS scheme is to let the switches in the NoC infrastructure be clocked by different clock signals. The consequence of this is that we need an asynchronous communication protocol for communicating data between switches. Normally two unidirectional links are used to connect two switches on a chip.

Figure 3.3: Lines in a handshaking link (clkT, Write, Data and RTR between the transmitting and the receiving switch)

In an asynchronous link that transfers data from one switch to another switch, some synchronization signals are needed. In on-chip communication there are often many data lines in parallel. In [Pam03], for example, 128 data wires in each unidirectional link are mentioned as a reasonable number. All the data lines are synchronized by the same synchronization signals. In this thesis we let the handshaking signals Write and Ready To Receive (RTR) represent the synchronization signals. The signal Write goes from the transmitter to the receiver and is used to tell the receiver when data is stable on the data lines. The signal RTR goes in the opposite direction and is used to inform the transmitter when the receiver has read the data on the data lines such that the transmitter can put new data on those lines. Figure 3.3 illustrates how these signals and the data lines are connected between a transmitting switch and a receiving switch.

Figure 3.4: Handshaking sequence

Figure 3.4 shows the timing sequence for one data transfer. The receiver asserts RTR=1 when it is ready to receive new data. The signal RTR may only go high when Write is low and go low when Write is high. The transmitter raises Write to indicate that new and valid data has been put on the data lines. The signal Write may only go high when RTR is high and go low when RTR is low. RTR is changed on the active edge of the clock signal at the receiver.
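The interlocking rules above can be captured in a small protocol checker. This is a sketch of the abstract handshake as described in the text, not of any particular hardware implementation:

```python
# One step of the Write/RTR handshake is legal only if it respects the
# interlock: RTR may rise only while Write is low and fall only while Write
# is high; Write may rise only while RTR is high and fall only while RTR is
# low.

def legal(write, rtr, write_next, rtr_next):
    """Check one transition of the Write/RTR handshake (sketch)."""
    if rtr_next > rtr and write != 0:    # RTR may rise only while Write is low
        return False
    if rtr_next < rtr and write != 1:    # RTR may fall only while Write is high
        return False
    if write_next > write and rtr != 1:  # Write may rise only while RTR is high
        return False
    if write_next < write and rtr != 0:  # Write may fall only while RTR is low
        return False
    return True

# One complete transfer as (Write, RTR) pairs:
# RTR up, Write up, RTR down, Write down.
trace = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
ok = all(legal(w, r, wn, rn) for (w, r), (wn, rn) in zip(trace, trace[1:]))
print(ok)  # → True
```

A transition such as raising Write while RTR is still low would be rejected by the checker, which is exactly the kind of confusion a glitch on a handshake line can cause.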
For reasons of efficiency, transmitter implementations do not need to leave a full clock period between assertion of data and assertion of the signal Write. Instead Write=1 can be asserted after a small delay. Figure 3.5 shows how the signals change with the clock edges. The signals clkT and clkR refer to the clock signals in the transmitter and the receiver, respectively.

Figure 3.5: Handshaking in GALS

The actual implementation of the asynchronous communication protocols may be slightly different from the abstract model described above. The important function of the protocol is that the synchronizing signals communicate that the receiver is ready and that the transmitter has sent (or will send) new data with a known timing behavior. An overview of possible protocols for asynchronous communication on on-chip links is included in [Ho04]. That article also describes how buffer amplifiers, repeaters and pipeline stages can be used to get high throughput, low latency or both for long on-chip wires. Metastability is another issue that has become a big problem in multi-clock domain systems [Rah10] and needs to be considered during implementation of asynchronous communication protocols.

3.2.4. Issues in testing of asynchronous links

To test for crosstalk-faults, each possible fault needs to be activated. Subsection 3.2.2 describes how aggressors should act to cause the worst case situation. As mentioned in Section 3.1.2, detection of whether a crosstalk-fault is present can be rather tricky with asynchronous links due to the non-determinism caused by the phase difference between the clocks at the transmitting side and the receiving side. This non-determinism is especially evident in systems that use separate clock oscillators. In such systems the phase difference between the clocks in different domains changes constantly.
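The effect of the changing phase difference can be illustrated with a small simulation. This is an illustrative sketch, with arbitrary numbers: the receiver is assumed to sample the data lines at its first active clock edge at or after Write arrives, and the clock phase then decides whether a late data line is caught.

```python
import math

# Whether a delayed data line causes an erroneous read depends on the
# receiver clock phase: the same delay fault is caught for some phases and
# missed for others, which makes the failure intermittent.

def read_is_correct(t_data, t_write, period, phase):
    """True if the data has settled when the receiver samples it."""
    # First active receiver clock edge at or after Write arrives.
    k = math.ceil((t_write - phase) / period)
    t_sample = phase + k * period
    return t_sample >= t_data       # data must settle before sampling

# Delay fault: data (t = 5.0) arrives after Write (t = 4.0), i.e. tl < 0.
# The outcome depends on the receiver clock phase:
print(read_is_correct(t_data=5.0, t_write=4.0, period=2.0, phase=1.5))  # True
print(read_is_correct(t_data=5.0, t_write=4.0, period=2.0, phase=0.5))  # False
```

With phase 1.5 the sampling edge falls late enough for the data to settle; with phase 0.5 the same fault produces an erroneous read, matching the intermittent behavior discussed in the text.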
The non-determinism might cause intermittent failures due to defects that cause too much parasitic capacitance between wires, where the failure only happens for some adverse phase difference. This subsection proceeds with a discussion of delay, and thereafter glitches are considered. We use the communication protocol described in Subsection 3.2.3 to illustrate the issues.

Table 3.2: Effects of change of delay

Signal | Effect of slow signal            | Effect of fast signal
Write  | Throughput degradation of link   | Risk of erroneous data transfer
RTR    | Throughput degradation of link   | No problem arises
Data   | Risk of erroneous data transfer  | No problem arises

In the asynchronous link described in Subsection 3.2.3, faults causing change of delay might result in failures according to Table 3.2. Slow control signals can result in degradation of the throughput of the link. Such an effect can be tested for by measuring the throughput of the link. A harder fault to test is when some data lines are delayed but the control signal Write is not. This is the type of fault we address in this thesis. A change in the signal Write from 0 to 1 indicates to the receiver that new data is available on the data lines. When this occurs, if the data that is supposed to be at the receiver has not yet arrived, the receiver eventually reads some data bits erroneously. Let tl be the time from when the data arrives until the signal Write arrives at the receiver. Due to faults and process variations, tl varies between different devices. If tl is smaller than zero there is a delay fault that might result in erroneous data. For non-negative tl, there is no such delay fault affecting the data line being considered. The receiver reads the data on its active clock edge, and the time from when the transmitter asserts the signal Write until the receiver detects it can vary by up to one receiver clock period.
This variation depends on the clock phase difference between the transmitter and the receiver. It can be modeled as a non-deterministic time gap from the arrival of the signal Write until the data is actually read. Figure 3.6 shows an example where a delay fault causes data to arrive after the arrival of the signal Write. In Figure 3.6a a relatively long time passes from when the signal Write changes from 0 to 1 until the active edge of the clock occurs. During this time the data stabilizes, resulting in a correct read of the data. In Figure 3.6b an active clock edge occurs quite soon after Write has changed from 0 to 1. The result is that the new data has not stabilized when it is read, resulting in erroneous data. This makes testing for such faults somewhat complicated.

Figure 3.6: Signals at the receiver when data is delayed

Glitches can cause a synchronization signal to be falsely detected as asserted although it was not. This can cause the transmitter and the receiver to lose their consensus about which phase of the data transfer the link is currently in. This can result in lost data, duplicate data and invalid data. Figure 3.7 shows an example of how a glitch can cause the loss of a data packet. A glitch occurs at RTR when both Write and RTR are logic zero and the receiver is not able to take any new packets for some time. Assume that at this time the transmitter has a packet it wants to transmit. A glitch on RTR might occur at the same time as an active edge of the transmitter clock. The transmitter then raises Write to indicate that a new packet is available. When this is done the transmitter finds RTR logic zero, which the transmitter interprets to mean that the receiver has read the data. Hence the transmitter lowers Write on its next active clock edge and prepares to send the next data packet.
Figure 3.7: Example of a glitch causing a failure

Duplication of data packets and additional invalid packets can occur if a glitch at Write causes the receiver to believe that a new packet is on the data lines when it is not. Glitches on data lines can also cause errors in the data if they occur when the data is read. In Section 4.3 a method of testing for faults that cause glitches is proposed.

3.2.5. Related work on chip interconnection testing

In this section related work addressing crosstalk models and related work focusing on methods for testing are surveyed. Related works that do not directly address testing, but which address the subjects of crosstalk and asynchronous on-chip links, are also surveyed. This includes articles about fault detecting codes, fault correcting codes and encoding to avoid the effects of crosstalk.

Crosstalk-fault models

The maximum aggressor fault model [Cuv99] was surveyed in Subsection 3.2.2. In many cases the maximum aggressor fault model is unnecessarily pessimistic. The consequence of this is that test time becomes unnecessarily long and a large amount of test logic might be required. In a bus, wires that are close to each other are subject to more capacitive coupling than wires that are further away from each other. It is unlikely that defects will change that relationship unless they cause short circuits or breaks, and defects of that type are easier to test for anyway. Song et al [Son09] as well as Sirisaengtakin and Gupta [Sir02] used a graph representation to represent possible coupling between wires. With the help of this graph it is determined which wires can be scheduled as victims simultaneously. In Section 4.2 a method is presented that utilizes the structure of NoC interconnects to select wires that can be safely chosen as victims simultaneously.
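The graph-based selection of simultaneous victims can be sketched as a greedy independent-set computation. This illustrates the idea of [Sir02, Son09] rather than their actual algorithms; the coupling graph here is invented:

```python
# Wires that can couple are connected by an edge; wires chosen as
# simultaneous victims must not be adjacent, since their results would
# otherwise disturb each other. A greedy pass groups victims into sessions.

def victim_sessions(wires, coupling):
    """coupling: set of frozenset({a, b}) pairs that may interfere."""
    remaining = list(wires)
    sessions = []
    while remaining:
        session, rest = [], []
        for w in remaining:
            if all(frozenset({w, v}) not in coupling for v in session):
                session.append(w)      # safe as a simultaneous victim
            else:
                rest.append(w)         # defer to a later session
        sessions.append(session)
        remaining = rest
    return sessions

# A 4-wire bus where only neighbouring wires couple:
coupling = {frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})}
print(victim_sessions([0, 1, 2, 3], coupling))  # → [[0, 2], [1, 3]]
```

For the neighbour-coupled bus, even and odd wires can be tested as victims in two sessions instead of one session per wire, which is the kind of test-time reduction these methods aim at.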
Unlike the methods in [Sir02, Son09], the method presented in Section 4.2 proposes a simple hardware design in which it is possible to adjust how many wires should be victims simultaneously. This makes it possible to adjust the tradeoff between test time and test accuracy after manufacturing. Zhao and Dey [Zha03] presented a method for computing the efficiency of fault models, including those for crosstalk defects. This method is useful for evaluating the relevance of fault models and the fault coverage of test vectors. They assumed that there is a method that can detect whether a victim wire is affected by a set of aggressor wires during a test. Their computation method also relies on the existence of a method to determine the probability that there is a crosstalk-fault between pairs of wires. Related to crosstalk-fault models, there are articles that specifically address the testing-related aspects of crosstalk. There are articles which state that a trend for chips is that the height of the wires is increasing relative to their width [Aru05] and that the parasitic capacitance between wires is increasing compared to the capacitance between wires and the substrate [Pil95]. One effect of this is that the delay of signal wires varies more than before, depending on the behavior of adjacent wires. Pileggi [Pil95] also stated that if resistance per unit length is reduced significantly, inductance could become a factor that will need to be considered. Ismail et al [Ism99] stated that it is only important to include inductance in calculations during design for a certain range of interconnection lengths. Observe that although it might be necessary to consider crosstalk caused by inductance during design, this does not mean that it also needs to be considered during production test. There are several articles addressing how to estimate and model capacitive crosstalk [Gup05, Hey05, Pal05], which is the most important kind of coupling to be considered when a test fabric is developed.
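A first-order textbook model gives the flavor of why aggressor behavior changes victim delay (this is not the model of the cited articles): the coupling capacitance is scaled by a Miller coupling factor, roughly 0 when the aggressor switches with the victim, 1 when it is quiet, and 2 when it switches against the victim, and the victim's RC delay scales with the resulting effective capacitance. All numbers below are invented.

```python
# Elmore-style delay estimate of a victim wire under different aggressor
# activity, using an effective capacitance Ceff = Cg + M * Cc where M is
# the Miller coupling factor.

def effective_cap(c_ground, c_couple, miller_factor):
    return c_ground + miller_factor * c_couple

def victim_delay(r_wire, c_ground, c_couple, miller_factor, k=0.69):
    """t = k * R * Ceff (k ≈ ln 2 for a simple RC step response)."""
    return k * r_wire * effective_cap(c_ground, c_couple, miller_factor)

R, CG, CC = 100.0, 1e-13, 2e-13   # ohm, farad: made-up wire parameters
quiet    = victim_delay(R, CG, CC, 1)   # aggressors silent
opposing = victim_delay(R, CG, CC, 2)   # aggressors switch against victim
helping  = victim_delay(R, CG, CC, 0)   # aggressors switch with victim
print(helping < quiet < opposing)  # → True
```

The ordering reproduces the qualitative behavior of Table 3.1: opposing aggressors increase the delay and cooperating aggressors decrease it.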
There are articles addressing delay issues in NoC devices [Liu03, Liu04]. These articles describe electrical properties and system parameters for NoC interconnects and how they affect the delay. Buffer insertion is discussed as well. Buffers decrease delay but they consume a significant amount of area and power [Liu03], which is why they are not often used. Bai and Dey [Bai04] describe how one trace of a Spice simulation was used to make several simulations of NoC interconnection links at the logic level of abstraction. That simulation included the effects of crosstalk.

Test methods for crosstalk-faults

Currently only a few articles present methods of testing for crosstalk-faults in asynchronous on-chip links. Li et al [Li09] as well as Su et al [Su00] have presented test methods for crosstalk which do not depend on a global clock. This means that they can be used to test for crosstalk-faults in chips with several clock domains. In the method presented by Su et al [Su00] a periodic signal is sent back and forth on two different wires in an interconnection bus. The phase difference between the signal that is sent and the signal that comes back is used to measure the delay. The delay is always measured on two lines, one in each direction. That article also presents a method for inducing worst case delay due to crosstalk. This is done by feeding other wires with an inverted version of the periodic signal used for the wires being tested. A scheme for detecting glitches as a measure of crosstalk was presented by Li et al [Li09]. They claim that for detection of crosstalk-faults it is sufficient to measure glitches. The justification is that defects that cause additional crosstalk result in both glitches and delay faults. Compared to the contributions presented in this thesis, their method has the drawback that glitch detectors implemented using analog circuits are needed.
That method is also more pessimistic than the method presented in this thesis because there are parameter variations in the analog detector. The worst case corner, in which glitches are smallest compared to the delay caused by the corresponding interference, must be assumed to cover the possible crosstalk-faults when the test method in that article is used. For chips with a single clock signal there are several test methods proposed for detection of crosstalk-faults. Bai et al [Bai00] have presented a complete BIST structure for crosstalk interconnection test based on the maximum aggressor fault model. Attarha and Nourani [Att01] used BIST cells of analog structure to detect noise. For delay measurements, gate delays were used as reference. A BIST hardware design for detection of all transaction faults, bridging faults and stuck-at faults in synchronous interconnections is presented in [Jut04]. Transaction faults mainly refer to delay faults in that context. Grecu et al [Gre07] addressed test issues for the infrastructure in NoC devices. They combined test generation for interconnection links and switches. They utilized the NoC infrastructure to make the testing efficient and they included tests for crosstalk. Duganapalli et al [Dug08] focus on how to test for crosstalk-faults on nodes inside a network of gates. An interconnection wire inside such a network is considered as a victim and a set of wires are considered as aggressors. A genetic algorithm is used to find a set of input vectors that can be used to activate crosstalk-faults and propagate their effect to an observable output.

Figure 3.8: Test sequence for one line (test for glitch faults followed by test for delay faults)

When testing for crosstalk-faults in a connection bus within a chip with a single clock, it is usually enough to test for increased delay and for glitches.
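These glitch and delay tests are built from signal transitions, and the required vectors can be chained so that some tests share a vector. The sketch below shows one possible six-vector sequence per victim wire (an illustrative reconstruction; Figure 3.8 shows the thesis's actual ordering), giving 6N vectors for an N-wire bus under the maximum aggressor fault model:

```python
# (victim value, aggressor value) per vector; consecutive vectors form the
# transitions that realize two glitch tests and two delay tests, with two
# tests reusing a vector so that six vectors suffice per victim.

SEQUENCE = [
    (0, 0), (0, 1),   # aggressors rise, victim low   -> positive-glitch test
    (1, 0),           # victim rises, aggressors fall -> delay test (rising)
    (0, 1),           # victim falls, aggressors rise -> delay test (falling)
    (1, 1), (1, 0),   # aggressors fall, victim high  -> negative-glitch test
]

def bus_vectors(n, victim):
    """Expand the sequence to full n-bit bus vectors for one victim wire."""
    return [[v if i == victim else a for i in range(n)]
            for v, a in SEQUENCE]

def full_test(n):
    return [vec for victim in range(n) for vec in bus_vectors(n, victim)]

print(len(full_test(4)))  # → 24, i.e. 6N vectors for N = 4
```

The transition from the fourth to the fifth vector is only a setup step; the remaining five transitions carry the four tests, two of which share a vector with a neighbouring test.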
Subsection 3.2.2 describes which logic values should be applied to victim wires and aggressor wires to perform the respective tests. Each test needs two successive test vectors to be applied on the bus, an initial vector and a final vector. This is because each test consists of a certain transition. This means in principle that eight test vectors are needed to test one line. However, in two cases it is possible to use the final vector of one test as the initial vector of another test. This means that six test vectors in a specific sequence are needed to test a victim wire for capacitive crosstalk-faults. Figure 3.8 illustrates this sequence for testing one line. When the maximum aggressor fault model is used, one wire at a time is considered as a victim, hence 6N vectors are needed where N is the number of lines in the connection.

Fault tolerance and coding techniques

Erroneous data transfer is not only caused by defects in the chip, it can also result from transient faults. Transient faults can be caused by cosmic radiation [Mic06] and other sources that sporadically cause disturbances in signals on the chip. The shrinking dimensions of semiconductors have made them more sensitive to some types of transitory disturbances, so fault tolerance in on-chip communication links is needed [Dum03]. Error correcting or error detecting codes need to be used to handle transient faults. When such codes are used they also, to some extent, reduce failures caused by chip defects that cause too much crosstalk. However, relying on coding techniques to compensate for such defects is problematic because a pattern of data that becomes erroneous due to a defect can be expected to be erroneous in the same way each time it is transferred.
A system with error detecting codes, which asks for retransmission, will experience problems in such a case because the same error is likely to occur when the data is resent. A system that utilizes error correcting codes might work, but the error correcting mechanism must then correct the data errors every time the data is sent. A problem arises when errors caused by transient faults also affect the data. In such a case the error correcting mechanism must correct both the errors caused by defects that induce crosstalk and the errors caused by transient faults. The number of transient fault induced errors that can be corrected is then smaller in a chip with errors caused by crosstalk inducing defects. As a result, the probability that a transient fault cannot be corrected is considerably larger than it is for a system without any permanent crosstalk-faults. Zhao et al [Zha04] presented an online method for detection of noise induced by crosstalk and other effects. The basic idea of that method is that lines are sampled twice with a small delay between the samples, instead of only once. The two samples are then compared and if they differ a fault has been detected. They pointed out that a weak point in this method is that it is overly conservative, with a false detection level of 40%. Some articles address NoC infrastructure issues in particular. Zimmer and Jantsch [Zim03] described fault models for NoC interconnects, which are basically temporary faults caused by radiation, etcetera. They describe how probability and correlations in time and space between faults can be modeled, as well as how efficient coding techniques can be used for error control. Error control schemes dedicated to NoC devices and their specific demands on traffic were presented by Rossi et al [Ros07]. Overhead costs for fault tolerance techniques in NoC circuits were addressed by Frantz et al [Fra07].
They stated that hardware based fault tolerance techniques consume too much power. They presented a technique for improving fault tolerance in NoCs that is partly implemented in hardware and partly in software. Tamhankar et al [Tam07] have addressed how throughput in a NoC infrastructure can be improved by combining a timing-error tolerant technique with a clock frequency so high that timing errors occur more frequently than would otherwise be acceptable. Signal coding techniques that avoid excessive crosstalk were presented in [Bre01, Dua01, Jun08]. The data is coded such that the wires on a chip are never driven with value combinations that cause excessive crosstalk. Codes that avoid crosstalk in combination with a mechanism for handling inter-symbol interference were presented by Sridhara et al [Sri08].

3.3 Test generation at high abstraction levels

By test generation at high abstraction levels we refer to generation of test data at the RT-level and higher abstraction levels. It is usually more difficult to get good fault models at higher abstraction levels than at lower abstraction levels. On the other hand, the cost of generating test sequences is often lower at higher abstraction levels than at the logic level [Jer02]. Another advantage of generating tests at higher levels of abstraction is that testing can be taken into account earlier in the design phase, making it possible for optimization strategies to include test costs and testing issues. The main challenge in generating test vectors from a high-level system specification is to define good fault models. Chapter 5 proposes an approach to generating fault models using some knowledge about the functionality and structure of the system. It is quite possible that different fault models will be useful for generating test vectors for different classes of systems.
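The flavor of fault-model-based test generation at a high abstraction level can be illustrated with a small sketch: a bit stuck-at fault is injected into a variable of a behavioral model, and inputs are searched for a vector that distinguishes faulty from fault-free behavior. The model, bit width and fault are invented for illustration.

```python
# Behavior-level test generation sketch: find an input pair that exposes a
# bit stuck-at fault injected on an internal variable.

def behavior(a, b, fault=None):
    """Tiny behavioral model: s = (a + b) mod 16, with an optional
    bit stuck-at fault (bit, value) injected on the variable s."""
    s = (a + b) & 0xF
    if fault is not None:
        bit, value = fault
        s = (s | (1 << bit)) if value else (s & ~(1 << bit))
    return s

def find_test(fault, width=4):
    """Exhaustive search for an input pair that detects the fault."""
    for a in range(2 ** width):
        for b in range(2 ** width):
            if behavior(a, b) != behavior(a, b, fault):
                return a, b          # this input pair detects the fault
    return None                      # fault is undetectable

print(find_test((0, 1)))  # a test for "bit 0 of s stuck-at-1"
```

No logic-level netlist is needed here, which is exactly the appeal of high-level test generation; the open question, addressed by the fault models surveyed below, is how well such faults correlate with defects in the eventual implementation.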
Jervan et al [Jer02] described experiments in which test data generation at the behavior level resulted in higher coverage of logic level faults than when logic level stuck-at fault based test vector generation algorithms were used. Most articles addressing test at high abstraction levels focus on how to generate faults with good correlation to defects in the implementation. Such articles can be divided into those addressing fault models at a certain level of abstraction and those dealing with derivation of faults from possible defects in the actual implementation.

3.3.1. Fault models independent of lower abstraction levels

As justified in Subsection 2.4.2, fault models of this type are independent of lower level implementations and they can be used for defining faults, and then for generating test patterns, before synthesis into lower abstraction levels has been performed. Most of the proposed fault models of this type at the RT-level and the behavior level have been inspired by the stuck-at fault model at the logic level of abstraction.

Bit stuck-at faults

In the example in Subsection 2.4.2 that demonstrates fault modeling, the signal bit stuck-at fault at the RT-level and the variable bit stuck-at fault at the behavior level were described. In that example, the fault was modeled from the logic level of abstraction to the RT-level and to the behavior level. However, the signal bit stuck-at fault as well as the variable bit stuck-at fault can be used as fault models without knowledge of lower level implementations. A behavior level description of a system often states how variables should be encoded. Alternatively, the decision of how to encode variables is not taken until the system is synthesized to the RT-level. The variable bit stuck-at fault can only be used when the variable encoding is stated. The bit stuck-at fault at the behavior level was presented by Cho and Armstrong [Cho94].
An analysis of how such bit stuck-at faults map to RT-level faults was presented by Buonanno et al [Buo97]. Their method works under certain assumptions about how the behavioral synthesis has been done. Logic level stuck-at faults have a clear mapping to many physical defects. Each component at the RT-level is synthesized to a specific set of logic components. Therefore the RT-level bit stuck-at faults are also likely to have a good mapping to a subset of the physical defects.

Multiple bit stuck-at faults and variable stuck-at faults

One class of test data generation algorithms does not work directly on fault models. Instead, the strategy of such test data generation algorithms is to cover as much code as possible. The code is a description of the system in some hardware description language, for example VHDL. Such a description is often made at the behavior level or the RT-level of abstraction. According to Buonanno et al [Buo97], code-covering methods cover faults of a type called multiple bit stuck-at faults. There are 3^n - 1 different ways in which an n-bit signal can have one or multiple bit stuck-at faults, since each bit is either fault-free, stuck-at-0 or stuck-at-1. Most code covering test data generation algorithms tend to cover a particular subset of the multiple bit stuck-at faults, namely those forcing the variables to their lower and upper extreme values [Buo97]. A test for such a subset of faults becomes a test for a set of variable stuck-at faults. The variable stuck-at fault model means that the variable is stuck at a particular value. Multiple bit stuck-at faults where all bits have a stuck-at fault are equivalent to variable stuck-at faults.

Branch stuck-at faults and condition stuck-at faults

Two other proposed stuck-at fault models are the branch stuck-at fault model and the condition stuck-at fault model [Fer98]. They can be used both at the RT-level and the behavior level of abstraction.
A branch stuck-at fault means that a selection statement always makes a specific selection. This is usually an if-statement that is stuck-at-then or stuck-at-else, but it can also be a selection statement with several alternatives, usually expressed as a case-statement. The condition stuck-at fault model is similar to the branch stuck-at fault model. A condition stuck-at fault means that a condition is either stuck-at-true or stuck-at-false, i.e. the condition behaves as if it were always true or always false. The distinction between the condition stuck-at fault model and the branch stuck-at fault model appears when a branch statement is based on several conditions connected through logical operators. More about this distinction has been described by Ferrandi et al [Fer01].

Short summary and comparison of RT-level and behavior level stuck-at faults

Bit stuck-at faults as well as branch and condition stuck-at faults are stuck-at faults at the RT-level and the behavior level that researchers have used to generate test data. The bit stuck-at fault model and the variable stuck-at fault model are more relevant for testing the data parts of a design, while the branch and condition stuck-at fault models are better suited to the control parts. The variable stuck-at fault is independent of the encoding of the signal. At the behavior level the encodings of signals and variables are not always known. In cases where it is not obvious how variables should be encoded, it is still possible to use the variable stuck-at fault model, but the bit stuck-at fault model cannot be used. At the logic level of abstraction, the connections between gates and flip-flops can be thought of as signals. It is then possible to define faults to mean that a signal is stuck-at-0 or stuck-at-1.
Consideration of signals in such a way is equivalent to consideration of stuck-at 71 CHAPTER 3 faults at the outputs of gates and flip-flops. Logic level stuck-at faults are, however, usually considered on both the inputs and the outputs of gates and flip-flops. Consideration of stuck-at faults for both inputs and outputs instead of only for outputs makes sense in all nodes where the fan-out is larger than one. A similar distinction between consideration of signals and consideration of outputs and inputs can be applied to the bit stuck-at fault and the variable stuck-at fault models. Inputs and outputs to the operators can be considered for bit stuck-at faults and variable stuck-at faults [Cho94] rather than the signals and variables themselves. Operator mutation fault model The stuck-at fault models at the RT-level and the behavior level do not map to faults that are inside the functional units implementing various operators in the behavior. Therefore stuck-at faults at the RT-level and the behavior level can only model a subset of physical defects. A fault model named micro-operation fault has been presented by Cho and Armstrong [Cho94]. This fault model can be used to represent a fault in a functional unit where we do not have access to its internal implementation. A fault in such a block causes it to implement a different function than intended. Buonanno et al [Buo97] presented a generalization of this fault model called operator mutation fault. The presence of a certain fault of this type results in an operator that will make a miscalculation for some or for all operand values. For an operator with a large number of inputs, it is practically impossible to enumerate all possible operator mutations and then generate test data to test them. The way an operator can mutate due to defects depends on its design and implementation. This means that operator mutations are highly dependent on the operator’s logic level implementation. 
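As a hedged illustration of the operator mutation idea, the sketch below mutates a 4-bit adder so that the carry out of the least significant bit is lost; the operand pairs that expose the mutation then define the test data needed for it. The mutation chosen here is hypothetical, picked only to show the principle:

```python
import itertools

WIDTH = 4
MASK = (1 << WIDTH) - 1

def adder_ok(a, b):
    """Fault-free 4-bit adder (wrap-around)."""
    return (a + b) & MASK

def adder_mutated(a, b):
    # Hypothetical mutation: the carry out of bit 0 is lost, so bit 0
    # behaves like plain XOR and never propagates a carry upward.
    low = (a ^ b) & 1
    high = ((a >> 1) + (b >> 1)) << 1
    return (high | low) & MASK

# Operand pairs that expose this mutation are exactly those producing a
# carry out of bit 0, i.e. pairs where both operands are odd.
detecting = [(a, b) for a, b in itertools.product(range(16), repeat=2)
             if adder_ok(a, b) != adder_mutated(a, b)]
assert all(a % 2 == 1 and b % 2 == 1 for a, b in detecting)
```

Any single such pair suffices as test data for this particular mutation; a realistic operator mutation fault list would contain one such detection condition per plausible mutation.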
However, the synthesis process is often predictable enough that there are only one or a few ways in which an operator will be implemented. Knowing this, a reasonably small set of operator mutation faults can be defined such that it covers most of the physical defects in the circuit that implements the operator.

Code coverage methods

The description of a system at the RT-level or the behavior level is usually made in a hardware description language, e.g. VHDL or Verilog. The basic idea of code covering methods is to generate a test sequence that causes as many statements as possible in the hardware description code to produce an observable output. Corno et al [Cor00] presented a methodology which combines code coverage and RT-level fault models. Their methodology gives rules for identifying code lines for which the RT-level faults have little or no correlation to logic level faults. By omitting such RT-level faults from consideration, a relatively strong correlation between RT-level faults and logic level faults was shown experimentally. Corno et al [Cor01] later identified statements in VHDL that are described as a sequence, to further improve the accuracy of RT-level fault models. A variant of decision diagrams called alternative graphs has been proposed by Ubar [Uba96]. This type of diagram is efficient for test data generation at the behavior level, especially when multiple abstraction levels are utilized. Jervan et al [Jer02] have used such graphs to generate test data for bit stuck-at faults and condition stuck-at faults, and they have utilized a technique to perform hierarchical test generation. Hierarchical test generation in this context means using several abstraction levels for test generation. In a description of a system in a hardware description language, it can be rather tricky to determine whether the effect of a fault is propagated to an observable output.
That problem has been identified by Fallah et al [Fal01], who presented a method for analyzing how the effects of faults propagate. Experimental results based on the evaluation of the behavior level fault metrics bit coverage, condition coverage and statement coverage were presented by Goloubeva et al [Gol02]. Statement coverage means that the test ensures that the effect of each statement in the hardware description code is propagated to an observable output. Condition coverage and bit coverage refer to coverage of condition stuck-at faults and bit stuck-at faults, respectively. Systems are generally classified as control-dominated systems, data-dominated systems and mixed systems. The results of their experiments indicate that bit stuck-at faults and condition stuck-at faults have a good correlation to logic level stuck-at faults for data-dominated systems.

3.3.2. Derivation of faults from possible defects in the actual implementation

In Figure 2.8 an example was given of how a defect can be modeled first at the logic level and then further at the RT-level and the behavior level. Faults derived from possible defects in the actual implementation have very good correlation to the physical defects they model. Such faults cannot, however, be derived before synthesis to the final implementation has been performed. This is a drawback in the sense that test aspects based on such faults can only be considered late in the design process. Another drawback is that such faults are often complex. The example in Figure 2.7 shows how a break in a wire can result in a fault at the logic level that is considerably more complex than a logic level stuck-at fault. A third drawback is that deriving faults from possible physical defects requires detailed consideration of how the synthesis was made, which can be rather complicated.
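To illustrate why a defect-derived fault can be more complex than a stuck-at fault, the following sketch models a broken (floating) gate input that retains charge from earlier cycles, the kind of behavior behind examples like Figure 2.7. The charging assumption is our own crude simplification, not a circuit-accurate simulation:

```python
# A broken wire leaves a gate input floating.  A floating node may keep
# its previous charge, which can make a purely combinational gate behave
# sequentially.  (Illustrative model only; the rule that the charge is
# refreshed by coupling from input `a` is one crude assumption among
# many possible.)

class BrokenInputAnd:
    def __init__(self):
        self.floating = 0  # charge retained on the disconnected input

    def eval(self, a, b_driven):
        # Input b is disconnected: the gate sees the retained charge
        # instead of b_driven, and the charge is then refreshed from a.
        out = a & self.floating
        self.floating = a
        return out

stimuli = [(1, 1), (1, 0), (0, 1), (1, 1)]
healthy = [a & b for a, b in stimuli]                 # [1, 0, 0, 1]
gate = BrokenInputAnd()
faulty = [gate.eval(a, b) for a, b in stimuli]        # [0, 1, 0, 0]
```

The faulty output sequence depends on input history, so no single stuck-at fault on either input reproduces it; this is exactly the extra complexity the text refers to.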
Instead of deriving faults from defects in the actual physical implementation, faults generated by a fault model at an abstraction level below the currently considered one can be used to derive faults. For example, a system might have been synthesized to the logic level and faults might have been created from the logic level stuck-at fault model. These faults can then be used to derive faults at the RT-level, and the derived RT-level faults will have a very strong correlation to the logic level stuck-at faults. Initial work in this area was presented by Hansen and Hayes [Han95a], who introduced a fault modeling concept called physically induced faults. It describes how faults at some abstraction level can be modeled up to any higher abstraction level. Hansen and Hayes [Han95b] also presented a way to use fault induction to generate fault models at a higher level of abstraction from a logic level design. A description of a system at some higher abstraction level, together with the synthesis rules used to synthesize to the next lower abstraction level, is sufficient information to determine how the system will be described at that next lower abstraction level. A fault model at the lower abstraction level together with the synthesis rules is therefore sufficient to generate faults at the higher abstraction level that are equivalent to faults derived from the lower level implementation. In practice, synthesis rules involve optimizations, which makes it complicated to derive fault models from the synthesis rules without synthesizing the system and then propagating the faults to a higher level of abstraction. Nevertheless, there have been attempts to construct fault models from synthesis rules. In [Laj00] a method is described where fault models are developed for descriptions in the language POLIS.
The synthesis rules of POLIS are analyzed to obtain fault models at the behavior level with a high correlation to logic level stuck-at faults. Operator mutation faults cannot, however, be accurately modeled with this method. Another work addressing testability aspects of synthesis rules was presented by Dey et al [Dey98]. They illustrated how a synthesis tool can introduce feedback loops into a design during optimization, which can considerably degrade testability at the behavior level and the RT-level. They proposed DfT techniques at the behavior level and at the RT-level to achieve good observability and controllability of the final implementation. They also showed what constraints on the synthesis are needed to preserve controllability and observability. Controllability and observability in this context basically refer to the ability to activate faults and to propagate their effects.

Chapter 4
Testing of crosstalk induced faults in on-chip interconnects

This chapter describes contributions to testing of SoC interconnects. Its focus is on testing for defects that cause unacceptable levels of crosstalk. Test methods for detection of faults causing variations in delay and occurrence of glitches in asynchronous links are proposed.

4.1 Method for testing of faults causing delay errors

This section presents a method for testing of delay faults caused by crosstalk defects in asynchronous links. The first three subsections present the method. After that, the hardware implications are described, followed by an analysis of the method along with the results of the analysis. In these sections it is assumed that independent clock generators feed the different clock domains.

4.1.1. Basic overview of the test method

In Section 3.2 a basic handshaking communication protocol between different clock domains was described.
It was shown how a delay fault can result in errors for some adverse phase differences between the clock signals, while no errors occur at some other phase differences. A consequence is that sending data on the communication link under worst case delay conditions is not sufficient to claim the absence of delay faults. As an alternative, we propose a method in which the data is read both at the active clock edge just before the signal Write arrives at the receiver and at the active clock edge just after the arrival of the Write signal. The second read is done in the same way as during normal operation.

Figure 4.1: Signals at the receiver (upper part: system with no fault; lower part: system with delay fault; exactly one active clock edge occurs in each of the intervals marked TR)

Figure 4.1 shows the signals at the receiver side. The upper part of the figure shows the signals when no fault is present and the lower part shows the signals when there is a delay fault. Parameter tl was defined in Section 3.2.4 as the time from when the data is stable until when Write arrives at the receiver. When there is no fault, the signal Write arrives after the data has become stable. A non-negative tl thus means that the delay fault under consideration is absent, while a negative tl means that there is a delay fault. The clock period at the receiver is denoted TR. In the left interval marked TR in Figure 4.1, one active clock edge occurs at the receiver clock. The first read of the data is made at this clock edge. In the case with a delay fault, the wrong data will be read at this clock edge. In a fault-free system, correct or wrong data will be read depending on where in the interval this clock edge occurs: if it occurs in the interval denoted tl, correct data is read, otherwise wrong data is read.
So if the data read at this clock edge is correct, we can conclude that the delay fault under consideration is not present, but if it is wrong we cannot draw any conclusion. In the right interval marked TR in Figure 4.1, one active clock edge occurs at the receiver clock. The second read of the data is made at this clock edge. It is on this edge that data is read during normal operation, so if there is no fault in the system the data read at this edge will always be correct. In a system with the delay fault, the data read at this edge is correct or wrong depending on where in the interval the edge occurs: if it occurs in the interval denoted -tl, wrong data is read, otherwise correct data is read. So if wrong data is read at this edge there is a fault, but if correct data is read we cannot draw any conclusion. Combining the results of both reads gives one of three cases. The first case is when the data of both reads is correct; then we can conclude that the fault is absent. The second case is when the data of both reads is wrong; then we can conclude that there is a fault. The third case is when the data of the first read is wrong and the data of the second read is correct; then we cannot draw any conclusion. To test whether the fault is present, this measurement is repeated until either the first or the second case occurs, or until a predefined number of repetitions have been made. In the following, we refer to one such measurement as an instance of a test.

4.1.2. Algorithmic description of the test method

In this subsection we give a description of our method. The method is described as two algorithms, one for the transmitter and the other for the receiver.
1  ALGORITHM Transmitter ()
2    Write ← 0
3    REPEAT
4      data-lines ← data to precede test data
5      WAIT long enough for data-lines to settle
6      WAIT UNTIL RTR = 1
7      data-lines ← test data
8      WAIT TIME nominal value of tl
9      Write ← 1
10     WAIT UNTIL RTR = 0
11     Write ← 0
12   UNTIL Receiver algorithm has finished
13 END ALGORITHM

1  ALGORITHM Receiver (OUT: result, IN: max_no_experiments)
2    i ← 0
3    result ← MIGHT_BE_FAULTY
4    WHILE (result = MIGHT_BE_FAULTY)
5      AND (i < max_no_experiments) LOOP
6      RTR ← 1
7      WAIT ON active clock-edge
8      WHILE (Write = 0) LOOP
9        Read data into register data_first_read
10       WAIT ON active clock-edge
11     END LOOP
12     Read data into register data_second_read
13     IF data_first_read = expected_data THEN
14       result ← FAULT_IS_ABSENT
15     ELSE IF data_second_read ≠ expected_data THEN
16       result ← FAULT_IS_PRESENT
17     ELSE
18       RTR ← 0
19       WAIT UNTIL Write = 0
20       i ← i + 1
21     END IF
22   END LOOP
23   RTR ← 0
24 END ALGORITHM

The algorithms for the transmitter and the receiver implement the handshaking protocol described in Subsection 3.2.3, with some extensions that permit the test to be executed. The extension in the transmitter is represented by code lines 4 and 5. These code lines put signal values on the data wires that should precede the data to be sent. This is needed because the values on the data wires must change in a certain way during the test, since delay in these changes is what this test is meant to identify. On the receiver side, code line 9 is an extension to the handshaking protocol described in Subsection 3.2.3. This code line reads the data lines at each active clock edge until the control signal Write has changed from 0 to 1. Thus the variable data_first_read will contain the data read at the active clock edge just before this change of the Write signal. Code lines 13–16 determine whether this measurement can identify the presence of a fault.
If it can, the variable result is assigned accordingly. The measurements are repeated until it can be determined whether the fault being tested for exists, or until the measurement has been repeated the number of times given by the parameter max_no_experiments. Code lines 3 and 12 in the transmitter algorithm and code lines 4, 5 and 22 in the receiver algorithm represent the loop for this repetition of measurements. The parameter result in the algorithm Receiver returns the result of the test: either the test passed or failed, or the specified maximum number of test instances was completed without a definite conclusion being possible. Additional test logic is needed to initialize the test and to propagate its result. This additional test logic also makes the transmitter exit when the test is completed, which is indicated at line 12 of the transmitter algorithm.

4.1.3. Worst case interference setup

Each data wire might need to be tested for delay faults caused by crosstalk. Tests for delay are necessary both when a wire changes from 0 to 1 and when it changes from 1 to 0. The Maximum Aggressor Fault model [Cuv99] can be used such that each wire takes a turn as the victim while all the other wires act as aggressors. We assume that we only need to test for crosstalk caused by capacitive coupling. It might happen that some wires other than the lines in this channel also interfere with the victim wire. We assume that if such wires should be considered as aggressors, the transmitting switch can activate them. The delay fault we are considering in this test is, in fact, not the absolute delay of the data line but its delay relative to the control line Write. Hence, to create the worst case situation, the aggressors should act such that the signal Write becomes as fast as possible while the data line being tested becomes as slow as possible.
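The worst case interference setup can be sketched as vector pairs in the style of the Maximum Aggressor Fault model. The helper below is illustrative code of our own; it captures only the wire directions, not the two-step aggressor toggling relative to the signal Write that is discussed in this subsection:

```python
def maf_vector_pair(n_wires, victim, rising):
    """Maximum Aggressor style vectors for delay testing (illustrative).

    The victim wire makes the transition under test while every other
    wire switches in the opposite direction, which maximizes the delay
    of the victim transition under capacitive coupling.
    Returns (vector_before, vector_after) as bit lists.
    """
    before, after = [], []
    for w in range(n_wires):
        if w == victim:
            before.append(0 if rising else 1)
            after.append(1 if rising else 0)
        else:  # aggressors switch opposite to the victim
            before.append(1 if rising else 0)
            after.append(0 if rising else 1)
    return before, after

# Victim wire 2 of 4 rising: all aggressors fall simultaneously.
assert maf_vector_pair(4, 2, True) == ([1, 1, 0, 1], [0, 0, 1, 0])
```

A full test session would iterate over every wire as victim, once for the rising and once for the falling transition.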
When testing for a delay fault where the victim data line changes from 1 to 0, the longest delay due to capacitive coupling to the aggressor wires is obtained when the aggressors change in the opposite direction, from 0 to 1. This behavior of the aggressors also makes Write, which changes from 0 to 1, as fast as possible, which is the worst case in this situation. For testing of the delay fault when the victim data line goes from 0 to 1, the aggressors should change from 1 to 0 to cause the worst case delay. Such a change would, however, also cause the maximum delay in the signal Write when it changes from 0 to 1, which does not create the worst case for Write. Recall that the transmitter asserts the signal Write shortly after the data has been asserted. The transmitter therefore changes the aggressor wires from 1 to 0 when it asserts the data, and then changes the aggressor wires back to 1 when it asserts the signal Write. In this way the victim data line experiences the longest delay and the signal Write the shortest delay due to capacitive interference from other lines, and the test is thus performed in the worst case situation.

4.1.4. Hardware implementation

In this subsection an RT-level description of the BIST hardware needed for the proposed test method for delay errors is given. Figure 4.2 shows a schematic diagram of the BIST hardware on the receiver side. We assume, for the sake of simplicity, that testing is initiated by a signal test_mode from the corresponding transmitter. After receiving this signal the receiver asserts the RTR signal and starts sampling the Data bus on the rising edge of every clock. The two latest samples of the Data bus are stored in a FIFO structure consisting of two registers.
Upon receiving the Write signal, the contents of these registers are compared with the expected data, and the comparison results are used to decide whether to abort testing, to try another instance of the test on the same wire, or to test the next data wire, as described by the test algorithm in Subsection 4.1.2. The BIST controller generates the required control signals, including the address for the memory in which the expected data values corresponding to the various tests are stored.

Figure 4.2: Schematic diagram of the BIST hardware in the receiver

Figure 4.3 shows a schematic diagram of the BIST hardware at the transmitter side. We assume that testing of the links between two switches is initiated by a control signal start_test distributed from a central controller in the system. We assume that the test vectors for delay testing are stored in a memory. These vectors are read out, one test at a time, and sent to the receiver using the same timing sequence as in normal operation.

Figure 4.3: Schematic diagram of the BIST hardware in the transmitter

It should be noted that the BIST hardware described above is only a schematic description and the implementation can be optimized further. For example, hardware resources that are already available in the switch, like registers and memory buffers, can be shared for BIST purposes. Also, the same BIST hardware resources can be used for different links connected to a switch. Furthermore, the test vectors in the memory have a regular structure, so logic can be used to generate them, which might consume less chip area than storing them in memory.

4.1.5. Analysis and results

The basic idea of the described method is to repeat the testing with one test vector until either the existence of the fault can be determined or a predefined number of test instances have been performed. In the following, a theoretical analysis is carried out to identify the relationship between the probability of delay faults and the expected number of test instances required to achieve a reliable test result.

Test capability as a function of measurement iterations

In Subsection 4.1.1, tl was defined as the time from when the data arrives at the receiver until the signal Write does. It was also described in which cases one instance of the test can unambiguously determine whether the fault is present. For a correct chip, the absence of the delay fault is established if an active clock edge occurs in the time period from when the data arrives until the signal Write arrives; the length of this period is |tl|. Similarly, for a chip with the delay fault, the fault is detected if an active clock edge occurs in the period from when the signal Write arrives until the data arrives; the length of this period is also |tl|. Assuming that the clock signal at the transmitter is independent of the clock signal at the receiver, there is no correlation between the arrival time of the signal Write and the clock phase at the receiver. The probability that one measurement can decide unambiguously whether the fault is present is therefore equal to the probability that at least one active clock edge appears in a time interval of length |tl|. Let g(tl) denote this probability:

g(t_l) = \begin{cases} |t_l| / T_R & |t_l| \le T_R \\ 1 & |t_l| > T_R \end{cases}   (4.1)

In practice, tl depends on process variations and other manufacturing effects. This means that tl is a stochastic variable.
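Equation (4.1) can be checked with a small Monte Carlo sketch (parameter values are illustrative): drop an interval of length |tl| at a uniformly random phase relative to the receiver clock and count how often an active edge falls inside it.

```python
import random

def g(tl, TR):
    """Probability that at least one active clock edge falls in an
    interval of length |tl|, receiver clock period TR (Eq. 4.1)."""
    return min(abs(tl) / TR, 1.0)

# Monte Carlo check: the position of the first edge after the interval
# starts is uniform on [0, TR) because the two clocks are independent.
random.seed(0)
TR, tl, trials = 1.0, 0.3, 200_000
hits = 0
for _ in range(trials):
    phase = random.uniform(0.0, TR)  # first edge after interval start
    hits += phase < abs(tl)          # edge lands inside the interval
estimate = hits / trials             # close to g(tl, TR) = 0.3
```

The uniform-phase assumption is exactly the "independent clocks" assumption stated in the text.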
We assume that tl is normally distributed with expected value µ and standard deviation σ. The nominal value of tl is then µ. Given µ and the probability that the delay fault we are looking for is present, σ can be determined. Let f(t) denote the density function of tl. Let p denote the probability that one instance of the test can detect whether the delay fault is present. Note that p is a probability that is itself a stochastic variable, because it depends on the outcome of tl. Let r(x) denote the density function of p. Because r(x) is the density function of a probability, its value is zero outside the interval [0, 1] and its integral over the real line is unity. To derive the relationship between r(x), g(tl) and f(t), we let a and b denote two real numbers such that a < b, and express the probability that p lies between these numbers. This is equal to the probability that the outcome of tl is a value for which the function value g(tl) lies in the interval (a, b). This is illustrated in Figure 4.4, in which the vertical axis indicates probability and the horizontal axis represents tl. The numbers a and b are shown on the vertical axis. Light gray lines illustrate for which values of tl the function g(tl) has a function value between a and b. The probability that p lies between a and b is therefore equal to the integral represented by the shaded areas in the figure.

Figure 4.4: Illustration of the relation between r(x), g(tl) and f(t)

Formally, the relation between r(x), g(tl) and f(t) can be expressed as:

\int_a^b r(x)\,dx = \int_{a < g(t) < b} f(t)\,dt \quad \forall (a, b),\ a < b   (4.2)

In the general case, g(t) can be split into sections that are monotonically increasing, monotonically decreasing and constant, in order to transform r(x) to explicit form. In our case g(t) has the constant value 1 for t < -TR and for t > TR. This means that r(x) has a Dirac impulse at x = 1.
We denote the Dirac impulse by δ(x). The interpretation of this Dirac impulse is that there is a certain probability that every instance of the test detects whether the fault is present, namely when |tl| ≥ TR. In the interval [0, TR), g(t) is linear, as it is in the interval (-TR, 0]. Each of these intervals contributes to r(x) in the interval [0, 1). So r(x) is:

r(x) = \begin{cases} \delta(x - 1) \cdot \left( \int_{-\infty}^{-T_R} f(t)\,dt + \int_{T_R}^{+\infty} f(t)\,dt \right) + T_R \cdot \left( f(T_R \cdot x) + f(-T_R \cdot x) \right) & 0 \le x \le 1 \\ 0 & x < 0;\ x > 1 \end{cases}   (4.3)

Because tl is normally distributed, f(x) is:

f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}   (4.4)

For a specific fault on a particular chip, p has a fixed value. Let n_one_fault be the number of test instances required to detect whether a fault is present. We assume that the chance that one instance can identify the presence of a fault is independent of the outcome of the other instances. With this assumption, the density function w(n) of n_one_fault follows a for the first time distribution. In the literature, the term geometric distribution is often used for this distribution, but there is also a slightly different distribution that is also called geometric distribution, which is why the term for the first time distribution is used here.

w(n) = \begin{cases} (1 - p)^{n-1} \cdot p & n \ge 1 \\ 0 & n < 1 \end{cases}   (4.5)

The expected value of n_one_fault is 1/p.

Need for a limit on number of iterations

In this part we prove that the expected number of iterations needed to unambiguously decide whether a chip has the delay fault we are targeting is infinite. Therefore, to make the test time finite, an upper limit is needed on the number of test instances after which the test for a fault is aborted. The variable p was previously defined as the probability that one instance of the test can decide whether the delay fault under consideration is present. Its density function is denoted r(x).
Equation (4.5) above shows how the number of test instances n_one_fault depends on p. The expected value of the required number of tests, E(n_one_fault), is therefore:

E(n_{one\_fault}) = \int_{-\infty}^{\infty} \frac{1}{x} \cdot r(x)\,dx   (4.6)

In Theorem 4.1 we show that E(n_one_fault) approaches positive infinity. The consequence is that we need to set a maximum number of times that a test instance is repeated.

Theorem 4.1: The expected value E(n_one_fault) approaches positive infinity.

Proof:

E(n_{one\_fault}) = \int_{-\infty}^{\infty} \frac{1}{x} \cdot r(x)\,dx
= \int_0^{1^+} \frac{1}{x} \left( T_R \cdot \left( f(T_R \cdot x) + f(-T_R \cdot x) \right) + \delta(x - 1) \cdot \left( \int_{-\infty}^{-T_R} f(t)\,dt + \int_{T_R}^{+\infty} f(t)\,dt \right) \right) dx
= \int_0^1 \frac{1}{x} \cdot \frac{T_R}{\sigma \sqrt{2\pi}} \left( e^{-\frac{(T_R \cdot x - \mu)^2}{2\sigma^2}} + e^{-\frac{(-T_R \cdot x - \mu)^2}{2\sigma^2}} \right) dx + \underbrace{\int_{-\infty}^{-T_R} f(t)\,dt + \int_{T_R}^{+\infty} f(t)\,dt}_{> 0}
\ge \frac{T_R}{\sigma \sqrt{2\pi}} \int_0^1 \frac{1}{x} \left( e^{-\frac{(T_R \cdot x - \mu)^2}{2\sigma^2}} + e^{-\frac{(-T_R \cdot x - \mu)^2}{2\sigma^2}} \right) dx

By the integration interval ending in 1+ we mean that the Dirac impulse at x = 1 is included in the integration. The two exponential functions in the above expression are continuous and positive, and they have no local minimum in the interval. Hence a lower bound on the integral can be found by replacing each exponential function with the smaller of its values at the end points of the integration interval:

E(n_{one\_fault}) \ge \frac{T_R}{\sigma \sqrt{2\pi}} \left( \min\!\left( e^{-\frac{(T_R \cdot 0 - \mu)^2}{2\sigma^2}}, e^{-\frac{(T_R \cdot 1 - \mu)^2}{2\sigma^2}} \right) + \min\!\left( e^{-\frac{(-T_R \cdot 0 - \mu)^2}{2\sigma^2}}, e^{-\frac{(-T_R \cdot 1 - \mu)^2}{2\sigma^2}} \right) \right) \int_0^1 \frac{1}{x}\,dx \to +\infty   (4.7)

since the two minima are positive constants and the last integral diverges. □

Determination of maximum number of repetitions of a test instance

From Theorem 4.1 we know that the expected number of test instances required to unambiguously classify a chip as faulty or fault free is infinite.
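Theorem 4.1 can also be made plausible numerically: near x = 0 the integrand r(x)/x behaves like a positive constant times 1/x, so truncating the integral at ever smaller lower limits gives values that grow without bound. The parameter values below are illustrative only:

```python
import math

# Numerical sketch of Theorem 4.1 (illustrative parameters mu, sigma, TR).
mu, sigma, TR = 0.5, 0.2, 1.0

def f(t):
    """Normal density of tl (Eq. 4.4)."""
    return math.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def integrand(x):
    """Continuous part of r(x)/x, from Eq. (4.3)."""
    return TR * (f(TR * x) + f(-TR * x)) / x

def truncated_expectation(eps, steps=100_000):
    """Midpoint rule for the integral of r(x)/x over [eps, 1]."""
    h = (1.0 - eps) / steps
    return sum(integrand(eps + (i + 0.5) * h) for i in range(steps)) * h

# Shrinking the cutoff by two decades at a time keeps increasing the value,
# consistent with divergence of the full integral.
values = [truncated_expectation(10.0 ** -k) for k in (2, 4, 6)]
assert values[0] < values[1] < values[2]
```

This is only a sanity check; the proof itself rests on the lower bound in Equation (4.7).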
Therefore, we need to terminate the loop in the algorithm Receiver after a reasonable number of iterations, such that the probability of the outcome "not able to judge" is acceptably low. Given that probability, a maximum number of repetitions l of a test instance can be computed. Let pk(n) be the probability that n test instances can judge whether the fault is present. For a given probability p that one instance can detect the fault, the probability that n instances detect it is 1 - (1 - p)^n. Because r(x) is the density function of p, the function pk(n) can be computed as:

p_k(n) = \int_{-\infty}^{+\infty} r(x) \cdot \left( 1 - (1 - x)^n \right) dx
= \int_0^{1^+} \left( \delta(x - 1) \cdot \left( \int_{-\infty}^{-T_R} f(t)\,dt + \int_{T_R}^{+\infty} f(t)\,dt \right) + T_R \cdot \left( f(T_R \cdot x) + f(-T_R \cdot x) \right) \right) \cdot \left( 1 - (1 - x)^n \right) dx
= \Phi\!\left( \frac{-T_R - \mu}{\sigma} \right) + 1 - \Phi\!\left( \frac{T_R - \mu}{\sigma} \right) + \frac{T_R}{\sigma \sqrt{2\pi}} \int_0^1 \left( e^{-\frac{(T_R \cdot x - \mu)^2}{2\sigma^2}} + e^{-\frac{(-T_R \cdot x - \mu)^2}{2\sigma^2}} \right) \cdot \left( 1 - (1 - x)^n \right) dx   (4.8)

In this expression, Φ(s) is the probability that a stochastic variable with a normal distribution with expected value zero and standard deviation unity is smaller than s. Equation (4.8) gives the relation between the acceptable probability of not being able to determine whether a fault is present and the maximum number of times a test instance is repeated.

Average number of iterations needed

Let pl be the actual number of test instances performed, given l. The parameter l was defined above as the maximum number of repetitions of a test instance. The density function of pl is a modification of the geometric probability distribution. Let ql(n) denote the density function of pl.
q_l(n) = \begin{cases} (1-p)^{n-1} \cdot p, & 1 \leq n < l \\ \sum_{k=l}^{\infty} (1-p)^{k-1} \cdot p = (1-p)^{l-1}, & n = l \\ 0, & n > l \ \text{or}\ n < 1 \end{cases}   (4.9)

Let hl(p) be the expected value of pl, given an l and a p. Then hl(p) is:

h_l(p) = \sum_{i=-\infty}^{+\infty} i \cdot q_l(i) = \sum_{i=1}^{l-1} i \cdot (1-p)^{i-1} \cdot p + l \cdot (1-p)^{l-1}   (4.10)

Let nb denote the expected value of hl(p). This is the expected number of instances of a test that will be performed when testing for a delay fault. The function hl(p) is in fact a deterministic function of the stochastic variable p with density function r(x):

n_b = \int_{-\infty}^{+\infty} r(x) \cdot h_l(x)\,dx   (4.11)

= \int_{0}^{1^{+}} \left( \delta(x-1) \cdot \left( \int_{-\infty}^{-T_R} f(t)\,dt + \int_{T_R}^{+\infty} f(t)\,dt \right) + T_R \cdot \left( f(T_R \cdot x) + f(-T_R \cdot x) \right) \right) \cdot \left( \sum_{i=1}^{l-1} i \cdot (1-x)^{i-1} \cdot x + l \cdot (1-x)^{l-1} \right) dx

To demonstrate the efficiency of the proposed method we have computed the expected number of test instances required from Equation (4.11). This computation is made for three different values of each of the following two parameters:
1. The ratio between E(tl) (the expected value of tl) and TR
2. The probability of a fault in the tested link.
The value l, the upper limit on the number of test instances, is chosen such that the probability of stopping a test before the chip's faultiness has been determined is one tenth of the probability pf that the fault is present. For moderate defect probabilities, this choice means that about ten percent of the chips not measured as good nevertheless have no defects. Table 4.1 shows the average number of test instances required. The number of test instances required is larger for smaller nominal values of tl and for chips with a higher probability of delay faults in links. The reason in both cases is that there is a larger probability mass for values of tl close to zero, which reduces the probability that a test instance can determine whether a fault is present.
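Equations (4.10) and (4.11) can be evaluated directly. The sketch below implements h_l(p) and a midpoint-rule approximation of n_b; the parameter values are illustrative assumptions. The boundary cases p = 1 (one instance always suffices) and p = 0 (the loop always runs to the limit l) give h_l = 1 and h_l = l respectively, a useful sanity check.

```python
import math

def phi(s):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))

def f(t, mu, sigma):
    """Gaussian density of the timing margin t_l."""
    return math.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def h_l(p, l):
    """Expected number of test instances, given judging probability p (Eq. 4.10)."""
    return sum(i * (1 - p) ** (i - 1) * p for i in range(1, l)) + l * (1 - p) ** (l - 1)

def n_b(l, mu, sigma, T_R, steps=20000):
    """Expected number of test instances over the chip population (Eq. 4.11)."""
    # Dirac mass at x = 1: when |t_l| >= T_R a single instance always judges.
    point_mass = phi((-T_R - mu) / sigma) + 1.0 - phi((T_R - mu) / sigma)
    acc = point_mass * h_l(1.0, l)
    h = 1.0 / steps
    for i in range(steps):
        x = (i + 0.5) * h
        acc += T_R * (f(T_R * x, mu, sigma) + f(-T_R * x, mu, sigma)) * h_l(x, l) * h
    return acc

print(n_b(l=20, mu=1.0, sigma=0.3, T_R=1.0))
```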
Table 4.1: Average number of test instances required

                        Fault probability pf
  E(tl)/TR        0.1       0.01      0.001
  1.0             2.6       1.8       1.4
  0.5             4.8       3.3       2.6
  0.1             24        16        13

From Table 4.1 we can see that a relatively small number of repetitions of each test measurement is needed on average, especially for chips with a low probability of fault. The table displays results for relatively high fault probabilities; for a useful chip process, the fault probability of each individual fault cannot be as high as in these examples. In such cases the figures are lower, but never lower than one. Hence it can be concluded that only a few repetitions are needed on average in all practical cases.

Single read method

An alternative test method would read the data only at the clock edge before the arrival of the signal Write; the data at the clock edge after the arrival of Write would not be read. For good circuits the number of iterations needed would be the same as for the method presented above, and reading the data only once would reduce the hardware overhead. The drawback, however, is that a faulty chip would never be detected during any single instance of the test. Instead, iteration would continue until the limit was reached, upon which the chip would be marked as faulty. This alternative method is therefore inefficient if the probability of a fault is high. Figure 4.5 shows how much worse the alternative method is, in expected number of iterations, as a function of fault probability. The conditions are the same as for the results presented in Table 4.1, and the three curves represent three different values of E(tl)/TR. We can see from the diagram that the difference in efficiency, in terms of number of iterations, is only significant when the probability of a delay fault is large. For example, when the fault probability is 0.001, the alternative method needs about 25 percent more iterations than the method presented above.
This is a relatively high fault probability; for lower fault probabilities the difference in the number of iterations is smaller. The conclusion is that for moderate fault probabilities, using the alternative method only increases the expected number of iterations slightly.

[Figure 4.5: Comparison between delay test methods — difference in expected number of iterations versus fault probability (0.001 to 0.1), with one curve for each of E(tl)/TR = 1.0, 0.5 and 0.1]

Discussion of results

Factors such as ageing and temperature variations may affect the value of tl. For a communication link whose tl is positive and close to zero, such factors might cause tl to become negative; a negative tl means that there is a delay fault. For positive values of tl, a larger value means a larger fault margin. As described in Subsection 4.1.1, a small |tl| relative to the receiver clock period gives a small probability that one instance of the test can determine whether the fault under consideration is present. Consequently, if the fault margin is small, the probability that one instance of the test can make this determination is also small. This means that for chips with smaller fault margins, the test has a higher probability of being unable to determine whether the fault is present. Hence, most of the chips that the test cannot classify are likely to have a low fault margin or to be faulty. Chips that pass the test therefore have a higher fault margin on average, compared to the entire set of chips without the fault under consideration.

4.1.6. Testing systems where different clock domains share a clock oscillator

In chips using the GALS approach, it is possible to generate the clocks for different clock domains from the same oscillator. In such systems the phase difference between the clocks of two different clock domains can be arbitrary, but it is relatively constant for each particular chip.
As a consequence, such systems do not exhibit the non-determinism described above. The testing technique for such dependent clock domains is simpler than for independent ones, since with independent domains we have two unknown parameters, namely the relative delay of the control signals and the time-varying phase difference of the clocks. With dependent domains the phase difference is fixed. It is therefore sufficient to test whether a normal data transfer works correctly under the worst-case condition for a fault. By normal data transfer we mean that data is transferred as in normal operation. The worst-case condition occurs when the signal Write arrives at the receiver as early as possible and the data arrives at the receiver as late as possible. In some cases, however, the test will pass although the signal Write arrives at the receiver before the data is stable. As long as the phase difference between the clocks is constant this is not a problem, because the clock edge at which the data is read will not appear before the data has stabilized. However, even when the clocks are dependent, it might not be possible to guarantee that the phase difference between the clocks is constant. In such a case it is necessary, for chips that pass the test, to perform a test confirming that the signal Write actually arrives after the data is stable. A measurement as in the test method presented above can be used. However, repetition of such a measurement is not necessary in this case; it is enough to read the data once, as in the single read method described in Subsection 4.1.5. The test finds that the fault is absent if the data is read correctly at the last active clock edge before the arrival of the Write signal. Such a measurement therefore always finds that the fault being tested for is absent whenever tl ≥ TR, but for systems where 0 ≤ tl < TR the outcome depends on where in the clock interval the signal Write arrives at the receiver; see Subsection 4.1.1.
In systems where 0 ≤ tl < TR, one option for making this test method work is to implement test logic such that the two clock domains can be clocked with uncorrelated clocks during the test.

4.2 Method for scheduling wires as victims

In many cases, closely packed buses interconnecting cores are laid out on several interconnect layers to minimize chip area and wire length. When testing for defects that cause too much crosstalk, the maximum aggressor fault model [Cuv99] can be used, selecting one wire at a time as the victim wire. This, however, leads to very long test times. The probability that a fault causing too much crosstalk affects a pair of wires with many other wires between them is very low. Therefore, several wires that are not too close to each other can be tested simultaneously in order to reduce the test time. Deciding how close simultaneously tested wires may be to each other is a tradeoff between test accuracy and test time. Of course, this can only be done if layout information about the interconnecting wires is available.

In this section we propose a test method for crosstalk-fault detection that uses a small programmable BIST hardware module. The hardware can be programmed to provide the desired tradeoff between test time and the minimum distance between wires that are tested simultaneously. For hard IP cores, programmability is especially useful because such cores usually cannot be modified when they are included in a chip design.

4.2.1. Victim scheduling principles

In the presented method a test sequence is generated such that each wire is scheduled as the victim exactly once. As many wires as possible are tested simultaneously, as long as the distance constraint is met. The goal is to minimize the number of sets of wires tested simultaneously. This problem is equivalent to a graph-coloring problem.
Let each line be a node in a graph and let there be an edge between each pair of lines that must not be considered as victims simultaneously. The number of colors needed to color this graph is then proportional to the number of test vectors needed to test the bus for crosstalk faults. Physical layout information is needed to generate the graph. If we assume an arbitrary layout, there are many different ways in which edges can be generated. In this section we suggest a method that makes some assumptions about the layout, to keep the hardware small while still flexible. These assumptions restrict the solution space of the corresponding graph-coloring problem.

The first restriction is that the lines are positioned in a mesh topology: several metal layers are used, and in each metal layer the wires are placed at a uniform distance and in the same direction, as shown in Figure 4.6. The second restriction is that the properties of the wires are assumed to be the same in all metal layers used for the bus.

The features of the layout and the desired accuracy define whether a pair of lines can be considered as victims simultaneously. Due to the layout restrictions described above, a simplification of these input parameters can be used without any significant loss in accuracy. Let the z-direction represent positions in the height direction, such that different metal layers have different positions in the z-direction. Let the y-direction represent positions in the sideways direction between wires (see Figure 4.6). Let pitch distance refer to distances measured in the pitch of the wires in each respective direction. In the z-direction this means that the pitch distance between two wires in adjacent metal layers is 1, and between two wires with one metal layer in between, it is 2. In the y-direction, two adjacent wires in the same metal layer have a pitch distance of 1. Pitch distances are integer values.
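As a sketch of the graph-coloring formulation (not of the shift-register hardware described next), the following code builds the conflict graph from a y/z pitch-distance table and colors it greedily. The layout parameters follow the illustrative example of Subsection 4.2.3; greedy coloring is only a heuristic, so it need not reach the minimum number of victim sets.

```python
import itertools

def build_conflict_graph(layers, wires_per_layer, d_y):
    """Edges connect wires that are too close to be victims simultaneously.

    d_y maps a pitch distance in z-direction to the minimum allowed pitch
    distance in y-direction (cf. Table 4.2); distances beyond the table
    are unconstrained (0)."""
    nodes = [(z, y) for z in range(layers) for y in range(wires_per_layer)]
    edges = set()
    for a, b in itertools.combinations(nodes, 2):
        dz = abs(a[0] - b[0])
        dy = abs(a[1] - b[1])
        if dy < d_y.get(dz, 0):
            edges.add((a, b))
    return nodes, edges

def greedy_coloring(nodes, edges):
    """Assign each wire a test set (color); same color => tested together."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    color = {}
    for n in nodes:
        used = {color[m] for m in adj[n] if m in color}
        color[n] = next(c for c in itertools.count() if c not in used)
    return color

# Example: 4 layers, 8 wires each, constraints as in Table 4.2
d_y = {0: 3, 1: 2, 2: 1}   # pitch distance >= 3 layers apart: no constraint
nodes, edges = build_conflict_graph(4, 8, d_y)
colors = greedy_coloring(nodes, edges)
print(max(colors.values()) + 1)   # number of victim sets needed by this heuristic
```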
[Figure 4.6: Multi-layered layout of interconnection wires — cross-section showing the wires above the substrate, with the y- and z-directions indicated]

Due to the differences between the dimensions in the z-direction and the y-direction, a single minimum distance between wires that can be tested simultaneously is an unnecessarily coarse parameter for deciding which wires to consider as victims simultaneously. Instead, we treat the pitch distance in the y-direction and the pitch distance in the z-direction as two different quantities. A table can then be used that defines, for each possible pitch distance in one of the directions z and y, the smallest allowed pitch distance in the other direction.

4.2.2. Basic idea of victim selection method

The basic idea of the presented method is to use a shift register to decide which lines should be victims at the same time. One bit in the shift register is used for each line, identifying the line as a victim or an aggressor. Logic value 1 in a bit of the shift register means that the corresponding line is considered a victim, and logic value 0 means that it is considered an aggressor. Figure 4.7 shows how this shift register is used to select lines as victims or aggressors. For the entire bus, one state machine, designed according to some test strategy, generates the signals that define how victims and aggressors should behave. These signals are denoted A and V in the figure.

[Figure 4.7: BIST hardware]

Figure 4.8 illustrates the relationship between the bits in the shift register and the actual lines. Each row corresponds to a metal layer. The register first runs through each line of the highest metal layer from left to right, then continues in the same way through the next metal layer, and so on. The boxes represent a cross-section of the wires' layout and the arrows represent how the bits in the corresponding shift register are shifted. An initial assignment to the shift register is made to obtain the correct distance between victims.
[Figure 4.8: Shift register and cross section of bus]

4.2.3. Illustrative example

As an illustration, we use a bus with 32 wires distributed equally over four equivalent metal layers. For this example we assume that the layout constraints, and the tradeoffs between test time and test accuracy, have resulted in the values in Table 4.2, which gives the minimum allowed pitch distance in the y-direction between two lines that can be victims simultaneously, given their pitch distance in the z-direction.

Table 4.2: Example of distance constraints

  Pitch distance in z-direction            0   1   2   ≥3
  Minimum pitch distance in y-direction    3   2   1   0

To decide the initial assignment of the shift register we use the following method. First we select the wire in the upper left corner as a victim. Then we select as a victim the leftmost wire in the second row that is not too close to the first victim wire. Similarly, we select victims in the subsequent layers such that they form a slanting line, as shown in Figure 4.9, where the wires assigned as victims in this step are drawn as solid boxes.

[Figure 4.9: Selection of initial victims]

The next step is to assign the second victim in the first row. We take the leftmost line that is not too close to any of the wires assigned as victims so far; in Figure 4.9 this line is drawn striped. After this operation we can see that the pitch distance between the victims in the first row is five in this example. This tells us to assign every fifth element in each row of the shift register as a victim and the others as aggressors. In Figure 4.10, the wires are numbered in the order in which they are considered as victims.
  1 2 3 4 5 1 2 3
  4 5 1 2 3 4 5 1
  2 3 4 5 1 2 3 4
  5 1 2 3 4 5 1 2

Figure 4.10: Victim assignment order

In this case we were lucky, because the distance through the shift register from the rightmost victim in row one to the first victim in row two equals the pitch distance between the victims in row one. Assume instead that there were six wires in each layer instead of eight. Then we would have to delay the bits between the rows with the help of extra dummy flip-flops; in this example, the bits would have to be delayed by two cycles. Figure 4.11 illustrates what the shift register then looks like.

[Figure 4.11: Need of dummy flip-flops — dummy flip-flops inserted between consecutive rows of the shift register]

4.2.4. General description of method

In this subsection, the method for selecting the initial victims and computing the number of dummy flip-flops is described formally for the general case, together with how the dummy flip-flops should be initialized. Given w, the number of wires in each metal layer, we compute two parameters: sd, the distance between nearest victims in the shift register (including dummy flip-flops), and nd, the number of dummy flip-flops in each row. Another input is the information about how close wires may be to each other while still being considered as victims simultaneously. This information is sufficient both to design the shift register and to initialize it. Every sd-th flip-flop in the chain, starting with the first one, should be initialized to indicate that the corresponding wire acts as a victim. All other flip-flops should be initialized to indicate that the corresponding wire is an aggressor. Let dy(z) be the minimal pitch distance in the y-direction between two wires that can be considered as victims simultaneously, given a pitch distance z in the z-direction. The domain of this function is the non-negative integers and its range is a subset of the non-negative integers.
This function is non-increasing, and dy(α) = 0 for all α greater than some integer. Table 4.2 in Subsection 4.2.3 is an example of what this function might look like. Parameters sd and nd can be determined by the following algorithm. The input data to the algorithm are the function dy(z) and the variable w, the number of wires in each metal layer.

 1  ALGORITHM Determine nd and sd
 2    sd ← dy(0)
 3    i ← 1
 4    WHILE dy(i) > 0 LOOP
 5      t ← i * dy(1) - dy(i)
 6      u ← i * dy(1) + dy(i)
 7      IF t < sd < u THEN
 8        sd ← u
 9      END IF
10      i ← i + 1
11    END LOOP
12    p ← MAXIMUM(0, sd - w)
13    q ← MAXIMUM(w, sd)
14    v ← (q + dy(1)) MODULO sd
15    IF v = 0 THEN
16      nd ← p
17    ELSE
18      nd ← p + sd - v
19    END IF
20  END ALGORITHM

Lines 2 – 11 of the algorithm can be illustrated with the help of Figure 4.9 in Subsection 4.2.3. The victims marked in solid black in that figure are the basis for this part of the algorithm. The algorithm imagines another victim in the uppermost layer and determines how far to the left it can be placed without coming too close to any of the victims marked in solid black. The distance between this second victim and the black one in the upper left corner becomes the value of sd. At line 2 this second victim is placed as far to the left as possible without coming too close to the solid black victim in the uppermost row. In the while loop at lines 4 – 11 the other solid black victims are considered in order, each one being checked to see whether the second victim is too close to it (lines 5 – 7). If this is the case, the second victim is moved to the right (line 8). Lines 12 – 19 determine the number of dummy flip-flops.

4.3 Method for test of crosstalk-faults causing glitches

Section 3.2.4 illustrated how a glitch on a control line can cause the transmitter and the receiver to lose their synchronization.
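The pseudocode above translates directly into code. The following sketch is a transcription of that algorithm; applied to the example of Subsection 4.2.3 (Table 4.2) it reproduces the values derived there: sd = 5 with no dummy flip-flops for w = 8, and two dummy flip-flops per row for w = 6.

```python
def determine_sd_nd(d_y, w):
    """Transcription of the pseudocode ALGORITHM 'Determine nd and sd'.

    d_y(z) is the minimum y-pitch distance for simultaneous victims at
    z-pitch distance z (non-increasing, eventually 0); w is the number
    of wires in each metal layer."""
    sd = d_y(0)                      # line 2
    i = 1
    while d_y(i) > 0:                # lines 4-11
        t = i * d_y(1) - d_y(i)
        u = i * d_y(1) + d_y(i)
        if t < sd < u:
            sd = u                   # second victim moved to the right
        i += 1
    p = max(0, sd - w)               # lines 12-19: number of dummy flip-flops
    q = max(w, sd)
    v = (q + d_y(1)) % sd
    nd = p if v == 0 else p + sd - v
    return sd, nd

# Distance constraints from Table 4.2
d_y = lambda z: {0: 3, 1: 2, 2: 1}.get(z, 0)
print(determine_sd_nd(d_y, 8))   # → (5, 0): every fifth bit a victim, no dummies
print(determine_sd_nd(d_y, 6))   # → (5, 2): two dummy flip-flops per row
```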
Glitches on control lines are more dangerous than glitches on data lines because of the risk of the transmitter and the receiver losing their synchronization. It is also more difficult to test for glitches on the control lines than on the data lines. In this section we first show a method for detecting glitches on control lines and then show how this method can be extended to detect glitch faults that also affect data lines. Glitches that make a wire take a higher potential than it should, for a very short time, are referred to as positive glitches. Negative glitches are those that make a wire take a lower potential than it should.

4.3.1. Testing control lines for glitch faults

In an asynchronous link, synchronization between the receiver and the transmitter is achieved using handshaking signals. During testing, the transmitter and the receiver need to agree on what to test. Faults causing glitches can, under some circumstances, make the transmitter and the receiver lose their consensus about which phase of the test is currently in progress. We show how a test can be designed to avoid that risk. The test is designed such that the signaling between the receiver and the transmitter for agreement on the current test phase is not sensitive to glitches that have not yet been tested for. As the test proceeds, more and more faults causing glitches are tested for. As soon as the absence of a potential glitch fault has been established, the signaling for agreement on the current test phase assumes that this glitch fault does not exist. In this section we show how this can be achieved using the control lines RTR and Write, so that the test can detect both negative and positive glitch faults on the control lines. We let RTR and Write change as they do during normal operation. The transmitter can make the data lines behave as aggressors causing interference on the control lines.
In some cases it might be desirable to let wires outside the communication link under test act as aggressors as well. In such a case we assume that a mechanism in the transmitter can be used to activate those aggressors; an extra output control signal from the transmitter can be used for this purpose. This assumption allows us to design the test logic at the behavioral level of abstraction, without information about the final layout.

To perform testing for glitches, one glitch detector is used for each glitch we are testing for. The glitch detectors are placed on the receiver side for Write and on the transmitter side for RTR. Each glitch detector is an SR-latch. The control signal on which glitches should be detected is connected to one of the inputs of this SR-latch, and a glitch test controller is connected to the other input and to the output. Figure 4.12 shows one glitch detector for negative glitches and one for positive glitches, connected to a controller.

[Figure 4.12: Glitch detectors with controller — the victim wire feeds both detectors, and the controller provides the reset signal]

The data transfer cycle can be considered as four different phases. Figure 4.13 shows the control signals in these phases. The shaded areas mark phases in which the respective signal is not supposed to change; during normal operation a change in any of those areas is never noticed. We utilize these areas for the detection of glitches. In phase 1, for example, we test for positive glitches on RTR.

[Figure 4.13: Four phases of control signals — waveforms of RTR and Write over phases 1 to 4]

Under normal operation, the data lines are supposed to be stable in phase 4 and in phase 1. In these phases, negative glitches on RTR and on Write, respectively, are the ones that can cause errors. This means that under normal operation the data lines will never induce dangerous negative glitches on the control lines.
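The latching property that makes an SR-latch suitable as a glitch detector can be illustrated with a small behavioral model. This is an illustrative sketch, not the thesis hardware: it only captures the fact that any momentary deviation of the monitored line from its expected stable level stays registered until the controller resets the detector.

```python
class GlitchDetector:
    """Behavioral model of an SR-latch glitch detector (illustrative sketch).

    A pulse on the monitored line sets the latch; the output stays set
    until the controller resets it, so even a very short glitch that has
    long since passed remains observable when the detector is checked."""

    def __init__(self):
        self.caught = False

    def reset(self):
        """Controller reset before a test phase begins."""
        self.caught = False

    def sample(self, victim_level, expected_level):
        # Any deviation from the expected stable level latches as a glitch.
        if victim_level != expected_level:
            self.caught = True

det = GlitchDetector()
det.reset()
for level in [0, 0, 1, 0, 0]:   # a short positive glitch on a line expected low
    det.sample(level, expected_level=0)
print(det.caught)               # the glitch stays latched after it has passed
```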
Faults causing negative glitches on RTR and on Write are tested in phase 3 and phase 4 respectively. Hence, only aggressors other than the data lines should be activated in these test phases.

4.3.2. Steps in the proposed test method

The transmitter can, in principle, test for glitches on RTR by itself. In phase 1 and phase 3, testing can be done for positive and negative glitches respectively. The transmitter simply makes the aggressor wires change and measures whether those changes affect RTR. It is important that glitches do not affect the signal Write, because that could mislead the receiver into believing that a change to phase 2 or phase 4 has occurred. With the chosen polarity of the signals Write and RTR, these signals have opposite voltage levels in phase 1 as well as in phase 3. Thus, stimulus at the aggressor wires causing glitches on RTR will not affect Write.

Glitches on Write are a little trickier to test, because the transmitter activates the aggressors but it is the receiver that detects whether a glitch is present. Phases 2 and 4 are used for testing for positive and negative glitches on Write respectively. When the aggressors are activated, both RTR and Write might get glitches. Glitches on RTR might cause the transmitter to believe that the current phase of the test is complete; however, such glitches have already been tested for in the preceding phase.

Phase 2 and phase 4 start when the transmitter changes the signal Write. When one of these phases is entered, the receiver needs to prepare its glitch detectors for the detection of glitches. It is, however, not possible for the receiver to tell the transmitter that it is ready. Instead, the transmitter waits sufficiently long after it has changed Write to ensure that the receiver is ready to detect glitches. A more detailed analysis of the time needed for this test is presented in the next subsection.
In phase 2 the transmitter changes the value of the aggressors, including the data lines, to generate interference on Write. The change of the data lines can be used by the receiver as a signal that the aggressors have been activated. The receiver can then check its glitch detector and let the test proceed to the next phase. In phase 4 the data lines are not part of the aggressors. A data line can, however, still be used by the transmitter to inform the receiver that the aggressors have been activated: the transmitter changes the logic value of that data line when it activates the aggressors.

4.3.3. Test sequence timing analysis

Initialize
  Transmitter: 1. Write ← 1; aggressor lines ← 00..0
  Receiver:    1. RTR ← 0

Phase 1: Test for positive glitches at RTR
  Transmitter: 1. Reset glitch detectors for positive glitches on RTR
               2. Aggressor lines ← 11..1
               3. Check the glitch detector for positive glitches on RTR; if a glitch has occurred, report fault and exit test

Change to Phase 2
  Transmitter: 1. Write ← 0; aggressor lines ← 00..0
  Receiver:    1. Change to Phase 2 when Write goes from 1 to 0

Phase 2: Test for positive glitches at Write
  Transmitter: 1. Wait long enough to ensure that the receiver is ready
               2. Aggressor lines ← 11..1
  Receiver:    1. Reset glitch detectors for positive glitches on Write
               2. Wait until the data lines have changed to 1
               3. Check the glitch detector for positive glitches on Write; if a glitch has occurred, report fault and exit test

Change to Phase 3
  Transmitter: 1. Change to Phase 3 when RTR goes from 0 to 1
  Receiver:    1. RTR ← 1

Figure 4.14: Test sequence for glitch faults on control lines

As argued above, phase 1 must precede phase 2, and phase 3 must precede phase 4 in the test sequence. The test therefore needs to start in either phase 1 or phase 3. In the following description we let the test start in phase 1. The test is started by forcing the system into phase 1.
This can be done through a common signal or through a sequence given to both the transmitter and the receiver. The aggressor wires must simultaneously be initialized to zero, because they should change from zero to one in this phase; a change from one to zero might destroy the test sequence. A change of the aggressors from one to zero in phase 1 might cause a glitch on Write, which could make the receiver proceed to phase 2, and further to phase 3, while the transmitter is still in phase 1. Figure 4.14 shows the test sequence during phase 1 and phase 2, as described in Subsection 4.3.2, with the actions of the transmitter and the receiver. The test in phase 3 and phase 4 works analogously.

To determine the efficiency of the test, we analyze the time required to perform the glitch test. The analysis assumes that the clock generators of the different clock domains are independent. Let clkT and clkR be the clock signals of the transmitter and the receiver respectively, with clock periods TT and TR. Let TTmin, TRmin, TTmax and TRmax be the minimum and maximum clock periods of the respective clock signals. Based on this information we compute the worst-case time for the test.

Figure 4.15 shows a timing diagram for phase 1 and phase 2 of the test. The signals g-res are the signals that reset the glitch detectors. The signal aggr represents the aggressor wires, including the data wires when applicable. The signal named enter represents a sequence or signal that makes the receiver and the transmitter enter the mode for this test. There can be a time difference between the time point at which the signal enter reaches the transmitter and the time point at which it reaches the receiver. Let Tentermax be the maximum time difference that can occur. In Figure 4.15 this time difference is represented by the shaded area of the signal enter.
For the signals Write and RTR, the delays are likewise represented by shaded areas in Figure 4.15; let the worst-case delays of these signals be TWritemax and TRTRmax respectively. There might also be a delay from when the transmitter activates the aggressors until the interference occurs. Let Taggrmax be this time in the worst case; in Figure 4.15 this delay is also shown as shaded areas.

[Figure 4.15: Timing diagram for test time analysis of glitch test — transmitter signals enter, clkT, Write, aggr and g-res; receiver signals clkR, RTR and g-res; waiting intervals n1·TT, n2·TT and n3·TT; the numbered points 1 – 4 are referenced in the text]

For both the transmitter and the receiver, it takes in the worst case one clock cycle from when the signal enter is asserted, indicating that the test should start, until an action is taken based on that change. For the transmitter, it thus takes at most one clock cycle until the signals Write, data and g-res are assigned their initial values. Before the transmitter sets its signal g-res to 0, it needs to wait long enough that the receiver has definitely set RTR to zero. In the worst case from this point of view, the transmitter detects the change of the signal enter directly, while the receiver needs one clock cycle to detect it; in addition, the signal enter arrives at the receiver up to Tentermax time units after it arrives at the transmitter. The time from when the transmitter detects the assertion of the signal enter until the change of RTR to 0 has reached the transmitter is therefore at most Tentermax + TRmax + TRTRmax. Hence the transmitter needs to wait at least n1 clock cycles from when it detects the change of the signal enter until it sets g-res to 0 and proceeds with the test. Parameter n1 is given by:

n1 = ⌈(Tentermax + TRmax + TRTRmax) / TTmin⌉   (4.12)

One transmitter clock cycle after this change of g-res, the aggressors are activated by changing them from 0 to 1 (point 1 in Figure 4.15).
After that, the transmitter waits long enough before proceeding that possible disturbances have reached the glitch detector for RTR. Waiting long enough here means that the change of the aggressor wires has completed and that any glitches on RTR have reached the transmitter. This time is in the worst case Taggrmax + TRTRmax. To ensure this, the transmitter needs to wait n2 clock cycles, where:

n2 = ⌈(Taggrmax + TRTRmax) / TTmin⌉   (4.13)

The transmitter then checks its glitch detector and reports any faults. Simultaneously, the signal Write is set to 0 to make the test proceed into phase 2 (point 2 in Figure 4.15), and at the same time the aggressors are set to 0. After that, the transmitter needs to wait long enough before it changes the aggressors from 0 to 1 to generate interference, so that the receiver can prepare for the test. For this preparation, the receiver needs to reset its glitch detectors, keeping the signal g-res high until the aggressor wires have definitely stabilized at zero. That time is Taggrmax. The number of clock cycles nR1 the receiver needs to keep g-res at 1 is therefore:

nR1 = ⌈Taggrmax / TRmin⌉   (4.14)

From the transmitter's point of view, it takes in the worst case TWritemax until the change of the signal Write reaches the receiver, and after that up to TRmax until the receiver detects the change and sets its signal g-res to 1. The time from when the transmitter changes Write from 1 to 0 until it can be sure that the receiver has set g-res to 1 and then back to 0 is then TWritemax + TRmax + nR1 · TRmax time units.
Therefore the transmitter needs to wait n3 clock cycles to guarantee that the receiver is ready, where n3 is given as:

n3 = ⌈(TWritemax + TRmax + nR1 · TRmax) / TTmin⌉ = ⌈(TWritemax + (1 + nR1) · TRmax) / TTmin⌉ (4.15)

After waiting, the transmitter activates the aggressors (point 3 in Figure 4.15). When the transmitter has activated the aggressors, the receiver needs to wait long enough that the activation has affected all the aggressor wires and any glitches on the signal Write have reached the glitch detector. That time is Taggrmax + TWritemax time units. The number of clock cycles nR2 the receiver needs to wait is then defined as:

nR2 = ⌈(Taggrmax + TWritemax) / TRmin⌉ (4.16)

After waiting, the receiver checks its glitch detector for the signal Write (point 4 in Figure 4.15) and at the same time changes RTR from 0 to 1 to proceed to phase 3. In the worst case, the time from when the transmitter activates the aggressors (point 3 in Figure 4.15) until the receiver starts to wait its nR2 clock cycles is TWritemax + TRmax time units. The time from when the transmitter activates the aggressors until the receiver changes RTR from 0 to 1 is:

Taggrmax + TRmax + nR2 · TRmax = Taggrmax + (1 + nR2) · TRmax (4.17)

Summing up all the times described above gives the time needed from when the test is activated until phase 3 is entered. In the worst case, this time is:

TTmax + n1 · TTmax + TTmax + n2 · TTmax + n3 · TTmax + (Taggrmax + (1 + nR2) · TRmax) (4.18)

Test phases 3 and 4 work analogously to test phases 1 and 2, but with a difference in the timing at the beginning. This difference comes from the fact that the transmitter starts phase 3 when RTR is 1 instead of when the signal enter is activated. When the transmitter finds that RTR is 1, no more waiting time is needed to ensure that RTR is stable; only the glitch detectors need to be reset.
It is possible to let the signal g-res for the transmitter be 1 as early as in phase 2 (not shown in Figure 4.15). In such a case, the signal g-res can be set to 0 as soon as the transmitter detects that RTR is set to 1. The time from the moment the receiver changes RTR to 1 until the transmitter detects that change is, in the worst case:

TRTRmax + TTmax (4.19)

So the time from when test phase 2 is finished until phases 3 and 4 are completed is:

TRTRmax + TTmax + TTmax + n2 · TTmax + n3 · TTmax + (Taggrmax + (1 + nR2) · TRmax) (4.20)

The time needed for the entire test is then:

testtime = TTmax + n1 · TTmax + TTmax + n2 · TTmax + n3 · TTmax + (Taggrmax + (1 + nR2) · TRmax)
           + TRTRmax + TTmax + TTmax + n2 · TTmax + n3 · TTmax + (Taggrmax + (1 + nR2) · TRmax)
         = TTmax · (4 + n1 + 2·n2 + 2·n3) + TRTRmax + 2·(Taggrmax + (1 + nR2) · TRmax) (4.21)

Let us look at the case where the transmitter and the receiver have clock generators designed to have the same frequency. Then we can assume TTmax = TRmax and TTmin = TRmin. The accuracy of the clock generators can be described by a parameter k such that TTmax = k · TTmin. Let us also assume that the worst-case delay is the same for all wires and that the maximal skew in the signal enter has the same value, i.e. TWritemax = TRTRmax = Tentermax = Taggrmax. Let c be defined such that TRTRmax = c · TTmax.
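Before specializing to matched clocks, the worst-case test time of Equation (4.21) can be evaluated directly from raw delay figures. The sketch below folds in the wait-cycle equations (4.12)–(4.16); all numeric delay values are illustrative assumptions:

```python
import math

def glitch_test_time(TT_min, TT_max, TR_min, TR_max,
                     T_enter_max, T_write_max, T_rtr_max, T_aggr_max):
    """Worst-case total glitch-test time per Equation (4.21)."""
    n1 = math.ceil((T_enter_max + TR_max + T_rtr_max) / TT_min)
    n2 = math.ceil((T_aggr_max + T_rtr_max) / TT_min)
    nR1 = math.ceil(T_aggr_max / TR_min)
    n3 = math.ceil((T_write_max + (1 + nR1) * TR_max) / TT_min)
    nR2 = math.ceil((T_aggr_max + T_write_max) / TR_min)
    # Phases 1-2 plus phases 3-4, Equation (4.21)
    return (TT_max * (4 + n1 + 2 * n2 + 2 * n3)
            + T_rtr_max + 2 * (T_aggr_max + (1 + nR2) * TR_max))

# Assumed example: ideal matched clocks of period 1, all wire delays 0.5.
print(glitch_test_time(1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5))  # 19.5
```

With matched transmitter and receiver clock generators, these raw delays collapse to the two parameters k and c introduced in the text.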
In such a case, the following simplifications can be made for n1, n2, n3, nR1 and nR2:

n1 = ⌈(k·TTmin + 2ck·TTmin) / TTmin⌉ = ⌈k + 2ck⌉ (4.22)

n2 = ⌈2ck·TTmin / TTmin⌉ = ⌈2ck⌉ (4.23)

nR1 = ⌈2ck·TTmin / TTmin⌉ = ⌈2ck⌉ (4.24)

n3 = ⌈(ck·TTmin + (1 + nR1)·k·TTmin) / TTmin⌉ = ⌈ck + (1 + 2ck)·k⌉ (4.25)

nR2 = ⌈2ck·TTmin / TTmin⌉ = ⌈2ck⌉ (4.26)

Putting these values into Equation (4.21) and changing its parameters according to the assumptions above gives:

testtime = (4 + ⌈k + 2ck⌉ + 2·⌈2ck⌉ + 2·⌈ck + (1 + 2ck)·k⌉)·TTmax + 3ck·TTmax + 2·(2ck + 1)·TTmax (4.27)

This function is non-decreasing in k and in c. The test time is overestimated if these parameters are rounded upwards to integers:

testtime ≤ (6 + 3k + 15ck + 4ck²)·TTmax (4.28)

For systems with clocks accurate enough that the fastest possible clock is not more than twice as fast as the slowest possible one, k is smaller than 2. In this case, fewer than 50 clock cycles are needed to perform this test for c ≤ 1, and fewer than 88 clock cycles for c ≤ 2. These figures show that the number of clock cycles for this test is relatively small. To the best of our knowledge, this test problem, which addresses detection of crosstalk-induced glitches on links between different clock domains while avoiding test problems that can occur due to glitches not yet tested for, has not previously been addressed. The main importance of this contribution is therefore the method itself; the purpose of calculating these figures is to show that the method can be executed within a relatively small number of clock cycles.

4.3.4. Glitches on data lines

Glitches affecting data lines can cause erroneous reads if they occur when the data is read. The reading is done in phase 4 (Figure 4.13). Hence it is only in this phase that glitches on data lines can cause errors. If the data lines are stable during this phase, there is no risk that any data line will cause severe glitches on any other.
As a consequence, all data lines can be considered as victims simultaneously. The signal RTR is also stable, so this signal does not cause severe glitches on the data wires. The only signal in the communication link that might change close to when the data is read is the signal Write, which might change from 0 to 1 to indicate that new data is available. This occurs if the receiver reads the data very soon after this signal has changed from 0 to 1. In such a case, the change of the signal Write can result in positive glitches on some data lines when the data is read. This is the only interference from within the communication link that can affect the data wires severely. For some systems it might also be desirable to test for interference from wires other than those belonging to this communication link. Then testing for both negative and positive glitches on the data wires can be required. To detect glitches on the data wires, glitch detectors as in Figure 4.12 can be attached to each data line. The test can be performed by communicating data in the same way as would be done during normal operation. All the data wires should be assigned zeros when testing for positive glitches and ones when testing for negative glitches. The reset signal to the glitch detectors should be asserted during phase 3 (Figure 4.13), and the glitch detectors should be checked at the end of phase 4. If aggressor wires outside this link need to be activated, this should be done when the control signal Write is changed from 0 to 1.

4.3.5. Discussion of results

The presented test method for glitch testing of control signals targets links between different clock domains. For this circumstance it is quite fast; 50 clock cycles are enough for testing the control lines if the clock generators are reasonably accurate and the wire delays are smaller than the clock period.
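The data-line test of Subsection 4.3.4 can be sketched behaviorally. The class below is a simplified software stand-in for the latch-based detector of Figure 4.12, not the circuit itself, and the 8-bit bus and injected glitch position are assumed for illustration:

```python
class GlitchDetector:
    """Behavioral stand-in for the latch-based detector of Figure 4.12:
    once reset, it latches any deviation of its victim wire from the
    quiet value it is supposed to hold during the test."""
    def __init__(self, quiet_value):
        self.quiet_value = quiet_value
        self.latched = False

    def reset(self):            # the reset signal is asserted during phase 3
        self.latched = False

    def sample(self, level):    # the detector continuously observes the wire
        if level != self.quiet_value:
            self.latched = True

# Testing for positive glitches: all data wires held at 0 during phase 4.
detectors = [GlitchDetector(quiet_value=0) for _ in range(8)]
for d in detectors:
    d.reset()
# Hypothetical trace: wire 3 suffers a crosstalk glitch while data is read.
for wire, d in enumerate(detectors):
    d.sample(1 if wire == 3 else 0)
    d.sample(0)
faulty = [w for w, d in enumerate(detectors) if d.latched]
print(faulty)  # [3] -- flagged when the detectors are checked after phase 4
```

Testing for negative glitches works the same way with `quiet_value=1` and all data wires held at one.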
All data lines in a link can be tested in parallel. If the test for glitches is to work accurately, the glitch detectors should be at least as sensitive to glitches as the logic used under normal operation. By “as sensitive”, we mean that they should react to glitches of the smallest amplitude and the shortest duration that can affect the input logic used under normal operation. The nodes are internally synchronous, and the signal lines go to D-inputs of flip-flops via some gates. Each gate that a signal passes through smoothes out the signal somewhat in time. The effect is that the more gates a glitch passes through, the less intensive it becomes. The signal feeding the D-input of a D-flip-flop goes through more gates than the signal to an SR-latch. Therefore a minimum-sized SR-latch, as used in the glitch detectors, is more sensitive to glitches than the D-flip-flop.

4.4 Conclusions

In this chapter we have presented how on-chip links between different clock domains can be tested for crosstalk-induced faults. Test methods have been provided to check for the two effects that crosstalk faults can cause: change in delay and occurrence of glitches. A method for scheduling of wires for simultaneous testing has also been provided to speed up delay fault testing. The different methods presented complement each other in forming a complete test composition for crosstalk faults in asynchronous on-chip links. Simple and purely digital hardware implementations have also been developed, which can be used for the presented test methods. The method for measuring change in delay due to crosstalk has been shown to require only a small number of test trials on average to label a chip good or faulty. We have also shown that this method has the limitation that, in the worst case, a finite number of test trials may not be able to label a chip good or faulty.
However, the probability of this outcome can be made arbitrarily small by increasing the number of test trials. Use of the methods presented in this chapter makes it possible to have a programmable tradeoff between test time and test accuracy. In the method for measuring change in delay, the limit on the number of trials can be programmed. A higher limit makes the test take longer but reduces the probability that a good chip is marked as faulty. The method for scheduling victims uses a small BIST hardware structure that can be programmed to a desired minimum distance between wires that are tested simultaneously. The test will be faster if the wires tested simultaneously are closer to each other, but the risk that faults will not be detected is higher. The method presented for detecting glitches caused by crosstalk on the control lines does not provide a tradeoff between test time and test accuracy, but on the other hand this test method is relatively fast. We have also proposed a method for testing data wires for glitches.

Chapter 5 System level fault models

This chapter describes our contribution to test generation at the system level of abstraction, which is based on a fault model at the system level of abstraction. A NoC switch is used as a case study for the system specific faults. In Section 5.1, concepts related to system-level fault models are introduced. NoC-switch-specific fault models are proposed and, based on these models, faults for a simplified NoC-switch are evaluated in Section 5.2.

5.1 System level faults

As discussed before, a system-level description of a design describes what the design is supposed to do without including any implementation information. A large number of implementations are possible for the same specification, depending on which synthesis algorithms are used.
System level fault models, which are independent of any specific implementation, then need to be developed based on what the system is supposed to do, without any consideration of how it should be implemented. This means that definitions of system level faults must be based on consideration of the ways in which the externally observable behavior of the system can differ from the expected behavior during its various use-case scenarios.

5.1.1. Application area specific fault models

If a design gives the desired actions for all its use-case scenarios, then any of its defect-free implementations will work correctly. On the other hand, any non-redundant fault in the implementation will lead to incorrect actions during at least one of the use-case scenarios. It is only possible to identify use-case scenarios for a known design or for designs in a fixed application area. A consequence of this is that system level faults need to be formulated specifically for a certain application or a certain application area. This requires different system level fault models for different types of applications. The NoC-switch, which is used for illustration in this chapter, is an example of a type of application. Another example is one-dimensional filters. Fault models for a specific application area have the advantage over fault models for a specific design that they are more general. This means that they can be used for more designs, with the advantage that fault models do not need to be invented for every specific design. Because use-case scenarios for designs in a certain application area have a lot in common, it seems not to be noticeably harder to define fault models for a specific application area than for a specific design. In the next subsection, faults are defined for NoC-switches. The different types of NoC-switch designs can be considered as an application area.

5.1.2.
System level fault models for NoC-switches/routers

A NoC architecture consists of switches and connections between the switches. Each switch has connections to one or several other switches. In the proposed NoC architectures, all or most switches also have connections to one or several cores. The function of the switch is to forward packets toward their final destinations. The decision regarding the output port through which a packet should be forwarded is made by the switch based on routing information in the packet. This routing information can simply be the destination address, or it can be more sophisticated information, including information about the path for the packet. Several different designs have been proposed for NoC switches [Asc08, Kim05, Lus10]. The functionality of NoC switches has, however, many generic properties and use-case scenarios. Based on these generic properties, the following system level fault types can be defined:

Dropped data fault: Data received by the switch is lost and never emerges from the intended output port.

Corrupt data fault: Transported data is corrupted during its passage through the switch.

Direction fault: A data packet is routed in a different direction from the one prescribed by the destination address in the packet.

Multiple copies in space fault: A packet comes out through the intended port as well as through an unintended port.

Multiple copies in time fault: More than one copy of the sent packet comes out through the intended output port.

Figure 5.1: System level faults for a NoC switch

Figure 5.1 illustrates the various fault types described above.
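To make these fault types concrete, the sketch below injects a dropped data fault into a toy behavioral switch model. The function name, packet layout and routing scheme are hypothetical simplifications; a real switch would decode routing information from the packet header:

```python
# Hypothetical behavioral switch model: a packet carries its desired output
# direction; the fault-free switch forwards it there. A dropped data fault
# on an (input port, output port) pair makes packets on that path vanish.
def switch(packet, in_port, dropped=None):
    out_port = packet["dest"]               # routing information in the packet
    if dropped == (in_port, out_port):      # injected dropped data fault
        return {}                           # data never emerges from any port
    return {out_port: packet}               # normal forwarding

pkt = {"dest": "west", "payload": 0b10}
assert switch(pkt, "east") == {"west": pkt}                 # fault-free case
assert switch(pkt, "east", dropped=("east", "west")) == {}  # fault observable at the output
```

A direction fault or a multiple-copies-in-space fault could be injected analogously by returning the packet on a wrong port, or on both the intended and an unintended port.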
For example, the system-level fault type corrupt data means that data going from a certain direction to a certain other direction gets corrupted while being routed through the switch. This system level fault is independent of the representation of data and its size. The fault type direction fault is mostly related to faults in the components that implement the routing algorithm, while the fault type corrupt data is mostly related to faults in the datapath transferring packets. Consider the fault type dropped data, which means that a packet from a port, say port A, is supposed to go to another port, say port B, but gets dropped. In testing for this fault, appropriate data needs to be packaged into the various fields of the packet entering port A, and other conditions need to be set up in its environment according to the network protocols, so that the effect of the fault can be observed at port B. The list of fault types for a NoC switch shown above is certainly not complete or unique. Other application-specific faults for a NoC-switch can be defined. The purpose of the work presented in this chapter is to use a set of faults to highlight the usefulness of system level fault models for test generation.

5.2 Evaluation of system level fault models

A set of fault models can be used to generate test data. We call a set of fault models efficient if the test data generated based on the fault models gives high coverage of the defects in the relevant implementations. By relevant implementations we mean implementations that are generated by the relevant synthesis tools. To determine whether the system level faults are useful, we need some way to evaluate them. We have chosen to compare some of the proposed system level fault models with the logic level stuck-at faults in a logic level implementation. We use a simplified NoC-switch for this evaluation.

5.2.1.
Setup for experiments

Simplified NoC-switch design

To evaluate the proposed system level fault models, experiments have been done on a crossbar. The crossbar is generally part of a NoC-switch and can be considered as a simplification of a NoC-switch. The considered crossbar has connections in four directions, named east, south, west and north. This matches the directions in a NoC-switch for a mesh topology. In a mesh topology NoC, an additional port is needed to connect to a core; in this experiment this connection is omitted for the sake of simplicity. In each direction there is one output port and one input port. Figure 5.2 shows the signals of the crossbar. A complete NoC-switch has buffers at the input and/or output ports. These are omitted from the crossbar, so the modeled crossbar is a purely combinational circuit. A further simplification compared to a NoC-switch has to do with the address bits. In a NoC-switch, the address bits identify the final destination of the packet. In this crossbar there are instead only two bits that specify a target output port for the packet. Due to this simplification, the output ports of this crossbar do not have any address bits coming out.

Figure 5.2: A simplified NoC-switch

Each input port has two control signals, strobe and acknowledgement. The signal strobe is an input signal to the crossbar, and it is used to indicate whether there is valid data on the input port. The signal acknowledgement goes out from the crossbar, and it indicates whether the data could be transferred further via the desired output port. Each output port has the control signals data valid and acknowledgement. The signal data valid is an output signal, and it is used to indicate whether there is valid data on the output data bits. The signal acknowledgement at an output port is an input signal to the crossbar.
It is used to inform the crossbar whether its environment is ready to accept the data. Several input ports might try to send data to the same output port. In this crossbar, static priorities are given to the input ports to decide which of them is allowed to send its data in such a case of conflict. In the crossbar used in this experiment, the number of data bits is reduced to two, which is much fewer than in typical NoC-switch designs. Usually it is not useful to route a packet back and forth between two switches, because this results in network traffic that does not contribute to transferring the packet towards its final destination. Therefore a switch usually never sends a packet back in the direction it came from. No such constraint is put on the crossbar used in these experiments.

Switch synthesis

For the evaluation of system level faults, this crossbar was synthesized to the logic level. The relationship between the stuck-at faults in this logic level implementation and the system level faults was then analyzed to obtain figures for the relevance of the system level faults. In the synthesis process to the logic level, the design is optimized with the rugged script in the tool SIS [Sen92]. It is thereafter technology mapped such that the logic level implementation consists only of inverters and two-input AND, NAND, OR and NOR gates. The number of logic level faults considered is 400, which is all logic level stuck-at faults except faults that are dominated by others and redundant faults. For logic level faults that are equivalent, only one fault is considered in each group of equivalent faults.

Faults

In the experiments, the system level fault types considered are dropped data, direction faults and multiple copies in space. Test data for a fault of type corrupt data, as well as for a fault of type dropped data, should apply some data on a certain input port.
Address, strobe and acknowledgement bits should be applied such that this data is output on a certain port. Therefore, for each fault of type corrupt data there is a fault of type dropped data that is tested with the same test data. The consequence is that the experiments give identical results for faults of type corrupt data and faults of type dropped data. Because of this, the fault type corrupt data is not included in the experiments. The fault type multiple copies in time cannot be applied to the crossbar, because it is a purely combinational design. There are 16 possible faults of the type dropped data, one for each combination of an input port and an output port. For the fault types direction fault and multiple copies in space, there are 48 different faults of each type. There are sixteen combinations of an input port and a desired output port, and for each such combination there are three possible faults of each of the fault types direction and multiple copies in space, one for each of the three remaining output directions in which the packet erroneously goes in the presence of such a fault.

Fault simulation

Every possible input pattern (2^24) has been fault simulated, both on the logic level implementation and on the system level specification. The logic level fault simulation has been performed with the help of the Turbo Tester tool [Jer98]. For each combination of a logic level fault and a system level fault, the number of test vectors that detect only the logic level fault, that detect only the system level fault, and that detect both faults, have been stored. These data are the basis for the experiments done in this work.

5.2.2. Metrics for measurement of the relevance of system level fault types

In this subsection we present two metrics for evaluating the relevance of system level faults. Both metrics evaluate how system level faults relate to stuck-at faults in a specific logic level implementation.
It should be mentioned that these evaluation metrics are not intended to be used for generation of test vectors, nor for evaluating system level faults during the design of a system. Their purpose is instead to help give figures on the relevance of system level faults and system level fault models.

Relative fault coverage increase

The metric relative fault coverage increase is calculated separately for each system level fault. It makes use of a subset of the stuck-at faults in a specific logic level implementation. That subset is chosen such that this metric gets as large a value as possible. Let n denote the number of possible logic level stuck-at faults and m the number of possible system level faults. We expect m to be much smaller than n. Let Li be a stochastic variable that indicates whether the i:th logic level fault is covered by a random test vector. Correspondingly, let Sj be a stochastic variable that indicates whether the j:th system level fault is covered by a random test vector. These stochastic variables are 1 when the fault they indicate is covered and 0 when it is not covered. Let uij be defined as:

uij = P(Li | Sj) − P(Li) (5.1)

This means that uij is the difference between two probabilities. The first is the probability that a random test vector detects the i:th logic level fault, under the condition that this vector also detects the j:th system level fault. The second is the probability that the i:th logic level fault is detected by a purely randomly generated test vector. A measurement of the usefulness of a system level fault is the relative fault coverage increase dj, which we define as:

Definition: Relative fault coverage increase

dj = Σ(i=1..n) max(0, uij) (5.2)

The relative fault coverage increase dj for the j:th system level fault can be interpreted in the following way.
There exists a subset of the logic level stuck-at faults in the logic level implementation whose expected coverage is dj larger if a test vector is generated based on the j:th system level fault than if it is generated randomly.

Expected increase in logic level fault coverage

The metric expected increase in logic level fault coverage is calculated for a set of system level faults. This metric is a function of the number of test vectors. Like the metric relative fault coverage increase, it is computed based on a specific logic level implementation. It is the difference in expected logic level fault coverage when test vectors are generated based on a set of system level faults in a random way, compared to completely randomly generated test vectors. The random way to generate test vectors is the naive test data generation method described in the algorithm below. In this algorithm, t is the number of test vectors that should be generated based on the set S of system level faults, and s is the number of faults in S. The set V is the set of test vectors generated by the algorithm.

ALGORITHM Naive test data generation
  V ← ∅
  A ← select randomly (t mod s) elements in S
  B ← S − A
  FOR ALL elements a IN A LOOP
    C ← select randomly ⌊t/s⌋ + 1 test vectors covering a
    V ← V ∪ C
  END LOOP
  FOR ALL elements b IN B LOOP
    C ← select randomly ⌊t/s⌋ test vectors covering b
    V ← V ∪ C
  END LOOP
END ALGORITHM

Observe that more test vectors than the number of system level faults are generated in some cases. Unlike the case where a deterministic coverage relationship between system level faults and physical defects is the basis for the fault models, several test vectors for the same system level fault are often better in the probabilistic sense.
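An executable rendering of the naive algorithm makes the vector counts explicit: the (t mod s) faults in A each contribute ⌊t/s⌋ + 1 vectors and the rest contribute ⌊t/s⌋, so t vectors are drawn in total. The fault names and covering sets below are made up for illustration:

```python
import random

def naive_test_generation(system_faults, covering, t, rng=random):
    """Naive test data generation: spread t vectors over the s system level
    faults; (t mod s) randomly chosen faults get one extra vector."""
    s = len(system_faults)
    A = rng.sample(system_faults, t % s)
    B = [f for f in system_faults if f not in A]
    V = set()
    for a in A:
        V |= set(rng.sample(covering[a], t // s + 1))
    for b in B:
        V |= set(rng.sample(covering[b], t // s))
    return V

# Hypothetical setup: 3 faults with disjoint pools of covering vectors.
covering = {"f0": [0, 1, 2, 3], "f1": [10, 11, 12, 13], "f2": [20, 21, 22, 23]}
V = naive_test_generation(["f0", "f1", "f2"], covering, t=7,
                          rng=random.Random(0))
print(len(V))  # 7: one fault contributes 3 vectors, the other two 2 each
```

Because the pools here are disjoint, the union never collapses duplicates and exactly t distinct vectors result, whichever faults end up in A.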
For example, 80 percent of the test vectors covering a certain system level fault might cover a specific physical defect, while another physical defect might be covered by 75 percent of the test vectors covering the same system level fault. Generating more test vectors for this system level fault then gives a higher probability that those physical defects will be covered.

5.2.3. Results of experiments

This subsection shows the results of the metrics presented above, applied to the crossbar described in Subsection 5.2.1. The system level faults considered are dropped data, direction faults and multiple copies in space.

Result of expected increase in logic level fault coverage

Figure 5.3 shows the expected logic level fault coverage when test vectors are generated with the naive test data generation algorithm described in Subsection 5.2.2. This algorithm is applied separately for the set of system level faults dropped data, for the set direction faults and for the set multiple copies in space. The expected fault coverage with completely randomly generated test vectors is shown in the figure as well. As the metric expected increase in logic level fault coverage was defined above, it is the difference between the curve for the respective system level fault type and the curve for completely randomly generated test vectors. Figure 5.4 shows this metric for the sets of system level faults dropped data, direction faults and multiple copies in space.

Figure 5.3: Logic level stuck-at fault coverage as a function of number of test vectors

Figure 5.3 also shows the expected coverage of logic level faults when test vectors are generated based on the logic level implementation.
There are two curves that are based on test vectors generated by the Turbo Tester tool [Jer98], which implements the PODEM algorithm [Abr90]. This algorithm generates a set of test vectors that covers the logic level faults. For this implementation of the crossbar, 42 test vectors were generated; the logic level coverage is therefore 100 percent for 42 or more test vectors. For fewer test vectors, the curves for logic level faults show the coverage when a subset of the vectors generated by the PODEM algorithm is chosen. There are two such curves for expected logic level coverage. One of them shows the expected logic level fault coverage when using a randomly chosen subset of the vectors generated by the PODEM algorithm. The other shows the logic level fault coverage when the subset of test vectors is picked from the beginning of a list in which the test vectors generated by the PODEM algorithm are sorted in decreasing order of logic level coverage.

Figure 5.4: Expected increase in logic level fault coverage

From these diagrams we can see that test data generated from system level faults gives better logic level fault coverage than randomly generated test vectors. For example, if 90% logic level fault coverage is desired, about 15 test vectors generated from system level faults are needed, while about 25 test vectors are needed if they are generated randomly. From the diagrams we can also see that, in the best case, the logic level coverage is more than 10 percentage points better with test vectors generated from system level faults than with randomly generated test vectors. These results indicate that system level fault models have some potential to facilitate test data generation.
From the curve that shows coverage when vectors from the PODEM algorithm are picked in an ordered manner, we can see that test data generated with an algorithm that works at the logic level gives better results than test data generated based on the system level faults. However, generation of test vectors at the system level has the advantage that testing can be considered earlier in the design phase than if test vectors are generated at the logic level. Generation of test vectors based on logic level stuck-at faults is also a much more mature research area than test generation at the system level; there is therefore probably more potential to improve test generation methods at the system level than at the logic level. A fairer comparison between system level faults and logic level faults would therefore be to let the logic level coverage be represented by the curve in which test vectors from the PODEM algorithm are picked randomly. With a desired logic level fault coverage of about 90 percent or less, we can see from Figure 5.3 that test vectors generated based on system level faults of type dropped data achieve higher fault coverage than test vectors generated at the logic level in this way. For test vectors based on the fault types direction fault and multiple copies in space, we can see in Figure 5.3 that the expected logic level fault coverage is about the same as for logic level generated test vectors when the desired fault coverage is about 85 percent or less. These facts indicate that utilization of system level faults can compete with test vector generation at the logic level when the desired logic level fault coverage is about 80 to 90 percent or less. It is also worth noting that in Figure 5.4 the curve for test data generated based on the dropped data faults has a peak at sixteen test vectors. This is a good sign for the usefulness of this system level fault model. There are sixteen different system level faults of this type for the current design.
With the algorithm described in the previous subsection for generating test vectors, at most one test vector is generated for each dropped data fault when sixteen or fewer test vectors are generated. When the set grows beyond sixteen test vectors, each new vector is based on a dropped data fault for which one test vector is already included in the set. The increase in expected fault coverage from adding one test vector is larger when there are fewer than sixteen test vectors in the set. This indicates that the dropped data fault model correlates well with a subset of the logic level faults.

We can also see that when test vectors are generated from direction faults, the expected coverage is slightly greater than for test vectors generated from dropped data faults once the set contains more than 38 test vectors. There are 48 different direction faults but only sixteen dropped data faults. Given that the coverage for a small set of test vectors is better if dropped data faults are used, we can conclude that pairs of dropped data faults tend to differ more in which logic level faults they relate to than pairs of direction faults do. However, when larger sets of test vectors are used, several test vectors must be generated from the same fault if generation is based on the dropped data faults. Because the test vector set generated from direction faults gives better coverage in this case, we can conclude that the greater diversity among the dropped data faults is not enough to compensate for their lower number compared to the direction faults.

Relative fault coverage increase

As defined in Section 5.2, the relative fault coverage increase is a relationship between one system level fault and the logic level faults in a given logic level implementation. It is a value that estimates the usefulness of one system level fault.
The higher the value, the more likely it is that the system level fault is useful. Table 5.1 shows the largest, the smallest and the average relative fault coverage increase for the system level faults of the three types considered in this experiment.

Table 5.1: Parameter dj for system level faults

          Dropped data   Direction   Multiple
Min       18.2           17.5        19.6
Average   20.2           19.5        21.7
Max       21.4           20.7        23.0

Table 5.1 shows that the relative fault coverage increase is about 20 for each system level fault considered in the experiments. This indicates that a test vector generated for a specific system level fault can be expected to cover about 20 more logic level faults, within a subset of the logic level faults, than a random test vector. This holds for the system level faults considered in this experiment and for the logic level implementation used in it. This result is in line with the result on expected increase in logic level fault coverage, in the sense that it indicates that system level faults can be useful for test generation. The difference in expected increase in logic level fault coverage between the three system level fault types considered is small; therefore it is not possible to draw any conclusions from the relative fault coverage increase about differences between these system level fault types. Because the relative fault coverage increase is a metric measuring each system level fault separately, it has the limitation of not being able to consider how faults in a set of system level faults correlate with each other.

5.2.4. Experiments to evaluate relative effectiveness of system level faults

A final experiment in this chapter attempts to determine how dissimilar faults of a certain type are to each other. For the system level fault types dropped data, direction faults and multiple copies in space, comparisons have been made between two ways of generating test vectors.
The first way to generate test vectors is the naive test data generation method described in Subsection 5.2.2. In that method, a set of test vectors based on a specific type of system level fault is generated such that the number of vectors generated for each system level fault of this type is distributed as evenly as possible in the final test sequence. In the second method, each test vector is generated without any consideration of how other test vectors have been generated. The second method is described by the following algorithm.

ALGORITHM Test vector generation without correlation between vectors
1  result ← EMPTY SET
2  FOR number of vectors to generate LOOP
3      s ← SELECT RANDOMLY a system level fault
4      t ← SELECT RANDOMLY a test vector that covers s
5      result ← result ∪ {t}
6  END LOOP
END ALGORITHM

Each test vector is generated by first randomly selecting a system level fault of the type currently being considered and then randomly selecting a test vector from among those that cover this system level fault. With this second method, when the number of generated test vectors equals the number of system level faults they are based on, the probability that every system level fault receives a test vector is very low.

[Figure 5.5: Fault coverage difference between two methods; difference in expected logic level fault coverage (-4% to 8%) against the number of test vectors (0 to 64), for dropped data, direction fault and multiple copies in space.]

Figure 5.5 shows the difference, in percentage points, in the expected logic level stuck-at fault coverage between test vectors generated with the naive test data generation method of Subsection 5.2.2 and the method just described in this subsection. Experiments have been performed separately for the three system level fault types: dropped data, direction faults and multiple copies in space.
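The contrast between the two generation strategies can be sketched in a few lines of code. This is only an illustration, not the thesis's implementation: faults are abstract identifiers, the balanced method cycles over the fault list, and the independent method draws a fault at random for each vector. All helper names are hypothetical.

```python
import math
import random

def balanced_selection(faults, n_vectors):
    # Naive method (Subsection 5.2.2, simplified): spread the vectors as
    # evenly as possible over the faults by cycling through the fault list.
    return [faults[i % len(faults)] for i in range(n_vectors)]

def independent_selection(faults, n_vectors, rng):
    # Second method: each vector picks its target fault independently.
    return [rng.choice(faults) for _ in range(n_vectors)]

faults = list(range(16))  # sixteen dropped data faults in the example design

# With sixteen vectors the balanced method targets every fault exactly once.
assert sorted(balanced_selection(faults, 16)) == faults

# With independent selection, the chance that sixteen random draws hit all
# sixteen faults is 16!/16^16, roughly one in a million.
p_all_hit = math.factorial(16) / 16**16
assert p_all_hit < 1e-5
```

The near-zero probability in the last step is why the independent method almost never produces one vector per fault, which is the property the balanced method guarantees by construction.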
We can see that the expected coverage is greater when test vectors are generated with the naive test data generation method of Subsection 5.2.2. This shows that there is some distinction between different system level faults of the same type; by distinction we refer to how differently the individual system level faults correlate with the logic level stuck-at faults. We can also see that for dropped data faults the largest difference occurs at 16 test vectors. When 16 test vectors are generated with the first method, there is one test vector for each fault. The peak at 16 thus also indicates that there are considerable differences between different faults.

5.3 Conclusions

In this chapter we have proposed to use application area specific fault models at the system level of abstraction. Using a simplified NoC-switch, the potential of this idea has been demonstrated by comparing the logic level stuck-at fault coverage of test vectors generated from system level faults with that of randomly generated test vectors. The experiments show that usage of application specific system level faults has some potential and is worth further investigation.

Part C
Logic optimization

Chapter 6
Background and related work in Boolean decomposition

Part A of this thesis gave an introduction and background on system design and testing. The current chapter offers a more focused background on Boolean decomposition and presents related work. The main objective of Boolean decomposition is to minimize the cost function at the logic level of abstraction in order to reduce the number of components or the chip area needed to implement a given Boolean function. In Section 6.1, different types of decompositions are described. Section 6.2 gives background on decomposition methods based on binary decision diagrams; it also provides a deeper description of non-disjoint decomposition and describes the notion of bound-set.
This is the background to the contributions presented in Chapter 7. Section 6.3 serves as a background to the contribution presented in Chapter 8, which is decomposition for logic with a gate depth of three. Section 6.4 describes how decomposition can be used in areas other than logic optimization.

6.1 Decomposition of Boolean functions

Decomposition of a Boolean function is the task of partitioning the function into subfunctions.

6.1.1. Concepts and notations

Boolean function

A Boolean function is a function of Boolean variables with a Boolean function value. Because of its mapping to a network of gates, the function value is often called the output and the variables the inputs. A Boolean function can also be incompletely specified; for such a function the function value may be don't-care for some combinations of input values. A multiple output Boolean function is a set of Boolean functions with common inputs. We follow the standard notation for representing Boolean functions: capital letters are used for vectors or sets of Boolean variables, lower-case letters for single Boolean variables, and a bar above a variable or an expression indicates complementation.

Support set

The support set of a Boolean function is the set of inputs on which the function depends. An input that does not belong to the support set of a function cannot affect the output, regardless of the values of the other inputs.

6.1.2. Disjoint and non-disjoint basic decompositions

A decomposition of a Boolean function f(X) is a representation of the type f(X) = h(g(Y), Z) where Y ⊆ X, Z ⊆ X and Y ∪ Z = X. Initial theoretical work on decomposition was presented in [Ash59]. If Y and Z are disjoint sets, Y ∩ Z = ∅, the decomposition is disjoint; otherwise it is non-disjoint.
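A claimed decomposition f(X) = h(g(Y), Z) can be validated by exhaustive enumeration, and disjointness can be read off from the variable sets. The sketch below uses a small hypothetical function; it only illustrates the definition and is not an algorithm from the thesis.

```python
from itertools import product

def check_decomposition(f, g, h, n, Y, Z):
    # Check that f(X) = h(g(Y), Z) holds on all 2^n input assignments.
    # Y and Z are lists of variable indices; f takes all n variables.
    for bits in product([0, 1], repeat=n):
        gv = g(*[bits[i] for i in Y])
        if f(*bits) != h(gv, *[bits[i] for i in Z]):
            return False
    return True

# Hypothetical example: f = (x1 XOR x2) AND x3 with Y = {x1, x2}, Z = {x3}.
f = lambda x1, x2, x3: (x1 ^ x2) & x3
g = lambda x1, x2: x1 ^ x2
h = lambda gv, x3: gv & x3

Y, Z = [0, 1], [2]
assert check_decomposition(f, g, h, 3, Y, Z)
assert set(Y).isdisjoint(Z)  # no shared variable: the decomposition is disjoint
```

For a non-disjoint decomposition, the same check applies with a variable index appearing in both Y and Z.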
For example, the function

f(x1, x2, x3, x4, x5) = x1·x2·x5 + x1·x3·x5 + x1·x3·x5 + x1·x3·x5 + x1·x2·x3·x4 + x1·x2·x3·x4   (6.1)

can be written as f(x1, x2, x3, x4, x5) = h(g(x1, x2, x3), x4, x5) where

g(x1, x2, x3) = x1·x2·x3 + x1·x2·x3
h(g, x4, x5) = g·x4 + g·x5.

This is a disjoint decomposition since no variable occurs in both function g and function h. Figure 6.1 schematically shows the implementation impact of this decomposition.

[Figure 6.1: Illustration of a disjoint decomposition; g(x1, x2, x3) feeds h(g, x4, x5), which produces f(x1, x2, x3, x4, x5).]

As an example of a non-disjoint decomposition, consider the function

f(x1, x2, x3, x4, x5) = x1·x3·x4·x5 + x1·x3·x4·x5 + x2·x4·x5 + x2·x3·x4·x5 + x2·x3·x4 + x1·x3·x4·x5 + x2·x3·x4·x5.   (6.2)

This function can be written as

f(x1, x2, x3, x4, x5) = h(g(x1, x2, x3), x3, x4, x5)   (6.3)

where

g(x1, x2, x3) = x1·x2·x3 + x1·x2·x3 + x1·x2·x3
h(g, x3, x4, x5) = g·x5 + g·x3·x4 + g·x3·x4.

[Figure 6.2: Illustration of a non-disjoint decomposition; x3 is an input to both g(x1, x2, x3) and h(g, x3, x4, x5).]

In this non-disjoint decomposition, variable x3 is an input to both function g and function h. Any number of variables may be inputs to both functions, and in this way it is possible to find non-disjoint decompositions for every Boolean function. However, a non-disjoint decomposition is usually more useful if the number of inputs shared by g and h is small. The contribution presented in Chapter 7 is a method for finding disjoint decompositions.

6.1.3.
Roth-Karp decomposition, a generalization

In the description of decomposition in Subsection 6.1.2, where a function f(X) is decomposed into f(X) = h(g(Y), Z) with Y ∪ Z = X, all variables are binary. A generalization of this kind of decomposition is the Roth-Karp decomposition, in which the range of the function g is extended to include more values than logic 0 and logic 1 [Cur62, Rot62]. More precisely, let f(X) be a Boolean function such that f(X) = h(g(Y), Z), where g and h are multiple-valued functions of type g : B^Y → M and h : M × B^Z → B, with M = {0, 1, 2, …, m − 1} and B = {0, 1}. It is always possible to find a trivial Roth-Karp decomposition for f(X) with |M| = 2^|Y|, but such a decomposition is generally not useful. Normally a Roth-Karp decomposition needs a relatively small M to be useful.

Roth-Karp decompositions can be utilized in digital design by coding the possible values in the set M into k = ⌈log2 |M|⌉ binary bits. In this way the Boolean function f(X) can be written as f(X) = hb(g1(Y), …, gk(Y), Z). In this form all variables and function values are Boolean, so the functions can be implemented directly with digital gates. The values in M can be coded into the functions g1 to gk in many ways, and the chosen encoding affects the amount of optimization that can be achieved. As for the basic decompositions described in Subsection 6.1.2, a Roth-Karp decomposition is either disjoint or non-disjoint; the definition is the same, meaning that the decomposition is disjoint if Y ∩ Z = ∅ and non-disjoint otherwise.

6.1.4. Decomposition into multiple subfunctions

It is possible that a Boolean function can be decomposed into multiple subfunctions. In Subsection 6.1.3 we described how a Roth-Karp decomposition can be encoded into several bits.
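The bit-encoding step just mentioned can be sketched as follows. This is a minimal illustration assuming the straightforward binary encoding of M; as noted above, other encodings are possible and affect the achievable optimization. The function names are hypothetical.

```python
import math

def encode_values(g_values, m):
    # Encode a multiple-valued output list (values in {0..m-1}) into
    # k = ceil(log2 m) binary output lists, one per bit function g1..gk.
    k = max(1, math.ceil(math.log2(m)))
    return [[(v >> bit) & 1 for v in g_values] for bit in range(k)], k

# A 3-valued function g, listed over four input assignments, needs k = 2 bits:
bit_functions, k = encode_values([0, 2, 1, 2], m=3)
assert k == 2

# Recombining the binary functions recovers the multiple-valued output:
recovered = [sum(bits[j] << i for i, bits in enumerate(bit_functions))
             for j in range(4)]
assert recovered == [0, 2, 1, 2]
```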
This is one way to decompose a Boolean function into more subfunctions. Another way is to form the functions as f(X) = h(g1(Y1), …, gk(Yk), Z) where Yi ⊆ X for all i ∈ [1, k] and Z ⊆ X. Subfunctions can be further partitioned into smaller functions in a hierarchical manner.

[Figure 6.3: Illustration of multiple decomposition; g1(x2, x3, x4) feeds g2(g1, x5, x6), and h(x1, g2, g3) combines x1, g2 and g3(x6, x7, x8) into f(x1, …, x8).]

An example of how a function f(x1, …, x8) can be decomposed is illustrated in Figure 6.3. The function is first decomposed into f(X) = h(x1, g2*(x2, x3, x4, x5, x6), g3(x6, x7, x8)). The function g2*(x2, x3, x4, x5, x6) is then further decomposed into g2*(x2, x3, x4, x5, x6) = g2(g1(x2, x3, x4), x5, x6). Observe that this is a non-disjoint decomposition.

6.1.5. Multiple output functions

The decompositions described previously in this section deal with single output functions. Most digital systems, however, have several outputs. Subfunctions that can be used for several outputs are often useful for minimizing the implementation cost. Decomposition that is done individually for each output can nevertheless help to find subexpressions that can be shared between several outputs. For example, consider the following functions:

f1(x1, …, x6) = x1·x2·x3·x4·x5·x6
f2(x1, …, x6) = (x1·x2·x3·x4·x5 + x1·x2·x3·x4·x5) ⊕ x6   (6.4)

Function f2 can be decomposed into

g(x1, …, x5) = x1·x2·x3·x4·x5 + x1·x2·x3·x4·x5
f2(g, x6) = g ⊕ x6.

Then function f1 can be simplified to f1(g, x1, x6) = g·x1·x6.

6.2 Decision diagram based decomposition methods

This section gives background and related work specifically related to the contributions presented in Chapter 7.

6.2.1.
Properties of the disjoint decomposition

In this subsection we first describe what a bound-set is and present some methods for finding bound-sets. Thereafter the theory of decomposition trees is described.

Bound-set

A property of Boolean functions closely related to the model of disjoint decomposition is the concept of bound-set. Let f(X) be a Boolean function with all variables in X belonging to the support set of f(X), and let Y ⊆ X. Then Y is a bound-set if and only if there exist functions g and h such that f(X) = h(g(Y), Z) where Z ⊆ X, Y ∩ Z = ∅, Y ∪ Z = X and all variables and function values are Boolean.

Initial work on bound-sets was presented by Ashenhurst [Ash59], although Ashenhurst did not use the term bound-set. His article describes a method to determine whether a subset of variables is a bound-set. The method works as follows. Let a Boolean function have its input variables in the disjoint sets Y and Z. A matrix is created with one row for each combination of variable assignments in set Y and one column for each combination of variable assignments in set Z. Each cell in the matrix contains the function value for the corresponding variable assignments. If the row multiplicity is two, Y is a bound-set; if the row multiplicity is larger than two, Y is not a bound-set. The row multiplicity in this context is the number of distinct rows. A row multiplicity of one occurs if none of the variables in set Y belongs to the support set. Figure 6.4 shows an example of such a matrix. It has two distinct rows, hence it can be concluded that Y is a bound-set.

[Figure 6.4: Matrix for bound-set check; one row for each combination of variable assignments in set Y, one column for each combination in set Z. The matrix has two distinct rows.]

With a bound-set there is an associated function.
In the function f(Y ∪ Z) = h(g(Y), Z), where Y is a bound-set, the function g(Y) is associated with Y. The associated function is unique up to complementation of its output.

Decomposition tree

For a Boolean function f(X) there are 2^|X| − 1 non-empty subsets of the input variables. For every Boolean function, each single input variable as well as the set of all input variables are trivial bound-sets. Each subset of variables in X is a bound-set for some function. An example of a function for which every subset of variables is a bound-set is the parity function f(x1, x2, …, xn) = x1 ⊕ x2 ⊕ … ⊕ xn.

In [Möh85] it is described how all bound-sets can be represented with fewer than 2n bound-sets, where n is the number of inputs, with the help of the concepts of strong and weak bound-sets and the decomposition tree. A bound-set of a Boolean function is strong if every other bound-set is a subset of it, a superset of it, or disjoint from it. The decomposition tree is a rooted tree in which the nodes represent the strong bound-sets. An example of a decomposition tree is shown in Figure 6.7. Bound-sets that are not strong are called weak bound-sets. The root node of a decomposition tree is the trivial bound-set containing all variables, and its leaf nodes are the trivial bound-sets that contain only one variable. Each strong bound-set B in the decomposition tree is positioned such that the subtree rooted at B contains exactly the strong bound-sets that are subsets of B. For a given Boolean function the decomposition tree is unique. The function g(x) associated with the trivial bound-set of a leaf node with associated variable x is either g(x) = x or its complement. The inputs to the function associated with any other node in the decomposition tree are the outputs of the functions associated with its immediate successor nodes.
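Both Ashenhurst's chart check and the parity claim above can be verified exhaustively for small n. The following is a sketch with hypothetical helpers; it enumerates the whole truth table, so it is only feasible for functions with few inputs.

```python
from itertools import combinations, product

def row_multiplicity(f, n, Y):
    # Build the decomposition chart: one row per assignment of the variables
    # in Y, one column per assignment of the remaining variables Z, and
    # return the number of distinct rows.
    Z = [i for i in range(n) if i not in Y]
    rows = set()
    for y in product([0, 1], repeat=len(Y)):
        args = [0] * n
        for i, b in zip(Y, y):
            args[i] = b
        row = []
        for z in product([0, 1], repeat=len(Z)):
            for i, b in zip(Z, z):
                args[i] = b
            row.append(f(*args))
        rows.add(tuple(row))
    return len(rows)

# {x1, x2} is a bound-set of (x1 AND x2) OR x3: the chart has two distinct rows.
f = lambda x1, x2, x3: (x1 & x2) | x3
assert row_multiplicity(f, 3, [0, 1]) == 2

# For the parity function, every non-empty subset of inputs yields row
# multiplicity two, i.e. every subset is a bound-set.
parity = lambda *xs: sum(xs) % 2
for r in range(1, 4):
    for Y in combinations(range(4), r):
        assert row_multiplicity(parity, 4, list(Y)) == 2
```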
The functions associated with the nodes of a decomposition tree are unique up to complementation of inputs and outputs. The following properties of the disjoint decomposition make it possible to represent all bound-sets with a decomposition tree. If two bound-sets Y and Z satisfy Y ∩ Z ≠ ∅, Y − Z ≠ ∅ and Z − Y ≠ ∅, we say that the bound-sets are overlapping. If Y and Z are overlapping bound-sets, then Y ∩ Z, Y ∪ Z, Y − Z, Z − Y and (Y − Z) ∪ (Z − Y) are bound-sets as well. Figure 6.5 illustrates this implication.

[Figure 6.5: Implication of bound-sets' overlap; for overlapping bound-sets Y and Z, the sets Y ∩ Z, Y ∪ Z, Y − Z, Z − Y and (Y − Z) ∪ (Z − Y) are also bound-sets.]

A strong bound-set is either full or prime, and the decomposition tree records for each strong bound-set which of the two it is. It is thus also possible to determine the weak bound-sets from the decomposition tree: for a node denoted as full, the union of the variables of any subset of its immediate successor nodes is a bound-set, and all weak bound-sets can easily be found using this property. The distinction between full and prime bound-sets only matters for nodes in the decomposition tree that have three or more immediate successor nodes. Figure 6.6 summarizes the division of bound-sets into different types.

[Figure 6.6: Types of bound-sets; a bound-set is either strong (full or prime) or weak.]

The function associated with a full bound-set is a simple Boolean operation, AND, OR or XOR, though its inputs and output may be complemented. The function associated with a prime bound-set is a Boolean function that has no bound-sets except the trivial ones. Recall that the function associated with a non-leaf node has one input for each immediate successor in the decomposition tree.
[Figure 6.7: Example of a decomposition tree; the root (full, function h) has the successors {x1, x2, x3} (prime, function g1), {x4, x5, x6} (full, function g2) and the leaf {x7}.]

For example, the decomposition tree of the following function is shown in Figure 6.7:

f(x1, x2, x3, x4, x5, x6, x7) = (x1·x2·x3 + x1·x2·x3) + (x4 ⊕ x5 ⊕ x6) + x7

The functions associated with the three nodes that are not leaf nodes are:

g1(x1, x2, x3) = x1·x2·x3 + x1·x2·x3
g2(x4, x5, x6) = x4 ⊕ x5 ⊕ x6
h(g1, g2, x7) = g1 + g2 + x7.

The full bound-set {x4, x5, x6} implies that {x4, x5}, {x4, x6} and {x5, x6} are weak bound-sets. Similarly, the full bound-set at the root implies that {x1, x2, x3, x4, x5, x6}, {x1, x2, x3, x7} and {x4, x5, x6, x7} are weak bound-sets.

This theory about decomposition trees does not extend directly to multiple-valued functions. According to [Dub97b], a class of multiple-valued functions for which the theory of decomposition trees holds is defined in the book [Von91].

6.2.2. Binary decision diagrams

Basics about binary decision diagrams

A Binary Decision Diagram (BDD) is an acyclic directed graph that represents a Boolean function. In such a graph, the nodes with no successors are called leaf nodes. Each leaf node represents one of the two Boolean constants, 0 and 1. There is one node with no predecessors, called the top node. Each node that is not a leaf node has two immediate successors and one associated input variable.

A BDD represents a Boolean function in the following way. Given an assignment of the input variables, start a walk in the graph at the top node. Each non-leaf node has an associated input variable and two outgoing branches. One branch shows the path of the walk if the associated input is assigned logic 0, and the other branch shows the path if the associated input is assigned logic 1.
[Figure 6.8: A BDD representing the function x1·x2.]

Figure 6.8 shows an example of a BDD that represents the Boolean function x1·x2. The direction of the graph is from top to bottom. A dotted edge represents the outgoing branch the walk follows when the node's associated variable is assigned logic 0, and a solid edge the branch followed when the associated variable is assigned logic 1. This drawing style is used in all figures with BDDs in this thesis.

Reduced ordered binary decision diagrams

Reduced Ordered BDDs (ROBDDs) were presented in [Bry86].

[Figure 6.9: Reduction of a BDD, from the BDD of Figure 6.8 (left) to a ROBDD (right).]

In a BDD, each node represents a subfunction. A ROBDD is reduced, which implies that no node represents the same subfunction as another node. One consequence of this is that a ROBDD has only two leaf nodes. The property of being reduced also implies that no node has both outgoing edges connected to the same successor. Figure 6.9 illustrates how the BDD in Figure 6.8 can be changed into a ROBDD. To the left is the BDD as it appeared in Figure 6.8. In the middle, nodes representing the same subfunction have been replaced with one node; in this example only the leaf nodes with constant zero. As a result, the left node with variable x2 has both of its outgoing edges pointing to the same node (see the middle figure). This node is then removed and its incoming edge is connected to the node its outgoing edges were going to. The result, shown to the right in Figure 6.9, is a ROBDD. Observe that this description should not be interpreted as an algorithm for reducing a BDD, because it only works for some BDDs. A complete algorithm for reducing a BDD is beyond the scope of this thesis and is therefore not described here. In a ROBDD the variables are also ordered.
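The evaluation walk and the two reduction rules (merge nodes representing identical subfunctions; drop nodes whose two branches coincide) can be sketched with a minimal node structure. This is an illustration only, not a complete or efficient ROBDD package.

```python
class Node:
    # A BDD node: 'var' is the associated variable index, 'lo' and 'hi'
    # are the successors for assignments 0 and 1; leaves are the ints 0, 1.
    def __init__(self, var, lo, hi):
        self.var, self.lo, self.hi = var, lo, hi

def evaluate(node, assignment):
    # Walk from the top node, following the branch selected by each
    # node's associated variable, until a leaf is reached.
    while isinstance(node, Node):
        node = node.hi if assignment[node.var] else node.lo
    return node

def reduce_bdd(node, unique=None):
    # Bottom-up reduction: share structurally identical nodes and remove
    # nodes whose two outgoing edges lead to the same successor.
    if unique is None:
        unique = {}
    if not isinstance(node, Node):
        return node
    lo, hi = reduce_bdd(node.lo, unique), reduce_bdd(node.hi, unique)
    if lo is hi:
        return lo  # redundant test: both branches agree, drop the node
    key = (node.var, id(lo), id(hi))
    return unique.setdefault(key, Node(node.var, lo, hi))

# The BDD of Figure 6.8 for x1·x2: the left x2 node has both branches at 0.
bdd = Node(0, Node(1, 0, 0), Node(1, 0, 1))
assert evaluate(bdd, {0: 1, 1: 1}) == 1
assert evaluate(bdd, {0: 0, 1: 1}) == 0
reduced = reduce_bdd(bdd)
# After reduction, assigning x1 = 0 leads directly to the 0 leaf:
assert reduced.lo == 0 and isinstance(reduced.hi, Node)
```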
This means that for each possible pair of walks through the BDD, the variables that appear in both walks appear in the same order. In Figure 6.9 the variables are ordered in all three BDDs, and the rightmost BDD is also reduced and is therefore a ROBDD. For a given Boolean function and a given variable order there exists exactly one ROBDD. For a more thorough description of ROBDDs, see [Bry86].

Multiple output functions

A multiple output function corresponds to a combinational circuit with more than one output. Each output is a separate Boolean function, and the separate Boolean functions usually have the same input variables. It is possible to represent each output function with a separate ROBDD. Another possibility is to let the different output functions share nodes with common subfunctions in one ROBDD. It is then practical to use squares at the top of the ROBDD, as shown in Figure 6.10, to indicate which top node corresponds to which output. Figure 6.10 shows the ROBDD for the functions f1(x1, x2) = x1·x2 and f2(x1, x2) = x1 + x2.

[Figure 6.10: ROBDD for the two functions f1 and f2.]

Using this type of representation for multiple output functions, as in Figure 6.10, can be advantageous for some functions. It requires that the variable order in the ROBDD is the same for all output functions. For some multiple output functions it is more efficient to use a separate ROBDD for each output, because the freedom to choose the variable order individually for each output function can make the total number of nodes much smaller than in the shared multiple output ROBDD illustrated in Figure 6.10.

Upper bound on size of implementation

There is a direct mapping between a BDD representation of a Boolean function and an implementation with two-input multiplexers. Figure 6.11 illustrates this with an example.
Figure 6.11a shows a BDD for the function f(x1, x2, x3) = x1·x2·x3 + x1·x2·x3 + x1·x2·x3 and Figure 6.11b shows the corresponding implementation with two-input multiplexers.

[Figure 6.11: Implementation with multiplexers; (a) a BDD of f, (b) the corresponding network of two-input multiplexers, one per non-leaf node.]

The multiplexer implementation directly corresponding to a BDD representation is not normally optimal, but it gives an upper bound on the implementation size: one two-input multiplexer per non-leaf node of the ROBDD is sufficient to implement the function. To implement a two-input multiplexer, three two-input NAND gates and an inverter are sufficient.

6.2.3. Bound-sets and variable order of ROBDDs

To check whether a set of variables Y is a bound-set of a function, a ROBDD can be created with the variables in Y above the others. By all variables above we mean, in this context, that nodes with associated variables in Y come before other nodes on each possible walk through the ROBDD. We use the term cut in the ROBDD to represent this: the cut is a line such that nodes associated with some variables, in this case Y, are above it and the others below it. We use the term cut nodes for the nodes below the cut that are connected to an edge from above the cut. The number of cut nodes equals the row multiplicity in the corresponding decomposition chart described in Subsection 6.2.1.

[Figure 6.12: ROBDD with a cut; x1 and x2 are above the cut, x3 and x4 below, and the two cut nodes are marked with an extra circle.]

An example of a ROBDD of a function f(x1, x2, x3, x4) is shown in Figure 6.12. This ROBDD has a cut such that the variables x1 and x2 are above the cut and the others are below it. The cut nodes are indicated with an extra circle. The number of cut nodes is two, so the set {x1, x2} is a bound-set. This means that there exist functions g and h such that f(x1, x2, x3, x4) = h(g(x1, x2), x3, x4).
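The node-to-multiplexer mapping behind the upper bound above can be sketched directly. The function below is a hypothetical example (not the one in Figure 6.11); each non-leaf node of its BDD becomes one two-input multiplexer.

```python
from itertools import product

def mux(sel, a, b):
    # Two-input multiplexer: output a when sel = 0, b when sel = 1.
    return b if sel else a

# Hypothetical function f(x1, x2, x3) = x1·x2 + x1'·x3, whose BDD has
# three non-leaf nodes; the mapping uses exactly one mux per node.
def f_mux(x1, x2, x3):
    node_x2 = mux(x2, 0, 1)           # x2 node, leaves 0 and 1
    node_x3 = mux(x3, 0, 1)           # x3 node, leaves 0 and 1
    return mux(x1, node_x3, node_x2)  # top node selects between subfunctions

for x1, x2, x3 in product([0, 1], repeat=3):
    assert f_mux(x1, x2, x3) == ((x1 & x2) | ((1 - x1) & x3))
```

Three multiplexers thus suffice for this function; at three two-input NAND gates plus an inverter per multiplexer, this directly bounds the gate count from above.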
Figure 6.13 shows the ROBDDs for the functions h and g. Note that the ROBDD for function h has the same structure as the part of the ROBDD for f below the cut in Figure 6.12: the nodes above the cut are replaced with one node whose associated variable is the function value g, and the outgoing edges of that node go to the two nodes that are cut nodes in the ROBDD for f. The ROBDD for function g has the same structure as the part of the ROBDD for f above the cut, with the cut nodes replaced by the leaf nodes 1 and 0. Which of the terminal nodes in the ROBDD for g is 1 and which is 0 can be decided arbitrarily; the choice only affects the node with associated variable g in the ROBDD for h, in the sense that its outgoing edges should be interchanged if the terminal nodes in the ROBDD for g are interchanged.

[Figure 6.13: ROBDDs for the subfunctions g and h.]

For many functions, the size of a ROBDD depends heavily on the variable order; an example is described in [Bry86]. Changing the order of variables in a ROBDD is a relatively expensive operation in terms of computation time. To search for bound-sets, it is therefore not efficient to reorder the ROBDD so as to move each subset of variables to be checked to the top. In Chapter 7 an algorithm is presented that finds every bound-set among all linear intervals of variables in the ROBDD. A linear interval of variables is, in this context, a set of variables such that each variable not in the set comes either before all or after all variables in the set, where before and after refer to the variable order of the ROBDD. This algorithm is strong because for most Boolean functions there exists a variable order in which the ROBDD has a minimal number of nodes and in which all subsets of variables forming strong bound-sets lie in linear intervals.

6.2.4.
Related work about BDD based decomposition

In [Ash59] it was shown how row multiplicity can be used to check whether a subset of variables is a bound-set, as described in Subsection 6.2.1. The number of cut nodes in a ROBDD is equal to the row multiplicity in a decomposition chart. This has been utilized by a number of BDD-based decomposition algorithms, including [Cha96, Lai93, Saw98].

In Figure 6.11 it was shown how a BDD maps directly to an implementation with multiplexers. Pass transistor logic is a way of connecting transistors to implement logic of that type. Shelar and Sapatnekar [She01] have observed that such implementations often have unnecessarily long delays, and they have shown how a BDD can be partitioned into several BDDs, resulting in implementations with smaller delay. Their partitioning method yields a Roth-Karp decomposition with the multiple-valued variable encoded into binary variables, each represented by a BDD.

Stanion and Sechen [Sta95] presented a method that finds decompositions of the form f(X) = g(Y) • h(Z), where the bullet is any binary Boolean operation, Y ∪ Z = X and |Y ∩ Z| = k for some relatively small k ≥ 0. This type of decomposition is referred to as bi-decomposition. Mishchenko et al. [Mis01] presented another method to find bi-decompositions. Both methods use BDDs in efficient implementations.

ROBDDs are themselves a kind of decomposed representation of a function, and there are methods that exploit the structure of a ROBDD to find disjoint decompositions. In [Kar88] the classical concept of a dominator on graphs [Len79] is extended to 0,1-dominators on ROBDDs. A node v is a 1-dominator if every path from the root to the one-terminal node contains v. Likewise, a node v is a 0-dominator if every path from the root to the zero-terminal node contains v.
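These dominator definitions can be checked by simple path enumeration on a small BDD. The sketch below uses a hypothetical three-node ROBDD for f = (x1 + x2)·x3; the node names and the dict representation are illustrative only.

```python
# A tiny BDD as a dict: name -> (variable, low child, high child);
# the leaves are the strings '0' and '1'.
bdd = {
    'a': ('x1', 'b', 'c'),
    'b': ('x2', '0', 'c'),
    'c': ('x3', '0', '1'),
}

def paths_to(root, leaf):
    # All root-to-leaf paths, each as the list of internal nodes visited.
    if root == leaf:
        return [[]]
    if root not in bdd:
        return []
    _, lo, hi = bdd[root]
    return [[root] + p for child in (lo, hi) for p in paths_to(child, leaf)]

def dominators(root, leaf):
    # Nodes that lie on every path from the root to the given leaf.
    paths = paths_to(root, leaf)
    common = set(paths[0])
    for p in paths[1:]:
        common &= set(p)
    return common

# The x3 node 'c' lies on every path to the 1-terminal, so it is a
# 1-dominator, signalling the disjoint AND-decomposition (x1 + x2) AND x3.
assert dominators('a', '1') == {'a', 'c'}
```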
If v is a 1-dominator, then the function represented by the ROBDD possesses a disjoint AND-decomposition. This means that the inputs of the function can be divided into groups such that each input belongs to exactly one group, each group is a bound-set of the function, and the function value is an AND-function of the function values associated with these bound-sets. If v is a 0-dominator, the same type of decomposition is obtained, but with an OR-function instead of an AND-function. Yang et al. [Yan99] extended this idea to XOR-type decompositions and to more general types of dominators. Minato and De Micheli [Min98] presented an algorithm that computes disjoint decompositions by generating an irredundant sum-of-products for the function from its BDD and applying factorization.

The algorithm presented by Bertacco and Damiani [Ber97] makes a single traversal of the BDD to identify the decompositions of the cofactors and then combines them to obtain the decomposition of the entire function. However, as observed by Sasao and Matsuura [Sas98], it fails to compute some of the disjoint decompositions. This problem was corrected by Matsunaga [Mat98], who added the cases missing in [Ber97] so that OR-functions and XOR-functions are treated correctly. The algorithm in [Mat98] appears to be one of the fastest existing exact algorithms for finding all disjoint decompositions.

6.3 Decomposition for three-level logic synthesis

This section serves as a background to the contributions presented in Chapter 8. Subsection 6.3.1 describes the type of three-level logic used in this thesis and Subsection 6.3.2 describes related work in this area.

6.3.1. Three-level logic

Three-level logic is logic with a gate depth of three.
The decomposition types for three-level logic considered in this thesis can, for a Boolean function f(X), be expressed as f(X) = g1(X) • g2(X), where the bullet (•) is a binary operator and g1 and g2 are Boolean functions represented in SOP form. With g1 and g2 in SOP form, the bullet represents either an AND-operator or an XOR-operator. Figure 6.14 shows an example of a logic circuit where the bullet corresponds to an XOR-operator. Although the case where the bullet is an AND-operator and the case where it is an XOR-operator seem quite similar, the optimization strategies for the two cases differ considerably. The contribution of this thesis in three-level logic optimization is for the case where the bullet is an XOR-operator. Expressions and networks of this type are referred to as AND-OR-XOR logic. This contribution is presented in Chapter 8.

Figure 6.14: A three-level logic circuit with an XOR-gate at the third level

Three-level optimization is a trade-off between the flexibility of multilevel optimization and the small gate depth of two-level optimization. For many functions the required number of components is smaller with three-level logic than with two-level logic. Three-level optimization is particularly useful for PLA devices with logic expanders.

As described above, the three-level implementation is built up of the two functions g1 and g2. Each of these functions is realized as a two-level implementation, and their outputs are connected to the inputs of the gate at the third level. The cube representation, described in Subsection 2.3.3, can be extended to three-level logic circuits. To do this, a cube representation is made for both functions g1 and g2, and the implicants of both functions are included in it. Each implicant is marked such that it can be determined whether it belongs to function g1 or to function g2.
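The marked cube representation can be sketched concretely. The cube lists below are hypothetical illustrations (not the thesis's example functions), with each cube mapping a variable index to a required literal value:

```python
from itertools import product

def sop(cubes, x):
    """Evaluate a sum-of-products given as a list of cubes.  A cube maps
    a variable index to the required literal value (1 or 0); variables
    absent from the cube are don't-cares."""
    return int(any(all(x[i] == v for i, v in cube.items()) for cube in cubes))

# Hypothetical marked cube list of a three-level AND-OR-XOR circuit:
g1_cubes = [{0: 1, 1: 1}, {2: 1, 3: 1}]   # g1 = x1*x2 + x3*x4
g2_cubes = [{0: 0, 1: 0}, {2: 0, 3: 0}]   # g2 = x1'*x2' + x3'*x4'

def f(x):
    # The XOR-gate at the third level combines the two SOP blocks.
    return sop(g1_cubes, x) ^ sop(g2_cubes, x)

# A minterm covered by implicants of exactly one of g1 and g2 gives 1;
# a minterm covered by both (or by neither) gives 0.
print(f([1, 1, 1, 1]))   # covered by g1 only -> 1
print(f([0, 0, 1, 1]))   # covered by both g1 and g2 -> 0
```

Replacing the `^` by `&` models the case where the third-level gate is an AND-gate instead.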
In the case where the third level of logic is an XOR-gate, the minterms included in implicants of exactly one of the functions g1 and g2 have function value 1. The minterms included in implicants of both functions have function value 0, as do the minterms not covered by any implicant. In the case where the third level is an AND-gate, only minterms included in implicants of both functions g1 and g2 have function value 1.

Figure 6.15: Karnaugh map used for AND-OR-XOR logic

In Figure 6.15 the cube representation of the function implemented as in Figure 6.14 is projected onto a Karnaugh map. The solid implicants belong to function g1 = x1·x2 + x3·x4 and the dotted implicants belong to function g2. When building f = g1 ⊕ g2 the result is the function represented by the zeros and ones in this Karnaugh map. As for two-level logic, the number of implicants is useful as an estimate of implementation cost. This cost estimate is especially accurate when a PLA with a logic expander is used.

6.3.2. Related work in three-level logic

Several methods for optimization of AND-OR-XOR logic have been proposed [Cha97, Deb98, Dub97a, Dub99, Jab00, Jab02, Pra08, Sas95]. For example, Pradhan et al. [Pra08] used a genetic algorithm and focused more on power aspects than previous articles on AND-OR-XOR minimization. The results of the algorithms presented in these articles vary a great deal between different Boolean functions: for some functions the result is much better than for two-level logic, while for others there is no considerable difference. Because three-level optimization algorithms are quite time consuming, it is valuable to know a priori whether a Boolean function is likely to benefit from AND-OR-XOR optimization. The contribution presented in Chapter 8 of this thesis is a fast algorithm for estimating the benefit of AND-OR-XOR optimization.
In an algorithm presented by Dubrova et al. [Dub99], a preprocessing step considers clusters of intersecting cubes to predict the benefit of optimization for AND-OR-XOR. An algorithm that analyzes the structure of a BDD to predict the benefit of minimization for XOR-type logic was presented by Sun and Xia [Sun08].

6.4 Other applications of Boolean decomposition

Boolean decomposition is one of the main operations performed during logic optimization. Besides this, there are several other situations where Boolean decomposition can be useful. Subsections 6.4.1, 6.4.2 and 6.4.3 briefly describe how decomposition can be useful for circuit partitioning, for simplification of testing and for power estimation in digital circuits.

6.4.1. Circuit partitioning

Partitioning is the process of dividing a circuit into two or more parts such that each part fits into an available component. The types of available components and interconnections define constraints on the partitioning. Minimizing the number of interconnections between parts is often an important objective. There are many articles about partitioning algorithms, including [Dut96, Fid82, Kri84, Li06]. Some partitioning algorithms only deal with bi-partitioning, that is, partitioning into two parts. In most cases the two parts should not differ too much in size if the bi-partitioning is to be useful. Bi-partitioning algorithms use some kind of balancing criterion to achieve this; for example, there may be a criterion requiring that each part contain 45% to 55% of the circuit.

Let f(X) be a Boolean function such that f(X) = h(g1(Y), …, gk(Y), Z), where X = Y ∪ Z. This expression of function f is the Roth-Karp decomposition with the integer-valued function encoded into k binary functions. Assume that f(X) should be bi-partitioned.
Then a part of the function should be implemented in one part, say A, and the rest of the function in another part, say B. Assume that the inputs and outputs of the function are available in both parts. Then the functions g1(Y), …, gi(Y) can be implemented in part A for some i ≤ k, while function h and the functions gi+1(Y), …, gk(Y) are implemented in part B. Then only i interconnections between part A and part B are required.

In Section 6.3 a decomposition of the type f(X) = g1(X) • g2(X) was described, and a contribution for such decompositions is presented in Chapter 8. For many benchmark functions that benefit greatly from this type of decomposition, the complexity of function g1 is in a similar range as the complexity of g2. For such functions a good bi-partitioning is to put function g1 in one part and function g2 in the other. The gate represented by the bullet can be put in either of these parts. For this type of partitioning only one interconnection between the parts is needed, assuming that the inputs are available in both parts.

6.4.2. Circuit partitioning for simplification of testing

Subsection 2.4.3, regarding test generation, describes fault propagation and fault activation. At the logic level of abstraction, propagation and activation of faults are often the most complex parts of test generation, and the more complex the design, the more difficult they are. Decomposition can facilitate testing by partitioning the circuit into smaller parts.

Figure 6.16: Decomposition facilitating testing (a: test of g(Y); b: test of h(g, Z))

The disjoint decomposition of type f(X) = h(g(Y), Z), where Y ∪ Z = X and Y ∩ Z = ∅, is a good example of how decomposition can facilitate testing. Figure 6.16a illustrates how the block that implements function g can be tested.
The variables in set Z are assigned constant values such that the output of the block implementing g is propagated through the block implementing h to the output of the logic. Test vectors can then be applied to the set Y to test the block that implements function g. Figure 6.16b illustrates the test of the block that implements function h. The inputs in set Z and the input from g should then be assigned test vectors. All inputs in set Y except one are assigned constant values such that the remaining input in set Y is propagated through block g to block h. In this way test vectors for function h can be applied at the inputs. It is possible that the constant values applied to make a signal propagate through a block cause the propagated signal to be inverted. In such a case the same method can be used, but the test vectors and the analysis of the results need to be adjusted accordingly.

6.4.3. Power estimation

Power consumption in CMOS devices is highly dependent on the switching activity in internal nodes. The switching activity of a node is a quantity that describes how often the node changes value. Switching activity estimation is computationally difficult [Cho96, Cos97]. If a disjoint decomposition is known and used in the implementation, switching activity can be computed separately for the different parts into which the decomposition divides the circuit. In this way switching activity computations can be facilitated.

Chapter 7

A fast algorithm for finding bound-sets

In this chapter a fast heuristic algorithm is presented that finds disjoint decompositions of Boolean functions. In the following, this algorithm is referred to as the Interval-cut algorithm.

7.1 Basic idea of the Interval-cut algorithm

In Subsection 6.2.3 we described how a set of variables that is above all other variables in an ROBDD can be checked to determine whether it is a bound-set.
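For intuition, the condition behind that check (row multiplicity of at most two, which equals the number of cut nodes) can be verified directly on a truth table. The following sketch uses brute-force enumeration and hypothetical helper names, so it is only practical for small n:

```python
from itertools import product

def is_bound_set(f, n, ys):
    """Y (the index set `ys`) is a bound-set of the n-variable function f,
    i.e. f(X) = h(g(Y), Z), exactly when fixing the Y-variables yields at
    most two distinct subfunctions of the remaining variables Z (row
    multiplicity at most two)."""
    zs = [i for i in range(n) if i not in ys]
    subfunctions = set()
    for y in product([0, 1], repeat=len(ys)):
        column = []
        for z in product([0, 1], repeat=len(zs)):
            x = [0] * n
            for idx, v in zip(ys, y):
                x[idx] = v
            for idx, v in zip(zs, z):
                x[idx] = v
            column.append(f(x))
        subfunctions.add(tuple(column))
    return len(subfunctions) <= 2

# f = (x1 XOR x2) AND x3: {x1, x2} is a bound-set, {x1, x3} is not.
f = lambda x: (x[0] ^ x[1]) & x[2]
print(is_bound_set(f, 3, [0, 1]))   # True
print(is_bound_set(f, 3, [0, 2]))   # False
```

The ROBDD-based check of Subsection 6.2.3 computes the same quantity, but in time proportional to the size of the diagram rather than to 2^n.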
A drawback of this method is that the set of variables to be checked must be moved to the top of the ROBDD, which requires computationally expensive reordering. The Interval-cut algorithm can instead check any interval of variables that are adjacent in the ROBDD. To do so, two cuts are used instead of one. The upper cut is a boundary line between the variables to be checked and the variables above them in the ROBDD; the lower cut is a boundary line between the variables to be checked and the variables below them in the ROBDD.

Figure 7.1: Illustration of the Interval-cut algorithm

The ROBDD of the following Boolean function is shown in Figure 7.1a:

f(x1, x2, x3, x4) = (x1·x2)·x3·x4 + (x1·x2)·x4 + (x1·x2)·x3 + x1·x4

This function is used as an example, where the variable set {x1, x2} should be checked to determine whether it is a bound-set. First, the cut nodes of the upper cut which are above the lower cut are identified. In this example there are two such nodes, those with x1 as associated variable. The sub-ROBDDs with top nodes at these nodes represent other Boolean functions; in this example, those functions are shown in Figure 7.1b and Figure 7.1c. A necessary condition for {x1, x2} to be a bound-set of the original function is that it is a bound-set of those functions as well. In these sub-ROBDDs the variables x1 and x2 are above the lower cut and the other variables are below it. Hence the number of cut nodes with respect to the lower cut can be checked for these subfunctions. If it is two, the set {x1, x2} is a bound-set with respect to those subfunctions. There are two cut nodes to the lower cut in both functions in Figure 7.1b and Figure 7.1c.
In Figure 7.1b these are the node with associated variable x4 and the terminal node with constant 1, while in Figure 7.1c they are the two nodes with associated variable x4. The necessary condition is thus fulfilled, because there are two cut nodes in both of these functions.

Given that this necessary condition is fulfilled, {x1, x2} is a bound-set of the original function in Figure 7.1a if and only if the functions associated with the bound-set in Figure 7.1b and Figure 7.1c are equal up to complementation of their function value. The functions associated with the bound-set in Figure 7.1b and Figure 7.1c can be extracted by replacing the cut nodes of the lower cut with terminal nodes 1 and 0; this is done in Figure 7.1d and Figure 7.1e. Because the ROBDD of a function is unique for a given variable order, two functions are equal if and only if their ROBDDs are equal under the same variable order. The ROBDDs in Figure 7.1d and Figure 7.1e are equal; hence {x1, x2} is a bound-set of the original function. The polarity of the associated function depends on which of the cut nodes is replaced by terminal node 1 and which is replaced by terminal node 0.

The Interval-cut algorithm uses this method to check every set of variables that are adjacent in the ROBDD. It does not have to build any new ROBDDs as in the demonstration example; rather, it performs the analysis directly on the existing ROBDD. The adjacent sets of variables are checked in an order such that information from one check can be reused in the next check, see Section 7.3. Also, some checks can be avoided because previous checks imply that a given set cannot be a bound-set.

7.2 Interval-cut algorithm and formal proof of its functionality

This section formally describes the functionality of the Interval-cut algorithm and gives a proof of it.

7.2.1.
Terminology, definitions and notations

Let V be the set of nodes of an ROBDD G of an n-variable function f(X). Every non-terminal node v ∈ V has an associated variable index, index(v) ∈ {1, …, n}. We let these indices increase when going from the top node to the leaf nodes in the ROBDD; the index of the top node is then 1. In order to have a unified notation in the proof of the main result, we also let the terminal nodes have an index, which is n + 1.

Definition: cut(i)
Let cut(i) be a boundary line in the ROBDD such that nodes with associated variable index index(v) ≤ i are above this cut and nodes with index(v) > i are below this cut.

Definition: cut_set(G, i)
Let cut_set(G, i) denote the subset of nodes in the ROBDD G which are below cut(i) and which have at least one edge connected to a node above cut(i).

Definition: below_set(G, i) and above_set(G, i)
Let below_set(G, i) denote the nodes in the ROBDD G which are below cut(i), and let above_set(G, i) denote the nodes that are above cut(i).

Definition: sub_bdd(G, v)
Let v be a node in the ROBDD G; then sub_bdd(G, v) denotes the subpart of the ROBDD G that is rooted at node v.

Definition: trunc(G, i)
This operator is only applicable when |cut_set(G, i)| = 2. Let trunc(G, i) be the part of the ROBDD G with the two nodes in cut_set(G, i) replaced by terminal nodes 0 and 1 and with all other nodes below cut(i) removed. The decision of which node in cut_set(G, i) is replaced with terminal node 0 and which with terminal node 1 is made deterministically. This means that a pair of calls to trunc(G, i) cannot give ROBDDs representing functions that are mutual inverses of each other.

7.2.2. Algorithm and proof

Let a, b be integer values such that 0 ≤ a < b ≤ n.
Let the variable set Y be the set of variables with indices a < index(v) ≤ b. This means that the set Y contains the variables associated with the nodes in the ROBDD between cut(a) and cut(b). Let Z denote the variables not in Y. Using this notation, the pseudocode of the Interval-cut algorithm is as follows.

1   ALGORITHM Interval-cut algorithm (G, a, b)
2     (V1, …, Vm) ← cut_set(G, a) ∩ above_set(G, b)
3     FOR ALL i ∈ [1, m] DO
4       Ui ← sub_bdd(G, Vi)
5       IF |cut_set(Ui, b)| ≠ 2 THEN
6         RETURN "Y is not a bound-set"
7       END IF
8     END FOR ALL
9     FOR ALL i ∈ [2, m] DO
10      IF NOT trunc(U1, b) ≡ trunc(Ui, b) THEN
11        RETURN "Y is not a bound-set"
12      END IF
13    END FOR ALL
14    h(g, Z) ← Function of ROBDD G with the nodes between cut(a) and cut(b) replaced by nodes with associated variable g; there is one such node for each trunc(Ui, b), i ∈ [1, m]
15    g(Y) ← Function of ROBDD trunc(U1, b)
16    RETURN (h(g, Z), g(Y))
17  END ALGORITHM

Next, we prove that the algorithm computes the decompositions correctly.

Theorem 7.1: Interval-cut algorithm (G, a, b) determines unambiguously whether a decomposition f(X) = h(g(Y), Z) exists, where X = Y ∪ Z, Y ∩ Z = ∅, a < b, Y is the set of variables between cut(a) and cut(b) in the ROBDD G, and Z is the set of variables above cut(a) or below cut(b).

Proof: Let Z1 be the variables above cut(a) and let Z2 be the variables below cut(b); then Z1 ∪ Z2 = Z.

Let pi(Z1), for i = 1, …, m, be the Boolean functions that are 1 for all assignments of the variables in Z1 that lead the path from the root to the top node of the respective sub-ROBDD Ui, and 0 for all other assignments. Let fi(Y, Z2) be the Boolean function represented by the sub-ROBDD Ui. The Boolean function f(X) can then be co-factored in the following way, where q(Z1, Z2) collects the part of f whose paths do not pass through any of the nodes Vi:

f(X) = Σ(i=1 to m) pi(Z1) · fi(Y, Z2) + q(Z1, Z2)    (7.1)

The set Y is a bound-set of fi(Y, Z2) if and only if |cut_set(Ui, b)| = 2.
The definitions of the functions fi(Y, Z2) and the properties of ROBDDs imply that |cut_set(Ui, b)| ≥ 2 for all i and that at least one variable in the set Y belongs to the support set of fi(Y, Z2) for all i. If Y is not a bound-set of fi(Y, Z2) for some i, then it follows from Equation (7.1) that Y cannot be a bound-set of f(X).

In the case when |cut_set(Ui, b)| = 2 for all i, there exist functions hi and gi such that the following holds for all i:

fi(Y, Z2) = hi(gi(Y), Z2)    (7.2)

Function f(X) can then be written as:

f(X) = Σ(i=1 to m) pi(Z1) · hi(gi(Y), Z2) + q(Z1, Z2)    (7.3)

If two of the functions gi are not identical up to complementation, Y cannot be a bound-set. If they are all identical up to complementation, expression (7.3) can be rewritten as follows, where g(Y) = g1(Y), ci = 0 for the i where gi ≡ g1, and ci = 1 for the i where gi is the complement of g1:

f(X) = Σ(i=1 to m) pi(Z1) · hi(ci ⊕ g(Y), Z2) + q(Z1, Z2)    (7.4)

Hence Y is a bound-set of f(X). □

7.3 Implementation aspects and complexity analysis

The formal description of the algorithm in Section 7.2 only describes how a single subset of variables is checked to determine whether it is a bound-set. The method becomes efficient, however, when all linear intervals of variables are checked in the same run. This can be done by moving the upper cut from the top to the bottom of the ROBDD in a loop. For each upper cut it is then possible to identify the lower cuts for which the set of variables between the cuts is a bound-set. We show in this section how this can be performed in O(m³) operations, where m is the number of nodes in the ROBDD. The algorithm consists of two parts. First a list is generated with all bound-sets that have adjacent variables in the ROBDD.
In the second step that list is processed such that bound-sets known to be weak are removed from the list, and bound-sets not known to be weak are labeled as prime or full. It is possible that the list contains weak bound-sets that have not been identified as weak; such a bound-set will be labeled full if it is part of a full bound-set.

To show that the algorithm can check every linear interval of variables in O(m³) operations in total, an implementation is described in this section and it is proven that this implementation needs O(m³) operations. The description of the algorithm is divided into subfunctions. The function at the top of the calling hierarchy is ImplementationIntervalCut, shown below. It first calls subfunction GetBoundsetList and then calls ProcessFoundBoundset. Each of these functions is called only once, and the complexity of each is O(m³). Because there is no loop in the top function ImplementationIntervalCut, its complexity is then O(m³). Subsection 7.3.1 describes how subfunction GetBoundsetList can be implemented in O(m³) operations, and Subsection 7.3.2 shows the same for subfunction ProcessFoundBoundset.

1   FUNCTION strongBsList ← ImplementationIntervalCut(topNode)
2     bsList ← GetBoundsetList(topNode)
3     strongBsList ← ProcessFoundBoundset(bsList)
4   END FUNCTION

In the functions of this implementation, sets of variables are used. The sets of variables considered are only intervals of adjacent variables in the variable order of the ROBDD; therefore only the indices of the first and the last variable, or equivalently the upper cut and the lower cut, are needed to represent a set of variables. Comparisons between variable sets and assignments of variable sets can then be made in O(1) operations. Indexed lists are used in the description of the algorithms; indexing of such lists starts at 0.
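As a reference model for what the implementation must compute, the interval check of Section 7.2 can be restated on truth tables. This brute-force sketch (with hypothetical names, not the thesis implementation) checks one interval at a time:

```python
from itertools import product

def interval_is_bound_set(f, n, a, b):
    """Check whether the variables at positions a..b-1 (in a fixed order
    0..n-1) form a bound-set of f, mimicking the two-cut idea: for each
    assignment u of the variables above the interval, the induced
    subfunction must have at most two distinct subfunctions over the
    variables below the interval (two lower-cut nodes), and the
    associated functions g must all agree up to complementation."""
    n_top, n_mid, n_bot = a, b - a, n - b
    g_funcs = set()
    for u in product([0, 1], repeat=n_top):
        cols = {}
        for y in product([0, 1], repeat=n_mid):
            cols[y] = tuple(
                f(list(u) + list(y) + list(z))
                for z in product([0, 1], repeat=n_bot)
            )
        distinct = set(cols.values())
        if len(distinct) > 2:
            return False               # more than two lower-cut nodes
        if len(distinct) == 2:
            c0 = min(distinct)         # deterministic 0/1 labelling
            g = tuple(int(cols[y] != c0) for y in sorted(cols))
            # store g canonically, up to complementation
            g_funcs.add(min(g, tuple(1 - v for v in g)))
    return len(g_funcs) <= 1

# {x2, x3} (positions 1..2) is a bound-set of (x2 XOR x3) OR (x1 AND x4);
# the interval of positions 0..1 is not.
f = lambda x: (x[1] ^ x[2]) | (x[0] & x[3])
print(interval_is_bound_set(f, 4, 1, 3))   # True
print(interval_is_bound_set(f, 4, 0, 2))   # False
```

The implementation described next obtains the same answers for all intervals at once, but by analyzing the ROBDD structure instead of enumerating assignments.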
There is at least one node in the ROBDD for each variable that a function depends on; therefore n ≤ m, where n is the number of variables.

7.3.1. Generating a list of bound-sets

The function that generates the list of bound-sets is described by a main function and two subfunctions. The main function is GetBoundsetList. It calls the two subfunctions GetLevelWhereStructuresDiffers and GetLevelsWithTwoCutNodes. The former takes two nodes of the ROBDD as input arguments and computes the level at which the structures of the sub-ROBDDs rooted at these nodes differ. The latter takes one node of the ROBDD as input argument and returns a list of the cuts where the sub-ROBDD rooted at this node has two cut nodes. The complexity is O(m²) for both of these subfunctions.

1   FUNCTION bsList ← GetBoundsetList(topNode)
2     LENGTH(bsList) ← 0
3     LENGTH(U(1)) ← 1
4     U(1, 0) ← topNode
5     FOR i ← 2 TO n
6       LENGTH(U(i)) ← 0
7     END FOR
8     FOR i ← 1 TO n
9       nextLevelWithUpperCutNodes ← n + 1
10      FOR j ← n DOWNTO i + 1
11        IF LENGTH(U(j)) > 0
12          nextLevelWithUpperCutNodes ← j
13      END FOR
14      differLevel ← n + 1
15      FOR j ← 1 TO LENGTH(U(i)) - 1
16        level ← CALL GetLevelWhereStructuresDiffers(U(i, 0), U(i, j))
17        differLevel ← MIN(differLevel, level)
18      END FOR
19      maxLowerCut ← MIN(nextLevelWithUpperCutNodes, differLevel)
20      levelList ← GetLevelsWithTwoCutNodes(U(i, 0), maxLowerCut)
21      FOR j ← 0 TO LENGTH(levelList) - 1
22        LENGTH(bsList) ← LENGTH(bsList) + 1
23        bsList(LENGTH(bsList) - 1).UpperCut ← i
24        bsList(LENGTH(bsList) - 1).LowerCut ← levelList(j)
25      END FOR
26      FOR j ← 0 TO LENGTH(U(i)) - 1
27        FOR EACH SUCCESSOR NODE s OF U(i, j)
28          p ← index of variable associated with s
29          IF s NOT in U(p)
30            ADD s TO U(p)
31          END IF
32        END FOR EACH
33      END FOR
34    END FOR

The loop at line 8 of function GetBoundsetList loops over the upper cuts. The list of lists U contains the cut nodes for the current upper cut.
There is one list in U for each variable, and the first index indicates the variable. Initially the cut is for variable 1, and lines 3 – 7 initialize U accordingly. Lines 26 – 32 update U in each iteration of the loop at line 8. The loop at line 26 iterates fewer than m times and the loop at line 27 iterates twice. The check at line 29 needs to iterate over the list U(p), which has fewer than m elements. Updating U therefore needs fewer than m² operations, and this is done inside the loop at line 8, which iterates n times; updating U thus needs O(m²n) ≤ O(m³) operations in total.

At lines 9 – 13 the variable nextLevelWithUpperCutNodes is assigned the lowest variable index of an upper cut node among those with a variable index higher than the index of the current cut. The loop at line 10 iterates fewer than n times and is inside the loop at line 8; the number of operations for this is therefore O(n²) ≤ O(m³).

Lines 14 – 18 compare the structures of the parts of the ROBDD rooted at the upper cut nodes with variable index equal to the current upper cut, i. This operation finds the highest variable index for which all these structures are equal. These operations occur inside the loops at line 8 and line 15. The loop at line 8 iterates n times and the loop at line 15 iterates fewer than m times; the lengths of these loops are, however, dependent in such a way that the body of the loop at line 15 executes fewer than 2m times in total. In that loop the subfunction GetLevelWhereStructuresDiffers is called. That function needs O(m²) operations, and because it is called fewer than 2m times, O(m³) operations are needed in total for lines 14 – 18.

At line 20 the subfunction GetLevelsWithTwoCutNodes is called. It returns a list of lower cuts. The set of variables between the current upper cut and every lower cut in that list is a bound-set.
Function GetLevelsWithTwoCutNodes needs O(m²) operations and is called inside the loop at line 8, which has n iterations; therefore O(m²n) ≤ O(m³) operations are needed in total for this operation. Lines 21 – 25 add the newly found bound-sets to the list of bound-sets. The loop at line 21 iterates fewer than n times and is inside the loop at line 8, which iterates n times. These lines therefore need O(n²) ≤ O(m³) operations in total.

1   FUNCTION level ← GetLevelWhereStructuresDiffers(A, B)
2     currentLevel ← variable index of A
3     LENGTH(La) ← 1
4     La(0) ← A
5     LENGTH(Lb) ← 1
6     Lb(0) ← B
7     differenceReached ← FALSE
8     WHILE (NOT differenceReached) AND currentLevel ≤ n
9       i ← 0
10      WHILE (NOT differenceReached) AND i < LENGTH(La)
11        IF variable index of La(i) = currentLevel
12          FOR s ← 0 TO 1
13            successorNodeFoundInList ← FALSE
14            FOR j ← 0 TO LENGTH(La) - 1
15              IF La(j) = successor node s of La(i) XOR
16                 Lb(j) = successor node s of Lb(i)
17                level ← currentLevel + 1
18                differenceReached ← TRUE
19              ELSE IF La(j) = successor node s of La(i)
20                successorNodeFoundInList ← TRUE
21              END IF
22            END FOR
23            IF NOT successorNodeFoundInList
24              LENGTH(La) ← LENGTH(La) + 1
25              La(LENGTH(La) - 1) ← successor node s of La(i)
26              LENGTH(Lb) ← LENGTH(Lb) + 1
27              Lb(LENGTH(Lb) - 1) ← successor node s of Lb(i)
28            END IF
29          END FOR
30        END IF
31        i ← i + 1
32      END WHILE
33      i ← 0
34      WHILE i < LENGTH(La)
35        IF variable index of La(i) = currentLevel
36          La(i) ← La(LENGTH(La) - 1)
37          LENGTH(La) ← LENGTH(La) - 1
38          Lb(i) ← Lb(LENGTH(Lb) - 1)
39          LENGTH(Lb) ← LENGTH(Lb) - 1
40        ELSE
41          i ← i + 1
42        END IF
43      END WHILE
44      currentLevel ← currentLevel + 1
45    END WHILE
46    IF VALUE NOT ASSIGNED TO level
47      level ← n + 1
48    END IF

The function GetLevelWhereStructuresDiffers takes two nodes A and B of the ROBDD as input arguments. It computes the level at which the structures of the sub-ROBDDs rooted at A and B differ. It runs in O(m²) operations.
List La contains the cut nodes found for the sub-ROBDD rooted at node A, and list Lb contains the cut nodes found for the sub-ROBDD rooted at node B. At lines 3 – 6 these lists are initialized to contain node A and node B, respectively. The loop at line 8 runs through the variables as long as no difference in structure has been found. In this loop the lists La and Lb are traversed by the loop at line 10; the lengths of La and Lb are equal. These two loops run fewer than n and m iterations, respectively, so their complexity is O(mn) ≤ O(m²).

The conditional statement at line 11 is true for the elements of list La whose variable index equals currentLevel. In total this is true fewer than m times during the iterations of the loops at lines 8 and 10. Replacing La with Lb in the expression at line 11 would make no difference. The loop at line 14 iterates through the lists La and Lb to check whether any successors of the nodes in La and Lb are located such that the structures differ below the level currentLevel. This loop iterates over fewer than m elements, and because the conditional statement at line 11 is true fewer than m times, O(m²) operations in total are executed within the loop at line 14.

Lines 33 – 42 iterate through the lists La and Lb to remove the nodes with the current variable index before the function proceeds to the next level. The loop at line 34 makes fewer than m iterations. This loop is inside the loop at line 8, so this part of the function needs O(mn) ≤ O(m²) operations in total.
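The cut-node lists La, Lb and L that these functions maintain correspond to cut_set from Subsection 7.2.1. On an explicit node structure the set can be computed as follows (an illustrative sketch, not the thesis implementation; the class and function names are mine):

```python
class Node:
    """A ROBDD node.  Terminal nodes get index n + 1, matching the
    convention in Subsection 7.2.1, so every node has a variable index."""
    def __init__(self, index, low=None, high=None, value=None):
        self.index = index    # 1-based variable index; n + 1 for terminals
        self.low = low        # 0-successor
        self.high = high      # 1-successor
        self.value = value    # 0 or 1 for terminal nodes

def cut_set(root, i):
    """Nodes below cut(i) that have an edge from a node above cut(i),
    for 1 <= i <= n."""
    result, seen = [], set()
    def visit(v):
        if v is None or id(v) in seen:
            return
        seen.add(id(v))
        if v.index > i:
            result.append(v)  # cut node: below the cut, reached from above
            return
        visit(v.low)
        visit(v.high)
    visit(root)
    return result

# ROBDD of x1 AND x2 over n = 2 variables (terminals have index 3):
t0, t1 = Node(3, value=0), Node(3, value=1)
v2 = Node(2, low=t0, high=t1)
root = Node(1, low=t0, high=v2)
print(len(cut_set(root, 1)))   # 2 cut nodes below cut(1)
print(len(cut_set(root, 2)))   # 2 cut nodes below cut(2)
```

A cut-node count of exactly two at some level is precisely the condition that the listing below, GetLevelsWithTwoCutNodes, records for each cut.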
1   FUNCTION levelList ← GetLevelsWithTwoCutNodes(U, maxLowerCut)
2     LENGTH(levelList) ← 0
3     LENGTH(L) ← 1
4     L(0) ← U
5     FOR i ← variable index of U TO maxLowerCut
6       j ← 0
7       WHILE j < LENGTH(L)
8         IF variable index of L(j) = i
9           successorNodeZeroInList ← FALSE
10          successorNodeOneInList ← FALSE
11          FOR k ← 0 TO LENGTH(L) - 1
12            IF successor node 0 of L(j) = L(k)
13              successorNodeZeroInList ← TRUE
14            IF successor node 1 of L(j) = L(k)
15              successorNodeOneInList ← TRUE
16          END FOR
17          IF NOT successorNodeZeroInList
18            LENGTH(L) ← LENGTH(L) + 1
19            L(LENGTH(L) - 1) ← successor node 0 of L(j)
20          END IF
21          IF NOT successorNodeOneInList
22            LENGTH(L) ← LENGTH(L) + 1
23            L(LENGTH(L) - 1) ← successor node 1 of L(j)
24          END IF
25          L(j) ← L(LENGTH(L) - 1)
26          LENGTH(L) ← LENGTH(L) - 1
27        ELSE
28          j ← j + 1
29        END IF
30      END WHILE
31      IF LENGTH(L) = 2
32        LENGTH(levelList) ← LENGTH(levelList) + 1
33        levelList(LENGTH(levelList) - 1) ← i
34      END IF
35    END FOR

The function GetLevelsWithTwoCutNodes returns a list of the cuts at which the sub-ROBDD rooted at U has two cut nodes. This function runs in O(m²) operations. The list L, initialized at lines 3 – 4, contains the cut nodes for the current cut; the current cut is represented by the variable i defined in the loop at line 5. Lines 7 – 30 update the list L each time i is increased. The loop at line 5 iterates at most n times and the loop at line 7 iterates fewer than m times; the statements in the loop at line 7 therefore execute O(mn) ≤ O(m²) times in total. The conditional statement at line 8 is true fewer than m times during execution of the loops at lines 5 and 7. The loop at line 11 iterates fewer than m times each time it is reached; the operations in the loop at line 11 therefore execute O(m²) times in total. Lines 31 – 34 check whether the number of cut nodes is two, and if so, an element containing the current cut is added to levelList.

7.3.2.
Processing the list of bound-sets

The function GetBoundsetList in Subsection 7.3.1 gives a list with all the O(n²) bound-sets that have adjacent variables in the ROBDD. From this list the function ProcessFoundBoundset extracts the O(n) bound-sets that are not known to be weak based on the information in the list. The extracted bound-sets that can be determined, with the help of that list, to be full or part of a full bound-set are labeled full; the other extracted bound-sets are labeled prime. The list of bound-sets generated by the function GetBoundsetList is sorted in ascending order, first by upper cut and then by lower cut. The function ProcessFoundBoundset utilizes this sort order in conjunction with the fact that all the bound-sets with adjacent variables in the ROBDD are in the list.

 1  FUNCTION strongBsList ← ProcessFoundBoundset(bsList)
 2    LENGTH(strongBsList) ← 0
 3    FOR i ← 0 TO LENGTH(bsList) − 1
 4      IF NOT bsList[i] marked to be weak or considered
 5        overlapFound ← FALSE
 6        j ← i + 1
 7        WHILE NOT overlapFound AND j < LENGTH(bsList)
 8          IF (bsList[i].upperCut < bsList[j].upperCut AND
               bsList[j].upperCut < bsList[i].lowerCut AND
               bsList[i].lowerCut < bsList[j].lowerCut)
 9            overlapFound ← TRUE
10            upperCutStrongBs ← bsList[i].upperCut
11            lowerCutStrongBs ← bsList[j].lowerCut
12            LENGTH(innerCuts) ← 2
13            innerCuts[0] ← bsList[j].upperCut
14            innerCuts[1] ← bsList[i].lowerCut
15          END IF
16          j ← j + 1
17        END WHILE
18        IF overlapFound
19          WHILE (j < LENGTH(bsList) AND innerCuts[0] = bsList[j].upperCut)
20            LENGTH(innerCuts) ← LENGTH(innerCuts) + 1
21            innerCuts[LENGTH(innerCuts) − 1] ← lowerCutStrongBs
22            lowerCutStrongBs ← bsList[j].lowerCut
23            j ← j + 1
24          END WHILE
25          k ← 0
26          j ← 0
27          WHILE j < LENGTH(bsList) AND k < LENGTH(innerCuts)
28            IF bsList[j].upperCut ≥ innerCuts[k]
29              k ← k + 1
30            END IF
31            IF (bsList[j].upperCut ≥ upperCutStrongBs AND
                 bsList[j].lowerCut ≤ lowerCutStrongBs AND
                 bsList[j].lowerCut > innerCuts[k])
32              mark bsList[j] to be weak or considered
33            END IF
34            j ← j + 1
35          END WHILE
36          LENGTH(strongBsList) ← LENGTH(strongBsList) + 1
37          index ← LENGTH(strongBsList) − 1
38          strongBsList[index].upperCut ← upperCutStrongBs
39          strongBsList[index].lowerCut ← lowerCutStrongBs
40          strongBsList[index].type ← full bound-set
41        ELSE
42          LENGTH(strongBsList) ← LENGTH(strongBsList) + 1
43          index ← LENGTH(strongBsList) − 1
44          strongBsList[index].upperCut ← bsList[i].upperCut
45          strongBsList[index].lowerCut ← bsList[i].lowerCut
46          strongBsList[index].type ← prime bound-set
47        END IF
48      END IF
49    END FOR

The loop at line 3 iterates over the list of bound-sets, which has less than n² elements. The conditional statement at line 4 is true once for each of the bound-sets that cannot be determined to be weak based on the list of bound-sets, which is less than 2n bound-sets.

In the loop at line 7, each bound-set with an index larger than i in the list of bound-sets is checked to determine whether it overlaps with the bound-set with index i. This check is made at line 8. If no overlapping bound-set is found, the bound-set with index i is added to the list of strong bound-sets and labeled prime. This is done at lines 41 – 46. Lines 10 – 14 and lines 19 – 40 have two purposes. The first is to identify, among the bound-sets that will be labeled full, the one to associate with the weak bound-set with index i, and to add the identified bound-set to the list of strong bound-sets strongBsList. The second is to mark, in the list bsList, all weak bound-sets associated with that full bound-set as weak or considered. The strong bound-set itself is also marked in this way. Lines 10 – 14 and lines 19 – 24 find the full bound-set with which the weak bound-set with index i is associated. This full bound-set is represented by the variables upperCutStrongBs and lowerCutStrongBs. In those lines the list innerCuts is also generated.
These are the cuts between upperCutStrongBs and lowerCutStrongBs. With the help of these cuts it is possible to find all weak bound-sets associated with the bound-set currently being labeled full. Lines 27 – 34 iterate over the bound-sets with an index larger than i in list bsList and mark the full bound-set currently under consideration and all weak bound-sets associated with it. This check is done at line 31. It is determined whether any cut in innerCuts lies between the upper and lower cut of the bound-set with index j, and whether all variables of the bound-set with index j are also variables of the full bound-set under consideration. Because of the way the list bsList is sorted, it is sufficient to check whether the cut with index k in the list innerCuts lies between the upper and lower cut of the bound-set with index j in bsList. Lines 36 – 40 add the full bound-set under consideration to the list of strong bound-sets strongBsList. The loops at line 7, line 19 and line 27 all iterate over a part of the list of bound-sets, which has less than n² elements. They are all inside the conditional statement at line 4, which is true less than 2n times during the run of the loop at line 3; the operations inside these loops therefore run O(n³) times in total. The complexity of the function ProcessFoundBoundset is therefore O(n³) and, because n ≤ m, it is also O(m³).

7.4 Experimental results

To thoroughly evaluate the presented heuristic, the exact decomposition algorithm [Dub97b] was implemented and applied to the IWLS93 benchmark set. For all single outputs for which the exact algorithm did not time out, 582 in total, the number of strong bound-sets found by the Interval-cut algorithm was computed.
In the first set of experiments, the sifting variable-ordering algorithm [Rud93], as implemented in the Colorado University Decision Diagram (CUDD) package [Som98], was used to get a good initial variable order for the ROBDDs. For 526 of those 582 single-output functions, the Interval-cut algorithm found one hundred percent of the bound-sets. In the second set of experiments, the ROBDD was built using the breadth-first traversal order from the benchmark circuit description. For 191 of these 582 functions the result was worse than in the first set of experiments, by 57% on average; worse here means that a smaller number of strong bound-sets was found. Nevertheless, the heuristic still found all the bound-sets for 365 functions. The Interval-cut algorithm was also applied to the benchmarks reported in [Ber97, Mat98, Min98]. The results are summarized in Table 7.1. Column 4 shows how many non-trivial strong bound-sets were found for each benchmark by the Interval-cut algorithm. Every output is handled as a separate function, and the number given in column 4 is the total sum of bound-sets over all the outputs. Columns 5 – 8 show a runtime comparison. Unfortunately, none of these algorithms has a publicly available implementation, so the experiments were run on different computers. The experiments for the Interval-cut algorithm were run on a Sun Ultra 60 with two 360 MHz CPUs and 1024 MB of main memory. The algorithm in [Min98] uses a Sun Ultra 30, [Ber97] uses a PC with a 150 MHz Pentium and 96 MB of main memory, and [Mat98] uses a PC with a 233 MHz Pentium II processor.
175 CHAPTER 7 Table 7.1: Comparison of execution time for Interval-cut algorithm Name of benchmark function Number of inputs Number of outputs Number of strong boundsets Interval-cut algorithm Algorithm in [Min98] Algorithm in [Ber97] Algorithm in [Mat98] Execution time (seconds) alu2 alu4 apex1 apex2 apex3 apex4 apex5 apex6 apex7 b9 C432 C499 C880 C1355 C1908 C3540 cmb CM42 CM85 CM150 comp count dalu des e64 f51m frg2 k2 lal misex2 mux pair PARITY rot seq s298 10 14 45 38 54 9 114 135 49 41 36 41 60 41 33 50 16 4 11 21 32 35 75 256 65 8 143 45 26 25 21 173 16 135 41 17 6 8 45 3 50 19 88 99 37 21 7 32 26 32 25 22 4 10 3 1 3 16 16 245 65 8 139 45 19 18 1 137 1 107 35 20 3 2 83 16 23 4 196 258 96 49 10 68 45 0 15 18 4 10 15 1 47 47 42 688 63 6 532 85 57 29 1 725 1 296 135 15 0.0002 0.0009 0.008 0.001 0.008 0.002 0.032 0.008 0.006 0.001 0.002 5.2 0.046 5.2 0.23 2.8 0.002 0.0006 0.0003 <0.0001 0.002 0.007 0.015 0.041 0.51 0.0004 0.032 0.008 0.002 0.003 0.0001 0.040 0.001 0.039 0.009 0.0004 59.0 5.9 44.3 13.1 1.7 415.4 >0.8 19.2 67.8 - 0.28 0.37 1.01 1.14 0.33 2.34 2.62 1.03 1.23 83.47 2.71 91.25 7.58 21.1 0.36 0.15 0.27 0.51 0.71 0.73 1.31 0.26 2.86 1.04 0.55 0.57 0.48 4.02 0.38 22.62 1.10 0.40 0.15 0.41 0.37 0.02 0.28 8.80 0.92 8.87 1.42 3.48 0.01 0.36 0.15 7.36 - 176 A FAST ALGORITHM FOR FINDING BOUND-SETS Name of benchmark function Number of inputs Number of outputs Number of strong boundsets Interval-cut algorithm Algorithm in [Min98] Algorithm in [Ber97] Algorithm in [Mat98] Execution time (seconds) s420 s444 s526 s641 s832 s953 s1196 s1238 s1423 s1488 s1494 term1 too large ttt2 vda x4 35 24 24 54 23 45 32 32 91 14 14 34 38 24 39 94 18 27 27 42 24 52 32 32 79 25 25 10 3 21 17 71 18 65 45 138 37 40 33 33 38 38 38 65 17 44 30 180 0.007 0.001 0.002 0.003 0.003 0.003 0.002 0.002 0.066 0.002 0.002 0.002 0.001 0.002 0.003 0.008 >1.0 >0.5 - 0.75 0.54 0.52 1.12 0.54 20.97 0.71 0.75 12.48 0.36 0.34 0.75 0.55 0.4 1.90 0.09 - The experiments on benchmarks given in Table 7.1 
show that the Interval-cut algorithm is fast compared to the published exact algorithms. For all benchmarks, the Interval-cut algorithm ran faster than the algorithms reported in [Ber97, Mat98, Min98]. The benchmarks for which the exact algorithms presented in [Min98], [Ber97] and [Mat98] took the longest time relative to the Interval-cut algorithm are C432, s953 and pair respectively. For these benchmarks the exact algorithms took 210 000, 7000 and 180 times longer, respectively, to execute than the Interval-cut algorithm. These differences are too large to be caused only by differences in the performance of the computers used; hence these experiments demonstrate that the Interval-cut algorithm is considerably faster.

7.5 Discussion and conclusions

In this chapter the Interval-cut algorithm has been presented. The Interval-cut algorithm is a heuristic algorithm for finding bound-sets of Boolean functions. The bound-sets show how the Boolean function can be disjointly decomposed. This algorithm operates on an ROBDD and finds all bound-sets of variables that are adjacent in the ROBDD. The algorithm has a time complexity of O(m³), where m is the number of nodes in the ROBDD. The algorithm is strong because, for most Boolean functions, there is a variable order in which the ROBDD has a minimal number of nodes and in which all subsets of variables forming strong bound-sets are in linear intervals. It was stated in [Tes05] that it is only in some rare cases that such a variable order does not exist. The experiments on benchmark functions demonstrate that, in most ROBDDs in which the variable order is chosen with a practical algorithm, in this case the sifting algorithm [Rud93], all strong bound-sets are adjacent in the ROBDD. For such cases all bound-sets are found by the Interval-cut algorithm.
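For intuition about what is being found, the defining property of a bound-set (a subset X1 of the variables such that f can be written as g(h(X1), X2) with a single-output function h) can be checked by brute force: over the assignments to X1, the cofactor patterns on the remaining variables must take at most two distinct values. The sketch below is our own illustration, exponential in the number of variables, and not the Interval-cut algorithm itself:

```python
from itertools import product

def is_bound_set(f, n, bound):
    """Brute-force check: 'bound' (a list of variable indices) is a
    bound-set of the n-variable function f when the cofactors over the
    bound variables show at most two distinct truth-table patterns over
    the remaining (free) variables."""
    free = [i for i in range(n) if i not in bound]
    patterns = set()
    for bound_bits in product((0, 1), repeat=len(bound)):
        pattern = []
        for free_bits in product((0, 1), repeat=len(free)):
            x = [0] * n
            for i, b in zip(bound, bound_bits):
                x[i] = b
            for i, b in zip(free, free_bits):
                x[i] = b
            pattern.append(f(x))
        patterns.add(tuple(pattern))
    return len(patterns) <= 2

# f = (x1 xor x2) or (x3 and x4): {x1, x2} is a bound-set, {x2, x3} is not
f = lambda x: (x[0] ^ x[1]) | (x[2] & x[3])
print(is_bound_set(f, 4, [0, 1]))  # -> True
print(is_bound_set(f, 4, [1, 2]))  # -> False
```

In ROBDD terms, the two distinct cofactor patterns correspond exactly to a cut with two cut nodes below the bound variables.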
If not all strong bound-sets are in linear intervals, the heuristic method finds a tree with the same properties as a decomposition tree, but some bound-sets are missed. In this tree, nodes may be weak bound-sets, but without knowledge about the bound-sets that are not found, they are labeled full or prime and can be used in the same way as strong bound-sets.

Chapter 8

Functional decomposition for three-level logic implementation

In this chapter a fast algorithm is presented for estimating whether a Boolean function is likely to benefit from three-level AND-OR-XOR optimization. Background and related work for the contribution in this chapter were described in Chapter 6. The experimental results presented in [Dub99] show that optimization algorithms for AND-OR-XOR logic can be quite time consuming. Those experimental results also show that some functions gain much in implementation size compared to a two-level sum-of-products implementation, while other functions gain nothing or very little. It is therefore advantageous to know in advance the benefit of running such an algorithm. In this chapter a method to predict this benefit is presented. First, we study and describe the kind of structure a function should have to benefit from optimization for an AND-OR-XOR structure. We then give a theorem and its proof to characterize such functions. This theorem formulates a sufficient condition for a given function f(X), X = {x1, x2, …, xn}, to have a decomposition of type f(X) = (g(X) ⊕ h(X)) + r(X) with the total number of product-terms in g, h and r smaller than the number of product-terms in f, when the functions are represented in SoP form. The function r is needed to make the condition sufficient. The estimation algorithm uses this theorem to predict how much benefit optimization for AND-OR-XOR will give. Note that there are no restrictions on the support sets of g and h.
This is a difference between the presented method and methods utilizing algebraic decomposition. Section 8.1 illustrates the basic ideas of the algorithm. Section 8.2 describes the theorem formally. In Section 8.3 the estimation algorithm utilizing the theorem is presented. Experimental results are presented in Section 8.4.

8.1 Basic ideas in 3-level decomposition estimation method

AND-OR-XOR optimization is decomposition of a function for an implementation like the one illustrated in Figure 8.1. The functions g and h are in sum-of-products form.

Figure 8.1: AND-OR-XOR implementation (two sum-of-products blocks g and h feeding a final XOR gate)

The estimation algorithm starts from a sum-of-products representation of the Boolean function. It utilizes the idea that cubes of the functions g and h are generated based on the cubes in the sum-of-products representation of the original function. The algorithm checks each pair of cubes to see if they can be replaced by one cube. This cube, together with the remaining cubes, should then be used in functions g and h to implement the function. Some of the remaining cubes might be modified to achieve this. The algorithm counts the number of pairs of cubes for which this is possible. The more cases for which it is possible, the more likely it is that the function will benefit from AND-OR-XOR optimization.

Figure 8.2: Illustration of the algorithm (three identical Karnaugh maps of a four-variable function over x1x2 and x3x4: (a) shows the cubes of the SoP form, (b) adds the dotted super cube, (c) shows cubes expanded to cover the zeros of the super cube)

To illustrate how to check whether a pair of cubes can be replaced with one cube, consider the function shown in Figure 8.2a. The circles in Figure 8.2a represent the cubes in the sum-of-products form of the function. Two of these cubes are checked in this example, to see whether they can be replaced with one cube.
The first step is to build the super cube of these cubes. The super cube of a set of cubes is the smallest cube that covers all the cubes in the set. In Figure 8.2b the dotted line shows the super cube of these cubes. After the super cube is generated, the next step is to find cubes that can be expanded to cover the zeros in the super cube. Figure 8.2c shows how two of the remaining cubes have been expanded to cover the zeros in the super cube. The function can then be implemented in AND-OR-XOR logic with the dotted cube in function g and the solid implicants in function h, referring to Figure 8.1. So in this example we have found that the checked pair of cubes can be replaced by one cube when using AND-OR-XOR logic.

8.2 Theorem on which the estimation method is based

In this section we define the theorem on which the estimation method is based. First, some notation is defined. Let f(x1, x2, …, xn) be an incompletely specified Boolean function of type f : {0, 1}ⁿ → {0, 1, −}, where "−" denotes a don't-care value. We use Ff, Rf and Df to denote the sets of variable assignments for which the function value is one, zero and don't-care, respectively. The size of a set of cubes A, denoted |A|, is the number of cubes in it. The complement of a set of cubes A, denoted Ā, is the intersection of the complements of the cubes of A. The intersection of two sets of cubes A and B, denoted A ∩ B, is the union of the pairwise intersections of the cubes from A and B. The union of two sets of cubes A and B, denoted A ∪ B, is the union of the cubes from A and B. We denote by sup(a1, a2, …, ak) the super cube of the cubes a1 to ak. The symbol ⊕ is used to indicate exclusive or (XOR) both for sets and for Boolean functions.
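The super cube operation itself is simple in positional cube notation ('0' for a complemented literal, '1' for an uncomplemented literal, '-' for an absent variable): positions where all cubes agree keep their literal, all others widen to '-'. A minimal sketch (our own illustration, not the thesis implementation):

```python
def super_cube(*cubes):
    # Keep a literal only where every cube carries that same literal;
    # any disagreement (or an existing '-') widens the position to '-'.
    return "".join(
        pos[0] if len(set(pos)) == 1 else "-"
        for pos in zip(*cubes)
    )

# x1'·x2'·x3·x4'  and  x1'·x2·x3·x4  ->  x1'·x3
print(super_cube("0010", "0111"))  # -> 0-1-
```

The result is the smallest cube covering all the arguments, since every position is kept as tight as agreement among the cubes allows.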
A rule is formulated in Theorem 8.1 that can be applied to a two-level AND-OR expression to transform it into an expression of type f = (g ⊕ h) + r with the total number of product-terms in g, h and r smaller than the number of product-terms in f. The following lemmas are used in Theorem 8.1.

Lemma 8.1: If X ∩ Y = Ø then (X ∪ Y) ⊕ Y = X.

Proof: (X ∪ Y) ⊕ Y = ((X ∪ Y) ∩ Ȳ) ∪ ((X̄ ∩ Ȳ) ∩ Y) = (X ∩ Ȳ) ∪ (Y ∩ Ȳ) ∪ (X̄ ∩ Ȳ ∩ Y) = X ∩ Ȳ = X, since X ∩ Y = Ø. □

Lemma 8.2: If X ∩ Z = Ø, Y ∩ Z = Ø and Y ⊂ X then (X ⊕ Y) ∪ Z = X ⊕ (Y ∪ Z).

Proof: On the left side of the equality, (X ⊕ Y) ∪ Z = (X ∩ Ȳ) ∪ (X̄ ∩ Y) ∪ Z = (X ∩ Ȳ) ∪ Z, since Y ⊂ X. On the right side of the equality, X ⊕ (Y ∪ Z) = (X ∩ Ȳ ∩ Z̄) ∪ (X̄ ∩ Y) ∪ (X̄ ∩ Z) = (X ∩ Ȳ) ∪ Z, since Y ⊂ X, X ∩ Z = Ø and Y ∩ Z = Ø. □

Lemma 8.3: Let a1, a2, …, ak, k > 0, be cubes from the on-set Ff of a Boolean function f : {0, 1}ⁿ → {0, 1, −} such that the intersection of sup(a1, a2, …, ak) with the off-set Rf is a non-empty set of cubes {c1, c2, …, cp}, p ≥ 1, with c1 ∪ c2 ∪ … ∪ cp = sup(a1, a2, …, ak) ∩ Rf. If for each cube ci we can find a cube bi ∈ Ff such that sup(bi, ci) ∩ Rf = ci as well as sup(a1, a2, …, ak) ∩ sup(bi, ci) = ci, then there exists a set D ⊆ Df such that

sup(a1, a2, …, ak) ⊕ (sup(b1, c1) ∪ … ∪ sup(bp, cp)) ⊕ D
  = (a1 ∪ … ∪ ak) ∪ ((sup(b1, c1) − c1) ∪ … ∪ (sup(bp, cp) − cp))   (8.1)

Proof: Since ai ∈ Ff for all i ∈ {1, 2, …, k} and cj ∈ Rf for all j ∈ {1, 2, …, p}, the intersection of the sets a1 ∪ … ∪ ak and c1 ∪ … ∪ cp is empty.
Therefore, by applying Lemma 8.1, we can write:

a1 ∪ … ∪ ak = ((a1 ∪ … ∪ ak) ∪ (c1 ∪ … ∪ cp)) ⊕ (c1 ∪ … ∪ cp)
  = sup(a1, a2, …, ak) ⊕ (c1 ∪ … ∪ cp) ⊕ D

Taking the union with (sup(b1, c1) − c1) ∪ … ∪ (sup(bp, cp) − cp) on both sides, we get:

(a1 ∪ … ∪ ak) ∪ ((sup(b1, c1) − c1) ∪ … ∪ (sup(bp, cp) − cp))
  = (sup(a1, a2, …, ak) ⊕ (c1 ∪ … ∪ cp) ⊕ D) ∪ ((sup(b1, c1) − c1) ∪ … ∪ (sup(bp, cp) − cp))

Since sup(a1, a2, …, ak) ∩ ((sup(b1, c1) − c1) ∪ … ∪ (sup(bp, cp) − cp)) = Ø, ((sup(b1, c1) − c1) ∪ … ∪ (sup(bp, cp) − cp)) ∩ (c1 ∪ … ∪ cp) = Ø and (c1 ∪ … ∪ cp) ⊂ sup(a1, a2, …, ak), we can apply Lemma 8.2 and get:

(a1 ∪ … ∪ ak) ∪ ((sup(b1, c1) − c1) ∪ … ∪ (sup(bp, cp) − cp))
  = sup(a1, a2, …, ak) ⊕ ((c1 ∪ … ∪ cp) ∪ (sup(b1, c1) − c1) ∪ … ∪ (sup(bp, cp) − cp)) ⊕ D
  = sup(a1, a2, …, ak) ⊕ (sup(b1, c1) ∪ … ∪ sup(bp, cp)) ⊕ D □

Lemma 8.3 gives a condition for substituting a subset Ff* of the on-set Ff of a function f by two functions g and h of type g, h : {0, 1}ⁿ → {0, 1} so that Ff* = Fg ⊕ Fh and the total number of cubes in Fg and Fh is smaller than in Ff*. Next we prove that this condition is sufficient to make it possible to represent f as f = (g ⊕ h) + r with the total number of cubes in g, h and r smaller than the number of cubes in f. The set D in the equation above indicates that the don't-cares might be assigned differently on the left and right sides of the equation of Lemma 8.3.

Theorem 8.1: If a Boolean function fulfills Lemma 8.3 for some set of cubes {a1, a2, …, ak}, ai ∈ Ff for all i ∈ {1, 2, …, k}, then it can be represented as f = (g ⊕ h) + r with the total number of cubes in g, h and r smaller than in f.

Proof: Suppose a function f fulfills Lemma 8.3. Then there exist cubes aj, bi ∈ Ff and ci ∈ Rf, j ∈ {1, 2, …, k}, i ∈ {1, 2, …, p}, fulfilling Equation (8.1) for some D ⊆ Df. All the cubes in the set (sup(b1, c1) − c1) ∪ … ∪ (sup(bp, cp) − cp) belong either to the on-set Ff or to the don't-care set Df.
Also, for each i, the set sup(bi, ci) − ci includes at least one cube from the on-set Ff, namely bi. So the p + 1 cubes on the left-hand side of Equation (8.1) cover at least p + k, k > 1, cubes from the on-set Ff given by the right-hand side of Equation (8.1). If we set Fg = sup(a1, a2, …, ak) and Fh = sup(b1, c1) ∪ … ∪ sup(bp, cp), then |Fg| + |Fh| < |Ff*| with Ff* = (a1 ∪ … ∪ ak) ∪ ((sup(b1, c1) − c1) ∪ … ∪ (sup(bp, cp) − cp)). Defining the remainder as Fr = Ff − (a1 ∪ … ∪ ak) − (b1 ∪ … ∪ bp), we get a decomposition of type f = (g ⊕ h) + r with the total number of cubes in g, h and r smaller than in f. Here r is the Boolean function which is 1 for input combinations in the set Fr and 0 otherwise. Don't-cares might be assigned different values in this representation than in the initially given two-level form. □

8.3 Estimation algorithm

Theorem 8.1 can be utilized to estimate the benefit of optimization for an AND-OR-XOR logic implementation. The larger the subset of the on-set of a function f that satisfies the condition in Lemma 8.3, the more f can benefit from XOR minimization. However, there might be several different choices of such a subset. Computing the best one would require first trying all possible subsets {a1, a2, …, ak} of f to find the ones fulfilling the condition in Lemma 8.3, and then solving the covering problem to find which combination of the subsets results in the best XOR-cover for f. Both steps would require exponential time, and therefore such a method would be too slow for large functions. Instead, we present a simple heuristic algorithm which quickly estimates the benefit of XOR minimization by considering only pairs of cubes. The pseudocode is shown below. The input is the on-set Ff, don't-care set Df and off-set Rf of f. The output is a counter value that indicates the number of pairs ai and aj for which Lemma 8.3 is fulfilled.
 1  ALGORITHM Estimation-algorithm(Ff, Df, Rf)
 2    counter ← 0
 3    FOR EACH pair of cubes (ai, aj), ai ∈ Ff, aj ∈ Ff DO
 4      flag_lemma_fulfilled ← TRUE
 5      C ← super cube(ai, aj) ∩ Rf
 6      FOR EACH cube c ∈ C DO
 7        flag_cube_b_not_found ← TRUE
 8        FOR EACH cube b ∈ (Ff − ai − aj) DO
 9          IF super cube(b, c) ∩ Rf = c AND
10             super cube(ai, aj) ∩ super cube(b, c) = c THEN
11            flag_cube_b_not_found ← FALSE
12          END IF
13        END FOR EACH
14        IF flag_cube_b_not_found THEN
15          flag_lemma_fulfilled ← FALSE
16        END IF
17      END FOR EACH
18      IF flag_lemma_fulfilled THEN
19        counter ← counter + 1
20      END IF
21    END FOR EACH
22    RETURN counter
23  END ALGORITHM

There are at least as many ways as the value of counter to express the given function as (g ⊕ h) + r with the total number of product-terms in g, h and r smaller than the number of product-terms in the sum-of-products form in which the function was given to the algorithm. If Lemma 8.3 were checked for all subsets of implicants of the function f, then O(2^m) possible subsets would need to be tested, where m is the number of implicants. For most functions it is, however, more likely that Lemma 8.3 is fulfilled if the super cube is made out of only a few implicants. The heuristic algorithm checks whether the lemma is fulfilled only for pairs of implicants, so only O(m²) super cubes have to be built. Another essential saving in time results from the fact that the algorithm does not compute the XOR cover at all but only counts the number of pairs fulfilling Lemma 8.3. The more pairs that fulfill Lemma 8.3, the more flexibility we have in selecting a good XOR-cover from them. However, since Theorem 8.1 proves only the sufficiency of the condition, not its necessity, there might be cases when the condition is not fulfilled but the number of cubes in f can still be reduced by a representation as f = (g ⊕ h) + r.
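To make the procedure concrete, the following sketch re-implements the pair count in Python over explicit cube lists, with two simplifications relative to the pseudocode above: the don't-care set is assumed empty, and the off-set part of a super cube is enumerated as individual minterms (degenerate cubes c) rather than as a cube cover. The function names are ours, not from the thesis. On the 3-variable even-parity function, a classic XOR-friendly case, every one of the six cube pairs fulfills the condition:

```python
from itertools import combinations, product

def super_cube(a, b):
    # Smallest cube covering both cubes (positional notation: '0', '1', '-').
    return "".join(x if x == y else "-" for x, y in zip(a, b))

def minterms(cube):
    # Expand a cube to the set of minterms (bit strings) it covers.
    choices = [("01" if ch == "-" else ch) for ch in cube]
    return {"".join(bits) for bits in product(*choices)}

def region(cubes):
    pts = set()
    for c in cubes:
        pts |= minterms(c)
    return pts

def estimate(on_set, off_set):
    """Count cube pairs fulfilling the (minterm-level) Lemma 8.3 condition."""
    off = region(off_set)
    counter = 0
    for ai, aj in combinations(on_set, 2):
        sc = super_cube(ai, aj)
        zeros = minterms(sc) & off        # off-set points inside the super cube
        others = [b for b in on_set if b not in (ai, aj)]
        if all(any(minterms(super_cube(b, c)) & off == {c} and
                   minterms(sc) & minterms(super_cube(b, c)) == {c}
                   for b in others)
               for c in zeros):
            counter += 1
    return counter

# 3-variable even-parity function: on-set = assignments with an even number of 1s
on_set  = ["000", "011", "101", "110"]
off_set = ["001", "010", "100", "111"]
print(estimate(on_set, off_set))  # -> 6
```

A high count relative to the number of cubes, as here, is exactly the signal that XOR minimization is likely to pay off; for parity the cube count indeed collapses under an XOR decomposition.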
8.4 Experimental results

We have performed a set of experiments with the goal of determining how good the presented heuristic is. Table 8.1 summarizes the results. The second and third columns give the number of inputs and the number of outputs of the function. The fourth column gives the number of implicants in the SoP form computed by Espresso [Bra84]. The fifth column gives the number of cubes in an AND-OR-XOR representation; it is the smallest number among the results reported in [Deb98, Dub99, Sas95]. The sixth column gives the counter value of the algorithm, which is the number of pairs of implicants for which Lemma 8.3 is fulfilled. The number of AND gates needed is given by the number of cubes. For two-level and three-level implementations this correlates strongly with the number of transistors needed to implement the function. To compare the implementation cost of a two-level implementation with that of a three-level AND-OR-XOR implementation, the fourth and fifth columns in Table 8.1 should be considered. The three-level implementation needs an extra two-input XOR gate that is not needed in the two-level implementation. For the benchmarks in this experiment the cost of this gate is small compared to all other gates; therefore, the number of cubes is still a good estimate of the implementation cost.

Table 8.1: Number of pairs fulfilling Lemma 8.3

Benchmark  Inputs  Outputs  AND-OR  AND-OR-XOR  Lemma counter
5xp1          7      10        65        34            27
9sym          9       1        86        65           301
Clip          9       5       120        72            28
f51m          8       8        77        35            44
Life          9       1        84        62           724
Mlp4          8       8       128        75            52
rd53          5       3        31        17           198
rd73          7       3       127        54          1304
rd84          8       4       255        99          7590
Sao2         10       4        58        33            53
squar5        5       8        25        20             6
t481         16       1       481        18             0
Z4            7       4        59        18            54

For benchmarks rd73 and rd84 we can see that Lemma 8.3 is fulfilled for many pairs of cubes. For these benchmark functions the benefit of using AND-OR-XOR relative to a two-level AND-OR implementation is also large.
On the other hand, benchmark 9sym does not show much difference between these two implementations. This is expected, because the number of pairs of cubes for which Lemma 8.3 is fulfilled is also considerably smaller. There are, however, several benchmarks that gain quite a bit although the counter value is low. For example, Lemma 8.3 is not fulfilled for any pair of implicants of the benchmark function t481. This function can, however, be described as f = g ⊕ h where the total number of implicants in g and h is only 18, while the number of implicants in its smallest known two-level AND-OR form is 481. The presented algorithm tests only a sufficient condition for a function to benefit from XOR minimization, and the case of function t481 shows that the presented heuristic misses some types of functions. We can also see that for the benchmark functions 9sym and life the counter in the algorithm is quite large but the size of the AND-OR-XOR implementation is only slightly smaller than that of the AND-OR implementation. However, the algorithms that found the size of the AND-OR-XOR implementations are heuristic, so it is quite possible that AND-OR-XOR implementations with a smaller number of cubes exist for those functions. This possibility is likely because AND-OR-XOR minimization is more complicated than AND-OR minimization and the research on it is not as widely established as for AND-OR minimization.

8.5 Conclusions

In this chapter a sufficient condition has been formulated for a function f to have a decomposition of type f = (g ⊕ h) + r with the total number of product-terms in g, h and r smaller than the number of product-terms in f. An algorithm has been designed that utilizes this condition to decide whether a function is likely to benefit from XOR minimization.
This algorithm can be used as a pre-processing step to decide whether it is worthwhile to run an algorithm for AND-OR-XOR minimization. Experiments on benchmark circuits show that the benefit of using AND-OR-XOR is greater in cases where the presented algorithm estimates that minimization for AND-OR-XOR will be beneficial. There are, however, some benchmark circuits that gain quite a bit although the presented algorithm does not indicate this. The algorithm provides only a sufficient condition and may therefore miss some types of functions that would benefit from optimization for an AND-OR-XOR implementation. There are also some benchmark functions for which the presented algorithm indicates that optimization for an AND-OR-XOR implementation would be beneficial but where no such implementation was found. Since the algorithms that found the size of the AND-OR-XOR implementations are heuristic, it is quite possible that AND-OR-XOR implementations with a smaller number of cubes exist for those functions. This possibility is quite likely because AND-OR-XOR minimization is more complicated than AND-OR minimization and the research on it is not as widely established as for AND-OR minimization.

Part D

Conclusions

Chapter 9

Conclusions and future work

This thesis has presented contributions in the areas of electronic testing and Boolean optimization. Of particular interest is the development of SoCs, especially those using a NoC as interconnection infrastructure, where all contributions of this thesis can be useful. Section 9.1 gives short summaries of the contributions and results in electronic testing and Section 9.2 gives short summaries of the contributions and results in Boolean optimization. Section 9.3 gives proposals for future work.

9.1 Contributions in chip testing

The contributions of this thesis in the area of testing are based on the NoC infrastructure.
Testing of this sort of infrastructure can be categorized into testing of the switches and testing of the interconnections between the switches. This thesis makes contributions applicable in both of these areas. The contributions related to testing of interconnections can be applied to testing links between NoC switches that are clocked with different clock signals. This case is considered harder to test than the case in which all switches share the same clock. This thesis contributes techniques for testing faults in such links that cause too much crosstalk, and faults that cause unacceptable delay.

For crosstalk tests of the interconnection links between NoC switches, it is normally unnecessarily pessimistic and inefficient to consider only one wire at a time as the victim wire. This thesis contributes a small hardware block for selecting which wires are to be victims simultaneously. That hardware is configurable, providing the possibility to set the minimum distance between wires that are considered victims at the same time. In this way a tradeoff between test time and test accuracy can be configured.

Raising the abstraction level at which test logic and test data are generated is attractive because test costs can be accounted for earlier in the design phase. It is also usually easier to activate and propagate faults at higher levels of abstraction. The challenge when developing tests at a high abstraction level is to find fault models that model the effects of physical defects well enough. A contribution to fault modeling at the system level is included in this thesis. The usage of application-specific fault models is recommended, and as a case study fault models for a NoC switch are proposed and their efficiency evaluated. Unlike the presented methods for testing links, the contribution in fault generation at the system level is not a usable method that can be applied to cover a certain set of defects.
It consists instead of initial results in system-level fault modeling, which need further research to be practical when high coverage of defects is required. However, the results demonstrate the potential benefits of further investigation of system-level fault modeling.

9.2 Contributions in Boolean decomposition

Two contributions have been presented in the area of Boolean decomposition. The first is a fast BDD-based heuristic algorithm for finding bound-sets. The second is a fast algorithm for predicting whether it is worthwhile to run a minimization algorithm for a three-level AND-OR-XOR implementation.

Experiments show that the presented algorithm for finding bound-sets works considerably faster than exact algorithms and that it finds all bound-sets for most benchmark functions. The complexity of the presented algorithm has been shown to be O(m³), where m is the number of nodes in the ROBDD.

The second contribution in Boolean decomposition is a method for fast prediction of whether optimization algorithms for an AND-OR-XOR implementation are worth running. Some functions gain significantly from optimization for AND-OR-XOR logic and others do not. Optimization for AND-OR-XOR logic is relatively time consuming; it is therefore beneficial to know in advance whether such an optimization is going to be useful.

9.3 Proposals for future work

This section describes limitations of the presented contributions and proposes future work to address some of these limitations.

9.3.1. Interconnection test

The proposed method for interconnection test works on simple handshaking links. Other asynchronous protocols might require different approaches, and an analysis of such approaches would be interesting future work. The presented algorithm for measuring delay targets the test of links connecting different clock domains.
For high-speed signals crossing clock domains there is a risk of metastability problems. The exact implementation is not included as a part of the contribution in this thesis. It would be an interesting topic for future work to investigate how the presented methodology can best be implemented while taking the metastability problem into account.

The analysis of the efficiency of the delay measuring method assumes that a given signal wire has a specific signal delay which is the outcome of a normally distributed stochastic variable. A delay fault is considered to be present if the actual signal delays in a link can result in failures. It would be interesting to conduct a more detailed investigation of the probability distribution of the delay of signal wires.

The presented test method for detecting crosstalk-induced glitches utilizes glitch detectors. The glitch detectors need to be as sensitive to glitches as the logic used during normal operation. However, overly sensitive glitch detectors make the test unnecessarily pessimistic. It would be interesting future work to investigate how to make glitch detectors sensitive enough without being too sensitive. Another subject for future work is to determine how to also include tests for glitches caused by defects that result in too much inductive coupling.

The method for scheduling victim wires targets crosstalk faults that can occur due to excessive capacitive coupling. It would be interesting future work to investigate how to efficiently schedule victim wires when defects causing too much inductive coupling are also considered.

9.3.2. System level fault modeling

The study of system-level fault modeling in this thesis has been limited. One limitation is that only one particular design is studied. It would be interesting to study more designs. Another limitation is that only one logic-level implementation has been utilized for the evaluation of the system-level faults.
Future work could be to synthesize the design to logic level in several ways to investigate how the relevance of system-level faults depends on synthesis algorithms and optimization criteria. A third limitation is that the evaluation of the system-level faults has been made on a system that has been simplified to a purely combinational design. It would be interesting to develop analysis methods for sequential designs.

One more interesting subject for future work would be to evaluate how synthesis algorithms can be analyzed to develop accurate fault models at high abstraction levels. As described in Subsection 3.3.2, some attempts have been made in that direction. However, it seems to be difficult to do so without putting constraints on the synthesis algorithms, which considerably limits the ability to optimize.

9.3.3. BDD-based decomposition

The presented algorithm for finding bound-sets works for single-output functions. However, most combinational networks have several outputs as well as several inputs, and each output then needs to be considered separately. It would be useful to extend the algorithm to permit identification of common subexpressions for several outputs. For multiple-output functions represented with a common ROBDD, it should be relatively simple to identify subsets of variables that are bound-sets of several outputs and have the same associated functions. Bound-sets correspond to subfunctions that use only inputs not used anywhere else in the implementation of the output under consideration. However, there are more opportunities to find common subexpressions if some inputs are allowed to feed both the subexpression and the rest of the network. It would therefore be interesting to investigate extending the algorithm in that direction.

A BDD variant with complemented edges has been proposed in the literature to reduce the number of nodes in the BDD.
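Independently of the BDD representation, the bound-set property itself can be checked by brute force with the classical Ashenhurst decomposition chart: a set B of inputs is a bound-set of f exactly when the chart whose columns are indexed by assignments to B has at most two distinct columns. A small sketch (exponential in the number of inputs, so an illustration only, not the thesis's algorithm):

```python
from itertools import product

def is_bound_set(f, n, bound):
    """Ashenhurst test: 'bound' (a tuple of input indices) is a bound-set
    of the n-input Boolean function f iff the decomposition chart has at
    most two distinct columns, i.e. f(x) = g(h(bound vars), free vars)."""
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for b in product((0, 1), repeat=len(bound)):
        col = []
        for r in product((0, 1), repeat=len(free)):
            x = [0] * n
            for i, v in zip(bound, b):
                x[i] = v
            for i, v in zip(free, r):
                x[i] = v
            col.append(f(x))
        columns.add(tuple(col))
    return len(columns) <= 2

# f(x0, x1, x2) = (x0 AND x1) OR x2 has {x0, x1} as a bound-set,
# since f = g(h(x0, x1), x2) with h = AND and g = OR.
f = lambda x: (x[0] & x[1]) | x[2]
assert is_bound_set(f, 3, (0, 1))
```

The BDD-based algorithm in this thesis detects the same property in O(m³) time in the number of ROBDD nodes instead of enumerating the exponential chart.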
It should be possible to adapt the presented algorithm to BDDs with complemented edges with only slight modifications. By considering not only the case in which the number of lower cut-nodes is two and the structures are equal, it should be possible to extend the presented algorithm such that it can find Roth-Karp decompositions.

9.3.4. Decomposition for XOR-type logic

The presented method for predicting the expected gain from optimization for AND-OR-XOR logic considers only pairs of cubes in the two-level representation. Some functions that gain significantly from AND-OR-XOR minimization do not gain anything from putting only two implicants in the function that feeds one of the inputs of the XOR-gate. For such functions, the presented algorithm incorrectly indicates that there is no use in trying to minimize for AND-OR-XOR logic. It would be interesting future work to modify the algorithm such that these types of functions are also detected.

List of abbreviations

ALU  Arithmetic Logic Unit
BDD  Binary Decision Diagram
BIST  Built-In Self-Test
CUDD  Colorado University Decision Diagram
DfT  Design for Testability
EDIF  Electronic Design Interchange Format
FIFO  First In First Out
FIR  Finite Impulse Response
FPGA  Field Programmable Gate Array
FSM  Finite State Machine
GALS  Globally Asynchronous Locally Synchronous
IP  Intellectual Property
NoC  Network on Chip
NP  Non-deterministic Polynomial
PCB  Printed Circuit Board
PLA  Programmable Logic Array
RAM  Random Access Memory
ROBDD  Reduced Ordered BDD
RT  Register Transfer
RTR  Ready To Receive
SoC  System on Chip
SOP  Sum Of Products
UML  Unified Modeling Language
VHDL  VHSIC Hardware Description Language
VHSIC  Very High Speed Integrated Circuit
[Alt12] "Processors from Altera and Embedded Alliance Partners", Altera, Web site: www.altera.com/devices/processor/emb-index.html, 2012 [Amo04] A. M. Amory, É. Cota, M. Lubaszewski, and F. G. Moraes, "Reducing test time with processor reuse in network-on-chip based systems", Proceedings of Symposium on Integrated Circuits and System Design pp. 111-116 2004 [Arm12] "DesignStart for Processor IP", ARM, Web site: www.arm.com/products/processors/designstartprocessor-ip/index.php, 2012 [Aru05] D. Arumí, R. Rodríguez-Montañés, and J. Figueras, "Defective behaviours of resistive opens in interconnect lines", Proceedings of European Test Symposium, Tallinn, Estonia, pp. 28 - 33, May 2005 201 REFERENCES [Asc08] G. Ascia, V. Catania, M. Palesi, and D. Patti, "Implementation and analysis of a new selection strategy for adaptive routing in networks-on-chip", IEEE Transaction on computers, vol. 57, (6), pp. 809820, 2008. [Ash59] R. Ashenhurst, "The decomposition of switching functions", Proceedings of International Symposium on Theory of Switching Functions, pp. 77-116, 1959 [Att01] A. Attarha and M. Nourani, "Testing interconnects for noise and skew in gigahertz SoC", Proceedings of International Test Conference, pp. 305-314, 2001 [Bai00] X. Bai, S. Dey, and J. Rajski, "Self-test methodology for at-speed test of crosstalk in chip interconnects", Proceedings of Design Automation Conference, pp. 619-624, 2000 [Bai04] X. Bai and S. Dey, "High-level crosstalk defect simulation methodology for system-on-chip interconnects", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, (9), pp. 1355-1361, 2004. [Ben01] T. Bengtsson and E. Dubrova, "A sufficient condition for detection of XOR-type logic", Proceedings of Norchip, Stockholm, Sweden, pp. 271-278, November 2001 [Ben02] L. Benini and G. De Micheli, "Networks on Chips: A New SoC Paradigm", IEEE Computer, vol. 35, (1), pp. 70-78, 2002. 202 [Ben03a] T. 
Bengtsson, "Boolean decompoistion in combinational logic synthesis". Licentiate thesis, Royal Institute of Technology Stockholm, ISSN 1651-4076, 2003. [Ben03b] T. Bengtsson, A. Martinelli, and E. Dubrova, "A BDD-based fast heuristic algorithm for disjoint decomposition", Proceedings of Asia and South Pacific Design Automation Conference, Kitakyushu, Japan, pp. 191-196, January 2003 [Ben05a] T. Bengtsson, A. Jutman, S. Kumar, and R. Ubar, "Delay testing of asynchronous NoC interconnects", Proceedings of International Conference Mixed Design of Integrated Circuits and Systems, pp. June 2005 [Ben05b] T. Bengtsson, A. Jutman, R. Ubar, and S. Kumar, "A method for crosstalk fault detection in on-chip buses", Proceedings of Norchip, Oulu, Finland, pp. 285-288, November 2005 [Ben06a] T. Bengtsson, A. Jutman, S. Kumar, R. Ubar, and Z. Peng, "Analysis of a test method for delay faults in NoC interconnects", Proceedings of East-West Design & Test International Workshop (EWDTW), pp. 42-46, September 2006 [Ben06b] T. Bengtsson, A. Jutman, S. Kumar, R. Ubar, and Z. Peng, "Off-line testing of delay faults in NoC interconnects", Proceedings of Euromicro Conference on Digital System Design: Architectures, Methods and Tools, pp. 677 - 680, 2006 203 REFERENCES [Ben06c] T. Bengtsson, S. Kumar, A. Jutman, and R. Ubar, "An improved method for delay fault testing of NoC interconnections", Proceedings of Special Workshop on Future Interconnects and Networks on Chip (along with Design And Test in Europe), pp. March 2006 [Ben06d] T. Bengtsson, S. Kumar, and Z. Peng, "Application area specific system level fault models: a case study with a simple NoC switch", Proceedings of International Design and Test Workshop (IDT), pp. November 2006 [Ben06e] T. Bengtsson, S. Kumar, R. Ubar, and A. Jutman, "Off-line testing of crosstalk induced glitch faults in NoC interconnects", Proceedings of Norchip, Linköping, Sweden, pp. 221-226, November 2006 [Ben08] T. Bengtsson, S. Kumar, R. Ubar, A. 
Jutman, and Z. Peng, "Test methods for crosstalk-induced delay and glitch faults in network-on-chip interconnects implementing asynchronous communication protocols", Computers and Digital Techniques, IET, vol. 2, (6), pp. 445-460, 2008. [Ber97] V. Bertacco and M. Damiani, "The disjunctive decomposition of logic functions", Proceedings of International Conference on Computer-Aided Design, pp. 78-82, 1997 [Bra84] R. K. Brayton, A. L. Sangiovanni-Vincentelli, C. T. McMullen, and G. D. Hachtel, "Logic minimization algorithms for VLSI synthesis", Kluwer Academic Publishers, ISBN 0-89838-164-9, 1984. 204 [Bre01] V. Bret and K. Keutzer, "Bus encoding to prevent crosstalk delay", Proceedings of IEEE/ACM International Conference on Computer Aided Design, pp. 57-63, November 2001 [Bry86] R. E. Bryant, "Graph-based algorithm for Boolean function manipulation", Transactions on Computers, vol. C-35, pp. 677-691, 1986. [Buo97] G. Buonanno, F. Ferrandi, L. Ferrandi, F. Fummi, and D. Sciuto, "How an "evolving" fault model improves the behavioral test generation", Proceedings of Great Lakes Symposium on VLSI, pp. 124-130, 1997 [Cha05] K. Chakrabarty, V. Iyengar, and M. D. Krasniewski, "Test planning for modular testing of hierarchical SOCs", Transactions on ComputerAided Design of Integrated Circuits and Systems, vol. 24, (3), pp. 435-448, 2005. [Cha96] S. C. Chang, M. Marek-Sadowska, and T. Hwang, "Technology mapping for TLU FPGA's based on decomposition of binary decision diagrams", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, pp. 12261235, 1996. [Cha97] S. Chattopadhyay, S. Roy, and P. P. Chaudhuri, "KGPMIN: an efficient multilevel multioutput AND-OR-XOR minimizer", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 16, (3), pp. 257-265, 1997. 205 REFERENCES [Che00] K.-T. T. Cheng, S. Dey, M. Rodgers, and K. Roy, "Test challenges for deep sub-micron technologies", Proceedings of Design Automation Conference, pp. 
142-149, 2000 [Cho94] C. H. Cho and J. R. Armstrong, "B-algorithm: a behavioal test generation algorithm", Proceedings of International Test Conference, pp. 968 - 979, October 1994 [Cho96] T.-L. Chou and K. Roy, "Estimation of activity for static and domino CMOS circuits considering signal correlations and simultaneous switching", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, (10), pp. 1257-1265, 1996. [Cor00] F. Corno, G. Cumani, M. S. Reorda, and G. Squillero, "An RT-level fault model with high gate level correlation", Proceedings of High-Level Design Validation and Test Workshop, pp. 3-8, 2000 [Cor01] F. Corno, M. S. Reorda, and G. Squillero, "An interpretation framework for evaluating high-level fault models and ATPG capabilities", Proceedings of Design of Circuits and Integrated Systems, pp. 273-278, 2001 [Cos97] J. C. Costa, J. C. Monterio, and S. Devadas, "Switching activity estimation using limited depth reconvergent path analysis", Proceedings of International Symposium of Low Power Electronics, pp. 184-189, 1997 206 [Cot03b] E. Cota, M. Kreutz, C. A. Zeferino, L. Carro, M. Lubaszewski, and A. Susin, "The impact of NoC reuse on the testing of core-based systems", Proceedings of VLSI Test Symposium, pp. 128-133, 2003 [Cot03a] É. Cota, L. Carro, F. Wagner, and M. Lubaszewski, "Power-aware noc reuse on the testing of corebased systems", Proceedings of International Test Conference, pp. 612- 621, 2003 [Cur62] H. A. Curtis, "A new approach to the design of switching circuits", D. van Nostrand company, 1962. [Cuv99] M. Cuviello, S. Dey, X. Bai, and Y. Zhao, "Fault modeling and simulation for crosstalk in systemon-chip interconnects", Proceedings of IEEE/ACM International Conference on Computer-Aided Design, pp. 297 - 303, 1999 [Dal08] J. Dalmasso, É. Cota, M.-L. Flottes, and B. Rouzeyre, "Improving the test of NoC-based SoCs with help of compression schemes", Proceedings of Symposium on VLSI, pp. 
139 - 144, April 2008 [Dem94] G. De Micheli, "Synthesis and optimization of digital circuits", McGraw-Hill, Inc.,ISBN 0-07113271-6, 1994. [Deb98] D. Debnath and T. Sasao, "A heuristic algorithm to design AND-OR-EXOR three-level networks", Proceedings of Asia and South Pacific Design Automation Conference, pp. 67-74, 1998 207 REFERENCES [Dey98] S. Dey, A. Raghunathan, and R. K. Roy, "Considering testability during high-level design (embedded tutorial)", Proceedings of Asia and South Pacific Design Automation Conference, pp. 205-210, 1998 [Dua01] C. Duan, A. Tirumala, and S. P. Khatri, "Analysis and avoidance of cross-talk in on-chip buses", Proceedings of Hot Interconnects 9, pp. 133 - 138, August 2001 [Dub97a] E. V. Dubrova, D. M. Miller, and J. C. Muzio, "AOXMIN: A three-level heuristic AND-OR-XOR minimizer for Boolean functions", Proceedings of 3rd International Workshop on the Applications of the Reed-Muller Expansion in Circuit Design, pp. 209-218, 1997 [Dub97b] E. V. Dubrova, J. C. Muzio, and B. v. Stengel, "Finding composition trees for multiple-valued functions", Proceedings of 27th International Symposium on Multiple-Valued Logic, pp. 19-26, 1997 [Dub99] 208 E. V. Dubrova, D. M. Miller, and J. C. Muzio, "AOXMIN-MV: A heurisitc algorithm for ANDOR-XOR minimization", Proceedings of 4th International Workshop on the Applications of the Reed-Muller Expansion in Circuit Design, pp. 37-54, August 1999 [Dug08] K. K. Duganapalli, A. K. Palit, and W. Anheier, "Test pattern generation for worst-case crosstalk faults in DSM chips using genetic algorithm", Proceedings of Electronics Systemintegration Technology Conference, pp. 393-402, September 2008 [Dum03] T. Dumitras and R. Mãculescu, "On-chip stochastic communication", Proceedings of Design Automation and Test in Europe, pp. 790- 795, 2003 [Dut96] S. Dutt and W. Deng, "VLSI circuit partitioning by cluster-removal using iterative improvement techniques", Proceedings of IEEE/ACM International Conference on CAD, pp. 
194--200, 1996 [Eft05] A. Efthymiou, J. Bainbridge, and D. Edwards, "Test pattern generation an partial-scan methodology for an asynchronous SoC interconnect", Transactions on VLSI Systems, vol. 13, (12), pp. 1384-1393, 2005. [Eld59] R. D. Eldred, "Test routines based on symbolic logical statements", Journal of the ACM, vol. 6, (1), pp. 33 - 37, 1959. [Fal01] F. Fallah, S. Devadas, and K. Keutzer, "OCCOM Efficient computation of observability-based code coverage metrics for functional verification", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 20, (8), pp. 1003-1015, 2001. [Fer98] F. Ferrandi, F. Fummi, and D. Sciuto, "Implicit test generastion for behavioral VHDL models", Proceedings of International Test Conference, pp. 587 - 569, 1998 209 REFERENCES [Fer01] F. Ferrandi, G. Ferrara, D. Sciuto, A. Fin, and F. Fummi, "Functional test generation for behaviorally sequential models", Proceedings of Design Automation adn Testin Europe, pp. 403-410, 2001 [Fid82] C. M. Fiduccia and R. M. Mattheyses, "A lineartime heuristic for improving network partitions", Proceedings of IEEE/ACM Design Automation Conference, pp. 175-181, 1982 [Fra07] A. P. Frantz, M. Cassel, F. L. Kastensmidt, É. Cota, and L. Carro, "Crosstalk- and SEU-aware Networks on Chips", Design & Test of Computers, vol. 24, (4), pp. 340-350, 2007. [Gol02] O. Goloubeva, M. S. Reorda, and M. Violante, "Experimental analysis of fault models for behavioral-level test generation", Proceedings of IEEE Design & Diagnostic of Electronic Circuits & Systems, pp. 416-419, 2002 [Gon02] P. T. Gonciari, B. M. Al-Hashimi, and N. Nicolici, "Improving compression ratio, area overhead and test application time for system-on-a-Chip test data compression/decompression", Proceedings of Design Automation and Test in Europe, pp. 604-611, 2002 [Gre07] C. Grecu, A. Ivanov, R. Saleh, and P. P. 
Pande, "Testing Network-on-Chip communication fabrics", IEEE Transactions on Computer-aided desgin of integrated circuits and systems, vol. 26, (12), pp. 2201-2214, 2007. 210 [Gup05] S. Gupta and S. Katkoori, "Intrabus crosstalk using word-level statistics", estimation Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, (3), pp. 469478, 2005. [Han95a] M. C. Hansen and J. P. Hayes, "High-level test generation using physically-induced faults", Proceedings of VLSI Test Symposium, pp. 20-28, May 1995 [Han95b] M. C. Hansen and J. P. Hayes, "High-level test generation using symbolc scheduling", Proceedings of International Test Conference, pp. 586-595, October 1995 [Hem99] A. Hemani, T. Meincke, S. Kumar, A. Postula, T. Olsson, P. Nilsson, J. Öberg, P. Ellervee, and D. Lundqvist, "Lowering power consumption in clock by using globally asynchronous locally synchronous design style", Proceedings of DAC99, pp. 873-878, 1999 [Hey05] P. Heydari and M. Pedram, "Capacitive coupling noise in high-speed VLSI circuits", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, (3), pp. 478488, 2005. [Ho04] R. Ho, J. Gainsley, and R. Drost, "Long wires and asynchronous control", Proceedings of International Symposium on Asynchronous Circuits and Systems, pp. 240- 249, 2004 211 REFERENCES [Hos06] M. Hosseinabady, A. Banaiyan, M. N. Bojnordi, and Z. Navabi, "A concurrent testing method for NoC switches", Proceedings of Design, Automation and Test in Europe, pp. 6-10, March 2006 [Hos07] M. Hosseinabady, A. Dalirsani, and Z. Navabi, "Using the Inter- and Intra-Switch Regularity in NoC Switch Testing", Proceedings of Design, Automation & Test in Europe Conference & Exhibition, pp. 1-6, April 2007 [Hua08] L. Huang, F. Yuan, and X. Xu, "On reliable modular testing with vulnerable test access mechanisms", Proceedings of 45th Design Automation Conference, pp. 
834 - 839 June 2008 [IEEE01] IEEE, "IEEE Standard 1149.1, Standard test access port and boundary-scan architecture (2001 revision)", 2001. [IEEE05] IEEE, "IEEE Standard 1500, Standard testability method for embedded core-based integrated circuits", 2005. [Ism99] Y. I. Ismail, E. G. Friedman, and J. L. Neves, "Figures of merit to characterize the importance of on-chip inductance", IEEE Transactions on Very Large Scale Integration Systems, vol. 7, (4), pp. 442 449, 1999. [ITRS08] "The international technology roadmap for semiconductors", ITRS, Web site: http://www.itrs.net/Links/2008ITRS/Update/2008_U pdate.pdf, 2008 212 [Jab00] A. Jabir and J. Saul, "Heuristic AND-OR-EXOR three-level minimisation algorithm for multipleoutput incompletely-specified Boolean functions", Computers and Digital Techniques, IEE Proceedings, vol. 147, (6), pp. 451 - 461, 2000. [Jab02] A. Jabir and J. Saul, "Minimisation algorithm for three-level mixed AND-OR-EXOR/AND-OREXNOR representation of Boolean functions", Computers and Digital Techniques, IEE Proceedings, vol. 149, (3), pp. 82-96, 2002. [Jer98] G. Jervan, A. Markus, P. Paomets, J. Raik, and R. Ubar, "A CAD system for teaching digital test", Proceedings of 2nd European Workshop on Microelectronics Education, Noordwijkerhout, The Netherlands, pp. 287-290, 1998 [Jer02] G. Jervan, Z. Peng, O. Goloubeva, M. S. Reorda, and M. Violante, "High-level and hierarchical test sequence generation", Proceedings of International Workshop on High Level Design Validation and Test, pp. 169-174, 2002 [Jha03] N. Jha and S. Gupta, "Testing of digital systems", Cambridge University Press, ISBN 0-521-77356-3, 2003. [Jun08] S. Jung, N. Zang, P. Eunsuk, and J. Kim, "Crosstalk avoidance method considering multi-aggressors", Proceedings of International SoC Design Conference, pp. II-158 - II-161, November 2008 [Jut04] A. Jutman, "At-speed on-chip diagnosis of boardlevel interconnect faults", Proceedings of European Test Symposium, pp. 
2-7, 2004 213 REFERENCES [Kar53] M. Karnaugh, "The map method for synthesis of combinational logic circuits", Transactions of the American Institute of Electrical Engineers: Part I : Communication and electronics, vol. 72, (9), pp. 593599, 1953. [Kar88] K. Karplus, "Using if-then-else DAGs for multilevel logic minimization", University of California Santa Cruz, Technical Report UCSC-CRL-88-29, 1988. [Kim05] J. Kim, D. Park, T. Theocharides, N. Vijaykrishnan, and C. R. Das, "A low latency router supporting adaptivity for on-chip interconnects ", Proceedings of Design Automation Conference, pp. 559 - 564, 2005 [Kri84] B. Krishnamurthy, "An improved min-cut algorithm for partitioning VLSI networks", Transactions on Computers, vol. C-33, pp. 438--446, 1984. [Krs01] A. Krstic, J.-J. Liou, Y.-M. Jiang, and K.-T. t. Cheng, "Delay testing considering crosstalk-induced effects", Proceedings of International Test Conference, pp. 558-567, 2001 [Kum02] S. Kumar, A. Jantsch, M. Millberg, J. Öberg, J.-P. Soininen, M. Forsell, K. Tiensyrja, and A. Hemani, "A network on chip architecture and design methodology", Proceedings of Comp. Society Annual Symp. on VLSI, pp. 117-124, 2002 214 [Kun05] S. Kundu, S. T. Zachariah, and Y.-S. Chang, "On modeling crosstalk faults", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, (12), pp. 1909-1915, 2005. [Lai93] Y.-T. Lai, M. Pedram, and A. B. K. Vrudhula, "BDD based decomposition of logic functions with application to FPGA synthesis", Proceedings of IEEE/ACM Design Automation Conference, pp. 642647, 1993 [Laj00] M. Lajolo, L. Lavagno, M. Rebaudengo, M. S. Reorda, and M. Violante, "Behavioral-level test vector generation for system-on-chip designs", Proceedings of International High Level Design Validation Workshop, pp. 21-26, November 2000 [Lar08] A. Larsson, "Test optimization for core-based system-on-chip", Linköping Studies in Science and Tchnology, Dissertation No 1222. Linköping University, 2008. 
[Lar04] E. Larsson, K. Arvidsson, H. Fujiwara, and Z. Peng, "Efficient test solutions for core-based designs", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, (5), pp. 758775, 2004. [Len79] T. Lengauer and R. E. Tarjan, "A fast algorithm for finding dominators in a flowgraph", ACM Transactions of Programming Languages and Systems, vol. 1, (1), pp. 121--141, 1979. 215 REFERENCES [Li06] J. Li and L. Behjat, "A connectivity based clustering algorithm with application to VLSI circuit partitioning", IEEE Transactions on circuits and systems—II: express briefs, vol. 53, (5), pp. 384-388, 2006. [Li09] K. S.-M. Li, C.-L. Lee, C. Su, and J. E. Chen, "A Unified Detection Scheme for Crosstalk Effects in Interconnection Bus", Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, (2), pp. 306-311, 2009. [Liu03] J. Liu, L.-R. Zheng, D. Pamunuwa, and H. Tenhunen, "A global wire planning scheme for network-on-chip", Circuits and Systems, vol. 4, pp. IV-892- IV-895, 2003. [Liu04] J. Liu, L.-R. Zheng, and H. Tenhunen, "Interconnect intellectual property for Network-on-Chip (NoC)", Journal of Systems Architecture, vol. 50, (2-3), pp. 65-79, 2004. [Lus10] A. K. Lusala and J.-D. Legat, "Combining circuit and packet switching with bus architecture in a NoC for real-time applications ", Proceedings of International Symposium on Circuits and Systems (ISCAS), pp. 2880 - 2883 2010 [Mat98] Y. Matsunaga, "An exact and efficient algorithm for disjunctive decomposition", Proceedings of Workshop on Synthesis And System Integration of MIxed Technologies (SASIMI), pp. 44--50, 1998 [Mcc56] E. McCluskey, "Minimization of Boolean functions", The Bell System Technical Journal, vol. 35, pp. 1417-1444, 1956. 216 [Mic06] G. D. Micheli and L. Benini, "Networks on Chips", Morgan Kaufmann, ISBN10: 0123705215, 2006. [Min98] S. Minato and G. D. 
Micheli, "Finding all simple disjunctive decompositions using irredundant sum-of-products forms", Proceedings of International Conference on Computer-Aided Design, pp. 111-117, 1998 [Mis01] A. Mishchenko, B. Steinbach, and M. Perkowski, "An algorithm for bi-decomposition of logic functions", Proceedings of Design Automation Conference, pp. 103-108, 2001 [Moo65] G. E. Moore, "Cramming more components onto integrated circuits (Reprinted from Electronics magazine, volume 38, number 8, April 19, 1965, pp.114 ff)", Proceedings of the IEEE, vol. 86, (1), pp. 82-85, 1998. [Mor01] A. Morosov, K. Chakrabarty, M. Gössel, and B. Bhattacharya, "Design of parameterizable errorpropagating space compactors for response observation", Proceedings of IEEE VLSI Test Symp, pp. 48-53, 2001 [Mou00] S. Mourad and Y. Zorian, "Principles of testing electronic circuits", John Wiley and Sons Ltd, ISBN 0-471-31931-7, 2000. [Möh85] R. H. Möhring, "Algorithmic aspects of the substitution decomposition in optimization over relations, set systems and Boolean functions", Annals of Operations Research, vol. 4, (1), pp. 195225, 1985. 217 REFERENCES [Nae04] A. Naeemi, J. A. Davis, and J. D. Meindl, "Compact physical models for multilevel interconnect crosstalk in gigascale integration (GSI)", Transactions on Electronic Devices, vol. 51, (11), pp. 1902-1912, 2004. [Nak11] Y. Nakata, Y. Takeuchi, H. Kawaguchi, and M. Youshimoto, "A process-variation-adaptiv network-on-chip with variable-cycle routers", Proceedings of 14th Euromicro Conference on Digital System Design, pp. 801-804, 2011 [Nor98] P. Nordholz, D. Treytnar, J. Otterstedt, H. Grabiniski, D. Niggemeyer, and T. W. Williams, "Signal integrity problems in deep submicrons arising from interconnects between cores", Proceedings of VLSI Test Symposium, pp. 28 – 33 1998 [Nur04] J. Nurmi, H. Tenhunen, J. Isoaho, and A. Jantsch, "Interconnect-centric design for advanced SoC and NoC": Kluwer Academic Publishers, ISBN 1402078358, 2004. 
[Ope12] "OpenCores", OpenCores, opencores.org/projects, 2012 [Pal05] A. K. Palit, V. Meyer, W. Anheier, and J. Schloeffel, "ABCD modeling of crosstalk coupling noise to analyze the signal integrity losses on the victim interconnect in DSM chips", Proceedings of 18th International Conference on VLSI Design, pp. 354359, 2005 218 Web site: [Pam03] D. Pamunuwa, "Modelling and analysis of interconnects for deep submicron SoC", Doctoral thesis, Stockholm: Royal Institute of Technology, 2003. [Pam05] D. Pamunuwa, S. Elassaad, and H. Tenhunen, "Modeling delay and noise in arbitrary coupled RC trees", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, (11), pp. 1725-1739, 2005. [Pan05] P. P. Pande, G. D. Micheli, C. Grecu, A. Ivanov, and R. Saleh, "Design, Synthesis, and Test of Networks on Chips", Design & Test of Computers, vol. September-October, pp. 404-412, 2005. [Pil95] L. Pileggi, "Coping with RC(L) interconnect design headaches", Proceedings of International Conference on Computer-Aided Design, pp. 246-253, 1995 [Pra08] S. N. Pradhan, M. T. Kumar, and S. Chattopadhyay, "Three-level AND-OR-XOR network synthesis: A GA based approach", Proceedings of Asia Pacific Conference on Circuits and Systems, pp. 574 - 577, November 2008 [Qui52] W. Quinn, "The problem of simplifying truth functions"", American Mathematical Monthly, vol. 59, (8), pp. 521-531, 1952. [Rah10] M. A. Rahimian, S. Mohammadi, and M. Fattah, "A high-throughput, metastability-free GALS channel based on pausible clock method", Proceedings of Asia Symposium on Quality Electronic Design, pp. 294-300, 2010 219 REFERENCES [Rai06] J. Raik, V. Govind, and R. Ubar, "An external test approach for network-on-a-chip switches", Proceedings of 15th Asian Test Symposium, pp. 437-442 November 2006 [Ros07] D. Rossi, P. Angelini, and C. Metra, "Configurable error control scheme for NoC signal integrity", Proceedings of 13th IEEE International On-Line Testing Symposium, pp. 
43-48, July 2007 [Rot62] J. P. Roth and R. M. Karp, "Minimization over Boolean graphs", IBM Journal of research and development, vol. 6, pp. 227-238, 1962. [Rud93] R. Rudell, "Dynamic variable ordering for ordered binary decision diagrams", Proceedings of International Conference on Computer-Aided Design, pp. 42--47, 1993 [Sas95] T. Sasao, "A design method for AND-OR-EXOR three-level networks", Proceedings of International Workshop on Logic Synthesis, pp. 8:11-8:20, May 1995 [Sas98] T. Sasao and M. Matsuura, "DECOMPOS: An integrated system for functional decomposition", Proceedings of ACM/IEEE International Workshop on Logic Synthesis, pp. 471–477, 1998 [Saw98] H. Sawada, S. Yamashita, and A. Nagoya, "Restructuring logic representations with easily detectable simple disjunctive decompositions", Proceedings of Design Automation Conference, pp. 755-759, 1998 220 [Sen92] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. Sangionvanni-Vincentelli, "SIS: A system for sequential circuit synthesis", University of California, Berkeley 1992. [She01] S. R. Shelar and S. S. Sapatnekar, "Recursive bipartitioning of BDDs for performance driven synthesis of pass transistor logic circuits", Proceedings of International Conference on Computer Aided Design, pp. 449-452, November 2001 [Sin02] A. Sinha, S. K. Gupta, and M. A. Breuer, "Validation and test issues related to noise induced by parasitic inductances of VLSI interconnects", Transactions on Advanced Packaging, vol. 25, (3), pp. 329-339, 2002. [Sir02] W. Sirisaengtakin and S. K. Gupta, "Enhanced crosstalk fault model and methodology to generate tests for arbitrary inter-core interconnect topology ", Proceedings of Asian Test Symposium, pp. 163169, November 2002 [Som98] F. Somenzi, "CUDD: CU Decision Diagram package release 2.3.0": Department of Electrical and Computer Engineering, University of Colorado at Boulder, 1998. [Son09] J. Song, J. Han, H. Yi, T. 
Jung, and S. Park, "Highly compact interconnect test patterns for crosstalk and static faults", Transactions on Circuits and Systems II: Express Briefs, vol. 56, (5), pp. 419-423, 2009. 221 REFERENCES [Sri08] S. R. Sridhara, G. Balamurugan, and N. R. Shanbhag, "Joint equalization and coding for onchip bus communication", Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, (3), pp. 314-318, 2008. [Sta95] T. Stanion and C. Sechen, "Quasi-algebraic decompositions of switching functions", Proceedings of Sixteenth Conference on Advanced Research in VLSI, pp. 358-367, 1995 [Ste06] K. Stewart and S. Tragoudas, "Interconnect testing for networks on chip", Proceedings of VLSI Test Symposium, pp. 100-191, 2006 [Su00] C. Su, Y.-T. Chen, M.-J. Huang, G.-N. Chen, and C.L. Lee, "All digital built-in delay and crosstalk measurement for on-chip buses", Proceedings of Design And Test in Europe, pp. 527-531, Mar. 2000 [Sun08] F. Sun and Y. Xia, "BDD based detection algorithm for XOR-type logic", Proceedings of International Conference on Communication Technology, pp. 351354, November 2008 [Tam07] R. Tamhankar, S. Murali, S. Stergiou, A. Pullini, F. Angiolini, L. Benini, and G. De Micheli, "Timingerror-tolerant network-on-chip design methodology", IEEE Transactions on ComputerAided Design of Integrated Circuits and Systems, vol. 26, (7), pp. 1497-2007, 2007. [Tes05] M. Teslenko, A. Martinelli, and E. Dubrova., "Bound-set preserving ROBDD variable orderings may not be optimum", Transactions on Computers, vol. 54, (2), pp. 236- 237, 2005. 222 [Tra08] X.-T. Tran, Y. Thonnart, J. Durupt, V. Beroulle, and C. Robach, "A design-for-test implementation of an asynchronous network-on-chip architecture and its associated test pattern generation and application", Proceedings of Second ACM/IEEE International Symposium on Network-on-Chip, pp. 149-158, April 2008 [Uba96] R. Ubar, "Test synthesis with alternative graphs", IEEE Design & Test of Computers, vol. 13, (1), pp. 
48-57, 1996.

[Uba04] R. Ubar, M. Jenihhin, G. Jervan, and Z. Peng, "Hybrid BIST optimization for core-based systems with test pattern broadcasting", Proceedings of Second IEEE International Workshop on Electronic Design, Test and Applications, pp. 3-8, January 2004.

[Ver03] B. Vermeulen, J. Dielissen, K. Goossens, and C. Ciordas, "Bringing communication networks on a chip: Test and verification implications", IEEE Communications Magazine, pp. 74-81, September 2003.

[Von91] B. Von Stengel, "Eine Dekompositionstheorie für mehrstellige Funktionen", in Mathematical Systems in Economics, vol. 123, Anton Hain, Frankfurt, 1991.

[Wie02] P. Wielage and K. Goossens, "Networks on silicon: Blessing or nightmare?", Proceedings of Euromicro Symposium on Digital System Design, pp. 196-200, 2002.

[Yan99] C. Yang, V. Singhal, and M. Ciesielski, "BDD decomposition for efficient logic synthesis", Proceedings of International Conference on Computer Design, pp. 626-631, 1999.

[Zha03] Y. Zhao and S. Dey, "Fault-coverage analysis techniques of crosstalk in chip interconnects", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, (6), pp. 770-782, 2003.

[Zha04] Y. Zhao, S. Dey, and L. Chen, "Double sampling data checking technique: an online testing solution for multisource noise-induced errors on on-chip interconnects and buses", Transactions on VLSI Systems, vol. 12, (7), pp. 746-755, 2004.

[Zim03] H. Zimmer and A. Jantsch, "A fault model notation and error-control scheme for switch-to-switch buses in a network-on-chip", Proceedings of International Conference on Hardware/Software Codesign and System Synthesis (CODES-ISSS),
2003.

Department of Computer and Information Science
Linköpings universitet

Dissertations

Linköping Studies in Science and Technology
Linköping Studies in Arts and Science
Linköping Studies in Statistics
Linköpings Studies in Informatics

No 1490 Tomas Bengtsson: Testing and Logic Optimization Techniques for Systems on Chip, 2012, ISBN 978-91-7519-742-5.
