Linköping Studies in Science and Technology
Dissertations. No. 1490
Testing and Logic Optimization Techniques
for Systems on Chip
by
Tomas Bengtsson
Department of Computer and Information Science
Linköpings universitet
SE-581 83 Linköping, Sweden
Linköping 2012
Copyright © 2012 Tomas Bengtsson
ISBN 978-91-7519-742-5
ISSN 0345-7524
Printed by LiU-Tryck 2012
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-84806
Abstract
Today it is possible to integrate more than one billion transistors onto
a single chip. This has enabled implementation of complex
functionality in handheld gadgets, but handling such complexity is far
from trivial. The challenges of handling this complexity are mostly
related to the design and testing of the digital components of these
chips.
A number of well-researched disciplines must be employed in the
efficient design of large and complex chips. These include utilization
of several abstraction levels, design of appropriate architectures,
several different classes of optimization methods, and development of
testing techniques. This thesis contributes mainly to the areas of
design optimization and testing methods.
In the area of testing this thesis contributes methods for testing of
on-chip links connecting different clock domains. This includes
testing for defects that introduce unacceptable delay, lead to excessive
crosstalk and cause glitches, which can produce errors. We show how
pure digital components can be used to detect such defects and how
the tests can be scheduled efficiently.
To manage increasing test complexity, another contribution
proposes to raise the abstraction level of fault models from logic level
to system level. A set of system level fault models for a NoC-switch is
proposed and evaluated to demonstrate its potential.
In the area of design optimization, this thesis focuses primarily on
logic optimization. Two contributions to Boolean decomposition are
presented. The first is a fast heuristic algorithm that finds
non-disjoint decompositions of Boolean functions; it operates on a
Binary Decision Diagram. The other contribution is a fast algorithm
for detecting whether a function is likely to benefit from
optimization for architectures with a gate depth of three, with an
XOR-gate as the third gate.
Popular Science Summary
Today it is possible to integrate more than one billion transistors on a
single microchip. The development of microchips has made it possible
to implement very complex and advanced functions in small handheld
devices; so-called smartphones are a typical example. Handling the
complexity of microchips of this size is far from trivial, particularly
when it comes to the digital parts.
Results from several different research areas are used together to
design large, complex microchips efficiently. These research areas
address how to use several levels of abstraction, how to design good
architectures, how to optimize designs and how to test the finished
microchips. The contributions presented in this thesis focus partly on
how to optimize designs and partly on how to test the finished
microchips.
Different parts of a microchip can have different clock domains, to
avoid distributing one and the same clock signal across the entire
microchip. Regarding the testing of microchips, this thesis contributes
methods for testing communication links that run between parts of the
chip that have different clock signals. The contributions include tests
for defects that can cause errors through unacceptable delay, through
too much crosstalk or through glitches.
The logic level is the abstraction level at which a design is
represented in terms of gates and flip-flops. It is usually from such a
representation that one decides in detail how a microchip should be
tested, and extra gates and flip-flops are often inserted into the chip for
test purposes. To manage test complexity, this thesis includes a
contribution that proposes raising the abstraction level of test
development from the logic level to the system level. The system level
is a representation that describes what the design should do without
giving any details about the implementation. To demonstrate the
potential of test development at the system level, this thesis proposes
and evaluates how faults can be modeled at the system level for a
NoC-switch, a specific type of component found in some microchips.
Regarding optimization methods, this thesis has two contributions
that focus on minimizing the number of gates in a design. The first
contribution is an algorithm for extracting subfunctions of a Boolean
expression. That algorithm operates on a so-called Binary Decision
Diagram (BDD), a type of directed graph for representing a Boolean
function. The second contribution is a fast algorithm for predicting
how much a function will gain from an architecture with a gate depth
of three, where the third gate is a two-input XOR-gate.
Acknowledgments
There are many people who have supported and encouraged me
during my Ph.D. studies and the writing of this thesis. I would like to
give special thanks to Professor Shashi Kumar, my supervisor at
Jönköping University, for always taking time to help, support and
encourage me. I would also like to give special thanks to Professor
Zebo Peng, my supervisor at Linköping University, for great
supervision and patient guidance throughout my Ph.D. studies.
Special thanks also go to Professor Elena Dubrova at the Royal
Institute of Technology, Stockholm, for good supervision, discussions
and encouragement during the work on the logic optimization topic,
which formed the basis of my licentiate thesis. I would like to thank
Professor Shashi Kumar once more for very useful support,
encouragement and supervision during the work leading to my
licentiate degree. I would also like to thank Professor Bengt
Magnhagen, who accepted me as a doctoral student at Jönköping
University.
I am very thankful to Dr. Artur Jutman and Professor Raimund
Ubar at Tallinn Technical University for very good research
collaboration on electronic testing, as well as for their inspiration and
their willingness to share their knowledge and experience. I am also
thankful to Dr. Andrés Martinelli for good collaboration on logic
optimization. I am also grateful to all other colleagues at Linköping
University, the Royal Institute of Technology and Tallinn Technical
University who have contributed in one way or another to making this
work possible.
I am also grateful to all other colleagues at Jönköping University
who have contributed by encouraging me, participating in technical
discussions, helping me to handle obstacles or contributing in other
ways to making this work possible. Special thanks to Alf Johansson,
Rickard Holsmark and Dr. Adam Lagerberg.
I would also like to thank Brittany Shahmehri for her great work
correcting and improving the English.
Finally, I would like to give great thanks to my parents Ann-Louise
and Klas, my sister Åsa and my girlfriend Louise for all their support,
understanding and encouragement.
Tomas Bengtsson
November 2012
Contents

Part A. Introduction and background

1 Introduction .................................................................................. 3
1.1 Chip design, SoC and test development ..................................... 3
1.2 Addressed problems and contributions ...................................... 5
1.3 Thesis outline ............................................................................ 11

2 Digital system design and testing ............................................... 13
2.1 Digital system design ................................................................ 14
2.2 Core based design and systems on chips .................................. 18
2.3 Logic optimization .................................................................... 22
2.4 Defects and digital system testing ............................................ 31

Part B. Chip testing

3 Background and related work in SoC testing ............................. 49
3.1 SoC testing and NoC testing ..................................................... 49
3.2 On chip crosstalk induced fault testing ..................................... 55
3.3 Test generation at high abstraction levels ................................. 69

4 Testing of crosstalk induced faults in on-chip interconnects ...... 77
4.1 Method for testing of faults causing delay errors ..................... 77
4.2 Method for scheduling wires as victims ................................... 94
4.3 Method for test of crosstalk-faults causing glitches ............... 100
4.4 Conclusions ............................................................................ 112

5 System level fault models ......................................................... 113
5.1 System level faults .................................................................. 113
5.2 Evaluation of system level fault models ................................. 117
5.3 Conclusions ............................................................................ 130

Part C. Logic optimization

6 Background and related work in Boolean decomposition ........ 133
6.1 Decomposition of Boolean functions ..................................... 134
6.2 Decision diagram based decomposition methods ................... 139
6.3 Decomposition for three-levels logic synthesis ...................... 151
6.4 Other applications of Boolean decomposition ........................ 154

7 A fast algorithm for finding bound-sets ................................... 159
7.1 Basic idea of Interval-cut algorithm ....................................... 159
7.2 Interval-cut algorithm and formal proof of its functionality ... 161
7.3 Implementation aspects and complexity analysis ................... 165
7.4 Experimental results ............................................................... 175
7.5 Discussion and conclusions .................................................... 178

8 Functional decomposition for three-level logic implementation ... 179
8.1 Basic ideas in 3-level decomposition estimation method ....... 180
8.2 Theorem on which the estimation method is based ................ 182
8.3 Estimation algorithm .............................................................. 185
8.4 Experimental results ............................................................... 187
8.5 Conclusions ............................................................................ 189

Part D. Conclusions

9 Conclusions and future work .................................................... 193
9.1 Contributions in chip testing ................................................... 193
9.2 Contributions in Boolean decomposition ............................... 194
9.3 Proposals for future work ....................................................... 195
Part A
Introduction and background
Chapter 1
Introduction
Development of a System on Chip (SoC) is a complex process with
many steps, each with special demands and challenges. In this thesis,
we contribute analyses of certain aspects of the design and testing of
complex SoCs, and also propose solutions to some associated
problems.
This chapter briefly provides the background necessary for this thesis,
discusses the problems addressed and outlines the contributions.
Section 1.1 gives the background and Section 1.2 describes the
problems addressed and the contributions, including a list of
publications based on the contributions. Section 1.3 provides an
outline of the thesis.
1.1 Chip design, SoC and test
development
Since the integrated circuit was invented, the level of device integration
on a single chip has grown rapidly – in fact, it has doubled about
every two years for several decades [ITRS08]. This growth is
commonly referred to as Moore's law, named after Gordon Moore,
who initially predicted this rate of increase in 1965 [Moo65]. Today it
is possible to integrate more than one billion transistors on a single die.
As the level of integration increases, there are basically two
design challenges that need to be considered. The first of these design
challenges is related to the decreasing dimensions of on-chip
components and the relative increase in length of interconnections. As
component dimensions decrease, physical aspects which could
previously be neglected must now be considered. Crosstalk effects, for
example, require more attention today than previously.
The other design challenge that becomes more intricate as
component density increases is related to design complexity. The large
number of transistors that can be integrated onto a single chip makes it
possible to design very complex circuits. With increasing complexity
the design process becomes more challenging, which means that more
efficient design methods are needed. Two popular techniques are
utilization of Intellectual Property cores (IP-cores) and the creation of
more sophisticated computer tools that allow design at a higher level
of abstraction. The goal is a finished product that is as good as
possible in terms of cost, performance and/or power
consumption. However, many synthesis and optimization problems
are computationally expensive; therefore they cannot practically be
solved with an exact optimization algorithm. In many cases the choice
of optimization strategy is a tradeoff between performance, production
cost, flexibility and design time.
As the integration level increases, development of efficient test
techniques becomes more challenging as well. Development of tests
for defects in chips consists of two major tasks. The first task is to
determine how to detect the presence of defects inside the chip. The
second task is to provide means of activating the measurement of the
defect by sending a signal into the chip and then propagating the
results of the measurement back out of the chip. The second task is
referred to as test access.
The increasing challenges associated with identifying the presence
of a defect inside the chip are closely related to the increasing design
challenges which arise with miniaturization of components. For many
decades the stuck-at fault model has been used to model many defects
in digital circuits. In this model a defect makes a node in a digital
circuit behave as if it were permanently stuck at logic value 0 or 1. To
check whether a node is stuck-at 0, logic 1 is applied to
the node and the logic value of the node is measured. Shorts and
breaks are typical defects that can be detected in this way. Defects of
this type can be considered either to exist or not. In the case of modern
deep sub-micron chips, it is sometimes also necessary to test for other
defects that are of a more continuous nature. This means that one or
more parameters are outside of acceptable ranges, causing unwanted
chip behavior. One example of such a defect is a wire that is too thin,
causing excessive resistance. Another example is closely-spaced wires
causing more parasitic capacitance than accounted for, which can lead
to excessive crosstalk and produce an unacceptable level of delay.
Unlike defects modeled as stuck-at faults, measurement of crosstalk-faults
and delay faults requires extra logic.
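To make the stuck-at model concrete, here is a minimal sketch (a hypothetical two-gate circuit, not taken from the thesis) that injects a stuck-at-0 fault on an internal node and enumerates the input vectors that detect it:

```python
from itertools import product

def evaluate(a, b, c, stuck=None):
    """Tiny circuit y = (a AND b) OR c, with an optional stuck-at
    fault injected on the internal node n1 = a AND b."""
    n1 = a & b
    if stuck is not None:
        node, value = stuck
        if node == "n1":
            n1 = value  # the defect forces the node to a constant
    return n1 | c

# A vector is a test for the fault iff the good and faulty outputs differ.
tests = [v for v in product((0, 1), repeat=3)
         if evaluate(*v) != evaluate(*v, stuck=("n1", 0))]
print(tests)  # [(1, 1, 0)]: set a = b = 1, c = 0 and observe the output
```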
Increased chip complexity also makes designing test access a far
more challenging task. As more logic exists between the defect being
tested and the chip’s interface to its environment, it becomes more
complicated to activate the test and propagate the result data back out
of the chip. Large chips are usually equipped with logic dedicated for
test activation and test result propagation. In literature such logic is
usually referred to as Design for Testability (DfT) logic. Chips with a
large number of components have a large number of potential defects,
which means that testing for every potential defect becomes time
consuming. One solution to increase test speed is to add special onchip test logic, called Built In Self Test (BIST) circuit, which is used
for self testing of the chip. We use the phrase test logic to refer DfT
logic and BIST circuits.
1.2 Addressed problems and
contributions
In this thesis we address several key issues for the design and test of
complex SoCs. These issues are all related to the development of the
silicon technology and the rapid increase of chip complexity. The
detailed problems addressed and the technical contributions of the
thesis are described in the following subsections.
1.2.1. Crosstalk test for on-chip links
Given the small dimensions and high frequencies of modern chips, it
may be necessary to test for defects that cause excessive delays or too
much crosstalk. This type of testing is usually essential for relatively
long on-chip wires. Tests for crosstalk-faults should detect defects that
cause more crosstalk than accounted for. For some kinds of crosstalk
effects, explicit testing is not necessary although they need
consideration during design. Consideration of capacitive coupling is
usually sufficient when the test fabric is designed. The capacitive
coupling between wires affects their signal delay and can cause
glitches.
Unacceptable signal delay caused by crosstalk occurs under
certain conditions, which means it will only manifest when the
interfering wires are carrying certain signals.
When a signal wire is tested for crosstalk related defects, the
interfering wires can be put in a state representing the worst case
scenario. If the signal works correctly in each worst case scenario, one
can conclude that the tested signal does not suffer from too much
crosstalk. This type of test is however not sufficient in cases where a
signal travels between components with different clock signals
because there is non-determinism in the phase difference between the
different clock signals in the clock domains.
In this thesis a test method is presented which tests for crosstalk-faults in bus lines between different clock domains on a chip. This
method reads the signal wire one clock cycle earlier than under normal
operation. In this way it can be guaranteed that the interference
affecting the signal being tested is not so large that it can cause a
failure. This measurement can be repeated several times and if the
signal is read correctly at least once, one can conclude that the
crosstalk-fault under consideration is not present. An advantage of this
method is that only digital test logic is needed for this test.
Crosstalk can also cause glitches on signal wires. With digital
glitch detectors, tests for glitches can be included. Tests for crosstalk-faults
causing unacceptable delay and for faults causing glitches form a
complete test for crosstalk induced faults affecting signal wires.
Contributions in this thesis show how such a complete test can be
formed only requiring digital test logic to be inserted in the chip.
Buses on chips have wires closely packed together. The height of
wires in modern chips has become greater than their width [Aru05],
which makes capacitive coupling between wires relatively significant.
This, in turn, increases the risk that a defect could cause capacitive
coupling effects to be greater than accounted for. Such defects are the
main cause of crosstalk-faults, which means it is often sufficient to
test only for this type of defect. When testing for interference on a
signal wire in a bus, one strategy for creating worst case interference
is to apply aggressor signals to all other wires in the bus. However, it
is usually sufficient to apply such signals only to the wires closest to
the wire being tested. In this way, several wires in a bus can be tested
for crosstalk-faults simultaneously, which improves test efficiency.
During the test procedure the term victim wire is used for the
wires currently being tested and the term aggressor wire is used for
the wires that affect the victim wires through crosstalk. One
contribution of this thesis is a method for scheduling wires to be
victims and aggressors during the test procedure. A shift register is
used with one cell for each respective wire, controlling whether it
should be a victim or an aggressor. Given a minimum distance
between wires that should simultaneously be victims, initial values
can be determined for the shift register to make the test procedure
efficient.
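As an illustration (a simple sketch, not the thesis's exact scheduling algorithm), the following Python code schedules victims for an n-wire bus given a minimum spacing d between simultaneous victims: in each of d rounds, every d-th wire is a victim and the remaining wires act as aggressors, so each wire is tested exactly once:

```python
def victim_schedule(n_wires, min_distance):
    """Yield one victim/aggressor assignment per test round.

    In round r, wires whose index is congruent to r modulo
    min_distance are victims; all other wires are aggressors.
    Simultaneous victims are thus min_distance wires apart.
    """
    for r in range(min_distance):
        victims = [w for w in range(n_wires) if w % min_distance == r]
        aggressors = [w for w in range(n_wires) if w % min_distance != r]
        yield victims, aggressors

for victims, aggressors in victim_schedule(8, 3):
    print("victims:", victims, "aggressors:", aggressors)
# victims: [0, 3, 6] aggressors: [1, 2, 4, 5, 7]
# victims: [1, 4, 7] aggressors: [0, 2, 3, 5, 6]
# victims: [2, 5]    aggressors: [0, 1, 3, 4, 6, 7]
```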
The contributions related to testing of crosstalk-faults and delay
faults in asynchronous on-chip links have been published in [Ben05a,
Ben05b, Ben06a, Ben06b, Ben06c, Ben06e, Ben08].
1.2.2. System level fault modeling and test
generation
It has been recognized in the research and circuit manufacturing
community that the way to increase design-productivity is to work at a
higher level of design specification and to use CAD tools to
synthesize the circuit for the target technology. Most of the test
methodologies still use a logic level representation for generating test
vectors and test logic. It would be beneficial if test logic and test
vectors could be prepared along with the rest of the design process.
This requires accurate fault models at the higher abstraction levels.
At a specific level of abstraction, faults can be developed that
correspond to possible defects in the actual physical implementation,
or to faults at a lower abstraction level. Faults can also be based on
fault models at the abstraction level of design specification.
Faults that correspond to possible defects have the advantage that
they can be very accurate. A drawback is that it can be tricky to create
them depending on the tools and methods used for synthesis and how
the system has been optimized. Another drawback is that such faults
cannot be found before synthesis has been completed.
Faults based on fault models at a certain abstraction level have the
advantage that they can be used before the design is synthesized into a
lower abstraction level. This is needed for development of test data
and test logic along with the design process. At abstraction levels
above the logic level the most difficult challenge is to create fault
models with a good correlation to physical defects in the
implementation. The higher the abstraction level, the more difficult it
is to find good fault models.
Above the behavior level of abstraction is the system level. The
system level of abstraction describes what the system should do
without providing information on how it should be implemented.
Because it is difficult to define general system level faults, we propose
the use of application specific fault models at the system level.
Application specific faults are specific to a certain type of system. For
a switch used in data communication networks, an example of such a
fault model could be: a packet from one specific direction that is
supposed to be transferred further in a certain direction is instead
transferred in a wrong direction. For a display driver an example of a
system level fault model would be: pixels of a certain color turn a
certain different color when the intensity is supposed to be greater
than some level.
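As a toy illustration of such an application specific fault model (hypothetical code, not the thesis's fault model implementation), the sketch below models a switch misrouting fault by perturbing a routing table and shows how a system level test detects it:

```python
# Reference routing of a 4-port switch: input direction -> output direction.
GOOD_ROUTES = {"north": "south", "south": "north",
               "east": "west", "west": "east"}

def make_switch(fault=None):
    """Return a routing function; `fault` is (input_dir, wrong_output_dir)."""
    routes = dict(GOOD_ROUTES)
    if fault is not None:
        src, wrong_dst = fault
        routes[src] = wrong_dst  # system level fault: packet misrouted
    return lambda direction: routes[direction]

good = make_switch()
faulty = make_switch(fault=("north", "east"))

# A system level test: send a packet from each direction and compare.
for d in GOOD_ROUTES:
    if faulty(d) != good(d):
        print(f"fault detected: packet from {d} routed to {faulty(d)}"
              f" instead of {good(d)}")
```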
In this thesis we propose and evaluate a set of system level fault
models for a Network on Chip-switch (NoC-switch). A simplified
version of a NoC-switch has been designed and synthesized into logic
level. Statistical analyses have been done to compare how test vectors
cover the stuck-at faults for this logic level implementation and how
they cover the system level faults.
The contributions related to system level fault modeling and
analysis have been published in [Ben06d].
1.2.3. Logic optimization
Optimization is generally performed during the process of designing
and synthesizing digital systems. The most important targets for
optimization are to minimize chip area, to optimize speed and to
minimize power consumption. For a given design, one target may be
prioritized over the others. Logic optimization is optimization during
synthesis from the RT-level to the logic level, and it is the process of
optimizing a system described at the logic level of abstraction. The
number of flip-flops, number of gates and sizes of gates can be used at
the logic level to predict the chip area and power consumption of the
system. Gate depth can be used to predict speed.
Many optimization problems at the logic level are NP-hard
[Dem94], so heuristic methods are needed. One of the main steps in
optimization of the combinational parts of the design is Boolean
decomposition. Boolean decomposition is the process of finding sub
expressions of a Boolean function. This thesis has two contributions to
Boolean decomposition.
The first contribution is a fast heuristic method that finds bound-sets of a Boolean function. The presented method executes on
Reduced Ordered Binary Decision Diagrams (ROBDD). For
ROBDDs with good variable order the presented heuristic finds all
bound-sets in most cases.
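A bound-set of f is a subset Y of its variables such that f can be rewritten as f(X) = h(g(Y), Z), where Z contains the remaining variables. The brute-force check below is illustrative only (the thesis's algorithm instead works on ROBDDs): a candidate Y of this simple kind is a bound-set exactly when fixing Y yields at most two distinct subfunctions over Z:

```python
from itertools import product

def is_bound_set(f, n, bound):
    """Check whether `bound` (tuple of variable indices) is a bound-set
    of the n-input function f, by counting distinct column patterns
    f(y, .) over assignments y to the bound variables."""
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for y in product((0, 1), repeat=len(bound)):
        pattern = []
        for z in product((0, 1), repeat=len(free)):
            x = [0] * n
            for i, b in zip(bound, y):
                x[i] = b
            for i, b in zip(free, z):
                x[i] = b
            pattern.append(f(tuple(x)))
        columns.add(tuple(pattern))
    return len(columns) <= 2  # at most two columns: g can be 1-bit

# Example: f = (x0 AND x1) XOR x2 has bound-set {x0, x1} via g = x0 AND x1.
f = lambda x: (x[0] & x[1]) ^ x[2]
print(is_bound_set(f, 3, (0, 1)))  # True
print(is_bound_set(f, 3, (0, 2)))  # False
```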
The second contribution is a fast method to find the likelihood that
a Boolean function f(X) will benefit from a target implementation
expressed as g1(X) ⊕ g2(X) when functions f(X), g1(X) and g2(X) are
implemented with two-level logic. Optimization algorithms for such
an expression can be quite time-consuming, so it is advantageous to
know in advance if optimization is likely to be beneficial.
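As a toy illustration (hypothetical functions and encoding, not the thesis's estimation algorithm) of why this architecture can pay off: f = x₀ ⊕ (x₁·x₂) needs three product terms as a flat two-level SOP, but only two in total when split across a final XOR-gate:

```python
from itertools import product

def sop_eval(terms, x):
    """Evaluate a sum-of-products; each term maps variable index -> literal."""
    return int(any(all(x[i] == v for i, v in t.items()) for t in terms))

# f = x0 XOR (x1 AND x2): a flat two-level SOP needs three product terms.
f_sop = [{0: 0, 1: 1, 2: 1}, {0: 1, 1: 0}, {0: 1, 2: 0}]

# Three-level form g1 XOR g2 with trivial two-level halves.
g1 = [{0: 1}]          # g1 = x0
g2 = [{1: 1, 2: 1}]    # g2 = x1 AND x2

for x in product((0, 1), repeat=3):
    assert sop_eval(f_sop, x) == sop_eval(g1, x) ^ sop_eval(g2, x)
print("three-level form verified: 2 product terms instead of 3")
```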
The contributions relating to Boolean decomposition have been
published in [Ben01, Ben03a, Ben03b].
1.2.4. List of contributions
The contributions in this thesis have been published in the following
articles.
[Ben01] T. Bengtsson and E. Dubrova, "A sufficient condition for detection of XOR-type logic", Proceedings of Norchip, Stockholm, Sweden, pp. 271-278, November 2001.

[Ben03a] T. Bengtsson, "Boolean decomposition in combinational logic synthesis", Licentiate thesis, Royal Institute of Technology, Stockholm, ISSN 1651-4076, 2003.

[Ben03b] T. Bengtsson, A. Martinelli, and E. Dubrova, "A BDD-based fast heuristic algorithm for disjoint decomposition", Proceedings of Asia and South Pacific Design Automation Conference, Kitakyushu, Japan, pp. 191-196, January 2003.

[Ben05a] T. Bengtsson, A. Jutman, S. Kumar, and R. Ubar, "Delay testing of asynchronous NoC interconnects", Proceedings of International Conference Mixed Design of Integrated Circuits and Systems, June 2005.

[Ben05b] T. Bengtsson, A. Jutman, R. Ubar, and S. Kumar, "A method for crosstalk fault detection in on-chip buses", Proceedings of Norchip, Oulu, Finland, pp. 285-288, November 2005.

[Ben06a] T. Bengtsson, A. Jutman, S. Kumar, R. Ubar, and Z. Peng, "Analysis of a test method for delay faults in NoC interconnects", Proceedings of East-West Design & Test International Workshop (EWDTW), pp. 42-46, September 2006.

[Ben06b] T. Bengtsson, A. Jutman, S. Kumar, R. Ubar, and Z. Peng, "Off-line testing of delay faults in NoC interconnects", Proceedings of Euromicro Conference on Digital System Design: Architectures, Methods and Tools, pp. 677-680, 2006.

[Ben06c] T. Bengtsson, S. Kumar, A. Jutman, and R. Ubar, "An improved method for delay fault testing of NoC interconnections", Proceedings of Special Workshop on Future Interconnects and Networks on Chip (along with Design And Test in Europe), March 2006.

[Ben06d] T. Bengtsson, S. Kumar, and Z. Peng, "Application area specific system level fault models: a case study with a simple NoC switch", Proceedings of International Design and Test Workshop (IDT), November 2006.

[Ben06e] T. Bengtsson, S. Kumar, R. Ubar, and A. Jutman, "Off-line testing of crosstalk induced glitch faults in NoC interconnects", Proceedings of Norchip, Linköping, Sweden, pp. 221-226, November 2006.

[Ben08] T. Bengtsson, S. Kumar, R. Ubar, A. Jutman, and Z. Peng, "Test methods for crosstalk-induced delay and glitch faults in network-on-chip interconnects implementing asynchronous communication protocols", Computers and Digital Techniques, IET, vol. 2, no. 6, pp. 445-460, 2008.
1.3 Thesis outline
This thesis is divided into four parts, Part A – Part D. Part A gives an
introduction and background for the entire thesis. It consists of this
introductory chapter and Chapter 2. Chapter 2 provides a more
detailed background to the contributions in this thesis. In Part B and
Part C the contributions in testing and in logic optimization,
respectively, are presented.
Part B consists of Chapters 3 – 5. Chapter 3 presents background
on SoC testing and it describes work related to the contributions in
electronic testing. Chapter 4 presents the contributions in the area of
testing for crosstalk and delay faults. The contribution to system level
fault modeling and testing is presented in Chapter 5.
Part C has a similar structure to Part B. It consists of Chapters 6 –
8. Chapter 6 provides more detailed background on Boolean
decomposition. Work related to the contributions in Boolean
decomposition is also described in Chapter 6. Chapter 7 presents a fast
heuristic method to find bound-sets of a Boolean function
represented with a BDD. The contribution to optimization of Boolean
functions of the form f(X) = g1(X) ⊕ g2(X) is presented in Chapter 8.
The last part of the thesis, Part D, consists of Chapter 9, which
presents conclusions and proposals for future work.
Chapter 2
Digital system design and
testing
This chapter provides relevant background in more detail than was
offered in Chapter 1, with the goal of providing context for the
contributions described in later chapters. Section 2.1 provides an
overview of the development procedure for complex digital electronic
systems. Section 2.2 describes core based design and testing,
including an introduction to SoC. It also describes Network on Chip
(NoC), which is a promising candidate for the interconnection
infrastructure of future SoCs. Section 2.3 and Section 2.4 give
overviews of design optimization issues and test optimization issues
respectively.
2.1 Digital system design
Figure 2.1: A typical design flow for a complex digital system (System specification → System synthesis → Design at behavior level → Behavior synthesis → RT-level design → Logic synthesis and technology mapping → Logic design → Layout generation → Layout; libraries of algorithms, soft IP-cores at the RT-level and logic level, and hard IP-cores feed the respective steps, and a parallel Software development branch produces the embedded binary code).
The design process of a complex digital system generally starts from a
system level specification. A system specification is a description of
what functions the system should perform with little or no description
of how they should be implemented. The design process then, step by
step, creates a design that implements the desired functionality. Figure
2.1 shows a diagram of what the design process typically looks like. In
many cases the design process is iterative but this is omitted in the
figure for the sake of simplicity and to maintain focus on the parts that
are relevant to the work presented in this thesis. The steps at the upper
part of Figure 2.1 deal with more abstract design specifications.
Typically, the design steps at higher abstraction levels are performed
manually, while steps at lower levels are performed with computer
tools.
2.1.1. Abstraction levels for modeling and
design
To handle the complexity, the design process is divided into several
levels of abstraction. Higher abstraction levels hide details and
complexity present at lower levels of abstraction. In Figure 2.1 the
design flow starts with System specification and ends with Layout.
The following section provides descriptions of the abstraction levels
System level, Behavior level, Register transfer level, Logical level and
Layout level.
System level
As mentioned before, at the system level the system’s desired
functionalities are described without any explanation of how they
should be implemented. For example, if a system or part of a system is
supposed to sort a list of elements, a system level representation
specifies that the system should sort and, if it is not obvious what is
meant by sorting, it defines the properties of a sorted list. However, at
this abstraction level no information is given about which algorithm
should be used to perform the sorting. Another example is a design
that includes filtering of a digital signal. At the system level the
properties of the filter would be specified but not the algorithm that
implements the filter. In Figure 2.1 the box System specification
represents the description at the system level of abstraction. SystemC,
SystemVerilog and UML are some examples of languages that can be
used to model a system at this level.
Behavior level
At the behavior level the system is described as an algorithm. In the
example of a system that is supposed to sort, this level defines the
sorting algorithm that should be used. In the example of a system with
a filter, the filtering algorithm is defined with all its parameters. For
example, it could be described as a Finite Impulse Response (FIR)
filter in which all multiplication factors are specified, where
multiplication factor refers to the factors by which samples should be
multiplied. The number representation that should be used for sample
values is usually also specified at the behavior level. VHDL and
Verilog can be used for modeling at this level.
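For instance, a behavior level description of a FIR filter fixes the algorithm and its multiplication factors, but implies no hardware structure; a sketch of the computation (with made-up coefficients) in Python:

```python
# Behavior level view of a FIR filter: the algorithm and its
# multiplication factors are fixed, but no hardware structure is implied.
COEFFS = [0.25, 0.5, 0.25]  # made-up example coefficients

def fir(samples, coeffs=COEFFS):
    out = []
    history = [0.0] * len(coeffs)
    for s in samples:
        history = [s] + history[:-1]  # shift in the new sample
        out.append(sum(c * h for c, h in zip(coeffs, history)))
    return out

print(fir([1.0, 0.0, 0.0, 0.0]))  # impulse response = the coefficients
```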
Register transfer level
At the Register Transfer level (RT-level) a system is described with a
datapath and a controller. The datapath consists of functional units,
vector multiplexers and registers. These elements are connected to
each other by means of signals which are vectors of bits. The RT-level
is the highest level of abstraction at which it is defined what should be
performed for each clock cycle. In the datapath only registers contain
memory elements and they are clocked with a clock signal.
Functionalities that should be purely combinational are represented as
functional units. Examples of functional units are ALUs and
multipliers. For functionalities that require several clock cycles, an
RT-level representation describes how they are implemented with
registers and pure combinational functional units.
The controller is used to generate load signals for registers,
control the multiplexers and control the functional units in the datapath.
Inputs to the controller can be signals from the datapath representing
status of previous computations, for example output of a comparator
that compares two bit vectors in the datapath. The controller can also
have external inputs. The controller is usually described as a Finite
State Machine (FSM).
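As a schematic illustration (hypothetical and heavily simplified, not an actual RT-level description), such a controller can be viewed as a Moore-style FSM whose state fixes the multiplexer-select and register-load signals of the datapath:

```python
# Minimal Moore-style FSM controller sketch: each state fixes the control
# signals (mux select, register load enable) driving the datapath.
CONTROL = {
    "FETCH": {"mux_sel": 0, "load_acc": 0, "next": lambda st: "MULT"},
    "MULT":  {"mux_sel": 1, "load_acc": 1,
              "next": lambda st: "DONE" if st["last_sample"] else "MULT"},
    "DONE":  {"mux_sel": 0, "load_acc": 0, "next": lambda st: "FETCH"},
}

def step(state, status):
    """One clock cycle: emit the current state's control signals,
    then take the transition selected by the datapath status."""
    outputs = {k: v for k, v in CONTROL[state].items() if k != "next"}
    return CONTROL[state]["next"](status), outputs

state = "FETCH"
for last in (False, False, True, False):
    state, ctrl = step(state, {"last_sample": last})
    print(state, ctrl)
```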
In the example of a system with a digital filter the datapath
contains sample values and intermediate results of the computation.
The controller part controls what the datapath should do. For example
one multiplier can be utilized for several multiplication steps in the
filter algorithm. The controller then controls vector multiplexers in the
datapath to feed the multiplier with multiplicands from the correct
sources. VHDL and Verilog are examples of hardware description
languages that are used for description of a system at the RT-level.
In Figure 2.1 the box RT-level design represents the description at
the RT-level of abstraction.
Logic level
At the logic level of abstraction the system is described as a network
of gates and flip-flops. For example, at the RT-level an ALU is only
defined with the operations it should perform for respective
combinations of input control signals. At the logic level it is described
how a network of gates makes the ALU perform those operations. An
expression that is formulated as a Boolean equation has an easy and
direct mapping to a network of gates. Because of that, Boolean
equations are often used to represent the gates in a logic network. In
Figure 2.1 the box Logic design represents the description at the logic
level of abstraction.
Layout level
In this thesis we use the term layout level to refer to a complete
description of the various masks used in various steps of IC
fabrication. At this level it is defined where on the chip each transistor
and all other components should be placed to make the chip perform
the desired functionality.
It is possible to define transistor level as an abstraction level in
between the logic level and the layout level. At that level the network
of transistors is specified but not the physical position of respective
transistors. EDIF is an example of a file format that can be used to
describe a system at transistor level.
In Figure 2.1 the box Layout represents the description at the
layout level of abstraction.
2.1.2. Design flow
System synthesis is the process of refining the system specification
into a design at the behavior level. At this step the algorithms that will
be used for implementing the system specification are identified. A
decision can also be made to use an architecture template with some
pre-designed components, such as processors, memories, buses and
communication protocols, already included in the design.
Parts of the functionality in digital systems are usually
implemented in software. In Figure 2.1 this is represented by the
dotted box to the right. The software development can be described
in more detail, but because the contributions of this thesis are
related to hardware development, everything about software
development is represented with a single box in the figure. The
embedded software is mapped to processing elements in the system.
In addition to predefined components, the architecture template
also contains slots for new hardware. When the system synthesis has
finished, the hardware design process continues in the synthesis steps
that follow. The design at the behavior level is refined to an RT-level
design through the design step behavior synthesis. Behavior synthesis
schedules what operations should be executed at each clock cycle.
Logic synthesis and technology mapping is the synthesis step in which
the RT-level design is refined to a logic design. The output of this step
is a network of gates and flip-flops that implement the functionality of
the system. The last synthesis step is the layout generation in which
the masks for various layers are generated for chip manufacturing.
Each of the synthesis steps can be implemented in many different
ways and it is desirable to find a method that gives the best possible
final implementation. Optimization operations can also be applied to
the design descriptions at the different abstraction levels before
proceeding with the next synthesis steps. This is described further in
Section 2.3, which is about optimization. There the focus is on
optimization at the logic level. The contributions of this thesis
presented in Part C concern optimization at the logic level.
2.2 Core based design and systems on
chips
As mentioned in Section 1.1 the integration level on a chip has
doubled about every two years for several decades such that more than
one billion transistors can be integrated on a single chip today. If this
large capacity of chips is to be utilized, the methods used to design
chips need to be increasingly efficient; otherwise the required number
of man-hours to design a chip would grow with the integration level
and would become unrealistic in most cases. One important method to
keep the number of man-hours for design acceptably low is the usage
of IP-cores. IP-cores are ready-made designs that can be included in a
SoC design. In the current section IP-cores are first described, and
then the way in which SoCs can be composed with IP-cores is
discussed. After that NoC, an infrastructure that can be used in SoCs
to connect IP-cores, is described.
2.2.1. IP-cores
IP-cores are designs that have already been made in-house or designs
that are obtained from external suppliers. Based on the abstraction
level of description of an IP-core it can be classified either as a hard
IP-core or a soft IP-core. An IP-core provided at the layout level is
called a hard IP-core. IP-cores provided at the logic level of
abstraction and above are called soft IP-cores. A soft IP-core can be a
network of gates and flip-flops. In this case it is at the logic level of
abstraction. A soft IP-core provided at the RT-level can be a VHDL description.
The architecture template may already include some IP-cores and
some more IP-cores may be included during the synthesis process.
This is shown in Figure 2.1. For example, some hard IP-cores can be
included during layout generation while soft IP-cores are included
earlier in the design process.
One important advantage of IP-cores is that they are reusable. An
IP-core can be used in several designs and can be reused from
previously designed chips. There are companies which sell IP-cores
[Alt12, Arm12] and some IP-cores are available for free [Ope12]. A
widely used type of IP-core is the processor. The supplier can then provide
software development tools along with the processor IP-core.
Soft IP-cores rely on the users’ synthesis tools. They are
independent of the target chip technology and it is an important
advantage that they can be used for many different chip technologies.
Another advantage is that the synthesis tool can, to some extent, make
a soft IP-core fit layout constraints, for example a particular desired shape.
Hard IP-cores are provided as a layout for a certain chip
technology. An advantage of hard IP-cores is that the performance in
terms of speed and power consumption can be optimized and this
information can be provided by the IP-core provider. Knowledge of
such details can help selection of appropriate IP-cores to include in a
design based on the design constraints.
2.2.2. Systems on chips
A SoC is composed of several cores on a single chip which
collaborate to make the chip perform its desired functionality. In a
SoC the different cores have to be connected to each other in an
appropriate way to achieve the desired functionality. Early SoCs
usually had dedicated wires connecting each pair of components that
needed to communicate. When the amount of integration grew, such
interconnections became unwieldy and took up too much chip area.
As a result, bus-based infrastructures became popular in SoCs. A bus
is a single broadcast medium. It is widely realized that single-bus
architectures can no longer deliver the required global bandwidth and
latency to support current SoCs [Ver03]. Using multiple buses is a
way to achieve better performance. For a system with a large number
of cores, such a system of buses might become bulky, because all
pairs of cores that communicate with each other must have at least one
bus in common, and several buses are needed to gain any significant
advantage over a single bus.
In complex SoCs several advantages can be achieved if a packet
based communication infrastructure can be used instead of buses. In
2002 the NoC communication architectures were proposed [Ben02,
Kum02]. This is a packet based communication infrastructure that can
be used instead of buses in SoCs. One advantage of such a structure
is that more parallelism can be achieved in the communication
compared to a bus-based infrastructure. In this way the overall
throughput can be improved. The NoC architecture is described
further in the next subsection.
2.2.3. Networks on chips
The process of developing SoCs, particularly those with a NoC
infrastructure, is a unifying factor for the contributions in this thesis.
The NoC infrastructure is a packet based communication system
connecting different cores in a SoC. A core is an IP-core or some
other subcomponent. This packet based infrastructure consists of
switches with links between them. In a switch, each packet that arrives
at an input port is forwarded to an output port on its way to the final
destination. Ports in a switch are used to connect to other switches via
links and to connect to cores.
A commonly used topology for the infrastructure and cores is the
mesh topology [Ben02, Kum02], which is illustrated in Figure 2.2. In
this type of topology the switches and the cores are arranged in a
matrix. Communication links connect adjacent switches in the
x-direction and in the y-direction. Each switch is connected to
one core. The physical lengths of the connection links between
switches are equal. This has the advantage that links will have
predictable and equal delays.
Figure 2.2: Mesh topology NoC layout (a matrix of switches, each connected to one core and to its adjacent switches).
A drawback with the mesh layout is that the available chip area for
each core is required to be approximately equal. Usage of IP-cores
that are much smaller will leave chip area unused. IP-cores that are
larger than the allocated area cannot be included in the NoC-chip
without modifying its structure.
There is another proposed topology, in which the NoC infrastructure is placed in a central part of the chip with the cores
around it. The idea of this topology is to take a bus-based SoC and
replace the bus with a NoC infrastructure. This topology is used in the
Aethereal type of NoC [Ver03, Wie02].
2.3 Logic optimization
This section gives an overview of the optimization process during
design of digital systems. It focuses in particular on the optimization
step referred to as logic optimization.
2.3.1. Overview of optimization during
system design
Many different possible implementations exist for the same
functionality. Properties like chip area, speed performance and power
consumption can differ between different implementations. The way
synthesis steps are implemented has a significant effect on the
properties of the final implementation. For some applications the main
objective is to make the chip as small and power efficient as possible.
For other applications processing speed might be more important.
Optimization for a specific objective can be performed in the synthesis
steps from one abstraction level to another, or at a given
abstraction level.
To be able to optimize, metrics are needed at different abstraction
levels to gauge which design is better. Such metrics should correlate
strongly to the optimization objective. For example, at the logic level,
the number of gate inputs and the number of flip-flops can be used to
estimate how much chip area will be needed in the final layout. The
gate depth can be used to estimate the maximum possible clock
frequency. At the behavior level, the time complexity of algorithms used for
implementing the functionality is a metric with good correlation to the
speed of the final implementation.
Many of the optimization problems faced during refinement of a
design from system specification to layout are NP-hard [Dem94]. The
following paragraphs give a brief overview of some of the
optimization problems and associated synthesis steps.
The system level of abstraction describes what the system should
do without any description of how. Synthesis from the system level to
the behavior level is usually done manually. This synthesis step
includes decisions about which algorithms should be used for different
subfunctions of the system. Good algorithm selection is very
important to the performance of the final product.
During synthesis from the behavior level of abstraction to the RT-level, a number of design decisions must be made. For example, it
might be determined that several operations at the behavior level can
use the same functional unit at the RT-level. It must also be decided at
this synthesis step whether pipelining should be used in the datapath
or not.
At the RT-level, parts of the functionality are usually described as
one or several FSMs with a datapath. During synthesis from the RT-level to the logic level, the number of states in the FSM describing the
controller is minimized. These states are encoded and Boolean
expressions for the combinational part of the state machine are
generated. The chosen encoding has a large impact on the number of
gates needed.
At the logic level of abstraction the system is described with flip-flops and combinational logic. The combinational logic can either be
described as a network of gates or as Boolean expressions. A Boolean
expression has a direct mapping to a network of gates.
During synthesis from the logic level to the layout level, the gates,
flip-flops and interconnects are materialized as a layout. Layouts for
specific types of gates and flip-flops are generally taken from a
library. An optimization challenge at this step is to place gates and
flip-flops and route the interconnects.
2.3.2. Logic optimization
The optimization that is performed during synthesis from the RT-level
to the logic level as well as optimization on the logic level description
of a system is referred to as logic level optimization or simply as logic
optimization. Logic optimization consists of state minimization of
FSMs, encoding of the states in FSMs and optimization of
combinational logic.
For fully specified FSMs there is an exact algorithm for state
minimization with polynomial time complexity. On the other hand,
the minimization problem is NP-hard for incompletely specified FSMs
in which outputs and/or state transitions are don't-cares for some
combinations of inputs [Dem94].
The states in an FSM need to be encoded with a set of flip-flops.
The number of flip-flops needed is at least ⌈log₂ N⌉, where N is the
number of states. In some cases using more than the minimum number
of flip-flops can reduce the combinational parts so much that it is
worth using more flip-flops. For example, one-hot encoding uses one
flip-flop for each state such that exactly one flip-flop takes logic value
1 at a time; an FSM with N = 5 states thus needs at least ⌈log₂ 5⌉ = 3
flip-flops with a minimal encoding but 5 with one-hot encoding.
Optimization of the combinational parts of a design allows the
optimization procedure to choose different strategies and tradeoffs.
Combinational logic optimization has two main types of optimization
strategies, two-level logic optimization and multi-level logic
optimization. In two-level optimization the logic synthesis generates a
logic circuit with a gate depth of two, not counting inverters on the
inputs. Gate depth is the maximum number of gates a signal must
traverse between an input of the combinational part and an output of
the combinational part. In multi-level optimization the gate depth in
the logic circuit can be anything. Logic synthesis for two-level logic
circuits is especially useful when PLA-structures are used for
implementation. Multi-level optimization strategies are preferable
when the target implementation is an FPGA or a full custom chip.
Two-level optimization and multi-level optimization are described in
Subsections 2.3.3 and 2.3.4. In the area of optimization this thesis
includes contributions in logic optimization of combinational logic.
Part C contains the contributions of this thesis in logic optimization,
and it also describes the logic optimization step referred to as
decomposition.
2.3.3. Two-level optimization
Two-level description
Two-level optimization is optimization of logic into logic circuits with
a gate depth of two. Inputs and complements of inputs to the logic
function are connected to AND-gates. Outputs of the AND-gates are
connected to OR-gates. There is one OR-gate for each output of the
logic function. The AND-gates are treated as the first level of logic
and the OR-gates as the second level of logic. In fact, if the
complements of the inputs are not available, one more level of logic is
required to invert the input signals. However, such inverters on the
inputs are not counted as an additional level of gates, so optimization
for this kind of structure is called two-level optimization. There is a
direct mapping between a two-level logic circuit and a
sum-of-products (SOP)-form representation of Boolean functions.
An example of an expression in SOP-form is
f(x₁, x₂, x₃, x₄) = x₁·x₂·x₃·x₄ + x₁·x₂ + x₃·x₄. The terms in
such an expression are called product terms. Each product term
corresponds to an AND-gate and the sum in the expression
corresponds to the OR-gate. An alternative to the SOP-form is the
product of sums (POS)-form.
In practice, a system normally contains combinational parts with
more than one output. The number of gates can usually be reduced if
some product terms are shared by more than one output. Figure 2.3
shows an example of a two-level implementation of the two Boolean
functions f₁ = x₁·x₂·x₃·x₄ + x₁·x₂ + x₃·x₄ and
f₂ = x₁·x₂ + x₃·x₄. Both of these functions include the product term
x₃·x₄ and they share the AND-gate generating it.
Figure 2.3: Two-level implementation of two output functions
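A small sketch of the sharing idea (a hypothetical representation, mirroring Figure 2.3): product terms are evaluated once and reused by both output OR-gates:

```python
# Each product term is a list of literals; shared terms are computed once.
def product_term(literals, x):
    # literal (i, True) means x_i, (i, False) means NOT x_i
    return int(all(x[i] == int(pos) for i, pos in literals))

TERMS = {
    "p1": [(0, True), (1, True), (2, True), (3, True)],  # x1*x2*x3*x4
    "p2": [(0, True), (1, True)],                        # x1*x2
    "p3": [(2, True), (3, True)],                        # x3*x4 (shared)
}
F1 = ["p1", "p2", "p3"]   # f1 = p1 + p2 + p3
F2 = ["p2", "p3"]         # f2 = p2 + p3, reusing the same AND-gates

x = (1, 0, 1, 1)
values = {name: product_term(lits, x) for name, lits in TERMS.items()}
f1 = int(any(values[t] for t in F1))
f2 = int(any(values[t] for t in F2))
print(f1, f2)  # 1 1 -> the shared term p3 = x3*x4 fires for both outputs
```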
Cube representation and Karnaugh maps
One way to model Boolean functions is to use cube representation. In
this representation a Boolean space with dimension n is used, where n
is the number of variables of the function. Two discrete values, 0 and
1, are used as coordinates for each dimension. Therefore there exist 2ⁿ
discrete points in this entire space. A point in this space is called a
minterm and it represents an assignment of the variables of a Boolean
function. If the function is fully specified, then for each specific
minterm the function value is either logic 0 or logic 1. Figure 2.4
shows an example of a cube representation for a function with three
inputs. In that figure, a filled minterm represents function value 1 while
a non-filled minterm represents function value 0. The Boolean
function shown in Figure 2.4 is then
x₁·x₂·x₃ + x₁·x₂·x̄₃ + x₁·x̄₂·x₃ + x₁·x̄₂·x̄₃ + x̄₁·x̄₂·x₃.
A subspace of a Boolean space is the set of minterms for which a
subset of the inputs are fixed to specific values. This type of subspace
is called a cube. The two dotted ovals in Figure 2.4 are examples of
cubes. In this example the dimension of the smaller one is one and
the dimension of the larger one is two. A cube, in which the function
value is 1 for all minterms, is called an implicant. Thus, the two dotted
ovals in Figure 2.4 are implicants.
A set of implicants that contains all minterms where the function
value is one is called a cover of that function. A cover has a direct
mapping to the two-level logic circuit because each implicant
corresponds to a product term and then to an AND-gate. For each
dimension in the cube representation space where the implicant is
fixed to 1 or to 0, an input to the AND-gate is required. Hence the
number of inputs to the corresponding AND-gate is smaller for a
larger implicant. More precisely, the number of inputs needed to the
AND-gates is equal to the difference in dimensions between the
implicant and the Boolean space of the entire function. For example,
in Figure 2.4 the large cube corresponds to an AND-gate with only
one input (a one-input AND-gate reduces to a wire or a buffer) fed
by input x₁. This cube represents all minterms where x₁ = 1. The
smaller cube in Figure 2.4 corresponds to a two-input AND-gate fed
by x̄₂ and x₃.
Figure 2.4: A three dimensional Boolean space
In some cases an implicant can be expanded to cover more minterms.
Releasing an input that is fixed is one way of doing so. For example,
the implicant in Figure 2.5a can be expanded so it becomes like the
implicant in Figure 2.5b. The implicant in Figure 2.5a corresponds to
a two-input AND-gate with inputs x₁ and x₂, while the implicant in
Figure 2.5b corresponds to a one-input AND-gate with input x₁. This
implicant cannot be expanded further because if it were larger it would
cover minterms for which the function value is zero. An implicant that
cannot be expanded further is called a prime implicant.
One way to treat the cube representation is using Karnaugh maps
[Kar53]. A Karnaugh map is a Boolean space projected onto a two-dimensional surface.
Figure 2.5: Example of expansion of an implicant (a: the original implicant; b: the expanded implicant)
Algorithms for two-level optimization
In the 1950s Quine [Qui52] and McCluskey [Mcc56] developed an
exact algorithm for two-level optimization. Quine proved a
fundamental theorem, stating that there exists a minimal cover
consisting only of prime implicants. This result reduces the search
space for optimization algorithms to prime implicants. McCluskey
proposed a method using the set of prime implicants of a function to
find its minimal cover.
Due to the NP-hard nature of the problem, the exact algorithms
are intractable for most large functions. Thus, heuristic methods are
used in practice. A popular heuristic method is the minimizer Espresso
[Bra84].
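To illustrate the structure of the problem, the following brute force sketch (our own, exponential in the number of variables and usable only for tiny functions; Espresso and the Quine-McCluskey procedure are far more refined) enumerates all implicants, keeps the prime ones, and searches covers of increasing size. It reuses the helper functions and on_set from the earlier sketches.

    from itertools import combinations

    def all_prime_implicants(on_set, n):
        """Enumerate every cube (each variable free, fixed to 0, or fixed
        to 1), keep the implicants, and drop those contained in another."""
        def cubes(i, fixed):
            if i == n:
                yield dict(fixed)
                return
            yield from cubes(i + 1, fixed)             # variable i free
            yield from cubes(i + 1, {**fixed, i: 0})   # fixed to 0
            yield from cubes(i + 1, {**fixed, i: 1})   # fixed to 1
        implicants = [c for c in cubes(0, {}) if is_implicant(c, on_set, n)]
        def contains(a, b):   # cube a contains cube b
            return all(k in b and b[k] == v for k, v in a.items())
        return [c for c in implicants
                if not any(o != c and contains(o, c) for o in implicants)]

    def minimal_cover(on_set, n):
        """Exact two-level minimization over the primes; by Quine's
        theorem a minimal cover of prime implicants exists."""
        primes = all_prime_implicants(on_set, n)
        for k in range(1, len(primes) + 1):
            for combo in combinations(primes, k):
                covered = set()
                for c in combo:
                    covered |= set(minterms_of_cube(c, n))
                if covered == set(on_set):
                    return list(combo)

    # For the function of Figure 2.4 the minimal cover is x2*x3 + x1:
    print(minimal_cover(on_set, 3))   # [{1: 1, 2: 1}, {0: 1}]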
2.3.4. Multi-level optimization
Multi-level optimization of logic circuits targets implementations in
which gate depth is not restricted to two. Not having that restriction
makes it possible to provide better options than two-level optimization
for goals like minimization of area and minimization of power
consumption. The consequence of this flexibility is that it is more
complicated to optimize. Larger gate depth also results in more delay
than a smaller gate depth. An example of a multi-level logic circuit is
shown in Figure 2.6.
Figure 2.6: A multi-level logic circuit
Optimization programs commonly apply a set of transformation
operations on a logic network targeting the optimization goals. The
logic network can be represented as a network of gates and it can be
expressed with Boolean equations. It can also be represented with a
combination of Boolean equations and a network. The network is, in
this case, a directed acyclic graph with edges representing signals and
nodes having Boolean expressions. De Micheli [Dem94] describes
how logic optimization can be applied in this type of representation, in
which each node has a Boolean equation expressed in SOP-form.
Decomposition is one optimization operation which is particularly
important when the optimization target is minimization of area or
minimization of power consumption. A decomposition operation on a
logic network splits a node into multiple nodes in a way that makes
further optimization operations efficient when applied separately on
the different parts. To be useful, a decomposition operation normally
requires that the number of signals between the resulting parts is
small.
A logic network might not already be partitioned into different
parts when logic optimization starts. A decomposition operation can
then be applied on the entire network as a first step.
Examples of other types of transformation operations include
those that merge nodes and those that minimize the number of product
terms in the Boolean expressions inside nodes. Special searches can
be conducted on the logic network to determine how common
subexpressions can be extracted and how available signals can be
utilized to transform the logic network in line with the optimization
target.
2.3.5. Cost metrics for logic optimization
Common cost criteria used during optimization are the size of the
layout, speed and power consumption. The logic circuit,
which is the outcome of the logic synthesis, does not have an exact
correspondence to the number of components or chip area, so the cost
has to be estimated in some way. The following describes how cost is
usually estimated for two-level circuits and for multi-level circuits.
Two-level logic circuits
In two-level optimization, the number of implicants is normally used
as the cost criterion. An implicant that can be shared by several outputs is
only counted once.
As mentioned in Section 2.3.2, two-level optimization is
particularly suitable for PLA-implementation. The method for
estimating cost described above has a direct mapping to the required
size of a PLA for the implementation. A common PLA-structure has a
set of outputs and a set of inputs, where any Boolean function can be
implemented as long as the total number of implicants is less than a
specified value.
Multi-level logic circuits
When the logic network is represented as a network of gates, the total
number of gate inputs is a good estimate of the chip area for the
final implementation. The number of literals is a useful estimate of
the expected chip area when the logic network is represented as a
directed acyclic graph in which nodes have Boolean expressions in
SOP-form. The number of literals of a logic block is the sum of the
number of occurrences of the input variables in its Boolean
expression. The number of literals of a logic schema is the sum of the
number of literals of all logic blocks in the design.
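As a small illustration (our own, with a hypothetical SOP encoding), the literal count of such a representation can be computed as follows.

    # Hypothetical encoding: each node holds an SOP as a list of product
    # terms, each term a list of literals ("x2" or its complement "~x2").
    network = {
        "g": [["x1", "~x2"], ["x2", "x3", "~x4"]],   # g = x1*x2' + x2*x3*x4'
        "f": [["g", "x4"]],                          # f = g*x4
    }

    def literal_count(network):
        """Sum of literal occurrences over all nodes of the network."""
        return sum(len(term) for sop in network.values() for term in sop)

    print(literal_count(network))   # 2 + 3 + 2 = 7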
In Section 2.3.2 gate depth was defined as the maximum number
of gates a signal must traverse between an input and an output of the
combinational part of a logic circuit. The gate depth is then a direct
estimation of the maximal delay.
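When the network is given as a DAG of gates, the gate depth can be computed with a longest path traversal. A minimal sketch (our own illustration, counting one unit of delay per gate) follows.

    from functools import lru_cache

    # Netlist as a DAG: each gate maps to its list of fanins.
    netlist = {"g1": ["x1", "x2"], "g2": ["g1", "x3"], "y": ["g1", "g2"]}
    inputs = {"x1", "x2", "x3"}

    @lru_cache(maxsize=None)
    def depth(node):
        """Maximum number of gates on any path from an input to node."""
        if node in inputs:
            return 0
        return 1 + max(depth(f) for f in netlist[node])

    print(depth("y"))   # 3 gates on the longest path x1 -> g1 -> g2 -> y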
2.4 Defects and digital system testing
We use the term testing to refer to detection of manufacturing defects.
We use the term validation to refer to methods for detection of logic
design errors. This thesis only deals with methods and considerations
related to testing for manufacturing defects.
IC fabrication is not perfect, and different types of defects can
be introduced in this process. A defect in an electronic system is a
physical deviation from the specification, which may possibly give
different functionality than intended. Material defects, mask defects
and dust particles are examples of things that can cause defects on
manufactured chips. Manufactured chips therefore need to be tested
[Lar08] in order to find those with defects. Complex chips cannot
be exhaustively tested to check whether they work for all cases. For
example, one subcomponent in a chip could be a 32 bit multiplier. To
exhaustively test it for correct functioning, all combinations of two 32
bit multiplicands need to be applied and the result needs to be checked
to see if it is correct. This requires 2^32 ⋅ 2^32 = 2^64 ≈ 10^19 different tests,
which cannot be achieved in a reasonable time. A practical test for a
chip must therefore do something other than check whether the chip
works in all possible situations. The approach used instead is to
check for the presence of each possible or relevant defect.
In an integrated circuit it is either not possible or very difficult and
expensive to use a probe to measure for the presence of a defect
directly on the spot. Instead, input signals are applied to the chip such
that at least one of the outputs gets a different value when the defect
is present than when the chip is free from defects.
Common defects in faulty chips are short circuits between
conductors and breaks in the conductors. More complex physical
defects might result in the creation of unwanted extra components, for
example an extra transistor. Short circuits, breaks and extra
components are defects that can be considered as distinct, which
means that either the defect is there or it is not there.
In another class of defects, some performance metrics of
components are outside acceptable ranges. An example of such a
defect is a wire that has become too thin, resulting in resistance that is
too high but still low enough such that the effect is different from a
break. Another example of such a defect is that two wires have come
very close to each other, resulting in too much parasitic capacitance
between them.
2.4.1. Faults and fault models
A fault is a description, at a certain abstraction level, of the effect of a
defect. There are basically two ways to define faults. The first is to
analyze possible defects in the implementation. Each relevant physical
defect is analyzed to determine how its presence appears at a certain
abstraction level. In this case, the physical implementation has to be
known before faults can be defined.
The other method is to use fault models. A fault model is a
conceptual representation of implementation defects in a description at
an abstraction level above the physical implementation. A fault model
denotes one particular way to define faults by only considering the
design at the abstraction level for which the faults are going to be
defined. A fault model does not rely on a specific implementation of
the system. Faults created from a fault model are usually less complex
than those derived from the physical implementation. They are
therefore simpler to handle. On the other hand, faults defined from a
fault model have looser mapping to the physical defects than faults
derived from possible defects in the implementation. They are
therefore less accurate. One of the most well-known fault models is
the stuck-at fault model [Eld59] at the logic level. A stuck-at fault in a
node in the logic circuit means that this node is always 0 or always 1.
We say that the node is stuck-at-0 or stuck-at-1 respectively. A node
that is stuck-at-1 has constant logic value 1, independent of what the
gate feeding that node tries to set it to. Stuck-at-0 faults behave
correspondingly.
The logic level stuck-at fault model is not sufficient for capturing
all possible physical defects. The logic level bridging fault models
wired-AND and wired-OR cover some of the defects not covered by
stuck-at faults.
To develop tests at a certain abstraction level before the design is
synthesized to the next level of abstraction, faults defined using fault
models are needed. The main difficulty of developing test methods at
high abstraction levels is finding fault models and defining faults that
adequately represent defects in the final implementation. The higher
the abstraction level, the more difficult it is. However, there are several
advantages if tests can be developed at a high abstraction level. One
advantage is that it can facilitate identification of testability problems
early in the design process. Another advantage is that test logic can be
included earlier, which means that optimization strategies can also
include the test logic to target the overall optimum rather than only the
main design without its test logic. A third advantage in working at a
higher level of abstraction is that test generation can be more efficient
[Jer02].
The logic level is a relatively low abstraction level, and a test that
detects all stuck-at faults can also detect most physical defects, but
not necessarily all relevant ones. With help of the
example in Figure 2.7, showing an implementation of a NAND gate,
we compare a logic level stuck-at fault with a fault derived from the
physical implementation. We demonstrate how the stuck-at fault is
less accurate but also less complex than the fault derived from the
physical implementation.
Consider the defect in Figure 2.7, which is a break in a wire. This
break causes the output Q to have high impedance for input pattern
01. In CMOS circuits high impedance in a node usually means that the
logic value of the node remains the same for some time due to
capacitance. The logic level stuck-at fault that best maps to this defect
is the fault where node A is stuck-at-1. A test generated to detect if
node A is stuck-at-1 applies 01 to the inputs. The output Q is then 0 if
this stuck-at fault is present and 1 if the gate works correctly.
However, the defect, which is the break, is only detected by a test for
this stuck-at fault if it is preceded by an input that sets output Q to
logic 0.
[Figure content: a CMOS NAND-gate with a break in a wire, and the resulting truth table:
A B | Q
0 0 | 1
0 1 | 1 (high impedance when the break is present)
1 0 | 1
1 1 | 0]
Figure 2.7: NAND-gate with break in a wire
To derive a fault at logic level from the break, we need to analyze
what goes wrong at the logic level due to the defect. As argued above,
the effect of the defect in the NAND-gate in Figure 2.7 appears at the
logic level as the output Q having an erroneous value when its inputs
are 01 preceded by inputs 11. This is a more accurate fault than the
stuck-at fault but it is also more complex.
The faults described so far in this subsection model distinct
defects. There are also fault models for defects where some physical
properties are outside acceptable ranges. Failures that occur due to
faults from such fault models are referred to as marginally related
failures [Kun05]. An example of such a fault is the delay fault. There
are several defects that result in delay faults, such as the presence of
too much parasitic capacitance between a wire and the ground.
Crosstalk-faults are a type of marginally related fault; background on
them is given in Section 3.2.
2.4.2. Fault modeling
Fault modeling is the process of obtaining a fault definition at a
certain abstraction level based on a defect or a fault at a lower
abstraction level. Fault
modeling in several steps can be used to define faults based on
physical defects in the implementation of a system. Fault modeling
can also be used as a technique to demonstrate and justify the
relevance of a certain fault model. For such a technique, faults are
modeled from a hypothetical and typical implementation.
[Figure content: (a) a transistor level NOR-gate with a break in a wire; (b) the logic level NOR-gate with input IN1 stuck-at-0; (c) an RT-level datapath with registers, multiplexers, an ALU, a comparator and a controller, where one bit of an input to the ALU is stuck; (d) the behavioral description

WHILE A ≠ B LOOP
  IF A > B THEN
    A = A - B
  ELSE
    B = B - A
  END IF
END LOOP
OUTPUT A

where one bit of a variable is stuck-at-0 at the marked occurrences.]
Figure 2.8: Fault models at different abstraction levels
With help of the example in Figure 2.8 we show how a defect can be
modeled as a stuck-at fault at the logic level and further to the RT-level and the behavior level. Figure 2.8a–b shows an example of how
a physical defect maps to a logic level fault. The circuit in Figure 2.8a
is supposed to implement a NOR gate, but there is a break in the line
connecting the left transistor. The result is that this transistor cannot
sink the output to ground when it is supposed to. At the logic level this
can be modeled as input IN1 is stuck-at-0. Figure 2.8b shows the logic
level representation with this stuck-at fault.
Figure 2.8c shows an RT-level implementation of a circuit that
computes the greatest common divisor of two integers A and B. Let us
assume that the gate in Figure 2.8b is used in the ALU in Figure 2.8c
and one of the inputs to the ALU goes to input IN1 of that NOR gate.
We also assume that this input is not connected to any other gate in
the ALU. The stuck-at-0 fault at the logic level then appears as one bit
in a signal vector being stuck-at-0.
Figure 2.8d shows a part of the behavioral level description of the
greatest common divisor design. It contains two minus operations. At
RT-level both of them are implemented with the ALU. The fault in the
ALU then appears as a bit stuck-at-0 in the right operands of both
minus operators.
In this example we have shown how a defect can be modeled to a
logic level fault, then to an RT-level fault and then further to a
behavior level fault. Inspired by this fault modeling example we can
justify some fault models. At RT-level we found that the defect
appears as one bit of a signal being stuck-at-0. It is a reasonable
assumption that other defects appear at RT-level as some other bit in
a signal being stuck at 0 (or 1). To represent all the faults due to this
assumption we can use the fault model signal bit stuck-at fault. A fault
defined from this model indicates that a certain bit in a certain signal is
stuck at 0 (or 1).
With the same reasoning we can assume that a relevant fault
model at behavior level is the variable bit stuck-at fault. A fault
created from that fault model means that one bit in a variable is stuck
at 0 (or 1).
It should be noted that the circuit in Figure 2.8a is not the way a
gate is normally implemented but it is used in this example to avoid
making the illustration unnecessarily complicated. Many faults which
look simple at one abstraction level become quite complex when
modeled at a higher abstraction level. It is illustrated along with
Figure 2.7 how a simple break in a wire results in a relatively complex
fault at the logic level. Modeling the fault in that gate further into the
RT-level can result in an even more complex fault. For example,
assuming that this gate is used in an ALU and the fault is modeled into
the RT-level, we can end up with an RT-level fault whose presence
makes the output of the ALU erroneous in a particular way for some
specific operands preceded by operands from a certain set. More
discussion about derivation of faults from defects in the actual
implementation is available in Subsection 3.3.2.
2.4.3. Test generation principles
When faults have been defined the next step is to develop test vectors
that detect whether any of the faults is present. To do so, each fault
needs to be activated and propagated. Assigning a value to the input
that activates the fault causes the internal node with the fault being
tested to take on different values, depending on whether the fault is
present or not. An input assignment that propagates the fault, sets
some output to a different value if the fault exists compared to if no
fault existed. In this context an assignment of inputs can be a sequence
of different values applied to the inputs. To check whether a fault is
present, an input assignment must be used which both activates and
propagates the fault.
Figure 2.9: A logic circuit tested for a stuck-at fault
Figure 2.9 shows a small combinational logic circuit. To test if the
node marked with an arrow is stuck-at-0, the output of the AND gate
needs to generate logic 1. Then that node gets logic value 1 if the
circuit is correct and logic value 0 if that fault is present. This is the
activation of the fault. To be able to detect whether the fault is present
it has to propagate to an observable output. In this example the NAND
gate needs to output a value to the OR gate such that the output y of
the OR gate depends on the node marked with an arrow. This means
that the output of the NAND gate should be 0. In this example input x1
and input x2 need to be assigned logic 1 to activate the fault. To
propagate the fault to y, input x2 and input x3 need to be assigned logic
value 1.
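This example can be reproduced in a few lines (our own sketch of the circuit as described above, with the fault injected on the AND output):

    def circuit(x1, x2, x3, node_stuck_at=None):
        """y = OR(AND(x1, x2), NAND(x2, x3)); node_stuck_at forces the
        AND output (the node marked with an arrow) to a constant."""
        node = x1 & x2
        if node_stuck_at is not None:
            node = node_stuck_at          # inject the stuck-at fault
        nand = 1 - (x2 & x3)
        return node | nand

    # x1 = x2 = x3 = 1 both activates and propagates the fault:
    print(circuit(1, 1, 1))                    # 1 (fault-free circuit)
    print(circuit(1, 1, 1, node_stuck_at=0))   # 0 (stuck-at-0 detected)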
2.4.4. Design for testability
In complex circuits it is expensive in terms of testing time to activate a
fault and propagate it to an observable output. To save time, dedicated
test logic is integrated in the circuit to facilitate testing. Design for
Testability (DfT) is a design technique that takes into account
testability in the design process, including additional testing features
and test logic [Abr90, Jha03, Mou00]. A commonly used DfT
technique is to use a scan path. With this technique, flip-flops are
equipped with some extra logic such that values can be scanned in and
out through the flip-flops as a shift register.
Another DfT technique is to use a Built In Self Test (BIST)
mechanism. This basically means that special extra circuitry is added
to the chip that helps it test itself. From outside the chip, all that is
required is that a signal be sent that puts the chip into the test mode.
The chip will then return a signature that can be checked to determine
if the test found some faults. A common implementation of BIST is to
use a linear feedback shift register to generate pseudo-random input
signals to the logic being tested. The output of the logic being tested is
connected to a multiple input linear feedback shift register that
generates a signature.
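A minimal sketch of these two registers follows (our own illustration; the register width, the feedback taps and the stand-in for the logic under test are arbitrary choices):

    def lfsr_step(state, taps=(3, 2), width=4):
        """One step of a Fibonacci LFSR used as the pattern generator."""
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        return ((state << 1) | feedback) & ((1 << width) - 1)

    def misr_step(state, response, taps=(3, 2), width=4):
        """Multiple-input signature register: fold one response word in."""
        return lfsr_step(state, taps, width) ^ response

    state, signature = 0b1001, 0
    for _ in range(8):                 # apply 8 pseudo-random patterns
        state = lfsr_step(state)
        response = state ^ 0b0110      # stand-in for the logic being tested
        signature = misr_step(signature, response)
    print(f"{signature:04b}")          # compared to the fault-free signature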
2.4.5. Typical test generation flow and design flow
Figure 2.10 shows the design flow in Figure 2.1 complemented with a
typical flow for test generation. For clarity, the software development
box shown in Figure 2.1 is omitted. Typically, most test generation is
done at the logic level of abstraction. The main reason for not utilizing
abstraction levels above the logic level is the difficulty associated with
defining faults at these levels that accurately cover the relevant
defects.
The box Test data in the lower right corner of Figure 2.10
represents the final test data. This test data is a description of which
voltage levels to apply to the inputs of the circuit during test, along
with timing values for when to apply the signal. It also describes
which voltage levels to expect at the outputs for a circuit without
faults. This test data is transformed from Logic level test data. Logic
level test data is test data in which input and output values are
expressed as sequences of Boolean values. Logic level test data are
usually referred to as test vectors. The box Test data transformation
represents transformation of test data as Boolean values to voltage
levels. This transformation can be made in the test equipment. In such
a case logic level test vectors are sent to the test equipment.
Figure 2.10: Typical test generation and design flow
In some cases additional test data may be generated from layout to get
sufficient coverage of defects. This is illustrated by the dotted box and
the dotted arrows in Figure 2.10.
Most test generation is done at the logic level. This is illustrated
with the arrow from the box Logic design to the box Logic level test
data via the box Test generation. This test generation relies on logic
level fault models independent of specific manufacturing defects.
During this generation test logic is also generated. This is represented
by the arrow going to the box BIST & DfT, which is attached to the
box Logic design.
For hard IP-cores the supplier needs to implement BIST and DfT
logic for the core and deliver test data. It is hard for the user to
develop tests because hard IP-cores are provided at the layout level.
Figure 2.10 illustrates how test data is provided along with hard
IP-cores and how it is included with other test data. In practice, the test
data for a hard IP-core can be given as logic level test vectors along
with information about which voltage levels should represent logic 1
and logic 0, respectively.
Soft IP-cores at the logic level can be delivered with test vectors
for faults derived from a logic level fault model. This is usually
sufficient. Figure 2.10 also illustrates how such test vectors are
included in the logic level test data.
2.4.6. Test generation flow and design flow with test data generation at high abstraction levels
Test data generated at a certain abstraction level will be in a form
which corresponds to that abstraction level. RT-level test data can
include instructions for state transitions in an FSM. During synthesis of
a system for which test data has been generated, the test data have to
be transformed to comply with the synthesis of the system. The
following describes how the processes of test data transformation and
test data generation link to the design flow.
RT-level
Figure 2.11 shows the design and test flow when test data is generated
from faults derived from an RT-level fault model. The box RT-level
test data represents the test data generated from the RT-level design.
The test generation can also generate DfT and BIST logic. This is
shown with the arrow from the box Test generation to the box BIST &
DfT. This test logic will be part of the system and is therefore further
synthesized along with the synthesis steps followed after the RT-level.
The generated RT-level test data needs to be transformed such that
it can be used in the test equipment. The right part of Figure 2.11
shows this transformation. The first step in this transformation is to
transform the RT-level test data into logic level test data. This must be
done with consideration for how the logic synthesis and technology
mapping is made. For example the RT-level test data might include an
instruction to make a state transition in an FSM. At the logic level the
FSM is implemented with gates and flip-flops. The transformation of
test data then transforms this RT-level instruction to the logic values
that should be applied to the inputs to achieve this state transition.
Further transformation from the logic level can be made as described
in Section 2.4.5.
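As a hypothetical illustration of this transformation step (the state names, transitions and triggering inputs are invented):

    # RT-level test data: drive the FSM along a state path. The transformation
    # uses knowledge of which inputs trigger each transition after synthesis.
    transition_inputs = {("IDLE", "LOAD"): {"start": 1},
                         ("LOAD", "RUN"): {"ready": 1}}

    def to_logic_vectors(state_path):
        """Expand an RT-level state path into per-cycle input assignments."""
        return [transition_inputs[(a, b)]
                for a, b in zip(state_path, state_path[1:])]

    print(to_logic_vectors(["IDLE", "LOAD", "RUN"]))
    # [{'start': 1}, {'ready': 1}]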
Figure 2.11: Design and test flow with test generation at RT-level
Behavior level
The design and test flow when test data is generated at behavior level
of abstraction is shown in Figure 2.12. Behavior level test data is
generated from the design at the behavior level. During this test
generation, test logic can also be generated. This is shown with the
box BIST & DfT. This logic will be part of the design at the behavior
level and synthesized further together with the design.
The behavior level test data needs to be transformed to a form that
can be used when the test is executed. The right part of Figure 2.12
shows the transformation steps. The first step is to transform the
behavior level test data to RT-level test data.
For example, part of the test data at behavior level could be
operands for an operator, along with the expected output value of the
operator. At the RT-level a clock signal is introduced and the operator
could be implemented such that several clock cycles are needed to
complete the execution of that operator. The transformation of the test
data from behavior level to RT-level then needs to be done such that
the operands are applied and the result is read at the correct clock
cycles.
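A hypothetical sketch of this step (the operator latency and the signal names are invented):

    def to_rt_test(a, b, expected, latency=3):
        """Expand one behavior level test (operands and expected result)
        into cycle-by-cycle RT-level test data for a multi-cycle operator."""
        actions = [("apply", {"op_a": a, "op_b": b})]
        actions += [("wait", None)] * (latency - 1)
        actions.append(("compare", {"result": expected}))
        return actions

    for cycle, action in enumerate(to_rt_test(6, 7, 42)):
        print(cycle, action)   # operands at cycle 0, result read at cycle 3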
Figure 2.12: Design and test flow with test generation at behavior level
System level
Figure 2.13 shows the design and test flow when a test is generated at
the system level. The box System level test data represents the test
data made from the system specification. During test generation, test
logic for BIST and DfT can be generated.
Figure 2.13: Design and test flow with test generation at system level
As is the case when a test is generated at the RT-level and at the
behavior level, the system level test data needs to be transformed to a
form that can be used when the test is executed. The first step in this
process, the transformation from system level to behavior level, is
usually quite modest because the main function of system synthesis is
to choose algorithms. For example, test data
defined at the system level for testing a sorting functionality will not
change form by just choosing which sorting algorithm to use.
However, the way numbers are encoded can be decided at the system
synthesis and in such a case the test data needs to be transformed
accordingly.
Test generation at several abstraction levels
Figure 2.14 shows what a design and test flow may look like when
different abstraction levels are used for generation of different parts of
the test data. At the system level some test generation is performed
and possibly some test logic is also generated. The design is then
synthesized into the behavior level and the test data is transformed
accordingly. At the behavior level, more test data is generated and
more test logic might be generated. The newly generated test data is
merged with the test data that was transformed from the system level.
The system is then synthesized into the RT-level and the test data is
transformed to RT-level test data. At the RT-level more test data can
be generated as well as more test logic. The new test data is then put
together with the test data transformed from the behavior level. In the
next step, the system is synthesized into the logic level of abstraction
and the test data is transformed accordingly. More test data is also
generated at the logic level, including generation of more test logic.
The newly generated test data is then merged with the test data that
was transformed from the RT-level. The logic level design and the test
data which is now at the logic level can be further processed as if all
test generation has been done at the logic level.
It is not always efficient to generate test data at all the abstraction
levels shown in Figure 2.14. It can for example be more efficient to
use RT-level to generate some test data and then complement it with
more test data generated at logic level.
In Figure 2.14 it is also shown how the inclusion of IP-cores and
their test data can be done. In Subsection 2.4.5 it was described how
test data provided along with hard IP-cores and with logic level soft
IP-cores can be included in the test data generation. For soft IP-cores
at RT-level the flow of including the test data provided with the
IP-core is similar. That test data is incorporated together
with other RT-level test data.
At behavior level algorithms from a library can be included. The
flow of including an algorithm at behavior level is similar to the flow
of including an IP-core at the lower abstraction levels. As test data can
be provided along with IP-cores, test data can also be provided along
with algorithms. An algorithm is a behavior level description and
therefore test data provided along with an algorithm must be
generated from faults derived from a behavior level fault model. For
example, test
data provided along with a sorting algorithm can be a set of lists that
the algorithm should sort. Those lists are generated such that when
sorting them a set of behavior level faults will be covered. That set of
faults is generated from one or several behavior level fault models.
Figure 2.14: Test generation at several abstraction levels
Part B
Chip testing
Chapter 3
Background and related work in SoC testing
Part A of this thesis gave a general introduction and background to
digital system design and testing. This chapter provides a more
focused background and offers related work in the area of testing. In
Section 3.1 the main issues in SoC testing are presented, along with
related work in SoC-testing and NoC-testing. Section 3.2 provides a
deeper background in testing for on-chip interconnects with a special
focus on crosstalk-faults. Test principles that utilize abstraction levels
above the logic level are described in more depth in Section 3.3, along
with related work.
3.1 SoC testing and NoC testing
This section describes SoC-testing with a focus on SoCs with a NoC
infrastructure. It also presents related work in NoC-testing.
3.1.1. Issues in SoC testing
Testing of SoC devices can be partitioned into testing of cores and
testing of the interconnection infrastructure through which the cores
are communicating.
Testing of cores
There are two main issues to consider when developing a test for a
core within a SoC. The first is the generation of test vectors and test
logic of the core itself. The second issue is the transportation of test
data to and from the core.
The generation of test vectors for the cores can, in principle, be
done as if the core were a stand-alone chip. When the core is a
stand-alone chip it can be accessed directly from outside, but if a core is part
of a SoC, consideration must be given to the capacity of the
mechanism to transport test data to and from the core. To reduce the
amount of test data being transported it is better to use more BIST for
a core in a SoC, compared to what would be appropriate if the same
core was used as a stand-alone chip. The reason is that transportation
of more data takes longer, resulting in long test time. An alternative is
to embed a mechanism for transportation of test data with larger
capacity on the chip, but this costs chip area.
Figure 3.1: Scan cells surrounding a core
The test access to a core can be performed with the help of some extra
circuitry commonly referred to as a wrapper. A typical wrapper consists
of a set of scan cells, one or several for each signal pin. Every signal
connection to the core goes through a scan cell. The scan cells provide
facility to disconnect the core from its environment and then directly
apply values to its inputs and read values from its outputs.
The scan cells are organized as a shift register. This is illustrated
in Figure 3.1. The boxes marked with the letter s are the scan cells.
The signals to apply to the inputs of the core during test are provided
from the test equipment with help of the shift register. The shift
register is also used to transport output values back to the test
equipment. There is a set of control signals, not shown in the figure,
connected to all scan cells. These control signals control the
functionality of the scan cells.
Wrappers can be designed in a variety of ways. Several variants
are described in chapter 16 of Jha’s and Gupta’s book [Jha03]. The
IEEE standard 1500 [IEEE05] includes standards for the test wrapper
design for SoC testing. That standard is principally a revision and
adaptation of the IEEE Boundary scan standard 1149.1 [IEEE01],
which is a standard for testing of systems on a Printed Circuit Board
(PCB) where each component is a separate chip.
In a SoC device some previously designed SoCs can be included
as IP-cores. Such previously designed SoCs can have a test access
mechanism. Special consideration is needed to make testing of such
SoCs efficient. For example the issue of test planning for such SoCs is
addressed in [Cha05].
Test of interconnection infrastructure
To detect breaks and shorts in interconnection wires, test wrappers
as described above can be used. The wrapper at a core can apply
values on interconnecting wires and then the wrapper at another core
can read the values of the wires. The decreasing dimensions of SoCs
make it, however, insufficient to test only for breaks and shorts on the
relatively long interconnections connecting cores. Detection of a wider
range of faults is necessary, including those causing crosstalk and
delay outside the acceptable range [Pan05].
3.1.2. Special issues in testing NoC-based SoCs
Testing of a NoC device can be partitioned into testing of cores and
testing of the communication infrastructure.
Generation of test schemes for the cores can be divided into two
main tasks. One is the test generation for the core itself. The other task
regarding testing of cores is test access. This is basically
transportation of test data to and from the core. The NoC
communication infrastructure can be used as a mechanism for test
access [Cot03b]. Testing of the interconnection infrastructure in a
NoC includes testing of the switches and testing of the interconnection
links connecting different switches. The switches in a NoC can be
treated like any other core in the SoC device but utilization of the fact
that there are many similar switches can give advantages in terms of
test efficiency.
In large, synchronous SoCs, distributing the clock to various parts
of the chip is associated with several drawbacks. One drawback is that
a large amount of power is consumed by such distribution. Another
drawback is that keeping the clock skew within an acceptable range
has become trickier as chips have increased in speed and in silicon
area. One way to avoid the problems associated with clock
distribution is to use the Global Asynchronous Local Synchronous
(GALS) concept [Hem99]. A chip designed with this principle is
partitioned into several regions, each with its own clock signal. The
GALS concept is widely adopted in NoC designs [Nak11]. Test for
SoCs with a GALS clocking concept was addressed in [Eft05, Tra08].
Tran et al [Tra08] specifically focus on a SoC with a NoC
communication infrastructure. Unlike the work presented in
this thesis, none of those articles presents test methods for crosstalk
induced faults.
Testing for crosstalk-faults is trickier in communication links
between different clock regions than in synchronous systems. The
phase difference between clock signals in two connected domains may
vary non-deterministically and crosstalk-faults might cause errors in
data communication only for some adverse phase differences. The
effect of this non-determinism is that errors can occur intermittently
due to such faults.
3.1.3. Related work addressing NoC testing
This subsection addresses related work in NoC testing, which includes
work about test access mechanisms and test scheduling as well as
some work about test data compression. The goal is to minimize test
costs and to maximize coverage of faults. Minimizing test costs is
mainly a matter of minimizing test time and minimizing chip area for
test logic.
Test scheduling and test access mechanism
Cote et al propose how an existing NoC infrastructure can be used in
an efficient way for test access [Cot03b]. They also describe how
packets can be statically scheduled offline to optimize the utilization
of the NoC infrastructure as a test access mechanism. The input to the
algorithm presented in their article includes a set of test ports to the
network, which can send test patterns and receive test responses. The
cost for each core to be tested is considered in terms of amount of data
transfer. The method works as follows. First the core that is most
expensive in terms of test cost is selected, where cost is defined to
reflect the amount of test data that is transferred to and from the core
during test. When this selection is made, the test port with the shortest
path to that core in the NoC infrastructure is selected for use when that
core is tested. The cores are then scheduled in decreasing order of
cost, using as much parallelism as possible. Power-related
considerations for this type of test scheduling were presented in
[Cot03a]. Amory et al [Amo04] present how the method in [Cot03b]
can be extended to use internal processors in a NoC device as test
ports.
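A rough sketch of this greedy scheme follows (our own simplification; the core costs, the port names and the nearest-port mapping are invented):

    cores = {"cpu": 90, "dsp": 60, "mem": 40, "io": 10}   # core -> test cost
    nearest_port = {"cpu": "P0", "dsp": "P1", "mem": "P0", "io": "P1"}

    def greedy_schedule(cores, nearest_port):
        """Schedule cores in decreasing cost order on their nearest port,
        testing in parallel whenever ports are free."""
        free_at = {}                  # port -> time when it becomes free
        schedule = []
        for core in sorted(cores, key=cores.get, reverse=True):
            port = nearest_port[core]
            start = free_at.get(port, 0)
            free_at[port] = start + cores[core]
            schedule.append((core, port, start, free_at[port]))
        return schedule

    for core, port, start, end in greedy_schedule(cores, nearest_port):
        print(f"{core:3s} on {port}: [{start:3d}, {end:3d})")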
In [Lar04] optimization of the test access mechanism in core-based
designs is addressed. In that article a dedicated test access
mechanism is the target. The method presented optimizes test time
and the size of the test access mechanism simultaneously. The input to
the method is the set of test data to be sent to and from the cores,
along with the locations of the sources and sinks. In this context,
sources and sinks are locations in the test access mechanism. Test data
is applied at sources and the sinks collect the results of the tests.
A network can be designed such that less network traffic is needed
if the same data is broadcast to several cores. For such a network it
might be more efficient to send the same test data to all cores rather
than sending specific test data to each core. If this method is used,
each core will need more test data than would be required if test data
is designed specifically for each core. It is possible that broadcasting
more test data might be less costly than sending smaller amounts of
data to each separate core. Ubar et al [Uba04] present a method to
optimize the tradeoff between broadcasting some test vectors and
sending test vectors to each core separately. It has, however, the
drawback that the cores need to be purely combinational. Thus it is
not directly applicable for testing of cores in a NoC device but the
claim is that the method can easily be extended to sequential circuits.
If this is possible, the method could be utilized for testing cores in a
NoC device.
In most NoC architecture proposals, all the switches are
identical. There have been proposals for utilizing this fact for testing.
In [Hos06, Hos07] test stimuli are broadcast to several or all
switches. The outputs from the switches are compared and any
difference indicates a fault. Another way to utilize the topological
regularity of a NoC infrastructure was presented in [Rai06]. In the
method of [Rai06] data is sent in certain standard ways to detect
defects.
The method presented in [Gre07] tests the switches and the wires
in a NoC architecture. It utilizes broadcasting of test data in an
efficient manner. It is demonstrated how test data can be transported
using only switches and interconnections that have already been
tested. As soon as another part of the network is tested it can also be
used for test data transportation.
Stewart and Tragoudas [Ste06] presented fault models based on
the functionality of NoC-switches. Faults based on their fault models
are defined as data transmission of a certain type at a port of a
NoC-switch resulting in errors. Different types of transmissions in this context
have different quality of service policies and/or differ in number of
switches intended to receive the packets. Stewart and Tragoudas
[Ste06] also presented a test method for covering the faults they
defined. Unlike our results in high level testing of NoC-switches, it is
not shown in [Ste06] how the fault models correlate to
faults at a logic level implementation.
Fault tolerance techniques in a NoC communication infrastructure
using retransmission introduce timing jitter on packet arrivals at cores.
Huang et al have shown how a test wrapper can be designed
efficiently to make scan chains work in the presence of this type of
jitter [Hua08].
Test data compression techniques
It was identified in [Dal08] that the speed with which a NoC device
can be tested is limited by the capacity for transfer of test data. That
article describes how test vectors can be compressed to reduce the
amount of test data that needs to be transported, thus increasing the
test speed. The compression technique relies on the fact that test
vectors usually contain don't-care bits, which provide possibilities for
efficient test vector compression. Compression of the test response
was examined further in [Mor01].
Gonciari et al [Gon02] identified three parameters that should be
optimized simultaneously when compressed test vectors are used.
Those parameters are compression ratio, area overhead and test
application time. They claimed that previous articles had optimized
one parameter at the expense of the other parameters, but their method
optimizes these parameters simultaneously.
3.2 On-chip crosstalk induced fault testing
This section describes background topics for the contributions in
testing for crosstalk induced faults on asynchronous on-chip links.
Related work is also surveyed in this section.
3.2.1. Crosstalk induced faults and their test aspects
The small dimensions of today’s chips mean that testing for defects
caused by breaks and short circuits is not enough. To ensure high
quality in the manufactured chips, defects that cause unacceptable
delay or too much crosstalk need to be detected as well. This is
especially important when there are relatively long wires connecting
different parts, like the cores of a SoC [Che00, Ism99, Krs01, Nae04,
Nor98, Pan05, Sin02]. Newer chip technologies have thinner wires
and transistors and run at higher speeds than before. Wires are also
very closely packed. Effects of parasitic capacitance and inductance
need to be considered during the design of a chip [Mic06, Nae04] and
testing is needed on the manufactured chips to determine if these
parasitic effects go beyond the tolerance limits such that they risk
affecting the functionality of the chip.
The chip designer needs to ensure that crosstalk does not exceed
expected levels and cause the chip to fail. For deep submicron chips
it’s important to consider both capacitive and inductive coupling, and
in some cases also coupling through electromagnetic waves [Liu03,
Mic06, Nur04]. Pamunuwa et al [Pam05], in 2005, asserted that the
general consensus was that modeling inductance is necessary only for
special nets such as clock and power lines. The majority of signal
lines can be accurately modeled with just resistance and capacitance.
The fact that a certain crosstalk effect needs to be considered during
chip design does not imply that there is a need to test for faults
affecting that crosstalk effect. From the test point of view
consideration of capacitive coupling is often sufficient [Bai04, Ism99].
In [Kun05] it is stated that failures due to capacitive crosstalk are the
leading cause among marginally related failures at Intel. In that article
the notion of marginally related failure refers to failures that occur for
chips with an unfavorable combination of layout design and
manufacturing process parameters. Inductive crosstalk does not
change much due to fabrication faults unless there are shorts and
breaks. In any case, it is easier to test for shorts and breaks than to test
for crosstalk-faults. However, there are other defects that, while not
producing more coupling than is allowed, exacerbate the effect of the
coupling. Defects that make line drivers weaker or wire
resistance higher may cause this type of effect. To address this, it
might be necessary to consider not only capacitive coupling but also
inductive coupling [Sin02]. Faults that increase the effect of crosstalk
are caused by both capacitive and inductive coupling, though it is
sometimes sufficient to consider only one type of crosstalk.
Crosstalk between on-chip wires can cause both delays and
glitches. The effect of crosstalk on a wire is highly dependent on
whether signals on interfering wires are changing and how they are
changing.
3.2.2. Models of crosstalk-faults
In [Cuv99] the concept of victims and aggressors was introduced
along with the Maximum aggressor fault model. The term victim or
victim wire is used for a wire that is tested to determine how it is
affected by crosstalk. Wires affecting the victim are referred to as
aggressors or aggressor wires. When testing for a fault defined from
the Maximum aggressor fault model, one wire is a victim while the
others are aggressors cooperating to affect the victim in the worst
possible way. The behavior of the victim under the worst attack
situation is measured.
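For an n-wire link, maximum aggressor patterns are straightforward to enumerate. The sketch below (our own illustration, showing only the positive glitch corner in which the victim is held low while all aggressors rise) lists one pattern pair per victim:

    def maf_positive_glitch_tests(n):
        """One test per victim wire: all wires start low, then every
        aggressor rises while the victim is held at 0."""
        tests = []
        for victim in range(n):
            before = 0                                 # all wires low
            after = ((1 << n) - 1) & ~(1 << victim)    # all but victim high
            tests.append((victim, before, after))
        return tests

    for victim, before, after in maf_positive_glitch_tests(4):
        print(f"victim {victim}: {before:04b} -> {after:04b}")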
It is when the signals on the aggressor wires are changing that
they may affect the victim wire. If the signal of the victim wire is not
changing, the aggressors can affect it such that glitches appear. When
the victim wire is changing states, the delay in the change can be
affected by crosstalk.
Figure 3.2: Capacitive crosstalk causing a glitch on a victim wire
As stated in Subsection 3.2.1 it is often sufficient to test for capacitive
coupling. In such cases, the victim wire can manifest positive glitches
when the aggressor wires are changing in a positive direction and it
can manifest negative glitches when the aggressors are changing in a
negative direction. Figure 3.2 illustrates the case with a positive glitch.
Regarding glitches caused by crosstalk on digital wires, it is usually
sufficient to consider positive glitches on victims that are at a low
level and negative glitches on victims that are at a high level.
For a victim wire that is changing value, capacitive crosstalk may
affect the delay. For a victim wire that changes in the same direction
as the aggressors, the delay is decreased. If the aggressors change in
the opposite direction, the delay becomes larger.
The worst case situations of interference through capacitive
crosstalk on digital signals are summarized in Table 3.1. Each row
shows which signal is applied to the victim wire and how the aggressor
wires change when the corresponding worst case occurs.
Table 3.1: Worst case corners of capacitive crosstalk

  Applied signal on victim wire   Applied signal on aggressor wires   Effect on victim wire
  rising edge                     falling edges                       increased delay
  falling edge                    rising edges                        increased delay
  rising edge                     rising edges                        decreased delay
  falling edge                    falling edges                       decreased delay
  constant 1                      falling edges                       negative glitch
  constant 0                      rising edges                        positive glitch
3.2.3. Asynchronous communication protocols
A NoC infrastructure consists of a number of switches connected
through links in a given topology. As argued in Subsection 3.2.1, there
are several disadvantages to using a global clock for the whole NoC
device. Instead, a GALS scheme can be used. One way to implement
the GALS scheme is to let the switches in the NoC infrastructure be
clocked by different clock signals. The consequence of this is that we
need an asynchronous communication protocol for communicating
data between switches. Normally two unidirectional links are used to
connect two switches on a chip.
Figure 3.3: Lines in a handshaking link
In an asynchronous link that transfers data from one switch to another
switch, some synchronization signals are needed. In on-chip
communication there are often many data lines in parallel. In [Pam03],
for example, 128 data wires in each unidirectional link is mentioned as
a reasonable number. All the data lines are synchronized by the same
synchronization signals. In this thesis we let the handshaking signals
Write and Ready To Receive (RTR) represent the synchronization
signals. The signal Write goes from the transmitter to the receiver and
is used to tell the receiver when data is stable on the data lines. The
signal RTR goes in the opposite direction and is used to inform the
transmitter when the receiver has read the data on the data lines such
that the transmitter can put new data on those lines. Figure 3.3
illustrates how these signals and the data lines are connected between
a transmitting switch and a receiving switch.
Figure 3.4: Handshaking sequence
Figure 3.4 shows the timing sequence for one data transfer. The
receiver asserts RTR=1 when it is ready to receive new data. The
signal RTR may only go high when Write is low and go low when
Write is high. The transmitter raises Write to indicate that new and
valid data has been put on the data lines. The signal Write may only
go high when RTR is high and go low when RTR is low. RTR is
changed on the active edge of the clock signal at the receiver. For
reasons of efficiency, transmitter implementations do not need to give
one full clock period gap between assertion of data and the signal
Write. Instead Write=1 can be asserted after a small delay. Figure 3.5
shows how signals are changed with the clock edges. The signals clkT
and clkR refer to the clock signals in the transmitter and the receiver,
respectively.
Figure 3.5: Handshaking in GALS
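These rules can be captured in a few lines. The following sketch (our own formalization of the rules above) checks a trace of (RTR, Write) value pairs:

    def legal(prev, curr):
        """Check one transition of the (RTR, Write) pair: RTR may rise only
        while Write is low and fall only while Write is high; Write may
        rise only while RTR is high and fall only while RTR is low."""
        rtr0, wr0 = prev
        rtr1, wr1 = curr
        if rtr1 > rtr0 and wr0 != 0:
            return False
        if rtr1 < rtr0 and wr0 != 1:
            return False
        if wr1 > wr0 and rtr0 != 1:
            return False
        if wr1 < wr0 and rtr0 != 0:
            return False
        return True

    # One complete transfer: RTR rises, Write rises (data valid), RTR
    # falls (data read), Write falls (ready for the next transfer).
    trace = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
    print(all(legal(a, b) for a, b in zip(trace, trace[1:])))   # True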
The actual implementation of the asynchronous communication
protocols may be slightly different from the abstract model described
above. The important function of the protocol is that synchronizing
signals communicate that the receiver is ready and the transmitter has
sent (or will send) new data with a known timing behavior. An
overview of possible protocols for asynchronous communication on
on-chip links is included in [Ho04]. That article also describes how
buffer amplifiers, repeaters and pipeline stages can be used to get high
throughput or low latency or both for long on-chip wires. Metastability
is another issue that has become a big problem in multi-clock domain
systems [Rah10] and needs to be considered during implementation of
asynchronous communication protocols.
3.2.4. Issues in testing of asynchronous links
To test for crosstalk-faults, each possible fault needs to be activated.
Subsection 3.2.2 describes how aggressors should act to cause the
worst case situation. As mentioned in Section 3.1.2, detection of
whether a crosstalk-fault is present can be rather tricky with
asynchronous links due to the non-determinism caused by the phase
difference between the clocks at the transmitting side and the
receiving side. This non-determinism is especially evident in systems
that use separate clock oscillators. In such systems the phase
difference between the clocks in different domains changes
constantly. The non-determinism might cause intermittent failure due
to defects that cause too much parasitic capacitance between wires,
where the failure only happens for some adverse phase difference.
This subsection proceeds with a discussion of delay, and thereafter
glitches are considered. We use the communication protocol described
in Subsection 3.2.3 to illustrate the issues.
Table 3.2: Effects of change of delay

  Signal   Effect of slow signal               Effect of fast signal
  Write    Throughput degradation of link      Risk of erroneous data transfer
  RTR      Throughput degradation of link      No problem arises
  Data     Risk of erroneous data transfer     No problem arises
In the asynchronous link described in Subsection 3.2.3, faults causing
change of delay might result in failures according to Table 3.2. Slow
control signals can result in degradation of the throughput of the link.
Such an effect can be tested by measurement of the throughput of the
link. A harder fault to test is when some data lines are delayed but the
control signal Write is not. This is the type of fault we address in this
thesis. A change in the signal Write from 0 to 1 indicates to the
receiver that new data is available on the data lines. When this occurs,
if the data that is supposed to be at the receiver has not yet arrived, the
receiver eventually reads some data bits erroneously. Let t_l be the time
from when the data arrives to when the signal Write arrives at the
receiver. Due to faults and process variations, t_l varies between
different devices. If t_l is smaller than zero there is a delay fault that
might result in erroneous data. For non-negative t_l, there is no such
delay fault affecting the data line being considered.
The receiver reads the data on its active clock edge and the time
from when the transmitter asserts the signal Write until the receiver
detects it can vary by up to one receiver clock period. This variation depends
on the clock phase difference between the transmitter and the receiver.
It can be modeled as a non-deterministic time gap from the arrival of
the signal Write until the data is actually read. Figure 3.6 shows an
example where a delay fault causes data to arrive after the arrival of
the signal Write. In Figure 3.6a a relatively long time passes from
when the signal Write changes from 0 to 1 until the active edge of the
clock occurs. During this time the data stabilizes, resulting in a correct
read of data. In Figure 3.6b an active clock edge occurs quite soon
after Write has changed from 0 to 1. The result is that the new data is
not stabilized when it is read, resulting in erroneous data. This makes
the testing for such faults somewhat complicated.
Figure 3.6: Signals at the receiver when data is delayed
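The intermittent nature of such failures can be illustrated with a small simulation (our own sketch; the clock phase is modeled as uniformly random and t_l is defined as above):

    import random

    def read_ok(t_l, clk_period=1.0):
        """Write arrives at t = 0 and the data at t = -t_l, so t_l < 0 is a
        delay fault; the receiver samples at a random phase within one
        clock period after Write arrives."""
        sample_time = random.uniform(0.0, clk_period)
        return sample_time >= -t_l

    random.seed(1)
    trials = [read_ok(t_l=-0.3) for _ in range(10_000)]
    print(sum(trials) / len(trials))   # about 0.7: the faulty transfer still
                                       # succeeds for most phase differences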
Glitches can cause a false synchronization signal to be detected as
asserted although it was not. This can cause the transmitter and the
receiver to lose their consensus about the phase of the data transfer the
link is currently in. This can result in lost data, duplicate data and
invalid data. Figure 3.7 shows an example of how a glitch can cause
the loss of a data packet. A glitch occurs at RTR when both Write and
RTR are logic zero and the receiver is not able to take any new packets
for some time. Assume that at this time the transmitter has a packet it
wants to transmit. A glitch on RTR might occur at the same time as an
active edge of the transmitter clock. The transmitter then raises Write
to indicate that a new packet is available. When this is done the
transmitter finds RTR logic zero, which the transmitter interprets to
mean that the receiver has read the data. Hence the transmitter lowers
Write on its next active clock edge and prepares to send the next data
packet.
(Figure: waveforms of clkT, clkR, RTR with the glitch, Write, and Data carrying packets n-1, n and n+1.)
Figure 3.7: Example of a glitch causing a failure
Duplication of data packets and additional invalid packets can occur if
a glitch at Write causes the receiver to believe that a new packet is on
the data lines when it is not. Glitches on data lines can also cause
errors in the data if they occur when the data is read. In Section 4.3 a
method of testing for faults that cause glitches is proposed.
3.2.5. Related work on chip interconnection
testing
In this section related work addressing crosstalk models and related
work focusing on methods for testing are surveyed. Related works that
do not directly address testing, but which address the subjects of
crosstalk and asynchronous on-chip links, are also surveyed. This
includes articles about error detecting codes, error correcting codes and
encoding to avoid the effects of crosstalk.
Crosstalk-fault models
The maximum aggressor fault model [Cuv99] was surveyed in
Subsection 3.2.2. In many cases the maximum aggressor fault model
is unnecessarily pessimistic. The consequence of this is that the test time
becomes unnecessarily long and a large amount of test logic might be
required. In a bus, wires that are close to each other are subject to
more capacitive coupling than wires that are further away from each
other. It is unlikely that defects will change that relationship unless
there are defects that cause short circuits or breaks, and defects of that
type are easier to test anyway. Song et al [Son09] as well as
Sirisaengtakin and Gupta [Sir02] used a graph representation to
represent possible coupling between wires. With the help of this graph
it is determined which wires could be scheduled as victims
simultaneously. In Section 4.2 a method is presented that utilizes the
structure of NoC interconnects to select wires that can be safely
chosen as victims simultaneously. Unlike the methods in [Sir02,
Son09], the method presented in Section 4.2 proposes a simple
hardware design in which it is possible to adjust how many wires
should be victims simultaneously. This makes it possible to adjust the
tradeoff between test time and test accuracy after manufacturing.
Zhao and Dey [Zha03] presented a method for computing the
efficiency of fault models, including those for crosstalk defects. This
method is useful to evaluate the relevance of fault models and the fault
coverage of test vectors. They assumed that there is a method that can
detect whether a victim wire is affected by a set of aggressor wires
during a test. Their computation method also relies on the existence of
a method to determine the probability that there is a crosstalk-fault
between pairs of wires.
Related to crosstalk-fault models, there are articles that
specifically address the testing-related aspects of crosstalk. There are
articles which state that a trend for chips is that the height of the wires
is increasing relative to their width [Aru05] and that the parasitic
capacitance between wires is increasing compared to the capacitance
between wires and the substrate [Pil95]. One effect of this is that the amount of delay in
signal wires is varying more than before, depending on the behavior of
adjacent wires. Pileggi [Pil95] also stated that if resistance per length
unit is reduced significantly, inductance could become a factor that
will need to be considered. Ismail et al [Ism99] stated that inductance
is only important to include in design calculations for certain lengths
of the interconnections. Observe that although it might be
necessary to consider crosstalk caused by inductance during design it
does not mean that it also needs to be considered for production test.
There are several articles addressing how to estimate and model
capacitive crosstalk [Gup05, Hey05, Pal05] which is the most
important kind of coupling to be considered when a test fabric is
developed.
There are articles addressing delay issues in NoC devices [Liu03,
Liu04]. These articles describe electrical properties and system
parameters for NoC interconnects and how they affect the delay.
Buffer insertion is discussed as well. Buffers decrease delay but they
consume a significant amount of area and power [Liu03], which is
why they are not often used.
Bai and Dey [Bai04] describe how one trace of a Spice simulation
was used to make several simulations of NoC interconnection links at
the logic level of abstraction. That simulation included the effects of
crosstalk.
Test methods for crosstalk-faults
Currently only a few articles present methods of testing for
crosstalk-faults in asynchronous on-chip links. Li et al [Li09] as well as
Su et al [Su00] have presented test methods for crosstalk which do not
depend on a global clock. This means that they can be used to test for
crosstalk-faults in chips with several clock domains. In the method
presented by Su et al [Su00] a periodic signal is sent back and forth on
two different wires in an interconnection bus. The phase difference
between the signal that is sent and the signal that comes back is used
to measure the delay. The delay is always measured in two lines, one
in each direction. That article also presents a method for inducing
worst case delay due to crosstalk. This is done by feeding other wires
with an inverted version of the periodic signal used for the wires being
tested. A scheme for detecting glitches as a measurement of crosstalk
was presented by Li et al [Li09]. They claim that for detection of
crosstalk-faults it is sufficient to measure glitches. The justification is
that defects that cause additional crosstalk result in both glitches and
delay faults. Compared to the contributions presented in this thesis,
their method has the drawback that glitch detectors implemented using
analog circuits are needed. That method is also more pessimistic than
the method presented in this thesis because there are parameter
variations in the analog detector. The worst case corner, in which
glitches are smallest compared to the delay caused by the
corresponding interference, must be assumed in order to cover the
possible crosstalk-faults when the test method in that article is used.
For chips with a single clock signal there are several test methods
proposed for detection of crosstalk-faults. Bai et al [Bai00] have
presented a complete BIST structure for crosstalk interconnection test
based on the maximum aggressor fault model. Attarha and Nourani
[Att01] used BIST cells of analog structure to detect noise. For delay
measurements, gate delays were used as reference. A BIST hardware
design for detection of all transaction faults, bridging faults and stuck-at faults in synchronous interconnections is presented in [Jut04].
Transaction faults mainly refer to delay faults in that context. Grecu et
al [Gre07] addressed test issues for the infrastructure in NoC devices.
They combined test generation for interconnection links and switches.
They utilized NoC infrastructure to make the testing efficient and they
included tests for crosstalk. Duganapalli et al [Dug08] focus on how
to test for crosstalk-faults in nodes inside a network of gates. An
interconnection wire inside such a network is considered as a victim
and a set of wires are considered as aggressors. A genetic algorithm is
used to find a set of input vectors that can be used to activate
crosstalk-faults and propagate their effect to an observable output.
(Figure: voltage-versus-time waveforms for the aggressor wires and the victim wire; one part of the sequence tests for glitch faults and the other for delay faults.)
Figure 3.8: Test sequence for one line
When testing for crosstalk-faults in a connection bus within a chip
with a single clock, it is usually enough to test for increased delay and
for glitches. Subsection 3.2.2 describes which logic values should be
applied to victim wires and aggressor wires to perform respective
tests. Each test needs two successive test vectors to be applied on the
bus, an initial vector and a final vector, because each test consists of a
certain transition. This means in principle that eight test
vectors are needed to test one line. However, in two cases it is
possible to use the final vector for one test as the initial vector for
another test. This means that six test vectors in a specific sequence are
needed to test a victim wire for capacitive crosstalk-faults. Figure 3.8
illustrates this sequence for testing one line. When the maximum
aggressor fault model is used one wire at a time is considered as a
victim, hence 6N vectors are needed where N is the number of lines in
the connection.
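As an illustration, the sketch below generates one plausible six-vector sequence for a given victim (the exact chaining in Figure 3.8 may differ; the function name and ordering are our assumptions). Four of the five transitions realize the two glitch tests and the two delay tests; one transition is only a setup step.

def maf_test_sequence(victim, n_wires):
    # Six-vector sequence testing wire `victim` for capacitive
    # crosstalk-faults; all other wires act as aggressors.
    def vec(v, a):  # victim carries v, every aggressor carries a
        return [v if i == victim else a for i in range(n_wires)]
    return [
        vec(0, 0),
        vec(0, 1),  # aggressors rise, victim holds 0: glitch test
        vec(1, 0),  # victim rises while aggressors fall: delay test
        vec(1, 1),  # setup transition only
        vec(1, 0),  # aggressors fall, victim holds 1: glitch test
        vec(0, 1),  # victim falls while aggressors rise: delay test
    ]

Applying the sequence once per victim gives the 6N vectors mentioned above.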
Fault tolerance and coding techniques
Erroneous data transfer is not only caused by defects in the chip, it can
also result from transient faults. Transient faults can be caused by
cosmic radiation [Mic06] and other sources that sporadically cause
disturbances in signals on the chip. The shrinking dimensions of
semiconductors have made them more sensitive to some types of
transitory disturbances, so fault tolerance in on-chip communication
links is needed [Dum03]. Error correcting or error detecting codes
need to be used to handle transient faults. When such codes are used
they also, to some extent, reduce failures caused by chip defects that
cause too much crosstalk.
Coding techniques used to detect and correct transient faults can
to some extent reduce failures caused by defects that induce crosstalk.
However relying on coding techniques to compensate for such defects
can be problematic because a certain pattern of data that becomes
erroneous due to a defect is expected to be erroneous in the same way
each time it is transferred. A system with error detecting codes, which
asks for retransmission, will experience problems in such a case
because the same error is likely to occur when the data is resent. With
a system that utilizes error correcting codes it might work but the error
correcting mechanism must correct data errors every time the data is
sent. The problem arises when errors caused by transient faults also affect
the data. In such a case the error correcting mechanism must correct
for both the errors caused by defects that induce crosstalk and the
errors caused by transient faults. The number of transient fault
induced errors that can be corrected is then smaller in a chip with
errors caused by crosstalk inducing defects. As a result, the probability
that a transient fault cannot be corrected is considerably larger than it
is for a system without any permanent crosstalk-faults.
Zhao et al [Zha04] presented an online method for detection of
noise induced by crosstalk and other effects. The basic idea of that
method is that lines are sampled twice with a small delay between the
samples, instead of only once. The two samples are then compared
and if they are different it means that a fault has been detected. They
pointed out that a weak point in this method is that it is overly
conservative, with a false detection level of 40%.
Some articles address NoC infrastructure issues in particular.
Zimmer and Jantsch [Zim03] described fault models for NoC
interconnects, covering mainly temporary faults caused by radiation
and similar sources. They describe how the probability of faults and
their correlations in time and space can be modeled, as well as how
efficient coding techniques can be used for error control. Error control
schemes dedicated to NoC devices and their specific demands on
traffic were presented by Rossi et al [Ros07]. Overhead costs for fault
tolerance techniques in NoC circuits were addressed by Frantz et al
[Fra07]. They stated that hardware based fault tolerance techniques
consume too much power. They presented a technique to improve
fault tolerance that is partly implemented in hardware and partly in
software for NoCs. Tamhankar et al [Tam07] have addressed how
throughput in a NoC infrastructure can be improved with a timing-error
tolerant technique in combination with a clock frequency that is
so high that timing errors occur more frequently than would be
acceptable without this timing-error tolerant technique.
Signal coding techniques that avoid excessive crosstalk were
presented in [Bre01, Dua01, Jun08]. The data is coded such that wires
on a chip never have values applied that cause excessive crosstalk.
Codes that avoid crosstalk in combination with a mechanism for
handling inter-symbol-interference were presented by Sridhara et al
[Sri08].
3.3 Test generation at high abstraction
levels
By test generation at high abstraction levels we refer to generation of
test data at the RT-level and higher abstraction levels. It is usually
more difficult to get good fault models at higher abstraction levels
than at lower ones. On the other hand, the cost for generation of
test sequences is often lower at higher abstraction levels than at the
logic level [Jer02]. Another advantage of generating tests at higher
levels of abstraction is that testing can be taken into account earlier in
the design phase, making it possible for optimization strategies to
include test costs and testing issues. The main challenge in generating
test vectors from high-level system specification is to define good
fault models. Chapter 5 proposes an approach to generating fault
models using some knowledge about the functionality and structure of
the system. It is quite possible that different fault-models will be
useful for generating test vectors for different classes of systems.
Jervan et al [Jer02] described experiments in which test data
generation at behavior level resulted in higher coverage of logic level
faults than when logic level stuck-at fault based test vector generation
algorithms were used.
Most articles addressing test at high abstraction levels focus on
how to generate faults with good correlation to defects in the
implementation. Such articles can be divided into those addressing
fault models at a certain level of abstraction and those dealing with
derivation of faults from possible defects in the actual implementation.
3.3.1. Fault models independent of lower
abstraction levels
As justified in Subsection 2.4.2, fault models of this type are
independent of lower level implementations, and they can be used for
defining faults and then generating test patterns before synthesis into
lower abstraction levels has been performed. Most of the proposed fault models of that
type at the RT-level and the behavior level have been inspired by the
stuck-at fault model at the logic level of abstraction.
Bit stuck-at faults
In the example in Subsection 2.4.2 that demonstrates fault modeling,
the signal bit stuck-at fault at the RT-level and the variable bit stuck-at
fault at the behavior level were described. In that example, the fault
was modeled from the logic level of abstraction to the RT-level and to
the behavior level. However, the signal bit stuck-at fault as well as the
variable bit stuck-at fault can be used as fault models without
knowledge of lower level implementations. A behavior level
description of a system often states how variables should be encoded.
Another option is that the decision of how to encode variables is not
taken until the system is synthesized to the RT-level. The variable bit
stuck-at fault can only be used when the variable encoding is stated.
The bit stuck-at fault at the behavior level was presented by Cho
and Armstrong [Cho94]. An analysis of how such bit stuck-at faults
map to RT-level faults was shown by Buonanno et al [Buo97]. Their
method works under certain assumptions about how the behavioral
synthesis has been done.
Logic level stuck-at faults have a clear mapping to many physical
defects. Each component at the RT-level is synthesized to a specific
set of logic components. Therefore the RT-level bit stuck-at faults are
also likely to have a good mapping to a subset of the physical defects.
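As a small illustration (the helper below is hypothetical, not from the thesis), a bit stuck-at fault on an encoded RT-level signal can be modelled by forcing one bit of its value:

def inject_bit_stuck_at(value, bit, stuck_at_one):
    # Force bit index `bit` of an encoded signal value to 1 or 0,
    # modelling a signal bit stuck-at fault.
    return value | (1 << bit) if stuck_at_one else value & ~(1 << bit)

# A test vector detects the fault if the forced value differs from the
# fault-free one and the difference propagates to an observable output.
assert inject_bit_stuck_at(0b0110, 0, True) == 0b0111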
Multiple bit stuck-at faults and variable stuck-at faults
One class of test data generation algorithms does not work directly on
fault models. Instead, the strategy of such test data generation
algorithms is to cover as much code as possible. The code is a
description of the system in some hardware description language, for
example VHDL. Such a description is often made at the behavior level
or the RT-level of abstraction. According to Buonanno et al [Buo97],
code-covering methods cover faults of a type called multiple bit stuck-at
faults. There are 3^n - 1 different ways in which an n-bit signal can
have one or multiple bit stuck-at faults, since each bit can be fault-free,
stuck-at-0 or stuck-at-1 and the case where every bit is fault-free is
excluded. Most code-covering test data generation algorithms tend to
cover a particular subset of the multiple bit stuck-at faults, namely
those forcing the variables into their lower and
upper extreme values [Buo97]. A test for a particular subset of such
faults becomes a test for a set of variable stuck-at faults. The variable
stuck-at fault model means that the variable is stuck at a particular
value. Multiple bit stuck-at faults where all bits have a stuck-at fault
are equivalent to variable stuck-at faults.
Branch stuck-at faults and condition stuck-at faults
Two other stuck-at fault models proposed are the branch stuck-at fault
model and the condition stuck-at fault model [Fer98]. They can be
used both at the RT-level and the behavior level of abstraction. A
branch stuck-at fault means that a selection statement always makes a
specific selection. This is usually an if-statement that is stuck-at-then
or is stuck-at-else. It can also be a selection statement with several
alternatives, usually expressed as a case-statement. The condition
stuck-at fault model is similar to the branch stuck-at fault model. A
condition stuck-at fault means that a condition is either stuck-at-true
or stuck-at-false. This means that a condition behaves as if it is always
true or always false. A distinction between the condition stuck-at fault
model and the branch stuck-at fault model appears when a branch
statement is based on a number of conditions connected through
logical operators. More about this distinction has been described by
Ferrandi et al [Fer01].
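The following Python sketch (an illustrative behavioral model of our own, not thesis code) shows the two fault models on a description with one if-statement:

def unit(ready, data):
    # fault-free behavior: forward data only when ready
    return data if ready else 0

def unit_branch_stuck_at_then(ready, data):
    # branch stuck-at-then: the if-statement always takes the then-branch
    return data

def unit_condition_stuck_at_false(ready, data):
    # condition stuck-at-false: `ready` behaves as if it were always false
    return unit(False, data)

# A test vector detects a fault when the faulty model's output differs.
assert unit(False, 7) != unit_branch_stuck_at_then(False, 7)

For a single condition the two models coincide; they differ once the branch is controlled by several conditions combined with logical operators.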
Short summary and comparison of RT-level and behavior level
stuck-at faults
Bit stuck-at faults and branch and condition stuck-at faults are stuck-at
faults at the RT-level and the behavior level that researchers have used
to generate test data. The bit stuck-at fault model and the variable
stuck-at fault model are more relevant for testing the data parts of a
design while the branch and condition stuck-at fault models are better
suited to the control parts of a design.
The variable stuck-at fault is independent of the encoding of the
signal. At the behavior level the encodings of signals and variables are
not always known. In cases when it is not obvious how variables
should be encoded, it is possible to use the variable stuck-at fault but
the bit stuck-at fault model cannot be used.
At the logic level of abstraction, the connections between gates
and flip-flops can be thought of as signals. It is then possible to define
faults to mean that a signal is stuck-at-0 or stuck-at-1. Consideration
of signals in such a way is equivalent to consideration of stuck-at
faults at the outputs of gates and flip-flops. Logic level stuck-at faults
are, however, usually considered on both the inputs and the outputs of
gates and flip-flops. Consideration of stuck-at faults for both inputs
and outputs instead of only for outputs makes sense in all nodes where
the fan-out is larger than one. A similar distinction between
consideration of signals and consideration of outputs and inputs can be
applied to the bit stuck-at fault and the variable stuck-at fault models.
Inputs and outputs to the operators can be considered for bit stuck-at
faults and variable stuck-at faults [Cho94] rather than the signals and
variables themselves.
Operator mutation fault model
The stuck-at fault models at the RT-level and the behavior level do not
map to faults that are inside the functional units implementing various
operators in the behavior. Therefore stuck-at faults at the RT-level and
the behavior level can only model a subset of physical defects. A fault
model named micro-operation fault has been presented by Cho and
Armstrong [Cho94]. This fault model can be used to represent a fault
in a functional unit where we do not have access to its internal
implementation. A fault in such a block causes it to implement a
different function than intended. Buonanno et al [Buo97] presented a
generalization of this fault model called operator mutation fault. The
presence of a certain fault of this type results in an operator that will
make a miscalculation for some or for all operand values. For an
operator with a large number of inputs, it is practically impossible to
enumerate all possible operator mutations and then generate test data
to test them.
The way an operator can mutate due to defects depends on its
design and implementation. This means that operator mutations are
highly dependent on the operator’s logic level implementation.
However, the synthesis process is often predictable enough that there
is only one or a few ways in which an operator is going to be
implemented. Knowing that, a reasonably small set of operator
mutation faults can be defined such that it covers most of the physical
defects in the circuit that will implement the operator.
Code coverage methods
The description at the RT-level and the behavior level of a system is
usually made in a hardware description language, e.g. VHDL or
Verilog. The basic idea of code covering methods is to generate a test
sequence that causes as many statements as possible in the hardware
description code to produce an observable output. Corno et al [Cor00]
presented a methodology that combines code coverage and RT-level
fault models. Their methodology gives rules for identifying code lines
for which usage of RT-level faults has little or no correlation to logic
level faults. By omitting such RT-level faults from consideration, a
relatively strong correlation between RT-level faults and logic level
faults was shown in experiments. Statements in VHDL described as a
sequence were identified by Corno et al [Cor01] to improve the
accuracy of RT-level fault models.
A variant of decision diagrams called alternative graphs has been
proposed by Ubar [Uba96]. This type of diagram is efficient
for test data generation at the behavior level, especially when multiple
abstraction levels are utilized. Jervan et al [Jer02] have used such
graphs to generate test data for bit stuck-at faults and condition
stuck-at faults, and they have utilized a technique to perform hierarchical test
generation. Hierarchical test generation in this context means using
several abstraction levels for test generation.
In a description of systems in a hardware description language, it
can be rather tricky to determine if the effect of a fault is propagated
to an observable output. That problem has been identified by Fallah et
al [Fal01] and they presented a method for analysis of how the effects
of faults propagate.
Experimental results based on the evaluation of the behavioral
level fault metrics bit coverage, condition coverage and statement
coverage were presented by Goloubeva et al [Gol02]. Statement
coverage means that the test assures that the effect of each statement
in the hardware description code is propagated to an observable
output. Condition coverage and bit coverage refer to coverage of
condition stuck-at faults and bit stuck-at faults, respectively. Systems
are generally classified as control-dominated systems, data-dominated
systems and mixed systems. The result of their experiment indicates
that bit stuck-at faults and condition stuck-at faults have a good
correlation to logic level stuck-at faults for data dominated systems.
3.3.2. Derivation of faults from possible
defects in the actual implementation
In Figure 2.8 an example was given of how a defect can be modeled
first to logic level and then further to RT-level and behavior level.
Faults derived from possible defects in the actual implementation have
very good correlation to the physical defects they model. However,
faults cannot be derived in this way before synthesis to the final
implementation has been performed. This is a drawback in the sense
that consideration of test aspects based on such faults can only be
made late in the design process. Another drawback is that such faults
are often complex. The example in Figure 2.7 shows how a break in a
wire can result in a fault at logic level that is considerably more
complex than a logic level stuck-at fault. A third drawback is that
detailed consideration of how the synthesis was made is needed to
derive faults from possible physical defects. This can be rather
complicated.
Instead of deriving faults from the physical defects in the actual
physical implementation, faults generated by a fault model at an
abstraction level below the currently considered abstraction level can
be used to derive faults. For example a system might have been
synthesized into logic level and faults might have been created from
the logic level stuck-at fault model. These faults can then be used to
derive faults at RT-level. These derived faults at RT-level will have
very strong correlation to the logic level stuck-at faults.
Initial work in this area was presented by Hansen and Hayes
[Han95a]. A fault modeling metric called physically induced faults
was presented. This metric describes how faults at some abstraction
level can be modeled up to any higher abstraction level. Hansen and
Hayes [Han95b] also presented a way to use fault induction to
generate fault models at a higher level of abstraction from logic level
design.
A description of a system at some higher abstraction level,
together with the synthesis rules used to synthesize to the next lower
abstraction level, is sufficient information to determine how the
system will be described in that next lower abstraction level. A fault
model at that lower abstraction level together with synthesis rules is
therefore sufficient to generate faults at the higher abstraction level
that are equivalent to faults derived from the faults in the lower level
implementation.
In practice, synthesis rules have optimizations making it
complicated to find fault models from the synthesis rules without
synthesizing the system and then propagating the faults to a higher
level of abstraction. However, there have been attempts to make fault
models from synthesis rules. In [Laj00] a method is described where
fault models are developed for descriptions in the language POLIS.
The synthesis rules of POLIS are analyzed to get fault models at the
behavior level with a high correlation to logic level stuck-at faults.
Operator mutation faults cannot, however, be accurately modeled with
this method.
Another work addressing testability consideration of synthesis
rules was presented by Dey et al [Dey98]. They illustrated how a
synthesis tool can introduce feedback loops into a design during
optimization, which can considerably degrade testability at the
behavior level and the RT-level. They proposed DfT techniques at the
behavior level and at the RT-level to achieve good observability and
controllability of the final implementation. They also showed what
constraints on the synthesis are needed to preserve controllability and
observability. Controllability and observability in this context
basically refer to the ability to activate faults and propagate faults.
Chapter 4
Testing of crosstalk induced
faults in on-chip
interconnects
This chapter describes contributions to testing SoC interconnects. Its
focus is on testing for defects that erroneously cause unacceptable
levels of crosstalk. Test methods for detection of faults causing
variations in delay and occurrence of glitches in asynchronous links
are proposed.
4.1 Method for testing of faults causing
delay errors
This section presents a method for testing of delay faults caused by
crosstalk defects in asynchronous links. The first three subsections
present the method. After that, the hardware implications are
described, followed by an analysis of the method along with results of
the analysis. In these sections it is assumed that independent clock
generators feed the different clock domains.
4.1.1. Basic overview of the test method
In Section 3.2 a basic handshaking communication protocol between
different clock domains was described. It was shown how a delay fault
can result in errors for some adverse phase differences between the
clock signals while no errors occur at some other phase differences. A
consequence is that sending data on the communication link under
worst case delay conditions is not sufficient to claim the absence of
delay faults.
As an alternative, we propose a method in which we read the data
both at the active clock edge just before the signal Write arrives at the
receiver and at the active clock edge just after arrival of the Write
signal. The second read is done in the same way as during normal
operation.
(Figure: waveforms of Write and Data at the receiver, for a system with no fault (upper) and a system with a delay fault (lower). There is exactly one active clock edge in each of the intervals marked TR; the intervals tl and -tl mark the time between the arrival of the data and the arrival of Write.)
Figure 4.1: Signals at the receiver
Figure 4.1 shows signals at the receiver side. The upper part of that
figure shows the signals when no fault is present and the lower part
shows the signals when there is a delay fault. Parameter tl was defined
in Subsection 3.2.4 as the time from when the data is stable until when
Write arrives at the receiver. When there is no fault the signal Write
arrives after the data has become stable. So a non-negative tl means
that the delay fault we are currently considering is absent while a
negative tl means that there is a delay fault. The clock period time at
the receiver is denoted as TR.
In the left interval marked with TR in Figure 4.1, one active clock
edge occurs at the receiver clock. The first read of the data is made at
this clock edge. In the case with a delay fault the wrong data will be
read at this clock edge. In the case with a system without a fault,
correct data or wrong data will be read depending on where in the
interval this clock edge occurs. If it occurs in the interval denoted
tl, correct data is read; otherwise the wrong data is read. So if the
data read at this clock edge is correct we can conclude that the delay
fault under consideration is not present, but if it is wrong we cannot
draw any conclusion.
In the right interval marked with TR in Figure 4.1, one active clock
edge occurs at the receiver clock. The second read of the data is made
at this clock edge. It is on this edge that data is read under normal
operation, so if there is no fault in the system the data read at this edge
will always be correct. In a system which has the delay fault, the data
read at this edge is correct or wrong depending on where in the
interval this edge occurs. If this edge occurs in the interval
denoted -tl, wrong data is read; otherwise correct data is read. So if
wrong data is read at this edge there is a fault, but if correct data is read
we cannot draw any conclusion.
Combining the results of both reads results in one out of three
different cases. The first case is when the data of both reads are
correct. In this case we can conclude that the fault is absent. The
second case is when the data of both reads are wrong. In this case we
can conclude that there is a fault. The third case is when the data of
the first read is wrong and the data of the second read is correct. In
this case we cannot draw any conclusions. To test whether the fault is
present or not, this measurement is repeated until either the first or the
second case occurs or until a predefined number of repetitions have
been made. In the following we refer to one such measurement as an
instance of a test.
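A behavioral Python sketch of one test instance and its repetition (names and the time normalization are our assumptions: Write is taken to arrive at time 0 and the receiver clock phase is uniform over one period):

import random

def test_instance(t_l, T_R, rng):
    # One instance of the two-read test.  t_l > 0 means the data is
    # stable t_l before Write arrives; t_l < 0 means a delay fault.
    phase = rng.uniform(0.0, T_R)   # first active edge after Write
    first_read = phase - T_R        # last active edge before Write
    second_read = phase             # the normal-operation read
    data_stable = -t_l              # Write arrives at time 0
    if first_read >= data_stable:
        return "FAULT_IS_ABSENT"    # first case: both reads correct
    if second_read < data_stable:
        return "FAULT_IS_PRESENT"   # second case: both reads wrong
    return "MIGHT_BE_FAULTY"        # third case: inconclusive

def run_test(t_l, T_R, max_no_experiments, rng=random):
    for _ in range(max_no_experiments):
        verdict = test_instance(t_l, T_R, rng)
        if verdict != "MIGHT_BE_FAULTY":
            return verdict
    return "MIGHT_BE_FAULTY"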
4.1.2. Algorithmic description of the test
method
In this subsection, we give a description of our method. The method is
described as two algorithms, one for the transmitter and the other for
the receiver.
1   ALGORITHM Transmitter ()
2     Write ← 0
3     REPEAT
4       data-lines ← data to precede test data
5       WAIT long enough for data-lines to settle
6       WAIT UNTIL RTR = 1
7       data-lines ← test data
8       WAIT TIME nominal value of tl
9       Write ← 1
10      WAIT UNTIL RTR = 0
11      Write ← 0
12    UNTIL Receiver algorithm has finished
13  END ALGORITHM
1   ALGORITHM Receiver (OUT: result, IN: max_no_experiments)
2     i ← 0
3     result ← MIGHT_BE_FAULTY
4     WHILE (result = MIGHT_BE_FAULTY) AND
5           (i < max_no_experiments) LOOP
6       RTR ← 1
7       WAIT ON active clock-edge
8       WHILE (Write = 0) LOOP
9         Read data into register data_first_read
10        WAIT ON active clock-edge
11      END LOOP
12      Read data into register data_second_read
13      IF data_first_read = expected_data THEN
14        result ← FAULT_IS_ABSENT
15      ELSE IF data_second_read ≠ expected_data THEN
16        result ← FAULT_IS_PRESENT
17      ELSE
18        RTR ← 0
19        WAIT UNTIL Write = 0
20        i ← i + 1
21      END IF
22    END LOOP
23    RTR ← 0
24  END ALGORITHM
The algorithms for the transmitter and for the receiver implement the
handshaking protocol described in Subsection 3.2.3 with some
extensions that permit the test to be executed. The extension for the
transmitter is represented with code lines 4 and 5. These code lines put
signal values on the data wires, which should precede the data to be
sent. This is needed because the values on the data wires must change
in a certain way during the test, since delay in these changes is what
this test is meant to identify. On the receiver side, code line 9 is an
extension to the handshaking protocol described in Subsection 3.2.3.
This code line reads the data lines at each active clock edge until the
control signal Write has changed from 0 to 1. Thus the variable
data_first_read will contain the data read at the active clock edge just
before this change of the Write signal. Code lines 13 – 16 determine
whether this measurement can identify the presence of a fault. If it
can, the variable result is assigned accordingly. The measurements are
repeated until it can be determined whether the faults being tested for
exist or until the measurement has been repeated the number of times
given by parameter max_no_experiments. Code lines 3 and 12 in the
transmitter algorithm and code lines 4, 5 and 22 in the receiver
algorithm represent the loops for this repetition of measurements. The
parameter result in algorithm Receiver returns the result of the test.
The result is either that the test passed or failed, or that the specified
maximum number of instances of the test were completed but a definite
conclusion was not possible. Additional test logic is needed to
initialize the test and to propagate its result. This additional test logic
also makes the transmitter exit when the test is completed. This is
indicated at line 12 in the transmitter algorithm.
4.1.3. Worst case interference setup
Each data wire might need to be tested for delay faults caused by
crosstalk. Tests for delay are necessary both when a wire changes
from 0 to 1 and when it changes from 1 to 0. The Maximum Aggressor
Fault model [Cuv99] can be used such that each wire takes a turn as
the victim while all the other wires are made to act as aggressors. We
assume that we only need to test for crosstalk caused by capacitive
coupling. It might happen that some wires other than the lines in this
channel also create interference with the victim wire. We assume that
if such wires should be considered as aggressors, the transmitting
switch can activate them. The delay fault we are considering in this
test is, in fact, not the absolute delay in the data line but its relation to
the delay in the control line Write. Hence, to create the worst case
situation, the aggressors should act such that the signal Write becomes
as fast as possible and the data line being tested should be as slow as
possible. When testing for a delay fault when the victim data line
changes from 1 to 0, the longest delay due to capacitive coupling to
the aggressor wires is when the aggressors change in the opposite
direction, which is from 0 to 1. This behavior of the aggressors also
makes Write, which changes from 0 to 1, as fast as possible, which is
the worst case in this situation.
For testing of the delay fault when the victim data line goes from
0 to 1, the aggressors should be changed from 1 to 0 to cause the worst
case delay. Such a change will also cause the maximum delay in the
signal Write when it changes from 0 to 1, which does not create the
worst case for the test. However, the transmitter asserts the signal
Write shortly after the data has been asserted. It can therefore change
the aggressor wires from 1 to 0 when it asserts the data, and then
change the aggressor wires back to 1 again when it asserts the signal
Write. In such a way the victim data
line will experience the longest delay and the signal Write the shortest
delay due to capacitive interference from other lines. This means that
the test is also performed in the worst case situation.
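The two setups can be summarized compactly (a restatement of the reasoning above, not additional method detail):

# Worst-case aggressor behavior for one victim data line; aggressor
# transitions are chosen so that the victim becomes as slow and the
# control signal Write as fast as possible.
WORST_CASE_AGGRESSOR_BEHAVIOR = {
    "victim 1 -> 0": "aggressors change 0 -> 1, slowing the victim's "
                     "fall and speeding up Write's 0 -> 1 edge",
    "victim 0 -> 1": "aggressors change 1 -> 0 when the data is "
                     "asserted (slowing the victim's rise) and back to "
                     "1 when Write is asserted (speeding up Write)",
}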
4.1.4. Hardware implementation
In this subsection an RT-level description of the BIST hardware
needed for implementation of the proposed test method for delay
errors is described. Figure 4.2 shows a schematic diagram of the BIST
hardware on the receiver side. We assume, for the sake of simplicity,
that testing is initiated by a signal test_mode from the corresponding
transmitter. After receiving this signal the receiver sends the RTR
signal and starts sampling the Data bus with the rising edge of every
clock. The two latest samples of Data bus are stored in a FIFO
structure consisting of two registers. Upon receiving the Write signal
the contents of these registers are compared with the expected data
and the comparison results are used to decide whether to abort testing,
or to try another instance of the test on the same wire, or to test the
next data wire, as described in the test algorithm shown in Subsection
4.1.2. The BIST controller generates the required control signals,
including the address for the memory where the expected data values
corresponding to various tests are stored.

(Figure: the Data bus feeds a two-register FIFO; the register contents are compared with expected data read from a memory addressed by the BIST controller, which also handles Write, RTR and test_mode and produces fault_present, fault_absent and result.)
Figure 4.2: Schematic diagram of BIST hardware in receiver
Figure 4.3 shows a schematic diagram of BIST hardware at the
transmitter side. We assume that testing of links between two switches
is initiated by a control signal start_test distributed from a central
controller in the system. We assume that the test vectors for delay
testing are stored in memory. These vectors are read out, one test at a
time, and sent to the receiver using the same timing sequence as in the
normal operation.
(Figure: the transmitter-side BIST controller reads test vectors from a memory, via a register, onto the Data lines, and handles Write, RTR, test_mode and start_test.)
Figure 4.3: Schematic diagram of BIST hardware in transmitter
It should be noted that the BIST hardware described above is only a
schematic description and the implementation can be optimized
further. For example, hardware resources which are already available
in the switch, like registers and memory buffers, can be shared for
BIST purposes. Also, the same BIST hardware resources can be used
for different links connected to a switch. The test vectors in the
memory have a regular structure, hence logic can be used to generate
the test vectors and this might consume less chip area than storing
them in memory.
4.1.5. Analysis and results
The basic idea of the method described is to repeat the testing using
one test vector until either the existence of the fault can be determined
or until a predefined number of instances of the test have been done.
In the following, a theoretical analysis is carried out to identify
the relationships between the probability of delay faults and the
expected number of test instances required to achieve a reliable test
result.
Test capability as a function of measurement iterations
In Subsection 4.1.1, tl was defined as the time from when data arrives at
the receiver until the signal Write does. There it was also
described in which cases one instance of the test can unambiguously
determine if the fault is present or not. For a correct chip we will
determine the absence of the delay fault if an active clock edge occurs
in the time period from when the data arrives until the signal Write
arrives. The length of this time period is |tl|. Similarly, for a chip that
has the delay fault we are looking for, we will detect that the fault is
present if there is an active clock edge in the time period from when
the signal Write arrives until the data arrives. The length of this time
period is also |tl|. Assuming
signal at the receiver there will be no correlation between the arrival
time of signal Write and the clock phase at the receiver. The
probability that one measurement can decide unambiguously if the
fault is present or not, is therefore equal to the probability that at least
one active clock edge appears in a time interval of length |tl|. Let
function g(tl) denote this probability. This function is then:
$$
g(t_l) =
\begin{cases}
\dfrac{|t_l|}{T_R}, & |t_l| \le T_R \\
1, & |t_l| > T_R
\end{cases}
\qquad (4.1)
$$
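As an illustration with made-up numbers: if TR = 2 ns and a link has tl = 0.5 ns, then g(tl) = 0.25, so on average one test instance in four is conclusive, whereas a link with |tl| ≥ 2 ns is classified by every instance.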
In practice, tl will depend on process variations and other
manufacturing effects. This means that tl is a stochastic variable. We
assume that tl is normally distributed with expected value µ and
standard deviation σ. The nominal value of tl is then µ. Given µ and
the probability that the delay fault we are looking for is present, σ can
be determined. Let f(t) denote the density function of tl.
Let p denote the probability that one instance of the test can detect
if the delay fault is present or not. This implies that p is a probability
and this probability is in itself a stochastic variable, because it depends
on the outcome of tl. Let r(x) denote the density function of p. Because
r(x) denotes the density function of a probability its value is zero
outside the interval [0, 1] and its integral over the real values is unity.
To derive the relationship between r(x), g(tl) and f(t), we let a and
b denote two real numbers such that a < b. We use them to express the
probability that p lies between these numbers. This is equal to the
probability that the outcome of tl is a value for which the function
value g(tl) lies between a and b. This is illustrated with Figure 4.4, in
which the vertical axis indicates probability and the horizontal axis
represents tl. The numbers a and b are shown at the vertical axis. Light
gray lines illustrate for which values of tl the function g(tl) has a
function value between a and b. The probability that p will be between
a and b is therefore equal to the integral represented by the shaded
areas in the figure.
Figure 4.4: Illustration of the relation between r(x), g(tl) and f(t)
Formally the relation between r(x), g(tl) and f(t) can be expressed as:
$$
\int_a^b r(x)\,dx = \int_{a < g(t) < b} f(t)\,dt \qquad \forall (a,b),\; a < b \qquad (4.2)
$$
In the general case g(t) can be split into sections that are
monotonically increasing, sections that are monotonically decreasing
and sections that are constant, to transform r(x) to explicit form. For
this case g(t) has a constant value of 1 for t < -TR and for t > TR. This
means that r(x) has a Dirac impulse at x = 1. We denote the Dirac
impulse with δ(x). The interpretation of this Dirac impulse is that there
is a certain probability that every instance of a test detects whether the
fault is present. This happens when |tl|≥TR.
g(t) is linear in the interval [0, TR) as well as in the interval
(-TR, 0]. Each of these intervals contributes to r(x) in the interval
[0, 1). So r(x) is:
$$
r(x) =
\begin{cases}
\delta(x-1)\left(\displaystyle\int_{-\infty}^{-T_R} f(t)\,dt + \int_{T_R}^{+\infty} f(t)\,dt\right)
+ T_R\bigl(f(T_R x) + f(-T_R x)\bigr), & 0 \le x \le 1 \\
0, & x < 0;\; x > 1
\end{cases}
\qquad (4.3)
$$
Because tl is normally distributed, f(x) is:
$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (4.4)
$$
For a specific fault on a particular chip, p has a fixed value. Let
n_one_fault be the number of test instances required to detect whether a
fault is present or not. We assume that the chance that an instance can
identify the presence of a fault is independent of the outcome of any
other instance of the test. With this assumption, the density function
w(n) of n_one_fault follows a "for the first time" distribution. In the
literature, the geometric distribution is often used for this distribution,
but there is also a slightly different distribution that is also called the
geometric distribution, which is why the term "for the first time"
distribution is used here.
$$
w(n) =
\begin{cases}
(1-p)^{n-1}\, p, & n \ge 1 \\
0, & n \le 0
\end{cases}
\qquad (4.5)
$$
The expected value of n_one_fault is 1/p.
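A quick empirical check of this expected value (a throwaway sketch; p = 0.25 is an arbitrary example):

import random

def sample_n_one_fault(p, rng):
    # Draw from the "for the first time" distribution of Equation (4.5):
    # the number of instances until the first conclusive one.
    n = 1
    while rng.random() >= p:
        n += 1
    return n

rng = random.Random(0)
mean = sum(sample_n_one_fault(0.25, rng) for _ in range(100_000)) / 100_000
print(mean)  # close to 1/p = 4.0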
Need for a limit on number of iterations
In this part we prove that the expected number of iterations needed to
unambiguously decide whether a chip has the delay fault we are
targeting is infinite. Therefore, to make the test time finite, there is a
need for an upper limit on the number of test instances, after which
the test for a fault should be aborted.
Variable p was previously defined as the probability that one
instance of the test can decide whether the delay fault under
consideration is present or not. Its density function is denoted r(x).
Equation (4.5) above shows how the number of test instances
n_one_fault depends on p. The expected value of the required number
of tests, E(n_one_fault), is therefore:
$$
E(n_{\mathit{one\_fault}}) = \int_{-\infty}^{\infty} \frac{1}{x}\, r(x)\,dx
$$
In Theorem 4.1 we show that E(n_one_fault) approaches positive
infinity. The consequence is that we need to determine a maximum
number of times that an instance of a test is repeated.

Theorem 4.1: The expected value E(n_one_fault) approaches positive
infinity.
Proof:
$$
E(n_{\mathit{one\_fault}}) = \int_{-\infty}^{\infty} \frac{1}{x}\, r(x)\,dx \qquad (4.6)
$$
$$
= \int_{0}^{1^+} \frac{1}{x}\left[ T_R\bigl(f(T_R x) + f(-T_R x)\bigr)
+ \delta(x-1)\left(\int_{-\infty}^{-T_R} f(t)\,dt + \int_{T_R}^{+\infty} f(t)\,dt\right)\right] dx
$$
$$
= \int_{0}^{1^+} \frac{1}{x}\cdot\frac{T_R}{\sigma\sqrt{2\pi}}
\left(e^{-\frac{(T_R x-\mu)^2}{2\sigma^2}} + e^{-\frac{(-T_R x-\mu)^2}{2\sigma^2}}\right) dx
+ \underbrace{\int_{-\infty}^{-T_R} f(t)\,dt + \int_{T_R}^{+\infty} f(t)\,dt}_{>\,0}
$$
$$
\ge \frac{T_R}{\sigma\sqrt{2\pi}} \int_{0}^{1} \frac{1}{x}
\left(e^{-\frac{(T_R x-\mu)^2}{2\sigma^2}} + e^{-\frac{(-T_R x-\mu)^2}{2\sigma^2}}\right) dx
$$
By the integration interval ending in 1+ we mean that the Dirac
impulse at x= 1 should be included in the integration. The two
exponential functions in the above expression are continuous and
positive, and they have no local minimum. The entire integrand is also
positive. Hence a lower bound on the integral can be found by
replacing each of those exponential functions with the smaller of its
function values at the end points of the integration interval.
$$
E(n_{\mathit{one\_fault}}) \ge \frac{T_R}{\sigma\sqrt{2\pi}}
\underbrace{\Biggl[\underbrace{\min\Bigl(e^{-\frac{(T_R\cdot 0-\mu)^2}{2\sigma^2}},\, e^{-\frac{(T_R\cdot 1-\mu)^2}{2\sigma^2}}\Bigr)}_{>\,\varepsilon_1}
+ \underbrace{\min\Bigl(e^{-\frac{(-T_R\cdot 0-\mu)^2}{2\sigma^2}},\, e^{-\frac{(-T_R\cdot 1-\mu)^2}{2\sigma^2}}\Bigr)}_{>\,\varepsilon_2}\Biggr]}_{>\,\varepsilon_3}
\cdot \underbrace{\int_{0}^{1}\frac{1}{x}\,dx}_{\to\,+\infty} \;\to\; +\infty \qquad (4.7)
$$
for some constants ε1 > 0, ε2 > 0 and ε3 > 0.
□
Determination of maximum number of repetitions of a test
instance
From Theorem 4.1 we know that the expected number of test
instances required to unambiguously classify a chip as faulty or fault
free is infinite. Therefore, we need to terminate the loop in the
algorithm Receiver after a reasonable number of iterations such that
the probability of “not able to judge” is acceptably low. Given that
probability, a maximum number of repetitions l of a test instance can
be computed. Let pk(n) be the probability that n tests can judge if the
fault is present or not. For a given probability p that one test can detect
if the fault is present or not, the probability that n tests detect if the
fault is present is 1 - (1 - p)^n. Because r(x) is the density function of p,
the function pk(n) can be computed as:
$$
p_k(n) = \int_{-\infty}^{+\infty} r(x)\,\bigl(1-(1-x)^n\bigr)\,dx \qquad (4.8)
$$
$$
= \int_{0}^{1^+}\left[\delta(x-1)\left(\int_{-\infty}^{-T_R} f(t)\,dt + \int_{T_R}^{+\infty} f(t)\,dt\right)
+ T_R\bigl(f(T_R x)+f(-T_R x)\bigr)\right]\bigl(1-(1-x)^n\bigr)\,dx
$$
$$
= \left(\int_{-\infty}^{-T_R} f(t)\,dt + \int_{T_R}^{+\infty} f(t)\,dt\right)\bigl(1-(1-1)^n\bigr)
+ T_R\int_{0}^{1}\bigl(f(T_R x)+f(-T_R x)\bigr)\bigl(1-(1-x)^n\bigr)\,dx
$$
$$
= \Phi\!\left(\frac{-T_R-\mu}{\sigma}\right) + 1 - \Phi\!\left(\frac{T_R-\mu}{\sigma}\right)
+ \frac{T_R}{\sigma\sqrt{2\pi}}\int_{0}^{1}
\left(e^{-\frac{(T_R x-\mu)^2}{2\sigma^2}} + e^{-\frac{(-T_R x-\mu)^2}{2\sigma^2}}\right)\bigl(1-(1-x)^n\bigr)\,dx
$$
In this expression Φ(s) is the probability that a stochastic variable with
a normal distribution, with expected value zero and standard deviation
unity, is smaller than s.
Equation (4.8) above gives the relation between the acceptable
probability for not being able to determine if a fault is present or not
and the maximum number of instances for a test to be repeated.
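Equation (4.8) can be evaluated numerically. The sketch below (helper names and the trapezoid-rule integration are our assumptions) computes pk(n) and picks the smallest l whose probability of an undecided outcome is below a given target:

import math

def pk(n, mu, sigma, T_R, steps=20_000):
    # Probability that n test instances can judge the fault, Equation (4.8).
    def Phi(s):
        return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))
    def f(t):  # normal density of t_l, Equation (4.4)
        return math.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    total = Phi((-T_R - mu) / sigma) + 1.0 - Phi((T_R - mu) / sigma)  # |tl| >= TR
    h = 1.0 / steps
    acc = 0.0
    for i in range(steps + 1):
        x = i * h
        w = 0.5 if i in (0, steps) else 1.0   # trapezoid weights
        acc += w * (f(T_R * x) + f(-T_R * x)) * (1.0 - (1.0 - x) ** n)
    return total + T_R * acc * h

def choose_l(max_undecided_prob, mu, sigma, T_R):
    # Smallest l whose probability of "not able to judge" is acceptable.
    l = 1
    while 1.0 - pk(l, mu, sigma, T_R) > max_undecided_prob:
        l += 1
    return l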
Average number of iterations needed
Let pl be the actual number of test instances performed, given l.
Parameter l was defined above as the maximum number of repetitions
of a test instance. The density function of pl is a modification of the
geometric probability distribution. Let ql(n) denote the density
function of pl.
$$
q_l(n) =
\begin{cases}
(1-p)^{n-1}\, p, & 1 \le n < l \\
\displaystyle\sum_{k=l}^{\infty}(1-p)^{k-1}\, p = (1-p)^{l-1}, & n = l \\
0, & n > l;\; n < 1
\end{cases}
\qquad (4.9)
$$
Let hl(p) be the expected value of pl given an l and a p. Then hl(p) is:
$$
h_l(p) = \sum_{i=-\infty}^{+\infty} i\cdot q_l(i)
= \left(\sum_{i=1}^{l-1} i\,(1-p)^{i-1}\, p\right) + l\,(1-p)^{l-1}
\qquad (4.10)
$$
Let nb denote the expected value of hl(p). This is the expected number
of test instances that will be performed when testing for a delay fault.
Function hl(p) is in fact a deterministic function of the stochastic
variable p with density function r(x).
$$
n_b = \int_{-\infty}^{+\infty} r(x)\, h_l(x)\,dx \qquad (4.11)
$$
$$
= \int_{0}^{1^+}\left[\delta(x-1)\left(\int_{-\infty}^{-T_R} f(t)\,dt + \int_{T_R}^{+\infty} f(t)\,dt\right)
+ T_R\bigl(f(T_R x)+f(-T_R x)\bigr)\right]
\left[\left(\sum_{i=1}^{l-1} i\,(1-x)^{i-1}\, x\right) + l\,(1-x)^{l-1}\right] dx
$$
To demonstrate the efficiency of the proposed method we have
computed the expected value of test instances required from Equation
(4.11). This computation is made for three different values of each of
the following two parameters:
1. Ratio between E(tl) (the expected value of tl) and TR
2. Probability of the fault in the tested link.
The value l, the upper limit on the number of test instances, is
chosen such that the probability of stopping a test before the chip's
faultiness is determined is one tenth of the probability that the fault is
present, pf. For moderate defect probabilities, this choice means that
about ten percent of the chips not classified as good nevertheless
do not have any defects. Table 4.1 shows the average number of test
instances required. The number of test instances required is larger for
smaller nominal values of tl and for chips with higher probability of
delay faults in links. The reason in both cases is that there is a larger
probability mass for values on tl closer to zero. This reduces the
probability that a test instance can determine if a fault is present or
not.
Table 4.1: Average number of test instances required

                  Fault probability pf
E(tl)/TR      0.1        0.01       0.001
1.0           2.6        1.8        1.4
0.5           4.8        3.3        2.6
0.1           24         16         13
From Table 4.1 we can see that a relatively small number of
repetitions of each test measurement is needed on average, especially
for chips with a low probability of fault. The table displays results for
relatively high fault probabilities; for a useful chip process the fault
probability for each fault cannot be as high as in these examples. In
such cases the figures are lower, but never lower than one. Hence it
can be concluded that just a few repetitions will be needed on average
in all practical cases.
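As a rough numerical counterpart to Table 4.1 (same assumptions and helper style as the sketch above; the function name is ours), Equation (4.11) can be evaluated as:

import math

def expected_instances(l, mu, sigma, T_R, steps=20_000):
    # Expected number of test instances n_b of Equation (4.11), where
    # h_l(x) is the expected iteration count of Equation (4.10).
    def Phi(s):
        return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))
    def f(t):
        return math.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    def h_l(x):
        return sum(i * (1 - x) ** (i - 1) * x for i in range(1, l)) + l * (1 - x) ** (l - 1)
    # Dirac mass at x = 1 (|tl| >= TR): one instance always suffices
    n_b = (Phi((-T_R - mu) / sigma) + 1.0 - Phi((T_R - mu) / sigma)) * h_l(1.0)
    h = 1.0 / steps
    for i in range(steps + 1):
        x = i * h
        w = 0.5 if i in (0, steps) else 1.0
        n_b += w * T_R * (f(T_R * x) + f(-T_R * x)) * h_l(x) * h
    return n_b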
Single read method
An alternative test method would only read the data at the clock edge
before arrival of the signal Write; the data on the clock edge after the
arrival of the signal Write would not be read. For good circuits the
number of iterations needed would be the same as for the method
presented above, but the process of only reading data once would
reduce the hardware overhead. The drawback, however, is that faulty
chips would not be detected during any instance of the test. Instead,
iteration would continue until the limit was reached. Upon reaching
this number the chip would be marked as faulty. So this alternative
method is inefficient if the probability for fault is high. Figure 4.5
shows how much worse the alternative method is in expected number
of iterations as a function of fault probability. The conditions are the
same as for the results presented in Table 4.1. There are three curves
representing three different values of E(tl)/TR. We can see from the
diagram that the difference in efficiency in terms of number of
iterations is only significant when the probability for delay fault is
large. For example when the fault probability is 0.001 the alternative
method needs about 25 percent more iterations than the method
presented above. This is a relatively high fault probability. For lower
fault probabilities the difference in number of iterations will be
smaller. The conclusion is that for moderate fault probabilities, usage
of the alternative method will only increase the expected number of
iterations slightly.
(Figure: difference in expected number of iterations, from 0% to 200%, plotted against fault probability from 0.001 to 0.1, with one curve for each of E(tl)/TR = 1.0, 0.5 and 0.1.)
Figure 4.5: Comparison between delay test methods
Discussion of results
There might be factors like ageing and temperature variations that
affect the value of tl. For a communication link with a value of tl being
positive and close to zero such factors might cause tl to become
negative. A negative tl means that there is a delay fault. For positive
values of tl a larger value means a larger fault margin. As described in
Subsection 4.1.1, a small |tl| relative to the receiver clock period time,
gives a small probability that one instance of the test can determine
whether the fault under consideration is present or not. A consequence
of this is that if the fault margin is small the probability that one
instance of the test can determine whether the fault is present, is also
small. This means that for chips with smaller fault margins the test has
a higher probability of not being able to determine whether the fault is
present. Hence, most of the chips for which the test cannot reach a
verdict are likely to have a low fault margin or to be faulty. Chips that
pass the test therefore have a higher fault margin on average compared
to the entire set of chips without the fault under consideration.
4.1.6. Testing systems where different clock
domains share a clock oscillator
In chips using the GALS approach, it is possible to generate clocks for
different clock domains from the same oscillator. In such systems the
phase difference between clocks in two different clock domains can be
anything, but it is relatively constant for each particular chip. As a
consequence, such systems do not have the non-determinism
described above. The testing technique for such dependent clock
domains is simpler than for independent ones, since in the
independent domains we have two unknown parameters, namely the
relative delay in control signals and the time varying phase difference
in the clocks. In dependent domains the phase difference is fixed.
This reasoning renders it sufficient to test whether a normal data
transfer works correctly under worst-case conditions. By
normal data transfer we mean that data is transferred as in a normal
operation. The worst case condition is when the signal Write arrives at
the receiver as early as possible and the data arrives at the receiver as
late as possible. In some cases, however, the test will pass although
the signal Write arrives at the receiver before data is stable. As long as
the phase difference between the clocks is constant, this is not a
problem because the clock edge on which the data is read will not
appear before the data is stabilized.
However, even when the clocks are dependent it might not be
possible to guarantee that the phase difference between the clocks is
constant. In such a case it would be necessary to perform a test that
confirms that the signal Write actually arrives after the data is stable,
for chips that pass the test. A measurement as in the test method
presented above can be used. However, repetition of such a
measurement will not be necessary in this case; it is enough to read the
data as in the single read method described in 4.1.5. The test finds that
the fault is absent if the data is read correctly at the last active clock
edge before arrival of the signal Write. Therefore such a measurement
always finds that the fault being tested for is absent whenever tl ≥ TR,
but for systems where 0 ≤ tl < TR it depends on where in the clock
interval that signal Write arrives at the receiver, see Subsection 4.1.1.
In such systems where 0 ≤ tl < TR, an option to make this test method
work is to implement test logic such that the two different clock
domains can be clocked with uncorrelated clocks during the test.
4.2 Method for scheduling wires as
victims
In many cases, closely packed buses interconnecting cores are laid out
on many interconnect layers to minimize chip area and wire length.
During testing for defects that cause too much crosstalk, the maximum
aggressor fault model [Cuv99] can be used to select one wire at a time
as the victim wire. This will, however, lead to a very long test time. The
probability is very low that a fault causing too much crosstalk affects a
pair of wires with many other wires between them. Therefore several
wires that are not too close to each other can be tested simultaneously
in order to reduce the test time. Deciding how close simultaneously
tested wires can be to each other is a tradeoff between test accuracy
and test time. Of course, this can only be done if we have information
regarding layout of the interconnecting wires. In this section we
propose a test method for crosstalk-fault detection that uses a small
programmable BIST hardware module. The hardware can be
programmed to provide the desired tradeoff between test time and
minimum distance between wires that are tested simultaneously. For
hard IP-cores programmability is especially useful because such cores
usually cannot be modified when they are included in a chip design.
4.2.1. Victim scheduling principles
In the presented method a test sequence is generated such that each
wire is scheduled as the victim once. As many wires as possible are
tested simultaneously as long as the distance constraint is met. The
goal is to minimize the number of sets of wires tested simultaneously.
This problem is equivalent to a graph-coloring problem. Let each line
be a node in a graph and let there be an edge between each pair of
lines that should not be considered as victims simultaneously. The
number of colors needed to color this graph is then proportional to the
number of test vectors needed to test the bus for crosstalk-faults.
Physical layout information is needed to generate the graph. If we
assume arbitrary layout there will be many different ways in which
edges will be generated. In this section we suggest a method that
makes some assumptions about the layout to keep the hardware small
and still flexible. These assumptions restrict the solution space of the
corresponding graph-coloring problem. The first
restriction is that lines are positioned in a mesh topology, so that
several metal layers are used and in each metal layer, wires are placed
at a uniform distance and in the same direction, as shown in Figure
4.6. The second restriction is that we also assume that the properties of
wires are the same in all metal layers used for the bus.
The features of the layout and the desired accuracy define whether
a pair of lines can be considered as victims simultaneously or not. Due
to the layout restrictions described above, simplification of these input
parameters can be used without any significant loss in accuracy. Let
z-direction represent positions in the height direction, such that different
metal layers have different positions in z-direction. Let y-direction
represent positions in the side direction across the wires within a layer
(see Figure 4.6). Let pitch distance refer to distances measured in units
of the wire pitch in each respective direction. In z-direction this means that the pitch
in each respective direction. In z-direction this means that the pitch
distance for two wires in adjacent metal layers is 1 and for two wires
with one metal layer in between, the pitch distance is 2. In y-direction
two adjacent wires in the same metal layer have a pitch distance of 1.
The pitch distances are integer values.
Figure 4.6: Multi-layered layout of interconnection wires (cross-section above the substrate; the y-axis runs across wires within a metal layer and the z-axis across metal layers)
Due to differences between dimensions in z-direction and y-direction,
a single minimum distance between wires that can be tested
simultaneously is an unnecessarily coarse parameter for deciding which
wires to consider as victims simultaneously. Instead we consider pitch
distance in y-direction and pitch distance in z-direction as two
different quantities. A table can be used that defines for each possible
pitch distance in one of the directions z and y, what the smallest
allowed pitch distance in the other direction would be.
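To make this formulation concrete, the sketch below (a minimal illustration of our own, not the tool flow used in this thesis; the function name and data layout are invented) builds the conflict graph from such a table and colors it greedily. Greedy coloring only gives an upper bound on the number of victim sets; the shift-register based method described in the following subsections restricts the solution space further in exchange for simple hardware.

# Minimal sketch (ours): wires are nodes and an edge joins two wires that
# are too close to be considered victims simultaneously, according to a
# distance table like Table 4.2 in Subsection 4.2.3. A greedy coloring
# then groups the wires into sets that can act as victims at the same time.
from itertools import combinations

def victim_sets(layers, wires_per_layer, dy):
    # dy[dz] is the minimum allowed y pitch distance for z pitch distance
    # dz; dy is non-increasing and wires further than len(dy) layers apart
    # never conflict.
    wires = [(z, y) for z in range(layers) for y in range(wires_per_layer)]
    conflicts = {w: set() for w in wires}
    for (z1, y1), (z2, y2) in combinations(wires, 2):
        dz, d = abs(z1 - z2), abs(y1 - y2)
        if dz < len(dy) and d < dy[dz]:
            conflicts[(z1, y1)].add((z2, y2))
            conflicts[(z2, y2)].add((z1, y1))
    color = {}
    for w in wires:  # assign the smallest color not used by any neighbour
        used = {color[n] for n in conflicts[w] if n in color}
        color[w] = next(c for c in range(len(wires)) if c not in used)
    groups = {}
    for w, c in color.items():
        groups.setdefault(c, []).append(w)
    return groups

# Constraints of Table 4.2: dy(0)=3, dy(1)=2, dy(2)=1, dy(z>=3)=0
print(len(victim_sets(4, 8, [3, 2, 1])))  # number of victim sets needed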
4.2.2. Basic idea of victim selection method
The basic idea of the method presented is to utilize a shift register to
decide which lines should be victims at the same time. One bit in the
shift register is used for each line to identify the line as a victim or an
aggressor. Logic value 1 in the bit of the shift register means that the
corresponding line should be considered as the victim and logic value
0 means that it should be considered as an aggressor.
Figure 4.7 shows how this shift register is used to select lines as
victims or aggressors. For the entire bus, one state machine designed
based on some test strategy is used to generate signals according to
96
TESTING OF CROSSTALK INDUCED FAULTS IN ON-CHIP INTERCONNECTS
how victims and aggressors should behave. These signals are denoted
A and V in this figure.
Figure 4.7: BIST hardware
Figure 4.8 illustrates the relationship of bits in the shift register to the
actual lines. Each row corresponds to a metal layer. The chain first runs
through each line in the highest metal layer from left to right, and then
continues in the same way through the next metal layer, and so on.
The boxes represent a cross-section of the wires’ layout and the
arrows represent how the bits in the corresponding shift register are
shifted. An initial assignment to the shift register is made to get the
correct distance between victims.
Figure 4.8: Shift register and cross section of bus
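As a toy illustration of this principle (a sketch of our own; the hardware is a shift register, modeled here as a rotating list), one shift step moves every victim marker one wire further along the chain:

# Toy model (ours) of the victim-selection shift register: a 1 marks a
# victim and a 0 an aggressor.
def step(reg):
    return reg[-1:] + reg[:-1]  # rotate the register by one position

reg = [1 if i % 5 == 0 else 0 for i in range(10)]  # a victim every 5th bit
for _ in range(5):  # after 5 steps every wire has been the victim once
    print(reg)
    reg = step(reg)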
4.2.3. Illustrative example
To illustrate this, we use a bus with 32 wires distributed equally to
four equivalent metal layers. For this illustrative example we assume
that layout constraints and tradeoffs between test time and test
accuracy have resulted in the values in Table 4.2 which gives the
minimum allowed pitch distance in y-direction between two lines that
can be victims simultaneously, given a pitch distance in z-direction.
Table 4.2: Example of distance constraints

  Pitch distance in z-direction    Minimum pitch distance in y-direction
  0                                3
  1                                2
  2                                1
  ≥3                               0
To decide the initial assignment to the shift register we use the
following method. First we select the upper left corner wire to be a
victim. After that we select as a victim the leftmost wire in the second
row that is not too close to the first victim wire. Similarly, we
select victims in the subsequent layers such that they form a slanting
line as shown in Figure 4.9, where the wires that are assigned as
victims after this operation are indicated as solid boxes.
Figure 4.9: Selection of initial victims (cross-section in the y-z plane; the initially selected victims are shown as solid boxes)
The next step is to assign the second victim in the first row. We take
the leftmost line that is not too close to any of the wires that have
been assigned as victims so far. In Figure 4.9 this line is shown
striped. After this operation we can see that the pitch distance between
the victims in the first row is five in this example. This tells us to
assign every fifth element on each row in the shift register as a victim
and the others as aggressors. In Figure 4.10, wires are denoted with
numbers to denote the order in which they should be considered as
victims.
Figure 4.10: Victim assignment order (the 32 wires, in shift-register order, are labeled cyclically 1, 2, 3, 4, 5, 1, 2, ...; all wires with the same label act as victims simultaneously)
In this case we were lucky, because the distance through the shift
register from the rightmost victim in row one to the first victim in row
two was equal to the pitch distance between the victims in row one.
Assume there were six wires in each layer instead of eight. Then we
would have to delay the bits between the rows with the help of extra
dummy flip-flops. In this example we would have to delay the bits by
two cycles. Figure 4.11 illustrates what the shift register looks like.
Figure 4.11: Need of dummy flip-flops (dummy flip-flops appended between consecutive rows of the shift register)
4.2.4. General description of method
In this subsection the method for selection of initial victims and
computing the number of dummy flip-flops is described formally for
the general case. The way in which dummy flip-flops should be
initialized is also described. Given w, the number of wires in each
metal layer, we compute two parameters: sd, giving the distance
between nearest victims in the shift register (including dummy
flip-flops), and nd, giving the number of dummy flip-flops in each row.
Another input is information about how close wires can be to each
other and still be considered as victims simultaneously.
This information is sufficient to design the shift register and to
initialize it. Every sd-th flip-flop in the chain, starting with the first
one, should be initialized to indicate that the corresponding wire
should act as a victim. Other flip-flops should be initialized to indicate
that the corresponding wire is an aggressor.
Let dy(z) be the minimal pitch distance in y-direction between two
wires that could be considered as victims simultaneously given a pitch
distance in z-direction. The domain of this function is the non-negative
integers and its range is a subset of the non-negative integers. This
function is non-increasing and dy(α) = 0 for all α greater
than some integer. Table 4.2 in Subsection 4.2.3 is an example of what
this function might look like.
Parameters sd and nd can be determined by the following
algorithm. The input data to the algorithm are the function dy(z) and
the variable w, which is the number of wires in each metal layer.
1   ALGORITHM Determine nd and sd
2     sd ← dy(0)
3     i ← 1
4     WHILE dy(i) > 0 LOOP
5       t ← i * dy(1) - dy(i)
6       u ← i * dy(1) + dy(i)
7       IF t < sd < u THEN
8         sd ← u
9       END IF
10      i ← i + 1
11    END LOOP
12    p ← MAXIMUM(0, sd - w)
13    q ← MAXIMUM(w, sd)
14    v ← (q + dy(1)) MODULO sd
15    IF v = 0 THEN
16      nd ← p
17    ELSE
18      nd ← p + sd - v
19    END IF
20  END ALGORITHM
Lines 2-11 of the algorithm can be illustrated with the help of Figure
4.9 in Subsection 4.2.3. The victims marked in solid black in that
figure are the basis for this part of the algorithm. The algorithm
imagines another victim on the uppermost layer and determines how
far to the left it can be placed without coming too close to any of the
victims marked in solid black. The distance between this second
victim and the black one in the upper left corner will be the value of
sd. At line 2 this second victim is placed as far to the left as possible
without coming too close to the solid black victim on the uppermost
line. In the while loop at lines 4-11 the remaining solid black victims
are considered in order, each one being checked to see whether the
second victim is too close to it (lines 5-7). If so, the second victim is
moved to the right (line 8). Lines 12-19 determine the number of
dummy flip-flops.
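The algorithm translates directly into executable form. The following transcription (our own; the function names are illustrative) reproduces the example of Subsection 4.2.3, giving sd = 5 with no dummy flip-flops for eight wires per layer, and sd = 5 with two dummy flip-flops per row for six wires per layer:

# Transcription (ours) of ALGORITHM "Determine nd and sd". dy is the
# distance-constraint function, here taken from Table 4.2.
def determine_sd_nd(dy, w):
    sd = dy(0)
    i = 1
    while dy(i) > 0:
        t = i * dy(1) - dy(i)
        u = i * dy(1) + dy(i)
        if t < sd < u:          # the imagined victim is too close:
            sd = u              # move it further to the right
        i += 1
    p = max(0, sd - w)
    q = max(w, sd)
    v = (q + dy(1)) % sd
    nd = p if v == 0 else p + sd - v    # dummy flip-flops per row
    return sd, nd

def dy(z):  # Table 4.2: dy(0)=3, dy(1)=2, dy(2)=1, dy(z>=3)=0
    return max(0, 3 - z)

print(determine_sd_nd(dy, 8))   # (5, 0), as in Figure 4.10
print(determine_sd_nd(dy, 6))   # (5, 2), as in Figure 4.11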
4.3 Method for test of crosstalk-faults
causing glitches
Section 3.2.4 illustrated how a glitch on a control line can cause the
transmitter and the receiver to lose their synchronization. Glitches on
control lines are more dangerous than glitches on data lines because of
the risk of the transmitter and receiver losing their synchronization. It
is also more difficult to test for glitches on the control lines than on
the data lines. In this section we first show a method for detection of
glitches on control lines and then show how this method can be
extended for detection of glitch faults that also affect data lines.
Glitches that make a wire take a higher potential than it should,
for a very short time, are referred to as positive glitches. Negative
glitches are those that make a wire take a lower potential than it
should.
4.3.1. Testing control lines for glitch faults
In an asynchronous link, synchronization between the receiver and the
transmitter is achieved using hand-shaking signals. During testing, the
transmitter and the receiver need to agree on what to test. Faults
causing glitches can under some circumstances make the transmitter
and the receiver lose their consensus about which phase of the test is
currently going on. We show how a test can be designed to avoid that
risk. This test is designed such that the signaling between the receiver
and the transmitter for agreement of the current test phase is not
sensitive to glitches not yet tested for. As the test proceeds, more and
more faults causing glitches are tested for. As soon as the absence of a
potential glitch fault has been determined, the signaling for agreement
of current test phase assumes that this potential glitch fault does not
exist. In this section we show how this can be achieved with the
control lines RTR and Write to let the test be able to detect both
negative and positive glitch faults on the control lines. We let RTR and
Write change as they do during normal operation.
The transmitter can make the data lines behave as aggressors
causing interference on the control lines. In some cases it might be
desirable to let wires outside the communication link being tested act
as aggressors as well. In such a case we assume that a mechanism in
the transmitter can be used to activate such aggressors. An extra
output control signal from the transmitter can be used for this purpose.
This assumption allows us to design the test logic at the behavioral level
of abstraction without information about the final layout.
To perform testing for glitches a glitch detector is used for each
glitch we are testing for. The glitch detectors are put at the receiver
side for Write and at the transmitter side for RTR. Each glitch detector
is an SR-latch. The control signal on which glitches should be
detected is connected to one of the inputs of this SR-latch. A glitch
test controller is connected to the other input and the output of this
SR-latch. Figure 4.12 shows one glitch detector for negative glitches
and one for positive glitches connected to a controller.
Figure 4.12: Glitch detectors with controller (one detector for negative and one for positive glitches on the victim wire; the controller drives the reset inputs and reads the latch outputs)
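As a behavioral illustration (a sketch of our own, not the actual circuit; the class and method names are invented), the SR-latch can be modeled as a flag that is set the moment the monitored line takes the level it is not supposed to have, and stays set until the controller resets it:

# Behavioral sketch (ours) of an SR-latch used as a glitch detector.
# detect_level = 1 catches positive glitches, detect_level = 0 negative.
class GlitchDetector:
    def __init__(self, detect_level):
        self.detect_level = detect_level
        self.caught = False                    # the latch output

    def reset(self):                           # driven by the test controller
        self.caught = False

    def sample(self, line_level):              # the real latch reacts
        if line_level == self.detect_level:    # asynchronously; here the
            self.caught = True                 # line is sampled instead

det = GlitchDetector(detect_level=1)
det.reset()
for level in [0, 0, 1, 0]:                     # a momentary 1 is a positive glitch
    det.sample(level)
print(det.caught)                              # True -> report fault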
The data transfer cycle can be considered as four different phases.
Figure 4.13 shows the control signals in those phases. The shaded
areas show phases when the respective signal is not supposed to
change. A change in any of those areas in normal operation is never
noticed. We utilize these areas for detection of glitches. In phase 1, for
example, we test for positive glitches on RTR.
Figure 4.13: Four phases of control signals (waveforms of RTR and Write over phases 1-4; shaded areas mark intervals in which the respective signal is not supposed to change)
Under normal operation, data lines are supposed to be stable in phase
4 and in phase 1. In these phases, negative glitches on RTR and
negative glitches on Write, respectively, are the ones that can cause
errors. This means that under normal operation the data lines will
never induce dangerous negative glitches on the control lines. Faults
causing negative glitches on RTR and on Write are tested in phase 3
and phase 4 respectively. Hence only aggressors other than the data
lines should be activated in these test phases.
4.3.2. Steps in the proposed test method
The transmitter can, in principle, test for glitches on RTR. In phases 1
and 3, testing can be done for positive and negative glitches,
respectively. The transmitter can simply make the aggressor wires change
and measure whether those changes affect RTR. It is important that glitches
do not affect the signal Write, because that could mislead the receiver to
believe that a change to phase 2 or phase 4 has occurred. With the
chosen polarity of the signals Write and RTR, these signals have
opposite voltage levels in phase 1 as well as in phase 3. Thus stimulus
at the aggressor wires causing glitches on RTR will not affect Write.
Glitches on Write are a little trickier to test, because the
transmitter activates aggressors, but it is the receiver that detects if a
glitch is present. Phases 2 and 4 are utilized for testing for positive and
negative glitches on Write respectively. When aggressors are activated
both RTR and Write might get glitches. Glitches on RTR might cause
the transmitter to believe that the current phase of the test is complete.
However, such glitches are tested in the preceding phase.
Phase 2 and phase 4 start when the transmitter changes the signal
Write. When one of these phases is entered the receiver needs to
prepare its glitch detectors for detection of glitches. It is, however, not
possible for the receiver to tell the transmitter that it is ready. Instead
the transmitter waits a sufficiently long time after it has changed Write
to ensure that the receiver is ready to detect glitches. A more detailed
description of the time needed for this test is presented in the next
subsection.
In phase 2 the transmitter changes the value on the aggressors
including the data lines to generate interference on Write. That change
on the data lines can be used by the receiver as a signal that aggressors
have been activated. The receiver can then check the glitch detector
and after that let the test proceed into the next phase.
In phase 4 data lines are not part of the aggressors. Some data line
can however be used by the transmitter to inform the receiver that
aggressors have been activated. This can be done by letting the
transmitter change the logic value of that data line when it activates
the aggressors.
4.3.3. Test sequence timing analysis
Transmitter                                Receiver

Initialize
  1. Write ← 1                             1. RTR ← 0
     Aggressor lines ← 00..0

Phase 1: Test for positive glitches at RTR
  1. Reset glitch detectors for
     positive glitches on RTR
  2. Aggressor lines ← 11..1
  3. Check glitch detector for
     positive glitches on RTR.
     If a glitch has occurred then
     report fault and exit test

Change to Phase 2
  1. Write ← 0                             1. Change to Phase 2 when
     Aggressor lines ← 00..0                  Write goes from 1 to 0

Phase 2: Test for positive glitches at Write
  1. Wait long enough to ensure            1. Reset glitch detectors for
     that the receiver is ready               positive glitches on Write
  2. Aggressor lines ← 11..1               2. Wait until data lines have
                                              changed to 1
                                           3. Check glitch detector for
                                              positive glitches on Write.
                                              If a glitch has occurred then
                                              report fault and exit test

Change to Phase 3
  1. Change to Phase 3 when                1. RTR ← 1
     RTR goes from 0 to 1

Figure 4.14: Test sequence for glitch faults on control lines
As argued above, phase 1 must precede phase 2 and phase 3 must
precede phase 4 in the test sequence. So the test needs to start in either
phase 1 or in phase 3. In the following description we let the test start
in phase 1. Forcing the system into phase 1 starts the test. This can be
done through a common signal or a sequence given to both the
transmitter and the receiver. The aggressor wires must simultaneously
be initialized to zero because they should change from zero to one in
this phase and a change from one to zero might destroy the test
sequence. A change of aggressors from one to zero in phase 1 might
cause a glitch on Write. This can cause the receiver to proceed to
phase 2 and further to phase 3 while the transmitter is still in phase 1.
Figure 4.14 shows the test sequence during phase 1 and phase 2, as
described in Subsection 4.3.2. The actions of the transmitter and the
receiver are shown. The test in phases 3 and 4 works analogously to
this.
To determine the efficiency of the test we analyze the time
required to perform the glitch test. The analysis assumes that clock
generators for different clock domains are independent. Let clkT and
clkR be the clock signals for the transmitter and the receiver
respectively. Let their clock periods be denoted by TT and TR
respectively. Let TTmin, TRmin, TTmax and TRmax be minimum and
maximum clock period time for the respective clock signals. Based on
this information we will compute the worst case time for this test.
Figure 4.15 shows a timing diagram for phase 1 and phase 2 of the
test. The signals g-res are the signals that reset the glitch detectors.
The signal aggr represents the aggressor wires including data wires
when applicable. The signal named enter represents a sequence or a
signal that makes the receiver and the transmitter enter a mode for
this test. There could be some time difference between the time point
at which the signal enter reaches the transmitter and the time point at
which it reaches the receiver. Let Tentermax be the maximum time
difference that can occur. In Figure 4.15 this time difference is
represented by the shaded area for the signal enter. For the signals
Write and RTR, the delay is represented by shaded areas in Figure
4.15. Let the worst-case delays of these signals be denoted TWritemax
and TRTRmax, respectively. There might be a delay from when the
transmitter activates the aggressors until the interference occurs. Let
Taggrmax be the worst case of this time. In Figure 4.15 this delay is also
represented by shaded areas.
Figure 4.15: Timing diagram for test time analysis of the glitch test (transmitter signals enter, clkT, Write, aggr and g-res; receiver signals clkR, RTR and g-res; waiting times n1·TT, n2·TT and n3·TT; the numbered points 1-4 mark the events referenced in the text)
For both the transmitter and the receiver, it takes in the worst case one
clock cycle from when the signal enter is asserted to indicate that the
test should start until an action is taken based on that change. Looking
at the transmitter, it thus takes at most one clock cycle until the signals
Write, data and g-res are assigned their initial values.
Before the transmitter sets its signal g-res to 0 it needs to wait
long enough such that the receiver has definitely set RTR to zero.
From this point of view, in the worst case the transmitter detects the
change of the signal enter directly after its change while the receiver
needs one clock cycle before it detects that change. In the worst case,
the signal enter arrives at the receiver Tentermax time units after it arrives
at the transmitter. The time from when the transmitter detects the
assertion of the signal enter until the change of RTR to 0 has reached
the transmitter is therefore at most Tentermax + TRmax + TRTRmax. Hence
the transmitter needs to wait at least n1 clock cycles from when it has
detected the change of the signal enter until it sets the signal g-res to 0
and proceeds with the test. Parameter n1 is given as:
$$n_1 = \left\lceil \frac{T_{enter\,max} + T_{R\,max} + T_{RTR\,max}}{T_{T\,min}} \right\rceil \qquad (4.12)$$
One transmitter clock cycle after that change of the signal g-res, the
aggressors are activated by changing them from 0 to 1 (point 1 in
Figure 4.15). After that, the transmitter waits long enough before it proceeds
such that possible disturbances have reached the glitch detector for
RTR. In this case, waiting long enough means that the change of the
aggressor wires should have been completed and any glitches on
RTR should have reached the transmitter. This time period is in the
worst case Taggrmax + TRTRmax. To ensure this, the transmitter needs
to wait n2 clock cycles, where n2 is defined as:
$$n_2 = \left\lceil \frac{T_{aggr\,max} + T_{RTR\,max}}{T_{T\,min}} \right\rceil \qquad (4.13)$$
The transmitter then checks its glitch detector and reports a fault if a
glitch was detected. Simultaneously, the signal Write is set to 0 to make the test
proceed into test phase 2 (point 2 in Figure 4.15). At the same time,
the aggressors are set to 0. After that, the transmitter needs to wait
long enough before it changes the values of the aggressors from 0 to 1
to generate interference such that the receiver can prepare for the test.
To do that preparation, the receiver needs to reset its glitch detectors
and it needs to keep the signal g-res high until the aggressor wires
have definitely been stabilized to zero. That time is Taggrmax. The
number of clock cycles nR1 the receiver needs to keep g-res at 1 is
therefore:
$$n_{R1} = \left\lceil \frac{T_{aggr\,max}}{T_{R\,min}} \right\rceil \qquad (4.14)$$
From the transmitter's point of view, in the worst case it takes TWritemax
time until the change of the signal Write reaches the receiver. After that it
takes up to TRmax time until the receiver detects the change and sets its
signal g-res to 1. The consequence is that the time from when the transmitter
changes the signal Write from 1 to 0 until it can be sure that
the receiver has set its signal g-res to 1 and then set it back to 0 is
TWritemax + TRmax + nR1 · TRmax time units. Therefore the transmitter
needs to wait n3 clock cycles to guarantee that the receiver is ready,
where n3 is given as:
$$n_3 = \left\lceil \frac{T_{Write\,max} + T_{R\,max} + n_{R1} \cdot T_{R\,max}}{T_{T\,min}} \right\rceil = \left\lceil \frac{T_{Write\,max} + (1 + n_{R1}) \cdot T_{R\,max}}{T_{T\,min}} \right\rceil \qquad (4.15)$$
After waiting, the transmitter activates the aggressors (point 3 in Figure
4.15). When the transmitter has activated the aggressors, the receiver
needs to wait long enough such that the activation has affected all the
aggressor wires and any glitches on the signal Write have
reached the glitch detector. That time is Taggrmax + TWritemax time units.
The number of clock cycles nR2 the receiver needs to wait is then defined
as:
$$n_{R2} = \left\lceil \frac{T_{aggr\,max} + T_{Write\,max}}{T_{R\,min}} \right\rceil \qquad (4.16)$$
After waiting, the receiver checks its glitch detector for the signal
Write (point 4 in Figure 4.15) and at the same time it changes RTR
from 0 to 1 to proceed to phase 3. In the worst case, the time it takes
from when the transmitter activates the aggressors (point 3 in Figure
4.15) until the receiver starts to wait its nR2 clock cycles is TWritemax +
TRmax time units. The time it takes from when the transmitter activates the
aggressors until the receiver changes RTR from 0 to 1 is:
$$T_{aggr\,max} + T_{R\,max} + n_{R2} \cdot T_{R\,max} = T_{aggr\,max} + (1 + n_{R2}) \cdot T_{R\,max} \qquad (4.17)$$
We sum up all the times described above to get the time needed from
when the test is activated until phase 3 is entered. This time is, in the
worst case:
$$T_{T\,max} + n_1 \cdot T_{T\,max} + T_{T\,max} + n_2 \cdot T_{T\,max} + n_3 \cdot T_{T\,max} + \left(T_{aggr\,max} + (1 + n_{R2}) \cdot T_{R\,max}\right) \qquad (4.18)$$
Test phase 3 and test phase 4 work analogously to test phase 1 and test
phase 2 but with a difference in the timing in the beginning. This
difference comes from the fact that the transmitter starts phase 3 when
RTR is 1 instead of when the signal enter is activated. When the
transmitter finds that RTR is 1, no more waiting time is needed to
ensure that RTR is stable. Only the glitch detectors need to be reset. It
is possible to let the signal g-res for the transmitter be 1 as early as in
phase 2 (not shown in Figure 4.15). In such a case the signal g-res can
be set to 0 as soon as the transmitter detects that RTR is set to 1. The
time from the moment when the receiver changes RTR to 1 until the
transmitter detects that change is, in the worst case:
$$T_{RTR\,max} + T_{T\,max} \qquad (4.19)$$
So the time from when test phase 2 is finished until phases 3 and 4 are
completed is:
$$T_{RTR\,max} + T_{T\,max} + T_{T\,max} + n_2 \cdot T_{T\,max} + n_3 \cdot T_{T\,max} + \left(T_{aggr\,max} + (1 + n_{R2}) \cdot T_{R\,max}\right) \qquad (4.20)$$
The time needed for the entire test is then:
$$\begin{aligned}
testtime &= T_{T\,max} + n_1 \cdot T_{T\,max} + T_{T\,max} + n_2 \cdot T_{T\,max} + n_3 \cdot T_{T\,max} \\
&\quad + \left(T_{aggr\,max} + (1 + n_{R2}) \cdot T_{R\,max}\right) \\
&\quad + T_{RTR\,max} + T_{T\,max} + T_{T\,max} + n_2 \cdot T_{T\,max} + n_3 \cdot T_{T\,max} \\
&\quad + \left(T_{aggr\,max} + (1 + n_{R2}) \cdot T_{R\,max}\right) \\
&= T_{T\,max}\,(4 + n_1 + 2 n_2 + 2 n_3) + T_{RTR\,max} \\
&\quad + 2\left(T_{aggr\,max} + (1 + n_{R2}) \cdot T_{R\,max}\right) \qquad (4.21)
\end{aligned}$$
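As a concrete illustration, the waiting parameters and the total test time of Equations (4.12)-(4.21) can be evaluated numerically. The sketch below is our own transcription of these formulas; the parameter values in the example call are arbitrary placeholders, not figures from the thesis:

# Transcription (ours) of Equations (4.12)-(4.21); all times share one unit.
from math import ceil

def glitch_test_time(TTmin, TTmax, TRmin, TRmax,
                     Tenter, TWrite, TRTR, Taggr):
    n1  = ceil((Tenter + TRmax + TRTR) / TTmin)       # (4.12)
    n2  = ceil((Taggr + TRTR) / TTmin)                # (4.13)
    nR1 = ceil(Taggr / TRmin)                         # (4.14)
    n3  = ceil((TWrite + (1 + nR1) * TRmax) / TTmin)  # (4.15)
    nR2 = ceil((Taggr + TWrite) / TRmin)              # (4.16)
    # (4.21): phases 1-2 plus the analogous phases 3-4
    return (TTmax * (4 + n1 + 2 * n2 + 2 * n3) + TRTR
            + 2 * (Taggr + (1 + nR2) * TRmax))

# Arbitrary placeholder values: equal nominal clocks, all wire delays and
# the skew of the signal enter equal to one clock period
print(glitch_test_time(TTmin=1.0, TTmax=1.0, TRmin=1.0, TRmax=1.0,
                       Tenter=1.0, TWrite=1.0, TRTR=1.0, Taggr=1.0))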
Let’s look at the case where the transmitter and the receiver have
clock generators designed to have the same frequency. Then we can
assume TTmax = TRmax and TTmin = TRmin. The accuracy of the clock
generator can be described by a parameter k such that TTmax = k *
TTmin. Let us also assume that the worst-case delays of the wires are the
same for all wires and that the maximal skew in the signal enter has
the same value, i.e. TWritemax = TRTRmax = Tentermax = Taggrmax. Let c be
defined such that TRTRmax = c · TTmax. In such a case the following
simplifications can be made for the parameters n1, n2, n3, nR1 and nR2:
$$n_1 = \left\lceil \frac{k T_{T\,min} + 2ck T_{T\,min}}{T_{T\,min}} \right\rceil = \lceil k + 2ck \rceil \qquad (4.22)$$

$$n_2 = \left\lceil \frac{2ck T_{T\,min}}{T_{T\,min}} \right\rceil = \lceil 2ck \rceil \qquad (4.23)$$

$$n_{R1} = \left\lceil \frac{2ck T_{T\,min}}{T_{T\,min}} \right\rceil = \lceil 2ck \rceil \qquad (4.24)$$

$$n_3 = \left\lceil \frac{ck T_{T\,min} + (1 + n_{R1}) \cdot k T_{T\,min}}{T_{T\,min}} \right\rceil = \lceil ck + (1 + \lceil 2ck \rceil) \cdot k \rceil \qquad (4.25)$$

$$n_{R2} = \left\lceil \frac{2ck T_{T\,min}}{T_{T\,min}} \right\rceil = \lceil 2ck \rceil \qquad (4.26)$$
Putting these values into Equation (4.21) and changing its parameters
according to the assumptions above gives:
$$testtime = \left(4 + \lceil k + 2ck \rceil + 2\,\lceil 2ck \rceil + 2\,\lceil ck + (1 + \lceil 2ck \rceil) \cdot k \rceil\right) T_{T\,max} + 3ck\,T_{T\,max} + 2\left(\lceil 2ck \rceil + 1\right) T_{T\,max} \qquad (4.27)$$
This function is non-decreasing in k and in c. The test time is
overestimated if these parameters are rounded upwards to integers:
$$testtime \le \left(6 + 3\lceil k \rceil + 15\,\lceil c \rceil \lceil k \rceil + 4\,\lceil c \rceil \lceil k \rceil^2\right) T_{T\,max} \qquad (4.28)$$
For systems with clocks whose accuracy is such that the fastest
possible clock is not more than twice as fast as the slowest possible
one, k is smaller than 2. In this case fewer than 50 clock cycles are
needed for c ≤ 1 and fewer than 88 clock cycles are needed to perform
this test for c ≤ 2. These figures show that the number of clock cycles
for this test is relatively small. To the best of our knowledge, this test
problem, which concerns detection of crosstalk-induced glitches on links
between different clock domains while avoiding test complications that can
occur due to glitches not yet tested for, has not previously been
addressed. Therefore the main importance of this contribution is the
method itself. The purpose of calculating these figures is to show that
the method can be executed within a relatively small number of
clock cycles.
4.3.4. Glitches on data lines
Glitches affecting data lines can cause erroneous reads if they occur
when the data is read. The reading is done in phase 4 (Figure 4.13).
Hence it is only in this phase that glitches on data lines can cause
errors. If the data lines are stable during this phase there is no risk that
any data line will cause severe glitches on any other. As a
consequence of this, all data lines can be considered as victims
simultaneously. The signal RTR is also stable, so this signal does not
cause severe glitches on the data wires. The only signal in the
communication link that might change close to when the data is read
is the signal Write which might change from 0 to 1 to indicate that
new data is available. This occurs if the receiver reads the data very
soon after this signal has changed from 0 to 1. In such a case the
change of the signal Write can result in positive glitches on some data
lines when the data is read. This is the only interference from within
the communication link that can affect the data wires severely. It
might also be desirable to test for interference from wires other than
those belonging to this communication link for some systems. Then
testing for both negative and positive glitches on the data wires can be
required.
To detect glitches in the data wires, glitch detectors as in Figure
4.12 can be attached to each data line. The test can be performed by
communicating data in the same way as would be done during normal
operation. All the data wires should then be assigned with zeros when
tested for positive glitches and with ones when tested for negative
glitches. The reset signal to the glitch detectors should be asserted
during phase 3 (Figure 4.13) and the glitch detectors should be
checked at the end of phase 4. If aggressor wires outside the wires in
this link need to be activated, this should be done when the control
signal Write is changed from 0 to 1.
4.3.5. Discussion of results
The presented test method for glitch testing of control signals targets
links between different clock domains. For this circumstance it is
quite fast; 50 clock cycles are enough for testing control lines if clock
generators are reasonably accurate and wire delays are smaller than
the clock period time. All data lines in a link can be tested in parallel.
If the test for glitches is to work accurately, the glitch detector
should be at least as sensitive to glitches as the logic used under
normal operation. By “as sensitive”, we mean that they should react to
glitches with the smallest amplitude and the shortest duration that can
affect the input logic used under normal operation. The nodes are
internally synchronous and the signal lines go to D-inputs of flip-flops
via some gates. Each gate that a signal passes through smoothes out
the signal somewhat in time. The effect is that the more gates a glitch
passes through, the less intensive it will be. The signal feeding the Dinput of a D-flip-flop goes through more gates than the signal to an
SR-latch. Therefore a minimum sized SR-latch, as in the glitch
detectors, is more sensitive to glitches than the D-flip-flop.
4.4 Conclusions
In this chapter we have presented how on-chip links between different
clock domains on a chip can be tested for crosstalk-induced faults.
Test methods have been provided to check for the two effects, change
in delay and occurrence of glitches, which crosstalk-faults can cause.
A method for scheduling of wires for simultaneous testing has also
been provided to speed up delay fault testing. The different methods
presented complement each other, forming a complete set of tests
for crosstalk-faults in asynchronous on-chip links. Simple
and purely digital hardware implementations have also been
developed, which can be used for the presented test methods.
The method for measuring change in delays due to crosstalk has
been shown to require only a small number of test trials on average to
label a chip good or faulty. We have also shown that this method has
the limitation that in the worst case, a finite number of test trials may
not be able to label a chip as good or faulty. However, the probability
of this outcome can be made arbitrarily small by increasing the
number of test trials.
Use of the methods presented in this chapter makes it possible to
have a programmable tradeoff between test time and test accuracy. In
the method for measuring change in delay, the limit on the number of
trials can be programmed. A higher limit makes the test take longer
but reduces the probability that a good chip is marked as faulty.
The method for scheduling victims uses a small BIST hardware
structure that can be programmed to a desired minimum distance
between wires that can be tested simultaneously. The test will be
faster if the wires being tested simultaneously are closer to each other,
but the risk that faults will not be detected is higher.
The method presented for detecting glitches caused by crosstalk
on the control lines does not provide for tradeoff between test time
and test accuracy, but on the other hand this test method is relatively
fast. We have also proposed a method for testing data wires for glitches.
Chapter 5
System level fault models
This chapter describes our contribution to test generation at the system
level of abstraction, which is based on a fault model at the system
level of abstraction. A NoC switch is used as a case study for the
system specific faults.
In Section 5.1, concepts related to system-level fault models are
introduced and NoC-switch specific fault models are proposed. Based
on these models, faults for a simplified NoC-switch are evaluated in
Section 5.2.
5.1 System level faults
As discussed before, system-level description of a design describes
what the design is supposed to do without including any
implementation information. A large number of implementations are
possible for the same specification depending on which synthesis
algorithms are used. System level fault models, which are independent
of any specific implementation, then need to be developed based on
what the system is supposed to do without any consideration about
how it should be implemented. This means that definitions of system
level faults must be based on consideration of the ways that externally
observable behavior of the system can be different from the expected
behavior during its various use-case scenarios.
5.1.1. Application area specific fault models
If a design gives the desired actions for all its use-case scenarios, then any
of its defect-free implementations will work correctly. On the other
hand, any non-redundant fault in the implementation will lead to
incorrect actions during at least one of the use-case scenarios. It is
only possible to identify use-case scenarios for a known design or
designs in a fixed application area. A consequence of this is that
system level faults need to be formulated specifically for a certain
application or a certain application area. This requires different system
level fault models for different types of applications. The NoC-switch,
which is used for illustration in this chapter, is an example of a type of
application. Another example is one-dimensional filters.
Fault models for a specific application area have the advantage
over fault models for a specific design that they are more general. This
means that they can be used for a wider range of designs, which gives
the advantage that fault models do not need to be invented for every
specific design. Because use-case scenarios of designs in a certain
application area have a lot in common, it seems not to be noticeably
harder to define fault models for a specific application area than for a
specific design. In the next subsection, faults are defined for
NoC-switches. The different types of NoC-switch designs can be considered
as an application area.
5.1.2. System level fault models for NoC-switches/routers
A NoC-architecture consists of switches and connections between the
switches. Each switch has connections to one or several other
switches. In the proposed NoC architectures all or most switches also
have connections to one or several cores. The function of the switch is
to forward packets toward their final destinations. The decision
regarding the output port through which a packet should be forwarded
is made by the switch based on routing information in the packet. This
routing information can simply be the destination address or it can be
more sophisticated information including information about the path
for the packet.
Several different designs have been proposed for NoC switches
[Asc08, Kim05, Lus10]. The functionality of NoC switches has,
however, many generic properties and use-case scenarios. Based on
these generic properties, the following system level fault types can be
defined:
Dropped data fault: Data received by the switch is lost and never emerges from the intended output port.

Corrupt data fault: Transported data is corrupted during its passage through the switch.

Direction fault: A data packet is routed in a different direction from the one prescribed by the destination address in the packet.

Multiple copies in space fault: A packet comes out through the intended port as well as through an unintended port.

Multiple copies in time fault: More than one copy of the sent packet comes out through the intended output port.
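For test generation purposes, these fault types can be encoded as plain data. A minimal sketch of our own (the names are illustrative, not taken from any NoC tool):

# One possible encoding (ours) of the system level fault types above. A
# concrete fault instance names the fault type, the input port a packet
# enters, the intended output port and, for the two misrouting types, the
# erroneous output port involved.
from enum import Enum
from dataclasses import dataclass
from typing import Optional

class FaultType(Enum):
    DROPPED_DATA = "dropped data"
    CORRUPT_DATA = "corrupt data"
    DIRECTION = "direction"
    MULTIPLE_COPIES_SPACE = "multiple copies in space"
    MULTIPLE_COPIES_TIME = "multiple copies in time"

@dataclass(frozen=True)
class SystemLevelFault:
    fault_type: FaultType
    in_port: str                      # e.g. "north"
    intended_out: str                 # e.g. "south"
    wrong_out: Optional[str] = None   # only for direction / copies-in-space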
Figure 5.1: System level faults for a NoC switch
Figure 5.1 illustrates the various fault types described above. For
example, the system-level fault type corrupt data means that data
from a certain direction going to a certain other direction gets
corrupted while being routed through the switch. This system level
fault is independent of the representation of data and its size.
The fault type direction fault is mostly related to faults in the
components that implement the routing algorithm while the fault type
corrupt data is mostly related to faults in the datapath transferring
packets.
Let’s look at the fault type dropped data which means that a
packet from a port, say port A, is supposed to go to another port, say
port B, but gets dropped. In testing for this fault, appropriate data
needs to be packaged into various fields of the packet entering port A
and other conditions need to be set up in its environment according to
the network protocols so that the effect of the fault can be observed
at port B.
The list of fault types for a NoC switch shown above is certainly
not complete or unique. Some other application-specific faults for a
NoC-switch can be defined. The purpose of the work presented in this
chapter is to use a set of faults to highlight the usefulness of system
level fault models for test generation.
5.2 Evaluation of system level fault
models
A set of fault models can be used to generate test data. We call a set of
fault models efficient if the test data generated based on the fault
models gives high coverage of the defects in the relevant
implementations. By relevant implementations we mean
implementations that are generated by the relevant synthesis tools. To
determine whether the system level faults are useful we need some
way to evaluate them. We have chosen to compare some of the
proposed system level fault models with the logic level stuck-at faults
in a logic level implementation. We use a simplified NoC-switch for
this evaluation.
5.2.1. Setup for experiments
Simplified NoC-switch design
To evaluate the proposed system level fault models, experiments have
been done on a crossbar. The crossbar is generally part of a
NoC-switch and can be considered as a simplification of a NoC-switch. The
considered crossbar has connections in four directions, named east,
south, west and north. This matches the directions in a NoC-switch for
a mesh topology. In a mesh topology NoC, an additional port is
needed to connect to a core. In this experiment for test evaluation this
connection is omitted for the sake of simplicity.
In each direction there is one output port and one input port.
Figure 5.2 shows the signals at the crossbar. A complete NoC-switch
has buffers at input and/or output ports. This is omitted from the
crossbar. The modeled crossbar is therefore a purely combinational
circuit. A further simplification compared to the NoC-switch has to do
with the address bits. In a NoC-switch the address bits identify the
final destination of the packet. In this crossbar there are instead only
two bits that specify a target output port for the packet. Due to this
simplification the output ports of this crossbar do not have any address
bits coming out.
Figure 5.2: A Simplified NoC-switch (crossbar with ports east, south, west and north; each input port has data, address and strobe inputs and an acknowledgment output, and each output port has data and data valid outputs and an acknowledgment input)
Each input port has two control signals, strobe and acknowledgement.
The signal strobe is an input signal to the crossbar and it is used to
indicate if there is valid data on the input port. The signal
acknowledgement goes out from the crossbar and it indicates if the
data could be transferred further via the desired output port.
Each output port has the control signals data valid and
acknowledgement. The signal data valid is an output signal and it is
used to indicate if there is valid data on the output data bits. The signal
acknowledgement at the output port is an input signal to the crossbar.
It is used to inform the crossbar if its environment is ready to accept
the data.
Several input ports might try to send data to the same output port.
In this crossbar static priorities are given to the input ports to decide
which of them is allowed to send its data in such a case of conflict. In
the crossbar used in this experiment, the number of data bits is
reduced to two which is much fewer than in typical NoC-switch
designs. Usually it is not useful to route a packet back and forth
between two switches because this results in network traffic which
does not participate in transferring the packet towards its final
destination. Therefore a switch usually never sends a packet back in
the direction it came from. No such constraint is imposed on the crossbar
used in these experiments.
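To make this behavior concrete, the following behavioral sketch (our own reading of the description above, not the synthesized design; all names and the priority order are assumptions) models one combinational evaluation of the crossbar with static input-port priorities:

# Behavioral sketch (ours) of the simplified crossbar: four ports, two
# data bits, a two-bit address selecting the output port, and a static
# priority (assumed here to follow the port order below) on conflicts.
PORTS = ["east", "south", "west", "north"]

def crossbar(inputs, out_ack):
    # inputs[p] = (data, addr, strobe); addr indexes PORTS
    # out_ack[p] = 1 if the environment accepts data on output port p
    out = {p: {"data": (0, 0), "valid": 0} for p in PORTS}
    in_ack = {p: 0 for p in PORTS}
    for p in PORTS:                        # static priority order
        data, addr, strobe = inputs[p]
        target = PORTS[addr]
        if strobe and not out[target]["valid"]:   # port not already taken
            out[target] = {"data": data, "valid": 1}
            in_ack[p] = out_ack[target]    # acknowledged only if the
    return out, in_ack                     # environment is ready

# Example: east sends to south (address 1); nobody else is sending.
ins = {p: ((0, 0), 0, 0) for p in PORTS}
ins["east"] = ((1, 0), 1, 1)
outs, acks = crossbar(ins, {p: 1 for p in PORTS})
print(outs["south"], acks["east"])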
Switch synthesis
For evaluation of the system level faults, this crossbar was synthesized
to the logic level. The relationship between the stuck-at faults in this
logic level implementation and the system level faults was then
analyzed to get figures for the relevance of the system level faults.
In the synthesis process to the logic level, the design is optimized
with the rugged script of the tool SIS [Sen92]. It is thereafter technology
mapped such that the logic level implementation only consists of
inverters and two-input AND, NAND, OR and NOR gates. The
number of logic level faults considered is 400; these are all logic level
stuck-at faults except faults that are dominated by others and
redundant faults. Among each group of equivalent logic level faults,
only one fault is considered.
Faults
In the experiments the system level fault types considered are dropped
data, direction faults and multiple copies in space. Test data for a fault
of type corrupt data, as well as for a fault of type dropped data, should
apply some data on a certain input port. Address, strobe and
acknowledge bits should be applied to let this data be output on a
certain port. Therefore for each fault of type corrupt data there is a
fault of type dropped data which is tested with the same test data. The
consequence is that the experiments give identical results for faults of
type corrupt data and faults of type dropped data. Because of that the
fault type corrupt data is not included in the experiments. The fault
type multiple copies in time cannot be applied to the crossbar because
it is a purely combinational design.
There are 16 possible faults of the type dropped data, one for each
combination of an input port and an output port. For the fault types
direction fault and multiple copies in space there are 48 different
faults of each type: there are sixteen combinations of an input port and
a desired output port, and for each such combination there are three
possible faults of each of these two types, one for each of the three
remaining output directions in which the packet erroneously goes in
the presence of such a fault.
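These counts are easy to verify by enumeration (a sketch of ours):

# Enumerating the fault instances counted above for the four-port
# crossbar: 16 dropped data faults, and 48 faults each for the direction
# and multiple copies in space types.
from itertools import product

ports = ["east", "south", "west", "north"]
dropped = [(i, o) for i, o in product(ports, ports)]
direction = [(i, o, w) for i, o in product(ports, ports)
             for w in ports if w != o]
print(len(dropped), len(direction))  # 16 48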
Fault simulation
Every possible input pattern (2^24 patterns) has been fault simulated both at the
logic level implementation and at the system level specification. The
logic level fault simulation has been performed with the help of the
Turbo tester [Jer98] tool. For each combination of a logic level fault
and a system level fault, the numbers of test vectors that detect only the
logic level fault, that detect only the system level fault, and that detect
both kinds of fault have been stored. These data are the basis for the
experiments done in this work.
5.2.2. Metrics for measurement of the
relevance of system level fault types
In this subsection we present two metrics for evaluation of the
relevance of system level faults. Both these metrics evaluate how
system level faults relate to stuck-at faults in a specific logic level
implementation. It should be mentioned that these evaluation metrics
are not intended to be used for generation of test vectors nor for
evaluating system level faults during design of a system. Their
purpose is instead to help to give figures on the relevance of system
level faults and system level fault models.
Relative fault coverage increase
The metric relative fault coverage increase is calculated separately for
each system level fault. It makes use of a subset of the stuck-at faults
in a specific logic level implementation. That subset is chosen such
that this metric gets as large a value as possible.
Let n denote the number of possible logic level stuck-at faults and
m the number of possible system level faults. We expect m to be much
smaller than n. Let Li be a stochastic variable that indicates whether the
i-th logic level fault is covered by a random test vector. Correspondingly,
let Sj be a stochastic variable that indicates whether the j-th system level
fault is covered by a random test vector. These stochastic variables are 1
when the fault they indicate is covered and 0 when it is not covered.
Let uij be defined as:

$$u_{ij} = P(L_i \mid S_j) - P(L_i) \qquad (5.1)$$
This means that uij is the difference between two probabilities. The
first probability is the probability that a random test vector detects the
i-th logic level fault, under the condition that this vector also detects the
j-th system level fault. The second is the probability that the i-th logic
level fault is detected by a purely randomly generated test vector. A
measure of the usefulness of a system level fault is the relative fault
coverage increase dj, which we define as:
Definition: Relative fault coverage increase

$$d_j = \sum_{i=1}^{n} \max(0, u_{ij}) \qquad (5.2)$$
The relative fault coverage increase dj for the j-th system level fault
can be interpreted in the following way: there exists a subset of the
logic level stuck-at faults in the implementation whose expected
coverage is dj larger if a test vector is generated based on the j-th
system level fault than if the test vector is generated purely randomly.
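Given exhaustive fault simulation results, both quantities in Equations (5.1) and (5.2) are straightforward to compute. A sketch of our own, assuming the detection data is available as one boolean list per fault with one entry per test vector:

# Sketch (ours) computing u_ij and d_j from fault simulation results.
# logic_det[i][v] / system_det[j][v] are True when test vector v detects
# logic level fault i / system level fault j.
def relative_fault_coverage_increase(logic_det, system_det, j):
    n_vec = len(system_det[j])
    n_sj = sum(system_det[j])                   # vectors detecting S_j
    d_j = 0.0
    for li in logic_det:
        p_li = sum(li) / n_vec                  # P(L_i)
        both = sum(1 for v in range(n_vec) if li[v] and system_det[j][v])
        u_ij = both / n_sj - p_li               # Equation (5.1)
        d_j += max(0.0, u_ij)                   # Equation (5.2)
    return d_j

# Toy data: two logic level faults, one system level fault, four vectors
logic = [[True, False, False, False], [True, True, False, True]]
system = [[True, False, True, False]]
print(relative_fault_coverage_increase(logic, system, 0))  # 0.25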
Expected increase in logic level fault coverage
The metric expected increase in logic level fault coverage is
calculated for a set of system level faults. This metric is a function of
the number of test vectors. Similar to the metric relative fault
coverage increase this metric is also computed based on a specific
logic level implementation. This metric is the difference in expected
logic level fault coverage between test vectors generated in a random
way based on a set of system level faults and completely randomly
generated test vectors. The random way to generate test vectors is
the naive test data generation method described in the algorithm
below. In this algorithm (naive test data generation),
t is the number of test vectors that should be generated based on the
set S of system level faults, and s = |S| denotes the number of system
level faults in S. The set V is the set of test vectors generated by this
algorithm.
ALGORITHM Naive test data generation
  V ← EMPTY SET
  A ← SELECT RANDOMLY t MOD s ELEMENTS IN S
  B ← S - A
  FOR ALL ELEMENTS a IN A LOOP
    C ← SELECT RANDOMLY ⌊t / s⌋ + 1 TEST VECTORS COVERING a
    V ← V ∪ C
  END LOOP
  FOR ALL ELEMENTS b IN B LOOP
    C ← SELECT RANDOMLY ⌊t / s⌋ TEST VECTORS COVERING b
    V ← V ∪ C
  END LOOP
END ALGORITHM
Observe that more test vectors than the number of system level faults
are generated in some cases. Unlike the case where a deterministic
coverage relationship between system level faults and physical defects
is the basis for the fault models, several test vectors for the same
system level fault are often better in the probabilistic sense. For
example, 80 percent of the test vectors covering a certain system level
fault might cover a specific physical defect, while another physical
defect might be covered by 75 percent of the test vectors covering the
same system level fault. Generating more test vectors for this system
level fault then gives a higher probability that those physical defects
will be covered.
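In executable form (our own transcription; cover is assumed to map each system level fault to the list of test vectors covering it, with at least ⌊t/s⌋ + 1 entries per fault, and the test vectors are assumed hashable, e.g. tuples):

# Transcription (ours) of the naive test data generation algorithm.
import random

def naive_test_data(S, cover, t):
    s = len(S)
    V = set()
    A = random.sample(S, t % s)       # faults that get one extra vector
    B = [f for f in S if f not in A]
    for a in A:
        V |= set(random.sample(cover[a], t // s + 1))
    for b in B:
        V |= set(random.sample(cover[b], t // s))
    return V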
5.2.3. Results of experiments
This subsection shows the results of the metrics presented above
applied to the crossbar described in Subsection 5.2.1. The system
level faults considered are dropped data, direction faults and multiple
copies in space.
Result of expected increase in logic level fault coverage
Figure 5.3 shows the expected logic level fault coverage when test
vectors are generated with the naive test data generation algorithm
described in Subsection 5.2.2. This algorithm is applied separately for
the set of system level faults dropped data, for the set of system level
faults direction faults and for the set of system level faults multiple
copies in space. The expected fault coverage with completely
randomly generated test vectors is shown in this figure as well.
As the metric expected increase in logic level fault coverage was
defined above, it is the difference between the curve for the respective
system level fault type and the curve for completely randomly
generated test vectors. Figure 5.4 shows this metric for the
set of system level faults dropped data, for the set of system level
faults direction faults and for the set of system level faults multiple
copies in space.
Figure 5.3: Logic level stuck-at fault coverage as a function of the number of test vectors
Figure 5.3 also shows the expected coverage of logic level faults when
test vectors are generated based on the logic level implementation.
There are two curves that are based on test vectors generated by the
tool Turbo tester [Jer98] which implements the PODEM algorithm
[Abr90]. This algorithm generates a set of test vectors that covers the
logic level faults. For this implementation of the crossbar, 42 test
vectors were generated. The logic level coverage is therefore 100
percent for 42 and more test vectors. For fewer test vectors, the curves
for logic level faults show the coverage when a subset of the vectors
generated by the PODEM algorithm is chosen. There are two curves
for expected logic level coverage. One of them shows the expected
logic level fault coverage when using a randomly chosen subset of the
vectors generated by the PODEM algorithm. The other curve shows
the logic level fault coverage when the subset of test vectors is picked
from the beginning of a list in which the test vectors generated by the
PODEM algorithm are sorted in decreasing order of logic level
coverage.
Figure 5.4: Expected increase in logic level fault coverage (in percentage points, versus the number of test vectors, 0-64, for the fault types dropped data, direction fault and multiple copies in space)
From these diagrams we can see that test data generated from system
level faults gives better logic level fault coverage than randomly
generated test vectors. For example, if 90% logic level fault coverage
is desired, about 15 test vectors generated from system level faults are
needed, while about 25 test vectors are needed if they are generated
randomly. From the diagrams we can also see that the logic level
coverage is, in the best case, more than 10 percentage points better
with test vectors generated from system level faults than with
randomly generated test vectors. These results indicate that system level fault
models have some potential to facilitate test data generation.
From the curve that shows coverage when vectors from the PODEM
algorithm are picked in an ordered manner, we can see that test data
generated with an algorithm that works at the logic level gives better
results than test data generated based on the system level faults.
However, generation of test vectors at the system level has the
advantage that testing can be considered earlier in the design phase
than if test vectors are generated at the logic level.
Generation of test vectors based on logic level stuck-at faults is a
much more mature research area than test generation at the system
level. Therefore there is probably more potential to improve test
generation methods at the system level than at the logic level. A fairer
comparison between system level faults and logic level faults would
therefore be to let the logic level coverage be represented by the curve
in which test vectors from the PODEM algorithm are picked
randomly. With a desired logic level fault coverage of about 90
percent or less, we can see from Figure 5.3 that test vectors generated
based on system level faults of type dropped data achieve higher fault
coverage than test vectors generated at the logic level in this way. For
test vectors based on faults of type direction fault and multiple copies
in space, Figure 5.3 shows that the expected logic level fault coverage
is about the same as for logic level generated test vectors when the
desired fault coverage is about 85 percent or less. These facts indicate
that utilization of system level faults can compete with test vector
generation at the logic level when the desired logic level fault
coverage is about 80 to 90 percent or less.
It is also worth noting that in Figure 5.4 the curve for test data
generated based on the dropped data fault has a peak at sixteen test
vectors. This is a good sign for the usefulness of this system level fault
model. There are sixteen different system level faults of this type for
the current design. With the algorithm described in the previous
subsection for generation of test vectors, at most one test vector is
generated for each dropped data fault when sixteen or fewer test
vectors are generated. When the set grows beyond sixteen test vectors,
each new vector is based on a dropped data fault for which one test
vector is already included in the set. The increase in expected fault
coverage from adding one test vector is larger when there are fewer
than sixteen test vectors in the set. This indicates that the dropped data
fault model correlates well with a subset of the logic level faults.
We can also see that when test vectors are generated based on
direction faults, the expected coverage is slightly greater than for test
vectors generated based on dropped data faults once the set contains
more than 38 test vectors. There are 48 different direction faults while
there are only sixteen dropped data faults. Given that the coverage for
a small set of test vectors is better if dropped data is used, we can
conclude that pairs of dropped data faults tend to vary more in which
logic level faults they relate to than pairs of direction faults. However,
when larger sets of test vectors are used, several test vectors need to
be generated from the same fault if the generation is based on the
dropped data faults. Because the test vector set generated based on
direction faults gives better coverage in this case, we can conclude
that the greater diversity among the dropped data faults is not enough
to compensate for their lower number compared to the direction
faults.
Relative fault coverage increase
As defined in Section 5.2, the relative fault coverage increase is a
relationship between one system level fault and the logic level faults
in a given logic level implementation. It is a value that estimates the
usefulness of one system level fault: the higher the value, the more
likely it is that the system level fault is useful. Table 5.1 shows the
largest, the smallest and the average relative fault coverage increase
for the system level faults of the three types considered in this
experiment.
Table 5.1: Parameter dj for system level faults

            Dropped data   Direction   Multiple
Min             18.2          17.5       19.6
Average         20.2          19.5       21.7
Max             21.4          20.7       23.0
Table 5.1 shows that the relative fault coverage increase is about 20
for each system level fault considered in the experiments. This
indicates that a test vector generated for a specific system level fault
can be expected to cover about 20 more logic level faults, within a
subset of the logic level faults, than a random test vector. This holds
for the system level faults and the logic level implementation
considered in this experiment. The result is in line with the result for
expected increase in logic level fault coverage, in the sense that it
indicates that system level faults can be useful for test generation. The
difference in relative fault coverage increase between the three system
level fault types considered is small; therefore it is not possible to
draw any conclusions from this metric about differences between
these fault types. Because the relative fault coverage increase
measures each system level fault separately, it has the limitation of
not being able to capture how the faults in a set of system level faults
correlate with each other.
5.2.4. Experiments to evaluate relative
effectiveness of system level faults
A final experiment in this chapter attempts to determine how
dissimilar faults of a certain type are to each other. For the system
level fault types dropped data, direction faults and multiple copies in
space, two ways of generating test vectors have been compared. The
first is the naive test data generation method described in Subsection
5.2.2. In that method, a set of test vectors based on a specific type of
system level fault is generated such that the number of vectors
generated from each system level fault of this type is distributed as
evenly as possible over the final test sequence. In the second method,
each test vector is generated without any consideration of how other
test vectors have been generated. The second method is described in
the following algorithm.
ALGORITHM Test vector generation without correlation between vectors
  result ← EMPTY SET
  FOR number of vectors to generate LOOP
    s ← SELECT RANDOMLY a system level fault
    t ← SELECT RANDOMLY a test vector that covers s
    result ← result ∪ {t}
  END LOOP
END ALGORITHM
Each test vector is thus generated by first randomly selecting a system
level fault of the type currently being considered and then randomly
selecting a test vector from among those that cover this system level
fault. With this second method, the probability is very low that every
system level fault gets a test vector when the number of test vectors
generated equals the number of system level faults they are based on.
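A minimal Python sketch of this second method is given below. The names faults and vectors_covering are hypothetical; in particular, vectors_covering stands for whatever mechanism enumerates the test vectors that cover a given system level fault.

import random

def generate_uncorrelated_vectors(faults, vectors_covering, count):
    # Second method: every vector is generated independently, with no
    # balancing over the fault list (unlike the naive method above).
    result = []
    for _ in range(count):
        s = random.choice(faults)               # random system level fault
        t = random.choice(vectors_covering[s])  # random vector covering s
        result.append(t)
    return result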
Figure 5.5: Fault coverage difference between two methods (difference in expected logic level fault coverage vs. number of test vectors, for dropped data, direction fault and multiple copies in space)
Figure 5.5 shows the difference, in percentage points, in the expected
logic level stuck-at fault coverage between test vectors generated with
the naive test data generation method of Subsection 5.2.2 and the
method just described in this subsection. Experiments have been
performed separately for the three system level fault types dropped
data, direction faults and multiple copies in space. We can see that the
expected coverage is greater when test vectors are generated with the
naive test data generation method. This shows that there is some
distinction between different system level faults of the same type; by
distinction we mean that different system level faults of the same type
correlate with different logic level stuck-at faults.
We can also see that for dropped data faults the largest difference
occurs at 16 test vectors. When 16 test vectors are generated with the
first method, there is one test vector for each fault. The peak at 16 thus
also indicates that there are considerable differences between the
individual faults.
5.3 Conclusions
In this chapter we have proposed to use application area specific fault
models at the system level of abstraction. Using a simplified
NoC-switch, the potential of this idea has been demonstrated by
comparing the logic level stuck-at fault coverage of test vectors
generated from system level faults with that of randomly generated
test vectors. The experiments show that usage of application specific
system level faults has some potential and is worth further
investigation.
Part C
Logic optimization
Chapter 6
Background and related
work in Boolean
decomposition
Part A of this thesis gave an introduction and background on system
design and testing. The current chapter offers a more focused
background on the subject of Boolean decomposition and presents
related work. The main objective of Boolean decomposition is to
minimize the cost function at the logic level of abstraction in order to
reduce the number of components or the chip area needed to
implement a given Boolean function.
In Section 6.1 different types of decompositions are described.
Section 6.2 gives background on decomposition methods based on
binary decision diagrams; it also provides a deeper description of
non-disjoint decomposition and describes the notion of bound-set. This
is the background to the contributions presented in Chapter 7. Section
6.3 serves as a background to the contributions presented in Chapter 8,
which concern decomposition for logic with a gate depth of three.
Section 6.4 describes how decomposition can be used in areas other
than logic optimization.
6.1 Decomposition of Boolean functions
Decomposition of a Boolean function is the task of partitioning the
function into subfunctions.
6.1.1. Concepts and notations
Boolean function
A Boolean function is a function of Boolean variables with a Boolean
function value. Because of its mapping to a network of gates, the
function value is often called the output and its variables the inputs. A
Boolean function can also be incompletely specified; for such a
function the function value can be don't-care for some combinations
of input values. A multiple output Boolean function is a set of
Boolean functions with common inputs.
We will follow the standard notations for representing Boolean
functions. Capital letters are used for vectors or sets of Boolean
variables. Lower-case letters are used for single Boolean variables. A
bar above a variable or an expression indicates complementation.
Support set
The support set of a Boolean function is the set of inputs on which the
function depends. An input that does not belong to the support set of a
function cannot affect the output regardless of the values of other
inputs.
6.1.2. Disjoint and non-disjoint basic
decompositions
A decomposition of a Boolean function f(X) is a representation of the
type f(X) = h(g(Y), Z) where Y ⊆ X, Z ⊆ X and Y ∪ Z = X. Initial
theoretical work on decomposition theory was presented in [Ash59].
If Y and Z are disjoint sets, Y ∩ Z = ∅, the decomposition is disjoint;
otherwise the decomposition is non-disjoint. For example, the function
f(x1, x2, x3, x4, x5) = x̄1·x̄2·x5 + x̄1·x̄3·x5 + x̄1·x3·x5 + x1·x̄3·x5 + x1·x2·x3·x4 + x1·x̄2·x3·x4   (6.1)

can be written as

f(x1, x2, x3, x4, x5) = h(g(x1, x2, x3), x4, x5)

where

g(x1, x2, x3) = x1·x2·x3 + x1·x̄2·x3
h(g, x4, x5) = g·x4 + ḡ·x5.
This is a disjoint decomposition since no variables occur in both
functions g and h. Figure 6.1 schematically shows the implementation
impact of this decomposition.
Figure 6.1: Illustration of a disjoint decomposition (a block g(x1,x2,x3) feeding a block h(g,x4,x5), whose output is f(x1,x2,x3,x4,x5))
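As a quick sanity check, a decomposition like the one above can be verified by exhaustive enumeration. The following minimal Python sketch encodes f, g and h from the example and asserts that f(X) = h(g(Y), Z) holds for all 32 input assignments:

from itertools import product

def f(x1, x2, x3, x4, x5):
    # the sum-of-products form of Equation (6.1)
    return (((not x1) and (not x2) and x5) or
            ((not x1) and (not x3) and x5) or
            ((not x1) and x3 and x5) or
            (x1 and (not x3) and x5) or
            (x1 and x2 and x3 and x4) or
            (x1 and (not x2) and x3 and x4))

def g(x1, x2, x3):
    return (x1 and x2 and x3) or (x1 and (not x2) and x3)

def h(gv, x4, x5):
    return (gv and x4) or ((not gv) and x5)

assert all(f(*v) == h(g(v[0], v[1], v[2]), v[3], v[4])
           for v in product([False, True], repeat=5))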
As an example of a non-disjoint decomposition, consider the function

f(x1, x2, x3, x4, x5) = x1·x3·x4·x5 + x1·x3·x4·x5 + x2·x4·x5 + x2·x3·x4·x5 + x2·x3·x4 + x1·x3·x4·x5 + x2·x3·x4·x5.   (6.2)

This function can be written as

f(x1, x2, x3, x4, x5) = h(g(x1, x2, x3), x3, x4, x5)   (6.3)

where

g(x1, x2, x3) = x1·x2·x3 + x1·x2·x3 + x1·x2·x3
h(g, x3, x4, x5) = g·x5 + g·x3·x4 + g·x3·x4.
Figure 6.2 shows the impact of this decomposition schematically.
Figure 6.2: Illustration of a non-disjoint decomposition (x3 is an input to both g(x1,x2,x3) and h(g,x3,x4,x5))
In the example above, with a non-disjoint decomposition, variable x3
is an input to both function g and function h. Any number of variables
may be inputs to both of these functions, so every Boolean function
has non-disjoint decompositions in this sense. However, a non-disjoint
decomposition is usually more useful if the number of inputs going to
both function g and function h is small.
The contribution presented in Chapter 7 is a method for finding
disjoint decompositions.
6.1.3. Roth-Karp-decomposition, a
generalization
In the description of decomposition in Subsection 6.1.2, where a
function f(X) is decomposed into f(X) = h(g(Y), Z) with Y ∪ Z = X, all
variables are binary. A generalization of this kind of decomposition is
the Roth-Karp-decomposition, in which the range of function g is
extended to include more values than logic 0 and logic 1 [Cur62,
Rot62]. More precisely, this can be described as letting f(X) be a
Boolean function such that f(X) = h(g(Y), Z), with g and h being
multiple-valued functions of type g: B^|Y| → M and h: M × B^|Z| → B,
where M = {0, 1, 2, …, m − 1} and B = {0, 1}. It is always possible to
find a trivial Roth-Karp-decomposition for f(X) where |M| = 2^|Y|, but
such a decomposition is generally not useful. Normally a
Roth-Karp-decomposition needs a relatively small M to be useful.
The Roth-Karp-decomposition can be utilized in digital design by
coding the possible values in the set M into k = ⌈log2 |M|⌉ binary bits.
In this way the Boolean function f(X) can be written as
f(X) = hb(g1(Y), …, gk(Y), Z). In this function all variables and
function values are Boolean, so it can be implemented directly with
digital gates. The values in M can be coded into the functions g1 to gk
in many ways, and the chosen encoding affects the amount of
optimization that can be achieved.
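As a small illustration of this coding step, the sketch below (with hypothetical helper names) encodes the values of a set M with |M| = 4 into k = ⌈log2 |M|⌉ = 2 bits and recombines them, which is what the functions g1, …, gk and hb do around the multiple-valued value:

from math import ceil, log2

m = 4                # |M| = 4 possible values of g
k = ceil(log2(m))    # k = 2 binary functions g1, g2

def encode(value):
    # split a value from M into k bits, one per function g1..gk
    return tuple((value >> i) & 1 for i in range(k))

def decode(bits):
    # hb recombines the bits to recover the value in M
    return sum(b << i for i, b in enumerate(bits))

assert all(decode(encode(v)) == v for v in range(m))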
As for the basic decompositions described in Subsection 6.1.2, a
Roth-Karp-decomposition is also either disjoint or non-disjoint. The
definition is the same: the decomposition is disjoint if Y ∩ Z = ∅ and
non-disjoint otherwise.
6.1.4. Decomposition into multiple
subfunctions
It is possible that a Boolean function can be decomposed into multiple
subfunctions. In Subsection 6.1.3 we described how a
Roth-Karp-decomposition can be encoded into several bits; this is one
way to decompose a Boolean function into more subfunctions.
Another way is to form the functions as
f(X) = h(g1(Y1), …, gk(Yk), Z) where Yi ⊆ X for all i ∈ [1, k] and
Z ⊆ X. Subfunctions can be further partitioned into smaller functions
in a hierarchical manner.
Figure 6.3: Illustration of multiple decomposition (g1(x2,x3,x4) feeds g2(g1,x5,x6); h(x1,g2,g3) with g3(x6,x7,x8) produces f(x1,…,x8))
An example of how a function f(x1, …, x8) can be decomposed is
illustrated in Figure 6.3. The function is first decomposed into

f(X) = h(x1, g2*(x2, x3, x4, x5, x6), g3(x6, x7, x8)).

The function g2*(x2, x3, x4, x5, x6) is then further decomposed into

g2*(x2, x3, x4, x5, x6) = g2(g1(x2, x3, x4), x5, x6).

Observe that the first decomposition is non-disjoint, since x6 is an
input to both g2* and g3.
6.1.5. Multiple output functions
The decompositions described previously in this section deal with
single output functions. Most digital systems, however, have several
outputs. Subfunctions that can be used for several outputs are often
useful for minimizing the implementation cost. Decomposition that is
done individually for each output can nevertheless help to find
subexpressions that can be shared between several outputs.
For example, consider the following functions:

f1(x1, …, x6) = x̄1·x̄2·x̄3·x̄4·x̄5·x6
f2(x1, …, x6) = (x1·x2·x3·x4·x5 + x̄1·x̄2·x̄3·x̄4·x̄5) ⊕ x6   (6.4)

Function f2 can be decomposed into

g(x1, …, x5) = x1·x2·x3·x4·x5 + x̄1·x̄2·x̄3·x̄4·x̄5
f2(g, x6) = g ⊕ x6.

Then function f1 can be simplified to

f1(g, x1, x6) = g·x̄1·x6.
6.2 Decision diagram based
decomposition methods
This section gives background and presents related work specific to
the contributions presented in Chapter 7.
6.2.1. Properties of the disjoint
decomposition
In this subsection we first describe what a bound-set is and then we
present some methods for finding a bound-set. Thereafter the theory
of decomposition trees is described.
Bound-set
A property of Boolean functions closely related to the model of
disjoint decomposition is the concept of bound-set. Let f(X) be a
Boolean function with all variables in X belonging to the support set
of f(X), and let Y ⊆ X. Then Y is a bound-set if and only if there exist
functions g and h such that f(X) = h(g(Y), Z), where Z ⊆ X,
Y ∩ Z = ∅, Y ∪ Z = X and all variables and function values are
Boolean.
Initial work regarding bound-sets was presented by Ashenhurst
[Ash59]. However, Ashenhurst did not use the term bound-set. A
method to determine whether a subset of variables is a bound-set was
described in Ashenhurst’s article. That method works as follows.
Let a Boolean function have the input variables in the disjoint sets
Y and Z. A matrix is then created with one row for each combination
of variable assignments in set Y and one column for each combination
of variable assignments in set Z. Each cell in the matrix contains the
function value for the corresponding variable assignments. If the row
multiplicity is two, Y is a bound-set; if the row multiplicity is larger
than two, Y is not a bound-set. The row multiplicity in this context
means the number of distinct rows. A row multiplicity of one occurs
if none of the variables in set Y belongs to the support set. Figure 6.4
shows an example of such a matrix. There are two distinct rows,
hence it can be concluded that Y is a bound-set.
Figure 6.4: Matrix for bound-set check (one row for each combination of variable assignments in set Y, one column for each combination in set Z; the example matrix has two distinct rows)
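Ashenhurst's check translates directly into a small program. The following hedged Python sketch (the function name is illustrative) builds the matrix implicitly and counts the distinct rows:

from itertools import product

def is_bound_set(f, n, y_indices):
    # One row per assignment of Y, one column per assignment of Z;
    # Y is a bound-set iff the row multiplicity is at most two.
    z_indices = [i for i in range(n) if i not in y_indices]
    rows = set()
    for y in product([0, 1], repeat=len(y_indices)):
        row = []
        for z in product([0, 1], repeat=len(z_indices)):
            x = [0] * n
            for i, v in zip(y_indices, y):
                x[i] = v
            for i, v in zip(z_indices, z):
                x[i] = v
            row.append(f(*x))
        rows.add(tuple(row))
    return len(rows) <= 2

# Example: {x1, x2} is a bound-set of f = (x1 XOR x2) AND x3.
print(is_bound_set(lambda a, b, c: (a ^ b) and c, 3, [0, 1]))  # True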
With a bound-set there is an associated function: in the decomposition
f(Y ∪ Z) = h(g(Y), Z), where Y is a bound-set, the function g(Y) is
said to be associated with Y. The associated function is unique up to
complementation of its output.
Decomposition tree
For a Boolean function f(X) there are 2^|X| − 1 non-empty subsets of
the input variables. For every Boolean function, each single input
variable as well as the set of all input variables are trivial bound-sets.
Each subset of variables in X is a bound-set for some functions. An
example of a function for which every subset of variables is a
bound-set is the parity function f(x1, x2, …, xn) = x1 ⊕ x2 ⊕ … ⊕ xn.
In [Möh85] it is described how all bound-sets can be represented by
fewer than 2n bound-sets with the help of the concepts of strong and
weak bound-sets and the decomposition tree. Here n is the number of
inputs. A bound-set of a Boolean function is strong if every other
bound-set is a subset of it, a superset of it, or disjoint from it. The
decomposition tree is a rooted tree whose nodes represent the strong
bound-sets. An example of a decomposition tree is shown in Figure
6.7. Bound-sets that are not strong are called weak bound-sets.
The root node of a decomposition tree is the trivial bound-set
containing all variables, and its leaf nodes are the trivial bound-sets
that contain only one variable. Each strong bound-set B in the
decomposition tree is positioned such that the subtree rooted at B
contains all strong bound-sets that are subsets of B but no other nodes.
For a given Boolean function the decomposition tree is unique.
The function g(x) associated with the trivial bound-set of a leaf node
with associated variable x is either g(x) = x or g(x) = x̄. The inputs to
the function associated with any other node in the decomposition tree
are the outputs of the functions associated with its immediate
successor nodes. The functions associated with nodes in a
decomposition tree are unique up to complementation of inputs and
outputs.
The properties of the disjoint decomposition that make it possible to
represent all bound-sets with a decomposition tree are the following.
If two bound-sets Y and Z have the properties Y ∩ Z ≠ ∅, Y − Z ≠ ∅
and Z − Y ≠ ∅, we say that the bound-sets are overlapping. If Y and Z
are overlapping bound-sets, then Y ∩ Z, Y ∪ Z, Y − Z, Z − Y and
(Y − Z) ∪ (Z − Y) are bound-sets as well. Figure 6.5 illustrates this
implication.
Figure 6.5: Implication of bound-sets' overlap
A strong bound-set is either full or prime, and the decomposition tree
denotes for each strong bound-set whether it is full or prime. It is thus
also possible to determine the weak bound-sets from the
decomposition tree: for a node denoted as full, the union of the
variables in any subset of its immediate successor nodes is a
bound-set. All weak bound-sets can easily be found utilizing this
property. The distinction between full and prime bound-sets only
makes sense for nodes in the decomposition tree that have three or
more immediate successor nodes. Figure 6.6 summarizes the division
of bound-sets into different types.
types.
bound-set
strong
full
weak
prime
Figure 6.6: Types of bound-sets
The function associated with a full bound-set is a simple Boolean
operation, AND, OR or XOR, though its inputs and output may be
complemented. The function associated with a prime bound-set is a
Boolean function in which no bound-sets exist except the trivial ones.
Recall that the function associated with a non-leaf-node has one input
for each immediate successor in the decomposition tree.
Figure 6.7: Example of a decomposition tree (the root h over {x1,…,x7} is full; its successors are g1 over {x1,x2,x3}, which is prime, g2 over {x4,x5,x6}, which is full, and the leaf x7)
For example, the decomposition tree for the following function is
shown in Figure 6.7:

f(x1, x2, x3, x4, x5, x6, x7) = x1·x2·x3 + x̄1·x̄2·x̄3 + (x4 ⊕ x5 ⊕ x6) + x7

The functions associated with the three nodes that are not leaf-nodes
are:

g1(x1, x2, x3) = x1·x2·x3 + x̄1·x̄2·x̄3
g2(x4, x5, x6) = x4 ⊕ x5 ⊕ x6
h(g1, g2, x7) = g1 + g2 + x7.
The full bound-set g2 implies that {x4, x5}, {x4, x6} and {x5, x6} are
weak bound-sets. Similarly, the full bound-set h implies that
{x1, x2, x3, x4, x5, x6}, {x1, x2, x3, x7} and {x4, x5, x6, x7} are weak
bound-sets.
This theory about decomposition trees does not extend directly to
multiple-valued functions. According to [Dub97b], a class of
multiple-valued functions for which the theory of decomposition trees
holds is defined in the book [Von91].
6.2.2. Binary decision diagrams
Basics about binary decision diagrams
A Binary Decision Diagram (BDD) is an acyclic directed graph that
represents a Boolean function. In such a graph, the nodes with no
successors are called leaf-nodes. Each leaf-node represents one of the
two Boolean constants 0 and 1. There is one node with no
predecessors, called the top-node. Each node that is not a leaf-node
has two immediate successors and one associated input variable.
A BDD represents a Boolean function in the following way. Given an
assignment of the input variables, start a walk in the graph at the
top-node. Each non-leaf-node has an associated input variable and
two outgoing branches; one branch shows the path of the walk if the
associated input is assigned logic 0 and the other shows the path
when the associated input is assigned logic 1.
Figure 6.8: A BDD
Figure 6.8 shows an example of a BDD that represents the Boolean
function x1·x2. The direction of the graph is from top to bottom. A
dotted edge represents the outgoing branch the walk should follow
when the node's associated variable is assigned logic 0; a solid edge
represents the branch to follow when the associated variable is
assigned logic 1. These drawing conventions are used in all figures
with BDDs in this thesis.
Reduced ordered binary decision diagrams
Reduced Ordered BDDs (ROBDD) were presented in [Bry86].
Figure 6.9: Reduction of BDD
In a BDD, each node represents a subfunction. A ROBDD is reduced,
which implies that no node represents the same subfunction as
another node. One consequence of this is that a ROBDD has only two
leaf-nodes. The property of being reduced also implies that no node
has both outgoing edges connected to the same successor. Figure 6.9
illustrates how the BDD in Figure 6.8 can be changed into a ROBDD.
To the left is the BDD as it appeared in Figure 6.8. In the middle, the
nodes representing the same subfunction are replaced with one node;
in this example it is only the leaf-nodes with constant zero. The result
is that the left node with variable x2 has both of its outgoing edges
pointing to the same node (see the middle figure). This node is then
removed and its incoming edge is connected to the node its outgoing
edges were going to. The result is the BDD to the right in Figure 6.9,
which is a ROBDD. Observe that this description should not be
interpreted as an algorithm to reduce a BDD, because it only works
for some BDDs. A complete algorithm to reduce a BDD is beyond the
scope of this thesis and is therefore not described.
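A common way to sidestep explicit reduction, sketched below under the assumption of a fixed variable order, is to build ROBDDs hash-consed from the start: every node is created through a constructor that applies the two reduction rules, so a reduced-away node is never built. The tuple representation is illustrative only.

unique_table = {}

def mk(var, low, high):
    # rule 1: a node whose two branches agree is redundant
    if low == high:
        return low
    # rule 2: nodes representing the same subfunction are shared
    key = (var, low, high)
    return unique_table.setdefault(key, key)

# Building x1 AND x2 bottom-up (order x1 < x2; leaves are 0 and 1):
n_x2 = mk('x2', 0, 1)        # dotted (0) edge -> 0, solid (1) edge -> 1
n_top = mk('x1', 0, n_x2)    # x1 = 0 makes the function 0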
In a ROBDD the variables are ordered. This means that for each
possible pair of walks through the BDD, the variables that appear in
both walks appear in the same order. In Figure 6.9 the variables are
ordered in all BDDs, and the rightmost BDD is also reduced and is
therefore a ROBDD. There exists only one ROBDD for a given
Boolean function and a given variable order. For a more thorough
description of ROBDDs, see [Bry86].
Multiple output functions
A multiple output function corresponds to a combinational circuit with
more than one output. Each output is a separate Boolean function, and
the separate Boolean functions usually have the same input variables.
It is possible to represent each output function with a separate
ROBDD. Another possibility is to let the different output functions
share nodes with common subfunctions in one ROBDD. It is then
practical to use squares at the top of the ROBDD, as shown in Figure
6.10, to indicate which top-node corresponds to which output. Figure
6.10 shows the ROBDD for the functions f1(x1, x2) = x1·x2 and
f2(x1, x2) = x1 + x2.
Figure 6.10: ROBDD for two functions
Using this type of representation for multiple output functions, as in
Figure 6.10, can be advantageous for some functions. It requires that
the variable order in the ROBDD is the same for all output functions.
For some multiple output functions it is more efficient to use a
separate ROBDD for each output, because the freedom to choose the
variable order individually for each output function makes the total
number of nodes much smaller than in the type of multiple output
ROBDD illustrated in Figure 6.10.
Upper bound on size of implementation
There is a direct mapping between a BDD representation of a Boolean
function and an implementation with two-input multiplexers. Figure
6.11 illustrates this with an example: Figure 6.11a shows a BDD for
the function f(x1, x2, x3) = x1·x2·x3 + x1·x2·x3 + x1·x2·x3 and Figure
6.11b shows the corresponding implementation with two-input
multiplexers.
Figure 6.11: Implementation with multiplexers (a: the BDD; b: the corresponding multiplexer network)
The multiplexer implementation directly corresponding to a BDD
representation is not normally optimal, but it gives an upper bound on
the implementation size: one two-input multiplexer per non-leaf-node
in the ROBDD is sufficient to implement the function. A two-input
multiplexer can in turn be implemented with three two-input
NAND-gates and an inverter.
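The upper bound is easy to compute for the tuple-based nodes of the previous sketch; the helper below (an illustrative name) simply counts the distinct non-leaf-nodes reachable from the root:

def mux_upper_bound(node, seen=None):
    # one two-input multiplexer per non-leaf ROBDD node is sufficient
    if seen is None:
        seen = set()
    if node in (0, 1) or node in seen:
        return 0
    seen.add(node)
    _, low, high = node
    return 1 + mux_upper_bound(low, seen) + mux_upper_bound(high, seen)

# For the AND example above: at most 2 multiplexers, hence at most
# 2 * (3 NAND gates + 1 inverter) = 8 gates.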
6.2.3. Bound-sets and variable order of
ROBDDs
To check whether a set of variables Y is a bound-set of a function, a
ROBDD can be created with the variables in Y above the others. With
above we mean in this context that nodes with associated variables in
Y come before all other nodes on every possible walk through the
ROBDD. We use the term cut in the ROBDD to represent this: the cut
is a line such that nodes associated with some variables, in this case
Y, are above it and all others are below. We use the term cut-node for
a node below the cut that is connected to an edge from above the cut.
The number of cut-nodes is the same as the row multiplicity in the
corresponding decomposition chart described in Subsection 6.2.1.
Figure 6.12: ROBDD with a cut
An example of a ROBDD of a function f(x1, x2, x3, x4) is shown in
Figure 6.12. This ROBDD has a cut such that variables x1 and x2 are
above the cut and the others are below. The cut-nodes are indicated
with an extra circle. The number of cut-nodes is two, so the set
{x1, x2} is a bound-set. This means that there exist functions g and h
such that f(x1, x2, x3, x4) = h(g(x1, x2), x3, x4). Figure 6.13 shows the
ROBDDs for the functions h and g. Note that the ROBDD for
function h has the same structure as the part of the ROBDD for f
below the cut in Figure 6.12; the nodes above the cut are replaced
with one node with the function value g as associated variable, whose
outgoing edges go to the two nodes that are cut-nodes in the ROBDD
for f. Function g has the same structure as the part of the ROBDD for
f above the cut, with the cut-nodes replaced by the leaf-nodes 1 and 0.
It can be decided arbitrarily which of the terminal nodes in the
ROBDD for function g is 1 and which is 0. The choice affects the
node with associated variable g in the ROBDD for function h, in the
sense that its outgoing edges should be interchanged if the terminal
nodes in the ROBDD for function g are interchanged.
Figure 6.13: ROBDD for subfunctions
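The cut-node count is equally direct to compute on the tuple-based nodes used in the earlier sketches; the helper below (an illustrative name) collects the nodes first reached at or below a cut placed under the variables in top_vars:

def cut_nodes(root, top_vars):
    # nodes (or leaves) reached by an edge coming from above the cut
    cut = set()
    def walk(n):
        if n in (0, 1) or n[0] not in top_vars:
            cut.add(n)          # first node at or below the cut
        else:
            walk(n[1])          # dotted (0) branch
            walk(n[2])          # solid (1) branch
    walk(root)
    return cut

# len(cut_nodes(root, {'x1', 'x2'})) == 2 would show that {x1, x2}
# is a bound-set, matching the check illustrated in Figure 6.12.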
For many functions, the size of the ROBDD is highly dependent on
the variable order; an example is described in [Bry86]. Changing the
order of variables in a ROBDD is a relatively expensive operation in
terms of computation time. Therefore, when searching for bound-sets
it is not efficient to reorder the ROBDD so as to move each subset of
variables to be checked to the top. In Chapter 7 an algorithm is
presented that finds every bound-set among all linear intervals of
variables in the ROBDD. A linear interval of variables in this context
is a set of variables such that each variable not in the set is either
before all or after all variables in the set, where before and after refer
to the variable order in the ROBDD. This restriction is less severe
than it may seem, because for most Boolean functions there exists a
variable order in which the ROBDD has a minimal number of nodes
and in which all subsets of variables forming strong bound-sets
appear in linear intervals.
6.2.4. Related work about BDD based
decomposition
In [Ash59] it was shown how row multiplicity can be used to check
whether a subset of variables is a bound-set, as described in
Subsection 6.2.1. The number of cut-nodes in a ROBDD is equal to
the row multiplicity in a decomposition chart. This has been utilized
by a number of BDD-based decomposition algorithms, including
[Cha96, Lai93, Saw98].
In Figure 6.11 it was shown how a BDD maps directly to an
implementation with multiplexers. Pass transistor logic is a way to
connect transistors to implement logic of that type. Shelar and
Sapatnekar [She01] have observed that such implementations often
produce unnecessarily long delays, and they have shown how a BDD
can be partitioned into several BDDs, resulting in implementations
with smaller delay. Their partitioning method results in a
Roth-Karp-decomposition with the multiple-valued variable encoded
into binary variables, each of which is represented by a BDD.
Stanion and Sechen [Sta95] presented a method which finds
decompositions of the form f(X) = g(Y) • h(Z), where the bullet is any
binary Boolean operation, Y ∪ Z = X and |Y ∩ Z| = k for some k ≥ 0,
with k relatively small. This type of decomposition is referred to as
bi-decomposition. Mishchenko et al. [Mis01] presented another
method to find bi-decompositions. Both methods use BDDs in
efficient implementations.
ROBDDs are themselves a kind of decomposed representation of a
function, and there are methods that exploit the structure of ROBDDs
to find disjoint decompositions. In [Kar88] the classical concept of a
dominator on graphs [Len79] is extended to 0,1-dominators on
ROBDDs. A node v is a 1-dominator if every path from the root to the
one-terminal-node contains v; likewise, a node v is a 0-dominator if
every path from the root to the zero-terminal-node contains v. If v is a
1-dominator, then the function represented by the ROBDD possesses
a disjoint AND-decomposition. This means that the inputs of the
function can be divided into a set of groups where each input belongs
to exactly one group; each group is a bound-set of the function, and
the function value is an AND-function of the function values
associated with these bound-sets. If v is a 0-dominator, we get the
same type of decomposition but with an OR-function instead of an
AND-function. Yang et al. [Yan99] extended this idea to XOR-type
decompositions and to more general types of dominators. Minato and
De Micheli [Min98] presented an algorithm that computes disjoint
decompositions by generating an irredundant sum-of-products for the
function from its BDD and applying factorization.
The algorithm presented by Bertacco and Damiani [Ber97] makes a
single traversal of the BDD to identify the decompositions of the
cofactors and then combines them to obtain the decomposition of the
entire function. However, as observed by Sasao and Matsuura
[Sas98], it fails to compute some of the disjoint decompositions. This
problem was corrected by Matsunaga [Mat98], where the missing
cases in [Ber97] were added so that OR-functions and XOR-functions
are treated correctly. The algorithm in [Mat98] appears to be one of
the fastest existing exact algorithms for finding all disjoint
decompositions.
6.3 Decomposition for three-level logic
synthesis
This section serves as a background to the contributions presented in
Chapter 8. Subsection 6.3.1 describes the type of three-level logic
used in this thesis and Subsection 6.3.2 describes related work in this
area.
6.3.1. Three-level logic
Three-level logic is logic with a gate depth of three. The
decomposition type for three-level logic considered in this thesis can,
for a Boolean function f(X), be expressed as f(X) = g1(X) • g2(X),
where the bullet (•) is a binary operator and g1 and g2 are Boolean
functions represented in SOP-form. With g1 and g2 in SOP-form, the
bullet represents either an AND-operator or an XOR-operator.
Figure 6.14 shows an example of a logic circuit where the bullet
corresponds to an XOR-operator. Although the case where the bullet
is an AND-operator and the case where it is an XOR-operator seem
quite similar, the optimization strategies for the two cases differ
considerably. The contribution of this thesis in three-level logic
optimization is for the case where the bullet is an XOR-operator.
Expressions and networks of this type are referred to as
AND-OR-XOR logic. This contribution is presented in Chapter 8.
Figure 6.14: A three-level logic circuit with XOR-gate at third level
Three-level optimization is a trade-off between the flexibility of
multi-level optimization and the small gate depth of two-level
optimization. For many functions the required number of components
is smaller with three-level logic than with two-level logic. Three-level
optimization is particularly useful for PLA devices with logic
expanders.
As described above, the three-level implementation is built up of the
two functions g1 and g2, each of which is realized as a two-level
implementation. The outputs of these functions are connected to the
inputs of the gate at the third level. The cube representation, described
in Subsection 2.3.3, can be extended to three-level logic circuits. To
do this, a cube representation is made for both functions g1 and g2,
and implicants of both functions are included in it. Each implicant is
marked so that it can be determined whether it belongs to function g1
or to function g2. In the case where the third level of logic is an
XOR-gate, the minterms included in implicants from only one of the
functions g1 and g2 have function value 1; the minterms included in
implicants from both functions have function value 0, as do the
minterms not covered by any implicant. In the case where the third
level is an AND-gate, only minterms included in implicants from
both functions g1 and g2 have function value 1.
Figure 6.15: Karnaugh map used for AND-OR-XOR logic
In Figure 6.15 the cube representation of the function implemented in
Figure 6.14 is projected onto a Karnaugh map. The solid implicants
belong to function g1 = x1·x2 + x3·x4 and the dotted implicants belong
to function g2 = x1·x2 + x3·x4. Building f = g1 ⊕ g2 yields the function
represented by the zeros and ones in this Karnaugh map.
As for two-level logic, the number of implicants is useful as an
estimation of implementation cost. This cost estimation is especially
accurate when a PLA with a logic expander is used.
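The extended cube representation is easy to express in code. In the hedged Python sketch below, a cube is a mapping from input index to required literal value, the two cube lists stand for the marked implicants of g1 and g2, and the example cubes are hypothetical (in the spirit of Figures 6.14 and 6.15, whose exact literals are not reproduced here):

def sop(cubes, x):
    # two-level AND-OR plane: does any cube cover the minterm x?
    return any(all(x[i] == v for i, v in cube.items()) for cube in cubes)

def and_or_xor(g1_cubes, g2_cubes, x):
    # three-level network: two SOP planes joined by an XOR-gate
    return sop(g1_cubes, x) ^ sop(g2_cubes, x)

g1 = [{0: 1, 1: 1}, {2: 1, 3: 1}]   # e.g. x1*x2 + x3*x4
g2 = [{0: 1, 1: 0}, {2: 0, 3: 1}]   # hypothetical second plane

# This minterm is covered by a cube in both planes, so f = 0:
print(and_or_xor(g1, g2, [1, 0, 1, 1]))  # False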
6.3.2. Related work in three-level logic
Several methods for optimization of AND-OR-XOR logic have been
proposed [Cha97, Deb98, Dub97a, Dub99, Jab00, Jab02, Pra08,
Sas95]. For example, Pradhan et al. [Pra08] used a genetic algorithm
and focused more on power aspects than previous articles on
AND-OR-XOR minimization. The results of the algorithms presented
in these articles vary a great deal between different Boolean
functions: for some functions the result is much better than for
two-level logic, while for others there is no considerable difference.
The three-level optimization algorithms are quite time consuming, so
it is good to know a priori whether a Boolean function is likely to
benefit from AND-OR-XOR optimization. The contribution presented
in Chapter 8 of this thesis is a fast algorithm for estimating the benefit
of AND-OR-XOR optimization. In an algorithm presented by
Dubrova et al. [Dub99], a preprocessing step considers clusters of
intersecting cubes to predict the benefit of AND-OR-XOR
optimization. An algorithm that analyzes the structure of a BDD to
predict the benefit of minimization for XOR-type logic was presented
by Sun and Xia [Sun08].
6.4 Other applications of Boolean
decomposition
Boolean decomposition is one of the main operations performed
during logic optimization. Besides this, there are several other
situations where Boolean decomposition can be useful. Subsections
6.4.1, 6.4.2 and 6.4.3 briefly describe how decomposition can be
useful for circuit partitioning, for simplification of testing and for
power estimation in digital circuits.
6.4.1. Circuit partitioning
Partitioning is the process of dividing a circuit into two or more parts
such that each part fits into an available component. The types of
available components and interconnections define constraints on the
partitioning; minimizing the number of interconnections between
parts is often an important one. There are many articles about
partitioning algorithms, including [Dut96, Fid82, Kri84, Li06].

Some partitioning algorithms only deal with bi-partitioning, that is,
partitioning into two parts. In most cases the two parts should not
differ too much in size if the bi-partitioning is to be useful.
Bi-partitioning algorithms use some kind of balancing criterion to
achieve this; for example, there may be a criterion requiring that each
part contains 45% to 55% of the circuit.
Let f(X) be a Boolean function such that
f(X) = h(g1(Y), …, gk(Y), Z) where X = Y ∪ Z. This expression of
function f is the Roth-Karp-decomposition where the integer-valued
function is encoded into k binary functions. Assume that f(X) should
be bi-partitioned. Then one part of the function should be
implemented in one part, say A, and the rest of the function in another
part, say B. Assume that the inputs and outputs of the function are
available in both parts. Then the functions g1(Y), …, gi(Y) can be
implemented in part A for some i ≤ k, while function h and the
functions gi+1(Y), …, gk(Y) are implemented in part B. Then only i
interconnections between part A and part B are required.
In Section 6.3 a decomposition of the type f(X) = g1(X) • g2(X) was
described, and a contribution for such decompositions is presented in
Chapter 8. For many benchmark functions that benefit greatly from
this type of decomposition, the complexity of function g1 is in a
similar range as the complexity of g2. For such functions a good
bi-partitioning is to put function g1 in one part and function g2 in the
other; the operator represented by the bullet can be put in either part.
For this type of partitioning only one interconnection between the
parts is needed, assuming that the inputs are available in both parts.
6.4.2. Circuit partitioning for simplification
of testing
Subsection 2.4.3, regarding test generation, describes fault
propagation and fault activation. At the logic level of abstraction,
propagation and activation of faults are often the most complex parts
of test generation, and the more complex the design, the more
difficult they are. Decomposition can facilitate testing by partitioning
the circuit into smaller parts.
Figure 6.16: Decomposition facilitating testing (a: test of g(Y), with a constant assignment on Z; b: test of h(g, Z), with a constant assignment on Y)
The disjoint decomposition of type f(X) = h(g(Y), Z), where
Y ∪ Z = X and Y ∩ Z = ∅, is a good example of how decomposition
can facilitate testing. Figure 6.16a illustrates how the block that
implements function g can be tested: the variables in set Z are
assigned constant values such that the output of the block
implementing g is propagated through the block implementing h to
the output of the logic. Test vectors can then be applied to the set Y to
test the block that implements function g. Figure 6.16b illustrates the
test of the block that implements function h. The inputs in set Z and
the input from g are then assigned test vectors, while all inputs in set
Y except one are assigned constant values such that the remaining
input in Y is propagated through block g to block h. In this way test
vectors for function h can be applied at the inputs.
It is possible that the constant values applied to make a signal
propagate through a block cause the propagated signal to be inverted.
In such a case the same method can be used, but the test vectors and
the analysis of the results need to be adjusted accordingly.
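The constant assignment in Figure 6.16a can be searched for mechanically. The hedged sketch below assumes h is available as a Python callable taking g's value followed by the Z inputs; the helper name is illustrative:

from itertools import product

def sensitizing_assignment(h, n_z):
    # find constants for Z that propagate g's value through h,
    # so that test vectors applied on Y are observable at the output
    for z in product([0, 1], repeat=n_z):
        if h(0, *z) != h(1, *z):
            return z
    return None   # h masks g for every constant assignment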
6.4.3. Power estimation
Power consumption in CMOS devices is highly dependent on the
switching activity in internal nodes. The switching activity of a node
is a quantity that describes how often the node changes value.
Switching activity estimation is computationally difficult [Cho96,
Cos97]. If a disjoint decomposition is known and used in the
implementation, the switching activity can be computed separately
for the different parts into which the decomposition divides the
circuit, which simplifies the computation.
Chapter 7
A fast algorithm for finding
bound-sets
In this chapter a fast heuristic algorithm is presented that finds disjoint
decompositions of Boolean functions. The algorithm is referred to as
the Interval-cut algorithm in the following description.
7.1 Basic idea of Interval-cut algorithm
In Subsection 6.2.3 we described how a set of variables placed above
the other variables in a ROBDD can be checked to determine whether
it is a bound-set. A drawback of this method is that the set of variables
to be checked needs to be moved to the top of the ROBDD, which
requires computationally expensive reordering algorithms. The
Interval-cut algorithm can instead check any interval of variables that
are adjacent in the ROBDD. To do so, two cuts are used instead of
one. The upper cut is the boundary line between the variables to be
checked and the variables above them in the ROBDD; the lower cut is
the boundary line between the variables to be checked and the
variables below them.
Figure 7.1: Illustration of Interval-cut algorithm (a: the ROBDD of f with an upper and a lower cut; b, c: the sub-ROBDDs rooted at the upper-cut nodes; d, e: the associated functions obtained by replacing the lower-cut nodes with terminal nodes)
The ROBDD of the following Boolean function is shown in Figure
7.1a:

f(x1, x2, x3, x4) = (x1·x2)·x3·x4 + (x1·x2)·x4 + (x1·x2)·x3 + x1·x4

This function is used as an example. The variable set {x1, x2} should
be checked to determine whether it is a bound-set.
First, the cut-nodes of the upper cut which are above the lower cut are
identified. In this example there are two such nodes, those with
associated variable x1. The sub-ROBDDs rooted at these nodes
represent other Boolean functions; in this example, those functions
are shown in Figure 7.1b and Figure 7.1c. A necessary condition for
{x1, x2} to be a bound-set of the original function is that it is a
bound-set of those functions as well. In these sub-ROBDDs the
variables x1 and x2 are above the lower cut and the other variables are
below, so the number of cut-nodes with respect to the lower cut can
be checked for these subfunctions. If there are two, the set {x1, x2} is
a bound-set with respect to the subfunction. There are two cut-nodes
of the lower cut in both functions: in Figure 7.1b they are the node
with associated variable x4 and the terminal node with constant 1,
while in Figure 7.1c they are the two nodes with associated variable
x4. The necessary condition is thus fulfilled.
Given that this necessary condition is fulfilled, {x1, x2} is a bound-set
of the original function in Figure 7.1a if and only if the functions
associated with the bound-sets in Figure 7.1b and Figure 7.1c are
equal up to complementation of their function values. These
associated functions can be extracted by replacing the cut-nodes of
the lower cut with the terminal nodes 1 and 0, which is done in Figure
7.1d and Figure 7.1e. Because the ROBDD is unique for a given
variable order and a given function, two functions are equal if and
only if their ROBDDs with the same variable order are equal. The
ROBDDs in Figure 7.1d and Figure 7.1e are equal, hence {x1, x2} is a
bound-set of the original function. The polarity of the associated
function depends on which of the cut-nodes is replaced by terminal
node 1 and which is replaced by terminal node 0.
The Interval-cut algorithm uses this method to check every set of
variables that are adjacent in the ROBDD. It does not have to build
any new ROBDDs as in the demonstration example; rather, it
performs the analysis directly on the existing ROBDD. The adjacent
sets of variables are checked in an order such that the information
from one check can be reused for the next, see Section 7.3. Also,
some checks can be avoided because previous checks imply that a
given set cannot be a bound-set.
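The condition the Interval-cut algorithm checks can also be phrased on a truth table. The hedged Python sketch below checks the interval Y of variables with indices a+1, …, b exhaustively: every residual function obtained by fixing the variables above the interval must have row multiplicity at most two and must induce the same partition of the Y-assignments, which is the up-to-complementation equality of the associated functions. The real algorithm reads the same information off the ROBDD instead of enumerating a table; the helper name is illustrative.

from itertools import product

def interval_is_bound_set(f, n, a, b):
    reference = None
    for z1 in product([0, 1], repeat=a):            # variables above Y
        rows = {}
        for y in product([0, 1], repeat=b - a):     # the interval Y
            col = tuple(f(*(z1 + y + z2))           # variables below Y
                        for z2 in product([0, 1], repeat=n - b))
            rows.setdefault(col, set()).add(y)
        if len(rows) > 2:
            return False      # row multiplicity above two
        if len(rows) == 2:
            # compare g up to complementation via the induced partition
            partition = frozenset(frozenset(ys) for ys in rows.values())
            if reference is None:
                reference = partition
            elif partition != reference:
                return False  # the associated functions differ
    return True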
7.2 Interval-cut algorithm and formal
proof of its functionality
This section formally describes the functionality of the Interval-cut
algorithm and gives a proof of it.
7.2.1. Terminology, definitions and
notations
Let V be the set of nodes of a ROBDD G of an n-variable function
f(X). Every non-terminal node v ∈ V has an associated variable-index
index(v) ∈ {1, …, n}. We let these indices increase when going from
the top-node to the leaf-nodes in the ROBDD, so the index of the
top-node is 1. In order to have a unified notation in the proof of the
main result, we also let the terminal nodes have an index, which is
n + 1.
Definition: cut(i)
Let cut(i) be a boundary line in the ROBDD such that nodes with
associated variable-index index(v) ≤ i are above the cut and nodes
with index(v) > i are below it.

Definition: cut_set(G, i)
Let cut_set(G, i) denote the subset of nodes in the ROBDD G which
are below cut(i) and which have at least one edge connected to a node
above cut(i).

Definition: below_set(G, i) and above_set(G, i)
Let below_set(G, i) denote the nodes in the ROBDD G which are
below cut(i) and let above_set(G, i) denote the nodes that are above
cut(i).

Definition: sub_bdd(G, v)
Let v be a node in the ROBDD G; then sub_bdd(G, v) denotes the
sub-ROBDD of G that is rooted at node v.
Definition: trunc(G, i)
This operator is only applicable when |cut_set(G, i)| = 2. Let
trunc(G, i) be the part of the ROBDD G with the two nodes in
cut_set(G, i) replaced by the terminal nodes 0 and 1 and with all other
nodes below cut(i) removed. Which node in cut_set(G, i) is replaced
by terminal node 0 and which by terminal node 1 is decided
deterministically. This means that a pair of calls to trunc(G, i) cannot
give ROBDDs representing functions that are mutual inverses of each
other.
7.2.2. Algorithm and proof
Let a, b be integer values such that 0 ≤ a < b ≤ n. Let Y be the set of
variables whose indices i satisfy a < i ≤ b; that is, Y contains the
variables associated with the nodes in the ROBDD between cut(a)
and cut(b). Let Z denote the variables not in Y. Using this notation,
the pseudo code of the Interval-cut algorithm is shown below.
ALGORITHM Interval-cut algorithm (G, a, b)
  (V1, …, Vm) ← cut_set(G, a) ∩ above_set(G, b)
  FOR ALL i ∈ [1, m] DO
    Ui ← sub_bdd(G, Vi)
    IF |cut_set(Ui, b)| ≠ 2 THEN
      RETURN "Y is not a bound-set"
    END IF
  END FOR ALL
  FOR ALL i ∈ [2, m] DO
    IF NOT trunc(U1, b) ≡ trunc(Ui, b) THEN
      RETURN "Y is not a bound-set"
    END IF
  END FOR ALL
  h(g, Z) ← function of the ROBDD G with the nodes between cut(a)
            and cut(b) replaced by nodes with associated variable g;
            there will be one such node for each trunc(Ui, b), i ∈ [1, m]
  g(Y) ← function of the ROBDD trunc(U1, b)
  RETURN (h(g, Z), g(Y))
END ALGORITHM
Next, we prove that it computes the decompositions correctly.
Theorem 7.1: Interval-cut algorithm (G, a, b) above determines
unambiguously whether a decomposition f(X) = h(g(Y), Z) exists,
where X = Y ∪ Z, Y ∩ Z = ∅, a < b, Y is the set of variables between
cut(a) and cut(b) in the ROBDD G, and Z is the set of variables above
cut(a) and below cut(b).

Proof: Let Z1 be the variables above cut(a) and let Z2 be the variables
below cut(b); then Z1 ∪ Z2 = Z.
Let pi(Z1), i = 1, …, m, be the Boolean functions that are 1 for the
variable assignments of Z1 that lead the path to the top-node of the
respective sub-ROBDD Ui, and 0 for all other variable assignments of
Z1. Let fi(Y, Z2) be the Boolean function represented by the
sub-ROBDD Ui. The Boolean function f(X) can then be co-factored in
the following way:

f(X) = ∑_{i=1}^{m} pi(Z1)·fi(Y, Z2) + q(Z1, Z2)   (7.1)

Here q(Z1, Z2) collects the contribution of the assignments of Z1
whose paths lead to nodes below cut(b) without passing any of the
sub-ROBDDs Ui; such nodes depend only on Z2.

The set Y is a bound-set of fi(Y, Z2) if and only if
|cut_set(Ui, b)| = 2. The definitions of the functions fi(Y, Z2) and the
properties of ROBDDs imply that |cut_set(Ui, b)| ≥ 2 for all i, and
that at least one variable in Y belongs to the support set of fi(Y, Z2)
for all i. If Y is not a bound-set of fi(Y, Z2) for some i, then it follows
from Equation (7.1) that Y cannot be a bound-set of f(X).

In the case when |cut_set(Ui, b)| = 2 for all i, there exist functions hi
and gi such that the following holds for all i:

fi(Y, Z2) = hi(gi(Y), Z2)   (7.2)

Function f(X) can then be written as:

f(X) = ∑_{i=1}^{m} pi(Z1)·hi(gi(Y), Z2) + q(Z1, Z2)   (7.3)

If two functions gi are not identical up to complementation, Y cannot
be a bound-set. If they are all identical up to complementation,
Expression (7.3) can be written as follows, where g(Y) = g1(Y),
ci = 0 for the i where gi ≡ g1 and ci = 1 for the i where gi ≡ ḡ1:

f(X) = ∑_{i=1}^{m} pi(Z1)·hi(ci ⊕ g(Y), Z2) + q(Z1, Z2)   (7.4)

Hence Y is a bound-set of f(X). □
7.3 Implementation aspects and
complexity analysis
The formal description of the algorithm in Section 7.2 only describes
how a single subset of variables is checked to determine whether it is
a bound-set. The method becomes efficient, however, when all linear
intervals of variables are checked in the same run. This can be done
by moving the upper cut from the top to the bottom of the ROBDD in
a loop. For each upper cut it is then possible to identify the lower cuts
for which the set of variables between the cuts is a bound-set.

We show in this section how this can be performed with O(m³)
operations, where m is the number of nodes in the ROBDD. The
algorithm consists of two parts. First a list is generated with all
bound-sets that have adjacent variables in the ROBDD. In the second
step that list is processed such that bound-sets that are known to be
weak are removed from the list and bound-sets that are not known to
be weak are labeled as prime or full. It is possible that the list contains
weak bound-sets that have not been identified as weak; such a
bound-set will be labeled full if it is part of a full bound-set.

To show that the algorithm can check every linear interval of
variables in O(m³) operations in total, an implementation is described
in this section and it is proven that this implementation needs O(m³)
operations. The description of the algorithm is divided into
subfunctions. The function at the top of the calling hierarchy is
ImplementationIntervalCut, shown below. This function first calls the
subfunction GetBoundsetList and then calls ProcessFoundBoundset.
Each of these functions is called only once and the complexity of
each is O(m³). Because there is no loop in the top function
ImplementationIntervalCut, its complexity is also O(m³). Subsection
7.3.1 describes how the subfunction GetBoundsetList can be
implemented with O(m³) operations, and Subsection 7.3.2 shows the
same for the subfunction ProcessFoundBoundset.
FUNCTION strongBsList ← ImplementationIntervalCut(topNode)
    bsList ← GetBoundsetList(topNode)
    strongBsList ← ProcessFoundBoundset(bsList)
END FUNCTION
In the functions in this implementation, sets of variables are used. The sets of variables considered are only intervals of adjacent variables in the variable order of the ROBDD. Therefore, only the indices of the first and the last variable, or the equivalent (the upper cut and the lower cut in this description), are needed to represent a set of variables. Comparisons between variable sets and assignments of variable sets can then be made with O(1) operations.
There are indexed lists used in the description of the algorithms. Indexing of such lists starts with index 0.
There is at least one node in the ROBDD for each variable that the function depends on; therefore n ≤ m, where n is the number of variables.
7.3.1. Generating a list of bound-sets
The function with the algorithm for generation of the list of bound-sets is described with a main function and two subfunctions. The main function is GetBoundsetList. This function calls the two subfunctions GetLevelWhereStructuresDiffers and GetLevelsWithTwoCutNodes. The former subfunction takes two nodes of the ROBDD as input arguments. It computes the level at which the structures of the sub-ROBDDs rooted at these nodes differ. The latter subfunction takes one node of the ROBDD as input argument and returns a list with the cuts where the sub-ROBDD rooted at this node has two cut nodes. The complexity is O(m²) for both of these subfunctions.
1   FUNCTION bsList ← GetBoundsetList(topNode)
2       LENGTH(bsList) ← 0
3       LENGTH(U(1)) ← 1
4       U(1, 0) ← topNode
5       FOR i ← 2 TO n
6           LENGTH(U(i)) ← 0
7       END FOR
8       FOR i ← 1 TO n
9           nextLevelWithUpperCutNodes ← n + 1
10          FOR j ← n DOWNTO i + 1
11              IF LENGTH(U(j)) > 0
12                  nextLevelWithUpperCutNodes ← j
13          END FOR
14          differLevel ← n + 1
15          FOR j ← 1 TO LENGTH(U(i)) - 1
16              level ← CALL GetLevelWhereStructuresDiffers(U(i, 0), U(i, j))
17              differLevel ← MIN(differLevel, level)
18          END FOR
19          maxLowerCut ← MIN(nextLevelWithUpperCutNodes, differLevel)
20          levelList ← GetLevelsWithTwoCutNodes(U(i, 0), maxLowerCut)
21          FOR j ← 0 TO LENGTH(levelList) - 1
22              LENGTH(bsList) ← LENGTH(bsList) + 1
23              bsList(LENGTH(bsList) - 1).UpperCut ← i
24              bsList(LENGTH(bsList) - 1).LowerCut ← levelList(j)
25          END FOR
26          FOR j ← 0 TO LENGTH(U(i)) - 1
27              FOR EACH SUCCESSOR NODE s OF U(i, j)
28                  p ← index of variable associated with s
29                  IF s NOT IN U(p)
30                      ADD s TO U(p)
31                  END IF
32              END FOR EACH
33          END FOR
34      END FOR
35  END FUNCTION
The loop at line 8 in the function GetBoundsetList loops over the upper cuts. The list of lists U contains the cut nodes for the current upper cut. There is one list in U for each variable and the first index indicates the variable. Initially the cut is for variable 1 and lines 3 – 7 initialize U accordingly. Lines 26 – 33 update U at each iteration of the loop at line 8. The loop at line 26 iterates less than m times and the loop at line 27 iterates twice. The check at line 29 needs to iterate over the list U(p), which has less than m elements. The updating of U therefore needs less than m² operations per iteration, and this is done in the loop at line 8, which iterates n times. The updating of U therefore needs O(m²n) ≤ O(m³) operations in total.
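The update of U can be sketched in Python as follows (our own illustration, not part of the thesis; Node objects with var, lo and hi fields are assumed, and U holds one list of cut nodes per variable index):

def push_cut_below(U, i):
    # Replace each cut node at level i by its successors, bucketing
    # them by variable index (lines 26 - 33 of GetBoundsetList).
    for node in U[i]:
        for s in (node.lo, node.hi):
            p = s.var              # variable index associated with s
            if s not in U[p]:      # list membership test costs O(m)
                U[p].append(s)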
At lines 9 – 13 the variable nextLevelWithUpperCutNodes is assigned the variable index of the upper cut node with the lowest variable index among those that have a higher variable index than the index of the current cut. The loop at line 10 iterates less than n times and it is inside the loop at line 8. The number of operations for this is therefore O(n²) ≤ O(m³).
Lines 14 – 18 compare the structures of the parts of the ROBDD rooted at the upper cut nodes whose variable index equals the current upper cut i. This operation finds the highest variable index for which all these structures are equal. These operations occur inside the loops at line 8 and line 15. The loop at line 8 iterates n times and the loop at line 15 iterates less than m times. The lengths of these loops are, however, dependent, such that the body of the loop at line 15 executes less than 2m times in total. In that loop the subfunction GetLevelWhereStructuresDiffers is called. That function needs O(m²) operations and, because it is called less than 2m times, O(m³) operations are needed in total for lines 14 – 18.
At line 20 the subfunction GetLevelsWithTwoCutNodes is called. It returns a list of lower cuts. The set of variables between the current upper cut and every lower cut in that list is a bound-set. The function GetLevelsWithTwoCutNodes needs O(m²) operations and it is called inside the loop at line 8, which has n iterations. Therefore O(m²n) ≤ O(m³) operations are needed in total for this operation.
Lines 21 – 25 add the newly found bound-sets to the list of bound-sets. The loop at line 21 iterates less than n times and it is inside the loop at line 8, which iterates n times. The operations at these lines therefore need O(n²) ≤ O(m³) operations in total.
1   FUNCTION level ← GetLevelWhereStructuresDiffers(A, B)
2       currentLevel ← variable index of A
3       LENGTH(La) ← 1
4       La(0) ← A
5       LENGTH(Lb) ← 1
6       Lb(0) ← B
7       differenceReached ← FALSE
8       WHILE (NOT differenceReached) AND currentLevel ≤ n
9           i ← 0
10          WHILE (NOT differenceReached) AND i < LENGTH(La)
11              IF variable index of La(i) = currentLevel
12                  FOR s ← 0 TO 1
13                      successorNodeFoundInList ← FALSE
14                      FOR j ← 0 TO LENGTH(La) - 1
15                          IF La(j) = successor node s of La(i) XOR Lb(j) = successor node s of Lb(i)
16                              level ← currentLevel + 1
17                              differenceReached ← TRUE
18                          ELSE IF La(j) = successor node s of La(i)
19                              successorNodeFoundInList ← TRUE
20                          END IF
21                      END FOR
22                      IF NOT successorNodeFoundInList
23                          LENGTH(La) ← LENGTH(La) + 1
24                          La(LENGTH(La) - 1) ← successor node s of La(i)
25                          LENGTH(Lb) ← LENGTH(Lb) + 1
26                          Lb(LENGTH(Lb) - 1) ← successor node s of Lb(i)
27                      END IF
28                  END FOR
29              END IF
30              i ← i + 1
31          END WHILE
32          i ← 0
33          WHILE i < LENGTH(La)
34              IF variable index of La(i) = currentLevel
35                  La(i) ← La(LENGTH(La) - 1)
36                  LENGTH(La) ← LENGTH(La) - 1
37                  Lb(i) ← Lb(LENGTH(Lb) - 1)
38                  LENGTH(Lb) ← LENGTH(Lb) - 1
39              ELSE
40                  i ← i + 1
41              END IF
42          END WHILE
43          currentLevel ← currentLevel + 1
44      END WHILE
45      IF VALUE NOT ASSIGNED TO level
46          level ← n + 1
47      END IF
48  END FUNCTION
The function GetLevelWhereStructuresDiffers takes two nodes A and B of the ROBDD as input arguments. It computes the level at which the structures of the sub-ROBDDs rooted at A and B differ. It needs O(m²) operations to run.
List La contains the cut nodes found for the sub-ROBDD rooted at node A and list Lb contains the cut nodes found for the sub-ROBDD rooted at node B. At lines 3 – 6 these lists are initialized to contain node A and node B respectively.
The loop at line 8 loops through the variables as long as no difference in structure has been found. In this loop the lists La and Lb are traversed with the help of the loop at line 10. The lengths of La and Lb are equal. These two loops run less than n and m iterations respectively. The complexity of these loops is therefore O(mn) ≤ O(m²).
The conditional statement at line 11 is true for the elements in list La whose variable index is equal to currentLevel. In total this is true less than m times during the iterations of the loops at line 8 and line 10. Replacing La with Lb in the expression at line 11 would make no difference.
The loop at line 14 iterates through the lists La and Lb to check whether any successor of La(i) and the corresponding successor of Lb(i) are shared in a way that makes the structures below level currentLevel unequal. This loop iterates over less than m elements and, because the conditional statement at line 11 is true less than m times, O(m²) operations are executed within the loop at line 14 in total.
Lines 33 – 42 iterate through the lists La and Lb to remove the nodes with the current variable index before the function proceeds with the next level. The loop at line 33 makes less than m iterations. This loop is inside the loop at line 8; therefore there are O(mn) ≤ O(m²) operations in total for this part of the function.
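The same level-synchronized comparison can be sketched in Python as follows (our own sketch, under the assumption of a minimal Node class in which terminal nodes carry the variable index n + 1):

from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Node:
    var: int                     # variable index; terminals use n + 1
    lo: Optional["Node"] = None  # 0-successor
    hi: Optional["Node"] = None  # 1-successor

def level_where_structures_differ(a, b, n):
    la, lb = [a], [b]            # paired cut-node lists for A and B
    level = a.var                # a and b are assumed to be on the same level
    while level <= n:
        i = 0
        while i < len(la):
            if la[i].var == level:
                for sa, sb in ((la[i].lo, lb[i].lo), (la[i].hi, lb[i].hi)):
                    # The node-sharing pattern must match pairwise.
                    for j in range(len(la)):
                        if (la[j] is sa) != (lb[j] is sb):
                            return level + 1
                    if sa not in la:          # identity test (eq=False)
                        la.append(sa)
                        lb.append(sb)
            i += 1
        # Drop the node pairs that were expanded at this level.
        pairs = [(x, y) for x, y in zip(la, lb) if x.var != level]
        la = [x for x, _ in pairs]
        lb = [y for _, y in pairs]
        level += 1
    return n + 1                 # no structural difference found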
1   FUNCTION levelList ← GetLevelsWithTwoCutNodes(U, maxLowerCut)
2       LENGTH(levelList) ← 0
3       LENGTH(L) ← 1
4       L(0) ← U
5       FOR i ← variable index of U TO maxLowerCut
6           j ← 0
7           WHILE j < LENGTH(L)
8               IF variable index of L(j) = i
9                   successorNodeZeroInList ← FALSE
10                  successorNodeOneInList ← FALSE
11                  FOR k ← 0 TO LENGTH(L) - 1
12                      IF successor node 0 of L(j) = L(k)
13                          successorNodeZeroInList ← TRUE
14                      IF successor node 1 of L(j) = L(k)
15                          successorNodeOneInList ← TRUE
16                  END FOR
17                  IF NOT successorNodeZeroInList
18                      LENGTH(L) ← LENGTH(L) + 1
19                      L(LENGTH(L) - 1) ← successor node 0 of L(j)
20                  END IF
21                  IF NOT successorNodeOneInList
22                      LENGTH(L) ← LENGTH(L) + 1
23                      L(LENGTH(L) - 1) ← successor node 1 of L(j)
24                  END IF
25                  L(j) ← L(LENGTH(L) - 1)
26                  LENGTH(L) ← LENGTH(L) - 1
27              ELSE
28                  j ← j + 1
29              END IF
30          END WHILE
31          IF LENGTH(L) = 2
32              LENGTH(levelList) ← LENGTH(levelList) + 1
33              levelList(LENGTH(levelList) - 1) ← i
34          END IF
35      END FOR
36  END FUNCTION
The function GetLevelsWithTwoCutNodes returns a list with the cuts where the sub-ROBDD rooted at U has two cut nodes. This function runs in O(m²) operations.
The list L, initialized at lines 3 – 4, contains the cut nodes for the current cut. The current cut is represented by the variable i defined in the loop at line 5. Lines 7 – 30 update list L each time i is increased.
The loop at line 5 iterates at most n times and the loop at line 7 iterates less than m times. The statements in the loop at line 7 will therefore execute O(mn) ≤ O(m²) times in total.
The conditional statement at line 8 will be true less than m times during the execution of the loops at line 5 and line 7. The loop at line 11 iterates less than m times each time it is reached. The operations that execute in the loop at line 11 will therefore execute O(m²) times in total.
Lines 31 – 34 check whether the number of cut nodes is two. If it is, an element containing the current cut is added to levelList.
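A compact Python rendering of the same sweep (our own sketch, reusing the Node class from the previous sketch):

def levels_with_two_cut_nodes(u, max_lower_cut):
    # Sweep a cut downwards from u and report every level at which
    # the cut through the sub-ROBDD rooted at u has exactly two nodes.
    cut = [u]
    levels = []
    for i in range(u.var, max_lower_cut + 1):
        new_cut = []
        for node in cut:
            succs = (node.lo, node.hi) if node.var == i else (node,)
            for s in succs:
                if s not in new_cut:        # identity-based membership
                    new_cut.append(s)
        cut = new_cut
        if len(cut) == 2:
            levels.append(i)
    return levels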
7.3.2. Processing the list of bound-sets
The function GetBoundsetList in Subsection 7.3.1 gives a list with all the O(n²) bound-sets that have adjacent variables in the ROBDD. From this list the function ProcessFoundBoundset extracts the O(n) bound-sets that are not known to be weak based on the information in the list. The extracted bound-sets that can be determined to be full, or part of a full bound-set, with the help of that list are labeled full; other extracted bound-sets are labeled prime.
The list of bound-sets generated by the function GetBoundsetList is sorted in ascending order, first by upper cuts and then by lower cuts.
The function ProcessFoundBoundset utilizes the sort order of the list in conjunction with the fact that all the bound-sets with adjacent variables in the ROBDD are in the list.
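In Python terms the required order is simply a lexicographic sort on the two cut indices (a one-line illustration, assuming list elements with upperCut and lowerCut fields):

bsList.sort(key=lambda bs: (bs.upperCut, bs.lowerCut))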
1   FUNCTION strongBsList ← ProcessFoundBoundset(bsList)
2       LENGTH(strongBsList) ← 0
3       FOR i ← 0 TO LENGTH(bsList) - 1
4           IF NOT bsList[i] marked to be weak or considered
5               overlapFound ← FALSE
6               j ← i + 1
7               WHILE NOT overlapFound AND j < LENGTH(bsList)
8                   IF (bsList[i].upperCut < bsList[j].upperCut AND bsList[j].upperCut < bsList[i].lowerCut AND bsList[i].lowerCut < bsList[j].lowerCut)
9                       overlapFound ← TRUE
10                      upperCutStrongBs ← bsList[i].upperCut
11                      lowerCutStrongBs ← bsList[j].lowerCut
12                      LENGTH(innerCuts) ← 2
13                      innerCuts[0] ← bsList[j].upperCut
14                      innerCuts[1] ← bsList[i].lowerCut
15                  END IF
16                  j ← j + 1
17              END WHILE
18              IF overlapFound
19                  WHILE j < LENGTH(bsList) AND innerCuts[0] = bsList[j].upperCut
20                      LENGTH(innerCuts) ← LENGTH(innerCuts) + 1
21                      innerCuts[LENGTH(innerCuts) - 1] ← lowerCutStrongBs
22                      lowerCutStrongBs ← bsList[j].lowerCut
23                      j ← j + 1
24                  END WHILE
25                  k ← 0
26                  j ← 0
27                  WHILE j < LENGTH(bsList) AND k < LENGTH(innerCuts)
28                      IF bsList[j].upperCut ≥ innerCuts[k]
29                          k ← k + 1
30                      END IF
31                      IF (bsList[j].upperCut ≥ upperCutStrongBs AND bsList[j].lowerCut ≤ lowerCutStrongBs AND bsList[j].lowerCut > innerCuts[k])
32                          mark bsList[j] to be weak or considered
33                      END IF
34                      j ← j + 1
35                  END WHILE
36                  LENGTH(strongBsList) ← LENGTH(strongBsList) + 1
37                  index ← LENGTH(strongBsList) - 1
38                  strongBsList[index].upperCut ← upperCutStrongBs
39                  strongBsList[index].lowerCut ← lowerCutStrongBs
40                  strongBsList[index].type ← full bound-set
41              ELSE
42                  LENGTH(strongBsList) ← LENGTH(strongBsList) + 1
43                  index ← LENGTH(strongBsList) - 1
44                  strongBsList[index].upperCut ← bsList[i].upperCut
45                  strongBsList[index].lowerCut ← bsList[i].lowerCut
46                  strongBsList[index].type ← prime bound-set
47              END IF
48          END IF
49      END FOR
50  END FUNCTION
The loop at line 3 iterates over the list of bound-sets, which has less than n² elements. The conditional statement at line 4 is true once for each of the bound-sets that cannot be determined to be weak based on the list of bound-sets, which is less than 2n bound-sets.
In the loop at line 7 each bound-set with an index larger than i in the list of bound-sets is checked to determine whether it overlaps with the bound-set with index i. This check is made at line 8. If no overlapping bound-set is found, the bound-set with index i is added to the list of strong bound-sets and labeled as prime. This is done at lines 42 – 46.
Lines 10 – 14 and lines 19 – 40 serve two purposes. The first is to identify which bound-set, among those that will be labeled full, the weak bound-set with index i is associated with, and to add that bound-set to the list of strong bound-sets strongBsList. The second is to mark, in the list bsList, all weak bound-sets associated with that full bound-set as weak or considered. The strong bound-set itself is also marked in this way.
Lines 10 – 14 and lines 19 – 24 find the full bound-set with which the weak bound-set with index i is associated. This full bound-set is represented with the variables upperCutStrongBs and lowerCutStrongBs. In those lines the list innerCuts is also generated. The elements of innerCuts are cuts between upperCutStrongBs and lowerCutStrongBs. With the help of these cuts it is possible to find all weak bound-sets associated with the currently considered bound-set that is labeled full.
Lines 27 – 35 iterate over the bound-sets with an index larger than i in the list bsList. They mark the full bound-set currently under consideration and all weak bound-sets associated with it. The check is done at line 31. It is determined whether any cut in innerCuts lies between the upper and lower cut of the bound-set with index j, and it is also determined whether all variables of the bound-set with index j are also variables of the full bound-set under consideration. Because of the way the list bsList is sorted, it is sufficient to check whether the cut with index k in the list innerCuts lies between the upper and lower cut of the bound-set with index j in bsList.
Lines 36 – 40 add the full bound-set under consideration to the list of strong bound-sets strongBsList.
The loops at line 7, line 19 and line 27 all iterate over a part of the list of bound-sets, which has less than n² elements. They are all inside the conditional statement at line 4, which is true less than 2n times during the run of the loop at line 3. The operations inside these loops therefore run O(n³) times in total. The complexity of the function ProcessFoundBoundset is therefore O(n³) and, because n ≤ m, it is also O(m³).
7.4 Experimental results
To thoroughly evaluate the presented heuristic, the exact decomposition algorithm [Dub97b] was implemented. This algorithm was applied to the IWLS93 benchmark set. For all single outputs for which the exact algorithm did not time out, 582 in total, the number of strong bound-sets found by the Interval-cut algorithm was computed.
In the first set of experiments, the sifting ordering algorithm [Rud93], as implemented in the Colorado University Decision Diagram (CUDD) package [Som98], was used to get a good initial variable order for the ROBDDs. For 526 of those 582 single-output functions, the Interval-cut algorithm found one hundred percent of the bound-sets. In the second set of experiments, the ROBDD was built using the breadth-first traversal order from the benchmark circuit description. For 191 of these 582 functions the result was worse than in the first set of experiments, by 57% on average; here, worse means that a smaller number of strong bound-sets was found. Nevertheless, the heuristic still found all the bound-sets for 365 functions.
The Interval-cut algorithm was also applied to the benchmarks reported in [Ber97, Mat98, Min98]. The results are summarized in Table 7.1. Column 4 shows how many non-trivial strong bound-sets were found for each benchmark by the Interval-cut algorithm. Every output is handled as a separate function, and the number given in column 4 is the total sum of bound-sets over all outputs. Columns 5 – 8 show a runtime comparison. Unfortunately, none of these algorithms has a publicly available implementation, so the experiments were run on different computers. The experiments for the Interval-cut algorithm were run on a Sun Ultra 60 with two 360 MHz CPUs and 1024 MB of main memory. The algorithm in [Min98] uses a Sun Ultra 30, [Ber97] uses a PC equipped with a 150 MHz Pentium and 96 MB of main memory, and [Mat98] uses a PC with a 233 MHz Pentium II processor.
Table 7.1: Comparison of execution times for the Interval-cut algorithm (all execution times in seconds; "-" marks entries not available)

Benchmark   Inputs  Outputs  Strong      Interval-cut  [Min98]  [Ber97]  [Mat98]
                             bound-sets
alu2            10        6           3       0.0002        -        -        -
alu4            14        8           2       0.0009        -        -        -
apex1           45       45          83        0.008        -        -        -
apex2           38        3          16        0.001        -        -        -
apex3           54       50          23        0.008        -        -        -
apex4            9       19           4        0.002        -        -        -
apex5          114       88         196        0.032        -        -        -
apex6          135       99         258        0.008        -        -        -
apex7           49       37          96        0.006        -        -        -
b9              41       21          49        0.001        -        -        -
C432            36        7          10        0.002    415.4        -        -
C499            41       32          68          5.2        -        -        -
C880            60       26          45        0.046        -        -        -
C1355           41       32           0          5.2        -        -        -
C1908           33       25          15         0.23        -        -        -
C3540           50       22          18          2.8        -        -        -
cmb             16        4           4        0.002        -        -        -
CM42             4       10          10       0.0006        -        -        -
CM85            11        3          15       0.0003        -        -        -
CM150           21        1           1      <0.0001        -        -        -
comp            32        3          47        0.002        -        -        -
count           35       16          47        0.007        -        -        -
dalu            75       16          42        0.015        -        -        -
des            256      245         688        0.041        -        -        -
e64             65       65          63         0.51        -        -        -
f51m             8        8           6       0.0004        -        -        -
frg2           143      139         532        0.032        -        -        -
k2              45       45          85        0.008        -        -        -
lal             26       19          57        0.002        -        -        -
misex2          25       18          29        0.003        -        -        -
mux             21        1           1       0.0001        -        -        -
pair           173      137         725        0.040        -        -     7.36
PARITY          16        1           1        0.001        -        -        -
rot            135      107         296        0.039        -        -        -
seq             41       35         135        0.009        -        -        -
s298            17       20          15       0.0004        -        -        -
s420            35       18          18        0.007        -     0.75        -
s444            24       27          65        0.001        -     0.54        -
s526            24       27          45        0.002        -     0.52        -
s641            54       42         138        0.003        -     1.12        -
s832            23       24          37        0.003        -     0.54        -
s953            45       52          40        0.003        -    20.97        -
s1196           32       32          33        0.002        -     0.71        -
s1238           32       32          33        0.002        -     0.75        -
s1423           91       79          38        0.066        -    12.48        -
s1488           14       25          38        0.002        -     0.36        -
s1494           14       25          38        0.002        -     0.34        -
term1           34       10          65        0.002        -     0.75        -
too_large       38        3          17        0.001        -     0.55        -
ttt2            24       21          44        0.002        -     0.4         -
vda             39       17          30        0.003        -     1.90        -
x4              94       71         180        0.008        -     0.09        -
The experiments on the benchmarks in Table 7.1 show that the Interval-cut algorithm is fast compared with the published exact algorithms. For all benchmarks the Interval-cut algorithm ran faster than the algorithms reported in [Ber97, Mat98, Min98]. The benchmarks for which the exact algorithms presented in [Min98], [Ber97] and [Mat98] took the longest time to execute compared with the Interval-cut algorithm are C432, s953 and pair respectively. For these benchmarks the exact algorithms took 210000 times, 7000 times and 180 times longer, respectively, to execute than the Interval-cut algorithm. These differences are far too large to be caused only by differences in the performance of the computers used; hence these experiments demonstrate that the Interval-cut algorithm is considerably faster.
7.5 Discussion and conclusions
In this chapter the Interval-cut algorithm has been presented. The Interval-cut algorithm is a heuristic algorithm for finding bound-sets of Boolean functions. The bound-sets show how a Boolean function can be disjointly decomposed. The algorithm operates on an ROBDD and finds all bound-sets of variables that are adjacent in the ROBDD. The algorithm has a time complexity of O(m³), where m is the number of nodes in the ROBDD.
This algorithm is strong because, for most Boolean functions, there is a variable order in which the ROBDD has a minimal number of nodes and in which all subsets of variables forming strong bound-sets are in linear intervals. It was stated in [Tes05] that it is only in some rare cases that such a variable order does not exist.
The experiments on benchmark functions demonstrate that in most ROBDDs in which the variable order is chosen with a practical algorithm, in this case the sifting algorithm [Rud93], all strong bound-sets are adjacent in the ROBDD. For such cases all bound-sets are found by the Interval-cut algorithm.
If not all strong bound-sets are in linear intervals, the heuristic method finds a tree with the same properties as a decomposition tree, but some bound-sets are missed. In this tree, nodes may be weak bound-sets, but without knowledge about the bound-sets that are not found, they are labeled full or prime and can be used in the same way as strong bound-sets.
Chapter 8
Functional decomposition for three-level logic implementation
In this chapter a fast algorithm is presented for estimating whether a Boolean function is likely to benefit from three-level AND-OR-XOR optimization. Background and related work for the contribution in this chapter were described in Chapter 6.
The experimental results presented in [Dub99] show that optimization algorithms for AND-OR-XOR logic can be quite time consuming. Those experimental results also show that some functions gain much in implementation size compared with a two-level sum-of-products implementation while other functions gain nothing or very little. It is therefore advantageous to know in advance the benefit of running such an algorithm. In this chapter a method to predict the benefit is presented.
First, we study and describe the kind of structure a function should have to benefit from optimization for an AND-OR-XOR structure. We then give a theorem and its proof to characterize such functions. This theorem formulates a sufficient condition for a given function f(X), X = {x_1, x_2, …, x_n}, to have a decomposition of type

    f(X) = (g(X) ⊕ h(X)) + r(X)

with the total number of product-terms in g, h and r smaller than the number of product-terms in f, when the functions are represented in SoP form. The function r is needed to make the condition sufficient. The estimation algorithm uses this theorem to predict how much benefit optimization for AND-OR-XOR will give. Note that there are no restrictions on the support sets of g and h. This is a difference between the presented method and methods utilizing algebraic decomposition.
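As a small illustrative example of such a decomposition (our own example, not one of the benchmark functions), the three-variable parity function requires four product-terms in SoP form,

    f = x_1 x_2 x_3 + x_1 \overline{x}_2 \overline{x}_3 + \overline{x}_1 x_2 \overline{x}_3 + \overline{x}_1 \overline{x}_2 x_3,

but choosing g = x_1, h = x_2 \overline{x}_3 + \overline{x}_2 x_3 and r = 0 gives f = (g ⊕ h) + r with only three product-terms in total, since x_1 ⊕ (x_2 ⊕ x_3) is exactly the parity of the three variables.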
Section 8.1 illustrates the basic ideas of the algorithm. Section 8.2
describes the theorem formally. In Section 8.3 the estimation
algorithm utilizing the theorem is presented. Experimental results are
presented in Section 8.4.
8.1 Basic ideas in 3-level decomposition estimation method
AND-OR-XOR optimization is the decomposition of a function for an implementation like the one illustrated in Figure 8.1. The functions g and h are in sum-of-products form.
Figure 8.1: AND-OR-XOR implementation (two sum-of-products blocks, g and h, feeding an XOR gate)
The estimation algorithm starts from a sum-of-products representation of the Boolean function. It utilizes the idea that the cubes of the functions g and h are generated based on the cubes in the sum-of-products representation of the original function. The algorithm checks each pair of cubes to see if they can be replaced by one cube. This cube, together with the remaining cubes, should then be used in the functions g and h to implement the function. Some of the remaining cubes might be modified to achieve this. The algorithm counts the number of pairs of cubes for which this is possible. The more cases for which it is possible, the more likely it is that the function will benefit from AND-OR-XOR optimization.
The function used in the example is shown in the Karnaugh map below. In the original figure the same map is drawn three times, as panels a, b and c, with different cube groupings marked (circled cubes, a dotted super cube and expanded cubes, as described in the text):

x3x4 \ x1x2    00   01   11   10
00              1    0    1    0
01              0    1    0    0
11              1    0    0    0
10              0    0    0    1

Figure 8.2: Illustration of the algorithm
To illustrate how to check whether a pair of cubes can be replaced with one cube, consider the function shown in Figure 8.2a. The circles in Figure 8.2a represent the cubes in the sum-of-products form of the function. In this example, two of these cubes are checked to see whether they can be replaced with one cube. The first step is to build the super cube of the two cubes. The super cube of a set of cubes is the smallest cube that covers all the cubes in the set. In Figure 8.2b the dotted line shows the super cube of the two cubes.
After the super cube is generated, the next step is to find cubes that can be expanded to cover the zeros in the super cube. Figure 8.2c shows how two of the remaining cubes have been expanded to cover the zeros in the super cube. The function can then be implemented in AND-OR-XOR logic with the dotted cube in function g and the solid implicants in function h, referring to Figure 8.1. So in this example we have found that the two checked cubes can be replaced by one cube when using AND-OR-XOR logic.
8.2 Theorem on which the estimation method is based
In this section we define the theorem on which the estimation method is based. First, some notation is defined. Let f(x_1, x_2, …, x_n) be an incompletely specified Boolean function of type f: {0, 1}^n → {0, 1, −}, where "−" denotes a don't-care value. We use F_f, R_f and D_f to represent the sets of assignments of variables for which the function value is one, zero and don't-care respectively.
The size of a set of cubes A, denoted by |A|, is the number of cubes in it. The complement of a set of cubes A, denoted by \overline{A}, is the intersection of the complements of each cube of A. The intersection of two sets of cubes A and B, denoted by A ∩ B, is the union of the pairwise intersections of the cubes from A and B. The union of two sets A and B, denoted by A ∪ B, is the union of the cubes from A and B. We denote by sup(a_1, a_2, …, a_k) the super cube of the cubes a_1 to a_k. The symbol ⊕ is used to denote exclusive or (XOR) both for sets and for Boolean functions.
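These cube operations are straightforward to express in code. The following Python sketch (our own illustration; cubes are encoded as strings over '0', '1' and '-', one character per variable) implements the cube intersection and the super cube:

def cube_intersection(a, b):
    # Intersection of two cubes, or None if it is empty.
    out = []
    for ca, cb in zip(a, b):
        if ca == '-':
            out.append(cb)
        elif cb == '-' or ca == cb:
            out.append(ca)
        else:                     # conflicting literals
            return None
    return ''.join(out)

def super_cube(*cubes):
    # Smallest cube covering all the given cubes.
    out = []
    for lits in zip(*cubes):
        out.append(lits[0] if all(l == lits[0] for l in lits) else '-')
    return ''.join(out)

# For example:
#   super_cube('0101', '0001')        returns '0-01'
#   cube_intersection('0-01', '00--') returns '0001'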
A rule is formulated in Theorem 8.1 that can be applied to a two-level AND-OR expression to transform it into an expression of type f = (g ⊕ h) + r with the total number of product-terms in g, h and r smaller than the number of product-terms in f. The following lemmas are used in Theorem 8.1.
Lemma 8.1: If X ∩ Y = ∅ then (X ∪ Y) ⊕ Y = X.
Proof:
    (X ∪ Y) ⊕ Y = ((X ∪ Y) ∩ \overline{Y}) ∪ (\overline{(X ∪ Y)} ∩ Y)
    = (X ∩ \overline{Y}) ∪ (Y ∩ \overline{Y}) ∪ (\overline{X} ∩ \overline{Y} ∩ Y) = X ∩ \overline{Y}
and X ∩ \overline{Y} = X, since X ∩ Y = ∅.    □
Lemma 8.2: If X ∩ Z = ∅, Y ∩ Z = ∅ and Y ⊂ X, then (X ⊕ Y) ∪ Z = X ⊕ (Y ∪ Z).
Proof: On the left side of the equality,
    (X ⊕ Y) ∪ Z = (X ∩ \overline{Y}) ∪ (\overline{X} ∩ Y) ∪ Z = (X ∩ \overline{Y}) ∪ Z, since Y ⊂ X.
On the right side of the equality,
    X ⊕ (Y ∪ Z) = (X ∩ \overline{(Y ∪ Z)}) ∪ (\overline{X} ∩ (Y ∪ Z))
    = (X ∩ \overline{Y} ∩ \overline{Z}) ∪ (\overline{X} ∩ Y) ∪ (\overline{X} ∩ Z) = (X ∩ \overline{Y}) ∪ Z,
since Y ⊂ X, X ∩ Z = ∅ and Y ∩ Z = ∅.    □
Lemma 8.3: Let a_1, a_2, …, a_k, k > 0, be cubes from the on-set F_f of a Boolean function f: {0, 1}^n → {0, 1, −} such that the intersection of sup(a_1, a_2, …, a_k) with the off-set R_f is a non-empty set of cubes {c_1, c_2, …, c_p} such that \bigcup_{i=1}^{p} c_i = sup(a_1, a_2, …, a_k) ∩ R_f for some p ≥ 1. If for each cube c_i we can find a cube b_i ∈ F_f such that sup(b_i, c_i) ∩ R_f = c_i as well as sup(a_1, a_2, …, a_k) ∩ sup(b_i, c_i) = c_i, then there exists a set D ⊆ D_f such that

    (sup(a_1, a_2, …, a_k) ⊕ \bigcup_{i=1}^{p} sup(b_i, c_i)) ⊕ D = \bigcup_{j=1}^{k} a_j ∪ \bigcup_{i=1}^{p} (sup(b_i, c_i) − c_i)    (8.1)
Proof: Since a_i ∈ F_f for all i ∈ {1, 2, …, k} and c_j ∈ R_f for all j ∈ {1, 2, …, p}, the intersection of the sets \bigcup_{i=1}^{k} a_i and \bigcup_{j=1}^{p} c_j is empty. Therefore, by applying Lemma 8.1, we can write:

    \bigcup_{j=1}^{k} a_j = (\bigcup_{j=1}^{k} a_j ∪ \bigcup_{i=1}^{p} c_i) ⊕ \bigcup_{i=1}^{p} c_i = (sup(a_1, a_2, …, a_k) ⊕ \bigcup_{i=1}^{p} c_i) ⊕ D

Taking the union with \bigcup_{i=1}^{p} (sup(b_i, c_i) − c_i) on both sides, we get:

    (\bigcup_{j=1}^{k} a_j) ∪ (\bigcup_{i=1}^{p} (sup(b_i, c_i) − c_i)) = ((sup(a_1, a_2, …, a_k) ⊕ \bigcup_{i=1}^{p} c_i) ∪ (\bigcup_{i=1}^{p} (sup(b_i, c_i) − c_i))) ⊕ D

Since sup(a_1, a_2, …, a_k) ∩ (\bigcup_{i=1}^{p} (sup(b_i, c_i) − c_i)) = ∅, (\bigcup_{i=1}^{p} (sup(b_i, c_i) − c_i)) ∩ (\bigcup_{i=1}^{p} c_i) = ∅ and (\bigcup_{i=1}^{p} c_i) ⊂ sup(a_1, a_2, …, a_k), we can apply Lemma 8.2 and get:

    \bigcup_{j=1}^{k} a_j ∪ \bigcup_{i=1}^{p} (sup(b_i, c_i) − c_i)
    = (sup(a_1, a_2, …, a_k) ⊕ ((\bigcup_{i=1}^{p} c_i) ∪ (\bigcup_{i=1}^{p} (sup(b_i, c_i) − c_i)))) ⊕ D
    = (sup(a_1, a_2, …, a_k) ⊕ \bigcup_{i=1}^{p} sup(b_i, c_i)) ⊕ D    □
Lemma 8.3 gives a condition for substituting a subset F_f* of the on-set F_f of a function f by two functions g and h of type g, h: {0, 1}^n → {0, 1} so that F_f* = F_g ⊕ F_h and the total number of cubes in F_g and F_h is smaller than in F_f*. The set D in the equation above indicates that the don't-cares might be assigned differently on the left and right sides of the equation in Lemma 8.3. Next we prove that this condition is sufficient to make it possible to represent f as f = (g ⊕ h) + r with the total number of cubes in g, h and r smaller than the number of cubes in f.
Theorem 8.1: If a Boolean function fulfills Lemma 8.3 for some set of cubes {a_1, a_2, …, a_k}, a_i ∈ F_f for all i ∈ {1, 2, …, k}, then it can be represented as f = (g ⊕ h) + r with the total number of cubes in g, h and r smaller than in f.
Proof: Suppose a function f fulfills Lemma 8.3. Then there exist cubes a_j, b_i ∈ F_f and c_i ∈ R_f, j ∈ {1, 2, …, k}, i ∈ {1, 2, …, p}, fulfilling Equation (8.1) for some D ⊆ D_f. All the cubes from the set \bigcup_{i=1}^{p} (sup(b_i, c_i) − c_i) belong either to the on-set F_f or to the don't-care set D_f. Also, for each i, the set (sup(b_i, c_i) − c_i) includes at least one cube from the on-set F_f, namely b_i. So p + 1 cubes from the left-hand side of Equation (8.1) cover at least p + k, k > 1, cubes from the on-set F_f, given by the right-hand side of Equation (8.1). If we set F_g = sup(a_1, a_2, …, a_k) and F_h = \bigcup_{i=1}^{p} sup(b_i, c_i), then |F_g| + |F_h| < |F_f*| with F_f* = (\bigcup_{j=1}^{k} a_j) ∪ (\bigcup_{i=1}^{p} (sup(b_i, c_i) − c_i)). Defining the remainder as F_r = F_f − (\bigcup_{j=1}^{k} a_j) − (\bigcup_{i=1}^{p} b_i), we get a decomposition of type f = (g ⊕ h) + r with the total number of cubes in g, h and r smaller than in f. Here r is the Boolean function which is 1 for the input combinations in the set F_r and 0 for all others. Don't-cares might be assigned different values in this representation than in the two-level form initially given.    □
8.3 Estimation algorithm
Theorem 8.1 can be utilized to estimate the benefit of optimization for an AND-OR-XOR logic implementation. The larger the subset of the on-set of a function f that satisfies the condition in Lemma 8.3, the more f can benefit from XOR minimization. However, there might be several different choices of such a subset. Computing the best one would require first trying all possible subsets {a_1, a_2, …, a_k} of f to find the ones fulfilling the condition in Lemma 8.3, and then solving the covering problem to find which combination of the subsets results in the best XOR-cover for f. Both steps would require exponential time, and therefore such a method would be too slow for large functions. Instead, we present a simple heuristic algorithm, which quickly estimates the benefit from XOR minimization by only using pairs of cubes. The pseudocode is shown below. The input is the on-set F_f, the don't-care set D_f and the off-set R_f of f. The output is a counter value that indicates the number of pairs a_i and a_j for which Lemma 8.3 is fulfilled.
ALGORITHM Estimation-algorithm (Ff, Df, Rf)
    counter ← 0
    FOR EACH pair of cubes (ai, aj), ai ∈ Ff, aj ∈ Ff DO
        flag_lemma_fulfilled ← TRUE
        C ← super cube(ai, aj) ∩ Rf
        FOR EACH cube c ∈ C DO
            flag_cube_b_not_found ← TRUE
            FOR EACH cube b ∈ (Ff − ai − aj) DO
                IF super cube(b, c) ∩ Rf = c AND
                   super cube(ai, aj) ∩ super cube(b, c) = c THEN
                    flag_cube_b_not_found ← FALSE
                END IF
            END FOR EACH
            IF flag_cube_b_not_found THEN
                flag_lemma_fulfilled ← FALSE
            END IF
        END FOR EACH
        IF flag_lemma_fulfilled THEN
            counter ← counter + 1
        END IF
    END FOR EACH
    RETURN counter
END ALGORITHM
There are at least as many different ways as the value of counter to express the given function as (g ⊕ h) + r with the total number of product-terms in g, h and r smaller than the number of product-terms in the sum-of-products form in which the function was given to the algorithm.
If Lemma 8.3 were checked for all subsets of implicants of the function f, then O(2^m) possible subsets would need to be tested, where m is the number of implicants. For most functions it is, however, more likely that Lemma 8.3 is fulfilled if the super cube is made out of only a few implicants. The heuristic algorithm therefore checks whether the lemma is fulfilled only for pairs of implicants. Then only O(m²) super cubes have to be built. Another essential saving in time results from the fact that the algorithm does not compute the XOR cover at all but only counts the number of pairs fulfilling Lemma 8.3.
The more pairs that fulfill Lemma 8.3, the more flexibility we have in selecting a good XOR-cover from them. However, since Theorem 8.1 proves only the sufficiency of the condition, not its necessity, there might be cases where the condition is not fulfilled but the number of cubes in f can still be reduced by a representation as f = (g ⊕ h) + r.
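Using the cube helpers sketched in Section 8.2, the pair-counting loop can be rendered in Python roughly as follows (our own sketch of the pseudocode above; covers is a hypothetical helper testing cube containment, and the off-set cubes are clipped against the super cube to obtain the c_i of Lemma 8.3):

from itertools import combinations

def covers(big, small):
    # True if cube big contains cube small.
    return all(b == '-' or b == s for b, s in zip(big, small))

def estimate_xor_benefit(on_set, off_set):
    counter = 0
    for ai, aj in combinations(on_set, 2):
        sc = super_cube(ai, aj)
        zeros = [x for d in off_set if (x := cube_intersection(sc, d))]
        if not zeros:
            continue          # Lemma 8.3 needs a non-empty intersection
        def b_works(b, c):
            sbc = super_cube(b, c)
            # sup(b, c) may meet the off-set only inside c ...
            if any((x := cube_intersection(sbc, d)) and not covers(c, x)
                   for d in off_set):
                return False
            # ... and may overlap the pair's super cube only in c.
            return cube_intersection(sc, sbc) == c
        if all(any(b_works(b, c) for b in on_set if b not in (ai, aj))
               for c in zeros):
            counter += 1
    return counter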
8.4 Experimental results
We have performed a set of experiments with the goal of determining how good the presented heuristic is. Table 8.1 summarizes the results. The second and third columns give the number of inputs and the number of outputs of the function. The fourth column gives the number of implicants in the SOP form computed by Espresso [Bra84]. The fifth column gives the number of cubes in an AND-OR-XOR representation; it is the smallest number among the results reported in [Deb98, Dub99, Sas95]. The sixth column gives the counter value of the algorithm, which is the number of pairs of implicants for which Lemma 8.3 is fulfilled.
The number of AND gates needed is given by the number of cubes. For two-level and three-level implementations this correlates highly with the number of transistors needed to implement the function. To compare the implementation cost of a two-level implementation with that of a three-level AND-OR-XOR implementation, the fourth and fifth columns of Table 8.1 should be considered. The three-level implementation needs an extra two-input XOR gate that is not needed in the two-level implementation. For the benchmarks in this experiment the cost of this gate is small compared with all the other gates. Therefore, the number of cubes is still a good estimate of the implementation cost.
Table 8.1: Number of pairs fulfilling Lemma 8.3

Benchmark   Inputs  Outputs  AND-OR  AND-OR-XOR  Lemma counter
5xp1             7       10      65          34             27
9sym             9        1      86          65            301
Clip             9        5     120          72             28
f51m             8        8      77          35             44
Life             9        1      84          62            724
Mlp4             8        8     128          75             52
rd53             5        3      31          17            198
rd73             7        3     127          54           1304
rd84             8        4     255          99           7590
Sao2            10        4      58          33             53
squar5           5        8      25          20              6
t481            16        1     481          18              0
Z4               7        4      59          18             54
For benchmarks rd73 and rd84 we can see that Lemma 8.3 is fulfilled for many pairs of cubes. For those benchmark functions the benefit of using AND-OR-XOR relative to a two-level AND-OR implementation is large. On the other hand, benchmark 9sym does not show that much difference between the two implementations. This is expected, because the number of pairs of cubes for which Lemma 8.3 is fulfilled is also considerably smaller.
There are, however, several benchmarks that gain quite a bit although the counter value is low. For example, Lemma 8.3 is not fulfilled for any pair of implicants of the benchmark function t481. This function can, however, be described as f = g ⊕ h where the total number of implicants in g and h is only 18, while the number of implicants in its smallest known two-level AND-OR form is 481. The presented algorithm only tests a sufficient condition for a function to benefit from XOR minimization, and the case of function t481 shows that the presented heuristic misses some types of functions.
We can also see that for the benchmark functions 9sym and life the counter in the algorithm is quite large but the size of the AND-OR-XOR implementation is only slightly smaller than that of the AND-OR implementation. However, the algorithms that found the size of the AND-OR-XOR implementations are heuristic. This implies that it is quite possible that AND-OR-XOR implementations with a smaller number of cubes exist for those functions. This possibility is likely because AND-OR-XOR minimization is more complicated than AND-OR minimization and the research on it is not as well established as for AND-OR minimization.
8.5 Conclusions
In this chapter a sufficient condition has been formulated for a function f to have a decomposition of type f = (g ⊕ h) + r with the total number of product-terms in g, h and r smaller than the number of product-terms in f. An algorithm has been designed that utilizes this condition to decide whether a function is likely to benefit from XOR minimization. This algorithm can be used as a pre-processing step to decide whether it is worthwhile to run an algorithm for AND-OR-XOR minimization.
Experiments on benchmark circuits show that the benefit of using AND-OR-XOR is greater in the cases where the presented algorithm estimates that minimization for AND-OR-XOR will be beneficial. There are, however, some benchmark circuits that gain quite a bit although the presented algorithm does not indicate this. The algorithm only provides a sufficient condition and it may therefore miss some types of functions that would benefit from optimization for an AND-OR-XOR implementation. There are also some benchmark functions for which the presented algorithm indicates that optimization for an AND-OR-XOR implementation would be beneficial but where no such implementation was found. Since the algorithms that found the size of the AND-OR-XOR implementations are heuristic, it is quite possible that AND-OR-XOR implementations with a smaller number of cubes exist for those functions. This possibility is quite likely because AND-OR-XOR minimization is more complicated than AND-OR minimization and the research on it is not as well established as for AND-OR minimization.
Part D
Conclusions
Chapter 9
Conclusions and future work
This thesis has presented contributions in the areas of electronic testing and Boolean optimization. The contributions are of particular interest for the development of SoCs, especially those using a NoC as interconnection infrastructure. Section 9.1 gives short summaries of the contributions and results in electronic testing and Section 9.2 gives short summaries of the contributions and results in Boolean optimization. Section 9.3 gives proposals for future work.
9.1 Contributions in chip testing
The contributions of this thesis in the area of testing are based on the NoC infrastructure. Testing of this sort of infrastructure can be categorized into testing of the switches and testing of the interconnections between the switches. This thesis makes contributions applicable in both of these areas.
The contributions related to testing of interconnections can be applied to testing links between NoC switches that are clocked with different clock signals. This case is considered harder to test than the case in which all switches share the same clock. This thesis contributes techniques for testing faults in such links that cause too much crosstalk and faults that cause unacceptable delay.
For crosstalk tests of the interconnection links between NoC switches, it is normally unnecessarily pessimistic and inefficient to consider only one wire at a time as the victim wire. This thesis contributes a small hardware block for selecting which wires are to be victims simultaneously. That hardware is configurable, providing the possibility to set the minimum distance between wires that are considered as victims at the same time. In this way a trade-off between test time and test accuracy can be configured.
Raising the abstraction level at which test logic and test data are generated is attractive because test costs can be accounted for earlier in the design phase. It is also usually easier to activate and propagate faults at higher levels of abstraction. The challenge when developing tests at a high abstraction level is to find fault models that model the effects of physical defects well enough. A contribution to fault modeling at the system level is included in this thesis. The usage of application-specific fault models is recommended and, as a case study, fault models for a NoC switch are proposed and their efficiency evaluated. Unlike the presented methods for testing links, the contribution in fault generation at the system level is not a ready-to-use method that can be applied to cover a certain set of defects. It instead presents initial results in system-level fault modeling, which needs further research to be practical when high coverage of defects is required. However, the results demonstrate the potential benefits of further investigation of system-level fault modeling.
9.2 Contributions in Boolean decomposition
Two contributions have been presented in the area of Boolean decomposition. The first is a fast BDD-based heuristic algorithm for finding bound-sets. The second is a fast algorithm for predicting whether it is worthwhile to run a minimization algorithm for a three-level AND-OR-XOR implementation.
Experiments show that the presented algorithm for finding bound-sets works considerably faster than exact algorithms and that it finds all bound-sets for most benchmark functions. The complexity of the presented algorithm has been shown to be O(m³), where m is the number of nodes in the ROBDD.
The second contribution in Boolean decomposition is a method for fast prediction of whether optimization algorithms for an AND-OR-XOR implementation are worth running. Some functions gain significantly from optimization for AND-OR-XOR logic and others do not. Optimization for AND-OR-XOR logic is relatively time consuming; it is therefore beneficial to know in advance whether such an optimization is going to be useful.
9.3 Proposals for future work
This section describes limitations of the presented contributions and proposes future work to address some of these limitations.
9.3.1. Interconnection test
The proposed method for interconnection test works on simple handshaking links. Other asynchronous protocols might require different approaches. An analysis of such approaches would be interesting future work.
The presented algorithm for measuring delay targets testing of links connecting different clock domains. For high-speed signals between different clock domains there is a risk of problems with metastability. The exact implementation is not included as a part of the contribution in this thesis. It would be an interesting topic for future work to find out how the presented methodology can best be implemented with consideration of the metastability problem.
The analysis of the efficiency of the delay measuring method assumes that a given signal wire has a specific signal delay which is the outcome of a normally distributed random variable. A delay fault is considered to be present if the actual signal delays in a link can result in failures. It would be interesting to conduct a more detailed investigation of the probability distribution of the delay on signal wires.
The presented test method for detecting crosstalk-induced glitches utilizes glitch detectors. The glitch detectors need to be as sensitive to glitches as the logic used during normal operation. However, overly sensitive glitch detectors will make the test unnecessarily pessimistic. It would be interesting future work to investigate how to make glitch detectors sensitive enough without being too sensitive. Another subject for future work is to determine how to also include tests for glitches caused by defects resulting in too much inductive coupling.
The method for scheduling victim wires targets crosstalk faults that can occur due to excessive capacitive coupling. Interesting future work would be to investigate how to efficiently schedule victim wires when defects causing too much inductive coupling are also considered.
9.3.2. System level fault modeling
The study in this thesis of system-level fault modeling has been limited. One limitation is that only one particular design is studied; it would be interesting to study more designs. Another limitation is that only one logic-level implementation has been utilized for the evaluation of the system-level faults. Future work could be to synthesize the design to logic level in several ways to investigate how the relevance of system-level faults depends on synthesis algorithms and optimization criteria. A third limitation is that the evaluation of the system-level faults has been made on a system which has been simplified to a purely combinational design. It would be interesting to develop analysis methods for sequential designs.
One more interesting subject for future work would be to evaluate how synthesis algorithms can be analyzed to develop accurate fault models at high abstraction levels. As described in Subsection 3.3.2, some attempts have been made in that direction. However, it seems to be difficult to do so without putting constraints on the synthesis algorithms, which considerably limits the ability to optimize.
9.3.3. BDD-based decomposition
The presented algorithm for finding bound-sets works for single-output functions. However, most combinational networks have several outputs as well as several inputs, and each output then needs to be considered separately. It would be useful to extend the algorithm to permit identification of common subexpressions for several outputs. For multiple-output functions represented with a shared ROBDD it should be relatively simple to identify subsets of variables that are bound-sets for several outputs with the same associated functions.
Bound-sets correspond to subfunctions that only use inputs not used anywhere else in the implementation of the output under consideration. However, there are more opportunities to find common subexpressions if some inputs are allowed to feed both the subexpression and the rest of the network. It would therefore be interesting to investigate extending the algorithm in that direction.
A variant of BDDs with complemented edges has been proposed in the literature to reduce the number of nodes in the BDD. It should be possible to adapt the presented algorithm to BDDs with complemented edges with only slight modifications.
By considering not only the case where the number of lower cut nodes is two and the structures are equal, it should be possible to extend the presented algorithm such that it can find Roth-Karp decompositions.
9.3.4. Decomposition for XOR-type logic
The presented method for predicting the expected gain from optimization for AND-OR-XOR logic only considers pairs of cubes in the two-level representation. Some functions that gain significantly from AND-OR-XOR minimization do not gain anything from putting only two implicants in the function that feeds one of the inputs of the XOR gate. For such functions the presented algorithm incorrectly indicates that there is no use in trying to minimize for AND-OR-XOR logic. It would be interesting future work to modify the algorithm such that these types of functions are also detected.
List of abbreviations
ALU      Arithmetic Logic Unit
BDD      Binary Decision Diagram
BIST     Built-In Self-Test
CUDD     Colorado University Decision Diagram
DfT      Design for Testability
EDIF     Electronic Design Interchange Format
FIFO     First In First Out
FIR      Finite Impulse Response
FPGA     Field Programmable Gate Array
FSM      Finite State Machine
GALS     Globally Asynchronous Locally Synchronous
IP       Intellectual Property
NoC      Network on Chip
NP       Non-deterministic Polynomial
PCB      Printed Circuit Board
PLA      Programmable Logic Array
RAM      Random Access Memory
ROBDD    Reduced Ordered BDD
RT       Register Transfer
RTR      Ready To Receive
SoC      System on Chip
SOP      Sum Of Products
VHDL     VHSIC Hardware Description Language
VHSIC    Very High Speed Integrated Circuit
UML      Unified Modeling Language
References
[Abr90]
M. Abramovici, M. A. Breuer, and A. D. Friedman,
"Digital systems testing and testable design", IEEE
Press, ISBN 0-7803-1062-4, 1994.
[Alt12] "Processors from Altera and Embedded Alliance Partners", Altera, Web site: www.altera.com/devices/processor/emb-index.html, 2012
[Amo04] A. M. Amory, É. Cota, M. Lubaszewski, and F. G. Moraes, "Reducing test time with processor reuse in network-on-chip based systems", Proceedings of the Symposium on Integrated Circuits and System Design, pp. 111-116, 2004
[Arm12] "DesignStart for Processor IP", ARM, Web site: www.arm.com/products/processors/designstartprocessor-ip/index.php, 2012
[Aru05]
D. Arumí, R. Rodríguez-Montañés, and J. Figueras,
"Defective behaviours of resistive opens in
interconnect lines", Proceedings of European Test
Symposium, Tallinn, Estonia, pp. 28 - 33, May 2005
[Asc08] G. Ascia, V. Catania, M. Palesi, and D. Patti, "Implementation and analysis of a new selection strategy for adaptive routing in networks-on-chip", IEEE Transactions on Computers, vol. 57, (6), pp. 809-820, 2008.
[Ash59]
R. Ashenhurst, "The decomposition of switching
functions",
Proceedings
of
International
Symposium on Theory of Switching Functions, pp.
77-116, 1959
[Att01]
A. Attarha and M. Nourani, "Testing interconnects
for noise and skew in gigahertz SoC", Proceedings
of International Test Conference, pp. 305-314, 2001
[Bai00]
X. Bai, S. Dey, and J. Rajski, "Self-test methodology
for at-speed test of crosstalk in chip interconnects",
Proceedings of Design Automation Conference, pp.
619-624, 2000
[Bai04]
X. Bai and S. Dey, "High-level crosstalk defect
simulation methodology for system-on-chip
interconnects", Transactions on Computer-Aided
Design of Integrated Circuits and Systems, vol. 23,
(9), pp. 1355-1361, 2004.
[Ben01]
T. Bengtsson and E. Dubrova, "A sufficient
condition for detection of XOR-type logic",
Proceedings of Norchip, Stockholm, Sweden, pp.
271-278, November 2001
[Ben02]
L. Benini and G. De Micheli, "Networks on Chips:
A New SoC Paradigm", IEEE Computer, vol. 35, (1),
pp. 70-78, 2002.
[Ben03a] T. Bengtsson, "Boolean decomposition in combinational logic synthesis", Licentiate thesis, Royal Institute of Technology, Stockholm, ISSN 1651-4076, 2003.
[Ben03b] T. Bengtsson, A. Martinelli, and E. Dubrova, "A
BDD-based fast heuristic algorithm for disjoint
decomposition", Proceedings of Asia and South
Pacific Design Automation Conference, Kitakyushu,
Japan, pp. 191-196, January 2003
[Ben05a] T. Bengtsson, A. Jutman, S. Kumar, and R. Ubar, "Delay testing of asynchronous NoC interconnects", Proceedings of the International Conference on Mixed Design of Integrated Circuits and Systems, June 2005
[Ben05b] T. Bengtsson, A. Jutman, R. Ubar, and S. Kumar, "A
method for crosstalk fault detection in on-chip
buses", Proceedings of Norchip, Oulu, Finland, pp.
285-288, November 2005
[Ben06a] T. Bengtsson, A. Jutman, S. Kumar, R. Ubar, and Z.
Peng, "Analysis of a test method for delay faults in
NoC interconnects", Proceedings of East-West
Design & Test International Workshop (EWDTW),
pp. 42-46, September 2006
[Ben06b] T. Bengtsson, A. Jutman, S. Kumar, R. Ubar, and Z.
Peng, "Off-line testing of delay faults in NoC
interconnects",
Proceedings
of
Euromicro
Conference on Digital System Design: Architectures,
Methods and Tools, pp. 677 - 680, 2006
[Ben06c]
T. Bengtsson, S. Kumar, A. Jutman, and R. Ubar,
"An improved method for delay fault testing of
NoC interconnections", Proceedings of Special
Workshop on Future Interconnects and Networks on
Chip (along with Design And Test in Europe), pp.
March 2006
[Ben06d] T. Bengtsson, S. Kumar, and Z. Peng, "Application
area specific system level fault models: a case
study with a simple NoC switch", Proceedings of
International Design and Test Workshop (IDT), pp.
November 2006
[Ben06e] T. Bengtsson, S. Kumar, R. Ubar, and A. Jutman,
"Off-line testing of crosstalk induced glitch faults
in NoC interconnects", Proceedings of Norchip,
Linköping, Sweden, pp. 221-226, November 2006
[Ben08] T. Bengtsson, S. Kumar, R. Ubar, A. Jutman, and Z. Peng, "Test methods for crosstalk-induced delay and glitch faults in network-on-chip interconnects implementing asynchronous communication protocols", Computers and Digital Techniques, IET, vol. 2, (6), pp. 445-460, 2008.
[Ber97]
V. Bertacco and M. Damiani, "The disjunctive
decomposition of logic functions", Proceedings of
International Conference on Computer-Aided
Design, pp. 78-82, 1997
[Bra84]
R. K. Brayton, A. L. Sangiovanni-Vincentelli, C. T.
McMullen, and G. D. Hachtel, "Logic minimization
algorithms for VLSI synthesis", Kluwer Academic
Publishers, ISBN 0-89838-164-9, 1984.
[Bre01]
V. Bret and K. Keutzer, "Bus encoding to prevent
crosstalk delay", Proceedings of IEEE/ACM
International Conference on Computer Aided
Design, pp. 57-63, November 2001
[Bry86]
R. E. Bryant, "Graph-based algorithm for Boolean
function
manipulation",
Transactions
on
Computers, vol. C-35, pp. 677-691, 1986.
[Buo97]
G. Buonanno, F. Ferrandi, L. Ferrandi, F. Fummi,
and D. Sciuto, "How an "evolving" fault model
improves the behavioral test generation",
Proceedings of Great Lakes Symposium on VLSI,
pp. 124-130, 1997
[Cha05] K. Chakrabarty, V. Iyengar, and M. D. Krasniewski, "Test planning for modular testing of hierarchical SOCs", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, (3), pp. 435-448, 2005.
[Cha96] S. C. Chang, M. Marek-Sadowska, and T. Hwang, "Technology mapping for TLU FPGA's based on decomposition of binary decision diagrams", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, pp. 1226-1235, 1996.
[Cha97]
S. Chattopadhyay, S. Roy, and P. P. Chaudhuri,
"KGPMIN: an efficient multilevel multioutput
AND-OR-XOR minimizer", Transactions on
Computer-Aided Design of Integrated Circuits and
Systems, vol. 16, (3), pp. 257-265, 1997.
[Che00]
K.-T. T. Cheng, S. Dey, M. Rodgers, and K. Roy,
"Test
challenges
for
deep
sub-micron
technologies", Proceedings of Design Automation
Conference, pp. 142-149, 2000
[Cho94] C. H. Cho and J. R. Armstrong, "B-algorithm: a behavioral test generation algorithm", Proceedings of International Test Conference, pp. 968-979, October 1994
[Cho96] T.-L. Chou and K. Roy, "Estimation of activity for static and domino CMOS circuits considering signal correlations and simultaneous switching", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, (10), pp. 1257-1265, 1996.
[Cor00] F. Corno, G. Cumani, M. S. Reorda, and G. Squillero, "An RT-level fault model with high gate level correlation", Proceedings of High-Level Design Validation and Test Workshop, pp. 3-8, 2000
[Cor01] F. Corno, M. S. Reorda, and G. Squillero, "An interpretation framework for evaluating high-level fault models and ATPG capabilities", Proceedings of Design of Circuits and Integrated Systems, pp. 273-278, 2001
[Cos97] J. C. Costa, J. C. Monteiro, and S. Devadas, "Switching activity estimation using limited depth reconvergent path analysis", Proceedings of International Symposium on Low Power Electronics, pp. 184-189, 1997
[Cot03b] É. Cota, M. Kreutz, C. A. Zeferino, L. Carro, M. Lubaszewski, and A. Susin, "The impact of NoC reuse on the testing of core-based systems", Proceedings of VLSI Test Symposium, pp. 128-133, 2003
[Cot03a] É. Cota, L. Carro, F. Wagner, and M. Lubaszewski, "Power-aware NoC reuse on the testing of core-based systems", Proceedings of International Test Conference, pp. 612-621, 2003
[Cur62] H. A. Curtis, "A new approach to the design of switching circuits", D. Van Nostrand Company, 1962.
[Cuv99] M. Cuviello, S. Dey, X. Bai, and Y. Zhao, "Fault modeling and simulation for crosstalk in system-on-chip interconnects", Proceedings of IEEE/ACM International Conference on Computer-Aided Design, pp. 297-303, 1999
[Dal08] J. Dalmasso, É. Cota, M.-L. Flottes, and B. Rouzeyre, "Improving the test of NoC-based SoCs with help of compression schemes", Proceedings of Symposium on VLSI, pp. 139-144, April 2008
[Dem94] G. De Micheli, "Synthesis and optimization of digital circuits", McGraw-Hill, Inc., ISBN 0-07-113271-6, 1994.
[Deb98] D. Debnath and T. Sasao, "A heuristic algorithm to design AND-OR-EXOR three-level networks", Proceedings of Asia and South Pacific Design Automation Conference, pp. 67-74, 1998
[Dey98] S. Dey, A. Raghunathan, and R. K. Roy, "Considering testability during high-level design (embedded tutorial)", Proceedings of Asia and South Pacific Design Automation Conference, pp. 205-210, 1998
[Dua01] C. Duan, A. Tirumala, and S. P. Khatri, "Analysis and avoidance of cross-talk in on-chip buses", Proceedings of Hot Interconnects 9, pp. 133-138, August 2001
[Dub97a] E. V. Dubrova, D. M. Miller, and J. C. Muzio, "AOXMIN: A three-level heuristic AND-OR-XOR minimizer for Boolean functions", Proceedings of 3rd International Workshop on the Applications of the Reed-Muller Expansion in Circuit Design, pp. 209-218, 1997
[Dub97b] E. V. Dubrova, J. C. Muzio, and B. von Stengel, "Finding composition trees for multiple-valued functions", Proceedings of 27th International Symposium on Multiple-Valued Logic, pp. 19-26, 1997
[Dub99] E. V. Dubrova, D. M. Miller, and J. C. Muzio, "AOXMIN-MV: A heuristic algorithm for AND-OR-XOR minimization", Proceedings of 4th International Workshop on the Applications of the Reed-Muller Expansion in Circuit Design, pp. 37-54, August 1999
[Dug08] K. K. Duganapalli, A. K. Palit, and W. Anheier, "Test pattern generation for worst-case crosstalk faults in DSM chips using genetic algorithm", Proceedings of Electronics System-Integration Technology Conference, pp. 393-402, September 2008
[Dum03] T. Dumitras and R. Mărculescu, "On-chip stochastic communication", Proceedings of Design, Automation and Test in Europe, pp. 790-795, 2003
[Dut96] S. Dutt and W. Deng, "VLSI circuit partitioning by cluster-removal using iterative improvement techniques", Proceedings of IEEE/ACM International Conference on CAD, pp. 194-200, 1996
[Eft05] A. Efthymiou, J. Bainbridge, and D. Edwards, "Test pattern generation and partial-scan methodology for an asynchronous SoC interconnect", Transactions on VLSI Systems, vol. 13, (12), pp. 1384-1393, 2005.
[Eld59] R. D. Eldred, "Test routines based on symbolic logical statements", Journal of the ACM, vol. 6, (1), pp. 33-37, 1959.
[Fal01] F. Fallah, S. Devadas, and K. Keutzer, "OCCOM - Efficient computation of observability-based code coverage metrics for functional verification", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 20, (8), pp. 1003-1015, 2001.
[Fer98] F. Ferrandi, F. Fummi, and D. Sciuto, "Implicit test generation for behavioral VHDL models", Proceedings of International Test Conference, pp. 587-569, 1998
[Fer01] F. Ferrandi, G. Ferrara, D. Sciuto, A. Fin, and F. Fummi, "Functional test generation for behaviorally sequential models", Proceedings of Design, Automation and Test in Europe, pp. 403-410, 2001
[Fid82] C. M. Fiduccia and R. M. Mattheyses, "A linear-time heuristic for improving network partitions", Proceedings of IEEE/ACM Design Automation Conference, pp. 175-181, 1982
[Fra07] A. P. Frantz, M. Cassel, F. L. Kastensmidt, É. Cota, and L. Carro, "Crosstalk- and SEU-aware Networks on Chips", Design & Test of Computers, vol. 24, (4), pp. 340-350, 2007.
[Gol02] O. Goloubeva, M. S. Reorda, and M. Violante, "Experimental analysis of fault models for behavioral-level test generation", Proceedings of IEEE Design & Diagnostics of Electronic Circuits & Systems, pp. 416-419, 2002
[Gon02] P. T. Gonciari, B. M. Al-Hashimi, and N. Nicolici, "Improving compression ratio, area overhead and test application time for system-on-a-chip test data compression/decompression", Proceedings of Design, Automation and Test in Europe, pp. 604-611, 2002
[Gre07] C. Grecu, A. Ivanov, R. Saleh, and P. P. Pande, "Testing Network-on-Chip communication fabrics", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, (12), pp. 2201-2214, 2007.
[Gup05] S. Gupta and S. Katkoori, "Intrabus crosstalk estimation using word-level statistics", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, (3), pp. 469-478, 2005.
[Han95a] M. C. Hansen and J. P. Hayes, "High-level test generation using physically-induced faults", Proceedings of VLSI Test Symposium, pp. 20-28, May 1995
[Han95b] M. C. Hansen and J. P. Hayes, "High-level test generation using symbolic scheduling", Proceedings of International Test Conference, pp. 586-595, October 1995
[Hem99] A. Hemani, T. Meincke, S. Kumar, A. Postula, T. Olsson, P. Nilsson, J. Öberg, P. Ellervee, and D. Lundqvist, "Lowering power consumption in clock by using globally asynchronous locally synchronous design style", Proceedings of Design Automation Conference, pp. 873-878, 1999
[Hey05] P. Heydari and M. Pedram, "Capacitive coupling noise in high-speed VLSI circuits", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, (3), pp. 478-488, 2005.
[Ho04] R. Ho, J. Gainsley, and R. Drost, "Long wires and asynchronous control", Proceedings of International Symposium on Asynchronous Circuits and Systems, pp. 240-249, 2004
[Hos06] M. Hosseinabady, A. Banaiyan, M. N. Bojnordi, and Z. Navabi, "A concurrent testing method for NoC switches", Proceedings of Design, Automation and Test in Europe, pp. 6-10, March 2006
[Hos07] M. Hosseinabady, A. Dalirsani, and Z. Navabi, "Using the inter- and intra-switch regularity in NoC switch testing", Proceedings of Design, Automation & Test in Europe Conference & Exhibition, pp. 1-6, April 2007
[Hua08] L. Huang, F. Yuan, and X. Xu, "On reliable modular testing with vulnerable test access mechanisms", Proceedings of 45th Design Automation Conference, pp. 834-839, June 2008
[IEEE01] IEEE, "IEEE Standard 1149.1, Standard test access port and boundary-scan architecture (2001 revision)", 2001.
[IEEE05] IEEE, "IEEE Standard 1500, Standard testability method for embedded core-based integrated circuits", 2005.
[Ism99] Y. I. Ismail, E. G. Friedman, and J. L. Neves, "Figures of merit to characterize the importance of on-chip inductance", IEEE Transactions on Very Large Scale Integration Systems, vol. 7, (4), pp. 442-449, 1999.
[ITRS08] "The international technology roadmap for semiconductors", ITRS, Web site: http://www.itrs.net/Links/2008ITRS/Update/2008_Update.pdf, 2008
[Jab00] A. Jabir and J. Saul, "Heuristic AND-OR-EXOR three-level minimisation algorithm for multiple-output incompletely-specified Boolean functions", Computers and Digital Techniques, IEE Proceedings, vol. 147, (6), pp. 451-461, 2000.
[Jab02] A. Jabir and J. Saul, "Minimisation algorithm for three-level mixed AND-OR-EXOR/AND-OR-EXNOR representation of Boolean functions", Computers and Digital Techniques, IEE Proceedings, vol. 149, (3), pp. 82-96, 2002.
[Jer98] G. Jervan, A. Markus, P. Paomets, J. Raik, and R. Ubar, "A CAD system for teaching digital test", Proceedings of 2nd European Workshop on Microelectronics Education, Noordwijkerhout, The Netherlands, pp. 287-290, 1998
[Jer02] G. Jervan, Z. Peng, O. Goloubeva, M. S. Reorda, and M. Violante, "High-level and hierarchical test sequence generation", Proceedings of International Workshop on High Level Design Validation and Test, pp. 169-174, 2002
[Jha03] N. Jha and S. Gupta, "Testing of digital systems", Cambridge University Press, ISBN 0-521-77356-3, 2003.
[Jun08] S. Jung, N. Zang, P. Eunsuk, and J. Kim, "Crosstalk avoidance method considering multi-aggressors", Proceedings of International SoC Design Conference, pp. II-158 - II-161, November 2008
[Jut04] A. Jutman, "At-speed on-chip diagnosis of board-level interconnect faults", Proceedings of European Test Symposium, pp. 2-7, 2004
[Kar53] M. Karnaugh, "The map method for synthesis of combinational logic circuits", Transactions of the American Institute of Electrical Engineers, Part I: Communication and Electronics, vol. 72, (9), pp. 593-599, 1953.
[Kar88] K. Karplus, "Using if-then-else DAGs for multi-level logic minimization", University of California Santa Cruz, Technical Report UCSC-CRL-88-29, 1988.
[Kim05] J. Kim, D. Park, T. Theocharides, N. Vijaykrishnan, and C. R. Das, "A low latency router supporting adaptivity for on-chip interconnects", Proceedings of Design Automation Conference, pp. 559-564, 2005
[Kri84] B. Krishnamurthy, "An improved min-cut algorithm for partitioning VLSI networks", Transactions on Computers, vol. C-33, pp. 438-446, 1984.
[Krs01] A. Krstic, J.-J. Liou, Y.-M. Jiang, and K.-T. Cheng, "Delay testing considering crosstalk-induced effects", Proceedings of International Test Conference, pp. 558-567, 2001
[Kum02] S. Kumar, A. Jantsch, M. Millberg, J. Öberg, J.-P. Soininen, M. Forsell, K. Tiensyrjä, and A. Hemani, "A network on chip architecture and design methodology", Proceedings of Computer Society Annual Symposium on VLSI, pp. 117-124, 2002
[Kun05] S. Kundu, S. T. Zachariah, and Y.-S. Chang, "On modeling crosstalk faults", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, (12), pp. 1909-1915, 2005.
[Lai93] Y.-T. Lai, M. Pedram, and S. B. K. Vrudhula, "BDD based decomposition of logic functions with application to FPGA synthesis", Proceedings of IEEE/ACM Design Automation Conference, pp. 642-647, 1993
[Laj00] M. Lajolo, L. Lavagno, M. Rebaudengo, M. S. Reorda, and M. Violante, "Behavioral-level test vector generation for system-on-chip designs", Proceedings of International High Level Design Validation Workshop, pp. 21-26, November 2000
[Lar08] A. Larsson, "Test optimization for core-based system-on-chip", Linköping Studies in Science and Technology, Dissertation No. 1222, Linköping University, 2008.
[Lar04] E. Larsson, K. Arvidsson, H. Fujiwara, and Z. Peng, "Efficient test solutions for core-based designs", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, (5), pp. 758-775, 2004.
[Len79] T. Lengauer and R. E. Tarjan, "A fast algorithm for finding dominators in a flowgraph", ACM Transactions on Programming Languages and Systems, vol. 1, (1), pp. 121-141, 1979.
[Li06] J. Li and L. Behjat, "A connectivity based clustering algorithm with application to VLSI circuit partitioning", IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 53, (5), pp. 384-388, 2006.
[Li09] K. S.-M. Li, C.-L. Lee, C. Su, and J. E. Chen, "A unified detection scheme for crosstalk effects in interconnection bus", Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, (2), pp. 306-311, 2009.
[Liu03] J. Liu, L.-R. Zheng, D. Pamunuwa, and H. Tenhunen, "A global wire planning scheme for network-on-chip", Proceedings of International Symposium on Circuits and Systems (ISCAS), vol. 4, pp. IV-892 - IV-895, 2003.
[Liu04] J. Liu, L.-R. Zheng, and H. Tenhunen, "Interconnect intellectual property for Network-on-Chip (NoC)", Journal of Systems Architecture, vol. 50, (2-3), pp. 65-79, 2004.
[Lus10] A. K. Lusala and J.-D. Legat, "Combining circuit and packet switching with bus architecture in a NoC for real-time applications", Proceedings of International Symposium on Circuits and Systems (ISCAS), pp. 2880-2883, 2010
[Mat98] Y. Matsunaga, "An exact and efficient algorithm for disjunctive decomposition", Proceedings of Workshop on Synthesis And System Integration of MIxed Technologies (SASIMI), pp. 44-50, 1998
[Mcc56] E. McCluskey, "Minimization of Boolean functions", The Bell System Technical Journal, vol. 35, pp. 1417-1444, 1956.
[Mic06] G. De Micheli and L. Benini, "Networks on Chips", Morgan Kaufmann, ISBN-10: 0123705215, 2006.
[Min98] S. Minato and G. De Micheli, "Finding all simple disjunctive decompositions using irredundant sum-of-products forms", Proceedings of International Conference on Computer-Aided Design, pp. 111-117, 1998
[Mis01] A. Mishchenko, B. Steinbach, and M. Perkowski, "An algorithm for bi-decomposition of logic functions", Proceedings of Design Automation Conference, pp. 103-108, 2001
[Moo65] G. E. Moore, "Cramming more components onto integrated circuits (Reprinted from Electronics magazine, volume 38, number 8, April 19, 1965, pp. 114 ff)", Proceedings of the IEEE, vol. 86, (1), pp. 82-85, 1998.
[Mor01] A. Morosov, K. Chakrabarty, M. Gössel, and B. Bhattacharya, "Design of parameterizable error-propagating space compactors for response observation", Proceedings of IEEE VLSI Test Symposium, pp. 48-53, 2001
[Mou00] S. Mourad and Y. Zorian, "Principles of testing electronic circuits", John Wiley and Sons Ltd, ISBN 0-471-31931-7, 2000.
[Möh85] R. H. Möhring, "Algorithmic aspects of the substitution decomposition in optimization over relations, set systems and Boolean functions", Annals of Operations Research, vol. 4, (1), pp. 195-225, 1985.
[Nae04] A. Naeemi, J. A. Davis, and J. D. Meindl, "Compact physical models for multilevel interconnect crosstalk in gigascale integration (GSI)", Transactions on Electron Devices, vol. 51, (11), pp. 1902-1912, 2004.
[Nak11] Y. Nakata, Y. Takeuchi, H. Kawaguchi, and M. Yoshimoto, "A process-variation-adaptive network-on-chip with variable-cycle routers", Proceedings of 14th Euromicro Conference on Digital System Design, pp. 801-804, 2011
[Nor98] P. Nordholz, D. Treytnar, J. Otterstedt, H. Grabinski, D. Niggemeyer, and T. W. Williams, "Signal integrity problems in deep submicron arising from interconnects between cores", Proceedings of VLSI Test Symposium, pp. 28-33, 1998
[Nur04] J. Nurmi, H. Tenhunen, J. Isoaho, and A. Jantsch, "Interconnect-centric design for advanced SoC and NoC", Kluwer Academic Publishers, ISBN 1402078358, 2004.
[Ope12] "OpenCores", OpenCores, Web site: opencores.org/projects, 2012
[Pal05] A. K. Palit, V. Meyer, W. Anheier, and J. Schloeffel, "ABCD modeling of crosstalk coupling noise to analyze the signal integrity losses on the victim interconnect in DSM chips", Proceedings of 18th International Conference on VLSI Design, pp. 354-359, 2005
[Pam03] D. Pamunuwa, "Modelling and analysis of interconnects for deep submicron SoC", Doctoral thesis, Stockholm: Royal Institute of Technology, 2003.
[Pam05] D. Pamunuwa, S. Elassaad, and H. Tenhunen, "Modeling delay and noise in arbitrary coupled RC trees", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, (11), pp. 1725-1739, 2005.
[Pan05] P. P. Pande, G. De Micheli, C. Grecu, A. Ivanov, and R. Saleh, "Design, Synthesis, and Test of Networks on Chips", Design & Test of Computers, September-October 2005, pp. 404-412.
[Pil95] L. Pileggi, "Coping with RC(L) interconnect design headaches", Proceedings of International Conference on Computer-Aided Design, pp. 246-253, 1995
[Pra08] S. N. Pradhan, M. T. Kumar, and S. Chattopadhyay, "Three-level AND-OR-XOR network synthesis: A GA based approach", Proceedings of Asia Pacific Conference on Circuits and Systems, pp. 574-577, November 2008
[Qui52] W. V. Quine, "The problem of simplifying truth functions", American Mathematical Monthly, vol. 59, (8), pp. 521-531, 1952.
[Rah10] M. A. Rahimian, S. Mohammadi, and M. Fattah, "A high-throughput, metastability-free GALS channel based on pausible clock method", Proceedings of Asia Symposium on Quality Electronic Design, pp. 294-300, 2010
[Rai06] J. Raik, V. Govind, and R. Ubar, "An external test approach for network-on-a-chip switches", Proceedings of 15th Asian Test Symposium, pp. 437-442, November 2006
[Ros07] D. Rossi, P. Angelini, and C. Metra, "Configurable error control scheme for NoC signal integrity", Proceedings of 13th IEEE International On-Line Testing Symposium, pp. 43-48, July 2007
[Rot62] J. P. Roth and R. M. Karp, "Minimization over Boolean graphs", IBM Journal of Research and Development, vol. 6, pp. 227-238, 1962.
[Rud93] R. Rudell, "Dynamic variable ordering for ordered binary decision diagrams", Proceedings of International Conference on Computer-Aided Design, pp. 42-47, 1993
[Sas95] T. Sasao, "A design method for AND-OR-EXOR three-level networks", Proceedings of International Workshop on Logic Synthesis, pp. 8:11-8:20, May 1995
[Sas98] T. Sasao and M. Matsuura, "DECOMPOS: An integrated system for functional decomposition", Proceedings of ACM/IEEE International Workshop on Logic Synthesis, pp. 471-477, 1998
[Saw98] H. Sawada, S. Yamashita, and A. Nagoya, "Restructuring logic representations with easily detectable simple disjunctive decompositions", Proceedings of Design Automation Conference, pp. 755-759, 1998
[Sen92] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. Sangiovanni-Vincentelli, "SIS: A system for sequential circuit synthesis", University of California, Berkeley, 1992.
[She01] S. R. Shelar and S. S. Sapatnekar, "Recursive bipartitioning of BDDs for performance driven synthesis of pass transistor logic circuits", Proceedings of International Conference on Computer Aided Design, pp. 449-452, November 2001
[Sin02] A. Sinha, S. K. Gupta, and M. A. Breuer, "Validation and test issues related to noise induced by parasitic inductances of VLSI interconnects", Transactions on Advanced Packaging, vol. 25, (3), pp. 329-339, 2002.
[Sir02] W. Sirisaengtakin and S. K. Gupta, "Enhanced crosstalk fault model and methodology to generate tests for arbitrary inter-core interconnect topology", Proceedings of Asian Test Symposium, pp. 163-169, November 2002
[Som98] F. Somenzi, "CUDD: CU Decision Diagram package release 2.3.0", Department of Electrical and Computer Engineering, University of Colorado at Boulder, 1998.
[Son09] J. Song, J. Han, H. Yi, T. Jung, and S. Park, "Highly compact interconnect test patterns for crosstalk and static faults", Transactions on Circuits and Systems II: Express Briefs, vol. 56, (5), pp. 419-423, 2009.
[Sri08] S. R. Sridhara, G. Balamurugan, and N. R. Shanbhag, "Joint equalization and coding for on-chip bus communication", Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, (3), pp. 314-318, 2008.
[Sta95] T. Stanion and C. Sechen, "Quasi-algebraic decompositions of switching functions", Proceedings of Sixteenth Conference on Advanced Research in VLSI, pp. 358-367, 1995
[Ste06] K. Stewart and S. Tragoudas, "Interconnect testing for networks on chip", Proceedings of VLSI Test Symposium, pp. 100-191, 2006
[Su00] C. Su, Y.-T. Chen, M.-J. Huang, G.-N. Chen, and C.-L. Lee, "All digital built-in delay and crosstalk measurement for on-chip buses", Proceedings of Design, Automation and Test in Europe, pp. 527-531, March 2000
[Sun08] F. Sun and Y. Xia, "BDD based detection algorithm for XOR-type logic", Proceedings of International Conference on Communication Technology, pp. 351-354, November 2008
[Tam07] R. Tamhankar, S. Murali, S. Stergiou, A. Pullini, F. Angiolini, L. Benini, and G. De Micheli, "Timing-error-tolerant network-on-chip design methodology", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 26, (7), pp. 1497-2007, 2007.
[Tes05] M. Teslenko, A. Martinelli, and E. Dubrova, "Bound-set preserving ROBDD variable orderings may not be optimum", Transactions on Computers, vol. 54, (2), pp. 236-237, 2005.
[Tra08] X.-T. Tran, Y. Thonnart, J. Durupt, V. Beroulle, and C. Robach, "A design-for-test implementation of an asynchronous network-on-chip architecture and its associated test pattern generation and application", Proceedings of Second ACM/IEEE International Symposium on Network-on-Chip, pp. 149-158, April 2008
[Uba96] R. Ubar, "Test synthesis with alternative graphs", IEEE Design & Test of Computers, vol. 13, (1), pp. 48-57, 1996.
[Uba04] R. Ubar, M. Jenihhin, G. Jervan, and Z. Peng, "Hybrid BIST optimization for core-based systems with test pattern broadcasting", Proceedings of Second IEEE International Workshop on Electronic Design, Test and Applications, pp. 3-8, January 2004
[Ver03] B. Vermeulen, J. Dielissen, K. Goossens, and C. Ciordas, "Bringing communication networks on a chip: Test and verification implications", IEEE Communications Magazine, September 2003, pp. 74-81.
[Wie02] P. Wielage and K. Goossens, "Networks on silicon: Blessing or nightmare?", Proceedings of Euromicro Symposium on Digital System Design, pp. 196-200, 2002
[Von91] B. von Stengel, "Eine Dekompositionstheorie für mehrstellige Funktionen" (A decomposition theory for multi-place functions), in Mathematical Systems in Economics, vol. 123, Anton Hain, Frankfurt, 1991.
[Yan99] C. Yang, V. Singhal, and M. Ciesielski, "BDD decomposition for efficient logic synthesis", Proceedings of International Conference on Computer Design, pp. 626-631, 1999
[Zha03] Y. Zhao and S. Dey, "Fault-coverage analysis techniques of crosstalk in chip interconnects", Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, (6), pp. 770-782, 2003.
[Zha04] Y. Zhao, S. Dey, and L. Chen, "Double sampling data checking technique: an online testing solution for multisource noise-induced errors on on-chip interconnects and buses", Transactions on VLSI Systems, vol. 12, (7), pp. 746-755, 2004.
[Zim03] H. Zimmer and A. Jantsch, "A fault model notation and error-control scheme for switch-to-switch buses in a network-on-chip", Proceedings of International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2003