Enabling and Achieving Self-Management for Large
Scale Distributed Systems
Platform and Design Methodology for Self-Management
AHMAD AL-SHISHTAWY
Licentiate Thesis
Stockholm, Sweden 2010
TRITA-ICT/ECS AVH 10:01
ISSN 1653-6363
ISRN KTH/ICT/ECS/AVH-10/01-SE
ISBN 978-91-7415-589-1
KTH School of Information and
Communication Technology
SE-164 40 Kista
SWEDEN
Academic dissertation which, with the permission of Kungl Tekniska högskolan (KTH Royal Institute of Technology), is submitted for public examination for the degree of Licentiate of Technology in Computer Science on Friday, 9 April 2010, at 14:00 in Hall D, Forum IT-Universitetet, Kungl Tekniska högskolan, Isafjordsgatan 39, Kista.
© Ahmad Al-Shishtawy, April 2010
Tryck: Universitetsservice US AB
Abstract
Autonomic computing is a paradigm that aims at reducing administrative
overhead by using autonomic managers to make applications self-managing.
To better deal with large-scale dynamic environments, and to improve scalability, robustness, and performance, we advocate distributing management functions among several cooperative autonomic managers that coordinate their activities in order to achieve management objectives. Programming
autonomic management in turn requires programming environment support
and higher level abstractions to become feasible.
In this thesis we present an introductory part and a number of papers that summarize our work in the area of autonomic computing. We focus on
enabling and achieving self-management for large scale and/or dynamic distributed applications. We start by presenting our platform, called Niche, for
programming self-managing component-based distributed applications. Niche
supports a network-transparent view of the system architecture, which simplifies the design of application self-* code. Niche provides a concise and expressive API for self-* code. The implementation of the framework relies on the scalability
and robustness of structured overlay networks. We have also developed a
distributed file storage service, called YASS, to illustrate and evaluate Niche.
After introducing Niche, we present a methodology and design space for designing the management part of a distributed self-managing application in a distributed manner. We define design steps that include partitioning of management functions and orchestration of multiple autonomic
managers. We illustrate the proposed design methodology by applying it to
the design and development of an improved version of our distributed storage
service YASS as a case study.
We continue by presenting a generic policy-based management framework which has been integrated into Niche. Policies are sets of rules that govern the system behavior and reflect the business goals or system management objectives. Policy-based management is introduced to simplify management and reduce overhead by setting up policies to govern system behavior. A prototype of the framework is presented and two generic
policy languages (policy engines and corresponding APIs), namely SPL and
XACML, are evaluated using our self-managing file storage application YASS
as a case study.
Finally, we present a generic approach to achieve robust services that
is based on finite state machine replication with dynamic reconfiguration of
replica sets. We contribute a decentralized algorithm that maintains the set of resources hosting service replicas in the presence of churn. We use this approach to implement robust management elements as robust services that can operate despite churn.
To my family
Acknowledgements
This thesis would not have been possible without the help and support of many
people around me, only some of whom I have space to acknowledge here.
I would like to start by expressing my gratitude to my supervisor, Prof. Vladimir
Vlassov, for his continuous support, ideas, patience, and encouragement that have
been invaluable on both academic and personal levels. His insightful advice and
unsurpassed knowledge kept me focused on my goals.
I am grateful to Per Brand for sharing his knowledge and experience with me
during my research and for his contributions and feedback in fine-tuning my work
till its final state. I also feel privileged to have the opportunity to work under the
supervision of Prof. Seif Haridi. His deep knowledge in many fields of computer
science, fruitful discussions, and enthusiasm have been a tremendous source of
inspiration. I acknowledge the help and support given to me by Prof. Thomas
Sjöland, the head of software and computer systems unit at KTH. I would like to
thank Sverker Janson, the director of the computer systems laboratory at SICS, for his precious advice and guidance that improved the quality of my research and oriented me in the right direction.
I would also like to acknowledge the Grid4All European project that partially
funded this thesis. I take this opportunity to thank the Grid4All team, especially Konstantin Popov and Joel Höglund, for being a constant source of help.
I am indebted to all my colleagues at KTH and SICS, especially Tallat Shafaat,
Cosmin Arad, Ali Ghodsi, Amir Payberah, and Fatemeh Rahimian for making the
environment at the lab both constructive and fun.
Finally, I owe my deepest gratitude to my wife Marwa and to my daughters
Yara and Awan for their love and support at all times. I am most grateful to my
parents for helping me to be where I am now.
Contents

Contents
List of Figures
List of Tables
List of Algorithms

I Thesis Overview

1 Introduction
   1.1 Main Contributions
   1.2 Thesis Organization

2 Background
   2.1 Autonomic Computing
   2.2 The Fractal Component Model
   2.3 Structured Peer-to-Peer Overlay Networks
   2.4 State of the Art in Self-Management for Large Scale Distributed Systems

3 Niche: A Platform for Self-Managing Distributed Applications
   3.1 Niche
   3.2 Demonstrators
   3.3 Lessons Learned

4 Thesis Contribution
   4.1 List of Publications
   4.2 Enabling Self-Management
   4.3 Design Methodology for Self-Management
   4.4 Policy Based Self-Management
   4.5 Replication of Management Elements

5 Conclusions and Future Work
   5.1 Enabling Self-Management
   5.2 Design Methodology for Self-Management
   5.3 Policy based Self-Management
   5.4 Replication of Management Elements
   5.5 Discussion and Future Work

Bibliography

II Research Papers

6 Enabling Self-Management of Component Based Distributed Applications
   6.1 Introduction
   6.2 The Management Framework
   6.3 Implementation and evaluation
   6.4 Related Work
   6.5 Future Work
   6.6 Conclusions

Bibliography

7 A Design Methodology for Self-Management in Distributed Environments
   7.1 Introduction
   7.2 The Distributed Component Management System
   7.3 Steps in Designing Distributed Management
   7.4 Orchestrating Autonomic Managers
   7.5 Case Study: A Distributed Storage Service
   7.6 Related Work
   7.7 Conclusions and Future Work

Bibliography

8 Policy Based Self-Management in Distributed Environments
   8.1 Introduction
   8.2 Niche: A Distributed Component Management System
   8.3 Niche Policy Based Management
   8.4 Framework Prototype
   8.5 Related Work
   8.6 Conclusions and Future Work

Bibliography

III Technical Report

9 Achieving Robust Self-Management for Large-Scale Distributed Applications
   9.1 Introduction
   9.2 Background
   9.3 Automatic Reconfiguration of Replica Sets
   9.4 Robust Management Elements in Niche
   9.5 Conclusions and Future Work

Bibliography
List of Figures

2.1 A simple autonomic computing architecture with one autonomic manager.

6.1 Application Architecture.
6.2 Ids and Handlers.
6.3 Structure of MEs.
6.4 Composition of MEs.
6.5 YASS Functional Part
6.6 YASS Non-Functional Part
6.7 Parts of the YASS application deployed on the management infrastructure.

7.1 The stigmergy effect.
7.2 Hierarchical management.
7.3 Direct interaction.
7.4 Shared Management Elements.
7.5 YASS Functional Part
7.6 Self-healing control loop.
7.7 Self-configuration control loop.
7.8 Hierarchical management.
7.9 Sharing of Management Elements.

8.1 Niche Management Elements
8.2 Policy Based Management Architecture
8.3 YASS self-configuration control loop
8.4 XACML policy evaluation results
8.5 SPL policy evaluation results

9.1 State Machine Architecture
9.2 Replica Placement Example
List of Tables

8.1 Policy Evaluation Result (in milliseconds)
8.2 Policy Reload Result (in milliseconds)
List of Algorithms

9.1 Helper Procedures
9.2 Replicated State Machine API
9.3 Execution
9.4 Churn Handling
9.5 SM maintenance (handled by the container)
Part I

Thesis Overview
Chapter 1
Introduction
Grid, Cloud and P2P systems provide pooling and coordinated use of distributed
resources and services. Most P2P systems have self-management properties that
make them able to operate in the presence of resource churn (join, leave, and
failure). The self-management capability hides management complexity and reduces the cost of ownership (administration and maintenance) of P2P systems. On the other hand, most Grid systems are built with an assumption of a stable and rather static Grid infrastructure that in most cases is managed by system administrators. The complexity and management overheads of Grids make it difficult for IT-inexperienced users to deploy and to use Grids in order to take advantage of
resource sharing in dynamic Virtual Organizations (VOs) similar to P2P user communities. In this research we address the challenge of enabling self-management
in large-scale and/or dynamic distributed systems, e.g. domestic Grids, in order
to hide the system complexity and to automate its management, i.e. organization,
tuning, healing and protection.
Most distributed systems and applications are built of distributed components
using a distributed component model such as the Grid Component Model (GCM);
therefore we believe that self-management should be enabled on the level of components in order to support distributed component models for development of large
scale dynamic distributed systems and applications. These distributed applications
need to manage themselves by having some self-* properties (i.e. self-configuration,
self-healing, self-protection, self-optimization) in order to survive in a highly dynamic distributed environment. All self-* properties are based on feedback control
loops, known as MAPE-K loops (monitor, analyze, plan, execute – knowledge), that come from the field of Autonomic Computing. The first step towards self-management in large-scale distributed systems is to provide distributed sensing
and actuating services that are self-managing by themselves. Another important
step is to provide robust management abstractions that can be used to construct
MAPE-K loops. These services and abstractions should provide strong guarantees
in the quality of service under churn and system evolution.
The core of our approach to self-management is based on leveraging the self-organizing properties of structured overlay networks, for providing basic services
and runtime support, together with component models, for reconfiguration and
introspection. The end result is an autonomic computing platform suitable for large-scale dynamic distributed environments. Structured P2P systems are designed to
work in the highly dynamic distributed environment we are targeting. They have
self-* properties and can tolerate churn. Therefore structured P2P systems can
be used as a base to support self-management in a distributed system, e.g. as
a communication medium (for message passing, broadcast, and routing), lookup
(distributed hashtables and name based communication), and publish/subscribe
service.
To better deal with dynamic environments, and to improve scalability, robustness, and performance, we advocate distributing management functions among several cooperative managers that coordinate their activities in order to achieve management objectives. Several issues appear when trying to enable self-management
for large scale complex distributed systems that do not appear in centralized and
cluster based systems. These issues include long network delays and the difficulty of
maintaining global knowledge of the system. These problems affect the observability/controllability of the control system and may prevent us from directly applying
classical control theory to build control loops. Another important issue is the coordination between multiple autonomic managers to avoid conflicts and oscillations.
Autonomic managers must also be replicated in dynamic environments to tolerate
failures.
1.1 Main Contributions
The main contributions of the thesis are:
• First, a platform called Niche that enables the development, deployment, and
execution of large scale component based distributed applications in dynamic
environments;
• Second, a design methodology that supports the design of distributed management and defines different interaction patterns between managers;
• Third, a framework for using policy management with Niche. We also evaluate
the use of two policy languages versus hard coded policies;
• Finally, an algorithm to automate the reconfiguration of the nodes hosting a replicated state machine in order to tolerate resource churn. The algorithm is based on SON algorithms and service migration techniques. The algorithm is used to implement robust management elements as self-healing replicated state machines.
1.2 Thesis Organization
The thesis is organized into three parts as follows. Part I is organized into five chapters, including this chapter. Chapter 2 lays out the necessary background for the thesis. Chapter 3 introduces our platform “Niche” for enabling self-management. The thesis contribution is presented in Chapter 4, followed by the conclusions and future work in Chapter 5. Part II includes three research papers that were produced during the thesis work. Finally, Part III presents a technical report.
Chapter 2
Background
This chapter lays out the necessary background for the thesis. The core of our
approach to self-management is based on leveraging the self-organizing properties
of structured overlay networks, for providing basic services and runtime support,
together with component models, for reconfiguration and introspection. The end
result is an autonomic computing platform suitable for large-scale dynamic distributed environments. These key concepts are described below.
2.1 Autonomic Computing
In 2001, Paul Horn from IBM coined the term autonomic computing to mark the
start of a new paradigm of computing [1]. Autonomic computing focuses on tackling
the problem of growing software complexity. This problem poses a great challenge
for both science and industry because the increasing complexity of computing systems makes it more difficult for the IT staff to deploy, manage and maintain such
systems. This dramatically increases the cost of management. Furthermore, if not managed properly and in a timely manner, the performance of the system may drop or the
system may even fail. Another drawback of increasing complexity is that it forces
us to focus more on handling management issues instead of improving the system
itself and moving forward towards new innovative applications.
Autonomic computing was inspired by the autonomic nervous system, which continuously regulates and protects our bodies subconsciously [2], leaving us free
to focus on other work. Similarly, an autonomic system should be aware of its
environment and continuously monitor itself and adapt accordingly with minimal
human involvement. Human managers should only specify higher level policies that
define the general behaviour of the system. This will reduce the cost of management,
improve performance, and enable the development of new innovative applications.
Thus the purpose of autonomic computing is not to replace humans entirely but rather
to enable systems to adjust and adapt themselves automatically to reflect evolving
policies defined by humans.
Properties of Self-Managing Systems
IBM proposed the main properties that any self-managing system should have [3] to be
an autonomic system. These properties are usually referred to as self-* properties.
The four main properties are:
• Self-configuration: An autonomic system should be able to configure itself
based on the current environment and available resources. The system should
also be able to continuously reconfigure itself and adapt to changes.
• Self-optimization: The system should continuously monitor itself and try
to tune itself and keep performance at optimum levels.
• Self-healing: Failures should be detected by the system. After detection,
the system should be able to recover from the failure and fix itself.
• Self-protection: The system should be able to protect itself from malicious
use. This includes protection against viruses, distributed network attacks, and
intrusion attempts.
The Autonomic Computing Architecture
The autonomic computing reference architecture proposed by IBM [4] consists of
the following five building blocks (see Figure 2.1).
• Touchpoint: consists of a set of sensors and effectors used by autonomic
managers to interact with managed resources (get status and perform operations). Touchpoints are components in the system that implement a uniform
management interface that hides the heterogeneity of managed resources. A
managed resource must be exposed through touchpoints to be manageable.
Sensors provide information about the state of the resource. Effectors provide
a set of operations that can be used to modify the state of resources.
• Autonomic Manager: is the key building block in the architecture. Autonomic managers are used to implement the self-management behaviour of the
system. This is achieved through a control loop that consists of four main
stages: monitor, analyze, plan, and execute. The control loop interacts with
the managed resource through the exposed touchpoints.
• Knowledge Source: is used to share knowledge (e.g. architecture information, monitoring history, policies, and management data such as change
plans) between autonomic managers.
• Enterprise Service Bus: provides connectivity of components in the system.
Figure 2.1: A simple autonomic computing architecture with one autonomic manager.
• Manager Interface: provides an interface for administrators to interact
with the system. This includes the ability to monitor/change the status of
the system and to control autonomic managers through policies.
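To make the structure of such a control loop concrete, the following minimal Java sketch wires the four MAPE stages to a touchpoint. The interface and class names (Sensor, Effector, AutonomicManager) and the single "load" metric are illustrative assumptions, not part of the IBM reference architecture or of Niche.

```java
import java.util.Map;

// Illustrative touchpoint abstractions (not the IBM reference architecture or the Niche API):
// a sensor reports resource state, an effector applies operations on the managed resource.
interface Sensor { Map<String, Double> readState(); }
interface Effector { void apply(String operation); }

/** A single autonomic manager running one MAPE loop over a managed resource. */
class AutonomicManager implements Runnable {
    private final Sensor sensor;       // monitor stage input
    private final Effector effector;   // execute stage output
    private final double maxLoad;      // knowledge: a simple management objective

    AutonomicManager(Sensor sensor, Effector effector, double maxLoad) {
        this.sensor = sensor;
        this.effector = effector;
        this.maxLoad = maxLoad;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            Map<String, Double> state = sensor.readState();        // Monitor
            double load = state.getOrDefault("load", 0.0);          // Analyze
            String plan = (load > maxLoad) ? "addReplica" : null;   // Plan
            if (plan != null) {
                effector.apply(plan);                               // Execute
            }
            try {
                Thread.sleep(5_000);                                // next control period
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

In the IBM architecture, the knowledge source and the manager interface would feed the objective (here, maxLoad) into this loop instead of hard-coding it.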
Approaches to Autonomic Computing
Recent research in both academia and industry have adopted different approaches to
achieve autonomic behaviour in computing systems. The most popular approaches
are described below:
• Control Theoretic Approach: Classical control theory has been successfully applied to solve control problems in computing systems [5], such as load balancing, throughput regulation, and power management. Control theory concepts and techniques are being used to guide the development of autonomic managers for modern self-managing systems [6]; a minimal sketch of such a feedback controller appears after this list. Challenges beyond classical control theory have also been addressed [7], such as the use of proactive control (model predictive control) to cope with network delays and uncertain operating environments, and also multivariable optimization in the discrete domain.
• Architectural Approach: This approach advocates for composing autonomic systems out of components. It is closely related to service oriented architectures. Properties of components including required interfaces, expected
behaviours, interaction establishment, and design patterns are described [8].
Autonomic behaviour of computing systems is achieved through dynamically
modifying the structure (compositional adaptation) and thus the behaviour
of the system [9, 10] in response to changes in the environment or user behaviour. Management in this approach is done at the level of components
and interactions between them.
• Emergence-based Approach: This approach is inspired by nature, where complex structures or behaviours emerge from relatively simple interactions.
complex structures or behaviours emerge from relatively simple interactions.
Examples range from the forming of sand dunes to swarming that is found
in many animals. In computing systems, the overall autonomic behaviour
of the system at the macro-level is not directly programmed but emerges
from the relatively simple behavior of various subsystems at the micro-level [11–13]. This approach is highly decentralized. Subsystems make decisions autonomously based on their local knowledge and view of the system. Communication is usually simple, asynchronous, and used to exchange data
between subsystems.
• Agent-based Approach: Unlike traditional management approaches, which are usually centralized or hierarchical, the agent-based approach to management is decentralized. This is suitable for large-scale computing systems that are
is decentralized. This is suitable for large-scale computing systems that are
distributed with many complex interactions. Agents in a multi-agent system
collaborate, coordinate, and negotiate with each other forming a society or
an organization to solve a problem of a distributed nature [14, 15].
• Legacy Systems: Research in this branch tries to add self-managing behaviours to already existing (legacy) systems. Research includes techniques
for monitoring and actuating legacy systems as well as defining requirements
for systems to be controllable [16–19].
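As a concrete illustration of the control-theoretic approach referenced above, the sketch below implements a simple proportional controller that adjusts the number of worker instances toward a utilization set point. The gain, the set point, and the method names are illustrative assumptions and are not taken from the cited systems.

```java
/** A minimal proportional controller for capacity regulation (illustrative only). */
class UtilizationController {
    private final double setPoint;  // target utilization, e.g. 0.7
    private final double gain;      // proportional gain, tuned per system

    UtilizationController(double setPoint, double gain) {
        this.setPoint = setPoint;
        this.gain = gain;
    }

    /**
     * One control step: given the measured utilization and the current number
     * of workers, return the new number of workers to provision.
     */
    int control(double measuredUtilization, int currentWorkers) {
        double error = measuredUtilization - setPoint;        // positive => overloaded
        int delta = (int) Math.round(gain * error * currentWorkers);
        return Math.max(1, currentWorkers + delta);           // never drop below one worker
    }
}
```

Model-predictive variants, as mentioned above, would replace the single-step error with a forecast of behavior over a look-ahead horizon.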
In our work we mainly followed the architectural approach to autonomic computing. However, there is no clear line dividing these different approaches, and they may be combined in one system.
2.2 The Fractal Component Model
The Fractal component model [20, 21] is a modular and extensible component model
that is used to design, implement, deploy and reconfigure various systems and applications. Fractal is programming language and execution model independent. The
main goal of the Fractal component model is to reduce the development, deployment and maintenance costs of complex software systems. This is achieved mainly
through separation of concerns that appears at different levels namely: separation
of interface and implementation, component oriented programming, and inversion
of control. The separation of interface and implementation separates design from
implementation. The component oriented programming divides the implementation
into smaller separated concerns that are assigned to components. The inversion of
control separates the functional and management concerns.
A component in Fractal consists of two parts: the membrane and the content.
The membrane is responsible for the non functional properties of the component
while the content is responsible for the functional properties. A fractal component
can be accessed through interfaces. There are three types of interfaces: client,
server, and control interfaces. Client and server interfaces can be linked together
through bindings while the control interfaces are used to control and introspect the
component. A Fractal component can be a basic or a composite component. In the
case of a basic component, the content is the direct implementation of its functional
properties. The content in a composite component is composed from a finite set of
other components. Thus a Fractal application consists of a set of components that
interact through composition and bindings.
Fractal enables the management of complex applications by making the software
architecture explicit. This is mainly due to the reflexivity of the Fractal component
model, which means that components have full introspection and intercession capabilities (through control interfaces). The main controllers defined by Fractal are
attribute control, binding control, content control, and life cycle control.
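The following Java sketch illustrates the flavor of these control interfaces. It mimics Fractal's life-cycle and binding controllers, but the interface and class names are simplified stand-ins invented for this example, not the actual Fractal API (org.objectweb.fractal.*).

```java
// Simplified stand-ins for Fractal-style control interfaces (not the real Fractal API).
interface LifeCycleControl {
    void start();
    void stop();
    boolean isStarted();
}

interface BindingControl {
    // Bind a client interface of this component to a server interface of another component.
    void bind(String clientInterfaceName, Object serverInterface);
    void unbind(String clientInterfaceName);
}

/** The membrane wraps the functional content and exposes the control interfaces. */
class ComponentMembrane implements LifeCycleControl, BindingControl {
    private final Object content;                       // functional part of the component
    private final java.util.Map<String, Object> bindings = new java.util.HashMap<>();
    private boolean started;

    ComponentMembrane(Object content) { this.content = content; }

    @Override public void start()  { started = true; }
    @Override public void stop()   { started = false; }
    @Override public boolean isStarted() { return started; }

    @Override public void bind(String name, Object server) { bindings.put(name, server); }
    @Override public void unbind(String name)              { bindings.remove(name); }
}
```

Management code manipulates components only through such control interfaces, which is what makes the architecture introspectable and reconfigurable at run time.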
The model also includes the Fractal architecture description language (Fractal
ADL) that is an XML document used to describe the Fractal architecture of applications including component description (interfaces, implementation, membrane,
etc.) and relation between components (composition and bindings). The Fractal
ADL can also be used to deploy a Fractal application, where an ADL parser parses the application’s ADL file and instantiates the corresponding components and bindings.
2.3 Structured Peer-to-Peer Overlay Networks
Peer-to-peer (P2P) refers to a class of distributed network architectures that is
formed between participants (usually called nodes or peers) on the edge of the Internet. P2P is becoming more popular as edge devices are becoming more powerful
in terms of network connectivity, storage, and processing power. A feature common to all P2P networks is that the participants form a community of peers where a
peer in the community shares some resource (e.g. storage, bandwidth, or processing power) with others and in return it can use the resources shared by others [22].
In other words, each peer plays the role of both client and server. Thus, P2P networks usually do not need a central server and operate in a decentralised way.
Another important feature is that peers also play the role of routers and participate
in routing messages between peers in the overlay.
P2P networks are scalable and robust. The fact that each peer plays the role of both client and server has a major effect in allowing P2P networks to scale to a large number of peers. This is because, unlike the traditional client-server model, adding more peers increases the capacity of the system (e.g. adding more storage and bandwidth). Another important factor that helps P2P networks to scale is that peers act as routers: each peer only needs to know about a subset of other peers. The decentralised nature of P2P networks improves their robustness. There is no single point of failure, and P2P networks are designed to tolerate peers joining, leaving and failing at any time.
Peers in a P2P network usually form an overlay network on top of the physical
network topology. An overlay consists of virtual links that are formed between
peers in a certain way according to the P2P network type. A virtual link between
any two peers in the overlay may be implemented by several links in the physical
network. The overlay is usually used for communication, indexing, and peer discovery. The way links in the overlay are formed divides P2P networks into two main
classes: unstructured and structured networks. Overlay links between peers in an
unstructured P2P network are formed randomly without any algorithm to organize
the structure. On the other hand, overlay links between peers in a structured P2P
network follow a fixed structure that is continuously maintained by an algorithm.
The remainder of this section will focus on structured P2P networks.
Structured P2P networks such as Chord [23], Can [24], and Pastry [25] maintain a structure of overlay links. Using this structure allows peers to implement a distributed hash table (DHT). DHTs provide a lookup service similar to hash tables that store (key, value) pairs. Given a key, any peer can efficiently retrieve the
associated value by routing a request to the responsible peer. The responsibility of
maintaining the mapping between (key, value) pairs and the routing information
is distributed among the peers in such a way that peer joins, leaves, and failures cause
minimal disruption to the lookup service. This maintenance is automatic and does
not require human involvement. This feature is known as self-management.
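A hedged sketch of the core DHT idea follows: keys and peers are hashed onto the same identifier ring, and each key is stored on the first peer whose identifier succeeds the key, which is the rule used by Chord-like systems. The class name, the truncated SHA-1 hash, and the small identifier space are illustrative simplifications, not the algorithms of any particular overlay.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

/** Minimal, illustrative consistent-hashing view of a DHT: map keys to responsible peers. */
class SimpleRing {
    private static final long RING_SIZE = 1L << 32;               // small identifier space for illustration
    private final TreeMap<Long, String> peers = new TreeMap<>();  // peer id -> peer address

    void join(String peerAddress)  { peers.put(hash(peerAddress), peerAddress); }
    void leave(String peerAddress) { peers.remove(hash(peerAddress)); }

    /** The peer responsible for a key is its successor on the ring. */
    String lookup(String key) {
        if (peers.isEmpty()) throw new IllegalStateException("no peers in the overlay");
        Long id = peers.ceilingKey(hash(key));
        return (id != null) ? peers.get(id) : peers.firstEntry().getValue(); // wrap around
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 4; i++) h = (h << 8) | (d[i] & 0xFF);   // use the first 4 bytes
            return Math.floorMod(h, RING_SIZE);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e);
        }
    }
}
```

Because only the interval between a joining or departing peer and its predecessor is affected, maintenance of the key-to-peer mapping stays local, which is the property that makes the lookup service self-managing.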
More complex services can be built on top of DHTs. Such services include name-based communication, efficient multicast/broadcast, publish/subscribe services, and
distributed file systems.
In our work we used structured overlay networks, and services built on top of them, as a communication medium between different components in the system (functional components and management elements). We used the indexing service to implement network-transparent name-based communication and component groups. We used efficient multicast/broadcast for communication and discovery, and the publish/subscribe service to implement event-based communication between management elements.
2.4 State of the Art in Self-Management for Large Scale Distributed Systems
There is a need to reduce the cost of software ownership, i.e. the cost of the
administration, management, maintenance, and optimization of software systems
and also networked environments such as Grids, Clouds, and P2P systems. This
need is caused by the inevitable increase in complexity and scale of software systems
and networked environments, which are becoming too complicated to be directly
managed by humans. For many such systems manual management is difficult,
costly, inefficient and error-prone.
A large-scale system may consist of thousands of elements to be monitored and
controlled, and have a large number of parameters to be tuned in order to optimize
system performance and power, to improve resource utilization and to handle faults
while providing services according to SLAs. The best way to handle the increases
in system complexity, administration and operation costs is to design autonomic
systems that are able to manage themselves like the autonomic nervous system regulates and protects the human body [2, 3]. System self-management allows reducing
management costs and improving management efficiency by removing humans from
most of (low-level) system management mechanisms, so that the main duty of humans is to define policies for autonomic management rather than to manage the
mechanisms that implement the policies.
The increasing complexity of software systems and networked environments motivates autonomic system research in both academia and industry, e.g. [1–3, 26].
Major computer and software vendors have launched R&D initiatives in the field
of autonomic computing.
The main goal of autonomic system research is to automate most of system
management functions that include configuration management, fault management,
performance management, power management, security management, cost management, and SLA management. Self-management objectives are typically classified into four categories: self-configuration, self-healing, self-optimization, and self-protection [3]. Major self-management objectives in large-scale systems, such as Clouds, include repairing failures, improving resource utilization, performance optimization, power optimization, and change (upgrade) management. Autonomic SLA
management is also included in the list of self-management tasks. Currently, it is
very important to make self-management power-aware, i.e. to minimize energy
consumption while meeting service level objectives [27].
The major approach to self-management is to use one or multiple feedback control loops [2, 5], a.k.a. autonomic managers [3], to control different properties of the
system based on functional decomposition of management tasks and assigning the
tasks to multiple cooperative managers [28–30]. Each manager has a specific management objective (e.g. power optimization or performance optimization), which
can be of one or a combination of three kinds: regulatory control (e.g. maintain
server utilization at a certain level), optimization (e.g. power and performance
optimizations), disturbance rejection (e.g. provide operation while upgrading the
system) [5]. A manager control loop consists of four stages, known as MAPE:
Monitoring, Analyzing, Planning, and Execution [3].
Authors of [5] apply the control theoretic approach to design computing systems with feedback loops. The architectural approach to autonomic computing [8]
suggests specifying interfaces, behavioral requirements, and interaction patterns for
architectural elements, e.g. components. The approach has been shown to be useful
for e.g. autonomous repair management [31]. The analyzing and planning stages of
a control loop can be implemented using utility functions to make management decisions, e.g. to achieve efficient resource allocation [32]. Authors of [30] and [29] use
multi-criteria utility functions for power-aware performance management. Authors
of [33, 34] use a model-predictive control technique, namely a limited look-ahead
control (LLC), combined with a rule-based managers, to optimize the system performance based on its forecast behavior over a look-ahead horizon.
Policy-based self-management [35–40] allows high-level specification of management objectives in the form of policies that drive autonomic management and can
be changed at run-time. Policy-based management can be combined with “hardcoded” management.
There are many research projects focused on or using self-management for software systems and networked environments, including projects performed at the
NSF Center for Autonomic Computing [41] and a number of FP6 and FP7 projects
funded by European Commission.
For example, the FP7 EU-project RESERVOIR (Resources and Services Virtualization without Barriers) [42, 43] aims at enabling massive scale deployment and
management of complex IT services across different administrative domains. In
particular, the project develops a technology for distributed management of virtual
infrastructures across sites supporting private, public and hybrid cloud architectures.
Several completed and running research projects, in particular, AutoMate [44],
Unity [45], and SELFMAN [2, 46], and also the Grid4All [28, 47, 48] project we
participated in, propose frameworks to augment component programming systems
with management elements. The FP6 projects SELFMAN and Grid4All have taken
similar approaches to self-management: both projects combine structured overlay
networks with component models for the development of an integrated architecture
for large-scale self-managing systems. SELFMAN has developed a number of technologies that enable and facilitate development of self-managing systems. Grid4All
has developed, in particular, a platform for development, deployment and execution of self-managing applications and services in dynamic environments such as
domestic Grids.
There are several industrial solutions (tools, techniques and software suites)
for enabling and achieving self-management of enterprise IT systems, e.g. IBM’s
Tivoli and HP’s OpenView, which include different autonomic tools and managers
to simplify management, monitoring and automation of complex enterprise-scale
IT systems. These solutions are based on functional decomposition of management
performed by multiple cooperative managers with different management objectives
(e.g. performance manager, power manager, storage manager, etc.). These tools
are specially developed and optimized to be used in IT infrastructure of enterprises
and datacenters.
Self-management can be centralized, decentralized, or hybrid (hierarchical).
Most of the approaches to self-management are either based on centralized control or assume high availability of macro-scale, precise and up-to-date information
about the managed system and its execution environment. The latter assumption is unrealistic for multi-owner highly-dynamic large-scale distributed systems,
e.g. P2P systems, community Grids and clouds. Typically, self-management in
an enterprise information system, a single-provider CDN or a datacenter cloud is
centralized because most management decisions are made based on the global (macro-scale) system state in order to achieve close-to-optimal system operation. However, centralized management is not scalable and might not be robust.
The area of autonomic computing is still evolving, and there are many open research issues such as development environments to facilitate development of self-managing applications, efficient monitoring, scalable actuation, and robust management. Our work contributes to the state of the art in autonomic computing, in particular to self-management of large-scale and/or dynamic distributed systems.
Chapter 3

Niche: A Platform for Self-Managing Distributed Applications
Niche is a proof of concept prototype that we used in order to evaluate our concepts
and approach to self-management that are based on leveraging the self-organizing
properties of structured overlay networks, for providing basic services and runtime
support, together with component models, for reconfiguration and introspection.
The end result is an autonomic computing platform suitable for large-scale dynamic
distributed environments. We have designed, developed, and implemented Niche
which is a platform for self-management. Niche has been used in this work as an
environment to validate and evaluate different aspects of self-management such as
monitoring, autonomic managers interactions, and policy based management, as
well as to demonstrate our approach by using Niche to develop use cases.
This chapter will present the Niche platform (http://niche.sics.se), a system for the development, deployment and execution of self-managing distributed
systems, applications and services. Niche has been developed by a joint group of
researchers and developers at the Royal Institute of Technology (KTH); the Swedish
Institute of Computer Science (SICS), Stockholm, Sweden; and INRIA, France.
3.1 Niche
Niche implements (in Java) the autonomic computing architecture defined in the
IBM autonomic computing initiative, i.e. it allows building MAPE (Monitor, Analyse, Plan and Execute) control loops. Niche includes a component-based programming model (Fractal), API, and an execution environment. Niche, as a programming environment, separates programming of functional and management parts.
The functional part is developed using Fractal components and component groups,
which are controllable (e.g. can be looked up, moved, rebound, started, stopped,
etc.) and can be monitored by the management part of the application. The
management part is programmed using Management Element (ME) abstractions:
watchers, aggregators, managers, executors. The sensing and actuation API of
Niche connects the functional and management part. MEs monitor and communicate with events, in a publish/subscribe manner. There are built-in events (e.g.
component failure event) and application-dependent events (e.g. component load
change event). MEs control functional components via the actuation API.
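As an illustration of how a management element reacts to events in this publish/subscribe style, consider the hedged Java sketch below. The Event, ComponentFailureEvent, and Executor types are invented for the example and do not reproduce the actual Niche API.

```java
// Illustrative event types (not the actual Niche API).
interface Event { }
class ComponentFailureEvent implements Event {
    final String failedComponentId;
    ComponentFailureEvent(String id) { this.failedComponentId = id; }
}

// Illustrative actuation interface used by the manager to change the application.
interface Executor {
    void restartComponent(String componentId);
}

/** A watcher-style management element: subscribes to failure events and reacts. */
class SelfHealingManager {
    private final Executor executor;

    SelfHealingManager(Executor executor) { this.executor = executor; }

    /** Called by the publish/subscribe layer when a subscribed event is delivered. */
    void onEvent(Event event) {
        if (event instanceof ComponentFailureEvent) {
            ComponentFailureEvent failure = (ComponentFailureEvent) event;
            // Analyze and plan are collapsed into a single rule: redeploy the failed component.
            executor.restartComponent(failure.failedComponentId);
        }
    }
}
```

In Niche, a manager like this would typically be split into watcher, aggregator, manager and executor elements connected by events; the sketch collapses them into one class for brevity.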
Niche also provides the ability to program policy-based management using a policy language, a corresponding API and a policy engine. The current implementation
of Niche includes a generic policy-based framework for policy-based management
using SPL (Simplified Policy Language) or XACML (eXtensible Access Control
Markup Language). The framework includes abstractions (and API) of policies,
policy-managers and policy-manager groups. Policy-based management enables
self-management under guidelines defined by humans in the form of management
policies that can be changed at run-time. With policy-based management it is easier
to administrate and maintain management policies. It facilitates development by
separating policy definition and maintenance from application logic. However,
our performance evaluation shows that hard-coded management performs better
than the policy-based management.
We recommend using policy-based management for high-level policies that require the flexibility of being rapidly changed and manipulated by administrators (easily understood by humans, can be changed on the fly, kept separate from development code for easier management, etc.). On the other hand, low-level, relatively static policies and management logic should be hard-coded for performance. It is
also important to keep in mind that even when using policy-based management we
still have to implement management actuation and monitoring.
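The trade-off can be made concrete with a small sketch: the same scaling decision expressed once as a hard-coded rule and once delegated to a policy engine behind an interface. The PolicyEngine interface, its evaluate method, and the threshold value are assumptions made for illustration; they are not the SPL or XACML APIs used in Niche.

```java
/** Illustrative policy engine abstraction (not the SPL or XACML API). */
interface PolicyEngine {
    /** Evaluate the currently loaded policies against an observed value and return an action name. */
    String evaluate(String observation, double value);
}

class StorageScalingDecision {
    private final PolicyEngine policyEngine; // used only by the policy-based variant

    StorageScalingDecision(PolicyEngine policyEngine) { this.policyEngine = policyEngine; }

    /** Hard-coded variant: fast, but changing the threshold requires redeploying the manager. */
    String decideHardCoded(double freeStorageRatio) {
        return (freeStorageRatio < 0.2) ? "allocateMoreStorage" : "noAction";
    }

    /** Policy-based variant: slower, but the rule can be changed at run time by administrators. */
    String decideWithPolicy(double freeStorageRatio) {
        return policyEngine.evaluate("freeStorageRatio", freeStorageRatio);
    }
}
```

In both variants the monitoring that produces freeStorageRatio and the actuation that carries out the returned action still have to be implemented by the developer, which is the point made above.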
Although programming in Niche is on the level of Java, it is both possible and
desirable to program management at a higher level (e.g. declaratively). The language support includes the declarative ADL (Architecture Description Language)
that is used for describing initial configurations at a high level and is interpreted
by Niche at runtime (initial deployment).
Niche has been developed assuming that its run-time environment and applications built with Niche might execute in a highly dynamic environment with volatile
resources, where resources (computers, VMs) can unpredictably fail or leave. In
order to deal with such dynamicity, Niche leverages self-organizing properties of
the underlying structured overlay network, including name-based routing (when a
direct binding is broken) and the DHT functionality. Niche provides transparent
replication of management elements for robustness. For efficiency, Niche directly
supports a component group abstraction with group bindings (one-to-all and one-to-any).
The Niche run-time system allows initial deployment of a service or an application on the network of Niche nodes (containers). Niche relies on the underlying
overlay services to discover and to allocate resources needed for deployment, and
to deploy the application. After deployment, the management part of the application can monitor and react to changes in availability of resources by subscribing
to resource events fired by Niche containers. All elements of a Niche application
– components, bindings, groups, management elements – are identified by unique
identifiers (names) that enable location transparency. Niche uses the DHT functionality of the underlying structured overlay network for its lookup service. This
is especially important in dynamic environments where components need to be migrated frequently as machines leave and join. Furthermore, each container maintains a cache of name-to-location mappings. Once the name of an element is resolved to its location, the element (its hosting container) is accessed directly rather than by routing messages through the overlay network. If the element moves
to a new location, the element name is transparently resolved to the new location.
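A minimal sketch of such a cache is shown below, assuming a resolver backed by the overlay lookup. The class and method names are illustrative and do not mirror Niche's internal implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative name-to-location cache with fall-back to the overlay lookup on a miss or stale entry. */
class LocationCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> overlayLookup; // e.g. a DHT-based resolver

    LocationCache(Function<String, String> overlayLookup) { this.overlayLookup = overlayLookup; }

    /** Resolve an element name to the address of its hosting container, using the cache first. */
    String resolve(String elementName) {
        return cache.computeIfAbsent(elementName, overlayLookup);
    }

    /** Called when a direct access using a cached location fails, i.e. the element has moved. */
    String refresh(String elementName) {
        String fresh = overlayLookup.apply(elementName);
        cache.put(elementName, fresh);
        return fresh;
    }
}
```

The common case is then a single direct access, while the overlay routing is only paid on cache misses or after migrations.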
3.2 Demonstrators
In order to demonstrate Niche and our design methodology (see Chapter 7), we
developed two self-managing services (1) YASS: Yet Another Storage Service; and
(2) YACS: Yet Another Computing Service. The services can be deployed and
provided on computers donated by users of the service or by a service provider.
The services can operate even if computers join, leave or fail at any time. Each
of the services has self-healing and self-configuration capabilities and can execute
on a dynamic overlay network. The self-managing capabilities of the services allow the users to minimize the human resources required for service management. Each of the services implements relatively simple self-management algorithms, which can be
changed to be more sophisticated, while reusing existing monitoring and actuation
code of the services.
YASS (Yet Another Storage Service) is a robust storage service that allows a
client to store, read and delete files on a set of computers. The service transparently
replicates files in order to achieve high availability of files and to improve access
time. The current version of YASS maintains the specified number of file replicas
despite nodes leaving or failing, and it can scale (i.e. increase available storage
space) when the total free storage is below a specified threshold. Management tasks
include maintenance of file replication degree; maintenance of total storage space
and total free space; increasing availability of popular files; releasing extra allocated
storage; and balancing the stored files among available resources.
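The replica-maintenance objective can be illustrated with the following hedged sketch of the analysis/planning step: the manager compares the observed replica count of each file against the configured replication degree and plans restore or cleanup actions. The class and its inputs are invented for the example and simplify the actual YASS management logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plans replica restore/removal actions for one control round (illustrative only). */
class ReplicaMaintenancePlanner {
    private final int replicationDegree; // e.g. 3 replicas per file

    ReplicaMaintenancePlanner(int replicationDegree) { this.replicationDegree = replicationDegree; }

    /** @param observedReplicas file id -> number of currently reachable replicas */
    List<String> plan(Map<String, Integer> observedReplicas) {
        List<String> actions = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : observedReplicas.entrySet()) {
            int missing = replicationDegree - entry.getValue();
            if (missing > 0) {
                actions.add("restore " + missing + " replica(s) of file " + entry.getKey());
            } else if (missing < 0) {
                actions.add("remove " + (-missing) + " extra replica(s) of file " + entry.getKey());
            }
        }
        return actions;
    }
}
```

In the running service the observed replica counts come from watchers subscribed to failure and leave events, and the planned actions are handed to executors that allocate storage and copy files.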
YACS (Yet Another Computing Service) is a robust distributed computing service that allows a client to submit and execute jobs, which are bags of independent
tasks, on a network of nodes (computers). YACS guarantees execution of jobs
despite nodes leaving or failing. Furthermore, YACS scales, i.e. changes the
number of execution components, when the number of jobs/tasks changes. YACS
supports checkpointing that allows restarting execution from the last checkpoint
when a worker component fails or leaves.
3.3 Lessons Learned
A middleware, such as Niche, clearly reduces the burden on an application developer,
because it enables and supports self-management by leveraging self-organizing properties of structured P2P overlays and by providing useful overlay services such as
deployment, DHT (can be used for different indexes) and name-based communication. However, it comes at a cost of self-management overhead, in particular, the
cost of monitoring and replication of management; though this cost is necessary
for the democratic grid (or cloud) that operates in a dynamic environment and
requires self-management.
There are four major issues to be addressed when developing a platform such
as Niche for self-management of large scale distributed systems: Efficient resource
discovery; robust and efficient monitoring and actuation; distribution of management to avoid bottlenecks and single points of failure; and the scale of both the events that
happen in the system and the dynamicity of the system (resources and load).
To address these issues when developing Niche we used and applied different
solutions and techniques. In particular we leveraged the scalability, robustness, and
self-management properties of the structured overlay networks (SONs) as follows.
Resource discovery was the easiest to address: since all resources are members
of the Niche overlay, we used efficient broadcast/rangecast to discover resources.
This can be further improved using more complex queries that can be implemented
on top of SONs.
For monitoring and actuation we used events that are disseminated using a publish/subscribe system. This supports resource mobility because sensors/actuators can move with resources and still be able to publish/receive events. Also, the publish/subscribe system can be implemented in an efficient and robust way on top of SONs.
In order to better deal with dynamic environments, and also to avoid management bottlenecks and single points of failure, we advocate for a decentralized approach to management. The management functions should be distributed among several cooperative autonomic managers that coordinate their activities (as loosely coupled as possible) to achieve management objectives. Multiple managers are
needed for scalability, robustness, and performance and they are also useful for reflecting separation of concerns. Design steps in developing the management part of
a self-managing application include spatial and functional partitioning of management, assignment of management tasks to autonomic managers, and co-ordination
of multiple autonomic managers. The design space for multiple management components is large: indirect stigmergy-based interactions, hierarchical management, and direct interactions. Co-ordination could also use shared management elements.
In dynamic systems the rate of change (joins, leaves, and failures of resources, changes of component load, etc.) is high, and it was important to reduce the need for action/communication in the system. This may be an open-ended task, but Niche contains many features that directly impact communication. The sensing/actuation
infrastructure only delivers events to management elements that have directly subscribed to the event (i.e. avoiding the overhead of keeping management elements
up-to-date as to component location). Decentralizing management makes for better scalability. We support component groups and bindings to such groups, to be
able to map this useful abstraction to the most (known) efficient communication
infrastructure.
Chapter 4
Thesis Contribution
In this chapter, we present a summary of the thesis contribution. We start by listing the publications that were produced during the thesis work. Next, we describe in more detail the contributions in the main areas we worked on.
4.1 List of Publications
List of publications included in this thesis
1. A. Al-Shishtawy, J. Höglund, K. Popov, N. Parlavantzas, V. Vlassov, and
P. Brand, “Enabling self-management of component based distributed applications,” in From Grids to Service and Pervasive Computing (T. Priol
and M. Vanneschi, eds.), pp. 163–174, Springer US, July 2008. Available:
http://dx.doi.org/10.1007/978-0-387-09455-7_12
2. A. Al-Shishtawy, V. Vlassov, P. Brand, and S. Haridi, “A design methodology
for self-management in distributed environments,” in Computational Science
and Engineering, 2009. CSE ’09. IEEE International Conference on, vol. 1,
(Vancouver, BC, Canada), pp. 430–436, IEEE Computer Society, August
2009. Available: http://dx.doi.org/10.1109/CSE.2009.301
3. L. Bao, A. Al-Shishtawy, and V. Vlassov, “Policy based self-management
in distributed environments,” in Third IEEE International Conference on
Self-Adaptive and Self-Organizing Systems Workshops (SASOW 2009), (San
Francisco, California), September 2009.
4. A. Al-Shishtawy, M. A. Fayyaz, K. Popov, and V. Vlassov, “Achieving robust self-management for large-scale distributed applications,” Tech. Rep.
T2010:02, Swedish Institute of Computer Science (SICS), March 2010.
List of publications by the thesis author that are related to this thesis
1. P. Brand, J. Höglund, K. Popov, N. de Palma, F. Boyer, N. Parlavantzas,
V. Vlassov, and A. Al-Shishtawy, “The role of overlay services in a selfmanaging framework for dynamic virtual organizations,” in Making Grids
Work (M. Danelutto, P. Fragopoulou, and V. Getov, eds.), pp. 153–164,
Springer US, 2007. Available:
http://dx.doi.org/10.1007/978-0-387-78448-9_12
2. K. Popov, J. Höglund, A. Al-Shishtawy, N. Parlavantzas, P. Brand, and
V. Vlassov, “Design of a self-* application using p2p-based management infrastructure,” in Proceedings of the CoreGRID Integration Workshop 2008.
CGIW’08. (S. Gorlatch, P. Fragopoulou, and T. Priol, eds.), COREGrid,
(Crete, GR), pp. 467–479, Crete University Press, April 2008.
3. N. de Palma, K. Popov, V. Vlassov, J. Höglund, A. Al-Shishtawy, and N. Parlavantzas, “A self-management framework for overlay-based applications,” in
International Workshop on Collaborative Peer-to-Peer Information Systems
(WETICE COPS 2008), (Rome, Italy), June 2008.
4. A. Al-Shishtawy, J. Höglund, K. Popov, N. Parlavantzas, V. Vlassov, and
P. Brand, “Distributed control loop patterns for managing distributed applications,” in Second IEEE International Conference on Self-Adaptive and
Self-Organizing Systems Workshops (SASOW 2008), (Venice, Italy), pp. 260–
265, Oct. 2008. Available: http://dx.doi.org/10.1109/SASOW.2008.57
4.2 Enabling Self-Management
Our work on enabling self-management for large scale distributed systems was
published as two book chapters [48, 49], a workshop paper [50], and a poster [51].
The book chapter [48] appears as Chapter 6 in this thesis.
Paper Contribution
The increasing complexity of computing systems, as discussed in Section 2.1, requires a high degree of autonomic management to improve system efficiency and
reduce cost of deployment, management, and maintenance. The first step towards
achieving autonomic computing systems is to enable self-management, in particular, enable autonomous runtime reconfiguration of systems and applications. By
enabling self-management we mean providing a platform that supports the programming and runtime execution of self-managing computing systems. This work
is first presented in Chapter 6 of this thesis and extended in the following chapters.
We combined three concepts, autonomic computing, component-based architectures, and structured overlay networks, to develop a platform that enables self-management of large scale distributed applications. The platform, called Niche,
implements the autonomic computing architecture described in Section 2.1.
Niche follows the architectural approach to autonomic computing. In the current
implementation, Niche uses the Fractal component model [20]. Fractal simplifies the
management of complex applications by making the software architecture explicit.
We extended the Fractal component model by introducing the concept of component
groups and bindings to groups. This extension results in “one-to-all” and “one-to-any” communication patterns, which support scalable, fault-tolerant and self-healing applications [52]. Groups are first-class entities and they are dynamic. The
group membership can change dynamically (e.g. because of churn) affecting neither
the source component nor other components of the destination group.
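A hedged sketch of the group-binding idea follows: a group holds a dynamic membership, and a binding to it can dispatch a message either to any one member or to all members. The ComponentGroup class and its method names are illustrative, not the Niche group API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Consumer;

/** Illustrative dynamic component group with one-to-any and one-to-all dispatch. */
class ComponentGroup<T> {
    private final List<T> members = new CopyOnWriteArrayList<>();

    // Membership can change at run time (e.g. due to churn) without rebinding senders.
    void addMember(T member)    { members.add(member); }
    void removeMember(T member) { members.remove(member); }

    /** One-to-any: deliver to a single, arbitrarily chosen member. */
    void sendToAny(Consumer<T> message) {
        Object[] snapshot = members.toArray();
        if (snapshot.length == 0) return;
        @SuppressWarnings("unchecked")
        T target = (T) snapshot[ThreadLocalRandom.current().nextInt(snapshot.length)];
        message.accept(target);
    }

    /** One-to-all: deliver to every current member. */
    void sendToAll(Consumer<T> message) {
        members.forEach(message);
    }
}
```

The point of making groups first-class in Niche is that senders bind to the group name, not to its members, so churn-driven membership changes stay invisible to the functional code.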
Niche leverages the self-organization properties of structured overlay networks
and services built on top of them. The self-organization of such networks and services makes them attractive for large scale systems and applications. These properties include decentralization, scalability and fault tolerance. The current Niche implementation uses a Chord-like structured P2P network called DKS [53]. Niche is built on top of the robust and churn-tolerant services that are provided by or implemented using DKS. These services include, among others, a lookup service, DHT, efficient broadcast/multicast, and a publish/subscribe service. Niche uses these services to provide a network-transparent view of system architecture, which facilitates
reasoning about and designing application’s management code. In particular, it
facilitates migration of components and management elements caused by resource
churn. These features make Niche suitable to manage large scale distributed applications deployed in dynamic environments.
Our approach to developing self-managing applications separates the application's functional and management parts. We provide a programming model and a corresponding API for developing application-specific management behaviours. Autonomic
managers are organized as a network of management elements interacting through
events using the underlying publish/subscribe service. We also provide support for
sensors and actuators. Niche leverages the introspection and dynamic reconfiguration features of the Fractal component model in order to provide sensors and
actuators. Sensors can inform autonomic managers about changes in the application and its environment by generating events. Similarly, autonomic managers can
modify the application by triggering events to actuators.
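To make this event-driven structure concrete, the following minimal Java sketch shows a sensor event being published, an autonomic manager reacting to it, and an actuation request being emitted in response. All names used here (EventBus, LoadChangeEvent, AddStorageEvent, StorageManager) are illustrative placeholders and not the actual Niche API.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative event types; real Niche events carry richer information.
interface Event { }
class LoadChangeEvent implements Event {
    final double load;
    LoadChangeEvent(double load) { this.load = load; }
}
class AddStorageEvent implements Event { }

// A trivial in-process stand-in for the overlay's publish/subscribe service.
class EventBus {
    private final Map<Class<?>, List<Consumer<Event>>> subscribers = new HashMap<>();
    <E extends Event> void subscribe(Class<E> type, Consumer<Event> handler) {
        subscribers.computeIfAbsent(type, k -> new ArrayList<>()).add(handler);
    }
    void publish(Event event) {
        subscribers.getOrDefault(event.getClass(), List.of()).forEach(h -> h.accept(event));
    }
}

// A management element acting as an autonomic manager: it senses load events
// and requests a reconfiguration by publishing an event that an actuator handles.
class StorageManager {
    StorageManager(EventBus bus, double highLimit) {
        bus.subscribe(LoadChangeEvent.class, e -> {
            if (((LoadChangeEvent) e).load > highLimit) {
                bus.publish(new AddStorageEvent()); // actuation request
            }
        });
    }
}

In this sketch a sensor would simply call bus.publish(new LoadChangeEvent(0.95)), and an actuator component would subscribe to AddStorageEvent and carry out the corresponding deployment through the actuation API.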
In order to verify and evaluate our approach, we used Niche to implement, as a use case, a robust storage service called YASS. YASS is a storage service that
allows users to store, read and delete files on a set of distributed resources. The
service transparently replicates the stored files for robustness and scalability. The
management part of the first prototype of YASS used two autonomic managers to
manage the storage space and the file replicas.
Thesis Author Contribution
This was a joint work between researchers from the Royal Institute of Technology
(KTH), the Swedish Institute of Computer Science (SICS), and INRIA. While the
initial idea of combining autonomic computing, component-based architectures, and
structured overlay networks was not the thesis author's, he played a major role in realizing this idea. In particular, the author is a major contributor to:
• Identifying the basic overlay services required by a platform such as Niche to
enable self-management, such as name-based communication for network transparency, a distributed hash table (DHT), a publish/subscribe mechanism for
event dissemination, and resource discovery.
• Identifying the required higher-level abstractions that facilitate the programming of self-managing applications, such as name-based component bindings, dynamic groups, and the set of network references (SNRs) abstraction that is used to
implement them.
• Extending the Fractal component model with component groups and group
bindings.
• Identifying the required higher-level abstractions to program the management part, such as the management element and sensor/actuator abstractions that communicate through events to construct autonomic managers.
• The design and development of the Niche API and platform.
• The design and development of the YASS demonstrator.
4.3 Design Methodology for Self-Management
Our work on control loop interaction patterns and design methodology for self-management was published as a conference paper [28] and a workshop paper [54].
The paper [28] appears as Chapter 7 in this thesis.
Paper Contribution
To better deal with dynamic environments, and to improve scalability, robustness, and performance, we advocate distribution of management functions among several cooperative managers that coordinate their activities in order to achieve management objectives. Multiple managers are needed for scalability, robustness, and performance, and are also useful for reflecting separation of concerns. Engineering of
self-managing distributed applications executed in a dynamic environment requires
a methodology for building robust cooperative autonomic managers. This topic is
discussed in Chapter 7 of this thesis.
We present a methodology for designing the management part of a distributed
self-managing application in a distributed manner. The methodology includes a design space and guidelines for the different design steps, including management decomposition, assignment of management tasks to autonomic managers, and orchestration. For example, management can be decomposed into a number of managers, each responsible for a specific self-* property or, alternatively, for an application subsystem.
These managers are not independent but need to cooperate and coordinate their
actions in order to achieve overall management objectives. We identified four patterns for autonomic managers to interact and coordinate their operation. The four
patterns are stigmergy, hierarchical management, direct interaction, and sharing of
management elements.
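To illustrate one of these patterns, the minimal Java sketch below shows hierarchical management: lower-level managers report to a higher-level manager, which coordinates them by issuing directives through an event channel. All class and method names are illustrative placeholders under the assumptions of the earlier sketch, not part of the Niche API.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative event types for the hierarchical interaction pattern.
record ConflictReport(String reportingManager) { }
record Directive(String targetManager, String action) { }

// A minimal event channel standing in for the publish/subscribe service.
class Channel<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();
    void subscribe(Consumer<T> subscriber) { subscribers.add(subscriber); }
    void publish(T event) { subscribers.forEach(s -> s.accept(event)); }
}

// The higher-level manager coordinates lower-level managers; here it pauses a
// hypothetical self-optimization manager whenever a conflict is reported.
class MetaManager {
    MetaManager(Channel<ConflictReport> reports, Channel<Directive> directives) {
        reports.subscribe(r ->
            directives.publish(new Directive("self-optimization", "pause")));
    }
}

class LowerLevelManager {
    LowerLevelManager(String name, Channel<Directive> directives) {
        directives.subscribe(d -> {
            if (d.targetManager().equals(name)) {
                System.out.println(name + ": executing directive " + d.action());
            }
        });
    }
}

The other patterns differ only in how the coordination signal travels: in stigmergy the managers interact indirectly by sensing each other's effects on the managed system, in direct interaction they exchange events peer-to-peer, and in sharing they subscribe to a common management element.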
We illustrated the proposed design methodology by applying it to design and
develop an improved version of the YASS distributed storage service prototype. We
applied the four interaction patterns while developing the self-management part of
YASS to coordinate the actions of different autonomic managers involved.
Thesis Author Contribution
The author was the main contributor in developing the design methodology; in particular, he developed the interaction patterns between managers that are used to orchestrate and coordinate their activities. The author did the bulk of the work, including
writing most of the article. The author also played a major role in applying the
methodology to improve the YASS demonstrator and contributed to the implementation of the improved version of YASS.
4.4 Policy Based Self-Management
Our work on policy-based self-management was published as a workshop paper [40] and a master's thesis [55]. The paper [40] appears as Chapter 8 in this thesis.
Paper Contribution
In Chapter 8, we present a generic policy-based management framework which
has been integrated into Niche. Policies are sets of rules which govern the system
behaviors and reflect the business goals and objectives. The key idea of policy-based management is to allow IT administrators to define a set of policy rules to govern the behavior of their IT systems, rather than relying on manual management or ad-hoc mechanisms (e.g. writing customized scripts) [56]. The implementation and
maintenance of policies are rather difficult, especially if policies are “hard-coded”
(embedded) in the management code of a distributed system, and the policy logic is
scattered in the system implementation. This makes it difficult to trace and change
policies.
The framework introduces a policy manager in the control loop for an autonomic
manager. The policy manager first loads a policy file and then, upon receiving
events, the policy manager evaluates the events against the loaded policies and
acts accordingly. Using policy managers simplifies the process of maintaining and changing policies. It may also simplify the development process by separating the application development from the policy development. We also argue for a policy management group, which may be needed to improve the scalability and performance of policy-based management.
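As a concrete illustration of this control-loop step, the following minimal Java sketch shows a policy manager that holds a set of rules and evaluates incoming events against them. In the actual framework the rules are loaded from a policy file and evaluated by a policy engine such as SPL or XACML, so all names here are illustrative placeholders rather than the real APIs.

import java.util.List;
import java.util.function.Predicate;

// Illustrative types; the real framework delegates evaluation to a policy engine.
record ManagementEvent(String type, double value) { }
record PolicyRule(Predicate<ManagementEvent> condition, Runnable action) { }

class PolicyManager {
    private final List<PolicyRule> rules;

    // In the framework the rules would be loaded from a policy file at startup
    // and could be replaced at runtime; here they are passed in directly.
    PolicyManager(List<PolicyRule> rules) { this.rules = rules; }

    // Called for every event the autonomic manager receives: evaluate the event
    // against the loaded policies and trigger the actions of the matching rules.
    void onEvent(ManagementEvent event) {
        for (PolicyRule rule : rules) {
            if (rule.condition().test(event)) {
                rule.action().run();
            }
        }
    }
}

class PolicyManagerDemo {
    public static void main(String[] args) {
        PolicyManager pm = new PolicyManager(List.of(
            new PolicyRule(e -> e.type().equals("load") && e.value() > 0.8,
                           () -> System.out.println("allocate additional storage"))));
        pm.onEvent(new ManagementEvent("load", 0.9));
    }
}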
We recommend using policy-based management for high-level policies that require the flexibility of rapidly being changed and manipulated by administrators
(easily understood by humans, can be changed on the fly, kept separate from development code for easier management, etc.). On the other hand, low-level, relatively static policies and management logic should be hard-coded for performance. It is
also important to keep in mind that even when using policy-based management we
still have to implement management actuation and monitoring.
A prototype of the framework is presented and evaluated using the YASS distributed
storage service. We evaluated two generic policy languages (policy engines and
corresponding APIs), namely XACML (eXtensible Access Control Markup Language) [57] and SPL (Simplified Policy Language) [37], that we used to implement
the policy logic of YASS management, which was previously hard-coded.
Thesis Author Contribution
The author played a major role in designing the system and introducing a policy
manager in the control loop for an autonomic manager. He also suggested the
use of SPL as a policy language. The author contributed to the implementation,
integration, and evaluation of policy-based management in the Niche platform.
4.5 Replication of Management Elements
Our work on replication of management elements was published as a technical
report [58] that appears as Chapter 9 in this thesis.
Paper Contribution
To simplify the development of autonomic managers, and thus of large-scale distributed systems, it is useful to separate the maintenance of MEs from the development of autonomic managers. It is possible to automate the maintenance process and make it a feature of the Niche platform. This can be achieved by providing a Robust Management Element (RME) abstraction that developers can use if they need their MEs to be robust. By a robust ME we mean that an ME should: 1) provide transparent mobility against resource join/leave (i.e. be location independent); 2) survive resource failures by being automatically restored on another resource; 3) keep its state consistent; 4) provide its service with
minimal disruption in spite of resource join/leave/fail (high availability).
In this work, as discussed in Chapter 9 of this thesis, we present our approach
to achieving RMEs, built on top of structured overlay networks [22], by replicating
MEs using the replicated state machine approach [59, 60]. We propose an algorithm that automatically maintains and reconfigures the set of resources where the ME replicas are hosted. The reconfiguration takes place by migrating [61] MEs to new resources when needed (e.g. upon resource failure). The decision on when to migrate is
decentralized and automated using the symmetric replication [53] replica placement
scheme. The contributions of this work are as follows:
• A decentralized algorithm that automates the reconfiguration of the set of
nodes that host a replicated state machine to tolerate node churn. The algorithm uses structured overlay networks and the symmetric replication replica
placement scheme to detect the need to reconfigure and to select the new set
of nodes. Then the algorithm uses service migration to move/restore replicas
on the new set of nodes.
• A definition of a robust management element as a state machine replicated using the proposed automatic algorithm.
• The construction of an autonomic manager from a network of distributed RMEs.
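To give a flavour of the placement computation, the following minimal Java sketch computes the identifiers of a replica set under symmetric replication, assuming an identifier space of size N that is divisible by the replication degree f (the convention used in DKS [53]); the class and method names are illustrative only.

// Symmetric replication: the replicas of an item with identifier id are placed
// at id, id + N/f, id + 2N/f, ..., all modulo N. Any node can therefore compute
// the whole replica set locally, without global knowledge.
final class SymmetricReplication {
    static long[] replicaIds(long id, long idSpaceSize, int replicationDegree) {
        long step = idSpaceSize / replicationDegree; // assumes divisibility
        long[] ids = new long[replicationDegree];
        for (int r = 0; r < replicationDegree; r++) {
            ids[r] = (id + r * step) % idSpaceSize;
        }
        return ids;
    }

    public static void main(String[] args) {
        // e.g. an identifier space of 2^10 identifiers and replication degree 4
        long[] replicas = replicaIds(42, 1 << 10, 4);
        System.out.println(java.util.Arrays.toString(replicas)); // [42, 298, 554, 810]
    }
}

When a node responsible for one of these identifiers fails or leaves, the new responsible node can detect locally that a replica must be restored there, which is what drives the decentralized reconfiguration decision described above.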
Thesis Author Contribution
The author played a major role in the initial discussions and studies of several possible approaches to solve the problem of replicating stateful management elements.
The author was also a main contributor to the development of the proposed approach and the algorithms presented in the paper, including writing most of the article.
The author also contributed to the implementation and the simulation experiments.
Chapter 5
Conclusions and Future Work
In this chapter we present and discuss the conclusions for the main topics addressed in this thesis. At the end, we discuss possible future work that can build upon and improve the research presented in this thesis.
5.1 Enabling Self-Management
A large-scale distributed application deployed in a dynamic environment requires aggressive support for self-management. The proposed distributed component management system, Niche, enables the development of distributed component-based applications with self-* behaviours. Niche simplifies the development of self-managing applications by separating the functional and management parts of an application, thus making it possible to develop management code separately from the application's functional code. This allows the same application to run in different environments by changing the management, and also allows management code to be reused in different applications.
Niche leverages the self-* properties of the structured overlay network on which it is built. Niche provides a small set of abstractions that facilitate application management. Name-based binding, component groups, sensors, actuators, and management elements, among others, are useful abstractions that enable the development of network-transparent autonomic systems and applications. Network transparency, in particular, is very important in dynamic environments with a high level of churn. It enables the migration of components without disturbing existing bindings and groups; it also enables the migration of management elements without changing the subscriptions for events. This facilitates reasoning about and development of self-managing applications.
In order to verify and evaluate our approach, we used Niche to design a self-managing application, called YASS, to be used in dynamic Grid environments. Our implementation shows the feasibility of the Niche platform. The separation of functional and management code enables us to modify the management to suit different environments and non-functional requirements.
5.2 Design Methodology for Self-Management
We have presented a methodology for developing the management part of a self-managing distributed application in a distributed, dynamic environment. We advocate multiple managers rather than a single centralized manager, which can become a single point of failure and a potential performance bottleneck in a distributed environment. The proposed methodology includes four major design steps: decomposition, assignment, orchestration, and mapping (distribution). The management part is constructed as a number of cooperative autonomic managers, each responsible either for a specific management function (according to a functional decomposition of management) or for a part of the application (according to a spatial decomposition).
Distribution of autonomic managers allows the management overhead to be distributed and increases management performance due to concurrency and better locality. Multiple managers are needed for scalability, robustness, and performance, and are also useful for reflecting separation of concerns.
We have defined and described different paradigms (patterns) of manager interactions, including indirect interaction by stigmergy, direct interaction, sharing
of management elements, and manager hierarchy. In order to illustrate the design
steps, we have developed and presented in this thesis a self-managing distributed storage service with self-healing, self-configuring, and self-optimizing properties
provided by corresponding autonomic managers, developed using the distributed
component management system Niche. We have shown how the autonomic managers can coordinate their actions, by the four described orchestration paradigms,
in order to achieve the overall management objectives.
5.3 Policy Based Self-Management
In this work we proposed a policy-based framework which facilitates distributed policy decision making and introduces the concept of a Policy-Manager-Group, which represents a group of policy-based managers formed to balance the load among Policy-Managers.
Policy-based management has several advantages over hard-coded management.
First, it is easier to administer and maintain (e.g. change) management policies than to trace hard-coded management logic scattered across the codebase. Second,
the separation of policies and application logic (as well as low-level hard-coded
management) makes the implementation easier, since the policy author can focus
on modeling policies without considering the specific application implementation,
while application developers do not have to think about where and how to implement management logic, but rather have to provide hooks to make their system
manageable, i.e. to enable self-management. Third, it is easier to share and reuse
the same policy across multiple different applications and to change the policy
consistently. Finally, policy-based management allows policy authors and administrators to edit and to change policies on the fly (at runtime).
From our evaluation results, we can observe that the hard-coded management
performs better than the policy-based management, which uses a policy engine.
Therefore, policy-based management is recommended for less performance-demanding managers with policies or management objectives that need
to be changed on the fly (at runtime).
5.4 Replication of Management Elements
In this work we presented an approach to achieving robust management elements (RMEs), an abstraction that improves the construction of autonomic managers in the presence of resource churn by improving the availability and robustness of management elements (which are used to create autonomic managers in Niche). The approach relies on our proposed decentralized algorithm that automates the reconfiguration (migration) of the set of machines that host a replicated state machine so that it survives resource churn. We used the replicated state machine approach to guarantee the coherency of the replicas.
Although in this paper we discussed the use of our approach to achieve robust
management elements, we believe that this approach is generic and might be used
to replicate other services built on top of structured overlay networks for the sake
of robustness and high availability.
5.5 Discussion and Future Work
The autonomic computing initiative was started by IBM [1] in 2001 to overcome the problem of the growing complexity of computing systems management, which prevents further development of complex systems and services. The goal was to reduce management complexity by making computer systems self-managing.
Many architectures have been proposed to realise this goal. However, most of these architectures aim at reducing management costs in centralised or clustered environments rather than at enabling complex large-scale systems and services. Process control theory [5] is an important theory that inspired autonomic computing. The closed control loop is an important concept in this theory. A closed loop continuously monitors a system and acts accordingly to keep the system in the desired state range.
Several problems appear when trying to enable self-management for large-scale
and/or dynamic complex distributed systems that do not appear in centralised and
cluster-based systems. These problems include the absence of global knowledge of
the system and network delays. These problems affect the observability/controllability of the control system and may prevent us from directly applying classical
control theory.
Another important problem is scalability of management. One challenge is that
management may become a bottleneck and cause hot spots. Therefore, we advocate distribution of management functions among several cooperative managers that coordinate their activities in order to achieve management objectives. This leads to the next challenge, which is the coordination of multiple autonomic managers to avoid conflicts and oscillations. Multiple autonomic managers are needed in large-scale distributed systems to improve scalability and robustness. Another problem
is the failure of autonomic managers caused by resource churn. The challenge is
to develop an efficient replication algorithm with sufficient guarantees to replicate
autonomic managers in order to tolerate failures.
This is an important problem because the characteristics of large-scale distributed environments (e.g. dynamicity, unpredictability, unreliable communication) require continuous and substantial management of applications. However, the same characteristics prevent the direct application of classical control theory, thus making it difficult to develop autonomic applications for such environments. The consequence of this is that most applications for large-scale distributed environments are simple, specialized, and/or developed in the context
of specific use cases such as file sharing, storage services, communication, content
distribution, distributed search engines, etc.
Networked Control System (NCS) [62] and Model Predictive Control (MPC) [63] are two process control methods that have been in use in industry (e.g. NCS is used to control large factories and MPC is used in process industries such as oil refineries). The main idea in NCS is to link the different components of the control system (sensors, controllers, and actuators) using communication networks such as Ethernet or wireless networks. The main goal was to reduce the complexity and the overall cost by reducing unnecessary wiring and also by making it possible to easily modify or upgrade the control system. NCS faces similar problems related to network delays. MPC has been used in the process industries to model the behaviour of complex dynamical systems. The goal was to compensate for the impact of non-linearities of variables.
Our future work related to improving management in our distributed component management system, Niche, includes investigating and developing distributed algorithms based on NCS and MPC to increase the observability/controllability of applications deployed in large-scale distributed environments. We also plan to further develop our design methodology for self-management, focusing on coordinating
multiple managers. This will facilitate the development of complex autonomic applications for such environments.
A major concern that arises is ease of programming of management logic. Research should hence focus on high-level programming abstractions, language support and tools that facilitate development of self-managing applications. We have
already started to address this aspect.
There is the issue of coupled control loops, which we did not study. In our
scenario, multiple managers interact with each other directly or indirectly (via stigmergy), and it is not always clear how to avoid undesirable behavior such as rapid or large oscillations, which not only can cause the system to behave non-optimally but also increase the management overhead. We found that it is desirable to decentralize management as much as possible, but this probably aggravates the problems with coupled control loops. Application (or service) programmers should not need to handle the coordination of multiple managers (where each manager
may be responsible for a specific behavior). Future work should address the design of
coordination protocols that could be directly used or specialized.
Although some overhead of monitoring for self-management is unavoidable,
there are opportunities for research on efficient monitoring and information gathering/aggregation infrastructures to reduce this overhead. While performance is perhaps not always the dominant concern, we believe that this should be a focus point since the monitoring infrastructure itself executes on volatile resources.
Replication of management elements is a general way to achieve robustness
of self-management. In fact, developers tend to ignore failure and assume that
management programs will be robust. They rely mostly on naïve solutions such
as standby servers to protect against the failure of management. However, these
naïve solutions are not suitable for large-scale dynamic environments. Even though
we have developed and validated a solution (including distributed algorithms) for
replication of management elements in Niche, it is reasonable to continue research
on efficient management replication mechanisms.
Bibliography
[1] P. Horn, “Autonomic computing: IBM’s perspective on the state of information
technology,” Oct. 15 2001.
[2] P. V. Roy, S. Haridi, A. Reinefeld, J.-B. Stefani, R. Yap, and T. Coupaye,
“Self management for large-scale distributed systems: An overview of the selfman project,” in FMCO ’07: Software Technologies Concertation on Formal
Methods for Components and Objects, (Amsterdam, The Netherlands), Oct
2007.
[3] J. O. Kephart and D. M. Chess, “The vision of autonomic computing,” Computer, vol. 36, pp. 41–50, Jan. 2003.
[4] IBM, “An architectural blueprint for autonomic computing, 4th edition.”
http://www-01.ibm.com/software/tivoli/autonomic/pdfs/AC_
Blueprint_White_Paper_4th.pdf, June 2006.
[5] J. L. Hellerstein, Y. Diao, S. Parekh, and D. M. Tilbury, Feedback Control of
Computing Systems. John Wiley & Sons, 2004.
[6] Y. Diao, J. L. Hellerstein, S. Parekh, R. Griffith, G. Kaiser, and D. Phung,
“Self-managing systems: a control theory foundation,” in Proc. 12th IEEE International Conference and Workshops on the Engineering of Computer-Based
Systems ECBS ’05, pp. 441–448, Apr. 4–7, 2005.
[7] S. Abdelwahed, N. Kandasamy, and S. Neema, “Online control for self-management in computing systems,” in Proc. 10th IEEE Real-Time and Embedded Technology and Applications Symposium RTAS 2004, pp. 368–375, May
25–28, 2004.
[8] S. R. White, J. E. Hanson, I. Whalley, D. M. Chess, and J. O. Kephart,
“An architectural approach to autonomic computing,” in Proc. International
Conference on Autonomic Computing, pp. 2–9, May 17–18, 2004.
[9] P. K. McKinley, S. M. Sadjadi, E. P. Kasten, and B. H. C. Cheng, “Composing
adaptive software,” Computer, vol. 37, pp. 56–64, July 2004.
[10] M. Parashar, Z. Li, H. Liu, V. Matossian, and C. Schmidt, Self-star Properties
in Complex Information Systems, vol. 3460/2005 of Lecture Notes in Computer
Science, ch. Enabling Autonomic Grid Applications: Requirements, Models
and Infrastructure, pp. 273–290. Springer Berlin / Heidelberg, May 2005.
[11] R. J. Anthony, “Emergence: a paradigm for robust and scalable distributed
applications,” in Proc. International Conference on Autonomic Computing,
pp. 132–139, May 17–18, 2004.
[12] T. De Wolf, G. Samaey, T. Holvoet, and D. Roose, “Decentralised autonomic
computing: Analysing self-organising emergent behaviour using advanced numerical methods,” in Proc. Second International Conference on Autonomic
Computing ICAC 2005, pp. 52–63, June 13–16, 2005.
[13] O. Babaoglu, M. Jelasity, and A. Montresor, Unconventional Programming
Paradigms, vol. 3566/2005 of Lecture Notes in Computer Science, ch. Grassroots Approach to Self-management in Large-Scale Distributed Systems,
pp. 286–296. Springer Berlin / Heidelberg, August 2005.
[14] G. Tesauro, D. M. Chess, W. E. Walsh, R. Das, A. Segal, I. Whalley, J. O.
Kephart, and S. R. White, “A multi-agent systems approach to autonomic
computing,” in AAMAS ’04: Proceedings of the Third International Joint
Conference on Autonomous Agents and Multiagent Systems, (Washington, DC,
USA), pp. 464–471, IEEE Computer Society, 2004.
[15] D. Bonino, A. Bosca, and F. Corno, “An agent based autonomic semantic platform,” in Proc. International Conference on Autonomic Computing, pp. 189–
196, May 17–18, 2004.
[16] G. Kaiser, J. Parekh, P. Gross, and G. Valetto, “Kinesthetics extreme: an
external infrastructure for monitoring distributed legacy systems,” in Proc.
Autonomic Computing Workshop, pp. 22–30, June 25, 2003.
[17] C. Karamanolis, M. Karlsson, and X. Zhu, “Designing controllable computer
systems,” in HOTOS’05: Proceedings of the 10th conference on Hot Topics
in Operating Systems, (Berkeley, CA, USA), pp. 49–54, USENIX Association,
2005.
[18] G. Valetto, G. Kaiser, and D. Phung, “A uniform programming abstraction
for effecting autonomic adaptations onto software systems,” in Proc. Second
International Conference on Autonomic Computing ICAC 2005, pp. 286–297,
June 13–16, 2005.
[19] M. M. Fuad and M. J. Oudshoorn, “An autonomic architecture for legacy
systems,” in Proc. Third IEEE International Workshop on Engineering of Autonomic and Autonomous Systems EASe 2006, pp. 79–88, Mar. 27–30, 2006.
[20] E. Bruneton, T. Coupaye, and J.-B. Stefani, “The fractal component model,”
tech. rep., France Telecom R&D and INRIA, Feb. 5 2004.
[21] E. Bruneton, T. Coupaye, M. Leclercq, V. Quéma, and J.-B. Stefani, “The fractal component model and its support in java: Experiences with auto-adaptive
and reconfigurable systems,” Softw. Pract. Exper., vol. 36, no. 11-12, pp. 1257–
1284, 2006.
[22] E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim, “A survey and comparison of peer-to-peer overlay network schemes,” Communications Surveys &
Tutorials, IEEE, vol. 7, pp. 72–93, Second Quarter 2005.
[23] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek,
and H. Balakrishnan, “Chord: a scalable peer-to-peer lookup protocol for internet applications,” IEEE/ACM Transactions on Networking, vol. 11, pp. 17–32,
Feb. 2003.
[24] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Schenker, “A scalable
content-addressable network,” in SIGCOMM ’01: Proceedings of the 2001 conference on Applications, technologies, architectures, and protocols for computer
communications, (New York, NY, USA), pp. 161–172, ACM, 2001.
[25] A. I. T. Rowstron and P. Druschel, “Pastry: Scalable, decentralized object
location, and routing for large-scale peer-to-peer systems,” in Middleware ’01:
Proceedings of the IFIP/ACM International Conference on Distributed Systems
Platforms Heidelberg, (London, UK), pp. 329–350, Springer-Verlag, 2001.
[26] M. Parashar and S. Hariri, “Autonomic computing: An overview,” in Unconventional Programming Paradigms, pp. 257–269, 2005.
[27] “The green grid.” http://www.thegreengrid.org/ (Visited on Oct 2009).
[28] A. Al-Shishtawy, V. Vlassov, P. Brand, and S. Haridi, “A design methodology
for self-management in distributed environments,” in Computational Science
and Engineering, 2009. CSE ’09. IEEE International Conference on, vol. 1,
(Vancouver, BC, Canada), pp. 430–436, IEEE Computer Society, August 2009.
[29] R. Das, J. O. Kephart, C. Lefurgy, G. Tesauro, D. W. Levine, and H. Chan,
“Autonomic multi-agent management of power and performance in data centers,” in AAMAS ’08: Proceedings of the 7th international joint conference
on Autonomous agents and multiagent systems, (Richland, SC), pp. 107–114,
International Foundation for Autonomous Agents and Multiagent Systems,
2008.
[30] J. Kephart, H. Chan, R. Das, D. Levine, G. Tesauro, F. Rawson, and C. Lefurgy, “Coordinating multiple autonomic managers to achieve specified power-performance tradeoffs,” in Autonomic Computing, 2007. ICAC ’07. Fourth
International Conference on, pp. 24–24, June 2007.
[31] S. Bouchenak, F. Boyer, S. Krakowiak, D. Hagimont, A. Mos, J.-B. Stefani,
N. de Palma, and V. Quema, “Architecture-based autonomous repair management: An application to J2EE clusters,” in SRDS ’05: Proceedings of the
24th IEEE Symposium on Reliable Distributed Systems, (Orlando, Florida),
pp. 13–24, IEEE, Oct. 2005.
[32] J. O. Kephart and R. Das, “Achieving self-management via utility functions,”
IEEE Internet Computing, vol. 11, no. 1, pp. 40–48, 2007.
[33] S. Abdelwahed and N. Kandasamy, “A control-based approach to autonomic
performance management in computing systems,” in Autonomic Computing:
Concepts, Infrastructure, and Applications (M. Parashar and S. Hariri, eds.),
ch. 8, pp. 149–168, CRC Press, 2006.
[34] V. Bhat, M. Parashar, M. Khandekar, N. Kandasamy, and S. Klasky, “A self-managing wide-area data streaming service using model-based online control,”
in Grid Computing, 7th IEEE/ACM International Conference on, pp. 176–183,
Sept. 2006.
[35] H. Chan and B. Arnold, “A policy based system to incorporate self-managing
behaviors in applications,” in OOPSLA ’03: Companion of the 18th annual
ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, (New York, NY, USA), pp. 94–95, ACM, 2003.
[36] J. Feng, G. Wasson, and M. Humphrey, “Resource usage policy expression and
enforcement in grid computing,” in Grid Computing, 2007 8th IEEE/ACM
International Conference on, pp. 66–73, Sept. 2007.
[37] D. Agrawal, S. Calo, K.-W. Lee, and J. Lobo, “Issues in designing a policy language for distributed management of IT infrastructures,” in Integrated Network Management, 2007. IM ’07. 10th IFIP/IEEE International
Symposium, pp. 30–39, June 2007.
[38] “Apache imperius.” http://incubator.apache.org/imperius/ (Visited on
Oct 2009).
[39] V. Kumar, B. F. Cooper, G. Eisenhauer, and K. Schwan, “imanage: policy-driven self-management for enterprise-scale systems,” in Middleware ’07: Proceedings of the ACM/IFIP/USENIX 2007 International Conference on Middleware, (New York, NY, USA), pp. 287–307, Springer-Verlag New York, Inc.,
2007.
[40] L. Bao, A. Al-Shishtawy, and V. Vlassov, “Policy based self-management in
distributed environments,” in Third IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshops (SASOW 2009), (San Francisco, California), September 2009.
[41] “The center for autonomic computing.” http://www.nsfcac.org/ (Visited
Oct 2009).
[42] B. Rochwerger, A. Galis, E. Levy, J. Caceres, D. Breitgand, Y. Wolfsthal,
I. Llorente, M. Wusthoff, R. Montero, and E. Elmroth, “Reservoir: Management technologies and requirements for next generation service oriented infrastructures,” in Integrated Network Management, 2009. IM ’09. IFIP/IEEE
International Symposium on, pp. 307–310, June 2009.
[43] “Reservoir: Resources and services virtualization without barriers.” http://
reservoir.cs.ucl.ac.uk/ (Visited on Oct 2009).
[44] M. Parashar, H. Liu, Z. Li, V. Matossian, C. Schmidt, G. Zhang, and S. Hariri,
“Automate: Enabling autonomic applications on the grid,” Cluster Computing,
vol. 9, no. 2, pp. 161–174, 2006.
[45] D. Chess, A. Segal, I. Whalley, and S. White, “Unity: Experiences with a prototype autonomic computing system,” Proc. of Autonomic Computing, pp. 140–
147, May 2004.
[46] “Selfman project.” http://www.ist-selfman.org/ (Visited Oct 2009).
[47] “Grid4all project.” http://www.grid4all.eu (visited Oct 2009).
[48] A. Al-Shishtawy, J. Höglund, K. Popov, N. Parlavantzas, V. Vlassov, and
P. Brand, “Enabling self-management of component based distributed applications,” in From Grids to Service and Pervasive Computing (T. Priol and
M. Vanneschi, eds.), pp. 163–174, Springer US, July 2008.
[49] P. Brand, J. Höglund, K. Popov, N. de Palma, F. Boyer, N. Parlavantzas,
V. Vlassov, and A. Al-Shishtawy, “The role of overlay services in a self-managing framework for dynamic virtual organizations,” in Making Grids
Work (M. Danelutto, P. Fragopoulou, and V. Getov, eds.), pp. 153–164,
Springer US, 2007.
[50] N. de Palma, K. Popov, V. Vlassov, J. Höglund, A. Al-Shishtawy, and N. Parlavantzas, “A self-management framework for overlay-based applications,” in
International Workshop on Collaborative Peer-to-Peer Information Systems
(WETICE COPS 2008), (Rome, Italy), June 2008.
[51] K. Popov, J. Höglund, A. Al-Shishtawy, N. Parlavantzas, P. Brand, and
V. Vlassov, “Design of a self-* application using p2p-based management infrastructure,” in Proceedings of the CoreGRID Integration Workshop (CGIW’08)
(S. Gorlatch, P. Fragopoulou, and T. Priol, eds.), COREGrid, (Crete, GR),
pp. 467–479, Crete University Press, April 2008.
[52] P. Brand, J. Höglund, K. Popov, N. de Palma, F. Boyer, N. Parlavantzas,
V. Vlassov, and A. Al-Shishtawy, “The role of overlay services in a self-managing framework for dynamic virtual organizations,” in CoreGRID Workshop, Crete, Greece, June 2007.
[53] A. Ghodsi, Distributed k-ary System: Algorithms for Distributed Hash Tables.
PhD thesis, Royal Institute of Technology (KTH), 2006.
[54] A. Al-Shishtawy, J. Höglund, K. Popov, N. Parlavantzas, V. Vlassov, and
P. Brand, “Distributed control loop patterns for managing distributed applications,” in Second IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshops (SASOW 2008), (Venice, Italy), pp. 260–265,
Oct. 2008.
[55] L. Bao, “Evaluation of approaches to policy-based management in self-managing distributed system,” Master’s thesis, Royal Institute of Technology
(KTH), School of Information and Communication Technology (ICT), 2009.
[56] D. Agrawal, J. Giles, K. Lee, and J. Lobo, “Policy ratification,” in Policies for
Distributed Systems and Networks, 2005. Sixth IEEE Int. Workshop, pp. 223–232, June 2005.
[57] “OASIS eXtensible Access Control Markup Language (XACML) TC.” http://www.
oasis-open.org/committees/tc_home.php?wg_abbrev=xacml#expository.
[58] A. Al-Shishtawy, M. A. Fayyaz, K. Popov, and V. Vlassov, “Achieving robust self-management for large-scale distributed applications,” Tech. Rep.
T2010:02, Swedish Institute of Computer Science (SICS), March 2010.
[59] F. B. Schneider, “Implementing fault-tolerant services using the state machine
approach: a tutorial,” ACM Comput. Surv., vol. 22, no. 4, pp. 299–319, 1990.
[60] L. Lamport, “Paxos made simple,” SIGACT News, vol. 32, pp. 51–58, December 2001.
[61] J. R. Lorch, A. Adya, W. J. Bolosky, R. Chaiken, J. R. Douceur, and J. Howell,
“The smart way to migrate replicated stateful services,” SIGOPS Oper. Syst.
Rev., vol. 40, no. 4, pp. 103–115, 2006.
[62] S. Tatikonda, Control under communication constraints. PhD thesis, MIT,
September 2000.
[63] C. E. García, D. M. Prett, and M. Morari, “Model predictive control: Theory
and practice–a survey,” Automatica, vol. 25, pp. 335–348, May 1989.
Part II
Research Papers
Paper A
Chapter 6
Enabling Self-Management of Component Based Distributed Applications
Ahmad Al-Shishtawy, Joel Höglund, Konstantin Popov,
Nikos Parlavantzas, Vladimir Vlassov, and Per Brand
In From Grids to Service and Pervasive Computing (T. Priol and M. Vanneschi,
eds.), pp. 163–174, Springer US, July 2008.
Enabling Self-Management of Component Based
Distributed Applications
Ahmad Al-Shishtawy,1 Joel Höglund,2 Konstantin Popov,2 Nikos Parlavantzas,3
Vladimir Vlassov,1 and Per Brand2
1 Royal Institute of Technology (KTH), Stockholm, Sweden
{ahmadas, vladv}@kth.se
2 Swedish Institute of Computer Science (SICS), Stockholm, Sweden
{kost, joel, perbrand}@sics.se
3 INRIA, Grenoble, France
[email protected]
Abstract
Deploying and managing distributed applications in dynamic Grid environments requires a high degree of autonomous management. Programming
autonomous management in turn requires programming environment support and higher level abstractions to become feasible. We present a framework for programming self-managing component-based distributed applications. The framework enables the separation of application’s functional and
non-functional (self-*) parts. The framework extends the Fractal component model by the component group abstraction and one-to-any and oneto-all bindings between components and groups. The framework supports a
network-transparent view of system architecture simplifying designing application self-* code. The framework provides a concise and expressive API for
self-* code. The implementation of the framework relies on scalability and robustness of the Niche structured p2p overlay network. We have also developed
a distributed file storage service to illustrate and evaluate our framework.
6.1 Introduction
Deployment and run-time management of applications constitute a large part of
software’s total cost of ownership. These costs increase dramatically for distributed
applications that are deployed in dynamic environments such as unreliable networks
aggregating heterogeneous, poorly managed resources.
The autonomic computing initiative [1] advocates self-configuring, self-healing,
self-optimizing, and self-protecting (self-* hereafter) systems as a way to reduce the management costs of such applications. Architecture-based self-* management [2] of component-based applications [3] has been shown useful for the self-repair of applications running on clusters [4].
We present a design of a component management platform supporting self-* applications for community-based Grids, and illustrate it with an application. Community-based Grids are envisioned to fill the gap between high-quality Grid environments deployed for large-scale scientific and business applications, and existing
peer-to-peer systems which are limited to a single application. Our application, a
storage service, is intentionally simple from the functional point of view, but it can
self-heal, self-configure and self-optimize itself.
Our framework separates application functional and self-* code. We provide a
programming model and a matching API for developing application-specific self-*
behaviours. The self-* code is organized as a network of management elements
(MEs) interacting through events. The self-* code senses changes in the environment by means of events generated by the management platform or by application
specific sensors. The MEs can actuate changes in the architecture – add, remove
and reconfigure components and bindings between them. Applications using our
framework rely on external resource management providing discovery and allocation
services.
Our framework supports an extension of the Fractal component model [3]. We
introduce the concept of component groups and bindings to groups. This results
in “one-to-all” and “one-to-any” communication patterns, which support scalable,
fault-tolerant and self-healing applications [5]. For functional code, a group of
components acts as a single entity. Group membership management is provided
by the self-* code and is transparent to the functional code. With a one-to-any
binding, a component can communicate with a component randomly chosen at
run-time from a certain group. With a one-to-all binding, it will communicate
with all elements of the group. In either case, the content of the group can change
dynamically (e.g. because of churn) affecting neither the source component nor
other elements of the destination’s group.
The management platform is self-organizing and self-healing upon churn. It is
implemented on the Niche overlay network [5] providing for reliable communication
and lookup, and for sensing behaviours provided to self-* code.
Our first contribution is a simple yet expressive self-* management framework.
The framework supports a network-transparent view of system architecture, which
simplifies reasoning about and designing application self-* code. In particular, it
facilitates migration of components and management elements caused by resource
churn. Our second contribution is the implementation model for our churn-tolerant
management platform that leverages the self-* properties of a structured overlay
network.
We do not aim at a general model for ensuring coherency and convergence of
distributed self-* management. We believe, however, that our framework is general enough for arbitrary self-management control loops. Our example application
demonstrates also that these properties are attainable in practice.
Figure 6.1: Application Architecture.

Figure 6.2: Ids and Handlers.

6.2 The Management Framework
An application in the framework consists of a component-based implementation
of the application’s functional specification (the lower part of Figure 6.1), and an
implementation of the application’s self-* behaviors (the upper part). The management platform provides for component deployment and communication, and
supports sensing of component status.
Self-* code in our management framework consists of management elements
(MEs), which we subdivide into watchers (W1, W2 .. on Figure 6.1), aggregators
(Aggr1) and managers (Mgr1), depending on their roles in the self-* code. MEs
are stateful entities that subscribe to and receive events from sensors and other
MEs. Sensors are either component-specific and developed by the programmer, or
provided by the management framework itself such as component failure sensors.
MEs can manipulate the architecture using the management actuation API [4]
implemented by the framework. The API provides in particular functions to deploy
and interconnect components.
Elements of the architecture – components, bindings, MEs, subscriptions, etc. –
are identified by unique identifiers (IDs). Information about an architecture element
is kept in a handle that is unique for the given ID, see Figure 6.2. The actuation
API is defined in terms of IDs. IDs are introduced by DCMS API calls that deploy
components, construct bindings between components and subscriptions between
MEs. IDs are specified when operations are to be performed on architecture elements, like deallocating a component. Handles are destroyed (become invalid) as a side effect of the destruction of their architecture elements. Handles to architecture elements are implemented by sets of network references, described below. Within an ME, handles are represented by an object that can cache information from the handle. In Figure 6.2, the handle object for id:3 used by the deploy actuation API call caches the location of id:3.
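To make the ID/handle distinction concrete, the following minimal Java sketch models IDs, handles with cached location information, and an actuation interface expressed in terms of IDs. All type and method names are illustrative placeholders and not the actual DCMS/Niche API.

import java.util.Optional;

// An opaque identifier for an architecture element (component, binding, ME, ...).
record Id(long value) { }

// A handle keeps information about the element identified by an Id and may
// cache its current location; the cache can become stale after migration.
class Handle {
    final Id id;
    private Optional<String> cachedLocation = Optional.empty();
    Handle(Id id) { this.id = id; }
    void cacheLocation(String node) { cachedLocation = Optional.of(node); }
    Optional<String> location() { return cachedLocation; }
}

// An actuation interface defined in terms of IDs, as described in the text.
interface Actuation {
    Id deploy(String componentDescription, String resource); // introduces a new Id
    void bind(Id client, Id server);                          // connect two components
    void deallocate(Id component);                            // destroys the element
}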
Figure 6.3: Structure of MEs.

Figure 6.4: Composition of MEs.

An ME consists of an application-specific component and an instance of the
generic proxy component, see Figure 6.3. ME proxies provide for communication
between MEs, see Figure 6.4, and enable the programmer to control the management architecture transparently to individual MEs. Sensors have a similar two-part
structure.
The management framework enables the developer of self-* code to control the location of MEs. For every management element the developer can specify a container where that element should reside. A container is a first-class entity whose sole purpose is to ensure that entities in the container reside on the same physical
node. This eliminates network communication latencies between co-located MEs.
The container’s location can be explicitly defined by a location of a resource that
is used to host elements of the architecture, thus eliminating the communication
latency and overhead between architecture elements and managers handling them.
A Set of Network References, SNR [5], is a primitive data abstraction that is
used to associate a name with a set of references. SNRs are stored under their names
on the structured overlay network. SNR references are used to access elements in
the system and can be either direct or indirect. Direct references contain the
location of an entity, and indirect references refer to other SNRs by names and
need to be resolved before use. SNRs can be cached by clients improving access
time. The framework recognizes out-of-date references and refreshes cache contents
when needed.
Groups are implemented using SNRs containing multiple references. A “one-to-any” or “one-to-all” binding to a group means that when a message is sent
through the binding, the group name is resolved to its SNR, and one or more of
the group references are used to send the message depending on the type of the
binding. SNRs also enable mobility of elements pointed to by the references. MEs
can move components between resources, and by updating their references other
elements can still find the components by name. A group can grow or shrink
transparently from the group user’s point of view. Finally, SNRs are used to support
sensing through associating watchers with SNRs. Adding a watcher to an SNR
will result in sensors being deployed for each element associated with the SNR.
Changing the references of an SNR will transparently deploy/undeploy sensors for
the corresponding elements.
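The following minimal Java sketch models the SNR idea described above: a name bound to a set of references that are either direct (a location) or indirect (another SNR name), with one-to-all and one-to-any resolution over a group. The structure and names are illustrative only; the real implementation stores SNRs in the structured overlay network.

import java.util.*;

// A reference is either direct (contains a location) or indirect (names another SNR).
sealed interface Ref permits Direct, Indirect { }
record Direct(String location) implements Ref { }
record Indirect(String snrName) implements Ref { }

class SnrStore {
    private final Map<String, List<Ref>> snrs = new HashMap<>();
    private final Random random = new Random();

    void put(String name, List<Ref> refs) { snrs.put(name, refs); }

    // One-to-all resolution: collect the direct locations, following indirect
    // references by name (cycles between SNRs are not handled in this sketch).
    List<String> resolveAll(String name) {
        List<String> locations = new ArrayList<>();
        for (Ref ref : snrs.getOrDefault(name, List.of())) {
            if (ref instanceof Direct d) locations.add(d.location());
            else if (ref instanceof Indirect i) locations.addAll(resolveAll(i.snrName()));
        }
        return locations;
    }

    // One-to-any resolution: deliver to one randomly chosen member of the group.
    Optional<String> resolveAny(String name) {
        List<String> all = resolveAll(name);
        return all.isEmpty() ? Optional.empty() : Optional.of(all.get(random.nextInt(all.size())));
    }
}

Because senders always go through the name, the set of references behind an SNR can change (components migrate, group members join or leave) without the sending side being affected, which is exactly the mobility and group transparency described above.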
SNRs can be replicated providing for reliable storage of application architecture.
The SNR replication provides eventual consistency of SNR replicas, but transient inconsistencies are allowed. Similarly to the handling of SNR caching, the framework recognizes out-of-date SNR references and repeats SNR access whenever necessary.

Figure 6.5: YASS Functional Part. (Legend: Ax, Bx, Cx denote file groups, where x is the replica number in the group; ovals are resources; rectangles are components; the dashed line marks the YASS storage components group. Read requests use a one-to-any binding to a file group; write requests use a one-to-any binding to the storage group.)
6.3 Implementation and evaluation
We have designed and developed YASS – “yet another storage service” – as a way
to refine the requirements of the management framework, to evaluate it and to
illustrate its functionality. Our application stores, reads and deletes files on a set
of distributed resources. The service replicates files for the sake of robustness and
scalability. We target the service for dynamic Grid environments, where resources
can join, gracefully leave or fail at any time. YASS automatically maintains the file
replication factor upon resource churn, and scales itself based on the load on the
service.
Application functional design
A YASS instance consists of front-end components, which are deployed on user machines, and storage components (Figure 6.5). Storage components are composed
of file components representing files. The ovals in Figure 6.5 represent resources
contributed to a Virtual Organization (VO). Some of the resources are used to
deploy storage components, shown as rectangles.
A user store request is sent to an arbitrary storage component (one-to-any binding) that will find some r different storage components, where r is the file’s replication degree, with enough free space to store a file replica. These replicas together
will form a file group containing the r dynamically created new file components. The user will then use a one-to-all binding to send the file in parallel to the r replicas in the file group. Read requests can be sent to any of the r file components in the group using the one-to-any binding between the front-end and the file group.

Figure 6.6: YASS Non-Functional Part. (The figure depicts the sensing and actuation infrastructure and the management elements of YASS: the file-related MEs, one of each per file group, and the application-wide MEs, one of each per YASS instance, together with the group, leave, failure, and load-change sensors and the actuation API.)
Application non-functional design
Configuration of application self-management. Figure 6.6 shows the architecture of the watchers, aggregators and managers used by the application.
Associated with the group of storage components is a system-wide Storage-aggregator created at service deployment time, which is subscribed to leave- and failure-events which involve any of the storage components. It is also subscribed to a Load-watcher which triggers events in case of high system load. The Storage-aggregator can trigger StorageAvailabilityChange-events, which the Configuration-manager is subscribed to.
When new file-groups are formed by the functional part of the application, the management infrastructure propagates group-creation events to the CreateGroup-manager, which initiates a FileReplica-aggregator and a FileReplica-manager for the
new group. The new FileReplica-aggregator is subscribed to resource leave- and
resource fail-events of the resources associated with the new file group.
Test-cases and initial evaluation
The infrastructure has been initially tested by deploying a YASS instance on a set of
nodes. Using one front-end a number of files are stored and replicated. Thereafter
a node is stopped, generating one fail-event which is propagated to the Storage-aggregator and to the FileReplica-aggregators of all files present on the stopped node. Below we explain in detail how the self-management acts on these events to restore the desired system state.
Figure 6.7: Parts of the YASS application deployed on the management infrastructure.
Figure 6.7 shows the management elements associated with the group of storage
components. The black circles represent physical nodes in the P2P overlay Id space.
Architectural entities (e.g. SNR and MEs) are mapped to ids. Each physical node
is responsible for Ids between its predecessor and itself including itself. As there
is always a physical node responsible for an id, each entity will be mapped to one
of the nodes in the system. For instance, the Configuration Manager is mapped to id 13, which is the responsibility of the node with id 14, which means it will be executed there.
Application Self-healing. Self-healing is concerned with maintaining the desired
replica degree for each stored item. This is achieved as follows for resource leaves
and failures:
Resource leave. An infrastructure sensor signals that a resource is about to leave.
For each file stored at the leaving resource, the associated FileReplica-aggregator
is notified and issues a replicaChange-event which is forwarded to the FileReplica-manager. The FileReplica-manager uses the one-to-any binding of the file-group to
issue a FindNewReplica-event to any of the components in the group.
Resource failure. On a resource failure, the FileGroup-aggregator will check
if the failed resource previously signaled a ResourceLeave (but did not wait long
enough to let the restore replica operation finish). In that case the aggregator will
do nothing, since it has already issued a replicaChange event. Otherwise a failure
is handled the same way as a leave.
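As an illustration of this self-healing flow, the minimal Java sketch below mirrors the reaction to leave- and fail-events described above. The types and method names are illustrative placeholders; the real implementation uses the Niche event and binding APIs.

// Sketch of the self-healing reaction to resource leave/failure events.
interface FileGroupBinding {
    void sendToAny(Object event);   // one-to-any binding to the file group
}

record ReplicaChangeEvent(String fileId) { }
record FindNewReplicaEvent(String fileId) { }

class FileReplicaAggregator {
    private final FileReplicaManager manager;
    private final java.util.Set<String> pendingLeaves = new java.util.HashSet<>();

    FileReplicaAggregator(FileReplicaManager manager) { this.manager = manager; }

    void onResourceLeave(String resourceId, String fileId) {
        pendingLeaves.add(resourceId);
        manager.onReplicaChange(new ReplicaChangeEvent(fileId));
    }

    void onResourceFailure(String resourceId, String fileId) {
        // If the resource already announced a leave, a replicaChange event was
        // already issued; otherwise treat the failure like a leave.
        if (!pendingLeaves.contains(resourceId)) {
            manager.onReplicaChange(new ReplicaChangeEvent(fileId));
        }
    }
}

class FileReplicaManager {
    private final FileGroupBinding fileGroup;
    FileReplicaManager(FileGroupBinding fileGroup) { this.fileGroup = fileGroup; }

    void onReplicaChange(ReplicaChangeEvent e) {
        // Ask any member of the file group to restore the missing replica.
        fileGroup.sendToAny(new FindNewReplicaEvent(e.fileId()));
    }
}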
Application Self-configuration. With self-configuration we mean the ability to
adapt the system in the face of dynamism, thereby maintaining its capability to
meet functional requirements. This is achieved by monitoring the total amount of
allocated storage. The Storage-aggregator is initialized with the amount of available resources at deployment time and updates the state in case of resource leaves or failures. If the total amount of allocated resources drops below the given requirements, the Storage-aggregator issues a storageAvailabilityChange-event which is processed by the Configuration-manager. The Configuration-manager will try to find an unused resource (via the external resource management service) to deploy a new storage component, which is added to the group of components. Parts of the Storage-aggregator and Configuration-manager pseudocode are shown in Listings 6.1 and 6.2, demonstrating how the stateful information is kept by the aggregator and updated through sensing events, while the actuation commands are initiated by the manager.

Listing 6.1: Pseudocode for parts of the Storage-aggregator

upon event ResourceFailure(resource_id) do
  amount_to_subtract = allocated_resources(resource_id)
  total_storage = total_amount - amount_to_subtract
  current_load = update(current_load, total_storage)
  if total_amount < initial_requirement or current_load > high_limit then
    trigger(availabilityChangeEvent(total_storage, current_load))
end
Application Self-optimization. In addition to the two test-cases described above, we have also designed, but not fully tested, application self-optimization. With self-optimization we mean the ability to adapt the system so that it, besides meeting
functional requirements, also meets additional non-functional requirements such
as efficiency. This is achieved by using the ComponentLoad-watcher to gather
information on the total system load, in terms of used storage. The storage components report their load changes, using application-specific load sensors. These
load-change events are delivered to the Storage-aggregator. The aggregator will
be able to determine when the total utilization is critically high, in which case a
StorageAvailabilityChange-event is generated and processed by the Configuration-manager in the same way as described in the self-configuration section. If utilization
drops below a given threshold, and the amount of allocated resources is above initial requirements, a storageAvailabilityChange-event is generated. In this case the
event indicates that the availability is higher than needed, which will cause the
Configuration-manager to query the ComponentLoad-watcher for the least loaded
storage component, and instruct it to deallocate itself, thereby freeing the resource.
Parts of the Configuration-manager pseudocode are shown in Listing 6.2, demonstrating how the number of storage components can be adjusted as needed.
Listing 6.2: Pseudocode for parts of the Configuration-manager

upon event availabilityChangeEvent(total_storage, new_load) do
    if total_storage < initial_requirement or new_load > high_limit then
        new_resource = resource_discover(component_requirements, compare_criteria)
        new_resource = allocate(new_resource, preferences)
        new_component = deploy(storage_component_description, new_resource)
        add_to_group(new_component, component_group)
    else if total_storage > initial_requirement and new_load < low_limit then
        least_loaded_component = component_load_watcher.get_least_loaded()
        least_loaded_resource = least_loaded_component.get_resource()
        trigger(resourceLeaveEvent(least_loaded_resource))
end
6.4 Related Work
Our work builds on the technical work on the Jade component-management system [4]. Jade utilizes Java RMI and is limited to cluster environments, as it relies on small and bounded communication latencies between nodes.
Since this work suggests a particular implementation model for distributed component-based programming, relevant related work can be found both in research on autonomic computing in general and in research on component and programming models for distributed systems.
Autonomic Management. The vision of autonomic management as presented
in [1] has given rise to a number of proposed solutions to aspects of the problem.
Many solutions add self-management support through the actions of a centralized self-manager. One suggested system which tries to add some support for the self-management of the management system itself is Unity [6]. Following the model proposed by Unity, self-healing and self-configuration are enabled by building applications where each system component is an autonomic element, responsible
for its own self-management. Unity assumes cluster-like environments where the
application nodes might fail, but the project only partly addresses the issue of
self-management of the management infrastructure itself.
Relevant complementary work includes work on checkpointing in distributed environments. Here the recent work on Cliques [7] can be mentioned, where worker nodes
help store checkpoints in a distributed fashion to reduce load on managers which
then only deal with group management. Such methods could be introduced in our
framework to support stateful applications.
Component Models. Among the proposed component models which target building distributed systems, the traditional ones, such as the CORBA Component Model or the standard Enterprise JavaBeans, were designed for client-server relationships
assuming highly available resources. They provide very limited support for dynamic
reconfiguration. Other component models, such as OpenCOM [8], allow dynamic
flexibility, but their associated infrastructure lacks support for operation in dynamic
environments.
The Grid Component Model, GCM [9], is a recent component model that specifically targets grid programming. GCM is defined as an extension of Fractal and its
features include many-to-many communications with various semantics and autonomic components.
GCM defines simple “autonomic managers” that embody autonomic behaviours and expose generic operations to execute autonomic operations, to accept QoS contracts, and to signal QoS violations. However, GCM does not prescribe a particular
implementation model and mechanisms to ensure the efficient operation of self-*
code in large-scale environments. Thus, GCM can be seen as largely complementary to our work and, thanks to the common ancestor (Fractal), we believe that our results can be
exploited within a future GCM implementation. Behavioural skeletons [10] aim to
model recurring patterns of component assemblies equipped with correct and effective self-management schemes. Behavioural skeletons are being implemented using
GCM, but the concept of reusable, domain-specific, self-management structures can
be equally applied using our component framework.
GCM also defines collective communications by introducing new kinds of cardinalities for component interfaces: multicast and gathercast [11]. This enables one-to-n and n-to-one communication. However, GCM does not define groups as first-class entities, but only implicitly through bindings, so groups cannot be shared and reused. GCM also does not mention how to handle failures and dynamism (churn), or who is responsible for maintaining a group. Our one-to-all binding can utilise the multicast service provided by the underlying P2P overlay to provide a more scalable and efficient implementation in the case of large groups. Our model also supports mobility, so members of a group can change their location without affecting the group.
A component model designed specifically for structured overlay networks and
wide-scale deployment is p2pCM [12], which extends the DERMI [13] object middleware platform. The model provides replication of component instances, component
lifecycle management and group communication, including anycall functionality to
communicate with the closest instance of a component. The model does not offer
higher level abstractions such as watchers and event handlers, and the support for
self-healing and issues of consistency are only partially addressed.
6.5 Future Work
Currently we are working on the management element wrapper abstraction. This
abstraction adds fault-tolerance to the self-* code by enabling ME replication. The
goal of the management element wrapper is to provide consistency among the replicated MEs in a transparent way and to restore the replication degree if one of
the replicas fails. Without this support from the framework, the user can still have
self-* fault-tolerance by explicitly implementing it as a part of the application’s
non-functional code. The basic idea is that the management element wrapper adds
a consistency layer between the replicated MEs on one side and the sensors/actuators on the other side. This layer provides a uniform view of the events/actions for both sides.
Currently we use a simple architecture description language (ADL) that only covers the application's functional behaviours. We hope to extend it to also cover non-functional aspects.
We are also evaluating different aspects of our framework, such as the overhead of the management framework in terms of network traffic and the time needed to execute
self-* code. Another important aspect is to analyse the effect of churn on the self-*
code.
Finally we would like to evaluate our framework using applications with more
complex self-* behaviours.
6.6 Conclusions
The proposed management framework enables development of distributed component-based applications with self-* behaviours that are independent of the application's functional code, yet can interact with it when necessary. The framework
provides a small set of abstractions that facilitate fault-tolerant application management. The framework leverages the self-* properties of the structured overlay
network which it is built upon. We used our component management framework to
design a self-managing application to be used in dynamic Grid environments. Our
implementation shows the feasibility of the framework.
Bibliography
[1] P. Horn, "Autonomic computing: IBM's perspective on the state of information technology," Oct. 15, 2001.
[2] J. Hanson, I. Whalley, D. Chess, and J. Kephart, "An architectural approach to autonomic computing," in ICAC '04: Proceedings of the First International Conference on Autonomic Computing, (Washington, DC, USA), pp. 2–9, IEEE Computer Society, 2004.
[3] E. Bruneton, T. Coupaye, and J.-B. Stefani, "The Fractal component model," tech. rep., France Telecom R&D and INRIA, Feb. 5, 2004.
[4] S. Bouchenak, F. Boyer, S. Krakowiak, D. Hagimont, A. Mos, J.-B. Stefani, N. de Palma, and V. Quema, "Architecture-based autonomous repair management: An application to J2EE clusters," in SRDS '05: Proceedings of the 24th IEEE Symposium on Reliable Distributed Systems, (Orlando, Florida), pp. 13–24, IEEE, Oct. 2005.
[5] P. Brand, J. Höglund, K. Popov, N. de Palma, F. Boyer, N. Parlavantzas, V. Vlassov, and A. Al-Shishtawy, "The role of overlay services in a self-managing framework for dynamic virtual organizations," in CoreGRID Workshop, Crete, Greece, June 2007.
[6] D. Chess, A. Segal, I. Whalley, and S. White, "Unity: Experiences with a prototype autonomic computing system," in Proc. of Autonomic Computing, pp. 140–147, May 2004.
[7] F. Araujo, P. Domingues, D. Kondo, and L. M. Silva, "Using cliques of nodes to store desktop grid checkpoints," in Proceedings of CoreGRID Integration Workshop 2008, pp. 15–26, Apr. 2008.
[8] G. Coulson, G. Blair, P. Grace, A. Joolia, K. Lee, and J. Ueyama, "A component model for building systems software," in Proceedings of IASTED Software Engineering and Applications (SEA'04), (Cambridge, MA, USA), Nov. 2004.
[9] "Basic features of the Grid component model," CoreGRID Deliverable D.PM.04, CoreGRID, EU NoE project FP6-004265, Mar. 2007.
[10] M. Aldinucci, S. Campa, M. Danelutto, M. Vanneschi, P. Kilpatrick, P. Dazzi, D. Laforenza, and N. Tonellotto, "Behavioural skeletons in GCM: Autonomic management of grid components," in PDP '08: Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008), (Washington, DC, USA), pp. 54–63, IEEE Computer Society, 2008.
[11] F. Baude, D. Caromel, L. Henrio, and M. Morel, "Collective interfaces for distributed components," in CCGRID '07: Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid, (Washington, DC, USA), pp. 599–610, IEEE Computer Society, 2007.
[12] C. Pairot, P. García, R. Mondéjar, and A. Gómez-Skarmeta, "p2pCM: A structured peer-to-peer Grid component model," in International Conference on Computational Science, pp. 246–249, 2005.
[13] C. Pairot, P. García, and A. Gómez-Skarmeta, "DERMI: A new distributed hash table-based middleware framework," IEEE Internet Computing, vol. 8, no. 3, pp. 74–84, 2004.
Paper B
Chapter 7
A Design Methodology for
Self-Management in Distributed
Environments
Ahmad Al-Shishtawy, Vladimir Vlassov, Per Brand, and Seif Haridi
In IEEE International Conference on Computational Science and Engineering (CSE '09), vol. 1, (Vancouver, BC, Canada), pp. 430–436, IEEE Computer Society, August 2009.
A Design Methodology for Self-Management in
Distributed Environments
Ahmad Al-Shishtawy1 , Vladimir Vlassov1 , Per Brand2 , and Seif Haridi1,2
1 Royal Institute of Technology (KTH), Stockholm, Sweden, {ahmadas, vladv, haridi}@kth.se
2 Swedish Institute of Computer Science (SICS), Stockholm, Sweden, {perbrand, seif}@sics.se
Abstract
Autonomic computing is a paradigm that aims at reducing administrative overhead by providing autonomic managers to make applications self-managing. In order to better deal with dynamic environments, and for improved performance and scalability, we advocate for distribution of management functions among several cooperative managers that coordinate their activities in order to achieve management objectives. We present a methodology for designing the management part of a distributed self-managing application in a distributed manner. We define design steps that include partitioning of management functions and orchestration of multiple autonomic managers. We illustrate the proposed design methodology by applying it to the design and development of a distributed storage service as a case study. The storage service prototype has been developed using the distributed component management system Niche. Distribution of autonomic managers allows distributing the management overhead and increasing management performance due to concurrency and better locality.
7.1 Introduction
Autonomic computing [1] is an attractive paradigm to tackle management overhead of complex applications by making them self-managing. Self-management,
namely self-configuration, self-optimization, self-healing, and self-protection (self-*
hereafter), is achieved through autonomic managers [2]. An autonomic manager
continuously monitors hardware and/or software resources and acts accordingly.
Managing applications in dynamic environments (like community Grids and peer-to-peer applications) is especially challenging due to high resource churn and lack of clear management responsibility.
A distributed application requires multiple autonomic managers rather than a
single autonomic manager. Multiple managers are needed for scalability, robustness, and performance, and are also useful for reflecting separation of concerns. Engineering of self-managing distributed applications executed in a dynamic environment requires a methodology for building robust cooperative autonomic managers.
The methodology should include methods for management decomposition, distribution, and orchestration. For example, management can be decomposed into a
number of managers, each responsible for a specific self-* property or, alternatively, for an application subsystem. These managers are not independent but need to cooperate
and coordinate their actions in order to achieve overall management objectives.
The major contributions of the paper are as follows. We propose a methodology
for designing the management part of a distributed self-managing application in a
distributed manner, i.e. with multiple interactive autonomic managers. Decentralization of management and distribution of autonomic managers allows distributing
the management overhead, increasing management performance due to concurrency
and/or better locality. Decentralization does avoid a single point of failure; however, it does not necessarily improve robustness. We define design steps that include
partitioning of management, assignment of management tasks to autonomic managers, and orchestration of multiple autonomic managers. We describe a set of
patterns (paradigms) for manager interactions.
We illustrate the proposed design methodology including paradigms of manager
interactions by applying it to design and development of a distributed storage service as a case study. The storage service prototype has been developed using the
distributed component management system Niche 1 [3–5].
The remainder of this paper is organized as follows. Section 7.2 describes Niche
and relates it to the autonomic computing architecture. Section 7.3 presents the
steps for designing distributed self-managing applications. Section 7.4 focuses on
orchestrating multiple autonomic managers. In Section 7.5 we apply the proposed
methodology to a distributed file storage as a case study. Related work is discussed
in Section 7.6 followed by conclusions and future work in Section 7.7.
7.2 The Distributed Component Management System
The autonomic computing reference architecture proposed by IBM [2] consists of
the following five building blocks.
• Touchpoint: consists of a set of sensors and effectors used by autonomic
managers to interact with managed resources (get status and perform operations). Touchpoints are components in the system that implement a uniform
management interface that hides the heterogeneity of managed resources. A
managed resource must be exposed through touchpoints to be manageable.
• Autonomic Manager: is the key building block in the architecture. Autonomic managers are used to implement the self-management behaviour of the
system. This is achieved through a control loop that consists of four main
stages: monitor, analyze, plan, and execute. The control loop interacts with
the managed resource through the exposed touchpoints.
1 In our previous work [3, 4] our distributed component management system Niche was called DCMS.
• Knowledge Source: is used to share knowledge (e.g. architecture information and policies) between autonomic managers.
• Enterprise Service Bus: provides connectivity of components in the system.
• Manager Interface: provides an interface for administrators to interact
with the system. This includes the ability to monitor/change the status of
the system and to control autonomic managers through policies.
The use-case presented in this paper has been developed using the distributed
component management system Niche [3, 4]. Niche implements the autonomic computing architecture described above. Niche includes a distributed component programming model, APIs, and a run-time system including a deployment service. The main objective of Niche is to enable and to achieve self-management of component-based applications deployed on dynamic distributed environments such as community Grids. A self-managing application in Niche consists of functional and
management parts. Functional components communicate via bindings, whereas
management components communicate mostly via a publish/subscribe event notification mechanism.
The Niche run-time environment is a network of distributed containers hosting
functional and management components. Niche uses a structured overlay network
(Niche [4]) as the enterprise service bus. Niche is self-organising on its own and
provides overlay services such as name-based communication, a distributed hash table (DHT), and a publish/subscribe mechanism for event dissemination. These services are used by Niche to provide higher-level communication abstractions such as name-based bindings to support component mobility; dynamic component groups; one-to-any and one-to-all bindings; and event-based communication.
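To make the two group binding semantics concrete, the sketch below shows, in plain Java, how a one-to-any and a one-to-all call over a group of storage components could behave. The GroupBinding and Storage types are hypothetical stand-ins introduced for illustration only and are not the Niche API.

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative sketch of the two group binding semantics described above.
interface Storage {
    void store(String fileId, byte[] data);
}

class GroupBinding {
    private final List<Storage> members;   // current group members (assumed non-empty)

    GroupBinding(List<Storage> members) {
        this.members = members;
    }

    // one-to-any: the call is delivered to a single, randomly chosen member.
    void storeOneToAny(String fileId, byte[] data) {
        Storage any = members.get(ThreadLocalRandom.current().nextInt(members.size()));
        any.store(fileId, data);
    }

    // one-to-all: the call is delivered to every member of the group.
    void storeOneToAll(String fileId, byte[] data) {
        for (Storage member : members) {
            member.store(fileId, data);
        }
    }
}

In the actual platform the group membership is maintained by the overlay and may change behind the binding, which is precisely what makes the abstraction useful; the fixed list above is only a simplification.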
For implementing the touchpoints, Niche leverages the introspection and dynamic reconfiguration features of the Fractal component model [6] in order to provide sensors and actuation API abstractions. Sensors are special components that
can be attached to the application’s functional components. There are also built-in
sensors in Niche that sense changes in the environment such as resource failures,
joins, and leaves, as well as modifications in application architecture such as creation of a group. The actuation API is used to modify the application’s functional
and management architecture by adding, removing, and reconfiguring components, groups, and bindings.
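A rough sketch of what such touchpoints could look like is given below. The FreeSpaceSensor and ArchitectureEffector types, and all of their methods, are hypothetical examples of a push-style sensor and a reconfiguration effector; they do not reproduce the real Niche sensor and actuation APIs.

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical touchpoint-style interfaces: a sensor pushes monitored values
// to its subscribers, and an effector exposes reconfiguration actions.
class FreeSpaceSensor {
    private final List<Consumer<Long>> subscribers = new ArrayList<>();

    void subscribe(Consumer<Long> watcher) {
        subscribers.add(watcher);
    }

    // Called by the storage component whenever its free space changes.
    void report(long freeBytes) {
        for (Consumer<Long> s : subscribers) {
            s.accept(freeBytes);
        }
    }
}

interface ArchitectureEffector {
    void deployComponent(String descriptor, String resourceId);
    void addToGroup(String componentId, String groupId);
    void removeBinding(String bindingId);
}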
The Autonomic Manager (a control loop) in Niche is organized as a network
of Management Elements (MEs) interacting through events, monitoring via sensors and acting using the actuation API. This enables the construction of distributed control loops. MEs are subdivided into watchers, aggregators, and managers. Watchers are used for monitoring via sensors and can be programmed to find
symptoms to be reported to aggregators or directly to managers. Aggregators are
used to aggregate and analyse symptoms and to issue change requests to managers.
Managers do planning and execute change requests.
Knowledge in Niche is shared between MEs using two mechanisms: first, using
the publish/subscribe mechanism provided by Niche; second, using the Niche DHT
to store/retrieve information such as component group members, name-to-location
mappings.
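The following sketch illustrates, under simplifying assumptions, these two knowledge-sharing mechanisms as plain Java placeholders: an in-process event bus standing in for the overlay's publish/subscribe service and a map standing in for the Niche DHT. Both classes are hypothetical and only convey the idea, not the overlay implementation.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Stand-in for the publish/subscribe service: MEs subscribe to topics
// (e.g. "loadChange") and publish events to them.
class EventBus {
    private final Map<String, CopyOnWriteArrayList<Consumer<Object>>> topics =
            new ConcurrentHashMap<>();

    void subscribe(String topic, Consumer<Object> handler) {
        topics.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    void publish(String topic, Object event) {
        topics.getOrDefault(topic, new CopyOnWriteArrayList<>())
              .forEach(h -> h.accept(event));
    }
}

// Stand-in for the DHT used to share state between management elements.
class SharedKnowledge {
    private final Map<String, Object> dht = new ConcurrentHashMap<>();

    void put(String key, Object value) { dht.put(key, value); }   // e.g. group members
    Object get(String key)             { return dht.get(key); }   // e.g. name-to-location mappings
}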
7.3 Steps in Designing Distributed Management
A self-managing application can be decomposed into three parts: the functional
part, the touchpoints, and the management part. The design process starts by specifying the functional and management requirements for the functional and management parts, respectively. In the case of Niche, the functional part of the application
is designed by defining interfaces, components, component groups, and bindings.
The management part is designed based on management requirements, by defining
autonomic managers (management elements) and the required touchpoints (sensors
and effectors).
An Autonomic Manager is a control loop that senses and affects the functional
part of the application. For many applications and environments it is desirable
to decompose the autonomic manager into a number of cooperating autonomic
managers each performing a specific management function or/and controlling a
specific part of the application. Decomposition of management can be motivated
by several reasons, such as the following. It allows avoiding a single point of failure.
It may be required to distribute the management overhead among participating
resources. Self-managing a complex system may require more than one autonomic
manager to simplify design by separation of concerns. Decomposition can also be
used to enhance the management performance by running different management
tasks concurrently and by placing the autonomic managers closer to the resources
they manage.
We define the following iterative steps to be performed when designing and
developing the management part of a self-managing distributed application in a
distributed manner.
Decomposition: The first step is to divide the management into a number of
management tasks. Decomposition can be either functional (e.g. tasks are defined based on which self-* properties they implement) or spatial (e.g. tasks
are defined based on the structure of the managed application). The major
design issue to be considered at this step is granularity of tasks assuming that
a task or a group of related tasks can be performed by a single manager.
Assignment: The tasks are then assigned to autonomic managers each of which
becomes responsible for one or more management tasks. Assignment can
be done based on self-* properties that a task belongs to (according to the
functional decomposition) or based on which part of the application that task
is related to (according to the spatial decomposition).
Orchestration: Although autonomic managers can be designed independently,
multiple autonomic managers, in the general case, are not independent since
they manage the same system and there exist dependencies between management tasks. Therefore they need to interact and coordinate their actions in
order to avoid conflicts and interference and to manage the system properly.
Mapping: The set of autonomic managers are then mapped to the resources, i.e.
to nodes of the distributed environment. A major issue to be considered at this
step is optimized placement of managers and possibly functional components
on nodes in order to improve management performance.
In this paper our major focus is on the orchestration of autonomic managers
as the most challenging and least studied problem. The actions and objectives of
the other stages are more related to classical issues in distributed systems such as
partitioning and separation of concerns, and optimal placement of modules in a
distributed environment.
7.4 Orchestrating Autonomic Managers
Autonomic managers can interact and coordinate their operation in the following
four ways:
Stigmergy
Stigmergy is a way of indirect communication and coordination between agents [7].
Agents make changes in their environment, and these changes are sensed by other agents and cause them to perform further actions. Stigmergy was first observed in social
insects like ants. In our case agents are autonomic managers and the environment
is the managed application.
The stigmergy effect is, in general, unavoidable when there is more than one autonomic manager, and it can cause undesired behaviour at runtime. Hidden stigmergy makes it challenging to design a self-managing system with multiple autonomic managers. However, stigmergy can be part of the design and used as a way
of orchestrating autonomic managers (Figure 7.1).
Hierarchical Management
By hierarchical management we mean that some autonomic managers can monitor
and control other autonomic managers (Figure 7.2). The lower level autonomic
managers are considered as a managed resource for the higher level autonomic
manager. Communication between levels takes place using touchpoints. Higher
level managers can sense and affect lower level managers.
Figure 7.1: The stigmergy effect.
Figure 7.2: Hierarchical management.
Autonomic managers at different levels often operate at different time scales.
Lower level autonomic managers are used to manage changes in the system that
need immediate actions. Higher level autonomic managers are often slower and
used to regulate and orchestrate the system by monitoring global properties and
tuning lower level autonomic managers accordingly.
Direct Interaction
Autonomic managers may interact directly with one another. Technically this is
achieved by binding the appropriate management elements (typically managers) in
the autonomic managers together (Figure 7.3). Cross autonomic manager bindings
can be used to coordinate autonomic managers and avoid undesired behaviors such as race conditions or oscillations.

Figure 7.3: Direct interaction.

Figure 7.4: Shared Management Elements.
Shared Management Elements
Another way for autonomic managers to communicate and coordinate their actions
is by sharing management elements (Figure 7.4). This can be used to share state
(knowledge) and to synchronise their actions.
7.5 Case Study: A Distributed Storage Service
In order to illustrate the design methodology, we have developed a storage service
called YASS (Yet Another Storage Service) [3], using Niche. The case study illustrates how to design a self-managing distributed system monitored and controlled
by multiple distributed autonomic managers.
YASS Specification
YASS is a storage service that allows users to store, read and delete files on a set
of distributed resources. The service transparently replicates the stored files for
robustness and scalability.
Assuming that YASS is to be deployed and provided in a dynamic distributed
environment, the following management functions are required in order to make
the storage service self-managing in the presence of dynamicity in resources and
load: the service should tolerate the resource churn (joins/leaves/failures), optimize
usage of resources, and resolve hot-spots. We define the following tasks based on
the functional decomposition of management according to self-* properties (namely
self-healing, self-configuration, and self-optimization) to be achieved.
• Maintain the file replication degree by restoring the files which were stored
on a failed/leaving resource. This function provides the self-healing property
of the service so that the service is available despite the resource churn;
• Maintain the total storage space and total free space to meet QoS requirements by allocating additional resources when needed. This function provides
self-configuration of the service;
• Increase the availability of popular files. This and the next two functions
are related to the self-optimization of the service.
• Release excess allocated storage when it is no longer needed.
• Balance the stored files among the allocated resources.
YASS Functional Design
A YASS instance consists of front-end components and storage components as
shown in Figure 7.5. The front-end component provides a user interface that is
used to interact with the storage service. Storage components represent the storage
capacity available at the resource on which they are deployed.
The storage components are grouped together in a storage group. A user issues
commands (store, read, and delete) using the front-end. A store request is sent to
an arbitrary storage component (using one-to-any binding between the front-end
and the storage group) which in turn will find some r different storage components,
where r is the file’s replication degree, with enough free space to store a file replica.
These replicas together will form a file group containing the r storage components
that will host the file. The front-end will then use a one-to-all binding to the file
group to transfer the file in parallel to the r replicas in the group. A read request is
sent to any of the r storage components in the group using the one-to-any binding between the front-end and the file group. A delete request is sent to the file group in parallel using a one-to-all binding between the front-end and the file group.

Figure 7.5: YASS Functional Part. (Ax, Bx, and Cx denote file groups, where x is the replica number within the group; ovals represent resources; rectangles represent components; the dashed line marks the YASS storage components group.)
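A minimal sketch of the store path, assuming hypothetical FrontEnd, StorageNode, FileGroup, and GroupOneToAny types, is shown below; the one-to-all transfer to the r replicas is reduced to a parallel loop for brevity.

import java.util.List;

interface StorageNode {
    // Find r storage components with enough free space and form a file group.
    FileGroup createFileGroup(String fileId, long size, int r);
    void write(String fileId, byte[] data);
}

interface FileGroup {
    List<StorageNode> members();
}

interface GroupOneToAny<T> {
    T any();   // returns a randomly chosen member of the group
}

class FrontEnd {
    private final GroupOneToAny<StorageNode> storageGroup;  // one-to-any binding to the storage group

    FrontEnd(GroupOneToAny<StorageNode> storageGroup) {
        this.storageGroup = storageGroup;
    }

    void store(String fileId, byte[] data, int r) {
        // 1. Ask an arbitrary storage component to select r replicas.
        FileGroup group = storageGroup.any().createFileGroup(fileId, data.length, r);
        // 2. Transfer the file to all r replicas (one-to-all binding,
        //    shown here as a simple parallel loop).
        group.members().parallelStream().forEach(node -> node.write(fileId, data));
    }
}

A read would follow the same pattern but use a one-to-any call on the file group instead of the storage group.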
Enabling Management of YASS
Given that the functional part of YASS has been developed, to manage it we need
to provide touchpoints. Niche provides basic touchpoints for manipulating the system’s architecture and resources, such as sensors of resource failures and component
group creation; and effectors for deploying and binding components.
Besides the basic touchpoints, the following additional YASS-specific sensors and effectors are required.
• A load sensor to measure the current free space on a storage component;
• An access frequency sensor to detect popular files;
• A replicate file effector to add one extra replica of a specified file;
• A move file effector to move files for load balancing.
Self-Managing YASS
The following autonomic managers are needed to manage YASS in a dynamic environment. All four orchestration techniques in Section 7.4 are demonstrated.
Replica Autonomic Manager
Figure 7.6: Self-healing control loop.

Figure 7.7: Self-configuration control loop.

The replica autonomic manager is responsible for maintaining the desired replication degree for each stored file in spite of resources failing and leaving. This
autonomic manager adds the self-healing property to YASS. The replica autonomic
manager consists of two management elements, the File-Replica-Aggregator and
the File-Replica-Manager as shown in Figure 7.6.
The File-Replica-Aggregator monitors a file group, containing the subset of
storage components that host the file replicas, by subscribing to resource fail or
leave events caused by any of the group members. These events are received when
a resource, on which a component member in the group is deployed, is about to leave
or has failed. The File-Replica-Aggregator responds to these events by triggering a
replica change event to the File-Replica-Manager that will issue a find and restore
replica command.
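The sketch below illustrates how the two management elements of this loop could be wired together. The types and method names are hypothetical, and event delivery is reduced to plain method calls so that only the aggregator/manager split is visible.

import java.util.function.Consumer;

// Illustrative wiring of the self-healing loop in Figure 7.6.
class FileReplicaAggregator {
    private final Consumer<String> replicaChange;   // delivers replica change events

    FileReplicaAggregator(Consumer<String> replicaChange) {
        this.replicaChange = replicaChange;
    }

    // Invoked by sensors when the resource of a file-group member fails or is about to leave.
    void onResourceFailureOrLeave(String fileId) {
        replicaChange.accept(fileId);
    }
}

class FileReplicaManager {

    interface Effector {
        void findAndRestoreReplica(String fileId);  // actuation command
    }

    private final Effector effector;

    FileReplicaManager(Effector effector) {
        this.effector = effector;
    }

    void onReplicaChange(String fileId) {
        effector.findAndRestoreReplica(fileId);
    }
}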
Storage Autonomic Manager
The storage autonomic manager is responsible for maintaining the total storage
capacity and the total free space in the storage group, in the presence of dynamism,
to meet QoS requirements. The dynamism is due either to resources failing/leaving
(affecting both the total and free storage space) or file creation/addition/deletion
(affecting the free storage space only). The storage autonomic manager reconfigures
YASS to restore the total free space and/or the total storage capacity to meet
the requirements. The reconfiguration is done by allocating free resources and
deploying additional storage components on them. This autonomic manager adds
the self-configuration property to YASS. The storage autonomic manager consists
of Component-Load-Watcher, Storage-Aggregator, and Storage-Manager as shown
in Figure 7.7.
The Component-Load-Watcher monitors the storage group, containing all storage components, for changes in the total free space available by subscribing to
the load sensor events. The Component-Load-Watcher will trigger a load change
event when the load is changed by a predefined delta. The Storage-Aggregator
is subscribed to the Component-Load-Watcher load change event and the resource
fail, leave, and join events (note that the File-Replica-Aggregator also subscribes to
the resource failure and leave events). The Storage-Aggregator, by analyzing these
events, will be able to estimate the total storage capacity and the total free space.
The Storage-Aggregator will trigger a storage availability change event when the
total and/or free storage space drops below predefined thresholds. The Storage-Manager responds to this event by trying to allocate more resources and deploying
storage components on them.
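The following sketch illustrates the delta-based filtering in the Component-Load-Watcher and the threshold check in the Storage-Aggregator, using hypothetical types. The thresholds (minTotal, minFree) and the delta are illustrative parameters, and allocation and deployment are hidden behind a single callback to the Storage-Manager.

import java.util.function.Consumer;

// Sketch of the self-configuration loop in Figure 7.7 (hypothetical types).
class ComponentLoadWatcher {
    private final long delta;                 // report only changes larger than delta
    private final Consumer<Long> loadChange;  // load change events to the aggregator
    private long lastReported = 0;

    ComponentLoadWatcher(long delta, Consumer<Long> loadChange) {
        this.delta = delta;
        this.loadChange = loadChange;
    }

    void onLoadSensor(long totalFreeSpace) {
        if (Math.abs(totalFreeSpace - lastReported) >= delta) {
            lastReported = totalFreeSpace;
            loadChange.accept(totalFreeSpace);
        }
    }
}

class StorageAggregator {
    private final long minTotal, minFree;
    private final Runnable availabilityChange;   // event to the Storage-Manager
    private long totalCapacity, freeSpace;

    StorageAggregator(long minTotal, long minFree, Runnable availabilityChange) {
        this.minTotal = minTotal;
        this.minFree = minFree;
        this.availabilityChange = availabilityChange;
    }

    void onResourceJoin(long capacity)           { totalCapacity += capacity; freeSpace += capacity; }
    void onResourceFailureOrLeave(long capacity) { totalCapacity -= capacity; check(); }
    void onLoadChange(long newFreeSpace)         { freeSpace = newFreeSpace; check(); }

    private void check() {
        if (totalCapacity < minTotal || freeSpace < minFree) {
            availabilityChange.run();   // Storage-Manager allocates and deploys
        }
    }
}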
Direct Interactions to Coordinate Autonomic Managers
The two autonomic managers, replica autonomic manager and storage autonomic
manager, described above seem to be independent. The first manager restores files
and the other manager restores storage. But as we will see in the following example
it is possible to have a race condition between the two autonomic managers that
will cause the replica autonomic manager to fail. For example, when a resource
fails the storage autonomic manager may detect that more storage is needed and
start allocating resources and deploying storage components. Meanwhile the replica
autonomic manager will be restoring the files that were on the failed resource. The
replica autonomic manager might fail to restore the files due to space shortage if
the storage autonomic manager is slower and does not have time to finish. This
may also prevent the users, temporarily, from storing files.
Had the replica autonomic manager waited for the storage autonomic manager to finish, it would not have failed to recreate the replicas. We used direct interaction
to coordinate the two autonomic managers by binding the File-Replica-Manager to
the Storage-Manager.
Before restoring files the File-Replica-Manager informs the Storage-Manager
about the amount of storage it needs to restore files. The Storage-Manager checks
available storage and informs the File-Replica-Manager that it can proceed if enough
space is available, or asks it to wait.
The direct coordination used here does not mean that one manager controls
the other. For example, if there is only one replica left of a file, the File-Replica-Manager may ignore the request to wait from the Storage-Manager and proceed with restoring the file anyway.
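A sketch of this coordination, with hypothetical StorageManager and FileReplicaManager classes, is given below; the "only one replica left" exception described above is modelled by the replicasLeft parameter.

// Sketch of the direct interaction between the two managers (hypothetical types).
class StorageManager {
    private long freeSpace;

    StorageManager(long freeSpace) { this.freeSpace = freeSpace; }

    // Returns true if the requested amount can be restored immediately,
    // false if the caller should wait until more storage has been allocated.
    synchronized boolean requestSpace(long bytes) {
        if (bytes <= freeSpace) {
            freeSpace -= bytes;
            return true;
        }
        return false;
    }
}

class FileReplicaManager {
    private final StorageManager storageManager;   // direct binding to the other manager

    FileReplicaManager(StorageManager storageManager) {
        this.storageManager = storageManager;
    }

    void restore(String fileId, long size, int replicasLeft) {
        boolean proceed = storageManager.requestSpace(size);
        if (proceed || replicasLeft == 1) {
            // Restore immediately: either space is available, or the file is
            // down to its last replica and waiting would be too risky.
            findAndRestoreReplica(fileId);
        }
        // Otherwise wait for a later storage availability change notification (omitted).
    }

    private void findAndRestoreReplica(String fileId) { /* actuation omitted */ }
}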
Figure 7.8: Hierarchical management.
Optimising Allocated Storage
Systems should maintain high resource utilization. The storage autonomic manager allocates additional resources if needed to guarantee the ability to store files.
However, users might delete files later causing the utilization of the storage space
to drop. It is desirable that YASS be able to self-optimize by releasing excess resources to improve utilization.
It is possible to design an autonomic manager that will detect low resource
utilization, move file replicas stored on a chosen lowly utilized resource, and finally
release it. Since the functionality required by this autonomic manager is partially
provided by the storage and replica autonomic managers we will try to augment
them instead of adding a new autonomic manager, and use stigmergy to coordinate
them.
It is easy to modify the storage autonomic manager to detect low storage utilization. The replica manager knows how to restore files. When the utilization of
the storage components drops, the storage autonomic manager will detect it and
will deallocate some resource. The deallocation of resources will trigger, through
stigmergy, another action at the replica autonomic manager. The replica autonomic
manager will receive the corresponding resource leave events and will move the files
from the leaving resource to other resources.
We believe that this is better than adding another autonomic manager for the following two reasons: first, it allows avoiding duplication of functionality; and second, it allows avoiding oscillation between allocating and releasing resources by keeping the decision about the proper amount of storage in one place.
Improving File Availability

Figure 7.9: Sharing of Management Elements.

Popular files should have more replicas in order to increase their availability. A higher-level availability autonomic manager can be used to achieve this by
regulating the replica autonomic manager. The autonomic manager consists of
two management elements. The File-Access-Watcher and File-Availability-Manager
shown in Figure 7.8 illustrate hierarchical management.
The File-Access-Watcher monitors the file access frequency. If the popularity of a file changes dramatically, it issues a frequency change event. The File-Availability-Manager may decide to change the replication degree of that file. This is achieved by
changing the value of the replication degree parameter in the File-Replica-Manager.
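The sketch below illustrates this hierarchical interaction with hypothetical types: the watcher counts accesses and reports a frequency change once a made-up popularity threshold is reached, and the higher-level manager only adjusts the replication degree parameter exposed by the lower-level replica manager.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the hierarchical loop in Figure 7.8 (hypothetical types).
class FileAccessWatcher {
    private final Map<String, Integer> accessCounts = new ConcurrentHashMap<>();
    private final FileAvailabilityManager manager;
    private final int popularityThreshold;

    FileAccessWatcher(FileAvailabilityManager manager, int popularityThreshold) {
        this.manager = manager;
        this.popularityThreshold = popularityThreshold;
    }

    void onFileAccess(String fileId) {
        int count = accessCounts.merge(fileId, 1, Integer::sum);
        if (count == popularityThreshold) {
            manager.onFrequencyChange(fileId, count);   // frequency change event
        }
    }
}

class FileAvailabilityManager {

    interface ReplicationDegreeEffector {
        // Touchpoint of the lower-level replica autonomic manager.
        void setReplicationDegree(String fileId, int degree);
    }

    private final ReplicationDegreeEffector replicaManager;
    private final int baseDegree = 3;   // illustrative default replication degree

    FileAvailabilityManager(ReplicationDegreeEffector replicaManager) {
        this.replicaManager = replicaManager;
    }

    void onFrequencyChange(String fileId, int accessCount) {
        // Deliberately simple policy: popular files get one extra replica.
        replicaManager.setReplicationDegree(fileId, baseDegree + 1);
    }
}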
Balancing File Storage
A load balancing autonomic manager can be used for self-optimization by trying
to lazily balance the stored files among storage components. Since knowledge of
current load is available at the Storage-Aggregator, we design the load balancing
autonomic manager by sharing the Storage-Aggregator as shown in Figure 7.9.
All autonomic managers we discussed so far are reactive. They receive events
and act upon them. Sometimes proactive managers might also be required, such
as the one we are discussing. Proactive managers are implemented in Niche using
a timer abstraction.
The load balancing autonomic manager is triggered, by a timer, every x time
units. The timer event will be received by the shared Storage-Aggregator that will
trigger an event containing the most and least loaded storage components. This
event will be received by the Load-Balancing-Manager that will move some files
from the most to the least loaded storage component.
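A sketch of such a proactive loop is shown below, using a standard Java timer in place of the Niche timer abstraction; the StorageAggregatorView and MoveFilesEffector interfaces are hypothetical stand-ins for the shared Storage-Aggregator and the move-file effector.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the proactive load-balancing loop in Figure 7.9 (hypothetical types).
class LoadBalancingLoop {

    interface StorageAggregatorView {               // shared management element
        String mostLoadedComponent();
        String leastLoadedComponent();
    }

    interface MoveFilesEffector {
        void moveSomeFiles(String from, String to);
    }

    private final StorageAggregatorView aggregator;
    private final MoveFilesEffector effector;
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    LoadBalancingLoop(StorageAggregatorView aggregator, MoveFilesEffector effector) {
        this.aggregator = aggregator;
        this.effector = effector;
    }

    // Trigger the balancing step every periodSeconds (the "x time units" above).
    void start(long periodSeconds) {
        timer.scheduleAtFixedRate(this::balance, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    private void balance() {
        String most = aggregator.mostLoadedComponent();
        String least = aggregator.leastLoadedComponent();
        if (!most.equals(least)) {
            effector.moveSomeFiles(most, least);
        }
    }
}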
7.6 Related Work
The vision of autonomic management as presented in [1] has given rise to a number
of proposed solutions to aspects of the problem.
An attempt to analyze and understand how multiple interacting loops can manage a single system has been done in [8] by studying and analysing existing systems
such as biological and software systems. Through this study the authors try to understand the rules of good control loop design. A study of how to compose multiple
loops and ensure that they are consistent and complementary is presented in [9].
The authors presented an architecture that supports such compositions.
A reference architecture for autonomic computing is presented in [10]. The
authors present patterns for applying their proposed architecture to solve specific
problems common to self-managing applications. Behavioural Skeletons is a technique presented in [11] that uses algorithmic skeletons to encapsulate general control
loop features that can later be specialized to fit a specific application.
7.7 Conclusions and Future Work
We have presented a methodology for developing the management part of a self-managing distributed application in a distributed dynamic environment. We advocate for multiple managers rather than a single centralized manager, which can induce
a single point of failure and a potential performance bottleneck in a distributed
environment. The proposed methodology includes four major design steps: decomposition, assignment, orchestration, and mapping (distribution). The management
part is constructed as a number of cooperative autonomic managers each responsible
either for a specific management function (according to functional decomposition
of management) or for a part of the application (according to a spatial decomposition). We have defined and described different paradigms (patterns) of manager
interactions, including indirect interaction by stigmergy, direct interaction, sharing
of management elements, and manager hierarchy. In order to illustrate the design
steps, we have developed and presented in this paper a self-managing distributed
storage service with self-healing, self-configuration and self-optimizing properties
provided by corresponding autonomic managers, developed using the distributed
component management system Niche. We have shown how the autonomic managers can coordinate their actions, by the four described orchestration paradigms,
in order to achieve the overall management objectives.
Dealing with failure of autonomic managers (as opposed to functional parts of
the application) is outside the scope of this paper. Clearly, by itself, decentralization of management might make the application more robust (as some aspects of management continue working while others stop), but also more fragile due to an increased risk of partial failure. In both the centralized and the decentralized case, techniques for fault tolerance are needed to ensure robustness. Many of these techniques, while ensuring fault recovery, do so with some significant delay, in which case a decentralized management architecture may prove advantageous as only some aspects of
management are disrupted at any one time.
Our future work includes refinement of the design methodology, further case
studies with a focus on orchestration of autonomic managers, and investigating robustness of managers through transparent replication of management elements.
Acknowledgements
We would like to thank the Niche research team including Konstantin Popov and
Joel Höglund from SICS, and Nikos Parlavantzas from INRIA.
Bibliography
[1] P. Horn, "Autonomic computing: IBM's perspective on the state of information technology," Oct. 15, 2001.
[2] IBM, "An architectural blueprint for autonomic computing, 4th edition." http://www-01.ibm.com/software/tivoli/autonomic/pdfs/AC_Blueprint_White_Paper_4th.pdf, June 2006.
[3] A. Al-Shishtawy, J. Höglund, K. Popov, N. Parlavantzas, V. Vlassov, and P. Brand, "Enabling self-management of component based distributed applications," in From Grids to Service and Pervasive Computing (T. Priol and M. Vanneschi, eds.), pp. 163–174, Springer US, July 2008.
[4] P. Brand, J. Höglund, K. Popov, N. de Palma, F. Boyer, N. Parlavantzas, V. Vlassov, and A. Al-Shishtawy, "The role of overlay services in a self-managing framework for dynamic virtual organizations," in Making Grids Work (M. Danelutto, P. Fragopoulou, and V. Getov, eds.), pp. 153–164, Springer US, 2007.
[5] "Niche homepage." http://niche.sics.se/.
[6] E. Bruneton, T. Coupaye, and J.-B. Stefani, "The Fractal component model," tech. rep., France Telecom R&D and INRIA, Feb. 5, 2004.
[7] E. Bonabeau, "Editor's introduction: Stigmergy," Artificial Life, vol. 5, no. 2, pp. 95–96, 1999.
[8] P. V. Roy, S. Haridi, A. Reinefeld, J.-B. Stefani, R. Yap, and T. Coupaye, "Self management for large-scale distributed systems: An overview of the SELFMAN project," in FMCO '07: Software Technologies Concertation on Formal Methods for Components and Objects, (Amsterdam, The Netherlands), Oct. 2007.
[9] S.-W. Cheng, A.-C. Huang, D. Garlan, B. Schmerl, and P. Steenkiste, "An architecture for coordinating multiple self-management systems," in WICSA '04, (Washington, DC, USA), p. 243, 2004.
[10] J. W. Sweitzer and C. Draper, "Architecture overview for autonomic computing," in Autonomic Computing: Concepts, Infrastructure, and Applications (M. Parashar and S. Hariri, eds.), ch. 5, pp. 71–98, CRC Press, 2006.
[11] M. Aldinucci, S. Campa, M. Danelutto, M. Vanneschi, P. Kilpatrick, P. Dazzi, D. Laforenza, and N. Tonellotto, "Behavioural skeletons in GCM: Autonomic management of grid components," in PDP '08, (Washington, DC, USA), pp. 54–63, 2008.
Paper C
Chapter 8
Policy Based Self-Management in
Distributed Environments
Lin Bao, Ahmad Al-Shishtawy, and Vladimir Vlassov
In Third IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshops (SASOW 2009), (San Francisco, California), September 2009.
Policy Based Self-Management in Distributed
Environments
Lin Bao, Ahmad Al-Shishtawy, and Vladimir Vlassov
Royal Institute of Technology (KTH), Stockholm, Sweden
{linb, ahmadas, vladv}@kth.se
Abstract
Currently, increasing costs and escalating complexities are primary issues in distributed system management. Policy-based management is introduced to simplify the management and reduce the overhead by setting
up policies to govern system behaviors. Policies are sets of rules that govern
the system behaviors and reflect the business goals or system management
objectives.
This paper presents a generic policy-based management framework which
has been integrated into an existing distributed component management system, called Niche, that enables and supports self-management. In this framework, programmers can set up more than one Policy-Manager-Group to avoid
centralized policy decision making which could become a performance bottleneck. Furthermore, the size of a Policy-Manager-Group, i.e. the number
of Policy-Managers in the group, depends on their load, i.e. the number of
requests per time unit. In order to achieve good load balancing, a policy
request is delivered to one of the policy managers in the group randomly chosen on the fly. A prototype of the framework is presented and two generic
policy languages (policy engines and corresponding APIs), namely SPL and
XACML, are evaluated using a self-managing file storage application as a case
study.
8.1 Introduction
To minimize complexities and overheads of distributed system management, IBM
proposed the Autonomic Computing Initiative [1, 2], aiming at developing computing systems that can manage themselves. In this work, we address a generic
policy-based management framework. Policies are sets of rules which govern the
system behaviors and reflect the business goals and objectives. Rules define management actions to be performed under certain conditions and constraints. The
key idea of policy-based management is to allow IT administrators to define a set
of policy rules to govern the behavior of their IT systems, rather than relying on manual management or ad-hoc mechanisms (e.g. writing customized scripts) [3]. In this way, the complexity of system management can be reduced, and the reliability
of the system’s behavior is improved.
The implementation and maintenance of policies are rather difficult, especially
if policies are “hard-coded” (embedded) in the management code of a distributed
system, and the policy logic is scattered in the system implementation. The drawbacks of using “hard-coded” and scattered policy logic are the following: (1) It is
hard to trace policies; (2) the application developer has to be involved in the implementation of policies; (3) when changing policies, the application has to be rebuilt and redeployed, which increases the maintenance overhead. In order to facilitate implementation and maintenance of policies, language support, including a policy
language and a policy evaluation engine, is needed.
This paper presents a generic policy-based management framework which has
been integrated into Niche [4, 5], a distributed component management system for
development and execution of self-managing distributed applications. The main
issues in development of policy-based self-management for a distributed system
are programmability, performance and scalability of management. Note that robustness of management can be achieved by replicating management components.
Our framework introduces the following key concepts to address the above issues: (1) the abstraction of policy, which simplifies the modeling and maintenance of policies; (2) the Policy-Manager-Group, which allows improving scalability and performance of policy-based management by using multiple managers and achieving good load balance among them; (3) the distributed Policy-Manager-Group model, which allows avoiding
centralized policy decision making, which can become a performance bottleneck.
We have built a prototype of the policy-based management framework and applied
it to a distributed storage service called YASS, Yet Another Storage Service [4, 6]
developed using Niche. We have evaluated the performance of policy-based management performed using policy engines, and compared it with the performance of
hard-coded management.
The rest of the paper is organized as follows. Section 8.2 briefly introduces the Niche platform. In Section 8.3, we describe our policy-based management architecture and control loop patterns, and discuss the policy decision making model. We present our policy-based framework prototype and performance evaluation results in Section 8.4, followed by a brief review of some related work in Section 8.5. Finally, Section 8.6 presents some conclusions and directions for our future work.
8.2 Niche: A Distributed Component Management System
Niche [4, 5] is a distributed component management system for development and
execution of self-managing distributed systems, services and applications. Niche
includes a component-based programming model, a corresponding API, and a run-time execution environment for the development, deployment and execution of self-managing distributed applications. Compared to other existing distributed programming environments, Niche has some features and innovations that facilitate
development of distributed systems with robust self-management. In particular,
Niche uses a structured overlay network and DHTs, which allows increasing the level
of distribution transparency in order to enable and to achieve self-management (e.g.
component mobility, dynamic reconfiguration) for large-scale distributed systems;
Niche leverages self-organizing properties of the structured overlay network, and
provides support for transparent replication of management components in order
to improve robustness of management.
Niche separates the programming of functional and management (self-*) parts
of a distributed system or application. The functional code is developed using the
Fractal component model [7] extended with the concept of component groups and
bindings to groups. A Fractal component may contain a client interface (used by the
component) and/or a server interface (provided by the component). Components
interact through bindings. A binding connects a client interface of one component
to a server interface of another component (or component group). The component
group concept brings two communication patterns: “one-to-all” and “one-to-any”.
A component, which is bound to a component group with a one-to-any binding,
communicates with any (but only one) component randomly and transparently
chosen from the group on the fly. A component, which is bound to a group with
a one-to-all binding, communicates with all components in that group at once,
i.e. when the component invokes a method on the group interface bound with
one-to-all binding, all components, members of the group, receive the invocation.
The abstraction of groups and group communication facilitates programming of
both functional and self-management parts, and allows improving scalability and
robustness of management.
The self-* code is organized as a network of distributed management elements
(MEs) (Figure 8.1) communicating with each other through events. MEs are subdivided into Watchers (W), Aggregators (Aggr), Managers (Mgr) and Executors,
depending on their roles in the self-* code. Watchers monitor the state of the managed application and its environment, and communicate monitored information
to Aggregators, which aggregate the information, detect and report symptoms to
Managers. Managers analyze the symptoms, make decisions and request Executors
to perform management actions.
8.3 Niche Policy Based Management
Architecture
Figure 8.2 shows the conceptual view of the policy-based management architecture.
The main elements are described below.
A Watcher (W) is used to monitor a managed resource1 or a group of managed
resources through sensors that are placed on managed resources. Watchers will
collect monitored information and report to an Aggregator.
Aggregators (Aggr) aggregate, filter and analyze the information collected from
Watchers or directly from sensors. When a policy decision is possibly needed, the
aggregator will formulate a policy request event and send it to the Policy-Manager-Group through a one-to-any binding.

1 Further in the paper we call, for short, resource any entity or part of an application and its execution environment which can be monitored and possibly managed, e.g. component, component group, binding, component container, etc.

Figure 8.1: Niche Management Elements

Figure 8.2: Policy Based Management Architecture
Policy-Managers (PM) take the responsibility of loading policies from the policy
repository, making decisions on policy request events, and delegating the obligations
to Executors (E) in charge. Obligations are communicated from Policy-Managers
to Executors in the form of policy obligation events.
Niche achieves reliability of management by replicating management elements.
For example, if a Policy-Manager fails when evaluating a request against policies,
one of its replicas takes over its responsibility and continues with the evaluation.
Executors execute the actions, dictated in policy-obligation-events, on managed
resources through actuators deployed on managed resources.
Special Policy-Watchers monitor the policy repositories and policy configuration files. On any change in the policy repositories or policy configuration files (e.g.
a policy configuration file has been updated), a Policy-Watcher issues a Policy-Change-Event and sends it to the Policy-Manager-Group through the one-to-all binding. Upon receiving the Policy-Change-Event, all Policy-Managers reload the policies. This allows administrators to change policies on the fly.
A Policy-Manager-Group is a group of Policy-Managers, which are loaded with the
same set of policies. Niche is a distributed component platform. In the distributed
system, a single Policy-Manager, governing system behaviors, will be a performance
bottleneck, since every request will be forwarded to it. It is allowed in Niche to have
more than one Policy-Manager-Group in order to avoid the potential bottleneck
with centralized decision making. Furthermore, the size of a Policy-Manager-Group, that is, the number of Policy-Managers it consists of, depends on its load, i.e. the intensity of requests (the number of requests per time unit). When a particular Policy-Manager-Group is highly loaded, the number of Policy-Managers is increased in order to reduce the burden on each member. Niche allows changing the group
members transparently without affecting components bound to the group.
A Local-Conflicts-Detector checks that the new or modified policy does not conflict with any existing local policy for a given Policy-Manager-Group. There might
be several Local-Conflicts-Detectors, one per Policy-Manager-Group. A Global-Conflicts-Detector checks whether the new policy conflicts with other policies in a global, system-wide view.
Policy-Based Management Control Loop
Self-management behaviors can be achieved through control loops. A control loop
keeps watching the states of managed resources and acts accordingly. In the policy-based management architecture described above, a control loop is composed of Watchers,
Aggregators, a Policy-Manager-Group and Executors (Figure 8.2). Note that the
Policy-Manager-Group plays a role of Manager (see Figure 8.1).
Watchers deploy sensors on managed resources to monitor their states, and
report changes to Aggregators, which communicate policy request events to the Policy-Manager-Group using one-to-any bindings. Upon receiving a policy request event, the randomly chosen Policy-Manager retrieves the applicable policies, along with any information required for policy evaluation, and evaluates the policies with the available information.
Based on the rules and actions prescribed in the policy, the Policy-Manager chooses the relevant change plan and delegates it to the Executor in charge. The Executor executes the plan on the managed resource through actuators.
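To make this event flow concrete, the following minimal Java sketch wires a Watcher report through an Aggregator, a randomly chosen member of the Policy-Manager-Group (mimicking the one-to-any binding), and finally an Executor. All class and method names here are illustrative simplifications and are not part of the actual Niche API.

    import java.util.List;
    import java.util.Random;

    // Illustrative sketch of the Watcher -> Aggregator -> Policy-Manager-Group -> Executor chain.
    class SensorEvent {
        final String resource; final double value;
        SensorEvent(String resource, double value) { this.resource = resource; this.value = value; }
    }
    class PolicyRequestEvent {
        final String subject; final double observed;
        PolicyRequestEvent(String subject, double observed) { this.subject = subject; this.observed = observed; }
    }
    class ObligationEvent {
        final String action;
        ObligationEvent(String action) { this.action = action; }
    }
    interface Executor { void execute(ObligationEvent obligation); }

    class PolicyManager {
        // Decides on a policy request and delegates the resulting obligation to the Executor in charge.
        void handle(PolicyRequestEvent request, Executor executor) {
            executor.execute(new ObligationEvent("increase-capacity-for-" + request.subject));
        }
    }

    class PolicyManagerGroup {
        private final List<PolicyManager> members;
        private final Random random = new Random();
        PolicyManagerGroup(List<PolicyManager> members) { this.members = members; }
        // One-to-any binding: a randomly chosen member handles the request.
        void submit(PolicyRequestEvent request, Executor executor) {
            members.get(random.nextInt(members.size())).handle(request, executor);
        }
    }

    class Aggregator {
        private final PolicyManagerGroup group;
        private final Executor executor;
        Aggregator(PolicyManagerGroup group, Executor executor) { this.group = group; this.executor = executor; }
        // Formulates a policy request event from aggregated sensor reports.
        void onSensorEvent(SensorEvent event) {
            group.submit(new PolicyRequestEvent(event.resource, event.value), executor);
        }
    }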
Policy-Manager-Group Model
Our framework allows programmers to define one or more Policy-Manager-Groups
to govern system behaviors. There are two ways of making decisions in policy
management groups: centralized and distributed.
In the centralized model of the Policy-Manager-Group, there is only one Policy-Manager-Group, formed by all Policy-Managers with common policies. The centralized model is easy to implement, and it needs only one Local-Conflicts-Detector and one Policy-Watcher. However, centralized decision making can become a performance bottleneck in policy-based management for a distributed system. Furthermore, management should be distributed, based on spatial and functional partitioning, in order to improve the scalability, robustness, and performance of management. The distribution of management should match and correspond to the architecture of the system being managed, taking into account its structure, the location of its components, the physical network connectivity, and the management structure of the organization where the system is used.
In the distributed model of Policy-Manager-Groups, each policy manager knows only a part of the policies of the whole system. Policy managers with common policies form a policy-manager group associated with a Policy-Watcher. There are several advantages of the distributed model. First, it is a more natural way to realize policy-based management: global policies are applied to govern the behavior of the whole system, while for different groups of components, local policies govern their behaviors based on the hardware platforms and operating systems they run on. Second, this model is more efficient and scalable: since each policy manager reads and evaluates fewer policies, evaluation time is shortened. However, policy managers from different groups need to coordinate their actions in order to finish policy evaluation when a policy request is unknown to a policy manager, which, in this case, needs to ask another policy manager from a different group; any form of coordination comes at a cost in performance. Last, the distributed model of policy-based management is more secure: not all policies should be exposed to every policy manager, and since some policies contain information on system parameters, they should be protected against malicious users. Furthermore, both the Global-Conflicts-Detector and the Local-Conflicts-Detectors are needed to detect whether or not a newly added, changed, or deleted policy conflicts with other policies for the whole system or for a given policy-manager group.
8.4 Niche Policy-based Management Framework Prototype
We have built a prototype of our policy-based management framework for the Niche
distributed component management system by using policy engines and corresponding APIs for two policy languages XACML (eXtensible Access Control Markup
Language) [8, 9] and SPL (Simplified Policy Language) [10, 11].
We have had several reasons for choosing these two languages for our framework. Each of the languages is supported with a Java-implemented policy engine;
Figure 8.3: YASS self-configuration control loop
this makes it easier to integrate the policy engines into our Java-based Niche platform. Both languages allow defining policy rules (rules with obligations in XACML,
or decision statements in SPL) that dictate the management actions to be enforced on managed resources by Executors. SPL is intended for the management of distributed systems. Although XACML was designed for access control rather than for management, its support for obligations can easily be adapted for the management of distributed systems.
In order to test and evaluate our framework, we have applied it to YASS, Yet Another Storage Service [4, 6], which is a self-managing storage service with two control loops: one for self-healing (to maintain a specified file replication degree in order to achieve high file availability in the presence of node churn) and one for self-configuration (to adjust the amount of storage resources according to load changes). For example, the YASS self-configuration control loop consists of a Component-Load-Watcher, a Storage-Aggregator, a Policy-Manager, and a Configuration-Executor, as depicted in Figure 8.3. The Watcher monitors the free storage space in the storage group and reports this information to the Storage-Aggregator. The Aggregator computes the total capacity and total free space in the group and informs the Policy-Manager when the capacity and/or free space drop below predefined thresholds. The Policy-Manager evaluates the event according to the configuration policy and delegates the management obligations to the Executor, which tries to allocate more resources and deploy additional storage components on them in order to increase the capacity and/or free space.
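As an illustration of the Aggregator's role in this loop, the sketch below shows one way the threshold check could be written. The field names and threshold values are invented for the example and are not taken from the YASS implementation.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch of the Storage-Aggregator threshold check in the YASS
    // self-configuration loop; names and threshold values are assumptions.
    class StorageAggregatorSketch {
        // Latest report per storage component: {capacity, free} in bytes.
        private final Map<String, long[]> reports = new HashMap<>();

        private static final long MIN_TOTAL_CAPACITY = 50L * 1024 * 1024 * 1024; // assumed threshold
        private static final double MIN_FREE_RATIO = 0.2;                        // assumed threshold

        void onLoadReport(String componentId, long capacity, long free) {
            reports.put(componentId, new long[] { capacity, free });
        }

        // True when a policy request event should be sent to the Policy-Manager, which may
        // then obligate the Configuration-Executor to allocate and deploy more storage.
        boolean belowThresholds() {
            long totalCapacity = 0, totalFree = 0;
            for (long[] r : reports.values()) { totalCapacity += r[0]; totalFree += r[1]; }
            if (totalCapacity == 0) return true;
            return totalCapacity < MIN_TOTAL_CAPACITY
                    || (double) totalFree / totalCapacity < MIN_FREE_RATIO;
        }
    }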
In the initial implementation of YASS, all management was coded in Java;
whereas in the policy-based implementation, a part of management was expressed
in a policy language (XACML or SPL).
                          XACML                      SPL                Java
                     MAX    MIN    AVG        MAX    MIN    AVG         AVG
Policy Load          379    168    246.8      705    368    487.4        —
First evaluation      36     11     18.9        7      3      5.7       ≈0
Second evaluation      7      1      3          7      2      5.7       ≈0

Table 8.1: Policy evaluation results (in milliseconds)
                          XACML                      SPL
                     MAX    MIN    AVG        MAX    MIN    AVG
Policy Re-Load        27     23     24.5       62     53     56.3
1st evaluation         4      1      3          8      2      5.8
2nd evaluation         5      1      3          6      2      5.3

Table 8.2: Policy reload results (in milliseconds)
We have used YASS as a use case in order to evaluate the expressiveness of the two policy languages, XACML and SPL, and the performance of policy-based management compared with a hard-coded Java implementation of management. It is worth
mentioning that a hard-coded manager, unless specially designed, does not allow
changing policies on the fly.
In the current version, for quick prototyping, we set up only one Policy-Manager,
which can be a performance bottleneck when the application scales. We have evaluated the performance of our prototype (running YASS) by measuring the average
policy evaluation times of XACML and SPL policy managers. We have compared
performance of both policy managers with the performance of the hard-coded manager explained above. The evaluation results in Table 8.1 show that the hard-coded management implementation performs better (as expected) than the policy-based management implementation. Therefore, we recommend using the policy-based management framework to implement less performance-demanding managers with policies or objectives that need to be changed on the fly. The time needed to reload the policy file by both the XACML and SPL policy managers is shown in Table 8.2. From these results we observe that the XACML management implementation is slightly faster than the SPL management implementation; on the other hand, in our experience, SPL policies were easier to write and implement than XACML policies.
Figure 8.4: XACML policy evaluation results
Scalability Evaluation using Synthetic Policies
The current version of YASS is a simple storage service and its self-management
requires a small number of management policies (policy rules) governing the whole
application. It is rather difficult to find a large number of real-life policies. To
further compare the performance and scalability of management using XACML
and SPL policy engines, we have generated dummy synthetic policies in order to
increase the size of the policy set, i.e. the number of policies to be evaluated on
a management request. In order to force policy engines to evaluate all synthetic
policies (rules), we have applied the Permit-Overrides rule combining algorithm
for XACML policies, where a permitting rule was the last in evaluation, and the
Execute_All_Applicable strategy for SPL policies.
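A measurement harness for such an experiment could be structured roughly as follows. The PolicyEngine interface and the rule strings are placeholders standing in for the XACML and SPL engines and their policy formats; only the overall generate-load-measure structure is meant to reflect our setup.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical measurement harness: generate N synthetic rules, load them, and time
    // the preprocessing and evaluation phases. PolicyEngine is a placeholder interface,
    // not the real XACML or SPL engine API.
    interface PolicyEngine {
        void load(List<String> rules);       // preprocessing/parsing phase
        String evaluate(String request);     // evaluation of a management request
    }

    class ScalabilityHarness {
        static void measure(PolicyEngine engine, int numberOfRules) {
            List<String> rules = new ArrayList<>();
            // Dummy rules that never apply, forcing the engine to scan the whole set;
            // one applicable rule is placed last (cf. Permit-Overrides / Execute_All_Applicable).
            for (int i = 0; i < numberOfRules - 1; i++) {
                rules.add("if load == " + (-i - 1) + " then do nothing");
            }
            rules.add("if freeSpace < threshold then allocate more storage");

            long t0 = System.nanoTime();
            engine.load(rules);
            long loadMs = (System.nanoTime() - t0) / 1_000_000;

            long t1 = System.nanoTime();
            engine.evaluate("freeSpace=low");
            long evalMs = (System.nanoTime() - t1) / 1_000_000;

            System.out.println(numberOfRules + " rules: load " + loadMs + " ms, eval " + evalMs + " ms");
        }
    }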
Figure 8.4 shows the XACML preprocessing time versus the number of policies in a one-layered policy. We observe that there is an almost linear correlation
between the preprocessing time of XACML and the number of rules. This result
demonstrates that the XACML-based implementation is scalable in the preprocessing phase.
Figure 8.5 shows the processing time of SPL versus the number of policies. We observe an almost exponential correlation between the processing time of SPL and the number of policies. This result indicates that the SPL-based implementation does not scale well in processing time.
8.5 Related Work
Policy Management for Autonomic Computing (PMAC) [12, 13] provides a policy language and the mechanisms needed to create and enforce policies for managed resources. PMAC is based on a centralized decision maker, the Autonomic Manager, and all policies are stored in a centralized policy repository. Ponder2 [14] is a self-contained, stand-alone policy system for autonomous pervasive environments.
Figure 8.5: SPL policy evaluation results
It eliminates some disadvantages of its predecessor Ponder. First, it supports distributed provision and decision making. Second, it does not depend on a centralized
facility, such as LDAP or CIM repositories. Third, it is able to scale to small devices
as needed in pervasive systems.
8.6 Conclusions and Future Work
This paper proposed a policy-based framework that facilitates distributed policy decision making and introduced the concept of a Policy-Manager-Group, which represents a group of policy-based managers formed to balance load among Policy-Managers.
Policy-based management has several advantages over hard-coded management.
First, it is easier to administrate and maintain (e.g. change) management policies
than to trace hard-coded management logic scattered across the codebase. Second,
the separation of policies and application logic (as well as low-level hard-coded
management) makes the implementation easier, since the policy author can focus
on modeling policies without considering the specific application implementation,
while application developers do not have to think about where and how to implement management logic, but rather have to provide hooks to make their system
manageable, i.e. to enable self-management. Third, it is easier to share and reuse
the same policy across multiple different applications and to change the policy
consistently. Finally, policy-based management allows policy authors and administrators to edit and to change policies on the fly (at runtime).
From our evaluation results, we can observe that the hard-coded management
performs better than the policy-based management, which uses a policy engine.
Therefore, it could be recommended to use policy-based management in less performance-demanding managers with policies or management objectives that need
to be changed on the fly (at runtime).
Our future work includes the implementation of Policy-Manager-Groups in the prototype. We also need to define a coordination mechanism for Policy-Manager-Groups and to find an approach to implementing the local and global conflict detectors. Finally, we need to specify how to divide the realms that the Policy-Manager-Groups govern.
Bibliography

[1] P. Horn, "Autonomic computing: IBM's perspective on the state of information technology," Oct. 15, 2001.
[2] IBM, "An architectural blueprint for autonomic computing, 4th edition." http://www-01.ibm.com/software/tivoli/autonomic/pdfs/AC_Blueprint_White_Paper_4th.pdf, June 2006.
[3] D. Agrawal, J. Giles, K. Lee, and J. Lobo, "Policy ratification," in Policies for Distributed Systems and Networks, 2005. Sixth IEEE Int. Workshop, pp. 223–232, June 2005.
[4] A. Al-Shishtawy, J. Höglund, K. Popov, N. Parlavantzas, V. Vlassov, and P. Brand, "Enabling self-management of component based distributed applications," in From Grids to Service and Pervasive Computing (T. Priol and M. Vanneschi, eds.), pp. 163–174, Springer US, July 2008.
[5] "Niche homepage." http://niche.sics.se/.
[6] A. Al-Shishtawy, J. Höglund, K. Popov, N. Parlavantzas, V. Vlassov, and P. Brand, "Distributed control loop patterns for managing distributed applications," in Second IEEE International Conference on Self-Adaptive and Self-Organizing Systems Workshops (SASOW 2008), (Venice, Italy), pp. 260–265, Oct. 2008.
[7] E. Bruneton, T. Coupaye, and J.-B. Stefani, "The Fractal component model," tech. rep., France Telecom R&D and INRIA, Feb. 5, 2004.
[8] "OASIS eXtensible Access Control Markup Language (XACML) TC." http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xacml#expository.
[9] "Sun's XACML programmers guide." http://sunxacml.sourceforge.net/guide.html.
[10] "SPL language reference." http://incubator.apache.org/imperius/docs/spl_reference.html.
[11] D. Agrawal, S. Calo, K.-W. Lee, and J. Lobo, "Issues in designing a policy language for distributed management of IT infrastructures," in Integrated Network Management, 2007. IM '07. 10th IFIP/IEEE International Symposium, pp. 30–39, June 2007.
[12] IBM, "Use policy management for autonomic computing." https://www6.software.ibm.com/developerworks/education/ac-guide/ac-guide-pdf.pdf, April 2005.
[13] D. Kaminsky, "An introduction to policy for autonomic computing." http://www.ibm.com/developerworks/autonomic/library/ac-policy.html, March 2005.
[14] K. Twidle, N. Dulay, E. Lupu, and M. Sloman, "Ponder2: A policy system for autonomous pervasive environments," in Autonomic and Autonomous Systems, 2009. ICAS '09. Fifth International Conference, pp. 330–335, April 2009.
Part III
Technical Report
Paper D
Chapter 9
Achieving Robust
Self-Management for Large-Scale
Distributed Applications
Ahmad Al-Shishtawy, Muhammad Asif Fayyaz, Konstantin Popov, and
Vladimir Vlassov
Technical Report T2010:02, Swedish Institute of Computer Science, March 2010
Achieving Robust Self-Management for
Large-Scale Distributed Applications
Ahmad Al-Shishtawy (1, 2), Muhammad Asif Fayyaz (1), Konstantin Popov (2), and Vladimir Vlassov (1)
(1) Royal Institute of Technology, Stockholm, Sweden, {ahmadas, mafayyaz, vladv}@kth.se
(2) Swedish Institute of Computer Science, Stockholm, Sweden, {ahmad, kost}@sics.se
March 4, 2010
SICS Technical Report T2010:02
ISSN: 1100-3154
Abstract
Autonomic managers are the main architectural building blocks for constructing self-management capabilities of computing systems and applications. One of the major challenges in developing self-managing applications is
robustness of the management elements that form autonomic managers. We believe that transparent handling of the effects of resource churn (joins/leaves/failures) on management should be an essential feature of a platform for self-managing large-scale dynamic distributed applications, because it facilitates
the development of robust autonomic managers and hence improves robustness of self-managing applications. This feature can be achieved by providing
a robust management element abstraction that hides churn from the programmer.
In this paper, we present a generic approach to achieve robust services
that is based on finite state machine replication with dynamic reconfiguration
of replica sets. We contribute a decentralized algorithm that maintains the
set of nodes hosting service replicas in the presence of churn. We use this
approach to implement robust management elements as robust services that
can operate despite churn. Our proposed decentralized algorithm uses
peer-to-peer replica placement schemes to automate replicated state machine
migration in order to tolerate churn. Our algorithm exploits lookup and
failure detection facilities of a structured overlay network for managing the set
of active replicas. Using the proposed approach, we can achieve a long running
and highly available service, without human intervention, in the presence of
resource churn. In order to validate and evaluate our approach, we have
implemented a prototype that includes the proposed algorithm.
9.1 Introduction
Autonomic computing [1] is an attractive paradigm to tackle management overhead
of complex applications by making them self-managing. Self-management, namely
self-configuration, self-optimization, self-healing, and self-protection, is achieved
through autonomic managers [2]. An autonomic manager continuously monitors
hardware and/or software resources and acts accordingly. Autonomic computing
is particularly attractive for large-scale and/or dynamic distributed systems where
direct human management might not be feasible.
In our previous work, we have developed a platform called Niche [3, 4] that
enables us to build self-managing large-scale distributed systems. Autonomic managers play a major role in designing self-managing systems [5]. An autonomic manager in Niche consists of a network of management elements (MEs). Each ME is responsible for one or more roles in the construction of the Autonomic Manager.
These roles are: Monitor, Analyze, Plan, and Execute (the MAPE loop [2]). In
Niche, MEs are distributed and interact with each other through events (messages)
to form control loops.
Large-scale distributed systems are typically dynamic with resources that may
fail or join/leave the system at any time (resource churn). Constructing autonomic
managers in dynamic environments with high resource churn is challenging because
MEs need to be restored with minimal disruption to the autonomic manager, whenever the resource where MEs execute leaves or fails. This challenge increases if the
MEs are stateful because the state needs to be maintained consistent.
We propose a Robust Management Element (RME) abstraction that developers
can use if they need their MEs to tolerate resource churn. The RME abstraction
allows simplifying the development of robust autonomic managers that can tolerate resource churn, and thus self-managing large-scale distributed systems. This
way developers of self-managing systems can focus on the functionality of the management without the need to deal with failures. A Robust Management Element
should: 1) be replicated to ensure fault-tolerance; 2) survive continuous resource
failures by automatically restoring failed replicas on other nodes; 3) maintain its
state consistent among replicas; 4) provide its service with minimal disruption in spite of resource joins/leaves/failures (high availability); and 5) be location transparent (i.e.
clients of the RME should be able to communicate with it regardless of its current location). Because we are targeting large-scale distributed environments with
no central control, such as peer-to-peer networks, all algorithms should operate in a
decentralized fashion.
In this paper, we present our approach to achieving RMEs that is based on finite
state machine replication with automatic reconfiguration of replica sets. We replicate MEs on a fixed set of nodes using the replicated state machine [6, 7] approach.
However, replication by itself is not enough to guarantee long running services in
the presence of continuous churn. This is because the number of failed nodes (that
host ME replicas) will increase with time. Eventually this will cause the service to
stop. Therefore, we use service migration [8] to enable the reconfiguration of the set
of nodes hosting ME replicas. Using service migration, new nodes can be introduced to replace the failed ones. We propose a decentralized algorithm, based on
Structured Overlay Networks (SONs) [9], that will use migration to automatically
reconfigure the set of nodes where the ME replicas are hosted. This will guarantee
that the service provided by the RME will tolerate continuous churn. The reconfiguration takes place by migrating MEs to new nodes when needed. The major use
of SONs in our approach is as follows: first, to maintain location information of the
replicas using replica placement schemes such as symmetric replication [10]; second,
to detect the failure of replicas and to make a decision to migrate in a decentralized
manner; and third, to allow clients to locate replicas despite churn.
The rest of this paper is organised as follows. Section 9.2 presents the necessary background required to understand the proposed algorithm. In Section 9.3, we describe our proposed decentralized algorithm to automate the migration process. Section 9.4 describes how the algorithm is applied to the Niche platform to achieve RMEs. Finally, conclusions and future work are discussed in Section 9.5.
9.2 Background
This section presents the necessary background to understand the approach and algorithm presented in this paper, namely: the Niche platform, the symmetric replication scheme, replicated state machines, and an approach to migrating stateful services.
Niche Platform
Niche [3] is a distributed component management system that implements the autonomic computing architecture [2]. Niche includes a distributed component programming model, APIs, and a run-time system including deployment service. The
main objective of Niche is to enable and achieve self-management of component-based applications deployed in dynamic distributed environments where resources
can join, leave, or fail. A self-managing application in Niche consists of functional
and management parts. Functional components communicate via interface bindings, whereas management components communicate via a publish/subscribe event
notification mechanism.
The Niche run-time environment is a network of distributed containers hosting
functional and management components. Niche uses a Chord-like [9] structured overlay network (SON) as its communication layer. The SON is self-organising and provides overlay services used by Niche, such as name-based communication, a distributed hash table (DHT), and a publish/subscribe mechanism for event dissemination. These services are used by Niche to provide higher-level communication abstractions such as name-based bindings to support component mobility, dynamic component groups, one-to-any and one-to-all group bindings, and event-based communication.
Structured Overlay Networks
We assume the following model of Structured Overlay Networks (SONs) and their
APIs. We believe, this model is representative, and in particular it matches the
Chord SON. In the model, SON provides the operation to locate items on the
network. For example, items can be data items for DHTs, or some compute facilities
that are hosted on individual nodes in a SON. We say that the node hosting or
providing access to an item is responsible for that item. Both items and nodes
possess unique SON identifiers that are assigned from the same name space. The
SON automatically and dynamically divides the responsibility between nodes such
that there is always a responsible node for every SON identifier. SON provides
a ’lookup’ operation that returns the address of a node responsible for a given
SON identifier. Because of churn, node responsibilities change over time and, thus,
’lookup’ can return over time different nodes for the same item. In practical SONs
the ’lookup’ operation can also occasionally return wrong (inconsistent) results.
Furthermore, the SON can notify application software running on a node when the
responsibility range of the node changes. When responsibility changes, items need
to be moved between nodes accordingly. In Chord-like SONs the identifier space is
circular, and nodes are responsible for items with identifiers in the range between
the node’s identifier and the identifier of the predecessor node. Finally, a SON with
a circular identifier space naturally provides for symmetric replication of items on
the SON - where replica IDs are placed symmetrically around the identifier space
circle.
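The assumed SON facilities can be summarized by a small interface of roughly the following shape. This is our own abstraction for illustration and not the API of any particular SON implementation.

    import java.math.BigInteger;

    // Minimal abstraction of the SON facilities assumed in this paper (illustrative only).
    interface StructuredOverlay {
        // Returns the address of the node currently responsible for the given identifier.
        // Under churn the result may change over time and may occasionally be inconsistent.
        NodeRef lookup(BigInteger id);

        // Registers a callback invoked when this node's responsibility range changes,
        // e.g. ]newPredecessor, self] in a Chord-like ring.
        void onResponsibilityChange(ResponsibilityListener listener);
    }

    interface ResponsibilityListener {
        void responsibilityChanged(BigInteger rangeStartExclusive, BigInteger rangeEndInclusive);
    }

    // Direct reference (IP address and port) to a physical node.
    record NodeRef(String host, int port) { }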
Symmetric Replication [10] is a scheme used to determine replica placement in
SONs. Given an item ID i and a replication degree f , symmetric replication can
be used to calculate the IDs of the item’s replicas. The ID of the x-th (1 ≤ x ≤ f )
replica of the item i is computed according to the following formula:
r(i, x) = (i + (x − 1)N/f) mod N        (9.1)
where N is the size of the identifier space.
The IDs of the replicas are independent of the nodes present in the system. A lookup is used to find the node responsible for hosting an ID. For the symmetry requirement to always hold, the replication factor f must divide the size of the identifier space N.
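As a concrete illustration of equation 9.1, the following sketch computes the replica IDs of an item; with i = 10, f = 4, and N = 32 it yields 10, 18, 26, and 2, the values used in the example of Figure 9.2.

    // Computes the replica IDs r(i, x) = (i + (x - 1) * N / f) mod N from equation 9.1.
    final class SymmetricReplication {
        static long[] replicaIds(long itemId, int f, long n) {
            if (n % f != 0) {
                throw new IllegalArgumentException("f must divide the identifier space size N");
            }
            long[] ids = new long[f];
            for (int x = 1; x <= f; x++) {
                ids[x - 1] = (itemId + (long) (x - 1) * (n / f)) % n;
            }
            return ids;
        }

        public static void main(String[] args) {
            for (long id : replicaIds(10, 4, 32)) {
                System.out.println(id);   // prints 10, 18, 26, 2
            }
        }
    }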
Replicated State Machines
A common way to achieve high availability of a service is to replicate it on several
nodes. Replicating stateless services is relatively simple and not considered in this
paper. A common way to replicate stateful services is to use the replicated state
machine approach [6]. In this approach several nodes (replicas) run the same service
in order for the service to survive node failures.
Using the replicated state machine approach requires the service to be deterministic. Deterministic replicas of a service undergo the same state changes and produce the same output given the same sequence of inputs (requests or commands) and the same
initial state. This means that the service should avoid sources of nondeterminism
such as using local clocks, random numbers, and multi-threading.
Given a deterministic service, replicated state machines can use the Paxos [7] algorithm to ensure that all replicas execute the same inputs in the same order. The Paxos algorithm relies on a leader election algorithm [11] that elects one of the replicas as the leader. The leader ensures the order of inputs by assigning client requests to slots. Replicas execute inputs sequentially, i.e., a replica can execute the input from slot n + 1 only if it has already executed the input from slot n.
The Paxos algorithm can tolerate replica failures and still operate correctly as
long as the number of failures is less than half of the total number of replicas. This
is because Paxos requires that there is always a quorum of live replicas in the system. The size of the quorum is (R/2) + 1, where R is the initial number of replicas in the system. In this paper, we consider only the fail-stop model (i.e., a replica
will fail only by stopping) and will not consider other models such as Byzantine
failures.
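For illustration, the quorum size and the number of tolerated failures per configuration can be computed as follows (a trivial sketch of the formula above, using integer division).

    // Quorum size for a Paxos-replicated configuration with R initial replicas: (R / 2) + 1.
    // The configuration therefore tolerates R - ((R / 2) + 1) replica failures.
    final class Quorum {
        static int quorumSize(int replicas) { return replicas / 2 + 1; }
        static int toleratedFailures(int replicas) { return replicas - quorumSize(replicas); }

        public static void main(String[] args) {
            // With 4 replicas (as in the example of Figure 9.2): quorum 3, tolerates 1 failure.
            System.out.println(quorumSize(4) + " " + toleratedFailures(4));
        }
    }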
Migrating Stateful Services
SMART [8] is a technique for changing the set of nodes where a replicated state machine runs, i.e., for migrating the service. The fixed set of nodes where a replicated state
machine runs, is called a configuration. Adding and/or removing nodes (replicas)
in a configuration will result in a new configuration.
SMART is built on the migration technique outlined in [7]. The idea is to
have the current configuration as part of the service state. The migration to a
new configuration happens by executing a special request that causes the current
configuration to change. This request is like any other request that can modify the
state when executed. The change does not happen immediately but is scheduled to
take effect after α slots. This gives the flexibility to pipeline α concurrent requests
to improve performance.
The main advantage of SMART over other migration techniques is that it allows replacing non-failed nodes. This enables SMART to rely on an automated service
(that may use imperfect failure detector) to maintain the configuration by adding
new nodes and removing suspected ones.
An important feature of SMART is the use of configuration-specific replicas.
The service migrates from conf1 to conf2 by creating a new independent set of
replicas in conf2 that run in parallel with replicas in conf1. The replicas in conf1
are kept long enough to ensure that conf2 is established. This simplifies the migration process and helps SMART overcome problems and limitations of other techniques. This approach can result in many replicas from different configurations running on the same node. To improve performance, SMART uses a shared execution module that holds the state and is shared among replicas on the same node. The execution module is responsible for modifying the state by executing assigned requests sequentially and producing output. Other than that, each configuration runs its own instance of the Paxos algorithm independently without
any sharing. This makes it, from the point of view of the replicated state machine instance, look as if the Paxos algorithm were running on a static configuration.
Conflicts between configurations are avoided by assigning a non-overlapping
range of slots [FirstSlot, LastSlot] to each configuration. The FirstSlot for conf1
is set to 1. When a configuration change request appears at slot n, the LastSlot of the current configuration is set to n + α − 1 and the FirstSlot of the next configuration to n + α.
Before a new replica in a new configuration can start working, it must acquire a state that reflects at least slot FirstSlot − 1. This can be achieved by copying the state either from a replica of the previous configuration that has executed LastSlot or from a replica of the current configuration. The replicas of the previous configuration are kept until a majority of the new configuration have initialised their state.
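The slot bookkeeping described above can be captured by a small helper. This is only a sketch of the rule that a ConfChange executed at slot n sets LastSlot to n + α − 1 and the next configuration's FirstSlot to n + α; it is not an implementation of the SMART protocol itself.

    // Sketch of SMART's slot-range bookkeeping for configuration changes.
    final class SlotRanges {
        final long firstSlot;                 // first slot owned by this configuration
        long lastSlot = Long.MAX_VALUE;       // unbounded until a ConfChange is scheduled

        SlotRanges(long firstSlot) { this.firstSlot = firstSlot; }

        // A ConfChange executed at slot n closes this configuration at n + alpha - 1
        // and returns the slot range of the successor configuration starting at n + alpha.
        SlotRanges scheduleConfChange(long confChangeSlot, int alpha) {
            this.lastSlot = confChangeSlot + alpha - 1;
            return new SlotRanges(confChangeSlot + alpha);
        }
    }

    // Example: conf1 starts at slot 1; a ConfChange at slot 10 with alpha = 3 gives
    // conf1 = [1, 12] and conf2 starting at slot 13.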
9.3 Automatic Reconfiguration of Replica Sets
In this section we present our approach and associated algorithm to achieve robust
services. Our algorithm automates the process of selecting a replica set (configuration) and the decision of migrating to a new configuration in order to tolerate
resource churn. This approach, that is, our algorithm together with the replicated state machine technique and migration support, provides a robust service that can
tolerate continuous resource churn and run for long periods of time without the need for human intervention.
Our approach was mainly designed to provide the Robust Management Element (RME) abstraction, which is used to achieve robust self-management. An example
is our platform Niche [3, 4] where this technique is applied directly and RMEs are
used to build robust autonomic managers. However, we believe that our approach
is generic enough and can be used to achieve other robust services. In particular, we
believe that our approach is suitable for structured P2P applications that require
highly available robust services.
Replicated (finite) state machines (RSM) are identified by a constant SON ID,
which we denote as RSMID in the following. RSMIDs permanently identify RSMs
regardless of node churn that causes reconfiguration of sets of replicas in RSMs.
Clients that send requests to an RSM need to know only its RSMID and replication degree. With this information, clients can calculate the identities of individual replicas according to the symmetric replication scheme and look up the nodes currently responsible for the replicas. Most of the nodes found in this way will indeed host
up-to-date RSM replicas, but not necessarily all of them, because of lookup inconsistency and node churn.
Failure-tolerant consensus algorithms like Paxos require a fixed set of known replicas, which we call a configuration in the following. Some of the replicas, though, can be temporarily unreachable or down (the crash-recovery model). The SMART protocol extends the Paxos algorithm to enable explicit reconfiguration of replica sets. Note that RSMIDs cannot be used for either of the algorithms because the lookup operation can return different sets of nodes over time. In the algorithm we contribute
for management of replica sets, individual RSM replicas are mutually identified by
their addresses, which in particular do not change under churn. Every replica in an RSM configuration knows the addresses of all other replicas in the RSM.
The RSM, its clients and the replica set management algorithm work roughly
as follows. First, a dedicated initiator chooses an RSMID, performs lookups of the nodes responsible for the individual replicas, and sends them a request to create RSM replicas. Note that the request contains the RSMID and all replica addresses (the configuration); thus, newly created replicas perceive each other as a group and can communicate with each other. The RSMID is also distributed to future RSM clients. Whenever clients need to contact an RSM, they resolve the RSMID in the same way as the initiator and multicast their requests to the obtained addresses.
Because of churn, the set of nodes responsible for individual RSM replicas
changes over time. In response, our distributed configuration management algorithm creates new replicas on nodes that become responsible for RSM replicas, and
eventually deletes unused ones. The algorithm runs on all nodes of the overlay and
uses several sources of events and information, including SON node failure notifications, SON notifications about change of responsibility, and messages from clients.
We discuss the algorithm in greater detail in the following.
Our algorithm is built on top of Structured Overlay Networks (SONs) because
of their self-organising features and resilience under churn [12]. The algorithm
exploits lookup and failure detection facilities of SONs for managing the set of
active replicas. Replica placement schemes such as symmetric replication [10] are used to maintain location information of the replicas. This information is used by the algorithm
to select replicas in the replica set and is used by the clients to determine replica
locations in order to use the service. Join, leave, and failure events are used to
make a decision to migrate in a decentralized manner. Other useful operations that can be efficiently built on top of SONs include multicast and range-cast. We
use these operations to recover from replica failures.
State Machine Architecture
The replicated state machine (RSM) consists of a set of replicas, which forms a
configuration. Migration techniques can be used to change the configuration (the
replica set). The architecture of a replica (a state machine) that supports migration
is shown in Fig. 9.1. The architecture uses the shared execution module optimization presented in [8]. This optimization is useful when the same replica participates in multiple configurations. The execution module captures the logic of the service and executes requests. The execution of a request may result in a state change, output, or both. The execution module should be a deterministic program: its outputs and states must depend only on the sequence of inputs and the initial state.
Figure 9.1: State Machine Architecture: Each machine can participate in more than one configuration. A new replica instance is assigned to each configuration. Each configuration is responsible for assigning requests to a non-overlapping range of slots. The execution module executes requests sequentially, which can change the state and/or produce output.

The execution module is also required to support
checkpointing. That is, the state can be externally saved and restored. This enables us to transfer states between replicas. The execution module executes all requests except the ConfChange request, which is handled by the state machine.
The state of a replica consists of two parts: the first part is the internal state of the execution module, which is application specific; the second part is the configuration. A configuration is represented by an array of size f, where f is the replication degree. The array holds direct references (IP and port) to the nodes that form the configuration. The reason to split the state into two parts, instead of keeping the configuration in the execution module, is to make the development of the execution module independent of the replication technique. In this way, existing legacy services can be replicated without modification, given that they satisfy the execution module constraints.
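The constraints on the execution module can be summarized by an interface of roughly the following shape. This is our own sketch; the actual interface used in the prototype may differ.

    // Sketch of the contract an execution module must satisfy: deterministic execution of
    // requests plus externally visible checkpoints, so state can be copied between replicas.
    interface ExecutionModule {
        // Must be deterministic: output and state changes depend only on the initial
        // state and the sequence of executed requests (no clocks, randomness, or threads).
        byte[] execute(byte[] request);

        // Checkpointing support used when a replica in a new configuration initializes
        // its state from a replica of the previous (or current) configuration.
        byte[] getCheckpoint();
        void setCheckpoint(byte[] state);
    }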
The remaining parts of the SM, other than the execution module, are responsible for running the replicated state machine algorithms (Paxos and Leader Election) and the migration algorithm (SMART). As described in the previous section, each configuration is assigned a separate instance of the replicated state machine algorithms. The migration algorithm is responsible for specifying the FirstSlot and LastSlot for each configuration, starting new configurations when executing ConfChange requests, and destroying old configurations after a new configuration is established.

Figure 9.2: Replica placement example (RSM ID = 10, f = 4, N = 32; replica IDs = 10, 18, 26, 2; hosting node IDs = 14, 20, 29, 7; configuration = IP(14), IP(20), IP(29), IP(7)). Replicas are selected according to the symmetric replication scheme. A replica is hosted (executed) by the node responsible for its ID. A configuration is a fixed set of direct references (IP address and port) to the nodes that hosted the replicas at the time of configuration creation. The RSM ID and replica IDs are fixed and do not change for the entire lifetime of the service. The hosting node IDs and the configuration are only fixed for a single configuration. Black circles in the figure represent physical nodes in the system.
Configurations and Replica Placement Schemes
All nodes in the system are part of a structured overlay network (SON), as shown in Fig. 9.2. The replicated state machine that represents the service is assigned a random ID, RSM ID, from the identifier space of size N. The set of nodes that will form a configuration is selected using the symmetric replication technique [10]. Given the replication factor f and the RSM ID, symmetric replication is used to calculate the replica IDs according to equation 9.1. Using the lookup() operation provided by the SON, we can obtain the IDs and direct references (IP and port) of the responsible nodes. These operations are shown in Algorithm 9.1.
Algorithm 9.1 Helper Procedures
1:  procedure GetConf(RSM ID)
2:      ids[ ] ← GetReplicaIDs(RSM ID)               ▷ Replica Item IDs
3:      for i ← 1, f do
4:          refs[i] ← Lookup(ids[i])
5:      end for
6:      return refs[ ]
7:  end procedure
8:  procedure GetReplicaIDs(RSM ID)
9:      for x ← 1, f do
10:         ids[x] ← r(RSM ID, x)                    ▷ See equation 9.1
11:     end for
12:     return ids[ ]
13: end procedure
The use of direct references, instead of lookup operations, as the configuration is important for our approach to work for two reasons. The first reason is that we cannot rely on the lookup operation because of the lookup inconsistency problem. The lookup operation, used to find the node responsible for an ID, may return incorrect references. These incorrect references have the same effect on the replication algorithm as node failures, even though the nodes might be alive. Thus, the incorrect references will reduce the fault tolerance of the replication service. The second reason is that the migration algorithm requires that both the new and the previous configurations coexist until the new configuration is established. Relying on the lookup operation for the replica IDs may not make this possible. For example, in Figure 9.2, when a node with ID = 5 joins the overlay, it becomes responsible for the replica SM_r4 with ID = 2. A correct lookup(2) will always return 5. Because of this, node 7, from the previous configuration, will never be reached using the lookup operation. This can also reduce the fault tolerance of the service and prevent migration in the case of a large number of joins.
Nodes in the system may join, leave, or fail at any time. According to the Paxos
requirements, a configuration can survive the failure of less than half of the nodes
in the configuration. In other words, f /2 + 1 nodes must be alive for the algorithm to work. This must hold independently for each configuration. After a new
configuration is established, it is safe to destroy instances of older configurations.
Due to churn, the node responsible for a certain SM may change. For example, in Fig. 9.2, if node 20 fails, then node 22 becomes responsible for identifier 18 and should host SM_r2. Our algorithm, described in the remainder of this section, automates the migration process by triggering ConfChange requests when churn
changes responsibilities. This will guarantee that the service provided by the RSM
will tolerate churn.
Algorithm 9.2 Replicated State Machine API
1:  procedure CreateRSM(RSM ID)                      ▷ Creates a new replicated state machine
2:      Conf[ ] ← GetConf(RSM ID)                    ▷ Hosting Node REFs
3:      for i ← 1, f do
4:          sendto Conf[i] : InitSM(RSM ID, i, Conf)
5:      end for
6:  end procedure
7:  procedure JoinRSM(RSM ID, rank)
8:      SubmitReq(RSM ID, ConfChange(rank, MyRef))   ▷ The new configuration will be submitted and assigned a slot to be executed
9:  end procedure
10: procedure SubmitReq(RSM ID, req)                 ▷ Used by clients to submit requests
11:     Conf[ ] ← GetConf(RSM ID)                    ▷ Conf is from the view of the requesting node
12:     for i ← 1, f do
13:         sendto Conf[i] : Submit(RSM ID, i, Req)
14:     end for
15: end procedure
Replicated State Machine Maintenance
This section will describe the algorithms used to create a replicated state machine
and to automate the migration process in order to survive resource churn.
State Machine Creation
A new RSM can be created by any node in the SON by calling CreateRSM shown
in Algorithm 9.2. The creating node constructs the configuration using symmetric replication and lookup operations. The node then sends an InitSM message to all nodes in the configuration. Any node that receives an InitSM message (Algorithm 9.5) will start a state machine (SM) regardless of its responsibility. Note
that the initial configuration, due to lookup inconsistency, may contain some incorrect nodes. This does not cause problems for the replication algorithm. Using
migration, the configuration will eventually be corrected.
Client Interactions
A client can be any node in the system that requires the service provided by the
RSM. The client only needs to know the RSM ID to be able to send requests to the service. Knowing the RSM ID, the client can calculate the current configuration using equation 9.1 and lookup operations (see Algorithm 9.1). This way we
avoid the need for an external configuration repository that points to nodes hosting
the replicas in the current configuration. The client submits requests by calling
SubmitReq as shown in Algorithm 9.2. The method simply sends the request to all
replicas in the current configuration. Due to lookup inconsistency, which can happen either at the client side or at the RSM side, the client's view of the configuration and
the actual configuration may differ. We assume that the client’s view overlaps, at
least at one node, with the actual configuration for the client to be able to submit
requests. Otherwise, the request will fail and the client need to try again later after
the system heals itself. We also assume that each request is uniquely stamped and
that duplicate requests are filtered.
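One simple way to realize the assumption of uniquely stamped requests with duplicate filtering is sketched below; the classes are illustrative and not part of the algorithms described in this paper.

    import java.util.HashSet;
    import java.util.Set;
    import java.util.UUID;

    // Illustrative sketch of the request-stamping assumption: clients tag each request with
    // a globally unique ID, and replicas drop any ID they have already scheduled.
    final class StampedRequest {
        final UUID requestId = UUID.randomUUID();
        final byte[] payload;
        StampedRequest(byte[] payload) { this.payload = payload; }
    }

    final class DuplicateFilter {
        private final Set<UUID> seen = new HashSet<>();

        // Returns true the first time a request ID is observed, false for duplicates
        // (e.g. the same request submitted to several replicas and forwarded to the leader).
        boolean firstTimeSeen(StampedRequest req) {
            return seen.add(req.requestId);
        }
    }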
In the current algorithm the client submits the request to all nodes in the
configuration for efficiency. It is possible to optimise the number of messages by
submitting the request only to one node in the configuration that will forward
it to the current leader. The trade-off is that sending to all nodes increases the probability of the request reaching the RSM. This reduces the negative effects of
lookup inconsistencies and churn on the availability of the service.
It may happen, due to lookup inconsistency, that the configuration calculated by
the client contains some incorrect references. In this case, an incorrectly referenced
node ignores client requests (Algorithm 9.3 line 13) when it finds out that it is not
responsible for the target RSM.
On the other hand, it is possible that the configuration was created with some
incorrect references. In this case, the node that discovers that it was supposed to
be in the configuration will attempt to correct the configuration by replacing the
incorrect reference with the reference to itself (Algorithm 9.3 line 11).
Request Execution
The execution is initiated by receiving a submit request from a client. This will
result in scheduling the request for execution by assigning it to a slot that is agreed
upon among all SMs in the configuration (using the Paxos algorithm). Meanwhile,
scheduled requests are executed sequentially in the order of their slot numbers.
These steps are shown in Algorithm 9.3.
When a node receives a request from a client, it will first check if it is hosting the SM to which the request is directed. If this is the case, then the node will try to
schedule the request if the node believes that it is the leader. Otherwise the node
will forward the request to the leader. On the other hand, if the node is not hosting
an SM with the RSMID in the request, it will proceed as described in Section 9.3, i.e., it ignores the request if it is not responsible for the target SM; otherwise, it tries to correct the configuration.
At execution time, the execution module executes all requests sequentially, except the ConfChange request, which is handled by the SM. The ConfChange request
will start the migration protocol presented in [8] and outlined in Section 9.2.
Algorithm 9.3 Execution
1:  receipt of Submit(RSM ID, rank, Req) from m at n
2:      SM ← SMs[RSM ID][rank]
3:      if SM ≠ φ then
4:          if SM.leader = n then
5:              SM.submit(Req)
6:          else
7:              sendto SM.leader : Submit(RSM ID, rank, Req)   ▷ Forward the request to the leader
8:          end if
9:      else
10:         if r(RSM ID, rank) ∈ ]n.predecessor, n] then       ▷ I'm responsible
11:             JoinRSM(RSM ID, rank)
12:         else
13:             DoNothing                                      ▷ This is probably due to lookup inconsistency
14:         end if
15:     end if
16: end receipt
17: procedure ExecuteSlot(req)                                 ▷ Called when executing the current slot
18:     if req.type = ConfChange then
19:         newConf ← Conf[CurrentConf]
20:         newConf[req.rank] ← req.id                         ▷ Replaces the previous responsible node with the new one
21:         SM.migrate(newConf)                                ▷ SMART will set LastSlot and start the new configuration
22:     else
23:         ExecutionModule.Execute(req)                       ▷ The execution module handles all other requests
24:     end if
25: end procedure
Handling Churn
Algorithm 9.4 shows how to maintain the replicated state machine in the case of node join/leave/failure. When any of these cases happens, a new node may become responsible for hosting a replica. In the case of a node join, the new node will send a message to its successor to get any replicas that it is now responsible for. In the case of a leave, the leaving node will send a message to its successor containing all
Algorithm 9.4 Churn Handling
1:  procedure NodeJoin                                 ▷ Called by SON after the node joined the overlay
2:      sendto successor : PullSMs(]predecessor, myId])
3:  end procedure
4:  procedure NodeLeave                                ▷ Transfer all hosted SMs to the successor
5:      sendto successor : NewSMs(SMs)
6:  end procedure
7:  procedure NodeFailure(newPred, oldPred)            ▷ Called by SON when the predecessor fails
8:      I ← ⋃_{x=2}^{f} ]r(newPred, x), r(oldPred, x)]
9:      multicast I : PullSMs(I)
10: end procedure
replicas that it was hosting. In the case of a failure, the successor of the failed node needs to discover whether the failed node was hosting any SMs. This is done by checking all intervals (line 8 of Algorithm 9.4) that are symmetric to the interval that the failed node was responsible for. One way to achieve this is by using an interval-cast, which can be efficiently implemented on SONs, e.g., using bulk operations [10].
All newly discovered replicas are handled by NewSMs (Algorithm 9.5). The node
will request a configuration change by joining the corresponding RSM for each new
SM. Note that the configuration size is fixed to f. A configuration change means replacing the reference at position r in the configuration array with the reference of the
node requesting the change.
9.4 Robust Management Elements in Niche
In order to validate and evaluate the proposed approach to achieve robust services,
we have implemented our proposed algorithm together with the replicated state
machine technique and migration support using the Kompics component framework [13]. We intend to integrate the implemented prototype with the Niche
platform and use it for building robust management elements for self-managing
distributed applications. We have conducted a number of tests to validate the algorithm. We are currently conducting simulation tests, using Kompics simulation
facilities, to evaluate the performance of our approach.
The autonomic manager in Niche is constructed from a set of management
elements. To achieve robustness and high availability of Autonomic Managers,
in spite of churn, we will apply the algorithm described in the previous section
to management elements. Replicating management elements and automatically
Algorithm 9.5 SM maintenance (handled by the container)
1:  receipt of InitSM(RSM ID, Rank, Conf) from m at n
2:      new SM                                         ▷ Creates a new replica of the state machine
3:      SM.ID ← RSM ID
4:      SM.Rank ← Rank                                 ▷ 1 ≤ Rank ≤ f
5:      SMs[RSM ID][Rank] ← SM                         ▷ SMs stores all SMs that node n is hosting
6:      SM.Start(Conf)                                 ▷ This will start the SMART protocol
7:  end receipt
8:  receipt of PullSMs(Intervals) from m at n
9:      for each SM in SMs do
10:         if r(SM.id, SM.rank) ∈ I then
11:             newSMs.add(SM)
12:         end if
13:     end for
14:     sendto m : NewSMs(newSMs)                      ▷ SMs are destroyed later by the migration protocol
15: end receipt
16: receipt of NewSMs(NewSMs) from m at n
17:     for each SM in NewSMs do
18:         JoinRSM(SM.id, SM.rank)
19:     end for
20: end receipt
maintaining them will result in what we call Robust Management Element (RME).
An RME will:
• be replicated to ensure fault-tolerance. This is achieved by the replicated
state machine algorithm.
• survive continuous resource failures by automatically restoring failed replicas
on other nodes. This is achieved using our proposed algorithm that will
automatically migrate the RME replicas to new nodes when needed.
• maintain its state consistent among replicas. This is guaranteed by the replicated state machine algorithm and the migration mechanism used.
• provide its service with minimal disruption in spite of resource join/leave/fail
(high availability). This is due to replication. In the case of churn, the remaining
replicas can still provide the service.
• be location transparent (i.e. clients of the RME should be able to communicate with it regardless of its current location). Clients only need to know
the RME_ID to be able to use an RME regardless of the location of individual
replicas.
The RMEs are implemented by wrapping ordinary MEs inside a state machine.
The ME will serve as the execution module shown in Figure 9.1. However, to be
able to use this approach, the ME must follow the same constraints as the execution
module. That is, the ME must be deterministic and provide checkpointing.
Typically, in the replicated state machine approach, a client sends a request that is executed by the replicated state machine and gets a result back. In our case, to implement feedback loops, we have two kinds of clients from the point of view of an RME: a set of sending clients Cs that submit requests to the RME, and a set of receiving clients Cr that receive results from the RME. Cs includes sensors and/or other (R)MEs, and Cr includes actuators and/or other (R)MEs.
To simplify the creation of control loops that are formed from RMEs, we use a publish/subscribe mechanism. The publish/subscribe system delivers requests and responses in order to link the different stages into a control loop.
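A rough sketch of how a publish/subscribe layer could link sensors, an RME, and actuators into such a loop is shown below. The topic names and the in-process event bus are invented for the example, whereas the real system uses the distributed publish/subscribe mechanism of the overlay and a replicated RME.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.function.Consumer;

    // Illustrative sketch: a tiny in-process publish/subscribe bus linking loop stages.
    final class EventBus {
        private final Map<String, List<Consumer<Object>>> subscribers = new HashMap<>();

        void subscribe(String topic, Consumer<Object> handler) {
            subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
        }

        void publish(String topic, Object event) {
            subscribers.getOrDefault(topic, List.of()).forEach(h -> h.accept(event));
        }
    }

    final class ControlLoopWiring {
        public static void main(String[] args) {
            EventBus bus = new EventBus();

            // The RME subscribes to sensor events (sending clients Cs) and publishes
            // its results for the actuators (receiving clients Cr).
            bus.subscribe("sensor-events", event ->
                    bus.publish("actuation-commands", "react-to:" + event));

            // An actuator subscribes to the RME's output.
            bus.subscribe("actuation-commands", cmd -> System.out.println("actuator executes " + cmd));

            // A sensor publishes a measurement, driving one iteration of the loop.
            bus.publish("sensor-events", "load=0.93");
        }
    }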
9.5 Conclusions and Future Work
In this paper, we presented an approach to achieve robust management elements
that will simplify the construction of autonomic managers. The approach uses replicated state machines and relies on our proposed algorithm to automate replicated
state machine migration in order to tolerate churn. The algorithm uses symmetric
replication, which is a replication scheme used in structured overlay networks, to
decide on the placement of replicas and to detect when to migrate. Although in
this paper we discussed the use of our approach to achieve robust management
elements, we believe that this approach might be used to replicate other services in
structured overlay networks in general.
In order to validate and evaluate our approach, we have implemented a prototype that includes the proposed algorithms. We are currently conducting simulation tests to evaluate the performance of our approach. In our future work, we will
integrate the implemented prototype with the Niche platform to support robust
management elements in self-managing distributed applications. We also intend to
optimise the algorithm in order to reduce the number of messages, and we will investigate the effect of the publish/subscribe system used to construct control loops
and try to optimise it. Finally, we will try to apply our approach to other problems
in the field of distributed computing.
Bibliography

[1] P. Horn, "Autonomic computing: IBM's perspective on the state of information technology," Oct. 15, 2001.
[2] IBM, "An architectural blueprint for autonomic computing, 4th edition." http://www-01.ibm.com/software/tivoli/autonomic/pdfs/AC_Blueprint_White_Paper_4th.pdf, June 2006.
[3] A. Al-Shishtawy, J. Höglund, K. Popov, N. Parlavantzas, V. Vlassov, and P. Brand, "Enabling self-management of component based distributed applications," in From Grids to Service and Pervasive Computing (T. Priol and M. Vanneschi, eds.), pp. 163–174, Springer US, July 2008.
[4] "Niche homepage." http://niche.sics.se/.
[5] A. Al-Shishtawy, V. Vlassov, P. Brand, and S. Haridi, "A design methodology for self-management in distributed environments," in Computational Science and Engineering, 2009. CSE '09. IEEE International Conference on, vol. 1, (Vancouver, BC, Canada), pp. 430–436, IEEE Computer Society, August 2009.
[6] F. B. Schneider, "Implementing fault-tolerant services using the state machine approach: a tutorial," ACM Comput. Surv., vol. 22, no. 4, pp. 299–319, 1990.
[7] L. Lamport, "Paxos made simple," SIGACT News, vol. 32, pp. 51–58, December 2001.
[8] J. R. Lorch, A. Adya, W. J. Bolosky, R. Chaiken, J. R. Douceur, and J. Howell, "The SMART way to migrate replicated stateful services," SIGOPS Oper. Syst. Rev., vol. 40, no. 4, pp. 103–115, 2006.
[9] E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim, "A survey and comparison of peer-to-peer overlay network schemes," Communications Surveys & Tutorials, IEEE, vol. 7, pp. 72–93, Second Quarter 2005.
[10] A. Ghodsi, Distributed k-ary System: Algorithms for Distributed Hash Tables. PhD thesis, Royal Institute of Technology (KTH), 2006.
[11] D. Malkhi, F. Oprea, and L. Zhou, "Omega meets Paxos: Leader election and stability without eventual timely links," in Proc. of the 19th Int. Symp. on Distributed Computing (DISC'05), pp. 199–213, Springer-Verlag, July 2005.
[12] J. S. Kong, J. S. Bridgewater, and V. P. Roychowdhury, "Resilience of structured P2P systems under churn: The reachable component method," Computer Communications, vol. 31, pp. 2109–2123, June 2008.
[13] C. Arad, J. Dowling, and S. Haridi, "Building and evaluating P2P systems using the Kompics component framework," in Peer-to-Peer Computing, 2009. P2P '09. IEEE Ninth International Conference on, pp. 93–94, Sept. 2009.