Proceedings of the International
ERCIM Workshop on Software Evolution
2006
Edited by Laurence Duchien, Maja D'Hondt and Tom Mens
LIFL - INRIA
Université des Sciences et Technologies de Lille
France
April 6 and 7, 2006
Proceedings of the International
ERCIM Workshop on Software Evolution
April 6 and 7, 2006
LIFL - INRIA
Université des Sciences et Technologies de Lille (USTL)
France
Laurence Duchien
LIFL-INRIA, USTL, France (local organiser)
Maja D'Hondt
LIFL-INRIA, USTL, France (ERCIM Fellow)
Tom Mens
Software Engineering Lab, University of Mons-Hainaut, Belgium
(coordinator of the ERCIM Working Group on Software Evolution)
Table of Contents
Sponsors
p. 1
Invited Talk: The Software Evolution Paradox: An Aspect Mining Perspective
by Arie Van Deursen
p. 3
Architecture-Centric Configuration Management for Product Line Evolution
by Michalis Anastasopoulos
p. 5
Formal Model Merging Applied to Class Diagram Integration
by Artur Boronat, José Á. Carsí, Isidro Ramos, Patricio Letelier
p. 11
Inducing Evolution-Robust Pointcuts
by Mathieu Braem, Kris Gybels, Andy Kellens and Wim Vanderperren
p. 17
Empirical Analysis of the Evolution of an Open Source System
by Andrea Capiluppi, Sarah Beecham and Juan Fernández-Ramil
p. 23
Evolvability as a Quality Attribute of Software Architectures
by Selim Ciraci and Pim van den Broek
p. 29
Software Evolution from the Field: An Experience Report from the Squeak Maintainers
by Marcus Denker and Stéphane Ducasse
p. 33
Aspect-Orientation for Revitalising Legacy Business Software
by Kris De Schutter and Bram Adams
p. 43
Effort Assessment and Predictions for High Volume Web-based Applications:
a Pragmatic Approach
by Sanjeev Dhawan and Rakesh Kumar
p. 57
A Language for Defining Traceability Models for Concerns
by Dolores Diaz, Lionel Seinturier, Laurence Duchien and Pascal Flament
p. 67
User-Centric Dynamic Evolution
by Peter Ebraert, Theo D’Hondt, Yves Vandewoude and Yolande Berbers
p. 75
On the use of Measurement in Software Restructuring
by Naji Habra and Miguel Lopez
p. 81
Responsibility-Steering Automation of Software Evolution
by Ming-Jen Huang and Takuya Katayama
p. 89
Generic Programming for Software Evolution
by Johan Jeuring and Rinus Plasmeijer
p. 97
Architecture for Attribute-Driven Evolution of Nomadic Media
by Lech Krzanik
p. 105
A Tool for Exploring Software Systems Merge Alternatives
by Rikard Land and Miroslav Lakotic
p. 113
Degradation Archaeology: Studying Software Flaws’ Evolution
by Angela Lozano, Michel Wermelinger and Bashar Nuseibeh
p. 119
Dependency Analysis of Model Inconsistency Resolutions
by Tom Mens, Ragnhild Van Der Straeten and Maja D’Hondt
p. 127
SAEV: A Model to Face Evolution Problem in Software Architecture
by Mourad Oussalah, Nassima Sadou and Dalila Tamzalit
p. 137
SmPL: A Domain-Specific Language for Specifying Collateral Evolutions
in Linux Device Drivers
by Yoann Padioleau, Julia L. Lawall and Gilles Muller
p. 147
Versioning Persistence For Objects
by Frédéric Pluquet and Roel Wuyts
p. 155
Change-based Software Evolution
by Romain Robbes and Michele Lanza
p. 159
Using Microcomponents and Design Patterns to Build Evolutionary Transaction Services
by Romain Rouvoy and Philippe Merle
p. 165
Comparative Semantics of Feature Diagrams
by Pierre-Yves Schobbens, Patrick Heymans, Jean-Christophe Trigaux and
Yves Bontemps
p. 181
Preliminary Results from an Investigation of Software Evolution in Industry
by Odd Petter N. Slyngstad, Anita Gupta, Reidar Conradi, Parastoo Mohagheghi,
Thea C. Steen and Mari T. Haug
p. 187
Semantically Sane Component Preemption
by Yves Vandewoude and Yolande Berbers
p. 195
Sponsors
European Research Consortium for Informatics and Mathematics
(ERCIM)
http://www.ercim.org/
Institut de Recherche sur les Composants logiciels et matériels
pour l'Information et la Communication Avancée (IRCICA)
www.ircica.univ-lille1.fr/
Institut National de Recherche en Informatique et Automatique
(INRIA)
http://www-futurs.inria.fr/
Université des Sciences et Technologies de Lille (USTL)
http://ustl1.univ-lille1.fr/
Invited Talk
The Software Evolution Paradox:
An Aspect Mining Perspective
by Arie Van Deursen
Delft University of Technology - Centrum voor Wiskunde en Informatica (CWI)
The Netherlands
http://homepages.cwi.nl/~arie/
Abstract
As software evolution researchers, we are well aware of two facts,
formulated even as laws back in 1976 by Belady and Lehman: First,
successful software systems are subjected to continuous change - in fact,
the competitive advantage of a software system is more and more
determined by its flexibility to undergo required changes. Second, each
change erodes the structure of the design, making the software harder and
harder to change. But how can we work in a world where both laws are
true? How can we ever strike a balance between the need to keep our
system evolvable, while at the same time being forced to modify that system
under tight time-to-market constraints? In this presentation we first of all
explore the practical as well as the research implications of this "software
evolution paradox". We then use the findings to reflect on our ongoing
software evolution research, in particular in the area of aspect mining. We
will present our approach based on fan-in analysis as well as the FINT
Eclipse plugin supporting it. We apply fan-in analysis to a number of open
source case studies, and critically analyze which of the concerns identified
are suitable for an aspect-oriented refactoring. We use these results in
order to reflect on the relevance of aspects and aspect mining for making
life under the software evolution paradox a little easier.
Architecture-centric Configuration Management for
Product Line Evolution
Michalis Anastasopoulos
Fraunhofer Institute for Experimental Software Engineering (IESE)
Sauerwiesen 6, 67661 Kaiserslautern, Germany
[email protected]
1 Introduction
To remain competitive, today's software organizations are obliged to satisfy the needs of various customers in
various situations. To this end it must be possible to efficiently derive customized variants of the software
products in the organization's portfolio. Software Product Lines (SPL) is a promising technology in this regard.
Instead of looking at one single product that must be customized according to the given requirements,
software product lines propose the explicit definition of a family of software products. This family can be
thought of as the set of related products an organization has developed in the past or intends to develop in
the future. The members of the family all belong to the same application domain and therefore share many
commonalities but also have several variabilities. Organizations taking an SPL approach take explicit
advantage of these product relations and hence are better prepared for the production of customized
solutions. The key to the success of a Software Product Line is the establishment of a reuse infrastructure,
which creates and manages generic software artifacts. The latter are highly flexible and can be easily
specialized or customized for the needs of a customer-specific product.
The creation of an SPL infrastructure, which – similarly to industrial production lines – will provide all
necessary assets and enable the efficient product creation, is a complex task. This complexity rises rapidly
as the product portfolio and variabilities grow. In other words managing the evolution of an SPL infrastructure
is hard. A major technology for dealing with evolution in software systems is without doubt Software
Configuration Management. Yet, current Configuration Management Systems (CMS) do not meet the special
requirements of Software Product Lines and therefore cannot be successful without combination with other
technologies in this context. This paper proposes a solution, which consists of an additional layer on top of
Configuration Management Systems that on the one side takes full advantage of the technical maturity of
today's CMS and on the other side matches the evolution needs of Software Product Lines.
The proposed solution contributes to the PuLSE™ method [PULSE] developed at Fraunhofer IESE. PuLSE™
is a customizable approach to Product Line Engineering. The components and processes of PuLSE™ are
depicted in the following figure:
Figure 1: Structure of the PuLSE™ approach, grouping the deployment phases (PuLSE Initialization, Product Line Infrastructure Construction, Product Line Infrastructure Usage and Product Line Infrastructure Evolution), the technical components (Customizing, Scoping, Modeling, Architecting, Designing, Coding, Testing and Inspection, Evolving and Managing, Instantiating) and the support components (Project Entry Points, Maturity Scale, Organizational Issues).
The rest of the paper is structured as follows: section 2 discusses the details of the product line evolution
problem, section 3 describes the proposed solution, section 4 elaborates the envisioned benefits, section 5
discusses related work and section 6 concludes the paper.
2 Product Line Evolution
2.1 Evolution Dimensions
Evolution in single software systems is handled in one dimension, that is time. Software product lines add the
space dimension, which arises from the multiple specializations of generic software artifacts. For single
systems CMS provide all facilities necessary for storing and retrieving the different versions that arise in the
life cycle of software development artifacts. CMS also support the concept of variants, which at first sight
appears to fit well in the context of software product lines.
Indeed variants of a main artifact are defined as functionally equivalent incarnations thereof, which have been
developed for different hardware and/or software platforms [LEON05]. In practice variants can also exhibit
functional differences. In general the creation of a variant denotes evolution in space, because the space of
available instances of a software artifact increases. However current CMS do not take this fact sufficiently
into account. CMS focus on the technical support necessary for creating and synchronizing parallel
development branches (i.e. streams). Yet in this way the focus remains on the time dimension. To adequately
support the space dimension as well, the following facts must be considered:
1 Variants belong to the same artifact-specific set: The variants of a given artifact are related to the artifact
and to each other. For the sake of successful software reuse changes to one variant must be easy to
propagate to the other variants as well. Current CMS address change propagation through their
differencing and merging facilities. Yet it is not clear when the usage of these facilities is necessary. In
other words it is not clear whether the change in a variant really affects other variants.
Figure 2: Change Propagation problem. A main artifact (Version 1) has two variants branched off; a change made in Variant 1 (Versions 1.1.1, 1.1.2) is re-integrated, and the open question is whether it must also be propagated to Variant 2 (Version 1.2.1).
2 Variation reflects specialization: In the context of Software Product Lines genericity plays a very important
role. Product Line artifacts are made generic, that is, easier to reuse in various situations. CMS do not
sufficiently support genericity. In other words, if a variant is defined as a specialization of a generic artifact
it is hard to integrate changes from the variant back to the generic artifact. The following figure illustrates
the problem.
Figure 3: Genericity problem. A generic artifact (Version 1), e.g. one containing conditional compilation statements (ifdefs), is specialized into a variant by setting the conditionals and performing pre-processing; after the variant evolves (Versions 1.1.1, 1.1.2), re-integration into the generic artifact is difficult.
The Production Line approach discussed in [KRU02] suggests seeing variants as transient work products
and thereby avoiding changes directly on them, precisely because of this re-integration complexity. Instead the
approach proposes to perform the changes directly on the generic artifacts and then to eventually
reproduce all variants. Even in this case it is worthwhile to support the easy localization of the
modifications that become necessary in the generic artifacts due to change requests arising from variants.
3 A variant is identified through its relation to the main artifact: A variant is characterized by its differences
from the main artifact. Current CMS consider these differences, which in the case of a simple CMS (e.g.
CVS) must be expressed as tags. More sophisticated CMS (e.g. ClearCase) provide for better
expressiveness by allowing the usage of attribute/value pairs or even of special configuration languages.
However in all cases the connections between artifacts and variants are not seen as first-class entities. In
the long run this situation leads to unclear relations between main artifacts and variants and therefore
complicates the re-integration of variant changes. The situation is not improved by the fact that the relation
between different artifact or variant versions is, in most cases, also not captured properly (e.g. it is captured
as simple text comments). The following picture depicts the problem.
Figure 4: Variant and Version relation problem. A main artifact, its Variant 1 and the variant's versions (1.1.1, 1.1.2) each carry attributes, but the relations between them are not captured as first-class entities.
4 Variants are not only physical: Variations do not apply only to physical configuration items like files or
directories but also to logical entities like the elements of a software architecture. Some CMS also enable
the management of these entities. This feature is also known as project-oriented branching [APP98].
Baselines are typically used in this case for storing and retrieving versions of complete project
configurations. However such a configuration contains only concrete artifact versions. It cannot contain
the set of all variants for an artifact. It thus becomes difficult to define generic composite artifacts. The
latter, however, are necessary in the context of software product lines, where engineers want to be able
to select generic composite artifacts and easily identify the variability contained therein.
Figure 5: Generic composition problem. The figure contrasts what is possible, a main composite artifact (containing, e.g., Artifact A version 1, Artifact B version 1, Artifact C version 1.1.1) and a variant composite artifact branched off from it (containing Artifact A version 1 and Artifact X), with what is difficult, a generic composite artifact that would contain the main artifacts together with all their variants (e.g. Artifact A with Variants 1 and 2, and Artifact B).
2.2 Effort Characterization
When using variants for managing the variation in a product line, it is expected that the effort will grow very
rapidly as the number of branches increases. In fact the effort for integrating the changes from one branch
(main or variant) to other branches can be characterized by the following equations:
effort(one branch) = number of changes ⋅ average change processing time
effort(n branches) = effort(one branch) ⋅ number of branches
In the first equation number of changes denotes the amount of changes that must be considered and
eventually integrated, average change processing time denotes the average time needed for judging and
integrating a change and finally number of branches denotes the number of branches, which are to adopt the
changes. In the simple change propagation scenario depicted in figure 2 the overall effort can be expressed
as follows.
effort(overall integration) = effort(one branch) + effort(n branches)
At this point it must be noted that the effort for the propagation from the main branch to the other variants
may on average be less than the effort from the originating variant to the main branch (i.e. some changes
from the variant could be declined). The above equation does not consider this possibility. The following
figure shows the effort increase rate graphically:
Figure 6: Effort for simple change propagation (#changes and change processing remaining constant). The figure plots effort against the number of branches, with one curve for the integration variant → main and one for the integration main → other variants.
The above picture illustrates an optimistic scenario, in which the average change processing time and the
number of changes remain constant. If however we assume that these values will also increase (at least
linearly) it can be proved that the effort will grow exponentially as depicted in the following picture:
Figure 7: Effort for simple change propagation (#changes and change processing effort rising). The figure again plots effort against the number of branches.
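To make the growth behaviour concrete, the following small sketch evaluates the equations above for invented numbers (ten changes at half an hour each; none of these values come from the paper). With a constant number of changes and constant processing time the overall integration effort already grows linearly with the number of branches; letting those quantities grow as well produces the steeper curves of Figure 7.

// Hypothetical illustration of the effort model of section 2.2.
// All parameter values are invented for the example.
public class BranchEffort {

    // effort(one branch) = number of changes * average change processing time
    static double oneBranch(int changes, double avgProcessingTime) {
        return changes * avgProcessingTime;
    }

    // effort(n branches) = effort(one branch) * number of branches
    static double nBranches(double effortOneBranch, int branches) {
        return effortOneBranch * branches;
    }

    public static void main(String[] args) {
        int changes = 10;        // changes arising in the originating variant (assumed)
        double avgTime = 0.5;    // hours per change (assumed)
        for (int branches = 1; branches <= 8; branches++) {
            double variantToMain = oneBranch(changes, avgTime);
            double mainToOthers = nBranches(variantToMain, branches);
            // overall integration = variant -> main  +  main -> other variants
            System.out.printf("%d branches: %.1f hours%n",
                    branches, variantToMain + mainToOthers);
        }
    }
}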
3 Proposed Solution
In section 2 it has been shown that:
a) current CMS do not provide the specialized features needed for product line engineering and
b) the plain usage of standard CMS features (i.e. branching) does not scale well
However for solving these problems it is not wise to relinquish the usage of a CMS. Configuration
Management is a mature and established technology for controlling software evolution and the artifacts
produced during product line engineering must without doubt be stored in the controlled repository of a CMS.
The question is therefore how to instrument a CMS so that it fits the needs of product line engineering.
The first step towards the solution is the provision of the missing features described in section 2.1. To this
end we propose the definition of a Customization view. The model of this view is depicted in the following
UML class diagram:
Figure 8: Customization view model. The UML class diagram contains the classes Product Line Asset, Core Artifact, Variant and Connector, connected by the associations "contains asset", "contains core", "contains instance", "contains connector" and "refers to connector".
The following table defines the elements of the view model and provides typical usage scenarios.
View Element: Product Line Asset
Definition: A composite containing a software artifact together with its different variants.
Typical End-User Scenario: For the creation of a custom solution the engineer checks out a product line asset and identifies the available variants contained therein.

View Element: Core Artifact
Definition: A generic software artifact.
Typical End-User Scenario: After examination of the available variants within a Product Line Asset the engineer may define a new variant by checking out the Core Artifact and altering it for the needs of a specific product.

View Element: Variant
Definition: An instance of a generic artifact.
Typical End-User Scenario: For the creation of a custom solution the engineer checks out and reuses an already defined variant. Change requests on the variant are correctly associated to the Core Artifact and thereby to the other Variants.

View Element: Connector
Definition: A characterization of the relation between a Core Artifact and a Variant.
Typical End-User Scenario: For processing a change request arising from a variant the engineer analyzes the design decisions that led to this variant and can thus identify the change impact for the Core Artifact and the other variants.
The Customization view is envisioned as the main instrument for managing the evolution of a Software
Product Line. The system that will host this view and provide the according user interface is called
Customization Layer (CL) and is placed on top of an existing CMS layer. The idea is to encapsulate the
technicalities of the underlying CMS and therefore to focus on the SPL-specific issues. To this end the CL
must also:
a) provide a set of services available for the elements of the Customization view.
b) define the roles that will take advantage of the provided services
The above must reflect common processes in product line engineering. The scenarios of the above table
illustrate some of these processes. Other processes include the instantiation of a Product Line Asset as one
of its variants, the creation of Product Line Asset variants, the checking-in of a new Variant version etc. The
roles that must be provided can be grouped in two major categories: Framework and Application Engineer.
The former is responsible for core artifacts and the latter for product-specific artifacts.
The Customization view can be applied to any configuration item (i.e. any software artifact that can be stored
in the CMS). However our proposed solution is architecture-centric because it focuses only on architectural
elements. The reason lies in the fact that evolution can be addressed in a promising way if the Software
Architecture or the Product Line Architecture in particular plays the role of the central change manager in a
software development project. The Connectors between Core Artifacts and Variants consist of design
decisions in this case.
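As an illustration only, the following sketch models the four elements of the Customization view as plain Java classes; all names are invented here, and the paper does not prescribe any particular implementation of the Customization Layer. The Connector records the design decisions that relate a Variant to its Core Artifact, which is what later allows change impact to be traced.

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Customization view model (Figure 8).
class CoreArtifact {
    String name;                      // the generic, reusable artifact
    CoreArtifact(String name) { this.name = name; }
}

class Variant {
    String name;                      // a product-specific instance of the core artifact
    Variant(String name) { this.name = name; }
}

class Connector {
    CoreArtifact core;
    Variant variant;
    List<String> designDecisions = new ArrayList<>();  // why the variant differs from the core
    Connector(CoreArtifact core, Variant variant) {
        this.core = core;
        this.variant = variant;
    }
}

class ProductLineAsset {
    CoreArtifact core;                               // contains core
    List<Variant> variants = new ArrayList<>();      // contains instance (0..*)
    List<Connector> connectors = new ArrayList<>();  // contains connector (0..*)

    ProductLineAsset(CoreArtifact core) { this.core = core; }

    // Deriving a new variant records a connector, so that change requests on the
    // variant can later be traced back to the core artifact and its sibling variants.
    Variant deriveVariant(String name, List<String> decisions) {
        Variant v = new Variant(name);
        Connector c = new Connector(core, v);
        c.designDecisions.addAll(decisions);
        variants.add(v);
        connectors.add(c);
        return v;
    }
}

In an architecture-centric setting the elements stored here would be architectural elements, and the recorded design decisions correspond to the connectors described above.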
4 Envisioned Benefits
The major benefits envisioned by the proposed solution are the acceleration of the evolution process and the
reduction of the effort increase rate. The Customization Layer will speed up the evolution process because it
will simplify the decision whether change propagation is necessary. For example the change from a variant
will be propagated to other variants only if the latter are really affected. Additionally the explicit connection
between core artifact and variants will allow the isolation of changes in the variant if they are product-specific
as well as the easier identification of the change impact.
Other potential benefits include:
a) Repository quality: The Customization Layer will encapsulate the underlying CMS and take over the
interaction with it. In other words the physical structure of the repository will be completely managed by the
CL. In this way inconsistencies that arise from the way different users deal with the repository can be
avoided. In general the quality of the repository is expected to increase.
b) Reuse of existing infrastructure: The CL will build upon and reuse existing CMS and therefore will not
require abandoning infrastructure an organization has invested in.
5 Related Work
In [KRU02] the problem of variation management is thoroughly studied and techniques for realizing the
production line approach (see also section 2.1) are presented. The Customization Layer approach
reuses many of the concepts discussed there. The differences lie first in the support for issuing change
requests out of variants and second in the layering approach between CL and CMS that will also provide for
distributed and large-scale development.
[ATK01] addresses evolution from a component-oriented perspective. A key idea is to enable a component
under configuration management to be self-contained by including its dependencies to other components as
first-class entities. The Customization Layer approach on the other hand currently concentrates on the
dependencies within a product line asset, that is between core artifacts and variants. However the
approaches are compatible since the Customization View can be extended with additional connectors
representing dependencies between different product line assets. For the time being these dependencies are
not made visible in the Customization View.
[OMM01] proposes the use of Configuration Management only for temporary variation. For permanent
variation the author suggests the use of component-oriented technology in combination with variability
modeling. The Customization Layer is compatible with this approach since generic components can be
managed as core artifacts. In essence the Customization Layer would enable the better management of
components over time.
In [BM00] decision models are employed for describing the variability in a core asset in terms of open
decisions. When a variant is created the open decisions must be resolved thereby producing a resolution
model. This resolution model is comparable to the Connectors of the Customization View. The latter therefore
takes a more general approach that does not necessitate the presence of decision and resolution models.
However decision models can also be attached to the connection between core artifacts and variants.
6 Conclusion
This position paper has presented the Customization Layer, a system that will support the evolution of
Software Product Lines. The Customization Layer captures the relations between core artifacts and their
variants explicitly and so enables change requests to be processed more efficiently. The layer is seen as an
extension that is installed on top of existing Configuration Management Systems and so provides for the full
exploitation of their technical maturity with respect to large-scale, distributed software development. The next
steps in this work consist among other things of a sample implementation towards a proof of concept as well
as of the definition of metrics that will validate the achievement of the envisioned benefits.
7 References
[APP98] Brad Appleton (Motorola Network Solutions Group) et al.: Streamed Lines: Branching Patterns for Parallel Software Development. www.cmcrossroads.com/bradapp/acme/branching/#StreamedLines
[ATK01] Colin Atkinson et al.: Component-Based Product Line Engineering with the UML, Chapter 17. Pearson Education Limited, 2001.
[BM00] Joachim Bayer, Dirk Muthig: Maintenance Aspects of Software Product Lines. Proceedings of the 1st Deutscher Software-Produktlinien Workshop, Kaiserslautern, November 2000. Editors Peter Knauber, Klaus Pohl. Fraunhofer IESE Report 076.00/E.
[KRU02] Charles W. Krueger: Variation Management for Software Production Lines. BigLever Software, Inc. Proceedings of the 2nd Software Product Lines Conference, Springer Verlag, 2002.
[LEON05] Alexis Leon: Software Configuration Management Handbook, 2nd edition. Artech House Inc., 2005.
[OMM01] Rob van Ommering: Configuration Management in Component Based Product Populations. In Tenth International Workshop on Software Configuration Management (SCM-10), 2001. Available from http://www.ics.uci.edu/~andre/scm10/papers/ommering.pdf
[PULSE] Homepage of PuLSE™, http://www.iese.fraunhofer.de/PuLSE/, Fraunhofer IESE, 2006.
Formal Model Merging Applied to Class Diagram
Integration
Artur Boronat, José Á. Carsí, Isidro Ramos, Patricio Letelier
Department of Information Systems and Computation
Polytechnic University of Valencia
Camí de Vera s/n
46022 Valencia-Spain
{aboronat | pcarsi | iramos | letelier}@dsic.upv.es
ABSTRACT
The integration of software artifacts is present in many scenarios
of the Software Engineering field: object-oriented modeling,
relational databases, XML schemas, ontologies, aspect-oriented
programming, etc. In Model Management, software artifacts are
viewed as models that can be manipulated by means of generic
operators, which are specified independently of the context in
which they are used. One of these operators is Merge, which
enables the automated integration of models. Solutions for
merging models that are achieved by applying this operator are
more abstract and reusable than the ad-hoc solutions that are
pervasive in many contexts of the Software Engineering field. In
this paper, we present our automated approach for generic model
merging from a practical standpoint, providing support for
conflict resolution and traceability between software artifacts. We
focus on the definition of our operator Merge, applying it to Class
Diagram integration.
Categories and Subject Descriptors
D.2.2 [Design Tools and Techniques]: Computer-aided software
engineering, Evolutionary prototyping, Object-oriented design
methods
D.2.7 [Distribution, Maintenance, and Enhancement]:
Restructuring, reverse engineering, and reengineering
D.2.13 [Reusable Software]: Reuse models
I.1 [SYMBOLIC AND ALGEBRAIC MANIPULATION]:
I.1.4 Applications
I.6.5 [Model Development]: Modeling methodologies
General Terms
Design, Experimentation, Languages.
Keywords
Model-Driven Engineering, Model Management, model merging,
conflict resolution.
1. INTRODUCTION
The Model-Driven Development philosophy [1] considers models
as the main assets in the software development process. Models
collect the information that describes the information system at a
high abstraction level, which permits the development of the
application in an automated way following generative
programming techniques [2]. In this process, models constitute
software artifacts that experience refinements from the problem
space (where they capture the requirements of the application) to
the solution space (where they specify the design, development
and deployment of the final software product).
During this refinement process, several tasks are applied to
models such as transformation and integration tasks. These tasks
can be performed from a model management point of view.
Model Management was presented in [3] as an approach to deal
with software artifacts by means of generic operators that do not
depend on metamodels by working on mappings between models.
Operators of this kind deal with models as first-class citizens,
increasing the level of abstraction by avoiding working at a
programming level and improving the reusability of the solution.
Based on our experience in formal model transformation and data
migration [4], we are working on the application of the model
management trend in the context of the Model-Driven
Development. We have developed a framework, called
MOMENT (MOdel manageMENT) [22], which is embedded into
the Eclipse platform [5] and that provides a set of generic
operators to deal with models through the Eclipse Modeling
Framework (EMF) [6]. Some of the simple operators defined are:
the union, intersection and difference between two models, the
transformation of a set of models into another model by applying a QVT
transformation, the navigation through mappings, and so on.
Complex operators can be defined by composition of other
operators. In this paper, we present the operator Merge of the
MOMENT framework from a practical point of view. The
underlying formalism of our model management approach is
Maude [7]. We apply it as a novel solution for the integration of
UML Class Diagrams in a Use Case Driven software development
process.
The structure of the paper is as follows: Section 2 presents a case
study used as an example in the rest of the paper; Section 3
describes our approach for dealing with models by means of an
industrial modeling tool, and also informally presents the generic
semantics of the operator Merge; Section 4 presents the
customization of the operator Merge to the UML metamodel;
Section 5 explains the application of the operator Merge to the
case study; Section 6 discusses some related work; and Section 7
summarizes the advantages of our approach.

Figure 1. Use Case Model
Figure 2. Partial models associated to the corresponding Use Cases
2. CASE STUDY: USE CASE ANALYSIS
USING PARTIAL CLASS DIAGRAMS
Software development methodologies based on UML propose an
approach where the process is Use Case Driven [8, 9]. This means
that all artifacts (including the Analysis and Design Model, its
implementation and the associated test specifications) have
traceability links from Use Cases. These artifacts are refined
through several transformation steps. Obtaining the Analysis
Model from the Use Case Model is possibly the transformation
that has the least chance of achieving total automation. The Use
Case Model must sacrifice precision in order to facilitate
readability and validation so that the analysis of use cases is
mainly a manual activity.
When the Use Case Model has many use cases, managing
traceability between each use case and the corresponding
elements in the resulting class diagram can be a difficult task. In
this scenario, it seems reasonable to work with each use case
separately and to register its partial class diagram (which is a
piece of the resulting class diagram that represents the Analysis
Model). Regarding traceability, this strategy is a pragmatic
solution, but when several team members work in parallel with
different use cases, inconsistencies or conflicts among partial
models often arise, which must be solved when obtaining the
integrated model.
We present a case study that illustrates how our operator Merge
can be used effectively to deal with the required needs established
above. We present part of a system for managing submissions that
are received in a conference. In our example, we will focus on the
fragment of the Use Case Model shown in Figure 1. The actor
System Administrator manages user accounts. Authors submit
papers to the conference. The PCChair assigns submissions to
PCMembers. Each submission is assessed by several PCMembers
using review forms. When all the reviews are completed, the
PCChair ranks the submissions according to the assessment
contained in the review forms. Since there is a limit to the
number of papers that can be presented, and taking into account
the established ranking, some submissions are selected, and the
rest are rejected. Then, all authors are notified by email attaching
the review forms of their submission. Figure 2 shows the Class
Diagrams that support the functionality required for the
corresponding Use Case.
3. THE GENERIC SEMANTICS OF THE
OPERATOR Merge
In a Model-Driven Development context [10], models consist of
sets of elements that describe some physical, abstract, or
hypothetical reality. In the process of defining a model,
abstraction and classification are guidelines to be taken into
account. A metamodel is simply a model of a modeling language.
It defines the structure and constraints for a family of models.
In our framework, a metamodel is viewed as an algebraic
specification where the model management operators are defined
so that they can be applied to all the models of the metamodel. To
fulfill this in our framework, the Ecore metamodel can be broken
down into three well-distinguished parts:
1. A parameterized module called MOMENT-OP, which provides
our generic model management operators independently of any
metamodel (the operator Merge is one of them). This module
also provides the needed constructors to specify a model as a
set of elements, based on an axiomatized specification of a set
theory.
2. A signature called sigEcore, which provides the constructors
of a specific metamodel which is specified in order to
represent a model by means of algebraic terms. For example,
in the Ecore metamodel algebraic specification, we have the
constructs that define a class, an attribute, an operation, and so
on. This signature is automatically generated from a
metamodel (see [23] for further details), which is specified by
using the meta-metamodel Ecore in the EMF. In this case, the
Ecore metamodel has been used to define itself. This
signature constitutes the actual parameter for the module
MOMENT-OP.
3. A module called spEcore, which instantiates the parameterized
module MOMENT-OP by passing sigEcore as actual
parameter. In the instantiation process, the generic operators
are customized to the constructs of the metamodel. This
provides the constructors that are needed to specify a model of
this metamodel as a set of elements. This fact also provides the
generic operators that can be automatically applied to models
of this kind. In this module, the specification of the operator
Merge can also be customized to a metamodel by simply
adding new axioms to the operators. This module constitutes
the algebraic specification of a metamodel in MOMENT. To
enable the manipulation of UML models, they have to be
represented as terms of the spEcore algebraic specification.
This task is automatically performed by MOMENT from
models expressed in Ecore format in the EMF.
In MOMENT, the operator Merge is defined axiomatically using
the Maude algebraic language. Maude allows us to specify the
operator in an abstract, modular and scalable manner so that we
can define its semantics from a generic and reusable point of
view. The operator can also be customized in an ad-hoc and more
accurate way, taking advantage of both complementary
standpoints.
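The following rough analogy, written in Java rather than Maude and using invented names, mirrors this structure: generic operators are written once against a metamodel "signature" and become usable for a concrete metamodel by instantiation, much like MOMENT-OP is instantiated with sigEcore to obtain spEcore.

import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative analogy only; the actual MOMENT operators are axiomatized in Maude.
interface MetamodelSignature<E> {      // plays the role of sigEcore
    String nameOf(E element);          // assumed accessor for illustration
}

class GenericModelOperators<E> {       // plays the role of the parameterized module MOMENT-OP
    private final MetamodelSignature<E> sig;

    GenericModelOperators(MetamodelSignature<E> sig) { this.sig = sig; }

    // A simple generic operator: the union of two models seen as sets of elements,
    // where elements with the same name are treated as duplicates.
    Set<E> union(Set<E> modelA, Set<E> modelB) {
        Set<E> result = new LinkedHashSet<>(modelA);
        outer:
        for (E b : modelB) {
            for (E a : modelA) {
                if (sig.nameOf(a).equals(sig.nameOf(b))) continue outer;
            }
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        // Instantiation (the role of spEcore): pass a concrete signature as parameter.
        MetamodelSignature<String> toySig = s -> s;   // a toy metamodel whose elements are just names
        GenericModelOperators<String> ops = new GenericModelOperators<>(toySig);
        System.out.println(ops.union(Set.of("Submission", "User"), Set.of("User", "Review")));
    }
}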
3.1 The Generic Semantics of the Operator
Merge
The operator Merge takes two models as input and produces a
third one. If A and B are models (represented as terms) in a
specific metamodel algebraic specification, the application of the
operator Merge on them produces a model C, which consists of
the members of A together with the members of B, i.e. the union
of A and B. Taking into account that duplicates are not allowed in
a model, the union is disjoint.
To define the semantics of the operator Merge, we need to
introduce three concepts: the equivalence relationship, the conflict
resolution strategy and the refreshment of a construct.
First, a semantic equivalence relationship is a bidirectional
function between elements that belong to different models but to
the same metamodel. This indicates that they are semantically the
same software artifact although they may differ syntactically.
This relation is embodied by the operator Equals. The generic
semantics of Equals coincides with the syntactical equivalence,
although this generic semantics can be enriched by means of OCL
expressions that take into account the structure and semantics of a
specific metamodel.
Second, we have to deal with conflicts. During a model merging
process, when two software artifacts (each of which belongs to a
different model) are supposed to be semantically the same, one of
them must be erased. Their syntactical differences cast doubt on
which should be the syntactical structure for the merged element.
Here, the conflict resolution strategy comes into play. The
conflict resolution strategy is provided by the operator Resolve,
whose generic semantics consists of the preferred model strategy.
When the operator Merge is applied to two models, one has to be
chosen as preferred. In this way, when two groups of elements
(that belong to different models) are semantically equivalent due
to the Equals morphism, although they differ syntactically, the
elements of the preferred model prevail. The semantics of the
Resolve operator can also be customized for a specific metamodel
in the same way that we can do with the Equals operator.
Third, refreshments are needed to copy non-duplicated elements
into the merged model in order to maintain its references in a
valid state. If we merge models B and C in our case study, taking
model B as the preferred one, the reference Submission of the
class PCMember of model C is copied to the merged model. As
the class Submission of model C has been replaced by the one
from model B, the reference, which points to the class Submission
of model C, is no longer valid. Thus, this reference must be
updated. The update of a specific metamodel construct term is
embodied by the operator Refresh.
The operator Merge uses the equivalence relationship defined for
a metamodel to detect duplicated elements between the two input
models. When two duplicated elements are found, the conflict
resolution strategy is applied to them in order to obtain a merged
element, which is then added to the output model. The elements
that belong to only one model, without being duplicated in the
other one, are refreshed and directly copied into the merged
model.

Table 1. The steps of the Class Diagram merging process
Step 1: <BC, mapB2BC, mapC2BC> = Merge(B, C)
  Conflict: the multiplicity of the attribute keywords (class Submission). Models: B–C.
  Resolution: multiplicity [1..5] (preferred model).
Step 2: <DE, mapD2DE, mapE2DE> = Merge(D, E)
Step 3: <BCDE, mapDE2BCDE, mapBC2BCDE> = Merge(DE, BC)
  Conflict 3.1: the multiplicity of the attribute authors (class Submission). Models: B–E.
  Resolution: multiplicity [1..*] (preferred model).
  Conflict 3.2: the type of the attribute accepted (class Submission). Models: B–E.
  Resolution: type Boolean (preferred model).
  Conflict 3.3: the multiplicities of the association between the classes Submission and PCMember. Models: C–D.
  Resolution: multiplicities 1..1 – 1..1 (preferred model).
Step 4: <ABCDE, mapA2ABCDE, mapBCDE2ABCDE> = Merge(A, BCDE)
  Conflict 4.1: the attribute userid (class User) and the attribute login (class PCMember) are identified as the same, by means of the thesaurus. Models: A–D.
  Resolution: the inherited feature prevails by means of the EClass axiom for the operator Resolve.

ceq Equals N1 Model1 N2 Model2 =
  if ((N1.name) == (N2.name)) and (Equals (N1.eContainingClass(Model1)) Model1 (N2.eContainingClass(Model2)) Model2) then true
  else if (Synonym (N1.name) (N2.name)) and (Equals (N1.eContainingClass(Model1)) Model1 (N2.eContainingClass(Model2)) Model2) then true
  else if (Similar (N1.name) (N2.name) 95.0) and (Equals (N1.eContainingClass(Model1)) Model1 (N2.eContainingClass(Model2)) Model2) then true
  else false
  fi fi fi
  if (N1 oclIsTypeOf ( ? “EAttribute” ; Model1)) and (N2 oclIsTypeOf( ? “EAttribute” ; Model2)) *** Condition
Figure 3. Equivalence relationship for the EAttribute primitive
The outputs of the operator Merge are a merged model and two
models of mappings that relate the elements of the input models
to the elements of the output merged model. Therefore, these
mappings, which are automatically generated by the operator
Merge, provide full support for keeping traceability between the
input models and the new merged one.
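Purely as an illustration of the behaviour just described (this is not the MOMENT implementation, which is specified axiomatically in Maude and works on Ecore models), the following sketch merges two flat lists of elements: an Equals-like predicate detects duplicates, conflicts are resolved in favour of the preferred model, the remaining elements are copied over, and traceability mappings from both inputs to the merged output are recorded.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Illustrative two-way, state-based merge following the Equals/Resolve scheme
// described above; the element type and all names are invented for the example.
class GenericMerge<E> {

    static class Result<T> {
        List<T> merged = new ArrayList<>();
        Map<T, T> mapPreferred = new HashMap<>();  // preferred input element -> merged element
        Map<T, T> mapOther = new HashMap<>();      // other input element -> merged element
    }

    Result<E> merge(List<E> preferred, List<E> other, BiPredicate<E, E> equals) {
        Result<E> result = new Result<>();
        // Every element of the preferred model survives unchanged (the Resolve strategy).
        for (E p : preferred) {
            result.merged.add(p);
            result.mapPreferred.put(p, p);
        }
        // Elements of the other model are either mapped onto their duplicate in the
        // preferred model, or copied ("refreshed") into the merged model.
        for (E o : other) {
            E duplicate = null;
            for (E p : preferred) {
                if (equals.test(p, o)) { duplicate = p; break; }
            }
            if (duplicate != null) {
                result.mapOther.put(o, duplicate);   // conflict resolved in favour of preferred
            } else {
                result.merged.add(o);
                result.mapOther.put(o, o);
            }
        }
        return result;
    }
}

For instance, calling merge with (a, b) -> a.equalsIgnoreCase(b) on two lists of attribute names keeps the preferred model's spelling of each duplicated name, while the two mappings record where every input element ended up.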
4. SPECIFIC SEMANTICS FOR THE
ECORE METAMODEL TO MERGE UML
CLASS DIAGRAMS
In this section, we present the specific semantics of the operator
Merge to integrate UML Class Diagrams, which are implemented
in the EMF by means of the Ecore metamodel. To define the
specific semantics for the Ecore metamodel, we only have to add
specific axioms for the operators Equals and Resolve.
Consequently, the axiomatic definition of the operator Merge
remains the same. The equivalence relationships that relate two
elements of the metamodel Ecore take into account the type of a
construct, its name, and its container. In an equivalence
relationship, the names of two instances that have the same type
are analyzed in three steps1: two instances may be equal if they
have exactly the same name; if not, two instances may be equal if
1
We have chosen these principles for the example. Nevertheless,
they can be customized to a specific metamodel by the user.
Nothing impedes us to add semantic annotations to the elements
of a model and use this information the determine which
elements are equals or not
p. 14 of 199
their names are defined as synonyms in a thesaurus; if not, they
may be equal if a heuristic function for comparing strings
establishes that they are similar within an acceptable range.
Moreover, almost all the relationships add a container condition.
This means that two instances of the same type are equal if, in
addition to the name condition, they have an equivalent container
instance. In Figure 3, we provide the conditional equation
that represents an equivalence relationship for the EAttribute
primitive in Maude notation. This equation is automatically obtained by
compilation of an OCL expression [23]. In the merging of the
partial models B and C of our case study, when this equivalence
relationship is applied, the attribute title of the class Submission
(of model B) is equivalent to the attribute title of the class
Submission (of model C) because they have the same name and
they belong to equivalent classes.
Several axioms have also been added to the Resolve operator in
order to take into account the constructs of the Ecore metamodel
that contain other elements. For example, a class can contain
attributes, references and operations, among others. In the case
study, when we integrate the class Submission of model B with
the class Submission of model C, we have to integrate their
respective attributes, references and operations.
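A minimal sketch of this three-step name comparison, assuming a hypothetical thesaurus and a simple edit-distance similarity with the 95% threshold of Figure 3 (the full Equals relationship additionally requires equivalent containers, which is omitted here):

import java.util.Map;
import java.util.Set;

// Sketch of the name-based part of the Ecore-specific Equals relationship.
// The thesaurus and the similarity heuristic are assumptions for illustration;
// MOMENT encodes this check as the conditional Maude equation of Figure 3.
class NameEquivalence {

    private final Map<String, Set<String>> thesaurus;  // name -> synonyms

    NameEquivalence(Map<String, Set<String>> thesaurus) {
        this.thesaurus = thesaurus;
    }

    boolean equivalentNames(String n1, String n2) {
        if (n1.equals(n2)) return true;                                      // step 1: identical names
        if (thesaurus.getOrDefault(n1, Set.of()).contains(n2)) return true;  // step 2: synonyms
        return similarity(n1, n2) >= 0.95;                                   // step 3: heuristic similarity
    }

    // Rough similarity heuristic based on the Levenshtein distance; any other
    // string-comparison heuristic could be plugged in instead.
    static double similarity(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) d[a.length()][b.length()] / max;
    }
}

With a thesaurus entry mapping "userid" to "login", for example, the attributes userid and login of the case study would be recognized as the same feature in step 2.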
5. MERGING PROCESS
In this section, we present the merging process that is used to
integrate the five partial class diagrams of the case study. The
four steps followed are indicated in Table 1, where the first
argument for the merge operator is the preferred one. In this table,
the first column indicates the step number; the second column
shows the invocation of the operator Merge; the third column
describes some of the main conflicts that have appeared during
the merging step; the fourth column indicates the partial models
involved that contain the conflicting elements; and the last column
indicates the solution of the conflict by the Resolve operator.
Figure 4. Resulting merged model for the case study.
After each step of the merging process, two models of mappings
are automatically generated. These mappings provide full support
for traceability by registering the transformation applied to the
elements of the source partial models and by relating them to
elements of the merged model. In the MOMENT framework, a set
of operators is provided to navigate mappings bidirectionally:
from a partial model to the merged model (providing support for
the propagation of changes from a specific use case to the merged
model, as well as preserving the changes applied to the latter); or
from the merged model to a partial class diagram (providing
support in order to update a specific use case). Moreover, such
mappings are considered as models so that generic model
management operators can also be applied to them.
In Figure 4, we show the merged model resulting from
step 4. Although the user describes the semantics of the operator
Merge for a specific metamodel, since the model merging is
completely automated there might be some undesired results in
the merged model that should be fixed. In this figure, elements of
this kind are highlighted by a discontinuous line. Therefore, the
directed association that comes from partial model D should be
deleted, and the multiplicity of the existing association between
the Submission and the PCMember classes should be updated
with the multiplicity that appears in partial model C.
In such cases, the user has the option to open the merged model to
review and update it. Merged models can be manipulated from
visual editors that are integrated in the Eclipse platform, such as
the EMF tree editor or the Omondo tool [11], which provides a
visual environment to edit UML models based on the Ecore
metamodel. Other industrial modeling environments can be used
to manipulate resulting models such as Rational Software
Architect [12]. These environments provide the added value of
code generation and the integration with other IBM software
development tools. Furthermore, in the MOMENT framework,
the operators that work on mappings can be used to follow the
performed merging process in order to automatically detect
changes in properties of elements like cardinality, type, addition
of attributes to a class, etc.
6. RELATED WORK
In [21], several approaches for model merging are presented. The
operator Merge is a model management operator that was
proposed in [13] and further developed in [14] afterwards. The
specification of this operator Merge is provided in terms of
imperative algorithms so that the operator Merge is embodied by
a complex algorithm that mixes control logic with the
functionality. Although the operator is independent of any
metamodel, it depends on an external operator to check the
constraints of a specific metamodel. Therefore, it might generate
inconsistent models and requires an auxiliary operator to work
properly. Moreover, as shown in [14], the algorithm may be
changed depending on the metamodel. In MOMENT, the operator
Merge remains completely reusable for any metamodel. To
consider new metamodels, the operators Equals and Resolve can
be customized by simply adding axioms to their respective
semantic definition, preserving monotonicity.
Another approach to provide the operator Merge from a model
management standpoint is presented in [15] by using graph
theory. The operator Merge is denotationally defined by means of
production rules. In both operational and graph-based approaches,
the operator Merge receives a model of mappings as input. This
model indicates the relationships between the elements of the
models that are going to be merged. These mappings have to be
defined manually or can be inferred by an operator Match that uses
heuristic functions [16] or historical information [17]. Our
operator Merge does not depend on mappings since the
equivalence relation has been defined axiomatically between
elements of the same metamodel in the operator Equals, at a
higher abstraction level. Another drawback of both model
management approaches is that they are not integrated in any
visual modeling environment. Therefore, they cannot be used in a
model-driven development process in the way that the MOMENT
framework is able to do through the Eclipse platform.
The Generic Model Weaver AMW [19] is a tool that permits the
definition of mapping models (called weaving models) between
EMF models in the ATLAS Model Management Architecture.
AMW provides a basic weaving metamodel that can be extended
to permit the definition of complex mappings. These mappings
are usually defined by the user, although they may be inferred by
means of heuristics, as in [16]. This tool constitutes a nice
solution when the weaving metamodel can change. It also
provides the basis for a merge operator on the grounds that a
weaving model, which is defined between two models, can be
used as input for a model transformation that can obtain the
merged model (as mentioned in [19]). In MOMENT, model
weavings are generated by model management operators
automatically in a traceability model, and can be manipulated by
other operators.
An interesting operation-based implementation of the three-way
merge is presented in [20]. The union model that permits this kind
of merging is built on top of a difference operator. The difference
operator is based on the assumption that all the elements that
participate in a model must have a unique identifier. This operator
uses the identifiers in order to check if two elements are the same.
Our Merge operator is a state-based implementation of the two-way merge, so it does not need a common base model in order
to merge two different models. In our approach the operator
Equals permits the definition of complex equivalence
relationships in an easy way. The three-way merge can be
specified as a complex operator in the Model Management arena,
as described in [14].
Turning more specifically to the problem presented in the case study,
UML CASE tools permit the arrangement of Use Cases and their
corresponding partial Class Diagram into the same package.
Nevertheless, no option is provided to obtain the global Class
Diagram from the partial ones. The Rational Rose Model
Integration [18] is a tool that provides an ad-hoc solution to merge
UML models by basically using the name of the element to
determine equivalences, and using the preferred model strategy to
obtain the merged model. The equivalence relation and the
conflict resolution strategy cannot be customized by the user like
in MOMENT. Moreover, once the merged model is generated,
there is no way to relate the obtained model to the partial source
models in order to keep some degree of traceability.
7. CONCLUSIONS
In this paper, we have presented a state-based automated
approach for model merging from a model management
standpoint. We have briefly introduced how we deal with
algebraic models from a visual modeling environment, and we
have described the generic semantics of our operator Merge. A
customization of the operator has been performed for the Ecore
metamodel in order to solve the integration of the partial class
diagrams proposed in the case study. The operator takes
advantage of the reusability and modularity features of the
algebraic specifications. Therefore, it becomes a scalable operator
that can be easily specialized to a specific metamodel and that can
be intuitively used with other operators. As we have shown in our
case study, our approach provides support for maintaining
traceability between the Use Case model and the Analysis model.
The MOMENT framework offers other operators that enable the
synchronization of changes between both models, wherever the
changes occur, either in the partial models or in the merged
models.
In the current version of the MOMENT framework, the specific
semantics of the operator Merge is directly introduced using the
Maude syntax. In future work, we plan to develop visual
interfaces to define the axioms needed to customize the
MOMENT operators in order to improve the usability of our tool.
8. REFERENCES
[1] Frankel, D. S.: Model Driven Architecture: Applying MDA
to Enterprise Computing. John Wiley & Sons OMG Press.
[2] Czarnecki, K., Eisenecker, U.: Generative Programming:
Methods, Tools, and Applications. Addison-Wesley (2000).
ISBN 0-201-30977-7, pp. 267-304.
[3] Bernstein, P.A: Applying Model Management to Classical
Meta Data Problems. pp. 209-220, CIDR 2003.
[4] Boronat, A., Pérez, J., Carsí, J. Á., Ramos, I.: Two
experiences in software dynamics. Journal of Universal
Computer Science. Special issue on Breakthroughs and
Challenges in Software Engineering. Vol. 10 (issue 4). April
2004.
[5] Eclipse site: www.eclipse.org
[6] The EMF site:
http://download.eclipse.org/tools/emf/scripts/home.php
[7] Clavel, M., Durán, F., Eker, S., Lincoln, P., Martí-Oliet, N.,
Meseguer, J., Quesada, J.F.: Maude: specification and
programming in rewriting logic. Theoretical Computer
Science, 285(2):187-243, 2002.
[8] Kruchten P. The Rational Unified Process: An Introduction.
Addison-Wesley Professional. 2003.
[9] Larman C. Applying UML and Patterns : An Introduction to
Object-Oriented Analysis and Design and Iterative
Development. Prentice Hall. 2004.
[10] Mellor, S. J., Scott, K., Uhl, A., Weise, D.: MDA Distilled:
Principles of Model-Driven Architecture. Addison Wesley
(2004). ISBN 0-201-78891-8.
[11] The Omondo site: www.omondo.com
[12] IBM Rational Software Architect: http://www-306.ibm.com/software/awdtools/architect/swarchitect/
[13] Bernstein, P.A., Levy, A.Y., Pottinger, R.A.: A Vision for
Management of Complex Models. MSR Tech. Rep. MSR-TR-2000-53 (in SIGMOD Record 29, 4 (Dec. '00)).
[14] Pottinger, R.A., Bernstein, P. A.: Merging Models Based on
Given Correspondences. VLDB 2003.
[15] Song, G., Zhang, K., Kong, J.: Model Management Through
Graph Transformation. IEEE VL/HCC'04. Rome, Italy. 2004.
[16] Madhavan, J., P.A. Bernstein, and E. Rahm: Generic Schema
Matching using Cupid. VLDB 2001.
[17] Madhavan, J., Bernstein, P. A., Chen, K., Halevy, A.Y.,
Shenoy, P.: Corpus-based Schema Matching. Workshop on
Information Integration on the Web, at IJCAI 2003, pp. 59-66.
[18] Rational Suite: http://www-306.ibm.com/software/swatoz/indexR.html
[19] Didonet Del Fabro, M, Bézivin, J, Jouault, F, Breton, E, and
Gueltas, G : AMW: a generic model weaver. Proceedings of
the 1ère Journée sur l'Ingénierie Dirigée par les Modèles
(IDM05). 2005.
[20] Alanen, M, and Porres, I: Difference and Union of Models.
In UML 2003 - The Unified Modeling Language, Oct 2003.
[21] Mens, T.: A State-of-the-Art Survey on Software Merging.
IEEE Transactions on Software Engineering, Volume 28 ,
Issue 5 (May 2002). Pages: 449 – 462.
[22] The MOMENT site: http://moment.dsic.upv.es/
[23] Boronat, A., Ramos, I., Carsí J.A.: Definition of OCL 2.0
Operational Semantics by means of a Parameterized
Algebraic Specification. 1st workshop on Algebraic
Foundations for OCL and Applications (WAFOCA’06)
Valencia, Spain, March 22nd, 2006.
Inducing Evolution-Robust Pointcuts
Mathieu Braem, Kris Gybels, Andy Kellens*, Wim Vanderperren
(* Ph.D. scholarship funded by the “Institute for the Promotion of Innovation through Science and Technology in Flanders”, IWT Vlaanderen.)
System and Software Engineering Lab
Vrije Universiteit Brussel
Pleinlaan 2, B-1050 Brussels, Belgium
{mbraem,kgybels,akellens,wvdperre}@vub.ac.be
ABSTRACT
One of the problems in Aspect-Oriented Software Development is specifying pointcuts that are robust with respect to
evolution of the base program. We propose to use Inductive
Logic Programming, and more specifically the FOIL algorithm, to automatically discover intensional pattern-based
pointcuts. In this paper we demonstrate this approach using
several experiments in Java, where we successfully induce
a pointcut from a given set of joinpoints. Furthermore, we
present the tool chain and IDE that supports our approach.
1. INTRODUCTION
Aspect-Oriented Software Development (AOSD) aims to provide a better separation of concerns than possible using traditional programming paradigms [17]. To this end, AOSD
introduces an additional module construct, called an aspect.
Traditional aspects consist of two main parts: a pointcut
and an advice. Points in the program’s execution where an
aspect can be applied are called joinpoints. Pointcuts are
expressions in a pointcut language which describe a set of
joinpoints where the aspect should be applied. The advice
is the concrete behavior that is to be executed at a certain
joinpoint, typically before, after or around the original behavior at the joinpoint.
Since existing software systems can benefit from the advantages of AOSD as well, a number of techniques have been
proposed to identify crosscutting concerns in existing source
code (aspect mining) [4, 5, 6] and transform these concerns
into aspects (aspect refactoring) [21, 20]. When refactoring
a concern to an aspect, a pointcut must be written for this
aspect. Pointcut languages like for instance the CARMA
pointcut language allow specifying pattern-based pointcuts,
so that the pointcut does not easily break when the base
code is changed [9, 18]. While existing aspect refactoring
∗Ph.D. scholarship funded by the “Institute for the Promotion of Innovation through Science and Technology in Flanders” (IWT Vlaanderen).
techniques also automatically generate a pointcut, they typically only provide an enumerative pointcut, which is fragile
with respect to evolution of the base program. Turning this
pointcut into a pattern-based pointcut is left to be done
manually by the developer.
In this paper we propose to exploit Inductive Logic Programming techniques to automatically induce a pattern-based
pointcut from a given set of joinpoints. The next section
details the problem of uncovering pattern-based pointcuts
and introduces the running example used throughout this
paper. Section 3 introduces Inductive Logic Programming
and the concrete algorithm used and in section 4 we apply
ILP for automatically generating pattern-based pointcuts
and report on several successful experiments in Java. Afterwards, we present the tools created to support our approach,
compare with related work and state our conclusions.
2. MOTIVATION AND BACKGROUND
The main problem in maintaining aspect-oriented code is
the so-called fragile pointcut problem [19]. Pointcuts are
deemed fragile when seemingly innocent changes to a program result in the pointcut no longer capturing the intended
joinpoints. Taking the code of figure 1 for example, a pointcut for capturing message invocations on Point objects that
change the state of the object could simply say “capture
setX and setY messages”. Changing the name of setX to
changeX or adding a method setZ would break this pointcut. The pointcut is obviously fragile because it is simply
an enumeration of methods.
Using an advanced pointcut language that gives access to
the full static joinpoint model of a program, it is possible
to exploit a more robust pattern [9]. Figure 2 illustrates a
pointcut that expresses that all the state changing methods
contain an assignment to an instance variable of an object.
Aspect refactoring and aspect mining form a particularly interesting research area within AOSD that is currently being explored. In performing aspect mining and
refactoring, the problem crops up of finding a pointcut for
the newly created aspect. Also, as with object-oriented
refactoring, research is being performed on how to automate
these refactorings using tool support. In such tools, it would
be interesting to be able to automate the step of generating
a pattern-based pointcut as well. Currently, most proposals
for automating aspect refactoring simply generate an enumerative pointcut, which is of course too fragile. In related
public class Point {
  private int x,y;

  public void setX(int a) {
    this.x=a;
  }

  public void setY(int a) {
    this.y=a;
  }

  public int getX() {
    return x;
  }

  public int getY() {
    return y;
  }
}
Figure 1: A simple Point class
stateChanges(Jpvar):
  execution(Jpvar,MethodName),
  inMethod(AssignmentJP,MethodName),
  isAssignment(AssignmentJP,AssignmentTarget),
  instanceVariable(AssignmentTarget,ClassName)

Figure 2: A pointcut for the observer
work, section 6, we discuss a number of these approaches.
In this paper, we therefore propose the use of Inductive Logic
Programming for automatically generating a pattern-based
pointcut. We restrict ourselves to a logic pointcut language similar to CARMA but limited to a static joinpoint model, which means that run-time values and cflow constructs are not supported.
3. INDUCTIVE LOGIC PROGRAMMING
The technique of logic induction returns a logic query that,
by using conditions drawn from background information on
a set of examples, satisfies all positive examples while not
including any negative examples. In this paper we use the
FOIL ILP algorithm [23]. Informally, the way ILP works and
how it can be applied to generate a pattern-based pointcut
is as follows:
positive examples: ILP takes as input a number of positive examples, in our setting of deriving pattern-based
pointcuts these are joinpoints that the pointcut should
capture.
negative examples: ILP also takes as input a number of
negative examples, the rules that are derived during
the iterative induction should never cover negative examples. Negative examples effectively force the algorithm to use other information of the background in
the induced rules.
background information: Another input to ILP is background information on the examples. In our setting,
these would be the result of predicates in the pointcut
language that are true for the joinpoints, or in other
words, the data associated with the joinpoints, such as the name of the message of the joinpoint, the type of the joinpoint (message, assignment, . . . ), in which method or class the joinpoint occurs, . . .
induction: ILP follows an iterative process for constructing a logic rule based on the positive examples. Starting from an empty rule, in each step of the process,
the rule is extended with a condition drawn from the
background information which decreases the number
of negative examples covered by the rule. This is
repeated until the rule describes all positive but no
negative examples. The added conditions are generalizations of facts in the background information,
by adding logic variables (a more powerful version of
wildcards). For example, if the background information contains the fact instanceVariable(‘Point.x’,
‘int’), one possible condition used could be instanceVariable(X, ‘int’).
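To make this induction loop concrete, the following much simplified, FOIL-style sketch (in Python) greedily adds conditions until no negative joinpoint is covered. It is an illustration only, not the QFoil implementation used in our tool chain: all facts are assumed to be binary, only conditions on the joinpoint itself are considered (so the relational chaining of figure 4 is out of its reach), and the fact representation is invented for the example.

# Much simplified, FOIL-style greedy rule induction (illustration only, not QFoil).
# Facts are assumed binary: (predicate, joinpoint, value). A candidate condition is
# a predicate with its second argument either bound to a constant or left open,
# mimicking the introduction of a logic variable (a "wildcard").

def holds(jp, condition, facts):
    """True if some fact about this joinpoint satisfies the condition."""
    pred, value = condition
    return any(p == pred and j == jp and (value is None or v == value)
               for (p, j, v) in facts)

def candidates(facts):
    """Generalise every fact into two candidate conditions."""
    for pred, _, value in facts:
        yield (pred, None)       # e.g. isAssignment(Jp, _)
        yield (pred, value)      # e.g. isAssignment(Jp, 'Point.x')

def induce(positives, negatives, facts):
    """Greedily add the condition that rules out most negatives while still
    covering every positive example, until no negative example is covered."""
    rule, neg = [], set(negatives)
    while neg:
        usable = [c for c in set(candidates(facts))
                  if all(holds(p, c, facts) for p in positives)]
        if not usable:
            break                          # cannot cover all positives any more
        best = max(usable, key=lambda c: sum(not holds(n, c, facts) for n in neg))
        if all(holds(n, best, facts) for n in neg):
            break                          # best condition excludes nothing; give up
        rule.append(best)
        neg = {n for n in neg if holds(n, best, facts)}
    return rule

# Toy run: distinguish assignment joinpoints from read/return joinpoints.
facts = [("isAssignment", "jp2", "Point.x"), ("isAssignment", "jp7", "Point.y"),
         ("isRead", "jp12", "Point.x"), ("returnStatement", "jp11", "true"),
         ("inMethod", "jp2", "Point.setX(I)I"), ("inMethod", "jp12", "Point.getX()I")]
print(induce(["jp2", "jp7"], ["jp11", "jp12"], facts))   # -> [('isAssignment', None)]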
4. ILP FOR POINTCUT ABSTRACTION
In this section, we perform a number of experiments in order to demonstrate how we use the FOIL algorithm to induce pattern-based pointcuts. The joinpoints required as
positive examples for the ILP algorithm can be selected automatically using for example an aspect mining technique,
though in these experiments we selected them manually. All
other joinpoints are defined as negative examples for the
ILP algorithm. As background information, we construct
a logic database consisting of the information that is normally available in the pointcut language on these joinpoints
and structural information about the program, such as the
relationship between classes etc. Because the pointcut language uses a purely static joinpoint model, this information can be determined using only the program’s source or compiled representation, i.e. compiled Java classes. Examples
of these facts are given in figure 3.
The algorithm will induce a pointcut that captures exactly
the joinpoints currently in the program that should be captured (the positive examples), and none of the others (the
negative examples). This is guaranteed by the algorithm. It
is reasonable to expect, though not guaranteed, that the
induced pointcut also is a non-fragile or robust pointcut
because the induction process generalizes the conditions it
adds to rules. In general we will not have a specific pointcut in mind that the algorithm should derive (otherwise the
application of ILP would be rather pointless), though in
these experiments we can use the robust pointcut from figure 2 as a benchmark for comparison. We do not discuss
the performance of our tools, but an analysis is included in
an extended version of this paper [3].
4.1 Basic Point class
As an example of our approach, take the simple Point class
from figure 1. In a first step we derive the static joinpoints
from this code, and derive the information on all of these
that is given by the predicates of the pointcut language.
This forms the background information for the logic induction algorithm; part of this generated background information is shown in figure 3.
The methods that are state changing on this simple Point
class are the methods setX and setY only. We identify
these two joinpoints as positive examples of our desired
returnStatement(jp1).
returnStatement(jp6).
returnStatement(jp11).
returnStatement(jp14).
returnStatement(jp17).
inMethod(jp1,‘Point.setX(I)I’).
inMethod(jp2,‘Point.setX(I)I’).
inMethod(jp3,‘Point.setX(I)I’).
inMethod(jp4,‘Point.setX(I)I’).
inMethod(jp6,‘Point.setY(I)I’).
inMethod(jp7,‘Point.setY(I)I’).
inMethod(jp8,‘Point.setY(I)I’).
inMethod(jp9,‘Point.setY(I)I’).
inMethod(jp11,‘Point.getX()I’).
inMethod(jp12,‘Point.getX()I’).
inMethod(jp14,‘Point.getY()I’).
inMethod(jp15,‘Point.getY()I’).
inMethod(jp17,‘Point.Point()V’).
isRead(jp3,‘l0’).
isRead(jp4,‘l1’).
isRead(jp8,‘l2’).
isRead(jp9,‘l3’).
isRead(jp12,‘Point.x’).
isRead(jp15,‘Point.y’).
methodInClass(‘Point.setX(I)I’,‘Point’).
methodInClass(‘Point.setY(I)I’,‘Point’).
methodInClass(‘Point.getX()I’,‘Point’).
methodInClass(‘Point.getY()I’,‘Point’).
methodInClass(‘Point.Point()V’,‘Point’).
classExtends(‘Point’,‘java.lang.Object’).
methodReturns(‘Point.setX(I)I’,‘int’).
methodReturns(‘Point.setY(I)I’,‘int’).
methodReturns(‘Point.getX()I’,‘int’).
methodReturns(‘Point.getY()I’,‘int’).
isAssignment(jp2,‘Point.x’).
isAssignment(jp7,‘Point.y’).
instanceVariable(‘Point.x’,‘Point,int’).
instanceVariable(‘Point.y’,‘Point,int’).
classInPackage(‘java.lang.Object’,‘java.lang’).
execution(jp0,‘Point.setX(I)I’).
execution(jp5,‘Point.setY(I)I’).
execution(jp10,‘Point.getX()I’).
execution(jp13,‘Point.getY()I’).
execution(jp16,‘Point.Point()V’).
Figure 3: Part of the background information for the Point class of figure 1.
stateChanges(A):
  execution(A,B),
  inMethod(C,B),
  isAssignment(C,D).

Figure 4: Induced stateChanges pointcut.

Table 1: Generated facts statistics
                    # Classes    # Facts    # Joinpoints
Toy example                 1         71             10
AWT Point class             1        364             70
Full AWT library          362     276863          65060
stateChanges pointcut, which are the joinpoints jp0 and
jp5 respectively. The pointcut should not cover the other
joinpoints: the joinpoints jp10 and jp13, for instance, denote the execution of the getX and getY method. Clearly,
these methods are not state changing. So these and all other
joinpoints besides jp0 and jp5 are marked as negative examples. We give the FOIL algorithm the positive examples
stateChanges(jp0) and stateChanges(jp5). The resulting
rule is shown in figure 4. The pointcut selects all executions
of methods that contain an assignment.
The resulting pointcut is clearly not very robust. An evolution that easily breaks it would be a getX method that assigns to a local variable: such an assignment does not change the state of the object, yet the method’s execution would be captured by the pointcut.
This result is however not very surprising: the Point class is
small and does not include non-state changing methods that
do assignments to local variables which would have served
as a negative example for the FOIL algorithm. As the induced pointcut covers all positive examples and no negative
ones, the induction stops and no further predicates from the
background information are used to limit the rule to only
the positive examples. The ILP algorithm works better on
larger programs, so that more negative examples are available to avoid oversimplified pattern-based pointcuts.
In order to have a more realistic example, we apply our
experiment to the Point class bundled with Java. We do
stateChanges(A):
  execution(A,B),
  inMethod(C,B),
  isAssignment(C,D),
  instanceVariable(D,E).

Figure 5: Resulting pointcut when applying our approach to the AWT Point class.
not include a full listing of the generated background, but
instead we give some statistics about the generated facts.
Table 1 compares the number of facts found in the AWT
Point class to the number of facts from the basic Point example. We notice a rapid increase in the number of facts
with larger input. However, since the generated information is based on a static joinpoint model, this number only
grows linearly, as a function of the size of the classes and the size of the methods.
We identify four execution joinpoints in the AWT Point class
where a state changing method is invoked and input them
as positive examples to the algorithm. The remaining 66
joinpoints are defined as negative examples. The resulting
pointcut is shown in figure 5. In this case, the algorithm
generates a pointcut that is sufficiently robust for evolution:
it is in fact the same pointcut we defined in figure 2.
4.2 Extended experiments
stateChanges(A):
  execution(A,B),
  inMethod(C,B),
  isAssignment(C,D),
  instanceVariable(D,E),
  not(isTransient(D)).

Figure 6: Resulting pointcut for non-transient field assignments in Java AWT.

Figure 7: The tool chain for inducing pointcuts from Java classes (Java classes -> FactGen -> XML facts -> JFacts -> QFoil facts -> QFoil -> induced query -> JFacts -> pointcut).
In order to provide a limited evaluation of our approach, we conduct two more involved experiments using the stateChanges example on the Java AWT framework.
4.2.1 Large fact database
We apply our approach to the complete Java AWT library
in order to evaluate whether it still returns a useful result
when the number of facts is very large. This library contains
approximately 362 classes and generates more than 250000
facts. The result for the algorithm is the same as for the Java
AWT Point class alone: the same pointcut as we defined in
figure 2 is induced.
4.2.2 Negation
One of the distinguishing features of the FOIL algorithm in
comparison to other ILP algorithms is its ability to induce
rules containing negations. As a variation of the state changing methods example, we need a pointcut for the executions
of methods that change the observable representation of an
object. This means the method does assignments to instance
variables that are not declared transient using the modifier
transient in Java: conceptually, these fields are not part
of the object’s persistent state and are not retained in the
object’s serialization. This is used for example when a class
defines a cache in order to optimize some parts of its operations. As such, observers do not need to be notified when
transient fields are altered. When applying this experiment
to the Java AWT library, our algorithm induces the rule
shown in figure 6, which, in comparison to the pointcuts induced above, adds exactly the background properties we would expect in order to distinguish these joinpoints from the negative examples, i.e. the condition that the instance variables being assigned to are not declared transient.
Figure 8: Screenshot of the Eclipse CME plugin that
allows automatic concern extraction to JAsCo.
5. TOOL SUPPORT
5.1 Tool Chain
Our approach is supported by a fully automatic tool chain,
consisting of the following tools (see figure 7):
• FactGen: This tool translates a range of Java class files and/or jar files to a set of facts representing these classes. The tool uses the javassist library [7] to process the binary class files; javassist provides a high-level reflective API that allows inspecting the full Java bytecode, including method bodies. The output of the FactGen tool is the fact representation in XML format (a simplified sketch of this translation is given after this list).
• JFacts: This tool allows translating logic predicates from one syntax into another. Currently, the tool supports FactGen’s XML syntax, QFoil’s syntax, CARMA’s syntax and the Prolog syntax.
• QFoil: This tool is the implementation of the FOIL ILP algorithm by Ross Quinlan [24]. It takes a set of facts and a set of positive examples as input (negative examples are implicitly assumed) and tries to induce a logic rule that covers all of the positive examples and rejects all of the negative examples.
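As a rough indication of the translation FactGen performs, the sketch below (in Python) generates a handful of facts in the style of figure 3 from a hand-written description of a class. The description reuses names from the Point example; the real tool works on compiled bytecode through javassist and emits XML, neither of which is shown here, and the joinpoint numbering is purely illustrative.

# Simplified fact generation (illustration; the real FactGen inspects bytecode
# via javassist and emits XML). The class description below is written by hand.
point = {
    "name": "Point",
    "fields": {"x": "int", "y": "int"},
    "methods": {
        "Point.setX(I)I": {"assigns": ["Point.x"]},
        "Point.getX()I":  {"reads": ["Point.x"]},
    },
}

def generate_facts(cls):
    jp, facts = 0, []
    for field, ftype in cls["fields"].items():
        facts.append(f"instanceVariable('{cls['name']}.{field}','{cls['name']},{ftype}').")
    for method, body in cls["methods"].items():
        facts.append(f"methodInClass('{method}','{cls['name']}').")
        facts.append(f"execution(jp{jp},'{method}').")
        jp += 1
        for kind in ("assigns", "reads"):
            predicate = "isAssignment" if kind == "assigns" else "isRead"
            for target in body.get(kind, []):
                facts.append(f"{predicate}(jp{jp},'{target}').")
                facts.append(f"inMethod(jp{jp},'{method}').")
                jp += 1
    return facts

print("\n".join(generate_facts(point)))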
5.2 IDE Integration
Although our current tool chain is fully automatic, it is a stand-alone command-line tool that is not integrated in an IDE. In previous work, we have developed an AO refactoring extension to the Eclipse Concern Manipulation Environment [26]. Figure 8 shows a screenshot of this tool. This visual tool allows navigating through applications via
a powerful and extensible query language. As such, crosscutting concerns can be identified and isolated in a concern
model. Afterwards, the concern can be automatically refactored to an aspect in the JAsCo language [27]. However,
the generated pointcut is simply an enumeration of joinpoints. Because JAsCo also supports the pointcut language
presented in this paper, integrating our tool chain in this
visual IDE is trivial. As such, a fully automatic refactoring
tool can be realized that generates evolution-robust pointcuts instead of plain enumerations.
6. RELATED WORK
To our knowledge, there exist no other approaches which try
to automatically generate pattern-based pointcuts. In previous work [10] we reported on a first attempt at using Inductive Logic Programming to derive pattern-based pointcuts. In that work we employed Relative Least General Generalisation (RLGG) [22], an alternative ILP algorithm, instead of the FOIL algorithm. Using RLGG, we are able to
derive correct pointcuts for some specific crosscutting concerns in a Smalltalk image. However, due to the limitations
of both our implementation as well as the applied ILP algorithm (for instance, the algorithm does not support negated
literals), our RLGG-based technique often results in pointcuts that suffer from some fragility: the resulting pointcuts
for example frequently contain redundant literals referring
to the names of specific methods or classes, which of course
easily breaks the pointcut when these names are changed.
Furthermore, our earlier work suffers from serious scalability
issues.
As mentioned earlier, the major area of application of our
technique lies in the automated refactoring of crosscutting
concerns in pre-AOP code into aspects. Quite a number of
techniques exist [11, 21, 20, 13] which propose refactorings
in order to turn object-oriented applications into aspectoriented ones. However, these techniques do not consider the
generation of pattern-based pointcuts. Instead they propose
to automatically generate an enumeration-based pointcut
which, optionally, can be manually turned into a patternbased pointcut by the developer. As is pointed out by Binkley et al. [2], our technique is complementary with these
approaches as it can be used to both improve the level of
automation of the refactoring, as well as the evolvability of
the refactored aspects.
In the context of aspect mining, which is closely related
to object-to-aspect refactorings, a wealth of approaches are
available that allow for the identification of crosscutting concerns in an existing code base. The result of such a technique is typically an enumeration of joinpoints where the
concern is located. Ceccato et al. [6] provide a comparison
of three different aspect mining techniques: identifier analysis, fan-in analysis and analysis of execution traces. Breu
and Krinke propose an approach based on analyzing event
traces for concern identification [4]. Bruntink et al. [5]
make use of clone detection techniques in order to isolate
idiomatically implemented crosscutting concerns. Furthermore, several tools exist that support aspect mining activities by allowing developers to manually explore crosscutting
concerns in source code, such as the aspect mining tool [12],
FEAT [25], JQuery [15] and the Concern Manipulation Environment [14]. These approaches are complementary with
our approach in that the joinpoints they identify can serve
as positive examples for our ILP algorithm.
7. CONCLUSIONS AND FUTURE WORK
In this paper we present our approach using Inductive Logic Programming for generating a concise and robust pointcut from a given enumeration of joinpoints. We report on a number of successful experiments that apply our approach to a realistic and medium-scale case study.
In future work we will consider tackling full CARMA, which requires taking into account in the background information that joinpoints and joinpoint shadows are not equated as in the more restricted pointcut language used here. Our approach can easily be applied to, for example, AspectJ [16] as well by translating the induced pointcuts to AspectJ pointcuts. However, the FOIL algorithm must then be restricted to not generate pointcuts using features that cannot be translated to AspectJ: variables can only be used once in a pointcut (except when using the “if” restrictor in AspectJ), recursive named pointcuts are not possible, and only some uses of the structural predicates can be translated. Other points left for future work are:
• Other Algorithms: There exist several algorithms for Inductive Logic Programming. In previous work, we conducted several small-scale experiments with the Relative Least General Generalization (RLGG) [22] algorithm in an aspect mining context [10]. Having several algorithms might improve the quality of the results presented to the end-user. For example, solutions that are induced by more than one algorithm might be better.
• Multiple Results: Our current tools only generate one
pointcut for a given set of joinpoints. In some cases,
most notably when there is little background information (i.e. a small number of little classes), several
alternative pointcuts are possible. Therefore, it would
be useful to allow presenting multiple pointcut results.
• Run-Time Information: Our current approach only
analyzes the static program information to induce pointcuts. Pointcuts that require run-time program information, such as stateful aspects [8], cannot be induced.
For this, facts representing the run-time behavior of
the program are necessary.
• Refactor existing pointcuts: The FOIL ILP algorithm
could refactor pointcuts given by an aspect-mining technique. This can be easily done by simply taking all the
joinpoints covered by that pointcut and using them as
the positive examples.
8. REFERENCES
[1] Mehmet Akşit, editor. Proc. 2nd Int’ Conf. on
Aspect-Oriented Software Development (AOSD-2003).
ACM Press, March 2003.
[2] D. Binkley, M. Ceccato, M. Harman, F. Ricca, and
P. Tonella. Automated refactoring of object oriented
code into aspects. In 21st IEEE International
Conference on Software Maintenance (ICSM), 2005.
[3] Mathieu Braem, Kris Gybels, Andy Kellens, and Wim
Vanderperren. Automated pattern-based pointcut
generation. In Proceedings of Software Composition,
LNCS. Springer-Verlag, 2006. (to appear).
[4] Silvia Breu and Jens Krinke. Aspect mining using
event traces. In 19th International Conference on
Automated Software Engineering, pages 310–315, Los
Alamitos, California, September 2004. IEEE
Computer Society.
[5] M. Bruntink, A. van Deursen, R. van Engelen, and
T. Tourwé. An evaluation of clone detection
techniques for identifying crosscutting concerns. In
Proceedings of the IEEE International Conference on
Software Maintenance (ICSM). IEEE Computer
Society Press, 2004.
[6] M. Ceccato, M. Marin, K. Mens, L. Moonen,
P. Tonella, and T. Tourwé. A qualitative comparison
of three aspect mining techniques. In Proceedings of
the 13th International Workshop on Program
Comprehension (IWPC 2005), pages 13–22. IEEE
Computer Society Press, 2005.
[7] Shigeru Chiba and Muga Nishizawa. An easy-to-use
toolkit for efficient Java bytecode translators. In
GPCE ’03: Proceedings of the second international
conference on Generative programming and component
engineering, pages 364–376, New York, NY, USA,
2003. Springer-Verlag New York, Inc.
[8] Rémi Douence, Pascal Fradet, and Mario Südholt.
Composition, reuse and interaction analysis of stateful
aspects. In Karl Lieberherr, editor, Proc. 3rd Int’
Conf. on Aspect-Oriented Software Development
(AOSD-2004), pages 141–150. ACM Press, March
2004.
[9] Kris Gybels and Johan Brichau. Arranging language
features for pattern-based crosscuts. In Akşit [1],
pages 60–69.
[10] Kris Gybels and Andy Kellens. An experiment in
using inductive logic programming to uncover
pointcuts. In First European Interactive Workshop on
Aspects in Software, September 2004.
[11] Stefan Hanenberg, Christian Oberschulte, and Rainer
Unland. Refactoring of aspect-oriented software. In 4th
Annual International Conference on Object-Oriented
and Internet-based Technologies,Concepts, and
Applications for a Networked World, 2003.
[12] J. Hannemann. The Aspect Mining Tool web site.
http://www.cs.ubc.ca/labs/spl/projects/amt.html.
[13] Jan Hannemann, Gail Murphy, and Gregor Kiczales.
Role-based refactoring of crosscutting concerns. In
Peri Tarr, editor, Proc. 4th Int’ Conf. on
Aspect-Oriented Software Development (AOSD-2005),
pages 135–146. ACM Press, March 2005.
[14] William Harrison, Harold Ossher, Stanley M. Sutton
Jr., and Peri Tarr. Concern modeling in the concern
manipulation environment. IBM Research Report
RC23344, IBM Thomas J. Watson Research Center,
Yorktown Heights, NY, September 2004.
[15] Doug Janzen and Kris De Volder. Navigating and
querying code without getting lost. In Akşit [1], pages
178–187.
[16] G. Kiczales, E. Hilsdale, J. Hugunin, M. Kersten,
J. Palm, and W. G. Griswold. An overview of
AspectJ. In J. L. Knudsen, editor, Proc. ECOOP
2001, LNCS 2072, pages 327–353, Berlin, June 2001.
Springer-Verlag.
[17] Gregor Kiczales, John Lamping, Anurag Mendhekar,
Chris Maeda, Cristina Lopes, Jean-Marc Loingtier,
and John Irwin. Aspect-oriented programming. In
Mehmet Akşit and Satoshi Matsuoka, editors, 11th
European Conf. Object-Oriented Programming, volume
1241 of LNCS, pages 220–242. Springer Verlag, 1997.
[18] Gregor Kiczales and Mira Mezini. Separation of
concerns with procedures, annotations, advice and
pointcuts. In European Conference on Object-Oriented
Programming, ECOOP 2005, 2005.
[19] Christian Koppen and Maximilian Störzer. PCDiff:
Attacking the fragile pointcut problem. In Kris
Gybels, Stefan Hanenberg, Stephan Herrmann, and
Jan Wloka, editors, European Interactive Workshop on
Aspects in Software (EIWAS), September 2004.
[20] Ramnivas Laddad. Aspect-oriented refactoring, Dec. 2003.
[21] Miguel Pessoa Monteiro. Catalogue of refactorings for AspectJ. Technical Report UM-DI-GECSD-200401,
Universidade Do Minho, 2004.
[22] S. Muggleton and C. Feng. Efficient induction in logic
programs. In S. Muggleton, editor, Inductive Logic
Programming, pages 281–298. Academic Press, 1992.
[23] J. Ross Quinlan. Learning logical definitions from
relations. Machine Learning, 5(3):239–266, August
1990.
[24] Ross Quinlan. Qfoil: the reference foil
implementation. Home page at
http://www.rulequest.com/Personal/, 2005.
[25] Martin P. Robillard and Gail C. Murphy.
Automatically inferring concern code from program
investigation activities. In Proceedings of Automated
Software Engineering (ASE) 2003, pages 225–235.
IEEE Computer Society, 2003.
[26] Davy Suvée, Bruno De Fraine, Wim Vanderperren,
Niels Joncheere, Len Feremans, and Karel Bernolet.
JAsCoCME: Supporting JAsCo in the Concern
Manipulation Environment. In CME BOF at the 4th
International Conference on Aspect-Oriented Software
Development (AOSD 2005), Chicago, IL, USA, March
2005.
[27] Davy Suvée and Wim Vanderperren. JAsCo: An
aspect-oriented approach tailored for component
based software development. In Akşit [1], pages 21–29.
Empirical Analysis of the Evolution of an Open Source System
Andrea Capiluppi
Department of Computing and Informatics, University of Lincoln, Lincoln, UK
[email protected]

Sarah Beecham
Department of Computer Science, University of Hertfordshire, Hatfield, UK
s.beecham@herts.ac.uk

Juan Fernández-Ramil
Computing Department & Centre for Research in Computing, The Open University, Milton Keynes, UK
[email protected]
Abstract
This paper reports on our empirical examination of the Wine emulator of MS Windows. Wine is a large software system with about 2.8 million lines of code to which hundreds of developers have contributed since 1993. The evolution of the Wine system is characterised by strong (superlinear) growth, apparently sustained by a growing community of users and developers. Our measures did not reveal signs of decreasing productivity for a 12-year-old system. Even though only a small group of developers is responsible for the majority of the work, there is still a turnover of the leading contributors, suggesting the presence of contributors’ generations. We present several empirical results which help us to better understand the evolution of open source systems, even though much still remains to be studied.
1. Introduction
The empirical study of evolving software systems is a fascinating topic but this field is still plagued with mysteries. One of these
relates to the observed superlinear growth in a number of open source systems (OSS), contradicting the idea that increasing
complexity [Lehman and Belady 1985] should slow down functional growth and productivity. The growth of software systems over
time or releases can be classified as linear, sublinear, or superlinear trends, depending on the actual shape of the growth trend1
[Robles et al 2005, Smith et al 2005]. Growth trends can be based on counts of lines of code, files, folders, functions, methods, classes
and so on. Previous empirical work on OSS [Godfrey & Tu 2000; Robles et al 2005] has identified the presence of superlinear growth
in possibly up to one third or so of the systems studied. It has been argued that this ‘extraordinary’ behaviour is achievable under
specific circumstances: the Linux kernel, for instance, achieved such impressive functional growth rates due mainly to the presence of
more and more device drivers – e.g. via ‘software cloning’ --, which are not modifying the kernel itself, but rather the system’s
periphery [Godfrey & Tu 2000]. None of these studies, though, appears to have tried to relate this superlinearity to other variables
apart from size, and in particular, with the effort provided by the contributors to the evolution of these systems.
Another mystery relates to the presence of phases in software evolution and what exactly triggers them. In many of the software
systems we have studied so far, one can detect, visually or by other means, whether the trend of a given metric is in fact the
composition of different sub-trends. Often, one takes one metric over time and observes this composition of periods with different
characteristics. The breakpoints between, say, a linear sub-trend and a superlinear one, and in general, the junction points between any
two different sub-trends, can be called punctuations or transitions [Anton and Potts 2001]. These transitions may represent significant
changes in the evolving system or in the process by which it is evolved. In general, these transitions separate two periods of stability
in the evolution of the software. In our previous work [Smith et al 2005], we have identified phases in growth trends, where each
phase is described by a separate pattern. However, we do not have yet any means to predict the sequence of phases or even the
duration of each of such phases.
In this paper we report on our study of the evolution of the Wine OSS system2, both from the point of view of the output produced
(in terms of files – also called modules – created and modified), and the effort (input, in terms of numbers of active contributors
involved). We use simple visualisations of data over time. We also use some numerical methods. In order to model the input-output
relationship, we use an approach developed by us [Ramil 2003, CRESTES] to model numerically the productivity of developers. Our
results suggest that Wine is subject to both superlinearity and to evolutionary phases.
Choosing the Wine system as our case study was due to the fact it is informally recognised as a ‘successful’ system. We still need
to formalise our definition of a successful OSS. In this paper we assume that a successful OSS is supposed to evolve through a
reasonably long period of time, displaying growth and change. It also needs the commitment of constant or increasing number of
developers. Another characteristic of success is that the actual developers can change but the system still maintains its evolution rate, with no signs of significant knowledge of the system and application being lost. Our study of the Wine system confirmed it as a
good example of an established, long-lived system that has undergone changes and growth in evolution. Moreover, this system has
attracted a large community of users and developers, and for a period of its life-cycle it has achieved a continual, superlinear growth.
1 A linear pattern is one in which the rate of growth is approximately constant; superlinear and sublinear patterns are those in which the rate of growth increases or decreases, respectively.
2 Wine is an open source MS Windows emulator, available online at http://www.winehq.com
The empirical study of the issues covered in this paper is important for the software evolution community at large. Unless we are able to measure, observe and model software evolution in a proper scientific way, we will lack a scientific basis to test whether our new processes, methods and tools make any difference in the evolution of real world systems.
This paper is structured as follows: section 2 briefly refers to the related work. Section 3 presents the Wine case study and the
extraction of metric data for this system. Section 4 presents the results. Section 5 presents some conclusions.
2. Related work
The evolution of proprietary software systems has been studied since the 1970s [Kitchenham 1982, Lehman and Belady 1985, Kemerer and Slaughter 1999, FEAST]. More recently, there have already been many studies of the evolution of OSS, facilitated by the public availability of source code and other useful records which enable researchers to dig out some of the secrets of large and complex OSS without the issues of confidentiality imposed by companies. However, many of these OSS studies focus on very specific hypotheses and for this reason it is difficult to see how they could contribute to a common body of knowledge. The commonality across studies has been in the methods (e.g. extraction of data from source code repositories, as in [Mockus et al 2002, Gall et al 2003, German 2003, Hassan and Holt 2003, Rysselberghe and Demeyer 2004]). However, we lack common research questions and comparable, repeatable research outcomes. One notable exception is the issue of superlinearity in OSS, which
has already involved several studies [e.g. Godfrey and Tu 2000, Herraiz et al 2005, Robles et al 2005] and is still a conundrum. In the
present study we use data extraction, visualisation and modelling techniques which we have developed during the last few years in a
number of empirical studies of evolving software [e.g. FEAST, Capiluppi et al 2003, 2004], initially looking at commercial systems,
and more recently at OSS.
3. Wine case study
Wine is a software application which allows Microsoft Windows programs to run on Unix-based machines. The choice of this system as a case study is due to the fact that it is a popular system both in terms of output produced (there is a new official release every month or so) and in the number of developers contributing to the system (according to our data extraction, in total around 800 distinct people have contributed to the evolution of this system). Wine’s current size is around 2.8 million LOC. For this study we extracted data from the ChangeLog files, which record additions and changes made to the system.
3.1 Data extraction from ChangeLog
The ChangeLog file of the Wine system covers the evolution interval analysed in this study, which spans over 12 years, between
July 1993 and June 2005. Perl scripts were written to identify and extract every occurrence of the following items: ‘Author’
(contributor responsible for making a change to system); ‘Change Made’ (the type of activity an author is responsible for); ‘Date’
(day, month and year of change). The field ‘Change Made’ includes: ‘Module’ affected (the name of the file created or directly
modified by a change), ‘Subsystem’ (the name of subsystem every module belongs to). The two types of change made that we
considered in the present study were the creation of an element (a file or a subsystem), and the modification of existing files or
subsystems. We recorded creation as the first date when a given specific element appears in the ChangeLog, and modifications as all
other subsequent entries involving the same element in the ChangeLog.
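As an indication of what this extraction involves, the sketch below (in Python rather than Perl) parses a simplified ChangeLog fragment into (author, month, module) records. The entry layout, the regular expressions and the sample author are assumptions made for illustration only; the real Wine ChangeLog is more irregular and our actual Perl scripts handled many more cases.

import re
from datetime import datetime

# Assumed, simplified entry layout: a date/author header line followed by
# indented lines naming the files (modules) touched.
sample = """\
Mon Jun  6 10:12:00 2005  Jane Doe <jane.doe@example.org>
\t* dlls/kernel/time.c: Fixed a timezone bug.
\t* include/winbase.h: Added missing prototype.
"""

header = re.compile(r"^(\w{3} \w{3}\s+\d+ [\d:]+ \d{4})\s+.*<(.+)>")
change = re.compile(r"^\t\* ([^:]+):")

def parse(changelog_text):
    records, author, month = [], None, None
    for line in changelog_text.splitlines():
        if m := header.match(line):
            date = datetime.strptime(" ".join(m.group(1).split()), "%a %b %d %H:%M:%S %Y")
            month = date.strftime("%Y-%m")
            author = m.group(2)              # use the e-mail id as the author key
        elif author and (m := change.match(line)):
            records.append((author, month, m.group(1)))
    return records

print(parse(sample))
# [('jane.doe@example.org', '2005-06', 'dlls/kernel/time.c'),
#  ('jane.doe@example.org', '2005-06', 'include/winbase.h')]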
After performing the extraction, we arranged the data in a table, which amounted to some 80,000 entries, including new element creations and changes. Each entry has the three fields of interest: ‘Author’, ‘Change Made’ and ‘Date’. Based on our experience with
this type of data, we performed some cleansing. For example, obvious variations of people ids – in this case their email addresses –
were mapped to one unique id. Multiple email ids relating to a single developer were converted into a single email id. We also
removed from the table all missing and ambiguous data. Because we are looking at the relationship between effort and work output –
not at effort or work output in absolute terms –, the deletion of incomplete records should not affect our results. The resulting sample,
after such deletion, should be unbiased with regards to productivity. The resulting table was then stored on a MySQL server for
further processing, as indicated below.
3.2 Deriving metrics via SQL queries
We wrote a set of MySQL queries to count effort and output over the 12 years of the system’s evolution and to derive productivity.
We distinguish here two types of metrics we extracted from the database, namely the effort and the output metrics:
• Effort metrics: we obtained monthly counts of the effort involved in creating new modules and amending existing modules, by counting the number of distinct contributors committing work each month (a sketch of such a query is given after this list). This is, of course, an approximation of the real effort. It is not fully accurate but gives an indicator of the relative effort. This means that we can start making comparisons of productivity between different periods of the life of the same system and even across systems.
• Work-output metrics: the metrics include a set of eight evolution work indicators [Ramil 2003] based on file and subsystem counts. All indicators are implicitly defined as “work achieved over a period of time between t and t+1”, where t is normally measured in months. Depending on the particular system, a module can be defined as a source code file or, alternatively, as an entity at a higher level, between files and sub-systems.
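A query of the kind used for the monthly counts is sketched below; the schema, the table and column names, and the use of SQLite instead of our MySQL server are assumptions made for the sake of a self-contained illustration.

import sqlite3   # stand-in for the MySQL server we actually used

# Assumed schema: changes(author TEXT, date TEXT /* 'YYYY-MM-DD' */, module TEXT)
EFFORT_AND_OUTPUT_PER_MONTH = """
    SELECT strftime('%Y-%m', date) AS month,
           COUNT(DISTINCT author)  AS effort,           -- distinct contributors
           COUNT(*)                AS modules_handled   -- creations + modifications
    FROM changes
    GROUP BY month
    ORDER BY month;
"""

def monthly_metrics(db_path="wine_changes.db"):
    with sqlite3.connect(db_path) as con:
        return con.execute(EFFORT_AND_OUTPUT_PER_MONTH).fetchall()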
4. Results
4.1 Growth and effort: linear or superlinear?
In our analysis of Wine’s evolution we have plotted different metrics. Figure 1 displays some of the most significant. Figure 1 (left)
shows the total number of modules added and modified in a given month. This is a possible measure of work rate which considers
both creations and modification. There is in the figure evidence of a growing work rate with a superlinear trend. Figure 1 (right)
indicates that, despite some temporary reductions in the number of contributors, in the long term the number of contributors working
during a given month has increased. The overall trend is also superlinear. Though there are differences of detail between the two
trends, the general superlinear trend suggests a strong correlation exists between work output and effort applied. Moreover, we argue
that both, a high volume of work and attraction of new developers, are characteristics of a successful evolving system. In order to test
the strength of the correlation between work output and effort, we did some numerical analysis, whose results are summarised in the
next section.
[Figure 1 comprises two plots over time (months, Jan 93 to Oct 06): “Wine - output produced”, showing modules (added + modified) per month, and “Wine - input provided”, showing distinct authors per month.]
Figure 1 – Sample of work-output produced, counted as number of files added and modified per month (left); corresponding sample of effort input, as number of distinct contributor identifiers (right).
4.2 Modelling productivity
For evaluating the strength of the correlation between work output and effort, we applied an approach [CRESTES], which is based
on the observation that software systems tend to evolve through distinguishable phases. Within each of these phases, the relationship
between effort and work output is likely to be stable. According to this approach, software evolution can be divided into:
1. Periods of stable functional growth and change rates
2. Periods of transition associated, for example, to software architectural re-structuring, significant drops or increases in demand for
changes and drastic changes in the effort applied
The approach suggests that stable phases inherent in the development of long-lived software can be characterized by a distinct,
usually linear, productivity. CRESTES measures effort based on an identification of developers ids and by counting how many
developers were committing work each month. It measures work output based on counts of files and subsystems. This is the list of
metrics considered in CRESTES:
− ModulesCreated: = number of new modules (here intended as source files)
− ModulesChanged: = number of modules that have been changed once or more
− ModulesHandled: = number of modules touched, that is, added and/or modified
− ModifHandlings: = sum of all individual changes to all modules of the system
− TotalHandlings: = sum of individual changes to modules, plus number of new modules
− SubsysInclCreations: = number of subsystems (here intended as the folders) with one (or more) new modules
− SubsysChanged: = number of subsystems for which one or more modules were modified
− SubsysHandled: = subsystems with one or more touched (i.e. new or modified) modules
These work output can be derived from many existing historical data sources, such as change log records.
The models consist of 6 expressions which relate Effort(t) and Work(t) over each month t, such that effort becomes related to work by some function f(), in this case linear regression models. Two baselines are also considered. Some of the models in table 1 are univariate, where only one single work measure is used. Others are multivariate, where two work measures are used. The CRESTES models can therefore be used to study the relationship of effort as a function of work achieved over a period of time or release.
Before fitting any models to the data we split our metric set into two groups:
(i) data required for training or calibrating the models, and
(ii) data required to assess the predictive accuracy of the models.
A training set is the subset of data used to calibrate the models. The subset of data used to assess the model’s predictive accuracy is
called the validation set. Training and validation subsets must be different, so that the model accuracy is checked independently of its
calibration. In the Wine case study, we have a sufficiently large dataset (12 years). Hence, we can afford to partition our data into
different subsets for training and validation purposes.
Expression’s Name     Expression
Baseline
  Model 0             e(t) = average actual effort over training set
  Model 00            e(t) = average test effort - (average test ModulesHandled / (average training ModulesHandled / average training effort))
Single variable
  Model 1             e(t) = a1 x ModulesHandled(t) + c1
  Model 2             e(t) = a2 x SubsysHandled(t) + c2
  Model 3             e(t) = a3 x TotalHandlings(t) + c3
Multi-variable
  Model 4             e(t) = a4 x ModulesCreated(t) + b4 x [ModulesChanged(t) – ModulesCreated(t)] + c4
  Model 5             e(t) = a5 x ModulesCreated(t) + b5 x [ModifHandlings(t) – ModulesCreated(t)] + c5
  Model 6             e(t) = a6 x SubsysInclCreations(t) + b6 x SubsysChanged(t) + c6
N.B. The variables ModulesCreated, on the one hand, and ModulesChanged and ModifHandlings, on the other, are
highly correlated because a module is likely to undergo modification close to the time of its initial inception, such as
during the first month of its creation. Hence, in order to ensure positive model parameters, ModulesCreated is
subtracted from the other two variables as in models 4 and 5.
Key: e = estimated effort from t to t+1; a, b & c are constants obtained by fitting the model to the metrics from the training set (a & b reflect productivity over a particular phase and c corresponds to the intercept value, which can be interpreted as an overhead effort: the minimum effort required to run the process that does not yet generate visible work output).
Table 1 – CRESTES productivity models
The analysis consisted of separating the 12-year data into two distinct phases, as visible in Figure 1 (left): the first phase was visually detected between the first creation of the system (July 1993) and July 1998, lasting approximately 5 years. The second phase is from July 1998 to 2005, and lasts for the remaining 7 years. Each of these two phases was in turn divided in two, one part
being a training set for that phase, the other the evaluation set for the phase. Then we evaluated the models using standard measures
such as MMRE, MdMRE and PRED.
MMRE or Mean Magnitude of Relative Error = (Σ [|E(t) – e(t)| /E(t)])/n
MdMRE or Median Magnitude of Relative Error = Median of [|E(t) – e(t)| /E(t)]
PRED(x) = Number of observations in evaluation data set with MMRE equal or lower than x %, typically 10% and 25 %.
where E(t) is the actual effort, e(t) is the estimated effort and n is the number of data points in the evaluation data set.
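The sketch below (in Python, on invented monthly figures) shows how a univariate model of the Model 1 form can be calibrated on the first half of a phase and then scored with these measures on the second half; it is a simplified illustration of the procedure, not the scripts used to produce Table 2.

import statistics

def fit_linear(x, y):
    """Least-squares fit of e(t) = a*x(t) + c (the form of Model 1)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return a, my - a * mx

def accuracy(actual, estimated, threshold=0.25):
    """MMRE, MdMRE and PRED(threshold) over an evaluation set."""
    mre = [abs(E - e) / E for E, e in zip(actual, estimated)]
    return statistics.mean(mre), statistics.median(mre), sum(m <= threshold for m in mre) / len(mre)

# Invented monthly series for one phase: ModulesHandled(t) and actual effort E(t).
modules_handled = [120, 200, 180, 260, 300, 340, 310, 400]
actual_effort   = [10, 16, 15, 20, 24, 27, 25, 31]

half = len(modules_handled) // 2          # first half: training, second half: validation
a, c = fit_linear(modules_handled[:half], actual_effort[:half])
estimates = [a * m + c for m in modules_handled[half:]]
mmre, mdmre, pred25 = accuracy(actual_effort[half:], estimates)
print(f"a={a:.3f} c={c:.2f} MMRE={mmre:.0%} MdMRE={mdmre:.0%} PRED(25)={pred25:.0%}")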
For each of the six models and the two baselines we ran the following experiments:
‘Between Phases’ – calibrating the models on phase 1 and evaluating them on phase 2
‘within Phase 1’ – using the earliest half of the data in phase 1 to calibrate the models, and the second half of the same phase to assess their accuracy
‘within Phase 2’ – similar to ‘within Phase 1’ but for phase 2.
Table 2 presents the results. A preliminary analysis suggests a strong linear correlation between effort and work output ‘within
phases’. In order to see this, we looked at the values of MMRE and MdMRE for the six models ‘within phases’ and found that the
majority are below 30 percent, which is good and is similar to results we have obtained applying the same modelling to commercial
systems [e.g. Ramil 2003, CRESTES]. This can be seen as an indication of a high correlation between effort and work output ‘within
phases’. However, the high error ‘between phases’ suggests significant differences in productivity between phase 1 and phase 2. This
brief analysis of the predictive accuracy of the models suggests that productivity is roughly constant within phases. This is against the
view of ‘increasing complexity’. Under constant productivity and superlinear cumulative effort, superlinear growth is the consequence
and not a surprise! We still need to perform a deeper study of the results in Table 2. A question for further study is to find out why it is possible that the productivity within phases is maintained constant.
4.3 Additional results: skewness in the effort distribution and contributors’ turnaround
Further analysis of the ChangeLog also gave another interesting result: a more detailed study of effort, which permits an understanding of the distribution of applied input per unit of time. The first result that we obtained follows along the lines of skewed effort, such as the results obtained in [Mockus et al 2002]. However, in our case the results are even more restrictive: around 50
contributors, approx. 4 percent of the total number of contributors, are responsible for about 80 percent of the changes and creations
of new modules in the system. One aspect that doesn’t appear in the study performed in [Mockus et al 2002] is how the members of
this ‘inner circle’ change over time. Figure 2 (left) shows, as a summary, the first and the last contribution detected for the 50 most
active developers in the system. Almost none of these developers joined in the development from the very beginning: their active
presence as developers starts at various points in the evolution of the system. Moreover, as seen in Figure 2 (right), the vast majority
of the earliest “n” developers (in the figure n is equal to 50) contributed during a reduced period of time. This data suggests that the
evolution knowledge is not restricted to an almost fixed group of developers who might have been there since the inception of the
system. For the Wine system, the leading contributors have joined not at the very start but later during the evolution of the system,
most of them between 1998 and 2000. This suggests that in successful open source systems there are ‘generations’ of leading contributors, with Wine experiencing its second or even third generation. We do not know how knowledge of the system is successfully passed from generation to generation of developers in OSS. Such ‘generations’ may match the observed productivity phases.
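The concentration figure above can be computed directly from the extracted records; a minimal sketch follows, reusing the record format of the extraction sketch in section 3.1 (hypothetical data, illustration only).

from collections import Counter

def top_contributor_share(records, top_n=50):
    """Fraction of all recorded changes made by the top_n most active contributors.
    `records` are (author, month, module) tuples."""
    per_author = Counter(author for author, _, _ in records)
    top = per_author.most_common(top_n)
    return sum(count for _, count in top) / sum(per_author.values()), [a for a, _ in top]

# Hypothetical usage, on the records produced by the parse() sketch of section 3.1:
# share, inner_circle = top_contributor_share(parse(open("ChangeLog").read()))
# print(f"top {len(inner_circle)} contributors account for {share:.0%} of all changes")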
Model           Experiment        MMRE   MdMRE  PRED(25)  PRED(10)  Total observations
Baseline 1      Between Phases     78%    80%      0%        0%       84
                within Phase 1     36%    40%     20%        0%       30
                within Phase 2     28%    29%     36%       12%       42
Baseline 2      Between Phases     89%    82%     10%        0%       84
                within Phase 1     24%    22%     60%       27%       30
                within Phase 2     18%    16%     81%       40%       42
Univariate 1    Between Phases     49%    42%     33%       18%       84
                within Phase 1     25%    24%     50%        7%       30
                within Phase 2     17%    15%     83%       19%       42
Univariate 2    Between Phases    107%   104%     11%        0%       84
                within Phase 1     14%     9%     80%       53%       30
                within Phase 2     19%    20%     76%       29%       42
Univariate 3    Between Phases     36%    24%     52%        8%       84
                within Phase 1     29%    30%     30%       13%       30
                within Phase 2     19%    18%     71%       19%       42
Multivariate 1  Between Phases     53%    47%     31%       17%       84
                within Phase 1     26%    26%     47%       10%       30
                within Phase 2     17%    15%     81%       19%       42
Multivariate 2  Between Phases     40%    31%     45%       19%       84
                within Phase 1     28%    28%     30%       10%       30
                within Phase 2     20%    18%     67%       17%       42
Multivariate 3  Between Phases     87%    82%      8%        5%       84
                within Phase 1     16%    13%     80%       40%       30
                within Phase 2     20%    20%     60%       26%       42

Table 2: Results of the three model fitting experiments on the Wine system data.
[Figure 2 comprises two plots over developer IDs (dates from Jan 93 to Oct 06): “wine - contribution time of the most active developers” and “wine - contribution time of the earliest developers”, each marking the first and last contribution of each developer.]
Figure 2 – Contribution time of the most active developers (left); contribution time of the earliest developers (right).
4.4 Threats to validity
The present study has a number of threats to validity that need to be acknowledged. For example, effort based on counting different
identifiers contributing per month cannot be translated into person-months because contributors are not full-time in the development
of a system. Our current way of measuring effort in the evolution of systems through person identifiers is very imperfect and we need
to improve this in future studies. In addition, we cannot guarantee the completeness of the Wine ChangeLog records from which we
derived our metrics. This should not greatly affect the results of applying the CRESTES approach – assuming that any missing or
intentionally deleted records for the purpose of data cleansing are not biased with respect to productivity. However, missing records
can affect our other results presented in this paper. With regards to the internal validity one has to be very careful in extrapolating
observed behaviour into the possible future of Wine or any other software system. Wine’s evolution could continue along the observed trends or may experience a transition. We still do not know how to predict such transitions. With regards to the external validity, it is clear that a study of a single system, such as Wine, cannot be generalised to the whole ‘population’ of software
systems. However, some of the observations we have made in Wine have been already made in other systems (e.g. superlinearity,
phases, skewness in the effort distribution, and contributors’ turnaround). In this sense, this study of Wine is helping to build a generic
picture with some interesting attributes of the evolution of very large and successful OSS. Last but not least, we developed our data
extraction tools with care. However, we did not have any means, such as testing, to check whether our tools produce the intended
results apart from checking that the results are sound. This is clearly an aspect that we, and possibly others, will need to improve in
the future in order to achieve more credible results. We plan to address some of the above threats to validity in our future work.
5. Conclusions
In this paper we report some of our findings in the empirical study of the evolution of Wine, an open source system which emulates
the popular MS Windows operating system. Wine has several interesting characteristics which make it relevant for study. This system
has attracted a large community of developers over recent years. Moreover, several of its attributes (size, number of active contributors, cumulative work, productivity) follow a superlinear trend over time. All these trends are highly correlated. Productivity
models used for commercial systems fit surprisingly well an OSS system. Another contribution of this work is the support for the hypothesis that superlinear growth is related to the growing number of distinct developers (active over some specific time interval, such as weekly or monthly) working on the system: the identified first and second phases of the evolution of the Wine system correspond to a different-paced growth in the number of distinct developers, and their constant growth under constant productivity is one of the possible causes of the superlinear functional growth of the system. All these observations are interesting but by no means resolve the many remaining mysteries of software evolution. For example, future research is required in order to understand how and why ‘linear
productivity within phases’ is achieved, apparently in contradiction with the ‘increasing complexity’ [Lehman and Belady 1985] law.
As said at the beginning, the empirical study of software systems is important for the software evolution community at large. We
need to be able to measure, observe and model software evolution in a credible way. Otherwise we will still lack a scientific
basis to evaluate whether our new processes, methods and tools make any difference in the evolution of real world systems. Much
empirical work remains to be done and this paper shows one possible way of conducting it: by extracting data from ChangeLogs,
producing visualisations of trends over time and releases and analysing them using simple models.
6. References
[Anton and Potts 2001] Anton A. and Potts C., Functional Paleontology: System Evolution as the User Sees It, Proc. 23rd International Conference
on Software Engineering, ICSE, Toronto, Canada, May 12-19, 2001, pp. 421 – 430.
[Capiluppi et al. 2003] Capiluppi A., Lago P., Morisio M., Characteristics of Open Source Projects, in the Proceedings of the 7th European
Conference on Software Maintenance and Reengineering, CSMR, Benevento, Italy, March 21-23 2003, pp. 58 – 64.
[Capiluppi et al. 2004] Capiluppi A., Morisio M. & Ramil J.F. (2004) The Evolution of Source Folder Structure in Actively Evolved Open Source
Systems, Proceedings of the 10th International Symposium on Software Metrics, Sept. 11-17, Chicago, Illinois, pp. 2 – 13.
[CRESTES] Continual Resource ESTimation for Evolving Software, UK EPSRC funded project, 2004-5. http://mcs.open.ac.uk/crestes/ (as of
February 2006)
[FEAST] Feedback, Evolution, And Software Technology, UK EPSRC funded projects, 1996 - 2001. http://www.doc.ic.ac.uk/~mml/feast/ (as of
February 2006)
[Gall et al., 2003] Gall H., Jazayeri M., Krajewski J., CVS Release History Data for Detecting Logical Couplings, Proc. International Workshop on
Principles of Software Evolution (IWPSE), Sept. 01-02 2003, Helsinki, Finland, pp. 13 – 23.
[German 2003] German D., Using Software Trails to Rebuild the Evolution of Software, Evolution of Large-scale Industrial Software Applications,
ELISA 2003 Workshop, 24 Sept., Amsterdam, The Netherlands.
[Godfrey and Tu 2000] Godfrey, M., and Tu Q., Evolution in Open Source Software: A Case Study. Proc. of 2000 International Conference on
Software Maintenance (ICSM), October 11-14 2000, pp. 131 – 142.
[Hassan and Holt 2003] Hassan A.H., Holt R.C. (2003), The Chaos of Software Development, IWPSE 2003, Sept. 01 – 02, Helsinki, Finland, pp. 84
– 94.
[Kemerer and Slaughter 1999] Kemerer, C.F., and S. Slaughter. “An Empirical Approach to Studying Software Evolution”. In IEEE Transactions on
Software Engineering, 1999. 25(4): 493-509.
[Lehman and Belady, 1985] Lehman M. M., Belady L. A., 1985, Program Evolution: Processes of Software Change. Academic Press, London.
Available from links at http://w3.umh.ac.be/evol/publications.html (as of February 2006)
[Mockus et al 2002] A. Mockus, R.T. Fielding, J.D. Herbsleb, 2002, Two Case Studies of Open Source Development: Apache and Mozilla. In ACM
Transactions on Software Engineering and Methodology Vol.11, No. 3, 2002, 309-346.
[Ramil 2003] Ramil J.F., Continual Resource Estimation for Evolving Software, Proc. ICSM 2003, 22 - 26 Sept. 2003, Amsterdam, The
Netherlands, pp. 289 – 292.
[Robles et al 2005] Robles G., Amor J.J., Gonzalez-Barahona J., Herraiz I., Evolution and Growth in Large Libre Software Projects, Proceedings
IWPSE 2005, September 5-6 2005, Lisbon, Portugal
[Rysselberghe and Demeyer 2004] Rysselberghe, F Van and Demeyer, S. 2004 "Studying Software Evolution Information By Visualizing the
Change History", Proceedings of the 20th IEEE International Conference on Software Maintenance (ICSM'04)
[Smith et al 2005] Smith N., Capiluppi A., Ramil J.F., 2005, A Study of Open Source Software Evolution Data using Qualitative Simulation, Journal
of Software Process: Improvement and Practice, 2005.
Evolvability as a Quality Attribute of Software Architectures∗
Selim Ciraci, Pim van den Broek
Software Engineering Group
Faculty of Electrical Engineering, Mathematics and Computer Science
University of Twente
PO Box 217
7500 AE Enschede
The Netherlands
Email: {s.ciraci, pimvdb}@ewi.utwente.nl
Abstract— We review the definition of evolvability as it appears in the literature. In particular, the concept of software evolvability is compared with other system quality attributes, such as adaptability, maintainability and modifiability.
Keywords: Software evolvability, Software evolution, Quality Attributes.
I. INTRODUCTION
In recent years, the IT industry has faced the problem of evolving its software products in order to stay on the market and to compete with similar products. For software systems to stay on the market, they must incorporate new requirements and adapt to a changing environment. However, today's market does not allow companies to spend a long time changing their products. This has created a need for designs that can withstand and easily accommodate new requirements and changes, which has put the focus on evolvability as a software quality.
Two trends have been identified to allow software systems
to become evolvable: component exchangeability and increase
in component distance [2]. Architecting software products has
allowed the designers to divide the system in consideration
into components. These components exchange messages or
request services of other components by means of connections
between them. This decoupling of systems at the architecture
level reflected itself into detailed design stages like Object
Oriented design. In that case, components become classes
and connections between components become inheritance,
message passing and so on. Besides making complex systems easier to understand, this decoupling of components enables exchangeability: system architects can easily replace a component in the architecture of a system by a better one.
In code level design, this change is reflected by changes in
the inheritance hierarchy, replacing a class by a newer one
and updating dynamically linked libraries [2]. According
to [2], development and improvement of networking technologies also contributed to evolvability of software systems.
*This work has been carried out as a part of the DARWIN project under the
responsibilities of the Embedded Systems Institute. This project is partially
supported by the Netherlands Ministry of Economic Affairs under the Bsik
program.
Networking allowed different components to be designed to
work at different entities in a networking environment and
these components exchanged messages by means of remote
procedure calls (RPCs) and sockets. Thus a component can
easily be changed or upgraded without affecting other components running on other entities. In summary, decoupling complex systems into components has allowed software systems to become evolvable. However, decoupling brought other problems, such as keeping the communication interface between components stable so that a change to one component does not affect the others. Studies have shown that evolution and maintenance are the longest and most expensive phases of the software life-cycle, which has drawn further research attention to evolvability.
Lehman et al. [8] distinguish two different uses of the term evolution: as a noun and as a verb. The first and largest group of researchers focuses on the question of "how" to effectively and reliably evolve a software system; this includes theories, abstractions, languages and methods. This group is considered to be using the term evolution as a verb. The second group uses the term evolution as a noun and focuses on the question of "what": investigating and learning the properties of software evolution. We think that evolvability research belongs to this group, since it asks the question "what is evolvable?".
In this paper, we present our view of the term evolvability
and the operations in software evolution. In the next section the
definition of evolvability is presented. Section 3 explains the
reason behind considering evolvability as a quality attribute.
In section 4, evolvability and some other quality attributes
are compared in order to depict where evolvability stands.
In the last section, conclusions and some research topics are
provided.
II. DEFINITION OF EVOLVABILITY
In this section, we present the definition used in the literature for the term evolvability and then we present our
definition of evolvability which explains our scope for the
term. Evolution first appeared in the software engineering literature in the 1970s with the study conducted in [3]. In that study the authors tried to measure the complexity, size, cost and maintenance of the OS/360 operating system, using the source code of 20 of its releases. All measures showed an increasing
trend, which led the authors to compose the five laws of
software evolution: Continuing Change, Increasing Complexity, Fundamental Law of Program Evolution, Conservation of
Organizational Stability and Conservation of Familiarity.
Since then software evolution has been used to describe
the ”long and broad view of change in software systems”
[4]. Thus, from this definition of software evolution, the term
evolvability is defined as: ”the capability of software products
to be evolved to continue to serve its customer in a cost
effective way” [4]. Although this definition gives a common
ground on evolvability, it doesn’t describe the scope of changes
that are meant by the term evolution.
To address this problem, we begin constructing our evolvability definition by identifying the changes that cause
systems to evolve. Currently, three sources of evolution have
been identified [14]:
• Domain: covers the model of the real world considered
by the system, i.e., the environment. Any change in that
model may force the system to change.
• Experience: The users of the system gain experience over time and may come up with suggestions for the system, which in turn may cause the system to evolve.
• Process: includes the organizations and methods that may
also impact the system and cause it to change.
Considering these sources, we describe evolution as changes
in a system’s environment (domain), requirements (experience)
and implementation technologies (process). Then we define
evolvability as a system’s ability to survive changes in its
environment, requirements and implementation technologies.
It is important to notice here that this definition of evolution
modifies the original focus presented in [3] to include the
evolution that may occur during the initial development of
the system, since these sources of changes may also occur
during initial development. For example, during development
of the initial system, new implementation technologies may
be developed which may cause changes in the requirements
of the system.
III. EVOLVABILITY AS A QUALITY ATTRIBUTE
Most of the research on software evolution is focused on
analyzing the properties of evolution on the source code level.
The majority of these studies try to capture the properties of
software evolution by analyzing the changes on the source
code in release cycles of software systems. The research on
this field has mainly considered size (number of modules) as a
principal measure for evolvability [5]. However, studies show
that using different metrics may result in different distributions. Kemerer and Slaughter [1] list some of these studies and
continue with conducting time, sequence and gamma analysis
on two different software systems. An important observation
from this study is that these software systems start their
evolution cycle with similar activities, such as addition of new
modules.
Besides helping us understand the properties of software evolution, such empirical studies may also provide
estimates on the future changes that the source code of a
software system is going to face and predict the cost of these
changes, such as the number of module additions in the next
release and the cost of adding these. However, the problem
with these estimates is that they do not provide information
on how evolvable the initial system is. The system may be
designed without considering the changes, so that adding or
removing components from it may be very costly. Thus, we
believe that the research on evolution should raise the level
of abstraction so that systems are designed in a way that they
can withstand changes. In other words, evolvability should be
a non-functional requirement of a system.
The IEEE 1061 standard [9] defines software quality as the
degree to which the software system fulfills a selected group
of attribute requirements. In [7] a quality attribute is defined
as a non-functional characteristic of a component or a system.
Since evolvability is a non-functional requirement of a system,
it can also be considered to be a characteristic of the system;
thus one can conclude that evolvability is a quality attribute.
Bennet and Rajlich [6] also mention the importance of raising
the abstraction level and point out two research topics on this
subject:
• Architecting systems in a way that they allow changes without damaging the integrity of the system
• Constructing architectures which can be evolved in a controllable way.
To measure how evolvable a system is, it is desirable to
have a mechanism that evaluates the system at high levels of
abstraction. Currently, there are methods that can estimate how
a system meets certain quality requirements and we believe
that some of these methods can be adapted to measure evolvability. For example, evaluation techniques based on scenarios
can be easily adapted to evaluate designs with respect to
evolvability; SAAM [11] may be specialized to work with
evolution scenarios and ATAM [12] can be used to measure
the trade-off between evolvability and other quality attributes.
Although scenario-based techniques may supply valuable information for many quality attributes, they may not be very useful for quality attributes that deal with future changes. This is because the scenario generation process is limited to the generators' view of the future. For example, when evaluating with respect to evolvability, many evolvability scenarios may be missed by the scenario generators (most of the time the stakeholders), which may result in wrong judgments about the current architecture. A model-based evaluation technique may therefore be more suitable for evolvability, though a great
deal of work has to be conducted in order to find metrics for
evolvability.
Currently, the ISO/IEC 9126 standard [10] derives metrics for evolvability based on the goal-question-metric (GQM) approach [4],
[13]. The steps that are taken during an evolution request act
as the goal (e.g., analyze the current system). Then, for each
goal a set of questions is generated and for each question a
metric is associated (e.g., what is the time required to find the changes that need to be made in order to accommodate change XYZ in the requirements?). Finally, from these metrics the architecture's average response to an evolution request is evaluated, which in turn may give some insight into the
evolvability of the current architecture.
IV. EVOLVABILITY AND OTHER QUALITY ATTRIBUTES
This section of the paper tries to depict where evolvability
stands with respect to other quality attributes that deal with
”changes” in a system and tries to distinguish these attributes
from evolvability. In the literature, most of the time the term
evolution is used with maintenance, and evolvability is used
to mean maintainability or modifiability. This is because the changes that evolution refers to are usually not identified; since, in this paper, we do identify these changes, we should also provide a means of distinguishing evolvability from other quality attributes. Before going into the details of comparing
these attributes, we first present their definitions.
The ISO/IEC 9126 standard [10] defines maintainability as
the set of attributes that have a bearing on the effort needed
to make specified modifications. These modifications include
corrections, improvements and adaptations to the changing
environment [7]. Modifiability is defined as the ability to make
changes quickly and cost effectively [10]. These changes include addition of new requirements (extensibility), deleting unwanted capabilities and portability. Evans and Marciniak [15]
define adaptability as the ease with which software satisfies
differing system constraints and user needs.
As it can be seen from the definitions of these quality
attributes, it is difficult to distinguish evolvability from them;
however, by considering the laws of evolution the difference
between them becomes clearer. The tasks included in adaptability and maintainability, for example, disobey the increasing
complexity law, since when a system is corrected or adapted
to another environment, the complexity of the system does
not change, although the complexity of carrying out these operations on the system may be very high.
We think that the definition given for modifiability is too
broad; the changes included for adaptability and maintainability can easily be fitted to modifiability. This is also true with
our definition of evolvability. For this, we view modifiability
as a superset of all quality attributes that deal with changes in
a system. Then one can easily say that a modifiable system
is also evolvable; we think further attention has to be paid in
order to understand the relation between the evolvability and
modifiability quality attributes.
V. CONCLUSIONS AND FUTURE WORK
In the literature, most of the research on evolvability focuses
on source code level evolvability analysis; though, we believe
that evolvability should be considered while designing the
initial system. For this, we will develop techniques that can
evaluate architectures with respect to evolvability. In order to
achieve this goal, we first need to define evolvability and in
this paper we present our definition of evolvability.
Our next step, towards pursuing the goal of finding techniques that can evaluate architectures with respect to evolvability, is identifying the operations involved in evolvability
and conducting empirical analysis on these operations. To do
so, we are going to use the architecture of a system that has
been evolving for years. This empirical analysis study is going
to be similar to the ones conducted on the source code level;
however, it is going to allow us to understand the evolution at
the architecture level. For example, from this study we may
see a relationship for the number of addition operations done
over time; we can use this relationship to estimate how many
component additions will be made in the next release.
Then we are going to focus on identifying metrics for
evolvability so that we can reason about the evolvability of
an architecture. For example, let us assume that the number
of connections to a component is our metric for evolvability.
Obviously, removing or making additions to a component with
many connections would be very costly. Furthermore, if the
architecture is composed of such components then the system
would not be very evolvable.
REFERENCES
[1] C. F. Kemerer and S. Slaughter: An Empirical Approach to Studying
Software Evolution. IEEE Trans SE 25(4) pp 493-509 (1999).
[2] C. Ler, D. Rosenblum, and A. van der Hoek: The Evolution of Software Evolvability, International Workshop on the Principles of Software
Evolution: 131-134 (2001)
[3] L.A. Belady and M.M. Lehman: A model of large program development,
IBM Sys. J. vol. 15, no. 1, pp. 225-252 (1976).
[4] S. Cook, H. Ji and R. Harrison: Software evolution and evolvability.
Technical Report, University of Reading, UK (2000)
[5] M.M. Lehman and L.A. Belady, Program evolution - processes of
software change. London: Academic Press (1985).
[6] K. H. Bennet, V. T. Rajlich: Software Maintenance and Evolution: a
Roadmap. International Conference on Software Engineering. Proceedings of the Conference on the Future of Software engineering, pp. 73-87
(2000).
[7] L. Dobrica, E. Niemelä: A Survey on Software Architecture Analysis
Methods. IEEE Trans. Software Eng. 28(7): 638-653 (2002).
[8] M M Lehman, J F Ramil and G Kahen, Evolution as a Noun and
Evolution as a Verb, Workshop on Software and Organisation Coevolution, Imp. Col., London (2000).
[9] IEEE Standard 1061-1992, Standard for Software Quality Metrics
Methodology, New York: Institute of Electrical and Electronics Engineers
(1992).
[10] Int'l Organization for Standardization and Int'l Electrotechnical Commission: Information Technology – Software Product Evaluation – Quality Characteristics and Guidelines for Their Use, ISO/IEC 9126 (1991).
[11] R. Kazman, L. Bass, G. Abowd, and M. Webb: SAAM: A Method for
Analyzing the Properties of Software Architectures. Proc. 16th Int’l Conf.
Software Eng., pp. 81-90 (1994).
[12] R. Kazman, M. Klein, M. Barbacci, H. Lipson, T. Longstaff, and S.J.
Carriere: The Architecture Tradeoff Analysis Method. Proc. Fourth Int’l
Conf. Eng. of Complex Computer Systems (1998).
[13] V. R. Basili, G. Caldiera and H. D. Rombach: Goal Question Metric
Paradigm. In: ”Encyclopedia of Software Engineering”, Volume 1, pp.
528-532, (1994).
[14] D.E. Perry: Dimensions of Software Evolution. Proceedings Conf. Software Maintenance (1994).
[15] M. W. Evans and J. Marciniak: Software Quality Assurance and Management. New York, NY: John Wiley & Sons, Inc. (1987).
[16] T. Mens, J. Buckley, M. Zenger and A. Rashid: Towards a Taxonomy of
Software Evolution. International Workshop on Unanticipated Software
Evolution, Warsaw, Poland (2003).
GT-VMT 2004 Preliminary Version
Software Evolution from the Field:
An Experience Report from the Squeak Maintainers
Marcus Denker 2,3
Software Composition Group
University of Bern, Switzerland

Stéphane Ducasse 1,3,4
LISTIC
Université de Savoie, France
Keywords: Squeak, open-source, Maintenance, Software Evolution, Tool support
Abstract
Over the last few years, we actively participated in the maintenance and evolution of
Squeak, an open-source Smalltalk. The community is constantly faced with the problem of
enabling changes while at the same time preserving compatibility. In this paper we describe
the current situation and the problems the community has faced, and we outline the solutions that have been put in place. We also identify some areas where problems continue to exist and propose these as potential problems to be addressed by the research community.
1 Introduction
Over the last few years, we actively participated in the development and maintenance of the Squeak open-source project [10]. We were responsible for the 3.7 and 3.9 official releases and participated in the releases 3.6 and 3.8. During this activity we faced the typical situations that developers face daily: bloated code, lack of documentation, and tension between the desire to improve the code and the need to provide an unchanging basis for all developers. Furthermore, Squeak suffers from
1 Email: [email protected]
2 Email: [email protected]
3 Denker and Ducasse gratefully acknowledge the financial support of the Swiss National Science Foundation for the projects “Tools and Techniques for Decomposing and Composing Software” (SNF Project No. 2000-067855.02, Oct. 2002 - Sept. 2004) and “RECAST: Evolution of Object-Oriented Applications” (SNF Project No. 620-066077, Sept. 2002 - Aug. 2006).
4 Ducasse gratefully acknowledges the financial support of the French National Research Agency (ANR) for the project “Cook: Rearchitecturing Object-Oriented Applications 2005-2008”.
This is a preliminary version. The final version will be published in
Electronic Notes in Theoretical Computer Science
URL: www.elsevier.nl/locate/entcs
the typical problems of a real open-source project: lack of man-power, distributed
individuals and lack of long term commitment.
In this paper we present the typical problems that the Squeak community has
been facing and describe the solutions that have been put in place. Moreover we
sketch future approaches we would like to put in place. Finally we analyze the open
problems and propose these as future research topics that may be of interest to the
research community on software evolution.
First we present Squeak [10] by providing some measurements that characterize
it and its evolution. Then we shed some light on the philosophy of Squeak by
describing the current forces in the Squeak community and we present the overall
development process. We outline the common problems the community has been
faced with and the solutions it has adopted both from a software engineering and
from a more process oriented point of view. Subsequently we present the open
problems that still exist. We analyze the opportunities we missed retrospectively
and sketch the approach we envision to improve the situation.
2 Squeak
While the appearance of Squeak may mislead a novice user, Squeak is a really large
and complex system containing more than 1600 classes and 32,000 methods in its
latest public release (3.8 basic). Squeak includes support for different application
domains:
• two large graphical user-interface frameworks, Morphic and MVC,
• a complete IDE, including an incremental compiler, debugger and several advanced development tools,
• a complete language core and all the libraries enabling it (both simple and complex) including concurrency abstractions,
• multimedia support including: images, video, sound, voice generation,
• eToy [1], an advanced scripting programming environment for children,
• various libraries such as compression, encryption, networking, XML support.
All this code is part of the main Squeak distribution. In addition to that, there
is a growing number of external packages available on the SqueakMap repository
(over 600 packages), thus showing the vitality of the Squeak community.
2.1 Measurable Facts
Table 1 shows some metrics to give an idea of the size and growth of Squeak. The Squeak community realized that the system was beginning to contain too much unneeded functionality. Hence, in Squeak 3.6 certain functionality such as the web browser and email clients was externalized. The numbers for version 3.9 are preliminary, as more code will be removed before the distribution is released. Furthermore, an extensive amount of infrastructural work has been introduced: a new language feature (Traits) [13], a versioning system (Monticello), a registration mechanism (Services), and an abstraction layer for the UI framework (ToolBuilder).

Release      Year   Classes   Methods   LOC      TestCase classes / methods   Introduced functionality
3.5 Full     2003   1811      41408     322656   20 / 158
3.6 Basic    2003   1338      33277     246752   0 / 0
3.7 Basic    2004   1544      35526     261315   133 / 1180                   Tests
3.8 Basic    2005   1659      37952     281694   148 / 1426                   Internationalization support
3.9a Basic   2006   2003      44506     321731   201 / 2122                   Traits, Monticello, Services, ToolBuilder

Table 1
Overall Squeak growth.
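Counts of this kind can be obtained directly from a running image with a few reflective queries; the following is a minimal sketch (the figures in Table 1 were of course taken on the specific Basic and Full distributions, so numbers will vary with the loaded packages):

"Counting classes and methods in the current image."
| classCount methodCount |
classCount := Smalltalk allClasses size.
methodCount := Smalltalk allClasses
    inject: 0
    into: [:sum :cls | sum + cls selectors size + cls class selectors size].
Transcript
    show: classCount printString , ' classes, ';
    show: methodCount printString , ' methods';
    cr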
To understand the Squeak development dynamics we have to consider its open-source nature and also to analyze not only its growth, but the actual changes taking
place. Indeed some actions (addition and removal of functionality) may result in
the numbers of methods and classes remaining constant. One good starting point
when considering the evolution of Squeak is to consider the approximate number
of patches that have been integrated between releases. These patches vary in size
and complexity, so the number can only be an indication of change:
Version   3.5   3.6   3.7   3.8   3.9
Patches   10    240   560   600   n/a

Table 2
Number of patches applied.
2.2 Communities and Different Agendas
The Squeak.org distribution is used by a wide range of diverse projects. Some of
them have their own open-source communities, others are closed research or commercial projects. Here we list the most important projects that are based on the
common distribution.
Squeakland. The main focus of this distribution is educators in primary schools. The main interface is the Etoy system [1]. Squeakland is widely distributed in Japan and the US. The end user does not interact directly with Squeak but with the Etoy layer. Code quality, ease of extension, untangling and tested code are not the main focus of development.
SmallLand. It is a community similar in scope to Squeakland: the children are older, and the project directly manages an installed base of its own Squeak distribution on 80,000 PCs in schools in Spain. SmallLand has been completely localized in Spanish: one of the largest contributions of this work was the framework for translating Squeak to other European languages, a change that affected a large percentage of the code base. A lot of changes to the graphical user interface were made to support a better desktop. They are concerned with code quality.
Seaside. It is a framework for developing sophisticated dynamic web applications.
Seaside combines an object-oriented approach with a continuation-based one.
Seaside offers a unique way to have multiple control flows on a page, one for
each component [5]. This community is concerned with the robustness and scalability of the libraries and the tool support. The design and the ease of extending the system are also among their concerns. This community developed a good package and version control system (Monticello), which is used in the latest version of Squeak.
Croquet. It is a 3D shared multi-user environment. Users on different machines
can use Croquet to explore a three dimensional space together, collaborate and
share. Croquet uses a new Peer-to-Peer model for synchronization, allowing the
system to function without needing central servers.
Tweak. Tweak is a next generation implementation of a UI and scripting system
based on the lessons learned with both Morphic and EToys. Tweak is used in
Croquet and other projects, but is not part of the main Squeak distribution. It
may or may not be integrated in the main distribution. Tweakers are concerned
with code quality but in general are writing layers on top of or beside the core.
We have learnt over the years that there are two main driving forces, and this has led us to coin the phrase ‘egocentric syndrome’: frequently one group of developers within the community wants to have problems that affect its interests fixed
but at the same time demands that no other changes should be made for fear of
breaking existing code.
2.3 The Squeak Development Process
As with a number of open-source projects, the Squeak community is an open-source movement that lacks real financial support or a sponsoring organization. From its
inception to the release 3.4, Squeak was mainly developed by SqueakCentral, a
single group of researchers around Alan Kay. From 3.4 to 3.8, the development
community grew and became geographically distributed over many developers and
small teams.
The general process is the following: the developers produce fixes or enhancements that are sent to the mailing-list and are collected in a central database. Then,
the group of maintainers responsible for a given release look at the changes and
decide whether to integrate them in the current release.
3 Common Problems
In this section, we elaborate on some code level problems that face the Squeak
community. We are aware that such problems are neither exceptional nor specific to
Squeak. Our goal is to document them for researchers who focus on the area of
software evolution. The following section shows the solutions that have been applied and an evaluation of the obtained results.
Tangled Code. Historically, and as with any original Smalltalk implementation,
Squeak was developed as one single system. The fact that subsystem boundaries
were not explicit (for example by means of a proper package system) leads to a
system with a lot of unneeded (and often surprising) dependencies. For example,
the compiler depends on the GUI 5 . These dependencies hamper modularisation
and evolution: there is a high risk of introducing bugs, as a change in one part of
the system can break another unrelated part.
Dead Code and Prototype Code. Squeak was originally intended as an environment for prototyping new ideas. Very little effort was spent on refactoring [8]
and it was never systematically cleaned. Often the experiments and their extensions have remained in the system resulting in dead code and often incomplete
functionalities.
Evolution Dilemma. The previous problems are minor compared to the evolution
dilemma. As multiple stakeholders exist for Squeak, they require that it is a
stable basis for development. However at the same time, those same projects
also carry out refactorings to improve and bugfix the base system, and by doing so they generate instability.
Thus the evolution dilemma is to preserve stability, while at the same time
ensuring that it is possible to make changes.
4 Software Engineering Solutions
One way to solve the problem of evolution is by using well-known, proven software
engineering techniques or object-oriented design heuristics. In this section, we give
examples of how these techniques were advantageous for Squeak. A challenging
problem with most software engineering solutions is that they imply refactoring the
system towards a better, more evolvable architecture. Hence we need to change the
system to facilitate future changes.
4.1 Deprecation
Deprecation mechanisms are a well known technique to allow for gradual adoption
of new interfaces. The idea is that all public methods that are no longer needed (e.g.,
because of adopting a new, different API) are not deleted immediately. Instead, a
call to the deprecation method is added with the indication to the user that another
method should be used:
Month>>eachWeekDo: aBlock
    self deprecated: 'Use #weeksDo:'.
    self weeksDo: aBlock
This method raises a warning in debug mode by calling the #deprecated: method, but allows the original method to execute when in production. To provide better feedback to the programmer, the execution stack is traversed to extract the method sender information. In Squeak, deprecated methods are retained for
one major release: methods that were deprecated in the development of 3.8 will be removed in 3.9.

5 This is because the code predates Exceptions.
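To give a feel for how such a mechanism can be realised, the following is a minimal sketch of a #deprecated: method installed on Object; it is illustrative only and not the actual Squeak code, which handles the debug/production distinction and the presentation of the warning in more detail.

Object>>deprecated: anExplanation
    "Minimal sketch: walk the execution stack to report the deprecated
     method and its caller, then let execution continue."
    | deprecatedContext callerContext |
    deprecatedContext := thisContext sender.     "the deprecated method itself"
    callerContext := deprecatedContext sender.   "the client that still calls it"
    Transcript
        show: 'deprecated: ' , deprecatedContext printString;
        show: ' called from ' , callerContext printString;
        show: ' -- ' , anExplanation;
        cr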
Version              3.6   3.7   3.8   3.9
Deprecated methods   100   104   24    39

Table 3
Number of deprecated methods.
The usage of the deprecation mechanism decreased over time (Table 3). This
could indicate one of two things: it may be an indication that a better mechanism
is needed or that the system is in fact stabilizing.
Next Steps. It is not possible to tag an entire class as deprecated. Another
disadvantage is that the mechanism as a whole is very coarse-grained: evolution is
happening all the time and is not restricted to major releases. Deprecation does not
scale towards short cycles of change. In the future, the authors would like to explore
how refactorings could be stored and then be available for automatic replay when
a deprecated method is called. We also want to address the issue of deprecation of
classes.
4.2 Divide and Conquer: Modularizing the System
A large effort is underway to modularize the system. The latest 3.9 alpha version is
composed of 49 packages (excluding tests) with an average of 40 classes per package.
Next Steps. The modularization was not driven by advanced analysis and having
more tool support for that effort is definitely an important step [14].
4.3 Registration Mechanisms
Modularization requires that a system be divided into units to be able to be built
from a subset of its actual components. In earlier versions of Squeak there were
many cases where simple configuration required that the code be changed: for example, when adding a tool, the code that built the menu needed to be extended to
handle the new tool. The community successfully applied Transform Conditional to Registration [4], i.e., registration mechanisms were introduced that allow new services to be registered without having to modify existing code. This considerably reduced the friction between different tools and facilitated the dynamic loading and unloading of tools.
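As an illustration of the pattern only (the class and message names below are invented for this sketch and are not the actual Services API), a small registry lets a newly loaded tool announce itself instead of requiring an edit to the menu-building code:

Object subclass: #ToolRegistry
    instanceVariableNames: ''
    classVariableNames: 'Tools'
    poolDictionaries: ''
    category: 'Sketch-Registration'

ToolRegistry class>>register: aToolClass label: aLabel
    "Called from the new tool's package when it is loaded."
    Tools ifNil: [Tools := Dictionary new].
    Tools at: aLabel put: aToolClass

ToolRegistry class>>toolsAndLabelsDo: aBlock
    "Used by the menu builder instead of a hard-coded list of tools."
    Tools ifNil: [^ self].
    Tools keysAndValuesDo: [:label :toolClass | aBlock value: label value: toolClass]

Adding or removing a tool then only touches that tool's own package.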
4.4 New Abstractions
Squeak’s multiple widget frameworks proved to overcomplicate the building and
maintaining of tools. There were interdependencies between MVC and Morphic, and some features were only available in one framework. The solution the community adopted
was to build an abstraction layer, named ToolBuilder, representing the common
framework elements and to rewrite the tools to use the ToolBuilder framework,
which encapsulates the GUI framework used.
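To illustrate the idea only (ExampleBrowser and the build... messages below are invented for this sketch and are not the actual ToolBuilder API), a tool describes its user interface once in framework-neutral terms and a per-framework builder interprets that description:

ExampleBrowser>>buildWindowWith: aBuilder
    "The same description can be handed to a Morphic or an MVC builder;
     only the builder knows which concrete widgets to create."
    ^ aBuilder
        buildWindowLabeled: 'Example Browser'
        with: {
            aBuilder buildListOn: self items: #classList selected: #selectedClass.
            aBuilder buildTextOn: self getText: #sourceCode }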
4.5 Refactorings
A lot of small refactorings were carried out over recent years, leading to iterative improvements. Some large refactorings were undertaken as well, such as a complete rewrite
of the network subsystem. Large refactorings were possible when the changes did
not cross-cut a lot of packages but were localized in one package.
4.6 Enabling Changes: Tests
Over the last years unit testing has gained a lot of recognition, as it facilitates
change [2]. Tests support the developers' ability to change code, as they can quickly
identify the code that they broke as a result of their changes. Over the last releases,
test coverage increased, with over 6 times more tests in 3.7 than in 3.6, and nearly
doubling in later versions (see Table 4).
Version           3.5   3.6                    3.7    3.8    3.9
Number of tests   171   0 (external package)   1124   1347   1957

Table 4
Number of tests.
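The tests counted above follow the standard SUnit form; a minimal (and deliberately trivial) example of the kind of safety net they provide:

TestCase subclass: #ExampleTest
    instanceVariableNames: ''
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Tests'

ExampleTest>>testCopyIsIndependent
    "A change that accidentally breaks copy semantics is caught immediately."
    | original copy |
    original := OrderedCollection with: 1 with: 2.
    copy := original copy.
    copy add: 3.
    self assert: original size = 2.
    self assert: copy size = 3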
Next Steps. Having tests is a first step towards enabling changes and documenting the system in a synchronized way. The next step is to assess the quality of tests
using test coverage tools.
5 Process Solutions
Besides strictly technical solutions, process-oriented approaches have been adopted.
The community started to adopt a number of process improvements such as better
tools and real bug tracking. Other promising improvements, for example an automated build system, will be implemented in the future.
5.1 Better Packaging Tools
Squeak development used to be based on change sets: simple lists of modified or
added methods and class definitions. Changesets can be saved into text files and sent around (e.g., via email). However, as soon as multiple developers are involved and need to stay synchronized, changesets do not scale: resolving conflicts is very tedious and the turnaround time for integration is very long.
In contrast, many of today's development systems use elaborate versioning systems like CVS [6]. This inspired the commercial sub-community around Seaside to develop a versioning system called Monticello, which uses a powerful merge algorithm, and a server, SqueakSource, that projects can use for managing and storing
their code base in a distributed setting.
Next Steps. Having first class packages and a better integration of packages at
the level of the tools is desirable.
5.2 Bug tracking
Earlier releases of Squeak were developed without any provision for bug-tracking.
Naturally the Squeak community were faced with the problem of managing bugs:
users did discover bugs, developers offered fixes, some of which were included in
the next release. But there was no way to track how many bugs had been identified, to monitor whether or not they had been fixed, or to assess the general overall state of Squeak. This situation has been improved with huge success: over
2500 bugs have been reported using Mantis [11], a web-based bug tracking system.
5.3 Future improvements
We have learned from experience that we should have invested more time and effort
in tools that would support the development process of Squeak. Our goal now is to
actively address this.
Automatic Build tools. Currently, the task of merging changes is labor-intensive
and time-consuming. A goal is to leverage an automatic build system to eliminate all the tedious work that currently faces the maintainers. Automatic build systems are a well-known technique and will be a valuable addition to the set of
tools for Squeak development. Integration with an automatic test server is another major aspect of the build system that will have to be tackled.
Automatic test runs. Unless tests are run frequently, they are not very useful. Currently executing the entire testsuite takes well over 10 minutes, thus discouraging
developers from running tests. We propose to solve this with a server that runs all the tests automatically, at least once a day, and sends reports to the developers. Such a server is currently in development; a minimal sketch of such an automated run is given after this list.
Even better tools. The current tools represent an improvement in the development
process of Squeak, but they need to be improved even further: e.g., the response
time of Monticello is slow for large code bases.
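A minimal sketch of the automated test run mentioned above, using the SUnit classes already present in the image (the scheduling, history tracking and e-mail reporting of the real server are left out):

"Run every test class in the image and print a summary to the Transcript."
| result |
result := TestResult new.
TestCase allSubclassesDo: [:testClass |
    testClass buildSuite run: result].
Transcript show: result printString; cr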
6 Language Design Dreams
Apart from our involvement in the maintenance and evolution of Squeak, we are
primarily language design researchers. We try to step back from our research background and avoid mixing experiments in language design with robust, scalable solutions.
While we remain skeptical that the complex problem of enabling smooth but fast evolution of a large system can be solved by just adding a couple
of new language features, we think that new language features are a very promising
field for future research on software evolution: How can a language [12] support
change and evolution?
Here is a list of challenges for the software evolution researchers.
Better support for modularity. While Squeak now has a package mechanism, it
does not have a real module system. Packages are only deployment time and
code management artefacts. Squeak packages do not include any notion of visibility (contrary to Java packages) [3]. It would be interesting to evaluate how a
module system, by providing a scoping mechanism, could help to better structure
the system.
History as a first-class object. Squeak is a reflective language [7]: e.g., classes and methods are objects, and the system has a complete, object-oriented description of its own structure, available for introspection and change (see the short introspection example after this list).
We would like to understand the costs of extending this reflective model to take into consideration the data that is important for evolution: the history of the system should be represented as a first-class entity [9]. Having first-class history should make it possible to ask questions such as: Why did this change? What else changed when this method was changed? When did this test break for the first time? Which change affected the performance of the system?
Beyond Deprecation. Deprecation, as described earlier, does not really solve the
evolution problems the Squeak community faces: the problem is that some clients may want to migrate to a new release, but only incrementally, for certain parts of their applications. If we have the complete history of the system available, do we really need to force clients to use the latest version only? Would it be possible to have different clients using different versions of the same components at the same time? Operating systems offer the possibility to specify which library version to use. Can such a model be realized at the level of a
programming language so that multiple versions are able to co-exist?
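The short introspection example referred to above: the structure of the system is already available as ordinary objects, and it is this model that we would like to extend with history.

"Classes and methods are objects and can be queried like any other object."
OrderedCollection superclass.             "the class OrderedCollection inherits from"
OrderedCollection selectors size.         "how many methods the class defines"
(OrderedCollection >> #add:) getSource.   "the source code of one of its methods"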
7 Conclusion
The evolution of Squeak, with its diverse community of developers and open issues (i.e., decentralized development, the need for modularization), is a challenging task. The Squeak community has achieved a lot of progress over the last years, such as a first cut at modularizing the system and the increase in unit tests. However, apart from environment enhancements such as Monticello, few tools have been built to analyze the code base of Squeak. We have identified the need for such tools to help improve the system.
In addition, over the last few years, we have experienced the problems of having
to introduce changes while at the same time supporting stability for existing users.
We highlight the fact that there is no language support for the notion of evolution in
today's programming languages. Current languages are only capable of describing
one version of the system. Scoping changes and allowing multiple versions while
retaining a simple language is an open challenge that the research community is
facing.
Acknowledgments.
We gratefully acknowledge the financial support of the Swiss National Science Foundation for the project “RECAST: Evolution of Object-Oriented Applications” (SNF 2000-061655.00/1) and of the French ANR for the project “Cook: Rearchitecturing Object-Oriented Applications”. The authors would like to thank all the Squeak developers, as well as Orla Greevy and Adrian Lienhard for their reviews of early versions of this paper.
References
[1] B.J. Allen-Conn and Kimberly Rose. Powerful Ideas in the Classroom. Viewpoints
Research Institute, Inc., 2003.
[2] Kent Beck. Extreme Programming Explained: Embrace Change. Addison Wesley,
2000.
[3] Alexandre Bergel, Stéphane Ducasse, and Oscar Nierstrasz. Analyzing module
diversity. Journal of Universal Computer Science, 11(10):1613–1644, 2005.
[4] Serge Demeyer, Stéphane Ducasse, and Oscar Nierstrasz. Object-Oriented Reengineering Patterns. Morgan Kaufmann, 2002.
[5] Stéphane Ducasse, Adrian Lienhard, and Lukas Renggli. Seaside — a multiple control
flow web application framework. In Proceedings of ESUG Research Track 2004,
pages 231–257, September 2004.
[6] Karl Fogel and Moshe Bar. Open Source Development with CVS. Coriolis, 2001.
[7] Brian Foote and Ralph E. Johnson. Reflective facilities in Smalltalk-80. In Proceedings OOPSLA ’89, volume 24, pages 327–336, October 1989.
[8] Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts.
Refactoring: Improving the Design of Existing Code. Addison Wesley, 1999.
[9] Tudor Gîrba. Modeling History to Understand Software Evolution. PhD thesis,
University of Berne, Berne, November 2005.
[10] Dan Ingalls, Ted Kaehler, John Maloney, Scott Wallace, and Alan Kay. Back to the
future: The story of Squeak, A practical Smalltalk written in itself. In Proceedings
OOPSLA ’97, ACM SIGPLAN Notices, pages 318–326. ACM Press, November 1997.
[11] Mantis. http://www.mantisbt.org/.
[12] Oscar Nierstrasz and Marcus Denker. Supporting software change in the programming
language, October 2004. OOPSLA Workshop on Revival of Dynamic Languages.
[13] Nathanael Schärli, Stéphane Ducasse, Oscar Nierstrasz, and Andrew Black. Traits:
Composable units of behavior. In Proceedings ECOOP 2003 (European Conference
on Object-Oriented Programming), volume 2743 of LNCS, pages 248–274. Springer
Verlag, July 2003.
[14] Daniel Vainsencher. Mudpie: layers in the ball of mud. Computer Languages, Systems
& Structures, 30(1-2):5–19, 2004.
ERCIM 2006
Aspect-orientation for revitalising legacy
business software
Kris De Schutter, Bram Adams
{Kris.DeSchutter,Bram.Adams}@UGent.be
Ghislain Hoffman Software Engineering Lab, INTEC
Ghent University, Belgium
Abstract
This paper reports on a first attempt to see if aspect-oriented programming (AOP) can help
with the revitalisation of legacy business software. By means of four realistic case studies
covering reverse engineering, restructuring and integration, we discuss the applicability of
the aspect-oriented paradigm in the context of two major programming languages for such
environments: Cobol and C.
Key words: AOP, LMP, legacy software, evolution.
1 Introduction
This paper addresses the question of whether aspect-oriented programming (AOP)
techniques [6] can help with the revitalisation of legacy business software. AOP
is an emerging paradigm, leveraging two key principles: quantification and obliviousness [4]. The first allows one to express non-localised behaviour in a localised
way. The latter means one can apply such behaviour to any existing application without any special preparation of that application. When considering legacy applications, which
resist change [1], this seems a useful property for a re-engineering tool to have.
Whether this philosophical notion also holds in practice has, however, not been
tested against realistic legacy software yet. This is most likely due to the lack of
instantiations of AOP for these environments.
The authors have developed AO extensions for the two major programming
languages encountered in legacy business software: Cobol and C. These extensions
enable quantification and obliviousness by making use of logic meta-programming
(LMP) [10] in their pointcut language. This was found to be an adequate solution to
overcome a lack of reflection in Cobol and C, and to allow the generic definition of
behaviour [11]. This paper will now take these tools and see if they can be applied
to four realistic cases for the revitalisation of legacy software.
This paper is electronically published in
Electronic Notes in Theoretical Computer Science
URL: www.elsevier.nl/locate/entcs
 1  static FILE* fp;
 2
 3  Type around tracing (Type) on (Jp):
 4      call(Jp,"^(?!.*printf$|.*scanf$).*$")
 5      && type(Jp,Type) && !is_void(Type)
 6  {
 7      Type i;
 8
 9      fprintf (fp, "before ( %s in %s )\n", Jp->functionName, Jp->fileName);
10
11      i = proceed ();
12
13      fprintf (fp, "after ( %s in %s )\n", Jp->functionName, Jp->fileName);
14      return i;
15  }

Fig. 1. A generic tracing aspect (excerpt).
2 Enabling dynamic analyses of legacy software
In order to help legacy systems evolve, one needs a thorough understanding of the
systems at hand. As in these environments there is most often a lack of (up-to-date) documentation, one is forced into applying reverse engineering techniques.
Dynamic analyses offer one approach to this, by analysing the dynamic run-time
behaviour of systems [5,12]. The role for AOP here is to easily enable such techniques by applying some tracing aspect to existing applications.
2.1 A generic tracing aspect
In figure 1, we have shown part 1 of a generic tracing aspect written in Aspicere 2. Aspects are encapsulated in plain compilation units able to hold advice constructs. Advice itself features a signature (line 3), a pointcut (lines 4–5) and a body (lines 7–15). The advice body is written in C, with some additions for accessing the runtime context (the Jp variable on lines 9 and 13, and the proceed call on line 11). A simple template mechanism is also available to help overcome C’s relatively weak typing.
The idea is to trace calls to all procedures except for the printf- and scanf-families (line 4) and to stream output into a file (fp, declared on line 1) before and after each call (lines 9 and 13). Opening and closing of the file pointer of line 1 is achieved by advising the main-procedure 3. The return type of the advised procedure call is bound on line 5. This binding is then used in the advice’s signature (line 3) and as a type parameter in its body (line 7). This way, the tracing advice is not limited to one particular type of procedure. The well-known thisJoinPoint construct from AspectJ-like languages can also be accessed, through a join-point-specific binding (Jp on line 3), and used as such (lines 9 and 13).
1 We do not show advice for void procedures, as these are equivalent to the advices shown, less the need for a temporary variable to hold the return value.
2 Website: http://users.ugent.be/~badams/aspicere/.
3 Not shown here to conserve space.
2.2 Problem: the build system
As source code is the most portable representation of C programs across several
platforms, Aspicere relies on a source-to-source weaving strategy, and as such acts
as a preprocessor to a normal C compiler. More specifically, it transforms aspects
into genuine C compilation units by converting the advices into (multiple) procedures. This enables the normal C visibility rules in a natural way, i.e. the visibility of fp in figure 1 is tied to the module containing the aspect. To accomplish this
modularisation, we need to link this single transformed aspect into each advised
application. Because the original makefile hierarchy drives the production of object files, libraries and executables, using a myriad of other tools and preprocessors
(e.g. embedded SQL), and all of these potentially process advised input, it turns
out that Aspicere’s weaver crosscuts the makefile system. We therefore need to find
out what is produced at every stage of the build and unravel accompanying linker
dependencies. In case all makefiles are automatically generated using, for instance,
automake, one could try to replace (i.e. alias) the tools in use by wrapper scripts
which invoke the weaving process prior to calling the original tool. The problem
here is that this is an all-or-nothing approach. It may be that in some cases weaving
is needed (e.g. a direct call to gcc), and in others not (e.g. when gcc is called
from within esql). Making the replacement smart enough to know when to do
what is not a trivial task.
In [11], we applied the tracing aspect of figure 1 to a large case study (453
KLOC of ANSI-C) to enable dynamic analyses. The system consisted of 267
makefiles, not all of which were generated. Without intimate knowledge of the
build system, it was hard to tell whether source files were first compiled before
linking all applications, or (more likely) whether all applications were compiled
and linked one after the other. As such, our weaving approach was not viable. As
an ad hoc solution, we opted to move the transformed advice into the advised base
modules themselves. This meant that we had to declare fp as a local variable of
the tracing advice, resulting in huge run-time overhead due to repeated opening
and closing of the file.
2.3 Conclusion
Applied to reverse-engineering contexts, the use of AOP, LMP and a template
mechanism allows non-invasive and intuitive extraction of knowledge hidden inside
legacy systems, without prior investigation or exploration of the source code [11].
One does not have to first extract all available types and copy the tracing advice for
all of them, as was experienced in [3].
While dynamic analyses can be enabled in this way without the need to prepare
the source code of legacy applications in any way, one is still faced with having to
prepare the build system for these applications (once). As many such applications
rely on custom defined and sometimes complex makefile hierarchies (or similar),
any real use of AOP for revitalising legacy software will depend on a solution to
this problem.
3 Mining business rules in legacy software
When implemented in software, business knowledge, information and rules tend to
be spread out over the entire system. With applications written in Cobol this is even
more the case, as Cobol is a language targeted at business processing 4 but without
modern day modularity mechanisms. This information then tends to get lost over
time, so that when some maintenance is required one is again forced into reverse
engineering. We argue that AOP can provide a flexible tool for such efforts.
We will now revisit a case from [7], in which Isabel Michiels and the first author
discuss the possibility of using dynamic aspects for mining business rules from
legacy applications. The case, put briefly, is this:
“Our accounting department reports that several of our employees were accredited an unexpected and unexplained bonus of 500 euro. Accounting rightfully
requests to know the reason for this unforeseen expense.”
We will now revisit this case, showing the actual advices which may be used to
achieve the ideas set forth in that paper.
We start off by noting that we are not entirely in the dark. The accounting
department can give us a list of the employees who got “lucky” (or unlucky, as their unexpected bonus did not go unnoticed). We can encode this knowledge
as facts:
META-DATA DIVISION.
FACTS SECTION.
    LUCKY-EID VALUE 7777.
    LUCKY-EID VALUE 3141.
    *> etc.
Furthermore, we can also find the definition of the employee file which was being
processed, in the copy books (roughly similar to header files in C):
DATA DIVISION.
FILE SECTION.
FD  EMPLOYEE-FILE.
01  EMPLOYEE.
    05 EID PIC 9(4).
    *> etc.
Lastly, from the output strings we can figure out the name of the data item holding
the total value. This data item, BNS-EUR, turns out to be an edited picture. From
this we conclude that it is only used for pretty printing the output, and not for
performing actual calculations. At some time during execution the correct value
for the bonus was moved to BNS-EUR, and subsequently printed. So our first task is to find out which variable that was. We go about this by tracing all moves to BNS-EUR,
but only while processing one of our lucky employees:
1  FIND-SOURCE-ITEM SECTION.
2      USE BEFORE ANY STATEMENT
3          AND NAME OF RECEIVER EQUAL TO "BNS-EUR"
4          AND BIND LOC TO LOCATION
5          AND IF EID EQUAL TO LUCKY-EID.
6  MY-ADVICE.
7      DISPLAY EID, ": ", LOC.

4 Cobol = Common Business Oriented Language
In short, this advice states that before all statements (line 2) which have BNS-EUR
as a receiving data item (line 3), and if EID (id for the employee being currently
processed; see data definition higher up) equals a lucky id (runtime condition on
line 5), we display the location of that statement as well as the current id. Amongst
several string literals (which we can therefore immediately disregard) we find a
variable named BNS-EOY, whose name suggests it holds the full value for the end-of-year bonus.
Our next step is to figure out how the end value was calculated. We set up
another aspect to trace all statements modifying the variable BNS-EOY, but again
only while processing a lucky employee. We do this in three steps. First:
1  TRACE-BNS-EOY SECTION.
2      USE BEFORE ANY STATEMENT
3          AND NAME OF RECEIVER EQUAL TO "BNS-EOY"
4          AND BIND LOC TO LOCATION
5          AND IF EID EQUAL TO LUCKY-EID.
6  MY-ADVICE.
7      DISPLAY EID, ": statement at ", LOC.
Before execution of any statement (line 2) having BNS-EOY as a receiving data
item (line 3), and when processing a lucky employee (line 5), this would output the
location of that statement. Next:
1  TRACE-BNS-EOY-SENDERS SECTION.
2      USE BEFORE ANY STATEMENT
3          AND NAME OF RECEIVER EQUAL TO "BNS-EOY"
4          AND BIND SENDING TO SENDER
5          AND BIND SENDING-NAME TO NAME OF SENDING
6          AND IF EID EQUAL TO LUCKY-EID.
7  MY-ADVICE.
8      DISPLAY SENDING-NAME, " sends ", SENDING.
This outputs the name and value for all sending data items (lines 4 and 5) before
execution of any of the above statements. This allows us to see the contributing
values. Lastly, we want to know the new value for BNS-EOY which has been
calculated.
TRACE-BNS-EOY-VALUES SECTION.
    USE AFTER ANY STATEMENT
        AND NAME OF RECEIVER EQUAL TO "BNS-EOY"
        AND IF EID EQUAL TO LUCKY-EID.
MY-ADVICE.
    DISPLAY "BNS-EOY = ", BNS-EOY.
We now find a data item (cryptically) named B31241, which is consistently
valued 500, and is added to BNS-EUR in every trace. Before moving on we’d
like to make sure we’re on the right track. We want to verify that this addition of
B31241 is only triggered for our list of lucky employees. Again, a dynamic aspect
allows us to trace execution of exactly this addition and helps us verify that our basic assumption indeed holds. We start by recording the location of the “culprit”
statement as a usable fact:
META-DATA DIVISION.
FACTS SECTION.
    CULPRIT-LOCATION VALUE 666.
    *> other facts as before
The test for our assumption may then be encoded as:
TRACE-BNS-EOY-SENDERS SECTION.
    USE BEFORE ANY STATEMENT
    AND LOCATION EQUAL TO CULPRIT-LOCATION
    AND IF EID NOT EQUAL TO LUCKY-EID.
MY-ADVICE.
    DISPLAY EID, ": back to the drawing board.".
This tests whether the culprit statement gets triggered while processing any of
the other employees. If it does, then something about our assumption is wrong. Or
it may be that the accounting department has missed one of the lucky employees.
Given the verification that we are indeed on the right track, the question now
becomes: why was this value added for the lucky employees and not for the others?
Unfortunately, the logic behind this seems spread out over the entire application.
So to try to figure out this mess we would like to have an execution trace of each
lucky employee, including a report of all tests made and passed, up to and including
the point where B31241 is added. Dynamic aspects allow us to get these specific
traces. First, some preliminary work:
WORKING-STORAGE SECTION.
01 FLAG PIC 9 VALUE 0.
   88 FLAG-SET     VALUE 1.
   88 FLAG-NOT-SET VALUE 0.
The FLAG data item will be used to indicate when tracing should be active and
when not. For ease of use we also define two “conditional” data items: FLAG-SET
and FLAG-NOT-SET. These reflect the current state of our flag. Our first advice
is used to trigger the start of the trace:
1   TRACE-START SECTION.
2       USE AFTER READ STATEMENT
3       AND NAME OF FILE EQUAL TO "EMPLOYEE-FILE"
4       AND BIND LOC TO LOCATION
5       AND IF EID EQUAL TO LUCKY-EID.
6   MY-ADVICE.
7       SET FLAG-SET TO TRUE.
8       DISPLAY EID, ": start at ", LOC.
I.e., whenever a new employee record has been read (lines 2 and 3), and that record
is one for a lucky employee (line 5), we set the flag to true (line 7). We also do
some initial logging (line 8). The next advice is needed for stopping the trace when
we have reached the culprit statement:
TRACE-STOP SECTION.
    USE AFTER ANY STATEMENT
    AND LOCATION EQUAL TO CULPRIT-LOCATION.
MY-ADVICE.
    SET FLAG-NOT-SET TO TRUE.
    DISPLAY EID, ": stop at ", LOC.
Then it is up to the actual tracing. We capture the flow of procedures, as well as
execution of all conditional statements:
TRACE-PROCEDURES SECTION.
    USE AROUND PROCEDURE
    AND BIND PROC TO NAME
    AND BIND LOC TO LOCATION
    AND IF FLAG-SET.
MY-ADVICE.
    DISPLAY EID, ": before ", PROC, " at ", LOC.
    PROCEED.
    DISPLAY EID, ": after ", PROC, " at ", LOC.

TRACE-CONDITIONS SECTION.
    USE AROUND ANY STATEMENT
    AND CONDITION
    AND BIND LOC TO LOCATION
    AND IF FLAG-SET.
MY-ADVICE.
    DISPLAY EID, ": before condition at ", LOC.
    PROCEED.
    DISPLAY EID, ": after condition at ", LOC.
From this trace we can then deduce the path that was followed from the start of
processing a lucky employee, to the addition of the unexpected bonus. More importantly, we can see the conditions which were passed, from which we can (hopefully) deduce the exact cause.
This is where the investigation ends. For those curious, we refer to the original
paper for the solution [7]. Whatever the cause of the problem, AOP+LMP provided
us with a flexible and powerful tool to perform our investigation.
4 Encapsulating procedures
In [9], Harry and Stephan Sneed discuss creating web services from legacy host
programs. They argue that while tools exist for wrapping presentation access and
database access for use in distributed environments,
“accessing [...] the business logic of these programs, has not really been solved.”
In an earlier paper, [8], Harry Sneed discusses a custom tool which allowed the
encapsulation of Cobol procedures, to be able to treat them as “methods”, a first
step towards wrapping business logic. Part of that tool has the responsibility of creating a switch statement at the start of the program, which performs the requested
procedure, depending on the method name.
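For readers more at home in C than in Cobol, the kind of dispatching code such a tool generates at the start of the program can be pictured with the following minimal sketch; all names are hypothetical, and the real tool of course generates Cobol, not C:

#include <stdio.h>
#include <string.h>

/* Hypothetical wrapped "paragraphs" of the legacy program. */
static void calc_bonus(void)   { /* ... original business logic ... */ }
static void print_report(void) { /* ... original business logic ... */ }

/* Generated dispatcher: performs the procedure matching the method name. */
int dispatch(const char *method_name)
{
    if (strcmp(method_name, "CALC-BONUS") == 0) {
        calc_bonus();
    } else if (strcmp(method_name, "PRINT-REPORT") == 0) {
        print_report();
    } else {
        fprintf(stderr, "unknown method: %s\n", method_name);
        return 1;  /* no matching procedure: flag an error */
    }
    return 0;
}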
4.1 A basic wrapping aspect
Figure 2 shows how encapsulation of procedures (or “business logic”) can be achieved, in a generic way, using AOP and LMP. The aspect shown here, written in
Cobble, consists of two advices.
The first advice, DISPATCHING (lines 1–7), takes care of the dispatching. It
acts around the execution of the entire program (line 2), and once for every paragraph in this program (line 3). The latter effect is caused by the ambiguity of
the PARAGRAPH selector, which can match any of a number of paragraphs. Rather than
just picking one, what Cobble does is pick them all: the advice gets activated for every possible solution to its pointcut, one after the other (figure 2).
 1  DISPATCHING SECTION.
 2      USE AROUND PROGRAM
 3      AND BIND PARA TO PARAGRAPH
 4      AND BIND PARA-NAME TO NAME OF PARA
 5      AND IF METHOD-NAME EQUAL TO PARA-NAME.
 6  MY-ADVICE.
 7      PERFORM PARA.
 8
 9  ENCAPSULATION SECTION.
10      USE AROUND PROGRAM.
11  MY-ADVICE.
12      PERFORM ERROR-HANDLING.
13      EXIT PROGRAM.

Fig. 2. Aspect for procedure encapsulation.
Furthermore, the DISPATCHING advice will only get triggered when METHOD-NAME matches the
name of the selected paragraph (extraction of this name is seen on line 4). This is
encoded in a runtime condition on line 5. Finally, the advice body, when activated,
simply calls the right paragraph (PERFORM statement on line 7).
The second advice, ENCAPSULATION (lines 9–13), serves as a generic catch-all. It captures execution of the entire program (line 10), but replaces this with a
call to an error handling paragraph (line 12) and an exit of the program (line 13).
The net effect is that whenever the value in METHOD-NAME does not match any
paragraph name in the program, the error will be flagged and execution will end.
This, together with the first advice, gives us the desired effect.
We are left with the question of where METHOD-NAME is defined, and how
it enters our program. The answer to the first question is simply this: any arguments which get passed into a Cobol program from the outside must be defined in
a linkage section. I.e.:
LINKAGE SECTION.
01 METHOD-NAME PIC X(30) VALUE SPACES.
Furthermore, the program division needs to declare that it expects this data item as
an input from outside:
PROGRAM DIVISION USING METHOD-NAME.
This begs the question as to how this input parameter METHOD-NAME was inserted
in an AOP-like way. Simply: it was not. We tacitly assumed our aspect, and the
accompanying input parameters, to be defined inside the target program (a so-called
“intra-aspect”). Of course, for a truly generic “inter-aspect” we need to remedy
this. Definition of the METHOD-NAME data item is no big problem. We can simply
define it within an aspect module, which, upon weaving, would augment the target
program (modulo some alpha-renaming to prevent unintended name capture):
IDENTIFICATION DIVISION.
ASPECT-ID. PROCEDURE-WRAPPING.

DATA DIVISION.
LINKAGE SECTION.
01 METHOD-NAME PIC X(30) VALUE SPACES.
From this, it becomes pretty obvious that METHOD-NAME should be used as an
input parameter of the base program. The concept of a linkage section makes no
sense for an external aspect module, as an aspect will never be called in such a way.
The hard part lies with the semantics of declaring extra input data items on another
program. What do we expect to happen?
•  Does the introduction of an input data item by the aspect replace existing input items in the advised program, or is it seen as an addition to them?
•  If it is added to them, then where does it go into the existing list of inputs? At the front? At the back?
•  What happens when multiple aspects define such input items? In what order do they appear?
•  How do we handle updating the sites where the woven program gets called? The addition of an extra input item will have broken these.
Consider the C/Java/... equivalent of this: what does it mean to introduce new parameters on procedures/methods? More to the point, should we allow this?
4.2 An extended wrapping aspect
The complexity of the problem increases when we consider another important feature of Sneed’s tool (ignored until now):
“For each [encapsulated] method a data structure is created which includes all
variables processed as inputs and outputs. This area is then redefined upon a
virtual linkage area. The input variables become the arguments and the output
variables the results.” [8]
Put another way, we must find all data items on which the encapsulated procedures depend. These are then gathered in a new record (one per procedure), which
redefines a “virtual linkage area” (in C terms: a union over all newly generated
typedefs). This linkage area must then also be introduced as an input data item of
the whole program. Such a requirement seems far out of the scope of AOP. While
it has a crosscutting concern in it (cfr. “for each method”), this concern can not be
readily defined using existing AOP constructs.
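To make the quoted requirement a little more concrete in the C terms just mentioned, the following minimal sketch (hypothetical record names and fields) shows one struct per encapsulated procedure, gathering the data items that procedure depends on, overlaid on a single shared storage area by means of a union:

/* Data items on which a hypothetical procedure CALC-BONUS depends. */
struct sliced_calc_bonus {
    char eid[4];
    char bns_eoy[8];
};

/* Data items on which a hypothetical procedure PRINT-REPORT depends. */
struct sliced_print_report {
    char eid[4];
    char bns_eur[10];
};

/* The "virtual linkage area": one shared block, redefined per procedure
   and sized by its largest member, which is the role the redefined
   storage area plays in the generated Cobol. */
union virtual_linkage_area {
    struct sliced_calc_bonus   calc_bonus;
    struct sliced_print_report print_report;
};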
Figure 3 therefore shows a different approach to the problem. It is encoded neither in Cobble nor in Aspicere, opting for a different view on the AOP+LMP equation.
Whereas the previous examples were based on LMP embedded in AOP, figure 3
is based on embedding AOP in LMP, similar to the approach in [2]. The code
can be read as follows. Whatever you find enclosed in curly brackets ({. . . }) is
(aspect-)code which is to be generated. This can be further parameterized by placing variables in “fishgrates” (<. . . >), which will get expanded during processing.
Everything else is Prolog, used here to drive the code generation.
Let us apply this to the code in figure 3. Lines 1 and 2 declare the header of
our aspect, while lines 4–6 define the linkage section as discussed before.
 1  { IDENTIFICATION DIVISION.
 2    ASPECT-ID. PROCEDURE-WRAPPING.
 3
 4    DATA DIVISION.
 5    LINKAGE SECTION.
 6    01 METHOD-NAME PIC X(30) VALUE SPACES. },
 7
 8  findall(
 9    [Name, Para, Wss],
10    ( paragraph(Name, Para),
11      slice(Para, Slice),
12      wss(Slice, Wss)
13    ),
14    AllInOut
15  ),
16
17  max_size(AllInOut, VirtualStorageSize),
18  { 01 VSPACE PIC X(<VirtualStorageSize>). },
19
20  all(member([Name, Para, Wss], AllInOut), (
21    { 01 SLICED-<Name> REDEFINES VSPACE. },
22    all( (record(R, Wss), name(R, RName)), (
23      clone_and_shift(R, "<RName>-<Name>", SR),
24      { <SR> }
25    ))
26  )),
27
28  { PROGRAM DIVISION USING METHOD-NAME, VSPACE.
29    DECLARATIVES. },
30
31  all(member([Name, Para, Wss], AllInOut), (
32    { WRAPPING-FOR-<Name> SECTION.
33      USE AROUND PROGRAM
34      AND IF METHOD-NAME EQUAL TO "<Name>".
35      WRAPPING-BODY.
36    },
37    all( (top_record(R, Wss), name(R, RName)),
38      { MOVE <RName>-<Name> TO <RName>. }
39    ),
40    { PERFORM <Name>. },
41    all( (top_record(R, Wss), name(R, RName)),
42      { MOVE <RName> TO <RName>-<Name>. }
43    )
44  )),
45
46  { ENCAPSULATION SECTION.
47    USE AROUND PROGRAM.
48    MY-ADVICE.
49    PERFORM ERROR-HANDLING.
50    EXIT PROGRAM.
51    END DECLARATIVES. }

Fig. 3. Full procedure encapsulation.
Lines 8–15 calculate all slices (slice/2 on line 11) for all paragraphs (paragraph/2
on line 10). From each of these we extract the working-storage section (wss/2
on line 12), which gives us the required in- and output parameters, collected in
AllInOut (line 14). From this we extract the size of the largest one (max_size/2
on line 17) which is used next in the definition of the virtual storage space (line 18).
Next, for each paragraph (i.e. for each member of AllInOut), we generate a redefinition of the virtual space to include all data items on which that paragraph
depends (lines 20–26). The redefinition can be seen on line 21, where it is given
a unique name (i.e. SLICED-paragraph-name). Its structure is defined by going over all records in the working-storage section for that paragraph (line 22),
cloning each record under a new, unique name while updating the level number
(line 23), and then outputting this new record (line 24). This concludes the data
definition. Next, the procedure division is put down, declaring the necessary parameters (line 28). We then generate advices similar to those in figure 2, but now
they need to perform some extra work. First, they must transfer the data from the
virtual storage space as redefined for the paragraph, to the original records defined
for the program (lines 37–39). The original paragraph may then be called without
worry (line 40). Afterwards, the calculated values are retrieved by moving them
back to the virtual storage space, again as redefined for the paragraph (lines 41–
43). All that is left is the generic catch-all (lines 46–50), and the closing of the
aspect (line 51).
Despite the inherent complexity of the problem, AOP+LMP allowed us to write
down our crosscutting concern with relative ease. LMP was leveraged to define
our aspect by reasoning over the program. AOP was leveraged to tackle the actual weaving semantics, unburdening us from writing program transformations.
Granted, we quite happily made use of a slicing predicate to do most of the hard
work (line 11). Still, the use of libraries which hide such algorithms is another
bonus we can get from LMP.
5 Year 2000 syndrome
The Y2K-bug is probably the best-known example of unexpected change in legacy
systems. It is important to understand that at the heart of this was not a lack of
technology or maturity thereof, but rather the understandable failure to recognize
that code written as early as the sixties would still be around some forty years later.
So might AOP+LMP have helped solve the problem? The problem statement
certainly presents a crosscutting concern: whenever a date is accessed in some
way, make sure the year is extended.
This presents our first problem: how do we recognize data items for dates in
Cobol? While Cobol has structured records, and stringent rules for how data is
transferred between them, they carry no semantic information whatsoever. Knowing which items are dates and which are not, requires human expertise. The nice
thing about LMP is that we could have used it to encode this. In C, where a disaster is expected in 2038 (hence Y2K38; more details on http://www.merlyn.demon.co.uk/critdate.htm), the recognition problem is less serious
because of C’s more advanced typing mechanisms. A date in (ANSI-)C could be
built around the standard time provisions (in “time.h”), or otherwise some (hopefully sensibly named) custom typedef. In the former case, recompiling the source
code on a system using more than 32 bits to represent integers solves everything
immediately. Whereas all variables in Cobol have to be declared in terms of the
same, low-level Cobol primitives, C allows variables to be declared as instances of
user-defined types. In this sense, the latter case (custom date type) represents much
less of a problem. The check for a date would be equivalent to a check for a certain
type.
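A minimal, purely hypothetical C illustration of the two cases just mentioned: a date built on the standard time provisions is handled by recompiling with a wide enough time_t, while a custom date type turns "is this a date?" into a type check that a tool can perform:

#include <time.h>

/* Case 1: dates built on the standard provisions of <time.h>.
   Recompiling with a 64-bit time_t pushes the overflow far beyond 2038. */
time_t last_login;

/* Case 2: a (hopefully sensibly named) custom date type.  Recognizing
   date variables now amounts to recognizing uses of this typedef. */
typedef struct {
    int year;   /* four digits, so no century ambiguity */
    int month;
    int day;
} app_date_t;

app_date_t hire_date;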
Second problem for Cobol: given the knowledge of which data items carry
date information, how do we know which part encodes the year? It may be that
some item holds only the current year, or that it holds everything up to the day.
A data item may be in Gregorian form (i.e. “yyddd”) rather than standard form
(“yymmdd”). Of course, that “standard” may vary from locale to locale (the authors would write it as “ddmmyy”). But again, we could use LMP to encode this
knowledge.
Let us assume we can check for data items which hold dates, and that these have
a uniform structure (in this case "yymmdd"). Then we might write something like:
1   AN-YYMMDD-FIX SECTION RETURNING MY-DATE.
2       USE AROUND SENDING-DATA-ITEM
3       AND SENDING-DATA-ITEM IS DATE.
4   MY-ADVICE.
5       MOVE PROCEED TO MY-DATE(3:8).
6       IF MY-DATE(3:4) GREATER THAN 50 THEN
7           MOVE 19 TO MY-DATE(1:2)
8       ELSE
9           MOVE 20 TO MY-DATE(1:2).
This advice has two problems. One is the definition of MY-DATE (referred to as
a return value on line 1, and assumed to have a “yyyymmdd” format). In Cobol,
all data definitions are global. Hence, MY-DATE is a unique data item which gets
shared between all advices. While this is probably safe most of the time, it could
lead to subtle bugs whenever we have nested execution of such advice. 6 The same
is true for all advices in Cobble. It is just that the need for a specific return value
makes it surface more easily. Of course, in this case, the fix would be to require duplication of this data item for all advice instantiations. The greater problem lies in
the weaving. When committed to a source-to-source approach, as we are with Cobble, weaving anything below the statement level becomes impossible. As Cobol
lacks the idea of functions 7, we cannot replace access to a data item with a call to
a procedure (whether advice or the original kind) as we could do in C. The remedy
for this would be to switch to machine-code weaving, but we are reluctant to do so,
as we would lose platform independence. Common virtual machine solutions (e.g.
as with ACUCobol) are not widespread either.

6 Though not in this case, as the structure of the advice body only refers to the data item after the PROCEED statement.
7 Functions can be written in later versions of Cobol. Our focus on legacy systems, however, rules these out for use here.
6 Conclusion and Future Work
We discussed restructuring and integration problems using four issues related to
(classic) legacy software, and showed how three of these might be aided through a
mixture of AOP and LMP. Reverse engineering based on tracing in C and business
rule mining in Cobol went smoothly, employing LMP as a pointcut mechanism in
AOP. Encapsulation of procedures in Cobol, a typical legacy integration scenario,
required a more generative approach embedding AOP in LMP.
As for the Y2K restructuring problem, the semantics of Cobol, especially its
lack of typing, present too much of a limitation. In C, the Y2K38 problem can still
be managed reasonably, precisely because it does feature such typing. Other legacy
languages will likely exhibit the same behavior.
All in all, AOP+LMP proves a useful, flexible and strong tool to tackle the ills of
legacy software, limited only by the base language’s typing support. More elaborate
case studies are needed to provide more feedback about other restructuring and
integration problems, and the general necessity of the AOP-in-LMP approach.
References
[1] K. Bennett. Legacy systems: Coping with success. IEEE Software, 12(1), 1995.
[2] J. Brichau, K. Mens, and K. De Volder. Building composable aspect-specific
languages with logic metaprogramming. In GPCE, 2002.
[3] M. Bruntink, A. van Deursen, and T. Tourwé. An initial experiment in reverse
engineering aspects. In WCRE. IEEE, 2004.
[4] R. E. Filman and D. P. Friedman. Aspect-oriented programming is quantification and
obliviousness. In Aspect-Oriented Software Development. Addison-Wesley, 2005.
[5] A. Hamou-Lhadj, E. Braun, D. Amyot, and T. Lethbridge. Recovering behavioral
design models from execution traces. In CSMR. IEEE, 2005.
[6] G. Kiczales. Aspect-oriented programming. In Proceedings of the Eighth Workshop
on Institutionalizing Software Reuse, 1997.
[7] I. Michiels, T. D’Hondt, K. De Schutter, and G. Hoffman. Using dynamic aspects to
distill business rules from legacy code. In Dynamic Aspects Workshop, 2004.
[8] H. M. Sneed. Encapsulating legacy software for use in client/server systems. In
WCRE, 1996.
[9] H. M. Sneed and S. H. Sneed. Creating web services from legacy host programs. In
WSE, 2003.
[10] R. Wuyts. Declarative reasoning about the structure of object-oriented systems. In
TOOLS USA ’98. IEEE, 1998.
[11] A. Zaidman, B. Adams, K. De Schutter, S. Demeyer, G. Hoffman, and B. De Ruyck.
Regaining lost knowledge through dynamic analysis and Aspect Orientation - an
industrial experience report. In CSMR, 2006.
[12] A. Zaidman, T. Calders, S. Demeyer, and J. Paredaens. Applying webmining
techniques to execution traces to support the program comprehension process. In
CSMR. IEEE, 2005.
Effort Assessment and Predictions for High Volume Web-based
Applications: a Pragmatic Approach
Sanjeev Dhawan
Faculty of Computer Science
Dronacharya Institute of Management and
Technology, Kurukshetra University, Kurukshetra
(K.U.K)- 136 119, Haryana, India
E-mail: [email protected]
Rakesh Kumar
Faculty of Computer Science
Department of Computer Science and Applications,
Kurukshetra University, Kurukshetra (K.U.K)- 136 119,
Haryana, India
E-mail: [email protected]
ABSTRACT: - In this paper we explore the variation of the effort estimation of writing code when development
of Web-based applications is encountered for the first time. Effort assessment of high volume Web-based
systems is crucial, where outages can result in loss of revenue and dissatisfied customers. Here we advocate a
simple, but elegant approach based on the effort needed for designing Web applications from an empirical point
of view. We carried out an empirical study with the students of an advanced university class and Web designers
that used various client-server based Web technologies as a Web-based application design for predicting the
Hypermedia Design Model. Our first goal was to compare the relative importance of each design activity by
involving the principles of an accessible Web design, qualities of a good software metric, and related work in
Web based application. Second, we tried to assess the accuracy of a priori design effort predictions and the
influence of some factors on the effort needed for each design activity. Third, we also studied the quality of the
designs obtained based on construction of a User Behavior Model Graph (UBMG) to capture the dynamics
involved in user behavior. Fourth, we promote a simple, but efficient approach based on the effort needed for
designing Web-based applications with the help of RS Web Application Effort Assessment (RSWAEA) model.
The results obtained from the assessment can help us to analytically identify the effort assessment and failure
points in Web systems and makes the evaluation of reliability of these systems simple.
KEY WORDS: - Web-based design, Web metrics, Empirical Software Engineering, User Behavior Model Graph
(UBMG), RS Web Application Effort Assessment (RSWAEA) method
1. Introduction
To date, very little research work has been carried out in the area of effort estimation and assessment
techniques for Web systems. Software effort estimation and assessment is crucial for high
volume Web-based hypermedia applications, where failures can result in loss of revenue and
dissatisfied customers and users. Web systems based hypermedia applications have led to the emergence of
new e-commerce models, which mandate a very high reliability and availability requirements. Companies
developing Web-based systems face the challenge of estimating the required development effort in a very short
time frame. This problem does not have a standard solution yet. On the other hand, effort estimation models
that have been used for many years in traditional software development are not very accurate for Web-based
software development effort estimation. Web-based projects are naturally short and intensive [1], so not having
an appropriate effort estimation model pushes developers to make highly risky estimations. Moreover, the rapid
evolution and growth of Web related technology, tools and methodologies makes historical information quickly
obsolete. Nelson et al [2] compute the reliability number by using the ratio of the number of page errors, to the
total number of hits to the page for a test case. The computation is based on a static representation of the Web
pages and does not consider users' behavior. There are some commercial tools such as LogExpert and Analog [3]
that give statistics like page time, pages accessed, referrer details, etc. but do not give comprehensive
session level information such as session length, mix of sessions, session count, probability of navigation from
one page to another etc. Thus, they do not report the dynamic aspects of user’s navigation. Wang et al [4] have
proposed the construction of tree-view of the user’s navigational flow by considering Web server log files as
input, and use referrer-id field to derive the result, which represents a dynamic form of input domain model [5].
Here, the construction of the tree-view is calculated through depth-first traversal algorithm taking log files as
input. The model represents a system where users avoid re-traversal by remembering the pages traversed
before. Menasce et al [6] have proposed a state transition graph to capture the behavior of users for Web
workload characterization. The technique proposed by [7] considers the navigational flow as input, and uses the client-id field in the access log to identify the unique sessions, the probabilities associated with the occurrence of each
session, and the page-level transition probabilities in the session. The reliability is computed by failure data
analysis to estimate metrics such as the Mean Time between Failures (MTBF), Mean Time to Fail (MTTF) and
Reliability number [8]. In Web systems, the data for failure analysis has primarily been captured from the
access logs that have HTTP return error codes of 4xx and 5xx, considering only the valid sessions. The complete
implementation strategy of reliability aspects and assessment has been discussed with the help of UBMG and
RSWAEA method. Finally, the reliable and precise effort assessment of high volume Web software is critical for
project selection, project planning and project control. Over the past thirty years, various estimation models
have been developed to help managers perform estimation tasks, and this has led to a market offering of
estimation tools. For organizations interested in using such estimation tools, it should be crucial to know about
the predictive performance of the estimates such tools produce. The construction of an estimation model
usually requires a set of completed projects from which an arithmetic model is derived and which is used
subsequently as the basis for the estimation of future projects. So, there is a need for an estimation model for
the development effort of these Web projects. In this paper, we will try to point out the need for predictive
metrics to measure the development effort for Web-based applications. Finally the Web-based characteristics
and parameters are used to predict the effort and duration in terms of the Web systems development. A new
set of databases has been built on the basis of a complete hypothetical study, conducted by providing a dataset of Web documents. We carried out empirical research to provide effort assessment for small to large-size Web-based applications. For this paper, we have analyzed many findings drawn from an experience questionnaire administered from 05 January 2005 through 27 January 2006. The results are augmented by answers from a survey questionnaire provided to the students of an advanced university class and Web designers that used various client-server based Web technologies. Our analyses suggest several areas (including reliability, usability, complexity, cost, time requirements and the nature of the Web design) where Web-based designers, engineers and managers would benefit from better guidance about the proper implementation of Web-based applications.
2. Literature survey
Web designers recognize the importance of realistic estimates of effort to the successful management of
software projects, the Web being no exception. Estimates are necessary throughout the whole development life
cycle. They are fundamentally used to determine a project's feasibility in terms of cost-benefit analysis and design, and to manage resources effectively. Size, which can be described in terms of length, functionality and complexity, is often a major determinant of estimates. Most estimation prediction models to
date concentrate on functional measures of size, although length and complexity are also essential aspects of
size in order to analyze the overall effect of parametric influence (including quantitative parameters such as size, number of defects and months, and qualitative parameters such as complexity, speed, required reliability, tool usage, and analyst capability), sensitivity, risk identification, software reuse and COTS (Commercial-Off-the-Shelf) based systems. So an overall case study evaluation is required to predict the size metrics characterizing length, complexity and functionality for Web design and estimation [9]. The parametric influence, including both qualitative and quantitative aspects, is predicted through a case study evaluation and hypothetical analysis where a set of proposed or reused size metrics for estimation prediction will have to be measured [10-13]. To date there are only a few examples of estimation prediction models for Web development in the
literature as most work proposes methods and tools as a basis for process improvement and higher product
quality. Therefore, in the near future, frameworks for the development of Web-based applications (object-oriented, component-based, parametric-based or project-level frameworks, as well as hardware-software co-design related frameworks for sensitivity analysis and risk identification) need to be designed. The
Web-based designs and estimation activities should be based upon Conceptual Framework for Software
Measurement, which is based on following principles: (a) Determining relevant measurement goals (b)
Recognizing the entities to be examined (c) Identifying the level of maturity the organization has reached (d)
Classifying the functionality of metrics through standardization. Besides these, future work should use the COSMIC-FFP model to measure the functionality of several types of static, dynamic and active Web pages and use real datasets from industrial practice [14]. Traditional effort estimation methods like COCOMO [15] are
mainly based on metrics like Lines Of Code (LOC) [16] or Function Points (FP) [17]. The estimation strategies
supported by LOCs have shown several problems. Most practitioners working on Web projects agree that LOCs are not suitable for early estimation because they are based on design [18]. Another reported problem is that the work involved in
the development of multimedia objects and look-and-feel cannot be measured in LOCs. Also, an important
amount of reliable historical information is needed to estimate effort using this metric, and this information is
hard to get in Web-based projects. Finally, to carry out estimations using LOCs requires a long analysis of the
historical information, which reduces the capability to get reliable fast estimations. Speed is an important
requisite of Web-based projects development. Similarly, traditional estimation methods based on FPs are not
appropriate because applications do more than transform inputs to outputs, i.e. the effort necessary for
developing a Web-based application is much more than the effort required for implementing its functionality.
FPs doesn’t consider the imagery, navigation design, look-and-feel, and multimedia objects, among others. In
other words, the traditional categories of FPs should be redefined. This kind of estimation also requires an
important amount of reliable historical information, which supports the used values of each FPs. Although there
are several software effort estimation methods like Price-S, Slim and Seer [15], COCOMO is the most well
known and used by the software industry. It has been shown to be appropriate in many development scenarios. The first version of this method used LOCs as the fundamental metric to support the estimations. Then, Boehm proposed COCOMO II, which can alternatively use LOCs, FPs or Object Points. Although COCOMO II was
not defined to support the development effort estimation of Web applications, many people found the way to
adapt the object point concept in order to get a sizing estimation [15]. Object points are an indirect metric,
similar to FPs, which considers three categories: user interfaces, reports and components, which are probably
needed to develop the final product. Every element in the system is categorized and classified in three
complexity levels: basic, intermediate and advanced. Then, based on these classified elements, and taking into
account the historical information, it is possible to generate a good estimation. Object Points and COCOMO II
seem to be acceptable for traditional or multimedia software projects, but they are not good enough to get
accurate effort estimations for Web-based information systems. The complexity of the estimation process and
the need for detailed historical information make them difficult to apply in this scenario.
3. Principles involved in accessible Web-based design
a) Create Web pages that conform to accepted and published standards. Pages that follow the HTML, CSS, XML and other Web-based specifications are much more likely to be interpreted correctly by the various user agents (browsers) that exist. Additionally, if you use stylesheets, you should conform to the standard units of measurement, including absolute units such as inches, centimeters, points, and so on, as well as relative measures such as percentage and em units [19].
b) Know the difference between structural and presentation elements, and use stylesheets when appropriate. It should be noted, though, that stylesheet support is not fully implemented on all user agents (browsers); this means that, for at least the near future, some presentation elements in HTML will still be used and easy to measure. Moreover, if a Web-based design contains multiple links and these links are further connected to different stylesheets, then measurements will be more complex. So always try to use a single stylesheet for an effective Web-based design.
c) The Web-based design should include rich meta-content about the purpose and function of elements. By providing valuable additional information on the function and meaning of the various tags in the larger scope of your page, you can increase the accessibility of the Web page.
d) Make sure your pages can be navigated by keyboard. This means that Web-based design measurements should also be based on keyboard navigation.
e) Provide alternative methods to access non-textual content, including images, scripts, multimedia, tables, forms and frames, for user agents that don't display them. The foremost example of this is the "ALT" attribute of the <IMG> tag, which allows an author to provide alternative text in case a user agent can't display graphics. Accessibility of a Web design can also be measured and maintained by providing off-line, or at least off-Web, methods of doing things, such as providing an e-mail link or a response form.
f) Be wary of common pitfalls that can reduce accessibility while measuring the Web design of your site. Examples of these pitfalls include: (i) blinking text; (ii) use of ASCII art; (iii) link names that don't make sense out of context; (iv) links that aren't separated by printable characters; (v) use of platform-dependent scripting.
g) For effective Web measurements, it is always better to define functions in the <HEAD> tag: this reduces complexity and decreases the required effort, security will be improved, and this type of Web design will be more reliable.
In general, Web measurements are performed on the three broad categories of Web documents. For static Web documents, simplicity, reliability, and performance (SRP) are generally tested and measured. For dynamic Web documents, instead of checking SRP, the cost, complexity, and speed of retrieving the information are generally verified. For active Web documents, the ability to update information continuously is generally checked and measured.
4. Projected qualities of a good software metric
Lord Kelvin once said that when you can measure what you are speaking about and express it in numbers, you
know something about it. Measurement is fundamental to any engineering discipline. The terms "measure",
"measurement", and "metrics" are often used interchangeably, but according to Pressman [20] a measure
provides a quantitative indication of the extent, amount, dimensions, capacity, or size of some attribute of a
product or process. Measurement is the act of determining a measure. The IEEE Standard Glossary of
Software Engineering Terms [21] defines metrics as "a quantitative measure of the degree to which a system,
component, or process possesses a given attribute". Ejiogu [22] suggested that a metric should possess the
following characteristics: (a) Simple and computable: It should be easy to learn how to derive the metric and its
computation should not be effort and time consuming, (b) Empirically and intuitively persuasive: The metric
should satisfy the engineer's intuitive notion about the product under consideration. The metric should behave in certain ways, rising and falling appropriately under various conditions, (c) Consistent and Objective: The
metric should always yield results that are unambiguous. The third party would be able to derive the same
metric value using the same information, (d) Consistent in its use of units and dimensions: It uses only those
measures that do not lead to bizarre combinations of units, (e) Programming language independent, (f) An
effective mechanism for quality feedback. In addition to the above-mentioned characteristics, Roche [23]
suggests that a metric should be defined in an unambiguous manner. According to Basil [24], metrics should be
tailored to best accommodate specific products and processes.
5. Related and existing work for Web-based applications
From the beginning of software engineering, several development effort estimation methods have been
proposed. We can classify these methods for our research as those for traditional software and those for Web-oriented software. The traditional effort estimation methods are those used to estimate the development effort
of software that consists of programs in a programming language, which eventually interact with data files or
databases. Generally, these software have an active execution thread that provides system services. On the
other hand, the Web-oriented methods use different metrics and they are focused on estimating the
development effort of products that are event-oriented. These products generally involve code in a
programming language, imagery, look-and-feel, information structure, navigation and multimedia objects.
Several size metrics have been proposed for Web applications, like Object Points, Application Points and
Multimedia Points [25]. However, the most appropriate seems to be Web Objects (WO) [18]. WOs are an
indirect metric that is based on a predefined vocabulary that allows defining Web systems components in terms
of operands and operators. To estimate the amount of WOs that are part of a Web-based application it is
necessary to identify all the operators and operands present in the system. Then, they are categorized using a
predefined table of Web Objects predictors and also they are classified in three levels of complexity: low,
average and high. The final amount of WO in a Web-based application is computed using Halstead's
equation [17] (see equation 2), and it is known as the volume or size of the system.
Effort = A × (C1 × C2 × ... × C9) × (Size)^P1
Duration = B × (Effort)^P2

where: A = effort coefficient, B = duration coefficient, Ci = cost drivers, P1 = effort power law, P2 = duration power law.

Equation 1: WebMo (Web Estimation Model)
V = N log2(n) = (N1 + N2) log2(n1 + n2)

where: N = number of occurrences of operands/operators, n = number of distinct operands/operators, N1 = total occurrences of the operand estimator, N2 = total occurrences of the operator estimator, n1 = number of unique operand estimators, n2 = number of unique operator estimators, V = volume of work involved, represented as Web Objects.

Equation 2: Halstead's Equation for Volume
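As a purely hypothetical illustration of equation 2: an application with N1 = 40 operand occurrences, N2 = 60 operator occurrences, n1 = 10 unique operands and n2 = 15 unique operators would have V = (40 + 60) × log2(10 + 15) ≈ 464 Web Objects.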
The effort estimation and the duration of the development are computed using WebMo (Web Model), which is
an extension of COCOMO II [18]. This model uses two constants, two power laws, several cost drivers, and the
product size expressed in WO (see equation 1). Constants A and B, and power laws P1 and P2 are defined by
a parameter table in the model. This model contains the values obtained from a database of former projects
(historical information). The cost drivers are parameters used to adjust the effort and duration in terms of the
development scenario. For this model nine cost drivers were defined: product reliability and complexity (RCPX),
platform difficulty (PDIF), personnel capability (PERS), personnel experience (PREX), facilities of tools and
equipment (FCIL), scheduling (SCED), reuse (RUSE), teamwork (TEAM) and process efficiency (PEFF) [18].
Each cost driver has different values that may be: very low, low, normal, high, and very high. The combination
of WebMo (WebMo equation 1) and Web Objects (Halstead’s equation 2) is, at this moment, the most
appropriate method to estimate the development effort of Web applications. However, this combination does
not seem to be the best for accurate and frequent development scenarios because it needs an important
amount of historical detailed information to carry out the estimation. Also, the WO identification and
categorization process is difficult to carry out in a short time, and it requires an expert that also knows how to
carry out the project in critical technical decisions. The effort estimation methods presented above are not appropriate to estimate the development effort of Web-based information systems in different scenarios. In the next section we present the User Behavior Model Graph (UBMG) and RS Web Application Effort Assessment (RSWAEA) methods. They intend to be more appropriate for estimating the development effort of small to large-size projects, especially in scenarios that require fast estimation with little historical information.
6. Motivations of UBMG and RSWAEA
The techniques we propose have the following key objectives:
1. Derive the UBMG in a manner that we capture complete details for valid sessions, and number of
occurrences of invalid sessions. The valid sessions have metrics such as session count, reliability of session,
probability of occurrence of the session, and transition probability of the pages in the session.
2. Derive the RSWAEA method to estimate the development effort of small to large-size projects, especially in
scenarios that require fast estimation with little historical information. On the basis of the RSWAEA method, the Web-based software effort estimations are examined with respect to the user's cost, cost drivers, data Web objects, compatibility, usability, maintainability, complexity, configuration, time requirements, and interfaces.
6.1 Implementation and Analysis of User Behavior Model Graph (UBMG)
UBMG can be represented in form of a graphical or a matrix notation [26]. In the graph view, nodes represent
the pages, and arcs represent the transition from one node to another. In the matrix representation each cell (i,j)
corresponds to probability of transition from page i to page j. We extend UBMG by adding an additional node to
#Fields: date time c-ip s-port cs-uri-stem cs-uri-query sc-status time-taken cs(User-Agent) cs(Referrer)

Figure 1 format of the IIS server log file
<Date and Time> <Client-id> <URL> <Referrer-id>
2006-01-17 00:00:00 203.124.225.19 a.asp
2006-01-17 00:00:02 203.124.225.19 b.asp
2006-01-17 00:00:03 203.124.225.19 c.asp d.asp
2006-01-17 00:00:05 203.124.225.19 e.asp f.asp
2006-01-17 00:00:06 203.124.225.19 c.asp b.asp
2006-01-17 00:00:07 203.124.225.19 f.asp a.asp
2006-01-17 00:00:10 203.124.225.19 d.asp e.asp

Figure 2 access log entries of the IIS server
the graphical view, and a column in case of the matrix view to represent errors encountered while traversing.
The construction of UBMG starts with the navigational model and access logs as described in [7], where the
navigational model represents the complete overview of the different pages and the flow between the pages in
the Web system. The access logs store information regarding the timestamp, page accessed, client-id, referrer-id, HTTP return code etc. for determining session information. A sample format of the IIS log file is shown in figure 1. We consider the referrer-id and client-id fields as the basis to do a depth-first search on the access
logs. This approach will segregate valid and invalid sessions. To understand, consider an application with only
two independent sessions- S1 with pages (a→ b→ c) and S2 with pages (d→ e→ f). Let the access log have
entries as shown in figure 2.
If the depth-first search were based only on the client-id field, we would have derived two valid sessions.
However, with the referrer-id field we determine the invalid path consisting of pages (a → b→ f). The count of all
such invalid sessions is determined, and the construction of UBMG is done only for the valid sessions. Let us
consider the example of an Online Shipping System (OSS), where the two sessions defined in the navigational
model are Session 1: “Export a package” with pages PackageSelection.asp→ PackageDetails.asp →
Export.asp → DeliveryLogistics.asp → Payment.asp. Session 2: “Import a package” with pages
PackageSelection.asp → PackageDetails.asp → Import.asp → DeliveryLogistics.asp → Payment.asp. We tag
an alias for the pages as given in figure 3.
Figure 3 graphical view of the UBMG (nodes a to g with transition probabilities on the arcs)
a-PackageSelection.asp; b-PackageDetails.asp; c-Export.asp; d-Import.asp; e-DeliveryLogistics.asp; f-Payment.asp. Figure 3 shows the graphical view of the UBMG with the exit node g. The matrix of transition
probabilities for the above graph is shown in Table 1. The matrix considers only those sessions that have
completed successfully. For example, sum of probabilities of the paths out of the node b is 0.9 indicating that
10% of clients had either dropped out or encountered errors.
Figure 4 addition of an error node to the UBMG
The probability of reaching a node j in the graph can be calculated using the Markov property [4, 6, 7]. The generalized notation is Nj = N1 * P(1,j) + N2 * P(2,j) + ... + Nk * P(k,j), where k is the number of nodes that lead to node j. In the OSS example, the probability of reaching node b is 0.4 * Na + 0.2 * Nb + 0.2 * Nc + 0.2 * Nd and the probability of reaching node e is 0.1 * Nc + 0.1 * Nd + 0.3 * Ne, where Na is always equal to one.
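As a sketch of how these node-visit probabilities can be obtained in practice (with a purely hypothetical 4-node transition matrix, not the OSS values), the equation above can be iterated to a fixed point starting from N = 1 at the entry node:

#include <stdio.h>

#define NODES 4

int main(void)
{
    /* Hypothetical transition probabilities P[i][j] from node i to node j. */
    double P[NODES][NODES] = {
        {0.0, 0.6, 0.3, 0.0},
        {0.0, 0.0, 0.5, 0.4},
        {0.0, 0.2, 0.0, 0.7},
        {0.0, 0.0, 0.0, 0.0},
    };
    double N[NODES] = {1.0, 0.0, 0.0, 0.0};   /* entry node is always 1 */

    /* Fixed-point iteration of N_j = sum over k of N_k * P[k][j]. */
    for (int it = 0; it < 100; it++) {
        double next[NODES] = {1.0, 0.0, 0.0, 0.0};
        for (int j = 1; j < NODES; j++)
            for (int k = 0; k < NODES; k++)
                next[j] += N[k] * P[k][j];
        for (int j = 0; j < NODES; j++)
            N[j] = next[j];
    }
    for (int j = 0; j < NODES; j++)
        printf("N[%d] = %.3f\n", j, N[j]);
    return 0;
}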
Table 1 matrix of transition probabilities for OSS

Table 2 matrix of transition probabilities with error node
The complexity of figures 3, 4, and 5 can be calculated using the cyclomatic complexity (e - n + 2, or e - n + 1, or the number of nodes with two outgoing edges + 1, where e is the total number of edges and n is the total number of nodes).
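For instance, a hypothetical graph with e = 10 edges and n = 7 nodes would have a cyclomatic complexity of 10 - 7 + 2 = 5.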
6.1.1 Failure Analysis of UBMG
Now, we extend the UBMG to include the failure data. To capture the failure data, the access logs are scanned
for HTTP return error codes of 4xx and 5xx as mentioned in [27]. Besides this, the errors from other servers are
also considered. Theoretically, the error node can stem from any page in the graphical view. We add the error
node Er and all the page errors are associated with this node. The matrix of transition probabilities will have an
additional column to represent the error node. A cell (m, Er) of this column will include the probability of
transitioning from the node m to error node Er. Considering the OSS example, the view of UBMG with the
addition of error node is shown in figure 4. The matrix of transition probabilities for the figure 4 is shown in Table
2. The matrix considers only those sessions that have some error. Of all the requests that enter node c, 40% of
them encountered some error. Before proceeding to failure analysis due to service-level agreements (SLA)
violation, we define the term Session Response Time (SRT) which is the sum of the service times of all the
pages in the session. We define the SLA at session level and hence we need the desired response time target
for each session. The access log files can be used to determine the page service time (PST) values. For
example, in the IIS Web server the time-taken field represents the time spent by the server to respond to the request. SRT is computed as the sum of the PSTs of its individual pages. Further, we compute the number of
successful sessions where the SLA was violated. Let S1 and S2 be two sessions for the OSS example. Table 3
shows the sessions information, where each session is represented by a unique column, and includes number
of successful sessions, number of instances of SLA violation, etc. The probability of reaching exit node for a
session is computed as the ratio of number of exits with respect to the number of visits at the entry page. Figure
5 shows the addition of virtual nodes to the existing figure 4. The matrix of transition probabilities for the figure 5
is shown in Table 4.
Figure 5 addition of the error and virtual nodes to the UBMG
6.1.2 Calculation of Reliability
To compute the reliability with respect to software code-level failures, we determine the probability of encountering the failure node, PCODE-ERROR, represented in Figure 4. To solve for this probability of reaching the error node, we formulate a set of equations from the matrix and use techniques like Cramer's Rule, Matrix Inversion or Gauss
Jordan Elimination method (for solving the sets of simultaneous equations). We also compute (a) the total
number of failures due to invalid session NINVALID-SESSION, and (b) number of instances where successful
sessions did not meet SLA as NSLA-FAIL. The probability of occurrence of invalid sessions is computed using (a).
The probability of failure for a session due to (b) is computed by considering the total number of its successful
sessions. In the OSS example, the probability of such failures is 0.59 in Session 1 and 0.56 in Session 2. The
probability of a session reaching the exit node but violating the SLA, as well as the probability of invalid sessions, needs to be computed. The
total session failure probability PSESSION-FAILURE is calculated as the sum of all the individual session probabilities
and the probability of occurrence of invalid sessions. The overall probability of failure PTOTAL-FAILURE for the
system is calculated as sum of the probability of reaching error node PCODE-ERROR, and the probability of session
failure PSESSION-FAILURE for the entire system. The overall reliability RSYSTEM of the system is calculated by the
equation: RSYSTEM = 1 - PTOTAL-FAILURE. Thus the reliability computation is driven by failures at the software code level, failures due to SLA violation, and invalid sessions.

Table 3 results of SLA violation probability
                                                                S1      S2
1. Total no. of successful sessions                             125     150
2. Total no. of SLA violations (NSLA-FAIL)                      64      67
3. Probability of failures due to (2)                           0.59    0.56
4. Probability of reaching the exit node for each session       0.78    0.76
5. Probability of SLA violation for each session,
   using (3) and (4)                                            0.37    0.32

Table 4 matrix of transition probabilities with error and virtual nodes
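The final combination step can be sketched as follows; all probability values below are hypothetical and serve only to illustrate how RSYSTEM = 1 - PTOTAL-FAILURE is assembled from its parts:

#include <stdio.h>

int main(void)
{
    /* Hypothetical inputs to the reliability computation. */
    double p_code_error      = 0.04;  /* probability of reaching the error node */
    double p_invalid_session = 0.02;  /* probability of an invalid session      */
    double p_sla_violation   = 0.05;  /* summed per-session SLA violation prob. */

    /* Total session failure probability: SLA violations plus invalid sessions. */
    double p_session_failure = p_sla_violation + p_invalid_session;

    /* Overall failure probability and system reliability. */
    double p_total_failure = p_code_error + p_session_failure;
    double r_system        = 1.0 - p_total_failure;

    printf("P(total failure) = %.2f, R(system) = %.2f\n",
           p_total_failure, r_system);
    return 0;
}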
6.2 The RSWAEA Method
In order to deal with the problem of effort estimation, for the last two years we have been studying Web-based software development processes related to the development of small and medium size Web-based information systems. Based on the analysis of these results, we identified a low usability of the well-known effort estimation methods and the necessity of a model to support estimation in such a scenario. Due to this, we developed a method for quickly estimating Web-based software development effort and duration, which could be adopted by the software community for the development of Web-based hypermedia applications. We called it the RS Web Application Effort Assessment (RSWAEA) method.
development effort of small to large-size Web-based information systems. The DWOs (Data Web Objects) are
an approximation of the whole size of the project; so, it is necessary to know what portion of the whole system
DWOs represent. This knowledge is achieved through a relatively simple process (briefly described in the next
subsection). Assuming that the estimation factors in the computation of the effort are subjective, flexible and
adjustable for each project, the role of the expert becomes very relevant. Once the value of the portion or
representativeness is calculated, the expert can adjust the total number of DWOs and he/she can calculate the
development effort using the following equation.
E = (DWO × (1 + X*))^P × CU × (cd1 × cd2 × ... × cd9)
Where: E is the development effort measured in man-hours, CU is the cost of user, cdi is the cost drivers, DWO
corresponds to the Web application size in terms of data web objects, X* is the coefficient of DWO
representativeness, and P is a constant. The estimated value of real data web objects (DWO) is calculated as
the product of the initial DWOs and the representativeness coefficient X*. This coefficient is a historical value
that indicates the portion of the final product functionality that cannot be inferred from the system data model.
The value of X* (coefficient of DWO representativeness) is between 1 and 1.3, depending on whether the application is a small or large-size Web-based application. The process of defining this coefficient is presented in the next section. The cost of each user (CU) has a value between 0 and 5. A value of CU of 0 means the system reuses all the functionality
associated with each user type; so, the development effort will also be zero. On the other hand, if the cost of
user is five, this means that there is no reuse of any kind to implement the system functionality for each user
type. It represents the system functionality that is associated with each user type. The cost drivers (cdi) are defined in the next subsection, and they are similar to those defined by Reifer for WebMo [18]. The last
adjustable coefficient in RSWAEA corresponds to constant P that is the exponent value of the DWO. This
exponent is a value very close to 1.01, and it must neither be higher than 1.12 nor lower than 0.99. This
constant’s value depends on the project size measured in DWOs. In order to determine this value, various
statistical analyses have been done on various Web-based applications. As a result, this constant was assigned
the value 1.09 for projects smaller than 300 DWOs, and 1.03 for projects larger than 300 DWOs.
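As a sketch of how the RSWAEA equation is applied, consider a hypothetical project; the DWO count, X*, CU and the cost driver values below are illustrative only, and P is taken as 1.09 since the project is smaller than 300 DWOs:

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Hypothetical inputs for the RSWAEA effort equation. */
    double dwo = 96.0;    /* estimated data web objects                  */
    double x   = 1.1;     /* X*: coefficient of DWO representativeness   */
    double cu  = 2.0;     /* cost of user, between 0 and 5               */
    double p   = 1.09;    /* exponent for projects smaller than 300 DWOs */
    double cd[9] = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0};  /* nominal cost drivers */

    double drivers = 1.0;
    for (int i = 0; i < 9; i++)
        drivers *= cd[i];

    /* E = (DWO * (1 + X*))^P * CU * product of the cd_i, in man-hours. */
    double effort = pow(dwo * (1.0 + x), p) * cu * drivers;
    printf("Estimated effort = %.0f man-hours\n", effort);
    return 0;
}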
6.2.1 Strategic Implementation of RSWAEA method
In order to help the expert achieve more accurate effort estimations, the RSWAEA method introduces a new sizing metric based on the data model of the Web-based information system to be developed: Data Web Objects (DWO). DWO is an indirect sizing metric that takes into account the characteristics of small to large-size projects. The idea behind the DWO is to identify the system functionality by analyzing its data model. DWOs
are similar to other indirect metrics such as FPs [17], Object Points [28], or Web Objects [18] in the fact that
they represent abstract concepts that are used to obtain the size of the system to be developed. Thus, we can
fill the table of DWOs in Table 5 to calculate the system size. The weight assigned to each category of DWO
represents the development effort of each one, and it is based on the experience of the expert estimator. As we
have already discussed in part I of this paper, the effort estimation methods based on the combination of WebMo and Web Objects are not appropriate to estimate the development effort of Web-based applications.
Therefore, the RSWAEA method intends to be more appropriate to estimate the development effort of small to
medium-size projects, especially in circumstances that require fast estimation with little historical information.
Let us continue with the Visual Basic Script example we used previously with the UBMG for predicting a Web
application. The 96 DWO count in Table 5 represents the size of the program that would be required for this
Web application.
Type of DWO                    Amount of DWO   Weight Factor   Total of DWO
Regular Entities                      5       ×      8               40
Dependent Entities                    1       ×     10               10
Relationship Entities                 3       ×      3                9
Relationship 1 to 1                   1       ×      3                3
Relationship 1 to N                   3       ×      6               18
Number of multimedia files            2       ×      6               12
Number of scripts                     1       ×      4                4
Total of DWO                                                         96

Table 5 definition of DWOs amount
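The total in Table 5 is simply the weighted sum of the DWO counts; as a small sketch of that bookkeeping:

#include <stdio.h>

int main(void)
{
    /* Amounts and weight factors for each DWO category, as listed in Table 5. */
    int amount[] = {5, 1, 3, 1, 3, 2, 1};
    int weight[] = {8, 10, 3, 3, 6, 6, 4};
    int total = 0;

    for (int i = 0; i < 7; i++)
        total += amount[i] * weight[i];

    printf("Total DWO = %d\n", total);   /* 5*8 + 1*10 + 3*3 + 1*3 + 3*6 + 2*6 + 1*4 = 96 */
    return 0;
}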
The Cost of User (CU) is a function of the user types to be supported by the system. The RSWAEA method
considers three fixed user types: Project manager, Web-designer and Counselor. The Project manager user is in
charge of supervising the available applications in the system, activating and deactivating functional areas of the
system, and maintaining the set of applications that keep the project in constant execution. The Web-designer
user uses the available functionality in the system to modify and consult the stored information. The Counselor
user has read-only access to part of the information available in the system. In addition, the RSWAEA method
also considers variable user types, which are a mix of the aforementioned fixed types, as shown in Table 6.
Finally, the RSWAEA method has a series of Cost Drivers taken from the WebMo model proposed by Reifer [18].
These Cost Drivers represent the available development scenarios for a particular project. Such scenarios have
positive and negative influences on the development process that need to be taken into account during
estimation. Cost Drivers are subjective factors in the RSWAEA method, and their values are depicted in Table 7.
Nine cost drivers are defined for this model: PRCLX: Product reliability and complexity (product attributes);
PFDIF: Platform difficulty (platform and net server volatility); PECAP: Personnel capabilities (knowledge, skills
and abilities of the work force); PEEXP: Experience of the personnel (depth and width of the work force
experience); FACIL: Facility and infrastructure (tools, equipment and geographical distribution); SCHED:
Scheduling (risk assumed if delivery time is shortened); CLIEN: Client type (the client's technology knowledge;
requirements stability); WTEAM: Work team (ability to work synergistically as a team); and PROEFF: Process
efficiency (development process efficiency).
User Type                    Fraction of the Scope (I)    Reuse Degree (R)
Fixed Users:
  Project manager                       0.4                      0.1
  Web-designer                          0.7                      0.5
  Counselor                             0.8                      0.9
Variable Users:
  Secretary                             0.2                      1.0
  Area Manager                          0.3                      0.9
Table 6. Example of a user's table for different user types in a Web-based application
Cost drivers for the RSWAEA method
Driver     VL      L       N       H       VH
PRCLX     0.64    0.84    1.00    1.32    1.61
PFDIF     0.85    0.95    1.05    1.28    1.70
PECAP     1.52    1.28    1.02    0.92    0.85
PEEXP     1.30    1.14    1.02    0.90    0.85
FACIL     1.35    1.17    1.00    0.90    0.90
SCHED     1.40    1.18    1.00    0.95    0.95
CLIEN     1.45    1.25    1.04    0.88    0.80
WTEAM     1.45    1.25    1.00    0.88    0.85
PROEFF    1.30    1.15    1.05    0.90    0.70
Table 7. Cost drivers of the RSWAEA method and their values
Each of these cost drivers is classified on a five-level scale: very low, low, normal, high and very high (VL, L, N,
H, VH). In order to determine which level corresponds to each cost driver, the estimator uses a series of
predefined tables that were built using historical information. Each cost driver has an assigned value at each
level, and the product of these values is part of the equation for calculating the effort in the RSWAEA method.
The selected values are substituted into the RSWAEA effort estimation equation in order to obtain
the result in man-hours.
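The exact RSWAEA equation is not reproduced in this excerpt, so the following Python sketch should be read only as an assumed illustration of how the quantities described above fit together in a WebMo-style multiplicative form: the DWO count adjusted by X* is raised to the exponent P (selected with the 300-DWO threshold given earlier), and the result is scaled by the cost of user and by the product of the nine cost-driver values of Table 7. The productivity constant A and the function name are hypothetical.

    from math import prod

    def rswaea_effort(dwo, x_star, cu, driver_values, a=1.0):
        """Assumed RSWAEA-style effort estimate in man-hours (illustration only).

        dwo           -- DWO count obtained from the sizing table
        x_star        -- coefficient of DWO representativeness, between 1.0 and 1.3
        cu            -- cost of user, 0 (full reuse) to 5 (no reuse)
        driver_values -- the nine cost-driver values taken from Table 7
        a             -- hypothetical productivity constant, not given in the paper
        """
        p = 1.09 if dwo < 300 else 1.03       # exponent P chosen by project size in DWOs
        adjusted_dwo = dwo * x_star           # estimated real DWOs
        return a * (adjusted_dwo ** p) * cu * prod(driver_values)

    # Example: the 96 DWOs of Table 5, nominal driver levels and a cost of user of 2.
    nominal = [1.00, 1.05, 1.02, 1.02, 1.00, 1.00, 1.04, 1.00, 1.05]
    print(round(rswaea_effort(96, 1.1, 2, nominal), 1))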
7. Final Expected Measurements and Validations
On the basis of the above-mentioned Web-based designs and projected measurement techniques, the following
will be examined and calculated in order to predict an efficient Web-based design: (i) identification of measures
that can be used to predict the effort for Web design; (ii) identification of a methodology that can help the
Webmaster control the effort for Web design. In order to identify these approaches, Web-based metrics covering
the features and functionality of the application to be developed will be designed for effort prediction, so that
efficient Web-based measurements can be proposed. The major aim will be to shed light on this issue by
identifying size metrics and cost drivers for early Web cost estimation based on the current practices of several
Web pages worldwide. This will be achieved using surveys (based upon hypothetical analyses and analyses of
Web companies) and a case study. The proposed Web-based measurement techniques will be organized into
categories and ranked, and the results will indicate the two most common size metrics used for Web cost
estimation. Moreover, a complete portfolio of Web-based measurements will be designed on the basis of various
Web applications. On this basis, a method will be proposed to quickly estimate the development effort of
Web-based information systems. The method will address a
p. 64 of 199
necessity to obtain effort estimations in a short period using limited information and resources. The proposed
method will use raw historical information about development capability and high-granularity information about
the system to be developed in order to carry out such estimations. The method will be simple and especially
suited for small, medium-size or larger Web-based information systems. Generally, these estimations are the
basis of the budget given to the client, and based on that budget the software development companies sign
contracts with the client. In other words, the effort estimation carried out for budgeting purposes usually
establishes the business rules for the project. Without an appropriate model, cost estimation is done with high
uncertainty and the development effort estimation relies only on the experience of an expert, whose estimations
are generally not formally documented. This expert knows the development capabilities of the company well
and is able to interpret the client's requirements with high accuracy. Finally, based on the analysis of these
results, we expect to identify the low usability of the well-known effort estimation methods and the need for a
model to support estimation in such scenarios. For this reason, we will develop a method for quickly estimating
Web-based software development effort and duration, adapted to the development of Web-based projects. The
method will be specifically applicable to estimating the development effort of small, medium-size or larger
Web-based information systems in immature development scenarios. Prior to the above-mentioned tasks, we will
also perform experiments to collect data about the estimators (such as estimation experience, experience with
similar tasks, level of optimism, etc.) through personal discussions and questionnaires. For each task the subjects
will submit the estimate, the time spent estimating and the time spent completing the task. Characteristics of the
actual outcome (for instance, whether the task was satisfactorily completed) will be determined by inspection of
the delivery.
Furthermore, for Web-based software effort estimations and measurements, the compatibility, usability,
maintainability, complexity, cost, configuration, time requirements, types of interfaces, tractability, and the
nature of the Web design will also be examined and considered. Finally, we will validate this study on the basis
of Web-based measurements, taking the features and functionality of the application to be developed for effort
prediction in order to propose efficient Web-based measurements. The task size and the consequences of
estimation errors will be predicted. Positive results would suggest that the various effects also apply to
estimating Web-based effort, which remains a challenging task for the near future. Generally, although
developers spend time trying to estimate the software development effort realistically and reliably, they usually
have very little time for this task and very little historical information is available. These characteristics tend to
make estimations less reliable regarding both time and cost. An expert knows the development scenario and the
development capabilities of his/her organization, but he/she generally does not have good tools to support an
accurate, reliable and fast estimation (within 1–3 days). In order to get fast and reliable effort estimations of
Web-based applications, the RSWAEA method and UBMG are used to scientifically identify the effort
assessment, estimation and failure points in Web systems.
8. Conclusions and Future Work
In this paper we have introduced an approach for determining reliability and for effort assessment and estimation
of Web-based systems, with the aim of obtaining fast and reliable effort estimations for Web-based information
system development projects. These methods work by offline and online analysis of Web logs and produce
useful metrics such as RSWAEA, UBMG, session count and SRT computation, and these metrics can effectively
be used to compute the reliability and effort of small to larger Web-based applications. Although these methods
do not replace the expert estimator, they provide him/her with a tool for achieving a more accurate estimation,
based on real data, in a shorter time. Estimating the cost, duration and reliability of Web developments involves
a number of challenges. To handle these challenges, we have analyzed many findings drawn from experienced
and expert opinions. Finally, by taking into account the good qualities of a software metric and an accessible
Web design, we validated that the proposed models achieve an effort prediction accuracy of up to 76.5% and an
overall reliability of the Web-based systems of up to 72.5%, better than traditional methods. The proposed
methods will be made completely available, free of cost, to project managers, Web designers, students, teachers,
and research and development organizations. Our future work may include the study of lexical analysis together
with COTS to develop a complete framework for effort assessment for authoring large-volume Web-based
applications.
Acknowledgements
A major part of the research reported in this paper was carried out at K.U.K and D.I.M.T, Haryana, India. We
are highly indebted to the Ernet section of K.U.K for their gracious help and constant support while testing our
proposed models on different computer systems. The authors would also like to thank those anonymous
individuals who worked hard to supply the data.
References
[1] D. Lowe, Web Engineering or Web Gardening?, WebNet Journal, Vol. 1, No. 1 January-March 1999.
[2] E. Nelson, Estimating Software Reliability from Test Data, Microelectronics and Reliability, 17(1), pp. 67-73,
1978.
[3] Advanced tools intended to produce individual page statistics available at http://www.analog.cx;
http://www.weblogexpert.com etc.
p. 65 of 199
[4] Wen-Li Wang, Mei-Huei Tang, User-Oriented Reliability Modeling for a Web System, 14th International
Symposium on Software Reliability Engineering (ISSRE), November 17-21, 2003.
[5] University of Maryland, NASA High Dependability Computing Program,
http://www.cebase.org/HDCP/frames.html?/HDCP/Models/input_domain_models.html.
[6] D.A. Menace, V.A.F. Almeida, R. Fonseca, M.A. Mendes, A Methodology for Workload Characterization of
E-Commerce sites, Proceedings of the 1st ACM conference on Electronic Commerce, 1999.
[7] Shubhashis Sengupta, Characterizing Web Workloads– a Transaction-Oriented View, IEEE/ IFIP 5th
International Workshop on Distributed Computing (IWDC 2003), 2003.
[8] JohnD. Musa, Anthony Iannino, Kazuhira Okumoto, Software Reliability (McGraw-Hill, pp. 18, 1987).
[9] Pfleeger, S. L., Jeffery, R., Curtis, B., Kitchenham, B, Status Report on Software Measurement, IEEE
Software, March/April, 1997.
[10] Fenton, N. E., Pfleeger, S. L., Software Metrics, A Rigorous & Practical Approach, 2nd Edition, (PWS
Publishing Company and International Thomson Computer Press, 1997).
[11] Hatzimanikatis, E., Tsalidis, C. T., Christodoulakis, D., Measuring the Readability and Maintainability of
Hyperdocuments, Journal of Software Maintenance, Research and Practice, 1995, 7, pp. 77-90.
[12] Warren, P., Boldyreff, C., Munro, M., The Evolution of Websites, Proc. Seventh International Workshop on
Program Comprehension, IEEE Computer Society Press, Los Alamitos, Calif., 1999, pp. 178-185.
[13] Mcdonell, S. G., Fletcher, T., Metric Selection for Effort Assessment in Multimedia Systems Development,
Proc. Metrics’98, 1998.
[14] COSMIC: COSMIC-FFP Measurement manual, version 2.0, http://www.cosmicon.com, 1999.
[15] B. Boehm, E. Horowitz, R. Madachy, D. Reifer, B. K. Clark, B. Steece, A. Winsor Brown, S. Chulani and C.
Abts, Software Cost Estimation in COCOMO II (Prentice-Hall, 1st edition, January 2000).
[16] D. Phillips, The Software Project Manager’s Handbook (IEEE Computer Society Press, 1998).
[17] International Function Point Users Group, Function Point Counting Practices Manual. Release 4.0, URL:
http://www.ifpug.org/publications/manual.htm, 1994.
[18] D.J. Reifer, Web Development: Estimating Quick–to-Market Software, IEEE Software, Vol. 17, No. 6,
pages 57 - 64, November-Dec. 2000.
[19] Thomas A. Powell, The Complete Reference HTML & XHTML (Fourth Edition, Tata McGraw-Hill- New
Delhi).
[20] Pressman, Roger S., Software Engineering: A Practitioner's Approach (McGraw-Hill, 1997).
[21] IEEE Trans. Software Engineering, Vol. SE-10, pp-728-738, (1984).
[22] Ejiogu, L., Software Engineering with Formal Metrics (QED Publishing, 1991).
[23] Roche, J.M., Software Metrics & Measurement Principles, Software Engineering Notes, ACM, Vol. 19, no.
1, pp.76-85, 1994.
[24] Basili, V.R., & D.M.Weiss, A Methodology For Collecting Valid Software Engineering Data, IEEE Software
Engineering Standards, Std. 610.12-1990, pp.47-48, 1993.
[25] A. J. C. Cowderoy, Size and Quality Measures for Multimedia and Web-Site Production, Proc. of the 14th
International Forum on COCOMO and Software Cost Modeling, Los Angeles, CA, Oct. 1999.
[26] Daniel A. Menasce, Virgilio A.F. Almeida, Scaling for E-Business Technologies, Models, Performance, and
Capacity Planning (Prentice Hall PTR, pp. 49-59, 2000).
[27] The Internet Society, Request for Comments (RFC): 2616. Hypertext Transfer Protocol–HTTP/1.1,
http://www.w3.org/Protocols/rfc2616/rfc2616.html.
[28] B. Boehm, Anchoring the Software Process, IEEE Software, Vol. 13, No. 4, pages 73 -82, July 1996.
p. 66 of 199
A Language for Defining Traceability Models for Concerns
Dolores Diaz*+, Lionel Seinturier*, Laurence Duchien* and Pascal Flament+
*Laboratoire d'Informatique Fondamentale de Lille
USTL - UMR CNRS 8022
INRIA Futurs - Project Jacquard - B> 21 . Ext. M3
59655 Villeneuve d'Ascq Cedex
France
fdiaz, seinturi, [email protected]
ABSTRACT
During maintenance phases, the traceability of a functional requirement is a valuable help to face the size and
complexity of current applications. Indeed, this traceability, called requirements traceability, details the
progressive and successive definitions of functional requirements, from their identification to their deployment
during a software process. Consequently, the specification and development of a requirement are clarified and
its maintenance becomes easier. However, the time saved in maintaining a functional requirement is lost in
maintaining a non-functional requirement such as security or performance. Indeed, non-functional requirements
often lack clear specifications and their implementations are tangled with the implementations of functional
requirements. We therefore propose to identify each functional and non-functional requirement as a concern,
to extend the requirements traceability notion to concerns, and to link traceability models to each concern
defined in an application. Thus, the specification and implementation of a concern are completely described by
its traceability models, and the comprehension and maintenance of a concern are significantly improved. This
paper presents a language for defining traceability models for each concern identified in an application. The
metamodel of this language and several examples of its use are introduced.
KEYWORDS
Traceability, software evolution, maintainability.
1 INTRODUCTION
The size and complexity of current applications make maintenance activities difficult. Each software change is
awkward. The reasons are related to the huge quantity of UML models, configuration and implementation files
of an application, and to the complexity of each functionality, which does not facilitate its comprehension. These
hindrances affect the quality of software maintenance. Work on requirements traceability, the ability to describe
and follow the life of a functional requirement from its origins, through its specification and development, to its
subsequent deployment and use, provides an interesting answer [4]. Each functional requirement is described
by a structured set of information items (artefacts)
+NORSYS
1, rue de la Cence des Raines
ZAC du moulin
59710 Ennevelin
France
fddiaz, [email protected]
produced during a development phase. This set gathers different definitions of the functional requirement, such
as a use case diagram, a class diagram or a component diagram. Thus, with requirements traceability, designers
have a global view on the development of a functional requirement and a better understanding of its design [5].
Several works [5, 6, 7] establish a clear requirements traceability definition for functional requirements, but they
do not take care of non-functional requirements. A non-functional requirement, which defines a technical need
identified by customers, can be as sensitive to develop as a functional requirement. During a maintenance
phase, non-functional requirements also undergo modifications which are essential to follow in order to ensure
the quality of the software maintenance.
Our work intends to define a language for defining traceability models for concerns in order to provide help for
maintenance activities. We identify functional and non-functional requirements as concerns. Then, we define a
language to describe the life of each concern defined in an application with a list of traceability models. More
exactly, the description of the life of a concern is made in relation to one concern and several processes. The
concern defines a functional or a non-functional requirement. Each process describes one phase of the lifecycle
of the concern: the development or the maintenance phase. During a development phase, an elaboration
process defines the specifications and implementations of the concern; we type the traceability of a concern
during its development phase as top-down traceability. During maintenance, an evolution process defines the
transformations or changes to make on the target concern; we type the traceability of a concern during its
maintenance phase as temporal traceability. Thanks to our language, we can follow the lifecycle of a given
concern as a whole. For this, at least two traceability models have to be defined: the first one focuses on its
development and the others focus on its evolutions.
The next section of this paper introduces the metamodel of our language. Section 3 gives examples of
traceability models for a given concern: a functional requirement and a non-functional requirement. For the
functional
p. 67 of 199
requirement, a traceability model describes its development. For the non-functional requirement, two traceability
models describe respectively its development and an evolution. Finally, section 4 concludes this paper.
A traceability model then allows a set of objects to be organized according to a process, and consequently gives
a global view on the specifications and realizations of the concern.
2 A LANGUAGE FOR THE DEFINITION
OF TRACEABILITY MODELS
Traceability models can define two viewpoints of the same concern. On the one hand, they can show a unique
state of a concern at one given time; this is the case when a traceability model defines the development phase
of a concern. On the other hand, they can show a transition between states of the same concern; this is the
case when a concern undergoes a change during a maintenance phase. The difference is related to a process.
In order to identify these viewpoints of a concern, we define and type each traceability model by top-down or
temporal attributes. The next section explains these attributes.
Traceability is essential during the software lifecycle. Ramesh and Jarke, who conducted empirical studies, argue
the need for requirements traceability and propose reference models for its definition [3]. However, these
models focus on the traceability of functional requirements and are not interested in non-functional
requirements. Our ambition is to extend the requirements traceability definition to a concerns traceability and to
define a language for defining traceability models for any concern of an application. For this, we have based our
work on the following definitions, which define a concern, a software process and the notion of concerns
traceability:
Top-down and temporal attributes
A traceability model is described by a top-down attribute when it includes rules defined in an elaboration
process. The elaboration process is described by a top-down process of a standard software engineering
process (such as [8]). It involves a set of objects defined in each sub-phase of the process and a set of relations
defined between these objects. A traceability model is described by a temporal attribute when it includes rules
defined in an evolution process. Evolution processes are related to the notion of time. Generally, they are based
on a set of objects and relations defined at the same definition phase. This kind of process is appropriate for
the description of software transformations to apply over time.
Definition 2.1 A concern focusses on one given problem. More precisely, each concern identifies and
encapsulates the description of a requirement that can be functional or not.
For example, each requirement identified in a use case diagram represents a concern. In the same way, any
non-functional requirement identified in a design stage represents a concern.
Definition 2.2 A software process defines a methodology which details a solution to specify and implement a
concern. This solution is based on a set of objects to define and a list of steps to follow.
Top-down and temporal attributes allow a classification of the traceability models related to a concern.
The next section introduces the metamodel which defines our language for defining traceability models.
For example, the RUP [8] and XP methodologies are two different methodologies to develop an application.
The traceability metamodel
Traceability models focus on the set of objects and relations involved in the definition of a concern at a given
phase. Our language is based on the metamodel of Figure 1. Its definition is based on Ramesh and Jarke's
work [3].
Definition 2.3 The notion of concerns traceability defines the ability to describe the life of each concern of an
application. For this, the life of each concern is defined by a set of traceability models which describe its
development and maintenance phases using respectively elaboration and evolution processes.
We model a traceability model with a set of objects that we call SoftwareEntity and a set of relations called
Satisfy, DependOn, and EvolveTo. Thanks to these notions (metaclasses in bold lines), we define an abstract
model.
Thus, a traceability model is strongly related to one concern and to one phase. Its definition is based on the
two following elements:
More precisely, any relation between two software entities which come from different stages is represented
by a Satisfy relation. For example, during the RUP process [8], the realization of a use case by a collaboration
diagram is represented by a Satisfy relation. Conversely, any relation between two software en-
A concern: a set of objects that describes a functional or non-functional requirement.
A process: it describes objects and a way to link them in order to build a solution.
p. 68 of 199
Figure 1: The traceability metamodel
tities of the same stage is represented by a DependOn relation. This is the case for a composition association
between a class diagram and one of its classes. Finally, any evolution to apply to a software entity over time is
represented by an EvolveTo relation. The object SoftwareTransformation details the set of transformations to
make during the evolution (see Figure 1).
emphasizes the use of our language with the definition of 3 traceability models.
3 TRACEABILITY EXAMPLES
In this section, we define three examples with our language. Two are top-down traceability models and define
the lives of a functional requirement and of a non-functional requirement, respectively. Each of these
traceability models is defined according to an elaboration process. The third example also defines a
non-functional requirement, but with a temporal traceability model; the non-functional requirement undergoes a
short evolution. Note that these examples do not specify the stakeholders who are in charge of artefact
definitions, in order to avoid overly complex models.
Thanks to this abstract model, we can give a first definition of a traceability model: for one given concern, we
can identify its key software entities and their organization. However, this abstract model has to reflect the real
support of each software entity. That is why we introduce the notion of Artefacts and the isDocumentedBy
relation (shadow metaclasses), which represent the physical support of any software entity defined in a
traceability model. Thus, any software entity can have as its physical support (with the relation isDocumentedBy)
one or several Artefacts. In the same way, each Artefact can be in the charge of (with the relation isManagedBy)
one or several Stakeholders (blank metaclasses), which correspond to a role in an elaboration or an evolution
process. For example, our language permits specifying that an architect manages component diagrams during
the design of a requirement.
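As a rough illustration of the metamodel of Figure 1, the following Python sketch models software entities together with the Satisfy, DependOn and EvolveTo relations, and attaches artefacts and stakeholders to them. The class and attribute names follow the metaclasses named in the text; the constructors, the list-based storage and the example roles are our own illustrative choices and are not part of the language itself.

    # Illustrative sketch of the traceability metamodel (names follow Figure 1).
    class Stakeholder:
        def __init__(self, role):
            self.role = role                        # e.g. "architect"

    class Artefact:
        def __init__(self, name, managed_by=None):
            self.name = name                        # physical support, e.g. a diagram or a file
            self.is_managed_by = managed_by or []   # Stakeholders in charge of this artefact

    class SoftwareTransformation:
        def __init__(self, kind, operations):
            self.kind = kind                        # type of the transformation
            self.operations = operations            # primitive operations to apply

    class SoftwareEntity:
        def __init__(self, name, stage):
            self.name = name
            self.stage = stage                      # phase in which the entity is defined
            self.is_documented_by = []              # Artefacts documenting this entity
            self.satisfy = []                       # entities of an earlier stage this one refines
            self.depend_on = []                     # entities of the same stage
            self.evolve_to = []                     # (SoftwareTransformation, target entity) pairs

    # A fragment of the functional-requirement model of Figure 2:
    requirement = SoftwareEntity("Requirement", "requirements")
    analysis = SoftwareEntity("RequirementAnalysis", "analysis")
    analysis.satisfy.append(requirement)            # Satisfy: the analysis model refines the requirement
    requirement.is_documented_by.append(
        Artefact("list of scenarios", [Stakeholder("analyst")]))  # the analyst role is hypothetical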
A traceability model for a functional requirement
A traceability model for a functional requirement, defined with our language, is provided in Figure 2.
In this example, the considered concern is "functional requirement" and the traceability model describes the
specification and implementation of its development using a RUP process. Figure 2 details that this development
encompasses 4 main models (each represented by a software entity in bold lines in Figure 2): a Requirement
model, a RequirementAnalysis model, a RequirementDesign model and a RequirementImplementation model.
Each model refines (represented by the relation Satisfy) the initial requirement specification until the definition
of its implementation. For example, our model clearly presents the refinement of the Requirement model by the
RequirementAnalysis model. Moreover, each main
Our language permits a traceability model to be specified at two main levels: an abstract level and a concrete
level. The abstract level establishes the key software entities defined during the specification and
implementation of a concern and the main relations defined between them. The concrete level makes links
between software entities and their corresponding physical supports. The next section
p. 69 of 199
Figure 2: A traceability model for functional requirement
model depends on (represented by DependOn relations in Figure 2) a set of other software entities. Thus, for
example, the RequirementDesign model depends on a software architecture (represented by the software entity
SoftwareArchitecture), which itself depends on a set of components (represented by the software entity
Component).
methodology does not prevent their development, on the other hand it does not facilitate their maintenance. It
then becomes important to describe the life of a non-functional requirement with a traceability model.
Although such works are not numerous, AOSD [10] approaches provide some elaboration processes for the
development of non-functional requirements. We have chosen the one defined in [1] because it is easily
integrated into an industrial software development methodology such as the RUP process. The elaboration
process presented in [10] divides each identified non-functional requirement into smaller parts and converts
them into operations, named operationalizations. The analysis and design phases define the implementation of
each operation by the definition of aspects [11].
The set of these software entities and relations forms an abstract model of the traceability and specifies the
elements that have to be traced. The next step defines links between this abstract model and its concrete
elements. Artefacts and isDocumentedBy relations specify the physical supports of each software entity and
ensure coherence between the abstract elements of the traceability model and the concrete elements that are
really defined in a repository. For example, the Requirement model is documented by a list of scenarios (see
Figure 2) which describes the requirement as a whole.
In this example, the considered concern is "a non-functional requirement" and the traceability model describes
the specifications and implementations of its development using an AOSD process. Figure 3 illustrates this
traceability model for a non-functional requirement.
The next section presents another example of a top-down traceability model and describes the development of
a non-functional requirement.
The model introduces 4 main models for the development of a non-functional requirement. More precisely, it
shows that each non-functional requirement is divided into smaller parts (thanks to the software entities
NonFunctionalRequirement and Operationalization) during a first, common stage. The next stage de-
Traceability models for a non-functional requirement
A top-down traceability model
Currently, standard processes to develop a non-functional requirement do not exist. Although this lack of
p. 70 of 199
Figure 3: A non functional requirement traceability
clearly define a follow-up of a given software evolution. At this stage of our work, we are interested in software
component evolutions.
fines the non-functional requirement analysis, whose definition is built on analysis objects (represented by the
software entity AnalysisObjects) and a dynamic definition of those objects (represented by the software entities
AnalysisObjectInteraction). The most important stage of this process is the design stage, where first aspects
(represented by the software entities AspectClass) are identified in order to implement the non-functional
requirement as a whole.
In this example, the considered concern is "the software component evolution". Its traceability model, expressed
with our language, defines a specification and a realization of this evolution. Figure 4 illustrates the traceability
of a component evolution. The defined traceability gives a static view of the component evolution with
definitions of the component (represented by the software entity Component) and applied transformations
(represented by transformation entities and EvolveTo relations). The whole forms the abstract model of the
component evolution traceability.
The abstract model of the non-functional requirement traceability, defined thanks to software entities and
relations (elements of Figure 3 defined in bold lines), is then completed by Artefact definitions (shadow elements
of Figure 3). For example, during the design stage, each identified aspect is documented by an
"Aspect"-stereotyped class.
Each component specification (type and implementation definition) is documented by corresponding artefacts,
which are Java class files (shadow elements of Figure 4). Software transformations linked to an evolution and
applied to the component are defined by XML files (shadow elements). More precisely, we see that Type,
ImplementationComponent and Component are impacted by an evolution. Each software transformation
definition of this evolution is detailed by a software entity SoftwareTransformation, represented by an
association class. This class describes the type of the transformation and the set of primitive operations to apply
to a software entity.
The next section introduces an example of a temporal traceability model, which permits tracing the evolution of
a non-functional requirement.
A temporal traceability model
Software maintenance represents a fundamental phase in the lifecycle of an application. Currently, approaches
and standards [12, 13, 14] exist to assist maintainers in software evolutions, but none of them defines a global
view on the realized software transformations. With our language, any maintainer can
p. 71 of 199
Figure 4: Traceability of a component evolution
Each software transformation is also documented by an artefact. In our example, the concrete supports of
TypeTransformation, ImplementationComponent and ComponentTransformation are XML files. Thanks to this
traceability model, any maintainer can easily keep track of the realization of the evolution. He knows exactly
which software entities are impacted by transformations and how these transformations are defined.
evolutions to components. Our target is to follow a requirement or non-functional requirement evolution as a
whole. Indeed, the next step aims to combine several temporal traceabilities in order to transform, in a
consistent way, all software entities of a given concern that are impacted by an evolution.
4 CONCLUSION
[1] G. Sousa, S. Soares, P. Borba and Jaelson Castro,
Separation of Crosscutting Concerns from Requirements to Design: Adapting the Use Case Driven Approach (2004) Early Aspects 2004 : Workshop at International Conference on Aspect-Oriented Software
Development.
REFERENCES
This paper presents a language for traceability definition. It permits the modelling of the lifecycle of a concern,
such as a requirement, a non-functional requirement or a simple component. For this, our language is based on
the software entity notion (an abstract representation), where software entities are linked to artefacts (concrete
representations) and stakeholders (authors of representations). Thus, any concern defined in an application has
a clear history of its realization process, of the documentation which defines it and of the authors who define it.
We have also identified two natures of traceability: top-down and temporal traceabilities. Two examples have
illustrated top-down traceabilities and a third one has shown a temporal traceability. During maintenance
activities, these models give valuable help since they improve concern comprehension (understanding what the
concern does and how it is implemented), they permit quick identification of the elements impacted by a change
during an evolution, and they assist maintainers in the realization of the transformations. The quality of software
evolutions is thereby improved. However, our language is not yet complete regarding software evolution
traceability, since we have limited the traceability of software
[2] I. Jacobson, M. Christerson, P. Jonsson and G.
Overgaard Object-Oriented Software Engineering.
A use case driven approach (1994) Addison-Wesley
[3] B. Ramesh and M. Jarke Toward Reference Models
of Requirements Traceability (2001) Software Engineering citeseer.ist.psu.edu/ramesh99towards.html
[4] O. Gotel and A. Finkelstein An Analysis of the Requirements Traceability Problem (1994) Proc.First
Int'l Conference Requirement Eng.
[5] M. Edwards and S. Howell A Methodology for a System Requirement Specication and Traceability for
Large Real-Time Complex Systems (1991) technical report, U.S. Naval Surface Warface Center
[6] D. Palmer Traceability Software Requirement Engineering, R. H. Thayer and Dorfman eds.
[7] V. L. Hamilton and M. L. Beedy Issues of Traceability in Integrating tools (1991) Proc. IEEE Colloquium on Tools and Techniques for Maintaining
Traceability during Design
[8] P. Kroll and P. Kruchten Rational Unified Process
Made Easy : A Practitioner's Guide to the RUP
(2003) Addison Wesley
[9] D. L. Parnas Software aging (1994) ICSE '94:
Proceedings of the 16th international conference on
Software engineering Sorrento, Italy IEEE Computer Society Press
[10] R. Chitchyan, A. Rashid, P. Sawyer, A. Garcia, M.
P. Alarcon, J. Bakker, B. Tekinerdogan, S. Clarke
and A. Jackson Survey of Aspect-Oriented Analysis
and Design Approaches (2005) European Network
of Excellence on Aspect-Oriented Software Development
[11] G. Kiczales, J. Lamping, A. Menhdhekar, C.
Maeda, C. Lopes, J.M. Loingtier and J. Irwin
Aspect-Oriented Programming (1991) Proceedings
European Conference on Object-Oriented Programming Springer-Verlag Berlin, Heidelberg, and New
York
[12] J. S. Collofello and M. Orn A Practical Software
Maintenance Environment (1998) In Proceedings
of the International Conference on Software Maintenance
[13] IEEE Std 1219-1998. Standard for Software Maintenance Institute of Electrical and Electronics Engineers (IEEE), 1998 ISBN 0738103365
[14] ISO/IEC 14764 Information technology - Software
maintenance, edition 1.0 International Organization for Standardization/International Electrotechnical Commission (ISO/IEC), 1 November 1999
p. 73 of 199
p. 74 of 199
User-centric dynamic evolution
Peter Ebraert∗ Theo D’Hondt
Programming Technology Lab
Vrije Universiteit Brussel
Pleinlaan 2
B-1050 Brussel, Belgium
{pebraert, tjdhondt}@vub.ac.be
Yves Vandewoude∗ , Yolande Berbers
Department of Computer Science
KULeuven
Celestijnenlaan 200A
B-3001 Heverlee, Belgium
{yvesv, yolande}@cs.kuleuven.ac.be
Abstract
switches, banking systems, etc. could have unacceptable financial consequences for the companies and their position
in the market.
A possible solution to this problem is redundant systems [8]. Their main idea is to provide a critical system with
a duplicate that is able to take over all functions of the original system whenever the latter is not available. Although
this solution works in practice, it still has some disadvantages. First of all, redundant systems require extra management concerning which software version is installed on
which duplicate. Second, maintaining the redundant systems and switching between them can be hard and is often
underestimated. What would happen for instance when the
switching mechanism fails? Would we have to make a redundant switching mechanism and another switching mechanism for switching between the switching systems? Last,
duplicate software and hardware devices should be present,
which may involve severe financial issues.
Another approach to this problem is dynamic adaptation
of the system. This involves adapting the system while it is
active, but requires that the parts of the system – which are
affected by the update – to be deactivated while the update
is performed [6]. Existing systems (such as [9, 2]), operate on an abstraction level of programming constructs (e.g.
components, objects or methods). Working on this level of
abstraction has the benefit of easily identifying the affected
system parts, as updates will be executed on the same level
of abstractness. However, it has the inconvenience of not
being user-centric, bringing along the difficulty of providing useful feedback to the user – e.g. which functionalities
of the system will not be usable during the update.
We propose to lift the level of abstraction towards system
features. We adopt the definition of Eisenbarth et al. for features [3]. They define a feature as ”an observable unit of behavior of a system which can be triggered by the user”. We
reason about features, but maintain a link between the features and their underlying program constructs. This allows
us to benefit from the two layers of abstraction. On the one
The domain in which we situate this research is that of
the availability of critical applications while they are being
updated. In this domain, the attempt is to make sure that
critical applications remain active while they are updated.
For doing that, only those system parts which are affected
by the update, will be made temporarily unavailable. The
problem is that current approaches are not user-centric, and
consequently, that they cannot provide feedback concerning
which features are deactivated while performing an update.
Our approach targets a four step impact analysis that
allows correct feedback to be given while a software system is dynamically updated. First, the different features are
identified. Second, the system entities that implement those
features are identified. Third, an atomic change sequence
is established for the actual update. Finally, we compare
the atomic change sequence and the implementation of the
features in order to establish a list of features that are affected by the update. This allows us to provide user-centric
feedback in terms of features.
Keywords: dynamic software evolution, change impact
analysis, features
1 Problem statement
An intrinsic property of a successful software application
is its need to evolve. In order to keep an existing application
up to date, we continuously need to adapt it. Usually, however, evolving such an application requires it to be shut down,
because updating it at runtime is generally not possible. In
some cases, this is beyond the pale. The unavailability of
critical systems, such as web services, telecommunication
∗ Authors funded by a doctoral scholarship of the “Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT
Vlaanderen)”
p. 75 of 199
hand, on the programming constructs level, we can easily
identify the affected system parts. On the other hand, on the
feature level, we can identify the affected functionalities of
the system. This opens up possibilities of providing useful
feedback to both the system user and developer. At runtime,
the user can be warned about temporary offline system features. At compile time, the developer can be warned about
the features that will be affected by the update, allowing
him to react in case some features were not supposed to be
affected by the update.
2 General approach
In this research report, we look at evolving applications
at the level of system features. More precisely, we want to
be able to comment which features are affected by a certain
update. For doing that, we first need to identify the different
features the application provides. This can be done by automatic feature extraction techniques [5] or by manual code
annotations.
In the second step we capture the system entities that
are implementing those features. This is currently done by
using static design knowledge of the system and by producing a call graph: a graph of the execution trace consisting of nodes (code statements) and edges (method lookups).
Those call-graphs represent the link between the two levels
of abstraction (the program concepts level and the system
feature level).
In the third step, the application update is rewritten as a
change sequence: a sequence of atomic changes. A change
sequence captures all source code modifications that are
amenable to analysis. In [2] we explained how monitoring techniques can be used to establish the atomic change
sequence.
In the fourth step, a change impact analysis [10] is performed. It takes two major inputs: the call graphs from
the different application features and the change sequence
that was established in the previous step. The basic idea
of the analysis is to compare the atomic changes with the
call graphs in order to establish the affected features list: a
list of features that are affected by the update. Note that –
thanks to the first and second steps of the process – we also
know the affected entities list: the system entities that are
implementing the affected features.
From the moment that all the affected features – and the
program concepts that represent those features – are captured, we can start the actual update process. In order to
avoid corruption, we want to make sure that the affected system entities are in a quiescent state before they are
updated [6]. Entities that are in a quiescent state do not allow incoming messages, and thus make sure that the
entities' state remains consistent. Thanks to the impact analysis, we know exactly which entities we should
deactivate in order
Figure 1. Class diagram of the ATM application
to avoid corruption. After deactivating the affected entities,
we execute the change sequence and reactivate the affected
entities, making sure the update is carried out in a safe way.
Next to that, we are able to give feedback to both the user
and the developer on which features will be affected by the
update. The developer can be warned at compile time on
which features a certain change will impact. The user can
be warned at runtime on which features are temporarily unavailable. Note that this approach adheres to the basic idea of
dynamic updating: carrying out updates in a safe way, without shutting down the entire system.
In order to exemplify our approach we use the example
of a class-based implementation of an ATM machine. Figure 1 shows the class-diagram of the system. We see that
there is a central class called ATM, which is the link between all the classes of the system. The system contains
five features: logging in, making a money transfer, making
a cash withdrawal, making a cash deposit and consulting
the balance. While the final four features are all transactions, the first feature consists of a non-functional feature:
”the user has to be logged in before he can start one of the
transactions”. Throughout this paper, we will continuously
refer to this example to clarify every step of the approach.
3 Change impact analysis
In this section, we describe the four step process of detecting the impact a certain change has on the system features. The outcome is a set of features (and corresponding system entities) that will be affected by a certain update. Before we actually start the change impact analysis,
p. 76 of 199
we must state that we start from the premise that an application consists of features, and that those features can be
identified in the system. The remainder of this section consists of a step-by-step description of this approach.
currently not taking into account changes to the instance
variables.
In the case of the ATM example, we did not perform the complex feature analysis for obtaining the call-graphs.
Instead, we use design information (from collaboration and sequence diagrams) for constructing the call-graphs.
We are aware that this probably results in an incomplete call-graph. This is not a problem, as the goal of the
example is to explain the user-centric approach and not the details of the call-graph mining.
In practice, we could use monitoring techniques [14] for obtaining the execution trace of the features. The
execution traces can directly be modeled as call-graphs, which are in their turn stored in the Starbrowser [13].
Figure 2 shows the call-graphs of all the features of the ATM example. We can see that all transaction features
have a common part in their call-graph. However, we see that in that common part there is a slight difference
concerning the runtime type which is passed when calling complete() on Transaction. This is
due to the fact that there is a dynamic dispatch in that place.
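To make the link between a feature and its call-graph concrete, here is a minimal Python sketch of the call-graph representation used in this approach: nodes are labelled with <C, M> (class, method) tuples and an edge that corresponds to a dynamic dispatch additionally carries the run-time receiver type as a label. Only the labelling scheme comes from the paper; the class name and the ATM method name in the example are hypothetical.

    # Illustrative call-graph: nodes are <C, M> tuples, dynamic-dispatch edges
    # additionally carry the run-time type of the receiver.
    class CallGraph:
        def __init__(self, feature):
            self.feature = feature
            self.nodes = set()       # {(class_name, method_name)}
            self.edges = set()       # {(caller_node, callee_node, dispatch_label_or_None)}

        def add_call(self, caller, callee, dispatch=None):
            self.nodes.update([caller, callee])
            self.edges.add((caller, callee, dispatch))

    # A fragment of the cash withdrawal call-graph (F3) of the ATM example:
    f3 = CallGraph("Cash withdrawal")
    f3.add_call(("ATM", "performTransaction()"),        # hypothetical method name
                ("Transaction", "complete()"),
                dispatch=("Withdrawal", "complete()"))  # dynamic dispatch on Transaction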
3.1 Identifying feature
The goal of this phase is to capture all the different features of the system. Currently, we use design information
(from UML use case diagrams) for identifying the different
system features. However, recent research on feature extraction techniques [5, 1, 4] has shown that automating this
step is feasible.
In practice, we plan to use Starbrowser [13] to model the
different features of the system. Starbrowser is a generic
classification system for Smalltalk that allows the user to
add, modify, delete, view and browse classifications of
source code entities. We model each feature as a classifiable entity and make sure that, by clicking a feature, its
call-graph is shown to the user. The way this call-graph is
established, is explained in the following subsection.
3.3 Establishing the atomic change sequence
ID   Feature              Explanation
F1   Logging in           The user identification process
F2   Money transfer       Transfer money from this account to another one
F3   Cash withdrawal      Withdraw cash money from this account
F4   Cash deposit         Deposit money on the account
F5   Balance consulting   Consult the balance of the account
In order to be able to start a change analysis, we first
need to decompose the update that brings the system from
version S to version S ′ , and capture it into a δS (= a change
sequence [2]). This change sequence captures all modifications to the source code in a list of atomic changes. We
extended the model that was presented in [2] with a few
extra atomic changes which capture variable modifications
and method lookup [10].
Table 1. System Features
Table 1 shows the five features that we identified in the
ATM case. In the rest of the paper, we are considering those
four features.
3.2 Linking features with system entities
As explained in section 1, we want to keep a link between the program construct level and the system feature
level. In order to do that, we need to find the relationship between features and system entities. This is done by
analysing the features and capturing their execution traces. In [7, 1, 4, 5], some dynamic analysis techniques are
presented that capture the actual execution trace of the features. Because it is very hard to predict the actual
execution trace that is used by the features, a conservative superset of the execution trace is captured.
We model each execution trace as a call-graph; a graph in
which nodes represent code statements and edges represent
method calls. Nodes are labeled with a < C, M > tuple,
where C is the class id and M the method which is called.
Edges corresponding to dynamic dispatch are also labeled
with a < C, M > tuple, where C is the run-time type of
the receiver object, and M is the method. Note that we are
Scope      Atomic change   Explanation
Class      AC              Add a Class
Class      DC              Delete a Class
Variable   AV              Add a Variable
Variable   DV              Delete a Variable
Method     AM              Add a Method
Method     DM              Delete a Method
Method     CM              Change the body of a Method
Method     ML              Change the Method Lookup
Table 2. Atomic Changes
Table 2 summarizes the set of atomic changes. The first
and most simple atomic changes incorporate added classes
(AC), deleted empty classes (DC), added variables (AV),
deleted variables (DV), added methods (AM), deleted methods (DM) and changed method bodies (CM). The last kind
of change consists of changes in the method lookup (ML).
Note that a change to a method body is captured by only
one CM, even if it consists of many changes.
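As a small illustration of how such a change sequence can be represented, the following Python sketch encodes the atomic change types of Table 2 and the change sequence of the 1000-Euro update shown in Table 3 below. The enum and dataclass are our own illustrative encoding, not the monitoring tool described in the paper.

    from dataclasses import dataclass
    from enum import Enum

    class AtomicChangeType(Enum):
        # The eight atomic change types of Table 2.
        AC = "Add a Class"
        DC = "Delete a Class"
        AV = "Add a Variable"
        DV = "Delete a Variable"
        AM = "Add a Method"
        DM = "Delete a Method"
        CM = "Change the body of a Method"
        ML = "Change the Method Lookup"

    @dataclass(frozen=True)
    class AtomicChange:
        kind: AtomicChangeType
        target: tuple                # the <class, method> the change applies to

    # The change sequence A = {A1, ..., A6} of Table 3 (all CM changes).
    CHANGE_SEQUENCE = [
        AtomicChange(AtomicChangeType.CM, ("CustomerConsole", "getWithdrawalInformation()")),
        AtomicChange(AtomicChangeType.CM, ("CustomerConsole", "getDepositInformation()")),
        AtomicChange(AtomicChangeType.CM, ("CashDispenser", "DispenseCash()")),
        AtomicChange(AtomicChangeType.CM, ("Withdrawal", "complete()")),
        AtomicChange(AtomicChangeType.CM, ("EnvelopeAcceptor", "acceptEnvelope()")),
        AtomicChange(AtomicChangeType.CM, ("Deposit", "complete()")),
    ]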
In practice, the developer has to use a tool which is appropriate for applying updates to the system. This tool
supports all the atomic changes that were shown in Table 2 and mon-
p. 77 of 199
Figure 2. Call graphs of the system features
3.4 Finding affected features
itors the changes the developer is applying. Those changes
are captured and stored in the atomic change sequence.
In this final step of the impact analysis, we want to establish the set of affected features (and their
corresponding system entities). Next to that, we also want to specify which specific atomic changes affect the
features of the system. This can be used as extra feedback to the developer, telling him which atomic changes
affect which system features. For doing that, we first need to establish the transitive closure of dependent
atomic changes. An atomic change Ai is said to be dependent on another atomic change Aj:
Aj ← Ai   (Ai is dependent on Aj)

if applying Aj without applying Ai is conflicting (i.e., results in a syntactical error). Taking this into account, we
establish a partial order over all the atomic changes of an atomic change sequence A.

Imagine that the government issues new bank notes of 1000 Euros. Our ATM machine will from now on have
to accept cash deposits of this new bank note. Next to that, it must also be able to dispense the new bank
notes. It is clear that this requirement brings along the need for both a hardware and a software update. Table 3
shows the atomic change sequence A = {A1, A2, A3, A4, A5, A6} of the software update that we need to apply
for obtaining the desired behavior.

ID   Type   Details
A1   CM     <CustomerConsole, getWithdrawalInformation()>
A2   CM     <CustomerConsole, getDepositInformation()>
A3   CM     <CashDispenser, DispenseCash()>
A4   CM     <Withdrawal, complete()>
A5   CM     <EnvelopeAcceptor, acceptEnvelope()>
A6   CM     <Deposit, complete()>
Table 3. Atomic Change Sequence of the update

A feature Fk is determined to be affected by Ai:
Ai ⇐ Fk
(Fk is affected by Ai )
if its call graph contains either (i) a node that corresponds
to an atomic change of type CM (changed method) or DM
p. 78 of 199
(deleted method) change, or (ii) an edge that corresponds to
an atomic change of type ML (lookup) change.
4 Dynamic updating
A feature Fk is said to be affected by an atomic change
sequence A:
From the moment we know which features (and their
system entities) will be affected by the update, we are able
to start the actual updating process. In this process, we first
need to deactivate [2] those entities, in order to make sure
they remain in a quiscent state [6] while the update is actually carried out. For doing this, we make use of a dynamic update framework that was previously presented in
[2]. This framework uses a wrapper approach to stop incoming threads from running through the deactivated entities. Those threads are put into a waiting queue and are
handled after reactivation. We also intercept the different
entry points of the affected features (the first entity of the sequence or collaboration diagram) and make them display a
widget that tells the user that this feature is currently offline
because it is being updated, and that he should try again
later.
After this, we perform the actual updates on the system.
This is done by using the interceptive reflectional capabilities that are offered by a runtime API (presented in [2]).
Once the update is completely carried out, we can reactivate
the stopped entities and reset the entry points of the affected
features. Note that the developer can also use the affected
feature list as a means of feedback on the impact of his update. It is perfectly possible that it turns out some features,
that were not meant to be affected by an update, would be
affected anyway. If that would be the case, the developer
can react, before actually carrying out the update.
A ⇐ Fk   (Fk is affected by A)

if there is at least one atomic change Ai which is affecting Fk:

A ⇐ Fk ⇔ ∃Ai ∈ A : Ai ⇐ Fk

In order to determine the set of atomic changes that are affecting a feature Fk – written AF(A, Fk) – we say that
an atomic change Ai is affecting a feature Fk if Fk is affected by Ai, or if there exists another atomic change Aj
which is affecting Fk and on which Ai is dependent.

AF(A, Fk) ≡ {Ai ∈ A | Ai ⇐ Fk} ∪ {Ai ∈ A | ∃Aj ∈ A : Ai ← Aj ∧ Aj ⇐ Fk}
We can say that a feature Fk is not affected by a software update A if AF(A, Fk) = ∅. From the moment we have
AF(A, Fk) for all the features of the system, we can (i) give feedback to the developer about which parts of the
atomic change sequence will affect which features, and (ii) start the dynamic updating process.
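The analysis above maps directly onto a small computation over the call-graphs and the change sequence. The sketch below reuses the illustrative CallGraph and AtomicChange structures from the earlier fragments: a feature is directly affected when its call-graph contains a node touched by a CM or DM change or a dispatch edge touched by an ML change, and the set AF(A, Fk) is then closed over the dependency relation. It is only an illustration of the rules given in this section, not the authors' implementation; the dependency test is passed in as a caller-supplied predicate.

    def directly_affected(call_graph, change):
        """Fk is affected by Ai if a CM/DM node or an ML edge of Ai occurs in Fk's call-graph."""
        if change.kind in (AtomicChangeType.CM, AtomicChangeType.DM):
            return change.target in call_graph.nodes
        if change.kind is AtomicChangeType.ML:
            return any(edge[2] == change.target for edge in call_graph.edges)
        return False

    def affecting_changes(change_sequence, call_graph, is_prerequisite_of):
        """AF(A, Fk): the directly affecting changes plus the changes they require.

        is_prerequisite_of(ai, aj) should hold when Ai <- Aj, i.e. Aj cannot be
        applied without Ai."""
        affecting = {a for a in change_sequence if directly_affected(call_graph, a)}
        changed = True
        while changed:                               # closure over the dependency relation
            changed = False
            for ai in change_sequence:
                if ai not in affecting and any(is_prerequisite_of(ai, aj) for aj in affecting):
                    affecting.add(ai)
                    changed = True
        return affecting

    # A feature Fk is unaffected by the update exactly when its AF set is empty, e.g.:
    # affecting_changes(CHANGE_SEQUENCE, f3, lambda ai, aj: False)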
5 Future work
Tables 1, 2 and 3 respectively show the different system features, the call-graphs, and the atomic change
sequence of the ATM example. Analysing those tables and applying the technique explained above results in:
As we have already mentioned in the paper, we only use design information for obtaining the link between
features and system entities. Since there are systems that do not have this design information available, or for
which the design information is incorrect, a better technique is required. For this we are planning to incorporate
feature extraction techniques. They could assist us in (i) identifying the different features and (ii) discovering
their execution traces.
We intentionally skipped discussing the atomic changes that concern instance variables. However, we already
investigated how these atomic changes impact the system. In [11, 12] we discuss how the state is mapped from
one version to another.
Currently, we are only experimenting with the toy example we explained throughout this paper. However, we
understand that it is a must to perform some real-world case studies to really test this approach. Executing
some benchmark tests on these cases would allow us to answer the question of whether real systems can be
split up into different features which can be deactivated separately. We
AF (A, F1 ) = ∅
AF (A, F2 ) = ∅
AF (A, F3 ) = {A1 , A3 , A4 }
AF (A, F4 ) = {A2 , A5 , A6 }
AF (A, F5 ) = ∅
This summarizes the findings of the analysis; the impact the
atomic changes have on the different ATM features. From
this moment on, we know for certain that both the features
F3 and F4 will be affected by the update. This knowledge
can be used for two kinds of feedback: (i) telling the programmer at compile-time that the update will affect those
features, (ii) telling the user at runtime that those features
are currently offline due to an update. Next to that, the
knowledge can also be used for knowing which entities need
to be deactivated before starting this update. How this is
done, is explained in the following section.
p. 79 of 199
could easily imagine a system which features are so tangled
so that all of them will always be affected by some update.
Currently, we are working on the implementation of the approach. While some parts are already finished, others, such as the feature classification in the Starbrowser, are not.
We must say, however, that we already see some possibilities for optimisations concerning speed and user friendliness:
• Only deactivating features during critical parts of the
update
• Reordering the atomic changes of the update, in order
to provide shorter deactivation periods
• Determining which parts of the update could be postponed in order to leave some features active.
However, it is not yet clear to what extent these optimisations can be formalised.
As a lot of programs are currently being written in the aspect-oriented programming paradigm in order to disentangle the different features, we think that our approach should be extended to support this. That is why we foresee the addition of aspectual atomic changes: atomic changes that capture the notions of aspect-oriented programming. This way, we could apply the same approach to the evolution of aspect-oriented programs.
6 Conclusion
In this paper, we propose a user-centric approach to dynamic evolution of applications. This approach involves
two layers of abstraction. The concrete layer – the program constructs layer – consists of system entities and can
be used for reasoning about source code. The abstract layer
– the feature layer – consists of the different system features
and is used for reasoning about system features.
Thanks to the fact that we maintain a link between the
two layers, we are able to do a change impact analysis on
the program constructs layer, and reason about it on the feature layer. This allows us to return feedback about the system features. It is this kind of feedback that separates this
approach from existing approaches to dynamic software evolution, as this feedback tells the user whether some feature can currently be used or not. Next to that, it can be used by the developer to see whether a certain update affects some unexpected system features.
References
[1] G. Antoniol and Y. Guéhéneuc. Feature identification: A novel approach and a case study. In Proceedings of ICSM 2005 (21st International Conference on Software Maintenance), 2005.
[2] P. Ebraert, T. Mens, and T. D'Hondt. Enabling dynamic software evolution through automatic refactorings. In Proceedings of the Workshop on Software Evolution Transformations (SET 2004), pages 3–7, 2004.
[3] T. Eisenbarth, R. Koschke, and D. Simon. Locating features in source code. IEEE Transactions on Software Engineering, 29(3):210–224, March 2003.
[4] A. Eisenberg and K. De Volder. Dynamic feature traces: Finding features in unfamiliar code. In Proceedings of ICSM 2005 (21st International Conference on Software Maintenance), 2005.
[5] O. Greevy, S. Ducasse, and T. Gîrba. Analyzing feature traces to incorporate the semantics of change in software evolution analysis. In Proceedings of ICSM 2005 (21st International Conference on Software Maintenance), pages 347–35, 2005.
[6] J. Kramer and J. Magee. The evolving philosophers problem: Dynamic change management. IEEE Transactions on Software Engineering, 16(11):1293–1306, November 1990.
[7] G. C. Murphy, A. Lai, R. J. Walker, and M. P. Robillard. Separating features in source code: An exploratory study. In Proceedings of the 23rd International Conference on Software Engineering, pages 275–284. IEEE Computer Society, 2001.
[8] P. O'Connor. Practical Reliability Engineering. Wiley, 4th edition, 2002.
[9] M. Oriol. An Approach to the Dynamic Evolution of Software Systems. PhD thesis, Université de Genève, 2004.
[10] X. Ren, F. Shah, F. Tip, B. G. Ryder, and O. Chesley. Chianti: A tool for change impact analysis of Java programs. In Proceedings of the International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'04), 2004.
[11] Y. Vandewoude and Y. Berbers. Fresco: Flexible and reliable evolution system for components. In Electronic Notes in Theoretical Computer Science, 2004.
[12] Y. Vandewoude and Y. Berbers. DeepCompare: Static analysis for runtime software evolution. Technical Report CW405, K.U. Leuven, Belgium, February 2005.
[13] R. Wuyts. Starbrowser.
[14] R. Wuyts. SmallBrother - the big brother for Smalltalk. Technical report, Université Libre de Bruxelles, 2000.
p. 80 of 199
On the use of Measurement in Software Restructuring
Naji Habra 1, Miguel Lopez 2
Keywords: Measurement, Software Measurement, Software Measurement Design, Software Evolution, Refactoring Verification.
1. Abstract
A quick review of the software restructuring literature (e.g., [Deme00], [Erni96], [Mens01], [Mens04], [Simo01], [Tahv03]) allows us to make two observations. On the one hand, there is a real interest in the use of so-called software "metrics" to support the process of restructuring. On the other hand, in most cases this use is very ad hoc, without questioning the well-foundedness of the measurement. This is probably due to the usual overspecialized view of research in SE, where measurement is treated as a separate research topic.
This paper points out various gaps and weaknesses in the use of measurement in general, and in its use for restructuring support in particular. It then argues for a broad and sound use of measurement in software restructuring research. The goal is to improve the measurement methods used and to propose directions for defining new ones. Our belief is that such a goal can be reached through a close collaboration between measurement method designers and experts of the domain itself, i.e. software restructuring experts. The rationale is that measurement design in SE consists mainly in the encapsulation and formalization of experts' knowledge.
2. Introduction: measurement in software restructuring
In the software restructuring literature, particularly in academic works, so-called software metrics are frequently mentioned (e.g., [Deme00], [Erni96], [Mens01], [Mens04], [Simo01], [Tahv03]) as a tool to support restructuring activities. The motivation is clear: at the heart of this domain there is a software product which is transformed with the aim of modifying some of its qualities. And, as in any engineering discipline, sound control of such a transformation requires a quantification of the qualities concerned.
1 University of Namur, Belgium, [email protected]
2 University of Namur, Belgium, [email protected]
p. 81 of 199
For example, in a work like the survey [Mens04], expressions like "more complex", "less complex", "reduce complexity", etc. appear dozens of times. Complexity thus seems to be a basic concept on top of which the whole restructuring approach is built. The underlying question is the following: what exactly is "complexity", how can it be measured, or at least, on the basis of which sound and scientific argument can one claim that a given restructuring increases or decreases complexity?
In addition to "complexity", other qualities are also involved. Some are internal qualities more or less related to the software structure (e.g. coupling), others are external "qualities" (e.g., extensibility, reusability, efficiency…). The issue is the same: how to define such qualities and how to measure them directly or, at least, how to connect them rationally to simpler measurable qualities?
The way measurement is used hides several theoretical problems that are worth solving separately. Roughly, the use of "metrics" seems to be reduced to the following question: given a software product (usually a program, but it could be another artifact), which kinds of numbers would help us focus our restructuring effort in a predictive way and/or follow the restructuring effect in a retrospective way? In practice, the "numbers" are preferably obtained through easy procedures like simple ad-hoc counting and existing techniques and tools, in particular those based on OO metrics suites (e.g. [Chida94], [Lore94], [Hend96], [Brit95]). However, a problem that cannot be avoided in such approaches is the validity of the selected "metrics".
A typical process of metrics utilization can be roughly sketched as follows: starting with a given property (e.g., a property to be improved by restructuring, a property to be detected to focus restructuring, the fact that a given restructuring activity has been applied,…), one defines heuristics based on some intuitive knowledge shared by software engineers; then one tries to find "indicators", based on some calculation from available "metrics", that confirm or refute the sought property. Eventually, statistical techniques are used to assess the results. Two observations can be made:
• Some works are precise about the process sketched above (e.g. [Deme00]) and follow established principles for empirical studies, such as GQM. In some other works, it is not easy to make a sharp distinction between the different levels: the property to be detected, the heuristic, and the indicators. Consequently, it is not easy to perform any validation.
• The approach sketched above, which focuses on the selection of ad-hoc metrics from existing ones, represents one process. We claim that another, complementary process, focusing on the definition of the property and the design of the measurement, would also be interesting.
The complementary process could be sketched as follows: starting from the intuitive knowledge about the properties of interest (i.e., the qualities to be improved or to be avoided), we formulate precise definitions of those qualities through a precise model, and we design measurement methods on the basis of that model. Then we carry out experiments to find out whether a proposed restructuring technique fulfills its requirements in terms of improving the defined qualities.
The position of this paper is that, at this stage of the maturity of our discipline, it would be very fruitful to follow such a process. Moreover, this could be iterated in a cyclic way until stable knowledge and stable measurement methods are reached. Section 3 recalls some fundamental concepts and vocabulary of software measurement. Section 4 presents a number of questions about the use of measurement in software restructuring, and Section 5 proposes an enlarged use of measurement in the domain of software restructuring.
p. 82 of 199
3. Summary of some Software Measurement Fundamental Concepts and Vocabulary
Following the representational theory, according to which measurement is a homomorphic mapping between an empirical world and a numerical world [Fent94, Fent97, Kitc95], and other classical measurement works, an extensive framework is proposed in [Habr06]. The framework proposes definitions for the different products and activities involved in the measurement lifecycle and it fixes the vocabulary*. It structures that lifecycle in three successive phases: measurement design, measurement application and measurement use. The design of measurement is particularly important in the domain of software, where there is a significant lack of well-established theories and of widely accepted measurement methods.
The design of a measurement method should satisfy a minimal set of criteria, necessary to avoid (at least) producing as a result numbers that do not represent the attribute being measured.
• First, the measurement method proposed (e.g. counting some edges and nodes in some graph) should be based on a precise definition of what the method is expected to measure, i.e., a modeling of the empirical world. That modeling implies determining precisely the entity concerned (a requirements document, an algorithm, a piece of code…) and the attribute (the quality) being measured (size, complexity, coupling…), and building a model for them. In addition, the model built should represent a consensual view of the domain's knowledge, since the aim of measurement is to provide a basis for comparison.
• Another kind of criterion to be verified lies on the numerical world side. In fact, as measurement is a mapping from an empirical structure to a numerical structure, the latter should present some formal properties to ensure that the mapping is indeed a homomorphism. The homomorphism guarantees that the ordering observed at the empirical world level is preserved at the numerical world level. For example, the fact that "the size of MS Word is greater than the size of Notepad" should be kept in the numerical world; that gives (assuming size is measured through lines of code): "the number of lines of code of MS Word is greater than the number of lines of code of Notepad". (A minimal formalization of this condition is sketched after this list.)
It is important to notice that this property is sometimes hard to establish. For instance, one can argue that the cyclomatic complexity measurement method of McCabe [McCab76] does not take into account, at the numerical world level, the nesting level of decision nodes, whereas such nesting, at the empirical world level, leads to greater complexity (that is, a series of nested "if" statements is intuitively "more complex" than a flat sequence of "if" statements).
• The homomorphism requirement classically leads to constraints on the scale ("ordinal", "interval", "ratio"…). More generally, this requirement constrains the target numerical structure (acceptable values and acceptable operations). For example, if it is established as domain knowledge that a set of programs can be sorted in terms of a given property P without any notion of distance between programs (more formally, if we only have a weak order induced by P), it would make no sense to map the empirical structure (the set of programs) to a numerical structure like the reals R with addition and multiplication operations.
• Of course, the measurement method, described as a set of operations, should be correct with respect to the attribute definition described by the model. In other words, it should represent a correct operationalisation of that definition.
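As a minimal formal sketch (our notation, not taken from [Habr06]): if ⪰ denotes the empirical ordering of programs with respect to the attribute of interest and μ is the measurement mapping, the homomorphism condition of the second criterion reads

μ(p1) ≥ μ(p2) ⇔ p1 ⪰ p2, for all programs p1, p2.

The scale constraint of the third criterion can be phrased in the same style: on an ordinal scale, any strictly increasing transformation g yields an equally acceptable measure μ' = g ∘ μ, so only statements that are invariant under such transformations (comparisons, medians) are meaningful, whereas sums or means of the measured values are not.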
* The terms metric and metrics are avoided. Though they are widely used, we believe that their use causes ambiguity
and possibly confusion by suggesting erroneous analogies, e.g. with the mathematical metric in topology, with the
metric system of units, etc.
p. 83 of 199
External qualities (e.g. evolvability, maintainability) are often evaluated on the basis of one or more internal ones (e.g. coupling, complexity), where some "numbers" are associated with these qualities (e.g., through a mathematical formula). For those numbers, we prefer to talk about "indicators" or "heuristics". Indeed, such numbers are generally not measurement results stricto sensu (i.e., issued from a mapping of a well-defined property in the empirical world into the numerical world). Moreover, those "indicators" are used in turn to make some "prediction" of other properties related to the behavior of the product within a real environment (the effort needed to maintain, to restructure, to understand, the average number of bugs, etc.). Such behavioral properties, which can in turn be measured, should not be confused with the indicators.
Evaluating external qualities raises other kinds of questions and thus requires additional verification. Two kinds of problems arise:
• The first problem is related to the vagueness or misunderstanding of the relationship between an "indicator", seen as a measurement of some external quality, and the real-world behavior we intend to predict. For example, the time and cost actually needed for maintenance depend on two families of factors: factors related to the software product itself (e.g. structural properties) and factors related to the environment (e.g., people's skills). Therefore, when we talk about an external quality like "maintainability", it is worth keeping in mind that we are evaluating a quality of the software product that influences the maintenance cost. When using such indicators one should not forget the hypothesis that the environment factors are supposed to be fixed, a hypothesis that can of course be questioned.
• The second problem is related to the exact relationship between the external quality for which we compute an indicator and the internal qualities measured to build such an indicator. It is necessary to have an explicit model describing the relationship between the external quality (e.g. maintainability) and the internal quality measurements used to build it (e.g. coupling, complexity…). Such a model would allow making explicit the different contextual conditions (e.g. the language, the software type…) and thus avoiding a misuse of the obtained indicator. Moreover, this kind of model must also be tested through experimental and empirical studies, but this is very difficult to perform, mainly due to the lack of reliable industrial data.
Finally, our belief is that in the case of an immature domain like software, it would be acceptable and even necessary to first propose temporary definitions and models, together with their related measurement models, and then to allow their refinement as our knowledge of the domain improves.
4. Questions about the use of Measurement in Software Restructuring
On the basis of the verification criteria developed above, a quick review of the software restructuring literature leads to the following questions.
• A first question is related to the extensive use of some qualities (e.g. properties like coupling, cohesion, complexity…) which have, in fact, multiple definitions and sometimes multiple measurement procedures for the same definition. In previous works we investigated this problem in the case of coupling [Lope05] and complexity [Abra04]. Selecting a given definition and a corresponding measurement procedure is not neutral.
We think we should not select the definition of a quality and the measurement method simply because they are convenient for confirming our work. The definition of the quality and the related measurement method should be established before their use and should correspond to our knowledge about the property. As the improvement of the property is part of the goals aimed at by the restructuring, a precise definition of that property should be given at the time the goal is established.
p. 84 of 199
• On the other hand, the model establishing the relationship between internal properties (e.g. coupling) and external ones (e.g. evolvability, maintainability) is not always explicit and justified. This makes it impossible to distinguish the case where a system has really improved with respect to its external quality from the case where the indicator is simply inadequate. The fact that a given indicator of an "external quality" allowed us to observe some change (i.e. a number increased or decreased) is not sufficient by itself to draw a conclusion. Some factors having a determinant influence could be erroneously ignored by the underlying model as long as this model is not explicit.
• Some measurement methods are used without any questioning of the context for which they were designed, and this can lead to very improper use. A typical example is the use of McCabe's cyclomatic complexity number (CCN) in an object-oriented context, while this measurement method was designed for Fortran. In previous work [Abra04] [Lope05] we showed that such use is no longer justified (at least as long as the property intended to be measured is "complexity").
• Another important problem is related to the shift that is frequently made from one product (e.g., the design) to another (e.g. the code) with no justification. As we saw above, sound measurement methods are built to measure a given attribute of a given entity (a product). The result of a measurement made on one attribute of one product (e.g. the coupling of a given piece of code) is not directly usable as an adequate measurement of a quality of another product (e.g. the design). Of course, there is a relationship between the design and its corresponding code. But the question is to determine under which conditions it makes sense to use a number obtained from measuring a piece of code to reason about the quality of the related design.
Our belief is that this question is far from trivial. Moreover, questioning the relationship between structural attributes of the design, on the one hand, and the corresponding structural attributes of the code, on the other hand, would certainly improve our understanding of the deep meaning of some important properties currently mentioned in SE. Of course, this would lead to complex investigations, because a design is not mapped to its corresponding code through a one-to-one relationship but through different kinds of transformations, which are seldom studied from this point of view.
5. Towards another use of Measurement in Software Restructuring
Answering the above questions implies making explicit the various implicit hypotheses behind the use of measurement methods. This would lead to a broader, sounder and more beneficial use. Following the evolution phases of the framework of [Mens04], software measurement is involved in two phases:
• In the phase of identifying what should be refactored. Different works have suggested identifying "bad smells". Of course, the term "bad smells" by itself suggests the existence of some properties which are not (or not yet) completely formalized and understood, and which therefore cannot be "measured". But it also suggests that experts are able to detect those "bad smells". So an important research orientation would consist in the formalization of the experts' knowledge of such kinds of attributes.
That formalization would represent a sound basis for the design of measurement methods, though it would be laborious. Concretely, it would involve different parts: the definition of the external quality(ies) for which we need "indicators" (readability, maintainability, evolvability,…), the definition of the internal qualities potentially concerned (usually structural attributes related to complexity and size), and the modeling of the relationship between those two levels.
p. 85 of 199
Our first attempt in this direction already raises the fundamental question of which product is actually concerned. Does our intuitive knowledge about "bad smells" correspond to an attribute of the code, an attribute of the design, or even an attribute of some composite product involving both? (This non-conformist idea seems appealing if we consider, for example, that the understandability of a given piece of code, and therefore its maintainability, is higher for code accompanied by well-structured UML design schemas than for the code alone.)
The proposal is to reach valid models through an iterative process. The idea is to propose a first approximation of the qualities concerned on the basis of experts' knowledge, together with its relationship to internal attributes and their related measurement methods. Then, empirical studies with actual restructurings can be carried out and, if necessary, the definitions can be refined iteratively. For example, if the results show that the indicators do not sufficiently reflect the intuitive understanding, the definitions of the qualities and the associated measurement methods should be revised and experimented with again.
• In the phase of assessing the restructuring effect, the same observation can be made. The use of measurement is suggested as one approach, among others, to assess the restructuring effect. Here also, a complete quality model can (should?) be used as a basic framework to support the whole process, and not only in an ad-hoc way through some existing measurement methods. As above, a complete model would involve the precise definitions of the different properties involved, that is, internal properties like size, complexity, coupling, cohesion… on the one hand, and external properties like robustness, extensibility, reusability, etc. on the other hand. And, of course, this quality model would integrate the relationships (preferably experimentally proved) between the different qualities. Each restructuring then becomes an experiment that allows us to refine the definitions and the models, and thereby to improve our fundamental knowledge in SE.
6. Conclusion
In this paper, we have discussed the possibility of improving the measurement approaches used in the software restructuring domain. As in other domains of software engineering, the software measurement methods need to be clarified. Indeed, it would be very useful to first clearly define the qualities (coupling, complexity, over-elaborated structures…) involved in software evolution studies. Secondly, it would be interesting, based on these non-ambiguous definitions, to build operational and reproducible measurement methods. And, thirdly, there is an urgent need to verify the assumed relationships with reliable empirical studies.
In summary, our belief is that combining the principles of measurement design issued from measurement research with the knowledge provided by restructuring experts would give rise to a very promising research axis. The underlying stake is our deep understanding of the essential properties commonly used by software engineers.
p. 86 of 199
7. Bibliography
[Abra04] A. Abran, M. Lopez & N. Habra, "An analysis of the McCabe Cyclomatic complexity number". Paper presented at the 12th International Workshop on Software Measurement IWSM 2004, Königs Wusterhausen, Germany, 2004.
[Brit95] F. Brito e Abreu, "The MOOD Metric Set", Proceedings of the ECOOP'95 Workshop on Metrics, 1995.
[Chida94] S. R. Chidamber & C. F. Kemerer, "A Metrics Suite for Object-Oriented Design", IEEE Transactions on Software Engineering, Vol. 25, n°5, June 1994, pp. 476-493.
[Deme00] S. Demeyer, St. Ducasse, O. Nierstrasz, "Finding Refactorings via Change Metrics", Proceedings of OOPSLA'00, ACM, 2000.
[Erni96] K. Erni & C. Lewerentz, "Applying Design-Metrics to Object-Oriented Frameworks", Proceedings of Metrics 96, IEEE, 1996.
[Fent94] N. Fenton, "Software Measurement: A Necessary Scientific Basis", IEEE Transactions on Software Engineering, Vol. 20, n°3, March 1994, pp. 199-206.
[Fent97] N. Fenton & S. L. Pfleeger, Software Metrics – A Rigorous and Practical Approach, 2nd edition, International Thomson Computer Press, London, 1997.
[Habr06] N. Habra, A. Abran, M. Lopez & A. Sellami, "A Framework for Software Measurement Design", Research Report RR33/06, Institut d'Informatique, FUNDP Namur.
[Hend96] B. Henderson-Sellers, Object-Oriented Metrics: Measures of Complexity, Prentice-Hall, 1996.
[Kitc95] B. Kitchenham, S. L. Pfleeger & N. Fenton, "Towards a Framework for Software Measurement Validation", IEEE Transactions on Software Engineering, Vol. 21, n°12, December 1995, pp. 929-944.
[Lope05] M. Lopez, A. Abran, N. Habra & G. Seront, "On the Impact of the Types Conversion in Java onto the Coupling Measurement", in 15th International Workshop on Software Measurement (IWSM'05), October 2005, Shaker Verlag, Montreal, Canada.
[Lore94] M. Lorenz & J. Kidd, Object-Oriented Software Metrics: A Practical Approach, Prentice-Hall, 1994.
[McCab76] T. J. McCabe, "A Complexity Measure", IEEE Transactions on Software Engineering, Vol. 2, No. 4 (1976), pp. 308-320.
[Mens01] T. Mens & S. Demeyer, "Future Trends in Software Evolution Metrics", IWPSE 2001.
[Mens04] T. Mens & T. Tourwé, "A Survey of Software Refactoring", IEEE Transactions on Software Engineering, Vol. 30, No. 2, February 2004.
[Simo01] Fr. Simon, Fr. Steinbrückner & C. Lewerentz, "Metrics Based Refactoring", in Proceedings of the Fifth European Conference on Software Maintenance and Reengineering, March 2001, pp. 30-38.
[Tahv03] L. Tahvildari & K. Kontogiannis, "A Metric-Based Approach to Enhance Design Quality Through Meta-Pattern Transformations", in Proceedings of the 7th European Conference on Software Maintenance and Reengineering, CSMR'03, IEEE, 2003.
p. 87 of 199
p. 88 of 199
Responsibility-Steering Automation of Software
Evolution
Ming-Jen Huang 1
School of Information Science
Japan Advanced Institute of Science and Technology
Nomi-City, Japan
Takuya Katayama 2
School of Information Science
Japan Advanced Institute of Science and Technology
Nomi-City, Japan
Abstract
Model-driven development (MDD) proposes the notion of explicitly modeling each aspect and then having machines generate the concrete implementation. With MDD, it is possible to automate the process of software construction, because a new implementation can be generated as specifications change. We consider that the model of a target software system should explicitly describe both concepts at different abstraction levels and the relations between them, and that the human knowledge of software design can be expressed as transformation rules. Combining the model and the rules, a new implementation can be generated once the model is changed. In this paper, we outline our technology that combines MDD and a rule-based approach for automatic software evolution. We describe the ideas, the approach, and some initial experience gained from our experimental project.
Key words: Model-Driven Development, Rule-Based Approach,
Responsibility, Business Domain
1 Introduction
Due to the rapidly changing business environment and rapidly emerging technologies, handling software evolution has become one of the major issues in
the software industry [1,2]. Model-driven development (MDD), the software
1 Email: [email protected]
2 Email: [email protected]
© 2006 Published by Elsevier Science B.V.
p. 89 of 199
construction approach that promotes the automatic generation of software
systems from model definitions, can be an ideal form of software construction.
A general scenario of this approach is that system users and developers work
together to build a model of the target domain that only contains concepts of
that domain; then, by pressing a button on the screen, a software system will
be generated.
In this paper, we outline the REsponsibility-STeering Development Architecture (Restda), an automatic software construction solution that applies the ideas of model-driven development (MDD) and a rule-based approach to handling software evolution. We use the concept of responsibilities in many aspects of software construction, including domain modeling, consistency verification, and generation rule definitions.
Using responsibilities for software construction is not our original idea.
Responsibilities have already been recognized as a powerful tool for specifying requirements and designing software systems [6,8] , for example, in
Responsibility-Driven Design [8], an informal approach to building software
systems. Despite their widely recognized importance, however, they have
never been precisely defined. Restda draws its inspiration from the concept.
We considered that if we could formally define responsibilities, use the definition to model responsibilities of our world, and find connections between the
real-world responsibilities and those of a software system, then a model-driven,
automatic generation system could be used for construction and evolution of
software systems.
The goal of Restda is to benefit both system users and developers. System users model their daily work in terms of their responsibilities without any
concern for technology details. Developers turn the responsibilities that have
been defined by the users into responsibilities that should be taken by the
target system. The responsibilities defined by system users and developers
are called domain responsibilities and system responsibilities respectively in
Restda because they concern the target domain and the target system. The human knowledge of software design is expressed by two more types of responsibilities in Restda: architectural responsibilities and technology responsibilities. Architectural responsibilities concern the structure of the target system. Technology responsibilities concern the characteristics of a specific technology platform. With these four types of responsibilities, the target system can evolve in three aspects of software construction: domain modeling, system specifications, and underlying technology.
The remainder of this paper is organized as follows. In Section 2, we
give an overview of the proposed development architecture. In Section 3, we
describe the Restda metamodel. Section 4 explains how the transformations
between models are performed in terms of responsibilities. Section 5 contains
concluding remarks and describes future work.
p. 90 of 199
2 Restda Overview
Restda is coherently integrated through responsibilities. As mentioned above, there are four types of responsibilities: domain, system, architectural, and technology. Each type describes a specific aspect of software construction. Domain responsibilities describe responsibilities of the target domain that exist in our real world. System responsibilities describe responsibilities of the target system that exist in the software world. Architectural and technology responsibilities describe responsibilities of the technology world. We call these different abstraction levels worlds because they actually form a layered structure in software construction. By using the concept of levels of worlds, system users can easily identify the responsibilities that should be defined in domain models, and developers can easily decide which responsibilities should be included when writing rules for model transformations.
The Restda development process can be summarized in three steps. Fig. 1 depicts the process.
• Step 1: System users model the actors who participate in scenarios and their domain responsibilities from the perspective of the real world, without concern for the details of the target system. The domain responsibilities represent a piece of the task to be accomplished by the actors, without details about how the task is to be performed. Different from the domain responsibilities, the system responsibilities represent the detailed steps that should be performed by the target system. It is a change of perspective from human processing to machine processing. System users, usually with help from developers, have to give each responsibility pre- and post-conditions that must hold before and after the accomplishment of the responsibility.
• Step 2: Developers refine the domain responsibilities that should be handled by the target system. They refine the domain responsibilities into smaller, computable responsibilities that should be taken by the target system. For example, if a sales staff member has the responsibility to create a purchase order for a customer, then it can be refined into (1) getting the customer data, (2) getting the purchased items data, and (3) creating the purchase order. Developers also must give each responsibility pre- and post-conditions.
• Step 3: Machines validate, verify, and transform the models mentioned above into a concrete implementation. The system responsibilities are transformed into a concrete implementation by transformation rules.
In the first step, domain concepts are described in UML class diagrams. Class notation denotes static concepts. Attributes of classes denote properties of domain concepts. Operations are not used. The domain and system responsibilities are defined in UML activity diagrams. Activity notation denotes responsibilities.
Model transformations are very important to any MDD approach. They
p. 91 of 199
[Fig. 1. The Development Process of Restda: a diagram with the activities Model Domain Concepts, Model System Concepts, Verify Models and Generate Code, supported by Define Rules and Define Code Templates.]
are similar to program transformations, except that the source and the target can be models or programming languages. In Restda, a rule-based approach is used to generate the software implementation. There are two types of input information: (1) the domain responsibilities and system responsibilities, which define the system specifications, and (2) the architectural responsibilities and technology responsibilities, which define the transformation rules. The transformation rules can be implemented in a rule engine to generate an implementation from the system specifications.
3 Restda Metamodels
We use the metamodel approach for formalizing responsibilities. The syntax of a Restda model is described by metamodels and the precise semantics is defined in OCL [7]. A metamodel is a model of a model, that is, a model used to define models [3]. A model, in turn, is used to define the elements of a domain such as a bookstore, for example books or customers.
The metamodel of Restda is depicted in Fig. 2. Actors are Entities that are used to model people, hardware devices, or software systems. Documents are also Entities; they are used to model a piece of information, such as a purchase order, an e-mail message, or bits that are transferred between machines across a network. Attributes are properties of Entities. Constraints are limits on attributes. There are also Relationships between Entities.
Responsibilities represent tasks that must be accomplished by Actors.
InputPin and OutputPin model the initialization and accomplishment of
responsibilities. They also model information that is sent to and returned
from Responsibilities. Each Responsibility is confined by two sets of
Constraints, which are pre-conditions and post-conditions. Pre-conditions
are things that must be true before the task of a responsibility is executed
(conditions-of-use guarantees). Post-conditions are things that must be
true after the task of a responsibility is executed (aftereffect guarantees).
The association transit represents the execution flow between responsibilities. Responsibility realization is modeled as the association refined between Responsibilities. A Responsibility is always performed by an
p. 92 of 199
[Fig. 2. The Metamodel of Restda: a UML class diagram relating Scenario, Entity, Actor, Document, Attribute, Constraint, Relationship, Responsibility (with a name), Task, InputPin and OutputPin, with associations such as pre, post, refinedBy, holder, receiver, targets and world.]
Actor that plays the role of holder. The holder always returns the results of a Responsibility to an Actor that plays the role of receiver. For a Responsibility, the holder and the receiver can be the same actor.
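To make the metamodel more tangible, the following is a small illustrative sketch (our own, not part of Restda) of how its core could be written down as data types; the names mirror the metaclasses above, and the details (for example how constraints are represented) are assumptions.

-- Illustrative rendering of the core of the Restda metamodel (a sketch, not the tool's code).
type Name       = String
data Constraint = Constraint Name            -- a limit on attributes (representation assumed)
data Attribute  = Attribute Name
data Entity     = Actor    Name [Attribute]  -- people, hardware devices, or software systems
                | Document Name [Attribute]  -- a piece of information, e.g. a purchase order

data Responsibility = Responsibility
  { rName     :: Name
  , pre       :: [Constraint]       -- conditions-of-use guarantees
  , post      :: [Constraint]       -- aftereffect guarantees
  , holder    :: Entity             -- the Actor performing the responsibility
  , receiver  :: Entity             -- the Actor receiving the results
  , refinedBy :: [Responsibility]   -- responsibility realization (the refined association)
  }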
4 Transformation Rules
For both automatic model transformations and manual software construction, the creation of design models and their concrete implementation from requirement specifications is a complicated issue, because the real world we live in is far different from the object-oriented software world that we use to solve problems in our world. To resolve this issue, it is important to find a concept that coherently links these two worlds. In Restda, this is achieved through responsibilities: the worlds are linked by transformation rules, which are defined in terms of responsibility realization.
One of the hardest parts of software construction is creating the design of the target system. In the design process, software classes that perform different types of work are created and collaborations among different classes are built. The patterns of collaboration imply the architectural qualities of a software system, such as availability and performance. These patterns, which relate to the system architecture, are also defined by responsibilities and are called architectural responsibilities in Restda.
Classes that form collaborations are further defined by technology responsibilities. Technology responsibilities concern technology-specific characteristics. For example, JavaBeans is a primary type of implementation when
p. 93 of 199
creating reusable components for client-side Java applications. It requires using getter/setter method pairs for accessing data in a class. But when creating distributed enterprise systems, a different technology, Enterprise JavaBeans (EJB), is usually used to create server-side components. And in the EJB specifications, for each EJB component there is an EJB object that implements the interface of the EJB component for remote access. These kinds of considerations are represented as technology responsibilities.
The core concept of transformation rule definition is that a system responsibility always implies certain patterns of implementation and is realized by one or more architectural responsibilities, and each architectural responsibility is further realized by one or more technology responsibilities. The rules of model transformation are defined based on this concept. A transformation rule forms a hierarchical structure and can be presented as an AND/OR/XOR graph in which nodes represent a choice of responsibilities and edges represent realization relations between responsibilities. Architectural responsibilities and technology responsibilities are defined in this structure. The hierarchical structures represent human knowledge of software design, because they show how developers logically construct a software system.
We use Jess [5] for implementing transformation rules. Jess is a development tool for encoding knowledge into facts and rules [4]. Facts are known things. Rules are instructions that apply under certain conditions. A rule contains a condition part and an action part. It can be expressed as "IF conditions are satisfied THEN fire action(s)". We call the condition part the LHS (left-hand side) and the action part the RHS (right-hand side). In Restda, model definitions are facts and the hierarchical structures of transformation rules are Jess rules. A Jess rule is defined for each system responsibility, where actions are triggered by different conditions of the system responsibility. For example, a system responsibility AuthenticateUser will have a different implementation depending on whether a value in the responsibility states that the AuthenticationType is Encrypted or PlainText. These rules are usually domain-specific. We consider it very difficult to define generic rules for all domains, because when generating a software system we should always take the aspects of domain, system, architecture, and technology into consideration, and these four aspects vary greatly between different domains.
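As an illustration of this IF/THEN structure (a sketch in the same functional style as the other examples in these proceedings, not the actual Jess rules of Restda; the responsibility and attribute names are taken from the example above, everything else is assumed):

-- A fact states that a system responsibility carries certain attribute values.
data Fact = Responsibility String [(String, String)]   -- name and attribute/value pairs

-- One "rule" per system responsibility: the guard plays the role of the LHS,
-- the returned template name plays the role of the RHS action.
authenticateUserRule :: Fact -> Maybe String
authenticateUserRule (Responsibility "AuthenticateUser" attrs)
  | lookup "AuthenticationType" attrs == Just "Encrypted" = Just "encrypted-login-template"
  | lookup "AuthenticationType" attrs == Just "PlainText" = Just "plaintext-login-template"
authenticateUserRule _ = Nothing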
Fig. 3 depicts an example rule in a graph structure. The top node is the system responsibility ProcessOrder. It can be realized by two architectural responsibilities, ValidateCreditCard and ProcessCreditCard, which separate the target system into two parts: one that processes representation and one that processes data. For ValidateCreditCard, there are three realizing technology responsibilities. For ProcessCreditCard, there are two realizing technology responsibilities. Each technology responsibility can be physically defined by software classes. Objects of these classes that assume
p. 94 of 199
[Fig. 3. The Exemplified Rule Structure: ProcessOrder is realized by ValidateCreditCard (realized by ExtractValues, CheckCardType and CheckCardNumber) and ProcessCreditCard (realized by ConnectCardCompany and DisplayResult).]
technology responsibilities collaborate together to fulfill a ”bigger” architectural responsibility.
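The hierarchical structure itself can be pictured as a small recursive data type; the following sketch (ours, with the AND/OR distinction simplified to alternatives of realizations and the node names taken from Fig. 3) is only meant to make the realization hierarchy concrete:

-- A responsibility node together with the alternative ways (OR) in which it can be
-- realized; each alternative is a set of sub-responsibilities that are all needed (AND).
data RealizationTree = Node
  { responsibility :: String
  , realizations   :: [[RealizationTree]]   -- outer list: OR-choices, inner list: AND-parts
  }

leaf :: String -> RealizationTree
leaf r = Node r []

processOrder :: RealizationTree
processOrder =
  Node "ProcessOrder"
    [ [ Node "ValidateCreditCard"
          [ [leaf "ExtractValues", leaf "CheckCardType", leaf "CheckCardNumber"] ]
      , Node "ProcessCreditCard"
          [ [leaf "ConnectCardCompany", leaf "DisplayResult"] ]
      ] ]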
5 Discussion
We have applied Restda to a test project as an experiment, building a development tool based on Restda as an Eclipse plugin and using it on the test project. There are some points that we would like to discuss:
(i) The benefit of the MDD approach in Restda is that a new implementation will be generated once the requirement specifications are changed. One thing that should be noted is that it is not a technology for the full automation of business logic generation from models. Currently the implementation is generated from code templates, and placeholders in the code templates are filled with values from the high-level models. But the code templates are defined by humans, not generated by machines.
(ii) The hierarchical structures of rule definitions are actually design decisions of software development, because they state the choices within and between the different abstraction levels of design knowledge. We consider this kind of information very important to software evolution because it expresses why, what, and where to evolve.
(iii) In Restda, the transformation rules can be reused throughout the same project. We consider that the rules can also be reused across different projects if these projects belong to an identical domain, because there can be many common concepts and much common design knowledge. But we need more experiments to support this argument.
(iv) We have only applied our approach to the business domain. We are interested in applying Restda to other domains; we consider this future work.
p. 95 of 199
References
[1] Bennett, K.H., and Rajlich, V.T. Software Maintenance and Evolution: a
Roadmap, In Proceedings of the Conference on the Future of Software
Engineering (2000), pp. 73-87.
[2] Buckley, J., Mens, T., Zenger, M., Rashid, A. and Kniesel, G., Towards a
taxonomy of software change, Journal of Software Maintenance and Evolution:
Research and Practice 17(5), pp. 309-332.
[3] Frankel, D, ”Model Driven Architecture: Applying MDA to Enterprise
Computing”, Wiley, New York, 2003.
[4] Friedman-Hill, E, ”Jess in Action: Rule-Based Systems in Java”, Manning,
Greenwich, CT, 2003.
[5] Jess 7.0. http://herzberg.ca.sandia.gov/jess/index.shtml.
[6] Larman, C, ”Applying UML and Patterns: an Introduction to Object-Oriented
Analysis and Design and Iterative Development”, Prentice Hall PTR, Upper
Saddle River, NJ 2005.
[7] Warmer, J., and Kleppe, A. ”The Object Constraint Language: Getting Your
Models Ready for MDA”, Addison-Wesley, Reading, MA, 2003
[8] Wirfs-Brock, R. and McKean, A, ”Object Design: Roles, Responsibilities, and
Collaborations”, Addison-Wesley, Boston, MA, 2003.
p. 96 of 199
Generic Programming for Software Evolution
Johan Jeuring
Department of Information and Computing Sciences
Utrecht University
The Netherlands
Rinus Plasmeijer
Institute for Computing and Information Sciences
Radboud University Nijmegen
The Netherlands
1 Introduction
Change is endemic to any large software system. Business, technology, and organization usually change frequently during the life cycle of a software system. However, changing a large
software system is difficult: localizing the code that is responsible for a particular part of the
functionality of a system, changing it, and ensuring that the change does not lead to inconsistencies or other problems in other parts of the system or in the architecture or documentation
is usually a challenging task. Software evolution is a fact of life in the software development
industry, and leads to interesting research questions [21, 22, 31].
Current approaches to software evolution focus on the software development process,
and on analyzing, visualizing, and refactoring existing software. These approaches support
changing software by, for example, recognizing structure in code, and by making this structure
explicitly visible, by means of refactoring or renovating the code. By making structure explicit,
it becomes easier to adapt the code.
A generic program is a program that works for values of any data type in a large class of data types (or DTDs, schemas, class hierarchies). If data types change, or new data types are added
to a piece of software, a generic program automatically adapts to the changed or new data
types. Take as an example a generic program for calculating the total amount of salaries paid
by an organization. If the structure of the organization changes, for example by removing or
adding an organizational layer, the generic program still calculates the total amount of salaries
paid. Generic programming is a new research field [14, 16, 9, 11, 6, 3], and its implications
for software development and software evolution have hardly been investigated.
Since a generic program automatically adapts to changed data types, generic programming
is a promising approach to the software evolution problem, in particular for software where
type formalisms and types play an important role, such as data-centric software. Assume
structure has been recognized in software, for example by means of refactoring or renovating
the code. Then a generic program makes those parts of the code which depend on the structure
of data independent of that structure.
This position paper
p. 97 of 199
• explains why we think generic programming is useful for software evolution,
• describes the kind of software evolution problems for which generic programming is
useful,
• and discusses the research challenges for using generic programming for software evolution.
This paper is organized as follows. Section 2 briefly introduces generic programming.
Section 3 discusses some scenarios in which generic programming is useful for software evolution, and Section 4 discusses the research problems we want to solve in order to make generic
programming a viable approach to software evolution. Section 5 concludes.
2 Generic programming
Software development often consists of designing a data type1 , to which functionality is added.
Some functionality is data type specific, other functionality is defined on almost all data types,
and only depends on the type structure of the data type. Examples of generic (sometimes
also called polytypic) functionality defined on almost all data types are storing a value in a
database, editing a value, comparing two values for equality, pretty-printing a value, etc. A
function that works on many data types is called a generic function. Applications of generic
programming can be found not just in the rather small programming examples mentioned,
but also in
• XML tools such as XML compressors [12], and type-safe XML data binding tools [5];
• test set generation for automatic testing [18, 17];
• constructing ‘boilerplate’ code that traverses a value of a rich set of mutually recursive data types (for example representing the abstract syntax of a programming language) applying real functionality (for example collecting the variables that appear in
expressions) at a small portion of the data type (the case for variables in the abstract
syntax) [19, 24, 20, 32];
• structure editors such as XML editors [10], and generic graphical user interfaces [1, 30];
• data conversion tools [15] which for example store a data type value in a database [10],
or output it as XML, or in a binary format [33].
This section introduces generic programming in Generic Haskell. We give a brief introduction to Generic Haskell, assuming the reader has some knowledge of Haskell [28] or ML.
Generic Haskell is an extension of the lazy, higher-order, functional programming language Haskell that
supports generic programming; more details can be found in [13, 23]. Generic programs can
also be written in Clean [3], ML [8, 7], Maude [25], Java [34], and some other programming
languages.
1 or DTDs, schemas, class hierarchies, etc. In the rest of this paper the word data type is used as a concept
that represents all of these concepts.
p. 98 of 199
2.1 Generic programming in Generic Haskell
A generic program is a program that works for a large class of data types. A generic program
takes a type as argument, and is usually defined by induction on the type structure. As
an example, we define a very simple generic function content that extracts the strings and
integers (shown as strings) that appear in a value of an arbitrary data type. The instance of
content on the type of binary trees with integers in the leaves, and strings in the internal
nodes, defined by
data Tree = Leaf Int | Node Tree String Tree
returns ["3","Bla","7"] when applied to Node (Leaf 3) "Bla" (Leaf 7). The generic
function content returns the document’s content not only for the type Tree, but for any
data type one can define. In particular, it will still work when a data type definition is
changed2 .
content {| t :: * |}            :: t -> [String]
content {| Unit |} Unit         = []
content {| Int |} int           = [show int]
content {| String |} str        = [str]
content {| a :+: b |} (Inl a)   = content {|a|} a
content {| a :+: b |} (Inr b)   = content {|b|} b
content {| a :*: b |} (a :*: b) = content {|a|} a ++ content {|b|} b
Function content {| t |} is a type-indexed function. The type argument appears in between special parentheses {|, |}. The different type arguments are explained below. An instance of content is obtained by applying content to a type, for example, content{|Tree|}.
The type of function content is given for a type t of kind *. This does not mean that content
can only be applied to types of kind *; it only gives the type information for types of kind
*. The type of function content on types with kinds other than * can automatically be
derived from this base type. Note that the single type given for this function ensures that
all instances of this function on particular types are type correct. A type-correct generic
function generates type correct code [23]. Using an accumulating parameter we can obtain a
more efficient version of function content.
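For instance, such an accumulating version could look roughly as follows (a sketch in the same Generic Haskell notation as above; the exact definition is not given in this paper). It threads a list of strings through the traversal instead of concatenating intermediate results with ++:

contentAcc {| t :: * |}                :: t -> [String] -> [String]
contentAcc {| Unit |} Unit acc         = acc
contentAcc {| Int |} int acc           = show int : acc
contentAcc {| String |} str acc        = str : acc
contentAcc {| a :+: b |} (Inl a) acc   = contentAcc {|a|} a acc
contentAcc {| a :+: b |} (Inr b) acc   = contentAcc {|b|} b acc
contentAcc {| a :*: b |} (a :*: b) acc = contentAcc {|a|} a (contentAcc {|b|} b acc)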
To apply a program to (values of) different types, each data type that appears in a source
program is mapped to its structural representation. This representation is expressed in terms
of a limited set of data types, called structure types. A generic program is defined by induction
on these structure types. Whenever a generic program is applied to a user-defined data type,
the Generic Haskell compiler takes care of the mapping between the user-defined data type
and its corresponding structural representation. If we want a generic function to exhibit
non-standard behavior for a particular data type we can add an extra case expressing this
behavior to the definition of the generic function.
The translation of a data type to a structure type replaces a choice between constructors
by a sum, denoted by :+: (nested to the right if there are more than two constructors), and
a sequence of arguments of a constructor by a product, denoted by :*: (nested to the right if
there are more than two arguments). A nullary constructor is replaced by the structure type
Unit. The arguments of the constructors are not translated. For example, for the data type
Tree we have
2 The (infix) operator ++ returns the concatenation of its two input strings.
p. 99 of 199
data Tree      = Leaf Int | Node Tree String Tree
type Str(Tree) = Int :+: (Tree :*: String :*: Tree)
Here Str is a meta function that given an argument type generates a new type name. The
structural representation of a data type only depends on the top level structure of a data type.
The arguments of the constructors, including recursive calls to the original data type, appear
in the representation type without modification. A type and its structural representation
are isomorphic (ignoring undefined values). The isomorphism is witnessed by a so-called
embedding-projection pair : a value of the data type
data EP a b = EP (a -> b) (b -> a)
The Generic Haskell compiler generates the translation of a type to its structural representation, together with the corresponding embedding projection pair.
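For the Tree example, such an embedding-projection pair could look as follows; this is our own illustrative sketch in plain Haskell (with the structure types written out and the right-nesting made explicit), not the code the compiler actually emits:

{-# LANGUAGE TypeOperators #-}

-- Structure types (sketch).
data Unit    = Unit
data a :+: b = Inl a | Inr b
data a :*: b = a :*: b

data Tree    = Leaf Int | Node Tree String Tree

-- Top-level structural representation of Tree, nested to the right.
type StrTree = Int :+: (Tree :*: (String :*: Tree))

data EP a b  = EP (a -> b) (b -> a)

epTree :: EP Tree StrTree
epTree = EP from to
  where
    from (Leaf i)                = Inl i
    from (Node l s r)            = Inr (l :*: (s :*: r))
    to   (Inl i)                 = Leaf i
    to   (Inr (l :*: (s :*: r))) = Node l s r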
From this translation, it follows that it suffices to define a generic function on sums (:+:, with values of the form Inl l or Inr r), products (:*:, with values of the form l :*: r), the unit type (Unit, with Unit as its only value), and on base types such as Int and String. Function
content has been defined on these structure types.
One might wonder whether there is an efficiency price to be paid for using generic programming. In principle this technique indeed introduces additional overhead. To apply the
generic function, the original data structure first is converted to its structural representation. Then the generic function is applied, yielding another structural representation that
is converted back to the data structure of the resulting domain. Due to these conversions
the technique may indeed lead to inefficient code when it is frequently being applied to large
data structures. Fortunately, there exists an optimization technique [2, 4] which completely
removes the generic overhead for almost all cases, resulting in code which is as efficient as
hand written code [32]. This makes it possible to use generic programming for real world
practical applications.
3 Generic programming for software evolution
We claim generic programming is useful for software evolution. This section sketches a number of scenarios in which generic programming can be applied successfully to support software evolution.
Webshops. There are many webshops on the internet. The functionality of all these shops
varies very little: product descriptions have to be retrieved from a database, and shown
in a friendly way to the users, user actions have to be translated to database transactions,
etc. Much of the software written for webshops only depends on the structure of the data
(the products sold). Implementing a webshop using generic programming technology [30, 29]
supports reuse of software for different webshops. Furthermore, whenever a product catalogue
changes, which happens very frequently for most webshops, there is no need to rewrite any
software.
Actually, generic programming can be used for any kind of data-controlled web site. Using generic programming techniques, arbitrarily complicated interactive web forms, the shape and content of which depend on (stored) data or on the contents of other web forms, can be generated automatically.
p. 100 of 199
DTDs. A DTD (or XML Schema) is used to describe the structure of a particular kind of document. An example of a DTD is the TEI (Text Encoding Initiative) DTD. The Text Encoding Initiative Guidelines are an international and interdisciplinary standard that helps libraries, museums, publishers, and individual scholars represent a variety of literary and linguistic texts for online research, teaching, and preservation. The first version of the TEI
DTD appeared in 1990, and since then, updated versions have appeared in 1993, 1994, 1999,
2002, and 2005. Of course, with each new release, software that deals with TEI documents
has to be updated. If the software uses a data binding [26], it is likely that with each new
version of the DTD, much of the ‘business logic’ has to be rewritten, because the structure
of the data has changed. Using generic programs, only those parts of the software that deal
with the new features of the DTD have to be added.
Traversing large data structures. Programming languages evolve. For most programming languages, the abstract syntax used in a compiler is rather large. A lot of the functionality on this abstract syntax only addresses a small subpart of the abstract syntax. For
example, we might want to collect all variables in a program. However, to reach a variable in
the abstract syntax, large traversal functions have to be written. Using generic programming
we can write a single traversal function, which we specialize for different special purposes. For
example, for collecting all variables in the abstract syntax, we add a single line that deals with
variables to the generic traversal function. If the abstract syntax evolves, such a function still
works as expected. Many phases in a compiler can be implemented as generic programs [32].
Generic programming is not only useful for a typical generic phase like parsing, there is also
gain for algorithms that are more specific, such as a type checker. Surprisingly many algorithms can be expressed conveniently in a generic way, by defining “exceptions to the general
case”.
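A runnable flavour of such a one-extra-line traversal can be given with the ‘Scrap your boilerplate’ library [19, 20] rather than Generic Haskell; the Expr type and its constructors here are assumptions made up for the example:

{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics  -- everything, mkQ, Data, Typeable

-- A toy abstract syntax; in a real compiler this type is much larger.
data Expr = Var String
          | Lit Int
          | App Expr Expr
          | Lam String Expr
          deriving (Show, Data, Typeable)

-- A single generic bottom-up query; only the Var case is written by hand,
-- so the function keeps working when constructors are added to Expr.
variables :: Expr -> [String]
variables = everything (++) ([] `mkQ` var)
  where var (Var x) = [x]
        var _       = []

-- e.g. variables (App (Var "f") (Var "x")) == ["f","x"]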
Information Systems. Another important area in which generic programming might be very useful is Information Systems [27]. A well-designed Information System is constructed
from an Information Model that is the result of a requirements analysis. An Information
Model can be seen as a type specification defining the structure and properties of the information that can be stored in the Information System. Information Systems are often derived
systematically from Information Models. All the information about such a system is present
in the model, and generating an Information System fully automatically from a given Information Model should therefore be possible. The functionality for storing, retrieving and
changing information is directly deduced from the corresponding types. Information systems
are not only accessed by applications, they are also inspected interactively by human beings.
The webshop technology described above can be applied here as well. The graphical user
interfaces needed for viewing and changing any part of the stored data can be generated automatically. When an organization changes, this usually has consequences for the Information
Model and the corresponding Information System. By using generic programming techniques,
the consequences of a change will be minimal. It is not clear yet what kind of programming
effort will still be required when an Information Model is changed. Clearly, some of the old information stored in the old Information System has to be converted to information in the new format. But again, such a conversion can be defined using generic techniques, considerably reducing the amount of programming that has to be done.
p. 101 of 199
Generic programming is useful for software evolution, but to obtain this benefit, software that works on frequently changing data has to be implemented in terms of generic functions. Existing software that does not use generic programming techniques first has to be refactored or renovated so that it uses generic functions for the data that changes frequently.
Of course, generic programming does not solve all software evolution problems. For example, changes in the modular structure of programs, or changes in the API of a program, are unrelated problems and have to be solved using different methods. Furthermore, the approach only partially works for data types with properties, such as the data type of ordered lists. Finally, in dynamically typed languages the approach cannot be applied directly and has to be simulated in some way, probably using soft typing techniques. Alternatively, reflection can be used to achieve similar results, but this is much more difficult since reflection lacks the proper abstractions for defining generic programs. We think that using a typed programming language is a first step towards producing robust code.
4 Research challenges for generic programming
There are a number of research challenges that have to be investigated to make generic
programming a viable tool for supporting software evolution.
• Prototype applications. We have to develop a number of prototype applications to
demonstrate the approach. At the moment we are working on web applications [30, 29]; in the near future we hope to work on database connection and migration tools. We will
use these and maybe other tools to demonstrate the usefulness of generic programming
for software evolution.
• Software process. To use generic programming techniques optimally, information systems have to be designed around the data from the design phase on. This probably
means that the software process for developing systems that use generic programming
techniques has to be slightly adapted. Furthermore, there should be a relatively straightforward mapping between the data modeling language and the implementation. The
distinction between design and implementation is reduced considerably.
• Programming language. Although there exist a number of programming languages that
have added support for generic programming in the last five years, all of these extensions
are recent, and none of the extensions can be called mature: the generated code is often
inefficient, generic functions are not first class, the structural representation of data
types cannot be changed, etc.
We are in the process of writing a research proposal that addresses the first two research
challenges for applying generic programming for software evolution.
5 Conclusions
Generic Programming is a recent programming technique that is useful for building software
that needs to work with evolving data types. The implications of using generic programming
for software development and software evolution have hardly been investigated, but first
results are promising. A number of research challenges have to be investigated to make
p. 102 of 199
generic programming an integral part of the software development process for developing
software that has to deal with frequently changing data types.
References
[1] Peter Achten, Marko van Eekelen, and Rinus Plasmeijer. Generic Graphical User Interfaces. In The
15th International Workshop on the Implementation of Functional Languages, IFL 2003, Selected Papers,
volume 3145 of LNCS, pages 152–167. Springer-Verlag, 2004.
[2] Artem Alimarine. Generic Functional Programming - Conceptual Design, Implementation and Applications. PhD thesis, University of Nijmegen, The Netherlands, 2005. ISBN 3-540-67658-9.
[3] Artem Alimarine and Rinus Plasmeijer. A generic programming extension for Clean. In Thomas Arts
and Markus Mohnen, editors, The 13th International workshop on the Implementation of Functional
Languages, IFL’01, Selected Papers, volume 2312 of LNCS, pages 168–186. Älvsjö, Sweden, Springer,
September 2002.
[4] Artem Alimarine and Sjaak Smetsers. Improved Fusion for Optimizing Generics. In Manuel Hermenegildo
and Daniel Cabeza, editors, Proceedings of Seventh International Symposium on Practical Aspects of
Declarative Languages, number 3350 in LNCS, pages 203 – 218. Long Beach, CA, USA, Springer, January
2005.
[5] Frank Atanassow, Dave Clarke, and Johan Jeuring. Scripting XML with Generic Haskell. In Proceedings
of the 7th Brazilian Symposium on Programming Languages, SBLP 2003, 2003. An extended version of
this paper appears as ICS, Utrecht University, technical report UU-CS-2003-023.
[6] Roland Backhouse and Jeremy Gibbons, editors. Generic Programming, Advanced Lectures, volume 2793
of LNCS. Springer-Verlag, 2003.
[7] Juan Chen and Andrew W. Appel. Dictionary passing for polytypic polymorphism. Technical Report
TR-635-01, Princeton University, March 2001.
[8] Jun Furuse. Generic polymorphism in ML. In Journées Francophones des Langages Applicatifs, January
2001.
[9] Jeremy Gibbons and Johan Jeuring, editors. Generic Programming. Proceedings of the IFIP TC2 Working
Conference on Generic Programming, Schloss Dagstuhl, July 2002. Kluwer Academic Publishers, 2003.
[10] Paul Hagg. A framework for developing generic XML Tools. Master’s thesis, Department of Information
and Computing Sciences, Utrecht University, 2002.
[11] Ralf Hinze. Polytypic values possess polykinded types. Science of Computer Programming, 43(2-3):129–
159, 2002.
[12] Ralf Hinze and Johan Jeuring. Generic Haskell: applications. In Generic Programming, Advanced Lectures,
volume 2793 of LNCS, pages 57–97. Springer-Verlag, 2003.
[13] Ralf Hinze and Johan Jeuring. Generic Haskell: practice and theory. In Generic Programming, Advanced
Lectures, volume 2793 of LNCS, pages 1–56. Springer-Verlag, 2003.
[14] Patrik Jansson and Johan Jeuring. PolyP — a polytypic programming language extension. In Conference
Record of POPL ’97: The 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming
Languages, pages 470–482. ACM Press, 1997.
[15] Patrik Jansson and Johan Jeuring. Polytypic data conversion programs. Science of Computer Programming, 43(1):35–75, 2002.
[16] Johan Jeuring, editor. Workshop on Generic Programming. Utrecht University, 2000. Technical report
UU-CS-2000-19.
[17] Pieter Koopman and Rinus Plasmeijer. Testing reactive systems with Gast. In S. Gilmore, editor, Proceedings of the Fourth Symposium on Trends in Functional Programming, TFP03, pages 111–129, Edinburgh,
Scotland, 2003. ISBN 1-84150-122-0.
[18] Pieter Koopman, Artem Alimarine, Jan Tretmans, and Rinus Plasmeijer. Gast: Generic Automated
Software Testing. In Ricardo Peña, editor, The 14th International Workshop on the Implementation of
Functional Languages, IFL 2002, Selected Papers, volume 2670 of LNCS. Springer-Verlag, 2003.
p. 103 of 199
[19] Ralf Lämmel and Simon Peyton Jones. Scrap your boilerplate: a practical approach to generic programming. ACM SIGPLAN Notices, 38(3):26–37, 2003. TLDI’03.
[20] Ralf Lämmel and Simon Peyton Jones. Scrap your boilerplate with class: extensible generic functions.
Proceedings ICFP’05, 40(9):204–215, 2005.
[21] M.M. Lehman. Programs, life cycles and the laws of software evolution. Proc. IEEE, 68(9):1060–1078,
1980.
[22] M.M. Lehman and L.A. Belady. Program Evolution: Processes of Software Change. Academic Press,
London, 1985.
[23] Andres Löh. Exploring Generic Haskell. PhD thesis, Utrecht University, 2004.
[24] Andres Löh, Dave Clarke, and Johan Jeuring. Dependency-style Generic Haskell. In Olin Shivers, editor,
Proceedings of the International Conference on Functional Programming, ICFP’03, pages 141–152. ACM
Press, August 2003.
[25] M. Clavel, F. Durán, and N. Martí-Oliet. Polytypic programming in Maude. In WRLA 2000, 2000.
[26] Brett McLaughlin. Java & XML data binding. O’Reilly, 2003.
[27] Betsy Pepels and Rinus Plasmeijer. Generating Applications from Object Role Models. In R. Meersman,
editor, Proceedings of the OTM Workshops 2005, OnTheMove - OTM 2005 Federated Conferences and
Workshops, volume 3762 of LNCS, pages 656–665, Agia Napa, Cyprus, Oct 31-Nov 4 2005. Springer.
[28] Simon Peyton Jones et al. Haskell 98, Language and Libraries. The Revised Report. Cambridge University
Press, 2003. A special issue of the Journal of Functional Programming.
[29] Rinus Plasmeijer and Peter Achten. The Implementation of iData - A Case Study in Generic Programming.
In A. Butterfield, editor, Proceedings Implementation and Application of Functional Languages, 17th
International Workshop, IFL05, Dublin, Ireland, September 19-21 2005. To Appear in Springer LNCS.
[30] Rinus Plasmeijer and Peter Achten. iData For The World Wide Web - Programming Interconnected Web
Forms. In Proceedings Eighth International Symposium on Functional and Logic Programming (FLOPS
2006), Fuji Susono, Japan, Apr 24-26 2006. To Appear.
[31] A.L. Powell. A literature review on the quantification of software change. Technical Report YCS 305,
Computer Science, University of York, 1998.
[32] Arjen van Weelden, Sjaak Smetsers, and Rinus Plasmeijer. A Generic Approach to Syntax Tree Operations. In A. Butterfield, editor, Proceedings Implementation and Application of Functional Languages,
17th International Workshop, IFL05, Dublin, Ireland, September 19-21 2005. To Appear in Springer
LNCS.
[33] M. Wallace and C. Runciman. Heap compression and binary I/O in Haskell. In 2nd ACM Haskell
Workshop, 1997.
[34] Stephanie Weirich and Liang Huang. A design for type-directed programming in Java. In Workshop on
Object-Oriented Developments (WOOD 2004), 2004.
p. 104 of 199
An Architecture for Attribute-Driven Evolution
of Nomadic Media
Lech Krzanik
Dept. of Information Processing Science, University of Oulu, Linnanmaa, FIN-90014 Oulu, Finland
Abstract
An architecture for evolving nomadic media systems is presented which assumes attribute-driven
evolution and focuses directly on two fundamental evolutionary development sub-processes:
variation and selection. The architecture allows for high-level monitoring and control of system
evolution. There are opportunities for reusing implemented subsystems in other application domains.
The architecture has been successfully used with nomadic media demonstrators.
Keywords: Software evolution, software architecture, software quality.
1 Motivation and assumptions
Software domains developed under continuous changes of requirements, technologies,
markets and regulations are suitable for evolutionary delivery strategies [1]. Such strategies
have the ability to properly follow the changes. They typically include continuous
development of applications, the use of stakeholder feedback, relatively small delivery steps,
evolutionary opportunism, etc. They are realized by two interleaved sub-processes of
evolutionary variation and selection. Variation delineates the space of feasible delivery steps
while selection, based on stakeholder feedback, determines the actual development path.
Software architecture plays a fundamental role in evolution [2] by constraining the two
processes. Architecture can be considered in terms of the system’s structure or the
corresponding system performance (qualities and resources) [3]. The latter view, including
p. 105 of 199
system non-functional aspects or system attributes, is often more suitable for considering
multiple types of stakeholders, particularly non-technical, with their specific feedback and
their evolution plans.
In this paper we present a software architecture for evolving systems, for the domain of
nomadic media. The mission of nomadic media systems is to allow consumers to enjoy their
content and use interactive services at the times and in the places they prefer, using the
devices that best suit their circumstances. At present the domain is undergoing extensive development, is subject to all the types of change mentioned above, and is a good candidate for the evolutionary strategy. Additional assumptions representing the specifics of
the domain include: (i) the broad range of stakeholders possibly involved in the evolutionary
delivery processes (in generating feedback, providing plans, etc.); (ii) a multi-criteria delivery
step selection process; (iii) arbitrarily small delivery steps (with regard to selected criteria);
(iv) occasionally a very short time to market required. The applications in the domain are
contextual which means their operation depends on context variables such as geographical
location, noise level, social context, etc. As a consequence of (i)-(iii), we require that an
attribute-driven approach is applied to the selection process, and the attribute view is taken
to represent the architecture. Moreover, (ii) and particularly (iii) entail the need for
quantitative representation of system attributes wherever possible. For the variation process
we require, following (iv), a generative approach with adequate response time. By generative
approach we mean a software family approach that automates the creation of family
members, where family member generation is based on a specification in a domain-specific
language [4]. In this presentation of the developed architecture we put forward the attribute-oriented aspects of the selection process and the generative aspects of the variation process.
Usability [5, 6] is a foreground system attribute considered in our nomadic media case.
Other attributes include (for various stakeholders) cost and time to market, privacy and
quality of service, etc. System attributes are considered orthogonal to the basic function of
nomadic media. We assume that the system function essentially stays the same for the entire
process1, and can be factored out into dedicated architectural components. Deliveries can be
defined in terms of system attributes that may change within specified bounds. More
specifically, our function is nomadic blogging, which is a mobile extension of the popular
blogging [7, 8]. With the provided context model the function can be parameterized by the
usability attributes to cover a wide variety of practical system functions. In section 2 we
present the evolving architecture. Section 3 summarizes the developed demonstrator based
on the introduced architecture. In section 4 we discuss some related work. Section 5
concludes.
1. This is true in selected software domains, e.g., in the telecommunications industry.
p. 106 of 199
2 Evolution architecture
The proposed evolution architecture is outlined in the next three figures. We follow the
variation/selection bipartite nature of evolving systems. The view in Fig. 1 demonstrates
major subsystems and their links for the variation aspect. The activity view in Fig. 2 shows
the selection aspect as implemented in these subsystems. Finally, Fig. 3 demonstrates the
structure of individual nomadic blogging applications. The various tactics realized by this
structure result in the values for system attributes that drive the variation and selection
aspects of the evolutionary delivery process.
Figure 1. The variation aspect: major subsystems and their links. (Figure not reproduced: evolutionary specifications, a quantitative property-based notation, scoping/domain engineering, the application product line, derivation/application engineering, architectural evaluation, and the domain and context models.)
Figure 2. The selection aspect. (Figure not reproduced: activities ranging from software concept and preliminary requirements, through designing and updating the architecture, developing and delivering a version, and eliciting and incorporating customer feedback, to delivering the final version.)
Figure 3. Tiers of the architecture of individual nomadic blogging applications. (Figure not reproduced: client devices, wireless network and gateways as the front-end infrastructure; access server (portal server) and content servers, connected via the Internet, as the back-end infrastructure.)
The variation process is divided into three layers: the “specimen” layer of individual
applications, the “species” layer with one or more product lines from which applications are derived,
and the “genus” layer including a domain model that supports the domain engineering stage
of the product line [9]. There are two elements common to the three layers. One is the
p. 107 of 199
attribute-based notation that supports specification of applications, product lines, and
domains. The other is the (application) architecture evaluation method used to assess the
architectural attributes when selecting delivery step candidates. A dedicated component
associated with the domain model represents the interaction context for various stakeholders.
The product line has an attribute-oriented formulation, with multi-dimensional, explicit
system attributes. To facilitate evaluation, comparison and selection of delivery steps,
attributes should be quantitative and measurable wherever possible. The notation for the
specification of applications, product lines and domains as well as the architectural
evaluation method, and the proposed quantitative domain model along with the context
model for nomadic media systems are new results. The selection process is an updated
version of the conventional attribute-driven development [3]. For the architectural evaluation
a number of methods have been proposed, derived from known methods such as SAAM [10]
and ATAM [3].
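Purely as an illustration (not the notation or evaluation method proposed here), quantified attributes and a multi-criteria comparison of delivery-step candidates could be sketched as follows; the attribute names, weights and scoring rule are assumptions:

import Data.List (maximumBy)
import Data.Ord  (comparing)

-- Quantified attribute values for one delivery-step candidate.
data Attributes = Attributes
  { effectiveness :: Double   -- e.g. task completion rate
  , efficiency    :: Double   -- e.g. tasks per unit time
  , satisfaction  :: Double   -- e.g. survey score
  , cost          :: Double   -- e.g. person-months for the step
  }

data Candidate = Candidate { name :: String, attrs :: Attributes }

-- An assumed weighted score; real selection would reflect stakeholder feedback.
score :: Attributes -> Double
score a = 0.4 * effectiveness a + 0.3 * efficiency a
        + 0.3 * satisfaction a - 0.1 * cost a

-- Pick the best candidate (expects a non-empty list of candidates).
selectStep :: [Candidate] -> Candidate
selectStep = maximumBy (comparing (score . attrs))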
The nomadic blogging applications have a conventional tiered architecture (Fig. 3). The
front-end tier interacts with users and also includes context sensors and actuators. The
gateway tier includes adapters which map various device specific network protocols, such as
WAP, to the common TCP/IP based communication protocols used on the server side. The
access server tier accesses information stored on content servers on behalf of a client. Finally,
the content server tier provides data stored in databases or Enterprise Resource Planning
Systems, as well as other kinds of web content.
3 Demonstrator
A demonstrator has been developed for the proposed attribute-driven evolution
architecture, implementing a series of nomadic blogging systems illustrating possible
attribute-driven evolution paths. Nomadic blogging is a context sensitive personal publishing
function that supports traveling users. At a given time the user is focused on one or more
locations including a number of business points which may provide or redistribute a variety
of media. The main attribute of interest was usability. According to [6] usability is specified
with quantitative sub-attributes of effectiveness, efficiency and satisfaction. The results are
illustrated in Figures 4, 5. Fig. 4 shows a sample evolutionary development path through a
number of application variants including: Simple nomadic blogging, Nomadic blogging in an
office building, Federated nomadic blogging regions, Large screen control, Controlling
multiple large screens, Nomadic blogging with city business points. The links between the
instances indicate which sub-attributes of usability improved at the indicated delivery steps
(attribute values are skipped to simplify presentation). Fig. 5 shows the more detailed structure of the application that was instantiated for all these evolutionary variants.
p. 108 of 199
Figure 4. Sample evolutionary development path of a nomadic blogging system. (Figure not reproduced: the path links the variants Simple nomadic blogging, Nomadic blogging in an office building, Federated nomadic blogging regions, Large screen control, Controlling multiple large screens, and Nomadic blogging with city business points, with links labelled by the improved usability sub-attributes effectiveness, efficiency and satisfaction.)
Figure 5. Deployment diagram of the evolving nomadic blogging application. (Figure not reproduced.)
4 Discussion and related work
A central feature of the presented approach is its focus on the two interleaved processes
of evolutionary variation and selection, with an emphasized role of stakeholder feedback and
application context. This viewpoint has lately received increasing attention, e.g., in [1]. To
obtain the high-level, attribute-driven, and multi-criteria interpretation, we had to include a
relatively complex supporting structure with a domain model, architecture evaluation
method, and the application product line. The development of that structure required considerable effort, but the results seem to be potentially applicable to other domains as well, and the developed assets can be reused. However, we have so far performed only limited analyses of other domains or of more extensive attribute sets.
With its focus on non-functional attributes, the introduced approach is similar to
conventional software refactoring [11] that also addresses improvement of a specific set of
system attributes, usually extensibility, modularity, reusability, complexity, maintainability,
efficiency, etc. In the context of software evolution, refactoring is used to improve the quality
of software, while preserving the behavioural aspects of software. Compared to refactoring,
the proposed approach potentially allows for explicit consideration of more extensive sets of
multidimensional attributes, with tradeoffs, but it operates at a much higher level. A typical
refactoring process includes the following steps [11]: (1) Identify where the software should
be refactored; (2) Determine which refactoring(s) should be applied; (3) Guarantee that the
refactoring preserves behaviour; (4) Apply the refactoring; (5) Assess the effect of the
refactoring on quality of the software or the process2; (6) Maintain the consistency between
2. Examples of process quality characteristics are productivity, cost, effort, etc.
p. 109 of 199
the refactored program code and other artefacts3. Compared to that, steps (1) and (2) are
supported directly by the domain model in our approach. Step (3) is partly solved by the
domain layer, and partly can be considered explicitly by introducing the relevant system
attributes, which together with the qualities of step (5) are supported by the introduced
architecture evaluation method. The resulting process is simpler although its detailed
characteristics and practical applicability have to be further investigated. With the multicriteria approach, the proposed method can also be applied to prevent, rather than remove,
design deterioration, thus has the potential of saving resources spent on refactoring.
The approach has also been used to demonstrate improved support for the Usage
Centered Design methodology [12], which can be interpreted as an evolutionary product
development approach to user interface design.
5 Conclusion
An architecture for evolving nomadic media systems is presented which assumes
attribute-driven evolution and focuses directly on two fundamental evolutionary
development sub-processes: variation and selection. The architecture allows for high-level
monitoring and control of system evolution. There are opportunities for reusing implemented
subsystems in other application domains. The architecture has been successfully used with
nomadic media demonstrators.
6 Acknowledgement
This work has been supported in part by the project E!2023 ITEA Nomadic Media.
References
[1] Madhavji, Nazim H., Juan C. Fernandez-Ramil and Dewayne E. Perry, Software Evolution and Feedback: Theory and Practice. John Wiley & Sons, Ltd., 2006.
[2] Garlan, David, Software architecture: a roadmap. In 22nd International Conference on Software Engineering (ICSE 2000): Future of Software Engineering Track, pages 91–101. ACM, 2000.
[3] Bass, Len, et al., Software Architecture in Practice, 2nd Ed. Addison-Wesley Longman, Inc., Reading, MA, 2003.
3. Such as documentation, design documents, requirements specifications, tests, etc.
p. 110 of 199
[4] Czarnecki, K., and U. Eisenecker, Generative Programming: Methods, Tools, and Applications. Addison-Wesley, 2000.
[5] ISO/IEC 13407. Human-Centred Design Processes for Interactive Systems, 1999.
[6] ISO/IEC 9241-14. Ergonomic requirements for office work with visual display terminals (VDT), 1998.
[7] Blood, R., How blogging software reshapes the online community, Comm. ACM, 47, Nr. 2, 2004.
[8] Stone, B., Blogging. Genius Strategies for Instant Web Content. New Riders, 2002.
[9] Pohl, Klaus, et al., Software Product Line Engineering: Foundations, Principles and Techniques. Springer, 2005.
[10] Clements, Paul, et al., "Predicting software quality by architecture-level evaluation". Proc. 5th Intl. Conf. Software Quality, 1995.
[11] Mens, Tom, and Tom Tourwé, A Survey of Software Refactoring, IEEE Transactions on Software Engineering, Vol. 30, No. 2, February 2004.
[12] Constantine, Larry L., and Lucy A.D. Lockwood, Software for Use: A Practical Guide to the Models and Methods of Usage Centered Design. ACM Press, 1999.
p. 111 of 199
p. 112 of 199
A Tool for Exploring Software Systems Merge Alternatives
Rikard Land1, Miroslav Lakotic2
1 Mälardalen University, Department of Computer Science and Electronics
PO Box 883, SE-721 23 Västerås, Sweden
2 University of Zagreb, Faculty of Electrical Engineering and Computing
Unska 3, HR-10000 Zagreb, Croatia
[email protected], [email protected], http://www.idt.mdh.se/~rld
Abstract
The present paper presents a tool for exploring
different ways of merging software systems, which may
be one way of resolving the situation when an
organization is in control of functionally overlapping
systems. It uses dependency graphs of the existing
systems and allows intuitive exploration and
evaluation of several alternatives.
1. Introduction
It is well known that successful software systems have to evolve to stay successful, i.e. they are modified in various ways and released anew [11,15,16]. Some
modification requests concern error removal; others
are extensions or quality improvements. A current
trend is to include more possibilities for integration
and interoperability with other software systems.
Typical means for achieving this are supporting open or de facto standards [13] or (in the domain of
enterprise information systems) through middleware
[4]. This type of integration concerns information
exchange between systems of mainly complementary
functionality. There is however an important area of
software systems integration that has so far been little
researched, namely that of systems that are developed in-house and overlap functionally. This may occur when
systems, although initially addressing different
problems, evolve and grow to include richer and richer
functionality. More drastically, this also happens after
company acquisitions and mergers, or other types of
close collaborations between organizations. A new
system combining the functionality of the existing
systems would improve the situation from an
economical and maintenance point of view, as well as
from the point of view of users, marketing and
customers.
1.1 Background Research
To investigate how organizations have addressed this challenge, which we have labeled in-house integration, we have previously performed a qualitative multiple case study [21] consisting of nine cases in six organizations.
At a high level, there seem to be four strategies that are analytically easy to understand [10]: No
Integration (i.e. do nothing), Start from Scratch (i.e.
initiate development of a replacing system, and plan
for retiring the existing ones), Choose One (choose the
existing system that is most satisfactory and evolve it
while planning for retiring the others), and – the focus
of the present paper – Merge (take components from
several of the existing systems, modify them to make
them fit and reassemble them).
There may be several reasons for not attempting a
Merge, for example if the existing systems are
considered aged, or if users are dissatisfied and
improvements would require major efforts. Reusing
experience instead of implementations might then be
the best choice. Nevertheless, Merge is a tempting
possibility, because users and customers from the
previous systems would feel at home with the new
system, no or very little effort would be spent on new
development (only on modifications), and the risk
would be reduced in the sense that components are of
known quality. It would also be possible to perform
the Merge in an evolutionary manner by evolving the
existing systems so that more and more parts are
shared; this might be a necessity to sustain
commitment and focus of the integration project.
Among the nine cases of the case study, only in one case was Merge clearly chosen as the overall strategy, and only there has it made some progress, although there were elements of reuse between existing systems in some of the other cases as well. Given this background
research, we considered the Merge strategy to be the
least researched and understood and the least
performed in practice, as well as the most intellectually
challenging.
p. 113 of 199
1.2 Continuing with Merge
To explore the Merge strategy further, we returned to
one of the cases and performed follow-up interviews
focused on compatibility and the reasons for choosing
one or the other component. The organizational
context is a US-based global company that acquired a
slightly smaller global company in the same business
domain, based in Sweden. The company conducts
physics computer simulations as part of their core
business, and both sites have developed their own 3D
physics simulator software systems. Both systems are
written in Fortran and consist of several hundreds of
thousands lines of code, a large part of which are a
number of physics models, each modeling a different
kind of physics. The staff responsible for evolving
these simulators is less than a handful on each site, and
interviews with these people are our main source of
information [9].
At both sites, there were problems with their
model for a particular kind of physics, and both sites
had plans to improve it significantly (independent of
the merge). There was a strategic decision to integrate
or merge the systems in the long term, the starting
point being this specific physics module. This study
involved interviewing more people. It should be noted that although the interviewees met in a small group to discuss alternatives, they did not use our tool, since the tool was created after, and partly influenced by,
these events. The case is nevertheless used as an
example throughout the present paper, to illustrate
both the possibilities of the tool and motivate its
usefulness in practice.
In an in-house integration project, there is
typically a small group of architects who meet and
outline various solutions [10]. This was true for the
mentioned case as well as several others in the
previous study. In this early phase, variants of the
Merge strategy should be explored, elaborated, and
evaluated. The rest of the paper describes how the tool
is designed to be used in this context. The tool is not
intended to automatically analyze or generate any parts
of the real systems, only serve as a decision support
tool used mainly during a few days’ meeting. One
important design goal has therefore been simplicity, and the tool can be seen as an electronic version of a whiteboard or pen-and-paper used during discussions, although with some advantages, as we will show.
1.3 Related Work
Although the field of software evolution has been
maturing since the seventies [11,16], there is no
literature to be found on software in-house integration
and merge. Software integration as published in
literature can roughly be classified into: a)
p. 114 of 199
Component-Based Software Engineering [19,20], b) standard interfaces and open systems [13], and c) Enterprise Application Integration (EAI) [6,18]. These fields
typically assume that components or systems are
acquired from third parties and that modifying them is
not an option, which is not true in the in-house
situation. Also, these fields address components or
systems complementing each other (with the goal of
reducing development costs and time) rather than
systems that overlap functionally (with rationalization
of maintenance as an important goal).
Although there are methods for merging source
code [3,12], these approaches are unfeasible for
merging large systems with complex requirements,
functionality, quality, and stakeholder interests. The
abstraction level must be higher.
We have chosen to implement a simple
architectural view, the module view [5,7] (or
development view [8]), which is used to describe
development abstractions such as layers and modules
and their relationships. Such dependency graphs, first
defined by Parnas [14], are, during ordinary software evolution, the natural tool for understanding how modifications propagate throughout a system.
2. The Tool
The tool was developed by students as part of a project
course. The foundation of the tool is a method for
software merge. As this is ongoing work, this paper is
structured according to the method but focuses on the
tool. We also intend to publish the method separately,
as it has been refined during the tool implementation –
after which it is time to further improve the tool.
The method makes use of dependency graphs of
the existing systems. There is a formal model at the
core, with a loosely defined process on top based on
heuristics and providing some useful higher-level
operations. The tool conceptually makes the same
distinction: there are the formally defined concepts and
operations which cannot be violated, as well as higher-level operations and ways of visualizing the model, as
suggested by the informal process. In this manner, the
user is gently guided towards certain choices, but
never forced. A fundamental idea with both the
method and the tool is that they should support the
exploratory way of working – not hinder it.
The actual tool is implemented as an Eclipse plugin [1]. The model of the tool is based on the formal
model mentioned above, and its design follows the
same rules and constraints. The model was made using the Eclipse Modeling Framework and presented with the Eclipse Graphical Editing Framework, combined following the Model-View-Controller architecture. This makes the tool adaptable and upgradeable.
2.1 Preparatory Phase
There are two preparatory activities:
Activity P-I: Describe Existing Systems. The
user first needs to describe the existing systems as well
as outline a desired future system. The current
implementation supports two existing systems, but the
underlying model is not limited to only two.
Activity P-II: Describe Desired Future
Architecture. The suggestion of the final system is
determined simply by choosing which modules are
preferred in the outcome. Either system, A or B, can then be experimented upon, and the progress can be
followed through a scenario tree. Figure 1 shows a
snapshot of the tool with the two existing systems at
the top and the future system at the bottom. It might be
noted that the existing systems have – and must have –
identical structures (this assumption is further
discussed in section 2.3).
2.2 Exploratory Phase
The goal of the exploration is two system descriptions
where some modules have been exchanged, so that the
systems are evolved in parallel towards the desired
future, merged system. The goal is not only to describe
the future system (one graph would then be enough,
and no tool support needed) but to arrive at next
releases of the systems, in order to perform the merge
gradually, as a sequence of parallel releases of the two
existing systems until they are identical. This will
involve many tradeoffs on the behalf of the architects
(and other stakeholders) between e.g. efforts to be
spent only on making things fit for the next release and
more effort to include the more desired modules,
which will delay next release of a system. The tool
does not solve these tradeoffs but supports reasoning
about them. There are four activities defined in the
exploratory phase, with a rough ordering as follows,
but also a number of iterations.
Activity E-I: Introduce Desired Changes. The
starting point for exploration is to introduce some
desired change. In the case, it was imperative to start
by assuming a newly developed physics module (PX
in the figures) to be shared by both systems. In other
situations, the actual module to start with might not be
given. In the tool, this is done by choosing the
preferred module in the final system view, by clicking
on the checkboxes. A new module can also be attached
to the old system. This is done by clicking on the node
in final system, and then clicking on the button
“Create” in the Actions View. This will also require
user input for the name of the new module and effort
needed for its implementation (this could be zero for a
pre-existing component such as a commercial or open
source component, or a component to be reused in-house). After the module has been created, it can be used as any other module.
Figure 1: Initial systems state.
p. 115 of 199
The change to the system structure is made by clicking on the nodes and links in
the input systems A and B. The modules the systems
are using can be set up in the Status View for every
node in any input system.
Activity E-II: Resolve Inconsistencies. As
changes are introduced, the tool will highlight
inconsistencies between modules by painting the
dependency arrows orange (see Figure 2). In the
model, two module instances from the same system are
consistent without further adaptation. Two modules
from different systems are consistent only if some
measure has been taken to ensure it, i.e., if either
module has been adapted to work with the other. The
actual adaptations made could in practice be of many
kinds: some wrapping or bridging code as well as
modifications of individual lines of code.
Another way to resolve an inconsistency is to
describe adaptations to either of the inconsistent
modules, in order to make them match. This is done by
clicking on the incompatible link, and one of “Add …”
buttons in the Actions View. This will require the user
to enter an estimated effort for resolving this
inconsistency (a number, in e.g. man-months), and a
free text comment how to solve it, such as “we will
modify each call to methods x() and y(), and must also
introduce some new variables z and w, and do the
current v algorithm in a different way” (on some level
of detail found feasible). (As said, the tool does not do
anything with the real systems automatically, but in
this sense serves as a notebook during rapid
explorations and discussions.) It can be noted that a
module that will be newly developed would be built to
fit. Nevertheless there is an additional complexity in
building something to fit two systems simultaneously,
which is captured by this mechanism.
There is also a third possibility to resolve an
inconsistency: to let two modules for the same role live
side by side, see Figure 3. Although allowing the same
thing to be done in different ways is clearly a violation
of the system’s conceptual integrity, it could be
allowed during a transition period (until the final,
merged system is delivered) if the system’s correct
behavior can be asserted. For example, it might be
allowed for some stateless fundamental libraries, but
not when it is fundamentally assumed that there is only
one single instance responsible for a certain
functionality (e.g. for managing central resources, such as thread creation and allocation, access control to various hardware or software resources, or security). The tool cannot know whether this would be feasible in the real system; it is up to the users to decide when and whether to use this possibility. The current version does not model the potential need for communication and synchronization between two modules filling the same role.
p. 116 of 199
Figure 2: Example of highlighted inconsistencies.
Figure 3: Two modules with same role.
Figure 4: The History View.
Activity E-III: Branch Scenarios. As changes are
made, the operations are added to a scenario tree in the
History View (see Figure 4). At any time, it is possible
to click any choice made earlier in the tree, and branch
a new scenario from that point. The leaf of each
branch represents one possible version of the system.
When clicking on a node, the graphs are updated to
reflect the particular decisions leading to that node.
Any change to the systems (adaptations, exchanging
modules, etc.) results in a new node being created;
unless the currently selected node is a leaf node, this
means a new branch is created. All data for adaptations
entered are however shared between scenarios; this
means that the second time a particular inconsistency
is about to be resolved, the previous description and
effort estimation will be used. As information is
accumulated, the exploration will be more and more
rapid.
Figure 5: The Status View.
Activity E-IV: Evaluate Scenarios. The exploration
is a continuous iteration between changes being made
(activities E-II and E-III) and evaluation of the
systems. Apart from the information of the graphs
themselves, the Status View presents some additional
information, see Figure 5. The branching mechanism
thus allows the architects to try various ways of resolving inconsistencies, undo some changes (without losing them) and explore several alternatives in a
semi-parallel fashion, abandon the least promising
branches and evaluate and refine others further. The
total effort for an alternative can be accessed by
clicking the “History Analysis” button, which is
simply the sum of all individual adaptation efforts. It
also becomes possible to distinguish effort related to modifications that actually lead towards the desired future system from effort required only to make modules fit for the next delivery (and later discarded).
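A minimal sketch (with assumed names and types, not the actual formal model underlying the tool) of this bookkeeping could look as follows: modules carry a role and an origin, adaptations record an effort estimate and a note, a dependency is consistent when both modules come from the same system or an adaptation has been recorded for their roles, and the effort of a scenario is the sum of its adaptation efforts.

data Origin = FromA | FromB | NewlyDeveloped deriving (Eq, Show)

data Module = Module { role :: String, origin :: Origin } deriving (Eq, Show)

data Adaptation = Adaptation
  { adapted :: (String, String)   -- the pair of module roles made to fit
  , effort  :: Double             -- estimated effort, e.g. in man-months
  , note    :: String             -- free-text description of the adaptation
  }

-- A dependency between two module instances is consistent if both come from
-- the same system or if an adaptation has been recorded for their roles.
consistent :: [Adaptation] -> (Module, Module) -> Bool
consistent adapts (m, n) =
     origin m == origin n
  || (role m, role n) `elem` map adapted adapts
  || (role n, role m) `elem` map adapted adapts

-- The total effort of a scenario is simply the sum of its adaptation efforts.
scenarioEffort :: [Adaptation] -> Double
scenarioEffort = sum . map effort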
The tool’s advantage over using a whiteboard lies
in the possibility to switch back and forth among
(temporary) decisions made during the exploration (by
means of the scenario tree), make some further
changes (through simple point-and-click operations),
and constantly evaluate the resulting systems (by
viewing the graphs, the status view, and retrieve the
total effort for the scenario).
Finally, although not implemented yet, one would
extract the free texts associated with the scenario into a
list of implementation activities.
2.3 Similar Structures?
The tool (and the model) assumes that the existing
systems have identical structures, i.e. the same set of
module roles (e.g. one module instance each for file
handling, for physics X etc.) with the same
dependencies between them. This may seem a rather
strong assumption, but there are three motivations for
this, based on our previous multiple case study [10].
First, our previous observations strongly suggest that
similar structures are a prerequisite for merge to make
sense in practice. Second, we also observed that it is
not so unlikely that systems in the same domain, built
during the same era, are indeed similar. And third, if
the structures are not very similar, it is often possible
to find a higher level of abstraction where the systems
are similar.
With many structural differences, Merge is less likely
to be practically and economically feasible, and some
other high-level integration strategy should be chosen
(i.e. Start from Scratch or Choose One). A common
type of difference, which should not pose large difficulties in practice, is when there is a set of identical
module roles and dependencies, and some additional
modules that are only extensions to this common
architecture. (For example, in the case we could
imagine one of the systems to have a module modeling
one more physics model PW than the other.) However,
architects need in reality not be limited by the current
version: a simple workaround solution is to introduce
virtual module instances, i.e. modules that do not exist
in the real system (which are of course not desired in
the future system).
3. Future Research & Development
The tool is still in prototype stage and needs to be
further developed. Neither the method nor the tool has
been validated in a real industrial case (although their
construction builds heavily on industrial experiences).
In reality there are numerous ways to make two
components fit, for example as an adapter mimicking
some existing interface (which requires little or no
modifications of the existing code) or switches
scattered through the source code (as runtime
mechanisms or compile-time switches). Such choices
must be considered by the architects: a high-performance application and/or a resource-constrained
runtime environment might not permit the extra
overhead of runtime adapters, and many compile-time
switches scattered throughout the code makes it
difficult to understand. The method in its current
version does not model these choices explicitly but has
a very rough representation: the users can select which
of the two inconsistent modules should be
adapted, and add a free text description and an effort
estimation.
Another type of extension would be to include
several structural views of the architecture, including
some runtime view.
Yet another broad research direction is to extend
the method and the tool to not focus so much on
structure as the software architecture field usually does
[2,17]. Structure is only one high-level measure of
similarity between systems. Existing data models and the technological frameworks chosen (in the sense of “environment-defining components”) are also important additional issues to evaluate [10]; they need to be included in any merge discussion in reality, and should be included in future extensions of the merge method and the tool.
p. 117 of 199
4. Acknowledgements
We would like to thank the interviewees and their
organization for sharing their experiences and allowing
us to publish them. Thanks to Mathias Alexandersson,
Sebastien Bourgeois, Marko Buražin, Mladen Čikara,
Lei Liu, and Marko Pecić for implementing the tool.
Also thanks to Laurens Blankers, Jan Carlson, Ivica
Crnkovic, and Stig Larsson for previous and current
research collaborations related to this paper.
5. References
[1] Eclipse.org home, URL: www.eclipse.org,
2006.
[2] Bass L., Clements P., and Kazman R., Software
Architecture in Practice (2nd edition), ISBN 0-321-15495-9, Addison-Wesley, 2003.
[3] Berzins V., “Software merge: semantics of
combining changes to programs”, In ACM
Transactions on Programming Languages and
Systems (TOPLAS), volume 16, issue 6, pp.
1875-1903, 1994.
[4] Britton C. and Bye P., IT Architectures and
Middleware: Strategies for Building Large,
Integrated Systems (2nd edition), ISBN
0321246942, Pearson Education, 2004.
[5] Clements P., Bachmann F., Bass L., Garlan D.,
Ivers J., Little R., Nord R., and Stafford J.,
Documenting Software Architectures: Views
and Beyond, ISBN 0-201-70372-6, Addison-Wesley, 2002.
[6] Cummins F. A., Enterprise Integration: An
Architecture for Enterprise Application and
Systems Integration, ISBN 0471400106, John
Wiley & Sons, 2002.
[7] Hofmeister C., Nord R., and Soni D., Applied
Software Architecture, ISBN 0-201-32571-3,
Addison-Wesley, 2000.
[8] Kruchten P., “The 4+1 View Model of
Architecture”, In IEEE Software, volume 12,
issue 6, pp. 42-50, 1995.
[9] Land R., Interviews on Software Systems Merge, MRTC report ISSN 1404-3041 ISRN MDH-MRTC-196/2006-1-SE, Mälardalen Real-Time Research Centre, Mälardalen University, 2006.
[10] Land R. and Crnkovic I., “Software Systems In-House Integration: Architecture, Process Practices and Strategy Selection”, In Information & Software Technology, accepted for publication, 2006.
p. 118 of 199
[11] Lehman M. M. and Ramil J. F., “Software Evolution and Software Evolution Processes”, In Annals of Software Engineering, volume 14, issue 1-4, pp. 275-309, 2002.
[12] Mens T., “A state-of-the-art survey on software merging”, In IEEE Transactions on Software Engineering, volume 28, issue 5, pp. 449-462, 2002.
[13] Meyers C. and Oberndorf P., Managing Software Acquisition: Open Systems and COTS Products, ISBN 0201704544, Addison-Wesley, 2001.
[14] Parnas D. L., “Designing Software for Ease of Extension and Contraction”, In IEEE Transactions on Software Engineering, volume SE-5, issue 2, pp. 128-138, 1979.
[15] Parnas D. L., “Software Aging”, In Proceedings of The 16th International Conference on Software Engineering, pp. 279-287, IEEE Press, 1994.
[16] Perry D. E., “Laws and principles of evolution”, In Proceedings of International Conference on Software Maintenance (ICSM), pp. 70-70, IEEE, 2002.
[17] Perry D. E. and Wolf A. L., “Foundations for the study of software architecture”, In ACM SIGSOFT Software Engineering Notes, volume 17, issue 4, pp. 40-52, 1992.
[18] Ruh W. A., Maginnis F. X., and Brown W. J., Enterprise Application Integration, A Wiley Tech Brief, ISBN 0471376418, John Wiley & Sons, 2000.
[19] Szyperski C., Component Software - Beyond Object-Oriented Programming (2nd edition), ISBN 0-201-74572-0, Addison-Wesley, 2002.
[20] Wallnau K. C., Hissam S. A., and Seacord R. C., Building Systems from Commercial Components, ISBN 0-201-70064-6, Addison-Wesley, 2001.
[21] Yin R. K., Case Study Research: Design and Methods (3rd edition), ISBN 0-7619-2553-8, Sage Publications, 2003.
Degradation archaeology: studying software flaws’ evolution
Angela Lozano, Michel Wermelinger, Bashar Nuseibeh
Computing Department, The Open University
Walton Hall, Milton Keynes MK7 6AA, UK
[email protected], [email protected], [email protected]
Abstract
Given that software evolution depends on the ability to keep the knowledge about the system
and the architectural integrity, research has been focussed on how to ease code
comprehension and how to avoid architectural decay. Although these approaches have proven to be useful, the lack of understanding of software degradation prevents us from tackling it more adequately. Our position is that by studying the evolution of structural problems
based on source code evidence (like bad smells, violation of design rules and bad
programming styles), theory and practice of software evolution can be enhanced. The study
proposes experiments to analyse these structural problems along several versions to detect
their relations and evolution and to evaluate how structural changes impact them. The evolution of structural problems is considered in two ways: how a structural problem degrades with time, by studying it in isolated compilation units, and how related structural problems evolve as a group. The impact of restructuring is also considered by identifying the sets of
refactorings applied to successfully remove structural problems in a compilation unit
(individually) or in a certain version (as a group). By studying the causes of structural flaws
through different sources of information like metrics, design snapshots and the CVS
repository, we will obtain a high-level view that will allow us to generate predictive and
evaluative models for supporting decision making in software evolution.
Keywords: software evolution, software archaeology, flaw detection, flaw correction,
refactoring, program comprehension
Introduction
Recent trends in software process have emphasised the issue that change is unavoidable in
software development [4]. During application evolution faults are corrected and the application
is adapted to requirements that emerge, in part from the users’ experience. Nevertheless,
evolution is only possible when the development team knows the system and the architecture
is coherent enough to allow substantial changes without damaging architectural integrity; less
coherent architectures require more extensive knowledge in order to evolve them, and a lack
of knowledge results in a faster deterioration of the architecture [5].
Furthermore, a survey among industry specialists on the software productivity rate in recent years has shown [17] that, while some of them assume that the sophistication of development tools is what has allowed the industry to grow at such a fast rate, others suppose that the software industry’s productivity is declining due to software’s increasing
complexity. Software complexity has become an issue given that the user expectations have
increased and the inherent complexity of the problems solved by software systems cannot be
decreased. However, the two components of software complexity, algorithmic and structural complexity, can be tackled separately in order to achieve greater productivity. Algorithmic
complexity (performance of an algorithm) primarily consumes machine resources while
structural complexity (how to organize the program elements within a program) primarily
expends intellectual resources [9]. Given that intellectual resources are scarcer, most of the
support for software maintenance and evolution research has been focused on tools for
dealing with structural complexity as much on its cognitive dimension as on its practical
dimension. This concern about providing better tools to support the software development
process is reflected in the provision of software development tools by many of the biggest
software suppliers [17].
In consequence, most of the research on dealing with software evolution has been focused
on reducing its complexity through automatic support for program comprehension (which
includes visualization and reverse engineering techniques) and on restructuring (which
includes problem detection and problem correction). In this paper we propose the study of
bad smells and design flaws throughout the history of the system in order to derive better
prediction and evaluation models for defect detection, as well as gaining some insights on
correction strategies that were proven successful over time. Although analysing software
development history is not a new approach, our proposal is novel in two ways: we change the point of view from which software has been studied over time by focusing on structural problems, and we use a wider variety of sources of information that will allow us to gain a deeper understanding of degradation and stability processes.
The paper is organized as follows: section 1 reviews the achievements on supporting
software evolution that motivate our proposal, section 2 explains our approach, and section 3
contains some concluding remarks.
1. Automatic support for evolution sustainability
Due to the necessity of considering software architecture and software team knowledge in order to achieve longer-lasting systems, research efforts have followed two major trends: on the one hand, giving programmers higher-level views of facts extracted from source code; on the other hand, supporting perfective maintenance tasks by detecting problematic areas in the source code and by automatically restructuring the code.
1.1. Enhancing program comprehension
Given that a considerable amount of effort in software maintenance is spent on program
comprehension, several tools have been developed to support it. These tools aim to give
support for the strategies that program comprehension theories have found recurrent in
empirical studies [35].
According to Müller et al. [5] program comprehension tools “can manage the complexities of
program understanding by helping the software engineer extract high-level information from
low-level artifacts, such as source code”. However, apart from coevolving tools such as
intentional views [25], they have a reduced impact on evolution given that they do not have any long-term effect: each time a change is needed, the work spent in understanding the system has to be repeated.
Another disadvantage is that, even though there have been some formal qualitative studies
that prove tools’ impact on comprehension [36], there has not been any quantitative
evaluation of them because “there is no agreed-upon definition or test of understanding” [5].
To deal with this issue, conceptual frameworks like RODS [41] have been proposed to
evaluate the usefulness of these tools. The framework was used for evaluating Rigi and
RMTool, concluding that the theories converge with the beliefs within the reverse engineering
community concerning the cognitive advantages offered by the tools. Nevertheless, we think
that comparative evaluations deserve further investigation for identifying the implications,
limitations and advantages of each approach to program comprehension.
1.2. Reducing code degradation
The functionality that a structure can support gracefully is restricted to the predictions made when it was defined. Software aging is an inevitable process that occurs when systems start to fail because they have undergone several badly designed changes or because they cannot comply with changing needs. After repeated changes that do not comply with the original design contracts, the system becomes expensive to update because “changes take longer and are more likely to introduce new bugs” [31].
Reducing the aging effects by improving the system's internal quality is one of the diverse efforts that can be undertaken to evolve an application successfully. A good software structure improves software readability, by offering layers of abstraction, and eases software change, by offering appropriate encapsulation; therefore, it is seen as a key aspect in easing maintenance and evolution. Consequently, automatically identifying the areas of the source code that are susceptible to improvement, and automatically modifying them, are other ways to support evolution.
1.2.1. Restructuring support
A desirable functionality, whenever the source code is modified, is to be able to perform
simple restructuring tasks without worrying about their ripple effect or the time these
transformations consume. This functionality is called refactoring. Refactorings are defined as
structural transformations on source code that do not affect the external behaviour of the
code. Refactorings were presented as a set of graph transformations that under specific initial
conditions are able to preserve the observable behaviour of the code [30]. One of the first
implementations of refactoring was the refactoring browser [32]. It was presented as
automatic support for the recurrent tasks of redesign and extension of a system. The term
refactoring became popular with Fowler’s book [14] that, apart from a comprehensive list of
refactorings, included a chapter introducing the term bad smell to define and catalogue those
kinds of software structures perceived as having low quality, or in other words, those areas that are susceptible to improvement by means of a refactoring.
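To make the notion of a behaviour-preserving transformation concrete, the following Java sketch shows one of the refactorings from Fowler's catalogue, Extract Method, applied to a small routine exhibiting a "long method" smell; the class and data are invented for illustration and are not taken from any of the systems discussed here.

// Hypothetical example (names invented): an "Extract Method" refactoring,
// one of the behaviour-preserving transformations catalogued by Fowler [14].
// Before: printOwing mixes banner printing with detail printing.
class InvoiceBefore {
    double amount = 42.0;

    void printOwing(String customer) {
        System.out.println("*************");
        System.out.println("** Invoice **");
        System.out.println("*************");
        System.out.println("Customer: " + customer);
        System.out.println("Amount:   " + amount);
    }
}

// After: the banner-printing statements are extracted into their own method.
class InvoiceAfter {
    double amount = 42.0;

    void printOwing(String customer) {
        printBanner();
        printDetails(customer);
    }

    private void printBanner() {
        System.out.println("*************");
        System.out.println("** Invoice **");
        System.out.println("*************");
    }

    private void printDetails(String customer) {
        System.out.println("Customer: " + customer);
        System.out.println("Amount:   " + amount);
    }
}

Before and after the transformation the printed output is identical, which is precisely the sense in which a refactoring preserves observable behaviour.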
The value of refactorings has increased with incremental contributions that used them to achieve structural transformations at a higher level. Tokuda and Batory demonstrated that it is viable to achieve large-scale changes by having higher-level refactorings made of sequences of basic refactorings [39]. Tahvildari and Kontogiannis showed the usefulness of refactorings to insert design patterns into code that was difficult to maintain [37]. Kniesel proved the feasibility of automatically creating and validating higher-level refactorings through the composition of their preconditions and expected impact [22].
Summarizing, refactoring is seen as a practical alternative for decelerating software decay and, in consequence, an ally when software evolves. Additionally, it has been shown that, using refactorings, it is feasible to evolve software systems automatically [39]. However, this raises the problem of finding the areas in need of improvement.
1.2.2. Tool support for the detection problem
In order to maximize the potential impact of refactoring in evolution, one has to locate the
areas in the source code in need of restructuring; this issue has been called the detection
problem [7].
The location of improvement opportunities has been approached in several ways, among others: locating typical problems like clones [11] and calculating the difficulty of removing them [2], detecting inadequate design decompositions with metrics [34], identifying software metrics with a high impact on maintainability [38], detecting poor implementations of design patterns and transforming them into their canonical form [19], locating abuses of language modularizations that might imply a misuse of object-oriented concepts [12], and identifying unusual patterns in the system [20].
Another trend within the detection problem is to detect violations of design principles by translating them into query clauses [7] or into metrics [24].
In the first approach, the goal is to find the structure instances that breach design rules. Nevertheless, this method might be limited, given that the underlying infrastructure required to apply it to different programming languages requires different parsers, which brings its own problems for evolving legacy code [11].
In the second approach, the objective is to decompose the design principles into measurable situations and to designate threshold and range values within which those situations can be interpreted as an infringement of the design rule and as low software quality. This approach was enhanced by taking the history of the system into account in two ways: first, it made it possible to validate the detection strategies, identifying false positives and false negatives by looking for the persistence of the detected problems from one version to the next [24]; second, it allowed dependencies that are not otherwise evident to be discovered, by studying entities that changed in the same version [16]. Besides contributing to locating design flaws in the source code, the use of metrics also addresses the qualification of software attributes based on single values that neglect other characteristics. However, the threshold values are still subjective and, even in the cases in which the threshold values were refined by training a genetic algorithm (which takes time and effort [27]), they remain bound to the system from which they were obtained.
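As an illustration of what such a metrics-based detection strategy can look like in practice, the following Java sketch encodes a god-class style strategy in the spirit of Marinescu's detection strategies [24]; the combination of WMC, ATFD and TCC follows that line of work, but the concrete threshold values, the record type and the input data are assumptions made here for illustration, not values prescribed by the cited paper.

import java.util.List;

// Illustrative sketch of a metrics-based detection strategy in the style of [24].
// The thresholds below are placeholders; in practice they are the subjective,
// system-dependent values discussed in the text.
public class GodClassDetector {

    // Per-class measurements: weighted methods per class (WMC),
    // accesses to foreign data (ATFD), tight class cohesion (TCC).
    record ClassMetrics(String name, int wmc, int atfd, double tcc) {}

    static final int WMC_VERY_HIGH = 47;   // assumed threshold
    static final int ATFD_FEW = 5;         // assumed threshold
    static final double TCC_ONE_THIRD = 1.0 / 3.0;

    // A class is flagged when it is complex, uses foreign data heavily,
    // and is poorly cohesive; this conjunction encodes the design rule.
    static boolean isGodClass(ClassMetrics c) {
        return c.wmc() >= WMC_VERY_HIGH
            && c.atfd() > ATFD_FEW
            && c.tcc() < TCC_ONE_THIRD;
    }

    public static void main(String[] args) {
        List<ClassMetrics> snapshot = List.of(
            new ClassMetrics("OrderManager", 63, 12, 0.15),
            new ClassMetrics("Money", 8, 0, 0.9));
        snapshot.stream()
                .filter(GodClassDetector::isGodClass)
                .forEach(c -> System.out.println("Suspect god class: " + c.name()));
    }
}

The subjectivity of the thresholds discussed above corresponds exactly to the three constants in the sketch: changing them changes which classes are flagged.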
Problem detection has demonstrated its usefulness in perfective maintenance, but it addresses only part of the issue. Once a defective area is located, it is necessary to solve the correction problem, also called the solution analysis stage [40]. The goal of the correction problem is to derive the appropriate set of refactorings to improve the problematic code; this goal implies, among other tasks, evaluating whether the order in which refactorings are applied affects the final quality, and whether the order in which the problems are fixed generates further problems and, if so, to what extent.
1.2.3. Tool support for the correction problem
There is a recent interest in correcting and improving the source code automatically [21, 33, 40]. In this trend, authors detect as bad structures those pointed out by metrics based on design heuristics. Trifu et al. [40] use Marinescu's detection strategies [24], and then derive a set of refactorings that fix the problem and maximize the improvement of the application's quality by selecting an appropriate refactoring order. Keeffe and Cinnéide [21] propose a set of metrics to express design heuristics; the improvement is achieved by repeatedly applying a random refactoring to the design and maximizing the quality of the obtained design, calculated as the weighted sum of metrics. Seng et al. [33] use genetic algorithms to evaluate the best structural arrangement, using metrics they propose themselves to measure the overall compliance of the architecture with design heuristics.
Although correction techniques look very promising, it has been shown that automatic transformations can have a negative impact on software comprehension, given that they undermine the mapping between problem domain concepts and software entities [29].
2. Studying structural problems’ evolution
Even though there is a substantial amount of work on software quality maintenance and improvement, as shown in the previous section, this area is not yet fully covered; the research community has identified remaining challenges for software evolution [26]. These challenges indicate that it is still necessary to propose new predictive models; to develop metaphors that increase the awareness of software managers about software evolution; and to develop theories to understand software evolution.
Besides the challenges identified, we think that a deeper understanding of architecture decay is necessary to deal effectively with software evolution, especially as the aforementioned restructuring approaches for reducing code degradation do not take into account that the structure must accommodate the drift that the application exhibits through its history of changes. Furthermore, by patching the problematic area they might introduce more problems somewhere else in the system, and by applying massive modifications they ignore conscious design decisions. Our assumption is that there are restructurings with a longer-lasting effect in facilitating change, as they consider the changes that the application has been undergoing and may therefore better fit similar future changes. From the various ways in which the architecture can degrade, we plan to study those that are manifested at the source-code level, such as bad smells [14], violations of design rules [24] and bad programming styles [12]. (Note that structural problems are different from bugs: although structural problems may cause software defects, in general structural degradations may not have any functional effect.)
We will build upon the fact that these structural problems are a studied, well known
metaphor, and that there are techniques to automatically point out their location, as shown in
the previous section.
The purpose of the approach is threefold: to improve the prediction of structural problems by
studying their relations and evolution, to identify the successful restructuring efforts to
eliminate them, and to produce an evolution metaphor of structural problems in the
architecture.
We plan to identify structural problems at major releases and relate them to design
components in order to track them over time. In addition, we aim to extract relatively coarse-grained structural and semantic information on each of several snapshots, using existing code analysis
techniques to gain a better understanding of the structural problems identified. Then, using
automatic analysis over those enriched snapshots, we aim to detect how structural problems
are related, and which architectural modifications impact their degradation or improvement, as well as their dissemination (e.g. through cloning) or extinction (e.g. through refactoring).
The project is divided into three experiments, each concentrating on a separate goal: to study the semantic relationships among structural problems, to find patterns of dissemination and extinction, and to evaluate how structural modifications affect them.
Firstly, we plan to use techniques like dependency graphs and change coupling to find
relationships among structural problems inside one compilation unit and throughout the
architecture. The relations found in one version are validated by measuring their frequency across versions. The objective of finding these relations is to identify common underlying
aspects and to propose them as means to deal with groups of structural problems as a whole
entity.
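A minimal sketch of the change-coupling part of this first experiment is given below; it assumes that the version history has already been reduced to the sets of files changed together in each commit (the commit data shown is invented), and it simply counts how often each pair co-changes so that only relations recurring across versions are retained.

import java.util.*;

// Illustrative sketch of change coupling: two entities are considered coupled
// when they are modified together in the same commit, and the coupling is
// validated by how frequently it recurs across the history.
public class ChangeCoupling {

    // Count, for every pair of files, in how many commits they changed together.
    static Map<String, Integer> coChangeCounts(List<Set<String>> commits) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> commit : commits) {
            List<String> files = new ArrayList<>(commit);
            Collections.sort(files);
            for (int i = 0; i < files.size(); i++)
                for (int j = i + 1; j < files.size(); j++)
                    counts.merge(files.get(i) + " <-> " + files.get(j), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Set<String>> commits = List.of(
            Set.of("Order.java", "Invoice.java"),
            Set.of("Order.java", "Invoice.java", "Customer.java"),
            Set.of("Customer.java"));
        int minSupport = 2;  // assumed threshold: a relation must recur to be kept
        coChangeCounts(commits).forEach((pair, n) -> {
            if (n >= minSupport)
                System.out.println(pair + " co-changed in " + n + " commits");
        });
    }
}

In the actual study the entities would be structural problems (or the compilation units hosting them) rather than raw file names, and the history would come from the CVS repository mentioned in the abstract.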
Then we intend to find patterns of dissemination and extinction as a way to provide early detection and prevention of structural decay. Here it is also important to consider separately the dissemination patterns within a single compilation unit and within the architecture as a whole. By knowing how structural problems evolve, we might be able to evaluate their severity and future impact on the architecture.
Finally, we expect to identify the recurrent restructuring sequences in areas close to structural problems and to evaluate their rate of success (measured in terms of the number of structural problems removed and the number of compilation units free of structural problems). This study will result in suggestions for better correction policies.
Moreover, these empirical studies will also increase the level of abstraction at which software
evolution is studied and will give some insights about useful information to be tracked in
versioning systems to support software evolution.
Analysing software development history is not a new approach; researchers have found versioning repositories to be a rich source of information [1, 6, 10, 13, 15, 16, 18, 28, 42] to support development tasks, because they contain the history of changes, the affected items, some rationale about design decisions, authoring information, etc. There are several applications of analysing software history, among others: providing change and risk predictors [18, 28, 42], identifying hidden dependencies [15, 16, 42], locating features [6, 13], identifying refactorings [1, 6, 10], and detecting design flaws [16]. History analysis is also used in longer-term efforts trying to build theories and characterize evolution based on empirical studies [23]. Nevertheless, to our knowledge this proposal is novel in two ways: the focal point of study is changed by concentrating on the evolution of structural problems, and a wider variety of sources of information is used, allowing us to gain a deeper understanding of degradation and stabilisation processes.
3. Conclusions
Since the early days of software development there has been an interest in studying the evolution of changes and the impact of those changes on the system, in order to identify the factors that negatively affect its evolution [3]. On the other hand, there is a need for automatic analysis of software to support decision making in maintainability and evolution contexts [8]. This position paper describes some recent efforts in improving software quality and, based on them, proposes a set of empirical studies, to be carried out as part of the first author's PhD, about the evolution of structural problems. The goal is to contribute to the theory of software evolution and to its managerial usefulness and awareness.
References
1. Antoniol, G., Di Penta, M. and Merlo, E. An automatic approach to identify class evolution discontinuities. In Proc. Intl Workshop on Principles of Software Evolution (IWPSE04), 2004, 31-40.
2. Balazinska, M., Merlo, E., Dagenais, M., Lagüe, B. and Kontogiannis, K. Advanced Clone-Analysis to Support Object-Oriented System Refactoring. In Proc. Working Conf. on Reverse Engineering (WCRE'00), IEEE Computer Society, 2000, 98.
3. Basili, V.R. and Perricone, B.T. Software errors and complexity: an empirical investigation. Commun. ACM, vol. 27, pp. 42-52, 1984.
4. Beck, K. Embracing change with extreme programming. Computer, vol. 32, pp. 70-77, 1999.
5. Bennett, K.H. and Rajlich, V.T. Software maintenance and evolution: a roadmap. In Proc. Conf. on The Future of Software Engineering, ACM Press, Limerick, Ireland, 2000, 73-87.
6. Chen, A., Chou, E., Wong, J., Yao, A.Y., Zhang, Q., Zhang, S. and Michail, A. CVSSearch: searching through source code using CVS comments. In Proc. Intl Conf. on Software Maintenance (ICSM01), 2001, 364-373.
7. Ciupke, O. Automatic Detection of Design Problems in Object-Oriented Reengineering. In Proc. Technology of Object-Oriented Languages and Systems (TOOLS99), IEEE Computer Society, 1999, 18.
8. Coleman, D., Ash, D., Lowther, B. and Oman, P. Using Metrics to Evaluate Software System Maintainability. Computer, vol. 27, pp. 44-49, 1994.
9. Darcy, D.P., Kemerer, C.F., Slaughter, S.A. and Tomayko, J.E. The Structural Complexity of Software: An Experimental Test. IEEE Transactions on Software Engineering, vol. 31, pp. 982-995, 2005.
10. Demeyer, S., Ducasse, S. and Nierstrasz, O. Finding refactorings via change metrics. In Proc. Conf. on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA00), ACM Press, Minneapolis, Minnesota, United States, 2000, 166-177.
11. Ducasse, S., Rieger, M. and Demeyer, S. A Language Independent Approach for Detecting Duplicated Code. In Proc. Intl Conf. on Software Maintenance (ICSM99), IEEE Computer Society, 1999, 109.
12. Emden, E.V. and Moonen, L. Java Quality Assurance by Detecting Code Smells. In Proc. Working Conf. on Reverse Engineering (WCRE'02), IEEE Computer Society, 2002, 97.
13. Fischer, M., Pinzger, M. and Gall, H. Analyzing and relating bug report data for feature tracking. In Proc. Working Conf. on Reverse Engineering (WCRE03), 2003, 90-99.
14. Fowler, M., Beck, K., Brant, J., Opdyke, W. and Roberts, D. Refactoring: Improving the Design of Existing Code. Addison-Wesley Professional, 1999.
15. Gall, H., Jazayeri, M. and Krajewski, J. CVS release history data for detecting logical couplings. In Proc. Intl Workshop on Principles of Software Evolution (IWPSE03), 2003, 13-23.
16. Girba, T., Ducasse, S., Marinescu, R. and Ratiu, D. Identifying Entities That Change Together. In Proc. Workshop on Empirical Studies of Software Maintenance (WESS04), 2004.
17. Groth, R. Is the software industry's productivity declining? IEEE Software, 2004, 92-94.
18. Hassan, A.E. and Holt, R.C. Predicting change propagation in software systems. In Proc. Intl Conf. on Software Maintenance (ICSM04), 2004, 284-293.
19. Jahnke, J.H. and Zundorf, A. Rewriting Poor Design Patterns by Good Design Patterns. In Proc. ESEC/FSE '97 Workshop on Object-Oriented Reengineering, Technical Report TUV-1841-97-10, Technical University of Vienna, Information Systems Institute, Argentinierstrasse 8/184-1, A-1040 Wien, Austria, 1997.
20. Kataoka, Y., Ernst, M., Griswold, W. and Notkin, D. Automated Support for Program Refactoring using Invariants. In Proc. Intl Conf. on Software Maintenance (ICSM'01), IEEE Computer Society, 2001, 736.
21. Keeffe, M.Ó. and Cinnéide, M.Ó. Towards Automated Design Improvement Through Combinatorial Optimisation. In Proc. Workshop on Directions in Software Engineering Environments, Edinburgh, 2004.
22. Kniesel, G. ConTraCT - A refactoring editor based on composable conditional program transformation. In Proc. Generative and Transformational Techniques in Software Engineering (GTTSE05), Braga, Portugal, 2005, 79-93.
23. Lehman, M.M. and Ramil, J.F. An approach to a theory of software evolution. In Proc. Int'l Workshop on Principles of Software Evolution (IWPSE01), ACM Press, Vienna, Austria, 2001, 70-74.
24. Marinescu, R. Detection Strategies: Metrics-Based Rules for Detecting Design Flaws. In Proc. Intl Conf. on Software Maintenance (ICSM04), IEEE Computer Society, 2004, 350-359.
25. Mens, K., Poll, B. and González, S. Using Intentional Source-Code Views to Aid Software Maintenance. In Proc. Intl Conf. on Software Maintenance (ICSM03), IEEE Computer Society, 2003, 169.
26. Mens, T., Wermelinger, M., Ducasse, S., Demeyer, S., Hirschfeld, R. and Jazayeri, M. Challenges in software evolution. In Proc. Intl Workshop on Principles of Software Evolution (IWPSE05), 2005, 13-22.
27. Mihancea, P.F. and Marinescu, R. Towards the Optimization of Automatic Detection of Design Flaws in Object-Oriented Software Systems. In Proc. European Conf. on Software Maintenance and Reengineering (CSMR05), 2005, 92-101.
28. Mockus, A. and Votta, L.G. Identifying reasons for software changes using historic databases. In Proc. Int'l Conf. on Software Maintenance (ICSM00), 2000, 120-130.
29. Moore, I. Automatic inheritance hierarchy restructuring and method refactoring. In Proc. Conf. on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA96), ACM Press, San Jose, California, United States, 1996, 235-250.
30. Opdyke, W.F. and Johnson, R.E. Creating abstract superclasses by refactoring. In Proc. ACM Conf. on Computer Science, ACM Press, Indianapolis, Indiana, United States, 1993, 66-73.
31. Parnas, D.L. Software aging. In Proc. 16th Intl Conf. on Software Engineering, IEEE Computer Society Press, Sorrento, Italy, 1994, 279-287.
32. Roberts, D., Brant, J. and Johnson, R. A refactoring tool for Smalltalk. Theor. Pract. Object Syst., vol. 3, pp. 253-263, 1997.
33. Seng, O., Bauer, M., Biehl, M. and Pache, G. Search-based improvement of subsystem decompositions. In Proc. Conf. on Genetic and Evolutionary Computation (GECCO05), ACM Press, Washington DC, USA, 2005, 1045-1051.
34. Simon, F., Steinbrückner, F. and Lewerentz, C. Metrics Based Refactoring. In Proc. European Conf. on Software Maintenance and Reengineering (CSMR01), IEEE Computer Society, 2001, 30.
35. Storey, M.-A. Theories, Methods and Tools in Program Comprehension: Past, Present and Future. In Proc. Int'l Workshop on Program Comprehension (IWPC05), IEEE Computer Society, 2005, 181-191.
36. Storey, M.-A., Wong, K. and Muller, H. How do program understanding tools affect how programmers understand programs. In Proc. Working Conf. on Reverse Engineering (WCRE'97), IEEE Computer Society, Amsterdam, The Netherlands, 1997, 12-21.
37. Tahvildari, L. and Kontogiannis, K. A Methodology for Developing Transformations Using the Maintainability Soft-Goal Graph. In Proc. Working Conf. on Reverse Engineering (WCRE'02), IEEE Computer Society, 2002, 77.
38. Tahvildari, L. and Kontogiannis, K. A Metric-Based Approach to Enhance Design Quality through Meta-pattern Transformations. In Proc. European Conf. on Software Maintenance and Reengineering (CSMR03), IEEE Computer Society, 2003, 183.
39. Tokuda, L. and Batory, D. Evolving Object-Oriented Designs with Refactorings. Automated Software Engineering, vol. 8, pp. 89-120, 2001.
40. Trifu, A., Seng, O. and Genssler, T. Automated Design Flaw Correction in Object-Oriented Systems. In Proc. European Conf. on Software Maintenance and Reengineering (CSMR'04), IEEE Computer Society, 2004, 174.
41. Walenstein, A. Foundations of Cognitive Support: Toward Abstract Patterns of Usefulness. In Proc. Intl Workshop on Interactive Systems: Design, Specification, and Verification, Springer-Verlag, 2002, 133-147.
42. Zimmermann, T., Diehl, S. and Zeller, A. How history justifies system architecture (or not). In Proc. Int'l Workshop on Principles of Software Evolution (IWPSE03), 2003, 73-83.
Dependency Analysis of Model Inconsistency
Resolutions
Tom Mens (1), Ragnhild Van Der Straeten (2), and Maja D'Hondt (3)
(1) Software Engineering Lab, Université de Mons-Hainaut
Av. du champ de Mars 6, 7000 Mons, Belgium
[email protected]
(2) Systems and Software Engineering Lab, Vrije Universiteit Brussel
Pleinlaan 2, 1050 Brussel, Belgium
[email protected]
(3) Project INRIA Jacquard - Laboratoire d'Informatique Fondamentale de Lille
59655 Villeneuve d'Ascq Cedex, France
[email protected]
Abstract. Model inconsistency management is a crucial aspect of model-driven
software engineering. In this article we explore how the theory of graph transformation, and critical pair analysis in particular, can be used to improve support
for the detection and resolution of UML model inconsistencies. As a proof-of-concept, we report on an experiment that we have carried out along these lines
using the state-of-the-art graph transformation tool AGG. Our initial results look
very promising, but further work is required to integrate the proposed approach
into contemporary modelling tools.
1 Introduction
Model-driven engineering (MDE) is an approach to software engineering where the
primary focus is on models, as opposed to source code. Models are built representing
different views on a software system. The ultimate goal is to raise the level of abstraction, and to develop more complex software systems by manipulating models only.
Any software system that is deployed in the real-world is subject to evolution [1].
Because of this, it is crucial for any software development process to provide support for
software evolution. This includes support for version control, traceability management
and change impact analysis, change propagation, inconsistency management. In this
paper, we focus on the activity of model inconsistency management in particular.
In [2], we presented a classification of model inconsistencies. For each of these
inconsistencies a set of resolutions can be specified. Inconsistencies can be detected
automatically and resolution strategies can be proposed to the designer [3]. However, it is possible that many inconsistencies are detected and that, for each inconsistency, many resolutions are possible. Dependencies can occur between different resolution strategies and between resolution strategies and inconsistencies. As a result, a resolution strategy can introduce new occurrences of certain inconsistencies, or can resolve occurrences of several inconsistencies at once. Resolution strategies can also introduce new possibilities to resolve other inconsistency occurrences.
In this paper, we report on an experiment we have carried out to detect and resolve
model inconsistencies by specifying them as transformation rules, and to analyse sequential dependencies between these transformation rules. To achieve this, we rely on
the state-of-the-art graph transformation tool AGG (version 1.3.0).
2 Experimental Setup
For the sake of the experiment, we have restricted ourselves to model inconsistencies
in UML models consisting of a simplified version of UML 2.0 [4] class diagrams and
protocol state machine diagrams only. It is, however, straightforward to relax this restriction and to apply our approach to other types of UML diagrams as well.
Rather than specifying the model inconsistencies and resolution strategies in some
dedicated UML CASE tool, we decided to represent UML models in a more generic
graph-based format in the general-purpose graph transformation tool AGG. The metamodel for the considered subset of the UML is expressed as a so-called type graph in
AGG, as shown in Figure 1.
Note that this metamodel deliberately does not enforce many well-formedness rules.
The reason for this has to do with the distinction between inconsistency management
and consistency maintenance. Consistency maintenance is a very rigid approach in the
sense that the software system always needs to be in a consistent state. All modifications
to the system must preserve the consistency. With inconsistency management, models
may be partially inconsistent, and modifications to the models may even introduce new
inconsistencies. This makes the approach more flexible, and thus more suitable in certain situations such as early analysis and design, where the models may still be partially
incomplete or inconsistent, and collaborative software development, where software
engineers may make mutually conflicting changes to the models.
2.1 Identifying model inconsistencies and their resolution strategies
In a first step of our approach, the model inconsistencies and resolution strategies are
identified. We use the following template to show some examples of model inconsistencies and resolution strategies that we have selected for our experiment:
NameOfModelInconsistency. Description of model inconsistency.
1. First strategy to resolve the model inconsistency
2. Second inconsistency resolution strategy, and so on . . .
The set of inconsistencies described below is based on the model elements occurring in the fragment of the metamodel presented in Figure 1. In the experiment presented here, we only take into account the structural inconsistencies presented in [2].
These inconsistencies are caused by the addition, deletion and modification of model
elements occurring in the fragment of the UML metamodel under consideration. The
translation of these inconsistencies into graph transformation rules is quite obvious.
The specification of the behaviour and especially the inheritance of behaviour can also
introduce specific inconsistencies (cf. behavioural inconsistencies in [2]). The analysis
Fig. 1. Simplified metamodel for UML class diagrams and protocol state machine diagrams, expressed as a type graph in AGG. Blue parts specify protocol state machine diagrams, black parts
specify class diagrams, and pink edges express their interrelationships. In addition, a node type
Conflict is introduced to represent model inconsistencies.
of such inconsistencies is an issue of future work. For each inconsistency, we also describe a set of resolution strategies. We do not claim that this set of resolution strategies
is complete. The resolution strategies proposed boil down to the addition, deletion or
modification of relevant model elements.
DanglingTypeReference. An Operation has one or more Parameters whose types are
not specified. It can be resolved in 3 different ways:
1. Remove the parameter whose type is undefined.
2. Assign an existing class as the type of the previously undefined parameter.
3. Define a new class as the type of the previously undefined parameter.
ClasslessInstance. A model contains an InstanceSpecification that is not linked to a
Class.
1. Remove the instance specification altogether.
2. Link the instance specification to an existing class.
3. Add a new class and link the instance specification to this new class.
AbstractObject. A model contains an InstanceSpecification that is an instanceOf an
abstract Class that does not have any concrete subclasses. (This is an inconsistency
since abstract classes cannot be instantiated.)
1. Change the abstract class into a concrete one.
2. Add a concrete descendant class of the abstract class, and redirect the outgoing instanceOf-edge of the instance specification to this concrete descendant instead of to the abstract class.
3. Remove the instance specification altogether.
4. Add a concrete descendant class of the abstract class.
AbstractOperation. An abstract Operation is defined in a concrete Class. (This is an
inconsistency, since a concrete class is supposed to have concrete operations only.)
1. Change the abstract operation into a concrete one.
2. Remove the abstract operation altogether.
3. Change the concrete class containing the operation into an abstract class.
4. If there is an abstract ancestor of the containing class, move up the abstract
operation to this abstract ancestor class.
5. Add an abstract ancestor class of the concrete class, and move up the abstract
operation to this abstract ancestor.
6. If there is an abstract descendant of the containing class, move down the abstract operation to this abstract descendant class.
7. Add an abstract descendant class of the concrete class, and move down the
abstract operation to this abstract descendant.
2.2 Specifying inconsistency detection and resolution rules
As a second step, we formally represent model inconsistencies and resolution strategies
as graph transformations in AGG.
Fig. 2. Specification of a model inconsistency and one of its resolution strategies as graph transformation rules in AGG.
Figure 2 shows an example of a model inconsistency and a resolution strategy represented as graph transformations. Both transformations consist of a left-hand side (LHS) in the upper middle pane, a right-hand side (RHS) in the upper right pane, and zero or more negative application conditions (NACs) that represent forbidden contexts. A NAC is shown in the upper left pane; it specifies that the parameter has no associated type. The right-hand side of the transformation specifying a model inconsistency is the same as the left-hand side, except that a new node of type Conflict is introduced, with a link to the node that was the source of the model inconsistency. The graphs themselves are directed, attributed and typed; the latter implies that all graphs conform to the type graph presented in Figure 1.
To specify inconsistency resolution strategies, we assume that a model inconsistency has already been detected before. Hence, the graph will already contain at least
one node of type Conflict indicating (by means of its attribute description) the type of
inconsistency that needs to be resolved. For each of the inconsistency resolution strategies identified in section 2.1, we can define a corresponding graph transformation rule
that formalises the textual description. For the names of the inconsistency resolution
rules, we have chosen the numbering scheme used in section 2.1. For example, rule
DanglingTypeReference-Res2 corresponds to the second resolution strategy for the
DanglingTypeReference model inconsistency. After having applied the inconsistency
resolution rule, the model inconsistency will no longer be present, and the corresponding Conflict-node is removed from the graph structure.
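The following Java sketch mimics, in a deliberately simplified form, the pair formed by the DanglingTypeReference detection rule and its second resolution rule (DanglingTypeReference-Res2) over a tiny in-memory model; it does not use the AGG API or any graph transformation machinery, and all class and method names are invented for illustration.

import java.util.*;

// Toy illustration (not the AGG API): the detection step adds a Conflict marker
// for every parameter without a type, and the Res2 resolution step assigns an
// existing class to the parameter and removes the marker.
public class InconsistencyRules {

    static class Clazz { final String name; Clazz(String n) { name = n; } }
    static class Parameter { final String name; Clazz type; Parameter(String n) { name = n; } }
    record Conflict(String description, Parameter source) {}

    // Detection: LHS = a parameter; NAC = it already has a type;
    // RHS = same model plus a Conflict marker attached to the parameter.
    static List<Conflict> detectDanglingTypeReference(List<Parameter> params) {
        List<Conflict> conflicts = new ArrayList<>();
        for (Parameter p : params)
            if (p.type == null)
                conflicts.add(new Conflict("DanglingTypeReference", p));
        return conflicts;
    }

    // Resolution strategy 2: link the parameter to an existing class
    // and delete the corresponding Conflict marker.
    static void resolveRes2(Conflict c, Clazz existing, List<Conflict> conflicts) {
        c.source().type = existing;
        conflicts.remove(c);
    }

    public static void main(String[] args) {
        Clazz account = new Clazz("Account");
        List<Parameter> params = new ArrayList<>(List.of(new Parameter("target")));
        List<Conflict> conflicts = detectDanglingTypeReference(params);
        System.out.println("Detected: " + conflicts.size());   // 1
        resolveRes2(conflicts.get(0), account, conflicts);
        System.out.println("Remaining: " + conflicts.size());  // 0
    }
}

The Conflict marker plays the role of the Conflict node of the type graph: detection only adds markers, while resolution both repairs the model and removes the corresponding marker.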
2.3 Detecting dependencies between inconsistency resolutions
The next step is to perform static analysis on graph transformations in order to detect causal dependencies between the resolution rules introduced before. The analysis
is based on the formal notion of critical pair. The goal of critical pair analysis [5] is
to compute all potential dependencies for a given set of transformations by pairwise
comparison. We are particularly interested in sequential dependency. A sequential dependency between two transformations occurs if the application of the second transformation requires the prior application of the first transformation.
Fig. 3. Critical pair illustrating a sequential dependency between two resolution rules.
As an illustration of a critical pair that identifies a potential sequential dependency
between two resolution rules, take a look at Figure 3. Intuitively, it is clear that the
resolution rules AbstractObject-Res1 and AbstractOperation-Res3 are sequentially
dependent. Indeed, the first transformation makes an abstract class concrete, while the
second transformation requires a concrete class for its application. As such, the application of the first transformation may enable the application of the second transformation.
This causal dependency is detected as an overlap (a single class, labelled 1) between the right-hand side of the first transformation and the left-hand side of the second transformation.
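To convey the flavour of this pairwise comparison, the following Java sketch reduces each rule to the element kinds its left-hand side requires and its right-hand side produces, and reports a potential sequential dependency whenever one rule produces something another rule requires; this flat-set abstraction is a drastic simplification of critical pair analysis (which works on graph overlaps), and the rule encodings are assumptions made here for illustration.

import java.util.*;

// Highly simplified stand-in for critical pair analysis: rule B potentially
// depends sequentially on rule A if A produces an element kind that B requires.
// Real critical pair analysis works on graph overlaps, not on flat sets.
public class SequentialDependencies {

    record Rule(String name, Set<String> requires, Set<String> produces) {}

    static boolean mayDependOn(Rule second, Rule first) {
        return first.produces().stream().anyMatch(second.requires()::contains);
    }

    public static void main(String[] args) {
        Rule abstractObjectRes1 = new Rule("AbstractObject-Res1",
            Set.of("abstract class"), Set.of("concrete class"));
        Rule abstractOperationRes3 = new Rule("AbstractOperation-Res3",
            Set.of("concrete class"), Set.of("abstract class"));

        List<Rule> rules = List.of(abstractObjectRes1, abstractOperationRes3);
        for (Rule a : rules)
            for (Rule b : rules)
                if (a != b && mayDependOn(b, a))
                    System.out.println(b.name() + " sequentially depends on " + a.name());
    }
}

On these two rules the sketch reports the dependency discussed above (AbstractOperation-Res3 on AbstractObject-Res1); being an over-approximation, it also reports the reverse pair, which a genuine critical pair computation on the actual graphs would confirm or reject.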
3 Analysing and Interpreting Sequential Dependencies
We used AGG’s critical pair analysis algorithm to compute all critical pairs that identify
potential sequential dependencies between the inconsistency detection and resolution
rules. The results are explained below, and summarised in Table 1:
1. By construction, a detection rule will never be causally dependent on itself.
2. In a similar vein, different detection rules are not causally dependent because they
do not essentially modify the graph structure. The only thing they do is add a new
Conflict-node.
3. Every resolution rule for a given model inconsistency will always sequentially depend on the detection rule of the same inconsistency, since the detection rule produces a Conflict-node that is required for the application of the resolution rule.
4. In our current setup, resolution rules for a certain inconsistency will never depend
on detection rules for another inconsistency, because the Conflict-node of each particular inconsistency contains an attribute that specifies the kind of inconsistency
that is being detected or resolved. In the future, we may relax this restriction, in order to identify situations where the same resolution strategy may be used to resolve
multiple model inconsistencies at once.
5. Alternative resolution rules for the same model inconsistency are sometimes sequentially dependent. Often, this is a sign of redundancy between the resolution
strategies, and it may indicate an opportunity for refactoring the resolution strategies to make them more orthogonal. For example, we noticed a dependency from
the second to the third resolution rule of DanglingTypeReference, from the second
to the third resolution of ClasslessInstance, from the fourth to the fifth resolution of
AbstractOperation, and from the sixth to the seventh resolution of AbstractOperation. These four dependencies all boil down to the same underlying problem.
For each resolution rule that adds some link to an existing class, there is a similar resolution rule that first introduces a new class before adding a link to it. Such
redundancies can easily be avoided by restructuring the resolution strategies.
6. As shown in Figure 4, there are also many sequential dependencies between resolution rules belonging to different model inconsistencies. This has two important
consequences. First, it shows that the resolution of a particular model inconsistency
may introduce new and different opportunities for resolving other model inconsistencies. As such, the order of resolution of model inconsistencies may be important. Second, some of the identified sequential dependencies may indicate a lack
of orthogonality between the various resolution strategies. This can be seen clearly
in the mutual dependency between ClasslessInstance-Res1 and AbstractObject-Res3. In both cases, the resolution strategies are exactly the same, even though they
are used to solve a different model inconsistency. Again, our analysis will help us
to remove such redundancies.
7. Sometimes, the detection of a model inconsistency may be triggered by the application of a resolution rule for another model inconsistency of the same case. This
is, however, merely a degenerate case of the more general situation that will be
discussed in the next item.
8. In general, the resolution of a model inconsistency may give rise to the introduction of new model inconsistencies. It implies that the process for resolving model
inconsistencies is an iterative process, similar in spirit to bug fixing: when fixing
one bug, new bugs are found that need to be fixed as well. This idea is clearly
illustrated by the fact that many inconsistency detection rules were found to be
sequentially dependent on inconsistency resolution rules. For example, there was
a sequential dependency from AbstractObject-Res1 to AbstractOperation, implying that the first resolution strategy for the AbstractObject inconsistency may
introduce a model inconsistency of the kind AbstractOperation. Indeed, by applying the resolution rule AbstractObject-Res1, a previously abstract class will
become concrete. If this abstract class happened to have one or more abstract operations (a situation that is completely acceptable), after the transformation all of
these operations will lead to an AbstractOperation inconsistency because a concrete class is not allowed to have abstract operations.
Table 1. Classification of sequential dependencies.
                                                  | for the same inconsistency | between different inconsistencies
detection rule depends on detection rule          | Never (1)                  | Never (2)
resolution rule depends on detection rule         | Always (3)                 | Never (4)
resolution rule depends on other resolution rule  | Sometimes (5)              | Sometimes (6)
detection rule depends on resolution rule         | Sometimes (7)              | Sometimes (8)
4 Related Work
This work deals with a particular aspect of software inconsistency management, namely
the use of sequential dependency analysis to support model inconsistency resolution.
For other related research on inconsistency management in software engineering, we
refer to [6] and the excellent survey by [7].
The approach used in this article relies on the theory of graph transformation. Other
approaches to model inconsistencies that rely on graph transformation theory are explained in [8] and [9]. The latter uses critical pair analysis of graph transformations
Fig. 4. Graph depicting sequential dependencies between distinct inconsistency resolution rules.
to detect conflicting functional requirements in UML models composed of use case
diagrams, activity diagrams and collaboration diagrams. In [10], critical pair analysis
has been used to detect conflicts and dependencies between object-oriented software
refactorings. Other work on critical pair analysis is reported by [11].
[2, 12] propose the use of Description Logics as an alternative formalism to deal
with inconsistencies between UML models. In [3, 13] different resolution strategies are
proposed but the analysis of dependencies between the different strategies remained an
open question.
Finally, in this article we relied on the critical pair analysis algorithm as implemented by the AGG graph transformation tool. An alternative would be to use the Condor tool that relies on conditional transformations and logic programming. In [14] we
compared this tool with AGG and concluded that Condor may be a very interesting
alternative for reasons of efficiency and expressiveness.
5 Conclusion
In this paper we focused on the problem of UML model inconsistency management, and
the ability to improve current tool support for iteratively and incrementally detecting
and resolving model inconsistencies. We reported on an experiment that explores the
use of graph transformation dependency analysis, and critical pair analysis in particular,
for this purpose.
The main contribution of the proposed approach is that it allows for the static analysis of conflicts and dependencies between different alternative resolution strategies for
a wide variety of detected model inconsistencies. A careful analysis of all the dependencies will allow us to optimise and provide better and more automated support for the
inconsistency resolution process, for example by proposing an optimal way to resolve
all detected model inconsistencies.
Nevertheless, a lot of work remains to be done:
– The AGG graph transformation tool needs to be improved in various ways. The critical pair analysis algorithm needs to be made more efficient, and better modularisation mechanisms are needed to deal with multiple sets of graph transformations.
– The redundancy between the proposed inconsistency resolution strategies needs to
be removed. We hope to use our technique of transformation dependency analysis
to propose ways to restructure the resolution rules in a semi-automated way.
– The approach needs to be validated for a wider variety of UML diagrams.
– The approach needs to be integrated into contemporary modeling tools to make
them better suited for the activity of model inconsistency management.
References
1. Lehman, M.M., Ramil, J.F., Wernick, P., Perry, D.E., Turski, W.M.: Metrics and laws of software evolution - the nineties view. In: Proc. Int’l Symp. Software Metrics, IEEE Computer
Society Press (1997) 20–32
2. Van Der Straeten, R., Mens, T., Simmonds, J., Jonckers, V.: Using description logics to
maintain consistency between UML models. In Stevens, P., Whittle, J., Booch, G., eds.:
UML 2003 - The Unified Modeling Language. Volume 2863 of Lecture Notes in Computer
Science., Springer-Verlag (2003) 326–340
3. Van Der Straeten, R.: Inconsistency Management in Model-driven Engineering. An Approach using Description Logics. PhD thesis, Department of Computer Science, Vrije Universiteit Brussel, Belgium (2005)
4. Object Management Group: Unified Modeling Language 2.0 Superstructure Specification.
http://www.omg.org/cgi-bin/apps/doc?formal/05-07-04.pdf (2005)
5. Plump, D.: Hypergraph rewriting: Critical pairs and undecidability of confluence. In: Term
Graph Rewriting. Wiley (1993) 201–214
6. Grundy, J.C., Hosking, J.G., Mugridge, W.B.: Inconsistency management for multiple-view
software development environments. IEEE Transactions on Software Engineering 24 (1998)
960–981
7. Spanoudakis, G., Zisman, A.: Inconsistency management in software engineering: Survey
and open research issues. In: Handbook of Software Engineering and Knowledge Engineering. World scientific (2001) 329–380
8. Ehrig, H., Tsioalikis, A.: Consistency analysis of UML class and sequence diagrams using
attributed graph grammars. In Ehrig, H., Taentzer, G., eds.: ETAPS 2000 workshop on graph
transformation systems. (2000) 77–86
9. Hausmann, J.H., Heckel, R., Taentzer, G.: Detection of conflicting functional requirements
in a use case-driven approach. In: Proc. Int’l Conf. Software Engineering, ACM Press (2002)
10. Mens, T., Taentzer, G., Runge, O.: Analyzing refactoring dependencies using graph transformation. Software and Systems Modeling (2006) To appear.
11. Bottoni, P., Taentzer, G., Schürr, A.: Efficient parsing of visual languages based on critical
pair analysis and contextual layered graph transformation. In: Proc. IEEE Symp. Visual
Languages. (2000)
12. Simmonds, J., Van Der Straeten, R., Jonckers, V., Mens, T.: Maintaining consistency between
UML models using description logic. Série L’objet - logiciel, base de données, réseaux 10
(2004) 231–244
13. Van Der Straeten, R., D’Hondt, M.: Model refactorings through rule-based inconsistency
resolution. In: ACM SAC 2006 - Track on Model Transformation, to be published (2006)
14. Mens, T., Kniesel, G., Runge, O.: Transformation dependency analysis - a comparison of
two approaches. Série L’objet - logiciel, base de données, réseaux (2006)
SAEV: a model to face Evolution Problem in
Software Architecture
Mourad OUSSALAH, Nassima SADOU, Dalila TAMZALIT
Nantes University, Nantes Atlantique Universities
CNRS LINA FRE 2729
2, rue de la Houssiniere BP 92208 44322
Nantes cedex 03, F-44000 France
{Mourad.oussalah, nassima.sadou, dalila.tamzalit}@univ-nantes.fr
Abstract
We propose in this article a model to describe and manage software architecture evolution at a high abstraction level. We define a given software architecture through its architectural elements, which we consider as first-class entities: configurations, components, connectors and their interfaces. In addition, each software architecture is regarded through three abstraction levels: the meta level, the architectural level and the application level. Accordingly, we propose SAEV (Software Architecture EVolution Model): it describes and manages the evolution of a software architecture, and thus of its elements, at these different levels and in a uniform way. Moreover, it manages the evolution independently of any description or implementation language, by distinguishing the evolution of architectural elements from their behaviours. SAEV proposes evolution operations, described by evolution strategies and evolution rules. These rules and strategies must respect all invariants defined on each architectural element, to guarantee the coherence of the considered architecture during its evolution. SAEV also proposes an evolution mechanism, which describes the execution of the evolution.
Key words: Software architecture evolution, evolution strategy,
evolution operation, invariant.
1 Introduction
Software architecture offers a high abstraction level for the description of systems, by defining their architectures in terms of components describing the system functionalities and of
connectors, which express the interactions among these components. This enables developers to disregard the unnecessary details and to focus on the global structure of the system
[14]. Software architecture determines not only how the system should be constructed but
also may guide its evolution [10]. Currently, several Architecture Description Languages
(ADLs) are proposed to aid the architecture-based development, such as C2 [1], ACME [8]
or Rapide [11].
Software systems need to change during their lifetime: original requirements may change to reflect customer needs or changes in the system environment. Evolution is an important mechanism to apply these changes to the system and to guarantee its long lifespan. In spite of important progress on architecture description languages, we note that few approaches or mechanisms have been proposed to face software architecture evolution problems.
For the ADLs that approach this problem, their proposals are limited to a few techniques such as instantiation, subtyping or composition [14].
We aim in this work to enhance software evolution by proposing an evolution model called SAEV (Software Architecture EVolution Model). We are interested in the structural evolution of software architecture. SAEV aims to describe and manage the evolution of software architecture. For this, software architecture elements (such as component, interface, connector and configuration) are considered as first-class entities, and SAEV relies on its own concepts and evolution mechanism. The evolution of each architectural element is managed through evolution strategies and evolution rules. The latter describe all the evolution operations that can be applied to an architectural element.
After this introduction, the remainder of the paper is organised as follows: section 2 presents a brief overview of software architecture through its main concepts and its abstraction levels. Section 3 approaches the software architecture evolution problem, according to the two kinds of evolution (static and dynamic evolution). Section 4 describes the proposed evolution model SAEV, through its concepts and its mechanism. In the last section we present our perspectives before concluding.
2 Software architecture: a brief presentation
The most cited definition of software architecture is given in [11]: software architecture provides a high-level model of the system in terms of the components that do the computation and the connectors that causally connect the components. According to this definition, we present in the following the most important architectural elements commonly used to describe software architecture.
2.1 Presentation of the Architectural Elements
We present the main architectural elements commonly supported by the majority of ADLs ([1], [8], [11], [15], ...). We first present their most accepted definitions in the software architecture community and their metamodel, and then position them according to different abstraction levels (section 2.2).
- Component: represents the computational elements and data stores of a system. It is described by an interface, which exports the services that it provides and the services that it requires, and by one or more implementations. Databases and mathematical functions are examples of components.
- Connector: represents the interaction among components, as well as the rules that control this interaction. A connector can describe a simple interaction, like a procedure call or the access to a shared variable, but it can also describe a complex interaction, like a database access protocol. It is mainly represented by an interface and one or more implementations.
- Interface: the interface is the only visible part of components and connectors. It provides the set of services (provided or required) and the interaction points. The interaction points of a component are called ports (provided or required ports); those of a connector are called roles, and we likewise distinguish provided roles from required ones.
- Configuration: represents a connected graph of components and connectors. It describes how they are fastened to each other. The configuration is also described by an interface, which provides a set of interaction points (provided ports and required ports) and a set of services. We also distinguish two kinds of links used to fasten a configuration's elements (components and connectors):
o Attachment: it expresses which port of a given component is connected to which role of a connector (a provided port must be attached only to a required role, and a required port must be attached only to a provided role).
o Binding: it defines a link between a port of a composite component and those of its subcomponents, between a role of a composite connector and those of its subconnectors, or between the ports of a configuration and those of its components.
Fig. 1. Architectural Elements Meta model
Figure 1 illustrates the presented architectural elements and the relations among them, using a UML [4] class diagram.
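As a reading aid for the metamodel of Figure 1, the following Java sketch renders the elements just described (components with ports, connectors with roles, and a configuration holding attachments) as a minimal object model; it is an illustrative approximation made for this text, not an ADL or the authors' implementation, and the attachment check only encodes the provided/required invariant stated above.

import java.util.*;

// Minimal object model mirroring the architectural elements described above.
class Port {
    final String name; final boolean provided;
    Port(String name, boolean provided) { this.name = name; this.provided = provided; }
}

class Role {
    final String name; final boolean provided;
    Role(String name, boolean provided) { this.name = name; this.provided = provided; }
}

class Component {
    final String name; final List<Port> ports = new ArrayList<>();
    Component(String name) { this.name = name; }
}

class Connector {
    final String name; final List<Role> roles = new ArrayList<>();
    Connector(String name) { this.name = name; }
}

// An attachment fastens a component port to a connector role.
record Attachment(Component component, Port port, Connector connector, Role role) {}

class Configuration {
    final String name;
    final List<Component> components = new ArrayList<>();
    final List<Connector> connectors = new ArrayList<>();
    final List<Attachment> attachments = new ArrayList<>();
    Configuration(String name) { this.name = name; }

    void attach(Component c, Port p, Connector n, Role r) {
        // Invariant from the text: a provided port faces a required role, and vice versa.
        if (p.provided == r.provided)
            throw new IllegalArgumentException("provided must face required");
        attachments.add(new Attachment(c, p, n, r));
    }
}

Under this sketch, the Client/Server architecture of Figure 2 would be a Configuration named CSConf holding the Client, Server and Database components attached to the connectors N1 and N2.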
2.2 Architectural-element abstraction levels
Some of the surveyed ADLs, such as C2 [1], ACME [8] and Rapide [11], consider components and connectors as first-class entities and distinguish the component type and the connector type from their component instances and connector instances. This is not the case for the configuration, which is often defined only at the application level as a graph of component instances and connector instances. In our work, we consider all architectural elements as first-class entities and consider them at three abstraction levels: the Meta level, the Architectural level and the Application level, as illustrated in figure 2. These levels are necessary in order to be able to reify all architectural elements and to manage them, and consequently to be able to manage their evolution.
(i) Meta Level: is the level where each concept of any ADL is described, for example
the concept of: configuration, component, connector, etc.
(ii) Architectural Level: at this level, the architecture of a given system can be described using one or more instances of the concepts of the meta level. Figure 2,
presents a Client/Server architecture with a configuration: CSConf; three components : Client, Server, Database and of two connectors N1 and N2.
(iii) Application Level: at this level one or more applications can be defined in accordance with their architecture defined at the architectural level. For example, from
the previous Client/Server architecture, we can build the following application made
up of: Cf: instance of the configuration CSConf, C1,C2:instances of the component
Client, DBoracle: instance of the component Database, S1: instance of the component
Server, N1-1, N1-2: instances of connector N1 and N2.1: instance of connector N2.
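Reusing the classes of the previous sketch, the following fragment illustrates (under the same naming assumptions) how the Client/Server example could be expressed in code: the classes play the role of the meta level, their instances describe the architectural level, and further copies created from those descriptions stand for the application level.

public class ClientServerExample {
    public static void main(String[] args) {
        // Architectural level: instances of the meta-level concepts.
        Component client = new Component("Client");
        Component server = new Component("Server");
        Component database = new Component("Database");
        Connector n1 = new Connector("N1");
        Connector n2 = new Connector("N2");
        Configuration csConf = new Configuration("CSConf");
        csConf.components.add(client);
        csConf.components.add(server);
        csConf.components.add(database);
        csConf.connectors.add(n1);
        csConf.connectors.add(n2);

        // Application level: concrete instances built in accordance with the architecture,
        // e.g. two Client instances C1 and C2, one Server S1 and one Database DBoracle.
        Configuration cf = instantiate(csConf);
        Component c1 = instantiate(client, "C1");
        Component c2 = instantiate(client, "C2");
        Component s1 = instantiate(server, "S1");
        Component dbOracle = instantiate(database, "DBoracle");
    }

    // Naive "instantiation" by copying the architectural description.
    static Component instantiate(Component type, String instanceName) {
        Component c = new Component(instanceName + ":" + type.name);
        c.ports.addAll(type.ports);
        c.providedServices.addAll(type.providedServices);
        c.requiredServices.addAll(type.requiredServices);
        return c;
    }

    static Configuration instantiate(Configuration type) {
        Configuration conf = new Configuration("Cf:" + type.name);
        conf.components.addAll(type.components);
        conf.connectors.addAll(type.connectors);
        return conf;
    }
}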
After this brief overview of software architecture, we approach in the following section
the evolution problem within software architectures.
Fig. 2. Architectural-Element Abstraction Level
3 Evolution problem in Software Architecture
Like any software artifact, a software architecture may evolve to reflect evolution needs. For
example, we may need to add new components or to modify the connections between these components. We distinguish two categories of evolution: static evolution and dynamic evolution.
3.1 Static evolution
Static evolution is supported by mechanisms proposed by ADLs. These mechanisms are
often influenced by object-oriented evolution mechanisms. We can cite for example instantiation, subtyping, compositionality and refinement. We illustrate in the following
instantiation and subtyping, which are mechanisms offered by the majority of ADLs (ACME
[8], C2 [1], ...).
- Instantiation: software architecture distinguishes between component and connector
types, where component types are abstractions that encapsulate functionalities into reusable
blocks, and connector types are abstractions that encapsulate component communication.
All ADLs consider a component as a type that can be instantiated multiple times in a single
system. Regarding connectors, only ADLs that model connectors as first-class entities support their instantiation.
- Subtyping: subtyping means that an object of one type may safely be substituted where
another type was expected. In software architecture, components and connectors are architectural types, and they are distinguished from the basic types (e.g. integers, characters,
etc.) [14]. ACME supports component and connector subtyping using the extend clause.
C2SADEL [13] is the ADL that has most exploited the subtyping mechanism. It distinguishes
three different component subtyping relationships: interface subtyping, behavior subtyping,
and implementation subtyping. C2SADEL also supports heterogeneous subtyping by combining the three types of subtyping using keywords such as and, or and not (interface and not
behavior, for example).
3.2 Dynamic evolution
Dynamic evolution is a new concern in ADLs. It means that it is possible to introduce modifications into a system during its execution. This is an important characteristic, as some critical
systems cannot be stopped, and their evolution must occur dynamically. Dynamic changes
of architecture may be either planned at architecture specification time or unplanned [13].
In the first case, the changes likely to occur during the execution of the system must be
known initially, so they must be specified in the description of the architecture. Each ADL
proposes its own mechanisms to support planned dynamic evolution. Rapide, for example, supports conditional configuration using its where clause, which enables a form of
architectural rewiring using the link and unlink operators [15]. Unplanned evolution
places no restrictions at architecture specification time on the kinds of allowed changes
[13]. Thus, an ADL supporting this kind of evolution must offer architecture modification
features, which allow the architect to specify changes during system execution. P. Oreizy [18] enumerates the following operations that must be supported by ADLs at runtime:
insertion, removal and rewriting of elements of a given architecture. ArchStudio [18] is an
example of an interactive tool which supports these operations on architectures specified in the C2
style.
Regarding the presented evolution approaches, we can note that the specification of
static evolution is integrated within the architecture specification, and thus depends on the
architecture's ADL. It is also concentrated on the evolution of component and connector types
and not on the other architectural elements. Regarding dynamic evolution, notably
unplanned evolution, proposals remain at the research stage, and what is proposed by
ArchStudio is used only with architectures described in the C2 style. We can conclude that
evolution is not considered as an explicit problem within software architecture. For this reason, we
propose SAEV (Software Architecture EVolution model). SAEV aims to explicitly describe
and manage software architecture evolution.
4 SAEV: Software Architecture EVolution Model
SAEV aims to describe and model all evolutions that can occur in the architecture life cycle,
and at each abstraction level (cf. section 2.2). It also aims to manage the execution and the
impacts of each evolution and to maintain the coherence of the architecture. For these purposes,
SAEV must:
- distinguish and abstract the evolution from the specific behavior of architectural elements.
This allows one to:
• define mechanisms for the description and the management of evolution, independently
of the architectural element description languages and uniformly at each abstraction
level;
• support the re-use of these evolution mechanisms in similar evolution situations.
- identify the impacts of each evolution, and determine the mechanisms to propagate these
impacts.
- support static evolution (at architectural specification time) as well as dynamic evolution
(at application execution time).
To meet these objectives, SAEV proposes the following concepts and mechanisms.
4.1 SAEV's Concepts
To identify the concepts of the SAEV model, we have considered the following questions:
• Q1: What are the architectural elements involved?
• Q2: What are the different evolution operations executed on which architectural element?
Fig. 3. SAEV’s Meta-Model
• Q3: What are the impacts of this evolution and how to manage them?
Regarding these concerns, SAEV offers the concepts of Architectural element to answer Q1,
Evolution operation to answer Q2, and the concepts Evolution strategy, Evolution rule and
Invariant to answer Q3. These concepts are illustrated in the meta-model of figure 3.
(i) Architectural Element: this concept represents any element of a software architecture that may evolve over time. So, at the meta level, an architectural element
can be the concept "Component", at the architectural level it can be the component
"Client", and at the application level it can be the instance "Client X".
(ii) Invariant: it represents the structural constraints of an architectural element, which
must be respected throughout its life cycle, whatever operation is applied to this element. Any change in the architecture must maintain the correctness of this invariant.
At each level we can define different invariants. For example, at the meta level we
define the following structural invariants of the configuration:
I1 - a configuration must be composed of at least one component;
I2 - a provided port of the configuration interface must be connected to at least one provided port of its internal components.
At the architectural level, if we consider for example the Client/Server architecture
(section 2.2), we can define the following invariant: the maximum number of clients
that can be connected to a given server is 54. We note also that invariants defined at
a given level must be respected by the architectural elements of its lower levels. For
example, the architectural elements of the application level must respect the invariants
defined at the meta and architectural levels.
(iii) Evolution Operation: we consider as an evolution operation each operation that can be applied to a given
architectural element and that can cause its evolution, or the evolution of its architecture. Regarding the needs of architecture evolution, we have
identified the following evolution operations: addition, deletion, modification, and substitution. Generally, the evolution of a software architecture is caused by the execution
of one or more evolution operations on its architectural elements. We present in the
following some examples of evolution operations associated with the configuration and
its interface:
- Addition/deletion/substitution of a component or connector;
- Addition/deletion/substitution of a provided port or service.
(iv) Evolution Rule: an evolution rule describes the evolution of a given architectural
element and the impacts of this evolution. Each evolution is invoked by an event,
which represents the evolution need, and by a set of conditions that must be satisfied to
execute this evolution rule.
We use the ECA formalism (Event/Condition/Action) to describe evolution rules.
For that, each evolution rule has:
• an event, which is the evolution invocation;
• one or more conditions, which must be satisfied to execute the action part of the
evolution rule;
• one or more actions, where an action can be:
· an event; in this case, it will trigger another rule and thus represents the propagation
of the impacts;
· an elementary action, which is the invocation of an operation on an architectural
element. We note them: Architectural-element-name.Execute.operation-name (parameters).
SAEV proposes a set of evolution rules, and also allows the designer to create his or her
own evolution rules from scratch or by reusing existing ones (a code sketch after this list
of concepts illustrates one possible encoding of such rules). We present
hereafter an example of an evolution rule, R1: deletion of a component.
Table 1. Example of evolution rule
R1: deletion of Component
Event: delete-comp(Cf: Config, C: comp)
Conditions:
C ∈ comp(Cf), prov-interface(C) is connected to prov-interface(Cf)
∃ NC ⊂ con(Cf) and ∀ N ∈ NC, N is connected to C and N is not shared
Actions:
For N ∈ NC: delete-connector(Cf, N)
For b ∈ bindings(Cf, C): delete-binding(Cf, C, b)
For I ∈ interface-comp(C): delete-interface-comp(Cf, C, I)
C.Execute-delete-component(Cf)
The rule R1 describes the deletion of the component C belonging to the configuration Cf, both given as parameters. The condition part of this rule expresses that the
component C is positioned at the extremity of the configuration Cf (prov-interface(C)
is connected to prov-interface(Cf)). It also expresses that none of the connectors attached to
the component C are shared.
The action part of R1 is composed of three events and one elementary action.
• The first event triggers the connector deletion rule, to delete all connectors
connected to the component C;
• The second event triggers the binding deletion rule, to delete all bindings
between the configuration Cf and the component C;
• The third event triggers the interface-component deletion rule, to delete the
interface of the component C;
• The last one is an elementary action, which invokes the execution of the deletion of the
component C itself.
(v) Evolution strategy: we define an evolution strategy for each architectural element.
It gathers all the evolution rules associated with each operation that can be applied to
this architectural element. These rules can be rules already defined in SAEV or rules
newly defined by the designer.
(vi) Evolution Manager: it is an actor representing the processing system. Its role
is to intercept the events emanating from the designer or from other evolution rules
towards an architectural element. Then it triggers the execution of the corresponding
evolution rules, according to the evolution strategy associated with this architectural
element. We detail this process in the following section.
Fig. 4. SAEV Evolution mechanism
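To make the ECA encoding of evolution rules more tangible, here is a minimal Java sketch of how an event, a rule and its actions could be represented. The type and field names are illustrative assumptions, not SAEV's actual implementation.

import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// An evolution event, e.g. delete-comp(Cf, C).
record EvolutionEvent(String name, Object... parameters) {}

// An action is either a new event (propagating the impact) or an elementary action.
interface EvolutionAction {}
record EventAction(EvolutionEvent event) implements EvolutionAction {}
record ElementaryAction(Consumer<Object[]> operation) implements EvolutionAction {}

// ECA rule: triggered by an event, guarded by a condition, producing actions.
record EvolutionRule(String eventName,
                     Predicate<EvolutionEvent> condition,
                     List<EvolutionAction> actions) {

    boolean matches(EvolutionEvent e) {
        return eventName.equals(e.name()) && condition.test(e);
    }
}

With this encoding, rule R1 would be an EvolutionRule for the event delete-comp whose actions are three EventActions (connector, binding and interface deletion) followed by a single ElementaryAction that performs the deletion of the component itself.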
4.2 Evolution Mechanism
The evolution mechanism describes the execution process that must be followed to carry out
a given evolution. We describe this process using the UML sequence diagram [4] of figure 4 (for
space reasons we do not present the complete sequence diagrams; we illustrate only
the most important sequences). The evolution is triggered automatically by any incoming
evolution event. The evolution manager:
• 1: intercepts this event;
• 2: and 3: selects the evolution strategy associated with the architectural element on
which the event is invoked;
• 4: and 5: selects in this strategy the evolution rules that correspond to the event and
whose conditions are satisfied (for clarity we do not illustrate the sequences which
test the event and evaluate the conditions);
• 6:, 7: and 8: triggers the execution of the action part of the selected rule. Two cases can
arise:
· if the action corresponds to an event, the manager also intercepts it and follows the
same previous steps (1: to 8:);
· 10: if it corresponds to an elementary action, it then triggers its execution.
• At the end of all evolutions, SAEV carries out the verification of the invariants. If inconsistencies are detected, it alerts the designer, who can correct them. In the contrary case,
SAEV will cancel all executed evolutions.
Figure 4 illustrates the execution process of a given evolution without any conflict. This
implies that at step 4 of the process, SAEV selects only one evolution rule. But
other cases can occur; for example, SAEV can select several evolution rules for the same event.
In that case, it must provide all the possible evolutions according to the evolution rules considered,
and the designer will validate one of these evolutions (this is not yet supported but represents
an important perspective of our work).
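The dispatch loop below sketches how an evolution manager could realize steps 1 to 8 and 10 of this process, reusing the hypothetical rule encoding from the previous sketch; invariant verification and cancellation are only indicated by a comment, and for simplicity all events are dispatched against the same element's strategy.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

class EvolutionManager {
    // Evolution strategy: the rules associated with each architectural element (by name).
    private final Map<String, List<EvolutionRule>> strategies;

    EvolutionManager(Map<String, List<EvolutionRule>> strategies) {
        this.strategies = strategies;
    }

    void handle(EvolutionEvent initial, String targetElement) {
        Deque<EvolutionEvent> pending = new ArrayDeque<>();
        pending.add(initial);                                         // step 1: intercept the event
        while (!pending.isEmpty()) {
            EvolutionEvent event = pending.poll();
            List<EvolutionRule> strategy =
                    strategies.getOrDefault(targetElement, List.of()); // steps 2-3: select the strategy
            for (EvolutionRule rule : strategy) {
                if (!rule.matches(event)) continue;                    // steps 4-5: rule and condition check
                for (EvolutionAction action : rule.actions()) {        // steps 6-8: execute the action part
                    if (action instanceof EventAction ea) {
                        pending.add(ea.event());                       // an event triggers further rules
                    } else if (action instanceof ElementaryAction el) {
                        el.operation().accept(event.parameters());     // step 10: elementary action
                    }
                }
            }
        }
        // At the end, SAEV would verify the invariants and either alert the designer
        // or cancel the executed evolutions; that part is omitted in this sketch.
    }
}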
4.3 Some guidelines on SAEV's implementation
We present in this section the steps followed to obtain the prototype of SAEV:
- step 1: description of SAEV as a component-based architecture;
- step 2: transcription of the SAEV architecture into the Java language.
The first step shows an important characteristic of SAEV, which is its reflexivity: SAEV can be used to manage its own evolution, for example by the addition of new
evolution mechanisms or by the modification of its already proposed mechanisms. We have
specified the architecture of SAEV with the ADL UML 2.0 [17]. To obtain the prototype
of SAEV in the Java language, we have defined a mapping between UML 2.0 concepts and
those of the Java language (not presented for space reasons).
5 Conclusion
We proposed in this article a model for software architecture evolution. We consider the
software architecture through three abstraction levels: the meta level, the architectural
level and the application level. SAEV allows the description and the management of
evolution at these different levels in a uniform way. To each architectural element, it
associates an evolution strategy. An evolution strategy gathers all the evolution
rules which describe the operations that can be applied to this element. The evolution
rules must respect the architectural elements' invariants to maintain the architecture in a
coherent state. SAEV meets most of the objectives fixed in section 4. It considers evolution
as an explicit problem in software architecture. It treats this problem independently of
the Architecture Description Language by proposing uniform mechanisms (evolution rules
and evolution strategies) for the evolution of each element. As a perspective of this work,
notably to face dynamic evolution, we will introduce a new evolution operation:
versioning. This operation is important to obtain the history (different copies) of each
architectural element, so that a given version of an architectural element can be treated the
same as any other architectural element. Until now we have outlined only how the impact
of a given evolution is propagated at the same level. We aim to study the impacts of a
given evolution across different levels.
References
[1] R. Allen, R. Douence, D. Garlan: Specifying and Analyzing Dynamic Software Architectures. In Proceedings of the Conference on Fundamental Approaches to Software Engineering, Lisbon, Portugal, March 1994.
[2] L. F. Andrade, J. L. Fiadeiro: Architecture Based Evolution of Software Systems. SFM 2003, Bertinoro, Italy, pp. 148-181.
[3] T. Batista, A. Joolia, G. Coulson: Managing Dynamic Reconfiguration in Component-based Systems. In European Workshop on Software Architecture, pp. 1-17, Pisa, Italy, June 2005.
[4] G. Booch, J. Rumbaugh, and I. Jacobson: The Unified Modeling Language User Guide. Addison-Wesley Professional, Reading, Massachusetts, 1998.
[5] C. Crnkovic, M. Larsson: Challenges of Component-based Development. Journal of Systems and Software, pp. 201-212, 2002.
[6] F. Duclos, J. Estublier and R. Sanlaville: Architectures ouvertes pour l'adaptation des logiciels. Software Engineering review, No. 58, September 2001.
[7] D. Garlan, R. Allen, and J. Ockerbloom: Architectural Mismatch: Why Reuse Is So Hard. IEEE Software, November 1995.
[8] D. Garlan, R. Monroe, D. Wile: ACME: Architectural Description of Component-based Systems. In Gary Leavens and Murali Sitaraman (eds.), Foundations of Component-Based Systems, Cambridge University Press, 2002, pp. 47-68.
[9] R. Land: An Architectural Approach to Software Evolution and Integration. Licentiate thesis, ISBN 91-88834-09-3, Department of Computer Engineering, Mälardalen University, September 19th 2003.
[10] M. Jazayeri: On Architectural Stability and Evolution. In Reliable Software Technologies - Ada-Europe 2002, pp. 13-23.
[11] D. Luckham, M. Augustin, J. Kenny, J. Vera, D. Bryan, W. Mann: Specification and Analysis of System Architectures Using Rapide. IEEE Transactions on Software Engineering, vol. 21, No. 4, April 1995, pp. 336-355.
[12] J. Magee, N. Dulay, S. Eisenbach, J. Kramer: Specifying Distributed Software Architectures. In Proceedings of the Fifth European Software Engineering Conference, Barcelona, Spain, September 1995.
[13] N. Medvidovic, D. S. Rosenblum, and R. N. Taylor: A Language and Environment for Architecture-based Software Development and Evolution. In Proceedings of the 21st International Conference on Software Engineering, pp. 44-53, May 1999.
[14] N. Medvidovic, D. S. Rosenblum: Domains of Concern in Software Architectures and Architecture Description Languages. In Proceedings of the 1997 USENIX Conference on Domain-Specific Languages, October 15-17, Santa Barbara, California.
[15] N. Medvidovic, R. N. Taylor: A Classification and Comparison Framework for Software Architecture Description Languages. IEEE Transactions on Software Engineering, Vol. 26, 2000.
[16] M. Moriconi, X. Qian and R. A. Riemenschneider: Correct Architecture Refinement. IEEE Transactions on Software Engineering, pp. 356-372, April 1995.
[17] OMG: UML 2.0 Infrastructure Specification. Technical Report ptc/03-09-15, Object Management Group (2003).
[18] P. Oreizy: Issues in the Runtime Modification of Software Architectures. Technical Report UCI-ICS 96-35, University of California, Irvine, August 1996.
[19] M. Oussalah: Changes and Versioning in Complex Objects. International Workshop on Principles of Software Evolution, IWPSE 2001, September, pp. 10-11, Vienna University of Technology, Austria.
[20] D. E. Perry, A. L. Wolf: Foundations for the Study of Software Architecture. ACM SIGSOFT Software Engineering Notes, volume 17, pp. 40-52, Oct. 1992.
[21] R. Roshandel, A. van der Hoek, M. Mikic-Rakic, N. Medvidovic: Mae - A System Model and Environment for Managing Architectural Evolution. ACM Transactions on Software Engineering and Methodology, April 2004.
[22] M. Shaw, R. DeLine, D. V. Klein, T. L. Ross, D. M. Young and G. Zelesnik: Abstractions for Software Architecture. IEEE Transactions on Software Engineering, pp. 314-335, April 1995.
[23] A. Smeda, M. Oussalah, T. Khamaci: A Multi-Paradigm Approach to Describe Complex Software Systems. WSEAS Transactions on Computers, Issue 4, Volume 3, October 2003, pp. 936-941.
SmPL: A Domain-Specific Language for
Specifying Collateral Evolutions in Linux Device Drivers
Yoann Padioleau
Julia L. Lawall
Gilles Muller
Ecole des Mines de Nantes
INRIA, LINA
44307 Nantes cedex 3, France
[email protected]
DIKU,
University of Copenhagen
2100 Copenhagen Ø, Denmark
[email protected]
Ecole des Mines de Nantes
INRIA, LINA
44307 Nantes cedex 3, France
[email protected]
Abstract
Collateral evolutions are a pervasive problem in large-scale software development. Such evolutions occur
when an evolution that affects the interface of a
generic library entails modifications, i.e., collateral
evolutions, in all library clients. Performing these
collateral evolutions requires identifying the affected
files and modifying all of the code fragments in these
files that in some way depend on the changed interface.
We have studied the collateral evolution problem in
the context of Linux device drivers. Currently, collateral evolutions in Linux are mostly done manually
using a text editor, or with tools such as sed. The
large number of Linux drivers, however, implies that
these approaches are time-consuming and unreliable,
leading to subtle errors when modifications are not
done consistently.
In this paper, we propose a transformation language, SmPL, to specify collateral evolutions. Because Linux programmers are used to exchanging, reading,
and manipulating program modifications in terms of
patches, we build our language around the idea and
syntax of a patch, extending patches to semantic
patches.
1
Introduction
One major difficulty, and the source of highest cost, in
software development is to manage evolution. Software evolves to add new features, to adapt to new
requirements, and to improve performance, safety, or
the software architecture. Nevertheless, while evolution can provide long-term benefits, it can also introduce short-term difficulties, when the evolution of
one component affects interfaces on which other components rely.
In previous work [14], we have identified the phenomenon of collateral evolution, in which an evolution that affects the interface of a generic library entails modifications, i.e., collateral evolutions, in all
library clients. We have furthermore studied this
phenomenon in the context of Linux device drivers.
Collateral evolution is a significant problem in this
context because device drivers make up over half of
the Linux source code and are highly dependent on
the kernel and driver support libraries for functions
and data structures. From this study, we have identified a taxonomy of the relevant kinds of collateral
evolutions. These include changes in calls to library
functions to add or drop new arguments, changes in
callback functions defined by drivers to add or drop
required parameters, changes in data structures to
add or drop fields, and changes in function usage protocols.
Performing collateral evolutions requires identifying the affected files and modifying all of the code
fragments in these files that somehow depend on the
changes in the interface. Standard techniques include manual search and replace in a text editor,
tools such as grep to find files with relevant properties, and tools such as sed, perl scripts, and emacs
macros to update relevant code fragments. None
of these approaches, however, provides any support
for taking into account the syntax and semantics of
C code. Errors result, such as deleting more lines
of code than intended or overlooking some relevant
code fragments. Furthermore, many collateral evolutions involve control-flow properties, and thus require substantial programming-language expertise to
implement correctly.
In this paper, we propose a declarative transformation language, SmPL, to express precisely and concisely collateral evolutions of Linux device drivers.
Linux programmers are used to exchanging, reading,
and manipulating patch files that provide a record
of previously performed changes. Thus, we base the
syntax of SmPL on the patch file notation. Unlike
traditional patches, which record changes at specific
sites in specific files, SmPL can describe generic transformations that apply to multiple collateral evolution
sites. In particular, transformations are defined in
terms of control-flow graphs rather than abstract syntax trees, and thus follow not the syntax of the C code
but its semantics. We thus refer to the transformation rules expressed using SmPL as semantic patches.
SmPL is a first step in a larger project to develop a
transformation tool, Coccinelle, providing automated
assistance for performing collateral evolutions. This
assistance will comprise the SmPL language for specifying collateral evolutions and a transformation engine for applying them to device driver code. Our
goal is that the transformation process should be robust, and interactive when necessary, to remain able
to assist the driver maintainer in the face of unexpected variations in driver coding style.
The rest of this paper is organized as follows. Section 2 describes a set of collateral evolutions that will
be used as our running example. Section 3 illustrates
how one of these collateral evolutions is expressed using the standard patch notation. Section 4 presents
SmPL in terms of this example. Finally, Sections 5
and 6 present related work and conclusions, respectively.
2
Motivating Example
As a running example, we consider the collateral
evolutions that took place in SCSI drivers in Linux
2.5.71, in each driver's "proc_info" function. Such a
function is exported by a SCSI driver to the SCSI
driver support library via the proc_info field of a
SHT (for SCSI Host Template) structure. Each function prints information about the status of the corresponding device in a format compatible with the
Linux procfs file system.
The collateral evolutions in the proc_info functions
were triggered by the decision that it is undesirable
for drivers to directly use the functions scsi_host_hn_get, to obtain access to a representation of the
device, and scsi_host_put, to give up this access, because any incorrect use of these functions can break
the integrity of associated reference counts [9]. Starting in Linux 2.5.71, these functions were no longer
exported by the SCSI driver support library. To
compensate for this evolution, the proc_info functions
were then passed a representation of the device as
an extra argument. An existing parameter that was
used as the argument of scsi_host_hn_get was also removed.
The collateral evolution in the case of the scsiglue
driver is illustrated in Figure 1. As shown in Figure 1a, in Linux 2.5.70 the function usb_storage_proc_info declares a local variable hostptr (line 7),
representing the device, and contains code to access (line 15), test (lines 16-18), and release (lines
23 and 33) the device value. All of this code is removed in Linux 2.5.71 (Figure 1b).1 Instead, the local variable hostptr becomes a parameter of usb_storage_proc_info, with the same type. Additionally, the hostno parameter of usb_storage_proc_info in Linux 2.5.70 is dropped in Linux 2.5.71. References to hostno are replaced by accesses to the
host_no field of the new hostptr parameter.
This example illustrates the combination of two of
the basic kinds of collateral evolutions identified in
our previous work [14]: (i) the introduction of a new
parameter and the corresponding elimination of computation that this parameter makes redundant, and
(ii) the elimination of a parameter and the introduction of computations to reconstruct its value.
3
The Patch Approach
Traditionally, changes in the Linux operating system
are published using patch files [10]. A patch file is
created by manually performing the change in the
source code, and then running the diff tool on the
old and new versions, with special arguments so that
diff records not only the differences, but also some
position and context information. An entry in the
patch file consists of a header, indicating the name
of the old file preceded by --- and the name of the
new file preceded by +++. The header is followed by
a sequence of regions, each beginning with @@ ...
@@, which specifies the starting line numbers in the
old and new files. A region then contains a sequence
of lines of text, in which lines that are added are
indicated by + in the first column, lines that are removed are indicated by - in the first column, and
lines that provide context information are indicated
by a space in the first column. To apply a patch
file, each mentioned file is visited, and the indicated
lines are added and removed. Normally, a patch file
is applied to a file that is identical to the one used
by the Linux developer to create it. It is possible to
instruct the patch tool to ignore the line numbers
or some of the lines of context, to be able to apply
the patch to a file that is similar but not identical to
1 The conditional on lines 21-25 is removed as well in Linux
2.5.71, but that appears to be related to another evolution,
and thus we have left it in for the purposes of the example.
(a) Linux 2.5.70:

 1 static int usb_storage_proc_info (
 2         char *buffer, char **start, off_t offset,
 3         int length, int hostno, int inout)
 4 {
 5         struct us_data *us;
 6         char *pos = buffer;
 7         struct Scsi_Host *hostptr;
 8         unsigned long f;
 9
10         /* if someone is sending us data, just throw it away */
11         if (inout)
12                 return length;
13
14         /* find our data from the given hostno */
15         hostptr = scsi_host_hn_get(hostno);
16         if (!hostptr) {
17                 return -ESRCH;
18         }
19         us = (struct us_data*)hostptr->hostdata[0];
20
21         /* if we couldn't find it, we return an error */
22         if (!us) {
23                 scsi_host_put(hostptr);
24                 return -ESRCH;
25         }
26
27         /* print the controller name */
28         SPRINTF("   Host scsi%d: usb-storage\n", hostno);
29         /* print product, vendor, and serial number strings */
30         SPRINTF("   Vendor: %s\n", us->vendor);
31         ...
32         /* release the reference count on this host */
33         scsi_host_put(hostptr);
34         ...
35         return length;
36 }

(b) Linux 2.5.71:

 1 static int usb_storage_proc_info (struct Scsi_Host *hostptr,
 2         char *buffer, char **start, off_t offset,
 3         int length, int inout)
 4 {
 5         struct us_data *us;
 6         char *pos = buffer;
 7
 8         unsigned long f;
 9
10         /* if someone is sending us data, just throw it away */
11         if (inout)
12                 return length;
13
14
15
16
17
18
19         us = (struct us_data*)hostptr->hostdata[0];
20
21         /* if we couldn't find it, we return an error */
22         if (!us) {
23
24                 return -ESRCH;
25         }
26
27         /* print the controller name */
28         SPRINTF("   Host scsi%d: usb-storage\n", hostptr->host_no);
29         /* print product, vendor, and serial number strings */
30         SPRINTF("   Vendor: %s\n", us->vendor);
31         ...
32
33
34         ...
35         return length;
36 }

Figure 1: An example of collateral evolution, in drivers/usb/storage/scsiglue.c
the one intended. Nevertheless, because there is no
semantic analysis of either the meaning of the patch
or that of the affected source code, this approach is
error prone. Furthermore, in practice, patches are
quite brittle, and variations in the source code imply
that parts of the patch cannot be applied at all.
Figure 2 shows part of the patch file used to
update the function usb_storage_proc_info from
Linux 2.5.70 to Linux 2.5.71. While this patch may
apply to minor variations of the scsiglue.c file,
it cannot be applied to proc_info functions in other
SCSI drivers, because of the scsiglue-specific names
such as usb_storage_proc_info used in the modified
lines of code. This is unfortunate, because 19 SCSI
driver files in 4 different directories have to be updated in the same way.
--- a/drivers/usb/storage/scsiglue.c Sat Jun 14 12:18:55 2003
+++ b/drivers/usb/storage/scsiglue.c Sat Jun 14 12:18:55 2003
@@ -264,33 +300,21 @@
-static int usb_storage_proc_info (
+static int usb_storage_proc_info (struct Scsi_Host *hostptr,
         char *buffer, char **start, off_t offset,
-        int length, int hostno, int inout)
+        int length, int inout)
 {
         struct us_data *us;
         char *pos = buffer;
-        struct Scsi_Host *hostptr;
         unsigned long f;

         /* if someone is sending us data, just throw it away */
         if (inout)
                 return length;

-        /* find our data from the given hostno */
-        hostptr = scsi_host_hn_get(hostno);
-        if (!hostptr) {
-                return -ESRCH;
-        }
         us = (struct us_data*)hostptr->hostdata[0];

         /* if we couldn't find it, we return an error */
         if (!us) {
-                scsi_host_put(hostptr);
                 return -ESRCH;
         }

         /* print the controller name */
-        SPRINTF("   Host scsi%d: usb-storage\n", hostno);
+        SPRINTF("   Host scsi%d: usb-storage\n", hostptr->host_no);

         /* print product, vendor, and serial number strings */
         SPRINTF("   Vendor: %s\n", us->vendor);
@@ -318,9 +342,6 @@
                 *(pos++) = '\n';
         }

-        /* release the reference count on this host */
-        scsi_host_put(hostptr);

Figure 2: Excerpt of the patch file from Linux 2.5.70 to Linux 2.5.71

4 Expressing Collateral Evolutions as a Semantic Patch
To express collateral evolutions, we propose the SmPL language for specifying semantic patches. A semantic patch is a specification that visually resembles a patch file, but whose application is based on the semantics of the code to be transformed, rather than its syntax. To illustrate our approach, we now present a semantic patch expressing the collateral evolutions described in Section 2. We develop the semantic patch incrementally, by showing successive excerpts that each illustrate a feature of SmPL. In contrast to a patch that applies to only one file, the semantic patch can be applied to all of the files in the Linux source tree, to selected files, or to an individual file, even a file outside the Linux source tree.

4.1 Replacement
Our first excerpt changes the function signature:

  proc_info_func (
+     struct Scsi_Host *hostptr,
      char *buffer, char **start, off_t offset,
      int length,
-     int hostno,
      int inout)

As in a standard patch, the lines beginning with + and - are added and deleted, respectively. The remaining lines describe the modification context. This excerpt is applied throughout a file, and transforms every matching code fragment, regardless of spacing, indentation or comments.
4.2 Metavariables, part 1
The previous rule assumes that the proc_info function has parameters buffer, start, etc. In practice, however, the parameter names vary from one driver to another. To make the rule insensitive to the choice of names, we replace the explicit names by metavariables. These are declared in a section delimited by @@ that appears before each transformation (@@ is used in patch files to indicate the starting line number of the transformed code), as illustrated below:

@@
identifier buffer, start, offset, length, inout, hostno;
fresh identifier hostptr;
@@
  proc_info_func (
+     struct Scsi_Host *hostptr,
      char *buffer, char **start, off_t offset,
      int length,
-     int hostno,
      int inout)

The metavariables buffer, start, offset, length, inout, and hostno are used on lines annotated with - or space, and thus match terms in the original source program. They are declared as identifier, indicating that they match any identifier. The metavariable hostptr represents a parameter that is newly added to the function signature. We thus declare it as a fresh identifier, indicating that some identifier should be chosen that does not conflict with the other identifiers in the program.
A semantic patch may contain multiple regions, each declaring some metavariables and specifying a transformation rule. Once declared, a metavariable obtains its value from matching the transformation rule against the source code. It then keeps its value until it is redeclared.

4.3 Metavariables, part 2
As illustrated in Figure 1, the name of the function to transform is generally not proc_info_func, but is something specific to each driver. Rather than rely on properties of the name chosen, we identify the function in terms of its relation with the SCSI interface. Specifically, the function to modify is one that is stored in the proc_info field of a SHT structure. The following excerpt, placed before the excerpt of Section 4.2, expresses this constraint:
@@
struct SHT sht;
local function proc_info_func;
@@
sht.proc_info = proc_info_func;
The declaration struct SHT sht; indicates that
the metavariable sht represents an expression of type
struct SHT. This type specification avoids ambiguity
when multiple structure types have fields of the same
name. If there is more than one assignment of the
proc info field, the metavariable proc info func is
bound to the set of all possible right-hand sides. Subsequent transformations that use this metavariable
are instantiated for all elements of this set.
4.4
Sequences, part 1
+
struct Scsi_Host *hostptr,
char *buffer, char **start, off_t offset,
int length,
int hostno,
int inout) {
the name of the old local variable to the new parameter. Metavariables are thus similar to logic variables,
The next step is to remove the sequence of statements
in that every occurrence of a metavariable within a
that declare the hostptr local variable and access,
rule refers to the same set of terms. Unlike the logic
test, and release its value. Because these statements
variables of Prolog, however, metavariables are alcan be separated by arbitrary code, as illustrated in
ways bound to ground terms.
Figure 1a (lines 7, 15-18, 23, and 33), we use the
This part of the collateral evolution introduced
operator ..., as follows:
some bugs in the Linux 2.5.71 version. For example,
in two files the hostno parameter was not dropped,
@@
identifier buffer, start, offset, length, inout,
resulting in a function that expected too many arguhostptr, hostno;
ments. The problem was fixed in Linux 2.6.0, which
@@
was released 6 months later.
proc_info_func (
-
...
struct Scsi_Host *hostptr;
...
hostptr = scsi_host_hn_get(hostno);
...
if (!hostptr) { ... }
...
scsi_host_put(hostptr);
...
}
If we compare this rule to Figure 1a, we see that
the declaration, access, and test each appear exactly
once in the source program, as in the rule, but that
scsi host put is called twice, once in line 23 in handling an error, and once in line 33 near the end of the
function. To address this issue, SmPL is control-flow
oriented, rather than abstract-syntax tree-oriented.
Thus, when a transformation includes ..., it is applied to every control flow path between the terms
matching the endpoints, which here are the beginning and end of the function definition. For instance,
in Figure 1a, after the assignment of the variable us,
there are two control flow paths, one that is an error
path (lines 23-24), and another that continues until
the final return (lines 27-35). A call to scsi host put is removed from each of them. Thus, a single line may in practice erase multiple lines of code, one
per control flow path.
Recall that in Section 4.2, we created a fresh identifier as the new parameter hostptr. In fact, when
the collateral evolutions were performed by hand,
the parameter was always given the name of the
deleted Scsi Host-typed local variable. Now that we
have expanded the semantic patch extract to contain
both the parameter and the local variable declaration, we can express this naming strategy by using
the same metavariable, declared as an identifier,
in both cases. This repetition implies that both occurrences refer to the same term, thus transmitting
4.5
Sequences, part 2
Finally, we consider the treatment of references to the
deleted hostno parameter. In each case, the reference
should be replaced by hostptr->host_no. Here we
are not interested in enforcing any particular number
of occurrences of hostno along any given control-flow
path, so we use the operator <... ...> that applies
the transformation everywhere within the matched
region:
@@
@@
  proc_info_func(...) {
    <...
-     hostno
+     hostptr->host_no
    ...>
  }
Note that ... can be used to represent any kind of
sequence. Here, in the first line, it is used to represent
a sequence of parameters.
4.6
Isomorphisms
We have already mentioned that a semantic patch
is insensitive to spacing, indentation and comments.
Moreover, by defining sequences in terms of controlflow paths, we abstract away from the very different
forms of sequencing that exist in C code. These features help make a semantic patch generic, allowing
the patch developer to write only a few scenarios,
while the transformation tool handles other scenarios
that are semantically equivalent.
In fact, these features are a part of a larger set
of semantic equivalences that we refer to as isomorphisms. Other isomorphisms that are relevant to this
example include typedef aliasing (e.g., struct SHT is
commonly referred to as SCSI Host Template), the
various ways of referencing a structure field (e.g.,
exp->field and (*exp).field), and the various ways of
testing for a null pointer (e.g., !hostno and hostno
== NULL). We have identified many more useful isomorphisms, and continue to discover new ones.

@@
struct SHT sht;
local function proc_info_func;
@@
  sht.proc_info = proc_info_func;

@@
identifier buffer, start, offset, length, inout,
           hostptr, hostno;
@@
  proc_info_func (
+       struct Scsi_Host *hostptr,
        char *buffer, char **start, off_t offset,
        int length,
-       int hostno,
        int inout) {
    ...
-   struct Scsi_Host *hostptr;
    ...
-   hostptr = scsi_host_hn_get(hostno);
    ...
?-  if (!hostptr) { ... }
    ...
?-  scsi_host_put(hostptr);
    ...
  }

@@
@@
  proc_info_func(...) {
    <...
-     hostno
+     hostptr->host_no
    ...>
  }

Figure 3: A complete Semantic Patch

4.7 All together now
Figure 3 presents the complete semantic patch that implements the collateral evolutions described in Section 2. This version is augmented as compared to the previous extracts in that the error checking code if (!hostptr) { ... } and the call to scsi_host_put are annotated with ?, indicating that matching these patterns is optional (although removing them if they are matched is obligatory).

4.8 Assessment
A semantic patch describes the evolution primarily in terms of ordinary C code. Among the 62 semantic patches we have written, we have often found it possible to construct a semantic patch by copying and modifying existing driver code. The close relationship to actual driver code should furthermore make it easy for a driver maintainer who wants to apply a semantic patch to understand its intent and the relationship between the various transformed terms.
The "proc_info" semantic patch applies to 19 files in 4 different directories of the Linux source tree. In the standard patch notation, the specification of the required changes amounts to 614 lines of code for the files in the Linux source tree, resulting in on average 32.3 lines per file. The semantic patch is 33 lines of code and applies to all relevant files including those not in the Linux source tree. Because semantic patches are intended to implement collateral evolutions, which are determined by interface changes, and because interface elements are typically used only according to very restricted protocols, we expect that most semantic patches will exhibit a similar degree of reusability.

5 Related work
Influences. The design of SmPL was influenced by a number of sources. Foremost among these is our target domain, the world of Linux device drivers. Linux programmers use patches extensively, have designed various tools around them [11], and use its syntax informally in e-mail to describe software evolutions. This has encouraged us to consider the patch syntax as a valid alternative to classical rewriting systems. Other influences include the Structured Search and Replace (SSR) facility of the IDEA development environment from JetBrains [12], which allows specifying patterns using metavariables and provides some isomorphisms, and the work of De Volder on JQuery [2], which uses Prolog logic variables in a system for browsing source code. Finally, we were inspired to base the semantics of SmPL on control-flow graphs rather than abstract syntax trees by the work of Lacey and de Moor on formally specifying compiler optimizations [7].

Other work. Refactoring is a generic program transformation that reorganizes the structure of a program without changing its semantics [5]. Some of the collateral evolutions in Linux drivers can be seen as refactorings. Refactorings, however, apply to the whole program, requiring accesses to all usage sites of affected definitions. In the case of Linux, however, the entire code base is not available, as many drivers are developed outside the Linux source tree. There is currently no way of expressing or generating the effect of a refactoring on such external code. Other collateral evolutions are very specific to an OS API, and thus cannot be described as part of a generic refactoring [8]. In practice, refactorings are used via a development environment such as Eclipse that only
provides a fixed set of transformations. JunGL is a scripting language that allows programmers to implement new refactorings [17]. This language should be able to express collateral evolutions. Nevertheless, a JunGL transformation rule does not make visually apparent the relationship between transformed source terms, which we have found makes the provided examples difficult to read. Furthermore, the language is in the spirit of ML, which is not part of the standard toolbox of Linux developers.

A number of program transformation frameworks have recently been proposed, targeting industrial-strength languages such as C and Java. CIL [13] and XTC [6] are essentially parsers that provide some support for implementing abstract syntax tree traversals. No program transformation abstractions, such as pattern matching using logic variables, are currently provided. CIL also manages the C source code in terms of a simpler intermediate representation. Rewrite rules must be expressed in terms of this representation rather than in terms of the code found in a relevant driver. Stratego is a domain-specific language for writing program transformations [18]. Convenient pattern-matching and rule management strategies are built in, implying that the programmer can specify what transformations should occur without cluttering the code with the implementation of transformation mechanisms. Nevertheless, only a few program analyses are provided. Any other analyses that are required, such as control-flow analysis, have to be implemented in the Stratego language. In our experience, this leads to rules that are very complex for expressing even simple collateral evolutions.

Coady et al. have used Aspect-Oriented Programming (AOP) to extend OS code with new features [1, 4]. Nevertheless, AOP is targeted towards modularizing concerns rather than integrating them into a monolithic source code. In the case of collateral evolutions, our observations suggest that Linux developers favor approaches that update the source code, resulting in uniformity among driver implementations. For example, on occasion, wrapper functions have been introduced to allow code respecting both old and new versions of an interface to coexist, but these wrapper functions have typically been removed after a few versions, when a concerted effort has been made to update the code to respect the new version of the interface.

The Linux community has recently begun using various tools to better analyze C code. Sparse [15] is a library that, like a compiler front end, provides convenient access to the abstract syntax tree and typing information of a C program. This library has been used to implement some static analyses targeting bug detection, building on annotations added to variable declarations, in the spirit of the familiar static and const. Smatch [16] is a similar project and enables a programmer to write Perl scripts to analyze C code. Both projects were inspired by the work of Engler et al. [3] on automated bug finding in operating systems code. These examples show that the Linux community is open to the use of automated tools to improve code quality, particularly when these tools build on the traditional areas of expertise of Linux developers.

6 Conclusion
In this paper, we have proposed a declarative language, SmPL, for expressing the transformations required in performing collateral evolutions in Linux device drivers. This language is based on the patch syntax familiar to Linux developers, but enables transformations to be expressed in a more general form. The use of isomorphisms in particular allows a concise representation of a transformation that can nevertheless accommodate multiple programming styles. SmPL furthermore addresses all of the elements of the taxonomy of the kinds of collateral evolutions in Linux device drivers identified in our previous work.
We are currently completing a formal specification of the semantics of SmPL, and are exploring avenues for an efficient implementation. In the longer term, we plan to use SmPL to specify the complete set of collateral evolutions required to update drivers from one version of Linux to a subsequent one.
Acknowledgments
This work has been supported in part by the Agence
Nationale de la Recherche (France) and the Danish Research Council for Technology and Production
Sciences. Further information about the Coccinelle
project can be found at the URL:
http://www.emn.fr/x-info/coccinelle/
References
[1] Y. Coady and G. Kiczales. Back to the future: a retroactive study of aspect evolution in operating system code. In Proceedings of the 2nd International Conference on Aspect-Oriented Software Development, AOSD 2003, pages 50-59, Boston, Massachusetts, Mar. 2003.
[2] K. De Volder. JQuery: A generic code browser with a declarative configuration language. In Practical Aspects of Declarative Languages, 8th International Symposium, PADL 2006, pages 88-102, Charleston, SC, Jan. 2006.
[3] D. R. Engler, B. Chelf, A. Chou, and S. Hallem. Checking system rules using system-specific, programmer-written compiler extensions. In Proceedings of the Fourth USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 1-16, San Diego, CA, Oct. 2000.
[4] M. Fiuczynski, R. Grimm, Y. Coady, and D. Walker. Patch (1) considered harmful. In 10th Workshop on Hot Topics in Operating Systems (HotOS X), Santa Fe, NM, June 2005.
[5] M. Fowler. Refactoring: Improving the Design of Existing Code. Addison Wesley, 1999.
[6] R. Grimm. XTC: Making C safely extensible. In Workshop on Domain-Specific Languages for Numerical Optimization, Argonne National Laboratory, Aug. 2004.
[7] D. Lacey and O. de Moor. Imperative program transformation by rewriting. In R. Wilhelm, editor, Compiler Construction, 10th International Conference, CC 2001, number 2027 in Lecture Notes in Computer Science, pages 52-68, Genova, Italy, Apr. 2001.
[8] J. L. Lawall, G. Muller, and R. Urunuela. Tarantula: Killing driver bugs before they hatch. In The 4th AOSD Workshop on Aspects, Components, and Patterns for Infrastructure Software (ACP4IS), pages 13-18, Chicago, IL, Mar. 2005.
[9] LWN. ChangeLog for Linux 2.5.71, 2003. http://lwn.net/Articles/36311/.
[10] D. MacKenzie, P. Eggert, and R. Stallman. Comparing and Merging Files With Gnu Diff and Patch. Network Theory Ltd, Jan. 2003. Unified Format section, http://www.gnu.org/software/diffutils/manual/html_node/Unified-Format.html.
[11] A. Morton. Patch-scripts, Oct. 2002. http://www.zip.com.au/~akpm/linux/patches/.
[12] M. Mossienko. Structural search and replace: What, why, and how-to. OnBoard Magazine, 2004. http://www.onboard.jetbrains.com/is1/articles/04/10/ssr/.
[13] G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer. CIL: Intermediate language and tools for analysis and transformation of C programs. In Compiler Construction, 11th International Conference, CC 2002, number 2304 in Lecture Notes in Computer Science, pages 213-228, Grenoble, France, Apr. 2002.
[14] Y. Padioleau, J. L. Lawall, and G. Muller. Understanding collateral evolution in Linux device drivers. In The first ACM SIGOPS EuroSys conference (EuroSys 2006), Leuven, Belgium, Apr. 2006. To appear.
[15] D. Searls. Sparse, Linus & the Lunatics, Nov. 2004. http://www.linuxjournal.com/article/7272.
[16] The Kernel Janitors. Smatch, the source matcher, June 2002. http://smatch.sourceforge.net.
[17] M. Verbaere, R. Ettinger, and O. de Moor. JunGL: a scripting language for refactoring. In International Conference on Software Engineering (ICSE), Shanghai, China, May 2006.
[18] E. Visser. Program transformation with Stratego/XT: Rules, strategies, tools, and systems in StrategoXT-0.9. In C. Lengauer et al., editors, Domain-Specific Program Generation, volume 3016 of Lecture Notes in Computer Science. Springer-Verlag, 2004.
Versioning Persistence For Objects
Frédéric Pluquet
∗
Roel Wuyts
Université Libre de Bruxelles
Bruxelles, Belgique
Université Libre de Bruxelles
Bruxelles, Belgique
[email protected]
[email protected]
∗ http://www.ulb.ac.be/di/fpluquet/
† http://homepages.ulb.ac.be/~rowuyts/

ABSTRACT
In the literature the word "persistence" has different meanings. It is either used to indicate the storage of the state of objects, or in the context of versioning of objects. First, this paper establishes the separation of these two concepts and explains the goals of both. The second kind of persistence appears very interesting to us. In the rest of the paper we try to find all types of applications that, written in a language that would include a persistent system, benefit from the new features. For the moment the idea of a new kind of debugger is promising to exploit all resources of this mechanism. We finish this article with some ideas for the continuation of this research.

1. INTRODUCTION
The notion of persistence is used in several places in existing literature, and seems to be a well understood notion. However, when investigating in some detail we see that different semantics are attached to this word in different contexts. One usage of the word persistence is used to describe the storing of objects, and we therefore dubbed it storable persistence. The second usage of the word deals with the ability to deal with versioning of objects, and we named it versioning persistence. Before the rest of the paper looks in detail at versioning persistence, and what we can do with it, we first have a more detailed look at these two forms of persistence.

• storable persistence. The first usage of persistence indicates the possibility to store objects and to reload them afterwards, and is the most frequent occurrence. This kind of persistence is used to save the state of objects (i.e. all values of attributes of objects at a given instant) in a physical medium (files, databases, ...). The majority of programs that use storable persistence somehow serialise the objects to some representation (proprietary formats, XML descriptions, database storage, ...). The goal is to make sure that the data in the objects is not lost. Typical is also that the data to be stored is big, and cannot be contained in memory. For example, an insurance company will keep all the data of their clients stored in some databases, and applications that need data will load it from this database, modify it, and write it back. Java Data Objects (JDO) [1] and Smalltalk Images are examples of applications using storable persistence.

• versioning persistence. The second usage of persistence indicates an ability to revert to previous versions of objects, where a version of an object is simply a saved state of an object, possibly including metadata like timestamps or even version numbers. This means that versions of objects can be saved, and that versions can be retrieved easily later on. Versioning persistence is used in a variety of areas (computational geometry, text and file editing, ...) [2]. The research in this area has, up until now, primarily been done in the field of algorithmics.

The goals of this article are twofold. The first one is the separation of the two concepts of persistence that we identified. The second one is to study versioning persistence in some more detail, introducing possible applications.
2.
STORABLE PERSISTENCE
If we search for the word "persistence" in the literature, nine out of ten times we will find documents that talk about storing objects in a physical system. The purpose of storable persistence in an object-oriented context is very simple: save and load objects when necessary.
This simple principle gives rise to very hard problems regarding efficient reading and writing of objects. Therefore lots of different approaches exist for this problem: either using proprietary file formats, using different database systems (relational, relational object-oriented, object-oriented or temporal object-oriented databases, to name a few), creating optimized data structures to store data efficiently in databases, or using techniques such as object faulting [3, 4, 5] or automatic prefetching in database queries [6] to minimize the loading of objects from a store. Describing all of these techniques is beyond the scope of this paper.
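As a minimal illustration of storable persistence, the sketch below saves and reloads the state of an object using Java's built-in serialization; real systems would more likely use one of the database-oriented approaches or a framework such as JDO mentioned above. The class and file names are illustrative only.

import java.io.*;

public class StorableExample {
    // A plain data object whose state we want to keep beyond the program's lifetime.
    static class Client implements Serializable {
        String name;
        double balance;
        Client(String name, double balance) { this.name = name; this.balance = balance; }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        File store = new File("client.ser");

        // Save: serialise the object's state to a physical medium.
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(store))) {
            out.writeObject(new Client("Alice", 1000.0));
        }

        // Load: restore the state later (possibly in another run of the application).
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(store))) {
            Client restored = (Client) in.readObject();
            System.out.println(restored.name + " " + restored.balance);
        }
    }
}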
The applications using storable persistence can be divided into two kinds:
1. The first kind adds this persistence to keep ephemeral objects in a backup system. A copy of the live objects on a physical support is there to restore the system, or a part of the system, after a crash. The data manipulated by these applications is often very important, and a loss of data could mean a big loss of revenue.
Figure 1: Legend of symbols used in the graphs (object life, back to history, access to another object, merge)
2. The second kind uses storable persistence to store, in secondary memory, data that is too big to fit in main memory. In this case, swaps between secondary and main memory are necessary.
3. VERSIONING PERSISTENCE
The purpose of versioning persistence is simply to keep versions of objects, such that particular versions of objects can be manipulated on demand.
Several data structures have been developed to deal with versioning persistence. These data structures constrain what can be done with older versions of objects in order to optimize space and/or execution time. There are three well-known techniques in existence today: partial, total and confluent persistence [2, 7]. Note that describing the algorithms for these different kinds of versioning persistence is outside the scope of this paper.
• Partial persistence An object is partially persistent
if all versions can be accessed but only the newest version can be modified (i.e. create a new version from
the newest version) [2].
• Total persistence Unlike partial persistence, total persistence allows one to read and modify any version of an object [2]. Modifying an old version A, which already had a newer version B, creates a new version C. The version B is still accessible, and therefore the version A will then have two versions following it (B and C). Figure 2 shows an object with two concurrent versions (the legend is described in Figure 1).
We define the branches of a version to be the different versions that emerged from that version. We furthermore define concurrent versions to be the versions contained in different branches that emerged from the same version. It is important to understand that in a totally persistent context zero, one or more versions can exist after each version.
• Confluent persistence Confluent persistence is like total persistence, but adds one more feature: the capacity to merge two concurrent versions of an object. A merge of two versions results in a single new version. This type of persistence was introduced by Amos Fiat and Haim Kaplan [7].

We would like to stress that although they have different goals and usages, storable and versioning persistence are not incompatible. They can work together: storable persistence can offer a means of storage for the large number of versions that versioning persistence needs to keep.

Figure 2: Working on a past state and creating a new history of the same object
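To make these three flavours concrete, the following small Java sketch is our own illustration (the class and method names are hypothetical and do not come from the paper or from [2, 7]): versions form a tree, any version can be read, deriving from an old version creates a branch (total persistence), and two concurrent versions can be merged into a new one (confluent persistence); a partially persistent variant would simply restrict derivation to the newest version.

import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Minimal sketch of a confluently persistent value: every saved state is a
// Version node; branching and merging only add nodes, nothing is overwritten.
final class Version<T> {
    final T state;                                      // saved state of the object
    final long timestamp = System.currentTimeMillis();  // example of version metadata
    final List<Version<T>> parents;                     // one parent, or two after a merge
    final List<Version<T>> children = new ArrayList<>();

    Version(T state, List<Version<T>> parents) {
        this.state = state;
        this.parents = parents;
        for (Version<T> p : parents) p.children.add(this);
    }

    // Total persistence: any version, not only the newest one, may be "modified",
    // which really means deriving a new child version (i.e. a new branch).
    Version<T> derive(T newState) {
        return new Version<>(newState, List.of(this));
    }

    // Confluent persistence: two concurrent versions are merged into a single new
    // version; how the two states are reconciled is left to the caller.
    Version<T> merge(Version<T> other, BinaryOperator<T> reconcile) {
        return new Version<>(reconcile.apply(this.state, other.state), List.of(this, other));
    }
}

class VersionTreeDemo {
    public static void main(String[] args) {
        Version<String> a = new Version<>("a", List.of());
        Version<String> b = a.derive("b");                       // normal evolution
        Version<String> c = a.derive("c");                       // branch from the old version a
        Version<String> d = b.merge(c, (x, y) -> x + "+" + y);   // merge of two concurrent versions
        System.out.println(d.state);                             // prints "b+c"
    }
}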
4. CATEGORIZING VERSIONING PERSISTENCE USAGES
Versioning persistence allows one to retrieve and manipulate previous versions of objects. As described in the previous section, different algorithms have been devised, with partial and total persistence offering less functionality than confluent persistence. For the rest of the paper we assume we are using versioning persistence with a confluent versioning system. This choice is motivated by the applications we see for this type of persistence, which we detail in the next section.
This section categorises the basic ways versioning persistence can be used: either for introspection or for reversion. Note that throughout this section we illustrate these different forms of persistence with a number of figures. Time in all of these figures flows from left to right.
4.1 Introspection
Using versioning persistence for introspection allows a particular version of an object to query previous versions of this object. We see two kinds of queries that can be useful: queries on the past of a single object and queries on the past of a number of objects.
The basic query of introspective persistence is the following: which are the previous values of a given variable of a given object? It is illustrated in Figure 3.
Figure 3: Introspective Persistence: querying past versions of an object

When considering multiple objects, we can relate the state of different objects and pose a number of different queries:

1. A first query relates the value of a variable of one object with the value of a variable of a second object at the same point in time (see Figure 4). In the same vein, it can be interesting to know the state of an object before the last change of another object.

2. A second question is similar to the basic query, but for an object with concurrent versions. The question stays the same, but an interesting case is the following: the extreme versions of an object, i.e. the last versions of each branch, can be inspected by another object. This object can extract information about these two versions, for example to calculate metrics (see the illustration in Figure 5).

Figure 4: Comparing objects at different versions

Figure 5: An object compares two concurrent versions of another object

Figure 6: Two versions of an object are merged to give a new version

Note that we have not explicitly included information about time in the queries. This could be included in the same way as including versions, and we could then pose queries like: What was the state of this variable ten minutes ago? or How many minutes have passed since the last change of state of this variable?
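As an illustration of what such introspective queries could look like at the language level, here is a hedged Java sketch; the FieldHistory API below is hypothetical and only mimics the single-object and time-based queries discussed above.

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical introspection API: every recorded state of a field is kept
// together with a timestamp, so past values can be queried by position or by time.
final class FieldHistory<V> {
    private final List<Instant> times = new ArrayList<>();
    private final List<V> values = new ArrayList<>();

    void record(V value) {                  // called whenever the field changes
        times.add(Instant.now());
        values.add(value);
    }

    // Basic query: which are the previous values of this field?
    List<V> previousValues() {
        return List.copyOf(values);
    }

    // Timed variant: what was the value of this field a given duration ago?
    V valueAt(Duration ago) {
        Instant cutoff = Instant.now().minus(ago);
        V result = null;                    // null if nothing had been recorded yet
        for (int i = 0; i < times.size(); i++) {
            if (!times.get(i).isAfter(cutoff)) result = values.get(i);
        }
        return result;
    }
}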
4.2 Reversion
Using versioning persistence for reversion allows one to revert to a previous version of an object and start modifying it (hence branching versions), as in Fig. 2, or to merge different versions (Fig. 6).
We have not yet studied how the objects will be merged, but we plan to look at object reconciliation [8] and see whether we can apply it in our context.
5. VERSIONING PERSISTENCE AT WORK
This section investigates a number of applications that we could build if we had an object-oriented language that supported versioning persistence. It therefore motivates our research for such a language, and indicates the systems we would like to build to validate versioning persistence.
It is very important to remark that the applications should be able to use versioning persistence without implementing it: the proposed layer will be fully tested and optimised. Developers can therefore work without worrying about persistence and include it when needed during development in an easy and safe way.
A first type of application that can benefit from versioning persistence are applications where the history of objects is used as an add-on to the main application to improve its functionality. For example, text editors (including all integrated development environments (IDEs) and so on) use the history of text edits to offer the Undo functionality. This kind of application could be built without the Undo aspect, which could be added later on by falling back on versioning persistence.
We can even go one step further, and have applications that can revert to the history of an object and branch off new modifications of this previous state. This is typically not possible in most current applications: most of the time, in an application that allows one to undo changes or go backwards in time, the old branch is simply replaced by the newest one. As with the first type of application, using versioning persistence could add the possibility of working with graphs of objects to applications that currently have no or limited undo facilities.
A second kind of applications are those that need to compare versions of objects. In this category we find all version control servers (like CVS, SVN, ...), whose goal is to save different versions of the same object and to offer the possibility to compare two versions of that object, merge branches, etc. This is a straightforward application of versioning persistence, and their implementation would be much simpler with a versioning persistent language.
A third example is a new generation of debuggers. A debugger based on versioning persistence can compare the states of an object obtained by sending the same chain of messages to this object but with different parameters. The user can then exploit the information given by the debugger, inspecting other states of this object, to take decisions about the application of a message. We have already implemented a first and very promising version of a debugger that provides such functionality. It uses a logic programming language to query execution trace information (as published in [9]), but it could benefit from proper versioning persistence instead of the ad hoc deep copying of objects that is used now.
A fourth application was found after reading an article written by Mark Johnson [10]: the versioning of objects to maintain serialization compatibility with JavaBeans. The goal of this system is to maintain compatibility between objects and versions of a program: objects written for version x must be interpretable in a version y, older than x. The proposed solution is to modify the class responsible for the serialization of objects so that it accepts “compatible” changes (like adding fields, changing the privacy of a field, ...). With a versioning persistent language, another solution is to have as many versions of the reader document class as there are versions of the application. When an object must be read, the first step is to determine the version that was used to write it. The second step is to use the compatible version of the reader document class to correctly read the object.
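A hedged sketch of this alternative is given below (the registry and its names are ours, not Johnson's or the paper's): one reader is registered per format version, and the version an object was written with selects the reader used to rebuild it.

import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: keep one reader per application version and pick the
// matching one when an object written by another version must be read.
final class VersionedReaderRegistry<T> {
    private final Map<Integer, Function<byte[], T>> readers = new HashMap<>();

    void register(int formatVersion, Function<byte[], T> reader) {
        readers.put(formatVersion, reader);
    }

    T read(int writtenWithVersion, byte[] data) {
        // Step 1: the version used to write the object has been determined by the caller.
        // Step 2: delegate to the compatible version of the reader.
        Function<byte[], T> reader = readers.get(writtenWithVersion);
        if (reader == null) {
            throw new IllegalArgumentException("No reader for version " + writtenWithVersion);
        }
        return reader.apply(data);
    }
}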
Last but not least we plan to use versioning persistence in
our research on logic meta programming. The logic meta
programming language we use, Soul, is a logic programming language living in symbiosis with its host language,
Smalltalk. More specifically, it allows one to execute object-oriented code during logic inference [11]. What is currently not handled very well is the mismatch between side effects in the object-oriented code and backtracking. When using
versioning persistence, we could map backtracking during
logic execution to reverting to previous versions of objects
in the object oriented part.
6. CONCLUSION AND FUTURE WORK
This paper introduces storable and versioning persistence, two kinds of persistence that are currently amalgamated in the literature. It then looks in more detail at versioning persistence, differentiating between introspective usage (which allows queries on the past states of objects) and reversion usage (which allows one to revert to a previous version of an object and create branches). Language support (or at least a proper framework) can be used in a number of different applications. We enumerate some very straightforward applications (undo functionality and CVS-like repositories), and some more novel ones (advanced debuggers and new logic meta programming languages).
The future work is obviously to validate the ideas put forward by this paper. Therefore we are working on an implementation of versioning persistence in an object-oriented language. The goals of this implementation are multiple: experiment with the cost of a memory-intensive mechanism like persistence in an object-oriented language, study the possibilities of queries, and implement several techniques to store versions of objects. Let's get to work!
7. REFERENCES
[1] David Jordan and Craig Russell. Java Data Objects.
O’Reilly Media, Inc., 2003.
[2] James R. Driscoll, Neil Sarnak, and Daniel D. Sleator.
Making data structures persistent. Journal of
Computer and System Sciences, pages 86–124, 1986.
[3] Antony L. Hosking and J. Eliot B. Moss. Towards
compile-time optimizations for persistence. In Alan
Dearle, Gail M. Shaw, and Stanley B. Zdonik, editors,
Implementing Persistent Object Bases, Principles and
Practice, Proceedings of the Fourth International
Workshop on Persistent Objects, 23-27 September
1990, Martha’s Vineyard, MA, USA, pages 17–27.
Morgan Kaufmann, 1990.
[4] Antony L. Hosking, Eric W. Brown, and J. Eliot B.
Moss. Update logging for persistent programming
languages: A comparative performance evaluation. In
Rakesh Agrawal, Seán Baker, and David A. Bell,
editors, 19th International Conference on Very Large
Data Bases, August 24-27, 1993, Dublin, Ireland,
Proceedings, pages 429–440. Morgan Kaufmann, 1993.
[5] J. Eliot, B. Moss, and A. L. Hosking. Expressing
object residency optimizations using pointer type
annotations. In M. Atkinson, D. Maier, and
V. Benzaken, editors, Persistent Object Systems, pages
3–15. Springer, Berlin, Heidelberg, 1994.
[6] Ali Ibrahim and William R. Cook. Automatic
prefetching by traversal profiling in object persistence
architectures. In Proceedings 20th European
Conference on Object-Oriented Programming
(ECOOP 06), pages ??–??, 2006.
[7] Amos Fiat and Haim Kaplan. Making data structures
confluently persistent. In J. Algorithms, pages 16–58,
2003.
[8] Marc Shapiro, Philippe Gautron, and Laurence
Mosseri. Persistence and migration for C++ objects.
In Stephen Cook, editor, ECOOP’89, Proc. of the
Third European Conf. on Object-Oriented
Programming, British Computer Society Workshop
Series, pages 191–204, Nottingham (GB), July 1989.
The British Computer Society, Cambridge University
Society.
[9] Stéphane Ducasse, Tudor Gı̂rba, and Roel Wuyts.
Object-oriented legacy system trace-based logic
testing. In Proceedings 10th European Conference on
Software Maintenance and Reengineering (CSMR
2006), pages ??–??. IEEE Computer Society Press,
2006.
[10] Mark Johnson. It's in the contract! Object versions
for JavaBeans, March 1998.
http://www.javaworld.com/javaworld/jw-03-1998/jw03-beans.html.
[11] Kris Gybels, Roel Wuyts, Stéphane Ducasse, and
Maja D’Hondt. Inter-language reflection - a
conceptual model and its implementation. Journal of
Computer Languages, Systems and Structures,
32(2-3):109–124, jul 2006.
http://prog.vub.ac.be/Publications/2005/vub-prog-tr05-13.pdf.
Change-based Software Evolution
Romain Robbes and Michele Lanza
Faculty of Informatics
University of Lugano, Switzerland
Abstract
Software evolution research is limited by the amount of
information available to researchers: Current version control tools do not store all the information generated by developers. They do not record every intermediate version of
the system issued, but only snapshots taken when a developer commits source code into the repository. Additionally,
most software evolution analysis tools are not a part of the
day-to-day programming activities, because analysis tools
are resource intensive and not integrated in development
environments. We propose to model development information as change operations that we retrieve directly from the
programming environment the developer is using, while he
is effecting changes to the system. This accurate and incremental information opens new ways for both developers and
researchers to explore and evolve complex systems.
1. Introduction
The goal of software evolution research is to use the history of a software system to analyse its present state and
to predict its future development [11, 5]. Such an analysis needs a lot of information about a system to give accurate insights about its history. Traditionally, researchers extract their data from versioning systems, as their repositories contain the artifacts the developers produce and modify.
We argue that the information stored in versioning systems is not complete enough to perform higher quality evolution research. Since the past evolution of a software system is not a primary concern for most developers, it is not
an important requirement when designing versioning systems. They favor features such as language independence,
distribution and advanced merging capacities.
We need to prove to developers that results in software
evolution research are immediately useful to them by improving the integration of our tools in their day-to-day processes. Most tools are tailored for use “after the fact”,
once the main development is over and before a new feature is added. A common approach is to download several
versions from a repository and to process them all at once.
With this approach, incremental processing is limited, and computations are long and resource-intensive. We need to provide more incremental, lightweight approaches that developers can use in their work.
This paper presents our approach to tackle both problems
of accurate information retrieval and developer use of evolution tools. We believe the most accurate source of information is the Integrated Development Environment (IDE)
the developers are using. By hooking our tools into an IDE,
we can capture evolution information as it happens, treat
it in an incremental manner, and interact with the environment to improve the usability of our tools. Our approach is
based on a model of the changes developers are applying
to the system and hence treats changes as first-class entities. In that sense, we do not make a distinction between the
system and the changes that are performed on it, i.e., software engineering is part of software evolution.
Structure of the paper: Section 2 expands on the nature
and consequences of the problems we presented. Section 3
introduces our alternative approach. Section 4 describes the
implementation status of SpyWare, our prototype. Section
5 describes how such a model can be used in practice and
how problems are stated and solved differently in an incremental, change-based world. Section 6 concludes the paper.
2. Current Approaches to Software Evolution
To perform efficient evolution research, accurate data
about the system under study is required. Despite this need,
the tools the community uses to gather the data do not provide such accurate information. At the core of most data recovery strategies is the versioning system used by the developers of the system.
The main criterion in choosing a versioning system to extract data from is how many systems it versions, especially open-source ones, whose developers allow free access to their repositories. The largest open-source software systems (Mozilla, Apache, KDE, etc.) use either CVS or Subversion; researchers therefore write their tools to gather data from these repositories.
2.1. Limitations in Information Gathering
In a previous study [12] we showed that most versioning
systems in use today (including CVS and Subversion) are
indeed losing a lot of information about the system they version. We identified two main, orthogonal, reasons: most versioning systems are (1) file-based, and (2) snapshot-based.
File-based systems. Most of these systems still function
at the file level, as this guarantees language independence.
On the other hand, it involves extra work to raise the level
of abstraction to the programming language used by the system [13, 9], because the collected information is obfuscated:
• The semantic information about a system is scattered across a large number of text files: there is no built-in central repository of the program structure; it has to be created manually.
• Keeping track of a program-level (not text-level) entity among several versions of the system is hard since
it involves parsing several versions of the entire system
while taking into account events such as renames of
files and entities due to refactorings. Hence some analyses are performed on data which has been sampled
[7, 8]: only a subset of the versions is selected because of time and space constraints. This increases the changes between successive versions, and makes it harder to link entities across versions since the probability that they have changed is higher. Other analyses do without parsing the files altogether, basing themselves on
coarser-grained information such as number of lines
or size of directories [4, 6].
Snapshot-based systems. Changes between successive
versions of the software are stored on explicit requests
(called commits) by the developer. The time between two
developer commits varies widely, but is often on the order
of several hours or days. What happens between two commits is never stored in the versioning system, and we have
to deal with degraded information:
• Since commits are done at the developer's will, several independent fixes or feature additions can be introduced in one single commit, making it hard to differentiate them.
• The time information of each change is reduced to the
time when a commit has been performed: beyond the
task of extracting the differences between two versions
of the system, all information about the exact sequence
of changes which led to these differences is lost.
2.2. Practical Impacts of Information Loss
Example. The example in Figure 1 shows how this loss of information can significantly degrade the knowledge we get of a system.

Figure 1. Simple refactoring scenario leading to evolution information loss: successive versions of a class Foo produced by the Extract Method, Create Accessors, and Rename Method refactorings, contrasted with the lines changed between commits as recorded by the versioning system.

In this simple scenario a developer starts a
short refactoring session, in which he refactors the method
doFoo. He (1) extracts a block of statements in a new
method bar, (2) replaces direct accesses to instance variables x and y with accessors throughout the entire system,
and (3) renames doFoo to baz, replacing all references to
doFoo in the code base.
He then commits these changes. This is a very small
commit, less than a minute of work, since in current IDEs all
these refactoring operations can be semi-automated. Commits usually imply larger change sets than this simple example. According to the information gathered from the versioning system, the following physical changes happened:
• The method doFoo changed name and is now significantly shorter. This makes it hard to detect if the new
method baz is really the same entity that doFoo was.
A simple analysis would conclude that method doFoo
disappeared.
• There are several new methods: bar, baz, and accessor
methods getX, getY, setX, setY.
• Several methods had their implementation modified
because of the rename of doFoo and the introduction
of accessors, possibly scattered among several files of
the entire codebase.
In this example, only refactorings – by definition behavior-preserving [3] – have been performed. The logical changes to the system are trivial, yet this commit caused many physical changes: its importance measured in lines of code is exaggerated. Without a sophisticated, time-consuming analysis [13], some entities such as doFoo are lost, even if they still exist in the code base. On the other hand, using such a time-consuming analysis makes it harder to integrate our tools in day-to-day activities.
Moreover, the simple scenario depicted above assumes
that a developer commits after every couple of minutes of
work. In reality, it is more on the order of hours. The change
amount would be greater, and changes would be even more
diluted and less recoverable.
2.3. The Lack of Integration
The way we collect our information has shaped our tools
to function likewise. The typical procedure to fetch information out of a version repository is to (1) download a set
of versions from the repository, (2) build a program representation for each of the versions, and (3) attempt to link
successive versions of entities.
This approach is clearly only suited for an off-line activity, because even if sampling is used it is time-consuming
(hours or days to complete on a large-scale system). Currently, forward and reverse engineering are two very distinct, separate activities. When applied in practice, reverse
engineering is performed by specialized consultants acting
on unknown systems under time constraints.
To better accommodate developers, software evolution tools need to be incremental in nature and easily accessible from IDEs. Tools need to focus on smaller-scale
changes, when developers are working on smaller parts
of the system, as well as providing a “big picture” view
of the system to external people such as project managers.
All these necessities become even more important with
the advent of agile methodologies such as extreme programming (whose motto is “embrace change” [1]), which advocate continuous refactorings and changes in the code base.
2.4. Ideas Behind our Approach
Our approach, presented in the next sections, stems from
the following observations:
• Versioning systems are not a good source to retrieve information, as they store changes at the file level. They
also store changes at commit time, yielding too coarse-grained changes.
• More and more developers are nowadays using IDEs,
featuring a wealth of information and tools, making
development more effective and increasing the change
rates of systems.
• For evolution tools to gain acceptance, they must (1)
adapt to this increase of the rate of change, (2) be used
by the developers themselves as part of their day-to-day activities, (3) be able to focus on small-scale as
well as large-scale entities, and (4) support incremental updates of information, as day-long information retrieval phases are a serious flaw for daily usage.
3. An Alternative Approach to Evolution
Our approach is based on two concepts: (1) an IDE integration to record as much information as possible and to
allow easy access to our tools, and (2) a model based on
first-class change operations to better match the incremental process of developing software.
3.1. Using the IDE as Information Source
Most programmers use IDEs for their day-to-day tasks,
because they are powerful tools featuring support for semiautomatic refactoring, incremental compilation, unit testing, advanced debugging, source control integration, quick
browsing of the system, etc. Most of them are extensible by
plug-in systems.
IDEs are able to do so much because they have an enormous amount of information about the developer and his
system. Being able to browse or refactor the system already
implies having a reified program model. Thus we advocate
integrating our tools in an IDE, and using the IDE itself as
the source of evolution information instead of the versioning system. Tool integration increases tool visibility and is
a first step to feature them in the developer’s workflow. To
use the IDE as the source of information is the closest we
can get to understand the developer’s intentions.
Most IDEs feature an event notification system, so tools
can react to what the developer is doing. Hooks monitor
when a class is compiled, or when a method is recompiled.
The approach we propose uses these IDE hooks to react
when a developer modifies the system by creating data defined as first-class change entities.
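The sketch below illustrates this idea in Java with a deliberately generic, hypothetical notification interface (real IDEs such as Eclipse or Squeak each expose their own hook mechanisms): every event reported by the IDE is immediately turned into a first-class change object and appended to the recorded history.

import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical IDE hook: the environment calls these methods whenever the
// developer adds a class or (re)compiles a method.
interface DevelopmentListener {
    void classAdded(String className);
    void methodCompiled(String className, String methodName, String sourceCode);
}

// Each notification is turned into a first-class change object right away.
final class ChangeRecorder implements DevelopmentListener {
    record Change(String kind, String entity, String payload, Instant at) {}

    private final List<Change> history = new ArrayList<>();

    @Override public void classAdded(String className) {
        history.add(new Change("AddClass", className, "", Instant.now()));
    }

    @Override public void methodCompiled(String className, String methodName, String sourceCode) {
        history.add(new Change("ChangeMethod", className + ">>" + methodName, sourceCode, Instant.now()));
    }

    List<Change> history() { return List.copyOf(history); }
}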
3.2. First-class Change Entities
First-class change entities are objects modeling the history of a system following the incremental way it was
built. They contain information to reproduce the program of
which they represent the history. When executed, they yield
an abstract representation of the program they represent at a
certain point in time. They also contain additional information interesting for evolution researchers, such as when and
who performed which change operations.
p. 161 of 199
Traditional approaches model the history of a program
as a sequence of versions. This is memory-consuming, since
most parts of the system do not change and are simply duplicated among versions. This is why most approaches include
a sampling step, aimed at reducing the number of versions
by selecting a fraction of them. This sampling step hence
increases the changes between successive versions, rendering fine-grained analysis even harder. In contrast, our approach only stores the program-level differences between
versions, and is able to reproduce the program at any point
in time.
Change operations also model with greater accuracy the
way the developer thinks about the system. If a developer
wants to rename a variable, he does not think about replacing all methods referencing it with new methods, even if that
is what the IDE ends up doing: Modeling incremental modifications to the system eases its understanding.
Although we model program evolution with first-class
change operations to ease reverse engineering, we believe
it is useful for forward engineering as well. Most end-user applications feature an undo mechanism, but most program editors do not provide a sensible one at the semantic level. First-class change operations could enable this, hence facilitating exploratory programming by trial and error. First-class change entities can also ease arbitrary program transformations to facilitate program evolution, following the same scheme as semi-automated refactorings [?].
To sum up, our approach consists of the following steps:
1. Use the hooks of the IDE to be notified of developer
activity.
2. React to this activity by creating first-class change objects representing the semantic actions the developer is
performing.
3. Execute these change objects to move the program representation to any point in time.
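A minimal sketch of what executing change objects means in practice (the class names below are illustrative, not SpyWare's): each change knows how to apply itself to a simple program model, so replaying a prefix of the recorded history reproduces the model as it was at that point in time.

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified program model: class names mapped to their methods' source code.
final class ProgramModel {
    final Map<String, Map<String, String>> classes = new LinkedHashMap<>();
}

// First-class changes: each one knows how to apply itself to the model.
interface ChangeOperation {
    void applyTo(ProgramModel model);
}

record AddClass(String name) implements ChangeOperation {
    public void applyTo(ProgramModel model) {
        model.classes.put(name, new LinkedHashMap<>());
    }
}

record AddMethod(String className, String methodName, String source) implements ChangeOperation {
    public void applyTo(ProgramModel model) {
        model.classes.get(className).put(methodName, source);
    }
}

final class ChangeHistory {
    private final List<ChangeOperation> changes = new ArrayList<>();

    void record(ChangeOperation c) { changes.add(c); }

    // Replaying the first `upTo` changes reproduces the program representation
    // as it was at that point in time; no full snapshots are ever stored.
    ProgramModel replay(int upTo) {
        ProgramModel model = new ProgramModel();
        for (ChangeOperation c : changes.subList(0, upTo)) c.applyTo(model);
        return model;
    }
}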
Advantages. The advantages of this alternative approach
over gathering data from a repository and performing offline analysis are the following:
• Accuracy. Reacting to events as they happen gives us
more accurate information than the one stored in the
versioning system. Timestamps are more precise, not
reduced to commit times. Events happen one by one,
giving us more context to process them than if we had
to process a batch of them originating from an entire
day’s work.
• Incrementality. It is significantly easier to maintain a
semantic representation of the model. Events originating in the IDE are high level. Their granularity is the
one of classes and methods, not files and lines. Code
parsing is required only at the method level.
• Fine-grained. Every program entity can be modelled
and tracked along its versions, down to the statement
level if necessary. There is no state duplication, leading to space economies when an entity does not change
during several versions.
• Flexibility. Going back and forward in time using
change objects is easy. It leads to more experiments
with the code base, easing “trial and error” in development.
Drawbacks. We have identified possible issues and implications with our approach:
• Acceptance. Evolution researchers use CVS despite its
flaws, because it is the versioning system most developers use. Subversion is a newer versioning system
gathering momentum because it is close enough to
CVS. Hence, to be successful, we need to depart from people's habits as little as possible.
• Validation. Our approach needs to be evaluated with
case studies. We are monitoring our prototype itself,
but without a “real-world” case study we are unsure
about performance constraints. Our approach works
best for new projects. This limits possible case studies.
• Paradigm shift. Such an incremental approach to various problems needs new tools and new practices to be
defined.
• Applicability. Our approach is language-specific,
which involves more effort to adapt it to a new language than a conventional file-based approach. However, our current prototype implementation is split
into a language-independent part and a languagedependent one. Only the latter one must be adapted to
other languages/IDEs.
To address acceptance issues, we can integrate our tools
in mainstream IDEs, such as Eclipse, which features a plugin mechanism. The monitoring part of the system is not
intrusive and is not visible to users. Keeping track of the
data across sessions or programmer locations can be done
by creating a “sync” file which would be part of the current project. The versioning system itself would be used to
broadcast and synchronize information.
4. Our Prototype: SpyWare
Our ideas are implemented in a prototype named SpyWare (see Figure 2), written in Squeak [10]. It monitors
developer activity by using event handlers located at IDE
hooks, and generates change operations from events happening in the Squeak IDE. Changes supported so far are
shown in Table 1.
Change Type
Creation
Addition
Removal
Rename
Superclass Change
Property Change
Refactoring
Package
X
X
X
no
X
-
Class
X
X
X
X
X
X
-
Method
X
X
X
no
X
-
Variable
X
X
X
no
X
-
Statement
X
X
X
no
X
-
Table 1. Changes supported by SpyWare.
SpyWare associates these change operations with program entities, down to the statement level. It is possible to track changes to a single statement. Entities are uniquely identified independently from their name: a rename is a trivial operation. SpyWare can also generate the source code of the program it is monitoring at any point in time, by applying or reverting change operations. It also features basic support for interactive visualizations of the history.

Figure 2. SpyWare's UI features browsing, statistics and interactive visualizations.

Our future work includes the definition and detection of higher-order changes, such as refactorings or distinct features of the monitored program, out of the basic building blocks we already defined. SpyWare is currently single-user: we plan to make it multi-user soon.

5. Change-Based Software Evolution

We believe our approach has the potential to address several problems in both reverse and forward engineering, as an IDE integration makes the dialogue between the two activities more natural.
Facilitating program comprehension. Processing
finer-grained changes will allow us to detect and characterize changes with greater accuracy. Detecting and keeping
track of all the refactorings performed on the code will allow us to track specific entities with more accuracy. We
also believe that it is possible to characterize changes as either bug fixes, refactorings or feature additions and that
this information will allow analyses to focus on specific changes by contextualizing them.
Our model allows us to characterize or classify changes
and entities in arbitrary ways (using properties or annotations). This facility can be used to ease understanding of
the code as well. Contrary to classical versioning systems
where branches are fixed and are set up before modification,
our model permits the modification of properties of changes
while reviewing them. Changes that need to be grouped can
be tagged for an easier handling.
Recording the complete history of a system allows for
fine-grained understanding of a dedicated piece of code by
reviewing its introduction and modifications in context of
surrounding modifications, e.g., it is useful to know whether
a line is present from the beginning of a method or much
later because of a bug fix.
Facilitating program evolution. First-class change objects can be broadcast through a network to increase awareness of and responsiveness to changes, by providing developers with insights into what other developers are doing. Such a system would tell them interactively if their changes conflict with other people's changes. This will help avoid long and painful merge phases.
Change-based operation coupled with entity-level tracking will ease refactoring, e.g., in our current model, the
name of an entity is just a property: A rename does not affect identity.
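A tiny hedged sketch of that design decision (the names are ours, not the prototype's): an entity keeps a stable identifier, and a rename is recorded as a mere property change, so tracking the entity across versions is unaffected.

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Sketch: the identity of a program entity is a stable id; its name is just a
// mutable property, so renaming only appends a property-change record.
final class TrackedEntity {
    final UUID id = UUID.randomUUID();         // identity: never changes
    private String name;
    private final List<String> nameHistory = new ArrayList<>();

    TrackedEntity(String initialName) {
        this.name = initialName;
        nameHistory.add(initialName);
    }

    void rename(String newName) {              // a trivial, identity-preserving change
        this.name = newName;
        nameHistory.add(newName);
    }

    String currentName()       { return name; }
    List<String> nameHistory() { return List.copyOf(nameHistory); }
}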
Merging reverse and forward engineering. Higher-level languages and tools promote a faster and easier implementation of functionality, which translates into a higher
change rate of the system. Hence some reverse engineering activities need to be done on a smaller scale, but with
a higher frequency and accuracy, to keep track of what has
been done in the system before resuming work on a part of
the system.
Change operations between two versions of the system
can be used to generate an automatic and interactive change
log to bring other developers up to speed on the changes a
developer made.
6. Conclusion
Software evolution research is restrained by the loss of information that is not captured by most versioning systems. Evolution analysis tools are not used by developers because they are not integrated in an IDE and require time-consuming data retrieval and processing phases. They are
not suited for smaller-scale, day-to-day tasks [2].
We presented an alternative approach to gather and process information for software evolution. We gather data
from the IDE the developer is using rather than the versioning system. We model program changes as first-class entities to be closer to the developer's thought process. Changes
can manipulate the model to bring it to any point in time in
a very fine-grained way.
Our approach being incremental, fine-grained and integrated in an IDE, we consider it suited for daily use by
developers. To validate our hypothesis, we are currently implementing a prototype named SpyWare.
Acknowledgments: We gratefully acknowledge the financial support of the Swiss National Science foundation
for the projects “COSE - Controlling Software Evolution”
(SNF Project No. 200021-107584/1), and “NOREX - Network of Reengineering Expertise” (SNF SCOPES Project No.
IB7320-110997), and the Hasler Foundation for the project
“EvoSpaces - Multi-dimensional navigation spaces for software evolution” (Hasler Foundation Project No. MMI 1976).
We thank Marco D’Ambros and Mircea Lungu for giving valuable feedback on drafts of this paper.
References
[1] K. Beck.
Extreme Programming Explained: Embrace
Change. Addison Wesley, 2000.
[2] S. Demeyer, F. Van Rysselberghe, T. Gı̂rba, J. Ratzinger,
R. Marinescu, T. Mens, B. Du Bois, D. Janssens, S. Ducasse,
M. Lanza, M. Rieger, H. Gall, M. Wermelinger, and M. ElRamly. The Lan-simulation: A Research and Teaching Example for Refactoring. In Proceedings of IWPSE 2005 (8th
International Workshop on Principles of Software Evolution), pages 123–131, Los Alamitos CA, 2005. IEEE Computer Society Press.
[3] M. Fowler, K. Beck, J. Brant, W. Opdyke, and D. Roberts.
Refactoring: Improving the Design of Existing Code. Addison Wesley, 1999.
[4] H. Gall, K. Hajek, and M. Jazayeri. Detection of logical
coupling based on product release history. In Proceedings
International Conference on Software Maintenance (ICSM
’98), pages 190–198, Los Alamitos CA, 1998. IEEE Computer Society Press.
[5] H. Gall, M. Jazayeri, R. Klösch, and G. Trausmuth. Software
evolution observations based on product release history. In
Proceedings International Conference on Software Maintenance (ICSM’97), pages 160–166, Los Alamitos CA, 1997.
IEEE Computer Society Press.
[6] H. Gall, M. Jazayeri, and J. Krajewski. CVS release history
data for detecting logical couplings. In International Workshop on Principles of Software Evolution (IWPSE 2003),
pages 13–23, Los Alamitos CA, 2003. IEEE Computer Society Press.
[7] T. Gı̂rba, S. Ducasse, and M. Lanza. Yesterday’s Weather:
Guiding early reverse engineering efforts by summarizing
the evolution of changes. In Proceedings 20th IEEE International Conference on Software Maintenance (ICSM’04),
pages 40–49, Los Alamitos CA, 2004. IEEE Computer Society Press.
[8] T. Gı̂rba, M. Lanza, and S. Ducasse. Characterizing the
evolution of class hierarchies. In Proceedings Ninth European Conference on Software Maintenance and Reengineering (CSMR’05), pages 2–11, Los Alamitos CA, 2005. IEEE
Computer Society.
[9] C. Görg and P. Weissgerber. Detecting and visualizing refactorings from software archives. In Proceedings of IWPC
(13th International Workshop on Program Comprehension),
pages 205–214. IEEE CS Press, 2005.
[10] D. Ingalls, T. Kaehler, J. Maloney, S. Wallace, and A. Kay.
Back to the future: The story of Squeak, A practical
Smalltalk written in itself. In Proceedings OOPSLA ’97,
ACM SIGPLAN Notices, pages 318–326. ACM Press, Nov.
1997.
[11] M. Lehman and L. Belady. Program Evolution: Processes of
Software Change. London Academic Press, London, 1985.
[12] R. Robbes and M. Lanza. Versioning systems for evolution
research. In Proceedings of IWPSE 2005 (8th International
Workshop on Principles of Software Evolution), pages 155–
164. IEEE Computer Society, 2005.
[13] D. Roberts, J. Brant, R. E. Johnson, and B. Opdyke. An
automated refactoring tool. In Proceedings of ICAST ’96,
Chicago, IL, Apr. 1996.
[14] Q. Tu and M. W. Godfrey. An integrated approach for
studying architectural evolution. In 10th International Workshop on Program Comprehension (IWPC’02), pages 127–
136. IEEE Computer Society Press, June 2002.
Using Microcomponents and Design Patterns
to Build Evolutionary Transaction Services
Romain Rouvoy, Philippe Merle
INRIA Futurs, JACQUARD Project,
LIFL - University of Lille 1,
59655 Villeneuve d’Ascq Cedex, France
{romain.rouvoy, philippe.merle}@inria.fr
Abstract
The evolution of existing transaction services is limited because they are tightly
coupled to a given transaction standard, implement a dedicated commit protocol,
and support a fixed kind of applicative participants. We think that the next challenge for transaction services will be to deal with evolution concerns. This evolution
should allow developers to tune the transaction service depending on the transaction
standard or the application requirements either at design time or at runtime.
This paper deals with the construction of evolutionary transaction services. The
evolution support is obtained thanks to the use of microcomponents and design patterns, whose flexibility properties allow transaction services to be adapted to various
execution contexts. This contribution is integrated in the GoTM framework that
implements this approach to build transaction services supporting several transaction standards and commit protocols. We argue that using fine-grained components
and design patterns to build transaction services is an efficient solution to the evolution problem and our past experiences confirm that this approach does not impact
the transaction service efficiency.
Key words: Evolution, transaction service, microcomponent,
design pattern, CBSE, Fractal component model.
1 Introduction
Current transaction services do not support evolution well. Among the possible evolutions of a transaction service (e.g., Commit Protocol, Transaction
Model), this paper addresses the evolution of the transaction standards and
commit protocols supported by a transaction service.
Indeed, nowadays transaction standards are evolving faster and faster
to fit with the middleware evolution. For example, the Web Service Atomic
Transaction (WS-AT) specification has recently been released to provide a
transaction support to Web Services [7]. But the implementation of such a
transaction standard requires a legacy transaction service to be modified to
support a new transaction API, model and propagation protocol. This is the
reason why most transaction services are usually associated with only one transaction standard and the implementation of a new transaction standard results
in the development of a new transaction service. In [20], we show that a
transaction service can compose several transaction standards simultaneously.
This transaction service is built depending on the transaction standards that
are composed (e.g., OTS, JTS, and WS-AT). The resulting transaction service shares some common entities between the transaction standards (e.g.,
commit protocol, transaction status) and provides dedicated entities for the
particularities of each transaction standard (e.g., synchronization objects, XA
resources).
Similarly, we observe that transaction service implementations are tailored to a particular application context. A transactional protocol is chosen and remains the same even when the application context changes. This may lead to unexpectedly poor performance. Therefore, the evolution of the commit protocol is not supported by legacy transaction services. In [21], we show that a transaction service can switch dynamically between several Two-Phase Commit protocols at runtime depending on the execution context of the application. The resulting transaction service selects the most appropriate protocol with respect to the execution context. It performs better than using only one commit protocol in an evolutionary system, and the reconfiguration cost is negligible.
The proposals described in [20,21] are built using an approach combining
microcomponents and design patterns. This paper introduces the microcomponents and the design patterns generally involved in the construction of an
evolutionary transaction service. A microcomponent can represent a pool of
components, a policy of message propagation, or a command to execute. The
microcomponents are implemented with the Fractal component model [5] and
integrated in the GoTM framework. We argue that using fine-grained components and design patterns to build transaction services is an efficient solution
to the evolution problem and our past experiences confirm that this approach
does not impact the transaction service efficiency.
The paper is organized as follows. Section 2 introduces the Fractal component model and the concept of microcomponent. Section 3 illustrates the
construction of design patterns using microcomponents. Section 4 discusses
the benefits of our approach. Section 5 presents some related work. Section 6
concludes and gives some future work.
2 Microcomponents with Fractal
GoTM uses Fractal as its reference component model to define its microcomponents. This section first introduces the Fractal component model, then
presents the concept of microcomponent in details, and illustrates the use of
microcomponents to refactor a logging object.
2.1 The Fractal Component Model
The hierarchical Fractal component model uses the usual component, interface,
and binding concepts [5]. A component is a runtime entity that conforms to
the Fractal model. An interface is an interaction point expressing the provided
or required methods of the component. A binding is a communication channel established between component interfaces. Furthermore, Fractal supports
recursion with sharing and reflective control [6]. The recursion with sharing
property means that a component can be composed of several sub-components
at any level, and a component can be a sub-component of several components.
The reflective control property means that an architecture built with Fractal is
reified at runtime and can be dynamically introspected and managed. Fractal
provides an Architecture Description Language (ADL), named Fractal ADL,
to describe and automatically deploy component-based configurations [15].
Figure 1 illustrates the different entities of a typical Fractal component architecture. Thick black boxes denote the controller part of a component, while the interior of the boxes corresponds to the content part of a component. Arrows correspond to bindings, and tau-like structures protruding from black boxes are internal or external interfaces. Internal interfaces are only accessible from the content part of a component. A starry interface represents a collection of interfaces of the same type. The two shaded boxes C represent a shared component.

Fig. 1. The Fractal component model (primitive and composite components, controller and content parts, server, client, internal and collection interfaces, bindings, and a shared component).
2.2 The Microcomponents
The approach presented in this paper promotes the definition of microcomponents as units of design, deployment, and execution.
A microcomponent communicates via its microinterfaces. The semantics of a microinterface depends on the semantics of the microcomponent that provides it. A microinterface thus identifies one function provided by a microcomponent. Microinterfaces are interfaces defining a very small set of operations (empirical statistics performed on the code base show that GoTM microinterfaces define no more than 4 operations), where the operation signature is uncoupled from the operation semantics. If an interface contains too many operations, then it is split into several microinterfaces. This approach makes the factorization and reuse of microinterface operations easier. The definition of microinterfaces offers more modularity when composing microcomponents. Therefore, dependencies between
microcomponents are expressed in terms of functional dependencies. Then, the configuration operations usually available on an interface are reified as microcomponent attributes, according to the separation of concerns principle. If such an attribute is not primitive, or if it is shared by several microinterfaces, then the microcomponent attribute is isolated, reified as a microcomponent, and composed with the other microcomponents.
This composition is achieved using an ADL. The composition concern has an important place in the concept of microcomponents. Indeed, microcomponents are not only defined by their microinterfaces but also by their composition with the other microcomponents. Microcomponents are identified in order to split a coarse-grained component into several fine-grained components. When composing them, the developer can provide various component semantics by changing only some of the microcomponents. Moreover, it appears that the architecture patterns used to compose the microcomponents can be identified. The remainder of this paper describes the design patterns that are used in the GoTM framework to build evolutionary transaction services. The ADL definitions provide the architectural definition of the design patterns. We show that, by modifying the ADL definitions, it is possible to make the design patterns evolve to handle different kinds of execution contexts.
2.3 The Logging Illustration
As an example, Figure 2 depicts the object LoggingImpl used by existing transaction services. This object implements an interface Logging containing two operations. The operation write stores in the logs (stable storage) the value of the parameter data. This operation is used by the coordinator and the participants of a transaction to log the progress of the commit protocol. This information is used by the recovery process if a failure crashes the system during the execution of a transaction. There exist two types of log writes: force and non-force. Force log writes are immediately flushed into the log, generating a disk access. Non-force log writes are eventually flushed into the log. The use of force or non-force log writes is guided by the value of the parameter force. The operation read is used by the recovery process to analyze the progress of the transactions that were active when the transaction service crashed.

Fig. 2. The logging example: the interface Logging declares write(force : Boolean, data : Byte[]) and read() : Byte[], and is implemented by LoggingImpl (attributes file : Byte[] and bufferSize : Long).
When considering the semantics of the operations, the logging object can be refactored into several microcomponents and microinterfaces. The resulting microcomponent-based architecture is depicted in Figure 3. The operations write and read are used in different contexts. Therefore, the operations are split into two microinterfaces: LoggingWriter and LoggingReader. Then, the semantics of the operation write depends on the value of the parameter force. To uncouple the semantics from the operation signature, the parameter force of the operation write is removed. This parameter is replaced by two implementations of the interface LoggingWriter. The implementations correspond to the two possible semantics, force and non-force. The piece of code common to the implementations of the interface LoggingWriter is placed in a dedicated class LoggingProviderImpl, which implements the interfaces LoggingProvider and LoggingReader.
The component Logging uses the sharing capability of the Fractal component model. The microcomponent LoggingProviderImpl is shared between three components: ForceLoggingPolicy, which provides the microinterface force write; NonForceLoggingPolicy, which provides the microinterface non-force write; and Logging, which provides the microinterface read. The microinterfaces write provided by the microcomponents ForceLoggingPolicy and NonForceLoggingPolicy are exported via the collection interface write.

Fig. 3. The logging refactored: the microinterfaces LoggingWriter (write(data : Byte[])), LoggingReader (read() : Byte[]) and LoggingProvider (getWriter() : Writer), with the implementations ForceLoggingWriter, NonForceLoggingWriter and LoggingProviderImpl (file : Byte[], bufferSize : Long).
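As a hedged Java sketch of what these microinterfaces could look like (the signatures are reconstructed from the figure, not taken from the actual GoTM code base), the force/non-force semantics is carried by the implementation that gets bound to the client rather than by a parameter:

// Microinterfaces reconstructed from the example: each one exposes a single,
// narrowly scoped function; the force/non-force semantics is selected by
// choosing which LoggingWriter implementation is bound to the client.
interface LoggingWriter {
    void write(byte[] data);
}

interface LoggingReader {
    byte[] read();
}

interface LoggingProvider {
    java.io.Writer getWriter();
}

// Sketch of the force policy: it delegates to the shared provider and flushes
// immediately, so the semantics lives in the composition, not in a flag.
final class ForceLoggingWriter implements LoggingWriter {
    private final LoggingProvider provider;    // the shared LoggingProviderImpl

    ForceLoggingWriter(LoggingProvider provider) {
        this.provider = provider;
    }

    @Override public void write(byte[] data) {
        try {
            java.io.Writer w = provider.getWriter();
            w.write(new String(data));         // simplified: a real log would write raw bytes
            w.flush();                         // force semantics: flush to stable storage right away
        } catch (java.io.IOException e) {
            throw new RuntimeException(e);
        }
    }
}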
The composition of Figure 4 is described using Fractal ADL. Fractal ADL configurations of microcomponents are automatically generated by the Fraclet tool (http://fractal.objectweb.org/tutorials/fraclet/). Therefore, the component Logging can be defined using the piece of configuration depicted in Figure 5. The definition Logging extends the definition LoggingReader to provide the interface read (Line 1). It defines the collection interface write with the signature LoggingWriter (Line 2). The microcomponent LoggingProviderImpl, named provider, is contained in the component Logging (Line 3). This component provider is shared between the components ForceLoggingPolicy and NonForceLoggingPolicy (Lines 4-9). Finally, the component Logging exports the microinterfaces of the components ForceLoggingPolicy, NonForceLoggingPolicy, and LoggingProviderImpl using bindings (Lines 10-12).

Fig. 4. The component Logging: it exports the collection interface write and the interface read.
Thanks to this approach, it becomes easier to make the component Logging evolve. Indeed, additional microcomponents can be added to the Logging component to implement a new semantics (e.g., the empty write semantics). The semantics of the component can be dynamically changed by reconfiguring the microcomponents contained in the component. This approach has successfully been applied to build component-based implementations of several well-known Two-Phase Commit protocols [21]. This allows developers to tune the implementation of components depending on the targeted execution context (e.g., fault tolerance, performance, etc.).
 1  <definition name="Logging" extends="LoggingReader">
 2    <interface name="write" signature="LoggingWriter" cardinality="collection"/>
 3    <component name="provider" definition="LoggingProviderImpl"/>
 4    <component name="force" definition="ForceLoggingPolicy">
 5      <component name="provider" definition="./provider"/>
 6    </component>
 7    <component name="non-force" definition="NonForceLoggingPolicy">
 8      <component name="provider" definition="./provider"/>
 9    </component>
10    <binding client="this.read" server="provider.read"/>
11    <binding client="this.write-force" server="force.write"/>
12    <binding client="this.write-nonforce" server="non-force.write"/>
13  </definition>

Fig. 5. The Fractal ADL configuration Logging.
3 Revisiting the Design Patterns with microcomponents
This section introduces the design patterns used in GoTM to build evolutionary transaction services. The configuration and the composition of the
microcomponents define the semantics of the transaction service. Therefore,
this transaction service can evolve when reconfiguring the assembly of microcomponents.
3.1
Design Patterns Overview
dynamic
static
In this paper, we focus on five design patterns used in GoTM to build evolutionary transaction services: Facade, Factory, State, Command, and Publish/Subscribe [12]. These design patterns are the basis to build any evolutionary
transaction service. We illustrate how the evolution of the transaction service
is driven by the evolution of the design patterns.
A conceptual overview of an evolutionary transaction service built with GoTM is shown in Figure 6. The design patterns are shared out among the static and the dynamic parts of the evolutionary transaction service.
Fig. 6. Overview of the architecture.
The static part addresses the transaction service itself and supports the Facade and the Factory design patterns. The dynamic part handles the transactions created by the transaction service. This part uses the Facade, State, Command, and Publish/Subscribe design patterns. Each of these design patterns is implemented using several microcomponents that can be composed using Fractal ADL configurations.
3.2 The Facade Design Pattern
The Facade design pattern provides a high-level unified interface to a set of
interfaces in a subsystem to make it easier to use [12]. In GoTM, the Facade
design pattern is used to conform to a particular transaction standard (e.g.,
JTS, OTS, WS-AT). The Facade design pattern converts the interfaces defined
by the transaction standard to the microinterfaces provided by GoTM. Given
that a transaction service is composed of a static and a dynamic part (see
Section 3.1), the Facade design pattern is applied to the two parts using two
components.
The evolution of this design pattern is related to the ability to provide compliance with various existing and future transaction standards. Using the Facade design pattern, new transaction standards can be easily taken into account. Indeed, this only requires implementing static and dynamic Facade components. In GoTM, the Facade components can be automatically
generated using a model, presented in [19], that describes the mapping between
the interfaces defined in a standard and the microinterfaces exported by the
GoTM core components.
Figure 7 focuses on the static part of the transaction service and depicts a component Facade that provides three facades. The component OTS Facade implements the OMG Object Transaction Service standard [17]. The component JTS Facade implements the Sun Java Transaction Service standard [8]. The component WS-AT Facade provides support for the Web Service Atomic Transaction standard [7]. All of these facades share the component Factory. The component instances created by the component Factory also provide three facades.
Fig. 7. The component Facade.
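To illustrate the role of such a facade, the following minimal Java sketch shows a JTS-style facade that delegates transaction creation to the shared Factory component. The names TxFactory, TxCoordinator and begin() are hypothetical placeholders and are not taken from GoTM or from the JTS specification.

// Hypothetical GoTM-side microinterfaces (illustrative only).
interface TxFactory { TxCoordinator create(); }
interface TxCoordinator { void commit(); void rollback(); }

// A facade converts a standard-facing operation into calls on the
// microinterfaces exported by the GoTM core components.
class JtsFacade {
    private final TxFactory factory;   // the shared Factory component (see Fig. 7)

    JtsFacade(TxFactory factory) {
        this.factory = factory;
    }

    // Standard-facing operation, delegated to the shared Factory component.
    TxCoordinator begin() {
        return factory.create();
    }
}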
3.3 The Factory Design Pattern
The Factory design pattern provides an interface for creating families of related or dependent objects without specifying their concrete classes [12]. In
GoTM, this design pattern is used by the transaction service to build new
instances of transactions at runtime.
The evolutionary dimension of this design pattern deals with the ability
to handle crosscutting concerns, such as the caching and the pooling of the
instances created by the Factory. Depending on the TX Model definition, the
transaction service is able to create flat or nested transactions. In [21], the
transaction factory evolves to provide self-adaptability and choose the Two-Phase Commit protocol that would complete faster depending on the current
execution context.
Figure 8 depicts an example of a component Factory used to create new instances of transaction components. The component Factory provides a microinterface factory to support creation and destruction of transaction component
instances. The microcomponent Basic Factory creates new instances of components using the component Tx Model. A component Tx Model represents
a template of transaction components that can be cloned several times to
produce instances of transaction components.
Moreover, the component Tx Model can be dynamically reconfigured to modify the architecture of future transaction components. The Cache Factory introduces a caching concern in the factory to reduce the cost of garbage collecting the references of transaction components. Instances of useless transaction components are stored in the component Cache Pool to be recycled by the Cache Factory. The Pool Factory registers the instances of transaction components created by the component Cache Factory. These instances are stored in the component Instance Pool and can be listed using the microinterface pool provided by the component Factory. This encapsulation of the components Factory forms a delegation chain [12].
Fig. 8. The component Factory.
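The delegation chain described above can be sketched in Java as follows. The interface and class names are illustrative placeholders rather than GoTM's API; the sketch only shows how the Pool Factory wraps the Cache Factory, which in turn wraps the Basic Factory.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

interface TxInstance { }                          // an instance of a transaction component (illustrative)

interface InstanceFactory { TxInstance create(); }

// Basic Factory: creates new instances (in GoTM, by cloning the Tx Model template).
class BasicFactory implements InstanceFactory {
    public TxInstance create() { return new TxInstance() { }; }
}

// Cache Factory: recycles useless instances stored in the Cache Pool.
class CacheFactory implements InstanceFactory {
    private final InstanceFactory next;
    private final Deque<TxInstance> cachePool = new ArrayDeque<>();
    CacheFactory(InstanceFactory next) { this.next = next; }
    public TxInstance create() {
        return cachePool.isEmpty() ? next.create() : cachePool.pop();
    }
    void recycle(TxInstance useless) { cachePool.push(useless); }
}

// Pool Factory: registers every created instance in the Instance Pool,
// which can be listed through the 'pool' microinterface.
class PoolFactory implements InstanceFactory {
    private final InstanceFactory next;
    private final List<TxInstance> instancePool = new ArrayList<>();
    PoolFactory(InstanceFactory next) { this.next = next; }
    public TxInstance create() {
        TxInstance t = next.create();
        instancePool.add(t);
        return t;
    }
    List<TxInstance> pool() { return instancePool; }
}

// Usage: InstanceFactory factory = new PoolFactory(new CacheFactory(new BasicFactory()));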
3.4 The State Design Pattern
The State design pattern allows an object to alter its behavior when its internal
state changes [12]. In GoTM, this design pattern is used to represent and
control the possible states of a transaction.
The evolutionary dimension of this design pattern deals with the capability of modifying the state automaton to support various transaction models.
Using the component State, GoTM is able to implement state automatons
conforming to the transaction standard specification. The State design pattern is revisited in this section using microcomponents to reify and configure
the state automaton.
Figure 9 depicts a simple state automaton. The states Inactive and Completed are attached to the initial and the final states of the automaton, respectively. When receiving the event start, the system becomes Active. It can be Suspended when receiving the event suspend, and then moved back to Active via the event start. From the state Active, the system can be moved to the state Completing when receiving the event complete. Finally, the state Completed is accessible from the states Active or Completing when receiving the event done.
Fig. 9. The state automaton.
Figure 10 depicts a component State that implements the automaton depicted in Figure 9. The microcomponents Inactive, Active, Suspended, Completing, and Completed each represent a state of the automaton. The bindings between the states represent the allowed transitions. The microcomponent State Manager manages the state automaton and allows the system to reset the state automaton at runtime via the microinterface manager. The exported microinterface state provides access to the microcomponent reifying the current state.
Fig. 10. The component State.
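The behaviour of the automaton of Figure 9 can be written down as a small Java transition table, as sketched below. This is only an illustration of the allowed transitions; GoTM's actual implementation reifies each state as a Fractal microcomponent, and the enum and method names used here are assumptions.

import java.util.Map;

enum TxState { INACTIVE, ACTIVE, SUSPENDED, COMPLETING, COMPLETED }
enum TxEvent { START, SUSPEND, COMPLETE, DONE }

class StateManagerSketch {
    // Allowed transitions, mirroring the bindings between the state microcomponents.
    private static final Map<TxState, Map<TxEvent, TxState>> TRANSITIONS = Map.of(
        TxState.INACTIVE,   Map.of(TxEvent.START, TxState.ACTIVE),
        TxState.ACTIVE,     Map.of(TxEvent.SUSPEND, TxState.SUSPENDED,
                                   TxEvent.COMPLETE, TxState.COMPLETING,
                                   TxEvent.DONE, TxState.COMPLETED),
        TxState.SUSPENDED,  Map.of(TxEvent.START, TxState.ACTIVE),
        TxState.COMPLETING, Map.of(TxEvent.DONE, TxState.COMPLETED),
        TxState.COMPLETED,  Map.<TxEvent, TxState>of()
    );

    private TxState current = TxState.INACTIVE;

    TxState current() { return current; }          // cf. the exported microinterface 'state'

    void reset() { current = TxState.INACTIVE; }   // cf. the microinterface 'manager'

    // Fire an event; transitions not allowed by the automaton are rejected.
    void fire(TxEvent e) {
        TxState next = TRANSITIONS.get(current).get(e);
        if (next == null) {
            throw new IllegalStateException(e + " is not allowed in state " + current);
        }
        current = next;
    }
}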
3.5 The Command Design Pattern
The Command design pattern encapsulates the concept of command into an
object [12]. In GoTM, this design pattern is used to handle the different kinds of participants registered in a transaction.
The evolutionary dimension of the Command design pattern deals with the
list of commands that are executable on a transaction participant. Thanks to
the application of the Command design pattern, GoTM can specialise the Command component to support XA resources, Synchronization objects, CORBA resources, and Web Service participants, depending on the content of the component XA Commands [20]. The Command design pattern is now revisited
using microcomponents to easily configure the available commands.
Figure 11 illustrates the component Command. It encloses a variable number of XA participants on which commands can be applied [23]. The available commands are defined by the content of the component XA Commands. Each command is implemented by a microcomponent and conforms to the XA specification [23]. The participants enlisted in the system via the microinterface register are stored in the microcomponent Participant Pool. The policy used to send notify events to the enlisted participants can be configured. For example, the microcomponent Sequential Notify (resp. Parallel Notify) is responsible for notifying the participants sequentially (resp. in parallel) and executing the corresponding command. Thus, both the Participant Pool and the XA Commands components are shared between the Sequence Policy and Parallel Policy components.
Fig. 11. The component Command.
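A minimal Java sketch of this structure is given below. The interfaces XaCommand and XaParticipant and the class names are illustrative placeholders; they follow neither GoTM's actual API nor the XA specification, and only show how a notification policy applies the configured commands to the enlisted participants.

import java.util.ArrayList;
import java.util.List;

// One microcomponent per command in the XA Commands component (illustrative).
interface XaCommand {
    void apply(XaParticipant p);
}

// A transaction participant (illustrative placeholder for an XA resource).
interface XaParticipant {
    void prepare();
    void commit();
}

// Participant Pool: stores the participants enlisted via the 'register' microinterface.
class ParticipantPool {
    private final List<XaParticipant> enlisted = new ArrayList<>();
    void register(XaParticipant p) { enlisted.add(p); }
    List<XaParticipant> all() { return enlisted; }
}

// Sequential Notify policy: applies a command to the participants one after another.
// A Parallel Notify policy would dispatch the same calls concurrently instead.
class SequentialNotify {
    void notifyParticipants(XaCommand command, ParticipantPool pool) {
        for (XaParticipant p : pool.all()) {
            command.apply(p);
        }
    }
}

// Usage: new SequentialNotify().notifyParticipants(XaParticipant::prepare, pool);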
3.6 The Publish/Subscribe Design Pattern
The Publish/Subscribe design pattern 3 defines a one-to-many dependency between a publisher object and any number of subscriber objects so that when
the publisher object changes state, all of its subscriber objects are notified
and updated automatically [12]. In GoTM, the component Publish/Subscribe
is used to synchronize the transaction participants during the execution of the
Two-Phase Commit protocol [21].
The evolutionary dimension of this component consists in providing several publish policies. Additional publish policies are available in GoTM and
provide sequential propagation or pooled propagation policies to ensure that
no more than n messages are concurrently sent to the subscribers (n being
the size of the pool).
In Figure 12, the architecture of the component Publish/Subscribe is similar to that of the logging component depicted in Figure 4. The microcomponent Subscriber Pool is shared between the components Synchronous Policy and Asynchronous Policy. The microcomponent Publish Synchronous guarantees that messages are correctly delivered and handled by the subscribers before returning. The microcomponent Publish Asynchronous sends messages to subscribers without waiting. The State Checker microcomponent ensures that the published messages conform to the state automaton described in the shared component State (Section 3.4).
Fig. 12. The component Publish/Subscribe.
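The two publish policies can be sketched in Java as follows. The class names mirror the microcomponents of Figure 12, but the code is only an illustrative sketch under assumed signatures; the actual GoTM microcomponents are Fractal components rather than plain classes.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

interface Subscriber { void onMessage(String message); }

// Subscriber Pool: shared between the synchronous and asynchronous policies.
class SubscriberPool {
    final List<Subscriber> subscribers = new CopyOnWriteArrayList<>();
    void subscribe(Subscriber s) { subscribers.add(s); }
}

// Publish Synchronous: returns only once all subscribers have handled the message.
class PublishSynchronous {
    private final SubscriberPool pool;
    PublishSynchronous(SubscriberPool pool) { this.pool = pool; }
    void publish(String message) {
        for (Subscriber s : pool.subscribers) {
            s.onMessage(message);
        }
    }
}

// Publish Asynchronous: sends the message without waiting for the subscribers.
class PublishAsynchronous {
    private final SubscriberPool pool;
    PublishAsynchronous(SubscriberPool pool) { this.pool = pool; }
    void publish(String message) {
        for (Subscriber s : pool.subscribers) {
            new Thread(() -> s.onMessage(message)).start();
        }
    }
}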
4 Discussion
Separation of Concerns. The definition of microinterfaces makes the composition of components more flexible. Microinterfaces factorize the definition
of available operations and enforce their reuse by different microcomponents.
The definition of microcomponents provides a better separation of concerns.
This allows the developer to compose technical concerns, such as Caching or
Pooling, independently. Microcomponents can be removed, replaced, or added
depending on the architecture configuration of the component. This reconfiguration can be performed either while designing the transaction service using
the Fractal ADL or at runtime using the reflective control capabilities of the
Fractal component model. The Fraclet annotation framework drastically simplifies the definition of microcomponents while automatically generating the
3 Derived from the Observer/Observed design pattern.
component glue and most of the Fractal ADL configurations.
Software Architecture Patterns. Once the microcomponents are defined,
they can be easily composed using Fractal ADL. This composition relies
mainly on the principles of encapsulation and sharing. Encapsulation reifies as a component the domain of application of a set of microcomponents.
The sharing of microcomponents allows the components to collaborate transparently. The use of sharing also favours the composition of orthogonal concerns to introduce additional functions (e.g., propagation policies). Basically, we define the Sharing architecture pattern as a Software Architecture Pattern [2]. This pattern allows a given component to be directly contained
in several other components. This architecture pattern is used by the Publish/Subscribe, Command, and Facade design patterns. Based on the Sharing
pattern, the Encapsulation architecture pattern extends the definition of a
component using the delegation chain design pattern. Encapsulation is applied in the Factory and Publish/Subscribe design patterns. The Policy architecture pattern consists in sharing a core component between several policy
components implementing the same interface. This architecture pattern is
used by the Publish/Subscribe and Command design patterns. The Pool architecture pattern gathers components providing a common interface within a
composite component. The pool pattern is used by the Factory, Publish/Subscribe and Command design patterns. The identification of such architecture
patterns can help in providing a better evolution support to CBSE. Tools and
rules can therefore be defined to control the evolution of component-based
applications.
Performance of Transaction Services. Finally, considering the performance issue, our past experiences with GoTM have shown that using microcomponents and design patterns introduces no performance overhead to the
transaction services [20,21]. Better still, these experiences have shown that evolutionary transaction services built on top of GoTM can perform better than legacy transaction services [20].
5 Related Work
To achieve these goals, the approach used in GoTM takes advantage of several
works related to CBSE, such as mixin-based approaches, component-based
frameworks, microcomponents, and aspect-oriented design patterns:
Mixin-based approaches. Mixins [4] and Traits [10] provide a way of structuring object-oriented programs. Mixins are composed by inheritance to build
an object that combines different concerns, each concern being implemented
as a mixin. A trait is essentially a parameterized set of methods; it serves
as a behavioral building block for classes and is the primitive unit of code
reuse. Nevertheless, once mixed, the object does not keep a trace of the mixins that compose it. This means that the object cannot evolve to handle other concerns once it is mixed. In GoTM, we consider microcomponents as mix-
ins that can be composed to build larger components. Once composed, the
microcomponents are reified in a composite component to keep a clear view of
the resulting architecture.
Component-based frameworks. The goal of Medor [1], OpenORB [3],
Dream [14], and Jonathan [11] is to develop more configurable and re-configurable
middleware technologies through a marriage of reflection, component technologies and component framework. These frameworks are based upon the
lightweight and reflective OpenCOM and Fractal component models. For example, CORBA Object Request Brokers (ORB) have been implemented as
a set of configurable and reconfigurable components in the context of the
OpenORB and Jonathan frameworks. Nevertheless, these reflective adaptable middleware do not address the architecture of the component framework. Furthermore, they provide neither a methodology nor evolutionary approaches to extend the possibilities of the component-based frameworks. While providing configurable properties equivalent to these frameworks, GoTM also encloses architectural patterns to support the evolution of
transaction services.
Microcomponents. AsBaCo [16] and AOKell [22] introduce microcomponents to build the controller part of Fractal components as an evolutionary
architecture. One of their contributions is a microcomponent model, which
permits the capture of the structure of a component controller part; these
frameworks enable the verification of the consistency of the controller configuration before launching the application. Since a microcomponent is, in
a simplified view, an object with several provided and required services, the
microcomponent model is applicable to Fractal implementations where the
controller part consists of small object-like elements. The microcomponent
models of AsBaCo and AOKell point out an interesting feature to build evolutionary middleware. Nevertheless, neither AsBaCo nor AOKell provides a solution to the problem of evolutionary middleware architecture design.
Based on a fine-grained component approach, GoTM addresses the evolution
of middleware architectures either at design time or at runtime.
Aspects and Design Patterns. The combination of aspects and design
patterns has been studied in several works [9,13]. The goal of this approach
is to enforce the traceability, the modularity, and the reusability of design patterns using aspects. Aspect-oriented programming provides a way of tracking
the design patterns, which tend to vanish in the code. Even if the design
patterns are reified as aspects, this approach does not take into account the
architectural dimension of an application. In particular, the design patterns
are not reified in the architecture configuration to allow the application to
evolve. Using microcomponents, GoTM reifies the design patterns as software architectures to make their configuration easier.
6 Conclusion and Future Work
This paper has introduced a microcomponent-based framework to build evolutionary transaction services. This framework, named GoTM, uses various
design patterns to support evolution. These design patterns are implemented
with microcomponents that can be composed following various Fractal ADL
configurations. The use of microcomponents and design patterns to build
transaction services provides better modularity properties and does not impact the transaction service efficiency. Moreover, some architecture patterns can be extracted from this fine-grained architecture.
Our future work will study additional technologies to provide further modularity to our framework using aspects and a higher level of abstraction for
the design of transaction services using a model-driven approach.
Aspects and Microcomponents. We plan to investigate the use of an
aspect-oriented framework to introduce some of the technical concerns presented in this paper. In particular, the Fractal Aspect Component (FAC)
framework provides an interesting extension to support aspect-oriented programming at the component level [18]. For example, using FAC, the Factory
design pattern (see Section 3.3) could be refactored to introduce the Pooling
and the Caching concerns as aspect components rather than using encapsulation and sharing of components.
Model-Driven Engineering. We also intend to reify the software architecture design patterns identified in GoTM as template components to enforce
and control their reuse. For example, we can define a software architecture
pattern as an abstract component and use the extension mechanism of Fractal
ADL to specify concrete components used to implement the design pattern.
The concrete components could be generated using a Model-Driven Engineering (MDE) approach to complete the software architecture design patterns
already defined in GoTM [19].
Availability. GoTM is freely available under an LGPL licence at the following URL: http://gotm.objectweb.org.
Acknowledgments. This work is partially funded by the national institute
for research in computer science and control (INRIA), and the Region Nord
- Pas-de-Calais.
References
[1] Alia, M., S. Chassande-Barrioz, P. Dechamboux, C. Hamon and A. Lefebvre, A
Middleware Framework for the Persistence and Querying of Java Objects, in:
Proceedings of the 18th European Conference on Object-Oriented Programming
(ECOOP), LNCS 3086 (2004), pp. 291–315.
[2] Barais, O., J. Lawall, A.-F. Le Meur and L. Duchien, Safe Integration of New
Concerns in a Software Architecture, in: Proceedings of the 13th Annual IEEE
International Conference on Engineering of Computer Based Systems (ECBS)
(2006), to appear.
[3] Blair, G. S., G. Coulson, A. Andersen, L. Blair, M. Clarke, F. M. Costa, H. A.
Duran-Limon, T. Fitzpatrick, L. Johnston, R. S. Moreira, N. Parlavantzas
and K. B. Saikoski, The Design and Implementation of Open ORB 2, IEEE
Distributed Systems Online 2 (2001).
[4] Bracha, G. and W. Cook, Mixin-based inheritance, in: Proceedings of
the International Symposium on Object-Oriented Programming: Systems,
Languages and Applications (OOPSLA), SIGPLAN Notices 25 (1990).
[5] Bruneton, E., T. Coupaye, M. Leclercq, V. Quéma and J.-B. Stefani, An
Open Component Model and Its Support in Java, in: Proceedings of the 7th
International ICSE Symposium on Component-Based Software Engineering
(CBSE), LNCS 3054 (2004), pp. 7–22.
[6] Bruneton, E., T. Coupaye and J.-B. Stefani, Recursive and Dynamic Software
Composition with Sharing, in: Proceedings of the 7th International ECOOP
Workshop on Component-Oriented Programming (WCOP), Malaga, Spain,
2002.
[7] Cabrera, L. F., G. Copeland, M. Feingold, R. W. Freund, T. Freund, J. Johnson,
S. Joyce, C. Kaler, J. Klein, D. Langworthy, M. Little, A. Nadalin, E. Newcomer,
D. Orchard, I. Robinson, T. Storey and S. Thatte, “Web Services Atomic
Transaction (WS-AtomicTransaction),” 1.0 edition (2005).
[8] Cheung, S., “Java Transaction Service (JTS),” Sun Microsystems, Inc., San
Antonio Road, Palo Alto, CA, 1.0 edition (1999).
[9] Denier, S., H. Albin-Amiot and P. Cointe, Expression and Composition of
Design Patterns with Aspects, in: Proceedings of the 2ème Journée Francophone
sur les Développement de Logiciels Par Aspects (JFDLPA), RSTI L’Objet 11
(2005).
[10] Ducasse, S., O. Nierstrasz, N. Schärli, R. Wuyts and A. Black, Traits: A
Mechanism for fine-grained Reuse, Transactions on Programming Languages
and Systems (TOPLAS) (2005), pp. 46–78.
[11] Dumant, B., F. Horn, F. D. Tran and J.-B. Stefani, Jonathan: An Open
Distributed Processing Environment in Java, Distributed Systems Engineering
6 (1999), pp. 3–12.
[12] Gamma, E., R. Helm, R. Johnson, J. Vlissides and G. Booch, “Design Patterns:
Elements of Reusable Object-Oriented Software,” Addison-Wesley Professional
Computing, USA, 1995.
[13] Hannemann, J. and G. Kiczales, Design Pattern Implementation in Java and
AspectJ, in: Proceedings of the 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA),
SIGPLAN 37 (2002), pp. 161–173.
[14] Leclercq, M., V. Quéma and J.-B. Stefani, DREAM: A Component Framework
for Constructing Resource-Aware Configurable Middleware, IEEE DS Online 6
(2005), pp. 1–12.
[15] Medvidovic, N. and R. Taylor, A Classification and Comparison Framework for
Software Architecture Description Languages, IEEE Transactions on Software
Engineering 26 (2000), pp. 70–93.
[16] Mencl, V. and T. Bures, Microcomponent-Based Component Controllers: A
Foundation for Component Aspects, in: Proceedings of the 12th Asia-Pacific
Software Engineering Conference (APSEC) (2005), pp. 729–738.
[17] OMG, “Object Transaction Service (OTS),” Needham, MA, USA, 1.4 edition
(2003).
[18] Pessemier, N., L. Seinturier, T. Coupaye and L. Duchien, A Model for
Developing Component-based and Aspect-oriented Systems, in: Proceedings of
the 5th International ETAPS Symposium on Software Composition (SC), LNCS
(2006), to appear.
[19] Rouvoy, R. and P. Merle, Towards a Model Driven Approach to Build
Component-Based Adaptable Middleware, in: Proceedings of the 3rd Middleware
Workshop on Reflective and Adaptive Middleware (RAM), AICPS 80 (2004),
pp. 195–200.
[20] Rouvoy, R., P. Serrano-Alvarado and P. Merle, A Component-based Approach
to Compose Transaction Standards, in: Proceedings of the 5th International
ETAPS Symposium on Software Composition (SC), LNCS (2006), to appear.
[21] Rouvoy, R., P. Serrano-Alvarado and P. Merle, Towards Context-Aware
Transaction Services, in: Proceedings of the 6th International Conference on
Distributed Applications and Interoperable Systems (DAIS), LNCS (2006), to
appear.
[22] Seinturier, L., N. Pessemier, L. Duchien and T. Coupaye, A Component
Model Engineered with Components and Aspects, in: Proceedings of the
9th International SIGSOFT Symposium on Component-Based Software
Engineering (CBSE), LNCS (2006), to appear.
[23] The Open Group, “Distributed Transaction Processing: The XA Specification,”
C193 edition (1992).
Comparative Semantics of Feature Diagrams
Pierre-Yves Schobbens
Patrick Heymans
Yves Bontemps
Jean-Christophe Trigaux
February 6, 2006
Abstract
Feature Diagrams (FD) are a family of popular modelling languages used to address the feature
interaction problem, particularly in software product lines. FD were first introduced by Kang as part
of the FODA (Feature Oriented Domain Analysis) method. Afterwards, various extensions of FODA
FD were introduced to compensate for a purported ambiguity and lack of precision and expressiveness.
However, they never received a formal semantics, which is the hallmark of precision and unambiguity
and a prerequisite for efficient and safe tool automation.
The reported work is intended to contribute a more rigorous approach to the definition, understanding, evaluation, selection and implementation of FD languages. First, we provide a survey of FD variants.
Then, we give them a generic formal semantics. This demonstrates that FD can be precise and unambiguous. This also defines their expressiveness. Many variants are fully expressive, and thus the endless quest
for extensions actually cannot be justified by expressiveness. A finer notion is thus needed to compare
these fully expressive languages. Two solutions are well-established: succinctness and embeddability,
that express the naturalness of a language. We show that the fully expressive FD fall into two succinctness classes, of which we of course recommend the most succinct. Among the succinct fully expressive
languages, we suggest a new, simple one that is not harmfully redundant: Varied FD (VFD). Finally, we
study the execution time that tools will need to solve useful problems in these languages.
1 Introduction
Features have been defined as “a distinguishable characteristic of a concept (e.g., system, component and
so on) that is relevant to some stakeholder of the concept” [7]. This broad definition encompasses most
commonly used meanings, for example: new services or options added to an existing base system.
The description of systems in terms of features is popular in many domains such as telecommunication
or embedded systems. Features are not used for one-shot developments, but for systems that must have
multiple incarnations because (1) they need to evolve through time or (2) variants must be deployed in
many contexts. A typical example of the former is telephony systems which, to face harsh competition, need to adapt quickly to satisfy demand. An example of the latter is product lines (also known as product
families) in domains such as home appliances or the automotive industry.
Two perspectives are used in turn:
• When the base system (or system family) is conceived, features are “units of evolution” (or change)
that adapt it to an optional user requirement [9]. A recurrent problem is the one of feature interaction:
adding new features may modify the operation of already implemented ones. When this modification is undesirable, it is called a feature interference. This problem is difficult because “individual
features do not typically trace directly to an individual component” [11]. With the increasing size
and complexity of current systems and associated product lines, dealing with feature interaction becomes challenging. To guarantee that systems will deliver the desirable behaviour, interactions must
be detected, resolved if possible, or the combination of features must be forbidden.
• When a specific system (or family member or product) must be designed, “[the] product is defined by
selecting a group of features, [for which] a carefully coordinated and complicated mixture of parts of
different components are involved” [11]. Hence, it is essential that features and their interactions
be
well identified. This knowledge about their valid and forbidden combinations must be packaged in a
form that is usable by the customers selecting their options, and by the engineers deploying them.
In software product lines, the former activity is often referred to as domain engineering and the second
as application engineering [3, 18].
To express the knowledge about allowed feature combinations, practitioners use feature diagrams (FD).
FD are an essential means of communication between domain and application engineers as well as customers and other stakeholders such as marketing representatives, managers, etc. In particular, FD play an
essential role during requirements engineering [18], which is known to be the most critical phase of software development. FD provide a concise and explicit way to: (1) describe allowed variabilities between
products of the same line/family, (2) represent features dependencies, (3) guide the selection of features
allowing the construction of a specific product, (4) facilitate the reuse and the evolution of software components implementing these features.
In the last 15 years or so, research and industry have developed several FD languages. The first and
seminal proposal was introduced as part of the FODA method back in 1990 [15]. An example of a FODA
FD is given in Fig. 1. It is inspired by a case study defined in [4] and indicates the allowed combinations
of features for a family of systems intended to monitor the engine of a car. As is illustrated, FODA
features are nodes of a graph represented by strings and related by various types of edges. On top of
the figure, the feature Monitor Engine System is called the root feature, or concept. The edges are
used to progressively decompose it into more detailed features. Nodes bear a logical operator, such as
and, or(), xor(♦). Cardinality operators (card) are not illustrated: by writing a range 3..5 below a node, we require between 3 and 5 of its sons to be in the model. The dotted edges are graphical constraints: they
restrict the valid models.
(Figure content: a feature diagram decomposing the concept Monitor Engine System into Monitor Engine Performance, Monitor Fuel Consumption, Monitor Temperatures, Monitor RPM, and Monitor exhaust levels and temperature, with further sub-features such as Measures and Methods.)
Figure 1: Monitor Engine System in FODA, FeatuRSEB resp.
Since Kang et al.’s initial proposal, several extensions of FODA have been devised as part of the
following methods: FORM [16], FeatuRSEB [10], Generative Programming [7], PLUSS [8], and in
the work of the following authors: Riebisch et al. [20, 19], van Gurp et al. [22], van Deursen et al. [21],
Czarnecki et al. [5, 6] and Batory [1]. While, triggered by this paper [2], some authors have recently started
to better define their semantics [21, 6, 1], most proponents of FD [16, 10, 7, 20, 19, 22, 8] have not. Still,
they have argued for an “improved expressiveness”. In this paper, we adopt a formal approach to check the
validity of such claims.
Formal semantics is not an issue to be taken lightly. As remarkably argued in [12, 13], formal semantics
is the best way (1) to avoid ambiguities and (2) to start building safe automated reasoning tools, for a variety
of purposes including verification, transformation and code generation. More specifically, for FD, we must
be sure that a model excludes all the forbidden feature combinations and admits all the valid ones. If this is
not the case, harmful feature interactions are likely to take place, or the product line will be unnecessarily
restricted and thus less competitive. A tool for assisting stakeholders in selecting features therefore must
be based on formal semantics.
Our formal approach is designed to introduce more rigour in the motivation and definition of FD languages. This should make the discussion of their qualities more focused and productive. In the end, we
hope for a better convergence of research efforts in this area. A new, rigorously defined and motivated
FD language, called VFD, is introduced in this paper as a first step in this direction. VFD use a single
construct instead of all the proposed extensions. Indeed, the proliferation of constructs and languages is an
additional source of interpretation and interoperability problems. For example, if two product lines that use
different modelling languages need to be merged, their respective teams will have to make extra efforts to
understand each other’s FD, thereby increasing the likelihood of misunderstandings. Moreover, a new FD
will have to be created from the existing ones to account for the new merged product line. It is important
that rigorous criteria are used to choose the new FD language and that provably correct semantics-preserving translations are used.
Short Name    GT   NT                          GCT       TCL
FODA          1    and ∪ xor ∪ {opt1}          ∅         CR
FORM          0    and ∪ xor ∪ {opt1}          ∅         CR
FeatuRSEB     0    and ∪ xor ∪ or ∪ {opt1}     {⇒, |}    CR
Riebisch      0    card ∪ {opt1}               {⇒, |}    CR
Eisenecker    1    and ∪ xor ∪ or ∪ {opt1}     ∅         CR
Pluss         1    and ∪ xor ∪ or ∪ {opt1}     {⇒, |}    ∅
VFD           0    card                        ∅         ∅

GT   Graph Types (1 = tree, 0 = DAG)
NT   Node Types
GCT  Graphical Constraint Types
TCL  Textual Constraint Language (CR = {mutex, requires})

Figure 2: Formal definition of FD variants
2 Formal comparison
2.1 Semantics
The semantics of a FD is defined as a Product Line (PL), i.e. a set of products. A product provides a set of
primitive features. A model is a set of nodes of the FD; its primitive nodes give a product.
Definition 2.1 (Valid model) A model m ∈ M is valid for a diagram d ∈ FD iff:
1. The concept (root) is in any valid model: r ∈ m.
2. The operator on nodes is satisfied: if a node n ∈ m has sons s1 , . . . , sk and λ(n) = opk , then opk (s1 ∈ m, . . . , sk ∈ m) must evaluate to true.
3. The model must satisfy all textual constraints: ∀φ ∈ Φ, m ⊨ φ, where m ⊨ φ means that we replace each node name n in φ by the value of n ∈ m, evaluate φ and get true.
4. The model must satisfy all graphical constraints: ∀(n1 , op2 , n2 ) ∈ CE, op2 (n1 ∈ m, n2 ∈ m) must be true.
5. If s is in the model and s is not the root, one of its parents n, called its justification, must be too: ∀s ∈ N . s ∈ m ∧ s ≠ r : ∃n ∈ N : n ∈ m ∧ n → s.
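To make the definition concrete, the following Java sketch checks the five conditions on a hypothetical toy diagram: a root with a card(1..2) operator over two sons and one textual constraint. The diagram and all names are invented for illustration only and are not part of the paper's formalisation.

import java.util.List;
import java.util.Map;
import java.util.Set;

public class ValidModelSketch {

    // Toy diagram: root "monitor" with sons "fuel" and "rpm" under a card(1..2)
    // operator, and the textual constraint "fuel requires rpm".
    static final String ROOT = "monitor";
    static final Map<String, List<String>> SONS = Map.of("monitor", List.of("fuel", "rpm"));
    static final int CARD_MIN = 1, CARD_MAX = 2;

    static boolean isValid(Set<String> model) {
        // 1. The concept (root) is in any valid model.
        if (!model.contains(ROOT)) return false;
        // 2. The operator on each selected node is satisfied (here: card(1..2)).
        for (String n : model) {
            List<String> sons = SONS.getOrDefault(n, List.of());
            if (sons.isEmpty()) continue;
            long selected = sons.stream().filter(model::contains).count();
            if (selected < CARD_MIN || selected > CARD_MAX) return false;
        }
        // 3. Textual constraints: "fuel requires rpm".
        if (model.contains("fuel") && !model.contains("rpm")) return false;
        // 4. Graphical constraints: none in this toy diagram.
        // 5. Justification: every selected non-root node needs a selected parent.
        for (String s : model) {
            if (s.equals(ROOT)) continue;
            boolean justified = SONS.entrySet().stream()
                .anyMatch(e -> model.contains(e.getKey()) && e.getValue().contains(s));
            if (!justified) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid(Set.of("monitor", "rpm")));   // true
        System.out.println(isValid(Set.of("monitor", "fuel")));  // false: "fuel requires rpm" violated
        System.out.println(isValid(Set.of("fuel", "rpm")));      // false: root missing, nodes unjustified
    }
}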
2.2 Expressiveness
The expressiveness of a language is the part of its semantic domain that it can express. It is fully expressive
if it can express all its semantic domain. We prove that all known variants based on trees are not fully
expressive, while all variants based on DAG are.
2.3 Succinctness
Among the fully expressive languages, the most succinct ones are the most convenient. We show that the
existing languages fall into two levels of succinctness, separated by a cubic jump in succinctness, given by
the cardinality node type.
2.4 Embeddability
Embeddability, also called naturality, measures whether one language can express another without any
reorganisation. We propose the following definition for graphical languages:
Definition 2.2 (Graphical embeddability) A graphical language L1 is embeddable into L2 iff there is
a translation T : L1 → L2 that is node-controlled [14] : T is expressed as a set of rules of the form
D1 → D2 , where D1 is a diagram containing a distinguished node (or edge) n, and all possible relations
with this node. Its translation D2 is a subgraph in L2 , plus how the existing relations should be connected
to nodes of this new subgraph.
If we use this last criterion, we see in Fig. 3 that all constructs can be expressed naturally by cardinality
nodes.
Instead of . . .                           write . . .
an option node op                          a vp(0 . . . 1)
a xor-node                                 a vp(1 . . . 1)
an or-node                                 a vp(1 . . . ∗)
an and-node with number of sons s          a vp(s . . . s)

Figure 3: Embedding FeatuRSEB into VFD
2.5 Decision problems
We have determined the time needed for a CASE tool to provide practically important functionalities. Our results
are summarized in Fig. 4, and are only valid for the fully expressive variants.
The problem of . . .            has complexity
Satisfiability                  NP-complete
Product-checking                NP-complete
Model-equivalence               coNP-complete
Product-line-equivalence        Π1P-complete
Intersection                    linear
Union                           linear
Product                         linear

Figure 4: Complexities for CASE tools
3 Related works
In this paper, we have studied the formal underpinnings of a family of languages in the line of the original
feature trees [15], that have not yet received a formal definition, but where a product line semantics is
clearly intended. Recently, however, a few more formally defined notations have surfaced:
1. van Deursen et al. [21] deal with a textual FD language to which they provide a semantics by coding
rewriting rules in the ASF+SDF specification formalism associated with a tool environment. Hence,
contrary to ours, their semantics is not tool-independent, nor self-contained. Also, a major difference
is that their semantics preserves the order of features.
2. Batory [1] does not really provide a semantics, but a translation to two formalisms: grammars and
propositional formulae. His objective is to allow the use of off-the-shelf Logic-Truth Maintenance
Systems and SAT solvers in feature modelling tools. The semantics of grammars is a set of strings,
and thus order and repetition are kept. The semantics of propositional formulae is closer to our
products, but the translation provided by [1] differs in two respects: (i) decomposable features are
not eliminated, and (ii) the translation of operators by an equivalence leads to (we think) a counterintuitive semantics, that anyway contradicts the semantics by grammars provided in the same article.
3. In [6], Czarnecki et al. define a new FD language to account for staged configuration. They introduce
feature cardinality (the number of times a feature can be repeated in a product) in addition to the
more usual (group) cardinality. Foremost, a new semantics is proposed where the full shape of
the unordered tree is important, including repetition and decomposable features. The semantics is
defined in a bulky 4-stage process where FD are translated in turn into an extended abstract syntax,
a context-free grammar and an algebra. In [5], the authors provide an even richer syntax. The
semantics of the latter is yet to be defined, but is intended to be similar to [6].
In general, we can argue that related approaches do not rank as well as ours on generality, abstraction and intuitiveness. For some approaches [21, 1] the semantics is tool-driven, while on the contrary tools should be built according to a carefully chosen semantics. For the others, we could not find a justification. Our approach is justified by our goals: make fundamental semantic issues of FD languages explicit in order to study their properties and rigorously evaluate them before adopting them or implementing CASE tools. A finer comparison of the semantic options taken in the aforementioned related approaches with respect to ours is a topic for future work.
4 Conclusion
In this paper, we have recalled standard, well-defined criteria to evaluate FD variants. This process produces VFD, a FD language that is fully expressive, not harmfully redundant, that can embed all other known variants, and is linearly as succinct. Our formalization is generic and decides concisely every detail of FD.
References
[1] Don S. Batory. Feature Models, Grammars, and Propositional Formulas. In Obbink and Pohl [17],
pages 7–20.
[2] Yves Bontemps, Patrick Heymans, Pierre-Yves Schobbens, and Jean-Christophe Trigaux. Semantics
of feature diagrams. In Tomi Männistö and Jan Bosch, editors, Proc. of Workshop on Software Variability Management for Product Derivation (Towards Tool Support), Boston, August 2004. available
at http://www.info.fundp.ac.be/˜ybo.
[3] Paul C. Clements and Linda Northrop. Software Product Lines: Practices and Patterns. SEI Series
in Software Engineering. Addison-Wesley, August 2001.
[4] Sholom Cohen, Bedir Tekinerdogan, and Krzysztof Czarnecki. A case study on requirement specification: Driver Monitor. In Workshop on Techniques for Exploiting Commonality Through Variability
Management at the Second International Conference on Software Product Lines (SPLC2), 2002.
[5] Krzysztof Czarnecki, Simon Helsen, and Ulrich Eisenecker. Staged Configuration Using Feature
Models. Software Process Improvement and Practice, special issue on Software Variability: Process
and Management, 10(2):143 – 169, 2005.
[6] Krzysztof Czarnecki, Simon Helsen, and Ulrich W. Eisenecker. Formalizing cardinality-based feature
models and their specialization. Software Process: Improvement and Practice, 10(1):7–29, 2005.
[7] Ulrich W. Eisenecker and Krzysztof Czarnecki. Generative Programming: Methods, Tools, and
Applications. Addison-Wesley, 2000.
[8] Magnus Eriksson, Jürgen Börstler, and Kjell Borg. The PLUSS Approach - Domain Modeling with
Features, Use Cases and Use Case Realizations. In Obbink and Pohl [17], pages 33–44.
[9] J. P. Gibson. Feature requirements models: Understanding interactions. In Feature Interactions In
Telecommunications IV. IOS Press, 1997.
[10] M. Griss, J. Favaro, and M. d’Alessandro. Integrating Feature Modeling with the RSEB. In Proceedings of the Fifth International Conference on Software Reuse, pages 76–85, Vancouver, BC, Canada,
June 1998.
[11] Martin L. Griss. Implementing Product-Line Features with Component Reuse. In ICSR-6: Proceedings of the 6th International Conference on Software Reuse, pages 137–152. Springer-Verlag, 2000.
[12] D. Harel and B. Rumpe. Modeling Languages: Syntax, Semantics and All That Stuff, Part I: The
Basic Stuff. Technical Report MCS00-16, Faculty of Mathematics and Computer Science, The Weizmann Institute of Science, 2000.
[13] D. Harel and B. Rumpe. Meaningful Modeling: What’s the Semantics of “Semantics”? IEEE Computer, 37(10):64–72, October 2004.
[14] Dirk Janssens and Grzegorz Rozenberg. On the structure of node label controlled graph languages.
Inform. Sci., 20:191–244, 1980.
[15] K. Kang, S. Cohen, J. Hess, W. Novak, and S. Peterson. Feature-Oriented Domain Analysis (FODA)
Feasibility Study. Technical Report CMU/SEI-90-TR-21, Software Engineering Institute, Carnegie
Mellon University, November 1990.
[16] Kyo C. Kang, Sajoong Kim, Jaejoon Lee, and Kijoo Kim. FORM: A Feature-Oriented Reuse Method.
In Annals of Software Engineering 5, pages 143–168, 1998.
[17] J. Henk Obbink and Klaus Pohl, editors. Software Product Lines, 9th International Conference,
SPLC 2005, Rennes, France, September 26-29, 2005, Proceedings, volume 3714 of Lecture Notes in
Computer Science. Springer, 2005.
[18] Klaus Pohl, Gunter Bockle, and Frank van der Linden. Software Product Line Engineering: Foundations, Principles and Techniques. Springer, July 2005.
[19] M. Riebisch. Towards a More Precise Definition of Feature Models. Position Paper. In: M. Riebisch,
J. O. Coplien, D. Streitferdt (Eds.): Modelling Variability for Object-Oriented Product Lines, 2003.
[20] Matthias Riebisch, Kai Böllert, Detlef Streitferdt, and Ilka Philippow. Extending Feature Diagrams
with UML Multiplicities. In Proceedings of the Sixth Conference on Integrated Design and Process
Technology (IDPT 2002), Pasadena, CA, June 2002.
[21] A. van Deursen and P. Klint. Domain-Specific Language Design Requires Feature Descriptions, 2002.
[22] Jilles van Gurp, Jan Bosch, and Mikael Svahnberg. On the Notion of Variability in Software Product
Lines. In Proceedings of the Working IEEE/IFIP Conference on Software Architecture (WICSA’01),
2001.
Preliminary results from an investigation of software evolution in industry
Odd Petter N. Slyngstad, Anita Gupta, Reidar Conradi, Parastoo Mohagheghi,
Thea C. Steen, Mari T. Haug
Department of Computer and Information Science (IDI)
Norwegian University of Science and Technology (NTNU)
{oslyngst, anitaash, conradi, parastoo} at idi.ntnu.no
Harald Rønneberg, Einar Landre
Statoil KTJ/IT
Forus, Stavanger
{haro, einla} at statoil.com
Abstract
In the SEVO (Software EVOlution) project we explore the field of software evolution in terms of software quality attributes, their characteristics and possible relations between them. Currently, we have explored preliminary data from a software engineering program in a Norwegian company, Statoil ASA, on the frequency of defects and changes of reused components. These measures are the stated quality foci of the program, and our results indicate that while defect-density evolves decreasingly over time, change-density does not exhibit a conclusive behavior. This is part of ongoing research, and the results will be expanded and verified in later publications. Overall, we aim to use the collected data towards discovering and explaining characteristics related to software evolution.

Keywords: CBSE, software evolution, quality attributes
1 Introduction
The purpose of the SEVO (Software EVOlution) project [SEVO, 2004] is to explore software evolution in Component-Based
Software Engineering (CBSE) through empirical research. Aiming to increase our knowledge and understanding of
underlying issues and challenges in software evolution, one long-term purpose of the project is to provide possible solutions
to these problems. Another goal is to help industrial software engineers to improve their efficiency and cost-effectiveness in
developing software based on reusable components, as well as in their ability to develop and use reusable assets. Underlying
all this is the need for evidence to support or reject existing and proposed hypotheses, models, design decisions, and the like.
Such evidence is best obtained through performing empirical studies in the field, and experience from such studies can then be incorporated into a knowledge base for use by the community.
Currently, we are studying the reuse process in the IT-department of a large Norwegian Oil & Gas company named Statoil
ASA1 and collecting quantitative data on reused components. The research questions are obtained from the existing
literature, and include how the defect-density in reusable components evolves over time, as well as how the number of
changes per reusable component evolves over time. Based on these issues, we have defined and explored several research
questions and hypotheses through an empirical study. Here, we perform a preliminary analysis of data on defect-density and
change-density of reusable components from a software engineering program in Statoil ASA, a major international petroleum
company. We have chosen these two attributes for measuring software quality as they are part of the stated quality focus for
the program in Statoil ASA. The purpose of this study is to gain initial understanding of software evolution from the
viewpoint of these quality attributes.
The number of change requests and trouble reports is rather small, and future studies will be used to refine and
further investigate the research questions and hypotheses presented here. This study is therefore a pre-study. This paper is
structured as follows: Section 2 introduces terminology, Section 3 discusses our contribution to Statoil ASA, as well as the
research context at the company. Furthermore, Section 4 introduces our research questions and preliminary data analysis, and
1 ASA stands for “allmennaksjeselskap”, meaning Incorporated.
Section 5 summarizes and discusses these preliminary results. Section 6 contains planning for further data collection and
future work, while Section 7 concludes.
2 Terminology
CBSE is a new style of software development, emphasizing component reuse, which involves the practices needed to perform
component-based development in a repeatable way to build systems that have predictable properties [Bass et al., 2001]. An
important goal of CBSE is that components provide services that can be integrated into larger, complete applications.
Software evolution can be defined as: “….the dynamic behaviour of programming systems as they are maintained and enhanced over their life times….” [Kemerer & Slaughter, 1999]. The first studies found in literature on software evolution were undertaken by Lehman on an OS360 system at IBM [Lehman et al., 1985]. Software evolution is closely related to software reuse, since reuse is often employed to achieve the aforementioned positive effects when evolving a system. It should be noted that several alternative uses of the term software evolution exist; some use the term to encompass both the initial development of the system and its subsequent maintenance, while others use it exclusively about the events after initial implementation, in concurrence with its original focus [Kemerer & Slaughter, 1999]. Lastly, there is some work on software evolution taxonomy [Verhoef, 2004], where the author sees software maintenance as a subpart of software evolution.
Software maintenance is the updating incurred on already existing software in order to keep the system running and up to date. During their lifetime, software systems usually need to be changed to reflect changing business, user and customer needs [Lehman, 1974]. Other changes occurring in a software system’s environment may emerge from undiscovered errors during system validation, requiring repair, or when new hardware is introduced.
Software maintenance can hence be:
• corrective (correcting faults),
• preventive (to improve future maintainability),
• adaptive (to accommodate alterations related to platform or environment), or
• perfective (in response to requirements changes or additions, as well as enhancing the performance of a system)
[Sommerville, 2001] [Pressman, 2000].
In summary, some see the perfective and adaptive parts of software maintenance as part of software evolution [Sommerville, 2001]. That is, it encompasses both aspects of modified and added scope, as well as environmental adaptations. This
does not include platform changes, which are commonly referred to as porting, instead of software evolution [Frakes & Fox,
1995]. There is, hence, no clear agreement on the definition of software evolution. Although there seems to be more
agreement on the definition of the different types of software maintenance, a clear distinction between software maintenance
and software evolution remains elusive.
Statoil ASA [Statoil ASA O&S Masterplan, 2006] has chosen to use defect-density and change-density (stability) as indicators of
software quality. A lowered defect-density indicates increased quality, while stability in terms of change-density means that a stable level of resources is needed for adaptation and perfection of the software. In this study, it is these two measures
(defect-density and change-density) we will be focusing on, in order to show how the reusable components evolve over time.
3 Our contribution to Statoil ASA and the context
Our direct contribution is helping Statoil ASA central software development unit in Norway with defining metrics, collecting
data and analyzing it. We will also be contributing towards reaching a better understanding and management of software
evolution, by exploring whether the employment of reusable components can lead to better system quality2. Finally, we
expect that our results can be used as a baseline for comparison in future studies on software evolution.
Statoil ASA is a large, multinational company, in the oil & gas industry. It is represented in 28 countries, has a total of about
24,000 employees, and is headquartered in Europe. The central IT-department in the company is responsible for developing
and delivering software, which is meant to give key business areas better flexibility in their operation. They are also
responsible for operation and support of IT-systems at Statoil ASA. This department consists of approximately 100
developers worldwide, located mainly in Norway. Since 2003, a central IT strategy of the O&S (Oil Sales, Trading and
Supply) business area has been to explore the potential benefits of reusing software systematically, in the form of a
framework based on JEF (Java Enterprise Framework) components. This IT strategy was started as a response to the
changing business and market trends, and in order to provide a consistent and resilient technical platform for development
2 The quality focus at Statoil ASA is defect-density and change-density (stability).
and integration [14]. The strategy is now being propagated to other divisions within Statoil ASA. The JEF framework itself
consists of seven different components. Table 1 gives an overview of the three JEF releases, and the size of each component
in the three releases.
Table 1: Size of JEF components in LOC

Component         Release 2.9   Release 3.0   Release 3.1
JEF Client        7871          8400          8885
JEF Dataaccess    181           181           268
JEF Integration   958           958           958
JEF Security      1588          1593          2374
JEF Util          1312          1359          1647
JEF Workbench     4187          4515          4748
Release 2.9 spanned the time between 09.11.2004 - 14.06.2005, while Release 3.0 spanned the time between 15.06.2005 - 09.09.2005 and Release 3.1 spanned the time between 10.09.2005 - 18.11.2005.
These JEF components can either be applied separately or together when developing applications. In total, we will be
studying the architectural framework components, as well as two projects which use this framework. Here, we present a pre-analysis, reporting on preliminary results of studying defect-density and stability (change rate) of 6 of the 7 reusable
architectural framework components, over three releases. These three releases exist concurrently, and the data is mainly from
system/integration tests. The limited dataset used in this preliminary analysis is due to current data availability.
4 Research questions and preliminary data analysis
All the statistical data presented in this study are based on valid data, as no data were missing. The statistical analysis tools
we used were SPSS version 14.0 and Microsoft Excel 2003. Our preliminary research questions are regarding defect-density
and stability, and are formulated as follows:
RQ1: How does the defect-density in reusable components evolve over time?
Defect-density (number of Trouble Reports/KLOC) may be seen as belonging to the corrective maintenance category by
some researchers, but maintenance can also be seen as part of evolution [Verhoef, 2004]. Therefore, measuring defect-density may help characterize the evolution of the different JEF components over time. The following are the related
hypotheses for RQ1:
• H10: The defect-density in JEF components does not change with time.
• H1A: There is a difference in defect-density for JEF components over time.
RQ2: How does the number of changes per reusable component (stability) evolve over time?
Research has demonstrated that reusable components are more stable (have a lower change-density) and that this does improve
with time [Mohagheghi et al., 2004]. We have chosen to use change-density (number of Change Requests/ KLOC) as an
indication of the stability, as this is the defined quality focus of Statoil ASA. The following are the related hypotheses for
RQ2:
• H20: The change-density in JEF components does not change with time.
• H2A: There is a difference in change-density for JEF components over time.
RQ1: How does the defect-density in reusable components evolve over time?
For RQ1 we want to see how the defect-density in JEF components evolves over time, so we decided to use an ANOVA test, as this is suitable for comparing the mean defect-density between the three releases. With this test, we wanted to investigate whether there is a difference in defect-density for JEF components. To investigate this research question, all submitted defects for each component were counted, per release. We then calculated the defect-density as the number of trouble reports (TRs) divided by kilo lines of code (KLOC) for each component. Table 2 shows the results of this calculation for three releases, all involving major changes to the software components.
Table 2: Defect-density per JEF component in TR/KLOC

Component         Release 2.9   Release 3.0   Release 3.1
JEF Client        17.1516       1.5476        0.1190
JEF Dataaccess    11.0497       0.0000        0.0000
JEF Integration   3.1315        0.0000        0.0000
JEF Security      5.6675        0.6277        0.0000
JEF Util          1.5244        0.0000        0.0000
JEF Workbench     3.8214        0.8859        0.2106
Here, we want to test if there is a significant difference in the mean-values of the different releases, which we are using as
groups in the analysis. Table 3 shows that the average defect-density decreases with time. The significance level is 0.05, and
the data was checked for normality.
Table 3: Average defect-density per release

Groups        Mean
Release 2.9   7.058
Release 3.0   0.510
Release 3.1   0.055
The ANOVA test we performed yielded an F0 value of 7.749, and the critical value was computed to be F0.05, 2, 15 = 3.682, with a P-value of 0.0049. Since 7.749 > 3.682, it is possible to reject the null hypothesis. In summary, we can reject H10 in favour of our alternative hypothesis H1A, and hence support the notion that the defect-density decreases with time.
The data trend for RQ1 reveals a declining defect-density, possibly caused by a corresponding decrease in change-density.
We will, however, be expanding and verifying our hypothesis on defect-density with more empirical data in future work.
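For transparency, the one-way ANOVA for RQ1 can be recomputed directly from the defect-density values of Table 2. The short Java sketch below is our own illustration (the study itself used SPSS); it reproduces the group means of Table 3 and an F0 value close to the reported 7.749.

public class DefectDensityAnova {
    public static void main(String[] args) {
        double[][] groups = {
            {17.1516, 11.0497, 3.1315, 5.6675, 1.5244, 3.8214},  // Release 2.9
            {1.5476, 0.0000, 0.0000, 0.6277, 0.0000, 0.8859},    // Release 3.0
            {0.1190, 0.0000, 0.0000, 0.0000, 0.0000, 0.2106}     // Release 3.1
        };
        int k = groups.length, n = 0;
        double grandSum = 0;
        double[] means = new double[k];
        for (int i = 0; i < k; i++) {
            for (double v : groups[i]) { means[i] += v; grandSum += v; n++; }
            means[i] /= groups[i].length;
        }
        double grandMean = grandSum / n;
        double ssBetween = 0, ssWithin = 0;
        for (int i = 0; i < k; i++) {
            ssBetween += groups[i].length * Math.pow(means[i] - grandMean, 2);
            for (double v : groups[i]) ssWithin += Math.pow(v - means[i], 2);
        }
        // F0 = mean square between groups / mean square within groups.
        double f = (ssBetween / (k - 1)) / (ssWithin / (n - k));
        System.out.printf("Group means: %.3f  %.3f  %.3f%n", means[0], means[1], means[2]);
        System.out.printf("F0 = %.3f%n", f);   // approximately 7.75
    }
}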
RQ2: How does the number of changes per reusable component (stability) evolve over time?
For RQ2 we want to see how the number of changes per JEF component evolves over time, so we again decided to use an ANOVA test, as this is suitable for comparing the mean change-density between the three releases. With this test, we wanted to investigate whether there is a difference in change-density for JEF components. To investigate this research question, all change requests were sorted according to JEF component and then counted, per release. We then calculated the change-density as the number of change requests (CRs) divided by kilo lines of code (KLOC) for each component. Change requests
in this context mean new or changed requirements. Table 4 shows the results of this calculation.
Table 4: Change-density per JEF component (in CR/KLOC)

Component         Release 2.9   Release 3.0   Release 3.1
JEF Client           13.4672        0.8333        0.2251
JEF Dataaccess        0.0000        0.0000       11.1940
JEF Integration       3.1315        1.0438        0.0000
JEF Security          9.4458        1.8832        0.6072
JEF Util              4.5732        0.7358        0.0000
JEF Workbench         8.3592        1.1074        0.0000
Here too, we decided to use an ANOVA test, to see if there was a significant difference in the mean-values of the different
releases. Table 5 shows the variation in mean change-density over time. The significance level is 0.05, and the data were
checked for normality.
Table 5: Average change-density per release

Groups         Mean
Release 2.9    6.496
Release 3.0    0.934
Release 3.1    2.004
As seen from Table 5, Release 3.0 has a lower change density than Release 3.1, indicating that the change-density may not
simply decrease with time. The ANOVA test we performed yielded an F0 value of 3.540, and the critical value was computed to be F0.05, 2, 15 = 3.682, with a P-value of 0.055. Since 3.540 < 3.682, it is not possible to reject the null hypothesis.
In summary, we cannot reject H20 in favour of our alternative hypothesis H2A.
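Written out as the usual decision rule (merely restating the numbers above):

    F_0 = 3.540 < F_{0.05;\,2,\,15} = 3.682 \quad\text{and}\quad p = 0.055 > \alpha = 0.05 \ \Rightarrow\ \text{H2}_0\ \text{cannot be rejected.}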
Nevertheless, upon inspection of the data from Table 4, we see that for all components the change-density is lower in the following release, except for JEF Dataaccess. In fact, the value JEF Dataaccess has in Release 3.1 differs considerably compared to the other results. This may have specific explanation(s), which will be explored in later analysis.
Summary and Discussion of preliminary results
In Table 6, we have summarized our analysis results, along with corresponding research questions and hypotheses.
Table 6: Summary of the results

Research Question   Hypothesis                                                                    Result
RQ1                 H10: The defect-density in JEF components does not change with time           Rejected
RQ1                 H1A: There is a difference in defect-density for JEF components over time     Not rejected
RQ2                 H20: The change-density in JEF components does not change with time           Not rejected
RQ2                 H2A: There is a difference in change-density for JEF components over time     Not rejected
On change-density, the data indicate a decrease over time for five of the six components investigated. However, we are
unable to conclude without further empirical data and analysis. When it comes to defect-density, our results indicate a
distinct difference over subsequent releases of the JEF components. The data trend here is towards a sharp decrease.
Additional trends in the size data vs. the data on change-density and defect-density (e.g., that some of the components have zero change-density while their code size still shows an increase, or that some have high change-density but zero defect-density) will be investigated further with more empirical data in future work. An additional possible relationship to be explored is whether large increases in change-density affect defect-density negatively, though such an effect is not indicated in the data from our preliminary analysis.
Lower defect-density means that fewer corrections are needed, and thereby a higher quality level is achieved for the reusable JEF components. When it comes to change-density, stability is important to achieve stable evolution, and hence allows stable resources to be assigned to adapt and perfect the reusable JEF components. In this way, these quality attributes can be
used to partially model evolution, as they show how the quality of the reusable JEF components evolves over time.
Threats to validity
We here discuss the possible threats to validity in our study, using the definitions provided by [Wohlin, 2002]:
Construct Validity: The metrics we have used (defect-density and change-density) are thoroughly described and used in
literature. Nevertheless, our definition and use of the term change-density is different from that in other studies. All our data
are of pre-delivery change requests and trouble reports from the development phases for the three releases of the reusable
components.
External Validity: The object of study is a framework consisting of only seven components, and the data has been collected
for 3 releases of these components. Our results should be relevant and valid for other releases of these components, as well
as for similar contexts in other organizations.
Internal Validity: All of the change requests and trouble reports for the JEF components have been extracted from Statoil ASA by us. Incorrect or missing data details may exist, but these are not related to our analysis of defect-density and change-density. We have performed the analysis jointly with the Microsoft Excel and SPSS tools.
Conclusion Validity: This analysis is performed based on an initial collection of data. This data set of change requests and
trouble reports should nevertheless be sufficient to draw relevant and valid conclusions.
Planning for further data collection and Future work
So far, Statoil ASA has collected data on Trouble Reports (TR’s) and Change Requests (CR’s) for the reusable JEF
components over several releases. They are also going to collect data on TR’s and CR’s for systems developed with the JEF
components – so far two systems are reusing JEF components in development. Further releases of JEF components will also
follow, and data will be collected on these.
In this article we have seen that there are differences in defect-density and change-density over subsequent releases, but we have not yet analysed these differences or their causes. In further work we will explore these issues in more detail, as well as the possible cause-effect relation between defect-density and change-density and the relation to other quality attributes. The focus may also be broadened to encompass reuse and maintenance, in addition to evolution.
Conclusions
We have performed a preliminary investigation of how the quality attributes defect-density and change-density evolve over
time for reusable components. While prior research has shown reusable components to be more stable (having a lower code
modification rate) across releases [Mohagheghi et al., 2004], change density as defined in this context has not been studied
before.
The overall results from our study are:
- For RQ1 ("How does the defect-density in reusable components evolve over time?"), our results show a clear difference over releases of the JEF components. The data trend here shows a sharp decline.
- On RQ2 ("How does the number of changes per reusable component evolve over time?"), our investigation of change-density shows that we cannot conclude without further data and analysis. However, the general trends in the data indicate that the change-density does decrease with time for five of the six components investigated.
In particular, lower defect-density results in fewer corrections being needed, hence yielding a higher level of quality of the reusable components. Such a reduction is expected if components undergo few changes between releases; in our dataset, for example, some components have zero change-density between releases. A stable change-density is a factor towards allowing a stable evolution, so that resources used in adapting and perfecting the reusable components can be better allocated. We thus see that evolution can be partially modelled by looking at defect-density and change-density, as they show how the quality of the reusable components evolves over time. Our results cannot currently support these thoughts to the full extent: the results of studying defect-density show a decline, but the results from studying change-density so far cannot be used to conclude, despite the apparent trends in the data. We presume that more empirical data will remedy this problem.
The SEVO project is ongoing research, and this paper presents preliminary results; more complete results will follow.
We ultimately aim to look at how to reach higher quality, by demonstrating that understanding and managing software
evolution can lead to better system quality.
Acknowledgement
This work has been done as a part of the SEVO project (Software EVOlution in component-based software engineering), an
ongoing Norwegian R&D project from 2004-2008 [SEVO project, 2004-2008], and as a part of the first and second authors’
PhD study. We would like to thank Statoil ASA for the opportunity to be involved in their reuse projects.
References
[SEVO, 2004] The SEVO project, NTNU, Trondheim, 2004–2008, http://www.idi.ntnu.no/grupper/su/sevo/.
[Bass et al., 2001] L. Bass, C. Buhman, S. Comella-Dorda, F. Long, J. Robert, R. Seacord, K. Wallnau, Volume I: Market Assessment of Component-based Software Engineering, SEI Technical Report CMU/SEI-2001-TN-007 (http://www.sei.cmu.edu/).
[Lim, 1994] W. C. Lim, Effect of Reuse on Quality, Productivity and Economics, IEEE Software, 11(5):23-30, Sept./Oct. 1994.
[Mohagheghi et al., 2004] P. Mohagheghi, R. Conradi, O. M. Killi, H. Schwarz, An Empirical Study of Software Reuse vs. Defect-Density and Stability, in Proc. 26th Int'l Conference on Software Engineering (ICSE'2004), 23-28 May 2004, Edinburgh, Scotland, pp. 282-291, IEEE-CS Press Order Number P2163.
[Morisio, Ezran, Tully, 2002] M. Morisio, M. Ezran, C. Tully, Success and Failure Factors in Software Reuse, IEEE Transactions on Software Engineering, 28(4):340-357, April 2002.
[Frakes & Fox, 1995] W. B. Frakes and C. J. Fox, Sixteen Questions About Software Reuse, CACM, 38(6):75-87, June 1995.
[Kemerer & Slaughter, 1999] C. F. Kemerer and S. Slaughter, An empirical approach to studying software evolution, IEEE Transactions on Software Engineering, 25(4), 1999.
[Lehman et al., 1997] M. M. Lehman, J. F. Ramil, P. D. Wernick, D. E. Perry, W. M. Turski, Metrics and Laws of Software Evolution – The Nineties View, Proc. Fourth Int. IEEE Symp. on Software Metrics (Metrics 97), Albuquerque, New Mexico, 5-7 Nov. 1997, pp. 20-32.
[Postema, Miller, Dick, 2001] M. Postema, J. Miller and M. Dick, Including Practical Software Evolution in Software Engineering Education, 14th IEEE Conference on Software Engineering Education and Training (CSEET'01), 19-21 February 2001, Charlotte, North Carolina, USA, pp. 127-135.
[Sommerville, 2001] I. Sommerville, Software Engineering, Sixth Edition, Addison-Wesley, 2001.
[Statoil ASA O&S Masterplan, 2006] O&S Masterplan at Statoil ASA, http://intranet.statoil.no
[Pressman, 2000] R. S. Pressman, Software Engineering: A Practitioner's Approach, Fifth Edition, McGraw-Hill, 2000.
[Verhoef, 2004] Chris Verhoef, Software Evolution: A Taxonomy, http://www.swebok.org/stoneman/version_0.1/KA_Description_for_Software_Evolution_and_Maintenance(version_0_1).pdf
Semantically sane component preemption
Yves Vandewoude∗ and Yolande Berbers
KULeuven Department of Computer Science
Celestijnenlaan 200A
B-3001 Heverlee, Belgium
{yves.vandewoude, yolande.berbers}@cs.kuleuven.ac.be
Abstract
One of the main advantages commonly attributed to component-oriented applications is their modular structure and their ability to dynamically evolve and adapt to their environment. However, in order to replace a component at runtime, this component must first be placed in a quiescent state. Reaching this state is problematic for a number of components that have long-lasting activities. Such activities are either inherently complex, or are repetitions of simpler tasks. In this paper we argue that the latter category can easily be preempted in a semantically correct fashion by separating the iteration logic from the rest of the implementation and by emulating the iteration in the middleware. This allows a middleware platform to terminate the execution of a component in a semantically correct fashion so that runtime replacement is possible.

1 Introduction

In the current situation of an ever shortening time to market, maintenance of software after it has been shipped to the client is by far the largest cost-contributor to a software product. With over 20% of the initial specifications of a software product considered outdated within a year after development, keeping software up-to-date is a major undertaking that affects users and programmers alike. Live updates, which change an application without shutting it down, further increase the complexity of the evolution problem, since there are considerably more constraints on a running system: updates must complete in a short time-frame, must deal with the state contained in the active application, and consistency must be preserved during and after the change.

Over the last decade, attempts have been made to address these issues with component-based development. An application is modularized into a number of loosely coupled units: components. Many different definitions exist in the literature, and a discussion of even the general trends in component-oriented software development is outside the scope of this paper. For clarity, however, we use the definition by Szyperski [1]:

A software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties.

Szyperski's definition implies that all communication and data-exchange is explicitly specified in the interface of a component. As such, encapsulation of the internal structure of a component is absolute.

However, in practice, some of the properties which make components ideal for runtime evolution conflict with the desire to implement a certain level of autonomy in a component. Especially long-lasting activities often undermine the ability to easily halt the execution of a component. It is this conflict between the non-reactive behavior of a component and the ability to easily deactivate it for replacement which is the focus of this paper.

In section 2, we first identify the main properties that make components so well suited for dynamic evolution (section 2.1), provide an in-depth description of the problem of halting some components, and explain how this problem hinders dynamic evolution (section 2.2). We identify the presence of long-lasting tasks as the cause of this problem and analyze why such tasks are often used. We distinguish between two categories that require a different approach from the point of view of live updates: tasks that are inherently complex and tasks that are merely repetitions of another task. In section 3 we then propose our solution, which uses code rewriting by a preprocessor so that the resulting component can be safely and easily preempted with minimal assistance of the component developer.

* Authors funded by a doctoral scholarship of the "Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT Vlaanderen)".
2 Components and Software Evolution

2.1 Components: Ideal for evolution?
Szyperski’s definition does not explicitly state how a
component is to be implemented, nor how components are
to interact. In our own research, we have developed the
Draco component language and middleware platform. In
this component system, components are implemented using the object-oriented programming language Java. As
such, the implementation of a component (the component
blueprint) is composed of one or more logically coherent
classes, and instances of such a component (which we simply refer to as component for the remainder of this paper)
can be considered as a tightly coupled group of objects,
which we will call the object-tree of the component.
Interconnection between components in our work is
achieved by means of connectors. According to [2], a connector is a reusable design element that supports a particular style of component interactions. Our work assumes the
interaction style that was defined in the SEESCOA project
([3, 4]). In this model, components communicate by asynchronously sending messages through external interfaces
that are formally specified using ports. Connectors implement a pipe-like construct, which makes relaying intercomponent communication relatively easy to achieve at runtime.
Although at various points in this paper the Draco component language is used in code snippets and illustrations,
neither the methodology nor its implementation is the focus of this paper and we refer interested readers to [5, 6]
for more detailed information. We also wish to stress that
the concepts are more generic in nature and that they can be
applied to other component systems as well.
From the point of view of dynamic software evolution,
components are excellently suited. The reason for this is
twofold. First and foremost, components encapsulate a logical and coherent unit of functionality. This makes them
an excellent target for a functional change to an application. Many common changes, such as bug fixes, remain within
the scope of a single component. Secondly, since all communication between components is made explicit through
connectors, it is much easier to interact with this communication. The presence of first-class connectors does not
only allow for relatively easy reconfiguration of the component composition that makes up the application, but also
eases the task of temporarily deactivating a portion of the
application which is required before that piece can be safely
modified.
dracocomponent GrayScaleConvertor
{
    portgroup In 1
    {
        inmessage Image
        {
            newmessage Image reply;
            reply::data = getGrayScale((Image) $$inmessage::data);
            Out..reply;
        }
    }

    portgroup Out 1
    {
        outmessage Image;
    }

    private Image getGrayScale(Image source)
    {
        // Conversion logic
    }
}

Figure 1. The GrayScaleConvertor in Draco notation. (In the Draco notation, .. is the message-send operator and :: stands for parameter retrieval.)
2.2 Proactive vs. Reactive Components
It was shown by Kramer and Magee in [7] that in order to safely replace a component at runtime, it must first
achieve quiescence. Summarized, this entails that the component is not currently engaged in a transaction and that this
state of inactivity is guaranteed to last for the duration of the
update. Since components only interact with their peers or
their environment through their explicitly specified ports,
the process of migrating an active component towards this
state of quiescence is significantly less complex than in the
case of procedural or object-oriented systems where functionality of the component can be called or referenced from
many different and unknown locations throughout the program.
In principle, it suffices that the component environment
can:
(i) Intercept and temporarily halt all communications with
the component under consideration.
(ii) Detect whether or not the component is currently engaged in an activity.
In practice however, things are usually more complex,
and simply halting external requests to the component is
only sufficient when the component is purely reactive in nature. For instance, assume a component that converts color
images into their grayscale equivalents. The source code of such a component in the Draco component methodology is shown in Figure 1. The specification of this component is
extremely simple: it has two ports (In and Out) which
can be connected with other components in a composition.
Whenever a message Image arrives on its In port, the
component creates a grayscale image and sends it out on
its Out port. It is clear that such a component can be easily
deactivated: each action is purely reactive, and lasts for a
relatively short period. If all messages to the In port are
intercepted, and the component is not busy converting an
image from a previous request, the component is in a quiescent state and can be replaced.
However, in many cases, the execution of a component
may require a long period of time. Depending on how the
underlying component system implements message delivery, this may either cause the component to hijack an execution thread from the middleware system for a very long
period of time, or the component may have received or created a custom thread in which the message execution is carried out. There are two important reasons why the execution
of a message may take up too much time to fit in the purely
reactive model of computation described above.
dracocomponent Camera
{
    portgroup Control 1
    {
        inmessage StartLongBroadCast
        {
            for (int i=0; i<5000000; i++)
            {
                newmessage Image message;
                message::data = grabImage();
                Out..message;
            }
        }
    }

    portgroup Out 1
    {
        outmessage Image;
    }

    private Image grabImage()
    {
        // Grab image routine
    }
}

Figure 2. A camera component with infinite repetition.
(i) The nature of the task is inherently complex or computation intensive.
(ii) The task is a long iteration of a simpler task.
This distinction is important from the perspective of dynamic evolution. In the first case, the only solution is to wait
until the component has finished its computation. After all,
forcibly terminating the execution of the message may leave the component in an inconsistent state for which
there may very well exist no equivalent in the new version.
The second category, however, can semantically be interrupted between different executions of the same iteration.
An excellent example of components in this category is
content-generating components. Imagine a Camera component, for example, that grabs an image through a camera
and then sends out that image through its output port. After completion, it repeats this action until some condition
which is specified by the user is met. A trivial implementation of such a component is shown in Figure 2.
It is worth mentioning that simply waiting until the execution of such a repetitive action terminates is not a viable
solution in order to achieve quiescence. Unlike messages
whose implementation is inherently complex, repetitive actions may very well either never terminate, or have unpredictable termination conditions. Waiting for a camera component that is instructed to keep grabbing images forever is
not practical from a dynamic evolution point of view, especially since the nature of an iteration is theoretically well
suited for dynamic adaptation.
In this paper, we tackle this problem by extracting the
loop logic from the message implementation at design time.
As such, the middleware system regains control between
different iterations and can interrupt the execution if desired.
3 Addition of preemption points by extracting loop logic

3.1 Splitting an iteration in its parts
In the solution we propose, iterative messages are automatically split during the preprocessor stage into 5 different
methods:
1. The initialization of the iterative message
2. The stop condition of the iteration
3. The code in the body of the iteration
4. The update of the iteration (such as the increment of
counters)
5. The finalization of the iterative message
In order to execute such a transformation, the preprocessor must first be able to identify which messages are repetitive actions, and which messages merely contain iterations as a part of their normal behavior. We do not believe it is possible to make this distinction fully automatically; instead,
some hints must be given by the programmer. How the relevant iterations can be conveniently identified is of lesser importance. Our own implementation introduces a new construct (ifor) which is identical to the Java for-construct,
but specifies that the message in which the iteration occurs
is a repetitive action. A slightly modified version of our
Camera component, and the result after transformation² is shown in Figure 3.

(a) Original code:

dracocomponent Camera
{
    portgroup Control 1
    {
        inmessage StartLongBC
        {
            initCamera();
            ifor (int i=0; i<500000; i++)
            {
                newmessage Image message;
                message::data = grabImage();
                Out..message;
            }
            closeCamera();
        }
    }

    portgroup Out 1
    {
        outmessage Image;
    }
    ...
}

(b) After transformation:

public class Camera extends draco.core.Component
{
    public PortGroup $$Control = new PortGroup("Control", 1, ControlImpl.class, this);

    public class ControlImpl extends SinglePort {
        Message message;
        int StartLongBC_loopVar_i;

        public void initLoopStartLongBC(Message $$inmessage) {
            initCamera();
            StartLongBC_loopVar_i = 0;
        }
        public boolean testLoopStartLongBC(Message $$inmessage) {
            return (StartLongBC_loopVar_i < 500000);
        }
        public void executeLoopStartLongBC(Message $$inmessage) {
            message = new Message("Image");
            message.put("data", grabImage());
            $$Out.sendMessage(message);
        }
        public void updateLoopStartLongBC(Message $$inmessage) {
            StartLongBC_loopVar_i++;
        }
        public void finalizeLoopStartLongBC(Message $$inmessage) {
            closeCamera();
        }
        ...
    }
    ... // omitted for brevity
}

Figure 3. Loop extraction by the preprocessor.
The preprocessor then splits the code of the message into the 5 methods mentioned above, whose names are constructed by adding the prefixes initLoop, testLoop, executeLoop, updateLoop and finalizeLoop to the name of the original method.
The initLoop method contains all statements that are
executed before the iteration as well as the initialization
of the iteration itself. The testLoop, executeLoop
and updateLoop contain the stopcondition, the body
and the update of the iteration respectively. Finally, the
finalizeLoop contains all statements located after the
iteration.
Also illustrated by the example shown in Figure 3 is the
treatment of local variables. When the original message
contains locally declared variables, it is possible that these
are used by all 5 newly introduced methods. To allow this,
the definition of these variables is extracted and placed in
the scope directly above the original message: the scope of
the component port. Since a port may contain more than one
message that represents such an iteration and since these
messages may declare identical local variables, renaming is
required in order to avoid name clashes.
3.2 Runtime Execution and Behavior
At runtime, whenever the original message is sent to the
component, the middleware system attempts to execute the
message by invoking the associated method using reflection. Since no method with the original message name exists, the fallback mechanism searches for the initialization method. If such a method exists, the middleware system will mimic the iteration using the following algorithm:
1  initLoopMethod
2  result ← testLoopMethod
3  while (result && canContinue())
4  {
5      executeLoopMethod
6      updateLoopMethod
7      result ← testLoopMethod
8  }
9  finalizeLoopMethod
It is clear that the above execution is semantically
equivalent to the original message with the exception that
the stopcondition of the iteration (line 3) is only partially determined by the original criterion. The method
canContinue() is present at the level of the middleware
platform and should return true during normal operations.
However, the behavior of this method is dynamic and when
a component requires replacement, it suffices to change the
result returned by canContinue() for the message to
2 The resulting code generated by our preprocessor was manually edited
for brevity. Code specific for the Draco middleware platform and unrelated to the iteration transformation is omitted.
gently terminate its execution. This in turn would allow
the component to achieve quiescence within a reasonable
period of time.
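As an illustration of the mechanism described above, the sketch below shows how a middleware-level driver could mimic the iteration with reflective calls. The class and method names LoopDriver, requestQuiescence() and runIterativeMessage() are ours and not the actual Draco API; only the canContinue() hook and the five generated method names follow the text and Figure 3.

import java.lang.reflect.Method;

// Illustrative sketch only; not the Draco middleware implementation.
public class LoopDriver {
    private volatile boolean canContinue = true;

    // The middleware flips this flag when the component must reach quiescence.
    public void requestQuiescence() { canContinue = false; }

    public void runIterativeMessage(Object port, String messageName, Object inMessage)
            throws Exception {
        Class<?> type = port.getClass();
        Class<?>[] sig = { inMessage.getClass() };
        Method init   = type.getMethod("initLoop" + messageName, sig);
        Method test   = type.getMethod("testLoop" + messageName, sig);
        Method exec   = type.getMethod("executeLoop" + messageName, sig);
        Method update = type.getMethod("updateLoop" + messageName, sig);
        Method fini   = type.getMethod("finalizeLoop" + messageName, sig);

        init.invoke(port, inMessage);
        boolean result = (Boolean) test.invoke(port, inMessage);
        while (result && canContinue) {              // corresponds to line 3 of the algorithm
            exec.invoke(port, inMessage);
            update.invoke(port, inMessage);
            result = (Boolean) test.invoke(port, inMessage);
        }
        fini.invoke(port, inMessage);                // finalization always runs
    }
}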
3.3 Pros and Contras of suggested approach
From an evolution point of view, this technique is superior to simply pausing the execution of the message at
a predetermined location, which can be realized by many
alternative approaches. For instance, a call to the middleware platform could be inserted at the end of each iteration
cycle. The technique presented in this paper is superior in
that the execution is actually terminated, not just paused.
As such, no inconsistent state or method stack is left behind
that needs to be dealt with during the replacement phase of
a component update.
Another advantage is that the method is terminated in a
semantically correct way by allowing the programmer to indicate which messages are long-lasting iterations of a more fine-grained activity, and which iterations are to be left alone. Although complex and possibly long-lasting messages cannot be preempted if they are not repetitions of a
simple activity, stopping these messages might result in a
state from which a correct update is not possible. Furthermore, such messages seldom have infinite or undetermined
durations.
Nothing comes for free, and our technique has some disadvantages as well. The most important drawback of the
proposed technique is its overhead. Each cycle of the iteration requires 3 different reflective calls. This overhead is relatively more significant when the body of the iteration itself requires little computation. In theory, this drawback can be alleviated
by a more intelligent preprocessor that transforms the iteration into two nested loops and only splits up the outer loop.
However, it is far from trivial to automatically detect when
an iteration would qualify for this transformation. Relying
on the programmer for this activity would place a far greater
burden on the programmer than the current solution which
requires almost no changes to the code of a component.
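A minimal sketch of the nested-loop idea mentioned above (hypothetical; this is not produced by the current preprocessor, and CHUNK and the method names are ours): only the outer loop over chunks would be handed to the middleware, so the reflective overhead is paid once per chunk rather than once per cycle.

// Hypothetical illustration of the two-nested-loops optimisation.
public class ChunkedIterationSketch {
    private static final int CHUNK = 1000;        // assumed chunk size
    private volatile boolean canContinue = true;  // preemption flag, checked between chunks only

    public void run(int totalCycles) {
        for (int start = 0; start < totalCycles && canContinue; start += CHUNK) { // outer, preemptible loop
            int end = Math.min(start + CHUNK, totalCycles);
            for (int i = start; i < end; i++) {   // inner loop runs without middleware involvement
                doOneCycle(i);
            }
        }
    }

    private void doOneCycle(int i) {
        // body of the original iteration, e.g. grab one image and send it out
    }
}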
4 Conclusion

In this paper, we have presented a simple and straightforward solution that allows programmers to write components with long-lasting or even infinitely long activities of a repetitive nature (often encountered in content-generating components), while preserving an important property for dynamic software evolution: easy and semantically correct deactivation. Our solution requires the programmer to identify messages with a repetitive nature using a custom loop construct: ifor. This construct is automatically detected and converted using the preprocessor that splits up the code of the message into separate methods. At runtime, the middleware system mimics the iteration using subsequent reflective calls. Although semantically equivalent to the original message, the stopcondition of the main iteration can now easily be modified if so desired, in order to terminate the message and achieve quiescence.

References
[1] Clemens Szyperski. Component Software: Beyond
Object-Oriented Programming, 2nd edition. Addison-Wesley and ACM Press, 2002.
[2] Jonathan Aldrich, V. Sazawal, Craig Chambers, and
David Notkin. Language support for connector abstractions. Lecture Notes in Computer Science: ECOOP
2003 - Object Oriented Programming, 2743:74–102,
2003.
[3] Yolande Berbers, Peter Rigole, Yves Vandewoude, and
Stefan Van Baelen. Components and contracts in software development for embedded systems. In Proc. of
the first European Conference on the Use of Modern
Information and Communication Technologies, pages
219–226, 2004.
[4] David Urting, Stefan Van Baelen, Tom Holvoet, Peter
Rigole, Yves Vandewoude, and Yolande Berbers. A
tool for component based design of embedded software.
In Proceedings of Tools Pacific 2002, February 2002.
[5] Yves Vandewoude, Peter Rigole, David Urting, and
Yolande Berbers. Draco: An adaptive runtime environment for components. Technical Report CW372, Department of Computer Science, KULeuven, Belgium,
December 2003.
[6] Yves Vandewoude and Yolande Berbers. Supporting
runtime evolution in seescoa. Journal of Integrated
Design & Process Science: Transactions of the SDPS,
8(1):77–89, March 2004.
[7] J. Kramer and J. Magee. The evolving philosophers
problem: Dynamic change management. IEEE Transactions on Software Engineering, 16(11):1293–1306, November 1990.