The research work related to this thesis falls under three general categories: test case generation and prioritization, rate of fault detection effectiveness, and agent-based automation. The first two categories are not mutually exclusive, since work on test case prioritization has often used fault detection effectiveness as a means for evaluating and comparing prioritization techniques. The related work on test case generation and prioritization for the various operations present in the SUT, namely basic, database, web and web service operations, is discussed first, followed by the related work on rate of fault detection effectiveness and, finally, by the discussion on agent-based automation.
Test case prioritization techniques schedule test cases in an order that attempts to maximize some objective function. In the literature, there are several lines of research on test case generation and prioritization for basic operations.
Avritzer and Weyuker (1995) presented techniques for generating test cases for software that can be modeled by Markov chains, provided operational profile data were available. Although the authors did not use the term “prioritization”, their techniques generated test cases in an order that covered, earlier in testing, a larger proportion of the software states most likely to be reached in the field.
Rothermel et al (1999) developed a family of techniques for test case prioritization based on several coverage criteria and the probability of exposing known faults. They empirically examined these techniques and their relative abilities to improve how quickly faults can be detected during regression testing, and their results suggested that the techniques improved the rate of fault detection of test suites. Their results also had immediate practical implications: if code coverage-based techniques were already used in testing, they could be leveraged for additional gains through prioritization. They (2001) also provided a metric, Average Percentage of Faults Detected (APFD), that measures the average cumulative percentage of faults detected over the course of executing the test cases in a test suite in a given order. The APFD metric relies on the assumption that test costs and fault severities are uniform. In practice, however, test costs and fault severities can vary widely.
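The APFD metric has a standard closed form: for a suite of n test cases exposing m faults, APFD = 1 - (TF1 + TF2 + ... + TFm)/(nm) + 1/(2n), where TFi is the 1-based position of the first test case that reveals fault i. The following sketch computes it for a toy suite (the test names and fault matrix are invented for illustration):

```python
def apfd(order, detects):
    """Average Percentage of Faults Detected for a test ordering.

    order   -- list of test identifiers in execution order
    detects -- dict mapping each test id to the set of faults it reveals
    """
    n = len(order)
    faults = set().union(*detects.values())
    m = len(faults)
    # TF_i: 1-based position of the first test that exposes fault i
    tf = {}
    for pos, test in enumerate(order, start=1):
        for fault in detects[test]:
            tf.setdefault(fault, pos)
    return 1 - sum(tf[f] for f in faults) / (n * m) + 1 / (2 * n)

# Toy suite: t1 finds faults a and b, t2 finds c, t3 re-finds a only.
detects = {"t1": {"a", "b"}, "t2": {"c"}, "t3": {"a"}}
print(apfd(["t1", "t2", "t3"], detects))
print(apfd(["t3", "t2", "t1"], detects))
```

Running the two orderings yields approximately 0.72 and 0.50 respectively, confirming that placing the fault-revealing tests earlier raises the APFD value.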
Elbaum et al (2001) considered the coverage criterion at the function level. Their studies focused on version-specific TCPr, in which test cases were prioritized and the rate of fault detection was measured relative to a specific modified version of the program. They concluded that version-specific test case prioritization can produce statistically significant improvements in the rate of fault detection of test suites. The coarser analysis used by function-level techniques rendered them less costly and less intrusive than statement-level techniques. They (2001) also proposed a metric, APFDc, which considers test costs and fault severities in TCPr. This “cost-cognizant” metric was used in their study to overcome the limitation of APFD. They also proposed techniques for TCPr based on weighted function coverage, where the estimated fault proneness or severity served as the weight.
Kim and Porter (2002) studied the situation in which resource constraints do not allow the execution of the entire test suite, and proposed a technique based on the performance of each test case in prior testing, using exponential smoothing. Given a percentage (denoted as n), their technique selected and prioritized n% of the test cases in the entire test suite.
Srivastava and Thiagarajan (2002) proposed a technique that considers the changes between a version and its previous version, and uses the coverage of the code impacted by the changes to guide the prioritization.
Jones and Harrold (2003) considered a coverage criterion at a very
fine granularity, the Modified Condition/Decision Coverage (MC/DC). They
presented two new algorithms that account for MC/DC for reducing and
prioritizing test suites. An empirical study evaluating the test-suite reduction algorithm showed the potential for substantial test-suite size reduction with respect to MC/DC. The study also showed that the prioritization techniques can significantly reduce the cost of regression testing.
Hyunsook Do et al (2005) considered the coverage criterion at the block level and the method level for Java software. Their study investigated the practical impact of their results with respect to differences in delay values, relating them to the many cost factors involved in the regression testing and prioritization process. They also evaluated traditional techniques for TCPr in the context of time-aware TCPr.
Srikanth et al (2005) presented a system-level TCPr approach based on fixed weights specified by the user. The technique identified the important requirements that would increase customer-perceived software quality and ran the test cases related to those requirements earlier.
Tonella et al (2006) treated the TCPr problem as a machine learning problem. They proposed a new prioritization technique, based on a machine learning algorithm, which combined multiple prioritization indexes and utilized information extracted from user knowledge to prioritize test cases.
Walcott et al (2006) studied the problem of time-aware test case prioritization, which considers an explicit time budget and the difference in execution time of each test case. They proposed an approach based on a genetic algorithm and empirically compared it with the initial ordering, the reverse ordering and two control techniques. They also defined a new APFD metric for evaluating the effectiveness of prioritization in the time-constrained situation.
Li et al (2007) studied another greedy strategy, the 2-optimal strategy based on the k-optimal greedy algorithm, and two metaheuristic search strategies, namely the hill climbing strategy and a strategy using a genetic algorithm.
Alspaugh et al (2007) further studied the problem of time-aware test case prioritization and empirically compared seven knapsack solvers, namely random, greedy by ratio, greedy by value, greedy by weight, dynamic programming, generalized tabular, and the core, with or without scaling.
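The knapsack view of time-aware selection treats each test case as an item with a value and a cost. As a hedged illustration of the greedy-by-ratio heuristic only (the value measure, test names and numbers below are invented, and this is not Alspaugh et al's implementation), one may rank tests by value per unit of runtime and fill the time budget greedily:

```python
def greedy_by_ratio(tests, budget):
    """Select tests under a time budget, favouring value per unit cost.

    tests  -- list of (name, value, runtime) tuples; 'value' might be
              coverage achieved or faults historically detected
    budget -- total execution time available
    """
    # rank by value-to-runtime ratio, highest first
    ranked = sorted(tests, key=lambda t: t[1] / t[2], reverse=True)
    chosen, used = [], 0.0
    for name, value, runtime in ranked:
        if used + runtime <= budget:  # take the test if it still fits
            chosen.append(name)
            used += runtime
    return chosen

suite = [("t1", 10, 5.0), ("t2", 4, 1.0), ("t3", 6, 2.0), ("t4", 9, 6.0)]
print(greedy_by_ratio(suite, 8.0))  # → ['t2', 't3', 't1']
```

With a budget of 8 time units, the cheap high-ratio tests t2 and t3 are taken before the bulkier t1, and t4 no longer fits.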
Park et al (2008) proposed a technique based on the estimation of test costs and fault severities using historical information. They suggested a historical-value-based approach for cost-cognizant TCPr that estimated the trends of cost and fault severity from historical information. A controlled experiment validating the proposed approach demonstrated its usefulness and effectiveness. The major contribution of their research was that it provided a way to estimate the cost and fault severity of the current test cases from historical information, and it complemented other test case prioritization techniques.
Krishnamoorthi and Mary (2009) proposed a new test case prioritization technique using a Genetic Algorithm (GA) that prioritizes subsequences of the original test suite so that the new suite runs within a time-constrained execution environment. They showed that a superior rate of fault detection can be achieved compared with randomly prioritized test suites.
Lu Zhang et al (2009) proposed a novel approach to time-aware test-case prioritization using Integer Linear Programming (ILP). Their results indicated that their techniques outperformed all the other techniques under the scenarios of both general and version-specific prioritization. The empirical results also indicated that some traditional techniques with lower analysis time cost performed competitively when the time budget was not tight.
In the literature, the test case generation techniques applied to web operations can be classified into two categories, namely a) structural testing and b) user-session-based testing. Very little work has been done on TCPr for web operations.
Structural Testing
With the goal of providing automated data flow testing, Chien-Hung Liu et al (2000) developed an object-oriented web test model that captured the dependent relationships of an application's entities through an object model, page navigation through a page navigation diagram, and control and data flow through an Interprocedural Control Flow Graph (ICFG). They utilized the model to generate test cases based on the data flow between objects in the model. However, they developed their model for HTML and XML documents and did not consider features inherent in other web applications.
Filippo Ricca and Paolo Tonella (2001) developed a high-level UML-based representation of a web application and described how to perform page, hyperlink, def-use, all-uses, and all-paths testing based on the data dependencies computed using their model. Because of the impractical nature of def-use and all-paths testing, they considered only independent paths and paths where loops were traversed k times. They also presented a program slicing approach for web applications, in which data and control dependencies were computed to construct a system dependence graph for the application, followed by a simple slicing algorithm. Because they used a fictitious scripting language that included only a small subset of script instructions and features focusing on forms, the slicing problem was significantly simplified. They further (2004) presented a dynamic analysis technique to develop a model of the application and used statistical data available from page accesses to statistically test the application.
Di Lucca et al (2002) developed a web application model and set of
tools for the evaluation and automation of testing web applications. They
presented an object-oriented test model of a web application and proposed a
definition of unit and integration levels of testing. For unit level testing, they
adopted a web page as the unit to test. Their technique focused on both client
and server pages. They developed functional testing techniques based on
decision tables, which helped in generating effective test cases.
Jeff Offutt et al (2004) presented an approach to generate test cases with the goal of uncovering faults and security vulnerabilities in server software, considering users' ability to bypass client-side input validation. Their strategy was called bypass testing. Bypass testing was geared towards users who bypass client-side input validation, thus testing the robustness of server applications and probing for security holes exposed by atypical user input.
Yuetang Deng et al (2004) presented a technique for path-based test case generation to test the databases of web applications. They addressed the problem of maintaining the state of the database to expose faults. However, their technique was suitable only in the presence of static URLs.
Chen Fu et al (2004) addressed the testing of middleware applications; their research was specifically related to testing error recovery code by injecting faults and gathering code coverage information to determine potential system reliability. However, tester intervention was required at various stages of the process.
Andrews et al (2005) proposed an approach that models web applications with Finite State Machines (FSM) and uses coverage criteria based on FSM test sequences. They represented test requirements as subsequences of states in the FSM, generated test cases by combining the test sequences, and proposed a technique to reduce the set of inputs. However, their model did not handle dynamic aspects of web applications, such as transitions introduced by the user through the browser and connections to remote components.
User-session-based Testing
In user-session-based testing, data are collected from the users of a web application by the web server. Each user session is a collection of user requests in the form of a base request and name-value pairs (for example, form field data). A base request for a web application is the request type and resource location without the associated data (e.g., GET /servlets/authentication/login.jsp). More specifically, a user session is defined as beginning when a request from a new IP address reaches the server and ending when the user leaves the web site or the session times out. A test case consists of the set of HTTP requests associated with each user session.
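The session definition above can be sketched as a small log-partitioning routine. The field layout, the 45-minute timeout and the log entries below are invented for illustration; real access logs and session policies vary:

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=45)  # assumed session-timeout threshold

def split_sessions(log):
    """Group logged requests into user sessions.

    log -- list of (ip, timestamp, base_request) tuples sorted by time.
    A session begins with the first request from a new IP address and
    ends when the gap since that IP's previous request exceeds TIMEOUT.
    """
    sessions, current, last_seen = [], {}, {}
    for ip, ts, request in log:
        if ip in current and ts - last_seen[ip] > TIMEOUT:
            sessions.append(current.pop(ip))   # timed out: close the session
        current.setdefault(ip, []).append(request)
        last_seen[ip] = ts
    sessions.extend(current.values())          # close the remaining sessions
    return sessions                            # each session is one test case

t0 = datetime(2010, 1, 1, 12, 0)
log = [("1.1.1.1", t0,                        "GET /login.jsp"),
       ("2.2.2.2", t0 + timedelta(minutes=1), "GET /index.jsp"),
       ("1.1.1.1", t0 + timedelta(minutes=2), "POST /cart.jsp"),
       ("1.1.1.1", t0 + timedelta(hours=2),   "GET /login.jsp")]
sessions = split_sessions(log)
print(len(sessions))  # → 3 (the two-hour gap splits the first user's visits)
```

Each resulting session, replayed as its sequence of HTTP requests, corresponds to one user-session-based test case.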
Sebastian Elbaum et al (2005) provided promising results that demonstrated the fault detection capabilities and cost-effectiveness of user-session-based testing. In particular, they showed that user-session techniques could discover certain types of faults but would not detect faults associated with rarely entered data. They also showed that the effectiveness of user-session techniques improved as the number of collected sessions increased; however, the cost of collecting, analyzing, and replaying test cases also increased. They further reported achieving a high percentage of test case reduction with the test case reduction technique of Harrold et al (1993).
Sampath et al (2007) conducted user-session-based testing by applying concept analysis to reduce the size of user-session-based test suites, and empirically evaluated the effectiveness of the reduced suites. The reduction techniques were based on criteria such as covering all base requests in the application while maintaining the use case representation. The criteria create a test suite smaller than the original suite, but its tests are in no particular order. Indeed, test suite reduction techniques strive to reduce the size of a test suite while maintaining its overall fault finding effectiveness.
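One common way to realize a covering-all-base-requests criterion is a greedy set-cover pass over the sessions. The sketch below is exactly that, and not the concept-analysis technique of Sampath et al; the session identifiers and requests are invented:

```python
def reduce_suite(suite):
    """Greedily keep a subset of sessions that still covers every base
    request exercised by the full suite (a set-cover-style sketch).

    suite -- dict mapping a session id to the set of base requests it issues
    """
    uncovered = set().union(*suite.values())
    kept = []
    while uncovered:
        # pick the session covering the most still-uncovered requests
        best = max(suite, key=lambda s: len(suite[s] & uncovered))
        kept.append(best)
        uncovered -= suite[best]
    return kept

suite = {"s1": {"/login", "/cart"}, "s2": {"/login"}, "s3": {"/cart", "/pay"}}
print(reduce_suite(suite))  # → ['s1', 's3']
```

Session s2 is dropped because its only base request is already covered, shrinking the suite while preserving base-request coverage.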
Sridevi Sampath et al (2007) and Sprenkle et al (2005) generated
user-session-based test cases from usage logs. When available, cookies were
used to generate a user-session based test case.
Web applications are similar to GUI applications since both the
domains are highly user-driven. User input can unexpectedly change the
control/data-flow in the application. Models (Memon et al 2001), test case
generation strategies (Lee White and Husain Almezen 2000), oracles (Memon
et al 2000) and coverage criteria (Memon 2001) have been proposed to test
GUIs. Regression testing strategies (Memon and Mary Lou Soffa 2003) have
also been presented. In addition, web applications use a database as the backend tier from which data is rendered to the user. Database testing strategies (Kapfhammer and Mary Lou Soffa 2003) have been exploited to improve the effectiveness of web application testing. While GUI and database testing strategies can be used to test certain parts of the application, such as the user-interaction-based front end and the database backend, the existing testing strategies cannot be used to test the web application in its entirety.
This section reviews the research related to the testing of database-centric software applications. While a significant amount of research has focused on the testing and analysis of the basic operations of software, there is relatively little work that specifically examines the testing of database-centric applications. Whittaker and Voas (2000) pointed out that device drivers, operating systems, and databases are all aspects of a software system's environment that are often ignored during testing; however, they did not propose or evaluate specific techniques that support regression testing. Zhu et al (1997) considered three kinds of relevant adequacy criteria, namely mutation testing, perturbation testing, and error seeding, which aim at demonstrating the absence of prescribed faults in a program.
Jin and Offutt (1998) highlighted test adequacy criteria that
incorporate a program’s interaction with its environment; they did not
specifically address the challenges associated with test adequacy criteria for
database-centric applications.
Slutz et al (1998) addressed the issues associated with
automatically creating the statements that supported the querying and
manipulation of relational databases. This approach generated database query
and manipulation statements outside the context of a program that interacted
with a database.
Gray et al (1999) generated test databases that satisfied the
constraints in the relational schema, and their approach focused on the rapid
generation of large random data sets that supported performance testing.
Chan and Cheung (1999) proposed a technique for testing database-centric applications written in a general-purpose programming language with embedded Structured Query Language (SQL) statements designed to interact with a relational database. The approach transformed the embedded SQL statements into general-purpose programming language constructs. The test cases generated were effective, revealing more faults than conventional approaches, in particular faults related to the internal states of the databases.
Davies et al (2000) performed automated test case generation by considering the database schema and other properties and constraints. Chays et al (2000) proposed a partially automated software testing tool named AGENDA, inspired by the category-partition method, to determine whether a program behaves according to its specification. When provided with the relational schema of the databases used by the application under test and a description of the categories and choices for the attributes required by the relational tables, the AGENDA tool could generate meaningful test databases (Chays and Deng 2003, Chays et al 2002).
AGENDA also provided a number of database testing heuristics, such as
determining the impact of using attribute boundary values or determining the
impact of null attribute values that could enable the tester to gain insights into
the behavior of a program when it interacts with a database that contains
different states (Chays et al 2004).
Daou et al (2001) used data flow information to support the regression testing of data-centric applications. Their exploration of data flow issues included neither a representation for a database-centric application nor a complete description of database interaction associations.
Chays and Deng (2003) described approaches, similar to those of Neufeld et al (1993), Zhang et al (2001) and Chays et al (2002), for generating database states using the constraint knowledge present in the relational schema. Wu et al (2003) performed database application testing while preserving privacy.
In the work of Kapfhammer and Mary Lou Soffa (2003), a family of test adequacy criteria was used to assess the quality of test suites for database-driven applications. They developed a unique representation of a database-driven application that facilitated the enumeration of database interaction associations. These associations reflected an application's definition and use of database entities at multiple levels of granularity.
Suarez-Cabal and Tuya (2004) presented a coverage criterion that
measured the adequacy of SQL SELECT queries in the light of a database
that had been populated with data. This approach calculated the coverage of a
single query, identified a subset of the database that yielded the same level of
coverage as the initial database, and provided guidance that increased
database coverage. However, this scheme focused on coverage for the SQL SELECT statement only and did not consider any regression techniques.
Willmor and Embury (2005) also described an automated test data generation scheme in which a predicate logic condition described the desired database state. They also proposed a regression test selection technique that identified a subset of a test suite to be used during subsequent rounds of testing. Their approach was similar to the reduction technique, but it could exhibit limited reductions and performance concerns because it performed a conservative static analysis of the entire database-centric application.
Halfond and Orso (2005) presented an adequacy criterion named
command form coverage criterion that ensured that a test suite caused the
program under test to submit as many of the viable SQL commands as
possible. They used static analysis to identify all of the SQL commands that
the program could submit and then determined how many of the statements
were actually generated during testing. This coverage criterion is considered in the present study as a prioritization index, as explained in Chapter Five.
Haftmann et al (2005) presented a scheme that re-ordered a test suite in an attempt to avoid RDBMS restarts. Their approach executed the test suite without database restarts and observed whether or not each test case passed. A test ordering conflict was recorded in a conflict database if omitting a restart between two tests caused an otherwise passing test to fail. The authors also described heuristics that re-ordered a test suite in an attempt to avoid the conflicts stored within the conflict database, and they extended their basic technique to support the parallel execution of the test suite on a cluster of testing machines. They focused on improving the efficiency of regression testing but did not propose techniques for effective test case prioritization.
Chan et al (2005) proposed integrating SQL statements and the conceptual data models of an application for fault-based testing, using a set of mutation operators based on the standard types of constraint in the enhanced entity-relationship model. These semantic operators guided the construction of the affected attributes and join conditions of the SQL statements. Tuya et al (2006) performed mutation testing of database applications, an effective though computationally expensive fault-based software testing technique.
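To give a flavour of SQL mutation, the sketch below applies a handful of illustrative operator substitutions to a query string. The operator list and query are invented, the matching is naive textual replacement (a real tool such as Tuya et al's parses the SQL), and this is not their operator set:

```python
import re

# A few illustrative mutation operators for SQL predicates; real tools
# define far richer operator sets.
OPERATORS = [("<", "<="), (">", ">="), ("AND", "OR"), ("=", "<>")]

def mutants(query):
    """Yield single-change mutants of an SQL query string."""
    for old, new in OPERATORS:
        for match in re.finditer(re.escape(old), query):
            # replace exactly one occurrence per mutant
            yield query[:match.start()] + new + query[match.end():]

q = "SELECT * FROM orders WHERE total > 100 AND status = 'open'"
for m in mutants(q):
    print(m)
```

A test suite that distinguishes the original query's results from each mutant's results (e.g., the `>` versus `>=` boundary) is considered adequate with respect to these operators.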
In recent years, the usage of web services has rapidly grown into a part of the business of many organizations. However, this increased usage has not been reciprocated by corresponding increases in reliability. In the literature, there is a significant amount of ongoing research on testing web services to maintain the quality of the applications in which they are used.
Offutt and Xu (2004) presented a new approach to testing web services based on data perturbation, comprising two methods: data value perturbation and interaction perturbation. However, the proposed method was restricted to peer-to-peer interactions.
Tsai et al (2005) recommended the use of an adaptive group testing technique to address the challenges in testing service-oriented applications when a large number of web services are available. They ranked test cases according to a voting mechanism on input-output pairs. Neither the source code of the service nor the structure of the WSDL was utilized.
Martin et al (2007) outlined a framework that generates and executes web service requests and collects the corresponding responses from web services. They proposed to examine the robustness aspect of services by perturbing such request-response pairs. They did not study test case prioritization.
Yongbo Wang et al (2007) performed test case generation for semantic web services based on ontology, traversing a Petri net to produce test steps and test data, with the ontology built from Input, Output, Precondition and Effect (IOPE) analysis. The Web Ontology Language for Services (OWL-S) provided an expressive form of semantics, with preconditions based on the ontology.
Hanna Samer et al (2007) presented an approach to specification-based test case generation for web services, in which test cases were generated based on the input data types specified in the WSDL.
Siripol Noikajana and Taratip Suwannasart (2008) undertook web service test case generation based on decision tables. The proposed method used WSDL-S and the Semantic Web Rule Language (SWRL) to generate test cases. They described the web service contract with WSDL-S (Web Service Semantics Language, an extension of WSDL) and OCL (Object Constraint Language, http://www.omg.org). They later (2009) presented an approach for generating test cases using WSDL-S and OCL in which the test case generation method was a pair-wise testing technique.
Chunyan Ma et al (2008) proposed a test-case-development-model-based approach for generating test data for single operations of web services based on the WSDL specification. A formal model of the data type of the input element of an operation was defined, and they presented an algorithm that derives the model from the WSDL and a method that generates test data for operations from the model. The XML schema used for the definition of the data facilitated automated test data generation.
Mei et al (2008) used the mathematical definitions of XPath as rewriting rules and proposed a data structure known as an XPath Rewriting Graph (XRG) (http://www.w3.org/TR/xpath20). They also proposed a hierarchy of prioritization techniques (2009) for the regression testing of services, and studied the problem of black-box test case prioritization of services based on the coverage information of WSDL tags. They explored the XML message structure exchanged between services to guide test case prioritization.
Sebastien Salva et al (2009) performed automatic web service robustness testing from WSDL descriptions, considering only SOAP messages and not database connections or internal code error messages. They generated test cases based on hazards.
Jiang et al (2009) proposed a family of adaptive random test case
prioritization techniques that tried to spread the test cases as evenly as
possible across the code space to increase the rate of fault detection. They also improved the regression testing process for the specification evolution of real-world protocol software.
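The core idea of adaptive random prioritization, spreading selected tests evenly over the code space, can be sketched as repeated max-min selection over coverage sets. The distance measure (Jaccard), candidate-set size and data below are assumptions for illustration, not Jiang et al's exact algorithm:

```python
import random

def jaccard_distance(a, b):
    """Distance between two coverage sets: 1 - |a ∩ b| / |a ∪ b|."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def art_prioritize(coverage, candidates=3, seed=0):
    """Adaptive-random-style prioritization sketch: repeatedly pick, from
    a small random candidate set, the test whose coverage is farthest
    (max-min distance) from the tests already prioritized.

    coverage -- dict mapping a test id to its set of covered code units
    """
    rng = random.Random(seed)
    remaining = list(coverage)
    order = [remaining.pop(rng.randrange(len(remaining)))]  # random start
    while remaining:
        cand = rng.sample(remaining, min(candidates, len(remaining)))
        best = max(cand, key=lambda t: min(
            jaccard_distance(coverage[t], coverage[p]) for p in order))
        order.append(best)
        remaining.remove(best)
    return order

coverage = {"t1": {1, 2}, "t2": {2, 3}, "t3": {8, 9}, "t4": {1, 2, 3}}
print(art_prioritize(coverage))
```

Tests covering code units far from those already scheduled tend to be pulled forward, which is the mechanism intended to raise the rate of fault detection.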
Ke Zhai et al (2010) proposed integrating service selection into test case prioritization to support regression testing. Furthermore, they proposed a family of black-box, service-centric test case prioritization techniques that guide the prioritization based on Points of Interest (POI).
Agent-based computing has often been suggested as a promising
technique for problem domains that are distributed, complex and
heterogeneous (Wooldridge 1997). A number of agent-based approaches have
been proposed to solve different types of problems in the software industry.
Agent-Oriented Software Engineering (AOSE) was established to speed up the construction of software agents. In this paradigm, application programs are written as a set of agents that interact with one another according to a communication standard known as the Agent Communication Language (ACL), which prevents potential mismatch problems such as inconsistencies (in syntactic usage) and incompatibilities (in semantic usage) across applications written in different languages.
Medvidovic and Taylor (2000), in their work, emphasized specifying and analyzing agent-based architectures. Close in spirit to this work were the works of Xu and Shatz (2001) and Xu et al (2003), which used Petri nets to model agent-based systems. More specifically, they used various kinds of Petri nets (such as predicate transition nets) to model the behavior of an individual agent or of an entire system of agents.
Several mobile agent systems have been developed and used in various environments for carrying out different jobs. These include the Naplet system (Naplet 2002), the Aglet system by IBM (http://aglets.sourceforge.net/), Jini-based mobile agent implementations and the like.
The mobile agent system described by Wang et al (1999) used Java mobile agent technology to deliver mobile agents to the threshold of agent-enhanced e-commerce applications, seeking technology with the potential to inspire and support mass-market e-commerce applications. Huedo et al (2004) presented a framework for adaptive executions in grids. This was a Globus-based framework that allowed the execution of jobs in a ‘submit and forget’ fashion. Maintaining performance and Quality of Service (QoS) for individual applications was one of the major objectives of the research. The ICENI project emphasized a component framework for generality and simplicity of use of agents (Furmento et al 2005).
The GrADS project focused on building a framework for both preparing and executing agents in a Grid environment (Kennedy et al 2002). Each application had an application manager, which monitored the performance of that application for QoS achievement. Failure to achieve the QoS contract caused a rescheduling or redistribution of resources. GrADS monitored resources and used Autopilot for performance prediction through agents (Wolski et al 2003, Ribler et al 2001).
The AgentScape project (Wijngaards et al 2002, Overeinder et al
2000) provided a multi-agent infrastructure that was employed to integrate
and coordinate distributed resources in a computational grid environment. The
A4 (http://www.ccrl-nece.de/˜cao/A4/) project at Warwick had likewise
developed a framework for agent-based resource management on grids. In
Gradwell (2003), multiagent systems were used to trade for grid resources at
the higher “services” level and not at the base “resource” level.
The above literature survey makes it clear that there is very little work on agent implementation in the field of software testing, particularly regression testing. As the regression testing process involves considerable human intervention and is time-consuming, automation of this process may reduce both time and cost. This study proposes a novel method for automating the complete regression process through multi-agents, thereby reducing human intervention.