CHAPTER 2

REVIEW OF LITERATURE

2.1 INTRODUCTION

The research work related to this thesis falls under three general categories: test case generation and prioritization, rate of fault detection effectiveness, and agent-based automation. The first two categories are not mutually exclusive, since work in test case prioritization has sometimes incorporated the notion of fault detection effectiveness as a means of evaluating and comparing prioritization techniques. The related work on test case generation and prioritization for the various operations present in the SUT (basic, database, web and web service operations) is discussed first, followed by the related work on the rate of fault detection effectiveness and, finally, the discussion on agent-based automation.

2.2 TEST CASE GENERATION AND PRIORITIZATION RESEARCH FOR BASIC OPERATIONS

Test case prioritization techniques schedule test cases in an order that attempts to maximize some objective function. In the literature, there are several lines of research on test case generation and prioritization for basic operations.

Avritzer and Weyuker (1995) presented techniques for generating test cases for software that can be modeled by Markov chains, provided operational profile data are available. Although the authors did not use the term "prioritization", their techniques generated test cases in an order that covered, earlier in testing, a larger proportion of the software states most likely to be reached in the field.

Rothermel et al (1999) developed a family of techniques for test case prioritization based on several coverage criteria and the probability of exposing known faults. They empirically examined these techniques and their relative abilities to improve how quickly faults can be detected during regression testing, and their results suggested that the techniques improved the rate of fault detection of test suites. Their results also had immediate practical implications, suggesting that if code coverage-based techniques are used in testing, they can be leveraged for additional gains through prioritization. Rothermel et al (2001) also provided a metric, Average Percentage of Faults Detected (APFD), that measures the average cumulative percentage of faults detected over the course of executing the test cases of a suite in a given order. The APFD metric relies on the assumption that test costs and fault severities are uniform; in practice, however, both can vary widely.

Elbaum et al (2001) considered the coverage criterion at the function level. Their studies focused on version-specific TCPr, in which test cases were prioritized and the rate of fault detection was measured relative to a specific modified version of the program. They concluded that version-specific test case prioritization can produce statistically significant improvements in the rate of fault detection of test suites. The coarser analysis used by function-level techniques rendered them less costly and less intrusive than statement-level techniques. Elbaum et al (2001) also proposed the metric APFDc, which takes test costs and fault severities into account in TCPr; this "cost-cognizant" metric was used in their study to overcome the limitation of APFD. They further proposed techniques for TCPr based on weighted function coverage, where estimated fault proneness or severity served as the weight.
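For reference, for a test suite containing n test cases and a program with m faults, the APFD metric of Rothermel et al (2001) is defined as

\[
APFD = 1 - \frac{TF_1 + TF_2 + \cdots + TF_m}{n\,m} + \frac{1}{2n}
\]

where TF_i is the position, in the prioritized order, of the first test case that reveals fault i; higher values indicate a faster rate of fault detection.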
Kim and Porter (2002) studied the situation in which resource constraints do not allow the execution of the entire test suite, and proposed a technique based on the performance of each test case in prior testing, using exponential smoothing. Given a percentage (denoted n), their technique selected and prioritized n% of the test cases in the entire test suite.

Srivastava and Thiagarajan (2002) proposed a technique that considers the changes between a version and its previous version, and uses the coverage of the code impacted by those changes to guide the prioritization process.

Jones and Harrold (2003) considered a coverage criterion at a very fine granularity, Modified Condition/Decision Coverage (MC/DC). They presented two new algorithms that account for MC/DC when reducing and prioritizing test suites. An empirical study evaluating aspects of the test-suite reduction algorithm showed the potential for substantial test-suite size reduction with respect to MC/DC. The study also found that the prioritization techniques significantly reduce the cost of regression testing.

Hyunsook Do et al (2005) considered the coverage criterion at the block level and the method level for Java software. The prioritization techniques in their work focused mainly on Java programs. Their study investigated the practical impact of their results with respect to differences in delay values, relating them to the many cost factors involved in the regression testing and prioritization process. They also evaluated traditional techniques for TCPr in the context of time-aware TCPr.

Srikanth et al (2005) presented a system-level TCPr approach based on fixed weights specified by the user. The technique identified the important requirements that would increase customer-perceived software quality and made the test cases related to those requirements run earlier.

Tonella et al (2006) treated the TCPr problem as a machine learning problem. They proposed a new prioritization technique, based on a machine learning algorithm, which combined multiple prioritization indexes and utilized information extracted from user knowledge to prioritize test cases.

Walcott et al (2006) studied the problem of time-aware test case prioritization, which considers an explicit time budget and the difference in execution time of each test case. They proposed an approach based on a genetic algorithm and empirically compared it with the initial ordering, the reverse ordering and two control techniques. They also defined a new APFD metric for evaluating the effectiveness of prioritization in time-constrained situations.

Li et al (2007) studied another greedy strategy, named the 2-optimal strategy, based on the k-optimal greedy algorithm, as well as two metaheuristic search strategies, namely hill climbing and a genetic algorithm.

Alspaugh et al (2007) further studied the problem of time-aware test case prioritization and empirically compared seven knapsack solvers, namely random, greedy by ratio, greedy by value, greedy by weight, dynamic programming, generalized tabular and the core, with or without scaling.

Park et al (2008) proposed a technique based on the estimation of test costs and fault severities using historical information. They suggested a historical value-based approach for cost-cognizant TCPr that estimated the trends of cost and fault severity from historical information. A controlled experiment conducted to validate the proposed approach demonstrated its usefulness and effectiveness. The major contributions of their research were that it provided a way to estimate the cost and fault severity of the current test cases by using historical information, and that it complements the other test case prioritization techniques.
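As an illustration of the history-based idea underlying the techniques of Kim and Porter (2002) and Park et al (2008), the following minimal sketch (not their actual algorithms) ranks test cases by an exponentially smoothed record of past fault detection; the smoothing factor and the run-history format are assumptions made here for illustration.

```python
def smoothed_priorities(history, alpha=0.5):
    """Rank test cases by exponentially smoothed past performance.

    history maps each test id to a chronological list of run outcomes,
    where 1 means the test revealed a fault and 0 means it passed.
    A higher smoothed score means the test detected faults recently
    and should therefore be scheduled earlier.
    """
    scores = {}
    for test_id, outcomes in history.items():
        score = 0.0
        for outcome in outcomes:  # oldest run first
            score = alpha * outcome + (1 - alpha) * score
        scores[test_id] = score
    # Schedule tests with the highest smoothed score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical run histories for three test cases.
history = {"t1": [0, 0, 1], "t2": [1, 0, 0], "t3": [0, 1, 1]}
print(smoothed_priorities(history))  # ['t3', 't1', 't2']
```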
Krishnamoorthi and Mary (2009) proposed a new test case prioritization technique using a Genetic Algorithm (GA) that prioritizes subsequences of the original test suite so that the new suite can be run within a time-constrained execution environment. They showed that a superior rate of fault detection can be achieved when compared with the rates of randomly prioritized test suites.

Lu Zhang et al (2009) proposed a novel approach to time-aware test-case prioritization using Integer Linear Programming (ILP). Their results indicated that their techniques outperformed all the other techniques under the scenarios of both general and version-specific prioritization. The empirical results also indicated that some traditional techniques with lower analysis time cost for test-case prioritization performed competitively when the time budget is not very tight.
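Many of the prioritization techniques surveyed in this section build on greedy coverage-based ordering. The following is a minimal illustrative sketch, assuming per-test coverage sets are already available, of the widely used "additional" greedy strategy, which is the baseline against which the 2-optimal, hill-climbing and genetic strategies of Li et al (2007) are commonly compared.

```python
def additional_greedy(coverage):
    """Order tests so each next test adds the most uncovered elements.

    coverage maps each test id to the set of program elements
    (e.g. statements or functions) that the test exercises.
    """
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        # Pick the test contributing the most still-uncovered elements.
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        if not remaining[best] - covered:
            # No test adds new coverage; reset, as additional-greedy
            # variants commonly do, and order leftovers by raw coverage.
            covered = set()
            best = max(remaining, key=lambda t: len(remaining[t]))
        order.append(best)
        covered |= remaining.pop(best)
    return order

# Hypothetical coverage data for three test cases.
cov = {"t1": {1, 2}, "t2": {2, 3, 4}, "t3": {4}}
print(additional_greedy(cov))  # ['t2', 't1', 't3']
```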
2.3 TEST CASE GENERATION AND PRIORITIZATION RESEARCH FOR WEB OPERATIONS

In the literature, the test case generation techniques applied to web operations can be classified into two categories, namely a) structural testing and b) user-session-based testing. Very little work exists in the literature on TCPr for web operations.

2.3.1 Structural Testing

With the goal of providing automated data flow testing, Chien-Hung Liu et al (2000) developed an object-oriented web test model which captured the dependence relationships of an application's entities through an object model, page navigation through a page navigation diagram, and control and data flow through an Interprocedural Control Flow Graph (ICFG). They utilized the model to generate test cases based on the data flow between objects in the model. However, they developed their model for HTML and XML documents and did not consider other features inherent in web applications.

Filippo Ricca and Paolo Tonella (2001) developed a high-level UML-based representation of a web application and described how to perform page, hyperlink, def-use, all-uses and all-paths testing based on the data dependencies computed using their model. Because of the impractical nature of def-use and all-paths testing, they considered only independent paths and paths where loops were traversed k times. They also presented a program slicing approach for web applications, in which data and control dependencies were computed to construct a system dependence graph for the application, followed by a simple slicing algorithm. Because they used a fictitious scripting language that included only a small subset of script instructions and features focusing on forms, the slicing problem was significantly simplified. They further (2004) presented a dynamic analysis technique to develop a model of the application and used statistical data available from page accesses to statistically test the application.

Di Lucca et al (2002) developed a web application model and a set of tools for the evaluation and automation of web application testing. They presented an object-oriented test model of a web application and proposed a definition of unit and integration levels of testing. For unit-level testing, they adopted the web page as the unit to test. Their technique addressed both client and server pages, and they developed functional testing techniques based on decision tables, which helped in generating effective test cases.

Jeff Offutt et al (2004) presented an approach to generating test cases with the goal of uncovering faults and security vulnerabilities in server software, considering users' ability to bypass client-side input validation. Their strategy, called bypass testing, was geared towards users who bypass the client-side input validation, thus testing for robustness and security holes in the server applications arising from abnormal user behavior.

Yuetang Deng et al (2004) presented a technique for path-based test case generation to test the database of web applications. They addressed the problem of maintaining the state of the database to expose faults. However, their technique was suitable only in the presence of static URLs.

Chen Fu et al (2004) addressed the testing of middleware applications; their research was specifically related to testing error recovery code by injecting faults and gathering code coverage information to determine potential system reliability. However, tester intervention was required at various stages of the process.

Andrews et al (2005) proposed an approach to modeling web applications with Finite State Machines (FSMs), using coverage criteria based on FSM test sequences. They represented test requirements as subsequences of states in the FSM, generated test cases by combining the test sequences and proposed a technique to reduce the set of inputs. However, their model does not handle dynamic aspects of web applications, such as transitions introduced by the user through the browser and connections to remote components.
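As an illustration of the state-machine view used by Andrews et al (2005), the following minimal sketch enumerates bounded navigation paths through an FSM, each path serving as a candidate test sequence; the page-navigation FSM and the depth bound are hypothetical, and this is not their actual tool.

```python
def fsm_paths(fsm, start, max_len):
    """Enumerate navigation paths of bounded length through an FSM.

    fsm maps each state to the states reachable from it; every path
    from the start state becomes a candidate test sequence.
    """
    paths = []

    def walk(path):
        paths.append(path)
        if len(path) < max_len:
            for nxt in fsm.get(path[-1], []):
                walk(path + [nxt])

    walk([start])
    return paths

# Hypothetical page-navigation FSM for a small web application.
fsm = {"login": ["home"], "home": ["search", "cart"], "search": ["cart"]}
for p in fsm_paths(fsm, "login", max_len=4):
    print(" -> ".join(p))
```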
2.3.2 User-session-based Testing

In user-session-based testing, data are collected by the web server from users of a web application. Each user session is a collection of user requests in the form of a base request and name-value pairs (for example, form field data). A base request for a web application is the request type and resource location without the associated data (e.g., GET /servlets/authentication/login.jsp). More specifically, a user session is defined as beginning when a request from a new IP address reaches the server and ending when the user leaves the web site or the session times out. A test case consists of the set of HTTP requests associated with a user session.

Sebastian Elbaum et al (2005) provided promising results that demonstrated the fault detection capabilities and cost-effectiveness of user-session-based testing. In particular, they showed that user-session techniques could discover certain types of faults but would not detect faults associated with rarely entered data. In addition, they showed that the effectiveness of user-session techniques improved as the number of collected sessions increased; however, the cost of collecting, analyzing and replaying test cases also increased. They also reported that they were able to achieve a high percentage of test case reduction with the test case reduction technique of Harrold et al (1993).

Sampath et al (2007) conducted user-session-based testing by applying concept analysis to reduce the size of user-session-based test suites, and empirically evaluated the effectiveness of the reduced suites. The reduction techniques were based on criteria such as covering all base requests in the application while maintaining the use case representation. These criteria create a test suite smaller than the original suite, but the tests are in no particular order. Indeed, test suite reduction techniques strive to reduce the size of a test suite while maintaining overall fault-finding effectiveness.

Sridevi Sampath et al (2007) and Sprenkle et al (2005) generated user-session-based test cases from usage logs. When available, cookies were used to generate a user-session-based test case.

Web applications are similar to GUI applications, since both domains are highly user-driven: user input can unexpectedly change the control/data flow in the application. Models (Memon et al 2001), test case generation strategies (Lee White and Husain Almezen 2000), oracles (Memon et al 2000) and coverage criteria (Memon 2001) have been proposed to test GUIs, and regression testing strategies (Memon and Mary Lou Soffa 2003) have also been presented. In addition, web applications use databases as the back-end tier from which data is rendered to the user, and database testing strategies (Kapfhammer and Mary Lou Soffa 2003) have been exploited to improve the effectiveness of web application testing. While GUI and database testing strategies can be used to test certain parts of the application, such as the user-interaction-based front end and the database back end, these existing strategies cannot be used to test the web application in its entirety.
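The following minimal sketch illustrates how user sessions, as defined above, can be reconstructed from a web server log: requests are grouped by client IP, with a new session starting when an IP is first seen and ending after a period of inactivity. The log format and the 30-minute timeout are assumptions made here for illustration.

```python
SESSION_TIMEOUT = 30 * 60  # assumed inactivity timeout, in seconds

def build_sessions(log):
    """Group logged requests into user sessions.

    log is a chronological list of (timestamp, ip, base_request) tuples,
    e.g. (1000, "10.0.0.1", "GET /servlets/authentication/login.jsp").
    Each returned session is a list of base requests and can be replayed
    as one user-session-based test case.
    """
    open_sessions = {}  # ip -> (last_seen, current request list)
    sessions = []
    for ts, ip, request in log:
        last_seen, reqs = open_sessions.get(ip, (None, None))
        if reqs is None or ts - last_seen > SESSION_TIMEOUT:
            reqs = []
            sessions.append(reqs)  # a new session begins for this IP
        reqs.append(request)
        open_sessions[ip] = (ts, reqs)
    return sessions

# Hypothetical log entries; the last request arrives after the timeout.
log = [(0, "10.0.0.1", "GET /login.jsp"),
       (5, "10.0.0.2", "GET /index.jsp"),
       (10, "10.0.0.1", "POST /search.jsp"),
       (4000, "10.0.0.1", "GET /login.jsp")]
print(build_sessions(log))
```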
2.4 TEST CASE GENERATION AND PRIORITIZATION RESEARCH FOR DATABASE OPERATIONS

This section reviews the research related to the testing of database-centric software applications. While a significant amount of research has focused on the testing and analysis of the basic operations of software, there is relatively little work that specifically examines the testing of database-centric applications. Whittaker and Voas (2000) point out that device drivers, operating systems and databases are all aspects of a software system's environment that are often ignored during testing; however, they do not propose and evaluate specific techniques that support the regression testing process.

Fault-based testing for database applications aims at demonstrating the absence of prescribed faults in a program. According to Zhu et al (1997), three kinds of relevant adequacy criteria, namely mutation testing, perturbation testing and error seeding, were considered. Jin and Offutt (1998) highlighted test adequacy criteria that incorporate a program's interaction with its environment; they did not specifically address the challenges associated with test adequacy criteria for database-centric applications.

Slutz et al (1998) addressed the issues associated with automatically creating the statements that support the querying and manipulation of relational databases. This approach generated database query and manipulation statements outside the context of a program interacting with a database. Gray et al (1999) generated test databases that satisfied the constraints in the relational schema; their approach focused on the rapid generation of large random data sets to support performance testing.

Chan and Cheung (1999) proposed a technique for testing database-centric applications that are written in a general-purpose programming language and include embedded Structured Query Language (SQL) statements designed to interact with a relational database. This approach transformed the embedded SQL statements within a data-centric application into general-purpose programming language constructs. Effective test cases were generated that revealed more faults than the conventional approaches; the generated test cases helped reveal faults related to the internal states of the databases in database applications.

Davies et al (2000) performed automated test case generation by considering the database schema and other properties and constraints. Chays et al (2000) proposed a partially automated software testing tool, named AGENDA, inspired by the category-partition method, that helps determine whether a program behaves according to its specification. When provided with the relational schema of the databases used by the application under test and a description of the categories and choices for the attributes required by the relational tables, the AGENDA tool can generate meaningful test databases (Chays and Deng 2003, Chays et al 2002). AGENDA also provides a number of database testing heuristics, such as determining the impact of using attribute boundary values or of null attribute values, that enable the tester to gain insights into the behavior of a program when it interacts with a database containing different states (Chays et al 2004).

Daou et al (2001) used data flow information to support the regression testing of data-centric applications. Their exploration of data flow issues included neither a representation for a database-centric application nor a complete description of database interaction associations. Chays and Deng (2003) described approaches, similar to those of Neufeld et al (1993), Zhang et al (2001) and Chays et al (2002), for generating database states using the constraint knowledge present in the relational schema. Wu et al (2003) performed database application testing while preserving privacy.

In the work of Kapfhammer and Mary Lou Soffa (2003), a family of test adequacy criteria was used to assess the quality of test suites for database-driven applications. They developed a unique representation of a database-driven application that facilitated the enumeration of database interaction associations. The associations reflected an application's definition and use of database entities at multiple levels of granularity.

Suarez-Cabal and Tuya (2004) presented a coverage criterion that measured the adequacy of SQL SELECT queries against a database that had been populated with data. This approach calculated the coverage of a single query, identified a subset of the database that yielded the same level of coverage as the initial database, and provided guidance for increasing database coverage. However, this scheme focused on coverage for the SQL SELECT statement and did not consider any regression techniques.

Willmor and Embury (2005) also described an automated test data generation scheme in which a predicate logic condition describes the desired database state. They further proposed a regression test selection technique that identifies a subset of a test suite to be used during subsequent rounds of testing. Their approach was similar to the reduction technique, but it could exhibit limited reductions and performance concerns because it performed a conservative static analysis of the entire database-centric application.

Halfond and Orso (2005) presented an adequacy criterion, named the command form coverage criterion, that ensures that a test suite causes the program under test to submit as many of the viable SQL commands as possible. They used static analysis to identify all of the SQL commands that the program could submit and then determined how many of these were actually generated during testing. This coverage is considered in the present study as a prioritization index, as explained in Chapter Five.
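To make the notion of command forms concrete, the sketch below masks literal values in the SQL commands observed during testing, so that commands differing only in their data map to the same form, and then computes the fraction of viable forms exercised. The normalization rules here are simplifying assumptions, not Halfond and Orso's actual static analysis.

```python
import re

def command_form(sql):
    """Reduce a concrete SQL command to its form by masking literals."""
    form = re.sub(r"'[^']*'", "?", sql)   # mask string literals
    form = re.sub(r"\b\d+\b", "?", form)  # mask numeric literals
    return form.strip().lower()

def command_form_coverage(viable_forms, executed_commands):
    """Fraction of viable command forms exercised by the test suite."""
    exercised = {command_form(c) for c in executed_commands}
    return len(exercised & viable_forms) / len(viable_forms)

# In practice the viable forms come from a static analysis of the program;
# here they are hypothetical.
viable = {"select * from users where id = ?",
          "insert into orders values (?, ?)"}
executed = ["SELECT * FROM users WHERE id = 42",
            "SELECT * FROM users WHERE id = 7"]
print(command_form_coverage(viable, executed))  # 0.5
```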
Haftmann et al (2005) presented a scheme that re-orders a test suite in an attempt to avoid RDBMS restarts. Their approach executed the test suite without database restarts and observed whether or not each test case passed; a test ordering conflict was recorded in a conflict database whenever the exclusion of a restart between two tests caused an otherwise passing test to fail. The authors also described heuristics that re-order a test suite in an attempt to avoid the test conflicts stored within the conflict database, and extended their basic technique to support the parallel execution of the test suite on a cluster of testing computers. They focused on improving the efficiency of regression testing but did not propose techniques for effective test case prioritization.

Chan et al (2005) proposed integrating SQL statements and the conceptual data models of an application for fault-based testing, using a set of mutation operators based on the standard types of constraint used in the enhanced entity-relationship model. The operators, which were semantic in nature, guided the construction of the affected attributes and join conditions of mutants. Mutation testing for database applications was also performed by Tuya et al (2006); it is an effective, though computationally expensive, fault-based software testing technique.

2.5 TEST CASE GENERATION AND PRIORITIZATION RESEARCH FOR WEB SERVICE OPERATIONS

In recent years, the usage of web services has rapidly grown into a part of the business of many organizations. However, the increased usage of web services has not been reciprocated with corresponding increases in reliability. In the literature, there is a significant amount of ongoing research on testing web services in order to maintain the quality of the applications in which they are referenced.

Offutt and Xu (2004) presented a new approach to testing web services based on data perturbation, which combines two methods: data value perturbation and interaction perturbation. The proposed method, however, was restricted to peer-to-peer interactions.

Tsai et al (2005) recommended the use of an adaptive group testing technique to address the challenges in testing service-oriented applications when a large number of web services are available. They ranked test cases according to a voting mechanism on input-output pairs; neither the source code of the service nor the structure of the WSDL was utilized.

Martin et al (2007) outlined a framework that generates and executes web service requests and collects the corresponding responses from the web services. They proposed to examine the robustness aspect of services by perturbing such request-response pairs. They did not study test case prioritization.

Yongbo Wang et al (2007) performed test case generation for semantic web services based on ontology, traversing a Petri net to produce test steps and test data, and generated the ontology based on Input, Output, Precondition and Effect (IOPE) analysis. OWL-S (the Web Ontology Language for Services) provided the expressive form of semantics, with ontology-based preconditions.
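A minimal sketch of data value perturbation in the spirit of Offutt and Xu (2004) is given below: each field of a valid request message is mutated in turn to produce robustness test inputs. The message representation and the mutation rules are illustrative assumptions, not their actual method.

```python
import copy

def perturb_request(message):
    """Yield variants of a valid request, each with one field perturbed."""
    for field, value in message.items():
        if isinstance(value, int):
            candidates = [0, -1, value + 1, 2**31 - 1]  # boundary values
        else:
            candidates = ["", "x" * 1024, "42"]  # empty, oversized, retyped
        for candidate in candidates:
            variant = copy.deepcopy(message)
            variant[field] = candidate
            yield variant

# A hypothetical valid web service request.
valid = {"user": "alice", "quantity": 3}
for test_input in perturb_request(valid):
    print(test_input)  # each variant would be sent to the service under test
```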
Hanna Samer et al (2007) presented an approach for specification-based test case generation for web services, proposing a method for generating test cases based on the WSDL input messages.

Siripol Noikajana and Taratip Suwannasart (2008) undertook web service test case generation based on decision tables. The proposed method used WSDL-S and the Semantic Web Rule Language (SWRL) to generate test cases. They described the web service contract with WSDL-S (Web Service Semantics, an extension of WSDL) and the Object Constraint Language (OCL, http://www.omg.org). They also presented an approach for generating test cases using WSDL-S and OCL in which the test case generation method was the pair-wise testing technique (2009).

Chunyan Ma et al (2008) proposed a test case development model-based approach for generating test data for single operations of web services based on the WSDL specification. A formal model of the data type of the input element of an operation was defined, and they presented an algorithm that derives the model from the WSDL, along with a method that generates test data for operations from the model. XML Schema was used for the definition of data, which facilitated automated test data generation.

Mei et al (2008) used the mathematical definitions of XPath as rewriting rules and proposed a data structure known as an XPath Rewriting Graph (XRG) (http://www.w3.org/TR/xpath20). They also proposed a hierarchy of prioritization techniques (2009) for the regression testing of services, and studied the problem of black-box test case prioritization of services based on the coverage information of WSDL tags, exploring the XML message structure exchanged between services to guide test case prioritization.

Sebastien Salva et al (2009) performed automatic web service robustness testing from WSDL descriptions, in which they considered only SOAP messages and not database connections or internal code error messages. They generated test cases based on hazards.

Jiang et al (2009) proposed a family of adaptive random test case prioritization techniques that try to spread the test cases as evenly as possible across the code space in order to increase the rate of fault detection. They also performed regression testing process improvement for the specification evolution of real-world protocol software.

Ke Zhai et al (2010) proposed integrating service selection into test case prioritization to support regression testing. Furthermore, they proposed a family of black-box, service-centric test case prioritization techniques that guide the prioritization based on Point of Interest (POI) information.
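To illustrate the adaptive random idea of Jiang et al (2009), the following sketch repeatedly selects, from a random candidate set, the test case farthest (by Jaccard distance over coverage sets) from those already chosen, thereby spreading the ordered tests across the covered space. The candidate-set size and the distance measure are common choices assumed here for illustration, not their exact technique.

```python
import random

def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two coverage sets."""
    union = a | b
    return 1.0 if not union else 1 - len(a & b) / len(union)

def adaptive_random_order(coverage, candidates=3, seed=0):
    """Order tests so each pick is far from the already-selected tests."""
    rng = random.Random(seed)
    remaining = list(coverage)
    order = [remaining.pop(rng.randrange(len(remaining)))]
    while remaining:
        sample = rng.sample(remaining, min(candidates, len(remaining)))
        # A candidate's distance is its distance to the nearest chosen test.
        best = max(sample, key=lambda t: min(
            jaccard_distance(coverage[t], coverage[s]) for s in order))
        order.append(best)
        remaining.remove(best)
    return order

# Hypothetical coverage sets for four test cases.
cov = {"t1": {1, 2}, "t2": {1, 2, 3}, "t3": {7, 8}, "t4": {8, 9}}
print(adaptive_random_order(cov))
```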
2.6 AGENT AUTOMATION RESEARCH

Agent-based computing has often been suggested as a promising technique for problem domains that are distributed, complex and heterogeneous (Wooldridge 1997). A number of agent-based approaches have been proposed to solve different types of problems in the software industry. Agent-Oriented Software Engineering (AOSE) was established to speed up the construction of software agents. In this paradigm, application programs are written as a set of agents that interact with one another according to a communication standard known as the Agent Communication Language (ACL in the sequel), which prevents potential mismatch problems, such as inconsistencies (in syntactic usage) and incompatibilities (in semantic usage), for applications written in different languages.

Medvidovic and Taylor (2000) emphasized the specification and analysis of agent-based architectures. Close in spirit to this work are the works of Xu and Shatz (2001) and Xu et al (2003), which used Petri nets to model agent-based systems. More specifically, they used various kinds of Petri nets (such as predicate transition nets) to model the behavior of an individual agent or of the entire system of agents.

Several mobile agent systems have been developed and used in various environments for carrying out different jobs. These include the Naplet system (Naplet 2002), the Aglet system by IBM (http://aglets.sourceforge.net/) and Jini-based mobile agent implementations. The mobile agent system described by Wang et al (1999) uses Java mobile agent technology to deliver mobile agents to the threshold of agent-enhanced e-commerce applications, seeking technology with the potential to inspire and support mass-market e-commerce applications.

Huedo et al (2004) presented a framework for adaptive execution in grids. This was a Globus-based framework that allowed the execution of jobs in a 'submit and forget' fashion. Maintaining performance and Quality of Service (QoS) for individual applications was one of the major objectives of the research. The ICENI project emphasized a component framework for the generality and simplicity of use of agents (Furmento et al 2005). The GrADS project focused on building a framework for both preparing and executing agents in a grid environment (Kennedy et al 2002). Each application had an application manager, which monitored the performance of that application for QoS achievement; failure to achieve the QoS contract caused a rescheduling or redistribution of resources. GrADS monitored resources and used Autopilot for performance prediction through agents (Wolski et al 2003, Ribler et al 2001). The AgentScape project (Wijngaards et al 2002, Overeinder et al 2000) provided a multi-agent infrastructure that was employed to integrate and coordinate distributed resources in a computational grid environment. The A4 project at Warwick (http://www.ccrl-nece.de/~cao/A4/) likewise developed a framework for agent-based resource management on grids. In Gradwell (2003), multi-agent systems were used to trade for grid resources at the higher "services" level and not at the base "resource" level.

The above literature survey makes it clear that there is very little work on agent implementation in the field of software testing, particularly regression testing. As the regression process for software involves considerable human intervention and is time consuming, automation of this process may reduce both time and cost. This study proposes a novel method for automating the complete regression process through multi-agents, thereby reducing human intervention.