School of Mathematics and Systems Engineering
Reports from MSI - Rapporter från MSI

Research Ontology Data Models for Data and Metadata Exchange Repository

Iryna Kamenieva

December 2009
MSI, Växjö University, SE-351 95 VÄXJÖ
Report 09086
ISSN 1650-2647
ISRN VXU/MSI/DA/E/--09086/--SE

School of Mathematics and Systems Engineering, Department of Computer Science, Växjö University
Master Thesis
Supervisor: Dr. Marcelo Milrad

Abstract

For research in the fields of data mining and machine learning, a necessary condition is the availability of a variety of input datasets. Researchers therefore create databases of such sets. Examples include the UCI Machine Learning Repository, the Data Envelopment Analysis (DEA) Dataset Repository, the XMLData Repository and the Frequent Itemset Mining Dataset Repository. Besides these statistical repositories, researchers use a whole range of storage systems, from simple file stores to specialized repositories, when they solve applied tasks, evaluate their own algorithms and study scientific problems. It might seem that the only difficulty for the user is to search these separate information stores and understand their structure. However, a detailed study of such repositories reveals deeper problems in the use of the data:
- the structure of the data files is rigid and does not match the SDMX (Statistical Data and Metadata Exchange) standard used by many European organizations;
- the data cannot be prepared in advance for a concrete applied task;
- there is no history of how the data have been used for particular scientific and applied tasks.

There are now many data mining (DM) methods, and large quantities of data are stored in various repositories. However, the repositories provide no DM methods, and the methods are not linked to application areas.
An essential problem is linking the subject domain (problem domain), the DM methods and the datasets that suit each method. This work therefore considers three problems:
- building ontological models of DM methods;
- describing how these methods interact with the corresponding repository data;
- building intelligent agents that let the user of a statistical repository choose the method and data appropriate to the task being solved.

The work proposes a system structure and implements an intelligent search agent that works on the ontological model of DM methods and takes the user's personal queries into account. An agent-oriented approach was selected for implementing the intelligent data and metadata exchange repository, and the model uses a service-oriented architecture. The implementation uses the cross-platform programming language Java, the multi-agent platform Jadex, the database server Oracle Spatial 10g, and the ontology development environment Protégé 3.4.

Keywords: repository, SDMX standard, data mining, classification, textual collection, hierarchical data model, semantic web, ontology, multi-agent system, search algorithms, agent-oriented systems, intelligent agent, Jadex, SDK, Java, RDF, Protégé, SPARQL, Oracle Spatial.

Acknowledgements

I would like to express my gratitude to my supervisor, Dr. Marcelo Milrad, who showed interest in my work and encouraged, stimulated and helped me with this thesis. I am thankful to Tatyana Shatovska and Victoriya Repka, who gave me strong initial ideas for the work and, especially, encouraged and supported my efforts on the thesis. I also thank my family for their constant moral support; they were always with me. Finally, I thank all my friends for the good times we spent together, even during hard working days.

Table of Contents

1. INTRODUCTION
1.1 PROBLEM DEFINITION
1.2 GOALS AND CRITERIA
1.3 PURPOSE OF WORK
1.4 LIMITATIONS
1.5 OUTLINE OF THIS THESIS
2. METHODOLOGICAL APPROACH
2.1 SOFTWARE DEVELOPMENT APPROACH
2.2 THE PROCESS OF IMPLEMENTATION DEVELOPMENT
3. DATA MINING REPOSITORIES RESEARCH
3.1 UCI MACHINE LEARNING REPOSITORY
3.2 DEA DATASET REPOSITORY
3.3 XML DATA REPOSITORY
3.4 FREQUENT ITEMSET MINING DATASET REPOSITORY
3.5 ANALYTICAL RESEARCH
3.6 ONTOLOGY DATA AND METADATA EXCHANGE REPOSITORY
4. MULTI-AGENT INTELLECTUAL TECHNOLOGIES
4.1 AGENT TECHNOLOGY
4.2 BASIC CONCEPTS OF AGENT APPROACH
4.3 INTELLIGENT AGENT’S FEATURES
4.4 BELIEF-DESIRE-INTENTION (BDI) AGENT ARCHITECTURE
4.5 MULTI-AGENT SYSTEM (MAS)
4.6 MAS CLASSIFICATION
4.7 AGENT ENVIRONMENT
4.8 AGENT-ORIENTED SYSTEM
5. ONTOLOGY MODELS DEVELOPMENT
5.1 IMPLEMENTATION
5.2 ONTOLOGY REPRESENTATION
5.3 PROGRAM-INSTRUMENTAL METHOD OF IMPLEMENTATION OF THE ONTOLOGICAL MODEL
5.4 ONTOLOGY SOURCE MODEL (DATASET ONTOLOGY MODEL)
5.4.1 Ontological source models development
5.4.2 The ontological models source of classes description
5.5 ONTOLOGY DATA MINING MODEL
5.5.1 Ontological data mining models development
5.6 ONTOLOGY USER MODEL
5.6.1 RDF model
5.6.2 Ontological user models development
5.6.3 The ontological user models of classes description
6. INTELLIGENT SEARCH AGENT DESIGN AND DEVELOPMENT
6.1 AGENT IMPLEMENTATION
6.2 INTELLIGENT SEARCH AGENT
6.3 THE SEARCH AGENT GOALS
6.4 SEARCH AGENT OUTLINE
6.5 SEARCH AGENT ADF
6.6 SEARCH AGENT ”BELIEF”
6.7 SEARCH AGENT INTERACTION WITH OTHER AGENTS
6.7.1 Search agent Scenario
6.7.2 User agent Scenario
6.7.3 Coordinator [Manager] agent Scenario
6.7.4 Source agent Scenario
6.7.5 Classification Scenario
6.8 AGENTS DEVELOPMENT USING JADEX TECHNOLOGY
7. SYSTEM PROGRAM MODEL: DEPLOYMENT AND IMPLEMENTATION
7.1 PROBLEM-SOLVING
7.2 PRESENTATION LEVEL
7.3 SERVICE LEVEL
7.4 AGENT SUBSYSTEM
7.5 DATA AND METADATA EXCHANGE REPOSITORY EXPLANATION
7.6 WORK DATABASE LEVEL
8. CONCLUSIONS AND FUTURE CHALLENGES
8.1 RESULTS
8.2 CONCLUSIONS
8.3 FUTURE CHALLENGES
8.4 REFLECTIONS
REFERENCES
INTERNET SITES
APPENDICES
APPENDIX A DATA AND METADATA EXCHANGE REPOSITORY (DATA MINING REPOSITORY) [AN EXAMPLE]
APPENDIX B DATA AND METADATA EXCHANGE REPOSITORY (DATA MINING REPOSITORY)

List of Figures

FIGURE 2.1: SDMX INFORMATION MODEL
FIGURE 2.2: METADATA SCHEME DESCRIPTION
FIGURE 3.1: UCI REPOSITORY WEB-PAGE
FIGURE 3.2: HOME PAGE DEA
FIGURE 3.3: DEA PAGE LOGIN
FIGURE 3.4: XMLDATA REPOSITORY INTERFACE
FIGURE 3.5: DATASET INFORMATION
FIGURE 3.6: FIMI DATASETS
FIGURE 4.1: DELIBERATIVE ARCHITECTURE BASE
FIGURE 4.2: REACTIVE ARCHITECTURE BASE
FIGURE 4.3: HYBRID MULTI-LAYER AGENT BASE ARCHITECTURE
FIGURE 4.4: AGENT STRUCTURE OF CYCLIC-MACHINE ARCHITECTURE
FIGURE 5.1: ONTOLOGY SOURCE MODEL CLASSES AND ATTRIBUTES IN THE PROTÉGÉ
FIGURE A.1: THE LOGIN AND PASSWORD PAGE
FIGURE A.2: BEGINNER REGISTRATION
FIGURE A.3: ADVANCED USER REGISTRATION
FIGURE A.4: RESEARCH METHODS INFORMATION
FIGURE A.5: VIEW USER INFORMATION
FIGURE A.6: THE USER REFRESHMENT
FIGURE B.1: SIMPLE SEARCH PAGE
FIGURE B.2: ADVANCED SEARCH PAGE
FIGURE B.3: SEARCH RESULTS PAGE
FIGURE B.4: THE PROMPT-PAGE OF THE SYSTEM OF THE POPULAR QUERIES
FIGURE B.5: THE LIST OF THE MOST POPULAR DATASETS OF REPOSITORY ON THE HOME PAGE

List of Tables

TABLE 5.1: DATASET CLASS SLOTS
TABLE 5.2: DATASETFILE CLASS SLOTS
TABLE 5.3: JUDGE CLASS SLOTS
TABLE 5.4: ADDRESS CLASS SLOTS
TABLE 5.5: UNIVERSITY CLASS SLOTS
TABLE 5.6: PREFERENCE CLASS SLOTS
TABLE 5.7: SLOTS OF ABSTRACT CLASS ACCOUNT
TABLE 5.8: SLOTS OF ABSTRACT CLASS PERSON
TABLE 5.9: SLOTS OF EXPERIENCED CLASS
TABLE 6.1: THE SEARCH AGENT KNOWLEDGE DESCRIPTION
TABLE 6.2: THE SEARCH AGENT EVENTS
TABLE 6.3: THE USER AGENT KNOWLEDGE
TABLE 6.4: THE USER AGENT EVENTS
TABLE 6.5: COORDINATOR AGENT KNOWLEDGE
TABLE 6.6: THE COORDINATOR AGENT EVENTS

1 Introduction

We envisage a world where the barriers to sharing and exchanging data and information are radically lowered. The world we are undoubtedly moving toward is one of Web-based ‘mashups’, that is, networked software applications that combine data in real time from multiple service providers in ways that are user-friendly yet powerful. To be effective in this space, repositories must become first-class service providers. A repository is a store of statistical datasets. It contains the data, a description of the data, and the metadata that describe them. Running one therefore involves collecting, curating and preserving good metadata.
Hence, metadata is not simply a technical requirement specific to repositories. Rather, it forms the basis of an emerging information infrastructure for communication between data stores, and this has far-reaching consequences. Digital repositories are networked software applications used primarily for storing, managing and disseminating data (e.g. digital publications, theses and datasets). They differ from conventional content management systems because they include technologies that preserve data for long-term access and use. Repositories were initially developed for scientific purposes and for statistical organizations, but they are now implemented more widely. For example, museums use them to provide online access to cultural heritage resources, and government agencies use them to mediate long-term access to documents and other data. In practical terms, setting up a digital repository today can be as simple as downloading free open-source software and installing it on a networked computer. Establishing a stable repository for everyday institutional use, however, is an altogether harder proposition. The most popular open-source repository applications are DSpace, Fedora and EPrints. Widely used dataset repositories include the UCI Knowledge Discovery in Databases Archive, the DEA Dataset Repository, the Frequent Itemset Mining Dataset Repository and the XMLData Repository. Some commercial repository software providers exist, but none has gained the popularity of the open-source repositories mentioned. The important point here is that a repository is essentially a relational database that stores and keeps track of metadata records for files held in a mass-data storage facility. The underlying technology is relatively straightforward, whereas the institutional context of use is typically complex. These systems are not intelligent information systems, and it is difficult to exchange files between them automatically.
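The point that a repository is essentially a database of metadata records pointing to files stored elsewhere can be illustrated with a minimal Java sketch. The class `MetadataCatalog`, its record fields and the storage paths are hypothetical. A real repository would keep these records in a relational database rather than in an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: a catalog that keeps metadata records separately from
// the data files they describe (the files live elsewhere, at fileLocation).
class MetadataCatalog {

    // One metadata record: descriptive fields plus the location of the file.
    record MetadataRecord(String id, String title, String format,
                          List<String> keywords, String fileLocation) {}

    private final List<MetadataRecord> records = new ArrayList<>();

    void register(MetadataRecord r) {
        records.add(r);
    }

    // Returns records whose title or keywords contain the term (case-insensitive).
    List<MetadataRecord> search(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        List<MetadataRecord> hits = new ArrayList<>();
        for (MetadataRecord r : records) {
            boolean inTitle = r.title().toLowerCase(Locale.ROOT).contains(t);
            boolean inKeywords = r.keywords().stream()
                    .anyMatch(k -> k.toLowerCase(Locale.ROOT).contains(t));
            if (inTitle || inKeywords) {
                hits.add(r);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        MetadataCatalog catalog = new MetadataCatalog();
        catalog.register(new MetadataRecord("ds-001", "Iris", "CSV",
                List.of("classification", "plants"), "/storage/iris.csv"));
        catalog.register(new MetadataRecord("ds-002", "Retail baskets", "DAT",
                List.of("itemset mining"), "/storage/retail.dat"));
        for (MetadataRecord r : catalog.search("classification")) {
            System.out.println(r.id() + " -> " + r.fileLocation());
        }
    }
}
```

Running `main` prints `ds-001 -> /storage/iris.csv`, because only that record carries the keyword. The search touches only metadata and never opens the data files themselves.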
In the next chapters I describe in greater detail how to create an intelligent search agent for a public information repository, based on ontological models and intelligent agents.

1.1 Problem Definition

This work proposes the conceptual structure and interaction principles of the intelligent agents and ontological models in the intelligent data and metadata exchange repository. The main focus is a model of an intelligent search agent that extracts information from the ontological model of DM methods. The agent queries the parts of that model concerned with clusterization, classification and prediction using the query language SPARQL.

The client part of the system comprises three agents:
- the intelligent agent of the repository user;
- the coordinator (manager) agent, which controls the overall state of the system and also performs user registration and authorization;
- the resource (dataset) agent, which partially uses files structured according to the SDMX standard.

The RDF language was used to describe and develop the ontological model of the user, the ontological model of DM and the ontological model of resources. Service-oriented and agent-oriented technologies were combined into a uniform architecture. The system has a three-level architecture. The main body of its business logic was built with the agent technology Jadex, which follows the Belief-Desire-Intention (BDI) model and supports the development of goal-oriented agents. Jadex adds this model to Java Agent Development Framework (JADE) agents, introducing beliefs, goals and plans as first-class objects that can be created and used within the agent. JADE is a software environment for developing multi-agent systems and applications, and it supports the FIPA standards for intelligent agents (Weiss et al., 1997). At the data representation level there is a Web application.
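The BDI idea that Jadex adds to JADE agents, with beliefs, goals and plans as first-class objects, can be sketched in plain Java. This sketch does not use the Jadex API, and all names are hypothetical:
- beliefs are ordinary fields;
- a goal is a `SearchGoal` value;
- each `Plan` carries a context condition that decides whether it applies.

The two plans mirror the search-agent requirements in Section 1.2: a simple search for any user and a personalized search for authorized users. The request counter stands in for "keep statistics of requests".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Plain-Java sketch of BDI-style plan selection; it does NOT use the Jadex API.
// All class, method and dataset names are hypothetical.
class BdiSearchAgent {

    // Goal: answer a query on behalf of a given user.
    record SearchGoal(String query, String userId, boolean authorized) {}

    // Plan: a context condition plus a body that produces a result.
    record Plan(String name, Predicate<SearchGoal> context,
                Function<SearchGoal, List<String>> body) {}

    // Beliefs: the agent's current knowledge about datasets, users and requests.
    private final Map<String, List<String>> datasetsByTopic = new HashMap<>();
    private final Map<String, String> userInterest = new HashMap<>();
    private final Map<String, Integer> requestCounts = new HashMap<>();

    // Plans are tried in order; the first applicable one is executed.
    private final List<Plan> plans = new ArrayList<>();

    BdiSearchAgent() {
        plans.add(new Plan("personalized",
                g -> g.authorized() && userInterest.containsKey(g.userId()),
                this::personalizedSearch));
        plans.add(new Plan("simple", g -> true, g -> lookup(g.query())));
    }

    void believeDataset(String topic, String dataset) {
        datasetsByTopic.computeIfAbsent(topic, k -> new ArrayList<>()).add(dataset);
    }

    void believeInterest(String userId, String interest) {
        userInterest.put(userId, interest);
    }

    int requestCount(String query) {
        return requestCounts.getOrDefault(query, 0);
    }

    // One deliberation step: update beliefs, then select and run a plan.
    List<String> achieve(SearchGoal goal) {
        requestCounts.merge(goal.query(), 1, Integer::sum);
        for (Plan plan : plans) {
            if (plan.context().test(goal)) {
                return plan.body().apply(goal);
            }
        }
        return List.of();
    }

    private List<String> lookup(String topic) {
        return new ArrayList<>(datasetsByTopic.getOrDefault(topic, List.of()));
    }

    // Datasets whose names mention the user's interest are moved to the front
    // (List.sort is stable, so the remaining order is preserved).
    private List<String> personalizedSearch(SearchGoal goal) {
        String interest = userInterest.get(goal.userId());
        List<String> result = lookup(goal.query());
        result.sort(Comparator.comparing((String d) -> !d.contains(interest)));
        return result;
    }
}
```

An anonymous user receives the datasets in stored order. An authorized user believed to be interested in, for example, "medical" receives the matching datasets first. In the thesis's Jadex implementation these elements are described in the agent's ADF (Section 6.5). The ordered plan list here is only a simple stand-in for the platform's own plan selection.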
The Web services are part of the business logic. They form a layer between the representation level on one side and the business logic and data storage levels on the other.

1.2 Goals and Criteria

The goal is to create a new public information repository that stores datasets. Using intelligent agents and an ontology approach, it supports storing, converting, searching, adding, describing and selecting the information that researchers in data mining and machine learning need. The SDMX Standards Version 2.0 was chosen as the base standard, and the main parameters of European statistical repositories were adopted. The main reason for creating a statistical information exchange repository is to improve the structure of repositories by means of ontologies and intelligent agents.

The aim of this work is to study and develop an algorithm and an architecture for the search module of a multi-agent system that provides the ontology models for the data and metadata exchange repository (data mining repository). This requires an analysis of the subject area in order to identify directions for further development. It also requires formulating approaches to search engines and implementing them with agent technology, services, and ontology-based data models.

To develop a prototype system it is necessary to:
- examine modern methods of searching statistical repositories;
- analyze intelligent agents, multi-agent systems and the agent-oriented approach;
- develop a search algorithm over the ontological models and explore different strategies for it. The algorithm supports simple search, advanced search, and account-based search that takes into account the user's individual interests, field of activity and previous search queries;
- research existing systems and platforms for building systems based on intelligent agents;
- develop a model for integrating intelligent agents with web systems;
- develop a model of the intelligent search agent and its relationships with other agents;
  The search agent should be able to:
  - perform a simple search for any user, regardless of user type;
  - search by different criteria for authorized users;
  - provide popular datasets;
  - perform a search that takes the user's personal needs into account;
  - provide the user with information about relevant queries;
  - keep statistics of requests and provide this information when necessary;
  - remember successful search results;
- design and implement the architecture of the search module based on the search algorithm. This covers researching the capabilities of intelligent agents in search and retrieval systems, choosing the intelligent agent type, developing the search module architecture with agent-oriented methods, and implementing that architecture on a multi-agent platform for intelligent agents;
- test the search module of the ontology data model for the data and metadata exchange repository on a large amount of input data;
- research and analyze various strategies for the search module (analysis and testing of different search-engine strategies, search optimization).

1.3 Purpose of Work

The conceptual idea is to create a structure for an intelligent information and data exchange system. We focus on three developments:
- an ontology model for data mining methods;
- an ontology model for data transformation methods;
- an intelligent search agent that mediates between these models, the datasets and the user profile.

We use the SDMX standard (SDMX, 2005) for dataset description. Text mining methods will be implemented as part of the classification problem for unlabeled datasets. The system will deliver:
- a program implementation of the search agent;
- the ontology models;
- the architecture of the “ontology data and metadata exchange repository”;
- text classification (clusterization) for datasets.

1.4 Limitations

The “data and metadata exchange repository” is not fully integrated with the rest of the statistics repository.
Nor is the system fully integrated with the fields of data mining and machine learning, because it does not provide a full and detailed description of these fields.

1.5 Outline of this Thesis

Chapter 2 introduces the main idea behind this thesis and the state-of-the-art technologies it relies on, covering important issues related to metadata, ontologies, intelligent agents and the SDMX standard. It also briefly describes other related artificial intelligence technologies. Chapter 3 reviews modern methods of searching statistical repositories and gives insight into the “data and metadata exchange repository”. Chapter 4 describes an intelligent system architecture based on intelligent agents, explained in terms of the Service-Oriented Architecture of a distributed system and of multi-agent systems. Chapter 5 describes the development of the ontology models and the technology used to create them. Chapter 6 presents the development, description and scenarios of several intelligent agents, together with the text clusterization (classification) approach created for our system. Each scenario describes the flow of events according to the agent's activity. Chapter 7 presents the architecture of the “data and metadata exchange repository”. Chapter 8 summarizes the fundamental idea and the results obtained in this thesis. It also briefly describes the implementation of the application by illustrating some of the main parts developed for the architecture presented in Chapters 5, 6 and 7.

2 Methodological approach

The basic idea of the intelligent repository system is to give the user a set of analytical datasets that matches the user's objectives, together with the most effective data mining method for processing that set. One of the most effective intelligent methods for formally describing any subject area is the ontological approach.
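To make the ontological approach concrete, the following minimal Java sketch stores facts as RDF-style triples, i.e. labelled edges of a directed graph. It then evaluates a conjunction of triple patterns in the spirit of a SPARQL basic graph pattern. It is not an RDF library: URIs are shortened to prefixed names. The `dm:`/`ds:` vocabulary (for example `dm:ClassificationMethod` and `dm:suitableFor`) is hypothetical and is not the thesis's actual ontology. The comment in `main` shows the SPARQL query that the `select` call roughly corresponds to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal in-memory RDF-style graph: each triple is a directed, labelled edge
// subject --predicate--> object. Terms starting with '?' are query variables.
class TripleGraph {

    record Triple(String s, String p, String o) {}

    private final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) {
        triples.add(new Triple(s, p, o));
    }

    // Evaluates a basic graph pattern (a conjunction of triple patterns) by
    // extending every partial solution with the matches of the next pattern.
    List<Map<String, String>> select(List<String[]> patterns) {
        List<Map<String, String>> solutions = new ArrayList<>();
        solutions.add(new HashMap<>()); // start with one empty binding
        for (String[] pattern : patterns) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : solutions) {
                for (Triple t : triples) {
                    Map<String, String> binding = new HashMap<>(partial);
                    if (bind(pattern[0], t.s(), binding)
                            && bind(pattern[1], t.p(), binding)
                            && bind(pattern[2], t.o(), binding)) {
                        next.add(binding);
                    }
                }
            }
            solutions = next;
        }
        return solutions;
    }

    // A constant must match exactly; a variable must agree with any earlier binding.
    private static boolean bind(String term, String value, Map<String, String> binding) {
        if (!term.startsWith("?")) {
            return term.equals(value);
        }
        String previous = binding.putIfAbsent(term, value);
        return previous == null || previous.equals(value);
    }

    public static void main(String[] args) {
        TripleGraph g = new TripleGraph();
        g.add("dm:C45", "rdf:type", "dm:ClassificationMethod");
        g.add("dm:KMeans", "rdf:type", "dm:ClusteringMethod");
        g.add("dm:C45", "dm:suitableFor", "ds:Iris");
        g.add("dm:KMeans", "dm:suitableFor", "ds:Iris");
        // Roughly: SELECT ?m ?d WHERE { ?m rdf:type dm:ClassificationMethod . ?m dm:suitableFor ?d }
        List<Map<String, String>> rows = g.select(List.of(
                new String[] {"?m", "rdf:type", "dm:ClassificationMethod"},
                new String[] {"?m", "dm:suitableFor", "?d"}));
        for (Map<String, String> row : rows) {
            System.out.println(row.get("?m") + " -> " + row.get("?d"));
        }
    }
}
```

Running `main` prints `dm:C45 -> ds:Iris`. A production system would use an RDF store and a full SPARQL engine. The nested-loop join here only shows how variable bindings flow from one triple pattern to the next.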
2.1 Software development approach

In this work we formally describe data mining methods with the Resource Description Framework (RDF). RDF is a formal language that offers a unified way to describe any resource as a directed graph. All the information is saved as a set of ontological models. The models are processed (i.e. searched) with a query language. We use SPARQL, which was standardized by the W3C RDF Data Access Working Group. Running SPARQL queries against the ontological models requires an intelligent component that connects the user's query, the search methods over the models and the information input. For this purpose, the system is built from intelligent agents. An intelligent agent is a program entity that operates autonomously to achieve its own goals or the user's goals and has intelligent characteristics. We implemented the multi-agent system on the instrumental platform Jadex. We used the Unified Modeling Language (UML) to describe the agents' functionality, their interactions and the sequence of requests to the system. Because the multi-agent system represents and stores information ontologically, we developed a client-server application architecture using Web services. The main elements of such a system are:
- information in the form of ontological models;
- intelligent agents that process the data and give the user the necessary information;
- a user interface that lets the user submit formal requests and, if necessary, refine them to get more understandable results.

As the database server we used Oracle Spatial 10g.

2.2 The process of implementation development

Let us consider some of the stages of the system implementation.

1. Requirement analysis

First, existing repositories of scientific collections of statistical data were researched and analyzed in depth to identify their strengths and weaknesses. The software implementation was then developed to address the shortcomings of existing repositories of scientific datasets found in this analysis. It uses ontological models in hierarchical structures to solve the problem of preserving large, stable datasets and makes working with them more efficient.

2. Models design

An ontology is a complete structural specification of a certain subject area, i.e. its formal representation. It includes a glossary of the terms of that area and the set of logical relations that describe how these terms relate to each other. Ontologies make it possible to build an effective information exchange system. The main task is not to collect disparate information but to provide structured, formal data for solving real business and economic challenges. The main purpose of an information exchange system is to make information accessible and reusable across the whole system. Information that is neither described nor structured eventually becomes worthless, whereas information that allows automated distribution and exchange generates added value. The system addresses all of these problems (Ratyshin et al., 2001). We designed ontological models for the intelligent processing of user data, datasets, resources, and external systems for integration and data sharing. We also had to develop a search algorithm (the agent) that traverses the hierarchical structure of the ontological models with the least loss of time and returns reliable, qualitatively authenticated information. The basic standard for dataset description was SDMX Standards Version 2.0 (SDMX, 2005), used together with the basic parameters of the statistical descriptions of European repositories. This allows data integration between repositories to be performed automatically.
The structure of the SDMX information model package is shown in Figure 2.1.

Figure 2.1: SDMX information model

These models were created with Protégé 3.4, an integrated knowledge-based tool for system designers. Protégé 3.4 represents knowledge as classes, slots, instances and applications, and gives access to all of these parts through a unified graphical user interface. The top level of the interface consists of overlapping tabs that present the parts compactly and allow them to be edited together. This tab-based design makes it possible to combine, in one environment, modeling an ontology of classes that describe a particular method, creating forms for acquiring information, entering individual data items, and building a knowledge base.

Metadata is data about data. Metadata does not carry the information itself; it describes the attributes of the data that carry it (for example, not the customer's name, but the fact that the field «Customer Name» is 35 characters long, consists of uppercase and lowercase letters, and is linked to the field «name»). Metadata is stored as database tables in a repository and is maintained centrally. Its purpose is to keep attribute data consistent across the system and to simplify data management: an attribute is adjusted in one place, and the change is automatically propagated to every application that needs it.

To integrate the repository with the data warehouse, the basic scheme of the standard was used, extended with our own specific concepts (Figure 2.2).

Figure 2.2: Metadata scheme description

Interaction between the ontological models is based on intelligent agents: the coordinator agent, the resource agent, the search agent, and the user agent. The creation of ontologies is a promising area of modern research in the processing of information expressed in natural language.
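To illustrate the idea that metadata describes attributes rather than values, and that adjusting an attribute in one central place affects every application that validates against it, here is a minimal sketch. The `MetadataSketch` registry and the `FieldMetadata` record are hypothetical names invented for this example; only the «Customer Name» rule (at most 35 uppercase and lowercase letters) comes from the text, and the actual repository kept its metadata in database tables rather than in a Java map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical central metadata registry: it describes attributes, not the data values. */
class MetadataSketch {
    /** Metadata for one field: name, maximum length and allowed characters. */
    record FieldMetadata(String name, int maxLength, Pattern allowed) {
        boolean accepts(String value) {
            return value.length() <= maxLength && allowed.matcher(value).matches();
        }
    }

    // Single central registry that every application consults.
    private final Map<String, FieldMetadata> registry = new HashMap<>();

    /** Defines or redefines a field; a redefinition applies to every later validation. */
    void define(FieldMetadata metadata) { registry.put(metadata.name(), metadata); }

    boolean validate(String field, String value) {
        FieldMetadata metadata = registry.get(field);
        if (metadata == null) throw new IllegalArgumentException("No metadata for field " + field);
        return metadata.accepts(value);
    }

    public static void main(String[] args) {
        MetadataSketch repository = new MetadataSketch();
        // Rule from the text: "Customer Name" holds up to 35 upper- and lowercase letters.
        repository.define(new FieldMetadata("Customer Name", 35, Pattern.compile("[A-Za-z]*")));
        System.out.println(repository.validate("Customer Name", "Smith"));    // prints true
        System.out.println(repository.validate("Customer Name", "Smith42"));  // prints false
    }
}
```

Calling `define` again with, say, a longer `maxLength` changes the rule for every caller at once, which is the "adjust in one location" property the text describes.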
One of the advantages of using ontologies as a learning tool is that they support a systematic approach to studying the subject area. This is achieved through consistency (an ontology gives a holistic view of the subject area), uniformity (material presented in a unified format is much easier to perceive and reproduce), and scientific rigor (building an ontology makes it possible to restore missing logical links in their entirety). Ontologies also make it possible to use large amounts of data from different systems, because they provide a semantic description of the data.

3. Model and agent development

An intelligent search agent was developed. An intelligent agent is a system with the following characteristics:
Autonomy: the agent's actions are determined only by its internal state; no external stimulus can influence its behavior unless its structure foresees it.
Reactivity: agents exist within an environment with which they interact, i.e., they can perceive changes in the environment and respond to them.
Proactivity: agents show goal-directed behavior, i.e., an agent tries to solve the task entrusted to it in a changing environment by planning its own actions.
Social ability: agents can interact and collaborate with each other on a task, and interact with the ontological models to obtain good results.
Personal picture of the world: each agent has its own model of the surrounding world (environment), which describes how the agent sees the world; the agent builds this model from the information it receives from the external environment.
Sociability and cooperativity: agents can exchange information with their environment and with other agents. The ability to communicate means that the agent receives information about its environment, which allows it to build its own model of the world.
Moreover, communication with other agents is a prerequisite for joint action toward common goals.
Intelligent behavior: the agent's behavior includes the ability to learn, to make logical deductions, or to build a model of the environment in order to find the best course of action.

Thus, each agent is a process that holds part of the knowledge about the object and can share this knowledge with other agents. As a result, the intelligent search agent interacts with the ontological model and with the user to produce an expert answer to the query.

3 Data mining repositories research

In this chapter we analyze the most popular statistical repositories, compare their strengths and weaknesses, and give a short overview of the ontology data and metadata exchange repository system.

A repository is a place where data are stored and maintained. Most commonly, repository data are stored in files that can be distributed further over the network (Pearson, et al., 2004; Cunningham, et al., 2008). In the data warehouse context, a repository is a database for configuration management and change management throughout the life cycle of a data warehouse. It contains all the information needed by interested parties during the creation and operation of the warehouse (Xie, et al., 2006). The repository stores the baseline version of the data warehouse software, data on the history of its development and operation, error reports filed during operation, requirements and wishes for its modernization, and a complete set of documentation for that software version (Zimmermann, 2006; Neil, 2005-2008). It also keeps detailed information on the development and maintenance processes (Moore, et al., 2009; Fatudimu, et al., 2008).

3.1 UCI Machine Learning Repository

The UCI Machine Learning Repository is the largest repository of real and synthetic machine learning tasks (Fig. 3.1).
It contains real data from applications in biology, medicine, physics, engineering, sociology and other fields, and it is widely used by students, teachers and researchers around the world as a primary source of data for empirical analysis, testing and comparison of machine learning algorithms. The UCI repository was established as an FTP archive at the University of California, Irvine (USA) (Blake C. L. et al., 2001), (Cortez, et al., 2007), (Asuncion A., et al., 2007).

Figure 3.1: UCI Repository web-page

A public repository has several advantages: other researchers can reproduce and verify published results; it counters the widespread problem of «fitting» an algorithm to one specific task; and it makes it possible to identify the classes of tasks for which an algorithm works best. The advantages of this particular repository are well-sorted data and full-text search. Its disadvantages are that the data are text-only and are not easy to use and modify, and that advanced search is lacking. This is one of the few repositories with a reputation as a reliable source of scientific data sets, and its filtering of data sets deserves credit.

3.2 DEA Dataset Repository

The DEA Dataset Repository was created using ASP.NET technology and databases. The main page of the DEA web site, with information about the repository, is shown in Figure 3.2.

Figure 3.2: Home page DEA

Clicking the “Continue” button on the home page leads to the login page shown in Figure 3.3.

Figure 3.3: DEA Page login

The advantage of this system is its convenient resource search.

3.3 XML Data Repository

The XML Data Repository stores data sets in XML format, together with statistics about them, for use in research experiments. Its interface and the information about a data set are shown in Figure 3.4 and Figure 3.5. Large data sets are stored in compressed form using GZIP and XMILL, and the data set statistics are calculated with the XML Toolkit.

Figure 3.4: XMLData Repository Interface

The XML Data Repository is administered by a single person.
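The XML Data Repository distributes large files GZIP-compressed. As a hedged illustration of how a client (or an agent) might compute simple statistics from such a file without first decompressing it to disk, the following sketch streams the data through `java.util.zip.GZIPInputStream` and the standard StAX parser (`javax.xml.stream`) and counts element occurrences. It is not the XML Toolkit used by the repository and does not handle XMill-compressed files; the class name `GzipXmlStats` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Streams a GZIP-compressed XML document and counts element occurrences (illustrative sketch). */
class GzipXmlStats {
    static Map<String, Integer> countElements(InputStream gzipped) throws IOException, XMLStreamException {
        Map<String, Integer> counts = new TreeMap<>();     // sorted by element name
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new GZIPInputStream(gzipped));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    counts.merge(reader.getLocalName(), 1, Integer::sum);
                }
            }
        } finally {
            reader.close();                                // does not close the underlying stream
        }
        return counts;
    }

    /** Demo helper: GZIP-compresses a string in memory. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = gzip("<dataset><row><a>1</a></row><row><a>2</a></row></dataset>");
        System.out.println(countElements(new ByteArrayInputStream(data)));  // prints {a=2, dataset=1, row=2}
    }
}
```

Because StAX reads the document as a stream of events, memory use stays small even for the very large data sets mentioned above.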
To add data to the XML Data Repository, one has to send it to the e-mail address given on the site.

Figure 3.5: Dataset information

The disadvantages of this system are inconvenient search, a lack of information about the data sets, and the difficulty of understanding which tasks a given data set can be used for. Its advantage is the universal XML data format, which is easy to convert into any format used by software.

3.4 Frequent Itemset Mining Dataset Repository

Finding frequent itemsets is a fundamental problem in many Data Mining tasks, and various approaches to it have appeared in numerous papers at Data Mining conferences. Although the problem was originally formulated in the context of market basket analysis, it is much broader: in general, it involves identifying goods, products, symptoms, characteristics, etc. that often occur together in a data set. As one of the core operations in Data Mining, frequent itemset mining (FIM) algorithms can be used as building blocks for more complex data processing.

The FIMI repository is an open repository of software implementations of FIM algorithms, and of data sets for them, that were accepted at the FIMI scientific workshop. The FIMI data sets are shown in Figure 3.6.

Figure 3.6: FIMI datasets

The advantage of this repository is that all data sets are checked by a qualified and credible committee; no other conveniences for users are implemented. There is a logical explanation for this: the purpose of the repository is to store algorithm implementations. FIM algorithms sometimes show contradictory results, so the algorithmic tasks need to be fully characterized and understood; it would be interesting to know why, and under what circumstances, one algorithm outperforms another. This requires testing the methods under a variety of settings.
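To make the frequent itemset mining task concrete, the following sketch reads transactions in the plain-text FIMI style (one transaction per line, items as space-separated integers) and performs the first two passes of an Apriori-style algorithm: it counts the support of single items, then counts only the pairs built from frequent items and keeps those that reach a minimum support. Apriori is a textbook technique chosen here for brevity, not one of the FIMI submissions, and the class and method names are hypothetical.

```java
import java.util.*;

/** First two Apriori-style passes over FIMI-style transactions (illustrative sketch). */
class FrequentPairsSketch {
    /** Parses one transaction per line, items as space-separated integers. */
    static List<int[]> parse(String text) {
        List<int[]> transactions = new ArrayList<>();
        for (String line : text.split("\n")) {
            line = line.trim();
            if (line.isEmpty()) continue;
            // FIMI files are already sorted; sorting here just makes the sketch robust.
            int[] items = Arrays.stream(line.split("\\s+"))
                    .mapToInt(Integer::parseInt).distinct().sorted().toArray();
            transactions.add(items);
        }
        return transactions;
    }

    /** Returns every item pair (key "a b", with a < b) whose support is at least minSupport. */
    static Map<String, Integer> frequentPairs(List<int[]> transactions, int minSupport) {
        // Pass 1: support of single items.
        Map<Integer, Integer> itemCounts = new HashMap<>();
        for (int[] t : transactions)
            for (int item : t) itemCounts.merge(item, 1, Integer::sum);

        // Pass 2: count only pairs whose items are both frequent (Apriori pruning).
        Map<String, Integer> pairCounts = new TreeMap<>();
        for (int[] t : transactions) {
            for (int i = 0; i < t.length; i++) {
                if (itemCounts.get(t[i]) < minSupport) continue;
                for (int j = i + 1; j < t.length; j++) {
                    if (itemCounts.get(t[j]) < minSupport) continue;
                    pairCounts.merge(t[i] + " " + t[j], 1, Integer::sum);
                }
            }
        }
        pairCounts.values().removeIf(count -> count < minSupport);
        return pairCounts;
    }

    public static void main(String[] args) {
        List<int[]> db = parse("0 1 2\n0 1\n1 2\n0 1 3\n");
        System.out.println(frequentPairs(db, 2));  // prints {0 1=3, 1 2=2}
    }
}
```

Item 3 occurs only once, so no pair containing it is ever counted; this pruning is what lets Apriori-style algorithms scale to the large transaction sets the repository targets.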
The test settings should include different kinds of data sets: dense and sparse, real and synthetic, small and large, ranging from hundreds to tens of thousands of items and from thousands to millions of transactions, etc. A data set submitted to the repository must be accompanied by a detailed description of the algorithm and of the set.

To be uploaded, data sets must meet several conditions: the input data must be in ASCII format; each transaction is stored on a separate line as a list of items separated by spaces and ending with a newline character; every item is an integer; items must be numbered consecutively starting from 0, and each transaction must be sorted in ascending order. For a fair comparison, the use of multithreading, advanced pipelining, low-level memory optimizations, direct use of hardware, etc. is banned.

3.5 Analytical research

We investigated the most popular repositories and identified the advantages and drawbacks of each of them.

The UCI repository contains no methods, only data. Data can be selected only through filters, and the data have a fixed format. The repository lacks preprocessing of data for specific methods and personalized search by problem area. Reading the usage history of the data is difficult, because the history of a data set's use for particular methods and problem areas is described only briefly. The advantage of the repository is its download speed: because it contains only .txt files, data can be downloaded quickly.

Like the UCI repository, the DEA Dataset Repository contains no methods and offers no preprocessing of data for methods. Files are stored in .xls format only. Search works on only one criterion at a time, i.e., several search conditions cannot be combined. Users who are not advanced find it difficult to search this repository, and there is no overview of all its data sets. The advantage of this repository is that search is possible by any of the criteria.
Like the repositories described above, the XML Data Repository contains no methods. Its search is inconvenient because it is unclear which tasks a given data set can be used for; there is no preprocessing of data for methods, and the data descriptions are not informative enough. There is no additional information about the data sets, and some data sets are very large. The advantage of this repository is that data are stored in the multipurpose XML format, which is easy to convert to any other format and to use in programs. No registration is required, so users can simply obtain data via HTTP, which also allows agents to use the repository to search for the data they need.

Like the other repositories studied, the Frequent Itemset Mining Dataset Repository contains no methods and therefore no preprocessing for methods. Its search is extremely inconvenient, no additional information about the data sets is displayed, and individual data sets are not available for download. Its advantage is that each data set comes with a description of its experimental usage.

The analysis of the UCI Machine Learning Repository, the DEA Dataset Repository, the XML Data Repository and the Frequent Itemset Mining Dataset Repository has shown that the model methods and solutions proposed in this work are absent from all of them. We therefore developed the general concept of the ontology data and metadata exchange repository presented in Section 3.6 (Johnson, 2006; Nyika, 2009).

3.6 Ontology data and metadata exchange repository

Repositories of scientific data sets are created for storing, retrieving, matching and processing data from different subject areas, and they provide a valuable resource for researchers, teachers and students. Such a repository can help scientists support their experiments in the field of data mining. Our repository has two kinds of users, beginners and experts, and each of them has an agent.
The agents are used in different parts of the system. A beginner can find a data set from a task description (classification approach), try to add data and, conversely, choose a data domain from a data set. The repository has a coordinator agent (manager): the user agents of beginners and experts communicate with it, and it distributes their tasks. The repository keeps the Data Mining and Machine Learning ontology models, and each ontology model has a search agent that receives information (tasks) from the coordinator agent. In addition, the dataset ontology model interacts with its own dataset agent, and the source ontology model interacts with the coordinator agent.

4 Multi-agent Intellectual technologies

In this chapter we present multi-agent intellectual technologies in detail: the basic concepts of the agent approach; the features of intelligent agents and how they are organized; agent architecture based on the BDI model; an overview of multi-agent systems and their classification, which helps to understand their main idea; an overview of the agent environment; and an overview of agent-oriented systems, which prepares the presentation of the software part.

Over the past few decades, information technologies have gone through several phases of development, from the first large computers in university laboratories to the modern laptops and home computers that most of the population owns. During this period, information technology has changed: programming languages, the very principles of programming, and software systems have all changed. Information technologies are increasingly being introduced into modern spheres of human activity to improve the quality and ease of work, with great results. As a result, new demands are placed on information systems. Today, existing technology cannot fully satisfy these demands because of rapidly changing business requirements, competition and other factors. Nevertheless, technology keeps advancing rapidly to meet the goals set for it.
This continuous development of modern information systems has led to a new level: systems based on agent technology.

4.1 Agent technology

Agents are a new class of software and hardware-software entities that act on behalf of the user. They find and process information, negotiate in electronic commerce systems, automate routine operations, support decision making, and collaborate with other software agents on complex problems, thus relieving people of superfluous information (Wooldridge M. et al., 1995).

A large number of research laboratories, universities, businesses and industrial organizations work in the area of agent systems and technologies. The most prominent research centers include Carnegie Mellon University, the University of Massachusetts at Amherst, the University of Bologna, Stanford University, and a number of universities and colleges in the UK such as Manchester Metropolitan University, as well as large corporations such as IBM, Microsoft, DEC, Apple, Toshiba and Hewlett Packard.

The main directions of scientific research in this area are: agent theory, which deals with mathematical methods and formalisms for abstractly representing the structure and properties of agents, and with methods for drawing logical conclusions in such formal systems; methods of collective agent behavior; agent and MAS architectures; agent communication methods and languages; agent programming languages; methods and tools for automated MAS design; and methods and means of agent mobility.

Agent technology is used in practice in information management and computer networks, traffic management, information retrieval, electronic commerce, learning, digital libraries, and many other applications.

4.2 Basic concepts of agent approach

The term «agent» is derived from the Latin verb «agere», meaning «to act», «to move», «to rule», «to manage».
The encyclopedic dictionary gives the following definition: «agent - a figure, a person acting on the instructions or authority of another». This definition correctly expresses the essence of intelligent agents, which can operate autonomously on behalf of their owner (a user or another computer system) and solve various information processing tasks. To work successfully, an agent must have sufficient intellectual ability: it must be able to interact with its owner to receive tasks and return results, to orient itself in its environment, and to take the necessary decisions (Meyer et al., 2002).

Two basic characteristics, autonomy and purposefulness, distinguish intelligent agents from other software and hardware objects (modules, routines, procedures, etc.). Purposeful behavior requires the intelligent agent to be reactive; this level of intelligence corresponds to the reflex behavior of animals. If the agent also has knowledge about its environment, its own objectives and the ways of achieving them, it can be called intelligent (cognitive).

4.3 Intelligent agent’s features

By now, a fairly extensive list of properties that intelligent agents should have has been compiled:
Autonomy is the ability to form goals independently and to control one's own actions and internal state.
Social ability (social behavior) is the ability to coordinate one's behavior with other agents in an environment, following rules of conduct, by exchanging messages in a communication language.
Reactivity is the ability to perceive the state of the environment (including the other agents in it) and to respond to changes.
Pro-activity is the ability to take the initiative.
It means that the agent generates goals itself and acts rationally to achieve them, rather than just passively responding to external events.
Basic knowledge is the permanent part of the agent's knowledge about itself, the environment and other agents, which does not change during the agent's life cycle.
Beliefs are the variable part of the agent's knowledge about the environment and other agents; they may change over time without the agent knowing, and the agent may continue to use them for its goals.
Desires are states and/or situations that are desirable and important for the agent to attain, but they may be contradictory and may never be achieved.
Goals are the set of states toward which the agent's current behavior is directed.
Intentions are what the agent has committed to do, either because of its commitments to other agents or because of its desires (that is, a consistent subset of desires selected for one reason or another and compatible with its commitments).
Commitments are tasks that the agent takes on at the request and/or instruction of other agents.

4.4 Belief-Desire-Intention (BDI) agent architecture

Basic knowledge is a necessary component of all traditional intelligent systems, and beliefs must be represented in some way in the structure of a multi-agent system. In an intelligent system (agent), beliefs can be interpreted as the current inference rules, the basic scales and criterion weights, utility functions, preferences and so on. Beliefs fall into three classes. The first class is the agent's internal beliefs: algorithms, scenarios and evaluations built in at design time or entered by the owner or user during operation. The second class consists of inductive beliefs, which arise from analyzing the environment and give rise to production rules of the form: if X holds, then belief Z.
The third class consists of communicated beliefs, which arise from communication with other agents and give rise to production rules of the form: if agent A says X and A is a credible source, then belief Z.

4.5 Multi-agent system (MAS)

A multi-agent system (MAS) is a system formed by multiple interacting intelligent agents. Multi-agent systems can solve problems that are difficult or impossible to solve with a single agent or a monolithic system. The most important characteristics of MAS agents are situatedness, autonomy and social flexibility.

A situated intelligent agent is able to perceive its environment (surroundings) and to act in that environment, modifying it for its own purposes if necessary (Chopra, et al., 2009; Chopra, et al., 2009; Dastani, et al., 2005). An example of such agents are the mobile robots in the ROBOCUP competitions, which must interact with the ball, their teammates and their opponents, whose positions and intentions are not known in advance. An autonomous intelligent agent is able to interact with the environment without the direct involvement of other agents, for which it must be able to control its own internal state and actions. A flexible agent must show responsiveness or foresight, depending on the situation. A responsive agent receives stimuli from its environment and responds to them accordingly. A proactive agent does not simply react to the situation in the environment but also adapts, acts purposefully and chooses among alternatives in different situations. An agent is social if it can properly interact with other software or human agents.

An intelligent agent is only one part of the complex process of solving problems in an appropriate environment. The social behavior of agents can take different forms, which can be classified by level of interaction (Russell, 2006).
The zero level is connectedness, which is established from outside, by the owner or user, and is not recognized by the agents themselves.
The first level is coordination: agents can create situations that put other agents in the right place at the right time, so that their activities are carried out effectively.
The second level is cooperation: agents recognize that their behavior is partly determined by the behavior of other agents when they jointly try to achieve a common goal, and this process must be understood by all agents involved in it.
The third level is collaboration: real co-operation of agents during implementation, from which everyone can benefit.
The fourth level is alliance formation: long-term teamwork during which the agents create and maintain the conditions of the alliance (Weiss et al., 1997).

4.6 MAS classification

Multi-agent systems can be classified by several characteristics: by agent location (mobile and fixed); by agent homogeneity (homogeneous and heterogeneous); by implementation (software, hardware-software and hardware); by problem-solving approach (closed and open); by agent lifetime (static and dynamic); by organization (hierarchical, network and self-organized); by task distribution (functionally distributed, spatially distributed, and both functionally and spatially distributed); and by evolution principle (evolving and deterministic).

4.7 Agent environment

One of the most important stages of agent design is defining the task environment. The task environment is essentially the "problem" to which the intelligent agent is the "solution". The notion of a task environment includes the performance measure, the environment, the actuators and the sensors (Performance, Environment, Actuators, Sensors - PEAS). This definition allows many variants of task environments.
But it is still possible to identify a relatively small number of properties by which task environments can be characterized. These properties largely determine the most appropriate agent design and the applicable agent implementation.

4.8 Agent-oriented system

The development of agent technologies and systems based on the multi-agent approach leads to a new concept, the agent-oriented system (AOS). An AOS should include the following main components: a restricted formal language with appropriate syntax and semantics for describing the internal state of an agent; a programming language for specifying agents; and an «agentifier», which turns neutral components into programmable agents.

To understand the principles of agent-oriented programming, it is convenient to draw a parallel with object-oriented programming (OOP). An object has a name, its own data and its own procedures. It may consist of several smaller objects and, in turn, be part of a larger object. Objects contain fields that hold data; a field may be a simple attribute or a complex one (an object). In OOP, all actions are performed through message passing between objects. In general, the notion of an object is defined by four key characteristics: encapsulation, abstraction, polymorphism and inheritance. The agent-oriented approach develops the object approach further and introduces a new level of abstraction (Shen W., 1997; Mitkas, et al., 2006). To some extent, an agent is an object. But an object knows nothing about the nature of its relationships with other objects, about the nature of the messages it receives, or about the world around it. An agent is a more complex, active and autonomous unit. In OOP, computation is understood as a system assembled from modules that interact with one another and have their own ways of handling the messages they receive. Agent-oriented programming refines this framework by treating the active modules as agents and describing the changes in their states through the analysis of beliefs, intentions and commitments.
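The contrast between an object and an agent drawn above can be sketched in a few lines of Java. The `Calculator` object simply executes whatever method is called; the toy `SearchAgent` receives a message, checks it against its own state (what it believes it can do and how much work it has already taken on) and either commits to the request or declines it. The performative names `REQUEST`, `AGREE` and `REFUSE` are borrowed from FIPA ACL for readability; everything else, including all class names, is hypothetical and does not use the Jadex API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

/** Object versus agent: an object always complies, an agent decides (illustrative sketch). */
class AgentVsObjectSketch {
    /** A plain object: every call is executed; the caller is in control. */
    static class Calculator {
        int square(int x) { return x * x; }
    }

    /** Message types, named after FIPA ACL performatives. */
    enum Performative { REQUEST, AGREE, REFUSE }

    record Message(Performative performative, String sender, String content) {}

    /** A toy agent that accepts or refuses requests according to its own state. */
    static class SearchAgent {
        private final Set<String> competences;           // tasks the agent believes it can perform
        private final int capacity;                       // how many tasks it is willing to hold
        private final Deque<String> tasks = new ArrayDeque<>();

        SearchAgent(Set<String> competences, int capacity) {
            this.competences = competences;
            this.capacity = capacity;
        }

        Message handle(Message msg) {
            if (msg.performative() != Performative.REQUEST)
                return new Message(Performative.REFUSE, "search", "unsupported performative");
            if (!competences.contains(msg.content()))
                return new Message(Performative.REFUSE, "search", "not competent: " + msg.content());
            if (tasks.size() >= capacity)
                return new Message(Performative.REFUSE, "search", "busy");
            tasks.add(msg.content());                     // the agent commits to the task
            return new Message(Performative.AGREE, "search", msg.content());
        }
    }

    public static void main(String[] args) {
        System.out.println(new Calculator().square(7));  // prints 49
        SearchAgent agent = new SearchAgent(Set.of("classification"), 1);
        for (String task : new String[] {"classification", "clustering", "classification"}) {
            Message reply = agent.handle(new Message(Performative.REQUEST, "user", task));
            System.out.println(reply.performative() + " " + reply.content());
        }
        // prints: AGREE classification / REFUSE not competent: clustering / REFUSE busy
    }
}
```

The decision logic lives inside the agent rather than in the caller, which is the shift in control that separates the agent-oriented view from plain OOP.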
The presence of a mechanism for forming the agent's own goals provides a new level of autonomy. An intelligent agent is not necessarily at the disposal of any other agent or user; it depends on environmental conditions, including the goals and intentions of other agents. Unlike an object, an agent can take on certain commitments or, conversely, refuse to perform certain work, citing a lack of competence, being busy with another task, etc. At the same time, an agent can perform actions such as creating, suppressing and replacing other agents, activating functions (both its own and those of other agents), initiating scenarios that save the current state of other agents, etc. All this clearly indicates that the agent, as an «active object» or «artificial figure», forms its own behavior and stands at a higher level of complexity than traditional OOP objects.

Agent-oriented architectures and models

Creating intelligent agents is a difficult task that requires a theoretical foundation for the conceptual representation of agents. This foundation is provided by models of intelligent agents, which describe the knowledge, reasoning, planning, behavior and direct actions of agents. These models can be built from two standpoints: first, by analyzing the properties and behavior of agents during MAS operation; second, by studying and designing the agent properties that determine its internal processes (knowledge acquisition, goal development, decision making, etc.). There are three basic classes of agent system architectures and corresponding models of intelligent agents: deliberative architectures and models, reactive architectures and models, and hybrid architectures and models (Zhang, et al., 2009; Munindar, et al., 2009).

Deliberative architecture

A deliberative architecture is one particular kind of agent architecture.
It contains an explicit symbolic model of the world, and its decisions are made on the basis of logical inference. The theoretical basis for constructing such models is the physical symbol system hypothesis formulated by Newell and Simon. A physical symbol system is a physically realizable set of physical entities (symbols) that can be combined into structures, together with the ability to run processes that operate on these symbols according to symbolically coded sets of instructions. The hypothesis asserts that such a system is capable of intelligent behavior in the general sense of the term.

According to M. R. Genesereth's interpretation, a deliberative agent architecture should have the following properties: it contains an explicitly represented knowledge base of formulas in some logical language, which represents the agent's beliefs; it operates in the cycle perception of the situation - logical inference - action; and it makes decisions about actions on the basis of logical inference. A deliberative agent is thus one that explicitly represents a symbolic model of the world and makes decisions (for example, about which actions to perform) via logical inference based on pattern matching and symbolic manipulation.

In the most common deliberative approaches, the cognitive component essentially consists of two parts: a planner and a world model (Figure 4.1).

Figure 4.1: Deliberative architecture base

The world model is the agent's internal description of the external environment and sometimes also includes a description of the agent itself. The planner uses this description to create a plan for achieving the agent's goal. It is given the atomic actions (operators) that the agent can perform, together with their preconditions and their effects in the world, and the initial and target situations. It then searches the space of operator sequences for one that transforms the initial state into the target state.
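The planner's job described above, finding a sequence of operators that transforms the initial state into the target state, can be illustrated with a small STRIPS-style sketch. Each hypothetical `Operator` has preconditions, an add list and a delete list over a set of facts; `plan` performs a breadth-first search over states, so it returns a shortest plan if one exists and `null` otherwise. Breadth-first search is only one possible strategy, chosen here for simplicity, and the operator names in `main` are invented for illustration.

```java
import java.util.*;

/** STRIPS-style breadth-first planner (illustrative sketch; names are hypothetical). */
class PlannerSketch {
    /** An operator: applicable if all preconditions hold; removes the delete list, adds the add list. */
    record Operator(String name, Set<String> pre, Set<String> add, Set<String> del) {
        boolean applicable(Set<String> state) { return state.containsAll(pre); }

        Set<String> apply(Set<String> state) {
            Set<String> next = new TreeSet<>(state);
            next.removeAll(del);
            next.addAll(add);
            return next;
        }
    }

    /** Returns a shortest operator sequence from init to a state containing goal, or null. */
    static List<String> plan(Set<String> init, Set<String> goal, List<Operator> ops) {
        Map<Set<String>, List<String>> paths = new HashMap<>();   // state -> plan reaching it
        Deque<Set<String>> frontier = new ArrayDeque<>();
        Set<String> start = new TreeSet<>(init);
        paths.put(start, List.of());
        frontier.add(start);
        while (!frontier.isEmpty()) {
            Set<String> state = frontier.poll();
            List<String> path = paths.get(state);
            if (state.containsAll(goal)) return path;
            for (Operator op : ops) {
                if (!op.applicable(state)) continue;
                Set<String> next = op.apply(state);
                if (paths.containsKey(next)) continue;            // reached before by a plan at most as long
                List<String> extended = new ArrayList<>(path);
                extended.add(op.name());
                paths.put(next, extended);
                frontier.add(next);
            }
        }
        return null;                                              // target state unreachable
    }

    public static void main(String[] args) {
        List<Operator> ops = List.of(
                new Operator("selectMethod", Set.of("query"), Set.of("method"), Set.of()),
                new Operator("findDataset", Set.of("method"), Set.of("dataset"), Set.of()),
                new Operator("preprocess", Set.of("dataset"), Set.of("prepared"), Set.of()),
                new Operator("report", Set.of("method", "prepared"), Set.of("answer"), Set.of()));
        System.out.println(plan(Set.of("query"), Set.of("answer"), ops));
        // prints [selectMethod, findDataset, preprocess, report]
    }
}
```

The returned list corresponds to the plan that the deliberative architecture hands to its executor; the `paths` map doubles as the set of visited states, so each world state is expanded at most once.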
The plan found by the planner is a list of actions that is passed to the plan executor, which performs these actions; ultimately, the plan is mapped onto low-level effector procedures.

In recent decades, several implementations of the deliberative architecture have been proposed. Most of them were used only in limited artificial environments; only a few have been applied to real problems, and very few have reached the stage of real industrial applications. One such architecture is the distributed Multi-Agent Reasoning System (dMARS), which is based on the older Procedural Reasoning System (PRS) and uses the conceptual framework of the BDI model. The PRS and dMARS agent models are examples of the currently most popular paradigm, known as the BDI approach.

A BDI architecture typically contains four key data structures: beliefs, goals, intentions and a plan library. The agent's beliefs correspond to its information about the world and may be incomplete or incorrect; in the BDI model they are typically kept in symbolic form. The agent's desires (or goals) intuitively correspond to the tasks assigned to it. Existing BDI agents require their desires to be logically consistent, although human desires often do not meet this requirement. An agent may be unable to achieve all of its desires even when they do not contradict one another, so it must commit to a set of achievable desires and allocate resources to achieve them. The desires the agent has selected are its intentions. The agent keeps trying to achieve an intention until it believes the intention has been achieved or is no longer achievable.

For example, in dMARS each agent has a plan library that determines the range of actions the agent can take to achieve its intentions; plans thus represent the agent's procedural knowledge. Each plan has several components. The trigger condition determines the circumstances under which the plan should be considered as a candidate for application.
The plan also has a context (precondition), which defines the circumstances under which the plan can begin; a maintenance condition, which must remain true while the plan is being executed; and a body, which may include subgoals and primitive actions. Events perceived by the agent are placed in an event queue. The agent's internal interpreter continuously executes the following cycle: observe the world and the agent's internal state and update the event queue; generate new possible desires (tasks) by finding the plans whose trigger events match queued events; choose exactly one plan from this set of applicable plans for execution; push it onto an existing or a new intention stack, depending on whether it serves a subgoal; select an intention stack, take the plan at the top of the stack and perform its next step: if the step is an action, execute it, and if the step is a subgoal, place it in the event queue; repeat.

Further development of deliberative models consists of attempts to formalize new motivational properties and relationships in combination with the behavior and actions of agents (Xaken, 2005). This approach leads to abstract logical models that aim at a strict formal description of all relevant properties of rational agents for the specification and verification of MAS. Building such an architecture requires solving problems such as constructing an adequate symbolic description of the real world (taking into account the complexity of processes that unfold over time and of the objects involved) and organizing logical inference from the available knowledge that leads to specific actions of the agent. The advantage of deliberative models and architectures is that they allow a stricter application of formal methods and of traditional artificial intelligence technologies, which make it relatively easy to represent knowledge symbolically.
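The BDI interpreter cycle described above can be reduced to a small loop. In each pass it takes an event, keeps the plans whose trigger matches and whose context holds, chooses one, and executes its body. The Java sketch below is a deliberately simplified illustration, not the dMARS, PRS or Jadex API. Its assumptions are: beliefs are plain strings, the first applicable plan is chosen, and maintenance conditions are omitted. Subgoals are also expanded depth-first by recursion instead of being queued, which plays the role of the intention stack.

```java
import java.util.*;
import java.util.function.Predicate;

/**
 * Highly simplified BDI interpreter (illustrative sketch; not the dMARS, PRS or Jadex API).
 * Maintenance conditions and explicit intention stacks are omitted: subgoals are expanded
 * depth-first by recursion, which stands in for pushing onto an intention stack.
 */
final class MiniBdi {

    /** A body step: either a primitive action or a subgoal. */
    record Step(boolean isGoal, String name) {}

    /** A plan: trigger event, context condition over the beliefs, and a body of steps. */
    record Plan(String trigger, Predicate<Set<String>> context, List<Step> body) {}

    final Set<String> beliefs = new HashSet<>();
    final List<Plan> library = new ArrayList<>();
    final Deque<String> events = new ArrayDeque<>();
    final List<String> executed = new ArrayList<>(); // primitive actions, in execution order

    /** Main cycle: take events from the queue until it is empty. */
    void run() {
        while (!events.isEmpty()) {
            handle(events.poll());
        }
    }

    /** Chooses the first applicable plan (an assumed selection strategy) and executes its body. */
    private boolean handle(String event) {
        for (Plan plan : library) {
            if (!plan.trigger().equals(event) || !plan.context().test(beliefs)) {
                continue; // not triggered by this event, or its context does not hold
            }
            for (Step step : plan.body()) {
                if (step.isGoal()) {
                    handle(step.name());                // subgoal: expanded depth-first
                } else {
                    executed.add(step.name());          // primitive action
                    beliefs.add("done:" + step.name()); // record its effect as a belief
                }
            }
            return true;
        }
        return false; // no applicable plan: the event is dropped
    }
}
```

Consider a plan for the event findDataset whose body is parseQuery, subgoal rank, show, and a plan for rank whose body is score. With the belief haveQuery, the actions run in the order parseQuery, score, show. Without that belief, the context fails and nothing is executed.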
On the other hand, creating a complete and accurate model of a substantive area of the real world, and formalizing the mental properties of agents and their reasoning processes in these cognitive structures, pose significant challenges for technical implementation.

Reactive architecture. Reactive architectures emerged in response to the problems encountered when classical artificial intelligence methods are applied to agent systems. The founder of this direction is R. Brooks, who formulated the key ideas of the behaviorist view of intelligence: intelligent behavior can be achieved without an explicit symbolic representation of knowledge; intelligent behavior can be achieved without explicit abstract logical inference; intelligence is an emergent property of certain complex systems. In the real world, intelligence is not an expert system or a logical inference machine; intelligent behavior arises from the interaction of an agent with its environment. Instead of modeling the world and planning, reactive agents should have a collection of simple behavioral patterns that respond to changes in the environment in the form «stimulus - reaction» (Figure 4.2).

Figure 4.2: Basic reactive architecture

The most controversial of Brooks's principles concerns representation. He argues that an explicit representation of the world is not necessary for building effective agents. Instead, the agent should use «the world as its own model»: continually consulting its own sensors is better than relying on an internal model of the world (Brooks, 1991). In several experiments reactive agents proved able to handle a limited number of simple tasks in real-world settings. However, they have difficulty with tasks that require knowledge about the world beyond what is immediately perceived, that is, knowledge obtained through logical inference or from memory. Reactive agents are also often rigid («hard-wired») and unable to learn.

Architecture models. The best-known model is the M-agent architecture of MAS.
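The reactive «stimulus - reaction» scheme described above needs no world model: the current percept is mapped directly to an action. In the spirit of Brooks's approach, where higher-priority behaviors suppress lower ones, the Java sketch below keeps the rules in a fixed priority order and fires the first one whose stimulus matches. The fixed ordering and the rule and percept names are simplifying assumptions for illustration.

```java
import java.util.*;
import java.util.function.Predicate;

/** Reactive agent: ordered «stimulus - reaction» rules and no internal world model (sketch). */
final class ReactiveAgent {

    /** A behavior pattern: a stimulus test on the current percept and the reaction it triggers. */
    record Rule(String name, Predicate<Set<String>> stimulus, String reaction) {}

    private final List<Rule> rules; // earlier rules have higher priority and suppress later ones

    ReactiveAgent(List<Rule> rulesByPriority) {
        this.rules = List.copyOf(rulesByPriority);
    }

    /** Maps the current percept directly to an action; returns "idle" if no rule fires. */
    String react(Set<String> percept) {
        for (Rule rule : rules) {
            if (rule.stimulus().test(percept)) {
                return rule.reaction();
            }
        }
        return "idle";
    }
}
```

Suppose the rules are avoid (obstacle leads to turn) ranked above seek (target leads to forward). A percept containing both obstacle and target then yields turn, a percept with only target yields forward, and an empty percept yields idle.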
The M-agent multi-agent world (MA-world) includes the agents, the space in which they live and the environment, the relationships between agents and the environment, and the relationships among agents. The common definitions of the M-agent architecture are as follows: a is a particular agent in the MA-world; A is the set of agents existing in the MA-world, called the configuration of agents; N is the set of all configurations of agents. Concepts related to agent type are introduced as well: G is the set of possible agent types in the MA-world; a(g) is an agent of type g; A(g) is the set of agents of type g in the MA-world, called the configuration of agents of type g; N(g) is the set of configurations of agents of type g. The living space of agents is determined by the notion of a resource r. R is the set of resources (the resource configuration), and Rs is the set of possible resource configurations in the MA-world. The topology T of the living space determines the set of places t where agents can live and work. The structure of the space is then defined as a pair, and an agent model is defined as a structure. The model of the reactive agent architecture includes: t, the agent's environment model; M, the set of environment models (that is, the agent's knowledge about the environment); q, the agent's purpose; s, the agent's strategy; and S, the set of agent strategies, called the strategy configuration.
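The MA-world notation above can be mirrored as plain Java types. Each agent has a type g, a place t, a purpose q and a set of strategies S, and A(g) is the configuration of agents of type g. The sketch below makes one illustrative assumption that is not part of the notation: each strategy carries an estimated resource gain for each place, and the agent picks the strategy with the largest positive gain at its current place. All class, field and place names are hypothetical.

```java
import java.util.*;

/** Plain-Java mirror of the M-agent notation (illustrative sketch; all names are hypothetical). */
final class MAWorld {

    /** A strategy s with an assumed estimate of the resource it yields at each place t. */
    record Strategy(String name, Map<String, Double> gainByPlace) {
        double gainAt(String place) {
            return gainByPlace.getOrDefault(place, 0.0);
        }
    }

    /** An agent a of type g, located at place t, with purpose q and strategy set S. */
    record Agent(String id, String type, String place, String purpose, List<Strategy> strategies) {

        /** Chooses the strategy with the largest positive gain at the current place, if any. */
        Optional<Strategy> bestStrategy() {
            return strategies.stream()
                    .filter(s -> s.gainAt(place) > 0)
                    .max(Comparator.comparingDouble(s -> s.gainAt(place)));
        }
    }

    /** The configuration A(g): all agents of the given type. */
    static List<Agent> configurationOfType(List<Agent> agents, String type) {
        return agents.stream().filter(a -> a.type().equals(type)).toList();
    }
}
```

An empty result from `bestStrategy()` corresponds to the case where no suitable strategy exists, so the agent keeps observing its environment.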
The behavior of the M-agent architecture is described by the following actions: an agent is created and placed in the living space; the agent observes the surrounding space and builds a model t of its environment; it chooses the best strategy s that can be performed; if such a strategy is found, it proceeds to the next step, otherwise it returns to observing the environment; it implements the chosen strategy s; it observes the environment again and builds a model t* of its new environment; it adapts to the environmental conditions and switches to the best strategy; it returns to the observation step.

The M-agent architecture orients agents toward resources. The relationship between agents and space is not precisely defined. The model defines no hierarchy of agents and no logical relationships among them, and it offers no way to perform logical inference about the MA-world or about the relationships among agents. The behavior of an agent in the M-agent architecture is essentially a struggle for resources, and the agent's goal is formulated as a function of acquiring a resource. The model does not take the agent's history of development into account and does not offer a constructive mechanism for implementing strategies. One consequence of these problems is the inability to link a high-level specification to an implementation of such models in a MAS.

Hybrid architecture. The reactive approach makes efficient use of a set of simple behavior scenarios, in which agents react to particular events in the environment. Its limitation is that it is practically impossible to analyze the whole situation and all possible actions of the agent. Therefore, most projects and existing systems use a hybrid architecture (Figure 4.3). Some researchers have recently recognized that an intelligent agent must be capable of both high-level inference and low-level reactivity.
Such hybrid agents use their reactive capabilities for current tasks and logical inference for more complex long-term objectives. There are two categories of hybrid agent architectures: homogeneous architectures use a common representation and control scheme for reaction and reasoning, while multilayer architectures use different representations and algorithms (implemented in separate layers) to perform these functions.

Figure 4.3: Basic hybrid multilayer agent architecture

The reactive component maps perceptual stimuli to primitive actions. The deliberative component performs symbolic inference to control the behavior of the reactive component, for example by changing its set of rules (actions) as the situation changes. In some architectures the deliberative component is directly linked to the agent's sensors and effectors. When designing a layered hybrid agent architecture, the following critical questions must be answered. Is one reactive and one deliberative level enough, or should more levels be introduced? How should the cognitive workload be split between the levels? How should the components at different levels interact? When should the agent act and when should it deliberate (that is, how should the control algorithm be decomposed)?

Among the most popular hybrid architectures today are the cyclic-machine architecture proposed by Ferguson, the GLAIR (Grounded Layered Architecture with Integrated Reasoning) layered architecture, the DYNA architecture and the InteRRaP architecture. To become better acquainted with hybrid architectures, consider Ferguson's proposal, the cyclic-machine architecture. It includes three levels: a reactive level (a set of «situation - action» rules), a planning level (whose main component is a hierarchical partial planner) and a modeling level. The aim of the reactive level is to ensure rapid response to events in real time.
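In a layered hybrid architecture, each level independently proposes an action from the sensory data it is allowed to see. Control rules then decide which proposal reaches the effectors. The Java sketch below shows that idea under two simplifying assumptions. First, a censorship rule is modeled as the set of percepts a layer may see. Second, suppression is a fixed priority order in which the first layer that proposes an action wins. This is not Ferguson's implementation, where control rules can be context-dependent, and the layer and percept names are hypothetical.

```java
import java.util.*;
import java.util.function.Function;

/** Layered hybrid control with censorship and suppression rules (sketch, not Ferguson's implementation). */
final class LayeredAgent {

    /** A layer maps the sensory data it is allowed to see to an optional action proposal. */
    record Layer(String name, Set<String> censorAllows, Function<Set<String>, Optional<String>> propose) {}

    private final List<Layer> layersByPriority; // assumed fixed suppression order: earlier layers win

    LayeredAgent(List<Layer> layersByPriority) {
        this.layersByPriority = List.copyOf(layersByPriority);
    }

    /** One control cycle: censor the percept for each layer, then let the first proposal suppress the rest. */
    String cycle(Set<String> percept) {
        for (Layer layer : layersByPriority) {
            Set<String> visible = new HashSet<>(percept);
            visible.retainAll(layer.censorAllows()); // censorship rule: filter sensory data for this layer
            Optional<String> action = layer.propose().apply(visible);
            if (action.isPresent()) {
                return action.get(); // suppression rule: lower-priority proposals are discarded
            }
        }
        return "noop"; // no layer proposed an action
    }
}
```

Consider a reactive layer that sees only obstacle and proposes swerve, placed ahead of a planning layer that sees only goal and proposes followPlan. A percept containing both then yields swerve. A planning layer on its own never sees an obstacle percept, because its censorship rule filters it out.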
In the cyclic-machine architecture, the main objective of the planning level is to generate and execute plans that achieve the agent's long-term goals. Finally, the purpose of the modeling level is to identify and anticipate situations of potential conflict between the goals of agents and to propose actions that resolve these conflicts (Figure 4.4). Each level is independently connected to the sensors and effectors and acts as if it were in charge of the agent. As a result, the actions proposed by different levels often conflict with each other. These conflicts are resolved by suppression rules. On the sensor side, censorship rules filter the sensory data so that each level receives the data appropriate to it. These rules may be organized in different ways, depending on the architecture and capabilities of the agent being developed. The important point is the consistency of the censorship rules, which is the responsibility of the developer of the architecture and of the intelligent agent.

Figure 4.4: Agent structure of the cyclic-machine architecture

Messages between levels are of two types: passive transfer of information (for example, the reactive level reports to the modeling level a fact about the world that deserves attention) and active changes of control decisions at other levels (for example, the modeling level asks the planning level to generate a plan for a new task). The levels operate in parallel but synchronously, using the agent's internal clock (Jennings N. et al., 1995), (Shen W. et al., 1996).

5 Ontology models development

In this chapter we present the research and development of the ontology models. We created several ontology models; their structure and their usage in the ontology-based data and metadata exchange repository are described in this chapter.

5.1 Implementation

The search algorithm is based on ontological models and on ranking of results. Access to the intelligent agents is provided through web services, implemented using the dependency injection and inversion of control programming paradigms.
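Dependency injection means that the web-facing service receives its connection to the agents from outside instead of creating it. The service can therefore run against the real agent platform or against a test double without being changed. The Java sketch below shows constructor injection. `AgentGateway`, `DatasetSearchService`, `RecordingGateway` and the goal name `simple_search` are hypothetical names chosen for illustration, not classes of the repository or of Jadex.

```java
import java.util.*;

/** Abstraction over the agent platform (hypothetical interface, not a Jadex API). */
interface AgentGateway {
    /** Dispatches a goal with parameters to the named agent and returns its result. */
    String dispatch(String agentName, String goal, Map<String, String> params);
}

/** Web-facing service: depends only on the gateway abstraction and never creates it itself. */
final class DatasetSearchService {
    private final AgentGateway gateway;

    DatasetSearchService(AgentGateway gateway) { // dependency injected by the caller or a container
        this.gateway = Objects.requireNonNull(gateway);
    }

    String simpleSearch(String query) {
        return gateway.dispatch("search", "simple_search", Map.of("query", query));
    }
}

/** Test double that records calls instead of contacting a real agent platform. */
final class RecordingGateway implements AgentGateway {
    final List<String> calls = new ArrayList<>();

    @Override
    public String dispatch(String agentName, String goal, Map<String, String> params) {
        calls.add(agentName + ":" + goal + ":" + params.get("query"));
        return "results for " + params.get("query");
    }
}
```

An inversion-of-control container would normally perform the wiring `new DatasetSearchService(gateway)`. Writing it by hand keeps the sketch free of framework-specific annotations.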
Web-service access makes a variety of access paths through web projects possible and yields a more flexible, robust architecture. During the work we analyzed methods of the agent-oriented approach and proposed design templates. We offered not only a general model of the search module but also its detailed architecture and implementation. The architecture of the intelligent search agent was proposed, and the possibility of its use was investigated in detail. Design patterns for the agent-oriented approach were developed and implemented.

The drawbacks mentioned above motivate the creation of a new public information repository for datasets. The repository uses intelligent agents and an ontological approach to store, convert, search, add, describe and select the information that researchers in data mining and machine learning need. Using Protégé 3.4 we created an ontology model of data mining methods, an ontology model of the user and a model of the resource (W3C, 2004). SDMX Standards Version 2.0 was chosen as the base standard, and the main parameters of the European statistical repositories were adopted. The interaction between the ontological models is based on intelligent agents: a coordinator agent, a resource agent, a search agent and a user agent. The agent approach was implemented with the Jadex multi-agent technology.

We use intelligent software agents, a new class of software systems that act either on behalf of the user or on behalf of the system. They are, in effect, a new level of abstraction, different from the usual abstractions such as classes, methods and functions. For the practical implementation of such agents, JADE offers the developer of agent systems the following possibilities: a FIPA-compliant agent platform, which includes the system agents AMS, ACC and DF; support for multiple domains (several DF agents); and others (IEEE Computer Society standards organization, 2006; Bellifemine, et al., 2006).
5.2 Ontology representation

In recent years ontologies, formal explicit descriptions of the terms of a domain and the relations between them, have been developed widely, and on the World Wide Web they have become commonplace. Ontologies on the Web range from large taxonomies that categorize websites (as on Yahoo!) to descriptions of products and their characteristics (as on Amazon.com). The WWW Consortium (W3C) develops RDF (Resource Description Framework), a language for encoding knowledge on Web pages that makes this knowledge understandable to electronic agents searching for information. Many disciplines now develop standard ontologies that domain experts can use to share and annotate information in their field. For example, in medicine large standard structured vocabularies such as the Unified Medical Language System (UMLS) have been created. Large general-purpose ontologies are also appearing: for example, the United Nations Development Programme and Dun & Bradstreet combined efforts to develop the UNSPSC ontology, which provides terms for goods and services.

An ontology defines a common vocabulary for researchers who need to share information in a subject area. It includes machine-interpretable definitions of the basic notions of the domain and the relations between them. Ontologies are developed to share a common understanding of the structure of information among people or software agents, to enable reuse of domain knowledge, to make domain assumptions explicit, to separate domain knowledge from operational knowledge, and to analyze domain knowledge.

5.3 Software tools for implementing the ontological model

Protégé 3.4 was selected as the tool for the data and metadata exchange repository. It was developed at Stanford University (USA) (Gennari J.H. et al., 2002). Protégé 3.4 is a meta-tool.
It helps users build knowledge-acquisition systems for a particular subject area, and experts can use these systems to enter and view information in electronic knowledge bases. The modular architecture of Protégé 3.4 greatly expands the class of systems that can be assembled for specific knowledge-acquisition tasks, so that knowledge-acquisition tools can be better adapted to the requirements of end users. The Protégé 3.4 developers say: "The system is open software. It is difficult to calculate the number of users ..." At the time of writing, the Protégé mailing list had nearly 9,000 subscribers, and over 20,000 users were registered on the Protégé website (although Protégé can be downloaded without registration). Eighty-five different plug-ins for Protégé can be downloaded from the site. The Protégé user community is very active and has representatives in more than 100 countries.

The editor's functionality is inextricably linked to the specific ontology model and to the knowledge expressed in the classification scheme vocabulary. The editor has a graphical interface that provides a visual editing mode. The graphical interface is based on the standard Object TreeView component, with substantial additional functionality, mainly in search, input and logical control. The functionality of the ontology editor includes: viewing and searching (grid view and standard types of search); editing (input, correction, deletion); logical control during input, which almost completely prevents violations of the defined description schemes; functionality testing by writing queries; and interaction with other ontologies (import and export, mainly through interchange formats).

5.4 Ontology source model (dataset ontology model)

Information about the data and metadata exchange repository is stored in the form of ontological models. One of the main classes of this model is «data set» (DataSet).
Each instance of this class contains information about a data set, including its name, analysis method, short description, information about its creators, and more. Two further classes belong to its structure: DataSetFile and Judge. The DataSetFile class contains information about the files (samples) that make up the data set, and the Judge class contains the evaluations of the data set given by different moderators.

5.4.1 Ontological source models development

Ontologies are developed and used to solve various problems. These include sharing by people or software agents and the accumulation and reuse of domain knowledge. They also include building models and programs that operate on an ontology rather than on rigidly defined data structures, and analyzing domain knowledge. For a more intelligent synthesis of information systems, one must define an ontology that describes the terminology used, together with a set of rules for using these terms in the context of other terms.

The basic building block of the dataset model is an assertion consisting of a resource, a named property and a value. In RDF terminology these three parts are the subject, the predicate and the object, respectively (W3C, 1999). The dataset source description was built in the Protégé 3.4 ontology environment; the classes and the attributes of selected classes are shown in Figure 5.1. Three classes were identified when developing the ontological model of repository resources. The class selection process was as follows. First, the domain and scope of the ontology were defined. Then the important terms of the source ontological model of the data and metadata exchange repository were listed: sample, method of analysis, attribute, subject area, data set description, dataset file, name, type, articles that refer to the dataset, keywords, author, date of creation.
From these terms we identified three classes, each with a set of slots, in the ontological model of the resource: DataSet, DataSetFile and Judge.

Figure 5.1: Classes and attributes of the source ontology model in Protégé 3.4

In this thesis the source ontology model is thus described by three classes.

5.4.2 Description of the classes of the source ontological model

Table 5.1 describes the slots of the DataSet class, which is used to describe a dataset.

Attribute | Type | Cardinality | Presence | Description
Abstract | String | Single | Mandatory | Introduction (short description)
AnalysisMethod | String | Multiple | Mandatory | Analysis method (refers to ontology elements of dataset analysis methods)
Area | String | Single | Mandatory | Data domain
AttributeAmount | Integer | Single | Mandatory | Number of attributes
AttributeInfo | String | Single | Mandatory | Attribute information
AttributeType | String | Multiple | Mandatory | Attribute types
CitedPaper | String | Multiple | Optional | Articles that refer to the dataset
Creators | String | Multiple | Mandatory | List of dataset creators (refers to a user ontology element)
DataSetInfo | String | Single | Optional | Dataset description
DataType | String | Multiple | Mandatory | Data type
DateDonated | String | Single | Mandatory | Date of data loading or of the last update
DownloadAmount | Integer | Single | Mandatory | Number of downloads
DSFiles | Instance of DataSetFile | Multiple | Optional | Dataset files
InstanceAmount | Integer | Single | Mandatory | Number of elements in the dataset
KeyWord | String | Multiple | Optional | Keywords
RelevantPapers | String | Multiple | Optional | Relevant papers
SolutionMethods | String | Single | Optional | Method of solution
Status | String | Single | Mandatory | Dataset status (new, low, middle, high)
DatasetMark | Float | Single | Optional | Average dataset estimation
Title | String | Single | Mandatory | Title
Table 5.1: DataSet class slots

Table 5.2 describes the slots of the DataSetFile class, which is used to describe dataset files.
Attribute | Type | Cardinality | Presence | Description
FileDescription | String | Single | Optional | File description
LastModified | String | Single | Mandatory | Date of the last upload or modification
Size | Float | Single | Mandatory | File size, KB
Title | String | Single | Mandatory | File name
Extension | String | Single | Mandatory | File type
Table 5.2: DataSetFile class slots

Table 5.3 describes the slots of the Judge class, which is used to describe evaluations of a dataset.

Attribute | Type | Cardinality | Presence | Description
Login | String | Single | Mandatory | Person who gave the mark (refers to a user ontology element)
Comments | String | Multiple | Mandatory | User comments about the dataset
Mark | Integer | Single | Mandatory | Mark (estimation)
Table 5.3: Judge class slots

Once the source ontology model has been defined, Protégé can convert the project into an RDF model.

5.5 Ontology data mining model

The data mining ontology model is an explicit specification of the subject area. It provides a vocabulary for representing and sharing knowledge about analysis methods and inference methods, together with the set of relationships established between the terms of this vocabulary. One advantage of using this ontology is a systematic approach to studying the subject area, achieved through systematicity (the ontology presents a holistic view of the subject area), uniformity (material represented in a single form is perceived and reproduced much better) and scientific rigor (constructing the ontology makes it possible to restore missing logical links).

5.5.1 Ontological data mining models development

Ontologies are used to support data processing at two levels: domain ontologies and task ontologies. Domain ontologies describe knowledge from the domains relevant to a particular task (Figure 5.2). The first step in ontology development is to define the domain and scope of the ontology itself: in our scenario the ontology covers the data mining domain.
To build a consistent ontology model, it is necessary to establish what the ontology will be used for and which types of questions the information in it should answer. The choice of how to structure the ontology determines what a system can know and reason about. We built our ontology through a characterization of data mining methods, classified on the basis of parameters that are useful for selecting the most suitable method for a KDD problem. The repository determines the characteristics of the data and of the desired mining result, and enumerates the DM processes that are valid for producing the desired result from the given data. The repository then helps the user choose which processes to execute, for example by ranking them heuristically according to what is important to the user. Results need to be ranked differently for different users; for example, one user may want to minimize run time in order to get results quickly. Other ranking criteria include accuracy, cost sensitivity and comprehensibility, and many combinations of them.

Figure 5.2: Data mining ontology (legend: property relation between concepts; subclass relation between concepts)

To solve data analysis problems in the presence of random and unpredictable effects, mathematicians and other researchers over the last two hundred years have produced a powerful and flexible arsenal of methods collectively called mathematical statistics. During this time extensive experience has been gained in applying these methods successfully in many spheres of human activity, from economics to space research, and under certain conditions these methods yield the optimal solution. For example, one of the problems solved in radiolocation is the detection of a known signal against additive interference in the form of white noise, and mathematical statistics methods solve this problem successfully.
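Ranking the valid DM processes "according to what is important to the user" can be illustrated with a weighted sum over the ranking criteria mentioned above, such as run time, accuracy and comprehensibility. The Java sketch below makes several assumptions. Each criterion score is normalized to [0, 1] with higher meaning better, so run time is expressed as a "speed" score. The weights come from the user's preferences. The candidate names and numbers are illustrative only. The thesis does not specify its ranking heuristic, so this is one possible choice.

```java
import java.util.*;

/** Heuristic ranking of candidate data-mining processes by user-weighted criteria (sketch). */
final class ProcessRanker {

    /** A candidate process with criterion scores normalized to [0, 1]; higher is better. */
    record Candidate(String name, Map<String, Double> scores) {}

    /** Weighted sum over the criteria the user cares about; a missing score counts as 0. */
    static double utility(Candidate c, Map<String, Double> weights) {
        double sum = 0.0;
        for (Map.Entry<String, Double> w : weights.entrySet()) {
            sum += w.getValue() * c.scores().getOrDefault(w.getKey(), 0.0);
        }
        return sum;
    }

    /** Returns candidate names ordered from best to worst for this user's weights. */
    static List<String> rank(List<Candidate> candidates, Map<String, Double> weights) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Candidate c) -> utility(c, weights)).reversed());
        List<String> names = new ArrayList<>();
        for (Candidate c : sorted) {
            names.add(c.name());
        }
        return names;
    }
}
```

Suppose a decision tree scores 0.8 for accuracy and 0.9 for speed, and a neural network scores 0.95 and 0.3. A user who weights only accuracy gets the neural network first. A user with weights 0.2 for accuracy and 0.8 for speed gets the decision tree first (0.88 against 0.43).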
For the signal-detection problem, it is difficult to imagine a need for other approaches.

Because knowledge is personal in nature, the same subject area can be described by different ontologies. This is particularly true of domains that are not formalized or that contain many contentious issues. One of the problems addressed in this work is the development of an ontology of data mining methods. It is certainly good practice to use existing ontologies, and a good specialist should be able to quickly find an existing, proven ontology or algorithm rather than spend time developing a new one. However, ontologies are often not clearly structured and formalized. There are many ontologies online, but not all of them are correct, and our study of existing data mining ontologies did not give a satisfactory result.

Figure 5.3: Data mining ontology model

Therefore, a new ontology was developed. Analysis of knowledge in the data mining domain is feasible because a declarative specification of its terms exists. Formal analysis of the terms will be extremely valuable both for reusing the developed ontology and for extending it. The need for the data analysis ontology arises from the AnalysisMethod slot of the DataSet class in the developed resource ontology model, which contains the data mining methods applicable to the dataset. An ontology together with a set of individual instances of its classes forms a knowledge base; in practice it is difficult to determine where the ontology ends and the knowledge base begins. The ontological model is presented in Figure 5.3.

5.6 Ontology user model

An ontological approach is proposed for creating the user model of the intelligent repository "data and metadata exchange repository". This approach takes into account the collection of concepts, and the connections between them, that arise when the user interacts with the repository. The user ontology model is a model for structuring data.
It stores information about the user. A user model is necessary for our repository because its users have different levels of computer skills and a variety of mental, psychological and physiological capabilities (Cargar, 2008; Waltz, 2008).

5.6.1 RDF model

The namespace http://dmr.kture.ua/dataset/ is defined for the conversion. Part of the RDF model is shown in Figure 5.4. A user ontology model and a data mining ontology model were also developed; the methods ontology is a classification of data mining methods.

The real value of RDF cannot be realized while it is used only for the internal purposes of a single program. The benefits of RDF appear when it becomes a means of interaction between systems and of data exchange, that is, when machines gain the ability to combine information obtained from different sources and thereby derive new information. The more applications on the Internet can work with the data, the higher its value.

Figure 5.4: Part of RDF model

The resulting RDF model is the metadata of the experimental datasets. A multi-agent system that works on the basis of this dataset metadata will be developed further.

5.6.2 Ontological user models development

Figure 5.5 shows the user ontology model in Protégé 3.4. The Protégé system offers the following possibilities: tabs for populating the ontology, functional extension modules, generation of knowledge-acquisition modules, queries, and a logical inference module.

Figure 5.5: User ontology model

This ontological model includes two abstract classes: Account and Person. The Account class represents the user as a logical entity of the system, and the Person class represents the user as the person using the system. The Beginner and Experienced classes represent beginner and advanced users, respectively. The Admin class has the same slots as the Experienced class.
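The subject-predicate-object structure of the dataset model can be made concrete by writing one DataSet instance as RDF statements. The thesis obtains its RDF through Protégé's export. The Java sketch below instead builds N-Triples lines by hand, without an RDF library, to show the structure of the statements. It relies on three assumptions: the namespace http://dmr.kture.ua/dataset/ is used for both instances and properties, the slot names from Table 5.1 serve as property names, and every value is a plain literal. Literal escaping covers only backslash, quote and line breaks.

```java
import java.util.*;

/** Writes dataset metadata as N-Triples lines (sketch; no RDF library, minimal escaping). */
final class DatasetTriples {
    static final String NS = "http://dmr.kture.ua/dataset/";

    /** One RDF statement: subject (resource), predicate (named property), object (value literal). */
    record Triple(String subject, String predicate, String object) {
        String toNTriples() {
            return "<" + subject + "> <" + predicate + "> \"" + escape(object) + "\" .";
        }
    }

    /** Escapes the characters that cannot appear unescaped inside an N-Triples string literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r");
    }

    /** Turns the slot/value pairs of one DataSet instance into triples about that instance. */
    static List<Triple> describe(String datasetId, Map<String, String> slots) {
        String subject = NS + datasetId;
        List<Triple> out = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(slots).entrySet()) { // sorted for stable output
            out.add(new Triple(subject, NS + e.getKey(), e.getValue()));
        }
        return out;
    }
}
```

For a hypothetical instance iris with the slots Area = "Life" and Title = "Iris", the first line produced is `<http://dmr.kture.ua/dataset/iris> <http://dmr.kture.ua/dataset/Area> "Life" .`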
5.6.3 Description of the classes of the ontological user model

Table 5.4 describes the slots of the Address class.

Attribute | Type | Cardinality | Presence | Description
country | String | 1 | Mandatory | Country
city | String | 1 | Optional | City
Table 5.4: Address class slots

Table 5.5 describes the slots of the University class, which is used for the basic description of a university.

Attribute | Type | Cardinality | Presence | Description
name | String | 1 | Mandatory | University name
address | Address | 1 | Optional | University address
Table 5.5: University class slots

Table 5.6 describes the slots of the Preference class, which is used to describe the user's interests and search requests.

Attribute | Type | Cardinality | Presence | Description
interest | DataMiningMethod | * | Optional | Data format
search | String | * | Optional | Set of the user's search requests
searchHistory | SearchHistory | * | Optional | Set of results of the user's search requests
Table 5.6: Preference class slots

The slots of the abstract class Account are given in Table 5.7.

Attribute | Type | Cardinality | Presence | Description
password | String | 1 | Mandatory | Password
created | String | 1 | Mandatory | Date of creation
email | String | 1 | Mandatory | E-mail
preferences | Preference | 1 | Optional | Information about preferences
title | String | 1 | Optional | Display name
Table 5.7: Slots of abstract class Account

The Account class is the base of the user representation. The slots of the abstract class Person, whose base class is Account, are given in Table 5.8.

Attribute | Type | Cardinality | Presence | Description
first_name | String | 1 | Optional | Name
last_name | String | 1 | Optional | Surname
gender | Symbol (Male, Female) | 1 | Optional | Sex (male/female)
university | University | 1 | Optional | Information about university
Table 5.8: Slots of abstract class Person

Person is in turn the base class of the Beginner and Experienced classes. The Beginner class has the same slots as Person. The slots of the Experienced class are given in Table 5.9.
Attribute | Type | Cardinality | Presence | Description
speciality | String | * | Mandatory | Speciality
Table 5.9: Experienced class slots

6 Intelligent search agent design and development

The search agent is one of the basic concepts of this thesis, and this chapter presents a detailed description of its development. Agents use the knowledge (beliefs) mechanism to store internal data. There are two types of knowledge: atomic knowledge (belief) and sets of knowledge (beliefset). Agents use the goals and plans mechanism to accomplish their assigned tasks. All actions of agents are directed at their goals: depending on the current goal, the agent executes an appropriate plan or a series of plans.

6.1 Agent implementation

At startup, each agent of the system executes WebRegistrationPlan, which registers the agent in the system so that it can later be called through a web service. A separate web service is provided for each agent in the system, with methods that initiate the relevant goals of the agent. The agents execute the plans that fulfil these goals, interact with each other and return the result to the web service. The agent platform creates each agent class from the agent description: it analyzes the description and verifies that the classes of the agent's plans are available. Thus, the agent class diagram includes the set of the agent's plan classes. If the agent uses a capability, the diagram must also reference both the capability's description file and its class files. Goals and plans of agents correspond to each other by the following rule: if an agent has a goal named xxx, the corresponding plan is named xxx_plan. All plan class names also end with the word Plan. Parameters are passed to the plans through a goal-mapping mechanism: the parameters set for the agent's goals are mapped onto the plan parameters.
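The naming rules above can be checked mechanically: a goal xxx is handled by a plan xxx_plan, and plan class names end with Plan. The Java sketch below applies these rules and mimics the platform's check that every plan class named in the agent description is available. Two points are assumptions for illustration, not the Jadex API or the thesis's code: the class name is derived by converting the snake_case goal name to CamelCase, and a goal is treated as a string.

```java
import java.util.*;

/** Goal-to-plan naming convention and an availability check (sketch; not the Jadex API). */
final class GoalPlanConvention {

    /** Rule from the text: goal "xxx" is handled by the plan named "xxx_plan". */
    static String planName(String goal) {
        return goal + "_plan";
    }

    /** Assumed class-name mapping: "simple_search" becomes "SimpleSearchPlan". */
    static String planClassName(String goal) {
        StringBuilder sb = new StringBuilder();
        for (String part : goal.split("_")) {
            if (part.isEmpty()) {
                continue;
            }
            sb.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
        }
        return sb.append("Plan").toString();
    }

    /** Mimics the platform check: returns the goals whose plan class is not available. */
    static List<String> goalsWithoutPlans(List<String> goals, Set<String> availablePlanClasses) {
        List<String> missing = new ArrayList<>();
        for (String goal : goals) {
            if (!availablePlanClasses.contains(planClassName(goal))) {
                missing.add(goal);
            }
        }
        return missing;
    }
}
```

Under this mapping, a hypothetical goal web_registration resolves to the class name WebRegistrationPlan, which matches the plan named in the text.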
The following rule was also adopted for goals: if a goal returns a single parameter, that parameter is named result. Agents interact through the standard Directory Facilitator mechanism. When an agent starts, it executes WebRegistrationPlan and the agent registration goal, so that it can receive messages from the other agents of the system. When an agent searches for agents of a given type, it uses the Service Name mechanism.
6.2 Intelligent search agent
Searching the data and metadata exchange repository requires a dedicated search module. This module should take into account the current state of the system and various search criteria in order to adopt a suitable search strategy. One of the most suitable solutions to this problem is a goal-based intelligent agent. Such an agent does not merely react when a request arrives; it decides which actions are needed to achieve its goals given the current state of the environment. The agent cannot fully observe the environment in which it executes. The search module of the "data and metadata exchange repository" must support simple, advanced and personal search. At the same time, the search agent operates in a multi-agent environment, so it must communicate with other agents and act on the environment in which it executes. From these two points of view, the functionality of the search agent can be divided into functionality with respect to the user and functionality with respect to other agents and the execution environment. Functionality with respect to the user should include the following basic set: executing only the simple search for a non-authorized user, executing the various kinds of search for an authorized user, and providing useful services that support the search (Figure 6.1).
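The division of user-facing functionality can be captured as a small access rule. This is a minimal sketch under stated assumptions: `SearchKind` and `SearchAccess` are hypothetical names, and the three kinds are taken from the list of simple, advanced and personal search above.

```java
import java.util.EnumSet;
import java.util.Set;

// Kinds of search named in section 6.2 (illustrative enum).
enum SearchKind { SIMPLE, ADVANCED, PERSONAL }

// Non-authorized users may run only the simple search;
// authorized users may run every kind of search.
final class SearchAccess {
    private SearchAccess() { }

    static Set<SearchKind> allowed(boolean authorized) {
        return authorized ? EnumSet.allOf(SearchKind.class) : EnumSet.of(SearchKind.SIMPLE);
    }

    static boolean mayRun(boolean authorized, SearchKind kind) {
        return allowed(authorized).contains(kind);
    }
}
```

Keeping the rule in one place means the web service can reject an advanced search from a guest before any goal is dispatched to the agent.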
Both authorized and non-authorized users may execute the simple search. In both cases the system shows the most popular data sets in the repository or the most popular queries (those most often made by users of the repository), and the results may also include hints in the form of contextual search queries (queries correlated with the current one). However, the authorized user has more privileges than the non-authorized one: advanced search (search by various data set criteria) is available to the authorized user only. When the user performs a search, some of the recent queries are displayed. Information about the user's requests and their results is stored by the personal agent and is used later to provide the user with more relevant results based on previous requests.
Figure 6.1: The use case diagram in terms of user
On the other hand, the search agent must interact with other agents to achieve its goals: to provide them with useful information, to request the necessary information from them, or to ask them to provide a service. Figure 6.2 shows the use case diagram in terms of interaction between agents.
Figure 6.2: The use case diagram in terms of interaction between agents
The personal agent of the user obtains the results and the request itself from the search agent, and also records the sequence of the user's transitions until the user finds the necessary dataset. The personal agent stores this information so that the search agent can use it later.
6.3 The search agent goals
The search agent is based on the concept of goals. One could say that the main goal of the search agent is information search, but such a goal is too abstract and needs to be refined. It can therefore be divided into two simpler goals: simple search and advanced search. These goals should in turn be divided into subgoals.
A person who sets significant objectives considers what must be done to achieve them, subdivides them into more local objectives, and builds a sequence of actions. In the same way, the goals of the agent can be divided into subgoals to build a hierarchy of goals and actions, with extra goals assigned that together lead to the main objective: relevant search results. Analysis of the search agent functions identifies several goals that are explicitly or implicitly included in the primary search goal (Figure 6.3). The goals explicitly included in the main goal are simple search and advanced search. Goals such as «Get the popular datasets» and «Get contextual queries» are not search goals in the strict sense, but they are subsidiary goals that help the user in the search.
Figure 6.3: Search agent main goals
The simplest search strategy is represented by the «Simple search» goal (Figure 6.4). This option is available to both authorized and non-authorized users, but some of its subgoals vary depending on the state of the user in the system. Consider the strategy when the user is not authorized. In this case the «Simple search» goal includes «Search data model», «Gradation results by popularity» and «Saving the query string». The «Search data model» goal has a subgoal, «Create a request to the data model», whose task is to form a request to the ontological model.
Figure 6.4: The diagram of simple search goals
Advanced search is a more complex strategy, which must take into account several states of the environment and interact with the personal agent of the user (Figure 6.5). In this option the subgoal «Formation query data model» is based on previous queries and user preferences, so the search agent interacts with the personal agent to obtain the previous requests.
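The goal hierarchy of the simple search can be represented as a tree. The sketch below is illustrative: the goal labels are copied from the text, while the `Goal` record and the post-order walk are assumptions. Listing each subgoal before the goal that depends on it gives one plausible order in which the goals could be achieved, not necessarily the order the Jadex engine uses.

```java
import java.util.ArrayList;
import java.util.List;

// A goal with its subgoals (illustrative model of Figure 6.4).
record Goal(String name, List<Goal> subgoals) {
    Goal(String name, Goal... subgoals) {
        this(name, List.of(subgoals));
    }

    // Post-order walk: subgoals first, then the goal they serve.
    List<String> achievementOrder() {
        List<String> order = new ArrayList<>();
        collect(this, order);
        return order;
    }

    private static void collect(Goal goal, List<String> out) {
        for (Goal sub : goal.subgoals()) {
            collect(sub, out);
        }
        out.add(goal.name());
    }
}

class GoalTreeDemo {
    // Simple search for a non-authorized user, as described in section 6.3.
    static Goal simpleSearchForGuest() {
        return new Goal("Simple search",
                new Goal("Search data model",
                        new Goal("Create a request to the data model")),
                new Goal("Gradation results by popularity"),
                new Goal("Saving the query string"));
    }

    public static void main(String[] args) {
        simpleSearchForGuest().achievementOrder().forEach(System.out::println);
    }
}
```

For an authorized user, the same structure would swap in the «Formation query data model» subgoal, which depends on data obtained from the personal agent.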
After the search on the model, the search agent asks the personal agent to save data such as the query string and the results found. Another goal of the search agent is ranking the results by their popularity in the system, which lets it present the most relevant results to the user. The ranking can change over time as the search agent finds more relevant data.
Figure 6.5: The scheme of advanced search goals
6.4 Search agent outline
Goals are the basic Jadex concept, but in this system a goal is a rather abstract notion: Jadex agents use plans to achieve goals. Almost every goal of the search agent has a corresponding plan (Figure 6.6). All plans of the agent extend a more general plan, AbstractCommunicationPlan, which includes the methods and information necessary for communication between agents. This class in turn extends another important type, AbstractDBAccessPlan, which provides access to the database and the ontological models. The SimpleSearchPlan class contains the logic for processing a simple query from the user: it runs the required subgoals of the search agent and performs the necessary steps before and after the simple search goal. The ExtendedSearchPlan class contains the logic for processing an advanced search. Its logic is considerably more complex than that of the previous plan, but the basic principles are the same. Both plans inherit from the AbstractSearchPlan base class, which contains the functionality common to both.
Figure 6.6: The structure of the search agent plans
6.5 Search agent ADF
An agent cannot exist without an ADF (agent definition file), which is the central artifact in Jadex. This file describes the search agent: its goals, plans and knowledge. Figure 6.7 shows the XML structure in the search.agent.xml file that contains a partial definition of the search agent goals.
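The plan hierarchy of section 6.4 can be sketched as an inheritance chain. This is not the Jadex API: `BasePlan` is a stand-in for the platform's plan class (Jadex plans implement a `body()` method), the in-memory popularity map replaces the database, and the substring matching is a deliberately naive assumption. The chain itself (AbstractDBAccessPlan, AbstractCommunicationPlan, AbstractSearchPlan, SimpleSearchPlan) follows the text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Stand-in for the platform's plan base class.
abstract class BasePlan {
    abstract void body();
}

// Database/ontology access; here a map from dataset name to popularity.
abstract class AbstractDBAccessPlan extends BasePlan {
    protected final Map<String, Integer> datasetPopularity;
    AbstractDBAccessPlan(Map<String, Integer> datasetPopularity) {
        this.datasetPopularity = datasetPopularity;
    }
}

// Inter-agent messaging; this sketch only records outgoing messages.
abstract class AbstractCommunicationPlan extends AbstractDBAccessPlan {
    protected final List<String> outbox = new ArrayList<>();
    AbstractCommunicationPlan(Map<String, Integer> store) { super(store); }
    protected void send(String event, String content) { outbox.add(event + ":" + content); }
}

// Functionality shared by simple and advanced search.
abstract class AbstractSearchPlan extends AbstractCommunicationPlan {
    protected final String query;
    protected List<String> result = List.of();
    AbstractSearchPlan(Map<String, Integer> store, String query) { super(store); this.query = query; }

    // Naive case-insensitive matching, then ranking by popularity (most popular first).
    protected List<String> searchAndRank() {
        return datasetPopularity.keySet().stream()
                .filter(name -> name.toLowerCase().contains(query.toLowerCase()))
                .sorted(Comparator.comparing((String name) -> datasetPopularity.get(name)).reversed())
                .toList();
    }
}

class SimpleSearchPlan extends AbstractSearchPlan {
    SimpleSearchPlan(Map<String, Integer> store, String query) { super(store, query); }

    @Override
    void body() {
        result = searchAndRank();
        // Ask the personal agent to store the query (event name from Table 6.2).
        send("add_to_search_history", query);
    }
}
```

An ExtendedSearchPlan would extend AbstractSearchPlan in the same way and add the exchange with the personal agent before calling `searchAndRank()`.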
The goals declared in search.agent.xml describe the parameters that must or may be passed to the search agent to achieve the goal. The search agent has several kinds of goals. The main ones are goals that must be achieved, for example simple or advanced search. Another kind are goals that help maintain a certain state; for example, a goal that processes statistical data on the most popular queries to the system is executed periodically. Simple and advanced search are declared in the agent ADF file as achieve goals. Such goals aim at reaching an abstract target state: a search goal is satisfied when the results have been found and processed.
Figure 6.7: The search agent goals description
6.6 Search agent "Belief"
The search agent retains some knowledge in order to fulfill its goals and to respond to changes in the environment. This knowledge also affects the agent's motives and may influence the goals it adopts. The search agent stores the following information: the query to the search subsystem; requests to the system; the most popular queries to the system; information about the user session for which the agent was created; simple rules relating queries and results; the search history; user preferences. The internal information of the search agent also includes the data needed to access the database and the repository of ontological models. Table 6.1 describes the knowledge of the search agent.
Name | Type | Description
searchQuery | Belief, String | Search request to the search subsystem
searchQuerys | Belief, List<String> | Queries to the system
popularQuerys | Belief, List<String> | The most popular queries to the system
sessionInfo | Belief, SessionInfo | Information about the user session for which the agent was created
rules | Belief, ProductionRule | Simple rules relating queries and results
userInterests | Beliefset, String | User preferences
Table 6.1: The search agent knowledge description
6.7 Search agent interaction with other agents
One of the key advantages of multi-agent technology is the communication between agents. The search agent interacts with other agents in the system to provide the user with useful contextual information and to tailor the results to each user individually. For this purpose it interacts with the profile and manager agents, requesting information about the user, the user's preferences, the query history, and possible results of the queries (if the same request has been made before, the results of the previous search can be reused). Table 6.2 lists some of the events of the search agent.
Name | Direction/Type | Description
get_search_history | send/request | Request for the history of user queries
get_search_history_results | send/request | Request for the history of search results
get_interests | send/request | Request for the user preferences
add_to_search_history | send/info | Message to save a request in the history
add_search_history_result | send/info | Message to save search results in the history
Table 6.2: The search agent events
6.7.1 Search agent scenario
Intelligent agents are a new class of software and hardware entities that act on behalf of the user to find and process information. Based on its current knowledge and on events in the environment, the search agent selects a plan to achieve its goal: finding the most relevant data for the user's request.
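Two beliefs of Table 6.1, searchQuerys and popularQuerys, are linked by the periodic goal mentioned in section 6.5, which keeps the list of popular queries up to date. The sketch below shows only the recomputation step. The class name, the normalization of queries (trim and lower-case) and the alphabetical tie-break are assumptions, and the periodic trigger itself is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative holder for two search-agent beliefs from Table 6.1.
class SearchAgentBeliefs {
    final List<String> searchQuerys = new ArrayList<>(); // all queries to the system
    List<String> popularQuerys = List.of();              // the most popular queries

    void recordQuery(String query) {
        searchQuerys.add(query.trim().toLowerCase());
    }

    // Recompute the k most frequent queries; ties are broken alphabetically
    // so that the result is deterministic.
    void refreshPopular(int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String q : searchQuerys) {
            counts.merge(q, 1, Integer::sum);
        }
        popularQuerys = counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .toList();
    }
}
```

Replacing the whole list on each refresh (rather than editing it in place) means a reader of the belief always sees a consistent snapshot.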
The structure of the search agent (illustrated in Figure 6.8) enables it to search the repository effectively.
Figure 6.8: Search Agent structure
6.7.2 User agent scenario
The basic information unit of the personal agent is the ontology model of the user (section 5). At the agent level, the user is represented by the object model of the user ontology. The main objective of the personal agent is to pass information about users to other agents and to deliver the necessary information from the other system agents to the user. The personal agent should therefore be able to answer queries from other agents of the "data and metadata exchange repository" system and to modify the user profile while the user works with the system. Based on the information and ontology model of the user, the personal agent should be able to answer questions related to the user. Information about the user falls into two parts: personal information about the user, and information about the user's current goals. In general, the personal agent should be able to answer the following questions: what is the user's name; what is the user's e-mail address; which language does the user prefer; what are the user's current goals; is the user advanced or a beginner (naïve, simple); which academic institutions does the user belong to; what are the user's localization preferences; do the user's interests coincide with those of other users in the system; what are the user's recent requests. The personal agent applies the developed information and ontology model of the user to answer the questions that other agents may ask while the user works with the system. The user agent is created after user authentication. Figure 6.9 shows the user agent functionality.
Figure 6.9: The use case diagram for user agent
When authentication and authorization are completed, the user agent loads information about the user into its knowledge, which later allows the user and other agents to access this information quickly. The user agent keeps this information in its knowledge while the user is active in the system; before terminating, the agent writes this information back to the database. There are as many user agents in the system as there are users who have passed authentication in the current period. If the user does not interact with the agent for more than 30 minutes, the user agent removes the search agent and itself from the system. The user assigns the following tasks to the user agent: storing the user's personal data; changing user data; saving the user's search queries to the system; recording user activity in the system; tracking the status of the user in the system; recording the data sets uploaded into the system; recording the data sets downloaded from the system; communication of the user with other users of the system.
The user agent internal information. The user agent stores the following information in its knowledge:
• personal information about the user;
• type of user;
• history of search queries;
• history of user query results;
• new search results;
• interests of the user.
Table 6.3 shows the user agent knowledge and its types. All knowledge of the agent is loaded on the corresponding request. All data are stored in the database or synchronized with it at the end of the agent's operation. The user agent knowledge may also be expanded dynamically during the agent's lifetime.
Name | Type | Description
Person | Belief, Account | Personal information about the user
account_type | Belief, Class | Type of user
Search_history | Belief, History | History of search queries
Search_results_history | Belief, SearchHistory | History of the user query results
new_search_results_history | Belief, SearchHistory | New search results
Interests | Beliefset, String | Interests of the user
Table 6.3: The User agent knowledge
The input data for the user agent. The personal agent receives the following data from the user:
• personal information to be updated;
• information about the data sets downloaded from the system.
End users of the system interact with the user agent: the user interface invokes the web service of the user agent, which, based on information about the current client session, sends an invocation to the personal agent. The personal agent receives the following information from the search agent:
• user queries;
• the list of user query results.
The search agent informs the user agent about the results of the queries the user has executed. This gives quick access to these data during further use of the system, without additional requests to the database. The personal agent receives from the source agent information about the data sets the user uploads into the system: the source agent informs the user agent when the user uploads data sets, and the user agent saves references to these data in its knowledge. This gives the user quick access to editing these data sets.
The user agent output data. The personal agent provides the user with the following information:
• the user's personal information;
• the user's search queries;
• history of the search query results;
• a list of data sets uploaded into the system;
• a list of data sets downloaded from the system;
• a list of users who have similar interests and are registered in the system.
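The last item of the list, users with similar interests, is produced by CommunicatePlan and CommunicateResponsePlan, described later in this section: the requesting agent queries every user agent, and each one checks its own user's interests. The sketch below replaces message passing with direct method calls, and its matching criterion (at least one shared interest, ranked by the number of shared interests) is an assumption, since the thesis only says the interests must correspond.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative user agent that can answer and issue interest queries.
class UserAgentPeer {
    final String user;
    final Set<String> interests;

    UserAgentPeer(String user, Set<String> interests) {
        this.user = user;
        this.interests = interests;
    }

    // CommunicateResponsePlan: how many of the requested interests does my user share?
    int respond(Set<String> requested) {
        Set<String> shared = new HashSet<>(interests);
        shared.retainAll(requested);
        return shared.size();
    }

    // CommunicatePlan: ask every other user agent, keep those with any overlap,
    // most similar first (List.sort is stable, so ties keep the query order).
    List<String> findSimilarUsers(List<UserAgentPeer> allAgents) {
        List<UserAgentPeer> matches = new ArrayList<>();
        for (UserAgentPeer other : allAgents) {
            if (other != this && other.respond(interests) > 0) {
                matches.add(other);
            }
        }
        matches.sort((x, y) -> Integer.compare(y.respond(interests), x.respond(interests)));
        List<String> names = new ArrayList<>();
        for (UserAgentPeer m : matches) {
            names.add(m.user);
        }
        return names;
    }
}
```

In the real system each `respond` call is a message to another agent, so the requesting agent must also handle agents that do not answer; the sketch ignores that case.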
By obtaining this information from the user agent, the user has quick access to the data sets he has downloaded from the system. Users of the system can learn about each other from the data stored by the user agents. For example, advanced users who list scientific interests in their personal information can find the query results and activity of other users with similar interests. The agent sends requests to the agents of other users and, based on the results obtained, builds the requested information about the users of the system.
The personal agent provides the search agent with the following information:
• user interests;
• list of user requests;
• list of user query results.
The user agent interaction with other agents. The personal agent interacts with the manager (coordinator) agent for one task: sending a request to change the status of the user. The personal agent interacts with the search agent for the following tasks:
• obtaining information about the search query;
• obtaining information about the query;
• sending the query results on keywords;
• sending the user interests;
• removing the search agent.
The personal agent interacts with the source agent for the following task: obtaining information about the data sets uploaded by the user. Personal agents interact with each other to exchange information about the users of the system. The user agent responds to the events described in Table 6.4 to interact with the agents of the system.
Name of event | Type of event | Description
get_search_history | receive | Returns the lines of the user's query history. The search agent initializes this message.
get_search_history_results | receive | Returns the full query history. The search agent initializes this message.
get_interests | receive | Returns the user interests. The search agent or the source agent initializes this message.
add_to_search_history | receive | Adds information about a user query. The search agent initializes this message.
add_search_history_result | receive | Adds information about user query results. The search agent initializes this message.
change_user_type | send | Request to change the user type. The source agent initializes this message.
get_user_info_by_interest | receive | Returns the users matching the given interests. A user agent initializes this message.
add_user_download_dataset | receive | Saves a downloaded dataset. The source agent initializes this message.
add_user_upload_dataset | receive | Saves a dataset uploaded into the system. The source agent initializes this message.
Table 6.4: The agent user events
The diagram of the user agent classes is shown in Figure 6.10. Every goal of the agent is mapped to a plan, and each plan is a class with at least one "body" method. LoadUserPlan is executed during user authentication; it loads all data about the user from the database, including the history of the user's queries and their results. To view information about the user, the following sequence of actions occurs in the system: the user opens the profile view (Figure A.5); the system invokes a web service to work with the user agent; the web service initiates an agent goal, which is to get the user information; the agent executes a plan to achieve the goal; the results are returned to the web service as the result of achieving the goal; the web service returns the corresponding result to the web interface.
Figure 6.10: The user agent class diagram
To update information about the user, the following sequence of actions occurs in the system: the user enters the updated information (Figure A.6); the system invokes a web service to work with the user agent; the web service initiates an agent goal, which is to update the user information; the agent executes a plan to achieve the goal; the results are returned to the web service as the result of achieving the goal.
The CommunicatePlan and CommunicateResponsePlan plans are used to search for users with relevant interests. They work as follows: the user requests the information; the user agent initiates CommunicatePlan, which queries all user agents in the system; on each queried user agent, CommunicateResponsePlan is initiated, which checks whether the interests of its user correspond to the requested interests and returns the result to the requesting agent; the requesting agent returns the information to the web service. The SaveUserDownloadPlan and SaveUserUploadPlan plans are executed when the personal agent receives a request from the source agent. These plans increment the number of datasets the user has uploaded to and downloaded from the system.
6.7.3 Coordinator [Manager] agent scenario
The manager agent manages the overall system and handles the registration and authorization of users in the "data and metadata exchange repository". The manager agent always serves queries from users and from other agents. Only a single instance of the manager agent exists in the system, and its processing is parallelized by the agent platform. Figure 6.11 schematically shows the functionality the manager agent provides from the user's point of view.
Figure 6.11: The use case diagram for coordinator agent
The user assigns the following tasks to the manager agent: user registration in the system; obtaining information about universities; user authentication and authorization in the system; changing the user status; granting administrator privileges to a user; obtaining information about user activity in the system; obtaining information about the users of the system; deleting users. There are two main types of users in the system: authenticated and non-authenticated. The main difference between them is that non-authenticated users have limited capabilities: they work with permanent agents only. A dedicated user agent is created for each authenticated user, and an additional search agent is created after a search query. Authenticated users are divided into the following groups: beginner, advanced user and administrator. The manager agent stores the following internal information about the current state of the system: the number of users who use the system; the number of beginners who use the system; the number of advanced users who use the system; research methods; general information about the universities of the system; the number of administrators who use the system. When a user is authenticated, the manager agent updates its internal data according to the type of the user. Table 6.5 shows the manager agent knowledge and its types.
Name | Type | Description
login_users | Belief, Integer | The number of users who use the system
beginners_online | Belief, Integer | The number of beginners who use the system
experienced_online | Belief, Integer | The number of advanced users who use the system
admins_online | Belief, Integer | The number of administrators who use the system
generalData | Belief, Object | General information about the universities of the system, countries, cities
methods | Belief, Object | The list of methods available in the system
Table 6.5: Coordinator agent knowledge
The manager agent receives information from end users and from the environment. Input information from end users of the system: account and password; user registration data; the account of a user to be given administrator status; the account of a user to be deleted. End users interact with the manager agent by invoking the manager web service from the user interface. System administrators remove users from the system by sending a request to the manager agent. For a user to obtain administrator rights, another administrator must send a request to create a new administrator account on the basis of the existing one. Input information from the user agent: the account of the user whose status should be changed. The transition from one status to another is carried out by the manager agent at the request of the user agent of that user. The transition algorithm is as follows: the user agent monitors the user's activity in the system and, once the user has gained some experience, prompts the user to raise his status and receive additional options. If the user agrees, the agent sends a request to the manager agent to change the user type. The user is also invited to enter additional information about himself to obtain additional options. The manager agent sends information to other agents and passes information to the end user through a web service.
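The status transition can be sketched as two cooperating objects: a user-side monitor that decides when to offer the upgrade, and the manager's per-type counters of Table 6.5. The thesis does not say how "some experience" is measured, so the action-count threshold below is a hypothetical parameter, and all class names are illustrative; the total number of users is derived here rather than stored as a separate login_users belief.

```java
import java.util.EnumMap;
import java.util.Map;

enum UserType { BEGINNER, EXPERIENCED, ADMINISTRATOR }

// Counterparts of beginners_online, experienced_online and admins_online.
class ManagerAgentState {
    final Map<UserType, Integer> online = new EnumMap<>(UserType.class);

    // Total of all groups (plays the role of login_users in this sketch).
    int loginUsers() {
        return online.values().stream().mapToInt(Integer::intValue).sum();
    }

    void userLoggedIn(UserType type) {
        online.merge(type, 1, Integer::sum);
    }

    // Handles change_user_type: move one online user between groups.
    void changeUserType(UserType from, UserType to) {
        online.merge(from, -1, Integer::sum);
        online.merge(to, 1, Integer::sum);
    }
}

class UserAgentMonitor {
    private final int activityThreshold; // hypothetical measure of "some experience"
    private int actions;
    UserType type = UserType.BEGINNER;

    UserAgentMonitor(int activityThreshold) { this.activityThreshold = activityThreshold; }

    void recordAction() { actions++; }

    // Offer the upgrade only to beginners who have been active enough.
    boolean shouldOfferUpgrade() {
        return type == UserType.BEGINNER && actions >= activityThreshold;
    }

    // The user accepted: send the change request to the manager agent.
    void acceptUpgrade(ManagerAgentState manager) {
        manager.changeUserType(type, UserType.EXPERIENCED);
        type = UserType.EXPERIENCED;
    }
}
```

Keeping the decision in the user agent and the bookkeeping in the single manager instance matches the division of responsibilities described above.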
Output information for the end user: the list of universities in the system; the list of countries; the list of cities; the registration result; the authentication result; the result of the transition to administrator status; the result of user removal; personal data of the system users; the number of users who use the system; the number of beginners who use the system; the number of advanced users who use the system; the number of administrators who use the system. When registering a new user, the manager agent checks that the user name is unique and, if it is, stores the user in the system. Output information for the user agent: user agent creation; the result of the user status change. The manager agent interacts with the user agent to perform the following tasks: creating the user agent; changing the user status. To interact with the system agents, the manager agent handles the events described in Table 6.6.
Name of event | Type of event | Description
change_user_type | receive | User status change. The user agent initializes this message when it wants to change the user status.
recalculate_raiting | send | Message sent from the source agent to the user after the user is transferred to a new status.
Table 6.6: The coordinator agent events
The diagram of the manager agent classes is shown in Figure 6.12. LoginPlan performs the following sequence of actions. The login and password are set as parameters of the plan's goal. When the plan starts, these parameters are used to build a query to the user database. If the user is not found, or the password does not match the one stored in the system, the plan sets the corresponding result in the output parameter. If the user is found, the user ontology model is added to the agent's internal representation for further use.
Figure 6.12: The diagram of Manager Agent classes
To log the user into the system, the following sequence of actions occurs: the user enters a username and password (Figure A.1); the system invokes a web service to work with the manager agent; the web service initiates a goal of the agent, which is to authenticate the user; the agent executes a plan to achieve the goal; the results are returned to the web service as the result of achieving the goal; the web service returns the corresponding result to the web interface. There are two registration options: beginner registration and advanced user registration. Administrator registration is not offered in either case, because administrator rights must be granted by other administrators. The registration sequence is as follows: the user selects the type of registration (advanced user or beginner); a beginner fills in his personal data (Figure A.2); an advanced user also fills in information about his interests (Figure A.3); the system invokes a web service to work with the manager agent; the web service initiates a goal, which is to register the user; the agent executes a plan to achieve the goal; the results are returned to the web service as the result of achieving the goal; the web service returns the corresponding result to the web interface. The GetUsersInfoPlan and GetMethodsInfoPlan plans are executed once when the system starts, building a query to the database. They retrieve data about the research methods represented in the system, the universities whose researchers take part in the project, and the cities and countries where those universities are located. After loading, the agent stores this information in its knowledge, and all subsequent invocations of the corresponding goals return these data from the knowledge. Information about the methods is available on the home page (Figure A.4).
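The registration check, the LoginPlan logic and the load-once behaviour of GetMethodsInfoPlan can be summarized in one small class. This is a sketch under stated assumptions: an in-memory map replaces the user database, `Optional.empty()` stands for the "not found or wrong password" output parameter, and all names are hypothetical. Passwords are kept in plain text only to keep the sketch short; a real system should store password hashes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Illustrative manager-agent logic: registration, login, cached reference data.
class ManagerAgentSketch {
    record StoredUser(String login, String password, String type) { }

    private final Map<String, StoredUser> users = new HashMap<>();
    private List<String> methodsCache; // filled on first request, then reused

    // Registration: the user name must be unique.
    boolean register(String login, String password, String type) {
        if (users.containsKey(login)) {
            return false;
        }
        users.put(login, new StoredUser(login, password, type));
        return true;
    }

    // LoginPlan: look up the user and compare passwords.
    Optional<StoredUser> login(String login, String password) {
        StoredUser user = users.get(login);
        if (user == null || !user.password().equals(password)) {
            return Optional.empty();
        }
        return Optional.of(user);
    }

    // GetMethodsInfoPlan: query the database once, answer later calls from knowledge.
    List<String> methods(Supplier<List<String>> loadFromDatabase) {
        if (methodsCache == null) {
            methodsCache = List.copyOf(loadFromDatabase.get());
        }
        return methodsCache;
    }
}
```

Returning an `Optional` keeps the two failure cases (unknown user, wrong password) indistinguishable to the caller, which avoids revealing which user names exist.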
The SetupAdminUserPlan and DeleteUserPlan plans are executed on the corresponding user requests to the web services. They can be initiated only by a user who has administrator rights.
6.7.4 Source agent scenario
The main functions of the source agent are:
• adding scientific data sets;
• interacting with the user agent to show newly added samples to users according to their interests: after a new set is added to the storage, the source agent informs the user agents so that users see information about it;
• editing dataset metadata: the users who added the dataset to the system, or an administrator, can edit it;
• extracting dataset metadata from the repository;
• selecting all information about a specific dataset; detailed information can be viewed only by registered users;
• setting the dataset status from the number of downloads and from ratings. Each dataset can be rated, and the rating is weighted by the professional coefficient of the user who gives it: at the moment of rating, the user's assessment is multiplied by this coefficient. The source agent performs this function: it requests the coefficient from the user agent, calculates the result and saves it in the database. The status of a sample can also increase with the number of downloads;
• interacting with the user agent to modify the user's professionalism coefficient depending on the ratings and status of the scientific data sets the user has added to the storage;
• filtering dataset metadata by a specific parameter;
• adding to the repository new datasets found by the search agent on the Internet.
Consider the sequence diagram for adding a new dataset. A registered user goes to the Create New Dataset page, fills in all required fields and clicks the Insert button; the web page then calls the source service to add a new set of statistical data to the repository (Figure 6.13). The service invokes the source agent to add the new dataset to the repository.
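The weighted assessment described in the list above is a single multiplication, but it involves two agents. The sketch below models the request to the user agent as a lookup function and the database as a list. The `rating()` aggregate (the mean of the stored weighted assessments) is a hypothetical addition, since the thesis specifies only that the product is stored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Illustrative source-agent assessment: weighted score = score * coefficient.
class DatasetRatingSketch {
    private final List<Double> storedAssessments = new ArrayList<>(); // stands in for the database
    private final ToDoubleFunction<String> coefficientOfUser;         // stands in for the user agent

    DatasetRatingSketch(ToDoubleFunction<String> coefficientOfUser) {
        this.coefficientOfUser = coefficientOfUser;
    }

    // Assessment plan: multiply the user's score by his professionalism
    // coefficient and store the product.
    double assess(String user, double score) {
        double weighted = score * coefficientOfUser.applyAsDouble(user);
        storedAssessments.add(weighted);
        return weighted;
    }

    // Hypothetical aggregate: mean of the stored weighted assessments.
    double rating() {
        return storedAssessments.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }
}
```

For example, a score of 4 from a user with coefficient 1.5 is stored as 6.0, while the same score from a user with coefficient 0.5 is stored as 2.0.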
After adding the data to the database, the agent invokes the manager agent to find all users whose preferences correspond to the newly added dataset.
Figure 6.13: The sequences diagram - new dataset adding to the repository
Figure 6.14 shows the sequence diagram for requesting datasets from the repository. With a large number of datasets the request can take a long time, and so can the page rendering. To optimize the process, both page paging and request paging should be used; this is common practice in professional systems.
Figure 6.14: The sequences diagram review of all repository datasets
Figure 6.15 shows the sequence diagram for rating a statistical dataset. The user rates the dataset on its page, which invokes the source service. The service invokes the source agent, and the assessment plan is executed. During the plan, the source agent invokes the profile agent, which returns the professionalism coefficient of the user who gave the rating. The source agent calculates the product of the professionalism coefficient and the rating and stores the result in the database.
Figure 6.15: The diagram of sequences dataset assessment
All other functions of the source agent are implemented in the same way. The diagram of the source agent classes is shown in Figure 6.16.
Figure 6.16: The diagram of Source Agent classes
As mentioned earlier, every agent goal is mapped to a plan, and each plan is a class with at least one "body" method.
6.7.5 Classification scenario
The system can classify a text (document) that arrives with incomplete or missing information about its authors, etc., and place it in the folder of a specific category. In other words, even if the user does not supply complete information, the system is able to assign the document to the appropriate category.
6.8 Agents development using Jadex technology
To create and run an agent, the system must know the properties of the agent.
The state of an agent is determined by its beliefs, its goals and its currently executing plans, together with the library of known plans. Jadex combines declarative and procedural approaches to implement the components of an agent. Plan bodies are written procedurally as ordinary Java classes, whereas all other notions (beliefs, goals, filters and conditions) are specified in an XML-based language, which allows Jadex objects to be created declaratively. Where needed, the developer can refer to Java code from these declarations, for example in expressions. The complete specification of an agent is given in the so-called agent definition file (ADF). In the ADF the developer defines the initial beliefs and goals, referring to Java objects, and declares the plans by pointing to the Java classes that implement them. Besides the BDI components, the ADF can hold other information, for example default arguments for starting the agent or service descriptions for registering the agent with the directory facilitator. The framework consists of the Jadex API, an execution model, and predefined reusable generic functionality. The API gives access to Jadex concepts when programming plans. Plans are plain Java classes that extend a special abstract class, which provides useful methods for sending messages, dispatching subgoals and waiting for events. Through the framework API, plans can read and modify the agent's beliefs. A distinctive feature of Jadex is that, besides direct retrieval of stored facts, an intuitive OQL-like query language allows arbitrarily complex expressions to be formulated over the objects contained in the belief base. In addition to the plans coded in Java, the developer provides the XML-based ADF, which establishes the initial beliefs, goals and plans of the agent. The Jadex engine reads this file and starts the agent.
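The split between declaratively specified beliefs and goals and procedural plan bodies can be illustrated in plain Java. This is a sketch of the idea under stated assumptions, not the Jadex API: SketchPlan, InsertDatasetSketchPlan and MiniBdiAgent are hypothetical stand-ins for the plan base class Jadex provides and for the goal-to-plan mapping Jadex reads from the ADF. The goal name insert_dataset is borrowed from the repository's plan names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a BDI plan: one body() method that can read and
// modify the agent's beliefs. It mimics the idea only; it is not Jadex's Plan class.
abstract class SketchPlan {
    protected Map<String, Object> beliefs;

    void bind(Map<String, Object> agentBeliefs) {
        this.beliefs = agentBeliefs;
    }

    abstract void body();
}

// Example plan for the goal "insert_dataset": records the dataset title in the beliefs.
class InsertDatasetSketchPlan extends SketchPlan {
    private final String title;

    InsertDatasetSketchPlan(String title) {
        this.title = title;
    }

    @Override
    void body() {
        int count = (Integer) beliefs.getOrDefault("datasetCount", 0);
        beliefs.put("datasetCount", count + 1);
        beliefs.put("lastTitle", title);
    }
}

// Minimal agent: a belief base plus a plan library mapping goal names to plan
// factories (in Jadex this mapping is declared in the ADF rather than in code).
class MiniBdiAgent {
    final Map<String, Object> beliefs = new HashMap<>();
    private final Map<String, Function<Map<String, Object>, SketchPlan>> planLibrary = new HashMap<>();

    void registerPlan(String goal, Function<Map<String, Object>, SketchPlan> factory) {
        planLibrary.put(goal, factory);
    }

    // Selects the plan registered for the goal, binds it to the beliefs and runs
    // its body. Returns false when no plan is applicable to the goal.
    boolean dispatch(String goal, Map<String, Object> goalParameters) {
        Function<Map<String, Object>, SketchPlan> factory = planLibrary.get(goal);
        if (factory == null) {
            return false;
        }
        SketchPlan plan = factory.apply(goalParameters);
        plan.bind(beliefs);
        plan.body();
        return true;
    }
}
```

Keeping the goal-to-plan mapping in data rather than in the dispatch logic mirrors the role of the ADF: new plans can be attached to goals without changing the agent loop.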
The Jadex engine tracks the agent's goals through a continuous selection of steps and launches plans in response to internal events and to messages from other agents. Jadex also comes with some advanced features, such as predefined access to the directory facilitator service. Functionality encoded in individual plans can be grouped into reusable modules called capabilities. A capability is described in a format similar to the ADF and can easily be incorporated into existing agents. To summarize: in Jadex, beliefs can be Java objects of any type and are stored in the belief base; goals are explicit or implicit descriptions of conditions that must be achieved; and the agent executes plans, which are procedural means written in Java, to achieve its goals.

7 System program model: Deployment and Implementation

This chapter presents the overall structure of the system and describes each of its levels. The Java Web Data Mining Repository (ontology data and metadata exchange repository) was developed to support research and development on the contextual use of Data Mining (Allinson, et al., 2008). The idea of the data and metadata exchange repository is a system that brings people together around a shared interest, occupation or hobby and lets them share ideas and give and receive advice and recommendations on Data Mining and statistical research. This realizes the well-known idea of Web 3.0 as social recommendation services based on the principle of automatic recommendations. According to experts (O'Reilly, et al., 2007), such a system differs from Web 2.0 systems in that users not only create the content but also certify it, marking what deserves the attention of like-minded users; the system does this automatically. For example, based on the user preferences stored within the system, a user is given a list of recommendations: the users whose interests most closely coincide with his or hers.
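One simple way to find the users whose interests most closely coincide with a given user's is to compare interest sets with the Jaccard similarity (shared interests divided by all interests of the two users). The thesis does not say which measure its agents use; the metric, the class InterestRecommender and its methods are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: rank other users by how closely their interests coincide
// with the current user's, using Jaccard similarity (an assumed metric).
class InterestRecommender {
    // size(A intersect B) / size(A union B), defined as 0 when both sets are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    // Returns at most k other users with non-zero similarity, most similar first;
    // ties are broken alphabetically so that the order is deterministic.
    static List<String> closestUsers(String user, Map<String, Set<String>> interests, int k) {
        final Set<String> mine = interests.getOrDefault(user, Collections.<String>emptySet());
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : interests.entrySet()) {
            if (!e.getKey().equals(user) && jaccard(mine, e.getValue()) > 0) {
                candidates.add(e.getKey());
            }
        }
        candidates.sort((x, y) -> {
            int bySimilarity = Double.compare(jaccard(mine, interests.get(y)), jaccard(mine, interests.get(x)));
            return bySimilarity != 0 ? bySimilarity : x.compareTo(y);
        });
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }
}
```

For a user interested in clustering and classification, another user interested in clustering, classification and SVM scores 2/3, a user interested only in clustering scores 1/2, and a user with no shared interests is not recommended at all.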
7.1 Problem-solving

These problems can be solved only if we accept that the best transmitter of knowledge to a person (not just «interesting information», but the knowledge that is urgently needed) is another person rather than a robot, provided that this person is not an amateur but an expert in the field. In the Web 1.0 approach, searching for the necessary statistical data means posting questions on scientific sites or searching foreign articles for data on one's own. This is inconvenient: a dataset has many parameters, only a few of the simplest can be checked over the Internet, and the most important ones cannot be checked at all. Alternatively, the Web 2.0 approach can be used, i.e. asking acquaintances through communities or social networks. Here the situation is the reverse: there are many sympathetic and experienced people, but unfortunately none of them has the necessary datasets. The approach implemented in this work can well be characterized as Web 3.0. It invites researchers, teachers, students and other interested users to upload their statistical datasets together with their metadescriptions, to view the datasets already uploaded, and to give advice to others about a given dataset. The users thus follow a human «knowledge manager» strategy that leads them to the desired results. The conventional way to organize information search in databases is for the user to send a personal request over the Internet to the Data Mining Repository server, then collect a summary of the responses and process it. Performing such routine operations can take experts a lot of time. This makes it urgent to develop a multi-agent system that automates the querying of the information system and takes over most of the routine operations on the information in the database.
The overall structure of the system is shown in Figure 7.1.

Figure 7.1: The overall structure of the system

The system consists of the presentation level, the service level, the agent level and the database.

7.2 Presentation level

The whole Web part (the Web pages) forms the presentation level of the system. It was built using HTML pages and the Wicket framework. Wicket is an open-source, component-based Web framework in which each page is split into a markup file and Java code. Wicket offers page code written in Java, good support for localization and page styles, no XML configuration files, and easy integration with Java security. .NET programmers can compare it with ASP.NET pages. There are many frameworks for developing Web applications, but most of them are weak at maintaining the state of server-side page components. Wicket makes this easy and transparent: it manages the state of server-side page components itself, so programmers do not need to work with HttpSession wrappers or similar state storage directly. This is one of Wicket's design goals. The scheme of Wicket pages is shown in Figure 7.2.

Figure 7.2: Wicket pages work scheme

Part of the development of the data and metadata exchange repository was creating the Web pages, deployed in the Tomcat 5.5 servlet container. The Wicket page CreateDatasetPage.html is shown in Figure 7.3.

Figure 7.3: Markup CreateDatasetPage

Using the client part developed on top of the multi-agent system, the user forms a query to the distributed information system; the request passes through all levels of the system down to the database server. The use case diagram for the dataset part of the system is shown in Figure 7.4.

Figure 7.4: The use case diagram

This section describes the dataset part of the system, which comprises the pages for uploading datasets to the system, reviewing all datasets, viewing detailed information about a dataset, and editing dataset metadata.
Figure 7.5 shows the class diagram of the presentation level and of the resource service. The package ua.kture.dmr.common.beans.dataset contains the classes DataSet, DataSetFile and Judge, each of which is an object representation of a resource from the metadata ontology. All classes of the system operate on objects of these classes. The package ua.kture.dmr.agents.dataset provides the agent plan classes:
• InsertDatasetPlan performs the plan insert_dataset and adds a new dataset to the repository;
• ReadAllDatasetPlan performs the plan read_all_dataset and reads all datasets from the repository;
• ReadDatasetsBySlotPlan performs the plan read_dataset_by_slot and reads from the repository all datasets that match the query;
• AppraisementDatasetPlan performs the plan appraisement_dataset and adds an assessment and a comment to a specific dataset;
• UpdateDatasetPlan performs the plan update_dataset and updates datasets;
• InsertDatasetFile performs the plan insert_dataset_file and adds the files of statistical datasets.
Let us consider one of the plans. ReadDatasetPlan proceeds as follows. The goal sets the plan parameter «dataset name». When the plan is launched, this parameter is used to construct the query to the database resource. If the dataset is not found, the plan sets the corresponding result in its output parameter. If the dataset is found, its representation is added to the agent's ontological model for further use.
Data are transferred between the server and the ResourceAgent over a network connection, a socket. The socket interface can transmit data between two applications running on the same node or on different nodes of the network. A socket is created as an object of the Socket class, specifying the server host and the port number used by the server:
server_socket = new Socket(server_name, server_port);
Next, the input and output streams for exchanging information are obtained from the socket. On the client side this is done in the same way as on the server side:
server_receive = new BufferedReader(new InputStreamReader(server_socket.getInputStream()));
server_send = new PrintStream(server_socket.getOutputStream());
Once the connection is established, the ResourceAgent sends the data and the SQL query command to the server. If the data transfer to the server proceeds normally, the agent informs the user that the request has been accepted for processing, i.e. it returns the result. The agent then completes its work by closing the network connection:
finally {
    if (server_receive != null) server_receive.close();
    if (server_send != null) server_send.close();
    if (server_socket != null) server_socket.close();
}
As already noted, the multi-agent part of the system is implemented in Java. The ResourceAgent provides the communication interface between the other agents and the repository server. This four-level architecture allows the repository to interact with other systems both at the level of services, which is very common practice today, and at the level of agents, i.e. agents of different systems can interact with each other.
The package ua.kture.dmr.jwsx.ui.pages contains the AbstractPage class, the base class of all pages of the system, which creates the menu for each page of the site. The package ua.kture.dmr.jwsx.ui.pages.dataset contains the pages CreateDatasetPage, DataSetListPage, DataSetDetailsPage and UpdatedataSetPage, which inherit from AbstractPage; this package provides the user interface to the datasets.

Figure 7.5: The presentation level and service level class diagram

The package ua.kture.dmr.jwsx.wsimpl contains the ResourceServiceImpl class, which implements the queries to the source agent. ResourceServiceImpl extends AbstractAgentWebService, the base class of all system services, and implements the ResourceService interface.
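The socket exchange between the ResourceAgent and the repository server can be made self-contained and runnable as follows. The one-shot server and its "ACCEPTED: <query>" reply are hypothetical stand-ins for the real repository server, whose protocol is not given in the text; the client side follows the same steps as the fragments above (open a Socket, wrap its streams in a BufferedReader and a PrintStream, send the SQL command, read the reply, close everything). Here try-with-resources (Java 7 and later) takes the place of the explicit finally block.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.ServerSocket;
import java.net.Socket;

class ResourceSocketSketch {
    // Hypothetical stand-in for the repository server: accepts one connection,
    // reads one query line and answers with "ACCEPTED: <query>".
    static int startOneShotServer() throws IOException {
        ServerSocket server = new ServerSocket(0); // port 0: let the OS pick a free port
        Thread worker = new Thread(() -> {
            try (ServerSocket s = server;
                 Socket client = s.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
                 PrintStream out = new PrintStream(client.getOutputStream(), true)) {
                String query = in.readLine();
                out.println("ACCEPTED: " + query);
            } catch (IOException e) {
                // A real server would log the failure; the sketch just ends the thread.
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server.getLocalPort();
    }

    // Client side, as in the ResourceAgent: connect, send the SQL command, read the
    // reply. Socket and streams are closed automatically, even if an exception occurs.
    static String sendQuery(String host, int port, String sql) throws IOException {
        try (Socket socket = new Socket(host, port);
             BufferedReader receive = new BufferedReader(new InputStreamReader(socket.getInputStream()));
             PrintStream send = new PrintStream(socket.getOutputStream(), true)) {
            send.println(sql);
            return receive.readLine();
        }
    }
}
```

Calling sendQuery("localhost", startOneShotServer(), "SELECT title FROM datasets") returns the server's single reply line.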
The methods of the ResourceServiceImpl class are listed below:
• insertDataSet(DataSet dataset) throws Exception;
• getDataSet(SessionInfo sessionInfo, String title) throws Exception;
• getAllDataSets(SessionInfo sessionInfo) throws Exception;
• getDataSetsBySlot(SessionInfo sessionInfo, String slotName, String slotValue) throws Exception;
• insertDataSetFile(DataSetFile datasetFile) throws Exception;
• setDataSetMarkComment(Judge judge) throws Exception;
• updateDataSet(DataSet dataset) throws Exception.

7.3 Service level

The Data Mining Repository has a service-oriented architecture that follows the principles of reusing functional elements, eliminating duplicated functionality in the software, and unifying typical operating processes, so as to support a centralized process model and a functional organization based on an industrial integration platform. Components of the program can be distributed across different nodes of the network and offered as independent, loosely coupled service applications. The software system is implemented as a set of Web services integrated by means of SOAP and WSDL. The interface of each program component encapsulates its implementation details from the other components. This architecture therefore provides a flexible and elegant way to combine and reuse components. To process the requests that users make through the Web interface, a multi-agent system was developed: requests from the Web pages are directed to the Web services, which in turn send requests to the agents. The system provides four Web services:
• Administration Service;
• Search Service;
• Profile Service;
• Source Service.
The Web services were developed with XFire, a free solution that addresses interoperability and implements various industry standards. For developers of distributed applications it is one of the easiest mechanisms for implementing remote requests.
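The service contract can be made concrete with a Java interface. The sketch covers four of the listed ResourceService methods; the return types, the bean fields and the in-memory implementation are assumptions, since the text gives only method names and parameters. In the described system the real service is exposed over SOAP with XFire and each call is passed to the source agent rather than answered from a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal beans; the fields are assumptions, not the thesis classes.
class SessionInfo {
    final String userName;

    SessionInfo(String userName) {
        this.userName = userName;
    }
}

class DataSet {
    final String title;
    final Map<String, String> slots = new HashMap<>(); // metadata slot name -> value

    DataSet(String title) {
        this.title = title;
    }
}

// Four of the ResourceService methods named in the text; return types are assumed.
interface ResourceService {
    void insertDataSet(DataSet dataset) throws Exception;

    DataSet getDataSet(SessionInfo sessionInfo, String title) throws Exception;

    List<DataSet> getAllDataSets(SessionInfo sessionInfo) throws Exception;

    List<DataSet> getDataSetsBySlot(SessionInfo sessionInfo, String slotName, String slotValue) throws Exception;
}

// In-memory stand-in. In the described system each call is forwarded to the source
// agent, which runs the matching plan (e.g. insert_dataset, read_dataset_by_slot).
class InMemoryResourceService implements ResourceService {
    private final Map<String, DataSet> store = new LinkedHashMap<>(); // keeps insertion order

    @Override
    public void insertDataSet(DataSet dataset) {
        store.put(dataset.title, dataset);
    }

    @Override
    public DataSet getDataSet(SessionInfo sessionInfo, String title) {
        return store.get(title); // null signals "dataset not found"
    }

    @Override
    public List<DataSet> getAllDataSets(SessionInfo sessionInfo) {
        return new ArrayList<>(store.values());
    }

    @Override
    public List<DataSet> getDataSetsBySlot(SessionInfo sessionInfo, String slotName, String slotValue) {
        List<DataSet> result = new ArrayList<>();
        for (DataSet d : store.values()) {
            if (slotValue.equals(d.slots.get(slotName))) {
                result.add(d);
            }
        }
        return result;
    }
}
```

Because the Web layer depends only on the interface, the in-memory class could be replaced by an agent-backed implementation without changing the pages, which is the loose coupling the service level aims for.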
In this part of the work we describe the Resource Service. The ResourceService service has methods that fully cover the functionality of the source agent:
• insertDataSet(DataSet dataset) throws Exception;
• getDataSet(SessionInfo sessionInfo, String title) throws Exception;
• getAllDataSets(SessionInfo sessionInfo) throws Exception;
• getDataSetsBySlot(SessionInfo sessionInfo, String slotName, String slotValue) throws Exception;
• insertDataSetFile(DataSetFile datasetFile) throws Exception;
• setDataSetMarkComment(Judge judge) throws Exception;
• updateDataSet(DataSet dataset) throws Exception.

7.4 Agent subsystem

In the data and metadata exchange repository (Data Mining Repository), every agent of the multi-agent system belongs to one of the following types: the manager agent, which runs on the server and coordinates the work of users; the user agent, which interacts with users; the resource agent, which is responsible for operations on datasets; and the search agent, which performs information search. As a result, user queries can be served even if the agents are placed on different servers. The multi-agent system includes the agents ManagerAgent, ProfileAgent, ResourceAgent and SearchAgent (each agent was described in detail in Chapter 6). Messages between agents are exchanged over HTTP, and the database is accessed via JDBC.

7.5 Data and metadata exchange repository explanation

Combining Sections 7.1, 7.2 and 7.3 with the overall system structure presented in Figure 7.1 gives the detailed explanation of the data and metadata exchange repository presented in Figure 7.6.

Figure 7.6: The explanation of overall system

7.6 Work database level

The ontological models are stored in the Oracle 10g database management system, namely in its Oracle Spatial option (Oracle, 2005). Each ontological model is designed as an RDF data model in Oracle Spatial, which gives three models: Users, Datasets and Methods.
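In RDF a set of triples forms a directed labelled graph: subjects and objects are nodes, and each predicate labels an edge from subject to object. The following plain-Java sketch stores one such model (for example Datasets) in memory and matches a single triple pattern with ?-prefixed variables, a much simplified analogue of the graph patterns accepted by the SDO_RDF_MATCH operator. It does not use Oracle's API; the class, its methods and the ds:/dm: prefixes are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory analogue of one RDF model (e.g. "Datasets"). Each triple is
// an edge from the subject node to the object node, labelled with the predicate.
class TripleModel {
    private static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    private final List<Triple> triples = new ArrayList<>();

    void add(String subject, String predicate, String object) {
        triples.add(new Triple(subject, predicate, object));
    }

    // Matches one triple pattern; terms starting with '?' are variables.
    // Returns one variable binding per matching triple.
    List<Map<String, String>> match(String s, String p, String o) {
        List<Map<String, String>> results = new ArrayList<>();
        for (Triple t : triples) {
            Map<String, String> binding = new HashMap<>();
            if (bind(s, t.subject, binding) && bind(p, t.predicate, binding) && bind(o, t.object, binding)) {
                results.add(binding);
            }
        }
        return results;
    }

    // A constant must equal the value; a variable is bound, and a variable that
    // occurs twice in the pattern must bind to the same value both times.
    private static boolean bind(String term, String value, Map<String, String> binding) {
        if (!term.startsWith("?")) {
            return term.equals(value);
        }
        String previous = binding.putIfAbsent(term, value);
        return previous == null || previous.equals(value);
    }
}
```

For instance, the pattern (?ds, dm:suitableFor, dm:Classification) returns one binding per dataset marked as suitable for classification, the kind of link between DM methods and data that the Datasets and Methods models express.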
Figure 7.7 shows a model for storing RDF statements in Oracle Spatial 10g.

Figure 7.7: A model to store RDF statements in Oracle Spatial 10g

Oracle Database 10g was the first large-scale DBMS to implement storage of ontologies, within its Spatial option. Oracle Spatial is an Oracle Database 10g technology that adds features for handling spatial data, to support spatial services, programs that process or provide information on the location of objects, and other information systems. The DBMS also supports RDF/RDFS, allowing developers to use the platform to take advantage of semantic data. Application developers can add value to data and metadata by defining new sets of terms and the relations between them. Such a set of terms (an ontology) is better suited to query and analysis based on the semantic approach than conventional datasets. Ontology datasets often contain millions of data elements and relations between them, which can be grouped into triples using the new RDF data model. Oracle states that the model scales to billions of triples, which meets the requirements of most applications. RDF is stored in Oracle Spatial 10g as follows:
• RDF data are stored as a directed logical graph;
• subjects and objects are represented as nodes and predicates as links, with the subject as the start node and the object as the end node;
• each link is a complete RDF triple;
• the Oracle Spatial RDF data model supports three types of database objects: a model (an RDF graph consisting of a set of triples), a rulebase (a set of rules) and a rules index (the inferred RDF graph);
• semantic queries are implemented with the SDO_RDF_MATCH operator.
The main advantages of using Oracle Spatial 10g are:
• support for decentralized data management;
• support for all RDF data types;
• SQL-based storage and retrieval of RDF models;
• querying RDF models with graph patterns;
• combining RDF (SPARQL-like) queries with other SQL operators;
• inference based on RDFS (RDF Schema) rules;
• inference based on user-defined rules.
An RDF model is stored as a graph: the nodes are URI objects connected by a defined set of links. The W3C RDF Schema recommendation describes a vocabulary that is used to describe other vocabularies. RDF documents are stored as triples (subject, property, object), using prefixes as abbreviations for namespaces. The values of the triples are stored in the MDSYS.RDF_VALUE$ table. A user-defined system of inference rules is supported; each rule consists of «IF», «FILTER» and «THEN» parts. The object type SDO_RDF_TRIPLE (subject VARCHAR2(4000), property VARCHAR2(4000), object VARCHAR2(10000)) is used to display triples, while SDO_RDF_TRIPLE_S is the type used to store triples; it refers to the data in the model's table. The agents interact with the database through the free Jena 2.0 library.

8 Conclusions and future challenges

This thesis is built around one fundamental principle drawn from ontologies and intelligent agents. In this chapter we discuss the implementation of the “Data and Metadata exchange repository” and the results that have been attained, and we present the implications for the future work on our repository.

8.1 Results

The result of this master's thesis research is a multi-agent system for processing and storing any statistical data. A study of existing repositories made it possible to identify the main bottlenecks of similar statistical repositories, which were taken into account.
In the UCI repository the user can filter the data files by subfield of data mining, view a brief description of a file and download it. In the DEA Dataset Repository the user can search by any of the criteria, view a brief description of a file and download it, but only after registration. In the XML Data Repository the user can download any file without registration. In the Frequent Itemset Mining Dataset Repository the user does not need to register and can obtain information about the research carried out on the datasets and the researchers' contact information, as well as download a file. The key feature that distinguishes the developed system from the typical statistical repositories mentioned above is that dataset metadescriptions are implemented using the European SDMX 2.0 standard and the ontological models stored in the system. The advantage and novelty of the work is the implementation of a set of ontological models of Data Mining methods, which is used to select a suitable method for the user's source datasets. To work with this set of ontological models, a set of search algorithms was developed that supports simple search, advanced search and personalized search, which takes into account the user's individual interests, field of activity and previous search queries, together with a search module architecture based on these algorithms. The system (the intelligent data and metadata exchange repository) also contains a taxonomy of DM methods that links DM methods to the data to which they can be applied, so that for a "beginner" user it acts as an expert system. The user ontological model and the resource ontological model were developed in Protégé version 3.4, which makes working with ontologies fast. For the interaction between the ontological models and the implementation of the search algorithms, a set of general intelligent agent models was developed.
These models can serve both as a mechanism for presenting information from the ontological models and as a mechanism for user interaction with the system. The set of general models includes a model for integrating intelligent agents with Web systems, a model of an intelligent search agent, and a model of the relationships between agents. The user of the intelligent data and metadata exchange repository can formally describe his or her problem domain (by filling in the necessary fields of the ontology model) and the dataset needed for a specific task. All these activities are handled by the search agent. Having processed the received information, the search agent passes it to the coordinator agent, and the necessary link to a data file is made via the search agent. The user intelligent agent (the user agent, or profile agent) personalizes the answers using the following information: the user's name; e-mail address; preferred language; current goals; whether the user is a beginner or advanced; the academic institution the user belongs to; localization preferences; whether the user's interests coincide with those of other users in the system; and the user's recent queries. Applying the multi-agent approach to build such a system makes it possible to perform a simple search for users of any type; to search by different criteria for authorized users; to offer popular datasets; to perform a search that takes the user's personal needs into account; to provide users with information relevant to their queries; to keep statistics of requests and provide them when necessary; and to remember successful search results. The system is implemented in the cross-platform programming language Java with the multi-agent platform Jadex, the Oracle Spatial 10g database server and the Protégé 3.4 development environment for ontological models.
The Oracle Spatial 10g database management system, which can work with ontologies in RDF format, was chosen to store the resource ontological models, and Protégé was used as the development environment for the ontological models.

8.2 Conclusions

Currently there are many repositories of scientific datasets. Their main disadvantages are: the text-only file format is inconvenient to use and hard to convert to other formats; the interface is not user-friendly; and search is poor, working on only one of many criteria at a time, i.e. several conditions cannot be combined. In many systems there is no indication of the tasks for which a dataset can be used, and the information about the data is insufficient. Agent technologies are now widespread; their central element is the agent, a software entity with qualities such as autonomy, activity, commitment, mobility and sociability. The creation of ontologies is a promising direction of current research in processing information expressed in natural language. One of the advantages of using ontologies as a learning tool is a systematic approach to the study of the subject area, which achieves: consistency, since an ontology provides a holistic view of the subject area; uniformity, since material presented in a unified format is perceived and reproduced much better; and scientific rigour, since building the ontology makes it possible to restore missing logical links in full. Ontologies also allow large volumes of data from different systems to be used together, because they provide a semantic description of the data.
In the course of this work we have:
a) studied the main stages of working with a repository of scientific research datasets;
b) reviewed the existing repositories of scientific datasets and identified their strengths and weaknesses;
c) studied Semantic Web technology;
d) investigated the capabilities of agent technology;
e) analyzed ways to develop Web-oriented multi-agent applications;
f) developed the architecture of a multi-agent repository of scientific datasets;
g) developed the ontological model of the user;
h) developed and implemented in software a BDI agent model of the user;
i) developed and implemented in software a BDI agent model.

8.3 Future challenges

The Data and Metadata Exchange Repository (Data Mining Repository) is a complete software product, but many ideas remain to be implemented in the future. Nowadays it is considered good practice to put a new idea into service even if it is not yet fully realized, as long as it attracts users. Such a product passes through several stages of development; the most frequently cited short classification distinguishes five: the seed stage, startup stage, growth stage, expansion stage and exit stage. In further stages of development we plan to implement the following functionality in our repository: advanced search for datasets from various sources on the Internet, using a clustering algorithm to keep only the datasets that match a given dictionary of terms; conversion between various file types; generation of samples from specific formulas for simple images; and an extension of the recommendation idea. The system can be further improved by developing its ontology and increasing the number of agents.
Agents that could improve the system:
a) data preprocessing agents, which would convert a dataset into various formats;
b) dataset checking agents, which would verify that a dataset is consistent with the research methods specified for it;
c) an agent that searches for datasets in other dataset repositories and could interact with the data preprocessing agent.
A subsystem could also be added for articles and for the results of scientific experiments that researchers have conducted using datasets from the repository.

8.4 Reflections

This field was chosen for research because of the practical need for constant use of statistical repositories. Last year, during my bachelor's work, in which a clustering algorithm for linearly inseparable SATYR objects was developed, I needed to select appropriate data files, namely ones with intersecting classes of varying density. Despite an enormous amount of work with various statistical repositories, I failed to find an appropriate file, because the file descriptions contain no such information. I had to try all the files of the repositories one after another, manually, with the clustering and classification filters. This led to the idea of developing an intelligent system interacting with a repository, which would not only store files and filter them by given queries, but also provide a formalized, indefinitely extensible model for describing data files (which only the ontological approach makes possible), a formalized model of the system user, and personalized search in the repository, which is possible only with intelligent agents. Such an extensible system can only be created if the system is a set of models together with mechanisms for operating on them. The creation of ontologies is a direction of current research in processing information expressed in natural language.
Since a computer cannot understand the state of affairs in the world the way a person does, all information must be represented to it in a formal shape. Ontologies thus serve as an original model of the surrounding world, with a structure that lends itself easily to machine processing and analysis. Ontologies provide the system with data whose semantics for the given words is well described, and they specify the hierarchical structure of the domain and the interrelations of its units. The complete development of the repository will solve the problem of data use for beginners and will allow all scientists to exchange the descriptive parts of files across different application areas. Scientists adding files will not have to fill in all fields formally: it will be enough to give a description of the files, and the agent will automatically place them in the appropriate section and later find them for users.

Glossary
KDD – Knowledge Discovery and Data Mining
DM – Data Mining
SDMX – Statistical Data and Metadata eXchange
MAS – Multi-Agent System
HTTP – Hypertext Transfer Protocol
SQL – Structured Query Language
XML – eXtensible Markup Language
RDF – Resource Description Framework
RDFS – Resource Description Framework Schema
OWL – Web Ontology Language
URI – Uniform Resource Identifier
SPARQL – SPARQL Protocol and RDF Query Language
BDI – Belief-Desire-Intention (software model)
ASCII – American Standard Code for Information Interchange
GZIP – GNU zip
W3C – World Wide Web Consortium
PRS – Procedural Reasoning System
ARS – Agent-Oriented System
dMARS – Distributed Multi-Agent Reasoning System
DBMS – Database Management System
JDBC – The Java Database Connectivity
SOAP – Simple Object Access Protocol
WS – Web Services
WSDL – Web Services Description Language
ADF – XML based Agent Definition File
OQL – Object Query Language

References
W3C Recommendation (2004). Web Ontology Language (OWL) overview, viewed <http://www.w3.org/TR/owl-features/>, (090812)
Ratushin, U., Polenok, S., Tkachenko, S. (2001). Information society ontology at the network. University book, 256.
W3C Proposed Recommendation (1999). Resource Description Framework (RDF) Model and Syntax Specification, viewed <http://www.w3.org/TR/PR-rdf-syntax/>, (090812)
Wooldridge, M., Jennings, N. (1995). Intelligent agents: Theory and practice. The Knowledge Engineering Review 10(2), 115-152.
Russell, S., Norvig, P. (2006). Russian translation of Artificial Intelligence: A Modern Approach, 2nd Edition, Translated by Ptitsyn K. Moscow: Williams Publishing, ISBN Press, 356.
Zaborovski, V. (2005). Intelligent technologies, 324.
Xacken, G. (2005). Information and self-organization. Macroscopic approach to Complex system, 248.
Gennari, J. (2002). The Evolution of Protégé. An Environment for Knowledge-Based Systems Development.
Oracle Spatial 10g (2005). An Oracle White Paper, viewed <http://www.oracle.com/technology/products/spatial/pdf/10gr2_collateral/spatial_twp_10gr2.pdf>, (090812)
SDMX Standards: Version 2.0 (2007), ZIP File, viewed <http://sdmx.org/?page_id=16#package>, (090812)
Blake, C. L., Merz, C. J. (2001). UCI repository of machine learning databases, viewed <http://www.ics.uci.edu/~mlearn/MLRepository.html>, (090812)
Cortez, P., Morais, A. (2007). A Data Mining Approach to Predict Forest Fires using Meteorological Data. In Neves, J., Santos, M. F., Machado, J. Eds., New Trends in Artificial Intelligence, Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence, Guimarães, Portugal, 512-523, viewed <http://www3.dsi.uminho.pt/pcortez/fires.pdf>, (090812)
Pearson, S., Mont, M., Bramhall, P. (2004). An Adaptive Privacy Management System For Data Repositories. Trusted Systems Laboratory, Hewlett-Packard Laboratories, Bristol, UK, viewed <http://www.hpl.hp.com/techreports/2004/HPL-2004-211.pdf>, (090812)
Cunningham, K., Kenneth, R., Koedinger, Skogsholm, A., Leber, B. (2008). An open repository and analysis tools for fine-grained longitudinal learner data. Human Computer Interaction Institute, Carnegie Mellon University, viewed <http://www.educationaldatamining.org/EDM2008/uploads/proc/16_Koedinger_45.pdf>, (090812)
Xie, T., Pei, J. (2006). MAPO: mining API usages from open source repositories. In Proceedings of the International Workshop on Mining Software Repositories (MSR '06), Shanghai, China, ACM Press, New York, 54-57, viewed <http://people.engr.ncsu.edu/txie/publications/msr06-mapo.pdf>, (090812)
Zimmermann, T. (2006). Knowledge Collaboration by Mining Software Repositories. Saarland University, Saarbrucken, Germany, viewed <http://thomas-zimmermann.com/publications/files/zimmermann-kcsd-2006.pdf>, (090812)
Johnson, G.J. (2006). Lines of Communication: Open Access Repositories & Scholarly Publication. Scholarly Publication SHERPA Repository Development Officer SHERPA, University of Nottingham, Birkbeck, viewed <http://www.sherpa.ac.uk/documents/brunel-gjj-dec-2006.pdf>, (090812)
Allinson, Julie, Francois, S., Lewis, S. (2008). SWORD: Simple Web-service Offering Repository Deposit, viewed <http://www.ariadne.ac.uk/issue54/allinson-et-al/>, (090812)
O'Reilly, T. (2007). Today's Web 3.0 Nonsense Blogstorm, viewed <http://radar.oreilly.com/archives/2007/10/web-30-semantic-web-web-20.html>, (090812)
Cargar, V. (2008). Repository Profile: The Associated Press, viewed <http://www.crl.edu/PDF/AP_Profile.pdf>, (090812)
Waltz, Marie-Elise (2008). Repository Profile: NORC General Social Survey, viewed <http://www.crl.edu/PDF/NORC_profile.pdf>, (090812)
Jacobs, Neil (2005-2008). Digital Repositories programme, viewed <http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2005.aspx>, <http://www.jisc.ac.uk/whatwedo/programmes/digitalrepositories2007.aspx>, (090812)
Moore, C. (2009). The Research Library’s Role in Digital Repository Services. Published by the Association of Research Libraries, Washington, DC 20036, viewed <http://www.arl.org/bm~doc/repository-services-report.pdf>, (090812)
Fatudimu, I.T., Musa, A.G., Ayo, C.K., Sofoluwe, A. B. (2008). Knowledge Discovery in Online Repositories: A Text Mining Approach. European Journal of Scientific Research, ISSN 1450-216X, 22 (2), 241-250. EuroJournals Publishing, viewed <http://www.eurojournals.com/ejsr_22_2_10.pdf>, (090812)
Nyika, E. (2009). African Marine Science Repository for Electronic Publications (OceanDocs): Paper Presentation for the Forth Coming African Digital Scholarship & Curation 2009 Experience of the Institute of Marine Sciences, University of Dar es Salaam, Tanzania, viewed <http://www.ais.up.ac.za/digi/docs/nyika_paper.pdf>, (090812)
Zhang, Z., Yang, P., Wu, X., Zhang, C. (2009). An Agent-Based Hybrid System for Microarray Data Analysis, IEEE Intelligent Systems, accepted, to appear, viewed <http://www.cs.usyd.edu.au/~yangpy/publication/YangIEEE_IS_2009.pdf>, (090812)
Mitkas, P.A., Symeonidis, A. L., Kehagias, D., Athanasiadis, I. N. (2004). Application of Data Mining and Intelligent Agent Technologies to Concurrent Engineering, Aristotle University of Thessaloniki, Greece, viewed <http://issel.ee.auth.gr/ktree/Documents/Root%20Folder/ISSEL/Publications/3_MITKAS_IJAM.pdf>, (090812)
Bresciani, P., Perini, A., Giorgini, P., Giunchiglia, F., Mylopoulos, J. (2004). Tropos: An agent-oriented software development methodology. Journal of Autonomous Agents and Multi-Agent Systems 8 (3), 203–236, viewed <http://www.dit.unitn.it/~pgiorgio/papers/jaamas04.pdf>, (090812)
Chopra, A.K., Singh, M.P. (2009). Multiagent commitment alignment. In: Proceedings of the 8th International Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS), Columbia, SC, IFAAMAS, 937–944, viewed <http://www.aamasconference.org/Proceedings/aamas09/pdf/01_Full%20Papers/17_93_FP_0034.pdf>, (090812)
Dastani, M., Arbab, F., de Boer, F.S. (2005). Coordination and composition in multi-agent systems. In: Proceedings of the 4th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), ACM, 439–446, viewed <http://people.cs.uu.nl/mehdi/publication/coordination.pdf>, (090812)
Chopra, A.K., Singh, M.P., Munindar, P. (2009). An Architecture for Multiagent Systems: An Approach Based on Commitments, viewed <http://www.csc.ncsu.edu/faculty/mpsingh/papers/mas/aamas-promas-09.pdf>, (090812)
Munindar, P. Singh, Chopra, A.K. (2009). Correctness Properties for Multiagent Systems, North Carolina State University, Raleigh, USA, viewed <http://www.csc.ncsu.edu/faculty/mpsingh/papers/mas/aamas-dalt-09.pdf>, (090812)
Bellifemine, F., Caire, G., Trucco, T., Rimassa, G. (2006). Jade Administrator's Guide. TILab
Mascardi, V., Giovanni, C. (2006). Intelligent Agents that Reason about Web Services: A Logic Programming approach, viewed <http://ftp1.de.freebsd.org/Publications/CEUR-WS/Vol-196/alpsws2006-paper5.pdf>, (090812)
Asuncion, A., Newman, D.J. (2007). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science, viewed <http://www.ics.uci.edu/~mlearn/MLRepository.html>, (090812)
Brooks, R. (1991). Intelligence without Reason. MIT Artificial Intelligence Lab, 545 Technology Square, Cambridge, MA 02139, USA, viewed <http://dli.iiit.ac.in/ijcai/IJCAI-91VOL1/PDF/089.pdf>, (090812)
Meyer, John-Jules, C.; Tambe, Milind (Eds.) (2001). Intelligent Agents VIII. 8th International Workshop, ATAL, Seattle, WA, USA: Springer-Verlag 2002, ISBN 3-540-43858-0
Weiss, Gerhard (1997). Distributed Artificial Intelligence Meets Machine Learning.
Learning in Multi-Agent Environments. European Conference on Artificial Intelligence 1. Ecai'96, Workshop Ldais, Budapest, Hungary: Springer – Verlag. Shen,W., Barthes, J. (1996). An experimental multi-agent environment for engineering design. International Journal of Cooperative Information Systems, 5 (2- 3), 131151. Shen, W., Maturana, F., Norrie, D. (1997). Agent-based approach for advanced CAD/CAE systems. In Proceedings of the Fifth International Conference on CAD/Graphics, 609-615, Shenzhen, China. Jennings, N., Corera, J., Laresgoiti, I. (1995). Developing Industrial Multi-Agent systems. In Proceedings of First International Conference on Multi-Agent systems, San-Francisco, USA: AAAI press/The MIT Press. 64 Internet sites http://www.kdnuggets.com , viewed 2009-08-12 http://www.w3.org/2002/ws/ , viewed 2009-08-12 http://w3.msi.vxu.se/~wlo/files/WSWT06/Slides6.pdf , viewed 2009-08-12 http://jadex.informatik.uni-hamburg.de/bin/view/About/Features, viewed 2009-08-12 http://wicket.apache.org/, viewed 2009-08-12 http://www.oracle.com/technology/products/spatial/index.html, viewed 2009-08-12 http://www.machinelearning.ru/wiki/index.php?title=Репозиторий_UCI/, viewed 2009-08-12 http://archive.ics.uci.edu/ml/about.html/, viewed 2009-08-12 http://www.sdmx.org/index.php?page_id=10, viewed 2009-08-12 http://iastech.org/ias/reposit.htm, viewed 2009-08-12 http://archive.ics.uci.edu/ml/, viewed 2009-08-12 http://www.etm.pdx.edu/DEA/Dataset/default.htm/, viewed 2009-08-12 http://www.cs.washington.edu/research/xmldatasets/, viewed 2009-08-12 http://fimi.cs.helsinki.fi/data/, viewed 2009-08-12 http://www.css-mps.ru/zdm/07-2001/011115-2.htm /, viewed 2009-08-12 http://citeseer.ist.psu.edu/old/394923.html / , viewed 2009-08-12 http://en.wikipedia.org/wiki/BDI_software_agent / , viewed 2009-08-12 http://protege.stanford.edu/index.html, viewed 2009-08-12 http://shcherbak.net/sitemap/, viewed 2009-08-12 
http://shcherbak.net/razrabotka-vysokoeffektivnyx-sredstv-sozdaniya-i-obrabotkiontologicheskix-baz-znanij/, viewed 2009-08-12
http://dic.academic.ru/dic.nsf/ruwiki/611874/, viewed 2009-08-12
http://jena.sourceforge.net/ontology/index.html, viewed 2009-08-12
http://www.magenta-technology.ru/technology/index.shtml, viewed 2009-08-12
http://www.fipa.org/, viewed 2009-08-12

Appendices

Appendix A Data and Metadata Exchange Repository (Data Mining repository) [an example]
Figure A.1: The login and password page
Figure A.2: Beginner registration
Figure A.3: Advanced user registration
Figure A.4: Research methods information
Figure A.5: View user information
Figure A.6: Updating user information

Appendix B Data and Metadata Exchange Repository (Data Mining Repository)
Figure B.1: Simple search page
Figure B.2: Advanced search page
Figure B.3: Search results page
Figure B.4: The prompt page of the popular-queries system
Figure B.5: The list of the repository's most popular datasets on the home page

Matematiska och systemtekniska institutionen
SE-351 95 Växjö
Tel. +46 (0)470 70 80 00, fax +46 (0)470 840 04
http://www.vxu.se/msi/